I have a panel of individuals (persnr) who work for different firms (idnum). Each individual belongs to a skill group (skill = 1, 2, 3). I have calculated average incomes per skill group per firm per year (avgincome1, avgincome2, avgincome3). These are missing for some observations, and I would like to replace the missing values with the value for the same firm and year. My dataset looks like this (I have left out the average incomes for the other two skill groups, as the procedure should be the same):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(persnr idnum) int year float(skill avgincome1)
922006 5 2007 3 .
907384 5 2007 2 76.19995
860399 5 2007 1 76.19995
959584 5 2007 2 .
959584 5 2009 2 .
860399 5 2009 1 34.78
916402 5 2011 2 .
876267 5 2013 2 .
876267 5 2014 2 31.219986
982616 5 2015 1 31.219986
876267 5 2015 2 .
876267 5 2016 2 .
973232 5 2016 1 117.59998
973232 5 2017 1 64.87999
943983 5 2017 1 64.87999
987521 8 2016 1 56.59332
990060 8 2016 2 56.59332
987521 8 2016 1 56.59332
856780 8 2016 1 56.59332
end
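Before filling anything, it may help to confirm that the nonmissing values of avgincome1 really are constant within each firm and year, and to see which firm-years have no value at all. This is a minimal sketch using the example data; it relies on Stata sorting missing values after all numbers, and the flag `nodata` is a hypothetical variable name.

```stata
* Example data from the post (dataex)
clear
input long(persnr idnum) int year float(skill avgincome1)
922006 5 2007 3 .
907384 5 2007 2 76.19995
860399 5 2007 1 76.19995
959584 5 2007 2 .
959584 5 2009 2 .
860399 5 2009 1 34.78
916402 5 2011 2 .
876267 5 2013 2 .
876267 5 2014 2 31.219986
982616 5 2015 1 31.219986
876267 5 2015 2 .
876267 5 2016 2 .
973232 5 2016 1 117.59998
973232 5 2017 1 64.87999
943983 5 2017 1 64.87999
987521 8 2016 1 56.59332
990060 8 2016 2 56.59332
987521 8 2016 1 56.59332
856780 8 2016 1 56.59332
end

* Within each firm-year, sort so that nonmissing values come first
* (missing values sort after all numbers), then check that every
* nonmissing value equals the first one in its firm-year.
bysort idnum year (avgincome1): assert avgincome1 == avgincome1[1] if !missing(avgincome1)

* Hypothetical flag: 1 if the firm-year has no nonmissing avgincome1 at all
by idnum year: generate byte nodata = missing(avgincome1[1])
tabulate nodata
```

With the example data the check passes, and `nodata` flags the two observations for firm 5 in 2011 and 2013, which no within-firm-year rule can fill.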
I have tried the following code, which I found in a similar forum thread (I believe by Nick Cox; I would like to provide the link, but I could not find it again), adapted to my variables. This example is for the average income of skill group 1:
Code:
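* finc1: copy of avgincome1, filled from the next observation within each firm (sorted by year)
gen finc1 = avgincome1
bysort idnum (year): replace finc1 = finc1[_n+1] if missing(finc1)
* binc1: copy of avgincome1, filled from the preceding observation within each firm (sorted by descending year)
gsort idnum -year
gen binc1 = avgincome1
by idnum: replace binc1 = binc1[_n-1] if missing(binc1)
* replace a missing avgincome1 only where both copies agree
replace avgincome1 = finc1 if missing(avgincome1) & finc1 == binc1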
The result looks like this:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(persnr idnum) int year float(skill avgincome1 finc1 binc1)
922006 5 2007 3 . . 76.19995
907384 5 2007 2 76.19995 76.19995 76.19995
860399 5 2007 1 76.19995 76.19995 76.19995
959584 5 2007 2 . . 76.19995
959584 5 2009 2 . 34.78 31.219986
860399 5 2009 1 34.78 34.78 34.78
916402 5 2011 2 . . 31.219986
876267 5 2013 2 . . 31.219986
876267 5 2014 2 31.219986 31.219986 31.219986
982616 5 2015 1 31.219986 31.219986 31.219986
876267 5 2015 2 . . 31.219986
876267 5 2016 2 . 117.59998 64.87999
973232 5 2016 1 117.59998 117.59998 117.59998
973232 5 2017 1 64.87999 64.87999 64.87999
943983 5 2017 1 64.87999 64.87999 64.87999
987521 8 2016 1 56.59332 56.59332 56.59332
990060 8 2016 2 56.59332 56.59332 56.59332
987521 8 2016 1 56.59332 56.59332 56.59332
856780 8 2016 1 56.59332 56.59332 56.59332
end
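Because the attempted code sorts only within idnum, filled values can come from other years: for example, binc1 is 31.219986 for persnr 959584 in 2009, a value that appears in the data only in 2014 and 2015. The following is a minimal sketch that assumes the goal is to fill each missing avgincome1 with the value observed for the same idnum and year. It groups by both variables, sorts so that missing values come last, and copies the first observation. The copy `avgincome1_orig` is a hypothetical name kept only for comparison.

```stata
* Example data from the post (dataex)
clear
input long(persnr idnum) int year float(skill avgincome1)
922006 5 2007 3 .
907384 5 2007 2 76.19995
860399 5 2007 1 76.19995
959584 5 2007 2 .
959584 5 2009 2 .
860399 5 2009 1 34.78
916402 5 2011 2 .
876267 5 2013 2 .
876267 5 2014 2 31.219986
982616 5 2015 1 31.219986
876267 5 2015 2 .
876267 5 2016 2 .
973232 5 2016 1 117.59998
973232 5 2017 1 64.87999
943983 5 2017 1 64.87999
987521 8 2016 1 56.59332
990060 8 2016 2 56.59332
987521 8 2016 1 56.59332
856780 8 2016 1 56.59332
end

* Keep the original values for comparison (hypothetical variable name)
clonevar avgincome1_orig = avgincome1

* Within each firm-year, missing values sort last, so avgincome1[1] is
* nonmissing whenever any observation in that firm-year has a value.
bysort idnum year (avgincome1): replace avgincome1 = avgincome1[1] if missing(avgincome1)

list idnum year persnr avgincome1_orig avgincome1, sepby(idnum year) noobs
```

An equivalent route is `egen` with `max()` and `by(idnum year)`, which creates a new variable instead of overwriting the old one. Either way, firm-years with no value at all (firm 5 in 2011 and 2013) stay missing, because nothing is carried across years.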
I have tried to loop over the variable year, but Stata does not allow this. I have also tried the tsset command, as described here: https://stats.idre.ucla.edu/stata/fa...time-variable/, but Stata returns the error message "repeated time values within panel" r(451).
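The r(451) error means that the panel variable and year do not uniquely identify observations. Each firm has several workers per year, and in the example persnr 987521 even appears twice in firm 8 in 2016. For filling within firm-years, neither tsset nor a loop over years should be needed, because the `by idnum year:` prefix already handles each firm-year separately. A loop over the income variables can then cover all three skill groups. This is a minimal sketch using part of the example data. The variable `dup` is a hypothetical name, and `avgincome*` is assumed to match avgincome1 to avgincome3 in the full data.

```stata
* Part of the example data from the post
clear
input long(persnr idnum) int year float(skill avgincome1)
982616 5 2015 1 31.219986
876267 5 2015 2 .
876267 5 2016 2 .
973232 5 2016 1 117.59998
987521 8 2016 1 56.59332
990060 8 2016 2 56.59332
987521 8 2016 1 56.59332
856780 8 2016 1 56.59332
end

* Diagnose r(451): how many observations share the same persnr and year?
duplicates report persnr year
* Tag the repeated persnr-year pairs (hypothetical variable name dup)
duplicates tag persnr year, generate(dup)
list persnr idnum year skill avgincome1 if dup > 0

* No loop over year is needed: -by idnum year:- processes each firm-year
* separately. Loop over the income variables instead (avgincome* matches
* avgincome1 here and, presumably, avgincome1-avgincome3 in the full data).
foreach v of varlist avgincome* {
    bysort idnum year (`v'): replace `v' = `v'[1] if missing(`v')
}
```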
I really appreciate any thoughts on this,
best,
Helen