I have a panel dataset covering 15 consecutive years, but one critical set of variables is not observed in every year, and I'm wondering how best to approach the problem. I think I could either 1) restrict the OLS regression and the subsequent fixed- and random-effects models to the years in which those variables are observed, or 2) attempt imputation to fill in the missing data.
See example below:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long county int year float(log_hirenew pctaanh pctpoverty percenhsgrad percentassoci percentbachdeg percentgradprofdeg)
5 2001 6.546785 29.568596 24.6 . . . .
5 2002 9.297618 29.30387 24 . . . .
5 2003 5.81413 29.20119 23.1 . . . .
5 2004 7.112328 30.26378 24.8 . . . .
5 2005 6.701961 30.17532 28.6 . . . .
5 2006 5.991465 30.14272 26.9 . . . .
5 2007 10.828143 29.88295 26.7 . . . .
5 2008 8.70963 29.86476 21.9 . . . .
5 2009 . 29.919483 24.9 . . . .
5 2010 7.647309 29.90785 . 37.5 3.9 6.8 3.4
5 2011 6.967909 29.72365 . 38.8 4.6 6.2 3.4
5 2012 6.703188 29.70028 23.1 38 4.5 6.2 3.3
5 2013 5.70711 29.81466 22.6 38.4 4.9 6.3 3.5
5 2014 7.715569 29.952784 22.1 40 5 6.1 3.2
5 2015 8.864464 30.14471 23.2 41 17 6 3
7 2001 6.2186 42.35591 21.5 . . . .
7 2002 9.251482 42.03236 20.6 . . . .
7 2003 6.624065 41.90008 20.6 . . . .
7 2004 6.925595 42.43552 22.5 . . . .
7 2005 6.706862 42.45867 25.5 . . . .
7 2006 6.045005 42.50034 25.8 . . . .
7 2007 10.70965 42.47316 25.5 . . . .
7 2008 8.696176 42.64614 21 . . . .
7 2009 . 42.79832 23.2 . . . .
7 2010 7.669028 42.61848 . 42.7 4 8.9 3.9
7 2011 6.921658 42.4865 . 41.3 3.7 9 4.6
7 2012 7.079185 42.29848 28 40.2 4.3 8 4.2
7 2013 5.638355 42.66886 26.9 43.4 3.9 7.5 3.5
7 2014 7.834788 42.41764 27.5 41.3 4.2 8.2 3.9
7 2015 8.895355 42.34971 27.5 43 24 7 4
8 2001 6.289716 21.73095 12.5 . . . .
8 2002 9.040382 21.669275 12.5 . . . .
8 2003 6.629363 21.63812 13.4 . . . .
8 2004 6.829794 22.19319 14.4 . . . .
8 2005 6.483108 22.0468 14.9 . . . .
8 2006 6.2186 21.805595 12.1 . . . .
8 2007 10.761238 21.606606 15 . . . .
8 2008 8.389359 21.40128 13.6 . . . .
8 2009 . 21.44194 14.3 . . . .
8 2010 7.574045 21.370724 14 33.8 7.5 15.1 6.5
8 2011 6.81564 21.193 14.1 33.5 7.4 15.3 7.5
8 2012 7.053586 21.32348 13.4 32.8 7.6 15.4 7.9
8 2013 5.703783 21.53184 14.3 31 6.9 16.3 8.9
8 2014 7.896924 21.88471 13.6 30.7 7.3 15.8 9.2
8 2015 8.796187 22.0216 14.3 29 25 16 10
9 2001 7.654443 45.64599 19.6 . . . .
9 2002 9.135294 45.8177 19.6 . . . .
9 2003 6.530878 45.96724 19.7 . . . .
9 2004 6.774224 47.15914 20.5 . . . .
9 2005 6.459905 47.18263 23.1 . . . .
9 2006 6.052089 47.45845 22.2 . . . .
9 2007 10.895368 47.56185 22.9 . . . .
9 2008 8.23589 47.66972 19.8 . . . .
9 2009 . 47.68865 16.9 . . . .
9 2010 10.89748 47.74874 19.3 34.9 5.6 14.1 7.9
9 2011 6.80017 47.93261 21.2 34.9 6 14 7.9
9 2012 6.976348 48.25891 19.3 34.2 6.2 13.8 8.3
9 2013 5.429346 48.67302 19.5 33.9 6.1 14.7 8
9 2014 7.783224 48.95944 21.1 33.2 5.9 15 8.5
9 2015 8.801319 49.25023 21.5 33 23 15 9
11 2001 7.618742 17.581144 20.7 . . . .
11 2002 8.985946 17.347332 19.6 . . . .
11 2003 6.335054 17.507563 19 . . . .
11 2004 6.682108 18.270735 20.6 . . . .
11 2005 6.612041 17.8624 21.8 . . . .
11 2006 5.97381 17.908842 22.2 . . . .
11 2007 10.64304 17.594501 19.9 . . . .
Panel Option? Is it possible to run an OLS regression (and the subsequent fixed- and random-effects models) on selected years only? I think this is the worst-case option, since I would essentially be losing the years I care most about.
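For reference, restricting the models to a subset of years could be sketched roughly as follows. This is only a minimal sketch using the variable names from the -dataex- example; the choice of regressors and the 2010-2015 window are assumptions for illustration, not a recommended specification.

```stata
* Declare the panel structure: county is the unit, year is the time variable
xtset county year

* Pooled OLS on the years where the education variables are observed
* (the regressor list is an illustrative assumption)
regress log_hirenew pctaanh pctpoverty percenhsgrad percentbachdeg ///
    if inrange(year, 2010, 2015), vce(cluster county)

* Fixed-effects model on the same years
xtreg log_hirenew pctaanh pctpoverty percenhsgrad percentbachdeg ///
    if inrange(year, 2010, 2015), fe
estimates store fe

* Random-effects model on the same years
xtreg log_hirenew pctaanh pctpoverty percenhsgrad percentbachdeg ///
    if inrange(year, 2010, 2015), re
estimates store re

* Hausman test comparing the two sets of estimates
hausman fe re
```

Stata drops any observation with a missing value in a model variable, so the `if` qualifier mostly makes the restriction explicit. Note that `pctpoverty` is also missing for some counties in 2010 and 2011 in the example, so those rows would be dropped as well.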
Imputed Option? I am not familiar with creating imputed data in Stata, but I do have 2000 and 2010-2015 data for these education variables, and I am wondering whether this is enough to impute the missing years. If it is, I would have mostly continuous observations for these variables. Is there a good online tutorial or example code for creating the imputed data? I've read the Stata manual on imputation and can't follow it very well.
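As a rough illustration of what the imputation route might look like, here are two hedged sketches: simple linear interpolation with `ipolate`, and multiple imputation with the `mi` suite. Both assume the 2000 observations have already been appended to the dataset (they are not in the -dataex- example). The new variable names (`hsgrad_ip`, `bachdeg_ip`), the predictors in the imputation model, the number of imputations, and the seed are all assumptions made for illustration.

```stata
* --- Sketch A: linear interpolation within each county ---
* Fills 2001-2009 on a straight line between the 2000 and 2010 values.
* hsgrad_ip and bachdeg_ip are hypothetical new variable names.
bysort county: ipolate percenhsgrad year, generate(hsgrad_ip)
bysort county: ipolate percentbachdeg year, generate(bachdeg_ip)

* --- Sketch B: multiple imputation by chained equations ---
mi set wide
mi register imputed percenhsgrad percentbachdeg

* Imputation model: county indicators, a linear year trend, and pctaanh
* (an assumed, deliberately small set of predictors)
mi impute chained (regress) percenhsgrad percentbachdeg ///
    = pctaanh i.county year, add(20) rseed(12345)

* Fit the fixed-effects model on each imputed dataset and combine results
mi xtset county year
mi estimate: xtreg log_hirenew pctaanh percenhsgrad percentbachdeg, fe
```

Because the education variables are missing for every county in 2001-2009, either approach leans heavily on the assumed trend between 2000 and 2010, so the results should be compared against the complete-years models rather than taken at face value. `pctpoverty` is left out of Sketch B only because it also has missing values in the example.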