Imputing Data Procedure

Hello Statalisters,

I have a panel dataset covering 15 consecutive years (2001-2015); however, one critical set of variables is not observed in every year, and I'm wondering how best to approach the problem. I think I could either 1) use only certain years for the OLS regression and the subsequent fixed- and random-effects models, or 2) attempt imputation to fill in the missing data.

See the example below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long county int year float(log_hirenew pctaanh pctpoverty percenhsgrad percentassoci percentbachdeg percentgradprofdeg)
 5 2001  6.546785 29.568596 24.6    .   .    .   .
 5 2002  9.297618  29.30387   24    .   .    .   .
 5 2003   5.81413  29.20119 23.1    .   .    .   .
 5 2004  7.112328  30.26378 24.8    .   .    .   .
 5 2005  6.701961  30.17532 28.6    .   .    .   .
 5 2006  5.991465  30.14272 26.9    .   .    .   .
 5 2007 10.828143  29.88295 26.7    .   .    .   .
 5 2008   8.70963  29.86476 21.9    .   .    .   .
 5 2009         . 29.919483 24.9    .   .    .   .
 5 2010  7.647309  29.90785    . 37.5 3.9  6.8 3.4
 5 2011  6.967909  29.72365    . 38.8 4.6  6.2 3.4
 5 2012  6.703188  29.70028 23.1   38 4.5  6.2 3.3
 5 2013   5.70711  29.81466 22.6 38.4 4.9  6.3 3.5
 5 2014  7.715569 29.952784 22.1   40   5  6.1 3.2
 5 2015  8.864464  30.14471 23.2   41  17    6   3
 7 2001    6.2186  42.35591 21.5    .   .    .   .
 7 2002  9.251482  42.03236 20.6    .   .    .   .
 7 2003  6.624065  41.90008 20.6    .   .    .   .
 7 2004  6.925595  42.43552 22.5    .   .    .   .
 7 2005  6.706862  42.45867 25.5    .   .    .   .
 7 2006  6.045005  42.50034 25.8    .   .    .   .
 7 2007  10.70965  42.47316 25.5    .   .    .   .
 7 2008  8.696176  42.64614   21    .   .    .   .
 7 2009         .  42.79832 23.2    .   .    .   .
 7 2010  7.669028  42.61848    . 42.7   4  8.9 3.9
 7 2011  6.921658   42.4865    . 41.3 3.7    9 4.6
 7 2012  7.079185  42.29848   28 40.2 4.3    8 4.2
 7 2013  5.638355  42.66886 26.9 43.4 3.9  7.5 3.5
 7 2014  7.834788  42.41764 27.5 41.3 4.2  8.2 3.9
 7 2015  8.895355  42.34971 27.5   43  24    7   4
 8 2001  6.289716  21.73095 12.5    .   .    .   .
 8 2002  9.040382 21.669275 12.5    .   .    .   .
 8 2003  6.629363  21.63812 13.4    .   .    .   .
 8 2004  6.829794  22.19319 14.4    .   .    .   .
 8 2005  6.483108   22.0468 14.9    .   .    .   .
 8 2006    6.2186 21.805595 12.1    .   .    .   .
 8 2007 10.761238 21.606606   15    .   .    .   .
 8 2008  8.389359  21.40128 13.6    .   .    .   .
 8 2009         .  21.44194 14.3    .   .    .   .
 8 2010  7.574045 21.370724   14 33.8 7.5 15.1 6.5
 8 2011   6.81564    21.193 14.1 33.5 7.4 15.3 7.5
 8 2012  7.053586  21.32348 13.4 32.8 7.6 15.4 7.9
 8 2013  5.703783  21.53184 14.3   31 6.9 16.3 8.9
 8 2014  7.896924  21.88471 13.6 30.7 7.3 15.8 9.2
 8 2015  8.796187   22.0216 14.3   29  25   16  10
 9 2001  7.654443  45.64599 19.6    .   .    .   .
 9 2002  9.135294   45.8177 19.6    .   .    .   .
 9 2003  6.530878  45.96724 19.7    .   .    .   .
 9 2004  6.774224  47.15914 20.5    .   .    .   .
 9 2005  6.459905  47.18263 23.1    .   .    .   .
 9 2006  6.052089  47.45845 22.2    .   .    .   .
 9 2007 10.895368  47.56185 22.9    .   .    .   .
 9 2008   8.23589  47.66972 19.8    .   .    .   .
 9 2009         .  47.68865 16.9    .   .    .   .
 9 2010  10.89748  47.74874 19.3 34.9 5.6 14.1 7.9
 9 2011   6.80017  47.93261 21.2 34.9   6   14 7.9
 9 2012  6.976348  48.25891 19.3 34.2 6.2 13.8 8.3
 9 2013  5.429346  48.67302 19.5 33.9 6.1 14.7   8
 9 2014  7.783224  48.95944 21.1 33.2 5.9   15 8.5
 9 2015  8.801319  49.25023 21.5   33  23   15   9
11 2001  7.618742 17.581144 20.7    .   .    .   .
11 2002  8.985946 17.347332 19.6    .   .    .   .
11 2003  6.335054 17.507563   19    .   .    .   .
11 2004  6.682108 18.270735 20.6    .   .    .   .
11 2005  6.612041   17.8624 21.8    .   .    .   .
11 2006   5.97381 17.908842 22.2    .   .    .   .
11 2007  10.64304 17.594501 19.9    .   .    .   .
end

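Before choosing between the two options, it can help to tabulate exactly which variable-years are missing. A minimal sketch, assuming the dataex example above has been loaded into Stata and that the lines are run together from a do-file (so the local macro and the `///` continuation work):

```stata
* Sketch: summarize the missing-data pattern in the posted example.
* ASSUMPTION: the dataex example above is already loaded in memory.
local vars log_hirenew pctaanh pctpoverty percenhsgrad percentassoci ///
    percentbachdeg percentgradprofdeg

* Count of missing values for each variable that has any
misstable summarize `vars'

* Which combinations of variables are missing together
misstable patterns `vars'

* Number of non-missing values of each variable, year by year
tabstat `vars', by(year) statistics(count)
```

In the posted excerpt this would show the four education variables missing together in every year from 2001 to 2009, `log_hirenew` missing in 2009, and `pctpoverty` missing for some counties in 2010-2011.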
In my model, I assume that new jobs (hires) are a function of education, poverty, and the concentration of African Americans in each county and year of the dataset. I have gone back and forth with the Census Bureau trying to obtain the 2001-2009 educational attainment data, but have hit a dead end. FYI: if anyone knows where to find Louisiana ACS 5-year estimates for table S1501 (Educational Attainment) for 2001-2009, I would love to know where you found them!

I have two questions:
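Written out, the model described above might look like the linear panel specification below. This is a sketch under assumptions: it uses the posted variable names, reads `log_hirenew` as the log of new hires and `pctaanh` as the percentage of African American residents, and adds a county effect $\alpha_i$ that is treated as fixed in the FE model, as random (and uncorrelated with the regressors) in the RE model, and is set to zero in pooled OLS, where $\beta_0$ is the only intercept.

```latex
\begin{aligned}
\mathit{log\_hirenew}_{it} ={}& \beta_0 + \beta_1\,\mathit{pctaanh}_{it} + \beta_2\,\mathit{pctpoverty}_{it} \\
&+ \sum_{k=1}^{4} \gamma_k\,\mathit{educ}_{k,it} + \alpha_i + \varepsilon_{it}
\end{aligned}
```

Here $i$ indexes counties, $t$ indexes years, and $\mathit{educ}_{1},\dots,\mathit{educ}_{4}$ stand for `percenhsgrad`, `percentassoci`, `percentbachdeg`, and `percentgradprofdeg`. The missing-data problem is that $\mathit{educ}_{k,it}$ is unobserved for $t \le 2009$.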

Panel option? Is it possible to run OLS (and then fixed- and random-effects models) on only certain years? I think this is the worst-case option, since I would essentially lose the years I care most about.
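Stata estimation commands accept an `if` qualifier, so pooled OLS, fixed effects, and random effects can all be restricted to selected years. A minimal sketch, assuming the posted variable names are in memory and the block is run at once from a do-file; the 2010-2015 window and the clustered standard errors for pooled OLS are illustrative choices. Note that Stata already drops any observation with a missing regressor (listwise deletion), so in the posted excerpt the estimation sample would be the same without the `if`; the qualifier only makes the restriction explicit.

```stata
* Sketch: pooled OLS, FE and RE on the years with education data only.
* ASSUMPTION: the dataex example above is in memory; the 2010-2015 window
* and vce(cluster county) are illustrative choices.
xtset county year

local x pctaanh pctpoverty percenhsgrad percentassoci ///
    percentbachdeg percentgradprofdeg

* Pooled OLS, standard errors clustered by county
regress log_hirenew `x' if inrange(year, 2010, 2015), vce(cluster county)

* Fixed effects (within estimator)
xtreg log_hirenew `x' if inrange(year, 2010, 2015), fe
estimates store fe

* Random effects (GLS)
xtreg log_hirenew `x' if inrange(year, 2010, 2015), re
estimates store re

* Hausman test of FE vs RE (the always-consistent estimator goes first)
hausman fe re
```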

Imputation option? I am not familiar with creating imputed data in Stata, but I do have 2000 and 2010-2015 data for these education variables. Is that enough to impute the missing 2001-2009 values? If so, most of my series would be continuous (without gaps). Is there a good online tutorial or example code for creating the imputed data? I've read the Stata manual on imputation and can't follow it very well.
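With observed values in 2000 and from 2010 onward, one simple way to fill the 2001-2009 gap is linear interpolation within each county using `ipolate`. This is a deterministic fill, not multiple imputation: the filled values carry no uncertainty, so standard errors from models that use them tend to be understated. Without the 2000 rows, `ipolate` would leave 2001-2009 missing, because those years lie before the first observed year and `ipolate` only extrapolates when the `epolate` option is given. A minimal sketch, assuming the 2000 education figures sit in a file named `educ2000.dta` (a hypothetical name) with one row per county, `year` equal to 2000, and the same variable names as the panel:

```stata
* Sketch: linear interpolation of the education shares between 2000 and 2010.
* ASSUMPTION: educ2000.dta is a hypothetical file with variables county, year
* (all equal to 2000) and the four education variables below.
append using educ2000.dta
sort county year

foreach v of varlist percenhsgrad percentassoci percentbachdeg percentgradprofdeg {
    * Fill gaps within each county; without the epolate option, ipolate
    * does not extend values before the first or after the last observed year
    by county: ipolate `v' year, generate(`v'_ip)

    * Flag values that were interpolated rather than observed
    generate byte `v'_flag = missing(`v') & !missing(`v'_ip)
}

* Drop the helper 2000 rows so the panel again runs 2001-2015
drop if year == 2000
```

Keeping the `_flag` variables makes it easy to rerun the models with and without the interpolated years as a sensitivity check.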

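Imputation in the statistical sense is available through Stata's `mi` commands, which create several completed datasets and combine the estimates across them (Rubin's rules), so the uncertainty about the filled-in values is carried into the standard errors. A minimal sketch, not the poster's method, assuming the posted variable names are in memory and, optionally, the same hypothetical `educ2000.dta` file of year-2000 values. The imputation model (county indicators plus a linear year trend), `add(20)`, and `rseed(12345)` are illustrative assumptions; with no observed education values between 2000 and 2010, the 2001-2009 imputations rest heavily on that assumed trend.

```stata
* Sketch: multiple imputation of the education shares with -mi-.
* ASSUMPTIONS: the dataex example is in memory; educ2000.dta is a hypothetical
* file of year-2000 education values; m = 20 and the seed are illustrative.
append using educ2000.dta          // omit if the 2000 values are unavailable

mi set mlong
mi register imputed percenhsgrad percentassoci percentbachdeg percentgradprofdeg

* Chained equations: each education share is regressed on the others,
* county indicators and a linear year trend (i.county c.year is an assumption)
mi impute chained (regress) percenhsgrad percentassoci percentbachdeg ///
    percentgradprofdeg = i.county c.year, add(20) rseed(12345)

* Fit the FE model in each completed dataset and combine the results;
* appended year-2000 rows have no log_hirenew, so they drop out here
mi xtset county year
mi estimate: xtreg log_hirenew pctaanh pctpoverty percenhsgrad ///
    percentassoci percentbachdeg percentgradprofdeg, fe
```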
BJ Data Tech Solution