I am working with a country-year panel data set that has employment share (emp_share) and value added (va) data for each country, but the employment share is missing for some years. The missing years differ by country. An example is provided below, where I have created two artificial countries, OKE and NOPE:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 iso3code int year float(emp_share va)
"OKE" 1991 . 459.4454
"OKE" 1992 . 6542.613
"OKE" 1993 . 199112.8
"OKE" 1994 . 1093583.8
"OKE" 1995 . 4679846
"OKE" 1996 . 6214834
"OKE" 1997 . 7284615
"OKE" 1998 . 7718513
"OKE" 1999 . 9656713
"OKE" 2000 . 15330421
"OKE" 2001 12.55305 18412974
"OKE" 2002 12.55305 18409856
"OKE" 2003 12.55305 18167794
"OKE" 2004 12.55305 23281054
"OKE" 2005 12.55305 25322364
"OKE" 2006 12.55305 25517636
"OKE" 2007 12.55305 29441796
"OKE" 2008 12.55305 40398244
"OKE" 2009 12.55305 40799392
"OKE" 2010 12.55305 50149996
"OKE" 2011 12.55305 66476740
"OKE" 2012 12.55305 68491848
"OKE" 2013 12.55305 80316104
"OKE" 2014 12.55305 100533944
"OKE" 2015 12.55305 149608384
"OKE" 2016 12.55305 174497792
"OKE" 2017 12.55305 189625440
"NOPE" 1991 23.45305 53628756
"NOPE" 1992 23.45305 58215084
"NOPE" 1993 23.45305 56443096
"NOPE" 1994 23.45305 61759068
"NOPE" 1995 23.45305 57224860
"NOPE" 1996 23.45305 67961032
"NOPE" 1997 23.45305 67752560
"NOPE" 1998 23.45305 62262480
"NOPE" 1999 23.45305 57770604
"NOPE" 2000 23.45305 61326680
"NOPE" 2001 23.45305 62262480
"NOPE" 2002 . 59642220
"NOPE" 2003 . 71121472
"NOPE" 2004 23.45305 89151392
"NOPE" 2005 23.45305 80042848
"NOPE" 2006 23.45305 78046456
"NOPE" 2007 23.45305 89900040
"NOPE" 2008 23.45305 91834040
"NOPE" 2009 23.45305 81103440
"NOPE" 2010 23.45305 91272552
"NOPE" 2011 . 112858552
"NOPE" 2012 . 112047512
"NOPE" 2013 . 134506928
"NOPE" 2014 . 125398384
"NOPE" 2015 . 112733776
"NOPE" 2016 . 102502264
"NOPE" 2017 . 108928152
end
What I would like to do is impute the employment share where it is missing, on the assumption that the employment share moves with value added.

For example, the employment share is missing in 2011 for "NOPE". Because I know value added in both years, I would like to impute the 2011 employment share as the 2010 employment share multiplied by the ratio of value added in 2011 to value added in 2010. Similarly, the employment share for 2003 is missing for "NOPE", and here I would need to impute backward: the 2004 employment share multiplied by the ratio of value added in 2003 to value added in 2004. As a final example, for country "OKE" I would need to impute the missing employment shares backward for 2000 and all earlier years, working sequentially from 2001 back to 1991.

Writing this out manually is not difficult, but I am sure there is an easier way. I have consulted the Stata manual and came across mipolate, but it does not seem to do what I want, or at least I am not sure how to apply it here.

More generally, I wonder whether there is a single command that imputes all the missing values based on the observed growth rate of a specified variable (in this case value added), or whether I have to proceed in separate steps: first impute the values at the start of a country's period (before 2001 for "OKE"), then the values in the middle of the period (2002 and 2003 for "NOPE"), and finally the values at the end of the period (2011 onwards for "NOPE").
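A minimal sketch of the sequential approach described above, not a definitive solution. It assumes va is nonmissing and positive in every year and that iso3code and year uniquely identify observations. The variables imputed and negyear are hypothetical helpers introduced only for illustration. It relies on the fact that replace works through observations in order, so each filled value is available to the next observation in the same pass.

```stata
* Sketch only: assumes va is nonmissing and positive in every year
* and that iso3code-year uniquely identifies observations.

* Hypothetical flag marking which rows get imputed
gen byte imputed = missing(emp_share)

* Backward pass: with years in descending order, [_n-1] is the next later year,
* so gaps at the start and in the middle are filled from the following observed value.
gen negyear = -year
bysort iso3code (negyear): replace emp_share = emp_share[_n-1] * va / va[_n-1] if missing(emp_share)

* Forward pass: fills whatever is still missing (trailing gaps) from the previous year.
bysort iso3code (year): replace emp_share = emp_share[_n-1] * va / va[_n-1] if missing(emp_share)

drop negyear
```

Running the backward pass first means the interior gap for "NOPE" (2002 and 2003) is filled from 2004, as in the 2003 example; swapping the two passes would fill it forward from 2001 instead, and the two choices generally give different values. Because each imputed value is the anchor share scaled by the value added ratio, the results are not guaranteed to stay within a plausible range for a share and should be checked.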
Any help on this is greatly appreciated.
Best,
Satya