I am working with the Ethiopian Medium and Large Manufacturing Census for the years 1998-2009, carried out by the Central Statistical Agency of Ethiopia (the metadata for 2009 are available at http://catalog.ihsn.org/index.php/ca...ata_dictionary).
My problem is that in 2005 a sample survey was conducted instead of the full census, and this seems to affect the results of my analysis.
I suspect that the survey was not based on random sampling, because the shares of private and public firms in 2005 differ markedly from those in the other years (in particular, the share of public firms is 0.36 in 2005, against 0.14 in 2004 and 0.12 in 2006). The summary statistics for my main variables of interest (wages and number of workers) also look biased in 2005.
Is there any way to impute the values for 2005 instead of using the survey? I thought about taking the average of each variable's values in 2004 and 2006, even though I am aware that this is not a very precise approach. Any other advice?
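To make the averaging idea concrete, here is a minimal sketch in Python. It works on the aggregate values from my summary table rather than on the firm-level data, and the variable names (`neighbours`, `midpoint`, `imputed_2005`) are just illustrative. Doing the same thing firm by firm would need a firm identifier that links the 2004 and 2006 records, which I am assuming here and have not checked.

```python
# Values for 2004 and 2006 taken from the summary table in this post.
# Keys are illustrative names, not variable names from the census files.
neighbours = {
    "share_private": (0.86, 0.88),
    "median_employment": (23, 24),
    "gender_pay_gap": (0.13, 0.05),
}


def midpoint(before, after):
    """Linear interpolation at the midpoint between two adjacent years."""
    return (before + after) / 2


# Replace the 2005 survey value with the 2004/2006 average.
imputed_2005 = {
    name: midpoint(v2004, v2006)
    for name, (v2004, v2006) in neighbours.items()
}

for name, value in imputed_2005.items():
    print(f"{name}: {value:.3f}")
```

This only smooths over 2005. It cannot recover anything that actually happened in that year, which is why I am asking whether there is a better approach.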
I am posting my table of summary statistics, with the values that look a bit weird marked in red, so that you can see the problem:
Evolution of Ethiopian manufacturing sector, average values
Variable | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009
Number of firms | 725 | 739 | 731 | 765 | 883 | 939 | 997 | 991 | 1153 | 1339 | 1734 | 1948
Share of Private | 0.81 | 0.81 | 0.82 | 0.83 | 0.85 | 0.86 | 0.86 | 0.64 | 0.88 | 0.91 | 0.43 | 0.95
Share of Public | 0.19 | 0.19 | 0.18 | 0.17 | 0.15 | 0.14 | 0.14 | 0.36 | 0.12 | 0.09 | 0.57 | 0.05
Median employment | 20 | 21 | 21 | 23 | 18 | 20 | 23 | 28 | 24 | 20 | 17 | 16
Share of firms located in the capital | 0.66 | 0.63 | 0.63 | 0.60 | 0.61 | 0.58 | 0.55 | 0.46 | 0.53 | 0.50 | 0.44 | 0.39
Exported value added | 0.0206 | 0.0217 | 0.0233 | 0.0238 | 0.0194 | 0.0227 | 0.0208 | 0.0262 | 0.0205 | 0.0195 | 0.0151 | 0.0166
Capital intensity (capital/worker), '000 Birr | 26.46 | 23.94 | 38.48 | 121.03 | 68.09 | 69.16 | 79.73 | 102.11 | 89.90 | 84.89 | 114.28 | 122.19
Gender pay gap (Wm-Wf)/Wm | 0.16 | 0.13 | 0.16 | 0.13 | 0.15 | 0.17 | 0.13 | -0.25 | 0.05 | 0.11 | 0.12 | 0.02
Gender gap in workers comp (Nm-Nf)/Nm | 0.50 | 0.45 | 0.45 | 0.43 | 0.43 | 0.45 | 0.48 | 0.37 | 0.40 | 0.42 | 0.41 | 0.30
Technology level of the industry, ISIC classification, share
1 | 0.50 | 0.50 | 0.51 | 0.52 | 0.51 | 0.49 | 0.50 | 0.41 | 0.47 | 0.44 | 0.40 | 0.39
2 | 0.20 | 0.19 | 0.20 | 0.20 | 0.21 | 0.22 | 0.23 | 0.18 | 0.25 | 0.29 | 0.36 | 0.38
3 | 0.26 | 0.26 | 0.24 | 0.23 | 0.23 | 0.24 | 0.23 | 0.13 | 0.24 | 0.24 | 0.21 | 0.20
4 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00
. | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 | 0.03 | 0.27 | 0.04 | 0.03 | 0.03 | 0.03
Share of firms in each industry
Food and Beverage | 0.28 | 0.28 | 0.29 | 0.31 | 0.30 | 0.29 | 0.29 | 0.21 | 0.29 | 0.26 | 0.25 | 0.25
Textile and Garments | 0.07 | 0.07 | 0.07 | 0.07 | 0.06 | 0.06 | 0.07 | 0.06 | 0.06 | 0.05 | 0.03 | 0.04
Leather and Footwear | 0.08 | 0.07 | 0.06 | 0.07 | 0.06 | 0.06 | 0.06 | 0.06 | 0.05 | 0.05 | 0.04 | 0.04
Wood and Furniture | 0.03 | 0.03 | 0.03 | 0.02 | 0.03 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.03 | 0.02
Printing and Paper | 0.07 | 0.08 | 0.09 | 0.08 | 0.08 | 0.08 | 0.07 | 0.08 | 0.07 | 0.07 | 0.06 | 0.05
Chemical and Plastic | 0.09 | 0.09 | 0.08 | 0.08 | 0.08 | 0.08 | 0.08 | 0.09 | 0.10 | 0.09 | 0.08 | 0.07
Non Metal | 0.11 | 0.11 | 0.10 | 0.11 | 0.11 | 0.12 | 0.12 | 0.07 | 0.12 | 0.20 | 0.26 | 0.29
Metal and Machinery | 0.27 | 0.27 | 0.27 | 0.26 | 0.27 | 0.28 | 0.28 | 0.17 | 0.29 | 0.24 | 0.23 | 0.23
Explained share | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.77 | 1.00 | 1.00 | 1.00 | 1.00
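To show what I mean by "weird" in a less subjective way, here is a rough Python sketch that flags years whose value lies far from the median of the series. The rule (more than 3 scaled median absolute deviations from the median) and the function name `flag_outlier_years` are my own choices for illustration, not something taken from the census documentation, and with only 12 yearly values this is a descriptive check rather than a formal test.

```python
from statistics import median

# Share of public firms by year, copied from the table above.
years = list(range(1998, 2010))
share_public = [0.19, 0.19, 0.18, 0.17, 0.15, 0.14,
                0.14, 0.36, 0.12, 0.09, 0.57, 0.05]


def flag_outlier_years(years, values, cutoff=3.0):
    """Return the years whose value is more than `cutoff` scaled MADs
    away from the median of the whole series."""
    centre = median(values)
    # Median absolute deviation; 1.4826 makes it comparable to a
    # standard deviation under normality.
    mad = median(abs(v - centre) for v in values)
    scale = 1.4826 * mad
    return [y for y, v in zip(years, values)
            if abs(v - centre) > cutoff * scale]


flagged = flag_outlier_years(years, share_public)
print(flagged)
```

With the values in the table this flags 2005 and 2008 for the share of public firms, which matches what I see by eye.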
Thank you in advance!
PS: I know, the summary statistics for 2008 do not look great either...