In order to obtain repeated cross-sectional data, I have appended 3 datasets stemming from different rounds of a household survey in Kenya (survey years 2006, 2009, 2015).
Each of the 3 surveys comes with a variable that specifies the district of the interviewed household, however the number of unique districts increases with every survey year.
Now I would like to generate a new variable that gives me the district for all observations in the dataset, i.e. combine district2006, district2009 and district2015 into one variable. However, I would like its values to be generated as missing if the district of the household was not included in the 2006 survey:
District2006 ={A,B}, District2009={A,B,C}, District2015={A,B,C,D,E}
-> District060915= {A,B}
I hope the following table makes it a bit more clear:
ID | District2006 | District2009 | District2015 | *District060915* |
1-06 | A | . | . | A |
2-06 | B. | . | . | B |
1-09 | A | . | A | |
2-09 | . | B | . | B |
3-09 | . | C | . | . |
1-15 | A | A | ||
2-15 | . | B | B | |
3-15 | C | . | ||
4-15 | D | . | ||
5-15 | E | . |
Is there an efficient way to do this? Or do I have to manually compare the three columns to create the 4th? Thank you very much in advance!
Best,
Sophia Magis
0 Response to Repeated cross-section: gen new variable combining three survey years
Post a Comment