I am working with two rounds of survey data that interview individuals across different states (variable v024) in India. I want to append the datasets, but there are a few issues with the encoded state names that I need to sort out.
For example, in the data from 2015-16:
Code:
tab v024
state | Freq. Percent Cum.
----------------------------+-----------------------------------
andaman and nicobar islands | 2,811 0.40 0.40
andhra pradesh | 10,428 1.49 1.89
arunachal pradesh | 14,294 2.04 3.94
assam | 28,447 4.07 8.00
bihar | 45,812 6.55 14.55
chandigarh | 746 0.11 14.65
chhattisgarh | 25,172 3.60 18.25
--------------------------------------------------------------
In the data from 2005-06, however, the label names and values change:
Code:
tab v024
state | Freq. Percent Cum.
-----------------------+-----------------------------------
[jm] jammu and kashmir | 3,281 2.64 2.64
[hp] himachal pradesh | 3,193 2.57 5.20
[pj] punjab | 3,681 2.96 8.16
[uc] uttaranchal | 2,953 2.37 10.54
[hr] haryana | 2,790 2.24 12.78
[dl] delhi | 3,349 2.69 15.47
[rj] rajasthan | 3,892 3.13 18.60
[up] uttar pradesh | 12,183 9.79 28.40
[bh] bihar | 3,818 3.07 31.47
[sk] sikkim | 2,127 1.71 33.18
[ar] arunachal pradesh | 1,647 1.32 34.50
[na] nagaland | 3,896 3.13 37.63
[mn] manipur | 4,512 3.63 41.26
[mz] mizoram | 1,791 1.44 42.70
[tr] tripura | 1,906 1.53 44.23
[mg] meghalaya | 2,124 1.71 45.94
[as] assam | 3,840 3.09 49.03
[wb] west bengal | 6,794 5.46 54.49
[jh] jharkhand | 2,983 2.40 56.89
[or] orissa | 4,540 3.65 60.54
[ch] chhattisgarh | 3,810 3.06 63.60
[mp] madhya pradesh | 6,427 5.17 68.77
[gj] gujarat | 3,729 3.00 71.77
[mh] maharashtra | 9,034 7.26 79.03
[ap] andhra pradesh | 7,128 5.73 84.76
[ka] karnataka | 6,008 4.83 89.59
[go] goa | 3,464 2.78 92.37
[ke] kerala | 3,566 2.87 95.24
[tn] tamil nadu | 5,919 4.76 100.00
-----------------------+-----------------------------------
Total | 124,385 100.00
To fix this, I thought I could generate a new variable called state in the 2005-06 data, replace its values and define labels to match 2015-16, and then append the two datasets after also creating a variable called state in the 2015-16 data.
Code:
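gen state = .
replace state = 2 if v024 == 28
replace state = 3 if v024 == 12
replace state = 4 if v024 == 18
* label define needs a label name, and label values attaches it to the variable
label define state 2 "andhra pradesh" 3 "arunachal pradesh" 4 "assam"
label values state state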
My question now is: given the rather large number of observations, how do I find the value behind each label without having to scroll through the data browser, i.e. 1 = andaman and nicobar islands, 2 = andhra pradesh, 3 = arunachal pradesh, etc.? Also, does the method above seem like the most efficient way to accomplish a correct append?
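To make the first question concrete, here is a minimal sketch of the kind of lookup I have in mind. It uses a toy three-row stand-in for the 2015-16 file, so the label name v024 and the codes 1 to 3 are assumptions for illustration, not the real contents of the survey data. As far as I can tell, label list, tabulate with the nolabel option, and numlabel are the standard commands for showing the value behind each label.

```stata
* Toy stand-in for the 2015-16 file; codes and labels are illustrative only
clear
input byte v024
1
2
3
end
label define v024 1 "andaman and nicobar islands" 2 "andhra pradesh" 3 "arunachal pradesh"
label values v024 v024

* Look up the name of the value label attached to v024
local lbl : value label v024
display "`lbl'"

* Print every value-to-label pair in that label
label list `lbl'

* Tabulate the underlying numeric codes instead of the labels
tabulate v024, nolabel

* Or prefix each label with its code so both show up in later tabulations
numlabel `lbl', add
tabulate v024
```

On the real data, only the commands from the local macro line onward would be needed, run once in each round.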
Thanks a lot!
Best,
Lori
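On the second question, one possible alternative to recoding every state by hand is to go through the label text instead: decode v024 to a string in each round, strip the bracketed prefix from the 2005-06 names, append, and encode once on the combined data. The sketch below is only that, a sketch under assumptions. The two toy datasets, the label names v024_05 and v024_15, and the variables state_name and round are made up for illustration; the codes 28, 12, 18 and 2, 3, 4 are taken from the recode in the post.

```stata
* Toy stand-in for the 2005-06 round (hypothetical label name v024_05)
clear
input byte v024
28
12
18
end
label define v024_05 28 "[ap] andhra pradesh" 12 "[ar] arunachal pradesh" 18 "[as] assam"
label values v024 v024_05

* Turn the labels into strings and strip the bracketed prefix such as "[ap] "
decode v024, generate(state_name)
replace state_name = trim(substr(state_name, strpos(state_name, "]") + 1, .)) if strpos(state_name, "]") > 0
generate int round = 2005
drop v024
tempfile r2005
save `r2005'

* Toy stand-in for the 2015-16 round (hypothetical label name v024_15)
clear
input byte v024
2
3
4
end
label define v024_15 2 "andhra pradesh" 3 "arunachal pradesh" 4 "assam"
label values v024 v024_15
decode v024, generate(state_name)
generate int round = 2015
drop v024

* Stack the two rounds; state_name is now comparable across them
append using `r2005'

* Any state spelled differently across rounds would need a manual
* replace on state_name at this point, before encoding

* One consistent numeric coding for both rounds, assigned alphabetically
encode state_name, generate(state)
tabulate state round
```

The design choice here is that encode assigns the codes after appending, so both rounds share one set of values and labels by construction. The cost is that the new codes will not match the original 2015-16 numbering, which may or may not matter.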