Hi All,

I am working with two rounds of survey data that interview individuals across different states of India (variable v024). I want to append the two datasets, but there are a few issues with how the state names are encoded that I need to sort out first.

For example, in the 2015-16 data:

Code:
tab v024

                      state |      Freq.     Percent        Cum.
----------------------------+-----------------------------------
andaman and nicobar islands |      2,811        0.40        0.40
             andhra pradesh |     10,428        1.49        1.89
          arunachal pradesh |     14,294        2.04        3.94
                      assam |     28,447        4.07        8.00
                      bihar |     45,812        6.55       14.55
                 chandigarh |        746        0.11       14.65
               chhattisgarh |     25,172        3.60       18.25
   --------------------------------------------------------------
Here, for example, andhra pradesh is encoded with the value 2.
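One way to see the numeric code behind each label without opening the Data Browser is to print the value label itself. A minimal sketch, assuming v024 has a value label attached in the dataset currently in memory; the local macro name `lbl` is only for illustration:

```stata
* find the name of the value label attached to v024
local lbl : value label v024
display "v024 uses value label: `lbl'"

* print every code/label pair defined in that value label
label list `lbl'

* alternatively, prefix each label with its code (e.g. "2. andhra pradesh"),
* tabulate, then strip the prefixes again
numlabel `lbl', add
tab v024
numlabel `lbl', remove
```

Running this once in each round's file gives the two code lists side by side, which is what the recoding below needs.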

In the 2005-06 data, however, both the label names and the values change:

Code:
tab v024

                 state |      Freq.     Percent        Cum.
-----------------------+-----------------------------------
[jm] jammu and kashmir |      3,281        2.64        2.64
 [hp] himachal pradesh |      3,193        2.57        5.20
           [pj] punjab |      3,681        2.96        8.16
      [uc] uttaranchal |      2,953        2.37       10.54
          [hr] haryana |      2,790        2.24       12.78
            [dl] delhi |      3,349        2.69       15.47
        [rj] rajasthan |      3,892        3.13       18.60
    [up] uttar pradesh |     12,183        9.79       28.40
            [bh] bihar |      3,818        3.07       31.47
           [sk] sikkim |      2,127        1.71       33.18
[ar] arunachal pradesh |      1,647        1.32       34.50
         [na] nagaland |      3,896        3.13       37.63
          [mn] manipur |      4,512        3.63       41.26
          [mz] mizoram |      1,791        1.44       42.70
          [tr] tripura |      1,906        1.53       44.23
        [mg] meghalaya |      2,124        1.71       45.94
            [as] assam |      3,840        3.09       49.03
      [wb] west bengal |      6,794        5.46       54.49
        [jh] jharkhand |      2,983        2.40       56.89
           [or] orissa |      4,540        3.65       60.54
     [ch] chhattisgarh |      3,810        3.06       63.60
   [mp] madhya pradesh |      6,427        5.17       68.77
          [gj] gujarat |      3,729        3.00       71.77
      [mh] maharashtra |      9,034        7.26       79.03
   [ap] andhra pradesh |      7,128        5.73       84.76
        [ka] karnataka |      6,008        4.83       89.59
              [go] goa |      3,464        2.78       92.37
           [ke] kerala |      3,566        2.87       95.24
       [tn] tamil nadu |      5,919        4.76      100.00
-----------------------+-----------------------------------
                 Total |    124,385      100.00
The same state, andhra pradesh, now has the label "[ap] andhra pradesh" and the value 28.

To fix this, I thought I could generate a new variable called state in the 2005-06 data, replace its values and define labels to match the 2015-16 coding, create a corresponding state variable in the 2015-16 data, and then append the two datasets.

Code:
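* new variable using the 2015-16 codes
gen state = .
replace state = 2 if v024 == 28    // andhra pradesh
replace state = 3 if v024 == 12    // arunachal pradesh
replace state = 4 if v024 == 18    // assam

* a value label needs a name (a name cannot start with a digit), then must be attached
label define statelbl 2 "andhra pradesh" 3 "arunachal pradesh" 4 "assam"
label values state statelbl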
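
Otherwise, appending without these changes matches the wrong states, because append stacks the numeric codes as they are: a 2005-06 observation coded 28 would be read as whatever state has code 28 in 2015-16.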
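The same crosswalk can be written as a single `recode` instead of one `replace` per state. A sketch, assuming it is run on the 2005-06 file; only the three code pairs given above are filled in, and the remaining pairs would have to be added from the two label lists. Any code not listed falls into `else` and becomes missing, which makes unmapped states easy to spot:

```stata
* 2005-06 code = 2015-16 code; extend with the remaining states
recode v024 (28 = 2) (12 = 3) (18 = 4) (else = .), generate(state)

* label the new variable with the 2015-16 names (replace allows re-running)
label define statelbl 2 "andhra pradesh" 3 "arunachal pradesh" 4 "assam", replace
label values state statelbl

* observations whose state is not yet in the crosswalk
count if missing(state)
```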

My questions are: given the large number of observations, how do I find the value behind each label (i.e., 1 = andaman and nicobar islands, 2 = andhra pradesh, 3 = arunachal pradesh, etc.) without scrolling through the Data Browser? And does the method above seem like the most efficient way to get a correct append?
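A possible alternative to typing one replacement per state is to match on the state names themselves. This is a sketch under assumptions: the file names are hypothetical, and it assumes that once the "[ap] " prefix is stripped, the 2005-06 names are spelled exactly as in the 2015-16 labels. States renamed between rounds (for example orissa or uttaranchal, if the 2015-16 file uses a different spelling) would need a manual `replace` first. Because `encode` reuses the 2015-16 value label, the 2015-16 codes such as 2 = andhra pradesh are kept:

```stata
* --- 2005-06 round (hypothetical file names throughout) ---
use "nfhs_2005.dta", clear
* turn each code into its label text, e.g. "[ap] andhra pradesh"
decode v024, generate(state_name)
* drop everything up to "]" and trim, leaving e.g. "andhra pradesh"
replace state_name = strtrim(substr(state_name, strpos(state_name, "]") + 1, .))
* assumption: renamed states may need manual fixes, for example
* replace state_name = "odisha" if state_name == "orissa"
* drop the old codes so they are not shown under 2015-16 labels after append
drop v024
save "nfhs_2005_named.dta", replace

* --- 2015-16 round ---
use "nfhs_2015.dta", clear
decode v024, generate(state_name)
local lbl : value label v024
append using "nfhs_2005_named.dta"

* re-encode using the 2015-16 value label so its codes are preserved;
* noextend makes encode stop with an error if any name has no match
encode state_name, generate(state) label(`lbl') noextend
tab state
```

If `encode` stops because of `noextend`, the names it could not match are the ones that still need harmonizing by hand.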

Thanks a lot!

Best,
Lori