Hello,

I am appending several rounds/datasets from different Demographic and Health Surveys. The variables are identical across samples, but the value labels are likely to differ. For example, one dataset may have 10 regions while another has two different ones. To append these and synchronize the value labels, I was advised to convert the labelled variables to string and then encode them back to numeric after the append.

But I am running into problems with variables whose value labels cover only some of their values. Here is the label list for one such variable:

Code:
v852a -- how long ago first had sex with most recent partner

         101 days: 1
         199 days: number missing
         201 weeks: 1
         299 weeks: number missing
         301 months: 1
         399 months: number missing
         401 years: 1
         499 years: number missing

The variable itself is an integer, and most of its values have no label: 102 through 198 for days, 202 through 298 for weeks, and so on.

Because this variable appears in different datasets with different value labels, I decided to convert it to string, following advice given here last month. I decoded it with the following command:

Code:
decode v852a, gen(v852a_string)
However, the decoded variable dropped all the unlabelled values (i.e., 102 through 198, 202 through 298, etc.), leaving me with only partial data, as shown in this frequency table:



Code:
v852a_string -- how long ago first had sex with most recent partner
---------------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
------------------+--------------------------------------------
Valid   days: 1   |        211       1.79      18.74      18.74
        months: 1 |        224       1.90      19.89      38.63
        weeks: 1  |         77       0.65       6.84      45.47
        years: 1  |        614       5.21      54.53     100.00
        Total     |       1126       9.56     100.00           
Missing           |      10658      90.44                      
Total             |      11784     100.00                      
---------------------------------------------------------------


How can I decode the variable while keeping the unlabelled numeric values, so that it can be encoded again after the append?
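A minimal sketch of one possible workaround, assuming the goal is simply to keep each unlabelled value as its number written as text: `decode` returns an empty string for any value that has no label, so those rows can be filled afterwards with `string()`. The toy values and the label name `v852a_lbl` below are hypothetical, made up only to make the example self-contained; on the real data only the `decode` and `replace` lines would be needed.

```stata
* Hypothetical toy data standing in for one survey round
clear
input int v852a
101
102
199
203
401
450
end

* Hypothetical partial label set: only some values carry labels
label define v852a_lbl 101 "days: 1" 199 "days: number missing" 201 "weeks: 1" 401 "years: 1"
label values v852a v852a_lbl

* decode gives an empty string for every unlabelled value ...
decode v852a, gen(v852a_string)

* ... so fill those rows with the numeric value converted to text
replace v852a_string = string(v852a) if missing(v852a_string) & !missing(v852a)

list v852a v852a_string
```

Note that a later `encode` assigns new integer codes 1, 2, ... in alphabetical order of the strings, so a value such as 102 would not necessarily come back as 102 after encoding.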

Thanks, Yawo