Hello,

I am quite new to Stata, this is my first post, and I would really appreciate any help. For my master's thesis I am doing an empirical analysis on 5 waves of the SHARE panel data. I therefore had to merge the different modules within each wave and then append the 5 wave-level datasets (I merged before appending). For the merges I used the unique identifier of each observation (mergeid). After appending, I created a wave identifier so that I can tell which wave each observation in the combined dataset belongs to.
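
For reference, a wave identifier can also be recorded at the moment of appending rather than reconstructed afterwards. The sketch below is not the code used in this post. It runs on toy data, and the tempfiles w1 and w2, the variable x, and the variable src are hypothetical names chosen only for illustration.

```stata
* Toy stand-ins for two merged wave files (hypothetical data, not SHARE).
clear
input str4 mergeid byte x
"A-01" 1
"A-02" 2
end
tempfile w1
save `w1'

clear
input str4 mergeid byte x
"A-01" 3
"A-03" 4
end
tempfile w2
save `w2'

* Append and let Stata record which file each row came from.
* src is 0 for the data in memory and 1 for the first using file.
use `w1', clear
append using `w2', generate(src)

* Translate the source marker into a wave number.
generate byte wave = cond(src == 0, 1, 2)

sort mergeid wave
list mergeid wave x, sepby(mergeid)
```

With this approach the wave number comes from the file an observation was read from, not from its position within mergeid.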

After creating the wave identifier (the code ran without syntax errors), I realized that some variables that do not exist in the original wave 1 and wave 2 datasets have non-missing values for observations labelled as wave 1 or wave 2 in the combined dataset. Please see my code below.


* Start by setting wave to 1 on the first row of every mergeid
sort mergeid, stable
by mergeid: gen wave = 1 if _n==1

* Case 1: hhid4 and hhid5 are both non-empty
bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 4 if _n==3 & firstwave==1 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 5 if _n==4 & firstwave==1 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==5 & firstwave==1 & hhid4~="" & hhid5~=""

bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 4 if _n==2 & firstwave==2 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 5 if _n==3 & firstwave==2 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==4 & firstwave==2 & hhid4~="" & hhid5~=""

bysort mergeid: replace wave = 4 if _n==1 & firstwave==3 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 5 if _n==2 & firstwave==3 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==3 & firstwave==3 & hhid4~="" & hhid5~=""

bysort mergeid: replace wave = 4 if _n==1 & firstwave==4 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 5 if _n==2 & firstwave==4 & hhid4~="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==3 & firstwave==4 & hhid4~="" & hhid5~=""

* Case 2: hhid4 is empty, hhid5 is non-empty
bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 5 if _n==3 & firstwave==1 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==4 & firstwave==1 & hhid4=="" & hhid5~=""

bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 5 if _n==2 & firstwave==2 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==3 & firstwave==2 & hhid4=="" & hhid5~=""

bysort mergeid: replace wave = 5 if _n==1 & firstwave==3 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==2 & firstwave==3 & hhid4=="" & hhid5~=""

bysort mergeid: replace wave = 5 if _n==1 & firstwave==4 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==2 & firstwave==4 & hhid4=="" & hhid5~=""

bysort mergeid: replace wave = 5 if _n==1 & firstwave==5 & hhid4=="" & hhid5~=""
bysort mergeid: replace wave = 6 if _n==2 & firstwave==5 & hhid4=="" & hhid5~=""

* Case 3: hhid4 is non-empty, hhid5 is empty
bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 4 if _n==3 & firstwave==1 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 6 if _n==4 & firstwave==1 & hhid4~="" & hhid5==""

bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 4 if _n==2 & firstwave==2 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 6 if _n==3 & firstwave==2 & hhid4~="" & hhid5==""

bysort mergeid: replace wave = 4 if _n==1 & firstwave==3 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 6 if _n==2 & firstwave==3 & hhid4~="" & hhid5==""

bysort mergeid: replace wave = 4 if _n==1 & firstwave==4 & hhid4~="" & hhid5==""
bysort mergeid: replace wave = 6 if _n==2 & firstwave==4 & hhid4~="" & hhid5==""

* Case 4: hhid4 and hhid5 are both empty
bysort mergeid: replace wave = 1 if _n==1 & firstwave==1 & hhid4=="" & hhid5==""
bysort mergeid: replace wave = 2 if _n==2 & firstwave==1 & hhid4=="" & hhid5==""
bysort mergeid: replace wave = 6 if _n==3 & firstwave==1 & hhid4=="" & hhid5==""

bysort mergeid: replace wave = 2 if _n==1 & firstwave==2 & hhid4=="" & hhid5==""
bysort mergeid: replace wave = 6 if _n==2 & firstwave==2 & hhid4=="" & hhid5==""

bysort mergeid: replace wave = 6 if _n==1 & firstwave==3 & hhid4=="" & hhid5==""

bysort mergeid: replace wave = 6 if _n==1 & firstwave==6 & hhid4=="" & hhid5==""

Here, mergeid is the unique identifier for each observation, hhid (hhid4 and hhid5 in the code) is the household identifier for each wave, and firstwave is the first wave in which the respondent appeared.
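
The symptom described earlier can be checked with something like the following. This is a minimal sketch on toy data, where newvar is a hypothetical name standing in for any variable that was first collected in wave 4.

```stata
* Toy combined dataset (hypothetical values, not SHARE).
clear
input str4 mergeid byte wave float newvar
"A-01" 1 .
"A-01" 2 .
"A-01" 4 2.5
"A-02" 2 .
"A-02" 4 1.0
end

* Each mergeid should appear at most once per wave.
isid mergeid wave

* A variable first collected in wave 4 should be missing in waves 1 and 2.
count if inlist(wave, 1, 2) & !missing(newvar)

* Show which waves actually carry values for the variable.
tabulate wave if !missing(newvar)
```

In the toy data the count is zero. In the situation described in this post, the same count on the combined dataset would be greater than zero, which is what suggests that wave is wrong for some rows.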

What could be wrong with my logic or code? Thank you very much.