I am interested in doing regression analysis with panel data. The data I have chosen come in two waves: the 1st wave and the 9th wave. I have appended the 9th-wave dataset to the 1st-wave dataset and used the keep command, as best I could, to 'clean' the data so that only the variables I need remain. For panel data tests I need a variable like "year" (or in this case "wave") that distinguishes the two recordings of a variable for the same person across the two waves. Here are my do-file commands so far:
Code:
/* WAVE 9 DATASET (second file)
   Please execute these for ci_indresp_w.dta in a separate Stata window,
   then save the dataset and exit:

   use "C:\Users\User\OneDrive\Desktop\ci_indresp_w.dta"
   generate wave = 9, after(pidp)
*/

// WAVE 1 DATASET
use "C:\Users\User\OneDrive\Desktop\ca_indresp_w.dta"
generate wave = 1, after(pidp)

// Combine both waves of the dataset
append using "C:\Users\User\OneDrive\Desktop\ci_indresp_w.dta"
sort pidp
keep pidp wave ca_netpay_amount ca_netpay_period ci_netpay_amount ci_netpay_period ///
    ca_hours ci_hours ca_sex ci_sex ca_age ci_age ca_couple ci_couple ///
    ca_hhcompa ci_hhcompa ca_hhcompb ci_hhcompb ca_hhcompc ci_hhcompc ///
    ca_hhcompd ci_hhcompd ca_hhcompe ci_hhcompe

// Collapse each wave-specific pair into one column:
// the ca_* variables are missing (.) in the appended wave-9 rows,
// so those rows are filled from the matching ci_* variable
gen netpay = ca_netpay_amount, after(pidp)
replace netpay = ci_netpay_amount if netpay == .
gen netpayperiod = ca_netpay_period, after(pidp)
replace netpayperiod = ci_netpay_period if netpayperiod == .
gen hours = ca_hours, after(pidp)
replace hours = ci_hours if hours == .
gen sex = ca_sex, after(pidp)
replace sex = ci_sex if sex == .
gen couple = ca_couple, after(pidp)
replace couple = ci_couple if couple == .
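A common alternative, sketched here under the assumption that both files use the same variable stems after the `ca_`/`ci_` prefix (as the `keep` list suggests), is to strip the wave prefix in each file before appending. Every variable then lines up in one column and the `gen`/`replace` step is no longer needed. The file paths are the ones from the post; the shortened variable list and the tempfile name `w9` are illustrative choices, not part of the original workflow. Because `w9` is a local macro, the block has to be run as a whole do-file rather than line by line.

```stata
* Sketch (assumption): strip the wave prefix so variables match across waves.
* Paths are from the post; the variable list is a shortened example.
clear all

* Wave 9: load only the needed variables, drop the ci_ prefix, tag the wave
use pidp ci_netpay_amount ci_netpay_period ci_hours ci_sex ci_couple ///
    using "C:\Users\User\OneDrive\Desktop\ci_indresp_w.dta", clear
rename ci_* *
generate wave = 9, after(pidp)
tempfile w9
save `w9'

* Wave 1: same steps with the ca_ prefix
use pidp ca_netpay_amount ca_netpay_period ca_hours ca_sex ca_couple ///
    using "C:\Users\User\OneDrive\Desktop\ca_indresp_w.dta", clear
rename ca_* *
generate wave = 1, after(pidp)

* Stack the waves; netpay_amount, hours, etc. now share one column each
append using `w9'
sort pidp wave
```

With this layout, adding a further wave later only means repeating the load/rename/save step for that file.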
Here is an excerpt of the data as it stands now:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long pidp float(couple sex hours netpayperiod netpay wave)
76165 1 2 25 3 3200 1
76165 1 2 38 3 3500 9
280165 1 2 0 3 1700 1
469205 2 2 16 3 750 9
469205 2 2 0 3 650 1
599765 1 2 37 3 2591 1
599765 1 2 35 3 2617 9
732365 2 1 -8 -8 -8 9
732365 2 1 -8 -8 -8 1
1587125 2 2 37 1 600 1
1587125 2 2 37 3 2200 9
3424485 2 2 -8 -8 -8 1
3424485 2 2 -8 -8 -8 9
4849085 1 1 38 3 3215 9
4849085 1 1 46 3 3200 1
68002725 2 2 -8 -8 -8 9
68008847 2 2 39 3 1389 9
68008847 2 2 39 3 1202 1
68010887 1 2 37 3 1300 1
68031967 2 2 -8 -8 -8 1
68035365 2 1 -8 -8 -8 9
68035365 2 1 -8 -8 -8 1
end
This is only a snapshot; the full dataset contains 30579 observations (including duplicate IDs).
I am trying to delete the IDs that appear only once, because they are not present in both datasets (in this excerpt: 280165, 68002725, 68010887 and 68031967). Is there a way to do this, or is it futile? Or will these unpaired IDs not affect the tests later anyway?
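One way to drop them, sketched below and assuming the appended data are in memory with `pidp` and `wave` coded 1 and 9: sort each person's rows by wave and keep the person only if the first row is wave 1 and the last row is wave 9. This works even if someone accidentally has two rows in the same wave, which `duplicates report` checks first. Whether dropping matters depends on the estimator: a person observed only once contributes no within-person variation to a fixed-effects model such as `xtreg, fe`, but is still used by pooled OLS or `xtreg, re`. The indicator name `in_both` is hypothetical.

```stata
* Sketch (assumption): data in memory with pidp and wave (values 1 and 9).

* Check that no person has two rows within the same wave
duplicates report pidp wave

* Within each pidp, sorted by wave, row 1 holds the lowest wave and
* row _N the highest; both waves are present only if they are 1 and 9
bysort pidp (wave): generate byte in_both = (wave[1] == 1 & wave[_N] == 9)

* Inspect how many rows would be lost before dropping anything
tabulate in_both

* Keep only persons observed in both waves
keep if in_both
drop in_both
```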
I also noticed that the wave values do not appear in a consistent order within each ID: some IDs list wave 1 first and others list wave 9 first, so the column reads 1, 9, 1, 9, 1, 1, 9, 9, 1, ... rather than alternating regularly. Is there a way to sort the data so that wave follows a uniform pattern, like 1, 9, 1, 9, ...?
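Sorting on both the ID and the wave puts each person's wave-1 row before their wave-9 row; the whole column will read 1, 9, 1, 9, ... only once the unpaired IDs have been dropped. A minimal sketch, assuming the variable names from the post. `xtset` then declares the panel structure, and it stops with an error if any person has two rows for the same wave, which doubles as a consistency check.

```stata
* Sketch (assumption): pidp and wave exist as in the post.

* Order rows by person, then by wave within person (1 before 9)
sort pidp wave

* Declare pidp as the panel identifier and wave as the time variable
xtset pidp wave

* Summarise which participation patterns occur across the two waves
xtdescribe
```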
I appreciate any help at all. I am sorry for the poor composition of this question and the messy Stata code and output.
Thank you and have a good day.