Good morning all,
I am a PhD student and a relative newcomer to Stata, and I was hoping you might be able to help me with a problem I am encountering when merging two datasets. The first dataset comprises Wave 1 of a survey (999 participants). The second dataset is Wave 2 of the same survey, plus some additional new questions (815 participants), i.e. a follow-up survey which targeted the same sample just 3 months later. Each participant in both surveys has a unique numeric Prolific id (e.g. 503ld3858389289hk12). Responses from each of the two surveys have been uploaded separately to Stata, destrung and cleaned. I have checked the uploaded data against the original raw data, and in both cases it all looks fine. The problem comes when I try to merge the two files in order to perform a panel data analysis. My process is as follows:
  • clear all, then use "C:\Users\User\Downloads\Final Raw merged data file from LimeSurvey.dta" (MASTER FILE - i.e. RAW SURVEY 1 DATA)
  • I then run a duplicates list WhatisyourProlificid check and drop all duplicate ids (WhatisyourProlificid is the string id variable used in the data)
  • Then I run a whole load of rename, label var and encode [ ], gen[ ] commands to clean and destring the data, creating a destrung version of the WhatisyourProlificid var "id_d"
  • I then import the Wave 2 data (which has also been cleaned, destrung and checked for duplicate ids) using merge 1:1 id_d using "C:\Users\User\OneDrive\Documents\STATA\finalcovid surveydestrungdataupdated03092020.dta" and drop the variables that don't match. The process seems to run fine and I am left with n=822 participants.
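A minimal sketch of the steps above as Stata commands, for reference. It is a reconstruction rather than the actual do-file: it assumes that duplicates were removed with duplicates drop and that id_d was created from WhatisyourProlificid with encode, neither of which is confirmed above. The other rename, label var and encode lines are left out, as is the final drop step.

```stata
* Sketch of the workflow described above; not the original do-file.
clear all
use "C:\Users\User\Downloads\Final Raw merged data file from LimeSurvey.dta"

* List duplicate Prolific ids, then drop them.
* ASSUMPTION: duplicates drop was used (this keeps the first copy of each id).
duplicates list WhatisyourProlificid
duplicates drop WhatisyourProlificid, force

* ASSUMPTION: the numeric id was created from the string id with encode.
encode WhatisyourProlificid, gen(id_d)

* ... further rename, label var and encode commands omitted ...

* Merge in the Wave 2 file on the numeric id.
merge 1:1 id_d using "C:\Users\User\OneDrive\Documents\STATA\finalcovid surveydestrungdataupdated03092020.dta"
```

The sketch only applies if the Wave 2 file already contains an id_d variable created in the same way.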
BUT when I order the new merged dataset and place the Wave 1 age variable [age] and the Wave 2 age variable [agec] side by side, it is obvious that the imported using data have been changed. For example, whereas in the raw files and in the standalone data files age is 34 and agec is also 34, in the merged file age is still 34 but agec has changed to 69! The same goes for other variables such as education, parental status etc., which should not have changed and which were fine in the standalone data files.

I am at a complete loss as to what is going on or what I could do to fix this issue, so any help or advice you could offer would be hugely appreciated.

Best wishes
Diane