I am using Stata 15.
I am trying to merge many (over 100) files for three years of survey data (over 300 separate files in total). Some variables (for example, name, id number, age) are listed in multiple files, but are inconsistent. This is not always due to missing values, but sometimes due to mistranslation of handwriting etc, so I'd like to flag when this occurs, rather update and replace.
As I understand, when Stata merges two files and I don't specify that two identical variable names should be matched, the 'using' data is dumped (or I can specify that it be updated or replaced, but I can't keep both in separate variables). I'd like to keep the variable from the 'using' file with a new name (or the old name with a suffix). For example, if I ran the following:
Code:
set obs 100 forval j = 1/5 { gen v`j' = uniform() } replace v1 = _n save myfile1, replace set obs 100 forval j = 5/10 { gen v`j' = uniform() } gen v1 = _n save myfile2, replace use myfile1, clear merge 1:1 v1 using myfile2
v1 v2 v3 v4 v5 v5_suffix v6 v7 v8 v9 v10
Alternatively, I'd be happy if I could export list of variable names as a single line to the same excel file, where I could order them and rename/compare manually.
So far I have tried:
Code:
use "$survey/A", clear global mylist B C D E F G H I J K foreach filename of global mylist { use `"$survey/`filename'"' keep if _n==1 export excel * using "$constructedData/checks.xlsx", firstrow(varlabels) append }
I'd be very grateful for any advice.
0 Response to Checking merges over a very large number of files
Post a Comment