The data I'm using were delivered to us with duplicated patient IDs: 257 patients out of a sample of 800,012. These patients share an ID number but appear to differ in sex and other characteristics, so I decided to remove them and select my cohorts from the remaining sample.
I'm using Stata 16 on Windows 10; the computer has 512 GB of RAM.
As these are medical records, I wanted to remove the observations that correspond to these patients.
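For context, duplicates_drop.dta is meant to hold one record per duplicated e_patid. A minimal sketch of how such a file can be built, assuming the duplicated IDs can be identified in the patient file (my actual construction may differ in detail):

Code:
* Sketch only: build a list of duplicated patient IDs from the patient file
use "$gold/patient", clear
duplicates tag e_patid, gen(dup)    // dup > 0 marks IDs that occur more than once
keep if dup > 0
keep e_patid
duplicates drop                     // keep one row per duplicated e_patid
save "$analysis/duplicates_drop.dta", replace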
This is the loop I then run over the GOLD files:
Code:
global gold "D:/bms98e/Data/GOLD" global analysis "D:/bms98e/Analysis" foreach file in test therapy patient referral immunisation consultation { use "$gold/`file'", clear merge m:1 e_patid using "$analysis/duplicates_drop.dta", gen(merge) drop if merge==3 drop merge save "$gold/`file'", replace
I decided to rerun it to ensure that it worked, expecting that I would see 0 matched and 0 dropped observations.
I have repeated the loop a few times now.
The first time it removed thousands of observations from each dataset.
When I repeated the loop, it dropped some observations again (some hundreds).
At one point, it matched fewer than 10 observations in some of the datasets and 0 in others. I just ran the loop again, and it matched more.
Why does the number of matched observations keep changing? Can anyone advise me on why this happens?
Does it have to do with the fact that I used global macros?
Could it be that there is something wrong with the patient ID numbers?
If there is a better way to do this, please let me know.
Thank you,
Louisa