Dear Statalist
I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD10 codes. To make this run quicker, I created an id variable (gen id = _n) and saved the dataset as "original_file".
I then dropped all variables except id and diag, and ran the operation on diag_01 (which took several hours).
I then wanted to merge the original file using
merge 1:1 id using original_file
But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large numbers the id variable repeats itself.
Does anyone know why this happens, and how to get round it? Should I be specifying the format of the id variable to make sure it's long enough not to round the large numbers?
Any help would be much appreciated
Best Wishes
Joe
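
The symptom described above is consistent with the id being stored as a float, which is Stata's default storage type when none is given to gen. A float holds integers exactly only up to 16,777,216 (2^24), so with 32 million observations the larger values of _n would be rounded and collide. If so, the storage type is what matters here, not the display format. The following is a minimal sketch under that assumption; the variable names id_float and id_long and the file name original_data are hypothetical and used only for illustration.

```stata
* Assumption: id was created with the default storage type (float).
* A float stores integers exactly only up to 16,777,216 (2^24).
clear
set obs 32000000

* Default storage type (float, unless changed with -set type-)
generate id_float = _n

* Explicit long type: holds integers exactly up to 2,147,483,620
generate long id_long = _n

* -isid- exits with an error if a variable does not uniquely identify observations.
* This check is expected to fail, because values above 2^24 are rounded.
capture noisily isid id_float

* This check should pass silently.
isid id_long

* Hypothetical redo of the workflow from the post:
* use original_data, clear
* generate long id = _n
* save original_file, replace
```

Because the saved file would contain the same rounded ids, the id would need to be regenerated as long (or double) in the original data before splitting it, rather than recast afterwards.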