Dear Statalist,
I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD10 codes. To make this run more quickly, I created an id variable (gen id = _n) and saved the dataset as "original_file".
I then dropped all variables except id and diag, and ran the operation on diag_01 (which took several hours).
I then wanted to merge the original file back in using:
merge 1:1 id using original_file
But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large values the id variable repeats itself.
Does anyone know why this happens, and how to get round it? Should I be specifying the format of the id variable, to make sure it's long enough not to round the large numbers?
Any help would be much appreciated.
Best Wishes
Joe
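
A possible explanation, offered as a sketch rather than a confirmed diagnosis: if id was created with Stata's default storage type (float), it can hold consecutive integers exactly only up to 16,777,216, so with 32 million observations the larger values of _n would be rounded and would repeat. The display format does not affect this; the storage type does. The example below uses hypothetical variable names (id_float, id_long) on an empty dataset to illustrate the difference, and assumes enough memory for 20 million observations.

```stata
* Sketch only: assumes the original id was stored as float (Stata's default type)
clear
set obs 20000000

* Default storage type is float: integers above 16,777,216 are not all representable
generate id_float = _n

* long stores integers exactly up to 2,147,483,620, which covers 32 million rows
generate long id_long = _n

* isid exits with an error if the variable does not uniquely identify observations
capture noisily isid id_float
isid id_long

* describe shows the storage type of each variable
describe id_float id_long
```

In the workflow described above, this would mean writing gen long id = _n (or gen double id = _n) before saving original_file, and running isid id before the merge to confirm that the key is unique.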