Dear Statalist,
I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD10 codes. To make this run quicker, I created an id variable (gen id = _n) and saved the dataset as "original_file".
I then dropped all variables except id and diag, then ran the operation on diag_01 (which took several hours).
I then wanted to merge the original file back in using:
merge 1:1 id using original_file
But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large numbers the id variable repeats itself.
Does anyone know why this happens, and how to get round it? Should I be specifying the format of the id variable, to make sure it's long enough not to round the large numbers?
Any help would be much appreciated.
Best Wishes
Joe
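
A likely cause, assuming the default storage type has not been changed with set type: gen id = _n stores id as a float, and a float holds integers exactly only up to 16,777,216 (2^24). Above that, neighbouring values of _n round to the same number, so with 32 million observations the id repeats. Changing the display format would not help, because the precision is already lost when the value is stored. A minimal sketch of the problem and one possible fix follows; the variable names id_float and id_long are illustrative only.

```stata
* Sketch only: assumes the default storage type is float (no prior -set type double-).
clear
set obs 32000000

* Default storage type: float, exact for integers only up to 16,777,216
gen id_float = _n

* Explicit long: exact for integers up to 2,147,483,620
gen long id_long = _n

* isid exits silently when the variable uniquely identifies observations
isid id_long

* Expected to fail here, reporting that id_float is not a unique identifier
capture noisily isid id_float
```

If neither file has been re-sorted since id was created, regenerating the id as long (or double) in both files may restore a unique key for the 1:1 merge without rerunning the operation on diag; that assumption about row order is worth checking before relying on it.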