Dear Statalist,
I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD-10 codes. To make this run more quickly, I created an id variable (gen id = _n) and saved the dataset as "original_file".
I then dropped all variables except id and diag, and ran the operation on diag_01 (which took several hours).
I then wanted to merge the result back with the original file using
merge 1:1 id using original_file
But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large values the id variable repeats itself.
Does anyone know why this happens, and how to get around it? Should I be specifying the storage type or format of the id variable, to make sure it's long enough not to round the large numbers?
Any help would be much appreciated.
Best Wishes
Joe
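
A likely explanation, offered as a sketch rather than a confirmed diagnosis: gen without an explicit storage type creates a float unless the default has been changed with set type, and a float holds consecutive integers exactly only up to 16,777,216 (2^24). With 32 million observations, ids above that point are rounded and so repeat. The example below reproduces this on made-up data; the names id_float and id_long and the observation count are illustrative assumptions, not taken from the actual dataset.

```stata
* Illustrative data only: 20 million empty observations
clear
set obs 20000000

* Default storage type (float): integers above 2^24 get rounded
generate id_float = _n

* Explicit long: exact for integers up to about 2.1 billion
generate long id_long = _n

* float() rounds its argument to float precision
display %12.0f float(16777217)

* isid exits with an error if the variable does not uniquely identify observations
capture noisily isid id_float
isid id_long
```

If this is indeed the cause, creating the id in the original data with gen long id = _n (or gen double id = _n) before saving original_file, and confirming it with isid id, should let merge 1:1 id work as intended. Changing only the display format would not help, because the rounding happens when the values are stored, not when they are shown.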