Dear Statalist
I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD10 codes. To make this run quicker, I created an id variable (gen id = _n) and saved the dataset as "original_file".
I then dropped all variables except id and diag, and ran the operation on diag_01 (which took several hours).
I then wanted to merge the result back with the original file using:
merge 1:1 id using original_file
But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large values the id variable repeats itself.
Does anyone know why this happens, and how to get around it? Should I be specifying the format of the id variable, to make sure it's long enough not to round the large numbers?
Any help would be much appreciated
Best Wishes
Joe
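A likely explanation, sketched under the assumption that id was created with Stata's default storage type (float, which applies unless set type has been changed): a float holds integers exactly only up to 16,777,216, so with 32 million observations the larger values of _n get rounded and repeat. The display format does not affect this; the storage type does. The variable names id_float and id_long below are made up for illustration, and the example uses an empty dataset rather than the real one.

```stata
* Minimal sketch: compare a default (float) id with a long id.
* Assumes the default storage type is float (i.e. "set type" is unchanged).
clear
set obs 20000000

* Default type: float, exact only for integers up to 16,777,216
gen id_float = _n

* long stores integers exactly up to 2,147,483,620
gen long id_long = _n

* Confirm the storage types
describe id_float id_long

* isid exits with an error if the variable does not uniquely identify observations
isid id_long
capture noisily isid id_float

* Show how many values of the float id are duplicated
duplicates report id_float
```

If this is the cause, creating the identifier with gen long id = _n (or gen double id = _n) before saving should avoid the problem. For the files that already exist, and only if neither file has been sorted or had observations dropped since id was created, merge 1:1 _n using original_file matches on observation order instead of on id and may save rerunning the slow step.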