Dear Statalist,
I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD10 codes. To make this run quicker, I created an id variable (gen id = _n) and saved the dataset as "original_file".
I then dropped all variables except id and diag, then ran the operation on diag_01 (which took several hours).
I then wanted to merge the result back with the original file using:
merge 1:1 id using original_file
But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large numbers the id variable repeats itself.
Does anyone know why this happens, and how to get round it? Should I be specifying the format of the id variable, to make sure it is long enough not to round the large numbers?
Any help would be much appreciated.
Best Wishes
Joe
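A minimal sketch of the workflow described above, for reference. It assumes the cause is Stata's default storage type: unless a type is given, generate stores a new variable as float, which holds integers exactly only up to 16,777,216, so with 32 million observations the later values of _n round to the same number. The file and variable names follow the post, the labelling step is left as a placeholder, and the replace option on save is an addition for illustration.

```stata
* Sketch only, assuming the duplicate ids come from float rounding.
* Declaring the id as long keeps integers exact up to roughly 2.1 billion.
generate long id = _n

* Stop here with an error if id does not uniquely identify observations.
isid id

* Save the full dataset after the id exists, so both files share it.
save original_file, replace

* Keep only what the slow step needs.
keep id diag

* ... run the ICD-10 labelling on diag here (placeholder) ...

* Bring the remaining variables back in.
merge 1:1 id using original_file
```

Changing the display format (for example format id %12.0f) only affects how values are shown, not how they are stored, so it would not restore precision already lost in a float variable.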