BJ Data Tech Solution

Specializing in data processing, data management implementation plans, electronic and paper-based data collection tools, data cleaning specifications, data extraction, transformation and loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches how to design and develop electronic data collection tools using CSPro, and how to use STATA commands for data manipulation. It also sets up data management systems using modern data technologies such as relational databases, C#, PHP and Android.

Problems with using _n to create id variable

Dear Statalist

I have a dataset with 32 million observations and around 30 variables. I want to perform an operation on one variable (diag) to label ICD-10 codes. To make this run quicker, I created an id variable (gen id = _n) and saved the dataset as "original_file".

I then dropped all variables except id and diag and ran the operation on diag_01 (which took several hours).

I then wanted to merge the result back with the original file using:

merge 1:1 id using original_file

But my id variable does not uniquely identify observations in either file. When I look at the data in the Data Editor, I see that at large values the id variable repeats itself.

Does anyone know why this happens, and how to get around it? Should I be specifying the format of the id variable, to make sure it's long enough not to round the large numbers?

Any help would be much appreciated.

Best Wishes

Joe
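A likely explanation, assuming id was created with Stata's default storage type: generate without an explicit type creates a float, a 4-byte IEEE single-precision number with a 24-bit significand. Such a value holds every integer exactly only up to 16,777,216 (2^24). Beyond that, neighbouring integers round to the same stored value, so with 32 million observations the upper ids collide. The Python snippet below is only an illustration, not Stata: it mimics the rounding by passing integers through a 4-byte float with the standard struct module. The helper name as_float32 is hypothetical.

```python
import struct

def as_float32(n: int) -> float:
    """Round-trip an integer through a 4-byte IEEE single-precision float,
    similar to how a value is kept in Stata's default `float` storage type."""
    return struct.unpack("f", struct.pack("f", float(n)))[0]

LIMIT = 2 ** 24  # 16,777,216: largest range in which every integer is exact in float32

# Up to and including the limit, every id survives the round trip unchanged.
print(all(as_float32(n) == n for n in range(LIMIT - 5, LIMIT + 1)))

# Just above the limit, distinct ids collapse onto the same stored value.
ids = range(LIMIT, LIMIT + 8)
stored = [int(as_float32(n)) for n in ids]
print(stored)
print(len(set(stored)), "distinct values for", len(ids), "ids")
```

If this is the cause, the usual remedy in Stata is to request a storage type large enough for the count, for example `generate long id = _n` (a long holds integers up to 2,147,483,620) or `generate double id = _n`, and to check with `isid id` before saving. Changing the display format will not help: a format only controls how values are shown, not how precisely they are stored.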
