BJ Data Tech Solution

Specializing in data processing, data management implementation plans, data collection tools (electronic and paper-based), data cleaning specifications, data extraction, transformation and loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches the design and development of electronic data collection tools using CSPro, and Stata commands for data manipulation. It also sets up data management systems using modern data technologies such as relational databases, C#, PHP and Android.

Duplicates in very large dataset

Hi all,

I am trying to merge two huge datasets.
To do so, I am generating a unique identifier as

Code:

gen id_mas = _n

However, when I check for duplicates, Stata finds some, even though the numbers displayed are different.
For instance:

[Screenshot of the data listing; image not preserved]

As you can see, the number displayed is 2.33e+07, but it is precisely 23309572. The number below is again displayed as 2.33e+07, but it is 23309514. So they are uniquely defined, but Stata seems to care only about the rounded value.
How can I solve this issue and tell Stata that these are two separate numbers?

Thank you
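
A possible explanation, assuming the default storage type has not been changed with set type: gen without an explicit type creates a float, and a float can only hold every integer exactly up to 16,777,216. Above that, neighbouring values of _n are rounded to the same stored number, which would produce duplicates regardless of the display format. The sketch below is not from the original post. It uses 25 million empty observations as a stand-in for the real data, and id_float is a hypothetical name added only for comparison.

Code:

```stata
* Sketch only: 25 million empty observations stand in for the real data.
clear
set obs 25000000

* Default storage type (float): integers above 16,777,216 are not all exact.
gen id_float = _n

* Asking for long (or double) stores each observation number exactly.
gen long id_mas = _n

* Show every digit instead of 2.33e+07.
format id_mas %12.0f

* isid exits with an error if id_mas does not uniquely identify observations.
isid id_mas

* For comparison, report the surplus copies in the float version.
duplicates report id_float
```

If the identifier already exists as a float, recast will not bring back the lost digits; the variable would need to be generated again as long or double.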
