What caused the discrepancies between a mini dataset (extracted from raw data) and the original data?

Hi Stata experts,

I am working with a reduced dataset that contains only one variable and 18 observations. Please see below:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 linking_id
"2030502713"
"2050804309"
"2050804402"
"2050804414"
"2051104902"
"2051105322"
"2051205411"
"2051305508"
"2051305604"
"2051405711"
"2051505821"
"3031606504"
"7043615817"
"7043815106"
"7043815323"
"7124318123"
"8034719402"
"8034819813"
end

I wrote a block of code to extract the values of the ID variable and save them in a separate data file (a temporary file). Below is my code:

Code:

count

capture drop max
scalar max = `r(N)'


postfile ado_refuse_id_list linking_id using myresults, replace

capture drop i
gen i = 1
forvalues i = 1/`=scalar(max)' {
    
    destring linking_id, replace
    
    scalar id = linking_id

    post ado_refuse_id_list (`=scalar(id)') 
    
    drop if _n == 1


    replace i = i + 1
}

postclose ado_refuse_id_list

Because I destring the ID variable in the code above, I wanted to convert the IDs back to a readable format. I used the following code to examine the values:

Code:

format linking_id %12.0f

tostring linking_id, gen(id) usedisplayformat

However, the values do not match those in the original dataset. Specifically, across all 18 observations, up to the last four digits differ from the original IDs. The reformatted values I got:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 id
"2030502656"
"2050804352"
"2050804352"
"2050804352"
"2051104896"
"2051105280"
"2051205376"
"2051305472"
"2051305600"
"2051405696"
"2051505792"
"3031606528"
"7043615744"
"7043814912"
"7043815424"
"7124318208"
"8034719232"
"8034819584"
end

I have spent hours trying to figure out the cause of these discrepancies without success. I would appreciate your guidance on how to fix the problem.

Many thanks,
Mengmeng
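
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the postfile statement above declares linking_id without a storage type, and under Stata's default settings (assuming set type has not been changed from float) the posted variable is then stored as a float. A float carries only about 7 significant decimal digits, so a 10-digit ID is rounded when it is written to myresults. The code below first reproduces that rounding for one ID from the post, then repeats the extraction with the posted variable declared as double. The three-row input and the indexed loop are illustrative choices and not the original poster's code.

Code:

```stata
* A float keeps only about 7 significant digits, so a 10-digit ID gets rounded.
display %12.0f float(2030502713)   // 2030502656, the first value in the reformatted list
display %12.0f 2030502713          // a double keeps all 10 digits

* Illustrative subset of the IDs from the post
clear
input str10 linking_id
"2030502713"
"3031606504"
"7043615817"
end

* Convert once, outside any loop; destring picks a type wide enough for the values
destring linking_id, replace

* Declare the posted variable as double so no digits are lost on storage
postfile ado_refuse_id_list double linking_id using myresults, replace
forvalues i = 1/`=_N' {
    * Post observation i directly instead of dropping rows from the data in memory
    post ado_refuse_id_list (linking_id[`i'])
}
postclose ado_refuse_id_list

* Inspect the saved file
use myresults, clear
format linking_id %12.0f
list
```

If the IDs are never used in arithmetic, another option under the same assumptions is to skip destring altogether and declare the posted variable as str10 in postfile, which keeps the IDs as text and avoids numeric precision entirely.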

BJ Data Tech Solution
