Hello Everybody,

This is my second time posting, so I’m going to try to add additional information to see if it helps. I have an existing dataset of 5,597 families that participated in a treatment program, and I noticed that some of their termination dates were missing. Thus, I received an updated dataset in Excel to fill in the missing termination dates, which I converted to a Stata dataset using Stattransfer version 14.

The first time I tried to merge on the additional dataset, it did not recognize the identifier although it was a numeric variable. Thus I used the following code, which helped me match the identifiers:

tostring fpid, replace format (%07.0f)
encode fpid, g (fpid2)
rename fpid fpid_original
rename fpid2 fpid
sort fpid
save

Below is the dataex from this new dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 fpid_original long(FP_REFERRAL_START FP_REFERRAL_TERM FP_REFERRAL_END fpid)
"1087671" 19361 19509 19541  1
"1087672" 19361 19395 19541  2
"1087674" 19361 19416 19541  3
"1087698" 19360 19524 19539  4
"1087703" 19360 19477 19540  5
"1087704" 19366 19473 19534  6
"1087722" 19365 19438 19545  7
"1087756" 19361 19446 19541  8
"1087759" 19361 19386 19541  9
"1087760" 19361 19375 19541 10
"1087762" 19361 19477 19541 11
"1087765" 19366 19414 19540 12
"1087799" 19360 19526 19540 13
"1087813" 19366 19730 19730 14
"1087814" 19366 19502 19546 15
"1087818" 19365 19754 19788 16
"1087822" 19365 19563 19607 17
"1087823" 19365 19458 19545 18
"1087837" 19361 19607 19633 19
"1087839" 19372 19446 19542 20
end
format %tdD_m_Y FP_REFERRAL_START
format %tdD_m_Y FP_REFERRAL_TERM
format %tdD_m_Y FP_REFERRAL_END
label values fpid fpid2
label def fpid2 1 "1087671", modify
label def fpid2 2 "1087672", modify
label def fpid2 3 "1087674", modify
label def fpid2 4 "1087698", modify
label def fpid2 5 "1087703", modify
label def fpid2 6 "1087704", modify
label def fpid2 7 "1087722", modify
label def fpid2 8 "1087756", modify
label def fpid2 9 "1087759", modify
label def fpid2 10 "1087760", modify
label def fpid2 11 "1087762", modify
label def fpid2 12 "1087765", modify
label def fpid2 13 "1087799", modify
label def fpid2 14 "1087813", modify
label def fpid2 15 "1087814", modify
label def fpid2 16 "1087818", modify
label def fpid2 17 "1087822", modify
label def fpid2 18 "1087823", modify
label def fpid2 19 "1087837", modify
label def fpid2 20 "1087839", modify


I successfully merged this dataset by the fpid to the new one, and the merged dataset is below. However, I am finding a curious error when I double-check the data. Some of the merged dates are incorrect i.e. the start dates do not match and should match every time. The two variables that should match are fpstart & FP_REFERRAL_START. I pasted the dataex below and the erroneous variables are in bold font; I am using Stata 15.1.


Below is the dataex from this merged dataset
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(fpstart term_date fpid FP_REFERRAL_START FP_REFERRAL_TERM)
19361 19509  1 19361 19509
19361 19395  2 19361 19395
19361 19416  3 19361 19416
19360 19524  4 19360 19524
19360 19477  5 19360 19477
19365 19438  6 19366 19473
19361 19477  7 19365 19438
19360 19526  8 19361 19446
19366 19730  9 19361 19386
19366 19502 10 19361 19375
19365 19754 11 19361 19477
19365 19563 12 19366 19414
19365 19458 13 19360 19526
19361 19607 14 19366 19730
19372 19446 15 19366 19502
19360 19477 16 19365 19754
19372 19558 17 19365 19563
19372 19537 18 19365 19458
19362 19705 19 19361 19607
19372 19409 20 19372 19446
end
format %tdD_m_Y fpstart
format %tdD_m_Y term_date
format %tdD_m_Y FP_REFERRAL_START
format %tdD_m_Y FP_REFERRAL_TERM
label values fpid fpid
label def fpid 1 "1087671", modify
label def fpid 2 "1087672", modify
label def fpid 3 "1087674", modify
label def fpid 4 "1087698", modify
label def fpid 5 "1087703", modify
label def fpid 6 "1087722", modify
label def fpid 7 "1087762", modify
label def fpid 8 "1087799", modify
label def fpid 9 "1087813", modify
label def fpid 10 "1087814", modify
label def fpid 11 "1087818", modify
label def fpid 12 "1087822", modify
label def fpid 13 "1087823", modify
label def fpid 14 "1087837", modify
label def fpid 15 "1087839", modify
label def fpid 16 "1087843", modify
label def fpid 17 "1087844", modify
label def fpid 18 "1087846", modify
label def fpid 19 "1087847", modify
label def fpid 20 "1087848", modify
I would appreciate any help in figuring out why these start dates do not match. I’m wondering if it had something to do with the initial code I used to convert the fpid from string back to numeric. Thanks!