Hello guys
I was preparing my data for survival analysis and ran into a problem. Below is a sample that contains two cases:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long claimdy str54 name int(chal tgt) long(begclaim endclaim)
 3801 "Eastern Greenland" 385 390 192107 193304
81403 "Hong Kong"         710 200 197203 199707
end
After running this series of commands:
Code:
tostring begclaim, replace
tostring endclaim, replace
gen rightcensor=1 if endclaim=="200199"
generate claimstart = date(begclaim, "YM")
generate claimend = date(endclaim, "YM")
format claimstart %td
format claimend %td
gen claimend_mon=month(claimend)
gen claimend_yr=year(claimend)
* last day of the end month: first day of the following month minus one day
* (works for every month and for leap years, so no per-month lookup is needed)
gen claimend2=dofm(mofd(claimend)+1)-1
format claimend2 %td
gen claimserialstart= claimstart
format claimserialstart %td
gen claimserialend=claimend2
format claimserialend %td
gen claimfail=1 if rightcensor==.
gen claimbeg_yr=year(claimserialstart)
gen claimbeg_day=day(claimserialstart)
drop claimend claimend_mon claimend_yr claimstart
* entry date is 1 Jan 1919 because the territorial norm starts in 1919
gen enterdate=mdy(1, 1, 1919)
format enterdate %td
stset claimserialend, id(claimdy) failure(claimfail==1) origin(time claimserialstart) enter(time enterdate) scale(365.25)
stsplit yearst,every(1)
*gen month=month(claimserialend)
*gen year=year(claimserialend)
gen year =0
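sort claimdy _t
* year label: claims starting before 1919 use the episode end (_t), later claims use the episode start (_t0)
by claimdy _t: replace year = claimbeg_yr + _t if claimbeg_yr < 1919
by claimdy _t: replace year = claimbeg_yr + _t0 if claimbeg_yr >= 1919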
I got the sample below:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long claimdy str54 name int(chal tgt) str6(begclaim endclaim) float year double _t byte _t0
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1921                  1  0
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1922                  2  1
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1923                  3  2
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1924                  4  3
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1925                  5  4
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1926                  6  5
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1927                  7  6
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1928                  8  7
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1929                  9  8
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1930                 10  9
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1931                 11 10
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1932  11.83025325119781 11
81403 "Hong Kong"         710 200 "197203" "199707" 1972                  1  0
81403 "Hong Kong"         710 200 "197203" "199707" 1973                  2  1
81403 "Hong Kong"         710 200 "197203" "199707" 1974                  3  2
81403 "Hong Kong"         710 200 "197203" "199707" 1975                  4  3
81403 "Hong Kong"         710 200 "197203" "199707" 1976                  5  4
81403 "Hong Kong"         710 200 "197203" "199707" 1977                  6  5
81403 "Hong Kong"         710 200 "197203" "199707" 1978                  7  6
81403 "Hong Kong"         710 200 "197203" "199707" 1979                  8  7
81403 "Hong Kong"         710 200 "197203" "199707" 1980                  9  8
81403 "Hong Kong"         710 200 "197203" "199707" 1981                 10  9
81403 "Hong Kong"         710 200 "197203" "199707" 1982                 11 10
81403 "Hong Kong"         710 200 "197203" "199707" 1983                 12 11
81403 "Hong Kong"         710 200 "197203" "199707" 1984                 13 12
81403 "Hong Kong"         710 200 "197203" "199707" 1985                 14 13
81403 "Hong Kong"         710 200 "197203" "199707" 1986                 15 14
81403 "Hong Kong"         710 200 "197203" "199707" 1987                 16 15
81403 "Hong Kong"         710 200 "197203" "199707" 1988                 17 16
81403 "Hong Kong"         710 200 "197203" "199707" 1989                 18 17
81403 "Hong Kong"         710 200 "197203" "199707" 1990                 19 18
81403 "Hong Kong"         710 200 "197203" "199707" 1991                 20 19
81403 "Hong Kong"         710 200 "197203" "199707" 1992                 21 20
81403 "Hong Kong"         710 200 "197203" "199707" 1993                 22 21
81403 "Hong Kong"         710 200 "197203" "199707" 1994                 23 22
81403 "Hong Kong"         710 200 "197203" "199707" 1995                 24 23
81403 "Hong Kong"         710 200 "197203" "199707" 1996                 25 24
81403 "Hong Kong"         710 200 "197203" "199707" 1997 25.415468856947296 25
end

My concern is the "Eastern Greenland" case: it has 12 yearly observations instead of 13 (1921-1933). The "Hong Kong" case looks correct, at least I think so, as it has 26 observations (1972-1997). So I am not sure what is wrong with my code. Why does the same code give the expected number of observations for one case but not for the other? Thanks in advance.
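
To make the comparison concrete, here is a small check of the arithmetic. It is a sketch only: the scalar names t_greenland and t_hongkong are made up for this illustration and do not appear in my code above. It compares the length of each claim in analysis-time years, which is what origin(time claimserialstart) with scale(365.25) measures, with the number of calendar years the claim touches.

```stata
* analysis time in years from the first day of the start month
* to the last day of the end month, as set up in the stset call
scalar t_greenland = (mdy(4,30,1933) - mdy(7,1,1921)) / 365.25
scalar t_hongkong  = (mdy(7,31,1997) - mdy(3,1,1972)) / 365.25
display t_greenland
display t_hongkong

* number of episodes when analysis time is cut every 1 year
display ceil(t_greenland)
display ceil(t_hongkong)

* number of distinct calendar years each claim touches
display 1933 - 1921 + 1
display 1997 - 1972 + 1
```

The episode counts come out as 12 and 26, which is the number of rows I got, while the calendar-year counts are 13 and 26. My tentative reading is that stsplit, every(1) cuts at anniversaries of the claim start rather than at 1 January, so the two counts agree for Hong Kong (March start, July end) but not for Eastern Greenland (July start, April end).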

Lastly, I am using Stata/IC 16.1.
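
In case it helps to show what I am after, below is a rough sketch of splitting on calendar years instead, using the after() option of stsplit. It is an assumption on my part that this is the right tool. The variable names calyear and nrows are made up for the sketch, the 1 January 1900 reference date is arbitrary, and because of scale(365.25) the cut points only approximate 1 January (they can be off by up to about a day).

```stata
clear
input long claimdy str20 name long(begclaim endclaim)
 3801 "Eastern Greenland" 192107 193304
81403 "Hong Kong"         197203 199707
end

* first day of the start month and last day of the end month
gen claimserialstart = dofm(ym(floor(begclaim/100), mod(begclaim,100)))
gen claimserialend   = dofm(ym(floor(endclaim/100), mod(endclaim,100)) + 1) - 1
format claimserialstart claimserialend %td

* both sample claims end within the data, so both are treated as failures here
gen byte claimfail = 1

stset claimserialend, id(claimdy) failure(claimfail==1) ///
    origin(time claimserialstart) enter(time mdy(1,1,1919)) scale(365.25)

* cut at whole years counted from 1 Jan 1900 instead of from the claim start
stsplit calyear, after(time = mdy(1,1,1900)) at(0(1)150)
replace calyear = 1900 + calyear

* rows per claim after the calendar-year split
bysort claimdy: gen nrows = _N
tabulate name
```

The intent is one row per calendar year that a claim touches. Claims that start or end within a day of 1 January would need a closer look because of the 365.25-day approximation.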