I was preparing my data for survival analysis and ran into a problem. Below is a sample that contains two cases.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long claimdy str54 name int(chal tgt) long(begclaim endclaim)
 3801 "Eastern Greenland" 385 390 192107 193304
81403 "Hong Kong" 710 200 197203 199707
end
Code:
tostring begclaim, replace
tostring endclaim, replace
gen rightcensor=1 if endclaim=="200199"
generate claimstart = date(begclaim, "YM")
generate claimend = date(endclaim, "YM")
format claimstart %td
format claimend %td
gen claimend_mon=month(claimend)
gen claimend_yr=year(claimend)
gen leap = mod(claimend_yr,400)==0 | mod(claimend_yr,4)==0 & mod(claimend_yr,100)!=0
gen claimend_day=31 if claimend_mon==1
replace claimend_day=30 if claimend_mon==4
replace claimend_day=31 if claimend_mon==7
gen claimend2=mdy(claimend_mon, claimend_day, claimend_yr)
format claimend2 %td
gen claimserialstart= claimstart
format claimserialstart %td
gen claimserialend=claimend2
format claimserialend %td
gen claimfail=1 if rightcensor==.
gen claimbeg_yr=year(claimserialstart)
gen claimbeg_day=day(claimserialstart)
drop claimend claimend_mon claimend_yr claimstart
**because territorial norm starts in 1919
gen enterdate=mdy(1, 1, 1919)
format enterdate %td
stset claimserialend, id(claimdy) fail(claimfail==1) origin(time claimserialstart) enter(enterdate) scale(365.25)
stsplit yearst,every(1)
*gen month=month(claimserialend)
*gen year=year(claimserialend)
gen year =0
sort claimdy _t
by claimdy _t:replace year= claimbeg_yr + _t if claimbeg_yr < 1919
by claimdy _t:replace year= claimbeg_yr + _t0 if claimbeg_yr >= 1919
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input long claimdy str54 name int(chal tgt) str6(begclaim endclaim) float year double _t byte _t0
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1921 1 0
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1922 2 1
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1923 3 2
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1924 4 3
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1925 5 4
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1926 6 5
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1927 7 6
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1928 8 7
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1929 9 8
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1930 10 9
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1931 11 10
 3801 "Eastern Greenland" 385 390 "192107" "193304" 1932 11.83025325119781 11
81403 "Hong Kong" 710 200 "197203" "199707" 1972 1 0
81403 "Hong Kong" 710 200 "197203" "199707" 1973 2 1
81403 "Hong Kong" 710 200 "197203" "199707" 1974 3 2
81403 "Hong Kong" 710 200 "197203" "199707" 1975 4 3
81403 "Hong Kong" 710 200 "197203" "199707" 1976 5 4
81403 "Hong Kong" 710 200 "197203" "199707" 1977 6 5
81403 "Hong Kong" 710 200 "197203" "199707" 1978 7 6
81403 "Hong Kong" 710 200 "197203" "199707" 1979 8 7
81403 "Hong Kong" 710 200 "197203" "199707" 1980 9 8
81403 "Hong Kong" 710 200 "197203" "199707" 1981 10 9
81403 "Hong Kong" 710 200 "197203" "199707" 1982 11 10
81403 "Hong Kong" 710 200 "197203" "199707" 1983 12 11
81403 "Hong Kong" 710 200 "197203" "199707" 1984 13 12
81403 "Hong Kong" 710 200 "197203" "199707" 1985 14 13
81403 "Hong Kong" 710 200 "197203" "199707" 1986 15 14
81403 "Hong Kong" 710 200 "197203" "199707" 1987 16 15
81403 "Hong Kong" 710 200 "197203" "199707" 1988 17 16
81403 "Hong Kong" 710 200 "197203" "199707" 1989 18 17
81403 "Hong Kong" 710 200 "197203" "199707" 1990 19 18
81403 "Hong Kong" 710 200 "197203" "199707" 1991 20 19
81403 "Hong Kong" 710 200 "197203" "199707" 1992 21 20
81403 "Hong Kong" 710 200 "197203" "199707" 1993 22 21
81403 "Hong Kong" 710 200 "197203" "199707" 1994 23 22
81403 "Hong Kong" 710 200 "197203" "199707" 1995 24 23
81403 "Hong Kong" 710 200 "197203" "199707" 1996 25 24
81403 "Hong Kong" 710 200 "197203" "199707" 1997 25.415468856947296 25
end
My concern is with the "Eastern Greenland" case: it has 12 yearly observations instead of 13 (1921-1933). The "Hong Kong" case looks correct, at least I think so, as it has 26 observations (1972-1997). So I am not sure what is going wrong with my code. Why does the same code produce a different number of observations for the two cases? Thanks in advance.
Lastly, my Stata version is Stata/IC 16.1.
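One possible reading of the discrepancy, offered as a sketch rather than a confirmed diagnosis: with origin(time claimserialstart) and scale(365.25), stsplit with every(1) cuts at whole years of analysis time since the claim began, not at calendar-year boundaries. The number of episodes would then be the spell length in years rounded up, which need not equal the number of calendar years the spell touches. The check below rebuilds the two spells from the dates implied by the sample (first day of the start month, last day of the end month). The variable names byr, bmon, eyr, emon, eday, start, stop, n_episodes and n_calyears are hypothetical and used only for illustration.

```stata
* Sketch only: compare year-of-analysis-time episodes with calendar years touched.
* Assumes each claim starts on the 1st of its start month and ends on the
* last day of its end month, as the original code implies for these two cases.
clear
input str20 name int(byr bmon eyr emon eday)
"Eastern Greenland" 1921 7 1933 4 30
"Hong Kong"         1972 3 1997 7 31
end

gen start = mdy(bmon, 1, byr)
gen stop  = mdy(emon, eday, eyr)
format start stop %td

* Episodes when analysis time (years since start) is cut at 1, 2, 3, ...
gen n_episodes = ceil((stop - start) / 365.25)

* Calendar years touched by the spell, counting both end years
gen n_calyears = year(stop) - year(start) + 1

list name start stop n_episodes n_calyears, noobs
```

Under these assumptions n_episodes is 12 and n_calyears is 13 for Eastern Greenland, while both are 26 for Hong Kong. The two counts agree for Hong Kong only because its end month (July) falls later in the year than its start month (March); for Eastern Greenland the end month (April) falls before the start month (July), so the final anniversary interval runs from July 1932 to April 1933 and is labelled 1932. If one record per calendar year is what is wanted, the split would presumably have to be made on calendar time rather than on time since the claim began.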