Hello Statalist,
I am having an issue with the Kaplan–Meier estimates I obtain after declaring my data with the stset command. In my study, I want to understand rates of first marriage at each age among three kinds of migrants (variable mig_type_19). I observe individuals multiple times throughout the study, and the variable person_id uniquely identifies individuals. Individuals can enter the study at age 15, 16, or 17, so there is some left truncation. Individuals leave the study after age 24 or upon getting married.

I have a number of questions about the output of the sts list command and about whether I am using the stset command correctly. First, when I run stset and then sts list by my three migration types, I get the following:

Code:
. stset age, id(person_id) failure(mar_ind_r)

Survival-time data settings

           ID variable: person_id
         Failure event: mar_ind_r!=0 & mar_ind_r<.
Observed time interval: (age[_n-1], age]
     Exit on or before: failure

--------------------------------------------------------------------------
     16,692  total observations
      1,053  observations begin on or after (first) failure
--------------------------------------------------------------------------
     15,639  observations remaining, representing
      3,700  subjects
        713  failures in single-failure-per-subject data
     80,222  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =        24

. sts list, f by(mig_type_19)

        Failure _d: mar_ind_r
  Analysis time _t: age
       ID variable: person_id

Kaplan–Meier failure function
By variable: mig_type_19

             At           Net     Failure      Std.
  Time     risk   Fail   lost    function     error     [95% conf. int.]
------------------------------------------------------------------------
Rural Stayer
    15      493      3      1      0.0061    0.0035     0.0020    0.0187
    16      489      3      9      0.0122    0.0049     0.0055    0.0269
    17      477      5     29      0.0225    0.0067     0.0125    0.0403
    18      443     16     33      0.0578    0.0108     0.0400    0.0833
    19      394     22     32      0.1104    0.0149     0.0846    0.1436
    20      340     20     23      0.1628    0.0181     0.1307    0.2018
    21      297     29     15      0.2445    0.0218     0.2049    0.2903
    22      253     21     37      0.3072    0.0239     0.2631    0.3568
    23      195     18     71      0.3712    0.0260     0.3226    0.4245
    24      106     23     83      0.5076    0.0324     0.4460    0.5725
Rural Mover
    15     2748      2      6      0.0007    0.0005     0.0002    0.0029
    16     2740      3     21      0.0018    0.0008     0.0008    0.0044
    17     2716     23    161      0.0103    0.0019     0.0071    0.0148
    18     2532     35    197      0.0240    0.0030     0.0188    0.0306
    19     2300     37    164      0.0397    0.0039     0.0327    0.0481
    20     2099     54    106      0.0644    0.0050     0.0552    0.0750
    21     1939     48    103      0.0875    0.0059     0.0766    0.0999
    22     1788     65    163      0.1207    0.0070     0.1077    0.1351
    23     1560     88    682      0.1703    0.0084     0.1546    0.1874
    24      790     72    718      0.2459    0.0114     0.2244    0.2691
Urban Stayer
    15      459      2      0      0.0044    0.0031     0.0011    0.0173
    16      457      1      2      0.0065    0.0038     0.0021    0.0201
    17      454     12      6      0.0328    0.0083     0.0199    0.0538
    18      436     15     13      0.0661    0.0117     0.0467    0.0931
    19      408     10     13      0.0890    0.0134     0.0660    0.1193
    20      385     17     13      0.1292    0.0160     0.1011    0.1643
    21      355     14     13      0.1635    0.0178     0.1318    0.2019
    22      328     17     22      0.2069    0.0197     0.1712    0.2488
    23      289     18    119      0.2563    0.0217     0.2166    0.3017
    24      152     20    132      0.3541    0.0278     0.3027    0.4114
------------------------------------------------------------------------
Note: Net lost equals the number lost minus the number who entered.
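As a sanity check on the Rural Stayer rows above, the failure column can be reproduced from the "At risk" and "Fail" columns with the product-limit formula F(t) = 1 - prod over failure times t_j <= t of (1 - d_j/n_j). The sketch below is plain Python used only as a calculator (it is not Stata code); the numbers are copied from the sts list output, and the reported values are rounded to four decimals, so small discrepancies are expected.

```python
# (age, at risk, fail, reported failure function), copied from the
# Rural Stayer rows of the sts list output.
rural_stayer = [
    (15, 493, 3, 0.0061), (16, 489, 3, 0.0122), (17, 477, 5, 0.0225),
    (18, 443, 16, 0.0578), (19, 394, 22, 0.1104), (20, 340, 20, 0.1628),
    (21, 297, 29, 0.2445), (22, 253, 21, 0.3072), (23, 195, 18, 0.3712),
    (24, 106, 23, 0.5076),
]


def km_failure(rows):
    """Return [(age, F(age))] with F(t) = 1 - prod_{t_j <= t} (1 - d_j / n_j)."""
    surv = 1.0
    result = []
    for age, n_risk, n_fail, _reported in rows:
        surv *= 1 - n_fail / n_risk  # conditional survival through this age
        result.append((age, 1 - surv))
    return result


# Print computed vs reported failure at each age.
for (age, failure), row in zip(km_failure(rural_stayer), rural_stayer):
    print(age, round(failure, 4), row[3])
```

Each computed value agrees with the reported one to four decimals, which suggests the denominators behind the failure function are exactly the "At risk" counts.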


However, if I tabulate the percent married by age for the same three groups, I get the following:


Code:
. tab age mig_type_19

           |      Migration Type in 2019
       Age | Rural Sta  Rural Mov  Urban Sta |     Total
-----------+---------------------------------+----------
        15 |       269      1,521        267 |     2,057
        16 |       276      1,420        260 |     1,956
        17 |       254      1,526        272 |     2,052
        18 |       262      1,453        261 |     1,976
        19 |       217      1,240        242 |     1,699
        20 |       217      1,060        194 |     1,471
        21 |       184        994        205 |     1,383
        22 |       198        986        187 |     1,371
        23 |       167      1,016        211 |     1,394
        24 |       174        956        203 |     1,333
-----------+---------------------------------+----------
     Total |     2,218     12,172      2,302 |    16,692

. tab age mig_type_19, sum(mar_ind_r) nost nofreq

                          Means of First Marriage

           |    Migration Type in 2019
       Age | Rural Sta  Rural Mov  Urban Sta |     Total
-----------+---------------------------------+----------
        15 | .01115242  .00131492  .00749064 | .00340301
        16 | .01811594   .0028169  .01153846 | .00613497
        17 | .03543307  .01703801  .05514706 | .02436647
        18 | .08778626  .03234687  .09578544 | .04807692
        19 | .15668203  .05806452  .12809917 | .08063567
        20 | .23041475  .09339623  .19587629 | .12712441
        21 | .32065217  .12977867  .25365854 | .17353579
        22 | .38888889    .163286  .30481283 | .21517141
        23 | .43113772   .1988189  .32701422 | .24605452
        24 | .52298851  .24895397  .34975369 | .30007502
-----------+---------------------------------+----------
     Total | .19071235  .08051265  .15768897 | .10579919
What I am interested in is why the failure function and the percent married differ. For example, among rural stayers (the first group) at time/age 15 the failure function is .0061, while the percent married at age 15 is .0112. This is a large difference, and it cannot be explained by censoring since this is the first year. What explains this?
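The two age-15 figures for rural stayers can be put side by side with a little arithmetic. Assuming mar_ind_r is a 0/1 indicator (the table means lie between 0 and 1, but this is an assumption), multiplying the tab mean by the number of age-15 records recovers the number married at that age. The sketch below, plain Python with numbers copied from the outputs above, suggests both figures share the same numerator of 3 and differ only in their denominators.

```python
# Rural Stayer figures at age 15, copied from the sts list and tab outputs.
n_at_risk_15 = 493             # "At risk" column of sts list
n_records_15 = 269             # cell count from tab age mig_type_19
mean_married_15 = 0.01115242   # cell mean from tab age mig_type_19, sum(mar_ind_r)

# Assumes mar_ind_r is 0/1, so mean * count = number of married records.
married_15 = round(mean_married_15 * n_records_15)

# Age 15 is the first failure time, so the KM failure there is simply d/n.
km_failure_15 = married_15 / n_at_risk_15      # denominator: risk set
share_married_15 = married_15 / n_records_15   # denominator: age-15 records

print(married_15)                  # 3
print(round(km_failure_15, 4))     # 0.0061
print(round(share_married_15, 4))  # 0.0112
```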

Second, there are only 269 rural stayers observed at age 15, but the corresponding at-risk population is 493. Why is there a difference between these two outputs? What does the “at risk” population refer to?
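The stset output above reports the observed time interval as (age[_n-1], age] and the earliest observed entry as t = 0. A toy Python sketch of that interval rule, using three hypothetical subjects rather than the real data, shows how a subject first observed at 16 or 17 can still be counted at risk at t = 15, because the first record's interval starts at 0. The `left_truncated=True` branch is a hypothetical alternative, included only for comparison, in which each subject's first interval starts one year before the first observed age.

```python
from collections import defaultdict


def build_intervals(records, left_truncated=False):
    """Map person_id -> list of (t0, t1] intervals from (person_id, age) records.

    Default mimics the rule printed by stset: each record covers
    (age[_n-1], age], and a subject's first record starts at t = 0.
    left_truncated=True is a hypothetical alternative: the first record
    starts one year before the first observed age instead.
    """
    ages_by_person = defaultdict(list)
    for pid, age in sorted(records):
        ages_by_person[pid].append(age)
    intervals = {}
    for pid, ages in ages_by_person.items():
        first_start = ages[0] - 1 if left_truncated else 0
        starts = [first_start] + ages[:-1]
        intervals[pid] = list(zip(starts, ages))
    return intervals


def n_at_risk(intervals, t):
    """Count subjects with some interval (t0, t1] that contains t."""
    return sum(any(t0 < t <= t1 for t0, t1 in spans)
               for spans in intervals.values())


# Hypothetical subjects first observed at ages 15, 16 and 17.
toy = [(1, 15), (1, 16), (1, 17), (2, 16), (2, 17), (3, 17), (3, 18)]
observed_at_15 = sum(1 for _, age in toy if age == 15)

print(observed_at_15)                                            # 1
print(n_at_risk(build_intervals(toy), 15))                       # 3
print(n_at_risk(build_intervals(toy, left_truncated=True), 15))  # 1
```

Under the default rule, the risk set at 15 includes everyone whose first record is at 15 or later, which would make it larger than the number of records at age 15.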

Which is correct, the sts list output or the cross-tabulations? Based on the information I have provided, are there any additional options I need to include in the stset command?

Thank you