Hi all,

I am trying to set up survival data for a cohort of breast cancer patients, following them from their breast cancer diagnosis until the date of death or the end of 2017, whichever is earlier (i.e. I want patients who are alive on 31 Dec 2017 to be censored at that date).

I generated my follow-up time variable as follows:

gen timetodeath = (min(dateofdeath, td(31dec2017)) - dateofdiag)/365.25

My outcome of interest is breast cancer death, a dichotomous variable in my dataset indicating whether a woman died of breast cancer (0 = did not die of breast cancer, 1 = died of breast cancer).

I then stset my data as follows:

stset timetodeath, failure(bcdeath==1) id(patientid)

When I check my ‘timetodeath’ variable, patients have been followed up for the correct length of time (i.e. from their date of diagnosis until death or the end of 2017, whichever is earlier).

However, when I check the ‘_d’ variable that Stata creates when the data are stset, some patients who died of breast cancer AFTER 31 Dec 2017 are counted as failures instead of being censored. This is strange to me: as I mentioned, their follow-up time is correct, but it seems they are followed until 31 Dec 2017 and then still counted as a failure/event at that date.
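
For anyone who wants to reproduce the check, something along these lines should show the problem. This is only a sketch using the variables above; ‘diedafter2017’ is a name made up for illustration, and I am assuming ‘dateofdeath’ is missing for patients with no recorded death.

```stata
* Flag deaths recorded after the intended censoring date.
* Missing values sort above every number in Stata, so exclude them explicitly.
gen byte diedafter2017 = dateofdeath > td(31dec2017) & !missing(dateofdeath)

* Cross-tabulate the flag against the failure indicator created by stset.
* Any patient with diedafter2017 == 1 and _d == 1 is an affected case.
tabulate diedafter2017 _d

* List the affected patients for a closer look.
list patientid dateofdeath timetodeath bcdeath _d if diedafter2017 == 1 & _d == 1
```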

What is going on here? How can I fix my data setup so that women who haven’t died by 31 Dec 2017 are censored at that date and aren’t counted as deaths?

Any help would be much appreciated.

Thanks!