Dear all,
I am trying to restructure/compress my dataset because it is currently too big to work with. In its current form I have around 200,000 individuals, each observed in 1,080 time periods (days) – giving me a dataset with more than 200 million observations.
I am using it for a survival analysis and its current form looks like this:
id t0 t1 y var1 var2 var3
1 0 1 0 0 0 4
1 1 2 0 2 1 4
1 2 3 0 2 1 4
1 3 4 0 3 1 4
1 4 5 1 5 0 4
I.e. individual 1’s failure time is t1==5.
var1 and var2 are time-varying variables and var3 is constant.
I am mainly interested in the effect of the time-varying variables var1 and var2.
For instance, I want to run the following cox regression model
stset t1, failure(y==1) time0(t0) id(id)
stcox var1 var2 var3
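As a quick sanity check after -stset-, something like the following should confirm that the data are declared as intended and which covariates actually vary within subjects (a sketch, run on the long-form data):

stset t1, failure(y==1) time0(t0) id(id)
stdescribe                  // per-subject record counts, entry/exit times, failures
stvary var1 var2 var3       // reports which variables are constant vs. varying within id

With the example data above, -stvary- should report var1 and var2 as varying within id and var3 as constant.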
Here, the exponentiated coefficient of var1 is the hazard ratio associated with a one-unit increase in var1 on a given day (equivalently, 100*(HR-1) is the percentage change in the hazard).
However, because of the size of the dataset I am thinking about restructuring to something like:
id t0 t1 y var1 var2 var3
1 0 1 0 0 0 4
1 1 3 0 2 1 4
1 3 4 0 3 1 4
1 4 5 1 5 0 4
I.e. I want to collapse consecutive rows where var1 and var2 don't change, so I end up with fewer observations. A plain collapse by(id var1 var2 y) would also merge non-adjacent rows that happen to share the same covariate values, so I first tag runs of constant covariates and collapse within them:
bysort id (t0): gen spell = sum(var1 != var1[_n-1] | var2 != var2[_n-1])
collapse (first) t0 (last) t1 (max) y (first) var3, by(id spell var1 var2)
My question is:
Will the interpretation of var1 remain the same? I.e. will Stata still know that individual 1 had var1=2 at both t1=2 and t1=3, even though those two days are now a single row spanning t0=1 to t1=3?
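For what it's worth, one way to check this empirically before committing to the restructured data would be to fit the model on both versions and compare the estimates – something along these lines (a sketch; the filenames are placeholders for wherever the two versions are saved):

* fit on the original long-form data
use long_data, clear        // placeholder filename
stset t1, failure(y==1) time0(t0) id(id)
stcox var1 var2 var3
estimates store long_form

* fit on the collapsed data
use collapsed_data, clear   // placeholder filename
stset t1, failure(y==1) time0(t0) id(id)
stcox var1 var2 var3
estimates store collapsed

* coefficients and standard errors should match if the collapse preserved the risk sets
estimates table long_form collapsed, b se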