I am trying to restructure/compress my dataset because it is currently too big to work with. In its current form I have around 200,000 individuals, each observed over 1,080 time periods (days), giving a dataset with more than 200 million observations.
I am using it for a survival analysis and its current form looks like this:
id | t0 | t1 | y | var1 | var2 | var3 |
1 | 0 | 1 | 0 | 0 | 0 | 4 |
1 | 1 | 2 | 0 | 2 | 1 | 4 |
1 | 2 | 3 | 0 | 2 | 1 | 4 |
1 | 3 | 4 | 0 | 3 | 1 | 4 |
1 | 4 | 5 | 1 | 5 | 0 | 4 |
var1 and var2 are time-varying variables and var3 is constant.
I am mainly interested in the effect of the time-varying variables var1 and var2.
For instance, I want to run the following Cox regression model:
stset t1, failure(y==1) time0(t0) id(id)
stcox var1 var2 var3
Here, the exponentiated coefficient of var1 is the hazard ratio associated with a one-unit increase in var1 on a given day (equivalently, 100 × (HR − 1) is the percentage change in the hazard).
However, because of the size of the dataset, I am thinking about restructuring it to something like:
id | t0 | t1 | y | var1 | var2 | var3 |
1 | 0 | 1 | 0 | 0 | 0 | 4 |
1 | 1 | 3 | 0 | 2 | 1 | 4 |
1 | 3 | 4 | 0 | 3 | 1 | 4 |
1 | 4 | 5 | 1 | 5 | 0 | 4 |
collapse (first) t0 (last) t1 (first) var3, by(id var1 var2 y)
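One concern with collapsing on by(id var1 var2 y) is that it would also merge two separate, non-adjacent spells that happen to share the same covariate values. A sketch of a variant that only merges *consecutive* rows with identical values (assuming the data are sorted by id and t0; the spell variable is just a temporary counter I introduce here):

```stata
* Break the data into spells: a new spell starts whenever
* var1, var2, or y changes from the previous row of the same id.
bysort id (t0): gen spell = sum(var1 != var1[_n-1] | var2 != var2[_n-1] | y != y[_n-1])

* Within each spell the covariates are constant, so keep the
* first start time, the last end time, and the (constant) values.
collapse (min) t0 (max) t1 (first) y var1 var2 var3, by(id spell)
drop spell
```

This should give the same episodes as the collapse above whenever no covariate pattern repeats within an individual, but keeps repeated patterns as separate episodes.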
My question is:
Will the interpretation of var1 remain the same? I.e., will Stata still know that individual 1 had var1=2 during the periods ending at t1=2 and t1=3?
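One way I could sanity-check this empirically (a sketch; on the full data this would be slow, so perhaps on a random subsample of ids) is to fit the model on both versions and compare the coefficients:

```stata
* Fit on the long (one row per day) form and store the estimates.
preserve
stset t1, failure(y==1) time0(t0) id(id)
stcox var1 var2 var3
estimates store long_form
restore

* Collapse to episodes as proposed above, refit, and compare.
collapse (first) t0 (last) t1 (first) var3, by(id var1 var2 y)
stset t1, failure(y==1) time0(t0) id(id)
stcox var1 var2 var3
estimates store short_form

estimates table long_form short_form, eform se
```

If the restructuring preserves the information, the two columns of hazard ratios should match.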