I naively threw a fairly sizable dataset with over 30,000 observations at stintcox and couldn't really tell whether I was going to get convergence in a reasonable time. The data is Case-I interval-censored, which is to say each subject is observed at a single point in time, and the model has a single covariate. The longest observable survival time is around 150.
To get a rough estimate of how it scales, I reran the command on small random samples of the data, starting with 1,000 observations and going up to 15,000, and timed each run, using the favorspeed option. I'm using 4-core Stata/MP, and according to the latest Stata/MP Performance Report, the 4-core speedup is only 1.3x and doesn't scale much beyond that at 16 cores. This is going to be a little quick-and-dirty, so take the results with a grain of salt. I got the following results:
Observations | Time (s) | Time increase vs. previous row |
1000 | 4.1 | |
2000 | 10.8 | 167% |
3000 | 19.1 | 77% |
4000 | 36.2 | 89% |
5000 | 63.6 | 76% |
6000 | 63.5 | 0% |
7000 | 110.9 | 75% |
8000 | 147.3 | 33% |
9000 | 172.1 | 17% |
10000 | 218.1 | 27% |
11000 | 250.3 | 15% |
12000 | 346.5 | 38% |
13000 | 425.6 | 23% |
14000 | 488.5 | 15% |
15000 | 592.1 | 21% |
I was initially worried that the time was going to nearly double with every additional 1,000 observations, which would have made estimation on the whole dataset infeasible, but the relative increase per step seems to drop off from around the 8,000-observation mark. I'm guessing estimation on the full dataset will converge in a couple of hours, though I don't have a sense of how additional covariates are going to impact things.
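One rough way to sanity-check that guess is to fit a power law, time = a * n^b, to the timings in the table by least squares on the log-log scale and extrapolate to the full sample. The sketch below does only that arithmetic, in Python rather than Stata. The power-law form is an assumption rather than anything documented about stintcox, the helper names are made up for illustration, and 30,000 stands in for the "over 30,000" observations mentioned above.

```python
import math

# Timings copied from the table above: (observations, seconds).
timings = [
    (1000, 4.1), (2000, 10.8), (3000, 19.1), (4000, 36.2), (5000, 63.6),
    (6000, 63.5), (7000, 110.9), (8000, 147.3), (9000, 172.1),
    (10000, 218.1), (11000, 250.3), (12000, 346.5), (13000, 425.6),
    (14000, 488.5), (15000, 592.1),
]


def fit_power_law(points):
    """Least-squares fit of log(t) = log(a) + b * log(n); returns (a, b)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # scaling exponent
    a = math.exp(mean_y - b * mean_x)    # multiplicative constant
    return a, b


def predict_seconds(a, b, n):
    """Predicted run time in seconds for n observations under t = a * n**b."""
    return a * n ** b


a, b = fit_power_law(timings)
full_n = 30_000  # assumption: the post only says "over 30,000"
estimate = predict_seconds(a, b, full_n)

print(f"fitted exponent b = {b:.2f}")
print(f"estimated time at n = {full_n}: {estimate / 60:.0f} minutes")
```

An exponent near 1 would mean roughly linear scaling and an exponent near 2 roughly quadratic. Either way this is an extrapolation to twice the largest timed sample from 15 noisy single runs with one covariate, so it is an order-of-magnitude check at best.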
Anyhow, to the meat of my question: does anyone have experience fitting stintcox on datasets with 20,000+ observations under different models? If so, do you have any rough observations about the speed of convergence in those cases, or about particular data-structure issues to look out for that might make convergence practically infeasible?