Dear statalist users,
I used the following code for internal validation of our model, and it produces a new file containing area (the bootstrap AUC), diff (bootstrap AUC minus the base AUC; is there only one base AUC?), and optimism (the final AUC, I suppose?) for 200 bootstrap samples. Here is my code, based on the suggestion on your website:
capture program drop optimism
program define optimism, rclass
preserve
bsample
logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma
lroc, nograph
return scalar area_bootstrap = r(area)
end
logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma
lroc, nograph
local base_ROC = r(area)
tempfile sim_results
simulate area = r(area_bootstrap), reps(200) seed(12345) saving(`sim_results'): optimism
use `sim_results', clear
sum area
gen diff = area - `base_ROC'
gen optimism = `base_ROC' - diff
sum area
sum diff
sum optimism
_pctile optimism, p(2.5 50 97.5)
return list
According to TRIPOD explanation and elaboration, the bootstrap validation should include 6 steps:
1. Develop the prediction model in the original data and determine the apparent AUC.
2. Generate a bootstrap sample.
3. Develop a model using the bootstrap sample (applying the same modeling and predictor-selection methods), determining both the apparent performance of the model on the bootstrap sample and the test performance of the bootstrap model in the original sample. (My question: which code tests the performance of the bootstrap model in the original sample?)
4. Calculate the optimism as the difference between the bootstrap performance and the test performance. (Is the single base AUC the test performance?)
5. Repeat steps 2 through 4 200 times.
6. Average the estimates of optimism from step 5, and subtract that average from the apparent performance obtained in step 1 to obtain the optimism-corrected estimate of performance.
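If the per-replicate optimism is approximated by diff in the code above (bootstrap AUC minus the base AUC), step 6 can be sketched as follows, assuming `base_ROC' still holds the apparent AUC from step 1:

```stata
* after: use `sim_results', clear  and  gen diff = area - `base_ROC'
sum diff, meanonly
display "Mean optimism          = " %6.4f r(mean)
display "Optimism-corrected AUC = " %6.4f `base_ROC' - r(mean)
```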
The main question is: where is the code for the test performance (the performance of the bootstrap model in the original sample)? Should we use the apparent performance obtained in step 1 instead of the test performance?
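One way to add the missing half of step 3 is, inside the program, to restore the original data after fitting on the bootstrap sample, apply the bootstrap model to it with predict, and compute the test AUC with roctab. This is a hedged sketch using the variable names from the code above, not a definitive implementation:

```stata
capture program drop optimism
program define optimism, rclass
    preserve
    bsample
    logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma
    lroc, nograph
    return scalar area_boot = r(area)     // apparent AUC in the bootstrap sample
    restore                               // back to the original data
    tempvar phat
    predict `phat', pr                    // bootstrap model applied to the original sample
    roctab AO `phat'
    return scalar area_test = r(area)     // test AUC in the original sample
    return scalar optimism = return(area_boot) - return(area_test)
end

simulate boot = r(area_boot) test = r(area_test) opt = r(optimism), ///
    reps(200) seed(12345): optimism
sum opt
```

The mean of opt is then subtracted from the apparent AUC of step 1 to get the optimism-corrected estimate.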
Another question: which function converts the linear predictor to a predicted probability for a Cox regression model? I know the function for a logistic regression model is invlogit().
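For Cox regression there is no single invlogit()-style function, because the predicted probability also depends on the baseline survival at a given time t: P(event by t) = 1 - S0(t)^exp(xb). A hedged sketch (the covariate names x1 and x2 are placeholders, and the data must already be stset):

```stata
stcox x1 x2, basesurv(s0)        // store baseline survival S0(t)
predict xb, xb                   // linear predictor
gen prob_event = 1 - s0^exp(xb)  // P(event by each record's analysis time)
```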
Many thanks!