Dear Statalist users,
I used the following code for internal validation of our model and obtained a new file containing area (the bootstrap AUC), diff (bootstrap AUC minus the base AUC; is there only one base AUC?), and optimism (the final AUC, I suppose?) for 200 bootstrap samples. Here is my code, based on a suggestion found on this site:
capture program drop optimism
program define optimism, rclass
    preserve
    bsample                 // draw a bootstrap sample (with replacement)
    logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma
    lroc, nograph
    return scalar area_bootstrap = r(area)   // AUC of the bootstrap model on the bootstrap sample
end
logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma
lroc, nograph
local base_ROC = r(area)
tempfile sim_results
simulate area = r(area_bootstrap), reps(200) seed(12345) saving(`sim_results'): optimism
use `sim_results', clear
sum area
gen diff = area - `base_ROC'
gen optimism = `base_ROC' - diff
sum area
sum diff
sum optimism
_pctile optimism, p(2.5 50 97.5)
return list
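The quantity TRIPOD calls the optimism-corrected AUC is the apparent AUC minus the average optimism, i.e. the mean of diff subtracted from the base AUC. A minimal sketch against the variables created above, assuming the local `base_ROC' from the earlier lroc call still holds the apparent AUC:

```stata
quietly summarize diff                     // mean of diff = average optimism
display "average optimism       = " r(mean)
display "optimism-corrected AUC = " `base_ROC' - r(mean)
```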
According to the TRIPOD explanation and elaboration, bootstrap validation should comprise six steps:
1. Develop the prediction model in the original data and determine the apparent AUC.
2. Generate a bootstrap sample.
3. Develop a model using the bootstrap sample (applying the same modeling and predictor-selection methods), and determine both the apparent performance of that model on the bootstrap sample and the test performance of the bootstrap model in the original sample. (My question: which code computes the test performance of the bootstrap model in the original sample?)
4. Calculate the optimism as the difference between the bootstrap performance and the test performance. (Is the single base AUC the test performance?)
5. Repeat steps 2 through 4 200 times.
6. Average the estimates of optimism from step 5 and subtract that value from the apparent performance obtained in step 1 to obtain the optimism-corrected estimate of performance.
The main question is: where is the code for the test performance (the performance of the bootstrap model in the original sample)? Should we use the apparent performance obtained in step 1 instead of the test performance?
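One way to obtain the test performance inside the program is to fit the model on the bootstrap sample, then restore the original data and score that same fitted model there with predict and roctab, since predict applies the most recent estimates to whatever data are in memory. A sketch under that approach (same model as above; roctab returns the AUC in r(area)):

```stata
capture program drop optimism
program define optimism, rclass
    preserve
    bsample                                  // step 2: bootstrap sample
    logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma
    lroc, nograph
    local a_boot = r(area)                   // step 3a: apparent AUC in the bootstrap sample
    restore                                  // back to the ORIGINAL data
    tempvar p
    predict double `p', pr                   // bootstrap-model probabilities in the original data
    roctab AO `p'                            // step 3b: test AUC in the original sample
    local a_test = r(area)
    return scalar area_boot = `a_boot'
    return scalar area_test = `a_test'
    return scalar optimism  = `a_boot' - `a_test'   // step 4
end

tempfile sim_results
simulate boot = r(area_boot) test = r(area_test) opt = r(optimism), ///
    reps(200) seed(12345) saving(`sim_results'): optimism   // step 5
summarize opt                                // mean of opt = average optimism for step 6
```

Under this sketch, the base AUC from step 1 enters only in step 6; the test performance is re-estimated in every replication rather than fixed at the apparent value.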
Another question: what command converts the linear predictor to a predicted probability for a Cox regression model? I know the command for a logistic regression model is invlogit().
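For a Cox model there is no direct analogue of invlogit(), because the linear predictor from stcox has no intercept; predicted probabilities go through the baseline survival function, S(t|x) = S0(t)^exp(xb). A sketch using stcox postestimation (the covariates here are placeholders for your own, and the data must already be stset):

```stata
stcox agec i.sex                       // placeholder model
predict double xb, xb                  // linear predictor
predict double s0, basesurv            // baseline survival S0(t) at each subject's _t
gen double surv = s0^exp(xb)           // predicted probability of surviving past _t
gen double risk = 1 - surv             // predicted event probability by time _t
```

Note that s0 is evaluated at each observation's own analysis time _t; for a fixed horizon (say, 5 years) you would take the value of s0 at the _t closest to that horizon before raising it to exp(xb).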
Many thanks!