Hi all,

I want to develop a logistic regression model for disease relapse within the first year of a certain treatment and assess its performance.

There were missing data, so I created multiple imputed data sets. I fitted a model in each of these sets and chose predictors for a final model. The final model has 6 predictors and looks like this:
"logit relapse duration esr hb sex mh_card mh_malign"

The first problem occurs when I want to bootstrap-validate the model.
For this I fit the model with the 6 predictors in a bootstrap sample (drawn with bsample from one of the imputed data sets) and extract its coefficients.
These coefficients then have to be applied to the original sample, and the model fit (likelihood) and predicted probabilities have to be extracted.
This is where I run into a problem: I cannot find a way to force the entire pre-specified model, with its fixed coefficients, onto the original sample and extract not just the predictions but also the model likelihood.
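
To make the first problem concrete, here is a minimal sketch of the kind of thing I am after. It is not a tested solution. It assumes the data are in a long mi style so that _mi_m==1 selects the first imputed set, that relapse is coded 0/1, and that the names b_boot, xb_boot, ll_i, ll_applied and p_boot (and the seed) are free to use; they are made up for illustration. The idea is to build the linear predictor from the stored coefficient vector with matrix score and to sum the Bernoulli log-likelihood by hand, so that nothing is refitted in the original sample.

```stata
* Sketch only: fit in a bootstrap sample, then evaluate the fixed coefficients
* in the original sample without refitting
set seed 12345                            // hypothetical seed, for reproducibility only
preserve
keep if _mi_m == 1
bsample                                   // bootstrap sample of the first imputed set
quietly logit relapse duration esr hb sex mh_card mh_malign
matrix b_boot = e(b)                      // bootstrap coefficients, including _cons
restore                                   // back to the original data; b_boot is kept

* Linear predictor in the original sample, using the bootstrap coefficients as they are
matrix score double xb_boot = b_boot if _mi_m == 1

* Log-likelihood contribution of each observation under the fixed coefficients
generate double ll_i = relapse*ln(invlogit(xb_boot)) + (1 - relapse)*ln(invlogit(-xb_boot)) if _mi_m == 1
quietly summarize ll_i
scalar ll_applied = r(sum)                // log-likelihood of the fixed model in the original sample

* Predicted probabilities from the same fixed coefficients
generate double p_boot = invlogit(xb_boot) if _mi_m == 1
brier relapse p_boot if _mi_m == 1
```

If this is sound, ll_applied could take the place of e(ll) in the R2 calculation further down, with the intercept-only log-likelihood still coming from an ordinary fit in the original sample.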

The second problem is related to the first. It arises when I want to adjust the model coefficients by a shrinkage factor and then extract the likelihood and predictions.
- As an example: the regression coefficient of 'esr' is 1.5; multiplying it by 0.90 (the shrinkage factor) gives an adjusted coefficient of 1.35.
- Now I want to use all of the adjusted coefficients (the 1.35 in the example) and extract the likelihood and predictions of this adjusted model.
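
As a sketch of what the shrinkage step might look like (again untested, and resting on assumptions): the factor 0.90 is just the value from the example above, the names s, b_adj, xb_adj, ll_adj and p_adj are made up, and I assume that only the slopes are shrunk while the intercept is re-estimated with the shrunken linear predictor entered as an offset. That last part is one possible choice, not necessarily the right one.

```stata
* Sketch only: shrink the slopes, re-estimate the intercept, extract likelihood and predictions
quietly logit relapse duration esr hb sex mh_card mh_malign if _mi_m == 1
matrix b_adj = e(b)
local s = 0.90                            // hypothetical shrinkage factor from the example
matrix b_adj = `s' * b_adj                // e.g. a coefficient of 1.5 becomes 1.35
local c = colsof(b_adj)                   // _cons is the last column of e(b)
matrix b_adj[1, `c'] = 0                  // discard the old intercept; re-estimated below

* Linear predictor from the shrunken slopes only
matrix score double xb_adj = b_adj if _mi_m == 1

* Intercept-only logit with the shrunken linear predictor held fixed as an offset
quietly logit relapse if _mi_m == 1, offset(xb_adj)
scalar ll_adj = e(ll)                     // log-likelihood of the shrunken model
predict double p_adj if _mi_m == 1, pr    // predicted probabilities, offset included
```

If the intercept should be kept fixed as well, I suppose the log-likelihood would have to be summed by hand from the linear predictor instead of being read from e(ll).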

I am using Stata/IC version 13.1 for Windows.

Code:
* the basic model for the first imputed data set
logit  relapse duration esr hb sex  mh_card mh_malign if _mi_m==1
* below: compute a heuristic uniform shrinkage factor and adjust the coefficients by it
generate df = e(df_m)
generate Chi = e(chi2)   // assumes Chi is meant to be the model LR chi-squared of the logit above
generate shrinkage = (Chi-df)/Chi
foreach var of varlist B_* {
                generate adj`var' = `var' * shrinkage
                * here B_ are the (unadjusted) regression coefficients of the final (original) model (previously saved as variables)
}
 
* below is the prediction model for the first imputed data set (_mi_m==1) and the process I (tried to) use to extract the model performance I want
local modelv duration esr hb sex  mh_card mh_malign
qui logit relapse `modelv' if _mi_m==1 // here we want to use our own, pre-determined, coefficients/model
sca n_obs = e(N)
sca llF = e(ll)       // this is what I can't seem to calculate for pre-specified coefficients
sca llR = e(ll_0) // this can be calculated using an intercept only model
sca r2ml = 1 - exp((2*(llR-llF)/n_obs))
sca r2cu = (r2ml)/(1-exp(2*llR/n_obs))
qui generate R2BO = r2cu            //  nagelkerke's R2
qui predict predictedBO if _mi_m==1 // for performance based on predictions                 
qui brier relapse predictedBO // e.g. Brier score as a performance measure

With kind regards,

Thomas Bolhuis