Dear Statalisters,

I am currently trying to develop a mapping algorithm from a disease-specific HRQoL instrument to the EQ-5D-5L.

I want to validate the models internally, and I am planning to use the user-written command crossfold (available from SSC) for k-fold cross-validation.

For example, this is what I run for the mixed-effects linear regression:

crossfold xtreg eq5d diseasescore age i.gender, mle k(10)

However, I am not sure how to proceed with the cross-validation when I run a two-part model.

In this two-part model, I first fit a mixed-effects logistic regression to predict the probability that a respondent is in full health (EQ-5D-5L index score equal to one). In the second part, I fit a mixed-effects linear regression restricted to respondents who are not in full health. The overall expected EQ-5D-5L index score is then calculated with an expected-value approach:

Expected EQ-5D = Pr(full health) * 1 + (1 - Pr(full health)) * predicted score for those not in full health

where Pr(full health) is the predicted probability from the first part and 1 is the index score for full health.

Basically, I do not know how to do cross-validation in this specific case. I know I could use the crossfold command on each part of the model separately (the logistic regression or the linear regression), but I am interested in the prediction error of the final EQ-5D values obtained by combining the two parts.
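
One route I can think of is to code the folds by hand, so that both parts are fitted on the same training folds and combined before the error is computed. Below is a minimal sketch of that idea, not something I have validated. It assumes a panel identifier called id and uses made-up variable names (full, fold, p_full, xb_imp, eq5d_hat). It also assumes xtlogit, re for the first part (I have not settled on the estimator), assigns whole panels rather than single observations to folds, and predicts with the random effect set to zero (pu0 and xb). All of these are my own choices and may not be appropriate.

```stata
* Hypothetical names: id, full, fold, p_full, xb_imp and eq5d_hat are
* illustrative only; eq5d, diseasescore, age and gender are as above.
set seed 12345
xtset id

* Indicator for full health (EQ-5D-5L index score equal to one)
generate byte full = (eq5d == 1) if !missing(eq5d)

* Assign each panel (not each observation) to one of 10 folds
bysort id: generate double u = runiform() if _n == 1
by id: replace u = u[1]
generate byte fold = ceil(10 * u)

generate double eq5d_hat = .

forvalues i = 1/10 {
    * Part 1: probability of full health, fitted on the other nine folds
    quietly xtlogit full diseasescore age i.gender if fold != `i', re
    quietly predict double p_full if fold == `i', pu0

    * Part 2: score among respondents not in full health, same training folds
    quietly xtreg eq5d diseasescore age i.gender if fold != `i' & full == 0, mle
    quietly predict double xb_imp if fold == `i', xb

    * Expected value: Pr(full) * 1 + (1 - Pr(full)) * E(score | not full)
    quietly replace eq5d_hat = p_full + (1 - p_full) * xb_imp if fold == `i'
    drop p_full xb_imp
}

* Out-of-sample errors, pooled over folds and by fold
generate double abserr = abs(eq5d - eq5d_hat)
generate double sqerr  = (eq5d - eq5d_hat)^2

quietly summarize abserr
display "MAE  = " r(mean)
quietly summarize sqerr
display "RMSE = " sqrt(r(mean))

tabstat abserr sqerr, by(fold) statistics(mean)
```

Because ceil(10 * u) is applied to one random draw per panel, the folds are only approximately equal in size. I would be grateful to hear whether this is a sensible way to cross-validate the combined prediction, or whether there is a way to get crossfold to do it.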

I am using Stata 15.

Thanks a lot!