Dear Statalisters,
I'm using Stata 17.0.
I have a composite outcome with three components: 1) status (positive vs negative) at T1; 2) status at T2; 3) status at T3. The outcome is "positive at one or more timepoints vs always negative". If I dropped observations with missing values, I would also lose those that I know have outcome = 1, because they are positive at one or more timepoints. If I kept the observations that are positive at some timepoint but have missing values elsewhere, while dropping those whose observed components are all negative, I would overestimate the outcome probability: among observations with 1 or 2 missing values, only the positive ones could be retained, never the negative ones. So I need to impute. Specifically, I:
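A minimal Python sketch of the three-way classification described above (illustrative only, not Stata code; the function name `composite_outcome` is hypothetical and `None` stands for a missing status). An observation is known positive if any observed status is 1, known negative only if all three statuses are observed and equal to 0, and undetermined otherwise; only the undetermined group actually needs imputed values.

```python
from typing import Optional, Sequence


def composite_outcome(statuses: Sequence[Optional[int]]) -> Optional[int]:
    """Composite outcome from (T1, T2, T3); None marks a missing status.

    Returns 1 if any observed status is positive, 0 if all three are observed
    negative, and None if the outcome cannot be determined from the data.
    """
    if any(s == 1 for s in statuses):
        return 1      # known positive, whatever the missing values are
    if all(s == 0 for s in statuses):
        return 0      # known to be always negative
    return None       # only zeros and missings: needs imputation


records = [(1, None, None), (0, 0, 0), (0, None, 0), (None, None, None)]
print([composite_outcome(r) for r in records])  # [1, 0, None, None]
```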
A) used the "mi impute chained (logit)" command to impute the three status variables, with one binary predictor ("treatment", which is also my variable of interest in the final model) and 10 imputations;
B) built my composite outcome with the "mi passive" command;
C) ran a logistic regression of the composite outcome on treatment with "mi estimate";
D) after observing that the largest FMI (fraction of missing information) was 0.4082, repeated steps A-C with 50 imputations (to satisfy the rule of thumb of at least 100*LFMI imputations);
E) after observing that the largest FMI with 50 imputations was 0.4379, accepted the results and moved on to post-estimation.
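For reference, the FMI reported by "mi estimate" comes from Rubin's combining rules. The Python sketch below (an illustration, not Stata's implementation; `rubin_pool` is a hypothetical name) pools one coefficient across m imputations and computes the FMI using Rubin's large-sample degrees of freedom. Stata's reported values may differ slightly, for example when small-sample degrees-of-freedom adjustments apply. The numbers in the example are arbitrary and are not results from this analysis.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool one coefficient across m >= 2 imputations with Rubin's rules.

    estimates: the coefficient from each completed-data fit
    variances: its squared standard error from each fit
    Returns (pooled estimate, pooled standard error, FMI).
    """
    m = len(estimates)
    q_bar = mean(estimates)              # pooled point estimate
    w = mean(variances)                  # average within-imputation variance
    b = variance(estimates)              # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b              # total variance
    if b == 0:
        return q_bar, math.sqrt(t), 0.0  # no between-imputation variability
    r = (1 + 1 / m) * b / w              # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2      # Rubin's large-sample degrees of freedom
    fmi = (r + 2 / (nu + 3)) / (r + 1)   # fraction of missing information
    return q_bar, math.sqrt(t), fmi


# Arbitrary illustrative numbers (m = 5), not results from this analysis
est, se, fmi = rubin_pool([0.50, 0.60, 0.55, 0.45, 0.65], [0.01] * 5)
print(round(est, 3), round(se, 4), round(fmi, 3))
```

The FMI grows with the between-imputation variance relative to the within-imputation variance, which is why it is itself re-estimated (and can move, as between steps D and E) when m changes.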
I have, however, several doubts about this approach.
1) Does it work better than simply estimating, for each group separately, the probability of having outcome = 1 (e.g. P(T3=1 | T1=0, T2=0, treatment=1)), replacing the missing outcomes with these probabilities, and then running a fractional logit regression? After all, there would be only 7*2 = 14 probabilities to estimate: 7 combinations of T1, T2 and T3 each being either missing or 0 (when all three are 0, or any of them is 1, no imputation is required), times 2 treatment statuses. I understand that probabilities estimated this way are not observed values, so I suspect I would underestimate the standard errors, but this approach seems much more intuitive to me.
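The count of 14 probabilities can be checked by enumeration. The Python sketch below (illustrative; `fractional_outcome` and the placeholder probabilities in `p_hat` are hypothetical) lists the 7 patterns in which every component is 0 or missing but not all are 0, crosses them with the two treatment groups, and builds the fractional outcome that would be passed to a fractional logit: 1 or 0 when the outcome is known, the estimated cell probability otherwise.

```python
from itertools import product

MISSING = None

# Patterns where each component is 0 or missing, excluding (0, 0, 0):
# any observed 1 already fixes the outcome at 1, and (0, 0, 0) fixes it at 0.
patterns = [p for p in product((0, MISSING), repeat=3) if p != (0, 0, 0)]
cells = [(p, treatment) for p in patterns for treatment in (0, 1)]
print(len(patterns), len(cells))  # 7 14


def fractional_outcome(statuses, treatment, p_hat):
    """Known outcomes stay 0/1; undetermined ones get the estimated cell probability."""
    if any(s == 1 for s in statuses):
        return 1.0
    if all(s == 0 for s in statuses):
        return 0.0
    return p_hat[(tuple(statuses), treatment)]


# Placeholder probabilities, NOT estimates. In practice each would be estimated,
# e.g. P(T3=1 | T1=0, T2=0, treatment=1) for the cell ((0, 0, None), 1).
p_hat = {cell: 0.5 for cell in cells}
print(fractional_outcome((0, MISSING, 0), 1, p_hat))  # 0.5
```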
2) Is the rule of thumb of 100*LFMI imputations still valid when the outcome is binary and the imputed values are components of the outcome, or should I increase the number of imputations?
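The arithmetic of the rule of thumb as applied in steps D and E can be checked in a few lines. This is a Python sketch; `min_imputations` is a hypothetical helper, and the rule itself is a heuristic, not a guarantee.

```python
import math


def min_imputations(largest_fmi: float, multiplier: float = 100) -> int:
    """Smallest integer m satisfying the rule of thumb m >= multiplier * FMI."""
    # Round first so that, e.g., 100 * 0.41 = 41.00000000000001 does not become 42
    return math.ceil(round(multiplier * largest_fmi, 9))


for m_used, fmi in [(10, 0.4082), (50, 0.4379)]:
    needed = min_imputations(fmi)
    print(f"m = {m_used}, largest FMI = {fmi}: need m >= {needed}, ok = {m_used >= needed}")
```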
3) The imputation model estimates how the probability of being positive at a given timepoint differs depending on whether the status is positive or negative at the other timepoints. I'm not actually interested in that: a positive status at any timepoint makes the status at the other timepoints irrelevant. Shouldn't I use a more direct approach, aimed at estimating the probabilities I mention in point 1, i.e. P(Outcome=1 | available information), disregarding all situations where I already know that the outcome is positive? Or, put another way, shouldn't I base my estimates only on P(T1=1 | t2!=1, t3!=1), P(T2=1 | t1!=1, t3!=1) and P(T3=1 | t1!=1, t2!=1)? (Capital letters denote the true value, i.e. either 0 or 1; lower-case letters denote the observed value, which may also be missing.)
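One way to estimate P(Outcome=1 | available information) directly is sketched below in Python, under a loudly stated assumption: the missing components are missing at random given treatment and the observed components, so fully observed records with the same treatment and the same observed zeros represent the incomplete ones. `p_positive_given_observed` is a hypothetical helper, and the toy records are made up for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Record = Tuple[Tuple[int, int, int], int]  # (fully observed statuses, treatment)


def p_positive_given_observed(pattern: Sequence[Optional[int]], treatment: int,
                              complete_records: List[Record]) -> Optional[float]:
    """Estimate P(Outcome=1 | observed components of `pattern` are 0, treatment).

    `pattern` is an undetermined pattern: observed entries are 0, others None.
    ASSUMPTION: missingness is at random given treatment and the observed
    components, so matching complete records represent the incomplete ones.
    """
    obs_idx = [k for k, s in enumerate(pattern) if s is not None]
    miss_idx = [k for k, s in enumerate(pattern) if s is None]
    # Complete records that match the pattern's observed (all-zero) components
    eligible = [st for st, tr in complete_records
                if tr == treatment and all(st[k] == 0 for k in obs_idx)]
    if not eligible:
        return None  # no comparable complete records in this cell
    # Share with at least one positive among the components missing in `pattern`
    positives = sum(any(st[k] == 1 for k in miss_idx) for st in eligible)
    return positives / len(eligible)


# Made-up toy data for illustration only
complete = [((0, 0, 0), 1), ((0, 1, 0), 1), ((0, 0, 1), 1), ((1, 0, 0), 1), ((0, 0, 0), 0)]
print(p_positive_given_observed((0, None, 0), 1, complete))  # 0.5
```

Because the estimated probability is then treated as known, this sketch shares the standard-error concern raised in point 1.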
4) Does the order in which I list the variables to impute matter? I noticed that Stata first imputes T1 using treatment, then T2 using T1 and treatment, then T3 using T1, T2 and treatment, and then re-estimates everything using everything. Does convergence guarantee that the starting point is irrelevant? If not, how can I get rid of this arbitrariness?
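A deterministic Python illustration of the order and starting-point question in an idealized setting: when every univariate conditional is derived exactly from one joint distribution of (T1, T2, T3), each update leaves that joint unchanged, so sequential updates converge to the same distribution whatever the update order and starting state. The joint probabilities are arbitrary made-up weights, and the function names are hypothetical. This does not cover the real "mi impute chained" case, where the logistic models are fitted separately, re-estimated at every iteration, and need not be compatible with any single joint distribution.

```python
from itertools import product

STATES = list(product((0, 1), repeat=3))  # all (T1, T2, T3) combinations


def conditional_update(joint, k):
    """Transition matrix that redraws component k from P(x_k | other components),
    computed exactly from `joint` (dict: state -> probability)."""
    n = len(STATES)
    K = [[0.0] * n for _ in range(n)]
    for i, s in enumerate(STATES):
        # The two states that agree with s everywhere except component k
        alts = [s[:k] + (v,) + s[k + 1:] for v in (0, 1)]
        norm = sum(joint[a] for a in alts)
        for a in alts:
            K[i][STATES.index(a)] = joint[a] / norm
    return K


def step(dist, K):
    """Propagate a distribution over STATES through transition matrix K."""
    n = len(dist)
    return [sum(dist[i] * K[i][j] for i in range(n)) for j in range(n)]


def run_chain(joint, order, start, sweeps=200):
    """Distribution of the state after `sweeps` full passes of sequential
    updates in the given component order, starting from state `start`."""
    dist = [1.0 if s == start else 0.0 for s in STATES]
    mats = [conditional_update(joint, k) for k in order]
    for _ in range(sweeps):
        for K in mats:
            dist = step(dist, K)
    return dist


# Arbitrary made-up positive weights defining a joint distribution
weights = [8, 1, 2, 3, 1, 2, 3, 4]
joint = {s: w / sum(weights) for s, w in zip(STATES, weights)}

forward = run_chain(joint, order=(0, 1, 2), start=(0, 0, 0))
backward = run_chain(joint, order=(2, 1, 0), start=(1, 1, 1))
target = [joint[s] for s in STATES]
print(max(abs(a - b) for a, b in zip(forward, target)))
print(max(abs(a - b) for a, b in zip(backward, target)))
```

Both printed differences should be essentially zero; with incompatible conditionals there is no such guarantee, which is where order can matter in practice.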