Hello,
I am analysing data for a large number of school students (over 1 million).
Data on parental background (3 variables) are missing for about 5 percent of students. Few students are missing all 3 variables.
I am using the mi impute chained command.
Two variables are binary. The other is continuous.
The heavy lifting is being done using Stata/MP on an HPC, in batch mode. I'm running 16 CPUs, but I could increase to 32.
(I did try testing my code on my 6-core desktop... which didn't go well.)
I am generating 20 imputations (m=20), with the data mi set as flong.
The problem is that it has taken 13 hours to generate just one imputation (doing 10 iterations).
At this rate it would take about 11 days to generate 20 imputations.
I can run multiple jobs in parallel.
So... is it an option to run, say, 10 jobs in parallel, where each job generates 2 imputations and uses a different seed, and then append the resulting observations together?
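To make the idea concrete, here is a rough sketch of what one of the 10 batch jobs might look like. The file and variable names (students.dta, mum_degree, dad_degree, hh_income, score, female) and the seed values are made up for illustration, and the job number is assumed to be passed as an argument to the do-file.

```stata
* impute_job.do: sketch of one parallel job (hypothetical names throughout)
* Run in batch mode once per job, e.g.:  stata-mp -b do impute_job.do 3
args job                          // job number, assumed to be 1 to 10

use students, clear               // hypothetical analysis file
mi set flong
mi register imputed mum_degree dad_degree hh_income

* A different seed for each job (arbitrary values, assumed here)
local seed = 1000 + `job'

* Two binary variables via logit, one continuous variable via regress
mi impute chained (logit) mum_degree dad_degree (regress) hh_income ///
    = score female, add(2) burnin(10) rseed(`seed')

save imputed_job`job', replace
```

Each job would then write its own file (imputed_job1.dta to imputed_job10.dta), with 2 imputations in each.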
I believe it is theoretically sound, as imputing with m = 20 is essentially randomly choosing 20 points on the distribution, and each imputation is independent of the others.
So I believe it is no different from imputing with m = 2 ten times, so long as a different seed is used each time.
Does anyone disagree?
If the theory is sound, there is then the question of how to append the 10 files together so that they look like one file, and so that the mi settings etc. still work.
Has anyone got any experience doing this? With the data on the HPC in command line mode, my usual trick of going in to inspect the data isn't possible.
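One possibility I have come across is mi add, which adds the imputations from another mi dataset to the one in memory. Below is a minimal, untested sketch. It assumes 10 files named imputed_job1.dta to imputed_job10.dta (hypothetical names), each holding the same students with 2 imputations, and a hypothetical identifier variable student_id that uniquely identifies each student.

```stata
* combine_jobs.do: sketch of combining the per-job files (hypothetical names)
use imputed_job1, clear

forvalues j = 2/10 {
    * student_id (assumed) matches observations across files;
    * assert(match) is intended to stop if the files do not line up
    mi add student_id using imputed_job`j', assert(match)
}

* Written to the batch log, so the result can be checked without opening
* the data: it should report the flong style and M = 20
mi describe

save imputed_all, replace
```

Since I can't browse the data on the HPC, the mi describe output in the log would be my check that the combined file has the expected number of imputations.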
Regards,
Andrew