Hello,
I am analysing data for a large number of school students (over 1 million).
Data on parental background (three variables) are missing for about 5 percent of students. Few students are missing all three variables.
I am using the mi impute chained command.
Two variables are binary. The other is continuous.
The heavy lifting is being done with Stata/MP on an HPC, in batch mode. I'm running on 16 CPUs, but I could increase to 32.
(I did try testing my code on my 6 core desktop...which didn't go well).
And I am generating 20 imputations (m=20), with the data set as flong.
The problem is that it has taken 13 hours to generate just one imputation (doing 10 iterations).
So at this rate it would take 11 days to generate 20 imputations.
I can run multiple jobs in parallel.
So, is it an option to run, say, 10 jobs in parallel, where each job generates 2 imputations using a different seed,
and then append the resulting observations together?
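A minimal sketch of what one such job might look like, assuming a do-file that receives the job number as an argument. The file name, variable names, covariates and seed rule are hypothetical placeholders, and treating the 10 iterations as the burn-in is also an assumption.

```stata
* impute_job.do: one of 10 parallel jobs, each adding 2 imputations.
* Hypothetical call from the batch script:  stata-mp -b do impute_job.do 3
* Job number (1 to 10), passed as the first argument
args k

* Hypothetical file and variable names
use students, clear
mi set flong
mi register imputed par_degree par_employed par_income

* A different seed per job, so the jobs do not repeat the same draws
local seed = 20000 + `k'

* Two binary variables via logit, one continuous variable via regress;
* the covariates on the right-hand side are placeholders
mi impute chained (logit) par_degree par_employed (regress) par_income ///
    = i.female age, add(2) burnin(10) rseed(`seed')

save imp_job`k', replace
```

Each job starts from the same original data and differs only in its seed, so the ten output files should contain the same observations at m = 0.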
I believe this is theoretically sound, as imputing with m = 20 is essentially taking 20 random draws from the distribution, and each imputation is independent of the others.
So I believe it is no different from imputing with m = 2 ten times, so long as a different seed is used each time.
Does anyone disagree?
If the theory is sound, there is then the question of how to append the 10 files together so that they look like one file and the mi settings etc. still work.
Has anyone got any experience doing this? With the data on the HPC in command line mode, my usual trick of going in to inspect the data isn't possible.
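One possible way to combine the files is sketched below. It assumes that each job saved its result as imp_job1.dta through imp_job10.dta, that all jobs started from the same original data, and that a variable student_id uniquely identifies students (all of these names are hypothetical). Stata's `mi add` command adds the imputations from a file on disk to the mi data in memory, and `mi describe` writes a summary to the batch log, which may stand in for inspecting the data interactively.

```stata
* Start from job 1, which holds the original data plus its 2 imputations
use imp_job1, clear

forvalues k = 2/10 {
    * Add the 2 imputations from job k, matching students on student_id
    mi add student_id using imp_job`k'
}

* Summary goes to the log; check that the number of imputations is 20
mi describe

save imp_all, replace
```

This is only a sketch under the stated assumptions. Whether `mi add` is preferable to a plain append followed by repairing the mi settings is part of what I am asking.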
Regards,
Andrew