Hello,
I am analysing data for a large number of school students (over 1 million).
Data on parental background (comprising 3 variables) are missing for about 5 percent of students. Few students are missing all 3 variables.
I am using the mi impute chained command. Two of the variables are binary and the other is continuous.
The heavy lifting is being done using Stata MP on an HPC, in batch mode. I'm running 16 CPUs, but I could increase to 32. (I did try testing my code on my 6-core desktop, which didn't go well.)
I am generating 20 imputations (m = 20), with the data in flong style.
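For reference, the setup is roughly as follows (variable names are placeholders for my actual ones):

    mi set flong
    mi register imputed parent_bin1 parent_bin2 parent_cont
    mi impute chained (logit) parent_bin1 parent_bin2 (regress) parent_cont ///
        = age i.school, add(20) rseed(12345) burnin(10)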
The problem is that it has taken 13 hours to produce just one imputation (with 10 iterations), so at this rate it would take about 11 days to generate all 20 imputations.
I can run multiple jobs in parallel. So is it an option to run, say, 10 jobs in parallel, each generating 2 imputations and each using a different seed, and then append the resulting observations together?
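Concretely, each job would run a small do-file along these lines, with the seed and a job number passed in on the command line (all names below are placeholders, and I haven't tested this yet):

    * impute_job.do -- launched as, e.g.:  stata-mp -b do impute_job.do 2001 1
    args seed jobnum
    use students, clear
    mi set flong
    mi register imputed parent_bin1 parent_bin2 parent_cont
    mi impute chained (logit) parent_bin1 parent_bin2 (regress) parent_cont ///
        = age i.school, add(2) rseed(`seed') burnin(10)
    save imputed_job`jobnum', replace

The 10 jobs would then just be 10 submissions of that do-file with different seeds and job numbers.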
I believe this is theoretically sound: imputing with m = 20 essentially draws 20 independent points from the same distribution, and each imputation is independent of the others. That should be no different from imputing with m = 2 ten times over, so long as a different seed is used for each run.
Does anyone disagree?
If the theory is sound, there is then the question of how to append the 10 files together so they look like one file, and how to get the mi settings etc. to work.
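From a quick look at the manuals, mi add looks like it could be the relevant command, since it is described as appending the imputations of a using dataset to those of the master, matched on key variables, but I haven't tried it. Something like this, with student_id as a placeholder key and the job files named as above:

    use imputed_job1, clear
    forvalues j = 2/10 {
        mi add student_id using imputed_job`j'
    }
    mi describe        // hoping this now reports M = 20
    save imputed_all, replace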
Has anyone got any experience doing this? With the data on the HPC in batch/command-line mode, my usual trick of going in to inspect the data interactively isn't possible.
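Presumably any checking would have to be done through the log, with something like:

    mi query            // reports the mi style and number of imputations M
    mi describe         // registered variables and missing-value counts
    tabulate _mi_m      // in flong style, observations per imputation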
Regards,
Andrew