I am looking for an efficient way to draw, for each observation, a random record from a dataset and write the values of that drawn record to new variables on the observation.
Example:
I have a dataset of pairs of jobs at two different locations within the same firm. For each pairing, I wish to add a randomly selected job (itself half of some pairing) that is at a different firm, and so may differ in other characteristics such as wage, but that has the same location, SOC, PayFrequency, SalaryType, and Year as the first job in the pairing. I would write the characteristics of this drawn job to new variables, and do this for every observation (pairing).
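To make the target concrete, here is a minimal sketch of the draw logic, written in Python rather than Stata purely as an illustration. The field names (`firm`, `location`, `soc`, `pay_frequency`, `salary_type`, `year`, `wage`), the toy data, and the `draw_matches` helper are all assumptions made for the example, not part of our actual dataset or code.

```python
import random
from collections import defaultdict

# Hypothetical field names; the real variables may be named differently.
CELL_KEYS = ("location", "soc", "pay_frequency", "salary_type", "year")

def draw_matches(jobs, seed=12345):
    """For each job, draw the index of one random job in the same cell
    (same values on CELL_KEYS) that belongs to a different firm."""
    rng = random.Random(seed)
    # Index row positions by cell once, instead of subsetting per observation.
    cells = defaultdict(list)
    for i, job in enumerate(jobs):
        cells[tuple(job[k] for k in CELL_KEYS)].append(i)

    draws = []
    for job in jobs:
        members = cells[tuple(job[k] for k in CELL_KEYS)]
        candidates = [j for j in members if jobs[j]["firm"] != job["firm"]]
        # None marks observations with no eligible job at another firm.
        draws.append(rng.choice(candidates) if candidates else None)
    return draws

def make_job(firm, location, wage):
    # Toy record: every key other than firm, location, and wage is held fixed.
    return {"firm": firm, "location": location, "soc": "15-1132",
            "pay_frequency": "annual", "salary_type": "base",
            "year": 2017, "wage": wage}

jobs = [make_job("A", "NYC", 90000), make_job("B", "NYC", 85000),
        make_job("C", "NYC", 99000), make_job("A", "BOS", 80000)]

draws = draw_matches(jobs)
for job, d in zip(jobs, draws):
    # Write the drawn job's characteristics to new "variables".
    job["draw_firm"] = jobs[d]["firm"] if d is not None else None
    job["draw_wage"] = jobs[d]["wage"] if d is not None else None
```

In this sketch the data are grouped once and each draw only touches its own cell, so nothing is joined or written to disk. The candidate filter still scans the whole cell for every observation, so very large cells would probably need something cheaper, such as drawing a random position in the cell and redrawing when the firm matches.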
More detail on what we tried:
So far we have tried looping over individual observations, which was not efficient (computationally infeasible with millions of observations). We tried two variants of the loop. In the first, we kept the appropriate subset of the data, sampled a single observation from that subset, and saved it to disk. In the second, we preserved the full dataset, kept the appropriate subset, sampled an observation, wrote all of its variable values to macros, restored the full dataset, and then wrote the macro values to the current observation.
We also tried joining on the characteristics that must match and then keeping only the rows in which the drawn job was from a different firm and met the other criteria, but this rapidly exceeded available memory.
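A rough way to see why the join runs out of memory is to count the rows it creates before any filtering: every pairing is matched with every job in its cell. The sketch below (Python, illustrative only; `join_rows` and the cell labels are hypothetical) does that count.

```python
from collections import Counter

def join_rows(pair_cells, job_cells):
    """Number of rows produced by joining pairings to jobs on the cell keys.

    pair_cells: one cell label per pairing
    job_cells:  one cell label per candidate job
    """
    pairs = Counter(pair_cells)
    jobs = Counter(job_cells)
    # Each pairing in a cell is matched with every job in the same cell.
    return sum(n * jobs[cell] for cell, n in pairs.items())

# A single cell holding 1,000 pairings and 1,000 candidate jobs.
rows = join_rows(["c1"] * 1000, ["c1"] * 1000)
print(rows)  # 1000000
```

The count grows with the product of the two cell sizes, so a handful of large cells can dominate the total even when most cells are small.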
Thanks in advance for any help. I should mention we are using Stata 15.