Hello,

I am constructing a pooled dataset from the Demographic and Health Surveys (DHS).

Currently, all my data are stored in one large folder, in which each file holds the data for one country and survey year, for example Albania 2008, Albania 2014, Kenya 2003, Kenya 2008.

I have harmonized all the datasets to ensure that the variables are equivalent. The next stage is to append them to each other.

However, there is a set of operations that I need to run on each dataset before appending/pooling them. These include computing new weights (for denormalization) and creating country-level summaries (for example, GNP and HIV prevalence).

At the moment, I open each dataset, execute the commands for that dataset, close it, and open the next, and so on. One option is to write a master do-file that opens each dataset in turn, runs the commands, saves the output, and moves on to the next. I think there must be a more efficient way of doing this: some sort of loop that reads in each dataset and writes out a new dataset with the changes.
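
Something like the following is roughly what I have in mind. It is only a sketch: the two folder paths and the do-file name prepare_one.do are made up for illustration, and it assumes every file in the input folder is a Stata .dta file that needs the same set of commands.

```stata
* Hypothetical folders: adjust to where the harmonized files actually live.
local indir  "C:/dhs/harmonized"
local outdir "C:/dhs/prepared"

* Collect the names of all .dta files in the input folder.
local files : dir "`indir'" files "*.dta"

foreach f of local files {
    * Open one country-year file.
    use "`indir'/`f'", clear

    * Placeholder for the per-dataset steps (weights, country-level summaries).
    * For example, uncomment to run a hypothetical do-file on the data in memory:
    * do "prepare_one.do"

    * Save the modified copy under the same name in the output folder.
    save "`outdir'/`f'", replace
}
```

Writing to a separate output folder would leave the harmonized originals untouched, so the loop could be rerun from scratch if one of the steps turns out to be wrong.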

I would appreciate some thoughts on whether this is possible/feasible.

cheers, Yy