Dear Statalist,

We have collected annual data on minorities in several countries for 25 years. Now I would like to create a master dataset with all data included: 160 variables with data on 25 years for almost 800 minorities. The variables are in numeric and string format.

The tricky part:
The data for each minority is stored in a separate Excel file (i.e. 800 Excel files).
Each Excel file consists of five worksheets.
Each worksheet contains different variables (and a different number of variables, hence columns).
In the first row of each worksheet the years are listed.
Each worksheet contains two fields indicating country and group identifier.
The design of the worksheets is unfavourable, because country and group identifier and year are in the same row. Similarly, the group identifier and the variables are in the same row as well.

I would like to
  1. Merge all worksheets within each Excel file by year.
  2. Create variable “year”.
  3. Create variable “ID” which informs about country-group per year.
  4. Append all Excel files in order to have one master dataset.

Unfortunately, I was not able to find help for my particular problem. Is there a code that I could use for a loop over all my 800 Excel files?

This is how the worksheets look like:

Worksheet 1 in Excel file A
Country: FRA
Group: B2 Var_1 Var_2 Var_3
1990
1991
1992
1993
1994

Worksheet 2 in Excel file A
Country: FRA
Group: B2 Var_4 Var_5 Var_6
1990
1991
1992
1993
1994









This is how the worksheets should look like after they were merged:
ID year Var_1 Var_2 Var_3 Var_4 Var_5 Var_6
FRA-B2 1990
FRA-B2 1991
FRA-B2 1992
FRA-B2 1993
FRA-B2 1994
In the above format I could append the various Excel files. I hope my description is specific enough. I very much appreciate your support.

Thank you. Tom.