Dear All,

This is my first post on the forum.

My task is to import into stata a number of excel files, clean them, merge them and make them ready for regression analysis.
For each file, I need to retrieve data from 3 separate specific tables (out of many tables in the file) and merge them together.

However, I have the problem that the cell-range changes, both the starting and ending cell, for each excel file. The coloumn's should be constant from table to table or at least within each table so I can already write them in the respective cell-ranges, but the starting and ending row are unknown and change for each table.

The common characteristic of where the starting cell is for the 3 tables across the excel files is 5 rows below : "Table 1" ,"Table 4", "Table M1". And they all end on the closest row to the start of the range with the phrase "Average of forecasts"

What the code should look like is something like this:

1) local directory over the files in the folder
2) for each file, import it into excel, read the rows and (maybe define a local) with the specific row for each of the 3 tables, given the common criteria listed above (i have tried with substr)
3) import excel for each of the cellranges separately (as for each table I need to manipulate it separately and then merge them together)


local myfilelist : dir "/Users/adrianomariani/Desktop/Research Assistant scheme/Forecasts excel" files "*.xlsx"

foreach filename of local myfilelist {

import excel using `filename', clear

gen start1=_n+5 if substr(A,1,7)=="Table 1"
gen end1=_n-1 if substr(A,1,7)=="Average" & _n<(start1+80) // i am using start2+80 as there are many averages, and I want the first one after 'start1', similarly for the others
gen start2=_n+5 if substr(A,1,7)=="Table 4"
gen end2=_n-1 if substr(A,1,7)=="Average" & _n<(start2+80)
gen start3=_n+5 if substr(A,1,8)=="Table M1"
gen end3=_n-1 if substr(A,1,7)=="Average" & _n<(start3+80)

local table1s dis(start1)
local table1e dis(min(end1))
local table4s dis(start2)
local table4e dis(min(end2))
local tableM1s dis(start3)
local tableM1e dis(min(end3))

import excel using `filename', cellrange(A`table1s':D`table1e') clear

......

import excel using `filename', cellrange(A`table4s':D`table4e') clear

.....


import excel using `filename', cellrange(A`tableM1s':H`tableM1e') clear //the coloumns 'A,H' in this case and 'A,D' in the other two are constant, just the rows are of interest

.....

}

When I try to run this (even without the loop over cell-files and hence on just one file) it returns the problem that cellrange is out of range, so there must be a problem with how I have defined it.

Thanks in advance for the help