Hi everyone,
I want to make a panel data dta file from several excel files sitting in several folders in my directory. I have two do files that can clean and transfer the data in each folder of excel files into stata. But I look for an adjustment in my code that enables it to automatically go through folders. The folder names do not follow any special pattern, they are basically named after some mineral products. I have read the -folders- command and other extended macro functions uses to transfer files into stata and prepared the following code:
************************************************** ************************************************** ************************************************** ************************************************** *************
clear all
global user "C:\Users\Shadi\Dropbox\ADC_JM_Commodities\Dat a\Co mmodity production\production_post1970"
cd "$user\_all"
clear
save "C:\Users\Shadi\Dropbox\ADC_JM_Commodities\Dat a\Co mmodity production\production_post1970\_all\all.dta", replace emptyok
cd "$user"
folders
foreach d in `r(folders)' {
cd "`d'"
fs filename*
foreach f in `r(files)' {
ls `f'
local datafiles: dir "." files "*.xlsx"
dir
foreach file of local datafiles {
import excel "`file'" , sheet("World mineral statistics data") allstring clear
rename * c(#), renumber
describe, short varlist
scalar ln = r(k)
local i = 1
foreach var of varlist c2-c`=ln' {
replace `var' = c`i' if `var' =="" & _n == 2
local i = 1+ `i'
}
local name = c2[1]
replace c2 = "`name'"
foreach var of varlist c1-c`=ln' {
replace `var' = subinstr(`var', "(a)", "",.)
replace `var' = subinstr(`var', "*", "",.)
replace `var' = subinstr(`var', "(b)", "",.)
replace `var' = subinstr(`var', "#", "",.)
replace `var' = subinstr(`var', "(f)", "",.)
replace `var' = subinstr(`var', "(k)", "",.)
replace `var' = subinstr(`var', "(c)", "",.)
replace `var' = subinstr(`var', "(h)", "",.)
replace `var' = subinstr(`var', "(r)", "",.)
replace `var' = subinstr(`var', "(d)", "",.)
replace `var' = subinstr(`var', "(q)", "",.)
replace `var' = subinstr(`var', "(g)", "",.)
replace `var' = subinstr(`var', "(n)", "",.)
replace `var' = subinstr(`var', "(j)", "",.)
replace `var' = subinstr(`var', "(l)", "",.)
replace `var' = subinstr(`var', "(i)", "",.)
replace `var' = subinstr(`var', "(t)", "",.)
replace `var' = subinstr(`var', "(g)", "",.)
replace `var' = subinstr(`var', "(e)", "",.)
}
replace c1 = "" if c1 == "#)"
replace c1 = "" if c1 == "*)"
replace c1 = "" if c1 == "a)"
replace c1 = "" if c1 == "b)"
replace c1 = "" if c1 == "c)"
replace c1 = "" if c1 == "d)"
replace c1 = "" if c1 == "i)"
replace c1 = "" if c1 == "f)"
replace c1 = "" if c1 == "g)"
replace c1 = "" if c1 == "e)"
replace c1 = "" if c1 == "h)"
replace c1 = "" if c1 == "j)"
replace c1 = "" if c1 == "1:"
replace c1 = "" if c1 == "2:"
replace c1 = "" if c1 == "Footnotes"
replace c1 = "" if c1 == "Notes"
replace c1 = "" if c1 == ")"
replace c1 = "" if c1 == "Table notes"
replace c1 = "" if c1 == "3:"
replace c1 = "" if c1 == "l)"
replace c1 = "" if c1 == "n)"
replace c1 = "" if c1 == "t)"
replace c1 = "" if c1 == "k)"
drop if c1 == ""
local i = 3
foreach var of varlist c4-c`=ln' {
local n1 = `var'[2]
local n2 = c`i'[2]
if `n1' == `n2' & regexm(`var'[2], "(^[0-9][0-9][0-9][0-9]$)") {
replace `var' = "y1_`=regexs(1)'" if _n== 1
}
if `n1' ~= `n2' & regexm(`var'[2], "(^[0-9][0-9][0-9][0-9]$)") {
replace `var' = " y_`=regexs(1)'" if _n== 1
}
local i = 1+ `i'
}
local year = c3[2]
rename c3 y_`year'
foreach var of varlist c4-c`=ln' {
local year = `var'[1]
rename `var' `year'
}
}
}
cd ..
}
************************************************** ************************************************** ************************************************** ************************************************** ******************************************
The problem is that the code only lists different directories for me for each of the folders as shown below:
*************
C:\Users\Shadi\Dropbox\ADC_JM_Commodities\Data\Com modity production\productio
> n_post1970\aluminium
C:\Users\Shadi\Dropbox\ADC_JM_Commodities\Data\Com modity production\productio
> n_post1970
...
*************
It gives no errors but the commands in between that are supposed to clean my data do not run and I get no dataset when I browse to see the data table. The results is the same when I include no command in between and run it like the following:
************************************************** ************************************************** ************************************************** *************
clear all
global user "C:\Users\Shadi\Dropbox\ADC_JM_Commodities\Dat a\Co mmodity production\production_post1970"
cd "$user\_all"
clear
save "C:\Users\Shadi\Dropbox\ADC_JM_Commodities\Dat a\Co mmodity production\production_post1970\_all\all.dta", replace emptyok
cd "$user"
folders
foreach d in `r(folders)' {
cd "`d'"
fs filenames*
foreach f in `r(files)' {
ls `f'
}
cd ..
}
************************************************** ************************************************** ************************************************** *************
I am very new to looping over folders and tried to find relevant post but I couldn't. I would be glad to receive your comments and suggestions. I'm making a mistake that is not visible to me and I appreciate it when you could point it out.
Best regards,
Shadi
0 Response to Looping over folders in the directory and then over files in each folder
Post a Comment