I started learning Stata recently and I'm super excited to be here!
I have been following the forum as a guest and realized that it's a really connected community, so I wanted to introduce myself briefly first.
I am a master's in finance student in Portugal, working as a research assistant to two truly amazing professors here. I am looking to pursue a Ph.D. afterward, so I will likely be around for many years to come.

My question is this: I appended 1620 *.csv files to form a single *.dta file. The *.csv files total 2.5GB, whereas the resulting *.dta is about 20GB, even though I dropped some variables. If I simply export the *.dta as a *.csv, import it again, and save it as a *.dta, the file is only 1.2GB. Since both *.dta files should contain the same information, I don't understand how the size can vary so much. Is there something wrong with my code, or is this a normal feature of the *.dta file type?
Thank you!
The code I used to convert *.csv to *.dta:
Code:
clear
clear matrix
local dir "E:\Research\"
cd "`dir'\input"
set more off

* Step 1: convert every csv in each input subfolder to a dta in temp
local folderlist : dir . dirs "*"
foreach folder of local folderlist {
    mkdir "`dir'temp\\`folder'\\"
    local csvlist : dir "`dir'\input/`folder'" files "*.csv"
    foreach file of local csvlist {
        drop _all
        insheet using "`dir'input\\`folder'\\`file'", clear
        drop v1
        drop v2
        drop v3
        drop v4
        drop v5
        local outfile = subinstr("`dir'\temp\\`folder'\\`file'",".csv","",.)
        save "`outfile'", replace
    }
    **csv to dta conversion is done at this point
}

* Step 2: append every dta in temp into output\Database.dta
cd "`dir'\output"
save Database, emptyok
cd "`dir'\temp"
local folderlist : dir . dirs "*"
foreach folder of local folderlist {
    local filelist: dir "`dir'\temp/`folder'" files "*.dta"
    foreach file of local filelist {
        cd "`dir'\temp/`folder'"
        use `"`file'"', clear
        di `"`file'"'
        gen source = `"`file'"'
        cd "`dir'\output"
        append using Database
        save Database, replace
    }
}

* Step 3: remove duplicate ids and save the final file
use Database
drop source
duplicates list id
duplicates drop id, force
save Database, replace
The code I used to convert *.dta to *.csv and then back to *.dta:
Code:
clear
clear matrix
local dir "C:\RA\Week 8\Database v1.1 Lean"
use "`dir'\input\Database.dta"
export delimited using "`dir'\temp\Database.csv", replace
import delimited "`dir'\temp\Database.csv", varnames(1) clear
save "`dir'\output\Database v1.1.dta"
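In case it helps narrow things down, here is a minimal sketch of a check I could run on the large file. It assumes the extra size comes from variables being stored in wider types than the data need (for example long fixed-width strings or doubles). That is only a guess on my part and not something I have confirmed. The path is the one from my second snippet above.

```stata
* Sketch only: assumes the size gap is caused by storage types, which is unverified
use "C:\RA\Week 8\Database v1.1 Lean\input\Database.dta", clear

* List each variable with its storage type (e.g. str244, double, long)
describe

* Losslessly demote every variable to the smallest type that holds its values
compress

* Show the storage types again to see what changed
describe
```

If `compress` reports that many variables were shortened, that would point to storage types rather than to the append loop itself.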