I started learning Stata recently and I'm super excited to be here!
I have been following the forum as a guest and realized that it's a really connected community, so I wanted to introduce myself briefly first.
I am a master's student in finance studying in Portugal, and I work as a research assistant to two truly amazing professors here. I am looking to pursue a Ph.D. afterward, so I will likely be around for many years to come.

My question: I appended 1620 *.csv files to form a single *.dta. The total size of the *.csv files was 2.5GB, whereas the newly formed *.dta is about 20GB, even though I dropped some variables. If I simply export the *.dta as a *.csv, import it again, and save it as a *.dta, the file becomes only 1.2GB. Since both files should contain the same information, I don't understand how the size can vary so much. Is there something wrong with my code, or is this a normal feature of the *.dta file type?
Thank you!
The code I used to convert *.csv to *.dta:
Code:
clear
clear matrix
set more off
* Forward slashes work in Stata on Windows and avoid a backslash
* directly before a macro reference, which would block its expansion
local dir "E:/Research"
cd "`dir'/input"
local folderlist : dir . dirs "*"
foreach folder of local folderlist {
    mkdir "`dir'/temp/`folder'"
    local csvlist : dir "`dir'/input/`folder'" files "*.csv"
    foreach file of local csvlist {
        * the clear option already empties memory before each import
        insheet using "`dir'/input/`folder'/`file'", clear
        drop v1 v2 v3 v4 v5
        local outfile = subinstr("`dir'/temp/`folder'/`file'", ".csv", "", .)
        save "`outfile'", replace
    }
    * csv to dta conversion is done at this point
}
* Empty memory first so the master file starts with no observations
* (otherwise the last imported csv is still in memory and gets saved into it)
clear
cd "`dir'/output"
save Database, emptyok
cd "`dir'/temp"
local folderlist : dir . dirs "*"
foreach folder of local folderlist {
    local filelist : dir "`dir'/temp/`folder'" files "*.dta"
    foreach file of local filelist {
        cd "`dir'/temp/`folder'"
        use `"`file'"', clear
        di `"`file'"'
        gen source = `"`file'"'
        cd "`dir'/output"
        append using Database
        save Database, replace
    }
}
use Database, clear
drop source
duplicates list id
duplicates drop id, force
save Database, replace
The code I used to convert *.dta to *.csv and then back to *.dta:
Code:
clear
clear matrix
local dir "C:\RA\Week 8\Database v1.1 Lean"
use "`dir'\input\Database.dta"
export delimited using "`dir'\temp\Database.csv", replace
import delimited "`dir'\temp\Database.csv", varnames(1) clear
save "`dir'\output\Database v1.1.dta"
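For reference, here is a minimal sketch of how the two files could be compared. It assumes, without my having confirmed it, that the size difference comes from how the variables are stored rather than from the data itself. The paths are the ones from the round-trip code above. `describe using` lists variable names and storage types without loading the data, and `compress` recasts each variable to the smallest storage type that holds its values without loss.

```stata
* Sketch only: assumes the two files produced above exist at these paths
local dir "C:\RA\Week 8\Database v1.1 Lean"

* List variable names and storage types of each file without loading the data
describe using "`dir'\input\Database.dta"
describe using "`dir'\output\Database v1.1.dta"

* Load the large file and recast each variable to the smallest storage type
* that holds its values without loss of information
use "`dir'\input\Database.dta", clear
compress
describe, short
```

If the two `describe using` listings show different storage types for the same variables (for example, much wider string types in the large file), that would point to storage rather than content as the source of the difference.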