Hi everyone!
I have started learning Stata recently and I'm super excited to be here!
I have been following the forum as a guest and realized that it's a really connected community, so I wanted to introduce myself briefly first.
I am a master's student in finance in Portugal, working as a research assistant to two truly amazing professors here. I am looking to pursue a Ph.D. afterward, so I will likely be around for many years to come.

My question: I appended 1620 *.csv files to form a single *.dta file. The total size of the *.csv files was 2.5GB, whereas the resulting *.dta is about 20GB, even though I dropped some variables. If I simply export the *.dta as a *.csv, import it again, and save it as a *.dta, the file becomes only 1.2GB. Considering that both files should contain the same information, I don't understand how the size can vary so much. Is there something wrong with my code, or is this a normal feature of the *.dta file type?

Thank you!

The code I used to convert *.csv to *.dta:
Code:
clear
clear matrix
set more off

* Forward slashes work in Stata on Windows and avoid a backslash escaping the macro that follows it
local dir "E:/Research"
cd "`dir'/input"

* Step 1: convert every csv file in each input subfolder to dta
local folderlist : dir . dirs "*"
foreach folder of local folderlist {
    * capture ignores the error raised when the folder already exists
    capture mkdir "`dir'/temp/`folder'"
    local csvlist : dir "`dir'/input/`folder'" files "*.csv"

    foreach file of local csvlist {
        insheet using "`dir'/input/`folder'/`file'", clear
        drop v1 v2 v3 v4 v5
        local outfile = subinstr("`dir'/temp/`folder'/`file'", ".csv", "", .)
        save "`outfile'", replace
    }
}
* csv to dta conversion is done at this point

* Step 2: start from an empty Database and append every dta file to it
cd "`dir'/output"
* Clear first; otherwise the last converted file is still in memory and gets saved as Database
clear
save Database, emptyok replace

cd "`dir'/temp"
local folderlist : dir . dirs "*"
foreach folder of local folderlist {
    local filelist : dir "`dir'/temp/`folder'" files "*.dta"
    foreach file of local filelist {
        use "`dir'/temp/`folder'/`file'", clear
        di `"`file'"'
        gen source = `"`file'"'
        append using "`dir'/output/Database"
        save "`dir'/output/Database", replace
    }
}

* Step 3: remove duplicate ids
use "`dir'/output/Database", clear
drop source
duplicates list id
duplicates drop id, force
save "`dir'/output/Database", replace

The code I used to convert *.dta to *.csv and then back to *.dta:
Code:
clear
clear matrix

local dir "C:\RA\Week 8\Database v1.1 Lean"
use "`dir'\input\Database.dta"
export delimited using "`dir'\temp\Database.csv", replace
import delimited "`dir'\temp\Database.csv", varnames(1) clear
save "`dir'\output\Database v1.1.dta"
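
In case it helps with diagnosing this, here is a minimal sketch of how the storage types in the two files could be compared. It reuses the file names from the code above and assumes the 20GB file fits in memory. The idea that overly wide storage types explain the difference is only my guess; I have not confirmed it.

Code:
local dir "C:/RA/Week 8/Database v1.1 Lean"

* Show each variable's storage type without loading the data
describe using "`dir'/input/Database.dta"
describe using "`dir'/output/Database v1.1.dta"

* Let Stata demote every variable to the smallest type that still holds its values
use "`dir'/input/Database.dta", clear
compress
* "Database compressed.dta" is a hypothetical file name used only for illustration
save "`dir'/output/Database compressed.dta", replace

If the compressed file comes out close to 1.2GB, the storage types picked up during the append would account for the gap.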