Hi everyone,
I am working with a very large confidential dataset. Specifically, I run a Stata program on AWS EC2 (a computing cluster) to import and manipulate the data. The file is fixed-format (fixed column positions), with roughly 10^8 rows and tens of variables (about 50 GB). My code looks like:
* read each raw line of the fixed-format file into a single string variable (v1)
import delimited "spending.txt", clear
* carve the fixed-position fields out of the raw line
gen spend_date = substr(v1,1,8)
......
......
......
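To give a sense of the manipulation step, a typical follow-up to the extraction looks like this (assuming, purely for illustration, that the first eight columns hold a YYYYMMDD date):

* convert the extracted string to a Stata daily date
* (assumes the field is laid out as YYYYMMDD; adjust the date mask otherwise)
gen spend_dt = date(spend_date, "YMD")
format spend_dt %td
* drop the raw line once all fields are carved out, to save memory
drop v1
compress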
However, the process itself is very slow (tens of hours, or even days), and I was wondering whether there is any way to speed it up. I also considered the -infix- command, but somehow it did not work on my sample data: the imported dataset came out almost empty and looked garbled.
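For reference, my -infix- attempt looked roughly like the line below; the variable names and column positions here are illustrative placeholders, not my real record layout:

* one-line infix specification; field names and positions are made up for illustration
infix str8 spend_date 1-8 double amount 9-20 str10 acct_id 21-30 using "spending.txt", clear

Your help would be much appreciated.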