Hi everyone,
I am working with a very large confidential dataset. Specifically, I run a Stata program on AWS EC2 (a computing cluster) to import and manipulate the data. The data are in fixed-width (positional) format, with roughly 10^8 rows and tens of variables (about 50 GB). My code looks like:
import delimited "spending.txt", clear
gen spend_date=substr(v1,1,8)
......
......
......
However, the process is very slow (tens of hours, or even days). I was wondering whether there is any way to speed it up. I also considered the -infix- command, but it did not work on my sample data (the imported dataset came out almost empty and looked wrong). Your help would be much appreciated.
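For fixed-width data, one common cause of slowness is that -import delimited- reads each record as one long string that then has to be carved up with -substr()-; -infix- reads each field directly by column position instead, which is usually faster and avoids the intermediate string variables. A frequent reason -infix- returns an almost-empty dataset is that the column positions do not match the actual file layout. Below is a minimal sketch; the variable names and column ranges are hypothetical placeholders and must be replaced with the real layout of spending.txt:

* Hedged sketch: read fixed-width data directly with -infix-.
* The field names and column positions below are illustrative only --
* substitute the actual record layout of spending.txt.
clear
infix str8   spend_date  1-8    ///
      str10  account_id  9-18   ///
      double amount      19-30  ///
      using "spending.txt"

* Convert the 8-character YYYYMMDD string into a Stata date
gen date = daily(spend_date, "YMD")
format date %td

If memory is a constraint, -infix- also accepts an -in- range (e.g. in 1/10000000), so the file can be read in chunks, processed, saved, and appended.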