Hi everyone,
I am working with a very large confidential dataset. Specifically, I run a Stata program on AWS EC2 (a computing cluster) to import and manipulate the data. The data are in fixed-width (positional) format, with ~10^8 rows and tens of variables (about 50 GB). My code looks like:
import delimited "spending.txt", clear
gen spend_date=substr(v1,1,8)
......
......
......
However, this process is very slow (tens of hours, or even days). I was wondering whether there is any way to speed it up. I also considered the -infix- command, but it did not work on my sample data: the imported dataset was nearly empty and the values looked garbled. Your help would be much appreciated.
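Since the file is fixed-width rather than delimited, one approach worth trying is -infix- with explicit column ranges, which reads only the listed fields instead of parsing every full line into a string and then applying substr(). A minimal sketch follows; the variable names, types, and column positions here are hypothetical placeholders, and the real ranges must come from the file's record layout (a mismatch between the declared columns and the actual layout is a common reason -infix- returns empty or garbled data):

```stata
* Sketch only: replace the names, types, and column ranges below
* with the actual record layout of spending.txt.
infix str8 spend_date 1-8 double amount 9-20 str12 account_id 21-32 ///
    using "spending.txt", clear

* Convert the 8-character date string (assumed YYYYMMDD) to a Stata date
gen spend_dt = daily(spend_date, "YMD")
format spend_dt %td
```

Reading only the columns you need, and declaring the narrowest types that fit (e.g. str8 rather than a long string), reduces both I/O and memory pressure; -compress- afterward can shrink storage types further. If -infix- still misbehaves, it is worth checking the file's line endings and confirming that every record really has the same width.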