Hi everyone,

I am working with a very large confidential dataset. Specifically, I run Stata on AWS EC2 (a computing cluster) to import and manipulate the data. The data are in fixed-width (positional) format, with about 10^8 rows and tens of variables (50 GB). My code looks like this:

import delimited "spending.txt", clear

gen spend_date=substr(v1,1,8)
......
......
......

However, the process is very slow (tens of hours, or even days). Is there any way to speed it up? I also considered using the -infix- command, but it did not work on my sample data (almost nothing was imported, and what did come through looked wrong). Any help would be much appreciated.
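
For reference, here is a minimal sketch of the kind of -infix- call I have in mind. Only spend_date in columns 1-8 matches the code above; the second variable (amount) and its column range are hypothetical placeholders, not my real layout. The -in- range is just there to test on the first 1,000 lines before running on the full file.

```stata
* Sketch only: "amount" and columns 9-20 are hypothetical placeholders
* Read the first 1,000 lines to check the layout before the full 50 GB run
infix str spend_date 1-8 double amount 9-20 using "spending.txt" in 1/1000, clear

* Inspect a few rows to confirm the columns were split as intended
list in 1/5
```

If this approach is sound, each variable would be read straight from its column range, so the substr() step on v1 would no longer be needed.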