Hi everyone,
I am working with very large confidential data. Specifically, I run a Stata program on AWS EC2 (a computing cluster) to import and manipulate the data. The file is in fixed-width (positional) format, with roughly 10^8 rows and tens of variables (about 50 GB). My code goes like this:
import delimited "spending.txt", clear    // each fixed-width line is read into a single string variable, v1
gen spend_date = substr(v1, 1, 8)         // carve out the first 8 characters as the date field
......
However, the process is very slow (tens of hours, or even days). I was wondering whether there is any way to speed it up. I also considered the -infix- command, but somehow it did not work on my sample data: the import produced almost nothing, and what did come in looked garbled. Your help would be much appreciated.
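For concreteness, this is the kind of -infix- call I mean; the variable names (spend_date, amount, account_id) and the column positions are made-up placeholders for illustration, not the real record layout:

* Minimal -infix- sketch with hypothetical column positions.
* -infix- reads fixed-width files directly, so no substr() pass is needed.
infix str spend_date  1-8   ///
      double amount   9-20  ///
      str account_id  21-30 ///
      using "spending.txt", clear

Since a mismatch between the column specifications and the actual layout is one way -infix- can return "almost nothing", it is cheap to check the layout on a small slice before committing to the full run:

* Read only the first 1,000 lines to verify the column positions
infix str spend_date 1-8 double amount 9-20 str account_id 21-30 ///
      using "spending.txt" in 1/1000, clear
list in 1/5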