Hi everyone,
I am working with a very large confidential dataset. Specifically, I run a Stata program on AWS EC2 (a computing cluster) to import and manipulate the data. The data are in fixed-width (positional) format, with roughly 10^8 rows and tens of variables (about 50 GB). My code looks like:
import delimited "spending.txt", clear
gen spend_date=substr(v1,1,8)
......
......
......
However, the process is very slow (tens of hours, or even days). I was wondering whether there is any way to speed it up. I also considered the -infix- command, but it did not work on my sample data (the imported dataset came out almost empty and looked wrong). Your help would be much appreciated.
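For fixed-width data, one common cause of slowness is that -import delimited- reads each record as one long string that then has to be carved up with -substr()-; -infix- reads each field directly by column position instead, which is usually faster and avoids the intermediate string variables. A frequent reason -infix- returns an almost-empty dataset is that the column positions do not match the actual file layout. Below is a minimal sketch; the variable names and column ranges are hypothetical placeholders and must be replaced with the real layout of spending.txt:

* Hedged sketch: read fixed-width data directly with -infix-.
* The field names and column positions below are illustrative only --
* substitute the actual record layout of spending.txt.
clear
infix str8   spend_date  1-8    ///
      str10  account_id  9-18   ///
      double amount      19-30  ///
      using "spending.txt"

* Convert the 8-character YYYYMMDD string into a Stata date
gen date = daily(spend_date, "YMD")
format date %td

If memory is a constraint, -infix- also accepts an -in- range (e.g. in 1/10000000), so the file can be read in chunks, processed, saved, and appended.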