Alternatives to using split and then reshape long?

I am trying to separate out data into rows, but without using split and reshape. Consider the following toy dataset.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float person_id str10 income
1 "10, 20, 50"
2 "35, 20"    
3 ""          
4 "40"        
5 "34, 24"    
end

I would like to separate out the income column using commas as delimiters, then get it into the following long form.

person_id	income
1	10
1	20
1	50
2	35
2	20
4	40
5	34
5	24

Typically, I would run

Code:

split income, parse(,) generate(inc) destring
drop income
reshape long inc, i(person_id) j(obs_id)
drop if inc == .
drop obs_id

However, I am in a situation with lots of data (instead of five people, we have 800k observations, and instead of a maximum of three values to be separated out, we have up to 200). My attempts at applying the above method (using Stata MP, Version 16) to this large dataset have so far been futile (in the sense that the reshape runs for hours without finishing). I am wondering if there is a better (more memory-efficient) way to do this.

For example, can I directly split my income values into rows? Or, is there a way in which I can ignore missing values when doing reshape? In my full dataset, most observations have only a few values in the income column, so that inc200 is almost all missing values. So if there was some way to preemptively drop them during the reshape, I think that could work as well.

Open to any other suggestions! Thanks everyone.

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Alternatives to using split and then reshape long?
Alternatives to using split and then reshape long?

0 Response to Alternatives to using split and then reshape long?

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Alternatives to using split and then reshape long? Alternatives to using split and then reshape long?

Related Posts with Alternatives to using split and then reshape long?

0 Response to Alternatives to using split and then reshape long?

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Alternatives to using split and then reshape long?
Alternatives to using split and then reshape long?