Cross tabulations based on multiple dummy variables

Hello,

I have a large dataset for N= 59,000,000 observations.

I have 154 dummy variables relating to a set of diagnosis codes, and 173 dummy variables relating to service codes ( e.g. chemotherapy, orthopedics etc). The service codes all begin with ncb*. I have one string variable that identifies individuals and one cost variable (float).

I need to produce a cross tab of diagnosis code (column) *service code(row) and count the number individuals in each cell. Some people will have no diagnosis and no service code. I would then like to repeat with a sum of cost rather than count of individuals

In the past I have used collapse and reshape, to get the output I would like, but with columns being dummy variables I am unsure of how to do this.

Ideally I would like to produce a new Stata file as this will be easier to export to excel I think.

Below is a sample of the data but with the id variable not included - because it is a long string. I have included the cost variable. I am assuming I could replace a sum with a count to count the ID rather than sum the cost.

Can anyone advise please?

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float total_ss_costs byte(r69 r50_r68 r47_r49 r40_r46 ncbps29m_hist ncbps29b_hist ncbps27b_hist)
        0 0 0 0 01 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 1 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
 7143.572 0 1 0 0 1 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
141.99995 1 0 0 0 0 0 1
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
 6460.776 0 1 0 0 1 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
 903.4579 0 1 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
 4346.248 0 0 0 1 0 0 0
        0 0 0 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 1 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
 426.3519 0 1 0 0 1 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 1 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
13727.894 0 1 0 0 0 0 1
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
        0 0 0 0 0 0 0 0
284.62573 0 0 0 0 0 0 1
110.00005 0 1 0 0 1 0 0
        0 0 0 0 0 0 0 0
      529 0 0 1 0 0 0 1
        0 0 0 0 0 0 0 0
end

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Cross tabulations based on multiple dummy variables
Cross tabulations based on multiple dummy variables

0 Response to Cross tabulations based on multiple dummy variables

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Cross tabulations based on multiple dummy variables Cross tabulations based on multiple dummy variables

Related Posts with Cross tabulations based on multiple dummy variables

0 Response to Cross tabulations based on multiple dummy variables

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Cross tabulations based on multiple dummy variables
Cross tabulations based on multiple dummy variables