Hybrid long-wide. Help with transforming dataset

Hi,

I recently got a dataset of 600 patients admitted to the ICU. It contains info regarding sex, age, height etc (data that occurs once and does not change). Patients stay in the ICU for different lengths of time, they also take different blood samples. It contains more than 1500 different types of samples.

My problem is that the dataset I received is a hybrid between long and wide format, and I can not get my head around how to solve it and make the set (preferably) long, or at least consistent.

Variables:
pat_id - anonymized patient identification number
AdmissionTime - Time patient was admitted to the ICU
DischargeTime - Time patient was discharged from the ICU
Sex - Sex of patient
Age - Age at admission
Dead - Dead during stay
DeathDate - Date of death
sample_datetime - Date and time when blood sample was taken
sample_type - Type of blood sample (ex hemoglobin, creatinine, potassium and so on)
sample_acute - 1 if sample was ordered acute, 0 if ordered with normal analysing time
sample_value - Value of the blood sample
sample_unit - The unit of the blood sample (milligrams/ml, mmol/L and so on)
sample_normal_ref - The normal reference range of the blood sample value

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long pat_id double(AdmissionTime DischargeTime) str1 Sex str2 Age byte Dead int DeathDate double sample_datetime str105 sample_type byte sample_acute str24 sample_value str10 sample_unit str13 sample_normal_ref
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-St.bik ABL/PNA"     0 "26"   "mmol/L"     "22-27"    
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pO2 ABL/PNA"        0 "9,0"  "kPa"        "8,0-13,0" 
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-MetHb ABL/PNA"      0 "0,8"  "%"          "<1,5"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Glukos ABL/PNA"     0 "8,5"  "mmol/L"     "4,0-6,0"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-FO2Hb ABL/PNA"      0 "94"   "%"          ">93"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Ca,fri ABL/PNA"     0 "1,10" "mmol/L"     "1,02-1,31"
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Natrium ABL/PNA"    0 "137"  "mmol/L"     "137-145"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Laktat ABL/PNA"     0 "1,7"  "mmol/L"     "0,5-2,3"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "Syrgastillf% ABL/PNA"  0 "85"   "%"          ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pCO2 ABL/PNA"       0 "3,6"  "kPa"        "4,6-6,0"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-AnjGap ABL/PNA"     0 "6,8"  "mmol/L"     "8-16"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Klorid ABL/PNA"     0 "107"  "mmol/L"     "98-107"   
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Krea ABL/PNA"       0 "71"   "mikromol/L" "<100"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-BE(st) ABL/PNA"     0 "1"    "mmol/L"     "-3-3"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Kalium ABL/PNA"     0 "3,9"  "mmol/L"     "3,5-4,6"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pH ABL/PNA"         0 "7,54" ""           "7,35-7,45"
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-COHb ABL/PNA"       0 "0,1"  "%"          "<2,0"     
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-sO2 ABL/PNA"        0 "95"   "%"          ">95"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "Oxygentillförsel PNA" 0 "0,0"  "L/min"      ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Hb ABL/PNA"         0 "151"  "g/L"        "134-170"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "X-LD"                  1 "9,9"  "mikrokat/L" ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-LD"                  1 "10,4" "mikrokat/L" "<4,3"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Eosinofila granulo"  1 "<0,1" "x10(9)/L"   "0,0-0,5"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-CRP"                 1 "167"  "mg/L"       "<3"       
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Troponin T"          1 "15"   "nanog/L"    "<15"      
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-ASAT"                1 "2,07" "mikrokat/L" "<0,76"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Hemoglobin"          1 "155"  "g/L"        "134-170"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "Erc(B)-MCH"            1 "30"   "pg"         "27-33"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Monocyter"           1 "0,1"  "x10(9)/L"   "0,2-0,8"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-PK(INR)"             1 "1,2"  "INR"        "<1,2"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Pankreasamylas"      1 "0,77" "mikrokat/L" "0,15-1,10"
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-ALAT"                1 "1,04" "mikrokat/L" "<1,1"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Erytrocyter"         1 "5,1"  "x10(12)/L"  "4,2-5,7"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "Erc(B)-MCV"            1 "87"   "fL"         "82-98"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Lymfocyter"          1 "0,8"  "x10(9)/L"   "1,1-3,5"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Calcium"             1 "2,16" "mmol/L"     "2,15-2,50"
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-GT"                  1 "1,9"  "mikrokat/L" "<2,0"     
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-EVF"                 1 "0,44" ""           "0,39-0,50"
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Neutrofila granulo"  1 "6,7"  "x10(9)/L"   "1,6-5,9"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-APT-tid"             1 "30"   "s"          "20-30"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Albumin"             1 "22"   "g/L"        "34-45"    
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Trombocyter"         1 "254"  "x10(9)/L"   "145-348"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Basofila granulo"    1 "<0,1" "x10(9)/L"   "0,0-0,1"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "B-Leukocyter"          1 "7,6"  "x10(9)/L"   "3,5-8,8"  
1 1902256680000 1903179599999.9998 "M" "81" 0 .      1902204420000 "P-Bilirubin"           1 "13"   "mikromol/L" "<26"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "Oxygentillförsel PNA" 0 "16,0" "L/min"      ""         
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-Natrium ABL/PNA"    0 "137"  "mmol/L"     "137-145"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-sO2 ABL/PNA"        0 "91"   "%"          ">95"      
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-Hb ABL/PNA"         0 "152"  "g/L"        "134-170"  
1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-AnjGap ABL/PNA"     0 "7,1"  "mmol/L"     "8-16"     
end
format %tcnn/dd/ccYY_hh:MM AdmissionTime
format %tcnn/dd/ccYY_hh:MM DischargeTime
format %tdnn/dd/CCYY DeathDate
format %tcnn/dd/ccYY_hh:MM sample_datetime
label values pat_id pat_id

When I have cleaned sample that was missing values, non-interesting sample-types and more I have 640k observations left, 416 different patients. patients are staying in the ICU from 2 days up until 60+ days. All patients have different samples taken and at different times. As you can see the problem is that the sample_* variables are listed one by one as observations.

What I would like:
Short answer is something that is easier to work with. Long or wide format.

Maybe something like this:
1. Long format
2. One day (days since admission) on each row. "day" == 1 corresponding to the first calender day admitted. day == 2 the second day and so on. Up until day 60+ for a couple of patients. That would probably mean that a lot of patients, who did not spend 60+ days in the ICU would have missing on all sample_* variables for the days after they were discharged. Fine by me.
3. One sample is one variable. Ex "aB-St.bik ABL/PNA" is one, "aB-pO2 ABL/PNA" the next. 671 different samples types.
If several samples of the same type was taken on the same day I would like the second to be named "aB-St.bik ABL/PNA_2nd" and the third sample of the same type and day be named "aB-St.bik ABL/PNA_3rd" and so on.
4. Corresponding values for each sample on respective observation (i.e. day).

Maybe something like this:

pat_id

day

AdmissionTime

DischargeTime

Sex

Age

Dead

Deathdate

aB_Stbik_ABL/PNA_1st

aB_Stbik_ABL/PNA_1st_datetime

aB_Stbik_ABL/PNA_1st_acute

aB_Stbik_ABL/PNA_1st_unit

aB_Stbik_ABL/PNA_2nd

aB_Stbik_ABL/PNA_2nd_datetime

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 12:08

mmol/L

4/11/2020 14:10

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/16/2020 17:08

mmol/L

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

4/11/2020 8:38

4/22/2020 12:59

I hope I could make myself understood. Other suggestions how to arrange my dataset is more than welcome.

All the best,

Jesper Eriksson

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Hybrid long-wide. Help with transforming dataset
Hybrid long-wide. Help with transforming dataset

0 Response to Hybrid long-wide. Help with transforming dataset

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Hybrid long-wide. Help with transforming dataset Hybrid long-wide. Help with transforming dataset

Related Posts with Hybrid long-wide. Help with transforming dataset

0 Response to Hybrid long-wide. Help with transforming dataset

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Hybrid long-wide. Help with transforming dataset
Hybrid long-wide. Help with transforming dataset