I recently got a dataset of 600 patients admitted to the ICU. It contains info regarding sex, age, height etc (data that occurs once and does not change). Patients stay in the ICU for different lengths of time, they also take different blood samples. It contains more than 1500 different types of samples.
My problem is that the dataset I received is a hybrid between long and wide format, and I can not get my head around how to solve it and make the set (preferably) long, or at least consistent.
Variables:
pat_id - anonymized patient identification number
AdmissionTime - Time patient was admitted to the ICU
DischargeTime - Time patient was discharged from the ICU
Sex - Sex of patient
Age - Age at admission
Dead - Dead during stay
DeathDate - Date of death
sample_datetime - Date and time when blood sample was taken
sample_type - Type of blood sample (ex hemoglobin, creatinine, potassium and so on)
sample_acute - 1 if sample was ordered acute, 0 if ordered with normal analysing time
sample_value - Value of the blood sample
sample_unit - The unit of the blood sample (milligrams/ml, mmol/L and so on)
sample_normal_ref - The normal reference range of the blood sample value
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long pat_id double(AdmissionTime DischargeTime) str1 Sex str2 Age byte Dead int DeathDate double sample_datetime str105 sample_type byte sample_acute str24 sample_value str10 sample_unit str13 sample_normal_ref 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-St.bik ABL/PNA" 0 "26" "mmol/L" "22-27" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pO2 ABL/PNA" 0 "9,0" "kPa" "8,0-13,0" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-MetHb ABL/PNA" 0 "0,8" "%" "<1,5" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Glukos ABL/PNA" 0 "8,5" "mmol/L" "4,0-6,0" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-FO2Hb ABL/PNA" 0 "94" "%" ">93" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Ca,fri ABL/PNA" 0 "1,10" "mmol/L" "1,02-1,31" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Natrium ABL/PNA" 0 "137" "mmol/L" "137-145" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Laktat ABL/PNA" 0 "1,7" "mmol/L" "0,5-2,3" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "Syrgastillf% ABL/PNA" 0 "85" "%" "" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pCO2 ABL/PNA" 0 "3,6" "kPa" "4,6-6,0" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-AnjGap ABL/PNA" 0 "6,8" "mmol/L" "8-16" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Klorid ABL/PNA" 0 "107" "mmol/L" "98-107" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Krea ABL/PNA" 0 "71" "mikromol/L" "<100" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-BE(st) ABL/PNA" 0 "1" "mmol/L" "-3-3" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Kalium ABL/PNA" 0 "3,9" "mmol/L" "3,5-4,6" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-pH ABL/PNA" 0 "7,54" "" "7,35-7,45" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-COHb ABL/PNA" 0 "0,1" "%" "<2,0" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-sO2 ABL/PNA" 0 "95" "%" ">95" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "Oxygentillförsel PNA" 0 "0,0" "L/min" "" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902182880000.0002 "aB-Hb ABL/PNA" 0 "151" "g/L" "134-170" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "X-LD" 1 "9,9" "mikrokat/L" "" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-LD" 1 "10,4" "mikrokat/L" "<4,3" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Eosinofila granulo" 1 "<0,1" "x10(9)/L" "0,0-0,5" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-CRP" 1 "167" "mg/L" "<3" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-Troponin T" 1 "15" "nanog/L" "<15" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-ASAT" 1 "2,07" "mikrokat/L" "<0,76" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Hemoglobin" 1 "155" "g/L" "134-170" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "Erc(B)-MCH" 1 "30" "pg" "27-33" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Monocyter" 1 "0,1" "x10(9)/L" "0,2-0,8" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-PK(INR)" 1 "1,2" "INR" "<1,2" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-Pankreasamylas" 1 "0,77" "mikrokat/L" "0,15-1,10" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-ALAT" 1 "1,04" "mikrokat/L" "<1,1" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Erytrocyter" 1 "5,1" "x10(12)/L" "4,2-5,7" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "Erc(B)-MCV" 1 "87" "fL" "82-98" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Lymfocyter" 1 "0,8" "x10(9)/L" "1,1-3,5" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-Calcium" 1 "2,16" "mmol/L" "2,15-2,50" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-GT" 1 "1,9" "mikrokat/L" "<2,0" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-EVF" 1 "0,44" "" "0,39-0,50" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Neutrofila granulo" 1 "6,7" "x10(9)/L" "1,6-5,9" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-APT-tid" 1 "30" "s" "20-30" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-Albumin" 1 "22" "g/L" "34-45" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Trombocyter" 1 "254" "x10(9)/L" "145-348" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Basofila granulo" 1 "<0,1" "x10(9)/L" "0,0-0,1" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "B-Leukocyter" 1 "7,6" "x10(9)/L" "3,5-8,8" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902204420000 "P-Bilirubin" 1 "13" "mikromol/L" "<26" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "Oxygentillförsel PNA" 0 "16,0" "L/min" "" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-Natrium ABL/PNA" 0 "137" "mmol/L" "137-145" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-sO2 ABL/PNA" 0 "91" "%" ">95" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-Hb ABL/PNA" 0 "152" "g/L" "134-170" 1 1902256680000 1903179599999.9998 "M" "81" 0 . 1902216780000.0002 "aB-AnjGap ABL/PNA" 0 "7,1" "mmol/L" "8-16" end format %tcnn/dd/ccYY_hh:MM AdmissionTime format %tcnn/dd/ccYY_hh:MM DischargeTime format %tdnn/dd/CCYY DeathDate format %tcnn/dd/ccYY_hh:MM sample_datetime label values pat_id pat_id
What I would like:
Short answer is something that is easier to work with. Long or wide format.
Maybe something like this:
1. Long format
2. One day (days since admission) on each row. "day" == 1 corresponding to the first calender day admitted. day == 2 the second day and so on. Up until day 60+ for a couple of patients. That would probably mean that a lot of patients, who did not spend 60+ days in the ICU would have missing on all sample_* variables for the days after they were discharged. Fine by me.
3. One sample is one variable. Ex "aB-St.bik ABL/PNA" is one, "aB-pO2 ABL/PNA" the next. 671 different samples types.
If several samples of the same type was taken on the same day I would like the second to be named "aB-St.bik ABL/PNA_2nd" and the third sample of the same type and day be named "aB-St.bik ABL/PNA_3rd" and so on.
4. Corresponding values for each sample on respective observation (i.e. day).
Maybe something like this:
pat_id | day | AdmissionTime | DischargeTime |
|
Age | Dead | Deathdate | aB_Stbik_ABL/PNA_1st | aB_Stbik_ABL/PNA_1st_datetime | aB_Stbik_ABL/PNA_1st_acute | aB_Stbik_ABL/PNA_1st_unit | aB_Stbik_ABL/PNA_2nd | aB_Stbik_ABL/PNA_2nd_datetime | |||||||
1 | 1 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | 26 | 4/11/2020 12:08 | 0 | mmol/L | 30 | 4/11/2020 14:10 | |||||||
1 | 2 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | . | . | . | . | |||||||||
1 | 3 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | . | . | . | . | |||||||||
1 | 4 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | . | . | . | . | |||||||||
1 | 5 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | . | . | . | . | |||||||||
1 | 6 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | 35 | 4/16/2020 17:08 | 1 | mmol/L | |||||||||
1 | 7 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | |||||||||||||
1 | 8 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | |||||||||||||
1 | 9 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | |||||||||||||
1 | 10 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | |||||||||||||
1 | 11 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | |||||||||||||
1 | 12 | 4/11/2020 8:38 | 4/22/2020 12:59 | M | 81 | 0 | . | |||||||||||||
I hope I could make myself understood. Other suggestions how to arrange my dataset is more than welcome.
All the best,
Jesper Eriksson
0 Response to Hybrid long-wide. Help with transforming dataset
Post a Comment