Here is the problem:
I have a large dataset of blood values for children. I want to create a binary variable, anaemia, that is 1 if the child is anaemic and 0 otherwise. Normal haemoglobin values vary by age and sex. In the past I have used published mean (SD) values to generate Z-scores and classified children with a haemoglobin Z-score below -2 as anaemic. There are some issues with the parametric assumptions underlying that approach, so here I will use the lower-bound cut-offs published in: Staffa et al. Pediatric hematology normal ranges derived from pediatric primary care patients. Am J Hematol [Internet]. 2020 Oct [cited 2022 Dec 17];95(10).
The relevant variables in my dataset for achieving this are:
- haemoglobin (path_hb)
- sex (sex)
- date of birth (dob)
- date of blood test (path_date)
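The sample data further down already carries the age band in path_age. If it had to be derived from dob and path_date, a minimal sketch might look like the following. The names age_days and age_band are hypothetical, and the sketch assumes that age in days is path_date - dob, that one year is 365.25 days, and that the bands run from 1 (1-3 days) to 10 (12 years to <18 years). How the published ranges count the day of birth is not stated here, so the first cut-point is a guess.

```stata
* Sketch only: derive the reference age band (1-10) from dob and path_date.
* The four rows are the first, second, third and fifth rows of the sample data.
clear
input int(dob path_date)
20853 21570
19185 21625
21397 22484
22337 22465
end

* Age in days at the time of the blood test
gen int age_days = path_date - dob

* irecode() returns 0 if x <= first cut-point, k if cut_k < x <= cut_(k+1),
* and the number of cut-points if x exceeds the last one.
* Year cut-points assume 365.25 days per year: 2y, 6y, 12y, 18y.
gen byte age_band = irecode(age_days, 0, 3, 7, 14, 30, 60, 180, 730, 2191, 4382, 6574)

* Band 0 (age <= 0 days) and band 11 (18 years or older) have no reference value
replace age_band = . if !inrange(age_band, 1, 10)

list dob path_date age_days age_band
```

For these four rows the derived band agrees with the path_age values in the sample data (7, 9, 8 and 6).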
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int path_hb byte sex int(dob path_date) float path_age
128 1 20853 21570 7
112 1 19185 21625 9
117 2 21397 22484 8
102 2 21642 22486 8
92 1 22337 22465 6
123 1 20591 22451 8
108 2 22363 22468 6
126 1 19719 22486 9
81 1 21210 22540 8
99 2 21917 22563 7
111 2 20887 22564 8
133 1 20385 21624 8
99 1 22084 22575 7
76 1 22257 22584 7
99 2 21312 22580 8
106 1 22279 22635 7
105 1 22325 22636 7
91 2 22481 22644 6
120 1 19505 21627 8
103 2 20331 21629 8
130 2 19890 21631 8
114 1 20886 21633 8
119 1 20965 21644 7
89 1 21510 21646 6
105 2 20648 21650 8
101 2 19456 21556 8
108 1 20751 21678 8
117 1 21525 21680 6
100 2 19458 21693 9
89 1 21452 21715 7
139 1 18245 21720 9
113 1 21175 21726 7
121 1 19665 21735 8
119 1 19253 21755 9
80 1 21279 21749 7
93 1 20515 21756 8
113 2 21448 21557 6
126 1 20900 21760 8
132 1 20744 21764 8
91 1 20860 21788 8
end
format %dM_d,_CY dob
format %dM_d,_CY path_date
label values sex sex_lbl
label def sex_lbl 1 "Male", modify
label def sex_lbl 2 "Female", modify
label values path_age path_age_lbl
label def path_age_lbl 5 "31d to 60d", modify
label def path_age_lbl 6 "61d to 180d", modify
label def path_age_lbl 7 "6m to <2y", modify
label def path_age_lbl 8 "2y to <6y", modify
label def path_age_lbl 9 "6y to <12y", modify
label def path_age_lbl 10 "12y to <18y", modify
Code:
replace anaemia = 1 if sex == X & path_age == Y & path_hb < Z
Code:
matrix input hbRef = ( 128, 128 \ /// 1: 1-3 days
                       133, 130 \ /// 2: 4-7 days
                       110, 120 \ /// 3: 8-14 days
                        98, 102 \ /// 4: 15-30 days
                        90,  89 \ /// 5: 31-60 days
                        94,  96 \ /// 6: 61-180 days
                       102, 103 \ /// 7: 181 days to <2 years
                       107, 107 \ /// 8: 2 years to <6 years
                       113, 112 \ /// 9: 6 years to <12 years
                       124, 114 )  // 10: 12 years to <18 years
matrix colnames hbRef = "Males" "Females"
matrix rownames hbRef = "1d to 3d" "4d to 7d" "8d to 14d" "15d to 30d" "31d to 60d" "61d to 180d" "6m to <2y" "2y to <6y" "6y to <12y" "12y to <18y"
Code:
. mat list hbRef

hbRef[10,2]
               Males  Females
   1d to 3d      128      128
   4d to 7d      133      130
  8d to 14d      110      120
 15d to 30d       98      102
 31d to 60d       90       89
61d to 180d       94       96
  6m to <2y      102      103
  2y to <6y      107      107
 6y to <12y      113      112
12y to <18y      124      114
Code:
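gen anaemia = .
replace anaemia = 0 if !missing(path_hb, path_age, sex)
* A missing or out-of-range subscript makes hbRef[] return missing, and
* path_hb < . is true, so the inputs must be checked here as well
replace anaemia = 1 if path_hb < hbRef[path_age, sex] & !missing(path_hb, path_age, sex)
label variable anaemia "Anaemic for age"
* yesno_lbl is assumed to be defined already
label values anaemia yesno_lbl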
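One way to gain confidence in the matrix lookup is to compare it with the explicit line-by-line comparisons it replaces. The following is a minimal sketch on four made-up observations. The variable anaemia_chk and the test rows are hypothetical, and the thresholds are the ones in hbRef above.

```stata
* Sketch only: cross-check the matrix lookup against explicit comparisons.
* The four observations are invented to cover below-threshold, at-threshold,
* missing haemoglobin and missing age band.
clear
input int path_hb byte(sex path_age)
 99 1 7
107 2 8
  . 1 9
120 1 .
end

* Lower-bound thresholds: rows are age bands 1-10, columns are males, females
matrix hbRef = (128, 128 \ 133, 130 \ 110, 120 \ 98, 102 \ 90, 89 \ ///
                 94,  96 \ 102, 103 \ 107, 107 \ 113, 112 \ 124, 114)

* Matrix lookup, evaluated per observation; left missing if any input is missing
gen byte anaemia = path_hb < hbRef[path_age, sex] if !missing(path_hb, path_age, sex)

* The same rule written out as one comparison per age band and sex
gen byte anaemia_chk = 0 if !missing(path_hb, path_age, sex)
forvalues a = 1/10 {
    forvalues s = 1/2 {
        replace anaemia_chk = 1 if sex == `s' & path_age == `a' & path_hb < hbRef[`a', `s']
    }
}

* Stops with an error if the two approaches ever disagree
assert anaemia == anaemia_chk
list
```

The second observation sits exactly on the threshold (107) and is therefore not flagged, because the comparison is strict.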