Here is the problem:
I have a large dataset of blood values for children. I want to create a binary variable, anaemia, that is 0 if the child is not anaemic and 1 if the child is anaemic. Normal haemoglobin values vary by age and sex. In the past I have used published mean (SD) values to generate Z-scores and classified children with a haemoglobin Z-score below -2 as anaemic. There are some issues with the parametric assumptions underlying that approach, so here I will instead use the lower-bound cut-offs published in: Staffa et al, Pediatric hematology normal ranges derived from pediatric primary care patients. Am J Hematol [Internet]. 2020 Oct [cited 2022 Dec 17];95(10).
The relevant variables in my dataset for achieving this are:
- haemoglobin (path_hb)
- sex (sex)
- date of birth (dob)
- date of blood test (path_date)
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int path_hb byte sex int(dob path_date) float path_age
128 1 20853 21570 7
112 1 19185 21625 9
117 2 21397 22484 8
102 2 21642 22486 8
92 1 22337 22465 6
123 1 20591 22451 8
108 2 22363 22468 6
126 1 19719 22486 9
81 1 21210 22540 8
99 2 21917 22563 7
111 2 20887 22564 8
133 1 20385 21624 8
99 1 22084 22575 7
76 1 22257 22584 7
99 2 21312 22580 8
106 1 22279 22635 7
105 1 22325 22636 7
91 2 22481 22644 6
120 1 19505 21627 8
103 2 20331 21629 8
130 2 19890 21631 8
114 1 20886 21633 8
119 1 20965 21644 7
89 1 21510 21646 6
105 2 20648 21650 8
101 2 19456 21556 8
108 1 20751 21678 8
117 1 21525 21680 6
100 2 19458 21693 9
89 1 21452 21715 7
139 1 18245 21720 9
113 1 21175 21726 7
121 1 19665 21735 8
119 1 19253 21755 9
80 1 21279 21749 7
93 1 20515 21756 8
113 2 21448 21557 6
126 1 20900 21760 8
132 1 20744 21764 8
91 1 20860 21788 8
end
format %dM_d,_CY dob
format %dM_d,_CY path_date
label values sex sex_lbl
label def sex_lbl 1 "Male", modify
label def sex_lbl 2 "Female", modify
label values path_age path_age_lbl
label def path_age_lbl 5 "31d to 60d", modify
label def path_age_lbl 6 "61d to 180d", modify
label def path_age_lbl 7 "6m to <2y", modify
label def path_age_lbl 8 "2y to <6y", modify
label def path_age_lbl 9 "6y to <12y", modify
label def path_age_lbl 10 "12y to <18y", modify
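The example data already carries a coded age band (path_age), but the post does not show how it was built from dob and path_date. The following is a rough sketch of one way to derive it, not the author's actual code. The names age_days, age_years and path_age_chk are hypothetical, years are approximated as days divided by 365.25, and treating the day after birth as day 1 is an assumption.

```stata
* Hypothetical helper variables (not in the original dataset)
gen int age_days = path_date - dob            // Stata dates are day counts
gen double age_years = age_days / 365.25      // approximation, ignores exact birthdays

* Age bands numbered 1-10 to match the value labels on path_age
gen byte path_age_chk = .
replace path_age_chk = 1  if inrange(age_days, 1, 3)
replace path_age_chk = 2  if inrange(age_days, 4, 7)
replace path_age_chk = 3  if inrange(age_days, 8, 14)
replace path_age_chk = 4  if inrange(age_days, 15, 30)
replace path_age_chk = 5  if inrange(age_days, 31, 60)
replace path_age_chk = 6  if inrange(age_days, 61, 180)
replace path_age_chk = 7  if age_days > 180 & age_years < 2
replace path_age_chk = 8  if age_years >= 2  & age_years < 6
replace path_age_chk = 9  if age_years >= 6  & age_years < 12
replace path_age_chk = 10 if age_years >= 12 & age_years < 18

* Compare against the supplied coding; off-diagonal cells flag disagreements
tabulate path_age path_age_chk, missing
```

Rows with a missing date, or an age outside 1 day to under 18 years, stay missing in path_age_chk rather than being forced into a band.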
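One option is to write a line like this for every combination of sex and age band, where X, Y and Z stand for the sex code, the age-band code and the matching cut-off:
Code:
replace anaemia = 1 if sex == X & path_age == Y & path_hb < Z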
The cut-offs can be stored in a matrix, with one row per path_age code and one column per sex code:
Code:
matrix input hbRef = ( 128, 128 \ /// 1: 1-3 days
                       133, 130 \ /// 2: 4-7 days
                       110, 120 \ /// 3: 8-14 days
                        98, 102 \ /// 4: 15-30 days
                        90,  89 \ /// 5: 31-60 days
                        94,  96 \ /// 6: 61-180 days
                       102, 103 \ /// 7: 181 days to <2 years
                       107, 107 \ /// 8: 2 years to <6 years
                       113, 112 \ /// 9: 6 years to <12 years
                       124, 114 )   // 10: 12 years to <18 years
matrix colnames hbRef = "Males" "Females"
matrix rownames hbRef = "1d to 3d" "4d to 7d" "8d to 14d" "15d to 30d" "31d to 60d" "61d to 180d" "6m to <2y" "2y to <6y" "6y to <12y" "12y to <18y"
Code:
. mat list hbRef
hbRef[10,2]
               Males  Females
   1d to 3d      128      128
   4d to 7d      133      130
  8d to 14d      110      120
 15d to 30d       98      102
 31d to 60d       90       89
61d to 180d       94       96
  6m to <2y      102      103
  2y to <6y      107      107
 6y to <12y      113      112
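12y to <18y      124      114
Code:
gen anaemia = .
* Not anaemic by default wherever all three inputs are observed
replace anaemia = 0 if !missing(path_hb, path_age, sex)
* Row = path_age code, column = sex code. The missing() guard matters: if the
* lookup returns missing, path_hb < . is true and the child would be flagged
replace anaemia = 1 if path_hb < hbRef[path_age, sex] & !missing(path_hb, path_age, sex)
label variable anaemia "Anaemic for age"
* yesno_lbl is assumed to be defined already (it is not created in this post)
label values anaemia yesno_lbl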
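A variation that may make the result easier to audit is to store the looked-up cut-off in its own variable before comparing. This is only a sketch: hb_cutoff and anaemia_chk are hypothetical names that do not appear in the original code, and it assumes hbRef has been defined as above, with path_age coded 1 to 10 and sex coded 1 or 2.

```stata
* hb_cutoff and anaemia_chk are illustrative names (assumptions)
gen hb_cutoff = hbRef[path_age, sex]          // per-observation matrix lookup

* 1 if below the cut-off, 0 otherwise, missing if either value is missing
gen byte anaemia_chk = path_hb < hb_cutoff if !missing(path_hb, hb_cutoff)

* Spot-check one cell: row 8 (2y to <6y), column 2 (Females) is 107 in the table
display hbRef[8, 2]

* Eyeball a few rows to confirm the cut-off matches the child's age band and sex
list path_hb sex path_age hb_cutoff anaemia_chk in 1/5
```

Keeping the cut-off as a variable means each classification can be checked against the reference table row by row, at the cost of one extra column in the dataset.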