Hi
I am trying to match data by gender (exact), age (+/- 5 years) and BMI (+/- 3). The code below runs, but some matched pairs have BMI values outside the +/- 3 range. Could someone see what is wrong with this code? Thanks
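The rule being asked for has three parts. A control is valid for a case only when gender is identical, age differs by at most 5 years and BMI differs by at most 3 units, all for the same pair of records. The sketch below applies that rule to made-up toy data; all values, and names such as age_case and bmi_cntl, are hypothetical. It uses Stata's built-in joinby to form every same-gender pair, then applies both tolerances in a single keep if.

```stata
* Toy controls (hypothetical values, for illustration only)
clear
input id_cntl str1 gender age_cntl bmi_cntl
101 "F" 52 27.0
102 "F" 58 31.5
103 "M" 45 24.0
104 "M" 49 29.5
end
tempfile controls
save `controls'

* Toy cases (hypothetical values)
clear
input id_case str1 gender age_case bmi_case
1 "F" 55 28.0
2 "M" 47 25.0
end

* Exact match on gender: form every case-control pair of the same gender
joinby gender using `controls'

* Keep only pairs that satisfy BOTH tolerances on the same pair of rows
keep if abs(age_case - age_cntl) <= 5 & abs(bmi_case - bmi_cntl) <= 3

list id_case id_cntl gender age_case age_cntl bmi_case bmi_cntl, sepby(id_case)
```

joinby creates every same-gender pair before filtering, so it can be slow or memory-hungry on large files. rangejoin is designed to avoid that. Picking at most two controls per case would still happen afterwards, as in the script. In Stata a missing value is larger than any number, so abs(...) <= 3 also drops pairs where either BMI is missing.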
clear
** creating matched data for age (+/- 5), gender (exact match) and BMI (+/- 3)
******************************Data preparation task******************************************************
use "F:\OSA data\Latestcode\AllwithOSAdata_191121latest.dta"
** create cases subset
keep if id_casecntrl==1
keep if flag==1
rename id id_case
save "F:\OSA data\Latestcode\Cases1.dta", replace
** create controls subset
use "F:\OSA data\Latestcode\AllwithOSAdata_191121latest.dta"
keep if id_casecntrl==2
rename id id_cntl
save "F:\OSA data\Latestcode\Controls1.dta", replace
gen rand = runiform()
sort rand
drop rand
save "F:\OSA data\Latestcode\Controls2.dta", replace
*rename * *_cntl
*rename id_cntl id
*duplicates drop id, force
*save "C:\Users\venka\Desktop\NSWHealth\Venkatesha - Consults\1970 - Premala Sureshkumar\Controls3.dta", replace
******************************End of Data preparation task******************************************************
*Read the cases data file. Replace the data set file paths in the program as appropriate.
use "F:\OSA data\Latestcode\Cases1.dta"
* matching (exact) on Gender, within +/- 5 years for age
compress
rangejoin ageatvisit -5 5 using "F:\OSA data\Latestcode\Controls2.dta", by(gender)
order id_case id_cntl gender ageatvisit
drop *_U
gen rand = runiform()
sort rand
drop rand
*rename *_U *_cntl
*rename id id_cases
*sort id_cases
*drop if id_casecntrl_cntl==.
*keep at most two matched controls per case (preserving a 1:2 case:control ratio)
*bysort id_cases: keep if _n <= 2
*Check how many controls were found for every case
*bysort id_cases: gen byte numcontrols = _N if _n == 1
*tab numcontrols
*drop if numcontrols == 1
*drop numcontrols
** Matching on age and gender is complete.
*rename id_cntl id
*drop *_cntl
*gen rand = runiform()
*sort rand
*drop rand
* matching within +/- 3 units of BMI
rangejoin bmi -3 3 using "F:\OSA data\Latestcode\Controls2.dta", by(id_cntl)
drop if ageatvisit_U==.
drop if gender_U==""
order id_case id_cntl gender gender_U ageatvisit ageatvisit_U bmi bmi_U
drop *_U
*sort id_case
*keep at most two matched controls per case (preserving a 1:2 case:control ratio)
bysort id_case id_cntl: keep if _n == 1
bysort id_case: keep if _n <= 2
*Check how many controls were found for every case
bysort id_case: gen byte numcontrols = _N if _n ==1
tab numcontrols
drop if numcontrols == 1
drop numcontrols
rename * *_case
rename (id_case_case id_cntl_case) (id_case id_cntl)
*drop *_U
*rename * *_case
*rename (id_cases_case id_case) (id_case id)
save "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI0.dta", replace
use "F:\OSA data\Latestcode\Controls2.dta"
rename * *_cntl
rename id_cntl_cntl id_cntl
duplicates drop id_cntl, force
save "F:\OSA data\Latestcode\Controls3.dta", replace
use "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI0.dta"
* m:1, because Controls3 holds one row per id_cntl after duplicates drop
merge m:1 id_cntl using "F:\OSA data\Latestcode\Controls3.dta"
order id_case id_cntl gender_case gender_cntl ageatvisit_case ageatvisit_cntl bmi_case bmi_cntl
drop if id_case==""
drop _merge
bysort id_case id_cntl: keep if _n == 1
*Check how many controls were found for every case
bysort id_case: gen byte numcontrols = _N if _n ==1
tab numcontrols
drop if numcontrols == 1
drop numcontrols
sort id_case
order id_case id_cntl gender_case gender_cntl ageatvisit_case ageatvisit_cntl bmi_case bmi_cntl
save "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI1.dta", replace
** Matching on age, gender and BMI is complete.
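A few checks may help locate the problem. The sketch below reuses the file and variable names from the script above and has not been run on these data.

- The first command lists final pairs whose BMIs differ by more than 3 units.
- The second reports whether id_cntl is unique in the controls file.

If id_cntl is not unique (for example, one control with several visits and different BMIs), that could explain the symptom. The second rangejoin with by(id_cntl) can match on one row of a control. However, duplicates drop id_cntl, force keeps only the first row it meets for each id_cntl. As a result, bmi_cntl in the merged file need not be the BMI that was actually matched.

The last part shows one possible alternative to a second rangejoin. Right after the age/gender rangejoin, compare the case BMI with bmi_U, which is the control BMI carried on the same row.

```stata
* 1) Final pairs that break the BMI rule.
*    Missing counts as larger than any number in Stata,
*    so pairs with a missing bmi_case or bmi_cntl are listed too.
use "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI1.dta", clear
list id_case id_cntl bmi_case bmi_cntl if abs(bmi_case - bmi_cntl) > 3

* 2) Does id_cntl identify exactly one row in the controls file?
use "F:\OSA data\Latestcode\Controls2.dta", clear
duplicates report id_cntl

* 3) Possible alternative (a sketch): filter on BMI within the same pair row
*    produced by the age/gender join, before the control's *_U variables are dropped.
use "F:\OSA data\Latestcode\Cases1.dta", clear
rangejoin ageatvisit -5 5 using "F:\OSA data\Latestcode\Controls2.dta", by(gender)
* bmi is the case's BMI and bmi_U the matched control's BMI on this row;
* unmatched cases (bmi_U missing) are dropped here as well
keep if abs(bmi - bmi_U) <= 3
```

The rest of the script (keeping two controls per case, renaming, saving) could then follow step 3 unchanged. Each control's age and BMI would come from the same record.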