Hi
I am trying to match cases to controls on gender (exact match), age (within +/- 5 years) and BMI (within +/- 3). The code below produces matches, but some matched pairs have BMI values outside the +/- 3 range. Could someone see what is wrong with this code? Thanks


clear

** Create matched data on age (+/- 5 years), gender (exact match) and BMI (+/- 3)

****************************** Data preparation task ******************************

use "F:\OSA data\Latestcode\AllwithOSAdata_191121latest.dta"

** create cases subset
keep if id_casecntrl==1
keep if flag==1
rename id id_case
save "F:\OSA data\Latestcode\Cases1.dta", replace

** create controls subset
use "F:\OSA data\Latestcode\AllwithOSAdata_191121latest.dta"
keep if id_casecntrl==2
rename id id_cntl
save "F:\OSA data\Latestcode\Controls1.dta", replace

gen rand = runiform()
sort rand
drop rand

save "F:\OSA data\Latestcode\Controls2.dta", replace

*rename * *_cntl
*rename id_cntl id
*duplicates drop id, force
*save "C:\Users\venka\Desktop\NSWHealth\Venkatesha - Consults\1970 - Premala Sureshkumar\Controls3.dta", replace

****************************** End of data preparation task ******************************


* Read the cases data file. Replace the file path of the data set appropriately in the program
use "F:\OSA data\Latestcode\Cases1.dta"

* matching (exact) on Gender, within +/- 5 years for age
compress
rangejoin ageatvisit -5 5 using "F:\OSA data\Latestcode\Controls2.dta", by (gender)

order id_case id_cntl gender ageatvisit
drop *_U

gen rand = runiform()
sort rand
drop rand

*rename *_U *_cntl
*rename id id_cases
*sort id_cases
*drop if id_casecntrl_cntl==.

*use matched control only twice for each matched case(preserving 1:2 case : control ratio)
*bysort id_cases: keep if _n <= 2

*Check how many controls were found for every case
*bysort id_cases: gen byte numcontrols = _N if _n == 1
*tab numcontrols
*drop if numcontrols == 1
*drop numcontrols

** Matching on age and gender is complete.

*rename id_cntl id
*drop *_cntl

*gen rand = runiform()
*sort rand
*drop rand

* matching within +/- 3 units of BMI

rangejoin bmi -3 3 using "F:\OSA data\Latestcode\Controls2.dta", by (id_cntl)
drop if ageatvisit_U==.
drop if gender_U==""

order id_case id_cntl gender gender_U ageatvisit ageatvisit_U bmi bmi_U

drop *_U
*sort id_case

* keep one row per case-control pair, then at most two controls per case (1:2 case:control ratio)

bysort id_case id_cntl: keep if _n == 1
bysort id_case: keep if _n <= 2

* Count the controls found for every case and drop cases with only one control
bysort id_case: gen byte numcontrols = _N if _n ==1
tab numcontrols
drop if numcontrols == 1
drop numcontrols

rename * *_case
rename (id_case_case id_cntl_case) (id_case id_cntl)

*drop *_U
*rename * *_case
*rename (id_cases_case id_case) (id_case id)

save "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI0.dta", replace


use "F:\OSA data\Latestcode\Controls2.dta"

rename * *_cntl
rename id_cntl_cntl id_cntl
duplicates drop id_cntl, force

save "F:\OSA data\Latestcode\Controls3.dta", replace

use "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI0.dta"

merge m:m id_cntl using "F:\OSA data\Latestcode\Controls3.dta"

order id_case id_cntl gender_case gender_cntl ageatvisit_case ageatvisit_cntl bmi_case bmi_cntl
drop if id_case==""
drop _merge

bysort id_case id_cntl: keep if _n == 1

* Count the controls found for every case and drop cases with only one control
bysort id_case: gen byte numcontrols = _N if _n ==1
tab numcontrols
drop if numcontrols == 1
drop numcontrols
sort id_case

order id_case id_cntl gender_case gender_cntl ageatvisit_case ageatvisit_cntl bmi_case bmi_cntl

save "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI1.dta", replace
** Matching on age, gender and BMI is complete.
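
** Possible alternative (a minimal sketch, not tested on these data).
** Idea: join on age once, keep the control's own values (the *_U variables that
** rangejoin creates), and apply the BMI window to each matched pair directly,
** instead of dropping *_U and merging the control data back in by id_cntl afterwards.
** Assumptions: rangejoin (from SSC) is installed, bmi exists in both files so the
** control's value arrives as bmi_U, and the seed value below is arbitrary.
use "F:\OSA data\Latestcode\Cases1.dta", clear
rangejoin ageatvisit -5 5 using "F:\OSA data\Latestcode\Controls2.dta", by(gender)
* cases with no control inside the age window come back with missing control values
drop if missing(id_cntl)
* BMI window checked on the pair itself; pairs with a missing BMI are dropped too
keep if abs(bmi - bmi_U) <= 3
* random order, made reproducible by the (arbitrary) seed
set seed 12345
gen double rand = runiform()
* keep one row per case-control pair, in case a control has more than one row
bysort id_case id_cntl (rand): keep if _n == 1
* keep up to two randomly chosen controls per case
bysort id_case (rand): keep if _n <= 2
drop rand
* the control's age and BMI stay alongside the case's for checking
order id_case id_cntl gender ageatvisit ageatvisit_U bmi bmi_U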