To estimate a regression discontinuity in time model, I want to use cross-validation to determine the optimal bandwidth. I have 7 weeks of pre-intervention and 7 weeks of post-intervention data. For cross-validation, I retain only the 7 pre-intervention weeks.
I use the leave-one-out procedure and then iterate over bandwidths of 2, 3, 4, 5, 6, and 7 weeks to see which one gives the smallest mean squared error. The data are as follows:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 npi int year float week int userTRA
"J338339LLR" 2014 4 0
"J338339LLR" 2014 6 0
"J338339LLR" 2014 7 0
"J33833J3J3" 2014 2 0
"J33833J99R" 2014 2 1
"J33833JOLJ" 2014 4 0
"J33833NF9L" 2014 5 0
"J33833R8F8" 2014 1 0
"J33833RLFF" 2014 7 0
"J33833RO8R" 2014 2 0
"J338383FRV" 2014 7 0
"J338383R89" 2014 2 0
"J33838FR9R" 2014 6 0
"J33838LJ8R" 2014 3 0
"J33838LVFO" 2014 6 1
"J33838RFNL" 2014 1 1
"J338393FOR" 2014 7 0
"J338398J88" 2014 6 0
"J3383998JF" 2014 4 0
"J338399JRF" 2014 5 0
"J338399N33" 2014 2 1
"J338399V3R" 2014 7 0
"J33839F99O" 2014 6 1
"J33839F99O" 2014 7 0
"J33839FNL3" 2014 5 0
"J33839JFRV" 2014 6 0
"J33839JLRL" 2014 5 0
"J33839NNOL" 2014 6 0
"J33839O383" 2014 4 0
"J33839O8R8" 2014 2 0
"J33839OR8R" 2014 6 2
"J3383F33NJ" 2014 2 0
"J3383F38JN" 2014 4 0
"J3383F988V" 2014 2 0
"J3383FN3VR" 2014 2 1
"J3383FNFNL" 2014 1 0
"J3383FNFNL" 2014 5 0
"J3383FR8L9" 2014 2 0
"J3383FROOF" 2014 2 0
"J3383FVRVO" 2014 5 0
"J3383J3983" 2014 1 0
"J3383J3JV8" 2014 3 1
"J3383J88FO" 2014 3 0
"J3383J8RJV" 2014 3 0
"J3383J8RJV" 2014 4 0
"J3383J8VFV" 2014 5 0
"J3383JONVF" 2014 5 0
"J3383JRLJ8" 2014 2 1
"J3383JRLJ8" 2014 7 1
"J3383L3VJV" 2014 7 0
"J3383L88NV" 2014 5 0
"J3383LF888" 2014 1 0
"J3383LFR3J" 2014 7 0
"J3383LJJFO" 2014 2 0
"J3383LL9RN" 2014 6 1
"J3383LLN8N" 2014 5 0
"J3383LLVFJ" 2014 1 0
"J3383LLVFJ" 2014 5 0
"J3383LRVO8" 2014 2 0
"J3383LVFOR" 2014 7 0
"J3383LVN93" 2014 3 0
"J3383N83R8" 2014 5 0
"J3383N888L" 2014 2 0
"J3383N9LFJ" 2014 5 0
"J3383NL93R" 2014 2 0
"J3383NLV8O" 2014 7 1
"J3383NNJFV" 2014 1 0
"J3383NNJFV" 2014 6 1
"J3383NO3RF" 2014 6 0
"J3383NVJNJ" 2014 4 0
"J3383O3LVV" 2014 7 0
"J3383OFJJL" 2014 5 0
"J3383ON8LO" 2014 3 1
"J3383OOO9F" 2014 1 2
"J3383OORLN" 2014 1 0
"J3383ORLLO" 2014 2 0
"J3383OVN8F" 2014 4 0
"J3383OVRF3" 2014 6 0
"J3383R3F9N" 2014 2 0
"J3383RF9O9" 2014 6 0
"J3383RF9O9" 2014 7 0
"J3383RNV9R" 2014 1 0
"J3383ROLLV" 2014 6 0
"J3383V3OOO" 2014 4 0
"J3383V3OOO" 2014 7 0
"J3383V83FL" 2014 6 0
"J3383V83N9" 2014 7 0
"J3383V9J89" 2014 3 0
"J3383VL398" 2014 1 1
"J3383VL398" 2014 2 1
"J338F8FJFV" 2014 4 0
"J338FLNVF3" 2014 5 0
"J338FR9RLL" 2014 3 0
"J338FRR3LF" 2014 6 0
"J338FRV38F" 2014 5 0
"J338J33FOV" 2014 3 0
"J338J33RNO" 2014 5 1
"J338J38JR3" 2014 5 1
"J338J398ON" 2014 1 1
"J338J39OV3" 2014 5 0
end
Code:
. // PREPARING DATA FOR CV FOR FEDERAL SCHEDULING
.
. use "C:\Users\Sumedha\Documents\OPTUM\fillsProviderPanel_weekly_DS_10.dta", clear
. drop if dateWeek<2834 |dateWeek>2840 // only 7 weeks pre-intervention
(5,265,166 observations deleted)
. gen event_time_dateweek=dateWeek-2840
. rename prov_state state
. sort state
. drop _merge
. merge m:1 state using "G:\Misc. Data\stateFIPS.dta"
    Result                           # of obs.
    -----------------------------------------
    not matched                         3,459
        from master                     3,459  (_merge==1)
        from using                          0  (_merge==2)

    matched                            66,609  (_merge==3)
    -----------------------------------------
. drop if _merge~=3
(3,459 observations deleted)
. drop _merge
. rename stateFIPS st_fips
.
. gen ReschedTRA_treat=0
. replace ReschedTRA_treat=1 if inlist(st_fips, 5, 13, 17, 21, 28, 35, 36, 38, 39, 40, 47, 56)
(15,658 real changes made)
.
. keep if ReschedTRA_treat==0
(15,658 observations deleted)
. gen week=dateWeek-2833
.
. /*
> NOTES: cllr_crossval
> The goal is to estimate the bandwidth that minimizes the IMSE of a local linear regression.
> A grid search is used and estimation is based on the cllr program described above.
>
> Arguments
> outcome: a Stata variable containing the dependent variable
> x: a Stata variable containing the independent variable
> start: a hardcoded number or local variable defining the start of the sequence of candidate bandwidths
> step: a hardcoded number or local variable defining the step size of the sequence of candidate bandwidths
> stop: a hardcoded number or local variable defining the end of the sequence of candidate bandwidths
> sub: a Stata variable set to 1 if the observation should be included in the analysis
>
> Returns
> A Stata matrix and a set of Stata variables that contain the estimated IMSE for each candidate bandwidth.
>
> */
.
.
. sort npi
. gen N=_n if npi[_n]~=npi[_n-1]
(6,945 missing values generated)
. bysort npi: egen maxN=max(N)
. replace N=maxN if N==.
(6,945 real changes made)
. bysort N week: gen counter=_n
. drop if counter>1
(141 observations deleted)
. xtset N week
panel variable: N (unbalanced)
time variable: week, 1 to 7, but with gaps
delta: 1 unit
.
. gen outcome = userTRA
. gen x = week
.
. capture program drop cllr_crossval
. program define cllr_crossval
    set more off
    args outcome x start step stop sub narrowsub
    tempvar cx ew e2 e2n

    * hard-coded bandwidth grid (overrides any start/step/stop passed in)
    local stop = 7
    local start = 1
    local step = 1

    * make a matrix to store the estimated IMSE
    local size = ((`stop' - `start')/`step')+1
    matrix M = J(`size', 3, .)

    * number of observations to leave out, one at a time
    local N = _N

    * iterate over candidate bandwidths
    local count = 0
    forvalues h = `start'(`step')`stop' {

        * increment counter
        local count = `count' + 1

        * store location on the bandwidth grid
        matrix M[`count', 1] = `h'

        * initialize the residual variable
        quietly gen `e2' = .

        * iterate over observations, leaving out observation i each time
        forvalues i = 1(1)`N' {
            capture quietly reghdfe /*regress*/ `outcome' `x' if _n~=`i' & week<=`h', absorb(npi)
            * record the squared error only if the regression actually ran
            if _rc == 0 {
                quietly replace `e2' = (`outcome' - _b[_cons])^2 in `i'
            }
        }

        * compute IMSE for the candidate bandwidth
        su `e2'
        matrix M[`count',2] = r(mean)

        drop `e2'
    }

    matrix list M
    svmat M
end
.
end of do-file
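
For reference, here is a minimal sketch of the leave-one-out loop on made-up data, including the call that actually runs the program once it is defined. It is an illustration under stated assumptions, not the exact procedure above. The program name `loo_cv_sketch` and the toy data are hypothetical. It uses the built-in `regress` instead of `reghdfe` so it runs without user-written packages. It scores each left-out observation with its fitted value from `predict` rather than with the intercept alone. As in the log above, the window keeps weeks `<= h` and every observation is scored.

```stata
* Toy data, made up for illustration only: 40 observations over weeks 1-7
clear
set seed 12345
set obs 40
gen week = 1 + mod(_n, 7)
gen outcome = 0.1*week + rnormal()

capture program drop loo_cv_sketch
program define loo_cv_sketch
    * hypothetical helper: outcome variable, running variable, bandwidth grid
    args outcome x start step stop
    tempvar e2 yhat
    local nobs = _N
    local size = floor((`stop' - `start')/`step') + 1
    matrix M = J(`size', 2, .)

    local count = 0
    forvalues h = `start'(`step')`stop' {
        local ++count
        matrix M[`count', 1] = `h'
        quietly gen double `e2' = .

        forvalues i = 1/`nobs' {
            * fit on every other observation inside the window
            capture quietly regress `outcome' `x' if _n != `i' & `x' <= `h'
            if _rc == 0 {
                * squared prediction error for the left-out observation
                quietly predict double `yhat' in `i', xb
                quietly replace `e2' = (`outcome' - `yhat')^2 in `i'
                drop `yhat'
            }
        }

        * mean squared error for this candidate bandwidth
        quietly summarize `e2'
        matrix M[`count', 2] = r(mean)
        drop `e2'
    }

    matrix colnames M = bandwidth mse
    matrix list M
end

* defining a program does not execute it; it has to be called explicitly
loo_cv_sketch outcome week 2 1 7
```

The last line matters: a do-file that only contains `program define ... end` stores the program but produces no results until the program is called with its arguments.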
Sincerely,
Sumedha.
0 Response to Program for cross validation for regression discontinuity in time not running