To estimate a regression discontinuity in time model I want to use cross validation to determine the optimal bandwidth. I have 7 weeks of pre-intervention and 7 weeks of post-intervention data. For cross validation, I retain only the 7 weeks pre-intervention.
I use the leave-on-out procedure
and then iterate through 2, 3, 4, 5, 6 , 7 weeks to see which bandwidth gives me the smallest mean square error.Following is the data:
Code:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10 npi int year float week int userTRA
"J338339LLR" 2014 4 0
"J338339LLR" 2014 6 0
"J338339LLR" 2014 7 0
"J33833J3J3" 2014 2 0
"J33833J99R" 2014 2 1
"J33833JOLJ" 2014 4 0
"J33833NF9L" 2014 5 0
"J33833R8F8" 2014 1 0
"J33833RLFF" 2014 7 0
"J33833RO8R" 2014 2 0
"J338383FRV" 2014 7 0
"J338383R89" 2014 2 0
"J33838FR9R" 2014 6 0
"J33838LJ8R" 2014 3 0
"J33838LVFO" 2014 6 1
"J33838RFNL" 2014 1 1
"J338393FOR" 2014 7 0
"J338398J88" 2014 6 0
"J3383998JF" 2014 4 0
"J338399JRF" 2014 5 0
"J338399N33" 2014 2 1
"J338399V3R" 2014 7 0
"J33839F99O" 2014 6 1
"J33839F99O" 2014 7 0
"J33839FNL3" 2014 5 0
"J33839JFRV" 2014 6 0
"J33839JLRL" 2014 5 0
"J33839NNOL" 2014 6 0
"J33839O383" 2014 4 0
"J33839O8R8" 2014 2 0
"J33839OR8R" 2014 6 2
"J3383F33NJ" 2014 2 0
"J3383F38JN" 2014 4 0
"J3383F988V" 2014 2 0
"J3383FN3VR" 2014 2 1
"J3383FNFNL" 2014 1 0
"J3383FNFNL" 2014 5 0
"J3383FR8L9" 2014 2 0
"J3383FROOF" 2014 2 0
"J3383FVRVO" 2014 5 0
"J3383J3983" 2014 1 0
"J3383J3JV8" 2014 3 1
"J3383J88FO" 2014 3 0
"J3383J8RJV" 2014 3 0
"J3383J8RJV" 2014 4 0
"J3383J8VFV" 2014 5 0
"J3383JONVF" 2014 5 0
"J3383JRLJ8" 2014 2 1
"J3383JRLJ8" 2014 7 1
"J3383L3VJV" 2014 7 0
"J3383L88NV" 2014 5 0
"J3383LF888" 2014 1 0
"J3383LFR3J" 2014 7 0
"J3383LJJFO" 2014 2 0
"J3383LL9RN" 2014 6 1
"J3383LLN8N" 2014 5 0
"J3383LLVFJ" 2014 1 0
"J3383LLVFJ" 2014 5 0
"J3383LRVO8" 2014 2 0
"J3383LVFOR" 2014 7 0
"J3383LVN93" 2014 3 0
"J3383N83R8" 2014 5 0
"J3383N888L" 2014 2 0
"J3383N9LFJ" 2014 5 0
"J3383NL93R" 2014 2 0
"J3383NLV8O" 2014 7 1
"J3383NNJFV" 2014 1 0
"J3383NNJFV" 2014 6 1
"J3383NO3RF" 2014 6 0
"J3383NVJNJ" 2014 4 0
"J3383O3LVV" 2014 7 0
"J3383OFJJL" 2014 5 0
"J3383ON8LO" 2014 3 1
"J3383OOO9F" 2014 1 2
"J3383OORLN" 2014 1 0
"J3383ORLLO" 2014 2 0
"J3383OVN8F" 2014 4 0
"J3383OVRF3" 2014 6 0
"J3383R3F9N" 2014 2 0
"J3383RF9O9" 2014 6 0
"J3383RF9O9" 2014 7 0
"J3383RNV9R" 2014 1 0
"J3383ROLLV" 2014 6 0
"J3383V3OOO" 2014 4 0
"J3383V3OOO" 2014 7 0
"J3383V83FL" 2014 6 0
"J3383V83N9" 2014 7 0
"J3383V9J89" 2014 3 0
"J3383VL398" 2014 1 1
"J3383VL398" 2014 2 1
"J338F8FJFV" 2014 4 0
"J338FLNVF3" 2014 5 0
"J338FR9RLL" 2014 3 0
"J338FRR3LF" 2014 6 0
"J338FRV38F" 2014 5 0
"J338J33FOV" 2014 3 0
"J338J33RNO" 2014 5 1
"J338J38JR3" 2014 5 1
"J338J398ON" 2014 1 1
"J338J39OV3" 2014 5 0
end
Code:
. // PREPARING DATA FOR CV FOR FEDERAL SCHEDULING . . use "C:\Users\Sumedha\Documents\OPTUM\fillsProviderPanel_weekly_DS_10.dta", clear . drop if dateWeek<2834 |dateWeek>2840 // only 7 weeks pre-intervention (5,265,166 observations deleted) . gen event_time_dateweek=dateWeek-2840 . rename prov_state state . sort state . drop _merge . merge m:1 state using "G:\Misc. Data\stateFIPS.dta" Result # of obs. ----------------------------------------- not matched 3,459 from master 3,459 (_merge==1) from using 0 (_merge==2) matched 66,609 (_merge==3) ----------------------------------------- . drop if _merge~=3 (3,459 observations deleted) . drop _merge . rename stateFIPS st_fips . . gen ReschedTRA_treat=0 . replace ReschedTRA_treat=1 if st_fips==5| st_fips==13|st_fips==17|st_fips==21| > /// > st_fips== 28|st_fips==35|st_fips==36 |st_fips==38|st_fips==40|st_fips==47 > |st_fips==56|st_fips==39 (15,658 real changes made) . . keep if ReschedTRA_treat==0 (15,658 observations deleted) . gen week=dateWeek-2833 . . /* > NOTES: cllr_crossval > The goal is to estimate the bandwidth that minimizes the IMSE of a local linear regress > ion. > A grid search is used and estimation is based on the cllr program described above. > > Arguments > outcome: a stata variable containing the dependent variable > x: a stata variable containing the independent variable > start: a hardcoded number or local variable defining start of a sequence candidate > bandwidths > step: a hardcoded number or local variable defining the stepsize of the sequence o > f candidate bandwidth > stop: a hardcoded number or local variable defining the end of a sequence of candi > date bandwidths. > sub: a stata variable set to 1 if the observation should be included in the analy > sis > > Returns > A stata matrix and set of stata variables that contain the estimated IMSE for each ca > ndidate bandwidth. > > */ . . . sort npi . gen N=_n if npi[_n]~=npi[_n-1] (6,945 missing values generated) . bysort npi: egen maxN=max(N) . replace N=maxN if N==. (6,945 real changes made) . bysort N week: gen counter=_n . drop if counter>1 (141 observations deleted) . xtset N week panel variable: N (unbalanced) time variable: week, 1 to 7, but with gaps delta: 1 unit . . gen outcome = userTRA . gen x = week . . capture program drop cllr_crossval . program define cllr_crossval 1. set more off 2. args outcome x start step stop sub narrowsub 3. tempvar cx ew e2 e2n 4. . local stop = 7 5. local start = 1 6. local step = 1 7. *make a matrix to store the estimated IMSE . local size = ((`stop' - `start')/`step')+1 8. matrix M = J(`size', 3, .) 9. . . *Iterate over candidate bandwidths . local count = 0 10. forvalues h = `start'(`step')`stop'{ 11. . *increment counter . local count = `count' + 1 12. . *store location on the bandwidth grid . matrix M[`count', 1] = `h' 13. . *initialize the residual variable . gen `e2' = . 14. . . *Iterate over observations . forvalues i = 1(1)`N'{ 15. capture quietly reghdfe /*regress*/ `outcome' `x' if _n~=`i' & week > =<`h', absorb(npi) 16. replace `e2' = (`outcome' - _b[_cons])^2 in `i' 17. } 18. . *compute IMSE for the candidate bandwidth . su `e2' 19. matrix M[`count',2] = r(mean) 20. . . drop `e2' 21. } 22. . matrix list M 23. svmat M 24. end . end of do-file .
Sincerely,
Sumedha.
0 Response to Program for cross validation for regression discontinuity in time not running
Post a Comment