Program for cross validation for regression discontinuity in time not running

Dear All,

To estimate a regression discontinuity in time model I want to use cross validation to determine the optimal bandwidth. I have 7 weeks of pre-intervention and 7 weeks of post-intervention data. For cross validation, I retain only the 7 weeks pre-intervention.

I use the leave-on-out procedure

and then iterate through 2, 3, 4, 5, 6 , 7 weeks to see which bandwidth gives me the smallest mean square error.

Following is the data:

Code:

* Example generated by -dataex-. To install: ssc install dataex

clear

input str10 npi int year float week int userTRA

"J338339LLR" 2014 4 0

"J338339LLR" 2014 6 0

"J338339LLR" 2014 7 0

"J33833J3J3" 2014 2 0

"J33833J99R" 2014 2 1

"J33833JOLJ" 2014 4 0

"J33833NF9L" 2014 5 0

"J33833R8F8" 2014 1 0

"J33833RLFF" 2014 7 0

"J33833RO8R" 2014 2 0

"J338383FRV" 2014 7 0

"J338383R89" 2014 2 0

"J33838FR9R" 2014 6 0

"J33838LJ8R" 2014 3 0

"J33838LVFO" 2014 6 1

"J33838RFNL" 2014 1 1

"J338393FOR" 2014 7 0

"J338398J88" 2014 6 0

"J3383998JF" 2014 4 0

"J338399JRF" 2014 5 0

"J338399N33" 2014 2 1

"J338399V3R" 2014 7 0

"J33839F99O" 2014 6 1

"J33839F99O" 2014 7 0

"J33839FNL3" 2014 5 0

"J33839JFRV" 2014 6 0

"J33839JLRL" 2014 5 0

"J33839NNOL" 2014 6 0

"J33839O383" 2014 4 0

"J33839O8R8" 2014 2 0

"J33839OR8R" 2014 6 2

"J3383F33NJ" 2014 2 0

"J3383F38JN" 2014 4 0

"J3383F988V" 2014 2 0

"J3383FN3VR" 2014 2 1

"J3383FNFNL" 2014 1 0

"J3383FNFNL" 2014 5 0

"J3383FR8L9" 2014 2 0

"J3383FROOF" 2014 2 0

"J3383FVRVO" 2014 5 0

"J3383J3983" 2014 1 0

"J3383J3JV8" 2014 3 1

"J3383J88FO" 2014 3 0

"J3383J8RJV" 2014 3 0

"J3383J8RJV" 2014 4 0

"J3383J8VFV" 2014 5 0

"J3383JONVF" 2014 5 0

"J3383JRLJ8" 2014 2 1

"J3383JRLJ8" 2014 7 1

"J3383L3VJV" 2014 7 0

"J3383L88NV" 2014 5 0

"J3383LF888" 2014 1 0

"J3383LFR3J" 2014 7 0

"J3383LJJFO" 2014 2 0

"J3383LL9RN" 2014 6 1

"J3383LLN8N" 2014 5 0

"J3383LLVFJ" 2014 1 0

"J3383LLVFJ" 2014 5 0

"J3383LRVO8" 2014 2 0

"J3383LVFOR" 2014 7 0

"J3383LVN93" 2014 3 0

"J3383N83R8" 2014 5 0

"J3383N888L" 2014 2 0

"J3383N9LFJ" 2014 5 0

"J3383NL93R" 2014 2 0

"J3383NLV8O" 2014 7 1

"J3383NNJFV" 2014 1 0

"J3383NNJFV" 2014 6 1

"J3383NO3RF" 2014 6 0

"J3383NVJNJ" 2014 4 0

"J3383O3LVV" 2014 7 0

"J3383OFJJL" 2014 5 0

"J3383ON8LO" 2014 3 1

"J3383OOO9F" 2014 1 2

"J3383OORLN" 2014 1 0

"J3383ORLLO" 2014 2 0

"J3383OVN8F" 2014 4 0

"J3383OVRF3" 2014 6 0

"J3383R3F9N" 2014 2 0

"J3383RF9O9" 2014 6 0

"J3383RF9O9" 2014 7 0

"J3383RNV9R" 2014 1 0

"J3383ROLLV" 2014 6 0

"J3383V3OOO" 2014 4 0

"J3383V3OOO" 2014 7 0

"J3383V83FL" 2014 6 0

"J3383V83N9" 2014 7 0

"J3383V9J89" 2014 3 0

"J3383VL398" 2014 1 1

"J3383VL398" 2014 2 1

"J338F8FJFV" 2014 4 0

"J338FLNVF3" 2014 5 0

"J338FR9RLL" 2014 3 0

"J338FRR3LF" 2014 6 0

"J338FRV38F" 2014 5 0

"J338J33FOV" 2014 3 0

"J338J33RNO" 2014 5 1

"J338J38JR3" 2014 5 1

"J338J398ON" 2014 1 1

"J338J39OV3" 2014 5 0

end

Then the code I am trying to run is:

Code:

. // PREPARING DATA FOR CV FOR FEDERAL SCHEDULING
.
. use "C:\Users\Sumedha\Documents\OPTUM\fillsProviderPanel_weekly_DS_10.dta", clear

. drop if dateWeek<2834 |dateWeek>2840 // only 7 weeks pre-intervention
(5,265,166 observations deleted)

. gen event_time_dateweek=dateWeek-2840

. rename  prov_state state

. sort state

. drop _merge

. merge m:1  state using "G:\Misc. Data\stateFIPS.dta"

    Result                           # of obs.
    -----------------------------------------
    not matched                         3,459
        from master                     3,459  (_merge==1)
        from using                          0  (_merge==2)

    matched                            66,609  (_merge==3)
    -----------------------------------------

. drop if _merge~=3
(3,459 observations deleted)

. drop _merge

. rename stateFIPS st_fips

.
.         gen ReschedTRA_treat=0

.         replace ReschedTRA_treat=1 if st_fips==5| st_fips==13|st_fips==17|st_fips==21|
> ///
>         st_fips==       28|st_fips==35|st_fips==36 |st_fips==38|st_fips==40|st_fips==47
> |st_fips==56|st_fips==39
(15,658 real changes made)

.
. keep if ReschedTRA_treat==0
(15,658 observations deleted)

. gen week=dateWeek-2833

.
. /*
> NOTES: cllr_crossval
> The goal is to estimate the bandwidth that minimizes the IMSE of a local linear regress
> ion.
> A grid search is used and estimation is based on the cllr program described above.
>
> Arguments
>   outcome: a stata variable containing the dependent variable
>         x: a stata variable containing the independent variable
>     start: a hardcoded number or local variable defining start of a sequence candidate
> bandwidths
>      step: a hardcoded number or local variable defining the stepsize of the sequence o
> f candidate bandwidth
>      stop: a hardcoded number or local variable defining the end of a sequence of candi
> date bandwidths.
>       sub: a stata variable set to 1 if the observation should be included in the analy
> sis
>
> Returns
>   A stata matrix and set of stata variables that contain the estimated IMSE for each ca
> ndidate bandwidth.
>
> */
.
.
. sort npi

. gen N=_n if npi[_n]~=npi[_n-1]
(6,945 missing values generated)

. bysort npi: egen maxN=max(N)

. replace N=maxN if N==.
(6,945 real changes made)

. bysort N week: gen counter=_n

. drop if counter>1
(141 observations deleted)

. xtset N week
       panel variable:  N (unbalanced)
        time variable:  week, 1 to 7, but with gaps
                delta:  1 unit

.
. gen outcome = userTRA

. gen x = week

.
. capture program drop cllr_crossval

. program define cllr_crossval
  1.         set more off
  2.         args outcome x start step stop sub narrowsub
  3.         tempvar cx ew e2 e2n
  4.
.         local stop = 7
  5.         local start = 1
  6.         local step = 1
  7.         *make a matrix to store the estimated IMSE
.         local size = ((`stop' - `start')/`step')+1
  8.         matrix M = J(`size', 3, .)
  9.        
.        
.         *Iterate over candidate bandwidths
.         local count = 0
 10.         forvalues h = `start'(`step')`stop'{
 11.        
.                 *increment counter
.                 local count = `count' + 1
 12.                
.                 *store location on the bandwidth grid
.                 matrix M[`count', 1] = `h'
 13.                
.                 *initialize the residual variable
.                 gen `e2' = .
 14.                
.                
.                 *Iterate over observations
.                 forvalues i = 1(1)`N'{
 15.                 capture quietly reghdfe /*regress*/ `outcome' `x' if  _n~=`i' & week
> =<`h', absorb(npi)
 16.                                 replace `e2' = (`outcome' - _b[_cons])^2 in `i'
 17.                                         }
 18.                
.                 *compute IMSE for the candidate bandwidth
.                 su `e2'
 19.                 matrix M[`count',2] = r(mean)
 20.
.                                
.                 drop `e2'
 21.         }
 22.
.         matrix list M
 23.         svmat M
 24. end

.
end of do-file

.

But it doesn't run at all! I will be grateful for any help in identifying what I am doing wrong.

Sincerely,
Sumedha.

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Program for cross validation for regression discontinuity in time not running
Program for cross validation for regression discontinuity in time not running

0 Response to Program for cross validation for regression discontinuity in time not running

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Program for cross validation for regression discontinuity in time not running Program for cross validation for regression discontinuity in time not running

Related Posts with Program for cross validation for regression discontinuity in time not running

0 Response to Program for cross validation for regression discontinuity in time not running

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Program for cross validation for regression discontinuity in time not running
Program for cross validation for regression discontinuity in time not running