I have a dataset that combines a quarterly variable, quarterly_macro, with weekly variables. quarterly_macro takes the same value in every week of its quarter (i.e., the macro value for quarter Q is attached to weeks 1-12 of quarter Q).
I want to fit a model on an estimation sample and test its performance out of sample. My problem is as follows:
When I run the same loop on my data, the results differ depending on the -sort- I use beforehand. I am not sure exactly what the -sort- changes in each case. I elaborate below.
First alternative:
Code:
frames reset
use quarterly_weekly.dta, clear
// The issue is here:
sort quarter_date myWEEK // this sort changes the result
gen SUBsample=.
replace SUBsample=1 if quarter_date<tq(1999q4) // a sample to fit the model
replace SUBsample=2 if SUBsample==. // a sample to evaluate the model (out of sample)
gen prediction=.
set seed 12345 // this is done for the results to be reproducible
levelsof myWEEK, local(levels)
foreach x of local levels {
    lasso linear quarterly_macro weekly_var1-weekly_var30 if SUBsample == 1
    estimates store LASSOrev
    predict temp if SUBsample==2 & myWEEK==`x', postselection
    replace prediction=temp if myWEEK==`x'
    drop temp
}
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(quarter_date myWEEK quarterly_macro) double(weekly_var1 weekly_var2)
88 1 .234 .10359720885753632 .8679591417312622
88 2 .234 .1684819906949997 .6942142844200134
88 3 .234 .1135205626487732 .78067946434021
88 4 .234 .1124100349843502 .7621394395828247
88 5 .234 .11129950731992722 .7485765814781189
88 6 .234 .12139459326863289 .7513777315616608
88 7 .234 .12892714142799377 .7541788816452026
88 8 .234 .1302659884095192 .7332549393177032
88 9 .234 .12449266016483307 .7331060171127319
88 10 .234 .12892714142799377 .7332549393177032
88 11 .234 .13228518515825272 .7331060171127319
88 12 .234 .13228518515825272 .7365813255310059
89 1 .442 .07843606919050217 .7152390778064728
89 2 .442 .08434794098138809 .6936862468719482
89 3 .442 .1293327808380127 .7534686923027039
89 4 .442 .1293327808380127 .7293541431427002
89 5 .442 .1374310553073883 .7074033617973328
89 6 .442 .135581336915493 .7195637226104736
89 7 .442 .13115884363651276 .7097733020782471
89 8 .442 .12929266691207886 .7085883319377899
89 9 .442 .12363984063267708 .7193162739276886
89 10 .442 .12925255298614502 .7097733020782471
89 11 .442 .11802712827920914 .7097733020782471
89 12 .442 .11715778335928917 .7193162739276886
90 1 -.027 .05430242419242859 .6410727500915527
90 2 -.027 .04069159924983978 .6852232813835144
90 3 -.027 .07825291529297829 .7406743466854095
90 4 -.027 .0726819857954979 .7218160629272461
90 5 -.027 .07203014194965363 .7287604510784149
90 6 -.027 .07235606387257576 .7396003007888794
90 7 -.027 .07137066125869751 .7391292452812195
90 8 -.027 .0707111805677414 .7340230643749237
90 10 -.027 .0707111805677414 .7287604510784149
90 11 -.027 .0726819857954979 .7287604510784149
90 12 -.027 .07203014194965363 .7340230643749237
91 1 1.466 .12981104850769043 .6597848534584045
91 2 1.466 .09218613058328629 .6661829948425293
91 3 1.466 .06856242567300797 .6784851551055908
91 4 1.466 .06452743709087372 .6872596144676208
91 5 1.466 .052778784185647964 .6839054822921753
91 6 1.466 .05186103284358978 .6872596144676208
91 7 1.466 .052778784185647964 .6872596144676208
91 8 1.466 .0509432815015316 .6872596144676208
91 9 1.466 .05037875659763813 .6872596144676208
91 10 1.466 .0509432815015316 .6872596144676208
91 11 1.466 .05037875659763813 .6872596144676208
91 12 1.466 .05037875659763813 .6880350112915039
end
format %tq quarter_date
Second alternative (the same code; only the sort changes):
Code:
frames reset
use quarterly_weekly.dta, clear
// The issue is here: THE CHANGE
sort myWEEK quarter_date // this sort changes the result
gen SUBsample=.
replace SUBsample=1 if quarter_date<tq(1999q4) // a sample to fit the model
replace SUBsample=2 if SUBsample==. // a sample to evaluate the model (out of sample)
gen prediction=.
set seed 12345 // this is done for the results to be reproducible
levelsof myWEEK, local(levels)
foreach x of local levels {
    lasso linear quarterly_macro weekly_var1-weekly_var30 if SUBsample == 1
    estimates store LASSOrev
    predict temp if SUBsample==2 & myWEEK==`x', postselection
    replace prediction=temp if myWEEK==`x'
    drop temp
}
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(quarter_date myWEEK quarterly_macro) double(weekly_var1 weekly_var2)
88 1 .234 .10359720885753632 .8679591417312622
89 1 .442 .07843606919050217 .7152390778064728
90 1 -.027 .05430242419242859 .6410727500915527
91 1 1.466 .12981104850769043 .6597848534584045
92 1 -.498 .1568187177181244 .752483606338501
93 1 1.061 .19480887055397034 .6877254247665405
94 1 -.277 -.22961216419935226 .6739154607057571
95 1 .533 .20222391188144684 .6799057126045227
96 1 1.357 .16426868736743927 .607530027627945
97 1 -.366 .15438511967658997 .6481877565383911
98 1 -1.091 .5329873561859131 .5102922320365906
99 1 .326 .36372411251068115 .56190025806427
100 1 -1.089 .2837163358926773 .749171257019043
101 1 .146 .10795796103775501 .5764660388231277
103 1 -1.603 .07519269734621048 .59975266456604
104 1 -.262 .04535355046391487 .68744957447052
105 1 -.474 -.061102996580302715 .7769795358181
106 1 .345 -.03943045064806938 .7835010290145874
107 1 -.69 .1510646566748619 .7343248724937439
108 1 .47 .12457912415266037 .8115817308425903
109 1 -.119 .09771569073200226 .6026725769042969
110 1 .51 .027549786493182182 .8069815039634705
111 1 .614 .23088725749403238 .7811554968357086
112 1 1.348 .06547258608043194 .5904433727264404
113 1 -.114 .04756193794310093 .6583850979804993
114 1 .214 .09916634485125542 .8127434849739075
115 1 .426 .09765896201133728 .6433976590633392
116 1 -1.1 .18991564214229584 .7861429452896118
117 1 .851 .07597751915454865 .746464192867279
118 1 .474 .07555382326245308 .7181055545806885
119 1 .58 .13181869685649872 .6889693140983582
120 1 -.234 .17234022915363312 .6527844965457916
121 1 -.786 .04685705155134201 .7141219973564148
122 1 -.351 .07157503068447113 .8004719018936157
123 1 .548 .09237945079803467 .6117232441902161
124 1 .019 .017031755298376083 .7257599234580994
125 1 -.921 .09774142131209373 .6640038192272186
126 1 -.541 .08078432828187943 .650404155254364
127 1 .14 -.05550992488861084 .6119781732559204
128 1 .753 -.005241314647719264 .670651912689209
129 1 .158 .054290205240249634 .6977491974830627
130 1 .77 .2013719528913498 .6792399883270264
131 1 .92 .0713081881403923 .5789836049079895
132 1 -1.068 .06416945718228817 .6183741390705109
133 1 .319 .038945429027080536 .7245792746543884
134 1 .024 .18857061862945557 .6980202794075012
135 1 1.101 .05807130690664053 .5858350396156311
136 1 .822 .04747616872191429 .5992066860198975
137 1 .383 .09270800650119781 .6318810880184174
138 1 .604 .13145361468195915 .71683070063591
139 1 .54 .1347259134054184 .590053141117096
140 1 -.12 .16695279628038406 .6224272847175598
end
format %tq quarter_date
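
To isolate the effect of the sort, the two alternatives can be compared on simulated data. What follows is only a minimal sketch under assumptions: the toy data (40 quarters, 12 weeks, 20 predictors of which three carry signal), the seeds, and the names predA, predB and diff are hypothetical and not part of the dataset above. One possible explanation it probes, which is not confirmed here, is that -lasso- chooses lambda by cross-validation and assigns observations to folds at random, so the same seed can produce different folds when the observation order differs.
Code:
* Hypothetical toy data: 40 quarters x 12 weeks
clear
set seed 2024
set obs 480
gen quarter_date = 100 + floor((_n - 1)/12)
gen myWEEK = mod(_n - 1, 12) + 1
format %tq quarter_date

* quarterly target, constant within each quarter
gen quarterly_macro = rnormal()
bysort quarter_date (myWEEK): replace quarterly_macro = quarterly_macro[1]

* weekly predictors: the first three carry signal, the rest are noise
forvalues j = 1/20 {
    gen weekly_var`j' = cond(`j' <= 3, 0.5*quarterly_macro, 0) + rnormal()
}

* the sort key is unique, so each sort yields one well-defined order
isid quarter_date myWEEK

* sort used in the first alternative
sort quarter_date myWEEK
lasso linear quarterly_macro weekly_var1-weekly_var20, rseed(12345)
predict double predA, postselection

* sort used in the second alternative, same model and same seed
sort myWEEK quarter_date
lasso linear quarterly_macro weekly_var1-weekly_var20, rseed(12345)
predict double predB, postselection

* compare the two sets of predictions observation by observation
gen double diff = predA - predB
summarize diff

If diff is zero everywhere, both sorts selected the same variables on this toy data; that alone would not rule out an effect of the sort on the real model with 30 predictors.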
I would really appreciate it if someone could explain how the loop estimates the model and then produces the predictions in each case. For example, does it estimate the model week by week? That is, does either alternative run the regression of the quarterly dependent variable on the week 1 data in SUBsample 1, use those parameters to create predictions for week 1 in SUBsample 2, and then do the same for each subsequent week?
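
For reference, a week-by-week version of the loop can be sketched as follows. In the loops above, the -lasso- line conditions only on SUBsample == 1, so `x' enters only through -predict- and -replace-; the sketch below adds myWEEK == `x' to the estimation condition as well. Everything here is an assumption for illustration: the toy data, the five predictors, the 1995q1 cutoff, and the rseed() value, which is passed on each call so that every fit starts from the same random-number state rather than continuing the stream from a single -set seed-.
Code:
* Hypothetical toy data: 60 quarters x 12 weeks, 5 weekly predictors
clear
set seed 12345
set obs 720
gen quarter_date = 100 + floor((_n - 1)/12)   // 100 is 1985q1
gen myWEEK = mod(_n - 1, 12) + 1
format %tq quarter_date

* quarterly target, constant within each quarter
gen quarterly_macro = rnormal()
bysort quarter_date (myWEEK): replace quarterly_macro = quarterly_macro[1]

* weekly predictors loosely related to the target
forvalues j = 1/5 {
    gen weekly_var`j' = 0.5*quarterly_macro + rnormal()
}

* estimation sample (1) and evaluation sample (2)
gen byte SUBsample = cond(quarter_date < tq(1995q1), 1, 2)
gen double prediction = .

levelsof myWEEK, local(levels)
foreach x of local levels {
    * fit on week `x' of the estimation sample only
    lasso linear quarterly_macro weekly_var1-weekly_var5 ///
        if SUBsample == 1 & myWEEK == `x', rseed(12345)
    * predict for week `x' of the evaluation sample only
    predict double temp if SUBsample == 2 & myWEEK == `x', postselection
    replace prediction = temp if SUBsample == 2 & myWEEK == `x'
    drop temp
}

With this condition each week gets its own set of coefficients, whereas the loops above refit one pooled model on all weeks of SUBsample 1 at every iteration.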
I look forward to reading your contributions
Thanks