Dear All

I have data available at the end of each week (e.g. weekly_predictor1, weekly_predictor2, weekly_predictor3), which I use to forecast a quarterly variable, say quarterly_GDP.
The data are organized in 12 series: series 1 contains only the data available at week 1 of each quarter, merged with quarterly GDP; series 2 contains the data available at week 2 of each quarter, merged with quarterly GDP; and so on.

Using dataex, here is an example of what the data look like.


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float quarter_date int year byte(quarter month week) float series double quarterly_GDP float(weekly_predictor1 weekly_predictor2 weekly_predictor3)
108 1987 1 1 1 1  4.299 -1.7400645   1.4280612  2.1785972
109 1987 2 1 1 1  2.591 -2.8215795   -.3261277 -.23142336
110 1987 3 1 1 1  3.838 -3.8739974     .706113   1.460365
111 1987 4 1 1 1  4.151  -3.122698    .4272946  1.1591737
112 1988 1 1 1 1  2.266 -3.5722666    1.432191   2.432849
113 1988 2 1 1 1  3.089  -1.486287   .17579167  1.0115452
114 1988 3 1 1 1  2.237   -3.54055    1.411383  1.7815936
115 1988 4 1 1 1   1.99 -1.0885705    1.715589   2.654196
116 1989 1 1 1 1  5.546  -2.506453   1.8187963   2.738661
117 1989 2 1 1 1  1.676  -2.459989   1.8606597    2.78734
118 1989 3 1 1 1  2.501  1.6824546    1.887378   2.860331
119 1989 4 1 1 1   .501  2.3742833    1.911596   2.920467
120 1990 1 1 1 1  2.096   .9221073   .36569595  1.4366348
121 1990 2 1 1 1   1.22 -1.8166744   2.0296164  3.0796354
122 1990 3 1 1 1  1.793  1.9894003   2.0968032   3.134805
123 1990 4 1 1 1 -2.131  4.6542344    3.010042  4.4679475
124 1991 1 1 1 1 -2.811   7.097381    3.114446   3.351834
125 1991 2 1 1 1   .418   9.544506    3.216622   6.088016
126 1991 3 1 1 1  2.371   5.090754   4.0031595   4.819216
127 1991 4 1 1 1   .297  -.3905046   1.7156185  1.8559965
128 1992 1 1 1 1  1.978  1.9397452   1.9902998  2.1061885
129 1992 2 1 1 1  1.386   2.390985    2.597939   .7381327
130 1992 3 1 1 1   2.65 -1.3339542   1.6087084  1.9718102
131 1992 4 1 1 1   3.79   .8567627   1.7855448   1.962867
132 1993 1 1 1 1  1.799 -1.7540462   1.9229333   2.899825
133 1993 2 1 1 1  1.577 -2.0117745  -.29160684   .4485936
134 1993 3 1 1 1  2.844 -1.0230411 -.024617754  1.0182488
135 1993 4 1 1 1   5.87  -2.672692   1.3204687  2.5648735
136 1994 1 1 1 1  2.581 -4.1622634     1.98522   3.033693
137 1994 2 1 1 1  3.708  -2.737124    .8887708  2.9118304
138 1994 3 1 1 1  3.438  -6.106498   1.3969445  1.8722954
139 1994 4 1 1 1  4.532  -3.720529   1.8928193  3.1347666
108 1987 1 1 2 2  4.299 -2.0297842    -.312875  -.6567536
109 1987 2 1 2 2  2.591  -2.703133  -1.5230926 -3.0387714
110 1987 3 1 2 2  3.838 -3.5352764   -.6086997 -1.9099023
111 1987 4 1 2 2  4.151  -3.710979  -1.0030841  -.8193175
112 1988 1 1 2 2  2.266  -5.708353   -1.315071   -1.81234
113 1988 2 1 2 2  3.089 -1.3538846   -.6764717 -2.5990624
114 1988 3 1 2 2  2.237 -2.4105334  -.21421558  .40100315
115 1988 4 1 2 2   1.99 -.08228127  -.49769095  1.0387636
116 1989 1 1 2 2  5.546 -1.9950553 -.014957066 -.58217704
117 1989 2 1 2 2  1.676  -.7270581   -.8647903  -.8096528
118 1989 3 1 2 2  2.501  2.6689715    .8862366  1.2430537
119 1989 4 1 2 2   .501   3.506664   .05685891  -.1775705
120 1990 1 1 2 2  2.096   .8159853  -1.5025678 -1.1917896
121 1990 2 1 2 2   1.22 -1.1706387   1.5260097  -.9492483
122 1990 3 1 2 2  1.793  -.4118942    .9257024   .7638614
123 1990 4 1 2 2 -2.131  1.1157335   1.1741554   .8788975
124 1991 1 1 2 2 -2.811   6.711821    3.525193   5.894926
125 1991 2 1 2 2   .418   8.058287   4.2774143   9.068833
126 1991 3 1 2 2  2.371   .4149646    3.225771   5.304836
127 1991 4 1 2 2   .297  -2.970027    .6311643 -.28448343
128 1992 1 1 2 2  1.978   .5811588    .8370029   -.950491
129 1992 2 1 2 2  1.386  1.4056878    .8845069 -1.5377135
130 1992 3 1 2 2   2.65 -4.1659355   .51111126 -1.9837395
131 1992 4 1 2 2   3.79 -1.0498939   -.3481121  -.3189843
132 1993 1 1 2 2  1.799  -1.759533   1.7788386  .55372554
133 1993 2 1 2 2  1.577 -1.9506114   -.4742392 -1.1890967
134 1993 3 1 2 2  2.844  .05259408  -.13787183 -.14832537
135 1993 4 1 2 2   5.87  -.3848414  -.10044453 -.08202688
136 1994 1 1 2 2  2.581  -4.169868   .21792404     .38209
137 1994 2 1 2 2  3.708  -2.605387     .077101  .10528808
138 1994 3 1 2 2  3.438  -5.123921    .1539701  -.9468574
139 1994 4 1 2 2  4.532  -3.162079    .6104902 -.12868021
108 1987 1 1 3 3  4.299 -2.1076612   -2.356725 -2.7361324
109 1987 2 1 3 3  2.591   -2.59371  -2.2176385  -3.308432
110 1987 3 1 3 3  3.838 -3.5369556   -1.988106 -2.5531945
111 1987 4 1 3 3  4.151  -3.756836  -1.8028708  -2.467215
112 1988 1 1 3 3  2.266  -5.638969    -1.60548 -2.2708828
113 1988 2 1 3 3  3.089 -1.3491007  -2.2246215 -1.7797307
114 1988 3 1 3 3  2.237  -2.410444   -2.578337 -1.8754367
115 1988 4 1 3 3   1.99 -.09574382   -.7692303 -1.9727464
116 1989 1 1 3 3  5.546  -1.995568   -.9678542 -1.9843154
117 1989 2 1 3 3  1.676  -.6276433   -.6068677  -2.032895
118 1989 3 1 3 3  2.501   2.771389  -.12857853 -1.5378385
119 1989 4 1 3 3   .501   3.595633  -1.0707742  -2.094334
120 1990 1 1 3 3  2.096   .8063396    -.970474 -1.1299096
121 1990 2 1 3 3   1.22 -1.0888479    .2601256 -1.6639595
122 1990 3 1 3 3  1.793 -.40419185   .30467215 -1.7502427
123 1990 4 1 3 3 -2.131  1.1760539    .9444076  -.2948909
124 1991 1 1 3 3 -2.811   6.568848   3.9914865   6.704494
125 1991 2 1 3 3   .418   7.920979    5.012939   8.347696
126 1991 3 1 3 3  2.371   .3071753    5.038057    7.27375
127 1991 4 1 3 3   .297  -3.035961   1.2079372  -.2178535
128 1992 1 1 3 3  1.978   .6693566   .06978367  -.8070132
129 1992 2 1 3 3  1.386  1.3361707   -.1376001 -1.2892662
130 1992 3 1 3 3   2.65 -4.2310996   -.6247261 -1.1468972
131 1992 4 1 3 3   3.79 -1.0610285  -1.0327399 -1.2286204
132 1993 1 1 3 3  1.799 -1.6186522   1.0936519  -.7818964
133 1993 2 1 3 3  1.577  -2.062438  -1.4230428  -1.073461
134 1993 3 1 3 3  2.844  .11124817   -.9719778 -.24220905
135 1993 4 1 3 3   5.87   -.261408   -.8646913  -.9575393
136 1994 1 1 3 3  2.581  -4.232854   -.5098582  -.2983453
137 1994 2 1 3 3  3.708  -2.671986   -.5958835  -.7905924
138 1994 3 1 3 3  3.438  -5.054919   -1.278387 -1.0083964
139 1994 4 1 3 3  4.532  -3.078687   -.7171309  -.8781888
108 1987 1 1 4 4  4.299 -1.9988308  -1.0315892 -2.6316936
109 1987 2 1 4 4  2.591  -2.511362   -.9939098   -2.58485
110 1987 3 1 4 4  3.838 -3.6620574  -1.5999373  -2.831921
111 1987 4 1 4 4  4.151  -3.832227  -2.7039974  -3.109755
end
format %tq quarter_date

Now, I hope the structure of the data is clear to everyone.

I generate out-of-sample forecasts of quarterly_GDP using the data available at each week of the quarter, and then calculate the mean squared forecast error (MSFE) for each series.
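
In symbols, the calculation is roughly the following. The notation is my own shorthand for what the code further down does, not something taken from the rangestat thread: s indexes the series (week of the quarter), t the quarter, and j the predictor.

```latex
\hat{y}_{s,t} = \hat{\alpha}_{s,t} + \sum_{j=1}^{3} \hat{\beta}_{j,s,t}\, x_{j,s,t},
\qquad
\mathrm{MSFE}_s = \frac{1}{N_s} \sum_{t \in T_s} \left( y_t - \hat{y}_{s,t} \right)^2
```

Here y_t is quarterly_GDP, x_{j,s,t} is weekly_predictor j observed at week s of quarter t, and the coefficients come from a regression on an expanding window that starts at the first quarter of series s and ends at quarter t. T_s is the set of quarters of series s for which a forecast is produced (at least 32 observations in the window and year < 2016), and N_s is the number of such quarters.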

I use the amazing rangestat, as recommended by Robert Picard in a different post: https://www.statalist.org/forums/for...s-in-one/page2

There should be no need to read that old post, as I have summarized the relevant points here.

Please see the code below:


Code:
xtset series quarter_date, quarterly


* define a linear regression in Mata using quadcross() - help mata cross()
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y = Xall[.,1]                // dependent var is first column of Xall
    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
    X = X,J(rows(X),1,1)         // add a constant
    
    XX = quadcross(X, X)        // linear regression, see help mata cross(), example 2
    Xy = quadcross(X, y)
    b  = invsym(XX) * Xy
    
    return(rows(X), b')
}
end

* the low and high bounds of the recursive rolling window, don't calculate
* results if year >= 2016
by series: gen low = quarter_date[1]
by series: gen high = cond(year < 2016, quarter_date, .)

gen Lquarterly_GDP=L.quarterly_GDP

gen L2weekly_predictor1=L2.weekly_predictor1
gen L2weekly_predictor2=L2.weekly_predictor2
gen L2weekly_predictor3=L2.weekly_predictor3

**************************************Creating out of sample forecasts and mean square forecast errors using rangestat
** Now in the regression the dependent variable is lagged and all predictors are lagged as the dependent variable is not available in real time during the quarter
rangestat (myreg) Lquarterly_GDP L2weekly_predictor1  L2weekly_predictor2   L2weekly_predictor3  , interval(quarter_date low high) by(series) casewise
rename myreg1 obs
rename myreg2 b_L2weekly_predictor1
rename myreg3 b_L2weekly_predictor2
rename myreg4 b_L2weekly_predictor3
rename myreg5 b_cons

* Next, I take the coefficients from the regressions and multiply them by the current weekly predictors
by series (quarter_date): gen Forecast = b_cons + b_L2weekly_predictor1 * weekly_predictor1 + b_L2weekly_predictor2 * weekly_predictor2 + b_L2weekly_predictor3 * weekly_predictor3 if obs >= 32

gen forecast_Error = quarterly_GDP - Forecast

bysort series: egen MSFE= mean(forecast_Error^2)
by series: replace MSFE = . if obs < 32 | year >= 2016


Now, my current problem:

A reviewer recommends using lasso regression with a much larger set of weekly predictors. The number can exceed 200, so I would now have weekly_predictor1 through weekly_predictor200 rather than only 3 as before.

I find this very interesting, but I am not sure how to change the code to make it possible. Is there a tweak to the code above that would do this?
I think the main difficulty is how to retrieve the non-zero coefficients after lasso so that they can be used to generate the forecasts.
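
To make the question concrete, here is a rough and untested sketch of the direction I have in mind. It assumes Stata 16 or later (for the built-in lasso linear command) and that the predictors are named weekly_predictor1 through weekly_predictor200. The names lag_y, z_*, score, fc_tmp, fe_lasso, Forecast_lasso and MSFE_lasso are made up for illustration. Instead of extracting the coefficients by hand, it duplicates each row so that the current predictors sit under the same variable names as the lagged ones, and lets predict apply the selected coefficients.

```stata
* Untested sketch, assuming Stata 16+ and predictors named
* weekly_predictor1-weekly_predictor200 (all new names are illustrative)
xtset series quarter_date, quarterly

* lagged target and lagged predictors, mirroring the OLS version above
gen double lag_y = L.quarterly_GDP
foreach v of varlist weekly_predictor* {
    gen double z_`v' = L2.`v'
}

* duplicate every row: score==0 rows are used for estimation,
* score==1 rows hold the CURRENT predictors under the same z_* names
expand 2, generate(score)
foreach v of varlist weekly_predictor* {
    quietly replace z_`v' = `v' if score == 1
}

gen double Forecast_lasso = .
levelsof series, local(all_series)
foreach s of local all_series {
    levelsof quarter_date if series == `s' & year < 2016, local(quarters)
    foreach q of local quarters {
        * expanding window: estimation rows of this series up to quarter q
        quietly count if series == `s' & score == 0 & quarter_date <= `q' & !missing(lag_y, z_weekly_predictor1)
        if r(N) < 32 {
            continue
        }
        quietly lasso linear lag_y z_* if series == `s' & score == 0 & quarter_date <= `q', rseed(12345)
        * postselection refits OLS on the selected (non-zero) predictors;
        * penalized would use the shrunken lasso coefficients instead
        quietly predict double fc_tmp if series == `s' & score == 1 & quarter_date == `q', postselection
        quietly replace Forecast_lasso = fc_tmp if series == `s' & score == 1 & quarter_date == `q'
        drop fc_tmp
    }
}

* forecasts exist only on the score==1 rows, so the mean uses each quarter once
gen double fe_lasso = quarterly_GDP - Forecast_lasso
bysort series: egen double MSFE_lasso = mean(fe_lasso^2)
```

I expect this to be slow, since it runs one cross-validated lasso per series and quarter, and I have not checked how predict behaves when lasso selects no predictors at all. I would be glad to hear whether this is a sensible approach or whether there is a better one.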

Or is there some substantially different code that would allow for this?


I hope to get some assistance.

Thank you so much

Mike