I have data that become available at the end of each week (e.g. weekly_predictor1, weekly_predictor2, weekly_predictor3), which I use to forecast a quarterly variable, say quarterly_GDP.
The data are organized in 12 series: series 1 contains only the data for week 1 of each quarter, merged with quarterly GDP; series 2 contains the data available at week 2 of each quarter, merged with quarterly GDP; and so on.
Below is an example, produced with dataex, of what the data look like.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float quarter_date int year byte(quarter month week) float series double quarterly_GDP float(weekly_predictor1 weekly_predictor2 weekly_predictor3)
108 1987 1 1 1 1 4.299 -1.7400645 1.4280612 2.1785972
109 1987 2 1 1 1 2.591 -2.8215795 -.3261277 -.23142336
110 1987 3 1 1 1 3.838 -3.8739974 .706113 1.460365
111 1987 4 1 1 1 4.151 -3.122698 .4272946 1.1591737
112 1988 1 1 1 1 2.266 -3.5722666 1.432191 2.432849
113 1988 2 1 1 1 3.089 -1.486287 .17579167 1.0115452
114 1988 3 1 1 1 2.237 -3.54055 1.411383 1.7815936
115 1988 4 1 1 1 1.99 -1.0885705 1.715589 2.654196
116 1989 1 1 1 1 5.546 -2.506453 1.8187963 2.738661
117 1989 2 1 1 1 1.676 -2.459989 1.8606597 2.78734
118 1989 3 1 1 1 2.501 1.6824546 1.887378 2.860331
119 1989 4 1 1 1 .501 2.3742833 1.911596 2.920467
120 1990 1 1 1 1 2.096 .9221073 .36569595 1.4366348
121 1990 2 1 1 1 1.22 -1.8166744 2.0296164 3.0796354
122 1990 3 1 1 1 1.793 1.9894003 2.0968032 3.134805
123 1990 4 1 1 1 -2.131 4.6542344 3.010042 4.4679475
124 1991 1 1 1 1 -2.811 7.097381 3.114446 3.351834
125 1991 2 1 1 1 .418 9.544506 3.216622 6.088016
126 1991 3 1 1 1 2.371 5.090754 4.0031595 4.819216
127 1991 4 1 1 1 .297 -.3905046 1.7156185 1.8559965
128 1992 1 1 1 1 1.978 1.9397452 1.9902998 2.1061885
129 1992 2 1 1 1 1.386 2.390985 2.597939 .7381327
130 1992 3 1 1 1 2.65 -1.3339542 1.6087084 1.9718102
131 1992 4 1 1 1 3.79 .8567627 1.7855448 1.962867
132 1993 1 1 1 1 1.799 -1.7540462 1.9229333 2.899825
133 1993 2 1 1 1 1.577 -2.0117745 -.29160684 .4485936
134 1993 3 1 1 1 2.844 -1.0230411 -.024617754 1.0182488
135 1993 4 1 1 1 5.87 -2.672692 1.3204687 2.5648735
136 1994 1 1 1 1 2.581 -4.1622634 1.98522 3.033693
137 1994 2 1 1 1 3.708 -2.737124 .8887708 2.9118304
138 1994 3 1 1 1 3.438 -6.106498 1.3969445 1.8722954
139 1994 4 1 1 1 4.532 -3.720529 1.8928193 3.1347666
108 1987 1 1 2 2 4.299 -2.0297842 -.312875 -.6567536
109 1987 2 1 2 2 2.591 -2.703133 -1.5230926 -3.0387714
110 1987 3 1 2 2 3.838 -3.5352764 -.6086997 -1.9099023
111 1987 4 1 2 2 4.151 -3.710979 -1.0030841 -.8193175
112 1988 1 1 2 2 2.266 -5.708353 -1.315071 -1.81234
113 1988 2 1 2 2 3.089 -1.3538846 -.6764717 -2.5990624
114 1988 3 1 2 2 2.237 -2.4105334 -.21421558 .40100315
115 1988 4 1 2 2 1.99 -.08228127 -.49769095 1.0387636
116 1989 1 1 2 2 5.546 -1.9950553 -.014957066 -.58217704
117 1989 2 1 2 2 1.676 -.7270581 -.8647903 -.8096528
118 1989 3 1 2 2 2.501 2.6689715 .8862366 1.2430537
119 1989 4 1 2 2 .501 3.506664 .05685891 -.1775705
120 1990 1 1 2 2 2.096 .8159853 -1.5025678 -1.1917896
121 1990 2 1 2 2 1.22 -1.1706387 1.5260097 -.9492483
122 1990 3 1 2 2 1.793 -.4118942 .9257024 .7638614
123 1990 4 1 2 2 -2.131 1.1157335 1.1741554 .8788975
124 1991 1 1 2 2 -2.811 6.711821 3.525193 5.894926
125 1991 2 1 2 2 .418 8.058287 4.2774143 9.068833
126 1991 3 1 2 2 2.371 .4149646 3.225771 5.304836
127 1991 4 1 2 2 .297 -2.970027 .6311643 -.28448343
128 1992 1 1 2 2 1.978 .5811588 .8370029 -.950491
129 1992 2 1 2 2 1.386 1.4056878 .8845069 -1.5377135
130 1992 3 1 2 2 2.65 -4.1659355 .51111126 -1.9837395
131 1992 4 1 2 2 3.79 -1.0498939 -.3481121 -.3189843
132 1993 1 1 2 2 1.799 -1.759533 1.7788386 .55372554
133 1993 2 1 2 2 1.577 -1.9506114 -.4742392 -1.1890967
134 1993 3 1 2 2 2.844 .05259408 -.13787183 -.14832537
135 1993 4 1 2 2 5.87 -.3848414 -.10044453 -.08202688
136 1994 1 1 2 2 2.581 -4.169868 .21792404 .38209
137 1994 2 1 2 2 3.708 -2.605387 .077101 .10528808
138 1994 3 1 2 2 3.438 -5.123921 .1539701 -.9468574
139 1994 4 1 2 2 4.532 -3.162079 .6104902 -.12868021
108 1987 1 1 3 3 4.299 -2.1076612 -2.356725 -2.7361324
109 1987 2 1 3 3 2.591 -2.59371 -2.2176385 -3.308432
110 1987 3 1 3 3 3.838 -3.5369556 -1.988106 -2.5531945
111 1987 4 1 3 3 4.151 -3.756836 -1.8028708 -2.467215
112 1988 1 1 3 3 2.266 -5.638969 -1.60548 -2.2708828
113 1988 2 1 3 3 3.089 -1.3491007 -2.2246215 -1.7797307
114 1988 3 1 3 3 2.237 -2.410444 -2.578337 -1.8754367
115 1988 4 1 3 3 1.99 -.09574382 -.7692303 -1.9727464
116 1989 1 1 3 3 5.546 -1.995568 -.9678542 -1.9843154
117 1989 2 1 3 3 1.676 -.6276433 -.6068677 -2.032895
118 1989 3 1 3 3 2.501 2.771389 -.12857853 -1.5378385
119 1989 4 1 3 3 .501 3.595633 -1.0707742 -2.094334
120 1990 1 1 3 3 2.096 .8063396 -.970474 -1.1299096
121 1990 2 1 3 3 1.22 -1.0888479 .2601256 -1.6639595
122 1990 3 1 3 3 1.793 -.40419185 .30467215 -1.7502427
123 1990 4 1 3 3 -2.131 1.1760539 .9444076 -.2948909
124 1991 1 1 3 3 -2.811 6.568848 3.9914865 6.704494
125 1991 2 1 3 3 .418 7.920979 5.012939 8.347696
126 1991 3 1 3 3 2.371 .3071753 5.038057 7.27375
127 1991 4 1 3 3 .297 -3.035961 1.2079372 -.2178535
128 1992 1 1 3 3 1.978 .6693566 .06978367 -.8070132
129 1992 2 1 3 3 1.386 1.3361707 -.1376001 -1.2892662
130 1992 3 1 3 3 2.65 -4.2310996 -.6247261 -1.1468972
131 1992 4 1 3 3 3.79 -1.0610285 -1.0327399 -1.2286204
132 1993 1 1 3 3 1.799 -1.6186522 1.0936519 -.7818964
133 1993 2 1 3 3 1.577 -2.062438 -1.4230428 -1.073461
134 1993 3 1 3 3 2.844 .11124817 -.9719778 -.24220905
135 1993 4 1 3 3 5.87 -.261408 -.8646913 -.9575393
136 1994 1 1 3 3 2.581 -4.232854 -.5098582 -.2983453
137 1994 2 1 3 3 3.708 -2.671986 -.5958835 -.7905924
138 1994 3 1 3 3 3.438 -5.054919 -1.278387 -1.0083964
139 1994 4 1 3 3 4.532 -3.078687 -.7171309 -.8781888
108 1987 1 1 4 4 4.299 -1.9988308 -1.0315892 -2.6316936
109 1987 2 1 4 4 2.591 -2.511362 -.9939098 -2.58485
110 1987 3 1 4 4 3.838 -3.6620574 -1.5999373 -2.831921
111 1987 4 1 4 4 4.151 -3.832227 -2.7039974 -3.109755
end
format %tq quarter_date
Now, I hope the structure of the data is clear to everyone.
For each week of the quarter, I generate out-of-sample forecasts of quarterly_GDP using only the data available at that week, and then calculate the mean squared forecast error (MSFE) for each series.
I use the amazing rangestat, as recommended by Robert Picard in a different post: https://www.statalist.org/forums/for...s-in-one/page2
There should be no need to read that older post, as I summarize the relevant points here.
Please see the code below:
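In symbols, the quantity computed for each series can be written as follows. The notation is introduced here only for illustration: $s$ indexes the series (week of the quarter), $\mathcal{T}_s$ is the set of quarters for which a forecast is produced in series $s$, $y_t$ is quarterly_GDP, and $\hat{y}_{t \mid s}$ is the forecast made with the week-$s$ data.

```latex
\[
\mathrm{MSFE}_s \;=\; \frac{1}{|\mathcal{T}_s|} \sum_{t \in \mathcal{T}_s} \left( y_t - \hat{y}_{t \mid s} \right)^2
\]
```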
Code:
xtset series quarter_date, quarterly

* define a linear regression in Mata using quadcross() - help mata cross()
mata:
mata clear
mata set matastrict on
real rowvector myreg(real matrix Xall)
{
    real colvector y, b, Xy
    real matrix X, XX

    y = Xall[.,1]                // dependent var is first column of Xall
    X = Xall[.,2::cols(Xall)]    // the remaining cols are the independent variables
    X = X,J(rows(X),1,1)         // add a constant

    XX = quadcross(X, X)         // linear regression, see help mata cross(), example 2
    Xy = quadcross(X, y)
    b = invsym(XX) * Xy

    return(rows(X), b')          // number of obs, then slopes, then the constant
}
end

* the low and high bounds of the recursive rolling window, don't calculate
* results if year >= 2016
by series: gen low = quarter_date[1]
by series: gen high = cond(year < 2016, quarter_date, .)

gen Lquarterly_GDP = L.quarterly_GDP
gen L2weekly_predictor1 = L2.weekly_predictor1
gen L2weekly_predictor2 = L2.weekly_predictor2
gen L2weekly_predictor3 = L2.weekly_predictor3

* Creating out-of-sample forecasts and mean squared forecast errors using rangestat
* In the regression the dependent variable is lagged, and all predictors are
* lagged further, because the dependent variable is not available in real time
* during the quarter
rangestat (myreg) Lquarterly_GDP L2weekly_predictor1 L2weekly_predictor2 L2weekly_predictor3, ///
    interval(quarter_date low high) by(series) casewise

* myreg returns: obs, one slope per predictor (in order), then the constant
rename myreg1 obs
rename myreg2 b_L2weekly_predictor1
rename myreg3 b_L2weekly_predictor2
rename myreg4 b_L2weekly_predictor3
rename myreg5 b_cons

* Next, I take the coefficients from the regressions and multiply them by the
* current weekly predictors, by series (quarter_date):
gen Forecast = b_cons + b_L2weekly_predictor1 * weekly_predictor1 ///
    + b_L2weekly_predictor2 * weekly_predictor2 ///
    + b_L2weekly_predictor3 * weekly_predictor3 if obs >= 32
gen forecast_Error = quarterly_GDP - Forecast
bysort series: egen MSFE = mean(forecast_Error^2)
by series: replace MSFE = . if obs < 32 | year >= 2016
Now, my current problem:
A reviewer recommends using lasso regression and including a very large number of weekly predictors. The number can exceed 200, so I now have predictors up to weekly_predictor200 rather than only 3 as before.
I find this very interesting, but I am not sure how to change the code to make it possible. Is there a tweak to the code above that would make this happen?
I think the main problem is how to get the non-zero coefficients after lasso so that they can be used to generate the forecasts.
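To make the question concrete, here is a rough sketch of the kind of loop this might require. It is an assumption-laden illustration rather than a tested solution: it assumes Stata 16 or later (for `lasso linear`), predictors named weekly_predictor1 to weekly_predictor200, and that `e(b)` after lasso holds the penalized coefficients with their variable names. The names lagGDP, lag2_*, Forecast_lasso, fe_lasso and MSFE_lasso, and the seed, are made up for the example. It replaces rangestat with an explicit loop over series and quarters, so it will be slow.

```stata
* Sketch only: assumes Stata 16+ and predictors named weekly_predictor*
xtset series quarter_date, quarterly

* Lagged copies under hypothetical names, same timing as the OLS code above
unab xvars : weekly_predictor*
gen double lagGDP = L.quarterly_GDP
local lagvars
foreach v of local xvars {
    gen double lag2_`v' = L2.`v'
    local lagvars `lagvars' lag2_`v'
}
local first : word 1 of `lagvars'

gen double Forecast_lasso = .

levelsof series, local(allseries)
levelsof quarter_date if year < 2016, local(quarters)

foreach s of local allseries {
    foreach t of local quarters {
        local window series == `s' & quarter_date <= `t'   // expanding window
        local here   series == `s' & quarter_date == `t'   // forecast origin

        * require at least 32 usable observations, as in the OLS version
        quietly count if `window' & !missing(lagGDP, `first')
        if r(N) < 32 continue

        * cross-validated lasso on the window; the seed value is arbitrary
        capture quietly lasso linear lagGDP `lagvars' if `window', ///
            selection(cv) rseed(12345)
        if _rc continue

        * coefficient vector and its column names
        matrix b = e(b)
        local names : colnames b
        local k = colsof(b)

        * apply each coefficient to the CURRENT value of its predictor
        quietly replace Forecast_lasso = 0 if `here'
        forvalues j = 1/`k' {
            local nm : word `j' of `names'
            if "`nm'" == "_cons" {
                quietly replace Forecast_lasso = Forecast_lasso + b[1,`j'] if `here'
            }
            else {
                local cur = substr("`nm'", 6, .)   // strip the "lag2_" prefix
                quietly replace Forecast_lasso = Forecast_lasso + b[1,`j'] * `cur' if `here'
            }
        }
    }
}

gen double fe_lasso = quarterly_GDP - Forecast_lasso
bysort series: egen MSFE_lasso = mean(fe_lasso^2)
```

Looping over the column names of the coefficient matrix means the code does not need to know in advance which predictors lasso kept. If post-selection (unpenalized) coefficients are preferred, the documentation lists `e(b_postselection)` as an alternative to `e(b)`, though I have not checked how the two compare in this setting.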
Or is there a significantly different approach that would allow for this?
I hope to get some assistance.
Thank you so much
Mike