I would appreciate some advice about choosing between a model with interactions (data in long format) and a joint estimation (in wide format).
I am running a model to estimate the effect of several expenditure categories on revenue. To generate an illustration, the following is an example of the data in wide and long formats respectively.
Code:
year cost1 cost2 revenue 2019 10300.48 4035.11 190495.36 2019 7199.57 10395.63 210668.33 2019 11203.21 7352.16 215108.37
Code:
year source cost revenue 2019 "source1" 10300.48 190495.36 2019 "source2" 4035.11 190495.36 2019 "source1" 7199.57 210668.33 2019 "source2" 10395.63 210668.33 2019 "source1" 11203.21 215108.37 2019 "source2" 7352.16 215108.37
Code:
. reg revenue cost1 cost2 if year > 2020 Source | SS df MS Number of obs = 52 -------------+---------------------------------- F(2, 49) = 89.46 Model | 9.2743e+11 2 4.6371e+11 Prob > F = 0.0000 Residual | 2.5399e+11 49 5.1836e+09 R-squared = 0.7850 -------------+---------------------------------- Adj R-squared = 0.7762 Total | 1.1814e+12 51 2.3165e+10 Root MSE = 71997 ------------------------------------------------------------------------------ revenue | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- cost1 | 10.83604 1.989934 5.45 0.000 6.83712 14.83497 cost2 | 3.998844 1.036033 3.86 0.000 1.916857 6.080831 _cons | 64755.98 29273.31 2.21 0.032 5929.067 123582.9 ------------------------------------------------------------------------------
On the other hand, if we run a model with interactions (long data), we get the following output (below). Previously, I was under the impression that this model should replicate the model above. However, I can see from the margins command (and directly from the coefficient of cost below) that coefficients from this model replicate not those in the above model, where cost1 and cost2 are jointly estimated, but coefficients from separate bivariate regression models (reg revenue cost1 and reg revenue cost2).
My question is: is it possible to replicate the above model in panel data format? It seems logical that the more robust way to proceed is including both variables in the regression (notwithstanding multicollinearity). I thought the interaction approach did just this, only in different data format, but turns out it estimates the coefficients from separate regressions for source = 1 and source = 2. I would think data format (wide vs long) would not have any impact on the results as long as the same exact commands are used, so I would like to ask (1) how it is possible to replicate the wide data model above in long data format, and (2) would you argue there is any reason in favour of not including both cost measures in the model and proceeding with the following model with interactions.
Thank you!
Code:
. reg revenue c.cost##i.source if year > 2020 Source | SS df MS Number of obs = 104 -------------+---------------------------------- F(3, 100) = 73.26 Model | 1.6239e+12 3 5.4131e+11 Prob > F = 0.0000 Residual | 7.3892e+11 100 7.3892e+09 R-squared = 0.6873 -------------+---------------------------------- Adj R-squared = 0.6779 Total | 2.3628e+12 103 2.2940e+10 Root MSE = 85960 ------------------------------------------------------------------------------- revenue | Coef. Std. Err. t P>|t| [95% Conf. Interval] --------------+---------------------------------------------------------------- cost | 8.269876 .8081747 10.23 0.000 6.66648 9.873271 | source | Google | 19917.72 46326.79 0.43 0.668 -71993.32 111828.8 | source#c.cost | Google | 8.380857 1.750063 4.79 0.000 4.908781 11.85293 | _cons | 95964.47 34274.35 2.80 0.006 27965.14 163963.8 -------------------------------------------------------------------------------
0 Response to Model with interactions vs joint estimation
Post a Comment