Hello!

I would appreciate some advice about choosing between a model with interactions (data in long format) and a joint estimation (in wide format).

I am running a model to estimate the effect of several expenditure categories on revenue. To generate an illustration, the following is an example of the data in wide and long formats respectively.

Code:
year     cost1   cost2     revenue
2019 10300.48 4035.11 190495.36 
2019 7199.57 10395.63 210668.33
2019 11203.21 7352.16 215108.37
Code:
year source        cost revenue
2019 "source1"  10300.48 190495.36 
2019 "source2"  4035.11 190495.36 
2019 "source1"  7199.57 210668.33
2019 "source2" 10395.63 210668.33
2019 "source1" 11203.21 215108.37
2019 "source2" 7352.16 215108.37
Now, this is the regression output if we run a simple model with both cost1 and cost2 included as explanatory variables (wide data).

Code:
. reg revenue cost1 cost2 if year > 2020

      Source |       SS           df       MS      Number of obs   =        52
-------------+----------------------------------   F(2, 49)        =     89.46
       Model |  9.2743e+11         2  4.6371e+11   Prob > F        =    0.0000
    Residual |  2.5399e+11        49  5.1836e+09   R-squared       =    0.7850
-------------+----------------------------------   Adj R-squared   =    0.7762
       Total |  1.1814e+12        51  2.3165e+10   Root MSE        =     71997

------------------------------------------------------------------------------
     revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cost1 |   10.83604   1.989934     5.45   0.000      6.83712    14.83497
       cost2 |   3.998844   1.036033     3.86   0.000     1.916857    6.080831
       _cons |   64755.98   29273.31     2.21   0.032     5929.067    123582.9
------------------------------------------------------------------------------

On the other hand, if we run a model with interactions (long data), we get the following output (below). Previously, I was under the impression that this model should replicate the model above. However, I can see from the margins command (and directly from the coefficient of cost below) that coefficients from this model replicate not those in the above model, where cost1 and cost2 are jointly estimated, but coefficients from separate bivariate regression models (reg revenue cost1 and reg revenue cost2).

My question is: is it possible to replicate the above model in panel data format? It seems logical that the more robust way to proceed is including both variables in the regression (notwithstanding multicollinearity). I thought the interaction approach did just this, only in different data format, but turns out it estimates the coefficients from separate regressions for source = 1 and source = 2. I would think data format (wide vs long) would not have any impact on the results as long as the same exact commands are used, so I would like to ask (1) how it is possible to replicate the wide data model above in long data format, and (2) would you argue there is any reason in favour of not including both cost measures in the model and proceeding with the following model with interactions.

Thank you!

Code:
. reg revenue c.cost##i.source if year > 2020

      Source |       SS           df       MS      Number of obs   =       104
-------------+----------------------------------   F(3, 100)       =     73.26
       Model |  1.6239e+12         3  5.4131e+11   Prob > F        =    0.0000
    Residual |  7.3892e+11       100  7.3892e+09   R-squared       =    0.6873
-------------+----------------------------------   Adj R-squared   =    0.6779
       Total |  2.3628e+12       103  2.2940e+10   Root MSE        =     85960

-------------------------------------------------------------------------------
      revenue |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
         cost |   8.269876   .8081747    10.23   0.000      6.66648    9.873271
              |
       source |
      Google  |   19917.72   46326.79     0.43   0.668    -71993.32    111828.8
              |
source#c.cost |
      Google  |   8.380857   1.750063     4.79   0.000     4.908781    11.85293
              |
        _cons |   95964.47   34274.35     2.80   0.006     27965.14    163963.8
-------------------------------------------------------------------------------