Hi,

I have aggregate time series trading data for two distinct groups of investors (group A and group B). I want to estimate the effect of a variable X_t on each group's daily trading volume VOL_t. The two groups were constructed using a criterion that depends to some extent on a daily trading signal. The signal can be represented by a dummy variable D_t, which is 1 if the signal is observed on day t and 0 otherwise. Group A investors trade relatively frequently in line with that signal, whereas group B investors trade essentially at random with respect to it.

My hypothesis is that the explanatory variable X_t has a positive effect on VOL_t only for group A investors, and only on signal days (i.e., D_t = 1). It should have no effect on VOL_t for group A investors if D_t = 0. It should also have no effect (or at least a smaller one) on VOL_t for group B investors.

I am having trouble constructing a model for this.

My attempt was to fit a regression model for VOL_t of group A and a model for VOL_t of group B of the form:

VOL_t = a + b1 * D_t + b2 * X_t + b3 * (D_t * X_t) + e_t

where (D_t * X_t) is the interaction term between X_t and D_t.
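
Written out for the two values of the dummy, the model above implies the following conditional means. This is only a sketch, and it assumes E[e_t | D_t, X_t] = 0, which is an assumption rather than something established in this post:

```latex
\begin{align*}
E[\mathit{VOL}_t \mid D_t = 0, X_t] &= a + b_2 X_t \\
E[\mathit{VOL}_t \mid D_t = 1, X_t] &= (a + b_1) + (b_2 + b_3) X_t
\end{align*}
```

So b2 is the slope of X_t on non-signal days, b3 is the difference in slopes between signal and non-signal days, and the hypothesis for group A corresponds to b2 = 0 and b3 > 0.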

If I estimate this model separately for group A and group B trading volume, I get the results hypothesized above.


Results for group A investors:
Code:
                         |               Robust
                   VOL_A |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
                       D |    .776241   .0581441    13.35   0.000     .6619864    .8904956
                       X |   .0012703   .0045436     0.28   0.780     -.007658    .0101986
               D_times_X |   .1556122   .0475769     3.27   0.001     .0621225    .2491019

Results for group B investors:
Code:
                         |               Robust
                   VOL_B |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
                       D |  -.1899324   .0453358    -4.19   0.000    -.2790081   -.1008567
                       X |   .0048146   .0095859     0.50   0.616    -.0140198     .023649
               D_times_X |   .0717573   .0398264     1.80   0.072    -.0064935    .1500082
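
For reference, the slopes of X implied by these point estimates can be computed with a few lines of Python. This is a small arithmetic sketch only: the names `estimates` and `slope_of_x` are made up for illustration, and no standard error is given for the sum b2 + b3 because the covariance between the two estimates is not reported in the tables.

```python
# Coefficients copied from the two regression tables above.
# b2 is the coefficient on X, b3 the coefficient on D_times_X.
estimates = {
    "A": {"b2": 0.0012703, "b3": 0.1556122},
    "B": {"b2": 0.0048146, "b3": 0.0717573},
}


def slope_of_x(b2, b3, d):
    """Implied slope of X_t in VOL_t = a + b1*D_t + b2*X_t + b3*(D_t*X_t) + e_t."""
    return b2 + b3 * d


for group, coef in estimates.items():
    no_signal = slope_of_x(coef["b2"], coef["b3"], d=0)  # slope when D_t = 0
    signal = slope_of_x(coef["b2"], coef["b3"], d=1)     # slope when D_t = 1
    print(f"group {group}: slope of X = {no_signal:.7f} (D=0), {signal:.7f} (D=1)")
```

On signal days the implied slope is about 0.157 for group A and about 0.077 for group B. These are point estimates only and say nothing by themselves about whether the two groups differ significantly.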

However, I am not sure whether this is a good specification, because VOL_t is correlated with D_t by construction for group A (although the correlation is quite low: corr(VOL_t, D_t) = 0.04 for group A and corr(VOL_t, D_t) = -0.01 for group B).

So, my question is: can I interpret a positive and significant coefficient b3 in the regression model for group A as evidence that, on days with D_t = 1, trading volume is higher when X_t is higher? Or could this result simply be spurious, driven by the relation between VOL_t and D_t that exists by construction for group A?

If it could be spurious: how could I improve the design of the regression model?

Thanks in advance for any input!