Dear all,

Greetings to all contributors from someone new to the forum and a user of Stata 15.1/SE.

I have a data set of roughly 900 obs and am trying to perform a regression on 17 variables (no panel data) plus their respective interaction terms with one variable. For simplicity, assume I only have 3 main vars X1 (dummy), X2 (dummy) and X3 (continuous) in a regression model written in pseudocode:
Code:
Y = b0 + b1 * X1 + b2 * X2 + b3 * X3 + b4 * (X1*X2) + b5 * (X1*X3) + e
As I have an informed suspicion that my key interaction variable X1 might be endogenous, I have attempted to find instruments for it. Let's say I have identified Z1 (dummy) and Z2 (continuous) as relevant and valid instruments for X1, so:
Code:
X1 = a0 + a1 * Z1 + a2 * Z2 + u
My understanding is that, if I want the first model to be specified correctly, I have to run an IV regression because of the endogeneity mentioned above. The commands that come to mind are Stata's built-in -ivregress- and the equally intuitive -ivreg2- from SSC by Baum, Schaffer and Stillman.
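For reference, this is a minimal sketch of the baseline IV model without any interaction terms, using the variable names from the pseudocode above. The -first- option and the -estat- checks are my own additions for inspecting the first stage, not something the model itself requires:
Code:
* Baseline 2SLS: X1 endogenous, Z1 and Z2 excluded instruments, X2 and X3 exogenous
ivregress 2sls Y X2 X3 (X1 = Z1 Z2), first
* First-stage statistics for instrument relevance
estat firststage
* Overidentification test (two instruments for one endogenous regressor)
estat overid

* Same model with -ivreg2- (assumes it has been installed from SSC)
ivreg2 Y X2 X3 (X1 = Z1 Z2), first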

Essentially my problem has two facets:
1. Notwithstanding any background information about my research design: is this a statistically sound approach, i.e. are statistical inferences plausible with this model? I have stumbled across the forbidden regression as described in Wooldridge (2002), Econometric Analysis of Cross Section and Panel Data, section 9.5, esp. pp. 236-7. However, my impression is that my problem differs from the forbidden regression because I suspect endogeneity only in X1 and not in X2 or X3. But maybe I am misinterpreting things here.

2. Given that the answer to Question 1 is 'yes': how can I implement this in Stata? I have come across, among others, this informative post: https://www.statalist.org/forums/for...eraction-terms and especially answer #8. However, I cannot see how to apply the code proposed there when I have two instruments. Using -ivreg2-, I imagine this would in its simplest form result in something like:
Code:
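/* IV regression with interaction terms */
* This should be the model without endogeneity
* (## already expands to the main effects plus the interaction):
regress Y i.X1##i.X2 i.X1##c.X3

* Now with the IV approach:
ssc install ivreg2, replace
* X1 and its two interactions are treated as endogenous. Single # with explicit
* levels (dummies assumed coded 0/1) avoids re-entering the main effects of X2 and X3.
ivreg2 Y X2 X3 (X1 1.X1#1.X2 1.X1#c.X3 = Z1 Z2 1.Z1#1.X2 c.Z2#1.X2 1.Z1#c.X3 c.Z2#c.X3)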
I especially struggle with the part in parentheses after the -ivreg2- command. While this IV approach at least produces output, I am not certain that it produces valid results.
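One alternative I can think of, to make the lists inside the parentheses fully explicit, is to build the product terms by hand. This is only a sketch, under the assumption that every interaction involving X1 is itself endogenous and that the products of Z1 and Z2 with X2 and X3 can serve as additional instruments. The generated variable names (X1X2, X1X3, Z1X2, Z2X2, Z1X3, Z2X3) are hypothetical:
Code:
* Hand-built interactions of the endogenous variable
generate X1X2 = X1 * X2
generate X1X3 = X1 * X3

* Hand-built interactions of the instruments with the exogenous variables
generate Z1X2 = Z1 * X2
generate Z2X2 = Z2 * X2
generate Z1X3 = Z1 * X3
generate Z2X3 = Z2 * X3

* Three endogenous regressors, six excluded instruments; -first- shows the first stages
ivreg2 Y X2 X3 (X1 X1X2 X1X3 = Z1 Z2 Z1X2 Z2X2 Z1X3 Z2X3), first
Written this way, it is at least easy to see that there are three endogenous regressors and six excluded instruments, which is what I would like to have confirmed or corrected.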


Any insight is greatly appreciated.


Best,
Fabio