Hi,

I'm essentially following Dr. Jeff Wooldridge's approach to testing for endogeneity of a binary explanatory variable in the context of a nonlinear Poisson structural equation (response function).

I want to:
A) Test for endogeneity of binary y2 in an unbalanced panel format
B) After testing, obtain consistent estimates of my parameters of interest

I have:
1. An unbalanced panel
2. A count response with overdispersion
3. Potential endogeneity of y2

I want to use the correlated random effects (CRE) approach with a control function to test for endogeneity of y2. Following the material, I've created a selection indicator that drops the observations for which not all of the covariates are observed (in my case, only the period 1 observations for all of my IDs). This relies on "the strict exogeneity of selection assumption" (see the paper in the first attachment). Following the remainder of the material, I've used the following two-step approach:

1. Pooled OLS of the binary endogenous regressor (a linear probability model, LPM) on all of the "instruments" in Z, to obtain residuals. Z contains the control variables in the structural equation ("x"), the time dummies for the unbalanced panel (required for the CRE approach), and the instruments (together these make up "z"), plus the time averages of all of those variables ("zbar"). Note that this procedure is also in Jeff Wooldridge's book Econometric Analysis of Cross Section and Panel Data, page 766.

Code:
reg campaign_factor_a $zlist *_mean, vce(cluster household_key)

2. Compute the residuals, which I'll call lpuhat:

Code:
predict lpuhat, residuals

Then I ran two separate regressions to test for endogeneity, which I'll call 3a and 3b (page 17 of the second attachment).

3a. Pooled Poisson QMLE of the count response y1 on y2, z-bar, the controls "x", the first-stage residuals lpuhat, and an offset. The coefficient on lpuhat is statistically significant, indicating endogeneity.

Code:
poisson visits_per_period campaign_factor_a $xlist *_mean c.lpuhat, offset(log_diff_days) vce(cluster household_key)

3b. Pooled Poisson QMLE of the count response y1 on y2, y2-bar, z-bar, the controls "x", the first-stage residuals lpuhat, and an offset. The coefficient on lpuhat is again statistically significant, indicating endogeneity. The coefficient on y2-bar is also statistically significant (at the 10% significance level). According to the material, regression 3b is a test of idiosyncratic endogeneity only, because including y2-bar controls for unobserved heterogeneity.

Code:
poisson visits_per_period campaign_factor_a $xlist *_mean c.lpuhat, offset(log_diff_days) vce(cluster household_key) // This includes y2_bar in *_mean

Results:
Poisson regression                              Number of obs     =     14,238
                                                Wald chi2(79)     =          .
                                                Prob > chi2       =          .
Log pseudolikelihood = -43841.69                Pseudo R2         =     0.4345

                        (Std. Err. adjusted for 1,584 clusters in household_key)
--------------------------------------------------------------------------------
               |               Robust
visits_per_p~d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
campaign_fac~a |   .0244323   .0344804     0.71   0.479    -.0431481    .0920127
campaign_fa~_b |   .1820243   .0277672     6.56   0.000     .1276016    .2364471
campaign_fa~ab |   .6791518   .0192806    35.22   0.000     .6413625    .7169411
campaign_f~abc |   .9160575   .0412163    22.23   0.000     .8352751      .99684
campaign_f~_bc |   .5711531   .0416605    13.71   0.000     .4894999    .6528062
campaign_fa~_c |   .0807524   .0357571     2.26   0.024     .0106698    .1508351
campaign_fa~ac |   .6084234   .0284141    21.41   0.000     .5527328     .664114
items_per_pe~3 |    .001461   .0000906    16.12   0.000     .0012834    .0016387
    _Iperiod_2 |   .5242697    .056334     9.31   0.000      .413857    .6346823
    _Iperiod_3 |   .3950308   .0483376     8.17   0.000     .3002909    .4897706
    _Iperiod_4 |   .4836983   .0577899     8.37   0.000     .3704323    .5969644
    _Iperiod_5 |   .4117123   .0494826     8.32   0.000     .3147283    .5086964
    _Iperiod_6 |   .4684135   .0585501     8.00   0.000     .3536574    .5831697
    _Iperiod_7 |   .3608489   .0512862     7.04   0.000     .2603298     .461368
    _Iperiod_8 |   .4051761    .056305     7.20   0.000     .2948203    .5155319
    _Iperiod_9 |   .3288929   .0513919     6.40   0.000     .2281666    .4296191
   _Iperiod_10 |   .3747675   .0585755     6.40   0.000     .2599616    .4895734
   _Iperiod_11 |   .3050915   .0544466     5.60   0.000     .1983781    .4118049
   _Iperiod_12 |   .3467143   .0571859     6.06   0.000      .234632    .4587967
   _Iperiod_13 |    .231065   .0557044     4.15   0.000     .1218863    .3402437
   _Iperiod_14 |   .3232659   .0562936     5.74   0.000     .2129325    .4335992
   _Iperiod_15 |   .2694259   .0579282     4.65   0.000     .1558886    .3829631
   _Iperiod_16 |   .2889558   .0661618     4.37   0.000     .1592811    .4186306
   _Iperiod_17 |    .306312   .0613791     4.99   0.000     .1860113    .4266128
   _Iperiod_18 |   .3917844   .0668209     5.86   0.000     .2608178    .5227509
   _Iperiod_19 |   .2412787   .0683353     3.53   0.000     .1073439    .3752135
   _Iperiod_20 |    .162989   .0681987     2.39   0.017      .029322     .296656
   _Iperiod_21 |   .2581352   .0783971     3.29   0.001     .1044797    .4117907
   _Iperiod_22 |   .3792274    .084947     4.46   0.000     .2127343    .5457206
   _Iperiod_23 |   .0598377   .0855905     0.70   0.484    -.1079167    .2275921
   _Iperiod_24 |   .1010361   .1201353     0.84   0.400    -.1344248     .336497
   _Iperiod_25 |   .3724466    .118256     3.15   0.002     .1406691    .6042241
   _Iperiod_26 |   .1775822   .2022082     0.88   0.380    -.2187385     .573903
   _Iperiod_27 |   .1494721    .096566     1.55   0.122    -.0397938     .338738
   _Iperiod_28 |  -.1009292   .2272399    -0.44   0.657    -.5463112    .3444527
   _Iperiod_29 |   .1885725   .1546619     1.22   0.223    -.1145592    .4917042
   _Iperiod_30 |    -.54734   .4384815    -1.25   0.212    -1.406748    .3120679
   _Iperiod_31 |   .1407992   .2185676     0.64   0.519    -.2875855    .5691838
   _Iperiod_32 |   -10.6318   .9994915   -10.64   0.000    -12.59077   -8.672832
   _Iperiod_33 |   .2351399   .0304283     7.73   0.000     .1755015    .2947782
   _Iperiod_34 |          0  (omitted)
campai~_b_mean |  -.0653849   .3781024    -0.17   0.863    -.8064519    .6756821
campai~ab_mean |  -.0158255   .2287326    -0.07   0.945    -.4641332    .4324822
campa~abc_mean |     .35993    .591749     0.61   0.543    -.7998766    1.519737
campa~_bc_mean |   -.637022   .3933618    -1.62   0.105    -1.407997    .1339529
campai~_c_mean |  -.8971231   .5502648    -1.63   0.103    -1.975622    .1813762
campai~ac_mean |   .3519183   .3269151     1.08   0.282    -.2888234    .9926601
_Iperiod_2_m~n |  -1.005768   4.595628    -0.22   0.827    -10.01303    8.001498
_Iperiod_3_m~n |          0  (omitted)
_Iperiod_4_m~n |  -1.114617   2.295491    -0.49   0.627    -5.613696    3.384463
_Iperiod_5_m~n |    2.20383   2.337211     0.94   0.346     -2.37702    6.784679
_Iperiod_6_m~n |   1.387314   2.417485     0.57   0.566    -3.350869    6.125496
_Iperiod_7_m~n |   .2184244   2.517606     0.09   0.931    -4.715993    5.152842
_Iperiod_8_m~n |  -.3355481   2.619149    -0.13   0.898    -5.468986     4.79789
_Iperiod_9_m~n |   2.600205   2.662685     0.98   0.329    -2.618562    7.818972
_Iperiod_10_~n |  -.8957391   2.335228    -0.38   0.701    -5.472701    3.681223
_Iperiod_11_~n |   2.475095   2.389934     1.04   0.300     -2.20909     7.15928
_Iperiod_12_~n |   2.622049   2.534325     1.03   0.301    -2.345138    7.589236
_Iperiod_13_~n |   .1924305    2.58857     0.07   0.941    -4.881073    5.265934
_Iperiod_14_~n |   1.570859    2.45265     0.64   0.522    -3.236246    6.377964
_Iperiod_15_~n |   1.055761   2.506522     0.42   0.674    -3.856931    5.968454
_Iperiod_16_~n |  -.3317405   2.653816    -0.13   0.901    -5.533125    4.869644
_Iperiod_17_~n |   3.802867   2.589789     1.47   0.142    -1.273027     8.87876
_Iperiod_18_~n |   .6874144   2.660317     0.26   0.796    -4.526712    5.901541
_Iperiod_19_~n |   1.154653   2.769009     0.42   0.677    -4.272504     6.58181
_Iperiod_20_~n |   1.459548   2.736716     0.53   0.594    -3.904318    6.823414
_Iperiod_21_~n |   4.222689    2.83617     1.49   0.137    -1.336101     9.78148
_Iperiod_22_~n |  -2.280563   3.843791    -0.59   0.553    -9.814254    5.253129
_Iperiod_23_~n |   5.494265   4.079368     1.35   0.178     -2.50115    13.48968
_Iperiod_24_~n |  -4.046938     3.2166    -1.26   0.208    -10.35136    2.257483
_Iperiod_25_~n |    4.73571    3.79902     1.25   0.213    -2.710233    12.18165
_Iperiod_26_~n |   14.96615   5.598303     2.67   0.008     3.993677    25.93862
_Iperiod_27_~n |  -4.979484   5.133265    -0.97   0.332     -15.0405     5.08153
_Iperiod_28_~n |   .2224332   2.815017     0.08   0.937    -5.294899    5.739765
_Iperiod_29_~n |   4.635166   3.196474     1.45   0.147    -1.629808    10.90014
_Iperiod_30_~n |     13.139   5.703868     2.30   0.021     1.959628    24.31838
_Iperiod_31_~n |  -7.063863   10.29276    -0.69   0.493     -27.2373    13.10957
_Iperiod_32_~n |          0  (omitted)
_Iperiod_33_~n |          0  (omitted)
pre_days_be~an |   -.010314   .0016647    -6.20   0.000    -.0135767   -.0070512
sales_per_pe~n |   .0004635   .0001722     2.69   0.007      .000126    .0008009
pre_items_mean |  -.0005158   .0001057    -4.88   0.000     -.000723   -.0003087
pre_sales_mean |  -.0002386   .0000549    -4.34   0.000    -.0003462   -.0001309
campaig~a_mean |  -.0087831   .4472429    -0.02   0.984    -.8853631    .8677969
items_per_pe~n |   .0022165   .0006399     3.46   0.001     .0009624    .0034706
baseline_spe~n |   .0000851   .0000154     5.54   0.000      .000055    .0001152
control_vr_m~n |   .9682009   .1835686     5.27   0.000      .608413    1.327989
        lpuhat |   .2433615   .0408389     5.96   0.000     .1633187    .3234044
         _cons |  -.9555393   2.394563    -0.40   0.690    -5.648796    3.737718
 log_diff_days |          1  (offset)
--------------------------------------------------------------------------------
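
For reference, the two steps and the two test regressions above can be written compactly as below. This is only a sketch in notation that does not appear in the post: \hat v_{it2} stands for lpuhat, o_{it} for the offset log_diff_days, s_{it} for the selection indicator, T_i for the number of selected periods, and the coefficient symbols are placeholders. It assumes the time averages are taken over the selected periods only.

```latex
\begin{aligned}
\text{Time averages:}\quad
  & \bar z_i = T_i^{-1}\sum_{t} s_{it}\, z_{it}, \qquad T_i = \sum_{t} s_{it} \\
\text{Step 1 (LPM):}\quad
  & y_{it2} = z_{it}\delta + \bar z_i \xi + v_{it2} \\
\text{Step 2 (residuals):}\quad
  & \hat v_{it2} = y_{it2} - z_{it}\hat\delta - \bar z_i \hat\xi \\
\text{3a:}\quad
  & E(y_{it1}\mid\cdot) = \exp\!\left(o_{it} + \alpha y_{it2} + x_{it}\beta
      + \bar z_i\gamma + \rho\,\hat v_{it2}\right) \\
\text{3b:}\quad
  & E(y_{it1}\mid\cdot) = \exp\!\left(o_{it} + \alpha y_{it2} + x_{it}\beta
      + \bar z_i\gamma + \eta\,\bar y_{i2} + \rho\,\hat v_{it2}\right) \\
\text{Endogeneity test:}\quad
  & H_0:\ \rho = 0
\end{aligned}
```

In this notation, the reported coefficient on lpuhat in the table is the estimate of \rho, and the coefficient on the time average of campaign_factor_a corresponds to \eta.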

This may be a lot to ask. I think the above is correct, but I need clarification on a few things.

I.) The "strict exogeneity of selection" assumption (whether a data point is observed in a given time period cannot be systematically related to the idiosyncratic errors). I'm performing this analysis only on the observations where (y,x) are observed, across all my IDs/panels. Is this assumption reasonable? Is there a way to explicitly test whether selection is ignorable?
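
To make the setup in I.) concrete, here is a minimal sketch of how a selection indicator and the time averages over the selected periods could be built. It runs on a toy panel: `x1`, `y2`, `s`, and `T_i` are placeholder names, not variables from the data above, and averaging over the selected rows only is an assumption about what the CRE material intends.

```stata
* Toy unbalanced panel; every variable except household_key is a placeholder
clear
input household_key period x1 y2
1 1 . 0
1 2 3 1
1 3 5 0
2 1 . 1
2 2 2 1
3 1 . 0
3 2 4 0
3 3 6 1
3 4 8 1
end

* Selection indicator: 1 if all needed variables are observed in this row
generate byte s = !missing(x1, y2)

* Time averages over the selected rows only, copied to every row of the household
foreach v of varlist x1 y2 {
    bysort household_key (period): egen double `v'_mean = mean(cond(s, `v', .))
}

* Number of selected periods per household
bysort household_key (period): egen T_i = total(s)

* Estimation commands would then be restricted with "if s"
list, sepby(household_key)
```

In the toy data only the period 1 rows have a missing covariate, which mirrors the situation described above where only period 1 is dropped for every ID.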

II.) Is the endogeneity detected in 3a really a "problem"? Can we really not disentangle the source of endogeneity with this approach?

III.) How exactly should I interpret the significant coefficients on the means of all the variables in Z in my final regression? Aren't they just controls for the CRE approach? Can they be suppressed? Most are significant.

IV.) Since y2 is a binary endogenous variable, it is true that this procedure will not yield consistent parameter estimates, because it relies on the "bad" assumption of a linear reduced form for the errors (and further assumptions). However, can I obtain consistent estimates using a GMM approach?
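
As one illustration of what a GMM approach could look like in Stata, the sketch below uses the built-in `ivpoisson gmm` command with the variable names from the code above. It is not taken from the attached material. `$ivlist` is a hypothetical global holding only the excluded instruments (it is not defined in this post), and the choice of multiplicative errors is an assumption; whether these moment conditions are appropriate with the CRE time averages included as exogenous regressors is exactly the open question.

```stata
* Sketch only, not the procedure from the attachments.
* $ivlist: hypothetical global with the excluded instruments only
*          (the part of $zlist that is not in $xlist).
* Exogenous regressors: $xlist plus the CRE time averages (*_mean).
ivpoisson gmm visits_per_period $xlist *_mean ///
    (campaign_factor_a = $ivlist), ///
    multiplicative offset(log_diff_days) vce(cluster household_key)

* If there are more excluded instruments than endogenous regressors,
* the overidentifying restrictions can be tested after estimation
estat overid
```

Unlike steps 1 to 3b, this does not use lpuhat, so it does not involve a first-stage regression for campaign_factor_a.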

Thank you all so much. This is my first post; I'd appreciate any help.
-AJ