Hi all,

This is my third post on this topic, and I am still quite lost. Thank you all so much for Statalist.

Here is a -dataex- excerpt of my data (educgrp, female, and bmi, which appear in the commands below, are not included):

Code:
clear
input float ln_earnings byte(age married race) float drink_intensity byte days_exer_week
 8.517193 44 0 1   4 0
 9.903487 45 1 1  .5 0
 10.08581 25 1 1  .5 2
11.523855 44 1 1   1 2
 9.305651 20 0 4  12 5
10.308952 20 0 1  30 5
end
I am investigating the effect of alcohol consumption on earnings using an IV regression:
  • Dependent variable: ln_earnings
  • Control variables: married, race, education (educgrp), and a quartic in age (age, age_2, age_3, age_4)
  • Explanatory variables: drink_intensity (number of drinks per week) and its square, drink_intensity_sq, because the relationship appears to be quadratic rather than linear
  • Instruments: days_exer_week (number of days of exercise per week) and its square
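Written out, the model has two endogenous regressors, drink_intensity and its square, so 2SLS needs one first-stage equation for each, and each first stage uses all the instruments plus all the controls. A sketch of that setup, with notation introduced here only for illustration (d_i = drink_intensity, e_i = days_exer_week, x_i = the vector of controls):

```latex
\begin{aligned}
\ln(\mathit{earnings})_i &= \beta_0 + \beta_1 d_i + \beta_2 d_i^2 + \mathbf{x}_i'\gamma + u_i \\
d_i &= \pi_{10} + \pi_{11} e_i + \pi_{12} e_i^2 + \mathbf{x}_i'\delta_1 + v_{1i} \\
d_i^2 &= \pi_{20} + \pi_{21} e_i + \pi_{22} e_i^2 + \mathbf{x}_i'\delta_2 + v_{2i}
\end{aligned}
```

The second stage replaces d_i and d_i^2 with their fitted values from these two regressions; the square of the fitted value of d_i is in general not the same as the fitted value of d_i^2.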
The problem is that, because I include the square of the endogenous explanatory variable, I am not sure how to run the IV regression. So far I have tried two approaches, and they give different results.

First way:
Code:
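* First stage: drink_intensity on the controls and days_exer_week (female==0 only)
regress drink_intensity age age_2 age_3 age_4 i.educgrp married i.race days_exer_week if female==0
predict drink_hat, xb
* Square the fitted value by hand
generate drink_hat_sq = drink_hat*drink_hat

* Second stage: OLS of ln_earnings on the controls, drink_hat and drink_hat_sq
regress ln_earnings age age_2 age_3 age_4 i.educgrp married i.race drink_hat drink_hat_sq if female==0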
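For comparison with the first way, here is a sketch on hypothetical simulated data (the names y, x, z, w and all parameter values are made up for illustration, not taken from my dataset). Instead of squaring a single fitted value (what is sometimes called the "forbidden regression"), it runs a separate first stage for x and for x^2, each on all exogenous variables and both instruments, and checks that this reproduces the -ivregress 2sls- point estimates.

```stata
* Hypothetical simulated data: x is endogenous, z is the excluded instrument,
* w is an exogenous control. All values are arbitrary illustrations.
clear
set seed 12345
set obs 2000
generate double z = floor(8*runiform())   // integer 0-7
generate double w = rnormal()
generate double v = rnormal()
generate double u = rnormal() + 0.5*v      // correlated with v, so x is endogenous
generate double x = 2 + 1.5*z + v
generate double y = 9 + 0.5*x - 0.02*x^2 + 0.2*w + u

* Reference: 2SLS with x and x^2 both endogenous, z and z^2 as instruments
ivregress 2sls y w (c.x##c.x = c.z##c.z)
scalar b_x_iv  = _b[x]
scalar b_xx_iv = _b[c.x#c.x]

* Manual version: one first stage per endogenous regressor, each on ALL
* exogenous variables and instruments (no squaring of x_hat)
generate double x_sq = x^2
generate double z_sq = z^2
regress x w z z_sq
predict double x_hat, xb
regress x_sq w z z_sq
predict double x_sq_hat, xb

* Second stage: same point estimates as -ivregress-, but the standard
* errors reported by this -regress- are not valid 2SLS standard errors
regress y w x_hat x_sq_hat
```

The point estimates agree because 2SLS is OLS on the first-stage fitted values of every endogenous regressor. The standard errors do not agree: the manual second stage computes residuals from the fitted values rather than from the actual x and x^2, which is why -ivregress- is preferred over doing the steps by hand.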
Second way:
Code:
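* 2SLS: drink_intensity and its square instrumented by bmi, days_exer_week and its square; -first- displays the first-stage results
ivregress 2sls ln_earnings age age_2 age_3 age_4 married i.race i.educgrp (c.drink_intensity##c.drink_intensity = bmi c.days_exer_week##c.days_exer_week) if female==0, first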
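A second sketch, again on hypothetical simulated data rather than my dataset (x stands in for drink_intensity, z for days_exer_week; parameter values are arbitrary). It uses the same factor-variable syntax as the second way, then two standard postestimation steps: -estat firststage- to check first-stage strength, since weak instruments can make 2SLS estimates imprecise, and -nlcom- for the turning point of the quadratic, -b1/(2*b2).

```stata
* Hypothetical simulated data (NOT my dataset); all parameters are arbitrary
clear
set seed 2024
set obs 2000
generate double z = floor(8*runiform())   // integer 0-7, like days per week
generate double v = rnormal()
generate double u = rnormal() + 0.5*v      // correlated with v, so x is endogenous
generate double x = 2 + 1.5*z + v
generate double y = 9 + 0.5*x - 0.02*x^2 + u

* Factor-variable notation: x and x^2 are the two endogenous regressors,
* z and z^2 the excluded instruments
ivregress 2sls y (c.x##c.x = c.z##c.z)

* First-stage strength for each endogenous regressor (weak-instrument check)
estat firststage

* Turning point of the quadratic, -b1/(2*b2), with a delta-method standard error
nlcom -_b[x] / (2*_b[c.x#c.x])
```

In this sketch the model is exactly identified (two instruments for two endogenous regressors), so -estat overid- has nothing to test; with bmi as a third instrument, as in the second way, it would report overidentification tests.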

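Even though I know the first way is not ideal, it gives statistically significant estimates for the drinking terms, while in the second way they are not significant at all. The two approaches also produce different coefficients, and they use different instrument sets: the first uses only days_exer_week, while the second uses bmi, days_exer_week, and its square. I don't have many options for instruments because the data are quite restricted. In that case, which approach should I use? Is it correct to run an IV regression with a squared endogenous term?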

Thanks a lot; any answer is much appreciated! I have attached my dataset here in case anyone would like to replicate this.