Hello all,

I am running fixed effects regressions (linear probability models) of health outcomes/behaviors on local employment change over three waves. One of these behaviors is the quantity of cigarettes consumed. It was suggested that, for the number of cigarettes consumed, an OLS model with a Heckman correction would first model the decision to smoke or not and then, conditional on smoking, the quantity smoked. I agree with this, but I am not sure how to apply a Heckman selection model to panel data with fixed effects.

In my analysis I model several outcomes and behaviors in Stata as shown below, and I would like to keep this approach when applying the Heckman correction, both for comparability across the outcomes studied and because I need to apply weights to my analysis of cigarette consumption.

I saw a suggestion on Stack Exchange to cluster the standard errors on the panel id (https://stats.stackexchange.com/ques...and-panel-data). Would that mean changing my current clustering from county to individual id? The data are xtset by id and year.

Alternatively, I saw this comment by Phil Bromiley:
"Fixed effects can be done with i.panel in heckman. You'll probably need to increase matsize and you'll end up with a pile of parameter estimate on the panels that are not of interest. xtreg y x with the panel called panel is identical to reg y x i.panel"
(https://www.statalist.org/forums/for...for-panel-data) but I don't know what that would mean in an applied sense in Stata.
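
To check my understanding of the xtreg/reg part of that comment, here is a minimal sketch using Stata's grunfeld example dataset rather than my own data (the dataset and its variables invest, mvalue, kstock and company are only there for illustration). As far as I can tell, both commands should give the same slope coefficients on mvalue and kstock, with the second also reporting one coefficient per company dummy:

```stata
* Illustration only: grunfeld is a Stata example dataset, not my data
webuse grunfeld, clear
xtset company year

* Within (fixed-effects) estimator
xtreg invest mvalue kstock, fe

* OLS with one dummy per panel unit (least-squares dummy variables)
regress invest mvalue kstock i.company
```

What I can't work out is how this carries over to heckman, where the dummies would presumably also have to enter the selection equation.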

I also found that UNESCAP suggests the following:

Heckman depvar indepvar1 indepvar2 … dum1 dum2 …, select(indepvar1 indepvar2 … dum1 dum2 … overidvar1…) options

https://artnet.unescap.org/tid/artnet/mtg/cbtr7-s12.pdf

But I'm not even sure which dummies I'm supposed to add.
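
My best guess, and it is only a guess, is that dum1 dum2 … stand for the fixed-effect dummies, which in my case would be individual dummies entered as i.id, and that medical_card_y would play the role of overidvar1. Under those assumptions, the template applied to my variables would look roughly like the untested sketch below. With at most two observations per person, I suspect the selection (probit) part may struggle with that many dummies.

```stata
* Untested sketch: assumes dum1 dum2 ... are individual dummies (i.id)
* and that medical_card_y is the variable excluded from the outcome equation.
* The /// line continuations only work in a do-file, not interactively.
heckman no_cigs_cons_y psum_unemployed_total_cont_y age_y i.maritalstatus_y i.year i.id ///
    if has_y0_questionnaire==1 & has_y5_questionnaire==1 [pw=ipw55], ///
    select(psum_unemployed_total_cont_y age_y i.maritalstatus_y i.year i.id medical_card_y) ///
    vce(cluster id)
```

Is that the kind of specification the template has in mind?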


I thought xtheckman might save me, but it's a random effects regression with selection and I need fixed effects (https://www.stata.com/new-in-stata/xtheckman/).

I would really appreciate applied advice on what I should do to my analysis to apply a Heckman correction.

Thanks for any help,

John

This is my core model:

Code:
. xtreg no_cigs_cons_deflated_y  psum_unemployed_total_cont_y i.yrlycurrent_county_y1 i.year age_y i.marita
> lstatus_y if has_y0_questionnaire==1 & has_y5_questionnaire==1, cluster (current_county_y1) fe robust 
note: 6.yrlycurrent_county_y1 omitted because of collinearity
note: 15.yrlycurrent_county_y1 omitted because of collinearity
note: 18.yrlycurrent_county_y1 omitted because of collinearity
note: 23.yrlycurrent_county_y1 omitted because of collinearity
note: 25.yrlycurrent_county_y1 omitted because of collinearity
note: 26.yrlycurrent_county_y1 omitted because of collinearity
note: 29.yrlycurrent_county_y1 omitted because of collinearity
note: 5.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs      =      1152
Group variable: id                              Number of groups   =       642

R-sq:  within  = 0.0605                         Obs per group: min =         1
       between = 0.0179                                        avg =       1.8
       overall = 0.0145                                        max =         2

                                                F(13,28)           =         .
corr(u_i, Xb)  = -0.8476                        Prob > F           =         .

                                     (Std. Err. adjusted for 29 clusters in current_county_y1)
----------------------------------------------------------------------------------------------
                             |               Robust
     no_cigs_cons_deflated_y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------------------+----------------------------------------------------------------
psum_unemployed_total_cont_y |  -.2387741   .1100417    -2.17   0.039    -.4641842    -.013364
                             |
       yrlycurrent_county_y1 |
                      Clare  |    1.84201   2.511288     0.73   0.469    -3.302129     6.98615
                       Cork  |   .9439361   2.271351     0.42   0.681    -3.708716    5.596588
                    Donegal  |          0  (omitted)
                  Dublin 16  |   .0798436   2.427069     0.03   0.974    -4.891781    5.051468
                Dublin City  |   1.268084   2.435825     0.52   0.607    -3.721478    6.257646
     Dún Laoghaire-Rathdown  |   .4580872   2.367576     0.19   0.848    -4.391673    5.307847
                     Fingal  |   .1145035   2.333406     0.05   0.961    -4.665262    4.894269
                     Galway  |  -16.52429   .3514215   -47.02   0.000    -17.24415   -15.80444
                Galway City  |  -17.09233   .4548787   -37.58   0.000     -18.0241   -16.16055
                      Kerry  |   1.898583   2.566648     0.74   0.466    -3.358958    7.156123
                    Kildare  |   1.688322   2.394418     0.71   0.487     -3.21642    6.593064
                   Kilkenny  |          0  (omitted)
                      Laois  |   2.852193   1.208139     2.36   0.025      .377433    5.326952
                    Leitrim  |   2.076192   2.333259     0.89   0.381    -2.703273    6.855657
                   Limerick  |          0  (omitted)
                   Longford  |   .5373577   2.372396     0.23   0.822    -4.322276    5.396991
                      Louth  |   1.385586   2.386451     0.58   0.566    -3.502838     6.27401
                       Mayo  |  -17.88611   .3841588   -46.56   0.000    -18.67302    -17.0992
                      Meath  |   .1920723   2.276061     0.08   0.933    -4.470227    4.854372
                   Monaghan  |          0  (omitted)
                     Offaly  |   .9486299   2.335269     0.41   0.688    -3.834952    5.732212
                  Roscommon  |          0  (omitted)
                      Sligo  |          0  (omitted)
               South Dublin  |   .0798436   2.427069     0.03   0.974    -4.891781    5.051468
                  Tipperary  |  -.0933459   .3734837    -0.25   0.804    -.8583927    .6717008
            Tipperary North  |          0  (omitted)
                  Waterford  |  -15.97167   .4552278   -35.09   0.000    -16.90416   -15.03918
                  Westmeath  |   1.313337   2.349551     0.56   0.581      -3.4995    6.126175
                    Wexford  |   -.604106   2.456075    -0.25   0.808    -5.635147    4.426935
                    Wicklow  |   3.927572    3.03076     1.30   0.206    -2.280659     10.1358
                             |
                      5.year |          0  (omitted)
                       age_y |   .0837821   .0470026     1.78   0.086    -.0124983    .1800625
                             |
             maritalstatus_y |
                 Cohabiting  |   .5289705   .4076338     1.30   0.205    -.3060295    1.363971
                  Separated  |   -.547115   .1271997    -4.30   0.000    -.8076718   -.2865582
                   Divorced  |  -6.950598   1.454566    -4.78   0.000    -9.930142   -3.971054
                    Widowed  |    3.47176   1.996616     1.74   0.093    -.6181229    7.561643
       Single/Never married  |  -1.460055   1.615909    -0.90   0.374    -4.770094    1.849984
                             |
                       _cons |   5.822622   2.518999     2.31   0.028     .6626857    10.98256
-----------------------------+----------------------------------------------------------------
                     sigma_u |  9.0440127
                     sigma_e |  3.4804153
                         rho |  .87100821   (fraction of variance due to u_i)
----------------------------------------------------------------------------------------------

And I would like to model it as something like:

Code:
heckman no_cigs_cons_y psum_unemployed_total_cont_y i.yrlycurrent_county_y1 i.year age_y i.maritalstatus_y [pw=ipw55] if has_y0_questionnaire==1 & has_y5_questionnaire==1, select(age_y medical_card_y i.year) vce(cluster id)