I have a four-wave panel of data on children's height and weight and on parental employment. The anthropometric measures are objectively measured, so I convert them to z-scores using the zanthro package in Stata. I create weight-for-age, weight-for-height, height-for-age and BMI z-scores. Alongside these continuous variables, I then create binary overweight indicators using the z-score cut-offs from the WHO child growth charts.
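A minimal sketch of the cut-off step, run on hypothetical toy values rather than the real data. The +1 SD threshold is an assumption used only for illustration, because the WHO overweight cut-off differs by age group. The main point of the sketch is the `if !missing()` qualifier. Stata treats a missing value as larger than any number, so without the qualifier a missing z-score would be coded as overweight.

```stata
* Toy BMI z-scores (hypothetical values, for illustration only)
clear
input double z_score_bmi
-0.5
0.8
1.3
2.4
.
1.0
end

* Illustrative cut-off of +1 SD (an assumption; check which WHO
* reference and cut-off applies to each child's age band)
* The -if- qualifier keeps missing z-scores missing, because in Stata
* a missing value (.) compares as larger than any number
generate byte child_overweight_y = (z_score_bmi > 1) if !missing(z_score_bmi)
list
```

Here only the values 1.3 and 2.4 are coded 1. The value 1.0 is not strictly above the cut-off, and the missing value stays missing.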

Before I begin my analysis, I drop wave 4 because response in that wave is poor:

Code:
drop if wave==4
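After dropping wave 4, `xtset` followed by `xtdescribe` gives a quick check of how many waves each child still contributes. This sketch uses a hypothetical three-child panel. `id` and `wave` are the variable names from the post, and `n_waves` is a made-up helper variable.

```stata
* Toy panel (hypothetical): 3 children observed in up to 4 waves
clear
input long id byte wave
1 1
1 2
1 3
1 4
2 1
2 2
2 4
3 3
end

drop if wave == 4

* Declare the panel and inspect the participation patterns;
* children seen only once add nothing to a fixed-effects estimate
xtset id wave
xtdescribe

* Number of remaining waves per child
bysort id: generate byte n_waves = _N
```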

I would like to analyse the data with fixed-effects models, but the data providers have suggested that I apply a wave 3 weight they provide, to make the sample representative of the national child population.
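Before the wave 3 weight goes into a fixed-effects model, it is worth checking that it is constant within each child. The clogit documentation says weights are interpreted as applying to groups as a whole. The sketch below runs on hypothetical toy data in which the weight has already been copied to every wave, and it flags any child whose weight varies across waves. `wt_varies` is a made-up helper variable.

```stata
* Toy data (hypothetical): a child-level weight attached to every wave
clear
input long id byte wave double weighting_factor
1 1 1.20
1 2 1.20
1 3 1.20
2 1 0.85
2 2 0.85
2 3 0.90
end

* Within each child, sort by the weight and compare the smallest and
* largest values; any difference means the weight is not constant
bysort id (weighting_factor): generate byte wt_varies = weighting_factor[1] != weighting_factor[_N]
list id wave weighting_factor wt_varies, sepby(id)
```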

xtlogit will not allow me to apply these weights, so instead I do the following:

Code:
clogit child_overweight_y parents_unemployed_y  i.urban_or_rural_y child_age_y [pw=weighting_factor], group(id) nolog robust
margins, dydx(parents_unemployed_y) post
estimates store logitmod
estimates table logitmod, star stats(N r2 r2_a)

This gives the following output:


Code:
. clogit child_overweight_y parents_unemployed_y  i.urban_or_rural_y child_age_y [pw=weighting_factor], gro
> up(id) nolog robust
note: multiple positive outcomes within groups encountered.
note: 7,150 groups (20,713 obs) dropped because of all positive or
      all negative outcomes.

Conditional (fixed-effects) logistic regression

                                                Number of obs     =      5,341
                                                Wald chi2(3)      =      51.26
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -1939.9054               Pseudo R2         =     0.0199

                                             (Std. Err. adjusted for clustering on id)
--------------------------------------------------------------------------------------
                     |               Robust
  child_overweight_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
parents_unemployed_y |   .4196281   .1226047     3.42   0.001     .1793273     .659929
  1.urban_or_rural_y |   .1462153   .1653259     0.88   0.376    -.1778174    .4702481
         child_age_y |  -.0089368   .0013886    -6.44   0.000    -.0116585   -.0062151
--------------------------------------------------------------------------------------

. margins, dydx(parents_unemployed_y) post

Average marginal effects                        Number of obs     =      5,341
Model VCE    : Robust

Expression   : Pr(child_overweight_y|fixed effect is 0), predict(pu0)
dy/dx w.r.t. : parents_unemployed_y

--------------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
parents_unemployed_y |   .1024646    .029751     3.44   0.001     .0441537    .1607755
--------------------------------------------------------------------------------------

. estimates store logitmod

. estimates table logitmod, star stats(N r2 r2_a)

------------------------------
    Variable |   logitmod     
-------------+----------------
parents_un~y |  .10246458***  
-------------+----------------
           N |       5341     
          r2 |                
        r2_a |                
------------------------------
legend: * p<0.05; ** p<0.01; *** p<0.001
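The note that 7,150 groups were dropped follows from how the conditional likelihood works: a child whose overweight status never changes across waves contributes nothing to it. Below is a rough sketch for counting such children, on hypothetical toy data. It ignores observations that clogit would also drop because of missing covariates, so on real data the counts are only approximate. `ever_ow`, `always_ow`, `ow_varies` and `first_obs` are made-up helper variables.

```stata
* Toy panel (hypothetical): overweight status over three waves
clear
input long id byte wave byte child_overweight_y
1 1 0
1 2 0
1 3 0
2 1 1
2 2 1
2 3 1
3 1 0
3 2 1
3 3 1
4 1 1
4 2 0
4 3 .
end

* egen max()/min() ignore missing values; a child varies if max != min
bysort id: egen byte ever_ow = max(child_overweight_y)
bysort id: egen byte always_ow = min(child_overweight_y)
generate byte ow_varies = ever_ow != always_ow

* Count children (one row per id), not observations
egen byte first_obs = tag(id)
count if first_obs & !ow_varies
count if first_obs & ow_varies
```

In this toy panel, children 1 and 2 never change status and would be dropped. Children 3 and 4 remain.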

I would like to cluster the standard errors by the child's location, but the closest variable I have to location is urban_or_rural_y, a binary indicator of whether the child lives in an urban or a rural area:

Code:
. tab urban_or_rural_y

urban_or_ru |
      ral_y |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     17,091       57.34       57.34
          1 |     12,713       42.66      100.00
------------+-----------------------------------
      Total |     29,804      100.00

Here 0 is urban and 1 is rural. When I try to cluster on this variable, I get the following error:


Code:
groups (strata) are not nested within clusters

I'm not quite sure what that means. Is it that I don't have enough clusters, since I only have urban and rural?
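The message refers to the requirement that every group (here each `id`) lie entirely within a single cluster. One plausible cause, which can be checked directly, is that some children are recorded as urban in one wave and rural in another. A sketch of that check on hypothetical toy data follows. `switches` and `first_obs` are made-up helper variables.

```stata
* Toy panel (hypothetical): child 2 moves from urban (0) to rural (1)
clear
input long id byte wave byte urban_or_rural_y
1 1 0
1 2 0
1 3 0
2 1 0
2 2 1
2 3 1
3 1 1
3 2 1
3 3 1
end

* A child breaks the nesting if its urban/rural status is not constant
bysort id (urban_or_rural_y): generate byte switches = urban_or_rural_y[1] != urban_or_rural_y[_N]

* Number of children (not observations) who switch
egen byte first_obs = tag(id)
count if first_obs & switches
```

Separately, even when the nesting holds, two clusters are generally far too few for cluster-robust standard errors to be reliable.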

I want to look at whether parental unemployment increases the probability of being overweight. I read the result above as indicating that parental unemployment increases the probability of being overweight by about 10 percentage points (an average marginal effect of 0.102). In other words, as either parent goes from employed to unemployed, the probability of the child going from normal weight to overweight rises by about 0.10.
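The AME of 0.102 is an absolute change in probability, that is, a change in percentage points. Its size in relative (percent) terms depends on the baseline probability of overweight, which is not reported above. The sketch therefore uses a hypothetical baseline of 0.20. Note also that the Expression line of the margins output (`fixed effect is 0`) shows that these probabilities are evaluated with the child fixed effect set to zero.

```stata
* Average marginal effect reported by -margins- above
scalar ame = 0.1024646

* Hypothetical baseline probability of overweight (NOT taken from the data)
scalar p0 = 0.20

* Absolute change, expressed in percentage points
display "change in percentage points: " %5.2f 100*scalar(ame)

* Relative change, which depends on the assumed baseline
display "relative change (%): " %5.1f 100*scalar(ame)/scalar(p0)
```

With this assumed baseline, a 10-percentage-point rise would be a relative increase of about 51%. A different baseline gives a different relative figure.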

Having done that, I would like to know whether either parent being unemployed increases the BMI z-score. My reasoning is that a large positive z-score means a child is further above the reference mean and therefore closer to being overweight, so I do the following:


Code:
xtreg z_score_bmi parents_unemployed_y i.urban_or_rural_y child_age_y [pw=weighting_factor], fe

This gives the following result:

Code:
. xtreg z_score_bmi parents_unemployed_y i.urban_or_rural_y child_age_y [pw=weighting_factor], fe 

Fixed-effects (within) regression               Number of obs     =     26,054
Group variable: id                              Number of groups  =      8,972

R-sq:                                           Obs per group:
     within  = 0.0089                                         min =          1
     between = 0.0000                                         avg =        2.9
     overall = 0.0024                                         max =          3

                                                F(3,8971)         =      30.52
corr(u_i, Xb)  = -0.0192                        Prob > F          =     0.0000

                                         (Std. Err. adjusted for 8,972 clusters in id)
--------------------------------------------------------------------------------------
                     |               Robust
         z_score_bmi |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
parents_unemployed_y |   .1075761   .0263291     4.09   0.000      .055965    .1591872
  1.urban_or_rural_y |   .0516994   .0344913     1.50   0.134    -.0159113    .1193102
         child_age_y |  -.0026084   .0003005    -8.68   0.000    -.0031974   -.0020193
               _cons |   .8034086   .0191922    41.86   0.000     .7657874    .8410298
---------------------+----------------------------------------------------------------
             sigma_u |  .86341018
             sigma_e |  .77595763
                 rho |  .55319391   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------

I take this as indicating that when either parent becomes unemployed, the child's BMI z-score increases by about 0.11, roughly a tenth of a standard deviation of the reference distribution.
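In a fixed-effects model, the coefficient on `parents_unemployed_y` is estimated only from children whose parents' employment status changes between waves. It is therefore worth checking how much within-child variation there is. Below is a sketch using `xtsum` on hypothetical toy data. `emp_changes` and `first_obs` are made-up helper variables.

```stata
* Toy panel (hypothetical): only child 2 changes parental employment status
clear
input long id byte wave byte parents_unemployed_y
1 1 0
1 2 0
1 3 0
2 1 0
2 2 1
2 3 1
3 1 1
3 2 1
3 3 1
end
xtset id wave

* The "within" row reports variation over time inside each child
xtsum parents_unemployed_y

* Count children whose parents' status changes at least once
bysort id (parents_unemployed_y): generate byte emp_changes = parents_unemployed_y[1] != parents_unemployed_y[_N]
egen byte first_obs = tag(id)
count if first_obs & emp_changes
```

On the real data, the within row of `xtsum` and this count show how much of the sample actually identifies the estimate.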

Do my approach and my understanding of the results make sense?

I would hate to make a mistake, so I would really appreciate it if anyone could point out any errors now, so that I can correct them at the start of my study!


Thank you so much,

John