Hello!

We have a question about implementing the ‘Mundlak’ approach in a multilevel (3-levels) nested hierarchical model. We have employees (level 1), nested within year-cohorts (level 2), nested within firms (level 3).

In terms of data structure, the dependent variable is employee satisfaction (ordinal measure) at employee level (i) over time (t) and across firms (j) (let’s call this Y_itj), noting that we have repeated cross sections with different individuals observed in every period, while as regressors, we are mainly interested in the impact of a firm level time-variant, but employee invariant, variable (let’s call it X_tj). We apply a 3-level ordered profit model (meoprobit in Stata).

We are concerned with endogeneity issues of the X_tj variable, which we hope to (at least partially) resolve by using some form of a Mundlak approach, by including firm specific averages for X_tj, as well as for all other time-varying explanatory variables. The idea is that if the firm-specific averages are added as additional control variables, then the coefficients of the original variables represent the ‘within effect’, i.e. how changing X_tj affects Y_itj (employee satisfaction).

However, we are not sure whether approach 1 or 2 below is more appropriate, because X_tj is a level 2 (firm level) variable.
  1. The firm specific averages of X_tj (as well as other explanatory variables measured at level 2) need to be calculated by averaging over individuals, even though the variable itself is a Level 2 variable (varies only over time for each firm). That is, in Stata: bysort firm_id: egen mean_X= mean(X). As our data set is unbalanced (so the number of observations for each firm varies over time), these means are driven by the time periods with more observations. For example, in a 2-period model, if a company has a lot of employee reviews in t=1 but very few in t=2, the observations in t=1 will dominate this mean.
  1. Alternatively, as the X_tj variable is a level 2 variable, the firm specific averages need to be calculated by averaging over time periods. That is: we first create a tag that is 1 only for the first observation in the sample per firm/year, and then do: bysort firm_id: egen mean_X= mean(X) if tag==1. This gives equal weight to each time period, irrespective of how many employee-level observations we have in that period. For example, although a company has a lot of employee reviews in t=1 and very few in t=2, the firm specific mean will treat the two periods as equally important.
The two means are different, and we are unsure which approach is the correct one (and which mean produces the ‘true’ contextual effect of X_tj on Y_itj). We have been unable to locate in the literature a detailed treatment of the issue for 3-level models (as opposed to 2-level models where the situation is straightforward). Any advice/suggestions on the above would be very much appreciated.