I am writing up research that uses the excellent xtabond2 to estimate correlates of economic growth in a panel of countries.

The Arellano-Bond (1991) and related estimators were created to address the problem that the country constant causes a lagged dependent variable to be correlated with the error term when using fixed effects or first-differenced estimation.

For \(y_{it} = \mu_i + \theta y_{i,t-1} + \varepsilon_{it}\), the country constant can be eliminated by taking deviations from the country means:
\[y_{it} - \bar{y}_i = \theta (y_{i,t-1} - \bar{y}_i) + \varepsilon_{it} - \bar{\varepsilon}_i\]
but then the deviated lagged dependent variable is correlated with the deviated error term:
\[Cov[y_{i,t-1} - \bar{y}_i, \varepsilon_{it} - \bar{\varepsilon}_i] \ne 0 \]
because both deviated terms involve \(\bar{\varepsilon}_i\): the country mean \(\bar{y}_i\) averages over every period's error.
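
To get a feel for the size of this correlation, here is a minimal simulation sketch. Everything in it is an assumption for illustration only (5000 countries, 5 periods, \(\theta = 0.5\), standard normal \(\mu_i\) and \(\varepsilon_{it}\), a stationary starting value); none of it comes from my data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods, theta = 5000, 5, 0.5  # illustrative values, not from real data

mu = rng.normal(size=n_units)                    # country constants mu_i
eps = rng.normal(size=(n_units, n_periods + 1))  # shocks eps_it, serially uncorrelated
y = np.empty_like(eps)
# Start each series from its stationary distribution around mu_i / (1 - theta).
y[:, 0] = mu / (1 - theta) + eps[:, 0] / np.sqrt(1 - theta**2)
for t in range(1, n_periods + 1):
    y[:, t] = mu + theta * y[:, t - 1] + eps[:, t]

y_lag = y[:, :-1]  # y_{i,t-1} for t = 1..T
e = eps[:, 1:]     # eps_{it}  for t = 1..T

# Covariance before demeaning: y_{i,t-1} only contains shocks dated t-1 or earlier.
raw_cov = np.mean(y_lag * e) - y_lag.mean() * e.mean()

# Covariance after taking deviations from each country's own mean.
y_lag_dev = y_lag - y_lag.mean(axis=1, keepdims=True)
e_dev = e - e.mean(axis=1, keepdims=True)
within_cov = np.mean(y_lag_dev * e_dev)

print(f"covariance before demeaning: {raw_cov:.3f}")
print(f"covariance after demeaning:  {within_cov:.3f}")
```

Under these assumptions the first number should be close to zero and the second clearly negative, which is the correlation described above.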

First differencing also eliminates the country constant but presents a similar problem:

\[y_{it} - y_{i,t-1} = \theta (y_{i,t-1} - y_{i,t-2}) + \varepsilon_{it} - \varepsilon_{i,t-1} \]
but
\[Cov[y_{i,t-1} - y_{i,t-2}, \varepsilon_{it} - \varepsilon_{i,t-1}] \ne 0 \]
because both \(y_{i,t-1} - y_{i,t-2}\) and \(\varepsilon_{it} - \varepsilon_{i,t-1}\) include \(\varepsilon_{i,t-1}\).
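
To spell this out, assume serially uncorrelated errors with a common variance \(\sigma_\varepsilon^2\) (a symbol I am introducing here) and \(Cov[\mu_i, \varepsilon_{it}] = 0\). Then \(\varepsilon_{it}\) is uncorrelated with both \(y_{i,t-1}\) and \(y_{i,t-2}\), and \(\varepsilon_{i,t-1}\) is uncorrelated with \(y_{i,t-2}\), so the only surviving term is
\[Cov[y_{i,t-1} - y_{i,t-2}, \varepsilon_{it} - \varepsilon_{i,t-1}] = -Cov[y_{i,t-1}, \varepsilon_{i,t-1}] = -Var[\varepsilon_{i,t-1}] = -\sigma_\varepsilon^2 \]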

However, if I just estimate \(y_{it} = \mu_i + \theta y_{i,t-1} + \varepsilon_{it}\) by OLS with dummy variables for each country, then
\[Cov[y_{i,t-1}, \varepsilon_{it}] = Cov[\mu_i + \theta y_{i,t-2} + \varepsilon_{i,t-1}, \varepsilon_{it}] = 0 \]
as long as \(Cov[\mu_i, \varepsilon_{it}] = 0\) and \(Cov[\varepsilon_{it}, \varepsilon_{is}] = 0\ \ \forall\ \ t\ne s\), which is also required for Arellano-Bond-type estimators.
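
One way to probe this claim is a small simulation. The sketch below is only an illustration under assumed values (500 countries, 5 periods, \(\theta = 0.5\), standard normal \(\mu_i\) and \(\varepsilon_{it}\), a stationary starting value), not my actual data. It fits the dummy-variable regression with NumPy's least-squares solver and, for comparison, computes the estimate from the regression in deviations from country means.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods, theta = 500, 5, 0.5  # illustrative values, not from real data

mu = rng.normal(size=n_units)
y = np.empty((n_units, n_periods + 1))
# Start each series from its stationary distribution around mu_i / (1 - theta).
y[:, 0] = mu / (1 - theta) + rng.normal(size=n_units) / np.sqrt(1 - theta**2)
for t in range(1, n_periods + 1):
    y[:, t] = mu + theta * y[:, t - 1] + rng.normal(size=n_units)

# Stack the panel unit by unit: all T observations of country 0, then country 1, ...
y_now = y[:, 1:].ravel()
y_lag = y[:, :-1].ravel()

# One dummy column per country, rows ordered to match the stacking above.
dummies = np.kron(np.eye(n_units), np.ones((n_periods, 1)))
X = np.column_stack([y_lag, dummies])
coef, *_ = np.linalg.lstsq(X, y_now, rcond=None)
theta_lsdv = coef[0]

# Same slope from the regression in deviations from country means.
y_now_dev = y[:, 1:] - y[:, 1:].mean(axis=1, keepdims=True)
y_lag_dev = y[:, :-1] - y[:, :-1].mean(axis=1, keepdims=True)
theta_within = (y_lag_dev * y_now_dev).sum() / (y_lag_dev**2).sum()

print(f"true theta:             {theta:.3f}")
print(f"OLS with dummies:       {theta_lsdv:.3f}")
print(f"deviations from means:  {theta_within:.3f}")
```

If the zero-covariance argument above tells the whole story, `theta_lsdv` should land near 0.5; a sizeable gap from the true value would suggest the argument overlooks something.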

So why aren't people efficiently estimating dynamic panel data models with OLS when they don't have huge numbers of panel units (countries, people, firms, etc.)? A couple of thousand dummy variables are no problem for Stata.

It seems I must be missing something basic.