Dear Stata forum!

I'm writing a thesis on the causal mechanism between health and socioeconomic position, using the HRS panel data. The data span 13 waves from 1994 to 2016 (two years apart) and include a variety of socioeconomic and health-related variables. I have created a health index using factor analysis, which I will regress on household income, following Foverskov & Holm (2015).

Our goal is to use the Anderson-Hsiao estimator to address the endogeneity of the lagged dependent variable.

I use the following code, with income lagged one wave:


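* Controls: year dummies, age and age squared
global xlist "yr94-yr16 Age Agesq"

* Differenced equation; ld.HI is instrumented with the twice-lagged level l2.HI
xtivreg d.HI ld.income d.($xlist) (ld.HI = l2.HI), vce(cluster HHIDPN)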

Variable explanation:
HI - log(Health Index)
income - log(HH income)
HHIDPN - ID number (person)
yr94-yr16 - Year dummy for 1994 to 2016

My question is this: does it make sense to include year dummies and age variables in this differenced model? From what I read in the previous post (Wooldridge, among others), as long as we believe that the model is specified correctly, we should look past the estimation method. I'm still not convinced, since we only observe the change in age from wave to wave, which is 2 for most individuals (with some values of 1 and 3, depending on birthdays and survey dates). The same problem applies to the year dummies, since the differenced dummies only take the values -1, 0 and 1 throughout the survey. I'm afraid that I'm not specifying the model correctly.
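
To make the concern concrete, here is a minimal sketch of how I would inspect the differenced regressors. It is only an illustration: "year" is a placeholder name for the survey-year variable (it does not appear in my code above), I assume the panel is declared with a two-year delta, and I assume one of the dummies is named yr96.

```stata
* Sketch only: "year" is a placeholder name for the survey-year variable
xtset HHIDPN year, delta(2)

* Wave-to-wave change in age: mostly 2, sometimes 1 or 3
generate dAge = D.Age
tabulate dAge

* Change in age squared, which still depends on the level of age
generate dAgesq = D.Agesq
summarize dAgesq

* A differenced year dummy (yr96 is an assumed name) takes only -1, 0 and 1
generate dyr96 = D.yr96
tabulate dyr96
```

If d.Age is almost always 2, it carries very little variation beyond a constant, which is exactly what worries me about keeping it in the differenced equation.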

I appreciate any help I can get.