Hi,

I'm working with consecutive censuses and can follow the same individuals through several decades. However, age is not always reported (missing from the census, unreadable, etc.), and in those cases a zero is shown instead of a missing value (don't be mad, I know... moreover, newborns also show an age of 0... no comment). But age has probably been reported in a previous or a subsequent census. How can I use that information to infer age when it is 0 (when applicable)?

Also, age is not always consistent, so (t-1 + 10) and (t+1 - 10) may yield different results. In my experience, the difference in reported age between two consecutive censuses mostly ranges between 8 and 12 years, so no matter which census year is used in the calculation, the inferred age should be in the ballpark. In the example below, how do I determine which census to use in the calculation?

Finally, individuals are part of dyads (last variable) and may be present in more than one dyad. Not sure it is relevant to the calculation, but agediff is a dyad characteristic that will need to be updated afterwards.

I'm adding a few questions that may help figure out all the possible cases:
- A newborn will be coded as 0. What if the individual also has an age of 0 in the next census (instead of 9-10)? Should the calculation run from the last occurrence back to the first?
- What if it's the last occurrence that is 0?

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long ego int census byte(ego_age agediff) long dyad
708884 1881 24 23 11415600
708884 1891  0  0 11415600
708884 1901 40 20 11415600
708884 1911 57 25 11415600
739865 1881  1 23 11415600
739865 1891  0  0 11415600
739865 1901 20 20 11415600
739865 1911 32 25 11415600
end

Thanks

EDIT: the data come from the censuses, which are stored in SQL tables. Since individuals have only one record per census (compared to multiple kin relationships in dyadic format), maybe I should figure out how to recode age in SQL so that age remains consistent across all dyads.
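
To make the rule I have in mind more concrete, here is a rough Stata sketch to run in a do-file after the example above. It is only one possible approach, not a settled solution. The names age_rep, age_prev, age_next, negcensus, age_fix and agediff_fix are hypothetical, and the choice to prefer the earlier census over the later one is an assumption that is precisely what I am unsure about. It also assumes that ego_age is identical on every row of the same ego and census, and that a dyad has exactly two egos per census.

```stata
* Sketch only. age_rep, age_prev, age_next, negcensus, age_fix and agediff_fix
* are hypothetical names; ego, census, ego_age and dyad come from the example.

* Treat every 0 as unknown for now; plausible newborns are restored below.
generate int age_rep = ego_age if ego_age > 0

* Forward pass: nearest earlier reported age plus the years elapsed.
bysort ego (census): generate int age_prev = age_rep
by ego: replace age_prev = age_prev[_n-1] + (census - census[_n-1]) ///
    if missing(age_prev) & _n > 1

* Backward pass: nearest later reported age minus the years elapsed.
generate int negcensus = -census
bysort ego (negcensus): generate int age_next = age_rep
by ego: replace age_next = age_next[_n-1] - (census[_n-1] - census) ///
    if missing(age_next) & _n > 1

* Assumed rule: prefer the earlier census, fall back on the later one.
* A negative backward estimate is floored at 0 (newborn or not yet born).
generate int age_fix = ego_age
replace age_fix = age_prev if ego_age == 0 & !missing(age_prev)
replace age_fix = max(age_next, 0) ///
    if ego_age == 0 & missing(age_prev) & !missing(age_next)

* Recompute the dyad-level age difference (assumes two egos per dyad and census).
bysort dyad census (ego): generate int agediff_fix = ///
    abs(age_fix[1] - age_fix[2]) if _N == 2

sort ego census
list ego census ego_age age_prev age_next age_fix agediff_fix, sepby(ego)
```

On the example data, the 1891 ages would come out as 34 and 11 from the earlier census, versus 30 and 10 from the later one, which is exactly the inconsistency described above. A 0 in the last occurrence is only filled when an earlier non-zero age exists, and a 0 with no other reported age for that ego stays 0.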