Hi,

I am using Stata 16 to analyse intergenerational income mobility with the Understanding Society (UKHLS) and Harmonised BHPS data for the UK (https://beta.ukdataservice.ac.uk/dat.../study?id=7453).

After appending and merging the individual-level data files for 26 of the waves (covering 1991-2018), my data look like this (extract):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long pidp float(hhid2 wave ln_annual_net_hh_income ln_annual_income_child age sex_indv pno_derived pno_mother pno_father) long(mother_pidp father_pidp)
 687     1  1  8.232247        . 91 2 1 0 0 . .
1367     2  1  8.257969        . 28 1 1 0 0 . .
1367    41  2  9.448879        . 29 1 1 0 0 . .
2051     2  1  8.257969        . 26 1 2 0 0 . .
2051    41  2  9.448879        . 27 1 2 0 0 . .
2051 63798  3   8.56237        . 28 1 1 0 0 . .
2727     3  1   8.72498   9.2192 57 2 1 0 0 . .
2727 16258  2 10.003557 8.943591 59 2 1 0 0 . .
2727 63811  3  8.944751 8.919909 59 2 1 0 0 . .
2727 99743  8   9.44914 8.897836 64 2 2 0 0 . .
2727 82925  9  9.568527        . 65 2 1 0 0 . .
2727 91994 10  9.893599 8.707804 66 2 1 0 0 . .
2727 36017 11  10.00738 8.662105 67 2 1 0 0 . .
3407    18  1  9.171175 8.928665 36 1 1 0 0 . .
3407    18  1  9.171175 8.928665 36 1 1 0 0 . .
end
NB: In the full data, mother_pidp and father_pidp are non-missing for many observations; they just happen to be missing for the few observations shown in the example above.

The variables are:
- pidp: unique cross-wave person identifier
- hhid2: within-wave household identifier
- ln_annual_net_hh_income: log annual net household income (the parents' household income, once it is looked up for the parent)
- ln_annual_income_child: the child's own income as an adult
- pno_derived: the individual's person number within the household
- pno_mother/pno_father: the person number of the mother/father within the same household

I have not dropped observations where the parent person number equals zero (i.e. where a parent is not identified in the same household as the individual). Doing so would also drop the parents' own rows whenever their parents are not interviewed in the survey, and I could then no longer recover the parents' income for their children.

To regress the child's adult income on household (parental) income when the child was younger, I need the parents' income (ln_annual_net_hh_income, located via mother_pidp, the unique ID of the individual's mother, or father_pidp, the unique ID of the individual's father) to appear in the same row as the child's observation. That is, I want a series of variables on the child's row such as parent_income_w1, parent_income_w4, etc., where the suffix is the wave in which the parental household income is observed. I would also like to see the parent's age in each wave in which their income is observed, so that I can choose the ages over which to compute the parental income variable. However, I'm unsure which commands would let me create these variables so that I can run the regression.
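One possible approach, sketched here under assumptions rather than as a definitive solution: treat every respondent's own rows as a potential "parent" file by renaming pidp to mother_pidp (or father_pidp), reshape that file wide by wave, and merge it m:1 onto the child rows. The code assumes the data shown above are in memory and that pidp and wave identify observations once exact duplicate rows are dropped. The names mother_hhinc_w*, mother_age_w*, mother_hhinc_avg and the 30-50 age band are hypothetical choices for illustration only.

```stata
* Sketch (assumption): data in memory as in the -dataex- extract above.
* The extract contains an exact duplicate row (pidp 3407, wave 1), so drop
* exact duplicates and confirm that pidp-wave now identifies observations.
duplicates drop
isid pidp wave

foreach p in mother father {
    preserve
    * Each person's own income and age become "parental" variables once
    * pidp is renamed to the parent ID used on the child's rows.
    keep pidp wave ln_annual_net_hh_income age
    rename pidp `p'_pidp
    rename ln_annual_net_hh_income `p'_hhinc_w
    rename age `p'_age_w
    * One row per parent; wave suffixes give e.g. mother_hhinc_w1, mother_age_w1
    reshape wide `p'_hhinc_w `p'_age_w, i(`p'_pidp) j(wave)
    tempfile parentfile
    save `parentfile'
    restore
    * Many child rows can share one parent, hence m:1. Child rows with a
    * missing or unmatched parent ID are kept, with missing parental values.
    merge m:1 `p'_pidp using `parentfile', keep(master match) nogenerate
}

* Hypothetical choice: mean of the mother's log household income over the
* waves in which she was aged 30-50 (the age band is illustrative only).
generate double mother_inc_sum = 0
generate mother_inc_n = 0
foreach v of varlist mother_hhinc_w* {
    local w = subinstr("`v'", "mother_hhinc_w", "", 1)
    local sel inrange(mother_age_w`w', 30, 50) & !missing(`v')
    replace mother_inc_sum = mother_inc_sum + `v' if `sel'
    replace mother_inc_n = mother_inc_n + 1 if `sel'
}
* 0/0 gives missing, so children with no qualifying waves stay missing
generate mother_hhinc_avg = mother_inc_sum / mother_inc_n
```

With the full data in memory, the regression could then be run with something like `regress ln_annual_income_child mother_hhinc_avg, vce(cluster pidp)`, clustering on pidp because each child contributes several rows. The same loop can be repeated for the father's variables, or the two parents combined, depending on which parental income measure is wanted.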


Thanks!