Hi,
I’ve been working on a large panel study. We’ve reshaped a database for the entire population by address to get up to 12 household members for each address. I then used address to match the panel members to household members.
The household members include some data about the panel member. One way around this is to use name and age to identify the panel member. Loops to identify the number of people of a particular race or the oldest person in the household can easily be written to exclude the panelist.




dataex id forename surname house street year panel_14_fm fore_14_fm_1 sur_14_fm_1 race_14_fm_1 start_yr_14_fm_1 fore_14_fm_2 sur_14_fm_2 race_14_fm_2 star
> t_yr_14_fm_2

----------------------- copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str7(forename surname) byte house str11 street int year byte panel_14_fm str7 fore_14_fm_1 str9 sur_14_fm_1 byte race_14_fm_1 int start_yr_14_fm_1 str5 fore_14_fm_2 str10 sur_14_fm_2 byte race_14_fm_2 int start_yr_14_fm_2
1 ""        "Who"      . "Tardis"      1776 2 "Tegan"   "Jokanovic" 1 1960 ""      "Who"        1 1776
2 "Barbara" "Wright"  17 "Coroonation" 1940 1 "Barbara" "Wright"    1 1940 "Ian"   "Chesterton" 1 1940
3 "Ben "    "Jackson" 32 "Every"       1935 1 "Ben"     "Jackson"   1 1935 "Polly" "Jackson"    . 1940
4 "Vicky"   "Water"    . "London"      1850 2 "Jamie"   "McCrimmon" 1 1730 "Vicky" "Water"      1 1850
end
However, I’ve been told that the data will be cleaner if we take out the panel member from the household data!


Code:
 forvalues j = 1/2 {
replace fore_14_fm_`j'="" if panel_14_fm==`j'
replace sur_14_fm_`j'="" if panel_14_fm==`j'
replace race_14_fm_`j'=. if panel_14_fm==`j'
replace start_yr_14_fm_`j'=. if panel_14_fm==`j' 
*…
}
This leaves gaps for household members where the panelist was. I’ve been looking at ways of moving the remaining household members to the left to remove the gaps. Reshape might be an option, but by this stage we have data for five time periods. Another complication in moving the data across is that although every household member has a surname, title and gender, etc, are optional fields.


dataex id forename surname house street year panel_14_fm fore_14_fm_1 sur_14_fm_1 race_14_fm_1 start_yr_14_fm_1 fore_14_fm_2 sur_14_fm_2 race_14_fm_2 star
> t_yr_14_fm_2

copy starting from the next line -----------------------
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str7(forename surname) byte house str11 street int year byte panel_14_fm str7 fore_14_fm_1 str9 sur_14_fm_1 byte race_14_fm_1 int start_yr_14_fm_1    str5    fore_14_fm_2    str10    sur_14_fm_2    byte    race_14_fm_2    int    start_yr_14_fm_2
1 ""        "Who"      . "Tardis"      1776 2 "Tegan" "Jokanovic" 1 1960 ""      ""           .    .
2 "Barbara" "Wright"  17 "Coroonation" 1940 1 ""      ""          .    . "Ian"   "Chesterton" 1 1940
3 "Ben "    "Jackson" 32 "Every"       1935 1 ""      ""          .    . "Polly" "Jackson"    . 1940
4 "Vicky"   "Water"    . "London"      1850 2 "Jamie" "McCrimmon" 1 1730 ""      ""           .    .
end
I've tried the below code, but it isn't working. I wondered if, short of reshaping the data, there are solutions for ensuring there are no gaps. Alternatively we could live with the gaps or simply write programs that work around panel members also being household members. There are up to 12 housemates (e.g. religious communities) so I'd like to try a loop or automated command.
Code:
forvalues j = 1/2 {
replace fore_14_fm_`j'=fore_14_fm_`j+1' if sur_14_fm_`j'==""
}
Suggestions are welcome,
​​​​​​​Matthew Gibbons