Hi all,

I have a question about how to approach my data problem. I start with a file at the parent-child level. In Image 1, "hhidpn" uniquely identifies respondents while "kidid" identifies children. The file is wide, such that "k5age" is the child's age at wave 5 and "k6age" is the child's age at wave 6. I start by converting this to a long-format file in which each parent-child pair has 8 observations (since I am only interested in waves 5-12). This is shown in Image 2.
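For concreteness, the wide-to-long step could be sketched roughly as below. This is only an illustration under assumptions: it assumes "hhidpn" and "kidid" together uniquely identify rows in the wide file, and it uses only the age variables named above. Any other wave-specific variables would need to be listed in the same way.

```stata
* Sketch only, not the exact code used.
* Assumes variables k5age, k6age, ... exist and that hhidpn + kidid
* uniquely identify observations in the wide file.
* The @ marks where the wave number sits inside the variable name,
* so the long variable is named "kage".
reshape long k@age, i(hhidpn kidid) j(wave)

* Keep only the waves of interest (5 through 12)
keep if inrange(wave, 5, 12)
```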

For the final step, my goal is to have a respondent-level file instead of a respondent-kid file. In Image 3, you can see where I currently stand. I have parent-child rows, but I want these to be merged into one row per respondent and wave. In the image, I have circled the cells that I would like to match up. In this case, kidid would become irrelevant, and keduc1 would be the education of child 1, keduc2 the education of child 2, etc. I've tried using "collapse (firstnm)" by "hhidpn" and "wave". This works, but the missing codes are important. For example, when collapsing, ".p" and ".n" simply become ".". I could convert these to numbers, collapse, and then reconvert, but I have many different extended missing codes and they mean different things for different variables. Alternatively, is there any way to make Stata not read ".p", ".m", etc. as missing? In an ideal world, if only "." counted as missing, I would have no issue.
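To make the problem concrete, here is a rough sketch of the collapse attempt, followed by one possible alternative that moves values with reshape wide instead of summarizing them. Both parts rest on assumptions: the long file is taken to have a single "keduc" variable per parent-child-wave row, and "childnum" is a hypothetical within-respondent child counter introduced only for this illustration.

```stata
* What was tried (shown as a comment): firstnm takes the first nonmissing
* value and treats .p, .n, etc. as missing, so a group holding only
* extended missing codes ends up as "."
* collapse (firstnm) keduc, by(hhidpn wave)

* Possible alternative, sketch only.
* Number the children within each respondent (childnum is hypothetical).
* Sorting by kidid within hhidpn gives the same child the same number
* in every wave.
bysort hhidpn (kidid): gen childnum = sum(kidid != kidid[_n-1])
drop kidid

* reshape wide copies values rather than summarizing them, so extended
* missing codes such as .p and .n are carried over unchanged.
* Any other child-level variables would need to be listed next to keduc.
reshape wide keduc, i(hhidpn wave) j(childnum)
```

With this approach, a respondent with fewer children than the maximum would get plain "." in the unused keducN columns, which stays distinguishable from the extended codes.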

Does anyone have any suggestions? I apologize if this is convoluted.


As a final note, the missing patterns are random throughout the data. Some respondents have almost every wave, some have none.


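[Image 1: the original wide parent-child file]

[Image 2: the long-format file, one row per parent-child pair per wave]

[Image 3: the current parent-child rows, with the cells to be matched up circled]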