Hello Statalisters,
I'm trying to summarize some statistics by household (NUM_HOG). Specifically, I would like variables counting how many children in each household are aged 0 (age_0), 1 (age_1), and so on through 5 years (age_5), where age is recorded in PPA03. I would then like a simple binary indicator for whether the household has at least one child aged 5 or under.

I wrote the following code to create the age_* variables and the under5 variable:

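* age_`i' is filled in only on rows where PPA03 equals `i' (missing elsewhere)
forval i = 0/5 {
    bysort NUM_HOG: egen age_`i' = count(PPA03) if PPA03==`i'
}

* under5 = 1 on rows where any age_* variable is nonmissing
gen under5 = 0
replace under5 = 1 if age_0 !=. | age_1 !=. | age_2 !=. | age_3 !=. | age_4 !=. | age_5 !=.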

This resulted in the following dataset:
input double(NUM_HOG PPA02 PPA03) float(ame hh_ame age_0 age_1 age_2 age_3 age_4 age_5 under5)
13680 1 71 .76 4.02 . . . . . . 0
13680 2 65 .65 4.02 . . . . . . 0
13680 1 28   1 4.02 . . . . . . 0
13680 1 16 .96 4.02 . . . . . . 0
13680 2 13 .65 4.02 . . . . . . 0
13681 1 42 .95 3.06 . . . . . . 0
13681 2 25 .74 3.06 . . . . . . 0
13681 1  7 .56 3.06 . . . . . . 0
13681 2  4 .44 3.06 . . . . 1 . 1
13681 2  2 .37 3.06 . . 1 . . . 1
13682 1 34 .95 2.98 . . . . . . 0
13682 2 24 .74 2.98 . . . . . . 0
13682 1  3 .37 2.98 . . . 1 . . 1
13682 2  1 .27 2.98 . 1 . . . . 1
13682 2 69 .65 2.98 . . . . . . 0
end
As you can see, within NUM_HOG==13682 the under5 variable is 0 for some members and 1 for others, depending on each individual's own age. Likewise, the age_* variables are mostly missing: for example, age_1 is missing for every member of household 13682 except the 1-year-old, even though the household does have a 1-year-old.
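For reference, here is a sketch of the household-level version I have in mind, tested only on the NUM_HOG and PPA03 columns of the excerpt above. It assumes PPA03 is age in completed years. The idea is that egen's total() and max() functions, run by NUM_HOG, write the same household value on every member's row, the way hh_ame already is. Is this the right direction?

```stata
* Sketch only: assumes PPA03 is age in completed years
clear
input double(NUM_HOG PPA03)
13680 71
13680 65
13680 28
13680 16
13680 13
13681 42
13681 25
13681  7
13681  4
13681  2
13682 34
13682 24
13682  3
13682  1
13682 69
end

forvalues i = 0/5 {
    * (PPA03 == `i') is 1 for a member of that age and 0 otherwise;
    * total() sums it within the household and repeats it on every row
    bysort NUM_HOG: egen age_`i' = total(PPA03 == `i')
}

* 1 if any member of the household is aged 0 to 5, else 0 (same on every row)
bysort NUM_HOG: egen under5 = max(inrange(PPA03, 0, 5))
```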

Question: I would like one observation per household with these summary statistics so that I can merge it with another portion of this national survey. As the data currently stand, if I collapse by under5 using mean or count, I will not accurately capture the number of children in each household. As I see it, I need to adjust or add to my code so that the age_* and under5 variables each take a single value per household (NUM_HOG), the way hh_ame currently does.
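If collapsing is the way to go, I imagine something like the following, where 0/1 indicators for each age are summed within each household. Again this is only a sketch: it assumes PPA03 is age in completed years and that hh_ame is constant within NUM_HOG (as it is in the excerpt), and n_under5 is a name I made up for the total number of children aged 0 to 5.

```stata
* Sketch only: assumes PPA03 is age in years and hh_ame is constant by NUM_HOG
clear
input double(NUM_HOG PPA03) float hh_ame
13680 71 4.02
13680 65 4.02
13680 28 4.02
13680 16 4.02
13680 13 4.02
13681 42 3.06
13681 25 3.06
13681  7 3.06
13681  4 3.06
13681  2 3.06
13682 34 2.98
13682 24 2.98
13682  3 2.98
13682  1 2.98
13682 69 2.98
end

* One 0/1 indicator per age, at the person level
forvalues i = 0/5 {
    gen byte age_`i' = (PPA03 == `i')
}

* Sum the indicators within each household; hh_ame is identical for every
* member, so its household mean is just that value
collapse (sum) age_0-age_5 (mean) hh_ame, by(NUM_HOG)

* Total children aged 0 to 5, and a 0/1 flag for having at least one
egen n_under5 = rowtotal(age_0-age_5)
gen byte under5 = n_under5 > 0
```

The result has one row per NUM_HOG, so it could then be merged on NUM_HOG (1:1 or m:1, depending on whether the other file is household- or person-level).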

I reviewed this post, which was helpful, but couldn't quite figure out how to apply it to my question. I hope this is enough information to answer it, but please let me know if anything needs clarifying.

Thank you in advance!