Hello,
I have a clarificatory question about the bysort command.
Suppose that in my data file HHID is the ID given to each household (HH), but it is not unique across the whole data set.
I want to find out how many households have daughters.
The command I used was:

Code:
bys HHID: egen HHCH= max(relation==11)
where relation is the variable that records each member's relationship to the household head (relation == 11 identifies a daughter).

What is the difference between using this first command and the following command?

Code:
bys STATEID DISTID PSUID HHID: egen HHDaughter= max(relation==11)
I have a unique identifier IDHH for the household.
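To make the question concrete, here is a minimal sketch with hypothetical toy data (not from my file). It assumes HHID restarts within each PSU, so two different households, one in PSU 1 and one in PSU 2, both carry HHID 1. If my understanding is right, bys HHID pools the members of both households into one group, while the full key keeps them apart:

```stata
* Hypothetical toy data: HHID 1 is reused in PSU 1 and PSU 2,
* but these are two different households
clear
input int(STATEID DISTID PSUID HHID) byte relation
1 2 1 1  1
1 2 1 1  2
1 2 2 1  1
1 2 2 1 11
end

* Grouping on HHID alone: all four rows form ONE group, so the
* daughter in PSU 2 also flags the household in PSU 1
bysort HHID: egen byte d_hhid = max(relation == 11)

* Grouping on the full key: each household is its own group
bysort STATEID DISTID PSUID HHID: egen byte d_full = max(relation == 11)

list STATEID DISTID PSUID HHID relation d_hhid d_full, sepby(PSUID)
```

In this toy case d_hhid is 1 on every row, whereas d_full is 1 only for the PSU 2 household.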

Here is an extract of my data:

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(STATEID DISTID PSUID HHSPLITID HHID) double IDHH
1 2 1 0  1  10201010
1 2 1 0  1  10201010
1 2 1 0  1  10201010
1 2 1 0  1  10201010
1 2 1 0  1  10201010
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  2  10201020
1 2 1 0  3  10201030
end
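The digits of IDHH in this extract appear to concatenate STATEID, DISTID, PSUID, HHID and HHSPLITID (for example, 10201020 reads as 1|02|01|02|0). That is only my reading of the extract, not something I have found documented. Assuming it holds, the sketch below checks that the five ID variables and IDHH identify the same households, using a few rows copied from the extract. I included HHSPLITID because it seems to be part of IDHH; any two households that differ only in HHSPLITID would end up in the same by-group if it were left out of the by-list.

```stata
* Rows copied from the dataex extract above
clear
input int(STATEID DISTID PSUID HHSPLITID HHID) double IDHH
1 2 1 0 1 10201010
1 2 1 0 1 10201010
1 2 1 0 2 10201020
1 2 1 0 2 10201020
1 2 1 0 3 10201030
end

* Each combination of the component IDs should map to a single IDHH
bysort STATEID DISTID PSUID HHID HHSPLITID (IDHH): assert IDHH[1] == IDHH[_N]

* ...and there should be as many distinct combinations as distinct IDHH values
egen byte tag_combo = tag(STATEID DISTID PSUID HHID HHSPLITID)
egen byte tag_idhh  = tag(IDHH)
count if tag_combo
local n_combo = r(N)
count if tag_idhh
assert r(N) == `n_combo'
```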
I understand that I can use:
Code:
bys IDHH: egen HHdaughter = max(relation==11)
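Since what I ultimately want is the number of households with daughters (not the number of members in such households), here is a sketch of how I think the count could be done once the household flag exists. It uses hypothetical toy data and egen's tag() function so that each household is counted only once:

```stata
* Hypothetical toy data: relation == 11 marks a daughter
clear
input double IDHH byte relation
10201010  1
10201010 11
10201020  1
10201020  2
10201030  1
10201030 11
10201030 11
end

* Household-level flag: 1 if any member of the household is a daughter
bysort IDHH: egen byte HHdaughter = max(relation == 11)

* Tag one observation per household so each household is counted once
egen byte hhtag = tag(IDHH)
count if hhtag & HHdaughter
```

Here two of the three toy households contain a daughter; counting without the tag would count members instead (three rows in this example).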

What I need clarified is the intuition behind using bys STATEID DISTID PSUID HHID.
How does the sorting work here compared with bys HHID?
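My current understanding of the mechanics, written as a sketch with hypothetical toy data: bysort sorts on all listed variables (STATEID, then DISTID within STATEID, then PSUID, then HHID), and each distinct combination of their values forms one by-group. If that is right, the result should match grouping on a single household index built with egen group():

```stata
* Hypothetical toy data in deliberately unsorted order
clear
input int(STATEID DISTID PSUID HHID) byte relation
2 1 1 1  1
1 2 1 2 11
1 2 1 2  1
1 1 3 1  1
end

* Sort on the four keys; each distinct combination is one by-group,
* so _n == 1 marks the first row of each household
bysort STATEID DISTID PSUID HHID: gen byte hh_first = (_n == 1)

* The same grouping made explicit as one household index
egen long hh = group(STATEID DISTID PSUID HHID)
bysort hh: egen byte daughter_hh = max(relation == 11)
bysort STATEID DISTID PSUID HHID: egen byte daughter_key = max(relation == 11)
assert daughter_hh == daughter_key

list STATEID DISTID PSUID HHID hh hh_first relation, sepby(hh)
```

In this sketch the listing should show three households, ordered (1,1,3,1), (1,2,1,2), (2,1,1,1), with only the second one flagged as having a daughter.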

Thank you