Hi,

I am working with a survey that has information at the individual and household levels. Each household appears in the data for 5 periods. After the 5th period the household is replaced by a different one, but the same household and individual identifiers can appear again in the dataset because the survey recycles them. I want to create a variable that distinguishes households that are different but share the same identifier. To write that code I need to draw a stratified random sample, because the full dataset is too large for my computer to handle. The sample must keep every observation that shares the same household and person identifiers, across all the periods in which they appear. Which command do you recommend?

Here is an example of my data:
HH IN ENT YEAR QR
1 1 1 2001 1
1 1 2 2001 2
1 1 3 2001 3
1 1 4 2001 4
1 1 5 2002 1
1 2 1 2001 1
1 2 2 2001 2
1 2 3 2001 3
1 2 4 2001 4
1 2 5 2002 1
2 1 1 2001 2
2 1 2 2001 3
2 1 3 2001 4
2 1 4 2002 1
2 1 5 2002 2
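
To make the goal concrete, here is a rough Python sketch of the sampling step I have in mind. It is only an illustration under my own assumptions: the function name sample_households, the sampling fraction, the seed, and the toy rows are all made up, and it draws whole household identifiers at random without any stratification.

```python
import random

# Hypothetical toy rows: (HH, IN, ENT) for 4 households, 2 persons, 5 interviews each.
rows = [(hh, person, ent)
        for hh in range(1, 5)
        for person in range(1, 3)
        for ent in range(1, 6)]

def sample_households(rows, fraction, seed=0):
    """Draw household IDs at random and keep every row that carries a drawn ID."""
    hh_ids = sorted({row[0] for row in rows})        # unique household identifiers
    n_keep = max(1, round(fraction * len(hh_ids)))   # number of identifiers to draw
    rng = random.Random(seed)                        # seeded so the draw is reproducible
    chosen = set(rng.sample(hh_ids, n_keep))
    # Keeping whole identifiers keeps all persons and all periods together.
    return [row for row in rows if row[0] in chosen]

subset = sample_households(rows, fraction=0.5, seed=42)
```

Because the draw is made on the identifier, a recycled identifier brings along every household that ever used it, which is what I need in order to tell those households apart afterwards.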

Thanks,

Tessa