Hello there!

I need to drop some observations from my dataset. I have approx. 115,000 observations of different households in two different periods. Some households were surveyed twice, so in some cases I have information for the same household in both years. Other households were surveyed in only one of the two years. In addition, some households appear many times because different members of the family answered the survey. So a household can be repeated because the family has more than one member, because it was surveyed in two different years, or both. I need to make two different datasets from the original one:
  • First, I need to keep only the observations (households) that were surveyed in both years, dropping the ones which were surveyed in only one of those years.
  • Second, for the units (households) which were surveyed in both years, I need to drop only the observations for the first year and keep the ones for the second year. For example, household A appears 4 times, because it has two members who answered the survey and because it was surveyed twice: in 2014 and in 2015. I need to keep this household only when year is equal to 2015, and drop it when year is equal to 2014.
My "problem" is that, as I have around 115 thousand observations, I need to do this for all of them in both cases, and I don't know how to write "generic" code. I have several variables which could help to write the code:
  1. Household ID
  2. Year
  3. Household member number
Essentially, I don't know how to tell Stata "if the household ID appears in two different years, keep it; if not, drop it" (first case) and "when the household appears in two different years, keep only the more recent year (in our example, 2015) and drop the older year" (second case).
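
To make the two rules concrete, here is a rough sketch of the kind of code I have in mind. I have not confirmed it on my data. The variable names hhid and year, and the output file names, are placeholders I made up for illustration; the real names in my dataset are different. It also assumes year is numeric and never missing.

```stata
* Assumed (hypothetical) variable names: hhid = household ID, year = survey year

* Flag one row per household-year, then count the distinct years per household
bysort hhid year: gen byte first_in_year = (_n == 1)
bysort hhid: egen n_years = total(first_in_year)

* Case 1: keep only households surveyed in both years (all members, both years)
preserve
keep if n_years == 2
save "both_years.dta", replace      // hypothetical file name
restore

* Case 2: for households surveyed in both years, drop the earlier year;
* households surveyed in only one year are left untouched
bysort hhid: egen last_year = max(year)
drop if n_years == 2 & year < last_year
save "latest_year_only.dta", replace   // hypothetical file name
```

The idea is that n_years counts years rather than rows, so a household with several members in a single year would still count as surveyed once.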

I hope I was clear.

Thank you in advance.