Dear all,

I'm trying to answer a question here: https://stackoverflow.com/questions/...rvations-stata

The question asks how to tag observations in firm/year combinations that contain more than one distinct ID.

I gave it a try:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(Id Firm_id) int Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
end

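* True when Id differs from the previous observation's Id,
* Id is not missing, and this is not the first observation
local condition "Id != Id[_n-1] & !missing(Id) & _n > 1"

* Approach 1: generate the condition within each Firm_id/Year group
* (sorted by Id), then take its maximum over the group
bysort Firm_id Year (Id): gen cond = `condition'
bysort Firm_id Year: egen tokeep = max(cond)

* Approach 2: pass the same condition directly to egen's max()
bysort Firm_id Year (Id): egen tokeep2 = max(`condition')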
list

     +-----------------------------------------------+
     | Id   Firm_id   Year   cond   tokeep   tokeep2 |
     |-----------------------------------------------|
  1. |  4        20   2011      0        0         0 |
  2. |  3        22   2010      0        1         1 |
  3. |  4        22   2010      1        1         1 |
  4. |  3        22   2011      0        0         1 |
  5. |  1        50   2010      0        1         1 |
     |-----------------------------------------------|
  6. |  2        50   2010      1        1         1 |
  7. |  1        50   2011      0        1         1 |
  8. |  2        50   2011      1        1         1 |
     +-----------------------------------------------+
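
tokeep gives the right result, but tokeep2 does not. For example, observation 4 (Firm_id 22, Year 2011) is the only observation in its firm/year, yet tokeep2 is 1 there. Since both variables take the group maximum of the same condition, I fail to see how they can differ. Is this a bug, or am I missing something?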
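A minimal check that may help narrow this down, assuming (not confirmed here from the egen source) that egen evaluates the expression passed to max() over the whole dataset rather than within each by-group. The sketch evaluates the same condition with generate but no by: prefix, so _n and Id[_n-1] refer to the whole dataset, and then takes the group maximum. On this data that column takes exactly the values of tokeep2, while the by-group version takes the values of tokeep. The variable names cond_nogroup, check_nogroup, cond_group, check_group and multi are made up for this illustration. The last variable uses a common idiom for the original task: after sorting on Id within each firm/year, the group has more than one distinct Id exactly when its first and last Id differ (this assumes Id is never missing).

```stata
* Re-create the example data from the post
clear
input byte(Id Firm_id) int Year
1 50 2010
1 50 2011
2 50 2010
2 50 2011
3 22 2010
3 22 2011
4 22 2010
4 20 2011
end

local condition "Id != Id[_n-1] & !missing(Id) & _n > 1"

* Put the data in the same order as the listing in the post
sort Firm_id Year Id

* (a) condition evaluated WITHOUT a by: prefix, so _n and Id[_n-1]
*     refer to the whole dataset, not to the firm/year group
gen byte cond_nogroup = `condition'
bysort Firm_id Year: egen byte check_nogroup = max(cond_nogroup)

* (b) condition evaluated group by group, as in the first approach
bysort Firm_id Year (Id): gen byte cond_group = `condition'
bysort Firm_id Year: egen byte check_group = max(cond_group)

* (c) more than one distinct Id in the group if the first and last
*     Id differ after sorting on Id within Firm_id/Year
bysort Firm_id Year (Id): gen byte multi = Id[1] != Id[_N]

list Id Firm_id Year check_nogroup check_group multi, sepby(Firm_id Year)
```

If check_nogroup lines up with tokeep2 as expected, that would point to how egen handles the expression rather than to a bug; the [D] egen manual also advises against explicit subscripting with _n and _N inside egen functions.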