Dear all,
How can I more efficiently keep only the observations that belong to the first or the last group?
I currently use the following three lines of code, but I suspect there is a more efficient way. I tried to use 'by' but couldn't figure it out. Thank you.
Code:
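* Map the string COHORT to integer codes 1, 2, ... in alphabetical order
encode COHORT, gen(COHORT_1)
* Store the largest code, i.e. the last cohort (use min() for the first)
egen COHORT_2 = max(COHORT_1)
* Keep only the observations in that cohort
keep if COHORT_1 == COHORT_2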


Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double PERSON_ID str6 COHORT
517877 "201010"
510879 "201010"
512590 "201010"
515317 "201010"
512580 "201010"
531253 "201110"
531378 "201110"
527816 "201110"
531807 "201110"
524803 "201110"
538907 "201210"
539477 "201210"
540091 "201210"
539153 "201210"
543484 "201210"
550003 "201310"
551269 "201310"
549953 "201310"
549951 "201310"
549942 "201310"
560820 "201410"
562103 "201410"
563341 "201410"
562101 "201410"
563991 "201410"
574394 "201510"
569987 "201510"
569827 "201510"
572599 "201510"
568758 "201510"
585164 "201610"
578954 "201610"
585001 "201610"
577872 "201610"
587184 "201610"
594510 "201710"
592563 "201710"
594477 "201710"
594469 "201710"
593787 "201710"
603141 "201810"
611437 "201810"
614263 "201810"
606238 "201810"
605326 "201810"
621624 "201910"
628749 "201910"
629139 "201910"
622690 "201910"
621377 "201910"
631157 "202010"
639737 "202010"
631058 "202010"
639695 "202010"
641433 "202010"
652750 "202110"
647773 "202110"
645486 "202110"
652190 "202110"
647151 "202110"
660184 "202210"
662295 "202210"
654938 "202210"
665859 "202210"
655894 "202210"
end