I have data in which individuals may appear multiple times, and I am trying to create an identifier in Stata that takes into account the name and date of birth (DOB) of the entry directly above it. I compare against the previous entry because the data contain routine data entry errors, and I want to override them with the rule "assign the same personid if the name or the DOB is the same as the entry above it."

In the past I have done this in Excel, with personid in column A, name in column B and DOB in column C, by (1) sorting the data on name and DOB, (2) placing a "1" in cell A2, and then (3) entering an IF(OR) formula in cell A3 and copying it down the column. The formula is: =IF(OR(B3=B2, C3=C2),A2,A2+1). This works well with a small number of rows but would take a long time with 15 million rows (units of analysis). Below is an example of how I want the personid to look:
personid | Name | Date of birth |
1 | Smith, John | 1/7/1980 |
1 | Smith, John | 1/7/1980 |
1 | Smith, Jon | 1/7/1980 |
2 | Smith, Joseph | 6/13/1947 |
3 | Smith, Josephine | 12/13/1985 |
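The Excel rule above can be sketched outside Excel to make the logic explicit. This is a minimal Python illustration, not Stata code: it assumes the records are (name, DOB) pairs, uses the hypothetical helper name `assign_person_ids`, and stores DOBs as `date` objects so that sorting is chronological rather than by text. It sorts on name and DOB, then walks the rows once, keeping the previous personid whenever the name or the DOB matches the row above and incrementing it otherwise.

```python
from datetime import date

# Hypothetical sample records mirroring the example table: (name, date_of_birth)
records = [
    ("Smith, John", date(1980, 1, 7)),
    ("Smith, John", date(1980, 1, 7)),
    ("Smith, Jon", date(1980, 1, 7)),
    ("Smith, Joseph", date(1947, 6, 13)),
    ("Smith, Josephine", date(1985, 12, 13)),
]


def assign_person_ids(rows):
    """Return (personid, name, dob) tuples, sorted on name then DOB.

    Mirrors the Excel rule =IF(OR(B3=B2, C3=C2), A2, A2+1): a row keeps the
    previous row's ID if its name OR its DOB matches the row directly above
    it; otherwise it starts a new ID. The first row always gets ID 1.
    """
    result = []
    prev = None
    current_id = 0
    for name, dob in sorted(rows):  # step (1): sort on name, then DOB
        same_as_above = prev is not None and (name == prev[0] or dob == prev[1])
        if not same_as_above:
            current_id += 1  # new person: previous ID + 1
        result.append((current_id, name, dob))
        prev = (name, dob)
    return result


for person_id, name, dob in assign_person_ids(records):
    print(person_id, name, dob.strftime("%m/%d/%Y"))
```

Because each ID is just a running count of rows flagged as "new person", the loop can be replaced by a cumulative sum of those flags, which is what makes it fast on millions of rows. In Stata, an untested equivalent of the same idea would be `sort name dob` followed by `gen personid = sum(!(name == name[_n-1] | dob == dob[_n-1]))`, where `sum()` is Stata's running-sum function. Note that the OR rule chains through adjacent rows, so two different people who share a DOB and sort next to each other would receive the same personid.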
Thank you for your assistance!
Jamie