I have survival data with a single line per observation (4 million lines). Here is a simplified example:
default | zip code | date_start | date_end | date_default |
1       | 12345    | 2000q2     | 2016q1   | 2005q3       |
0       | 54321    | 1993q4     | 2016q1   |              |
1       | 13467    | 2003q1     | 2016q1   | 2010q1       |

zip code | date   | unemployment | default rate |
11111    | 1990q1 | 4.2          | x            |
11111    | 1990q2 | 4.1          | x            |
11111    | 1990q3 | 4.6          | x            |
One guess was to create a new variable that uniquely identifies zip code/quarter combinations, and then to run statsby over it. But that would imply ~12,000 groups (100 zip codes * 30 years * 4 quarters), and that just doesn't seem right/efficient.
It shouldn't be hard for me to find a way to count the defaults per quarter/department (although I can't do tab default department zipcode, as this is too many variables :/), but I must confess I have no idea where to start on counting the at-risk loans per quarter and organizing them in a new panel (without Excel).
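To make the target concrete, here is a rough, untested sketch of one direction this could take: expand each loan to one row per quarter at risk, then collapse by zip code and quarter. The variable names (zipcode, loan_id, start_q, exit_q and so on) are my own placeholders, and I am assuming the dates are stored as strings like "2000q2" and that a loan still counts as at risk in the quarter in which it defaults.

```stata
* Sketch only: variable names are placeholders, dates assumed to be strings like "2000q2"
gen start_q = quarterly(date_start,   "YQ")
gen end_q   = quarterly(date_end,     "YQ")
gen def_q   = quarterly(date_default, "YQ")   // missing when there is no default
format start_q end_q def_q %tq

* A loan leaves the risk set at default, or at the end of observation otherwise
gen exit_q = cond(default == 1, def_q, end_q)

* One row per loan per quarter at risk (assumption: default quarter counts as at risk)
gen long loan_id = _n
gen n_q = exit_q - start_q + 1
expand n_q
bysort loan_id: gen quarter = start_q + _n - 1
format quarter %tq

* Flag the quarter of default, then count per zip code/quarter
gen byte defaulted = (default == 1 & quarter == def_q)
gen byte at_risk   = 1
collapse (sum) at_risk defaulted, by(zipcode quarter)
gen default_rate = defaulted / at_risk
```

The obvious worry is that expand creates one row per loan-quarter, which could get very large with 4 million loans, so this may need to be run zip code by zip code or replaced by something smarter.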
Thank you so much for even a rough idea of how to go about this in Stata.
Have a great day,
John