Hi,

I am trying to calculate how many PCP visits a patient have in a year. The dataset has enrolid (unique id), PCP visits (yes or no, 1 or 0), and the svcdate (date of service). Each enrolid could have multiple observations/ rows depending on how many services they used. Also, each enrolid might have multiple services on the same svcdate. Below is how the data looks. (The original dataset has 6.8M rows for 20,000 enrolid, and I am trying to generate variables so that in the end I am left with 20K rows, one for each enrol id)

input double enrolid float pcp long svcdate
350802 0 20616
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 0 20688
350802 1 20688
1315601 1 20481
1315601 1 20481
1315601 0 20488
1315601 0 20488
1315601 1 20488
1315601 1 20488
1315601 1 20488
1315601 1 20488
1315601 1 20488

format %tdD_m_Y svcdate
[/CODE]

I want to now generate 2 variables:
1) Did the patient have any pcp visit that year?
For that, I used the following command:
by enrolid, sort: egen anypcpvisit2016 = max(pcp) //If any visit with the pcp during the year

Does the above code seem reasonable?

2) Now I want to know how many pcp visits they had. All the pcp visits that they had on the same service date should be counted as one visit. I think I am stuck at this point. As you can see above in the dataset, the patient 1(350802) had 1 pcp visit, but the other patient (1315601) had 2 PCP visits on svcdate 20481 and then 5 on svcdate 20488. Thus the output I want is that the patient (1315601) had a total of 2 visits in the year but the code I wrote below gives me the answer as 7.

I wrote the command as follows:
by enrolid, sort: egen numberpcpvisit2016 = sum(pcp) //No of pcp visits during the year

May be I need to sort by enrolid AND svcdate, but don't know how to do that.

Any suggestions?
That you
Anu