I have a dataset in long format with the list of physicians (pf_doc) who have worked in a given hospitalization (n_aih) in a given hospital (cnes). More than one physician can work in one hospitalization. Physicians can work in not only different hospitalizations, but also different hospitals. The id of hospitalization is not only unique within a hospital.

I would like to compute, for each physician in each hospital, the share of hospitalizations that he worked alone, the share that he worked with other physicians only once and the share that he worked with other physicians more than once. The sum of all shares should lead to 100% of hospitalizations in which that physician was involved in that hospital.

I tried to run the command -joinby- to get the pairwise combination of every hospitalization-hospital (n_aih - cnes). However, as the original dataset has 250 million observations, -joinby- is unfeasible. I also imagine there is a more efficient way to approach the problem.

Code:
. use SP_PF_brasil, clear

. keep if d_med==1
(37,701,138 observations deleted)

. keep cnes n_aih pf_doc

. egen hosp_id = group(cnes n_aih)

. drop cnes n_aih

. egen ego_id = group(pf_doc)

. drop pf_doc

. save ego, replace /*file with ego phycisians*/
file ego.dta saved

.
. rename ego_id alter_id

. save alter, replace /*file with alter phycisians*/
file alter.dta saved

.
. use ego, clear

. joinby hosp_id using alter.dta /*combination of pairwise interactions*/
sum of expand values exceed 2,147,483,620
    The dataset may not contain more than 2,147,483,620 observations.
r(459);

end of do-file

r(459);
Many thanks
Paula