Dear Statalist,

Hello! I've been a regular visitor here for months, but this is my first time posting. I'd like to ask for help with a dataset that has multiple observations per ID (what I need is listed further down). Below is a sample dataset with 18 observations across 10 distinct IDs.

id bmi cal_pat
1 41.33 CP
1 41.33 PG
2 42.29 PG
2 42.29 FC
3 40.59 CP
3 40.59 PG
3 40.59 FC
4 47.42 CP
4 47.42 PG
4 47.42 FC
5 48.13 PG
6 44.11 CP
6 44.11 PG
7 42.63 PG
8 41.33 CP
8 41.33 PG
9 73.51 FC
10 54.95 CP

id is the patient ID, bmi is BMI (kg/m^2), and cal_pat is the calcification pattern/type in the brain (CP: choroid plexus; PG: pineal gland; FC: falx cerebri).

Some patients have more than one type of calcification (#1, 2, 3, 4, 6, 8), while others have just one (#5, 7, 9, 10).

What I need is Stata code that answers the following:
1) The number of people by calcification pattern (cal_pat). To clarify, I'm not interested in how often each calcification type appears across all observations, but in how many distinct IDs have it, so something along the lines of: "Of the 10 patients, # had CP, # had PG, and # had FC." In the dataset above, 6/10 patients had CP, 8/10 had PG, and 4/10 had FC (if I counted correctly).
2) The output in #1 by BMI category (for this, I will have prepared a variable named "bmi_cat" classifying each patient as follows: underweight=1, normal=2, overweight=3, obese=4, etc., though everyone in the example here is >40 kg/m^2).
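
To make the request concrete, here is a minimal sketch of the kind of thing I imagine, not something I am sure is the best approach. It assumes that flagging one observation per (id, cal_pat) pair with egen's tag() function and then tabulating only the flagged rows gives per-patient counts. The variable names first and onepp are made up for illustration, and the bmi_cat cutoffs (18.5, 25, 30) are my assumption, standing in for the variable I will have prepared.

```stata
* Sketch only: "first" and "onepp" are hypothetical helper variables,
* and the bmi_cat cutoffs below are assumed for illustration.
clear
input id bmi str2 cal_pat
1 41.33 CP
1 41.33 PG
2 42.29 PG
2 42.29 FC
3 40.59 CP
3 40.59 PG
3 40.59 FC
4 47.42 CP
4 47.42 PG
4 47.42 FC
5 48.13 PG
6 44.11 CP
6 44.11 PG
7 42.63 PG
8 41.33 CP
8 41.33 PG
9 73.51 FC
10 54.95 CP
end

* Flag exactly one observation per distinct (id, cal_pat) pair, so a
* patient is counted at most once per calcification type.
egen byte first = tag(id cal_pat)

* 1) Number of patients with each calcification pattern
tabulate cal_pat if first

* Denominator: number of distinct patients
egen byte onepp = tag(id)
count if onepp

* 2) The same counts split by BMI category (assumed cutoffs)
generate byte bmi_cat = 1 if bmi < 18.5
replace bmi_cat = 2 if bmi >= 18.5 & bmi < 25
replace bmi_cat = 3 if bmi >= 25 & bmi < 30
replace bmi_cat = 4 if bmi >= 30 & !missing(bmi)
tabulate cal_pat bmi_cat if first
```

If this is on the right track, the one-way table on the sample data should match my hand count above (6 CP, 8 PG, 4 FC out of 10 patients), and the two-way table would have a single column because everyone here falls in category 4. Percentages of patients rather than of rows would still need the count of distinct IDs as the denominator.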

I realize this may be a simple task for most of you, but it has been bugging me for so long that I can't think of how to proceed. I've seen many questions here about datasets with multiple observations per ID, but none exactly like mine. I look forward to your responses and suggestions, and to what I'll learn in the process.

Thank you so much in advance!


Kevin