Hello, I am analyzing a hospital dataset which provides demographic, diagnostic, and procedural data for each discharged patient.

The patients are divided into two groups based on whether they have a specific diagnosis (yes they had this diagnosis, no they did not).

There are 15 procedure code variables for each patient (but not all patients had 15 procedures). These are labeled cpt1, cpt2, cpt3, etc. I would like to create a table with the top 10 cpt codes (out of all cpt codes) for each group, but do not know how to look at each variable simultaneously. For example, I don't want to just know what is the most common cpt1 code, but rather the most common code out of all CPT variables.

I did try to reshape the data to long form using the code below, but the counts were extremely low per each cpt code and I am not sure it was accurate. Is there another way to approach this?


reshape long cpt, i(patientid) j(proced_num)
tab cpt


I am using Stata/SE 14.2.