Hello,

I have a dataset like the one shown below.

I want to group the observations by the first three characters of icd_code (e.g. I01, I02, I03) and recompute the frequency and percentage for each group.

I have already worked out two examples by hand in the dataset:
Rows 1-2 are grouped into "I05-I09".
Rows 3-11 are grouped into "I10-I15".

Could anyone show an example of how to group I20-I21 and calculate the group's frequency and percentage in Stata?

Thank you in advance!

Best regards,
Z

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str5 icd_code int freq double percent str7 icdgroup int groupsumfreq double groupsumpercent
"I051"    1  .05 "I05-I09"   3  .14
"I052"    2  .09 ""          .    .
"I10"     3  .14 "I10-I15" 216 9.85
"I109"  194 8.84 ""          .    .
"I110"    3  .14 ""          .    .
"I119"    4  .18 ""          .    .
"I120"    4  .18 ""          .    .
"I129"    1  .05 ""          .    .
"I130"    1  .05 ""          .    .
"I150"    2  .09 ""          .    .
"I159"    4  .18 ""          .    .
"I200"   58 2.64 ""          .    .
"I201"    7  .32 ""          .    .
"I208"   86 3.92 ""          .    .
"I209"  125  5.7 ""          .    .
"I210"   20  .91 ""          .    .
"I211"   16  .73 ""          .    .
"I212"    6  .27 ""          .    .
"I213"    7  .32 ""          .    .
"I214"   51 2.32 ""          .    .
"I214A"   6  .27 ""          .    .
"I214B"   2  .09 ""          .    .
"I214W"   1  .05 ""          .    .
"I214X"   8  .36 ""          .    .
"I219"   36 1.64 ""          .    .
end
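
To make the request concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the -dataex- example above is already in memory and that every code starts with a single letter followed by two digits. The variable names icdnum, grp, grpfreq, grppercent and first are hypothetical. The group percentage is taken as the sum of the listed percentages, which matches the two manual examples, because the overall total behind percent is not part of the example.

```stata
* Assumes the -dataex- example above is in memory.
* icdnum, grp, grpfreq, grppercent and first are hypothetical names.

* Numeric part of the three-character category, e.g. "I214A" -> 21
generate int icdnum = real(substr(icd_code, 2, 2))

* Assign each code to a block; add further -replace- lines for other blocks
generate str7 grp = ""
replace grp = "I05-I09" if inrange(icdnum, 5, 9)
replace grp = "I10-I15" if inrange(icdnum, 10, 15)
replace grp = "I20-I21" if inrange(icdnum, 20, 21)

* Group totals, repeated on every row of the group
bysort grp: egen long grpfreq = total(freq)
bysort grp: egen double grppercent = total(percent)

* Show one line per group
egen byte first = tag(grp)
list grp grpfreq grppercent if first, noobs
```

On the example data this should reproduce the two manual groups (3 and .14, 216 and 9.85) and give 429 and 19.54 for I20-I21. Summing rounded percentages can drift slightly from a percentage computed on the raw counts.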