Dear all,

I have a dataset on pharmaceutical supplies with around 7 million observations. One of the variables describes the pharmaceutical presentation, and it takes around 1,200 distinct values. From the presentation I want to derive the total amount of substance per box.
The variable is a string like the examples below. I want to multiply the first number (the amount of substance per pill) by the number at the end of the string (the number of pills in a box).

40 MG COM REV CT BL AL PLAS TRANS X 10
40 MG COM REV CT BL AL PLAS TRANS X 20
40 MG COM REV CT BL AL PLAS TRANS X 30
5 MG COM CT 2 BL AL PLAS INC X 10
5 MG COM CT BL AL / AL X 20
5 MG COM CT BL AL / AL X 30
5 MG COM CT BL AL PLAS BCO LEIT X 20
5 MG COM CT BL AL PLAS INC X 100
5 MG COM CT BL AL PLAS INC X 20
5 MG COM CT BL AL PLAS INC X 20
5 MG COM CT BL AL PLAS INC X 30
5 MG COM CT BL AL PLAS OPC X 20
5 MG COM CT BL AL PLAS OPC X 30
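
To make the goal concrete, here is a rough sketch of the calculation in Python. It is only an illustration under my own assumptions: the function name `total_per_box` is made up, the first number is taken as the amount per pill, and the number after the final "X" is taken as the pill count. Any number in between (such as the 2 in "CT 2 BL") is ignored, and a decimal comma or point in the first number is accepted even though none of my examples has one.

```python
import re

# Hypothetical pattern: a leading number, then anything, then "X" and a final number.
_PRESENTATION = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s.*\bX\s*(\d+)\s*$")


def total_per_box(presentation):
    """Return amount per pill times pills per box, or None if the string does not fit."""
    match = _PRESENTATION.match(presentation)
    if match is None:
        return None
    # Assumption: a decimal comma, if present, means the same as a decimal point.
    amount_per_pill = float(match.group(1).replace(",", "."))
    pills_per_box = int(match.group(2))
    return amount_per_pill * pills_per_box


if __name__ == "__main__":
    for text in ("40 MG COM REV CT BL AL PLAS TRANS X 10",
                 "5 MG COM CT 2 BL AL PLAS INC X 10"):
        print(text, "->", total_per_box(text))
```

Since there are only about 1,200 distinct presentations, the calculation could be run once per distinct value and then merged back onto the 7 million observations.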

A second problem: some of the strings (fewer than 10%) end with information in parentheses that I do not need, e.g.:
20 MG COM DISP CT 2 BL AL PLAS INC X 14 (PORT 344/98 L - C1)
25 MG CAP GEL DURA CT BL AL PLAS INC X 50 (EMB HOSP)
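
For these cases, what I have in mind is something like the following Python sketch, which drops the parenthesised part before the pill count is read. The helper name `strip_trailing_parens` is hypothetical, and the sketch assumes the parentheses always come at the very end of the string and are not nested.

```python
import re

# Assumption: the unwanted text is one or more "( ... )" groups at the end of the string.
_TRAILING_PARENS = re.compile(r"(?:\s*\([^()]*\))+\s*$")


def strip_trailing_parens(presentation):
    """Remove parenthesised groups at the end; other strings are returned unchanged."""
    return _TRAILING_PARENS.sub("", presentation)


if __name__ == "__main__":
    for text in ("20 MG COM DISP CT 2 BL AL PLAS INC X 14 (PORT 344/98 L - C1)",
                 "25 MG CAP GEL DURA CT BL AL PLAS INC X 50 (EMB HOSP)"):
        print(strip_trailing_parens(text))
```

After this step the string ends with the pill count again, so the same "first number times last number" rule can be applied to it.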

Do you have any suggestions on how to solve it?

Thanks so much