Hello everyone,

I am new to Stata and currently working with Stata 15.

I urgently need help generating a new variable that contains only the ICD-10 codes related to substance use, taken from an existing variable of ICD-10 codes.
To be specific, I am working with a dataset that contains numerous ICD-10 codes as observations; however, I am only interested in the codes related to substance use (alcohol and illicit drug use).
My data look like this:

diagnosis_codep
R10.3
P28.83
O82
O82
O99.8
N97.9
Z38.0
M23.22
T39.1
Z36.8
O82
O42.0
O99.8
P83.8
P83.8
B34.9
O80
J21.9
O40
O80
Z03.79
T63.3
L51.9
P07.32
O99.8
O99.3
O10.0
O80

Please note that each column contains up to 4 million observations. The above is just to show what the data look like.

I have a list of all the codes I am interested in.
I would therefore like to extract all the ICD-10 codes related to substance use in each column (F10-F19, G31.2, K29.2, etc.) into a new variable.

I have tried this code:
[CODE]
gen new_icd10p=diagnosis_codep if strmatch(diagnosis_codep, "F1*" "G31.2*" "G62.1*" "K70*" "G72.1*" "K29.2*" "K85.2*")
[/CODE]
The above code came back with "(4,131,716 missing values generated)".

I also tried this code:
[CODE]
gen new_icd10p=diagnosis_codep if strmatch(diagnosis_codep, "F1*", "G31.2*" ,"G62.1*" ,"K70*", "G72.1*")
[/CODE]
The above code only took "F1*" into consideration and ignored the others.
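
As far as I understand, strmatch() compares a string against one pattern per call, which may be why both attempts fail. One possible approach is to call it once per code family and join the calls with | (logical or). This is a minimal sketch, not a tested solution: it assumes diagnosis_codep is a string variable, and the seven patterns are only the ones from my first attempt, not my full list of codes.

```stata
* Sketch only: run from a do-file, since /// continues a line there.
* Assumes diagnosis_codep is a string variable and that the patterns
* below stand in for the full list of substance-use codes.
gen new_icd10p = diagnosis_codep if             ///
      strmatch(diagnosis_codep, "F1*")          ///
    | strmatch(diagnosis_codep, "G31.2*")       ///
    | strmatch(diagnosis_codep, "G62.1*")       ///
    | strmatch(diagnosis_codep, "K70*")         ///
    | strmatch(diagnosis_codep, "G72.1*")       ///
    | strmatch(diagnosis_codep, "K29.2*")       ///
    | strmatch(diagnosis_codep, "K85.2*")

* Rows that match none of the patterns are left as empty strings,
* so this counts how many observations were actually picked up.
count if new_icd10p != ""
```

With this form, a "missing values generated" message would be expected, because every observation that is not a substance-use code is left empty in the new variable.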

Could someone please guide me on how to go about this?