Hi! I am fairly new to Stata and am looking to organize one of the variables in my dataset. It is a list of codes that start with either HS (health and safety), PC (penal code), VC (vehicle code), or BP (business and practices). They are all compiled into one list, with letters, numbers, and decimals describing the type of offense and whether it was a misdemeanor (M), felony (F), or infraction (I). This is how they read in the dataset: HS11350(a)-M, VS 11762-I, etc.

I am trying to assign each code that starts with HS its own numerical value, but since Stata can't read letters, I don't see how I can do this without renaming the actual offense codes in their raw form, which is in an Excel spreadsheet. I created a new variable titled "offense".

I am using this code, and I keep getting a type mismatch error:

replace offense = 1 if OffenseSection = "HS11350(a)-M" | OffenseSection = "HS11351-F" | OffenseSection = "" |... so on and so forth.


My question is: does anyone know a more efficient method for assigning these codes a numerical value, so that I can group all codes that begin with the same letters, like so: HS = 1, VC = 2, PC = 3, BP = 4?
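
To make the goal concrete, here is a rough, untested sketch of the grouping I am after. It assumes OffenseSection is stored as a string variable and that the two-letter prefix is always the first two characters. The names offense_group and offense_group_lbl are placeholders I made up for illustration, not anything already in my data.

```stata
* Sketch only: assumes OffenseSection is a string variable.
* offense_group and offense_group_lbl are hypothetical names.
generate byte offense_group = .

* substr(s, 1, 2) returns the first two characters of the code
replace offense_group = 1 if substr(OffenseSection, 1, 2) == "HS"
replace offense_group = 2 if substr(OffenseSection, 1, 2) == "VC"
replace offense_group = 3 if substr(OffenseSection, 1, 2) == "PC"
replace offense_group = 4 if substr(OffenseSection, 1, 2) == "BP"

* Attach readable labels to the numeric groups
label define offense_group_lbl 1 "HS" 2 "VC" 3 "PC" 4 "BP"
label values offense_group offense_group_lbl

* Check the result; unmatched prefixes show up as missing
tabulate offense_group, missing
```

With this approach, any code whose prefix is not one of the four (for example, an entry typed as "VS 11762-I") would be left as missing rather than assigned to a group.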

Any help is so appreciated, because I am stumped and tired. Thanks!