I am conducting a cross-sectional study of UK FTSE 100 firms. I want to group the companies by industry into fewer than 20 industry groups. I initially used the following code, which creates a separate group for each distinct value of the industry code (variable 'industry').

    sort industry
    by industry: gen newid = 1 if _n==1
    replace newid = sum(newid)
    replace newid = . if missing(industry)

However, many of the 2007 SIC codes can be combined into broader categories, and I would like to do this so that the analysis is meaningful. I need suggestions for generating a new industry variable in which I decide which ranges of SIC codes fall under each category. For example, codes 30000 to 39999 = 3, 40000 to 47999 = 4, etc.
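To make the request concrete, the kind of mapping I have in mind might look like the sketch below. It assumes that 'industry' is a numeric variable holding five-digit SIC 2007 codes and that the second range starts at 40000. The variable name indgroup and the cut points are placeholders for illustration, not a tested solution.

```stata
* Sketch only: map ranges of SIC 2007 codes to broader groups.
* Assumes industry is numeric; indgroup is a hypothetical new variable.
* Codes outside the listed ranges (and missing codes) are set to missing.
recode industry (30000/39999 = 3) (40000/47999 = 4) (else = .), generate(indgroup)

* Check how each original code was assigned.
tabulate industry indgroup, missing
```

Further ranges would be added as extra rules in parentheses, so the number of SIC codes falling under each category stays under my control.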