Dear reader,

For a project, we need to run a regression that controls for industry*year fixed effects. Our data cover 282 SIC codes over the period 1977-2015. We wrote the following loop:

gen industry = Ac3SIC
encode industry, gen(test)
gen year=year(date)

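* Build one dummy per year-industry cell and drop cells with no observations
forvalues a = 1977/2015 {
    forvalues b = 1/288 {
        * 1 if the observation falls in year `a' and industry code `b', else 0
        gen byte y`a'i`b' = (year == `a' & test == `b')
        summarize y`a'i`b', meanonly
        * A mean of 0 means no observation is in this cell, so the dummy is dropped
        if r(mean) == 0 {
            drop y`a'i`b'
        }
    }
}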

This produced almost 4000 dummy variables. For our regression, we would like a single categorical variable that replaces those 4000 dummies. Can anyone help us with this? Or is there another way we should approach this?
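
To make the goal concrete, here is a minimal sketch of the kind of result we are after, run on made-up toy data rather than our real file. It assumes that egen's group() function is a suitable way to number the industry-year cells and that areg can absorb them; the variable names indyear, y and x are hypothetical and only used for illustration. We are not sure this is the recommended approach.

```stata
* Toy data: 2 industries x 2 years x 2 observations (values are invented)
clear
input str3 Ac3SIC year y x
"201" 1977 1.2 0.5
"201" 1977 1.9 1.1
"201" 1978 2.4 1.4
"201" 1978 2.0 0.9
"283" 1977 3.1 2.0
"283" 1977 2.7 1.6
"283" 1978 3.8 2.2
"283" 1978 3.3 1.8
end

* Same encoding step as in our code above
encode Ac3SIC, gen(test)

* One categorical variable with a distinct value per non-empty industry-year cell
* (indyear is a hypothetical name)
egen indyear = group(test year)

* Absorb the industry-year cells instead of creating one dummy per cell
* (y and x are placeholders for our actual dependent and independent variables)
areg y x, absorb(indyear)
```

In this toy example indyear takes the values 1 to 4, one per industry-year combination that actually occurs, so empty cells never have to be created and dropped.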

Kind Regards,
Jan Blokland