Hi everyone!

I am currently working with a rather big dataset (135 million observations) and I need to run a few regressions. I think it is best to explain my question through a reproducible example. My main goal is to obtain the equivalent of the coefficients from the following regression:
Code:
clear all
sysuse auto

reghdfe price i.mpg, noabsorb vce(robust)
However, when I try this approach on the real data I run out of memory (it would need roughly 90 GB, which I do not have access to). To work around this, I thought about doing the following:
Code:
* Collect the distinct values of mpg, then create one indicator per value
levelsof mpg, local(levels)
foreach l of local levels {
    gen byte mpg_`l' = (mpg == `l')
}
reghdfe price mpg_14, absorb(mpg_15-mpg_41) vce(robust)
I would then do the same for every other level of mpg (not including mpg_12, which serves as the base category). As expected, this approach yields the same coefficients. That said, the computational burden remains significant, and I would like to optimize the code further. To that end, I tried the following:
Code:
gen int mpg_aux = .
replace mpg_aux = mpg if mpg!=12 & mpg!=14

reghdfe price mpg_14, absorb(mpg_aux) vce(robust)
This should be significantly faster, and the syntax is also much cleaner. However, although I would expect this last specification to be equivalent to the previous two (as far as I know), it omits the coefficient of interest (mpg_14) due to collinearity, which makes it useless for my purposes.
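
One check I can run on the example is whether mpg_14 still varies among the observations that have a non-missing mpg_aux. This is only a sketch, and it rests on an assumption I have not verified for the real data: that reghdfe, like most estimation commands, drops observations with a missing value in an absorbed variable.

```stata
clear all
sysuse auto

* Recreate the two variables used in the last specification
gen byte mpg_14 = (mpg == 14)
gen int mpg_aux = mpg if mpg != 12 & mpg != 14

* Observations with mpg_14 == 1 that keep a non-missing mpg_aux
count if mpg_14 == 1 & !missing(mpg_aux)

* For comparison: all observations with mpg_14 == 1
count if mpg_14 == 1
```

If the first count is zero while the second is not, then mpg_14 is constant in whatever sample remains once the missing values of mpg_aux are excluded, which might be related to the omission.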

Can any of you explain to me why collinearity is only a problem in this last specification? Is there any way to circumvent this while retaining efficiency?
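
For reference, the second approach above, repeated for every level of mpg other than the base, can be sketched as follows. This is a minimal sketch on the auto data, not the code I run on the real dataset: the macro names dummies, basevar, current and others are purely illustrative, it assumes reghdfe is installed, and it treats the lowest level of mpg (12 here) as the base category.

```stata
clear all
sysuse auto

* One indicator per level of mpg; collect the variable names in a macro
levelsof mpg, local(levels)
local dummies
foreach l of local levels {
    gen byte mpg_`l' = (mpg == `l')
    local dummies `dummies' mpg_`l'
}

* The lowest level (12 in auto.dta) is the omitted base category
gettoken base rest : levels
local basevar mpg_`base'
local dummies : list dummies - basevar

* One regression per remaining level, absorbing all the other indicators
foreach l of local rest {
    local current mpg_`l'
    local others : list dummies - current
    reghdfe price `current', absorb(`others') vce(robust)
}
```

Each pass reports a single coefficient, so on the real data the results would still have to be collected across passes.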

Best regards,
Pedro