Hi,

I am using the new cluster feature for LASSO in Stata, and I am a little confused by the way the standard errors (SEs) are estimated.

When I don't cluster, all is fine. If I do:

sysuse nlsw88.dta
dsregress wage grade , controls( age race married never_married collgrad south )

Stata calculates a grade effect and an SE that are exactly equal to those from an OLS regression on the LASSO-selected covariates:

reg wage grade `e(controls_sel)' , r

The LASSO and the OLS give exactly the same result.

But let's say I want to cluster by industry:

sysuse nlsw88.dta
dsregress wage grade , controls( age race married never_married collgrad south )
dsregress wage grade , controls( age race married never_married collgrad south ) cluster(industry)

Both give the same SE, which makes me think that Stata uses clustering to compute the LASSO but does not correct for clustering when estimating the effects. This is super misleading in my view. When you specify the cluster option, you expect your SEs to be clustered. Am I missing something here?
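
For reference, the check I have in mind can be sketched as follows. This is only a minimal sketch, meant to be run as a do-file: the local macro name sel and the scalar names se_ds and se_ols are illustrative, and the dsregress call is copied unchanged from above. It stores the SE on grade from the clustered dsregress call and compares it with the SE from OLS on the selected controls with cluster-robust standard errors.

```stata
sysuse nlsw88.dta, clear

* dsregress with the cluster option, exactly as in the post
dsregress wage grade , controls( age race married never_married collgrad south ) cluster(industry)

* keep the selected controls and the reported SE before e() is overwritten
* (sel and se_ds are illustrative names)
local sel `e(controls_sel)'
scalar se_ds = _se[grade]

* OLS on the selected controls with cluster-robust SEs
regress wage grade `sel' , vce(cluster industry)
scalar se_ols = _se[grade]

* show the two SEs side by side
display "dsregress SE: " se_ds "   clustered OLS SE: " se_ols
```

If the two numbers differ, that would suggest the SE reported by dsregress is not the cluster-robust one.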

Thanks