standard error double lasso with clustering

Hi,

I am using the new cluster feature for lasso in Stata, and I am a little confused by the way the standard errors (SEs) are estimated.

When I don't cluster, all is fine. If I do:

sysuse nlsw88.dta
dsregress wage grade , controls( age race married never_married collgrad south )

Stata calculates a grade effect and an SE that are exactly equivalent to those from an OLS regression on the lasso-selected covariates:

reg wage grade `e(controls_sel)' , r

The lasso and OLS give exactly the same result.

But let's say I want to cluster by industry:

sysuse nlsw88.dta
dsregress wage grade , controls( age race married never_married collgrad south )
dsregress wage grade , controls( age race married never_married collgrad south ) cluster(industry)

Both commands give the same SE, which makes me think that Stata uses the clustering when fitting the lasso but does not correct for clustering when estimating the standard error of the effect. This is very misleading in my view: when you specify a cluster option, you expect your SEs to be clustered. Am I missing something here?
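For reference, here is a minimal sketch of the comparison I would expect to be informative. It rests on an assumption on my part: that vce(cluster industry), rather than the cluster(industry) option I used above, is the documented way to request cluster-robust SEs in dsregress. I have not confirmed that this changes the result.

```stata
* Sketch only: assumes vce(cluster clustvar) is the documented
* clustering syntax for dsregress (not verified in this post).
sysuse nlsw88, clear

* Double-selection lasso, requesting cluster-robust SEs by industry
dsregress wage grade, ///
    controls(age race married never_married collgrad south) ///
    vce(cluster industry)

* OLS on the lasso-selected controls with clustered SEs,
* restricted to the same estimation sample for comparison
regress wage grade `e(controls_sel)' if e(sample), vce(cluster industry)
```

If the two SEs agree here but differ from the unclustered run, the issue would be the option name rather than the estimator.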

Thanks
