I would like to suggest an addition to the documentation for collinear covariates in LASSO models. The Summary section currently reads as follows:

Summary
Consider factor variable group that takes on the values 1, 2, and 3. If you type
. lasso linear y i.group ...
lasso will know that separate covariates for group 1, 2, and 3 are to be included among the variables
to be potentially included in the model.
If you create your own indicator variables, you need to create and specify indicators for all the
values of the factor variable:
. generate g1 = (group==1)
. generate g2 = (group==2)
. generate g3 = (group==3)
. lasso linear y g1 g2 g3 ...
It is important that you do not omit one of them, say, g1, and instead type
. lasso linear y g2 g3 ...


While tinkering around, I discovered that one must not use ib#.group in place of i.group. Doing so causes the specified base level to be omitted from the candidate covariates, and therefore gives different results. I think a warning about this should be added to the documentation. For example, something like this could be added to the Summary section:

Note as well that you must not use the ib# prefix, because that will cause the selected base level to be omitted. For example, using ib1.group is equivalent to including g2 and g3 but not g1.
I'm sure the folks who write the documentation can improve on the wording, but I hope this gets the idea across.

For anyone who is interested, the code for my "tinkering" is pasted below.

Cheers,
Bruce


Code:
// File:  LASSO_collinear_covariates.do
// Date:  25-Oct-2022
// Name:  Bruce Weaver, bweaver@lakeheadu.ca

// Suggestion:  Caution users of LASSO that factor variables will not
// be handled as described in the documentation if one uses ib#.variable.
// Only the i.variable form of factor variable notation is handled properly.

// The relevant documentation can be seen here:
// https://www.stata.com/manuals/lassocollinearcovariates.pdf#lassoCollinearcovariates

// Use auto.dta to create an example like the one described.
clear *
sysuse auto

// Create 5 indicator variables for rep78
forvalues i = 1(1)5 {
    generate byte rep`i' = rep78 == `i' if !missing(rep78)
}
summarize rep1-rep5
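
// Optional sanity check: every observation with a nonmissing rep78
// should have exactly one of the five indicators equal to 1.
// assert stops the do-file with an error if that is not the case.
assert rep1 + rep2 + rep3 + rep4 + rep5 == 1 if !missing(rep78)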

// NOTE that you must reset the seed before estimating each model,
// because lasso selects lambda by cross-validation with randomly
// assigned folds by default.

* [1] Use factor variable notation for rep78
set seed 1234
quietly lasso linear mpg i.rep78 ///
foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

* [2] Use the 5 indicator variables for rep78
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 rep5 ///
foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)

// Q. What happens if one uses ib#.rep78 rather than i.rep78?

forvalues i = 1(1)5 {
    set seed 1234
    display "Base level for rep78 = " `i'
    quietly lasso linear mpg ib`i'.rep78 ///
        foreign headroom weight turn gear_ratio price trunk length displacement
    * Show which variables have been retained
    lassocoef, display(coef)
}

// A. Stata omits the base level when I do that.
// Let's check a couple of them to verify.  

* Factor variable notation with ib3.rep78
set seed 1234
quietly lasso linear mpg ib3.rep78 ///
foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)     
* Indicator variables with rep3 omitted
set seed 1234
quietly lasso linear mpg rep1 rep2 rep4 rep5 ///
foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)     

* Factor variable notation with ib5.rep78
set seed 1234
quietly lasso linear mpg ib5.rep78 ///
foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)     
* Indicator variables with rep5 omitted
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 ///
foreign headroom weight turn gear_ratio price trunk length displacement
* Show which variables have been retained
lassocoef, display(coef)     

// Confirmed.
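
// A more compact way to make the same comparison (a sketch only):
// store each fit with -estimates store- and pass the stored names to
// -lassocoef-, which then shows the coefficients side by side in one table.
// The names fv_all, ind_all, and fv_ib3 and the local macro xvars are
// just illustrative labels chosen for this example.
// Run this block in one go so that the local macro stays defined.

local xvars foreign headroom weight turn gear_ratio price trunk length displacement

* i.rep78: all five levels are candidate covariates
set seed 1234
quietly lasso linear mpg i.rep78 `xvars'
estimates store fv_all

* All five hand-made indicators
set seed 1234
quietly lasso linear mpg rep1 rep2 rep3 rep4 rep5 `xvars'
estimates store ind_all

* ib3.rep78: level 3 is left out of the candidate covariates
set seed 1234
quietly lasso linear mpg ib3.rep78 `xvars'
estimates store fv_ib3

* One table with a column per stored model
lassocoef fv_all ind_all fv_ib3, display(coef)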