Hi Statalisters,

I would just like to share a solution to the inlist limit of 10 string arguments that I have not seen anywhere else on Statalist.

There are many topics complaining about the 10-argument limit of inlist, for example here, here, here, here and here. Different solutions have been proposed, and there is also inlist2 from SSC. The latter, however, creates a dummy variable and does not allow commas in the strings.

My solution builds on Andrew Musau's and William Lisowski's suggestions in the topics above to use regexm, which works nicely in most cases but has a limit of its own. ustrregexm, however, appears to be limited only by the maximum length of a local macro.

The following program converts a list to a regular expression:
Code:
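program list_to_regex
    // Turn a space-separated list (quoted elements allowed) into an
    // anchored alternation of the form ^(a|b|c)$
    args list
    mata: st_local("regex", "^(" + invtokens(tokens(st_local("list")), "|") + ")$")
    // Return the pattern to the caller as the local macro regex
    c_local regex `"`regex'"'
end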
An example of usage is:
Code:
list_to_regex "US BE JP"
keep if ustrregexm(country, "`regex'")
It also works for arguments with spaces:
Code:
sysuse auto, clear
list_to_regex `"`"Audi 5000"' `"Audi Fox"'"'
keep if ustrregexm(make, "`regex'")
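As a quick sanity check (a sketch, not part of the approach itself), the pattern built by list_to_regex should select exactly the same observations as inlist for a list short enough for inlist. The variable names by_inlist and by_regex are made up for illustration, and list_to_regex is assumed to be defined as above:

```stata
* Compare inlist() with the regex built by list_to_regex on the same list
sysuse auto, clear
gen byte by_inlist = inlist(make, "Audi 5000", "Audi Fox", "BMW 320i")
list_to_regex `"`"Audi 5000"' `"Audi Fox"' `"BMW 320i"'"'
gen byte by_regex = ustrregexm(make, "`regex'")
* Stops with an error if the two approaches disagree on any observation
assert by_inlist == by_regex
count if by_regex
```

The anchors ^ and $ are what make this equivalent to inlist: without them, "Audi Fox" would also match any make that merely contains that text.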
The following example shows that it scales to at least 17576 (26^3) elements, which should be more than enough for any inlist-style application:
Code:
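clear
set obs `=26^3 + 1'
gen foo = ""
local i 0
* Start from an empty list in case the local already exists
local list
* Fill the first 26^3 rows with every three-letter code aaa ... zzz
* and collect the same codes in the local macro list
foreach a in `c(alpha)' {
    foreach b in `c(alpha)' {
        foreach c in `c(alpha)' {
            local ++i
            quietly replace foo = "`a'`b'`c'" in `i'
            local list `list' `a'`b'`c'
        }
    }
}

* The extra last row gets a value that is not in the list
replace foo = "abcd" if missing(foo)
list_to_regex "`list'"
drop if ustrregexm(foo, "`regex'")
* Only the "abcd" row should remain
list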
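One caveat: list_to_regex pastes the elements into the pattern unescaped, so characters with a special meaning in regular expressions (., +, (, ) and so on) are not matched literally. The sketch below shows the problem with a dot and one possible workaround, assuming ICU's \Q...\E literal quoting (ustrregexm uses ICU regular expressions) behaves as documented. list_to_regex_lit is a hypothetical name, and an element that itself contains \E would still break it:

```stata
* Elements are inserted unescaped, so "." matches any character
clear
input str10 code
"U.S."
"UxSx"
end
list_to_regex "U.S."
count if ustrregexm(code, "`regex'")   // both rows match

* Hypothetical variant: wrap each element in \Q...\E so it is taken literally
capture program drop list_to_regex_lit
program list_to_regex_lit
    args list
    mata: st_local("regex", "^(\Q" + invtokens(tokens(st_local("list")), "\E|\Q") + "\E)$")
    c_local regex `"`regex'"'
end
list_to_regex_lit "U.S."
count if ustrregexm(code, "`regex'")
```

If your lists only contain plain codes such as country abbreviations, the original program is sufficient.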
I hope this helps some of you get around the (slightly annoying) limit of 10 string arguments in inlist.

P.S. I'm not advocating using large lists for filtering; there is probably a better way to do that. I just like having something for which scaling is not a problem. And sometimes it is easier to use a list than to create a separate dataset and merge.