I have encountered a strange glitch(?) when combining cond() with ustrregexm() and ustrregexs(). Consider the following code, where the regex token \d matches a single numeric digit:

Code:
clear
input str3 var1
"1a"
"2b"
"3c"
"abc"
end
gen var2 = cond(ustrregexm(var1,"\d"),ustrregexs(0),var1)

I expect the following output: for each observation where ustrregexm() finds a match, var2 should contain the matching digit; otherwise, var2 should contain a copy of var1.

Code:
     +-------------+
     | var1   var2 |
     |-------------|
  1. |   1a      1 |
  2. |   2b      2 |
  3. |   3c      3 |
  4. |  abc    abc |
     +-------------+

However, the actual output is different:

Code:
     +-------------+
     | var1   var2 |
     |-------------|
  1. |   1a        |
  2. |   2b      1 |
  3. |   3c      2 |
  4. |  abc    abc |
     +-------------+
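
The output for the non-matching observation is as expected. When ustrregexm() does find a match, however, that match seems to be what ustrregexs() returns in the next observation: observation 1 gets an empty string, observation 2 gets the "1" matched in observation 1, and observation 3 gets the "2" matched in observation 2.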

This is odd: presumably cond() has to evaluate the current observation's ustrregexm() first in order to decide whether the condition is true or false, so the ustrregexs() that follows should refer to the match from that same observation.

I can't put my finger on exactly why cond() and ustrregexs() behave this way. Any ideas would be appreciated.

Note: I am aware I could use ustrregexra() to achieve the same effect, but I specifically want to understand why cond() behaves this way.