I have a string variable
Code:
People
that describes the number and type of people on board each boat. I want to split this variable into three variables:
Code:
N_all
for the total number of passengers,
Code:
N_crew
for the number of crew and
Code:
N_children
for the number of children. The text is inconsistent in that it omits crew or children when there are none, e.g.:
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str29 People
"49 (2 crew, 17 children) "
"50 (1 crew, 8 children) " 
"40 (2 crew, 4 children) " 
"47 (2 crew, 13 children) "
"27 (2 crew, 4 children) " 
"58 (2, crew, 2 children) "
"38 (2 crew, 3 children) " 
"28 (2 crew, 2 children) " 
"20 (2 crew) "             
"3 (1 crew) "              
"3 (2 crew) "              
"41 (1 crew, 9 children) " 
"10 (3 crew) "             
"37 (6 children) "         
"3 (2 crew) "              
"4 "                       
end
I created N_all via:
Code:
gen         N_all = regexs(0) if regexm(People, "^[0-9]+")
But I have not been able to extract the number of crew or children using regular expressions. For example,
Code:
gen         N_crew = regexs(0) if regexm(People, "(\d+)[^\d]+?(?=crew)")
gives the error "regexp: nested *?+". What am I doing wrong?
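A possible direction, offered as a sketch rather than a confirmed fix: the error message suggests that regexm() reads the lazy quantifier +? as two stacked repetition operators, and the older regexm() engine probably does not handle \d or the lookahead (?=crew) either. Assuming Stata 14 or later, the Unicode functions ustrregexm() and ustrregexs() accept that richer syntax, so a capture group may be enough. The patterns below are my own guess at the data layout, including the optional comma seen in "58 (2, crew, 2 children)".
Code:
* Sketch only: assumes Stata 14+ so that ustrregexm() and ustrregexs() exist
* ustrregexs(1) returns the first parenthesised group; real() converts it to a number
gen N_crew = real(ustrregexs(1)) if ustrregexm(People, "(\d+),? crew")
gen N_children = real(ustrregexs(1)) if ustrregexm(People, "(\d+) children")
* Rows that mention no crew or children are left missing here;
* replacing missing with 0 would be a separate, optional step
With this approach a row such as "20 (2 crew) " would get N_crew = 2 and a missing N_children, while "4 " would be missing on both.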