Dear Statalist,

I am trying to extract the five words before and after a given keyword in a string variable (keyword-in-context analysis). A keyword can occur several times in a string, and the context of each occurrence should be written to its own new variable. The context should not cross a sentence boundary: it should stop at the nearest period, exclamation mark, or question mark. Note that the text may contain double spaces and line breaks.

In principle, this should be possible with regexm() and regexs(), but I'm stuck. Do you have any ideas?

Here is an example where the keyword of interest is "we" (including variants such as "we're", "we've", and "WE") and the text is in the variable "string". The variables of interest are we1, we2, we3 and we_freq. In the example, an apostrophe starts a new word (so "it's" counts as two words), but that's not really important.

Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str121 string str45 we1 str42 we2 str27 we3 byte we_freq
1 "Overall, we're confident that it's a great example. But what we really value is your feedback."                            "Overall, we're confident that it's"            "But what we really value is your feedback." ""                            2
2 "Now, there are  also some things we are afraid of. Here's a list of what We  think is scary.  Wait until we've shown you!" "there are  also some things we are afraid of." "Here's a list of what We  think is  scary." "Wait until we've shown you!" 3
3 "This is just a random text. Thanks, WE love it ;-)."                                                                       "Thanks, WE love it ;-)."                       ""                                           ""                            1
end
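
For what it's worth, the frequency part on its own can probably be sketched as below. This is only a rough sketch, not a solution to the context extraction: it assumes Stata 14 or newer (for the Unicode regex functions ustrregexra() and ustrlen()), it is meant to be run after the example data above, and the variable names clean and we_count are made up for illustration.

```stata
* Sketch only: run after the -dataex- example above (assumes Stata 14+).
* "clean" and "we_count" are hypothetical names, not part of the example data.

* Collapse runs of whitespace (double spaces, line breaks) into one space.
generate clean = ustrregexra(string, "\s+", " ")

* Count whole-word, case-insensitive matches of "we": delete every match
* (4th argument 1 = ignore case) and divide the lost length by 2,
* the length of the keyword.
generate byte we_count = (ustrlen(clean) - ustrlen(ustrregexra(clean, "\bwe\b", "", 1))) / 2

* Compare with the hand-coded frequency from the example.
list id we_count we_freq
assert we_count == we_freq
```

Because the apostrophe is not a word character, \bwe\b also matches the "we" in "we're" and "we've". What I cannot work out is how to get from this count to the contexts we1, we2, we3.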

With thanks and regards,

Marvin