Hello Stata Forum,

I have a series of doctors' notes, each 10,000 to 30,000 characters long, and I am hoping to identify specific medications in these notes along with the medication instructions. My plan is to use Stata string functions to search each note for a specific keyword (for example, lisinopril) and then extract a set number of characters immediately before and after it (to start, 100 characters before and 100 characters after).

Two examples:
1. Out of a 20,000 character note a fragment might say "Patient on lisinopril 10 MG daily".
2. "Patient instructed to stop taking lisinopril due to side effects"

My goal is not to extract exact sentences, but simply enough characters around the keyword to give a text fragment with context, so that I do not have to read a full 20,000-character note.
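
To make the goal concrete, here is a minimal sketch of the kind of extraction I have in mind. It assumes the notes are held in a strL variable, here hypothetically called note, that only the first occurrence of the keyword matters, and that the text is plain ASCII. The variable names pos, start, and context and the locals key and width are made up for illustration.

```stata
* Toy data standing in for the real notes (the variable name "note" is an assumption)
clear
set obs 2
generate strL note = ""
replace note = "Patient on lisinopril 10 MG daily" in 1
replace note = "Patient instructed to stop taking lisinopril due to side effects" in 2

* Keyword (lowercase) and number of characters wanted on each side
local key "lisinopril"
local width 100

* Position of the first occurrence of the keyword (0 if absent);
* lower() makes the search case-insensitive
generate long pos = strpos(lower(note), "`key'")

* Start of the window, clamped so it never falls before character 1
generate long start = max(1, pos - `width') if pos > 0

* Window = text from start up to the keyword, the keyword itself,
* and `width' characters after it; substr() stops at the end of the string
generate strL context = substr(note, start, pos - start + strlen("`key'") + `width') if pos > 0

list pos context
```

Both toy notes are shorter than the window, so context simply equals the whole note here. strpos() and substr() count bytes, so for notes with non-ASCII characters the Unicode versions ustrpos() and usubstr() (Stata 14 or later) would presumably be the safer choice.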

I read the following posts, which touch on a similar issue and provide guidance on handling misspellings and on using regexm to search for words beforehand, but neither quite explains how to do character-based extraction. I would appreciate suggestions on how to extract a specified number of characters, both before and after the key term, into a new variable.

https://www.statalist.org/forums/forum/general-stata-discussion/general/1328155-string-var-extract-sentence-based-on-a-single-word
https://stats.idre.ucla.edu/stata/faq/how-can-i-extract-a-portion-of-a-string-variable-using-regular-expressions/
Best,
Tim