Hello Statalist,

I have a string variable that is interspersed with HTML tags (e.g. "<br>" or "</span>"). I want to remove all of these tags, which are delimited by angle brackets.

To complicate matters:
1) There is a large variety of these tags, so I cannot simply run "subinstr()" on a fixed list of them; I need something that identifies them automatically by their angle brackets.
2) There can be more than one of these tags per observation.
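A minimal sketch of one way to meet both requirements, assuming Stata 14 or newer (needed for `ustrregexra()`, `strtrim()` and `stritrim()`). The pattern `<[^>]*>` matches "<", then any run of characters that are not ">", then ">", so each match ends at the first closing bracket and can never swallow the text between two tags. It also assumes the text never contains a literal ">" inside a tag's attribute values or a bare "<" used as a less-than sign. The `clear` and the one-observation example are only for demonstration.

```stata
* Demo data: one observation holding the example string from this post
clear
set obs 1
gen textwithtags = `"<br> Does NAME have an <span style="color:red"> AGREEMENT or CONTRACT</span> to return?"'

* ustrregexra() replaces ALL matches of the pattern in a single call.
* "<[^>]*>" = "<", then any characters except ">", then ">".
replace textwithtags = ustrregexra(textwithtags, "<[^>]*>", "")

* Removing tags leaves leading and doubled blanks; tidy them up.
replace textwithtags = stritrim(strtrim(textwithtags))

list textwithtags
```

Because every tag is removed in one pass, there is no need to guess how many tags an observation can contain.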

I tried the following code (looping it 9 times to remove up to 9 tags):

Code:
foreach num of numlist 1/9 {
   gen htmltag`num'=substr(textwithtags,strpos(textwithtags,"<"),strpos(textwithtags,">"))
   replace textwithtags=subinstr(textwithtags,htmltag`num',"",.)
}
But this doesn't work well for observations with multiple tags. Take the following example: "<br> Does NAME have an <span style="color:red"> AGREEMENT or CONTRACT</span> to return?". This approach doesn't know which pair of brackets belongs together as one tag, and as a result some of the actual text between the tags is removed as well.
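At least part of what goes wrong in the loop above is that the third argument of `substr()` is a length, not an end position. `strpos(textwithtags,">")` therefore gives the right tag length only when the tag starts at position 1: it works for the leading "<br>", but for the `<span ...>` tag it takes 43 characters starting at the "<", which includes " AGREEMENT or CONTR". The sketch below keeps the loop idea but, as an assumption about what would suit older Stata versions without `ustrregexra()`, uses `regexm()`/`regexr()` with the pattern `<[^>]*>` ("<", then anything except ">", then ">"). Since `regexr()` replaces only the first match, it repeats until no observation contains a tag, instead of a fixed 9 passes. It assumes no literal ">" appears inside a tag and no bare "<" appears in the text. The demo data line is only for illustration.

```stata
* Demo data: one observation holding the example string from this post
clear
set obs 1
gen textwithtags = `"<br> Does NAME have an <span style="color:red"> AGREEMENT or CONTRACT</span> to return?"'

* regexr() replaces only the FIRST match, so loop until no tag is left.
count if regexm(textwithtags, "<[^>]*>")
while r(N) > 0 {
    replace textwithtags = regexr(textwithtags, "<[^>]*>", "")
    count if regexm(textwithtags, "<[^>]*>")
}

* Trim leading/trailing blanks and collapse internal runs of blanks.
replace textwithtags = itrim(trim(textwithtags))

list textwithtags
```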

Any help would be much appreciated!

Best,
Felix