Hello,

I want to generate a new variable that displays the words found in a string variable that match my local list of words.

The context is that I want to create a profanity filter. However, not all words are profane in every context.
Therefore I want to see which words my profanity filter classifies as profane.

My approach so far:

gen profanitydummy = 0
gen profanitycount = 0

local badwords "badword1 badword2 badword3 badword4"

foreach b in `badwords' {
    replace profanitydummy = 1 if strpos(varstring, " `b' ") != 0
    replace profanitycount = profanitycount + 1 if strpos(varstring, " `b' ") != 0
}

This produces a dummy that equals 1 if a word in varstring matches a word in the local badwords.
In addition, it counts the number of unique bad words used in the string.

The local badwords list contains approx. 1,100 words, which I took from a researcher who collected "offensive" words.


I now want to know which words cause the profanity dummy to indicate that there is a bad word in varstring.

My approach:


gen badwordinstring = ""
foreach b in `badwords'{
replace badwordinstring = " `b' " if strpos(varstring), " `b' ")
}


However, I get the error message "invalid syntax" and can't figure out where the problem is.

My desired result would be: badwordinstring: "badword5 badword7"
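
A possible fix, as a sketch rather than a definitive solution: the condition in the loop above has a stray ")" directly after varstring, which closes strpos() before its second argument, and that is likely what triggers the syntax error. The loop also overwrites the variable on every match, so only the last matching word would survive. The version below appends each match instead. Padding varstring with spaces is an addition of this sketch (an assumption, not part of the original approach) so that words at the very start or end of the string are caught too.

```stata
* Sketch: collect every matching bad word per observation.
* Assumes words in varstring are separated by spaces, with no
* punctuation attached, and that case matches the word list.
gen badwordinstring = ""
foreach b in `badwords' {
    * pad varstring so the first and last word are also surrounded by spaces
    replace badwordinstring = badwordinstring + " `b'" if strpos(" " + varstring + " ", " `b' ") > 0
}
* remove the leading space left by the first appended word
replace badwordinstring = trim(badwordinstring)
```

Each match is appended in the order of the word list, not in the order the words appear in the string.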


In addition, right now my profanitycount only counts the unique bad words used in the string.
Do you have a hint on how to change it to the absolute number of bad words in the string?

For example, if badword1 is used 2 times and badword2 is used 5 times, the variable should indicate 7; right now I can only get the number of unique bad words.
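
One way to get the total, sketched under the assumption that words are separated by spaces with no punctuation attached: count how many characters disappear when every occurrence of a word is deleted with subinstr(), then divide by the length of the search string. The names profanitytotal and padded are hypothetical. Spaces are doubled first because two adjacent occurrences such as "badword1 badword1" would otherwise share one space and the second would be missed.

```stata
* Sketch: count all occurrences of all bad words, not just unique ones.
gen profanitytotal = 0
* double every space and pad both ends, so each word has its own
* leading and trailing space
gen padded = " " + subinstr(varstring, " ", "  ", .) + " "
foreach b in `badwords' {
    * characters removed by deleting " word " everywhere, divided by
    * the length of " word ", gives the number of occurrences
    replace profanitytotal = profanitytotal + (length(padded) - length(subinstr(padded, " `b' ", "", .))) / length(" `b' ")
}
drop padded
```

With badword1 appearing 2 times and badword2 appearing 5 times, profanitytotal would come out as 7 under these assumptions.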


Thank you in advance.