Hello,
I want to generate a new variable that displays the words found in a string variable that match my local list of words.
The context is that I want to create a profanity filter. However, not all words are considered profane in every context.
Therefore I want to see which words my profanity filter is classifying as profane.
My approach so far:
gen profanitydummy = 0
gen profanitycount = 0
local badwords "badword1 badword2 badword3 badword4"
foreach b in `badwords' {
    replace profanitydummy = 1 if strpos(varstring, " `b' ") != 0
    replace profanitycount = profanitycount + 1 if strpos(varstring, " `b' ") != 0
}
This sets the dummy to 1 if a word in varstring matches a word in the local badwords.
In addition, it counts the number of unique bad words used in the string.
The local badwords list contains approx. 1100 words, which I took from a researcher who collected "offensive" words.
I now want to know for which words the profanity dummy indicates that there is a bad word in varstring.
My approach:
gen badwordinstring = ""
foreach b in `badwords' {
    replace badwordinstring = " `b' " if strpos(varstring), " `b' ")
}
However, I get the error message "invalid syntax" and can't figure out where the problem is.
My desired result would be: badwordinstring: "badword5 badword7"
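For reference, here is a minimal sketch of how the second loop could be repaired. It is only one possible fix, not a tested solution for the full 1100-word list. Two things stand out in the attempt above: the parenthesis right after varstring closes strpos() before its second argument, which is the likely cause of the syntax error, and the replace overwrites the variable on every match instead of appending to it, so only the last matching word would survive. The toy data below is made up for illustration, and strtrim() assumes Stata 14 or newer (older releases use trim()). Padding varstring with a space on each side is an added assumption so that words at the very start or end of the string are also found.

```stata
clear
input str60 varstring
"this has badword1 and badword3 inside"
"nothing to flag here"
"badword2 at the start and the end badword4"
end

local badwords "badword1 badword2 badword3 badword4"

gen badwordinstring = ""
foreach b of local badwords {
    * strpos(haystack, needle): both arguments sit inside one pair of parentheses.
    * Padding with spaces lets words at the start or end of varstring match " word ".
    replace badwordinstring = badwordinstring + " `b'" ///
        if strpos(" " + varstring + " ", " `b' ") > 0
}
* Remove the leading space left over from appending
replace badwordinstring = strtrim(badwordinstring)

list varstring badwordinstring
```

Note that this still matches only words delimited by spaces, so a bad word followed directly by punctuation would be missed unless the punctuation is stripped first.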
In addition, as of right now my profanitycount only counts the unique bad words used in the string.
Do you have a hint on how to change it to the absolute number of bad words in the string?
For example, if badword1 is used 2 times and badword2 is used 5 times, the variable should indicate 7. As of right now, I am only able to get the number of unique bad words.
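One way the total count might be obtained is to compare the length of the string before and after deleting every occurrence of a word: the number of characters removed, divided by the length of the search pattern, is the number of occurrences. The sketch below assumes space-delimited words. The variable names padded and profanitytotal and the toy data are hypothetical. Every space is doubled first so that two neighbouring bad words do not share a single separating space, which would otherwise make subinstr() skip the second one.

```stata
clear
input str60 varstring
"badword1 and badword1 again then badword2"
"nothing to flag here"
"badword2 badword2 badword2"
end

local badwords "badword1 badword2 badword3 badword4"

* Pad both ends, then double every space so each word has its own separators
gen padded = subinstr(" " + varstring + " ", " ", "  ", .)

gen profanitytotal = 0
foreach b of local badwords {
    * Length of the search pattern, including the two surrounding spaces
    local len = strlen(" `b' ")
    * Characters removed / pattern length = number of occurrences of this word
    replace profanitytotal = profanitytotal + ///
        (strlen(padded) - strlen(subinstr(padded, " `b' ", "", .))) / `len'
}
drop padded

list varstring profanitytotal
```

With the toy data, the first observation contains badword1 twice and badword2 once, so profanitytotal is 3 there, 0 in the second observation, and 3 in the third.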
Thank you in advance.