Hello,
I want to generate a new variable that displays the words found in a string variable that match my local list of words.
The context is that I want to create a profanity filter. However, not all words are profane in every context.
Therefore, I want to see which words my profanity filter classifies as profane.
My approach so far:
gen profanitydummy = 0
gen profanitycount = 0
local badwords "badword1 badword2 badword3 badword4"
foreach b in `badwords' {
    replace profanitydummy = 1 if strpos(varstring, " `b' ") != 0
    replace profanitycount = profanitycount + 1 if strpos(varstring, " `b' ") != 0
}
This sets the dummy to 1 if a word in varstring matches a word in the local badwords.
In addition, it counts the number of unique bad words used in the string.
The local badwords list contains approx. 1100 words, which I took from a researcher who gathered "offensive" words.
I now want to know for which words the profanity dummy indicates that there is a bad word in varstring.
My approach:
gen badwordinstring = ""
foreach b in `badwords'{
replace badwordinstring = " `b' " if strpos(varstring), " `b' ")
}
However, I get the error message "invalid syntax" and can't figure out where the problem is.
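For reference, here is a minimal sketch of what the loop above is presumably meant to do. It is not a confirmed solution. It assumes the error comes from the misplaced closing parenthesis in strpos(varstring), which leaves the if condition malformed, and it appends each match instead of overwriting the previous one. The toy data and the three-word list are made up for illustration; varstring and badwords stand in for the real ones.

```stata
* Sketch with made-up toy data; varstring and badwords stand in for the real ones.
clear
input str60 varstring
"this badword1 is here"
"nothing to see"
"some badword2 and badword1 again"
end
local badwords "badword1 badword2 badword3"

gen badwordinstring = ""
foreach b of local badwords {
    * Both strpos() arguments go inside one pair of parentheses.
    * Appending instead of overwriting keeps every matching word.
    replace badwordinstring = badwordinstring + " `b'" if strpos(varstring, " `b' ") > 0
}
* Remove the leading space left by the first appended word.
replace badwordinstring = strtrim(badwordinstring)
list varstring badwordinstring
```

Note that the condition still requires a space on both sides of the word, as in the original loop, so a bad word at the very start or end of varstring, or one followed by punctuation, would not be picked up.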
My desired result would be: badwordinstring: "badword5 badword7"
In addition, as of right now my profanitycount only counts the unique bad words used in the string.
Do you have a hint on how to change it to the absolute number of bad words in the string?
For example, if badword1 is used 2 times and badword2 is used 5 times, the variable should indicate 7; as of right now I am only able to get the number of unique bad words.
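One possible way to get the total count, sketched under assumptions: delete every occurrence of a word with subinstr() and divide the number of characters removed by the length of the search string. The variable names padded and profanitytotal are hypothetical, and the toy data is made up. Unlike the loop above, this sketch pads the string, so words at the very start or end of varstring are counted too.

```stata
* Sketch with made-up toy data; varstring and badwords stand in for the real ones.
clear
input str60 varstring
"badword1 badword1 then badword2"
"nothing to see"
"a badword2 b badword2 c badword2"
end
local badwords "badword1 badword2 badword3"

* Hypothetical helper variable: double every space and pad both ends, so each
* word has its own leading and trailing space and adjacent repeats still match.
gen padded = " " + subinstr(varstring, " ", "  ", .) + " "

gen profanitytotal = 0
foreach b of local badwords {
    * Characters removed when deleting " word ", divided by its length,
    * gives the number of occurrences of that word.
    replace profanitytotal = profanitytotal + (strlen(padded) - strlen(subinstr(padded, " `b' ", "", .))) / strlen(" `b' ")
}
drop padded
list varstring profanitytotal
```

With this toy data the first and third observations each contain three bad words in total, and the second contains none.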
Thank you in advance.