I would like to remove stopwords from strings, and I was advised to use txttool. However, Stata reports "unmatched quote" when I run the command. The strings contain the text of annual report files that were stored as strings in Stata using Wordstat. I use Stata version 16. The text has been converted to lower case in the variable document_lc.
I counted the total number of words with wordcount, and now I want to create a variable that gives the number of words without stopwords.
This is the command I used:
txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt)
Is it possible that the strings are too long? What might be a solution?
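One thing that may be worth ruling out first: as written above, the path inside stopwords() has an opening double quote but no closing one, which on its own could be enough for Stata to report an unmatched quote. Below is a minimal sketch of the same command with the quote closed, followed by the word count I am after. It assumes that the path and file name are otherwise correct and that wordcount() accepts the generated variable; n_words_wo_stopwords is only a placeholder name.

```stata
* Same command as above, with the closing double quote added after .txt.
* The path and file name are copied unchanged and are assumed to be correct.
txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt")

* Count the words that remain after stopword removal.
* n_words_wo_stopwords is a hypothetical variable name.
generate n_words_wo_stopwords = wordcount(text_wo_stopwords_german)
```

If the error persists with the quote closed, the cause may lie in the text itself rather than in the command, for example quote characters inside the documents that noclean leaves in place, but that is a guess on my part.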
Thank you
Robert