I would like to remove stopwords from strings, and I was advised to use txttool. However, Stata reports "unmatched quote" when I run the command. The strings contain the text of annual report files, which were stored as strings in Stata using Wordstat. I use Stata version 16. The text has been converted to lower case in the variable document_lc.
I counted the total number of words with wordcount, and now I want to create a variable that holds the number of words excluding stopwords.
This is the command I used:
txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt)
Is it possible that the strings are too long? What might be a solution?
Thank you
Robert
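Judging only from the command as posted, one possible cause is unrelated to string length: the path inside stopwords() opens with a double quote but is never closed before the final parenthesis, which may be what triggers the "unmatched quote" message. Below is a minimal sketch of the same call with the closing quote added. It assumes that txttool is installed and that the file path is otherwise correct; the variable name wc_wo_stopwords is hypothetical and only illustrates the counting step.

```stata
* Same call as posted, with the closing double quote added after the file name
txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt")

* Count the words that remain after stopword removal
* (wc_wo_stopwords is a hypothetical variable name)
generate wc_wo_stopwords = wordcount(text_wo_stopwords_german)
```

If the error persists with balanced quotes, the texts themselves may contain quote characters, which would be worth checking separately.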