I would like to remove stopwords from strings, and I was advised to use txttool. However, Stata reports "unmatched quote" when I run the command. The strings contain the text of annual report files that were stored as strings in Stata using Wordstat. I use Stata version 16. The text has been converted to lower case in the variable document_lc.
I counted the total number of words with wordcount, and now I want to create a variable that gives the number of words excluding stopwords.
This is the command I used:
txttool document_lc, generate(text_wo_stopwords_german) noclean nooutput stopwords("/Volumes/Elements//Stopwords/German stopwords.txt)
Is it possible that the strings are too long? What might be a solution?
Thank you
Robert