Dear all,
I have a collection of around 2,400 PDFs of parliamentary debate transcripts that I would like to import into Stata. Having found no easy way to import PDFs into Stata directly, I have batch converted them to text files instead.
I have tried using multimport (multimport delimited, extensions (txt) clear) to bring in all of the text files at once. However, this command on its own does not give the result I expect: it returns only 200 observations, when there should be around 1 million. I have read the help file and tried alternative approaches (for example, a loop around import delimited), but I could not solve the issue.
The attraction of multimport is that I can potentially record the filename as a new variable, which would be helpful in later processing.
I have two questions based on this:
1. Is converting the PDFs to text files before importing an appropriate way to approach this problem?
2. If multimport is the correct command, does anyone have any insight into how to tailor it to get the expected output?
Thanks,
Nate