Hi all,

I have a single string that is extremely long. It has a length of about 600,000 characters and 90,000 words separated by single spaces.

I want each word of this string to become its own observation, so the final dataset would have 90,000 observations, one per word of the original long string.

What would be the most efficient way to achieve this?

I tried using the split command with a variety of separators in the parse option. The idea is to split the string by spaces or some other separator and then reshape it from wide to long. Two examples I tried include:

Code:
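clear
set maxvar 32767
* (the long string is loaded into the string variable text at this point; omitted here because of its size)

* Attempt 1: split on " and"
split text, parse(" and")
* Attempt 2: split on single spaces
split text, parse(" ")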
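
To make the intended result concrete, here is a minimal sketch of the split-then-reshape idea on a short string. The example string and the names word, id, and position are made up for illustration and are not from my actual data.

```stata
clear
set obs 1
* hypothetical short string standing in for the real 600,000-character one
generate text = "the quick brown fox"
* one new variable per word: word1, word2, word3, word4
split text, parse(" ") generate(word)
* reshape needs an identifier; id and position are made-up names
generate id = 1
reshape long word, i(id) j(position)
* keep one word per observation, with its position in the string
keep position word
list
```

On this toy string the result is four observations, one per word, which is the layout I am after for the full string.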

Naturally, no matter what separator I use to split the string, Stata returns a "no room to add more variables because of width" error. I understand that this happens because the string contains far more words than Stata can hold as separate variables, so split cannot create one variable per word.

Is there a workaround to this issue to get to my final objective of converting the single long string with 90,000 words into a dataset with 90,000 observations/words?

FYI, I am attaching an example string that I tried splitting. I did not include the string in the code above due to its immense size.

Regards,
Tasneem