I have a single, extremely long string: about 600,000 characters, or roughly 90,000 words separated by single spaces.
I want each word of this string to become its own observation, so that I end up with 90,000 observations, one per word of the original string.
What would be the most efficient way to achieve this?
I tried using the split command with a variety of separators in the parse option. The idea is to split the string by spaces or some other separator and then reshape it from wide to long. Two examples I tried include:
Code:
clear
set maxvar 32767
split text, parse(" and")
split text, parse(" ")
Naturally, no matter what separator I use to split the string, Stata returns a "no room to add more variables because of width" error. I understand that this is happening because my string is so long that Stata is reaching the maximum number of variables allowable per observation.
Is there a workaround to this issue to get to my final objective of converting the single long string with 90,000 words into a dataset with 90,000 observations/words?
FYI, I am attaching an example string that I tried splitting. I did not include the string in the code above due to its immense size.
Regards,
Tasneem
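One possible workaround, offered only as a sketch, is to skip the wide layout altogether and do the split in Mata, writing the words straight into observations. It assumes the data in memory hold exactly one observation, that the long string sits in a variable named text as in the attempt above, and that no single word is longer than 2,045 bytes; the new variable name word is made up for illustration. Note that Mata's tokens() keeps double-quoted material together as one token, so text containing quotation marks may need extra handling.

Code:
```stata
* Sketch only: assumes one observation with the long string stored in -text-
mata:
// split on blanks; the transpose gives a column vector, one word per row
words = tokens(st_sdata(1, "text"))'
// add the observations needed beyond the one that already exists
st_addobs(rows(words) - 1)
// create a str# variable just wide enough for the longest word
(void) st_addvar("str" + strofreal(max(strlen(words))), "word")
// write one word into each observation
st_sstore(., "word", words)
end
* the original long string is no longer needed
drop text
```

Because the words go directly into rows, no wide intermediate dataset is ever created, so neither the width limit nor a reshape from wide to long comes into play.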