Hey everyone. Consider the following dataset (may be worth looking at the text file
here in your browser).
Code:
import delimited "https://pubs.niaaa.nih.gov/publications/surveillance119/pcyr1970-2020.txt", ///
rowrange(130) clear
As we can see, this imports a text file with some columns that're intended to be variables for year, state FIPS code, type of beverage, and various statistics related to alcohol consumption. The trouble is getting everything in a format that's useful (i.e., the 14 variable columns we see here). Consider:
Code:
import delimited "https://pubs.niaaa.nih.gov/publications/surveillance119/pcyr1970-2020.txt", ///
delimiter(space) rowrange(130) clear
Well..... this
sort of works. The year it imports fine. The fips codes (the 'updated' column), kinda sorta. But what about the other columns!! There are now over 40 other columns that are likely treated different due to spacing reasons or other qualities of the dataset. When I change the delimiter to
Code:
import delimited "https://pubs.niaaa.nih.gov/publications/surveillance119/pcyr1970-2020.txt", ///
delimiter(whitespace) rowrange(130) clear
or
Code:
import delimited "https://pubs.niaaa.nih.gov/publications/surveillance119/pcyr1970-2020.txt", ///
delimiter(".") rowrange(130) clear
we barely get better results. My original idea was tot concatenate some of the columns, but I strongly suspect if I did that I'd run into values from different variables that appear in the same cell, which would give me incorrect results. I could use the split command to select on spaces
Code:
import delimited "https://pubs.niaaa.nih.gov/publications/surveillance119/pcyr1970-2020.txt", ///
rowrange(130) clear
split v1, p(" ")
but this arguably the task even more hellish. I guess all I'd really want is to have what appears to be a 14 column dataset of all the variables they have here- the rest (state codes, etc) I can work with just fine. How might I begin to tackle this interesting text file?
0 Response to Working With A Pesky Text File
Post a Comment