I am importing about 75 csv files and appending them together. I got these files from a Google doc where our surveyors input information about each case. One of the variables is a notes column where manual comments have been typed about each case. The vast majority of the data is fine, but occasionally a surveyor used a carriage return (ALT + ENTER) to separate longer comments onto individual lines within the notes field. When the forms were exported to csv, some lines ended up being parsed incorrectly and break before they should.
An example of what each csv looks like when there is no problem:
cleancsv.csv:
RESPID,TYPE,NOTES,VAR1,VAR2,VAR3
ID1,TYPEA,some notes for the first case, 1,0,1
ID2,TYPEA,some notes for the second case, 1,1,0
ID3,TYPEB,some notes for the third case, 0,0,1
ID4,TYPEA,some notes for the fourth case, 1,1,1
ID5,TYPEB,some notes for the fifth case, 1,1,0
ID6,TYPEB,some notes for the sixth case, 0,0,1
...
With a manually entered carriage return in the third observation's notes, the file looks like:
carriagereturncsv.csv:
RESPID,TYPE,NOTES,VAR1,VAR2,VAR3
ID1,TYPEA,some notes for the first case, 1,0,1
ID2,TYPEA,some notes for the second case, 1,1,0
ID3,TYPEB,some notes for the third case
that ended up containing a carriage return, 0,0,1
ID4,TYPEA,some notes for the fourth case, 1,1,1
ID5,TYPEB,some notes for the fifth case, 1,1,0
ID6,TYPEB,some notes for the sixth case, 0,0,1
When I run import delimited using "cleancsv.csv", I get a clean-looking dataset that looks like:
Code:
clear all
input str50(RESPID TYPE NOTES VAR1 VAR2 VAR3)
"ID1" "TYPEA" "some notes for the first case" " 1" "0" "1"
"ID2" "TYPEA" "some notes for the second case" " 1" "1" "0"
"ID3" "TYPEB" "some notes for the third case" " 0" "0" "1"
"ID4" "TYPEA" "some notes for the fourth case" " 1" "1" "1"
"ID5" "TYPEB" "some notes for the fifth case" " 1" "1" "0"
"ID6" "TYPEB" "some notes for the sixth case" " 0" "0" "1"
end
But when I import carriagereturncsv.csv the same way, the third observation breaks across two rows and its values shift into the wrong variables:
Code:
clear all
input str50(RESPID TYPE NOTES VAR1 VAR2 VAR3)
"ID1" "TYPEA" "some notes for the first case" " 1" "0" "1"
"ID2" "TYPEA" "some notes for the second case" " 1" "1" "0"
"ID3" "TYPEB" "some notes for the third case" "" "" ""
"that ended up containing a carriage return " "0" "0" "1" "" ""
"ID4" "TYPEA" "some notes for the fourth case" " 1" "1" "1"
"ID5" "TYPEB" "some notes for the fifth case" " 1" "1" "0"
"ID6" "TYPEB" "some notes for the sixth case" " 0" "0" "1"
end
The question: is there any systematic way to identify and remove these accidental carriage returns from within Stata?
Thanks much.
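For reference, here is a minimal, untested sketch of one possible direction. It is not a definitive fix. It assumes every genuine record starts with "ID" followed by digits and a comma, as in the examples above, so any line that does not start that way is treated as a continuation of the previous record's notes. The output file name fixedcsv.csv and the handle names fin and fout are made up for illustration. If the exported files wrap the notes field in double quotes (the examples above do not), the bindquote(strict) option of import delimited may be enough on its own.

```stata
* Sketch only: stitch continuation lines back onto the record they belong to.
* Assumption: a real record always begins with ID<digits>, (e.g. "ID3,").
local infile  "carriagereturncsv.csv"
local outfile "fixedcsv.csv"    // hypothetical output name

tempname fin fout
file open `fin'  using "`infile'",  read text
file open `fout' using "`outfile'", write text replace

* Copy the header line through unchanged.
file read `fin' line
file write `fout' `"`macval(line)'"' _n

* buffer holds the record currently being assembled.
local buffer ""
file read `fin' line
while r(eof) == 0 {
    if regexm(`"`macval(line)'"', "^ID[0-9]+,") {
        * A new record starts here: write out the previous one, if any.
        if `"`macval(buffer)'"' != "" {
            file write `fout' `"`macval(buffer)'"' _n
        }
        local buffer `"`macval(line)'"'
    }
    else {
        * No ID at the start: append to the current record, joined by a space.
        local buffer `"`macval(buffer)' `macval(line)'"'
    }
    file read `fin' line
}

* Write out the last record.
if `"`macval(buffer)'"' != "" {
    file write `fout' `"`macval(buffer)'"' _n
}
file close `fin'
file close `fout'

import delimited using "`outfile'", varnames(1) clear
```

Wrapping this in a loop over all 75 files would be the next step. A note that itself begins with something like "ID12," on a new line would be misread as a new record, so the pattern needs to match whatever the real RESPID values look like.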