import delimited and accidental carriage returns

Hi Statalist,

I am importing about 75 CSVs and appending them together. The CSVs come from a Google Doc where our surveyors enter information about each case. One of the variables is a notes column containing typed comments about each case. Most of the data is fine, but occasionally a surveyor used a carriage return (ALT + ENTER) to separate a longer comment onto several lines within the notes field. When the forms were exported to CSV, those records were split across two lines, so they are parsed incorrectly and break before they should.
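For the import-and-append step itself, a loop over the folder is one common pattern. This is a minimal sketch, assuming all exports sit in the current working directory and share the same columns. The files demo_part1.csv and demo_part2.csv are hypothetical stand-ins that the sketch writes itself, and bindquote(strict) only helps if the export encloses the notes cells in double quotes (an assumption).

```stata
* Sketch (assumption): all exports sit in the current folder and share one
* layout. demo_part1.csv and demo_part2.csv are hypothetical stand-ins.
clear all
tempname fh
forvalues k = 1/2 {
    file open `fh' using "demo_part`k'.csv", write text replace
    file write `fh' "RESPID,TYPE,NOTES,VAR1,VAR2,VAR3" _n
    file write `fh' "ID`k',TYPEA,some notes for case `k',1,0,1" _n
    file close `fh'
}

* List matching files; widen the pattern (e.g. "*.csv") for the real exports
local csvfiles : dir "." files "demo_part*.csv"

* Start from an empty dataset and grow it one file at a time
tempfile combined
clear
save "`combined'", emptyok

foreach f of local csvfiles {
    * stringcols(_all) keeps every column a string so append never meets a
    * numeric/string clash between files; bindquote(strict) lets a quoted
    * NOTES cell run over several physical lines
    import delimited using "`f'", varnames(1) stringcols(_all) ///
        bindquote(strict) clear
    generate source_file = "`f'"
    append using "`combined'"
    save "`combined'", replace
}

* Convert the numeric columns once everything is stacked
destring var1 var2 var3, replace
sort source_file respid
list, clean
```

Forcing every column to string during import avoids append errors when a column is read as numeric in one file and as string in another; destring then converts the numeric columns once at the end. The source_file variable makes it easy to trace a bad record back to its export.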

Here is what one of the CSVs looks like when there is no problem:

cleancsv.csv:

RESPID,TYPE,NOTES,VAR1,VAR2,VAR3
ID1,TYPEA,some notes for the first case, 1,0,1
ID2,TYPEA,some notes for the second case, 1,1,0
ID3,TYPEB,some notes for the third case, 0,0,1
ID4,TYPEA,some notes for the fourth case, 1,1,1
ID5,TYPEB,some notes for the fifth case, 1,1,0
ID6,TYPEB,some notes for the sixth case, 0,0,1
...

When the third observation's notes contain a manually entered carriage return, the file looks like this:

carriagereturncsv.csv:

RESPID,TYPE,NOTES,VAR1,VAR2,VAR3
ID1,TYPEA,some notes for the first case, 1,0,1
ID2,TYPEA,some notes for the second case, 1,1,0
ID3,TYPEB,some notes for the third case
that ended up containing a carriage return, 0,0,1
ID4,TYPEA,some notes for the fourth case, 1,1,1
ID5,TYPEB,some notes for the fifth case, 1,1,0
ID6,TYPEB,some notes for the sixth case, 0,0,1

When I run import delimited using "cleancsv.csv", I get a clean-looking dataset:

Code:

clear all

input str50(RESPID TYPE NOTES VAR1 VAR2 VAR3)
"ID1" "TYPEA" "some notes for the first case" " 1" "0" "1"
"ID2" "TYPEA" "some notes for the second case" " 1" "1" "0"
"ID3" "TYPEB" "some notes for the third case" " 0" "0" "1"
"ID4" "TYPEA" "some notes for the fourth case" " 1" "1" "1"
"ID5" "TYPEB" "some notes for the fifth case" " 1" "1" "0"
"ID6" "TYPEB" "some notes for the sixth case" " 0" "0" "1"
end

But when I run import delimited using "carriagereturncsv.csv", I get a warped dataset that looks like:

Code:

clear all

input str50(RESPID TYPE NOTES VAR1 VAR2 VAR3)
"ID1" "TYPEA" "some notes for the first case" " 1" "0" "1"
"ID2" "TYPEA" "some notes for the second case" " 1" "1" "0"
"ID3" "TYPEB" "some notes for the third case" "" "" ""
"that ended up containing a carriage return" "0" "0" "1" "" ""
"ID4" "TYPEA" "some notes for the fourth case" " 1" "1" "1"
"ID5" "TYPEB" "some notes for the fifth case" " 1" "1" "0"
"ID6" "TYPEB" "some notes for the sixth case" " 0" "0" "1"
end

The result is an extra observation with RESPID = "that ended up containing a carriage return", while VAR1, VAR2, and VAR3 are missing for the third observation (RESPID = "ID3").
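Under the RFC 4180 convention, a CSV field containing a line break should be enclosed in double quotes, although the samples above show none. If the Google export follows that convention (an assumption worth checking by opening a raw file in a text editor), import delimited's bindquote(strict) option tells Stata to keep reading past the end of a line until the closing quote, so ID3 stays one observation. The file demo_quoted.csv below is a hypothetical stand-in written by the sketch.

```stata
* Sketch (assumption): the export wraps a multi-line NOTES cell in double
* quotes; the sample in the post does not show any quotes.
clear all
tempname fh
* Write a small hypothetical file; _char(34) is a double quote
file open `fh' using "demo_quoted.csv", write text replace
file write `fh' "RESPID,TYPE,NOTES,VAR1,VAR2,VAR3" _n
file write `fh' "ID1,TYPEA," _char(34) "some notes for the first case" _char(34) ",1,0,1" _n
file write `fh' "ID2,TYPEA," _char(34) "some notes for the second case" _char(34) ",1,1,0" _n
file write `fh' "ID3,TYPEB," _char(34) "some notes for the third case" _n
file write `fh' "that ended up containing a carriage return" _char(34) ",0,0,1" _n
file write `fh' "ID4,TYPEA," _char(34) "some notes for the fourth case" _char(34) ",1,1,1" _n
file close `fh'

* bindquote(strict): a quoted field may continue onto the next line
import delimited using "demo_quoted.csv", varnames(1) bindquote(strict) clear

* The line break now lives inside NOTES; swap CR and LF for spaces
replace notes = subinstr(notes, char(13), " ", .)
replace notes = subinstr(notes, char(10), " ", .)
replace notes = stritrim(strtrim(notes))
list, clean
```

The two subinstr() calls handle both Windows (CR LF) and Unix (LF) line breaks; stritrim() then collapses any doubled spaces left behind.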

The question: is there a systematic way to identify and undo these accidental carriage returns in Stata?
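When the NOTES field is not enclosed in double quotes, as in the samples above, import delimited cannot tell an embedded line break from the end of a record. One fallback is to read each physical line as a single string, start a new record only on lines that begin with a RESPID, glue the stray pieces back on, and re-import. This is a minimal sketch, assuming every genuine record starts with a RESPID matching the hypothetical pattern "ID" followed by digits, and that the character "|" never occurs in the file; demo_unquoted.csv and demo_repaired.csv are hypothetical file names.

```stata
* Sketch (assumption): NOTES is NOT quoted, and every genuine record starts
* with a RESPID matching the hypothetical pattern "ID" + digits.
clear all
tempname fh
file open `fh' using "demo_unquoted.csv", write text replace
file write `fh' "RESPID,TYPE,NOTES,VAR1,VAR2,VAR3" _n
file write `fh' "ID1,TYPEA,some notes for the first case,1,0,1" _n
file write `fh' "ID2,TYPEA,some notes for the second case,1,1,0" _n
file write `fh' "ID3,TYPEB,some notes for the third case" _n
file write `fh' "that ended up containing a carriage return,0,0,1" _n
file write `fh' "ID4,TYPEA,some notes for the fourth case,1,1,1" _n
file close `fh'

* 1. Read each physical line into one string variable v1
*    ("|" is assumed never to occur, so no line gets split)
import delimited using "demo_unquoted.csv", delimiters("|") ///
    varnames(nonames) stringcols(_all) clear
generate long line_no = _n

* 2. A record starts on the header or on a line beginning with an ID
generate byte starts = (_n == 1) | regexm(v1, "^ID[0-9]+,")
list v1 if !starts, clean    // these are the stray continuation lines
generate long record = sum(starts)

* 3. Glue continuation lines onto the line that began the record
bysort record (line_no): replace v1 = v1[_n-1] + " " + v1 if _n > 1
by record: keep if _n == _N
sort line_no

* 4. Write the repaired lines out and import them normally
file open `fh' using "demo_repaired.csv", write text replace
forvalues i = 1/`=_N' {
    file write `fh' (v1[`i']) _n
}
file close `fh'
import delimited using "demo_repaired.csv", varnames(1) clear
list, clean
```

This heuristic misfires if a continuation line itself happens to begin with something that looks like a RESPID, and it cannot rescue notes that contain unquoted commas, so checking the listed stray lines by eye is worthwhile.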

Thanks much.

BJ Data Tech Solution
