given a dataset in tab-delimited format, is there any way to tell the insheet/import delimited (orany other import command) to replace certain words in columns during import before the types of columns are determined?
Specifically, the tab file looks like this:
Code:
x y z
1 alpha
2 alpha NaN
NaN alpha 9
3 beta 9
banana 9
2 banana 9
I would like to import it specifying that the specific value NaN should be treated as .a (ext missing) during import and permit the column to be imported as numeric during the types analysis. Clearly NaN may occur in string literals such as in "banana" in the example. I would like to preserve the distinction between the empty value and the NaN value. My desired result in Stata is:
Code:
x y z
1 alpha .
2 alpha .a
.a alpha 9
3 beta 9
. banana 9
2 banana 9
where x and z are imported as numeric and y as a string variable.There are couple of other (known) keywords denoting problems detected by the data-producing system that I would like to filter-out that way. Conceptually it is similar to the ignore() option of the destring command, but for the whole file, rather than one variable.
I understand it is possible to import the column as string, and replace the particular value, then recast or destring. This route I can follow myself, but I am looking for a solution that would utilize a standard command.
If it matters, I have thousands of these files with unknown structure (which column is string or numeric), so eyeballing it would not be possible.
If there is any advice, please provide, before I go with custom code.
Thank you, Sergiy Radyakin
0 Response to replacing values during import
Post a Comment