Specializing in data processing, data management implementation plans, electronic and paper-based data collection tools, data cleaning specifications, data extraction, data transformation, data loading, analytical datasets, and data analysis. BJ Data Tech Solutions teaches the design and development of electronic data collection tools using CSPro, as well as Stata commands for data manipulation, and sets up data management systems using modern data technologies such as relational databases, C#, PHP and Android.
Monday, August 24, 2020
-import delimited- and handling utf-8 encoding
Can one determine the proper encoding for a text file to specify when using -import delimited-?
I've increasingly encountered CSV files from various sources that import with upper ASCII characters in a variable name and label. I learned that this happens because the files are UTF-8 encoded and I had mistakenly imported them with the default latin1 encoding (Stata version 15.1). While the problem is easy enough to fix after the fact, is there a way to determine the proper encoding other than knowing in advance how the file was encoded and specifying that with the encoding() option? I see that newer versions of Microsoft Excel offer UTF-8 encoding as an option when saving CSV files, which I guess accounts for this issue becoming more frequent.
(While there have been other threads on Statalist touching on this topic, I didn't find one that narrowed the issue down to handling the encoding difference before it bites.)
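One workaround, outside Stata, is to inspect the raw bytes of the file before importing it. If the stray characters sit at the start of the first variable name, they are likely the UTF-8 byte-order mark (bytes EF BB BF) that Excel's "CSV UTF-8" format writes, which latin1 renders as ï»¿. The sketch below is a heuristic, not a definitive detector: it treats a BOM, or bytes that decode cleanly as UTF-8, as UTF-8, and otherwise falls back to latin1, which can decode any byte value. The function names `guess_encoding` and `stata_import_command` are hypothetical, and the generated Stata line assumes that `encoding("utf-8")` and `encoding("latin1")` are accepted as written by your Stata version.

```python
import sys
from pathlib import Path

# Byte-order mark that Excel writes at the start of "CSV UTF-8" files.
UTF8_BOM = b"\xef\xbb\xbf"


def guess_encoding(raw: bytes) -> str:
    """Return "utf-8" or "latin1" as a best guess for a CSV file's bytes.

    Heuristic only: a BOM, or bytes that decode cleanly as UTF-8, suggest
    UTF-8; anything else falls back to latin1, which decodes every byte.
    """
    if raw.startswith(UTF8_BOM):
        return "utf-8"
    try:
        raw.decode("utf-8")  # strict decoding raises on invalid sequences
    except UnicodeDecodeError:
        return "latin1"
    return "utf-8"  # valid UTF-8; pure ASCII files also land here


def stata_import_command(path: str) -> str:
    """Build a hypothetical -import delimited- line using the guessed encoding."""
    enc = guess_encoding(Path(path).read_bytes())
    return f'import delimited using "{path}", encoding("{enc}") clear'


if __name__ == "__main__":
    # Demo on in-memory bytes, then on any file paths given on the command line.
    excel_utf8 = UTF8_BOM + "région,value\n".encode("utf-8")
    legacy = "région,value\n".encode("latin1")
    print(guess_encoding(excel_utf8))  # utf-8
    print(guess_encoding(legacy))      # latin1
    for p in sys.argv[1:]:
        print(stata_import_command(p))
```

The fallback works because an accented latin1 byte such as é (0xE9) is rarely followed by bytes that form a valid UTF-8 sequence, so strict UTF-8 decoding fails on most real latin1 files. It cannot tell latin1 from Windows-1252, and a short file containing only ASCII is reported as UTF-8, which is harmless since ASCII is a subset of both.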