extracting components from filename including special characters

Hi, I received a bunch (several hundred) of .xls files that should be read into Stata. Unfortunately, the filenames contain umlauts (ÄÖÜ), and this seems to cause a problem with the substr() function (used to extract parts of the filename). I attached example code that generates such filenames. In the second and third entries, the extracted parts are shifted to the left by one position, e.g. mzp results in "-t" instead of "t3", and the variable ID is 8 instead of 9 characters long.
Is there a trick to avoid this behaviour? I would like to avoid conditional coding, and renaming the files is not an option for the project.

I'm using Stata 15.1 on OS X 10.15.2 (Catalina).

thanks,
Marc

Code:

clear
* example filenames; the second and third contain an umlaut
input str50 filename
"11-11-t0AL011OAO1-t3-Baseline-Segment01-19.01.99"
"11-11-t0AL111ÜAK1-t3-Baseline-Segment01-19.01.99"
"11-11-t0AL111RIÖ1-t3-Baseline-Segment01-19.01.99"
end
browse

gen str ID = substr(filename,9,9)
gen str mzp = substr(filename,19,2)
gen str phasestr = substr(filename,22,6)
gen str phase = substr(filename,4,2)
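
A likely explanation, offered as a sketch rather than a confirmed diagnosis: since version 14, Stata stores strings as UTF-8, and substr() counts bytes, not characters. Each of Ä, Ö and Ü occupies two bytes, so everything after the umlaut appears shifted by one position. The Unicode-aware usubstr() counts characters instead. The example below reuses the sample filenames from the post; the names fn_nfc, part and the _u suffix are made up for illustration. The ustrnormalize() step is an assumption about filenames read from a directory listing on a Mac, where an umlaut may be stored as a base letter plus a combining mark; it should be harmless if the names are already in composed form.

```stata
clear
input str60 filename
"11-11-t0AL011OAO1-t3-Baseline-Segment01-19.01.99"
"11-11-t0AL111ÜAK1-t3-Baseline-Segment01-19.01.99"
"11-11-t0AL111RIÖ1-t3-Baseline-Segment01-19.01.99"
end

* Assumption: normalize to composed form (NFC) so each umlaut is one character
gen fn_nfc = ustrnormalize(filename, "nfc")

* usubstr() counts Unicode characters, whereas substr() counts bytes
gen ID_u       = usubstr(fn_nfc, 9, 9)
gen mzp_u      = usubstr(fn_nfc, 19, 2)
gen phasestr_u = usubstr(fn_nfc, 22, 6)
gen phase_u    = usubstr(fn_nfc, 4, 2)

* Alternative that does not rely on fixed positions: split on the hyphen
split fn_nfc, parse("-") generate(part)

list ID_u mzp_u part4
```

With these sample names, mzp_u and part4 should both be "t3" in every row, and ID_u should be nine characters long. The split approach may be the more robust of the two if the component lengths ever vary between files, since it depends only on the hyphen separators.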
