Dear Statalist members,

I am working with a dataset in .csv format that is split across several files containing general information about patients (e.g. birthdate, gender), physical examination data (e.g. height, weight, blood pressure), blood analysis (e.g. sodium, potassium) and urine analysis (e.g. sodium, potassium). When I import, for example, the urine analysis file into Stata, there is a problem with the variable names because they contain numbers, spaces and ".". If I use the command import delimited with the option varnames(1), the original names end up as variable labels rather than as variable names.

Here is an example with 4 variables. A label starting with 10.xx indicates a variable from the urine analysis (9.xx would be the blood analysis):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 v4 str4(v11 v13) str5 v29
"V4" "66" "19" "3300"
"V4" "56" "28" "3900"
"V2" "91" "40" "6800"
"V2" "79" "40" "6000"
"V5" "55" "25" "4400"
end
label var v4 "0.8 Visit number" 
label var v11 "10.1 Sodium (Na)" 
label var v13 "10.2 Potassium (K)" 
label var v29 "10.10 Creatinine"

My idea was to use loops to extract the labels and rename the variables. But there are several modifications that I would like (and need) to make to the variable names:

- remove spaces and ". "
- replace 10. with u_ to indicate that it is a urinary value (and similarly 9. with b_ to indicate a blood value), but only when the number is in the first position. As you can see, the variable for urinary creatinine is labeled "10.10 Creatinine" and the blood analysis has a variable "9.10 Uric acid", so it is important to take the position into account
- remove the numbers if the variable is not a blood or urinary value (e.g. 0.8 Visit number)
- remove the content inside the brackets, e.g. (Na), (K)
- I don't know whether the capital letters would also need to be changed in that case

So in the end, it would look like this: visit_number, u_sodium, u_potassium, u_creatinine


This is the code I have written so far. I still have a problem with names starting with a number, so I added a temporary a_ prefix to the variable name to be able to run the loop. Would you have any suggestions for making the modifications listed above?

Code:
foreach var of varlist _all {

    * read the variable label, which holds the original column name
    local current_lab: variable label `var'

    * replace spaces, ".", "(" and ")" with underscores
    local new_current_lab = subinstr("`current_lab'", " ","_",.)
    local new_current_lab = subinstr("`new_current_lab'", ".","_",.)
    local new_current_lab = subinstr("`new_current_lab'", "(","_",.)
    local new_current_lab = subinstr("`new_current_lab'", ")","_",.)

    dis "`new_current_lab'"

    * temporary a_ prefix, because a name cannot start with a number
    rename `var' a_`new_current_lab'

}
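
For what it is worth, here is a rough sketch of the direction I think the rules listed above could take. It assumes Stata 14 or later (for the ustrregex functions), that every label follows the pattern "section.item Name", and that the cleaned names do not collide within one file. The local names lab, prefix and new are just placeholders I made up.

Code:
foreach var of varlist _all {

    local lab : variable label `var'
    if `"`lab'"' == "" continue

    * prefix is decided by the leading section number only:
    * "10." at the start = urine, "9." at the start = blood
    local prefix ""
    if ustrregexm(`"`lab'"', "^10\.") {
        local prefix "u_"
    }
    else if ustrregexm(`"`lab'"', "^9\.") {
        local prefix "b_"
    }

    * drop the leading "section.item " numbering (first match only)
    local new = ustrregexrf(`"`lab'"', "^[0-9]+\.[0-9]+\s*", "")

    * drop anything in brackets, e.g. " (Na)"
    local new = ustrregexra(`"`new'"', "\s*\([^)]*\)", "")

    * lower case, remaining illegal characters become "_", max 32 characters
    local new = strtoname(strlower(strtrim(`"`new'"')))
    local new = substr("`prefix'`new'", 1, 32)

    rename `var' `new'
}

On the example data above this should give visit_number, u_sodium, u_potassium and u_creatinine, and "9.10 Uric acid" would become b_uric_acid, but I would welcome corrections if there is a cleaner way.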

Also, some of those variables, such as 10.1 Sodium (Na), are supposed to be numeric but are currently stored as strings. Would they be recognised as numeric once the variables have names that Stata accepts, or would I also have to convert them to numeric?
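
As far as I understand, the storage type does not depend on the variable name, so I assume an explicit conversion along these lines would still be needed. This is only a sketch. It relies on destring leaving a variable untouched when it contains nonnumeric characters, so the visit variable ("V4", "V2", ...) should stay a string.

Code:
* convert every string variable that holds only numbers; others are skipped
destring _all, replace

* check the resulting storage types
describe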

Thank you in advance,
Constance