Renaming variables according to position

Good morning,

Long text to avoid XY problem.

I'm writing a Stata code that opens and conduct cleaning steps on excel files that will be sent weekly to me by a another person. Since the excel file the person sent as an example to me contains several sheets (which I would like to be appended). I decided on using xls2dta. Using an option on xls2dta, I created new variables that contain the information on the file path and sheet within the file for which a given observation was obtained. The sheet within the excel file is specially important, because information is contained on the name of the sheet. The exact xls2dta command I'm using is:

Code:

xls2dta , clear allsheets gen(file sheet) : append using "file_path\file_name.xls", import(firstrow allstring)

The excel file also contain double headers and the first line of the header is a description of the excel file merged over several cells, which xls2dta reads as the variable name for the first variable, leaving the next variables as B, C, D and so on. The first observation of the the file already imported to Stata contains the real variable names. To solve that issue, I used

Code:

renvars, map(strtoname(@[1]))
drop in 1

which maps the content of the first observation and rename the variables according to that. It also gets rid of the first observation after mapping variable names. This works well, except that it also renames my "file" and "sheet" variables created using xls2dta.

I see some possible approaches to solve the issue of having my file and sheet variables renamed:

Rename those 2 variables using evocative names - don't want to go this way because different weekly excel files will have different names and different sheet names, so this would introduce a step of human work. Having something automatic is what I have in mind.
I could limit the reach of renvars to not change the name of the last 2 variables of the dataset - the file and sheet variables are expected to always be the last 2 vars of the dataset. However, I don't know how to do that (or even if that's possible). This seems to be too much of a specific problem, which is not covered in the renvars help file.
I could rename the 2 variables to their original name again without using their evocative name, but their position in the dataset (they are always expected to be the last 2 vars of the dataset). I found an interesting post on Statalist that approaches a similar problem, but the solution presented there involves changing variable names based on their numeric position, and I expect this to change over different excel files sent to me. I can not guarantee that file and sheet will always be variables number 11 and 12. A stronger assumption is that they will always be the last 2 variables, but even on the post mentioned above, I could not find information on how to code to change the name of the "last 2 variables" of a dataset.
I could ask the person sending me the excel files to add a variable with the name of the sheet on each sheet of the excel file, but there is resistance for this change. It doesn't seem like it's going to happen.

I appreciate any input you can give me. There are several steps that I'm doing, so I'm sharing snippets of the dataset in different stages of the problem:

Immediately after xls2dta:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 AllSECONDFARMSAREESITIMATED str6(B C) str9 D str23 file str10 sheet
"load time" "ORIGIN" "GROUP#" "# to load" "file_path\file_name.xls" "sheet_name"
"7AM"       "A"      "X"      "10"        "file_path\file_name.xls" "sheet_name"
"8AM"       "A"      "X"      "10"        "file_path\file_name.xls" "sheet_name"
"6AM"       "B"      "Y"      "20"        "file_path\file_name.xls" "sheet_name"
end

Immediately after renvars (notice how my vars file and sheet got renamed):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 load_time str6(ORIGIN GROUP_) str9 __to_load str23 file_path_file_name_xls str10 sheet_name
"7AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name"
"8AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name"
"6AM" "B" "Y" "20" "file_path\file_name.xls" "sheet_name"
end

Hope this is clear, the post got way longer than anticipated.

Best;

BJ Data Tech Solution

Home / Data Cleaning / Data management / Data Processing / Renaming variables according to position
Renaming variables according to position

0 Response to Renaming variables according to position

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Renaming variables according to position Renaming variables according to position

Related Posts with Renaming variables according to position

0 Response to Renaming variables according to position

Post a Comment

Home / Data Cleaning / Data management / Data Processing / Renaming variables according to position
Renaming variables according to position