Long text to avoid XY problem.
I'm writing a Stata code that opens and conduct cleaning steps on excel files that will be sent weekly to me by a another person. Since the excel file the person sent as an example to me contains several sheets (which I would like to be appended). I decided on using xls2dta. Using an option on xls2dta, I created new variables that contain the information on the file path and sheet within the file for which a given observation was obtained. The sheet within the excel file is specially important, because information is contained on the name of the sheet. The exact xls2dta command I'm using is:
Code:
xls2dta , clear allsheets gen(file sheet) : append using "file_path\file_name.xls", import(firstrow allstring)
Code:
renvars, map(strtoname(@[1])) drop in 1
I see some possible approaches to solve the issue of having my file and sheet variables renamed:
- Rename those 2 variables using evocative names - don't want to go this way because different weekly excel files will have different names and different sheet names, so this would introduce a step of human work. Having something automatic is what I have in mind.
- I could limit the reach of renvars to not change the name of the last 2 variables of the dataset - the file and sheet variables are expected to always be the last 2 vars of the dataset. However, I don't know how to do that (or even if that's possible). This seems to be too much of a specific problem, which is not covered in the renvars help file.
- I could rename the 2 variables to their original name again without using their evocative name, but their position in the dataset (they are always expected to be the last 2 vars of the dataset). I found an interesting post on Statalist that approaches a similar problem, but the solution presented there involves changing variable names based on their numeric position, and I expect this to change over different excel files sent to me. I can not guarantee that file and sheet will always be variables number 11 and 12. A stronger assumption is that they will always be the last 2 variables, but even on the post mentioned above, I could not find information on how to code to change the name of the "last 2 variables" of a dataset.
- I could ask the person sending me the excel files to add a variable with the name of the sheet on each sheet of the excel file, but there is resistance for this change. It doesn't seem like it's going to happen.
Immediately after xls2dta:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str9 AllSECONDFARMSAREESITIMATED str6(B C) str9 D str23 file str10 sheet "load time" "ORIGIN" "GROUP#" "# to load" "file_path\file_name.xls" "sheet_name" "7AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name" "8AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name" "6AM" "B" "Y" "20" "file_path\file_name.xls" "sheet_name" end
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str9 load_time str6(ORIGIN GROUP_) str9 __to_load str23 file_path_file_name_xls str10 sheet_name "7AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name" "8AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name" "6AM" "B" "Y" "20" "file_path\file_name.xls" "sheet_name" end
Best;
0 Response to Renaming variables according to position
Post a Comment