I have downloaded a dataset from the SCImago Journal Rank website. The downloaded data are unstructured, so I am using Stata to clean them up and rank the journals by area. The string variable "categories" holds, for each journal, every area it is listed in together with its quartile (Q1 to Q4) in that area. The objective is to rank journals within each area. For this, I need to create one variable per area, such as "Accounting", "Economics and Econometrics", "Strategy and Management", "Finance" and so on.
I have the following dataset.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input strL categories
"Economics and Econometrics (Q1)"
"Economics and Econometrics (Q3)"
"Accounting (Q1); Economics and Econometrics (Q1); Finance (Q1)"
"Economics and Econometrics (Q2)"
"Economics and Econometrics (Q3)"
"Economics and Econometrics (Q1)"
"Economics, Econometrics and Finance (miscellaneous) (Q1)"
"Accounting (Q3); Economics and Econometrics (Q4); Finance (Q1)"
"Economics and Econometrics (Q1)"
"Accounting (Q1); Economics and Econometrics (Q1); Finance (Q1); Strategy and Management (Q1)"
"Economics, Econometrics and Finance (miscellaneous) (Q3)"
"Economics and Econometrics (Q2)"
"Economics and Econometrics (Q3); Finance (Q1)"
"Economics and Econometrics (Q1); Social Sciences (miscellaneous) (Q1)"
"Anthropology (Q1); Arts and Humanities (miscellaneous) (Q1); Business and International Management (Q1); Economics and Econometrics (Q1); Marketing (Q1)"
"Economics and Econometrics (Q1); Industrial Relations (Q1)"
"Economics, Econometrics and Finance (miscellaneous) (Q1)"
"Business and International Management (Q1); Economics and Econometrics (Q4); Marketing (Q1)"
"Economics, Econometrics and Finance (miscellaneous) (Q2)"
"Finance (Q3); Strategy and Management (Q1)"
end
Also, since the actual dataset contains 126 areas (e.g., Accounting, Finance, Marketing, and so on), I have to create 126 new variables, one for each area. At present I create them manually, one area at a time. Is there a better way to create them?
Is it possible to extract each journal's ranking in each area directly from the string variable "categories", using the substr() function or regular expressions?
Code:
* Parse on "; " (semicolon plus space) so that later pieces carry no leading blank
split categories, parse("; ")
gen Accounting = ""
forvalues i = 1/5 {
    replace Accounting = "Accounting (Q1)" if categories`i' == "Accounting (Q1)"
    replace Accounting = "Accounting (Q2)" if categories`i' == "Accounting (Q2)"
    replace Accounting = "Accounting (Q3)" if categories`i' == "Accounting (Q3)"
    replace Accounting = "Accounting (Q4)" if categories`i' == "Accounting (Q4)"
}
Code:
gen EconomicsandEconometrics = ""
forvalues i = 1/5 {
    replace EconomicsandEconometrics = "Economics and Econometrics (Q1)" if categories`i' == "Economics and Econometrics (Q1)"
    replace EconomicsandEconometrics = "Economics and Econometrics (Q2)" if categories`i' == "Economics and Econometrics (Q2)"
    replace EconomicsandEconometrics = "Economics and Econometrics (Q3)" if categories`i' == "Economics and Econometrics (Q3)"
    replace EconomicsandEconometrics = "Economics and Econometrics (Q4)" if categories`i' == "Economics and Econometrics (Q4)"
}
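To illustrate the kind of solution I am hoping for, here is a minimal sketch that loops over a hand-typed list of areas and pulls the quartile straight out of "categories" with regexm() and regexs(). The three observations are copied from the example above (stored as str100 here for brevity). The area list and the variable names produced by strtoname() are my own choices for illustration only, and area names that contain parentheses, such as the "(miscellaneous)" ones, would need escaping before they could be used in the pattern.

```stata
* Sketch only: three rows copied from the -dataex- example
clear
input str100 categories
"Accounting (Q1); Economics and Econometrics (Q1); Finance (Q1)"
"Economics, Econometrics and Finance (miscellaneous) (Q1)"
"Finance (Q3); Strategy and Management (Q1)"
end

* Hand-typed area list (illustrative; the real data have 126 areas)
foreach a in "Accounting" "Economics and Econometrics" "Finance" "Strategy and Management" {
    * Turn the area label into a legal variable name, e.g. Strategy_and_Management
    local v = strtoname("`a'")
    * Prefix "; " so every area, including the first, is preceded by the separator;
    * this stops "Finance" from matching inside "... Econometrics and Finance (miscellaneous)"
    gen `v' = regexs(1) if regexm("; " + categories, "; `a' \((Q[1-4])\)")
}

list Accounting Economics_and_Econometrics Finance Strategy_and_Management
```

Each new variable holds just the quartile ("Q1" to "Q4") and is empty where the journal is not listed in that area, which seems easier to sort or tabulate than the full "Accounting (Q1)" text.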
Thank you.