Dear Statalist-community,
As part of a paper for my studies, I am doing an empirical analysis.

So far I have only worked with the basics of Stata, and after trying for many days I am a little stuck: creating some of the empirical variables is a bit overwhelming for my current level of knowledge and practice.
In sum, I have problems creating three variables (#1, #3, #6), and for three other variables (#2, #4, #5) I would be thankful for a quick confirmation that the logic is coherent.

In case I did something wrong in asking my questions, I apologise. I tried to follow the FAQ as closely as possible (except for the dataset attachment*).

The construction of #1 is the most important to me, so if you focus on only one, I would be especially thankful for help with that one!

I analyse companies from 28 countries from 2017-2019.
Country-Identifier = cid, Firm-identifier = fid

I know that these are many questions, but nevertheless hope that some of you could help me with one or two variables.
A very, very huge thank you in advance!
Best regards, Elena
______________________________________________
*I would have liked to attach the dataset, but it is quite a large dataset in .xlsx format, which, according to the FAQ, is not wanted. I am not sure how to simplify the data otherwise, as quite a few variables are constructed from the original dataset to get to the variables needed above.
Also, to give the whole picture, I would have needed to add the whole code as a .do file. As the construction of several more variables was necessary to get there, I don't want to make it too long and complicated, and would be especially thankful for conceptual advice on the computation.
If the attachment is welcome regardless, I would be happy to add it.

_______________________________________________

#1 BTC Proxy:
"I take the mean of all values of PermBTD per country and year." I have coded it like this:

Code:
bys cid Year : egen PermBTD_mean= mean(PermBTD)
From here, I am a little lost:

"On the basis of PermBTD_mean, a rank to each country and year is assigned."
I tried the following, but it did not really assign ranks to the variable.
Code:
bys cid Year: egen PermBTD_rank = rank(PermBTD_mean)
And from here, I am completely lost:
"Afterwards, descending rankings are used so that the highest value in a particular year takes a value of 0 and the lowest value in a year takes a value of n-1, where n is the number of countries that are included in that year.
Then, to scale these rankings so that they range between 0 and 1, we divide by n-1."
I will call this variable PermBTD_desc
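
A minimal sketch of how the ranking and scaling steps quoted above might be coded, not a definitive solution. It assumes a firm-level dataset with cid, Year and PermBTD; the toy values and the helper names cy_tag and n_countries are hypothetical and only serve the illustration. The main difference from my attempt is that the ranking is done by Year (across countries), using one tagged observation per country-year, instead of by cid Year.

```stata
* Toy data (hypothetical values): 3 countries, 2 years, 2 firms per country-year
clear
input cid Year PermBTD
1 2017 0.10
1 2017 0.30
2 2017 0.50
2 2017 0.70
3 2017 0.00
3 2017 0.20
1 2018 0.90
1 2018 0.70
2 2018 0.10
2 2018 0.30
3 2018 0.40
3 2018 0.60
end

* Country-year mean of PermBTD
bysort cid Year: egen PermBTD_mean = mean(PermBTD)

* Flag one observation per country-year so each country counts once in the ranking
egen byte cy_tag = tag(cid Year)

* Rank countries within each year; the field option gives rank 1 to the highest mean
bysort Year: egen PermBTD_rank = rank(PermBTD_mean) if cy_tag, field

* Number of countries included in the year
bysort Year: egen n_countries = total(cy_tag)

* Highest mean -> 0, lowest mean -> n-1, then divide by n-1 to get the range [0, 1]
gen PermBTD_desc = (PermBTD_rank - 1) / (n_countries - 1)

* Copy the country-year value to every firm in that country-year
bysort cid Year (PermBTD_desc): replace PermBTD_desc = PermBTD_desc[1]
```

In this toy example, country 2 has the highest mean in 2017 and gets 0, country 3 has the lowest and gets 1. Ties would share a rank under the field option, which may or may not be what the paper intends.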

"The final scaled rank is calculated as the average rank over a three year period."
Here, I would just proceed like this:
Code:
bysort fid Year: egen BTC = mean(PermBTD_desc)

#2
EM 1: "Country-year median ratio of the firm-level standard deviation of OPPL_scale divided by the firm-level standard deviation of CFO_scale. Standard deviations are calculated using data from t-1 to t."
Have I computed this the correct way, or do I need to specify something about the standard deviations?

I have coded the following so far and just want to make sure it matches the definition:
Code:
bys fid : egen OPPL_sd = sd(OPPL_scale)
bys fid : egen CFO_sd = sd(CFO_scale)
bys fid: gen temp = OPPL_sd/CFO_sd
bys cid Year: egen EM1 = median(temp)
drop temp
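
One possible reading of "data from t-1 to t" is a two-observation window per firm, whereas the code above takes the standard deviation over all years of a firm. The following is only a sketch under that two-year assumption; it uses the fact that the sample standard deviation of two values equals their absolute difference divided by sqrt(2). The toy values and the names OPPL_sd2, CFO_sd2, sd_ratio and EM1_roll are hypothetical.

```stata
* Toy panel (hypothetical values): 2 firms in 1 country, 3 years
clear
input fid cid Year OPPL_scale CFO_scale
1 1 2017 0.10 0.20
1 1 2018 0.30 0.60
1 1 2019 0.20 0.40
2 1 2017 0.05 0.10
2 1 2018 0.15 0.50
2 1 2019 0.15 0.30
end

* Declare the panel so that the lag operator L. works within firm
xtset fid Year

* With two observations x_t and x_{t-1}, the sample SD is |x_t - x_{t-1}| / sqrt(2)
gen OPPL_sd2 = abs(OPPL_scale - L.OPPL_scale) / sqrt(2)
gen CFO_sd2  = abs(CFO_scale  - L.CFO_scale)  / sqrt(2)

* Firm-year ratio; missing in each firm's first year and when CFO_sd2 is zero
gen sd_ratio = OPPL_sd2 / CFO_sd2 if CFO_sd2 > 0 & !missing(CFO_sd2)

* Country-year median of the firm-level ratio
bysort cid Year: egen EM1_roll = median(sd_ratio)
```

Under this reading the first sample year has no value, because there is no t-1 observation to pair it with.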

#3
EM 2: "Country-year's spearman correlation between d_DACR_ABS_meanscale and d_CFO_meanscale. (The correlations are calculated cross-sectionally at the country-year level.)"
I constructed the following: (To understand the background of the variables, I added the construction of d_DACR_ABS_meanscale and d_CFO_meanscale. TOAS = total assets of a firm's balance sheet.)

Code:
sort fid Year
gen TOAS_mean = (TOAS+l.TOAS)/2
gen d_DACR_ABS_meanscale = d.DACR_ABS/TOAS_mean
gen d_CFO_meanscale = d.CFO/TOAS_mean

gen EM2 = .
levelsof cid, local(lvs)
foreach X in `lvs' {
pwcorr d_DACR_ABS_meanscale d_CFO_meanscale if cid ==`Z'
replace EM2 = r(rho) if cid == `Z'
}
Here, the error message is "invalid syntax", so something must have gone wrong.
Also, I have only included the country level, but not the year level at the moment, as I'm not sure how to.

At the moment I cannot check because of the error message, but would this be a way? (The possible changes are the added levelsof Year line and the extended foreach list.)
Code:
levelsof cid, local(lvs)
levelsof Year, local(year)
foreach X in `lvs year' {
pwcorr delta_DACR_ABS_meanscale delta_CFO_meanscale if Cid ==`Z'
replace EM2 = r(rho) if Cid == `Z'
}
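
A hedged sketch of how the country-year loop might look. Two observations about the attempts above: the loop macro is named X but the body refers to `Z', which is empty, so the condition "if cid ==" is incomplete, which would explain "invalid syntax"; and pwcorr returns a Pearson correlation, while spearman returns the rank correlation in r(rho). The toy data are simulated and hypothetical, and the local names countries, years, c and y are my own choice.

```stata
* Simulated toy data (hypothetical): 2 countries x 2 years x 10 firms each
clear
set seed 12345
set obs 40
gen cid  = 1 + (_n > 20)
gen Year = 2018 + mod(ceil(_n/10) - 1, 2)
gen d_CFO_meanscale = rnormal()
gen d_DACR_ABS_meanscale = -0.5*d_CFO_meanscale + rnormal()

gen EM2 = .
levelsof cid, local(countries)
levelsof Year, local(years)

* Nested loops: one Spearman correlation per country-year cell
foreach c of local countries {
    foreach y of local years {
        * capture skips cells with too few observations instead of stopping the loop
        capture spearman d_DACR_ABS_meanscale d_CFO_meanscale if cid == `c' & Year == `y'
        if _rc == 0 {
            replace EM2 = r(rho) if cid == `c' & Year == `y'
        }
    }
}
```

Cells in which spearman fails keep a missing EM2, so it may be worth checking how many country-years that affects.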

#4
EM 3: "Country-year's median ratio of the absolute value of DACR divided by the absolute value of CFO".
Here, I am just asking whether you would consider this correct.

Code:
bys cid Year: g DACRdCFO = abs(DACR)/abs(CFO)
bys cid Year: egen EM3 = median(DACRdCFO)

#5
EM 4: "Number of SL divided by the number of SP for each country-year."
For this variable, I also just want to make sure that it is constructed correctly.

Code:
bysort cid Year: g EM4 = SL/SP
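
If SL and SP are firm-year indicator variables (1 if the firm-year counts as SL or SP, 0 otherwise), which is only an assumption here, then "number of SL divided by number of SP" would need counts per country-year rather than a row-wise division. A sketch with hypothetical toy data and the hypothetical names n_SL, n_SP and EM4_count:

```stata
* Toy data (hypothetical): SL and SP assumed to be 0/1 indicators per firm-year
clear
input cid Year SL SP
1 2017 1 0
1 2017 0 1
1 2017 0 1
1 2017 0 0
2 2017 1 0
2 2017 1 0
2 2017 0 1
end

* Count the flagged firm-years in each country-year
bysort cid Year: egen n_SL = total(SL)
bysort cid Year: egen n_SP = total(SP)

* Ratio of counts; left missing when a country-year has no SP observations
gen EM4_count = n_SL / n_SP if n_SP > 0
```

If SL and SP already hold country-year counts, the one-line division above is enough.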
#6
EMagg:
"For each country-year the four measures (EM1 EM2 EM3 EM4) are ranked such that a higher rank corresponds to a higher level of EM.
These ranks are converted into percentiles by subtracting 1 and dividing by n-1 where n is the number of countries in the sample in a given year.
(This removes any time effects from the variable)"

I am very lost with this one, to be honest. I have not found a way to code it at all and would be thankful for help constructing it from scratch.

"EM_agg is then created by averaging the country-year rankings for the four individual EM variables."
This last part I would simply compute as

Code:
bysort cid Year: gen EM_agg = (EM1+EM2+EM3+EM4)/4
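
A sketch of how the percentile ranks might be built before averaging, so that the average is taken over the scaled ranks rather than over the raw measures. It assumes the data have been reduced to one row per country-year and that a higher raw value means more EM for every measure; if a lower value indicates more EM for some measure, ranking on the negated variable would be needed instead. The toy values and the names n_countries, *_rank, *_pct and EM_agg_pct are hypothetical.

```stata
* Toy data (hypothetical values): one row per country-year, 3 countries, 2 years
clear
input cid Year EM1 EM2 EM3 EM4
1 2017 0.5 -0.9 0.2 1.0
2 2017 0.7 -0.5 0.4 3.0
3 2017 0.9 -0.7 0.1 2.0
1 2018 0.6 -0.8 0.3 1.5
2 2018 0.4 -0.6 0.5 2.5
3 2018 0.8 -0.4 0.2 0.5
end

* Number of countries per year (valid because there is one row per country-year)
bysort Year: gen n_countries = _N

foreach v of varlist EM1 EM2 EM3 EM4 {
    * Default egen rank(): lowest value gets rank 1, highest gets rank n
    bysort Year: egen `v'_rank = rank(`v')
    * Subtract 1 and divide by n-1 so the ranks run from 0 to 1
    gen `v'_pct = (`v'_rank - 1) / (n_countries - 1)
}

* Row-wise average of the four percentile ranks
egen EM_agg_pct = rowmean(EM1_pct EM2_pct EM3_pct EM4_pct)
```

Note that rowmean() averages over the non-missing measures only, whereas a plain sum divided by 4 would return missing as soon as one measure is missing.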


Again, thank you in advance to everyone who helps!