Predicting Probability

Hey everyone,

First of all, I'm not an expert in Stata, which is why I need your help with my bachelor thesis.
The problem I have is rather complex.
I need to compare scores from different rating agencies.

I have a data sample which contains scores from three different ratings (let's call them Rating A, Rating B and Rating C). Each of these scores varies from company to company in this data set (companies are identified through isin).
I've already analysed the difference in means via ttest.
What I want to do now is predict how likely it is that the score of one firm is in the same quartile (only 25%, 50% and 75%) for two ratings.

In detail:
For example: Company X has a score of 5 for Rating A and is therefore in the 25% quartile. What's the probability that Company X is in the 25% quartile for Rating B as well?
I don't have any independent variables that determine the score; I literally only have the score. Because of this, I don't know whether a probit/logit regression is the correct way to predict these probabilities.
What I've already done is:
- For each of Rating A, B and C, the mean of the company score (e.g. for Rating B: "bysort isin: egen company_mean_B"). Now I have a company-specific score for each rating and each company.
- For each rating, the 25th, 50th and 75th percentile as a new variable (e.g. for Rating A: ratingA_25, ratingA_50, ratingA_75).
- Dummy variables for each of these percentiles, e.g. ratingA_dummy_25 for the 25th percentile. The dummy has the value 1 = "yes" (company_mean_a <= ratingA_25) and 0 = "no" (company_mean_a > ratingA_25).
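The three preparation steps above can be sketched in Stata roughly as follows. This is only a minimal sketch on made-up toy data: the raw score variables scoreA and scoreB and the helper firm_tag are hypothetical names, while company_mean_a, company_mean_B, ratingA_25 and ratingA_dummy_25 are spelled as in the list above. It also assumes that the cut-offs should be computed over one value per company rather than over every row.

```stata
* Toy data (made-up numbers, for illustration only): 4 companies, 2 rows each
clear
input str12 isin scoreA scoreB
"AAA"  4 38
"AAA"  6 42
"BBB"  9 30
"BBB" 11 34
"CCC" 14 50
"CCC" 16 54
"DDD" 19 60
"DDD" 21 64
end

* Step 1: company-specific mean score per rating
bysort isin: egen company_mean_a = mean(scoreA)
bysort isin: egen company_mean_B = mean(scoreB)

* Flag one row per company so that each company counts once in the cut-offs
* (firm_tag is a hypothetical helper variable)
egen firm_tag = tag(isin)

* Step 2: 25th, 50th and 75th percentile of the company means for Rating A
quietly summarize company_mean_a if firm_tag, detail
generate ratingA_25 = r(p25)
generate ratingA_50 = r(p50)
generate ratingA_75 = r(p75)

* Step 3: dummy = 1 ("yes") if the company mean is at or below the 25th percentile
generate byte ratingA_dummy_25 = company_mean_a <= ratingA_25 if !missing(company_mean_a)

list isin company_mean_a ratingA_25 ratingA_dummy_25 if firm_tag
```

The same pattern would be repeated for Rating B and Rating C.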
I don't know if I need all these variables, but I did some research on running a logit regression in Stata and was told that you need a yes/no outcome in order to predict how likely it is to receive the "yes".
What confuses me is that I don't clearly have a dependent and an independent variable for a logit regression, since the scores don't influence each other.

Or is the dummy variable with the "yes"/"no" decision the dependent one, so that the syntax should be something like: logit ratingA_dummy_25 company_mean_B
This would give the following output (changed a little for privacy protection):
----------------------------------------------------------------------------------
ratingA_dummy_25 |      Coef.   Std. Err.        z    P>|z|   [95% Conf. Interval]
-----------------+----------------------------------------------------------------
  company_mean_B |    -.14334    .0008444  -161.55    0.000   -.1374021  -.1341907
           _cons |   6.110663    .0443111   136.12    0.000     6.02384   6.197777
----------------------------------------------------------------------------------

Do I interpret the negative coefficient as: "If the company_mean_B score increases by one [unit], the probability of receiving the answer "yes" for ratingA (i.e. that the company's mean score for Rating A is at or below the 25% quartile) decreases by 14%"?
I'm quite unsure about this regression, since I don't know whether this is the correct approach and I don't want to write up a wrong analysis.
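For reference, a logit coefficient is a change in the log-odds of "yes", not in the probability itself, so -.14334 does not translate directly into a 14% drop in probability. On the odds scale, exp(-.14334) is roughly 0.87, i.e. each additional unit of company_mean_B multiplies the odds of "yes" by about 0.87. A minimal sketch of how this could be checked in Stata, assuming simulated data (all numbers below are made up; only the variable names follow the post):

```stata
* Simulated data, for illustration only (not the real ratings)
clear
set seed 12345
set obs 500
generate company_mean_B = rnormal(40, 10)
generate company_mean_a = 0.5*company_mean_B + rnormal(0, 5)

* Dummy = 1 if the Rating A mean is at or below its 25th percentile
quietly summarize company_mean_a, detail
generate byte ratingA_dummy_25 = company_mean_a <= r(p25)

* Coefficients are reported on the log-odds scale
logit ratingA_dummy_25 company_mean_B

* Odds ratio: factor by which the odds of "yes" change per one-unit increase
display "Odds ratio: " exp(_b[company_mean_B])

* Average marginal effect: change in the probability of "yes" per one-unit increase
margins, dydx(company_mean_B)

* Predicted probability of "yes" for every observation
predict p_hat, pr
summarize p_hat
```

The margins output is the quantity that can be read as a change in probability; it depends on where in the distribution a company sits, which is why a single percentage cannot be read off the coefficient.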

I don't even know whether regression is the way to do this. If I'm correct, or if there is another approach to this particular problem, I would be grateful for some information about it.
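One possible alternative that needs no regression would be to cross-tabulate quartile membership for two ratings and read the conditional shares directly from the table. The following is only a sketch under assumptions: the data are simulated with one row per company, and quartA, quartB and same_quartile are hypothetical variable names.

```stata
* Simulated data, one row per company (made-up numbers, for illustration only)
clear
set seed 12345
set obs 400
generate company_mean_a = rnormal(50, 10)
generate company_mean_B = 0.6*company_mean_a + rnormal(20, 8)

* Assign each company to a quartile group (1 = lowest, 4 = highest) per rating
xtile quartA = company_mean_a, nquantiles(4)
xtile quartB = company_mean_B, nquantiles(4)

* Row percentages: share of companies in each Rating B quartile,
* given their Rating A quartile
tabulate quartA quartB, row

* Overall share of companies that fall in the same quartile for both ratings
generate byte same_quartile = quartA == quartB
summarize same_quartile
```

In the first row of the table, the cell for quartB == 1 would be the observed share of companies in the lowest Rating A quartile that are also in the lowest Rating B quartile.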

If you need further details, please let me know.

Thank you in advance.

Niklas
