Hello!

I am running a probit regression in Stata. The dependent variable is binary with 1 for promoted and 0 for not promoted. The variable of interest is an indicator variable for whether or not the individual is African American (1 = African American, 0 = other).

There are many controls in the regression, including a factor variable for department and a performance score variable (an integer from 1 to 4, treated as a factor). I have found that nobody with a performance score of 3 or 4 is ever promoted, and these two scores together account for roughly half of the population. Company documents confirm that employees scored 3 or 4 are not eligible for promotion. There are also several small departments that perfectly predict promotion or non-promotion, simply because they have only 2 or 3 employees, all of whom were promoted or all of whom were not. When Stata runs the regression, it drops the perfectly predicted observations from the estimation sample.

When I calculate the average marginal effect of being African American, I would like to make the following adjustments:
  1. Include individuals with a score of 3 or 4, assigning them a marginal effect of 0. These individuals could not have been promoted, so being African American has no effect on their promotion probability.
  2. Continue to exclude individuals from the small departments. It is possible that a 4th employee would have had a different outcome, so we cannot truly know the marginal effect of being African American there.
Are there any theoretical issues with this approach? If not, how could I go about implementing it, at either the probit or the margins stage (whichever is more straightforward)?
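
To make the adjustment concrete, here is a rough, untested sketch of what I have in mind. The variable names (promoted, aa, score, dept, x1, x2) are placeholders rather than my real variables. It also assumes that adding the 3s and 4s back with an effect of exactly 0 amounts to rescaling the estimation-sample AME by the share of observations that were actually used, and it treats that share as fixed when rescaling the standard error.

```stata
* Placeholder variable names: promoted, aa, score, dept, x1, x2
probit promoted i.aa i.score i.dept x1 x2

* Flag the observations Stata actually used in estimation
generate byte insample = e(sample)

* Observations to add back with a marginal effect of 0:
* score of 3 or 4, dropped from estimation, nonmissing on all model variables
generate byte addzero = inlist(score, 3, 4) & !insample ///
    & !missing(promoted, aa, dept, x1, x2)

count if insample
scalar n_est = r(N)
count if addzero
scalar n_zero = r(N)

* AME of aa over the estimation sample only (the margins default)
margins, dydx(aa) post
scalar ame_est = _b[1.aa]
scalar se_est  = _se[1.aa]

* Weighted average: n_est observations at ame_est, n_zero observations at 0
scalar w = n_est / (n_est + n_zero)
display "Adjusted AME: " w * ame_est
display "Adjusted SE (sample share treated as fixed): " w * se_est
```

In this sketch the small departments stay excluded automatically, because their employees are neither in the estimation sample nor flagged by addzero (unless they scored 3 or 4, in which case they would get an effect of 0 under adjustment 1 anyway).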

Thanks!
Andy