Hi,

We are running an analysis that estimates promotion hazards for 13,534 workers of different races (White, Black, Asian, Hispanic) in a US federal agency. The regressions control for nearly perfectly observed productivity measures that are supposed to matter for promotions. Our results suggest statistically significant differences in promotion hazards, in the following order: Blacks < Hispanics < Asians < Whites. Blacks have up to a 70% lower hazard of promotion at the highest grade.

We want to check whether this disparity results from stereotypes and statistical discrimination. To do so, we estimate the same hazard models but replace the race categorical variable with race-level variables that may give rise to stereotypes, in particular (a) median income of each of the four races, (b) average educational level of each of the four races, and (c) GDP of race-origin countries. Most importantly, we do NOT have within-race variation in any of these three variables, so each variable takes only four values (one per race). The results suggest a strong positive relationship between these variables and promotion hazards.
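
To make the setup concrete, here is a minimal sketch in Python (NumPy only) of what the substitution does to the design matrix. The numbers are placeholders, not our data and not real statistics, and the names `race`, `D`, `x` and `placeholder_value` are made up for illustration. It is not our hazard model; it only checks that a variable with one value per race is an exact linear combination of the four race indicators.

```python
import numpy as np

races = ["White", "Black", "Asian", "Hispanic"]

# Placeholder values only (NOT real statistics): one number per race,
# standing in for something like median income.
placeholder_value = np.array([4.0, 1.0, 3.0, 2.0])

# Toy sample: 50 hypothetical workers per race.
race = np.repeat(np.array(races), 50)
n = race.shape[0]

# Full set of race indicators (one column per race, no separate intercept).
D = np.column_stack([(race == r).astype(float) for r in races])

# Race-level covariate: every worker gets the value attached to their race,
# so x is D times the vector of race-level values by construction.
x = D @ placeholder_value

# Design matrix of the replacement model: intercept plus the single covariate.
X_restricted = np.column_stack([np.ones(n), x])

rank_full = np.linalg.matrix_rank(D)
rank_restricted = np.linalg.matrix_rank(X_restricted)
# Adding the intercept and x to the indicators adds no new information.
rank_combined = np.linalg.matrix_rank(np.column_stack([D, X_restricted]))

print(rank_full, rank_restricted, rank_combined)
```

If this sketch is right, the replacement model spans a two-dimensional subspace of the four-dimensional space spanned by the race indicators, so it looks like a restricted version of the original specification rather than a model with new information.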

Question: Is it OK to do what we did? That is, replace a categorical variable that can take on four possible values (Blacks, Hispanics, Asians, Whites) with one variable, say median income for each of the four races?

Thank you so much.
Deepak