Hi,
I am working with a household survey for 2022, where each row is an individual. I have information on demographics as well as employment variables. I want to simulate each individual's employment status and sector for 2024. For simplicity, assume we are working with 2 sectors. What I have done so far is create the following categorical variable (I called it sector_all): 1 if employed in sector A, 2 if employed in sector B, 3 if unemployed, and 4 if out of the labor force. Using this as the dependent variable, I ran the following multinomial logit regression:
mlogit sector_all gender married children indigenous c.age i.educ i.rural hh_size i.region
which I then used to predict the probability that each individual falls in each of the four categories of sector_all:
predict p1 p2 p3 p4, pr
Now, I would like to use these probabilities to create a simulated version of sector_all for 2024. The caveat is that I would like the distribution of workers in 2024 to follow macro growth data for each sector. Let's imagine that sector A is projected to grow by 5% over that period and sector B is projected to shrink by 3%; then I would like the number of workers in sectors A and B to reflect those growth rates.
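For reference, ignoring the growth targets for a moment, a single unconstrained draw from the predicted probabilities might look like the sketch below. The names u_draw and sim_sector and the seed value are made up for illustration; the only things taken from above are p1 through p4. It picks a category by counting how many cumulative-probability thresholds the uniform draw has passed, and it does nothing to enforce the sector totals.

```stata
* Sketch only: one multinomial draw per person from p1-p4 (no growth constraint)
set seed 12345                      // arbitrary seed, for reproducibility only
gen double u_draw = runiform()      // hypothetical variable name

* Category = 1 + number of cumulative thresholds at or below the draw.
* Rows with missing predictions are left missing.
gen byte sim_sector = 1 + (u_draw >= p1) + (u_draw >= p1 + p2) ///
    + (u_draw >= p1 + p2 + p3) if !missing(p1, p2, p3)

tab sim_sector
```

With a large sample, the shares in sim_sector should sit close to the average predicted probabilities, which is why this alone will not hit the macro targets.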
I am having a lot of trouble finding a way to impose these sector totals. So far, I have obtained, for each person, the highest probability across all four categories and the category it corresponds to (i.e. the most likely sector they would move to); see my code below:
egen highest_p = rowmax(p1-p4) /*Highest probability*/
forval i = 1/4 {
    /*Flag the category whose probability equals the row maximum*/
    gen aux`i' = `i' if p`i' == highest_p
}
egen pred_sector_all = rowmax(aux*) /*Predicted sector*/
I have tried generating a random number from a uniform distribution, comparing it to this probability, and deciding whether an individual moves based on that comparison, but it never converges to the numbers I need.
gen sector_all2024 = sector_all
gen u = .
loc y_sectorA = 0
loc y_sectorB = 0
/*Redraw until both simulated counts match their targets*/
while `pred_sectorA' != `y_sectorA' | `pred_sectorB' != `y_sectorB' {
    replace sector_all2024 = sector_all
    replace u = runiform()
    replace sector_all2024 = pred_sector_all if u > highest_p
    count if sector_all2024 == 1
    loc y_sectorA = r(N)
    count if sector_all2024 == 2
    loc y_sectorB = r(N)
}
(here pred_sectorA and pred_sectorB are the target numbers of workers in each sector after applying the growth rates mentioned above)
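To make the targets concrete, they could be computed roughly as follows. This is a sketch under my own assumptions: it uses unweighted 2022 head counts from sector_all and the illustrative 5% and -3% growth rates, and it ignores survey weights entirely.

```stata
* Sketch only: target 2024 head counts from unweighted 2022 counts
count if sector_all == 1
local pred_sectorA = round(r(N) * 1.05)   // sector A assumed to grow 5%

count if sector_all == 2
local pred_sectorB = round(r(N) * 0.97)   // sector B assumed to shrink 3%

display "Target A: `pred_sectorA'   Target B: `pred_sectorB'"
```

Rounding matters here because the loop above compares the targets to integer counts; a non-integer target could never be matched exactly.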
Any ideas? Any help would be much appreciated.