Hi,
I am working with a household survey for 2022, where each row is an individual. I have information on demographics as well as employment variables. I want to simulate each individual's employment status and sector for 2024. For simplicity, assume we are working with two sectors. What I have done so far is create the following categorical variable (I called it sector_all): 1 if employed in sector A, 2 if employed in sector B, 3 if unemployed, and 4 if out of the labor force. Using this as the dependent variable, I run the following multinomial logit regression:
mlogit sector_all gender married children indigenous c.age i.educ i.rural hh_size i.region
which I then used to predict the probability that each individual falls in each of the four categories of sector_all:
predict p1 p2 p3 p4, pr
Now I would like to use these probabilities to create a simulated version of sector_all for 2024. The caveat is that I would like the distribution of workers in 2024 to follow macro growth data for each sector. Let's imagine that sector A is projected to grow by 5% over that period and sector B is projected to shrink by 3%; I would then like the number of workers in sectors A and B to reflect those growth rates.
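To make the target concrete, the 2024 headcounts implied by those growth rates could be computed along the following lines. This is only a sketch: it assumes unweighted counts (survey weights, if any, are ignored) and rounds to whole persons. The local names pred_sectorA and pred_sectorB match the targets used in the loop further down.

```stata
* Sketch: target 2024 headcounts from 2022 counts and the assumed growth rates
count if sector_all == 1
local pred_sectorA = round(r(N) * 1.05)   // sector A: +5% (assumed rate)
count if sector_all == 2
local pred_sectorB = round(r(N) * 0.97)   // sector B: -3% (assumed rate)
display "Target A: `pred_sectorA'   Target B: `pred_sectorB'"
```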
I am having a lot of trouble finding a way to do this. So far, I have obtained, for each person, the highest probability across all four categories and the category it corresponds to (i.e. the most likely sector they would move to); see my code below:
egen highest_p = rowmax(p1-p4) /* Highest probability */
/* Loop over the four categories only; p5-p8 do not exist */
forval i = 1/4 {
    gen aux`i' = `i' if p`i' == highest_p
}
egen pred_sector_all = rowmax(aux*) /* Predicted sector */
I have tried generating a random number from a uniform distribution, comparing it to this probability, and deciding whether an individual moves based on that comparison, but it never converges to the numbers I need:
gen sector_all2024 = sector_all
gen u = .
loc y_sectorA = 0
loc y_sectorB = 0
/* Redraw until both simulated counts match the targets */
while `pred_sectorA' != `y_sectorA' | `pred_sectorB' != `y_sectorB' {
    replace sector_all2024 = sector_all
    replace u = runiform()
    replace sector_all2024 = pred_sector_all if u > highest_p
    count if sector_all2024 == 1
    loc y_sectorA = r(N)
    count if sector_all2024 == 2
    loc y_sectorB = r(N)
}
(Here pred_sectorA and pred_sectorB are the target numbers of workers in each sector after applying the growth rates mentioned above.)
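A loop like this has to hit two exact counts purely by chance, so it may effectively never terminate. One alternative that reaches the targets by construction is a rank-based assignment: give sector A to the pred_sectorA people with the highest p1, then sector B to the pred_sectorB remaining people with the highest p2, and draw the rest between categories 3 and 4. This is a different technique from the draw-and-compare loop, not a fix of it, and it is only a sketch. The names sim2024, tie and done are hypothetical, the seed is arbitrary, and it assumes both targets are at least 1 and that enough unassigned people remain to fill sector B.

```stata
* Sketch only: reach fixed sector targets in one pass, without a while loop.
* Assumes p1-p4 from -predict- and the locals pred_sectorA / pred_sectorB exist.
set seed 12345                         // arbitrary seed, for reproducibility
gen double tie = runiform()            // random tie-breaker (hypothetical name)
gen byte sim2024 = .                   // simulated 2024 status (hypothetical name)

* Sector A: the pred_sectorA people with the highest p1
gsort -p1 tie
replace sim2024 = 1 in 1/`pred_sectorA'

* Sector B: among those still unassigned, the pred_sectorB people with the highest p2
gen byte done = !missing(sim2024)
gsort done -p2 tie
replace sim2024 = 2 in 1/`pred_sectorB'

* Everyone else: draw unemployed (3) vs. out of the labor force (4)
* from p3 and p4 rescaled to sum to one
replace sim2024 = cond(runiform() < p3/(p3 + p4), 3, 4) ///
    if missing(sim2024) & !missing(p3, p4)

count if sim2024 == 1                  // should equal the sector A target
count if sim2024 == 2                  // should equal the sector B target
drop tie done
```

Apart from tie-breaking, the sector assignment here is deterministic, and it ignores each person's 2022 status, so whether it is an acceptable simulation depends on what the exercise needs.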
Any ideas? Any help would be much appreciated.