Hi,
I am working with a Household survey for 2022, where each row is an individual. I have information on demographics as well as employment variables. I want to simulate each individual's employment status and sector for 2024. For simplicity, assume we are working with 2 sectors. What I have done so far is to create the following categorical variable (I called it sector_all): 1 if employed in sector A, 2 if employed in sector B, 3 if unemployed, and 4 if out of the labor force. Using this as a dependent variable, I run the following multinomial logit regression:
mlogit sector_all gender married children indigenous c.age i.educ i.rural hh_size i.region
which I then used to predict the probability that each individual falls in each of the four categories of sector_all:
predict p1 p2 p3 p4, pr
Now, I would like to use these probabilities to create a simulated version of sector_all, but for 2024. The caveat is that I would like the distribution of workers in 2024 to follow macro growth data for each sector. Let's imagine that sector A is projected to grow by 5% over that period and sector B is projected to shrink by 3%; I would then like the number of workers in sectors A and B to reflect those growth rates.
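To make the targets concrete, here is a minimal sketch of the arithmetic, assuming unweighted head counts and the example rates above (+5% for A, -3% for B). The toy data are a hypothetical stand-in for the survey; the locals pred_sectorA and pred_sectorB hold the resulting target counts.

```stata
* Toy stand-in for the 2022 survey (hypothetical counts):
* 400 in sector A, 300 in sector B, 100 unemployed, 200 out of the labor force
clear
set obs 1000
gen byte sector_all = cond(_n <= 400, 1, cond(_n <= 700, 2, cond(_n <= 800, 3, 4)))

* Target 2024 head counts = 2022 head counts scaled by the growth rates
count if sector_all == 1
local pred_sectorA = round(r(N) * 1.05)   // 400 * 1.05 = 420
count if sector_all == 2
local pred_sectorB = round(r(N) * 0.97)   // 300 * 0.97 = 291

display "Target A: `pred_sectorA'   Target B: `pred_sectorB'"
```

With survey weights, the counts would presumably need to be weighted totals rather than r(N); that is not handled in this sketch.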
I am having a lot of trouble finding a way to do this. So far, I have obtained, for each person, the highest probability across all four categories and the category it corresponds to (i.e. the most likely sector they would move to); see my code below:
egen highest_p = rowmax(p1-p4) /* Highest probability */
forval i = 1/4 {
    gen aux`i' = `i' if p`i' == highest_p
}
egen pred_sector_all = rowmax(aux*) /* Predicted category (the highest-numbered one if tied) */
I have tried generating a random number from a uniform distribution, comparing it to this probability, and deciding whether an individual moves based on that comparison, but the loop never converges to the numbers I need:
gen sector_all2024 = sector_all
gen u = .
loc y_sectorA = 0
loc y_sectorB = 0
* Redraw until the simulated counts equal both targets exactly
while `pred_sectorA' != `y_sectorA' | `pred_sectorB' != `y_sectorB' {
    replace sector_all2024 = sector_all
    replace u = runiform()
    replace sector_all2024 = pred_sector_all if u > highest_p
    count if sector_all2024 == 1
    loc y_sectorA = r(N)
    count if sector_all2024 == 2
    loc y_sectorB = r(N)
}
(Here pred_sectorA and pred_sectorB are locals holding the target number of workers in each sector, obtained by applying the growth rates mentioned above.)
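For comparison, a single uniform draw per person can also be mapped to one of the four categories through the cumulative probabilities, instead of being compared with highest_p alone. The sketch below is only an illustration under assumptions: p1-p4 are constant, hypothetical stand-ins for the mlogit predictions, and sim_sector_all is a made-up variable name.

```stata
* Hypothetical stand-ins for the predicted probabilities (each row sums to 1)
clear
set seed 2024
set obs 1000
gen double p1 = 0.40
gen double p2 = 0.30
gen double p3 = 0.10
gen double p4 = 1 - p1 - p2 - p3

* One uniform draw per person, mapped to a category by where it falls
* in the cumulative distribution: (0, p1], (p1, p1+p2], and so on
gen double u = runiform()
gen byte sim_sector_all = 1 + (u > p1) + (u > p1 + p2) + (u > p1 + p2 + p3)

tabulate sim_sector_all
```

Counts produced this way fluctuate around their expected values from one draw to the next, so on their own they would not be expected to match the targets exactly.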
Any ideas?? Any help would be much much appreciated.