I am testing how well Stata 15's fmm command can estimate the parameters of a zero-inflated distribution of a proportion. To do so, I simulate data with known parameters and then try to recover them. FMM does a good job of recovering the latent class marginal probabilities, yielding estimates of 38.8% for class 1 (true probability = 39%) and 61.2% for class 2 (true probability = 61%). It captures the size of the point mass at 0 well, but it seems to have a hard time recovering the parameters of the proportion within the (0, 1) interval, which is generated through a logistic transform. In contrast, a simple GLM fitted to the class 2 data within the (0, 1) interval successfully recovers the slope parameter b (estimate = 0.3903188; true value = 0.4).
In particular, the code I am running is:
Code:
clear all
set more off
set obs 2000
set seed 12345

// generate class indicator
gen class = inrange(_n, 1, 780)*0 + /// // 39% in class 1
    inrange(_n, 781, 2000)*1 // 61% in class 2

// set parameters
scalar mu = -0.1
scalar sx = 0.3
scalar se = 0.1
scalar b = 0.4

// generate random Normal variables
gen x = rnormal(0, sx)
gen e = rnormal(0, se)

// generate simulated series for Y
gen y = 0 if class == 0
replace y = 1/(1 + exp(-(mu + b*x + e))) if class == 1

// plot the ys versus the x
twoway scatter y x, by(class) name(y_by_x, replace)
histogram y, frequency by(class) width(0.03) fcolor(forest_green%50) name(y_by_class, replace)
histogram y, frequency width(0.03) fcolor(navy%50) name(y_hist, replace)

// estimate parameters using known class and single GLM
glm y x if class == 1, family(binomial) link(logit)

sort y

// use FMM with a pointmass at 0 and a GLM to estimate the parameters
fmm, difficult : (pointmass y, value(0)) ///
    (glm y x, family(binomial) link(logit))

predict exp_y*
predict pr*, classposteriorpr
format %4.3f pr*

estat lcprob
estat lcmean

// compute the predicted values
gen y_hat = pr1*0 + pr2*exp_y1

// summarize the dependent variable and its fitted values
su y y_hat exp_y*
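For reference, the data-generating process that the simulation code implements can be written out as below. This is only a restatement of the code, not an additional model; the index i and the symbol c_i (the value of the `class` variable) are notation assumed here for illustration. Note that `class == 0` in the code is what the text calls class 1 (the point mass, 780 of 2000 observations), and `class == 1` is what the text calls class 2 (1220 of 2000 observations).

```latex
\begin{align*}
x_i &\sim \mathcal{N}(0,\ 0.3^2), \qquad e_i \sim \mathcal{N}(0,\ 0.1^2), \\
y_i &=
\begin{cases}
0, & c_i = 0 \quad (i = 1, \dots, 780), \\[4pt]
\dfrac{1}{1 + \exp\!\bigl(-(\mu + b\,x_i + e_i)\bigr)}, & c_i = 1 \quad (i = 781, \dots, 2000),
\end{cases} \\
\mu &= -0.1, \qquad b = 0.4 .
\end{align*}
```

For observations with c_i = 1, this is equivalent to logit(y_i) = mu + b*x_i + e_i, so the noise term e enters inside the logistic transform.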