Dear all,

I'm trying to verify the estimation procedure proposed in Wooldridge (2019), "Correlated random effects models with unbalanced panels". I simulated a dataset and estimated a fractional response model on it (see slides, pp. 38-44), but the program does not recover the true parameter values. I have checked the code repeatedly and cannot find the cause.

The paper can be found here: https://www.statalist.org/forums/fil...tch?id=1384040
I followed the estimation procedure in the slides (p46): https://www.stata.com/meeting/chicag...wooldridge.pdf

The code I used is as follows:
Code:
clear
set seed 12345 //arbitrary seed, only so the draws are reproducible

//parameterize
local b0     "0.5"
local b     "0.2"
local phi1     "0.3"
local phi2     "0.2"
local phi3     "0.4"
local xi1    "-0.3"
local xi2    "0.1"
local xi3    "0.6"
local om1     "0.15"
local om2    "0.2"
local om3    "-0.1"
local tao    "0.6"

//draw a random independent variable
set obs 10000 //cross-section
gen id=_n

local T "3" //time periods
expand `T'
bys id: gen t=_n
sort id t
gen x=rnormal(1,2)


//randomly drop some obs to create an unbalanced panel
bys id: gen I=runiform()
keep if I<.5

tab t

//generate variables in the CRE
egen xbar=mean(x),by(id) //time-average
bys id: gen tobs=_N //number of time periods for each id
forvalues i=1/`T'{
    gen g`i'=tobs==`i'
} //number of time periods dummy indicator

gen cmean=`phi1'*g1+`phi2'*g2+`phi3'*g3+`xi1'*g1*xbar+`xi2'*g2*xbar+`xi3'*g3*xbar
gen cvar=exp(`tao'+`om1'*g1+`om2'*g2)

//draw one c per id, then merge it back onto the panel
preserve
collapse (mean) cmean cvar,by(id)
gen c=rnormal(cmean,sqrt(cvar))
tempfile c_draw //temporary file, avoids writing to the root of C:
save `c_draw'
restore

merge m:1 id using `c_draw', nogenerate keep(master match)

sort id t

//generate E(y)
gen y=normal(`b0'+`b'*x+c)

//estimate using the program in the slides (p46)
capture program drop frac_het

program frac_het
    version 15.1
    args llf xb zg
    quietly replace `llf'=$ML_y1*log(normal(`xb'*exp(-`zg')))+(1 - $ML_y1)*log(1 - normal(`xb'*exp(-`zg')))
    end

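//interactions of the number-of-periods dummies with the time average
forvalues i=1/`T'{
    gen g`i'xbar=g`i'*xbar
}

//g3 is the base category; g3xbar is kept because xi3 is nonzero in the DGP
//standard errors are clustered by id because observations within an id share c
ml model lf frac_het (y = x g1 g2 g1xbar g2xbar g3xbar) (g1 g2, nocons), vce(cluster id)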

ml max
After running the program, the true parameter values generally fall outside the 95% confidence intervals.
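
For reference, this is my understanding of the conditional mean implied by the simulation above. It is a sketch, not a statement of what the paper prescribes. It assumes that c is normal given the number-of-periods dummies and xbar, exactly as generated in the code. The symbols \mu_i, \sigma_i^2 and s_3 are my own shorthand, and \tau, \omega_r correspond to the locals tao and om1-om3.

```latex
\begin{aligned}
c_i \mid (g_i, \bar{x}_i) &\sim N(\mu_i, \sigma_i^2), \\
\mu_i &= \sum_{r=1}^{3} \phi_r g_{ir} + \sum_{r=1}^{3} \xi_r g_{ir} \bar{x}_i, \\
\sigma_i^2 &= \exp(\tau + \omega_1 g_{i1} + \omega_2 g_{i2}), \\
E(y_{it} \mid x_{it}, \bar{x}_i, g_i)
  &= \Phi\!\left( \frac{\beta_0 + \beta x_{it} + \mu_i}{\sqrt{1 + \sigma_i^2}} \right), \\
s_3 &= \sqrt{1 + \exp(\tau)} \quad \text{(the scale factor for the group with } g_{i3} = 1\text{)}.
\end{aligned}
```

The program fits normal(xb*exp(-zg)) with no constant in zg, so exp(zg) is normalized to 1 for the g3 group. If this reading is right, the mean-equation coefficients are identified only after division by s_3, which is about 1.68 when tao = 0.6. The coefficient on x would then be close to 0.2/1.68, roughly 0.119, rather than 0.2.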

In addition, I am unsure which of the number-of-periods indicators (g1, g2, g3) should enter the mean equation as explanatory variables, and likewise which should enter the variance equation. I may have misunderstood the setup, so any help with the program and the variable choice would be greatly appreciated!
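
To make the question about the indicators concrete, here is a sketch of how the dummies enter the mean of c under my DGP. It relies only on g1 + g2 + g3 = 1. Treating g3 as the base category is my own assumption, not something taken from the slides.

```latex
\begin{aligned}
\phi_1 g_{i1} + \phi_2 g_{i2} + \phi_3 g_{i3}
  &= \phi_1 g_{i1} + \phi_2 g_{i2} + \phi_3 (1 - g_{i1} - g_{i2}) \\
  &= \phi_3 + (\phi_1 - \phi_3) g_{i1} + (\phi_2 - \phi_3) g_{i2}.
\end{aligned}
```

With a constant in the mean equation, only two of the three dummies can enter, and their coefficients are differences relative to the base group. The three interactions g1*xbar, g2*xbar and g3*xbar are not collinear with the constant, and all three appear in the generated mean of c. The generated variance, exp(tao + om1*g1 + om2*g2), already uses g3 as its base.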

Many thanks in advance,
Ziwei