Dear Stata users,

I have a panel of 31,000 country pairs (panel variable) observed over 17 years (time variable). My dependent variable is a market share (that of exporting country i in the market of country j), so it is a fractional variable in [0, 1]. Note that 89% of the observations of this market share are zeros, because those country pairs have no trade relationship for the product I'm studying. As explanatory variables I have the traditional ones for gravity equations (distance, GDP, FTAs, etc.) plus production of that product.

I started by estimating the two-part fractional regression model presented in Ramalho et al. (2011):
Code:
frm weight_usd ldistw lgdpcap_d fta_wto ler eu_d lprod* comlang_off comrelig yr*, model(2P) inf(0) linkbin(logit) linkf(logit) vcebin(cluster pairid_a) vcefrac(cluster pairid_a)
This approach gives me interesting results and allows me to keep time-invariant variables such as distance. Additionally, I can observe different behaviours in the two parts. However, it does not account for country-pair heterogeneity, which should be important in international trade.

Therefore, following Prof Wooldridge (2018), I want to compute the FE estimator as a pooled OLS estimator using the Mundlak device (i.e. using the original data and adding the time averages of the covariates as additional explanatory variables), but also considering a two-part model if possible.
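To make the Mundlak step concrete, the time averages could be built roughly as follows. This is only a sketch: the covariate names are the ones used in the estimation code in this post, while the mean_ prefix and the touse indicator are hypothetical names chosen for illustration. With an unbalanced panel, the averages are taken over the complete cases only, which is my understanding of the unbalanced-panel setup.
Code:
* Sketch under stated assumptions: touse and the mean_ prefix are illustrative names
* Flag the complete cases, so averages use only rows that can enter the estimation
gen byte touse = !missing(weight_usd, lgdpcap_d, fta_wto, ler, eu_d, lprod1000hl_o, lprod1000hl_d)
* Pair-specific time averages of every covariate, including the year dummies
foreach v of varlist lgdpcap_d fta_wto ler eu_d lprod1000hl_o lprod1000hl_d yr* {
    bysort pairid_a: egen double mean_`v' = mean(`v') if touse
}
Variables named this way are all matched by a mean* wildcard in a later varlist.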
I imagine the estimation code would be something like:
Code:
xtset pairid_a
* WWW = 1 for a positive share, 0 for a zero share; missing shares stay missing
gen byte WWW = (weight_usd > 0) if !missing(weight_usd)
capture program drop weight_boot
program weight_boot, rclass
*1st stage
probit WWW lgdpcap_d fta_wto ler eu_d lprod1000hl_o lprod1000hl_d yr* if sample==1, cluster(pairid_a) 
predict xb, xb
gen double imr = normalden(xb)/normal(xb)
*2nd stage
glm weight_usd lgdpcap_d fta_wto ler eu_d lprod1000hl_o lprod1000hl_d yr* mean* imr if WWW==1 , fa(bin) link(probit) cluster(pairid_a)
predict x1b1hat, xb
gen scale=normalden(x1b1hat)
gen pe1=scale*_b[lgdpcap_d]
summarize pe1
return scalar ape1=r(mean)
gen pe2=scale*_b[fta_wto]
summarize pe2
return scalar ape2=r(mean)
gen pe3=scale*_b[ler]
summarize pe3
return scalar ape3=r(mean)
gen pe4=scale*_b[eu_d]
summarize pe4
return scalar ape4=r(mean)
gen pe5=scale*_b[lprod1000hl_o]
summarize pe5
return scalar ape5=r(mean)
gen pe6=scale*_b[lprod1000hl_d]
summarize pe6
return scalar ape6=r(mean)
drop xb imr x1b1hat scale pe1 pe2 pe3 pe4 pe5 pe6
end
*Bootstrapped SEs, resampling country pairs (clusters)
bootstrap r(ape1) r(ape2) r(ape3) r(ape4) r(ape5) r(ape6), reps(50) seed(123) cluster(pairid_a) idcluster(newid): weight_boot
program drop weight_boot
The mean* variables hold the time averages of the covariates (including the time averages of the year dummies, because my panel is unbalanced).
When I run the bootstrap code above, I get the error “insufficient observations to compute bootstrap standard errors”. I do not know whether this happens because what I'm asking is a) impossible in general, b) impossible for my data, or c) possible, but my code is wrong.
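One check that might help separate b) from c) is sketched below. It assumes weight_boot is still defined (that is, it is run before the program drop line) and uses a small number of replications purely to keep the output short. It first runs the program once on the original data, then repeats the bootstrap with the noisily option so that the output of each replication, including any error inside the program, is shown.
Code:
* Diagnostic sketch, not a fix: does the program run cleanly on the full sample?
weight_boot
return list
* Show the output of every replication to see where (or whether) a step fails
bootstrap r(ape1) r(ape2), reps(5) seed(123) cluster(pairid_a) idcluster(newid) noisily: weight_boot
If the standalone call already fails, the problem lies in the program or the data rather than in the resampling.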

I hope someone can give me some advice. Thank you for your attention.

Anthony Macedo

References:
J.M. Wooldridge (2018) Correlated random effects models with unbalanced panels, Journal of Econometrics.
E.A. Ramalho, J.J.S. Ramalho and J.M.R. Murteira (2011) Alternative estimating and testing empirical strategies for fractional regression models, Journal of Economic Surveys.