Currently I am using
xtreg, fe vce(cluster ID)
in Stata/MP 14.2 on unbalanced panel data with 42 entities (128 "ID"s), anywhere from 70 to 750 observations per entity, and around 10-20 independent variables. SSC packages are off-limits because the machine with Stata installed has no internet connection. I noticed that my dependent variable is mostly zeros (>95%) but otherwise continuous (semicontinuous is the term, I believe), and wondered whether there is a specific way to analyze that sort of data. My research led me to the two-part model: model the zero/non-zero outcome with logistic regression, then model the non-zero observations with a regular linear regression or some other method [1]. But I am not sure how to implement this with fixed effects in Stata. I would post a histogram of the data, but unfortunately I am not allowed to share it.
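
To make the two-part idea concrete, here is a minimal sketch of what I imagine it might look like using built-in commands only. The variable names y, x1, and x2 are placeholders, and I am assuming ID is the panel variable; I do not know whether this is a sound way to handle the fixed effects.

```stata
* Placeholder names: y = outcome, x1 x2 = regressors, ID = panel variable (assumed)
xtset ID

* Part 1: zero vs. non-zero, via conditional (fixed-effects) logit
generate byte nonzero = (y > 0) if !missing(y)
clogit nonzero x1 x2, group(ID) vce(cluster ID)

* Part 2: magnitude of the outcome, using only the non-zero observations
xtreg y x1 x2 if y > 0, fe vce(cluster ID)
```

One concern with this sketch is that clogit drops any ID whose outcome is all zeros or all non-zeros, so with more than 95% zeros the first part may be estimated on far fewer groups than the second.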

First, any thoughts on how to do this in Stata?

Second, is there any nice way to do something like
xtreg, fe vce(cluster ID)
in Stata for zero-inflated/semicontinuous data?

Third, any thoughts in general about the most appropriate way to model this data? Is
xtreg, fe vce(cluster ID)
a bad idea with zero-inflated/semicontinuous data? Reading [1] leads me to believe that such a model would be highly sensitive to extreme positive values, of which I have several. Additionally, the residuals resulting from using
xtreg, fe vce(cluster ID)
are certainly not normally distributed. We are more interested in the association of the independent variables with the dependent variable (coefficients and p-values) than in making predictions, if that distinction between goals matters.

[1] Boulton AJ, Williford A (2018) Analyzing skewed continuous outcomes with many zeros: A tutorial for social work and youth prevention science researchers. J Soc Social Work Res 9:721–740. doi: 10.1086/701235

Also posted on Stack Overflow.