Currently I am using
Code:
xtreg, fe vce(cluster ID)
in Stata/MP 14.2 on unbalanced panel data: 42 entities (128 "ID"s), anywhere from 70 to 750 observations per entity, and around 10-20 independent variables. User-written packages from SSC are off-limits because the machine with Stata installed has no internet connection. I noticed that my dependent variable is mostly zeros (>95%) but otherwise continuous (semicontinuous is the term, I believe), and wondered whether there is a specific way to analyze that sort of data. My research led me to the two-part model: model the zero/non-zero outcome with a logistic regression, then model the non-zero observations with an ordinary linear regression or some other method [1]. But I am not sure how to implement this with fixed effects in Stata. I would post a histogram of the data, but unfortunately I am not allowed to share it.
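To make the two-part idea concrete, here is a minimal sketch using only built-in commands. The variable names `y`, `x1`, and `x2` are hypothetical placeholders, and it assumes `ID` is the panel identifier. Whether this naive version is statistically appropriate with fixed effects is exactly what is in question.

```stata
* Hypothetical names: y is the outcome, x1 and x2 are regressors
xtset ID

* Part 1: indicator for a non-zero outcome, conditional fixed-effects logit
generate byte anypos = (y > 0) if !missing(y)
xtlogit anypos x1 x2, fe

* Part 2: linear fixed-effects model on the non-zero observations only
xtreg y x1 x2 if y > 0 & !missing(y), fe vce(cluster ID)
```

Note that the conditional logit in part 1 drops every panel whose outcome is all zero or all non-zero, which could remove a large share of the sample when more than 95% of the observations are zero.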
First, any thoughts on how to do this in Stata?
Second, is there any nice way to do something like
Code:
xtreg, fe vce(cluster ID)
in Stata for zero-inflated/semicontinuous data?
Third, any thoughts in general about the most appropriate way to model this data? Is
Code:
xtreg, fe vce(cluster ID)
a bad idea with zero-inflated/semicontinuous data? Reading [1] leads me to believe that such a model would be highly susceptible to extreme positive values, of which I have several. Additionally, the residuals resulting from using
Code:
xtreg, fe vce(cluster ID)
are certainly not normally distributed. We are more interested in the association of the independent variables with the dependent variable (coefficients and p-values) than in making predictions, if that distinction matters for the choice of model.
[1] Boulton AJ, Williford A (2018) Analyzing skewed continuous outcomes with many zeros: A tutorial for social work and youth prevention science researchers. J Soc Social Work Res 9:721–740. doi: 10.1086/701235
Also posted on Stack Overflow.