I am using Stata/MP 14.2, and my installation does not have internet access, so I cannot copy code or output to this forum.
My data are longitudinal, with 128 "zones" and around 700 daily observations per zone recording whether or not a pipe break occurred on that day. My covariates are time-varying factors specific to each zone, such as water demand and pressure measurements. One issue is that pressure measurements (a key variable) are only available for 42 zones and are severely unbalanced. Time-invariant factors are ignored because there are too many that we cannot quantify.
Initially the idea was to do a regression on breaks per mile of pipe, but it has since come to light that the miles used in that calculation are unreliable estimates. So, a binary outcome of whether or not a break happened seems reasonable.
xtlogit offers random-effects, conditional fixed-effects, and population-averaged estimators, but I am not sure which would be best.
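For concreteness, a minimal sketch of the three specifications follows. The variable names (zone, date, anybreak, demand, pressure) are placeholders chosen for illustration, not names or output from my actual data.

```stata
* Hypothetical variable names: zone (panel id), date (daily date),
* anybreak (0/1 outcome), demand and pressure (time-varying covariates)
xtset zone date

* Random-effects logit
xtlogit anybreak demand pressure, re

* Conditional fixed-effects logit
* (zones with no variation in anybreak are dropped from estimation)
xtlogit anybreak demand pressure, fe

* Population-averaged (GEE) logit with robust standard errors
xtlogit anybreak demand pressure, pa vce(robust)
```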
As I understand it, random effects is only valid for random samples from a larger population. Since the population-averaged approach is similar, does that rule it out too? Also, since we don't quantify the time-invariant variables, isn't random effects invalid? Does that apply to the population-averaged approach as well?
That leaves conditional fixed effects, but xtlogit, fe does not offer cluster-robust standard errors. I could use the bootstrap option (the docs don't explicitly say that this would be sufficient, although threads on this forum suggest it is), or I could use clogit with robust standard errors. But according to the docs, clogit is for matched case-control data...
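The two options I am weighing would look roughly like this. Again, this is a sketch with placeholder variable names (zone, date, anybreak, demand, pressure), and the number of replications and the seed are arbitrary choices, not recommendations.

```stata
* Hypothetical variable names, as placeholders for the real data
xtset zone date

* Option 1: conditional fixed-effects logit with bootstrap standard errors
* (reps and seed are arbitrary; my assumption is that the bootstrap
*  resamples whole zones rather than individual days)
xtlogit anybreak demand pressure, fe vce(bootstrap, reps(200) seed(12345))

* Option 2: clogit with zone as the group and cluster-robust standard errors
clogit anybreak demand pressure, group(zone) vce(cluster zone)
```

If the two commands fit the same conditional likelihood, the point estimates should agree and only the standard errors would differ, which is something I can check by running both.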
What is the most appropriate approach? Hosmer, Lemeshow, and Sturdivant (2013) mention a "cluster-specific" model, but I don't see that term anywhere in the Stata documentation.
Hosmer Jr., D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression. Wiley series in probability and statistics. Hoboken, NJ, USA: John Wiley & Sons, Inc.