I am using Stata/MP 14.2, and my installation does not have internet access, so I cannot copy code or output to this forum.
My data is longitudinal, with 128 "zones" and around 700 daily observations per zone recording whether or not a pipe break occurred on that day. My covariates are time-varying factors specific to each zone, such as water demand and pressure measurements. One issue is that pressure measurements (a key variable) are only available for 42 zones and are severely unbalanced. Time-invariant factors are ignored because there are too many for us to quantify.
Initially the idea was to run a regression on breaks per mile of pipe, but it has since come to light that the mileage figures used in that calculation are unreliable estimates. So a binary outcome for whether or not a break happened seems reasonable.
xtlogit offers random-effects, conditional fixed-effects, and population-averaged estimators, but I am not sure which would be best.
As I understand it, random effects is only valid for random samples from a larger population. Since the population-averaged approach is similar, does that rule it out too? Also, since we don't quantify the time-invariant variables, isn't random effects invalid? Does the same apply to the population-averaged approach?
That leaves conditional fixed effects, but it doesn't have cluster-robust standard errors. I could use the bootstrap option (the docs don't explicitly say that this would be sufficient, though threads on this forum suggest it is) or run clogit with robust standard errors. But according to the docs, clogit is for matched case-control data...
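For concreteness, here is a rough sketch of the four candidates as I understand the syntax. The variable names (brk, demand, pressure, zone, date) are placeholders, not my actual variables, and the reps() and seed() values are arbitrary.

```stata
* Placeholder names: brk (0/1 break indicator), demand, pressure,
* zone (numeric panel identifier), date (daily date variable).
xtset zone date

* Random effects, standard errors clustered on zone
xtlogit brk demand pressure, re vce(cluster zone)

* Population averaged (GEE), robust standard errors
xtlogit brk demand pressure, pa vce(robust)

* Conditional fixed effects, bootstrap standard errors
xtlogit brk demand pressure, fe vce(bootstrap, reps(200) seed(12345))

* Same conditional likelihood via clogit, clustering on zone
clogit brk demand pressure, group(zone) vce(cluster zone)
```

As far as I can tell, any model that includes pressure will use only the observations where pressure is non-missing, and the two conditional fixed-effects versions will also drop zones in which the outcome never varies.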
What is the most appropriate approach? Hosmer, Lemeshow, and Sturdivant (2013) mention a "cluster-specific" model, but I don't see that language anywhere in the Stata documentation.
Hosmer, D. W., Jr., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression. Wiley Series in Probability and Statistics. Hoboken, NJ: John Wiley & Sons.