Hello users,

I am new to the forum, but hoped you would be able to settle a small dispute we are having. Currently we are analyzing data from a hospital setting. Patients have been recruited based on the presence of respiratory symptoms (yes/no), e.g. cough, sore throat, inspiratory wheezing, etc.
The patients have been recruited from 6 different hospitals over the course of four influenza seasons (winters of 2010, 2011, 2012 and 2013). On these data, we are planning to run a logistic model that investigates the associations between the different respiratory symptoms and the disease of interest (the outcome, e.g. Human Metapneumovirus). Since it is a hospital setting, all patients are sick, and the ones who test negative for Human Metapneumovirus will have some other respiratory disease. Which other disease(s) are most prevalent among the “controls” will differ somewhat depending on the season (e.g. season one was a strong influenza season, while season two was more dominated by Respiratory Syncytial Virus).

The discussion has ended up with four different views on how to address this problem.
  1. Keeping it simple: Using a standard logistic regression with hospital and season as independent variables in the model:
    logistic disease cough i.hospital i.season
  2. Clustering 1: Taking the possible clustering arising from having different hospitals into account by using melogit (or possibly xtgee).
  3. Clustering 2: Taking the possible clustering arising from both different hospitals and seasons into account by using melogit.
  4. Clustering 3: Since it is the same four seasons and the same hospitals in all seasons, there was a suggestion of viewing this as a “two-way crossed random effects” situation (not sure how to do this in Stata).
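
To make the four options concrete, this is roughly how I imagine the commands would look. It is only a sketch: the variable names (disease, cough, hospital, season) are placeholders since we have no data yet, treating season as nested within hospital in option 3 is just my own reading of that suggestion, and the crossed-effects syntax in option 4 is an untested guess based on the mixed-effects manual.

```stata
* Placeholder variables (assumed, not real data):
*   disease  0/1 outcome, cough 0/1 symptom,
*   hospital 1-6, season 1-4

* 1. Standard logistic regression, hospital and season as fixed effects
logistic disease cough i.hospital i.season

* 2. Random intercept for hospital, season kept as a fixed effect
melogit disease cough i.season || hospital:, or

*    Possible population-averaged alternative with xtgee
xtset hospital
xtgee disease cough i.season, family(binomial) link(logit) corr(exchangeable) eform

* 3. Random intercepts for hospital and for season within hospital
*    (nesting is an assumption about what option 3 means)
melogit disease cough || hospital: || season:, or

* 4. Two-way crossed random effects for season and hospital (untested guess)
melogit disease cough || _all: R.season || hospital:, or
```

With only 6 hospitals and 4 seasons, I am unsure whether the variance components in options 2 to 4 can be estimated reliably, which is part of what we are arguing about.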

This is still at the discussion level, so unfortunately I cannot give an example of the data. But I am interested in hearing what others outside our small cluster think before the argument heats up when the data arrive…

Best wishes,
Jon