Hi
I have a data set and would appreciate some input on how best to analyze it.

It is a hierarchical data set with animals clustered at two geographical locations, each with up to 12 sub-units, so we can assume two levels of clustering: the geographical location and the unit within the location. Each sub-unit contains approximately 150 000 individual animals. Sampling was done weekly, but the sampling dates are not exactly 7 days apart. At each sampling, between 20 and 60 individual animals were selected from each unit (all units were examined every week), the number of parasites on each individual was counted, and the animal was returned to its unit. So animal number 1 is just the first animal examined that day, not the same animal each week. I have weekly data for almost one year per unit per geographical location.
The outcome is the number of parasites counted per animal, and many dates have zero counts.

Example of the structure:

Code:
clear
input int date byte Location int unit byte(animal_number parasite)
21045 1 105  1 1
21045 1 105  2 0
21045 1 105  3 0
21045 1 105  4 0
21045 1 105  5 0
21045 1 105  6 0
21045 1 105  7 0
21045 1 105  8 0
21045 1 105  9 0
21045 1 105 10 0
21045 1 105 11 1
21045 1 105 12 0
21045 1 106  1 1
21045 1 106  2 0
21045 1 106  3 0
21045 1 106  4 2
end
format %td date
label var date "date"
label var Location "Location"
label var unit "unit"
label var animal_number "animal_number"
label var parasite "parasite"

The unit of interest for the analysis is the sub-unit, not the individual animal, and the goal is to examine the effect of unit-level interventions on the parasite load.

I was thinking of using either a panel Poisson or a panel negative binomial regression, but I cannot xtset the data because there are repeated measurements per unit per date. So my first question is: can I use a multilevel Poisson or negative binomial model instead?
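
For concreteness, here is a minimal sketch of the kind of multilevel model I have in mind. Note that treatment is only a placeholder name for a unit-level intervention indicator (it is not in the example data above), and I am not sure this is the right specification:

Code:
* treatment is a hypothetical unit-level intervention variable (not in the example data)
* random intercepts for Location and for unit nested within Location
menbreg parasite i.treatment || Location: || unit:
* Poisson counterpart, for comparison
mepoisson parasite i.treatment || Location: || unit: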

However, I expect autocorrelation in the data as the parasites spread through the population in a unit: counts recorded early in the time series are more likely to be zero than counts later in the series, and two consecutive counts are more likely to be similar than counts far apart in time.
I am unsure how reliable an autocorrelation test will be, as the measurement dates are not evenly spaced. Could I use week instead of day to get around this?
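
By "week instead of day" I mean something like the sketch below, where week is a new variable name I am introducing. As far as I understand, Stata's weekly dates always have 52 weeks per year, with week 1 starting on 1 January and week 52 running a little long, so two samplings could in principle land in the same week:

Code:
* week is a new variable derived from the daily date
gen week = wofd(date)
format week %tw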

Or should I aggregate the data by unit and date, taking the mean number of parasites per unit-date, transform it to approximate normality, xtset the data, and just use a multilevel linear regression?
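
A minimal sketch of the aggregation I am considering. The variable names mean_parasite, n_animals and ln_mean are my own, the log(x + 1) transformation is just one possible choice, and the xtset line assumes that unit IDs are unique across the two locations:

Code:
* one row per unit-date: mean count and number of animals examined
collapse (mean) mean_parasite = parasite (count) n_animals = parasite, by(Location unit date)
* log(x + 1) is only one possible transformation; it keeps the zero means defined
gen ln_mean = ln(mean_parasite + 1)
* assumes unit IDs are unique across locations
xtset unit date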

Thank you in advance for your input.

Regards,
Marit