I am having some issues working with and interpreting seasonal data on malaria cases in country X. The attached images show the outputs I have generated. I am looking at national malaria cases by month from 2016 to 2019, expressed as a proportion of the country's under-5 population; this rate should increase sharply in the rainy season.

My intention with this is to estimate the seasonal trends i.e. look at the differences between seasons (3 month periods) and show that in certain seasons you are X% more likely to see an increase in malaria transmission rates.

I first ‘de-trend’ the data to remove any year-on-year increase, so that what remains reflects the seasonal fluctuation in malaria rates:

regress u5s_malaria_per date2
predict malaria_detrended, resid
predict malaria_trend, xb
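
A one-step alternative would be to put the trend and the month indicators in the same regression, rather than regressing the residuals in a second stage. This is only a minimal sketch: it assumes date2 is a numeric monthly date and that month is an integer from 1 to 12, neither of which is confirmed above.

```stata
* Sketch only: assumes date2 is a numeric monthly date and month is coded 1-12.
* Linear trend plus one indicator per calendar month, with
* heteroskedasticity-robust standard errors.
regress u5s_malaria_per c.date2 i.month, vce(robust)

* Joint test that the month indicators are all zero (i.e. no seasonality)
testparm i.month
```

Each month coefficient would then be the difference from the base month (January by default), holding the trend fixed.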

There is a very clear and strong seasonal pattern (as shown in the attached), with peaks at the beginning of the year and troughs nearer the end of the year. The model's p-values are statistically significant for all months, with an R2 of 18.4 and an adjusted R2 of 18.3.

The residuals from this regression reveal the pattern below, and the command “estat hettest” indicates heteroskedasticity. This is not surprising given the seasonality in the data.
Given the patterns in the data, I thought polynomial terms in month (month^2 and month^3) might account for the curvature and allow me to make estimates. I ran the commands below; the fitted values and residuals are shown in the attached doc.

regress c.malaria_detrended month month_2 month_3
predict yhat_3
twoway (line yhat_3 month, sort) || lfit malaria_detrended month
rvfplot, sort msize(vsmall)

The cubic fits the curve much better and the adjusted R-squared increases to 0.23, but I believe the residuals still show some pattern, and “estat hettest” still indicates heteroskedasticity.

Can anyone offer assistance? I understand that some autocorrelation and heteroskedasticity in the residuals is natural in this kind of data, but I am not sure how, or whether, I can interpret these results in any way.

To restate the goal: I want to compare seasons (3-month periods) and show that in certain seasons you are X% more likely to see an increase in malaria transmission rates. I have used the monthly data here so that it is easier to see the residuals and the pattern in the data, but the same issues around heteroskedasticity etc. exist in the seasonal estimates as well.
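
To make the seasonal comparison concrete, here is a minimal sketch of the kind of model I have in mind. It rests on several assumptions: there is exactly one observation per month, date2 is a Stata monthly date that tsset will accept, month is coded 1 to 12, the rate is strictly positive so that its log is defined, and seasons are calendar quarters, which may not match the actual rainy season. The names season and ln_rate are hypothetical, and the lag length of 3 is an arbitrary choice.

```stata
* Sketch only: assumes one observation per month and a monthly date in date2.
tsset date2

* Hypothetical grouping into four 3-month seasons (1 = Jan-Mar, ..., 4 = Oct-Dec)
generate season = ceil(month/3)

* Log of the rate, so season coefficients read as approximate relative differences
* (observations with a zero rate become missing)
generate ln_rate = ln(u5s_malaria_per)

* Trend plus season indicators, with Newey-West standard errors that allow for
* heteroskedasticity and autocorrelation up to 3 lags (arbitrary choice)
newey ln_rate c.date2 i.season, lag(3)

* Percentage difference in the rate for season 2 relative to season 1
display 100*(exp(_b[2.season]) - 1)

* Joint test that the seasons differ
testparm i.season
```

The Newey-West errors are meant to keep the inference usable without having to remove the heteroskedasticity and autocorrelation from the residuals first.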

I am unsure where to go next, whether the steps I have taken so far are appropriate, and whether any of my outputs can be used.

Can someone advise?