Hi all,

I have been asked to verify the effect of water and sanitation on the mortality of children under the age of 5 and quantify whether providing water services or sanitation services has a larger effect on child mortality.

Using what I understand about changing functional form and looking at the scatter graphs, I concluded that only GDPPC would need to be transformed, as the other variables are measured in percentages or proportions.
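
To show the kind of check I mean, here is a rough sketch of comparing the raw and logged income variable against mortality. The variable name lnGDPPC_check is just a placeholder I made up for this illustration; the other names are the ones in my dataset.

```stata
* Sketch only: compare the raw and logged GDP per capita against mortality
* lnGDPPC_check is a hypothetical name used just for this comparison
gen lnGDPPC_check = ln(GDPPC)

* Smoothed scatter against the raw variable
lowess INFMORT GDPPC

* Smoothed scatter against the logged variable
lowess INFMORT lnGDPPC_check

* Remove the temporary variable afterwards
drop lnGDPPC_check
```

If the smoothed line looks closer to straight in the second plot, that would support logging GDPPC, but this is only a visual check.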

I ran regress and got a much smaller coefficient for WATER than I would have expected, with a p-value of 0.948, which leads me to believe something is wrong with the model.

I then ran regress on all the variations of the model I thought could be plausible based on the nature of the variables. However, they all produce coefficients whose sign differs from what the scatter graphs show, very large p-values, or both.

The only explanation I can think of is the high correlation between the explanatory variables.
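
In case it helps frame the question, this is a minimal sketch of how I understand the correlation issue could be examined after the regression. I am assuming that the joint test and the test of equal coefficients are appropriate tools here, which may not be the case.

```stata
* Sketch only: assumes LGDPPC has already been generated as in the code below
regress INFMORT LGDPPC WATER SANIT

* Variance inflation factors for the regressors
estat vif

* Joint significance of the two service variables
test WATER SANIT

* Do the WATER and SANIT coefficients differ from each other?
test WATER = SANIT
```

My thinking is that if WATER and SANIT are jointly significant while WATER alone is not, the regression may simply be unable to separate the two effects with only 39 observations.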

I would be very grateful for any thoughts on the correct functional form to use, or on why my results are so unexpected.

Code:
summarize

gen LGDPPC=log(GDPPC)

regress INFMORT LGDPPC WATER SANIT

ovtest

vif

correlate
. summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       CCODE |          0
       GDPPC |         40    14312.08    14534.66   1349.372   52926.54
     INFMORT |         40       32.31    25.03579        3.4       95.1
       WATER |         39    85.70895    17.52148   36.59633        100
       SANIT |         40    70.71659    28.64349   13.94848        100


. regress INFMORT LGDPPC WATER SANIT

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(3, 35)        =     35.75
       Model |  18397.8535         3  6132.61782   Prob > F        =    0.0000
    Residual |  6003.51577        35  171.529022   R-squared       =    0.7540
-------------+----------------------------------   Adj R-squared   =    0.7329
       Total |  24401.3692        38  642.141296   Root MSE        =    13.097

------------------------------------------------------------------------------
     INFMORT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      LGDPPC |  -8.453043   3.386237    -2.50   0.017    -15.32747   -1.578615
       WATER |  -.0147222   .2241601    -0.07   0.948    -.4697913    .4403469
       SANIT |  -.5062109   .1450451    -3.49   0.001    -.8006682   -.2117536
       _cons |   146.1188   24.48058     5.97   0.000     96.42059    195.8171
------------------------------------------------------------------------------


. correlate
(CCODE ignored because string variable)
(obs=39)

             |    GDPPC  INFMORT    WATER    SANIT   LGDPPC
-------------+---------------------------------------------
       GDPPC |   1.0000
     INFMORT |  -0.6592   1.0000
       WATER |   0.5542  -0.7316   1.0000
       SANIT |   0.6441  -0.8401   0.8262   1.0000
      LGDPPC |   0.9029  -0.7849   0.7342   0.7664   1.0000