Hi,

I am working with a municipality-level dataset to analyse the effect of immigration shocks on a political voting outcome. I want to compare responses to immigration inside the municipality (in-municipality immigration) with responses to immigration in neighbouring areas.

I have calculated immigration shocks in neighbouring areas using different radii (10, 20 and 30km), and I have distance-weighted them. My plan was to regress my outcome on one of these variables, along with the in-municipality immigration variable, to compare the effects.
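To illustrate what a distance-weighted neighbouring measure can look like, here is a minimal sketch. It assumes inverse-distance weights and planar coordinates in km, which are illustrative assumptions and not necessarily the exact weighting behind my variables. The function name, coordinates and shock values are all made up.

```python
import math

def neighbour_shock(i, coords_km, shocks, radius_km):
    """Inverse-distance-weighted mean shock of municipalities within radius_km of i.

    Assumption: weights are 1/distance; other schemes (e.g. exponential decay) are possible.
    """
    xi, yi = coords_km[i]
    weighted_sum = 0.0
    weight_total = 0.0
    for j, (xj, yj) in enumerate(coords_km):
        if j == i:
            continue  # the municipality itself is excluded from its own neighbourhood
        d = math.hypot(xj - xi, yj - yi)
        if 0 < d <= radius_km:
            w = 1.0 / d
            weighted_sum += w * shocks[j]
            weight_total += w
    # None signals that no neighbour falls inside the radius
    return weighted_sum / weight_total if weight_total > 0 else None

# Made-up example data: four municipalities on a plane (km) and their shocks
coords_km = [(0, 0), (3, 4), (6, 8), (30, 40)]
shocks = [1.0, 2.0, 4.0, 10.0]

for radius in (10, 20, 30):
    print(radius, neighbour_shock(0, coords_km, shocks, radius))
```

With weights like these, nearby municipalities dominate the measure, which is one reason it can track the in-municipality variable closely.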

I find that in-municipality immigration is strongly and positively correlated with immigration shocks in neighbouring areas. For example, the raw correlation coefficient between the in-municipality immigration variable and the 30km neighbouring measure is 0.66. Moreover, the descriptive statistics show almost identical means, with standard deviations that are not too different:

Variable                      N      Mean   SD     Min     Max
In-municipality immigration   2,138  4.01   2.41   -1.06   22.68
Neighbouring 30km variable    2,138  4.01   3.55   -2.78   33.14
Neighbouring 20km variable    2,138  4.02   2.56   -1.30   24.93
Neighbouring 10km variable    2,138  4.01   2.41   -1.06   22.68

When I regress my outcome on in-municipality immigration alone, the coefficient is positive. When I regress my outcome on in-municipality immigration and, say, the 30km neighbouring measure, both coefficients are positive, but the coefficient on the in-municipality variable falls sharply, which I take to reflect the positive correlation between the two regressors.
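This drop is what the standard omitted-variable algebra would predict: the coefficient from the regression on in-municipality immigration alone equals the in-municipality coefficient from the two-variable regression plus the neighbouring coefficient times the slope from regressing the neighbouring measure on the in-municipality one. Below is a rough sketch using the correlation and standard deviations reported above. The two coefficients passed in at the end are hypothetical placeholders, not my estimates, and the calculation assumes no other controls.

```python
# Figures reported in the post
r = 0.66        # correlation between in-municipality and 30km neighbouring measure
sd_own = 2.41   # SD of in-municipality immigration
sd_nb = 3.55    # SD of the 30km neighbouring measure

# Slope from regressing the neighbouring measure on the in-municipality one:
# cov(own, nb) / var(own) = r * sd_nb / sd_own
delta = r * sd_nb / sd_own

def short_coefficient(b_own, b_nb):
    """Coefficient on the in-municipality variable when the neighbouring one is omitted."""
    return b_own + b_nb * delta

# Hypothetical two-variable-regression coefficients, for illustration only
b_own, b_nb = 0.5, 0.5
print(f"delta = {delta:.3f}")
print(f"coefficient without neighbouring measure = {short_coefficient(b_own, b_nb):.3f}")
```

Since delta is close to 1 here, almost all of any positive neighbouring effect gets loaded onto the in-municipality coefficient when the neighbouring measure is left out.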

I ran a VIF test: I regressed the outcome on the in-municipality and 30km neighbouring measures and then ran the "vif" command. I get a value of 1.77, which, as I understand it, suggests no collinearity problem.
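As a sanity check, with only two regressors the VIF is just 1 / (1 - r^2), where r is the correlation between them. A quick sketch, assuming the regression contains no covariates beyond these two:

```python
r = 0.66  # raw correlation between the two regressors, as reported above

# With exactly two regressors, the R-squared from regressing one on the other is r**2,
# so the variance inflation factor reduces to 1 / (1 - r**2).
vif = 1.0 / (1.0 - r ** 2)
print(round(vif, 2))
```

This comes out at about 1.77, matching what "vif" reports, so the test adds no information beyond the 0.66 correlation itself.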

Do you have any suggestions on whether I should be concerned or on what to do?