Dear Statalists,

I want to estimate a geographic regression discontinuity design (RDD) that compares municipalities on either side of provincial borders in Spain. I would like to restrict the comparison to municipalities within a fixed bandwidth of the provincial border, and for that I need to calculate the minimum distance from each municipality to the closest point on the border.

I have two datasets:

The MAIN DATASET is a cross-section of all Spanish municipalities: one municipality = one observation. It contains the following variables:
  • the Province where the municipality is located
  • the latitude of the municipality
  • the longitude of the municipality
  • all other relevant socio-demographic variables…

The SECONDARY DATASET is a standard .dta coordinates file generated with the shp2dta command. It draws the map of the provinces of Spain: it holds the latitude and longitude of thousands of points that trace the provincial borders.

My problem is that I don't know how to combine these two files to run the geographic RDD.

Ideally, I would like to estimate something like this:

Outcome_var = alpha + beta_1 * Border_1 + beta_2 * Border_2 + ... + f(d) + epsilon


Border_1 is an indicator that takes the value 1 for municipalities on one side of the first provincial border (within distance X), the value -1 for municipalities on the other side of that border (within distance X), and the value 0 for all other municipalities.

I have more than 30 relevant provincial borders, so more than 30 of these indicators would be included.
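
To make the coding of the Border_k variables concrete, here is a minimal sketch of what I have in mind. It is written in Python rather than Stata, since the Stata version is exactly what I am asking about. The function name, the example provinces, and the 20 km bandwidth are hypothetical, and the sketch assumes that the distance from each municipality to that particular border has already been computed.

```python
def border_indicator(province, dist_km, side_pos, side_neg, bandwidth_km):
    """Return +1, -1 or 0 for one municipality and one provincial border.

    province     : province in which the municipality is located
    dist_km      : distance from the municipality to this border (assumed known)
    side_pos     : province coded +1 for this border (hypothetical choice)
    side_neg     : province coded -1 for this border (hypothetical choice)
    bandwidth_km : the fixed bandwidth X
    """
    if dist_km > bandwidth_km:
        return 0          # too far from this border
    if province == side_pos:
        return 1          # inside the bandwidth, on the +1 side
    if province == side_neg:
        return -1         # inside the bandwidth, on the -1 side
    return 0              # municipality belongs to neither province


# Hypothetical rows: (municipality id, province, distance to the border in km)
rows = [
    ("m1", "Madrid", 5.0),
    ("m2", "Toledo", 12.0),
    ("m3", "Madrid", 35.0),
    ("m4", "Sevilla", 3.0),
]

# One column of the design matrix, for a Madrid/Toledo border and X = 20 km
border_1 = [border_indicator(p, d, "Madrid", "Toledo", 20.0) for _, p, d in rows]
```

With more than 30 borders, the same function would be applied once per border, each time with that border's pair of provinces.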

And f(d) is a function of d, the shortest straight-line distance from a given municipality to the closest point on the provincial border.
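
To illustrate the distance d that I am after, here is a rough sketch, again in Python rather than Stata. It assumes that the border points are available as (latitude, longitude) pairs, which I believe correspond to the _Y and _X variables in the shp2dta coordinates file. The function names are hypothetical. The sketch uses the great-circle (haversine) distance to the nearest border vertex, which only approximates the distance to the nearest point on the border line when the vertices are densely spaced.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_distance_km(lat, lon, border_points):
    """Distance in km from one municipality to the nearest border vertex.

    border_points is an iterable of (latitude, longitude) pairs.
    """
    return min(haversine_km(lat, lon, blat, blon) for blat, blon in border_points)


# Hypothetical border vertices and one hypothetical municipality
border = [(40.0, -3.0), (40.5, -3.0), (41.0, -3.0)]
d = min_distance_km(40.5, -3.2, border)  # nearest vertex is (40.5, -3.0)
```

This loops over every border point for every municipality, which should be manageable for a few thousand municipalities and a few thousand border points.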

Thank you very much, Statalists!


Inigo de Juan