Dear all,

I am turning to you with a puzzle that has frustrated me to the point of despair. I am not posting any data because I suspect that reproducing the effect requires the full dataset, which is large (23GB). So I will simply state the puzzle, which I think is clear enough as it is:

I look at firms' connections between two countries. What I want to see is whether, close to the border, the probability of a firm being connected with another firm differs significantly within each country compared with across the two countries. I have a dataset with all the actual links, and I produce all possible pairs of firms within 50km of the border using the genear routine. Then, for each firm, I calculate the share of connections within country and across countries and conduct a t-test for the difference in means. I do this both for each country as a whole and by group (North, Central, South). Each subgroup contains at least 1,800 firms.
The p-values indicate highly significant differences in all specifications (all are reported as 0.00). The picture is uniform: the probability of being connected is much higher within each country than across.
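
To make the per-firm comparison concrete, here is a minimal Python sketch of what I mean, on made-up toy pairs rather than my real data. The tuple layout, the firm labels and the function names are hypothetical, and I am assuming here that the "share" for a firm is linked pairs divided by possible pairs, computed separately for within-country and cross-country pairs, with a paired t statistic on the per-firm difference.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical toy data: (firm_a, firm_b, crosspair, connected).
# crosspair = 1 if the two firms are in different countries.
pairs = [
    ("a1", "a2", 0, 1), ("a1", "a3", 0, 1), ("a1", "b1", 1, 0),
    ("a1", "b2", 1, 0), ("a2", "a3", 0, 1), ("a2", "b1", 1, 1),
    ("a2", "b2", 1, 0), ("a3", "b1", 1, 0), ("a3", "b2", 1, 0),
    ("b1", "b2", 0, 1),
]

def firm_shares(pairs):
    """Return {firm: (within_share, cross_share)} over all possible pairs."""
    # Per firm: [within_links, within_pairs, cross_links, cross_pairs]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for a, b, cross, connected in pairs:
        for firm in (a, b):
            offset = 2 if cross else 0
            counts[firm][offset] += connected
            counts[firm][offset + 1] += 1
    return {
        firm: (wl / wp, cl / cp)
        for firm, (wl, wp, cl, cp) in counts.items()
        if wp and cp  # keep firms that have both kinds of possible pairs
    }

def paired_t(diffs):
    """t statistic for H0: mean of the per-firm differences is zero."""
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

shares = firm_shares(pairs)
diffs = [within - cross for within, cross in shares.values()]
t_stat = paired_t(diffs)
print(shares)
print(t_stat)
```

A positive t statistic here means the within-country share is larger on average, which is the direction I find in the real data.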

I also draw a scatter plot of all the actual links, which shows that there are clearly far fewer links across countries than within each country.

And finally, when I estimate a simple regression, "connected = β0 + β1*crosspair + β2*dist_btw_firms", where dist_btw_firms is the distance in kilometres between the two firms of each pair, I get a significant positive effect of a pair being across the two countries on the probability of being connected. In other words, the regression suggests that two firms that are not in the same country are more likely to be connected than two firms in the same country.
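
For clarity about the specification, this is roughly the regression as a linear probability model, sketched in plain Python on made-up rows. The data values and the helper names `solve` and `ols` are hypothetical, and the sketch solves the normal equations directly instead of using the routine I actually ran, so it only illustrates the setup and does not reproduce my result.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def ols(y, X):
    """OLS coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical toy rows: (connected, crosspair, dist_btw_firms in km).
rows = [
    (1, 0, 5.0), (1, 0, 12.0), (0, 0, 40.0), (1, 0, 8.0),
    (0, 1, 30.0), (0, 1, 55.0), (1, 1, 20.0), (0, 1, 70.0),
]
y = [connected for connected, _, _ in rows]
X = [[1.0, cross, dist] for _, cross, dist in rows]  # constant, crosspair, distance

beta0, beta_cross, beta_dist = ols(y, X)
print(beta0, beta_cross, beta_dist)
```

The coefficient I am puzzled by is the one on crosspair (`beta_cross` in this sketch).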

I also compare the share of "crosspairs" and the share of "withinpairs" that are actual links, and the share for within pairs is about ten times higher.
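
The raw comparison I have in mind is simply the fraction of possible pairs that are actual links, split by crosspair. A minimal sketch in Python, with hypothetical toy counts chosen only to mimic the tenfold gap (these are not my real numbers):

```python
def link_share_by_group(pairs):
    """Return {crosspair: share of possible pairs that are actual links}."""
    totals = {}
    for crosspair, connected in pairs:
        links, n = totals.get(crosspair, (0, 0))
        totals[crosspair] = (links + connected, n + 1)
    return {group: links / n for group, (links, n) in totals.items()}

# Hypothetical toy data: (crosspair, connected).
toy_pairs = (
    [(0, 1)] * 5 + [(0, 0)] * 5      # within pairs: 5 of 10 are links
    + [(1, 1)] * 1 + [(1, 0)] * 19   # cross pairs: 1 of 20 is a link
)

share = link_share_by_group(toy_pairs)
ratio = share[0] / share[1]
print(share, ratio)
```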


So, I am really frustrated. Can anyone come up with an idea of how this can be the case?

Thanks in advance.