Dear all,

I am using Stata 16 on macOS 10.14. I am currently working with two geocoded datasets. Dataset 1 (D1) is my main dataset, which consists of a set of answers to a survey (n=84 000 observations):

idstd is the id of the survey respondent, lat_1 its latitude and lon_1 its longitude.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double idstd float(lat_1 lon_1)
468901 18.801293 46.71348
468902  12.52105 48.13531
468903 14.525024 46.11655
468904 16.515405 49.12687
468905 14.546634 49.09956
468906 12.78726 43.62897
468907  14.76148 42.59135
468908  14.75744 48.61428
468909 14.537621 43.14062
468910 14.486714 41.04305
end

Dataset 2 (D2) is an auxiliary dataset, which consists of a set of specific firms (n=23 000 observations).

PropertyID is the id of the firm, lon_2 its longitude and lat_2 its latitude.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 PropertyID double(lat_2 lon_2)
"f_1"    -12.009   121.342
"f_2"      -25.163   119.083
"f_3"    -26.94044  101.9716
"f_4"    -28.93786 162.00677
"f_5"       43.957   100.929
"f_6"   -11.321    121.49
"f_7"   -41.711   140.465
"f_8"   -21.691   140.471
"f_9"   -31.684   140.472
"f_10"   -21.679   140.473
end

My goal is to build an adjacency matrix for Dataset 1 using Dataset 2.

I have already built a first adjacency matrix (call it A1), which has 84 000 rows * 23 000 columns. In A1, aij=1 if survey respondent i from D1 is within a 20-km radius of firm j from D2, and aij=0 otherwise. This adjacency matrix was saved as a dataset and has n=84 000 observations and 23 001 variables (idstd + 23 000 firms).
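For reference, the 20-km rule behind A1 can be sketched in Mata roughly as follows. This is only an illustration under stated assumptions, not necessarily how A1 was actually computed: `haversine_km()` is a hypothetical helper name, the distance is the haversine great-circle distance, and the Earth radius of 6371 km is an assumed value. Only the first two points of each example dataset are used.

```stata
clear all
mata:
// Hypothetical helper: haversine distance in km between every row of
// P (n x 2: lat, lon in degrees) and every row of Q (m x 2: lat, lon).
real matrix haversine_km(real matrix P, real matrix Q)
{
    real scalar R, d2r
    real matrix lat1, lat2, dlat, dlon, a

    R    = 6371                                   // assumed mean Earth radius (km)
    d2r  = pi() / 180                             // degrees to radians
    lat1 = (P[., 1] * d2r) * J(1, rows(Q), 1)     // n x m, latitude of each row of P
    lat2 = J(rows(P), 1, 1) * (Q[., 1]' * d2r)    // n x m, latitude of each row of Q
    dlat = lat2 - lat1
    dlon = (J(rows(P), 1, 1) * Q[., 2]' - P[., 2] * J(1, rows(Q), 1)) * d2r
    a    = sin(dlat / 2):^2 + cos(lat1) :* cos(lat2) :* sin(dlon / 2):^2
    return(2 * R * asin(sqrt(a)))
}

P  = (18.801293, 46.71348 \ 12.52105, 48.13531)   // first two respondents of D1
Q  = (-12.009, 121.342 \ -25.163, 119.083)        // first two firms of D2
A1 = haversine_km(P, Q) :<= 20                    // 1 if within 20 km, 0 otherwise
A1
end
```

With the example coordinates, every respondent is far from every firm, so this toy A1 contains only zeros, which is why the 1s in the example are added by hand.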

A1 looks like this (the 1s were added for the sake of the example):

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long idstd byte(f_1 f_2 f_3 f_4 f_5)
468901 0 0 0 0 0
468902 0 1 0 0 0
468903 0 0 1 0 0
468904 0 0 0 0 1
468905 0 0 0 0 0
468906 0 0 1 0 0
468907 0 0 0 0 0
468908 0 1 0 0 1
468909 0 0 0 0 0
468910 0 1 0 0 1
end

However, I wish to build a second adjacency matrix (call it A2) of size 84 000 rows * 84 000 columns (84 000 respondents from D1 * 84 000 respondents from D1), with aij=n if respondents i and j from D1 both have a 1 in the same n columns of A1 (that is, if they are both within the radius of the same n firms from D2) and aij=0 otherwise. By convention aii=1 for the same respondent, as a respondent is always considered within its own network.

Following up on the previous example, this new (symmetric) adjacency matrix A2 would look like this:

Code:
clear
input long idstd byte(v_468901 v_468902 v_468903 v_468904 v_468905 v_468906 v_468907 v_468908 v_468909 v_468910)
468901 1 0 0 0 0 0 0 0 0 0
468902 0 1 0 0 0 0 0 1 0 1
468903 0 0 1 0 0 1 0 0 0 0
468904 0 0 0 1 0 0 0 1 0 1
468905 0 0 0 0 1 0 0 0 0 0
468906 0 0 1 0 0 1 0 0 0 0
468907 0 0 0 0 0 0 1 0 0 0
468908 0 1 0 1 0 0 0 1 0 2
468909 0 0 0 0 0 0 0 0 1 0
468910 0 1 0 1 0 0 0 2 0 1
end

For example, survey respondents 468908 and 468910 are both within the 20-km radius of firms f_2 and f_5, hence they receive a value of 2, as they have 2 links between each other. Survey respondents 468903 and 468906 are both within the 20-km radius of firm f_3, hence they receive a value of 1, as they have only 1 link.
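In matrix terms, the definition above appears to correspond to A2 = A1*A1' with the diagonal reset to 1. A minimal Mata sketch on the toy A1 from this post (the names `A1` and `A2` in Mata are just illustrative) would be:

```stata
clear
input long idstd byte(f_1 f_2 f_3 f_4 f_5)
468901 0 0 0 0 0
468902 0 1 0 0 0
468903 0 0 1 0 0
468904 0 0 0 0 1
468905 0 0 0 0 0
468906 0 0 1 0 0
468907 0 0 0 0 0
468908 0 1 0 0 1
468909 0 0 0 0 0
468910 0 1 0 0 1
end

mata:
A1 = st_data(., "f_1 f_2 f_3 f_4 f_5")   // 10 x 5 respondent-by-firm matrix
A2 = A1 * A1'                            // entry (i,j) = number of firms shared by i and j
_diag(A2, 1)                             // convention from the post: aii = 1
A2
end
```

This reproduces the A2 shown above for the ten example respondents. It says nothing about feasibility at full scale: a dense 84 000 * 84 000 matrix of doubles would need roughly 56 GB of memory, so this sketch alone does not solve the size problem.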

The goal of building this adjacency matrix is to use it with the acreg package (https://acregstata.weebly.com/). More precisely, this adjacency matrix would be used as the input for the option varlist_links.

I) My first issue is building such an adjacency matrix in Stata, because the limit for Stata SE/16 is 32,767 variables. My 84 000 columns far exceed this limit. I understand that this limit can be avoided by using Mata, but guidance on this point would be most welcome, as I am not at all familiar with Mata. To be clear, I don't know how to perform the transformation from A1 to A2.

II) My second question is: if what I want to do is doable in Mata, can this Mata matrix be used as input to the Stata command acreg, which "asks" for the adjacency matrix in variable form?
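To make "variable form" concrete, here is a sketch of how a small Mata matrix could be written back to the dataset as one variable per respondent, using `st_addvar()` and `st_store()`. The `v_` prefix follows the naming in the A2 example above; the four-observation A1 is a shortened toy version of the example. This approach is still bound by the variable limit, so it only illustrates the mechanics and would not work as is for 84 000 columns in Stata SE.

```stata
clear
input long idstd byte(f_2 f_5)
468902 1 0
468904 0 1
468908 1 1
468910 1 1
end

mata:
ids   = st_data(., "idstd")
A1    = st_data(., "f_2 f_5")
A2    = A1 * A1'                                   // shared-firm counts
_diag(A2, 1)                                       // aii = 1 by convention
names = ("v_" :+ strtrim(strofreal(ids, "%12.0f")))'   // v_468902, v_468904, ...
idx   = st_addvar("byte", names)                   // create one variable per respondent
st_store(., idx, A2)                               // copy the matrix into those variables
end

list idstd v_*, noobs
```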

Thank you in advance,

Loic Porte