I am using Stata 16 on macOS 10.14. I am currently working with two geocoded datasets. Dataset 1 (D1) is my main dataset, which consists of a set of answers to a survey (n=84 000 observations):
idstd is the id of the survey respondent, lat_1 the respondent's latitude and lon_1 the respondent's longitude.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double idstd float(lat_1 lon_1)
468901 18.801293 46.71348
468902  12.52105 48.13531
468903 14.525024 46.11655
468904 16.515405 49.12687
468905 14.546634 49.09956
468906  12.78726 43.62897
468907  14.76148 42.59135
468908  14.75744 48.61428
468909 14.537621 43.14062
468910 14.486714 41.04305
end
Dataset 2 (D2) is an auxiliary dataset, which consists of a set of specific firms (n=23 000 observations).
PropertyID is the id of the firm, lon_2 its longitude and lat_2 its latitude.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str8 PropertyID double(lat_2 lon_2)
"f_1"  -12.009   121.342
"f_2"  -25.163   119.083
"f_3"  -26.94044 101.9716
"f_4"  -28.93786 162.00677
"f_5"   43.957   100.929
"f_6"  -11.321   121.49
"f_7"  -41.711   140.465
"f_8"  -21.691   140.471
"f_9"  -31.684   140.472
"f_10" -21.679   140.473
end
I already built a first adjacency matrix (call it A1), which has 84 000 rows and 23 000 columns. In A1, aij=0 if survey respondent i from D1 is outside a 20-km radius of firm j from D2, and aij=1 if the respondent is inside that radius. This adjacency matrix was saved as a dataset and has n=84 000 observations and 23 001 variables (idstd + 23 000 firms).
A1 looks as follows (1s were added for the sake of the example):
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long idstd byte(f_1 f_2 f_3 f_4 f_5)
468901 0 0 0 0 0
468902 0 1 0 0 0
468903 0 0 1 0 0
468904 0 0 0 0 1
468905 0 0 0 0 0
468906 0 0 1 0 0
468907 0 0 0 0 0
468908 0 1 0 0 1
468909 0 0 0 0 0
468910 0 1 0 0 1
end
Following up on the previous example, the new (symmetric) adjacency matrix A2, which has one row and one column per survey respondent, would look like this:
Code:
input long idstd byte(v_468901 v_468902 v_468903 v_468904 v_468905 v_468906 v_468907 v_468908 v_468909 v_468910)
468901 1 0 0 0 0 0 0 0 0 0
468902 0 1 0 0 0 0 0 1 0 1
468903 0 0 1 0 0 1 0 0 0 0
468904 0 0 0 1 0 0 0 1 0 1
468905 0 0 0 0 1 0 0 0 0 0
468906 0 0 1 0 0 1 0 0 0 0
468907 0 0 0 0 0 0 1 0 0 0
468908 0 1 0 1 0 0 0 1 0 2
468909 0 0 0 0 0 0 0 0 1 0
468910 0 1 0 1 0 0 0 2 0 1
end
The goal of building this adjacency matrix is to use it with the acreg package (https://acregstata.weebly.com/). More precisely, this adjacency matrix would be used as the input for the option varlist_links.
I) My first issue is building such an adjacency matrix in Stata, because the variable limit for Stata/SE 16 is 32,767 variables. My 84 000 columns far exceed this limit. I understand that the limit can be avoided by using Mata, but guidance on this point would be most welcome, as I am not at all familiar with Mata. To be clear, I don't know how to perform the transformation from A1 to A2.
II) My second question is: if what I want to do is doable in Mata, can this Mata matrix be used as input for the Stata command acreg, which "asks" for the adjacency matrix in variable form?
Thank you in advance,
Loic Porte
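On the toy example above, A2 coincides with the product of A1 and its transpose, with the diagonal reset to 1. A minimal Mata sketch of that step follows, assuming this reading of A2 is the intended one. The int storage type and the way the v_ names are built are choices made for illustration, and nothing in the sketch is specific to acreg.

```stata
* Toy A1 from the example above
clear
input long idstd byte(f_1 f_2 f_3 f_4 f_5)
468901 0 0 0 0 0
468902 0 1 0 0 0
468903 0 0 1 0 0
468904 0 0 0 0 1
468905 0 0 0 0 0
468906 0 0 1 0 0
468907 0 0 0 0 0
468908 0 1 0 0 1
468909 0 0 0 0 0
468910 0 1 0 0 1
end

* Expand the firm indicator names into a local macro
unab firms : f_*

mata:
// n x k matrix of 0/1 indicators, read from the Stata variables
A1 = st_data(., st_local("firms"))

// Element (i,j) counts the firms that respondents i and j have in common
A2 = A1 * A1'

// The example A2 has 1s on the diagonal, so overwrite it (assumption)
_diag(A2, 1)

// Build one variable name per respondent: v_<idstd>
ids   = st_data(., "idstd")
names = "v_" :+ strtrim(strofreal(ids, "%12.0f"))'

// Copy A2 back into the dataset; int is used because counts could exceed
// the range of a byte when respondents share many firms
st_store(., st_addvar("int", names), A2)
end

list idstd v_468908 v_468910
```

This only illustrates the toy case. At full size A2 would be 84 000 x 84 000, which is roughly 56 GB if held as a dense Mata matrix of doubles, and writing it back as 84 000 variables would still run into the 32,767-variable limit of Stata/SE mentioned in question I. Whether acreg can accept the links in some other form is not addressed by this sketch.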