Dear all,

I have a list of patents, each identified by an application id (appln_id). Each patent has a number of technological classes, so-called IPC4 codes. I would like to calculate a simple co-occurrence matrix, i.e. for each pair of classes, how many patents carry both, for example class A01N and class A61P. Below I include part of my sample of patents with the corresponding IPC4 codes, as well as the resulting co-occurrence matrix based on that sample. The matrix was calculated manually, and as I have to repeat the exercise for a much larger sample of 300K patents, I am looking for a more efficient way to tackle the task. Hence, I would like to kindly ask for any suggestions as to how (or whether it is possible at all) to create such a co-occurrence matrix using Stata.
Sample:
appln_id ipc4
335751077 A01N
458497114 A01N
497 A61K
1204 A61K
58708 A61K
159561 A61K
16525572 A61K
16684626 A61K
16906855 A61K
17420428 A61K
55216987 A61K
266933230 A61K
335751077 A61K
405325474 A61K
417635173 A61K
458497114 A61K
458497114 A61L
58708 A61P
159561 A61P
16684626 A61P
16906855 A61P
17420428 A61P
266933230 A61P
335751077 A61P
417635173 A61P
497 A61Q
16684626 C07C
17420428 C07C
17420428 C07D
335751077 C07D
458497114 C07H
2 C07K
72 C07K
1204 C07K
159561 C07K
16906855 C07K
17420428 C07K
458497114 C07K
497 C11D
55217042 C12M
2 C12N
72 C12N
1204 C12N
32352 C12N
159561 C12N
386134 C12N
16906855 C12N
55217042 C12N
405325474 C12N
458497114 C12N
2 C12P
159561 C12P
16906855 C12P
458497114 C12P
159561 C12Q
55217042 C12Q
417635364 C12Q
2 C12R
159561 C12R
2 G01N
159561 G01N
55217042 G01N
Co-occurrence matrix:
ipc4 A01N A61K A61L A61P A61Q C07C C07D C07H C07K C11D C12M C12N C12P C12Q C12R G01N
A01N 0 2 1 1 0 0 1 1 1 0 0 1 1 0 0 0
A61K 2 2 1 8 1 2 2 1 5 1 0 5 3 1 1 1
A61L 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0 0
A61P 1 8 0 0 0 2 2 0 3 0 0 2 2 1 1 1
A61Q 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
C07C 0 2 0 2 0 0 1 0 1 0 0 0 0 0 0 0
C07D 1 2 0 2 0 1 0 0 1 0 0 0 0 0 0 0
C07H 1 1 1 0 0 0 0 0 1 0 0 1 1 0 0 0
C07K 1 5 1 3 0 1 1 1 0 0 0 6 4 1 2 2
C11D 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0
C12M 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1
C12N 1 5 1 2 0 0 0 1 6 0 1 2 4 2 2 3
C12P 1 3 1 2 0 0 0 1 4 0 0 4 0 1 2 2
C12Q 0 1 0 1 0 0 0 0 1 0 1 2 1 1 1 2
C12R 0 1 0 1 0 0 0 0 2 0 0 2 2 1 0 2
G01N 0 1 0 1 0 0 0 0 2 0 1 3 2 2 2 0
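To make the off-diagonal counts unambiguous, here is a minimal sketch of the calculation in Python. It is only meant to pin down the definition, since what I actually need is the Stata equivalent. The function name cooccurrence and the small rows list are made up for illustration; the rows are a subset of the sample above. The sketch counts pairs of distinct classes only, so it does not reproduce the diagonal of the table.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(rows):
    """Count, for each unordered pair of distinct classes, the patents carrying both.

    rows: iterable of (appln_id, ipc4) pairs, one per patent-class combination.
    Returns a Counter keyed by alphabetically sorted (class_a, class_b) tuples.
    """
    # Collect the set of classes per patent; a set ignores duplicate rows.
    classes_by_patent = defaultdict(set)
    for appln_id, ipc4 in rows:
        classes_by_patent[appln_id].add(ipc4)

    counts = Counter()
    for classes in classes_by_patent.values():
        # Every unordered pair of classes on the same patent co-occurs once.
        for a, b in combinations(sorted(classes), 2):
            counts[(a, b)] += 1
    return counts


# Illustrative subset of the sample above (two patents only).
rows = [
    (335751077, "A01N"), (458497114, "A01N"),
    (335751077, "A61K"), (458497114, "A61K"),
    (335751077, "A61P"), (458497114, "A61L"),
]

counts = cooccurrence(rows)
print(counts[("A01N", "A61K")])  # 2, as in the table
```

The matrix is symmetric, so the cell for (A61K, A01N) is the same as the one for (A01N, A61K); that is why the sketch stores each pair only once.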
Best,
Marcelina