I have a list of patents, each identified by an application ID (appln_id). Each patent is assigned one or more technological classes, the so-called IPC4 codes. I would like to calculate a simple co-occurrence matrix, i.e. count how many patents carry, for example, both class A01N and class A61P.

Below I include part of my sample of patents with their IPC4 codes, together with the co-occurrence matrix computed from that sample. I calculated it manually, and since I have to repeat the exercise for a much larger sample of 300K patents, I am looking for a more efficient way to do it. I would be grateful for any suggestions or hints as to how (or whether) such a co-occurrence matrix can be created in Stata.
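Stated formally (the notation is introduced here and is not part of the original question): let S_p be the set of IPC4 classes of patent p, and let B be the 0/1 patent-by-class incidence matrix with B_pi = 1 exactly when class i is in S_p. Judging from the hand-computed matrix in this post, its cells are:

```latex
\begin{aligned}
C_{ij} &= \sum_{p} B_{pi}\,B_{pj}
        = \bigl|\{\, p : i \in S_p \text{ and } j \in S_p \,\}\bigr|, && i \neq j,\\
C_{ii} &= \bigl|\{\, p : S_p = \{i\} \,\}\bigr|.
\end{aligned}
```

Off the diagonal this is simply B^T B. On the diagonal, B^T B would instead give the number of patents carrying class i at all (14 for A61K in the sample), whereas the posted matrix shows 2 for A61K, the number of patents whose only class is A61K.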
Sample
appln_id | ipc4 |
335751077 | A01N |
458497114 | A01N |
497 | A61K |
1204 | A61K |
58708 | A61K |
159561 | A61K |
16525572 | A61K |
16684626 | A61K |
16906855 | A61K |
17420428 | A61K |
55216987 | A61K |
266933230 | A61K |
335751077 | A61K |
405325474 | A61K |
417635173 | A61K |
458497114 | A61K |
458497114 | A61L |
58708 | A61P |
159561 | A61P |
16684626 | A61P |
16906855 | A61P |
17420428 | A61P |
266933230 | A61P |
335751077 | A61P |
417635173 | A61P |
497 | A61Q |
16684626 | C07C |
17420428 | C07C |
17420428 | C07D |
335751077 | C07D |
458497114 | C07H |
2 | C07K |
72 | C07K |
1204 | C07K |
159561 | C07K |
16906855 | C07K |
17420428 | C07K |
458497114 | C07K |
497 | C11D |
55217042 | C12M |
2 | C12N |
72 | C12N |
1204 | C12N |
32352 | C12N |
159561 | C12N |
386134 | C12N |
16906855 | C12N |
55217042 | C12N |
405325474 | C12N |
458497114 | C12N |
2 | C12P |
159561 | C12P |
16906855 | C12P |
458497114 | C12P |
159561 | C12Q |
55217042 | C12Q |
417635364 | C12Q |
2 | C12R |
159561 | C12R |
2 | G01N |
159561 | G01N |
55217042 | G01N |
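The counting itself does not require building a full patent-by-class table. A minimal sketch follows, written in Python only to illustrate the logic (the question asks about Stata); the names `ROWS`, `classes_by_patent` and `pair_counts` are hypothetical. It groups the long-format rows by appln_id and then, for every patent, counts each unordered pair of its classes once. Only a few rows of the sample are used.

```python
from collections import Counter, defaultdict
from itertools import combinations

# A few (appln_id, ipc4) rows taken from the sample, in long format.
ROWS = [
    (335751077, "A01N"), (335751077, "A61K"), (335751077, "A61P"), (335751077, "C07D"),
    (58708, "A61K"), (58708, "A61P"),
    (16525572, "A61K"),
]

def classes_by_patent(rows):
    """Group long-format rows into {appln_id: set of IPC4 classes}."""
    groups = defaultdict(set)
    for appln_id, ipc4 in rows:
        groups[appln_id].add(ipc4)
    return groups

def pair_counts(groups):
    """For each unordered class pair, count how many patents carry both classes."""
    counts = Counter()
    for classes in groups.values():
        # sorted() makes each pair canonical, so (A61K, A61P) and (A61P, A61K) coincide
        for a, b in combinations(sorted(classes), 2):
            counts[(a, b)] += 1
    return counts

counts = pair_counts(classes_by_patent(ROWS))
print(counts[("A61K", "A61P")])  # 2 (patents 335751077 and 58708)
```

Because each patent only contributes pairs of its own classes, the work grows with the number of classes per patent rather than with the square of the number of patents, which should keep a 300K-patent run manageable. In Stata, the corresponding idea would be to pair each record with the other records of the same appln_id (for example via `joinby` against a saved copy of the data) and then count class pairs; that variant is not shown here.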
Co-occurrence matrix
ipc4 | A01N | A61K | A61L | A61P | A61Q | C07C | C07D | C07H | C07K | C11D | C12M | C12N | C12P | C12Q | C12R | G01N |
A01N | 0 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
A61K | 2 | 2 | 1 | 8 | 1 | 2 | 2 | 1 | 5 | 1 | 0 | 5 | 3 | 1 | 1 | 1 |
A61L | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
A61P | 1 | 8 | 0 | 0 | 0 | 2 | 2 | 0 | 3 | 0 | 0 | 2 | 2 | 1 | 1 | 1 |
A61Q | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
C07C | 0 | 2 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
C07D | 1 | 2 | 0 | 2 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
C07H | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
C07K | 1 | 5 | 1 | 3 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 6 | 4 | 1 | 2 | 2 |
C11D | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
C12M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
C12N | 1 | 5 | 1 | 2 | 0 | 0 | 0 | 1 | 6 | 0 | 1 | 2 | 4 | 2 | 2 | 3 |
C12P | 1 | 3 | 1 | 2 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 4 | 0 | 1 | 2 | 2 |
C12Q | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 1 | 1 | 1 | 2 |
C12R | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 2 | 1 | 0 | 2 |
G01N | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 3 | 2 | 2 | 2 | 0 |
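To check that this logic reproduces the hand-computed table, here is a self-contained Python sketch over the full sample (again an illustration, not a Stata solution; `SAMPLE` and `cooccurrence_matrix` are hypothetical names). One convention is read off the numbers rather than stated in the post: the diagonal counts patents whose only class is that class (A61K: 16525572 and 55216987; C12N: 32352 and 386134; C12Q: 417635364), while every off-diagonal cell counts patents carrying both classes.

```python
from collections import defaultdict
from itertools import combinations

# The sample from the post, written compactly as {ipc4: [appln_id, ...]}.
SAMPLE = {
    "A01N": [335751077, 458497114],
    "A61K": [497, 1204, 58708, 159561, 16525572, 16684626, 16906855, 17420428,
             55216987, 266933230, 335751077, 405325474, 417635173, 458497114],
    "A61L": [458497114],
    "A61P": [58708, 159561, 16684626, 16906855, 17420428, 266933230, 335751077, 417635173],
    "A61Q": [497],
    "C07C": [16684626, 17420428],
    "C07D": [17420428, 335751077],
    "C07H": [458497114],
    "C07K": [2, 72, 1204, 159561, 16906855, 17420428, 458497114],
    "C11D": [497],
    "C12M": [55217042],
    "C12N": [2, 72, 1204, 32352, 159561, 386134, 16906855, 55217042, 405325474, 458497114],
    "C12P": [2, 159561, 16906855, 458497114],
    "C12Q": [159561, 55217042, 417635364],
    "C12R": [2, 159561],
    "G01N": [2, 159561, 55217042],
}

def cooccurrence_matrix(rows):
    """Return (classes, matrix): matrix[i][j] counts patents with both classes,
    and the diagonal counts patents whose only class is classes[i]."""
    groups = defaultdict(set)
    for appln_id, ipc4 in rows:
        groups[appln_id].add(ipc4)
    classes = sorted({ipc4 for _, ipc4 in rows})
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for patent_classes in groups.values():
        if len(patent_classes) == 1:
            (only,) = patent_classes  # single-class patent goes on the diagonal
            matrix[index[only]][index[only]] += 1
        for a, b in combinations(patent_classes, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1  # fill both halves so the matrix is symmetric
            matrix[j][i] += 1
    return classes, matrix

# Flatten back to long-format (appln_id, ipc4) rows, as in the original data.
rows = [(appln_id, ipc4) for ipc4, ids in SAMPLE.items() for appln_id in ids]
classes, matrix = cooccurrence_matrix(rows)
print("ipc4 | " + " | ".join(classes) + " |")
for c, row in zip(classes, matrix):
    print(c + " | " + " | ".join(map(str, row)) + " |")
```

Run on the sample, it prints the same 16×16 table as the hand-computed one, in the same pipe-separated layout. If the standard B^T B diagonal (patents carrying the class at all) is preferred, the single-class branch can be replaced by incrementing matrix[index[c]][index[c]] for every class c of each patent.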
Marcelina