I have a list of patents, each identified by an application id (appln_id). Each patent has a number of technological classes, the so-called IPC4 codes. I would like to calculate a simple co-occurrence matrix, i.e. how many patents have, for example, both class A01N and class A61P. Below I include part of my sample of patents with the corresponding IPC4 codes, as well as the resulting co-occurrence matrix based on that sample. It was calculated manually, and as I have to repeat the exercise for a much larger sample of 300K patents, I am looking for a more efficient way to tackle the task. Hence, I would like to kindly ask for any suggestions and clues as to how (or whether it is possible at all) to create such a co-occurrence matrix using Stata.
Sample
| appln_id | ipc4 |
| 335751077 | A01N |
| 458497114 | A01N |
| 497 | A61K |
| 1204 | A61K |
| 58708 | A61K |
| 159561 | A61K |
| 16525572 | A61K |
| 16684626 | A61K |
| 16906855 | A61K |
| 17420428 | A61K |
| 55216987 | A61K |
| 266933230 | A61K |
| 335751077 | A61K |
| 405325474 | A61K |
| 417635173 | A61K |
| 458497114 | A61K |
| 458497114 | A61L |
| 58708 | A61P |
| 159561 | A61P |
| 16684626 | A61P |
| 16906855 | A61P |
| 17420428 | A61P |
| 266933230 | A61P |
| 335751077 | A61P |
| 417635173 | A61P |
| 497 | A61Q |
| 16684626 | C07C |
| 17420428 | C07C |
| 17420428 | C07D |
| 335751077 | C07D |
| 458497114 | C07H |
| 2 | C07K |
| 72 | C07K |
| 1204 | C07K |
| 159561 | C07K |
| 16906855 | C07K |
| 17420428 | C07K |
| 458497114 | C07K |
| 497 | C11D |
| 55217042 | C12M |
| 2 | C12N |
| 72 | C12N |
| 1204 | C12N |
| 32352 | C12N |
| 159561 | C12N |
| 386134 | C12N |
| 16906855 | C12N |
| 55217042 | C12N |
| 405325474 | C12N |
| 458497114 | C12N |
| 2 | C12P |
| 159561 | C12P |
| 16906855 | C12P |
| 458497114 | C12P |
| 159561 | C12Q |
| 55217042 | C12Q |
| 417635364 | C12Q |
| 2 | C12R |
| 159561 | C12R |
| 2 | G01N |
| 159561 | G01N |
| 55217042 | G01N |

Co-occurrence matrix
| ipc4 | A01N | A61K | A61L | A61P | A61Q | C07C | C07D | C07H | C07K | C11D | C12M | C12N | C12P | C12Q | C12R | G01N |
| A01N | 0 | 2 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| A61K | 2 | 2 | 1 | 8 | 1 | 2 | 2 | 1 | 5 | 1 | 0 | 5 | 3 | 1 | 1 | 1 |
| A61L | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| A61P | 1 | 8 | 0 | 0 | 0 | 2 | 2 | 0 | 3 | 0 | 0 | 2 | 2 | 1 | 1 | 1 |
| A61Q | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| C07C | 0 | 2 | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C07D | 1 | 2 | 0 | 2 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C07H | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| C07K | 1 | 5 | 1 | 3 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 6 | 4 | 1 | 2 | 2 |
| C11D | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| C12M | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| C12N | 1 | 5 | 1 | 2 | 0 | 0 | 0 | 1 | 6 | 0 | 1 | 2 | 4 | 2 | 2 | 3 |
| C12P | 1 | 3 | 1 | 2 | 0 | 0 | 0 | 1 | 4 | 0 | 0 | 4 | 0 | 1 | 2 | 2 |
| C12Q | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 2 | 1 | 1 | 1 | 2 |
| C12R | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 2 | 1 | 0 | 2 |
| G01N | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 3 | 2 | 2 | 2 | 0 |
Marcelina
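
One possible way to approach this in Stata, offered as a sketch rather than a tested solution: join the data with a copy of itself on appln_id, so that every class of a patent is paired with every other class of the same patent, and then count the pairs. It assumes the data are in memory in the long format shown in the sample, with variables named appln_id and ipc4 (ipc4 being a string). The names ipc4_a, ipc4_b, n_patents and the tempfile classes are introduced here for illustration only.

```stata
* Sketch only: assumes long-format data in memory with the
* variables appln_id and ipc4 (string), as in the sample above.

* Keep one row per patent-class combination
duplicates drop appln_id ipc4, force

* Save a copy in which the class variable has a different name
preserve
rename ipc4 ipc4_b
tempfile classes
save `classes'
restore

* Pair every class of a patent with every class of the same patent
rename ipc4 ipc4_a
joinby appln_id using `classes'

* Drop self-pairs (e.g. A61K with A61K), leaving the diagonal empty
drop if ipc4_a == ipc4_b

* One row per ordered pair of classes, with the number of patents
contract ipc4_a ipc4_b, freq(n_patents)

* Show the counts as a symmetric two-way table
tabulate ipc4_a ipc4_b [fweight = n_patents]
```

After contract, the data hold one row per ordered pair of classes, so each pair appears twice (A01N/A61K and A61K/A01N) and the table is symmetric. Removing the drop line would instead put the number of patents in each class on the diagonal. For the full sample of 300K patents the number of distinct classes may exceed what tabulate can display, depending on the Stata edition, in which case keeping the pair counts in long format is probably more practical.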