Hi everyone,
I am relatively new to Stata. For my thesis, I am working on a dataset that lists patents and the patents they cite, together with their respective filing dates and industry classifications (see the example below).
Now I would like to identify how often the industry classification of each patent (cpc_p) was used in the 5 years before that patent's filing date (date_p), and what percentage of those citations referred to patents in the same industry class (i.e. cpc_c equal to cpc_p). Ideally, I would create a new variable that holds this percentage for each observation.
I imagine this could be achieved by counting the observations with equal and unequal industry classifications (i.e. cpc_p = cpc_c or cpc_p ≠ cpc_c). However, I cannot figure out how to do this for a limited but shifting time window (i.e. the 5 years before each date_p).

Any help would be much appreciated.

Thank you for your support!

Best,

Paul


Example of data:
patent_id citation_id date_p date_c cpc_p cpc_c
6523634 4779697 1998-04-09 1996-06-11 B62K B62K
6732830 4779697 2002-03-12 1996-06-11 B62K B62K
5514171 4784151 1994-07-07 1992-03-30 A61B A61B
6591143 4784151 2000-05-24 1992-03-30 A61N A61B
5271392 4784151 1997-01-27 1992-03-30 A61N A61B
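
To make the calculation I have in mind more concrete, here is a rough sketch of the logic in Python rather than Stata, using only the five example rows above. It rests on several assumptions of mine: the window runs from exactly 5 years before date_p up to (but not including) date_p, only rows with the same cpc_p are counted, a patent's own row is therefore excluded, and the percentage is left missing when the window is empty. The names years_before and same_class_share are made up for illustration.

```python
from datetime import date

# Example rows from the post:
# (patent_id, citation_id, date_p, date_c, cpc_p, cpc_c)
rows = [
    ("6523634", "4779697", date(1998, 4, 9), date(1996, 6, 11), "B62K", "B62K"),
    ("6732830", "4779697", date(2002, 3, 12), date(1996, 6, 11), "B62K", "B62K"),
    ("5514171", "4784151", date(1994, 7, 7), date(1992, 3, 30), "A61B", "A61B"),
    ("6591143", "4784151", date(2000, 5, 24), date(1992, 3, 30), "A61N", "A61B"),
    ("5271392", "4784151", date(1997, 1, 27), date(1992, 3, 30), "A61N", "A61B"),
]


def years_before(d, years):
    """Return the date `years` years before d (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def same_class_share(rows, years=5):
    """For each row, return (count, percentage) over the preceding window.

    count:      rows with the same cpc_p whose date_p lies in
                [date_p - years, date_p)   (assumed window definition)
    percentage: share of those rows where cpc_c equals cpc_p,
                or None if the window contains no rows
    """
    results = []
    for _, _, date_p, _, cpc_p, _ in rows:
        start = years_before(date_p, years)
        window = [r for r in rows if r[4] == cpc_p and start <= r[2] < date_p]
        n_total = len(window)
        n_same = sum(1 for r in window if r[5] == r[4])
        share = 100.0 * n_same / n_total if n_total else None
        results.append((n_total, share))
    return results


for row, (n_total, share) in zip(rows, same_class_share(rows)):
    print(row[0], row[4], n_total, share)
```

The loop compares every row with every other row, which is fine for illustration but would be slow on the full dataset. What I am looking for is the Stata equivalent of this windowed count and share.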