Dear Statalist Community,

I am currently struggling with a data transformation task and was wondering if you could help me (see example below).
  1. First, I want to calculate the cumulative number of patents (patent_id) in a moving window of ten years (patent_year) for a set of companies (firm_id); i.e., how many patents have been granted to the company in the last ten years? In the following example, the variable of interest is entered manually as “cum_10y”.
  2. Second, I would like to obtain the cumulative number of different patent classes of each company in a moving ten-year window. In other words, I would like to know in which patent classes a company has been active over the last ten years. In the example, the patent classes are given as string variables in the columns ipc_class1 to ipc_class3 (although there are more columns in the actual data set). The unique ipc classes (e.g., "04") should be written as new columns (e.g., class4), and the new entries should contain the cumulative sum of the last ten years. Also note that the same ipc_classes can be listed several times and appear in different columns, but should only be counted once. The new variables of interest are manually inserted as class1 to class32 in the example below, with each column containing the cumulative moving sum some for each unique patent class.
  3. Third, I would like to expand the data set so that missing years are added and filled with the value from the previous year (e.g., if there was no patent in 2000 for a given firm, I would like to add a new line for patent_year 2000 containing the values from the previous available year). Sometimes, several years may be missing.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int patent_id byte firm_id int patent_year str2(ipc_class1 ipc_class2 ipc_class3) byte(cum_10y class1 class2 class4 class9 class32)
123 1 1990 "04" "04" "01" 1 1 . 1 . .
124 1 1991 "01" "01" ""   5 5 . 1 1 .
125 1 1991 "01" "09" ""   5 5 . 1 1 .
126 1 1991 "01" ""   ""   5 5 . 1 1 .
127 1 1991 "01" ""   ""   5 5 . 1 1 .
128 1 1992 "01" ""   ""   7 6 . 1 1 .
129 1 1992 "04" "01" ""   7 6 . 2 1 .
130 1 2003 "09" ""   ""   1 . . . 1 .
131 1 2004 "01" ""   ""   3 1 . . 1 .
132 1 2004 "04" "04" ""   3 1 . 1 1 .
133 1 2007 "01" ""   ""   4 2 . 1 1 .
134 1 2008 "32" ""   ""   5 2 . 1 1 1
135 1 2009 "01" ""   ""   6 3 . 1 1 1
221 2 2000 "01" ""   ""   1 1 . . . .
222 2 2001 "01" "04" ""   2 2 . 1 . .
223 2 2002 "32" "02" "02" 3 2 1 1 . 1
224 2 2006 "01" ""   ""   5 4 1 1 . .
225 2 2006 "01" "04" "04" 5 4 1 2 . .
226 2 2007 "01" ""   ""   6 5 1 2 . .
227 2 2009 "09" "01" ""   7 6 1 2 1 .
228 2 2012 "32" ""   ""   6 6 1 2 1 1
end

I apologize for this lengthy post and look forward to a reply.

Kind regards,

Marvin