Dear Statalist, I have a dataset of several firms patenting. However, there are patents with more than one firm, which is captured by different variables (citing_firm_id1, citing_firm_id2, citing_firm_id3…). My aim is to end with a single obs for each firm-year (a panel of firms). For this, I use the main firm variable which is “citing_firm_id1”. The problem is that there are some firms that never appear in this main variable, but they might appear in the rest of variables (citing_firm_id2, citing_firm_id3…), something I capture using the variables match2, match3,... which is equal to 1 if they do not appear in the main variable citing_firm_id1.

I would like to put those obs in which a given firm does not appear in the main variable (if match2, or match3… are 1) in the main variable. I think that maybe expanding only those obs and changing their ids in citing_firm_id1 for their ids in citing_firm_id2 or citing_firm_id3 might be a solution. Is it possible to do this only for those ids reporting a 1 in each of their matchx variables?

For instance, firm 993 is in the second firm variables (citing_firm_id2) and have match2==1. So, this firm never appear in citing_firm_id1 and thus, I would be losing such a firm. The idea would be to expand these 7 obs and put its id (993) in citing_firm_id1 only for those expanded obs. Notice however, that this firm is also in variable citing_firm_id3, so, I will need the same for this obs. It may happen this firm also appearing in several other variables (citing_firm_id4, citing_firm_id5… until citing_firm_id49) reporting its match4, maych5… variable as equal 1.

Also notice that in the case of citing_firm_id1== 2422461, the firms ids in citing_firm_id2 and citing_firm_id3 are reported as never showing up in citing_firm_id1. Thus, these 5 obs should be expanded 10 times (5 for id 3811035; and 5 for id 2754) while putting such ids in the expanded citing_firm_id1.

If you think there is a better way of dealing with this, it will be more than welcome.
Here you have a small dataex example for if you can help me with this problem.

Thanks a lot for your help.

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long(citing_firm_id1 citing_firm_id2) byte match2 int citing_firm_id3 byte match3 long(cited_pub_date citing_appln_id cited_appln_id) byte cit_total long citing_pub_date
1170342     993 1    . . 20110427 471259150 323675857 9 20180314
1170342     993 1    . . 20120613 470647463 339614333 8 20180221
1170342     993 1    . . 20100203 330934545 266772412 4 20120523
1170342     993 1    . . 20040114 448427401    340722 9 20170517
1170342     993 1    . . 20100203 420528862 266772412 4 20151202
1170342     993 1    . . 19960925 448427401  17174701 9 20170517
1170342     993 1    . . 20080716 496107987      7659 5 20191023
 114959    9472 1    . . 19810107  16716120  16457427 2 19890412
 114959    9472 1    . . 19851002  16716119  16605862 5 19890412
 114959    9472 1    . . 19810107  16716119  16457427 5 19890412
3083727    9429 .    . . 19951018  15747335  17106735 4 20020206
3083727    9429 .    . . 19910807  15747335  16888971 4 20020206
3083727    9429 .    . . 19941005  15747335  17032091 4 20020206
 520374   12024 .  993 1 20120613 451180603 339733097 7 20161019
2422461 3811035 1 2754 1 20140521 457930913 412195248 4 20171108
2422461 3811035 1 2754 1 20121212 457930913 363965896 4 20171108
2422461 3811035 1 2754 1 20121212 457930906 363965896 5 20171115
2422461 3811035 1 2754 1 19980617 457930906  17236097 5 20171115
2422461 3811035 1 2754 1 20140521 457930906 412195248 5 20171115
end