Hi all,
I have a dataset with patents, patent classes, inventors and firms. An example follows:
firm_id  year  inventor   move  patent   class   sourcef  kd
10036    2001  4144893-1  0     6197294  424093  .
10036    2001  4144893-1  0     6225448  424130  .
10036    2001  4212742-2  1     6322804  424422  100390   0.8
10036    2001  4495402-2  0     6322804  424422
10036    2001  4861627-1  0     6262034  424468
10036    2001  4868121-1  0     6322804  424422
10036    2001  4868121-2  0     6322804  424422
10036    2001  4877029-2        6322804  424422
firm_id: the focal firm that hires an inventor from a source firm (sourcef_id) in year t.
inventor_id: the inventor who is now working at the focal firm (firm_id).
move: dummy variable, 1 if the inventor moves to the focal firm in year t, otherwise 0.
class: classification number indicating the class of each patent (patent_id).
sourcef_id: id of the source firm, i.e. the inventor's employer before moving to the focal firm. This variable exists only when the inventor moves (move = 1).
kd: knowledge distance between the hiring firm and the mobile inventor at the time the inventor moves to the hiring firm. This is the desired variable.
E.g. the third row of the dataset says that focal firm 10036 (firm_id) hired inventor 4212742-2 in 2001 from firm 100390, and I would like to know the knowledge distance between inventor 4212742-2 and firm 10036.
How to calculate:
kd is the correlation of the vectors Ci and Cj:
$$kd_{ij} = \frac{C_i \cdot C_j}{\sqrt{(C_i \cdot C_i)(C_j \cdot C_j)}}$$
Ci: proportion of inventor i's patents in each patent class over the 5 years before the move.
Cj: proportion of firm j's patents in each patent class over the 5 years before hiring the inventor.
kd ranges from 0 to 1; the larger the value, the smaller the knowledge distance (the closer) between the hiring firm and the mobile inventor.
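To make the formula concrete, here is a minimal Python sketch (the data look like Stata, so Python is just my choice for illustration). It assumes each vector is stored as a dictionary from patent class to share, with missing classes treated as 0; the function name kd and the example shares are made up and not taken from the dataset.

```python
from math import sqrt

def kd(ci, cj):
    """Uncentered correlation of two class-share vectors.

    ci, cj: dicts mapping patent class -> share; absent classes count as 0.
    Returns None when either vector is all zeros (no patent history).
    """
    dot = sum(share * cj.get(cls, 0.0) for cls, share in ci.items())
    norm = sqrt(sum(v * v for v in ci.values()) * sum(v * v for v in cj.values()))
    return dot / norm if norm > 0 else None

# Hypothetical shares, for illustration only.
inventor = {"424422": 0.5, "424093": 0.5}
firm = {"424422": 0.5, "424130": 0.5}
print(kd(inventor, firm))  # 0.5
```

The inventor and the firm overlap in one of their two classes, so the value falls halfway between 0 (no shared class) and 1 (identical profile).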
Take the third row as an example: inventor 4212742-2 moves to firm 10036 in 2001. As far as I understand, the calculation could go as follows (there is probably a simpler method):
First, calculate the patent class vector Ci for inventor 4212742-2. Suppose the inventor's patent history by patent class over the 5 years before the move is:
Ci = [1/16, 3/16, 0, 0 ,0 , 5/16, 0, 0, ......] : a 1*k matrix, where k is the number of patent classes.
Second, calculate the patent class vector for the hiring firm:
Cj= [1/120, 1/120, 0, 0 ,0 , 4/120, 0, 0, ......] : a 1*k matrix, where k is the number of patent classes.
Third, calculate the knowledge distance above for each pair of mobile inventor and hiring firm.
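The first two steps could be sketched in Python roughly like this, assuming one class code per patent. The function class_shares and the example history are hypothetical. Storing only the classes that actually occur avoids building the full 1*k vector, since every other entry is 0 anyway.

```python
from collections import Counter

def class_shares(classes):
    """Turn a list with one class code per patent into {class: share of patents}."""
    counts = Counter(classes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}

# Hypothetical history: four patents, one class code each.
history = ["424422", "424422", "424093", "424130"]
shares = class_shares(history)
print(shares)  # {'424422': 0.5, '424093': 0.25, '424130': 0.25}
```

The same function would serve for the inventor (Ci) and the firm (Cj); only the list of patents passed in differs.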
However, there are thousands of inventor mobility events in different years, and I have trouble translating this logic into code. Any suggestions would be much appreciated!
Notes:
1. For inventors with less than 5 years of patent history before the move, use the years they have.
2. For firms with less than 5 years of patent history before hiring, use the years they have.
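One possible way to loop over all mobility events, as a rough Python sketch with made-up toy rows (the ids F1, A, p1, X and so on are not from the real data). It rests on several assumptions that would need checking: the window is years t-5 to t-1, the inventor's history includes patents at any employer, and a patent with several co-inventors counts once for the firm. Because the formula does not change when a vector is rescaled, raw patent counts give the same kd as proportions, so the division step is skipped. Shorter histories need no special handling, since the window filter simply uses whatever rows exist.

```python
from collections import Counter
from math import sqrt

WINDOW = 5  # assumed window: years t-5 .. t-1

# Made-up rows: (firm_id, year, inventor_id, move, patent_id, class)
rows = [
    ("F2", 1998, "A", 0, "p1", "X"),
    ("F2", 1999, "A", 0, "p2", "Y"),
    ("F1", 1994, "B", 0, "p0", "Y"),  # falls outside the 1996-2000 window
    ("F1", 1999, "B", 0, "p3", "X"),
    ("F1", 1999, "C", 0, "p3", "X"),  # co-inventor on the same patent
    ("F1", 2000, "B", 0, "p4", "Z"),
    ("F1", 2001, "A", 1, "p5", "X"),  # A moves from F2 to F1 in 2001
]

def class_counts(rows, year, key_index, key):
    """Count distinct patents per class for one firm (key_index=0)
    or one inventor (key_index=2) within the window before `year`."""
    patents = {
        (r[4], r[5])
        for r in rows
        if r[key_index] == key and year - WINDOW <= r[1] < year
    }
    return Counter(cls for _, cls in patents)

def cosine(a, b):
    """Uncentered correlation of two count vectors; None if either is empty."""
    dot = sum(n * b.get(cls, 0) for cls, n in a.items())
    norm = sqrt(sum(n * n for n in a.values()) * sum(n * n for n in b.values()))
    return dot / norm if norm > 0 else None

kd = {}
for firm, year, inventor, move, _, _ in rows:
    if move == 1:
        ci = class_counts(rows, year, 2, inventor)  # inventor's prior patents
        cj = class_counts(rows, year, 0, firm)      # hiring firm's prior patents
        kd[(firm, inventor, year)] = cosine(ci, cj)

print(kd)  # {('F1', 'A', 2001): 0.5}
```

This rescans all rows for every event, which is simple but slow on a large file; grouping the rows by inventor and by firm once beforehand would likely be faster.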