Hi everyone,

I am working with a large dataset and could really use your help. I am trying to create a pairwise measure between assignees. To do this I build a matrix from two categorical variables: I turn one of them, the inventor ID, into dummy variables, so that a 1 indicates that assignee i has worked with that inventor. However, I can't simply create dummies for every category of the inventor variable, because that would be more variables than Stata allows. Is there a way to write a loop that takes two assignees at a time, computes my similarity measure for that pair, and combines all the results at the end?

Below is the code that I have been working with on smaller samples. What I want to do from here is have the code loop through a few pairs of assignees at a time, store the results in pairwise format, and merge them, so that in the end I get a dataset with three variables | assignee_id_1 assignee_id_2 total_inv | where the last variable is a measure along the lines of the total number of shared inventors. Note that the first part of the code (under **manual dummy creator) won't work in my scenario, because too many variables are created; that step would have to be incorporated into the loop.

I have also added a data example at the very bottom. Please do let me know should anything be unclear and I will try to resolve any confusion as quickly as possible. I would really appreciate any help I could get.

Thanks,
John

Code:
*********************************
encode inventor_id, gen(enc_inv_id)
**manual dummy creator
levelsof enc_inv_id, local(n)
foreach j in `n'{
gen d_`j'= `j' == enc_inv_id
}

***create matrix
collapse (max) d_*, by(assignee_id)



******
******Jaccard
******
preserve
rename * *_inv
tempfile id2_inv
save `id2_inv'
restore
* form every ordered pair of assignees
cross using `id2_inv'
* keep each unordered pair once and drop self-pairs
drop if assignee_id <= assignee_id_inv

*number is 30325 for 500
forval i=1/30235{
* tinv_`i' is 1 if at least one of the two assignees has inventor `i', else 0
capture gen tinv_`i'= cond((d_`i'+d_`i'_inv) > 1, 1, (d_`i'+d_`i'_inv))
}

egen total_inv= rowtotal(tinv_*)
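
To make the target output concrete, here is a minimal sketch of the three-variable result | assignee_id_1 assignee_id_2 total_inv | on a hypothetical toy dataset (assignees A, B, C and inventors i1 to i4 are made up). It swaps the dummy-variable step for -joinby- on inventor_id, which is an assumption and not the approach in the code above, and it counts only inventors that both assignees have in common, whereas tinv_* above flags inventors held by either assignee:

```stata
* Hypothetical toy data: three assignees, four inventors
clear
input str4 inventor_id str4 assignee_id
"i1" "A"
"i1" "B"
"i2" "A"
"i2" "B"
"i2" "B"
"i3" "C"
"i4" "A"
"i4" "C"
end

* One row per inventor-assignee link, so repeated rows do not double count
keep inventor_id assignee_id
duplicates drop

* Save a second copy of the links with the assignee renamed
preserve
rename assignee_id assignee_id_2
tempfile links
save `links'
restore
rename assignee_id assignee_id_1

* Pair up all assignees that are linked to the same inventor
joinby inventor_id using `links'

* Keep each unordered pair once and drop self-pairs
keep if assignee_id_1 < assignee_id_2

* Count the shared inventors per pair
contract assignee_id_1 assignee_id_2, freq(total_inv)
list, noobs
```

On the toy data this leaves two rows: A and B share two inventors, A and C share one. Pairs with no shared inventor (B and C here) do not appear at all, so they would have to be filled in with zeros separately if they are needed.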

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 inventor_id str36 assignee_id
"10082632-2" "org_LzsrUkjiWAIjd5v2paoI"
"7107498-1"  "org_kSu9KMJQP5Qjd1ksQEfJ"
"7298882-3"  "org_Re0fk3jNFREw0dcTKfbh"
"5493647-7"  "org_vhtWu21o8nVXJPQm82Rg"
"10050270-2" "org_vhtWu21o8nVXJPQm82Rg"
"10072529-1" "org_vhtWu21o8nVXJPQm82Rg"
"10085369-1" "org_0b1dsqimqIilzoArXt5E"
"10085369-1" "org_0b1dsqimqIilzoArXt5E"
"10079282-2" "org_vhtWu21o8nVXJPQm82Rg"
"10066281-3" "org_D6QsTT7TAi7VCJKGWsch"
"7797813-4"  "org_KkXCLSsnWnBXYx4AkNuK"
"10038545-4" "org_ISzfczIbBwtvODSmyDh0"
"6020252-1"  "org_SIFMk7wufuE8EAAX00co"
"7397516-1"  "org_0b1dsqimqIilzoArXt5E"
"10044878-2" "org_Z5OO886eRu5q4OFbfJki"
"6041748-2"  "org_1zuIlrebuviyK6v0YrBW"
"5973456-2"  "org_0b1dsqimqIilzoArXt5E"
"8132276-1"  "org_Re0fk3jNFREw0dcTKfbh"
"9970057-2"  "org_MXADrSM40Tb8jm6v4hYe"
"6583036-1"  "org_hke26dncXOrqlNsMhgUL"
"10000438-1" "org_J6ZIJIkyzJ4mfvUJ0EPZ"
"8550478-4"  "org_UVkz6sbdutjKFh5akYfA"
"6615990-3"  "org_1zuIlrebuviyK6v0YrBW"
"7594444-5"  "org_69z9bh3CjxKHL2yOCxil"
"5287700-2"  "org_wHd57MIpQIC17zih5MDg"
"5140178-1"  "org_vhtWu21o8nVXJPQm82Rg"
"5432481-1"  "org_vhtWu21o8nVXJPQm82Rg"
"10014593-1" "org_wHd57MIpQIC17zih5MDg"
"7324670-1"  "org_vhtWu21o8nVXJPQm82Rg"
"10040763-3" "org_OeOCSMKvIyMnrOBiLUWE"
"10040763-3" "org_OeOCSMKvIyMnrOBiLUWE"
"7805565-3"  "org_STc1KHT2TAIOvxitTruU"
"9878450-7"  "org_wHd57MIpQIC17zih5MDg"
"5281520-3"  "org_l2aE7VeZKBQmMw8FsMro"
"5281520-3"  "org_l2aE7VeZKBQmMw8FsMro"
"10135172-1" "org_LzuWdBdhsJGSOccRyR7c"
"3930622-1"  "org_vhtWu21o8nVXJPQm82Rg"
"9112343-6"  "org_wHd57MIpQIC17zih5MDg"
"10002621-1" "org_ISzfczIbBwtvODSmyDh0"
"10002621-1" "org_ISzfczIbBwtvODSmyDh0"
"10002621-1" "org_ISzfczIbBwtvODSmyDh0"
"8951673-1"  "org_hwwhVrm6DmcUHmptTCv2"
"8951673-1"  "org_hwwhVrm6DmcUHmptTCv2"
"5746724-3"  "org_gzQbESqZ6fQ4qpVglKBf"
"5632663-1"  "org_StddogfTkajU6bPvEq59"
"10025048-1" "org_hke26dncXOrqlNsMhgUL"
"10076373-1" "org_bl4IwhdXgo8T0BvN5rgl"
"10012962-4" "org_67H6kwB9En0b4zX9LlgF"
"7312330-5"  "org_7Ii76o0QYwYxPzsgHLSD"
"10100121-3" "org_DyNSIsS5orV1ZMcyOuCN"
end