I am working with a large data set and could really use your help. I am trying to create a pairwise similarity measure between assignees. To do so, I build a matrix out of two categorical variables, turning one of them (the inventor ID) into dummy variables, so that a 1 indicates that assignee i has worked with that inventor. However, I can't simply create dummies for all the categories of the inventor variable, because that would be more variables than Stata allows. Is there a way to write a loop that takes two assignees at a time, computes my similarity measure for that pair, and in the end combines all the results?
Below is the code that I have been working with on smaller samples. What I want to do from here is have the code loop through a few pairs of assignees at a time, store the results in pairwise format, and merge them, so that in the end I get a data set with three variables | assignee_id_1 assignee_id_2 total_inv | where the last variable is a measure along the lines of the total number of shared inventors. Note that the first part of the code (under **manual dummy creator) won't work in my scenario, because too many variables are being created, so it would have to be incorporated into the loop:
I have also added a data example at the very bottom. Please do let me know should anything be unclear and I will try to resolve any confusion as quickly as possible. I would really appreciate any help I could get.
Thanks,
John
Code:
*********************************
encode inventor_id, gen(enc_inv_id)

**manual dummy creator
levelsof enc_inv_id, local(n)
foreach j in `n' {
    gen d_`j' = `j' == enc_inv_id
}

***create matrix
collapse (max) d_*, by(assignee_id)

******
******Jaccard
******
preserve
rename * *_inv
tempfile id2_inv
save `id2_inv'
restore
cross using `id2_inv'
drop if assignee_id == assignee_id_inv
drop if assignee_id < assignee_id_inv

*number is 30325 for 500
forval i = 1/30235 {
    capture gen tinv_`i' = cond((d_`i' + d_`i'_inv) > 1, 1, (d_`i' + d_`i'_inv))
}
egen total_inv = rowtotal(tinv_*)
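A long-format alternative that might avoid the dummies altogether is sketched below. It is untested on the full data and rests on assumptions: the data set holds only inventor_id and assignee_id as in the example, and the names n_inv_1, n_inv_2, shared_inv and jaccard are placeholders made up for illustration. Here total_inv mirrors what the rowtotal above produces, namely the number of inventors attached to either assignee in the pair.

```stata
* Sketch only: pair assignees through their shared inventors, no dummies needed
keep inventor_id assignee_id
duplicates drop                              // one row per inventor-assignee link

* number of distinct inventors per assignee
bysort assignee_id: gen long n_inv = _N

* save a second copy of the links under different names
preserve
rename (assignee_id n_inv) (assignee_id_2 n_inv_2)
tempfile other
save `other'
restore
rename (assignee_id n_inv) (assignee_id_1 n_inv_1)

* all combinations of assignees that are linked to the same inventor
joinby inventor_id using `other'
keep if assignee_id_1 > assignee_id_2        // same pairs the two drops above keep

* shared inventors per pair, then union size and Jaccard index
gen byte one = 1
collapse (sum) shared_inv = one (first) n_inv_1 n_inv_2, ///
    by(assignee_id_1 assignee_id_2)
gen long total_inv = n_inv_1 + n_inv_2 - shared_inv
gen double jaccard = shared_inv / total_inv
```

Only pairs with at least one shared inventor appear in the result. Any pair that is absent has no overlap, so its union would simply be the sum of the two inventor counts and its Jaccard index would be zero.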
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str11 inventor_id str36 assignee_id
"10082632-2" "org_LzsrUkjiWAIjd5v2paoI"
"7107498-1" "org_kSu9KMJQP5Qjd1ksQEfJ"
"7298882-3" "org_Re0fk3jNFREw0dcTKfbh"
"5493647-7" "org_vhtWu21o8nVXJPQm82Rg"
"10050270-2" "org_vhtWu21o8nVXJPQm82Rg"
"10072529-1" "org_vhtWu21o8nVXJPQm82Rg"
"10085369-1" "org_0b1dsqimqIilzoArXt5E"
"10085369-1" "org_0b1dsqimqIilzoArXt5E"
"10079282-2" "org_vhtWu21o8nVXJPQm82Rg"
"10066281-3" "org_D6QsTT7TAi7VCJKGWsch"
"7797813-4" "org_KkXCLSsnWnBXYx4AkNuK"
"10038545-4" "org_ISzfczIbBwtvODSmyDh0"
"6020252-1" "org_SIFMk7wufuE8EAAX00co"
"7397516-1" "org_0b1dsqimqIilzoArXt5E"
"10044878-2" "org_Z5OO886eRu5q4OFbfJki"
"6041748-2" "org_1zuIlrebuviyK6v0YrBW"
"5973456-2" "org_0b1dsqimqIilzoArXt5E"
"8132276-1" "org_Re0fk3jNFREw0dcTKfbh"
"9970057-2" "org_MXADrSM40Tb8jm6v4hYe"
"6583036-1" "org_hke26dncXOrqlNsMhgUL"
"10000438-1" "org_J6ZIJIkyzJ4mfvUJ0EPZ"
"8550478-4" "org_UVkz6sbdutjKFh5akYfA"
"6615990-3" "org_1zuIlrebuviyK6v0YrBW"
"7594444-5" "org_69z9bh3CjxKHL2yOCxil"
"5287700-2" "org_wHd57MIpQIC17zih5MDg"
"5140178-1" "org_vhtWu21o8nVXJPQm82Rg"
"5432481-1" "org_vhtWu21o8nVXJPQm82Rg"
"10014593-1" "org_wHd57MIpQIC17zih5MDg"
"7324670-1" "org_vhtWu21o8nVXJPQm82Rg"
"10040763-3" "org_OeOCSMKvIyMnrOBiLUWE"
"10040763-3" "org_OeOCSMKvIyMnrOBiLUWE"
"7805565-3" "org_STc1KHT2TAIOvxitTruU"
"9878450-7" "org_wHd57MIpQIC17zih5MDg"
"5281520-3" "org_l2aE7VeZKBQmMw8FsMro"
"5281520-3" "org_l2aE7VeZKBQmMw8FsMro"
"10135172-1" "org_LzuWdBdhsJGSOccRyR7c"
"3930622-1" "org_vhtWu21o8nVXJPQm82Rg"
"9112343-6" "org_wHd57MIpQIC17zih5MDg"
"10002621-1" "org_ISzfczIbBwtvODSmyDh0"
"10002621-1" "org_ISzfczIbBwtvODSmyDh0"
"10002621-1" "org_ISzfczIbBwtvODSmyDh0"
"8951673-1" "org_hwwhVrm6DmcUHmptTCv2"
"8951673-1" "org_hwwhVrm6DmcUHmptTCv2"
"5746724-3" "org_gzQbESqZ6fQ4qpVglKBf"
"5632663-1" "org_StddogfTkajU6bPvEq59"
"10025048-1" "org_hke26dncXOrqlNsMhgUL"
"10076373-1" "org_bl4IwhdXgo8T0BvN5rgl"
"10012962-4" "org_67H6kwB9En0b4zX9LlgF"
"7312330-5" "org_7Ii76o0QYwYxPzsgHLSD"
"10100121-3" "org_DyNSIsS5orV1ZMcyOuCN"
end