Hello,

I'm trying to merge two datasets by venture name with -matchit- (from SSC) while keeping additional variables from the second dataset.

The two datasets I have contain the following information:

Dataset 1 [patent_count.dta]: Number of patents assigned to a venture in a specific year
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float assigneeid str256 assignee float(year patent_count)
36705 "Cumbre Pharmaceuticals Inc."  2005 7
36705 "Cumbre Pharmaceuticals, Inc." 2005 1
end
Dataset 2 [ventures.dta]: Additional variables for this venture and other ventures (such as SIC code and amount of funding)
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float venture_id str60 venture str4 sic str17 venturefunding     
26 "Ardelyx Inc"                         "2836" "133.8618"         
10 "Argolyn Bioscience, Inc."            "2834" "10.1652"          
37 "Atara Biotherapeutics Inc"           "2834" "107"              
11 "Avidia, Inc."                        "2836" "78.5538"          
21 "Calistoga Pharmaceuticals, Inc."     "2834" "90.57899999999999"
58 "Caraway Therapeutics Inc"            "8731" "22.9999"          
 5 "Cumbre Pharmaceuticals Inc"          "2836" "39.5428"          
18 "DecImmune Therapeutics Inc"          "2834" "7.6"              
28 "Epizyme Inc"                         "2836" "98"               
54 "Glow Concept Inc"                    "5999" "9.516"            
48 "Glowforge Inc"                       "3577" "36.831"           
27 "Groove Biopharma Corp"               "8731" "13.5341"          
14 "Homestead Clinical Corporation"      "8731" "5.4502"           
 6 "ISB Accelerator"                     "6719" ".2001"            
60 "Ilumno Technologies Ltd Corp"        "7371" "-"                
46 "Imago Biosciences Inc"               "8731" "99.39019999999999"
 3 "Infinity Pharmaceuticals, Inc."      "2834" "185.2302"         
16 "InteKrin Therapeutics Inc"           "2836" "60.0351"          
19 "REN Pharmaceuticals Inc"             "2834" "6.3001"           
31 "Ra Pharmaceuticals Inc"              "8731" "84.8283"           
13 "TetraLogic Pharmaceuticals Inc"      "2834" "142.9995"         
12 "Theraclone Sciences Inc"             "2834" "55.9802"          
 9 "Viral Logic Systems Technology Corp" "2834" "49.0258"          
30 "Xori Corp"                           "2836" "4.4018"           
23 "miRagen Therapeutics Inc"            "8731" "103.2001"         
end

I would now like to match the first dataset (patent_count) to the second (ventures) by venture name and bring in the SIC and funding variables, so that my final dataset looks like this:

Venture | Year | Patent count | SIC | Funding

I opened the patent_count dataset in Stata and ran the following:
Code:
matchit assigneeid assignee using "ventures.dta", idusing(venture_id) txtusing(venture)
However, this has two problems. First, the result drops the additional variables (SIC code and amount of funding). Second, many venture names contain common words such as "Inc" or "Pharmaceuticals", which produces fairly high similarity scores even when the ventures are different.
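One possible workaround, sketched below and only tried mentally against the example data: normalize both name variables by stripping punctuation and legal-form suffixes before matching, keep only the best match per assignee, and then use venture_id to -merge- SIC and funding back in. It assumes -matchit- is installed from SSC (I believe it also needs -freqindex-), that the two dataex examples above are saved as patent_count.dta and ventures.dta, that -matchit- keeps the original variable names in its output, and that its weights(log) option downweights frequent grams as the help file describes. The helper program clean_name, the files ventures_clean.dta and crosswalk.dta, the suffix list and the 0.7 cutoff are hypothetical choices, not part of -matchit-.

Code:
```stata
* Sketch only: assumes -ssc install matchit- (and -ssc install freqindex-) were run,
* and that the dataex examples above were saved as patent_count.dta and ventures.dta.
* Run as a do-file (the /// continuations do not work interactively).

* Hypothetical helper: lowercase, drop punctuation, strip legal-form suffixes
capture program drop clean_name
program define clean_name
    args src dst
    quietly {
        generate `dst' = lower(`src')
        replace `dst' = ustrregexra(`dst', "[^a-z0-9 ]", " ")
        foreach w in inc corp corporation ltd llc co {
            replace `dst' = ustrregexra(`dst', "\b`w'\b", " ")
        }
        replace `dst' = stritrim(strtrim(`dst'))
    }
end

* 1. Clean the using names; sic and venturefunding stay in the file
use "ventures.dta", clear
clean_name venture venture_clean
save "ventures_clean.dta", replace

* 2. Clean the master names and keep one row per distinct id/name pair
use "patent_count.dta", clear
clean_name assignee assignee_clean
keep assigneeid assignee_clean
duplicates drop

* 3. Fuzzy match; weights(log) gives less weight to grams that are frequent in the data
matchit assigneeid assignee_clean using "ventures_clean.dta", ///
    idusing(venture_id) txtusing(venture_clean) weights(log)

* 4. Keep the highest-scoring venture per assignee (0.7 is an arbitrary cutoff; inspect first)
bysort assigneeid (similscore): keep if _n == _N
keep if similscore >= 0.7
keep assigneeid venture_id similscore
save "crosswalk.dta", replace

* 5. Attach venture_id to the patent data, then merge SIC and funding via venture_id
use "patent_count.dta", clear
merge m:1 assigneeid using "crosswalk.dta", keep(master match) nogenerate
merge m:1 venture_id using "ventures.dta", ///
    keepusing(venture sic venturefunding) keep(master match) nogenerate

* Optional, if the name variants of one assignee should be added up:
* collapse (sum) patent_count, by(venture venture_id year sic venturefunding)
list venture year patent_count sic venturefunding
```

The key point is that -matchit- is only used to build an id crosswalk (assigneeid to venture_id); the extra variables never need to pass through -matchit- at all, because -merge- on venture_id brings them back afterwards.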

Is there any way to circumvent these issues?

Thanks in advance!