I am trying to use the fuzzy-matching commands matchit and reclink to merge two datasets.
Here is an example of the master file. I am matching on the third column, cnms (company name).
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float fyear str58 conm str50 cnms
2004 "180 CONNECT INC" "DIRECTV GROUP INC"
2005 "180 CONNECT INC" "DIRECTV GROUP INC"
2006 "180 CONNECT INC" "DIRECTV GROUP INC"
2007 "180 CONNECT INC" "DIRECTV GROUP INC"
2000 "1MAGE SOFTWARE INC" "Reynolds & Reynolds -CL A"
2001 "1MAGE SOFTWARE INC" "Reynolds & Reynolds -CL A"
2002 "1MAGE SOFTWARE INC" "Reynolds & Reynolds -CL A"
2003 "1MAGE SOFTWARE INC" "Reynolds & Reynolds -CL A"
2012 "2U INC" "Georgetown University School of Nursing and Health"
2012 "2U INC" "University of Southern California"
end
Here is an example of the using file.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str50 cnms str6 gvkey_cus str9 cusip_cus
"20TH CENTRY" "012886" "90130A101"
"20TH CENTURY FOX" "012886" "90130A101"
"20TH CENTY" "012886" "90130A101"
"TWENTY-FIRST CENTURY FOX INC" "012886" "90130A101"
"2122UNITED NATURAL FOODS INC" "#N/A" ""
"21ST CENTY TELECOM GROUP INC" "#N/A" ""
"238 TELECOM LIMITED" "#N/A" ""
"24 HOUR FITNESS" "#N/A" ""
"24 HOUR FITNESS USA, INC." "#N/A" ""
"24 HOUR FITNESS WORLD, INC." "#N/A" ""
"24/7" "#N/A" ""
end
This is the reclink command I ran.
Code:
reclink cnms using final1000, idmaster(idmaster) idusing(idusing) gen(matchscore) _merge(_merge) minscore(.9)
This is the matchit command I ran.
Code:
matchit idmaster cnms using final1000, idusing(idusing) txtusing(cnms)
Here is an example of the reclink output.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float fyear str58 conm str6 gvkey str10 cusip str4 sic str6 naics str50(cnms Ucnms) str8 ctype double salecs float(idmaster matchscore idusing) str6 gvkey_cus str9 cusip_cus byte _merge
2001 "ACURA PHARMACEUTICALS INC" "011929" "00509L802" "2834" "325412" "WATSON PHARMACEUTICALS INC" "AGIOS PHARMACEUTICALS INC" "COMPANY" 14.559 3359 .9310636 827 "#N/A" "" 3
2002 "ACURA PHARMACEUTICALS INC" "011929" "00509L802" "2834" "325412" "WATSON PHARMACEUTICALS INC" "AGIOS PHARMACEUTICALS INC" "COMPANY" 6.974 3361 .9310636 827 "#N/A" "" 3
2003 "ACURA PHARMACEUTICALS INC" "011929" "00509L802" "2834" "325412" "WATSON PHARMACEUTICALS INC" "AGIOS PHARMACEUTICALS INC" "COMPANY" 3.335 3362 .9310636 827 "#N/A" "" 3
2009 "ADTRAN INC" "030576" "00738A106" "3661" "334210" "AT&T INC" "AT&T INC" "COMPANY" 106.521 4057 1 116 "009899" "00206R102" 3
2010 "ADTRAN INC" "030576" "00738A106" "3661" "334210" "AT&T INC" "AT&T INC" "COMPANY" 109.021 4064 1 116 "009899" "00206R102" 3
2001 "ADV NEUROMODULATION SYS INC" "008872" "00757T101" "3845" "334510" "ARROW INTERNATIONAL" "ADS INTERNATIONAL" "COMPANY" 1.8 4100 .9397588 530 "#N/A" "" 3
2002 "ADV NEUROMODULATION SYS INC" "008872" "00757T101" "3845" "334510" "ARROW INTERNATIONAL" "ADS INTERNATIONAL" "COMPANY" 2.78 4102 .9397588 530 "#N/A" "" 3
2003 "ADV NEUROMODULATION SYS INC" "008872" "00757T101" "3845" "334510" "ARROW INTERNATIONAL" "ADS INTERNATIONAL" "COMPANY" 1.44 4106 .9397588 530 "#N/A" "" 3
end
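In the output above, "WATSON PHARMACEUTICALS INC" is paired with "AGIOS PHARMACEUTICALS INC" at a score of about .93, most likely because the shared words "PHARMACEUTICALS INC" dominate the comparison. One possible way to reduce such false matches is sketched below. It is only a sketch under assumptions: that the master data are in memory with idmaster and cnms, that final1000.dta holds idusing and cnms, and that token matching with log weights suits these names. The 0.7 threshold is an illustrative value, not a recommendation, and the names in the using file would need the same cleaning beforehand.

```stata
* Sketch only. matchit is user-written: ssc install matchit
* Assumes the master data (idmaster, cnms) are in memory and that
* final1000.dta contains idusing and cnms.

* Put names in one case and collapse stray blanks, since the master file
* mixes upper and lower case ("Reynolds & Reynolds -CL A")
replace cnms = upper(trim(itrim(cnms)))

* Compare whole words rather than bigrams, down-weight frequent words
* (such as "INC" or "PHARMACEUTICALS") with weights(log), and drop
* candidate pairs scoring below an illustrative cutoff of 0.7
matchit idmaster cnms using final1000.dta, idusing(idusing) txtusing(cnms) ///
    similmethod(token) weights(log) threshold(0.7)

* matchit leaves the candidate pairs in memory with a similscore variable;
* keep the highest-scoring candidate for each master record
* (ties are broken arbitrarily)
bysort idmaster (similscore): keep if _n == _N
```

The surviving pairs can then be inspected by eye before merging gvkey_cus and cusip_cus back onto the master file by idusing.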