Example of file 1 (master)
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str17 STATE str26(DISTRICT TEHSIL) float id str54 sdt
"VP" "AD" "Adilabad" 1 "Andhra Pradesh Adilabad Adilabad"
"VP" "AD" "Asifabad" 2 "Andhra Pradesh Adilabad Asifabad"
"VP" "AD" "Bazarhathnoor" 3 "Andhra Pradesh Adilabad Bazarhathnoor"
"VP" "AD" "Bejjur" 4 "Andhra Pradesh Adilabad Bejjur"
"VP" "AD" "Bela" 5 "Andhra Pradesh Adilabad Bela"
"VP" "AD" "Bellampalle" 6 "Andhra Pradesh Adilabad Bellampalle"
"VP" "AD" "Bhainsa" 7 "Andhra Pradesh Adilabad Bhainsa"
"VP" "AD" "Bhimini" 8 "Andhra Pradesh Adilabad Bhimini"
"VP" "AD" "Boath" 9 "Andhra Pradesh Adilabad Boath"
"VP" "AD" "Chennur" 10 "Andhra Pradesh Adilabad Chennur"
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str25 State str28 District str37 Subdisrict float id str60 sdt
"VP" "AD" "Adilabad" 1 "Andhra Pradesh Adilabad Adilabad"
"VP" "AD" "Asifabad" 2 "Andhra Pradesh Adilabad Asifabad"
"VP" "AD" "Bazarhathnoor" 3 "Andhra Pradesh Adilabad Bazarhathnoor"
"VP" "AD" "Bejjur" 4 "Andhra Pradesh Adilabad Bejjur"
"VP" "AD" "Bela" 5 "Andhra Pradesh Adilabad Bela"
"VP" "AD" "Bellampalle" 6 "Andhra Pradesh Adilabad Bellampalle"
"VP" "AD" "Bhainsa" 7 "Andhra Pradesh Adilabad Bhainsa"
"VP" "AD" "Bhimini" 8 "Andhra Pradesh Adilabad Bhimini"
"VP" "AD" "Boath" 9 "Andhra Pradesh Adilabad Boath"
"VP" "AD" "Chennur" 10 "Andhra Pradesh Adilabad Chennur"
end
To run matchit, I generated a variable sdt in each file (state + district + sub-district).
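For reference, a minimal sketch of how such a key could be built. It assumes sdt is simply the three name fields joined with single spaces, which is what the sample values suggest, and that sdt does not exist yet when the lines are run. The file name file1.dta for the master is a placeholder; only file2.dta is named in this post.

```stata
* Master file: file1.dta is a placeholder name, not from the post
use file1.dta, clear
* Join state, district, and tehsil with single spaces
generate sdt = STATE + " " + DISTRICT + " " + TEHSIL

* Using file: variable names as they appear in the second dataex listing
use file2.dta, clear
generate sdt = State + " " + District + " " + Subdisrict
```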
The command I use is:
Code:
matchit id sdt using file2.dta, idusing(id) txtusing(sdt) sim(token) w(simple) override
My problem is that when I run the above command, I get around 4422 matches with similscore = 1.
However, among the remaining matches there are some clear errors.
Even when the same state, district, and sub-district are present in both files, I get matches with the wrong state.
I have attached an example below.
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input int id str60 sdt int id1 str60 sdt1 double similscore
2253 "Himachal Pradesh Hamirpur Hamirpur" 4800 "Uttar Pradesh Hamirpur Hamirpur" .9811807135173987
5286 "Uttar Pradesh Hamirpur Hamirpur" 2286 "Himachal Pradesh Hamirpur Hamirpur" .9811807135173987
1182 "Assam Jorhat Jorhat East" 1188 "Assam Jorhat Jorhat West" .9828654154650485
3150 "Karnataka Bijapur Bijapur" 1834 "Chhattisgarh Bijapur Bijapur" .9844976172254488
5392 "Uttar Pradesh Pratapgarh Pratapgarh" 4400 "Rajasthan Pratapgarh Pratapgarh" .9884977970834815
4881 "Rajasthan Pratapgarh Pratapgarh" 4909 "Uttar Pradesh Pratapgarh Pratapgarh" .9884977970834815
end
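One possible way to screen out cross-state pairs like these after the fact is to bring the state variable back from each file and keep only the pairs where it agrees. This is only a sketch under several assumptions: the matchit results are in memory with the variables shown above (id, sdt, id1, sdt1, similscore), id is unique within each file, state names are spelled identically in both files, and file1.dta is a placeholder name for the master file.

```stata
* Attach the master-side state (file1.dta is a placeholder name)
merge m:1 id using file1.dta, keepusing(STATE) keep(match master) nogenerate

* Attach the using-side state: temporarily rename id1 so it can serve as the merge key
rename id id_master
rename id1 id
merge m:1 id using file2.dta, keepusing(State) keep(match master) nogenerate
rename id id1
rename id_master id

* Keep only candidate pairs whose states agree exactly
keep if STATE == State
```

This would not help with pairs such as "Jorhat East" versus "Jorhat West", which share the same state and district and would still need a stricter score cutoff or manual review.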
I have seen earlier posts (and reclink) but am not able to adjust my matchit command accordingly.
I would appreciate any suggestions.