I came across your matchit command in Stata for data consolidation and cleaning using fuzzy string comparisons. I would like to use it to match EU-ETS installations (ID) with the emission details (ED) of those installations. The ID file contains the location of each installation, and the ED file contains its emissions. Both files contain a unique identification code, PERMIT_ID. The problem is that the ID file has more than 15,000 entries and the ED file has more than 12,000. I want to match the two files and find the locations of the installations in the ED file. I know the only way is to match by PERMIT_ID, but I cannot figure out the command. I have checked in Excel that the two files have more than 10,000 entries in common. I also do not want to discard the extra 2,000 entries in the ED file.
The columns look like this. I would really appreciate it if you could suggest a way.
ED file:
INSTALLATION_NAME | PERMIT_ID | 2019 | 2018 | 2017 |
Baumit Baustoffe Bad Ischl | IKA119 | 46415 | 42302 | 48681 |
Breitenfelder Edelstahl Mitterdorf | IES069 | 26777 | 23457 | 21031 |
Ziegelwerk Danreiter Ried im Innkreis | IZI155 | 5129 | 3738 | 3487 |
Isomax Dekorative Laminate Wiener Neudorf | ICH113 | 33020 | 30922 | 30088 |
Sandoz Werk Kundl | ICH106 | 57869 | 58011 | 65993 |
Ziegelwerk Martin Pichler Aschach | IZI150 | 14202 | 14040 | 8852 |
FHKW Süd StW St. Pölten | EFE041 | 1476 | 3060 | 3687 |
FHKW Nord StW St. Pölten | EFE040 | 31973 | 32342 | 31624 |
Vetropack Pöchlarn | IGL173 | 59982 | 59513 | 58286 |
Vetropack Kremsmünster | IGL172 | 68178 | 61631 | 66001 |
Sinteranl., Hochöfen, Stahlwerk Donawitz | IVA065 | 2846643 | 2923552 | 3075201 |
Voestalpine Stahl Linz | IVA062 | 8812969 | 7816077 | 9220971 |
VOEST-Alpine Stahl Linz (Kalk) Steyrling | IKA120 | 303621 | 274486 | 346279 |
ID file:
Installation Name | Permit_ID | NUTS_ID |
Baumit Baustoffe Bad Ischl | IKA119 | AT31 |
Breitenfelder Edelstahl Mitterdorf | IES069 | AT22 |
Ziegelwerk Danreiter Ried im Innkreis | IZI155 | AT31 |
Wienerberger Blindenmarkt | IZI146-1 | AT12 |
Isomax Dekorative Laminate Wiener Neudorf | ICH113 | AT12 |
Sandoz Werk Kundl | ICH106 | AT33 |
Ziegelwerk Martin Pichler Aschach | IZI150 | AT31 |
FHKW Süd StW St. Pölten | EFE041 | AT12 |
FHKW Nord StW St. Pölten | EFE040 | AT12 |
Vetropack Pöchlarn | IGL173 | AT12 |
Vetropack Kremsmünster | IGL172 | AT31 |
Energiepark Donawitz | IVA066 | AT22 |
Sinteranl., Hochöfen, Stahlwerk Donawitz | IVA065 | AT22 |
Thanks
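Since both files share an exact key, the task described above is an exact-key left join rather than a fuzzy match: keep every ED row and attach the NUTS_ID from the ID file wherever the permit code is found. The sketch below illustrates that logic in plain Python, using a few rows copied from the sample tables. The function name `left_join_on_permit`, the `matched` flag, and the trimming and upper-casing of keys are assumptions made for illustration, not part of matchit or of the original files. In Stata itself, a join of this kind is normally done with the built-in merge command on the key variable, once the key has the same name in both datasets.

```python
# Sample rows copied from the ED and ID tables above (subset only).
ed_rows = [
    {"INSTALLATION_NAME": "Baumit Baustoffe Bad Ischl", "PERMIT_ID": "IKA119", "2019": 46415},
    {"INSTALLATION_NAME": "Voestalpine Stahl Linz", "PERMIT_ID": "IVA062", "2019": 8812969},
]
id_rows = [
    {"Installation Name": "Baumit Baustoffe Bad Ischl", "Permit_ID": "IKA119", "NUTS_ID": "AT31"},
    {"Installation Name": "Energiepark Donawitz", "Permit_ID": "IVA066", "NUTS_ID": "AT22"},
]


def left_join_on_permit(ed_rows, id_rows):
    """Keep every ED row; add NUTS_ID from the ID rows when the permit code matches.

    Hypothetical helper for illustration. Keys are trimmed and upper-cased
    before comparison, which is an assumption about how the codes should be
    normalised.
    """
    # Build a lookup from permit code to location; the first occurrence wins.
    lookup = {}
    for row in id_rows:
        key = row["Permit_ID"].strip().upper()
        lookup.setdefault(key, row["NUTS_ID"])

    merged = []
    for row in ed_rows:
        key = row["PERMIT_ID"].strip().upper()
        out = dict(row)                    # copy so the input rows stay unchanged
        out["NUTS_ID"] = lookup.get(key)   # None when the ID file has no such permit
        out["matched"] = key in lookup     # flag for rows without a location
        merged.append(out)
    return merged


merged = left_join_on_permit(ed_rows, id_rows)
for row in merged:
    print(row["PERMIT_ID"], row["NUTS_ID"], row["matched"])
```

Unmatched ED rows stay in the result with an empty location, so the extra entries are not discarded and can be inspected separately through the `matched` flag.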