I am working with several (almost 100) database, which all have the same configuration.
I have data for each country of the EU from jan2000 to dec2019 (some of them have missing values, which are written ":").
Since all my database are on the same period & frequency, and since they all have the same data names, I would like to know how to merge them all in one file (that is the sum of all values for each country).
Illustration:
Database 1
m | y | AUSTRIA | BELGIUM (and LUXBG -> 1998) | BULGARIA | CYPRUS | CZECHIA (CS->1992) | GERMANY (incl DD from 1991) | DENMARK | ESTONIA |
1 | 2000 | : | : | : | : | : | 37328 | 148550 | : |
2 | 2000 | : | : | : | : | : | 38461 | 185092 | 67 |
3 | 2000 | : | : | : | : | : | : | 250220 | : |
4 | 2000 | : | : | : | : | : | 54536 | 181326 | : |
5 | 2000 | : | : | : | : | : | 150693 | 370858 | : |
6 | 2000 | : | : | : | : | : | 371919 | 1067027 | : |
7 | 2000 | : | : | : | : | : | 575746 | 1157469 | 56 |
8 | 2000 | : | : | : | : | : | 501796 | 1219928 | 47 |
9 | 2000 | : | : | : | : | : | 478317 | 903141 | 57 |
10 | 2000 | : | : | : | : | : | 569846 | 1190308 | 235 |
11 | 2000 | : | : | : | : | : | 541034 | 1096933 | 443 |
12 | 2000 | : | : | : | : | : | 807289 | 566551 | 388 |
Database 2
m | y | AUSTRIA | BELGIUM (and LUXBG -> 1998) | BULGARIA | CYPRUS | CZECHIA (CS->1992) | GERMANY (incl DD from 1991) | DENMARK | ESTONIA |
1 | 2000 | : | : | : | : | : | 418 | : | 243 |
2 | 2000 | : | : | 465 | : | : | : | : | 366 |
3 | 2000 | : | : | 38 | : | : | : | : | : |
4 | 2000 | : | : | 49 | : | : | : | : | : |
5 | 2000 | : | : | : | : | : | : | : | : |
6 | 2000 | : | : | : | : | : | : | : | : |
7 | 2000 | : | : | : | : | : | 64 581 | : | 54 |
8 | 2000 | : | : | 91 | : | : | : | : | 299 |
9 | 2000 | : | : | : | : | : | : | : | 366 |
10 | 2000 | : | : | 147 | : | : | : | : | 553 |
11 | 2000 | : | : | : | : | : | 15 646 | : | 249 |
12 | 2000 | : | : | : | : | : | 65 956 | 22 545 | 2 021 |
Database 3
m | y | AUSTRIA | BELGIUM (and LUXBG -> 1998) | BULGARIA | CYPRUS | CZECHIA (CS->1992) | GERMANY (incl DD from 1991) | DENMARK | ESTONIA |
1 | 2000 | : | : | 30 | : | : | 3 954 918 | : | 1 252 |
2 | 2000 | : | : | : | : | : | 638 856 | 83 229 | 818 |
3 | 2000 | : | 36 212 | : | : | : | 2 184 877 | 136 363 | 126 |
4 | 2000 | : | 17 648 | : | : | : | 1 097 291 | 91 441 | 585 |
5 | 2000 | : | : | : | : | : | 432 368 | 235 023 | 456 |
6 | 2000 | : | : | 100 | : | : | 493 020 | 471 125 | 196 |
7 | 2000 | : | : | : | : | : | 2 828 800 | 304 038 | 148 |
8 | 2000 | : | : | 96 | : | : | 2 440 080 | 453 412 | : |
9 | 2000 | : | 20 655 | : | : | : | 2 355 697 | 565 428 | : |
10 | 2000 | : | : | : | : | : | 4 057 272 | 747 554 | 241 |
11 | 2000 | : | : | 232 | : | : | 2 707 837 | 471 183 | : |
12 | 2000 | : | : | : | : | : | 3 514 004 | 353 058 | 344 |
and so on.
What I want is a database that is merging all and do the sum of all, or, in this example, these 3 database (under the name database4, for example):
Database 4
m | y | AUSTRIA | BELGIUM (and LUXBG -> 1998) | BULGARIA | CYPRUS | CZECHIA (CS->1992) | GERMANY (incl DD from 1991) | DENMARK | ESTONIA |
1 | 2000 | 0 | 0 | 30 | 0 | 0 | 3992664 | 148550 | 1495 |
2 | 2000 | 0 | 0 | 465 | 0 | 0 | 677317 | 268321 | 1251 |
3 | 2000 | 0 | 36212 | 38 | 0 | 0 | 2184877 | 386583 | 126 |
4 | 2000 | 0 | 17648 | 49 | 0 | 0 | 1151827 | 272767 | 585 |
5 | 2000 | 0 | 0 | 0 | 0 | 0 | 583061 | 605881 | 456 |
6 | 2000 | 0 | 0 | 100 | 0 | 0 | 864939 | 1538152 | 196 |
7 | 2000 | 0 | 0 | 0 | 0 | 0 | 3469127 | 1461507 | 258 |
8 | 2000 | 0 | 0 | 187 | 0 | 0 | 2941876 | 1673340 | 346 |
9 | 2000 | 0 | 20655 | 0 | 0 | 0 | 2834014 | 1468569 | 423 |
10 | 2000 | 0 | 0 | 147 | 0 | 0 | 4627118 | 1937862 | 1029 |
11 | 2000 | 0 | 0 | 232 | 0 | 0 | 3264517 | 1568116 | 692 |
12 | 2000 | 0 | 0 | 0 | 0 | 0 | 4387249 | 942154 | 2753 |
I hope that I have correctly explained my problem.
I know that I can't keep these ":" and that weird variable names are a problem so I will first of all run something like that for each database:
Code:
// Gen date from "y" and "m" columns {gen edate = ym(y, m) format edate %tm drop y m rename edate date } // rename {rename belgium~1998 belgium rename czechia~1992 czechia rename germany~1991 germany //and so on ... } // replace ":" by "." for STATA to understand that these are missing values { replace austria = "." if austria == ":" replace belgium = "." if belgium == ":" replace bulgaria = "." if bulgaria == ":" //and so on ... }
0 Response to Merge and sum several dataset
Post a Comment