I have the following problem:
I am trying to match two survey waves (let's call 2012 and 2014).
I have information on the country(nuts), sector(nace_b), turnover in 2012 and employees in 2012. (so the 2012 figures are asked in the 2014 survey as well, regarding turnover and employees).
So in theory, exact matching should not be much of a problem.
Nevertheless, there might be a small deviance on the turnover and employee figures. (they might be answered by a different person, might be rounded slightly different in the two waves...).
Therefore, and exact match might not return an appropriate number of matches.
Let's say we match 2012 (master) to the 2014 data.
From this problem, I formulate the following requirements for my match:
-exact matching on nuts and nace_b (find for every 2012 observation all 2014 observations in the same country and industry)
-keep only the 2014 observations on turnover and employee figures that deviate maximum a certain % from the 2012 figures (let's say with maximum 5% difference).
-within these potential subgroups that fulfill the previous requirements, take the closest 2014 observation (to keep it simple: that has the least % of total deviance in employee and turnover data, combined).
-As a last requirement, every 2014 observation should be matched to it's closest 2012 counterpart as well... (this is the part that confuses me the most).
I have looked into the following commands:
psmatch2
teffects
calipmatch
cem
...
But I cannot find how to use them appropriately to match my data as required.
Herewith an example of the data generated by datatex:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str2 nuts str5 nace_b double turn12 float(emp12 year) "HU" "31" 16979712.259999998 5659 2014 "HU" "23" 39522475 13174 2012 "NO" "58" 11520000 3840 2014 "ES" "52" 28719661 9573 2012 "ES" "46" 56909291 18969 2012 "DE" "22" 13906200 4635 2014 "ES" "59-60" 1224437.48 408 2014 "ES" "10-12" 16174637 5391 2012 "DE" "28" 5540000 1846 2012 "NO" "26" 209000 69 2014 "ES" "73" 8702138 2900 2012 "HR" "10-12" 232659000 77553 2014 "PT" "49" 1700603.8 566 2014 "ES" "62-63" 1163809 387 2012 "DE" "26" 94100000 31366 2014 "ES" "62-63" 7486122 2495 2014 "ES" "23" 47452171.339999996 15817 2014 "ES" "46" 178943368 59647 2014 "BG" "46" 729113 243 2012 "BG" "14-15" 17036.32 5 2014 "DE" "25" 3650000 1216 2014 "CZ" "13" 30904529 10301 2012 "CZ" "31" 931757.54 310 2014 "EE" "23" 1690129 563 2012 "PT" "13" 6110728 2036 2012 "ES" "62-63" 1627808 542 2012 "CZ" "46" 15372063 5124 2014 "DE" "22" 14190000 4730 2012 "CZ" "24" 165548690 55182 2012 "DE" "37-39" 2583280 861 2014 "BG" "16" 315175.83999999997 105 2014 "DE" "79" 10645000 3548 2012 "NO" "62-63" 5090000 1696 2012 "DE" "53" 341040 113 2014 "DE" "64" 1.8332e+09 611066 2014 "DE" "71" 663000 221 2012 "DE" "27" 52000000 17333 2012 "ES" "28" 17050399.66 5683 2014 "CZ" "17" 30007714 10002 2012 "BG" "25" 432560 144 2012 "ES" "46" 20227923 6742 2012 "BG" "14-15" 4449330 1483 2014 "ES" "18" 1735012 578 2012 "DE" "37-39" 1862000 620 2014 "DE" "29" 1086000 362 2012 "ES" "71" 277066 92 2014 "CZ" "62-63" 2137898 712 2014 "LT" "69-75" 44732 14 2012 "ES" "19-20" 6101640 2033 2014 "BG" "71" 736783 245 2012 "DE" "29" 1064280 354 2014 "ES" "42" 71051622 23683 2014 "NO" "30" 29731000 9910 2012 "ES" "21" 39389172 13129 2012 "HU" "49" 6668902 2222 2014 "ES" "31" 9495946.18 3165 2014 "ES" "5-9" 2012564 670 2012 "HU" "52" 3192252 1064 2012 "CZ" "58" 3606504.86 1202 2014 "DE" "81" 26698000 8899 2012 "ES" "72+75" 152397 50 2014 "ES" "10-12" 12804166 4268 2014 "CZ" "26" 1946519 648 2012 "PT" "73" 5490983 1830 2012 "DE" "17" 25382000 8460 2014 "CZ" "64" 366761899 122253 2012 "ES" "23" 11951247.98 3983 2014 "DE" "61" 1176000 392 2014 "ES" "62-63" 1595251.84 531 2014 "DE" "19-20" 9310000 3103 2014 "DE" "25" 33781000 11260 2012 "PT" "62-63" 61362084.56 20454 2014 "DE" "62-63" 650000 216 2012 "LT" "58-63" 513704 171 2012 "DE" "37-39" 900000 300 2014 "DE" "22" 2100000 700 2014 "NO" "26" 209000 69 2012 "ES" "10-12" 2454043 818 2012 "LT" "13-15" 2075714 691 2014 "EE" "10-12" 39682392.26 13227 2014 "DE" "25" 1000000 333 2014 "BG" "14-15" 1866244 622 2012 "ES" "28" 11253027 3751 2012 "CZ" "46" 8739672 2913 2012 "BG" "46" 6931690 2310 2012 "HU" "28" 415090643.38 138363 2014 "LT" "24-25" 406238.42 135 2014 "LT" "31-32" 39420181 13140 2012 "CZ" "17" 30007714 10002 2014 "NO" "71" 20237980 6745 2014 "DE" "64" 1.8332e+09 611066 2012 "NO" "33" 5890780 1963 2014 "DE" "23" 1645420 548 2014 "ES" "72+75" 3817565.5 1272 2014 "ES" "74" 7394730 2464 2012 "ES" "71" 8414352 2804 2012 "HU" "62-63" 32595865 10865 2014 "ES" "87" 1603969 534 2012 "DE" "28" 5429200 1809 2014 "ES" "71" 2360006 786 2012 end
Does anyone know how to properly execute this matching?
Thanks in advance!
Maarten
0 Response to closest relative match based on multiple variables after exact matching
Post a Comment