I am currently writing a code to capture the differences in earnings management between US firms and cross-listed firms (foreign firms on an American stock exchange). Because the cross-listed firms are self-selected, the data might be biased. Therefore, I have to match the cross-listed firms with US firms based on:
- fyear (fiscal year)
- sic_2 (industry)
- at (total assets)
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input long gvkey double fyear byte sic_2 double at 1004 1996 50 529.584 1004 1997 50 670.559 1004 1998 50 726.63 1004 1999 50 740.998 1004 2000 50 701.854 1004 2001 50 710.199 1004 2002 50 686.621 1004 2003 50 709.292 1004 2004 50 732.23 1004 2005 50 978.819 1004 2006 50 1067.633 1004 2007 50 1362.01 1004 2008 50 1377.511 1004 2009 50 1501.042 1004 2010 50 1703.727 1004 2011 50 2195.653 1004 2012 50 2136.9 1004 2013 50 2199.5 1004 2014 50 1515 1004 2015 50 1442.1 1010 1997 37 3181.3 1010 1998 37 3257.3 1010 1999 37 3563.4 1010 2000 37 3794.5 1010 2001 37 3723.1 1010 2002 37 3702.5 1010 2003 37 4832.1 1013 1997 36 936.303 1013 1998 36 1300.587 1013 1999 36 1672.529 1013 2000 36 3970.5 1013 2001 36 2499.7 1013 2002 36 1144.2 1013 2003 36 1296.9 1013 2004 36 1428.1 1013 2005 36 1535 1013 2006 36 1611.4 1013 2007 36 1764.8 1013 2008 36 1921 1013 2009 36 1343.6 1013 2010 36 1474.5 1019 1997 73 26.71 1019 1998 73 29.283 1019 1999 73 29.341 1019 2000 73 28.638 1019 2001 73 30.836 1021 1997 38 20.516 1021 1998 38 18.661 1021 1999 38 13.986 1021 2000 38 11.608 1021 2001 38 8.635 1021 2002 38 7.85 1021 2003 38 6.044 1021 2004 38 6.245 1021 2005 38 8.153 1021 2006 38 14.341 1021 2007 38 27.171 1021 2008 38 21.401 1034 1997 28 631.866 1034 1998 28 908.936 1034 1999 28 1160.266 1034 2000 28 1610.435 1034 2001 28 2390.008 1034 2002 28 2296.924 1034 2003 28 2329.268 1034 2004 28 2003.842 1034 2005 28 1623.383 1034 2006 28 927.239 1034 2007 28 1288.165 1036 1997 34 1778.547 1036 1998 34 2113.32 1036 1999 34 2241.575 1036 2000 34 2325.377 1037 1996 36 4.969 1037 1997 36 5.45 1037 1998 36 3.228 1037 1999 36 4.575 1037 2000 36 6.373 1037 2001 36 17.867 1038 1996 78 718.213 1038 1997 78 795.78 1038 1998 78 975.73 1038 1999 78 1188.805 1038 2000 78 1047.264 1038 2001 78 1279.17 1038 2002 78 1491.698 1038 2003 78 1506.534 1043 1997 50 44.9 1043 1998 50 45.639 1043 1999 50 42.21 1045 1997 45 20915 1045 1998 45 22303 1045 1999 45 24374 1045 2000 45 26213 1045 2001 45 32841 1045 2002 45 30267 1045 2003 45 29330 1045 2004 45 28773 1045 2005 45 29495 1045 2006 45 29145 end
Perhaps what I mean is not exactly called 'matching'. Anyway, I am looking to keep one American firm for each cross-listed firm, that is in the same year and industry and with the closest asset size.
If I use -joinby- I get exactly the American firms that I need, however in that case they are in the same observation line as the cross-listed firms. If you can help me in 'splitting up' these observations, that would be equally helpful. The code I used for that is:
Code:
preserve keep if dummy_foreign == 0 ds sic_2 fyear, not rename (`r(varlist)') for0_= tempfile for0 save `for0' restore keep if dummy_foreign == 1 ds sic_2 fyear, not rename (`r(varlist)') for1_= tempfile for1 save `for1' // FIND ALL ADMISSIBLE PAIRINGS joinby sic_2 fyear using `for0' // KEEP ONE WITH CLOSEST SIZE gen size_diff = abs(for0_at - for1_at) by for1_gvkey fyear sic_2 (size_diff), sort: keep if _n == 1
0 Response to Matching firms
Post a Comment