Dear all,


I am trying to figure out how to calculate the proportion of market overlap between firm I and firm J over the past 5 years (excluding the current year). Ultimately, I'd like to use the resulting value as a weighting factor that gives more weight to a firm that is more similar to the focal firm.


My original data looks like the below:

no indicates the number of markets a focal firm I entered at year T (e.g., 2 means that the firm entered two different markets at T).
market indicates the ID of each market. Max number of this ID is 15, implying that there are 15 types of markets.


In order to perform one-to-one matching, I changed the data structure using 'joinby' command. There should be a more efficient, clever way to perform this task, but this is what I came up with (Please see the second dataset).


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(firm year no market)
 237 2005 1  3
 237 2006 1  3
 237 2008 1  6
 237 2009 1  3
 237 2010 1  6
 237 2011 1  1
 237 2012 1  1
 237 2013 1  6
 237 2014 2  3
 237 2014 2  6
1694 2001 1  9
1694 2002 3  5
1694 2002 3  6
1694 2002 3  9
1694 2004 1  9
1694 2005 2  5
1694 2005 2  6
1694 2006 1  9
1694 2007 1  9
1694 2010 1  9
2263 2009 2  1
2263 2009 2  9
2263 2010 1  9
2263 2011 2  9
2263 2011 2 15
2263 2012 2  9
2263 2012 2 15
2263 2013 1 15
2263 2015 2  9
2263 2015 2 15
2814 2001 2  6
2814 2001 2  9
2814 2004 1  6
2814 2005 2  6
2814 2005 2  9
2814 2008 1  9
2814 2012 1  9
2814 2013 1  7
2814 2014 1  9
2814 2015 1  9
3741 2009 1  9
3741 2010 1  9
3741 2011 1  9
3741 2012 1  9
3741 2013 1  9
3741 2014 3  6
3741 2014 3  7
3741 2014 3  9
3741 2015 2  7
3741 2015 2  9
4027 2007 1  6
4027 2008 1  5
4027 2009 2  7
4027 2009 2  9
4027 2010 2  1
4027 2010 2  9
4027 2011 1  7
4027 2012 1  9
4027 2013 1  6
4027 2014 1  7
5425 2002 1  6
5425 2003 1  9
5425 2004 1  9
5425 2006 1  9
5425 2008 1  9
5425 2009 1  6
5425 2010 1  9
5425 2012 1  9
5425 2013 1  9
5425 2015 1  6
5651 2003 1  1
5651 2004 1  1
5651 2005 4  1
5651 2005 4  5
5651 2005 4  7
5651 2005 4  9
5651 2006 2  6
5651 2006 2  9
5651 2008 2  6
5651 2008 2  7
5885 2000 1  7
5885 2001 1  9
5885 2002 2  6
5885 2002 2  9
5885 2003 1  1
5885 2006 1  3
5885 2008 1  1
5885 2010 1  6
5885 2013 1  6
5885 2014 1  9
6594 2004 1  6
6594 2007 1  1
6594 2010 1  1
6594 2011 2  1
6594 2011 2  3
6594 2013 2  1
6594 2013 2  3
6594 2014 2  6
6594 2014 2 15
6594 2015 1  6
end

Here are the codes I used.

u t300.dta, clear
ren * *2
ren year2 year
joinby year using "t300.dta"
drop if firm==firm2
ren no no1
order firm firm2 year market market2 no1 no2
sort firm firm2 year market market2

bys firm firm2 year: egen intersection = count(market) if market==market2
sort firm firm2 year intersection
by firm firm2 year: replace intersection = intersection[_n-1] if missing(intersection)
replace intersection=0 if missing(intersection)
gen overlap = intersection / (no1+no2-intersection)


firm2: firm J (other firms)
market2: market ID of firm J
no2: number of markets firm J entered at T
intersection: number of overlapped markets between firm I and firm J at T
overlap: % of overlapped markets between firm I and firm J at T


Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(firm firm2 year market market2 no1 no2 intersection overlap)
 237 1694 2005 3  5 1 2 0        0
 237 1694 2005 3  6 1 2 0        0
 237 1694 2006 3  9 1 1 0        0
 237 1694 2010 6  9 1 1 0        0
 237 2263 2009 3  9 1 2 0        0
 237 2263 2009 3  1 1 2 0        0
 237 2263 2010 6  9 1 1 0        0
 237 2263 2011 1 15 1 2 0        0
 237 2263 2011 1  9 1 2 0        0
 237 2263 2012 1  9 1 2 0        0
 237 2263 2012 1 15 1 2 0        0
 237 2263 2013 6 15 1 1 0        0
 237 2814 2005 3  9 1 2 0        0
 237 2814 2005 3  6 1 2 0        0
 237 2814 2008 6  9 1 1 0        0
 237 2814 2012 1  9 1 1 0        0
 237 2814 2013 6  7 1 1 0        0
 237 2814 2014 3  9 2 1 0        0
 237 2814 2014 6  9 2 1 0        0
 237 3741 2009 3  9 1 1 0        0
 237 3741 2010 6  9 1 1 0        0
 237 3741 2011 1  9 1 1 0        0
 237 3741 2012 1  9 1 1 0        0
 237 3741 2013 6  9 1 1 0        0
 237 3741 2014 6  6 2 3 1      .25
 237 3741 2014 3  6 2 3 1      .25
 237 3741 2014 3  7 2 3 1      .25
 237 3741 2014 3  9 2 3 1      .25
 237 3741 2014 6  7 2 3 1      .25
 237 3741 2014 6  9 2 3 1      .25
 237 4027 2008 6  5 1 1 0        0
 237 4027 2009 3  9 1 2 0        0
 237 4027 2009 3  7 1 2 0        0
 237 4027 2010 6  9 1 2 0        0
 237 4027 2010 6  1 1 2 0        0
 237 4027 2011 1  7 1 1 0        0
 237 4027 2012 1  9 1 1 0        0
 237 4027 2013 6  6 1 1 1        1
 237 4027 2014 6  7 2 1 0        0
 237 4027 2014 3  7 2 1 0        0
 237 5425 2006 3  9 1 1 0        0
 237 5425 2008 6  9 1 1 0        0
 237 5425 2009 3  6 1 1 0        0
 237 5425 2010 6  9 1 1 0        0
 237 5425 2012 1  9 1 1 0        0
 237 5425 2013 6  9 1 1 0        0
 237 5651 2005 3  1 1 4 0        0
 237 5651 2005 3  7 1 4 0        0
 237 5651 2005 3  9 1 4 0        0
 237 5651 2005 3  5 1 4 0        0
 237 5651 2006 3  6 1 2 0        0
 237 5651 2006 3  9 1 2 0        0
 237 5651 2008 6  6 1 2 1       .5
 237 5651 2008 6  7 1 2 1       .5
 237 5885 2006 3  3 1 1 1        1
 237 5885 2008 6  1 1 1 0        0
 237 5885 2010 6  6 1 1 1        1
 237 5885 2013 6  6 1 1 1        1
 237 5885 2014 3  9 2 1 0        0
 237 5885 2014 6  9 2 1 0        0
 237 6594 2010 6  1 1 1 0        0
 237 6594 2011 1  1 1 2 1       .5
 237 6594 2011 1  3 1 2 1       .5
 237 6594 2013 6  1 1 2 0        0
 237 6594 2013 6  3 1 2 0        0
 237 6594 2014 6  6 2 2 1 .3333333
 237 6594 2014 3  6 2 2 1 .3333333
 237 6594 2014 6 15 2 2 1 .3333333
 237 6594 2014 3 15 2 2 1 .3333333
1694  237 2005 5  3 2 1 0        0
1694  237 2005 6  3 2 1 0        0
1694  237 2006 9  3 1 1 0        0
1694  237 2010 9  6 1 1 0        0
1694 2263 2010 9  9 1 1 1        1
1694 2814 2001 9  9 1 2 1       .5
1694 2814 2001 9  6 1 2 1       .5
1694 2814 2004 9  6 1 1 0        0
1694 2814 2005 6  6 2 2 1 .3333333
1694 2814 2005 5  9 2 2 1 .3333333
1694 2814 2005 5  6 2 2 1 .3333333
1694 2814 2005 6  9 2 2 1 .3333333
1694 3741 2010 9  9 1 1 1        1
1694 4027 2007 9  6 1 1 0        0
1694 4027 2010 9  9 1 2 1       .5
1694 4027 2010 9  1 1 2 1       .5
1694 5425 2002 6  6 3 1 1 .3333333
1694 5425 2002 5  6 3 1 1 .3333333
1694 5425 2002 9  6 3 1 1 .3333333
1694 5425 2004 9  9 1 1 1        1
1694 5425 2006 9  9 1 1 1        1
1694 5425 2010 9  9 1 1 1        1
1694 5651 2004 9  1 1 1 0        0
1694 5651 2005 5  5 2 4 1       .2
1694 5651 2005 5  1 2 4 1       .2
1694 5651 2005 5  9 2 4 1       .2
1694 5651 2005 6  7 2 4 1       .2
1694 5651 2005 6  5 2 4 1       .2
1694 5651 2005 5  7 2 4 1       .2
1694 5651 2005 6  9 2 4 1       .2
1694 5651 2005 6  1 2 4 1       .2
end

This allowed me to calculate the % of market overlap between firm I and firm j at T, but I'd like to know how I can calculate the % of market overlap between firm I and firm J over the past 5 years.

Since I have more than 100,000 observations, using joinby command to restructure the dataset like this takes tremendous time. I would greatly appreciate it if you give me any suggestions and advice. I definitely think that there is a more efficient way to perform this task.


I am very looking forward to your response.




Best regards,


Anna