Dear all,

Assume that we have a dataset that looks like this:

Code:
input str10 firm_id int year str10 shareholder_id double (SIC3_code rivals beta_firm_s)
A   2000    s1  251 2   0.05
A   2000    s2  251 2   0.025
A   2000    s3  251 2   0.18
A   2000    s4  251 2   0.035
A   2000    s5  251 2   0.098
B   2000    s1  305 0   0.2
B   2000    s8  305 0   0.3
B   2000    s10 305 0   0.0625
C   2000    s1  251 2   0.1875
C   2000    s4  251 2   0.03125
C   2000    s12 251 2   0.08125
C   2000    s13 251 2   0.1
D   2000    s1  251 2   0.114285714
D   2000    s4  251 2   0.071428571
D   2000    s5  251 2   0.257142857
end
In this fictional database, you see four different firms (A to D) that have information for a specific year, 2000. For each of these firms, there is a variable called shareholder_id. Assume that this is an investor in the firm. You will notice that some firms have the same investor. For example, investor s1 is also in firm B, C, and D. From this database, I would like to create a number of smaller databases that contain only information for two firms at a time and have the following characteristics:

1. All firms should belong to the same SIC3 code.
2. For each firm pair, you have the information for the firm of interest appear first. After the firm of interest, information for its pair appears. All these are sorted by shareholder_id.
3. Only information for common investors should appear. For example, if I pair A and C, I should see only information for investors s1 and s4.
4. A column must be created that informs which firm is the firm of interest.

Based on the above, the following code should be able to create the following outcome.

Pair of firm A and C
-------------------------
Code:
input str10 (case_firm firm_id) int year str10 shareholder_id double (rivals beta_firm_s)
A   A   2000    s1  2   0.05
A   A   2000    s4  2   0.035
A   C   2000    s1  2   0.1875
A   C   2000    s4  2   0.03125
end
Pair of firm A and D
-------------------------
Code:
input str10 (case_firm firm_id) int year str10 shareholder_id double (rivals beta_firm_s)
A   A   2000    s1  2   0.05
A   A   2000    s4  2   0.035
A   A   2000    s5  2   0.098
A   D   2000    s1  2   0.114285714
A   D   2000    s4  2   0.071428571
A   D   2000    s5  2   0.257142857
end
Pair of C and A
-------------------------
Code:
input str10 (case_firm firm_id) int year str10 shareholder_id double (rivals beta_firm_s)
C   C   2000    s1  2   0.1875
C   C   2000    s4  2   0.03125
C   A   2000    s1  2   0.05
C   A   2000    s4  2   0.035
end
Pair of C and D
-------------------------
Code:
input str10 (case_firm firm_id) int year str10 shareholder_id double (rivals beta_firm_s)
C   C   2000    s1  2   0.1875
C   C   2000    s4  2   0.03125
C   D   2000    s1  2   0.114285714
C   D   2000    s4  2   0.071428571
end
Pair of D and A
-------------------------
Code:
input str10 (case_firm firm_id) int year str10 shareholder_id double (rivals beta_firm_s)
D   D   2000    s1  2   0.114285714
D   D   2000    s4  2   0.071428571
D   D   2000    s5  2   0.257142857
D   A   2000    s1  2   0.05
D   A   2000    s4  2   0.035
D   A   2000    s5  2   0.098
end
Pair of D and C
-------------------------
Code:
input str10 (case_firm firm_id) int year str10 shareholder_id double (rivals beta_firm_s)
D   D   2000    s1  2   0.114285714
D   D   2000    s4  2   0.071428571
D   C   2000    s1  2   0.1875
D   C   2000    s4  2   0.03125
end
Thanks in advance for your help.