Dear all,

Assume that we have a dataset that has the following form:

Code:
input firm_id    year    shareholder_id    shares_held    company_shares    SIC3_code    beta_firm_s
1    2000    1    500    10000    251    0.05
1    2000    2    250    10000    251    0.025
1    2000    3    1800    10000    251    0.18
1    2000    4    350    10000    251    0.035
1    2000    5    980    10000    251    0.098
2    2000    1    800    4000    305    0.2
2    2000    8    1200    4000    305    0.3
2    2000    10    250    4000    305    0.0625
3    2000    1    150    800    251    0.1875
3    2000    4    25    800    251    0.03125
3    2000    12    65    800    251    0.08125
3    2000    13    80    800    251    0.1
4    2000    1    400    3500    251    0.114285714
4    2000    4    250    3500    251    0.071428571
4    2000    5    900    3500    251    0.257142857
end
In the above, the last column, beta_firm_s is just the ratio of shares_held/company_shares.

In this hypothetical example the database contains shareholder information for firms at a specific year. We know firms' identifiers, their shareholder's identifiers, along with the number of shares each one of them has. Further, we know the total number of shares outstanding and also the SIC 3 digit-code for each firm.

The aim is to calculate the lambdas for all firms when available.

I have provided the LaTeX code for the formula that needs to be calculated. One can copy and paste it in a LaTeX editor to see in more detail in case the forum cannot render TeX code.

Code:
\[
\lambda_f = \frac{1}{N} \sum_{g\neq f} \frac{\sum_{\forall s} \gamma_{fs} \beta_{gs}}{\sum_{\forall s} \gamma_{fs} \beta_{fs}}.
\]
In the above formula, $f$ and $g$ represent two distinctive firms, $s$ denotes shareholders. Further, $\gamma_{fs}$ indicates voting shares, while $\beta_{fs}$ the cash flow shares of shareholder $s$ in firm $f$. This component, $\frac{\sum_{\forall s} \gamma_{fs} \beta_{gs}}{\sum_{\forall s} \gamma_{fs} \beta_{fs}}$, is the weight that firm $f$ puts on its rival $g$'s  profit if there are investors that hold shares in both firms. $N$ is the number of rivals in the same industry.
From the table that I provided above, firms 1, 3, and 4 share the same SIC3 code. This means that firms with SIC code 251 have two rivals each. Usually, a standard assumption here is that $\beta_{fs} = \gamma_{fs}$. With this assumption in mind (I hope my calculations are correct), one should get a final dataset that looks like this:

Code:
input firm_id year lambda
1 2000 2.657
2 2000 .
3 2000 0.472
4 2000 0.85
end
So far, I have written some code that calculates the number of rivals (see below).

Code:
egen sic3_id = group(sic3_code)        

* Work only in the data with the same sic3_code
summ sic3_id
    di `r(max)'

forvalues i = 1(1)`r(max)' {
    preserve
    keep if sic3_id == `i'
    /* Count the number of unique firms under that sic3_id */
    distinct firm_id if sic3_id == `i'        
    gen rival= `r(ndistinct)' - 1 
    /* Save that case as a temporary file in the memory */
    tempfile rival_sic`i' 
    save `rival_sic`i'', replace
    restore
}

summ sic3_id
use `rival_sic1', clear
forvalues i = 2(1)`r(max)' {
    append using `rival_sic`i''
}
save data_rivals.dta, replace
beep
A more efficient code could be written for the above, but the real problem is with summation parts, as the form of the dataset, is not appropriate to apply the formula. It seems that to solve this, one would need to work on chunks of data each time (in parts, that is). I would appreciate any help on this matter.