Hello,
I’m using STATA 15 to generate a variable that will be used as an instrument in a regression.
The dataset that I use is on director level, meaning that for each firm in a year the directors of the board are unique observations. (note that the directors in the sample don’t need to add up to the board size, since executives are excluded, for example)
Below is a subsection of the dataset.
cusip_9 fyear directorid gender board_size fraction_female_directos
63904106 2000 323732 M 9 0.111
63904106 2001 323712 M 10 0.1
63904106 2001 323719 M 10 0.1
63904106 2001 323721 M 10 0.1
63904106 2001 323722 F 10 0.1
63904106 2001 323728 M 10 0.1
63904106 2001 324051 M 10 0.1
63904106 2001 324056 M 10 0.1
63904106 2002 323713 M 11 0.182
63904106 2002 323722 F 11 0.182
63904106 2002 323726 M 11 0.182
63904106 2002 323728 M 11 0.182
63904106 2002 323734 F 11 0.182
63904106 2002 324056 M 11 0.182
63904106 2003 323716 M 11 0.182
63904106 2003 323719 M 11 0.182
63904106 2003 323721 M 11 0.182
Now the variable that I want to make is the: Fraction of Males with Board Connections to Female Directors.

So, in the data a director can sit on multiple boards(firms) in a year. To break it down, I want to know of the directors that sit on other boards if there are females on those other boards. Secondly making the fraction of male directors in a board that sit on other boards in a year that have a female on the board. It is important that this fraction doesn’t incorporate females on the board in question for which I want to calculate this connection fraction.

The code that I used so far is the following, and for a fact wrong and inefficient

Code:
//generating variable to indicate on how many other boards a directors sits in the sample.

sort directorid fyear
quietly by directorid fyear: gen other_boards=cond(_N==1,0,_n)
label variable other_boards "# of boards a directors sits on in the sample"

//generating other_boards only for men

gen other_boards_m = 0
replace other_boards_m = 1 if (gender=="M" & other_boards>1)


//generating instrument as the fraction of male directors on the board who sit on other boards on which there are female directors
//frist generating connection and later the fraction
gen connection_female = 0
replace connection_female = 1 if (firm_has_female_directors==1 & other_boards_m>=1)

//fraction of male directors on the board who sit on other boards on which there are female directors

gen fraction_connection_female = (connection_female/board_size)
label variable fraction_connection_female "fraction of male directors on the board who sit on other boards on which there are female directors"
This code is wrong since it doens't produce results for other_boards ==0. Which is also needed. Furthermore it doens't add up the individual directors first and then takes the fraction.
Someone mentioned that the egen command could be far easier to compute what I'm after. Unfortunately, I have no idea how, since my stata knowledge is not that great.

Hopefully someone can help me.