Hello everyone,

I have learned a lot from the Statalist forum, but I have now run into something that I could use your expertise on. I think I could manage to do what I want in Excel, but since I have many observations and want to learn how it works in Stata, I am asking here.

Part of my study concerns the structure of boards, for which I am using BoardEx. From BoardEx, I have exported three files, which together contain the information I need; I describe each of them below. My main goal is to calculate the ratio of female directors in a given firm-year, by dividing the number of female directors in that firm-year by the total board size.

In the first file, the genders of the board members are stated per DirectorID. Here is a short preview.

Code:
input str21 Title str30 Forename1 str11 DOB str3 Age str1 Gender str36 Nationality long DirectorID int NetworkSize
"Admiral"               "Anna"             "Jan 1972"    "47" "F" ""           1530600    68
"Director"               "Aziz"       "1966"        "54" "M" ""           1356558   245
"Admiral"               "Bobby"            "04 Apr 1931" "89" "M" "American"     33004  4323
"Chairman"               "John"          "10 Jul 1953" "66" "M" "French"     1381271   120
end
In the second file, the education per DirectorID is stated. I will use this later on as a control variable.

Code:
input str64 DirectorName str128 CompanyName str85 Qualification long(DirectorID CompanyID) int AwardDate
"Doctor Christopher Albrecht"        "Universitat Basel (University of Basel)"                                                                                     "Graduated"                        39   62183     .
"Doctor Christopher Albrecht"        "Universitat Basel (University of Basel)"                                                                                     "PhD"                              39   62183  1461
"John Loudon"                        "Yale University"                                                                                                             "BA"                               52   62981 -1095
"John Loudon"                        "Université Paris Sorbonne - Paris IV (Paris Sorbonne University - Paris IV)"                                                "Graduated"                        52   63794     .
end
format %tdnn/dd/CCYY AwardDate
I have successfully managed to merge these two files. However, now comes the part where I am lost. The last export from BoardEx contains the board size per company per year.

Code:
input long(BoardID DirectorID) str4 Year str12 ISIN byte NumberDirectors
1990357 31 "2016" "JE00BD3QJR55"  3
1990357 31 "2015" "JE00BD3QJR55"  3
1990357 31 "2014" "JE00BD3QJR55"  3
1990357 31 "2017" "JE00BD3QJR55"  3
1990357 31 "2019" "JE00BD3QJR55"  4
1990357 31 "2018" "JE00BD3QJR55"  3
  17834 36 "2002" "IE0004906560" 18
  17834 36 "2006" "IE0004906560" 19
  17834 36 "2004" "IE0004906560" 19
  17834 36 "2011" "IE0004906560" 15
  17834 36 "2005" "IE0004906560" 19
  17834 36 "2010" "IE0004906560" 15
  17834 36 "2007" "IE0004906560" 18
  17834 36 "2003" "IE0004906560" 19
  17834 36 "2008" "IE0004906560" 15
  17834 36 "2009" "IE0004906560" 15
  32910 37 "2002" "DE0007664005" 28
  32910 37 "2002" "DE0007664039" 28
   3447 39 "2002" "CH0012410517" 12
  20144 42 "2003" "IT0000062957" 23
   8678 42 "2003" "FR0000120644" 13
  15520 42 "2002" "IT0001353173" 15
end
When I try to merge the gender and education file with this file, I get the message
Code:
variable DirectorID does not uniquely identify observations in the master data
I understand that this is because the DirectorIDs in the last file are not unique but repeat, but that shouldn't matter if all I want is to add each director's gender to it.

So, my question is: how can I do this merge? And how can I then calculate the female ratio (number of female directors / number of directors) per firm-year?
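
To make the goal concrete, here is a minimal sketch of what I imagine the steps might look like. It is untested on my full data and rests on several assumptions: the file names `board_size` and `director_gender` are hypothetical placeholders, the gender file has exactly one row per DirectorID, and the board file lists every director of each board in each year. The variable names `n_female` and `female_ratio` are my own.

```stata
* Sketch only: file names are hypothetical placeholders
use board_size, clear        // master: one row per director-board-year

* Many master rows per DirectorID, one using row per DirectorID (assumed)
merge m:1 DirectorID using director_gender, ///
    keepusing(Gender) keep(master match) nogenerate

* Count female directors within each board-year
bysort BoardID Year: egen n_female = total(Gender == "F")

* Ratio of female directors to the reported board size
generate female_ratio = n_female / NumberDirectors
```

I suspect this would still need care in two places. The merged gender and education file has several rows per DirectorID, so it probably cannot serve as the `using` file for `merge m:1`. Also, rows that repeat the same BoardID, DirectorID and Year under different ISINs would count a director twice.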

Thank you for your time and effort. If something is not clear, please let me know. If I can improve my future posts in any way, I would be glad to hear that as well.