Hello all,

I have a dataset with admissions information for a school, broken down by the race and ethnicity of the students, for the years 2000-2009. My analysis focuses on the numeric variables Admits, Apps, and race_broad, because I am trying to work out the admission rate for each group.

Here is my starting code, to help you visualize the data:

label variable Apps "Number of applicants"
label variable Admits "Number admitted"

label define race_lbl 1 "Black"
label define race_lbl 2 "AIAN", add
label define race_lbl 3 "Asian", add
label define race_lbl 4 "Latinx", add
label define race_lbl 5 "Pacific Islander", add
label define race_lbl 6 "White", add
label values race_broad race_lbl


The race_broad variable gives only the broad categories. The dataset also has a variable with more detailed racial/ethnic information, race_detail. As a result, instead of one observation for Latinx each year there are two: one row for Mexican and one row for Other Latinx, both with race_broad==4.

What I would like to do is combine the observations where race_broad==4 into one per year, so that I have the admissions information for Latinx students overall. I don't want to drop the observations for the other values of race_broad; I just don't want two separate observations for Latinx. Is this possible?
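One way this could be done is with collapse, which adds up Apps and Admits within each combination of year and race_broad. Here is a minimal sketch, assuming the year is stored in a variable named year (a hypothetical name, since the post does not give it) and using made-up toy data in place of the real dataset. The two Latinx rows in each year become one, while groups that already have one row per year are left as they are. Note that collapse replaces the data in memory and drops variables it is not told to keep (here race_detail), so you may want to save your data first or wrap the commands in preserve/restore.

```stata
* Hypothetical toy data for illustration only: the variable name year
* and all numbers are made up, not taken from the real dataset.
clear
input year race_broad str20 race_detail Apps Admits
2000 4 "Mexican"      100  40
2000 4 "Other Latinx"  50  20
2000 6 "White"        300 150
2001 4 "Mexican"      120  45
2001 4 "Other Latinx"  60  30
2001 6 "White"        310 160
end

* Sum Apps and Admits within each year x race_broad cell.
* The Mexican and Other Latinx rows for a year merge into one row;
* race_detail is not listed, so it is dropped from the result.
collapse (sum) Apps Admits, by(year race_broad)

* One row per group per year, so the admission rate is a simple ratio
gen admit_rate = Admits/Apps
list, sepby(year)
```

After this, every group, Latinx included, has exactly one observation per year, and admit_rate can be computed the same way for all of them.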

If the answer to the above is "no", here is what I tried next:

Assuming that I could not combine the observations, I tried to calculate the admission rate for Latinx students from the variables Apps and Admits using egen and gen:

egen admits_latinx = total(Admits) if race_broad==4
label variable admits_latinx "Total Latinx students admitted"

egen apps_latinx = total(Apps) if race_broad==4
label variable apps_latinx "Total Latinx applicants"

gen admit_rate_latinx = admits_latinx/apps_latinx


This did not work either, because it calculated a single total over all the Latinx observations (every year combined) instead of one total per year, so now I am quite stuck.
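If you would rather keep all the observations, the egen approach can work once the totals are computed within each year. Without a by-group, egen total() with if race_broad==4 adds up the Latinx rows from all years together, which matches what you saw. Below is a minimal sketch, again assuming a year variable named year (hypothetical) and made-up toy data. The first part is your code with a bysort year: prefix; the second computes totals for every group at once.

```stata
* Hypothetical toy data for illustration only: the variable name year
* and all numbers are made up, not taken from the real dataset.
clear
input year race_broad str20 race_detail Apps Admits
2000 4 "Mexican"      100  40
2000 4 "Other Latinx"  50  20
2000 6 "White"        300 150
2001 4 "Mexican"      120  45
2001 4 "Other Latinx"  60  30
2001 6 "White"        310 160
end

* The original attempt with a by-year prefix: totals are computed
* separately within each year, still only over race_broad == 4 rows.
* Non-Latinx rows get missing values for these three variables.
bysort year: egen admits_latinx = total(Admits) if race_broad == 4
bysort year: egen apps_latinx = total(Apps) if race_broad == 4
gen admit_rate_latinx = admits_latinx/apps_latinx

* The same idea for every group at once: one total per year x race_broad
bysort year race_broad: egen admits_grp = total(Admits)
bysort year race_broad: egen apps_grp = total(Apps)
gen admit_rate = admits_grp/apps_grp
list year race_broad race_detail admit_rate_latinx admit_rate, sepby(year)
```

Both Latinx rows in a given year carry the same totals and rate, so this gives the same rates as combining the observations, just without reducing the number of rows.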

Thank you. Please let me know if anything is unclear.