Hi all,

I usually have no problem with egen but I am finding it impossible to complete a relatively easy egen function and I'm hoping someone can help me.

I have a dataset of schools, years, ethnicities, and the quantity of each ethnicity in each school. The dataset is long and looks like this (a snapshot of a massive dataset):

School Year Ethnicity quantity
1 1998 1 20
1 1998 2 5
1 1999 1 30
2 1998 1 25

In this snapshot I have 2 schools. The first is observed in 1998 and 1999 and the second only in 1998. I have two Ethnicities. The quantity of ethnicity 1 in 1998 in school 1 is 20 learners. The quantitiy of ethnicity 2 in 1998 in school 1 is 5 learners. And so on.

I want to be able to say "school 1 had 80% learners who were ethnicity 1 in 1998 and school 2 had 100% learners who were ethnicity 1 in 1998". Of course this is easy to see from a browse but this is a massive dataset and I obviously want to automate the calculation.

I started by creating a variable which identifies the quantity and the learner at the same time using egen:

egen eth1_quant=mean(quantity) if Ethnicity==1, by(School Year) // mean or max or min are irrelevant since there is only one value per school and year
egen eth2_quant=mean(quantity) if Ethnicity==2, by(School Year)

This worked as expected and gave me a dataset which looks like this:

School Year Ethnicity quantity eth1_quant eth2_quant
1 1998 1 20 20 .
1 1998 2 5 . 5
1 1999 1 30 30 .
2 1998 1 25 25 .

Then I wanted to create a variable which identifies the quantity of each ethnicity in each school within the same observation. I used egen and this is where I am having trouble.

My code is:

egen eth1_inschool=min(eth1_quant), by(School Year) // min is used but it is irrelevant since there is only one value per school and year
egen eth2_inschool=min(eth2_quant), by(School Year)

What I expected to get out was this:

School Year Ethnicity quantity eth1_quant eth2_quant eth1_inschool eth2_inschool
1 1998 1 20 20 . 20 5
1 1998 2 5 . 5 20 5
1 1999 1 30 30 . 30 .
2 1998 1 25 25 . 25 .


But instead I got

School Year Ethnicity quantity eth1_quant eth2_quant eth1_inschool eth2_inschool
1 1998 1 20 20 . 20 .
1 1998 2 5 . 5 . 5
1 1999 1 30 30 . 30 .
2 1998 1 25 25 . 25 .


Essentially the egen did not pick up the value within school and year and simply gave me the row value.

Any ideas as to why? I've tried using other egen functions rather than min such as total or max or mean but it is all the same.

Any help will be greatly appreciated.