Hi all,
I usually have no problem with egen but I am finding it impossible to complete a relatively easy egen function and I'm hoping someone can help me.
I have a dataset of schools, years, ethnicities, and the quantity of each ethnicity in each school. The dataset is long and looks like this (a snapshot of a massive dataset):
School Year Ethnicity quantity
1 1998 1 20
1 1998 2 5
1 1999 1 30
2 1998 1 25
In this snapshot I have 2 schools. The first is observed in 1998 and 1999 and the second only in 1998. I have two Ethnicities. The quantity of ethnicity 1 in 1998 in school 1 is 20 learners. The quantitiy of ethnicity 2 in 1998 in school 1 is 5 learners. And so on.
I want to be able to say "school 1 had 80% learners who were ethnicity 1 in 1998 and school 2 had 100% learners who were ethnicity 1 in 1998". Of course this is easy to see from a browse but this is a massive dataset and I obviously want to automate the calculation.
I started by creating a variable which identifies the quantity and the learner at the same time using egen:
egen eth1_quant=mean(quantity) if Ethnicity==1, by(School Year) // mean or max or min are irrelevant since there is only one value per school and year
egen eth2_quant=mean(quantity) if Ethnicity==2, by(School Year)
This worked as expected and gave me a dataset which looks like this:
School Year Ethnicity quantity eth1_quant eth2_quant
1 1998 1 20 20 .
1 1998 2 5 . 5
1 1999 1 30 30 .
2 1998 1 25 25 .
Then I wanted to create a variable which identifies the quantity of each ethnicity in each school within the same observation. I used egen and this is where I am having trouble.
My code is:
egen eth1_inschool=min(eth1_quant), by(School Year) // min is used but it is irrelevant since there is only one value per school and year
egen eth2_inschool=min(eth2_quant), by(School Year)
What I expected to get out was this:
School Year Ethnicity quantity eth1_quant eth2_quant eth1_inschool eth2_inschool
1 1998 1 20 20 . 20 5
1 1998 2 5 . 5 20 5
1 1999 1 30 30 . 30 .
2 1998 1 25 25 . 25 .
But instead I got
School Year Ethnicity quantity eth1_quant eth2_quant eth1_inschool eth2_inschool
1 1998 1 20 20 . 20 .
1 1998 2 5 . 5 . 5
1 1999 1 30 30 . 30 .
2 1998 1 25 25 . 25 .
Essentially the egen did not pick up the value within school and year and simply gave me the row value.
Any ideas as to why? I've tried using other egen functions rather than min such as total or max or mean but it is all the same.
Any help will be greatly appreciated.
Related Posts with Difficulty with egen
Splitting string in two numeric variablesHello! My data set includes a variable called price that is recorded as an interval (in most cases):…
Get no. of times for loop has iteratedIs there a way that I can access the number of times a given loop is on. For example, Code: sysus…
Deleting rows with the same valuesDear Statalists, In my dataset i have rows with same value, more precisely "NA" therefore i want to …
Issue with Reshape Code: * Example generated by -dataex-. To install: ssc install dataex clear input str25 VarYr str2…
Working with bootstrap, program and foreachHello, I have investigated my problem and have not been able to find a solution. I would like to ca…
Subscribe to:
Post Comments (Atom)
0 Response to Difficulty with egen
Post a Comment