Hi all,
I usually have no problem with egen but I am finding it impossible to complete a relatively easy egen function and I'm hoping someone can help me.
I have a dataset of schools, years, ethnicities, and the quantity of each ethnicity in each school. The dataset is long and looks like this (a snapshot of a massive dataset):
School Year Ethnicity quantity
1 1998 1 20
1 1998 2 5
1 1999 1 30
2 1998 1 25
In this snapshot I have 2 schools. The first is observed in 1998 and 1999 and the second only in 1998. I have two Ethnicities. The quantity of ethnicity 1 in 1998 in school 1 is 20 learners. The quantitiy of ethnicity 2 in 1998 in school 1 is 5 learners. And so on.
I want to be able to say "school 1 had 80% learners who were ethnicity 1 in 1998 and school 2 had 100% learners who were ethnicity 1 in 1998". Of course this is easy to see from a browse but this is a massive dataset and I obviously want to automate the calculation.
I started by creating a variable which identifies the quantity and the learner at the same time using egen:
egen eth1_quant=mean(quantity) if Ethnicity==1, by(School Year) // mean or max or min are irrelevant since there is only one value per school and year
egen eth2_quant=mean(quantity) if Ethnicity==2, by(School Year)
This worked as expected and gave me a dataset which looks like this:
School Year Ethnicity quantity eth1_quant eth2_quant
1 1998 1 20 20 .
1 1998 2 5 . 5
1 1999 1 30 30 .
2 1998 1 25 25 .
Then I wanted to create a variable which identifies the quantity of each ethnicity in each school within the same observation. I used egen and this is where I am having trouble.
My code is:
egen eth1_inschool=min(eth1_quant), by(School Year) // min is used but it is irrelevant since there is only one value per school and year
egen eth2_inschool=min(eth2_quant), by(School Year)
What I expected to get out was this:
School Year Ethnicity quantity eth1_quant eth2_quant eth1_inschool eth2_inschool
1 1998 1 20 20 . 20 5
1 1998 2 5 . 5 20 5
1 1999 1 30 30 . 30 .
2 1998 1 25 25 . 25 .
But instead I got
School Year Ethnicity quantity eth1_quant eth2_quant eth1_inschool eth2_inschool
1 1998 1 20 20 . 20 .
1 1998 2 5 . 5 . 5
1 1999 1 30 30 . 30 .
2 1998 1 25 25 . 25 .
Essentially the egen did not pick up the value within school and year and simply gave me the row value.
Any ideas as to why? I've tried using other egen functions rather than min such as total or max or mean but it is all the same.
Any help will be greatly appreciated.
Related Posts with Difficulty with egen
Survival analysis. Age as time variable. Need to include age in posterior analysis.Dear statalist community, Following the advice from the community given in my last post, I set up a…
How to reconvert the set of values that a variable can assume to a new set of values simultaneouslyHello, I have a variable called HighEd1 that can assume values from 1 to 8 and each value stands fo…
Extracting hour dummies from datetime variableDear Statalisters, I am working with a panel data that includes a datime variable (Cdayhour, ex: 21f…
Generate dataset for simulation (non-i.i.d panel data)Hello everyone, Hope you all are well and keeping safe. I have a question for the community here a…
Pooled panled portfolio regressionDear Statalists, currently I am struggling with the outputs of my pooled portfolio regression. I ha…
Subscribe to:
Post Comments (Atom)
0 Response to Difficulty with egen
Post a Comment