Hi everyone,

I am trying to do two things with my data and am struggling to set up the calculation; please advise. First, I want to estimate the total proportion (not the mean proportion), that is, the sum of outcome divided by the sum of number_of_participants, by group (of schools), year, outcome, and treatment (=0 or =1). Then I would like to average those yearly proportions across all years.

This is my code:

generate outcome_total=.
replace outcome_total = sum(outcome) / sum(number_of_participants)
sum outcome_total if treatment==1 & year==2001
sum outcome_total if treatment==0 & year==2001
sum outcome_total if treatment==1 & year==2002
sum outcome_total if treatment==0 & year==2002
sum outcome_total if treatment==1 & year==2003
sum outcome_total if treatment==0 & year==2003

mean outcome_total if treatment==1
mean outcome_total if treatment==0

What I can't work out is whether I should first "sum" "outcome_total" within each year and then take the "mean" to get the overall average proportion. I want the aggregate of outcome by treatment and year, and then the mean across all years. But without the "sum" per year, I get different numbers for the different schools in each year, because I don't know how to group the schools together. I would appreciate any help with grouping the schools by year, outcome, and treatment, getting the aggregate for each group, and then taking the mean across years. Please let me know if that makes sense. Many thanks!
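
To make the goal concrete, here is a rough sketch of the calculation I think I am after. I am not sure it is right. It assumes the data have one row per school and year, that treatment is what defines the two groups of schools, and that there is a single outcome variable. The name mean_outcome_total is made up for illustration. It uses collapse to pool the schools within each treatment-year cell before dividing, and then takes an unweighted mean of the yearly proportions.

```stata
* Sketch only. Assumes one row per school-year with the variables
* year, treatment, outcome, number_of_participants.
preserve

* Pool schools: total outcome and total participants per treatment-year cell
collapse (sum) outcome number_of_participants, by(treatment year)

* Total (pooled) proportion for each treatment-year cell
generate outcome_total = outcome / number_of_participants
list treatment year outcome_total, sepby(treatment)

* Unweighted mean of the yearly proportions, separately by treatment
* (mean_outcome_total is a hypothetical name)
collapse (mean) mean_outcome_total = outcome_total, by(treatment)
list

restore
```

If every year should instead count in proportion to its number of participants, I believe the second collapse would need to be replaced by another sum of outcome and number_of_participants by treatment, followed by the division.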


This is my dataset:

clear
input float(year school treatment outcome number_of_participants)
6 1 1 1
7 1 1 1
8 1 1 1
9 1 1 1
10 1 1 1
6 1 1 1
7 1 1 1
8 1 1 1
9 1 1 1
10 1 1 1
6 1 1 1
7 1 1 1
8 1 1 1
9 1 1 1
10 1 1 1
1 1 164 183
2 1 195 203
3 1 208 214
4 1 314 209
5 1 247 195
6 1 57 71
7 1 51 87
8 1 47 57
9 1 36 23