Hello,

My dataset consists of ~6000 referrals to a suite of four types of psychological programs over a 5-year period. Some people were referred to several different programs during that period. The programs are run in different institutional settings, so the same person may be allocated to different programs at different locations over time. The main outcome of interest is whether or not participants complete the programs successfully.

I want to assess whether people from an ethnic minority are more likely to complete a program if a higher proportion of the people in their program group are also from an ethnic minority. The hypothesis is that participants who attend a program group with more people of the same ethnic background are more likely to keep attending the group and complete the program.

However, I cannot figure out how to create a variable that tells me, for each program cohort/group (i.e. people participating in the same program type, at the same program delivery location, on the same date), what proportion of the group is from an ethnic minority.

Here is an example of what the data looks like (pasted from Excel; sorry, the real data is very identifying, so I had to make up an example version):
ID  Program_type  Program_delivery_location  Program_delivery_date  Ethnic_Minority  Completion_flag
1   1             1                          13-Jan-15              0                0
2   1             1                          13-Jan-15              1                1
3   1             1                          13-Jan-15              1                1
4   1             1                          13-Jan-15              0                1
5   1             1                          13-Jan-15              0                0
6   1             1                          13-Jan-15              1                1
7   2             1                          13-Jan-15              1                1
1   2             2                          9-Apr-15               0                1
2   2             2                          9-Apr-15               1                1
8   2             2                          9-Apr-15               0                0
9   3             1                          17-Aug-18              1                0
10  3             1                          17-Aug-18              0                1
11  3             1                          17-Aug-18              1                0
12  3             1                          17-Aug-18              1                0
13  1             3                          25-Feb-17              1                0
14  1             3                          25-Feb-17              1                0
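
To make the target variable concrete, here is a minimal sketch in plain Python of what "proportion from an ethnic minority per cohort" would look like on the example rows. It is only an illustration under stated assumptions, not a recommendation of software: the names `rows`, `counts`, `prop_minority` and `augmented` are invented for this example, and a cohort is assumed to be fully identified by program type, delivery location and delivery date.

```python
from collections import defaultdict

# Example rows copied from the table in this post:
# (ID, Program_type, Program_delivery_location, Program_delivery_date,
#  Ethnic_Minority, Completion_flag)
rows = [
    (1, 1, 1, "13-Jan-15", 0, 0),
    (2, 1, 1, "13-Jan-15", 1, 1),
    (3, 1, 1, "13-Jan-15", 1, 1),
    (4, 1, 1, "13-Jan-15", 0, 1),
    (5, 1, 1, "13-Jan-15", 0, 0),
    (6, 1, 1, "13-Jan-15", 1, 1),
    (7, 2, 1, "13-Jan-15", 1, 1),
    (1, 2, 2, "9-Apr-15", 0, 1),
    (2, 2, 2, "9-Apr-15", 1, 1),
    (8, 2, 2, "9-Apr-15", 0, 0),
    (9, 3, 1, "17-Aug-18", 1, 0),
    (10, 3, 1, "17-Aug-18", 0, 1),
    (11, 3, 1, "17-Aug-18", 1, 0),
    (12, 3, 1, "17-Aug-18", 1, 0),
    (13, 1, 3, "25-Feb-17", 1, 0),
    (14, 1, 3, "25-Feb-17", 1, 0),
]

# Assumption: a cohort is the combination (program type, location, date).
# For each cohort, accumulate [number of minority members, group size].
counts = defaultdict(lambda: [0, 0])
for _id, ptype, location, date, minority, _completed in rows:
    cohort = (ptype, location, date)
    counts[cohort][0] += minority
    counts[cohort][1] += 1

# Proportion of each cohort that is from an ethnic minority.
prop_minority = {
    cohort: n_minority / size for cohort, (n_minority, size) in counts.items()
}

# Attach the cohort-level proportion to every referral row as a new last column.
augmented = [row + (prop_minority[(row[1], row[2], row[3])],) for row in rows]
```

On the example data, the first cohort (program type 1, location 1, 13-Jan-15) has 3 minority members out of 6, so each of its six rows would get the value 0.5. Most statistics packages offer an equivalent "group mean" operation over the three cohort columns.
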
Thanks in advance!