Hi everyone,

I am currently facing a fun little challenge that is giving me a bit of a headache. I will describe the data and the goal first and then add some thoughts of my own at the end (I am nowhere near sure how to tackle this, so I am leaving them for last). A simplified data example is at the bottom of this post.

I have data on students who work in teams on projects for different classes. I am trying to identify whether students on a given project have previously worked together on a project for a different class. The difficulty, for me, is that I am not only trying to identify whether the whole team has worked together in a different class before, but also whether any subset of the team, of any size, has done so. Ideally, I would end up with an indicator variable that records whether a specific student has previously worked with at least one other student from the current project in a different class, with the data returning to its original format (one row per student per project). For example, the first two projects in the data example below come from different classes, and students 1 and 3 are in both of them; treating class 1 as preceding class 2, both students should get a value of 1 on the new indicator for the second project.

My best guess so far is that I need to create rows for all combinations of students of every size (pairs, triples, and so on, and perhaps every ordering of them?) and then identify which students have worked together before. However, I am not sure how to start on this, or whether this is the right approach at all. Has anybody encountered a similar problem before, or could anyone help me out with this?
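One possible simplification of the combinations idea, offered as a rough sketch rather than a tested solution: if any group of two or more students has worked together before, then at least one pair within it has, so for a per-student indicator it may be enough to build ordered pairs (student, partner) within each project. The sketch below assumes that a lower class number means an earlier class and that teaming up within the same class does not count. It uses only built-in commands (-joinby-, -collapse-, -merge-); the variable names partner, first_class and worked_before are hypothetical choices, and it inputs a subset of the example data so that it runs on its own.

```stata
* Sketch only: flag students who previously worked with a current teammate
* in an earlier class. Assumes a lower class number means an earlier class.
* Data: a subset of the -dataex- example in this post.
clear
input long project byte(student class)
6125807 1 1
6125807 2 1
6125807 3 1
6126790 1 2
6126790 3 2
6126790 4 2
6127212 2 2
6127212 4 2
6127212 5 2
6129210 1 2
6129210 7 2
6129210 8 2
6131672 7 1
6131672 8 1
6131672 9 1
end

tempfile base pairs history
save `base'

* All ordered pairs (student, partner) of distinct teammates on each project
rename student partner
joinby project using `base'
drop if student == partner
keep project student partner class
save `pairs'

* Earliest class in which each pair was ever on the same team
collapse (min) first_class = class, by(student partner)
save `history'

* A pair counts as "worked together before" if that earliest class
* precedes the class of the current project (same-class teams do not count)
use `pairs', clear
merge m:1 student partner using `history', nogenerate
generate byte worked_before = first_class < class

* Back to one row per student per project: 1 if any current teammate qualifies
collapse (max) worked_before, by(project student)
merge 1:1 project student using `base', nogenerate
* Students without teammates never appear in the pair file
replace worked_before = 0 if missing(worked_before)
sort project student
list project student class worked_before, sepby(project) noobs
```

On this subset, students 1 and 3 on project 6126790 get worked_before = 1 because they were together on project 6125807 in class 1, while student 4 gets 0; with the full example data, student 4 would also get 1 through project 6131066. Pairs are kept in both orders so that each student's own row can be flagged directly.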

Thanks so much

Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input long project byte(student class)
6125807 1 1
6125807 2 1
6125807 3 1
6126790 1 2
6126790 3 2
6126790 4 2
6127078 1 1
6127078 5 1
6127078 6 1
6127080 2 2
6127080 4 2
6127080 5 2
6127080 7 2
6127212 2 2
6127212 4 2
6127212 5 2
6128238 3 1
6128238 8 1
6128238 9 1
6128885 1 2
6128885 3 2
6128885 4 2
6128885 6 2
6129210 1 2
6129210 7 2
6129210 8 2
6131066 3 1
6131066 4 1
6131066 8 1
6131066 9 1
6131672 7 1
6131672 8 1
6131672 9 1
end