Hi All,
Any tips on how to analyze the level of agreement when:
1) there are many raters, say students randomly selected from a larger population;
2) the raters examine many different randomly selected subjects from a larger population of interest, with not every subject rated by every rater and some subjects rated by more than one rater;
3) the score is binary;
4) agreement is measured against a single gold-standard rater who evaluated all subjects.

I'm interested in the level of agreement between the students (as representatives of the student body in general) and the gold standard.
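
To make the question concrete: the naive thing would be to pool all student/gold pairs and compute raw percent agreement plus Cohen's kappa, ignoring the clustering by student and subject entirely. A minimal Python sketch of that (hard-coding the example data shown below) follows; whether anything this simple is even defensible for this design is really part of my question.

# Naive pooled analysis: one (student, subject, rating, gold) tuple per
# student rating, matching the example table below.
pairs = [
    (1, 1, 0, 0), (1, 2, 1, 1),
    (2, 1, 0, 0), (2, 3, 0, 0), (2, 4, 1, 0),
    (3, 1, 0, 0),
    (4, 3, 0, 0), (4, 5, 1, 1), (4, 6, 1, 0), (4, 7, 0, 0),
    (5, 2, 0, 1), (5, 5, 0, 1),
]

n = len(pairs)
observed = sum(r == g for _, _, r, g in pairs) / n  # raw percent agreement

# Chance-expected agreement for Cohen's kappa, from the two marginals.
p_student_1 = sum(r for _, _, r, _ in pairs) / n  # P(student rates 1)
p_gold_1 = sum(g for _, _, _, g in pairs) / n     # P(gold rates 1)
expected = p_student_1 * p_gold_1 + (1 - p_student_1) * (1 - p_gold_1)

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.3f}, kappa = {kappa:.3f}")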
Thanks in advance!!
Mark

The data would look something like:
student_id  subject  rating  gold_stand_rating
1           1        0       0
1           2        1       1
2           1        0       0
2           3        0       0
2           4        1       0
3           1        0       0
4           3        0       0
4           5        1       1
4           6        1       0
4           7        0       0
5           2        0       1
5           5        0       1
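
For completeness: since the gold-standard rater scored every subject, the gold_stand_rating column is really a per-subject lookup, so one way to assemble the long table (and to get per-student agreement) is a join on subject. A rough pandas sketch, using the column names from the table above:

import pandas as pd

# Long-format student ratings: one row per (student, subject) pair actually rated.
ratings = pd.DataFrame(
    {
        "student_id": [1, 1, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5],
        "subject":    [1, 2, 1, 3, 4, 1, 3, 5, 6, 7, 2, 5],
        "rating":     [0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0],
    }
)

# The gold-standard rater scored every subject exactly once.
gold = pd.DataFrame(
    {
        "subject": [1, 2, 3, 4, 5, 6, 7],
        "gold_stand_rating": [0, 1, 0, 0, 1, 0, 0],
    }
)

# Attach the gold rating to each student rating by subject.
df = ratings.merge(gold, on="subject")

# Per-student agreement with the gold standard
# (each student rated a different number of subjects).
df["agree"] = (df["rating"] == df["gold_stand_rating"]).astype(int)
print(df.groupby("student_id")["agree"].mean())

Summarizing those per-student agreements across students, to say something about the student body as a whole, is the part I'm unsure about.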