If the coder answered that the document was type 1, they would code the T1 variables and the T2 variables would be missing. If it was type 2, they’d code the T2 variables instead. If it is both type 1 and type 2, the coder should have coded it twice, once for each type. As the data are arranged now, each variable has four columns: the RA's first coding, my first coding, the RA's second coding (if applicable), and my second coding (if applicable).
doc_id | type_ra1 | type_me1 | type_ra2 | type_me2 | T1_ra1 | T1_me1 | T1_ra2 | T1_me2 | T2_ra1 | T2_me1 | T2_ra2 | T2_me2 |
1 | 1 | 1 | . | . | 2 | 2 | . | . | . | . | . | . |
2 | 2 | 2 | . | . | . | . | . | . | 1 | 2 | . | . |
3 | 1 | 1 | 2 | 1 | 1 | . | . | . | . | 2 | ||
4 | 1 | 2 | . | 1 | 2 | . | . | 2 | . | 2 | . | . |
So, in the above example, the first row should have perfect reliability between me and the RA.
The second row would be reliable for the type variable, but not the T2 variable.
In the third row, both coders coded the document as type 1 in the same way, but I also coded it a second time.
In the fourth row, my second coding matched the RA's first coding, which should still count towards higher reliability, although it would have been better if the RA had also coded the document a second time in the same way as my first coding.
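To make the fourth-row case concrete, here is a minimal sketch (in Python rather than Stata, purely for illustration) of pairing codings by document type instead of by attempt order. The tuple layout and the function name `pair_by_type` are hypothetical; the values mirror row 4 of the table above.

```python
# Each attempt is (coder, document type, coded value).
# Values mirror row 4 of the table; the layout itself is an assumption.
attempts = [
    ("ra", 1, 2),  # RA first coding: type 1, T1 = 2
    ("me", 2, 2),  # my first coding: type 2, T2 = 2
    ("me", 1, 2),  # my second coding: type 1, T1 = 2
]


def pair_by_type(rows):
    """Group attempts by type so each type holds one value per coder.

    A coder who never coded the document as a given type keeps None.
    """
    paired = {}
    for coder, doc_type, value in rows:
        paired.setdefault(doc_type, {"ra": None, "me": None})[coder] = value
    return paired


paired = pair_by_type(attempts)
for doc_type, values in sorted(paired.items()):
    print(doc_type, values)
```

Paired this way, the type 1 codings line up even though they came from different attempts, and the type 2 coding shows up as unmatched on the RA's side.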
For the intercoder reliability (ICR), I’ve been grouping all attempts for comparable variables like this:
kappaetc type_ra1 type_ra2 type_me1 type_me2
kappaetc T1_ra1 T1_ra2 T1_me1 T1_me2
…
So, I have 2 questions:
1) Is there a better way to account for missing values here? When I recode missing values to 0, the reliability comes out at around .7, depending on the variable; when I leave them missing, it is around .85.
2) Is there a good way to measure total reliability? I want to be able to see if the document as a whole was reliably coded, not just the individual variables. Because there are a lot of variables coded for each document, it could be that most documents have something off when looking at the variables collectively, even though the individual variables tend to have high reliability.
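Both questions can be made concrete with a small sketch. It is written in Python rather than Stata and uses plain observed agreement, not the chance-corrected coefficients that kappaetc reports, so it only illustrates the mechanics. All data values and function names here are hypothetical.

```python
# Hypothetical ratings for one variable; None marks a missing value.
ra = [2, None, 1, 2, None, 1]
me = [2, None, None, 2, 1, 2]


def agreement_pairwise(a, b):
    """Observed agreement over documents that both coders rated."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return sum(x == y for x, y in pairs) / len(pairs)


def agreement_zero_filled(a, b):
    """Observed agreement after recoding missing to 0 as its own category."""
    filled = [(0 if x is None else x, 0 if y is None else y)
              for x, y in zip(a, b)]
    return sum(x == y for x, y in filled) / len(filled)


# Hypothetical per-document codings: variable name -> (RA value, my value).
docs = {
    1: {"type": (1, 1), "T1": (2, 2)},
    2: {"type": (2, 2), "T2": (1, 2)},
    3: {"type": (1, 1), "T1": (1, 1)},
}


def share_fully_agreed(documents):
    """Share of documents on which the coders agree on every variable."""
    flags = [all(x == y for x, y in codings.values())
             for codings in documents.values()]
    return sum(flags) / len(flags)


print(agreement_pairwise(ra, me))
print(agreement_zero_filled(ra, me))
print(share_fully_agreed(docs))
```

Recoding missing to 0 turns "one coder skipped this variable" into a disagreement and "both skipped it" into an agreement, which is one reason the two treatments can give different figures. The document-level share is a stricter summary than any per-variable figure, because a single mismatch marks the whole document as not agreed.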