I am using kappaetc (SSC) to assess inter-rater reliability, comparing diagnostic agreement before vs. after a diagnostic skills training event. I'd like to ask for advice on how to test the significance of the difference between those two measurements.

kappaetc returns both the coefficients and standard errors as matrices.
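
To make the question concrete, here is a rough sketch of the kind of calculation I have in mind, written in Python rather than Stata just to show the arithmetic. It takes one coefficient and its standard error from each run and forms a simple z-test on the difference. The function name `z_test_difference` and its arguments are my own placeholders, not anything kappaetc provides. It also assumes the two estimates are independent, which is probably not true if the same raters and subjects appear before and after training, so I am not sure this is the right approach.

```python
import math


def z_test_difference(coef_before, se_before, coef_after, se_after):
    """Z-test for the difference between two agreement coefficients.

    ASSUMPTION: the two estimates are independent, so the variance of the
    difference is the sum of the two squared standard errors. With the same
    raters/subjects before and after, this ignores the covariance term.
    """
    diff = coef_after - coef_before
    se_diff = math.sqrt(se_before ** 2 + se_after ** 2)
    z = diff / se_diff
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return diff, se_diff, z, p_value
```

The inputs would be the relevant entries of the coefficient and standard error matrices from each kappaetc run, copied in by hand. If the before and after estimates are correlated, the standard error of the difference would also need a covariance term, which this sketch does not have.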