Table 2 The annotator agreement and the model performance

From: Evolution of topics and hate speech in retweet network communities

 

|   | No. of tweets | Alpha | Acc | F1(A) | F1(U) |
|---|---|---|---|---|---|
| Self-agreement | 5,981 | 0.79 | 0.88 | 0.92 | 0.87 |
| Inter-annotator agreement | 53,831 | 0.60 | 0.79 | 0.85 | 0.75 |
| Classification model: training set | 50,000 | 0.61 | 0.80 | 0.85 | 0.77 |
| Classification model: evaluation set | 10,000 | 0.57 | 0.80 | 0.86 | 0.71 |

  1. Three measures are used: ordinal Krippendorff’s \(Alpha\), accuracy (\(Acc\)), and \(F_{1}\) for the classes of acceptable (A) and unacceptable (U) tweets. The first row gives the self-agreement of individual annotators, and the second row the inter-annotator agreement between different annotators. The last two rows give the evaluation results of the model, on the training set (by cross-validation) and on the out-of-sample evaluation set, respectively. Note that the model performance is comparable to the inter-annotator agreement.
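
For readers who want to compute the same three measures on their own annotations, here is a minimal Python sketch (not the authors' code) using the third-party `krippendorff` package and scikit-learn. The label vectors are toy placeholders, not the study's data; 0 stands for acceptable (A) and 1 for unacceptable (U).

```python
# Minimal sketch (illustrative, not from the paper): ordinal Krippendorff's
# Alpha, accuracy, and per-class F1 for two sets of binary labels.
import krippendorff                      # pip install krippendorff
from sklearn.metrics import accuracy_score, f1_score

# Toy labels from two annotators (or annotator vs. model) on the same tweets;
# 0 = acceptable, 1 = unacceptable.
labels_a = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
labels_b = [0, 1, 1, 1, 0, 1, 0, 0, 0, 0]

# Krippendorff's Alpha: each row of reliability_data is one annotator,
# each column one annotated unit (tweet).
alpha = krippendorff.alpha(reliability_data=[labels_a, labels_b],
                           level_of_measurement="ordinal")

# Treating annotator A as the reference yields accuracy and class-wise F1.
acc = accuracy_score(labels_a, labels_b)
f1_acceptable = f1_score(labels_a, labels_b, pos_label=0)
f1_unacceptable = f1_score(labels_a, labels_b, pos_label=1)

print(f"Alpha={alpha:.2f}  Acc={acc:.2f}  "
      f"F1(A)={f1_acceptable:.2f}  F1(U)={f1_unacceptable:.2f}")
```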