Table 2 Annotator agreement and model performance

From: Evolution of topics and hate speech in retweet network communities

|                           | No. of tweets | Alpha | Acc  | F1 (A) | F1 (U) |
|---------------------------|---------------|-------|------|--------|--------|
| Self-agreement            | 5,981         | 0.79  | 0.88 | 0.92   | 0.87   |
| Inter-annotator agreement | 53,831        | 0.60  | 0.79 | 0.85   | 0.75   |
| Classification model:     |               |       |      |        |        |
| Training set              | 50,000        | 0.61  | 0.80 | 0.85   | 0.77   |
| Evaluation set            | 10,000        | 0.57  | 0.80 | 0.86   | 0.71   |
Three measures are used: ordinal Krippendorff's \(Alpha\), accuracy (\(Acc\)), and \(F_{1}\) for the classes of acceptable (A) and unacceptable (U) tweets. The first row gives the self-agreement of individual annotators, and the second row the inter-annotator agreement between different annotators. The last two rows give the evaluation results of the model, on the training set (by cross-validation) and on the out-of-sample evaluation set, respectively. Note that the model performance is comparable to the inter-annotator agreement.
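
For readers who want to compute the same three measures on their own annotations, the sketch below shows one possible way in Python. The `krippendorff` package, the ordinal label encoding, and the binarization into acceptable/unacceptable labels are illustrative assumptions for this sketch; they are not taken from the paper's code.

```python
# Minimal sketch: ordinal Krippendorff's Alpha, accuracy, and per-class F1
# for two label sequences (e.g., two annotators, or gold labels vs. model).
# The ordinal scale (0 = acceptable, higher = less acceptable) and the
# binarization into A/U are assumptions made for this example.
import numpy as np
import krippendorff                      # pip install krippendorff
from sklearn.metrics import accuracy_score, f1_score

# Illustrative ordinal labels; rows = annotators, columns = tweets.
ann1 = np.array([0, 0, 1, 2, 3, 0, 1])
ann2 = np.array([0, 1, 1, 2, 2, 0, 0])

# Ordinal Krippendorff's Alpha over the raw ordinal labels.
alpha = krippendorff.alpha(reliability_data=np.vstack([ann1, ann2]),
                           level_of_measurement="ordinal")

# Binarize: 0 -> acceptable (A), anything else -> unacceptable (U).
y1 = np.where(ann1 == 0, "A", "U")
y2 = np.where(ann2 == 0, "A", "U")

acc = accuracy_score(y1, y2)
f1_a, f1_u = f1_score(y1, y2, labels=["A", "U"], average=None)

print(f"Alpha={alpha:.2f}  Acc={acc:.2f}  F1(A)={f1_a:.2f}  F1(U)={f1_u:.2f}")
```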