Table 1 The roles of different subsets of the 2018–2020 Slovenian Twitter dataset

From: Evolution of topics and hate speech in retweet network communities

| Dataset | Period | No. of tweets | Role |
|---|---|---|---|
| All tweets | Jan. 2018–Dec. 2020 | 12,961,136 | Collection, hate speech classification and topic detection |
| Original tweets | Jan. 2018–Dec. 2020 | 8,363,271 | Hate speech modeling |
| Retweets | Jan. 2018–Dec. 2020 | 4,597,865 | Network construction and community detection |
| Training set | Dec. 2017–Jan. 2020 | 50,000 | Hate speech model training and cross validation |
| Evaluation set | Feb. 2020–Aug. 2020 | 10,000 | Hate speech model evaluation |

  1. Out of almost 13 million tweets collected, a sample of the original tweets is used for hate speech annotation and for training and evaluating the classification models. The retweets are used to construct retweet networks and to detect communities in them. All tweets are automatically labeled by the hate speech classification model and are also used for topic detection.
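The counts in Table 1 fit together arithmetically: the original tweets and the retweets partition the full collection (8,363,271 + 4,597,865 = 12,961,136). The minimal Python sketch below cross-checks this and computes each subset's share. The constant and function names are hypothetical, chosen only for illustration. The annotated share is a rough ratio rather than a strict subset fraction, because the training set's period begins in Dec. 2017, before the main collection window.

```python
# Tweet counts copied from Table 1 (names are illustrative, not from the paper).
ALL_TWEETS = 12_961_136
ORIGINAL_TWEETS = 8_363_271
RETWEETS = 4_597_865
TRAINING_SET = 50_000
EVALUATION_SET = 10_000


def dataset_shares() -> dict[str, float]:
    """Return each subset's size as a fraction of a reference count."""
    # Original tweets and retweets together account for every collected tweet.
    assert ORIGINAL_TWEETS + RETWEETS == ALL_TWEETS
    # Annotated tweets = training set + evaluation set.
    annotated = TRAINING_SET + EVALUATION_SET
    return {
        "original_of_all": ORIGINAL_TWEETS / ALL_TWEETS,
        "retweets_of_all": RETWEETS / ALL_TWEETS,
        # Rough ratio only: the training period starts in Dec. 2017,
        # so not every annotated tweet lies inside the 2018-2020 collection.
        "annotated_of_original": annotated / ORIGINAL_TWEETS,
    }


if __name__ == "__main__":
    for name, share in dataset_shares().items():
        print(f"{name}: {share:.2%}")
```

Run as a script, it prints about 64.53% original tweets, 35.47% retweets, and an annotated sample equal to roughly 0.72% of the original-tweet count.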