Evolution of topics and hate speech in retweet network communities

Applied Network Science

Table 1 The roles of different subsets of the 2018–2020 Slovenian Twitter dataset

Dataset	Period	No. of tweets	Role
All tweets	Jan. 2018–Dec. 2020	12,961,136	Collection, hate speech classification and topic detection
Original tweets	Jan. 2018–Dec. 2020	8,363,271	Hate speech modeling
Retweets	Jan. 2018–Dec. 2020	4,597,865	Network construction and community detection
Training set	Dec. 2017–Jan. 2020	50,000	Hate speech model training and cross validation
Evaluation set	Feb. 2020–Aug. 2020	10,000	Hate speech model evaluation
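
The counts in Table 1 can be cross-checked. Original tweets and retweets should together make up the full collection. This is a minimal Python check using only the numbers copied from the table:

```python
# Tweet counts copied from Table 1
all_tweets = 12_961_136
original_tweets = 8_363_271
retweets = 4_597_865

# Original tweets and retweets partition the full collection
assert original_tweets + retweets == all_tweets

# Share of retweets in the collection
share_retweets = retweets / all_tweets
print(f"{share_retweets:.1%}")  # prints 35.5%
```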

Of the almost 13 million tweets collected, a sample of the original tweets is used for hate speech annotation and for training and evaluating the classification models. The retweets are used to construct retweet networks and to detect communities. All tweets are automatically classified by the hate speech classification model and are also used for topic detection.
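
The split of the collection by role can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The `Tweet` record, its field names, and the edge convention (a directed edge from the retweeter to the retweeted author, weighted by the number of retweets) are hypothetical choices made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tweet:
    """Hypothetical minimal tweet record; field names are assumptions."""
    tweet_id: int
    user: str
    # Author of the retweeted tweet; None marks an original tweet
    retweeted_user: Optional[str] = None


def split_by_role(tweets):
    """Separate original tweets from retweets (the two subsets in Table 1)."""
    originals = [t for t in tweets if t.retweeted_user is None]
    retweets = [t for t in tweets if t.retweeted_user is not None]
    return originals, retweets


def retweet_edges(retweets):
    """Weighted directed edges for a retweet network.

    Assumed convention: edge (retweeter, retweeted author), with the
    weight equal to how many times that pair occurs.
    """
    return Counter((t.user, t.retweeted_user) for t in retweets)


tweets = [
    Tweet(1, "ana"),
    Tweet(2, "bor", retweeted_user="ana"),
    Tweet(3, "bor", retweeted_user="ana"),
    Tweet(4, "cene", retweeted_user="ana"),
    Tweet(5, "cene"),
]
originals, rts = split_by_role(tweets)
print(len(originals), len(rts))  # prints 2 3
edges = retweet_edges(rts)
```

The resulting edge weights could then be passed to a community detection algorithm; which algorithm and edge direction the study actually uses is not stated in this section.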