The ever-growing volume of social media and user-generated content on the web is an abundant source of data that can provide relevant insight. This work is based on data from Twitter, a social networking and microblogging platform with over 300 million monthly active users who post over 500 million tweets per day.
There are at least two approaches to analyzing Twitter data: (social) network analysis and content analysis. In our previous research (Sluban et al. 2015), we combined both approaches. We detected influential communities, identified discussion topics, and determined the sentiment of the communities towards selected topics. However, the question of whether the detected communities have corresponding real-world counterparts remained unanswered.
In this paper, we study retweet networks of the Members of the European Parliament and investigate their community structure. In particular, our goal is to determine what the community structure actually reflects. We approach this problem from the perspective of network theory, which has been applied successfully to characterize a wide variety of complex systems. We show that network theory is particularly effective at uncovering structure without prior knowledge of political orientation or national membership.
Twitter provides users with opportunities for different forms of interaction. The most prevalent form of interaction is following other users. When a user follows others, the tweets those users post are shown in the follower’s feed. A user can mention other users in a tweet, which brings the tweet to the attention of the users that are mentioned. Closely related to mentions are replies. A user can reply to a specific tweet from another user, engaging her/him in a direct conversation.
Retweets are the form of interaction most characteristic of Twitter as a social network. A user can retweet the tweets posted by other users. By doing this, the user creates another tweet with the exact same content as the original, with an additional attribution to the original tweet. This way, the information about the original author of the tweet is preserved. An idiosyncrasy of retweets is that the original tweet is always attributed in a retweet, eliminating the possibility of retweeting a retweet. When a user retweets a tweet, it is distributed to all her/his followers, just as if it were an originally authored tweet. Users retweet content that they find interesting or agreeable. Retweets have been analyzed in the context of information spreading and cascade formation. Additionally, retweets have been analyzed as a form of influence.
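As a minimal sketch of how retweets give rise to a network, assume each retweet is available as a (retweeter, original author) pair. The input format, the function name `build_retweet_network`, and the edge direction (from the original author to the retweeter, i.e., the direction in which content spreads) are illustrative assumptions rather than details stated above. Because every retweet is attributed to the original tweet, each pair links a user directly to the original author, and repeated retweets only increase the weight of that edge.

```python
from collections import Counter


def build_retweet_network(retweets):
    """Return a dict mapping (author, retweeter) edges to retweet counts.

    `retweets` is an iterable of (retweeter, original_author) pairs
    (a hypothetical input format). Self-retweets are skipped.
    """
    edges = Counter()
    for retweeter, author in retweets:
        if retweeter != author:
            edges[(author, retweeter)] += 1
    return dict(edges)


# Toy data with made-up user names.
sample = [
    ("bob", "alice"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("alice", "bob"),
]
network = build_retweet_network(sample)
```

In this toy example, the edge from "alice" to "bob" has weight 2 because "bob" retweeted "alice" twice.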
Existing research has analyzed various means of acquiring relevant tweets from the Twitter APIs (Kumar et al. 2015; Morstatter et al. 2013; Sampson et al. 2015) including the Streaming API used in this work, as well as improvements of the acquisition process by modification of queries.
To the best of our knowledge, there has been no previous work on the analysis of retweet networks of the Members of the European Parliament. Nevertheless, there is a considerable body of literature on several aspects relevant to this study.
Conover et al. 2011a predict the political alignment of Twitter users in the run-up to the 2010 US elections based on content and network structure. They analyze the polarization of retweet and mention networks for the same elections (Conover et al. 2011b). Borondo et al. 2012 analyze the user activity during the 2011 Spanish presidential elections. They additionally analyze the 2012 Catalan elections focusing on the interplay between the language and the community structure of the network (Borondo et al. 2014). Most existing research, as Larsson points out (Larsson 2014), focuses on the online behavior of political figures during election campaigns. Hix et al. 2009 investigate the voting cohesion of political groups in the European Parliament. Larsson 2014 examines the Twitter presence of representatives outside of election periods.
Lazer 2011 highlights the importance of network science approaches in political science at large, since politics is a relational phenomenon at its core. Recent research has adopted the network approach to investigate the structure of legislative work in the US Congress, including committee and subcommittee membership (Porter et al. 2005), bill cosponsoring (Zhang et al. 2008), and roll-call votes (Waugh et al. 2009). In more recent work, Dal Maso et al. 2014 examine the community structure with respect to political coalitions and government structure in the Italian Parliament.
As previously noted, there are three main modalities in which users on Twitter interact: 1) the user follows posts of other users, 2) the user responds to other users' tweets by mentioning them or replying to them, and 3) the user forwards interesting tweets by retweeting them. Based on these three interaction types, one can define three measures of influence of a Twitter user: indegree influence (the number of followers, indicating the size of her/his audience), mention influence (the number of mentions of the user, indicating her/his ability to engage others in conversation), and retweet influence (the number of retweets, indicating the ability of the user to write content of interest to be forwarded to others).
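The three influence measures can be illustrated with a short sketch. It assumes that the interactions are available as plain lists of user pairs; the input formats and the function name `influence_measures` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def influence_measures(follows, mentions, retweets):
    """Compute indegree, mention, and retweet influence per user.

    follows:  iterable of (follower, followed) pairs
    mentions: iterable of (author, mentioned_user) pairs
    retweets: iterable of (retweeter, original_author) pairs
    All three input formats are illustrative assumptions.
    """
    # A follow relation counts once, so duplicate pairs are removed first.
    indegree = Counter(followed for _, followed in set(follows))
    mention = Counter(mentioned for _, mentioned in mentions)
    retweet = Counter(author for _, author in retweets)
    users = set(indegree) | set(mention) | set(retweet)
    return {
        user: {
            "indegree": indegree[user],
            "mention": mention[user],
            "retweet": retweet[user],
        }
        for user in users
    }


# Toy data with made-up user names.
follows = [("b", "a"), ("c", "a"), ("a", "b")]
mentions = [("b", "c"), ("a", "c")]
retweets = [("b", "a"), ("c", "a"), ("c", "a")]
scores = influence_measures(follows, mentions, retweets)
```

In the toy data, user "a" has the most followers and retweets, while user "c" is mentioned most often despite having no followers, so the ranking of users depends on the measure chosen.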
Kwak et al. 2010 compare three different network-based measures of influence on Twitter (the number of followers, PageRank, and the number of retweets) and find that the ranking of the most influential users differs depending on the measure. Cha et al. 2010 also compare three measures of influence (the number of followers, the number of retweets, and the number of mentions) and likewise find that the most followed users do not necessarily score the highest on the other measures. Wang et al. 2010 compare the number of followers and PageRank with a modified PageRank measure that accounts for topic, again finding that the ranking depends on the influence measure. Suh et al. 2010 investigate how different factors, such as the account age and the use of hashtags and URLs, affect the influence of the user measured by the number of retweets. Bakshy et al. 2011 investigate how information spreads on a retweet network and whether there are preconditions for the user to become influential. Boyd et al. 2010 examine retweets as a conversational practice and note that retweeting can be understood both as a form of information diffusion and as a means of participating in a diffuse conversation. Another perspective on Twitter analysis is to detect and focus on specific events instead of the complete time-frame (Kenett et al. 2014).
Along with the small-world phenomenon and power-law degree distribution, the most salient property that real-world networks exhibit is community structure, where network nodes are grouped into tightly knit communities, between which there are only loose connections (Girvan and Newman 2002). The identification of the community structure in a network is commonly based on the optimization of its modularity (Newman and Girvan 2004). Identification of communities in social networks is a very vibrant field of research, and many different algorithms exist which employ various approaches (Fortunato 2010). In this work, we perform community detection by the Louvain method, introduced by Blondel 2008, which has been found to be among the best performing algorithms in a variety of domains (Harenberg et al. 2014; Hric et al. 2014).
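To make the modularity criterion concrete, the sketch below evaluates the modularity Q of a given partition, using the standard form Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. It assumes an undirected, unweighted graph for simplicity, and the function name `modularity` is hypothetical. The sketch only scores a partition; it does not implement the greedy optimization performed by the Louvain method.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges:        list of (u, v) pairs, each undirected edge listed once
    community_of: dict mapping every node to its community label
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {node: "A" for node in range(6)}
q_split = modularity(edges, split)
q_single = modularity(edges, single)
```

Splitting the toy graph into its two triangles gives Q = 5/14, whereas placing all nodes in one community gives Q = 0, so the two-community partition is preferred under this criterion.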
Evaluation of the community structure is often performed by qualitative comparison to the ground truth (Lužar et al. 2014; Perc 2010). The methodology for evaluating the degree to which the detected communities match known groups (Yang and Leskovec 2015), used in this work, is based on the B³ algorithm (Bagga and Baldwin 2011; 2008). The B³ measure is the most appropriate measure according to the formal constraints for extrinsic clustering evaluation measures proposed by Amigo 2009.
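As an illustration of this kind of evaluation, the following sketch computes B-cubed precision, recall, and their harmonic mean for the simple case in which every node belongs to exactly one detected community and exactly one known group. This non-overlapping formulation, the function name `bcubed`, and the toy labels are assumptions made for illustration; the evaluation used in this work may differ in detail.

```python
from collections import Counter


def bcubed(detected, truth):
    """Return (precision, recall, F1) of B-cubed for non-overlapping groups.

    detected: dict mapping each item to its detected community
    truth:    dict mapping each item to its known group
    """
    if not detected or set(detected) != set(truth):
        raise ValueError("both mappings must cover the same non-empty item set")
    items = list(detected)
    cluster_size = Counter(detected.values())
    class_size = Counter(truth.values())
    # Number of items sharing both the community and the known group.
    joint = Counter((detected[i], truth[i]) for i in items)
    n = len(items)
    precision = sum(
        joint[(detected[i], truth[i])] / cluster_size[detected[i]] for i in items
    ) / n
    recall = sum(
        joint[(detected[i], truth[i])] / class_size[truth[i]] for i in items
    ) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four members, two detected communities, two known groups.
detected = {"a": 0, "b": 0, "c": 0, "d": 1}
truth = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
precision, recall, f1 = bcubed(detected, truth)
```

Per-item precision is the fraction of an item's detected community that shares its known group, and per-item recall is the fraction of its known group found in the same detected community; both are averaged over all items.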
This paper builds on our preliminary work, presented in workshop proceedings (Cherepnalkoski and Mozetič 2015), and extends it along three main dimensions. First, the collection process is extended from eight months to one year, increasing the data size by 50%. Second, in addition to examining political group and country membership, we examine language as a potential factor along which the communities are formed. Finally, we investigate the indirect links between the countries in the retweet network in terms of tweet sharing between the EP members and the Europe-wide audience.
The paper is organized as follows. The section “The European Parliament on Twitter” describes the European Parliament and the Twitter data collected. In the section “Community detection and evaluation”, we outline the Louvain community detection method and the measures used to evaluate the detected communities with respect to the ‘ground truth’, i.e., the actual labels. The sections “Communities in the core network” and “Communities in the extended network” present the results of the community detection. The section “Linking the EU countries in the extended network” presents the results of the tweet sharing analysis between the countries. In “Conclusions”, we discuss the results and plans for future research.