Forecasting elections results via the voter model with stubborn nodes

In this paper we propose a novel method to forecast the result of elections using only official results of previous ones. It is based on the voter model with stubborn nodes and uses theoretical results developed in a previous work of ours. We look at popular vote shares for the Conservative and Labour parties in the UK and the Republican and Democratic parties in the US. We are able to perform time-evolving estimates of the model parameters and use these to forecast the vote shares for each party in any election. We obtain a mean absolute error of 4.74%. As a side product, our parameter estimates provide meaningful insight into the political landscape, informing us on the proportion of voters that are strong supporters of each of the considered parties.

Introduction
For decades, modern democratic societies have been polling populations to try and track the popularity of election candidates and members of governments. These polls are often conducted by means of phone, online or even in-person surveys, which can be very time-consuming and usually suffer from limited sample sizes and bias, e.g. respondents with controversial views might be reluctant to share them. This is why different methods are being investigated nowadays. With the rapid growth of online social platforms such as Facebook or Twitter, any individual can now publicly express their views and opinions, adding to an ever-growing pool of directly accessible data. This has opened the door to a new avenue of research that seeks to use this precious resource to forecast polls and election results without having to survey the population.
As of today, most efforts have focused on applying machine learning methods such as sentiment analysis to evaluate public opinion through samples of Twitter data and try to predict the outcome of democratic processes around the globe (Saleiro et al., 2016; Garcia et al., 2018; Grimaldi et al., 2020). The quality of predictions spans a rather wide range and numerous voices have expressed concerns over these methods, arguing that there are multiple factors at play that may alter their reliability (Gayo-Avello, 2012; Jungherr et al., 2017). This is why in this work we propose a novel method that does not rely on the analysis of social media data but rather uses the official results of previous elections to perform estimation for future ones.
More precisely, we consider the well-known voter model for opinion dynamics. A population of connected nodes forms a graph where some of them are in state 0 and the others in state 1. Nodes then randomly change state over time following the distribution of others' states. Nodes are usually meant to represent users on a social network, and states their opinions or views. This model thus describes in a simple and intuitive manner social dynamics where people are divided between two parties and form their opinion by observing that of others around them. A previous work of ours was dedicated to the theoretical study of this model in the specific case where everyone is influenced by everyone else and some users are stubborn and never change opinion (Vendeville et al., 2021). Notably, we provided closed-form expressions for the distribution of opinions at any point in the process and for the convergence time to equilibrium. This paper is a follow-up of that work, as we apply our previous findings to forecast results of both general elections in the United Kingdom and presidential elections in the United States. We consider the share of popular votes won by either the Conservative or the Labour party (or the Republican or the Democratic party in the US) as a representation of node states in a fictional graph and perform time-evolving estimation of optimal parameters for the corresponding voter model. This allows us to obtain a theoretical distribution for the share of votes, from which we draw the expected result of future elections. We compare with real-life outcomes to assess the viability of our approach.

Related Literature
A number of research projects have focused on applying machine learning algorithms to Twitter data in order to forecast opinion poll results or election outcomes. We discuss some of them here and refer the interested reader to Gayo-Avello (2013) and Phillips et al. (2017) for more in-depth reviews of the literature. A pioneering work in this area was that of Tumasjan et al. (2011), whose model achieved a mean absolute error (MAE) of 1.65% when predicting results of the 2009 German federal election. The authors used Twitter mention counts as a direct indicator of a candidate's popularity, a method that has been considered by several other works as well, often in combination with a sentiment analysis of tweet content (O'Connor et al., 2010; Saleiro et al., 2016; Garcia et al., 2018; Grimaldi et al., 2020; Fink et al., 2013; Huberty, 2013; Caldarelli et al., 2014; Thapen and Ghanem, 2013). In particular, Garcia et al. (2018) achieved 90% accuracy in predicting the top two candidates in various municipalities during Brazilian municipal elections, and Saleiro et al. (2016) achieved an MAE of 0.63% when trying to predict opinion poll results during the Portuguese bailout (2011-2014).
The relevance of such approaches has however been questioned by a number of authors (Fink et al., 2013; Huberty, 2013; Caldarelli et al., 2014; Thapen and Ghanem, 2013; Jungherr et al., 2012, 2017; Gayo-Avello, 2012). Jungherr et al. (2012) showed that merely changing the timeframe of the forecast in the work of Tumasjan et al. (2011) would invalidate the results. Fink et al. (2013) found that Twitter mentions mirrored the actual popularity of only some of the candidates. Jungherr et al. (2017) argued that mention counts, used in most of the works cited above, show evidence of attention to politics rather than support for the actual candidates. This is why researchers often combine mention counts with sentiment analysis algorithms, but even these can have trouble detecting and correctly interpreting all the subtleties of human language. This particular concern has been raised by several authors (Huberty, 2013; Caldarelli et al., 2014; Gayo-Avello, 2012). Self-selection, i.e. the fact that people choose whether to express their views online or not, may also bias results. Add to this the rife presence of bots on the Twitter platform, and it is difficult to say for sure whether the online population is an accurate representation of the real one. Some researchers have thus considered different avenues, drawing features from the Twitter user graph topology (Dokoohaki et al., 2015) or hashtag co-occurrences (Bovet et al., 2018), or even discarding the social platform entirely and using fluctuations of the pound to forecast the popularity of the Conservative party in the UK (Usher and Dondio, 2020). In line with these works, we build a model that does not rely on Twitter but rather uses official results of previous elections to guess the outcome of future ones.
Our model is a variant of the celebrated voter model, where nodes on a graph are in one of two possible states and repeatedly update their beliefs to agree with other nodes chosen at random. It was introduced independently by Holley and Liggett (1975) and Clifford and Sudbury (1973) in the context of particle interactions. They proved that consensus is reached, i.e. that every node is eventually in the same state, on the infinite $\mathbb{Z}^d$ lattice. Several works have since looked at different network topologies: complete graphs (Hassin and Peleg, 2002; Sood et al., 2008; Perron et al., 2009), Erdős-Rényi random graphs (Sood et al., 2008), scale-free random graphs (Sood et al., 2008; Fernley and Ortgiese, 2019), and various other structures (Sood et al., 2008). Variants where nodes deterministically update to the most common state amongst their neighbours have also been studied (Chen and Redner, 2005; Mossel et al., 2014).
In this paper we consider the specific case where stubborn nodes, which never switch state, are present in the graph. Such nodes may for example represent lobbyists, politicians or activists, i.e. entities looking to lead rather than follow and who will not easily change side. A single such node placed within the network can single-handedly change the outcome of the process (Mobilia, 2003; Sood et al., 2008). If several of them are present on both sides, consensus is usually not reachable and instead the distribution of states converges to an equilibrium around which it fluctuates indefinitely (Mobilia et al., 2007; Yildiz et al., 2013). Recently, Mukhopadhyay et al. (2020) considered nodes with different degrees of stubbornness and showed that the time to reach consensus grows linearly with their number. Klamser et al. (2017) studied the effect of stubborn nodes on a dynamically evolving graph, and showed that the two main factors shaping their influence are their degrees and the dynamical rewiring probabilities. Finally, in our previous work we developed closed-form formulas for the distribution of opinions at any step and for the convergence time to equilibrium in the case where stubborn nodes are present in a strongly connected network (Vendeville et al., 2021).
Our contributions. In this paper we propose a new model for forecasting election outcomes, based on official results of previous elections. Our method is based on the voter model with stubborn nodes and uses theoretical results developed in a previous work of ours (Vendeville et al., 2021). We apply it to the United Kingdom general elections and to the United States presidential elections and achieve an MAE of 4.74%. To the best of our knowledge this is the first time such work has been conducted. All code used is available online.

Theoretical background
Here we present the mathematical framework behind our forecasting method. In the traditional voter model, we consider a group of $n$ nodes labelled $1, \ldots, n$ which are each in state 0 or 1. These states are prone to change over time and we let $x_i(t)$ denote the state of node $i$ at time $t$. Each node has access to the state of some of the others, called its neighbours. Nodes can then be seen as forming a graph of size $n$, with an edge from $j$ to $i$ if and only if $i$ has access to the state of $j$. Here we consider this graph to be a clique with unweighted edges and no self-loops. Thus each node accounts for the state of every other node, except its own, with no particular preference. The process then unfolds as follows. Starting with a given initial distribution of states, an independent exponential clock of parameter 1 is associated with each node. Whenever a clock rings, the concerned node changes its state to that of one of its neighbours selected uniformly at random or, equivalently, chooses its new state by sampling the distribution of its neighbours' states.
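The dynamics described above can be simulated directly. The following is a minimal sketch rather than the code used for this paper: the function name `simulate_voter_model` and its arguments are illustrative, and the optional arguments `s0` and `s1` (assumed here, default 0) freeze some nodes to mimic the stubborn nodes mentioned in the introduction. It requires at least two nodes.

```python
import random


def simulate_voter_model(n, n1, t_max, s0=0, s1=0, seed=None):
    """Simulate the voter model on a complete graph of n >= 2 nodes up to t_max.

    n1 is the initial number of state-1 nodes; s0 and s1 are the numbers of
    stubborn state-0 and state-1 nodes (hypothetical arguments, 0 by default).
    Returns a list of (time, number of state-1 nodes), one entry per clock ring.
    """
    if not (s1 <= n1 <= n - s0):
        raise ValueError("need s1 <= n1 <= n - s0")
    rng = random.Random(seed)
    # Nodes 0..n1-1 start in state 1, the remaining ones in state 0.
    state = [1] * n1 + [0] * (n - n1)
    # The first s1 nodes (state 1) and the last s0 nodes (state 0) never change.
    stubborn = set(range(s1)) | set(range(n - s0, n))
    t, count = 0.0, n1
    path = [(t, count)]
    while True:
        # n independent rate-1 clocks: the next ring occurs after an Exp(n) delay.
        t += rng.expovariate(n)
        if t > t_max:
            break
        i = rng.randrange(n)  # node whose clock rang
        if i not in stubborn:
            j = rng.randrange(n - 1)  # uniform choice among the n - 1 other nodes
            if j >= i:
                j += 1
            count += state[j] - state[i]
            state[i] = state[j]
        path.append((t, count))
    return path
```

A call such as `simulate_voter_model(n=100, n1=40, t_max=4.0, s0=20, s1=15, seed=0)` returns one trajectory of the number of state-1 nodes; by construction that number stays between `s1` and `n - s0`.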
We let $N_1(t)$ denote the number of state-1 holders at time $t$; it will be our quantity of interest. Note that the number of state-0 nodes at time $t$ is given by $n - N_1(t)$. We assume $N_1(0)$ is fixed and let $n_1$ denote its value. We are interested in the particular situation where some of the nodes are stubborn, that is, never change state, and we describe the evolution of $N_1(t)$ over time. We denote by $s_0$ and $s_1$ the numbers of stubborn state-0 and state-1 nodes respectively and require at least one of them to be strictly positive. To this end we define
$$S_n := \{(s_0, s_1) \in \{0, \ldots, n\}^2 : 1 \le s_0 + s_1 \le n\} \tag{1}$$
and require $(s_0, s_1) \in S_n$. We write $[m_{ij}]_{i,j}$ to denote the matrix with entry $m_{ij}$ in the $i$-th row and $j$-th column and let $e^M$ denote the exponential of any matrix $M$. Because stubborn nodes always remain in their respective states 0 and 1, $N_1(t)$ lies between $s_1$ and $n - s_0$ for all $t$. The idea behind our analysis is that $N_1(t)$ is a birth-and-death process over the state space $\{s_1, \ldots, n - s_0\}$ with transition rates, for all $s_1 \le k \le n - s_0$,
$$q_{k,k-1} = \frac{(k - s_1)(n - k)}{n - 1}, \qquad q_{k,k+1} = \frac{(n - k - s_0)\,k}{n - 1}. \tag{2}$$
Indeed, to move from state $k$ to $k - 1$ we need a non-stubborn state-1 node to adopt the state of a state-0 node. There are $k - s_1$ non-stubborn state-1 nodes and for each of these, a proportion $(n - k)/(n - 1)$ of the others is in state 0, hence $q_{k,k-1} = (k - s_1)(n - k)/(n - 1)$. We obtain $q_{k,k+1}$ via an analogous reasoning and define $q_{k,k} = -q_{k,k+1} - q_{k,k-1}$. Since the process only evolves by unit increments or decrements, $q_{k,j} = 0$ if $j \notin \{k - 1, k, k + 1\}$. As expected we have $q_{s_1,s_1-1} = 0$ and $q_{n-s_0,n-s_0+1} = 0$. Finally we let $Q = [q_{ij}]_{i,j}$ denote the transition rate matrix. From there we are able to compute the distribution of $N_1(t)$ and its expected value at any point in time.
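To make the construction of the transition rate matrix concrete, here is a minimal NumPy sketch. It assumes the downward rate stated above, $(k - s_1)(n - k)/(n - 1)$, and reads the "analogous reasoning" for the upward rate as $(n - k - s_0)k/(n - 1)$; the function name `rate_matrix` and the convention that row `r` stands for state `s1 + r` are illustrative choices, not taken from the authors' code.

```python
import numpy as np


def rate_matrix(n, s0, s1):
    """Transition rate matrix Q of N_1(t) on the states s1, ..., n - s0.

    Row and column index r correspond to the state k = s1 + r.
    Assumes n >= 2 and at least one stubborn node in total.
    """
    if s0 < 0 or s1 < 0 or not (1 <= s0 + s1 <= n):
        raise ValueError("need s0, s1 >= 0 and 1 <= s0 + s1 <= n")
    k = np.arange(s1, n - s0 + 1)
    up = (n - k - s0) * k / (n - 1)      # rate k -> k + 1
    down = (k - s1) * (n - k) / (n - 1)  # rate k -> k - 1
    # Diagonal entries make every row sum to zero, as for any rate matrix.
    Q = np.diag(-(up + down))
    Q += np.diag(up[:-1], k=1) + np.diag(down[1:], k=-1)
    return Q
```

For example, `rate_matrix(10, 2, 3)` is a 6 by 6 tridiagonal matrix indexed by the states 3 to 8, with no upward rate out of state 8 and no downward rate out of state 3.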
Theorem 1. Let $Q$ be the matrix with entries described in (2) and let $N_1(0) = n_1$ be given. Assuming $(s_0, s_1) \in S_n$ is the repartition of stubborn nodes, the probability for $N_1$ to equal $k$ at time $t$ is
$$p_{n_1,k}(t) := \left[e^{tQ}\right]_{n_1,k}. \tag{3}$$
Hence,
$$\mathbb{E}[N_1(t)] = \sum_{k=s_1}^{n-s_0} k\, p_{n_1,k}(t) \tag{4}$$
is the expected number of state-1 nodes at time $t$.
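Theorem 1 can be evaluated numerically along the following lines. This is a sketch under stated assumptions: the upward rate is taken to be $(n - k - s_0)k/(n - 1)$, and the helper `expm` is a simple scaling-and-squaring stand-in written here to keep the example dependent on NumPy only (a library routine such as `scipy.linalg.expm` serves the same purpose). All function names are illustrative.

```python
import numpy as np


def rate_matrix(n, s0, s1):
    """Rate matrix Q on the states s1, ..., n - s0 (row r <-> state s1 + r)."""
    k = np.arange(s1, n - s0 + 1)
    up = (n - k - s0) * k / (n - 1)      # rate k -> k + 1
    down = (k - s1) * (n - k) / (n - 1)  # rate k -> k - 1
    return np.diag(-(up + down)) + np.diag(up[:-1], 1) + np.diag(down[1:], -1)


def expm(A, terms=18):
    """Matrix exponential: truncated Taylor series with scaling and squaring."""
    norm = np.abs(A).sum(axis=1).max()
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0 ** squarings  # scaled so that the series converges quickly
    result = np.eye(len(A))
    term = np.eye(len(A))
    for j in range(1, terms + 1):
        term = term @ B / j
        result = result + term
    for _ in range(squarings):
        result = result @ result  # undo the scaling
    return result


def distribution(n, s0, s1, n1, t):
    """Probabilities of N_1(t) = s1, ..., n - s0 given N_1(0) = n1."""
    return expm(t * rate_matrix(n, s0, s1))[n1 - s1]


def expected_state1(n, s0, s1, n1, t):
    """Expected number of state-1 nodes at time t."""
    states = np.arange(s1, n - s0 + 1)
    return float(states @ distribution(n, s0, s1, n1, t))
```

Here `distribution(n, s0, s1, n1, t)` returns the row of $e^{tQ}$ that starts from $n_1$, and `expected_state1` weights the states by these probabilities.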
Because there are stubborn agents in both camps, consensus is never reached and instead the system indefinitely fluctuates within a state of equilibrium. We would like to know whether the political system we consider can be regarded as being in such a state. To this end, the long-term expectation of $N_1(t)$ is given by the following theorem.
Theorem 2. Assuming $(s_0, s_1) \in S_n$ is the repartition of stubborn agents, the expected number of opinion-1 holders at equilibrium is given by
$$\mathbb{E}_\pi[N_1] = \sum_{k=s_1}^{n-s_0} k\, \pi_k \tag{5}$$
where $\pi = (\pi_{s_1}, \ldots, \pi_{n-s_0})$ denotes the steady-state distribution of $N_1(t)$.
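The steady-state distribution can also be computed numerically. The sketch below is an illustration, not the closed-form expression of the cited work: it uses the detailed-balance relation that holds for any birth-and-death process, $\pi_k q_{k,k+1} = \pi_{k+1} q_{k+1,k}$, with the upward rate assumed to be $(n - k - s_0)k/(n - 1)$, and it requires at least one stubborn node on each side. Function names are hypothetical.

```python
def stationary_distribution(n, s0, s1):
    """Steady-state distribution of N_1(t) as a dict {state: probability}.

    Assumes s0 >= 1 and s1 >= 1 (stubborn nodes in both camps).
    """
    if s0 < 1 or s1 < 1 or s0 + s1 > n:
        raise ValueError("need s0 >= 1, s1 >= 1 and s0 + s1 <= n")
    states = list(range(s1, n - s0 + 1))
    weights = [1.0]  # unnormalised pi, starting from the lowest state
    for k in states[:-1]:
        up = (n - k - s0) * k / (n - 1)              # rate k -> k + 1
        down = (k + 1 - s1) * (n - k - 1) / (n - 1)  # rate k + 1 -> k
        weights.append(weights[-1] * up / down)
    total = sum(weights)
    return {k: w / total for k, w in zip(states, weights)}


def equilibrium_mean(n, s0, s1):
    """Expected number of state-1 nodes under the steady-state distribution."""
    pi = stationary_distribution(n, s0, s1)
    return sum(k * p for k, p in pi.items())
```

Under these rates the mean agrees numerically with $n s_1 / (s_0 + s_1)$. For instance, `equilibrium_mean(100, 24, 15)` is about 38.5, which, if the Labour estimate $(24, 15)$ reported in the results is taken at face value, would correspond to a long-run Labour share close to the 40% mark.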
The theory is developed in another work of ours (Vendeville et al., 2021) to which we refer the interested reader for more details and proofs.

Setup
We use the official database of the United Kingdom general elections results, published by the House of Commons (Audickas et al., 2020), as well as results for presidential elections in the United States manually collected from Wikipedia. In each case we are interested in the percentage of popular votes won by the two major parties: Conservative and Labour in the UK, Republicans and Democrats in the US. We assume these quantities correspond to pointwise observations of independent realisations of the voter model. The result of each election can then be forecast via Theorem 1, provided we have an estimate of the numbers of stubborn nodes $(s_0, s_1)$. Thus, our analysis is done in two steps: first we make for each election an estimate of $(s_0, s_1)$ based on previous results, then Equation 4 gives us the expected value for the coming election, which we use as a predictor.
In the UK dataset, our quantity of interest is the percentage of popular votes won by the Conservative and Labour parties in each general election from 1922 onwards. In the US dataset, it is the percentage of popular votes gathered by Republicans and Democrats in each presidential election from 1912 onwards. For the sake of clarity we present our method in the UK case, but note that it directly translates to the US case by replacing Conservative and Labour with Republicans and Democrats.
Because our model cannot account for decimal values, we round the percentages to the nearest integer. Different parties are present, the two major ones being Conservative and Labour, the rest including Liberal Democrats or Social Nationalists amongst others. Because our model applies to a two-sided situation only, we cannot consider all of them at once. Thus, we aggregate all non-Conservative parties under the label 0 while Conservatives are attributed label 1. We let $x_i$ denote the percentage of votes won by Conservatives in the $i$-th election and $t_i$ the elapsed time, in years, since the starting point 1922. There have been $m = 27$ elections in total, with the last one taking place in 2019. Thus $t_1 = 0$ and $t_m = 2019 - 1922 = 97$, and $x_m$ is the percentage of votes won by the Conservatives in 2019. To concur with our theoretical framework we consider one percent of the votes won by the Conservatives (resp. non-Conservatives) as the observation of a node being in state 1 (resp. 0) amongst $n = 100$ of them. The $x_i$'s then correspond to pointwise observations at times $t_i$ of a realisation of the process $N_1(t)$ described in Section 3. All the reasoning described here and in the following will also be applied independently in the cases Labour versus non-Labour, Republican versus non-Republican (US) and Democrat versus non-Democrat (US).
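The encoding step described above can be sketched as follows. The function name `encode_results` is hypothetical, and rounding ties upwards is an assumption made here, since the text only says that percentages are rounded to the nearest integer.

```python
import math


def encode_results(results, start_year=None):
    """Turn (year, vote share in percent) pairs into model inputs.

    Returns (times, states): elapsed years since the first election and
    vote shares rounded to integers, i.e. observed values of N_1 for n = 100.
    """
    results = sorted(results)
    if start_year is None:
        start_year = results[0][0]
    times = [year - start_year for year, _ in results]
    # Assumption: ties such as 40.5 are rounded up.
    states = [math.floor(share + 0.5) for _, share in results]
    return times, states
```

With toy inputs (not the official results) such as `[(2000, 37.4), (2004, 41.6), (2009, 40.5)]`, the function returns the times `[0, 4, 9]` and the states `[37, 42, 41]`.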

Methodology
To be able to use Theorem 1 to make predictions, we first need to estimate the proportion of potential stubborn nodes in the population, that is, the percentage of votes which are guaranteed for or against Conservatives. Let $s_0$ denote the number of stubborn state-0 (non-Conservative) nodes and $s_1$ that of state-1 (Conservative) ones. We look for the values $(s_0^\star, s_1^\star)$ that maximise the log-likelihood of the observed data. Say we want to predict results for the $i$-th election. Because we need at least two datapoints to make an estimation, we require $3 \le i \le m + 1$. Following the notations introduced in Section 3 we let $p^{(s_0,s_1)}_{k,l}(t)$ denote the theoretical probability for $N_1(t)$ to go from $k$ to $l$ in $t$ units of time when there are respectively $s_0$ and $s_1$ state-0 and state-1 stubborn nodes. We seek to solve
$$\operatorname*{argmax}_{(s_0, s_1)} \sum_{j=1}^{i-2} \log\left(p^{(s_0,s_1)}_{x_j,x_{j+1}}(t_{j+1} - t_j)\right). \tag{6}$$
Indeed, $p^{(s_0,s_1)}_{x_j,x_{j+1}}(t_{j+1} - t_j)$ is by definition the probability for Conservatives to win $x_{j+1}$ percent of the votes in the $(j+1)$-th election knowing they won $x_j$ percent in the $j$-th one. Thus we seek to simultaneously maximise the likelihood of all past election results. Let $Q^{(s_0,s_1)}$ be the matrix with entries calculated via (2). By Theorem 1, (6) is equivalent to
$$\operatorname*{argmax}_{(s_0, s_1)} \sum_{j=1}^{i-2} \log \left[e^{(t_{j+1} - t_j)\, Q^{(s_0,s_1)}}\right]_{x_j,x_{j+1}}. \tag{7}$$
The computation of a matrix exponential is typically done in cubic time and quickly becomes intractable as the size of the matrix increases. Here however, because we have $n = 100$, the number of possible couples $(s_0, s_1)$ is small enough that (7) can be solved by directly computing the sum for each of these couples individually. The optimal value $s_1^\star$ for $s_1$ then gives us an estimate of the percentage of votes "locked" by the Conservative party, i.e. the proportion of the population that will always vote for them. The optimal value $s_0^\star$ for $s_0$ is an estimate of the quantity of such votes for all other parties aggregated.
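The exhaustive search over couples described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the upward rate is taken to be $(n - k - s_0)k/(n - 1)$, `expm` is a simple NumPy-only stand-in for a library matrix exponential, ties between couples are broken by keeping the first maximiser found, and all function names are hypothetical.

```python
import numpy as np


def rate_matrix(n, s0, s1):
    """Rate matrix Q on the states s1, ..., n - s0 (row r <-> state s1 + r)."""
    k = np.arange(s1, n - s0 + 1)
    up = (n - k - s0) * k / (n - 1)
    down = (k - s1) * (n - k) / (n - 1)
    return np.diag(-(up + down)) + np.diag(up[:-1], 1) + np.diag(down[1:], -1)


def expm(A, terms=18):
    """Matrix exponential: truncated Taylor series with scaling and squaring."""
    norm = np.abs(A).sum(axis=1).max()
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0 ** squarings
    result, term = np.eye(len(A)), np.eye(len(A))
    for j in range(1, terms + 1):
        term = term @ B / j
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def log_likelihood(n, s0, s1, times, states):
    """Sum of log transition probabilities between consecutive observations."""
    if min(states) < s1 or max(states) > n - s0:
        return -np.inf  # some observed value would be unreachable
    Q = rate_matrix(n, s0, s1)
    total = 0.0
    for j in range(len(states) - 1):
        P = expm((times[j + 1] - times[j]) * Q)
        p = P[states[j] - s1, states[j + 1] - s1]
        if p <= 0.0:  # zero, or lost to rounding error
            return -np.inf
        total += np.log(p)
    return total


def estimate_stubborn(n, times, states):
    """Grid search for the couple (s0, s1) maximising the log-likelihood."""
    best, best_ll = None, -np.inf
    for s0 in range(n + 1):
        for s1 in range(n + 1 - s0):
            if s0 + s1 == 0:
                continue  # at least one stubborn node is required
            ll = log_likelihood(n, s0, s1, times, states)
            if ll > best_ll:
                best, best_ll = (s0, s1), ll
    return best, best_ll
```

For $n = 100$ the grid contains roughly five thousand couples, which remains manageable but can take a while with this unoptimised helper. With two identical observations, for example `estimate_stubborn(20, [0, 1], [8, 8])`, the search freezes every node and returns the couple `(12, 8)` with a log-likelihood of 0.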
To make a forecast for the $i$-th election, we just have to apply Theorem 1 with $Q = Q^{(s_0^\star, s_1^\star)}$, $n_1 = x_{i-1}$ and $t = t_i - t_{i-1}$. Equation 4 then gives us the expected percentage $\tilde{x}_i$ of votes gathered by Conservatives on that occasion. This can then be compared to the actual value $x_i$ to assess the efficacy of our approach.
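For intuition about what this forecast looks like, the expectation can be worked out by hand. The short derivation below is a sketch rather than a statement taken from the cited work; it assumes the downward rate $(k - s_1)(n - k)/(n - 1)$ given in Section 3 and an upward rate $(n - k - s_0)k/(n - 1)$ obtained by the analogous reasoning, and it writes $\mu^\star$ for the resulting long-run mean.

```latex
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}t}\,\mathbb{E}[N_1(t)]
  &= \mathbb{E}\!\left[\frac{(n - N_1(t) - s_0)\,N_1(t) - (N_1(t) - s_1)(n - N_1(t))}{n-1}\right]
   = \frac{n s_1 - (s_0 + s_1)\,\mathbb{E}[N_1(t)]}{n-1}, \\
\tilde{x}_i
  &= \mu^\star + \left(x_{i-1} - \mu^\star\right)
     \exp\!\left(-\frac{(s_0^\star + s_1^\star)(t_i - t_{i-1})}{n-1}\right),
  \qquad \mu^\star = \frac{n\, s_1^\star}{s_0^\star + s_1^\star}.
\end{align*}
```

Read this way, the forecast is the previous result pulled part of the way towards $\mu^\star$, with a pull that grows with the number of stubborn nodes and with the time elapsed between the two elections.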

Results for the UK
We show in Table 1 (left) the estimated values for $(s_0^\star, s_1^\star)$, updated with each new election. They seem to globally stabilise between 15 and 25 for both parties. Consider for example the last value in the Labour case, which is $(24, 15)$. According to our model, this means that an estimated 15% of voters will always vote Labour. On the other side, 24% of voters are found to be stubborn "anti-Labour"; by that we do not mean that they are fundamentally against the Labour party but rather that they will never vote for it. Note that these estimates fluctuate according to the variability of the data. For example in 1922 and 1923 Conservatives obtained 38% of the votes twice in a row, and as a result it was estimated that 38% of all voters are stubborn pro-Conservative and the remaining 62% are stubborn against the party. This is indeed what maximises the likelihood, with this configuration yielding a probability of 1 for the observed values. On the other hand, with pro-Conservative votes jumping from 38% to 61% in 1935, estimated values of $s_0$ and $s_1$ dropped significantly to account for the wide range covered by the data.
In Figure 1 and Figure 2 we compare our predictions, that is the expectations $\tilde{x}_i$, with the real outcomes $x_i$. We plot both values for each election starting with the third one, which took place in 1924, because the optimisation problem (7) requires $i \ge 3$. For both parties, most values seem to fluctuate around the 40% mark. The global tendency of the real outcomes appears to be respected by the predictions, albeit with less variability. Also note that most predictions appear to be within a ±5% vicinity of the real values.
To get a better insight we look at the absolute errors $|x_i - \tilde{x}_i|$ of our predictions. We plot running averages over the last 5 elections in Figure 5. After a few erratic first years they seem to stabilise between 2% and 8%. More precisely, if we discard the first few years up until 1960, where the model lacks a sufficient amount of data to properly calibrate, we get MAEs of respectively 4.63% and 5.23% for Conservative and Labour. Minimal values of 0.06% for Conservatives in 1979 and 0.40% for Labour in 2001 are observed, showing that our method was able to make very accurate predictions in these cases. Surprisingly however, the errors do not seem to decrease monotonically over time, but rather fluctuate. As a matter of fact, peak absolute errors were observed in 1983 (Labour, 13.0%) and 1997 (Conservative, 13.6%).
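The error measures used here are straightforward to compute. The sketch below uses hypothetical function names and assumes that, for the first few elections, the running average is taken over however many errors are available, a detail the text does not specify.

```python
def absolute_errors(predictions, outcomes):
    """Absolute difference between each prediction and the real outcome."""
    return [abs(p - x) for p, x in zip(predictions, outcomes)]


def running_average(values, window=5):
    """Average of the last `window` values at each position (fewer at the start)."""
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def mean_absolute_error(predictions, outcomes):
    """Mean of the absolute errors over all supplied elections."""
    errors = absolute_errors(predictions, outcomes)
    return sum(errors) / len(errors)
```

Discarding the early elections, as done above for the years up to 1960, amounts to slicing both lists before calling `mean_absolute_error`.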

Results for the US
We apply the exact method described above to the case of presidential elections in the United States. As before, we independently consider two cases, Republicans versus non-Republicans and Democrats versus non-Democrats. Presidential elections in the US take place every 4 years and we start with the year 1912, then 1920, 1924, and so on. Keep in mind that due to how the American system works, the party with the most popular votes does not necessarily win the election. The first estimation we are able to make is based on the first two elections and thus our first prediction is for 1924.
We observe results similar to the UK case. The estimated stubborn values $(s_0^\star, s_1^\star)$ (Table 1, right) are close, albeit a little lower, stabilising at (18, 17) for Republicans and (16, 14) for Democrats. Regarding the predictions (Figure 3, Figure 4) we again see a majority of them within a 5% margin of the actual outcomes, and a prediction curve that looks more stable than the slightly spiky curve of real values. Note that because of the two-party system in place in the United States, both Republicans and Democrats see their share of popular votes fluctuate around the 50% mark. In the previous case, it was rather around 40% because of the space occupied by smaller parties such as the Liberal Democrats or the Scottish National Party amongst others. The two-sided aspect of our model, always one party (0) versus another (1), may thus be better suited to the study of the US system.
As for the errors, running averages over the last 5 elections are shown in Figure 6. Here again, after a few erratic first years, values appear to lie between 2% and 8%. However, where errors in the UK case seemed to increase in the last few years, here they are dropping. In fact, our most accurate forecast regarding Democrat votes is for 2016, with only 0.04% error. For Republicans it is in 1940 with 0.10%. Peak errors were again around 13% for both parties, in 1972 (Republicans, 14.0%) and 1964 (Democrats, 12.3%). The MAE over all elections, starting in 1940 when forecasts start to stabilise, is 4.27% for Republicans and 4.83% for Democrats. This is slightly better than in the UK case (4.63% and 5.23%). The MAE over both cases is then 4.74%.

Conclusion and future work
In this paper we proposed a new method for the forecast of election results. A lot of published work has used Twitter data for this purpose, usually applying machine learning algorithms to extract sentiment from tweets and estimate a candidate's popularity this way. Such methods have been criticised in the past few years, with problems ranging from bot presence to the limits of text mining casting doubt over their reliability. For this reason, our model does not rely on Twitter data at all. Instead, we used official results of past elections in the United Kingdom and in the United States to try and predict outcomes of future ones.
Our method is based on findings from a previous work of ours, where we conducted a theoretical analysis of the voter model with stubborn nodes on strongly connected graphs. Here we applied those findings to try and predict the percentage of popular votes won by the Conservative and Labour parties in the United Kingdom, and the percentage of popular votes collected by the Republican and Democratic parties in the United States. To do so, we considered official results of past elections as observations of independent realisations of the voter model. From there we were able to perform time-evolving estimates of the model parameters and use them to forecast an outcome.
Our model yielded an MAE of 4.74%, reaching absolute errors as low as 0.04% and as high as 14%. In a review, Gayo-Avello (2013) suggests that any model used to predict election outcomes should not have an MAE higher than 1 or 2%. This is because the result of an election is more often than not a matter of just a few percentage points. According to this standard, our MAE is not low enough to reliably predict the outcome of an election. Some previous works reached average errors as low as 0.63% (Saleiro et al., 2016) and 1.65% (Tumasjan et al., 2011). Additionally, we tested our method against the baseline of systematically predicting the exact result of the previous election. This simple method returned an overall 5.03% average error, which is not much worse than the 4.74% obtained via our method. Moreover, the first few election results were discarded in both cases, as it was deemed that the model did not have enough data at this point to make predictions with high enough confidence. The choice of this cut-off, though, is made on the basis of our observation of the model's behaviour and is purely subjective. Changing it would in turn make for different results that might be better or worse.
Although our method did not yield sufficiently accurate results here, we believe it is an interesting step in a novel direction. Despite some impressive accomplishments, there is a lot of controversy regarding Twitter-based forecasting, which is deemed unreliable by many authors. Our method, which uses results of previous elections, thus provides a new take on the matter that only relies on official data. Moreover, our model does not only forecast election results, it also gives us estimates of the quantity of stubborn voters that are firmly for or against any given party. This provides meaningful insight into the political landscape of the considered areas.
Several extensions of the model could be considered to improve its accuracy. First of all, adding in-between election polls to the data would go a long way in improving the estimates. With a gap of a few years from one election to another, the range of possibilities is too wide for the model to account for. Second, one could take a deeper look into the past of a country's results and try to detect tendencies regarding landslide victories, incumbent re-election and so forth. We believe that having a deeper understanding of the specific country one is working with could substantially improve the model calibration process.
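The baseline mentioned above, which predicts each election by repeating the previous result, can be sketched in a few lines. The function name and the `first` argument, which selects the index of the first election to be scored, are illustrative assumptions.

```python
def persistence_mae(outcomes, first=1):
    """MAE of the baseline that predicts outcomes[i] by outcomes[i - 1].

    `first` is the index of the first election that is scored; earlier
    elections only serve as predictors.
    """
    if first < 1 or first >= len(outcomes):
        raise ValueError("need 1 <= first < len(outcomes)")
    errors = [abs(outcomes[i] - outcomes[i - 1]) for i in range(first, len(outcomes))]
    return sum(errors) / len(errors)
```

Scoring this baseline and the model on the same set of elections, with the same cut-off for discarded early results, keeps the two average errors comparable.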

Funding
This project was funded by the UK EPSRC grant EP/S022503/1 that supports the Centre for Doctoral Training in Cybersecurity delivered by UCL's Departments of Computer Science, Security and Crime Science, and Science, Technology, Engineering and Public Policy.

Availability of data and materials
All code used is available online at https://github.com/AntoineVendeville/HowOpinionsCrystallise. Data for the United Kingdom elections is available online (Audickas et al., 2020). Data for the United States elections has been crawled from Wikipedia (https://en.wikipedia.org/wiki/United_States_presidential_election#Popular_vote_results).
Figure captions: the shaded area covers a ±5% deviation away from the predictions.
Table columns: Year, Conservative, Labour.