Semi-supervised graph labelling reveals increasing partisanship in the United States Congress

Graph labelling is a key activity of network science, with broad practical applications, and close relations to other network science tasks, such as community detection and clustering. While a large body of work exists on both unsupervised and supervised labelling algorithms, the class of random walk-based supervised algorithms requires further exploration, particularly given their relevance to social and political networks. This work refines and expands upon a new semi-supervised graph labelling method, the GLaSS method, that exactly calculates absorption probabilities for random walks on connected graphs. The method models graphs exactly as discrete-time Markov chains, treating labelled nodes as absorbing states. The method is applied to roll call voting data for 42 meetings of the United States House of Representatives and Senate, from 1935 to 2019. Analysis of the 84 resultant political networks demonstrates strong and consistent performance of GLaSS when estimating labels for unlabelled nodes in graphs, and reveals a significant trend of increasing partisanship within the United States Congress.


Introduction
Graph labelling is concerned with the problem of estimating the labels of one or more nodes within a graph, where an association between the graph's structure and the distribution of labels is assumed to exist. Many graph labelling algorithms exist, both supervised [2,8,17] and unsupervised [11,21]. In both approaches, a graph comprises unlabelled and labelled nodes, and the algorithms seek to estimate the labels of the unlabelled nodes. While a diverse range of graph labelling methods exist [4], this work focusses on the class of dynamical and statistical inference methods that use random walks.
One prominent application of network science is in the analysis of political networks [18,19], including the labelling of nodes in political voting networks. Previous works have examined methods to locate individual politicians within a multidimensional political spectrum [13,14], the detection of voting blocs or communities within a political voting network [20], and an analysis of partisanship trends reflected in voting networks from the United States Congress [1,12]. This work presents an analysis of a large collection of United States Congressional roll call voting networks, using a semi-supervised graph labelling method to determine the party affiliation of individuals. Changes in partisanship over time are also examined, with results in accordance with previous studies [1,12].

Random Walk-Based Graph Labelling Methods
In unsupervised algorithms, the graph is organised into clusters, without consideration of the labelled nodes. Once clustered, labels for unlabelled nodes in the graph can be estimated based on the clusters to which labelled nodes belong. However, cases may arise where an identified cluster contains no labelled nodes, or where a cluster contains multiple nodes with different labels, creating uncertainty as to how labels should be estimated for nodes in such clusters.
The Walktrap algorithm is one commonly used random walk-based unsupervised graph labelling method [11]. Walktrap searches for densely connected subgraphs by simulating short random walks on a graph, reasoning that short walks are more likely to remain in the same cluster than to leave it. Walktrap quantifies the similarity between nodes using a distance metric, then recursively merges identified clusters based on short random walks, providing a hard classification for each node. Because Walktrap does not use information about labelled nodes, there is no generally accepted method for estimating the labels for unlabelled nodes based on the clusters it identifies.
Unlike unsupervised algorithms, supervised algorithms utilise the information contained in labelled nodes when estimating the labels of unlabelled nodes. A common approach is to treat labelled nodes as absorbing states and unlabelled nodes as transient states in a discrete-time Markov chain (DTMC), and estimate the absorption probabilities or expected times to absorption for all transient states in the chain. Labels for each unlabelled state can then be estimated using the approximate probabilities or times. However, while existing supervised and semi-supervised methods use both labelled nodes and the graph's structure to estimate labels, they only approximate absorption probabilities and times, rather than calculating them exactly.
The Rendezvous algorithm [2] labels nodes in a semi-supervised setting by constructing a simplified, "rendezvous" graph, where edges are drawn from an unlabelled node to only its M nearest neighbours. M is chosen to be as small as possible while ensuring that each unlabelled node in the rendezvous graph is connected to at least one labelled node. Once the rendezvous graph has been constructed, edge weights are calculated using a Euclidean distance metric, and absorption probabilities are calculated using the eigenvalues and eigenvectors of the rendezvous graph's transition matrix. Absorption probabilities for nodes in the rendezvous graph are then used to estimate the label of nodes in the full graph.
Another semi-supervised graph labelling method seeks to label nodes in a binary setting according to expected time to absorption, rather than absorption probability [8]. The "Censored Time" method simulates step-limited random walks over a graph, recording the number of steps taken for all walks that are absorbed before being terminated by the step limit. The censored times to absorption for absorbed walks are used to approximate the conditional expected time to absorption in each labelled node in the graph. A "hard" binary classification is used to estimate labels according to the lowest censored conditional time to absorption.

Political Science and Networks
Analysis of United States Congressional voting data is a popular activity within the field of political science and political networks, in part because large amounts of voting data are freely available [10]. Various attempts have been made to analyse voting trends within Congressional voting data, including modelling Congresses as political networks, where nodes represent individual politicians, and edges capture some relationship between them. DW-NOMINATE [14], together with its predecessors D-NOMINATE [13] and W-NOMINATE, represents one of the most detailed attempts to study voting behaviour and trends in the United States Congress. A multidimensional scaling method, DW-NOMINATE models individual politicians as points embedded in multidimensional space. Each point, representing the politician's true political alignment, can be estimated by analysing historical voting records. Individuals with similar ideologies (as reflected by their voting records) are spatially "close" to one another, while individuals with differing ideologies are "distant". Amongst many other applications, DW-NOMINATE is notable for its use in analysing changes in partisanship over time [12].
More recent work also discusses changes in partisanship in roll call voting networks over time [1]. The work examines pairs of nodes within roll call voting networks, modelling the probability distributions for cooperation between politicians (edges between nodes) from the same party and from opposing parties. A significant long-term trend of increasing partisanship and decreasing inter-party cooperation is identified: the probability of edges between nodes from the same party increases over time, while the probability of edges between nodes from opposing parties decreases. The work also refers to the continued, though diminishing, presence in Congress of "super-cooperators", members who cooperate across party lines.
Separate work examining United States Congressional voting data uses modularity to measure political polarisation [20]. This work detects voting blocs or communities within Congressional roll call voting networks without making assumptions that rely on the two-party system, revealing more (and more varied) groups than simply Democrats and Republicans. The composition and behaviour of blocs are observed to vary significantly over time, as are the strengths of connections between blocs. The work not only reveals increases in partisanship over time, but also points to a possible underestimation of partisanship by other methods in Congresses with weaker party structure.

Contributions
This work expands upon a new semi-supervised graph labelling method, the Graph Labelling Semi-Supervised (GLaSS) method, using random walks to absorption [6]. The method models a graph as a DTMC, where transient states correspond to unlabelled nodes, and absorbing states correspond to labelled nodes. The transition matrix $P$ for the DTMC is formed from the graph's weighted adjacency matrix by normalising the weighted out-degree of each node in the network. From careful construction of $P$, the probability of absorption in each absorbing state can be calculated exactly, and these probabilities can then be used to estimate the label for every node corresponding to a transient state in the DTMC.
By calculating exact absorption probabilities and expected times to absorption, the GLaSS method provides better label estimates than contemporary supervised methods, which rely on approximations of these quantities [6]. By utilising the information contained in labelled nodes in the graph, GLaSS also provides a clear method for estimating the label of unlabelled nodes using quantities that are meaningful and interpretable, unlike unsupervised random walk methods.
This work also contributes to existing work on political networks, through the analysis of a large collection of US Congressional roll call voting networks. In particular, this work contains the first analysis of roll call voting networks using a random walk-based graph labelling method, while also identifying notable trends within the House of Representatives and the Senate. The GLaSS method is able to detect and confirm rising partisanship within the House of Representatives and the Senate [1], while also identifying possible historical periods of reduced partisanship.
This work formally introduces the GLaSS method, describes, in detail, the data to be analysed, presents a full description of all analyses performed, and discusses the results of this analysis and possible areas of further work.

Method
Consider an undirected graph $G = (V, E)$ comprising $n$ nodes, $V = \{v_1, \dots, v_n\}$, connected by a set of positive real-weighted edges $E$. Define the weighted adjacency matrix $A = [a_{i,j}]$, where $a_{i,j} = a_{j,i}$ records the weight of the edge connecting $v_i$ and $v_j$, and $a_{i,j} = 0$ if no edge connects $v_i$ and $v_j$. Suppose the first $u$ nodes in $G$ are unlabelled, and the remaining $\ell$ nodes in $G$ are labelled, where $n = u + \ell$, and construct the sets $U = \{1, \dots, u\}$ and $L = \{u + 1, \dots, n\}$ to index the unlabelled and labelled nodes of $G$, respectively. Arrange $A$ as
$$A = \begin{bmatrix} A_{U,U} & A_{U,L} \\ A_{L,U} & A_{L,L} \end{bmatrix},$$
where $A_{J,K}$ describes the weighted edges connecting nodes indexed by $J$ to nodes indexed by $K$.
Consider a random walk on $G$, described by a discrete-time Markov chain (DTMC) where all unlabelled nodes map to transient states and all labelled nodes map to absorbing states. Let $X_t$ denote the state of the chain at time $t$. Calculate the transition probabilities for the DTMC using the adjacency matrix $A$, where
$$p_{i,j} = P(X_{t+1} = j \mid X_t = i) = \begin{cases} a_{i,j} \Big/ \sum_{k=1}^{n} a_{i,k}, & i \in U, \\ \mathbb{1}(i = j), & i \in L, \end{cases}$$
is the probability that the DTMC is in state $j$ at the next time step, given that the DTMC is currently in state $i$. Construct the transition matrix
$$P = [p_{i,j}] = \begin{bmatrix} R & S \\ 0 & I \end{bmatrix}.$$
The $u \times u$ matrix $R$ governs transitions between transient states, the $u \times \ell$ matrix $S$ governs transitions from transient states to absorbing states, $0$ is an $\ell \times u$ zero matrix, and $I$ is the $\ell \times \ell$ identity matrix.
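As an illustration of this construction, the following is a minimal Python sketch, not the authors' implementation. The function name `transition_matrix` and the four-node example graph are hypothetical, and nodes are indexed from 0 rather than 1. Exact rational arithmetic (`fractions.Fraction`) is used here only to mirror the exact nature of the calculation.

```python
from fractions import Fraction

def transition_matrix(A, u):
    """Build P = [[R, S], [0, I]] from a symmetric weighted adjacency matrix A.

    Nodes 0..u-1 are unlabelled (transient states); nodes u..n-1 are
    labelled (absorbing states).
    """
    n = len(A)
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(u):
        degree = sum(A[i])  # weighted degree of unlabelled node i
        if degree == 0:
            raise ValueError(f"unlabelled node {i} has no edges")
        for j in range(n):
            P[i][j] = Fraction(A[i][j]) / Fraction(degree)
    for i in range(u, n):
        P[i][i] = Fraction(1)  # an absorbing state never leaves itself
    return P

# Hypothetical graph: nodes 0 and 1 are unlabelled, nodes 2 and 3 are labelled.
A = [
    [0, 2, 1, 0],
    [2, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
u = 2
P = transition_matrix(A, u)
R = [row[:u] for row in P[:u]]  # transient -> transient block
S = [row[u:] for row in P[:u]]  # transient -> absorbing block
```

Note that edges between labelled nodes (the block $A_{L,L}$) play no role in the chain, because labelled nodes are absorbing.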

DTMC Absorption Probabilities
Let $h_{i,j}$ be the probability that the DTMC is eventually absorbed in state $j$, given that the chain starts in state $i$. Define the matrix of absorption probabilities $H = [h_{i,j}]$. $H$ is restricted to have $u$ rows and $\ell$ columns, corresponding to the $u$ transient states and $\ell$ absorbing states of the DTMC, respectively. Then $H$ can be formally calculated as
$$H = (I_u - R)^{-1} S, \tag{3}$$
where $I_u$ is the $u \times u$ identity matrix, and $R$ and $S$ are as above [7]. a
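The calculation of $H$ can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: it solves the linear system $(I_u - R) H = S$ by Gauss-Jordan elimination over exact fractions instead of forming the inverse explicitly, and the helper names `solve` and `absorption_probabilities` are hypothetical. The example is a unit-weight path graph with a labelled node at each end and three unlabelled nodes between them.

```python
from fractions import Fraction

def solve(M, B):
    """Solve M X = B exactly by Gauss-Jordan elimination over Fractions."""
    n = len(M)
    # Augmented matrix [M | B], copied so the inputs are left untouched.
    aug = [[Fraction(x) for x in M[i]] + [Fraction(x) for x in B[i]]
           for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]  # scale pivot row to 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def absorption_probabilities(R, S):
    """Return H, the solution of (I - R) H = S."""
    u = len(R)
    I_minus_R = [[(1 if i == j else 0) - Fraction(R[i][j]) for j in range(u)]
                 for i in range(u)]
    return solve(I_minus_R, S)

# Path graph: labelled - v1 - v2 - v3 - labelled, all edge weights 1.
half = Fraction(1, 2)
R = [[0, half, 0], [half, 0, half], [0, half, 0]]
S = [[half, 0], [0, 0], [0, half]]
H = absorption_probabilities(R, S)
# H == [[3/4, 1/4], [1/2, 1/2], [1/4, 3/4]]
```

Each row of $H$ sums to 1, and the node nearest a labelled end is absorbed there with probability 3/4, as expected for a symmetric random walk on a path.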

Semi-Supervised Graph Labelling
Given a graph $G$ and the matrix of absorption probabilities $H$, let the random variable $Y_i$ be the label of an unlabelled node $v_i$, and let $x_j$ be the label of a labelled node $v_j$. The distribution over $Y_i$ can be directly derived from $H$, for all $i \in U$, as follows:
$$P(Y_i = k) = \sum_{j \in L} h_{i,j} \, \mathbb{1}(x_j = k), \tag{4}$$
where $\mathbb{1}$ is the indicator function, taking value 1 if its argument is true, and 0 otherwise.
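In code, this amounts to summing each row of $H$ over the labelled nodes that share a label. The sketch below is illustrative only; the function name `label_distribution` and the example values of $H$ are hypothetical.

```python
from fractions import Fraction

def label_distribution(H, labels):
    """For each unlabelled node, map each label k to P(Y_i = k).

    H[i][j] is the probability that a walk from unlabelled node i is
    absorbed in labelled node j; labels[j] is the label x_j of that node.
    """
    dists = []
    for row in H:
        dist = {}
        for h, x in zip(row, labels):
            dist[x] = dist.get(x, 0) + h  # sum h_ij over nodes with label x
        dists.append(dist)
    return dists

# Hypothetical example: two unlabelled nodes, three labelled nodes.
H = [
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],
    [Fraction(1, 10), Fraction(3, 10), Fraction(3, 5)],
]
labels = ["K1", "K1", "K2"]
dists = label_distribution(H, labels)
# dists[0] == {"K1": 3/4, "K2": 1/4}; dists[1] == {"K1": 2/5, "K2": 3/5}
```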

DTMC Expected Times to Absorption
Let $t_i$ be the expected number of time steps before the DTMC is absorbed in any absorbing state, given that the chain starts in state $i$. Define the vector of expected times to absorption $t = (t_1, \dots, t_u)^{T}$, where the $u$ elements of $t$ correspond to the $u$ transient states of the DTMC. Then $t$ can be calculated as
$$t = (I_u - R)^{-1} c, \tag{5}$$
where $c$ is a column vector of length $u$ whose entries are all 1, and $I_u$ and $R$ are as above [7].
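As a small worked example (a hypothetical graph chosen for illustration, not taken from the data), consider a unit-weight path with two unlabelled nodes $v_1, v_2$ between two labelled nodes $v_3, v_4$, with edges $v_3 - v_1 - v_2 - v_4$. Each unlabelled node moves to either neighbour with probability $1/2$, so

```latex
R = \begin{bmatrix} 0 & \tfrac12 \\ \tfrac12 & 0 \end{bmatrix}, \qquad
S = \begin{bmatrix} \tfrac12 & 0 \\ 0 & \tfrac12 \end{bmatrix}, \qquad
(I_2 - R)^{-1}
  = \frac{1}{1 - \tfrac14}
    \begin{bmatrix} 1 & \tfrac12 \\ \tfrac12 & 1 \end{bmatrix}
  = \begin{bmatrix} \tfrac43 & \tfrac23 \\ \tfrac23 & \tfrac43 \end{bmatrix},
\\[1ex]
H = (I_2 - R)^{-1} S
  = \begin{bmatrix} \tfrac23 & \tfrac13 \\ \tfrac13 & \tfrac23 \end{bmatrix},
\qquad
t = (I_2 - R)^{-1} \begin{bmatrix} 1 \\ 1 \end{bmatrix}
  = \begin{bmatrix} 2 \\ 2 \end{bmatrix}.
```

The result agrees with a first-step argument: by symmetry $t_1 = t_2$, and $t_1 = 1 + \tfrac12 t_2$ gives $t_1 = 2$.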

The Graph Labelling Semi-Supervised (GLaSS) Method
Consider a graph $G$, with $u$ unlabelled nodes and $\ell$ labelled nodes, and suppose that all labelled nodes have one of two labels, either $K_1$ or $K_2$. From the weighted adjacency matrix $A$, construct the transition matrix $P$, as in (1). Using $P$, calculate the vector of expected times to absorption $t$, as in (5). The expected times to absorption may, optionally, be used as a filtering criterion; nodes with a large expected time to absorption, relative to the distribution of $t_i$ over all nodes in the graph, may be excluded from further analysis. Once nodes have been optionally filtered using $t$, calculate the matrix of absorption probabilities $H$, by (3), and calculate $P(Y_i = K_1)$ and $P(Y_i = K_2)$ for all $i \in U$, as in (4). Because $P(Y_i = K_1) + P(Y_i = K_2) = 1$, only one probability is required to classify the unlabelled nodes.
For the purposes of this analysis, nodes are classified in the following way. Suppose that $m$ of the $u$ unlabelled nodes in $G$ have a true label $K_2$, and that the remaining $(u - m)$ unlabelled nodes have a true label $K_1$. That is, the ratio of nodes with label $K_1$ to nodes with label $K_2$ is known, but which nodes should bear those labels is not. Sorting the probabilities $P(Y_i = K_1)$ from smallest to largest, the $m$th order statistic (the $m$th smallest probability) is chosen as a threshold $\alpha$, and a binary classifier is implemented. If $P(Y_i = K_1) > \alpha$, estimate the label for node $v_i$ as $K_1$; otherwise, if $P(Y_i = K_1) \le \alpha$, estimate the label for node $v_i$ as $K_2$. Thus, $\alpha$ is chosen to assign a label of $K_1$ to the $(u - m)$ nodes deemed most likely to have that label, and a label of $K_2$ to the remaining $m$ unlabelled nodes in $G$.
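The thresholding rule can be sketched as follows. This is an illustrative sketch: the function name `classify` and the example probabilities are hypothetical, and the text does not specify how ties at the threshold are handled, so the sketch simply applies the rule as stated (tied values at $\alpha$ all receive $K_2$).

```python
def classify(p_k1, m, labels=("K1", "K2")):
    """Label nodes using the m-th smallest P(Y_i = K1) as the threshold.

    p_k1: list of P(Y_i = K1), one entry per unlabelled node.
    m:    number of unlabelled nodes known to have true label K2.
    """
    if not 1 <= m <= len(p_k1):
        raise ValueError("m must be between 1 and the number of nodes")
    alpha = sorted(p_k1)[m - 1]  # m-th order statistic
    estimates = [labels[0] if p > alpha else labels[1] for p in p_k1]
    return alpha, estimates

# Hypothetical absorption probabilities for five unlabelled nodes,
# two of which are known to carry label K2.
p = [0.91, 0.12, 0.55, 0.40, 0.78]
alpha, estimates = classify(p, m=2)
# alpha == 0.40; estimates == ["K1", "K2", "K1", "K2", "K1"]
```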
Using this method, it is possible to estimate the label for every unlabelled node in G. This method forms a modification and extension to the GLaSS method [6], a graph labelling method in a semi-supervised setting. Hereafter, we refer to this modification as "the GLaSS method".

Data
Validating the GLaSS method requires graphs with a clear community structure and known labels for all nodes. To emulate a graph with few known labels, only a small subset of all known labels will be used by GLaSS, with the remaining labels withheld to emulate "unlabelled" nodes in the graph. All labels estimated by GLaSS can then be compared to actual, withheld labels, to assess performance. United States roll call voting data are chosen to validate the GLaSS method.
In the United States House of Representatives (the House) and the Senate, parliamentary procedure occasionally gives rise to roll call votes. In a roll call vote, the vote of every member of the House or the Senate is recorded, making it possible to see which members voted the same way. Roll call voting data for the House and the Senate can be modelled as an undirected graph, where each node represents a member of Congress, and a positive integer-weighted edge records the number of times respective members voted the same way. Roll call voting data for the House and the Senate are modelled as separate graphs.
The results of roll call votes in the House and the Senate for 42 separate Congresses, between 1935 and 2019, b have been collected for analysis, and modelled as 84 separate undirected graphs. The data have been made available on Voteview [10] and Figshare for analysis. c For simplicity, in each Congress, the following rules are applied:
1. Only "yea" and "nay" votes are considered.
2. Only members whose party affiliation is Democrat or Republican are considered.
3. In cases where a member's party affiliation changes during a meeting of Congress, their party affiliation at the time they were elected is used.
4. In rare cases, a member of Congress does not sit for the entire meeting of Congress, and their seat is taken by a new member. In these cases, the voting records of both members are retained. d
5. In both the House and the Senate, votes where the Democrat and Republican leaders cast the same vote (either "yea" or "nay") are not considered, as they provide no information about partisanship.
6. In both the House and the Senate, multiple members may serve (non-concurrently) as party leader. In these cases, the vote cast by the party leader at the time the vote was held is considered when implementing rule 5.
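The construction of a single chamber's graph from roll call records, applying rules 1 and 5, might be sketched as follows. The data layout (one dictionary per roll call, mapping member name to vote) and all names are hypothetical and do not reflect the Voteview file format. The sketch also assumes a single leader per party, and assumes that a roll call is retained when either leader did not cast a "yea" or "nay" vote; the text does not address that case.

```python
def agreement_graph(roll_calls, dem_leader, rep_leader):
    """Count, for each pair of members, the roll calls on which they agreed.

    Returns a dict mapping (member_a, member_b), with member_a < member_b,
    to a positive integer edge weight.
    """
    weights = {}
    for votes in roll_calls:
        d, r = votes.get(dem_leader), votes.get(rep_leader)
        # Rule 5: drop roll calls on which both leaders cast the same vote.
        if d in ("yea", "nay") and d == r:
            continue
        # Rule 1: keep only "yea" and "nay" votes.
        cast = {m: v for m, v in votes.items() if v in ("yea", "nay")}
        members = sorted(cast)
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                if cast[a] == cast[b]:
                    weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights

# Hypothetical records for three roll calls.
roll_calls = [
    {"D_lead": "yea", "R_lead": "nay", "m1": "yea", "m2": "nay"},
    {"D_lead": "yea", "R_lead": "yea", "m1": "yea", "m2": "yea"},  # dropped
    {"D_lead": "nay", "R_lead": "yea", "m1": "nay", "m2": "present"},
]
graph = agreement_graph(roll_calls, "D_lead", "R_lead")
# graph == {("D_lead", "m1"): 2, ("R_lead", "m2"): 1}
```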

Results
Each House and Senate is modelled as a graph, and each graph is analysed using the GLaSS method, as described above. Expected time to absorption is calculated for each "unlabelled" node in each graph. Based on the distribution of t, for each graph, no filtering is required, and labels are estimated for all "unlabelled" nodes in all graphs.
In graphs containing only two labelled nodes (one Democrat leader, one Republican leader), each labelled node forms an absorbing state, and the probability of being absorbed in the Democrat state of the corresponding DTMC is considered. In graphs containing more than two labelled nodes (multiple Democrat leaders or multiple Republican leaders), the labelled nodes for each party are taken to form an absorbing class, and the probability of being absorbed in the Democrat class of the corresponding DTMC is considered. For illustrative purposes, full graphs and histograms of absorption probabilities for the 90th and 110th Senates are provided in Figure 1. Histograms for all Houses and all Senates show separation between Democrat and Republican members, though some overlap between clusters does exist for some Congresses.
Using the binary classifier in GLaSS, a threshold $\alpha$ is chosen for each House and each Senate. If $P(Y_i = \text{Democrat}) > \alpha$, then member $i$ is labelled a Democrat; otherwise, member $i$ is labelled a Republican. Estimated labels are compared to the true party affiliation for all "unlabelled" nodes in each graph. A confusion matrix is constructed for each graph, and used to calculate an F1 score, to measure the performance of GLaSS. An F1 score of 1 implies that GLaSS is able to correctly label all "unlabelled" members in the corresponding House or Senate. Plots of F1 score over time are given for the House and the Senate in Figures 2 and 3, respectively. F1 scores calculated for the 84 graphs range from a minimum of 0.8571 (achieved by the 90th Senate) to a maximum of 1 (achieved by 8 Houses and 9 Senates). In particular, every House from the 108th (2003-05) onwards has an F1 score of 1, implying that the GLaSS method was able to perfectly identify the party affiliation of every member in those Houses.
To better understand the behaviour of graphs with an F1 score of 1, absorption probabilities are standardised using the population mean and pooled variance. The difference between the lowest standardised $P(Y_i = \text{Democrat})$ among all true Democrats and the highest standardised $P(Y_i = \text{Democrat})$ among all true Republicans is calculated and plotted against time (see Figures 2 and 3). In conjunction with plots of F1 score over time, these figures illustrate where overlap exists between Democrats and Republicans, and also by how much Democrats and Republicans are separated when they do not overlap. Figures 2 and 3 show a notable drop in F1 score through the 1960s and 1970s, corresponding to a period where the GLaSS method is less able to determine the party affiliation of members of the House and the Senate. The causes of this drop are unclear, but it may reflect a decrease in partisanship during this period. The figures also show that, over the last 10 to 15 years, partisanship in both the House and the Senate has increased significantly. During this period, the F1 scores for both
Figure caption (bottom panel): Difference between the smallest standardised $P(Y_i = \text{Democrat})$ among all true Democrats and the largest standardised $P(Y_i = \text{Democrat})$ among all true Republicans for the Senate from 1935-37 (74th Senate) to 2017-19 (115th Senate).
For the House, the plot of standardised differences shows the magnitude of overlap (values below the horizontal line at 0) or separation (values above the horizontal line at 0) between Democrats and Republicans, according to absorption probabilities calculated by the GLaSS method. F1 score appears to decrease with increasing magnitude of overlap, and the plot also shows that the two parties have grown increasingly far apart since they first separated entirely in the 108th House.
For the Senate, F1 scores show some variability over time, but are again relatively high for all Senates (minimum F1 score = 0.8571), indicating that the GLaSS method also performs very strongly in labelling members of the Senate as Democrat or Republican. Every Senate since the 110th (2007-09) has an F1 score of 1, implying complete separation of the parties and perfect performance by GLaSS for those Senates. The Senate also experienced complete separation of Democrats and Republicans in 1983-85, 1995-97, and 1997-99 (98th, 104th, and 105th Senates, respectively). The plot of standardised differences illustrates the magnitude of overlap or separation between Democrats and Republicans in the Senate, as measured by the GLaSS method. Some negative association between F1 score and magnitude of overlap is apparent, and it is also clear that the parties are now more separated in the Senate than at any time since 1935-37.
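The F1 calculation from the confusion matrix can be sketched as follows. The text does not state which party is treated as the positive class, so treating "D" (Democrat) as positive is an assumption of this sketch, as are the function name `f1_score` and the example labels.

```python
def f1_score(true_labels, estimated_labels, positive="D"):
    """F1 = 2 * TP / (2 * TP + FP + FN), the harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, estimated_labels))
    tp = sum(t == positive and e == positive for t, e in pairs)
    fp = sum(t != positive and e == positive for t, e in pairs)
    fn = sum(t == positive and e != positive for t, e in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical chamber of six "unlabelled" members.
true_labels = ["D", "D", "D", "R", "R", "R"]
estimated = ["D", "D", "R", "R", "R", "D"]
score = f1_score(true_labels, estimated)         # TP = 2, FP = 1, FN = 1
perfect = f1_score(true_labels, true_labels)     # every member labelled correctly
```

When the classifier assigns exactly as many Democrat labels as there are true Democrats (which the order-statistic threshold does, assuming no tied probabilities), false positives and false negatives are equal in number, so precision, recall, and F1 coincide.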
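The standardised difference between the parties can be sketched as below. The text specifies only that probabilities are standardised "using the population mean and pooled variance"; this sketch assumes the mean is taken over all members and the pooled variance is the usual two-group pooled estimate (within-group sums of squares divided by $n - 2$). The function name `standardised_gap` and the example values are hypothetical.

```python
from statistics import fmean

def standardised_gap(p_dem, p_rep):
    """Lowest standardised P(Democrat) among true Democrats minus the
    highest standardised P(Democrat) among true Republicans.

    Positive values indicate separation; negative values indicate overlap.
    """
    everyone = p_dem + p_rep
    mean = fmean(everyone)  # population mean over all members

    def sum_sq(xs):
        m = fmean(xs)
        return sum((x - m) ** 2 for x in xs)

    # Assumed pooled variance: within-group sums of squares over n - 2.
    pooled_var = (sum_sq(p_dem) + sum_sq(p_rep)) / (len(everyone) - 2)
    sd = pooled_var ** 0.5
    z_min_dem = (min(p_dem) - mean) / sd
    z_max_rep = (max(p_rep) - mean) / sd
    return z_min_dem - z_max_rep

# Hypothetical absorption probabilities.
separated = standardised_gap([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])    # about 4.0
overlapping = standardised_gap([0.9, 0.8, 0.4], [0.6, 0.2, 0.1])  # negative
```

Because both terms are centred on the same mean, the mean cancels in the difference; only the scaling by the pooled standard deviation affects the result.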

The Effects of Party Affiliation and Control on Partisanship
To better understand the factors that may influence partisanship within the House and the Senate, as measured by F1 scores calculated for the GLaSS method, two regression models are fitted. The first model examines the effect of three factors on F1 score: which party holds a majority in the House, which party holds a majority in the Senate, and which party holds the Presidency. Each factor comprises two levels (in each instance, Democrats or Republicans are in majority in the House, Democrats or Republicans are in majority in the Senate, and the sitting President is a Democrat or Republican), and all two-way and three-way interaction terms are considered. Thus, the full model is
$$F \sim H + S + P + H \times S + H \times P + S \times P + H \times S \times P, \tag{6}$$
where $F$ is F1 score, $H$ is the party in majority in the House, $S$ is the party in majority in the Senate, $P$ is the party affiliation of the sitting President, and $\times$ denotes an interaction between factors. The model is fitted for F1 scores from the House and the Senate separately, but no significant predictors are identified in either case.
In the second model, the number of factors is reduced. For the House, two binary factors are considered: whether the party that holds a majority in the Senate is the same as the party that holds a majority in the House, and whether the party that holds the Presidency is the same as the party that holds a majority in the House. An equivalent model is specified for the Senate, and two-way interactions are considered in both cases. For the House, the full model is
$$F \sim S' + P' + S' \times P', \tag{7}$$
where $S'$ and $P'$ denote whether the party in control of the Senate and the Presidency, respectively, is the same as the party in control of the House. For the Senate, the full model is
$$F \sim H' + P' + H' \times P', \tag{8}$$
where $H'$ and $P'$ denote whether the party in control of the House and the Presidency, respectively, is the same as the party in control of the Senate. In (7) and (8), $F$ denotes F1 score for the House and Senate, respectively, and $\times$ is as previously defined.
Again, no significant predictors of F1 are identified for the House or the Senate. Thus, while partisanship has clearly varied over time in both the House and the Senate, it appears that this variation is not explained by which party controls each branch of the US Government, or the interplay between controlling parties.

Discussion
Graph labelling is a fundamental task within network science, with diverse applications. This work builds upon a previously introduced [6] semi-supervised graph labelling method using random walks to absorption, the GLaSS method, and uses it to analyse a collection of undirected political networks from the United States House of Representatives and Senate. In these networks, the GLaSS method is used to estimate the party affiliation of members of the House and the Senate based on roll call voting data. The GLaSS method shows universally strong performance in analysing these networks, returning F1 scores in excess of 0.85 for all 84 graphs, even where graphs display significant overlap between the two communities. In 20% of cases (17 of the 84 networks analysed), GLaSS returns an F1 score of 1, indicating perfect labelling of all members in the House or the Senate based on random walks to absorption in these roll call voting networks. Previous work evaluating GLaSS showed that it outperformed other supervised and unsupervised random walk-based graph labelling methods in analysing a small series of undirected political networks [6]. Results in this work provide more evidence that continued investigation and evaluation of the GLaSS method is warranted. Future work will examine the performance of the GLaSS method for graphs of varying size, connectedness, and density, and with different numbers of known labels. Extending the GLaSS method to label graphs with more than two clusters, and graphs with fewer labelled nodes than clusters, is of particular interest.
This work also provides reinforcing empirical evidence of increasing partisanship in United States politics [1,12], while also representing the first such analysis to use a random walk-based graph labelling method. Analysis of roll call voting data in both the House and the Senate using GLaSS shows that the Democrat and Republican parties have undergone a recent rapid separation. The party affiliation of members can now be entirely predicted by their voting trends, where previously some uncertainty existed. This raises important questions about the causes of the recent and increasing polarisation in both the House and the Senate, as well as what factors historically decreased political partisanship in Congresses with lower F1 scores as calculated by GLaSS. Regression modelling indicates that variations in partisanship cannot be explained by which parties control the House, the Senate, and the Presidency.
In an applied setting, future work will also use GLaSS to further explore social, political, and other networks. Online and social-media networks are of particular interest, with a growing body of work examining the structure, dynamics, and polarisation of online social networks [3,5,15,16]. Future applied work with GLaSS will examine these characteristics for new and existing graphs. The roll call voting data presented here have a clear longitudinal structure; the construction of a metagraph from graphs of individual Houses or Senates and extension of the GLaSS method to analyse metagraphs is also an area for future work.
Endnotes
a In practice, calculating the matrix inverse $(I_u - R)^{-1}$ may be computationally impractical when $R$ is large, but a variety of techniques exist to find $H$ approximately.
b Each meeting of Congress begins on January 3 and runs for a period of two