Characterization of the Firm-Firm Public Procurement Co-Bidding Network from the State of Ceará (Brazil) Municipalities

Fraud in public funding can have deleterious consequences for the economic, social, and political well-being of societies. Fraudulent activity associated with public procurement contracts accounts for losses of billions of euros every year. Thus, it is of utmost relevance to explore analytical frameworks that can help public authorities identify agents that are more susceptible to engaging in irregular activities. Here, we use standard network science methods to study the co-bidding relationships between firms that participated in public tenders issued by the 184 municipalities of the State of Ceará (Brazil) between 2015 and 2019. We identify 22 groups/communities of firms with similar patterns of procurement activity, defined by their geographic and activity scopes. The profiling of the communities allows us to highlight groups that are more susceptible to market manipulation and irregular activities. Our work reinforces the potential application of network analysis in policy to unfold the complex nature of relationships between market agents in a scenario of scarce data.


INTRODUCTION
Despite the weight of public procurement in governmental budgets [1], it is still one of the activities most vulnerable to corruption [2,3]. In that context, corruption can take many forms [4] and occur at any point in the procurement cycle, from the pre-tendering to the tendering and the post-award phases, making it difficult to detect and measure [5-7]. In Brazil alone, corruption in procurement contracts can represent an additional 20% to 30% of the expected price, which represents losses of around 200 billion Reais annually [8]. Likewise, in Europe, losses are estimated at around 5 billion Euros annually [9]. Naturally, these losses undermine the ability of governments and public authorities to push forward essential investments in health, education, infrastructure, security, housing, and social services [10,11]. Unsurprisingly, there is a considerable effort to develop analytical solutions to understand and mitigate the effects of corruption in the public procurement process [12]. Recently, the increasing availability of open data concerning public administration activities [13] has renewed the scientific community's efforts to uncover the hidden connections between participating agents and how their relationships can link to fraudulent activities [14,15].
One of the most challenging aspects of identifying corruption in public procurement contracts is the lack of labeled data. Hence, it is often impossible to know which instances correspond to corruption [16]. Past works have therefore approached the problem from an unsupervised learning perspective, meaning that they look to extract from the data more information about the relationships between the involved parties and, thus, flag groups of agents with patterns associated with a high risk of corruption. A fundamental principle in public procurement is that of transparency in bidding [4,17-19], since competition leads to greater efficiency for the public sector. As such, firms that have developed the relationships necessary to gain leverage over a tender process are at a high risk of corrupting the procurement process [20].
An open question remains: can the communities identified from firms' co-bidding patterns allow us to highlight groups of firms that are more susceptible to collusion and market manipulation? The use of network analysis for the study of corruption is not new [21-23]. In the context of public procurement, past studies can be divided into two main groups: 1) works that explore bipartite relationships between public bodies and firms [24,25]; and 2) studies that explore firm-firm co-bidding relationships in public tenders [26-29]. Both approaches have their merits, and each is more suitable for identifying different mechanisms underlying the manipulation of the procurement process. For instance, bipartite relationships are suitable for identifying fraud that stems from bribes and influence ties, while firm-firm relationships are better suited to identifying cartels and collusion. Nevertheless, the use of network analysis to study the relationship between firms in procurement bids is a relatively new venture [27], and more evidence is required to have a clear picture of the universality of existing patterns and mechanisms across cultural and socio-economic contexts.
Here, we use methods from network science and complexity sciences to map and characterize the co-bidding network [16,27,30,31] between firms that participated in public tenders issued by the 184 municipalities of the State of Ceará (Brazil). We provide a characterization of the relationships between competing firms and identify the major communities of firms that often compete for tenders with similar scope. Moreover, we argue that some of these communities have characteristics that place them at a higher risk of market manipulation and irregular activities often associated with corruption.

[Figure 1a, panel label: Tender-Firm Bipartite, Unweighted and Directed]

DATA
We use data from the Tribunal de Contas do Estado Ceará (Brazil) covering public tenders issued by the 184 municipalities of the State of Ceará between 2015 and 2019. Each observation records the bid of a firm to a tender, whether the bid was one of the winning bids, and the municipality that issued the tender. Hence, the data has a naturally bipartite structure [32], which connects firms to tenders (see Figure 1a). The data set contains 196,608 observations that account for the bids of 45,502 firms to 84,835 tenders.
Information about the firms and tenders is anonymized, and bidding values are not available. Moreover, the data set does not contain information about which contracts/firms have been investigated in the past for irregularities.
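As an illustration of the bipartite structure described above, the following minimal sketch builds the two sides of a Tender-Firm mapping from a list of bid records. The record layout (firm, tender, year, won) and all identifiers are assumptions made for the example; they do not reflect the actual schema of the data set.

```python
from collections import defaultdict

# Hypothetical bid records: (firm_id, tender_id, year, won).
# The field layout is an assumption; the real data set is anonymized.
bids = [
    ("F1", "T1", 2015, True),
    ("F2", "T1", 2015, False),
    ("F1", "T2", 2016, False),
    ("F3", "T2", 2016, True),
    ("F3", "T3", 2017, True),
]


def build_bipartite(records):
    """Return (firm -> set of tenders, tender -> set of firms)."""
    firm_to_tenders = defaultdict(set)
    tender_to_firms = defaultdict(set)
    for firm, tender, _year, _won in records:
        firm_to_tenders[firm].add(tender)
        tender_to_firms[tender].add(firm)
    return dict(firm_to_tenders), dict(tender_to_firms)


firm_to_tenders, tender_to_firms = build_bipartite(bids)
print(len(bids), "bids,", len(firm_to_tenders), "firms,",
      len(tender_to_firms), "tenders")
```

Storing both directions of the mapping keeps later steps simple: the firm side gives each firm's bidding pattern, while the tender side gives the number of bidders per tender.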

NETWORK INFERENCE
Since we are interested in studying the relationships between firms, we focus on the Firm-Firm projection. We estimate the projection from the co-bidding patterns of firms [30] using the Jaccard similarity coefficient [16,33-35]. Figure 1a shows a graphical illustration of the structure of the data and depicts the steps conducted in order to infer the Firm-Firm network from the original Tender-Firm bipartite structure.
In order to infer the Firm-Firm co-bidding network, we start by discarding all firms that did not bid at least once during each year of analysis. By doing so, we extract the core of active firms while removing firms with sporadic activity. Figures 1b and 1c compare the original ($P_{ALL}$) with the filtered data set ($P_{Sample}$). In particular, they show that filtering tends to remove excess participants from tenders, while it does not affect the distribution of the number of bids made by each firm. Hereafter, we refer to firms present in the firm-firm co-bidding network as Established firms. The final working data set includes 1,906 firms, which account for 72,078 bids to 39,523 tenders.
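The filtering step can be sketched as follows. This is a minimal illustration that assumes bid records of the form (firm, tender, year); the function names and the toy data are hypothetical.

```python
from collections import defaultdict


def established_firms(records, years):
    """Firms that placed at least one bid in every year of `years`."""
    active_years = defaultdict(set)
    for firm, _tender, year in records:
        active_years[firm].add(year)
    required = set(years)
    return {firm for firm, seen in active_years.items() if required <= seen}


def filter_records(records, years):
    """Keep only the bids made by firms that are active in every year."""
    keep = established_firms(records, years)
    return [record for record in records if record[0] in keep]


# Toy data: only F1 bids in all three years.
records = [
    ("F1", "T1", 2015), ("F1", "T2", 2016), ("F1", "T3", 2017),
    ("F2", "T1", 2015), ("F2", "T3", 2017),
    ("F3", "T2", 2016),
]
years = range(2015, 2018)
sample = filter_records(records, years)
print(established_firms(records, years), len(sample))
```

For the period studied here, `years` would be `range(2015, 2020)`.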
Next, we compute the centered Jaccard coefficient [35] between each pair of firms, which can be computed as:

$$J^c_{ij} = \frac{\sum_t b_{it} b_{jt}}{\sum_t \left(b_{it} + b_{jt} - b_{it} b_{jt}\right)} - \frac{p_i p_j}{p_i + p_j - p_i p_j} \qquad (1)$$

where $b_{it}$ is one if firm $i$ made a bid to tender $t$, and zero otherwise; and $p_i$ is the fraction of tenders in which firm $i$ participated ($p_i = \frac{1}{T}\sum_t b_{it}$, with $T$ the total number of tenders). The second term in Equation 1 provides the expected value of the first term when the bids from both firms are independent and identically distributed through a Bernoulli process [35]. Hence, the centered Jaccard coefficient allows us to distinguish between positive and negative associations between firms. Finally, we estimate the significance of the computed $J^c_{ij}$ (i.e., we test the hypothesis that $J^c_{ij} > 0$). To that end, we bootstrap a null distribution ($\hat{J}_{ij}$) of the centered Jaccard coefficient for each pair of firms by generating an ensemble of 1000 randomizations of the initial bipartite network. The data was randomized so that the number of bids observed per firm and per year remained constant, while also keeping constant the number of firms bidding to each tender. Then, using statistical inference methods [36], we estimate the p-value associated with $J^c_{ij}$ by calculating the upper tail probability of obtaining a value equal to or greater than $J^c_{ij}$ from the cumulative frequency of the null distribution $\hat{J}_{ij}$. We discard links with p-value > 0.05.
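The procedure above can be sketched in a few lines. The sketch assumes that the centered coefficient is the observed Jaccard similarity minus its expectation under independent Bernoulli bids, $p_i p_j / (p_i + p_j - p_i p_j)$, and it uses pairwise tender swaps within the same year as the randomization scheme. The swap scheme is one possible way to preserve the stated constraints and is an assumption, not necessarily the procedure used in the study; all function names are hypothetical.

```python
import random


def centered_jaccard(tenders_i, tenders_j, n_tenders):
    """Observed Jaccard minus its expectation under independent bids."""
    union = len(tenders_i | tenders_j)
    observed = len(tenders_i & tenders_j) / union if union else 0.0
    p_i = len(tenders_i) / n_tenders
    p_j = len(tenders_j) / n_tenders
    denom = p_i + p_j - p_i * p_j
    expected = (p_i * p_j) / denom if denom else 0.0
    return observed - expected


def randomize(bids, n_swaps, rng):
    """Swap tenders between pairs of bids placed in the same year.

    bids: list of (firm, tender, year). Every accepted swap keeps the
    number of bids per firm and year, and the number of bidders per
    tender, unchanged. Swaps that would duplicate a bid are rejected.
    """
    bids = list(bids)
    existing = {(f, t) for f, t, _ in bids}
    for _ in range(n_swaps):
        a, b = rng.sample(range(len(bids)), 2)
        (f1, t1, y1), (f2, t2, y2) = bids[a], bids[b]
        if y1 != y2 or f1 == f2 or t1 == t2:
            continue
        if (f1, t2) in existing or (f2, t1) in existing:
            continue
        existing -= {(f1, t1), (f2, t2)}
        existing |= {(f1, t2), (f2, t1)}
        bids[a], bids[b] = (f1, t2, y1), (f2, t1, y2)
    return bids


def tenders_of(bids, firm):
    return {t for f, t, _ in bids if f == firm}


def link_p_value(bids, firm_i, firm_j, n_random=1000, n_swaps=200, seed=0):
    """Observed coefficient and its upper-tail p-value under the null."""
    rng = random.Random(seed)
    n_tenders = len({t for _, t, _ in bids})
    observed = centered_jaccard(
        tenders_of(bids, firm_i), tenders_of(bids, firm_j), n_tenders)
    null = []
    for _ in range(n_random):
        shuffled = randomize(bids, n_swaps, rng)
        null.append(centered_jaccard(
            tenders_of(shuffled, firm_i), tenders_of(shuffled, firm_j),
            n_tenders))
    return observed, sum(v >= observed for v in null) / n_random


# Toy data: firms A and B always bid together.
toy_bids = [
    ("A", "T1", 2015), ("B", "T1", 2015), ("C", "T2", 2015), ("D", "T3", 2015),
    ("A", "T4", 2016), ("B", "T4", 2016), ("C", "T5", 2016), ("D", "T6", 2016),
]
observed, p_value = link_p_value(toy_bids, "A", "B")
print(round(observed, 3), p_value)
```

A link between two firms would be kept when the returned p-value does not exceed 0.05. The number of swaps per randomization (`n_swaps`) is a tuning choice of this sketch and should be large enough for the shuffled network to decorrelate from the original.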
The resulting firm-firm co-bidding network contains 1,529 nodes and 12,892 edges. Relationships are treated as undirected and unweighted, and identify firms that have a similar pattern of bidding. The network exhibits an average degree of 16.86, a clustering coefficient of 0.52 [38], and 56 connected components. Figure 2 shows that the degree distribution (panel 2a) decays exponentially with the degree, meaning that the underlying mechanics of co-bidding can be approximated by a random attachment process [39]. However, the average clustering coefficient shows an inverse relationship with the degree (panel 2b), suggesting the existence of some level of hierarchy in the structure of the network. It is noteworthy that the largest connected component contains 1,141 nodes and 10,630 edges, and has a clustering coefficient of 0.43.
Figure 3 presents the giant component of the firm-firm co-bidding network. In particular, we highlight the eight largest communities (nodes are colored accordingly). Using the Louvain algorithm [37], we identified 22 communities with a modularity of 0.66. We refer to these communities as $C_1, C_2, \ldots, C_{22}$, indexed in descending order of the number of firms they contain. The high modularity of the network is, however, unsurprising and can be explained by the fact that the network represents firms in different markets, characterized by the different nature of contracts (e.g., works, services, etc.) in different regions. As indicated in Figure 3, the largest communities divide the network into two major groups of firms that operate mostly in the northern (Red, Blue, and Green) and southern (Purple and Yellow) regions of the State of Ceará, and that also differ in whether their contracts deliver Food services (Blue and Purple) or construction works (Red and Yellow). Interestingly, the remaining highlighted communities identify firms that compete at the state-wide level (Pink and light Blue) and one particular instance of a community (Violet) operating in a small region (Jaguaribe) and on a single contract type (Food services). Hence, the network highlights competing markets and provides a characterization that is interpretable.
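To make the reported network statistics concrete, the sketch below computes the average degree of an undirected network and the modularity of a given partition, using the standard Newman definition $Q = \sum_c \left[ L_c/m - (d_c/2m)^2 \right]$, where $L_c$ is the number of edges inside community $c$, $d_c$ the total degree of its nodes, and $m$ the number of edges. It evaluates a partition that is supplied as input; it does not implement the Louvain optimization itself, and the toy graph is hypothetical.

```python
from collections import defaultdict


def average_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph (each edge touches two nodes)."""
    return 2 * n_edges / n_nodes


def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(round(modularity(edges, community_of), 4))

# Average degree from the node and edge counts reported in the text.
print(round(average_degree(1529, 12892), 2))
```

With 1,529 nodes and 12,892 edges, the average degree evaluates to 2 x 12,892 / 1,529, which rounds to the reported value of 16.86.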

RESULTS AND DISCUSSION
As discussed above, communities in Firm-Firm network structures can be used to identify market segments. In limiting scenarios, in which firms form "echo chambers" or highly dense communities, they also allow us to flag groups of firms that present a high risk of collusion and procurement manipulation, in other words, corruption. As such, we next explore the use of network science as an approach to classify the communities of firms that the relevant authorities might want to investigate more deeply.

Activities Diversity
We start by looking at the regional diversity of the firms' activities (e.g., the regions in which they bid on tenders), and also at the diversity of the types of contracts that they bid on. While a firm with low diversity in both regional reach and contract type can simply be a firm that is narrow in both scope and domain, the existence of clear groups (i.e., a community of firms) that share such a profile can highlight a more troublesome scenario. In particular, it can indicate the conditions for firms to coordinate and cooperate to control a specific market in a specific regional context, and should be investigated with further care.

[Figure 2, panels a) and b); annotation: $D_{tail}(k) \approx e^{-0.05k}$]

To that end, we start by estimating the Simpson's diversity index [40] for each community, which can be computed as

$$\lambda^{cat}_{C_i} = \sum_t \left(p^t_{C_i}\right)^2$$
where $p^t_{C_i}$ corresponds to the fraction of bids made by the firms in community $C_i$, $\forall i \in \{1, 2, \ldots, 21, 22\}$, in category $t$ of the set $\gamma$, which is either the set of procurement contract types, $\gamma$: {Consumables Health, Services, Construction, Events, Food, Fuel, ...}, or the set of regions, $\gamma$: {Metropolitana, Norte, Sul, Noroeste, ...}. The quantity $p^t_{C_i}$ is normalized per community, so that $\sum_t p^t_{C_i} = 1$. We estimate $\lambda^{cat}_{C_i}$ independently for each community ($C_i$), once according to the region that issued the tender and once according to the tender contract type (e.g., services, food, tenancy, construction, etc.). Our choice of the Simpson's index over other alternatives (e.g., entropy) is due to its straightforward interpretation: it is the probability that two bids from a community are in the same category (e.g., region or contract type).
Figure 4a illustrates the empirical distributions ($p^t_{C_i}$) of procurement activity for the ten largest communities. We show the results both for the regional distribution of activities and by contract type. Blue colors denote a low relative frequency of bids, while red identifies a high frequency. These indicators allow us to infer the degree of specialization and agglomeration of a community. In particular, we find that the activities of Community 8 ($C_8$) are agglomerated in a single region (Jaguaribe) and that its firms specialize in one type of contract (Food). The same conclusion can also be inferred from the high levels of $\lambda^{cat}_{C_8}$, which mean that Community 8 has a low diversity in the distribution of its activities. Figure 4b compares all 22 communities in terms of the two diversity indicators defined above. We find a clustering of communities in the bottom left quadrant (a low level of agglomeration and specialization), which we associate with healthy markets composed of firms that, on average, have a diversified portfolio of activities and regional distribution. In contrast, in the top right quadrant, we find communities that rely on procurement contracts of a single type and are agglomerated in a small number of regions.
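As a minimal sketch of the diversity indicator, the function below computes the Simpson's index of a community from the list of categories (regions or contract types) of its bids, assuming the index is the sum of squared relative frequencies, which matches the interpretation given above. The toy bid lists are hypothetical.

```python
from collections import Counter


def simpson_index(categories):
    """Sum of squared relative frequencies of the given category labels.

    Equals the probability that two bids drawn with replacement from the
    community fall in the same category: 1.0 means a single category.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# A specialized community: nine bids for Food, one for Services.
specialized = ["Food"] * 9 + ["Services"]
# A diversified community: bids spread evenly over four contract types.
diversified = ["Food", "Services", "Construction", "Fuel"] * 5

print(round(simpson_index(specialized), 2))
print(round(simpson_index(diversified), 2))
```

Applying the same function once to the regions and once to the contract types of a community's bids yields the two coordinates used to compare communities in Figure 4b.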
The combination of these two diversity indicators, at the community level, provides a powerful feature to identify groups of firms that can dominate a niche market or, in the worst case, develop undesirable leverage, as a group, in negotiating procurement contracts, hence lowering the efficiency that public procurement aims to achieve in the tendering process. However, it is important to stress that these metrics are just indicative of potential problems, and thus the true nature of the activities of the firms in each community should be carefully investigated by the corresponding local authorities.

Bidding Coordination
To further investigate the risk/susceptibility of market manipulation by firms, we next look at the propensity of each community to participate in "single bidder" contracts, another pattern often associated with corruption and loss of efficiency. Hence, what is the susceptibility of each community to such practice? To answer that question, we start by investigating the average number of times, per community, that a firm is the single bidder of a tender. Figure 5a shows the results for all 22 communities in the largest component of the Firm-Firm network. Traditionally, a higher level of single bidders would be the main indicator of market manipulation. However, at the community level, both very low and very high levels are indicative of unusual activity when we take as a baseline the number of single bids by a typical firm. The reason is that low levels indicate the risk of coordination (e.g., firms participating coherently in the same contracts), while high levels can signal the prevalence of less competitive markets. Overall, of the largest 10 communities, only Community 8 exhibits low levels of single bidders, a pattern that extends to Communities 14 and 21 as well. In contrast, Community 12 strongly deviates from the average, with an average value of single bidding that is roughly four times larger than the average.
In addition, it is important to look at the average number of bidders per tender in order to assess the potential existence of coherent behavior, that is, coordination between the firms in a community. To that end, for each community, we estimate the average number of bids per tender, which we normalize by the size of the community (i.e., the number of firms in the community). Interestingly, Figure 5b shows that firms in Community 8 tend to participate in tenders with a number of participating firms that matches almost exactly the size of the community, while in some cases (Communities 14 and 19) the numbers are several times larger. It is worth noting that this analysis is biased by the size of the communities, so the expectation would be to see a smoothly increasing relationship, with the largest community achieving the smallest value and, in the limiting case, a community with a single firm obtaining the maximum. However, it is clear that in some cases (Communities 8, 14, and 19) there are clear deviations.

FIG. 3. Graphical representation of the Firm-Firm network, which relates firms with similar bidding patterns. In order to build the network we consider only the most active firms, and edges with a significant Jaccard similarity index. Represented is the giant component with some relevant disconnected components. Nodes in the giant component are colored according to one of the eight major communities (out of 22) identified using the Louvain algorithm [37]. The presented partition of the network achieves a modularity of 0.61.
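The two community-level indicators of this subsection can be sketched as follows. The exact definitions are assumptions made for illustration: a single bid is a tender whose only bidder belongs to the community, and the average number of bidders is taken over every tender that received at least one bid from the community. Function names and toy data are hypothetical.

```python
def single_bids_per_firm(tender_to_firms, community):
    """Average number of tenders per firm in which a community firm bid alone."""
    alone = sum(
        1 for firms in tender_to_firms.values()
        if len(firms) == 1 and next(iter(firms)) in community
    )
    return alone / len(community)


def normalized_bidders_per_tender(tender_to_firms, community):
    """Mean number of bidders in tenders joined by the community,
    divided by the number of firms in the community."""
    sizes = [len(firms) for firms in tender_to_firms.values()
             if firms & community]
    if not sizes:
        return 0.0
    return (sum(sizes) / len(sizes)) / len(community)


# Toy data: firms A, B, and C always bid together; D and E often bid alone.
tender_to_firms = {
    "T1": {"A", "B", "C"},
    "T2": {"A", "B", "C"},
    "T3": {"D"},
    "T4": {"D", "E"},
    "T5": {"E"},
}
coordinated = {"A", "B", "C"}
isolated = {"D", "E"}

for name, community in [("coordinated", coordinated), ("isolated", isolated)]:
    print(name,
          single_bids_per_firm(tender_to_firms, community),
          round(normalized_bidders_per_tender(tender_to_firms, community), 2))
```

In the toy data, the first community never bids alone and its tenders have exactly as many bidders as it has firms, which gives a normalized value of 1.0. This is the pattern the text associates with possible coordination.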

CONCLUSIONS
FIG. 5. [...] shows the average number of bidders in tenders in which firms within a community typically participate. We normalize the value obtained for each community by the number of firms in that community. The horizontal red line shows the threshold that marks the size of the community.

In this manuscript, we explore the potential of mining a large data set of public tenders collected from the activity of firms that compete for procurement contracts issued by the municipalities of the State of Ceará (Brazil). By matching firms with similar bidding patterns, we have inferred a firm-firm network whose giant component comprises a total of 1,141 nodes and 10,630 edges. We show that we are able to identify communities of firms with similar bidding patterns. The network exhibits a highly modular structure partitioned into 22 communities. These communities cluster firms that have a similar scope in procurement activity, both in the nature of the contracts they enter into and in the regional reach of their activities. Moreover, we look at two diversity indicators (regional diversity and procurement contract nature diversity) as a sign of the potential of certain communities to develop leverage over the procurement process, in other words, to affect the expected efficiency of the market. Finally, we look at the sizes of the tenders, first by looking at the abundance of single bidders in communities, and second by looking at the average number of bidders in each tender. Overall, we have identified a particular community (Community 8) that combines several undesirable properties. Community 8 involves a group of firms that offer Food services in the region of Jaguaribe. They have an unusually low number of single bids, the average number of participating firms per tender matches the number of firms in the community, and they exhibit a high specialization and agglomeration in their activities.
Finally, it is important to highlight some shortcomings in our analysis and future working directions. The lack of pre-labeled data on past cases of corruption largely limited our ability to make any causal link between irregular procurement behavior and the network structure, its motifs, and the location of firms in the network. In that sense, our results are merely exploratory and show the potential of combining network science methods with descriptive statistics to highlight relevant groups of firms according to their activity pattern in a data-scarce environment. Future work should look at the evolution of the network, if a larger temporal window is available, to capture the evolution and segregation of communities of interest, but also their path in terms of the diversity of their activities.