### Organization of labor

As usual in collaborative systems, only a few contributors do most of the work (Barabasi 2003). When we analyze the number of contributions made by each author, we find a power-law distribution (Fig. 2B) and high Gini indices (Fig. 2C). In Fig. 2A we represent this distribution as a Lorenz curve: authors are ordered by their number of contributions, and each curve gives the cumulative fraction of posts produced by the corresponding fraction of ranked authors. The figure shows that the most active 10% of authors produce 80% of the posts (with the exception of project 4, which is characterized by a lower Gini index and where the most active 20% of contributors produce 80% of the posts).
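As an illustration, the Lorenz curve and the Gini index of a list of per-author post counts could be computed along the following lines. This is a minimal sketch rather than the code used for Fig. 2: the function names are hypothetical, and the Gini index is assumed to be one minus twice the area under the Lorenz curve, evaluated with the trapezoidal rule.

```python
import numpy as np

def lorenz_curve(counts):
    """Cumulative fraction of authors (x) vs. cumulative fraction of posts (y).

    Authors are sorted from least to most active, so the curve lies
    on or below the diagonal.
    """
    c = np.sort(np.asarray(counts, dtype=float))
    y = np.concatenate([[0.0], np.cumsum(c) / c.sum()])
    x = np.linspace(0.0, 1.0, len(c) + 1)
    return x, y

def gini(counts):
    """Gini index as 1 - 2 * (area under the Lorenz curve)."""
    x, y = lorenz_curve(counts)
    area = np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0  # trapezoidal rule
    return 1.0 - 2.0 * area

def top_share(counts, fraction):
    """Fraction of all posts written by the most active `fraction` of authors."""
    c = np.sort(np.asarray(counts, dtype=float))[::-1]
    k = max(1, int(round(fraction * len(c))))
    return c[:k].sum() / c.sum()
```

With this helper, a statement such as "the most active 10% produce 80% of the posts" corresponds to `top_share(counts, 0.10)` being close to 0.8.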

Following the procedure described in Bassolas et al. (2019), we use the Lorenz curve to categorize authors hierarchically: we take the tangent to the Lorenz curve at the point (1,1) and set an initial threshold at the point where this tangent crosses the horizontal axis (see Fig. 2A). The authors beyond this threshold represent the most productive *elite* of the project. We remove these *elite* contributors and repeat the procedure recursively, identifying a group we define as the *first shell* (highly active authors outside the hyperactive elite) at the first iteration, and the *peripheral shells* (namely shells E3, E4, E5, E6 in Fig. 2D) at subsequent iterations. In Fig. 2C, we display the number of contributors in the elite group and in the first shell, while Fig. 2D shows the percentage of authors in each hierarchical category. According to this classification, the elite group contains fewer than 10% of the authors, while the peripheral shells are consistently the most represented. In the following, we refer to authors belonging to the *elite* and the *first shell* as the *active core*.
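The recursive classification can be sketched as follows. This is one possible reading of the procedure, not the reference implementation of Bassolas et al. (2019): the slope at (1,1) is approximated by the last segment of the discrete Lorenz curve, the threshold is taken where the line through (1,1) with that slope meets the horizontal axis, and every author is assumed to have at least one post. The function name and the integer labels (1 for the elite, 2 for the first shell, and so on) are ours.

```python
import numpy as np

def lorenz_shells(counts):
    """Assign each author a shell label: 1 = elite, 2 = first shell, 3+ = peripheral."""
    counts = np.asarray(counts, dtype=float)
    # Indices of the authors still to classify, from least to most active.
    remaining = np.argsort(counts, kind="stable")
    shell = np.zeros(len(counts), dtype=int)
    level = 1
    while len(remaining) > 0:
        c = counts[remaining]
        n = len(c)
        # Slope of the last Lorenz segment: (c_max / total) / (1 / n).
        slope = n * c[-1] / c.sum()
        # The line y = 1 + slope * (x - 1) reaches y = 0 at x_star.
        x_star = 1.0 - 1.0 / slope
        # An author at sorted position i covers the interval up to (i + 1) / n.
        upper = np.arange(1, n + 1) / n
        in_shell = upper > x_star
        shell[remaining[in_shell]] = level
        remaining = remaining[~in_shell]
        level += 1
    return shell
```

Because the most active remaining author always lies beyond the threshold, each iteration removes at least one author and the loop terminates. Authors with equal post counts can fall on different sides of a threshold; how such ties should be broken is not specified in the text.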

#### Interactions between the authors

To better understand the division of labor in Polymath, we investigated the distribution of interactions between authors. In particular, we focused on how the *active core* authors, as defined in Sect. 3.1, interact with the *peripheral shells*.

To do so, we used the dependencies between posts to define a *comments interaction network* CIN \(= ({\mathcal {V}}, {\mathcal {E}}, W)\) with the following properties: each node \(i \in {\mathcal {V}}\) represents an author, an edge \((i,j) \in {\mathcal {E}}\) represents the existence of at least one comment by author *i* on a post of author *j*, and the weight \(W_{ij}\) associated with the edge (*i*, *j*) represents the number of times author *i* replied to a post of author *j*. To understand whether such interactions are concentrated in the active core or spread towards peripheral contributors, we compared the observed graphs with a stochastic network model that preserves, on average, the activity level of each node. Similarly to Roth et al. (2013), we simulate *K* networks \(\{ \Gamma ^k = ({\mathcal {V}}, {\mathcal {E}}^k, Q^k)\}_{k \in \{1,\ldots ,K\}}\) in which the expected degree of each node equals the degree of the corresponding author in our dataset, keeping the same number of nodes \(n = |{\mathcal {V}}|\) and edges \(m = |{\mathcal {E}}| = |{\mathcal {E}}^k| \,\) for all \(k \in \{1,\ldots ,K\}\). To this end, we draw the weights \(Q_{ij}^k\) from a multinomial distribution with parameters *m* and \(p = \{p_{ij}\}_{i,j \in {\mathcal {V}}}\) such that

$$\begin{aligned} p_{ij} = \frac{d_i^{\text {out}}\cdot d_j^{\text {in}}}{m^2} \end{aligned}$$

where \(d_i^{\text {out}}\) is the out-degree of node *i* and \(d_j^{\text {in}}\) is the in-degree of node *j* in the comments interaction network. Figure 3A shows the distribution of the *fraction of in-core links* (i.e., the fraction of messages from elite contributors to other elite contributors) in our \(K = 100\) simulations and compares these distributions with the actual fraction of in-core links in our dataset. Figure 3B shows the same comparison for the in-periphery links, i.e., the fraction of messages written by peripheral contributors directed to other peripheral contributors. Both plots show a peculiar division of labor in the Polymath project: both core-to-core and periphery-to-periphery links are more represented than in random simulations, underlining that authors are more likely to reply to contributors who participate in the discovery process to a similar extent.
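The null model for the comments interaction network can be sketched as below. This is a hedged illustration, not the authors' code: it assumes that the degrees are weighted (row and column sums of \(W\)) and that the number of draws is the total number of replies, which is the choice under which the \(p_{ij}\) sum to one. Self-replies are not excluded, and the function names are hypothetical.

```python
import numpy as np

def in_group_fraction(W, members):
    """Fraction of the total edge weight going from `members` to `members`."""
    W = np.asarray(W)
    idx = np.asarray(members)
    return W[np.ix_(idx, idx)].sum() / W.sum()

def simulate_directed_null(W, K=100, seed=0):
    """Draw K weight matrices Q^k from a multinomial with p_ij ~ d_i^out * d_j^in."""
    W = np.asarray(W)
    rng = np.random.default_rng(seed)
    d_out = W.sum(axis=1)          # weighted out-degree of each author
    d_in = W.sum(axis=0)           # weighted in-degree of each author
    p = np.outer(d_out, d_in).astype(float)
    p /= p.sum()                   # equals d_out * d_in / m^2 when m = W.sum()
    draws = rng.multinomial(int(W.sum()), p.ravel(), size=K)
    return draws.reshape(K, *W.shape)
```

Comparing `in_group_fraction(W, core)` with the distribution of `in_group_fraction(Q, core)` over the simulated matrices `Q` gives the kind of comparison shown in Fig. 3A; using the indices of the peripheral authors instead gives the analogue of Fig. 3B.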

As mentioned in the Data and Methods section, some of the blogs we studied limit the depth of response structures. We qualitatively observed a shift from a non-hierarchical structure in the very first project (i.e., only second-level comments and no deeper structures) to a more structured organization of posts in later projects. To evaluate the robustness of the results presented in Fig. 3, we compared them with the results obtained with a different definition of network interactions. We define a *topic interaction network* TIN(T) \(= ({\mathcal {V}}, \widetilde{{\mathcal {E}}}(T), {\widetilde{W}}(T))\) with the following properties: the node set \({\mathcal {V}}\) still represents the set of authors, and an edge \((i,j) \in \widetilde{{\mathcal {E}}}\) indicates that authors *i* and *j* published posts on the same topic at a distance of at most *T* posts (when posts are ordered chronologically). The weight \({\widetilde{W}}_{ij}(T)\) associated with the edge (*i*, *j*) represents the number of times authors *i* and *j* published posts on the same topic within the window defined by the parameter *T*. Notice that, by definition, such a network is undirected. Once again, to study author interactions, we compare the network with a set of simulated networks \(\{ {\widetilde{\Gamma }}^k = ({\mathcal {V}}, \widetilde{{\mathcal {E}}}^k, {\widetilde{Q}}^k)\}_{k \in \{1,\ldots ,K\}}\) where \({\widetilde{m}} = |\widetilde{{\mathcal {E}}}| = |\widetilde{{\mathcal {E}}}^k|,\; k \in \{1, \ldots , K\}\). Since the network is undirected, \({\widetilde{Q}}_{ij}^k = {\widetilde{Q}}_{ji}^k\) in all *K* simulations, and it is sufficient to draw only the values \(\{{\widetilde{Q}}_{ij}\}_{i,j \in {\mathcal {V}}, j \ge i}\). We draw them from a multinomial distribution with parameters \({\widetilde{m}}\) and \({\widetilde{p}} = \{{\widetilde{p}}_{ij}\}_{i,j \in {\mathcal {V}}, j \ge i}\) such that

$$\begin{aligned} {\widetilde{p}}_{ij}&= 2\frac{d_i\cdot d_j}{{\widetilde{m}}^2} \;\;\;\;\;\;\;\;\;\;\;\; \text {if } i \ne j \\ {\widetilde{p}}_{ii}&= \frac{d_i\cdot d_i}{{\widetilde{m}}^2} \;\;\;\;\;\;\;\;\;\;\;\;\;\; \text {otherwise,} \end{aligned}$$

where \(d_i\) is the degree of node *i* in the topic interaction network. The resulting distribution of in-core and in-periphery interactions is shown in Fig. 4. Regardless of the definition of the window, in-core and in-periphery interactions are more frequent in the actual Polymath projects than in simulated networks. Moreover, the results obtained on the *topic interaction network* confirm those obtained on the *comments interaction network*. We can thus conclude that in the Polymath collaborations, elite actors interact more with other elite actors, while peripheral actors preferentially respond to other peripheral actors. This result is consistent with the forms of status-based homophily observed in social networks (McPherson and Smith-Lovin 1987). This is, however, surprising in a scientific context where interactions are generally assumed to be based on cumulative advantage processes (Merton 1968).
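For the undirected case, a minimal sketch of the sampling step is given below. It draws only the entries with \(j \ge i\) and mirrors them to obtain symmetric matrices. Two choices here are assumptions rather than statements from the text: \(d_i\) is taken as the weighted degree (row sum of \({\widetilde{W}}\)), and the probabilities are normalized explicitly so that they sum to one; the number of draws is the total weight of the upper triangle. The function name is hypothetical.

```python
import numpy as np

def simulate_undirected_null(W, K=100, seed=0):
    """Draw K symmetric weight matrices with p_ij ~ 2 d_i d_j (i != j), p_ii ~ d_i^2."""
    W = np.asarray(W)
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    d = W.sum(axis=1)                      # weighted degree of each author
    iu, ju = np.triu_indices(n)            # all pairs (i, j) with j >= i
    p = np.where(iu == ju, 1.0, 2.0) * d[iu] * d[ju]
    p = p / p.sum()                        # explicit normalization (assumption)
    m = int(np.triu(W).sum())              # each undirected interaction counted once
    draws = rng.multinomial(m, p, size=K)  # shape (K, number of pairs)
    Q = np.zeros((K, n, n), dtype=draws.dtype)
    Q[:, iu, ju] = draws                   # fill the upper triangle
    Q[:, ju, iu] = draws                   # mirror to enforce Q_ij = Q_ji
    return Q
```

The in-core and in-periphery fractions can then be computed on each simulated matrix exactly as for the observed network, for every value of the window parameter *T*.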

### Collective intelligence at work

Several studies on collaborative systems have shown a super-linear effect of collaboration: The very expression “collective intelligence” suggests that the collective productivity (in our case the number of posts) is higher than the sum of the individual productions. To test this feature dynamically, we count the daily number of posts and the daily number of participants for all projects:

$$\begin{aligned} n_{post}(t)&=[n_{post}(t_0),n_{post}(t_1),\ldots ]\\ n_{user}(t)&=[n_{user}(t_0),n_{user}(t_1),\ldots ], \end{aligned}$$

where \(t_0,t_1,\ldots\) represent different days. To reduce noise, we smooth these time series with a 7-day rolling window. By plotting the pairs \((n_{user}(t),n_{post}(t))\), we obtain the curves representing the relationship between the number of users and the number of posts. Figure 5A shows a pronounced superlinear growth of the number of posts with the number of users, aggregated for all projects: \(n_{post}\sim n_{user}^\gamma\) (with exponent \(\gamma =1.46\)). Our results are similar to those of Sornette et al. (2014) for GitHub.
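One way to estimate the exponent \(\gamma\) from the two daily series is sketched below. The fitting method is not specified in the text; this example assumes an ordinary least-squares fit in log-log coordinates after smoothing, and the function names are hypothetical.

```python
import numpy as np

def rolling_mean(x, window=7):
    """Moving average over `window` consecutive days (fully covered windows only)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="valid")

def scaling_exponent(n_user, n_post, window=7):
    """Fit n_post ~ c * n_user**gamma on the smoothed daily series."""
    u = rolling_mean(n_user, window)
    p = rolling_mean(n_post, window)
    keep = (u > 0) & (p > 0)               # the logarithm needs positive values
    gamma, log_c = np.polyfit(np.log(u[keep]), np.log(p[keep]), 1)
    return gamma, np.exp(log_c)
```

A value of `gamma` above 1 indicates that the number of posts grows faster than the number of active users.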

Figures 5B and 5C suggest that contributions have positive super-linear effects, even when they are relatively marginal. In Fig. 5B, we show that the average individual daily production (for all contributors with more than 10 posts in all the projects) grows with the number of users active on that day. Figure 5C displays the average daily productivity of the active core as a function of the number of users in the peripheral shells. We observe that a large presence of peripheral users boosts the productivity of the most active users.

In Fig. 5, we show the results obtained by aggregating all the projects. The individual analysis of each blog shows similar trends, with limited variation in the growth exponents (blog1: \(\gamma =1.30\), blog4: \(\gamma =1.22\), blog5: \(\gamma =1.46\), blog8: \(\gamma =1.65\), blog15: \(\gamma =1.50\)). Since the blog platforms are diverse, the robustness of these results suggests that super-productivity is an intrinsic characteristic of collaborative science, regardless of the communication medium.

### Statistical properties of scientific discoveries

While in the previous sections we analyzed collaborative patterns in open science, we now focus on the analysis of the scientific discovery process itself.

First, we analyze the statistical properties of the mathematical concepts used in the projects. As described in the Methods Section, we have assigned a set of mathematical concepts to each post. We first test whether our corpus follows the basic laws of linguistic patterns: Zipf’s law and Heaps’ law. Zipf’s law expresses the relationship between the frequency of a word and its rank: the frequency decays as a power of the rank, \(f\sim r^{\alpha }\) with \(\alpha < 0\). For example, looking at the Gutenberg Project corpus (a large sample of English literature), one can observe a value \(\alpha \sim -1\) for low values of *r* and \(\alpha \sim -2\) for high values of *r*. Heaps’ law concerns the entry of innovative concepts into a text and expresses the relationship between the number of different words *v* (i.e., the vocabulary size) and the total number of words used *l* (i.e., the length of the text). It describes an initial linear growth followed by a sublinear regime, according to the power law \(v\sim l^\alpha\): in the Gutenberg corpus, \(\alpha \sim 1\) for low values of *l* and \(\alpha \sim 0.44\) for high values of *l* have been observed. In Fig. 6, we see that not only are these laws respected in our corpus, but also that all projects have the same behavior and exponents, Zipf’s exponents being \(\alpha =-0.36\) and \(\alpha =-2\) and Heaps’ exponents being \(\alpha =0.9\) and \(\alpha =0.4\). These values are consistent with those of the Gutenberg corpus (Tria et al. 2018), although the first exponent of Zipf’s law in our corpora is smaller in magnitude, because we removed the non-mathematical expressions and stop words. This consistency means that, statistically, the creative process of scientific discovery follows the same basic rules that characterize literary production.
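The two curves can be obtained from a chronologically ordered stream of concepts as in the sketch below. It is an illustration under simple assumptions: each occurrence of a concept counts as one token, and exponents are read as least-squares slopes in log-log coordinates over a range chosen by the user. The function names are hypothetical.

```python
from collections import Counter
import numpy as np

def zipf_curve(tokens):
    """Rank (1 = most frequent) and relative frequency of each distinct token."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    return ranks, freqs / freqs.sum()

def heaps_curve(tokens):
    """Text length l and vocabulary size v after each token of the stream."""
    seen = set()
    vocab = []
    for tok in tokens:
        seen.add(tok)
        vocab.append(len(seen))
    length = np.arange(1, len(vocab) + 1)
    return length, np.array(vocab)

def loglog_slope(x, y):
    """Least-squares slope of log(y) against log(x)."""
    slope, _ = np.polyfit(np.log(x), np.log(y), 1)
    return slope
```

Since both laws show two regimes, the slope would be estimated separately on the low and high ranges, for instance `loglog_slope(ranks[:k], freqs[:k])` and `loglog_slope(ranks[k:], freqs[k:])` for a cut-off `k` chosen by inspection.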

Second, we focused on the typical timing of the discovery process, based on the hypothesis that posts that are close in time also tend to be similar in content. In Fig. 7 we show the average Jaccard similarity between all pairs of posts published within a given time delay. We observe a power-law decay of similarity with time, \(J\sim \Delta t^{-\gamma }\) (with \(\gamma =0.2\)), once again similar for all projects. Thus, for all projects, there exists a typical time window in which the debate remains focused on the same topic before switching to a new one.
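The similarity curve can be sketched as follows, assuming that each post is described by a set of concepts and a timestamp, and that pairs are grouped into half-open delay bins. The binning scheme and the function names are our assumptions; the loop over all pairs is quadratic in the number of posts, which is acceptable only for moderately sized corpora.

```python
from itertools import combinations
import numpy as np

def jaccard(a, b):
    """Jaccard similarity between two collections of concepts."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity_by_delay(times, concepts, bin_edges):
    """Average Jaccard similarity of post pairs, grouped by time delay.

    Bin k collects the pairs with bin_edges[k] <= delay < bin_edges[k + 1];
    empty bins yield NaN.
    """
    n_bins = len(bin_edges) - 1
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for i, j in combinations(range(len(times)), 2):
        dt = abs(times[j] - times[i])
        b = np.searchsorted(bin_edges, dt, side="right") - 1
        if 0 <= b < n_bins:
            sums[b] += jaccard(concepts[i], concepts[j])
            counts[b] += 1
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts
```

With logarithmically spaced `bin_edges`, the decay exponent can be read from the slope of the resulting curve in log-log coordinates.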

### Innovation patterns

Finally, we analyze how innovation affects the discovery mechanism, using the innovation measure we defined in Sect. 2.4. As observed in Fig. 8A, the distributions of innovation values are long-tailed, meaning that a few posts have a much larger innovative content than the others: high innovation is rare, but statistically significant.

We define posts in the top quartile of the innovation distribution as innovative. Then, referring to the definition of activity shells introduced in Sect. 3.1, we examine which actors lead innovation. Since the groups vary in size, we compare the number of innovations observed in each class with their multinomial expectation, namely the probability that a post is innovative (25%) multiplied by the number of posts produced by the group. We calculate the z-score between the observed and expected values. While the previous results showed a fairly homogeneous behavior across the different projects, here we observe significant differences. In projects 1, 4, and 8, the elite produces more innovation than expected. In project 15, the first shell is the main driver of innovation. Finally, in project 5, the peripheral shells are the largest producers of innovation. This result highlights that in large-scale collaborations no rule determines a priori who will be the main innovators. An innovator can be a member of the hyper-active elite, but sometimes serendipitous interactions of peripheral participants can also have a large impact on the discovery process: an isolated contribution of an occasional participant can be responsible for opening a large adjacent possible and giving a new direction to the work.
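The comparison between observed and expected numbers of innovative posts can be sketched as below. The text does not state how the standard deviation entering the z-score is obtained; this example assumes the binomial value \(\sqrt{n\,p\,(1-p)}\) for a group with *n* posts, and uses the empirical share of innovative posts as *p* (equal to 0.25 when there are no ties at the quartile). The function name and group labels are hypothetical.

```python
import numpy as np

def innovation_zscores(innovation, group, top_fraction=0.25):
    """z-score of the number of innovative posts in each group of authors.

    innovation: one innovation value per post
    group:      label of the author's activity class for each post
    """
    innovation = np.asarray(innovation, dtype=float)
    group = np.asarray(group)
    threshold = np.quantile(innovation, 1.0 - top_fraction)
    is_innovative = innovation > threshold
    p = is_innovative.mean()                 # empirical share of innovative posts
    zscores = {}
    for g in np.unique(group):
        mask = group == g
        n = mask.sum()                       # posts written by this group
        observed = is_innovative[mask].sum()
        expected = n * p
        std = np.sqrt(n * p * (1.0 - p))     # binomial standard deviation (assumption)
        zscores[g.item()] = (observed - expected) / std
    return zscores
```

A positive z-score means that the group produces more innovative posts than its share of the total activity would suggest.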