Impact of network density on cascade size and community growth

Abstract

This paper addresses two critical questions related to (1) the co-evolutionary dynamics between information diffusion and network topology and (2) the relationship between viral content and social reinforcement. A recurrence relation model is developed to formulate the growth dynamics of a community centered on a dominant user using the tweet-retweet-follow (TRF) and exogenous link-creation events. This model illustrates several fundamental relations among parameters and quantities that are critical to answering the questions. The model reproduces social reinforcement and structural trapping effects, including empirical evidence that viral content does not require strong social reinforcement. In addition, the model demonstrates that the community growth rate is influenced not only by the rate of retweets but also by the network density of the community.

Introduction

Online social networks (OSNs) have become indispensable vehicles of communication, information access, and advertisement. Content on OSNs may have economically, politically, or socially strong influences when shared among a large number of OSN users. Numerous papers in the literature have documented the manner in which such phenomena emerge; however, the mechanism that causes the phenomena may not be completely understood. This paper presents new insights into this field of study, shedding light on certain mechanisms that work in OSNs.

Every once in a while, large-scale information cascades emerge. Because of their rare occurrence, statistical or machine learning approaches present risks associated with overdependence on small samples. Therefore, analytical approaches must verify pieces of empirical evidence and derive unknown causal relationships based on the evidence. This paper contributes to the literature in the field of information diffusion by answering two important questions.

Question 1

(co-evolution) Does the information diffusion co-evolve with the network topology?

It has been established that the underlying network topology affects information diffusion. Conversely, can information diffusion significantly alter the topology? To answer this question, this paper evaluates the community growth rate while a diffusion process is underway and demonstrates that the growth rate is greatly influenced by the network density of the community.

Question 2

(viral content) Why does viral content not require strong social reinforcement?

The seminal paper by Weng et al. (2013) presented a discussion on several community-related features that exhibit significant influences on information diffusion, including social reinforcement, which indicates that each additional exposure significantly increases the chance of adoption. One of the key findings of the paper was that non-viral content requires strong social reinforcement, whereas viral content does not. From an analytical perspective, this paper illustrates that large-scale cascades emerge only if the network density of the community is low.

These questions are directly related to improvements in future cascade-size prediction. Therefore, answering these questions can help, for example, develop more effective viral marketing strategies and realize more efficient content delivery network (CDN) resource deployment for viral content.

The model in this paper presents a community structure for revealing the mechanisms that yield large-scale information cascades because the community-related features are considered to exhibit a strong influence on future content popularity (Weng et al. 2014). An important factor that must be considered while designing an analytical model is that deviations from the empirical evidence must be small. Based on the evidence that a large number of adoptions occurred within only one degree of a few dominant individuals (Goel et al. 2012), this paper presents a scenario in which one dominant user, who has a significant influence on some online application or some social issues, is followed by a large number of Twitter users. This paper incorporates link-creation events, called tweet-retweet-follow (TRF), into the model because TRF link-creation events are orders of magnitude more likely than link-creation events caused by exogenous reasons (Antoniades and Dovrolis 2015). In addition, the paper also presents a discussion on the effects of exogenous link-creation events.

This paper focuses on the two Twitter features mentioned previously (a dominant user and TRF events) and emphasizes their importance by reproducing the community-related effects (social reinforcement and structural trapping) and the empirical evidence that non-viral content requires strong social reinforcement whereas viral content does not (Question 2). The paper then discusses how the parameters in the model, particularly the network density, relate to the two questions.

The paper is organized as follows. “Related work” section presents the related work. “The model” section defines the retweet diffusion model, and “Results” section derives the fundamental relations among parameters and quantities based on the computational results, and presents some insights into Questions 1 and 2. “The extended model” section extends the model by detailing user behavior. “Discussion” section presents a discussion on the implications of the findings and the limitations of this approach. “Conclusions” section concludes the paper.

Related work

Information diffusion on OSNs has been a subject of intensive study. It has been widely recognized that information diffusion is a more complex contagion than the spread of infectious diseases. Several empirical studies indicate that connectivity distributions among online users are highly inhomogeneous (Goel et al. 2012; Goel et al. 2015; Bose and Shin 2008). Prior studies have examined the influence of underlying network properties, such as scale-free, small-world, and community structures, on the process of information diffusion under the assumption that the networks are static (Weng et al. 2013; Moore and Newman 2000; Watts 2002; Moreno et al. 2004). The influence maximization problem in social networks, under a variety of influence mechanisms and network properties, is an ongoing focus of research (Kempe et al. 2003; 2005; Aral and Dhillon 2018).

Current OSNs include a variety of communities such as language communities (Jin 2017); therefore, many studies have examined or applied the characteristics of information diffusion processes as information is transmitted within/across OSN communities. For example, the authors in (Li et al. 2015) categorized nodes into six community-aware types and evaluated their diffusion capabilities. This paper investigates the dynamics of a single Twitter community, assuming all community members are homogeneous and uniformly connected with each other.

Recent work (Tur et al. 2018) evaluated the effect of social reinforcement using a threshold model that assumes that additional contacts are not redundant but reinforce the probability of contagion. The model in this paper also assumes that additional retweet arrivals are not redundant; the retweet rate is constant independent of the number of arrivals. A fundamental difference is that the model in Tur et al. (2018) considers static networks. Another similar study (Banisch and Olbrich 2019) introduced positive and negative social feedback, and discussed the role of network density in the emergence of opinion polarization. The model in Banisch and Olbrich (2019) also assumes static underlying networks.

There are numerous studies that focus on predicting large-scale information diffusion (Weng et al. 2013; Szabo and Huberman 2010; Cheng et al. 2014; Zhao et al. 2015; Cheung et al. 2017; Krishnan et al. 2016). Incorporating the community structure into diffusion models is expected to improve model accuracy (Bao et al. 2017). Several studies utilize community properties for prediction, such as predicting the future popularity of a meme (Weng et al. 2014) and the virality timing of content (Junus et al. 2015). Contrary to these results, this paper discusses the intrinsic difficulty associated with the prediction.

Moreover, there are studies on the dynamic nature of the OSNs (Leskovec et al. 2008; Myers and Leskovec 2014). The work in (Myers and Leskovec 2014) empirically demonstrated that large information diffusion has the potential to generate sudden influxes of edge creation or deletion events. In contrast to the assumption that the network topology is static while a diffusion process is ongoing, some papers have attempted to understand the diffusion phenomena on the basis of the co-evolutionary dynamics, where network topology and information diffusion influence each other reciprocally. In (Weng et al. 2013), several link-creation mechanisms were introduced and their roles in the diffusion process were examined. The authors in Farajtabar et al. (2017) formulated a stochastic process expressing the joint dynamics of information diffusion and network evolution. In Antoniades and Dovrolis (2015), the authors studied the importance of a link-creation event called tweet-retweet-follow (TRF). They demonstrated that TRF events can create dense user communities. There have been several studies that focused on link-creation events such as triadic closure (Leskovec et al. 2008), shortcuts (Weng et al. 2013), homophily (Hours et al. 2016; Aragón et al. 2017), and social influence (Aragón et al. 2017). This paper formulates a retweet diffusion model based on TRF events because the events express general content sharing and link creation dynamics on Twitter. Instead of introducing stochastic processes or differential equations, this paper uses recurrence relations to express TRF dynamics because the recurrence relations derived are simple and ensure extensibility.

The model

The model includes an initiator, a large number of Twitter followers of the initiator (referred to as supporters), and Twitter users who are not supporters (referred to as non-supporters). The model assumes that supporters are closely related to each other on the basis of the follower-followee relationship and form a community. Therefore, the model is a community centered on the initiator. This section introduces recurrence relations to express the retweet propagation phenomena that originate from the initiator to supporters and non-supporters according to the TRF events.

Table 1 presents the parameter values in the model. Ni denotes the number of supporters at the time the initiator tweets, Nf is the number of followers a user (except the initiator) has, and α∈(0,1] is the ratio of supporters among the Nf followers of a supporter; e.g., α=0 (α=1) indicates that all Nf followers are non-supporters (supporters). Parameter α is an indicator of the network density of the community because a higher α implies a higher network density. Therefore, α is also referred to as the network density indicator of the community. This paper fixes the value of Nf at 500 because most of the Twitter analysis reports on the Internet present three-digit numbers for the average number of followers.

Table 1 Parameter values in the model

The model is designed under the following assumptions:

Asm 1

Supporters and non-supporters are homogeneous in that they do not possess any unique attributes.

  • Supporters (non-supporters) retweet with rate λs (λn), where λs>λn.

  • Supporters and non-supporters have the same Nf followers.

  • Supporters have αNf followers who are supporters (0<α≤1), while the followers of non-supporters are all non-supporters.

Asm 2

Users synchronously post retweets after receiving (re)tweets. The synchronization points are called steps.

Asm 3

The community does not have any specific network topology. At each step, αNf followers of each supporter are uniformly selected from all supporters.

Asm 4

All non-supporters who post retweets become supporters (i.e., follow the initiator).

Asm 4 is relaxed in “The extended model” section; however, the assumption provides most of the essential results in this paper. The model is simplified to focus on two Twitter properties: TRF events and locally scale-free structure (one dominant user and other users with a relatively small number of followers). The importance of the properties is confirmed by reproducing the community-related effects (social reinforcement and structural trapping) and (non-)viral content behavior.

Figure 1 illustrates a retweet cascade originated from the initiator at step 0. The recurrence relation model can be described as follows. Let r(i) (\(\bar q(i)\)) be the number of supporters (non-supporters) who retweet at step i. At step 0, the initiator tweets. At step 1, r(1) supporters retweet the tweet. At step 2, r(2)+q(1,1) followers of the r(1) supporters post retweets, where q(i,j) denotes the number of non-supporters who post retweets at step i+j after receiving retweets from q(i,j−1) non-supporters if j>1 or from r(i) supporters if j=1 (see Fig. 1).

Fig. 1

The arrow from r(2) to q(2,1) indicates that after receiving retweets from r(2) supporters at step 2, q(2,1) non-supporters post retweets at the next step

From Asm 4, q(1,1) followers become supporters and the new q(1,1) supporters influence r(3). Therefore, q(i,j) also denotes the number of new supporters at step k>i+j. At step 3, r(3)+q(1,2)+q(2,1) users retweet. At step i, \(r(i)+\bar q(i)\) users retweet, where \(\bar q(1)=0\) and for i>1,

$$\begin{array}{@{}rcl@{}} \bar q(i)=\sum_{k=1}^{i-1} q(k,i-k). \end{array} $$
(1)

Using the parameters in Table 1, r(1)=λsNi, q(1,1)=λn(Nf−αNf)r(1), and q(1,2)=λn(Nf−αNf)q(1,1). In general, for i≥1 and j≥0,

$$\begin{array}{@{}rcl@{}} q(i,j+1) &=& \lambda_{n} (N_{f}-\alpha N_{f}) q(i,j) \\ &=& \lambda^{j+1}_{n} (N_{f}-\alpha N_{f})^{j+1} r(i). \end{array} $$
(2)
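As an illustration, relations (1) and (2) can be evaluated numerically. The following Python sketch is not part of the original model description: the function names `q_chain` and `q_bar` and the sample values of r(i) are hypothetical. It also checks that (1) and (2) together imply the one-step form \(\bar q(i)=\lambda_{n}(N_{f}-\alpha N_{f})(r(i-1)+\bar q(i-1))\), which follows by factoring one power of \(\lambda_{n}(N_{f}-\alpha N_{f})\) out of the sum in (1).

```python
def q_chain(r_i, j, lam_n, n_f, alpha):
    """q(i, j) from (2): j hops through non-supporters, starting from r(i)."""
    g = lam_n * (n_f - alpha * n_f)  # per-hop multiplier
    return (g ** j) * r_i


def q_bar(r, i, lam_n, n_f, alpha):
    """q_bar(i) from (1); r[k-1] holds r(k). The sum is empty for i = 1."""
    return sum(q_chain(r[k - 1], i - k, lam_n, n_f, alpha) for k in range(1, i))


# Hypothetical inputs, chosen only to exercise the formulas.
r = [100.0, 80.0, 40.0, 10.0]      # assumed r(1), ..., r(4)
lam_n, n_f, alpha = 0.0015, 500, 0.2
g = lam_n * (n_f - alpha * n_f)

for i in range(2, 6):
    direct = q_bar(r, i, lam_n, n_f, alpha)
    one_step = g * (r[i - 2] + q_bar(r, i - 1, lam_n, n_f, alpha))
    print(i, round(direct, 4), round(one_step, 4))
```

With these values the per-hop multiplier is below one, so each chain of non-supporter retweets decays geometrically.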

On the other hand, r(i) is computed according to Algorithm 1. The algorithm can be explained as follows. At step i−1, \(r(i-1)+\bar q(i-1)\) supporters retweet and each supporter sends αNf retweets to supporters. Because each retweet message arrives at uniformly selected followers (Asm 3), the probability (Punret) that the k-th retweet arrives at a supporter who has not posted any retweets is given by

$$\begin{array}{@{}rcl@{}} P_{unret} = \frac{N_{i} - r'-\sum_{1\le \ell\le i-1} r(\ell)}{N_{i}-1-(k-1)+\sum_{1\le \ell\le i-1} \bar q(\ell)}, \end{array} $$
(3)

where the numerator denotes the number of supporters who have not posted any retweets, and the denominator denotes the number of supporters who can receive the k-th retweet. In (3), r′ is the number of supporters who have retweeted so far at step i. In the denominator, “−1” indicates that retweeters do not receive their own retweets, and “−(k−1)” indicates that a user does not receive multiple retweets from the same user. Because Punret is also the mean of the binomial distribution B(1,Punret), the mean number of supporters who retweet after receiving the k-th retweet is given by λsPunret. For all \((r(i-1)+\bar q(i-1)) \times \alpha N_{f}\) retweets, λsPunret values are computed and added to r′ in line 5. r(i) is then derived in line 8.

The computation of r(i) and \(\bar q(i)\) stops at i=T+1 if

$$\begin{array}{@{}rcl@{}} r(T+1)+\bar q(T+1)<1, \end{array} $$
(4)

which indicates that the number of users who post retweets at step T+1 is less than one. This paper defines T (steps) as the duration of retweet propagation.
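The recurrences (1)–(4) can be combined into a small simulation. Algorithm 1 itself is not reproduced in this text, so the Python sketch below reconstructs it from the verbal description only and should be read as one possible interpretation. The following choices are assumptions of this sketch, not statements from the model: αNf is rounded to an integer, a fractional number of retweeters is handled by weighting the last retweeter, Punret is clamped to [0,1], and `max_steps` guards against non-termination. All function and variable names are hypothetical.

```python
def step_r(r_hist, q_hist, n_i, n_f, alpha, lam_s):
    """Expected number of supporters retweeting at the next step.

    r_hist[l-1] = r(l) and q_hist[l-1] = q_bar(l) for all previous steps.
    """
    senders = r_hist[-1] + q_hist[-1]          # retweeters at the previous step
    per_sender = int(round(alpha * n_f))       # assumption: integer fan-out
    sum_r, sum_q = sum(r_hist), sum(q_hist)
    whole = int(senders)
    weights = [1.0] * whole
    if senders - whole > 0:
        weights.append(senders - whole)        # assumption: fractional sender
    r_new = 0.0                                # plays the role of r' in (3)
    for w in weights:
        for k in range(1, per_sender + 1):
            num = n_i - r_new - sum_r
            den = n_i - 1 - (k - 1) + sum_q
            p = min(1.0, max(0.0, num / den)) if den > 0 else 0.0
            r_new += w * lam_s * p
    return r_new


def simulate(n_i, n_f, alpha, lam_s, lam_n, max_steps=10_000):
    """Iterate until r + q_bar < 1 (condition (4)); return (R, Q, T)."""
    g = lam_n * (n_f - alpha * n_f)
    r = [lam_s * n_i]                          # r(1)
    q = [0.0]                                  # q_bar(1) = 0
    while r[-1] + q[-1] >= 1 and len(r) < max_steps:
        q_next = g * (r[-1] + q[-1])           # one-step form of (1) and (2)
        r_next = step_r(r, q, n_i, n_f, alpha, lam_s)
        r.append(r_next)
        q.append(q_next)
    # T is the last step at which at least one user retweets.
    t = len(r) - 1 if r[-1] + q[-1] < 1 else len(r)
    big_r = sum(r[:t]) / n_i
    big_q = sum(q[:t]) / n_i
    return big_r, big_q, t


# Small hypothetical parameters so that the loops finish quickly.
R, Q, T = simulate(n_i=200, n_f=20, alpha=0.5, lam_s=0.1, lam_n=0.01)
print(R, Q, T)
```

The nested loops cost on the order of (number of retweeters) × αNf operations per step, so parameter values such as Ni=\(10^{4}\) and Nf=500 would be noticeably slower in pure Python than the toy values used here.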

Results

Sparse community effect

This section discusses the impact of the ratio α on \(R=\sum _{k=1}^{T} r(k)/N_{i}\), \(Q=\sum _{k=1}^{T} \bar q(k)/N_{i}\), and T, where R denotes the relative number of original supporters (supporters at step 0) who have retweeted until step T and Q is the relative number of non-supporters who become supporters until step T. R satisfies 0≤R≤1 and R=1 indicates that all original supporters have retweeted. Q can grow larger than one and Q>1 means that the number of supporters has more than doubled. Figure 2 illustrates R, Q, and T as functions of α. From Fig. 2(upper left), R exhibits a curve that is similar to a sigmoid curve, while Fig. 2(upper right) and (lower left) illustrate that Q and T have uni-modal curves. Therefore, there are three characteristic ratios: αR, αQ, and αT. αQ and αT denote the ratios at which Q and T exhibit peaks, respectively. αR is the smallest ratio satisfying |R−1|<δ, where δ is a positive and sufficiently small real number (e.g., \(10^{-4}\)).
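Given curves sampled on a grid of α values, the three characteristic ratios can be read off mechanically. The sketch below is illustrative only: the function name `characteristic_ratios` and the sample curves are hypothetical and are not taken from Fig. 2. It takes αQ and αT as the grid points where Q and T peak, and αR as the smallest grid point with |R−1|<δ.

```python
def characteristic_ratios(alphas, r_vals, q_vals, t_vals, delta=1e-4):
    """Return (alpha_R, alpha_Q, alpha_T) from curves sampled at `alphas`.

    alpha_R is None if R never comes within delta of 1 on the grid.
    """
    alpha_q = alphas[max(range(len(alphas)), key=lambda i: q_vals[i])]
    alpha_t = alphas[max(range(len(alphas)), key=lambda i: t_vals[i])]
    alpha_r = next(
        (a for a, r in sorted(zip(alphas, r_vals)) if abs(r - 1) < delta),
        None,
    )
    return alpha_r, alpha_q, alpha_t


# Made-up sample curves with the qualitative shapes described in the text.
alphas = [0.1, 0.2, 0.3, 0.4, 0.5]
r_vals = [0.2, 0.6, 0.95, 0.99995, 1.0]   # sigmoid-like
q_vals = [1.0, 3.0, 2.0, 1.0, 0.5]        # uni-modal
t_vals = [50, 30, 20, 10, 5]              # peak at the smallest alpha here

alpha_r, alpha_q, alpha_t = characteristic_ratios(alphas, r_vals, q_vals, t_vals)
print(alpha_r, alpha_q, alpha_t)
```

The resolution of the α grid bounds the accuracy of all three ratios, so a finer grid is needed near the peaks if the values are to be compared across parameter settings.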

Fig. 2

Upper left: the social reinforcement and sparse community effects (αR↓ as λs↑ or λn↑) appear. Upper right: the structural trapping and sparse community effects (αQ↓ as λs↑ or λn↑) and “Q(αQ)↑ as λs↑ or λn↑” hold. Lower left: the sparse community effect (αT↓ as λs↑ or λn↑) holds, together with “T(αT)↓ as λs↑” and “T(αT)↑ as λn↑”. Lower right: relations “αT<αQ<αR” and “T(αT)↑ as Ni↑” hold. Upper left, upper right, and lower left: Ni=\(10^{4}\). Lower right: Ni=\(10^{4}\), \(10^{5}\), λs=0.01, and λn=0.0015. R and Q curves at Ni=\(10^{4}\) in the lower right graph completely overlap with those at Ni=\(10^{5}\)

Through intensive calculations, the following four fundamental relations are derived. The first two findings are related to the behavior of αR,αQ, and αT. From Fig. 2(lower right),

$$\begin{array}{@{}rcl@{}} \alpha_{T} < \alpha_{Q} < \alpha_{R}. \end{array} $$
(5)

The second finding is that αT, αQ, and αR decrease as λs or λn increases (see Fig. 2(upper left), (upper right), and (lower left)); i.e.,

$$\begin{array}{@{}rcl@{}} \alpha_{T} \downarrow, \alpha_{Q} \downarrow, \alpha_{R} \downarrow & \text{as} &\lambda_{s}\uparrow \text{or}\ \lambda_{n} \uparrow. \end{array} $$
(6)

This paper terms this phenomenon the sparse community effect. The effect implies that sparse communities, in which supporters have many (few) links to non-supporters (supporters), yield the largest cascades (R+Q) and the most long-lived cascades (T) when the rates of retweet are high. The sparse community effect presents a case in which the network structure affects content virality.

The third finding is that Q(αQ) rises as λs or λn grows; whereas T(αT) falls (rises) as λs (λn) grows; i.e.,

$$\begin{array}{@{}rcl@{}} {T(\alpha_{T}) \downarrow, Q(\alpha_{Q}) \uparrow} & \text{as} &{\lambda_{s} \uparrow}, \end{array} $$
(7)
$$\begin{array}{@{}rcl@{}} {T(\alpha_{T}) \uparrow, Q(\alpha_{Q}) \uparrow} & \text{as} &{\lambda_{n}\uparrow}. \end{array} $$
(8)

The fourth finding is illustrated in Fig. 2(lower right). T(αT) increases with Ni; i.e.,

$$\begin{array}{@{}rcl@{}} T(\alpha_{T}) \uparrow & \text{as} &{N_{i} \uparrow}. \end{array} $$
(9)

Note that in Fig. 2(lower right), R and Q do not change with Ni (these are quantities relative to Ni).

Structural trapping and social reinforcement

Figure 2 presents other findings that agree with the empirical phenomena. It can be observed from Fig. 2 (upper right) that Q approaches zero as α approaches one; i.e.,

$$\begin{array}{@{}rcl@{}} {Q \downarrow 0} & \text{as} &{\alpha \uparrow 1}, \end{array} $$
(10)

which represents the structural trapping effect (Weng et al. 2013) because a larger α, indicating denser communities with fewer outgoing links, lowers Q, indicating a smaller amount of propagation outside of the community. In contrast, Fig. 2 (upper left) illustrates the social reinforcement effect, which indicates that in the presence of high clustering, any additional adoption is likely to produce more multiple exposures than in the case of low clustering (Weng et al. 2013), because R is a strictly increasing function of α; i.e.,

$$\begin{array}{@{}rcl@{}} {R \uparrow} & \text{as} &{\alpha \uparrow}. \end{array} $$
(11)

Answering two questions

Let us first discuss Question 1 (co-evolution) in “Introduction” section using the community growth rate measured by Q/T. A large Q/T implies dynamic community growth. The following demonstrates under what conditions the growth is dynamic (or static). Because αT<αQ by (5), T falls from its peak while Q rises toward its peak as α increases from αT to αQ; therefore, the growth rate Q/T must vary greatly over the range [αT,αQ]. It can be anticipated that

$$\begin{array}{@{}rcl@{}} \frac{Q(\alpha_{T})}{T(\alpha_{T})} < \frac{Q(\alpha_{Q})}{T(\alpha_{Q})}. \end{array} $$
(12)

Figure 3 compares the growth rates at α=0.05, which is close to αT, and at α=0.2(≈αQ). From the figure, Q(0.2)/T(0.2) is approximately 10 times greater than Q(0.05)/T(0.05).

Fig. 3

Growth rate Q/T depends on α,λs, and Ni. \(Q(0.2)/T(0.2) \approx 10 \times Q(0.05)/T(0.05), Q(0.05)/T(0.05)|_{\lambda _{s}=0.05} \approx 30 \times Q(0.05)/T(0.05)|_{\lambda _{s}=0.01}\), and \(Q/T|_{N_{i}=10^{4}}>Q/T|_{N_{i}=10^{5}}\). λn=0.0015

Meanwhile, (7) suggests that Q/T monotonically increases with λs; i.e.,

$$\begin{array}{@{}rcl@{}} {Q/T \uparrow} & \text{as} &{\lambda_{s} \uparrow}. \end{array} $$
(13)

Figure 3 illustrates the case where λs increases from 0.01 to 0.05. From the figure, Q(0.05)/T(0.05) at λs=0.05 is approximately 30 times greater than that at λs=0.01. This implies that an attractive tweet from the initiator has the potential to greatly enlarge the growth rate. Note from Fig. 3 that Q(0.05)/T(0.05) at Ni=\(10^{4}\) and λs=0.05 is approximately 0.06, and this implies that after 10 steps, the average community size increases by 60%.

Because Q is independent of Ni (see Fig. 2(lower right)), it follows from (9) that

$$\begin{array}{@{}rcl@{}} {Q/T \downarrow} & \text{as} &{N_{i} \uparrow}. \end{array} $$
(14)

In other words, communities with a large number of supporters grow slowly. Figure 3 illustrates that Q/T at Ni=\(10^{5}\) is slightly smaller than that at Ni=\(10^{4}\). Accordingly, the growth rate is particularly sensitive to α and λs.

Let us next discuss Question 2 (viral content) using the cascade size W, which is measured by W=R+Q. From (6)–(8),

$$\begin{array}{@{}rcl@{}} Q(\alpha_{Q}) \uparrow, \alpha_{Q} \downarrow, \alpha_{R} \downarrow \text{as } \lambda_{s} \uparrow \text{or }\ \lambda_{n} \uparrow, \end{array} $$
(15)

which implies that cascade size W exhibits a higher peak at a smaller α as λs or λn increases; i.e.,

$$\begin{array}{@{}rcl@{}} {W(\alpha_{W}) \uparrow, \alpha_{W} \downarrow \text{as } \lambda_{s} \uparrow \text{or}\ \lambda_{n} \uparrow}, \end{array} $$
(16)

where αW is the ratio α that maximizes W. Namely, a large number of users post retweets (i.e., viral content emerges) at small α values where the social reinforcement effect is considered to be small.

Let us intuitively explain why (16) occurs. Community members (i.e., supporters) exhibit a greater tendency to post retweets than non-members (i.e., non-supporters) because of the social reinforcement effect and λs>λn. When the value of λs is high, for example, members easily retweet; therefore, by reducing α, more retweets must be sent outside of the community, providing non-members more chances to retweet; otherwise, the structural trapping effect becomes more pronounced. In other words, to maximize W, the ratio αW controls the numbers of retweet messages flowing inside and outside the community such that fewer (more) messages flow inside (outside) the community (i.e., αW is reduced) when the value of λs or λn increases (the sparse community effect).

The extended model

Detailed user behavior

Asm 4 in “The model” section assumes that all non-supporters who post retweets become supporters. This section relaxes the assumption and introduces exogenous link-creation, in which non-supporters become supporters without receiving retweets, by replacing Asm 4 with the following assumptions:

Asm 5

The ratio of non-supporters who become supporters after retweeting βr (referred to as the retweeter’s follow rate), which is given by

$$\begin{array}{@{}rcl@{}} \beta_{r}=\frac{\#\{x \in S_{n} |\textit{x}\ \text{posts a retweet and becomes a supporter}\}} {\#\{x \in S_{n} |\text{\textit{x} posts a retweet}\}}, \end{array} $$
(17)

satisfies 0<βr≤1, where Sn is the set of non-supporters.

Asm 6

The ratio of non-supporters who become supporters without retweeting βn (referred to as the exogenous link-creation indicator), which is given by

$$\begin{array}{@{}rcl@{}} \beta_{n} =\frac{\#\{x \in S_{n} |\textit{x}\ \text{becomes a supporter and does not retweet}\}} {\#\{x \in S_{n} |\text{\textit{x} becomes a supporter}\}}, \end{array} $$
(18)

satisfies 0≤βn<1.

Asm 7

The ratio of non-supporters who become supporters without retweeting after receiving retweets c, which is given by

$$\begin{array}{@{}rcl@{}} c =\frac{\#\{x \in S_{n} |{x} \text{ receives a retweet, becomes a supporter, and does not retweet}\}} {\#\{x \in S_{n} |{x} \text{ becomes a supporter}\}}, \end{array} $$
(19)

is fixed and 0≤c≤βn.

Asm 4 corresponds to βr=1 and βn=0. Constant c in Asm 7 does not affect the output of the extended model, whereas Asms 5 and 6 require the following modifications to Algorithm 1:

  • In line 2 in Algorithm 1, \(r(i-1)+\bar q(i-1)\) (the number of supporters who retweet at step i−1) is replaced with \(r(i-1)+\beta _{r} \bar q(i-1)\).

  • In line 4, Punret is revised as

    $$\begin{array}{@{}rcl@{}} P_{unret} &=&\frac{N_{i} - r' - \sum_{1\le \ell\le i-1} r(\ell) + \beta_{r}\frac{\beta_{n}}{1-\beta_{n}} \sum_{1\le \ell\le i-1} \bar q(\ell)} {N_{i}-1-(k-1)+\beta_{r} \left(1+\frac{\beta_{n}}{1-\beta_{n}}\right) \sum_{1\le \ell\le i-1} \bar q(\ell)}. \end{array} $$
    (20)
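To make the change from (3) to (20) concrete, the following Python sketch evaluates both forms of Punret for the same state. The function names and the numerical state are hypothetical; the formulas are transcribed from (3) and (20). The sketch also confirms that (20) reduces to (3) when βr=1 and βn=0, which is the setting of Asm 4.

```python
def p_unret_basic(n_i, r_new, sum_r, sum_q, k):
    """Punret from (3); r_new is r', sum_r and sum_q are sums over past steps."""
    return (n_i - r_new - sum_r) / (n_i - 1 - (k - 1) + sum_q)


def p_unret_extended(n_i, r_new, sum_r, sum_q, k, beta_r, beta_n):
    """Punret from (20); requires 0 < beta_r <= 1 and 0 <= beta_n < 1."""
    ratio = beta_n / (1 - beta_n)
    num = n_i - r_new - sum_r + beta_r * ratio * sum_q
    den = n_i - 1 - (k - 1) + beta_r * (1 + ratio) * sum_q
    return num / den


# Hypothetical state, chosen only for illustration.
state = dict(n_i=1000, r_new=5.0, sum_r=100.0, sum_q=50.0, k=3)

basic = p_unret_basic(**state)
same = p_unret_extended(**state, beta_r=1.0, beta_n=0.0)
relaxed = p_unret_extended(**state, beta_r=0.5, beta_n=0.2)
print(basic, same, relaxed)
```

In this example the relaxed setting adds 6.25 expected non-retweeting new supporters to the numerator and replaces 50 with 31.25 in the denominator.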

The following explains (20). Figure 4 illustrates five non-supporter groups classified on the basis of three actions: receive, retweet, and follow. In the figure, Nk denotes the size of the k-th group at each step. Note that the previous model corresponds to the case of N2=N4=N5=0. From (17)–(19),

$$\begin{array}{@{}rcl@{}} \beta_{r} &= &\frac{N_{3}}{N_{2}+N_{3}}, \end{array} $$
(21)
$$\begin{array}{@{}rcl@{}} \beta_{n} &= & \frac{N_{4}+N_{5}}{N_{3}+N_{4}+N_{5}}, \end{array} $$
(22)
$$\begin{array}{@{}rcl@{}} c &= & \frac{N_{4}}{N_{3}+N_{4}+N_{5}}. \end{array} $$
(23)

Fig. 4

Non-supporters are classified into five groups on the basis of three actions (receive, retweet, and follow) at each step. N1,N2,…,N5 denote the sizes of these groups (e.g., N2 non-supporters post retweets and do not become supporters)

Because the number of non-supporters who retweet at step ℓ is \(\bar q(\ell)\),

$$\begin{array}{@{}rcl@{}} N_{2}+N_{3}=\bar q(\ell). \end{array} $$
(24)

As a modification, the numerator in (3) must include N4+N5 (the number of new supporters who have not retweeted) of all previous steps and the denominator requires replacement of N3, which corresponds to \(\bar q(\ell)\), with N3+N4+N5 (the number of new supporters) for all previous steps. From (21), (22), and (24),

$$\begin{array}{@{}rcl@{}} N_{4}+N_{5} &=& \frac{\beta_{n}}{1-\beta_{n}} \beta_{r} \bar q(\ell), \end{array} $$
(25)
$$\begin{array}{@{}rcl@{}} N_{3}+N_{4}+N_{5} &=& \beta_{r} \bar q(\ell) + \frac{\beta_{n}}{1-\beta_{n}} \beta_{r} \bar q(\ell). \end{array} $$
(26)

Therefore, compared with (3), the numerator in (20) includes a new term \(\beta _{r}\frac {\beta _{n}}{1-\beta _{n}} \sum _{1\le \ell \le i-1} \bar q(\ell)\), and in the denominator, \(\sum _{1\le \ell \le i-1} \bar q(\ell)\) is replaced by \(\beta _{r} \left (1+\frac {\beta _{n}}{1-\beta _{n}}\right) \sum _{1\le \ell \le i-1} \bar q(\ell)\).

Let us next consider exogenous link-creation events (Antoniades and Dovrolis 2015), which indicate that non-supporters become supporters without receiving retweets. Therefore, the ratio of exogenous link-creation events in the link-creation events βe is given by

$$\begin{array}{@{}rcl@{}} \beta_{e} &= & \frac{N_{5}}{N_{3}+N_{4}+N_{5}}. \end{array} $$
(27)

Using (22), (23), and (27),

$$\begin{array}{@{}rcl@{}} \beta_{n} = \beta_{e} + c, \end{array} $$
(28)

which indicates that βe linearly increases with βn. Therefore, βn is referred to as the exogenous link-creation indicator.
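The relations among the group sizes and the ratios can be checked with a few lines of Python. In the sketch below, the function name `follow_ratios` and the group sizes are hypothetical; the formulas are (21)–(23) and (27). The checks correspond to (24)–(26) and (28).

```python
import math


def follow_ratios(n2, n3, n4, n5):
    """Return (beta_r, beta_n, c, beta_e) from the group sizes of Fig. 4."""
    new_supporters = n3 + n4 + n5
    beta_r = n3 / (n2 + n3)               # (21)
    beta_n = (n4 + n5) / new_supporters   # (22)
    c = n4 / new_supporters               # (23)
    beta_e = n5 / new_supporters          # (27)
    return beta_r, beta_n, c, beta_e


# Made-up group sizes for a single step.
n2, n3, n4, n5 = 30, 10, 4, 6
beta_r, beta_n, c, beta_e = follow_ratios(n2, n3, n4, n5)
q_bar = n2 + n3                           # (24)

# (25): new supporters who did not retweet.
lhs_25 = beta_n / (1 - beta_n) * beta_r * q_bar
# (26): all new supporters.
lhs_26 = beta_r * q_bar + lhs_25

print(beta_r, beta_n, c, beta_e, lhs_25, lhs_26)
print(math.isclose(beta_n, beta_e + c))   # (28)
```

Because c is fixed (Asm 7), varying βn in this setting is equivalent to varying βe, which is why only βn needs to appear in (20).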

The new assumptions redefine R (the relative number of supporters who retweet until step T), Q (the relative number of non-supporters who become supporters until T), and cascade size W (the relative number of users who retweet until T). Let \(\bar Q= \sum _{k=1}^{T} \bar q(k)\) and \(\bar R= \sum _{k=1}^{T} r(k)\). They are written as

$$\begin{array}{@{}rcl@{}} R&=&\bar R \left(N_{i}+ \beta_{r} \frac{\beta_{n}}{1-\beta_{n}} \bar Q \right)^{-1}, \end{array} $$
(29)
$$\begin{array}{@{}rcl@{}} Q&=& \beta_{r} \left(1+\frac{\beta_{n}}{1-\beta_{n}} \right) \bar Q/N_{i}, \end{array} $$
(30)
$$\begin{array}{@{}rcl@{}} W&=& \left(\bar Q+\bar R \right)/N_{i}, \end{array} $$
(31)

where R is normalized such that it satisfies 0≤R≤1, and Q and W are quantities relative to Ni. Note that Q and \(\bar Q/N_{i}\) are different; \(\bar Q/N_{i}\) denotes the relative number of non-supporters who post retweets until T. Note also that the definition of W is the same as before.
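The redefinitions (29)–(31) can be written as a small helper. The Python sketch below is illustrative: the function name `summary_quantities` and the input totals are hypothetical. It also checks that, for βr=1 and βn=0, the quantities reduce to those of the previous model, where W=R+Q.

```python
def summary_quantities(r_bar, q_bar, n_i, beta_r, beta_n):
    """Return (R, Q, W) from (29)-(31).

    r_bar and q_bar are the totals of r(k) and q_bar(k) over steps 1..T.
    """
    ratio = beta_n / (1 - beta_n)
    big_r = r_bar / (n_i + beta_r * ratio * q_bar)   # (29)
    big_q = beta_r * (1 + ratio) * q_bar / n_i       # (30)
    big_w = (q_bar + r_bar) / n_i                    # (31)
    return big_r, big_q, big_w


# Made-up totals for illustration.
r_bar, q_bar, n_i = 6000.0, 20000.0, 10_000

base = summary_quantities(r_bar, q_bar, n_i, beta_r=1.0, beta_n=0.0)
relaxed = summary_quantities(r_bar, q_bar, n_i, beta_r=0.5, beta_n=0.2)
print(base)     # previous model
print(relaxed)  # extended model
```

Note that W depends only on the number of retweeters, so it is unchanged between the two calls, whereas R and Q both change with βr and βn.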

Relaxing sparse community effect

Figure 5 exhibits the effects of the retweeter’s follow rate βr and the exogenous link-creation indicator βn. In Fig. 5(left) and (right), the solid lines correspond to the previous model (i.e., βr=1 and βn=0). Because these parameters directly affect Q, it is natural that Q must increase with βr or βn. The effects of βr and βn can be summarized as follows:

$$\begin{array}{@{}rcl@{}} \alpha_{T} \downarrow, \alpha_{Q} \downarrow, \alpha_{R} \downarrow & \text{as} &{\beta_{r} \uparrow \text{or}\ \beta_{n} \downarrow}, \end{array} $$
(32)
$$\begin{array}{@{}rcl@{}} {T(\alpha_{T}) \uparrow, Q(\alpha_{Q}) \uparrow} & \text{as} &{\beta_{r} \uparrow \text{or}\ \beta_{n} \uparrow}. \end{array} $$
(33)

Fig. 5

Left: “αT↓, αQ↓, αR↓, T(αT)↑, and Q(αQ)↑ as βr↑” hold. Right: an increase in βn greatly enlarges Q(αQ). “αT↑, αQ↑, αR↑, T(αT)↑, and Q(αQ)↑ as βn↑” hold. Both figures show “αT<αQ<αR.” For all figures, solid lines correspond to the previous model (βr=1,βn=0), λs=0.01, λn=0.0015, and Ni=\(10^{4}\)

From (32), an increase in βr strengthens the sparse community effect, whereas an increase in βn weakens it. The following partly explains (32). An increase in βr or a decrease in βn means that the percentage of retweeters in the community increases; therefore, to enlarge R and Q, more retweets must flow outside the community, which implies that αQ and αR decrease.

The extended model also satisfies the fundamental relations (5)-(9) in “Results” section and exhibits the sparse community, social reinforcement, and structural trapping effects. Therefore, the extended model provides the same discussion on Questions 1 and 2 as in “Answering two questions” subsection.

Figure 6 illustrates the growth rate (Q/T) and the cascade size (W) for different βr and βn values. It can be observed from the figure that

$$\begin{array}{@{}rcl@{}} Q/T \uparrow & \text{as} &{\beta_{r} \uparrow \text{or}\ \beta_{n} \uparrow}, \end{array} $$
(34)
$$\begin{array}{@{}rcl@{}} W \uparrow & \text{as} &{\beta_{r} \uparrow \text{or}\ \beta_{n} \uparrow}. \end{array} $$
(35)

Fig. 6

Upper left: “Q/T↑ as βr↑” and \(Q(0.1)/T(0.1)|_{\beta _{r}=1} \approx 50 \times Q(0.1)/T(0.1)|_{\beta _{r}=0.1}\) hold. Upper right: “Q/T↑ as βn↑” holds. Lower left: “W↑ as βr↑” holds. Lower right: βn has a large impact on W. “W↑ as βn↑” holds. For all figures, solid lines correspond to the previous model (βr=1,βn=0), λs=0.01, λn=0.0015, and Ni=\(10^{4}\)

It is worth noting that in Fig. 6(upper left), Q(0.1)/T(0.1) at βr=1 is approximately 50 times greater than that at βr=0.1; i.e., a low retweeter’s follow rate makes networks static. In addition, from Fig. 6(lower right), the high peak of W at βn=0.4 suggests another mechanism that creates large-scale cascades.

Large cascade creation mechanisms

Figure 7 illustrates contour lines of cascade size W and growth rate Q/T on the (α, λs) plane. Figure 7(upper left) exhibits the case wherein large-scale cascades emerge because W=16 implies W×Ni=160,000 retweets. This large cascade creation mechanism at βn=0 is referred to as the endogenous mechanism. The figure exhibits that

$$\begin{array}{@{}rcl@{}} \alpha_{W} <0.01. \end{array} $$
(36)
Fig. 7. Upper left: contour lines of W at λn=0.002 and βn=0. Large-scale cascades emerge at α<0.01. Upper right: contour lines of W at λn=0.001 and βn=0. Large-scale cascades do not appear. Lower left: contour lines of W at λn=0.001 and βn=0.7. Large-scale cascades emerge at αW∈(0.05, 0.07). They require somewhat weak social reinforcement. Lower right: contour lines of Q/T at λn=0.002 and βn=0. Q/T is small if λs and α are both small. For all panels, βr=0.5 and \(N_{i}=10^{4}\).

Inequality (36) coincides with (16). The figure also indicates that W significantly decreases at low ratios α as λs decreases to 0.005. This result also agrees with the sparse community effect because (6) is equivalent to

$$\begin{array}{@{}rcl@{}} \alpha_{Q} \uparrow, \alpha_{R} \uparrow & \text{as} &{\lambda_{s}\downarrow}. \end{array} $$
(37)

It must be noted that the cascade size is determined not by λs (content infectivity) but by α (network structure) when α>0.1.

Figure 7(upper right) is obtained using the same parameter values as Fig. 7(upper left) except that λn is reduced from 0.002 to 0.001. Large-scale cascades do not appear in this case. The maximum W falls to 2.5 and αW increases and satisfies

$$\begin{array}{@{}rcl@{}} 0.3 < \alpha_{W} <0.4, \end{array} $$
(38)

which also coincides with (16). This result implies that large α values in (38) enlarge small- and middle-scale cascades; i.e., non-viral content requires social reinforcement. The result also demonstrates that large-scale cascades require not only a large λs but also a large λn.

Figure 7(lower left) is calculated using the same parameter values as Fig. 7(upper right) except that βn is increased from 0 to 0.7. In this case, large-scale cascades again emerge because the maximum of W is 25. This large cascade creation mechanism is referred to as the exogenous mechanism. From the figure,

$$\begin{array}{@{}rcl@{}} 0.05 < \alpha_{W} <0.07, \end{array} $$
(39)

which indicates that, although αW is larger than in (36), this mechanism still requires only somewhat weak social reinforcement. The upward shift of αW with increasing βn agrees with (32).
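To put the three W panels of Fig. 7 on a common scale, the following sketch converts the W values quoted in the text into retweet counts using W×Ni, in the same way as the W=16 example above. It assumes Ni=10^4, the value implied by that example; the names `Scenario` and `retweets` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass

N_I = 10**4  # N_i, as implied by "W=16 implies 160,000 retweets"


@dataclass(frozen=True)
class Scenario:
    panel: str       # panel of Fig. 7
    lambda_n: float  # value of lambda_n used for the panel
    beta_n: float    # value of beta_n used for the panel
    w: float         # W value quoted in the text for that panel


def retweets(w: float, n_i: int = N_I) -> int:
    """Convert a cascade size W (in units of N_i) into a retweet count."""
    return round(w * n_i)


scenarios = [
    Scenario("upper left", lambda_n=0.002, beta_n=0.0, w=16),
    Scenario("upper right", lambda_n=0.001, beta_n=0.0, w=2.5),
    Scenario("lower left", lambda_n=0.001, beta_n=0.7, w=25),
]

for s in scenarios:
    print(f"{s.panel}: W={s.w} -> {retweets(s.w):,} retweets")
```

Read this way, halving λn at βn=0 reduces the quoted cascade size from 160,000 to 25,000 retweets, whereas raising βn to 0.7 at the lower λn brings it to 250,000.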

Figure 7(lower right) illustrates contour lines of growth rate Q/T obtained using the same parameter values as Fig. 7(upper left). It can be observed that the α-λs plane is separated into dynamic (high Q/T) and static (low Q/T) areas. The static area is confined to a small region where λs and α are both small. This result agrees with (13), Figs. 3, 6(upper left), and 6(upper right). Figures 7(upper left) and 7(lower right) show that W and Q/T are not correlated; they attain their maximum values at different (α,λs) points.

Discussion

Implications of the findings

There are many studies that focus on the prediction of large-scale information cascades. Figure 7(upper left) demonstrates the potential difficulty in predicting such information cascades. From the figure, it can be observed that not only large-scale cascades but also small- and middle-scale cascades appear at low ratios and their future sizes are highly sensitive to α,λs, and λn. The same result can be seen in Figs. 2(upper left) and 2(upper right), where R and Q vary significantly near ratio zero when λs and λn are high. It is not easy to measure these rates accurately at the initial stage of a cascade because the sample size at that stage is small. Even if the rates are accurately measured, the future size will, in general, have a certain probability distribution. For example, W at (α,λs)=(0.01,0.01) in Fig. 7(upper left) is not deterministically given but will take values such as 4, 6, and 8. Furthermore, ratio α, a network density indicator, itself may vary with time (Antoniades and Dovrolis 2015; Oida and Okubo 2019). This paper considers the ratio as time-invariant; however, if TRF events affect the network densities of Twitter communities (Antoniades and Dovrolis 2015), the prediction of viral content becomes even more difficult.
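The difficulty of measuring rates from a small early sample can be illustrated with a rough calculation. The sketch below treats each exposure to a tweet as an independent Bernoulli trial with retweet probability λs=0.01. This is a simplifying assumption made here for illustration, not the estimation procedure of the paper. It reports the relative standard error sqrt(p(1-p)/n)/p of the naive proportion estimate. The function name and the sample sizes are hypothetical.

```python
import math


def relative_standard_error(p: float, n: int) -> float:
    """Relative standard error of a binomial proportion estimated from n trials."""
    return math.sqrt(p * (1.0 - p) / n) / p


lambda_s = 0.01  # retweet rate, treated here as a per-exposure probability
for n in (100, 1_000, 10_000, 100_000):
    rse = relative_standard_error(lambda_s, n)
    print(f"n={n:>7}: relative standard error ~ {rse:.1%}")
```

Under this assumption, an estimate based on 100 observations has a standard error about as large as the rate itself, and roughly 10,000 observations are needed to bring it down to about 10%.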

This paper incorporated exogenous link-creation events into the model and demonstrated that these events have the potential to create large-scale cascades. According to Antoniades and Dovrolis (2015), their occurrence probability is very low (two or three orders of magnitude lower than the occurrence probability of TRF events). However, big data analytical services, such as personalized recommendation in social networking sites, may boost exogenous link creation in the future.

This paper presents a simplified model, focusing on two Twitter properties: the TRF events and the existence of a dominant user with a large number of followers; nevertheless, the model reproduces social reinforcement and structural trapping effects and presents many interesting causal relations related to Questions 1 and 2. This outcome suggests that these properties are important factors for understanding real network dynamics.

Limitations of this approach

The results in the paper are only hypotheses and must be validated by detailed event-driven simulations or empirical evidence. Because of the uniformly-connected homogeneous agent and synchronous retweet assumptions (Asms 1–3), the model presents the average behavior of the retweet diffusion phenomena. To measure the effects of variations in agent behavior or network structure, one approach is to introduce a heterogeneous agent model with some network topological features. Another approach is to extend the model such that communities or sub-communities inside a community have different attribute values. The latter approach minimizes the complexity of the model and is therefore suited to simulate large-scale diffusion processes. Using the recurrence relations and Algorithm 1 in the paper, the heterogeneous (sub)community model can be easily derived.

Conclusions

This paper addressed the following questions: (1) Can the underlying network be considered static while a diffusion process is underway? (2) Why does viral content not require strong social reinforcement? The paper introduced a recurrence relation model that describes the retweet diffusion and community growth dynamics, in which one dominant user (initiator) with a large number of followers posts a tweet, then retweets by followers propagate the tweet, and TRF and exogenous link-creation events generate new followers. Consequently, the community comprising the followers of the initiator grows. The paper focused on the ratio of community members in the followers of each community member (α), which is also a network density indicator of the community. The model not only illustrates the structural trapping and social reinforcement effects but also presents several important relations among three ratios αT, αQ, and αR, three quantities R, Q, and T, and four parameters λs, λn, βr, and βn. Above all, the magnitude relation (αT<αQ<αR), the sparse community effect, and the result that Q(αQ) grows and T(αT) falls as λs increases were directly associated with the two questions.

The computational results derived from the model answered the two questions as follows: (1) The criterion for network variability was the community growth rate. Except for the cases in which the retweeter’s follow rate βr is significantly small or the community size is extremely large, the growth rate is small when retweet rate λs and ratio α are both small. If the condition does not hold, the growth rate may rise by more than 10 times; i.e., the networks are potentially dynamic. Therefore, depending on the evaluation conditions of diffusion models, the effect of topological changes must be considered. (2) There are two mechanisms that create large-scale cascades: the endogenous and exogenous mechanisms. Currently, only the endogenous mechanism is functional in the Twitter network. Both mechanisms demonstrated that the cascade size is maximized at a low ratio α (i.e., at a low network density), at which the social reinforcement effect is considered to be low. Thus, the model reproduced the evidence that viral content does not require strong social reinforcement. This result is closely related to the sparse community effect, which indicates that as the retweet rates increase, the α value that maximizes the cascade size decreases. Therefore, large-scale cascades emerge only if the ratio is low.

References

  1. Antoniades, D, Dovrolis C (2015) Co-evolutionary dynamics in social networks: A case study of twitter. Comput Soc Netw 2(1):14.

  2. Aragón, P, Gómez V, García D, Kaltenbrunner A (2017) Generative models of online discussion threads: state of the art and research challenges. J Internet Serv Appl 8(1):15.

  3. Aral, S, Dhillon PS (2018) Social influence maximization under empirical influence models. Nat Hum Behav 2(6):375.

  4. Banisch, S, Olbrich E (2019) Opinion polarization by learning from social feedback. J Math Sociol 43(2):76–103.

  5. Bao, Q, Cheung WK, Zhang Y, Liu J (2017) A component-based diffusion model with structural diversity for social networks. IEEE Trans Cybernet 47(4):1078–1089.

  6. Bose, A, Shin KG (2008) On capturing malware dynamics in mobile power-law networks. In: Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, 12. ACM.

  7. Cheng, J, Adamic L, Dow PA, Kleinberg JM, Leskovec J (2014) Can cascades be predicted? In: Proceedings of the 23rd International Conference on World Wide Web, 925–936. ACM.

  8. Cheung, M, She J, Junus A, Cao L (2017) Prediction of virality timing using cascades in social media. ACM Trans Multimed Comput Commun Appl (TOMM) 13(1):2.

  9. Farajtabar, M, Wang Y, Gomez-Rodriguez M, Li S, Zha H (2017) Coevolve: A joint point process model for information diffusion and network evolution. J Mach Learn Res 18:1–49.

  10. Goel, S, Anderson A, Hofman J, Watts DJ (2015) The structural virality of online diffusion. Manag Sci 62(1):180–196.

  11. Goel, S, Watts DJ, Goldstein DG (2012) The structure of online diffusion networks. In: Proceedings of the 13th ACM Conference on Electronic Commerce, 623–638. ACM.

  12. Hours, H, Fleury E, Karsai M (2016) Link prediction in the twitter mention network: impacts of local structure and similarity of interest. In: Data Mining Workshops (ICDMW), 2016 IEEE 16th International Conference On, 454–461. IEEE.

  13. Jin, H (2017) Detection and characterization of influential cross-lingual information diffusion on social networks. In: Proceedings of the 26th International Conference on World Wide Web Companion, 741–745. International World Wide Web Conferences Steering Committee.

  14. Junus, A, Ming C, She J, Jie Z (2015) Community-aware prediction of virality timing using big data of social cascades. In: Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference On, 487–492. IEEE.

  15. Kempe, D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 137–146. ACM.

  16. Kempe, D, Kleinberg J, Tardos É (2005) Influential nodes in a diffusion model for social networks. In: International Colloquium on Automata, Languages, and Programming, 1127–1138. Springer.

  17. Krishnan, S, Butler P, Tandon R, Leskovec J, Ramakrishnan N (2016) Seeing the forest for the trees: new approaches to forecasting cascades. In: Proceedings of the 8th ACM Conference on Web Science, 249–258. ACM.

  18. Leskovec, J, Backstrom L, Kumar R, Tomkins A (2008) Microscopic evolution of social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 462–470. ACM.

  19. Li, C-T, Lin Y-J, Yeh M-Y (2015) The roles of network communities in social information diffusion. In: Big Data (Big Data), 2015 IEEE International Conference On, 391–400. IEEE.

  20. Myers, SA, Leskovec J (2014) The bursty dynamics of the twitter information network. In: Proceedings of the 23rd International Conference on World Wide Web, 913–924. ACM.

  21. Moore, C, Newman ME (2000) Epidemics and percolation in small-world networks. Phys Rev E 61(5):5678.

  22. Moreno, Y, Nekovee M, Pacheco AF (2004) Dynamics of rumor spreading in complex networks. Phys Rev E 69(6):066130.

  23. Oida, K, Okubo K (2019) Adopter community formation accelerated by repeaters of product advertisement campaigns. IEEE Trans Comput Soc Syst 6(1):56–72.

  24. Szabo, G, Huberman BA (2010) Predicting the popularity of online content. Commun ACM 53(8):80–88.

  25. Tur, EM, Zeppini P, Frenken K (2018) Diffusion with social reinforcement: The role of individual preferences. Phys Rev E 97(2):022302.

  26. Watts, DJ (2002) A simple model of global cascades on random networks. Proc Natl Acad Sci 99(9):5766–5771.

  27. Weng, L, Menczer F, Ahn Y-Y (2013) Virality prediction and community structure in social networks. Sci Rep 3:2522.

  28. Weng, L, Menczer F, Ahn Y-Y (2014) Predicting successful memes using network and community structure. In: ICWSM.

  29. Weng, L, Ratkiewicz J, Perra N, Gonçalves B, Castillo C, Bonchi F, Schifanella R, Menczer F, Flammini A (2013) The role of information diffusion in the evolution of social networks. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 356–364. ACM.

  30. Zhao, Q, Erdogdu MA, He HY, Rajaraman A, Leskovec J (2015) Seismic: A self-exciting point process model for predicting tweet popularity. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1513–1522. ACM.


Funding

Not applicable.

Availability of data and materials

Not applicable.

Author information

Correspondence to Kazumasa Oida.

Ethics declarations

Competing interests

The author declares that no competing interests exist.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional information

Author’s contributions

The author conducted this study alone. The author read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Keywords

  • Network density
  • Community
  • Twitter
  • TRF events
  • Recurrence relation
  • Co-evolutionary dynamics
  • Information cascade