Complex influence propagation based on trust-aware dynamic linear threshold models
Applied Network Science volume 4, Article number: 14 (2019)
Abstract
To properly capture the complexity of influence propagation phenomena in real-world contexts, such as those related to viral marketing and misinformation spread, information diffusion models should fulfill a number of requirements. These include accounting for several dynamic aspects in the propagation (e.g., latency, time horizon), dealing with multiple cascades of information that might occur competitively, accounting for the contingencies that lead a user to change her/his adoption of one or alternative information items, and leveraging trust/distrust in the users’ relationships and its effect of influence on the users’ decisions. To the best of our knowledge, no diffusion model unifying all of the above requirements has been developed so far. In this work, we address such a challenge and propose a novel class of diffusion models, inspired by the classic linear threshold model, which are designed to deal with trust-aware, non-competitive as well as competitive time-varying propagation scenarios. Our theoretical inspection of the proposed models unveils important findings on the relations with existing linear threshold models for which properties are known about whether monotonicity and submodularity hold for the corresponding activation function. We also propose strategies for the selection of the initial spreaders of the propagation process, for both non-competitive and competitive influence propagation tasks, whose goal is to mimic contexts of misinformation spread. Our extensive experimental evaluation, which was conducted on publicly available networks and included comparison with competing methods, provides evidence on the meaningfulness and uniqueness of our models.
Introduction
Since the early applications in viral marketing, the development of information diffusion models and their embedding in optimization methods has provided effective support to address a variety of influence propagation problems.
However, due to the shrinking boundary between real and online/virtual social life (Bessi et al. 2014) along with the unlimited misinformation spots over the Web, e.g., fake news (Kumar et al. 2016; Kim et al. 2018), deciding whether a source of information is reliable or not has become a delicate task. For these reasons, understanding the complex dynamics of information diffusion phenomena has emerged as a task of paramount importance, since the way people act on the Web reflects how people behave in reality, which eventually depends to some extent on the way everyone consumes and acquires information.
A few studies on the spreading of fake news and hoaxes (Metaxas and Mustafaraj 2010; Mustafaraj and Metaxas 2017) argued that people are more likely to be deceived by a spreading information item because assessing the reliability and trustworthiness of the source generating and/or sharing such an item has become harder. Within this view, one side effect is the tendency of users to access information from like-minded sources (Koutra et al. 2015) and, at the same time, to be trapped inside information bubbles, thus favoring network polarization phenomena (Garimella et al. 2017).
When it comes to debunking misinformation, two main strategies can be devised: real-time detection and correction, or delayed correction (Kumar and Geethakumari 2014). However, in both cases, the response time plays a crucial role in the effectiveness of the correction attempt, because users tend to reinforce their own beliefs, a cognitive phenomenon known as confirmation bias. Moreover, there is no guarantee about the effectiveness of such corrections: on the contrary, highlighting a fake news item may even produce a backfire effect, i.e., driving users’ attention towards the misleading piece of information.
In this scenario, it appears that one recipe to deal with the interleaving of information and dis/misinformation should be to educate people to be mindful of the informative source. Unfortunately, it is often difficult to understand where an information item originated from. Therefore, it turns out to be essential to capture the effects that different types of social ties, particularly trust/distrust relationships, can have on both the user behavior and propagation dynamics. Two related questions hence arise:
Q1: What are the key features that make a diffusion model able to explain the inherently dynamic, and often competitive, nature of real-world propagation phenomena?
Q2: Do the currently used models of diffusion already incorporate such features?
To address question Q1, we recognize a number of aspects as essential constituents of a “realistic” information diffusion model, namely: (1) leveraging trust/distrust information in the user relationships to capture different effects of influence on decisions taken by a user; (2) accounting for a user’s change in adopting one or alternative information items (i.e., relaxation of the diffusion progressivity assumption); (3) accounting for a user’s hesitation or inclination towards the adoption of an information item over time; (4) accounting for time-dependent variables, such as latency, to explain the propagation dynamics; (5) dealing with multiple cascades of information that might occur competitively.
Motivating example. Our above hypothesis is supported by the following example: consider a typical scenario occurring in a political campaign, where two candidates want to target the audience of potential electors. Let us assume that, at the start of the political campaign, every elector has a completely unbiased opinion towards either of the two candidates. The ultimate decision about which candidate to vote for will likely be affected by both “exogenous” and “endogenous” influencing factors, i.e., one may be genuinely influenced by decisions taken by her/his social contacts (the impact of homophily factors), but s/he may also have formed her/his own opinion outside the network of friends. In fact, not only friends, but also the network of foes has some degree of influence over the decision process of an individual. As a consequence of such negative influence received from foes, one may become more hesitant in taking a decision, which would be reflected by a quiescence state of the elector before being fully engaged in the promotion of the chosen candidate. Moreover, although an elector may alternate her/his opinion in favor of one or the other candidate before the final endorsement, it will be more difficult to induce this change over time. In this regard, a time-aware notion of activation threshold is needed to mimic the effects of the confirmation bias. Finally, all decisions must be taken before the time limit, i.e., the election day, which constrains the political campaign period.
Question Q2 has been addressed by a relatively large corpus of research studies in the last few years. A variety of methods, mainly built upon classic information diffusion models such as Independent Cascade (IC) and Linear Threshold (LT) (Kempe et al. 2003), have tried to explain realistic propagation phenomena in order to solve optimization problems related to influence propagation. As we shall discuss in “Related work” section, diffusion models have been developed to incorporate one or more of the following aspects: multiple, competitive cascades of information; time horizon for the unfolding of the diffusion process; time-dependent influence; delay in the propagation; and trustworthiness of the influence relations. However, to the best of our knowledge, all of the above aspects have never been unified into the same (LT-based) diffusion model.
Contributions. In this paper, we propose a novel class of diffusion models, named Friend-Foe Dynamic Linear Threshold Models (F^{2}DLT). They are based on the classic LT model and are designed to deal with non-competitive as well as competitive time-varying propagation scenarios. In our proposed models, the information diffusion graph is defined on top of a trust network, so that the strength of trust and distrust relationships is encoded into the influence probabilities. The response of a user to the influencing attempts is described by means of a time-varying activation function, depending on both the inherent activation-threshold of the user and her/his tendency to keep or leave the campaign-specific activation state over time. We also introduce a quiescence function to model the latency or delay in the propagation, which accounts for the involvement of the user’s foes in the information diffusion. Remarkably, in our models, the trusted connections and distrusted connections play different roles: only friends can exert a degree of influence for activation/contagion purposes, whereas foes can only contribute to increasing the user’s hesitation to commit to the propagation process. For competitive scenarios, we define two models with clearly different semantics: a semi-progressive model, which assumes that a user, once activated, is only allowed to switch to a different campaign, and a non-progressive model, which instead requires a user to always have the support of her/his in-neighbors to keep the activation state with a certain campaign.
We provide several theoretical insights into the proposed models. In particular, we demonstrate how each of our models can be reduced to other LT-based models for which properties are known about whether monotonicity and submodularity hold for the corresponding activation function.
Another contribution of this work is the definition of four seed selection strategies, which mimic different, realistic scenarios of influence propagation. These strategies are central to our methodology of propagation simulation, since the development of optimization methods under our diffusion models is beyond the goals of this work. Notably, in competitive scenarios, we have focused on combinations of strategies (to associate with competing campaigns) that might be reasonably considered for a misinformation spread limitation problem.
Experimental evaluation conducted on four real-world networks, also including comparison with stochastic epidemic models and the dynamic linear-threshold (DLT) model, has provided interesting findings on the meaningfulness and uniqueness of our proposed models.
Related work
We overview information diffusion models that, in an attempt to explain realistic propagation phenomena, incorporate one or more of the following aspects: multiple, competitive cascades of information, time horizon for the unfolding of the diffusion process, time-dependent influence, delay in the propagation, trustworthiness of the influence relations. Table 1 provides a guide to our discussion.
Please note that here we refer to the vast literature on probabilistic models originally designed to explain stochastic processes of information diffusion, which include the classic Independent Cascade (IC) and Linear Threshold (LT) models (Chen et al. 2013), and related optimization problems, such as influence maximization. By contrast, we will leave out of consideration deterministic models, such as the structural cascades specifically designed to model context/content-sensitive diffusion over an interaction network (e.g., (Krishnan et al. 2016; Das et al. 2016)). Also, it is worth noting that the information diffusion modeling problem we tackle in this work is significantly different from the one addressed by epidemic models, such as SIS, SIR(S), and SEIR(S) (Hethcote 2000), already for the non-competitive scenario. Standard epidemic models are originally defined as compartmental models, since the individuals of a population are divided into compartments that describe an epidemiological state. The parameters used to represent transition rates for changing states are absolute constants, which means that the infection process in compartmental models has a deterministic behavior. Also, standard epidemic models are of mass-action type, since individuals are represented as a normalized fraction of a population whose members randomly interact with each other. As discussed in (Dodds 2018), even social contagion based on stochastic or generalized epidemic models (i.e., there is a probability distribution of rates to govern the infection process) is originally defined on random networks, and its revision to deal with social networks would lead to more complicated models. In this regard, one direction is taken by the stochastic individual-contact network models, whereby SIS and related models are reformulated by considering a stochastic infection process and a network-based population of individually identifiable elements.
In “Comparison with the IC, SIR and SEIR models” section, we present a stage of experimental evaluation devoted to a comparison with such models. However, even if epidemic models have also been used for social influence, they are not the most common approach to this topic (Porter and Gleeson 2016). This is mainly due to the fact that identifying and modeling the causal mechanisms of the spread of ideas is more difficult than for the spread of diseases. By contrast, the threshold models for influence propagation (even the simplest ones) have two important features that are not clearly present in epidemic models. First, individuals have different behaviors, with such differences reflected in the distribution of activation thresholds associated with the individuals; by contrast, in stochastic epidemic models, the state-transition probabilities are drawn independently of the individuals’ relations. Second, an individual’s behavior also depends on the behavior of other individuals s/he is linked to: here, it is helpful to think about threshold models as an example of complex contagions, whereby an individual takes an action as a result of the exposure to multiple sources of influence; by contrast, epidemic models are more likely to represent simple contagion, in that a single source of influence (as social contact) may suffice to cause an individual’s action. Moreover, while the transition to the recovered state assumes non-progressivity in stochastic diffusion models such as LT or IC, such a transition in SIR(S) is defined to happen spontaneously, discarding any influence that may result from the interaction with other individuals. For all such reasons, threshold models are usually considered more appropriate in contexts like the adoption of new technologies or controversial ideas (Chen et al. 2013). In our work, we indeed follow this line of research.
One further point of divergence concerns the notion of competitiveness that is found in some advanced epidemic models: this refers to the presence of two or multiple groups of individuals (with some distinguishing characteristics) which are however affected by the same, single disease (Han et al. 2003; Ji et al. 2012; Hu et al. 2013); therefore, it corresponds to a totally different notion from the one addressed in our work.
In the following, we briefly recall the definition of the LT model, which is at the basis of our proposal; then, we focus on related work that addresses the aforementioned aspects concerning complex propagation phenomena.
The classic Linear Threshold model. Given a directed graph representing a social network, with estimates of influence probabilities provided as edge weights, nodes can be “activated” (i.e., influenced) through an information cascade starting from an initially selected set of seed nodes (i.e., early adopters). At the beginning of the information diffusion process, each node is assigned a threshold uniformly at random from [0,1]. The diffusion process unfolds in discrete time steps and follows certain rules: nodes are either active or inactive; once activated, nodes cannot deactivate; an active node may trigger activation of neighboring nodes; a node can be activated at time t+1 by its active neighbors if their total influence weight at time t exceeds the threshold associated with that node. The process runs until no more activations are possible.
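As an illustration of the rules just recalled, the following Python sketch simulates one run of the classic LT process. It is a minimal sketch rather than a reference implementation: the function name `linear_threshold`, the `in_weights` dictionary layout and the optional `thresholds` argument are our own illustrative assumptions, and the activation test uses "greater than or equal to", a common convention in the LT literature.

```python
import random


def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Simulate one run of the classic (progressive) LT model.

    in_weights: dict mapping node v -> {u: w_uv} over the in-neighbors u of v
                (illustrative layout, not prescribed by the paper).
    seeds:      iterable of initially active nodes.
    thresholds: optional dict node -> threshold; if omitted, thresholds are
                drawn uniformly at random (from (0, 1], to avoid a zero threshold).
    Returns the set of active nodes when no further activation is possible.
    """
    rng = rng or random.Random()
    nodes = set(in_weights) | set(seeds)
    for nbrs in in_weights.values():
        nodes |= set(nbrs)
    if thresholds is None:
        thresholds = {v: 1.0 - rng.random() for v in nodes}

    active = set(seeds)
    while True:
        newly_active = set()
        for v in nodes - active:
            # Total influence weight of the in-neighbors active at the previous step.
            total = sum(w for u, w in in_weights.get(v, {}).items() if u in active)
            if total >= thresholds[v]:
                newly_active.add(v)
        if not newly_active:
            return active  # progressive model: nodes never deactivate
        active |= newly_active
```

Passing explicit thresholds makes a run deterministic, which is convenient for checking small hand-worked examples.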
Competitive diffusion. A number of studies have been devoted to modeling competitive diffusion; see, e.g., (Chen et al. 2013) for a general introduction to IC and LT classes of competitive diffusion models. Focusing on competitive diffusion and related optimization problems under the context of misinformation spread limitation, one of the earliest works is (Budak et al. 2011), which proposes a multi-campaign IC model to address the influence limitation problem, i.e., to find a seed set of size k for one, “good” campaign such that the number of nodes influenced by the other, “bad” campaign is minimized. In (Tong et al. 2017), the problem of rumor blocking is addressed under the competitive IC model and a randomized algorithm is developed for the selection of the seed set able to yield the maximum reduction in the number of bad-infected nodes. An influence blocking maximization problem is also addressed in (He et al. 2012), using competitive LT. In (Chen et al. 2011), the two competing cascades correspond to opposite opinions, where the negative one may emerge spontaneously from any user in the network, e.g., a user who is disappointed with a purchased item decides to spread a negative opinion among her/his contacts. Lu et al. (2015) also address the aspect of complementarity between two competing campaigns, under the assumption that if the two information items are correlated then the adoption of one item might favor further adoption of the second item over time.
Non-progressive diffusion. While modeling the competitive nature of information cascades, the above works however refer to progressive models. On the contrary, a few studies have been proposed to model non-progressive diffusion. For instance, (Lou et al. 2014) introduces a deactivation function into a continuous non-progressive model, whereas an extension of LT is proposed in (Fazli et al. 2014) to define a non-progressive strict majority model. However, both models are also non-competitive.
Social ties and temporal aspects. All of the aforementioned works discard two important aspects: (i) the nature of social ties and their impact on the influence propagation, and (ii) time aspects concerning the diffusion process. The dichotomy between opposite types of social ties (e.g., friend vs. foe relations) has been widely studied in OSN analysis (e.g., (Leskovec et al. 2010b)); however, its incorporation into diffusion models has been explored relatively little so far. For instance, two extensions of competitive LT with negative relations are defined in Talluri et al. (2015), to support positive opinion maximization, and in Weng et al. (2016), to model the adoption of opinions from friends or opposite opinions from foes. All such models are competitive but do not consider temporal aspects in the activation or propagation processes.
Several works have studied different types of temporal variables and their impact on spreading processes (e.g., (Liben-Nowell and Kleinberg 2008; Vazquez et al. 2007; Iribarren and Moro 2009)), mainly focusing on lags and delays due to the diverse response time and heterogeneous susceptibility of users. In (Liu et al. 2012), the authors propose a latency-aware IC model inspired by (Iribarren and Moro 2009), in which an influencing delay is introduced in the activation function. Under this model, a time-constrained IM problem is defined, i.e., to find a seed set of size k such that the expected number of nodes activated before a given time limit is maximized. Another extension of IC is proposed in (Chen et al. 2012), where a notion of meeting probability is introduced to control the activation of neighbors. The models in Liu et al. (2012); Chen et al. (2012) are non-competitive. By contrast, the trust-based latency-aware IC model proposed in (Mohamadi-Baghmolaei et al. 2015) features competitiveness, non-progressivity, temporal delay in propagation, and is also designed to deal with trust/distrust relations.
Dynamic behaviors. All of the previously mentioned works still lack aspects modeling the dynamic behavior of the users. In particular, according to recent studies about polarization of opinion in OSNs (Anagnostopoulos et al. 2015) and related works about misinformation reduction (Kumar and Geethakumari 2014; Lewandowsky et al. 2012), a crucial aspect is to intervene before a competing campaign can reach the users, or at least soon enough, so that a user does not have time to radicalize her/his thoughts. This idea was first captured in (Litou et al. 2016), where a dynamic LT model (DLT) is defined to deal with competitive information cascades. The influence weights temporally decay according to a Poisson distribution, and every node can be either positively or negatively activated at a given time depending on the absolute value of the cumulative influence of its neighbors, while the activation sign depends on the sign of the cumulative influence. Moreover, a dynamic behavior aspect lies in the update of the activation threshold whenever a user switches her/his belief.
The latter work shares with our proposal all features of competitiveness, non-progressivity (although deactivation is not allowed), time-aware propagation, dynamic influence behavior, and incorporation of opposite opinions in the influence probabilities; moreover, it is also based on LT. However, our competitive models differ from DLT in (Litou et al. 2016) since (i) we explicitly model trust and distrust relationships to define the influence probabilities, (ii) our activation function takes into account only the trusted connections while (iii) distinguishing between the two information cascades; (iv) we introduce a quiescence function to model a delay in the information propagation depending on the strength of influence exerted by distrust relations (i.e., foe neighbors); finally, (v) the activation threshold in our models becomes stronger over time as a node is holding a particular belief. In “Comparison with the DLT model” section we shall compare our models with DLT.
Friend-foe dynamic linear threshold models
In this section we describe our proposed class of Friend-Foe Dynamic Linear Threshold (F^{2}DLT) models, which comprises: the Non-Competitive F^{2}DLT (nCF^{2}DLT), the Semi-Progressive Competitive F^{2}DLT (spCF^{2}DLT), and the Non-Progressive Competitive F^{2}DLT (npCF^{2}DLT). We first provide an overview of the framework based on F^{2}DLT. Next, we introduce key features common to all models, then we elaborate on each of them.
Overview
Figure 1 illustrates the conceptual architecture of a framework for information diffusion and influence propagation based on our proposed models. Given a population of OSN users, the framework requires three main inputs: (i) a trust network, which is inferred from the social network of those users to model their trust/distrust relationships; (ii) user behavioral characteristics that are intrinsic to each user (i.e., exogenous to an information diffusion scenario) and oriented to express two aspects: activation-threshold, i.e., the effort needed to activate a user through cumulative influence from her/his neighbors, and quiescence, i.e., the user’s hesitation in being actively committed to the propagation process; and, (iii) one or multiple competing campaigns, i.e., information cascades generated from the agent(s) having viral marketing purposes. Moreover, the information diffusion process has a time horizon, and its temporal unfolding is reflected in the evolution of the information diffusion graph: this also depends on the dynamics of the users’ behaviors in response to the influence chains started by the campaign(s), which admit that users may switch from the adoption of a campaign’s item to that of another one. Putting it all together, our F^{2}DLT-based framework embeds all previously discussed aspects that are required to explain complex propagation phenomena, i.e., competitive diffusion, non-progressivity, time-aware activation, delayed propagation, and trust/distrust relations.
Please note that inferring a trust network from a social network is not an objective of this work; rather, we assume that trust relationships between users of an OSN are available and, as we shall describe next in this section, they are exploited as key information to develop our proposed models. Several heuristics have been proposed to infer a trust network from social relations and interactions among users in an OSN. A common approach is to infer trust relationships based on the social influence exerted by users over the network and propagation of trust ratings (Golbeck and Hendler 2006; Overgoor et al. 2012; Hamdi et al. 2013; Jiang et al. 2014). Other studies utilize users’ activities in social media (Gilbert and Karahalios 2009), or users’ attributes and interactions (Liu et al. 2008), or combinations of aspects concerning user affinity, familiarity and reputation (Yin et al. 2012), social influence, social cohesion and the affective valence expressed by the users in the textual contents they produce (Vedula et al. 2017). The interested reader may refer to (Tang and Liu 2015; Sherchan et al. 2013) for an exhaustive overview on the topic.
Basic definitions
We are given a trust network represented by a directed graph G=〈V,E,w〉, with set of nodes V, set of edges E, and weighting function w:E↦[−1,1] such that, for every edge (u,v)∈E, w_{uv}:=w(u,v) expresses how much v trusts its in-neighbor u. A positive, resp. negative, value of w_{uv} corresponds to a trust, resp. distrust, relation.
For every v∈V, we denote with \(N^{in}_{+}(v)\) and \(N^{in}_{-}(v)\) the set of neighbors trusted by v (i.e., friends of v) and the set of neighbors distrusted by v (i.e., foes of v), respectively. Moreover, as required in linear threshold models, the constraints \(\sum \nolimits _{u \in N^{in}_{+}(v)} w_{uv} \leq 1\) and \(\sum \nolimits _{u \in N^{in}_{-}(v)} w_{uv} \leq 1\) must be fulfilled.
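The friend/foe split and the weight constraints can be checked mechanically. The Python sketch below is illustrative only: the helper names and the dictionary layout are our own, and bounding the foe weights by their absolute values is an assumption made here so that the second constraint is not trivially satisfied by negative weights.

```python
def split_in_neighbors(in_weights_v):
    """Split the in-neighbors of a node into friends (w > 0) and foes (w < 0).

    in_weights_v: dict {u: w_uv} with w_uv in [-1, 1] (illustrative layout).
    """
    friends = {u: w for u, w in in_weights_v.items() if w > 0}
    foes = {u: w for u, w in in_weights_v.items() if w < 0}
    return friends, foes


def satisfies_lt_constraints(in_weights_v, tol=1e-9):
    """Check that trust weights, and (by assumption) absolute distrust weights,
    each sum to at most 1 for the given node."""
    friends, foes = split_in_neighbors(in_weights_v)
    trust_total = sum(friends.values())
    distrust_total = sum(abs(w) for w in foes.values())  # assumption: absolute values
    return trust_total <= 1 + tol and distrust_total <= 1 + tol
```

For instance, a node with in-weights {a: 0.6, b: 0.3, c: −0.5} passes the check, whereas one with trust weights 0.7 and 0.5 does not.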
Let \({\mathcal {G}} = G(g, q, T) = \langle V, E, w, g, q, T \rangle \) be a directed weighted graph representing the LT-based information diffusion graph associated with trust network G, where T denotes a time interval for the diffusion process, and g and q denote time-dependent activation-threshold and quiescence functions. These are introduced in \({\mathcal {G}}\) to model the aspects of time-aware activation and delayed propagation, respectively. We use symbol S_{t} to denote the set of active nodes at time t, and symbol \({\widetilde {S}}_{t}\) to denote the set of active nodes for which, at t, the quiescence time is not expired yet, i.e., the quiescent nodes.
Activation-threshold function. According to the LT model, every node v∈V is associated with an exogenous activation-threshold, θ_{v}∈(0,1], which corresponds to the a priori effort needed in terms of cumulative influence to activate the node. We enhance this concept by defining an activation-threshold function, \(g: V, T \mapsto \mathbb {R}^{+}\), such that for every v∈V and t∈T:
i.e., the activation of v at time t depends both on the user’s pre-assigned threshold, θ_{v}, and on a time-evolving activation term, 𝜗(·,·), which models the dynamic response of a user towards the activation attempt exerted by her/his neighbors.
To specify 𝜗(·,·), we devise two main scenarios for g(·,·):

A biased scenario, modeled as a non-decreasing monotone function, to capture the tendency of a user to consolidate her/his belief, according to the confirmation-bias principle (Anagnostopoulos et al. 2015).

An unbiased scenario, modeled as a non-monotone function, whereby we assume that a user could revise her/his uncertainty to activate over time, thus becoming more or less inclined to change her/his opinion on an information item. This is particularly meaningful in applications such as customer retention, or churn prediction (i.e., a decrease in the activation-threshold would correspond to the tendency of a user to churn in favor of another service).
Both variants of 𝜗(·,·) range within the interval [0,1], for any v∈V.
Let us first consider the biased scenario, which is focused on the confirmation bias principle. We choose the following form for the activation-threshold function, whose value increases with the time a node stays in the same active state:
where \(t_{v}^{last}\) denotes the last (i.e., most recent) time v was activated and δ≥0^{Footnote 1} represents the increment in the value of g(v,t) for consecutive time steps. Thus, the longer a node has kept its active state for the same information cascade (campaign), the higher its activation value, and as a consequence, it will be harder to make the node change its state, or even no longer possible (i.e., g(v,t) saturates to 1, as the difference \(\left (t - t_{v}^{last}\right)\) exceeds (1−θ_{v})/δ).
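The description above constrains the biased threshold quite tightly: it starts from θ_{v}, grows by δ per time step since the last activation, and saturates at 1 once the elapsed time exceeds (1−θ_{v})/δ. One form consistent with these statements, assumed here purely for illustration, is g(v,t) = min{1, θ_{v} + δ·(t − t_{v}^{last})}. The function name and signature in the following Python sketch are hypothetical.

```python
def biased_threshold(theta_v, t, t_last, delta):
    """Biased (confirmation-bias) activation threshold, under the assumed form
    g(v, t) = min(1, theta_v + delta * (t - t_last)).

    theta_v: exogenous threshold of the node, in (0, 1].
    t:       current time step.
    t_last:  most recent activation time of the node.
    delta:   non-negative increment per time step.
    """
    return min(1.0, theta_v + delta * (t - t_last))
```

With θ_{v}=0.4 and δ=0.1, the threshold reaches 1 after six steps, matching the saturation point (1−θ_{v})/δ; with δ=0 it stays constant at θ_{v}.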
In the unbiased scenario, we define the activation-threshold function such that, for each v, the value of the function is maximum (i.e., 1) just after the activation, i.e., at time \(t=t_{v}^{last}+1\), then for subsequent time steps, the function exponentially decreases towards θ_{v}:
where \(\mathbb {I}[\cdot ]\) denotes the indicator function, i.e., it equals 1 if \(t - t_{v}^{last}=1\), 0 otherwise. Note that δ is used differently w.r.t. the previous scenario, as it acts as a coefficient that controls the decrease of the activation-threshold function over time.
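To make the qualitative behavior of the unbiased scenario concrete, the Python sketch below uses a simplified stand-in that has the two stated properties: the value is 1 at t = t_{v}^{last}+1 and then decays exponentially towards θ_{v} at a rate controlled by δ. This is an assumption for illustration, not necessarily the expression used in our model, which is written with the indicator term described above; the function name is hypothetical.

```python
import math


def unbiased_threshold(theta_v, t, t_last, delta):
    """Unbiased activation threshold (illustrative stand-in).

    Assumed form: theta_v + (1 - theta_v) * exp(-delta * (t - t_last - 1)),
    defined for t > t_last. It equals 1 right after activation and decays
    exponentially towards theta_v afterwards.
    """
    elapsed = t - t_last
    if elapsed < 1:
        raise ValueError("the threshold is defined only after the last activation")
    return theta_v + (1.0 - theta_v) * math.exp(-delta * (elapsed - 1))
```

A larger δ makes the node return faster to its exogenous threshold, i.e., it becomes open to a change of opinion sooner.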
Quiescence function. Each node in \({\mathcal {G}}\) is also associated with a quiescence value, which quantifies the latency in propagation through that node. We define a quiescence function, q:V,T↦T, non-decreasing and monotone, such that for every v∈V, t∈T, with v activated at time t:
where τ_{v}∈T represents an exogenous term modeling the user’s hesitation in being fully committed to the propagation process, and \(\psi (N^{in}_{-}(v),t)\) provides an additional delay proportional to the amount of v’s neighbors that are distrusted and active, by the time the activation attempt is performed by v’s trusted neighbors:
where λ≥0 is a coefficient modeling the average user sensitivity to the perceived negative influence. Intuitively, this coefficient would give more weight to the negative influence when the diffusing information item is more “worthy of suspicion”. Note also that, in Eq. 3, w_{uv} is a negative value, since u is a distrusted neighbor of v, i.e., \(u \in N^{in}_{-}(v)\).
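The role of the two terms of the quiescence function can be illustrated with a small Python sketch. The functional form used below is an assumption: the foe-induced term is taken as linear in the total absolute distrust weight of the active foes, scaled by λ, and rounded up to a whole number of time steps; the helper names and the dictionary layout are hypothetical.

```python
import math


def foe_delay(in_weights_v, active, lam):
    """Extra delay induced by the active distrusted in-neighbors of a node.

    Assumed form: lam * sum of |w_uv| over foes u (w_uv < 0) that are active.
    """
    distrust = sum(-w for u, w in in_weights_v.items() if w < 0 and u in active)
    return lam * distrust


def quiescence_delay(tau_v, in_weights_v, active, lam):
    """Number of time steps a node stays quiescent: the exogenous hesitation
    tau_v plus the foe-induced delay, rounded up (rounding is an assumption)."""
    return tau_v + math.ceil(foe_delay(in_weights_v, active, lam))
```

With λ=0 the delay reduces to the exogenous term τ_{v}, and friends (positive weights) never contribute to it, in line with the different roles assigned to trusted and distrusted connections.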
Rationale for activation and propagation. Our choice of using, on the one hand, friends for the activation of a user, and on the other hand, foes to delay the propagation, represents a key distinction from related work (Litou et al. 2016; Talluri et al. 2015; Weng et al. 2016). Therefore, in our models, the trusted connections and distrusted connections play different roles: only friends can exert a degree of (positive) influence, whereas foes can only contribute to increasing the user’s hesitation to commit to the propagation process. It should be noted that both activation and delayed propagation terms also include exogenous factors. We indeed take into consideration the existence of both environmental and personal factors of influence on an individual’s behavior. Several studies in information diffusion and influence maximization have reported evidence that, apart from influence coming from social contacts, an individual may be affected by some external event(s) and/or personal reasons to adopt an information item (Goyal et al. 2010) as well as to delay its adoption (Iniguez et al. 2018). In our setting, we do not accept as generally true the principle “I agree with my friends’ idea and disagree with my foes’ idea” (which is also close to the adage “the enemy of my enemy is my friend”), since this would imply that the behavior of a user should be completely determined by the stimuli coming from her/his neighbors. Rather, according to most conceptual models developed in social science and human-computer interaction fields (see, e.g., (Tedjamulia et al. 2005; Bishop 2007)), we believe that the individual’s influenceability has a component based on personal characteristics.
Noncompetitive model
We introduce the first of the three proposed models, which refers to a single-item propagation scenario. Figure 2 shows the life-cycle of a node in the diffusion graph under this model.
Definition 1
Non-Competitive Friend-Foe Dynamic Linear Threshold Model (nCF^{2}DLT). Let \({\mathcal {G}} = \langle V, E, w, g, q, T \rangle \) be the diffusion graph of the Non-Competitive Friend-Foe Dynamic Linear Threshold Model (nCF^{2}DLT). The diffusion process under the nCF^{2}DLT model unfolds in discrete time steps. At time t=0, an initial set of nodes S_{0} is activated. At time t≥1, the following rule applies: for any inactive node \(v \in V \setminus \left (S_{t-1} \cup \widetilde {S}_{t-1}\right)\), if \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap S_{t-1}} w_{uv} \geq g(v,t)\), then v will be added to the set of quiescent nodes \(\widetilde {S}_{t}\), with quiescence time equal to t^{∗}=q(v,t). Once the quiescence time has expired, v will be removed from \(\widetilde {S}_{t}\) and added to the set of active nodes \({S}_{t^{*}}\). The process continues until T has expired or no more activation attempts can be performed. □
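The rule in Definition 1 can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the function name `simulate_ncf2dlt`, the dictionary-based graph encoding, and the callables `g` and `q` are hypothetical, and `q(v, t)` is assumed here to return a delay in time steps, so that a node made quiescent at time t becomes active at t + q(v, t).

```python
def simulate_ncf2dlt(weights, seeds, g, q, horizon):
    """Return {node: activation time} for nodes active within the horizon.

    weights: {(u, v): w_uv}, positive for trust edges, negative for distrust.
    g(v, t): activation threshold of v at time t.
    q(v, t): quiescence delay of v (ASSUMPTION: a number of steps, not an
             absolute time; the definition can also be read the other way).
    """
    nodes = {n for edge in weights for n in edge} | set(seeds)
    active_at = {s: 0 for s in seeds}
    due = {}  # quiescent node -> time step at which it becomes active
    for t in range(1, horizon + 1):
        prev_active = set(active_at)  # S_{t-1}
        # Activation attempts on nodes that are neither active nor quiescent.
        for v in nodes - prev_active - set(due):
            influence = sum(w for (u, x), w in weights.items()
                            if x == v and w > 0 and u in prev_active)
            if influence >= g(v, t):
                due[v] = t + q(v, t)
        # Quiescent nodes whose quiescence time has expired become active.
        for v in [v for v, when in due.items() if when <= t]:
            active_at[v] = due.pop(v)
    return active_at
```

Only trusted (positive) edges enter the threshold test; distrust edges would act solely through `q`, which the caller supplies.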
Competitive models
Here we introduce the two competitive F^{2}DLT models. Let us first provide our motivation for developing two different competitive models: through the following example, we illustrate a particular situation that may occur when dealing with two campaigns competitively propagating through a network. Please note that, throughout the rest of this paper, we will consider only two competing campaigns for the sake of simplicity; nevertheless, our proposed models are generalizable to more than two competing campaigns.
Example 1
Figure 3 shows an example activation sequence in a competitive scenario between two information cascades, distinguished by colors red and green. At time t=0, nodes u and z are green-active, and their joint influence causes green-activation of node v as well (since 0.3+0.5≥0.6). At time t=1, being fully influenced by node x, node z has switched its activation in favor of the red campaign. After this switch, at time t=2, v’s activation state is no longer consistent with the (joint or individual) influence exerted by u and z. In particular, two mutually exclusive events might in principle happen at t=2: either v is deactivated or v maintains its green-activation state. ■
The uncertainty depicted in the above example prompted us to define two models, namely semi-progressive and non-progressive F^{2}DLT: the former corresponds to the case of v keeping its current (i.e., green) activation state, whereas the latter corresponds to v returning to the inactive state. Clearly, the two models’ semantics differ from each other: the semi-progressive model assumes that a user, once activated, cannot step aside, unlike the non-progressive one, which instead requires a user to always have the support of her/his in-neighbors to keep the activation.
Given two information cascades, or campaigns C^{′},C^{′′}, for every time step t∈T we will use symbols \({S^{\prime }_{t}}\) and \({S^{\prime \prime }_{t}}\) to denote the sets of active nodes, such that \({S^{\prime }_{t}} \cap {S^{\prime \prime }_{t}} = \emptyset \), and analogously symbols \({\widetilde {S}^{\prime }_{t}}\) and \({\widetilde {S}^{\prime \prime }_{t}}\) as the sets of quiescent nodes, for C^{′} and C^{′′}, respectively. Also, \(S_{t} = {S^{\prime }_{t}} \cup {S^{\prime \prime }_{t}}\) and \({\widetilde {S}_{t}} = {\widetilde {S^{\prime }}_{t}} \cup {\widetilde {S^{\prime \prime }}_{t}}\).
It should also be noted that, while sharing the time interval (T) of diffusion, C^{′} and C^{′′} are not constrained to start at the same time t_{0}. Nevertheless, for the sake of simplicity, we hereinafter assume that \(t_{0}=t_{0}^{\prime }=t_{0}^{\prime \prime }\) (with t_{0}∈T), unless otherwise specified (cf. “Results” section).
Definition 2
Semi-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (spCF^{2}DLT). Let \({\mathcal {G}} = \langle V, E, w, g, q, T \rangle \) be the diffusion graph of the Semi-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (spCF^{2}DLT), and C^{′},C^{′′} be two campaigns on \({\mathcal {G}}\). The diffusion process under the spCF^{2}DLT model unfolds in discrete time steps. At time t=0, two initial sets of nodes, \({S^{\prime }_{0}}\) and \({S^{\prime \prime }_{0}}\), are activated, one for each campaign. At every time step t≥1, the following state-transition rules apply:
R1. For any inactive node \(v \in V \setminus \left (S_{t-1} \cup {\widetilde {S}_{t-1}}\right)\), if \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} \geq g(v,t)\), then v will be added to \({\widetilde {S^{\prime }}_{t}}\); analogously, if \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime \prime }_{t-1}}} w_{uv} \geq g(v,t)\), then v will be added to \({\widetilde {S^{\prime \prime }}_{t}}\). If both conditions hold, i.e., v can be simultaneously activated by both campaigns, a tie-breaking rule will apply in order to decide which campaign actually determines the node’s transition to the quiescent state.
R2. When a node v enters the quiescent state corresponding to C^{′} (resp. C^{′′}) for the first time, it will stay in the quiescent node set \({\widetilde {S^{\prime }}_{t}}\) (resp. \({\widetilde {S^{\prime \prime }}_{t}}\)) until the quiescence time has expired. After that, v will be moved to \({S^{\prime }_{t}}\) (resp. \({S^{\prime \prime }_{t}}\)), i.e., it will become active for C^{′} (resp. C^{′′}).
R3. Given a node v active for C^{′′}, i.e., \(v \in {S^{\prime \prime }_{t-1}}\), if \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} \geq g(v,t)\) and \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} > \sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime \prime }_{t-1}}} w_{uv}\), then v will be removed from \({S^{\prime \prime }_{t}}\) and added to \({S^{\prime }_{t}}\); an analogous rule holds for any node active for the first campaign.
Every node for which none of the above state-transition rules is triggered at time t will keep its current state at time t+1. □
The lifecycle of a node in spCF^{2}DLT is shown in Fig. 4. Note that, once a node becomes active, it cannot turn back to the inactive state, but it can only change the activation campaign. Moreover, switch transitions occur instantly.
Definition 3
Non-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (npCF^{2}DLT). Let \({\mathcal {G}} = \langle V, E, w, g, q, T \rangle \) be the diffusion graph of the Non-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (npCF^{2}DLT), and C^{′},C^{′′} be two campaigns on \({\mathcal {G}}\). The diffusion process in npCF^{2}DLT evolves according to the same rules as in spCF^{2}DLT plus the following rule concerning the deactivation process of an active node:
R4. For any active node v at time t−1, if \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} < \theta _{v}\) and \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap {S^{\prime \prime }_{t-1}}} w_{uv} < \theta _{v}\), then v will turn back to the inactive state at time t.
Every node for which none of the state-transition rules is triggered at time t (including the ones defined for spCF^{2}DLT) will keep its current state at time t+1. □
It should be noted that a node’s deactivation rule depends on θ_{v} only (rather than on the whole function g(v,t)); otherwise, every node activated at a given time could deactivate itself in the next time step, due to the increase in its activation threshold. This would eventually lead to a configuration in which all nodes in the network, except the initially activated ones, are in the inactive state. The lifecycle of a node in the npCF^{2}DLT is illustrated in Fig. 4. Note that, unlike in spCF^{2}DLT, transitions to inactive state are allowed.
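The difference between the two competitive models for an already active node (rules R3 and R4) can be sketched as follows. This is an illustrative sketch only: the names `influence` and `update_active_node`, the campaign labels `"C1"`/`"C2"`, and the state encoding are hypothetical; quiescence and the tie-breaking rule of R1 are deliberately left out.

```python
def influence(v, campaign, state, weights):
    """Sum of trusted (positive) in-weights of v from nodes active for `campaign`."""
    return sum(w for (u, x), w in weights.items()
               if x == v and w > 0 and state.get(u) == campaign)


def update_active_node(v, state, weights, g_vt, theta_v, non_progressive):
    """State of an active node v at time t, given the states at time t-1.

    state maps each node to "C1", "C2", or None (inactive).
    g_vt is g(v, t); theta_v is the exogenous threshold used by R4.
    """
    current = state[v]
    other = "C2" if current == "C1" else "C1"
    own = influence(v, current, state, weights)
    rival = influence(v, other, state, weights)
    if rival >= g_vt and rival > own:
        return other          # R3: switch campaign
    if non_progressive and own < theta_v and rival < theta_v:
        return None           # R4: deactivation (npCF2DLT only)
    return current            # no rule triggered: keep the current state
```

With the numbers of Example 1 (weights 0.3 and 0.5, threshold 0.6, and the assignment of the two weights to u and z chosen arbitrarily here), v keeps its campaign under the semi-progressive rules and becomes inactive under the non-progressive ones.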
Theoretical properties of the models
In this section we provide insights into the proposed models. Our main goal is to understand how the features introduced in each of our LT-based models impact on the models’ spread behavior, particularly on monotonicity and submodularity properties. We organize our analysis into two parts: the first corresponding to non-competitive diffusion, and the second to competitive diffusion.
Noncompetitive diffusion
We show that nCF^{2}DLT can be reduced to LT with quiescence time, hereinafter denoted as LTqt. By proving the equivalence between the two models, we hence claim that both the monotonicity and submodularity properties hold for nCF^{2}DLT. Note that since we deal with a progressive model, we assume without loss of generality that, for every node v, the activation-threshold function has a constant value for the whole duration of the diffusion process, i.e., g(v,t)=θ_{v}.
Definition 4
Reduction of nCF^{2}DLT to LTqt. Given \({\mathcal {G}} = \langle V, E, w, g, q, T\rangle \) for nCF^{2}DLT, a diffusion graph \({\mathcal {G}}_{LT}=\langle V_{LT},E_{LT} \rangle \) can be derived, under LTqt, such that V_{LT}=V and E_{LT}={(u,v) | (u,v)∈E, w_{uv}>0}. Every node v∈V_{LT} is assigned a quiescence time equal to the maximum value of the quiescence function q_{v}(·), i.e., \(\tau _{v}^{max} = \tau _{v} + \psi (N_{-}^{in}(v))\). □
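The reduction in Definition 4 amounts to dropping the distrust edges and fixing each node's quiescence time at its maximum. A minimal sketch follows; the function name `reduce_to_ltqt` and the data layout are hypothetical, and ψ is passed in as a caller-supplied callable because its actual form is given by Eq. 3 and is not reproduced here.

```python
def reduce_to_ltqt(weights, tau, psi):
    """Return (trusted_edges, tau_max) for the LTqt graph of Definition 4.

    weights: {(u, v): w_uv}; tau: {v: exogenous quiescence time tau_v}.
    psi: callable applied to the set of distrusted in-neighbors of v
         (ASSUMPTION: stands in for the psi term of the quiescence function).
    """
    # E_LT keeps only the edges with positive (trust) weight.
    trusted_edges = {edge: w for edge, w in weights.items() if w > 0}
    # Collect the distrusted in-neighbors N^in_-(v) of every node.
    foes = {}
    for (u, v), w in weights.items():
        if w < 0:
            foes.setdefault(v, set()).add(u)
    # tau_v^max = tau_v + psi(N^in_-(v))
    tau_max = {v: tau[v] + psi(foes.get(v, set())) for v in tau}
    return trusted_edges, tau_max
```

Any concrete `psi` used with this sketch, for example one that simply counts the foes, is a placeholder rather than the quiescence function of the paper.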
Definition 4 exploits the fact that the distrust connections are not involved in the activation process, but only in the calculation of the quiescence time. Therefore, we can assume this time to be the maximum possible value, and hence we can study the propagation under LTqt. The reduction of nCF^{2}DLT to LTqt is meaningful since the two models are proved to be equivalent, as we report in the following theoretical result.
Proposition 1
The Non-Competitive Trust Threshold Model (nCF^{2}DLT) and the Linear Threshold Model with quiescence time (LTqt) are equivalent. \(\blacktriangleleft \)
Proof
According to the definition of equivalence of two diffusion models in (Kempe et al. 2003; Chen et al. 2013), in order to prove the equivalence of nCF^{2}DLT and LTqt we need to prove that the distribution of the active sets for any given seed set S_{0} is the same under the two models. We provide a proof by induction, hence we consider the evolution of the active sets during the diffusion rounds.
For the LTqt model, the probability of a node to be activated exactly at time t+1 (with t≥1) is given by:
Above, it should be noted that the joint probability \(\Pr \left (v \in \widetilde {S_{t+1}}, v \notin S_{t}\right)\) corresponds to the probability that the threshold associated with node v falls into the interval denoted by the influence received by v until the previous time step and the one received at the current time step. Moreover, Pr(v∉S_{t}) is just the probability that, at time (t−1), the influence received by v is still below its threshold. Finally, we derive the last equality in Eq. 4, which intuitively denotes that the influence exerted by the nodes in S_{t}∖S_{t−1}, i.e., the nodes turning into the active state exactly in the current time step, is decisive to exceed the threshold θ_{v}.
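For reference, the description above matches the standard LT form of this probability (Kempe et al. 2003). A hedged reconstruction, assuming that thresholds are drawn uniformly at random from [0,1] and that u ranges over the in-neighbors of v, can be written as:

\( \Pr \left (v \in \widetilde {S_{t+1}} \mid v \notin S_{t}\right) = \frac {\Pr \left (v \in \widetilde {S_{t+1}}, v \notin S_{t}\right)}{\Pr \left (v \notin S_{t}\right)} = \frac {\sum \nolimits _{u \in S_{t} \setminus S_{t-1}} w_{uv}}{1 - \sum \nolimits _{u \in S_{t-1}} w_{uv}} \)

The numerator is the width of the interval into which the threshold must fall, and the denominator is the probability that the influence received up to time t−1 was still below the threshold.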
For the nCF^{2}DLT model, the conditional probability \(\Pr \left (v \in \widetilde {S_{t+1}} \mid v \notin S_{t}\right)\) can be derived starting from Eq. 4 by constraining w_{uv} such that \(u \in N^{in}_{+}(v)\), i.e., only trusted relations are considered. This leads to an equivalent definition of conditional probability, which holds for every time step t and seed set S_{0}. Therefore, we can conclude that the final active sets will be the same for both models. □
It should be noted that, due to the quiescence times, the sets of active nodes in the two models may not be the same at every time step, but the two final active sets will match each other.
Since the introduction of quiescence time in LT does not have any effect on the distribution of the final active nodes (Chen et al. 2013), we obtain the following equivalence: LT ≡ LTqt ≡ nCF^{2}DLT. Therefore, the activation function is still monotone and submodular under nCF^{2}DLT.
Example 2
Consider Fig. 5, where the propagation process unfolds according to the LTqt dynamics. Nodes u and z are chosen as initial seeds. Thresholds and weights are set such that θ_{v}≤w_{uv} and max{w_{vx},w_{zx}}<θ_{x}≤w_{vx}+w_{zx}; therefore, the combined influence of v and z is required for the activation of node x. The dashed edge denotes a distrust connection removed as a result of the reduction defined in Definition 4. In the initial time step (t=0), u activates v, causing its transition from the inactive state to the quiescent state (in yellow). When \(t=\tau _{v}^{max}\), v turns to the active state, and together with z it becomes able to trigger the activation of node x (which will eventually become active by the time horizon T).
It should be noted that the same dynamics holds for the nCF^{2}DLT model, apart from the quiescence time of node v: this would be less than \(\tau _{v}^{max}\), since y, a foe of v, is not involved in the propagation process. ■
Competitive diffusion
We focus here on spCF^{2}DLT and npCF^{2}DLT, and show that both models can be reduced to the Homogeneous Competitive Linear Threshold (H-CLT) model with Majority Vote as tie-breaking rule (Chen et al. 2013). This is a competitive, progressive model based on LT, for which it is known that its activation function is monotone but not submodular, regardless of the particular tie-breaking rule.
To begin with, we recall that non-progressive LT-based diffusion can be reduced to the progressive case, using a particular form of layered graph (Kempe et al. 2003). Given a time interval T and a diffusion graph G=〈V,E〉 for non-progressive LT, a new graph G^{T} can be derived such that every node v∈V will have a replica v_{t} in every layer at time t∈T, and for every edge (u,v)∈E there will be an edge (u_{t−1},v_{t}) in G^{T}.
Unfortunately, this serialization technique cannot be directly applied to our models, since it is not designed to deal with competitive or non-progressive diffusion and it discards activation or delayed propagation aspects. In the following, we define serialization techniques that are suitable for our competitive models and treat one particular configuration at a time. One general requirement is related to the time horizon to bound the unfolding of the diffusion process. In fact, when dealing with competitive models, the termination guarantee is lost. A simple example is provided next to depict such a non-termination scenario.
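The layered-graph construction recalled above is simple enough to sketch directly. The function name `layered_graph` and the encoding of a replica v_{t} as the pair `(v, t)` are illustrative choices, with T taken as the time steps 0, 1, ..., horizon.

```python
def layered_graph(nodes, edges, horizon):
    """Build the time-layered graph G^T for non-progressive LT.

    Every node v gets a replica (v, t) for t = 0..horizon, and every
    edge (u, v) yields an edge ((u, t-1), (v, t)) for t = 1..horizon.
    """
    layered_nodes = {(v, t) for v in nodes for t in range(horizon + 1)}
    layered_edges = {((u, t - 1), (v, t))
                     for (u, v) in edges
                     for t in range(1, horizon + 1)}
    return layered_nodes, layered_edges
```

The resulting graph is acyclic by construction, since every edge goes from one layer to the next.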
Example 3
In Fig. 6, nodes u and z are chosen as seeds for the green campaign and the red one, respectively. Nodes v and x become green-active and red-active, respectively, at time t=1. Next, they will constantly switch their activation campaign, causing non-termination of the diffusion process. ■
CONFIGURATION 1: No quiescence time, constant activation-threshold.
We assume that q(v)=0 and g(v,t)=θ_{v}, for all v∈V,t∈T. For both spCF^{2}DLT and npCF^{2}DLT, we claim their reduction to the H-CLT model with majority voting as tie-breaking rule.
Definition 5
spCF^{2}DLT graph serialization for reduction to H-CLT. Given a time interval T, we define a layered graph G^{T}=〈V^{T},E^{T}〉 such that, for each layer at time t∈T, every node v∈V will be represented in V^{T} as a tuple \(\left \langle v^{1}_{t},v^{2}_{t},v^{3}_{t} \right \rangle \). Instances \(v^{1}_{t}\) and \(v^{2}_{t}\) have activation-threshold equal to 0, while \(v^{3}_{t}\) has the same threshold as the original node v∈V. The set of edges is defined as \(E^{T} = \left \{\left (u_{t}^{1},v_{t+1}^{3}\right) \mid (u,v) \in E, t, t+1 \in T \right \} \cup \left \{\left (v_{t}^{3},v_{t}^{2}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{2},v_{t}^{1}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{1},v_{t+1}^{2}\right) \mid v \in V, t \in T\right \}\), and the following constraint on edge weights must hold: \( \forall v_{t}^{2} \in V^{T}, \ w\left (v_{t-1}^{1},v_{t}^{2}\right) < w\left (v_{t}^{3},v_{t}^{2}\right).\) □
In the above definition, triples act as connectors between two consecutive time-layers. The role of any connector component is that of a “switch”, which enables a node to choose between its activation state in a layer and the one in the subsequent layer. In other words, node \(v^{1}_{t}\) is the main instance of node v, since the activation state of \(v^{1}_{t}\) reflects the state of v in the original graph, under spCF^{2}DLT at time t; node \(v^{3}_{t}\) is the instance of v connected with other nodes from the layer at t−1, therefore it reflects the influence received by v in the original graph at time t−1; if the activation attempt on \(v^{3}_{t}\) fails, node \(v^{2}_{t}\) will be activated with the same state as v; otherwise, according to the edge weight constraint (cf. Definition 5), \(v^{2}_{t}\) will switch to the other campaign, and then will propagate to instance \(v^{1}_{t}\). Recall that \(v^{1}_{t}, v^{2}_{t}\) have zero activation-threshold. Figure 18 in Appendix A shows an example of serialization for a spCF^{2}DLT diffusion graph with time horizon set to 2.
It should be emphasized that, compared to the serialization method in (Kempe et al. 2003), we require replication of each node in each layer, and additional edges connecting the replica-instances, in order to allow the maintenance of the activation state when no activation event occurs between two time-consecutive layers.
An analogous reduction technique can be defined for the npCF^{2}DLT model.
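The structure of the serialized graph in Definition 5 can be sketched as below. The function name `serialize_sp` and the encoding of instance \(v^{i}_{t}\) as the tuple `(v, i, t)` are hypothetical; edges into layer t+1 are only created when t+1 lies within the horizon, and the edge-weight constraint of the definition is not assigned here, since only the topology and the thresholds are built.

```python
def serialize_sp(nodes, edges, theta, horizon):
    """Return (thresholds, layered_edges) for the spCF2DLT serialization.

    nodes: iterable of node ids; edges: iterable of (u, v) pairs;
    theta: {v: threshold of v}; layers are t = 0..horizon.
    """
    thresholds, layered_edges = {}, set()
    for t in range(horizon + 1):
        for v in nodes:
            # Instances 1 and 2 have zero threshold; instance 3 keeps theta_v.
            thresholds[(v, 1, t)] = 0.0
            thresholds[(v, 2, t)] = 0.0
            thresholds[(v, 3, t)] = theta[v]
            # Connector edges inside layer t: v3 -> v2 -> v1.
            layered_edges.add(((v, 3, t), (v, 2, t)))
            layered_edges.add(((v, 2, t), (v, 1, t)))
            if t + 1 <= horizon:
                # State carried over to the next layer: v1_t -> v2_{t+1}.
                layered_edges.add(((v, 1, t), (v, 2, t + 1)))
        if t + 1 <= horizon:
            # Influence edges of the original graph: u1_t -> v3_{t+1}.
            for (u, v) in edges:
                layered_edges.add(((u, 1, t), (v, 3, t + 1)))
    return thresholds, layered_edges
```

For n nodes, m edges and horizon H, this yields 3n(H+1) instances and 2n(H+1) + nH + mH edges.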
Definition 6
npCF^{2}DLT graph serialization for reduction to H-CLT. Given a time interval T, we define a layered graph G^{T}=〈V^{T},E^{T}〉 such that, for each layer at time t∈T, every node v∈V will be represented in V^{T} as a tuple \(\left \langle v^{1}_{t},v^{2}_{t},v^{3}_{t} \right \rangle \). Instances \(v^{1}_{t}\) and \(v^{2}_{t}\) have activation-threshold equal to 1 and 0, respectively, while \(v^{3}_{t}\) has the same threshold as the original node v∈V. The set of edges is defined as \( E^{T} = \left \{\left (u_{t}^{1},v_{t+1}^{3}\right) \mid (u,v) \in E, t,t+1 \in T \right \} \cup \left \{\left (v_{t}^{3},v_{t}^{2}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{2},v_{t}^{1}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{3},v_{t}^{1}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{1},v_{t+1}^{2}\right) \mid v \in V, t, t+1 \in T \right \}\), and the following constraints on edge weights must hold: \( \forall v_{t}^{2} \in V^{T}, \ w\left (v_{t-1}^{1},v_{t}^{2}\right) < w\left (v_{t}^{3},v_{t}^{2}\right)\), and \( \forall v_{t}^{1} \in V^{T}, \ w\left (v_{t}^{2},v_{t}^{1}\right) + w\left (v_{t}^{3},v_{t}^{1}\right) = 1.\) □
It should be noted that the last condition in Definition 6 requires nodes \(v_{t}^{2}\) and \(v_{t}^{3}\) to hold the same activation state in order to activate \(v_{t}^{1}\), whose threshold is equal to 1.
Analogously to the reduction of spCF^{2}DLT to H-CLT, we can conveniently devise a notion of “connector” component between any two consecutive layers, which however in this case should also account for node deactivations. Figure 19 in Appendix A shows an example of connector for the npCF^{2}DLT model.
Claim 1
For any given diffusion graph \({\mathcal {G}}\) under spCF^{2}DLT (resp. npCF^{2}DLT), assuming constant activation-threshold and no quiescence time, every node v in \({\mathcal {G}}\) is active at time t∈T if and only if its corresponding instance \(v^{1}_{t}\) is active in the serialized graph G^{T} (resp. npCF^{2}DLT). \(\blacktriangleleft \)
CONFIGURATION 2: Constant quiescence time, constant activation-threshold.
We assume that q(v)=τ_{v} and g(v,t)=θ_{v}, for all v∈V. For both spCF^{2}DLT and npCF^{2}DLT, we claim their reduction to H-CLT with majority voting as tie-breaking rule.
In this case, we need to consider that, whenever a node is activated, its quiescence time may not expire before the time horizon; for this reason, we will consider only nodes reachable from \(S_{0} = S^{\prime }_{0} \cup S^{\prime \prime }_{0}\) within T, for any two given seed sets \(S^{\prime }_{0}\) and \(S^{\prime \prime }_{0}\). To identify such nodes, we define a quiescence-aware distance measure that accounts for the quiescence times along the path connecting any two nodes. Given nodes u,v, and the set P(u,v) of all paths between u and v, the distance from u to v will be measured as \( d(u,v) = \min _{p \in P(u,v)} \sum \nolimits _{x \in p} \tau _{x} \). Moreover, we denote with d(S_{0},v) the minimum distance between nodes u∈S_{0} and v. By exploiting this distance, we will discard all nodes that cannot be “contagious” before the end of T, say t_{max}. Therefore, the node set V^{T} of the layered graph is defined as:
Each node v∈V with quiescence time τ_{v} will have connections from the previous layers according to the following rule: for any layer at time t, if t<d(S_{0},v) then v will not have any incoming edges, otherwise all incoming edges of v will be from the layer at time t−τ_{v}−1.
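The quiescence-aware distance d(S_{0},v) is a shortest-path problem with costs on nodes rather than on edges, so it can be computed with a multi-source variant of Dijkstra's algorithm. The sketch below uses hypothetical names (`quiescence_distance`, `adj`) and follows the formula literally, i.e., it sums τ over all nodes on the path including the seed; whether the seeds' own quiescence times should count is not stated in the text, so this is an assumption.

```python
import heapq


def quiescence_distance(adj, tau, seeds):
    """Return {v: d(S0, v)} for every node reachable from the seed set.

    adj: {u: iterable of out-neighbors}; tau: {v: quiescence time tau_v}.
    ASSUMPTION: the path cost includes tau of the seed itself.
    """
    dist = {s: tau[s] for s in seeds}
    heap = [(d, s) for s, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj.get(u, ()):
            candidate = d + tau[v]  # entering v costs tau_v
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist
```

Nodes that never appear in the returned dictionary are unreachable from the seeds, and nodes whose distance exceeds the time horizon can be discarded as described above.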
Using the above settings in the serialization method previously presented, it can easily be demonstrated that both spCF^{2}DLT and npCF^{2}DLT can be reduced to an equivalent H-CLT model.
Claim 2
For any given diffusion graph \({\mathcal {G}}\) under spCF^{2}DLT (resp. npCF^{2}DLT), assuming constant activation-threshold and constant quiescence time, every node v in \({\mathcal {G}}\) is active at time t∈T if and only if its corresponding instance \(v^{1}_{t}\) is active in the serialized graph G^{T} (resp. npCF^{2}DLT). \(\blacktriangleleft \)
CONFIGURATION 3: Variable quiescence time, constant activation-threshold.
We assume that q(v,t) is variable, while g(v,t)=θ_{v}, for all v∈V,t∈T.
As in the previous case, we need to specify the seed sets \(S^{\prime }_{0}, S^{\prime \prime }_{0}\). However, note that the quiescence time of a node now depends on the actual activation state of its in-neighborhood (cf. Eq. 3), which makes a direct serialization of the whole diffusion graph unfeasible.
Starting from the original diffusion graph \({\mathcal {G}}\), we derive an “intermediate” graph \(\widehat {{\mathcal {G}}}\), which is equivalent to \({\mathcal {G}}\) except that each node v∈V is associated with a quiescence time interval \([\tau _{v},\tau _{v}^{max}]\), where \(\tau _{v}^{max}=\tau _{v} + \psi (N_{-}^{in}(v))\). Let us denote with \({\mathcal {G}}^{min}\) the instance of \(\widehat {{\mathcal {G}}}\) such that the quiescence time of every \(v \in \widehat {{\mathcal {G}}}\) is τ_{v}, and with \({\mathcal {G}}^{max}\) the instance of \(\widehat {{\mathcal {G}}}\) such that the quiescence time of every \(v \in \widehat {{\mathcal {G}}}\) is \(\tau _{v}^{max}\).
Although we cannot assert that spCF^{2}DLT and npCF^{2}DLT are equivalent to H-CLT under the layered graph obtained by applying the previously described serialization techniques, an important theoretical result can nonetheless be provided, as reported next.
Claim 3
For any diffusion graph \({\mathcal {G}}\) under spCF^{2}DLT (resp. npCF^{2}DLT), with campaigns C^{′},C^{′′}, assuming constant activation-threshold and variable quiescence time, for any seed sets \(S^{\prime }_{0}\) and \(S^{\prime \prime }_{0}\), it holds that:
\( \sigma ^{\prime }_{H\text {-}CLTmax}(S^{\prime }_{0},S^{\prime \prime }_{0}) \leq \sigma ^{\prime }(S^{\prime }_{0},S^{\prime \prime }_{0}) \leq \sigma ^{\prime }_{H\text {-}CLTmin}(S^{\prime }_{0},S^{\prime \prime }_{0}) \) (5)
where σ^{′} is the number of nodes activated by C^{′} under spCF^{2}DLT (resp. npCF^{2}DLT), \(\sigma ^{\prime }_{H\text {-}CLTmax}(S^{\prime }_{0},S^{\prime \prime }_{0})\) and \(\sigma ^{\prime }_{H\text {-}CLTmin}(S^{\prime }_{0},S^{\prime \prime }_{0})\) are the number of nodes activated by C^{′} under H-CLT in the layered graph obtained by serialization of spCF^{2}DLT (resp. npCF^{2}DLT) on \({\mathcal {G}}^{max}\) and \({\mathcal {G}}^{min}\), respectively. \(\blacktriangleleft \)
Enabling variable quiescence time, i.e., ψ(·), means that the exact time required by each node to make a transition from the quiescent state to the active one cannot be established in advance at the beginning of the propagation process. Since for any node v the quiescence time ranges within \([ \tau _{v}, \tau _{v}^{max}]\), we devise two opposite scenarios. In the first scenario, represented by the rightmost side of Eq. 5, each node is assumed to wait the minimum amount of time, i.e., τ^{min}, before its activation; this leads to a higher fraction of nodes that could be activated before the time horizon T is reached. The second scenario, represented by the leftmost side of Eq. 5, assumes that each node has to wait the maximum possible quiescence time, i.e., τ^{max}; as a consequence, a smaller fraction of nodes will be able to complete the activation process before the time limit, thus leading to a lower spread.
CONFIGURATION 4: No quiescence time, variable activation-threshold.
We assume that q(v)=0 and g(v,t)=θ_{v}+𝜗(θ_{v},t), for all v∈V,t∈T. For both spCF^{2}DLT and npCF^{2}DLT, we claim their reduction to H-CLT with majority voting as tie-breaking rule. In the following, we refer to the biased activation-threshold function, although it is easy to show analogous considerations for the non-biased activation-threshold function.
Because of the dynamic behavior of the activation-threshold function, we cannot predict its value at any particular time step of the diffusion process; nevertheless, by specifying the value of coefficient δ in Eq. 1, we can derive the value of \(t_{v}^{max}\), which indicates how many time-layers we have to look back in order to know the actual threshold value of v at a particular time t. In order to capture such a dynamic aspect in H-CLT, we define a further serialization technique, built on top of the previously defined ones. We first restrict our attention to a particular case, and afterwards provide some rules that apply to the general case.
Let us focus on a particular node v, and assume that at any two consecutive time steps of activation for the same campaign its threshold increases by δ. Again, node v will have replicas for any time-layer t, i.e., \(\langle v^{1}_{t},v^{2}_{t},v^{3}_{t} \rangle \), with the first replica, \(v^{1}_{t}\), holding the actual state of v in the corresponding serialized graph for the competitive model. In addition, we introduce further replicas, in number equal to the value \(t_{v}^{max}\); supposing, for the sake of simplicity, that \(t_{v}^{max} = 3\), we derive replica nodes \(\left \langle v_{t}^{3,r1},v_{t}^{3,r2},v_{t}^{3,r3} \right \rangle \), such that each of them will have a threshold value in [θ_{v},1] with increment of δ. Figure 7 illustrates this new component in the serialized graph.
Because this component is introduced as an extension of the previous techniques, the meaning of the nodes \(v^{1}_{t-1},v^{1}_{t-2},v^{1}_{t-3}\) remains the same as in the previous cases. On the right side of Fig. 7, each of the additional replicas has a different threshold value and is connected with nodes coming from the previous layers. Clearly, the overall behavior of this component depends on the weights attached to every edge in the structure. In this regard, we define the following constraints on the edge weights:
It should be noted that the activation attempts are performed directly on the replicas. Therefore, the above constraints on the edge weights control whether a node assumes the state derived as the outcome of the most recent activation attempts, or the one consistent with its personal history. Each of the aforementioned inequalities contributes to this decision process, with a different purpose. Eq. 6(a) ensures that the state derived from the last activation attempt is always preferred to the one derived from the previous time step. Eq. 6(b) ensures that the information coming from the previous time steps is given the same importance as the one derived from the current replicas. Eq. 6(c-d) ensures that the most recent information, i.e., that from the closest previous time steps, has higher priority than the earliest one. Eq. 6(e) ensures that there is consistency with respect to the state assumed in the closest previous time step and the farthest involved time step (e.g., the third previous time step in the addressed scenario).
Moreover, the threshold of the “central” node in the component (\(v^{3}_{t}\)) is set to w^{3,r1}, to ensure sequentiality of the diffusion. By setting \(\theta _{v_{t}^{3}}\) equal to w^{3,r1}, we prevent \(v_{t}^{3}\) from being activated by its own replicas belonging to layers preceding the (t−1)-th layer.
Figure 8 shows how the above defined connector is integrated into a serialization technique. In the figure, only the connections incident on vertex v are expanded. The red edges are the ones connecting consecutive layers, therefore the replica \(v_{t}^{3,r1}\) is connected with the previous layer, the replica \(v_{t}^{3,r2}\) is connected with the second previous layer and so on. Blue edges represent the new connections due to the introduction of this new component.
Claim 4
For any given diffusion graph \({\mathcal {G}}\) under spCF^{2}DLT (resp. npCF^{2}DLT), assuming variable activation-threshold and no quiescence time, every node v in \({\mathcal {G}}\) is active at time t∈T if and only if its corresponding instance \(v^{1}_{t}\) is active in the serialized graph G^{T} (resp. npCF^{2}DLT). \(\blacktriangleleft \)
Evaluation methodology
Data
We used four real-world, publicly available networks, namely: Epinions (Leskovec et al. 2010b), Slashdot (Leskovec et al. 2010b), Wiki-Conflict (Brandes et al. 2009) and Wiki-Vote (Leskovec et al. 2010a). Epinions is a “who-trust-whom” network of the homonymous review site. Slashdot models friend/foe relations between the users of the homonymous technology-related news website. Wiki-Conflict refers to Wikipedia users involved in an “edit-war”, i.e., edges represent either positive or negative conflicts in editing a wikipage. Wiki-Vote models “who-vote-whom” relations between Wikipedia users that voted for/against each other in admin elections. Our choice of the evaluation datasets was mainly driven by two intents: (i) to provide a reproducible evaluation framework based on publicly available network data, and (ii) to test our models on a diversified set of real-world OSNs with suitable characteristics for information propagation processes.
Table 2 summarizes main structural characteristics of the networks. To favor meaningful competition of campaigns based on selected pairs of strategies, we limited the diffusion context to the largest strongly connected component in each evaluation network; note that, for WikiConflict, the largest strongly connected component coincides with the whole graph. Also, the clustering coefficient corresponds to the definition of global transitivity in an undirected graph (the direction of the edges is ignored).
All networks are originally directed and signed; in addition, the two Wikipedia-based networks also have timestamped edges. In order to derive the weighted graphs of influence probabilities, we defined the following method: for every (u,v)∈E, the edge weight w_{uv} was sampled from a binomial distribution \(\mathcal {B}\left (N^{in}_{+}(v),p\right)\) if \(u \in N^{in}_{+}(v)\) (i.e., v trusts u), otherwise \(w_{uv} \sim \mathcal {B}\left (N^{in}_{-}(v),p\right)\), where the probability of success p is equal to the fraction of trust edges in the network; the rationale is that, for a higher fraction of trusted connections in the network, the nodes will be more likely to trust each other, and hence each node is more likely to be involved in the propagation process. We performed 1000 samplings of edge weights for each of the four networks. Therefore, all presented results correspond to averages over 1000 simulation runs.
Seed selection strategies
We defined four seed selection strategies, each of which mimics a different, realistic scenario of influence propagation.
Exogenous and malicious sources of information. This method, hereinafter referred to as MSources, aims at simulating the presence of multiple sources of malicious information within the network. Here, an exogenous source is meant as a node without incoming links, e.g., a user that is just interested in spreading her/his opinion; such a node is also regarded as malicious if a high fraction of the outgoing influence exerted by the node is distrusted by its out-neighbors. Formally, given a budget k, the method selects the top-k users in a ranking solution determined as \(r(v) = (\bar {W}^{-} / (\bar {W}^{-} + \bar {W}^{+})) \log (N^{out}(v))\), for every v such that N^{in}(v)=∅, where \(\bar {W}^{+}, \bar {W}^{-}\) are shortcut symbols to denote the sum of trust (resp. distrust) weights outgoing from v.
Exogenous and influential trusted sources of information. Analogously to the previous method, this one, dubbed ISources, searches for the “best” influential trusted sources. The ranking function is \(r(v) = (\bar {W}^{+} / (\bar {W}^{-} + \bar {W}^{+})) \log (N^{out}(v))\). Note that this still takes into account the negative weights, because even a highly trusted user might be distrusted by some other users (e.g., “haters”).
Stress triads. This strategy is based on the notion of structural balance in triads (Leskovec et al. 2010b). Figure 9 shows an example of stress-triad configuration: node v has two incoming connections, one from node z with negative weight and the other from u with positive weight, and there is also a trust link from z to u. We say that z is a stress-node since, despite the distrusted link to v, it could also indirectly influence v through the trusted connection with u. Based on that, our proposed StressTriads strategy searches for all triads containing stress-nodes and selects as seeds the k stress-nodes that participate in the highest number of triads.
Newcomers. We call a node v∈V a newcomer if all of its incoming edges are timestamped as less recent than its oldest outgoing edge. The start-time of v is the oldest timestamp associated with its incoming edges. We divide the set of newcomers into two groups obtained by equal-frequency binning on the temporal range specific to a network. Upon this, we distinguish between two strategies, dubbed LeastNew and MostNew, which correspond to the selection of the k newcomers having highest outdegree among those with the oldest start-time and with the newest start-time, respectively. Both strategies were applied to WikiVote and WikiConflict, due to the availability of timestamped edges.
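The source-ranking functions of MSources and ISources and the stress-node count of StressTriads can be sketched as follows. This is only an illustrative sketch: the edge representation (a dictionary of signed weights, negative meaning distrust), the function names, the reading of N^{out}(v) inside the logarithm as the outdegree of v, the use of the natural logarithm, and the tie-breaking by node identifier are all our assumptions.

```python
import math
from collections import defaultdict

def rank_sources(edges, k, malicious=True):
    """edges: dict (u, v) -> signed weight (w < 0 means v distrusts u).

    Ranks nodes without incoming links by
    r(v) = share * log(outdegree(v)), where share is the distrusted
    (MSources) or trusted (ISources) fraction of outgoing weight.
    """
    out_pos, out_neg = defaultdict(float), defaultdict(float)
    out_deg, has_incoming = defaultdict(int), set()
    for (u, v), w in edges.items():
        has_incoming.add(v)
        out_deg[u] += 1
        if w < 0:
            out_neg[u] += -w
        else:
            out_pos[u] += w

    scores = {}
    for v in out_deg:
        total = out_pos[v] + out_neg[v]
        if v in has_incoming or total == 0:
            continue  # not an exogenous source, or no outgoing influence
        share = (out_neg[v] if malicious else out_pos[v]) / total
        scores[v] = share * math.log(out_deg[v])
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]

def rank_stress_nodes(edges, k):
    """Counts, for each z, the triads with z->v distrusted, z->u trusted
    and u->v trusted, and returns the k nodes with the highest counts."""
    trusted_out = defaultdict(set)
    for (a, b), w in edges.items():
        if w > 0:
            trusted_out[a].add(b)

    counts = defaultdict(int)
    for (z, v), w in edges.items():
        if w < 0:
            for u in trusted_out[z]:
                if u != v and edges.get((u, v), 0) > 0:
                    counts[z] += 1
    return sorted(counts, key=lambda z: (-counts[z], z))[:k]

# Toy signed graph (hypothetical data).
toy = {
    ("a", "b"): 0.5, ("a", "c"): -0.5, ("a", "d"): -0.2,
    ("z", "v"): -0.3, ("z", "u"): 0.4, ("u", "v"): 0.6,
    ("x", "z"): 0.1,
}
top_malicious = rank_sources(toy, k=1, malicious=True)
top_trusted = rank_sources(toy, k=2, malicious=False)
top_stress = rank_stress_nodes(toy, k=1)
```

Note that, under these assumptions, a source with a single out-neighbor always receives a zero score because of the logarithmic factor.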
Settings of the model parameters
For every user v, the exogenous activation-threshold θ_{v} and quiescence time τ_{v} were chosen uniformly at random within [0,1] and [0,5], respectively. Moreover, λ (used in the quiescence function) was varied between 0 and 5, while the coefficient δ (used in the activation-threshold function) was selected in {0,0.1} for the biased scenario (Eq. 1) and kept fixed to 1 for the unbiased scenario (Eq. 2).
Results
We organize the presentation of our experimental results into three parts. The first part is devoted to the evaluation of the noncompetitive model (“Evaluation of nCF^{2}DLT” section), and the second part to the evaluation of the competitive models (“Evaluation of competitive models” section). In the third part (“Comparative evaluation” section), we present a comparative evaluation of our noncompetitive model against IC and stochastic individual-contact epidemic models, whereas for the competitive scenario, we compare our models with the DLT model (Litou et al. 2016).
Evaluation of nCF^{2}DLT
Spread, stressed users and negative influence
We analyzed the number of final activated users (i.e., spread) by varying the size (k) of the seed set, for every seed selection strategy. In this analysis, we assumed constant activation thresholds (i.e., 𝜗(·,·)=0) and constant quiescence times (i.e., ψ(·,·)=0). Moreover, we distinguished between “stressed” and “unstressed” users, the former being active users that have at least one distrusted active in-neighbor. As shown in Fig. 10 for some representative cases, besides the expected growth in spread as k increases, we found that the activation of stressed users is lower in amount but follows a trend similar to that of unstressed users. For both types of users, ISources revealed higher spread capability, followed by StressTriads, in all networks (with the exception of WikiVote). The two newcomers-based strategies (where applicable) turned out to be effective as well, with LeastNew prevailing over MostNew for lower k. By contrast, MSources was in general unable to yield a spread comparable to that of the other strategies.
We further investigated the effect of distrusted connections on the spread during the unfolding of the diffusion process. In this regard, Table 3 shows the number of nodes that, at the time of their involvement in the propagation process, were negatively influenced by in-neighbors activated at any previous time, along with their perceived negative influence. The symbol “–” in Table 3 denotes that the corresponding seed-selection strategy does not apply to a particular network. In general, we observed a significant presence of negative influence spread when using ISources and StressTriads. Considering Epinions and Slashdot, the former (resp. the latter) corresponded to a negative influence spread of the order of thousands (resp. hundreds), with average influence weight around 0.3 (resp. 0.2). The impact of these strategies was lower in the Wikipedia networks (one order of magnitude below). MSources yielded null spread in Epinions and Slashdot, but nonnegligible spread in WikiVote and WikiConflict. By contrast, the newcomers-based strategies had small (in WikiVote) or negligible (in WikiConflict) effect on the negative influence spread.
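The distinction between stressed and unstressed users can be made concrete with a short sketch. The data layout (a set of active users and a dictionary of weighted in-neighbor lists, with negative weights denoting distrust) and the function name are assumptions introduced only for illustration.

```python
def split_stressed(active, in_edges):
    """active: set of active users.
    in_edges: dict v -> list of (u, weight), with weight < 0 meaning v distrusts u.

    A user is stressed if it is active and has at least one distrusted
    in-neighbor that is also active.
    """
    stressed = {
        v for v in active
        if any(u in active and w < 0 for u, w in in_edges.get(v, ()))
    }
    return stressed, set(active) - stressed

# Toy example (hypothetical data): c distrusts the active user a,
# d distrusts the inactive user e.
active_users = {"a", "b", "c", "d"}
incoming = {
    "c": [("a", -0.4), ("b", 0.7)],
    "d": [("e", -0.9), ("b", 0.2)],
}
stressed, unstressed = split_stressed(active_users, incoming)
```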
Activation loss
As partially unveiled by the previous analysis, the users’ involvement in the propagation process is affected by the behavior of the quiescence function, whose impact would increase with the amount of distrusted influence in the spread. This further prompted us to measure the activation loss, i.e., the percentage decrease of activated users, due to the enabling of the time-varying quiescence factor (i.e., λ>0 in Eq. 3) in the users’ activation states. Figure 11 shows results corresponding to relatively large λ (set to 5) and k (set to 50). For each seed selection strategy, the curve is drawn by using polynomial splines,^{Footnote 2} where the marked points (from low to high time steps) refer to 25%, 50%, 75% and 100% of the time horizon observed for the diffusion process under the chosen strategy without time-varying quiescence times. One general remark that stands out is a relatively high percentage of activation loss for the initial time steps; this holds in particular for StressTriads, which might be explained by the fact that the users initially influenced by means of this strategy tend to be subjected to a certain amount of distrusted influence. As the time steps get closer to the time horizon, the activation loss tends to significantly decrease, down to nearly zero in most cases, with few exceptions including the use of ISources in Slashdot and Epinions, and StressTriads and MSources in WikiVote; note that this is indeed consistent with the previous analysis on negative influence spread.
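The activation loss described above is a relative decrease, which can be sketched as below. We assume that the baseline is the number of users activated at a given time step with constant quiescence times (λ=0) and that the comparison value is the number activated at the same time step with λ>0; the function and argument names are hypothetical.

```python
def activation_loss(active_baseline, active_with_quiescence):
    """Percentage decrease of activated users at a given time step when the
    time-varying quiescence factor is enabled, relative to the baseline."""
    if active_baseline <= 0:
        raise ValueError("the baseline must contain at least one activated user")
    return 100.0 * (active_baseline - active_with_quiescence) / active_baseline

# Hypothetical counts: 200 users activated by time t with lambda = 0,
# 150 users activated by the same time with lambda > 0.
loss = activation_loss(200, 150)
```

A value close to zero at the end of the time horizon means that the users delayed by the quiescence function were eventually activated anyway.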
Evaluation of competitive models
To analyze the behavior of spCF^{2}DLT and npCF^{2}DLT, we aimed at simulating a scenario of limitation of misinformation spread, i.e., we assumed that one campaign, the “bad” one, has started diffusing, and consequently another campaign, the “good” one, is carried out in reaction to the first campaign.
Combining seed selection strategies
Within this view, we preliminarily investigated strategy combinations that might reasonably be considered for a misinformation spread limitation problem. Table 4 provides a number of statistics we collected to characterize selected pairs of strategies, for two campaigns carried out independently of each other, i.e., in a noncompetitive scenario, with k=50. Using StressTriads for the bad campaign and ISources for the good campaign was found to be significant for all networks, with sharing percentage close to 100% in Epinions and Slashdot and above 80% in WikiConflict. Also, pairing MSources with ISources, and LeastNew with MostNew, was well-suited in the Wiki networks.
Setting and goals for the evaluation of competitive diffusion
As previously mentioned, the seed selection strategies chosen for the two campaigns might not start at the same time, in which case we assume that the first-started one is the bad campaign. Moreover, we used fixed-probability as tie-breaking rule, with probability equal to 1 for the bad campaign. Also, we set the time horizon to the end-time of the (noncompetitive) diffusion of the bad campaign.
Our main goal in the analysis of the two competitive models was to understand the effect of the setting of the activation-threshold function on the users’ campaign-changes/deactivations, under the case of “real-time correction” or “delayed correction” by the good campaign against the bad one (cf. Introduction).
Evaluation of spCF^{2}DLT
We present results on the campaign spreads, the number of users activated for one campaign that switched to the other campaign, and the total number of switches; the latter two measurements are represented, in the bar-charts shown in Fig. 12, by the lower and upper whiskers, respectively, in the line-range vertically placed on each bar. Results correspond to start-delays Δ t_{0} of the good campaign w.r.t. the bad one (from 0 to 75% of the end-time of the bad campaign). For this analysis, we considered the biased definition of the activation-threshold function (Eq. 1).
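The two switch measurements (users that switched at least once, and total number of switches) can be sketched as follows, assuming that the simulation records for each user the sequence of campaign states over time. The state labels, the history format, and the function name are hypothetical and serve only to illustrate the difference between the two counts.

```python
def count_switches(histories, source, target):
    """histories: dict user -> list of states over time, where a state is
    None (inactive) or a campaign label.

    Returns (number of users that switched from source to target at least
    once, total number of source-to-target switches)."""
    switched_users, total_switches = 0, 0
    for states in histories.values():
        n = sum(
            1 for prev, curr in zip(states, states[1:])
            if prev == source and curr == target
        )
        total_switches += n
        if n > 0:
            switched_users += 1
    return switched_users, total_switches

# Toy activation histories (hypothetical data).
histories = {
    "u1": [None, "bad", "good", "bad", "good"],
    "u2": ["bad", "bad", "bad"],
    "u3": [None, "good", "bad"],
}
bad_to_good = count_switches(histories, "bad", "good")
good_to_bad = count_switches(histories, "good", "bad")
```

In this toy example, u1 alone accounts for two bad-to-good switches, which is why the total can exceed the number of switched users.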
One general remark is that, for δ=0,Δ t_{0}=0, the seed strategy that proved to be most effective in spread in the noncompetitive case (cf. Table 4) confirmed its advantage against the other campaign’s strategy. Nevertheless, for δ>0,Δ t_{0}>0, the two campaigns would tend to an equilibrium, or even to invert their trend (e.g., in Epinions and WikiVote). In particular, by accounting for (even little) confirmation bias and letting both campaigns start at the same time, ISources slightly increases its spread (which is explained by the fact that this strategy allows for activating first a high fraction of shared users, e.g., 70% in Epinions); but, as the start-delay increases to 50%, the good campaign is no longer able to save users from being influenced by the bad campaign (i.e., StressTriads in Epinions, MSources in WikiVote).
Interesting remarks were also drawn from the analysis of the transitions from one campaign to the other one. For δ=0, as the startdelay increases, the number of switched users follows a nearly constant trend in all networks (but WikiVote, where we observed a drastic decrease for both campaigns), while the total number of switches is subjected to a more evident decreasing trend. Moreover, we observed a higher number of (unique and total) switches from the bad campaign to the good campaign, than vice versa, which occurred even when the spread of the bad campaign was higher than the good one (e.g., in WikiVote, for both combinations of strategy choices). Setting δ=0.1 led to a general decrease in the switch measurements w.r.t. the corresponding previous case, and also to a substantial increase in “saved” users by the good campaign.
Biased vs. unbiased activation-threshold function.
We also investigated how our proposed semiprogressive model behaves under the unbiased scenario corresponding to the activation-threshold function (Eq. 2).
Figure 13 shows flow diagrams of the spread based on spCF^{2}DLT for the selection of strategies ISources (red color) and StressTriads (green color), with the activation-threshold function defined either for the unbiased scenario (plots on the top) or for the biased scenario (plots on the bottom). In each plot, the height of a vertical bar, along with the percentage displayed upon it, denotes the number of active users at a particular time step and the ratio w.r.t. the maximum number of active users achieved by the corresponding selection strategy. The space between two consecutive bars corresponds to a time window, here set to 6 time steps for readability reasons. In each window we record two main events: (i) the number of active nodes that keep the same activation state, represented in base-2 logarithmic scale by the flow connecting two consecutive bars for the same campaign, and (ii) the number of users that switched from one campaign to the opposite one, represented in base-10 logarithmic scale by the flow connecting two consecutive bars with different colors.^{Footnote 3} Note that for this analysis we discarded the start delay for the good campaign.
As expected, the number of switched users always tends to be in favor of the good campaign, which typically has the best strategy of activation. However, and more importantly, the number of switched users in the unbiased scenario is significantly greater than in the biased scenario. Moreover, when the confirmation-bias effect is enabled, the majority of the switches are concentrated in the initial time-windows, then they follow a rapidly decreasing trend until the time horizon. On the contrary, in the unbiased scenario, there is still a concentration of switches in the early stages of the propagation, but it becomes less evident and the number of switches tends to decrease more smoothly as opposed to the confirmation-bias scenario. This is particularly evident in Slashdot, where switches last until the latest time-windows, while in the confirmation-bias scenario the switches stop just after the third time-window. No significant differences can be observed on WikiConflict, which is explained by the fact that the majority of shared nodes are activated in the early stage of the propagation, where the diffusion seems to behave in the same way regardless of the particular activation-threshold function.
Evaluation of npCF^{2}DLT
Compared to the evaluation of spCF^{2}DLT, the spread trends observed under npCF^{2}DLT showed no particular differences. However, more importantly, the occurrence of deactivation events, which are admitted by npCF^{2}DLT, appeared to favor the good campaign strategy, as shown in Fig. 14. In particular, in Epinions and Slashdot, the numbers of user-unique and total deactivations tend to increase for the bad campaign and to decrease for the good one; moreover, although the spread of the good campaign remains higher, the deactivations for the good campaign are more frequent than those for the bad campaign as long as the start-delay remains zero or low, and the confirmation bias factor is not introduced. A few differences arise in WikiConflict. As concerns StressTriads vs. ISources, although 95% of the shared users are activated first by the bad campaign, this advantage turned out not to be enough to prevent the good campaign from eventually activating more users. In fact, the number of deactivations with δ=0.1 increases for StressTriads and is always higher than for ISources, for which the statistic remains nearly constant as the start-delay increases. A similar situation was observed for the combination of MSources with ISources. Also, using LeastNew with MostNew led to no deactivations, which might be explained by the fact that the totality of shared users was reached first by the bad campaign (cf. Table 4).
Comparative evaluation
We conducted a comparative evaluation in two stages. The first one refers to the noncompetitive scenario, whereby we compared nCF^{2}DLT to the IC model and to two epidemic models, i.e., SIR and SEIR, inspired by studies in the epidemiology field. The second stage of our evaluation addresses the comparison of spCF^{2}DLT with the DLT model (Litou et al. 2016), which is the closest to our work, as we previously discussed in the “Related work” section.
Comparison with the IC, SIR and SEIR models
We begin by briefly recalling the basic principles underlying the competitor models considered in this section. The independent cascade (IC) model is a stochastic discrete-time diffusion model like LT, such that once a node becomes active, in the following time step of propagation it has a single chance of activating each of its out-neighbors. As concerns the epidemic models SIR and SEIR, the individuals of a population are divided into compartments that describe one of the following epidemiological states: susceptible (S), infective (I), latent-period or exposed (E), and recovered (R); individuals transition through those states. It is important to note that, since we need to treat each node in a network as an individual agent in order to enable a comparison with our diffusion model, we implemented SIR and SEIR based on a stochastic individual-contact network modeling as opposed to the standard, deterministic compartmental modeling (cf. “Related work” section). Within this view, the infection process in SIR is governed by two main parameters: (i) the transmission or contact rate (β), i.e., the probability of a susceptible node being infected by any of its infected in-neighbors; and (ii) the recovery rate (γ), i.e., the probability of an infected node transitioning to the recovered state with immunity, thus stopping the propagation of the disease along the network. Moreover, in the SEIR model, the transition out of the exposed state is governed by the incubation rate (σ), which defines the average duration of incubation as 1/σ; note that the notion of exposed state somehow resembles the quiescent state of nodes in nCF^{2}DLT.
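To make the individual-contact formulation concrete, the following is a minimal sketch of a discrete-time stochastic SIR process on a directed graph. It is not the implementation used in our experiments: we assume here that each infected in-neighbor independently infects a susceptible node with probability β at every time step, and that recovery (with probability γ) is evaluated after transmission within the same step; the function name and the adjacency-list format are also assumptions.

```python
import random

def simulate_sir(out_neighbors, seeds, beta, gamma, max_steps=100, rng=None):
    """out_neighbors: dict node -> list of out-neighbors.

    Returns (set of nodes ever infected, list with the number of infected
    nodes at each time step)."""
    rng = rng or random.Random(0)
    infected = set(seeds)
    ever_infected = set(seeds)
    history = [len(infected)]

    for _ in range(max_steps):
        if not infected:
            break
        # Transmission: every infected node tries to infect each of its
        # still-susceptible out-neighbors with probability beta.
        newly = set()
        for u in infected:
            for v in out_neighbors.get(u, ()):
                if v not in ever_infected and v not in newly and rng.random() < beta:
                    newly.add(v)
        # Recovery: nodes infected before this step recover with probability gamma.
        healed = {u for u in infected if rng.random() < gamma}
        infected = (infected - healed) | newly
        ever_infected |= newly
        history.append(len(infected))
    return ever_infected, history

# Toy path graph a -> b -> c (hypothetical data).
path = {"a": ["b"], "b": ["c"], "c": []}
ic_like = simulate_sir(path, ["a"], beta=1.0, gamma=1.0)
persistent = simulate_sir(path, ["a"], beta=1.0, gamma=0.0, max_steps=5)
blocked = simulate_sir(path, ["a"], beta=0.0, gamma=1.0)
```

With γ=1 every node gets exactly one step in which it can infect its out-neighbors, as in IC, whereas with γ=0 infected nodes keep contributing until the step limit is reached.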
Figure 15 shows the complementary cumulative distribution function (CCDF) of the probability for a node of being active/infected from any given time step t to the termination of the process. The presented results correspond to β set to 0.2 and γ set to either 0 or 1: note that γ=1 implies that any node recovers immediately after its activation, and hence, similarly to the IC model, it has a single chance of activating its susceptible out-neighbors; by contrast, setting γ=0 implies that a node is unable to recover after its activation, and therefore, similarly to nCF^{2}DLT, it will continue to contribute to the activation of its susceptible/inactive out-neighbors until the end of the process. (Further results for other settings of β and γ are reported in Appendix B.) Moreover, for SEIR, we set σ=0.4, thus imposing an average incubation time equal to 2.5 time steps; this setting enables a fair comparison with our model, since each node will be expected to spend the same amount of time in the quiescent state for nCF^{2}DLT (cf. “Settings of the model parameters” section) as in the exposed state for SEIR.
Looking at the figure, as expected IC and SIR (with γ=1) show an almost identical behavior, since most activations occur in the early stage of the propagation. On the contrary, when γ=0, the SIR model tends to behave relatively closer to nCF^{2}DLT than to IC, since the activations appear more uniformly distributed along the lifetime of the process. Also, the introduction of the exposed state in the SEIR model forces the dynamics of the propagation to be even more similar to those of the nCF^{2}DLT model, especially with γ=1. One general remark that stands out is that nCF^{2}DLT tends to favor a slower diffusion, since the propagation process lasts consistently longer than under IC and the epidemic models. Moreover, nCF^{2}DLT yields a smoother behavior in terms of time-decay of its CCDF than the other models.
In general, we can state that already for the noncompetitive scenario, epidemic models, even in their stochastic contact network formulation, provide a different solution in terms of behavioral dynamics w.r.t. our proposed nCF^{2}DLT.
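The correspondence between the two expected durations can be checked with a one-line calculation, assuming that τ_{v} is drawn uniformly from [0,5] (as stated in the parameter settings) and that σ acts as a per-step probability of leaving the exposed state, so that the mean incubation time is 1/σ:

```latex
\[
\mathbb{E}[\tau_v] = \frac{0 + 5}{2} = 2.5,
\qquad
\frac{1}{\sigma} = \frac{1}{0.4} = 2.5 .
\]
```

The two models therefore agree on the mean waiting time only; the shapes of the two waiting-time distributions may still differ.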
Comparison with the DLT model
We finally conducted a stage of comparative evaluation with the DLT method (Litou et al. 2016) (cf. “Related work” section). To this purpose, we analyzed the trends of spread and corresponding overlaps of activated nodes, under a competitive scenario. For DLT, we considered two cases: the one including the decay of influence probabilities (with Poisson decay coefficient set to 1), and the other one discarding the influence decay (as also studied in (Litou et al. 2016)), hereinafter dubbed DLT ^{∗}; as concerns our models, we were forced to use spCF^{2}DLT since DLT does not allow node deactivation.
Figure 16 shows results obtained on Slashdot, for the choice of strategies ISources and StressTriads (similar trends were observed for Epinions and the Wiki networks). A first remark is that, regardless of the seed selection strategy, the diffusion process under DLT terminates in very few time steps, mainly due to the influence decay factor. Also, before convergence, DLT enables the activation of more nodes than spCF^{2}DLT, though this actually corresponds to a small portion of the nodes finally activated by spCF^{2}DLT and, in any case, with an overlap that is generally below 50%.
We explored in more detail the spread overlap between spCF^{2}DLT and DLT as well as its variant without decay (DLT ^{∗}); for spCF^{2}DLT, we considered the settings δ=0 and δ=0.1. Figure 17 shows the heatmaps for the percentages of overlap of activated nodes at convergence of their respective models. As we observe in both plots, and regardless of δ, the overlap between DLT ^{∗} and spCF^{2}DLT is around 40–45%, which further drops to less than 10% when the influence decay is considered (the lighter the color, the lower the overlap).
Overall, we can conclude that DLT, and even its variant DLT ^{∗} without influence decay, behaves significantly differently from our semiprogressive F^{2}DLT model.
Discussion and usage recommendations
Our theoretical inspection of the proposed models, whose technical details have been presented in “Theoretical properties of the models” section, revealed two important findings:
(F1): The noncompetitive, progressive model, nCF^{2}DLT, is proven to be equivalent to LT with Quiescence Time; therefore, the activation function in nCF^{2}DLT is monotone and submodular.
(F2): The competitive, nonprogressive models, spCF^{2}DLT and npCF^{2}DLT, can be reduced, via graph serialization, to Homogeneous Competitive LT (Chen et al. 2013), which is competitive and progressive, and has a monotone, nonsubmodular activation function; therefore, the activation function in spCF^{2}DLT and npCF^{2}DLT is monotone but not submodular. It should be emphasized that the basic technique of graph serialization introduced in (Kempe et al. 2003) to reduce the nonprogressive LT-based diffusion to the progressive case cannot be applied to our proposed models, since it is not designed to deal with competitive or nonprogressive diffusion and it discards activation or delayed propagation aspects; to overcome this issue, we provided new serialization techniques and related definitions of layered graphs that are suitable for our competitive models, focusing on particular settings of the activation-threshold and quiescence functions.
The two findings clearly have different impact on the development, upon our F^{2}DLT models, of approximate solutions to influence maximization, rumor blocking, and related problems. On the other hand, in terms of expressiveness of our competitive F^{2}DLT models, it should be noted that the serialization techniques require the construction of layered graphs whose size easily grows with some of the models’ parameters, making the application of such serialized graphs unfeasible at a large scale. Therefore, using our competitive F^{2}DLT models turns out to be essential in the representation of complex, dynamic propagation phenomena.
Our proposed class of trust-aware, dynamic models for noncompetitive and competitive information diffusion offers a versatile solution for enhanced understanding of complex influence-propagation phenomena that occur in real-life network scenarios. Our models are also unique, since they have significantly different behavioral dynamics w.r.t. epidemic models and the dynamic linear threshold model, according to theoretical considerations that were also clearly supported by empirical evidence in our experimental evaluation.
It should be noted that the setting of the dynamic activation-threshold function 𝜗 and quiescence function ψ, especially of their parameters δ and λ, respectively, plays a crucial role in the expressiveness of our models, thus differently impacting on positive-influence propagation and on negative-influence/misinformation limitation. We make the following recommendations for the usage of our models.

The results of our evaluation revealed that the average user’s sensitivity to the negative influence perceived from distrusted neighbors (which is controlled by λ) makes the seed identification process more aware of the negative influence spread, thus considering the quiescence-biased contingencies by which a nonnegligible fraction of users cannot be activated before the time limit.

The confirmationbias effect underlying δ may lead the “stronger” campaign (i.e., the one able to activate most users at the early steps of its diffusion) to increase its spread capability.

As shown by simulations under the semiprogressive competitive model (spCF^{2}DLT), the combined effect of increased δ with an increase in the delay of the beginning of the second-started (good) campaign may reduce its capability of “saving” users from the influence of the bad campaign; therefore, to limit misinformation spread, the good campaign should concentrate its (activation) efforts in the early stage of its diffusion.

The nonprogressive competitive model (npCF^{2}DLT) appears to be less sensitive to the increase of δ. Yet, it tends to favor deactivation events (for users previously activated by the weaker campaign) over switched events.
In this regard, it would be interesting to study how to learn the various parameters in our models for the corresponding IM scenarios. Note that learning parameters for IM tasks is a challenging problem, which is still largely open, given the relatively little work done even under basic diffusion models (Saito et al. 2008; Goyal et al. 2010). Major difficulties are in the assumption of availability of past propagation data from which the parameters would be learnt, which is in general difficult to obtain, and the large number of parameters, which poses efficiency issues. To overcome these aspects, the approach in (Vaswani et al. 2017) appears to be particularly promising, as it does not depend on the rules that control how the propagation unfolds over time. Nonetheless, we expect that the learning problem in our setting will easily become much more challenging, given the presence of other parameters than just the diffusion probabilities, and the need for coping with competitive influence scenarios. Therefore, we believe there will be much work to do in such a direction.
Conclusion
We proposed a novel class of trust-aware, dynamic LT-based models for noncompetitive and competitive influence propagation in information networks. Evaluation on real-world, publicly available networks included simulations of scenarios of misinformation spread limitation, based on realistic strategies of selection of the initial influential users.
We believe that our proposed models can pave the way for the development of sophisticated methods to solve misinformation spread limitation and related optimization problems. Remarkably, our models can profitably be used in a variety of applications in which there is an urgent need to predict the time required to debunk fake information, or to estimate how people are affected by the spread of competing opposite opinions through a social network. Also, we envisage an effective support for fact-checking, through a contextualization of the activation and quiescence functions to the production/consumption of contents in interaction networks.
Appendix A: Additional details on the properties of the models
Figure 18 shows an example of serialization for a spCF^{2}DLT diffusion graph with time horizon set to 2. Dashed lines correspond to the edges in the original graph, whereas solid lines correspond to the edges in the resulting serialized graph. Each of the four nodes in the original graph is replicated as a triple on each of the two time-layers. Triples act as “connectors” between two consecutive time-layers.
Analogously to the reduction of spCF^{2}DLT to HCLT, we can conveniently devise a notion of “connector” component between any two consecutive layers, shown in Fig. 19, which in the case of npCF^{2}DLT needs to account for node deactivations.
Example 4 shows a selection of possible configurations for the component utilized in competitive models shown in Fig. 7, in order to prove the correctness of the set of constraints in Eq. 6.
Example 4
In the example of Fig. 20, we assume that node v in the original graph needs three consecutive time steps to reach the unit value for its threshold.
On the right side of each subfigure, there are the additional node replicas:
\(\left \langle v_{t}^{3,r1},v_{t}^{3,r2},v_{t}^{3,r3} \right \rangle \), where \(v_{t}^{3,r3}\) has the maximum value for the activation threshold.
Figure 20a represents the case when the node has already reached the maximum value of its threshold and its inneighbors are only able to activate the first two replicas, which is not enough for making v change its activation campaign. We need thus to verify that:
Note that the inequality in (9) holds (cf. Eq. 6 (e)).
Figure 20b shows the case when v is active for just two consecutive time steps. In this case, the configuration on the right side of the figure is enough to activate v in favor of the red campaign. Therefore, the following inequality must hold:
Above, inequality in (12) holds as given in Eq. 6 (a).
Figure 20c shows the case when the node v switched from one campaign to the opposite in the middle of the three consecutive time steps. In this case the activation of node \(v_{t}^{3,r1}\) must guarantee the change of activation campaign. So the following inequality must hold:
Indeed, the validation is straightforward, because all the summations on the left side in (13) are by definition greater than the one on the right side.
Figure 20d shows the case when at time step t none of the in-neighbors of v is able to activate it. In this case, the node will keep its previous state, and it will do so only after the activation of the corresponding replica at the immediately preceding time step; this allows us to guarantee the sequentiality of the whole process. ■
Appendix B: Additional details on epidemic models
Figure 21 provides a focus on the behavior of SIR and SEIR models in terms of varying parameters (i.e., transmission rate β, recovery rate γ, and incubation rate σ) for the analysis discussed in “Comparison with the IC, SIR and SEIR models” section.
We observe that, for both models, most of the infections tend to occur at the early time steps of the propagation as β increases. On the other hand, higher values of γ yield cascades that show a smoother decay over time, and consequently they last longer than those corresponding to smaller γ.
Notes
We assume the second additive term in Eq. (1) is zero if δ=0.
We used the splines2 R package, available at https://CRAN.R-project.org/package=splines2.
The choice of two different logarithmic scales to represent the active users and the switched users is for the sake of readability of the plots, since the number of switched users is typically orders of magnitude smaller than the number of active users.
Abbreviations
F^{2}DLT: Friend-foe dynamic linear threshold models
IC: Independent cascade model
LT: Linear threshold model
nCF^{2}DLT: Noncompetitive F^{2}DLT model
npCF^{2}DLT: Nonprogressive competitive F^{2}DLT model
OSN: Online social network
spCF^{2}DLT: Semiprogressive competitive F^{2}DLT model
References
Anagnostopoulos, A, Bessi A, Caldarelli G, Vicario MD, Petroni F, Scala A, Zollo F, Quattrociocchi W (2015) Viral misinformation: The role of homophily and polarization In: Proc. World Wide Web Conf, 355–356. ACM, New York.
Bessi, A, Caldarelli G, Del Vicario M, Scala A, Quattrociocchi W (2014) Social determinants of content selection in the age of (mis)information In: Proc. Int. Conf. on Social Informatics (SocInfo), 259–268. Springer International Publishing.
Bishop, J (2007) Increasing participation in online communities: A framework for human-computer interaction. Comput Hum Behav 23(4):1881–1893.
Brandes, U, Kenis P, Lerner J, van Raaij D (2009) Network analysis of collaboration structure in Wikipedia In: Proc. World Wide Web Conf, 731–740. ACM, New York.
Budak, C, Agrawal D, Abbadi AE (2011) Limiting the spread of misinformation in social networks In: Proc. World Wide Web Conf, 665–674. ACM, New York.
Caliò, A, Tagarelli A (2018) Trust-Based Dynamic Linear Threshold Models for Non-competitive and Competitive Influence Propagation In: Proc. IEEE Int. Conf. On Trust, Security And Privacy In Computing And Communications (TrustCom), 156–162. https://doi.org/10.1109/TrustCom/BigDataSE.2018.00033.
Chen, S, He K (2015) Influence Maximization on Signed Social Networks with Integrated PageRank In: Proc. IEEE Social Computing Conf, 289–292. IEEE Computer Society.
Chen, W, Lakshmanan LVS, Castillo C (2013) Information and Influence Propagation in Social Networks. Morgan & Claypool, San Rafael.
Chen, W, Lu W, Zhang N (2012) Time-critical influence maximization in social networks with time-delayed diffusion In: Proc. AAAI Conf, 592–598. AAAI Press.
Chen, W, Collins A, Cummings R, Ke T, Liu Z, Rincon D, Sun X, Wang Y, Wei W, Yuan Y (2011) Influence maximization in social networks when negative opinions may emerge and propagate In: Proc. SIAM Conf. on Data Mining. SIAM, Philadelphia.
Das, A, Gollapudi S, Kiciman E, Varol O (2016) Information dissemination in heterogeneous-intent networks In: Proc. ACM Conf. on Web Science, 259–268. ACM, New York.
Dodds, PS (2018) Slightly generalized contagion: Unifying simple models of biological and social spreading. In: Lehmann S, Ahn YY (eds) Complex Spreading Phenomena in Social Systems. Computational Social Sciences, 67–80. Springer International Publishing.
Fan, L, Lu Z, Wu W, Thuraisingham B, Ma H, Bi Y (2013) Least cost rumor blocking in social networks In: Proc. Distr. Comp. Syst. Conf, 540–549.
Fazli, M, Ghodsi M, Habibi J, Khalilabadi PJ, Mirrokni VS, Sadeghabad SS (2014) On nonprogressive spread of influence through social networks. Theor Comput Sci 550:36–50.
Garimella, K, De Francisci Morales G, Gionis A, Mathioudakis M (2017) The effect of collective attention on controversial debates on social media In: Proc. ACM Conf. on Web Science, 43–52. ACM, New York.
Gilbert, E, Karahalios K (2009) Predicting tie strength with social media In: Proc. Int. Conf. on Human Factors in Computing Systems (CHI), 211–220. ACM, New York.
Golbeck, J, Hendler JA (2006) Inferring binary trust relationships in web-based social networks. ACM Trans Internet Techn 6(4):497–529.
Goyal, A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks In: Proc. Int. Conf. on Web Search and Web Data Mining (WSDM), 241–250. ACM, New York.
Hamdi, S, Bouzeghoub A, Gancarski AL, Yahia SB (2013) Trust inference computation for online social networks In: Proc. IEEE TrustCom/ISPA/IUCC, 210–217. IEEE Computer Society.
Han, L, Ma Z, Shi T (2003) An SIRS epidemic model of two competitive species. Math Comput Model 37(12):87–108.
He, X, Song G, Chen W, Jiang Q (2012) Influence blocking maximization in social networks under the competitive linear threshold model In: Proc. SIAM Conf. on Data Mining, 463–474.
Hethcote, H (2000) The mathematics of infectious diseases. SIAM Rev 42(4):599–653.
Hu, M, Jia S, Chen Q, Jia Z, Hong L (2013) The analysis of epidemic disease propagation in competition environment In: Proc. Intelligent Automation Conf, 227–234. Springer Berlin Heidelberg, Berlin.
Iniguez, G, Ruan Z, Kaski K, Kertesz J, Karsai M (2018) Service adoption spreading in online social networks. In: Lehmann S, Ahn YY (eds) Complex Spreading Phenomena in Social Systems. Computational Social Sciences, 151–175. Springer International Publishing.
Iribarren, JL, Moro E (2009) Impact of human activity patterns on the dynamics of information diffusion. Phys Rev Lett 103(3).
Ji, C, Jiang D, Yang Q, Shi N (2012) Dynamics of a multigroup SIR epidemic model with stochastic perturbation. Automatica 48(1):121–131.
Jiang, W, Wang G, Wu J (2014) Generating trusted graphs for trust evaluation in online social networks. Future Generation Comp Syst 31:48–58.
Kempe, D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network In: Proc. ACM Conf. Knowl. Disc. Data Min, 137–146. ACM, New York.
Kim, J, Tabibian B, Oh A, Schölkopf B, Gomez-Rodriguez M (2018) Leveraging the crowd to detect and reduce the spread of fake news and misinformation In: Proc. ACM Conf. on Web Search and Data Mining, 324–332. ACM, New York.
Koutra, D, Bennett PN, Horvitz E (2015) Events and controversies: Influences of a shocking news event on information seeking In: Proc. World Wide Web Conf, 614–624. ACM, New York.
Krishnan, S, Butler P, Tandon R, Leskovec J, Ramakrishnan N (2016) Seeing the forest for the trees: New approaches to forecasting cascades In: Proc. ACM Conf. on Web Science, 249–258. ACM, New York.
Kumar, KPK, Geethakumari G (2014) Detecting misinformation in online social networks using cognitive psychology. Human-cent Comp Inf Sci 4:1–22.
Kumar, S, West R, Leskovec J (2016) Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes In: Proc. World Wide Web Conf, 591–602. International World Wide Web Conferences, Republic and Canton of Geneva.
Leskovec, J, Huttenlocher D, Kleinberg J (2010a) Governance in social media: A case study of the Wikipedia promotion process In: Proc. Conf. on Weblogs and Social Media. The AAAI Press.
Leskovec, J, Huttenlocher D, Kleinberg J (2010b) Signed networks in social media In: Proc. Conf. on Human Factors in Computing Systems, 1361–1370. ACM, New York.
Lewandowsky, S, Ecker UKH, Seifert CM, Schwarz N, Cook J (2012) Misinformation and its correction. Psychol Sci Public Interest 13(3):106–131.
Liben-Nowell, D, Kleinberg J (2008) Tracing information flow on a global scale using internet chain-letter data. Proc Natl Acad Sci 105(12):4633–4638.
Litou, I, Kalogeraki V, Katakis I, Gunopulos D (2016) Real-time and cost-effective limitation of misinformation propagation In: Proc. IEEE Conf. on Mobile Data Management, 158–163. IEEE Computer Society.
Liu, B, Cong G, Xu D, Zeng Y (2012) Time constrained influence maximization in social networks In: Proc. IEEE Conf. on Data Mining, 439–448. IEEE Computer Society.
Liu, H, Lim E, Lauw HW, Le M, Sun A, Srivastava J, Kim YA (2008) Predicting trusts among users of online communities: an Epinions case study In: Proc. ACM Conf. on Electronic Commerce (EC), 310–319. ACM, New York.
Lou, VY, Bhagat S, Lakshmanan LVS, Vaswani S (2014) Modeling nonprogressive phenomena for influence propagation In: Proc. ACM Conf. on Online Social Networks, 131–137. ACM, New York.
Lu, W, Chen W, Lakshmanan LVS (2015) From competition to complementarity: Comparative influence diffusion and maximization. Proc VLDB Endow 9(2):60–71.
Metaxas, P, Mustafaraj E (2010) From obscurity to prominence in minutes: Political speech and real-time search In: Proc. ACM Conf. on Web Science.
Mohamadi-Baghmolaei, R, Mozafari N, Hamzeh A (2015) Trust based latency aware influence maximization in social networks. Eng Appl Artif Intell 41:195–206.
Mustafaraj, E, Metaxas PT (2017) The fake news spreading plague: Was it preventable? In: Proc. ACM Conf. on Web Science, 235–239. ACM, New York.
Overgoor, J, Wulczyn E, Potts C (2012) Trust propagation with mixed-effects models In: Proc. Int. Conf. on Weblogs and Social Media (ICWSM). The AAAI Press.
Porter, MA, Gleeson JP (2016) Dynamical Systems on Networks. Frontiers in Applied Dynamical Systems: Reviews and Tutorials. Springer, Cham.
Saito, K, Nakano R, Kimura M (2008) Prediction of information diffusion probabilities for independent cascade model In: Proc. Int. Conf. on Knowledge-Based Intelligent Information and Engineering Systems (KES), 67–75. Springer.
Sherchan, W, Nepal S, Paris C (2013) A survey of trust in social networks. Comput Surv ACM 45(4):47:1–47:33.
Talluri, M, Kaur H, He JS (2015) Influence maximization in social networks: Considering both positive and negative relationships In: Proc. Conf. on Collaboration Technologies and Systems, 479–480. IEEE.
Tang, J, Liu H (2015) Trust in Social Media. Synthesis Lectures on Information Security, Privacy, & Trust. Morgan & Claypool Publishers, San Rafael.
Tedjamulia, SJJ, Dean DL, Olsen DR, Albrecht CC (2005) Motivating content contributions to online communities: Toward a more comprehensive theory In: Proc. Int. Conf. on System Sciences (HICSS). IEEE Computer Society, Washington, DC.
Tong, G, Wu W, Guo L, Li D, Liu C, Liu B, Du D (2017) An efficient randomized algorithm for rumor blocking in online social networks In: Proc. IEEE Conf. on Computer Communications, 1–9.
Vaswani, S, Kveton B, Wen Z, Ghavamzadeh M, Lakshmanan LVS, Schmidt M (2017) Model-Independent Online Learning for Influence Maximization In: Proc. Int. Conf. on Machine Learning (ICML), 3530–3539. PMLR.
Vazquez, A, Rácz B, Lukács A, Barabási AL (2007) Impact of non-Poissonian activity patterns on spreading processes. Phys Rev Lett 98:158702.
Vedula, N, Parthasarathy S, Shalin VL (2017) Predicting trust relations within a social network: A case study on emergency response In: Proc. ACM Conf. on Web Science, 53–62. ACM, New York.
Weng, X, Liu Z, Li Z (2016) An efficient influence maximization algorithm considering both positive and negative relationships In: Proc. IEEE TrustCom/BigDataSE/ISPA, 1931–1936. IEEE.
Yin, G, Jiang F, Cheng S, Li X, He X (2012) AUTrust: A Practical Trust Measurement for Adjacent Users in Social Networks In: Proc. Int. Conf. on Cloud and Green Computing (CGC), 360–367. IEEE Computer Society.
Acknowledgements
This research work has been partly supported by the Start(H)Open POR Grant No. J28C17000380006 funded by the Calabria Region Administration, and by the NextShop PON Grant No. F/050374/0103/X32 funded by the Italian Ministry for Economic Development.
Funding
Not applicable.
Availability of data and materials
Data used in this work refer to publicly available networks.
Author information
Contributions
AT and AC devised the study, analyzed the data, and wrote the paper; AC performed the experiments and prepared the result presentation; AT supervised the writing of the paper. Both authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional information
An abridged version of this work appeared in the Proc. of the 2018 IEEE TrustCom Conf. (Caliò and Tagarelli 2018).
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Caliò, A., Tagarelli, A. Complex influence propagation based on trust-aware dynamic linear threshold models. Appl Netw Sci 4, 14 (2019). https://doi.org/10.1007/s41109-019-0124-5
DOI: https://doi.org/10.1007/s41109-019-0124-5