Complex influence propagation based on trust-aware dynamic linear threshold models
Applied Network Science volume 4, Article number: 14 (2019)
Abstract
To properly capture the complexity of influence propagation phenomena in real-world contexts, such as those related to viral marketing and misinformation spread, information diffusion models should fulfill a number of requirements. These include accounting for several dynamic aspects in the propagation (e.g., latency, time horizon), dealing with multiple cascades of information that might occur competitively, accounting for the contingencies that lead a user to change her/his adoption of one or alternative information items, and leveraging trust/distrust in the users’ relationships and its effect of influence on the users’ decisions. To the best of our knowledge, no diffusion model unifying all of the above requirements has been developed so far. In this work, we address such a challenge and propose a novel class of diffusion models, inspired by the classic linear threshold model, which are designed to deal with trust-aware, non-competitive as well as competitive time-varying propagation scenarios. Our theoretical inspection of the proposed models unveils important findings on the relations with existing linear threshold models for which properties are known about whether monotonicity and submodularity hold for the corresponding activation function. We also propose strategies for the selection of the initial spreaders of the propagation process, for both non-competitive and competitive influence propagation tasks, whose goal is to mimic contexts of misinformation spread. Our extensive experimental evaluation, which was conducted on publicly available networks and included comparison with competing methods, provides evidence on the meaningfulness and uniqueness of our models.
Introduction
Since the early applications in viral marketing, the development of information diffusion models and their embedding in optimization methods has provided effective support to address a variety of influence propagation problems.
However, due to the shrinking boundary between real and online/virtual social life (Bessi et al. 2014) along with the unlimited misinformation spots over the Web, e.g., fake news (Kumar et al. 2016; Kim et al. 2018), deciding whether a source of information is reliable or not has become a delicate task. For these reasons, understanding the complex dynamics of information diffusion phenomena has emerged as a task of paramount importance, since the way people act on the Web reflects how people behave in reality, which eventually depends to some extent on the way everyone consumes and acquires information.
A few studies on the spreading of fake news and hoaxes (Metaxas and Mustafaraj 2010; Mustafaraj and Metaxas 2017) argued that people are more likely to be deceived by a spreading information item because assessing the reliability and trustworthiness of the source generating and/or sharing such an item becomes harder. Within this view, one side effect is the tendency of users to access information from like-minded sources (Koutra et al. 2015) and, at the same time, to be trapped inside information bubbles, thus favoring network polarization phenomena (Garimella et al. 2017).
When it comes to debunking misinformation, two main strategies can be devised: real-time detection and correction, or delayed correction (Kumar and Geethakumari 2014). However, in both cases, the response time plays a crucial role in the effectiveness of the correction attempt, because users tend to reinforce their own belief, a cognitive phenomenon known as confirmation bias. Moreover, there is no guarantee about the effectiveness of such corrections: on the contrary, highlighting a fake news item may even produce a backfire effect, i.e., driving users’ attention towards the misleading piece of information.
In this scenario, it appears that one recipe to deal with the interleaving of information and dis/misinformation should be to educate people to be mindful of the informative source. Unfortunately, it is often difficult to understand where an information item originated from. Therefore, it turns out to be essential to capture the effects that different types of social ties, particularly trust/distrust relationships, can have on both the user behavior and propagation dynamics. Two related questions hence arise:
Q1: What are the key features that make a diffusion model able to explain the inherent dynamic, and often competitive, nature of real-world propagation phenomena?
Q2: Do the currently used models of diffusion already incorporate such features?
To address question Q1, we recognize a number of aspects as essential constituents of a “realistic” information diffusion model, namely: (1) leveraging trust/distrust information in the user relationships to capture different effects of influence on decisions taken by a user; (2) accounting for a user’s change in adopting one or alternative information items (i.e., relaxation of the diffusion progressivity assumption); (3) accounting for a user’s hesitation or inclination towards the adoption of an information item over time; (4) accounting for time-dependent variables, such as latency, to explain the propagation dynamics; (5) dealing with multiple cascades of information that might occur competitively.
Motivating example. Our above hypothesis is supported by the following example: consider a typical scenario occurring in a political campaign, where two candidates want to target the audience of potential electors. Let us assume that, at the start of the political campaign, every elector has a completely unbiased opinion towards the two candidates. The ultimate decision about which candidate to vote for will likely be affected by both “exogenous” and “endogenous” influencing factors, i.e., one may be genuinely influenced by decisions taken by her/his social contacts (the impact of homophily factors), but s/he may also have formed her/his own opinion outside the network of friends. In fact, not only friends, but also the network of foes has some degree of influence over the decision process of an individual. As a consequence of such negative influence received from foes, one may become more hesitant in taking a decision, which would be reflected by a quiescence state of the elector before being fully engaged in the promotion of the chosen candidate. Moreover, although an elector may alternate her/his opinion in favor of one or the other candidate before the final endorsement, it will be more difficult to induce this change over time. In this regard, a time-aware notion of activation threshold is needed to mimic the effects of the confirmation bias. Finally, all decisions must be taken before the time limit, i.e., the election day, which constrains the political campaign period.
Question Q2 has been addressed by a relatively large corpus of research studies in the last few years. A variety of methods, mainly built upon classic information diffusion models such as Independent Cascade (IC) and Linear Threshold (LT) (Kempe et al. 2003), have tried to explain realistic propagation phenomena in order to solve optimization problems related to influence propagation. As we shall discuss in the “Related work” section, diffusion models have been developed to incorporate one or more of the following aspects: multiple, competitive cascades of information; time horizon for the unfolding of the diffusion process; time-dependent influence; delay in the propagation; and trustworthiness of the influence relations. However, to the best of our knowledge, all of the above aspects have never been unified into the same (LT-based) diffusion model.
Contributions. In this paper, we propose a novel class of diffusion models, named Friend-Foe Dynamic Linear Threshold Models (F2DLT). They are based on the classic LT model and are designed to deal with non-competitive as well as competitive time-varying propagation scenarios. In our proposed models, the information diffusion graph is defined on top of a trust network, so that the strength of trust and distrust relationships is encoded into the influence probabilities. The response of a user to the influencing attempts is described by means of a time-varying activation function, depending on both the inherent activation-threshold of the user and her/his tendency to keep or leave the campaign-specific activation state over time. We also introduce a quiescence function to model the latency or delay in the propagation, which accounts for the involvement of the user’s foes in the information diffusion. Remarkably, in our models, the trusted connections and distrusted connections play different roles: only friends can exert a degree of influence for activation/contagion purposes, whereas foes can only contribute to increasing the user’s hesitation to commit to the propagation process. For competitive scenarios, we define two models with clearly different semantics: a semi-progressive model, which assumes that a user, once activated, is only allowed to switch to a different campaign, and a non-progressive model, which instead requires a user to always have the support of her/his in-neighbors to keep the activation state with a certain campaign.
We provided several theoretical insights into the proposed models. In particular, we demonstrated how each of our models could be reduced to other LT-based models for which properties are known about whether monotonicity and submodularity hold for the corresponding activation function.
Another contribution of this work is the definition of four seed selection strategies, which mimic different, realistic scenarios of influence propagation. These strategies are central to our methodology of propagation simulation, since the development of optimization methods under our diffusion models is beyond the goals of this work. Notably, in competitive scenarios, we have focused on combinations of strategies (to associate with competing campaigns) that might be reasonably considered for a misinformation spread limitation problem.
Experimental evaluation conducted on four real-world networks, also including comparison with stochastic epidemic models and the dynamic linear-threshold (DLT) model, has provided interesting findings on the meaningfulness and uniqueness of our proposed models.
Related work
We overview information diffusion models that, in the attempt of explaining realistic propagation phenomena, incorporate one or more of the following aspects: multiple, competitive cascades of information, time horizon for the unfolding of the diffusion process, time-dependent influence, delay in the propagation, trustworthiness of the influence relations. Table 1 provides a guide to our discussion.
Please note that here we refer to the vast literature on probabilistic models originally designed to explain stochastic processes of information diffusion, which include the classic Independent Cascade (IC) and Linear Threshold (LT) models (Chen et al. 2013), and related optimization problems, such as influence maximization. By contrast, we will leave out of consideration deterministic models, such as the structural cascades specifically designed to model context/content-sensitive diffusion over an interaction network (e.g., (Krishnan et al. 2016; Das et al. 2016)). Also, it is worth noting that the information diffusion modeling problem we tackle in this work is significantly different from the one addressed by epidemic models, such as SIS, SIR(S), and SEIR(S) (Hethcote 2000), already for the non-competitive scenario. Standard epidemic models are originally defined as compartmental models, since the individuals of a population are divided into compartments that describe an epidemiological state. The parameters used to represent transition rates for changing states are absolute constants, which means that the infection process in compartmental models has a deterministic behavior. Also, standard epidemic models are of mass-action type, since individuals are represented as a normalized fraction of a population whose members randomly interact with each other. As discussed in (Dodds 2018), even social contagion based on stochastic or generalized epidemic models (i.e., there is a probability distribution of rates to govern the infection process) is originally defined on random networks, and its revision to deal with social networks would lead to more complicated models. In this regard, one direction is taken by the stochastic individual-contact network models, whereby SIS and related models are reformulated by considering a stochastic infection process and a network-based population of individually identifiable elements.
In the “Comparison with the IC, SIR and SEIR models” section, we present a stage of experimental evaluation devoted to a comparison with such models. However, even if epidemic models have also been used for social influence, they are not the most common approach to such a topic (Porter and Gleeson 2016). This is mainly due to the fact that identifying and modeling the causal mechanisms of the spread of ideas is more difficult than for the spread of diseases. By contrast, the threshold models for influence propagation (even the simplest ones) have two important features that are not clearly present in epidemic models. First, individuals have different behaviors, with such differences being reflected in the distribution of activation thresholds associated with the individuals; by contrast, in stochastic epidemic models, the state-transition probabilities are drawn independently of the individuals’ relations. Second, an individual’s behavior also depends on the behavior of other individuals s/he is linked to: here, it is helpful to think about threshold models as an example of complex contagions, whereby an individual takes an action as a result of the exposure to multiple sources of influence; by contrast, epidemic models are more likely to represent simple contagion, in that a single source of influence (as social contact) may suffice to cause an individual’s action. Moreover, while the transition to the recovered state assumes non-progressivity in stochastic diffusion models such as LT or IC, such a transition in SIR(S) is defined to happen spontaneously, discarding any influence that may result from the interaction with other individuals. For all such reasons, threshold models are usually considered more appropriate in contexts like the adoption of new technologies or controversial ideas (Chen et al. 2013). And, in our work, we indeed follow this line of research.
One further point of divergence concerns the notion of competitiveness that is somehow found in advanced epidemic models: this refers to the presence of two or more groups of individuals (with some distinguishing characteristics) which are, however, affected by the same, single disease (Han et al. 2003; Ji et al. 2012; Hu et al. 2013); it therefore corresponds to a totally different notion from the one addressed in our work.
In the following, we briefly recall the definition of the LT model, which is at the basis of our proposal; then, we focus on related work that addresses the aforementioned aspects concerning complex propagation phenomena.
The classic Linear Threshold model. Given a directed graph representing a social network, with estimates of influence probabilities provided as edge weights, nodes can be āactivatedā (i.e., influenced) through an information cascade starting from an initially selected set of seed nodes (i.e., early-adopters). At the beginning of the information diffusion process, each node is assigned a threshold uniformly at random from [0,1]. The diffusion process unfolds in discrete time steps and follows certain rules: nodes are either active or inactive; once activated, nodes cannot deactivate; an active node may trigger activation of neighboring nodes; a node can be activated at time t+1 by its active neighbors if their total influence weight at time t exceeds the threshold associated to that node. The process runs until no more activations are possible.
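The rules recalled above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `linear_threshold`, the dictionary-based graph encoding, and the use of a non-strict comparison (total influence at least equal to the threshold) are choices made here for illustration and are not taken from the paper.

```python
import random


def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Simulate the classic (progressive) LT model in discrete time steps.

    in_weights: dict mapping node v -> {u: w_uv}, the influence weights of
                v's in-neighbors (assumed to sum to at most 1 for each v).
    seeds:      iterable of initially active nodes (early adopters).
    thresholds: optional dict node -> threshold; when omitted, thresholds
                are drawn uniformly at random from [0, 1).
    Returns a list with the set of active nodes at each time step.
    """
    rng = rng or random.Random()
    nodes = set(in_weights) | set(seeds)
    for nbrs in in_weights.values():
        nodes |= set(nbrs)
    if thresholds is None:
        thresholds = {v: rng.random() for v in nodes}

    active = set(seeds)
    history = [set(active)]
    while True:
        newly_active = set()
        for v in nodes - active:
            # Total influence weight of the neighbors active at time t.
            total = sum(w for u, w in in_weights.get(v, {}).items() if u in active)
            if total >= thresholds[v]:  # assumption: non-strict comparison
                newly_active.add(v)
        if not newly_active:  # no more activations are possible
            break
        active |= newly_active  # progressive: nodes never deactivate
        history.append(set(active))
    return history
```

For instance, on a three-node chain a→b→c with weights 0.6 and 0.5 and both thresholds fixed to 0.5, seeding a activates b at step 1 and c at step 2.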
Competitive diffusion. A number of studies have been devoted to model competitive diffusion; see, e.g., (Chen et al. 2013) for a general introduction to IC and LT classes of competitive diffusion models. Focusing on competitive diffusion and related optimization problems under the context of misinformation spread limitation, one of the earliest works is (Budak et al. 2011), which proposes a multi-campaign IC model to address the influence limitation problem, i.e., to find a seed set of size k for one, “good” campaign such that the number of nodes influenced by the other, “bad” campaign is minimized. In (Tong et al. 2017), the problem of rumor blocking is addressed under the competitive IC model and a randomized algorithm is developed for the selection of the seed set able to yield the maximum reduction in the number of bad-infected nodes. An influence blocking maximization problem is also addressed in (He et al. 2012), using competitive LT. In (Chen et al. 2011), the two competing cascades correspond to opposite opinions, where the negative one may emerge spontaneously from any user in the network, e.g., a user gets disappointed with a purchased item and decides to spread a negative opinion among her/his contacts. Lu et al. (2015) also address the aspect of complementarity between two competing campaigns, under the assumption that if the two information items are correlated then the adoption of one item might favor further adoption of the second item over time.
Non-progressive diffusion. While modeling the competitive nature of information cascades, the above works however refer to progressive models. On the contrary, a few studies have been proposed to model non-progressive diffusion. For instance, (Lou et al. 2014) introduces a deactivation function into a continuous non-progressive model, whereas an extension of LT is proposed in (Fazli et al. 2014) to define a non-progressive strict majority model. However, both models are also non-competitive.
Social ties and temporal aspects. All of the aforementioned works discard two important aspects: (i) the nature of social ties and their impact on the influence propagation, and (ii) time aspects concerning the diffusion process. The dichotomy between opposite types of social ties (e.g., friend vs. foe relations) has been widely studied in OSN analysis (e.g., (Leskovec et al. 2010b)); however, its incorporation into diffusion models has been relatively little explored so far. For instance, two extensions of competitive LT with negative relations are defined in Talluri et al. (2015), to support positive opinion maximization, and in Weng et al. (2016), to model the adoption of opinions from friends or opposite opinions from foes. All such models are competitive but do not consider temporal aspects in the activation or propagation processes.
Several works have studied different types of temporal variables and their impact on spreading processes (e.g., (Liben-Nowell and Kleinberg 2008; Vazquez et al. 2007; Iribarren and Moro 2009)), mainly focusing on lags and delays due to the diverse response-time and heterogeneous susceptibility of users. In (Liu et al. 2012), the authors propose a latency-aware IC model inspired by (Iribarren and Moro 2009), in which an influencing delay is introduced in the activation function. Under this model, a time-constrained IM problem is defined, i.e., to find a seed set of size k that maximizes the expected number of nodes activated before a given time limit. Another extension of IC is proposed in (Chen et al. 2012), where a notion of meeting probability is introduced to control the activation of neighbors. The models in Liu et al. (2012) and Chen et al. (2012) are non-competitive. By contrast, the trust-based latency-aware IC model proposed in (Mohamadi-Baghmolaei et al. 2015) features competitiveness, non-progressivity, temporal delay in propagation, and is also designed to deal with trust/distrust relations.
Dynamic behaviors. All of the previously mentioned works still lack aspects modeling the dynamic behavior of the users. In particular, according to recent studies about polarization of opinion in OSNs (Anagnostopoulos et al. 2015) and related works about misinformation reduction (Kumar and Geethakumari 2014; Lewandowsky et al. 2012), a crucial aspect is to intervene before a competing campaign can reach the users, or at least soon enough, so that a user does not have time to radicalize her/his thoughts. This idea was first captured in (Litou et al. 2016), where a dynamic LT model (DLT) is defined to deal with competitive information cascades. The influence weights temporally decay according to a Poisson distribution, and every node can be either positively or negatively activated at a given time depending on the absolute value of the cumulative influence of its neighbors, while the activation sign depends on the sign of the cumulative influence. Moreover, a dynamic behavior aspect lies in the update of the activation threshold whenever a user switches her/his belief.
The latter work shares with our proposal all features of competitiveness, non-progressivity (although deactivation is not allowed), time-aware propagation, dynamic influence behavior, and incorporation of opposite opinions in the influence probabilities; moreover, it is also based on LT. However, our competitive models differ from DLT in (Litou et al. 2016) since (i) we explicitly model trust and distrust relationships to define the influence probabilities, (ii) our activation function takes into account only the trusted connections while (iii) distinguishing between the two information cascades; (iv) we introduce a quiescence function to model a delay in the information propagation depending on the strength of influence exerted by distrust relations (i.e., foe neighbors); finally, (v) the activation threshold in our models becomes stronger over time as a node is holding a particular belief. In the “Comparison with the DLT model” section we shall compare our models with DLT.
Friend-foe dynamic linear threshold models
In this section we describe our proposed class of Friend-Foe Dynamic Linear Threshold (F2DLT) models, which is comprised of: the Non-Competitive F2DLT (nC-F2DLT), the Semi-Progressive Competitive F2DLT (spC-F2DLT), and the Non-Progressive Competitive F2DLT (npC-F2DLT). We first provide an overview of the framework based on F2DLT. Next, we introduce key features common to all models, then we elaborate on each of them.
Overview
Figure 1 illustrates the conceptual architecture of a framework for information diffusion and influence propagation based on our proposed models. Given a population of OSN users, the framework requires three main inputs: (i) a trust network, which is inferred from the social network of those users to model their trust/distrust relationships; (ii) user behavioral characteristics that are intrinsic to each user (i.e., exogenous to an information diffusion scenario) and oriented to express two aspects: activation-threshold, i.e., the effort needed to activate a user through cumulative influence from her/his neighbors, and quiescence, i.e., the user’s hesitation in being actively committed to the propagation process; and, (iii) one or multiple competing campaigns, i.e., information cascades generated from the agent(s) having viral marketing purposes. Moreover, the information diffusion process has a time horizon, and its temporal unfolding is reflected in the evolution of the information diffusion graph: this also depends on the dynamics of the users’ behaviors in response to the influence chains started by the campaign(s), which admit that users may switch from the adoption of a campaign’s item to that of another one. Putting it all together, our F2DLT based framework embeds all previously discussed aspects that are required to explain complex propagation phenomena, i.e., competitive diffusion, non-progressivity, time-aware activation, delayed propagation, and trust/distrust relations.
Please note that inferring a trust network from a social network is not an objective of this work; rather, we assume that trust relationships between users of an OSN are available and, as we shall describe next in this section, they are exploited as key information to develop our proposed models. Several heuristics have been proposed to infer a trust network from social relations and interactions among users in an OSN. A common approach is to infer trust relationships based on the social influence exerted by users over the network and propagation of trust ratings (Golbeck and Hendler 2006; Overgoor et al. 2012; Hamdi et al. 2013; Jiang et al. 2014). Other studies utilize users’ activities in social media (Gilbert and Karahalios 2009), or users’ attributes and interactions (Liu et al. 2008), or combinations of aspects concerning user affinity, familiarity and reputation (Yin et al. 2012), social influence, social cohesion and the affective valence expressed by the users in the textual contents they produce (Vedula et al. 2017). The interested reader may refer to (Tang and Liu 2015; Sherchan et al. 2013) for an exhaustive overview on the topic.
Basic definitions
We are given a trust network represented by a directed graph \(G = \langle V, E, w \rangle\), with set of nodes V, set of edges E, and weighting function \(w: E \mapsto [-1,1]\) such that, for every edge \((u,v) \in E\), \(w_{uv} := w(u,v)\) expresses how much v trusts its in-neighbor u. A positive, resp. negative, value of \(w_{uv}\) corresponds to a trust, resp. distrust, relation.
For every \(v \in V\), we denote with \(N^{in}_{+}(v)\) and \(N^{in}_{-}(v)\) the set of neighbors trusted by v (i.e., friends of v) and the set of neighbors distrusted by v (i.e., foes of v), respectively. Moreover, as required in linear threshold models, the constraints \(\sum \nolimits _{u \in N^{in}_{+}(v)} w_{uv} \leq 1\) and \(\sum \nolimits _{u \in N^{in}_{-}(v)} |w_{uv}| \leq 1\) must be fulfilled.
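As a minimal sketch of these definitions, the following Python helpers split the in-neighbors of each node into friends and foes according to the sign of the edge weight, and report the nodes that violate either of the two constraints. The function names, the edge-dictionary encoding, and the numerical tolerance are hypothetical choices made here for illustration.

```python
def split_in_neighbors(edges):
    """Build friend and foe in-neighbor maps from signed edge weights.

    edges: dict mapping (u, v) -> w_uv in [-1, 1], where w_uv expresses how
           much v trusts its in-neighbor u.
    Returns (friends, foes): dicts v -> {u: |w_uv|} for trusted and
    distrusted in-neighbors, respectively. Zero-weight edges are ignored.
    """
    friends, foes = {}, {}
    for (u, v), w in edges.items():
        if not -1.0 <= w <= 1.0:
            raise ValueError(f"weight of edge {(u, v)} outside [-1, 1]: {w}")
        if w > 0:
            friends.setdefault(v, {})[u] = w
        elif w < 0:
            foes.setdefault(v, {})[u] = -w
    return friends, foes


def lt_constraint_violations(friends, foes, tol=1e-9):
    """Return the sorted nodes whose friend or foe weights sum to more than 1."""
    bad = set()
    for nbr_map in (friends, foes):
        for v, nbrs in nbr_map.items():
            if sum(nbrs.values()) > 1.0 + tol:
                bad.add(v)
    return sorted(bad)
```

With edges u→v (0.3), z→v (0.5) and x→v (−0.4), node v has friends {u, z} and foe {x}, and both constraints hold.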
Let \({\mathcal {G}} = G(g, q, T) = \langle V, E, w, g, q, T \rangle \) be a directed weighted graph representing the LT-based information diffusion graph associated with trust network G, where T denotes a time interval for the diffusion process, g and q denote time-dependent activation-threshold and quiescence functions. These are introduced in \({\mathcal {G}}\) to model the aspects of time-aware activation and delayed propagation, respectively. We use symbol \(S_{t}\) to denote the set of active nodes at time t, and symbol \({\widetilde {S}}_{t}\) to denote the set of active nodes for which, at t, the quiescence time is not expired yet, i.e., the quiescent nodes.
Activation-threshold function. According to the LT model, every node \(v \in V\) is associated with an exogenous activation-threshold, \(\theta_{v} \in (0,1]\), which corresponds to the a-priori effort needed in terms of cumulative influence to activate the node. We enhance this concept by defining an activation-threshold function, \(g: V, T \mapsto \mathbb {R}^{+}\), such that for every \(v \in V\) and \(t \in T\):
i.e., the activation of v at time t depends both on the user’s pre-assigned threshold, \(\theta_{v}\), and on a time-evolving activation term, \(\vartheta(\cdot,\cdot)\), which models the dynamic response of a user towards the activation attempt exerted by her/his neighbors.
To specify \(\vartheta(\cdot,\cdot)\), we devise two main scenarios for \(g(\cdot,\cdot)\):
- A biased scenario, modeled as a non-decreasing monotone function, to capture the tendency of a user to consolidate her/his belief, according to the confirmation-bias principle (Anagnostopoulos et al. 2015).
- An unbiased scenario, modeled as a non-monotone function, whereby we assume that a user could revise her/his uncertainty to activate over time, thus becoming more or less inclined to change her/his opinion on an information item. This is particularly meaningful in applications such as customer retention, or churn prediction (i.e., a decrease in the activation-threshold would correspond to the tendency of a user to churn in favor of another service).
Both variants of \(\vartheta(\cdot,\cdot)\) range within the interval [0,1], for any \(v \in V\).
Let us first consider the biased scenario, which is focused on the confirmation bias principle. We choose the following form for the activation-threshold function, by which the value increases by increasing the time a node keeps staying in the same active state:
where \(t_{v}^{last}\) denotes the last (i.e., most recent) time v was activated and \(\delta \geq 0\) (see Footnote 1) represents the increment in the value of g(v,t) for consecutive time steps. Thus, the longer a node has kept its active state for the same information cascade (campaign), the higher its activation value and, as a consequence, the harder it will be to make the node change its state; this may even become impossible (i.e., g(v,t) saturates to 1, as the difference \(\left (t - t_{v}^{last}\right)\) exceeds \((1-\theta_{v})/\delta\)).
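The behavior described for the biased scenario (a linear increment of δ per time step, capped at 1) can be sketched as follows. This is one functional form consistent with the stated properties, written here as an assumption; the function name `biased_threshold` and its argument order are hypothetical.

```python
def biased_threshold(theta_v, delta, t, t_last):
    """Activation-threshold of a node under the biased scenario (sketch).

    theta_v: exogenous activation-threshold of the node, in (0, 1].
    delta:   non-negative increment per time step.
    t:       current time step.
    t_last:  most recent time at which the node was activated (t >= t_last).

    Assumed form: the threshold grows linearly with the time spent in the
    same active state and saturates at 1.
    """
    if not 0.0 < theta_v <= 1.0:
        raise ValueError("theta_v must lie in (0, 1]")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if t < t_last:
        raise ValueError("t must not precede t_last")
    return min(1.0, theta_v + delta * (t - t_last))
```

With θ_v = 0.5 and δ = 0.25, the threshold is 0.75 one step after activation and reaches 1 after two steps, i.e., once the elapsed time equals (1−θ_v)/δ; with δ = 0 it stays at θ_v.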
In the unbiased scenario, we define the activation-threshold function such that, for each v, the value of the function is maximum (i.e., 1) just after the activation, i.e., at time \(t=t_{v}^{last}+1\), then for subsequent time steps, the function exponentially decreases towards Īøv:
where \(\mathbb {I}[\cdot ]\) denotes the indicator function, i.e., it equals 1 if \(t-t_{v}^{last}=1\), 0 otherwise. Note that \(\delta\) is used differently w.r.t. the previous scenario, as it acts as a coefficient that controls the decrease of the activation-threshold function over time.
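To illustrate the qualitative behavior of the unbiased scenario, the sketch below uses a simple exponential interpolation between 1 and θ_v. It is only an assumed form that satisfies the stated properties (value 1 at the first step after activation, then exponential decrease towards θ_v at a rate controlled by δ); it does not use the indicator function and is not meant to reproduce the exact expression of the paper. The name `unbiased_threshold` is hypothetical.

```python
import math


def unbiased_threshold(theta_v, delta, t, t_last):
    """Activation-threshold of a node under the unbiased scenario (sketch).

    Assumed form: theta_v + (1 - theta_v) * exp(-delta * (t - t_last - 1)),
    defined for t > t_last. It equals 1 at t = t_last + 1 and decays
    exponentially towards theta_v for later time steps.
    """
    if not 0.0 < theta_v <= 1.0:
        raise ValueError("theta_v must lie in (0, 1]")
    if delta < 0:
        raise ValueError("delta must be non-negative")
    if t <= t_last:
        raise ValueError("t must follow t_last")
    elapsed = t - t_last - 1
    return theta_v + (1.0 - theta_v) * math.exp(-delta * elapsed)
```

Larger values of δ make the threshold relax faster towards θ_v, which under this assumed form corresponds to a user who quickly becomes open again to changing her/his opinion.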
Quiescence function. Each node in \({\mathcal {G}}\) is also associated with a quiescence value, which quantifies the latency in propagation through that node. We define a quiescence function, \(q: V, T \mapsto T\), non-decreasing and monotone, such that for every \(v \in V\), \(t \in T\), with v activated at time t:
where \(\tau_{v} \in T\) represents an exogenous term modeling the user’s hesitation in being fully committed to the propagation process, and \(\psi (N^{in}_{-}(v),t)\) provides an additional delay proportional to the amount of v’s neighbors that are distrusted and active, by the time the activation attempt is performed by v’s trusted neighbors:
where \(\lambda \geq 0\) is a coefficient modeling the average user sensitivity to the perceived negative influence. Intuitively, this coefficient would weight the negative influence more as the diffusing information item is more “worthy of suspicion”. Note also that, in Eq. 3, \(w_{uv}\) is a negative value, since u is a distrusted neighbor of v, i.e., \(u \in N^{in}_{-}(v)\).
Rationale for activation and propagation. Our choice of using, on the one hand, friends for the activation of a user, and on the other hand, foes to impact on delayed propagation, represents a key distinction from related work (Litou et al. 2016; Talluri et al. 2015; Weng et al. 2016). Therefore, in our models, the trusted connections and distrusted connections play different roles: only friends can exert a degree of (positive) influence, whereas foes can only contribute to increasing the user’s hesitation to commit to the propagation process. It should be noted that both activation and delayed propagation terms also include exogenous factors. We indeed take into consideration the existence of both environmental and personal factors of influence on an individual’s behavior. Several studies in information diffusion and influence maximization have reported evidence that, apart from influence coming from social contacts, an individual may be affected by some external event(s) and/or personal reasons to adopt an information item (Goyal et al. 2010) as well as to delay its adoption (Iniguez et al. 2018). In our setting, we tend to reject as true in general the principle “I agree with my friends’ idea and disagree with my foes’ idea” (which is also close to the adage “the enemy of my enemy is my friend”), since this would imply that the behavior of a user should be completely determined by the stimuli coming from her/his neighbors. Rather, according to most conceptual models developed in social science and human-computer interaction fields (see, e.g., (Tedjamulia et al. 2005; Bishop 2007)), we believe that the individual’s influenceability has a component based on personal characteristics.
Non-competitive model
We introduce the first of the three proposed models, which refers to a single-item propagation scenario. Figure 2 shows the life-cycle of a node in the diffusion graph under this model.
Definition 1
Non-Competitive Friend-Foe Dynamic Linear Threshold Model (nC-F2DLT). Let \({\mathcal {G}} = \langle V, E, w, g, q, T \rangle \) be the diffusion graph of the Non-Competitive Friend-Foe Dynamic Linear Threshold Model (nC-F2DLT). The diffusion process under the nC-F2DLT model unfolds in discrete time steps. At time t=0, an initial set of nodes S0 is activated. At time t≥1, the following rule applies: for any inactive node \(v \in V \setminus \left (S_{t-1} \cup \widetilde {S}_{t-1}\right)\), if \(\sum \nolimits _{u \in N^{in}_{+}(v) \cap S_{t-1}} w_{uv} \geq g(v,t)\), then v will be added to the set of quiescent nodes \(\widetilde {S}_{t}\), with quiescence time equal to t*=q(v,t). Once the quiescence time has expired, v will be removed from \(\widetilde {S}_{t}\) and added to the set of active nodes \(\phantom {\dot {i}\!}{S}_{t^{*}}\). The process continues until T has expired or no more activation attempts can be performed. □
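The rule in Definition 1 can be illustrated with a minimal simulation sketch. It is not the authors' implementation: the function name `simulate_nc_f2dlt`, the dictionary layout `trust_in`, and the reading of q(v,t) as a duration (so that a node triggered at step t becomes active at step t + q(v,t)) are assumptions made only for this illustration.

```python
from typing import Callable, Dict, Hashable, Iterable, Set

Node = Hashable


def simulate_nc_f2dlt(
    nodes: Iterable[Node],
    trust_in: Dict[Node, Dict[Node, float]],  # trust_in[v][u] = w_uv > 0 (friends only)
    seeds: Iterable[Node],
    g: Callable[[Node, int], float],          # activation-threshold function g(v, t)
    q: Callable[[Node, int], int],            # quiescence function q(v, t), assumed a duration
    horizon: int,                             # time horizon T
) -> Set[Node]:
    """Return the set of active nodes at the end of the horizon."""
    nodes = list(nodes)
    active: Set[Node] = set(seeds)            # S_t
    wake_at: Dict[Node, int] = {}             # quiescent node -> step at which it turns active
    for t in range(1, horizon + 1):
        prev_active = set(active)             # S_{t-1}
        prev_quiescent = set(wake_at)         # quiescent set at t-1
        for v in nodes:
            if v in prev_active or v in prev_quiescent:
                continue                      # the rule only applies to inactive nodes
            influence = sum(
                w for u, w in trust_in.get(v, {}).items() if u in prev_active
            )
            if influence >= g(v, t):
                wake_at[v] = t + q(v, t)      # v enters the quiescent state
        # Quiescent nodes whose quiescence time has expired become active.
        for v in [x for x, when in wake_at.items() if when <= t]:
            active.add(v)
            del wake_at[v]
    return active


# Hypothetical toy instance: a is the seed, every node waits one step.
trust_in = {"b": {"a": 0.6}, "c": {"a": 0.1, "b": 0.5}}
theta = {"a": 1.0, "b": 0.5, "c": 0.55}
reached_3 = simulate_nc_f2dlt("abc", trust_in, {"a"}, lambda v, t: theta[v], lambda v, t: 1, 3)
reached_4 = simulate_nc_f2dlt("abc", trust_in, {"a"}, lambda v, t: theta[v], lambda v, t: 1, 4)
```

In this toy run, node c is triggered at step 3 but is still quiescent when a horizon of 3 expires, which is why the time horizon matters for the final spread.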
Competitive models
Here we introduce the two competitive F2DLT models. Let us first provide our motivation for developing two different competitive models: through the following example, we illustrate a particular situation that may occur when dealing with two campaigns competitively propagating through a network. Please note that, throughout the rest of this paper, we will consider only two competing campaigns for the sake of simplicity; nevertheless, our proposed models are generalizable to more than two competing campaigns.
Example 1
Figure 3 shows an example activation sequence in a competitive scenario between two information cascades, distinguished by the colors red and green. At time t=0, nodes u and z are green-active, and their joint influence causes green-activation of node v as well (since 0.3+0.5≥0.6). At time t=1, being fully influenced by node x, node z has switched its activation in favor of the red campaign. After this switch, at time t=2, v’s activation state is no longer consistent with the (joint or individual) influence exerted by u and z. In particular, two mutually exclusive events might in principle happen at t=2: either v is deactivated or v maintains its green-activation state. ■
The uncertainty depicted in the above example prompted us to define two models, namely semi-progressive and non-progressive F2DLT: the former corresponds to the case of v keeping its current (i.e., green) activation state, whereas the latter corresponds to v returning to the inactive state. Clearly, the two models’ semantics differ from each other: the semi-progressive model assumes that a user, once activated, cannot step aside, whereas the non-progressive one requires a user to always have the support of her/his in-neighbors to remain active.
Given two information cascades, or campaigns, C′ and C′′, for every time step t∈T we will use symbols \({S^{\prime }_{t}}\) and \({S^{\prime \prime }_{t}}\) to denote the sets of active nodes, such that \({S^{\prime }_{t}} \cap {S^{\prime \prime }_{t}} = \emptyset \), and analogously symbols \({\widetilde {S}^{\prime }_{t}}\) and \({\widetilde {S}^{\prime \prime }_{t}}\) as the sets of quiescent nodes, for C′ and C′′, respectively. Also, \(S_{t} = {S^{\prime }_{t}} \cup {S^{\prime \prime }_{t}}\) and \({\widetilde {S}_{t}} = {\widetilde {S^{\prime }}_{t}} \cup {\widetilde {S^{\prime \prime }}_{t}}\).
It should also be noted that, while sharing the time interval (T) of diffusion, C′ and C′′ are not constrained to start at the same time t0. Nevertheless, for the sake of simplicity, we hereinafter assume that \(t_{0}=t_{0}^{\prime }=t_{0}^{\prime \prime }\) (with t0∈T), unless otherwise specified (cf. “Results” section).
Definition 2
Semi-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (spC-F2DLT). Let \({\mathcal {G}} = \langle V, E, w, g, q, T \rangle \) be the diffusion graph of the Semi-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (spC-F2DLT), and C′,C′′ be two campaigns on \({\mathcal {G}}\). The diffusion process under the spC-F2DLT model unfolds in discrete time steps. At time t=0, two initial sets of nodes, \({S^{\prime }_{0}}\) and \({S^{\prime \prime }_{0}}\), are activated, one for each campaign. At every time step t≥1, the following state-transition rules apply:
R1. For any inactive node \(v \in V \setminus \left (S_{t-1} \cup {\widetilde {S}_{t-1}}\right)\), if \(\sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} \geq g(v,t)\), then v will be added to \({\widetilde {S^{\prime }}_{t}}\); analogously, if \(\sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime \prime }_{t-1}}} w_{uv} \geq g(v,t)\), then v will be added to \({\widetilde {S^{\prime \prime }}_{t}}\). If both conditions hold, i.e., v can be simultaneously activated by both campaigns, a tie-breaking rule will apply, in order to decide which campaign actually determines the node’s transition to the quiescent state.
R2. When a node v enters the quiescent state corresponding to C′ (resp. C′′) for the first time, it will stay in the quiescent node-set \({\widetilde {S^{\prime }}_{t}}\) (resp. \({\widetilde {S^{\prime \prime }}_{t}}\)) until the quiescence time has expired. After that, v will be moved to \({S^{\prime }_{t}}\) (resp. \({S^{\prime \prime }_{t}}\)), i.e., it will become active for C′ (resp. C′′).
R3. Given a node v active for C′′, i.e., \(v \in {S^{\prime \prime }_{t-1}}\), if \(\sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} \geq g(v,t)\) and \(\sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} > \sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime \prime }_{t-1}}} w_{uv}\), then v will be removed from \({S^{\prime \prime }_{t}}\) and added to \({S^{\prime }_{t}}\); an analogous rule holds for any node active for the first campaign.
Every node for which none of the above state-transition rules is triggered at time t will keep its current state at time t+1. □
The life-cycle of a node in spC-F2DLT is shown in Fig. 4. Note that, once a node becomes active, it cannot turn back to the inactive state, but it can only change the activation campaign. Moreover, switch transitions occur instantly.
Definition 3
Non-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (npC-F2DLT). Let \({\mathcal {G}} = \langle V, E, w, g, q, T \rangle \) be the diffusion graph of the Non-Progressive Competitive Friend-Foe Dynamic Linear Threshold Model (npC-F2DLT), and C′,C′′ be two campaigns on \({\mathcal {G}}\). The diffusion process in npC-F2DLT evolves according to the same rules as in spC-F2DLT plus the following rule concerning the deactivation process of an active node:
R4. For any active node v at time t−1, if \(\sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime }_{t-1}}} w_{uv} < \theta _{v}\) and \(\sum \nolimits _{N^{in}_{+}(v) \cap {S^{\prime \prime }_{t-1}}} w_{uv} < \theta _{v}\), then v will turn back to the inactive state at time t.
Every node for which none of the state-transition rules is triggered at time t (including the ones defined for spC-F2DLT) will keep its current state at time t+1. □
It should be noted that a node’s deactivation rule depends on θv only (rather than on the whole function g(v,t)); otherwise, every node activated at a given time could deactivate itself in the next time step, due to the increase in its activation threshold. This would eventually lead to a configuration in which all nodes in the network, except the initially activated ones, are in the inactive state. The life-cycle of a node in npC-F2DLT is illustrated in Fig. 4. Note that, unlike in spC-F2DLT, transitions to the inactive state are allowed.
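The difference between the two competitive models can be made concrete on the situation of Example 1. The sketch below is only illustrative and covers the simplest setting (no quiescence time, constant threshold θv): the function `next_state`, the campaign labels, the placeholder tie-breaking (larger influence wins, exact ties go to the first label), and the assignment of weight 0.3 to the edge from u and 0.5 to the edge from z are all assumptions.

```python
from typing import Dict, Optional

CAMPAIGNS = ("green", "red")


def next_state(
    v: str,
    state: Dict[str, Optional[str]],        # node -> campaign label, or None if inactive
    trust_in: Dict[str, Dict[str, float]],  # trust_in[v][u] = w_uv > 0
    theta: Dict[str, float],                # constant activation thresholds
    non_progressive: bool,
) -> Optional[str]:
    """State of node v at the next step under rules R1, R3 and (optionally) R4."""
    influence = {
        c: sum(w for u, w in trust_in.get(v, {}).items() if state.get(u) == c)
        for c in CAMPAIGNS
    }
    current = state.get(v)
    if current is None:
        # R1: an inactive node joins a campaign whose influence reaches the threshold.
        eligible = [c for c in CAMPAIGNS if influence[c] >= theta[v]]
        return max(eligible, key=influence.get) if eligible else None
    other = "red" if current == "green" else "green"
    # R3: switch if the other campaign reaches the threshold and is strictly stronger.
    if influence[other] >= theta[v] and influence[other] > influence[current]:
        return other
    # R4 (non-progressive only): deactivate if neither campaign reaches theta_v.
    if non_progressive and all(influence[c] < theta[v] for c in CAMPAIGNS):
        return None
    return current


trust_in = {"v": {"u": 0.3, "z": 0.5}}
theta = {"v": 0.6}
before_switch = {"u": "green", "z": "green", "x": "red", "v": None}
after_switch = {"u": "green", "z": "red", "x": "red", "v": "green"}

activated = next_state("v", before_switch, trust_in, theta, non_progressive=False)
semi = next_state("v", after_switch, trust_in, theta, non_progressive=False)
non = next_state("v", after_switch, trust_in, theta, non_progressive=True)
```

After z switches to red, neither campaign alone reaches the threshold 0.6 on v, so the semi-progressive reading keeps v green while the non-progressive one returns it to the inactive state.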
Theoretical properties of the models
In this section we provide insights into the proposed models. Our main goal is to understand how the features introduced in each of our LT-based models affect the models’ spread behavior, particularly the monotonicity and submodularity properties. We organize our analysis into two parts: the first corresponding to non-competitive diffusion, and the second to competitive diffusion.
Non-competitive diffusion
We show that nC-F2DLT can be reduced to LT with quiescence time, hereinafter denoted as LTqt. By proving the equivalence between the two models, we hence claim that both the monotonicity and submodularity properties hold for nC-F2DLT. Note that since we deal with a progressive model, we assume without loss of generality that, for every node v, the activation-threshold function has a constant value for the whole duration of the diffusion process, i.e., g(v,t)=θv.
Definition 4
Reduction of nC-F2DLT to LTqt. Given \({\mathcal {G}} = \langle V, E, w, g, q, T\rangle \) for nC-F2DLT, a diffusion graph \({\mathcal {G}}_{LT}=\langle V_{LT},E_{LT} \rangle \) can be derived, under LTqt, such that VLT=V and ELT={(u,v) | (u,v)∈E, wuv>0}. Every node v∈VLT is assigned a quiescence time equal to the maximum value of the quiescence function qv(·), i.e., \(\tau _{v}^{max} = \tau _{v} + \psi (N_{-}^{in}(v))\). □
Definition 4 exploits the fact that the distrust connections are not involved in the activation process, but only in the calculation of the quiescence time. Therefore, we can assume this time to be the maximum possible value, and hence we can study the propagation under LTqt. The reduction of nC-F2DLT to LTqt is meaningful since the two models are proved to be equivalent, as we report in the following theoretical result.
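The reduction of Definition 4 amounts to two bookkeeping steps, sketched below under stated assumptions: the function name `reduce_to_ltqt` and the input `psi_max` (a precomputed value of ψ over the distrusted in-neighbors of each node) are hypothetical, since the form of ψ is given elsewhere (Eq. 3) and is not re-derived here.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Edge = Tuple[Node, Node]


def reduce_to_ltqt(
    nodes: Iterable[Node],
    weights: Dict[Edge, float],   # signed weights w_uv (negative = distrust)
    tau: Dict[Node, float],       # exogenous quiescence time tau_v
    psi_max: Dict[Node, float],   # assumed precomputed psi(N^in_-(v)) per node
) -> Tuple[Set[Node], Dict[Edge, float], Dict[Node, float]]:
    """Build (V_LT, E_LT with weights, tau_max) as described in Definition 4."""
    v_lt = set(nodes)
    # Keep only trust edges: distrust edges play no role in activation.
    e_lt = {edge: w for edge, w in weights.items() if w > 0}
    # Each node gets the largest quiescence time it could experience.
    tau_max = {v: tau[v] + psi_max.get(v, 0.0) for v in v_lt}
    return v_lt, e_lt, tau_max


v_lt, e_lt, tau_max = reduce_to_ltqt(
    ["u", "v", "y"],
    {("u", "v"): 0.7, ("y", "v"): -0.4},
    {"u": 0, "v": 2, "y": 1},
    {"v": 3},
)
```

Nodes without distrusted in-neighbors simply keep their exogenous quiescence time in this sketch.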
Proposition 1
The Non-Competitive Friend-Foe Dynamic Linear Threshold Model (nC-F2DLT) and the Linear Threshold Model with quiescence time (LTqt) are equivalent. \(\blacktriangleleft \)
Proof
According to the definition of equivalence of two diffusion models in (Kempe et al. 2003; Chen et al. 2013), in order to prove the equivalence of nC-F2DLT and LTqt we need to prove that the distribution of the active sets for any given seed set S0 is the same under the two models. We provide a proof by induction, hence we consider the evolution of the active sets during the diffusion rounds.
For the LTqt model, the probability that a node is activated exactly at time t+1 (with t≥1) is given by:
Above, it should be noted that the joint probability \(\Pr \left (v \in \widetilde {S_{t+1}}, v \notin S_{t}\right)\) corresponds to the probability that the threshold associated with node v falls into the interval delimited by the influence received by v until the previous time step and the one received at the current time step. Moreover, Pr(v∉St) is just the probability that, at time (t−1), the influence received by v is still below its threshold. Finally, we derive the last equality in Eq. 4, which intuitively denotes that the influence exerted by the nodes in St∖St−1, i.e., the nodes turning into the active state exactly in the current time step, is decisive to exceed the threshold θv.
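As a worked form of the chain of equalities just described, and assuming (as in the standard LT model of Kempe et al. 2003) that θv is drawn uniformly at random from [0,1], the conditional probability can be sketched as follows; the layout is a reconstruction from the verbal description, with sums ranging over the in-neighbors of v:

\[
\Pr\left(v \in \widetilde{S}_{t+1} \mid v \notin S_{t}\right)
= \frac{\Pr\left(v \in \widetilde{S}_{t+1},\, v \notin S_{t}\right)}{\Pr\left(v \notin S_{t}\right)}
= \frac{\Pr\left(\sum_{u \in S_{t-1}} w_{uv} < \theta_{v} \leq \sum_{u \in S_{t}} w_{uv}\right)}{\Pr\left(\theta_{v} > \sum_{u \in S_{t-1}} w_{uv}\right)}
= \frac{\sum_{u \in S_{t} \setminus S_{t-1}} w_{uv}}{1 - \sum_{u \in S_{t-1}} w_{uv}}
\]

The numerator is the length of the interval into which the threshold must fall, and the denominator is the probability mass left once the influence accumulated up to the previous step is known to be insufficient.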
For the nC-F2DLT model, the conditional probability \(\Pr \left (v \in \widetilde {S_{t+1}} \mid v \notin S_{t}\right)\) can be derived starting from Eq. 4 by constraining wuv such that \(u \in N^{in}_{+}(v)\), i.e., only trusted relations are considered. This leads to an equivalent definition of the conditional probability, which holds for every time step t and seed set S0. Therefore, we can conclude that the final active sets will be the same for both models. □
It should be noted that, due to the quiescence times, the sets of active nodes in the two models may not be the same at every time step, but the two final active sets will match each other.
Since the introduction of quiescence time in LT has no effect on the distribution of the final active nodes (Chen et al. 2013), we obtain the following equivalence: LT ≡ LTqt ≡ nC-F2DLT. Therefore, the activation function is still monotone and submodular under nC-F2DLT.
Example 2
Consider Fig. 5, where the propagation process unfolds according to the LTqt dynamics. Nodes u and z are chosen as initial seeds. Thresholds and weights are set such that θv≤wuv and max{wvx,wzx}<θx≤wvx+wzx, therefore the combined influence of v and z is required for the activation of node x. The dashed edge denotes a distrust connection removed as a result of the reduction in Definition 4. In the initial time step (t=0), u activates v, causing its transition from the inactive state to the quiescent state (in yellow). When \(t=\tau _{v}^{max}\), v turns to the active state, and together with z it becomes able to trigger the activation of node x (which will eventually become active by the time horizon T).
It should be noted that the same dynamics holds for the nC-F2DLT model, apart from a difference in the quiescence time of node v: this would be less than \(\tau _{v}^{max}\) since y, a foe of v, is not involved in the propagation process. ■
Competitive diffusion
We focus here on spC-F2DLT and npC-F2DLT, and show that both models can be reduced to the Homogeneous Competitive Linear Threshold (H-CLT) with Majority Vote as tie-breaking rule (Chen et al. 2013). This is a competitive, progressive model based on LT, for which it is known that its activation function is monotone but not submodular regardless of the particular tie-breaking rule.
To begin with, we recall that non-progressive LT-based diffusion can be reduced to the progressive case, using a particular form of layered graph (Kempe et al. 2003). Given a time interval T and a diffusion graph G=⟨V,E⟩ for non-progressive LT, a new graph GT can be derived such that every node v∈V will have a replica vt in every layer at time t∈T, and for every edge (u,v)∈E there will be an edge (ut−1,vt) in GT.
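The layered construction recalled above can be sketched in a few lines. This is a minimal illustration, assuming T = {0, ..., horizon} and representing the replica of node v at layer t as the pair (v, t); the function name `layered_graph` is hypothetical.

```python
from typing import Hashable, Iterable, List, Tuple

Node = Hashable
Replica = Tuple[Node, int]  # (v, t): replica of node v in the layer at time t


def layered_graph(
    nodes: Iterable[Node],
    edges: Iterable[Tuple[Node, Node]],
    horizon: int,
) -> Tuple[List[Replica], List[Tuple[Replica, Replica]]]:
    """One replica of every node per layer; edge (u, v) becomes (u_{t-1}, v_t)."""
    nodes = list(nodes)
    edges = list(edges)
    layered_nodes = [(v, t) for t in range(horizon + 1) for v in nodes]
    layered_edges = [
        ((u, t - 1), (v, t)) for t in range(1, horizon + 1) for (u, v) in edges
    ]
    return layered_nodes, layered_edges


layer_nodes, layer_edges = layered_graph(["a", "b"], [("a", "b")], horizon=2)
```

Edges only connect consecutive layers, so the layered graph is acyclic even when the original graph contains cycles.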
Unfortunately, this serialization technique cannot be directly applied to our models, since it is not designed to deal with competitive or non-progressive diffusion and it discards activation or delayed propagation aspects. In the following, we define serialization techniques that are suitable for our competitive models and treat one particular configuration at a time. One general requirement is related to the time horizon to bound the unfolding of the diffusion process. In fact, when dealing with competitive models, the termination guarantee is lost. A simple example is provided next to depict such a non-termination scenario.
Example 3
In Fig. 6, nodes u and z are chosen as seeds for the green campaign and the red one, respectively. Nodes v and x become green-active and red-active, respectively, at time t=1. Next, they will constantly switch their activation campaign, causing non-termination of the diffusion process. ■
CONFIGURATION 1: No quiescence time, constant activation-threshold.
We assume that q(v)=0 and g(v,t)=θv, for all v∈V, t∈T. For both spC-F2DLT and npC-F2DLT, we claim their reduction to the H-CLT model with majority voting as tie-breaking rule.
Definition 5
spC-F2DLT graph serialization for reduction to H-CLT. Given a time interval T, we define a layered graph GT=⟨VT,ET⟩ such that, for each layer at time t∈T, every node v∈V will be represented in VT as a tuple \(\left \langle v^{1}_{t},v^{2}_{t},v^{3}_{t} \right \rangle \). Instances \(v^{1}_{t}\) and \(v^{2}_{t}\) have activation-threshold equal to 0, while \(v^{3}_{t}\) has the same threshold as the original node v∈V. The set of edges is defined as \(E^{T} = \left \{\left (u_{t}^{1},v_{t+1}^{3}\right) \mid (u,v) \in E, t, t+1 \in T \right \} \cup \left \{\left (v_{t}^{3},v_{t}^{2}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{2},v_{t}^{1}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{1},v_{t+1}^{2}\right) \mid v \in V, t \in T\right \}\), and the following constraint on edge weights must hold: \( \forall v_{t}^{2} \in V^{T}, \ w\left (v_{t-1}^{1},v_{t}^{2}\right) < w\left (v_{t}^{3},v_{t}^{2}\right).\) □
In the above definition, triples act as connectors between two consecutive time-layers. The role of any connector component is that of a sort of “switch” enabling a node to choose between its activation state in a layer and the one in the subsequent layer. In other words, node \(v^{1}_{t}\) is the main instance of node v, since the activation state of \(v^{1}_{t}\) reflects the state of v in the original graph, under spC-F2DLT at time t; node \(v^{3}_{t}\) is the instance of v connected with other nodes from the layer at t−1, therefore it reflects the influence received by v in the original graph, at time t−1; if the activation attempt on \(v^{3}_{t}\) fails, node \(v^{2}_{t}\) will be activated with the same state as v; otherwise, according to the edge weight constraint (cf. Definition 5), \(v^{2}_{t}\) will switch to the other campaign, and then will propagate to instance \(v^{1}_{t}\). Recall that \(v^{1}_{t}, v^{2}_{t}\) have zero activation-threshold. Figure 18 in Appendix A shows an example of serialization for a spC-F2DLT diffusion graph with time horizon set to 2.
It should be emphasized that, compared to the serialization method in (Kempe et al. 2003), we require replication of each node in each layer, and additional edges connecting the replica-instances, in order to allow the maintenance of the activation state when no activation event occurs between two time-consecutive layers.
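The node and edge sets of Definition 5 can be generated mechanically, as in the following sketch. It assumes T = {0, ..., horizon}, encodes instance \(v^{i}_{t}\) as the triple (v, i, t), and only emits the carry-over edge \((v^{1}_{t}, v^{2}_{t+1})\) when layer t+1 exists; the function name `serialize_spc` is hypothetical, and edge weights (including the constraint on them) are deliberately left unassigned.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Instance = Tuple[Node, int, int]  # (v, i, t) stands for instance v^i_t


def serialize_spc(
    nodes: Iterable[Node],
    edges: Iterable[Tuple[Node, Node]],
    theta: Dict[Node, float],
    horizon: int,
) -> Tuple[Set[Instance], Set[Tuple[Instance, Instance]], Dict[Instance, float]]:
    """Layered node set, edge set, and instance thresholds for spC-F2DLT."""
    nodes = list(nodes)
    edges = list(edges)
    layers = range(horizon + 1)
    v_t = {(v, i, t) for v in nodes for t in layers for i in (1, 2, 3)}
    # Instances 1 and 2 have threshold 0; instance 3 inherits theta_v.
    thresholds = {(v, i, t): (theta[v] if i == 3 else 0.0) for (v, i, t) in v_t}
    e_t: Set[Tuple[Instance, Instance]] = set()
    for t in layers:
        for v in nodes:
            e_t.add(((v, 3, t), (v, 2, t)))          # (v^3_t, v^2_t)
            e_t.add(((v, 2, t), (v, 1, t)))          # (v^2_t, v^1_t)
            if t + 1 in layers:
                e_t.add(((v, 1, t), (v, 2, t + 1)))  # (v^1_t, v^2_{t+1})
        if t + 1 in layers:
            for (u, v) in edges:
                e_t.add(((u, 1, t), (v, 3, t + 1)))  # (u^1_t, v^3_{t+1})
    return v_t, e_t, thresholds


VT, ET, thr = serialize_spc(["u", "v"], [("u", "v")], {"u": 0.4, "v": 0.6}, horizon=2)
```

For two nodes, one edge and three layers this yields 18 instances and 18 edges, which makes the linear blow-up in the horizon explicit.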
An analogous reduction technique can be defined for the npC-F2DLT model.
Definition 6
npC-F2DLT graph serialization for reduction to H-CLT. Given a time interval T, we define a layered graph GT=⟨VT,ET⟩ such that, for each layer at time t∈T, every node v∈V will be represented in VT as a tuple \(\left \langle v^{1}_{t},v^{2}_{t},v^{3}_{t} \right \rangle \). Instances \(v^{1}_{t}\) and \(v^{2}_{t}\) have activation-threshold equal to 1 and 0, respectively, while \(v^{3}_{t}\) has the same threshold as the original node v∈V. The set of edges is defined as \( E^{T} = \left \{\left (u_{t}^{1},v_{t+1}^{3}\right) \mid (u,v) \in E, t,t+1 \in T \right \} \cup \left \{\left (v_{t}^{3},v_{t}^{2}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{2},v_{t}^{1}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{3},v_{t}^{1}\right) \mid v \in V, t \in T\right \} \cup \left \{\left (v_{t}^{1},v_{t+1}^{2}\right) \mid v \in V, t, t+1 \in T \right \}\), and the following constraints on edge weights must hold: \( \forall v_{t}^{2} \in V^{T}, \ w\left (v_{t-1}^{1},v_{t}^{2}\right) < w\left (v_{t}^{3},v_{t}^{2}\right)\), and \( \forall v_{t}^{1} \in V^{T}, \ w\left (v_{t}^{2},v_{t}^{1}\right) + w\left (v_{t}^{3},v_{t}^{1}\right) = 1.\) □
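It should be noted that the last condition in Definition 6 requires nodes \(v_{t}^{2}\) and \(v_{t}^{3}\) to hold the same activation state in order to activate \(v_{t}^{1}\): since \(v_{t}^{1}\) has threshold 1 and the weights of its two incoming edges sum to 1, both in-neighbors must support the same campaign.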
Analogously to the reduction of spC-F2DLT to H-CLT, we can conveniently devise a notion of “connector” component between any two consecutive layers, which however in this case should also account for node deactivations. Figure 19 in Appendix A shows an example of connector for the npC-F2DLT model.
Claim 1
For any given diffusion graph \({\mathcal {G}}\) under spC-F2DLT (resp. npC-F2DLT), assuming constant activation-threshold and no quiescence time, every node v in \({\mathcal {G}}\) is active at time t∈T if and only if its corresponding instance \(v^{1}_{t}\) is active in the serialized graph GT (resp. npC-F2DLT). \(\blacktriangleleft \)
CONFIGURATION 2: Constant quiescence time, constant activation-threshold.
We assume that q(v)=τv and g(v,t)=θv, for all v∈V. For both spC-F2DLT and npC-F2DLT, we claim their reduction to H-CLT with majority voting as tie-breaking rule.
In this case, we need to consider that, whenever a node is activated, its quiescence time may not expire before the time horizon; for this reason, we will consider only nodes reachable from \(S_{0} = S^{\prime }_{0} \cup S^{\prime \prime }_{0}\) within T, for any two given seed sets \(S^{\prime }_{0}\) and \(S^{\prime \prime }_{0}\). To identify such nodes, we define a quiescence-aware distance measure that accounts for the quiescence times along the path connecting any two nodes. Given nodes u,v, and the set P(u,v) of all paths between u and v, the distance from u to v will be measured as \( d(u,v) = \min _{p \in P(u,v)} \sum \nolimits _{x \in p} \tau _{x} \). Moreover, we denote with d(S0,v) the minimum distance between nodes u∈S0 and v. By exploiting this distance, we will discard all nodes that cannot be “contagious” before the end of T, say tmax. Therefore, the node set VT of the layered graph is defined as:
Each node v∈V with quiescence time τv will have connections from the previous layers according to the following rule: for any layer at time t, if t<d(S0,v) then v will not have any incoming edges, otherwise all incoming edges of v will be from the layer at time t−τv−1.
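The quiescence-aware distance d(S0,v) used above is a shortest-path computation with costs on nodes rather than on edges, so it can be sketched with Dijkstra's algorithm. The sketch assumes non-negative quiescence times and reads the sum over x∈p literally, i.e., including both endpoints of the path; the function name `quiescence_distance` and the adjacency layout `out_edges` are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict, Hashable, Iterable

Node = Hashable


def quiescence_distance(
    seeds: Iterable[Node],
    out_edges: Dict[Node, Iterable[Node]],  # out_edges[u] = successors of u
    tau: Dict[Node, float],                 # non-negative quiescence time per node
) -> Dict[Node, float]:
    """d(S0, v) for every node v reachable from the seed set S0."""
    tie = count()  # tie-breaker so the heap never has to compare node objects
    dist: Dict[Node, float] = {}
    heap = []
    for s in seeds:
        dist[s] = tau[s]
        heapq.heappush(heap, (dist[s], next(tie), s))
    while heap:
        d, _, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in out_edges.get(u, ()):
            candidate = d + tau[v]  # entering v costs its quiescence time
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, next(tie), v))
    return dist


d_s0 = quiescence_distance(
    {"s"},
    {"s": ["a", "b"], "a": ["b"], "b": ["c"]},
    {"s": 0, "a": 1, "b": 2, "c": 4, "z": 1},
)
```

Nodes missing from the returned dictionary are unreachable from the seeds, and nodes whose distance exceeds tmax would be the ones discarded from the layered graph.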
Using the above settings in the serialization method previously presented, it can easily be demonstrated that both spC-F2DLT and npC-F2DLT can be reduced to an equivalent H-CLT model.
Claim 2
For any given diffusion graph \({\mathcal {G}}\) under spC-F2DLT (resp. npC-F2DLT), assuming constant activation-threshold and constant quiescence time, every node v in \({\mathcal {G}}\) is active at time t∈T if and only if its corresponding instance \(v^{1}_{t}\) is active in the serialized graph GT (resp. npC-F2DLT). \(\blacktriangleleft \)
CONFIGURATION 3: Variable quiescence time, constant activation-threshold.
We assume that q(v,t) is variable, while g(v,t)=θv, for all v∈V, t∈T.
As in the previous case, we need to specify the seed sets \(S^{\prime }_{0}, S^{\prime \prime }_{0}\). However, note that the quiescence time of a node now depends on the actual activation state of its in-neighborhood (cf. Eq. 3), which makes a direct serialization of the whole diffusion graph infeasible.
Starting from the original diffusion graph \({\mathcal {G}}\), we derive an “intermediate” graph \(\widehat {{\mathcal {G}}}\), which is equivalent to \({\mathcal {G}}\) except that each node v∈V is associated with a quiescence time interval \([\tau _{v},\tau _{v}^{max}]\), where \(\tau _{v}^{max}=\tau _{v} + \psi (N_{-}^{in}(v))\). Let us denote with \({\mathcal {G}}^{min}\) the instance of \(\widehat {{\mathcal {G}}}\) such that the quiescence time of every \(v \in \widehat {{\mathcal {G}}}\) is τv, and with \({\mathcal {G}}^{max}\) the instance of \(\widehat {{\mathcal {G}}}\) such that the quiescence time of every \(v \in \widehat {{\mathcal {G}}}\) is \(\tau _{v}^{max}\).
Although we cannot assert that spC-F2DLT and npC-F2DLT are equivalent to H-CLT under the layered graph obtained by applying the previously described serialization techniques, an important theoretical result can nonetheless be provided, as reported next.
Claim 3
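For any diffusion graph \({\mathcal {G}}\) under spC-F2DLT (resp. npC-F2DLT), with campaigns C′,C′′, assuming constant activation-threshold and variable quiescence time, for any seed sets \(S^{\prime }_{0}\) and \(S^{\prime \prime }_{0}\), it holds that: \(\sigma ^{\prime }_{H\text {-}CLTmax}(S^{\prime }_{0},S^{\prime \prime }_{0}) \leq \sigma ^{\prime }(S^{\prime }_{0},S^{\prime \prime }_{0}) \leq \sigma ^{\prime }_{H\text {-}CLTmin}(S^{\prime }_{0},S^{\prime \prime }_{0})\) (5)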
where σ′ is the number of nodes activated by C′ under spC-F2DLT (resp. npC-F2DLT), \(\sigma ^{\prime }_{H\text {-}CLTmax}(S^{\prime }_{0},S^{\prime \prime }_{0})\) and \(\sigma ^{\prime }_{H\text {-}CLTmin}(S^{\prime }_{0},S^{\prime \prime }_{0})\) are the number of nodes activated by C′ under H-CLT in the layered graph obtained by serialization of spC-F2DLT (resp. npC-F2DLT) on \({\mathcal {G}}^{max}\) and \({\mathcal {G}}^{min}\), respectively. \(\blacktriangleleft \)
Enabling variable quiescence time, i.e., ψ(·), means that the exact time required by each node to make a transition from the quiescent state to the active one cannot be established in advance at the beginning of the propagation process. Since for any node v the quiescence time ranges within \([ \tau _{v}, \tau _{v}^{max}]\), we devise two opposite scenarios. In the first scenario, represented by the rightmost side of Eq. 5, each node is assumed to wait the minimum amount of time, i.e., τmin, before its activation; this leads to a higher fraction of nodes that could be activated before the time horizon T is reached. The second scenario, represented by the leftmost side of Eq. 5, assumes that each node has to wait the maximum possible quiescence time, i.e., τmax; as a consequence, a smaller fraction of nodes will be able to complete the activation process before the time limit, thus leading to a lower spread.
CONFIGURATION 4: No quiescence time, variable activation-threshold.
We assume that q(v)=0 and g(v,t)=θv+ϑ(θv,t), for all v∈V, t∈T. For both spC-F2DLT and npC-F2DLT, we claim their reduction to H-CLT with majority voting as tie-breaking rule. In the following, we refer to the biased activation-threshold function, although it is easy to show analogous considerations for the non-biased activation-threshold function.
Because of the dynamic behavior of the activation-threshold function, we cannot predict its value at any particular time step of the diffusion process; nevertheless, by specifying the value of coefficient δ in Eq. 1, we can derive the value of \(t_{v}^{max}\), which suggests how many time-layers we have to look back in order to know the actual threshold value of v at a particular time t. In order to capture this dynamic aspect in H-CLT, we define a further serialization technique, built on top of the previously defined ones. We first restrict our attention to a particular case, and afterwards provide some rules that apply to the general case.
Let us focus on a particular node v, and assume that at any two consecutive time steps of activation for the same campaign its threshold increases by δ. Again, node v will have replicas for any time-layer t, i.e., \(\langle v^{1}_{t},v^{2}_{t},v^{3}_{t} \rangle \), with the first replica, \(v^{1}_{t}\), holding the actual state of v in the corresponding serialized graph for the competitive model. In addition, we introduce further replicas, in number equal to the value \(t_{v}^{max}\); supposing, for the sake of simplicity, that \(t_{v}^{max} = 3\), we derive replica nodes \(\left \langle v_{t}^{3,r1},v_{t}^{3,r2},v_{t}^{3,r3} \right \rangle \), such that each of them will have a threshold value in [θv,1] with increment of δ. Figure 7 illustrates this new component in the serialized graph.
Because this component is introduced as an extension of the previous techniques, the meaning of the nodes \(v^{1}_{t-1},v^{1}_{t-2},v^{1}_{t-3}\) remains the same as in the previous cases. On the right side of Fig. 7, each of the additional replicas has a different value of threshold and it is connected with nodes coming from the previous layers. Clearly, the overall behavior of this component depends on the weights attached to every edge in the structure. In this regard, we define the following constraints on the edge weights:
It should be noted that the activation attempts are performed directly on the replicas. Therefore, the above constraints on the edge weights control whether a node assumes the state derived as the outcome of the most recent activation attempts, or the one consistent with its personal history. Each of the aforementioned inequalities contributes to this decision process, with a different purpose. Eq. 6(a) ensures that the state derived from the last activation attempt is always preferred to the one derived from the previous time step. Eq. 6(b) ensures that the information coming from the previous time steps is given the same importance as the one derived from the current replicas. Eq. 6(c-d) ensures that the most recent information, i.e., the closest previous time steps, has higher priority than the earliest one. Eq. 6(e) ensures that there is consistency with respect to the state assumed in the closest previous time step and the farthest involved time step (e.g., the third previous time step in the addressed scenario).
Moreover, the threshold of the “central” node in the component (\(v^{3}_{t}\)) is set to w3,r1, to ensure sequentiality of the diffusion. By setting \(\theta _{v_{t}^{3}}\) equal to w3,r1, we prevent \(v_{t}^{3}\) from being activated by its own replicas belonging to layers preceding the (t−1)-th layer.
Figure 8 shows how the above defined connector is integrated into a serialization technique. In the figure, only the connections incident on vertex v are expanded. The red edges are the ones connecting consecutive layers, therefore the replica \(v_{t}^{3,r1}\) is connected with the previous layer, the replica \(v_{t}^{3,r2}\) is connected with the second previous layer and so on. Blue edges represent the new connections due to the introduction of this new component.
Claim 4
For any given diffusion graph \({\mathcal {G}}\) under spC-F2DLT (resp. npC-F2DLT), assuming variable activation-threshold and no quiescence time, every node v in \({\mathcal {G}}\) is active at time t∈T if and only if its corresponding instance \(v^{1}_{t}\) is active in the serialized graph GT (resp. npC-F2DLT). \(\blacktriangleleft \)
Evaluation methodology
Data
We used four real-world, publicly available networks, namely: Epinions (Leskovec et al. 2010b), Slashdot (Leskovec et al. 2010b), Wiki-Conflict (Brandes et al. 2009) and Wiki-Vote (Leskovec et al. 2010a). Epinions is a “who-trust-whom” network of the homonymous review site. Slashdot models friend/foe relations between the users of the homonymous technology-related news website. Wiki-Conflict refers to Wikipedia users involved in an “edit-war”, i.e., edges represent either positive or negative conflicts in editing a wikipage. Wiki-Vote models “who-vote-whom” relations between Wikipedia users that voted for/against each other in admin elections. Our choice of the evaluation datasets was mainly driven by two intents: (i) to provide a reproducible evaluation framework based on publicly available network data, and (ii) to test our models on a diversified set of real-world OSNs with suitable characteristics for information propagation processes.
Table 2 summarizes main structural characteristics of the networks. To favor meaningful competition of campaigns based on selected pairs of strategies, we limited the diffusion context to the largest strongly connected component in each evaluation network; note that, for Wiki-Conflict, the largest strongly connected component coincides with the whole graph. Also, the clustering coefficient corresponds to the definition of global transitivity in an undirected graph (the direction of the edges is ignored).
All networks are originally directed and signed; in addition, the two Wikipedia-based networks also have timestamped edges. In order to derive the weighted graphs of influence probabilities, we defined the following method: for every (u,v)∈E, the edge weight wuv was sampled from a binomial distribution \(\mathcal {B}\left (|N^{in}_{+}(v)|,p\right)\) if \(u \in N^{in}_{+}(v)\) (i.e., v trusts u), otherwise \(w_{uv} \sim -\mathcal {B}\left (|N^{in}_{-}(v)|,p\right)\), where the probability of success p is equal to the fraction of trust edges in the network; the rationale is that, for a higher fraction of trusted connections in the network, the nodes will be more likely to trust each other, and hence each node is more likely to be involved in the propagation process. We performed 1000 samplings of edge weights for each of the four networks. Therefore, all presented results correspond to averages over 1000 simulation runs.
Seed selection strategies
We defined four seed selection strategies, each of which mimics a different, realistic scenario of influence propagation.
Exogenous and malicious sources of information. This method, hereinafter referred to as M-Sources, aims at simulating the presence of multiple sources of malicious information within the network. Here, an exogenous source is meant as a node without incoming links, e.g., a user that is just interested in spreading her/his opinion: such a node is also regarded as malicious if a high fraction of the outgoing influence exerted by the node is distrusted by its out-neighbors. Formally, given a budget k, the method selects the top-k users in a ranking determined as \(r(v) = (\bar {W}^{-} / (\bar {W}^{-} + \bar {W}^{+})) \log (|N^{out}(v)|)\), for every v such that Nin(v)=∅, where \(\bar {W}^{+}\) and \(\bar {W}^{-}\) are shortcut symbols denoting the sum of trust and distrust weights, respectively, outgoing from v.
Exogenous and influential trusted sources of information. Analogously to the previous method, this one, dubbed I-Sources, searches for the “best” influential trusted sources. The ranking function is \(r(v) = (\bar {W}^{+} / (\bar {W}^{-} + \bar {W}^{+})) \log (|N^{out}(v)|)\). Note that this still takes into account the negative weights, because even a highly trusted user might be distrusted by some other users (e.g., “haters”).
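The two ranking functions for M-Sources and I-Sources can be sketched in a few lines. The helper names `source_scores` and `top_k` and the dictionary-based inputs are hypothetical; the sketch also assumes that \(\bar {W}^{-}\) is the sum of the magnitudes of the distrust weights and that the logarithm is natural, neither of which is stated explicitly in the text.

```python
import math

def source_scores(out_weights, in_degree, malicious=True):
    """out_weights: {v: list of signed weights on v's outgoing edges}.
    in_degree: {v: number of incoming edges} (missing nodes count as 0).

    Returns {v: r(v)} for exogenous nodes only (no incoming links).
    malicious=True gives the M-Sources score, False the I-Sources score.
    """
    scores = {}
    for v, ws in out_weights.items():
        if in_degree.get(v, 0) != 0 or not ws:
            continue  # not an exogenous source
        w_pos = sum(w for w in ws if w > 0)
        w_neg = sum(-w for w in ws if w < 0)  # assumption: magnitudes
        total = w_pos + w_neg
        if total == 0:
            continue
        share = (w_neg if malicious else w_pos) / total
        scores[v] = share * math.log(len(ws))  # assumption: natural log
    return scores

def top_k(scores, k):
    """Return the k nodes with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Note that a source with a single out-neighbor gets a zero score under either strategy, because of the logarithmic out-degree factor.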
Stress triads. This strategy is based on the notion of structural balance in triads (Leskovec et al. 2010b). Figure 9 shows an example of stress-triad configuration: node v has two incoming connections, one from node z with negative weight and the other from node u with positive weight, and there is also a trust link from z to u. We say that z is a stress-node since, despite the distrusted link to v, it could also indirectly influence v through the trusted connection with u. Based on that, our proposed Stress-Triads strategy searches for all triads containing stress-nodes and selects as seeds the k stress-nodes that participate in the highest number of such triads.
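A possible way to enumerate stress triads and pick the seeds is sketched below. The function name `stress_triad_seeds` and the `{(u, v): weight}` input are hypothetical, and ties between stress-nodes are broken arbitrarily, which the text does not discuss.

```python
from collections import Counter

def stress_triad_seeds(weights, k):
    """weights: {(u, v): signed weight of the edge u -> v}.

    A stress triad is (z, u, v) with z -> v distrusted, u -> v trusted and
    z -> u trusted. Returns the k nodes z that take part in most triads.
    """
    trusted_in = {}     # v -> set of u with a positive edge u -> v
    distrusted_in = {}  # v -> set of z with a negative edge z -> v
    for (a, b), w in weights.items():
        if w > 0:
            trusted_in.setdefault(b, set()).add(a)
        elif w < 0:
            distrusted_in.setdefault(b, set()).add(a)
    counts = Counter()
    for v, foes in distrusted_in.items():
        for z in foes:
            for u in trusted_in.get(v, ()):
                # z must also have a trusted link to u
                if u != z and weights.get((z, u), 0) > 0:
                    counts[z] += 1
    return [z for z, _ in counts.most_common(k)]
```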
Newcomers. We call a node \(v \in V\) a newcomer if all of its incoming edges are timestamped as less recent than its oldest outgoing edge. The start-time of v is the oldest timestamp associated with its incoming edges. We divide the set of newcomers into two groups obtained by equal-frequency binning on the temporal range specific to a network. Based on this, we distinguish between two strategies, dubbed Least-New and Most-New, which correspond to the selection of the k newcomers having the highest out-degree among those with the oldest start-time and with the newest start-time, respectively. Both strategies were applied to Wiki-Vote and Wiki-Conflict, due to the availability of timestamped edges.
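The newcomer-based selection can be sketched as follows. The function name `newcomer_seeds` and the `(u, v, timestamp)` edge list are hypothetical. The sketch further assumes that nodes lacking incoming or outgoing edges are not newcomers, and that the two equal-frequency bins are obtained by splitting the newcomers, sorted by start-time, at the middle.

```python
def newcomer_seeds(timed_edges, k):
    """timed_edges: iterable of (u, v, timestamp) for edges u -> v.

    Returns (least_new, most_new): up to k newcomers with the highest
    out-degree among the older and the newer half of start-times.
    """
    in_ts, out_ts = {}, {}
    for u, v, t in timed_edges:
        out_ts.setdefault(u, []).append(t)
        in_ts.setdefault(v, []).append(t)
    # newcomer: every incoming edge is older than the oldest outgoing edge;
    # its start-time is the oldest incoming timestamp
    start = {v: min(in_ts[v]) for v in in_ts
             if v in out_ts and max(in_ts[v]) < min(out_ts[v])}
    ordered = sorted(start, key=start.get)  # oldest start-time first
    half = len(ordered) // 2                # two equal-frequency bins

    def out_degree(v):
        return len(out_ts[v])

    least_new = sorted(ordered[:half], key=out_degree, reverse=True)[:k]
    most_new = sorted(ordered[half:], key=out_degree, reverse=True)[:k]
    return least_new, most_new
```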
Settings of the model parameters
For every user v, the exogenous activation-threshold \(\theta_v\) and quiescence time \(\tau_v\) were chosen uniformly at random within [0,1] and [0,5], respectively. Moreover, λ (used in the quiescence function) was varied between 0 and 5, while the coefficient δ (used in the activation-threshold function) was selected in {0,0.1} for the biased scenario (Eq. 1) and kept fixed to 1 for the unbiased scenario (Eq. 2).
Results
We organize the presentation of our experimental results into three parts. The first part is devoted to the evaluation of the non-competitive model (“Evaluation of nC-F2DLT” section), and the second part to the evaluation of the competitive models (“Evaluation of competitive models” section). In the third part (“Comparative evaluation” section), we present a comparative evaluation of our non-competitive model against IC and stochastic individual-contact epidemic models, whereas for the competitive scenario, we compare our models with the DLT model (Litou et al. 2016).
Evaluation of nC-F2DLT
Spread, stressed users and negative influence
We analyzed the number of final activated users (i.e., spread) by varying the size (k) of the seed set, for every seed selection strategy. In this analysis, we assumed constant activation thresholds (i.e., ϑ(·,·)=0) and constant quiescence times (i.e., ψ(·,·)=0). Moreover, we distinguished between “stressed” and “unstressed” users, the former being active users with at least one distrusted active in-neighbor. As shown in Fig. 10 for some representative cases, besides the expected growth in spread as k increases, we found that the number of activated stressed users is lower than, but follows a similar trend to, that of unstressed users. For both types of users, I-Sources revealed the highest spread capability, followed by Stress-Triads, in all networks (with the exception of Wiki-Vote). The two newcomers-based strategies (where applicable) turned out to be effective as well, with Least-New prevailing over Most-New for lower k. By contrast, M-Sources was in general unable to yield a spread comparable to that of the other strategies.
We further investigated the effect of distrusted connections on the spread during the unfolding of the diffusion process. In this regard, Table 3 shows the number of nodes that, at the time of their involvement in the propagation process, were negatively influenced by in-neighbors activated at any previous time, along with their perceived negative influence. The symbol “-” in Table 3 denotes that the corresponding seed-selection strategy does not apply to a particular network. In general, we observed a significant presence of negative influence spread when using I-Sources and Stress-Triads. Considering Epinions and Slashdot, the former (resp. the latter) corresponded to a negative influence spread of the order of thousands (resp. hundreds), with average influence weight around 0.3 (resp. 0.2). The impact of these strategies was lower in the Wikipedia networks (one order of magnitude below). M-Sources yielded null (in Epinions and Slashdot) or non-negligible (in Wiki-Vote and Wiki-Conflict) negative influence spread. By contrast, the newcomers-based strategies had a small (in Wiki-Vote) or negligible (in Wiki-Conflict) effect on the negative influence spread.
Activation loss
As partially unveiled by the previous analysis, the users’ involvement in the propagation process is affected by the behavior of the quiescence function, whose impact would increase with the amount of distrusted influence in the spread. This further prompted us to measure the activation loss, i.e., the percentage decrease of activated users due to the enabling of the time-varying quiescence factor (i.e., λ>0 in Eq. 3) in the users’ activation states. Figure 11 shows results corresponding to relatively large λ (set to 5) and k (set to 50). For each seed selection strategy, the curve is drawn by using polynomial splines (see Note 2), where the marked points (from low to high time steps) refer to 25%, 50%, 75% and 100% of the time horizon observed for the diffusion process under the chosen strategy without time-varying quiescence times. One general remark that stands out is a relatively high percentage of activation loss for the initial time steps; this holds in particular for Stress-Triads, which might be explained by the fact that the users initially influenced by means of this strategy tend to be subjected to a certain amount of distrusted influence. As the time steps get closer to the time horizon, the activation loss tends to decrease significantly, down to nearly zero in most cases, with few exceptions including the use of I-Sources in Slashdot and Epinions, and of Stress-Triads and M-Sources in Wiki-Vote; note that this is indeed consistent with the previous analysis on negative influence spread.
Evaluation of competitive models
To analyze the behavior of spC-F2DLT and npC-F2DLT, we aimed at simulating a scenario of limitation of misinformation spread, i.e., we assumed that one campaign, the “bad” one, has started diffusing, and consequently another campaign, the “good” one, is carried out in reaction to the first campaign.
Combining seed selection strategies
Within this view, we preliminarily investigated which strategy combinations might be reasonably considered for a misinformation spread limitation problem. Table 4 provides a number of statistics we collected to characterize selected pairs of strategies, for two campaigns carried out independently of each other, i.e., in a non-competitive scenario, with k=50. Using Stress-Triads for the bad campaign and I-Sources for the good campaign was found to be significant for all networks, with the percentage of shared users close to 100% in Epinions and Slashdot and above 80% in Wiki-Conflict. Also, pairing M-Sources with I-Sources, and Least-New with Most-New, was well-suited to the Wiki networks.
Setting and goals for the evaluation of competitive diffusion
As previously mentioned, the seed selection strategies chosen for the two campaigns might not start at the same time, in which case we assume that the first-started one is the bad campaign. Moreover, we used fixed-probability as tie-breaking rule, with probability equal to 1 for the bad campaign. Also, we set the time horizon to the end-time of the (non-competitive) diffusion of the bad campaign.
Our main goal in the analysis of the two competitive models was to understand the effect of the setting of the activation-threshold function on the users’ campaign-changes/deactivations, under the case of “real-time correction” or “delayed correction” by the good campaign against the bad one (cf. Introduction).
Evaluation of spC-F2DLT
We present results on the campaign spreads, the number of users activated for one campaign that switched to the other campaign, and the total number of switches; the latter two measurements are represented, in the barcharts shown in Fig. 12, by the lower and upper whiskers, respectively, in the linerange vertically placed on each bar. Results correspond to start-delays \(\Delta t_0\) of the good campaign w.r.t. the bad one (from 0 to 75% of the end-time of the bad campaign). For this analysis, we considered the biased definition of the activation-threshold function (Eq. 1).
One general remark is that, for \(\delta=0, \Delta t_0=0\), the seed strategy that proved most effective in spread in the non-competitive case (cf. Table 4) confirmed its advantage over the other campaign’s strategy. Nevertheless, for \(\delta>0, \Delta t_0>0\), the two campaigns tend toward an equilibrium, or even invert their trend (e.g., in Epinions and Wiki-Vote). In particular, by accounting for (even little) confirmation bias and letting both campaigns start at the same time, I-Sources slightly increases its spread (which is explained by the fact that this strategy activates first a high fraction of shared users, e.g., 70% in Epinions); but, as the start-delay increases to 50%, the good campaign is no longer able to save users from being influenced by the bad campaign (i.e., Stress-Triads in Epinions, M-Sources in Wiki-Vote).
Interesting remarks were also drawn from the analysis of the transitions from one campaign to the other. For δ=0, as the start-delay increases, the number of switched users follows a nearly constant trend in all networks (except Wiki-Vote, where we observed a drastic decrease for both campaigns), while the total number of switches shows a more evident decreasing trend. Moreover, we observed a higher number of (unique and total) switches from the bad campaign to the good campaign than vice versa, which occurred even when the spread of the bad campaign was higher than that of the good one (e.g., in Wiki-Vote, for both combinations of strategy choices). Setting δ=0.1 led to a general decrease in the switch measurements w.r.t. the corresponding previous case, and also to a substantial increase in users “saved” by the good campaign.
Biased vs. unbiased activation-threshold function.
We also investigated how our proposed semi-progressive model behaves under the unbiased scenario corresponding to the activation-threshold function (Eq. 2).
Figure 13 shows flow diagrams of the spread based on spC-F2DLT for the selection of strategies I-Sources (red color) and Stress-Triads (green color), with the activation-threshold function defined either for the unbiased scenario (plots on the top) or for the biased scenario (plots on the bottom). In each plot, the height of a vertical bar and the percentage displayed upon it denote the number of active users at a particular time step and the ratio w.r.t. the maximum number of active users achieved by the corresponding selection strategy, respectively. The space between two consecutive bars corresponds to a time window, here set to 6 time steps for readability reasons. In each window we record two main events: (i) the number of active nodes that keep the same activation state, represented in base-2 logarithmic scale by the flow connecting two consecutive bars for the same campaign, and (ii) the number of users that switched from one campaign to the opposite one, represented in base-10 logarithmic scale by the flow connecting two consecutive bars with different colors (see Note 3). Note that for this analysis we discarded the start delay for the good campaign.
As expected, the number of switched users always tends to be in favor of the good campaign, which typically has the best strategy of activation. However, and more importantly, the number of switched users in the unbiased scenario is significantly greater than in the biased scenario. Moreover, when the confirmation-bias effect is enabled, the majority of the switches are concentrated in the initial time-windows, then they follow a rapidly decreasing trend until the time horizon. On the contrary, in the unbiased scenario, there is still a concentration of switches in the early stages of the propagation, but it becomes less evident and the number of switches tends to decrease more smoothly than in the confirmation-bias scenario. This is particularly evident in Slashdot, where switches last until the latest time-windows, while in the confirmation-bias scenario the switches stop just after the third time-window. No significant differences can be observed on Wiki-Conflict, which is explained by the fact that the majority of shared nodes are activated in the early stage of the propagation, where the diffusion seems to behave in the same way regardless of the particular activation-threshold function.
Evaluation of npC-F2DLT
Compared to the evaluation of spC-F2DLT, the spread trends observed under npC-F2DLT showed no particular differences. However, more importantly, the occurrence of deactivation events, which are admitted by npC-F2DLT, appeared to favor the good campaign strategy, as shown in Fig. 14. In particular, in Epinions and Slashdot, the numbers of user-unique and total deactivations tend to increase for the bad campaign and to decrease for the good one; moreover, although the spread of the good campaign remains higher, the deactivations for the good campaign are more frequent than those for the bad campaign as long as the start-delay remains zero or low and the confirmation bias factor is not introduced. A few differences arise in Wiki-Conflict. As concerns Stress-Triads vs. I-Sources, although 95% of shared users are activated first by the bad campaign, this advantage was not enough to prevent the good campaign from eventually activating more users. In fact, the number of deactivations with δ=0.1 increases for Stress-Triads and is always higher than for I-Sources, for which the statistic remains nearly constant as the start-delay increases. A similar situation was observed for the combination of M-Sources with I-Sources. Also, using Least-New with Most-New led to no deactivations, which might be explained by the fact that the totality of shared users was reached first by the bad campaign (cf. Table 4).
Comparative evaluation
We conducted a comparative evaluation in two stages. The first stage refers to the non-competitive scenario, whereby we compared nC-F2DLT to the IC model and to two epidemic models, i.e., SIR and SEIR, inspired by studies in the epidemiology field. The second stage addresses the comparison of spC-F2DLT with the DLT model (Litou et al. 2016), which is the closest to our work, as previously discussed in the “Related work” section.
Comparison with the IC, SIR and SEIR models
We begin by briefly recalling the basic principles underlying the competitor models considered in this section. The independent cascade (IC) model is a stochastic discrete-time diffusion model like LT, such that once a node becomes active, in the following time step of propagation it has a single chance of activating each of its out-neighbors. As concerns the epidemic models SIR and SEIR, the individuals of a population are divided into compartments that describe one of the following epidemiological states: susceptible (S), infective (I), latent-period or exposed (E), and recovered (R); individuals transition through those states. It is important to note that, since we need to treat each node in a network as an individual agent in order to enable a comparison with our diffusion model, we implemented SIR and SEIR based on a stochastic individual-contact network modeling as opposed to the standard, deterministic compartmental modeling (cf. “Related work” section). Within this view, the infection process in SIR is governed by two main parameters: (i) the transmission or contact rate (β), i.e., the probability of a susceptible node to be infected by any of its infected in-neighbors; and (ii) the recovery rate (γ), i.e., the probability of an infected node to transition to the recovered state with immunity, thus ceasing to propagate the disease along the network. Moreover, in the SEIR model, the transition to the exposed state is governed by the incubation rate, whose reciprocal defines the average duration of incubation; note that the notion of exposed state somehow resembles the quiescent state of nodes in nC-F2DLT.
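To make the individual-contact formulation concrete, the following is a minimal sketch of a discrete-time stochastic SIR process on a directed graph. It is not the implementation used in the experiments: the function name `simulate_sir`, the adjacency-dictionary input, the `max_steps` cap, and the choice of testing recovery at the end of each step (after the node has had a chance to transmit) are all assumptions.

```python
import random

def simulate_sir(out_neighbors, seeds, beta, gamma, max_steps=100, seed=None):
    """out_neighbors: {u: iterable of out-neighbors of u}.

    beta: per-contact transmission probability; gamma: per-step recovery
    probability. Returns {node: time step at which it became infected}.
    """
    rng = random.Random(seed)
    infected_at = {s: 0 for s in seeds}  # also marks recovered nodes as immune
    infective = set(seeds)
    for t in range(1, max_steps + 1):
        if not infective:
            break
        newly = set()
        for u in infective:
            for v in out_neighbors.get(u, ()):
                # each infective in-neighbor gets an independent chance
                if v not in infected_at and rng.random() < beta:
                    newly.add(v)
        # infective nodes recover, and stop spreading, with probability gamma
        recovered = {u for u in infective if rng.random() < gamma}
        for v in newly:
            infected_at[v] = t
        infective = (infective - recovered) | newly
    return infected_at
```

With `gamma=1` every node spreads for exactly one step, which mirrors the single activation chance of the IC model; with `gamma=0` nodes keep contributing until the step cap is reached.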
Figure 15 shows the complementary cumulative distribution function (CCDF) of the probability for a node of being active/infected from any given time step t to the termination of the process. The presented results correspond to β set to 0.2 and γ set to either 0 or 1: note that γ=1 implies that any node recovers immediately after its activation, and hence, similarly to the IC model, it has a single chance of activating its susceptible out-neighbors; by contrast, setting γ=0 implies that a node is unable to recover after its activation, therefore, similarly to nC-F2DLT, it will continue to contribute to the activation of its susceptible/inactive out-neighbors until the end of the process. (Further results for other settings of β and γ are reported in Appendix B.) Moreover, for SEIR, we set the incubation rate to 0.4, thus imposing an average incubation time equal to 2.5 time steps; this setting enables a fair comparison with our model, since each node is expected to spend the same amount of time in the quiescent state for nC-F2DLT (quiescence times are drawn uniformly from [0,5], hence 2.5 on average; cf. “Settings of the model parameters” section) as in the exposed state for SEIR.
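An empirical CCDF of activation times can be computed as in the sketch below. The function name `activation_ccdf` is hypothetical, and the strict inequality (fraction of nodes activated after step t) is one possible reading of the definition; the text does not state whether step t itself is included.

```python
def activation_ccdf(activation_times):
    """activation_times: non-empty iterable of non-negative integer
    activation steps, one per activated node.

    Returns {t: fraction of nodes activated strictly after step t}
    for every t from 0 to the last activation step.
    """
    times = sorted(activation_times)
    n = len(times)
    horizon = times[-1]
    return {t: sum(1 for x in times if x > t) / n for t in range(horizon + 1)}
```

A curve that decays slowly indicates activations spread over the whole lifetime of the process, whereas a sharp early drop indicates that most activations occur in the first steps.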
Looking at the figure, as expected IC and SIR (with γ=1) show an almost identical behavior, since most activations occur in the early stage of the propagation. On the contrary, when γ=0, the SIR model tends to behave relatively closer to nC-F2DLT than to IC, since the activations appear more uniformly distributed along the lifetime of the process. Also, the introduction of the exposed state in the SEIR model makes the dynamics of the propagation even more similar to those of the nC-F2DLT model, especially with γ=1. One general remark that stands out is that nC-F2DLT tends to favor a slower diffusion, since the propagation process lasts consistently longer than under IC and the epidemic models. Moreover, nC-F2DLT yields a smoother behavior in terms of time-decay of its CCDF than the other models.
In general, we can state that, already for the non-competitive scenario, epidemic models, even in their stochastic contact network formulation, provide a different solution in terms of behavioral dynamics w.r.t. our proposed nC-F2DLT.
Comparison with the DLT model
We finally conducted a stage of comparative evaluation with the DLT method (Litou et al. 2016) (cf. “Related work” section). To this purpose, we analyzed the trends of spread and corresponding overlaps of activated nodes, under a competitive scenario. For DLT, we considered two cases: one including the decay of influence probabilities (with Poisson decay coefficient set to 1), and the other discarding the influence decay (as also studied in (Litou et al. 2016)), hereinafter referred to as DLT without decay; as concerns our models, we were forced to use spC-F2DLT since DLT does not allow node deactivation.
Figure 16 shows results obtained on Slashdot, for the choice of strategies I-Sources and Stress-Triads (similar trends were observed for Epinions and the Wiki networks). A first remark is that, regardless of the seed selection strategy, the diffusion process under DLT terminates in very few time steps, mainly due to the influence decay factor. Also, before convergence, DLT enables the activation of more nodes than spC-F2DLT, though this actually corresponds to a small portion of the nodes finally activated by spC-F2DLT and, in any case, with an overlap that is generally below 50%.
We explored in more detail the spread overlap between spC-F2DLT and DLT as well as its variant without decay; for spC-F2DLT, we considered the settings δ=0 and δ=0.1. Figure 17 shows the heatmaps for the percentages of overlap of activated nodes at convergence of the respective models. As we observe in both plots, and regardless of δ, the overlap between DLT without decay and spC-F2DLT is around 40-45%, which further drops to less than 10% when the influence decay is considered (the lighter, the lower the overlap).
Overall, we can conclude that DLT, and even its variant without influence decay, behaves significantly differently from our semi-progressive F2DLT.
Discussion and usage recommendations
Our theoretical inspection of the proposed models, whose technical details have been presented in the “Theoretical properties of the models” section, revealed two important findings:
(F1): The non-competitive, progressive model, nC-F2DLT, is proven to be equivalent to LT with Quiescence Time; therefore, the activation function in nC-F2DLT is monotone and submodular.
(F2): The competitive, non-progressive models, spC-F2DLT and npC-F2DLT, can be reduced, via graph serialization, to Homogeneous Competitive LT (Chen et al. 2013), which is competitive and progressive, and has a monotone, non-submodular activation function; therefore, the activation function in spC-F2DLT and npC-F2DLT is monotone but not submodular. It should be emphasized that the basic technique of graph serialization introduced in (Kempe et al. 2003) to reduce the non-progressive LT-based diffusion to the progressive case cannot be applied to our proposed models, since it is not designed to deal with competitive or non-progressive diffusion and it discards activation or delayed propagation aspects; to overcome this issue, we provided new serialization techniques and related definitions of layered graphs that are suitable for our competitive models, focusing on particular settings of the activation-threshold and quiescence functions.
The two findings clearly have different impact on the development, upon our F2DLT models, of approximate solutions to influence maximization, rumor blocking, and related problems. On the other hand, in terms of expressiveness of our competitive F2DLT models, it should be noted that the serialization techniques require the construction of layered graphs whose size easily grows with some of the models’ parameters, making the application of such serialized graphs infeasible at a large scale. Therefore, using our competitive F2DLT models turns out to be essential in the representation of complex, dynamic propagation phenomena.
Our proposed class of trust-aware, dynamic models for non-competitive and competitive information diffusion offers a versatile solution for enhanced understanding of complex influence-propagation phenomena that occur in real-life network scenarios. Our models are also unique, since they have significantly different behavioral dynamics w.r.t. epidemic models and the dynamic linear threshold model, according to theoretical considerations that were also clearly supported by empirical evidence in our experimental evaluation.
It should be noted that the setting of the dynamic activation-threshold function ϑ and quiescence function ψ, especially of their parameters δ and λ, respectively, plays a crucial role in the expressiveness of our models, thus impacting differently on positive-influence propagation and on negative-influence/misinformation limitation. We make the following recommendations for the usage of our models.
- The results of our evaluation revealed that the average user’s sensitivity to the negative influence perceived from distrusted neighbors (which is controlled by λ) makes the seed identification process more aware of the negative influence spread, thus considering the quiescence-biased contingencies by which a non-negligible fraction of users cannot be activated before the time limit.
- The confirmation-bias effect underlying δ may lead the “stronger” campaign (i.e., the one able to activate most users at the early steps of its diffusion) to increase its spread capability.
- As shown by simulations under the semi-progressive competitive model (spC-F2DLT), the combined effect of an increased δ and an increased start-delay of the second-started (good) campaign may reduce its capability of “saving” users from the influence of the bad campaign; therefore, to limit misinformation spread, the good campaign should concentrate its (activation) efforts in the early stage of its diffusion.
- The non-progressive competitive model (npC-F2DLT) appears to be less sensitive to the increase of δ. Yet, it tends to favor deactivation events (for users previously activated by the weaker campaign) over switch events.
In this regard, it would be interesting to study how to learn the various parameters in our models for the corresponding IM scenarios. Note that learning parameters for IM tasks is a challenging problem, which is still largely open, given the relatively little work done even under basic diffusion models (Saito et al. 2008; Goyal et al. 2010). Major difficulties are in the assumption of availability of past propagation data from which the parameters would be learnt, which is in general difficult to obtain, and the large number of parameters, which poses efficiency issues. To overcome these aspects, the approach in (Vaswani et al. 2017) appears to be particularly promising, as it does not depend on the rules that control how the propagation unfolds over time. Nonetheless, we expect that the learning problem in our setting will easily become much more challenging, given the presence of other parameters than just the diffusion probabilities, and the need for coping with competitive influence scenarios. Therefore, we believe there will be much work to do in such a direction.
Conclusion
We proposed a novel class of trust-aware, dynamic LT-based models for non-competitive and competitive influence propagation in information networks. Evaluation on real-world, publicly available networks included simulations of scenarios of misinformation spread limitation, based on realistic strategies of selection of the initial influential users.
We believe that our proposed models can pave the way for the development of sophisticated methods to solve misinformation spread limitation and related optimization problems. Remarkably, our models can profitably be used in a variety of applications in which there is an urgent need to predict the time required to debunk fake information, or to estimate how people are affected by the spread of competing, opposite opinions through a social network. Also, we envisage an effective support for fact-checking, through a contextualization of the activation and quiescence functions to the production/consumption of contents in interaction networks.
Appendix A: Additional details on the properties of the models
Figure 18 shows an example of serialization for a spC-F2DLT diffusion graph with time horizon set to 2. Dashed lines correspond to the edges in the original graph, whereas solid lines correspond to the edges in the resulting serialized graph. Each of the four nodes in the original graph is replicated as a triple on each of the two time-layers. Triples act as “connectors” between two consecutive time-layers.
Analogously to the reduction of spC-F2DLT to H-CLT, we can conveniently devise a notion of “connector” component between any two consecutive layers, shown in Fig. 19, which in the case of npC-F2DLT needs to account for node deactivations.
Example 4 shows a selection of possible configurations for the component utilized in competitive models shown in Fig. 7, in order to prove the correctness of the set of constraints in Eq. 6.
Example 4
In the example of Fig. 20, we assume that node v in the original graph needs three consecutive time steps to reach the unit value for its threshold.
On the right side of each subfigure, there are the additional node replicas:
\(\left \langle v_{t}^{3,r1},v_{t}^{3,r2},v_{t}^{3,r3} \right \rangle \), where \(v_{t}^{3,r3}\) has the maximum value for the activation threshold.
Figure 20a represents the case when the node has already reached the maximum value of its threshold and its in-neighbors are only able to activate the first two replicas, which is not enough for making v change its activation campaign. We need thus to verify that:
Note that the inequality in (9) holds (cf. Eq. 6 (e)).
Figure 20b shows the case when v is active for just two consecutive time steps. Therefore, the configuration on the right side of the figure is enough to activate v in favor of the red-campaign. Therefore, the following inequality must hold:
Above, inequality in (12) holds as given in Eq. 6 (a).
Figure 20c shows the case when the node v switched from one campaign to the opposite in the middle of the three consecutive time steps. In this case the activation of node \(v_{t}^{3,r1}\) must guarantee the change of activation campaign. So the following inequality must hold:
Indeed, the validation is straightforward, because all the summations on the left side in (13) are by definition greater than the one on the right side.
Figure 20d shows the case when at time step t none of the in-neighbors of v is able to activate it. In this case, the node will keep its previous state, and it will do so only after the activation of the corresponding replica at the immediately preceding time step; this allows us to guarantee the sequentiality of the whole process.
Appendix B: Additional details on epidemic models
Figure 21 provides a focus on the behavior of the SIR and SEIR models in terms of varying parameters (i.e., transmission rate β, recovery rate γ, and incubation rate) for the analysis discussed in the “Comparison with the IC, SIR and SEIR models” section.
We observe that, for both models, most of the infections tend to occur at the early time steps of the propagation as β increases. On the other hand, higher values of γ yield cascades that show a smoother decay over time, and consequently they last longer than those corresponding to smaller γ.
Notes
We assume the second additive term in Eq. (1) is zero if δ=0.
We used the splines2 R-package, available at https://CRAN.R-project.org/package=splines2.
The choice of two different logarithmic scales to represent the active users and the switched users is for the sake of readability of the plots, since the number of switched users is typically orders of magnitude smaller than the number of active users.
Abbreviations
- F2DLT: Friend-foe dynamic linear threshold models
- IC: Independent cascade model
- LT: Linear threshold model
- nC-F2DLT: Non-competitive F2DLT model
- npC-F2DLT: Non-progressive competitive F2DLT model
- OSN: Online social network
- spC-F2DLT: Semi-progressive competitive F2DLT model
References
Anagnostopoulos, A, Bessi A, Caldarelli G, Vicario MD, Petroni F, Scala A, Zollo F, Quattrociocchi W (2015) Viral misinformation: The role of homophily and polarization. In: Proc. World Wide Web Conf, 355–356. ACM, New York.
Bessi, A, Caldarelli G, Del Vicario M, Scala A, Quattrociocchi W (2014) Social determinants of content selection in the age of (mis)information. In: Proc. Int. Conf. on Social Informatics (SocInfo), 259–268. Springer International Publishing.
Bishop, J (2007) Increasing participation in online communities: A framework for human-computer interaction. Comput Hum Behav 23(4):1881–1893.
Brandes, U, Kenis P, Lerner J, van Raaij D (2009) Network analysis of collaboration structure in Wikipedia. In: Proc. World Wide Web Conf, 731–740. ACM, New York.
Budak, C, Agrawal D, Abbadi AE (2011) Limiting the spread of misinformation in social networks. In: Proc. World Wide Web Conf, 665–674. ACM, New York.
Caliò, A, Tagarelli A (2018) Trust-Based Dynamic Linear Threshold Models for Non-competitive and Competitive Influence Propagation. In: Proc. IEEE Int. Conf. On Trust, Security And Privacy In Computing And Communications (TrustCom), 156–162. https://doi.org/10.1109/TrustCom/BigDataSE.2018.00033.
Chen, S, He K (2015) Influence Maximization on Signed Social Networks with Integrated PageRank. In: Proc. IEEE Social Computing Conf, 289–292. IEEE Computer Society.
Chen, W, Lakshmanan LVS, Castillo C (2013) Information and Influence Propagation in Social Networks. Morgan & Claypool, San Rafael.
Chen, W, Lu W, Zhang N (2012) Time-critical influence maximization in social networks with time-delayed diffusion. In: Proc. AAAI Conf, 592–598. AAAI Press.
Chen, W, Collins A, Cummings R, Ke T, Liu Z, Rincon D, Sun X, Wang Y, Wei W, Yuan Y (2011) Influence maximization in social networks when negative opinions may emerge and propagate. In: Proc. SIAM Conf. on Data Mining. SIAM, Philadelphia.
Das, A, Gollapudi S, Kiciman E, Varol O (2016) Information dissemination in heterogeneous-intent networks. In: Proc. ACM Conf. on Web Science, 259–268. ACM, New York.
Dodds, PS (2018) Slightly generalized contagion: Unifying simple models of biological and social spreading. In: Lehmann S, Ahn Y-Y (eds) Complex Spreading Phenomena in Social Systems. Computational Social Sciences, 67–80. Springer International Publishing.
Fan, L, Lu Z, Wu W, Thuraisingham B, Ma H, Bi Y (2013) Least cost rumor blocking in social networks. In: Proc. Distr. Comp. Syst. Conf, 540–549.
Fazli, M, Ghodsi M, Habibi J, Khalilabadi PJ, Mirrokni VS, Sadeghabad SS (2014) On non-progressive spread of influence through social networks. Theor Comput Sci 550:36–50.
Garimella, K, De Francisci Morales G, Gionis A, Mathioudakis M (2017) The effect of collective attention on controversial debates on social media. In: Proc. ACM Conf. on Web Science, 43–52. ACM, New York.
Gilbert, E, Karahalios K (2009) Predicting tie strength with social media. In: Proc. Int. Conf. on Human Factors in Computing Systems (CHI), 211–220. ACM, New York.
Golbeck, J, Hendler JA (2006) Inferring binary trust relationships in web-based social networks. ACM Trans Internet Techn 6(4):497–529.
Goyal, A, Bonchi F, Lakshmanan LVS (2010) Learning influence probabilities in social networks. In: Proc. Int. Conf. on Web Search and Web Data Mining (WSDM), 241–250. ACM, New York.
Hamdi, S, Bouzeghoub A, Gancarski AL, Yahia SB (2013) Trust inference computation for online social networks. In: Proc. IEEE TrustCom/ISPA/IUCC, 210–217. IEEE Computer Society.
Han, L, Ma Z, Shi T (2003) An SIRS epidemic model of two competitive species. Math Comput Model 37(1–2):87–108.
He, X, Song G, Chen W, Jiang Q (2012) Influence blocking maximization in social networks under the competitive linear threshold model. In: Proc. SIAM Conf. on Data Mining, 463–474.
Hethcote, H (2000) The mathematics of infectious diseases. SIAM Rev 42(4):599–653.
Hu, M, Jia S, Chen Q, Jia Z, Hong L (2013) The analysis of epidemic disease propagation in competition environment. In: Proc. Intelligent Automation Conf, 227–234. Springer Berlin Heidelberg, Berlin.
Iniguez, G, Ruan Z, Kaski K, Kertesz J, Karsai M (2018) Service adoption spreading in online social networks. In: Lehmann S, Ahn Y-Y (eds) Complex Spreading Phenomena in Social Systems. Computational Social Sciences, 151–175. Springer International Publishing.
Iribarren, JL, Moro E (2009) Impact of human activity patterns on the dynamics of information diffusion. Phys Rev Lett 103(3).
Ji, C, Jiang D, Yang Q, Shi N (2012) Dynamics of a multigroup SIR epidemic model with stochastic perturbation. Automatica 48(1):121–131.
Jiang, W, Wang G, Wu J (2014) Generating trusted graphs for trust evaluation in online social networks. Future Generation Comp Syst 31:48–58.
Kempe, D, Kleinberg J, Tardos E (2003) Maximizing the spread of influence through a social network. In: Proc. ACM Conf. Knowl. Disc. Data Min, 137–146. ACM, New York.
Kim, J, Tabibian B, Oh A, Schölkopf B, Gomez-Rodriguez M (2018) Leveraging the crowd to detect and reduce the spread of fake news and misinformation. In: Proc. ACM Conf. on Web Search and Data Mining, 324–332. ACM, New York.
Koutra, D, Bennett PN, Horvitz E (2015) Events and controversies: Influences of a shocking news event on information seeking. In: Proc. World Wide Web Conf, 614–624. ACM, New York.
Krishnan, S, Butler P, Tandon R, Leskovec J, Ramakrishnan N (2016) Seeing the forest for the trees: New approaches to forecasting cascades. In: Proc. ACM Conf. on Web Science, 249–258. ACM, New York.
Kumar, KPK, Geethakumari G (2014) Detecting misinformation in online social networks using cognitive psychology. Human-cent Comp Inf Sci 4:1–22.
Kumar, S, West R, Leskovec J (2016) Disinformation on the web: Impact, characteristics, and detection of Wikipedia hoaxes. In: Proc. World Wide Web Conf, 591–602. International World Wide Web Conferences, Republic and Canton of Geneva.
Leskovec, J, Huttenlocher D, Kleinberg J (2010a) Governance in social media: A case study of the Wikipedia promotion process. In: Proc. Conf. on Weblogs and Social Media. The AAAI Press.
Leskovec, J, Huttenlocher D, Kleinberg J (2010b) Signed networks in social media. In: Proc. Conf. on Human Factors in Computing Systems, 1361–1370. ACM, New York.
Lewandowsky, S, Ecker UKH, Seifert CM, Schwarz N, Cook J (2012) Misinformation and its correction. Psychol Sci Public Interest 13(3):106–131.
Liben-Nowell, D, Kleinberg J (2008) Tracing information flow on a global scale using internet chain-letter data. Proc Natl Acad Sci 105(12):4633–4638.
Litou, I, Kalogeraki V, Katakis I, Gunopulos D (2016) Real-time and cost-effective limitation of misinformation propagation. In: Proc. IEEE Conf. on Mobile Data Management, 158–163. IEEE Computer Society.
Liu, B, Cong G, Xu D, Zeng Y (2012) Time constrained influence maximization in social networks. In: Proc. IEEE Conf. on Data Mining, 439–448. IEEE Computer Society.
Liu, H, Lim E, Lauw HW, Le M, Sun A, Srivastava J, Kim YA (2008) Predicting trusts among users of online communities: an epinions case study. In: Proc. ACM Conf. on Electronic Commerce (EC), 310–319. ACM, New York.
Lou, VY, Bhagat S, Lakshmanan LVS, Vaswani S (2014) Modeling non-progressive phenomena for influence propagation. In: Proc. ACM Conf. on Online Social Networks, 131–137. ACM, New York.
Lu, W, Chen W, Lakshmanan LVS (2015) From competition to complementarity: Comparative influence diffusion and maximization. Proc VLDB Endow 9(2):60–71.
Metaxas, P, Mustafaraj E (2010) From obscurity to prominence in minutes: Political speech and real-time search. In: Proc. ACM Conf. on Web Science.
Mohamadi-Baghmolaei, R, Mozafari N, Hamzeh A (2015) Trust based latency aware influence maximization in social networks. Eng Appl Artif Intell 41:195–206.
Mustafaraj, E, Metaxas PT (2017) The fake news spreading plague: Was it preventable? In: Proc. ACM Conf. on Web Science, 235–239. ACM, New York.
Overgoor, J, Wulczyn E, Potts C (2012) Trust propagation with mixed-effects models. In: Proc. Int. Conf. on Weblogs and Social Media (ICWSM). The AAAI Press.
Porter, MA, Gleeson JP (2016) Dynamical Systems on Networks. Frontiers in Applied Dynamical Systems: Reviews and Tutorials. Springer, Cham.
Saito, K, Nakano R, Kimura M (2008) Prediction of information diffusion probabilities for independent cascade model. In: Proc. Int. Conf. on Knowledge-Based Intelligent Information and Engineering Systems (KES), 67–75. Springer.
Sherchan, W, Nepal S, Paris C (2013) A survey of trust in social networks. ACM Comput Surv 45(4):47:1–47:33.
Talluri, M, Kaur H, He JS (2015) Influence maximization in social networks: Considering both positive and negative relationships. In: Proc. Conf. on Collaboration Technologies and Systems, 479–480. IEEE.
Tang, J, Liu H (2015) Trust in Social Media. Synthesis Lectures on Information Security, Privacy, & Trust. Morgan & Claypool Publishers, San Rafael.
Tedjamulia, SJJ, Dean DL, Olsen DR, Albrecht CC (2005) Motivating content contributions to online communities: Toward a more comprehensive theory. In: Proc. Int. Conf. on System Sciences (HICSS). IEEE Computer Society, Washington, DC.
Tong, G, Wu W, Guo L, Li D, Liu C, Liu B, Du D (2017) An efficient randomized algorithm for rumor blocking in online social networks. In: Proc. IEEE Conf. on Computer Communications, 1–9.
Vaswani, S, Kveton B, Wen Z, Ghavamzadeh M, Lakshmanan LVS, Schmidt M (2017) Model-Independent Online Learning for Influence Maximization. In: Proc. Int. Conf. on Machine Learning (ICML), 3530–3539. PMLR.
Vazquez, A, Rácz B, Lukács A, Barabási A-L (2007) Impact of non-Poissonian activity patterns on spreading processes. Phys Rev Lett 98:158702.
Vedula, N, Parthasarathy S, Shalin VL (2017) Predicting trust relations within a social network: A case study on emergency response. In: Proc. ACM Conf. on Web Science, 53–62. ACM, New York.
Weng, X, Liu Z, Li Z (2016) An efficient influence maximization algorithm considering both positive and negative relationships. In: Proc. IEEE TrustCom/BigDataSE/ISPA, 1931–1936. IEEE.
Yin, G, Jiang F, Cheng S, Li X, He X (2012) AUTrust: A Practical Trust Measurement for Adjacent Users in Social Networks. In: Proc. Int. Conf. on Cloud and Green Computing (CGC), 360–367. IEEE Computer Society.
Acknowledgements
This research work has been partly supported by the Start-(H)Open POR Grant No. J28C17000380006 funded by the Calabria Region Administration, and by the NextShop PON Grant No. F/050374/01-03/X32 funded by the Italian Ministry for Economic Development.
Funding
Not applicable.
Availability of data and materials
Data used in this work refer to publicly available networks.
Author information
Contributions
AT and AC devised the study, analyzed the data, and wrote the paper; AC performed the experiments and prepared the result presentation; AT supervised the writing of the paper. Both authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional information
An abridged version of this work appeared in the Proc. of the 2018 IEEE TrustCom Conf. (Caliò and Tagarelli 2018).
Rights and permissions
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Caliò, A., Tagarelli, A. Complex influence propagation based on trust-aware dynamic linear threshold models. Appl Netw Sci 4, 14 (2019). https://doi.org/10.1007/s41109-019-0124-5
DOI: https://doi.org/10.1007/s41109-019-0124-5