A novel regularized weighted estimation method for information diffusion prediction in social networks
Applied Network Science volume 8, Article number: 81 (2023)
Abstract
In recent years, social networks have become popular among Internet users, and various studies have analyzed users' behavior in social networks. Information diffusion analysis is one of the leading fields in social network analysis. In this context, users are influenced by other users in the social network, such as their friends. User behavior is analyzed using several models designed for information diffusion modeling and prediction. In this paper, we first study the problem of estimating the diffusion probabilities for the independent cascade model. We propose an estimation method that assigns a weight to each individual diffusion sample within a network. To account for the different effects of diffusion samples, several weighting schemes are proposed. The proposed method is then applied to real cascade datasets such as Twitter and Digg, where we estimate diffusion probabilities for the independent cascade model considering the continuous time of nodes' infections. Evaluation on several datasets shows that our methods perform well in terms of training time as well as other metrics such as mean absolute error and F-measure.
Introduction
In recent years, online social networks have become a major source of information. Their importance stems from the large amount of data they produce. Users' actions are a major source of information in online social networks. Researchers analyze data such as likes, shares, comments, and other interactions between social network users to find relationships between them. This information is used to understand users' behavior in social networks and their influence on each other. A number of problems are studied in this area, such as community detection, link prediction, and information diffusion. The study of information diffusion is mostly focused on how content spreads in social networks through interactions between users or external sources.
Millions of people use popular online social networks such as Facebook, Twitter, and YouTube. Such websites play a vital role in the propagation of information between internet users by letting them see everyone's viewpoints on different subjects. The role of Facebook during the 2010 Arab Spring and Twitter during the 2016 U.S. presidential elections (Tan et al. 2013) shows the importance of online social networks in this era. In the analysis of information diffusion in social networks, it is assumed that content propagation is the result of users' interactions. In this regard, information diffusion prediction over social networks has become a priority for researchers. Several studies try to predict which users become aware of the propagating information in a social network. This field of study has many applications in other information diffusion-related research topics such as rumor blocking (Wang et al. 2017), influence maximization (Khomami et al. 2018), and crowdsourcing. For example, when a rumor spreads about a company, its leaders can make early decisions to avoid future damage. Likewise, understanding how content propagates on a social network such as Twitter helps advertisers find strategies that spread more information in favor of a company (Gao et al. 2017).
The study of information diffusion in social networks was first initiated in epidemiology and social sciences contexts (Najar et al. 2012). In recent years, understanding user behavior in online social networks has become a major research topic. In this context, users influence each other through social relationships like following on Twitter or friendship on Facebook, resulting in information cascades (Bao et al. 2017). Research areas in the study of information diffusion include detecting popular topics, identifying influential information spreaders, and modeling information diffusion (Patel and Nanavati 2023). Many studies of information diffusion use the independent cascade model (ICM) (Goldenberg et al. 2001) or the linear threshold model (LTM) (Granovetter 1978) to analyze how information propagates in social networks. Information diffuses in these models through an iterative process. The probability that information reaches a user depends on the neighbors that have already propagated it (Yang and Leskovec 2010; Gomez Rodriguez et al. 2011; Ver Steeg and Galstyan 2013). Various models have been proposed based on these two models. Some of them take the continuous time of the diffusion process into account to be more realistic (Saito et al. 2009, 2010).
The parameters of a diffusion model play a significant role in how realistically it reflects actual diffusion. These parameters can be inferred from the history of users' activities on social networks, that is, by learning the parameters of the diffusion model. The problem of learning diffusion probabilities for ICM was first studied in Saito et al. (2008). Learning the parameters of a diffusion model requires observing users' interactions and behavior in the social network for a while. After learning the parameters, the model can be used in various information diffusion applications, for example, predicting which users will be infected in a diffusion process or maximizing influence propagation. Most similar works that try to learn diffusion probabilities for ICM have poor training time performance on large networks because they use iterative algorithms. We therefore consider the sequence in which nodes are activated, so that diffusion may occur from a node acting as a parent to connected nodes acting as children. In this regard, we assign different weights as diffusion probabilities for each role of nodes as parents or children. To this end, this paper proposes a novel method for estimating diffusion probabilities for ICM. Moreover, we estimate ICM parameters by considering the continuous time of the diffusion process. The model can be useful for predicting information diffusion in social networks for practical purposes. Furthermore, we design a method to overcome the lack of learning data for many social network links. We extract some features from each link and use regression algorithms to learn a regularization term for each link's diffusion probability. Using this method, it is possible to learn the diffusion probabilities of links from less data. Our methods have lower training time and error than iterative methods.
In summary, our main contributions are as follows:

We propose a basic method called Weighted Estimation of Diffusion Probabilities (WEDP) to estimate diffusion probabilities for ICM. In this method, we assign different weights to different types of information obtained from past cascades.

We propose an improved version of WEDP, called Continuous Time Weighted Estimation of Diffusion Probabilities (CTWEDP), to learn diffusion probabilities for ICM considering the continuous time of the diffusion process. Similar to WEDP, different weights are assigned to different types of information obtained from cascades.

We propose an improved version of CTWEDP, called Continuous Time Regularized Weighted Estimation of Diffusion Probabilities (CTRWEDP), to learn diffusion probabilities for ICM considering the continuous time of the diffusion process and using regression algorithms to learn a regularization term for each link. This helps links with less available data to learn their diffusion probabilities.
The paper is organized as follows. Sect. "Related work" reviews previous works on this problem and their strengths and weaknesses. Sect. "Background" explains the background and preliminary requirements for the rest of the paper. Sect. "WEDP: weighted estimation of diffusion probabilities" describes the basic method of WEDP, including diffusion model description, problem definition, and evaluation of WEDP. Sect. "CT_WEDP: continuous time weighted estimation of diffusion probabilities" describes the proposed methods for continuous time. The experimental evaluation of the results of the information diffusion prediction task based on the proposed algorithms is given in Sect. "Experimental evaluation". Finally, Sect. "Conclusion and future work" concludes our work and gives some insights into future works.
Related work
Most social network analysis researchers use graphs to model social networks, where nodes represent users of social networks and edges represent relationships between users, such as friendship or following. In recent years, many research works have studied information diffusion analysis (Yang and Leskovec 2010; Chang et al. 2018; Feng et al. 2022; Kempe et al. 2003; Yu and Chu 2017; Failed 2016; Sharma and Bajaj 2023). Some researchers focus on influence maximization, which is the study of how to propagate content to reach more users in a social network. On the other hand, some of the studies try to minimize contamination spread in the social network. Other methods focus on learning a diffusion model or predicting information diffusion in an online social network.
Domingos and Richardson (Domingos and Richardson 2001; Richardson and Domingos 2002) were the first to study the spread of influence in social networks and identify the most influential nodes with a data mining approach. There are also some graph heuristics, such as node degree and closeness, for identifying the most influential nodes in social networks. Kempe et al. (Kempe et al. 2003) formally defined the influence maximization problem as an optimization problem. They proved that the optimization problem of selecting the most influential nodes in ICM and LTM is NP-hard. They also designed a greedy method with approximation guarantees for the problem of selecting the most influential nodes in both ICM and LTM.
There have been several studies of information diffusion dynamics (Failed 2023; Rezvanian et al. 2023; Guille and Hacid 2012). A time-based asynchronous independent cascade model was proposed by Guille and Hacid (Guille and Hacid 2012). In this model, diffusion probabilities are a function of propagation time. They used Bayesian logistic regression to learn diffusion probabilities from features extracted from each Twitter link. Such studies learn a model based on users' extracted features instead of learning a diffusion probability for each link in a social network.
A number of research studies have studied the prediction of information diffusion as the problem of predicting which users will be infected at the end of the diffusion process. One approach to this problem is to learn the parameters of a diffusion model and use the model to predict information diffusion. Saito et al. (Saito et al. 2008) studied learning ICM diffusion probabilities. They formulated the problem as a likelihood function and used the EM algorithm for learning diffusion probabilities. Moreover, Goyal et al. (Goyal et al. 2010) investigated the learning parameters of a general threshold model. Their work is scalable, unlike Saito et al. (Saito et al. 2008), because their model's parameters are not learned iteratively. Some research has attempted to improve the performance of these two primary works. Continuous time models AsIC and AsLT have been developed to consider the continuous time of the diffusion process (Goldenberg et al. 2001; Granovetter 1978). Lamprier et al. (Lamprier et al. 2016) focused on the partial order of node infections instead of the exact timestamp that each node gets infected. Some other studies tackle the problem of information diffusion prediction by embedding users and contents in continuous latent space (Gao et al. 2017; Bourigault et al. 2016). In most cases, they don't consider the full social network graph, and the underlying network is not known to them.
Some other studies use the diffused content and users' profiles to enhance information diffusion prediction in social networks (Wang et al. 2020, 2016; Varshney et al. 2017; Barbieri et al. 2013). Barbieri et al. (Barbieri et al. 2013) proposed Topic-aware Independent Cascade and Topic-aware Linear Threshold models, which consider the topic of the propagated content in learning the models' parameters. In similar research, Wang et al. (Wang et al. 2016) used the emotion of the propagated content to learn diffusion probabilities for ICM.
In Beni et al. (2023), Bouyer et al. present an improved version of the ICM that uses an improved criterion for calculating the diffusion process. This is done by considering different diffusion rates for nodes in different layers with respect to core nodes: high-influence nodes are close to core nodes, and low-influence nodes reside in the outer layers. This concept is realized by applying the k-shell algorithm. Based on this model, the authors provided candidate node selection and showed in simulation that the algorithm has linear time complexity.
Lin et al. (2023) develop a tensor-based independent cascade model that incorporates a similarity matrix. The model considers the probabilities of information spreading from one node to its neighbors based on node interactions and similarity. The authors also propose an optimization algorithm to estimate the parameters and identify influential links (Lin et al. 2023).
Haldar et al. (2023) introduce a novel temporal cascade model for understanding and analyzing information spreading dynamics in evolving networks. They aim to investigate how network structure and temporal patterns affect the spread of contagions. The proposed model takes into account both network topology and the temporal characteristics of interactions between individuals in the network. It considers the sequential nature of information spread, where individuals become infected and spread the contagion to their neighbors. Due to the critical role of learning diffusion probabilities for diffusion models in real-world scenarios, a stochastic diffusion model is proposed by Rezvanian et al. In this model, the authors take into account the probabilistic nature of information diffusion and treat network centrality as stochastic. They aim to find a seed set of initial users who can trigger a cascading effect leading to maximum influence spread using learning automata theory (Rezvanian et al. 2023).
In this paper, we tackle the problem of learning diffusion probabilities for ICM and information diffusion prediction in social networks. Using information obtained from observed cascades in the social network, we design weighted estimation methods for this purpose. Additionally, we developed a method that learns a regularization term for diffusion probabilities based on some features of social network links. Our methods are more efficient than other methods for the same purpose.
Background
In this study, we learn the parameters of ICM. First, we explain this model. Next, we introduce the notation used in this paper and define the required preliminaries.
ICM diffusion model
In the ICM model, a set of seed nodes is activated at time zero. The diffusion process proceeds discretely. At each time, every newly activated node at the previous time has a single chance to activate any of its inactive child nodes. Node \(v\) is the child of node \(u\) if there is an edge from \(u\) to \(v,\) and \(u\) is the parent of \(v\). If the parent node fails to activate the child node, it will not have another chance to activate it again. When there is no newly activated node, the diffusion process ends. It is not possible to change the state of a node from active to inactive in ICM. In other words, they remain active after their activation time. A parent node activates a child node with a diffusion probability, which should be specified for the corresponding edge between them. Diffusion probabilities are ICM parameters. In this paper, we tackle the problem of estimating these probabilities.
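The process described above can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper: the function name `simulate_icm` and the dictionary layouts used for `children` and `prob` are assumptions made for the example.

```python
import random


def simulate_icm(children, prob, seeds, rng=None):
    """Run one discrete-time independent cascade and return activation times.

    children: dict mapping a node to the list of its child nodes (assumed layout)
    prob:     dict mapping an edge (u, v) to its diffusion probability
    seeds:    iterable of nodes activated at time 0
    """
    rng = rng or random.Random()
    activation_time = {s: 0 for s in seeds}
    frontier = list(activation_time)  # nodes newly activated in the previous step
    t = 0
    while frontier:  # the process ends when no node was newly activated
        t += 1
        newly_active = []
        for u in frontier:
            for v in children.get(u, []):
                if v in activation_time:
                    continue  # active nodes never become inactive again
                # u gets exactly one chance to activate each inactive child
                if rng.random() < prob[(u, v)]:
                    activation_time[v] = t
                    newly_active.append(v)
        frontier = newly_active
    return activation_time
```

The returned dictionary maps each activated node to its activation step, which is the same information a diffusion episode carries in the sections that follow.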
Preliminaries
In this paper, we follow the notation of Saito et al. (2008) closely. We represent a social network (a directed network) with a graph \(G=(V, E)\), where \(V\) is the set of nodes and \(E\) is the set of edges. Each edge is written \((u, v)\), meaning that there is a link from node \(u\) to node \(v\). Next, we define the sets of out-neighbors and in-neighbors of node \(u\): \({N}^{out}\left(u\right)=\{v \mid \left(u, v\right)\in E\}\) denotes the set of children of \(u\), and \({N}^{in}\left(u\right)=\{v \mid \left(v, u\right)\in E\}\) denotes the set of parents of \(u\).
We define a diffusion episode as \({D}_{s}=\langle {D}_{s}\left(0\right), \dots , {D}_{s}\left(T\right)\rangle\), which is a sequence of node sets, where \({D}_{s}(t)\) denotes the set of nodes that are activated at time \(t\). Because nodes cannot change from active to inactive, a node cannot appear more than once in a diffusion episode; that is, each node can be activated at most once in a single diffusion episode. In addition, \({C}_{s}(t)\) represents the set of nodes activated before time \(t\) in diffusion episode \({D}_{s}\). Moreover, \(A{T}_{s}(u)\) denotes the activation time of node \(u\) in the diffusion episode \({D}_{s}\); \(A{T}_{s}(u)\) is infinity if \(u\) is not activated in \({D}_{s}\). Furthermore, \(T({D}_{s})\) is the activation time of the last node in \({D}_{s}\).
WEDP: Weighted estimation of diffusion probabilities
In this section, we explain our basic method (called WEDP) to estimate diffusion probabilities for ICM, which is similar to our method presented in Mashayekhi et al. (2018). The problem definition is the same as in Saito et al. (2008). We are trying to answer the question: how can we estimate the diffusion probabilities for ICM, given a directed network graph \(G=(V, E)\) and a set of diffusion episodes \(S=\{{D}_{0}, \dots , {D}_{n}\}\)? To answer the question, we estimate the diffusion probability for an arbitrary edge \(e=(u, v)\), denoted by \({P}_{uv}^{WEDP}\), using the diffusion episode set \(S\). \({P}_{uv}^{WEDP}\) is equal to the number of times \(u\) has succeeded in activating \(v\) divided by the number of times \(u\) has attempted to activate \(v\). To estimate \({P}_{uv}^{WEDP}\), we use two functions whose inputs are an edge and a diffusion episode. These two functions are \(isDiffused\) and \(weight\).
The function \(isDiffused\) indicates whether or not node \(u\) could have activated \(v\) in a diffusion episode \({D}_{s}\). Assume that node \(v\) is activated at time \(t+1\). If node \(u\) is activated at time \(t\), \(isDiffused\) returns 1. There is a problem here: if multiple parents of node \(v\) are activated at time \(t\), we do not know which parent activated \(v\) and which parents failed to activate \(v\), so \(isDiffused\) returns 1 for all of these parents. To solve this problem, the second function, \(weight\), is proposed.
The function \(weight\) calculates how much we can trust the episode \({D}_{s}\) about the edge \(e=(u, v)\). We assume that if node \(v\) is activated at time \(t+1\) and if more than one of its parents is activated at time \(t\), then all of the parents of \(v\) with activation time \(t\) have successfully activated \(v\). In computing diffusion probability for edge \(e=(u, v)\), information obtained from such diffusion episodes will be weighted lower than episodes in which the parent node \(u\) has successfully or unsuccessfully activated the child node \(v\).
These two functions are explained in the rest of this section.
Function \(isDiffused\)
To explain our method, we first define the \(isDiffused\) function, shown in Pseudocode 1. For a diffusion episode \({D}_{s}\) and an edge \(e=(u, v)\), it indicates whether or not \(u\) could have activated \(v\) in that episode.
In Pseudocode 1, this function returns one if the parent node (i.e., node \(u\)) could have activated the child node (i.e., node \(v\)). This means that:

(1) Parent node \(u\) must be activated in \({D}_{s}\) no later than the last active node in \({D}_{s}\) (i.e., \(A{T}_{s}\left(u\right)\le T\left({D}_{s}\right)\)),

(2) Child node \(v\) satisfies the same condition (i.e., \(A{T}_{s}\left(v\right)\le T\left({D}_{s}\right)\)), and

(3) Child node \(v\) must become active right after parent node \(u\) in \({D}_{s}\) (i.e., \(A{T}_{s}\left(v\right)=A{T}_{s}\left(u\right)+1\)).
If any of these three conditions is false, the \(isDiffused\) function returns zero.
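The three conditions translate directly into code. The sketch below is an illustration written from the description above, not a transcription of Pseudocode 1. It assumes an episode is stored as a non-empty dictionary from node to integer activation step, with inactive nodes simply absent; the helper names `activation_time` and `last_time` are hypothetical.

```python
import math


def activation_time(episode, node):
    """AT_s(node): activation step in the episode, or infinity if never activated."""
    return episode.get(node, math.inf)


def last_time(episode):
    """T(D_s): activation time of the last activated node (episode must be non-empty)."""
    return max(episode.values())


def is_diffused(edge, episode):
    """Return 1 if u could have activated v in this episode, else 0."""
    u, v = edge
    at_u = activation_time(episode, u)
    at_v = activation_time(episode, v)
    horizon = last_time(episode)
    # Conditions (1)-(3): both nodes are active, and v follows u by exactly one step.
    if at_u <= horizon and at_v <= horizon and at_v == at_u + 1:
        return 1
    return 0
```

For example, with `episode = {"u": 0, "v": 1, "w": 3}`, the edge `("u", "v")` yields 1 while `("u", "w")` yields 0, because `w` was not activated in the step right after `u`.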
Function \(weight\)
In this subsection, we define how to specify the weights of information obtained from a diffusion episode \({D}_{s}\) for edge \(e=(u, v)\). To this end, the \(weight(\left(u, v\right), {D}_{s})\) function is defined in Pseudocode 2.
Function \(weight\) will assign a weight to the information obtained from the diffusion episode \({D}_{s}\) for the edge \(e=(u, v)\).

The function assigns a weight of zero if \(u\) did not have any chance to activate \(v\). This can occur if \(u\) is not activated at all or if it is activated after \(v\).

The function assigns a weight of one if \(u\) had the opportunity to activate \(v\) and we are certain that it did not succeed. This happens if \(u\) is activated at time \(t\) and \(v\) is either activated at time \(t'>t+1\) or not activated at all.

If \(u\) is activated at time \(t\) and \(v\) is activated at time \(t+1\), two scenarios are possible:

(1) If \(u\) is the only parent of \(v\) activated at time \(t\) (i.e., #Activated_parents_of_ \(v\) _in_ \({D}_{s}\)=1), the function is certain that \(u\) has activated \(v\) and assigns weight 1 to this information.

(2) If more than one parent of \(v\) is activated at time \(t\), the function calls another function \(W\) to assign a weight for \(e=(u, v)\). This function is described in the next subsection.

(1)
Function \(W\)
In the following, we propose two weighting schemes to calculate \(W\): linear decay and exponential decay.

Linear decay weighting: In the linear decay weighting scheme, we define \(W(A)=\frac{1}{A}\), where \(A\) is the number of parents of \(v\) that may have activated it. In other words, suppose that node \(v\) is activated at timestep \(t+1\). As the number of parents of \(v\) that may have activated it increases, the information for the edges between them and \(v\) receives a lower weight.

Exponential decay weighting: In the exponential decay weighting scheme, we define \(W(A)={e}^{-\left(A-1\right)}\). Consequently, this weighting scheme assigns lower weights than linear decay weighting when more than one parent of \(v\) could have activated it.
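The weighting rules and the two decay schemes can be put together as follows. This is a sketch written from the prose description, not a copy of Pseudocode 2. It assumes the same dictionary-based episode layout (node to integer activation step), a `parents` dictionary that lists \(u\) among the parents of \(v\), and it treats the case where \(u\) and \(v\) are activated at the same step as "no chance", which the text does not spell out.

```python
import math


def w_linear(a):
    """Linear decay: W(A) = 1 / A."""
    return 1.0 / a


def w_exponential(a):
    """Exponential decay: W(A) = exp(-(A - 1))."""
    return math.exp(-(a - 1))


def weight(edge, episode, parents, w=w_linear):
    """Trust placed in one episode's evidence about edge (u, v).

    episode: dict node -> integer activation step (inactive nodes are absent)
    parents: dict node -> list of its parent nodes (must include u for node v)
    """
    u, v = edge
    at_u = episode.get(u, math.inf)
    at_v = episode.get(v, math.inf)
    if at_u == math.inf or at_v <= at_u:
        return 0.0  # u never had a chance to activate v (assumption: ties count as no chance)
    if at_v > at_u + 1:
        return 1.0  # u tried and certainly failed (v activated later or never)
    # v was activated right after u: count parents of v activated at the same step as u
    a = max(1, sum(1 for p in parents[v] if episode.get(p) == at_u))
    return 1.0 if a == 1 else w(a)
```

With two parents active at the same step, linear decay gives each edge a weight of 0.5, while exponential decay gives roughly 0.37, which matches the statement that the exponential scheme penalizes ambiguous episodes more strongly.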
Estimate diffusion probabilities for edges
Now we can compute diffusion probabilities for ICM. The diffusion probability of edge \(e=(u, v)\) using a set of diffusion episodes \(S=\{{D}_{0}, \dots , {D}_{n}\}\) is computed by the following formula:
In Eq. (1), the diffusion probability of edge \(e=(u, v)\) is equal to the weighted number of times \(u\) may have activated \(v\), divided by the weighted number of times \(u\) had the chance to activate \(v\), using the weights defined above.
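Read term by term, this description corresponds to a weighted success ratio. The expression below is a reconstruction under that reading, with both sums running over all episodes in \(S\); it is offered as a sketch of the estimator rather than the authors' exact typesetting:

$$P_{uv}^{WEDP} = \frac{\sum_{D_{s} \in S} isDiffused\left(\left(u, v\right), D_{s}\right)\cdot weight\left(\left(u, v\right), D_{s}\right)}{\sum_{D_{s} \in S} weight\left(\left(u, v\right), D_{s}\right)}$$

Episodes in which \(u\) had no chance to activate \(v\) receive weight zero and therefore drop out of both sums.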
To increase the robustness of estimating diffusion probabilities, we consider a special case. If there is no diffusion episode for an edge, we cannot be certain whether the parent node has succeeded or failed in activating the child node. Although we did not find a diffusion episode that shows the parent node activated the child node, the probability would be one in this case. We use a random probability instead of an estimated probability in this situation.
We refer to this method as Weighted Estimation of Diffusion Probabilities (WEDP).
An example with two simple scenarios
In this subsection, we provide a toy example of diffusion probability estimation based on WEDP. Consider \(G=(V=\{u, v, j\} , E=\{\left(u, v\right),(j, v)\})\). We define two simple scenarios, each with a different set of diffusion episodes \(S=\{{D}_{0}, {D}_{1}\}\). Table 1 presents the activation time of each node in each diffusion episode in the two scenarios. For example, \(A{T}_{0}\left(u\right)=1\) indicates that node \(u\) is activated at time 1 in the diffusion episode \({D}_{0}\).
As given in Table 1, for Scenario 1, in \({D}_{1}\), multiple parents of node \(v\) are activated at time 1, while node \(v\) is activated at time 2. Thus, we need to consider different weights for the estimation of the diffusion probabilities of edge \((u, v)\) as \({P}_{uv}^{WEDP}\) and edge \((j, v)\) as \({P}_{jv}^{WEDP}\). On the other hand, in Scenario 2, each parent of node \(v\) is activated in a separate episode. Therefore, function \(isDiffused\) returns different values for each edge when estimating diffusion probabilities.
The estimated diffusion probabilities for the two edges \((u, v)\) and \((j, v)\) are shown in Table 2. The linear decay weighting scheme is used for function \(W\). The calculations related to each scenario are shown in different rows. As presented in Table 2, for each scenario, the calculated diffusion probabilities for edges \((u, v)\) and \((j, v)\) are the same, whereas the values returned by the \(isDiffused\) and \(weight\) functions are different.
Time complexity
WEDP's time complexity is discussed in this section by calculating the time required to evaluate Eq. (2). In every diffusion episode, two functions are executed, \(isDiffused\) and \(weight\). Each function has a time complexity of \(O(m)\), where \(m\) is the number of edges in the graph. For one diffusion episode, the time complexity is therefore \(O(2m)\). Because there are \(n\) episodes in \(S\), evaluating Eq. (2) takes \(O(2mn)\). Thus, the time complexity of WEDP is \(O(mn)\).
CT_WEDP: Continuous time weighted estimation of diffusion probabilities
Our proposed methods for predicting information diffusion in continuous time are described in this section. We are trying to answer the question: given a set of nodes in a social network as the initial nodes in a diffusion episode, which users will be infected at the end of the diffusion episode?
One approach to answering this question is to learn a diffusion model and use it to simulate the diffusion episode, which enables us to predict information diffusion. In this paper, we learn ICM parameters and use them for this purpose. One problem with the method explained in the previous section is that it does not consider continuous time in diffusion episodes, whereas diffusion episodes in real online social networks unfold in continuous time. Hence, we use the problem definition of the previous section, but we consider continuous time in diffusion episodes.
To overcome this problem, while estimating diffusion probabilities, if a parent node is active at time \(t\), we consider that it can activate its child nodes at any time \(t'>t\) instead of only at \(t+1\). Consequently, if a node is activated at time \(t\), then any of its parents that were activated before this time may have activated it. To include continuous diffusion time, we redefine \(isDiffused\) as \(isDiffused\_CT\) and \(weight\) as \(weight\_CT\) in Pseudocodes 3 and 4, respectively.
In Pseudocode 3, the function returns one if parent node \(u\) could have activated child node \(v\). This will happen if \(v\) is activated after \(u\). It returns zero otherwise. Pseudocode 4 is similar to Pseudocode 2 but uses the \(isDiffused\_CT\) function instead of \(isDiffused\).
Now, the diffusion probability for \(e=(u, v)\) can be calculated by (2), similar to the formula proposed for WEDP. We refer to this method as Continuous Time Weighted Estimation of Diffusion Probabilities (CTWEDP).
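A compact sketch of the continuous-time variant is given below. It is written from the description of Pseudocodes 3 and 4 rather than from the pseudocode itself, so several details are assumptions: episodes are dictionaries from node to real-valued activation time, every parent of \(v\) that became active before \(v\) counts as a candidate activator, a certain failure is recorded only when \(v\) never becomes active, and linear decay is the default weighting. The estimator returns `None` when no episode is informative, leaving the fallback choice to the caller.

```python
import math


def is_diffused_ct(edge, episode):
    """Return 1 if v was activated strictly after u, else 0."""
    u, v = edge
    at_u = episode.get(u, math.inf)
    at_v = episode.get(v, math.inf)
    return 1 if at_u < at_v < math.inf else 0


def weight_ct(edge, episode, parents, w=lambda a: 1.0 / a):
    """Weight of one episode for edge (u, v) when activation times are continuous."""
    u, v = edge
    at_u = episode.get(u, math.inf)
    at_v = episode.get(v, math.inf)
    if at_u == math.inf or at_v <= at_u:
        return 0.0  # u had no chance to activate v
    if at_v == math.inf:
        return 1.0  # u was active but v never became active
    # every parent of v that was active before v is a candidate activator
    a = sum(1 for p in parents[v] if episode.get(p, math.inf) < at_v)
    return w(max(a, 1))


def estimate_probability(edge, episodes, parents):
    """Weighted success ratio over all episodes; None if no episode is informative."""
    num = den = 0.0
    for ep in episodes:
        wt = weight_ct(edge, ep, parents)
        num += is_diffused_ct(edge, ep) * wt
        den += wt
    return num / den if den > 0 else None
```

Because each episode is visited once per edge and the two helper functions only look up activation times, the loop keeps the single-pass, non-iterative character that distinguishes this family of estimators from EM-based learning.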
The sparsity of information diffusion data on real-world social networks
One of the problems with estimating diffusion probabilities in real-world social networks is the lack of learning data for most of the links in the network. More specifically, for many edges in the social network graph, the parent node rarely becomes active before its child. Additionally, most of the time, the number of a node's parents who have been activated before it is much smaller than the number of times a child is not activated at all or just one of its parents has activated it. Consequently, the diffusion probability estimates for many edges are less robust than those for others.
To get more insight, we collected tweets from a set of Twitter users between January 1, 2018, and March 19, 2018. We considered hashtag propagation as diffusion episodes. Further, we considered only edges for which at least one parent node activated a child during a diffusion episode (the \(isDiffused\) function returns one in at least one diffusion episode). The dataset contains 6500 nodes, 512,144 edges, and 70,709 diffusion episodes.
Figure 1 shows the number of edges with respect to their sum of weights. We used linear decay weighting in Fig. 1. We call the sum of weights of an edge the weight of the edge and refer to it as \({W}_{uv}\).
According to Fig. 1, there are fewer edges with high weights than edges with low weights. This shows that the data available for estimating diffusion probabilities is scarce for many edges. Hence, diffusion probabilities estimated in this manner have considerable error.
Improving robustness
To learn diffusion probabilities, we improve the CTWEDP. To this end, we use various social networking features to estimate the probability of the edges that have less data to learn by those edges that have more data to learn. Thus, the diffusion probability of the edge \(e=(u, v)\) using a set of diffusion episodes \(S=\{{D}_{0}, \dots , {D}_{n}\}\) is computed by the following formula:
In Eq. (3), \({P}_{uv}^{CTWEDP}\) is the probability learned by CTWEDP and \({P}_{uv}^{learn}\) is the probability learned from edges with more data. To learn this term, we use edges with larger weights. \({\lambda }_{uv}\) is a balancing factor. We refer to this method as Continuous Time Regularized Weighted Estimation of Diffusion Probabilities (CTRWEDP).
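Reading \({\lambda }_{uv}\) as a balancing factor between the two estimates, one natural form for the combination is a convex mixture. This is an assumed reading, shown only to make the roles of the three terms concrete; the symbol \({P}_{uv}^{CTRWEDP}\) for the combined estimate is likewise introduced here for illustration:

$$P_{uv}^{CTRWEDP} = \lambda_{uv}\, P_{uv}^{CTWEDP} + \left(1-\lambda_{uv}\right) P_{uv}^{learn}, \qquad 0 \le \lambda_{uv} \le 1$$

Under this reading, an edge with little data (small \({\lambda }_{uv}\)) leans on the regression-based estimate, while a well-observed edge keeps most of its own CTWEDP estimate.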
We use the sum of the weights of the edge to compute \({\lambda }_{uv}\). We suggest a logarithm function as a concave function for \({\lambda }_{uv}\).
In Eq. (4), \({W}_{uv}\) is the sum of the weights of the edge \((u, v)\) and \({W}_{AvgEstimate}\) is the average of \({W}_{zw}\) over all edges that are used for estimation. These edges are the \(\alpha \%\) of all edges with the largest \({W}_{zw}\). We use \(1+\epsilon\) in the denominator to ensure that it is greater than zero.
Consider Eq. (4). If \({W}_{uv}<{W}_{AvgEstimate}\), the equation returns a value smaller than 1 for \({\lambda }_{uv}\), which means that robustness needs to be improved: the information obtained from the diffusion episode set for edge \((u, v)\) is not enough, and more learning is needed. Using the logarithm as a concave function for \({\lambda }_{uv}\) simplifies calculations and means that the estimation curves downward. The logarithm function is commonly applied in many machine learning techniques such as Maximum Likelihood Estimation (MLE) (Failed 2006).
To learn \({P}_{uv}^{learn}\), we extracted some features from the edges of the graph. Next, we applied some regression algorithms (linear regression, decision tree, KNN regression, and SVR) to learn this term. We gained better performance utilizing SVR and used this algorithm in this paper. We used five features described below:

Child node activation rate: this feature specifies how many times the child node has been activated in diffusion episodes; with respect to the times it had the chance to be activated (at least one of these parents should be activated in the diffusion episode). It is computed using the following formula:
$$Child\;Positive\;Rate_{uv} = \frac{\left| \left\{ D_{s} \mid D_{s} \in S,\; AT_{s}\left( v \right) \le T\left( D_{s} \right) \right\} \right|}{\left| \left\{ D_{s} \mid D_{s} \in S,\; AT_{s}\left( v \right) \le T\left( D_{s} \right)\ \text{or}\ \left| \left\{ w \in N^{in}\left( v \right) \mid AT_{s}\left( w \right) < T\left( D_{s} \right) \right\} \right| > 0 \right\} \right|}$$(5)
Parent node diffusion rate: this feature specifies what fraction of the children of the parent node have been activated, given that the parent node was activated before them. Let \(diffusionSet(u)\) be \(\left\{ D_{s} \mid D_{s}\in S,\; AT_{s}\left(u\right)\le T\left(D_{s}\right),\; \left|\left\{ w\in N^{out}\left(u\right) \mid AT_{s}\left(u\right)<AT_{s}\left(w\right)\right\}\right|>0 \right\}\). The feature is computed using the following formula:
$$Parent\;Diffusion\;Rate_{uv} = \frac{\sum\nolimits_{D_{s} \in diffusionSet\left( u \right)} \frac{\left| \left\{ w \in N^{out}\left( u \right) \mid AT_{s}\left( u \right) < AT_{s}\left( w \right) < T\left( D_{s} \right) \right\} \right|}{\left| \left\{ w \in N^{out}\left( u \right) \mid AT_{s}\left( u \right) < AT_{s}\left( w \right) \right\} \right|}}{\left| diffusionSet\left( u \right) \right|}$$(6)
Parent node diffusion count: this feature specifies how many children of the parent nodes have been activated if a parent node has been activated before them. It is computed using the following formula:
$$Parent\;Diffusion\;Count_{uv} = \frac{\sum\nolimits_{D_{s} \in diffusionSet\left( u \right)} \left| \left\{ w \in N^{out}\left( u \right) \mid AT_{s}\left( u \right) < AT_{s}\left( w \right) < T\left( D_{s} \right) \right\} \right|}{\left| diffusionSet\left( u \right) \right|}$$(7)
Jaccard similarity coefficient: this feature specifies the fraction of diffusion episodes in which both the parent and the child node have been activated, among the episodes in which at least one of them has been activated. It is computed using the following formula:
$$Jaccard_{uv} = \frac{\left| \left\{ D_{s} \mid D_{s} \in S,\; AT_{s}\left( u \right) < T\left( D_{s} \right),\; AT_{s}\left( v \right) < T\left( D_{s} \right) \right\} \right|}{\left| \left\{ D_{s} \mid D_{s} \in S,\; AT_{s}\left( u \right) < T\left( D_{s} \right)\ \text{or}\ AT_{s}\left( v \right) < T\left( D_{s} \right) \right\} \right|}$$(8)
Parent node outdegree: this feature is the outdegree of the parent node (i.e., the number of followers in a social network). It indicates how popular the node is among other nodes.
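The episode-based features above can be sketched in a few lines. The sketch assumes a simple data layout that is not taken from the paper: each diffusion episode is a dictionary mapping an activated node to its activation time, and only nodes activated within the observation window \(T({D}_{s})\) appear in it, so a missing node has an infinite activation time. Under this layout the distinction between \(\le\) and \(<\) at the window boundary is ignored. All function names are hypothetical.

```python
import math


def child_positive_rate(episodes, in_neighbors, v):
    """Eq. (5): episodes where v is active / episodes where v had a chance."""
    activated = 0
    had_chance = 0
    for ep in episodes:
        v_active = v in ep
        parent_active = any(w in ep for w in in_neighbors.get(v, ()))
        activated += v_active
        had_chance += v_active or parent_active
    return activated / had_chance if had_chance else 0.0


def jaccard(episodes, u, v):
    """Eq. (8): episodes with both u and v active / episodes with either."""
    both = sum(1 for ep in episodes if u in ep and v in ep)
    either = sum(1 for ep in episodes if u in ep or v in ep)
    return both / either if either else 0.0


def parent_diffusion_stats(episodes, out_neighbors, u):
    """Eqs. (6) and (7): return (diffusion rate, diffusion count) for parent u."""
    rate_sum, count_sum, n_episodes = 0.0, 0, 0
    for ep in episodes:
        if u not in ep:
            continue
        # Children not activated before u (never-activated ones have time inf).
        later = [w for w in out_neighbors.get(u, ())
                 if ep.get(w, math.inf) > ep[u]]
        if not later:
            continue  # this episode is not in diffusionSet(u)
        activated_later = [w for w in later if w in ep]
        rate_sum += len(activated_later) / len(later)
        count_sum += len(activated_later)
        n_episodes += 1
    if n_episodes == 0:
        return 0.0, 0.0
    return rate_sum / n_episodes, count_sum / n_episodes


# Toy data: u points to v and x.
out_neighbors = {"u": ["v", "x"]}
in_neighbors = {"v": ["u"], "x": ["u"]}
episodes = [
    {"u": 0.0, "v": 1.0},  # u activates, then v
    {"u": 0.0},            # only u activates
    {"v": 2.0},            # v activates without u
    {"x": 0.5, "u": 1.0},  # x activates before u
]

cpr_v = child_positive_rate(episodes, in_neighbors, "v")
jac_uv = jaccard(episodes, "u", "v")
rate_u, count_u = parent_diffusion_stats(episodes, out_neighbors, "u")
```

In a full pipeline, these values and the parent outdegree would form the feature vector of edge \((u, v)\) passed to the regressor.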
Experimental evaluation
Experiment 1
The purpose of experiment 1 is to evaluate the performance of WEDP. WEDP is compared with EM (Saito et al. 2008) in terms of mean absolute error (MAE), mean relative error (MRE), and training time. Saito et al. (2008) formulated the problem of learning diffusion probabilities from a set of diffusion episodes as a likelihood function and used the EM algorithm to optimize it.
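For reference, the two error measures can be computed as sketched below. The sketch assumes the standard definitions (mean of absolute differences, and mean of absolute differences divided by the true value), since the exact definitions are not restated in this section; the function names and toy values are illustrative.

```python
def mean_absolute_error(true_probs, estimated_probs):
    """Average of |estimate - truth| over all edges in true_probs."""
    errors = [abs(estimated_probs[e] - p) for e, p in true_probs.items()]
    return sum(errors) / len(errors)


def mean_relative_error(true_probs, estimated_probs):
    """Average of |estimate - truth| / truth; true values must be positive."""
    errors = [abs(estimated_probs[e] - p) / p for e, p in true_probs.items()]
    return sum(errors) / len(errors)


# Hypothetical ground-truth and estimated probabilities for two edges.
true_probs = {("a", "b"): 0.2, ("b", "c"): 0.1}
estimated_probs = {("a", "b"): 0.25, ("b", "c"): 0.1}

mae = mean_absolute_error(true_probs, estimated_probs)
mre = mean_relative_error(true_probs, estimated_probs)
```

Because the true probabilities in this experiment are drawn from [0.1, 0.3], the denominator of the relative error is never zero.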
We applied the approach presented in Saito et al. (2008) to choose diffusion probabilities and create diffusion episodes. For each dataset, we generated 100 different diffusion episodes with one uniformly randomly selected seed node for each episode. We tested the methods for different sizes of learning episodes. The diffusion probabilities are selected uniformly at random in the range [0.1, 0.3].
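The synthetic setup described above can be sketched as follows: diffusion probabilities are drawn uniformly from [0.1, 0.3], and each episode starts from one uniformly chosen seed node and spreads under the independent cascade model. This is an illustrative sketch rather than the code used in the experiments; the discrete time steps, the toy graph, and all names are assumptions.

```python
import random


def simulate_episode(out_neighbors, probs, seed_node, rng):
    """One independent cascade run; returns {node: activation step}."""
    activated = {seed_node: 0}
    frontier = [seed_node]
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for u in frontier:
            # Each newly activated node gets one chance per inactive child.
            for v in out_neighbors.get(u, ()):
                if v not in activated and rng.random() < probs[(u, v)]:
                    activated[v] = step
                    next_frontier.append(v)
        frontier = next_frontier
    return activated


def generate_episodes(out_neighbors, n_episodes=100, low=0.1, high=0.3, seed=0):
    """Draw edge probabilities uniformly in [low, high] and simulate episodes."""
    rng = random.Random(seed)
    nodes = list(out_neighbors)
    probs = {(u, v): rng.uniform(low, high)
             for u in out_neighbors for v in out_neighbors[u]}
    episodes = [simulate_episode(out_neighbors, probs, rng.choice(nodes), rng)
                for _ in range(n_episodes)]
    return probs, episodes


# Small hypothetical graph; every node appears as a key.
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
probs, episodes = generate_episodes(graph)
```

The ground-truth `probs` can then be compared with the probabilities estimated from `episodes`.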
Table 3 lists the datasets used in the experiments. Self-edges are removed from the datasets. Also, we considered only the edges for which the parent node was activated before the child node in at least one diffusion episode. The datasets used in this experiment are Digg (Choudhury et al. 2009), Slashdot (Gómez et al. 2008), UC Irvine (Opsahl and Panzarasa 2009), CitHepTh (Leskovec et al. 2007), DBLP (Ley 2002), Cora (Šubelj and Bajec 2013), Google+ (Leskovec and Mcauley 2012), and Twitter (Choudhury et al. 2010).
In Table 3, \(n\) and \(m\) show the number of nodes and edges in the network, respectively. Moreover, \(av{g}_{n}\), \(ma{x}_{n}\), and \(mi{n}_{n}\) represent the average, maximum, and minimum number of nodes activated in diffusion episodes.
In Figs. 2, 3, 4, 5, 6, 7, 8, and 9, MAE is plotted against the number of diffusion episodes for each dataset.
In Tables 4, 5, 6, 7, 8, 9, 10, and 11, MRE is shown against the number of diffusion episodes for different datasets. The number of diffusion episodes used for learning is S. Bold values show the best result.
As the results show, WEDP performs better than EM in terms of both mean absolute error and mean relative error. We also observe that increasing the number of diffusion episodes reduces the error of all methods in learning diffusion probabilities. In addition, WEDPExp had less error than WEDPLnr in some datasets, which demonstrates the importance of down-weighting the data obtained from diffusion episodes in which multiple parent nodes activated the child node.
Figure 10 shows the training time of the methods on each dataset for 100 diffusion episodes.
As shown in Fig. 10, WEDP has less training time than EM, which is a significant improvement in learning diffusion probabilities. The reason is that EM uses an iterative algorithm to learn diffusion probabilities, whereas WEDP estimates diffusion probabilities directly from diffusion episodes, in a single scan of the episodes. As a result, WEDP has lower error than EM, and its training time is much shorter.
Experiment 2
In experiment 2, we evaluate CTWEDP and CTRWEDP performance. We compare them with the following methods:

EM: Saito et al. (2008) formulated the problem of learning diffusion probabilities from a set of diffusion episodes as a likelihood function and used the EM algorithm to optimize it. We refer to this method as EM.

CTIC: Saito et al. (2009) formulated the problem of learning diffusion probabilities considering the continuous time of nodes' activation times as a likelihood function. They used the EM algorithm to learn diffusion probabilities and time delays of node activations. We refer to this method as CTIC.

DAIC: Lamprier et al. (2016) proposed a method similar to EM to learn diffusion probabilities for ICM by focusing on the partial order of nodes' activation times instead of exact activation times. They also considered exponential prior distributions of the diffusion probabilities to improve robustness. We refer to this method as DAIC.
We evaluate the methods on the prediction of information diffusion. In a test diffusion episode, we must predict which nodes will be activated at the end. For evaluation, we use the training time of the methods and the F1-score, which is the harmonic mean of precision and recall. Recall is the fraction of the nodes activated in a diffusion episode that are also predicted as activated in the simulation. Precision is the fraction of predicted activations that are correct.
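As a small worked example of these measures, the sketch below compares a predicted set of activated nodes with the set actually activated in a test episode. The function name and the toy node sets are illustrative, and returning zero for empty sets is a convention chosen here.

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 for predicted vs. actually activated nodes."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Four nodes predicted, three actually activated, two in common.
precision, recall, f1 = precision_recall_f1(
    predicted={"a", "b", "c", "d"}, actual={"a", "b", "e"}
)
```

Here precision is 2/4, recall is 2/3, and their harmonic mean is 4/7.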
We use three datasets in this experiment. We considered the edges that, at least in one diffusion episode, the parent node activated before the child node:

Digg: The Digg news portal lets users vote on stories (articles, blog posts, videos, etc.). The dataset is described in Hogg and Lerman (2012). Voting on each story was considered a diffusion episode.

Twitter: For this dataset, we collected tweets from a set of Twitter users between January 1st, 2018, and March 19th, 2018. We started with ten random users. Next, we took 1000 followers of the collected users, choosing those with more followers among the collected users. We repeated this step 10 times. We considered each hashtag propagation as a diffusion episode.

MemeTracker: The MemeTracker corpus has diffusion episodes of short phrases. This dataset is described in Leskovec et al. (2009). We used the part of this dataset between August 1st, 2008, and April 1st, 2009. We consider a link from blog A to blog B if blog B has at least one post that contains a post from blog A. In this way, we can conclude that blog B is watching blog A's posts, and hence, diffusion will occur from blog A to B.
Table 12 gives some statistics for experiment datasets.
In Table 12, all parameters are the same as in Table 2. Moreover, parameter \(c\) shows the number of diffusion episodes in each dataset.
Table 13 shows the F1-score for each dataset.
In all datasets, CTRWEDP shows the best performance. CTRWEDP is built on CTWEDP with one of the weighting schemes. In the Digg dataset, we employed CTWEDPLnr in CTRWEDP because it has better performance than the other methods. In the Twitter and MemeTracker datasets, we employed CTWEDPExp in CTRWEDP because it has a higher F1-score than the other methods. These results show that CTWEDP and CTRWEDP are suitable methods for learning diffusion probabilities in real-world social networks. They also suggest that learning the regularization term from reliable diffusion probabilities is helpful when some edges in the network have less learning data than others.
Figure 11 shows the training time of the methods for each dataset.
CTWEDP and CTRWEDP have less training time than the EM-based methods (EM, CTIC, DAIC). The reason is that EM-based methods use the EM algorithm, which is iterative, in their learning phase. Thus, CTWEDP and CTRWEDP have less training time and a better F1-score, which shows that they are appropriate for learning diffusion probabilities in real-world social networks.
Conclusion and future work
In this paper, we tackled the problem of learning diffusion probabilities for a well-known diffusion model, ICM. We then proposed a method to learn diffusion probabilities for ICM considering continuous cascade time, so that it can be used in information diffusion prediction. We also used some features of social networks to learn diffusion probabilities from edges with more reliable data. Our methods learn diffusion probabilities directly from the diffusion episodes available for learning. The experiments showed that our methods reduce error. In addition, the proposed method learns probabilities in much less time, which makes it more scalable.
Based on the promising results of this approach, we expect a variety of future developments of the proposed method. For instance, we intend to consider the dynamics of social networks. Users and the relationships between them change over time, and it will be helpful to design methods that consider these changes. We can also use users' profiles and the propagated content to obtain a model closer to users' true behavior.
Availability of data and materials
The datasets analyzed during the current study are available in the KONECT (The Koblenz Network Collection) repository, http://konect.cc/networks/.
References
Bao Q, Cheung WK, Zhang Y, Liu J (2017) A component-based diffusion model with structural diversity for social networks. IEEE Trans Cybern 47:1078–1089
Barbieri N, Bonchi F, Manco G (2013) Topic-aware social influence propagation models. Knowl Inf Syst 37:555–584
Beni HA, Bouyer A, Azimi S, Rouhi A, Arasteh B (2023) A fast module identification and filtering approach for influence maximization problem in social networks. Inf Sci 640:119105
Bishop CM, Nasrabadi NM (2006) Pattern recognition and machine learning. Springer, New York
Bourigault S, Lamprier S, Gallinari P (2016) Representation learning for information diffusion through social networks: an embedded cascade model. In: Proceedings of the Ninth ACM international conference on web search and data mining, pp 573–582
Chang B, Xu T, Liu Q, Chen EH (2018) Study on information diffusion analysis in social networks and its applications. Int J Autom Comput 15:377–401
De Choudhury M, Sundaram H, John A, Seligmann DD (2009) Social synchrony: predicting mimicry of user actions in online social media. In: 2009 International conference on computational science and engineering CSE'09, pp 151–158
De Choudhury M, Lin YR, Sundaram H, Candan KS, Xie L, Kelliher A (2010) How does the data sampling strategy impact the discovery of information diffusion in social media? ICWSM 10:34–41
Domingos P, Richardson M (2001) Mining the network value of customers. In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp 57–66
Zhang K, Yun X, Liang J, Zhang Xy, Li C, Tian B (2016) Retweeting behavior prediction using probabilistic matrix factorization. In: 2016 IEEE symposium on computers and communication (ISCC), pp 1185–1192
Feng S et al. (2022) H-Diffu: hyperbolic representations for information diffusion prediction. In: IEEE Trans Knowl Data Eng, pp 1–14
Gao S, Pang H, Gallinari P, Guo J, Kato N (2017) A novel embedding method for information diffusion prediction in social network big data. IEEE Trans Industr Inf 13:2097–2105
Goldenberg J, Libai B, Muller E (2001) Talk of the network: a complex systems look at the underlying process of word-of-mouth. Mark Lett 12:211–223
Gómez V, Kaltenbrunner A, López V (2008) Statistical analysis of the social network and discussion threads in Slashdot. In: Proceedings of the 17th international conference on World Wide Web, pp 645–654
Gomez Rodriguez M, Balduzzi D, Schölkopf B (2011) Uncovering the temporal dynamics of diffusion networks. In: Proceedings of the 28th international conference on machine learning (ICML'11), pp 561–568
Goyal A, Bonchi F, Lakshmanan LV (2010) Learning influence probabilities in social networks. In: Proceedings of the third ACM international conference on Web search and data mining, pp 241–250
Granovetter M (1978) Threshold models of collective behavior. Am J Sociol 83:1420–1443
Guille A, Hacid H (2012) A predictive model for the temporal dynamics of information diffusion in online social networks. In: Proceedings of the 21st international conference on World Wide Web, pp 1145–1152
Haldar A, Wang S, Demirci GV, Oakley J, Ferhatosmanoglu H (2023) Temporal cascade model for analyzing spread in evolving networks. ACM Trans Spatial Algor Syst 9(2):1–30
Hogg T, Lerman K (2012) Social dynamics of digg. EPJ Data Sci 1:5
Kempe D, Kleinberg J, Tardos É (2003) Maximizing the spread of influence through a social network. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 137–146
Khomami MMD, Rezvanian A, Bagherpour N, Meybodi MR (2018) Minimum positive influence dominating set and its application in influence maximization: a learning automata approach. Appl Intell 48:570–593
Lamprier S, Bourigault S, Gallinari P (2016) Influence learning for cascade diffusion models: focus on partial orders of infections. Soc Netw Anal Min 6:93
Leskovec J, Mcauley JJ (2012) Learning to discover social circles in ego networks. Adv Neural Inf Process Syst, pp 539–547
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data (TKDD) 1:2
Leskovec J, Backstrom L, Kleinberg J (2009) Meme-tracking and the dynamics of the news cycle. In: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 497–506
Ley M (2002) The DBLP computer science bibliography: evolution, research issues, perspectives. In: International symposium on string processing and information retrieval, pp 1–10
Lin W, Xu Q, Li Y, Xu L (2023) A tensor-based independent cascade model for finding influential links considering the similarity. Chaos Solitons Fract 173:113655
Lin Y, Wang X, Wang L, Wan P (2023) Dynamics modeling and optimal control for multi-information diffusion in Social Internet of Things. Digit Commun Netw. https://doi.org/10.1016/j.dcan.2023.02.014
Mashayekhi Y, Meybodi MR, Rezvanian A (2018) Weighted estimation of information diffusion probabilities for independent cascade model. In: 2018 4th International Conference on Web Research (ICWR), pp 63–69
Najar A, Denoyer L, Gallinari P (2012) Predicting information diffusion on social networks with partial knowledge. In: Proceedings of the 21st international conference on world wide web, pp 1197–1204
Opsahl T, Panzarasa P (2009) Clustering in weighted networks. Social Netw 31:155–163
Patel NA, Nanavati N (2023) A state of the art review on user behavioral issues in online social networks. Recent Adv Comput Sci Commun (Formerly: Recent Patents on Computer Science) 16(2):81–95
Rezvanian A, Vahidipour SM, Meybodi MR (2023) A new stochastic diffusion model for influence maximization in social networks. Sci Rep 13(1):6122
Richardson M, Domingos P (2002) Mining knowledge-sharing sites for viral marketing. In: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp 61–70
Saito K, Nakano R, Kimura M (2008) Prediction of information diffusion probabilities for independent cascade model. In: Knowledge-based intelligent information and engineering systems, pp 67–75
Saito K, Kimura M, Ohara K, Motoda H (2009) Learning continuous-time information diffusion model for social behavioral data analysis. Adv Mach Learn, pp 322–337
Saito K, Kimura M, Ohara K, Motoda H (2010) Selecting information diffusion models over social networks for behavioral analysis. In: Joint European conference on machine learning and knowledge discovery in databases, pp 180–195
Sharma K, Bajaj M (2023) DeepWalk based influence maximization (DWIM): influence maximization using deep learning. Intell Autom Soft Comput 35(1)
Šubelj L, Bajec M (2013) Model of complex networks based on citation dynamics. In: Proceedings of the 22nd international conference on World Wide Web, pp 527–530
Tan W, Blake MB, Saleh I, Dustdar S (2013) Social-network-sourced big data analytics. IEEE Internet Comput 17:62–69
Varshney D, Kumar S, Gupta V (2017) Predicting information diffusion probabilities in social networks: a Bayesian networks based approach. Knowl-Based Syst 133:66–76
Ver Steeg G, Galstyan A (2013) Information-theoretic measures of influence based on content dynamics. In: Proceedings of the sixth ACM international conference on Web search and data mining, pp 3–12
Wang B, Chen G, Fu L, Song L, Wang X (2017) Drimux: dynamic rumor influence minimization with user experience in social networks. IEEE Trans Knowl Data Eng 29:2168–2181
Wang Z, Chen C, Li W (2020) Joint learning of user representation with diffusion sequence and network structure. IEEE Trans Knowl Data Eng 34(3):1275–1287
Wang Z, Zhao J, Xu K (2016) Emotion-based Independent Cascade model for information propagation in online social media. In: 2016 13th international conference on service systems and service management (ICSSSM), pp 1–6
Yang J, Leskovec J (2010) Modeling information diffusion in implicit networks. In: 2010 IEEE 10th international conference on data mining (ICDM), pp 599–608
Yu X, Chu T (2017) Learning the structure of influence diffusion in the independent cascade model. In: 2017 36th Chinese Control Conference (CCC), pp 5647–5651
Funding
No funding.
Author information
Contributions
YM designed, proposed the original idea, developed the code, designed, and performed the simulation experiments. YM, AR and SMV wrote the main manuscript text and guided the research process. All authors planned the work, analyzed the results, and reviewed the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mashayekhi, Y., Rezvanian, A. & Vahidipour, S.M. A novel regularized weighted estimation method for information diffusion prediction in social networks. Appl Netw Sci 8, 81 (2023). https://doi.org/10.1007/s41109-023-00605-z