
Using a novel genetic algorithm to assess peer influence on willingness to use pre-exposure prophylaxis in networks of Black men who have sex with men

Abstract

The DeGroot model for opinion diffusion over social networks dates back to the 1970s and models the mechanism by which information or disinformation spreads through a network, changing the opinions of the agents. Extensive research exists about the behavior of the DeGroot model and its variations over theoretical social networks; however, research on how to estimate parameters of this model using data collected from an observed network diffusion process is much more limited. Existing algorithms require large data sets that are often infeasible to obtain in public health or social science applications. In order to expand the use of opinion diffusion models to these and other applications, we developed a novel genetic algorithm capable of recovering the parameters of a DeGroot opinion diffusion process using small data sets, including those with missing data and more model parameters than observed time steps. We demonstrate the efficacy of the algorithm on simulated data and data from a social network intervention leveraging peer influence to increase willingness to take pre-exposure prophylaxis in an effort to decrease transmission of human immunodeficiency virus among Black men who have sex with men.

Background

The use of interventions for epidemic control requires both an effective intervention and sufficient uptake by the population. In the case of human immunodeficiency virus (HIV), Black men who have sex with men (BMSM) are disproportionately affected by \(\text {HIV}\) infection throughout the United States [1]. Pre-exposure prophylaxis (PrEP) has been shown to reduce lifetime infection risk and increase mean life expectancy; however, \(\text {PrEP}\) uptake is much lower for \(\text {BMSM}\). Negative \(\text {PrEP}\)-related stereotypes are prevalent and awareness of \(\text {PrEP}\) is low among \(\text {BMSM}\), particularly those outside of large cities and among men who have sex with men who are not gay-identified nor easily reached through \(\text {PrEP}\) campaigns directed toward the gay community [1,2,3]. An ongoing study seeks to assess the feasibility of increasing \(\text {PrEP}\) uptake for \(\text {BMSM}\) through the use of a social network intervention: training network leaders to communicate the benefits of \(\text {PrEP}\) within their social networks [4]. In this paper, we analyze data from the pilot study of the intervention and present a methodological innovation needed to facilitate evaluation of the possible population-level impact of the intervention on HIV incidence.

Most \(\text {BMSM}\) are connected with other men who have sex with men (MSM) of color in their personal social and sexual networks. For that reason, it is possible to reach men through their network connections. In addition to serving as a vehicle for reaching high-risk \(\text {BMSM}\) in the community, networks are social environments that can be harnessed for interventions to increase \(\text {PrEP}\) awareness, correct \(\text {PrEP}\) misconceptions, and strengthen norms, attitudes, benefit perceptions, and skills for \(\text {PrEP}\) use. In the \(\text {HIV}\) epidemiology literature, the networks of \(\text {BMSM}\) have too often been studied only as drivers of disease transmission [5]. However, from a strengths-based perspective, social networks also carry positive, adaptive, and protective functions. \(\text {BMSM}\) confront stigma and exclusion due to homophobia in the Black community and racism in predominantly white gay communities, leading some to develop social institutions such as constructed families and house ball communities for support [6,7,8,9,10,11].

\(\text {HIV}\) prevention advice from personally-known and trusted sources is likely to have greater impact than messages from impersonal sources. For that reason, recommendations that come from influential members of one’s close personal social network are especially powerful. Well-liked peers influence the actions and beliefs of their friends. Peers within the close personal networks of \(\text {BMSM}\) provide acceptance, trusted information, and guidance on courses of action, including in matters related to \(\text {HIV}\) prevention [9]. Messages that provide information but also target the recipient’s \(\text {PrEP}\)-related perceived norms, attitudes, intentions, and self-efficacy are likely to have the greatest impact because these theory-based domains influence the adoption of protective actions [12, 13].

The intervention is also grounded in principles of innovation diffusion theory [14]. After recruiting networks of \(\text {BMSM}\) in the community, the intervention involved selection of a cadre of members within each network who were most socially interconnected with others, most trusted for advice, and most open to \(\text {PrEP}\). These network leaders together attended sessions in which they learned about \(\text {PrEP}\) and its benefits and were systematically engaged to talk with friends about these topics, correct misconceptions and counter negative stereotypes about \(\text {PrEP}\), instill interest in \(\text {PrEP}\), and guide interested friends in accessing \(\text {PrEP}\) providers. Thus, the intervention engaged trusted and socially-interconnected network leaders to function as agents who diffuse messages to others.

A preliminary analysis on the pilot study data demonstrates more favorable opinions of \(\text {PrEP}\) after the network leader intervention, across all subjects and for only those subjects who did not attend leadership training [4]. This analysis supports the continuing use of this intervention from an individual risk perspective. The question remains, however, how impactful the intervention might be if implemented at scale, and how the benefits to HIV prevention compare to similarly intensive interventions. To make this assessment, we plan to use an agent-based epidemic model. In order to evaluate the intervention, it is necessary to translate the results from the pilot and ongoing study to network and intervention parameters in the epidemic modeling framework. For this, we need to understand the structure of influence observed in the local networks. Hence, we wish to estimate the parameters of the opinion diffusion process. Since existing methods for fitting appropriate opinion diffusion models vastly exceed the data available in both the pilot and full study, we developed the parameter estimation method presented here.

We first detail classes of models for opinion diffusion, assessing the appropriateness of each method for modeling the diffusion of opinions about \(\text {PrEP}\) through the observed networks and describing our chosen model. We then detail the genetic algorithm we developed to estimate the parameters of the selected opinion diffusion model. Finally, we assess the performance of the algorithm on simulated data and demonstrate its practical application using the pilot study data, discussing behavior of the algorithm and interesting features of the estimated opinion diffusion process.

Opinion diffusion models

A variety of models exist for opinion diffusion: the process through which opinions change and spread through a network. They vary in complexity, underlying assumptions, and the precision or structure of opinions generated. In this section, we outline the main classes of models and justify our choice of the DeGroot model [15] for our intended application.

Modeling considerations

Our primary considerations for selecting a model are the limited number of time steps available, small observed networks, expected features of \(\text {BMSM}\) networks, structure of data collected, and focus on agent-level assessments. The pilot study consisted of observations collected at two time steps (initial opinions and one measurement of opinions in follow-up assessment after the intervention) and the full study will include three time steps (initial opinions and two follow-up assessments). These limited numbers of observations mean an appropriate model will not rely on the system reaching equilibrium. They also limit our ability to estimate a large number of parameters or assess the appropriateness of our selected model using data, informing our preference for simple models that have already been validated on the small networks (\(N=4\) to \(N=12\)) present in the pilot study. Since the networks in the study are themselves clusters from larger networks and contain disinformation, an appropriate model will allow for disinformation under that structure. Because the data consist of Likert-scale measures, the chosen model must make use of the precision available in the data, especially in the absence of more observations. Finally, given our interest in the influence of particular agents within the network, an appropriate model will involve agent-level parameters as opposed to network-level parameters.

Statistical physics

Statistical physics traditionally focuses on modeling the movement of particles but has increasingly been applied to other fields, including opinion diffusion, where agents take the place of particles [16]. Since these models typically involve taking the thermodynamic limit, which in the case of network models means assuming a network with infinitely many agents, statistical physics models are applied to large networks where each agent interacts with a negligible number of agents relative to the size of the network [17, 18]. Even for networks with hundreds or thousands of agents, this assumption is problematic because behaviors of the diffusion process due to finite-size effects are absent from models that use the thermodynamic limit [18]. Given the very small networks included in our data set, models that assume an infinite network are not appropriate. These models also focus on explaining the overall behavior of the diffusion process through the actions of individual agents [16, 17], while our goal is to explain the behavior of individual agents through their interactions. Finally, while other candidate models have been validated using data, model validation is largely absent from the statistical physics literature [17].

\(\text {SIR}\) model

Banerjee, Chandrasekhar, Duflo, and Jackson successfully modeled the diffusion of information about microfinance loans between households using a modification of the Susceptible-Infected-Recovered (SIR) epidemic model in which information about microfinance loans takes on the role of the disease [19]. The model was fit to population-level uptake data and does not incorporate agent-level parameters. Though this method would be appropriate for modeling the impact of the intervention on uptake of \(\text {PrEP}\) over the limited number of time steps collected, it does not allow for an assessment of how opinions about \(\text {PrEP}\) change within the network and would not make full use of the more precise opinion data collected.

Bayesian and naive learning

Unlike \(\text {SIR}\) models, Bayesian learning directly models opinions, rather than uptake; however, these models require distributional assumptions about each agent’s prior belief about the state of the world and the signals—or information—received from other agents, both marginally and conditional on the state of the world. Bayesian learning models also assume agents are able to calculate the likelihood of the signals they receive under their current worldview according to Bayes’ rule [20]. These assumptions are problematic both from a modeling perspective and because they imply an unrealistic level of sophistication in the learning mechanisms of each agent. Additionally, this sophistication makes modeling of disinformation difficult [20, 21].

Two different experiments conducted on networks of seven agents using binary signals compared Bayesian learning to non-Bayesian naive learning, in which agents adopt the majority belief expressed by themselves and their contacts. These experiments demonstrate that naive learning predicts the behaviors of individual agents better than Bayesian learning, especially in highly clustered or insular networks; however, they also indicate that agents behave with more sophistication than is implied by the naive model [21, 22]. Specifically, agents account for dependencies between signals received from agents who are connected to each other, though not to the extent a Bayesian learner would. A slight modification of the naive learning model in which agents can place varying importance on the signals of other agents allows for more sophistication in the learning behavior of agents and can even approximate the behavior of a Bayesian learner, especially when the importance can vary with time [22]. This modification brings us to the DeGroot model for opinion diffusion.

DeGroot model

The DeGroot model is the foundational opinion diffusion model and most influential non-Bayesian model, with the majority of non-Bayesian models being modifications of the DeGroot model [15, 20,21,22]. Under the DeGroot opinion diffusion model, agents update their opinions at each time step to be a weighted average of their own current opinion and the opinions of everyone with whom they interact. This process is described on a network of N agents by

$$\begin{aligned} X (t+1)=WX (t) \end{aligned}$$

where X (t) is a vector of length N with \(x_i (t)\in [0,1]\) representing the opinion of agent i at time t and W is an \(N\times N\) matrix of weights with \(w_{ij}\) representing the weight that agent i places on the opinion of agent j. The elements in the weight matrix W are restricted so that \(0\le w_{ij}\le 1\) and \(\sum _{j=1}^Nw_{ij}=1\).

W is further restricted based on the social network as represented by the adjacency matrix A, in which \(a_{ij}=a_{ji}=1\) if agents i and j are connected in the network and \(a_{ij}=a_{ji}=0\) otherwise. Since agents can only be directly influenced by the opinions of agents with whom they interact, \(w_{ij}=0\) if agents are not connected in the network (\(a_{ij}=0\)). We set \(a_{ii}=1\) to allow agents to update their opinions based on their own current opinions [15]. While we considered extensions of the DeGroot model that include bounded confidence or decaying weight placed on the opinions of others, we lack the time steps to assess whether either extension is appropriate or to estimate the relevant parameters. Based on our intended application, the DeGroot model is the clear choice due to its simplicity, ability to model disinformation, capacity for using precise opinion data, and validation on small networks [21, 22].
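To make the update rule concrete, the following is a minimal Python sketch of the DeGroot process (the study's implementation is in Julia [26]); the toy three-agent network, weight matrix, and initial opinions below are illustrative and not taken from the data.

```python
import numpy as np

# Toy 3-agent network: agents 1 and 3 are not connected, so the
# corresponding weights are fixed at 0. Each row is non-negative and
# sums to 1, as the model requires.
W = np.array([
    [0.6, 0.4, 0.0],
    [0.3, 0.4, 0.3],
    [0.0, 0.5, 0.5],
])
assert np.allclose(W.sum(axis=1), 1.0)  # rows are stochastic

def degroot(W, x0, T):
    """Iterate X(t+1) = W X(t) for T steps from initial opinions x0;
    returns an N x (T+1) matrix with one column per time step."""
    X = [np.asarray(x0, dtype=float)]
    for _ in range(T):
        X.append(W @ X[-1])
    return np.column_stack(X)

X = degroot(W, [0.9, 0.5, 0.1], T=5)
```

Because each update is a convex combination of current opinions, simulated opinions always stay within the range of the initial opinions.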

Methods

Available methods to estimate parameters of a DeGroot model are quite limited. Castro and Shaikh were able to estimate the parameters of a Stochastic Opinion Dynamics Model (SODM)—a variant of the DeGroot model which includes information from external sources and normal measurement error on observed opinions—using data collected from an observed opinion diffusion process over an online social network. While they developed both maximum-likelihood-based and particle-learning-based algorithms for estimating the parameters of the \(\text {SODM}\), these algorithms require more time steps than agents and at least 100 time steps, respectively [23, 24]. Since the requirements for the existing algorithms vastly exceed the available data for the \(\text {PrEP}\) studies and similar opinion diffusion applications, we developed a novel genetic algorithm capable of recovering the parameters of a DeGroot opinion diffusion process—the elements in the weight matrix W—using small data sets, including those with missing data and more model parameters than observed time steps.

Objective function

We propose a squared deviation function, summed across all N agents and T time steps, which measures how closely the predicted opinions match the observed opinions:

$$\begin{aligned} f_C ({\hat{X}},X)=\sum _{i=1}^N\sum _{t=0}^{T-1}\big ({\hat{x}}_i (t)-x_i (t)\big )^2. \end{aligned}$$

Since lower values indicate a better fit, the optimal solution is one that minimizes the value of the objective function. To assess fit at the agent-level, we simply exclude the sum over all agents and instead compute a separate value of the objective function for each agent using

$$\begin{aligned} f_C ({\hat{x}}_i,x_i)=\sum _{t=0}^{T-1}\big ({\hat{x}}_i (t)-x_i (t)\big )^2. \end{aligned}$$
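Assuming opinions are stored as N-by-T NumPy arrays with NaN marking missing observations (our storage convention, not one prescribed by the paper), the objective and its agent-level decomposition can be sketched as:

```python
import numpy as np

def objective(X_hat, X):
    """Total squared deviation over all agents and time steps;
    np.nansum skips entries where the observed opinion is missing."""
    return float(np.nansum((X_hat - X) ** 2))

def objective_by_agent(X_hat, X):
    """Gene-level fit: one squared-deviation total per agent (row)."""
    return np.nansum((X_hat - X) ** 2, axis=1)
```

The total objective is exactly the sum of the per-agent values, which is what allows fitness to be assessed gene by gene later in the algorithm.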

Genetic algorithm

Genetic algorithms, which mimic the natural processes through which the fittest genes survive to subsequent generations, are an ideal choice for fitting a DeGroot model; they have fewer assumptions and a lower chance of becoming stuck in local optima compared to other optimization algorithms [25]. Genetic algorithms consist of a population of chromosomes, or complete solutions to the optimization problem, which are each composed of genes, subsets of the solution consisting of either individual values or collections of values. In each iteration of the algorithm, these parent chromosomes undergo a variety of operators which modify the genes, producing a population of offspring chromosomes that differ from the population of parent chromosomes. This process repeats with the offspring chromosomes becoming the parent chromosomes of the subsequent iteration until some stopping criterion is met [25]. In the case of the simple DeGroot model, a chromosome is defined as the weight matrix W and a gene as a single row of the weight matrix, denoted \(W_i\). This means each gene corresponds to an individual agent and represents the weights that agent places on the opinions of other agents.

We adapt a genetic algorithm developed for generating D-optimal designs for constrained mixture experiments by Limmun, Borkowski, and Boonorm: a useful starting point since both the design matrix for mixture experiments and the weight matrix for a DeGroot model are row-stochastic with the elements in each gene summing to 1 [25]. Though this common constraint on the matrices means the operators developed for mixture experiments are well suited for estimating the parameters of the DeGroot model, there is an important difference in the way the objective functions are used to assess the fitness of chromosomes and the genes within them. Under D-optimality, the fitness of a gene within the design matrix is dependent on the other genes and cannot be assessed separately from the fitness of the entire chromosome. In contrast, since each gene corresponds to an agent, the objective function for a DeGroot opinion diffusion process can be assessed at the gene-level by assessing how well the model predicts the opinions of a particular agent, as demonstrated in the above objective function. We leverage this ability to assess fit on the gene-level by incorporating a gene-swapping process into the algorithm along with the selection, blending, crossover, mutation, and survival operators described below.

Gene-swapping

While the objective function can be assessed at the agent or gene-level, since agents update their opinions based on the opinions of others, the fitness of a gene is dependent on the predicted opinions of the other agents which are, in turn, dependent on the other genes within the chromosome. Consider a case where agent 1 has an unfavorable view of \(\text {PrEP}\) which changes to a favorable view after speaking with their contacts, including agent 2. A gene corresponding to agent 1 which places high weight on agent 2 will fit well if the model predicts a favorable view for agent 2 but fit poorly if the model predicts an unfavorable view. In essence, when presented with a pair of genes corresponding to the same agent between two chromosomes, it is possible to assess which gene better predicts the opinions of the agent within its current chromosome but the fitter gene is not guaranteed to continue to produce better predicted opinions when swapped with a less fit gene in an otherwise fitter chromosome.

For example, consider the following population of chromosomes (B, C, and D) for a network of \(N=3\) agents whose opinions are recorded across \(T=6\) time steps. The value of the objective function for each chromosome, broken down by gene, is given to the right of the chromosome. Genes of interest, along with their contributions to the value of the objective function, are bolded.

$$\begin{aligned} B&=\begin{bmatrix} 0.336 & 0.664 & 0.000\\ \mathbf{0.261} & \mathbf{0.364} & \mathbf{0.375}\\ 0.349 & 0.306 & 0.345 \end{bmatrix}, \quad f({\hat{X}}_B,X)=0.125+\mathbf{0.017}+0.006=0.148 \\ C&=\begin{bmatrix} 0.378 & 0.622 & 0.000\\ 0.509 & 0.324 & 0.167\\ \mathbf{0.441} & \mathbf{0.381} & \mathbf{0.178} \end{bmatrix}, \quad f({\hat{X}}_C,X)=0.093+0.036+\mathbf{0.001}=0.130 \\ D&=\begin{bmatrix} 0.845 & 0.155 & 0.000\\ \mathbf{0.235} & \mathbf{0.427} & \mathbf{0.338}\\ \mathbf{0.349} & \mathbf{0.306} & \mathbf{0.345} \end{bmatrix}, \quad f({\hat{X}}_D,X)=0.007+\mathbf{0.044}+\mathbf{0.005}=0.056 \end{aligned}$$

Compared to chromosome D, the second gene in chromosome B and the third gene in chromosome C produce predicted opinions closer to the observed opinions (objective function contributions of \(0.017<0.044\) and \(0.001<0.005\)); however, when the fitter genes are swapped with the corresponding genes in chromosome D, the value of the objective function for the new chromosome \(D^*\) increases, indicating the swapped genes perform worse than the original genes within chromosome D. This is due to changes in the predicted opinions over time when modified weights are placed on others’ opinions.

$$\begin{aligned} D&=\begin{bmatrix} 0.845 & 0.155 & 0.000\\ \mathbf{0.235} & \mathbf{0.427} & \mathbf{0.338}\\ \mathbf{0.349} & \mathbf{0.306} & \mathbf{0.345} \end{bmatrix}, \quad f({\hat{X}}_D,X)=0.056 \\ D^*&=\begin{bmatrix} 0.845 & 0.155 & 0.000\\ \mathbf{0.261} & \mathbf{0.364} & \mathbf{0.375}\\ \mathbf{0.441} & \mathbf{0.381} & \mathbf{0.178} \end{bmatrix}, \quad f({\hat{X}}_{D^*},X)=0.071 \end{aligned}$$

Within the algorithm, the gene-swapping process behaves in a manner very similar to the above example. The fittest chromosome, which we again call D, is identified and the fitness of each chromosome is assessed at the gene-level. We then compare the fitness of each gene within D to the fitness of the corresponding genes in all other chromosomes. In all cases where a fitter gene is present, the corresponding genes are swapped between D and the chromosomes containing fitter genes, resulting in \(D^*\) and modified versions of all other chromosomes previously containing fitter genes. After all swaps are completed, we compare the fitness of D to the fitness of \(D^*\). All swaps are retained if \(D^*\) is the fitter chromosome and all swaps are rejected, returning all chromosomes to their previous version, if D is the fitter chromosome.
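The swap-then-verify logic can be sketched as follows, assuming each chromosome is an N-by-N NumPy weight matrix and that `fitness_by_gene` and `fitness` evaluate the objective per agent and in total (lower is better). How ties and repeated swaps of the same gene across multiple chromosomes are resolved is a simplification here, not a detail taken from the paper.

```python
import numpy as np

def gene_swap(pop, fitness_by_gene, fitness):
    """One gene-swapping pass over a population of weight matrices."""
    d = min(range(len(pop)), key=lambda k: fitness(pop[k]))  # elite index
    D = pop[d].copy()                     # snapshot of the elite D
    originals = [W.copy() for W in pop]   # kept in case all swaps are rejected
    elite_genes = fitness_by_gene(D)
    for k, W in enumerate(pop):
        if k == d:
            continue
        # swap in any gene that fits better within its own chromosome
        better = fitness_by_gene(W) < elite_genes
        for i in np.where(better)[0]:
            pop[d][i], W[i] = W[i].copy(), pop[d][i].copy()
    # keep the swaps only if the modified elite D* is fitter than D
    if fitness(pop[d]) >= fitness(D):
        for W, orig in zip(pop, originals):
            W[:] = orig
    return pop
```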

Selection

The purpose of the selection operator is to preserve the current optimal solution if the subsequent iteration of the algorithm is unable to find a better one. To this end, we implement selection with elitism, where the best chromosome is identified prior to each iteration of the algorithm. This elite chromosome undergoes gene swapping with non-elite chromosomes and then remains unchanged for the remainder of the current iteration of the algorithm, while the remaining chromosomes undergo the subsequent operators.

Blending

For blending, all non-elite chromosomes are randomly paired. Blending occurs for each pair of genes between paired parent chromosomes independently with probability \(p_b\) and produces new genes which are weighted averages of the chosen gene pairs. Suppose chromosomes B and C are paired and that \(B_i\) and \(C_i\) are the ith genes in chromosomes B and C, respectively. Assuming blending occurs for row i, the offspring genes \(B^*_i\) and \(C^*_i\) are defined as

$$\begin{aligned} B^*_i=\beta B_i+ (1-\beta )C_i\quad \text {and}\quad C^*_i= (1-\beta )B_i+\beta C_i \end{aligned}$$

where \(\beta \sim Unif (0,1)\) is a randomly selected blending factor.
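A sketch of the blending operator for one pair of parent chromosomes, assuming matrices as above:

```python
import numpy as np

rng = np.random.default_rng(7)

def blend(B, C, p_b=0.5):
    """Row-wise blending of paired parents B and C: with probability
    p_b, replace row i of each offspring with a random convex
    combination of the parents' rows."""
    B_new, C_new = B.copy(), C.copy()
    for i in range(B.shape[0]):
        if rng.random() < p_b:
            beta = rng.random()  # blending factor ~ Unif(0, 1)
            B_new[i] = beta * B[i] + (1 - beta) * C[i]
            C_new[i] = (1 - beta) * B[i] + beta * C[i]
    return B_new, C_new
```

Because blending is convex, offspring rows automatically remain non-negative, sum to 1, and keep zeros wherever both parents have zeros, so the network constraints on W are preserved without renormalization.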

Crossover

We use a modified version of the within-parent crossover operator developed by Limmun, Borkowski, and Boonorm [25]. Modifications are required to adequately explore the relatively large parameter space of the DeGroot model, in contrast to the smaller design space in constrained mixture experiments. Within-chromosome crossover occurs with probability \(p_c\) independently for all genes in the non-elite chromosomes. For genes selected for crossover, all non-fixed weights are randomly permuted within the gene.
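The within-chromosome crossover can be sketched as below, assuming `free_mask` is a boolean matrix marking the non-fixed weights (True where \(a_{ij}=1\)):

```python
import numpy as np

rng = np.random.default_rng(3)

def crossover(W, free_mask, p_c=0.5):
    """Within-chromosome crossover: with probability p_c per gene (row),
    randomly permute that row's non-fixed weights among their positions."""
    W = W.copy()
    for i in range(W.shape[0]):
        if rng.random() < p_c:
            idx = np.where(free_mask[i])[0]
            W[i, idx] = W[i, rng.permutation(idx)]
    return W
```

Permuting within a row moves weight between neighbors without changing the row sum, so the stochastic constraint is maintained while distant regions of the parameter space are explored.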

Mutation

Mutation of a gene occurs with probability \(p_m\) independently of all other genes in the non-elite chromosomes. For mutation, a weight w is selected randomly from all non-fixed weights within the gene and a random perturbation \(\varepsilon \sim N (0,\sigma ^2)\) is added to the selected weight to produce \(w^*=w+\varepsilon\). All other non-fixed weights within the row are then scaled by \(\frac{1-w_{fixed}-w^*}{1-w_{fixed}-w}\), where \(w_{fixed}\) is the sum of all fixed weights within the row, so that the row sums to 1. We provide more details on fixed weights below in the Other Features section. To avoid division by zero and maintain weights in the interval [0,1], we implement the following special cases:

  • If \(w^*<0\), \(w^*\) is set to 0

  • If \(w^*>1-w_{fixed}\), \(w^*\) is set to \(1-w_{fixed}\). All other non-zero weights in the row are then set to 0.

  • If the selected weight \(w=1-w_{fixed}\), the excess weight of \(1-w_{fixed}-w^*\) is evenly distributed between all other non-fixed weights within the row.
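A sketch of the mutation step for a single row, assuming `free_mask` marks the non-fixed weights; clipping the perturbed weight to \([0, 1-w_{fixed}]\) implements the first two special cases, and the rescaling keeps the row summing to 1.

```python
import numpy as np

rng = np.random.default_rng(11)

def mutate_row(row, free_mask, sigma=0.1):
    """Perturb one randomly chosen non-fixed weight by N(0, sigma^2)
    noise, clip it to [0, 1 - w_fixed], and rebalance the other
    non-fixed weights so the row still sums to 1."""
    row = row.copy()
    free = np.where(free_mask)[0]
    j = rng.choice(free)
    w_fixed = row[~free_mask].sum()   # total of the fixed weights
    w = row[j]
    w_star = np.clip(w + rng.normal(0.0, sigma), 0.0, 1.0 - w_fixed)
    others = free[free != j]
    slack = 1.0 - w_fixed - w         # weight currently on the other free entries
    if slack > 0:
        # rescale the other free weights to absorb the change in w
        row[others] *= (1.0 - w_fixed - w_star) / slack
    else:
        # the selected weight held all free mass: spread the excess evenly
        row[others] = (1.0 - w_fixed - w_star) / max(len(others), 1)
    row[j] = w_star
    return row
```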

Survival

The survival operator retains only offspring chromosomes that improve on the corresponding parent chromosomes. Applied after blending, crossover, and mutation, it compares each pair of parent and offspring chromosomes and retains the fitter of each pair. The retained chromosomes are then subject to gene swapping with their corresponding rejected chromosomes. The final retained chromosomes become the parent chromosomes for the subsequent iteration of the algorithm.
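As a sketch, with the chromosome representation and fitness function left abstract (lower fitness is better):

```python
def survival(parents, offspring, fitness):
    """Retain the fitter of each parent/offspring pair; return the
    retained chromosomes and the rejected ones (which then undergo
    gene swapping with their retained counterparts)."""
    retained, rejected = [], []
    for P, O in zip(parents, offspring):
        if fitness(O) < fitness(P):
            retained.append(O)
            rejected.append(P)
        else:
            retained.append(P)
            rejected.append(O)
    return retained, rejected
```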

Other features

The fixed values mentioned in the mutation operator allow for specification of elements in the weight matrix W whose values are either known or assumed. Most often, weights are fixed at 0 because agents are not linked in the network (\(w_{ij}\) where \(a_{ij}=0\)).

For early iterations of the algorithm, the purpose of the crossover and mutation operators is to explore the parameter space of the DeGroot model. To this end, we use relatively high values of the probabilities (\(p_c,p_m\)) and of \(\sigma\). In later iterations, the purpose shifts to refining an existing solution. Since the blending operator is better suited to this purpose, we progressively reduce the probabilities of crossover and mutation and the mutation variance \(\sigma ^2\) while increasing the probability of blending (\(p_b\)). These changes are implemented using a multiplicative adjustment each time a specified number of iterations with no improvement is reached (until the probabilities and \(\sigma\) attain specified minimum or maximum values). The hyperparameters of the algorithm used in both the simulation study and data analysis are given in Table 1. The ranges of operator probabilities (\(p_b,p_c,p_m\)) and \(\sigma\) are informed by the work of Limmun, Borkowski, and Boonorm [25] with adjustments to accommodate searching a larger space. The remainder of the parameters were selected based on our experience using the algorithm to heavily favor convergence over computational efficiency with the intention of the values having minimal impact on results.
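The multiplicative schedule can be sketched as below; the decay factor and the floors and cap are placeholders for illustration, not the Table 1 values.

```python
def adjust(p_b, p_c, p_m, sigma, factor=0.9,
           p_b_max=0.9, p_c_min=0.05, p_m_min=0.05, sigma_min=0.01):
    """Each time the stall counter trips, shrink the exploration
    parameters (crossover, mutation, sigma) toward their floors and
    grow the blending probability toward its cap."""
    p_b = min(p_b / factor, p_b_max)   # blending grows
    p_c = max(p_c * factor, p_c_min)   # crossover decays
    p_m = max(p_m * factor, p_m_min)   # mutation decays
    sigma = max(sigma * factor, sigma_min)
    return p_b, p_c, p_m, sigma
```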

Table 1 Simulation study hyperparameters

We also implement a chromosome reintroduction process that serves either to support a prior belief that agents will place high weight on their own opinions or to aid the refinement process. In either case, after a specified number of iterations without improvement, the worst chromosome—as identified by the highest value of the objective function—is removed from the population and replaced with a reintroduced chromosome. To support a prior belief about high self-weight, the reintroduced chromosome has \(1-w_{fixed}\) along the diagonal and zero for all other non-fixed weights. To aid the refinement process, the reintroduced chromosome is a clone of the elite chromosome, which would otherwise be exempted from the remaining operators by the selection operator. We reintroduce the elite chromosome in the simulation study and data analysis for this paper. A chromosome with \(1-w_{fixed}\) on the diagonal is always included in the population of initial chromosomes.

Simulation study

We first demonstrate the performance of the algorithm implemented in Julia [26] on simulated data varying the factors described in Table 2. The values chosen are informed by the intended application of the algorithm to the pilot and full \(\text {PrEP}\) studies as well as possible features of other network studies ill-suited to existing methods: those with few observations per agent on smaller networks. Since the smallest network in the pilot study had four agents, we use \(N=4\) as the smallest network in the simulation study. Though data collection methods that can practically be applied to networks of 50 agents should allow for the collection of a sufficient number of time steps to use alternate algorithms, we include a network of size \(N=50\) to ensure the range of network sizes considered includes all networks where this algorithm is the most practical option.

The numbers of time steps \(T=2\) and \(T=3\) are based on the numbers of time steps in the pilot and full study, respectively. While the motivation for the development of this method is the ability to fit opinion diffusion models using small data sets, the number of time steps available in the \(\text {PrEP}\) studies is extremely small. We include time steps of \(T=6\) and \(T=11\) (initial opinions plus five and ten follow-up assessments, respectively) in order to assess the impact of these extremely small samples on algorithm performance compared to other small samples. Though at most \(T=11\) time steps are ever provided to the algorithm, we simulate data over 21 time steps so that the time steps provided to the algorithm and the time steps withheld serve as pseudo training and testing sets, allowing us to assess the ability of the algorithm to project opinions past the time steps on which data were collected. We use 21 time steps so that the number of time steps over which we extrapolate equals the number of post-initial time steps provided to the algorithm for the largest number of time steps used (\(T=11\)).

Table 2 Simulation study inputs

In each simulation, we generate an Erdős–Rényi random network of size N with connection probability \(p=\frac{d}{N-1}\) based on a target degree of d, excluding mathematically impossible combinations. While the overall structure of the social networks being leveraged is unlikely to be a random network, the use of relatively dense random networks reflects the attempt to sample clusters in the \(\text {BMSM}\) social network for intervention. We enforce reachability between all pairs of nodes by rejecting any generated networks that are not connected. This rejection inflates the average degree beyond the target, with the inflation being worse for smaller target degrees, as seen in Table 3. We address this in our analysis of the simulation study results by using the observed degree in place of the categorical target degree where reasonable. We use the same approach with self-weight, though the mean self-weight within each target category is equal to the target when rounded to two decimal places.

Table 3 Target and mean degrees

We then randomly generate a weight matrix subject to a target self-weight. Each \(w_{ii}\) is drawn from a beta distribution with concentration \(\kappa =\alpha +\beta =10\), with \(\alpha\) and \(\beta\) derived from the concentration and the target mean. Weights are fixed as indicated by the adjacency matrix of the social network so that \(w_{ij}=0\) if \(a_{ij}=0\). The remaining \(1-w_{ii}\) in each row is randomly distributed among the weights not fixed at 0. This weight matrix is then used to simulate the opinion diffusion process according to the DeGroot model over twenty time steps past the initial (a total of 21). Initial opinions are drawn from a Unif (0, 1) distribution. Finally, we create the “observed” data set by restricting to T time steps and randomly removing a specified percentage of observations, ensuring that no initial observations are missing and that at least one other observation is non-missing for each agent. Again, combinations of network size and missing data which are incompatible with the above requirements are excluded. We then use the algorithm to fit the DeGroot model to the simulated data set, repeating the process ten times for each generated data set.
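The weight-matrix generation and DeGroot forward simulation can be sketched as below. Splitting the remaining \(1-w_{ii}\) with a Dirichlet draw is our assumption, since the text only states that the remainder is randomly distributed among the free weights:

```python
import numpy as np

def simulate_degroot(adj, target_self_weight, n_steps=20, kappa=10.0, rng=None):
    """Draw a row-stochastic weight matrix consistent with the adjacency
    matrix, then run the DeGroot update x(t+1) = W x(t)."""
    rng = np.random.default_rng(rng)
    A = np.asarray(adj)
    n = A.shape[0]
    alpha = kappa * target_self_weight          # beta parameters from the
    beta = kappa * (1.0 - target_self_weight)   # concentration and target mean
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = rng.beta(alpha, beta)         # self-weight
        nbrs = np.flatnonzero(A[i])             # weights fixed at 0 elsewhere
        # split the remaining 1 - w_ii randomly among the neighbours
        W[i, nbrs] = (1.0 - W[i, i]) * rng.dirichlet(np.ones(len(nbrs)))
    X = np.empty((n_steps + 1, n))
    X[0] = rng.uniform(0.0, 1.0, size=n)        # initial opinions
    for t in range(n_steps):
        X[t + 1] = W @ X[t]                     # DeGroot update
    return W, X
```

Because each row of W is a convex combination, simulated opinions remain in [0, 1] at every time step.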

Performance is assessed on correct prediction of opinions across the time steps used to fit the weight matrix (T), correct prediction of opinions across the time steps extrapolated past those used to fit the weight matrix (\(21-T\)), and recovery of weights. We use the root-mean-square error (RMSE) for all assessments:

$$\begin{aligned} {RMSE}_{fit}= & {} \sqrt{\frac{\sum _{t=0}^{T-1}\sum _{i=1}^N \big ({\hat{x}}_i (t)-x_i (t)\big )^2}{N (T-1)}}, \\ {RMSE}_{ext}= & {} \sqrt{\frac{\sum _{t=T}^{20}\sum _{i=1}^N \big ({\hat{x}}_i (t)-x_i (t)\big )^2}{N (21-T)}}, \end{aligned}$$

and

$$\begin{aligned} {RMSE}_{rec}=\sqrt{\frac{\sum _{i=1}^N\sum _{j=1}^N (w_{ij}-{\hat{w}}_{ij})^2}{\sum _{i=1}^N\sum _{j=1}^Na_{ij}}}=\sqrt{\frac{\sum _{p=1}^P (w_p-{\hat{w}}_p)^2}{P}}, \end{aligned}$$

where P is the number of elements not fixed at zero in the weight matrix (the number of parameters to be estimated) and \(w_p\) and \({\hat{w}}_p\) are the true and estimated values of the pth such element, respectively.
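Assuming opinions are stored as a \(21\times N\) array (time by agent) and that the P free parameters include the diagonal self-weights, the three measures can be computed as in this sketch:

```python
import numpy as np

def rmse_fit(X_hat, X, T):
    """RMSE over the T fitted time steps (t = 0, ..., T-1). The t = 0
    term is zero when predictions start from the observed initials,
    so the denominator uses T - 1."""
    d = X_hat[:T] - X[:T]
    return np.sqrt((d ** 2).sum() / (X.shape[1] * (T - 1)))

def rmse_ext(X_hat, X, T):
    """RMSE over the 21 - T extrapolated time steps (t = T, ..., 20)."""
    d = X_hat[T:21] - X[T:21]
    return np.sqrt((d ** 2).sum() / (X.shape[1] * (21 - T)))

def rmse_rec(W_hat, W, A):
    """RMSE of weight recovery over the P entries not fixed at zero;
    we assume the diagonal self-weights count among the free entries."""
    free = (np.asarray(A) + np.eye(len(W))) > 0
    d = (W - W_hat)[free]
    return np.sqrt((d ** 2).mean())
```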

Results

Since possible proportions of missing data depend on the number of time steps and possible degrees depend on the network size, we assess the effect of each pair of variables together. We also make use of the mean number of observations per agent, which incorporates information about both missingness and time steps. For each variable or combination of variables, we assess the ability of the algorithm to recover the model parameters and investigate whether high recovery RMSE is the result of the algorithm identifying a solution that fits the data poorly or identifying a solution that fits the data well without recovering the model parameters. We also explore whether fit on the time steps used for estimation is indicative of fit on time steps extrapolated past those used to estimate the weight matrix. Because RMSE is bounded below and many of the plots used show right-skew, we present all summary statistics for RMSE using median and IQR.

Network size and degree

Figure 1 and Table 4 show a clear decrease in variability of the RMSE for weight recovery with increasing degree and network size. They also indicate that recovery tends to improve with increasing degree and size, though the differences are small and the relationship is inconsistent for low-degree networks. As noted previously, the method for generating networks inflates the actual degree beyond the target for low-degree networks. The effect is worse for larger networks, as indicated in Fig. 1 and confirmed by Table 5. The high variability for low-degree networks, combined with the observed degree being inflated beyond the target, may explain the trend in observed medians for networks with a target degree of \(d=2\).

Fig. 1
figure1

RMSE for model parameter recovery by degree and size. RMSE for known and predicted weights by mean degree, target degree, and network size across panels

Table 4 Median recovery RMSE by degree and size
Table 5 Mean degree by size

Initially, both of these relationships seem counter-intuitive since increasing either network size or degree increases the number of model parameters to be estimated. We propose that the observed relationship is due to the dependencies between weights within a row, resulting from the restriction that the weights must sum to one. For a network where agents have a degree of two, there are at most three non-zero weights within each row. If one of these weights differs from the true weight, either one or both of the others must also differ from the true weight, inflating the recovery RMSE. As the degree increases, if one weight differs, it becomes possible for other weights in the row to closely match the truth since the excess weight can be distributed between or taken from the many other weights within the row. For networks with low degree, once a single weight within a row differs, this effect cascades through the rest of the row and produces high recovery RMSE, explaining both the higher median and variability for networks with low degree.
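The forced-error argument can be made concrete with a toy calculation of our own: if one weight in a row is off by \(\delta\), the best the remaining d weights can do is absorb the offset evenly, and that per-row RMSE floor shrinks as the degree grows:

```python
import numpy as np

def min_row_rmse(d, delta=0.1):
    """Smallest achievable RMSE for a row in which one weight is off by
    delta: the sum-to-one constraint forces an offsetting -delta, which
    at best is spread evenly across the other d free weights."""
    errs = np.concatenate([[delta], np.full(d, -delta / d)])
    return np.sqrt((errs ** 2).mean())

print(min_row_rmse(2))    # degree 2: large forced error per weight
print(min_row_rmse(10))   # degree 10: the excess is diluted
```

For a degree-two row the offset must be carried by only two other weights, so a single misestimated weight contaminates the entire row; with ten free weights the same offset is barely visible per weight.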

This explanation does not address whether the algorithm struggles to identify good solutions for networks with low degree since the cascading effect occurs whether or not these weights produce predicted opinions close to those observed. In fact, we would expect model parameter recovery to be easiest for low-degree networks since there are fewer model parameters to recover. This is supported by the presence of near perfect model parameter recovery for small networks with low degree as demonstrated in Fig. 1 and Table 6. Figure 2 confirms that these low-degree networks with high recovery RMSE are not the result of the algorithm failing to identify an adequate solution but are instead the result of solutions that fail to recover the model parameters while still predicting opinions with reasonable accuracy. This is especially true for small networks (which we consider in more detail below).

Table 6 Minimum recovery RMSE by degree and size
Fig. 2
figure2

RMSE for model parameter recovery by fit RMSE, degree, and size. RMSE for known and predicted weights by RMSE for observed and predicted opinions on observed time steps, target degree, and network size across panels

Though the effect of network size is slightly complicated by self-weight—as we will discuss later—we postulate that the decrease in median RMSE and IQR for larger networks is a result of the larger networks mitigating the dependency induced between weights by the degree of each agent. While low degree ensures that a single poorly recovered weight will result in more poorly recovered weights within the row, in larger networks that problematic row has less influence on other rows. This is supported by Fig. 3, which demonstrates that—within a target degree—larger networks result in less variability in model parameter recovery for similar values of fit RMSE. This figure also confirms that the higher median and variability in recovery RMSE for smaller networks is not the result of the algorithm failing to identify a solution that fits the data, since the marginal distribution of fit RMSE is comparable across network sizes.

Fig. 3
figure3

RMSE for model parameter recovery by fit RMSE, size, and degree. RMSE for known and predicted weights by RMSE for observed and predicted opinions on observed time steps, network size, and target degree across panels

Self-weight

Figure 4 and Table 7 demonstrate that lower self-weights are more difficult to recover and produce greater variability in the RMSE. We theorize that this discrepancy is partially explained by the fact that the same target self-weight was used for all agents. Agents with high self-weight necessarily place low weight on the opinions of other agents, so their predicted opinions will remain fairly stable over time and as the opinions of other agents change during estimation. As a result, the fit of a row with high self-weight will be robust to incorrect predicted opinions of other agents, implying robustness to incorrect estimated weights for other agents. The opposite is true for agents with low self-weight: the fit of a row is highly dependent on the estimated weights of other agents and the predicted opinions they generate, making estimation more unstable.

Fig. 4
figure4

RMSE for model parameter recovery by self-weight. RMSE for known and predicted weights by observed mean self-weight and target self-weight

Table 7 Median recovery RMSE by self-weight

However, Fig. 5 demonstrates that networks with high recovery RMSE do not have particularly high fit RMSE, implying that the above explanation alone does not explain the patterns observed. We suggest that the high recovery RMSE for networks with low self-weight is caused by a similar phenomenon as was suggested in our discussion of the effect of degree: the strong dependencies between rows within networks with low self-weight result in a single incorrect weight affecting the rows for agents on whom incorrect weight is placed. Including self-weight in our assessment of size and degree supports this idea as demonstrated by Fig. 6, which shows all of the highest recovery RMSE values are from networks with low self-weight. This is consistent with the hypothesized effects of both degree and self-weight. When a single weight within a row of low degree is incorrect, the other weights in the row must also be incorrect. When agents have low self-weight, these incorrect weights in a single row spread to other rows which must also contain multiple incorrect weights because of the low degree. As the geodesic distance between the agent with incorrect weights and other agents increases, the rows corresponding to those other agents become progressively less influenced by the incorrect weights. This allows size to mitigate the effect of low self-weight as we suggested it did for low degree.

Fig. 5
figure5

RMSE for model parameter recovery by fit RMSE and self-weight. RMSE for known and predicted weights by RMSE for observed and predicted opinions on observed time steps and target self-weight

Fig. 6
figure6

RMSE for model parameter recovery by degree, self-weight, and size. RMSE for known and predicted weights by mean degree, target self-weight, and network size across panels

Missingness and time steps

Figure 7 shows model parameter recovery improves with more time steps of data used for estimation and lower proportions of missing data, though both Fig. 7 and Table 8 indicate diminishing improvement as the number of time steps increases. While Table 8 also demonstrates that variability in RMSE tends to increase with more time steps and higher proportions of missing data, it is unclear if this relationship holds for low proportions of missing data. Figure 8 confirms that the decreased recovery ability for lower numbers of time steps is not the result of poor fit on the time steps used for estimation; instead, the ability of the algorithm to recover parameters improves when more data are available, as is to be expected.

Fig. 7
figure7

RMSE for model parameter recovery by missingness and time steps. RMSE for known and predicted weights by proportion of missing data past initial and number of time steps across panels

Fig. 8
figure8

RMSE for model parameter recovery by fit RMSE, missingness, and time steps. RMSE for known and predicted weights by RMSE for observed and predicted opinions on observed time steps, proportion of missing data, and number of time steps across panels

Table 8 Median recovery RMSE by rate of missingness and time steps

Though the relationships between model parameter recovery and both missingness and time steps are intuitive in that fewer observations result in worse recovery of model parameters, whether the presence of missing data is problematic solely because it reduces the amount of information available for fitting the model is unclear from the information in either Fig. 7 or Table 8. To address this, we present Fig. 9, which demonstrates that, for a given mean number of observations per agent, the algorithm is more accurate and precise when provided with more complete data from fewer time steps (see also Table 9). We suggest this is due to the data being missing at random instead of distributed evenly between agents or across time points, making some observations more valuable than others.

Fig. 9
figure9

Observations by RMSE for model parameter recovery, missingness, and time steps. Mean number of observations per agent by RMSE for known and predicted weights, proportion of missing data past initial, and number of time steps across panels

Table 9 Median recovery RMSE by missingness and observations

Figure 9 confirms that the distribution of missing data between time steps is an issue based on the comparison of networks with two time steps to those with three time steps and 50% missing data. Based on the requirement that all agents have at least one non-missing observation past initial, all agents in networks with three time steps and 50% missing data have two observations, but the second observation could occur at either the first or second time step. While both setups result in the same number of observations per agent, recovery RMSE is higher for the networks with missing data, indicating missing data increase recovery RMSE beyond what would be expected from the decreased number of observations per agent.

As shown in Fig. 10, with the exception of simulations with only two time steps, recovery RMSE roughly increases with fit RMSE, and larger values of both measures tend to come from networks with more missing data. The lack of a relationship for simulations with two time steps indicates that fit on the observed time steps is a poor indicator of parameter recovery when only two time steps are available. While Fig. 8 also indicates the presence of unusually high recovery RMSE relative to fit RMSE for some networks with high missingness, high missingness alone does not explain this. In Fig. 11 we can see that all such networks have either low self-weight, low degree, or both.

Fig. 10
figure10

RMSE for model parameter recovery by fit RMSE, time steps, and missingness. RMSE for known and predicted weights by RMSE for observed and predicted opinions on observed time steps, number of time steps, and proportion of missing data across panels with different x-axis scales

Fig. 11
figure11

RMSE for model parameter recovery by fit RMSE, degree, self-weight, and time steps. RMSE for known and predicted weights by RMSE for observed and predicted opinions on observed time steps, target degree, target self-weight, and number of time steps across panels

We see a similar pattern with fit on the observed time steps and fit on extrapolated time steps where networks with only two time steps behave differently than all others. Figure 12 shows the relationship between fit on the time steps provided to the algorithm (observed time steps) and fit on the time steps not provided to the algorithm (extrapolated time steps). With the exception of simulations where only two time steps are provided to the algorithm, fit on the observed time steps is indicative of fit on extrapolated time steps with the strength of this relationship increasing with an increasing number of time steps provided to the algorithm. Since RMSE for the time steps provided to the algorithm ignores the presence of missing data, Fig. 12 includes clusters with higher \(RMSE_{fit}\) and \(RMSE_{ext}\) for higher proportions of missingness.

Fig. 12
figure12

Extrapolation RMSE by fit RMSE, missingness, and time steps. RMSE for observed and predicted opinions on extrapolated time steps by RMSE for observed and predicted opinions on observed time steps, proportion of missing data, and number of time steps across panels

The collection of points in a vertical line for the simulations with only two time steps demonstrates the lack of a relationship, consistent with worse parameter recovery for fewer time steps even when the estimated parameters fit the data well, as seen in Figs. 8 and 10. Fit on observed time steps is indicative of fit on extrapolated time steps across network size, degree, self-weight, and missingness, as demonstrated by Figs. 13, 14, 15, and 16. The notable feature in all of these plots is the same collection of points in a vertical line discussed above.

Fig. 13
figure13

Extrapolation RMSE by fit RMSE, time steps, and missingness. RMSE for observed and predicted opinions on extrapolated time steps by RMSE for observed and predicted opinions on observed time steps, number of time steps, and proportion of missing data across panels with different x-axis scales

Fig. 14
figure14

Extrapolation RMSE by fit RMSE, degree, and size. RMSE for observed and predicted opinions on extrapolated time steps by RMSE for observed and predicted opinions on observed time steps, target degree, and network size across panels

Fig. 15
figure15

Extrapolation RMSE by fit RMSE, size, and degree. RMSE for observed and predicted opinions on extrapolated time steps by RMSE for observed and predicted opinions on observed time steps, network size, and target degree across panels

Fig. 16
figure16

Extrapolation RMSE by fit RMSE and self-weight. RMSE for observed and predicted opinions on extrapolated time steps by RMSE for observed and predicted opinions on observed time steps and target self-weight

Application: diffusion of willingness to use \(\text {PrEP}\) among \(\text {BMSM}\)

Study overview

The pilot intervention study was conducted in Milwaukee, WI in 2016–2017 with social networks of \(\text {BMSM}\) enrolled in the study. Network leaders (members of each network who were most socially interconnected with others in the same network, as well as those linked by friendship to network members who would not otherwise be reached) were selected to attend a group intervention which met weekly for five weeks, for 2 h per session. Intervention sessions provided \(\text {PrEP}\) education and skills training in how to endorse \(\text {PrEP}\) to friends. All participants (network leaders and other network members) completed assessments at enrollment and three months later. The research was approved by the Medical College of Wisconsin Institutional Review Board (IRB), and written informed consent was provided by all study participants. Further information on procedures is available elsewhere [4].

Recruitment of social networks

Five social networks were enrolled in the study. Recruitment of each network began by identifying, approaching, and recruiting an initial seed in a community venue. Entry criteria for seeds were reporting male sex at birth; describing oneself as African American, Black, or multiracial; being age 18 or older; reporting sex with males in the past year; and not reporting being \(\text {HIV}\) positive. When consented and enrolled in the study, seeds were asked to identify friends who were \(\text {MSM}\). First names were recorded by the study staff, and the seed was provided with study invitation packets to distribute to listed friends. Interested friends of the seed were then enrolled following the same procedures as for the seed, with the same entry criteria except not restricting study eligibility based on serostatus. This first ring of friends surrounding the seed was then asked to identify their own \(\text {MSM}\) friends and to give an invitation packet to each named friend, with enrolled friends constituting the second and final ring extending outward from the seed. The recruited networks of the five seeds had a total of 40 unique members, and networks were composed of between 4 and 12 participating members.

Assessments

Assessment measures were completed using self-administered questionnaires during individual sessions at the time of the baseline and follow-up visits. Key measures for this analysis were \(\text {PrEP}\) self-efficacy and \(\text {PrEP}\) willingness. \(\text {PrEP}\) self-efficacy was assessed with eight items. Each item asked participants to use a 4-point scale to indicate how difficult, from very hard to very easy, it would be to engage in an action (sample item: “How difficult or easy would it be for you to visit a doctor who can provide \(\text {PrEP}\)?”). \(\text {PrEP}\) willingness was assessed with three items. Each item asked participants to indicate their strength of agreement using a 5-point Likert scale (from “strongly disagree” to “strongly agree”; sample item: “I would be willing to go on \(\text {PrEP}\) if I had a casual sex partner who was \(\text {HIV}\)-positive”).

Methods

We selected willingness and self-efficacy for this analysis since increasing willingness to take \(\text {PrEP}\) is a key study aim and previous work has demonstrated a direct association between self-efficacy and \(\text {PrEP}\) use [27]. A single value per agent was created for each measure by summing all the component Likert-scale items. Possible values ranged from 8 to 32 for self-efficacy and 3 to 15 for willingness. Since initial observations for both willingness and self-efficacy were missing for agent 11 in network 2 and agent 4 in network 3, we imputed these values using follow-up data for those agents where available or the median initial response of all other agents.
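A minimal sketch of the scoring and imputation steps described above, with hypothetical helper names and assuming missing questionnaire entries are coded as NaN:

```python
import numpy as np

def composite_scores(items):
    """Sum Likert items into one score per agent; rows are agents and
    columns are the component items of a single measure."""
    return np.asarray(items, dtype=float).sum(axis=1)

def impute_initial(x0, x_followup):
    """Fill missing baseline scores from the agent's follow-up score
    where available, otherwise from the median baseline of the agents
    who did respond."""
    x0 = np.asarray(x0, dtype=float).copy()
    x_followup = np.asarray(x_followup, dtype=float)
    fallback = np.nanmedian(x0)
    for i in np.flatnonzero(np.isnan(x0)):
        x0[i] = x_followup[i] if not np.isnan(x_followup[i]) else fallback
    return x0
```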

As the DeGroot model uses continuous data on the interval [0, 1], both scale measures were converted to continuous for model fitting and back-transformed for evaluation of model fit using the following processes:

Forward transformation

  1. Begin with data on an n-point composite scale.

  2. Divide the interval [0, 1] into n sub-intervals of equal width.

  3. An opinion of x on the composite scale takes on the middle value, y, in the xth sub-interval on the continuous scale.

Back transformation

  1. Begin with data on a continuous [0, 1] interval to be converted to an n-point composite scale.

  2. Multiply the continuous opinion y by n.

  3. Round the multiplied continuous opinion to an integer to produce an opinion on the composite scale.

The transformation to a continuous scale would allow us to use the objective function used for the simulation study,

$$\begin{aligned} f_C ({\hat{X}},X)=\sum _{i=1}^N\sum _{t=0}^{T-1}\big ({\hat{x}}_i (t)-x_i (t)\big )^2, \end{aligned}$$

but the optimal solution in the case of Likert-scale data is the one that best predicts observed opinions on the original scale. Specifically, predicted opinions that differ from the observed opinions on the continuous scale should only be penalized if they also differ on the original scale after the rounding step in the back transformation. We accomplish this through the use of the objective function

$$\begin{aligned} f_{L} ({\hat{X}},X)=\sum _{i=1}^N\sum _{t=0}^{T-1}B\big ({\hat{x}}_i (t),x_i (t)\big )\big |{\hat{x}}_i (t)-x_i (t)\big | \end{aligned}$$

where \(B ({\hat{x}}_i (t),x_i (t))\) measures the absolute deviation between the observed and predicted opinions on the Likert scale. We refer to predicted opinions where \(B ({\hat{x}}_i (t),x_i (t))=0\) as being in the correct bin or as correctly predicted opinions. The inclusion of the absolute deviation on the continuous scale serves to penalize only estimates outside of the correct bin. Since the absolute deviation on the Likert scale, as measured by \(B ({\hat{x}}_i (t),x_i (t))\), already imposes a second penalty on predicted and observed opinions that differ greatly, we do not square the absolute deviation on the continuous scale as in the continuous version of the objective function. The fit can be assessed on a row or agent level by summing only across time steps to obtain an agent-level value of the objective function using

$$\begin{aligned} f_C ({\hat{x}}_i,x_i)=\sum _{t=0}^{T-1}\big ({\hat{x}}_i (t)-x_i (t)\big )^2. \end{aligned}$$
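A sketch of the Likert objective \(f_L\); skipping missing observations (NaN) and the round-half-up back transformation are our assumptions:

```python
import math
import numpy as np

def bin_deviation(y_hat, y, n):
    """B(x-hat, x): absolute deviation between predicted and observed
    opinions after back-transformation to the n-point scale."""
    back = lambda v: math.floor(v * n + 0.5)   # round half up
    return abs(back(y_hat) - back(y))

def objective_likert(X_hat, X, n):
    """f_L: continuous absolute deviations, counted only where the
    prediction lands in the wrong bin; NaN observations contribute
    nothing to the sum."""
    total = 0.0
    for y_hat, y in zip(np.ravel(X_hat), np.ravel(X)):
        if np.isnan(y):
            continue
        total += bin_deviation(y_hat, y, n) * abs(y_hat - y)
    return total
```

A prediction in the correct bin has \(B=0\) and so adds nothing, matching the stated goal of penalizing only deviations that survive the rounding step.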

While there are only two observed time steps, the follow-up assessment was conducted three months after the initial assessment, meaning agents likely engaged in multiple interactions between the initial and follow-up assessments. To address this, we define a time step to be one month and treat time steps \(t=1\) and \(t=2\) as missing for all agents. This approach allows information and opinions shared by the network seed to spread to their friends, who can then share the information with their own friends, instead of assuming information from the seed has not yet reached the second ring of recruitment.

Since the recruitment process does not result in complete information about connections within the network, we construct two different adjacency matrices for each network: one where we begin with a matrix of zeros and add only connections we can be certain exist (denoted Build in reporting of results) and another where we begin with a matrix of ones and remove any connections we are certain do not exist (denoted Remove in reporting of results).
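The two constructions can be sketched as follows, assuming friendships are undirected so both adjacency matrices are kept symmetric (function names are ours):

```python
import numpy as np

def build_matrix(n, known_ties):
    """Build: start from all zeros and add only ties known to exist."""
    A = np.zeros((n, n), dtype=int)
    for i, j in known_ties:
        A[i, j] = A[j, i] = 1
    return A

def remove_matrix(n, known_non_ties):
    """Remove: start from all ones (minus the diagonal) and delete
    ties known not to exist."""
    A = np.ones((n, n), dtype=int) - np.eye(n, dtype=int)
    for i, j in known_non_ties:
        A[i, j] = A[j, i] = 0
    return A
```

The build matrix bounds the network from below and the remove matrix from above, so the two fits bracket the unknown true connectivity.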

For all four combinations of outcome measure and adjacency matrix construction, we run the algorithm ten times. Though mathematically derived error estimates for either parameter estimates or predicted opinions are impractical, conducting multiple runs of the algorithm allows us to generate estimates of algorithmic variability for both. While we do not present point estimates for opinions as part of our results, we include standard deviations for all estimated weights. In order to assess algorithmic variability and predictive performance on a network level, we use modified forms of the RMSE measures used for the simulation study adapted to account for the unknown model parameter values and scale data:

$$\begin{aligned} RMSE_{fit}=\sqrt{\frac{\sum _{i=1}^N\big (B\big ({\hat{x}}_i (3),x_i (3)\big )\big )^2}{CN (T-1)}} \end{aligned}$$

and

$$\begin{aligned} RMSE_{alg}=\sqrt{\frac{\sum _{i=1}^N\sum _{j=1}^N\sum _{r=1}^{10} ({\bar{w}}_{ij}-{\hat{w}}_{ij,r})^2}{10\sum _{i=1}^N\sum _{j=1}^Na_{ij}}}=\sqrt{\frac{\sum _{p=1}^P\sum _{r=1}^{10} ({\bar{w}}_p-{\hat{w}}_{p,r})^2}{10P}} \end{aligned}$$

where C represents the number of possible bins, included to allow comparison of variability in estimated weights between the willingness and self-efficacy measures; \(\bar{w}_{ij}\) is the mean estimated weight agent i places on the opinion of agent j across runs; and \(\bar{w}_p\) and \({\hat{w}}_{p,r}\) are the mean and run-r estimates of the pth weight not fixed at zero.
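Assuming the estimated weight matrices from the ten runs are stacked into a three-dimensional array and that the diagonal self-weights count among the P free parameters, \(RMSE_{alg}\) can be computed as in this sketch:

```python
import numpy as np

def rmse_alg(W_runs, A):
    """Algorithmic variability: RMSE of the R runs' estimated weights
    around their run-wise mean, over the P entries not fixed at zero."""
    W_runs = np.asarray(W_runs)                 # shape (R, N, N)
    A = np.asarray(A)
    free = (A + np.eye(len(A))) > 0             # diagonal is free
    W_bar = W_runs.mean(axis=0)                 # mean weight per entry
    d = (W_runs - W_bar)[:, free]               # shape (R, P)
    return np.sqrt((d ** 2).mean())
```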

Results

To provide context for the following results, we include Table 10, which identifies both the seed and leader(s) in each social network. Note that some networks had a higher density of network leaders (range: one-eighth to one-third of network members).

Table 10 Network leaders and seeds

Accuracy of predicted opinions

We present Fig. 17, which demonstrates that the models are able to predict opinions with reasonable accuracy, while highlighting some limitations. It is worth noting that, while deviation between observed and predicted opinions is measured in number of bins (units on the composite scale) for both willingness and self-efficacy, the size of a bin is not the same between measures: a deviation of one bin is a larger difference for willingness than for self-efficacy.

Fig. 17
figure17

Difference between observed and predicted opinions by adjacency matrix and measure. Difference between observed and predicted opinions measured in number of bins by adjacency matrix construction method and measure across all networks and runs of the algorithm

The presence of the 10 observations that are seven bins away from the observed opinions using the build matrix for self-efficacy is particularly interesting. These observations are all from agent 5 in network 1. As can be seen in Table 11, agent 5 has only one connection in the build matrix, to agent 2. Since agent 2 has a lower self-efficacy score at both observed time steps than agent 5, their connection to agent 2 is unable to explain the improvement in self-efficacy at followup. In contrast, the model using the remove matrix has no deviations exceeding four bins. We see evidence of this effect across both measures with improved predicted opinions for the remove matrix. Table 11 also shows that agent 5 is allowed to update their opinion based on more agents within the network, explaining the improvement in predicted opinions. This is not to say that the remove matrix is a more accurate representation of the underlying network, only that information sources are missing from the build matrix. This could be a missing connection that is present in the remove matrix but it could also be a connection to an individual not included in the network or an external source such as social media, advertisements, or individual research.

Table 11 Estimated weights for willingness on network 1

Weight estimates and variability

We present Tables 11, 12, 13, 14, 15, 16, 17 to summarize the estimated weights for all five networks across ten runs of the algorithm for both willingness and self-efficacy on adjacency matrices built from known ties or created by removing connections known not to exist. Elements fixed at 0 according to the adjacency matrix are indicated with a 0 without a standard deviation. Weights placed on leaders are indicated with an asterisk (*), and weights that are structurally zero in the build matrix are italicized in the remove matrix. As was discussed previously, there are notable differences in Table 11 for agent 5 between the build and remove matrices, with estimated weights between 0.12 and 0.21 placed on agents whose weights were fixed at 0 for the build matrix. We also see evidence that the algorithm is able to identify connections present in the adjacency matrix that may not be present in the network, in the form of zero or nearly zero estimated weights for both build and remove adjacency matrices.

Table 12 Estimated weights for self-efficacy on network 1
Table 13 Estimated weights for willingness on network 2
Table 14 Estimated weights for self-efficacy on network 2
Table 15 Estimated weights for willingness and self-efficacy on network 3
Table 16 Estimated weights for willingness and self-efficacy on network 4
Table 17 Estimated weights for willingness and self-efficacy on network 5

While Tables 11, 12, 13, 14, 15, 16, 17 do provide information about variability between runs of the algorithm, Table 18 presents this information in condensed form. It shows that algorithmic variability is much lower for self-efficacy than for willingness. Within measures, algorithmic variability is higher on the remove matrix for willingness and higher on the build matrix for self-efficacy. The assessment of how well the predicted opinions fit the data confirms the relationship seen in Fig. 17 with models on the remove matrix tending to produce predicted opinions closer to those observed. Again, this supports the idea that information is missing from the build matrices though the additional information in the remove matrices is not necessarily correct.

Table 18 Algorithmic variability and fit RMSE by network, measure, and adjacency matrix

Comparison of leaders to other agents

In order to assess whether leaders are better able to influence their friends, we calculate the mean weight placed on leaders versus non-leaders for each network, adjacency matrix, and measure combination. We also determine, out of all possible non-zero weights placed on leaders or non-leaders, both the number and proportion that exceed 0.005 for all combinations. All of these measures exclude agents not connected to a leader. We also include mean self-weight for leaders and non-leaders for all agents within the networks. Table 19 shows these summary statistics for willingness.

Table 19 Mean self-weight, mean weight, proportion, and number by network, adjacency matrix and leadership for willingness

For willingness, the mean estimated weight for leaders is higher for both adjacency matrices on networks 1, 2, 3, and 5; on network 4, it is higher for non-leaders across both adjacency matrices. There is no clear pattern in the proportion of practically non-zero weights (weights greater than 0.005) between leaders and non-leaders. We note that the uncertainty in the adjacency matrices could obscure any relationship between leadership training and the proportion of possible non-zero weights that are present. Network 4 is also the only network where the seed did not attend leadership training, providing a possible explanation for the inefficacy of the leader in that network. For both measures, mean self-weight tends to be higher for the build matrix than for the remove matrix, indicating the algorithm identifies a solution where weight is distributed over either more or different agents when available.

Table 20 presents the same statistics for self-efficacy. For self-efficacy, whether higher weight was placed on leaders or non-leaders is consistent between the build and remove matrices only for networks 1 and 5, with higher weight placed on non-leaders in both cases. With network 4 again being a notable exception, both the weights and the differences between them tend to be small relative to those estimated for willingness. This necessarily results in higher estimated self-weights than for willingness, indicating agents were more open to changing their willingness to use \(\text {PrEP}\) than their beliefs about the difficulty of engaging in \(\text {PrEP}\) behaviors. Again, there is no clear pattern in the proportion of practically non-zero weights between leaders and non-leaders, and self-weight tends to be higher for the build matrix.

Table 20 Mean self-weight, mean weight, proportion, and number by network, adjacency matrix and leadership for self-efficacy

Conclusions

In order to expand the use of opinion diffusion models to public health and social science applications, we developed a novel genetic algorithm capable of recovering the parameters of a DeGroot opinion diffusion process using small data sets, including those with missing data and more model parameters than observed time steps. We assessed the efficacy of the algorithm on simulated data, considering a variety of features of the networks and data sets where this method could reasonably be used. We also demonstrated the performance of the algorithm on the \(\text {PrEP}\) pilot study data, producing estimated weights that result in predicted opinions close to those observed. This serves as a first step in the development of an epidemic model informed by the opinion diffusion process to assess the network leader intervention, an option not previously available with the limited size of the data set.

Simulation study

The simulation study demonstrates the algorithm is able to recover the model parameters of the opinion diffusion process and correctly predict opinions, though the accuracy of both types of estimates depends on degree, network size, self-weight, number of time steps observed, and proportion of missing data. Small, low-degree networks are the only networks capable of nearly perfect model parameter recovery but can also produce inaccurate model parameter estimates even when the solution fits the observed opinions well. Increased network size mitigates this issue, decreasing both the median and IQR of the recovery RMSE at the expense of increasing the minimum recovery RMSE. Networks composed of agents who place little weight on their own opinions also result in poorer recovery of weights, even when predicted opinions closely match the observed ones. Low degree may exacerbate this problem, but, as with low degree, larger network size mitigates the effect of low self-weight. Both lower proportions of missing data and more time steps improve recovery by increasing the number of observations per agent. However, applying the algorithm to data with a few completely observed time points will result in better parameter recovery than an application with a comparable number of observed time points per agent resulting from more follow-up time points but significant missing data.
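The two notions of fit discussed above can be made concrete. The following Python sketch computes a recovery RMSE over the weight matrix and a prediction RMSE over opinions iterated through the DeGroot update \(x(t+1)=Wx(t)\). The exact RMSE definitions used in the paper appear in its methods sections, so these elementwise versions are assumptions, and both function names are hypothetical:

```python
import numpy as np

def recovery_rmse(W_true, W_est):
    """Elementwise RMSE between the true and estimated weight matrices
    (an assumed form of the paper's recovery RMSE)."""
    W_true = np.asarray(W_true, dtype=float)
    W_est = np.asarray(W_est, dtype=float)
    return float(np.sqrt(np.mean((W_true - W_est) ** 2)))

def prediction_rmse(X_obs, W_est, x0):
    """RMSE between observed opinions and opinions predicted by iterating
    the DeGroot update x(t+1) = W x(t) from the initial opinions x0.

    X_obs is a (T+1) x n array of observed opinion vectors, one row per
    time step; missing-data handling is omitted for simplicity.
    """
    X_obs = np.asarray(X_obs, dtype=float)
    W_est = np.asarray(W_est, dtype=float)
    preds = [np.asarray(x0, dtype=float)]
    for _ in range(X_obs.shape[0] - 1):
        preds.append(W_est @ preds[-1])  # one DeGroot updating step
    return float(np.sqrt(np.mean((X_obs - np.array(preds)) ** 2)))
```

Low prediction RMSE with high recovery RMSE is exactly the failure mode noted for small, low-degree networks: the solution fits the observed opinions well while the weights themselves are wrong.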

Opinion diffusion modeling

Analysis of the estimated weights demonstrates that agents generally place more weight on the opinions of leaders than non-leaders when updating their willingness to use \(\text {PrEP}\), though evidence is mixed for self-efficacy. The results also suggest the network leader intervention is more effective for changing willingness than self-efficacy. While these results agree with previous research and indicate the use of opinion leaders may be an effective intervention for increasing \(\text {PrEP}\) usage to reduce transmission of \(\text {HIV}\), this is not the intended use of the estimated weights. Instead, the application of this algorithm to the pilot study data is the first step in a detailed investigation of the opinion diffusion process underlying the use of network leaders as an intervention to increase uptake of \(\text {PrEP}\) for \(\text {BMSM}\).

Though other methods can be used to assess the effectiveness of the intervention for each agent, this method allows for an exploration of why the intervention was or was not effective for each agent in the network. Since the model includes an estimate of the weight each agent places on the opinions of all other agents, we are able to identify agents who were particularly receptive or resistant to the opinions of the network leaders. Identifying demographic or relational differences that explain the varying receptivity of agents within the network allows for better prediction of the change in opinions about \(\text {PrEP}\) that could be expected when applying the network leader intervention to other networks.

While only having two time steps is a limitation that results in higher variability in model parameter estimates, both the simulation study and the application of the algorithm to the pilot study data show the algorithm can be used to estimate the model parameters of the diffusion process using very few time points. In addition, the simulation study demonstrates a substantial improvement in the performance of the algorithm on three time steps as compared to two. This suggests the algorithm will be able to produce estimates that are more precise and accurate using the three time steps included in the full study. These estimates can then be used to inform parameters of an epidemic model for evaluating the use of a network leader intervention as a means to reduce incidence of \(\text {HIV}\) for \(\text {BMSM}\).

Availability of data and materials

The data generated and analysed during the simulation study are available in the corresponding author’s GitHub repository, https://github.com/karajohnson4/DeGrootGeneticAlgorithm. The data from the pilot study are available from Jeffrey A. Kelly, PhD (cairdirector@mcw.edu), but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. Data are, however, available from the authors upon reasonable request and with the permission of Jeffrey A. Kelly (cairdirector@mcw.edu). The genetic algorithm code is also available in the corresponding author’s GitHub repository, https://github.com/karajohnson4/DeGrootGeneticAlgorithm, under the name Algorithm-Code; the ANS-Archive branch will serve as an archived version. The code is written in Julia, is platform independent, requires Julia 1.5 or higher, and is released under the GNU General Public License [26]. Analysis was performed in R using RStudio, with data imported using haven and restructured using tidyr [28,29,30,31]. Plots were generated with ggplot2 [32].

Notes

  1. Naive learning can be expressed as a DeGroot model with \(w_{ij}=\frac{1}{N_i+1}\) for \(a_{ij}\ne 0\), where \(w_{ij}\) and \(a_{ij}\) are as defined in the DeGroot Model subsection below and \(N_i\) is the number of agents in the neighborhood of agent i.

  2. Within this paper, the observed and predicted initial opinions are equal, meaning that the contribution to the value of the objective function and other fit assessments from opinions at time \(t=0\) is always 0. For use of extensions to this method when observed and predicted initial opinions differ, we present all relevant equations in the form that incorporates the deviation between observed and predicted initial opinions.
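The naive-learning weight matrix described in Note 1 can be constructed directly from an adjacency matrix. The following Python sketch is illustrative (the paper's implementation is in Julia, and the function name is hypothetical); it assumes, per the note, that the neighborhood \(N_i\) excludes agent i itself:

```python
import numpy as np

def naive_learning_weights(A):
    """Build the naive-learning DeGroot weight matrix from adjacency A.

    Per Note 1: w_ij = 1/(N_i + 1) for each neighbor j (a_ij != 0) and for
    agent i itself, where N_i is the size of agent i's neighborhood.
    """
    A = np.asarray(A) != 0
    np.fill_diagonal(A, False)        # the neighborhood excludes the agent itself
    N = A.sum(axis=1)                 # N_i: number of neighbors of agent i
    W = A / (N + 1)[:, None]          # weight 1/(N_i + 1) on each neighbor
    np.fill_diagonal(W, 1 / (N + 1))  # the same weight on the agent's own opinion
    return W
```

Each row of the resulting matrix sums to 1, as required for a DeGroot weight matrix, with every agent splitting weight equally among itself and its neighbors.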

Abbreviations

BMSM:

Black men who have sex with men

HIV:

human immunodeficiency virus

IRB:

Institutional Review Board

MSM:

men who have sex with men

PrEP:

pre-exposure prophylaxis

RMSE:

root-mean-square error

SIR:

susceptible-infected-recovered

SODM:

stochastic opinion dynamics model

References

  1. Schneider JA, Bouris A, Smith DK (2015) Race and the public health impact potential of pre-exposure prophylaxis in the United States. JAIDS J Acquir Immune Defic Syndr 70(1):30–32

  2. Paltiel AD, Freedberg KA, Scott CA, Schackman BR, Losina E, Wang B, Seage GR, Sloan CE, Sax PE, Walensky RP (2009) HIV preexposure prophylaxis in the United States: impact on lifetime infection risk, clinical outcomes, and cost-effectiveness. Clin Infect Dis 48(6):806–815

  3. Golub SA, Gamarel KE, Surace A (2017) Demographic differences in PrEP-related stereotypes: implications for implementation. AIDS Behav 21(5):1229–1235

  4. Kelly JA, Amirkhanian YA, Walsh JL, Brown KD, Quinn KG, Petroll AE, Pearson BM, Rosado AN, Ertl T (2020) Social network intervention to increase pre-exposure prophylaxis (PrEP) awareness, interest, and use among African American men who have sex with men. AIDS Care 32(sup2):40–46

  5. Hernández-Romieu AC, Sullivan PS, Rothenberg R, Grey J, Luisi N, Kelley CF, Rosenberg ES (2015) Heterogeneity of HIV prevalence among the sexual networks of Black and White MSM in Atlanta: illuminating a mechanism for increased HIV risk for young Black MSM. Sex Transm Dis 42(9):505

  6. Garcia J, Colson PW, Parker C, Hirsch JS (2015) Passing the baton: community-based ethnography to design a randomized clinical trial on the effectiveness of oral pre-exposure prophylaxis for HIV prevention among black men who have sex with men. Contemp Clin Trials 45:244–251

  7. Quinn K, Dickson-Gomez J, Kelly JA (2016) The role of the Black Church in the lives of young Black men who have sex with men. Cult Health Sex 18(5):524–537

  8. Quinn K, Dickson-Gomez J (2016) Homonegativity, religiosity, and the intersecting identities of young black men who have sex with men. AIDS Behav 20(1):51–64

  9. Dickson-Gomez J, Owczarzak J, Lawrence JS, Sitzler C, Quinn K, Pearson B, Kelly JA, Amirkhanian YA (2014) Beyond the ball: implications for HIV risk and prevention among the constructed families of African American men who have sex with men. AIDS Behav 18(11):2156–2168

  10. Kipke MD, Kubicek K, Supan J, Weiss G, Schrager S (2013) Laying the groundwork for an HIV prevention intervention: a descriptive profile of the Los Angeles House and Ball communities. AIDS Behav 17(3):1068–1081

  11. Phillips G, Peterson J, Binson D, Hidalgo J, Magnus M, YMSM of color SPNS Initiative Study Group et al (2011) House/ball culture and adolescent African-American transgender persons and men who have sex with men: a synthesis of the literature. AIDS Care 23(4):515–520

  12. Bandura A (1986) Social foundations of thought and action. Prentice-Hall, Englewood Cliffs, NJ, pp 23–28

  13. Fishbein M, Ajzen I (1977) Belief, attitude, intention, and behavior: an introduction to theory and research

  14. Rogers EM (1983) Diffusion of innovations, 3rd edn. Free Press, New York, p 247

  15. DeGroot MH (1974) Reaching a consensus. J Am Stat Assoc 69(345):118–121

  16. Castellano C, Fortunato S, Loreto V (2009) Statistical physics of social dynamics. Rev Mod Phys 81(2):591

  17. Sîrbu A, Loreto V, Servedio VD, Tria F (2017) Opinion dynamics: models, extensions and external effects. In: Participatory sensing, opinions and collective awareness. Springer, Berlin, pp 363–401

  18. Toral R, Tessone CJ (2006) Finite size effects in the dynamics of opinion formation. arXiv preprint physics/0607252

  19. Banerjee A, Chandrasekhar AG, Duflo E, Jackson MO (2013) The diffusion of microfinance. Science 341(6144)

  20. Acemoglu D, Ozdaglar A (2011) Opinion dynamics and learning in social networks. Dyn Games Appl 1(1):3–49

  21. Chandrasekhar AG, Larreguy H, Xandri JP (2020) Testing models of social learning on networks: evidence from two experiments. Econometrica 88(1):1–32

  22. Grimm V, Mengel F (2020) Experiments on belief formation in networks. J Eur Econ Assoc 18(1):49–82

  23. Castro LE, Shaikh NI (2018) Influence estimation and opinion-tracking over online social networks. Int J Bus Anal (IJBAN) 5(4):24–42

  24. Castro LE, Shaikh NI (2018) A particle-learning-based approach to estimate the influence matrix of online social networks. Comput Stat Data Anal 126:1–18

  25. Limmun W, Borkowski JJ, Chomtee B (2013) Using a genetic algorithm to generate D-optimal designs for mixture experiments. Qual Reliab Eng Int 29(7):1055–1068

  26. Bezanson J, Edelman A, Karpinski S, Shah VB (2017) Julia: a fresh approach to numerical computing. SIAM Rev 59(1):65–98

  27. Walsh JL (2019) Applying the information-motivation-behavioral skills model to understand PrEP intentions and use among men who have sex with men. AIDS Behav 23(7):1904–1916

  28. R Core Team (2020) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  29. RStudio Team (2020) RStudio: integrated development environment for R. RStudio, PBC, Boston, MA. http://www.rstudio.com/

  30. Wickham H, Miller E (2020) haven: import and export 'SPSS', 'Stata' and 'SAS' files. R package version 2.3.1. https://CRAN.R-project.org/package=haven

  31. Wickham H (2020) tidyr: tidy messy data. R package version 1.1.2. https://CRAN.R-project.org/package=tidyr

  32. Wickham H (2016) ggplot2: elegant graphics for data analysis. Springer, New York. https://ggplot2.tidyverse.org

Acknowledgements

We would like to thank Jeffrey A. Kelly, PhD for allowing use of the PrEP pilot study data.

Funding

This work was partially funded by NIH Grants R01AI147441 and R01NR017574.

Author information

Affiliations

Authors

Contributions

KLJ developed and coded the algorithm, conducted the simulation study and analysis, and wrote the relevant sections. JLW developed and validated the scales used to assess willingness and self-efficacy and contributed to the relevant sections. YAA developed and implemented the recruitment of social networks and contributed to the relevant background material. JJB contributed to the development of the algorithm. NBC oversaw the selection of a model, development of the algorithm, and writing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kara Layne Johnson.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Johnson, K.L., Walsh, J.L., Amirkhanian, Y.A. et al. Using a novel genetic algorithm to assess peer influence on willingness to use pre-exposure prophylaxis in networks of Black men who have sex with men. Appl Netw Sci 6, 22 (2021). https://doi.org/10.1007/s41109-020-00347-2

Keywords

  • Intervention uptake
  • Opinion diffusion
  • Parameter estimation
  • DeGroot model
  • Genetic algorithm
  • Pre-exposure prophylaxis (PrEP)