Generating a heterosexual bipartite network embedded in social network

We describe an approach to generate a heterosexual network with a prescribed joint-degree distribution embedded in a prescribed large-scale social contact network. The structure of a sexual network plays an important role in how all sexually transmitted infections (STIs) spread. Generating an ensemble of networks that mimics the real-world is crucial to evaluating robust mitigation strategies for controlling STIs. Most of the current algorithms to generate sexual networks only use sexual activity data, such as the number of partners per month, to generate the sexual network. Real-world sexual networks also depend on biased mixing based on age, location, and social and work activities. We describe an approach to use a broad range of social activity data to generate possible heterosexual networks. We start with a large-scale simulation of thousands of people in a city as they go through their daily activities, including work, school, shopping, and activities at home. We extract a social network from these activities where the nodes are the people, and the edges indicate a social interaction, such as working in the same location. This social network captures the correlations between people of different ages, living in different locations, their economic status, and other demographic factors. We use the social contact network to define a bipartite heterosexual network that is embedded within an extended social network. The resulting sexual network captures the biased mixing inherent in the social network, and models based on this pairing of networks can be used to investigate novel intervention strategies based on the social contacts among infected people. We illustrate the approach in a model for the spread of chlamydia in the heterosexual network representing the young sexually active community in New Orleans.

only use sexual activity data, such as the number of partners per month, to generate the sexual network.Real-world sexual networks also depend on biased mixing based on age, location, and social and work activities.We describe an approach to use a broad range of social activity data to generate possible heterosexual networks.We start with a large-scale simulation of thousands of people in a city as they go through their daily activities, including work, school, shopping, and activities at home.We extract a social network from these activities where the nodes are the people and the edges indicate a social interaction, such as working in the same location.This social network captures the correlations between people of different ages, living in different locations, their economic sta-

Introduction
The structure of heterosexual networks plays an important role in the spread of sexually transmitted infections (STIs).These networks are captured in computer simulations by a bipartite graph where the nodes represent the people and the edges are sexual partnerships between nodes of different sexes.De-termining what is predictable in STI models requires an algorithm to generate an ensemble of random graphs that resembles real-world sexual activities, including the distribution for the number of sexual partners people have (their degree distribution) and the number of partners their partners have (the jointdegree, or degree-degree, distribution).The existing algorithms that generate bipartite random graphs preserving degree and joint-degree distributions of the nodes are strictly based on the number of partners people have, and not other demographic factors, such as age or location [23,15,9].
The degree and joint-degree distributions are just two of many properties for a heterosexual network that can affect its structure and the validity of an epidemic model.The heterosexual network is also correlated to an underlying social contact network of acquaintances connected by interpersonal relationships.A person's sexual activity depends on age, race, sociodemographic, and socioeconomic features of the environment that can be captured by a social contact network [4,2,24,16,21].In other words, using the extended social network of a person as a source of sexual partner selection when generating a heterosexual network, enables the network to capture the bias in heterogeneous mixing based on age, race, economic status, and geographic location [19].
Although it is widely accepted that social contact (non-sexual partners) and heterosexual (sexual partners) networks are related, there are few studies on how a population's social contact network impacts the spread of heterosexual STIs.The social network can affect the structure of the heterosexual network by providing the pool of sexual partners and can transmit information and cultural norms regarding STI test and safer sex, but there is no mechanistic approach that addresses and uses this capability of social networks.We will describe a new approach that fills this gap by applying social contact networks to generate the heterosexual network while preserving the joint-degree distribution of data.
Our new network generation approach uses the underlying extended social network of a population to extend these previous algorithms for generating bipartite heterosexual networks with prescribed joint-degree distribution [9].
Many sexual partnerships are formed from within a person's social circle, defined by the people they have regular social contact with, and the contacts of their contacts (their extended social network).These social circles have been modeled through large-scale simulations of thousands of people in a city as they go through their daily activities.We start with a network that mimics the social activity of the population [12], as generated by a complex social network simulation.We use this simulated data to create an extended social network and then identify a bipartite network of men and women to define our heterosexual network.We then create a virtual heterosexual network as a subgraph of this bipartite social network that captures a prescribed joint-degree distribution.
As a case study, we construct a heterosexual network that is embedded in the social contact network of New Orleans population and mimics the sexual behavior obtained from a sexual behavior survey of the young adult African American population in New Orleans [17,14].

Materials and method
People often find their sexual partners within their extended social network, the individuals they come in contact with each day at work, school, or other social activities.There are sophisticated simulations of these social networks that can be used to produce a sexual network, which is more realistic than basing partnerships on just the sexual activity of different individuals.
The social contact network is a graph where the nodes are synthetic people, labeled by their demographics (sex, age, income, location, etc.), and the edges between the nodes represent contacts determined in which each synthetic person is deemed to have made contact with a subset of other synthetic people through some Activity types.Each edge of the network is labeled with one of these activity locations and is weighted by the time spent on these contacts per day.
For example edge (i,j) labeled by the activity A = W ork and weighted by T W i,j means two persons i and j have a contact for T W i,j fraction of their total time spent at work.We base our algorithm on a social contact network, called SocNet, generated by Eubank et al. [12] with activity at different locations (e.g.home, work, school, shopping, or other activity).
We introduce an algorithm that embeds a heterosexual network within a social network and also meets the joint-degree distribution of the sexually active population.The heterosexual network preserves the bipartite joint degree (BJD) distribution matrix that represents the correlations between then number of partners a person has and the number of partners their partners have [9].
The algorithm has three stages: (i) Generate an extended social contact network, ESocNet: The original social contact network is a simple graph, whose nodes are synthetic people, and neighboring nodes are their social contacts during a typical day.We assume that most sexual partnerships come from a person's social contacts, or the social contacts of their social contacts, e.g. the neighbors of the neighbors of a node.We extend the social contact network to create a new network, the extended social network, ESocNet, where some of the neighbors (social contacts) of an individual's neighbors in this network are added to his/her social contacts.
(ii) Generate a reduced social bipartite network, BSocNet: The ESocNet includes all the individuals in the region being modeled.Our sexual network is based on individuals within a prescribed age range.In this step, we remove all nodes where the associated individuals are outside this age range.The extended social network is a simple graph where nodes have some neighbors that are the same sex.We identify the embedded bipartite subgraph of this network by removing all edges between individuals of the same sex.The resulting bipartite graph is a social network where male nodes are only connected to female nodes and vice versa.Finally, we assume siblings are not sexual partners and therefore, we remove all edges between individuals living in the same household, which is the edge labeled activity H for home.We call this reduced social bipartite network as BSocNet.
(iii) Generate an embedded heterosexual bipartite network, SexNet: We then use the BSocNet to define a heterosexual network of sexual partnerships, the SexNet, with a prescribed BJD based on survey data [9].That is, we preserve the correlations between the number of partners a person has and the distribution for the number of partners their partners have.
We assume that most of a person's sexual partners are neighbors in the BSocNet and a few of the partners are randomly selected from elsewhere in the population where they might have met through social media or at any other event.

Generate an extended social contact network (ESocNet)
In the first stage of our algorithm, we create an extended social contact network, ESocNet, so that an individual's social contacts include some of the contacts of their contacts.That is, ESocNet will add potential sexual partners by including some of the social contacts of an individual's social contacts.
Consider two people (nodes) i and j who are not currently connected, but have k A (i, j) > 0 common social contacts within activity A. We define p A ij as the probability that they will meet through a single contact.Therefore, the probability that they will meet and be connected in the ESocNet after k A (i, j) j) .The probability p A ij is a function of the time that i and j spend in an activity A in SocNet.From the data in SocNet, we can define τ A k as the average fraction of time person k spends with each social contact, when engaged in activity A: where T A k,l is the fraction of time two contacts k and l spend together in activity A, and N A (k) is set of all social contacts for person k through an activity location A. We then define 1) describes an schematic of this algorithm for a simple network.

Generate a reduced social bipartite network, BSocNet
In our heterosexual network, we only consider the sexually active population within a prescribed age range α = [α 1 , α 2 ].That is, we trim the ESocNet by removing all people with ages outside this range to not including any edges (sexual contacts) with people outside this range.We then remove all edges between people of the same sex to create a bipartite heterosexual social network.
Finally, to avoid including siblings as potential sexual partners, we remove all edges between two individuals living in the same household by removing all edges labeled activity H as home, to define the BSocNet.

Generate an embedded heterosexual network (SexNet)
The Soc2Sex algorithm uses BSocNet to generates a heterosexual network, SexNet, that mimics the heterogeneous mixing of the real population.
We assume that we have an estimate for the distribution for the number of partners of men and women (the degree distributions for their associated nodes) and that is, k 0 meets i and j at the same location, similarly k 1 meets i and j at the same location to k 0 's, however, k 2 meets them in different places.To compute p A i and p A j we only count the social contacts who meet them at the same location A, therefore, p A i = (T A ik 0 + T A ik 1 )/2, and p A j = (T A jk 0 + T A jk 1 )/2.Finally, the probability that i and j make an edge-are connected in the extended social network-is 1 the joint-degree distribution for the number of partners that their partners have [9].
An edge, ij, between two persons i and j in SexNet represents a sexual partnership.The degree of a person i, is defined by the number of his/her sexual partners.The degree distribution {d k } defines the number of people with degree k.The joint-degree distribution (k, j) is the number of partnerships between a man with degree j and a woman with degree k.This distribution can be represented by the Bipartite Joint Degree or BJD matrix: , where, w is the maximum degree in women nodes, and m is the maximum degree in men nodes, each element e ij is the number of edges between women with i partners and men with j partners.
The degree distribution of the number of women nodes, d w k , and men nodes, d m k , with k partners can be obtained from BJD SexNet : Though the heterosexual network SexNet is a subgraph of BSocNet, we also consider some sexual partners that are within a person's extended social circle.That is, for the general case, SexNet is partially embedded in BSoc-Net.
The Soc2Sex algorithm first generates an initial heterosexual network that closely agrees with the desired BJD matrix and is partially embedded in BSoc-Net.Usually, this network satisfies the desired BJD, but can fail when the average degree (number of partners people have) becomes large.When this happens, a second fix-up algorithm, based on rewiring the network, is used to repair any discrepancies so the final SexNet has the desired BJD matrix.

Generating the bipartite network
The Soc2Sex algorithm starts with the SocNet, the BJD matrix corresponding to SexNet, and the fraction p ∈ [0, 1] of partners that are chosen randomly from the extended social contacts in the BSocNet.The remaining fraction, (1 − p), of a person's sexual partners are randomly selected from elsewhere in the population.These partnerships might have formed by meeting through social media or a social event not captured by the original SocNet.
The Soc2Sex algorithm then generates a heterosexual network that is a partial subgraph of BSocNet and has a joint-degree distribution given by the BJD matrix.Note that p is approximately the percentage of SexNet that is a subgraph of BSocNet.
The algorithm starts with an empty set of nodes SexNet and then builds a network guided by the BJD matrix.The nodes with the smallest degree have the least flexibility, so we start building SexNet by randomly selecting a man node of BSocNet with the highest (social) degree and assign its desired sexual degree to be column size of BJD matrix.This is represented by stubs, or unconnected edges, associated with this node.
We repeat the following process until all edges in SexNet -that are equal to the summation of elements in BJD matrix-are placed: at any step, we find a node with the highest stub in SexNet and then with probability of p, we find a partner with proper degree defined by BJD for them from their social contact in BSocNet , or with the probability of 1 − p, we find a partner with proper degree defined by BJD from closest people but not their social contact, and then reduce its new partner's stub by one.If we find all the partners for all nodes in SexNet, we introduce a new node from highly social active nodes in BSocNet to SexNet and assign its desired degree and stub equal to the current maximum degree frequency of SexNet.
To keep or remove an edge, we have to calculate the degree of nodes attached to it for each possible edge in the SocNet, thus, the full set of experiments run in O(|E|P m P w ) time, where |E| is the number of edges in SocNet, P m number of its men nodes and P w number of its women nodes.This method is feasible if the average degree ( 2|E| P m +P w ) of the network is not high.For completeness, we provide pseudo-code for our Python scripts in Algorithms 1 and 2. Table 1  The initial algorithm will generate a network with the prescribed degree distribution.We have observed that the resulting network always had the desired BJD matrix for all of the sparse heterosexual networks we have generated in this project.However, there are some situations, where the desired network is not sparse, the algorithm can fail to exactly produce a network with the desired BJD matrix for the joint-degree distribution.When this happens, a second algorithm is used to rewire the network so that it will exactly match the desired bipartite joint-degree distribution for the number of partners that a person's partners have.

Rewiring SexNet for a given BJD matrix
The rewiring algorithm corrects any mismatch between the joint-degree distribution of generated SexNet and the desired BJD.We define the jointdegree distribution of the generated SexNet as BJD the mismatch error matrix E = BJD − BJD.If the matrix E has nonzero elements, then the network is rewired to eliminate the error.There are three possible cases: (i) If entry E (i,j) , (i, j > 1), is a positive value k, it means that SexNet needs k more edges between degree i women and degree j men.To create these edges, we iterate the following process k times: (ii) If element E (i,j) , (i, j > 1), is a negative value k ′ , it means that SexNet have extra k ′ edges between degree i women and degree j men.To remove these edges, we iterate following process k ′ times: (a) First, identify a woman, w i , in SexNet, where d SexNet (w i ) = i, which has a degree-j partner like m j , that is d SexNet (m j ) = j.The nodes w i and m j are selected so that they have at least one social contact with opposite sex that is not their sexual partner.Rewiring ( 1) Rewiring ( 2) (iii) In the previous steps, we pushed back nonzero elements in E to its first row and column, which causes new nonzero elements in the first row and column.To remove these nonzero values, we have to add or remove small components.For example, if the element (i,1) of E is a positive value k, it means that we need a small component of a degree i woman whose partners are all degree 1 men.Therefore, we simply make this component from the people who are not currently in SexNet.n.If the element (1, j) of E is a negative value k, it means we have to remove a small component of a degree j man whose partners are all degree 1 women.Therefore, we simply look for such a component and remove it from SexNet.
We have found that this algorithm almost always converges to the desired BJD.However, there are rare cases when the desired rewiring nodes may not exist, and algorithm stalls with E = 0.When this happens, the rewired network will have a better joint-degree distribution of SexNet, even though it does not exactly match the desired BJD.In the numerical simulations, all of the generated New Orleans heterosexual network had exactly the desired BJD based on survey data.
In the next section, we apply our algorithm to generate and analyze several random SexNet corresponding to sexual activity of adolescent and young adult sexually active African Americans reside in New Orleans.First, we explain the inputs of our approach: New Orleans social network and joint-degree distribution-BJD matrix-corresponding to its sexual network.Then we use our algorithm to generate and analyze a bunch of SexNets for a subpopulation of people in New Orleans.

Simulations
We analyze an ensemble of sexual networks with a prescribed joint-degree distribution representing sexual activity of young adult African Americans in New Orleans.

The New Orleans social activity data and SocNet
The SocNet is based on the synthetic data generated by Simfrastructure [12,13] for 130,000 synthetic people residing in New Orleans.Simfrastructure is a high-performance, service-oriented, agent-based simulation system, representing and analyzing interdependent infrastructures.The data for the network includes information for each individual, identified by their PID (personal identifier), and includes their age, gender, household, and other demographic information.The contact information for each PID is encoded in the contact file in

The BJD matrix for New Orleans heterosexual activity
An ongoing community-based pilot study was conducted among sexually active African Americans ages 15 − 25 in New Orleans [17], to assess the effectiveness of prevention and intervention programs for chlamydia.Socio-demographic information including age, race, educational level and, sexual behavior-number and age of heterosexual partners in the past two months-and history of their STI test results were collected from 202 men and 414 women participants.Meanwhile, their partners' information has been collected by asking questions referring to the status of each relationship such as the partner's age and the possibility that their partner(s) have intercourse with others.The survey results were used to construct the BJD matrix of a heterosexual network of individuals in New Orleans.For a population P = 15, 000 sexually active young adult men and their women partners residing in New Orleans, we have The dimension of this BJD matrix is 6 × 21, that is, the maximum number of partners women have is 6 and for men is 21.

SexNet analysis
Using the social network and BJD matrix provided in the previous subsections and approach described in Section 2, we generated 150 SexNets of 15000 people for p = 0.2, 0.4, 0.6, 0.8 and 1.That is, 30 of the SexNets are 20%, 30 are 40%, 30 are 60%, 30 are 80%, and the rest 30 are 100% subgraph of BSoc-Net.We then compared some descriptive measures of this ensemble of random networks that were not imposed when generating the networks, including the size of giant components and bi-components, number of connected components, and average redundancy coefficient.
First, we evaluated and compared the size of the giant component and bicomponent (the first and second biggest connected components of the network) for each group of the networks.Figure (3) shows the box plot of these sizes: there is an increment in the size of giant components when people select most of their sexual partners from their social contacts.Because in that case, sexually active people are tighter together within the social contact network.But there is not a significant difference in the size of giant bi-component.
The number of connected components, N c, is another measure characterizing network toughness.This measure can be, not necessarily, correlated to the component's size of the network.Redundancy coefficients are the measure of the degree to which nodes in a bipartite graph tend to cluster together: Definition 1.For a bipartite network, redundancy for a node is the ratio of its overlap to its maximum possible overlap according to its degree.The overlap of a node is the number of pairs of neighbors that have mutual neighbors themselves, other than that node [18].For a typical node v, the redundancy coefficient of v is defined as , where, N (v) is the set of all neighbors of node v, and E is the set of all edges in the network.
We compare this measure for the networks in

Discussion
We described a new algorithm to generate an ensemble of heterosexual networks based on heterosexual behavior surveys for the young adult African American population in New Orleans.The prescribed degree and joint-degree distribution represented the heterosexual network embedded within a social network that captures the biased mixing of the population based on age, physical location, and social activities.
We generated an ensemble of different heterosexual networks with the same BJD.When the networks had a more percentage of partners selected from their extended social contacts, then we observed a tighter distribution in the number of connected components and the size of giant component and bi-components.
When more partners are chosen from the extended social network, instead of randomly selected from the population, then the size of giant component increases, and following that the number of connected components decreases, which is because of reducing the mixing in generating sexual network: when people select their sexual partners from their social contacts they stand in a tight group within social network.In fact, when being subgraph of the extended social network become stronger, the candidate set of sexual partners for each person that is set of social contacts decreases and becomes local ( this set includes close contacts and contacts of contacts) compared with when this set is the whole population.
As p increases, then more partners are chosen from a person's extended social network.This also increases the network clustering coefficients (where more partners of your partner's partner are also one of your partners).The redundancy coefficients for networks increases as the dependence of sexual network on social one rises when p increases, which is because of the high clustering coefficient of the social network due to the first stage of the algorithm, generate an extended social contact network.In that stage, we made some new contacts between the contacts of each individual, which causes the increment in the clustering coefficient of the social contact network.On the other hand, when more partners are chosen from the social contact network, more properties of the social network such as the clustering coefficient become inherited by SexNet.Thus, increasing p, we observe increment in the redundancy coefficient of SexNet.
We studied the measures of networks because they may affect the spread of an STI such as chlamydia on the SexNets.In our future work when studying the spread of chlamydia on heterosexual networks, we will measure their impact on the prevalence of chlamydia over SexNets generated using different p values.
There are still unanswered questions for proving the existence of a heterosexual network with a prescribed joint-degree distribution embedded within a prescribed social network.That is, there are no explicit criteria to guarantee that a heterosexual network with a particular joint-degree distribution can be embedded within a particular social network or not.
We are currently simulating a stochastic agent-based network model on SexNet for the spread of chlamydia and comparing different intervention strategies to control the spread of STIs.These simulations will use the underlying social contact network to improve the current intervention models by considering the impact of counseling and behavioral changes such as increasing condom use or social contact notification.

Figure 1 :
Figure 1: Suppose the persons i and j are not currently social contacts in SocNetbut have three different common social contacts k 0 ,k 1 , and k 2 through different activities.They might be connected in the extended social network, ESocNet, when at least one of their common social contacts meet them within the same activity location.Suppose A ik 0 = A jk 0 = A ik 1 = (a) First, identify a woman, w i , in SexNet, where d SexNet (w i ) = i, and has as a partner, m 1 , with degree-1, i.e. d SexNet (m 1 ) = 1.(b) Next, identify another man, m j ∈ SocNet.N (w i ) − SexNet.N (w i ), where d SexNet (m j ) = j and m j has a degree-1 partner, w 1 .That is d SexNet (w 1 ) = 1.(c) Finally, we rewire the network by removing the edges (w i ,m 1 ) and (m j ,w i ) and add edge (w i , m j ), as illustrated in the Rewiring (1) of the Figure (2).

Figure 2 :
Figure 2: Schematic of steps 1 and 2 of Rewiring approach to correct BJD.

Figure ( 4 )
displays descriptive statistics for N c in each network group.Note that data distributions are approximately symmetrical, and measures of N c are similar across groups, but, they change by changing the source of partner selection-changing p.

Figure 3 :
Figure 3: Box plot representing thee size of the giant component and bi-component for each group of networks: The size of the giant component becomes bigger when the portion of the subgraph becomes stronger, however, the social network does not have much impact on the size of the giant bi-component.

Figure 4 :
Figure 4: Box plot representing the number of connected components, N c, for each group of networks: A significant difference is observed in N c between each group, N c is lower in larger subgraph of social networks.

Figure ( 5 )
: each data point Rc(k) for degree k is obtained by averaging redundancy coefficient over the group of people with k partners.In most of the networks, Rc(k) decreases with k[22].Redundancy coefficient Rc is affected by social network BSocNet: when people select more sexual partners from their social contacts the value for Rc increases, which is because of stage one of the algorithm-Generate an extended social network.In that stage, by connecting the social contacts of a person in BSocNet, we increase its clustering coefficient.Therefore, because increasing p SexNet becomes a stronger subgraph of BSocNet, it inherits more properties from BSocNet, that is, by increasing p Rc of the SexNet, which is correlated to clustering coefficient of BSocNet, increases.

Table 1 :
is the table of symbols used in these Algorithms.Table of notation for a conventional network G in algorithms.
dist(G, u, v)distance between two nodes u and v in Network G M.col / M.row column/row size of a matrix M M (i, :) / M (:, i) i th row/column of matrix MV (i) i th element of vector V V.index(a) index ofelement a in vector V |S| size (the number of elements) of a set S S.remove(m) remove member m from a set S S.sample(P ) randomly select an element with property P (if P= 1 there is no property) from set S urn uniform random number in [0, 1]

Table 2 .
The Simfrastructure data was used to generate the original SocNet, which was then used to generate ESocNet and BSocNet as described in the previous section.

Table 2 :
Table input for the social contact network of individuals.The PID is personal ID,FID is their social contact ID, A is activity in which PID meet FID, and T is the fraction of time in a day that two social contacts meet with each other through activity A. We have five different activities, including H as home, W as work, Sc as school, Sh as shopping, and O as others.For example, person 43722 stays with person 16981 at the same home for 7.2 hours in a day.