Measuring the effect of collaborative filtering on the diversity of users’ attention

While the ever-increasing emergence of online services has led to a growing interest in the development of recommender systems, the algorithms underpinning such systems have begun to be criticized for their role in limiting the variety of content exposed to users. In this context, the notion of diversity has been proposed as a way of mitigating the side effects resulting from the specialization of recommender systems. In this paper, using a well-known recommender system that makes use of collaborative filtering in the context of musical content, we analyze the diversity of recommendations generated through the lens of the recently proposed information network diversity measure. The results of our study offer significant insights into the effect of algorithmic recommendations. On the one hand, we show that the musical selections of a large proportion of users are diversified as a result of the recommendations. On the other hand, however, such improvements do not benefit all users. They are in fact mainly restricted to users with a low level of activity or whose past musical listening selections are very narrow. Through more in-depth investigations, we also discovered that while recommendations generally increase the variety of the songs recommended to users, they nonetheless fail to provide a balanced exposure to the different related categories.


Introduction
The ever-growing quantity of information and data available on online platforms has led to the need for efficient and reliable methods to filter this information. To this end, recommender systems have been introduced in different contexts, ranging from email filtering (Goldberg et al. 1992) to news exposure on social media platforms (Karimi et al. 2018), purchasing recommendations on online stores (Pradel et al. 2011), and generating playlists in streaming services (Hansen et al. 2020). The popularity of such systems lies in their ability to efficiently filter a high number of items to ensure that users are only presented with a few relevant ones.
Godinot and Tarissan Applied Network Science (2023) 8:9
While there has been a growing interest in developing such systems, the algorithms underpinning them have begun to face challenges, partly due to the over-personalization of their recommendations. Concerns have also been raised regarding the impact of algorithmic recommendations on users' behavior. Recently, for instance, news recommender systems have been criticized for their role in the appearance of echo chambers and the spread of fake news (Helberger 2019). In this context, diversity has been proposed as a way of mitigating the side effects resulting from the specialization of recommender systems (Helberger et al. 2018). However, while the scientific community usually agrees when it comes to the usefulness of diversity, little is known about how classic recommendation paradigms relate to the diversity of the content exposed to users. This paper makes a significant step in this direction by analyzing the impact of a classic recommendation approach, namely the Collaborative Filtering via Matrix Factorization for Implicit Datasets (Hu et al. 2008), in relation to diversity. By conducting random walks on a tripartite graph that describes the relation between users, products, and categories, we were able to quantify the diversity of users' attention (Ramaciotti Morales et al. 2021) in relation to the categories before and after recommendations. We applied this approach to a dataset that records user activity on an online platform featuring musical content (Bertin-Mahieux et al. 2011) and studied the impact that the parameters of the model have on different aspects of the diversity of recommendations.
Our results show that the relation between algorithmic recommendations and diversity is complex. On the one hand, and in contrast with claims about the effect of algorithmic recommendations on limiting the diversity of content exposed to users (Pariser 2011), we show that recommendations do not necessarily limit diversity. On the contrary, the musical selections of a large proportion of users are in fact diversified by algorithmic recommendations. Moreover, we observe that diversity also increases with the number of items recommended.
On the other hand, however, such positive outcomes are mitigated by the fact that this trend depends strongly on the user's profile. First, diversity mostly increases for users with a low level of activity on the platform or whose past musical listening records are very narrow in scope. Second, by investigating in greater depth the recommendations exposed to users for whom diversity is increased, we were able to reveal which facet of the diversity is improved by collaborative filtering. While the recommendations generally increase the variety of the songs recommended to users, they nonetheless fail to provide a balanced exposure to the different related categories. When it comes to diversity, the extent to which recommendations fit in with a user's musical habits actually proves more important than the diversity of the recommendations itself.
We believe that our proposed method, the practical investigation we carried out on a collaborative filtering approach and the results we obtained on a real dataset all serve to offer a new perspective on how researchers can leverage network structures to examine the ethical effects of recommendation algorithms. It is worth noting here that we tested this approach on music exposure but we believe it could be applied to any other context with a similar network structure, thus paving the way for more general and systematic studies that would shed light on the effect of recommender systems on diversity.

Paper outline
After reviewing in Sect. Related work the existing literature on recommender systems and their relation to diversity measures, in Sect. Diversity in multi-partite graphs and recommender systems we provide details about the recommendation system we studied in this paper and the network science approach we used to measure its effect on diversity. In Sect. Analysing the effect of collaborative filtering in terms of diversity, we apply this approach to a specific dataset which records user activity on an online platform featuring musical content, before demonstrating the results achieved. Finally, in Sect. Conclusion and perspectives we conclude the paper and propose possible avenues for future work.

Related work
The question addressed in this paper falls within two independent lines of research: defining the notion of diversity in order to derive diversity measures, and analyzing recommender systems. Our approach relies on recent advances in both fields.

The concept of diversity
Independently of the analysis of the effect of recommender systems, diversity has been the subject of a wide variety of research studies in many different contexts, ranging from ecology (McCann 2000) to economics (Gini 1921) and information theory (Rényi et al. 1961), to name just a few. The question of which diversity measure should be used is a recurrent debate (Sakai and Zeng 2019). In this line of research, there is a long tradition of proposing different indexes to quantify the diversity of a system. We could cite, for instance, the well-known Shannon and Rényi entropies, the Gini coefficient or the Herfindahl-Hirschman index.
In his seminal paper (Stirling 2007), Stirling observed that although the concept of diversity depends largely on the system being studied, common traits can be identified. Diversity is an irreducible property of a system (and not only of its parts) that can be expressed as a combination of variety, balance, and disparity. In Stirling's own words, variety is the number of categories into which system elements are apportioned; balance is a function of the pattern of apportionment of elements across categories; and disparity refers to the way and extent to which the elements may be distinguished.
Continuing from Stirling's work, the authors of Ramaciotti Morales et al. (2021) developed a theory of diversity measures and introduced a general methodology to formally quantify the different aspects of diversity as soon as the system under consideration can be represented as a network. This method relies on the distribution generated by random walks on a network that captures how the nodes from one layer (typically users) are related to the nodes from another layer (such as categories of products). By measuring the extent to which such a distribution differs from a uniform distribution, the authors define the α-diversity as a measure of the diversity of a system at different orders (see Sect. A formalism to analyse diversity for more details).
Interestingly, the different orders of the diversity directly refer to two of the three facets highlighted by Stirling. In particular, the 0-diversity captures the variety of a system exactly, while the ∞-diversity captures its balance. Any other value of α is then an attempt to take those two dimensions into consideration in the measure. For this reason, we chose to follow this approach by using the 0, 2 and ∞-diversity in the rest of our study.

Recommender systems and diversity
When it comes to recommender systems, the technique most commonly used to measure diversity is based on intra-list dissimilarity (Silveira et al. 2019), in particular in the context of the diversity of music recommendations (Zhang et al. 2012; Schedl et al. 2017), which is the context studied in this paper. Given the set $I_u$ of items listened to by a user $u$, a dissimilarity metric $d(i, i')$ between items $i, i' \in I_u$ is constructed. When evaluating recommendations, it is then common to derive $d$ from the cosine similarity between the matrix factorization latent vectors of $i$ and $i'$ (Cheng et al. 2017) and to take the average over all pairs $(i, i')$. Other metrics can be used, however, such as the inverse Pearson correlation (Vargas and Castells 2011) or the Hamming distance (Kelly and Bridge 2006).
As an alternative, when matrix factorization is not used, the authors of Waller and Anderson (2019) used a community embedding model to obtain such vectors. Likewise, instead of taking the mean of the dissimilarities, one could use the maximum, over all $k$-subsets of $I_u$, of the minimum pairwise distance within the subset (Abbar et al. 2013). Another diversity measure, used in particular when metadata (such as tags or categories) are available, is the topic coverage (Li et al. 2020): the diversity relates to the number of topics reached by a user through the items he/she listens to. Dissimilarity metrics and topic coverage usually measure the individual diversity of each user, but they can also be computed over all the items listened to by all the users, leading to a measure of the collective (or aggregate) diversity (Shi 2013).
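The average intra-list dissimilarity described above can be illustrated with a minimal sketch. It assumes that each item is represented by a non-zero latent vector (one row per item) and that the dissimilarity is taken as one minus the cosine similarity; the function name `intra_list_dissimilarity` is hypothetical and does not come from the works cited.

```python
import numpy as np

def intra_list_dissimilarity(vectors: np.ndarray) -> float:
    """Average pairwise cosine distance (1 - cosine similarity) over all item pairs.

    `vectors` holds one non-zero latent vector per row (assumption of this sketch).
    """
    n = vectors.shape[0]
    if n < 2:
        return 0.0  # a single item has no pair to compare
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    similarity = unit @ unit.T
    upper = np.triu_indices(n, k=1)  # each unordered pair (i, i') counted once
    return float(np.mean(1.0 - similarity[upper]))

# Usage: two orthogonal items are maximally dissimilar under this metric.
orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
print(intra_list_dissimilarity(orthogonal))
```

Replacing the mean by another aggregate (for instance the minimum pairwise distance) only changes the last line of the function.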
Most studies have focused on dissimilarity to incorporate this notion into recommendations. However, when it comes to the notion of balance or the use of such properties to analyze the effect of recommendations, little is known. One of the few works to have leveraged the volume of user-item interactions and item-tag strengths to capture the notion of balance is Vargas et al. (2014), while Paudel et al. (2017) is the only recent work to have studied the effect of the parameters of a model on the diversity of recommendations.
To the best of our knowledge, the work presented in this paper is the first attempt to build on this extensive literature on diversity in recommender systems in the light of the new network science approach introduced in Ramaciotti Morales et al. (2021), in order to analyze the effect of a recommender system (and particularly its parameters) on both the variety and the balance of the content exposed to users.

Diversity in multi-partite graphs and recommender systems
When studied in fields such as ecology or economics, diversity metrics have historically been derived from probability distributions. In certain situations, such as the distribution of income or species, obtaining such distributions can be straightforward. In most cases, however, it requires a detailed understanding of the field and choosing between different possibilities remains largely subjective. In the case of musical content, for instance, would it be more relevant to analyze diversity in relation to the distribution of the songs listened to by a user, the musical categories to which they belong, the dissimilarity of the songs, or another aspect?
The work set out in Ramaciotti Morales et al. (2021) provides a framework for guiding such choices when the data can be represented as a Heterogeneous Information Network. In this section, we introduce this formalism (Sect. A formalism to analyse diversity), before then describing the particular recommender system that will be the focus of our study, namely the Collaborative Filtering via Matrix Factorization for Implicit Datasets (Hu et al. 2008) (Sect. Collaborative filtering).

Tripartite graphs and random walks
A tripartite graph is a graph whose nodes can be divided into three disjoint sets such that two nodes of the same set cannot be connected by an edge. With a view to using tripartite graphs in the context of users ($U$) listening to songs ($I$) related to musical categories ($C$), we restrict tripartite graphs to the case $T = (U, I, C, E_U^I, E_I^C)$, with $E_U^I \subseteq U \times I$ and $E_I^C \subseteq I \times C$.$^{1}$ In addition, weights can be assigned to edges by defining the associated weight functions: $w_U^I : E_U^I \to \mathbb{R}^+$ (the number of times a user has listened to a song) and $w_I^C : E_I^C \to \mathbb{R}^+$ (the strength of the relation between a song and a musical category). For any node $v$ we denote by $N(v)$ the set of its neighbors, $d(v)$ its degree and $d_w(v)$ its weighted degree.$^{2}$ An example of such a tripartite graph is given in Fig. 1.
Once the data is represented as a tripartite graph, one would like to analyze how the induced relation between bottom (users) and top (categories) nodes is distributed. To do so, we rely on random walks on the tripartite structure. For every node $v$, we define the probability of reaching a neighbor $z \in N(v)$ as $p_{v \to z} = \frac{w(v, z)}{d_w(v)}$. Then, for each bottom node $u \in U$ and top node $t \in C$, we define the probability $p_{u \to t}$ of reaching $t$ from $u$ through $I$ as:

$$p_{u \to t} = \sum_{v \in N(u) \cap N(t)} p_{u \to v} \, p_{v \to t} \tag{1}$$

If we repeat this process for each bottom node, we obtain the projection of the tripartite graph onto a bipartite graph $(U, C, E_U^C)$, where $E_U^C$ is the set of edges between bottom nodes $u$ and top nodes $t$ such that there exists a path from $u$ to $t$ through a middle node $v \in I$. The weights of the resulting edges are the transition probabilities $p_{u \to t}$ (see Fig. 2 for the induced projections and transition probabilities of users $U_1$, $U_2$ and $U_3$ from Fig. 1).
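The projection described above amounts to multiplying two row-normalised weight matrices. The following sketch illustrates this on a toy graph; the dense matrices, the function names and the toy weights are assumptions made for illustration, and the weighted degree of an item is assumed here to be taken over its item-category edges only, so that each user's row is a probability distribution over categories.

```python
import numpy as np

def transition_matrix(weights: np.ndarray) -> np.ndarray:
    """Row-normalise a non-negative weight matrix: p[v, z] = w(v, z) / d_w(v)."""
    weights = np.asarray(weights, dtype=float)
    degree = weights.sum(axis=1, keepdims=True)
    # Rows without any outgoing edge keep an all-zero row instead of dividing by zero.
    return np.divide(weights, degree, out=np.zeros_like(weights), where=degree > 0)

def user_category_distribution(w_ui: np.ndarray, w_ic: np.ndarray) -> np.ndarray:
    """Two-step random walk U -> I -> C: entry [u, t] is p_{u -> t}."""
    return transition_matrix(w_ui) @ transition_matrix(w_ic)

# Toy graph (hypothetical values): 2 users, 3 songs, 2 categories.
w_ui = np.array([[2, 0, 0],    # user 0 played song 0 twice
                 [1, 1, 0]])   # user 1 played songs 0 and 1 once each
w_ic = np.array([[1, 0],       # song 0 is tagged with category 0 only
                 [1, 1],       # song 1 is tagged equally with both categories
                 [0, 3]])      # song 2 is tagged with category 1 only
p_uc = user_category_distribution(w_ui, w_ic)
print(p_uc)
```

On real data the matrices are very sparse, so a sparse representation would be preferable; the matrix product is unchanged.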
One of the strengths of this approach is that it provides a sound and interpretable way to extract different probability distributions from a user-song-category tripartite graph. In this paper, we consider a random walk restricted to the paths from U nodes to C ones, the aim being to obtain the distribution of the categories related to a user through his/her listening habits (weighted by the play count). This is similar to the work conducted in Vargas et al. (2014). Doing so, we measure the individual diversity of a user but the framework could also be used to extract other distributions that reflect on other aspects of diversity. For example, it could be used to measure the collective (or aggregate) diversity of a set of users by initiating the random walk from the (weighted) set of users instead of an individual one (see Ramaciotti Morales et al. (2021) for a complete review of the possible uses of this framework).

Diversity of users' attention
We base our diversity index on the true diversity of order α (or Hill number) $D_\alpha(p)$ (Hill 1973; Jost 2006). The aim is to distinguish between a perfect situation in which all categories are reached uniformly by the users (highest diversity) and a worst-case situation in which a few categories capture all the links (lowest diversity). More formally, for any probability vector $p$ ($p_i \in [0, 1]$, $\sum_i p_i = 1$) and positive $\alpha \neq 1$, we define $D_\alpha(p)$ as:

$$D_\alpha(p) = \left( \sum_i p_i^\alpha \right)^{\frac{1}{1 - \alpha}} \tag{2}$$

The orders $\alpha = 0$, $\alpha = 1$ and $\alpha = \infty$ are obtained as limits. We then derive the α-diversity of the attention of a user $u \in U$ as:

$$D_\alpha(u) = D_\alpha\left( (p_{u \to t})_{t \in C} \right) \tag{3}$$

A straightforward interpretation of the α-diversity is that a high value indicates that a user reaches a wide range of categories almost uniformly, while a low value indicates a concentration of his/her interest towards a small and unbalanced number of categories.
However, this interpretation also depends on the order at which the diversity is measured. Interestingly, depending on the values of α, this diversity index expresses well-known diversity measures:
• $\alpha = 0$ is exactly the richness diversity (MacArthur 1965; Gotelli and Colwell 2011), which captures the notion of variety as defined by Stirling (Stirling 2007).
• $\alpha = 1$ is related to the Shannon entropy $H(p)$ (Shannon 1948; Shannon et al. 1963): $D_1(p) = \exp(H(p))$, with $H(p) = -\sum_i p_i \ln p_i$.
• $\alpha = 2$ is related to the Herfindahl-Hirschman index $HHI(p)$ (Rhoades 1993): $D_2(p) = 1 / HHI(p)$, with $HHI(p) = \sum_i p_i^2$.
• $\alpha = \infty$ is related to the Berger-Parker index $BBI(p)$ (Berger and Parker 1970) and captures the notion of balance: $D_\infty(p) = 1 / BBI(p)$, with $BBI(p) = \max_i p_i$.
Going back to the example of Figs. 1 and 2, one can see how the different orders of diversity help to differentiate between variety and balance. If one focuses on user $U_1$ for instance (Fig. 2a), one could clearly state that this is the most diverse user as he/she reaches all five categories of the tripartite graph. This is captured by the 0-diversity ($D_0(U_1) = 5$), which is the highest value in the example ($D_0(U_2) = 4$ and $D_0(U_3) = 3$). However, if one is interested in a balanced (or uniform) distribution of the relation between users and categories, one can clearly observe that $U_1$ is far from diverse since $C_1$ attracts almost all the paths starting from this user. This is clearly captured by the ∞-diversity, which is close to the lowest value 1 ($D_\infty(U_1) \approx 1.09$). In that regard, although $U_3$ reaches only 3 categories (Fig. 2c), the induced transition probabilities are completely uniform. This results in an ∞-diversity which is clearly higher ($D_\infty(U_3) = 3$), and $U_3$ is actually the user that presents the most balanced profile in the example ($D_\infty(U_2) = 2.5$ in comparison).
Finally, $U_2$ (Fig. 2b) is the user that presents the best profile if one wants to take into consideration both variety and balance in the diversity measure. Although this user is related to fewer categories than $U_1$, and although the transition probabilities resulting from the random walks are not as uniformly distributed as those of $U_3$, his/her profile is sufficiently wide and balanced to obtain the highest 2-diversity of the example: $D_2(U_2) \approx 3.33$, while $D_2(U_1) \approx 1.20$ and $D_2(U_3) = 3$.
With these observations in mind, in the rest of the paper we will systematically study the α-diversity for α = 0 (variety), α = ∞ (balance) and α = 2, which takes both dimensions into account. This will provide a more comprehensive picture of the impact of recommender systems on those different facets of diversity.
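The orders of diversity discussed in this section can be computed with a few lines of code. The sketch below implements the Hill number with its usual limit cases; the function name and the two probability vectors are hypothetical. Since Fig. 2 is not reproduced here, the vectors were simply chosen to be consistent with the values quoted in the text for $U_2$ and $U_3$.

```python
import numpy as np

def hill_number(p, alpha: float) -> float:
    """True diversity of order alpha of a probability vector p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # categories that are never reached do not count
    if alpha == 0:
        return float(p.size)          # richness: number of categories reached
    if alpha == 1:
        return float(np.exp(-np.sum(p * np.log(p))))   # exponential of the Shannon entropy
    if np.isinf(alpha):
        return float(1.0 / p.max())   # inverse of the largest share (Berger-Parker)
    return float(np.sum(p ** alpha) ** (1.0 / (1.0 - alpha)))

# Hypothetical category distributions, consistent with the values reported for U_2 and U_3.
u2 = [0.4, 0.3, 0.2, 0.1]
u3 = [1 / 3, 1 / 3, 1 / 3]
for name, p in (("U2", u2), ("U3", u3)):
    print(name, [round(hill_number(p, a), 2) for a in (0, 2, float("inf"))])
```

For a uniform distribution all orders coincide with the number of categories, which is why the three values are equal for the second vector.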

Collaborative filtering
The model
In addition to the sets $U$ and $I$, let $R = \{r_{ui} \mid u \text{ rated } i\}$ be the matrix describing the preferences of users as regards the items. When an item $i$ has never been listened to by a user $u$, we assume $r_{ui} = 0$; otherwise we take the play count. We define $c_{ui}$ as the confidence the model has in the proposition "the user $u$ likes the item $i$". Because there is no canonical relation between $r_{ui}$ and $c_{ui}$, we use the simple model suggested in Hu et al. (2008). This model is defined in Equation 4 (with $\mu \geq 0$), where the variable $p_{ui}$ is introduced to describe whether a user likes an item or not. We consider that as soon as a user listens to a song, he/she is prone to like it, and therefore $p_{ui} = 1$.
As with classic matrix factorization, we assume that each user $u$ (resp. each item $i$) can be represented by a column vector of latent factors $x_u$ (resp. $y_i$). We name $x = (x_{u_1} \ldots x_{u_n})$ (resp. $y = (y_{i_1} \ldots y_{i_m})$) the matrix of user factors (resp. item factors) and $\hat{p}_{ui} = x_u^T y_i$ the estimator of $p_{ui}$. The values $x^*$ and $y^*$ of the latent factors are computed by minimizing the weighted mean squared error between $p_{ui}$ and its estimator $\hat{p}_{ui}$.
In order to prevent the model from overfitting the training data, we add a regularization term with $\lambda \geq 0$ and $\|x\|^2 = \sum_k \sum_l x_{kl}^2$ the squared Frobenius norm:

$$x^*, y^* = \arg\min_{x, y} \sum_{u, i} c_{ui} \left( p_{ui} - x_u^T y_i \right)^2 + \lambda \left( \|x\|^2 + \|y\|^2 \right) \tag{5}$$

To recommend a set of items to a user $u$, we first select a set of candidates $I_u$. This set contains all the items $i$ that the user has not listened to: $I_u = \{i \in I \mid r_{ui} = 0\}$. These items are then sorted by decreasing order of $\hat{p}_{ui}$. Finally, the recommended items are the $k$ best items in the sorted candidate list $I_u$.
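To make the model concrete, here is a minimal dense sketch of the alternating least-squares updates of Hu et al. (2008) followed by the top-$k$ selection described above. It is not the lenskit implementation used in the paper: each least-squares problem is solved directly with `numpy.linalg.solve` rather than with conjugate gradient, the function names `implicit_als` and `recommend` are hypothetical, and the toy play-count matrix is invented for illustration.

```python
import numpy as np

def implicit_als(R, n_factors=8, mu=40.0, lam=5e-3, n_iters=10, seed=0):
    """Dense ALS for implicit feedback: returns user factors X and item factors Y."""
    R = np.asarray(R, dtype=float)
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    C = 1.0 + mu * R              # confidence c_ui = 1 + mu * r_ui
    P = (R > 0).astype(float)     # preference p_ui = 1 if r_ui > 0, else 0
    X = rng.normal(scale=0.1, size=(n_users, n_factors))
    Y = rng.normal(scale=0.1, size=(n_items, n_factors))
    reg = lam * np.eye(n_factors)
    for _ in range(n_iters):
        for u in range(n_users):  # fix Y, solve the regularised normal equations for x_u
            weighted = Y * C[u][:, None]
            X[u] = np.linalg.solve(weighted.T @ Y + reg, weighted.T @ P[u])
        for i in range(n_items):  # fix X, solve the regularised normal equations for y_i
            weighted = X * C[:, i][:, None]
            Y[i] = np.linalg.solve(weighted.T @ X + reg, weighted.T @ P[:, i])
    return X, Y

def recommend(R, X, Y, user, k):
    """Indices of the k unheard items with the highest predicted preference."""
    scores = X[user] @ Y.T
    candidates = np.flatnonzero(np.asarray(R)[user] == 0)   # items with r_ui = 0
    order = np.argsort(-scores[candidates], kind="stable")
    return candidates[order[:k]]

# Toy play counts (hypothetical): 3 users, 4 songs.
R = np.array([[3.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [1.0, 0.0, 4.0, 0.0]])
X, Y = implicit_als(R, n_factors=2, n_iters=5)
print(recommend(R, X, Y, user=0, k=2))
```

The per-row direct solve costs a cubic number of operations in the number of factors, which is the cost that a conjugate gradient solver is meant to reduce for large models.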

Training and evaluation
The model in Equation 5 is optimized with a regularized Alternating Least Squares method as described in Hu et al. (2008). However, each least-squares problem (which is equivalent to solving a linear system) is solved with the conjugate gradient method (Takács et al. 2011).
Finally, the evaluation is conducted as follows. We first randomly select a proportion β of users in $U$, and name the resulting set $U_{sel}$. Then, for each user $u \in U_{sel}$, we randomly select a proportion of the items previously listened to by $u$ (i.e. with rating $r_{ui} > 0$) and add the corresponding ratings to the global user-item test set $R_{test}$. All the remaining ratings are used to create the training set $R_{train} = R \setminus R_{test}$. While sampling the items listened to by a user $u$, we make sure that at least one item remains in the training set. This results in a test set $R_{test}$ in which all the users have been encountered during training, thus eliminating the cold start problem.

$$c_{ui} = 1 + \mu r_{ui} \quad \text{and} \quad p_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Assuming that the goal of a recommender system is to produce ordered lists that best match the preference of a user, we assess the performance of the model with the Mean Normalized Discounted Cumulative Gain. Given a recommended list of items $L_u = (i_1, \ldots, i_{|L|})$, ordered by their predicted score $\hat{p}_{ui}$, we define the Discounted Cumulative Gain (DCG) in Equation 6. To obtain the Normalized Discounted Cumulative Gain (NDCG), we generate an ideal recommendation list $L_{u,ideal}$ with the test data (the user's most listened items sorted by decreasing play count).
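The evaluation metric can be sketched as follows. This is not the lenskit implementation: it assumes the standard $\log_2(k + 1)$ discount at rank $k$, takes the test play counts as gains, and truncates the ideal list to the length of the recommended list; the function names and the toy data are hypothetical.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list of gains (rank k starts at 1)."""
    return sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))

def ndcg(recommended, test_play_counts):
    """DCG of the recommended list divided by the DCG of the ideal list.

    `test_play_counts` maps each held-out item of the user to its play count.
    """
    gains = [test_play_counts.get(item, 0.0) for item in recommended]
    # Ideal list: held-out items sorted by decreasing play count (truncation is an assumption).
    ideal = sorted(test_play_counts.values(), reverse=True)[:len(recommended)]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example (hypothetical data): two held-out songs for one user.
held_out = {"song_a": 3, "song_b": 1}
print(ndcg(["song_a", "song_b"], held_out))   # ideal ordering
print(ndcg(["song_b", "song_a"], held_out))   # same items, worse ordering
```

Averaging `ndcg` over all test users gives the mean value used to compare models.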

Analysing the effect of collaborative filtering in terms of diversity
This section presents the results we obtained by using the approach presented in Sect. Diversity in multi-partite graphs and recommender systems in the context of a dataset recording user activity on an online platform featuring musical content. We first present the general setting (Sect. Experimental setting), before investigating different questions: how diversified are the recommendations and what is the impact of the parameters (Sect. Analysis of the recommendations)? What is the effect of the recommendations on users' diversity (Sect. Effect of the recommendations on users' diversity)? How can we explain the differences in the way recommendations affect users' diversity (Sect. Examining the effect of representative recommendations)?

Dataset
The dataset we used comes from the Million Song Dataset project (Bertin-Mahieux et al. 2011). To create the first two layers of the tripartite graph and the user-item links, we used the Echo Nest user taste profile dataset of the project. It features triplets of (user, item, play count) that describe how many times a user has listened to a given song. To add the third layer of the tripartite graph and create the item-category links, we used the last.fm dataset of the project. It contains (item, tag, strength) triplets that describe the tags associated with each song and their strength. Furthermore, in order to obtain a coherent tripartite graph, we performed the following operations: we selected only the 1,000 most popular tags$^{3}$, deleted songs with ambiguous identifiers$^{4}$, and deleted songs with no tag and users with no songs. Finally, in order to reduce the training time of our models, we randomly sampled 100,000 users and their recorded items.

Implementation
We ran all of the experiments using the Python programming language on a 40-core Intel Xeon server equipped with 256 GB of RAM. We used the package lenskit (Ekstrand 2020) to instantiate, train and evaluate the recommender system. The code is available on GitHub$^{5}$ along with instructions to install and run it.

$$DCG(L_u, u) = \sum_{k=1}^{|L|} \frac{r_{u i_k}}{\log_2(k + 1)} \quad \text{and} \quad NDCG(L_u, u) = \frac{DCG(L_u, u)}{DCG(L_{u,ideal}, u)} \tag{6}$$

Following the test procedure described in Sect. Collaborative filtering, the dataset was split into 5 user folds, using the leave-one-out strategy to create the train/test datasets. The hyperparameters were then chosen to empirically maximize the average Normalized Discounted Cumulative Gain, which resulted in 512 latent factors, $\mu = 40$ and $\lambda = 5 \times 10^{-3}$. Unless stated otherwise, those are the values used in the rest of the paper. When studying particular users, the first fold was used.

Tripartite graph of the recommendations
To evaluate the diversity of the recommendations, we define a second tripartite graph $T_r$. In this graph, the links between songs and categories are the same as in the dataset, but the links between the users and the songs now represent the recommendations (instead of the musical record of the user). In addition, to account for the impact of the position $rank(i, u)$ of an item $i$ in the list $L_u$ of recommendations generated for user $u$, the weight of the corresponding $u$-$i$ link in the tripartite graph is set to $|L_u| - rank(i, u)$. The diversity of the recommendations $L_u$ exposed to a user $u$ is therefore the α-diversity of $u$ in $T_r$. Finally, to differentiate between the two tripartite graphs, we will refer to the organic diversity of a user as his/her diversity before being exposed to the recommendations.
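As an illustration of how the rank-based weights shape the category distribution of a recommendation list, consider the sketch below. It assumes that ranks start at 1 (so that the last recommended item receives a weight of 0) and that `p_ic` holds the row-normalised item-to-category transition probabilities; the function names and the toy values are hypothetical.

```python
import numpy as np

def recommendation_weights(n_recs: int) -> np.ndarray:
    """Weight |L_u| - rank(i, u) for ranks 1..|L_u| (1-based ranks are an assumption)."""
    ranks = np.arange(1, n_recs + 1)
    return (n_recs - ranks).astype(float)

def recommendation_category_distribution(rec_items, p_ic: np.ndarray) -> np.ndarray:
    """Distribution over categories reached from a user through the recommended items."""
    if len(rec_items) < 2:
        raise ValueError("at least two recommendations are needed for non-zero weights")
    weights = recommendation_weights(len(rec_items))
    p_ui = weights / weights.sum()          # user -> recommended item probabilities
    return p_ui @ p_ic[rec_items]           # item -> category step of the random walk

# Toy example (hypothetical): 3 songs, 2 categories, list ordered from best to worst.
p_ic = np.array([[1.0, 0.0],
                 [0.5, 0.5],
                 [0.0, 1.0]])
dist = recommendation_category_distribution([2, 0, 1], p_ic)
print(dist)
```

The resulting vector can then be passed to any diversity index of order α to obtain the diversity of the recommendations.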

Diversity of the recommendations
First, we investigate the properties of the recommendations in terms of diversity. Figure 3 presents the recommendation diversity (for k = 50 recommendations and α = 0, 2 and ∞) with respect to the organic diversity of each user. The users' volume (the sum of the play counts of all items the user has listened to) is represented by a color in a log scale$^{6}$. This figure provides several pieces of information. First, we can observe that there is no strong relation between organic diversity and recommendation diversity, since low organic diversity does not necessarily lead to low recommendation diversity. Some users with a low organic diversity are exposed to diversified recommendations, while others have a narrower exposure. In addition, and whatever the order of the diversity, we can observe that, as soon as a minimum number of items are recommended, the recommendations tend to be more diverse than the organic ones. This is particularly true for the variety captured by α = 0, but it can also be observed (although to a lesser extent) for α = 2 and α = ∞. However, one can clearly see that this relation depends on the value of the organic diversity. As this value increases (for all values of α), it becomes more difficult for the recommendations to reach the same level of diversity.

Impact of the parameters of the model
As regards the impact of the parameters of the model, it has been suggested that there is a trade-off between the usual notion of performance (engagement or accuracy) and the notion of diversity (Holtz et al. 2020). To investigate this relation, in Fig. 4 we show how the number of latent factors impacts the diversity$^{7}$ of the recommendations exposed to the users, along with the model performance (measured by the NDCG, in black dashed lines).
As expected, it can be observed that when the number of latent factors increases, so does the performance of the model. With more latent factors, the model is efficient in recommending items related to the user's past musical record. However, this efficiency has a clear effect on diversity, which decreases when the number of latent factors increases, except for a particularly low number of recommendations (k = 10) and for α = 0. In this situation, there does seem to be a trade-off between performance and diversity. This supports the observations made in Holtz et al. (2020) and clarifies the effect: the suggested trade-off can be observed for the variety but vanishes as soon as the balance of the exposure is taken into account in the diversity measure.

Effect of the recommendations on users' diversity
Figure 3 presented in Sect. Analysis of the recommendations reveals that a list of recommendations can be diversified, even for users whose past musical records are narrow in scope. However, this plot does not provide any information on the real impact of the recommendations on users' musical habits. How diverse are the user's listening habits after being exposed to the recommendations? To investigate this question, for each user we computed the diversity increase, which is defined as

$$\Delta_\alpha(u) = D_\alpha^{T_{t+r}}(u) - D_\alpha^{T_t}(u)$$

where $D_\alpha^{T}(u)$ denotes the α-diversity of $u$ in the graph $T$, $T_t$ is the tripartite graph associated with the training dataset and $T_{t+r}$ the tripartite graph to which we added the recommendations. Thus a positive value of $\Delta_\alpha$ indicates that the recommendations have improved the musical habits of a user in terms of diversity, while a negative value indicates the opposite effect. It is worth noting here that, in order for this diversity measure to make sense, one has to carefully adapt the weights in $T_{t+r}$ and in particular to derive a relevant notion of play count for the recommendations. We chose to use a linear relation between the weight and the rank.
Assuming that a user would listen to as many recommendations as the number $u_v$ of items he/she has listened to and that the last item recommended is not listened to ($w(u, i_{n_r}) = 0$), we used a weight function that decreases linearly with the rank of the recommended item. This choice clearly relies on a simple user model and other choices could be investigated. In particular, one could use the models proposed in Poulain and Tarissan (2020) that account for the saturation effects in the diversity of users' attention.
Figure 5 presents the distribution of the diversity increase (with α = 2) for different numbers of recommended items (k = 10, 50, 100). Surprisingly, contrary to what Fig. 3 suggested, the musical listening habits of most users are diversified by the recommendations. Even with only ten items recommended, there is a positive increase for more than 89% of users. However, the log-scale of the y-axis could be misleading and one could object that the increase, although mostly positive, is relatively small. This does in fact prove to be the case and one can observe that the mean increase is always lower than 10, even for hundreds of recommendations.
To achieve a more detailed and comprehensive picture of the effect recommendations have on users' diversity, in Fig. 6 we have plotted the diversity increase for different numbers of recommendations (k = 10, 50, 100) and different orders of diversity (α = 0, 2, ∞) for all users. The effect of the number of recommendations on the diversity of users' attention is clear: diversity increases with the number of items recommended (from left to right) for all diversity orders. However, one can also observe that the capacity of the recommendations to improve users' diversity is closely related to the users' profiles. Such an increase can mainly be seen in users that listen to a small number of songs (darker dots on the plots). For very active users or those with broad musical listening habits, the recommendations barely produce any increase.
The plots also reveal that the recommendations do not have the same effect when diversity is considered in terms of variety or balance. In relation to variety ( α = 0 , top row), we can see that the recommendations improve diversity even with very few recommendations ( k = 10 ). This is not the case for other diversity orders. This suggests that the main effect of the recommendations is to introduce new categories into users' musical habits. As soon as balance is taken into account in the measure (middle and bottom row), the recommendations are less effective in increasing diversity.
These results indicate that collaborative filtering impacts the diversity of musical listening habits in a specific way. To study this relation in greater depth, we will now investigate exactly how the recommendations affect users' diversity.
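The contrast between variety and balance can be illustrated with a minimal sketch. It assumes that the diversity of order α is the true diversity (Hill number) of a category distribution, so that α = 0 counts the categories reached, while α = 2 and α = ∞ are increasingly dominated by the most listened categories. The function name and the listening counts are hypothetical and are not taken from the dataset.

```python
import math


def true_diversity(weights, alpha):
    """Hill number of order alpha for a list of non-negative weights (assumed measure)."""
    total = sum(weights)
    p = [w / total for w in weights if w > 0]
    if alpha == 0:
        return float(len(p))  # variety: number of categories reached
    if alpha == 1:
        return math.exp(-sum(x * math.log(x) for x in p))  # Shannon limit
    if math.isinf(alpha):
        return 1.0 / max(p)  # driven only by the dominant category
    return sum(x ** alpha for x in p) ** (1.0 / (1.0 - alpha))


# Hypothetical listening counts per category, before and after exposure:
# three new categories appear, but each receives only a small share of attention.
before = [90, 10]
after = [90, 10, 4, 3, 3]

for alpha in (0, 2, math.inf):
    print(alpha, true_diversity(before, alpha), true_diversity(after, alpha))
```

In this toy case the number of categories grows from two to five, whereas the orders that account for balance change only slightly, because the dominant category still captures most of the attention.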

Examining the effect of representative recommendations
We will conclude this section by presenting the cases of some specific users. This will shed light on the relation between recommended items and users' past musical records, and how, as a result, recommendations have different effects on users' attention. In Fig. 7, we present four users whose past musical records are either narrow (left column) or diverse (right column) and for whom recommendations (k = 50) generate either a decrease (bottom row) or an increase (top row) in diversity (α = 2). For each user, the top part of the plot displays the proportion of each category in the past musical record (left blue line) and in the recommendations (right orange line) independently, while the bottom part shows the distribution of the categories after the user is exposed to the recommendations. It is worth noting that, to ease the comparison between the different cases, we only display the most important categories (see footnote 8). Finally, the users presented in this figure are also visible in Fig. 6 with their respective red markers.
This figure reveals some interesting aspects in terms of how recommendations impact users. First, in all four cases, a significant fraction of the musical categories associated with the recommendations are completely new to the users, thus increasing variety. This is obvious in Fig. 7a-c, even if only the main categories are displayed. In Fig. 7d, the new categories are less dominant ones and are thus not visible in the distribution. To support this observation, we computed the number of new categories when a positive increase is observed in the whole dataset: on average, a positive diversity increase (for α = 2 and k = 50) comes with 322 new categories. This more than quadruples the number of categories in the past musical record of an average user.
Second, we can observe that users with a similar profile in terms of organic diversity may be affected differently by recommendations. The real effect of the recommendations depends strongly on how they fit in with the users' musical habits. For user fdeb (bottom left), for instance, the most recommended category, hiphop, corresponds to the main category in his/her past musical record. This completely unbalances the musical landscape of this user, whose musical record was already largely restricted to hiphop music.
By contrast, the musical exposure of user dbc1 (top left) is diversified as a result of the recommendations, although he/she also has one particular musical focus: progressive metal and progressive death metal. One reason for this is that those categories barely appear in the list of recommendations (only metal, related to the former, is highly represented). Instead, the recommendations expose other new or less dominant categories in the user's past musical record (such as alternative or gothic metal). This provides a far more balanced range of musical categories, leading to a higher diversity.
One might wonder whether this effect could be due to the fact that those users have a low organic diversity to begin with. The study of users 4c0b and ac10 (right column) shows that this is not the case. Both are highly active users (respectively in the top 7% and top 25% of the most active users in the dataset) with a very high organic diversity (top 2%). Yet, exposure to the recommendations affects them in completely different ways: the four main categories of user 4c0b (triphop, indie, rock and electronic) are also the four main categories of the recommendations, which unbalances the range of musical categories to which he/she is exposed, whereas the musical record of user ac10 is broadened by the recommendations owing to the more nuanced musical exposure. In particular, one can notice that indie (his/her most listened category) is one of the least recommended.
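The mechanism described in these four cases can be reproduced on toy data. The sketch below is only an illustration under stated assumptions: the category shares are invented, the post-exposure distribution is taken to be an equal-weight mixture of the past record and the recommendations (the weight function used in the study is not reproduced here), and diversity is measured as the inverse Herfindahl index, which corresponds to a true diversity of order α = 2.

```python
def herfindahl_diversity(dist):
    """Inverse Herfindahl index of a category -> share mapping (order alpha = 2)."""
    total = sum(dist.values())
    return 1.0 / sum((v / total) ** 2 for v in dist.values())


def expose(organic, recommended, share=0.5):
    """Hypothetical exposure model: mix past listening with recommendations."""
    categories = set(organic) | set(recommended)
    return {
        c: (1 - share) * organic.get(c, 0.0) + share * recommended.get(c, 0.0)
        for c in categories
    }


# Invented past record of a user focused on one category.
organic = {"hiphop": 0.7, "rap": 0.2, "soul": 0.1}

# Recommendations that mirror the dominant category, despite one new category.
reinforcing = {"hiphop": 0.9, "pop": 0.1}
# Recommendations that favour minor or new categories instead.
complementary = {"rap": 0.4, "soul": 0.3, "pop": 0.3}

print(herfindahl_diversity(organic))
print(herfindahl_diversity(expose(organic, reinforcing)))
print(herfindahl_diversity(expose(organic, complementary)))
```

Both recommendation lists introduce a new category, yet only the second one raises the diversity of order 2. This matches the observation that the fit between recommendations and past listening matters more than the novelty of the recommended categories alone.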

Conclusion and perspectives
In this paper, we have investigated the impact recommender systems have on the diversity of users' attention. Specifically, we examined a recently proposed framework that exploits the relations between users, items and categories to measure the diversity of a user's attention. By applying a collaborative filtering approach to a dataset that records users' past musical records, we were able to undertake a detailed analysis of how recommendations affect diversity in this context.
The results presented in this paper all show that recommendations tend to have a relatively high degree of diversity (Fig. 3), and globally improve the diversity of most users (Fig. 5). However, there are some limits to this capacity to diversify users' musical habits. First, it usually undermines the performance of the models (Fig. 4). Second, this improvement does not benefit all users. Rather, it is limited to users with a low level of activity or whose musical records are narrow in spectrum (Fig. 6). Finally, when recommendations do successfully improve diversity, this is mainly due to the discovery of new categories close to the musical records of users, which enhances variety. Recommendations usually fail, however, to provide a balanced exposure to the different musical categories (Fig. 7). These results are in line with recent papers which identified key complex effects of algorithmic recommendations in the context of musical platforms, although using different approaches. In particular, it is worth noting that Anderson et al. (2020) also specifically identified that algorithmic recommendations are more effective for users with a lower organic diversity, while Villermet et al. (2021) showed that although recommendations generally tend to drive new exploration of less popular content (hence increasing variety), such an effect primarily depends on the users' mode of consumption. These results, along with the ones presented in this paper, all show that when it comes to diversity, the extent to which recommendations fit in with a user's musical habits actually proves more important than the diversity of the recommendations themselves.
We believe that the method proposed in this paper, as well as the practical investigation conducted on a collaborative filtering approach, applied to a real musical dataset, shed new light on how researchers could leverage network representations to examine the ethical effects of algorithmic recommendations. This approach calls for future investigations, two of which are discussed below.

Individual versus collective diversity
By focusing on individual diversity (that is, the diversity computed from one node of the network), we were able to highlight the impact recommendations have for each individual user. We then used the average computed across all the individual diversities to measure the impact the recommendation algorithm has from a global perspective. However, this approach fails to uncover certain situations that could be intuitively described as not diversified. For example, if a very diverse subset of items were systematically recommended to all users, the average individual diversity would be measured as high, although one could argue that this is extremely undiversified from a global perspective since a unique subset is exposed to all users. Fortunately, the framework we used in this paper makes it possible to detect this situation by measuring collective diversity in addition to individual diversity. Analyzing the effect of recommendations through this collective lens would undoubtedly provide some meaningful insights. This approach would be helpful, for instance, for measuring the effect of polarization in news recommendations (Celis et al. 2019).
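The situation described above can be made concrete with a small sketch. It is an approximation under stated assumptions: collective diversity is taken here to be the diversity of the exposure pooled over all users, which is a simplification and not the definition used in the framework, and the users, items and function names are hypothetical.

```python
from collections import Counter


def herfindahl_diversity(counts):
    """Inverse Herfindahl index of a Counter of exposures (order alpha = 2)."""
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


def mean_individual_diversity(recs):
    """Average, over users, of the diversity of each user's own recommendations."""
    return sum(herfindahl_diversity(Counter(items)) for items in recs.values()) / len(recs)


def pooled_diversity(recs):
    """Diversity of all recommendations pooled together (assumed collective proxy)."""
    pooled = Counter()
    for items in recs.values():
        pooled.update(items)
    return herfindahl_diversity(pooled)


# The same four items are recommended to every user.
same_list = {u: ["a", "b", "c", "d"] for u in ("u1", "u2", "u3")}
# Each user receives four items that nobody else receives.
distinct_lists = {
    "u1": ["a", "b", "c", "d"],
    "u2": ["e", "f", "g", "h"],
    "u3": ["i", "j", "k", "l"],
}

for name, recs in (("same", same_list), ("distinct", distinct_lists)):
    print(name, mean_individual_diversity(recs), pooled_diversity(recs))
```

The mean individual diversity is identical in both scenarios, so it cannot tell them apart. The pooled value does: it stays at the size of the shared list in the first scenario and grows with the number of users in the second.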

Integrating dissimilarity
In contrast with most of the literature on recommender systems, we did not explicitly explore the dissimilarity facet of diversity. Instead, we focused on the variety (also referred to as the ‘coverage’) and the balance of the exposure. However, most standard dissimilarity metrics are based on the scalar product between the vectors of items, extracted from users' musical records. Therefore, these metrics result in a combination of user dissimilarity and item dissimilarity. Translated into the context of the diversity measure we used, dissimilarity is close to what would be measured with the metapath User→Item→User→Item→Category. Adding such a dimension to the approach