Public risk perception and emotion on Twitter during the Covid-19 pandemic

Successful navigation of the Covid-19 pandemic is predicated on public cooperation with safety measures and appropriate perception of risk, in which emotion and attention play important roles. Signatures of public emotion and attention are present in social media data, thus natural language analysis of this text enables near-to-real-time monitoring of indicators of public risk perception. We compare key epidemiological indicators of the progression of the pandemic with indicators of the public perception of the pandemic constructed from ∼20 million unique Covid-19-related tweets from 12 countries posted between 10th March and 14th June 2020. We find evidence of psychophysical numbing: Twitter users increasingly fixate on mortality, but in a decreasingly emotional and increasingly analytic tone. Semantic network analysis based on word co-occurrences reveals changes in the emotional framing of Covid-19 casualties that are consistent with this hypothesis. We also find that the average attention afforded to national Covid-19 mortality rates is modelled accurately with the Weber–Fechner and power law functions of sensory perception. Our parameter estimates for these models are consistent with estimates from psychological experiments, and indicate that users in this dataset exhibit differential sensitivity by country to the national Covid-19 death rates. Our work illustrates the potential utility of social media for monitoring public risk perception and guiding public communication during crisis scenarios.

in response to disasters is not well-understood, and there is a need for more longitudinal data on such responses with which this understanding can be improved (Burns and Slovic 2012). Our goal is thus to contribute to bettering this understanding, and we do so by exploring the empirical relationships present between the progression of the Covid-19 pandemic and the public's perception of the risk posed by the pandemic.
We explain our findings in terms of the existing body of literature surrounding public perception of risk, disasters, and human suffering in cognitive psychology. In particular, we draw from psychophysics, the field that studies the relationship between stimulus and subjective sensation and perception (Gescheider 1997). The search for psychophysical "laws" of perception has existed since at least the mid-19th century with the proposing of the Weber-Fechner law (Fechner et al. 1966), which posits that the smallest perceptible change ds in a physical stimulus of magnitude s is proportional to s. Thus, the perceived magnitude p of such stimuli follows
$$dp \propto \frac{ds}{s}. \tag{1}$$
In the continuum limit, this implies that p grows logarithmically with the physical magnitude s of the stimulus. More recently, empirical studies by Stevens (1975) supported, instead, a power law relationship between human perception of a stimulus and the physical magnitude of the stimulus:
$$p \propto s^{\beta}. \tag{2}$$
Summers et al. (1994) extended this concept to human sensitivity to war death statistics and found that a power law with exponent β = 0.32 best fit the data. A number of further studies have corroborated the extension of these psychophysical laws describing the subjective perception of physical magnitudes to the subjective evaluations of human fatalities (Slovic 2010; Fetherstonhaugh et al. 1997; Friedrich et al. 1999). In all of these, perception is a concave function of the stimulus, meaning that the larger the stimulus magnitude, the more it has to change in absolute terms to be equally noticeable. Thus, perception is considered relative rather than absolute, implying that our judgments are comparative in nature. This observation has been shown to account for deviations from rationality in economic decision-making (Weber 2004).
These proposed psychophysical laws of human perception present an opportunity for monitoring a population's response to a disaster scenario such as the Covid-19 pandemic. By evaluating the goodness of fit of these models to data on the perception of the progression of the pandemic, and determining the parameter values of such fits, we can describe the sensitivity of populations to the state of such crises, with important implications for risk communication and disaster management.
To this end, we make use of a massive Twitter dataset consisting of user-posted textual data to study the public's emotional and perceptual responses to the current public health crisis. Twitter provides convenient access to the conversation amongst members of the public across the globe on a plethora of topics, and many authors are studying several aspects of the public's response to the pandemic with it. Twitter is a particularly appropriate tool under conditions of physical distancing requirements and furlough schemes, where online communication has become more than ever a central feature of everyday life. Moreover, results from psycholinguistics and advances in natural language processing techniques enable the extraction of psychologically meaningful attributes and the reconstruction of cognitive structures (e.g. semantic networks) from textual data. With this dataset, our general approach is to offer a quantitative, spatiotemporal comparison between indicators of the state of the pandemic and the topics and psychologically meaningful linguistic features present in the discussion surrounding Covid-19 on social media on a country-by-country basis, for a selection of countries.
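As an illustration of how the two psychophysical laws discussed in this section could be fitted to data, the following is a minimal sketch, not the estimation procedure used in this work. It assumes the Weber-Fechner law is fitted in its integrated form p = a + k ln(s) and the power law as a straight line in log-log space; the function names, the synthetic data, and the use of ordinary least squares are all assumptions made for illustration. The exponent 0.32 is borrowed from Summers et al. (1994) purely as a test value.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def fit_weber_fechner(s, p):
    """Fit p = a + k * ln(s), the integrated form of dp ∝ ds / s."""
    return fit_line([math.log(v) for v in s], p)

def fit_power_law(s, p):
    """Fit p = c * s**beta by linear regression in log-log space."""
    log_c, beta = fit_line([math.log(v) for v in s],
                           [math.log(v) for v in p])
    return math.exp(log_c), beta

# Synthetic, noise-free stimuli (hypothetical values, not data from this study).
stimulus = [1, 10, 100, 1000, 10000]
power_response = [2.0 * v ** 0.32 for v in stimulus]
log_response = [1.0 + 0.5 * math.log(v) for v in stimulus]

c, beta = fit_power_law(stimulus, power_response)
a, k = fit_weber_fechner(stimulus, log_response)
```

On real data the two fits would be compared through a goodness-of-fit measure; which measure is appropriate is left open here.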

Related work
Our work is novel in that, to our knowledge, it is the first to use a large social media dataset spanning multiple countries to model the perceptual response of countries' citizens to the pandemic in the context of risk perception. To date, empirical validation of the aforementioned psychophysical laws has largely taken place in controlled laboratory settings, in which decisions, actions, and scenarios are artificial or hypothetical. Our work thus contributes to the body of literature surrounding risk perception by investigating these laws in a naturalistic setting.
However, numerous authors have used social media to analyse the public response to the Covid-19 pandemic. This includes work that has focused on the psychological burden of the social restrictions. For instance, Stella et al. (2020) use the circumplex model of affect (Posner et al. 2005) and the NRC lexicon (Mohammad and Turney 2010) to give a descriptive analysis of the public mood in Italy from a Twitter dataset collected during the week following the introduction of lockdown measures. In addition, Venigalla et al. (2020) have developed a web portal for categorising tweets by emotion in order to track mood in India on a daily basis.
Others have instead focused on negative emotions, as in the work of Schild et al. (2020), who study the rise of hate speech and sinophobia as a result of the outbreaks. More specifically on perception, Dryhurst et al. (2020) measured the perceived risk of the Covid-19 pandemic by conducting surveys at a global scale (n ∼ 6000) and compared countries, finding that factors such as individualistic and pro-social values and trust in government and science were significant predictors of risk perception. de Bruin and Bennett (2020) perform similar work in the USA. The closest works to our own that we have been able to find are those of Barrios and Hochberg (2020) and Aiello et al. (2020), both of which focus on the current pandemic using data from the USA. The former combine internet search data with daily travel data to show that regions in the USA with a greater proportion of Trump voters exhibit behaviours that are consistent with a lower perceived risk during the Covid-19 pandemic. The latter assess the epidemic psychology using Covid-19 Twitter data in the USA according to several linguistic features present in the tweets, and identify three psychological phases consistent with the refusal-suspended reality-acceptance stages of grief. Despite the above, we have been unable to find work that combines large-scale social media data with linguistic analysis to offer a spatiotemporal, quantitative analysis of emotion and risk perception during the Covid-19 pandemic across multiple countries.
Beyond the Covid-19 pandemic, our work is related to a small but growing body of literature on the use of data science in understanding human emotion and risk perception. In such work, natural language analysis has succeeded in supporting established linguistic theories such as the importance of the distribution of words in a vocabulary as a proxy for knowledge (Harris 1954), and regarding the relation between the uncertainty of events and the emotional response to their outcome (Feather 1963; Verinis et al. 1968). For instance, using textual data from Twitter, Bhatia found that unexpected events elicit higher affective responses than those which are expected. In another instance, the same author conducted experiments with 300 participants and predicted the perceived risk of several risk sources using a vector-space representation of natural language, concluding that the word distribution of language successfully captures human perception of risk (Bhatia 2019). Similar work has been conducted by Jaidka et al. (2020) in the area of monitoring public well-being, in which they compare word-based and data-driven methods for predicting ground-truth survey results for subjective well-being of US citizens on a county-level basis using a dataset of 1.5 billion tweets collected from 2009 to 2015.
The remainder of this paper is laid out as follows. In "Data", we present the dataset used in the subsequent analysis. In "Analysing the public's perception of the pandemic", we provide further details on the approach followed to explore the relationships between indicators of the state of the pandemic and the public's perception of the pandemic, and discuss possible explanations for our observations by drawing on psychological literature. In "Discussion and conclusions", we summarise and offer concluding remarks, along with a discussion of the limitations of the current work and suggestions for avenues of future work.
For our analysis, we consider only the English and Spanish tweets with a non-empty self-reported location field. We process every self-reported location using OpenStreetMap (https://planet.osm.org) and remove nonsensical locations (e.g. "Mars", "Everywhere", "Planet Earth"). This allows us to group the remaining tweets by country and proceed with our analysis on a country-by-country basis. To ensure the statistical significance of our analysis, we keep the countries with the highest number of tweets for each language, resulting in a geolocated Twitter dataset of ∼20 million original tweets posted by ∼4 million users in 12 different countries, which we summarise in Table 1.
1 The free Stream API randomly samples around 1% of the total tweets for the given queries.
2 A number of publicly available Twitter datasets have emerged in relation to the pandemic. We chose to work with this dataset since it used the most generic query terms among all the publicly available datasets we considered, and we wanted the least amount of bias possible for our analysis.

Epidemiological data
We measure the progression of the pandemic with the number of Covid-19 confirmed cases and deaths for all the countries in our analysis. The data were made publicly available by the Our World in Data repository (Roser et al. 2020). In particular, we take the daily Covid-19 cases and deaths, both in linear and logarithmic scale, since these four epidemiological indicators are the ones most frequently used to summarise the state of the pandemic, and are therefore frequently encountered by the public.

Analysing the public's perception of the pandemic
In this section, we study the public's perception of the pandemic on a country-by-country basis, using the countries with the highest number of tweets in the observation period (see Table 1). We work at this scale since the pandemic has often evoked nation-level responses, making nation-level analysis the most natural geographic scale. Our broad approach is to inspect and compare the linguistic features of the tweets posted by users in the Twitter dataset described in "Twitter dataset" section with the epidemiological data described in "Epidemiological data" section.

Defining perception from linguistic inquiry
Our goal is to explore the public's perception of the pandemic. To do this, we analyse the linguistic features present in the textual data generated by Twitter users, and map these features to psychologically meaningful categories that are indicative of the Twitter users' perception. Here, we are assuming that the words used by these Twitter users are indicative of their internal cognitive and emotional states (Tausczik and Pennebaker 2010), an assumption supported by Bhatia (2019), who predicts the perception of risk using text data. Thus, we quantify the linguistic content of each tweet using the Linguistic Inquiry and Word Count (LIWC) program (Pennebaker et al. 2015). LIWC has been widely adopted in text data analysis, and it has proven successful in applications ranging from measuring the perception of emotions (Yin et al. 2014) to predicting the German federal elections using Twitter (Tumasjan et al. 2010). Moreover, it has recently been used to successfully identify the early-epidemic psychological stages of grief in the current pandemic (Aiello et al. 2020). LIWC operates as a text analysis program that reports the number of words in a document belonging to a set of predefined linguistically and psychologically meaningful categories (Tausczik and Pennebaker 2010). For our purposes, a document d_i^t is a tweet posted on date t by a user based in country i. LIWC represents documents as an unordered set of words, and a LIWC category l is similarly a set of words associated with concept l. For a given document d_i^t, the linguistic score p^l for category l is the percentage of words in d_i^t that belong to l:
$$p^{l}(d_i^t) = 100 \times \frac{|\{w \in d_i^t : w \in l\}|}{|d_i^t|}. \tag{3}$$
There are many such categories l, including Family, Work, and Motion. We capitalise such category titles, and use the titles to refer to either the set of words associated with that category or to refer to the category itself. Linguistic scores from Eq. (3) for individual tweets will be noisy, as they are short documents.
Moreover, we are interested in the average response of the population of a country. For this reason, we group the tweets by country i and by date t, and denote these sets of tweets as D_i^t. We then compute the National Linguistic Score (NLS) for category l as the average of the linguistic scores over documents in D_i^t relative to an empirically observed Twitter base rate p_B^l [Eq. (4)]. The base rates p_B^l for the use of words on Twitter associated with category l are given in Pennebaker et al. (2015). Using Eq. (4) for all the selected linguistic categories, we construct multidimensional country-level time series that represent the evolution of the public perception of the pandemic, similar to the linguistic profiles introduced by Tumasjan et al. (2010). These perception dynamics are influenced by each user in our dataset, which may include bots and institutional or public relations accounts. We discuss the possible implications of this aspect of our data in "Discussion and conclusions" section.
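The scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LIWC program itself: the category is a hypothetical toy word set rather than the LIWC dictionary, tokens are matched exactly (LIWC's own tokenisation and wildcard stems such as "death*" are not reproduced), and "relative to the base rate" is read here as a percentage change from the base rate, which is an assumption about the form of Eq. (4).

```python
def linguistic_score(tokens, category):
    """Percentage of tokens in one document that belong to a category."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in category)
    return 100.0 * hits / len(tokens)

def national_linguistic_score(documents, category, base_rate):
    """Mean linguistic score over one country-day, relative to a base rate.

    ASSUMPTION: "relative" is implemented as a percentage change from
    base_rate; the exact normalisation may differ.
    """
    mean_score = sum(linguistic_score(d, category) for d in documents) / len(documents)
    return 100.0 * (mean_score - base_rate) / base_rate

# Hypothetical toy category and tweets (not taken from the dataset).
death_words = {"death", "died", "fatal"}
tweets = [
    ["covid", "death", "toll", "rises"],  # 1 of 4 tokens matches
    ["stay", "home", "stay", "safe"],     # 0 of 4 tokens match
]
scores = [linguistic_score(t, death_words) for t in tweets]
nls = national_linguistic_score(tweets, death_words, base_rate=10.0)
```

Averaging over all tweets in a country-day is what smooths out the noise of individual short documents.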
In Fig. 1, we show the collection of NLSs for a selection of relevant linguistic categories. We observe clear trends that, in most cases, are synchronized between countries and languages. In particular, most categories associated with emotion, notably Affect, Anger, Anxiety, Positive emotion, Negative emotion, and Swear words [swearing is associated with frustration and anger (Jay and Janschewitz 2008)], have their highest scores in mid-to-late March, when the World Health Organisation (WHO) announced the pandemic status of Covid-19 and most Western countries introduced more stringent social restrictions (Hale et al. 2020). These scores decay thereafter, indicating a relaxation of the emotional response in the conversation. This is consistent with results reported by Bhatia regarding the affective response to unexpected events, and with those of Aiello et al. (2020), where the Death NLS of the USA rises from late March on. A qualitatively similar trend can be seen in the Social processes panel, the category involving "all non-first-person-singular personal pronouns as well as verbs that suggest human interaction (talking, sharing)" (Pennebaker et al. 2015).
We also observe that health-related categories such as Death and Health show an overall rising trend, with Death rising most rapidly throughout March. These categories, with the exception of Positive Emotion and Health, peak again in the USA at the end of May, coinciding with the murder of George Floyd and the subsequent Black Lives Matter protests. Such universal trends are not apparent by visual inspection in the Money, Risk, and Sadness panels. An additional feature of these plots is the absolute scale of these values: in all cases, there is a significant percentage change from their baseline values, with large percentage increases observed initially in the use of words associated with Anxiety and later with Death, and a moderate percentage increase in the use of words associated with Risk.

Comparing the public's perception with epidemiological data
In this section, we explore the relationship between the NLSs described in "Defining perception from linguistic inquiry" section, which we use as a proxy for the public's perception, and the intensity of the pandemic, which we assume is the stimulus triggering this perception. Our measure of the intensity of the pandemic is the number of Covid-19 cases and deaths from the data described in "Epidemiological data" section.
A straightforward way of approaching this relationship is to compute the correlations between the NLSs and the epidemiological data on a per-country basis, and we show the average across countries of these per-country correlations in Fig. 2. On the one hand, we observe significant negative correlations in emotionally charged categories (e.g. Swear words, Anger, Anxiety, Affective processes), indicating a decay in emotion as the pandemic intensifies. On the other hand, categories related to health and mortality (Death, Health) and analytical thinking (Analytic) show significant positive correlations.
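The averaging of per-country correlations described above can be sketched as follows. This is an illustrative sketch only: the helper names and the toy series are hypothetical, the Pearson coefficient is assumed as the correlation measure, and significance testing is omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_correlation(nls_by_country, indicator_by_country):
    """Correlate within each country first, then average across countries."""
    rhos = [pearson(nls_by_country[c], indicator_by_country[c])
            for c in nls_by_country]
    return sum(rhos) / len(rhos)

# Hypothetical toy series for two countries (not data from this study).
nls = {"A": [3.0, 2.0, 1.0], "B": [1.0, 2.0, 4.0]}
deaths = {"A": [10, 20, 30], "B": [5, 10, 20]}
rho_a = pearson(nls["A"], deaths["A"])
avg_rho = mean_correlation(nls, deaths)
```

Correlating within each country before averaging keeps differences in national scale (population size, tweet volume) from driving the pooled coefficient.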

Psychophysical numbing
We believe the trends we observe in Fig. 1 and the correlations we observe in Fig. 2 are consistent with the notion of psychophysical numbing. This term was introduced by Robert Jay Lifton (1982), and developed by Paul Slovic (Slovic 2010; Fetherstonhaugh et al. 1997) in the context of human perception of genocides and their associated death tolls, to describe the paradoxical phenomenon in which people exhibit growing indifference towards human suffering as the number of humans suffering increases. By inspecting the correlations between the NLSs and the epidemiological indicators, we find that as the pandemic intensifies, in the sense of an increasing number of cases and deaths reported daily, our emotional response diminishes, as expected from a psychophysical numbing phenomenon.
Specifically, we observe negative correlations between almost all components of the NLSs associated with affect (Affective processes, Anger, Anxiety, Negative emotion, Positive emotion, and Swear words) and the epidemiological data. By inspecting Fig. 1, we see that every country exhibits similar downward trends in these components, which, with the exception of Anxiety, are all significantly lower than their baseline values throughout the observation period.
[Fig. 2 caption, partially recovered: correlations between the NLSs [Eq. (4)] and the epidemiological data, averaged across all countries. * "Risk" and "Analytic" are only available for the English-language LIWC. These two categories are thus averages across English-language countries only.]
This unusually low and decreasing Affect word count is accompanied by a growing awareness of the morbidity of the situation: we observe significant positive correlations between the Death NLSs and the daily national cases and deaths, indicating that the decrease in affect occurs simultaneously with and despite an attentional shift towards Covid-19-related mortality. We also observe a simultaneous increase in the Analytic component of each English-language dataset over this same period, indicating a movement towards more logical and analytical, rather than intuitive and emotional, thinking.
The potential implication of this is that the public is less perceptive of the risk that the pandemic poses to public health, since their emotional response is reduced and reducing (Sandman 1993). For example, Van Bavel et al. (2020) and Loewenstein et al. (2001) describe that risk perception is driven more by association and affect-based processes than analytic and reason-based processes, with the affect-based processes typically prevailing when there is disagreement between the two modes of thinking. The negative correlations between the intensity of the pandemic and affective processes, together with its positive correlation with the prevalence of analytic processes, suggest that public risk communication could be adjusted to re-balance the degree of affective and analytic thinking amongst members of the public to achieve favourable risk avoidance behaviour and, consequently, favourable public health outcomes.

Analysing the emotional framing of Covid-19 casualties with semantic networks
To support our claim that these observations are attributable to psychophysical numbing, we construct word co-occurrence networks using tweets in our dataset. Word co-occurrence networks are a class of linguistic networks in which nodes are words appearing in a body of text and an edge is placed between a pair of words with a weight given by some function of the number of co-occurrences of that pair in the text. Empirical word co-occurrence networks have been used in cognitive network science as approximate reconstructions of the author's latent cognitive structures, e.g. semantic or conceptual networks (Siew et al. 2019), with a given corpus deemed to be an empirical manifestation of such structures.
For example, Kenett et al. (2014) reconstruct participants' internal semantic networks on the basis of their responses in a free word association task, reporting that participants who were found independently to have lower creativity scores also had less well-connected semantic networks, specifically a higher modularity, average shortest path length, and diameter, and a lower small-world-ness (Humphries and Gurney 2008), than participants scoring more highly in creativity.
In the context of the Covid-19 pandemic, Stella et al. (2020) use word and hashtag co-occurrence networks in conjunction with word-to-emotion mappings to uncover complex emotional profiles amongst Twitter users posting from Italy during the first week of lockdown. More generally, a plethora of models for inferring semantic relationships between words in natural language processing tasks are based on some notion of word co-occurrence (Jones et al. 2020). The semantic proximity of a pair of words in such models has also been shown to possess predictive power regarding the subjective probability participants assign to hypothetical real-world events involving that pair of concepts (Bhatia 2016). For a more complete review of the use of linguistic networks in the study of human cognition, we refer the interested reader to Siew et al. (2019).
Given the well-established utility of word co-occurrence analysis in providing a view of authors' internal cognitive structures, we employ such an approach on W = Death ∪ Affect (the set of words in either the Death or Affect categories) in an attempt to approximate the Twitter users' internal semantic relationships between these two concepts. Specifically, we hypothesise that, if the psychophysical numbing effect is genuine, the modular structure of these networks will separate Death-related and Affect-related words more decidedly at higher daily death counts than at lower daily death counts. This would indicate that conversation regarding Covid-19-related mortality evokes a weaker emotional response at higher daily death counts.
Given a set T of tweets, the word co-occurrence network G(T) is represented by a weighted adjacency matrix A(T) in which the nodes are words belonging to W. Entry A_ij(T) counts the number of co-occurrences between words i and j across all tweets in T, and is computed as
$$A_{ij}(T) = \sum_{t \in T} B_{ti}(T)\, B_{tj}(T), \tag{5}$$
where B_tk(T) counts the number of instances of word k in tweet t ∈ T. We ignore self-edges by imposing A_ii = 0, since it is the relationship between distinct words that is of interest. (See "Appendix 3.1" for further details on the construction of these networks.)
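The construction of the co-occurrence matrix can be sketched as follows. This is a minimal illustration, assuming that the co-occurrence weight of a word pair is the product of the two words' counts within a tweet, summed over tweets; the function name, the dict-of-dicts representation, and the toy vocabulary are hypothetical, and the preprocessing described in the appendix is not reproduced.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_matrix(tweets, vocabulary):
    """Return A as a dict of dicts, with A[i][j] = sum over tweets of
    count_t(i) * count_t(j) for i != j, and A[i][i] = 0 (no self-edges)."""
    vocab = sorted(vocabulary)
    A = {i: {j: 0 for j in vocab} for i in vocab}
    for tokens in tweets:
        # Word counts for this tweet, restricted to the vocabulary W.
        counts = Counter(tok for tok in tokens if tok in vocabulary)
        for i, j in combinations(counts, 2):
            weight = counts[i] * counts[j]
            A[i][j] += weight
            A[j][i] += weight  # keep the matrix symmetric
    return A

# Hypothetical toy tweets and vocabulary (not the LIWC word lists).
toy_vocab = {"death", "fear", "panic"}
toy_tweets = [
    ["death", "fear", "death"],
    ["fear", "panic"],
    ["death", "toll"],
]
A = cooccurrence_matrix(toy_tweets, toy_vocab)
```

Iterating over distinct pairs only is what enforces the zero diagonal, rather than computing all products and clearing the diagonal afterwards.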

Qualitative overview of the death-affect partition
We identify three main periods for which we construct network snapshots of word co-occurrences (see Fig. 3a-c). The first period spans 11th March to 9th April 2020, in which the WHO declared Covid-19's pandemic status and governments generally imposed social restrictions. The second period spans 10th April to 23rd May, during which Covid-19 cases in most countries underwent exponential growth, while flattening out for some countries in Europe. The final period spans 24th May to 13th June, during which most countries were at the peak daily rate of Covid-19 cases or were in a stage of decreasing number of daily cases. Moreover, the Black Lives Matter protests were triggered by the murder of George Floyd in the USA in this period. In constructing these networks, we weight each country equally by taking a random sample of approximately 300,000 tweets from each country.
In Fig. 3a-c, we visualise these three snapshots for the English-language tweets. From these we observe that two clusters emerge in all cases: a left-hand cluster consisting mainly of Death-related words and a right-hand cluster consisting primarily of Affect-related words. We also observe that the relative sizes of these clusters vary over time: the Death cluster grows in size as the pandemic progresses, and remains separated from the Affect cluster. This indicates that the evolving structure of these networks may be consistent with our hypothesis of psychophysical numbing: throughout, Covid-19 casualties appear not to evoke a strong emotional response.
Fig. 3 Snapshots of the word co-occurrences associated with Death (green labels) and Affect (red labels) for English-language tweets aggregated across all analyzed countries in three different time windows (see sub-captions). The nodes are coloured according to their community label as obtained by maximising modularity with the Louvain algorithm (Blondel et al. 2008). We filtered edges with weight below 20 co-occurrences for visualisation purposes.
However, we find that a number of the most highly connected nodes in these Death clusters are Affect-related words: in the first network, the Affect-related words "panic", "positive", and "isolat*" appear; in the second, the words "care" and "fail*" also appear; and in the third, "protests" appears. While such words are normally associated with affective processes, we argue that some of these are more readily understood in terms of their association with Covid-19-specific topics that are less indicative of an affective experience in this context than they might be more generally. For example, "positive" is used very frequently in the context of the pandemic in relation to individuals "testing positive" for the virus.
In Table 2, we address five of these words, providing what we believe are the most plausible explanations for their association with conversation surrounding mortality during the Covid-19 pandemic.
Altogether, this initial examination indicates that words associated with a subjective emotional/affective experience and words related to death may be well-separated in this Twitter data, which is consistent with the notion of psychophysical numbing as an explanation for the trends and correlations observed in Figs. 1 and 2. For completeness, we include the equivalent co-occurrence graphs for the Spanish-language tweets in "Appendix 3.2", about which similar statements can be made.
Our discussion has so far been qualitative given that the aforementioned network snapshots (i) vary considerably in size, (ii) represent the aggregate conversation of the tweets across countries in our dataset, and (iii) involve crude aggregation over large time periods. In the next section, we address these issues by investigating the change in a number of network measures over time and discuss the extent to which they support our hypothesis of psychophysical numbing.

Quantitative analysis of the death-affect partition
To further probe this hypothesis of psychophysical numbing, we seek network measures that describe the strength of association between the concept of death and affective processes. Since the primary tenet of psychophysical numbing is that "the more who die, the less we care", our investigation is focused on the degree to which conversation around Covid-19 mortality evokes the use of affective language, which is our proxy for "degree of caring". In particular, we are interested in whether the emotional framing of such conversation changes as the daily death rates change in each country, where a less emotional conversation at higher daily death rates would support our hypothesis.
For this purpose, we investigate the dynamics of the following network measures over a sequence of comparable snapshots for each country:
1. The weighted modularity for the partition P_LIWC induced by assigning nodes to their respective LIWC categories, i.e. Death or Affect. We compute the weighted modularity following Newman (2004) as
$$Q_{\mathrm{LIWC}}(t) = \frac{1}{2m(t)} \sum_{i,j} \left[ A_{ij}(t) - \frac{k_i(t)\,k_j(t)}{2m(t)} \right] \delta(c_i, c_j), \tag{6}$$
where A_ij(t) is the weighted adjacency matrix of a network at snapshot t, k_i(t) = Σ_j A_ij(t) is the strength of node i, m(t) = (1/2) Σ_ij A_ij(t) is the total strength of the network, c_i ∈ {Death, Affect} represents the community assignment of node i under partition P_LIWC, and δ(·, ·) is the Kronecker delta function.
2. The fraction of the total strength of node "death*" ("muert*") that can be attributed to its connections with other nodes in the Death category:
$$f_{\mathrm{Death}}(t) = \frac{\sum_{j \in \mathrm{Death}} A_{ij}(t)}{\sum_{j} A_{ij}(t)}, \qquad i = \text{"death*" ("muert*")}. \tag{7}$$
The range of f_Death is bounded between 0 and 1, being 0 when all of the neighbours of node "death*" ("muert*") are in the Affect category and 1 when all of its neighbours are in the Death category.
We henceforth omit the explicit time-dependence of f_Death and Q_LIWC to simplify notation. The first measure tracks the quality of separation of the Death and Affect categories in these semantic networks. A larger Q_LIWC indicates a better separation between Death and Affect in the empirical word co-occurrences. This is relevant to our investigation of psychophysical numbing in the following sense: if the numbing effect is genuine, we should expect that Q_LIWC is larger at larger values of the daily number of deaths and lower at lower values of the daily number of deaths. This would indicate that conversation around Covid-19-related deaths evokes affective responses less strongly for larger death rates. If a weakening association between the concept of death and affective processes is an accurate measure of growing apathy and indifference, that is, of the "collapse of compassion" (Cameron and Payne 2011), then observing a positive correlation between Q_LIWC and the daily national number of deaths would provide evidence supporting our hypothesis of psychophysical numbing.
The second is a local measure of the strength of association between the concept of Covid-19 deaths (represented by the word "death*" ("muert*") in a tweet) and the affective processes within those tweets. A high f_Death value suggests a weak evocation of affective responses during conversation around Covid-19-related deaths.
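The two network measures can be sketched as follows. This is an illustrative sketch under stated assumptions: the modularity is the standard weighted form of Newman (2004) evaluated for a fixed partition, the network is stored as a symmetric dict of dicts, and the toy network and its weights are hypothetical.

```python
def weighted_modularity(A, community):
    """Weighted modularity of a fixed partition for symmetric weights A."""
    strength = {i: sum(A[i].values()) for i in A}
    two_m = sum(strength.values())  # twice the total edge weight
    q = 0.0
    for i in A:
        for j in A:
            if community[i] == community[j]:
                q += A[i].get(j, 0) - strength[i] * strength[j] / two_m
    return q / two_m

def death_fraction(A, community, node="death*"):
    """Share of the node's strength carried by edges into the Death category."""
    total = sum(A[node].values())
    within = sum(w for j, w in A[node].items() if community[j] == "Death")
    return within / total

# Hypothetical toy network: two Death words and two Affect words.
A = {
    "death*": {"toll": 3, "fear": 1},
    "toll": {"death*": 3},
    "fear": {"death*": 1, "panic": 2},
    "panic": {"fear": 2},
}
community = {"death*": "Death", "toll": "Death",
             "fear": "Affect", "panic": "Affect"}
q_liwc = weighted_modularity(A, community)
f_death = death_fraction(A, community)
```

In this toy example three of the four units of strength of "death*" go to another Death word, so f_death is 0.75, and the single cross-category edge keeps the modularity below its value for two fully disconnected clusters.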
To perform this analysis, we compute a sequence $(G_t)_t$ of higher-frequency snapshots than those in Fig. 3a-c, where $t = 1, \ldots, T$ labels each of the $T$ snapshots for a given country. Each snapshot represents, on average, the tweets contained in 3 consecutive days and, for each country, each snapshot has roughly the same number of tweets (see "Appendix 3.1" for details on the construction of these networks). With this construction, each network contains approximately the same number of nodes and edges and has approximately the same total strength, enabling a fair comparison of the network measures [Eqs. (6) and (7)] over time.
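One simple way to obtain snapshots with roughly equal numbers of tweets is sketched below. This is an illustrative assumption about how such a split could be done, not the procedure actually used; the function name `equal_count_snapshots` and the `(timestamp, text)` tuple format are hypothetical.

```python
def equal_count_snapshots(tweets, target_size):
    """Split time-ordered tweets into consecutive snapshots of near-equal size.

    tweets: list of (timestamp, text) tuples.
    target_size: desired number of tweets per snapshot.
    The remainder is spread over the first snapshots, so that no final
    snapshot ends up disproportionately small.
    """
    ordered = sorted(tweets, key=lambda tweet: tweet[0])
    n_snapshots = max(1, round(len(ordered) / target_size))
    base, extra = divmod(len(ordered), n_snapshots)

    snapshots = []
    start = 0
    for index in range(n_snapshots):
        size = base + (1 if index < extra else 0)
        snapshots.append(ordered[start:start + size])
        start += size
    return snapshots


# Hypothetical example: ten tweets with integer timestamps.
example_tweets = [(i, f"tweet {i}") for i in range(10)]
example_snapshots = equal_count_snapshots(example_tweets, target_size=3)
snapshot_sizes = [len(snapshot) for snapshot in example_snapshots]
```

Because the split is by tweet count rather than by calendar time, some snapshots cover longer periods than others, which is the trade-off described in "Appendix 3.1".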
In Fig. 4, we plot the z-scores of these network measures and of the log of the daily number of deaths, $\log s(t)$, for each country, and report the Pearson correlation coefficients $\rho_Q$ and $\rho_f$ of $Q_{\mathrm{LIWC}}$ and $f_{\mathrm{Death}}$ with $\log s(t)$, respectively, in parentheses above each plot. In general, we observe similar dynamics for both $Q_{\mathrm{LIWC}}$ and $f_{\mathrm{Death}}$. This is sensible, since both are measures of the relative strength of association within the two communities induced by the Death and Affect word sets. Furthermore, we observe a number of instances (most notably Canada, Colombia, Mexico, the UK, and the USA) in which there is a relatively strong correlation between these network measures and $\log s(t)$. These correlations are, however, weaker in other countries to varying degrees.
To verify that the observed $\rho_Q$ and $\rho_f$ can be attributed to the empirical word co-occurrences, we compute the same correlations for corresponding sequences of null network models. For our null model, we take the weighted version of the configuration model described in Britton et al. (2011). Here, a realisation $G^{\mathrm{null}}_{j,t}$ of the null model at random seed $j$ involves assigning node $i$ a number $D_i$ of stubs, where

$$D_i \sim p(d)$$

and $p(d)$ is the empirical degree distribution at time $t$. The $k$-th stub for node $i$ is then assigned a weight

$$w_{i,k} \sim p(w \mid D_i = d),$$

where $p(w \mid D_i = d)$ is the empirical distribution of weights for nodes with degree $d$. Stubs with the same weight are then joined with uniform probability.
As a baseline, we compute a sequence

$$\left( \left( G^{\mathrm{null}}_{j,t} \right)_{j=1}^{J} \right)_{t=1}^{T} \tag{8}$$

of null model ensembles for each country at each snapshot $t$, where $j = 1, \ldots, J$ labels each of the $J$ realisations of the null model. Here, we take $J = 100$ realisations per snapshot. We then compute the average network measure over each ensemble for both $Q_{\mathrm{LIWC}}$ and $f_{\mathrm{Death}}$, here denoted $Q^{\mathrm{null}}_{\mathrm{LIWC}}$ and $f^{\mathrm{null}}_{\mathrm{Death}}$ respectively. Similarly, we write the corresponding correlation coefficients as $\rho^{\mathrm{null}}_Q$ and $\rho^{\mathrm{null}}_f$. We report these coefficients, along with the correlation coefficients for the empirical networks, in Fig. 5.

Fig. 4 The network measures $Q_{\mathrm{LIWC}}$ and $f_{\mathrm{Death}}$ [see Eqs. (6) and (7) respectively], and the log of the national daily deaths (blue). Each panel represents a country, and in parentheses we show the Pearson correlation coefficients between these network measures and the log of the national daily deaths. Data is smoothed with a 3-day moving average and standardised by their z-scores
We find that, in most cases, the correlation coefficients are higher for the empirical word co-occurrences than for the null model counterparts. In particular, each of Australia, Canada, Colombia, South Africa, the UK, and the USA has $\rho_Q \gg \rho^{\mathrm{null}}_Q$. This difference is also present for Nigeria, although it is smaller in this instance. For the remaining countries, however, the differences are either negligible or in the opposite direction. Overall, nonetheless, we see that the average across countries of $\rho^{\mathrm{null}}_Q$ is low, whereas the average across countries of $\rho_Q$ is almost three times larger, and that $\rho_Q > \rho^{\mathrm{null}}_Q$ in nine cases out of twelve. A similar pattern is observed for $\rho_f$ and $\rho^{\mathrm{null}}_f$. In some instances (namely Australia, Colombia, South Africa, the UK, and the USA) we observe an increase of more than 0.2 in the correlation coefficients of the original sequence of snapshots relative to the corresponding sequences of null snapshots. This indicates that these increases in $\rho_f$ can be attributed to the empirical word co-occurrences. The difference is smaller but nonetheless in the expected direction for Chile, Mexico, and Nigeria. For the remaining countries, the difference is negligible.
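The comparison between empirical and null correlations can be sketched as follows: the network measure is averaged over the null ensemble at each snapshot, and the resulting series is correlated with the log of the daily deaths in the same way as the empirical series. All numbers below are hypothetical placeholders chosen for illustration, and the helper names are assumptions rather than the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def null_baseline_correlation(null_measures, log_deaths):
    """Correlate the ensemble-averaged null measure with log daily deaths.

    null_measures[t][j] is the measure for realisation j at snapshot t.
    """
    ensemble_mean = [sum(row) / len(row) for row in null_measures]
    return pearson(ensemble_mean, log_deaths)


# Hypothetical values for four snapshots (not taken from the paper).
log_deaths = [math.log(d) for d in (1, 10, 100, 1000)]
empirical_q = [0.1, 0.2, 0.3, 0.4]
null_q = [[0.2, 0.4], [0.1, 0.3], [0.3, 0.3], [0.2, 0.2]]  # J = 2 realisations

rho_emp = pearson(empirical_q, log_deaths)
rho_null = null_baseline_correlation(null_q, log_deaths)
```

A gap between `rho_emp` and `rho_null` of the kind described above is what suggests that the correlation arises from the empirical word co-occurrences rather than from the degree and weight distributions alone.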

Discussion of the semantic network analysis
Overall, our semantic network analysis provides evidence in favour of our hypothesis of psychophysical numbing, although this evidence is not definitive. We have seen that, for most countries, the separation between words associated with Death and Affect in our approximate semantic networks, as measured by $Q_{\mathrm{LIWC}}$ and $f_{\mathrm{Death}}$, becomes more pronounced as the national daily deaths rise, and that this relationship is generally weaker in the null model realisations.

Fig. 5 Correlation coefficients between the log of the national daily deaths and the network measures $Q_{\mathrm{LIWC}}$ and $f_{\mathrm{Death}}$ [see Eqs. (6) and (7) respectively]. We indicate null model counterparts with superscripts "null" (see main text for details)

There are nonetheless some exceptions to this statement. In particular, we find for Chile and Mexico that the difference between $\rho$ and $\rho^{\mathrm{null}}$ is marginal, but that both versions of the correlation coefficients are high. We also report low correlations between these network measures and the time series of daily deaths for Argentina, India, and Spain. For the case of Spain, however, there are two exogenous death-related events contributing to this anomalous behaviour and low correlation values (see "Appendix 2" for details). For the case of India, there is evidence suggesting that Twitter users posting from India have a strong preference for using Hindi in the expression of negative sentiment and emotion, but English in the expression of positive emotion (Rudra et al. 2016). Our use of an English-language dictionary for evaluating the emotional content of such tweets may therefore bias our results, and a more thorough analysis including tweets and dictionaries in both Hindi and English [or in "Hinglish", the blending of the two (Mathur et al. 2019)] should be performed in future. This is a specific case of a more general problem regarding the use of a single dictionary to analyse texts from different world regions, which typically differ in dialect.
For the remaining countries in our dataset, however, the empirical co-occurrences yield stronger correlations between the network measures and the national daily deaths than in the case of the baseline models, providing support for our psychophysical numbing hypothesis. Our observations thus indicate that psychophysical numbing may be a genuine effect for many Twitter users, but that other factors are possibly contributing to our results. Some of these factors are methodological issues with this work. First, we saw in Fig. 3a-c that LIWC is unable to account for context, and that there are a number of words that are classically associated with affective processes that are more appropriately associated with concepts surrounding mortality in the context of the pandemic. Second, in analysing word co-occurrences, we only retain tweets that contain at least two distinct words in the set Death ∪ Affect by construction. We have evaluated separately the proportion of tweets in each snapshot that contribute to our word co-occurrence networks, and have seen that this usually corresponds to between 10-20% of tweets for each snapshot, with between 20-30% of tweets involving the use of only one word in Death ∪ Affect. As such, this potentially leads to a systematic overestimation of the relative strength of association between words in Death ∪ Affect. Finally, as with most studies of organic social media data, it is hard to control for exogenous factors that form part of the Covid-19 conversation (e.g. Black Lives Matter protests, death-related news). It is thus important to treat such evidence as complementary to classical laboratory-based, controlled psychological experiments.

Modeling attention to Covid-19 casualties
In the previous section, we showed that the proportion of words indicating emotion in the set of tweets posted in each country diminishes as the pandemic intensifies. This indicates that the actual emotional response to the pandemic diminishes as the intensity of the pandemic increases, implying a psychophysical numbing effect. We supported this explanation by showing that the word co-occurrence networks induced by our set of tweets host a community structure that separates words in the Death and Affect dictionaries, suggesting that people do not talk about Covid-19 deaths in a highly emotional tone. We built on this analysis by tracking a number of measures of this supposed separation in higher-frequency sequences of snapshots for each country, observing that these network measures behaved consistently with our hypothesis of psychophysical numbing for a number of countries.
The following sections model the relationship between the progression of the Covid-19 pandemic and Twitter users' perception of it using established theories of psychophysical numbing. Until this point, we have used the emotional framing of the conversation around Covid-19 mortality as an indication of the degree of concern or indifference towards these casualties. However, one could argue that attention itself is equally indicative of the degree of concern experienced by individuals regarding such casualties. Indeed, both are recognised as key components of risk perception and the perception of threats (Slovic 2010). For this reason, we investigate the relationship between the typical perceptual response of individuals and a stimulus, in this case the daily number of reported deaths nationally, and seek to describe this relationship using established psychophysical laws, as in previous lab-based psychological experiments, e.g. Stevens (1957).

The Weber-Fechner law
Our analysis suggests that the public's perception of the progression of the pandemic is logarithmic or, at least, sublinear. From Fig. 2, we observe that the correlations between NLSs and epidemiological data are generally larger in absolute value whenever the latter are taken on a logarithmic scale. To exemplify this observation, we show in Fig. 6 the z-scores of the Death NLSs and of the logarithm of the daily number of deaths and cases within each country.
The general correspondence between all three normalised features in each country is striking. We propose that this can be explained in terms of the Weber-Fechner law (Fechner et al. 1966), which is a quantitative statement with its origins in psychology and psychophysics regarding humans' perceived magnitude $p$ of a stimulus with physical magnitude $s$. It states that a human's perception of the magnitude of a stimulus varies as the logarithm of the physical magnitude $s$ of the stimulus, meaning we are more sensitive to ratios when comparing different physical magnitudes than we are to absolute differences. In the continuum limit, Eq. (1) gives the following functional form for the Weber-Fechner law:

$$p(t) = k \log\left( \frac{s(t)}{s_0} \right) + R(t), \tag{10}$$

where $k$ and $s_0$ are real-valued parameters and $R(t)$ is the residual. Parameter $k$ determines the sensitivity of perception to changes in the stimulus $s$, while $s_0$ determines the minimum threshold that the stimulus $s$ must overcome in order to be perceived. The residual term $R(t)$ is a random variable representing noise not directly captured by the stimulus. For instance, exogenous events can trigger abrupt peaks in the Death score. This is the case, for example, with the murder of George Floyd in the USA, or the peak in Nigeria around April 17th 2020, triggered by a number of prominent African figures dying from Covid-19 around that day, including the Nigerian President's top aide (see "Appendix 2" for details on these exogenous peaks). In order to test the Weber-Fechner law, we fit a linear regression model to $p^{\mathrm{Death}}_i(t)$, the Death NLS time series in country $i$, and $\log s_i(t)$, the logarithm of the daily number of deaths in the same country, and summarize the results of these fits in Table 3. We find that Eq. (10) accurately models the data, with significant coefficients (p-value < 0.01) for all countries except Spain. The sensitivity parameter $k$ has the same order of magnitude for all countries with significant coefficients. However, the country with the lowest $k$ is approximately 3 times less sensitive than the country with the highest, indicating that Twitter users in different countries may react differently to the evolution of the pandemic. The minimum stimulus threshold $s_0$, on the other hand, is always small: for most countries, a single Covid-19 death in a given day is enough for the stimulus to be perceived. The exceptions are the USA and the UK, which need approximately 5 and 6 deaths, respectively, for the stimulus to be perceived; this is still small compared to the thousands of daily deaths registered in these countries during the observation period.

Fig. 6 The Death NLS $p^{\mathrm{Death}}_i(t)$ (blue), the logarithm of the daily deaths (orange), and the logarithm of the daily cases (green). Each panel presents a different country, with the country name provided in the subplot title. The correlation between $p^{\mathrm{Death}}_i(t)$ and the national daily death rate is given in parentheses for each country. Data is smoothed with a 3-day moving average and standardized with their z-score to make them visually comparable. Black vertical lines represent peaks in the death discourse caused by exogenous events not related to psychophysical numbing (see "Appendix 2" for details), which we remove from the time series
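As an illustration of how the two parameters of the Weber-Fechner law can be recovered from a linear regression of the Death NLS on the log of the daily deaths, consider the sketch below. Writing $p = k \log s - k \log s_0$, the slope is $k$ and the intercept is $-k \log s_0$, so that $s_0 = \exp(-\text{intercept}/k)$. The function name and the synthetic data are assumptions for illustration; days with zero deaths are assumed to have been excluded, since the logarithm is undefined there.

```python
import math


def fit_weber_fechner(deaths, perception):
    """Ordinary least squares fit of p = k * log(s / s0).

    deaths: daily death counts s(t), all strictly positive.
    perception: perceived magnitude p(t), e.g. the Death NLS.
    Returns the pair (k, s0).
    """
    x = [math.log(s) for s in deaths]
    n = len(x)
    mean_x = sum(x) / n
    mean_p = sum(perception) / n
    slope = (sum((a - mean_x) * (b - mean_p) for a, b in zip(x, perception))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_p - slope * mean_x
    s0 = math.exp(-intercept / slope)
    return slope, s0


# Synthetic, noise-free data generated with k = 0.4 and s0 = 5 (hypothetical values).
synthetic_deaths = [5, 10, 50, 200, 1000]
synthetic_perception = [0.4 * math.log(s / 5.0) for s in synthetic_deaths]

k_hat, s0_hat = fit_weber_fechner(synthetic_deaths, synthetic_perception)
```

With noise-free synthetic data the fit recovers the generating parameters; with real data the residual $R(t)$ absorbs the deviations, including those caused by exogenous events.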

Power-law perception
An alternative functional form for the relationship between human perception $p$ of a stimulus and the physical magnitude $s$ of the stimulus is a power law relationship

$$p(t) = \nu \cdot s(t)^{\beta} + R(t), \tag{11}$$

where $\nu$ and $\beta$ are parameters determining, respectively, the perception from a stimulus of unit magnitude and the growth rate of the perception as a function of the stimulus magnitude, and $R(t)$ is a residual term. This form has been shown to outperform the Weber-Fechner law in characterising human perception in a number of empirical studies (Stevens 1975). We therefore also fit this model to the relationship between the Death NLS $p^{\mathrm{Death}}_i(t)$ and national daily death counts $s_i(t)$ for each country $i$, reporting our results in Table 4.

In all cases, we observe sublinear exponents $\beta$ for the perception of the daily deaths data, with significant exponents (p-value < 0.01) ranging between 0.085 and 0.36. These exponents are of the same order of magnitude as the $\beta$ of 0.32 reported by Summers et al. (1994), who measured psychophysical numbing in participants' perception of death statistics in several laboratory experiments. As discussed previously, the data for Spain is unusual for a number of reasons, thus the model does not accurately describe the data in this instance. These results suggest that Twitter users in certain countries are more sensitive to changes in the number of deaths than others.

Table 3 The results from the fit of the Weber-Fechner law [see Eq. (10) and Fig. 6]. Overall, this model best describes the relationship between the daily number of deaths local to each country and the Death NLS.
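The power law of Eq. (11) can be fitted by linear regression in log-log space, since $\log p = \log \nu + \beta \log s$ when the residual is ignored. The following is a minimal sketch under that assumption; the function name and synthetic data are hypothetical, and both $s$ and $p$ are assumed to be strictly positive.

```python
import math


def fit_power_law(stimulus, perception):
    """Least squares fit of log p = log(nu) + beta * log s.

    stimulus: values of s(t), all strictly positive.
    perception: values of p(t), all strictly positive.
    Returns the pair (nu, beta).
    """
    x = [math.log(s) for s in stimulus]
    y = [math.log(p) for p in perception]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    beta = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
            / sum((a - mean_x) ** 2 for a in x))
    nu = math.exp(mean_y - beta * mean_x)
    return nu, beta


# Synthetic, noise-free data generated with nu = 2 and beta = 0.3 (hypothetical values).
synthetic_s = [1, 10, 100, 1000]
synthetic_p = [2.0 * s ** 0.3 for s in synthetic_s]

nu_hat, beta_hat = fit_power_law(synthetic_s, synthetic_p)
```

An exponent below 1, as in this synthetic example, corresponds to the sublinear (concave) perception discussed above: each additional death adds less to the perceived magnitude than the previous one.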

Model comparison
Both the Weber-Fechner law and the power law accurately model the relationship between the Death NLS and the daily number of reported deaths. Each captures the phenomenon in which "the first few fatalities in an ongoing event elicit more concern than those occurring later on" (Olivola 2015). By way of comparison, we present in Table 5 the normalised root mean squared errors (NRMSE), defined as

$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{n} \sum_{t=1}^{n} e(t)^2}}{p_{\max} - p_{\min}}, \tag{12}$$

for these models, in addition to a linear model between $p^{\mathrm{Death}}_i(t)$ and $s_i(t)$ as a baseline "null" model. Here, $e(t) = p(t) - \hat{p}(t)$ is the model residual, and $n$ is the sample size. The models are directly comparable in this sense, since each involves only two parameters. Bhatia (2016) performed a similar model comparison to test psychophysical laws for subjective probability judgements of real-world events, in that case finding that the linear relationship was the best. In our case, however, a linear relationship between $s$ and $p$ fits significantly worse than the present concave models of perception [see "Appendix 1" for the results of the linear model], reinforcing our hypothesis of psychophysical numbing.

While the Weber-Fechner law is better than the power law model overall, the difference in their goodness of fit, as measured by the NRMSE, is marginal. Both are reasonable descriptions of the observed relationship, and similar conclusions can be drawn from both.

Table 4 The results from the fit of a power law [see Eq. (11)] to the relationship between the Death NLS and the national daily death count. This is the best model in some cases, though it is outperformed by the Weber-Fechner law most times. *While we fit this model assuming a log-log relationship between $p$ and $s$, we compute $R^2$ with linear $p$ to make it comparable to the model implied by the Weber-Fechner law [see Eq. (14)]

Fig. 7 Comparison of the rank of each country as determined by their $k$ and $\beta$ parameters in the Weber-Fechner and power-law fits, respectively, which determine the sensitivity of Twitter users tweeting from each country to changes in the number of daily reported deaths. Low rank indicates high sensitivity relative to the remaining countries. The correlation between countries' ranks from both measures is high at 0.77

In particular, the parameters $k$ and $\beta$ from the Weber-Fechner law and power law, respectively, are analogous in their interpretation as the measure of the sensitivity of the nation's Twitter users to changes in the national Covid-19 daily death rate. To illustrate this, we rank the countries in our dataset in order of sensitivity to changes in the local death rate, as measured separately by these two parameters, and plot the correlation between the countries' ranks in Fig. 7. Here, low rank indicates high sensitivity to changes in the number of daily deaths nationally. The correlation between the two methods of ranking (according to $k$, the Weber-Fechner law slope parameters, and according to $\beta$, the power law model exponents) is high, with correlation coefficient 0.77. This shows that the sensitivity of each country is relatively robust between models. By both measures, therefore, Twitter users tweeting in English and Spanish from Australia and Argentina, respectively, appear to be the most sensitive to changes in the national daily death rate, while Twitter users posting in English from South Africa, India, and Nigeria and in Spanish from Spain and Chile appear to be the least sensitive to these changes.
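The rank comparison can be sketched as follows: each country is ranked by each sensitivity parameter (rank 1 being the most sensitive), and the two rankings are then correlated. The sketch assumes the rank correlation is Spearman's coefficient without ties, which may differ in detail from the computation behind Fig. 7; the country labels and parameter values are placeholders, not the fitted values from Tables 3 and 4.

```python
def ranks_descending(values):
    """Rank keys by value, with rank 1 assigned to the largest value (no ties assumed)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def spearman_rank_correlation(first, second):
    """Spearman correlation between the rankings implied by two parameter dicts."""
    rank_first = ranks_descending(first)
    rank_second = ranks_descending(second)
    n = len(first)
    squared_diff = sum((rank_first[name] - rank_second[name]) ** 2 for name in first)
    return 1.0 - 6.0 * squared_diff / (n * (n ** 2 - 1))


# Placeholder sensitivities for four hypothetical countries A-D.
k_values = {"A": 0.9, "B": 0.5, "C": 0.3, "D": 0.1}
beta_values = {"A": 0.30, "B": 0.35, "C": 0.10, "D": 0.05}

rank_rho = spearman_rank_correlation(k_values, beta_values)
```

In this placeholder example only countries A and B swap places between the two rankings, so the correlation remains high, which is the sense in which a ranking can be described as robust between models.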

Discussion and conclusions
We explored the country-by-country relationship between the linguistic features present in a large set of tweets posted in relation to the Covid-19 pandemic, and the progression/intensity of the pandemic as measured by the daily number of cases and deaths in each country in our dataset. By considering the change, relative to a baseline, in the percentage of words present in each tweet that are associated with a number of psychologically meaningful categories-here called linguistic scores-we observed significant trends that we believe are indicative of a psychophysical numbing effect (Slovic 2010). We found that the National Linguistic Scores [NLSs, see Eq. (4)] associated with emotion and affect decrease as the pandemic intensifies. This is in spite of a greater attentional focus on death and mortality and a simultaneous increase in use of words indicating analytic reasoning. We showed, by constructing word co-occurrence networks on different time periods of the pandemic, that words related to death co-occur more frequently with other words related to death than they do with words indicating affect and emotion. We constructed network measures of this separation between the concepts of death and emotion-namely the weighted modularity of the partition induced by the Death and the Affect LIWC dictionaries, and the fraction of strength of the "death*" (muert*) node attributable to connections with other nodes in the Death category-and showed that this separation became more pronounced at larger daily death rates for a number of countries. This is consistent with the notion of psychophysical numbing, which we believe may explain these observations.
We also showed that the psychophysical laws of Weber-Fechner and of power law perception in humans accurately model the relationship between the frequency of words related to death and the actual daily number of Covid-19 deaths in each country. We estimated sub-linear exponents in the power law perception function that are of similar values to values previously estimated from psychological experiments (Summers et al. 1994). These exponents, together with parameter k of the Weber-Fechner law [see Eq. (10)], tell us how sensitive the Twitter users in each country are to their national Covid-19 daily deaths, and were seen to vary by country, indicating inter-country differences in risk perception and sensitivity to death rates. Such sensitivities were consistent across models (see Fig. 7) suggesting that these measures are robust features of the data.
Overall, our results indicate that two key factors contributing to risk perception, attention and emotion (Slovic 2010), may be evolving in line with what psychophysical numbing predicts amongst members of the public. In general, both measures of the degree of concern towards Covid-19-related casualties expressed by the Twitter users in our dataset appear to decrease as the number of Covid-19-related casualties increases. This potentially reflects a collapse of compassion and a concavity in the value assigned to human lives as the number of potential casualties grows.
Our findings illustrate the signaling power of Twitter, and demonstrate its potential use as a tool for monitoring public perception of risk during large-scale crisis scenarios. With the modelling and visualisation approaches we employ in this paper, policy-makers and public officials could track in near-to-real-time the public's attitudes towards threats to public well-being and the prevalence of factors important to public perception of risk, including degree of outrage and relative attentional focus on the threat. Our findings also imply a functional form for agent perception of the system state in models of opinion dynamics. This will be instrumental for developing coupled opinion dynamics-epidemiological models, in which the bidirectional relationships between human perception, human behaviour, and epidemic progression are modelled endogenously.
A natural extension to this work would involve nowcasting and/or forecasting of certain economic indicators. This work has also been limited in that we assumed that only the national death rate is a significant predictor of perception. A more complete analysis should account for the effect of other countries' death statistics as a driver of local perception or, more broadly, advance a process-level explanation of the cross-cultural differences we observe in the sensitivity to death statistics. This analysis could also be enhanced by relating these measures of risk perception to behavioural data, which may be useful for understanding the role of emotions in driving behaviours that are conducive to public health during crises, since "people's behavior is mediated by their perceptions of risk" (Weber 2004). Further, a deconstruction of the aggregate indicators we have developed to the state and regional level may be necessary to more accurately characterise the relationship between local crisis progression and human risk perception.
It is important to acknowledge that additional factors may be at play and contributing to our findings. In particular, our dataset is a large social media dataset in which non-human accounts (for instance, bots, institutional accounts, and companies' public relations accounts) coexist with human accounts. Such public relations and institutional accounts can be subject to editorial constraints on the kind of language used, and therefore may not reflect any true underlying subjective experience. The use of tweets from such non-human accounts may nonetheless be appropriate. Indeed, it is widely accepted that news media play a significant role in shaping public attention and opinion, e.g. via the Cultivation or Agenda-Setting theories of consumer-media relations (Bryant and Miron 2006; McCombs and Shaw 1972). With almost half of all UK adults consuming news through social media in 2020 (https://www.ofcom.org.uk/research-and-data/tv-radio-and-on-demand/news-media/news-consumption), for example, the inclusion of news and institutional accounts may act as a proxy for public attention and opinion at large.
With regard to bots: previous large-scale studies of Twitter data have demonstrated the influence bots can have on the exposure of human accounts to emotional content (Stella et al. 2018) and the extent to which they can distort the discussion on certain topics (Bessi and Ferrara 2016). More recently and in the context of the current pandemic, bots have been shown to have a significant role in promoting political conspiracy theories (Ferrara 2020). By ignoring retweets and using unique original tweets only, we mitigate to some extent the potential effect of bots, which have previously been shown to engage in retweeting behaviour significantly more frequently than they do the creation of original content (Bessi and Ferrara 2016). It is nonetheless likely that, even if the hypothesised psychophysical numbing effect is genuine, our observations are partly attributable to the nature of content generated by these non-human accounts.
Furthermore, we stress that the results presented in this paper may be indicative only of the responses of Twitter users posting from each of these countries in each of these languages, so extrapolating these results to the broader population will only be possible with a better understanding of the biases present in, and representativeness of, the dataset at hand. While the demography of Twitter users has been to some extent mapped for the USA [see e.g. Greenwood et al. (2016)] and the UK [see e.g. Sloan (2017)], it is difficult to find similar studies for the remaining countries in our dataset, and thus to interpret these country-level differences in terms of potentially differing demographic representation on Twitter. We nonetheless advance this as a factor that possibly contributes to our results.
We also reiterate that our analysis has been crude in that we make use of a single dictionary for each language when extracting linguistic features from our data. This ignores important differences in dialect and language use between different nationalities and cultures, and can result in the systematic omission of certain linguistic features (Rudra et al. 2016; Mathur et al. 2019), which may also contribute to the observed differences between countries. A further difference between countries that may help to account for the observed results is the importance of religion in each country. The set of countries under consideration here spans the full spectrum of importance assigned to religion (Hackett et al. 2018), and attitudes towards death and the framing of mortality may vary accordingly by country. Despite these difficulties inherent to the empirical analysis of social media data, we nonetheless hope that our work inspires further investigations into the use of natural language processing and cognitive network science to investigate the prevalence of psychophysical numbing in naturalistic contexts.
Abbreviations LIWC: Linguistic Inquiry and Word Count; WHO: World Health Organization; NLS: National linguistic score.
In this section, we present further results of our models to give a more complete overview of their quality. Besides the Weber-Fechner law and power law models [see Eqs. (10) and (11)], we use the following linear relationship between $s$ and $p$ as our benchmark model

$$p(t) = a \cdot s(t) + b + R(t), \tag{13}$$

where $a$ and $b$ are parameters. We summarize our results for the linear model in Table 6.
For all models, we compute the $R^2$ values

$$R^2 = 1 - \frac{\sum_{t=1}^{n} e(t)^2}{(n-1)\, \sigma_p^2}, \tag{14}$$

where $e(t) = p(t) - \hat{p}(t)$ is the model residual, $\sigma_p^2 = \sum_{t=1}^{n} (p(t) - \mu_p)^2 / (n-1)$ is the variance of $p(t)$, and $n$ is the sample size. The $R^2$ values for all models are summarized in Table 7. (Note that as the power law model implies a log-normal residual, the $R^2$ values can be negative.) From this table we see that, once again, the Weber-Fechner law is generally a better fit to the data across all countries, but that the power law and Weber-Fechner models are often comparable and significantly better than the linear model.
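The two goodness-of-fit measures used in the model comparison, $R^2$ as described above and the NRMSE of Eq. (12), can be computed from observed and fitted values as in the sketch below. The function names and numerical values are illustrative assumptions; the second call shows how $R^2$ becomes negative when the fitted values are worse than simply predicting the mean.

```python
import math


def r_squared(observed, fitted):
    """R^2 = 1 - sum of squared residuals / ((n - 1) * sample variance of observed)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((p - p_hat) ** 2 for p, p_hat in zip(observed, fitted))
    variance = sum((p - mean_obs) ** 2 for p in observed) / (n - 1)
    return 1.0 - ss_res / ((n - 1) * variance)


def nrmse(observed, fitted):
    """Root mean squared error normalised by the range of the observed values."""
    n = len(observed)
    rmse = math.sqrt(sum((p - p_hat) ** 2 for p, p_hat in zip(observed, fitted)) / n)
    return rmse / (max(observed) - min(observed))


# Hypothetical observed and fitted values (not taken from the paper).
observed_p = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.1, 1.9, 3.2, 3.8]
poor_fit = [4.0, 3.0, 2.0, 1.0]

r2_good = r_squared(observed_p, good_fit)
r2_poor = r_squared(observed_p, poor_fit)
nrmse_good = nrmse(observed_p, good_fit)
```

Both measures depend only on the residuals and on the spread of the observed values, so they can be applied in the same way to the Weber-Fechner, power law, and linear fits.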
We also show in Figs. 8 and 9 scatterplots of the Death NLSs against the logarithm of the daily number of deaths in each country, with the y-axis in linear and logarithmic scales, respectively. Red lines indicate the line of best fit, with slopes equal to $k$ and $\beta$ in Eqs. (10) and (11), respectively.

In this section, we address significant deviations in the National Linguistic Scores from our proposal of psychophysical numbing as an explanation for their trends over the observation period, and suggest possible explanations for their occurrence (see Table 8). We stress that, although we double-checked every peak, the following table may contain errors.

Appendix 3.1: Further technical details on co-occurrence network construction
In constructing the word co-occurrence networks presented in "Psychophysical numbing" section, we perform basic text preprocessing, including taking the lower-case form of all letters, removing URLs, removing punctuation, and removing the following small set of stopwords from the vocabulary: to, today, too, has, have, like. We retain hashtags, since LIWC also recognises hashtags and because hashtags are an essential aspect of communications on Twitter. It is also necessary to account for the fact that a number of "words" appearing in the LIWC dictionary are in fact regular expressions to which many complete words in the Twitter dataset map. For example, the "word" "isolat*" appears in the English LIWC dictionary, to which each of the following words would map: "isolate", "isolated", "isolating". Thus, construction of the word co-occurrence networks $G'_i$ involves a two-step procedure: first, constructing the raw word co-occurrence networks $G_i$, in which the nodes are words exactly as they appear in the Twitter dataset; and then reducing this to a quotient graph $G'_i$ by contracting nodes in $G_i$ that are matched by the same regular expression in the LIWC dictionary. More formally: the LIWC dictionary implies an equivalence relation $\sim$ on the vocabulary $V$ implied by the Twitter dataset, such that $v \sim u$ for words $v, u \in V$ if both $v$ and $u$ are matched by the same regular expression in the LIWC dictionary. The weights of edges between nodes $v' \subset V$ and $u' \subset V$ in $G'_i$ are then taken to be

$$w_{G'}(v', u') = \sum_{x \in v'} \sum_{y \in u'} w_G(x, y),$$

where $w_G(x, y)$ is the weight of edge $(x, y)$ in $G$. Note that $w_G(x, y) = w_G(y, x)$ and $w_G(x, y) = 0$ if $(x, y)$ is not an edge in $G$.
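The contraction step can be illustrated with the following sketch, in which raw words are mapped to the dictionary pattern that matches them and edge weights between the resulting classes are summed. It assumes that patterns are either exact words or prefixes ending in "*", that unmatched words are kept as their own nodes, and that edges between two words of the same class are discarded; these choices, the function names, and the toy data are assumptions for illustration rather than the authors' implementation.

```python
from collections import defaultdict


def canonical(word, patterns):
    """Return the first dictionary pattern matching the word, or the word itself."""
    for pattern in patterns:
        if pattern.endswith("*"):
            if word.startswith(pattern[:-1]):
                return pattern
        elif word == pattern:
            return pattern
    return word


def contract(raw_edges, patterns):
    """Build the quotient-graph edge weights from raw word co-occurrence weights.

    raw_edges: dict mapping (word_x, word_y) -> weight, each undirected edge once.
    Returns a dict mapping a sorted pair of class labels -> summed weight.
    """
    merged = defaultdict(float)
    for (x, y), weight in raw_edges.items():
        class_x = canonical(x, patterns)
        class_y = canonical(y, patterns)
        if class_x != class_y:  # assumption: drop within-class edges
            merged[tuple(sorted((class_x, class_y)))] += weight
    return dict(merged)


# Toy raw network (hypothetical counts).
toy_patterns = ["isolat*", "death*"]
toy_raw_edges = {
    ("isolate", "death"): 2.0,
    ("isolated", "death"): 3.0,
    ("isolate", "isolating"): 1.0,
}

quotient_edges = contract(toy_raw_edges, toy_patterns)
```

Here "isolate" and "isolated" collapse onto the node "isolat*", so their co-occurrences with "death" are summed into a single edge of weight 5 between "isolat*" and "death*".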
To construct the higher-frequency sequences of snapshots, we impose a minimum document frequency of $5 \times 10^{-3}$ ($2.5 \times 10^{-3}$ for Spanish tweets) for each term in the vocabulary in order to reduce the effect of noise. In Table 9, we summarise the approximate number $N_{\mathrm{tweets}}$ of tweets per snapshot for each country. The number of tweets per snapshot for each country was chosen in order that each country had approximately the same number of data points separated by approximately 3 days, and such that edge effects did not yield a final snapshot with a disproportionately low number of tweets. Although this ultimately led to some snapshots representing aggregation over longer periods than others, it yielded sequences of networks that are comparable in terms of their total strength and order, enabling reasonably fair comparison of the modularities of the partition induced by the Death and Affect LIWC categories.

Table 8 ... Fig. 1. Most of these peaks arise from the murder of George Floyd and the consequent protests. We focused mainly on the Death and emotionally-charged categories as these are the ones that are most related to the psychophysical numbing effect we describe in the main text

Country | Date | Category | Description
USA | June 2, 2020 | Anger | Donald Trump responding near the White House to the protests against the murder of George Floyd (Collinson 2020)

$N_{\mathrm{tweets}}$ | $1 \times 10^4$ | $5 \times 10^4$ | $1 \times 10^5$ | $1 \times 10^5$

Fig. 10 Snapshots of the word co-occurrences associated with death ("muerte", green labels) and affect ("afecto", red labels) for Spanish-language tweets aggregated across all analyzed countries in three different time windows (see sub-captions). The nodes are coloured based on the community labels obtained by maximising modularity using the Louvain algorithm (Blondel et al. 2008). We filtered edges with weight below 20 co-occurrences for visualisation purposes