The older the better: a fourth-order adaptive network model for reward-driven choices of emotion regulation strategies over time

The choice of emotion regulation strategy changes depending on context, which in psychology is referred to as ‘flexibility’. Besides that, choices of emotion regulation strategies are subject to various other factors, ranging from culture and gender to expectations about their effect and age. This paper considers the phenomenon that choices of emotion regulation strategies change adaptively with age. In addition, the choices within specific age frames are driven by some kind of reward that adaptively affects the learning of a specific emotion regulation strategy. These adaptive phenomena involve plasticity or metaplasticity of different orders. They have been modeled by a fourth-order adaptive mental network model in which the choice of emotion regulation strategies is motivated by reward prediction and different age phases have their own adaptive influences. Simulation results are discussed to evaluate the adaptive network model. The fourth-order adaptive network model presented here extends a second-order adaptive network model previously presented at the conference COMPLEX NETWORKS 2019.


Introduction
Changes in one's behaviour with age in general, and in the use of emotion regulation strategies in particular, are well established in today's literature; e.g., (Carstensen et al. 1999; Allen and Windsor 2019). Two types of alteration take place over time in the case of emotion regulation strategies. One is referred to as flexibility (Aldao et al. 2015): changes in the use of strategies for regulating emotions according to the demands of the context. The other type concerns long-term tendencies in using specific strategies from one's repertoire, in other words, greater use of one specific strategy rather than another as age increases (Nakagawa et al. 2017; Allen and Windsor 2019; Carstensen et al. 1999). Such transitions over time relate to metaplasticity, as they involve a change in the daily learning while moving from one strategy to another.
Besides the factors that contribute to age-related tendencies in the use of emotion regulation strategies, there are also other factors that specifically influence the rate of change at a specific time point (Schultz et al. 1997; Reynolds et al. 2002). One of those considered here is reward (Schultz et al. 1997), which influences, in the short term, how frequently a specific strategy is used. This paper considers both types of factors in developing the adaptive mental network model for age- and reward-driven choices of emotion regulation strategies. The model is based on the extensive literature available on the topic.
Within neurocognitive science, changes in the choice of emotion regulation strategies over time are considered specific forms of plasticity. However, the speed and persistence of such changes also vary over time and with events. This effect is named metaplasticity (Abraham and Bear 1996; Abraham 2008) or 'plasticity of plasticity'. While this may suggest a two-level approach to adaptivity, closer inspection shows that the picture is more detailed: still other factors may influence those at these two levels.
One of the factors this paper considers is reward. Reward is involved in metaplasticity in some way (Schultz et al. 1997): if there is a reward for one of the strategies, the choice of that strategy becomes more probable; conversely, if there is punishment, the probability of using that strategy decreases. Here, the overall tendency of the person still goes in the same direction, but at the same time the person also takes advantage of the available reward. Eventually, the choice of strategy, considered as one's behaviour, depends on which reward one prefers most.
The multi-level adaptive network model in this paper has been developed and simulated using the adaptive causal network-oriented modeling approach and its supporting environment from (Treur 2019, 2020a, 2020b). This approach has already proven applicable to various adaptive as well as non-adaptive network models, for instance (Ullah et al. 2018, 2020; Gao et al. 2019; Ullah and Treur 2019). Moreover, this paper extends a conference paper at COMPLEX NETWORKS 2019 (Ullah and Treur 2019a). In the remainder of this paper, Section 2 provides background from the empirical sciences for the model, Section 3 presents the multi-level adaptive network model, and Section 4 gives insight into the simulation results of the model. Finally, Section 5 concludes the paper.

Background
Researchers from the (neuro)cognitive sciences agree that priorities of emotion regulation strategies change with age (Zimmermann and Iwanski 2014). Various scholars have explored the factors of influence and have identified a number of them. For instance, in their Socioemotional Selectivity Theory, (Carstensen et al. 1999) identify time constraints, which increase as a person gets older, as a source of change in the motivational goals of older people. (Zimmermann and Iwanski 2014) confirm these differences, indicating age-related developmental alterations in the use of emotion regulation strategies; therefore, older and younger adults use different emotion regulation strategies. (Carstensen et al. 1999) further characterize this shift: when growing older, the dominant choice moves from response-focused strategies, such as suppression, to antecedent-focused strategies, such as reappraisal. This has further been confirmed by (John and Gross 2004; Charles and Carstensen 2007), who indeed found greater use of reappraisal relative to suppression with increasing age (from 20 to 60 years).
Another important factor behind such differences in the choice of emotion regulation strategies has been pointed out by (Underwood et al. 1992; Gross and John 2003): parental influence at a younger age. According to (Underwood et al. 1992; Gross and John 2003), parents encourage their children to suppress their emotions at a younger age, whereas in older age individuals turn to more frequent use of reappraisal because of their greater awareness and attention to the positive side of matters (Masumoto et al. 2016). Moreover, (Charles 2010) takes these findings a step further in his "Strength and vulnerability integration theory", which provides a basis for this shift: the move from response-focused to antecedent-focused strategies can occur because physiological flexibility decreases with age, making response-focused strategies more difficult to apply.
In terms of efficacy, (John and Gross 2004; Haga et al. 2009) not only consider cognitive reappraisal, unlike suppression, an efficient emotion regulation strategy; they also consider it helpful in decreasing psychological distress. Similarly, (Yeung et al. 2011) consider reappraisal an effective emotion regulation strategy and confirm its contribution to a positive emotional state. In contrast, expressive suppression may successfully suppress the expression of emotions (Gross et al. 1997), but psychological distress still remains high.
Turning to the details of these changes in the choice of emotion regulation strategies: in general, such changes are referred to as plasticity and metaplasticity. According to (Labouvie-Vief et al. 1989), cognitive maturity tends to increase with age and, therefore, older people have more mature cognition than their younger counterparts (Labouvie-Vief and Blanchard-Fields 1982). Moreover, the findings of (Brandtstädter and Renner 1990; Heckhausen and Schulz 1995) also provide a good basis for the model in this paper by indicating that flexibility in goal adjustment increases with age. Similarly, various other studies report that the intensity of negative emotion decreases with age compared with younger adults (Regier et al. 1988; Mroczek and Kolarz 1998; Charles et al. 2001; Charles and Almeida 2006; Rocke et al. 2009); hence, a weaker negative reaction occurs (Birditt and Fingerman 2003; Birditt et al. 2005), apart from individual differences in self-regulation (Rothbart et al. 2000) and cognitive abilities (Verhaeghen 2011).
In addition to the above-mentioned age-related changes, various studies have found other parameters influencing changes in the choice of emotion regulation strategies. For instance, (Reynolds et al. 2002) put forward genetics as a source of individual differences, and environment as another component influencing the choice of emotion regulation strategies along with age. It is argued that genetic influences, which define one's individual standing at a certain point in time, may act on top of the other changes taking place over time. Genetics not only defines one's individual standing in emotion regulation; it also influences cognitive performance more generally in late adulthood (McClearn 1997; McGue and Christensen 2001). Similarly, (Reynolds et al. 2002) highlight the role of educational attainment in emotion regulation strategies. Apart from these studies, some studies specifically address the persistence of learning, which can increase or decrease due to reward and punishment, for instance (Schultz et al. 1997).
Reward is a concept related to dopamine and is used to describe the positive value that creatures assign to a behavioral act, an object or an internal physical state (Schultz et al. 1997). (Wise 1989; Robbins and Everitt 1996) describe reward through the behaviour that it may elicit. Behavioral experiments suggest that learning takes place when expectations about the future change, which can be interpreted in terms of reward and punishment. Similarly, reward is considered to act as a reinforcer that increases the frequency of behavioral responses during learning and maintains the behavior (Schultz et al. 1997). According to (Farashahi et al. 2017), metaplastic synapses adjust to reward prediction in the environment. Similarly, some other reward-dependent learning theories highlight the role of unpredictability in learning by stating that "learning is driven by the unpredictability of the reward by the sensory cue" (Rescorla and Wagner 1972; Mackintosh 1975). This idea can be tested by considering a situation in which the reward is entirely predicted by a sensory cue; in that case, no further learning takes place. For each possible reward, there is a time-dependent learning rate. Similarly, (Farashahi et al. 2017) confirm experimentally that the learning rate increases or decreases, driven by the better and worse reward options. Moreover, in their three-factor learning approach, (Kuśmierz et al. 2017) name reward as one of the three factors; it acts as a facilitator in different types of learning by providing feedback about the performance of the entire network.
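As a minimal sketch, and not part of the model in this paper, the prediction-error idea attributed above to (Rescorla and Wagner 1972) can be written as the classic update V ← V + α(r − V) for a single cue. The learning rate alpha and the reward sequences are illustrative assumptions. The sketch shows that learning stops once the cue fully predicts the reward, and that withholding the reward lowers V again.

```python
from typing import List


def rescorla_wagner(rewards: List[float], alpha: float = 0.2, v0: float = 0.0) -> List[float]:
    """Trajectory of the associative strength V of one cue under V <- V + alpha * (r - V).

    The term (r - V) is the prediction error: only the part of the reward
    that the cue does not yet predict drives learning.
    """
    v = v0
    trajectory = []
    for r in rewards:
        prediction_error = r - v
        v += alpha * prediction_error
        trajectory.append(v)
    return trajectory


# Reward delivered on every trial: V rises towards 1 with shrinking steps.
acquisition = rescorla_wagner([1.0] * 30)
# Reward already fully predicted (V = 1): prediction error 0, so nothing is learned.
saturated = rescorla_wagner([1.0] * 5, v0=1.0)
# Reward withheld after learning: negative prediction errors lower V again.
extinction = rescorla_wagner([0.0] * 10, v0=acquisition[-1])
```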
As discussed, reward and dopamine play an important role; this is summarized in the following two quotations: 'As mentioned, animals do not learn to lever-press for such things as food, water or sexual contact if the training takes place while dopamine function is impaired. Moreover, although well-trained animals perform normally for an initial period, they do not continue to do so if their dopamine systems are blocked.' (Wise 2004, p. 7/8).
'In the 1940s and 1950s, Hebb (1949) was among the first to propose that alterations of synaptic strength based local patterns of activation might serve to explain how conditioned reflexes operated at the biophysical level. Bliss and Lomo (1973) succeeded in relating these two sets of concepts when they showed long-term potentiation in the rabbit hippocampus. Subsequent biophysical studies have shown several other mechanisms for altering synaptic strength that are closely related to both the theoretical proposal of Hebb (1949) and the biophysical mechanism of Bliss and Lomo (1973). Wickens (1993) and Wickens and Kotter (1995) proposed the most relevant of these for our discussion, which is often known as the three-factor rule. What Wickens (1993) and Wickens and Kotter (1995) proposed was that synapses would be strengthened whenever presynaptic and postsynaptic activities co-occurred with dopamine, and these same synapses would be weakened when presynaptic and postsynaptic activities occurred in the absence of dopamine. Indeed, there is now growing understanding at the biophysical level of the many steps by which dopamine can alter synaptic strengths (Surmeier et al. 2009).' (Glimcher 2011, p. 10/12) These findings provide a solid ground for the model introduced in this paper.
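The three-factor rule described in the second quotation can be illustrated with a deliberately simplified update. This is a sketch of the idea, not the rule used in this paper's model. It assumes that dopamine is either present or absent, that the learning rate `rate` is a fixed illustrative value, and that the weight is clamped to [0, 1].

```python
def three_factor_update(w: float, pre: float, post: float, dopamine_present: bool,
                        rate: float = 0.1) -> float:
    """One step of a simplified three-factor rule, keeping w in [0, 1].

    Co-active presynaptic and postsynaptic activity strengthens the synapse
    when dopamine is present and weakens it when dopamine is absent; without
    co-activity the weight does not change.
    """
    coactivity = pre * post
    direction = 1.0 if dopamine_present else -1.0
    w_new = w + rate * coactivity * direction
    return min(1.0, max(0.0, w_new))


# Repeated co-activation with dopamine strengthens the synapse ...
w = 0.5
for _ in range(3):
    w = three_factor_update(w, pre=1.0, post=1.0, dopamine_present=True)
# ... and the same co-activation without dopamine weakens it again.
w_blocked = three_factor_update(w, pre=1.0, post=1.0, dopamine_present=False)
```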

The multi-level adaptive network model
The adaptive network model introduced in this paper has been developed using the Network-Oriented Modeling approach for adaptive networks described in (Treur 2019, 2020a, 2020b). Figure 1 represents the connectivity of the base network and Fig. 2 displays the connectivity of the multi-level adaptive network model as a whole, with the different adaptation levels on top of the base network. Nomenclature for the base network in Fig. 1 is given in Table 1 and the nomenclature for the adaptive network model in Fig. 2 is supplied in Table 2.
The base network contains all mental states involved in the regulation of negative emotions. In this base network, the connections from $fs_b$ to $cs_{reapp}$ and from $fs_b$ to $cs_{sup}$ determine which strategy is used to regulate the emotions. In expressive suppression, only the expression of emotions is suppressed, while the intensity of the negative stimuli still remains high; therefore, various studies have found negative psychological consequences of expressive suppression. Cognitive reappraisal, in contrast, changes the belief or interpretation regarding the negative stimuli, which as a result also decreases the intensity of the negative feelings and stimuli. Therefore, reappraisal has a more positive psychological impact.
In the overall network model (Fig. 2), the first- and second-order adaptation states (at the first and second reification levels) model the parents' influence on the choice of emotion regulation strategies, especially at the initial stages of life as a child. As already mentioned in Section 2, parents encourage their children to suppress their emotions. So, in this network model, the person initially uses suppression as a strategy for handling emotions. Later on, the person's priority changes from suppression to reappraisal as she or he grows and gains awareness. As also mentioned in Section 2, this model further considers a reward factor, which influences the persistence factor of the learning taking place. If a child obeys the parents and suppresses his or her emotions, he or she gets some reward, and the quality of that reward affects how long the child will stick to the strategy the parents have encouraged. On the other hand, when the person grows older, reward priorities change: she or he starts valuing internal psychological ease more than worldly reward. So, the person starts reappraising negative emotions instead of suppressing their expression.
Activation of the control states through the connections from $fs_b$ to $cs_{reapp}$ and from $fs_b$ to $cs_{sup}$ in the base network depends on the weights of these connections, which are adaptive and are represented by the first-order adaptation states $W_{fs_b,cs_{reapp}}$ ($X_{19}$) and $W_{fs_b,cs_{sup}}$ ($X_{20}$). So, these first-order adaptation states (at the first reification level), i.e., the W-states, represent plasticity of the emotion regulation choice. The second-order adaptation states, represented by M- and H-states (at the second reification level), represent a form of metaplasticity: they determine the persistence and speed factors of the Hebbian learning taking place at the first adaptation level. Short-term rewards (from parents in young age and internal psychological satisfaction in older age) influence the persistence factors represented by the M-states at the second adaptation level. The connections for the effect of reward are from $shrt_{s,rwrd}$ to $M_{W_{fs_b,cs_{sup}}}$ for suppression and from $shrt_{r,rwrd}$ to $M_{W_{fs_b,cs_{reapp}}}$ for reappraisal. In this way, the second adaptation level handles the influence of parents or other factors on one's choice of emotion regulation strategies by affecting the persistence and learning mechanism behind this choice, in accordance with the two quotations at the end of Section 2.
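For orientation, role matrices in this approach are executed through the generic difference equation of network-oriented modeling (Treur 2020a, 2020b). It is shown below as general background from that approach, not as a specification taken from Box 1 and Box 2. Each state $Y$ is updated from the states $X_1, \ldots, X_k$ that have connections to it:

```latex
\begin{aligned}
Y(t+\Delta t) &= Y(t) + \eta_Y\Big[\mathbf{c}_Y\big(\omega_{X_1,Y}X_1(t),\ldots,\omega_{X_k,Y}X_k(t)\big) - Y(t)\Big]\Delta t \\
cs_{sup}(t+\Delta t) &= cs_{sup}(t) + \eta_{cs_{sup}}\Big[\mathbf{c}_{cs_{sup}}\big(\ldots,\,W_{fs_b,cs_{sup}}(t)\,fs_b(t),\,\ldots\big) - cs_{sup}(t)\Big]\Delta t
\end{aligned}
```

With reification, a connection weight $\omega_{X_i,Y}$ is replaced by the current value of its W-state, the speed factor $\eta_Y$ by the value of an H-state, and a combination-function parameter such as the persistence $\mu$ by the value of an M-state. The second line instantiates this for $cs_{sup}$, whose input from $fs_b$ is weighted by $W_{fs_b,cs_{sup}}$. Its other incoming impacts, written as $\ldots$, are not spelled out here.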
However, such influences of parents or other factors change over the years. Therefore, in the network model the above-mentioned connections from $shrt_{s,rwrd}$ to $M_{W_{fs_b,cs_{sup}}}$ and from $shrt_{r,rwrd}$ to $M_{W_{fs_b,cs_{reapp}}}$ change over time as well. To this end, a third adaptation level was added with the states $W_{shrt_{r,rwrd},M_{W_{fs_b,cs_{reapp}}}}$ and $W_{shrt_{s,rwrd},M_{W_{fs_b,cs_{sup}}}}$, which represent the weights of these connections. During a lifetime, the latter states also change; to model this, a fourth adaptation level was added with the M- and H-states $X_{27}$ to $X_{30}$. All these processes are shown in the simulation scenario depicted in Figs. 3, 4, 5 and 6.

Specification of the multi-level adaptive network model in role matrix format is given in Box 1 and Box 2. Each matrix addresses the role a specific network characteristic plays for each of the states $X_j$ listed in its rows, followed by the relevant data for that characteristic. Role matrix mb in Box 1 specifies, for each state $X_j$, the states that affect it through incoming connections; these are states either at the same level or at a lower level. The incoming connections from higher levels are specified in the other role matrices. Role matrix mcw provides the weights of the incoming connections to state $X_j$. The green cells contain non-adaptive connection weights between −1 and 1, whereas the red cells contain state numbers, indicating adaptive connection weights; in other words, they represent downward connections from a state at a higher level to a state at a lower level. For instance, the incoming connections to states $X_{15}$ ($cs_{reapp}$) and $X_{16}$ ($cs_{sup}$) from state $X_8$ ($fs_b$) in the base network, listed in mb, have adaptive weights, shown in mcw as red cells containing the numbers of the upper-level states $X_{19}$ and $X_{20}$, respectively. Here $X_{19}$ is $W_{fs_b,cs_{reapp}}$ and $X_{20}$ is $W_{fs_b,cs_{sup}}$.
The values of these W-states change through Hebbian learning, a form of plasticity. Similarly, the persistence and speed factors of the Hebbian learning of the W-states change through metaplasticity and are specified in their respective role matrices, i.e., mcfp (persistence) and ms (speed factor). The same mechanism of representation and adaptation applies to the rewards and their effect through the upper levels.
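The following minimal sketch shows how M- and H-states can act on a W-state. It assumes the standard Hebbian combination function of the network-oriented approach, hebb_μ(V1, V2, W) = V1·V2·(1 − W) + μ·W, with the persistence μ supplied by an M-state and the speed factor η by an H-state. The parameter values and the Euler step size are illustrative assumptions, not those of the simulations in Section 4.

```python
def hebb(v1: float, v2: float, w: float, mu: float) -> float:
    """Hebbian combination function hebb_mu(V1, V2, W) = V1*V2*(1 - W) + mu*W."""
    return v1 * v2 * (1.0 - w) + mu * w


def simulate_w(v1: float, v2: float, w0: float, mu: float, eta: float,
               dt: float = 0.1, steps: int = 200) -> float:
    """Euler simulation of W(t+dt) = W(t) + eta * (hebb_mu(V1, V2, W) - W) * dt.

    mu plays the role of the M-state (persistence), eta that of the H-state (speed).
    """
    w = w0
    for _ in range(steps):
        w += eta * (hebb(v1, v2, w, mu) - w) * dt
    return w


# No co-activation (V1 = V2 = 0) for a previously learned weight of 0.8:
kept = simulate_w(0.0, 0.0, 0.8, mu=1.0, eta=0.5)   # full persistence: W stays at 0.8
faded = simulate_w(0.0, 0.0, 0.8, mu=0.9, eta=0.5)  # lower persistence: W decays
```

Solving hebb_μ(V1, V2, W) = W gives the equilibrium W = V1V2 / (V1V2 + 1 − μ). With full co-activation and μ = 0.9, W therefore settles near 0.909 instead of 1. A lower persistence from the M-state thus both weakens what is learned and lets it fade without co-activation, which is the route through which reward acts on the M-states described in this section.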
In Box 2, it can be seen that the W-states at the first and third adaptation levels have downward incoming connections from states at the second and fourth adaptation levels, one for persistence (in matrix mcfp) and one for the speed factor (in matrix ms). The matrix mcf specifies the combination functions used to aggregate multiple incoming impacts on a state $X_j$. There is currently a library of more than 36 combination functions. Moreover, self-defined combination functions can also be added to the library, which makes the approach quite flexible.

Fig. 3 Switching from Suppression to Reappraisal over time
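To illustrate what entries in mcf refer to, the sketch below defines two combination functions commonly found in this approach: the scaled sum ssum_λ(V1, ..., Vk) = (V1 + ... + Vk)/λ and the advanced logistic sum alogistic_{σ,τ}. It also adds a hypothetical dictionary-based registry that shows how a self-defined function could be added. The registry, the Python names and the default parameter values are assumptions for illustration, not the modeling environment's interface.

```python
import math
from typing import Callable, Dict, Sequence


def ssum(values: Sequence[float], lam: float = 1.0) -> float:
    """Scaled sum: (V1 + ... + Vk) / lambda."""
    return sum(values) / lam


def alogistic(values: Sequence[float], sigma: float = 5.0, tau: float = 0.5) -> float:
    """Advanced logistic sum with steepness sigma and threshold tau.

    Normalised so that an all-zero input gives 0 and very large inputs approach 1.
    """
    s = sum(values)
    raw = 1.0 / (1.0 + math.exp(-sigma * (s - tau))) - 1.0 / (1.0 + math.exp(sigma * tau))
    return raw * (1.0 + math.exp(-sigma * tau))


# Hypothetical registry mirroring the idea of an extensible function library.
LIBRARY: Dict[str, Callable[..., float]] = {"ssum": ssum, "alogistic": alogistic}


def register(name: str, fn: Callable[..., float]) -> None:
    """Add a self-defined combination function under a new, unused name."""
    if name in LIBRARY:
        raise ValueError(f"combination function {name!r} already exists")
    LIBRARY[name] = fn


# Example of a self-defined function: the maximum of the incoming impacts.
register("maximum", lambda values: max(values))
```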
These role matrices are used as input for the modeling environment described in (Treur 2020a, 2020b), which can then execute them. They constitute a conceptual representation of the model and show how each node is influenced by the other nodes in the network through the different roles these nodes play. Box 1 and Box 2 specify these influences from the base network up to the fourth reification level. For more background on the role matrix specification format used here and on the modeling environment, see (Treur 2020a, 2020b).
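The way a red cell in mcw is resolved can be sketched as follows. The sketch assumes a simple Python representation in which a plain number is a fixed weight and a hypothetical `StateRef` object points to the reification state whose current value serves as the weight. Only the connections from $X_8$ to $X_{15}$ and $X_{16}$ named in Section 3 are included, and the state values are invented for illustration. This is not the input format of the modeling environment.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class StateRef:
    """An mcw cell that names a state (a 'red cell'): that state's value is the weight."""
    index: int


Cell = Union[float, StateRef]

# Hypothetical excerpt of mb and mcw: X15 (cs_reapp) and X16 (cs_sup) both
# receive input from X8 (fs_b); their weights are the reified states X19 and X20.
MB: Dict[int, List[int]] = {15: [8], 16: [8]}
MCW: Dict[int, List[Cell]] = {15: [StateRef(19)], 16: [StateRef(20)]}


def resolve(cell: Cell, values: Dict[int, float]) -> float:
    """A plain number is a fixed (non-adaptive) weight; a StateRef is looked up."""
    return values[cell.index] if isinstance(cell, StateRef) else cell


def weighted_impacts(j: int, values: Dict[int, float]) -> List[float]:
    """Return omega_{i,j} * X_i for every incoming base connection of state X_j."""
    return [resolve(w, values) * values[i] for i, w in zip(MB[j], MCW[j])]


# Illustrative early-life situation: the suppression weight X20 exceeds X19.
state_values = {8: 0.9, 19: 0.2, 20: 0.7}
print(weighted_impacts(15, state_values))  # impact of fs_b on cs_reapp
print(weighted_impacts(16, state_values))  # impact of fs_b on cs_sup
```

Because the weight is looked up at each step, any change in $X_{19}$ or $X_{20}$ produced by Hebbian learning immediately changes how strongly $fs_b$ drives each control state.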

Simulation results
This section presents some of the simulation results for the fourth-order adaptive network model described in Section 3. Table 3 provides the initial values of the states of the model; all other states start at 0. Note that in Table 3 the states dealing with suppression have been given slightly positive initial values, in contrast to those for reappraisal, which, as a strategy, develops later in life. Higher initial values for the latter states would do no harm, but also no good, as they would immediately trend downward and vanish. Figure 3 shows all the states in the base network, illustrating the basic functioning of the network model. It can be seen that, initially, at a younger age, the person uses suppression $cs_{sup}$ to regulate his or her emotions. In the case of suppression, the intensity of the negative emotions still remains high, as can be confirmed from the curves for the negative belief $bs_-$ and the feeling state $fs_b$ in Figs. 3 and 4, which remain high even after the control state for suppression $cs_{sup}$ is activated. Similarly, distress $ms_{dstrss}$ also remains high in the case of suppression.

Figure 4 shows only the most relevant states of the base network, for clarity. In the later part of the simulation in Figs. 3 and 4, it can be seen that reappraisal $cs_{reapp}$ gets activated and suppression $cs_{sup}$ gets deactivated. This represents the shift from suppression to reappraisal as age increases. In the case of reappraisal, the figures show that the negative belief $bs_-$ and the feeling state $fs_b$ decrease as the person changes his or her interpretation of the stimuli. Similarly, distress $ms_{dstrss}$ also decreases in the case of reappraisal.

Box 2 Role matrices for aggregation and timing
The intensity of the distress execution state $es_{dstrss}$ after regulation of emotions through suppression or reappraisal also helps to differentiate between the two kinds of rewards that a person gets from applying a specific strategy. Increased distress shows that the person is suppressing his or her emotions, which elicits a response from the parents in the form of some kind of reward, such as appreciation. This reward motivates the child, for a short time, to use suppression. In contrast to such rewards at a younger age, as the person grows older she or he starts to value psychological satisfaction and ease more, which are obtained by reappraising. The outcome of this entire phenomenon is visible in the simulation results of the base network, but the phenomenon itself has been modeled at the third- and fourth-order adaptation levels. Simulation results for the adaptation levels are shown in Figs. 5 and 6. Figure 5 shows the patterns for the first- and second-order reification states. The W-states represent the Hebbian learning for the connections from $fs_b$ to $cs_{reapp}$ and from $fs_b$ to $cs_{sup}$. It can be seen that, initially, $W_{fs_b,cs_{sup}}$ is higher. This is due to the parental influence at a younger age, when children are encouraged to suppress. At later stages of the person's life, a decrease in $W_{fs_b,cs_{sup}}$ and an increase in $W_{fs_b,cs_{reapp}}$ can be seen, which represents the shift from suppression to reappraisal. The plasticity of this plasticity (metaplasticity) is further depicted by the second-order adaptation states. In these second-order adaptation states, the M-states represent the persistence factor and the H-states represent the speed factor of the Hebbian learning represented by the W-states. The figure shows that these M- and H-states are also dynamic.
The M- and H-states controlling the persistence and speed factors of the Hebbian learning for suppression, $W_{fs_b,cs_{sup}}$, decrease, while the M- and H-states for the Hebbian learning of reappraisal, $W_{fs_b,cs_{reapp}}$, increase as age increases. This phenomenon is referred to as 'plasticity of plasticity' or metaplasticity. These results are consistent with the findings on the topic discussed in Section 2. Figure 6 provides the simulation results for the third- and fourth-level reification states, which depict the short-term reward influence on the choice of emotion regulation strategies. Initially, at a younger age, short-term rewards from parents motivate the use of suppression, but as the person grows older, the valuation of rewards changes. The figure shows that, initially, this short-term reward-driven Hebbian learning $W_{shrt_{s,rwrd},M_{W_{fs_b,cs_{sup}}}}$ is high but decreases to zero as the person grows older. These curves cover both the young and the older age phases of a person: the short-term suppression reward $W_{shrt_{s,rwrd},M_{W_{fs_b,cs_{sup}}}}$ remains high throughout the person's childhood, and as the person grows older, the short-term reappraisal reward $W_{shrt_{r,rwrd},M_{W_{fs_b,cs_{reapp}}}}$ keeps increasing until it reaches 1. This represents the psychological reward of using reappraisal as a strategy in older age. It is worth mentioning that the reward-based short-term learning (the W-states at the third reification level) and its persistence and speed factors (the M- and H-states at the fourth reification level) take place at the connection from $shrt_{s,rwrd}$ to $M_{W_{fs_b,cs_{sup}}}$ for the suppression reward and at the connection from $shrt_{r,rwrd}$ to $M_{W_{fs_b,cs_{reapp}}}$ for reappraisal. This means that the persistence factor of the Hebbian learning of the base-network connections is influenced by the reward.
In other words, reward does influence the choice of strategies in the short term, but over one's entire life the overall pattern of tendencies in the choice of emotion regulation strategies remains the same. The M- and H-states controlling the persistence and speed factors of the W-states at the third reification level are located at the fourth reification level. The fourth level shows that these reward-driven influences are also dynamic.

Conclusion
The model presented in this paper describes the choice of emotion regulation strategies throughout one's life, from childhood to old age. In this model, the influences of two external factors have been considered. The first two adaptation levels represent parental influence on the choice of one's emotion regulation strategies; this influence is strongest during childhood and weakest in older age. The third and fourth adaptation levels represent how the influence of short-term reward adapts with aging. External reward, like parental influence, has much influence at younger ages. As demonstrated through the simulation results, parental and external reward influences decrease as the person grows older because of a shift in internal valuation, whereby the person starts to rate his or her internal psychological state higher than external rewards. Moreover, as the person gains awareness, they increasingly prioritize their own well-being. Together, these changes enable the shift in the choice of emotion regulation strategies shown in the simulation results.
The main results are in line with the literature, and also with those in (Ullah and Treur 2019a). However, whereas that paper only considers age-related differences in the choice of emotion regulation strategies, the present paper also considers the role of short-term reward in this choice and its adaptation during aging.
Overall, this paper also shows the scope of the adaptive causal modeling approach (Treur 2020a, 2020b) used for the design and simulation of our model. This adaptive network modeling approach supports the modeling of a wide variety of dynamic and adaptive processes.