Uncovering the internal structure of Boko Haram through its mobility patterns

Boko Haram has caused nearly 40,000 casualties in Nigeria, Niger, Cameroon and Chad, becoming one of the deadliest Jihadist organisations in recent history. At its current rate, Boko Haram takes part in more than two events each day, taking the lives of nearly 11 people daily. Yet, little is known concerning Boko Haram’s internal structure, organisation, and its mobility. Here, we propose a novel technique to uncover the internal structure of Boko Haram based on the sequence of events in which the terrorist group takes part. Data from the Armed Conflict Location & Event Data Project (ACLED) gives the location and time of nearly 3,800 events in which Boko Haram has been involved since the organisation became violent 10 years ago. Using this dataset, we build an algorithm to detect the fragmentation of Boko Haram into multiple cells, assuming that travel costs and reduced familiarity with unknown locations limit the mobility of individual cells. Our results suggest that the terrorist group has a very high level of fragmentation and consists of at least 50–60 separate cells. Our methodology enables us to detect periods of time during which Boko Haram exhibits exceptionally high levels of fragmentation, and identify a number of key routes frequently travelled by separate cells of Boko Haram where military interventions could be concentrated.


Introduction
Boko Haram is one of the deadliest armed organisations in recent history. Since the Jihadist group became violent in 2009, it has caused nearly 40,000 casualties and displaced 2.4 million people around Lake Chad, an impoverished region divided between Nigeria, Niger, Cameroon and Chad in West Africa (UNHCR, 2019). Boko Haram has adopted a strategy of violence against Sufi and Salafi religious movements, traditional leaders, the wider civilian population, and the Nigerian state, which the organisation regards as corrupted and illegitimate (Matfess 2017). The organisation, which declared its own "state among the states of Islam" and swore allegiance to the Islamic State in March 2015 (Pieri 2019), adheres to a literal interpretation of the religious texts of Islam and enforces a strict adherence to religious law. Its goal is to overthrow secular governments, cut their ties with the West and destroy the social and political order of the Lake Chad region.
Over the years, Boko Haram has been torn apart by internal rivalries that have their origins in the balance of power between the various leaders and factions that compose the main organisation (Zenn 2019). Boko Haram is now split between a faction led by Abubakar Shekau that controls parts of Borno State around Gwoza and the Cameroon-Nigeria border, and another faction led by Abu Mus'ab al-Barnawi, that is mainly active in the islands of Lake Chad, West of Maiduguri and along the Niger border in the Diffa region (Seignobos 2017).
Yet, due to the secretive nature of Boko Haram, the internal structure of the organisation remains largely unknown. Of particular importance is whether Boko Haram is a centralised organisation structured around a few key leaders or a network of decentralised cells (Anugwom 2018). Centralised organisations in which decisions and resources flow from the top down are theoretically more efficient than decentralised ones but also less resilient to counter-terrorism measures (Cunningham et al., 2016). Decentralised organisations in which individual cells are relatively independent from the core are more difficult to dismantle but also much more challenging to coordinate than centralised ones (Everton 2013;Price 2019). The issue of whether Boko Haram fighters tend to operate locally or travel extensively between their historical bases in northern Nigeria and their new sanctuaries in neighbouring countries also remains underexplored. Terrorist organisations capable of coordinating attacks over long distances are a much greater threat to African states and the international community than local organisations whose attacks are isolated in one particular region (Walther et al., 2020).
The paper proposes a novel technique to uncover the internal structure of Boko Haram, based on the sequence of events in which a terrorist cell takes part, using disaggregated data from the Armed Conflict Location & Event Data Project (Raleigh et al. 2010) on political violence in the region. We develop an algorithm to detect the fragmentation of Boko Haram into several cells, assuming that travel costs and reduced familiarity with unknown locations limit the mobility of the organisation. Shedding light on both the social structure and spatial organisation of Boko Haram, our analysis suggests that Boko Haram has a very high level of fragmentation and consists of at least 50-60 separate active cells. The method also identifies a number of key routes frequently travelled by separate Boko Haram cells, including international border crossings, where military interventions could be concentrated.

Background
Complex networks and quantitative models of crime and terrorism
Network analysis can yield powerful insights into the latent structure of spatial and temporal data, as is often the case with violent events (Yuan et al. 2019). Yet, as noted by Malcolm Sparrow in one of the earliest studies of crime and network analysis: "It would be enormously gratifying, therefore, if we could simply throw the existing network analysis toolkit at criminal intelligence databases, and come away with a set of valuable new insights. Of course it is not that easy" (Sparrow 1991). The main challenges of network-based studies of crime and terrorism are usually data incompleteness, dynamic behaviour (Gera et al. 2017) and the fact that "dark" networks tend to be covert and illegal (Bakker et al., 2012;Raab and Milward 2003;Gerdes 2015), which makes the identification of key nodes and links more difficult than with other networks.
Despite potential "hidden" data limitations, network-based studies of crime and terrorism have rapidly expanded since the beginning of the 2000s due to the availability of new data sources and the development of complex networks and quantitative models. Spatial networks, which are usually constructed by connecting crimes to a potential criminal's address or by connecting pairs of crimes (Oliveira et al. 2015), have helped identify crime pattern motifs (Davies and Marchione 2015), and have been used to predict crime on a street network (Rosser et al. 2017). The analysis of social networks has expanded to study organised crime networks, drug production (Malm et al., 2008), cybercrime and extremist networks (Morselli 2013). Social networks have also been used to model the diffusion of fear of crime as a reaction to direct and indirect victimisation (Prieto Curiel and Bishop 2017), providing a potential explanation as to why fear of crime can increase even if crime rates are being reduced (Prieto Curiel and Bishop 2018).
Networks are also increasingly used to visualise, model and counter terrorist organisations (Bakker et al., 2012;Krebs 2002;Carley 2006). The study of terrorist social networks usually looks at the network topology and identifies which actors are the most central (Everton 2009). Extant literature shows that terrorist organisations tend to find a balance between efficiency and security (Gerdes 2015;Morselli et al., 2007). Centralised networks, such as the Provisional Irish Republican Army (IRA) in Northern Ireland, are theoretically more efficient than decentralised ones but also less resilient to external threats, while decentralised networks are more difficult to detect and disrupt but also much less efficient at communicating resources and orders (Chuang and D'Orsogna 2019;Price 2019).
Beyond the social dimension of terrorism, space is now recognised as a fundamental dimension of both criminal and terrorist networks (Radil 2019;Bahgat and Medina 2013;Medina and Hepner 2008). Space provides the physical framework upon which crime and terrorist attacks are conducted. It shapes the strategies of covert organisations by acting as a facilitating or constraining factor in their fight against government forces or civilian populations. Geographical distance plays a critical role, as attacks are frequently executed near important areas or the city centre (Savitch 2014). Therefore, a frequent approach in terrorism studies is to detect spatially dependent events and self-reinforcing hotspots (Bahgat and Medina 2013). This approach focuses on how different events are linked or how spatial proximity can influence the formation of social networks (Skillicorn et al. 2019).
Another approach is to use exponential random graph models to explore the spatial and social network causes of violence. In Africa, recent research using exponential random graph models suggests that rebel groups whose turfs overlap are more likely to fight each other (Cunningham and Everton 2017). Space can also enable criminal and terrorist organisations to spread geographically by using border regions as sanctuaries (Arsenault and Bacon 2015), as in the Lake Chad region today (Walther et al., 2020).
Additional variables can be added to shed light on the social and spatial dynamics of terrorist networks, including ideology, tactics, weapons, targets and active regions (Gera et al. 2017;Campedelli et al., 2019a, b). A recent analysis of the terrorist attacks which occurred from 1997 to 2016 around the world shows, for instance, that groups with opposite ideologies can share very common behaviours (Campedelli et al., 2019a, b). In recent years, particular emphasis has been given to radical Islamist organisations, whose structure has been found to be resilient even if important social nodes were removed (Medina 2014). In West Africa, network studies have shown that Islamist organisations were capable of travelling long distances (Skillicorn et al. 2019), relied on a limited number of key brokers able to establish links with other rebel groups (Walther and Christopoulos 2015), and had a destabilising effect on regional political stability (Dorff et al., 2020).
There is little agreement as to the organisational structure of Boko Haram. For some scholars, Boko Haram is a "centralized and nominally unified organization" in which Abubakar Shekau exercises a high degree of strategic and operational control (Zenn 2019). According to this perspective, Shekau's ruthless leadership allowed him to build a strongly unified organisation in which opponents were either killed, expelled or forced to follow his orders. While not particularly effective in winning battles and holding territories, this centralised leadership was instrumental in limiting the number of splinter groups, with the exception of the short-lived group Ansar al-Muslimin fi Bilad al-Sudan, better known as Ansaru, founded in 2012 and largely dormant since 2013 (Zenn and Pieri 2018).
Another strand of literature argues that Boko Haram is "organised under a loose federation of operating cells under the broad umbrella headship of the Islamic standard 'Shura Council'", a consultative assembly (Anugwom 2019). According to this view, Boko Haram operates more as "a collection of loosely linked cells and bands than as a tightly disciplined hierarchical army" (Thurston 2017). For some authors (Weeraratne 2017), Boko Haram has adopted a "cell-like structure" since the execution of its leader Muhammad Yusuf in 2009. This structure, in which individual cells maintain little direct contact with the central leadership, allows local and regional commanders to enjoy a significant level of autonomy in their operations against governmental and civilian targets. The number of decentralised cells that composes Boko Haram, however, remains a matter of speculation. Local informants report that while Boko Haram is divided internally, "no one can pinpoint precisely how many these cells are and how far connected to the apex leadership these were" (Anugwom 2019). Fragmentation, however, has a cost as different cells might antagonise and compete against each other (Chuang and D'Orsogna 2019).
Boko Haram is known for its high mobility. Since it became violent in 2009, the organisation has been able to conduct an average of two attacks each day, taking on average the lives of nearly 11 people daily. The Boko Haram insurgency, which initially focused on cities, has mainly been active in rural areas since 2013, where it relies on cheap Chinese motorcycles to conduct its attacks (Agbiboa 2019). The move to rural areas has allowed Boko Haram to challenge the Nigerian military and to exploit agricultural and natural resources around Lake Chad. While Boko Haram had focused its attacks on northeastern Nigeria until 2014, increasing pressure from government forces and vigilante groups has led the terrorist organisation to conduct an increasing number of attacks in neighbouring Chad, Cameroon and Niger. Focusing on the organisation's diffusion across the region, Dowd (2017) shows, for example, that Boko Haram has contracted subnationally, suggesting that the organisation is relocating to neighbouring countries instead of expanding. The mobility patterns that sustain these attacks remain largely under-reported.
Thus far, the debate on the organisational structure and mobility of Boko Haram primarily relies on qualitative data collected through interviews with former members of the Jihadist organisation, evaluation of tactics, court transcripts, letters written between Boko Haram commanders and other extremist organisations, and propaganda videos (Kassim and Nwankpa 2018). Studies using quantitative approaches to detect and describe the social networks and spatial patterns of Boko Haram have mainly focused on relationships between the organisation and its enemies rather than on its internal dynamics (Walther et al., 2020). An exponential random graph model approach has shown that the emergence of Boko Haram in northern Nigeria led to an increase in the number of conflicts, even between pairs of actors that did not include Boko Haram (Dorff et al., 2020). Finally, some attempts have been made to create a multi-layer network of Boko Haram based on open-source data that includes shared events, collaborations, membership and financial ties (Cunningham 2014). That network is extremely sparse due to its relatively young cell-like structure and its lack of collective leadership (Gera et al. 2017).
Due to the secretive nature of terrorist groups, the internal structure of Boko Haram, and whether it is a centralised organisation, are still unknown. Whether Boko Haram cells tend to operate locally or have a high degree of mobility also remains underexplored. In that vein, which paths are frequently travelled by Boko Haram members, and whether international borders act as frictions to the group or as safety structures, remain open questions with potential policy implications.

Methods
The method used in our paper to understand the internal structure of Boko Haram differs from existing approaches. Building on a comprehensive dataset that includes all violent events in northern Nigeria and the neighbouring countries since 1997, we provide an estimate of the fragmentation of Boko Haram based on an agent-based model that identifies cells which move between Boko Haram events (Epstein 2002;Moon and Carley 2007;Park et al. 2012). Our approach requires two input parameters (the maximum cell speed and distance between events), whose impact on the results of the model (e.g., the number of cells detected) is analysed.
To analyse the mobility of Boko Haram cells, the locations of events are clustered and a spatial undirected weighted network is constructed based on those clusters, which captures how violent events are spatially linked and how cells move between different locations.

Data
Our study uses data from the Armed Conflict Location & Event Data project (ACLED) (Raleigh et al. 2010). To date, ACLED has recorded approximately half a million individual events and contains information about all reported political violence and protest events across Africa, South and Southeast Asia, the Middle East, Europe, and Latin America, mainly from local and regional media, reports from NGOs and social media accounts. Reports are separated into individual events that took place in different locations, have different types of violence, and involve different actors. For each event, the dataset records the date, actors, types of violence, locations, fatalities, and it also includes a space and time precision estimate.
All events in which Boko Haram was involved as an actor or associate actor were selected from the ACLED dataset, including all Boko Haram factions, which gives a total of 3,795 events. Because our goal is to analyse the most recent mobility patterns of Boko Haram, a small number of isolated events involving Boko Haram before May 21st, 2012 were excluded from the analysis. This is the only filter applied to the 3,795 events; it removes 29.8% of the days since the first Boko Haram event but only 7.3% of the events. Two major events were excluded as a result, however: the July 2009 uprising of Boko Haram in Maiduguri against the police and military, which resulted in 800 casualties, and the suicide attacks that took place in Kano in January 2012, which resulted in 185 casualties. Other events during the omitted period were less violent and resulted in fewer casualties.
In total, our dataset comprises 3,517 events and 36,775 casualties recorded by ACLED from May 2012 to May 2019, which represents 92.7% of the events and 94.4% of the total casualties attributed to Boko Haram since 2009.

Algorithm to detect fragmentation
Boko Haram has been most active around Lake Chad, a swampy region which has lost 90% of its surface water since 1960 (Policelli et al. 2018;Itno et al. 2015). The road infrastructure around the Lake and in northern Nigeria is in very poor condition, which results in limited, slow or costly mobility. Due to the lack of roads, it takes nearly 10 hours and 600 road kilometres to travel between Maiduguri (Nigeria) and Bol (Chad), two cities located on opposite sides of Lake Chad and only separated by 250 kilometres as the crow flies. It is roughly the same linear distance as between Lagos and Benin City, two Nigerian cities that can be travelled in 5.2 hours by road.
Some authors have argued that Boko Haram intensifies its attacks in rural areas during the rainy season (June-September), a period during which the mobility of government forces is limited by water-logged roads (Agbiboa 2019). ACLED data does not confirm this assumption. The highest number of events is recorded in January (with nearly two events each day of the month since 2016) and the highest number of casualties is recorded in February (with 12.3 casualties each day of the month since 2016), during the dry season.
Since 2014, there has been at least one Boko Haram event on 75% of days and in 92% of all pairs of consecutive days. If a single Boko Haram group (which we call a "cell") were responsible for all of these events, it would have travelled on average 216 kilometres each day for the past 7 years, the equivalent of travelling around the Earth twice each year. Since this is highly unrealistic, we assume that Boko Haram is fragmented into an unknown number of cells responsible for the observed patterns of attacks in the region.
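The comparison with the Earth's circumference can be checked with a short calculation. This is only a back-of-the-envelope sketch: the 216 kilometres per day comes from the text, while the equatorial circumference of roughly 40,075 kilometres is a reference value assumed here.

```python
# Back-of-the-envelope check of the single-cell scenario described above.
KM_PER_DAY = 216                  # average daily travel quoted in the text
EARTH_CIRCUMFERENCE_KM = 40_075   # equatorial circumference (assumed reference value)

km_per_year = KM_PER_DAY * 365
laps_per_year = km_per_year / EARTH_CIRCUMFERENCE_KM

print(f"{km_per_year} km per year, about {laps_per_year:.2f} laps of the Earth")
```

The result is just under two laps per year, in line with the "twice each year" figure.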
Our model (algorithm) for constructing different Boko Haram cells is based on the principle of least action, which assumes that the mobility of Boko Haram is constrained by environmental (distance, lack of roads) and security factors (presence of government forces) that reduce familiarity with unknown locations and limit the impact of its attacks. Boko Haram events are analysed in sequential order in a manner similar to that used previously to detect crime pattern motifs (Davies et al. 2016). Specifically, the algorithm assesses each event, assuming that cells move as little as needed. The first event is assigned a cell. The location and the date of the event are considered to be the last known location and date of that cell. For each subsequent event:
1. If the event takes place at a "reasonable distance" and within "reasonable time" from the last known location of a cell (from the set of existing cells), then we assume that the cell has moved between the two locations and is also responsible for the event. The last known location and date of the cell are updated. If the event could have been conducted by multiple cells, then one is selected at random.
2. However, if the event takes place either too far away from, or too soon after, the last event of every cell in the set of existing cells, then we assume that the event was conducted by a different cell. Hence, a new cell is created.
This approach thus also uses the principle of least group size (Thelen 1949), which assumes that if Boko Haram had more cells, it would be capable of committing more attacks and with a higher frequency than is observed.
In order to quantify "reasonable distance" and "reasonable time", let $d_{i,j}$ be the distance between events i and j and $t_{i,j}$ the number of days between them. Let ν > 0 be the maximum daily speed of a cell (in kilometres per day) and let μ > 0 be the maximum distance between two consecutive events (in kilometres) such that if

\[ d_{i,j} > \nu \, t_{i,j} \tag{1} \]

or

\[ d_{i,j} > \mu, \tag{2} \]

we assume that the two events were executed by different cells. In other words, Equation (1) restricts the maximum daily speed of a cell (ν), and Equation (2) restricts the total distance that a cell can move between two consecutive events (μ). Figure 1 illustrates the cell assignment process outlined above.
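The two restrictions can be sketched as a single predicate. The sketch below reads them as "the same cell is possible only if the distance is at most ν times the elapsed days and at most μ". The haversine great-circle distance, the function names and the approximate coordinates for Maiduguri and Bol are assumptions made for illustration, not details taken from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def could_be_same_cell(d_km, t_days, nu, mu):
    """True when neither restriction is violated: d <= nu * t and d <= mu."""
    return d_km <= nu * t_days and d_km <= mu

# Approximate coordinates (assumed) for Maiduguri and Bol.
d = haversine_km(11.85, 13.16, 13.46, 14.71)
print(round(d), could_be_same_cell(d, t_days=3, nu=60, mu=180))
```

With these assumed coordinates the straight-line distance comes out close to the 250 kilometres quoted earlier, so under ν = 60 and μ = 180 the two cities would be attributed to different cells regardless of the time elapsed.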
Since Boko Haram's attacks span 10 years, we presume that some of its cells will disappear, either because their members are killed or because they are unable to coordinate their activities any longer. We therefore assume that a cell which has not been active for 1 year has dissolved and is no longer responsible for any future events. We also treat the main known Boko Haram factions identified by ACLED (Barnawi and Shekau) separately in our analysis. We assume that Barnawi's cells do not take part in Shekau's events and Shekau's cells do not take part in Barnawi's events.
The total number of cells, $T_\tau(\nu, \mu)$, which counts all cells that existed up to time τ, and the number of active cells, $A_\tau(\nu, \mu)$, which counts only those still active at time τ, are identified and reported as a function of the parameters ν and μ. In the example of Fig. 1, four events lead us to identify three cells. We write $T_{2019}(\nu, \mu)$ and $A_{2019}(\nu, \mu)$ for the latest known number of total and active cells for given values of ν and μ, and $T_\tau(\nu, \mu)$ and $A_\tau(\nu, \mu)$ if the period under consideration is different.
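The assignment procedure and the two counts can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: events are given as hypothetical `(day, x_km, y_km, faction)` tuples on a flat plane, an existing cell is a candidate when the distance is at most ν times the elapsed days and at most μ, and a cell inactive for more than 365 days is treated as dissolved.

```python
import math
import random

DISSOLVE_DAYS = 365  # a cell inactive for longer than this is treated as dissolved

def assign_cells(events, nu, mu, seed=0):
    """Assign each event to a cell, visiting events in chronological order.

    events: iterable of (day, x_km, y_km, faction) tuples (hypothetical layout).
    Returns (labels, cells): one cell index per event in chronological order,
    and the last known state of every cell.
    """
    rng = random.Random(seed)
    cells, labels = [], []
    for day, x, y, faction in sorted(events, key=lambda e: e[0]):
        candidates = []
        for idx, cell in enumerate(cells):
            t = day - cell["day"]
            if cell["faction"] != faction or t > DISSOLVE_DAYS:
                continue  # other faction, or cell already dissolved
            d = math.hypot(x - cell["x"], y - cell["y"])
            if d <= nu * t and d <= mu:
                candidates.append(idx)
        if candidates:
            idx = rng.choice(candidates)          # ties are broken at random
            cells[idx].update(day=day, x=x, y=y)  # update last known state
        else:
            cells.append({"day": day, "x": x, "y": y, "faction": faction})
            idx = len(cells) - 1
        labels.append(idx)
    return labels, cells

def count_cells(cells, tau):
    """Return (total, active) at day tau, assuming all events occurred by tau."""
    active = sum(1 for c in cells if tau - c["day"] <= DISSOLVE_DAYS)
    return len(cells), active

# Toy example: four events lead to three cells.
toy = [(1, 0, 0, "Shekau"), (2, 100, 0, "Shekau"),
       (4, 0, 100, "Shekau"), (4, 20, 0, "Shekau")]
labels, cells = assign_cells(toy, nu=10, mu=40)
print(labels, count_cells(cells, tau=4))
```

In the toy data only the last event is reachable from an existing cell (20 kilometres in three days with ν = 10), so four events yield three cells.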

Parameter space and sensitivity analysis
The restrictions on the maximum distance that a cell could have moved (μ) and on its maximum daily speed (ν) are the input parameters of the model. The range of what is considered to be a "reasonable" daily speed and maximum distance is thus the parameter space. We consider that a cell can move at a maximum daily speed of up to 200 kilometres per day (and so values of ν range between 0 and 200) and that the distance between any two consecutive events is, at most, 400 kilometres (and so values of μ range between 0 and 400).
Notice that with very large values of ν and μ, we get cells that could be "almost everywhere" as they move very fast and over long distances. This results in a small $T_\tau(\nu, \mu)$ and $A_\tau(\nu, \mu)$ since the same cell could have been responsible for most of the events (except for the ones which happen simultaneously). With μ = 0 or ν = 0, we obtain cells with no mobility and so, except for events which took place in the same location, the procedure assigns a different cell to each unique location. In that case, we get that $T_{2019}(0, 0) = 900$, which means that Boko Haram has been active in roughly 900 unique locations, and that $A_{2019}(0, 0) = 233$, meaning that they have been active only in 233 different locations during the past year and so many cells would be considered to be dissolved by now. Different values of ν and μ yield different numbers of total and active cells. We analyse $T_\tau(\nu, \mu)$ and $A_\tau(\nu, \mu)$ to illustrate the impact of the two parameters.
Fig. 1 Schematic representation of the methodology. Events are analysed in sequential order and a unique Boko Haram cell is assigned to each one. For each event, the algorithm decides if an existing cell is involved in the attack or if a new (or not previously identified) cell is responsible. In the figure, an event took place during the first day, which means that a cell is created. The location of that event is its last known location and the date of the event is its last known date. The potential location of that cell increases each day according to its daily speed, ν. After a few days (four in the example) a cell has reached the maximum distance between consecutive events and so it is assumed that it remains within that region (in the example, μ = 4ν). Then, during days 2 and 4, there is no cell nearby that could have been involved in the new events and so new cells are identified. During day 4, there is an event for which an existing cell is potentially responsible, so its last known location and date are updated
Our model has two parameters, ν and μ. The parameter space, which corresponds to values of the maximum distance between two events, μ, between 0 and 400 kilometres and values of the maximum daily speed, ν, between 0 and 200 kilometres per day, was analysed first by randomly choosing a value of ν and μ and then analysing the consecutive Boko Haram events as described in the text. This procedure was repeated 100,000 times for different values of ν and μ, and the corresponding $T_{2019}(\nu, \mu)$ and $A_{2019}(\nu, \mu)$ were reported. Since we are also interested in detecting when Boko Haram has been more or less fragmented, we also computed $T_\tau(\nu, \mu)$ and $A_\tau(\nu, \mu)$ for values of τ from 2012 to 2019, for some fixed values of ν and μ.
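The random exploration of the parameter space can be sketched as a simple sampling loop. The text does not state the sampling distribution, so drawing ν and μ uniformly and independently is an assumption here. The function name and the stand-in model are hypothetical; in practice the `model` argument would wrap the cell-assignment procedure and return the cell counts.

```python
import random

NU_MAX = 200.0  # km per day: upper end of the speed range stated in the text
MU_MAX = 400.0  # km: upper end of the distance range stated in the text

def sample_parameter_space(model, n_draws, seed=0):
    """Draw (nu, mu) at random and evaluate `model` at each draw.

    `model` is any callable taking (nu, mu). Uniform, independent draws are
    an assumption. Returns one (nu, mu, model_output) tuple per realisation.
    """
    rng = random.Random(seed)
    realisations = []
    for _ in range(n_draws):
        nu = rng.uniform(0.0, NU_MAX)
        mu = rng.uniform(0.0, MU_MAX)
        realisations.append((nu, mu, model(nu, mu)))
    return realisations

# Stand-in model (hypothetical): the radius a cell could cover in three days.
def reach_after_three_days(nu, mu):
    return min(3 * nu, mu)

realisations = sample_parameter_space(reach_after_three_days, n_draws=1000)
print(len(realisations))
```

Fixing the seed keeps a run reproducible, which helps when comparing realisations across nearby parameter values.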

Spatial network of Boko Haram events
Although it would be possible to observe the mobility of cells by looking directly at the location of their corresponding events, the spatial grouping of locations into n clusters enables us to consolidate very short-distance movements. It also limits the number of possible journeys between distinct locations to n(n − 1)/2 and makes it possible to analyse the most frequent journeys. Note that the construction of the network depends on our choice of parameters μ and ν. In other words, we will get a different network for alternative choices of μ and ν.
Event locations were clustered into nodes using Partitioning Around Medoids (Reynolds et al. 2006) (a procedure similar to K-means) with the restriction that locations inside a node are at a distance smaller than 20 kilometres. The result is a spatial network with 420 nodes: 294 of the nodes (70%) are in Nigeria, 80 nodes (19%) in Cameroon, 27 nodes (6%) in Niger and 19 nodes (5%) in Chad. Each event is assigned to its corresponding medoid. The medoids (or the nodes of the network) are located such that 99.4% of the events occurred in the same country as the corresponding medoid (except for 23 events where the medoid is located in a different country than the event).
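As a worked example of the bound on possible journeys, and assuming it is applied to the n = 420 nodes reported above, the number of distinct undirected node pairs is:

```latex
% Assumes n = 420 nodes, the cluster count reported in the text.
\[
  \binom{n}{2} = \frac{n(n-1)}{2} = \frac{420 \times 419}{2} = 87\,990
\]
```

This count is the natural denominator when reporting the percentage of edges that are actually present in the network.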
We examine specific parts of the parameter space. To do so, we take pairs of values $\nu_0$ and $\mu_0$ and select all the realisations for which the values ν and μ are close to $\nu_0$ and $\mu_0$. Formally, from all the realisations, if $|\nu - \nu_0| < 3.5$ kilometres per day and $|\mu - \mu_0| < 3.5$ kilometres, a realisation is considered to be "close" and it is used to construct the spatial network around $\nu_0$ and $\mu_0$. Instead of assuming that one realisation is the "true" network for a set of parameters $\nu_0$ and $\mu_0$, we consider many realisations with a slight parameter change, in case a small perturbation changes the structure of the network completely. For a specific set of parameters $\nu_0$ and $\mu_0$, the link ij is added to the network if the algorithm introduced above detects that a cell moved from node i to node j or from j to i. The corresponding weight of the edge is the number of journeys made by any cell in the set of realisations around $\nu_0$ and $\mu_0$ between i and j or between j and i (more details in the Supplementary materials 5.1). Therefore, the edge weights $w_{ij}$ reflect the likelihood of one journey between i and j undertaken by a Boko Haram cell with maximum distance $\mu_0$ and daily speed $\nu_0$. We measure the percentage of trips completed inside the same node, the percentage of trips which happen within the top 1% of the edges and the percentage of present edges for different values of $\mu_0$ and $\nu_0$.
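The selection of "close" realisations and the accumulation of edge weights can be sketched as below. The data layout is an assumption made for illustration: each realisation is a dictionary holding its sampled `nu` and `mu` and a list of `journeys`, where a journey is a pair of node identifiers visited consecutively by one cell.

```python
from collections import Counter

def close_realisations(realisations, nu0, mu0, tol=3.5):
    """Keep realisations whose sampled (nu, mu) lie within `tol` of (nu0, mu0)."""
    return [r for r in realisations
            if abs(r["nu"] - nu0) < tol and abs(r["mu"] - mu0) < tol]

def edge_weights(realisations):
    """Count journeys per undirected node pair.

    Journeys that start and end in the same node are tallied separately,
    since they do not create an edge between distinct nodes.
    """
    weights, within_node = Counter(), 0
    for r in realisations:
        for i, j in r["journeys"]:
            if i == j:
                within_node += 1
            else:
                weights[(min(i, j), max(i, j))] += 1  # i->j and j->i share an edge
    return weights, within_node

# Toy realisations (made-up values).
runs = [
    {"nu": 59.0, "mu": 181.0, "journeys": [(1, 2), (2, 1), (3, 3)]},
    {"nu": 61.5, "mu": 178.0, "journeys": [(1, 2), (2, 4)]},
    {"nu": 70.0, "mu": 180.0, "journeys": [(5, 6)]},  # too far from nu0 = 60
]
weights, within_node = edge_weights(close_realisations(runs, nu0=60, mu0=180))
total_trips = within_node + sum(weights.values())
print(dict(weights), within_node / total_trips)
```

Dividing each count by the total number of journeys would turn the weights into relative frequencies, if a normalised network is preferred.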

Boko Haram, a highly fragmented organisation
The results of our mobility pattern analysis suggest that Boko Haram is a highly fragmented terrorist organisation. The estimate of the number of cells depends on whether we believe that Boko Haram is rather mobile or not: highly mobile cells are capable of committing more attacks than immobile ones. If a high mobility scenario is selected, then there are at least 40 active cells in 2019 (Fig. 2). If a low mobility scenario is selected, then Boko Haram should have at least 150 active cells. An analysis of the total number of cells and the ratio between active and total cells in the parameter space is in the Supplementary materials 5.2.

Are Boko Haram cells specialised?
Very few datasets, besides ACLED, can be used as a source of validation of these results. Measuring the mobility of Boko Haram cells, estimating their daily speed and the maximum distance between events is almost impossible due to the risks of doing fieldwork in the region. Although mobility studies have rapidly evolved due to the development of new techniques and the use of new sources of data, such as mobile phone data (Wilson et al. 2016;Widhalm et al. 2015;Schneider et al. 2013) or credit card data (Clemente et al. 2018), this type of data simply does not exist for Boko Haram. Similarly, there is too little evidence about the internal structure of Boko Haram to validate the number of groups that we observed, apart from the fact that the organisation has adopted a "cell-like structure" (Weeraratne 2017), and some speculation around the number of decentralised cells (Anugwom 2019).
Fig. 2 The number of Boko Haram active cells, $A_{2019}(\nu, \mu)$, varies depending on two model input parameters: their daily speed ν and their maximum distance between two events μ. The smallest number of active cells is obtained when each cell travels ν > 90 kilometres each day and μ > 250 kilometres between consecutive events, which appears unlikely, considering the poor road conditions in the region. A more realistic hypothesis is that cells travel at most ν = 60 kilometres each day and μ = 180 kilometres between every pair of events, which would mean that Boko Haram is fragmented into roughly 53 active cells and 83 total cells
Due to the lack of an alternative validation exercise, we analyse whether some of the cells are more violent than others, or more specialised in certain types of events, as might be expected. We propose that evidence of such behaviour increases our confidence in the set of cells identified.
Boko Haram has participated in six main types of events. The largest share of events (41.3%) are classified as armed clashes against state actors. Nearly a third (29.7%) are attacks committed against civilians, 8.9% are suicide bombs, 6.1% relate to governmental territorial gains, 4.4% are remote explosives and 3.7% are air or drone strikes. Sexual violence, abduction, violent demonstrations and other violent events are far less represented.
A metric of specialisation S for different regions in the parameter space ν, μ is constructed in order to measure how homogeneous or heterogeneous Boko Haram cells are. For each value of ν and μ, we look at the distribution of events by type within each cell. We then use the "distance" between the distribution of events by type of the most specialised cell and the distribution of events by type across all events (for more details, see the Supplementary materials 5.3).
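One way to make the idea of a "distance" between event-type distributions concrete is sketched below. The exact metric is defined in the Supplementary materials and is not reproduced here, so the total variation distance is swapped in purely as an illustrative assumption. The `min_events` threshold and the event-type labels are also hypothetical.

```python
from collections import Counter

def type_distribution(event_types):
    """Relative frequency of each event type in a list of labels."""
    counts = Counter(event_types)
    n = sum(counts.values())
    return {k: v / n for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def specialisation(cells, min_events=1):
    """Largest distance between any cell's type mix and the overall mix.

    cells: dict mapping a cell id to the list of its event types.
    min_events (assumed) skips cells too small to have a meaningful profile.
    """
    overall = type_distribution([t for types in cells.values() for t in types])
    return max(
        (total_variation(type_distribution(types), overall)
         for types in cells.values() if len(types) >= min_events),
        default=0.0,
    )

# Toy cells (made-up labels).
toy_cells = {
    0: ["clash", "clash", "clash", "clash"],
    1: ["clash", "civilians", "civilians", "bomb"],
}
S = specialisation(toy_cells)
print(S)
```

A cell with a single event is trivially "specialised" under any such distance, which is why a minimum cell size is worth considering when applying a metric of this kind.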
Our results suggest that most specialised cells correspond to those found with parameters ν = 60 kilometres per day and 180 kilometres each day, and between μ = 170 and 200 kilometres between two consecutive events (Fig. 3). The number of cells T_τ(ν, μ), and active cells A_τ(ν, μ), in that parameter range is roughly the same, which means that it is likely that Boko Haram cells have a daily speed just below ν = 60 kilometres, and that they move around μ = 180 kilometres between any two events. These estimates are based on the hypothesis that mobility is reduced by environmental and security factors, that cells are specialised in certain types of attacks, and that the highest level of specialisation in the parameter space is an indication of the accuracy of our method. In what follows, we present results for the whole parameter space with a slight emphasis around the ν = 60 kilometres each day, and μ = 180 kilometres values.
Fig. 3 Metric of specialisation S(ν, μ) of Boko Haram cells according to their speed ν and their maximum distance between two consecutive events μ. Cells are more specialised if they commit more attacks, armed clashes, suicide bombs or other types of events than expected. The region inside the white frame had the highest level of specialisation of the whole parameter space.
Finally, the number of casualties per event was also considered as a potential way to distinguish between cells. The idea is that some cells could be more violent than others. Results show, however, that the number of casualties per cell is proportional to the number of events except for the most deadly events. This result is in line with the terrorism literature that suggests that the casualties or severity of terrorist events follows a power-law distribution (Clauset et al., 2007;Guo 2019). In the case of Boko Haram there is indeed a high concentration of casualties in some events. The 1% most violent events have caused 24% of the total casualties while the top 5% and top 10% events caused 47% and 61% of the casualties respectively (see the Supplementary materials 5.4 for further discussion on the number of casualties per cell).
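The concentration figures quoted above (the share of casualties caused by the top 1%, 5% and 10% of events) follow from a simple ranking. The sketch below is only an illustration: the function name `top_share` and the toy casualty counts are assumptions made for this example and are not taken from ACLED or from the authors' code.

```python
import math


def top_share(casualties, top_fraction):
    """Share of all casualties caused by the deadliest `top_fraction` of events."""
    if not casualties:
        raise ValueError("need at least one event")
    ranked = sorted(casualties, reverse=True)
    # Keep at least one event so that very small samples still give an answer.
    n_top = max(1, math.ceil(top_fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0


# Toy, made-up casualty counts per event (hypothetical, NOT ACLED data).
toy_casualties = [120, 40, 10, 5, 3, 2, 2, 1, 1, 1] + [0] * 10
for fraction in (0.05, 0.10, 0.50):
    share = top_share(toy_casualties, fraction)
    print(f"top {fraction:.0%} of events: {share:.1%} of casualties")
```

A heavy-tailed sample, as in the toy list, produces the pattern described in the text: a handful of events accounts for most of the casualties.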

Boko Haram has been restructured a few times
For a specific set of parameters ν_0, μ_0, the method suggests that T_τ(ν_0, μ_0) and A_τ(ν_0, μ_0) are not constant for different values of τ; that is, the number of cells and the number of active cells change over time. In particular, the method shows that both T_τ(ν_0, μ_0) and A_τ(ν_0, μ_0) have increased rapidly since the early 2010s, particularly in 2013, 2015 and 2019, three years during which Boko Haram experienced internal changes (Fig. 4). These breaking points are observed across the whole parameter space.
In 2013, Boko Haram expanded its activities to neighbouring countries, committing a number of attacks and kidnappings often associated with its splinter group Ansaru (Zenn 2014). This period is synonymous with major internal tensions between Boko Haram leader Shekau and two of his senior commanders, Khalid al-Barnawi and Mamman Nur, who condemned Shekau's strategy of indiscriminate violence against Muslim civilians and defectors. The year 2015 is another turning point in the war against Boko Haram. After several years of unsuccessful counter-insurgency operations, the Nigerian forces launched a series of attacks against the terrorist organisation with the Multinational Joint Task Force (MNJTF), a regional initiative from Benin, Cameroon, Chad, Niger, and Nigeria. Boko Haram was defeated in a number of strategic locations and pushed back to remote or mountainous regions, around Lake Chad and the Cameroon border (Zenn 2019). The new cells observed in 2019 have been linked to the Barnawi faction of Boko Haram, which became active in 2016 and has committed an increasing number of attacks recently.

Some Boko Haram cells are more active than others
Our results suggest that a large fraction of Boko Haram events are committed by a few cells. If we assume that Boko Haram cells are highly mobile, with a speed of ν = 60 kilometres per day and a maximum distance of μ = 180 kilometres, then 12 cells are responsible for 70% of the events. Even with a low mobility scenario, 30 cells concentrate 70% of the events (Fig. 5). However, our model does not indicate that the cell of Boko Haram leader Shekau is significantly more active or more deadly than other highly active cells.
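The statement that a few cells account for 70% of the events can be checked by ranking cells by their number of events and accumulating. This is a minimal sketch; `cells_needed` and the toy counts are hypothetical names and values introduced for illustration only.

```python
def cells_needed(events_per_cell, share=0.70):
    """Smallest number of cells (taken from most to least active) that
    together account for at least `share` of all events."""
    total = sum(events_per_cell)
    if total <= 0:
        raise ValueError("need at least one event")
    covered = 0
    ranked = sorted(events_per_cell, reverse=True)
    for n_cells, count in enumerate(ranked, start=1):
        covered += count
        if covered >= share * total:
            return n_cells
    return len(ranked)


# Toy example with made-up event counts for five hypothetical cells.
print(cells_needed([60, 20, 10, 5, 5]))
```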

Boko Haram attacks are clustered in a few regions
The network formed by the mobility patterns of Boko Haram is very sparse. If we construct a composite network formed of cells generated for a range of μ and ν, only 4% of potential edges are actually present. This means that most pairs of nodes are not connected and, therefore, journeys between most pairs of locations were not identified by the model. Furthermore, 13% of sequential events are inside one of the nodes (i.e., the cells involved stayed within a 20-kilometre radius).

Boko Haram moves between a few regions
Our methodology identifies regions that are frequently traversed by Boko Haram cells. Results show that most of the movements take place between Maiduguri, the capital of Borno State, and the cities of Damaturu and Potiskum in Yobe State, along the major A3 Highway. Numerous movements are also recorded between Maiduguri and the Sambisa Forest, where Boko Haram has found a safe haven, and between the capital of Borno and the Cameroon border, where the headquarters of the organisation (Gwoza) was located until March 2015 (Fig. 6). For all types of mobility, the road between Maiduguri
For a high mobility scenario, the frequency at which a cell is active twice in the same location (or node) is less than 6% of all its events (Fig. 7). Even with medium and low mobility, only 10% and 30% of consecutive attacks, respectively, take place in the same location. With a speed of ν = 60 kilometres per day and a maximum distance of μ = 180 kilometres, only 8% of any two consecutive events committed by the same cell are in the same location. These results suggest that Boko Haram fighters most probably leave the region they have attacked immediately and plan another attack from a different location.
Some of the journeys between two specific locations are very frequently travelled by Boko Haram cells (Fig. 7). Assuming that Boko Haram cells are highly mobile, the top 1% of the edges concentrate more than 40% of all the Boko Haram journeys. With low mobility, this increases to more than 70%. With a speed of ν = 60 kilometres per day and a maximum distance of μ = 180 kilometres, the top 1% edges concentrate roughly 50% of the journeys. Therefore, although cells bounce between different locations, most of their journeys are through very specific and repeated routes. Similar results are observed if we take the top 5% or other concentration units.
Finally, as we saw with the composite network, journeys between pairs of locations are not that frequent (Fig. 7). Depending on the mobility scenario, only 10-12% of node pairs are connected (i.e., at least one trip was detected).
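The three network measures discussed in this section (trips inside a node, trips on the most travelled edges, and the share of node pairs that are connected) can be sketched as follows. Several choices here are assumptions rather than the authors' definitions: edges are treated as undirected, the top edges are taken among the edges that are present, and the share of trips on top edges is computed over trips between distinct nodes. The function `network_summary` and the toy trips are hypothetical.

```python
import math
from collections import Counter


def network_summary(trips, n_nodes, top_fraction=0.01):
    """Summarise a list of trips, each an (origin_node, destination_node)
    pair for two consecutive events of the same cell."""
    if not trips:
        raise ValueError("need at least one trip")
    within = sum(1 for a, b in trips if a == b)
    # Assumption: undirected edges, weighted by the number of journeys.
    weights = Counter(frozenset((a, b)) for a, b in trips if a != b)
    ranked = [w for _, w in weights.most_common()]
    # Assumption: "top edges" are the heaviest among the edges present.
    n_top = max(1, math.ceil(top_fraction * len(ranked))) if ranked else 0
    between = len(trips) - within
    possible = n_nodes * (n_nodes - 1) // 2
    return {
        "share_within_node": within / len(trips),
        "share_on_top_edges": sum(ranked[:n_top]) / between if between else 0.0,
        "share_of_pairs_connected": len(weights) / possible if possible else 0.0,
    }


# Toy trips between four hypothetical nodes (made-up data).
toy_trips = [(1, 2), (2, 1), (1, 2), (3, 3), (2, 3)]
print(network_summary(toy_trips, n_nodes=4, top_fraction=0.5))
```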

Wider (and taller) bars represent cells with a higher number of events

Boko Haram is a regional problem
Boko Haram started as an insurgency primarily focused on attacking the Nigerian government and for many years the vast majority of its attacks were conducted within Nigeria (Dowd 2017). In recent years, Boko Haram has relocated to Chad, Cameroon and Niger (Matfess 2019). Our results confirm this trend by showing that border crossings have become more frequent (Skillicorn et al. 2019).
It is then possible to measure the number of cross-border trips by Boko Haram: this is simply the number of times that a cell was active in two consecutive events in nodes located on different sides of a country border. For a high mobility scenario, roughly 35% of the journeys of a Boko Haram cell cross an international border. Fewer crossings are observed under a lower mobility scenario (Fig. 8). With a speed of ν = 60 kilometres per day and a maximum distance of μ = 180 kilometres, roughly a third of the cells move across borders.
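The cross-border count described above can be sketched in a few lines. The input format (a mapping from a cell identifier to the chronological list of countries of its events) and the name `cross_border_share` are assumptions made for this illustration; the country codes and sequences are made up.

```python
def cross_border_share(cell_sequences):
    """Fraction of journeys between two consecutive events of the same cell
    whose origin and destination lie in different countries."""
    crossings = 0
    journeys = 0
    for countries in cell_sequences.values():
        # Pair each event with the next event of the same cell.
        for previous, current in zip(countries, countries[1:]):
            journeys += 1
            if previous != current:
                crossings += 1
    return crossings / journeys if journeys else 0.0


# Toy sequences for three hypothetical cells.
toy_cells = {
    "cell_a": ["NG", "NG", "CM", "NG"],
    "cell_b": ["NE"],
    "cell_c": ["TD", "TD"],
}
print(cross_border_share(toy_cells))
```

A cell with a single event contributes no journey, so it affects neither the numerator nor the denominator.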

Conclusions
The objective of this study was to uncover the internal structure of a terrorist organisation through its mobility patterns. Our method identifies cells which move between Boko Haram events at a certain speed and for a certain maximum distance. Once cells and their mobility patterns have been extracted, a spatial network is constructed. Our study suggests that the terrorist organisation Boko Haram is structured around 50-60 cells active around Lake Chad in West Africa.
Fig. 6 The 420 nodes and the top 1% edges according to their weight for different mobility scenarios. Certain paths between Maiduguri (which is the largest node) and the border with Cameroon and the border with Niger are travelled frequently by Boko Haram cells. Also, the journeys between Maiduguri and Damaturu and Potiskum (both located west of Maiduguri) are frequently travelled with high mobility cells. With low mobility cells, the movement of Boko Haram cells is most frequently between Maiduguri and Bama, Damboa, Gwoza and Konduga, four urban agglomerations south and south-east of Maiduguri.
Fig. 7 For each combination of the daily speed ν and the maximum distance between two events μ, a spatial network was constructed. We measure the percentage of trips completed inside the same node (top), the percentage of trips which happen within the top 1% of the edges (middle) and the percentage of present edges (bottom). With a speed of ν = 60 kilometres and a maximum distance of μ = 180 kilometres, roughly 8% of the trips happen inside the nodes, nearly 50% of the trips happen within the top 1% of the edges and only 12% of the edges are present.
Fig. 8 The fraction of times that a cell crosses an international border depending on daily speed ν, and the distance between two events μ. With a speed of ν = 60 kilometres and a maximum distance of μ = 180 kilometres, Boko Haram cells cross a border 30% of the time between any two consecutive events.
Our work contributes to a long-lasting and often heated debate about the internal structure of Boko Haram. It suggests that Boko Haram is a rather fragmented organisation in which decentralised cells are capable of committing numerous and repetitive attacks against government and civilian targets in northern Nigeria and the surrounding countries. This result corresponds to earlier qualitative studies that noted that Boko Haram was organised around a loose federation of cells (Thurston 2017;Anugwom 2019). Due to the speed and spatial dispersion of attacks, it seems unlikely that Boko Haram is a strongly unified organisation, despite being formally ruled by a ruthless leader. While previous studies argue that the main unit led by Abubakar Shekau is responsible for more attacks than others, it appears to be less dominant in terms of casualties and geographical extent than some studies claim (Zenn 2019).
The fact that Boko Haram is fragmented into numerous cells that operate along specific routes, possibly across borders, can be used to inform counter-insurgency strategies. Firstly, dismantling one of the 50 presumed cells is unlikely to significantly reduce violence in the region, as each cell is on average responsible for only 2-3% of the casualties related to Boko Haram. Secondly, some paths are more frequently travelled by Boko Haram cells than others, so preventive interventions could aim to stop cells as they move between two consecutive events, rather than follow a reactive strategy that targets specific locations, such as those where a cell has previously attacked. Thirdly, the large number of cross-border movements reported in our study suggests that Boko Haram has been able to operate regionally despite the multinational task force established by Nigeria and the neighbouring countries to secure the borders of the Lake Chad region (OECD/SWAC 2020). Cross-border cooperation remains a crucial factor in countering the Boko Haram insurgency and preventing its transnational spread in the region.
Our study of the mobility of Boko Haram suggests that members of the Jihadist organisation are capable of travelling long distances repeatedly. Based on the level of specialisation, we estimate that each cell of Boko Haram travels at most 60 kilometres per day on average, which is a significant distance considering the local road infrastructure and the need to avoid detection. Our model enables us to detect the main locations and paths travelled by Boko Haram cells, and indicates that both are highly concentrated in a number of cities (notably Maiduguri) and major road corridors.
In recent years, Boko Haram has been able to relocate (rather than spread) to remote places that are difficult for government troops to access, such as the Mandara mountains in Cameroon, the Sambisa Forest in Nigeria and the islands of Lake Chad. Our results show that more than a third of the journeys of Boko Haram cells cross an international border. Thus, security is not a national issue but a regional one and cross-border cooperation will play a fundamental role in the region.
Despite being composed of highly mobile cells, Boko Haram is nevertheless a rather territorial terrorist organisation which concentrates most of its attacks in what used to be the western part of Kanem-Bornu, a pre-colonial empire that ruled from the 1380s to 1893. As such, the spatial patterns of Boko Haram differ from those of Al Qaeda in the Islamic Maghreb, which does not seek to hold territory and is capable of conducting attacks thousands of kilometres apart (Walther et al., 2020).
Our method is capable of detecting known internal changes across time, particularly the 2015 turning point in the war against Boko Haram and the newly incorporated Barnawi events in 2019. The study confirms that the counter-offensive led by Nigerian forces, the Multinational Joint Task Force and vigilante groups in 2015 contributed to further fragmenting Boko Haram and limiting the spatial reach of its cells. By forcing Boko Haram to leave numerous cities and villages in northeast Nigeria, the counter-offensive was a turning point in the war against the organisation. We observe that since 2015 the number of armed clashes has increased much more rapidly than violence against civilians, which remains a modus operandi of Boko Haram. We also detect a series of major changes in the internal composition of Boko Haram during this period, with the creation of new cells, including but not limited to the faction led by Abu Mus'ab al-Barnawi. The fragmentation process observed during this period also leads to significant changes in the spatial patterns of Boko Haram, with a higher fragmentation but with cells which are less mobile and therefore less capable of conducting attacks far from their safe havens.

Supplementary materials
Constructing a spatial metric using the location of events
Boko Haram has committed attacks in roughly 900 different locations across northern Nigeria and the neighbouring countries. However, the Euclidean distance between some events is very small, for instance, when two attacks take place in distinct parts of Maiduguri, a city in Nigeria which, according to Africapolis (OECD/SWAC 2018), has a population of more than one million inhabitants and an area of 139 square kilometres. In large cities, two events can be separated by more than 10 kilometres and still be inside the same metropolitan area. To address this issue, we aggregated events that took place in nearby regions.
We used Partitioning Around Medoids (Reynolds et al. 2006) to construct our spatial metric. This approach considers the location of all the events and a number k which corresponds to the number of clusters that the algorithm will produce. It then takes k representative objects (or medoids) among the observations of the locations and identifies several clusters of locations by assigning each observation to the nearest medoid. The algorithm tries different combinations of medoids by swapping between the options, with the objective to minimise the sum of the dissimilarities between the locations of each group and its medoid.
There are ways to find an optimal number of clusters, for example by looking at the quality of the clustering with different values of k (Rousseeuw and Kaufman 1990). Following a similar idea, we clustered the locations into k groups, with k = 2, 3, … and searched for the smallest number of clusters such that the Euclidean distance between any two locations of each cluster was smaller than 20 kilometres. This choice is justified by the idea that events separated by small distances might belong to the same metropolitan area or a similar region. The procedure was executed in R (R Core Team 2018) using the cluster package (Maechler et al. 2017). The distance between any two locations is smaller than 20 kilometres with k = 420 clusters.
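The procedure above was run in R with the cluster package. Purely as an illustration of the logic, the following sketch implements a small PAM-style k-medoids (greedy build followed by swaps) in plain Python and then searches for the smallest k whose clusters all have a diameter below the threshold. It is not the authors' code: the function names, the toy coordinates and the assumption that locations are already projected to planar coordinates in kilometres are all introduced for this example.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two planar points (assumed to be in km)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Small PAM-style k-medoids: greedy build, then swap until no gain."""
    medoids = []
    while len(medoids) < k:  # BUILD: add the point that lowers the cost most
        candidates = [i for i in range(len(points)) if i not in medoids]
        medoids.append(
            min(candidates, key=lambda i: total_cost(points, medoids + [i]))
        )
    improved = True
    while improved:  # SWAP: try replacing each medoid by each non-medoid
        improved = False
        best = total_cost(points, medoids)
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points
    ]
    return medoids, labels


def max_diameter(points, labels):
    """Largest distance between two points assigned to the same cluster."""
    worst = 0.0
    for a, b in combinations(range(len(points)), 2):
        if labels[a] == labels[b]:
            worst = max(worst, dist(points[a], points[b]))
    return worst


def smallest_k(points, limit_km=20.0):
    """Smallest k such that every cluster has a diameter below `limit_km`."""
    for k in range(1, len(points) + 1):
        _, labels = pam(points, k)
        if max_diameter(points, labels) < limit_km:
            return k, labels
    raise ValueError("no points given")


# Toy planar coordinates in km (made up): three well-separated groups.
toy_points = [(0, 0), (5, 0), (0, 8), (100, 100), (104, 103), (300, 0)]
k, labels = smallest_k(toy_points)
print(k, labels)
```

This brute-force version is only practical for small inputs; with roughly 900 locations, an optimised implementation such as the one in the R cluster package is the more sensible choice.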

Active cells, total cells and cells which remain active
Results show that with a low daily speed ν or a small maximum distance μ between events, there are at least 300 total cells, of which fewer than 50% are still active in 2019. The share of cells that are still active in 2019 is not uniform across the parameter space; it ranges between 40% and 80% (Fig. 9). With a daily speed of ν = 60 kilometres per day and a maximum distance of μ = 180 kilometres, the total number of cells is T_2019(ν, μ) = 83 and the number of active cells is A_2019(ν, μ) = 53, meaning that 63% of the cells are still active in 2019.

Specialisation per cell
Our measure of specialisation was constructed by examining the type of events in which each cell had been involved, using the categories of events provided by ACLED (armed clashes, attacks, suicide bombs, government regains territory, remote explosives, air or drone strikes, and others). Then, we calculated the Euclidean distance between the distribution of the type of events of each cell and the distribution of the type of events across all observed events. We then weighted this distance by the number of events of the cell and reported the maximum value (across all cells) as the level of specialisation.
The level of specialisation for a realisation of the algorithm S(ν_0, μ_0) is defined as
S(ν_0, μ_0) = max_i e_i ∥D_i − D_{ν_0,μ_0}∥,
where e_i is the number of events of cell i, D_i is the distribution of events per type of cell i, D_{ν_0,μ_0} is the average distribution and ∥ ∘ ∥ means the Euclidean distance. Since taking only the most specialised cell potentially leads to biased results (or could be the result of randomness), the average level of specialisation of the top 3, 9 and 27 cells was also considered (Fig. 10). Results show that the level of specialisation of the most specialised cell is highly correlated with the average of the three most specialised cells, as are the results for the 9 or the 27 most specialised cells. Therefore, the sections of the parameter space identified as very specialised with 1, 3, 9 or 27 cells are also very similar, and we keep, as metric of specialisation, the level for just one cell.
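As an illustration of the construction described in this section, the sketch below computes a specialisation score per cell and averages the highest ones. It assumes that the weight is the raw number of events e_i and that the reference distribution is the distribution over all events, which is one reading of the description and may differ from the authors' exact definition. The function names and the toy event labels are hypothetical.

```python
import math
from collections import Counter


def distribution(event_types, categories):
    """Relative frequency of each category among the given events."""
    counts = Counter(event_types)
    n = len(event_types)
    return [counts[c] / n for c in categories]


def specialisation(cells, top=1):
    """Mean score of the `top` most specialised cells.

    `cells` maps a cell identifier to the list of event types of its events
    (each cell is assumed to have at least one event). The score of a cell is
    its number of events times the Euclidean distance between its event-type
    distribution and the distribution across all events.
    """
    all_events = [t for events in cells.values() for t in events]
    categories = sorted(set(all_events))
    overall = distribution(all_events, categories)
    scores = sorted(
        (len(ev) * math.dist(distribution(ev, categories), overall)
         for ev in cells.values()),
        reverse=True,
    )
    n_top = min(top, len(scores))
    return sum(scores[:n_top]) / n_top


# Toy cells with made-up event types.
toy_cells = {
    "cell_a": ["armed clash", "armed clash", "armed clash"],
    "cell_b": ["suicide bomb", "attack"],
    "cell_c": ["attack", "armed clash"],
}
print(specialisation(toy_cells, top=1), specialisation(toy_cells, top=3))
```

Passing `top=3`, `top=9` or `top=27` gives the averaged variants mentioned in the text.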

A signal in the number of casualties?
It is possible to incorporate more information into each event to detect whether they were committed by the same cell or by different cells. The methodology could be extended by considering that two events are part of the same cell if they satisfy certain restrictions, and part of different cells otherwise. For example, if two events are executed by a substantially different number of people, or with distinct weapons, then the model could assign the events to different cells or assume the existence of new ones. See, for instance Campedelli et al. (2019a).
The number of casualties is another variable that could potentially have provided useful information to distinguish between cells. Unfortunately, this variable provides very little information that can be used in our analysis. Boko Haram has killed tens of thousands of people in northern Nigeria over the last 10 years but many of the events in which the organisation is involved have a small number of casualties, while a few events concentrate a disproportionate number of casualties. The top 5% most violent events concentrate 47% of the casualties of Boko Haram, whereas the 50% least violent events concentrate just 4.8% of the casualties. Using only the most violent events to discriminate between cells is also problematic because most of them happened during the first 2 months of 2015 when Nigerian forces launched a major military offensive. This period, during which Boko Haram was the deadliest, with 5% of its events and 17% of its fatalities, is too short to study long term changes within the organisation. Figure 11 shows the cumulative number of events per cell on the horizontal axis and the corresponding cumulative number of casualties per cell on the vertical axis for different mobility scenarios. Since few events are highly violent, the cells responsible for them are much deadlier than the rest. All simulations have a similar structure in terms of the number of casualties (even with more or fewer cells) and so the number of casualties does not provide a signal to differentiate between cells.
Fig. 10 The average metric of specialisation under different values of the daily speed ν and the maximum distance μ according to the number of cells which are considered for the metric.