The method used in our paper to understand the internal structure of Boko Haram differs from existing approaches. Building on a comprehensive dataset that includes all violent events in northern Nigeria and the neighbouring countries since 1997, we estimate the fragmentation of Boko Haram with an agent-based model that identifies cells moving between Boko Haram events (Epstein 2002; Moon and Carley 2007; Park et al. 2012). Our approach requires two input parameters (the maximum daily speed of a cell and the maximum distance between consecutive events), and we analyse their impact on the results of the model (e.g., the number of cells detected).

To analyse the mobility of Boko Haram cells, the locations of events are clustered and a spatial undirected weighted network is constructed based on those clusters, which captures how violent events are spatially linked and how cells move between different locations.

### Data

Our study uses data from the Armed Conflict Location & Event Data project (ACLED) (Raleigh et al. 2010). To date, ACLED has recorded approximately half a million individual events and contains information about all reported political violence and protest events across Africa, South and Southeast Asia, the Middle East, Europe, and Latin America, drawn mainly from local and regional media, NGO reports and social media accounts. Reports are split into individual events according to location, type of violence and actors involved. For each event, the dataset records the date, actors, type of violence, location and fatalities, together with estimates of its spatial and temporal precision.

All events in which Boko Haram (including all of its factions) was involved as an *actor* or *associate actor* were selected from the ACLED dataset, giving 3,795 events in total. Because our goal is to analyse the most recent mobility patterns of Boko Haram, a small number of isolated events involving Boko Haram before May 21st, 2012 were excluded from the analysis. This is the only filter applied to the 3,795 events: it removes 29.8% of the days since the first Boko Haram event but only 7.3% of the events. Two major events were nevertheless excluded: the July 2009 uprising of Boko Haram against the police and military in Maiduguri, which resulted in 800 casualties, and the suicide attacks in Kano in January 2012, which resulted in 185 casualties. Other events during the omitted period were less violent and resulted in fewer casualties.

In total, our dataset comprises 3,517 events and 36,775 casualties recorded by ACLED from May 2012 to May 2019, which represents 92.7% of the events and 94.4% of the total casualties attributed to Boko Haram since 2009.
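The selection rule above can be sketched as a small filter. This is a minimal illustration, not the authors' code: the field names `actor1`, `assoc_actor_1`, `actor2`, `assoc_actor_2` and `event_date` are modelled on ACLED's export columns and should be checked against the ACLED codebook, and matching the substring "Boko Haram" to capture every faction is a simplifying assumption.

```python
from datetime import date

# Field names modelled on ACLED's export columns (assumption: check the ACLED codebook)
ACTOR_FIELDS = ("actor1", "assoc_actor_1", "actor2", "assoc_actor_2")
CUTOFF = date(2012, 5, 21)  # events strictly before this date are excluded


def involves_boko_haram(event):
    """True if 'Boko Haram' appears in any actor or associate-actor field."""
    return any("boko haram" in (event.get(f) or "").lower() for f in ACTOR_FIELDS)


def select_events(events, cutoff=CUTOFF):
    """Keep Boko Haram events on or after cutoff; also return the share removed by the date filter."""
    bh = [e for e in events if involves_boko_haram(e)]
    kept = [e for e in bh if e["event_date"] >= cutoff]
    dropped_share = 1 - len(kept) / len(bh) if bh else 0.0
    return kept, dropped_share


# Hypothetical toy records; actor names are placeholders
toy = [
    {"event_date": date(2011, 3, 1), "actor1": "Boko Haram"},
    {"event_date": date(2015, 2, 1), "actor1": "Militia X", "assoc_actor_2": "Boko Haram"},
    {"event_date": date(2016, 7, 9), "actor1": "Boko Haram - Faction Y"},
    {"event_date": date(2016, 7, 9), "actor1": "Militia X"},
]
kept, share = select_events(toy)
```

Applied to the counts reported above, the dropped share is 1 − 3,517/3,795 ≈ 0.073, which matches the 7.3% figure.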

### Algorithm to detect fragmentation

Boko Haram has been most active around Lake Chad, a swampy region which has lost 90% of its surface water since 1960 (Policelli et al. 2018; Itno et al. 2015). The road infrastructure around the Lake and in northern Nigeria is in very poor condition, which makes mobility limited, slow or costly. Due to the lack of roads, travelling between Maiduguri (Nigeria) and Bol (Chad), two cities on opposite sides of Lake Chad separated by only 250 kilometres as the crow flies, takes nearly 10 hours and 600 road kilometres. This is roughly the same straight-line distance as between Lagos and Benin City, two Nigerian cities connected by a 5.2-hour road journey.
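The straight-line distances quoted above can be checked with the haversine great-circle formula. The city coordinates below are approximate values assumed for illustration; they are not taken from the paper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate city-centre coordinates (lat, lon), assumed for illustration
MAIDUGURI = (11.83, 13.15)
BOL = (13.46, 14.71)
LAGOS = (6.52, 3.38)
BENIN_CITY = (6.34, 5.60)

d_lake = haversine_km(*MAIDUGURI, *BOL)
d_south = haversine_km(*LAGOS, *BENIN_CITY)
print(f"Maiduguri-Bol: {d_lake:.0f} km, Lagos-Benin City: {d_south:.0f} km")
```

Both distances come out just under 250 km, so the Maiduguri-Bol road journey of about 600 km is roughly 2.4 times the straight-line distance.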

Some authors have argued that Boko Haram intensifies its attacks in rural areas during the rainy season (June–September), a period during which the mobility of government forces is limited by water-logged roads (Agbiboa 2019). ACLED data does not support this claim. The highest number of events is recorded in January (nearly two events per day on average since 2016) and the highest number of casualties in February (12.3 casualties per day on average since 2016), both during the dry season.

Since 2014, there has been at least one Boko Haram event on 75% of days and in 92% of all pairs of consecutive days. If a single Boko Haram group (which we call a “cell”) were responsible for all of these events, it would have travelled on average 216 kilometres each day for the past 7 years, the equivalent of travelling around the Earth twice each year. Since this is highly implausible, we assume that Boko Haram is fragmented into an unknown number of cells responsible for the observed patterns of attacks in the region.
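The “twice around the Earth” comparison follows from a one-line calculation, using Earth's equatorial circumference of about 40,075 km (a figure not given in the text):

$$ 216\,\tfrac{\text{km}}{\text{day}} \times 365\,\tfrac{\text{days}}{\text{year}} = 78{,}840\,\tfrac{\text{km}}{\text{year}} \approx 1.97 \times 40{,}075\,\text{km}. $$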

Our model (algorithm) for identifying Boko Haram cells is based on the principle of least action, which assumes that the mobility of Boko Haram is constrained by environmental factors (distance, lack of roads) and security factors (presence of government forces) that reduce its familiarity with unknown locations and limit the impact of its attacks. Boko Haram events are analysed in sequential order, in a manner similar to that used previously to detect crime pattern motifs (Davies et al. 2016). Specifically, the algorithm assesses each event in turn, assuming that cells move as little as needed. The first event is assigned to a cell, and the location and date of that event are taken as the *last known location* of the cell. For each subsequent event:

1. If the event takes place at a “reasonable distance” and within a “reasonable time” from the last known location of an existing cell, then we assume that the cell has moved between the two locations and is also responsible for the event. The location and time of the cell are updated. If the event could have been conducted by multiple cells, one of them is selected at random.

2. However, if the event takes place either too far away from or too soon after the last known event of every existing cell, then we assume that the event was conducted by a different cell. Hence, a new cell is created.

This approach thus also uses the principle of least group size (Thelen 1949), which assumes that if Boko Haram had more cells, it would be able to commit more attacks, and more frequently, than observed.

In order to quantify “reasonable distance” and “reasonable time”, let *d*_{i, j} be the distance between events *i* and *j* and *t*_{i, j} the number of days between them. Let *ν* > 0 be the *maximum daily speed of a cell* (in kilometres per day) and let *μ* > 0 be the *maximum distance between two consecutive events* (in kilometres) such that if:

$$ \frac{d_{i,j}}{t_{i,j}} > \nu, \tag{1} $$

or if

$$ d_{i,j} > \mu, \tag{2} $$

we assume that the two events were executed by different cells. In other words, Equation (1) restricts the maximum daily speed of a cell (*ν*), and Equation (2) restricts the total distance that a cell can move between two consecutive events (*μ*). Figure 1 illustrates the cell assignment process outlined above.
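As a purely illustrative check of Equations (1) and (2), take hypothetical values *ν* = 100 km/day and *μ* = 300 km, and two events about 250 km apart (roughly the Maiduguri-Bol distance):

$$
\begin{aligned}
t_{i,j} = 2:&\quad \frac{250}{2} = 125 > 100 = \nu &&\Rightarrow \text{Eq. (1) holds: different cells},\\
t_{i,j} = 3:&\quad \frac{250}{3} \approx 83.3 \le \nu, \quad 250 \le 300 = \mu &&\Rightarrow \text{the same cell may be responsible}.
\end{aligned}
$$

With *μ* = 200 km instead, *d*_{i,j} = 250 > *μ*, so Equation (2) assigns the two events to different cells regardless of *t*_{i,j}.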

Since Boko Haram's attacks span more than 10 years, we presume that some of its cells will disappear, either because their members are killed or because they are no longer able to coordinate their activities. We therefore assume that a cell which has not been active for 1 year has dissolved and is not responsible for any future events. We also treat the main Boko Haram factions identified by ACLED (Barnawi and Shekau) separately: we assume that Barnawi cells do not take part in Shekau's events and that Shekau's cells do not take part in Barnawi's events.
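The assignment rules (Equations (1) and (2), the one-year dissolution rule and the separation of factions) can be combined into a minimal sketch. It is not the authors' implementation, and several details are assumptions: distances are great-circle (haversine) distances, times are whole days, two events at the same location are always compatible, two events on the same day at different locations are not, and faction labels are compared by exact string equality, so unlabelled events form their own group.

```python
import random
from dataclasses import dataclass, field
from datetime import date
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance (km) between two (lat, lon) points in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


@dataclass
class Event:
    day: date
    pos: tuple      # (lat, lon) in degrees
    faction: str    # e.g. "Shekau" or "Barnawi" (hypothetical labels)


@dataclass
class Cell:
    faction: str
    last_day: date  # date of the last known event
    last_pos: tuple # location of the last known event
    events: list = field(default_factory=list)


def can_reach(cell, ev, nu, mu):
    """False if Eq. (1) (speed > nu) or Eq. (2) (distance > mu) rules the move out."""
    d = haversine_km(cell.last_pos, ev.pos)
    t = (ev.day - cell.last_day).days
    if d == 0:
        return True         # same location: no movement needed
    if d > mu:
        return False        # Eq. (2): too far between consecutive events
    if t == 0:
        return False        # same day, different place: unbounded speed
    return d / t <= nu      # Eq. (1): daily speed must not exceed nu


def assign_cells(events, nu, mu, lifetime_days=365, seed=0):
    """Chronological assignment of events to cells; returns the list of cells created."""
    rng = random.Random(seed)
    cells = []
    for ev in sorted(events, key=lambda e: e.day):
        candidates = [
            c for c in cells
            if c.faction == ev.faction                      # factions kept apart
            and (ev.day - c.last_day).days < lifetime_days  # cell not yet dissolved
            and can_reach(c, ev, nu, mu)
        ]
        if candidates:
            cell = rng.choice(candidates)  # several feasible cells: pick one at random
        else:
            cell = Cell(ev.faction, ev.day, ev.pos)  # no feasible cell: create a new one
            cells.append(cell)
        cell.last_day, cell.last_pos = ev.day, ev.pos  # update last known location
        cell.events.append(ev)
    return cells
```

Seeding `random.Random` makes the random choice among feasible cells reproducible for a given seed; repeating the procedure with different seeds shows how much that choice matters.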

We identify and report, as functions of the parameters *ν* and *μ*, the *total number of cells* *T*_{τ}(*ν*, *μ*), which counts all cells that existed up to time *τ*, and the *number of active cells* *A*_{τ}(*ν*, *μ*), which counts only those still active at time *τ*. In the example of Fig. 1, four events lead us to identify three cells. We write *T*_{2019}(*ν*, *μ*) and *A*_{2019}(*ν*, *μ*) for the latest known numbers of total and active cells for given values of *ν* and *μ*, and *T*_{τ}(*ν*, *μ*) and *A*_{τ}(*ν*, *μ*) when a different period is considered.
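Given the event dates of each detected cell, *T*_{τ} and *A*_{τ} reduce to two counts. A minimal sketch, assuming each cell is represented by a list of its event dates (a hypothetical representation) and that a cell counts as active at *τ* if its most recent event up to *τ* is less than one year old, consistent with the dissolution rule:

```python
from datetime import date


def total_cells(cell_dates, tau):
    """T_tau: number of cells whose first event falls on or before tau."""
    return sum(1 for dates in cell_dates if min(dates) <= tau)


def active_cells(cell_dates, tau, lifetime_days=365):
    """A_tau: cells whose most recent event up to tau is less than lifetime_days old."""
    count = 0
    for dates in cell_dates:
        past = [d for d in dates if d <= tau]
        if past and (tau - max(past)).days < lifetime_days:
            count += 1
    return count


# Hypothetical event dates for three cells
cells = [
    [date(2013, 1, 1), date(2013, 6, 1)],
    [date(2014, 1, 1)],
    [date(2018, 12, 1), date(2019, 3, 1)],
]
print(total_cells(cells, date(2019, 5, 31)), active_cells(cells, date(2019, 5, 31)))  # prints "3 1"
```

Evaluating both functions for a sequence of dates *τ* gives the time series of fragmentation for fixed *ν* and *μ*.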

### Parameter space and sensitivity analysis

The maximum distance that a cell can move between consecutive events (*μ*) and its maximum daily speed (*ν*) are input parameters of the model. The range of values considered “reasonable” for these two parameters therefore defines the *parameter space*. We assume that a cell can move at a maximum daily speed of up to 200 kilometres per day (so values of *ν* range between 0 and 200) and that the distance between any two consecutive events is at most 400 kilometres (so values of *μ* range between 0 and 400).

Notice that with very large values of *ν* and *μ*, we get cells that could be “almost everywhere” as they move very fast and over long distances. This results in a small *T*_{τ}(*ν*, *μ*) and *A*_{τ}(*ν*, *μ*) since the same cell could have been responsible for most of the events (except for the ones which happen simultaneously). With *μ* = 0 or *ν* = 0, we obtain cells with no mobility and so, except for events which took place in the same location, the procedure assigns a different cell to each unique location. In that case, we get that *T*_{2019}(0, 0) = 900, which means that Boko Haram has been active in roughly 900 unique locations, and that *A*_{2019}(0, 0) = 233, meaning that they have been active only in 233 different locations during the past year and so many cells would be considered to be dissolved by now. Different values of *ν* and *μ* yield different numbers of total and active cells. We analyse *T*_{τ}(*ν*, *μ*) and *A*_{τ}(*ν*, *μ*) to illustrate the impact of the two parameters.

The parameter space, with the maximum distance between two events *μ* between 0 and 400 kilometres and the maximum daily speed *ν* between 0 and 200 kilometres per day, was explored first by randomly drawing values of *ν* and *μ* and then processing the consecutive Boko Haram events as described above. This procedure was repeated 100,000 times, each time with newly drawn values of *ν* and *μ*, and the corresponding *T*_{2019}(*ν*, *μ*) and *A*_{2019}(*ν*, *μ*) were reported. Since we are also interested in detecting when Boko Haram has been more or less fragmented, we computed *T*_{τ}(*ν*, *μ*) and *A*_{τ}(*ν*, *μ*) for values of *τ* from 2012 to 2019 for some fixed values of *ν* and *μ*.
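The random exploration of the parameter space can be sketched as a loop over random draws. The text says only that values were chosen randomly; uniform sampling over [0, 200] × [0, 400] is our assumption, and `model` stands for any callable returning (*T*, *A*) for a pair (*ν*, *μ*), for instance a wrapper around the cell-assignment procedure. The `toy_model` below is a placeholder with no empirical meaning.

```python
import random


def sweep(model, n_samples=100_000, nu_max=200.0, mu_max=400.0, seed=0):
    """Draw (nu, mu) uniformly from the parameter space and record (T, A) for each draw."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        nu = rng.uniform(0.0, nu_max)  # maximum daily speed, km per day
        mu = rng.uniform(0.0, mu_max)  # maximum distance between consecutive events, km
        total, active = model(nu, mu)
        results.append((nu, mu, total, active))
    return results


def toy_model(nu, mu):
    """Placeholder returning fixed counts; a real run would call the cell assignment."""
    return (1, 1) if nu > 0 and mu > 0 else (2, 1)


runs = sweep(toy_model, n_samples=1_000)
```

Storing every realisation, rather than only summary statistics, means the same results can later be filtered around any chosen pair (*ν*_{0}, *μ*_{0}).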

### Spatial network of Boko Haram events

Although it would be possible to observe the mobility of cells by looking directly at the locations of their corresponding events, grouping locations into *n* spatial clusters enables us to consolidate very short-distance movements. It also limits the possible journeys between distinct locations to *n*(*n* − 1)/2 and makes it possible to analyse the most frequent journeys. Note that the construction of the network depends on our choice of the parameters *μ* and *ν*: alternative choices of *μ* and *ν* yield different networks.

Event locations were clustered into nodes using Partitioning Around Medoids (Reynolds et al. 2006), a procedure similar to k-means, with the restriction that locations inside a node are at a distance smaller than 20 kilometres. The result is a spatial network with 420 nodes: 294 nodes (70%) in Nigeria, 80 nodes (19%) in Cameroon, 27 nodes (6%) in Niger and 19 nodes (5%) in Chad. Each event is assigned to its corresponding medoid. The medoids (the nodes of the network) are located such that 99.4% of the events occurred in the same country as their medoid; for the remaining 23 events, the medoid lies in a different country.
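A radius-constrained medoid clustering can be sketched as follows. Two caveats: this uses the simpler alternating (Voronoi-iteration) k-medoids heuristic rather than PAM's swap search, and it reads the 20 km restriction as "every location lies within 20 km of its medoid", which is our assumption. The number of clusters *k* is increased until the restriction holds. The pure-Python distance loops are quadratic, so a dedicated PAM implementation would be preferable for thousands of locations.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(p, q):
    """Great-circle distance (km) between two (lat, lon) points in degrees."""
    (lat1, lon1), (lat2, lon2) = p, q
    a = (sin(radians(lat2 - lat1) / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def k_medoids(points, k, iters=50):
    """Alternating k-medoids with deterministic farthest-point initialisation."""
    medoids = [0]
    while len(medoids) < k:  # add the point farthest from all current medoids
        medoids.append(max(range(len(points)),
                           key=lambda i: min(haversine_km(points[i], points[m]) for m in medoids)))
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):  # assignment step: nearest medoid
            clusters[min(medoids, key=lambda m: haversine_km(p, points[m]))].append(i)
        new = [min(members, key=lambda c: sum(haversine_km(points[c], points[j]) for j in members))
               for members in clusters.values() if members]  # update step
        if sorted(new) == sorted(medoids):
            break
        medoids = new
    return medoids


def cluster_with_radius(points, radius_km=20.0):
    """Increase k until every point lies within radius_km of its nearest medoid."""
    for k in range(1, len(points) + 1):
        medoids = k_medoids(points, k)
        labels = [min(medoids, key=lambda m: haversine_km(p, points[m])) for p in points]
        if all(haversine_km(p, points[m]) < radius_km for p, m in zip(points, labels)):
            return medoids, labels
    return [], []
```

Each label is the index of the medoid location, so events can be mapped to network nodes directly. For the 420 nodes reported above, the number of possible undirected journeys between distinct nodes is 420 × 419 / 2 = 87,990.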

We examine specific parts of the parameter space. To do so, we take pairs of values *ν*_{0} and *μ*_{0} and select all realisations for which *ν* and *μ* are close to *ν*_{0} and *μ*_{0}. Formally, a realisation is considered “close” if ∣*ν* − *ν*_{0}∣ < 3.5 kilometres per day and ∣*μ* − *μ*_{0}∣ < 3.5 kilometres, and it is then used to construct the spatial network around *ν*_{0} and *μ*_{0}. Instead of assuming that one realisation is the “true” network for the parameters *ν*_{0} and *μ*_{0}, we consider many realisations with slightly perturbed parameters, in case a small perturbation changes the structure of the network completely. For a given pair *ν*_{0} and *μ*_{0}, the link *ij* is added to the network if the algorithm introduced above detects that a cell moved from node *i* to node *j* or from *j* to *i*. The weight of the edge is the number of journeys made by any cell between *i* and *j* (in either direction) over the set of realisations around *ν*_{0} and *μ*_{0} (more details in the Supplementary Materials, Section 5.1).
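The selection of “close” realisations and the undirected aggregation of journeys can be sketched as follows. The input format is hypothetical: each realisation is a tuple (*ν*, *μ*, journeys), where journeys lists the (*i*, *j*) node pairs between consecutive events of the same cell. Journeys with *i* = *j* are kept as self-loops so that trips within a single node can be counted later.

```python
from collections import Counter


def aggregate_edges(realisations, nu0, mu0, tol=3.5):
    """Sum undirected journey counts over realisations with (nu, mu) within tol of (nu0, mu0).

    realisations: iterable of (nu, mu, journeys), journeys being a list of (i, j) node pairs.
    Returns (weights, number of realisations used).
    """
    weights = Counter()
    n_used = 0
    for nu, mu, journeys in realisations:
        if abs(nu - nu0) < tol and abs(mu - mu0) < tol:
            n_used += 1
            for i, j in journeys:
                weights[(min(i, j), max(i, j))] += 1  # i->j and j->i share one edge
    return weights, n_used


# Hypothetical realisations; the third lies outside the tolerance window
realisations = [
    (100.0, 200.0, [(1, 2), (2, 1), (3, 3)]),
    (102.0, 198.0, [(1, 2)]),
    (110.0, 200.0, [(1, 2)]),
]
w, n = aggregate_edges(realisations, nu0=100.0, mu0=200.0)
```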

The edge weights *w*_{ij} therefore measure the likelihood that a Boko Haram cell with maximum distance *μ*_{0} and maximum daily speed *ν*_{0} makes a journey between *i* and *j*. For different values of *μ*_{0} and *ν*_{0}, we measure the percentage of trips completed inside the same node, the percentage of trips that take place along the top 1% of edges, and the percentage of edges that are present.
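The three summary measures can be sketched from aggregated journey counts. Several choices here are our assumptions rather than the paper's definitions: weights are normalised by the total number of journeys, the top 1% is taken over present between-node edges ranked by weight and reported as a share of all trips, and the share of present edges is relative to the *n*(*n* − 1)/2 possible undirected edges (87,990 for *n* = 420).

```python
import math


def network_summary(weights, n_nodes):
    """Summary statistics for one aggregated network.

    weights maps undirected node pairs (i, j), i <= j, to journey counts;
    pairs with i == j are journeys within a single node.
    """
    total = sum(weights.values())
    likelihood = {edge: w / total for edge, w in weights.items()}  # normalised w_ij
    within = sum(w for (i, j), w in weights.items() if i == j)
    between = sorted((w for (i, j), w in weights.items() if i != j), reverse=True)
    top_k = max(1, math.ceil(0.01 * len(between)))  # top 1% of present edges
    possible = n_nodes * (n_nodes - 1) // 2          # possible undirected edges
    return {
        "likelihood": likelihood,
        "pct_within_node": 100 * within / total,
        "pct_top1_edges": 100 * sum(between[:top_k]) / total,
        "pct_edges_present": 100 * len(between) / possible,
    }


# Hypothetical aggregated counts on a 4-node network
summary = network_summary({(1, 2): 3, (3, 3): 1, (2, 4): 1}, n_nodes=4)
```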