Introduction

The brain is a highly complex organ responsible for controlling essential functions such as sensory processing, cognition, memory, respiration, motor control, and language production (Ito et al., 2020; Adamovich et al., 2024; Iranmanesh et al., 2021). These functions rely on intricate computations within distinct brain regions and the transmission of signals between them. Long-range axons, also referred to as projecting axons, control this interregional communication (Liu et al., 2024; Shin et al., 2020), and numerous studies have highlighted their critical role in global neuronal circuitry (Zingg et al., 2014; Hilgetag & Zikopoulos, 2022; Mateus et al., 2024). Therefore, understanding the organization and connectivity of long-range axons is vital for deciphering brain function.

Recent advances in experimental imaging techniques, such as the sparse labeling technique, described in Economo et al. (2019) and Winnubst et al. (2019), overcome limitations of previous reconstruction efforts, such as electron microscopy (Helmstaedter et al., 2013; Kasthuri et al., 2015), and staining of local axons (Thomson & Armstrong, 2011; Blackman et al., 2014; Karube et al., 2017), enabling the reconstruction of complete long-range axonal morphologies that span multiple brain regions. However, the process remains time-consuming and costly, making it impractical to capture every detail of the biological neuronal circuits (Economo et al., 2019; Wang et al., 2018). In addition, experimental approaches face limitations in manipulating neuronal morphological properties to assess their effects on brain function. To overcome these challenges, numerical simulations of large-scale neuronal circuits, which we term as morphologically detailed circuits, are essential to gain deeper insights into brain organization and the distribution of tasks across different brain regions. These simulations should model a large number of neuronal morphologies, including both dendrites and axons, to establish complete circuit connectivity. A major challenge is generating realistic neuronal circuits in which the modeled morphologies exhibit local morphometric properties (e.g., branching patterns and section characteristics) that closely resemble those of experimentally reconstructed neurons. Although this work does not feature circuit simulations, it provides a new tool to synthesize multiple long-range axons accurately. This is fundamental for large-scale circuit simulations as these long-range axons enable the connection between brain regions.

Fig. 1
Fig. 1
Full size image

Schematic of long-range axon modeling, including the steps of the long-range trunk, the common ancestors, and the tufts generated at each target point

Long-range axons must target specific brain regions and follow pre-defined pathways, depending on the location of their source and targets (Brodal, 2010; Osten & Margrie, 2013). Various models have been proposed to synthesize artificial neuronal morphologies, each offering different advantages. Some models focus on simulating more realistic axonal growth during brain development, incorporating detailed molecular cues to guide the growth cone (Zubler & Douglas, 2009; Torben-Nielsen & Cuntz, 2014). While these models can reproduce detailed morphologies, they come with significant computational costs and require extensive data, which is difficult to acquire for large-scale circuits. However, mathematical models generate morphologies using mathematical principles and morphological statistics. Although these models require less experimental data, they need manual parameter adjustments based on the type and location of the neurons that are synthesized, therefore they are more difficult to generalize to our case of axonal shapes (Koene et al., 2009; Luczak, 2006; Ascoli et al., 2001; Cuntz et al., 2010).

A recent model successfully integrated the mathematical and statistical approaches into a topological synthesis algorithm (Kanari et al., 2022b). However, this work was limited to the synthesis of dendritic morphologies. Local axons extend within a brain region and similarly to the dendrites they can be simulated taking into account local contextual cues. Long-range axons extend over large distances from the soma and target specific brain regions during development to form functional connections (Dickson, 2002; Sakai & Kaprielian, 2012; Kerstjens et al., 2022; Goodman & Shatz, 1993). Long-range axons must grow taking into account the distance from the soma, as dendrites and local axons do, but also navigate to specific target regions within multiple brain regions (Kollins et al., 2009; Liu et al., 2024; Winter et al., 2023; Zhou et al., 2023). This requirement, along with the need for computational efficiency and realistic local morphometrics, cannot be fully captured by previous models.

In this work, we introduce a novel method for computationally synthesizing long-range axons that reproduce the local morphometrics of reconstructed axons, while also targeting the appropriate brain regions. Our approach utilizes the Steiner Tree Algorithm to generate a skeleton connecting large-scale target regions, followed by a random walk guided by this skeleton to synthesize the primary axonal trunk. Finally, a local-scale model, based on topological synthesis (Kanari et al., 2022b) is applied to generate the terminal branches of the axonal tree. We demonstrate that this method successfully combines large-scale target precision with detailed local-scale axonal synthesis, ensuring accurate targeting of brain regions while preserving realistic local morphometrics.

The organization of the paper is as follows. In Section “Definitions and Modeling Hypothesis”, we briefly define the main concepts used in the following sections, then we describe the method in Section “Methods”. Finally, in Section “Results” we present several results to validate the method and propose an application in a mouse brain atlas.

Fig. 2
Fig. 2
Full size image

Main steps of the long-range axon synthesis algorithm: define the target points (a), generate a 3D graph containing the source point, the target points, and other intermediate points (b), compute the Steiner Tree that connects the source point to the target points through this network (c), post-process the Steiner Tree solution to make it more realistic (d) and generate one tuft at each target point (e)

Definitions and Modeling Hypothesis

Axons are complex structures with several components, as detailed in Gibson and Ma (2011). In this work, we divide axons into a long-range trunk, which can span many brain regions, and several local tufts, as shown in Fig. 1. The point connecting a tuft to the long-range trunk is called the common ancestor of the tuft. Also, for modeling consistency, the long-range trunk cannot contain any terminals. Terminals are always part of a tuft, so a tuft may contain only one terminal section. The long-range trunk is built so that it connects specific brain regions targeted by the axon. These brain regions are registered in a brain atlas that divides space into voxels and associates data with each of them (brain region ID, local orientation, etc.).

This model is consistent with Gibson and Ma (2011) which states that during the development of the axon, a growth cone extends the axon in the direction of the targeted brain regions to connect to them. During this growth, branching processes occur to connect several brain regions while minimizing the total length of the axonal tree. Then, when the growth cone reaches a targeted brain region, the branch splits into multiple short terminal sections that form synapses with the dendrites of the surrounding neurons. In this work, we aim to create morphologies that are biologically similar to the final state of the developmental process, thus the intermediate states of axonal development are out of scope.

Methods

In this section, we describe the methodology of the axonal growth. The main steps are summarized in Fig 2. The axon grows in two subsequent steps: first, the trunk is generated based on the Steiner tree (Fig. 2c), then the tufts are created based on a topological synthesis approach (Fig. 2.e).

Creating Statistical Inputs from Reconstructed Cells

To synthesize realistic morphologies, it is important to calibrate multiple parameters from real reconstructed neuronal morphologies. There are two kinds of parameters to take into account: statistics on local properties of the sections of both the long-range trunk and the tufts, and the global tuft properties that describe their sizes and shapes depending on where they are located.

First, to compute the statistics and properties of the long-range trunk and the tufts, we should define which parts of a reconstructed morphology are considered as tufts. The rest is considered as the long-range trunk. In this work, the tufts are defined based on a simple clustering algorithm: two points separated by a Euclidean clustering distance \(d^{\textrm{Euclidean}} \le d^{\textrm{Euclidean}}_{\textrm{max}}\) and a clustering path distance \(d^{\textrm{path}} \le d^{\textrm{path}}_{\textrm{max}}\) are considered part of the same tuft. Applying this algorithm to each pair of terminals and merging the pairs with a common terminal defines the tufts. Other clustering algorithms were tested (e.g., do not consider the path or define the tufts according to brain regions), but the one presented here was computationally efficient and gave good results. In the rest of this work, the clustering distance stands for the Euclidean clustering distance.

Once the tufts are defined, it is possible to extract the required features from them. For each tuft, its barcode, which is a topological signature as described in Kanari et al. (2020), is computed using the Topological Morphology Descriptor (Kanari et al., 2024). Then the following tuft properties are computed:

  • the coordinates of the common ancestor,

  • the total path length of the tuft,

  • the path distance from the soma to the common ancestor,

  • the orientation in the local atlas frame of the segment between the common ancestor and the center of mass of the tuft (the local atlas frame has the Y axis towards the pia).

All these properties are then stored in a dataset from which entries will be sampled to synthesize new tufts.

Finally, the mean and standard deviation of the lengths of the segments of the long-range trunk are extracted to make it statistically realistic during synthesis. These properties will be used to post-process the skeleton of the long-range trunk, as the main process described in Section “Connecting SourcePoints to Target Points” only aims at drawing its trajectory, not its actual morphology.

Building the Long-range Tree

The main trunk of an axon, also referred to as the long-range trunk or long-range tree, is the large-scale part of the axon that connects different brain regions. The synthesis process of the main trunk has 3 main sub-steps:

  1. 1.

    find the source and target point coordinates (step shown in Fig. 2.a, Section “Source and Target Points Placement”);

  2. 2.

    connect the target points (steps shown in Fig. 2.b,c, Section “Connecting Source Points to Target Points”);

  3. 3.

    post-process the resulting tree (step shown in Fig. 2.d, Section “Post-processing”).

Source and Target Points Placement

The long-range axon synthesis algorithm starts by collecting the given morphologies from the given cell collection to which the synthesized long-range axons should be grafted. Then the starting points of each axon are computed. These starting points are called the source points of the axons and can either be given explicitly (e.g. to start from an existing local axon) or just start from the soma.

The next step is to associate a source population to each long-range axon. These source populations represent groups of neurons that participate in projections to a given set of target brain regions. Source populations can be arbitrarily associated with each axon or computed using a brain atlas, such as Wang et al. (2020). In the latter case, the brain region of the atlas in which the cell is located is computed, then a source population is randomly picked among the ones associated with this brain region (as multiple source populations can be associated with a given brain region to represent the diversity of projection classes). The probabilities of these source populations are given in the input source population matrix, which can be derived from connectomics results or from large reconstructed morphology datasets (see Petkantchin et al. (2024) for an application of this work).

After that, the target populations of each axon are randomly picked among the ones possible for the associated source population, each target population possibly being associated with several brain regions. The probabilities of each target population depending on a source population are given in a projection probability matrix. Once the target populations have been chosen, a voxel is randomly chosen in the brain region associated with each target population. Then a random shift is applied inside this voxel and the resulting location is called a target point of the axon.

Connecting Source Points to Target Points

As described by Cuntz et al. (2010), the Ramón y Cajal (1911)’s hypothesis about wiring optimization leads to near-optimal trees regarding the minimization of the total path length of the morphology. In graph theory, the problem of connecting a set of terminal points (which are the target points described in Section “Source and Target Points Placement”) in Euclidean space while minimizing the total length of the graph is known as the Euclidean Steiner Tree Problem. This problem is known to be NP-hard (Karp, 1972), which is an issue when generating large axon structures in a reasonable amount of time. Also, in this problem, it is supposed that the cost function is uniform in space, which is not the case in our problem, as axons tend to follow axonal projection tracts for example. For these reasons, we use an approximation to connect the set of terminal points. As a first step, we transform the Euclidean problem into a graph problem by creating a 3D network connecting all the terminal points with some intermediate points. Then, the edge weights are manipulated to model different constraints (especially to decrease the weight of edges located in the projection tracts). Finally, the Steiner Tree is solved in this graph using the approximation described by Hegde et al. (2015).

The graph generation is divided into several parts. First, the terminal points are collected. Then, several types of intermediate points are added. For the first type, each segment between the source point and the target points is evenly split into sub-segments to create \(N_{\textrm{intermediate}}\) new points. These splitting points are used to refine the future graph and to ensure it is possible to connect the terminal points using non-direct paths. Then \(N_{\textrm{random}}\) random points are added in the bounding box of the previous points such that that there is no other point within a \(r_{random}\) radius. These random points are also used to refine the future graph and to ensure that the Steiner Tree algorithm can find the paths that appear longer in Euclidean space but have smaller weights because they go through preferred areas (e.g. projection tracts, see Section “Preferred Region Modeling”). Then Voronoï points are computed between all the previous points (the Voronoï tesselation is the dual of the Delaunay tesselation, so the Voronoï points are the centers of the circumspheres of the Delaunay tetrahedrons) and added to the set (the ones that are too far from the bounding box are discarded). These Voronoï points allow us to reduce the empty areas and ensure that the bifurcation angles are not too small, which is more realistic. Finally, close points are merged to reduce the size of the point set and to avoid too small segments. This process greatly increases the size of the set of points used to compute the Steiner Tree but gives a lot of possible paths to connect the target points and thus can give relevant solution paths.

Once the set of points is created, a Delaunay triangulation is performed to create the edges of the graph. Then the edges of this graph are weighted according to Eq. 1, where the edge i connects the points A and B, \(l_i\) is the length of the edge i, \(o_i\) depends on the orientation of the edge i, \(\zeta _i\) depends on the depths of the edge i in the atlas and \(\gamma _i\) depends on the space area in which the edge i is located. The first term of the weight is required since the total length should be as close to the minimum as possible to build a relevant tree. The second term is used to reduce the probability of transverse sections, which are less common. The third term is used to favor paths that follow the curvature of the atlas layers. The last term is required to favor specific areas (e.g. projection tracts).

$$\begin{aligned} w_{i}&= l_{i} \times o_{i} \times \zeta _{i} \times \gamma _{i} \end{aligned}$$
(1)

Equation 2 defines how each term can be computed. In this equation, S is the center of the soma, C is the middle of the segment AB, \(\vec {r}\) is a location in space, \(\alpha _{o}\), \(\alpha _{\zeta }\) and \(\alpha _{\gamma }\) are positive exponents, \(\lambda _{o}\), \(\lambda _{\zeta }\) and \(\lambda _{\gamma }\) are amplitude coefficients, \(y(\vec {r})\) is the depth field associating an atlas depth to each location \(\vec {r}\) in space and \(\gamma (\vec {r})\) is an attraction field that represents the affinity of the axons for each location \(\vec {r}\) in space.

$$\begin{aligned} \left\{ \begin{array}{ll} l_i & = ||\overrightarrow{AB}|| \\ o_i & = 1 + \lambda _o \times \left( \sin ( \angle ( \overrightarrow{AB} , \overrightarrow{SC} ) ) \right) ^{\alpha _o} \\ \zeta _i & = 1 + \lambda _{\zeta } \times \left( \dfrac{ | y(\vec {r}_A) - y(\vec {r}_B) | }{ || \overrightarrow{AB} || } \right) ^{\alpha _{\zeta }} \\ \gamma _{i} & = 1 + \lambda _{\gamma } \times \left( 1 - \exp { \dfrac{- \left( \gamma ( \vec {r}_A ) + \gamma ( \vec {r}_B ) \right) }{2 \alpha _{\gamma }}} \right) \end{array} \right. \end{aligned}$$
(2)

After these steps, the Steiner Tree is computed on the resulting graph to select which edges are part of the final solution (more details about the Steiner Tree algorithm are presented in Appendix F). These edges are collected to build the long-range trunk skeleton. Then, this skeleton will be refined to make small-scale morphometrics more realistic. A simple example of the long-range trunk skeleton building process is shown in Fig. 10.

Preferred Region Modeling

As described in Thiebaut de Schotten et al. (2011), the main trunks of the axons usually do not go straight to their targets. In contrast, they tend to use close trajectories in brain regions called axonal projection tracts. Equation 2 presented how the weights of each edge of the graph are assigned before computing the Steiner Tree. In this equation, the last term allows us to consider a set of attractors at arbitrary locations which generate a global attraction field. This attraction field can be used to guide the main trunks of the axons to make them pass through specific regions in the brain, such as the experimentally observed projection tracts (Wang et al., 2020). The right part of Fig. 10 presents the same example as in its left part except that one attractor was added in the upper left corner. In this figure, edges are colored by the linear densities of their weights, where blue-colored edges have small weights and are thus preferred by the Steiner Tree algorithm, while the red-colored edges have high weights and are thus less likely to be selected by the Steiner Tree algorithm. As one can see, edges close to the attractor have smaller weights than edges far from the attractor. As a result, the optimal solution selected by the Steiner Tree algorithm, in this case, is very different from the one without any attractor: the first part of the main trunk passes close to the attractor, and then goes to the closest target, and then connects the furthest target. This simple example demonstrates that it is possible to alter the main trunk shape by defining attractors in space.

In the context of a synthesized axon, multiple attractors may be needed to represent the relevant attraction field. To test and validate the attraction mechanism, an axon was synthesized such that it should be very close to a given reconstructed axon (Fig. 11a to f). In that case, the target points were defined as the common ancestor of each clustered tuft, as described in Section “Mimicking Biological Reconstruc-tions ”. The attractors were built using a new dummy brain region that was added to the atlas composed of all voxels intersected by the reconstructed axon. Then one attractor was created at the centers of a set of voxels of this dummy region. These voxels were randomly chosen such that the distance between two chosen voxels is greater than a given distance. When there is no attractor (Fig. 11a) the main axon trunk goes straight to the target, while when there are enough attractors (Fig. 11b and c) the trunk follows them properly. Nevertheless, a minimum number of attractors is required to ensure good results. In Figs. 11d and e, the attractors are about \(300 \mu m\) and \(500 \mu m\) apart from each other, respectively, and the trunk deviates from the expected path. And if the attractors are far from each other, for example, \(750 \mu m\) apart from each other (Fig. 11f), then the Steiner Tree algorithm ignores the attractors and the trunk goes straight to the target. This example demonstrates that this method can follow complex paths in space when required data are available but that a minimum amount of data is required to provide enough data to the algorithm (i.e. create enough attractors).

Post-processing

The previous Section “Connecting Source Points to TargetPoints” described how the long-range trunk skeleton is built to connect the proper brain regions. However, the resulting tree needs post-processing to ensure that its local morphometrics are statistically consistent with the reconstructed axons. This post-process consists of performing a correlated random walk guided by the intermediate points of the Steiner Tree. This random walk must be calibrated so that the result satisfies the local morphometrics extracted in Section “CreatingStatistical Inputs from Reconstructed Cells”.

An important feature for the subsequent growth process of the tufts is to keep immovable the initial target points that will be used as root points to grow the tufts (these root points will thus become the common ancestors of the synthesized tufts). For this reason, the steps described in this section all preserve these points in the morphology. Furthermore, the bifurcation points are also kept immovable for simplicity.

For each tree, a set of long-range trunk statistics is randomly selected among the ones extracted in Section “Crea-ating Statistical Inputs from Reconstructed Cells”. Then a guided correlated random walk is performed between each consecutive immovable point in the tree, whether they are the source point, a bifurcation point, or a target point. The inputs of this random walk are thus a start point, an end point which is the global target to reach, and a set of intermediate points which are intermediate targets close to which the random walk must pass. The direction of each step of this random walk is defined by Eq. 3, where \(\vec {u}_i\) is the unit vector pointing in the direction of the current step i and which is equal to a weighted sum of several unit vectors: \(\vec {u}_{ep}\) is the unit vector pointing to the endpoint, \(\vec {u}_{nit}\) is the unit vector pointing to the next intermediate point, \(\vec {u}_{hist}\) is unit vector computed from the history of the previous steps and \(\vec {u}_{rand}\) is a random unit vector. The \(\beta _{i, ep}\), \(\beta _{i, nit}\), \(\beta _{i, hist}\) and \(\beta _{i, rand}\) are the weights applied to each associated term. These weights depend on the current step.

$$\begin{aligned} \vec {u}_i&= \beta _{i, ep} \vec {u}_{ep} + \beta _{i, nit} \vec {u}_{nit} + \beta _{i, hist} \vec {u}_{hist} + \beta _{i, rand} \vec {u}_{rand} \end{aligned}$$
(3)

At each step of the random walk, the next intermediate target is reassessed to ensure the random walk goes smoothly from one to another. To do this, the distance to the current next intermediate target is evaluated and compared with the value of the previous step. If this distance increased, then the current intermediate target is considered as reached and the next one becomes the new intermediate target. The intermediate target is also considered as reached if the distance is smaller than the mean step length multiplied by a given coefficient (equal to 1 in this work). This process ensures that the random walk follows the long-range trunk skeleton as defined in Section “Connecting Source Points to Target Points”. Nevertheless, the weight coefficients have to be chosen carefully to avoid unexpected behavior. More specifically, \(\beta _{i, rand}\) should remain smaller than the sum of the other weights to avoid that the random walk deviates too much from the next intermediate target direction.

After these post-processing steps, the long-range trunk is statistically close to the reconstructed ones and is thus ready to be used to grow the tufts from it.

Tufts

This section presents the method used to create the tufts and graft them to the long-range trunk (shown in Fig. 2d). For a given tuft, the general process consists of synthesizing a new tuft based on the properties of a reconstructed tuft. This reconstructed tuft is chosen as a template from a set of tufts according to a given weight. This weight is computed for each target population, as defined in Section “Source and TargetPoints Placement”, from a set of properties that should be as close as possible to what is needed for this newly synthesized tuft. These properties can be based on the morphometrics of the tuft or contextual properties. For example, simple morphometrics can be the path length from the soma (\(d^{path}_i\)) or the total path length of the tuft. A contextual property can be the brain region in which the tuft is located or the distance of the common ancestor of the tuft from the pia.

For example, using the path distance from the soma to the initial point and the expected total path length of the tuft, a probability can be calculated for each reconstructed tuft that may be chosen as a template. For each potential template j, an affinity is computed as:

$$\begin{aligned} A_j&= Al_j \times Ad_j \nonumber \\&= \mathcal {N}(0, \sigma _l^2)(\overline{l_i} - l_j) \times \mathcal {N}(0, \sigma _d^2)(d^{path}_i - d^{path}_j) \end{aligned}$$
(4)

where \(A_j\) is the affinity of the template j, \(Al_j\) is a component of the affinity associated with its total path length, \(Ad_j\) is the affinity associated with its distance from the soma, \(\mathcal {N}\) is a normal distribution, \(\sigma _l\) is the given standard deviation for the expected total path length of the tuft and \(\sigma _d\) is the given standard deviation for the path distance from the soma. Then the probability \(P_j\) to choose a template j is computed as follows:

$$\begin{aligned} P_j&= \frac{A_j}{\sum _k A_k} \end{aligned}$$
(5)
Fig. 3
Fig. 3
Full size image

Main steps of the long-range axon mimic algorithm, including the steps of clustering the input morphologies (Section “Creating Statistical Inputs from Reconstructed Cells”), synthesizing the main trunk (Section “Building the Long-range Tree”) and synthesizing the tufts (Section “Tufts”). The black spheres show the soma positions but the radius was increased to make it visible

Once a template tuft is chosen, a new tuft is synthesized using the NeuroTS algorithm (Kanari et al., 2022a). This algorithm aims at synthesizing topology-guided neuronal morphologies by performing a constrained random walk. It uses topological representations of the morphologies called barcodes, as defined in Kanari et al. (2024), which represents the length to synthesize before creating a bifurcation in the tree, and a few statistical properties such as the randomness amplitude and the mean and standard deviation of the step size of the random walk. The barcode given as input is the one of the template tuft, so the total length of the synthesized tuft is expected to be close to the one needed. In this work, all synthesized tufts use the same set of parameters and distributions in the NeuroTS algorithm, but it is possible to use specific ones for each tuft to refine the results.

Finally, the synthesized tuft is grafted onto the target point which becomes its common ancestor.

Results

In this section, we present the validation of the computational synthesis by comparing computationally generated axons to reconstructed morphologies. We first establish the method’s reliability by demonstrating its ability to precisely mimic reconstructed axons. A critical component of our analysis focuses on examining the impact of cluster size, primarily determined by the Euclidean clustering distance. To illustrate the method’s practical application, we present a case study that leverages reconstructed morphologies to identify axon target points and use atlas data to follow the projection tracts within a mouse brain.

Mimicking Biological Reconstructions

To verify that the method presented in Section “Methods” can build realistic long-range axons, the capacity to synthesize axons similar to reconstructed ones is evaluated.

The main steps of the so-called mimic algorithm, when used to mimic a reconstructed morphology, are summarized in Fig. 3. In the first step (Fig. 3a), a given reconstructed axon is clustered to select its tufts and reduce them to the common ancestors of their clusters, as described in Section “Creating Statistical Inputs from Reconstructed Cells”. Figure 3b shows the biological reconstruction after the tuft clustering: tuft sections are represented in magenta while the trunk is represented in green. Then, a graph is created as described in Section “Connecting Source Points to Target Points”. In this case, the preferred regions as described in Section “Preferred Region Modeling” are enabled. These preferred regions are constructed along the reconstructed morphology, as presented in Section “Preferred Region Modeling”. Then we use the Steiner Tree algorithm (Section “ConnectingSource Points to Target Points ”) to build the long-range trunk that will use the common ancestors of the clustered tufts as target points. This synthesized trunk is then post-processed as described in Section “Post-processing”. Figure 3c presents the synthesized trunk after this post-process. Finally, the tufts are synthesized as described in Section “Tufts”, based on the properties of the clustered tufts of the reconstructed axon. Figure 3d shows the final result of the synthesis process.

These figures show that the global shape of the synthesized long-range axon is very close to the one of the initial reconstructed morphology. The long-range trunk goes to the proper target areas, and the tufts have similar sizes and orientations, so the target regions are properly innervated. Nevertheless, the branches of the long-range trunk are not identical. This is expected because the intermediate points given to the Steiner Tree algorithm are not only based on biological features and are partially random, so the result can not be the same as the reconstructed morphology. Also, one can see that the parallel trunks seen in Fig. 3a (e.g., in the lower-left part of the morphology) are replaced by a single trunk in Fig. 3d, this is because the Steiner Tree Algorithm tries to minimize the total length of the trunk, so the parallel trunks can only appear if there is a forbidden area between them. This specific feature is not covered in this work as it is hard to get such data and because we suppose that the trunks have a small contribution to the connection network compared to the tufts.

In addition, Fig. 9 shows that most local morphometrics are consistent between the synthesized and reconstructed axons. Some morphometrics are not consistent though, especially the mean section lengths and, to a lesser extent, the inter-segment angles. The section length is expected to be different because the bifurcations in the trunk have no reason to be located at the same places.

Figure 4 shows several examples of synthesized axons that mimic reconstructed ones. The clustering distance is \(100~\mu \)m for all of them. Again, the results show good accuracy.

Fig. 4
Fig. 4
Full size image

Examples of synthesized long-range axons (red) and the respective reconstructed axons (blue). These results demonstrate the ability of the algorithm to approximate a variety of different shapes

In addition, to verify that the synthesized morphologies are statistically similar to the biological reconstructions, we mimicked 1076 morphologies of the MouseLight project (Winnubst et al., 2019). These morphologies, extracted from adult mice (\(\text {age} > 8 ~\text {weeks}\)), originate and target various brain regions. We compared the distributions of morphometrical features across the reconstructed and synthesized populations. These distributions for the complete morphologies, the trunks alone and the tufts alone are plotted in Fig. 5. The morphometrics used are defined in the Petilla terminology (Ascoli et al., 2008; Kanari et al., 2022b) and are briefly described in Appendix E. We also plotted the Maximum Visible Spread (MVS) score, defined in Kanari et al. (2022b), for the complete morphologies, the trunks alone and the tufts alone in Fig. 5. This score gives an indication of the similarity between two statistical distributions, by computing the absolute difference between the medians divided by an estimate of the overall visible spread of the distributions. This score is minimal at 0 when the medians coincide and the distributions are very close to each other; the closer it gets to 1, the less similar the distributions are. The morphometrics of the reconstructed and synthesized axons match very closely at a population level, with the worst MVS of 0.12, 0.20, and 0.16 for the remote bifurcation angles of the complete, trunk and tuft morphologies respectively. The cause of this difference in remote bifurcation angles is still unclear and will require more work. It could be due to a fundamental difference between the axonal tuft and dendrite development that lead to small deviations, but the data on this aspect are still sparse.

Fig. 5
Fig. 5
Full size image

Distribution of morphological features across 1076 reconstructed (blue) and synthesized (red) axonal complete morphologies (left column), trunk parts of the morphologies (middle column), and tuft parts of the morphologies (right column). Median, first, and third quartiles are shown on the distributions (hyphens). The MVS statistical score is shown in the last column for each type of morphology

Fig. 6
Fig. 6
Full size image

Parameter selection for clustering distance. Synthesized long-range axons (red) for varying Euclidean clustering distances compared to reconstructed axons (blue). Smaller clustering distance approximates the original axons with higher accuracy. In these figures, the soma is located inside the large tuft in the upper part of the morphology

Clustering Distance Effect

The Euclidean clustering distances used to define the tufts in the reconstructed morphologies is a crucial parameter when aiming to achieve results that closely match the reconstructed morphology. Small clustering distances tend to yield more detailed and accurate representations. However, in practical applications, using such small clustering distances can be challenging due to data accuracy and depends on how the target points are defined. As a result, it becomes necessary to find a compromise between the desired accuracy and the practical feasibility. Figure 6 shows the effect of the Euclidean clustering distance (with a path distance equal to \(d^{path}_{max} = 3 \times d^{Euclidean}_{max}\)) on a given synthesized axon compared to the corresponding reconstructed axon. As can be seen, the synthesized axon is very close to the reconstructed one when the clustering distance is small (Fig. 6a) while it becomes less and less accurate with higher clustering distance values (Fig. 6b to d). Figure 12 shows the normalized L1 error of the projection intensity for each Euclidean clustering distance. The projection intensity plotted in this figure is computed as follows: a 3D grid is created with a given voxel size, then the intersection of the axon and each voxel is computed for both the reconstructed and the synthesized axons. Then the normalized L1 error between the voxels of the two previous results is computed with Eq. 6 where \(r_i\) is the value of the voxel i of the reconstructed result and \(s_i\) is the value of the voxel i of the synthesized result.

$$\begin{aligned} L_1 = \dfrac{\sum || r_i - s_i ||}{\sum r_i + \sum s_i} \end{aligned}$$
(6)

The process is repeated for multiple voxel sizes in order to show which scales are properly synthesized and which are not. The result shows that for a clustering distance equal to \(100~\mu \)m the synthesized axon has very similar intersection lengths as the reconstructed axon as long as the voxel size is bigger than \(250~\mu \)m, while the error increases fast for lower voxel sizes. This means that the spatial structures bigger than this scale are accurately synthesized while the smaller structures are less accurate, which is expected since the synthesized axon is only supposed to be similar to the reconstructed one but not strictly equal, especially in the tuft areas. For bigger clustering distances, the size of accurately synthesized structures is much bigger.

Fig. 7
Fig. 7
Full size image

Comparison of two synthesized long-range axons, whose somata are located in the Primary Motor area, within the brain Atlas. The preferred region feature is disabled for the green one while it is enabled on the projection tracts for the red one. The two sub-figures show the same result from different points of view (top view in the upper figure and front left oblique view in the lower figure)

Application: Connecting Mouse Brain Regions

All previous tests have shown that the presented model is capable of accurately synthesizing long-range axons in many different situations. This last section presents a possible use case in which reconstructed morphologies are used to define the target point locations in the brain atlas. In this example, 44 morphologies from Winnubst et al. (2019) with somata located in layers 5 and 6 of the Primary Motor Area (MOp5 and MOp6 regions) were clustered to define trunks and tufts. Random points were then selected around these tuft locations using a normal distribution with a standard deviation equal to \(200~\mu \)m, while ensuring that the selected points remained in the same brain region as the initial tuft point. The properties of the tufts were directly used for picking the barcodes during the synthesis of the new tufts.

Projection Tracts as Preferred Regions

Using this setup, we first checked the effect of the preferred region feature. Here, all the brain regions considered as projection tracts in the CCFv3 atlas (Wang et al., 2020) are used as preferred regions. In this context, two long-range axons were synthesized using the same source and target points, the only difference being whether the preferred regions were enabled or not. As can be seen in Fig. 7, the two axons are very different: when the preferred regions are disabled, the main trunk goes straight to the target points. When preferred regions are enabled the trunk follows the projection tracts when possible and then goes to the target points. This example shows that the model can follow realistic projection tracts in the brain, which is a key feature for realistic long-range axon synthesis.

Fig. 8
Fig. 8
Full size image

Example of 44 long-range axons synthesized such that follow the projection tracts (blue region) (a and c) compared to the respective 44 reconstructed morphologies (b and d). Each axon has a different color

Boundary Constraints

This application aims at synthesizing long-range axons in a specific spatial context, namely that of a mouse brain. In this context, an important feature is to ensure that the synthesized morphologies do not go outside of the brain. This is done by applying simple boundary constraints. The first constraint only applies to the long-range trunk since it is applied during the graph creation step described in Section “Connecting Source Points to Target Points ”. During this step, all the points of the graph located outside the brain are discarded before computing the Steiner Tree. This ensures that the long-range trunk does not go outside the brain. The second constraint only applies to the tufts, since it is applied during the tuft generation step described in Section “Tufts”. First, the space is divided into a 3D grid, the same as the one containing the atlas brain regions. Then for each voxel of this grid, the vector from the center of the voxel to the closest boundary point is calculated, so we obtain a 3D vector field, called the boundary field, each vector representing the distance and direction to the closest boundary point. Finally, the vectors located outside the brain are reversed and resized to a small length (\(1e^{-3} \mu m\) in this work) to make the space outside the brain repulsive. After this process, the growth of the tufts is modified so that at each step of the random walk the distance and direction of the current point to the closest boundary are extracted from the previous boundary field. This distance is then refined to consider the position inside the voxel and used to calculate an attenuation vector using the following equation:

$$\begin{aligned} \vec {a}_i = \dfrac{\vec {b}_i}{||\vec {b}_i||} \times \exp \left( - \alpha _b \times ||\vec {b}_i|| \right) \end{aligned}$$
(7)

where \(\vec {a}_i\) is the attenuation vector for the point i, \(\vec {b}_i\) is the vector from the boundary field at the location of the current point i and \(\alpha _b\) is a scaling coefficient. The refined distance is calculated by a simple linear extrapolation, considering that the attenuation field keeps the same direction inside a voxel. So, the vector from the current point to the center of the voxel can just be projected on the vector associated with the voxel and added to it to refine the distance to the boundary.

Table 1 MVS scores of the morphometrics distributions computed on the 44 morphologies synthesized with and without the boundary constraints enabled. These MVS scores are very close to 0 so the morphometrics distributions are almost identical whether the boundary constraint is enabled or not

Finally, this attenuation vector is added to the direction of the current step of the random walk to reduce the component in the direction of the boundary. This process will reduce this component more and more so that it cannot cross the boundary.

Multiple Axon Synthesis

Using the same setup, 44 long-range axons were synthesized. The model is again configured to follow the projection tract regions of the CCFv3 atlas (Wang et al., 2020). The main parameters given to the model are presented in Table 2. Figure 8 shows that the long-range axons follow the projection tracts and properly innervate the same brain regions as the reconstructed morphologies. In this result, we notice that the synthesized axons do not always follow the same tracts as the reconstructed morphologies. This is because all the tracts were considered equally attractive in this application for simplicity, but this is not the case in reality and this approach should be refined for a real application.

Finally, we compared a set of morphometrics when boundary constraints were enabled or disabled to ensure that this process did not have a sensitive impact on the local structure of the morphologies. Table 1 presents the MVS scores for several morphometrics of the morphologies synthesized without the constraint compared to those calculated with the constraint. Most of these morphometrics are very similar (MVS scores are all very close to 0). Note that tortuosity and radial distances are expected to be different since the sections close to the boundary are curved when the boundary constraints are enabled.

Benchmarking

The method presented in the previous sections was developed to be computationally efficient. The algorithm was parallelized at morphology scale using MPI. The implementation relies on the dask framework, especially the dask-mpi extension. The main inputs are computed using vectorial computation on a global scale, before the tasks are scattered among the workers.

To estimate the computational cost of the algorithm, a test was performed in which 100000 long-range axons were synthesized on a super-computer with 200 tasks. The morphologies started from the Primary Motor Area (MOp) and expanded to multiple other regions in the brain. Parameters were consistent with those used in Section “Multiple Axon Synthesis” and both the preferred regions (for the projection tracts) and boundary features were enabled. In these conditions, the test is completed in 33min 15s using about 4Go of memory per task. Note that this time could be reduced by selecting the proper subset of projection tracts for each morphology. Indeed, all the projection tracts were selected as preferred region for each morphology, which results in a very large number of points in the graph on which the Steiner tree is computed. Therefore, the computation time reported in this benchmark should be considered as an upper boundary for real applications. The total length synthesized considering all axons was equal to \(1.01 \times 10^3\)m, which means that the mean length of the synthesized axons is equal to \(1.01 \times 10^4\mu \)m. This computational speed enables one to synthesize large neuronal circuits in a reasonable time.

Discussion

The synthesis of long-range axons is a complex yet essential step in reconstructing complete brain networks. In this work, we introduce a novel algorithm to synthesize long-range axons accurately. The synthesized axons show strong statistical agreement with expected biological properties, including local morphometrics, global targeting, and intermediate trajectories through preferred regions, such as projection tracts. Thus, the presented algorithm enables the generation of a diverse range of axons contributing to the reconstruction of large-scale brain circuits. In addition, the method is computationally efficient, and can thus generate 100000 axons of approximately 1km total length in about half an hour on a cluster with 200 tasks.

The axon synthesis algorithm can be further refined to generate more accurate axonal morphologies and, importantly, to increase variability within computational models of long-range axons. The current version shows sensitivity to the quality of input data, as shown in Section “ClusteringDistance Effect”. As more experimental data becomes available, the algorithm should generalize to capture the global properties of long-range axons to minimize dependency on specific experimental inputs. Additionally, the selection of projection tracts should be guided by the source and target locations of each axon, as outlined in Budd et al. (2010) and Thiebaut de Schotten et al. (2011), while also accounting for the different neuronal classes, as described in Liu et al. (2024).

Finally, it would be important to use population statistics to refine the inputs of the algorithm. Two recent studies, Venkadesh et al. (2023) and Wheeler et al. (2024), present methods to identify specific projection types that could be used to refine the targeting inputs for axons, based on population statistics. For example, target or intermediate points that are used as Steiner Tree inputs could be chosen based on specific sub-region patterns for different projection neuronal types, as observed in experimental data (Venkadesh et al., 2023). This improvement would greatly increase the accuracy of the synthesized neuronal circuit and reduce dependence on the specific input data. In addition, we expect that this refinement would not significantly impact computational performance, with the potential to actually reduce computational cost by reducing the number of nodes in the graph explored by the Steiner Tree algorithm.

Another potential improvement is to modify the Steiner tree algorithm to achieve a balance between cable cost and distance from the source, following an approach similar to the Tree algorithm, as described in Cuntz et al. (2010). Along with optimizing the selection of Steiner parameters, we will also implement an enhanced version of source-target connectivity. These modifications will improve specific morphometrics that currently do not perfectly match experimental data, such as inter-segment angles, bifurcation angles, and branch orders.

In a companion study, Petkantchin et al. (2024), we focus on generalizing the axon targeting from the available biological inputs for each brain region and hemisphere, and adding specialized tuft selection and placement to avoid replicating individual morphologies, thereby more accurately capturing population variability. By adding a clustering process to better differentiate the synthesis inputs, we also capture the source and targets of axons more accurately, improving, in addition, the input positions to the Steiner Tree algorithm. These improvements, alongside the incorporation of additional experimental data, will further enhance the quality and biological relevance of the synthesized axons.

The axon synthesis algorithm enables the generation of large-scale neuronal circuits with biologically realistic connectomes, generating detailed computational brain models. This capability allows for in-silico experiments that are otherwise unfeasible in-vivo, providing a powerful tool for advancing neuroscience research. The integration of synthesized brain regions with electrical models (Ofer et al., 2017; Rama et al., 2018) further enables the study of signal propagation across large neuronal circuits (Markram et al., 2015; Reimann et al., 2024). This step is essential for simulating brain disorders and investigating specific neurological conditions such as autism, epilepsy, Alzheimer’s disease, Parkinsons’s disease, and other neurodegenerative disorders (Belmonte & Bourgeron, 2006; Kaufmann and Moser, 2000; Moolman et al., 2004; Srivastava et al., 2012; Torben-Nielsen & Cuntz, 2014; Koch et al., 2015). While the full validation of such models remains a challenge, this approach holds significant promise for improving early-stage diagnostics and accelerating drug discovery. Moreover, this translational approach offers the potential for personalized medicine, enabling patient-specific drug testing and therapeutic strategies over the long term.

Information Sharing Statement

All data and scripts used to generate the results presented in this work are available here: https://doi.org/10.5281/zenodo.13843537

The main code used for axon synthesis is available here: https://github.com/BlueBrain/axon-synthesis