Alexander Spirov
Computer Science and CEWIT, SUNY Stony Brook, Stony Brook, NY, USA; and the
Sechenov Institute of Evolutionary Physiology and Biochemistry, St.-Petersburg, Russia
David Holloway
Mathematics Department, British Columbia Institute of Technology,
Burnaby, B.C., Canada
Developmental biologists approach the embryological formation of organisms in terms of the temporal and spatial regulation of gene expression. How does gene regulation precisely specify in what cells and at what time a gene is expressed (to a given level)? Much of the work in the field is on the gene–gene interactions, frequently in extended regulatory networks, which control given events in development.
Computational work on developmental gene regulatory networks (GRNs), to construct the “wiring diagrams” of gene connections and characterize network dynamics, has been going for many decades. More recently, there has been increasing focus on the evolutionary dynamics of how these GRNs arose. Since developmental GRNs are fundamental to creating the body plan of a species (and the functions associated with the form), understanding the evolution of GRNs is fundamental to understanding the dynamics of speciation: how is an organism’s form (phenotype) maintained, despite environmental variation and genetic mutation, and how is such stability balanced against evolvability, or the potential to respond with fit phenotypic changes to environmental or genetic changes? Computational simulations of GRN evolution have become very powerful tools for addressing such fundamental issues in evolutionary theory. At the same time, evolutionary algorithms (EAs) have become a major approach for optimization problems. EA has been inspired by the general principles of biological evolution, such as mutation, selection, and reproduction. We feel there is a large potential for developing new EA techniques based on more detailed mechanisms of biological evolution, for instance, using sexual reproduction or retroviral mechanisms of gene crossover. At the same time, the techniques used by computational biologists simulating evolution can be computationally intensive, and could benefit from improvements from computer science EA approaches (e.g., Ref. [1]). For a recent methodological guide to the computational biology of evolutionary simulations, see also Ref. [2].
In general, we will be discussing evolutionary computations with the general cycle (Figure 10.1):
In this chapter, we will outline the general approaches being used by computational biologists to simulate evolution, and review some of the major impacts this work is having on biological theory (especially, understanding the dynamics of evolution). With this survey of evolutionary simulation, we aim to inspire computer scientist practitioners of EA with the biology, as well as pointing to areas in the biological modeling, which could most benefit from increased computational efficiencies.
We distinguish two broad approaches for simulating GRN evolution in developmental problems. The first, the “coarse-grained” approach, treats the genes as nodes and their interactions as edges in a network (as depicted in Figure 10.1). The interactions and their strengths can be coded as a regulatory matrix. This approach has been used very extensively for studying evolutionary principles over the past two decades. It is, however, coarse grained in the sense that real gene regulation and biological evolution both operate at the deoxyribonucleic acid (DNA) sequence level. Fine-grained approaches have been increasingly developed in recent years, modeling at the level of sequence information (or at least a region of sequence, or regulatory module). With the corresponding increase in genome sequencing and comparative genomics, this provides powerful predictive value for evolutionary dynamics at the molecular genetic level.
Coarse-grained simulations of GRN evolution were pioneered by Andreas Wagner et al. [4, 5]. In their approach, genes are nodes of the network, which interact (network edges) only one on one (co-factors are not modeled), leading to Boolean (on–off) expression levels. These assumptions are too simple for a number of developmental phenomena (such as spatially patterned gene expression in response to a concentration gradient of a regulator), but the models are very fast to solve, and have had significant impact on the understanding of evolutionary principles (see Ref. [6] for further perspective).
In Wagner’s core model, a GRN is represented by the state of the network genes 1 − N:
Si(t) is binary—the gene is either expressed or not. Cross- and autoregulatory interactions causing the state to change are modeled by difference equations:
where the expression state of gene Gi at time t + τ, Si(t + τ), is a function of a weighted sum, hi(t), of the expression state of all network genes at time t. σ(x) is the sign function (σ(x) = −1 for x < 0, σ(x) = +1 for x > 0 and σ(0) = 0), and τ is a time constant whose value depends on biochemical parameters such as the rate of transcription or the time necessary to export mRNA into the cytoplasm for translation. The real constants wij represent the strength of regulatory interaction of the product of Gj with Gi, (activation, wij > 0; repression, wij < 0). wij are the elements of the W matrix representing regulation in the GRN. The approach is shown schematically in Figure 10.2; see Ref. [7] for a recent review.
The W matrix approach has been used extensively over the past two decades. To address particular topics, it has been refined and extended in a number of ways.
Siegal and Bergman [8, 9] introduced a sigmoidal function for σ in Equation (10.2) that allowed for dynamic switching of a repressive state into an activating state. This was then extended to allow for recombination (allele r or R, with equal probability) to modify σ [10]. Masel [11] set σ(x) = 1 (expressed) if x ⩾ 0 and σ(x) = 0 (not expressed) if x < 0, a more realistic mapping, biologically, than the negative values allowed in Equation (10.2). The balance between activating and inhibiting connections in W matrix models, including oscillations, was explored by McDonald et al. [12].
Biological gene regulation in eukaryotes commonly involves co-factors, in which the effect of a transcription factor (TF) is amplified or diminished by the presence of a different TF. Extension to a triple-index representation, Wijk, allows for TFs j and k to simultaneously affect the expression of gene i [13].
A critical aspect of developmental gene expression is its spatial dependence; genes must be expressed in distinct domains in order for tissues to differentiate in the correct locations. Although simulations of spatially dependent GRNs have a long tradition in developmental biology, it is more recently that the evolution of these GRNs has been studied. Anterior–posterior segmentation of the insect body plan is extremely well characterized through comparative biology, as well as through developmental biology of the fruit fly Drosophila melanogaster, and has become a test case for evolutionary simulations of spatially dependent gene expression. While Boolean gene states have been used with spatial patterning, researchers are generally turning toward continuous (DE) representations of gene states, to address the concentration-dependent aspects of developmental patterning. Evolutionary coarse-grained segmentation models include those of Salazar-Ciudad et al. [130, 131]; Sole et al. [14], Francois et al. [15, 16], Fujimoto et al. [17], and ten Tusscher and Hogeweg [18, 19] (overview of their approach shown in Figure 10.3). For reasons of computational simplicity, these models assume a one-dimensional embryo consisting of a row of cells (Figure 10.3b), ignoring morphogenetic processes and patterning in higher dimensions. Some of these network models, (Refs. [18, 19, 15, 16])are still Boolean and do not incorporate diffusion or transport for modeling spatial pattern formation, while others (Refs. [130, 131, 17]) have developed continuous evolutionary models for insect segmentation, with Hill-equation kinetics. With the exception of the work in Refs. [18, 19], which used a genome encoding the regulatory network, studies used the GRN directly as the genotype, with mutations occurring directly on the network (as depicted in Figure 10.3c) rather than on the genome sequence. The approaches differed, in details, on how the genotype is mapped to the phenotype (Figure 10.3b and Ref. [19]); please see Ref. [19] for a recent review on how these evolutionary developmental simulation studies shed light on some classic biological problems.
Though an advantage of coarse-grained approaches is their speed, several groups have worked on improving computational efficiency, in order to access larger numbers of generations (simulation cycles), particularly for spatially dependent models. Fomekong-Nanfack et al. [20] developed an evolutionary strategy (ES) for finding parameters in segmentation models, some 5–140 times faster than an earlier parallel simulated annealing approach [21]. Their “island” ES algorithm operates on N populations of individuals, each with a population size of i. Simulation cycles are generally run within islands, with the best individuals copied between islands every m generations. Jostins and Jaeger [22] developed an asynchronous communication between parallel islands that offered further speed-up. A different parallel method, DEEP (Differential Evolution Entirely Parallel), has been developed by Kozlov and Samsonov [23].
At a more general level, evolution has been studied within the framework of artificial life. Digital organisms can replicate, mutate, and have fitness criteria. This approach does not tend to address the biological details of gene regulation, but it can offer a platform to study general principles of evolving systems. For instance, Lenski et al. [24] used two classes of organism; one simple (rapidly replicating) and one complex (able to accelerate replication according to “metabolic” rewards). They found, after millions of mutations were introduced, that the complex organisms were more robust to single mutations than the simple organisms. Ofria et al. [25] found that while traits may appear suddenly in an evolutionary process, the information content of the genome is gradually varying. They argued that “… it is nearly impossible for a significantly complex trait to appear without reusing existing information.” Clune et al. [26] addressed the classic question of whether ontogeny recapitulates phylogeny, finding, in digital organisms, that traits that evolved earlier tended to occur earlier in an individual’s development. By adding a type of epistasis (gene background effect), Valverde et al. [27] observed that modular control of expression (clustered gene effects) was spontaneously generated (also see Section 10.2.2.2). Covert et al. [28], using digital organisms, described how deleterious mutations could be beneficial, opening access to new regions of the fitness landscape. Batut et al. [29] applied digital organisms to the evolution of marine cyanobacteria; matching observations that reduced selection strength led to genome loss, with a majority of the loss in non-coding sequences.
To address the biology more realistically, a number of groups have incorporated DNA structure into evolutionary models. The finer level of detail brings significant computing costs, but can answer evolutionary questions on the molecular genetic level (see, e.g., the review in Ref. [30]). Different levels of abstraction can be made within the fine-grained approaches. It is rare to model at base-pair resolution, but a number of groups include TF binding sites (TFBSs; including cooperative or competitive effects between TFs), as well as addressing the modular structure of cis-regulatory regions (reflecting that many genes have multiple enhancers [31, 32]).
ten Tusscher and Hogeweg [33] extended the basic W matrix approach to allow for multiple TFBSs. This allowed them to model different types of mutations: in TFBSs; in the coding part of the gene; as well as those affecting segments of the genome (see also Ref. [34]). At slightly more detail, Pujato et al. [35] included structural models of TF–DNA complexes to find binding strengths. Mutation altered TFBS strengths and created or destroyed binding sites (BSs). The Sinha group [36, 37] has developed thermodynamic models for binding individual TFBSs and for entire cis-regulatory modules (CRMs). Simulations of TFBS evolution for 37 well-characterized CRMs in Drosophila segmentation corroborated the BS conservation at short times and BS loss at longer times seen across 12 Drosophila genomes.
Many genes have multiple CRMs, with distinct sequences of TFBSs. For spatial patterning, different CRMs can control different regions of gene expression. We have developed a model for the evolution of multiple CRM structure in one of the major Drosophila segmentation genes, hunchback (hb) [38, 39]. Figure 10.4(a) shows some of the experimentally determined structure, in terms of BSs for specific hb TFs, separated into two of the three known enhancers active during segmentation. Figure 10.4(b) shows how we represent this structure in symbolic strings. The expression levels of hb depend on the arrangement of activating and repressing TFBSs, modeling co-activation and co-repression. In each generation of an evolutionary computation, hb spatial patterns are selected against experimental hb patterns. TFBSs mutate and evolve to optimize fits to the data. We have shown that the multiple CRMs observed in hb could have readily evolved.
Evolutionary computations have been applied in a number of areas, allowing researchers to test the limits of theoretical concepts. This has led to a number of refinements of classic evolutionary theory in recent years.
Several related concepts from the 1950s have had a major impact on the evolutionary theory of development. These include canalization, whereby species show low phenotypic variation, despite ample genetic and environmental variation (also termed as “robustness” [40, 41, 42]; and genetic assimilation, in which a phenotypic change induced by an environmental perturbation becomes stabilized in the genotype [43, 44].
The understanding of how canalization operates has been greatly advanced in recent years through evolutionary modeling (reviewed in Ref. [45]). A number of different computational approaches have exhibited canalization, highlighting different aspects of the evolutionary mechanisms involved.
Stabilizing selection (in which individuals are removed from both ends of a phenotypic distribution, maintaining the mean) has been shown, by modeling at several levels, to contribute to canalization. In simulations where phenotypes had different degrees of sensitivity to microenvironmental variation (developmental noise [46]), stabilizing selection produced canalization. Explicit modeling of individual development and evolution also showed this [4]. Modeling at the population-genetic level, Wagner et al. [47] distinguished that canalization against environmental variation can increase with stabilizing selection, but canalization against mutation increases only up to a point. Above a threshold, stabilizing selection purges the genetic variation from a population, and genetic canalization no longer applies.
Indeed, further work indicates that selection for environmental canalization may produce genetic canalization as a by-product [8, 9], that more densely connected networks evolve to be more robust to perturbation than sparser networks under stabilizing selection, and that phenotypic robustness to genetic perturbation does not directly indicate adaptive canalization (reviewed in Ref. [48]). Siegal and Bergman’s [8] simulations indicate that increased GRN complexity is associated with robustness. Canalization is not due to direct selection for reduced variability, but because GRNs evolve complexity to buffer against the effects of lethal gene deletions. “We argue that canalization may be an inevitable consequence of complex developmental-genetic processes and thus requires no explanation in terms of evolution to suppress phenotypic variation” [8] (see also Refs. [5, 49]). Systematic computational tests of how an arbitrary null (knockout) mutation affects the expression of other genes reveal that arbitrary genes in complex gene networks have the property of buffering genetic variation and, therefore, have the potential to act as evolutionary capacitors. In a complex GRN, there are many gene products which could appear as “phenotypic” capacitors, such that their removal increased phenotypic variability [9]. This is consistent with experimental observations that the buffering of other genes’ expression levels is disrupted when a single gene is knocked out. An extensively studied example is the molecular chaperone Hsp90 [50, 51], but GRN dynamics suggest there should be many more. Experiments in yeast have indicated more than 300 gene products whose removal increased variation [52]. Leclerc [104] has noted that real gene networks appear to be robust, but tend to be sparsely, not densely, connected. He showed that when the costs of complexity are taken into account, that robustness implies a parsimonious network structure that is sparsely connected; that selection will favor sparse networks when topology is free to evolve.
Gombar et al. [53] simulated epigenetic regulation (e.g., chromatin remodeling), and found that this not only facilitated decoupling of mutational and environmental robustness, but it also increased GRN sensitivity to noise (without the loss of robustness). The authors speculated that this epigenetic effect was a necessary condition for multicellularity to arise. Furusawa and Kaneko [54] found that positive feedback through epigenetic regulation aids fast and precise evolutionary responses to environmental change.
Canalization as a by-product of selection against environmental variation appears to be quite general, being seen in simulations of ribonucleic acid (RNA) evolution [55] and metabolic network evolution [56] (where it was suggested that environmental noise is needed for metabolic networks to be robust to gene loss). In general, “It may be that any developmental system that greatly reduces the dimensionality of phenotypes relative to the dimensionality of genotypes will produce the appearance of canalization by releasing hidden genetic variation when perturbed” [48].
As Waddington first stated the genetic assimilation concept, “… if selection was practiced for the readiness of a strain of organisms to respond to an environmental stimulus in a particular manner, genotypes might eventually be produced which would develop into the favoured phenotype even in the absence of the environmental stimulus. A character which had originally been an ‘acquired’ one might be said to have become genetically assimilated” [43] (see also Refs. [40, 44]). Computations in the past decade have refined this idea. For instance, Masel [11], using a variant of the Wagner model (Section 10.2.1.2), showed that genetic assimilation can occur in the absence of selection for the trait.
Genetic assimilation requires an underlying phenotypic plasticity—a capacity for the genotype to produce multiple phenotypes in response to non-genetic perturbations [57, 58, 59]. A factor in this response is the release of cryptic genetic variation, which occurs due to neutral drift, but may be invisible until a large enough perturbation occurs [60]. Mathematical analysis of Wagner’s canalization model [61] indicates that evolution toward a particular phenotype stabilizes other (cryptic) phenotypes, making GRNs more robust to genetic perturbations (loss, mutation, etc.). Computations by Lande [62] quantify the relation between plasticity dynamics and assimilation, wherein plasticity must increase in order to allow evolution to a perturbed environment, but then decrease in order to maintain the new optimum. Espinosa-Soto et al. [63] also characterized the plasticity–assimilation relation, using classic W-matrix simulations [64].
Lehner and Kaneko [65] have used an approach from statistical physics to derive a “fluctuation–response” relation for evolution akin to the “fluctuation–dissipation” theorem relating Brownian noise and motion in fluids. This describes a fundamental relation between environmental noise and evolvability. They found an optimal range of noise, which could contribute to evolution: above a critical noise level “error catastrophe” resulted; below this, GRNs can get stuck in fitness landscapes, and not evolve due to perturbations (i.e., low plasticity) [66, 67, 68]. They also found that evolution of robustness to noise could result in robustness to genetic mutation (similar to Ref. [8]), due to a proportionality between the environmental (noise) and genetic (mutation) contributions to phenotypic variance [69, 70, 71, 72].
A number of these projects used numerical simulations to explore the theoretical principles. In Ref. [67], it was found that networks acquired both mutational and noise robustness if the phenotypic variance induced by mutations was smaller than that observed in an isogenic population (where variance is only due to gene expression noise). They found adaptive states, characterized by high production and decay terms, were also less affected by gene expression noise. Developmental noise could aid both adaptability and robustness, with the highest adaptability at high noise levels that nearly destroy robustness.
In Ref. [71], two interacting and evolving networks were modeled: one a GRN controlling protein expression and the other a metabolic network depending on the GRN proteins and environmentally supplied nutrients. Growth of the “cell” containing these networks is limited by metabolite influx. They found that in any cell with stochastic gene expression, the cell’s state was selected by noise—it did not need to be driven by a specific signalling network.
Kuwahara and Soyer [73] used an intrinsic method (Gillespie algorithm) to model noise in a gene with autofeedback, and corroborated that noise (in the resulting bistable switch) aids evolvability. Extending the approach to consider protein–gene and protein–protein interactions, van Dijk et al. [74] have predicted that GRNs with monomeric TFs, rather than dimeric, are more robust to mutations affecting regulatory interactions.
Quantifying Waddington’s concept, Kaneko [75] identified genetic assimilation as the proportionality between genetic and environmental contributions to phenotypic variability. He found that the number of genes with environmental contribution greater than genetic contribution increased as robustness of fitness increased, with a concomitant decrease in the degrees of freedom for gene mutation [76].
Biologically, stable end states might not be the critical feature for speciation or survival. How fast a species approaches a stable state, its rate of adaptation, might be more important. Genes in a GRN will generally change expression in response to an environmental perturbation, with some genes returning rapidly to prior levels and some maintaining altered expression to adapt [77]. Neyfakh et al. [78], using a simple model of a cell with a GRN controlling basic metabolism of an environmental nutrient, found a rapid rise in fitness in response to an environmental perturbation, which then plateaued. The resulting GRNs were quite diverse, many with what appeared to be functionally non-essential genes (see also Section 10.3.3.1 on gene co-option). Cuypers and Hogeweg [79] (reviewed in Ref. [80]) further worked with this model, and found that the plateauing after the initial rise in fitness could be characterized by a period of streamlining gene loss.
Does this dynamic merely reflect the plasticity needed for genetic assimilation, or does plasticity itself affect adaptation rate (i.e., what is the relation between plasticity and evolvability)? Fierst [81], using the Wagner model [4], showed that “a history of phenotypic plasticity may … shorten the waiting time for the generation of phenotypic variance from new mutations and recombination. Rather than acting as a short-term alternative, phenotypic plasticity may facilitate future adaptation and genetic evolution.” On the other hand, plasticity can contribute to liability in developmental GRNs. Draghi and Whitlock [82], with a continuous extension of Wagner’s model [5], quantified how phenotypic plasticity increased variance, in mutations and in standing genomes, which could be amplified for correlated genes.
Francois and Siggia [16] (see also Refs. [83, 84]) used small GRNs to characterize two finer aspects of the response to perturbations: responsiveness, the speed at which genes altered expression, and quality of adaptation, or how well the GRN maintained fitness to the new environmental condition (see also Ref. [85] for an application to temperature compensation in circadian clocks). Ma et al. [86] did a complete analysis of all 3-node GRNs, and found only two topologies with an adaptive response, a negative feedback loop with a buffering node and an incoherent feed-forward loop (FFL). However, this does not appear to generalize to large GRNs: simulations with large GRNs [77] did not find few gene motifs underlying adaptation; rather, adaptation was cooperative across the network (and contributed to robustness).
Modeling has made the original concepts of Waddington much more precise; the quantitative questions raised by the modeling, such as what degree of plasticity is optimal for the development of particular organisms, may now be at the stage to allow for experimental testing.
Genetic material is altered in mutation and in reproduction. Both have profound impacts on evolutionary dynamics. There is great potential to refine EA by drawing from the details of biological reproduction mechanisms. In particular, point mutation tends to destroy meaningful “words” (genes) in the process of evolution; crossover of longer segments of genetic material can maintain these building blocks [87, 88]. For instance, Martin and Wagner [7] added recombination and point mutation steps to the W matrix approach and reported that recombination may be a much stronger factor in GRN evolution than point mutation. When both steps are active, they found genotype diversity to be increased, the deleterious effect of point mutations to be decreased, and the emergence of cis-regulatory complexes (combinatorial regulatory interactions). In this section, we discuss several approaches to simulating reproduction mechanisms and how this reflects on the role of reproduction in evolutionary dynamics.
The role of sexual reproduction in evolution has been intensively studied, and has long been understood to increase population fitness. Azevedo et al. [89] quantified this effect using an extension of the W matrix approach (see also Ref. [33] for sexual reproduction in W matrices). At closer detail, Livnat et al. [90, 91] argued that sex does not maximize fitness generally, but specifically improves mixability, the capacity of a particular genome to function with a wide variety of genetic partners. Sex also has the potential to destroy favorable combinations of genes. Misevic et al. [92, 93] used a digital organisms’ approach to compare sexual reproduction and asexual reproduction. They found clustering of genes (modularity) for similar traits with sexual reproduction, which may contribute to maintaining desirable characteristics.
Sexual reproduction frequently involves sexual selection, which can drive sexual dimorphism. Rapid evolution of dimorphism suggests that the underlying GRNs may favor evolvability over phenotypic stability. However, Fierst [81, 94] found, with a W matrix approach, that GRNs could produce dimorphic characteristics with both high robustness and high evolvability; coarse-grained dynamics may have the capacity to do both things.
Heterosis, or “hybrid vigor,” is the tendency for crosses of individuals from genetically distinct populations to improve particular traits (e.g., growth rate). Emmrich et al. [95, 96] have recently developed a GRN evolutionary simulation approach with hybridization to quantify this process.
Sexual reproduction is not the only mechanism used in biology for crossover of large segments of intact genetic information. We have developed a genetic algorithm (RetroGA) with crossover inspired by the mechanism by which retroviruses create DNA strands from RNA [87, 88] (which is then copied into a host). Biologically, the DNA strand is generally produced from two parent RNA strands, with the read-off switching between strands depending on jump sequences (regions of homology). We have tested this mechanism on Royal Road and Royal Staircase functions [97], designed to evaluate the retention of building blocks (e.g., genes), as well as on biological test cases in bacterial promoter evolution and Drosophila segmentation. These tests are characterized by basin-subportal architecture of the fitness landscape, with long periods of neutral evolution followed by quick jumps to higher fitness (upon creation of a new building block). In all cases, the RetroGA algorithm significantly sped up evaluation in comparison to point mutation.
GRNs can change size in the process of evolution, by gaining or losing genes. Genes can be newly created or destroyed within a GRN. But it is likely much more common for pre-existing genes to have small mutations in their CRMs, which allows them to be regulated by TFs from other GRNs, effectively recruiting them into that GRN [98, 99]. A number of groups have simulated ways in which simpler ancestral GRNs can evolve into more complex derived GRNs.
Computations indicate that co-option (recruitment) of genes and outgrowth of GRNs (increase in network nodes) occur spontaneously, without particular selective pressure for network complexity. Incorporating gene addition and gene withdrawal operators into a standard GA, we found a net increase of genes in GRNs (Figure 10.5) [100, 101]. We tested the algorithm on a coarse-grained (continuous, partial DE) GRN model for Drosophila segmentation, and found that the derived networks showed increased developmental robustness to fluctuations in maternal signalling factors. Co-opted genes frequently matched experimentally observed patterns, and became essential nodes of the derived GRN. Similar facility of network outgrowth has been observed in Ref. [102]. In simulations of protein networks, these authors found that adding mechanisms for protein addition, duplication, or deletion, as well as creation or destruction of interactions produced network growth, in the absence of any specific selection pressure for larger networks. Tsuda and Kawata [103] found similar results with a fine-grained GRN model (with TF binding in individual CRMs): complex GRNs tended to evolve if beneficial gene duplications were fixed in the genome. In the very fine-scaled work of He et al. [36], which was corroborated against Drosophila sequence data and expression patterns, it was observed that complex networks were favored by the evolutionary process (characterized by large numbers of TFBSs in the enhancers). With relevance also to Section 10.3.1, Pujato et al. [35] found that in simpler GRNs (with less connections) robustness (to environmental perturbations) depended on local sequences decreasing the creation or destruction of TFBSs. For more complex GRNs (with more connections), robustness depends on regulating TFBSs with high-level network wide effects. Iwasaki et al. [60] found a strong correlation between the amount of cryptic variation and the network size, leading them to propose that network expansion could be a source of evolvability. Work by Leclerc [104] places some caution on universal GRN outgrowth: if a cost is entered for GRN complexity, evolution can favor sparser networks.
Computations suggest that gene co-option can provide robustness to a GRN from parasite attack. In studies of co-evolution of a host and a parasite, Salathe and Soyer [105] found that parasite selective pressure led to recruitment and retention of new nodes to a host GRN, which buffered the host to loss of particular genes. When the parasites were removed, the host GRN could shrink again without the loss of fitness.
In several studies [106, 107, 108], we have characterized how gene co-option can maintain GRN functionality to form segmentation patterns, despite the attack by genetic parasites (transposons, e.g., Refs. [109, 110, 111, 112, 113, 114]). With a coarse-grained model [107, 108], we simulated the attack of a transposon on the ability of a GRN to read a maternal positional information gradient critical to segmentation. Computations showed an “arms race”—if the GRN could co-opt new genes rapidly enough, these could adapt to a new role of reading the maternal gradient before transposon infection spreads. More recently [108], we have observed similar host–parasite dynamics in a fine-grained model, in which transposons insert at particular base-pair sequences. These studies illustrate how gene co-option can lead to robustness, not against environmental perturbations, but against a particular type of gene loss due to the ubiquitous selective pressure of transposons on the genome. The transposon mechanism also sped evolutionary searches, which may be applicable to EA optimization in general (see also Ref. [115]).
Fine-scale modeling (at the level of TFBS evolution) can be used to investigate TFBS clustering, or the origin of CRMs. In Drosophila, the well-studied gene hb is known to have three CRMs actively driving expression during segmentation. The CRMs control expression in distinct spatial regions, though there is some overlap and redundancy. To study the evolvability of this genomic structure, we constructed a representation of an ancestral hb cis-regulatory region as a string of 104 characters [38, 39]. Each character represents a TFBS for known regulators of hb, or a neutral spacer (Figure 10.4). TFBSs are either activating or repressing. Total regulatory strength from the string was used to solve pattern forming (DE) equations, with the results scored (fitness) against experimental hb patterns. Evolution on populations of thousands of individuals readily produced clustering or modularity, such that the cis-regulatory region formed three distinct CRMs. Depending on overall regulation strength (Ra), the spatial domains controlled by each CRM ranged from entirely redundant to fully distinct (Figure 10.6). For Ra ⩾ 79, we observed the emergence of a distinct “stripe element” expression, and a close association between the model output and the actual expression domains seen for the three biological CRMs. We conclude from these simulations that modularity is readily evolvable. It can produce redundant expression, a basal form of redundancy against mutations (in which the loss of a modules’ function is not catastrophic). Modularity also allows for a mix of redundancy and distinct expression, as seen biologically; this allows for some redundancy, but also the capacity to differentially regulate the temporal and spatial aspects of particular domains (biologically, the anterior hb domains are expressed before the mid and posterior domains).
The fine-scale modeling of Duque et al. [37] corroborates a tendency to clustering, and, in addition, calibrates the rate of maintenance of TFBS clusters against divergence in evolutionary time: across 37 well-studied CRMs in segmentation, they could fit divergence times between 12 different Drosophila species. Saito et al. [116], in evolutionary computations of simple and complex GRNs, found that complex GRNs tend to evolve redundancy. Soyer’s work with protein networks shows similar tendency toward modularity [117].
A very powerful use of modeling is to map out the range of possibilities for a given dynamic mechanism. Many authors have done this with evolutionary simulations, characterizing parameter spaces or fitness functions for particular evolutionary mechanisms.
With a W matrix approach to represent the genotype-to-phenotype mapping, Borenstein and Krakauer [118] studied how the developmental plan maps several genotypes to the same phenotype, and consequently generates a much smaller number of distinct phenotypes (which depend on the specific mapping). They found that gene background (epistasis) created a clustering of phenotypes in the space of possibilities (morphospace), with the result that low-interaction ancestral GRNs spanned a larger volume in the space than the highly connected derived GRNs. Payne et al. [119] looked at this in terms of input (coded in a gene’s cis-regulatory region) to
output (gene expression) functions. Clustering can result from the fact that many GRNs can produce the same output; innovation can be measured in terms of an increasing distance from ancestral forms in the phenotype space.
Focusing on small GRNs, Nochomovitz and Li [120] surveyed the phenotype space for 3- and 4-node networks. They found that certain phenotypes can be coded by a very broad range of GRNs, allowing for flexible design. These tended to have a conserved set of connections for phenotypic stability, as well as a more evolvable set of connections. If a fourth gene was added to a 3-node net, it was frequently involved in the conservative suppression of phenotypic variation. Feed-forward loops are very common small GRN motifs in biological networks. Widder et al. [121, 122] used an analytical approach (made feasible by the small GRN size) to do a complete survey of the GRN-function landscape FFLs. They found the abundance of the FFL motif could be predicted from its high evolvability.
Large-scale surveys of phenotype space can provide broad insights into the dynamics of evolution. While the first assumption might be that species evolve toward steady states, Pinho et al. [123] found that GRNs (in the W matrix approach) more often showed cyclic behavior (periodic orbits) than monotonic evolution toward fixed points. They reported that stable evolution toward fixed points was more common in sparser networks.
Multifunctional GRNs can produce different expressions at different stages of development. Using a genotype space characterizing millions of GRNs and their functions, Payne and Wagner [124] found that the number of multifunctional GRNs declines exponentially depending on the number of functions. The total number of multifunction GRNs can remain high, but they become increasingly non-clustered as the number of functions increases. This implies that the historical trajectory of a GRN is important for acquiring new functions and that multifunctional GRNs are particularly susceptible to mutation. This is, however, a sequential view of the acquisition of multifunctionality; Warmflash et al. [125] considered that GRNs are under selective pressure from multiple constraints simultaneously. Frequently, these cannot all be equally optimized. The authors of Ref. [125] found that coding a Pareto optimization into their GRN evolutionary mechanism produced a more efficient search for fit GRNs than ad hoc combination of design criteria.
GRN computations can be used to study epistatic effects, by comparing a general null hypothesis of additive gene action (where genetic background has no effect) to epistatic models with background effects. Carter et al. [126] found that epistasis can be a strong modifier of evolution: if positive (cooperative), it can enhance evolvability; if negative (antagonistic), it can favor canalization. Sanjuan and Nebot [127] used computations to test whether simpler genomes tended to have antagonistic epistasis. Simulations indicated that multifunctionality and a lack of redundancy in small GRNs was associated with negative epistasis, and the converse of these in large networks was associated with positive epistasis (which was associated with increased robustness to mutations). Draghi and Plotkin [128] found epistasis to have low prevalence early in adaptation but to be more common later in adaptation. Accompanying shifts from antagonistic to synergistic effects paralleled the small versus large GRN results in Refs. [35, 118]. This may be useful for comparing bacterial versus eukaryotic evolutionary strategies.
Cotterell and Sharpe [129] tested epistatic effects in spatial gradient-reading GRNs (e.g., that convert graded input into discrete spatial domains in segmentation). They found that such GRNs were connected, in terms of gene additions and deletions, through unstable, non-functional GRNs. However, epistatic gene-background effects could stabilize the non-functional GRNs, so that evolution between stable networks occurred by these pathways.
In addition to the studies on spatially distributed gene expression discussed in Section 10.2.1.2, several lines of investigation have used GRN computations to elucidate probable paths of evolution for body segmentation mechanisms in insects. Our work [106, 107, 108] has focused on evolution of GRNs that produce segmentation patterns simultaneously, characteristic of long germ band insects such as Drosophila. Many insects have short germ band segmentation, in which only two or three segments are formed at a time (compared to seven in Drosophila). These short patterns are propagated down the body to form higher numbers of final segments. Salazar-Ciudad et al. [130, 131] simulated stripe forming GRNs, and found that small numbers of stripes (up to three) tended to be formed by hierarchical networks, while higher numbers of stripes tended to be formed simultaneously. These simultaneous networks could become hierarchical with further evolution. ten Tusscher and Hogeweg [18, 19] have computed GRN evolution on large populations, and found that the sequential (short germ band) mechanism evolves readily from an ancestral unsegmented state (also see the work in Ref. [15, 16, 17]). They argue that the ease with which this mechanism evolves supports independent origins in vertebrates, annelids, and arthropods. They suggest that the simultaneous (long germ) mechanism reflects particular historical constraints during evolution.
The increasing use of GRN simulations in recent years has had a significant impact on the understanding of a number of evolutionary principles in developmental biology. Starting from discrete, Boolean representations (W matrices), refinements have included modeling of continuous gene expression levels, consideration of co-factor effects, modeling at TFBS resolution, and extension to spatially distributed gene expression (a fundamental aspect of body plan formation in development). This chapter has outlined the techniques developed in this area of computational biology, and the main biological impacts these have had. Our hope is to provide some inspiration for computer scientists in EA that the types of evolutionary dynamics from the detailed biology can enrich development of new and more efficient search algorithms. In turn, we hope an increased interaction between computational biology and computer science can lead to improvements in the efficiency and scope of the developmental simulations (for instance, in Section 10.3.4, the use of multi- rather than single-objective optimization). Several recent computer science reviews [115, 132, 133, 134, 135] have called for development of a field of computational evolution. A number of the topics outlined (what does evolution act on; how does evolution happen; and what are the practical applications of this knowledge) are, as discussed in this chapter, being approached by computational biologists. Some topics, however, such as modeling evolution of nested, multi-level organisms, deserve much more attention (see Refs. [136, 137] for other recent reviews.) Increased interaction will benefit both computational biologists and computer scientists. To summarize, some of the areas we feel will be especially fruitful for interdisciplinary exchanges:
The authors thank the U.S. NIH for financial support, grant R01-GM072022. A.V.S. thanks The Russian Foundation for Basic Research, grant 13-04-02137.