CHAPTER 27
Interpretation of Phylogenetic Trees

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

27.1 INTRODUCTION

Phylogenetic trees are frequently encountered in research papers on evolution, population diversity, microbial studies, and genetics. Interpreting a given phylogenetic tree correctly is therefore critical, yet terms such as “cladogram”, “phylogram”, and “phenogram” can be confusing to a novice. This chapter begins with this terminology and then explains how to read the information that a phylogenetic tree depicts.

27.2 UNDERSTANDING PHYLOGENETIC TREES

Figure 27.1 shows a rectangular, horizontal phylogenetic tree constructed from 18S rRNA sequences of nine divergent taxa. In general, a phylogenetic tree is drawn in two dimensions: a horizontal axis (analogous to the X‐axis of a graph) and a vertical axis (analogous to the Y‐axis).


FIGURE 27.1 This dendrogram represents the evolutionary relationship among the taxa. The horizontal axis represents the evolutionary changes over time.

The leaves (terminal taxa) are connected through internal nodes (solid circles) (Figure 27.1). Moving leftwards, the branches successively merge until the tree ends at the root, the hypothetical common ancestor (solid square).

27.2.1 Horizontal dimension of a phylogenetic tree

This is the scale that signifies evolutionary distance (in a dendrogram) or time (in a chronogram). In the dendrogram in Figure 27.1, branch length denotes evolutionary distance: the longer the branch, the more genetic change that taxon (or cluster of taxa) has undergone during evolution.

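A scale bar at the bottom of the tree gives the unit of branch length, expressed as the number of residue substitutions per site (nucleotide or amino acid, depending on the type of tree); branch lengths therefore measure the substitution of residues. The following formula determines it:

\[ \text{substitutions per site} = \frac{\text{number of residue substitutions}}{\text{number of aligned sites compared}} \]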

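A scale bar labelled 0.1 means that a branch as long as the scale bar represents 0.1 substitutions per site. Thus, the total amount of genetic change along a branch will be

\[ \text{genetic change} = 0.1 \times \frac{\text{length of the branch}}{\text{length of the scale bar}} \]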

When the scale is expressed as a percentage (here, 10%), it means that ten substitutions have occurred per 100 residues. This does not necessarily mean that ten different positions have changed, because a single position may have undergone substitution several times. That is why a value of 1.0 (or 100% on a percentage scale) does not mean that every base has been substituted; rather, it means that on average one substitution per site has taken place (for example, 100 substitutions over 100 sites), some of them at the same residue position. Sometimes the evolutionary scale is instead given as integer values, indicating the net number of base substitutions.
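The effect of repeated substitutions at the same site can be made concrete with the Jukes and Cantor (JC69) model listed in Section 27.4.1. The sketch below is a minimal illustration assuming JC69 and aligned DNA; the function names are hypothetical helpers, not part of any phylogenetics package. It converts an observed proportion of differing sites (p) into an estimated number of substitutions per site (d), and back.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites that differ; sites with a gap ('-') are skipped."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jc69_distance(p: float) -> float:
    """JC69 estimate of substitutions per site (d) from the observed proportion p."""
    if p >= 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_expected_p(d: float) -> float:
    """Inverse of jc69_distance: expected proportion of differing sites after d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# 10% observed differences correspond to slightly more than 0.10 substitutions/site,
# and 1.0 substitution/site leaves only about 55% of sites visibly different.
print(round(jc69_distance(0.10), 3))   # 0.107
print(round(jc69_expected_p(1.0), 3))  # 0.552
```

Under this model, a branch of 1.0 substitution per site corresponds to only about 55% of sites actually differing, which is why a scale value of 1.0 does not mean that every base has changed.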

27.2.2 Vertical dimension of a phylogenetic tree

The vertical dimension carries no information about evolutionary distance (genetic change) or time. It is used only to lay out the taxa when drawing the tree. The branches of any sub‐tree, sub‐sub‐tree, or the whole tree can be swapped (rotated about their internal node) without altering the evolutionary relationships that the tree depicts (Figure 27.2). Likewise, the spacing between taxa along the vertical axis can be increased or decreased without any effect on the meaning of the tree.


FIGURE 27.2 Swapping of the branches of the sub‐tree of the main tree does not change any meaning represented by the tree. The evolutionary distances between the OTUs remain unchanged.
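Because only the horizontal dimension carries information, two drawings that differ only by swapped branches define the same set of leaf‐to‐leaf path lengths (patristic distances). The sketch below checks this for a small hypothetical four‐taxon tree; the taxa A to D and their branch lengths are invented for illustration, and encoding a tree as nested tuples of (subtree, branch length) pairs is an assumption of the example, not a standard format.

```python
from itertools import combinations


def patristic(tree):
    """Return ({leaf: distance from this node down to it}, {frozenset({x, y}): path length}).

    A tree is a leaf name (str) or a tuple of (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):                      # a leaf is at distance 0 from itself
        return {tree: 0.0}, {}
    child_maps, pairs = [], {}
    for subtree, length in tree:
        depths, sub_pairs = patristic(subtree)
        pairs.update(sub_pairs)                    # pairs that meet lower in the tree
        child_maps.append({leaf: d + length for leaf, d in depths.items()})
    # Leaves lying in different child subtrees meet at this node.
    for left, right in combinations(child_maps, 2):
        for x, dx in left.items():
            for y, dy in right.items():
                pairs[frozenset((x, y))] = dx + dy
    merged = {leaf: d for m in child_maps for leaf, d in m.items()}
    return merged, pairs


# Hypothetical tree: ((A:0.1,B:0.2):0.3,C:0.4):0.05 joined to D:0.6 at the root.
ab = (("A", 0.1), ("B", 0.2))
abc = ((ab, 0.3), ("C", 0.4))
tree_a = ((abc, 0.05), ("D", 0.6))

# The same tree drawn with every pair of branches swapped.
ba = (("B", 0.2), ("A", 0.1))
cba = (("C", 0.4), (ba, 0.3))
tree_b = (("D", 0.6), (cba, 0.05))

print(patristic(tree_a)[1] == patristic(tree_b)[1])  # True
```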

27.3 REPRESENTATION OF PHYLOGENETIC TREES

A dendrogram can be drawn in several ways without distorting its meaning. The depictions are useful under different circumstances.

27.3.1 Rectangular tree

This representation is well suited to both rooted and unrooted trees and is the easiest to understand. The branches leading to two taxa (or clusters of taxa) are joined by a vertical line of arbitrary length. The midpoint of this vertical line marks the internal node connecting them, which represents their hypothetical common ancestor (an ancestor that is not available at the present time).
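The drawing rule just described (horizontal position from branch lengths, each internal node at the midpoint of its vertical line) can be sketched as a small layout routine. This is an illustrative sketch assuming a tree encoded as nested tuples (leaf names as strings, internal nodes as tuples of (child, branch length) pairs); internal nodes receive hypothetical path labels such as root.0.

```python
def rectangular_layout(tree):
    """Return {node label: (x, y)} for a tree of nested (child, branch_length) tuples.

    x: distance from the root (the only axis with evolutionary meaning).
    y: arbitrary vertical slot; leaves take 0, 1, 2, ... in drawing order and an
       internal node sits at the midpoint of the vertical line joining its children.
    """
    coords = {}
    next_slot = [0]                     # mutable counter shared by the recursion

    def visit(node, x, label):
        if isinstance(node, str):       # leaf: take the next free vertical slot
            y = next_slot[0]
            next_slot[0] += 1
            coords[node] = (x, y)
            return y
        child_ys = [visit(child, x + length, f"{label}.{i}")
                    for i, (child, length) in enumerate(node)]
        y = (min(child_ys) + max(child_ys)) / 2   # midpoint of the vertical line
        coords[label] = (x, y)
        return y

    visit(tree, 0.0, "root")
    return coords


# Hypothetical tree ((A:0.1,B:0.2):0.3,C:0.4); internal labels are path-based.
tree = (((("A", 0.1), ("B", 0.2)), 0.3), ("C", 0.4))
for label, (x, y) in rectangular_layout(tree).items():
    print(f"{label:7s} x={x:.2f} y={y}")
```

Changing the leaf order (or the spacing between slots) moves nodes only vertically; the x values, and hence the evolutionary meaning, stay the same.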


FIGURE 27.3 Representing the same phylogenetic tree as circular, radiation, rectangular and straight orientations.

27.3.2 Straight tree

A rectangular tree is converted into a straight tree by joining each taxon directly to its internal node (no vertical line is used), which makes the tree appear to converge towards the common ancestor. A straight tree depicts the same information as a rectangular tree.

27.3.3 Radiation tree

Here, the typical tree‐like appearance is replaced by a comparatively simple depiction: the divergence of the component taxa from a hypothetical ancestor (i.e., an internal node) is not shown. Figure 27.4 depicts how a straight tree can be converted to a radiation tree. An evolutionary scale is not always shown in this type of tree; when it is, it is displayed together with the node statistics (bootstrap values).

27.3.4 Circular tree

Both rooted and unrooted trees can be depicted as circular trees. The distance from the center denotes branch length, whereas position around the circumference carries no meaning (like the vertical axis of rectangular or straight trees).
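One simple way to obtain a circular layout, assumed here for illustration, is to reuse rectangular coordinates: the distance from the root becomes the radius, and the arbitrary vertical position becomes an angle. The sketch below (a hypothetical helper with toy coordinates) confirms that every node keeps its distance from the center, which is the only quantity with evolutionary meaning. A full drawing would also replace the vertical connectors with arcs, which is omitted here.

```python
import math


def to_circular(coords, n_leaves):
    """Map rectangular positions {name: (x, y)} to Cartesian points on a circle.

    Radius = x (distance from the root); angle = 2*pi*y / n_leaves, taken from
    the arbitrary vertical slot, so the angle has no evolutionary meaning.
    """
    points = {}
    for name, (x, y) in coords.items():
        theta = 2 * math.pi * y / n_leaves
        points[name] = (x * math.cos(theta), x * math.sin(theta))
    return points


# Hypothetical rectangular coordinates for ((A:0.1,B:0.2):0.3,C:0.4).
coords = {"A": (0.4, 0), "B": (0.5, 1), "ab": (0.3, 0.5), "C": (0.4, 2), "root": (0.0, 1.25)}
for name, (px, py) in to_circular(coords, n_leaves=3).items():
    print(name, round(math.hypot(px, py), 3))   # distance from the center equals x
```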


FIGURE 27.4 Converting a straight tree to a radiation tree by eliminating the depiction of divergence from common ancestor.

27.4 METHODS FOR CONSTRUCTING EVOLUTIONARY TREES FROM INFERENCES

There are two broad methods of phylogenetic tree construction: distance‐based and character‐based methods.

27.4.1 Distance‐based methods

The input sequences are first aligned by multiple sequence alignment (MSA), and a distance matrix containing the pairwise distances between them is generated. For each pair of sequences, the number of residue substitutions over the full alignment length is counted and converted into a single distance value using a suitable model. Examples of distance‐based phylogenetic algorithms are UPGMA, neighbor‐joining (NJ), and Fitch–Margoliash. In distance‐based methods, an appropriate evolutionary model is selected based on the underlying evolutionary process. Examples of such evolutionary models are JC69 (Jukes and Cantor, 1969), K80 (Kimura, 1980), F81 (Felsenstein, 1981), HKY85 (Hasegawa et al., 1985), T92 (Tamura, 1992), TN93 (Tamura and Nei, 1993), and GTR (general time‐reversible; Tavaré, 1986).
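As an illustration of how a model turns observed differences into a distance, the sketch below computes pairwise K80 (Kimura, 1980) distances, which treat transitions (A and G, or C and T) separately from transversions. It is a minimal sketch assuming pre‐aligned DNA sequences and skipping gaps and ambiguity codes; the sequences and function names are hypothetical, and this is not the maximum composite likelihood method that MEGA7 uses for Table 27.3.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k80_distance(s1: str, s2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned DNA sequences.

    d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions
    of sites showing transitions and transversions. Sites with gaps or ambiguity
    codes are skipped; the logarithms fail (distance undefined) when the
    sequences are too divergent.
    """
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    differing = [(a, b) for a, b in sites if a != b]
    transitions = sum(1 for a, b in differing
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    P = transitions / len(sites)
    Q = (len(differing) - transitions) / len(sites)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def distance_matrix(seqs):
    """Pairwise K80 distances for {name: aligned sequence}, keyed by (name1, name2)."""
    return {(a, b): k80_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


# Hypothetical aligned sequences: seq2 differs from seq1 by one transition (A to G),
# seq3 differs from seq1 by one transversion (A to C).
seqs = {"seq1": "ACGTACGTAC", "seq2": "GCGTACGTAC", "seq3": "CCGTACGTAC"}
for pair, d in distance_matrix(seqs).items():
    print(pair, round(d, 4))
```

Even with one difference each, the transition pair and the transversion pair receive slightly different distances, because the model weights the two kinds of change differently.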

The evolutionary model is required to estimate the number of substitutions under certain assumptions (for example, about base composition and the relative rates of different types of substitution). Thus, selecting the evolutionary model is as critical as selecting the appropriate phylogenetic algorithm. The latter depends on the sequence type (amino acid, RNA, coding DNA, non‐coding DNA, or intergenic DNA), sequence divergence, sequence length, and so on.

27.4.2 Character‐based methods

Character‐based methods use the individual residues of the sequences to construct the tree. Instead of calculating distances among the taxa, the aligned sequences are examined column by column for similarities and differences among the characters. The total number of differing residues over the whole length is not calculated; rather, the character state at particular aligned positions is used to infer how the sequences evolved. Examples of character‐based methods are maximum parsimony, maximum likelihood, and Bayesian inference. Maximum likelihood draws on both approaches (distance‐ and character‐based).
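The idea of focusing on particular aligned positions can be illustrated with parsimony‐informative sites, the columns that maximum parsimony actually uses to discriminate between trees. The sketch below applies the standard rule (at least two character states, each present in at least two sequences) to a hypothetical four‐sequence alignment; ignoring gaps is an assumption of this example.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative columns.

    A column is informative when at least two different character states each
    occur in at least two sequences. Gaps ('-') are ignored in this sketch.
    """
    informative = []
    for i, column in enumerate(zip(*alignment)):
        counts = Counter(c for c in column if c != "-")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative.append(i)
    return informative


# Hypothetical alignment of four sequences, five columns each.
alignment = ["AAGTC", "AAGTT", "ACCTC", "ACCAC"]
print(informative_sites(alignment))   # [1, 2]
```

Columns 3 and 4 are variable, but each change occurs in a single sequence, so it adds the same number of steps to every tree and cannot favor one topology over another.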

Phylogenetic tree construction methods can also be classified according to which of the following two approaches they follow:

27.4.3 Cladistic methods

This approach infers the evolutionary relationships among taxa from their shared ancestry, including both intermediate and common ancestors. It yields a cladogram; maximum parsimony is an example.
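Maximum parsimony scores each candidate tree by the minimum number of character changes it requires. The sketch below uses Fitch's small‐parsimony algorithm on a fixed, rooted binary tree written as nested pairs of leaf names; the alignment and taxon names are hypothetical. A real MP program would additionally search over many topologies (for example with SPR, as in the MP settings of Section 27.5).

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one alignment column on a rooted binary tree.

    tree   -- a leaf name (str) or a pair (left_subtree, right_subtree)
    states -- {leaf name: character in this column}
    Returns (candidate ancestral states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:                                  # no change needed at this node
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # one extra change


def parsimony_score(tree, alignment):
    """Total Fitch length of a tree over all columns of {leaf: aligned sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
               for i in range(length))


# Hypothetical alignment and two competing topologies for taxa W, X, Y, Z.
alignment = {"W": "AAGTC", "X": "AAGTT", "Y": "ACCTC", "Z": "ACCAC"}
tree_1 = (("W", "X"), ("Y", "Z"))
tree_2 = (("W", "Y"), ("X", "Z"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))   # 4 6
```

Maximum parsimony would prefer tree_1, since it explains the same data with fewer changes.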

27.4.4 Phenetic methods

This approach uses the overall degree of similarity among a group of organisms to reveal their relationships as a tree‐like network (called a phenogram), e.g., UPGMA. The rate of divergence is assumed to be uniform among the taxa.
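UPGMA can be sketched in a few lines: repeatedly join the closest pair of clusters, place the new node at half their distance (the uniform‐rate assumption), and average the distances to the merged cluster, weighted by cluster size. This is a minimal illustration on a hypothetical, perfectly ultrametric four‐taxon matrix; the function and its input format are assumptions, not a particular package's API.

```python
def upgma(dist, names):
    """UPGMA clustering of a complete symmetric distance matrix.

    dist  -- {frozenset({a, b}): distance} for every pair of names
    names -- list of taxon names
    Returns a Newick string; every node sits at half the distance at which its
    two clusters were joined, so all leaves end equidistant from the root.
    """
    clusters = {n: (n, 1, 0.0) for n in names}   # id -> (newick, size, height)
    d = dict(dist)
    while len(clusters) > 1:
        pair = min(d, key=d.get)                  # closest pair of clusters
        a, b = sorted(pair)
        na, sa, ha = clusters.pop(a)
        nb, sb, hb = clusters.pop(b)
        height = d.pop(pair) / 2                  # equal rates: split the distance
        new_id = f"({a},{b})"
        for k in clusters:                        # size-weighted average distances
            d[frozenset((new_id, k))] = (sa * d.pop(frozenset((a, k))) +
                                         sb * d.pop(frozenset((b, k)))) / (sa + sb)
        clusters[new_id] = (f"({na}:{height - ha:.4f},{nb}:{height - hb:.4f})",
                            sa + sb, height)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical ultrametric distances: A-B = 2, C-D = 4, all other pairs = 6.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 6,
         ("B", "C"): 6, ("B", "D"): 6, ("C", "D"): 4}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(dist, ["A", "B", "C", "D"]))
```

Every leaf in the result lies 3 units from the root, which is exactly the molecular‐clock (ultrametric) property; with real data that violate equal rates, this forced equality is what distorts UPGMA branch lengths.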

Table 27.1

Tree construction method    Distance data    Character data
Clustering algorithm        UPGMA, NJ        -
Optimality criterion        ME               MP, ML

27.5 INFERRING PHYLOGENETIC TREES

Now we will compare the outputs of different molecular phylogeny methods, using a set of nucleotide sequences (18S rRNA) belonging to nine organisms representing distant taxa.

At the outset, the best‐fitting substitution model, TN93 + G (Tamura–Nei with gamma‐distributed rates among sites), was selected because it had the lowest Bayesian information criterion (BIC) score (10145.019). The parameters selected for each method have been specified along with the trees in Figure 27.5.
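Model selection by BIC trades goodness of fit against model complexity: BIC = k ln(n) − 2 ln L, where L is the maximized likelihood, k the number of free parameters, and n the sample size (assumed here to be the number of alignment sites). The sketch below ranks candidate models by this score; the log‐likelihoods and parameter counts in the example are hypothetical placeholders, not values from the analysis in this chapter.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion, BIC = k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def rank_models(fits, n_sites):
    """Return [(model, BIC), ...] sorted from best (lowest BIC) to worst.

    fits -- {model name: (maximized log-likelihood, number of free parameters)}
    """
    scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in fits.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical log-likelihoods and parameter counts, for illustration only.
fits = {"JC69": (-5200.0, 15), "K80": (-5150.0, 16), "TN93+G": (-5000.0, 21)}
for model, score in rank_models(fits, n_sites=1800):
    print(f"{model:7s} BIC = {score:.1f}")
```

A richer model wins only if its gain in log‐likelihood outweighs the penalty of ln(n) per extra parameter.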

FIGURE 27.5 Phylogenetic trees constructed from nine sequences of the 18S rRNA gene belonging to divergent species using various algorithms: MP, ME, NJ, UPGMA, and ML trees.

Parametric details for each of the algorithms used in constructing the phylogenetic trees are as follows:

  • MP tree: search method: subtree‐pruning‐regrafting (SPR); number of initial trees: 10; MP search level: 1; number of trees to retain: 100; bootstrap: 500 replicates.
  • ME tree: model: Tamura‐Nei + Gamma; rates among sites: gamma distributed; gamma parameter: 5; pattern among lineages: same; bootstrap: 500 replicates.
  • NJ tree: model: Tamura‐Nei + Gamma; rates among sites: gamma distributed; gamma parameter: 5; pattern among lineages: same; bootstrap: 500 replicates.
  • UPGMA tree: model: Tamura‐Nei + Gamma; rates among sites: gamma distributed; gamma parameter: 5; pattern among lineages: same; bootstrap: 500 replicates.
  • ML tree: model: Tamura‐Nei + Gamma; rates among sites: gamma distributed; gamma parameter: 5; ML heuristic method: nearest‐neighbor‐interchange (NNI); initial tree: made automatically (default: NJ/BioNJ); bootstrap: 500 replicates.

TABLE 27.2 Comparison between the features of the trees generated from the following important phylogenetic algorithms (Desper and Gascuel, 2005).

SN Tree Characteristic features
1 Maximum parsimony
  • The scale bar does not correspond to a model‐based genetic distance; rather, branch lengths are obtained by counting substitutions.
  • Consensus tree shows the agreement of branching based on bootstrap values.
  • The analysis gives the minimum number of changes needed to explain the sequence data. Because this is a character‐based (not distance‐based) approach, the sites (columns of the MSA) that carry the most information about evolution are identified, and only these most‐informative sites are used to build the phylogenetic tree. The scale bar indicates either nucleotide substitutions per site (a decimal value) or the number of nucleotide substitutions (a value greater than one).
2 UPGMA
  • The scale bar at the bottom indicates evolutionary distance, which is additive. In the given example, the scale bar represents 0.05, i.e., 5% divergence (95% identity). Note that UPGMA assumes an equal and constant rate of evolution along all branches. When this assumption is unrealistic for the data, UPGMA yields erroneous branch lengths and can produce the wrong tree.
  • Because the tree is ultrametric, a molecular clock is assumed, and all terminal taxa are equally distant from the root.
  • The distance matrix in Table 27.3 gives the divergence between all pairs of sequences. The distance between the 18S rRNA sequences of rat and horse (the 1st and 2nd sequences, respectively) is 0.004, which indicates that about four bases in every 1000 differ. Hence, these two sequences share about 99.6% (i.e., 100 − 0.4) identity.
  • Sometimes the distance is given instead as the raw number of residue differences between two sequences; the scale is then an absolute count rather than a proportion.
3 NJ
  • NJ is also a distance‐based method, with additive nature of branch lengths.
  • A tree is additive when, for every pair of taxa, the sum of the lengths of the branches on the path connecting them equals the distance between those two taxa.
  • NJ is suitable for non‐ultrametric distance data; when the distances are additive, NJ recovers the tree that generated them (and the NJ tree is then equivalent to the ME tree).
  • NJ is a clustering method that does not assume a molecular clock (i.e., mutation or substitution rates may differ among OTUs).
  • Branch lengths are indicated on each branch, and the scale bar below gives the unit of distance.
4 ME
  • A distance‐based method that selects the tree with the minimum total branch length, on the premise that this tree implies the minimum number of evolutionary events.
  • However, the tree with the minimum total branch length is not necessarily the true tree, especially for short input sequences.
  • ME is comparable with MP; the difference is that ME is inferred from genetic distances, whereas MP is based on counting individual base substitutions over the tree.
5 ML
  • This approach is more robust and uses both the distance information between sequences and the character information.
  • The scale indicates evolutionary distances between sequence pairs; it is not a time scale.
  • A scale bar of 0.1 indicates 0.1 substitutions per nucleotide site.
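The additivity and no‐clock points for NJ can be checked with a small sketch of the Saitou and Nei neighbor‐joining algorithm. Here the input distances are generated from a hypothetical unrooted four‐taxon tree in which the pair A, B is separated from the pair C, D by an internal branch of length 1, with terminal branches A:2, B:3, C:4, D:5; the distances are therefore additive but not ultrametric, and the sketch recovers exactly those branch lengths. Node names such as node1 are hypothetical labels, and the code is illustrative rather than MEGA's implementation.

```python
from itertools import combinations


def neighbor_joining(dist, names):
    """Neighbor-joining (Saitou and Nei, 1987) on a symmetric distance matrix.

    dist  -- {frozenset({a, b}): distance} for every pair of names
    names -- list of taxon names
    Returns the edges of the unrooted tree as (node, node, branch length).
    """
    d = dict(dist)
    nodes = list(names)
    edges = []
    new_count = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from i to all other current nodes
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r[i] - r[j]
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))   # branch from new node to i
        lj = dij - li                                     # branch from new node to j
        new_count += 1
        u = f"node{new_count}"
        edges += [(u, i, li), (u, j, lj)]
        for k in nodes:
            if k not in (i, j):                          # distances to the new node
                d[frozenset((u, k))] = (d[frozenset((i, k))]
                                        + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Additive, non-ultrametric distances from the hypothetical tree described above.
pairs = {("A", "B"): 5, ("A", "C"): 7, ("A", "D"): 8,
         ("B", "C"): 8, ("B", "D"): 9, ("C", "D"): 9}
dist = {frozenset(p): v for p, v in pairs.items()}
for edge in neighbor_joining(dist, ["A", "B", "C", "D"]):
    print(edge)
```

The leaves are not equidistant from any single point, so UPGMA's clock assumption would not fit these data, whereas NJ reproduces every branch length.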

TABLE 27.3 Pairwise distances (calculated by maximum composite likelihood model, using MEGA7) between the input sequences are shown in the lower triangular matrix.

M11188 NR_046271 M59392 AF173629 M10098 NR_074540 X61688 EF645689 AY036903
M11188 0.002 0.009 0.007 0.002 1.721 1.463 1.472 1.706
NR_046271 0.004 0.009 0.007 0.002 1.723 1.464 1.473 1.708
M59392 0.035 0.034 0.010 0.009 1.699 1.461 1.470 1.717
AF173629 0.026 0.025 0.039 0.007 1.690 1.487 1.496 1.689
M10098 0.003 0.003 0.034 0.026 1.721 1.462 1.472 1.706
NR_074540 2.582 2.593 2.551 2.571 2.584 1.201 1.196 0.912
X61688 2.250 2.261 2.239 2.285 2.253 1.868 0.003 1.111
EF645689 2.266 2.276 2.253 2.299 2.268 1.859 0.006 1.112
AY036903 2.546 2.557 2.556 2.553 2.548 0.249 1.716 1.723
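To read Table 27.3 programmatically, the sketch below transcribes its lower triangle (the distances) into a symmetric lookup and finds each sequence's nearest neighbor. The helper names are hypothetical. The approximate percent identity, 100 × (1 − d), is only a rough reading for very small distances such as the rat and horse pair (the first two sequences); for the large, model‐corrected distances (above 1.0) it is meaningless, because those values count multiple substitutions per site.

```python
names = ["M11188", "NR_046271", "M59392", "AF173629", "M10098",
         "NR_074540", "X61688", "EF645689", "AY036903"]

# Lower triangle of Table 27.3: row i holds the distances to names[0..i-1].
lower = [
    [],
    [0.004],
    [0.035, 0.034],
    [0.026, 0.025, 0.039],
    [0.003, 0.003, 0.034, 0.026],
    [2.582, 2.593, 2.551, 2.571, 2.584],
    [2.250, 2.261, 2.239, 2.285, 2.253, 1.868],
    [2.266, 2.276, 2.253, 2.299, 2.268, 1.859, 0.006],
    [2.546, 2.557, 2.556, 2.553, 2.548, 0.249, 1.716, 1.723],
]

dist = {}
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        dist[(names[i], names[j])] = dist[(names[j], names[i])] = value


def nearest(name):
    """Hypothetical helper: the other sequence at the smallest distance (first one on ties)."""
    return min((n for n in names if n != name), key=lambda n: dist[(name, n)])


def approx_identity(a, b):
    """Rough percent identity, 100 * (1 - d); only sensible when d is very small."""
    return 100 * (1 - dist[(a, b)])


for name in names:
    print(f"{name:10s} nearest: {nearest(name)}")
print(round(approx_identity("M11188", "NR_046271"), 1))   # 99.6
```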

27.6 QUESTIONS

  1. How do you differentiate between distance‐based and character‐based methods of phylogeny?
  2. What are the differences between a cladogram and a phenogram? Under what circumstances will you use these terms?
  3. What is the meaning of the scale bar in a phylogenetic tree constructed using each of the following methods?
    1. UPGMA
    2. Maximum parsimony
    3. Minimum evolution
  4. Is there any difference between a circular tree and a vertical tree, so far as the meaning conveyed by the depictions is concerned?
  5. Why do we first select the best evolutionary model before constructing a phylogenetic tree?