CHAPTER 22
Construction of Phylogenetic Tree: Fitch Margoliash (FM) Algorithm

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

22.1 INTRODUCTION

This is the first algorithm based on the least-squares principle for phylogenetic tree reconstruction. It was developed by Walter Fitch and Emanuel Margoliash in 1967 (Fitch and Margoliash, 1967; Fitch, 1970, 1971). When DNA sequences of the same length are entered instead of a distance matrix, the evolutionary distances between the taxa are first estimated with the Jukes–Cantor model.
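The Jukes–Cantor correction converts the observed proportion p of differing sites between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The following is a minimal Python sketch, assuming gap-free, already aligned sequences of equal length; the function names `p_distance` and `jukes_cantor` are illustrative, not part of any particular program.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    mismatches = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return mismatches / len(seq1)


def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # The logarithm is undefined once p reaches 3/4 (saturation).
        raise ValueError("p >= 0.75: Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two of ten sites differ, so p = 0.2.
print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTTT"), 4))  # prints 0.2326
```

The corrected distance (about 0.233) is larger than the raw p-distance (0.2) because the model allows for multiple substitutions at the same site.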

22.1.1 Principle

The algorithm is based on an optimality criterion: it selects the tree with the smallest residual, that is, the smallest difference between the observed pairwise distances and the distances expected from summing branch lengths along the tree. It estimates branch lengths (distances) by successively clustering pairs of taxa in order to determine the best-fitting unrooted tree.
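In the Fitch–Margoliash form of least squares, each squared residual is weighted by the inverse square of the observed distance, so the quantity minimized over candidate trees is the sum over taxon pairs of ((d_ij - p_ij)/d_ij)^2, where d_ij is the observed distance and p_ij the path length on the tree. A minimal Python sketch of this score, assuming all observed distances are non-zero; the "fitted" values below are hypothetical and serve only to show how a mismatch is scored.

```python
def fm_criterion(observed, fitted):
    """Fitch-Margoliash weighted sum of squares.

    Each residual (observed - fitted) is divided by the observed distance
    before squaring, so a given error on a long distance counts for less
    than the same error on a short one. Observed distances must be non-zero.
    """
    return sum(((d - fitted[pair]) / d) ** 2 for pair, d in observed.items())


# Observed distances from Table 22.1.
observed = {("A", "B"): 20, ("A", "C"): 26, ("A", "D"): 34,
            ("B", "C"): 22, ("B", "D"): 30, ("C", "D"): 16}

# Hypothetical tree distances that reproduce every pair except A-B.
fitted = dict(observed)
fitted[("A", "B")] = 22

print(round(fm_criterion(observed, fitted), 4))  # prints 0.01
```

A tree whose path lengths reproduce every observed distance scores 0; the method prefers the tree with the lowest score.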

22.1.2 Assumption

  1. The algorithm does not assume a constant rate of mutation (molecular clock) across lineages.
  2. It assumes that distances are additive: the distance between any two taxa equals the sum of the lengths of the branches on the path joining them in the tree.
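Whether a distance matrix can be fitted exactly by an additive tree can be checked with the four-point condition, a standard result not covered in this chapter: for every set of four taxa, of the three sums d(i,j) + d(k,l), d(i,k) + d(j,l) and d(i,l) + d(j,k), the two largest must be equal. A minimal Python sketch, assuming distances are stored once per unordered pair; the helper names are illustrative.

```python
from itertools import combinations


def dist(d, x, y):
    """Look up a pairwise distance regardless of the order of the two taxa."""
    return d[(x, y)] if (x, y) in d else d[(y, x)]


def is_additive(d, taxa, tol=1e-9):
    """Four-point condition: in every quartet the two largest of the three
    pairwise sums must be equal for the matrix to fit a tree exactly."""
    for i, j, k, l in combinations(taxa, 4):
        sums = sorted([dist(d, i, j) + dist(d, k, l),
                       dist(d, i, k) + dist(d, j, l),
                       dist(d, i, l) + dist(d, j, k)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances from Table 22.1.
TABLE_22_1 = {("A", "B"): 20, ("A", "C"): 26, ("A", "D"): 34,
              ("B", "C"): 22, ("B", "D"): 30, ("C", "D"): 16}
print(is_additive(TABLE_22_1, "ABCD"))  # True: the sums are 36, 56, 56

# Changing one entry breaks additivity.
perturbed = dict(TABLE_22_1)
perturbed[("A", "D")] = 36
print(is_additive(perturbed, "ABCD"))   # False: the sums are 36, 56, 58
```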

22.2 OBJECTIVE

To construct a phylogenetic tree using the Fitch Margoliash (FM) method, given the distances among a set of molecular sequences.

22.3 PROCEDURE

Let us start with four sequences, “A”, “B”, “C” and “D”, whose pairwise distances (d) are as follows:

TABLE 22.1

     A    B    C    D
A    0
B   20    0
C   26   22    0
D   34   30   16    0

The iterative steps in this algorithm are as follows:

  1. Choose two of the taxa (say, “A” and “B”) and treat all the remaining taxa (here, “C” and “D”) as a single composite taxon, denoted “X”. The distances of “A” and “B” from “X” are then determined.
  2. The distance between “A” and “X” (dAX) is the average of the distances from “A” to each component OTU of “X” (here, “C” and “D”).
  3. Similarly, the distance between “B” and “X” (dBX) is calculated:

    TABLE 22.2

         A    B    C    D
    A    0
    B   20    0
    C   26   22    0
    D   34   30   16    0

    dAB = 20
    dAX = (dAC + dAD)/2 = (26 + 34)/2 = 30
    dBX = (dBC + dBD)/2 = (22 + 30)/2 = 26
  4. Next, the distance (P1) between the terminal node (taxon “A”) and its intermediate ancestor (“P”) is calculated using the formula P1 = (dAB + dAX – dBX)/2.
  5. Similarly, the distance Q1 between taxon “B” and its intermediate ancestor “Q”, and the distance R1 between the composite taxon “X” and its intermediate ancestor “R”, are calculated:

    TABLE 22.3

         A    B    X
    A    0
    B   20    0
    X   30   26    0

    P1 = (dAB + dAX – dBX)/2 = (20 + 30 – 26)/2 = 12
    Q1 = (dAB + dBX – dAX)/2 = (20 + 26 – 30)/2 = 8
    R1 = (dAX + dBX – dAB)/2 = (30 + 26 – 20)/2 = 18

    The distances obtained (P1, Q1 and R1) are placed on a tree:

    Phylogenetic tree with a sub-tree, with P, Q and R as intermediate ancestors.

    FIGURE 22.1

  6. Now “A” and “B” are combined into a single taxon, “AB”, since their distances from their respective intermediate ancestors have been obtained.
  7. The composite taxon “X” is expanded back into its component taxa (here, “C” and “D”).
  8. The next taxon (“C”) is taken as the second taxon whose distance to its intermediate ancestor is estimated, and the remaining taxa are again combined into the composite taxon “X” (here, “D” alone), so that the same steps can be repeated.
  9. The same notation is used as in the previous iteration, but with the subscript “2”: P2 is the distance between node “AB” and its intermediate ancestor (again designated “P”), Q2 is the distance between node “C” and its intermediate ancestor (again designated “Q”), and R2 is the distance between node “X” and its intermediate ancestor “R”. Distances from “AB” are obtained by averaging the corresponding distances from “A” and “B”.

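    TABLE 22.4

              AB     C     D
    AB         0
    C         24     0
    D (or X)  32    16     0

    d(AB)C = (dAC + dBC)/2 = (26 + 22)/2 = 24
    d(AB)D = (dAD + dBD)/2 = (34 + 30)/2 = 32

    P2 = (d(AB)C + d(AB)D – dCD)/2 = (24 + 32 – 16)/2 = 20
    Q2 = (d(AB)C + dCD – d(AB)D)/2 = (24 + 16 – 32)/2 = 4
    R2 = (d(AB)D + dCD – d(AB)C)/2 = (32 + 16 – 24)/2 = 12

    Similar to Figure 22.1, but with distances between intermediate ancestors and nodes indicated: P and AB (P2 = 20), Q and C (Q2 = 4), and R and D (R2 = 12).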

    FIGURE 22.2

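  10. At this point, one additional parameter, the internal branch length (IBL), is calculated for the combined taxon “AB”. Since P2 was obtained from distances averaged over “A” and “B”, the IBL is P2 minus the average of P1 and Q1:

    IBL Calculation

    IBL = P2 – (P1 + Q1)/2 = 20 – (12 + 8)/2 = 10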
  11. In the last step, no further information is available for calculating the IBL between “D” and “ABC”. In this situation, the length of this internal branch (designated “IBL2”, as it is the second IBL calculated) is determined by the following formula:

    [Formula for IBL2: image not available]

    The final tree constructed by the FM algorithm is as follows:

    Phylogenetic tree with two sub-trees, with distances between intermediate ancestors and nodes indicated.

    FIGURE 22.3
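The two iterations of the worked example can be sketched in Python as follows. This is a minimal sketch that reproduces only the hand calculation for Table 22.1: it follows the chapter's fixed choice of “A” and “B” as the first pair and “C” as the next taxon, and it does not search over alternative tree topologies as full FM programs do. The IBL line uses IBL = P2 - (P1 + Q1)/2, an assumed formula chosen because it reproduces the additive distances of Table 22.1; the IBL2 step is not implemented. All function names are illustrative.

```python
def dist(d, x, y):
    """Look up a pairwise distance regardless of the order of the two taxa."""
    return d[(x, y)] if (x, y) in d else d[(y, x)]


def avg_dist(d, group1, group2):
    """Average distance between members of two groups (composite taxa)."""
    pairs = [(a, b) for a in group1 for b in group2]
    return sum(dist(d, a, b) for a, b in pairs) / len(pairs)


def three_taxon_branches(d12, d13, d23):
    """Branch lengths of a three-taxon tree (the P, Q and R formulas)."""
    return ((d12 + d13 - d23) / 2,
            (d12 + d23 - d13) / 2,
            (d13 + d23 - d12) / 2)


def fm_four_taxa(d):
    """Hand-calculation steps of the worked example (A, B first; then AB, C, D)."""
    # Iteration 1: A, B and the composite taxon X = {C, D}.
    p1, q1, r1 = three_taxon_branches(
        avg_dist(d, ["A"], ["B"]),
        avg_dist(d, ["A"], ["C", "D"]),
        avg_dist(d, ["B"], ["C", "D"]),
    )
    # Iteration 2: A and B merged into AB; X expanded into C and D.
    p2, q2, r2 = three_taxon_branches(
        avg_dist(d, ["A", "B"], ["C"]),
        avg_dist(d, ["A", "B"], ["D"]),
        avg_dist(d, ["C"], ["D"]),
    )
    # Assumed IBL formula: P2 runs from the average of A and B to the ancestor.
    ibl = p2 - (p1 + q1) / 2
    return {"P1": p1, "Q1": q1, "R1": r1,
            "P2": p2, "Q2": q2, "R2": r2, "IBL": ibl}


TABLE_22_1 = {("A", "B"): 20, ("A", "C"): 26, ("A", "D"): 34,
              ("B", "C"): 22, ("B", "D"): 30, ("C", "D"): 16}

print(fm_four_taxa(TABLE_22_1))
# prints {'P1': 12.0, 'Q1': 8.0, 'R1': 18.0, 'P2': 20.0, 'Q2': 4.0, 'R2': 12.0, 'IBL': 10.0}
```

The same `three_taxon_branches` helper serves both iterations; only the taxa fed into it change, which mirrors how the hand procedure reuses the P, Q and R formulas with new subscripts.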

22.4 INTERPRETATION OF THE FM TREE

The phylogenetic tree has been constructed without assuming a constant rate of evolution, so different branches (lineages) may have evolved at different rates. Because the branch lengths are additive, the distance between any two OTUs can be read from the tree as the sum of the branch lengths along the path joining them.
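Additivity can be checked by summing branch lengths along tree paths and comparing the sums with Table 22.1. The Python sketch below assumes the branch lengths A = 12 (P1), B = 8 (Q1), C = 4 (Q2) and D = 12 (R2) from the worked example, plus an internal branch of 10 between the junction of “A” and “B” and the junction of “C” and “D”; that internal value is an assumption, chosen because it is the one consistent with the table. The internal-node labels "u" and "v" are hypothetical.

```python
from collections import defaultdict


def build_tree(edges):
    """Undirected weighted adjacency list from (node, node, length) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def path_length(adj, start, goal):
    """Sum of branch lengths on the unique path between two nodes of a tree."""
    stack = [(start, None, 0)]
    while stack:
        node, parent, total = stack.pop()
        if node == goal:
            return total
        for nbr, w in adj[node]:
            if nbr != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((nbr, node, total + w))
    raise ValueError(f"no path from {start!r} to {goal!r}")


# "u" joins A and B, "v" joins C and D (hypothetical internal-node labels).
tree = build_tree([("A", "u", 12), ("B", "u", 8), ("u", "v", 10),
                   ("C", "v", 4), ("D", "v", 12)])

TABLE_22_1 = {("A", "B"): 20, ("A", "C"): 26, ("A", "D"): 34,
              ("B", "C"): 22, ("B", "D"): 30, ("C", "D"): 16}

for (x, y), observed in TABLE_22_1.items():
    print(x, y, "observed:", observed, "tree:", path_length(tree, x, y))
```

Every printed pair shows the tree path length equal to the observed distance, which is what additivity means for this example.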

22.5 QUESTIONS

  1. Construct the phylogenetic tree using the FM method:

    TABLE 22.5

         A    B    C    D
    A    0
    B   10    0
    C   14   15    0
    D   24   18   11    0
  2. In the last chapter, you constructed the phylogenetic tree using UPGMA (Q1a). Now construct the tree using the FM method and compare it with the previous one.

    TABLE 22.6

         A    B    C    D    E
    B    8
    C   18   18
    D   18   18   10
    E   18   18   10    4
    F   20   20   20   20   20
  3. What is meant by the term “internal branch length”? Why is it important when constructing a phylogenetic tree with the FM method?
  4. Differentiate between the principles and applications of the FM and UPGMA methods of phylogenetic tree construction.