CHAPTER 8
Dot Plot Analysis

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

8.1 INTRODUCTION

A dot plot is a two‐dimensional (2D) plot depicting one or more sequence features (sequence similarities, direct and/or inverted repeats, motifs, gaps, sequence inversions, etc.). A single sequence can be compared with itself, or two different sequences (with the same type of residues) can be compared with each other, to reveal hidden sequence features. The dot plot has been used for local (not global) alignment, and was recognized as a very powerful tool for molecular sequence analysis as early as the late 1960s (Fitch, 1969).

8.2 OBJECTIVE

To compare two homologous molecular sequences using a dot plot.

8.3 PROCEDURE

Molecular sequences can be subjected to dot plot analysis using online tools like Dotlet, Dotter, and so on.

  1. Dotlet (http://myhits.isb‐sib.ch/util/dotlet/doc/dotlet_help.html). This is freely available online and is used as a tool for diagonal plotting of sequences.
  2. Dot plot(+) (http://www.hku.hk/bruhk/gcgdoc/dot plot.html). Dot plot(+) software can identify the overlapping portions of two sequences, and also any repeats and inverted repeats of a particular sequence.
  3. Dotter (http://sonnhammer.sbc.su.se/Dotter.html). A graphical dot plot program for thorough comparison of two molecular sequences. Dotter can be run on any of the following operating systems: Mac OS, Linux, Sun Solaris and Windows.

For analysis using a dot plot, two different sequences, or two copies of a single sequence, are placed along the vertical and the horizontal axes of a matrix. The query and the subject sequences are placed along rows (Y‐axis) and columns (X‐axis), respectively. Next, a dot is placed in every cell where the two axes have the same residue. A stretch of identical residues shared by the two sequences will therefore form a straight diagonal line (Figure 8.1).


FIGURE 8.1 Depiction of plotting the straight line based on the runs of dots obtained from matches between residues along the X‐ and Y‐axes. Insertion in any of the sequences will distort the run of the straight line.
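The matrix construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used by any of the tools listed earlier; the function names `dot_matrix` and `render` and the example sequence are assumptions made for this illustration.

```python
def dot_matrix(query, subject):
    """Build a dot matrix: rows follow the query (Y-axis), columns the subject (X-axis).

    A cell is True when the residue on the row equals the residue on the column.
    """
    return [[q == s for s in subject] for q in query]


def render(query, subject):
    """Return a plain-text picture of the matrix ('*' = dot, '.' = empty cell)."""
    matrix = dot_matrix(query, subject)
    lines = ["  " + " ".join(subject)]  # header row: the subject sequence
    for residue, row in zip(query, matrix):
        lines.append(residue + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Hypothetical example: a short sequence compared with itself.
print(render("GATTACA", "GATTACA"))
```

When a sequence is compared with itself, every cell on the main diagonal carries a dot; the additional off-diagonal dots are chance matches between identical residues, which is the background noise addressed by the parameters in the next section.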

8.4 PARAMETERS OF DOT PLOT ANALYSIS

There are two main parameters optimized during a dot plot analysis: window size and mismatch limit.

8.4.1 Window size

This determines the run of residues that must match in both sequences. Unless the specified number of consecutive residues match, the graph will not show any dot. Window size thus controls the background noise. The smaller the window size, the more background noise there will be. Conversely, a very large window size will produce a clean plot, but one that may be devoid of any indication of sequence similarity.
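The effect of window size can be illustrated with a minimal sketch that keeps a dot only when `window` consecutive residues match exactly. The function name `windowed_dots` and the two short sequences are hypothetical and chosen only for this illustration; real tools may score windows differently.

```python
def windowed_dots(seq_a, seq_b, window):
    """Return the (i, j) start positions where `window` consecutive residues match exactly."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.append((i, j))
    return hits


# Hypothetical sequences sharing the stretch "ACGTTGCA".
seq_a = "ACGTTGCAACGGTACC"
seq_b = "TTACGTTGCATGGTAC"

# Count the dots for a small, a moderate and an excessively large window.
for window in (1, 3, 5, 20):
    print(window, len(windowed_dots(seq_a, seq_b, window)))
```

A window of 1 reports every chance match, larger windows progressively remove the noise, and a window longer than the sequences themselves leaves no dots at all.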

8.4.2 Mismatch limit

This parameter allows one to tolerate a specified number of mismatches, thereby indicating the stretches of residues with sequence similarity. The limit specified by different software ranges from 1 to 3.
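A mismatch limit can be sketched by counting the differing positions inside each window and keeping the dot when the count stays within the limit. The function name `tolerant_dots`, its parameter names and the example sequences are assumptions made for this illustration.

```python
def tolerant_dots(seq_a, seq_b, window, max_mismatches):
    """Return (i, j) start positions whose windows differ at no more than `max_mismatches` positions."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            mismatches = sum(
                a != b for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if mismatches <= max_mismatches:
                hits.append((i, j))
    return hits


# Hypothetical pair differing by a single substitution at position 4 (A -> T).
seq_a = "GATTACAGGCT"
seq_b = "GATTTCAGGCT"

strict = tolerant_dots(seq_a, seq_b, window=5, max_mismatches=0)
relaxed = tolerant_dots(seq_a, seq_b, window=5, max_mismatches=1)
print((0, 0) in strict, (0, 0) in relaxed)
```

With no mismatches allowed, the windows covering the substituted position drop out and the diagonal is interrupted; allowing one mismatch restores them, at the cost of admitting more chance matches elsewhere.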

Please note that these dot plot analyses have been done using http://www.vivo.colostate.edu/molkit/dnadot/ and https://wssp.rutgers.edu/StudentScholars/WSSP08/Dot plotter/Dot plotPractice.html?destination=StudentScholars/WSSP08/Dot plotter/Dot plotPractice.html, online tools which are no longer available.

8.5 INTERPRETATION

Dot plot analysis reveals several sequence features at a glance. Some examples have been given below:

8.5.1 Insertion(s)/deletion(s) in a pair of sequences

The pair of sequences being compared in a dot plot may differ because of insertion(s)/deletion(s) at one or more positions. These InDels appear as a break in the straight line (Figure 8.1). An insertion in the horizontal sequence (or a deletion in the vertical sequence) shifts the line horizontally and breaks it, whereas an insertion in the vertical sequence (or a deletion in the horizontal sequence) shifts the line vertically and likewise breaks it. In Figure 8.1, the third base (i.e., “C”) of the horizontal sequence and the ninth base (i.e., “G”) of the vertical sequence are insertions in their respective sequences (highlighted in yellow in the second diagram).
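The shift caused by an InDel can be made visible by recording which diagonal each match falls on. The sketch below is illustrative only: the function name `diagonal_offsets`, the window size and both sequences are hypothetical, and the sequences are not those of Figure 8.1.

```python
def diagonal_offsets(horizontal, vertical, window):
    """Map each diagonal offset (column - row) to its number of exact window matches."""
    counts = {}
    for row in range(len(vertical) - window + 1):
        for col in range(len(horizontal) - window + 1):
            if vertical[row:row + window] == horizontal[col:col + window]:
                offset = col - row
                counts[offset] = counts.get(offset, 0) + 1
    return counts


# Hypothetical example: the horizontal sequence carries a 3-base insertion ("AAA").
vertical = "ATGGCTAGCTTACGGA"
horizontal = "ATGGCTAG" + "AAA" + "CTTACGGA"

print(diagonal_offsets(horizontal, vertical, window=4))
```

Matches upstream of the insertion lie on offset 0 and matches downstream lie on offset 3, i.e. the line jumps three columns to the right, which corresponds to the horizontal movement described above for an insertion in the horizontal sequence.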

8.5.2 Identifying repeat sequences

The presence of repeat sequence(s) can be detected by a dot plot (Figure 8.2). The same sequence is placed along the horizontal and vertical axes. The sequence below contains a fourfold repetition of “TACGGCTACAGTCACG”, with successive repeats separated by short tetramers of different sequences:

T A C G G C T A C A G T C A C G G G G G T A C G G C T A C A G T C A C G C C C C T A C G G C T A C A G T C A C G A A A A T A C G G C T A C A G T C A C G A C C C C C T A T A A A A G C T C A G T G A G C G C C C G C G G T A A A T G T A C C T G T C A C C C T A C A G C G A C C T C T G C C A G A C C

FIGURE 8.2 Interpretation of dot plot based on the same repeat sequence (shown above) which has been placed along both axes. The four different colors (yellow, green, blue and gray) have been shown to indicate the 1st, 2nd, 3rd and 4th repeat of “TACGGCTACAGTCACG”.

In the dot plot result, we find one diagonal line representing the full sequence, and some short fragmented lines parallel to the diagonal. These short lines represent the repeat sequences. Each fragment corresponds to the alignment of one copy of the repeat with another copy.
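The parallel lines can also be located programmatically by measuring, for each off-diagonal of a self-comparison, the longest run of identical residues. The sketch below uses only the repeat-containing part of the sequence shown for Figure 8.2 (the four repeats and their three spacers); the function name `longest_run_per_offset` and the `min_run` threshold are assumptions made for this illustration.

```python
def longest_run_per_offset(seq, min_run):
    """Self-comparison: return {offset: longest run of matches} for off-diagonals with runs >= min_run."""
    runs = {}
    n = len(seq)
    for offset in range(1, n):  # offset 0 is the main diagonal and is skipped
        best = current = 0
        for i in range(n - offset):
            if seq[i] == seq[i + offset]:
                current += 1
                best = max(best, current)
            else:
                current = 0
        if best >= min_run:
            runs[offset] = best
    return runs


# Repeat unit from the text, joined by the tetramer spacers GGGG, CCCC and AAAA.
unit = "TACGGCTACAGTCACG"
seq = unit + "GGGG" + unit + "CCCC" + unit + "AAAA" + unit

print(longest_run_per_offset(seq, min_run=12))
```

Because each repeat plus its spacer spans 20 residues, runs of 16 matches are found at offsets 20, 40 and 60: these are the short lines parallel to the main diagonal, one for every pairing of two repeat copies.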

8.5.3 Unraveling other sequence features

A nucleotide sequence may produce a stem‐loop secondary structure when it has a palindromic sequence interrupted by a short intervening sequence. Similarly, one half of a given sequence may be an inversion of the other half. Dot plot analysis can reveal such features. Inverted sequences will produce a diagonal line running between the other two corners of the matrix (the corners adjacent to the end terminals of the sequences), that is, perpendicular to the usual main diagonal. Smaller diagonal lines lying parallel to, and symmetric about, the main diagonal indicate that the same repeat occurs in tandem in the sequence.
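A stem‐loop candidate can be sketched as a run of complementary base pairs along an anti‐diagonal of the self‐comparison matrix, i.e. a set of cells for which row index plus column index is constant. The code below is a hedged illustration: the function name `longest_antidiagonal_run` and the example sequence (a 7‐base stem, a TTTT loop and the reverse complement of the stem) are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def longest_antidiagonal_run(seq):
    """Return (i + j, run length) for the longest run of complementary pairs on any anti-diagonal."""
    n = len(seq)
    best_total, best_run = None, 0
    for total in range(2 * n - 1):  # each anti-diagonal has a constant i + j
        current = 0
        for i in range(max(0, total - n + 1), min(n, total + 1)):
            j = total - i
            if COMPLEMENT.get(seq[i]) == seq[j]:
                current += 1
                if current > best_run:
                    best_total, best_run = total, current
            else:
                current = 0
    return best_total, best_run


# Hypothetical stem-loop: stem + loop + reverse complement of the stem.
seq = "GATTACA" + "TTTT" + "TGTAATC"
print(longest_antidiagonal_run(seq))
```

For this sequence the longest run has length 7 and lies on the anti‐diagonal joining the two far corners of the matrix (i + j = 17), with a gap in the middle where the loop residues fail to pair.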

8.6 QUESTIONS

  1. Briefly describe what a dot plot will look like under the following conditions:
    1. Plotting a single sequence with itself.
    2. Plotting a sequence against its reverse sequence.
    3. Plotting a sequence with internal repeats with itself.
    4. Plotting a sequence against the same, but after inserting a short sequence somewhere within.
  2. Prepare a dot plot with the following sequences, with window sizes 5, 7 and 10:

    >Seq1

    M M N R V Q P E N V H S T I F T P R E Y Q V E L V D A C L K G N T L S V L A S R S T R T F L I T M V T R E M A H L V D A C L K G N T L S V L A S R S T R T L T R S K E Q G G K G Q L V D A C L K G N T L S V L A S R S T R T R T L L T G W S G P G L V R A G E A I Q Q N T N L A V T T Y T R L E Q V D G W L P S R W S H T F T E A Q V I I M T V D V L E K G L E T G L L Q L D M L N L L V I T D A H R V A T M M N R V Q P E N V H S T I F T P R E Y Q V E L V D A C L K G N T L S V L A S R S T R T F L I T M V T R E M A H L V D A C L K G N T L S V L A S R S T R T L T R S K E Q G G K G Q L V D A C L K G N T L S V L A S R S T R T R T L L T G W S G P G L V R A G E A I Q Q N T N L A V T T Y T R L E Q V D G W L P S R W S H T F T E A Q V I I M T V D V L E K G L E T G L L Q L D M L N L L V I T D A H R V A T

    >Seq2

    T A V R H A D T V N M D G T G K V D V T M V A T T H S W R S W G D V R T Y T T V A N T N A G A R V G G S W G T T R T R T S R S A V S T N G K C A D V G K G G K S R T T R T S R S A V S T N G K C A D V H A M R T V M T T R T S R S A V S T N G K C A D V V Y R T T S H V N V R N M M T A V R H A D T V N M D G T G K V D V T M V A T T H S W R S W G D V R T Y T T V A N T N A G A R V G G S W G T T R T R T S R S A V S T N G K C A D V G K G G K S R T T R T S R S A V S T N G K C A D V H A M R T V M T T R T S R S A V S T N G K C A D V V Y R T T S H V N V R N M M

  3. Suppose a sequence has tandem repeats (like VNTRs). Explain how the dot plot will look when the sequence is plotted against itself.
  4. The consensus palindrome sequence TGTGAGCGCTCACA is given (from Proceedings of the National Academy of Sciences of the USA 81, 1624–1628). Describe the dot plot if you impose no restriction to window‐size and word match.
  5. Under what circumstances should we use dot plot before multiple or pairwise sequence alignment?