CHAPTER 14
BLASTx

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

14.1 INTRODUCTION

BLASTx is one of the three translated BLAST algorithms: BLASTx, tBLASTn and tBLASTx. In BLASTx, a nucleotide query sequence is first translated in all six reading frames, and each translated amino acid sequence is then compared with the protein sequences in a protein database. The comparison therefore occurs at the amino acid level, and the result is an amino acid alignment (the translated query versus a homologous sequence in the protein database), even though the query is a nucleotide sequence. BLASTx runs comparatively slowly, because all six reading frames must be matched against the protein database. The result ultimately identifies the reading frame that matches a homologous sequence.

BLASTx is a powerful gene‐finding or gene‐predicting tool. It is recommended for identifying the protein‐coding genes in genomic DNA/cDNA. It is also used to detect whether a novel nucleotide sequence is a protein‐coding gene or not, and it can be used to identify proteins encoded by transcripts or transcript variants.
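The six‐frame translation step described above can be illustrated with a minimal Python sketch. It is not the NCBI implementation: it uses only the standard genetic code, marks any codon containing an ambiguous base as "X", and labels the reverse frames -1 to -3 as offsets into the reverse complement of the query, which is an assumption about frame numbering made for this illustration. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


# A nine-base toy query: ATG GCC TAA reads as Met-Ala-stop in frame +1.
for frame, protein in sorted(six_frame_translations("ATGGCCTAA").items()):
    print(frame, protein)
```

Each of the six translated sequences would then be searched against the protein database; the stop codons ("*") that interrupt most frames are one reason why only the true reading frame usually yields a long, significant alignment.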

14.2 OBJECTIVE

To determine the open reading frame and the name of the gene from the given coding sequence (cds).

14.3 PROCEDURE

The basic steps are the same as for BLASTn. However, parameters such as “Genetic Code”, “Organism”, and “Database” may need to be modified. Open the NCBI home page with the URL http://www.ncbi.nlm.nih.gov/ and click “BLASTx”. The page can also be opened directly by entering http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome in the browser's address bar. The BLASTx main web page is now open (Figure 14.1).


FIGURE 14.1 Homepage of BLASTx at NCBI. Query sequences can be entered into the box (angled arrow), either as accession number(s) or as sequence(s) in FASTA format.

14.3.1 Enter query sequences

Enter accession number(s) or FASTA sequence(s): Paste one or more nucleotide query sequences in FASTA format, or the respective NCBI accession number(s), into the sequence box. Alternatively, a text file containing the query sequences (in FASTA format) can be uploaded by clicking the “Choose File” button.

  • Provide Query Sub‐range (optional): This specifies a particular range of the input sequence which is to be searched against the database. It is especially useful when the GenBank accession number is used instead of the sequence itself.
  • Genetic Code: The default is “Standard”, which can be used for eukaryotic genomic DNA‐derived sequences. Other options, for prokaryotic DNA or mold or yeast or vertebrate or invertebrate mitochondrial DNA, are also available.
  • Give a Job Title: To identify the BLAST results from saved searches.
  • Checking “Align two or more sequences”: This checkbox, if checked, will refresh the page to provide the user with another sequence box, where the subject sequence(s) is/are to be pasted. The application is the same as that discussed in the previous BLASTn or BLASTp chapters.
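The following sketch shows one way to prepare a query before pasting it into the sequence box: it reads FASTA text, removes stray whitespace inside the sequence (useful when a sequence has been copied with spaces between the bases), and extracts a sub‐range. The helper names are hypothetical, and the sub‐range is assumed to be 1‐based and inclusive at both ends, as sequence coordinates usually are.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Drop internal whitespace and normalise to uppercase.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def query_subrange(seq, start, stop):
    """Return bases start..stop (1-based, inclusive) of seq."""
    if not 1 <= start <= stop <= len(seq):
        raise ValueError("sub-range must satisfy 1 <= start <= stop <= length")
    return seq[start - 1:stop]


# Hypothetical record; the first line of bases carries extraction spaces.
example = ">demo_query\nC C G A A G\naagaaa\n"
header, sequence = parse_fasta(example)[0]
print(header, sequence)                 # demo_query CCGAAGAAGAAA
print(query_subrange(sequence, 4, 9))   # AAGAAG
```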

14.3.2 Choose search set

  • Database: Choose one of the protein databases against which the search is to be made. The list of databases is the same as that for BLASTp.
  • Organism (Optional): Specify the organism by common name, binomial name or taxonomy ID. You can also check the small checkbox adjacent to the entry box to exclude one or more organisms (click on the “+” sign to add more) from the search results.
  • Exclude Models (XM/XP) and/or Uncultured/environmental sample sequences (optional): Check one or both of the check boxes to exclude the corresponding sequences. Models (XM/XP) stands for “model reference sequences”, which are determined and annotated by the Genome Annotation Project of NCBI and, thus, could be incomplete.
  • Entrez Query (Optional): Same as for BLASTn. This is used to restrict the search to the results of a specified Entrez query. The Boolean operators AND, OR and NOT can be used to define the subset of the database to be searched.
  • BLAST: Click on the button to initiate the BLASTx search. Click the checkbox to open the search result in a new window (Figure 14.2).

FIGURE 14.2 The results page of BLASTx contains a color key‐based alignment display, followed by a tabular description of sequence alignments and, finally, alignment of each of the sequence pairs (a query versus database sequence, called a subject sequence).
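The search‐set choices described above (database, organism restriction and Entrez query) can also be written down as request parameters. The sketch below only builds a URL string and does not contact NCBI. The parameter names CMD, PROGRAM, DATABASE, QUERY and ENTREZ_QUERY follow the NCBI BLAST URL API as understood here and should be checked against the current NCBI documentation before use; the helper names and the example organisms are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def entrez_filter(organism, excluded=None):
    """Build an Entrez query limiting hits to one organism or taxon.

    The optional `excluded` organism is removed with the Boolean NOT.
    """
    query = f'"{organism}"[Organism]'
    if excluded:
        query += f' NOT "{excluded}"[Organism]'
    return query


def blastx_submission_url(query, database="nr", entrez_query=None):
    """Return a BLASTx submission URL (assumed parameter names, see text)."""
    params = {
        "CMD": "Put",
        "PROGRAM": "blastx",
        "DATABASE": database,
        "QUERY": query,
    }
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query
    return f"{BLAST_URL}?{urlencode(params)}"


# Restrict the search to Bovidae but leave out Bos taurus entries.
restriction = entrez_filter("Bovidae", excluded="Bos taurus")
url = blastx_submission_url("JQ911700.1", entrez_query=restriction)
print(restriction)
print(parse_qs(urlparse(url).query)["ENTREZ_QUERY"][0] == restriction)
```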

14.3.3 Program selection

  1. Algorithm parameters: These are very much the same as those for BLASTp, except for the parameter “Short Queries”, which has been dropped in BLASTx.
  2. Optional parameters: These are of the subtypes shown in Table 12.2 (Chapter 12 of this book).

14.4 INTERPRETATION OF BLASTx RESULTS

  1. The output of BLASTx is similar to that of BLASTp.
  2. The color key‐based alignment depiction and the table indicating the BLASTx output for various homologous sequences are the same as for BLASTp.
  3. The individual pairwise alignments are also the same as for BLASTp. However, the reading frame of the match, out of the six possible reading frames, is indicated by “Frame”.
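To see how a “Frame” value relates to positions in the nucleotide query, the sketch below converts an amino acid position in a translated frame into the nucleotide positions of its codon. It assumes that frames +1 to +3 start at bases 1 to 3 of the query and that frames -1 to -3 start at the last, second‐last and third‐last base and read along the reverse complement. For reverse frames the returned start is greater than the end, so the coordinates run in the direction of translation. The function name is hypothetical.

```python
def codon_span(frame, aa_index, query_length):
    """Return (first_base, last_base) of a codon, 1-based on the query.

    frame        one of +1, +2, +3, -1, -2, -3
    aa_index     1-based residue position within the translated frame
    query_length length of the nucleotide query in bases
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +1..+3 or -1..-3")
    if aa_index < 1:
        raise ValueError("aa_index is 1-based")

    # Position of the codon along the strand that is actually translated.
    first = (abs(frame) - 1) + 3 * (aa_index - 1) + 1
    last = first + 2
    if last > query_length:
        raise ValueError("codon extends past the end of the query")

    if frame > 0:
        return first, last
    # Reverse frames: mirror the coordinates back onto the original strand.
    return query_length - first + 1, query_length - last + 1


# For a 9-base query, the first residue of frame -1 is encoded by bases 9..7.
print(codon_span(1, 1, 9))    # (1, 3)
print(codon_span(-1, 1, 9))   # (9, 7)
```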

14.5 QUESTIONS

  1. Explain how BLAST can be used as a gene prediction tool.
  2. Suppose the following partial cDNA amplicon (JQ911700.1 of NCBI GenBank) has been custom sequenced in yak (Bos grunniens):

    C C G A A G A A G A A A A T G G C C A T A A C C A G G T C C C A A A T A T T A G G A C T T T T C A T C A C T G T C C T G A T C G G C C T A C A G G A A T C G T G G G C T A T T A A A G A G A A T C A T G T G A T C A T C C A A G C T G A G T T C T A T C T G A A A C C T G A G G A A T C A G C C G A G T T T A T G T T T G A C T T T G A T G G T G A T G A G A T T T T C C A C G T G G A T A T G G G G A A G A A G G A G A C G G T G T G G C G G C T T C C A G A A T T T G G A C A T T T T G C C A G C T T T G A G G C T C A G G G T G C C C T G G C C A A T A T G G C T G T G A T G A A A G C C A A C C T G G A C A T C A T G A T A A A G C G C T C C A A C A A C A C C C C A A A C A C C A A T G T T C C T C C A G A A G T G A C T C T G C T C C C A A A C A A G C C T G T G G A A C T G G G A G A G C C C A A C A C A C T C A T C T G C T T C A T T G A C A A G T T C T C C C C A C C C G T G A T C A G T G T C A C A T G G C T T C G A A A T G G C A A A C C T G T C A C T G A T G G A G T G T C A C A G A C G G T C T T C A T G C C C A G G A A T G A C C A C C T T T T C C G C A A G T T C C A C T A C C T C C C C T T C C T G C C C A C A A C A G A G G A T G T C T A T G A C T G C A A G G T G G A G C A C T T G G G T T T G A A T G A G C C T C T T C T C A A G C A C T G G G A G T A T G A A G C T C C A G C C C C C C T C C C A G A G A C C A C A G A G A A T G C A G T G T G T G C C C T G G G C C T G A T T G T G G C T C T G G T G G G C A T C A T T G C A G G G A C C A T C T T C A T C A T C A A G G G C G T G C G C A A A G C C A A C A C C G T T G A A C G C C G A G G G C C T C T G T G A G G C G C C T G C A G G T A A T G G A C T T T G T T A C A G A G A A G A T C A A T G A A G A T A T T T C T G C C T T A A T A G C T T T A C A A A C C T G G C A A T T C T C C A A T T G T T C A C C T C A C T G A A G A C C A C C A T G C T T C A G C A C T T C C C A G T C C T T T A C T T A C C C T A A G A G T A A G A T G C C T T C C A C A A T C T C C

    Determine whether it belongs to some protein‐coding gene, along with the reading frame.

  3. Identify the nucleotide sequence (AY095312.1) and comment on whether it is a part of a coding sequence.
  4. What is the principle and what are the applications of BLASTx?
  5. Examine and interpret the following output:
    Result page of BLASTx displaying the alignment of each of the sequence pairs (a query versus database sequence, called subject sequence).