CHAPTER 33
Prediction of Transcription Binding Sites

S Jain1, S Panwar2 and A Kumar3

1 Department of Applied Sciences & Humanities, Jai Parkash Mukand Lal Innovative Engineering and Technology Institute, Yamuna Nagar, Haryana, India

2 Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Uttar Pradesh, India

3 Department of Nutrition Biology, Central University of Haryana, Haryana, India

33.1 INTRODUCTION

Transcription factors are crucial for the sequence‐specific control of transcription. Classically, the computational prediction of transcription factor binding sites (TFBS) relies on position weight matrices (PWMs) (Wingender et al., 2001), which assign a weight to each nucleotide at each position of the motif. These models assume that each nucleotide contributes independently to the DNA–protein interaction, and they do not account for variable‐length motifs.
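To make the PWM idea concrete, the following Python sketch converts a count matrix into log2‐odds weights and scores every window of a sequence by adding one weight per position. The additive score is exactly the independence assumption described above. The count matrix, the uniform background, the pseudocount of 0.5 and the function names are hypothetical choices for illustration, not TRANSFAC data or a TRANSFAC algorithm.

```python
import math

# Hypothetical counts for a 4-bp motif (one dict per position); not a TRANSFAC matrix.
COUNTS = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 8, "C": 1, "G": 0, "T": 1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background


def log_odds_pwm(counts, background, pseudocount=0.5):
    """Weight of base b at position i: log2(p(b at i) / p(b in background))."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background[b])
                    for b in "ACGT"})
    return pwm


def score(pwm, site):
    """Add one weight per position; this additivity is the independence assumption."""
    return sum(weights[base] for weights, base in zip(pwm, site.upper()))


def scan(pwm, sequence):
    """Score every window of motif length; returns (start, window, score) tuples."""
    width = len(pwm)
    return [(i, sequence[i:i + width], score(pwm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]


pwm = log_odds_pwm(COUNTS, BACKGROUND)
best = max(scan(pwm, "GCTATAAAGC"), key=lambda hit: hit[2])
print(best[:2])  # (2, 'TATA')
```

Because every position is scored on its own, such a matrix cannot express dependencies between neighbouring bases, which is the limitation noted in the paragraph above.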

33.2 OBJECTIVE

To predict transcription factor binding sites using the TRANSFAC database and the MATCH tool.

33.3 TRANSFAC

TRANSFAC is a database of TRANScription regulatory FACtors maintained at GBF Braunschweig (Wingender et al., 2000). It combines data on transcription factors, their DNA binding sites, the sources of the factors and a systematic classification of transcription factors. All experimental results are accessible mainly through the FACTORS and SITES tables (Frech et al., 1997).

The FACTORS table stores data on the binding proteins, and the SITES table stores the DNA sequences they recognize. Furthermore, because many transcription factors can be classified according to their DNA‐binding domains and/or their dimerization domains, a CLASS table has been introduced into TRANSFAC. Tiny TRP, a browsing tool for TRANSFAC, is the only solution that requires the linked databases in their original format. The links between TRANSFAC and other databases, such as PIR, EMBL and PROSITE, are crucial for the use of TRANSFAC.

33.3.1 Procedure

The procedure below follows the “Practical” online tutorial by Enrique Blanco (http://genome.crg.es/courses/Bioinformatics2003_promoters/).

33.3.1.1 Accessing the TRANSFAC database

  1. Go to the TRANSFAC database and choose the search in TRANSFAC 6.0 (Figure 33.1). The URL is http://www.biobase-international.com/product/transcription-factor-binding-sites.
  2. Select the Factor table (Figure 33.1).
  3. Type the factor name TBP (TATA binding protein).
  4. Select Factor Name (FA) as the search field and submit the query.
  5. Choose entry T00794 to view the description of the human factor.
  6. On the left, the fields “BS” (binding sites) and “MX” (matrices) are listed. Choose one of the sites to examine it.

FIGURE 33.1 (a) TRANSFAC database search; (b) FACTOR table search; (c) TRANSFAC Factor entries; (d) output of TRANSFAC Factor table.

33.3.1.2 Building a model from a set of actual sites

  1. Collect the actual TBP sites from TRANSFAC.
  2. Go to the CLUSTALW web server at EBI.
  3. Paste in the collection of 23 TBP sites.
  4. Set the following options:
    • ALIGNMENT = fast
    • COLOR ALIGNMENT = yes
    • OUTPUT FORMAT = aln wo/numbers
  5. Click on “Run”.
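Once the sites are aligned, a model can be built by counting the nucleotides in each alignment column. The sketch below assumes the sites have already been copied out of the CLUSTALW output as equal‐length strings (gaps shown as '-'); the four sites and the function name count_matrix are hypothetical, not the 23 TRANSFAC TBP sites.

```python
from collections import Counter

# Hypothetical aligned sites (equal length, '-' = gap); not the 23 TRANSFAC TBP sites.
ALIGNED_SITES = [
    "TATAAAAG",
    "TATAAATG",
    "TATATAAG",
    "CATAAAA-",
]


def count_matrix(aligned):
    """Count A, C, G and T in each alignment column; gaps and other symbols are ignored."""
    width = len(aligned[0])
    if any(len(site) != width for site in aligned):
        raise ValueError("aligned sites must all have the same length")
    matrix = []
    for i in range(width):
        column = Counter(site[i].upper() for site in aligned)
        matrix.append({base: column[base] for base in "ACGT"})
    return matrix


for position, column in enumerate(count_matrix(ALIGNED_SITES), start=1):
    print(position, column)
```

Dividing each column by its total turns these counts into the frequencies that both the sequence logo and a weight matrix are built from.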

33.3.1.3 Open the WebLogo server

  1. Paste the CLUSTALW alignment into the input box and select “DNA/RNA” in the “Sequence type” box.
  2. Submit the query.
  3. The resulting representation of the TBP sites is shown in Figure 33.2.
Top: WebLogo interface with arrows labeled Go to WEBLOGO, Put CLUSTAL alignment, Activate DNA/RNA, and Create Logo. Bottom: representation of the TBP sites, with an arrow pointing to the binding site (TATAAAA).

FIGURE 33.2 Creating sequence logos using the web interface.
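The height of each stack in a sequence logo reflects the information content of that column. As a rough sketch, assuming a uniform background and omitting the small‐sample correction that WebLogo can apply, the information of a DNA column is 2 bits minus its Shannon entropy, and each letter's height is its frequency times that value. The function names are hypothetical.

```python
import math


def column_information(counts):
    """Bits of information in one DNA column: log2(4) minus Shannon entropy (no correction)."""
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n:
            p = n / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def letter_heights(counts):
    """Height of each letter in the logo column: its frequency times the column information."""
    total = sum(counts.values())
    info = column_information(counts)
    return {base: n / total * info for base, n in counts.items()}


print(column_information({"A": 0, "C": 0, "G": 0, "T": 4}))  # fully conserved: 2.0
print(letter_heights({"A": 3, "C": 0, "G": 0, "T": 1}))
```

A fully conserved column reaches the 2‐bit maximum, while a column with all four bases equally frequent carries no information and is drawn with zero height, which is why the conserved TATA core stands out in the logo.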

33.3.1.4 Obtaining the TRANSFAC position weight matrices

  1. Go to the TRANSFAC database and choose the search in TRANSFAC 6.0.
  2. Select the Matrix table (Figure 33.3).
  3. Enter the factor name TATA.
  4. Set Factor Name (NA) as the search field and submit the query. Two entries are returned: M00252 and M00216.
  5. Repeat the procedure for SP1 and C/EBP, and keep open the windows containing the matrices (M00252 and those for SP1 and C/EBP).
  6. Compare the core of the matrix with the sequence logo obtained earlier.
  7. Compare both with the TATA box binding site in the ABS entry.

FIGURE 33.3 (a) Searching the TRANSFAC Matrix table; (b) TRANSFAC Matrix entries; (c) output of the TRANSFAC Matrix table.
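For the comparison in steps 6 and 7, it helps to reduce a matrix to a consensus. The sketch below reads position rows in the layout TRANSFAC matrix entries are assumed to use here (a P0 header, then one row per position with A, C, G and T counts and a consensus letter) and takes the majority base per position, writing N where no base reaches a chosen fraction of the counts. The matrix text is invented, not M00252, and the 0.5 threshold is an arbitrary assumption.

```python
# Hypothetical matrix block in an assumed TRANSFAC-like layout; counts are invented.
MATRIX_TEXT = """\
P0      A      C      G      T
01      1      2      1     16      T
02     17      1      1      1      A
03      0      0      1     19      T
04     18      0      1      1      A
05     10      1      1      8      W
06     15      1      2      2      A
XX
"""


def parse_matrix(text):
    """Return one dict of A/C/G/T counts per position row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Position rows start with a numeric index such as "01"; "P0" and "XX" lines are skipped.
        if len(fields) >= 5 and fields[0].isdigit():
            a, c, g, t = (float(x) for x in fields[1:5])
            rows.append({"A": a, "C": c, "G": g, "T": t})
    return rows


def consensus(rows, min_fraction=0.5):
    """Majority base per position; 'N' when no base reaches min_fraction of the counts."""
    letters = []
    for row in rows:
        total = sum(row.values())
        base = max("ACGT", key=lambda b: row[b])
        letters.append(base if row[base] / total >= min_fraction else "N")
    return "".join(letters)


print(consensus(parse_matrix(MATRIX_TEXT)))  # TATAAA
```

With these invented counts the consensus is TATAAA, which can then be set beside the TATAAAA region marked in the logo; raising min_fraction to 0.6 turns the weakly conserved fifth position into N.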

33.3.2 Key features/benefits

  1. Quick access to detailed reports on 41 000+ transcription factor binding sites, 18 000+ miRNA target sites and 1100+ miRNA reports (Ying et al., 2013), 22 000+ transcription factors, 13 837 000+ ChIP fragments and 273 000+ promoters.
  2. Insight into the molecular mechanisms by which transcription factors orchestrate gene expression in vivo.

33.3.3 Access options

An online subscription provides access to the TRANSFAC web interface, whereas a download subscription provides access to flat files containing data on factors, matrices, binding sites, genes, ChIP fragments and other supporting information, as well as command‐line access to the MATCH tool.
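The downloadable flat files are plain text. Assuming an EMBL‐like layout (a two‐letter line code such as AC, FA, BS or MX followed by its value, XX spacer lines, and // closing each entry), a few lines of Python are enough to split them into entries. The record below and its accession numbers are placeholders, not real TRANSFAC content, and the function name parse_entries is hypothetical; check the layout against the actual files before relying on this.

```python
from collections import defaultdict

# Placeholder text in the assumed layout; accessions and values are not real TRANSFAC data.
FLAT_TEXT = """\
AC  T_EXAMPLE1
XX
FA  TBP
XX
BS  R_EXAMPLE1; example site 1.
BS  R_EXAMPLE2; example site 2.
XX
MX  M_EXAMPLE1.
//
AC  T_EXAMPLE2
XX
FA  hypothetical factor
//
"""


def parse_entries(text):
    """Split a flat file into entries; each entry maps a line code to its list of values."""
    entries, current = [], defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # end of the current entry
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        code, value = line[:2], line[2:].strip()
        if code == "XX" or not code.strip():  # skip spacer and blank lines
            continue
        current[code].append(value)
    if current:  # tolerate a missing final "//"
        entries.append(dict(current))
    return entries


for entry in parse_entries(FLAT_TEXT):
    print(entry["AC"], entry.get("BS", []), entry.get("MX", []))
```

Keeping every value in a list matters because codes such as BS and MX can repeat within one entry.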

33.4 BINDING SITES SEARCHING USING THE MATCH TOOL

The MATCH tool searches for transcription factor binding sites in any DNA sequence using the library of mononucleotide weight matrices from TRANSFAC.
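MATCH reports two scores for every hit: a core similarity and a matrix similarity. The sketch below follows the published description of these scores as we understand it: each position i gets an information weight I(i) = Σ_B f(i,B) ln(4 f(i,B)), and the weighted score of a site is rescaled between the worst and best possible sites, so a perfect match scores 1. The core width of 5, forward‐strand‐only scoring, the count matrix and all function names are assumptions; this is not MATCH's own code.

```python
import math

# Hypothetical count matrix (one dict of A/C/G/T counts per position); not a TRANSFAC matrix.
COUNTS = [
    {"A": 1, "C": 2, "G": 1, "T": 16},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 0, "C": 0, "G": 1, "T": 19},
    {"A": 18, "C": 0, "G": 1, "T": 1},
    {"A": 10, "C": 1, "G": 1, "T": 8},
    {"A": 15, "C": 1, "G": 2, "T": 2},
]
CORE_LENGTH = 5  # assumed core width


def frequencies(counts):
    """Turn counts into per-position nucleotide frequencies."""
    return [{b: row[b] / sum(row.values()) for b in "ACGT"} for row in counts]


def information_vector(freqs):
    """I(i) = sum over B of f(i,B) * ln(4 * f(i,B)); terms with f = 0 contribute 0."""
    return [sum(f * math.log(4 * f) for f in col.values() if f > 0) for col in freqs]


def similarity(freqs, info, site, positions):
    """Rescale the information-weighted score of `site` between the worst and best sites."""
    current = sum(info[i] * freqs[i][site[i]] for i in positions)
    worst = sum(info[i] * min(freqs[i].values()) for i in positions)
    best = sum(info[i] * max(freqs[i].values()) for i in positions)
    if best == worst:
        raise ValueError("matrix carries no information at these positions")
    return (current - worst) / (best - worst)


def core_positions(info, width=CORE_LENGTH):
    """The `width` consecutive positions with the highest total information."""
    start = max(range(len(info) - width + 1), key=lambda s: sum(info[s:s + width]))
    return range(start, start + width)


def match_scores(counts, site):
    """Return (core similarity, matrix similarity) for an uppercase site as long as the matrix."""
    freqs = frequencies(counts)
    info = information_vector(freqs)
    core = core_positions(info)
    return similarity(freqs, info, site, core), similarity(freqs, info, site, range(len(counts)))


print(match_scores(COUNTS, "TATAAA"))  # (1.0, 1.0)
print(match_scores(COUNTS, "GCGCGC"))  # both scores close to 0
```

Weighting by I(i) means that a mismatch at a highly conserved position costs more than one at a poorly conserved position, which is the intent of both scores.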

33.4.1 Procedure

33.4.1.1 Enter a name for search

Open the MATCH server to analyze promoter regions with TRANSFAC matrices: http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi.

Enter a name for the search; MATCH stores the result under that name. If no name is entered, a default name is used.

33.4.1.2 Select a sequence

There are three options for selecting a sequence for a search:

  1. Select one of the sequences entered for a previous search.
  2. Select an example sequence, such as the 5′ flanking region of the rat tyrosine aminotransferase (TAT) gene (EMBL: M34257).
  3. Enter a new sequence. Give it a name so that it is stored and can be used again in later searches, then insert the sequence. The accepted formats are FASTA, TRANSFAC, EMBL, GenBank, IG, and RAW.
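Before scanning, pasted input has to be reduced to a plain nucleotide string. The hypothetical helper below handles only FASTA (first record) and RAW text; the other formats MATCH accepts (TRANSFAC, EMBL, GenBank and IG) are not covered by this sketch, and the default name "unnamed" is an arbitrary choice.

```python
def read_sequence(text):
    """Return (name, sequence) from FASTA text (first record only) or a raw sequence."""
    lines = [line.strip() for line in text.strip().splitlines()]
    name = "unnamed"  # hypothetical default name
    if lines and lines[0].startswith(">"):
        header = lines[0][1:].split()
        if header:
            name = header[0]
        body = []
        for line in lines[1:]:
            if line.startswith(">"):  # the next FASTA record starts here; stop
                break
            body.append(line)
    else:
        body = lines
    # Keep letters only, so spaces and position numbers in raw text are dropped.
    sequence = "".join(ch for ch in "".join(body).upper() if ch.isalpha())
    return name, sequence


print(read_sequence(">tat_promoter rat\nggatcc\nTATAAA\n"))
```

Stripping digits and spaces lets the same helper accept numbered raw listings, while stopping at the second header keeps a multi-record FASTA file from being merged into one sequence.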

Next, select a group of matrices or a profile with which to run MATCH; matrix groups are available for vertebrates, insects, plants, fungi, bacteria, and nematodes. The term “profile” refers to a set of weight matrices obtained from the TRANSFAC library.

  1. Matrix selection: select the desired group of matrices from the library; several groups can be combined in one search. The search can be restricted to high‐quality matrices only, or extended to include user‐defined matrices. For any group of matrices, cut‐offs for the core similarity and matrix similarity scores can be specified.
  2. Profile selection: One can either use a predefined profile or create one on the MATCH profiler page. To create a profile with the TRANSFAC search engine, use the following steps:
    1. Select the TRANSFAC query form “MATRIX SEARCH”.
    2. Mark the boxes of the entries to be included in the MATCH search.
    3. A box labeled “Run Match with marked entries” is displayed; mark this box as well.
    4. Click on “Show marked entries/Start MATCH”. The selected entries then appear in the user‐defined profile.
  3. Predefined profiles: MATCH provides various predefined profiles, such as immune cell‐specific, cell cycle‐specific, muscle‐specific, and liver‐specific profiles.
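A profile can be thought of as a list of matrices, each with its own core and matrix similarity cut‐offs. The sketch below models that idea with two small dataclasses and keeps only the hits that pass both cut‐offs of a matrix in the profile. The class names, matrix identifiers and cut‐off values are hypothetical and do not reproduce MATCH's own profile format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    """One matrix in a profile with its cut-offs (hypothetical structure)."""
    matrix_id: str
    core_cutoff: float
    matrix_cutoff: float


@dataclass(frozen=True)
class Hit:
    """One candidate site reported for a matrix."""
    matrix_id: str
    position: int
    core_score: float
    matrix_score: float


def apply_profile(profile, hits):
    """Keep hits whose matrix is in the profile and whose scores reach both cut-offs."""
    cutoffs = {entry.matrix_id: entry for entry in profile}
    kept = []
    for hit in hits:
        entry = cutoffs.get(hit.matrix_id)
        if (entry is not None
                and hit.core_score >= entry.core_cutoff
                and hit.matrix_score >= entry.matrix_cutoff):
            kept.append(hit)
    return kept


profile = [ProfileEntry("M_A", 1.0, 0.85), ProfileEntry("M_B", 0.9, 0.80)]
hits = [Hit("M_A", 10, 1.0, 0.90), Hit("M_A", 40, 1.0, 0.80),
        Hit("M_B", 55, 0.95, 0.82), Hit("M_C", 70, 1.0, 0.99)]
print([hit.position for hit in apply_profile(profile, hits)])  # [10, 55]
```

Hits for matrices outside the profile are dropped even when they score well, which is how a tissue‐specific profile narrows the search.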

33.4.1.3 Submit the form

The results page tabulates all matches found in the input sequence; the output is limited to 500 000 matches per sequence. The results are shown in Figure 33.4, with the following columns:

  1. Matrix identifier, linked to the corresponding TRANSFAC entry.
  2. Score for core similarity (core match).
  3. Score for matrix similarity (matrix match).
  4. Matching sequence.
  5. Name of the factor whose binding site the matched matrix represents.
Top left: MATCH user interface with arrows labeled Go to MATCH, Select a sequence, and Submit the form. Top right: results page of MATCH output. Bottom: representation of the locations of the found matches.

FIGURE 33.4 (a) MATCH user interface; (b) results page of MATCH output; (c) a simple visual representation of locations of the found matches.

The last three lines of the results page give the total length of all searched sequences, the total number of sites found, and the frequency of sites per nucleotide.
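The frequency in the last line is simply the number of sites divided by the total searched length. A minimal sketch, with a hypothetical function name and made‐up numbers: two sequences of 1000 and 500 bp with 30 sites give 30 / 1500 = 0.02 sites per nucleotide.

```python
def summarize(sequence_lengths, site_count):
    """Return (total length searched, number of sites, sites per nucleotide)."""
    total_length = sum(sequence_lengths)
    frequency = site_count / total_length if total_length else 0.0
    return total_length, site_count, frequency


print(summarize([1000, 500], 30))  # (1500, 30, 0.02)
```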

33.5 QUESTIONS

  1. What experimental methods are used to predict transcription factor binding sites? Explain in detail.

    Hint: TRANSFAC and MATCH are used to predict TFBS. Consult sections 33.3 and 33.4 for the detailed procedures.

  2. Have any transcription factors been experimentally demonstrated to regulate the human, mouse or rat RNF43 genes? Justify with the proper experimental set‐up.

    Hint: Refer to TRANSFAC and follow the instructions given in section 33.3.1.

  3. How many matrices are available for the human, mouse and rat FOXA family members? Create a profile for these matrices.

    Hint: See section 33.3.1.4.

  4. How will you predict the transcription factor target sites in the promoter region of IFNB1 using bioinformatics tools?

    Hint: Follow the procedure explained in section 33.4.1.

  5. Define the experimental set‐up of transcription factor target site prediction for the promoter region of the Trp63 gene.

    Hint: Consult section 33.4.1.
