CHAPTER 43
Functional Annotation of Common Differentially Expressed Genes

GVPPSR Kumar, AP Sahoo and A Kumar

Animal Biotechnology Division, IVRI, UP, India

43.1 INTRODUCTION

Cuffdiff predicts differentially expressed genes (DEGs) and reports gene symbols in its output, whereas EBSeq, DESeq2, and edgeR report DEGs as Ensembl IDs. These Ensembl IDs are first converted into gene symbols using g:Convert in g:Profiler. After conversion, it is advisable to identify the genes that are differentially expressed according to all the packages and to carry only that common set forward in the analysis. The commonly predicted genes are identified using the Venny tool.

Flow diagram of Cuffdiff prediction of Differentially Expressed Genes (DEG) displaying light to dark shaded boxes giving the output of DEGs in Ensembl IDs.

FIGURE 43.1

Performing the functional annotation through g:Profiler database windows (top) with the excel window (bottom).

FIGURE 43.2

Diagram of 4 overlapping shaded ovals with numerical figures 115, 222, 318, 654, 1116, 875, 507, 1522, 838, 5, 4246, 123, 3, 203, and 26.

FIGURE 43.3

A total of 4246 commonly differentially expressed genes have been identified by all the packages in our analysis.
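The intersection reported by the Venn diagram can also be reproduced in a few lines of code. The following is a minimal sketch, assuming the DEGs from each package have already been converted to gene symbols; the function name `common_degs` and the toy gene symbols are illustrative and are not taken from the analysis above.

```python
def common_degs(deg_lists):
    """Return the gene symbols present in every package's DEG list, sorted."""
    # Strip stray whitespace and drop empty entries before comparing.
    cleaned = [
        {symbol.strip() for symbol in symbols if symbol.strip()}
        for symbols in deg_lists.values()
    ]
    if not cleaned:
        return []
    return sorted(set.intersection(*cleaned))


# Toy input (hypothetical symbols), one list per package.
toy_degs = {
    "Cuffdiff": ["ISG15", "MX1", "OAS1", "IL6"],
    "EBSeq": ["ISG15", "MX1", "TNF"],
    "DESeq2": ["MX1", "ISG15", "OAS1"],
    "edgeR": ["ISG15 ", "MX1", "IL6"],
}

shared = common_degs(toy_degs)
print(len(shared), shared)
```

In practice each list would be read from the corresponding package's output file, and the length of the result corresponds to the central region of the four-way Venn diagram.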

43.2 FUNCTIONAL ANNOTATION

Functional annotation is used to determine the gene ontology terms enriched in the common differentially expressed genes. Gene ontology (GO) (Ashburner et al., 2000) is an in silico framework that unifies the representation of gene and gene product attributes across divergent species. In assigning annotations, gene products are classified in a species‐independent manner under three categories: biological process, cellular component, and molecular function. There are several databases and tools for performing functional annotation: DAVID; AmiGO2; g:Profiler; PROSITE; PRINTS; Pfam; ProDom; SMART; TIGRFAMs; SUPERFAMILY; PIR superfamily; Gene3D; PANTHER; BLAST2GO; and HAMAP. Here we discuss g:Profiler, DAVID, and ClueGO.

43.2.1 Functional annotation using g:Profiler (Reimand et al., 2011)

g:Profiler characterizes and manipulates gene lists resulting from the analysis of high‐throughput genomic data. It provides a simple, user‐friendly web interface for deriving and visualizing enrichments of GO terms, pathways, and transcription factor binding sites, down to the level of individual genes (Reimand et al., 2007).

43.2.1.1 Step 1

Open http://biit.cs.ut.ee/gprofiler/, paste the gene list, select Bos taurus as the organism (the species of interest), and set the output type to Excel spreadsheet (Figures 43.4 and 43.5).

G:Profiler database window displaying bos taurus as the organism and the output type as excel spreadsheet in the search box.

FIGURE 43.4

G:Profiler database window displaying bos taurus as the organism and the output type as excel spreadsheet in the search box.

FIGURE 43.5

43.2.1.2 Step 2

Download the Excel file and check the annotations enriched in the differentially expressed genes (see Figure 43.6).

Excel file window displaying the annotations enriched in the differentially expressed genes.

FIGURE 43.6

The output shows the significance of each term and the number of genes from the query (Q) that are associated with the term (T). The first term, response to abiotic stimulus (term type BP, biological process), has the term ID GO:0009628 and a p‐value of 1.79E‐07. The term has 625 genes associated with it, of which 210 are present in the gene list, out of a total of 4133 genes considered.
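Enrichment statistics of this kind are generally based on the hypergeometric distribution: given a background of N annotated genes, of which T belong to a term, how likely is it that a query of Q genes contains at least the observed number of term genes by chance? The sketch below illustrates that calculation only. The function name, the argument names, and the toy numbers are assumptions, and the result is an uncorrected p‐value, so it should not be expected to reproduce the value in the g:Profiler output, which typically also reflects a multiple‐testing correction.

```python
from math import comb


def enrichment_p_value(overlap, query_size, term_size, background_size):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of term genes found in a random query of
    `query_size` genes drawn from `background_size` genes, of which
    `term_size` are annotated to the term.
    """
    total = comb(background_size, query_size)
    upper = min(query_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 in the term,
# a query of 6 genes, 4 of which belong to the term.
p_toy = enrichment_p_value(overlap=4, query_size=6, term_size=5,
                           background_size=20)
print(f"uncorrected p = {p_toy:.4g}")
```

The background size matters as much as the overlap: the same overlap is far less surprising against a small background than against a whole genome, which is why the choice of background is stated explicitly in the function signature.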

43.2.1.3 Step 3: Representing the functional terms graphically

The most common way of representing the functional terms is to choose the top ten terms in each category (biological process, BP; molecular function, MF; and cellular component, CC) after sorting by p‐value, and to plot the term on the y‐axis against its significance (−log10 P) on the x‐axis, as shown below for the biological processes. The same can be done for all the categories (Figure 43.7).

Excel file windows (top) with a horizontal bar graph (bottom) of biological processes displaying light-dark shaded bars representing cellular metabolic process, metabolic process, etc.

FIGURE 43.7
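The selection and transformation behind such a bar graph can be sketched as follows. This is a minimal illustration, assuming the downloaded table has been reduced to (category, term name, p‐value) rows; the function name `top_terms` and all rows other than the abiotic stimulus term quoted in Step 2 are made up for the example.

```python
from math import log10


def top_terms(rows, category, n=10):
    """Return up to n (term, -log10 p) pairs for one category, most significant first."""
    selected = [
        (name, p) for cat, name, p in rows
        if cat == category and p > 0  # log10 is undefined for p == 0
    ]
    selected.sort(key=lambda item: item[1])  # smallest p-value first
    return [(name, -log10(p)) for name, p in selected[:n]]


toy_rows = [
    ("BP", "response to abiotic stimulus", 1.79e-07),  # from the text
    ("BP", "toy term A", 1e-03),   # hypothetical
    ("BP", "toy term B", 1e-10),   # hypothetical
    ("CC", "toy term C", 1e-20),   # hypothetical, different category
]

bp_top = top_terms(toy_rows, "BP", n=2)
for name, score in bp_top:
    # Crude text bar: one '#' per unit of -log10 P.
    print(f"{name:<32} {'#' * round(score)} {score:.2f}")
```

The resulting pairs can be passed to any plotting tool as a horizontal bar chart, with the term names as labels and the −log10 P values as bar lengths.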

43.2.1.4 Step 4

Interpretation of the data is entirely within the researcher’s purview.

43.2.2 Functional annotation using DAVID (Database for Annotation, Visualization, and Integrated Discovery) (Huang da et al., 2009)

DAVID is an integrated biological knowledge base and analytic tool, aimed at systematically extracting biological meaning from large gene/protein lists.

43.2.2.1 Step 1

Open https://david.ncifcrf.gov. If more than 3000 genes are to be annotated, prepare a multi‐list file in which the genes are split into two columns, list1 and list2, separated by a tab (see Figure 43.8). Upload this file into DAVID, select the official gene symbol as the identifier from the drop‐down menu, select the gene list radio button, and submit.
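A tab‐separated file with the list1 and list2 columns described in this step can be generated programmatically. The sketch below is one possible way to do it, assuming that the column names go in a header row and that each column holds at most 3000 genes; the function name `write_multilist` and the placeholder gene names are hypothetical, and the exact layout DAVID expects should be checked against Figure 43.8.

```python
import csv
import io
from itertools import zip_longest


def write_multilist(genes, handle, chunk_size=3000):
    """Write genes as tab-separated columns list1, list2, ... of at most chunk_size rows."""
    chunks = [genes[i:i + chunk_size] for i in range(0, len(genes), chunk_size)]
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Header row with the column names (assumed layout).
    writer.writerow([f"list{i}" for i in range(1, len(chunks) + 1)])
    # Shorter columns are padded with empty cells.
    for row in zip_longest(*chunks, fillvalue=""):
        writer.writerow(row)


# Toy run: five placeholder genes, three per column.
buffer = io.StringIO()
write_multilist([f"GENE{i}" for i in range(1, 6)], buffer, chunk_size=3)
print(buffer.getvalue())
```

For a real gene list, `handle` would be a file opened with `open("multilist.txt", "w", newline="")`; with the default chunk size, a list of 4246 genes yields two columns of 3000 and 1246 entries.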

43.2.2.2 Step 2

Select an appropriate background (here it is Bos taurus) against which you wish to test your gene list. Create a combined list by clicking “combine” after selecting both the lists, and select the combined list to get the functional annotations (see Figure 43.8).

43.2.2.3 Step 3

Click on the functional annotation tool in the window to get the annotation summary results (see Figure 43.8).

43.2.2.4 Step 4

Click the + button in the window to expand the results. To get the gene ontology terms, click the + button beside the gene ontology entry and then proceed to any particular category: BP, MF, or CC. Clicking on the chart option opens a window with all the specific terms. Here we click on BP to get all the gene ontology terms enriched for biological processes in our differentially expressed genes. The details, including the genes associated with each term, can be downloaded and opened in Excel for further use. The same can be done to visualize pathways enriched in the DEGs (see Figure 43.9).

Four DAVID database windows displaying the homepage, analysis wizard, and annotation summary results with labels step 1, 2, and 3.

FIGURE 43.8

Four DAVID database windows displaying the annotation summary results and functional annotation chart.

FIGURE 43.9

43.2.3 Functional annotation using ClueGO (Bindea et al., 2009)

ClueGO is a Cytoscape plug‐in that helps in functional annotation and interpretation of large lists of genes. It integrates KEGG/BioCarta pathways with GO terms to create a functionally organized GO/pathway term network.

43.2.3.1 Step 1

Open the ClueGO app in Cytoscape and paste the gene list into the window (see Figure 43.10).

ClueGo app in Cytoscape window (left) with the control panel (right).

FIGURE 43.10

43.2.3.2 Step 2

Select a gene ontology or pathway and start. Here we selected immune system processes, as shown in Figure 43.10.

43.2.3.3 Step 3

Represent a network with GO term as node label, percentage associated genes as node color, and P‐Value Corrected with Bonferroni step‐down as node size (these parameters are selected as per the requirements of the researcher) (see Figures 43.11, 43.12 and 43.13).

Image described by surrounding text.

FIGURE 43.11

Image described by surrounding text.

FIGURE 43.12

Network diagram of GO term as node label with an arrow pointing to a magnified portion.

FIGURE 43.13
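The Bonferroni step‐down correction named in Step 3 is commonly identified with Holm's procedure, in which the smallest p‐value is multiplied by the number of tests m, the next smallest by m − 1, and so on, while keeping the adjusted values non‐decreasing. The sketch below illustrates that procedure under the assumption that the ClueGO option corresponds to it; the function name and the toy p‐values are illustrative only.

```python
def holm_step_down(p_values):
    """Return Holm (Bonferroni step-down) adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The multiplier shrinks from m down to 1 as rank increases.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


toy_p = [0.01, 0.04, 0.03]  # hypothetical raw p-values for three terms
toy_adjusted = holm_step_down(toy_p)
print(toy_adjusted)
```

Because the multiplier decreases at each step, this correction is never more conservative than a plain Bonferroni correction applied to the same p‐values.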

43.2.3.4 Step 4

The attributes of the network can be exported in a table format (see Figure 43.14).

Adobe window displaying the attributes of the network being exported in a table format.

FIGURE 43.14

43.3 QUESTIONS

  1. What is gene ontology?
  2. Why is there a need for functional annotation of the gene lists obtained from RNA‐Seq data analysis?
  3. Name five tools for functional annotation of gene lists obtained from RNA‐Seq data analysis.