CHAPTER 3
GENE REGULATORY NETWORKS: REAL DATA SOURCES AND THEIR ANALYSIS

Yuji Zhang

Department of Epidemiology and Public Health, University of Maryland
School of Medicine, Baltimore, MD, USA
and
Division of Biostatistics and Bioinformatics, University of Maryland
Greenebaum Cancer Center, Baltimore, MD, USA

3.1 INTRODUCTION

In all living organisms, deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and protein are three types of biological macromolecules that are indispensable for all biological processes. They are unbranched polymer chains, formed by stringing together monomeric building blocks drawn from a standard repertoire that is the same for all living cells. These molecules interact with each other frequently, and conditionally depend on each other to provide complex biological functions (e.g., the functions of a protein are usually provided by its interactions with other proteins and genes). These molecules and their interactions compose complex networks, called gene regulatory networks (GRNs).

Gene regulatory networks are one of the most important biological networks in the bioinformatics and systems biology field. They play a vital role in almost every biological process, including cell differentiation, metabolism, cell cycle, and signal transduction. By understanding the properties and dynamics of these networks, we can shed light on the mechanisms of diseases that occur when these cellular processes are dysregulated. Analysis and inference of GRNs will also guide biologists in downstream biological experimental designs, as such inferences are more time- and cost-effective than wet lab experiment validations. In general, there are two different types of computational approaches for analysis and inference of GRNs:

Topological analysis of GRN: based on the regulatory interactions (e.g., protein–DNA interactions and protein–protein interactions) collected in public databases (e.g., Human Protein Reference Database [1], IntAct [2], Biomolecular Interaction Network Database [3], and Search Tool for the Retrieval of Interacting Genes/Proteins [4]) and genome-wide high-throughput experiments [5, 6], several network analysis approaches have been proposed to investigate the topological properties of GRNs in different organisms [7, 8, 9].
Inference of gene regulatory relationships: based on gene expression data (e.g., time-series gene expression data), a series of computational approaches have been developed for reconstruction and inference of gene regulatory relationships at genome-wide level [10, 11, 12].

This chapter is organized as follows: we first review the biological data sources available for analysis and inference of GRNs, and then introduce the topological analysis approaches for GRNs. We briefly review different types of computational approaches for GRN inference, as well as our proposed approach for GRN inference by integrating prior biological knowledge. Finally, we conclude the chapter with a discussion of future work in the GRN analysis and inference field.

3.2 BIOLOGICAL DATA SOURCES

In this section, we describe multiple sources of biological data that have been used for GRN analysis and inference. This will help us better understand how integration of these different types of biological data brings us a more complete picture in GRN inference.

3.2.1 Gene Expression Data

Gene expression data can be divided into two levels: the mRNA level and the protein level. In this chapter, we focus on gene expression data at the mRNA level, as measured by cDNA microarrays, high-density oligonucleotide chips, reverse transcriptase polymerase chain reaction (RT-PCR), and RNA-seq.

3.2.1.1 cDNA Microarrays

Originally developed at Stanford University, cDNA microarrays are glass slides on which cDNA has been deposited by high-speed robotic printing [13]. They are ideally suited for expression analysis of up to 50,000 cDNA clones per array from expressed sequence tag sequencing projects (e.g., private effort at Incyte Pharmaceuticals and the public Washington University project). Measurements are carried out as differential hybridizations to minimize errors originating from cDNA spotting variability: mRNA from two different sources (e.g., control and drug treated), labeled with two different fluorescent dyes, is passed over the array at the same time. The fluorescence signal from each mRNA population is evaluated independently, and then used to calculate the treated/control expression ratio.
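The treated/control ratio described above is commonly reported on a log2 scale, so that up- and down-regulation are symmetric around zero. The following is a minimal sketch of that calculation, not the procedure of any particular microarray pipeline; the intensity values, the gene names, and the pseudocount of 1.0 (added to avoid division by zero) are illustrative assumptions.

```python
import math

def log2_ratio(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount is an assumption of this sketch; it guards against
    zero intensities and is not part of the original description.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Hypothetical background-corrected intensities per spot: (treated, control).
spots = {
    "geneA": (1599.0, 399.0),  # induced by the treatment
    "geneB": (249.0, 999.0),   # repressed by the treatment
    "geneC": (511.0, 511.0),   # unchanged
}

ratios = {gene: log2_ratio(t, c) for gene, (t, c) in spots.items()}

for gene, value in ratios.items():
    print(f"{gene}: log2(treated/control) = {value:+.2f}")
```

A positive value indicates higher expression in the treated sample, a negative value lower expression, and zero no change.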

3.2.1.2 High-Density Oligonucleotide Chips

These chips, produced by Affymetrix [14], consist of small glass plates with thousands of short 20-mer oligonucleotide probes attached to their surface. The oligonucleotides are synthesized directly onto the surface using a combination of semiconductor-based photolithography and light-directed chemical synthesis. Due to the combinatorial nature of the process, very large numbers of mRNAs can be probed at the same time. However, manufacturing and reading of the chips requires expensive equipment. Current chips have over 65,000 different probes, with typically several probes for each mRNA.

3.2.1.3 RT-PCR

To measure gene expression using RT-PCR, the mRNA is first reverse-transcribed into cDNA, and the cDNA is then amplified to measurable levels using PCR [15]. Using built-in calibration techniques, RT-PCR can achieve high accuracy coupled with an exceptional sensitivity of 10 molecules/10 μl and a dynamic range covering 6–8 orders of magnitude. The method requires PCR primers for all the genes of interest, and is not inherently parallel like the previous two, so automation is crucial to scale up.

Roland Somogyi used this method to measure the expression levels of 112 genes at 9 different time points during the development of rat cervical spinal cord [16], and 70 genes during development and following injury of the hippocampus.

3.2.1.4 RNA-Seq

RNA-seq, also called “whole transcriptome shotgun sequencing,” is a recently developed approach to transcriptome profiling that uses deep-sequencing technologies [17]. Compared to hybridization-based approaches, RNA-seq (1) can detect novel transcripts, (2) has very low background noise, (3) covers a large dynamic range of expression levels over which transcripts can be detected, (4) is more reproducible, and (5) requires less RNA sample [17]. It has been applied to various organisms, including Saccharomyces cerevisiae [18], Schizosaccharomyces pombe [19], Arabidopsis thaliana [20], zebrafish [21], mouse [22], and human cells [23].

3.2.2 Protein–Protein Interaction Data

Protein–protein interactions are essential for a wide range of cellular processes and form a network of astonishing complexity. Until recently, our knowledge of this complex network was rather limited. The emergence of large-scale protein–protein interaction maps has given us new possibilities to systematically survey and study the underlying biological system. First attempts to collect protein–protein interactions on a large scale were initiated for model organisms such as S. cerevisiae, Drosophila melanogaster, and Caenorhabditis elegans [24, 25, 26, 27, 28]. Evidently, the generated interaction maps offered a rich resource for systematic studies of molecular networks.

After these initial efforts, the focus has moved toward deciphering the human protein–protein interactions. Most currently available human interaction maps can be divided into three classes: (1) maps obtained from literature searches [29, 30, 31], (2) maps derived from interactions among orthologous proteins in other organisms [32, 33, 34], and (3) maps based on large-scale scans using yeast two-hybrid (Y2H) assays [35, 36]. Each of these mapping strategies has its own advantages and disadvantages. For example, Y2H-based mapping approaches offer rapid screens between thousands of proteins, but might be compromised by high false-positive rates. However, the extent to which the resulting interaction maps are influenced by the choice of mapping strategy is less clear. Thus, it is important to critically assess and compare the quality and reliability of the maps produced.

Protein–protein interaction networks are commonly represented in a graph format, with vertices corresponding to proteins and edges corresponding to protein–protein interactions [37]. The network consists of many small subnets (groups of proteins that interact with each other but not with any other protein) and one large, connected subnet comprising more than half of all interacting proteins. The volume of experimental data on protein–protein interactions is rapidly increasing, thanks to high-throughput biotechniques that are able to produce a large number of protein–protein interactions. For instance, yeast contains over 5000 proteins, and currently about 18,000 protein–protein interactions have been identified among the yeast proteins, with hundreds of labs around the world constantly adding to this list [38]. The analogous networks for mammals are expected to be much larger. For example, humans are expected to have around 12,000 proteins and about 10⁶ interactions.
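The graph representation above, with its small subnets and one large connected subnet, can be explored by computing connected components. The sketch below uses a breadth-first search over a toy edge list; the protein names `P1` to `P9` and their interactions are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every protein reachable from `start`.
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Hypothetical protein-protein interactions (vertices are proteins).
ppi = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P1"),
       ("P3", "P9"), ("P5", "P6"), ("P7", "P8")]

components = connected_components(ppi)
largest_fraction = len(components[0]) / sum(len(c) for c in components)
print(len(components), "subnets; largest covers", round(largest_fraction, 2))
```

In this toy network, one subnet contains five of the nine proteins, and the remaining proteins form two isolated pairs.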

3.2.3 Protein–DNA Interaction Data

Currently, there are several different sources available for protein–DNA interactions: (1) experimental data from genome-wide location analysis (GWLA) [39], such as ChIP-chip [40] and ChIP-seq [41], (2) curated binding information in public databases, such as TRANSFAC [42], and (3) putative binding sites based on computational prediction algorithms [43]. As an in vivo study, GWLA technology is biologically most significant, but provides only the roughest information about possible binding sites. As more advanced sequencing technologies are being developed, it is expected that such approaches will overcome these limitations in the future. Curated information in public databases, on the other hand, presents a compilation of mostly in vitro studies and provides more accurate information, but at the expense of only small coverage of all intergenic regions. The third method is based on in silico predictions and provides the most detailed information on DNA-binding site locations, but contains the highest rate of false positives. As more interactions accumulate from different resources, interactions that are identified by more than one resource will be considered as high-confidence interactions. This will help us reduce the false positives arising from different resources.
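The idea of treating interactions reported by more than one resource as high confidence can be sketched as a simple support count. The three source labels mirror the categories described above, but the transcription factor and gene names, the interactions themselves, and the threshold `min_support=2` are assumptions made for illustration.

```python
from collections import Counter

def high_confidence(sources, min_support=2):
    """Keep interactions reported by at least `min_support` distinct sources."""
    support = Counter()
    for interactions in sources.values():
        # set() ensures each source votes at most once per interaction.
        support.update(set(interactions))
    return {pair: n for pair, n in support.items() if n >= min_support}

# Hypothetical (transcription factor, target gene) pairs per resource.
sources = {
    "gwla": {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")},
    "curated": {("TF1", "geneA"), ("TF2", "geneD")},
    "predicted": {("TF1", "geneA"), ("TF1", "geneB"), ("TF3", "geneE")},
}

confident = high_confidence(sources)
for (tf, gene), n in sorted(confident.items()):
    print(f"{tf} -> {gene}: supported by {n} sources")
```

Raising `min_support` trades coverage for fewer false positives; the appropriate threshold depends on how independent the resources are.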

3.2.4 Gene Ontology

The Gene Ontology (GO) Consortium [44] has developed three separate ontologies (molecular function, biological process, and cellular component) to describe the attributes of gene products. Molecular function defines what a gene product does at the biochemical level without specifying where or when the event actually occurs or its broader context; biological process describes the contribution of a gene product to a biological objective; and cellular component refers to where in the cell a gene product functions. Each ontology is structured as a directed acyclic graph, wherein each term is a child of one or multiple parents, and child terms are instances or components of parent terms. For example, in Figure 3.1, the term S phase of meiotic cell cycle (GO: 0051332) is an instance of the term S phase (GO: 0051320) as well as an instance of the term interphase of meiotic cell cycle (GO: 0051328). Such information can be incorporated into GRN analysis and inference approaches to increase the inference accuracy [45, 46].
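Because each ontology is a directed acyclic graph, the ancestors of a term can be collected by walking the child-to-parent links. The sketch below uses the two parent relationships stated in the text; the links from those parents to a single `biological_process` root are a deliberate simplification assumed here, and the real ontology contains intermediate terms that are omitted.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:  # a term may be reached via several paths
                found.add(parent)
                stack.append(parent)
    return found

# Child -> parents. Only the first entry comes from the text; the other two
# entries collapse the remaining hierarchy into one assumed edge each.
parents = {
    "GO:0051332": ["GO:0051320", "GO:0051328"],
    "GO:0051320": ["biological_process"],
    "GO:0051328": ["biological_process"],
}

result = ancestors("GO:0051332", parents)
print(sorted(result))
```

Two genes annotated with terms that share many ancestors can be treated as functionally similar, which is one way such annotations can inform GRN inference.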

3.3 TOPOLOGICAL ANALYSIS OF GENE REGULATORY NETWORKS

A GRN can be defined as a directed graph: the nodes represent the genes, a directed edge from one node to another indicates that the first gene codes for a transcription factor that regulates the second gene, and an undirected edge between two nodes represents an interaction at the protein level, which in turn directs the two proteins to regulate some common downstream genes [46]. The architecture of a GRN can be described by means of graph features such as node degree, network diameter, and clustering coefficient. We briefly introduce these concepts, followed by a description of GRN analysis approaches at different network levels.

3.3.1 Node Degree

The degree of a node is defined as the number of edges that connect to this node. In directed networks, the number of incoming edges is called the in-degree, and the number of outgoing edges is called the out-degree. A node with a high degree is connected to many other nodes in the network. Biological networks are not randomly organized but have a scale-free architecture with the typical power law degree distribution, that is, only a small number of nodes (hubs) have a high degree while most nodes have a small degree. GRNs, protein–protein interaction networks, and metabolic networks have all been reported to be scale-free [47]. The advantage of this kind of organization is that the loss of a single non-hub node is not as disruptive in scale-free networks as in random networks. In other words, scale-free networks are generally more robust to random failures. The hubs are extremely important and usually play essential roles in many biological systems [48].
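The in-degree, out-degree, and degree distribution defined above can be computed directly from an edge list. This is a minimal sketch on a toy directed network; the node names and edges are hypothetical, and a network this small cannot, of course, demonstrate a power law.

```python
from collections import Counter

def degrees(edges):
    """Return {node: (in_degree, out_degree)} for a directed edge list."""
    in_degree, out_degree = Counter(), Counter()
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    nodes = set(in_degree) | set(out_degree)
    # Counter returns 0 for nodes that never appear as a source or target.
    return {node: (in_degree[node], out_degree[node]) for node in nodes}

# Hypothetical regulatory edges: (regulator, target).
edges = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3"), ("TF1", "TF2"),
         ("TF2", "g3"), ("TF2", "g4")]

deg = degrees(edges)
# Degree distribution: how many nodes have each total degree.
distribution = Counter(i + o for i, o in deg.values())

print(deg["TF1"])                    # hub: many outgoing edges
print(sorted(distribution.items()))  # (degree, number of nodes)
```

In a real scale-free network, plotting the distribution on log-log axes would give an approximately straight line.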

**Figure 3.1** The directed acyclic graph induced from the GO term S phase of meiotic cell cycle (GO: 0051332). The GO term of interest is at the bottom-most level, and all its ancestors are at the upper levels. Adapted from the QuickGO GO Browser (http://www.ebi.ac.uk/ego/). GO, Gene Ontology.

3.3.2 Neighborhood Connectivity

The connectivity of a node is the number of its neighbors. The neighborhood connectivity of a node is defined as the average connectivity of all its neighbors. In analogy to the in- and out-degree, every node in a directed network has an in- and an out-connectivity.
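As a small worked example of this definition, the sketch below computes the neighborhood connectivity for every node of an undirected toy network. The node names and edges are hypothetical.

```python
from collections import defaultdict

def neighborhood_connectivity(edges):
    """Average number of neighbors over each node's neighbors (undirected)."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return {
        node: sum(len(adjacency[m]) for m in neighbors) / len(neighbors)
        for node, neighbors in adjacency.items()
    }

# Hypothetical undirected interactions.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
nc = neighborhood_connectivity(edges)

for node in sorted(nc):
    print(node, round(nc[node], 3))
```

Here node `D` has a single neighbor, `A`, which itself has three neighbors, so the neighborhood connectivity of `D` is 3.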

3.3.3 Shortest Paths

The shortest path length between two nodes in one network is called the node distance. The network diameter is the maximum length of shortest paths between any two nodes in one network. If a network is disconnected, its diameter is the maximum of all diameters of its connected components. The distribution of shortest path length and network diameter can indicate small-world properties of the analyzed network [49]. Many biological networks, such as GRN and metabolic networks, are known to exhibit this small-world property.
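Node distance and network diameter, as defined above, can be computed with a breadth-first search from every node. The following sketch assumes an unweighted, undirected network given as an edge list; the node names are hypothetical. Because a search started in one component never leaves it, taking the maximum over all start nodes yields the largest component diameter, matching the convention for disconnected networks stated above.

```python
from collections import defaultdict, deque

def bfs_distances(adjacency, source):
    """Shortest path length from `source` to every node reachable from it."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

def build_adjacency(edges):
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def diameter(edges):
    """Maximum shortest path length, taken over all connected components."""
    adjacency = build_adjacency(edges)
    return max(max(bfs_distances(adjacency, node).values())
               for node in list(adjacency))

# A four-node path (A-B-C-D) plus a separate pair (E-F).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
print(bfs_distances(build_adjacency(edges), "A"))
print(diameter(edges))
```

Running one search per node costs time proportional to the number of nodes times the number of edges, which is acceptable for a sketch but may call for a dedicated graph library on genome-scale networks.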

There are also other important network parameters such as clustering coefficient, betweenness centrality, and stress centrality in network topological analysis; please refer to Doncheva et al. [50] for a detailed review.

In addition to the network properties described in Section 3.3, studies have also unveiled that biological networks have modular structure in most organisms [51, 52, 53, 54]. Indeed, biological processes consist of pathways that mainly act on their own and cross talk with each other under certain conditions. Therefore, it is expected that the distinct biological processes can be organized in discrete and separable modules. A module in a network can be defined in various ways [55]:

One popular definition of a module involves co-expressed genes, with or without environmental context dependence, and assigning a regulatory motif or regulator to these genes. A module defined this way is referred to as a gene module in the remainder of this chapter.
A topological module can instead be defined by means of graph-based approaches. For instance, network motifs (NMs) are among the smallest modules in GRNs; they are referred to as gene regulatory modules in the remainder of this chapter.

Shen-Orr et al. [56] were the first to identify NMs, which they discovered in the transcriptional network of Escherichia coli. Network motifs are the smallest building blocks in networks: topologically distinct regulatory interaction patterns that occur more frequently in true biological networks than in random networks. These motifs are therefore thought to have specific biological functions: they are postulated to be the basic signal transduction elements, each with its own characteristic properties. Examples of NMs are the single-input, multiple-input, and feed-forward loop motifs [56].
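To make the notion of a motif concrete, the sketch below enumerates feed-forward loops, in which a regulator X controls Y and both X and Y control a common target Z. It only counts occurrences in one network; establishing that a pattern is a motif additionally requires comparing the count against randomized networks, which is not shown here. The node names and edges are hypothetical.

```python
from collections import defaultdict

def feed_forward_loops(edges):
    """Return all (x, y, z) with edges x->y, y->z, and x->z."""
    targets = defaultdict(set)
    for source, target in set(edges):
        if source != target:  # ignore autoregulation for this pattern
            targets[source].add(target)

    loops = []
    for x in targets:
        for y in targets[x]:
            # .get avoids adding keys to `targets` while iterating over it.
            for z in targets.get(y, ()):
                if z != x and z in targets[x]:
                    loops.append((x, y, z))
    return sorted(loops)

# Hypothetical regulatory edges: (regulator, target).
edges = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "W"), ("X", "W")]
ffl = feed_forward_loops(edges)
print(ffl)
```

In this toy network, `X` participates in two feed-forward loops, one through `Y` and one through `Z`.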

3.3.4 Reconstruction of Transcriptional Regulatory Network

Besides the network properties of GRNs, inference of GRNs is another challenging topic in the bioinformatics field. The purpose of GRN inference is to determine for all transcription factors the regulatory mechanisms they recognize, the conditions in which they are active, the regulators they cooperate with in these conditions, and their target genes in these conditions. Inference approaches can be categorized into two groups: bottom-up and top-down approaches [57, 58]:

Bottom-up approaches start from a comprehensive expert model of known interactions among molecular entities as described in the literature and curated databases. Such models can be used to simulate cellular behavior or to predict the outcome of a perturbation experiment. Inconsistencies between observed data and simulations point to deficiencies in the current network structure and suggest hypotheses of novel interactions that can better explain the observations.
Top-down approaches start from a global view of the behavior of the whole biological system by using high-throughput data. This type of inference method does not rely on expert knowledge of the relationships among the molecular components. Top-down inference is a data-driven and thus data-demanding approach. Given the current data availability, top-down network inference problems are often underdetermined (i.e., the network that is reconstructed from the data is not unique, and many equally likely solutions can explain the observations). However, top-down inference can be made increasingly tractable by integrating data from different sources, and holds great promise for future bioinformatics research.

One approach to tackle the above underdetermined problem in top-down network inference is to integrate multiple complementary high-throughput datasets. Transcriptional regulation is a process that needs to be understood at multiple levels of description [59, 60] (Figure 3.2), including (1) the factor–target gene interaction, in which transcription factors activated under certain conditions interact with their conserved binding site sequences, and (2) transcriptional regulation, which explains how the binding of transcription factors to their unique recognition sites regulates the expression of specific genes. A single source of information such as gene expression data addresses only one level of description (i.e., the transcriptional regulation level), and thus it is limited in its ability to provide a full understanding of the entire regulatory process. Other types of information such as protein–protein interaction [61, 62] and protein–DNA interaction [40] data provide complementary constraints on the models of regulatory processes. By integrating limited but complementary data sources, we can arrive at a mutually consistent hypothesis bearing stronger similarity to the underlying causal structures [60]. Among the various types of high-throughput biological data available nowadays, time-course gene expression profiles and GWLA data are two complementary sets of information that can be used to infer regulatory components. Time-course gene expression data are advantageous over typical static expression profiles as time can be used to disambiguate causal interactions. GWLA data, on the other hand, provide high-throughput quantitative information about in vivo binding of transcription factors to the target regulatory regions of the DNA. Incorporation of prior biological knowledge accumulated in the literature will help guide inference from the above datasets, and integration of multiple data sources offers insights into the cellular system at different levels [46].

Another way to reduce the complexity of the GRN inference problem is to decompose it into small units of commonly used network structures, called gene regulatory modules. As we introduced in this section, GRNs are made of repeated occurrences of simple patterns–NMs. Since the establishment of the first NM in E. coli [56], similar NMs have also been found in eukaryotes including yeast [63], plants, and animals [64, 65, 66], suggesting that the general structure of NMs is evolutionarily conserved. One well-known family of NMs is the feed-forward loop [67], which appears in hundreds of gene systems in E. coli [56, 68] and yeast [63, 69], as well as in other organisms [64, 65, 66, 70, 71, 72]. A comprehensive review on NM theory and experimental approaches is presented in Ref. [73]. Knowledge of the NMs to which a given transcription factor belongs facilitates the identification of downstream target gene clusters. In yeast, a GWLA was carried out for 106 transcription factors and 5 NMs were considered significant: autoregulation, feed-forward loop, single input module, multi-input module, and regulator cascade [63]. In Section 3.4, we will review commonly used models for GRN inference, followed by the introduction of our proposed computational approach integrating multi-source biological data for GRN inference.

**Figure 3.2** The gene transcriptional regulatory program, which can be simplified at two levels. At the factor–gene binding level, the “activated” transcription factors bind to their specific conserved sequence motifs, called transcription factor binding sites (TFBSs). When the binding process is completed, the regulation mechanism instructs gene transcription from the transcriptional start site (TSS), that is, DNA to mRNA, the first part of the central dogma of molecular biology. This figure was adapted from Zhang et al. [46].

3.4 GRN INFERENCE BY INTEGRATION OF MULTI-SOURCE BIOLOGICAL DATA

In the last two decades, a variety of continuous or discrete, static or dynamic, and quantitative or qualitative models have been proposed for inference of GRNs. These include biochemically driven methods [74], linear models [75, 76], Boolean networks [77], fuzzy logic [78, 79], Bayesian networks [80, 81], and recurrent neural networks (RNNs) [82, 83, 84]. Chapter 2 provides a detailed description of these approaches. However, none of these computational approaches solves the underdetermined problem in GRN inference, because the sample size is typically small compared to the number of genes investigated. We hypothesize that we can enhance our understanding of gene interactions in important biological processes and improve the inference accuracy of a GRN by (1) incorporating prior biological knowledge into the inference scheme, (2) integrating multiple biological data sources, and (3) decomposing the inference problem into smaller network modules. In this section, we introduce our proposed integrative framework to tackle these challenges.

GRN inference based only on gene expression data is inadequate because of the intrinsic complexity of gene regulation and the noise in gene expression data. Integrating data from multiple global assays and curated databases is essential to understand the spatiotemporal interactions within cells. Different experiments measure cellular processes at various widths and depths, while databases contain biological information based on established facts or published data. Integrating these complementary datasets helps infer a mutually consistent transcriptional regulatory network with strong similarity to the structure of the underlying gene regulatory modules. Decomposing the transcriptional regulatory network into a small set of recurring regulatory patterns, called NMs, facilitates the inference. Identifying NMs defined by specific transcription factors establishes the modular framework structure of a transcriptional regulatory network and allows the inference of transcription factor–target gene relationships. This section introduces a computational framework for utilizing data from multiple sources to infer transcription factor–target gene relationships on the basis of NM regulatory modules. The data include time-course gene expression profiles, molecular interaction data, and GO information.

In the proposed framework, we consider two different layers of networks in the GRN. One is the molecular interaction network that includes protein–protein interactions and protein–DNA interactions at the factor–gene binding level. The other is the functional network that incorporates the consequences of these physical interactions, such as the activation or repression of transcription. We used three types of data to reconstruct the GRN, namely protein–protein interactions derived from a collection of public databases, protein–DNA interactions from the TRANSFAC database [42], and time-course gene expression profiles. The first two data sources provided direct network information to constrain the GRN model. The gene expression profiles provided an unambiguous measurement on the causal effects of the GRN model. GO annotation describes the similarities among genes within one network, which facilitates further characterization of the relationships between genes. The goal is to discern dependencies between the gene expression patterns and the physical intermolecular interactions revealed by complementary data sources.

**Figure 3.3** Schematic overview of the computational framework used for the gene regulatory module inference. PPI, protein–protein interaction; PDI, protein–DNA interaction.

The framework model for GRN inference is illustrated in Figure 3.3. Three successive steps were involved in this framework as outlined in the following:

3.4.1 Gene Module Selection

Genes with similar expression profiles were represented by a gene module to address the scalability problem in GRN inference [79]. The assumption is that a subset of genes that are related in terms of expression (co-expressed) can be grouped together by virtue of a unifying cis-regulatory element(s) associated with a common transcription factor regulating each and every member of the cluster (co-regulated) [85]. GO information was utilized to define the optimal number of clusters with respect to certain broad functional categories. Since each gene module identified from clustering analysis mainly represents one broad biological process or category as evaluated by FuncAssociate [86], the regulatory network implies that a given transcription factor is likely to be involved in the control of a group of functionally related genes [87].
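The grouping of genes with similar expression profiles can be illustrated with a very simple stand-in for the clustering step. The sketch below greedily assigns each gene to the first module whose seed profile it correlates with; it is not the clustering method used in the framework, and the gene names, time-course values, and the correlation threshold of 0.9 are assumptions chosen for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def greedy_modules(profiles, threshold=0.9):
    """Group genes whose profile correlates with a module's first (seed) gene."""
    modules = []
    for gene, profile in profiles.items():
        for module in modules:
            if pearson(profile, profiles[module[0]]) >= threshold:
                module.append(gene)
                break
        else:  # no existing module matched, so this gene seeds a new one
            modules.append([gene])
    return modules

# Hypothetical time-course expression profiles (four time points each).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.1, 6.0, 8.2],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.1, 3.9, 2.0],
}

modules = greedy_modules(profiles)
print(modules)
```

The rising genes `g1` and `g2` form one module and the falling genes `g3` and `g4` another; each module can then be represented by a single averaged profile in later steps.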

3.4.2 Network Motif Discovery

To reduce the complexity of the inference problem, NMs were utilized instead of a global GRN inference. The significant NMs in the combined molecular interaction network were first established and assigned to at least one transcription factor. These associations were further used to reconstruct the regulatory modules. This step was implemented using the FANMOD tool.

3.4.3 Gene Regulatory Module Inference

For each transcription factor assigned to an NM, an RNN was trained to model a GRN that mimics the associated NM. A genetic algorithm (GA) generated the candidate gene modules, and particle swarm optimization (PSO) was used to configure the parameters of the RNN. Parameters were selected to minimize the root-mean-square error (RMSE) between the output of the RNN and the target gene module's expression pattern. The RMSE was returned to the GA to produce the next generation of candidate gene modules. Optimization continued until either a pre-specified maximum number of iterations was completed or a pre-specified minimum RMSE was reached. The procedure was repeated for all transcription factors. Biological knowledge from databases was used to evaluate the predicted results.
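The fitness evaluation at the core of this step, simulating an RNN and scoring it by RMSE against an observed profile, can be sketched as follows. The update rule is a common discrete-time form assumed here for illustration and is not necessarily the exact model used in the framework; the GA and PSO search loops are omitted, and all weights, biases, time constants, and observed values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate(weights, bias, tau, x0, steps, dt=1.0):
    """Iterate x_i <- x_i + (dt / tau_i) * (sigmoid(sum_j w_ij x_j + b_i) - x_i)."""
    trajectory = [list(x0)]
    for _ in range(steps):
        x = trajectory[-1]
        nxt = []
        for i in range(len(x)):
            drive = sigmoid(sum(w * xj for w, xj in zip(weights[i], x)) + bias[i])
            nxt.append(x[i] + (dt / tau[i]) * (drive - x[i]))
        trajectory.append(nxt)
    return trajectory

def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Node 0: transcription factor; node 1: target gene module (both hypothetical).
weights = [[0.0, 0.0],   # nothing regulates the transcription factor
           [4.0, 0.0]]   # the transcription factor activates the module
bias = [0.0, -2.0]
tau = [1.0, 1.0]

trajectory = simulate(weights, bias, tau, x0=[0.5, 0.5], steps=3)
predicted = [state[1] for state in trajectory]
observed = [0.5, 0.6, 0.4, 0.5]  # hypothetical measured module profile

fitness = rmse(predicted, observed)
print(round(fitness, 4))
```

In a full search, an optimizer would adjust `weights`, `bias`, and `tau` to lower `fitness`, and the resulting score would rank the candidate gene modules.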

We applied this computational framework to two biological processes: yeast cell cycle progression [88] and the human HeLa cancer cell cycle [89]. We demonstrated that our method can accurately infer the underlying relationships between transcription factors and their downstream target genes by integrating multiple sources of biological data. The predictive strength of this strategy is based on the combined constraints arising from multiple biological data sources, including time-course gene expression data, combined molecular interaction network data, and GO category information.

3.5 CONCLUSIONS AND FUTURE DIRECTIONS

The analysis and inference of GRNs is a major challenge in bioinformatics, mainly due to (1) the intrinsic complexity of gene regulation mechanisms, (2) the limited sample size compared to the number of genes in one experiment, and (3) noise in the gene expression data themselves. Computational approaches integrating multi-source biological data can address this challenge by reverse engineering GRNs at the NM level. However, models that can integrate more types of biological data are still needed.

Biological systems are characterized by many highly interconnected levels. Most approaches analyze GRNs at the transcriptional level. This network may be augmented with additional types of relations, which can then provide further insight into other types of cellular mechanisms. One of the major tasks ahead is therefore the integration of more sources of information. One intriguing dataset to add is that of signal transduction pathways, including directed protein–protein interactions such as those between kinases and their substrates, and interactions between signaling molecules (e.g., pheromones) and their targets. If these data become available, they will enable the characterization of signal transduction pathways and their control mechanisms. Attempts to collect genome-wide signaling data are underway, for example, using protein chips designed to test kinase phosphorylation interactions as performed by Snyder et al. [90].

ACKNOWLEDGMENT

This work is supported in part by grant P30 CA 134274-04 from the NCI.

REFERENCES

Keshava Prasad, T.S., et al., Human Protein Reference Database—2009 update. Nucleic Acids Res, 2009. 37(Database issue): D767–D772.
Kerrien, S., et al., The IntAct molecular interaction database in 2012. Nucleic Acids Res, 2012. 40(Database issue): D841–D846.
Willis, R.C. and C.W. Hogue, Searching, viewing, and visualizing data in the Biomolecular Interaction Network Database (BIND). Curr Protoc Bioinformatics, 2006. Chapter 8: Unit 8.9.
Franceschini, A., et al., STRING v9.1: protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res, 2013. 41(Database issue): D808–D815.
Schmidt, D., et al., ChIP-seq: using high-throughput sequencing to discover protein-DNA interactions. Methods, 2009. 48(3): 240–248.
Wu, J., et al., ChIP–chip comes of age for genome-wide functional analysis. Cancer Res, 2006. 66(14): 6899–6902.
Luscombe, N.M., et al., Genomic analysis of regulatory network dynamics reveals large topological changes. Nature, 2004. 431(7006): 308–312.
Peter, I.S. and E.H. Davidson, Modularity and design principles in the sea urchin embryo gene regulatory network. FEBS Lett, 2009. 583(24): 3948–3958.
Davidson, E.H., Emerging properties of animal gene regulatory networks. Nature, 2010. 468(7326): 911–920.
Marbach, D., et al., Revealing strengths and weaknesses of methods for gene network inference. Proc Natl Acad Sci USA, 2010. 107(14): 6286–6291.
Sirbu, A., H.J. Ruskin, and M. Crane, Comparison of evolutionary algorithms in gene regulatory network model inference. BMC Bioinformatics, 2010. 11: 59.
De Smet, R. and K. Marchal, Advantages and limitations of current network inference methods. Nat Rev Microbiol, 2010. 8(10): 717–729.
Schena, M., et al., Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 1995. 270(5235): 467–470.
McGall, G.H. and J.A. Fidanza, Photolithographic synthesis of high-density oligonucleotide arrays. Methods Mol Biol, 2001. 170: 71–101.
Joyce, C., Quantitative RT-PCR. A review of current methodologies. Methods Mol Biol, 2002. 193: 83–92.
Wen, X., et al., Large-scale temporal gene expression mapping of central nervous system development. Proc Natl Acad Sci USA, 1998. 95(1): 334–339.
Wang, Z., M. Gerstein, and M. Snyder, RNA-seq: a revolutionary tool for transcriptomics. Nat Rev Genet, 2009. 10(1): 57–63.
Nagalakshmi, U., et al., The transcriptional landscape of the yeast genome defined by RNA sequencing. Science, 2008. 320(5881): 1344–1349.
Wilhelm, B.T., et al., Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature, 2008. 453(7199): 1239–1243.
Lister, R., et al., Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell, 2008. 133(3): 523–536.
Craig, T.A., et al., Research resource: whole transcriptome RNA sequencing detects multiple 1alpha,25-dihydroxyvitamin D(3)-sensitive metabolic pathways in developing zebrafish. Mol Endocrinol, 2012. 26(9): 1630–1642.
Cloonan, N., et al., Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods, 2008. 5(7): 613–619.
Morin, R., et al., Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques, 2008. 45(1): 81–94.
Gavin, A.C., et al., Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 2002. 415(6868): 141–147.
Giot, L., et al., A protein interaction map of Drosophila melanogaster. Science, 2003. 302(5651): 1727–1736.
Ito, T., et al., A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA, 2001. 98(8): 4569–4574.
Li, S., et al., A map of the interactome network of the metazoan C. elegans. Science, 2004. 303(5657): 540–543.
Uetz, P., et al., A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 2000. 403(6770): 623–627.
Bader, G.D., D. Betel, and C.W. Hogue, BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res, 2003. 31(1): 248–250.
Peri, S., et al., Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res, 2003. 13(10): 2363–2371.
Ramani, A.K., et al., Consolidating the set of known human protein–protein interactions in preparation for large-scale mapping of the human interactome. Genome Biol, 2005. 6(5): R40.
Lehner, B. and A.G. Fraser, A first-draft human protein-interaction map. Genome Biol, 2004. 5(9): R63.
Brown, K.R. and I. Jurisica, Online predicted human interaction database. Bioinformatics, 2005. 21(9): 2076–2082.
Persico, M., et al., HomoMINT: an inferred human network based on orthology mapping of protein interactions discovered in model organisms. BMC Bioinformatics, 2005. 6(Suppl 4): S21.
Rual, J.F., et al., Towards a proteome-scale map of the human protein–protein interaction network. Nature, 2005. 437(7062): 1173–1178.
Stelzl, U., et al., A human protein-protein interaction network: a resource for annotating the proteome. Cell, 2005. 122(6): 957–968.
Przulj, N., D.A. Wigle, and I. Jurisica, Functional topology in a network of protein interactions. Bioinformatics, 2004. 20(3): 340–348.
Xenarios, I., et al., DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res, 2002. 30(1): 303–305.
Hawkins, R.D. and B. Ren, Genome-wide location analysis: insights on transcriptional regulation. Hum Mol Genet, 2006. 15(Spec No 1): R1–R7.
Buck, M.J. and J.D. Lieb, ChIP–chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics, 2004. 83(3): 349–360.
Jothi, R., et al., Genome-wide identification of in vivo protein-DNA binding sites from ChIP-seq data. Nucleic Acids Res, 2008. 36(16): 5221–5231.
Matys, V., et al., TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res, 2003. 31(1): 374–378.
Kel, A.E., et al., MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res, 2003. 31(13): 3576–3579.
Ashburner, M., et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000. 25(1): 25–29.
Yeung, K.Y., et al., Construction of regulatory networks using expression time-series data of a genotyped population. Proc Natl Acad Sci USA, 2011. 108(48): 19436–19441.
Zhang, Y., et al., Network motif-based identification of transcription factor-target gene relationships by integrating multi-source biological data. BMC Bioinformatics, 2008. 9: 203.
Barabasi, A.L. and Z.N. Oltvai, Network biology: understanding the cell's functional organization. Nat Rev Genet, 2004. 5(2): 101–113.
Yu, H., et al., Genomic analysis of essentiality within protein networks. Trends Genet, 2004. 20(6): 227–231.
Vidal, M., M.E. Cusick, and A.L. Barabasi, Interactome networks and human disease. Cell, 2011. 144(6): 986–998.
Doncheva, N.T., et al., Topological analysis and interactive visualization of biological networks and protein structures. Nat Protoc, 2012. 7(4): 670–685.
Guelzim, N., et al., Topological and causal structure of the yeast transcriptional regulatory network. Nat Genet, 2002. 31(1): 60–63.
Hartwell, L.H., et al., From molecular to modular cell biology. Nature, 1999. 402(6761 Suppl): C47–C52.
Ravasz, E., et al., Hierarchical organization of modularity in metabolic networks. Science, 2002. 297(5586): 1551–1555.
Mitra, K., et al., Integrative approaches for finding modular structure in biological networks. Nat Rev Genet, 2013. 14(10): 719–732.
Wolf, D.M. and A.P. Arkin, Motifs, modules and games in bacteria. Curr Opin Microbiol, 2003. 6(2): 125–134.
Shen-Orr, S.S., et al., Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet, 2002. 31(1): 64–68.
Bruggeman, F.J. and H.V. Westerhoff, The nature of systems biology. Trends Microbiol, 2007. 15(1): 45–50.
De Keersmaecker, S.C., et al., Integration of omics data: how well does it work for bacteria? Mol Microbiol, 2006. 62(5): 1239–1250.
Walhout, A.J., Unraveling transcription regulatory networks by protein-DNA and protein-protein interaction mapping. Genome Res, 2006. 16(12): 1445–1454.
Blais, A. and B.D. Dynlacht, Constructing transcriptional regulatory networks. Genes Dev, 2005. 19(13): 1499–1511.
Fields, S. and O. Song, A novel genetic system to detect protein-protein interactions. Nature, 1989. 340(6230): 245–246.
Zhu, H., et al., Global analysis of protein activities using proteome chips. Science, 2001. 293(5537): 2101–2105.
Lee, T.I., et al., Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 2002. 298(5594): 799–804.
Odom, D.T., et al., Control of pancreas and liver gene expression by HNF transcription factors. Science, 2004. 303(5662): 1378–1381.
Boyer, L.A., et al., Core transcriptional regulatory circuitry in human embryonic stem cells. Cell, 2005. 122(6): 947–956.
Swiers, G., R. Patient, and M. Loose, Genetic regulatory networks programming hematopoietic stem cells and erythroid lineage specification. Dev Biol, 2006. 294(2): 525–540.
Mangan, S. and U. Alon, Structure and function of the feed-forward loop network motif. Proc Natl Acad Sci USA, 2003. 100(21): 11980–11985.
Mangan, S., A. Zaslaver, and U. Alon, The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks. J Mol Biol, 2003. 334(2): 197–204.
Milo, R., et al., Network motifs: simple building blocks of complex networks. Science, 2002. 298(5594): 824–827.
Saddic, L.A., et al., The LEAFY target LMI1 is a meristem identity regulator and acts together with LEAFY to regulate expression of CAULIFLOWER. Development, 2006. 133(9): 1673–1682.
Iranfar, N., D. Fuller, and W.F. Loomis, Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC. Dev Biol, 2006. 290(2): 460–469.
Milo, R., et al., Superfamilies of evolved and designed networks. Science, 2004. 303(5663): 1538–1542.
Alon, U., Network motifs: theory and experimental approaches. Nat Rev Genet, 2007. 8(6): 450–461.
Naraghi, M. and E. Neher, Linearized buffered Ca2+ diffusion in microdomains and its implications for calculation of [Ca2+] at the mouth of a calcium channel. J Neurosci, 1997. 17(18): 6961–6973.
Chen, T., H.L. He, and G.M. Church, Modeling gene expression with differential equations. Pac Symp Biocomput, 1999. 4: 29–40.
D'Haeseleer, P., et al., Linear modeling of mRNA expression levels during CNS development and injury. Pac Symp Biocomput, 1999. 4: 41–52.
Shmulevich, I., et al., Probabilistic Boolean Networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics, 2002. 18(2): 261–274.
Woolf, P.J. and Y. Wang, A fuzzy logic approach to analyzing gene expression data. Physiol Genomics, 2000. 3(1): 9–15.
Ressom, H., R. Reynolds, and R.S. Varghese, Increasing the efficiency of fuzzy logic-based gene expression data analysis. Physiol Genomics, 2003. 13(2): 107–117.
Friedman, N., et al., Using Bayesian networks to analyze expression data. J Comput Biol, 2000. 7: 601–620.
Vignes, M., et al., Gene regulatory network reconstruction using Bayesian networks, the Dantzig Selector, the Lasso and their meta-analysis. PLoS One, 2011. 6(12): e29165.
Ressom, H.W., Y. Zhang, J. Xuan, Y. Wang, and R. Clarke, Inferring network interactions using recurrent neural networks and swarm intelligence. The 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2006), IEEE, New York City, NY. 2006. 4241–4244.
Maraziotis, I., A. Dragomir, and A. Bezerianos, Gene networks inference from expression data using a recurrent neuro-fuzzy approach. Conf Proc IEEE Eng Med Biol Soc, 2005. 5: 4834–4837.
Chiang, J.H. and S.Y. Chao, Modeling human cancer-related regulatory modules by GA-RNN hybrid algorithms. BMC Bioinformatics, 2007. 8: 91.
Yeung, K.Y., M. Medvedovic, and R.E. Bumgarner, From co-expression to co-regulation: how many microarray experiments do we need? Genome Biol, 2004. 5(7): R48.
Berriz, G.F., et al., Characterizing gene sets with FuncAssociate. Bioinformatics, 2003. 19(18): 2502–2504.
De Hoon, M.J., S. Imoto, and S. Miyano, Statistical analysis of a small set of time-ordered gene expression data using linear splines. Bioinformatics, 2002. 18(11): 1477–1485.
Spellman, P.T., et al., Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell, 1998. 9(12): 3273–3297.
Whitfield, M.L., et al., Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell, 2002. 13(6): 1977–2000.
Zhu, H., et al., Analysis of yeast protein kinases using protein chips. Nat Genet, 2000. 26(3): 283–289.
