Chapter 3

Inferring the Topology of Gene Regulatory Networks: An Algebraic Approach to Reverse Engineering

Brandilyn Stigler* and Elena Dimitrova, *Southern Methodist University, Dallas, TX 75275-0156, USA, Clemson University, Clemson, SC 29634-0975, USA, [email protected]

3.1 Introduction

3.1.1 Gene Regulatory Networks in Molecular Biology

In everyday language the word “genes” has become synonymous with heredity. Nowadays children learn early that who we, and the living world around us, are is encoded in each and every cell of the organism. This, of course, is generally true.1 Indeed genes hold the information needed to build and maintain an organism’s cells and to pass genetic traits to offspring. However, thinking of a genome simply as a book full of facts about an organism paints a rather incomplete, static picture. It hides the fact that the “book” also contains instructions for the mechanisms through which genetic information is extracted and put to use in cellular processes, such as controlling the response of a cell to environmental signals and replicating the DNA before cell division. This process of extracting and utilizing genetic information is part of what is known as gene regulation. Gene regulation is essential for prokaryotic and eukaryotic cells, as well as for viruses, as it increases the cell’s flexibility in responding to the environment, allowing it to synthesize gene products (most often proteins) when needed.

Gene regulation is an intricate process whose complexity makes it an extreme challenge for mathematical modeling. For example, proteins synthesized from genes may control the flow of genetic information from DNA to mRNA, function as enzymes catalyzing metabolic reactions, or serve as components of signal transduction pathways. Gene regulation also drives the processes of cellular differentiation and morphogenesis, leading to the creation of different cell types in multicellular organisms, where the different types of cells may possess different gene expression profiles even though they all share the same genome sequence. The degradation of proteins and of intermediate RNA products can also be regulated in the cell. The proteins involved in these regulatory functions are themselves produced by other genes. This gives rise to a gene regulatory network (GRN) consisting of regulatory interactions between DNA, RNA, proteins, and small molecules. A simple GRN consists of one or more input genes, metabolic and signaling pathways, regulatory proteins that integrate the input signals, several target genes, and the RNA and proteins produced from the target genes.

The first discovery of a GRN is widely considered to be the 1961 identification of the lac operon by François Jacob and Jacques Monod [1], in which proteins involved in lactose metabolism are expressed by Escherichia coli and some other enteric bacteria only in the presence of lactose and absence of glucose. Since its discovery, the lac operon has often been used as a model system of gene regulation. Most mathematical models of the lac operon have been given as systems of differential equations (see Chapter 2 for examples), but discrete modeling frameworks are increasingly receiving attention for their use in offering global insights [2–4]. For examples of discrete mathematical models of a GRN, see the Boolean network models of the lac operon in E. coli in Chapter 1.

Boolean networks are particularly useful when one is interested in qualitative behavior. However, allowing each variable to assume only 0 and 1 as its values (often interpreted as ‘absent’ and ‘present’, or ‘off’ and ‘on’) does not always allow sufficient flexibility in model construction and can cause the loss of important information. Furthermore, few GRNs are as well understood as the lac operon, which prohibits model construction based purely on prior knowledge of the GRN. Technological advances in the life sciences, however, have triggered an enormous accumulation of experimental data representing the activities of the living cell. As a result, the type of mathematical modeling which starts from experimental data and builds a model with little additional information has become of interest. Such modeling methods are sometimes referred to as reverse engineering and are the topic of the following section.

3.1.2 Reverse Engineering of Gene Regulatory Networks

Reverse engineering is an approach to mathematical modeling in which a model is constructed from observations of a system in response to stimulus. The process of reverse engineering is similar to playing the parlor game Twenty Questions, in which one tries to discover a secret word by asking yes/no questions. While one can ask all possible feature-type questions (is it alive, is it a physical object), one can typically get away with far fewer questions. In fact, one wins if fewer than 20 questions are asked. In molecular biology, a biological network, such as our immune system’s response to infection, plays the role of the secret word and laboratory experiments take the place of the questions. The goal then is to discover the network through the experiments with the hope of gaining deeper insight into a particular phenomenon, such as seasonal effects on immunity. For reasons of cost, it is advantageous to be able to discover the network with as few experiments as possible.

Reverse engineering is a critical step in the systems biology paradigm that has pervaded the biological sciences in recent history. The modeling process starts with a biological system under study. Given a question or hypothesis about the system, the researcher designs experiments to probe the system in hopes of addressing the question or hypothesis. A model is then built from the data resulting from the experiments. Predictions or simulations are extracted from the model and then compared against the original system. The step “from data to a model” is one that uses reverse engineering. This paradigm differs from the previous “reductionist” paradigm that dominated twentieth century biology: the parts of a system were of importance for study in reductionism, whereas the system itself is of interest in systems biology.

Generally speaking, reverse engineering in systems biology aims to recover the topology and dynamics of a network from observations. A model is built to fit the given observations, and the topology and/or the dynamics of the network can be inferred from the model. Network topology refers to the physical structure of the network, that is, how the components in the network are connected. It is often encoded as a directed graph, or wiring diagram, where vertices represent the components of the network (genes, proteins, signaling molecules, etc.) and a directed edge is drawn between two vertices if there is an interaction between the associated components (regulation, synthesis, activation, etc.). Dynamics refers to the behavior of the network over time, that is, the time evolution of the network processes. The dynamics of a network are also depicted as a graph; in the case of finite dynamical systems, which are considered here, the graph is finite and directed. The so-called state space graph comprises vertices representing network states (n-tuples), and a directed edge is drawn from vertex A to vertex B if the network advances from state A to state B. See Section 3.2 for definitions.

Gene regulatory networks are often modeled using collections of mathematical functions in which each molecular component is assigned a function that formalizes the dynamics of the component [5,6]. From these functions the wiring diagram and state space can be constructed and the structure and behavior of the GRN can be analyzed. A challenge for molecular geneticists is to identify causal links in the network. Identification and control of these links are important first steps in repairing defects in regulation. While there is a growing amount of data being collected from such networks, control of GRNs requires knowledge of the topology and dynamics of the GRN.

Mathematical methods to reverse engineer GRNs are diverse and draw from statistics, graph theory, network theory, computational algebra, and dynamical systems [7,8]. The performance of these methods intrinsically depends on the amount and quality of data provided [9]. In practice, there are insufficient data to uniquely infer a model for a GRN, and the number of models that fit the data may be very large. An area of continual growth is the development of methods to select biologically feasible or likely models from a pool of candidate models. There are numerous strategies for model selection. For example, some methods restrict the space of likely wiring diagrams to those that have few inputs per vertex or whose in-degree distribution follows a power law, features which are consistent with what is believed about GRNs [10,11]. As it has also been observed that GRNs tend to be ordered systems with a limited number of regimes, there are methods which filter their results to only those state spaces with few steady states and short limit cycles (see Section 3.2 for definitions). Other aspects of GRNs, such as negative or positive feedback loops and oscillatory behavior, are also considered in model selection.

3.2 Polynomial Dynamical Systems (PDSs)

Different mathematical frameworks have been proposed for the modeling of GRNs, with continuous models using differential equations being the most common approach. While the behavior of a biological system may be seen as continuous in that change of concentration of biochemicals can be modeled with continuous functions, the technology to record observations of the system is not continuous and the available experimental data consist of collections of discrete instances of continuous processes. Furthermore, discrete models where a variable (e.g., a gene) could be in one of a finite number of states are more intuitive, phenomenological descriptions of GRNs and, at the same time, require less data to build. The framework of finite dynamical systems (FDS) provides the opportunity to model a GRN as a collection of variables that transition discretely from one state to the next.

Definition 3.1

A finite dynamical system of dimension n is a function $F = (f_1, \ldots, f_n)\colon S^n \to S^n$, where each $f_i\colon S^n \to S$ is called a local (or transition) function, and S is a finite set.

For each variable $x_i$, its local function $f_i$ determines the state of $x_i$ in the next iteration based on the current state of all variables, including possibly $x_i$ itself. If we further require that the state set S be a finite field, then a result in [12] guarantees that the local functions of an FDS can be expressed as polynomial functions. Working exclusively with polynomials facilitates the modeling process significantly, as we shall see in later sections. Fortunately, from basic abstract algebra we know that S can be given the structure of a finite field exactly when the cardinality of S is a power of a prime integer. An example of a finite field is the set of integers modulo p, denoted $\mathbb{Z}_p$, where p is prime. The elements of $\mathbb{Z}_p$ are the equivalence classes of remainders upon division by p, namely $[0], [1], \ldots, [p-1]$, and addition and multiplication of these classes correspond to addition and multiplication of representatives from the classes modulo p.

Example 3.1

The finite field $\mathbb{Z}_3$ has elements $[0]$, $[1]$, and $[2]$. We see that $[1] + [2] = [0]$ since $1 + 2 = 3 \equiv 0 \pmod{3}$. Similarly, $[2] \cdot [2] = [1]$ since $2 \cdot 2 = 4 \equiv 1 \pmod{3}$.

For simplicity, we will write m instead of $[m]$ to represent the elements of a finite field.
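To see the primality requirement in action, here is a small Python sketch (not from the text) that checks whether every nonzero element of the integers modulo m has a multiplicative inverse, which is the defining property that makes $\mathbb{Z}_m$ a field. It succeeds exactly when m is prime; for non-prime m the ring of integers modulo m is not a field (finite fields of non-prime, prime-power order do exist, but require a construction other than integers modulo m).

```python
def is_field_zm(m):
    """Check whether the ring of integers modulo m is a field,
    i.e., whether every nonzero element has a multiplicative inverse."""
    return all(
        any((a * b) % m == 1 for b in range(1, m))
        for a in range(1, m)
    )

print(is_field_zm(5))  # True: Z_5 is a field
print(is_field_zm(4))  # False: 2 has no inverse modulo 4
```

For instance, 2 · 3 = 6 ≡ 1 (mod 5), so 3 is the inverse of 2 in $\mathbb{Z}_5$, while 2 · b mod 4 only ever takes the values 0 and 2.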

If the state set of an FDS F is a finite field, we call F a polynomial dynamical system (PDS).

Example 3.2

Let $k = \mathbb{Z}_3$ and $F = (f_1, f_2)\colon k^2 \to k^2$ be a PDS with transition functions

image

The dimension of F is 2, and each of the two variables $x_1$ and $x_2$ can be in one of three states: 0, 1, or 2.

A key issue in the study of GRNs is understanding the relationships among genes, proteins, metabolites, etc. in the network. Thus an important characteristic of a PDS is the way its variables are “connected.” To visualize the network topology, a graph called a wiring diagram is used, where the vertices are labeled by the PDS variables and the directed edges signify the direction of interaction between two variables.

Definition 3.2

Let F be an n-dimensional PDS on variables $x_1, \ldots, x_n$. The wiring diagram (also known as a dependency graph) of F is a directed graph with vertex set $\{x_1, \ldots, x_n\}$. A directed edge is drawn from $x_i$ to $x_j$ if, and only if, $x_i$ is present in at least one polynomial term (with a nonzero coefficient) of the local function $f_j$.

Figure 3.1(a) shows the wiring diagram of the PDS from Example 3.2.

image

Figure 3.1 Wiring diagram (a) and state space graph (b) of the PDS from Example 3.2.

While the wiring diagram of a PDS provides only a static snapshot of the network topology with no reference to strength and timing of the interactions, such a graph carries very useful information about the biological system and several methods have been developed for its inference. Sections 3.5.1 and 3.5.2 present two such methods.

Just as a phase portrait is used to plot the simultaneous change of two or three variables in a system of differential equations [13], the dynamic properties of a PDS are captured in a graph called a state space graph. In fact, the dynamics of a PDS $F\colon k^n \to k^n$ are uniquely represented by its state space graph, which has $|k|^n$ vertices.

Definition 3.3

The state space graph of a PDS $F\colon k^n \to k^n$ on variables $x_1, \ldots, x_n$ is a directed graph whose vertex set is $k^n$ and which contains an edge directed from $\mathbf{a}$ to $\mathbf{b}$ if, and only if, $F(\mathbf{a}) = \mathbf{b}$, where $\mathbf{a}, \mathbf{b} \in k^n$.

The state space graph of the PDS from Example 3.2 is shown in Figure 3.1(b). For example, there is an arrow from state image to state image because image using modulo 3 arithmetic (since the field is $\mathbb{Z}_3$). The graph consists of $3^2 = 9$ vertices arranged into three connected components. Notice that image and image. Such points are called fixed points, and these two are the only fixed points for this PDS. Also notice that image, or graphically, image. This is called a limit cycle, and its length is two. Analogously, for any positive integer m, a limit cycle of length m arises whenever $F^m(\mathbf{a}) = \mathbf{a}$ for some state $\mathbf{a}$, where m is the smallest positive integer with this property. Fixed points can be considered limit cycles of length one. For this PDS, image is the only limit cycle of length larger than one.
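Since the transition functions of Example 3.2 are not reproduced here, the following Python sketch enumerates the state space of a hypothetical two-variable PDS over $\mathbb{Z}_3$ — the swap system $f_1 = x_2$, $f_2 = x_1$, chosen for illustration only — and extracts its fixed points and limit cycle lengths the same way one would for any PDS.

```python
from itertools import product

p, n = 3, 2  # field Z_3, two variables

# Hypothetical transition functions (NOT those of Example 3.2,
# which are not reproduced here): f1 = x2, f2 = x1.
def F(state):
    x1, x2 = state
    return (x2 % p, x1 % p)

states = list(product(range(p), repeat=n))  # all 3^2 = 9 states
edges = {s: F(s) for s in states}           # state space graph as a successor map

fixed_points = [s for s in states if F(s) == s]
print(fixed_points)  # [(0, 0), (1, 1), (2, 2)]

def cycle_of(s):
    """Iterate F from s and return the periodic part of the trajectory."""
    seen = []
    while s not in seen:
        seen.append(s)
        s = F(s)
    i = seen.index(s)
    return tuple(seen[i:])

cycle_lengths = sorted({len(cycle_of(s)) for s in states})
print(cycle_lengths)  # [1, 2]: fixed points and limit cycles of length two
```

For this system the three diagonal states are fixed points, and every other state lies on a limit cycle of length two.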

Exercise 3.1

Construct by hand the state space graphs and wiring diagrams for the following PDSs. Can a vertex in the state space graph have an out-degree greater than one, that is, can the vertex have more than one directed edge coming out of it? How about a vertex in the wiring diagram?

1. image, where

image

2. image, where

image

Exercise 3.2

Compute the fixed points of the PDS in Exercise 3.1 (1) algebraically. Compare your answers to what you found in the previous exercise. Hint: A fixed point will satisfy image and image.

image

Figure 3.2 The Gröbner fan of the ideal in Example 3.10 intersected with the standard 2-simplex. The image-axis is on the right, the image-axis on the left, and the image-axis at the top.

Exercise 3.3

In Exercise 3.1 (2), you saw that the PDS has no fixed points. If instead we consider the system over image, would you expect to see fixed points? Verify your answer using the web tool Analysis of Dynamic Algebraic Models (ADAM) at dvd.vbi.vt.edu/adam.html.

image

Figure 3.3 Diagram representing the SLC algorithm applied to the data of Example 3.12. The column on the right gives the corresponding Shannon’s entropy increasing at each consecutive level.

Exercise 3.4

Find three different PDSs over image that have the wiring diagram shown in Figure 3.5.

image

Figure 3.4 The complete weighted graph constructed from vector entries 1, 2, 7, 9, 10, 11. Only the edge weights of the outer edges are given.

image

Figure 3.5 Wiring diagram for Exercise 3.4.

Exercise 3.5

Let’s reverse engineer! Find a PDS in three variables over image that fits the data in Figure 3.6. Is the PDS unique? What if the entire state space were given?

image

Figure 3.6 Partial state space for Exercise 3.5.

Exercise 3.6

[14] Phosphorylation is the addition of a phosphate group to a protein or another organic molecule. It turns many protein enzymes on and off, thereby altering their function and activity. Suppose proteins A and B interact. Let image reflect the absence (0)/presence (1) of protein A, whereas image indicates whether protein B has already been phosphorylated (1) or not (0) before the interaction. Furthermore, we want image to indicate the absence/presence of protein A right after the interaction and image to indicate whether protein B has been phosphorylated.

1. Explain why image should depend on image only, while image must depend on both image and image.

2. Do you find the following Boolean model reasonable? Correct it if necessary.

image

3. Convert the above Boolean functions into polynomials over image.

Project 3.1

Study the articles [15,37] and construct the state spaces and wiring diagrams of the models.

Project 3.2

Study the article [16] and, without computing the entire state space, find the cycle structure of the following Boolean monomial dynamical system: image, where

image


Project 3.3

In [17], the authors describe an equivalence among three discrete modeling classes, namely logical models, bounded Petri nets, and PDSs. What is the significance of such an equivalence? In addressing this question, consider the following:

1. In their article, the authors describe logical models and bounded Petri nets. How easy is it to describe or construct these types of models?

2. What algebraic structure do they have?

3. How do their description and algebraic structure compare to those of PDSs?

4. Do you think it is easier to develop an algorithm for PDSs than for logical models or Petri nets?

3.3 Computational Algebra Preliminaries

In this section we introduce some basic concepts from computational algebra that we will need for the rest of the chapter. For a comprehensive introductory treatment of the subject, some excellent sources are [19–21].

In linear algebra we learn how to solve systems of linear equations using Gaussian elimination. However, that algorithm cannot be easily adapted to systems of polynomials of degree larger than one. Simultaneously solving systems of polynomial equations is needed in many areas, but a systematic computational method was not available until the 1960s, when Gröbner bases were introduced [22,23]. The role of a Gröbner basis is similar to that of a basis for a vector space, and the algorithm for finding one can be seen as a multivariate, nonlinear generalization of Gaussian elimination for linear systems.

An issue that arises when using multivariate polynomials is that polynomial division is generally not well defined. Polynomial division in one variable is always well defined in the sense that there is a unique remainder upon division of polynomials. This is because there is essentially a unique monomial ordering (see Definition 3.4), and the division of a polynomial f by a polynomial g proceeds by dividing the highest power of the variable in g into the highest power in f. In other words, the one-variable monomials are ordered by degree:

$1 < x < x^2 < x^3 < \cdots$

With multivariate polynomials, however, there is more than one way of ordering their monomials (terms) and thus of carrying out long division. So the remainder of long division is not uniquely determined in general.

Example 3.3

Consider the polynomials image and image. The remainder of dividing f by g is image or image, depending on whether we choose the initial monomial of g to be image or image, respectively.

As we will see below, multivariate polynomial division becomes well defined when using Gröbner bases.
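The phenomenon can be reproduced in a few lines. The Python sketch below (the polynomials are illustrative, not those of Example 3.3) implements multivariate division under lex order with x > y, representing a polynomial as a dictionary from exponent tuples to coefficients, and divides $f = x^2y + xy^2 + y^2$ by the pair $\{xy - 1,\ y^2 - 1\}$ taken in both orders. The divisors here are monic, so all arithmetic stays in the integers.

```python
def leading(f):
    """Leading (monomial, coefficient) of f under lex with x > y.
    Polynomials are dicts mapping exponent tuples to coefficients."""
    m = max(f)  # tuple comparison is exactly lex order
    return m, f[m]

def divides(a, b):
    """Does monomial a divide monomial b?"""
    return all(ai <= bi for ai, bi in zip(a, b))

def subtract_multiple(f, c, m, g):
    """Replace f by f - c * x^m * g, in place."""
    for mg, cg in g.items():
        key = tuple(e1 + e2 for e1, e2 in zip(m, mg))
        f[key] = f.get(key, 0) - c * cg
        if f[key] == 0:
            del f[key]

def remainder(f, divisors):
    """Multivariate division of f by an ordered list of (monic) divisors."""
    f, r = dict(f), {}
    while f:
        m, c = leading(f)
        for g in divisors:
            mg, cg = leading(g)
            if divides(mg, m):
                quot = tuple(e1 - e2 for e1, e2 in zip(m, mg))
                subtract_multiple(f, c // cg, quot, g)
                break
        else:  # leading term divisible by no divisor: move it to the remainder
            r[m] = c
            del f[m]
    return r

# f = x^2*y + x*y^2 + y^2, g1 = x*y - 1, g2 = y^2 - 1 (lex, x > y)
f  = {(2, 1): 1, (1, 2): 1, (0, 2): 1}
g1 = {(1, 1): 1, (0, 0): -1}
g2 = {(0, 2): 1, (0, 0): -1}

print(remainder(f, [g1, g2]))  # {(1, 0): 1, (0, 1): 1, (0, 0): 1}, i.e., x + y + 1
print(remainder(f, [g2, g1]))  # {(1, 0): 2, (0, 0): 1}, i.e., 2x + 1
```

The two remainders, $x + y + 1$ and $2x + 1$, differ even though the divisor set is the same — exactly the ambiguity that Gröbner bases remove.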

Let k be a field. Here, we focus on the finite fields $\mathbb{Z}_p$ where p is prime; however, most of the presented results are valid for other fields, such as $\mathbb{Q}$. Linear combinations over k of monomials of the form $x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}$, where $\alpha$ is the exponent n-tuple $(\alpha_1, \ldots, \alpha_n)$ with nonnegative entries, form a ring, called a polynomial ring and denoted $k[x_1, \ldots, x_n]$. For the purpose of polynomial division, it is necessary to be able to arrange the terms of a polynomial unambiguously in some order. This requires a total ordering on the monomials, i.e., for every pair of monomials $x^\alpha$ and $x^\beta$, exactly one of the following must be true:

$x^\alpha < x^\beta, \qquad x^\alpha = x^\beta, \qquad \text{or} \qquad x^\alpha > x^\beta.$

Taking into account the properties of the polynomial sum and product operations, the following definition emerges.

Definition 3.4

A monomial ordering on $k[x_1, \ldots, x_n]$ is any relation > on the set of monomials $x^\alpha$, $\alpha \in \mathbb{Z}_{\geq 0}^n$, satisfying:

1. > is a total ordering on the set of monomials.

2. If $x^\alpha > x^\beta$, then $x^{\alpha+\gamma} > x^{\beta+\gamma}$ for every $\gamma \in \mathbb{Z}_{\geq 0}^n$.

3. > is a well ordering, i.e., every nonempty set of monomials has a smallest element under >.

To simplify the classification of monomial orderings, we will assume throughout that $x_1 > x_2 > \cdots > x_n$ (or $x > y > z$).

One of the most popular monomial orderings is the lexicographic ordering (lex), which is analogous to the ordering of words in a dictionary and under which image. Another is the graded lexicographic ordering (grlex), which orders by total degree first, then “breaks ties” using the lexicographic ordering. Under the graded lexicographic ordering, image. Yet another very useful monomial ordering is the graded reverse lexicographic ordering (grevlex). Like grlex, it considers the total degree first, but to break ties it looks at the rightmost variable and favors the smaller power. For example, image but image. Algorithmic definitions are available in [19].
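The three orderings can be expressed as Python sort keys on exponent tuples; the grevlex key negates the reversed tuple so that, among monomials of equal total degree, the one with the smaller power in the rightmost variable wins. The example monomials below are our own, chosen so that the two graded orders disagree.

```python
def lex_key(m):
    """Sort key for lexicographic order on exponent tuples (x1 > x2 > ...)."""
    return m

def grlex_key(m):
    """Graded lex: total degree first, ties broken by lex."""
    return (sum(m), m)

def grevlex_key(m):
    """Graded reverse lex: total degree first; ties broken by the
    rightmost variable, favoring the smaller power."""
    return (sum(m), tuple(-e for e in reversed(m)))

# x1^2*x2*x3^2 and x1*x2^3*x3 both have total degree 5,
# but the two graded orders rank them differently.
a = (2, 1, 2)   # x1^2 * x2 * x3^2
b = (1, 3, 1)   # x1 * x2^3 * x3

print(max([a, b], key=grlex_key))    # (2, 1, 2): grlex prefers x1^2*x2*x3^2
print(max([a, b], key=grevlex_key))  # (1, 3, 1): grevlex prefers x1*x2^3*x3
```

Under grlex the tie is broken by comparing exponents left to right (2 > 1 in $x_1$), while under grevlex the smaller power of $x_3$ in the second monomial makes it the larger one.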

A monomial ordering can also be defined by a weight vector $w = (w_1, \ldots, w_n)$ in $\mathbb{R}^n$. We require that w have nonnegative coordinates in order for 1 always to be the smallest monomial. Fix a monomial ordering $>_\sigma$, such as lex. Then, for exponent tuples $\alpha$ and $\beta$, define $x^\alpha >_{w} x^\beta$ if and only if

$w \cdot \alpha > w \cdot \beta, \qquad \text{or} \qquad w \cdot \alpha = w \cdot \beta \ \text{and} \ x^\alpha >_\sigma x^\beta.$

For example, with image, weight vector image orders the monomials of polynomial image in the same way as the lexicographic order, while image performs the same monomial ordering as the graded lexicographic ordering.

However, even after fixing the monomial ordering, there is still a problem with multivariate polynomial division: when dividing a given polynomial by more than one polynomial, the outcome may depend on the order in which the division is carried out. Let $f, f_1, \ldots, f_s \in k[x_1, \ldots, x_n]$. The famous ideal membership problem is to determine whether there are polynomials $h_1, \ldots, h_s \in k[x_1, \ldots, x_n]$ such that $f = h_1 f_1 + \cdots + h_s f_s$. The following definition will help us state this in the language of abstract algebra and explains the name of the problem.

Definition 3.5

Let k be a field and let $f_1, \ldots, f_s \in k[x_1, \ldots, x_n]$. Then

$\langle f_1, \ldots, f_s \rangle = \left\{ \sum_{i=1}^{s} h_i f_i : h_1, \ldots, h_s \in k[x_1, \ldots, x_n] \right\}$

is called an ideal in $k[x_1, \ldots, x_n]$.

We know from abstract algebra that I is closed under addition and under multiplication by any polynomial in $k[x_1, \ldots, x_n]$. This is analogous to a subspace of a vector space, where the “vectors” are polynomials.

We ask whether f is an element of I. In general, even under a fixed monomial ordering, the order in which f is divided by the generating polynomials $f_1, \ldots, f_s$ affects the remainder; perhaps even more surprisingly, a nonzero remainder of division does not imply $f \notin I$. Moreover, the generating set $\{f_1, \ldots, f_s\}$ of the ideal I is not unique, but a special generating set $G$ can be selected so that the remainder of polynomial division of f by the polynomials in $G$, performed in any order, is zero if and only if f lies in I. A generating set with this property is called a Gröbner basis. A Gröbner basis exists for every ideal other than $\{0\}$ and for any fixed monomial ordering. Before we give a precise definition, we need to look at a special kind of ideal.

Definition 3.6

The initial ideal of an ideal $I \subseteq k[x_1, \ldots, x_n]$ with respect to a fixed monomial ordering > is denoted $\mathrm{in}_{>}(I)$ and is the ideal generated by the set of leading monomials (under the specified ordering) of the polynomials of I. That is, $\mathrm{in}_{>}(I) = \langle \mathrm{LM}(f) : f \in I \rangle$, where the leading (or initial) monomial $\mathrm{LM}(f)$ is the monomial of f that is ordered first under the specified monomial ordering. The monomials which do not lie in $\mathrm{in}_{>}(I)$ are called standard monomials.

Definition 3.7

Fix a monomial ordering >. A finite subset $G = \{g_1, \ldots, g_t\}$ of an ideal I is a Gröbner basis if $\langle \mathrm{LM}(g_1), \ldots, \mathrm{LM}(g_t) \rangle = \mathrm{in}_{>}(I)$.

A Gröbner basis for an ideal is necessarily a generating set for the ideal, but it need not be unique, even under a fixed monomial ordering. However, if we also require that for any two distinct elements $g, g' \in G$, no term of g is divisible by $\mathrm{LM}(g')$, then such a Gröbner basis $G$ is called reduced and is unique for a given ideal and monomial ordering >, provided the coefficient of $\mathrm{LM}(g)$ in g is 1 for each $g \in G$.

Example 3.4

Let image and image, and image in image under grlex order. Notice that image is not a Gröbner basis for I because image and so image but image. The Buchberger algorithm (or a related algorithm, such as the Buchberger–Möller algorithm [24]) generates the following grlex Gröbner basis: image.

Example 3.5

In the previous example, we can see that the leading monomials (disregarding coefficients) of the Gröbner basis are image, and image. So the initial ideal is generated by image, and image: any other leading monomial can be written in terms of these generators. Furthermore, we see that the standard monomials associated to the Gröbner basis are 1, x, and y.

3.4 Construction of the Model Space: A Reverse Engineering Algorithm

In this section, we consider the problem of constructing PDSs from a given set of data collected from a biological system on n components, called nodes.

Let k be a finite field. Let $\{\mathbf{s}_1, \ldots, \mathbf{s}_m\}$ be a set of m input states of an n-node system, and $\{\mathbf{t}_1, \ldots, \mathbf{t}_m\}$ a set of corresponding output states, where $\mathbf{s}_i, \mathbf{t}_i \in k^n$. Each of $\mathbf{s}_i$ and $\mathbf{t}_i$ has the form

image

where states in bold are elements of $k^n$ and italicized states are elements of k. We aim to find all PDSs $F = (f_1, \ldots, f_n)$ over k such that $F(\mathbf{s}_i) = \mathbf{t}_i$ for each $i = 1, \ldots, m$, where each $f_j \in k[x_1, \ldots, x_n]$. We proceed by considering the subproblem of finding all transition polynomials $f_j$ that satisfy

$f_j(\mathbf{s}_i) = t_{ij}, \quad i = 1, \ldots, m.$

Let p be the (prime) characteristic of k. The identity $a^p = a$ for each $a \in k$ imposes a degree restriction on polynomials when viewed as functions, namely that the degree in each variable may be taken to be less than p. This identity gives rise to the relation $x_j^p - x_j = 0$ for each $j = 1, \ldots, n$. Since we are interested in polynomials that can be treated as functions, the local functions $f_j$ are actually polynomials in the quotient ring $k[x_1, \ldots, x_n]/\langle x_1^p - x_1, \ldots, x_n^p - x_n \rangle$. This ring contains the polynomials that are the remainders of the elements of $k[x_1, \ldots, x_n]$ upon division by the degree relations, that is, polynomials whose degree in any variable is less than p. In essence, the quotient ring is constructed by “modding out” by the elements of the “divisor” ideal. Thus we aim to find all polynomials $f_j$ in this quotient ring that map each system state $\mathbf{s}_i$ to the node state $t_{ij}$.
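A small sketch of how the degree relations act in practice: for $e \geq p$, the relation $x^p = x$ lets us rewrite $x^e$ as $x^{e-(p-1)}$, and repeating this brings every exponent below p without changing the polynomial as a function on $\mathbb{Z}_p$. (Hypothetical code, with p = 3.)

```python
p = 3  # a prime; the field is Z_p

def reduce_exponent(e):
    """Use x^p = x to lower an exponent: for e >= p,
    x^e = x^(e - (p - 1)) as a function on Z_p."""
    while e >= p:
        e -= p - 1
    return e

# Check functional equality of x^e and x^reduce_exponent(e) over all of Z_p.
for e in range(10):
    r = reduce_exponent(e)
    assert r < p
    assert all(pow(x, e, p) == pow(x, r, p) for x in range(p))

print(reduce_exponent(5))  # 1: x^5 = x as a function on Z_3
```

For instance, over $\mathbb{Z}_3$ we have $2^5 = 32 \equiv 2 \pmod{3}$, matching $x^5 = x$.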

Since the remaining discussion assumes that the coefficient field has positive characteristic, we can simplify the notation. We will denote the quotient polynomial ring simply as $k[x_1, \ldots, x_n]$ and write its elements as polynomials in the usual representation. That is, while an element of the quotient ring is of the form $f + \langle x_1^p - x_1, \ldots, x_n^p - x_n \rangle$, we will instead write it as f, understanding that all ring elements have been reduced modulo the ideal $\langle x_1^p - x_1, \ldots, x_n^p - x_n \rangle$.

Let $F_j$ be the set of all polynomials that fit the data for node j as described above. We solve the problem similarly to solving a system of nonhomogeneous linear equations in that we construct a “particular” polynomial and a set of “homogeneous” polynomials. The “particular” polynomial is any polynomial that interpolates, or fits, the given data. There are numerous methods for constructing an interpolating polynomial function. Here we use a formula based on the ring-theoretic version of the Chinese Remainder Theorem:

$f_j = \sum_{i=1}^{m} t_{ij}\, r_i,$

where $r_i$ is a polynomial that evaluates to 1 on $\mathbf{s}_i$ and 0 on any other input, and $t_{ij}$ represents the j-th coordinate of the output state $\mathbf{t}_i$. Specifically,

image

where image is the first coordinate in which image and image. For a detailed description of the construction, see [25]. For the set of “homogeneous” polynomials, we use the Ideal–Variety Correspondence from algebraic geometry [19]. We view the input data $\{\mathbf{s}_1, \ldots, \mathbf{s}_m\}$ as a variety, that is, as a set of common roots of a system of polynomial equations. The Ideal–Variety Correspondence states how to construct the ideal of polynomials that vanish on the data. To each input state $\mathbf{s}_i$, for $i = 1, \ldots, m$, we associate the (maximal) ideal

$I_i = \langle x_1 - s_{i1}, x_2 - s_{i2}, \ldots, x_n - s_{in} \rangle.$

Each ideal $I_i$ is the set of polynomials that evaluate to 0 on the single data point $\mathbf{s}_i$. According to the correspondence, the intersection of these ideals contains all polynomials that vanish on the union of the input data. So the ideal of “homogeneous,” or vanishing, polynomials is

$I_1 \cap I_2 \cap \cdots \cap I_m,$

which we denote by I for simplicity. Therefore, the set $F_j$ is described by the coset $f_j + I = \{ f_j + h : h \in I \}$.

Example 3.6

Consider a 3-node system having the following sequence of states in $\mathbb{Z}_5^3$:

200
431
314
043

which we interpret as input states $\mathbf{s}_1 = (2,0,0)$, $\mathbf{s}_2 = (4,3,1)$, $\mathbf{s}_3 = (3,1,4)$ with corresponding output states $\mathbf{t}_1 = (4,3,1)$, $\mathbf{t}_2 = (3,1,4)$, $\mathbf{t}_3 = (0,4,3)$.

The corresponding ideals are

$I_1 = \langle x_1 - 2,\, x_2,\, x_3 \rangle, \quad I_2 = \langle x_1 - 4,\, x_2 - 3,\, x_3 - 1 \rangle, \quad I_3 = \langle x_1 - 3,\, x_2 - 1,\, x_3 - 4 \rangle.$

To compute the intersection I of the ideals, we can use the computer algebra system Macaulay 2 [26] with the following code:

R=ZZ/5[x1,x2,x3]

I1=ideal(x1-2,x2,x3)

I2=ideal(x1-4,x2-3,x3-1)

I3=ideal(x1-3,x2-1,x3-4)

I=intersect{I1,I2,I3}

Then I is computed as

image

Note that negative coefficients can be written as positive numbers in $\mathbb{Z}_5$: $-1 = 4$, $-2 = 3$, etc.

To find the function $f_1$ for node $x_1$ such that

$f_1(\mathbf{s}_i) = t_{i1}, \quad i = 1, 2, 3, \qquad (3.1)$

we compute the $r_i$ polynomials:

image

We find that

image

So the set $F_1$ of polynomials that fit the data for node $x_1$ is $f_1 + I$. Using the above strategy, we can find similar sets $F_2$ and $F_3$ for nodes $x_2$ and $x_3$, respectively, where image and image.
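The construction in this example can be checked numerically. The Python sketch below evaluates a particular polynomial built from separator polynomials — here in the standard form $r_i(\mathbf{x}) = \prod_l \big(1 - (x_l - s_{il})^{p-1}\big)$, which is 1 at $\mathbf{s}_i$ and 0 at every other point of $\mathbb{Z}_p^n$; this is a common variant, not necessarily the exact formula of [25] — against the data of Example 3.6.

```python
p = 5  # field Z_5, as in Example 3.6

# Input states s_i and corresponding output states t_i from Example 3.6.
inputs  = [(2, 0, 0), (4, 3, 1), (3, 1, 4)]
outputs = [(4, 3, 1), (3, 1, 4), (0, 4, 3)]

def separator(s, x):
    """r_i evaluated at x: 1 if x equals s, 0 on every other point of Z_p^n,
    since (x_l - s_l)^(p-1) is 0 or 1 over Z_p by Fermat's little theorem."""
    prod = 1
    for xl, sl in zip(x, s):
        prod = (prod * (1 - pow(xl - sl, p - 1, p))) % p
    return prod

def particular(j, x):
    """Evaluate an interpolating polynomial f_j for node j at state x."""
    return sum(outputs[i][j] * separator(s, x) for i, s in enumerate(inputs)) % p

# f_1 fits the data for node 1 (j = 0 in zero-based indexing):
print([particular(0, s) for s in inputs])  # [4, 3, 0]
```

Each node's polynomial reproduces its column of output values on the input states, as required.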

The data set in the previous example is called a time series, since the data points are considered to be successive observations. We call $F_1 \times \cdots \times F_n$ the model space associated to the given data, and any PDS $F = (f_1, \ldots, f_n)$ in the model space is called a model. Note that the ideal I of vanishing polynomials is independent of the node or particular function.

For the following exercises, consider the time series below over $\mathbb{Z}_3$, which contains five observations of a 3-node system.

1 1 1
2 0 1
2 0 0
0 2 2
0 2 2

Exercise 3.7

Compute the ideal of the data points using Macaulay 2 and the lex monomial order with the variable order image. Use the previous example as a reference. In Macaulay 2, polynomial rings allow an explicit choice of monomial ordering. For this exercise, you will use the following:

R=ZZ/3[x1,x2,x3,MonomialOrder=>Lex]

If none is given, then grevlex is used by default. Also, the order in which the variables are listed in the polynomial ring indicates the variable order. Is the ideal different under the grevlex order?

Exercise 3.8

Let image be a data set as in Example 3.6. A polynomial image that interpolates D can be constructed as

image

Compute an interpolating polynomial for each node using this interpolation formula and the one provided in the text above. Do they differ on the data? Do they differ on other data points?

Project 3.4

An easy calculation shows that there are $p^n$ states in $\mathbb{Z}_p^n$. From this we find that there are $p^{p^n}$ polynomials in the quotient ring $\mathbb{Z}_p[x_1, \ldots, x_n]/\langle x_1^p - x_1, \ldots, x_n^p - x_n \rangle$. How many n-dimensional PDSs are there over $\mathbb{Z}_p$? Given a data set D, how many models are there in the corresponding model space?

3.5 Model Selection

As we saw in the previous section, the model space consists of all PDSs that fit a given data set. The problem is now to select models from the space.

There are numerous strategies for model selection. One naive approach is to simply choose the interpolating polynomial constructed above. The problem, however, is that the predictions of a model depend on the form of the model. Specifically, the terms that appear in the polynomials “predict” the interactions among the nodes, as represented in the wiring diagram associated to the PDS. While it is possible that the chosen interpolation formula characterizes the physical setting well, it is more likely that the resulting model will have little to no predictive power. Therefore, strategies that incorporate appropriate priors are desirable (for example, see [27,28]).

It is believed that GRNs are sparse, in the sense that they have few (less than half of all possible) connections, and that their edges follow a power-law distribution [10,11]. A model selection strategy that makes use of these assumptions is one based on Gröbner bases. Given a set image of transition functions for a node, the method finds a polynomial in the set that contains no terms in I. This is done by computing the remainder of image upon division by the elements of I. As seen in Section 3.3, polynomial division is in general not well defined, in the sense that remainders need not be unique; it is well defined, however, with respect to a Gröbner basis. The strategy therefore requires that a Gröbner basis image for I be computed. Computing the remainder of image upon division by the elements of image then yields a unique polynomial, called the normal form of image with respect to image and denoted image. Given a monomial ordering image and a Gröbner basis image for I with respect to image, image is a model that fits the given data and is reduced with respect to the data.

Since Gröbner bases depend on the choice of monomial ordering, so do resulting models.

Example 3.7

In Example 3.6, the graded reverse lexicographic ordering, grevlex, with image will yield models involving the terms image, whereas the lex ordering with the same variable order results in models with the terms image. We can see this by doing the following in Macaulay 2 (comments are preceded by two dashes):

R=ZZ/5[x1,x2,x3] --grevlex order is assumed
I1=ideal(x1-2,x2,x3) --same ideals as above
I2=ideal(x1-4,x2-3,x3-1)
I3=ideal(x1-3,x2-1,x3-4)
I=intersect{I1,I2,I3}
f1=x1^2+x1+3
f2=3*x1^2+x1+4
f3=2*x1^2+2*x1+4

The next lines compute each f “modulo” I; this is the same as image, where image is the reduced Gröbner basis for I with respect to the monomial order given in the definition of the polynomial ring.

 f1%I

 f2%I

 f3%I

We see that image, and image, where image is the Gröbner basis for I with respect to grevlex.

Now let’s change the order to lex. Since the f polynomials and I were defined as objects in the ring R, we need to “tell” Macaulay 2 to consider them as objects in a new ring with a different monomial order.

S=ZZ/5[x1,x2,x3, MonomialOrder=>Lex]

 sub(f1,S)%sub(I,S)

 sub(f2,S)%sub(I,S)

 sub(f3,S)%sub(I,S)

This time, we see that while image and image are the same as the normal forms with the grevlex order, here we have that image, where image is the Gröbner basis for I with respect to lex.

To see the explicit Gröbner basis that is being used, you can type

 gens gb I

which returns the generators of the Gröbner basis as a matrix. To get the generators in list form, type

 flatten entries gens gb I
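
For readers without Macaulay 2 at hand, the order dependence of normal forms can be reproduced with a small division routine in plain Python. The sketch below uses a toy ideal, the vanishing ideal of the points (0,0), (1,1), (2,1) in F_3^2, not the ideal from Example 3.7, and the marked reduced Gröbner bases for the two lex variable orders are supplied by hand rather than computed:

```python
# Polynomials over F_p as dicts mapping exponent tuples (e_x, e_y) to
# nonzero coefficients.  Toy ideal: I = I({(0,0), (1,1), (2,1)}) in F_3[x,y].

def divides(a, b):
    return all(i <= j for i, j in zip(a, b))

def normal_form(f, G, p, key):
    """Remainder of f on division by the basis G; this is the unique
    normal form when G is a Groebner basis for the order given by key."""
    f, r = dict(f), {}
    while f:
        lm = max(f, key=key)            # current lead monomial
        lc = f.pop(lm)
        for g in G:
            glm = max(g, key=key)
            if divides(glm, lm):
                # subtract the multiple of g that cancels the lead term
                shift = tuple(i - j for i, j in zip(lm, glm))
                c = lc * pow(g[glm], p - 2, p) % p
                for e, ce in g.items():
                    if e == glm:
                        continue
                    e2 = tuple(i + j for i, j in zip(e, shift))
                    val = (f.get(e2, 0) - c * ce) % p
                    f[e2] = val
                    if val == 0:
                        del f[e2]
                break
        else:
            r[lm] = lc                  # lead term reducible by no element
    return r

lex_xy = lambda e: e                    # lex with x > y
lex_yx = lambda e: (e[1], e[0])         # lex with y > x

# Reduced GB of I under lex y > x: {y - x^2, x^3 - x}
G_yx = [{(2, 0): 2, (0, 1): 1}, {(3, 0): 1, (1, 0): 2}]
# Reduced GB of I under lex x > y: {x^2 - y, x*y - x, y^2 - y}
G_xy = [{(2, 0): 1, (0, 1): 2}, {(1, 1): 1, (1, 0): 2}, {(0, 2): 1, (0, 1): 2}]

y = {(0, 1): 1}
nf1 = normal_form(y, G_yx, 3, lex_yx)   # y reduces to x^2
nf2 = normal_form(y, G_xy, 3, lex_xy)   # y is already in normal form
```

The polynomial y has normal form x^2 under one variable order and y under the other, illustrating why the selected model depends on the ordering.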

Exercise 3.9

Based on your work in Exercise 3.7, which variables do you expect to be present in the normal forms? For example, do you expect image to appear in any normal form?

Exercise 3.10

Use Macaulay 2 to compute the normal forms of the polynomials you found in Exercise 3.8 with respect to the ideals you found in Exercise 3.7. Are the resulting polynomials different when computed using lex, grlex, or grevlex?

Exercise 3.11

Compute the set of standard monomials for the monomial orders considered in Exercise 3.7. You can use the following command in Macaulay 2 to assist you:

 leadTerm I

(This is the same as leadTerm (gb I) since a Gröbner basis is required.) What is the relationship between these monomials and the terms in the normal forms computed in Exercise 3.10?

3.5.1 Preprocessing: Minimal Sets Algorithm

We saw in the previous section that the choice of monomial ordering affects which model is selected from the model space. One way to minimize the effect of this choice is to minimize the number of variables in the ambient ring. We again restrict our attention to finding functions image for a node image. Consider the data set image as above and let image be the set of polynomials that fit the data. Next we describe the Minimal Sets Algorithm (MSA), which finds the smallest sets of variables such that a polynomial in image in those variables exists (see [18] for more details).

Definition 3.8

Let image and image be as above. Then image is a minimal set if image and removal of any variable from M renders the intersection empty.

Minimal Sets Algorithm:

1. Input image.

2. Partition the input states image according to the output values image. For each output value image, let image. Then the partition is given by image.

3. Compute the square-free monomial ideal M generated by the monomials image for all image and image in which image. The monomials image encode the coordinates in which the input states image and image differ.

4. Compute the primary decomposition of M; that is, write M as the irredundant intersection image of primary ideals.

5. Return the generating sets of the associated minimal prime ideals image.

The generating sets of the ideals image are the sought-after minimal sets (see Corollary 4 of [18]).
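
Steps 2–5 can be sketched in pure Python. For a squarefree monomial ideal, the minimal primes in the decomposition are generated exactly by the inclusion-minimal “hitting sets” of the supports of the generators, so no primary-decomposition machinery is needed in a sketch; the example data below are hypothetical:

```python
from itertools import combinations

def minimal_sets(inputs, outputs):
    """Minimal Sets Algorithm sketch: the minimal variable sets that
    distinguish every pair of input states with different outputs."""
    n = len(inputs[0])
    # Steps 2-3: coordinates in which states with different outputs differ
    diffs = [frozenset(j for j in range(n) if s[j] != t[j])
             for (s, b), (t, c) in combinations(zip(inputs, outputs), 2)
             if b != c]
    # Steps 4-5: minimal primes of the squarefree monomial ideal
    # = inclusion-minimal hitting sets of the difference supports
    found = []
    for r in range(n + 1):
        for cand in combinations(range(n), r):
            cset = set(cand)
            if all(cset & d for d in diffs) and \
               not any(h <= cset for h in found):
                found.append(frozenset(cset))
    return sorted(sorted(j + 1 for j in h) for h in found)  # 1-indexed

# hypothetical data: three states of a 2-variable system
print(minimal_sets([(0, 0), (1, 0), (0, 1)], [0, 1, 0]))   # [[1]]
```

Here the data for the node can be explained by x1 alone, so the only minimal set is {x1}; if all three outputs were distinct, both variables would be needed.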

Theorem 3.1

Let image be as above. The minimal sets for image are the generating sets of the minimal primes in the primary decomposition of the monomial ideal M as constructed above.

Example 3.8

Consider the data in Example 3.6. The minimal sets for image are image, and image, meaning that there is a polynomial function in one variable that fits the data for image. In this case the minimal sets are the same for nodes 2 and 3.

Essentially, the MSA computes the intersection of all possible “minimal” wiring diagrams. In the above example, the minimal sets for image are image, and image, which indicates that all minimally described functions that fit the data for image will be in terms of just one variable.

As the above example demonstrates, there are typically many minimal sets for a node. Next we introduce a method to score minimal sets, based on the sparseness assumption.

Let image be the set of minimal sets for a node image. Define image to be the number of sets in image of cardinality s and image to be the number of sets X in image of cardinality s such that image. Then we score each variable image using the function

image

and each set using

image

We can obtain a probability distribution on image by dividing the set scores by image. The best-performing minimal set is the one with the highest normalized set score.

Example 3.9

Suppose we have the following minimal sets for a given node in a 6-node system: image. Then the scores are 1, 7/24, 1/8, and 7/108, respectively. In this case, the singleton set image has the highest score. Even though image and image are the same size, image has a higher set score than image since the variables image and image have higher variable scores than image, namely, 1/2, 7/12, and 1/4, respectively.

For other scoring strategies, see [18]. Note that the strategy presented here corresponds to the image method in that paper.

Exercise 3.12

Using the time series immediately preceding Exercise 3.7, compute the minimal sets for each node.

Exercise 3.13

Why can the following not be minimal sets?

1. image

2. image

3. image

4. image


Exercise 3.14

While sets of standard monomials depend on the choice of monomial order, minimal sets do not. What is the relationship between a minimal set and a set of standard monomials?

Project 3.5

One can sometimes extract information about a data set from the structure (regularity or irregularity in the distribution) of a given collection of minimal sets. What can you infer from the following minimal sets for a 5-dimensional system? Each collection of minimal sets has a different corresponding data set. Look for patterns in occurrences of variables in the minimal sets and explore what these patterns say about the corresponding data set.

1. image. Hint: What does it mean that image is in every minimal set? That every minimal set has the same variable together with any other variable (except image)? That image is not present?

2. image. Hint: What does it mean that image appears in a minimal set by itself?


3.5.2 Postprocessing: The Gröbner Fan Method

Recall from Section 3.5 that Gröbner bases depend on the choice of monomial ordering and so do resulting models. In order to avoid ambiguity in model selection, we would like to be able to (1) identify all monomial orderings that cause the Buchberger algorithm to produce different reduced Gröbner bases; (2) study the effect of these monomial orderings on the model dynamics; and (3) select the best model according to some criteria.

A combinatorial structure that contains information about all reduced Gröbner bases of an ideal, and thus fulfills item (1) from the above list, is the Gröbner fan of the ideal. It is a polyhedral complex of cones, each corresponding to an initial ideal; the cones are in one-to-one correspondence with the marked reduced Gröbner bases of the ideal, where “marked” means that the initial term of each generating polynomial is distinguished. Here we give a brief introduction to the concept of the Gröbner fan. For details see, for example, [29].

A fixed polynomial ideal has only finitely many distinct reduced Gröbner bases. Informally, the reason is that most monomial orderings differ only in high degree, and the Buchberger algorithm for Gröbner basis computation does not “see” the difference among them. The reduced Gröbner bases themselves, however, may vary greatly in the number of polynomials and in their shape. In order to classify them, we first present a convenient way to define monomial orderings using vectors. Again, we think of a polynomial in image as a linear combination of monomials of the form image over a field k, where image is the n-tuple of exponents image.

Definition 3.9

Suppose that image is a vector with real coefficients. It can be used to define an ordering on the elements of image by image if and only if image.
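
Weight-vector comparison is easy to express in code. Note that a single weight vector gives only a partial order when two monomials have equal weight, so the sketch below breaks ties lexicographically; that tie-break is an added convention of this sketch, not part of the definition:

```python
def weight_key(e, w):
    """Sort key for an exponent tuple e: compare by the dot product w.e,
    breaking ties lexicographically (an added convention)."""
    return (sum(wi * ei for wi, ei in zip(w, e)), e)

# exponent tuples for z^3, xyz, and x^2 in variables (x, y, z)
mons = [(0, 0, 3), (1, 1, 1), (2, 0, 0)]
ordered = sorted(mons, key=lambda e: weight_key(e, (2, 1, 1)), reverse=True)
print(ordered)   # x^2 and xyz both have weight 4; z^3 has weight 3
```

Under the weight vector (2, 1, 1), x^2 and xyz tie at weight 4 and the tie-break decides their relative order, which is why weight vectors in the interior of a Gröbner cone, where no relevant ties occur, are the convenient representatives.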

We are now almost ready to give a definition of the Gröbner fan. We first define its building blocks, the Gröbner cones.

Definition 3.10

Let image be a marked reduced Gröbner basis for an ideal I. Write each polynomial of the basis as image, where image is the initial term in image. The cone of image is image.

The collection of all cones for a given ideal is the Gröbner fan of that ideal. The cones are in bijection with the marked reduced Gröbner bases of the ideal. Since reducing a polynomial modulo an ideal I, as the reverse engineering algorithm requires, can have at most as many outputs as the number of marked reduced Gröbner bases, it follows that the Gröbner fan contains information about all Gröbner bases (and thus all monomial orderings) that need to be considered in the process of model selection. For an example of utilizing the information encoded in the Gröbner fan of an ideal for reverse engineering of PDSs, see [30].

There are algorithms based on the Gröbner fan that enumerate all marked reduced Gröbner bases of a polynomial ideal. An excellent implementation of such an algorithm is the software package Gfan [31].

Example 3.10

Consider the ideal

image

This ideal has three distinct marked reduced Gröbner bases, image (below), that correspond to the given weight vectors (monomial orderings), image. In other words, the Gröbner fan of I consists of three cones and each of the given weight vectors is an element of a different cone.

image

One can compute the first Gröbner basis, for instance, using the following Macaulay 2 code. Note that the ideal generators do not have to be a Gröbner basis.

R=ZZ/3[x,y,z, MonomialOrder=>{Weights=>{2,1,1}}]
I=ideal(z^2-z, y^2-y, x*z+y*z-x-y-z+1, x*y-y*z, x^2-x)
flatten entries gens gb I

The following are the normal forms of the polynomial image, calculated with respect to the Gröbner bases and monomial orderings above:

image

The software package Gfan and others implement a special type of Gröbner basis conversion known as the Gröbner walk [32]. Unless the dimension of the fan is low or the number of its cones happens to be small, however, computing the entire fan is computationally expensive, and some of its components do not have polynomial bounds. The following example illustrates how infeasible computing the entire Gröbner fan can be.

Example 3.11

For illustrative purposes, we will work over image in this example. The Gröbner fan of the ideal

image

has 360 full-dimensional cones. The intersection of the fan with the standard simplex in image is shown in Figure 3.2 [31].

Fortunately, in order to compute polynomial normal forms, the only information we need to extract from the Gröbner cones of a fan is their corresponding reduced Gröbner bases and/or their relative volumes, where “relative” refers to the cone’s volume when bounded by an n-sphere centered at the cone’s vertex. While computing the exact relative cone volumes requires knowing the facets of the fan, that is, the fan itself, approximating the relative volumes is in many cases sufficient [33]. A stochastic method for estimating the relative volumes of the Gröbner cones of a fan without computing the actual fan, together with a Macaulay 2 implementation for uniform sampling from the Gröbner fan, is presented in [34].

Exercise 3.15

In Example 3.10, the weight vector image generated Gröbner basis image. Can you find a different weight vector that produces the same Gröbner basis?

Exercise 3.16

Which one of the three Gröbner bases in Example 3.10 corresponds to the lexicographic monomial ordering?

Exercise 3.17

Let us construct a stochastic PDS. Let image be a Boolean PDS that fits a data set. Let I be the ideal of functions that vanish on the data points. Suppose 20% of all monomial orderings generated the following normal form of F with respect to I:

image

and 80% of the monomial orderings generated

image

Draw the state space and wiring diagram of this stochastic PDS, labeling the edges with the corresponding probabilities.

Project 3.6

Consider the four time series in Section 4.3 of [30]. Find the ideal of polynomials that vanish on the series and, using the software package Gfan [31], compute its Gröbner fan.

3.6 Discretization

For reasons explained at the beginning of Section 3.2, we have been assuming that the experimental data we use for reverse engineering have already been discretized into a (small) finite number of states. Typically, however, experimental measurements come to us as floating point numbers; consequently, data discretization is in fact part of the modeling process and can be viewed as a preprocessing step. We will use the definition of discretization presented in [35].

Definition 3.11

A discretization of a real-valued vector image is an integer-valued vector image with the following properties:

1. Each element of image is in the set image for some (usually small) positive integer D, called the degree of the discretization.

2. For all image, we have image if and only if image.

Without loss of generality, assume that image is sorted, i.e., for all image. Spanning discretizations of degree D satisfy the additional property that the smallest element of image is equal to 0 and that the largest element of image is equal to image.
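
As a quick check of the definition, the rank map, replacing each entry by the rank of its value among the distinct values, satisfies both properties and is spanning. The sketch below is an illustration rather than a recommended method, since it uses one discrete state per distinct value, usually far more states than one wants in practice:

```python
def spanning_discretization(v):
    """Map each entry to the rank of its value among the distinct values.
    Equal entries get equal states (property 2), the smallest value maps
    to 0 and the largest to D-1 (spanning), where D = #distinct values."""
    levels = {x: i for i, x in enumerate(sorted(set(v)))}
    return [levels[x] for x in v]

print(spanning_discretization([0.3, 1.1, 0.3, 2.8]))   # [0, 1, 0, 2]
```

Every discretization in the sense of Definition 3.11 can be obtained from this maximal one by merging adjacent levels, which is what the methods discussed below do in various ways.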

There is no universal discretization method that works for all data sets and all purposes. Sometimes discretization is a straightforward process. For example, if a gene expression time series has a sigmoidal shape, e.g., image, it is reasonable to discretize it as image. More complicated expression profiles may be easy to discretize too, and the human eye is often the best discretization “tool,” with an ability to discern patterns that no software reproduces. Large data sets, on the other hand, do require some level of automation in the discretization process. Regardless of the particular situation, it is good practice to look at the data first and explore for any patterns that may help with the discretization before feeding the data into a discretization algorithm. The way you then choose to discretize your data, including the choice of the number of discrete states, should depend on the type and amount of data and on the specific reason for discretizing. Below we present several possible approaches, which by no means comprise a complete list.

Binary discretizations are the simplest way of discretizing data, used, for instance, for the construction of Boolean network models for gene regulatory networks [36,37]. The expression data are discretized into only two qualitative states as either present or absent. An obvious drawback of binary discretization is that labeling real-valued data according to a present/absent scheme may cause the loss of large amounts of information.

Interval discretizations divide the interval image into k equally sized bins, where k is user-defined. Another simple method is quantile discretization, which places image (possibly duplicated) values in each bin [38]. Methods based on these two approaches suffer from problems that make them inappropriate for some data sets. Interval discretizations are very sensitive to outliers and may produce a strongly skewed range [39]. In addition, some discretization levels may not be represented at all, which causes difficulties with their interpretation as part of the state space of a discrete model.

On the other hand, quantile discretizations depend only on the ordering of the observed values of image and not on their relative spacing. Since the distance between data points is often the only information that comes with short time series, losing it is very undesirable. A shortcoming common to interval and quantile discretizations, as well as to most other discretization methods, is that they require the number of discrete states, k, to be user-provided.
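
The two binning schemes just described can be sketched as follows (function names and example data are invented for illustration); note how the outlier dominates the interval bins while quantile binning ignores the spacing entirely:

```python
def interval_disc(v, k):
    """k equal-width bins over [min(v), max(v)]."""
    lo, hi = min(v), max(v)
    width = (hi - lo) / k or 1.0          # guard against a constant vector
    return [min(int((x - lo) / width), k - 1) for x in v]

def quantile_disc(v, k):
    """Roughly len(v)/k values per bin, assigned by rank."""
    out = [0] * len(v)
    for pos, i in enumerate(sorted(range(len(v)), key=lambda i: v[i])):
        out[i] = min(pos * k // len(v), k - 1)
    return out

data = [0.1, 0.2, 0.3, 9.0]
print(interval_disc(data, 2))   # [0, 0, 0, 1] -- outlier skews the bins
print(quantile_disc(data, 2))   # [0, 0, 1, 1] -- spacing is ignored
```

On this data the outlier 9.0 pulls all other values into a single interval bin, while quantile binning splits 0.2 and 0.3 despite their closeness, exactly the two failure modes discussed above.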

A number of entropy-based discretization methods deserve attention. An example is Hartemink’s Information-Preserving Discretization (IPD) [35]. It relies on minimizing the loss of pairwise mutual information between each pair of real-valued vectors (variables). The mutual information between two random variables X and Y with joint distribution image and marginal distributions image and image is defined as

image

Note that if X and Y are independent, by definition of independence image, so image. Unfortunately, when modeling regulatory networks and having as variables, for instance, mRNA, protein, and metabolite concentrations, the joint distribution function is rarely known and it is often hard to determine whether two variables are independent.
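
Over already-discretized vectors, mutual information can be estimated from empirical frequencies. The sketch below is a generic empirical estimator, not an implementation of IPD itself:

```python
from math import log2
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two discretized vectors,
    using observed frequencies as the joint and marginal distributions."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))   # 0.0 (independent)
print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))   # 1.0 (identical)
```

IPD would compare such estimates before and after a candidate merge of discretization levels and prefer merges that lose little mutual information.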

Another family of discretization techniques is based on clustering [40]. One of the most often used clustering algorithms is k-means [41]. It is a non-hierarchical clustering procedure whose goal is to minimize dissimilarity between elements within each cluster while maximizing it between elements in different clusters. Many applications of k-means clustering, such as the MultiExperiment Viewer [42], start by taking a random partition of the elements into k clusters and computing their centroids. As a consequence, a different clustering may be obtained every time the algorithm is run. Another inconvenience is that the number k of clusters to be formed has to be specified in advance. Although there are methods for choosing “the best k,” such as the one described in [43], they rely on some knowledge of the data properties that may not be available.

Another method is single-link clustering (SLC) with the Euclidean distance function. SLC is a divisive (top-down) hierarchical clustering that defines the distance between two clusters as the minimal distance between any two objects belonging to different clusters [40]. In the context of discretization, these objects are the real-valued entries of the vector to be discretized, and the distance between two entries v and w is the one-dimensional Euclidean distance image. Top-down clustering algorithms start from the entire data set and iteratively split it until either the degree of similarity reaches a certain threshold or every group consists of a single object. For the purpose of data analysis, it is impractical to let the clustering algorithm produce clusters containing only one real value. The iteration at which the algorithm is terminated is therefore crucial, since it determines the degree of the discretization. SLC with the Euclidean distance function has one major advantage: very little starting information is needed (only distances between points). In addition, being a hierarchical clustering procedure, it lends itself to adjustment in case clusters need to be split or merged. It may result, however, in a discretization where most of the points are clustered into a single partition if they happen to be relatively close to one another. Another problem with SLC is that its direct implementation takes D, the desired number of discrete states, as an input. We would like to choose D as small as possible without losing information about the system dynamics and the correlation between the variables, so an essentially arbitrary choice is unsatisfactory.
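
For a one-dimensional vector, single-link clustering with the Euclidean distance has a particularly simple form: sort the values and cut the k−1 widest gaps, which reproduces the single-link dendrogram level with k clusters. The sketch below takes the number of clusters k as input, which is precisely the inconvenience discussed above:

```python
def slc_1d(values, k):
    """1-D single-link clustering: sort the values, then cut the k-1
    widest gaps between consecutive values."""
    v = sorted(values)
    # gap i sits between v[i-1] and v[i]; order gaps by width
    gaps = sorted(range(1, len(v)), key=lambda i: v[i] - v[i - 1])
    cuts = sorted(gaps[-(k - 1):]) if k > 1 else []
    clusters, start = [], 0
    for c in cuts:
        clusters.append(v[start:c])
        start = c
    clusters.append(v[start:])
    return clusters

print(slc_1d([1.1, 0.2, 1.0, 0.1], 2))   # [[0.1, 0.2], [1.0, 1.1]]
```

Each resulting cluster would then become one discrete state, with all entries in a cluster mapped to the same value.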

In [44], a hybrid method for discretization of short time series of experimental data into a finite number of states was introduced. It is a modification of the SLC algorithm: it begins by discretizing a vector in the same way as SLC does but instead of providing D as part of the input, the algorithm contains termination criteria which determine the appropriate value of D. After that each discrete state is checked for information content and if it is determined that this content can be considerably increased by further discretization, then the state is separated into two states in a way that may not be consistent with SLC.

If more than one vector is to be discretized, the algorithm discretizes each vector independently. (For details on multiple vector discretization, see [44].) If the vector contains m distinct entries, a complete weighted graph on m vertices is constructed, where a vertex represents an entry and an edge weight is the Euclidean distance between its endpoints. The discretization process proceeds by deleting the edge(s) of highest weight until the graph becomes disconnected. If more than one edge is labeled with the current highest weight, then all edges of this weight are deleted. The order in which the edges are removed leads to components in which the distance between any two vertices is smaller than the distance between any two components, a requirement of SLC. We define the distance between two components G and H to be image. The output of the algorithm is a discretization of the vector in which each component corresponds to a discrete state, and the vector entries that belong to one component are discretized into the same state.

Example 3.12

Suppose that vector image is to be discretized. The diagram obtained by the SLC algorithm is given in Figure 3.3. The complete weighted graph in Figure 3.4 corresponds to iteration 0 of the diagram.

Having disconnected the graph, the next task is to determine if the obtained degree of discretization is sufficient; if not, the components need to be further disconnected in a similar manner to obtain a finer discretization. A component is further disconnected if, and only if, both (1) and (2) below hold:

1. The minimum vertex degree of the component is less than the number of its vertices minus 1. The contrary implies that the component is a complete graph by itself, i.e., the distance between its minimum and maximum vertices is smaller than the distance between the component and any other component.

2. One of the following three conditions is satisfied (“disconnect further” criteria):

a. The average edge weight of the component is greater than half the average edge weight of the complete graph.

b. The distance between its smallest and largest vertices is greater than or equal to half this distance in the complete graph. For the complete graph, the distance is the graph’s highest weight.

c. Finally, if the above two conditions fail, a third one is applied: disconnect the component if it leads to a substantial increase in the information content carried by the discretized vector.

The result of applying only the first two criteria is analogous to SLC clustering, with the important property that the algorithm chooses the appropriate level at which to terminate. Applying the third condition, the information measure criterion, may, however, result in a clustering that is inconsistent with any iteration of the SLC.

Exercise 3.18

Consider the following vector image. Plot the vector entries on the real number line and propose an appropriate discretization based on your intuition. How did you decide on the number of discrete states? Would your decision change if the entry 4.6 were added?

Exercise 3.19

Discretize vector image from Exercise 3.18 using the specified method.

1. Interval.

2. Quantile.

3. The hybrid method presented in this section.


Exercise 3.20

If you know that the experimental data are very noisy, would you use a smaller or a larger number of discrete states?

Exercise 3.21

Suppose a second vector, image, is to be discretized and it is known that image and image are strongly correlated (assume Spearman rank correlation). How would you discretize image based on your discretization of image?

Project 3.7

Use the principle of relationship discretization behind Hartemink’s information-preserving discretization method to discretize image and image.

References

1. Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol. 1961;3(3):318–356.

2. Setty Y, Mayo A, Surette M, Alon U. Detailed map of a cis-regulatory input function. PNAS. 2003;100(13):7702–7707.

3. Mayo A, Setty Y, Shavit S, Zaslaver A, Alon U. Plasticity of the cis-regulatory input function of a gene. PLoS Biology. 2006;4(4):0555–0561.

4. Stigler B, Veliz-Cuba A. Network topology as a driver of bistability in the lac operon. J Comput Biol. 2008;18(6):783–794.

5. de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67–103.

6. Bornholdt S. Less is more in modeling large genetic networks. Science. 2005;310:449–450.

7. Brazhnik P, de la Fuente A, Mendes P. Gene networks: how to put the function in genomics. Trends Biotechnol. 2002;20(11):467–472.

8. Gardner T, Faith J. Reverse-engineering transcription control networks. Phys Life Rev. 2005;2:65–88.

9. Bansal M, Belcastro V, Ambesi-Impiombato A, di Bernardo D. How to infer gene networks from expression profiles. Mol Sys Biol. 2007;3(78):1–10.

10. Burda Z, Krzywicki A, Martin OC, Zagorski M. Distribution of essential interactions in model gene regulatory networks under mutation-selection balance. Phys Rev E. 2010;82.

11. Yeh H-Y, Cheng S-W, Lin Y-C, Yeh C-Y, Lin S-F, Soo V-W. Identifying significant genetic regulatory networks in the prostate cancer from microarray data based on transcription factor analysis and conditional independency. BMC Med Genom. 2009;2.

12. Lidl R, Niederreiter H. Finite fields. Encyclopedia of mathematics and its applications, vol. 20. Cambridge University Press; 1997.

13. Strogatz S. Non-linear dynamics and chaos: with applications to physics, biology, chemistry and engineering. Perseus Books 2000.

14. Termanini A, Tieri P, Franceschi C. Encoding the states of interacting proteins to facilitate biological pathways reconstruction. Biol Direct. 2010;5.

15. MacLean D, Studholme DJ. A boolean model of the pseudomonas syringae hrp regulon predicts a tightly regulated system. PLoS ONE. 2010;5.

16. Colon-Reyes O, Laubenbacher R, Pareigis B. Boolean monomial dynamical systems. Ann Comb. 2004;8:425–439.

17. Veliz-Cuba A, Jarrah A, Laubenbacher R. Polynomial algebra of discrete models in systems biology. Bioinformatics. 2010;26(13):1637–1643.

18. Jarrah A, Laubenbacher R, Stigler B, Stillman M. Reverse-engineering of polynomial dynamical systems. Adv Appl Math. 2007;39(4):477–489.

19. Cox D, Little J, O’Shea D. Ideals, varieties, and algorithms. New York: Springer-Verlag; 1997.

20. Adams WW, Loustaunau P. An introduction to Gröbner bases. Graduate studies in mathematics, vol. 3. American Mathematical Society; 1994.

21. Eisenbud D. Commutative algebra with a view toward algebraic geometry. Graduate texts in mathematics. New York: Springer; 1995.

22. Buchberger B. An algorithm for finding a basis for the residue class ring of a zero-dimensional polynomial ideal. PhD Thesis, University of Innsbruck; 1965.

23. Hironaka H. Resolution of singularities of an algebraic variety over a field of characteristic zero: I. Ann Math. 1964;79(1):109–203.

24. Buchberger B, Möller HM. The construction of multivariate polynomials with preassigned zeros. In: Calmet J, editor. Proceedings of the European computer algebra conference EURO CAM’82. LNCS, vol. 144. Springer; 1982. p. 24–31.

25. Laubenbacher R, Stigler B. A computational algebra approach to the reverse engineering of gene regulatory networks. J Theor Biol. 2004;229:523–537.

26. Grayson D, Stillman M. Macaulay 2, a software system for research in algebraic geometry; 2006. http://www.math.uiuc.edu/Macaulay2.

27. Gustafsson M, Hornquist M, Lombardi A. Constructing and analyzing a large-scale gene-to-gene regulatory network Lasso-constrained inference and biological validation. IEEE/ACM Trans Comput Biol Bioinform. 2005;2(3):254–261.

28. Thorsson V, Hornquist M, Siegel A, Hood L. Reverse engineering galactose regulation in yeast through model selection. Stat Appl Genet Mol Biol. 2005;4(1):1–22.

29. Mora T, Robbiano L. Gröbner fan of an ideal. J Symb Comput. 1988;6(2/3):183–208.

30. Dimitrova ES, Jarrah AS, Laubenbacher R, Stigler B. A Gröbner fan method for biochemical network modeling. In: Proceedings of the international symposium on symbolic and algebraic computation (ISSAC). ACM 2007;122–126.

31. Fukuda K, Anders J, Thomas R. Computing Gröbner fans. Math Comput. 2007;76(260):2189–2212.

32. Collart S, Kalkbrener M, Mall D. Converting bases with the Gröbner walk. J Symb Comput. 1997;24:465–469.

33. Dimitrova ES, Garcia-Puente LD, Hinkelmann F, et al. Parameter estimation for Boolean models of biological networks. J Theor Comput Sci. 2011;412:2816–2826.

34. Dimitrova ES. Estimating the relative volumes of the cones in a Gröbner fan. Math Comput Sci. 2010;3(4):457–466.

35. Hartemink A. Principled computational methods for the validation and discovery of genetic regulatory networks. PhD Dissertation, Massachusetts Institute of Technology; 2001.

36. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969;22:437–467.

37. Albert R, Othmer H. The topology of the regulatory interactions predict the expression pattern of the segment polarity genes in Drosophila melanogaster. J Theor Biol. 2003;223:1–18.

38. Dougherty J, Kohavi R, Sahami M. Supervised and unsupervised discretization of continuous features. In: Prieditis A, Russell S, editors. Machine learning: proceedings of the 12th International Conference; 1995.

39. Catlett J. Megainduction: machine learning on very large databases. PhD Dissertation. University of Sydney. 1991.

40. Jain A, Dubes R. Algorithms for Clustering Data. Prentice Hall 1988.

41. MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley symposium of mathematical statistics and probability, vol. 1; 1967. p. 281–97.

42. Saeed A, Sharov V, White J, et al. TM4: a free, open-source system for microarray data management and analysis. BioTechniques. 2003;34(2):374–378.

43. Crescenzi M, Giuliani A. The main biological determinants of tumor line taxonomy elucidated by principal component analysis of microarray data. FEBS Lett. 2001;507:114–118.

44. Dimitrova ES, McGee J, Laubenbacher R, Vera Licona P. Discretization of time series data. J Comput Biol. 2010;17(6):853–868.


1If we overlook the distinction between genes and their alleles and the fact that some organelles are self-replicating and are not coded for by the organism’s DNA.
