Chapter 8. Metabolic Pathways Analysis: A Linear Algebraic Approach

Terrell L. Hodge, Department of Mathematics and College of Arts and Sciences Dean’s Office, Western Michigan University, Kalamazoo, MI 49008, USA

8.1 Introduction

To quote from a well-known biochemistry textbook [1], “Metabolism is the overall process through which living systems acquire and utilize free energy to carry out their various functions.” Metabolism is enacted through metabolic pathways: chains of consecutive enzymatic reactions that produce specific products for use by an organism. As explored by biologists and biochemists, there are hundreds of such “chains of reactions” fitting together in many complex (and sometimes not well-understood) ways. For example, the single bacterium Escherichia coli is known to have 600–700 metabolic reactions. Standard (bio)chemical diagrams of such systems of reactions for multiple cellular reactions can easily take up wall-sized charts across multiple walls. To get a sample of this (with relatively uncomplicated diagrams) in the case of E. coli, go to the KEGG reference pathway site http://www.genome.jp/kegg/pathway/map/map01100.html. Similar diagrams exist for human and animal cell metabolism, with even more complexity; a portion is shown below in Figure 8.1. The metabolites in a metabolic pathway are usually taken to be the substrates, intermediates, and products in a chain of reactions.

Figure 8.1 A representation of a portion of the metabolic pathways for human and animal cell metabolism, from http://www.genome.jp/kegg-bin/show_pathway?org_name=map&mapno=01100&mapscale=1.0&show_description=show.

So why the interest? Cellular metabolism is the complex set of chemical reactions that enable a cell to extract energy and other necessities for life from nutrients, and to build the new structures it needs to live and to reproduce. While it may not provide a metaphysical answer to the question “Why are we alive?” metabolism certainly provides a physical answer to the question of how our cells, and hence ourselves, are able to exist, to grow, and, ultimately, what fails and results in death. The study of cellular metabolism is at the heart of numerous questions and basic research about health, such as aging, and on the emerging sidelines as a consideration for others, such as autism. The degree of interconnectedness of our bodies and our environment, through metabolic interactions in the cells of our gut and numerous other cells¹ that we host there, has emerged as a hot research topic, via the study of the microbiome (resp., metabolome), like a biosphere of the gut (resp., a complete profile at a metabolic level). Such research suggests that a broad spectrum of modern diseases, such as diabetes, may be the result of having metabolic processes that are not functioning properly, perhaps due to a lack or imbalance of what were evolutionarily fine-tuned contributions of these non-human cells in our germ-killing, antibiotic-laden, antibacterial-soap world. See, e.g., [2] for a recent accessible introduction.

Even setting aside such new speculations, metabolic processes have been adjusted and tinkered with profitably for some time, and not only for treatment of disease. Metabolic engineering is defined as directed modification of cellular metabolism and properties through the introduction, deletion, and modification of metabolic pathways by using recombinant DNA and other molecular biological tools. Currently, green alternatives for many compounds produced chemically using oil are being sought, as well as more green methods. Biochemical production methods of forcing biological organisms or components to overproduce certain desired compounds are one such alternative, achievable through metabolic engineering. Some of the organisms used as production hosts include E. coli, Mycobacterium tuberculosis,² and Saccharomyces cerevisiae (yeast). Most of us are quite familiar with the benefits of yeast metabolism, but, as another example, biochemical engineering processes “feed” glucose and corn steep liquor to E. coli or other bacteria and generate, through metabolic processes, succinic acid, a precursor to production of pharmaceuticals, fine chemicals, biodegradable polymers, and more. A goal is to understand better the host of metabolic pathways in organisms, and use this knowledge to increase flux through helpful reactions (so produce more output) or even to discover previously unsuspected reaction chains that might produce the desired metabolites in some other fashion. Helpful models would also allow us to test alternative hypotheses, say by computer, more cheaply than running multiple experiments, and might give insight into which types of experiments would be the most useful. One would also hope to obtain a more global perspective, a systems perspective that, for example, allows one to see and predict the effects of multiple interconnected reactions at a less reductionist level than reaction by reaction.

How might one explore and understand these interconnected reactions, this biochemical reaction network, in a systematic and computational way? What does it mean to have such a system (beyond drawing cartoons)? In order for the cell, and hence the body as a whole, to be in a living, thriving state, it must generally be able to maintain some balance (homeostasis). Each reaction will transform a fixed set of inputs into a fixed set of outputs, but the “flow” or “flux” through a reaction describes how that transformation or flow through the reaction is occurring. If one focuses on a portion of these biochemical reactions that form a (sub)network of interest, then in a balanced state, the total concentration of all chemical compounds in that (sub)system is not changing. In such a state, what are the chains of reactions for which the “total flux”, the combined measure of flux in all the reactions, is not changing, i.e., is 0? It is the exploration of this query and setup mathematically, through applying standard tools from linear algebra to the so-called stoichiometry matrix of the reaction system, that will be the focus of this chapter. In this way, we have an opportunity to see how a mathematical model for metabolic networks, and certain pathways within them, can be constructed. Mathematical models like this have the potential to help us understand, clarify, and make predictions about the very complex inner workings of cellular processes in common to all creatures, as studied by clinicians and doctors, biologists, chemists, engineers, and many others.

8.2 Biochemical Reaction Networks, Metabolic Pathways, and the Stoichiometry Matrix

8.2.1 Stoichiometric Matrix I: Nullspaces, Linear Dependence, and Spanning Sets

Suppose one wishes to consider a finite sequence of chemical reactions, involving m chemical compounds $X_1, X_2, \ldots, X_m$.

Recording a reaction in standard chemistry notation, each reaction would take the form

$c_{i_1} X_{i_1} + \cdots + c_{i_k} X_{i_k} \longrightarrow c_{j_1} X_{j_1} + \cdots + c_{j_l} X_{j_l}$ (8.1)

That is, in a given equation in the list, some of the possible compounds will be used (the $X_i$s from among the possible choices $X_1, \ldots, X_m$) in some amounts (the numbers $c_i$, one for each compound involved in the reaction). Let’s consider this concretely in some steps of glycolysis, a cellular metabolic pathway that is part of the even more complex pathway of cellular respiration. For example, a possible first step in the metabolic pathway of glycolysis in humans is the glucokinase reaction:

$\mathrm{GLC} + \mathrm{ATP} \longrightarrow \mathrm{G6P} + \mathrm{ADP}$ (8.2)

In reaction (8.2), the biochemical molecules/compounds that are input, i.e., the substrates GLC and ATP, and those that are output, i.e., the products G6P and ADP, are called metabolites. The addition of ATP yielding ADP is a step repeated again in glycolysis and is very important: it is essentially a way for the cell to release and recapture energy in a controlled fashion. Alternatively, it’s not uncommon for biologists to think of the reaction step (8.2) simply as

$\mathrm{GLC} \longrightarrow \mathrm{G6P},$

that is, glucose is transformed, in the presence of ATP, into glucose 6-phosphate, with ADP being tossed off in the process. In this view, the reaction captures glucose for the cell, transforming it into a product that cannot wander back off outside the cell wall, whence further reactions in glycolysis continue its transformation. Such shifting of perspective, e.g., from (8.2) to just $\mathrm{GLC} \longrightarrow \mathrm{G6P}$, is a common step in modeling (bio)chemical reaction systems: what constitutes a “metabolite” depends upon what one’s interest in the process is. For this reason, we will use the term metabolite to mean any compound of interest involved in a biochemical reaction, a looser use than that of biologists.

Generally, in each biochemical reaction step in a metabolic system, the reaction is enzymatic, in that enzymes, special proteins, catalyze these biochemical reactions. The hexokinases are enzymes that carry out this step from glucose to glucose 6-phosphate. Figure 8.2 [3] shows a particular hexokinase, glucokinase. Beyond its importance in glycolysis, functional mutations in glucokinase have been linked to diabetes, so there is special interest in targeting this enzyme. (As for many proteins, incorrect folding can lead to disease.)

Figure 8.2 The enzyme glucokinase, from http://commons.wikimedia.org/wiki/File:Glucokinase-1GLK.png?useFormat=mobile.

By many common ways of describing glycolysis, there are 9 further reactions in the glycolysis pathway, with the total of 10 reactions involving a total of 17 (different) compounds as metabolites (some as substrates, some as products, and most as both). For a movie that illustrates this process step-by-step, along with the reaction chain of gluconeogenesis that produces glucose, rather than breaking it down, do see http://wps.prenhall.com/esm_horton_biochemistry_4/37/9594/2456197.cw/-/2456228/index.htm. For a static picture, see Figure 8.3 [4]; the reader is encouraged to look on the Web or in texts like [1] for others to see a multiplicity of perspectives.

Figure 8.3 A representation of glycolysis, from http://upload.wikimedia.org/wikipedia/commons/archive/a/a0/20091114111506%21Glycolysis.svg.

Coming back to the reactions themselves, if we were to label the compounds with the generic variables above, we would need 10 chemical equations in 17 variables, $X_1, \ldots, X_{17}$. Two additional examples of these chemical reactions involved in glycolysis are:

(8.3)

and

(8.4)

where (if you care to know)

Now, each (bio)chemical equation (8.1) can be rewritten mathematically as follows:

$c_{i_1} X_{i_1} + \cdots + c_{i_k} X_{i_k} = c_{j_1} X_{j_1} + \cdots + c_{j_l} X_{j_l}$ (8.5)

Necessarily, each $c_i \geq 0$, since $c_i$ is the number of molecules of the metabolite $X_i$. Equivalently, the equality (8.5) can be rewritten as

$-c_{i_1} X_{i_1} - \cdots - c_{i_k} X_{i_k} + c_{j_1} X_{j_1} + \cdots + c_{j_l} X_{j_l} = 0$ (8.6)

with the convention that the number $c_i$ of molecules of the metabolite $X_i$ is altered by a minus sign when $X_i$ is a substrate (an “input,” or a metabolite “consumed by the reaction”), and taken to be positive when $X_i$ is a product (an “output,” or a metabolite “produced by the reaction”). For example, placing Eq. (8.2) in the form Eq. (8.6) gives

$-\mathrm{GLC} - \mathrm{ATP} + \mathrm{G6P} + \mathrm{ADP} = 0$ (8.7)

It is not uncommon for the coefficients of these (mathematically written) reaction equations to be simply $0$ or $\pm 1$; in fact, equations arising from biochemical networks³ can often be represented in this form.

Exercise 8.1

Rewrite Eq. (8.3) in the form of Eq. (8.7) and call the result Eq. (8.8).

(8.8)

Now do the same for Eq. (8.4), and call the result Eq. (8.9).

(8.9)

The data in the three Eqs. (8.7)–(8.9) can be encoded in a matrix, a stoichiometric matrix, which we now define.

Definition 8.1

The stoichiometric matrix, for a system of n biochemical reactions (8.1), is an $m \times n$ matrix $S$, where m is the total number of metabolites in the n equations. The stoichiometric matrix S has exactly one row for each metabolite appearing in the system. Each column vector of the matrix S corresponds to a reaction, by recording the coefficients of the metabolites taken from the reaction equation in the form (8.6). If a metabolite does not appear in a reaction, the corresponding column entry is taken to be 0.
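As a rough sketch of this definition in code, the helper below builds a stoichiometric matrix column by column from a list of reactions. The function name `stoichiometric_matrix` and the dictionary encoding of a reaction are illustrative assumptions, not notation from the chapter, and the row order is simply the order of first appearance, which need not match the labeling chosen in Example 8.1. Only the glucokinase reaction (8.2) is encoded here.

```python
def stoichiometric_matrix(reactions):
    """Return (metabolites, S): one row per metabolite, one column per reaction.

    Each reaction is a pair (substrates, products) of dicts mapping a
    metabolite name to its (nonnegative) number of molecules.
    This encoding is a hypothetical convenience, not the chapter's notation.
    """
    metabolites = []
    for substrates, products in reactions:
        for name in list(substrates) + list(products):
            if name not in metabolites:
                metabolites.append(name)

    # Start from all zeros: a metabolite absent from a reaction gets entry 0.
    S = [[0] * len(reactions) for _ in metabolites]
    for j, (substrates, products) in enumerate(reactions):
        for name, count in substrates.items():
            S[metabolites.index(name)][j] -= count  # consumed: minus sign
        for name, count in products.items():
            S[metabolites.index(name)][j] += count  # produced: plus sign
    return metabolites, S


# Glucokinase step: GLC + ATP -> G6P + ADP
glucokinase = ({"GLC": 1, "ATP": 1}, {"G6P": 1, "ADP": 1})
metabolites, S = stoichiometric_matrix([glucokinase])

for name, row in zip(metabolites, S):
    print(name, row)
```

Appending further (substrates, products) pairs to the list adds further columns, and any metabolite that first appears in a later reaction adds a new row.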

Example 8.1

In Eqs. (8.7)–(8.9), there are a total of 9 metabolites. Thus, S will have $m = 9$ rows, each corresponding to a metabolite, and $n = 3$ columns, each corresponding to an equation in the form of Eq. (8.6). Letting

then the first reaction, Eq. (8.7), gives rise to the first column

Exercise 8.2

Use Eqs. (8.8) and (8.9) to fill in the remaining two columns of the 9 by 3 stoichiometric matrix S for the system given by (8.7)–(8.9):

Exercise 8.3

a. The stoichiometric matrix S you found in Exercise 8.2 corresponds to a system of only some of the equations needed to represent the full glycolysis network. What would be the shape of the stoichiometric matrix for the full glycolysis network?

b. As analyzed in [6], the full metabolic network of the human red blood cell involves 51 reactions (and their “fluxes,” in biochemical parlance) and 29 metabolites. What would be the shape of the stoichiometric matrix S for this metabolic system⁴?

As one mathematical model,⁵ one can also attempt to capture metabolic systems, and biochemical reaction networks in general, pictorially, through graphs. In these graphs, there is one vertex (i.e., node) for every metabolite. Edges join two metabolites if they are linked in a (bio)chemical reaction, and a value associated to each edge is the “flux,” a measure of the level of activity through the reaction (or rate at which the reaction is occurring). The direction of an arrow represents the direction of the reaction, and reversible reactions are pictured through oppositely oriented double edges (see Figure 8.4). A large picture of multiple metabolic networks was given earlier (Figure 8.1); a simpler example of such a graph as given in [6] is pictured as graph (A) in Figure 8.4.

Figure 8.4 A graphical example from [6] of a possible metabolic network (graph A), also shown with (re)labelings for internal and external fluxes added as per conventions in [6] (graph B).

Such a graph as graph (A) in Figure 8.4 might correspond to looking at only part of a larger system (just as glycolysis appears in the much larger network of cellular metabolism). In this case, one can draw a boundary around the relevant part of the system, and include arrows to represent flows into or out of the subsystem from the larger system, pictured as graph⁶ (B) of Figure 8.4. There will be internal fluxes, corresponding to reactions in the particular system under consideration, and external fluxes, those involving inputs or outputs from parts of the system not under direct focus, but that, given the connectedness of systems, cannot be ignored. In [6], it is argued that a useful biochemical convention is to first represent the exchange fluxes as arrows “going out” of the boundary, even if the direction in a chemical reaction sense is the opposite. Hence, one sees the arrow conventions and labelings⁷ of internal fluxes (by $v_i$) and external fluxes (by $b_i$) in (B) of Figure 8.4.

For each vertex in the graph of the biochemical reaction network, one may write a “balance” or “node” equation in the internal and external fluxes. In this, the formal sum of fluxes “going in” to a fixed node must equal the formal sum of fluxes “going out.” Thus, using the second of the two graphs above (graph (B)) at node A, one obtains

at node B, one obtains

Alternatively, one can assign the following conventions: label the flux for each incoming edge as positive, and label the flux for each outgoing edge negative; then the two node/balance equations above can be rewritten as:

(8.10)

(8.11)

Exercise 8.4

Complete the list of node/balance equations begun above, i.e.,

(8.12)

In the form of Eq. (8.12) above, the sequence of balance/node equations corresponds to a homogeneous linear system in 11 variables, namely the flux variables (internal and external). There is one equation for each node.

Exercise 8.5

Write down the coefficient matrix for the homogeneous linear system described in Exercise 8.4.

Wonderfully, the coefficient matrix you just found is just the stoichiometric matrix S for the system! Writing $v$ for the total flux vector (whose entries are the internal fluxes, as before, together with the external fluxes), the homogeneous system described by the node/balance equations has the form

$Sv = 0$ (8.13)

This time, however, we have obtained S by focusing on the rows, instead of the columns.

Exercise 8.6

How many reaction equations occur in the system described by this graph/stoichiometric matrix S?

Recall that the nullspace N(S) of S is the set of all solutions to the homogeneous system (8.13). Equation (8.13) is interpretable as a statement of conservation of mass [6]. Each vector v in the nullspace N(S) describes the relative distribution of “fluxes,” and the variable entries of each flux vector v give values that represent the activity of the individual reactions, indicated by their flow rates. Thus, the product Sv assigns the flux throughout the entire metabolic system represented by S [6, p. 4194]. Flux vectors v satisfying $Sv = 0$ correspond to steady-state solutions to a “dynamic mass-balance equation” $\frac{dx}{dt} = Sv$, where $x = (x_1, \ldots, x_m)$ for $x_i$ the concentration of the $i$th metabolite (compound $X_i$), $\frac{dx_i}{dt}$ is the change in concentration of the $i$th metabolite, and, as before, $v_j$ is the rate (“flux”) of reaction $j$.
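To make the nullspace computation concrete, here is a minimal sketch in plain Python using exact fractions. The two-node network is a hypothetical toy (it is not the network of Figure 8.4), with flux order [v1, v2, b1, b2]: v1 and v2 are two internal routes from A to B, b1 feeds A, and b2 drains B. The sign choices for b1 and b2 are made here for simplicity and do not follow the exchange-flux convention of [6]. The function names are likewise assumptions for illustration.

```python
from fractions import Fraction


def rref(matrix):
    """Return (reduced row echelon form, list of pivot columns)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivots, r = [], 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot here: column c is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows, pivots


def nullspace_basis(matrix):
    """One basis vector per free variable, read off from the rref."""
    rows, pivots = rref(matrix)
    ncols = len(matrix[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        v = [Fraction(0)] * ncols
        v[f] = Fraction(1)
        for r, p in enumerate(pivots):
            v[p] = -rows[r][f]
        basis.append(v)
    return basis


def times(matrix, v):
    """Matrix-vector product S v."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]


# Hypothetical toy network, flux order [v1, v2, b1, b2].
S = [[-1, -1, 1, 0],   # node A
     [1, 1, 0, -1]]    # node B

basis = nullspace_basis(S)
for v in basis:
    print([str(x) for x in v], "S v =", [str(x) for x in times(S, v)])
```

Each printed vector satisfies Sv = 0, so it is a steady-state flux distribution for the toy network in the sense described above.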

Exercise 8.7

a. Let . Show that .

b. Represent the total flux vector v from part (a) by shading in Figure 8.5 the corresponding edges of the graph which have nonzero fluxes. (Recall that a zero entry means there is no flux, hence no involvement, of a particular reaction.) What do you notice about the resulting walk in the graph?

Figure 8.5 Graph for Exercise 8.7 (b).

c. Let . Repeat the instructions for parts (a) and (b) above, using Figure 8.5.

d. Check that v and are linearly independent vectors.

e. Using your observations from parts (a) to (d), find and sketch the graphical interpretation of another total flux vector, , which is linearly independent from v and (that is, show is a linearly independent set). Use Figure 8.5.

f. Now, find a total flux vector w for which form a linearly dependent set. Do you have a graphical interpretation for w?

Exercise 8.8

a. Find the entire space N(S) for the stoichiometric matrix S you found in Exercise 8.5. Before beginning, make a prediction as to the number of free variables you will find, and try to complete the rest of the accompanying statement below. My prediction for the number of free variables is: _____, and the relation to the minimal number of vectors spanning N(S) is: _____. Express your solution as linear combinations of vectors, with coefficients the free variables, that is, express N(S) as a span of a set of vectors. (Feel free to use your calculator here to find the rref of S.)

b. Argue (without a lot of computation) that the spanning set of vectors for N(S) that you found in part (a) forms a linearly independent set.

c. As in Exercise 8.7(b), give the graphical interpretation of each element of the spanning set for N(S) by using as many copies of Figure 8.5 as needed. Explain whether these results are consistent with your expectations formed from Exercise 8.7. (We will explore this outcome further in some later problems.)

Exercise 8.9

a. For the stoichiometric matrix S as in Exercise 8.2, find the nullspace N(S) of S.

b. Then, explain the biochemical significance of the result of part (a).

c. Can you hazard a guess also as to the graphical significance of the result of part (a)? [Hint: Take a look at the glycolysis network represented in Figure 8.3.]

d. Would you expect the same result as in part (a) if you used instead the stoichiometric matrix S for the entire metabolic network given by glycolysis? Be as specific as you can about the results you would expect, without doing any actual computations.

8.2.2 More on the Nullspace of the Stoichiometric Matrix: Spanning with Biochemical Pathways and Base Changing

Exercise 8.10

The vectors (total flux vectors v) you found in Exercise 8.8, spanning the nullspace N(S) of the stoichiometry matrix S, include two which have at least one negative entry in a variable associated to an internal reaction. List these two vectors.

Since any reversible reaction was already broken down graphically into a pair of double edges (oppositely oriented), and otherwise the arrows of internal reactions correspond to directions of the associated chemical equations, a negative value on an internal flux gives a somewhat nonsensical interpretation. Flux vectors with this problem (one or more negative flux distributions for internal reactions) represent biochemically impossible outcomes, while flux vectors without this problem represent chemically feasible pathways through the metabolic system [6]. This corresponds to what you have seen pictorially in Exercise 8.8(c).
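The feasibility test described here is easy to state in code. The sketch below uses a hypothetical two-node toy network (not the one in Figure 8.4) with flux order [v1, v2, b1, b2], where v1 and v2 are internal; the names `in_nullspace`, `is_feasible`, and `INTERNAL` are assumptions for illustration. It shows a nullspace vector with a negative internal flux, and how adding another nullspace vector can remove the problem.

```python
# Hypothetical toy network, flux order [v1, v2, b1, b2].
S = [[-1, -1, 1, 0],   # node A
     [1, 1, 0, -1]]    # node B
INTERNAL = [0, 1]      # positions of the internal fluxes v1, v2


def in_nullspace(S, v):
    """True when S v = 0, i.e., v is a steady-state flux vector."""
    return all(sum(a * x for a, x in zip(row, v)) == 0 for row in S)


def is_feasible(v, internal):
    """True when no internal flux is negative."""
    return all(v[i] >= 0 for i in internal)


u = [-1, 1, 0, 0]   # in N(S), but v1 = -1 runs an internal reaction backward
w = [1, 0, 1, 1]    # in N(S), all internal fluxes nonnegative
u_plus_w = [a + b for a, b in zip(u, w)]

for name, vec in [("u", u), ("w", w), ("u + w", u_plus_w)]:
    print(name, vec, in_nullspace(S, vec), is_feasible(vec, INTERNAL))
```

Replacing u by u + w keeps the span unchanged while removing the negative internal flux, which is the kind of adjustment that base-changing makes systematic.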

Consequently, one goal of [6] is to find a biologically legitimate spanning set of flux vectors v for the nullspace N(S) of a stoichiometric matrix S, so that each v corresponds to a biochemically valid pathway through the metabolic system described by S. The mechanism employed in [6] is base-changing, as we shall now discuss. First:

Exercise 8.11

a. Recall the definition of a basis of a vector space here.

b. List a basis for the vector space N(S), for the stoichiometric matrix S you found in Exercise 8.5.

In seeking a “good basis” for the nullspace N(S) of a stoichiometric matrix S, the authors of [6] look for a set of total flux vectors that simultaneously:

1. all represent biochemically valid pathways,

2. span N(S), that is, each possible flux vector in N(S) is a linear combination of the basis vectors, and

3. form a linearly independent set: no one flux vector in the basis set (or its graphical interpretation) can be expressed nontrivially as a linear combination of the remaining ones.

Exercise 8.12

A quick check on your understanding: Is the basis you found for Exercise 8.11 a “good basis”, in the sense above? Is a “good basis” a basis?

To create a “good basis” as in [6], the authors further require the following conditions be satisfied by a set of total flux vectors for the nullspace N(S) of any stoichiometric matrix S:

4. All coordinates corresponding to internal fluxes will be positive, that is,

5. For the coordinates corresponding to exchange fluxes values,

Let

(8.14)

where the dashed line in B simply represents a partition of B into two blocks, but can otherwise be ignored. Set and take . We claim is a basis for the nullspace N(S) as in Exercise 8.11.⁸ Writing down the matrix B, instead of the list , is just a compact way to represent this basis. (The basis is still really the set of columns of B, though!) The entries in each part of the columns above the dashed line in B correspond to the internal flux variables , while the entries below the dashed line correspond to the external flux variables . Observe that the basis given by B is not a “good basis”.

Exercise 8.13

a. Set , and . Write out a matrix P with columns . (Use a partitioning scheme like that for B above.) Let .

b. Argue that spans N(S), for S as in Exercises 8.5 and 8.8.

c. Show that is a set of linearly independent vectors.

d. Parts (b) and (c) show form a basis of N(S). Finally, check that satisfies the additional properties required to make a “good,” that is, biochemically feasible, “basis” for N(S), as per the added conditions from [6] listed previously.

e. Illustrate the paths corresponding to the total flux vectors given by the basis vectors using copies of Figure 8.5.

f. Find a nontrivial total flux vector which does not correspond to a basis vector in , and represent it as a linear combination of paths (i.e., total flux vectors) from .

The basis found above is an example of “base-changing” from a mathematically valid basis of N(S) to one that is mathematically and biochemically valid. Questions to ask at this juncture include:

Is it always possible to find a mathematically valid basis for the nullspace N(S) of a stoichiometric matrix S? If so, is it always possible to find a biochemically valid basis, starting from a mathematically valid basis of the nullspace of a stoichiometric matrix S? The first we can answer; the second is a research question which we will illustrate in another example.

Changing bases of a vector space can also be viewed in terms of “changing coordinates.” A set of basis vectors $\mathcal{B} = \{b_1, \ldots, b_k\}$ for an (arbitrary) vector space W can be viewed as a set of “coordinates” for W. This literally means that any vector $w \in W$ has a unique expression as a linear combination in terms of the elements of $\mathcal{B}$. More precisely, there are real numbers $c_1, \ldots, c_k$ so that

$w = c_1 b_1 + \cdots + c_k b_k,$

and if $w' \in W$ is any other vector, then $w = w'$ if and only if, in the corresponding expression

$w' = c_1' b_1 + \cdots + c_k' b_k,$

one has $c_i = c_i'$ for all $i$.

One can represent $w$ by listing the coefficients of w in the linear combination to give

$[w]_{\mathcal{B}} = (c_1, \ldots, c_k)$

as a vector in $\mathbb{R}^k$. In this way, every vector in W corresponds uniquely to a “point” $(c_1, \ldots, c_k) \in \mathbb{R}^k$.
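As a small worked illustration of coordinates with respect to a basis, take $W = \mathbb{R}^2$. The vectors below are hypothetical choices made only for this illustration and are not those of the exercises: $b_1 = (1, 1)$, $b_2 = (-1, 1)$, and the vector $w$ with standard coordinates $(1, 5)$. Solving $c_1 - c_2 = 1$, $c_1 + c_2 = 5$ gives $c_1 = 3$, $c_2 = 2$, so

```latex
\[
w = \begin{pmatrix} 1 \\ 5 \end{pmatrix}
  = 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
  + 2 \begin{pmatrix} -1 \\ 1 \end{pmatrix}
  = 3\, b_1 + 2\, b_2 .
\]
```

The same vector therefore has coordinates $(1, 5)$ with respect to the standard basis and $(3, 2)$ with respect to $\{b_1, b_2\}$; because the linear system has exactly one solution, no other pair of coefficients works.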

Exercise 8.14

a. Suppose , the usual Euclidean plane. Let be the standard unit vector (vector of length one) in the direction of the positive x-axis, and let be the standard unit vector in the direction of the positive y-axis. Use geometric properties of vector operations in Euclidean space to sketch the vector . To what ordered pair does this linear combination of vectors correspond?

b. However, now replace by , and let . Let be the vector whose coordinates are (−4, 5) in the new coordinates given by . Sketch using the old coordinate system .

Going back to metabolic pathways and stoichiometric matrices, complete the following exercise.

Exercise 8.15

a. Define to consist of the elements of and the vector as in Exercise 8.13(f). Find a vector in the corresponding system N(S) which has two distinct representations as linear combinations of elements of . (This reflects a general principle: One cannot add a new vector to a set that is already a basis and still have a basis. Be able to justify this!)

b. Express each vector of (coming as before from (8.14)) as an element of N(S), with coordinates coming from the basis .

Exercises 8.13 and 8.15 explore two bases, namely the sets of vectors and , for the nullspace N(S) of the stoichiometric matrix S as in Exercises 8.5 and 8.8. As per Exercises 8.13 and 8.15, , so any vector in N(S) can be expressed uniquely in six coordinates taken either with respect to , or with respect to . For example, viewing N(S) as a subspace of Euclidean space , we previously checked that

(As per usual conventions, the coordinates here are in terms of the standard basis of .) However, in terms of coordinates with respect to the ordered basis of N(S) as a six-dimensional vector space,

(8.15)

Likewise, since , in terms of coordinates with respect to the ordered set , one has

(8.16)

On the other hand,

satisfies

(8.17)

but

(8.18)

A change-of-basis matrix⁹ from the basis to the basis is a square matrix with the following property:

then the matrix product gives the coordinate expression for u in the basis , i.e.,

The matrix can be created by setting each of its columns , to be the basis vector written in terms of coordinates of . Thus, in this case, by Eqs. (8.15) and (8.16),

while by Eqs. (8.17) and (8.18),

Exercise 8.16

a. Complete the remaining four columns of the change-of-basis matrix , from basis to , using the results of Exercise 8.13:

b. Use to find the coordinates of in terms of . Also, what vector of N(S) does u represent, expressed in terms of the standard coordinates in ?

c. Use to find the coordinates of an arbitrary vector in terms of .
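The column-by-column construction of a change-of-basis matrix can be sketched in a small setting. The example below works in the plane rather than in N(S), with two hypothetical bases chosen only for illustration; the function names `coords`, `change_of_basis`, `apply`, and `to_standard` are assumptions, and `coords` uses Cramer's rule, which is only practical in this two-dimensional toy case.

```python
from fractions import Fraction


def coords(basis, w):
    """Coordinates of w (standard coordinates) in a basis of R^2 (Cramer's rule)."""
    (a, c), (b, d) = basis          # basis vectors (a, c) and (b, d)
    det = a * d - b * c
    if det == 0:
        raise ValueError("the two vectors do not form a basis")
    return [Fraction(w[0] * d - b * w[1], det),
            Fraction(a * w[1] - c * w[0], det)]


def change_of_basis(old, new):
    """Matrix whose j-th column is the j-th old basis vector in new coordinates."""
    columns = [coords(new, vec) for vec in old]
    return [[columns[j][i] for j in range(2)] for i in range(2)]


def apply(P, u):
    """Matrix-vector product P u."""
    return [sum(P[i][j] * u[j] for j in range(2)) for i in range(2)]


def to_standard(basis, c):
    """Standard coordinates of the vector with coordinates c in the given basis."""
    return [sum(ck * bk[i] for ck, bk in zip(c, basis)) for i in range(2)]


old = [(1, 0), (1, 1)]    # hypothetical "old" basis
new = [(1, 1), (-1, 1)]   # hypothetical "new" basis

P = change_of_basis(old, new)
u_old = [2, 3]            # coordinates of some vector in the old basis
u_new = apply(P, u_old)   # coordinates of the same vector in the new basis

print([[str(x) for x in row] for row in P])
print([str(x) for x in u_new])
print(to_standard(old, u_old), [str(x) for x in to_standard(new, u_new)])
```

Both coordinate lists describe the same point of the plane, which is the defining property of the change-of-basis matrix.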

In the next exercise, you will repeat the previous exercise, but switch the roles of and .

Exercise 8.17

a. Using the results of Exercise 8.15, now find the change-of-basis matrix from basis to basis of N(S).

b. Use to find the coordinates of in terms of . Also, what vector of N(S) does v represent, expressed in terms of the standard coordinates in

c. Use to find the coordinates of an arbitrary vector in terms of .

d. Compute the matrix products and , and explain your answer.

Exercise 8.18

The “Rank + Nullity Theorem”¹⁰ says that, for an arbitrary $m \times n$ matrix $A$,

$\dim N(A) + \operatorname{rank}(A) = n,$

where the dimension of the nullspace of A is often called the nullity of A, and rank(A), the rank of A, is the dimension of the row space of A, equivalently, the number of nonzero rows in $\operatorname{rref}(A)$.
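The theorem can be checked numerically on a small example. The sketch below uses a hypothetical 2 × 4 toy stoichiometric matrix (not one from the exercises) and a hand-picked pair of nullspace vectors; the function name `rank` and the variable names are assumptions for illustration. The rank is found by Gaussian elimination with exact fractions.

```python
from fractions import Fraction


def rank(matrix):
    """Number of pivots found by Gaussian elimination (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    found = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(found + 1, len(rows)):
            factor = rows[i][c] / rows[found][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
        if found == len(rows):
            break
    return found


# Hypothetical toy stoichiometric matrix: m = 2 metabolites, n = 4 reactions.
S = [[-1, -1, 1, 0],
     [1, 1, 0, -1]]
n = len(S[0])

# Two hand-picked vectors; the checks below confirm they lie in N(S)
# and are linearly independent.
candidates = [[-1, 1, 0, 0], [1, 0, 1, 1]]
in_nullspace = all(
    sum(a * x for a, x in zip(row, v)) == 0 for v in candidates for row in S
)

print("rank(S) =", rank(S))
print("n - rank(S) =", n - rank(S))
print("independent nullspace vectors exhibited:", rank(candidates), in_nullspace)
```

Here rank(S) is 2, and two independent nullspace vectors are exhibited, consistent with a nullity of 4 − 2 = 2.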

Suppose for parts (a)–(d) and (f), $S$ is a generic $m \times n$ stoichiometric matrix:

a. Recall: What is the biological meaning of the number n of columns of S?

b. Recall: What is the biological meaning of the number m of rows of S?

c. Recall: What is the biological meaning of a vector in N(S), and of $\dim N(S)$, the number of (linearly independent) vectors spanning N(S)? What about the columns of S?

d. For the metabolic system captured by S, under what circumstances is the number of total flux vectors needed to describe the metabolic pathways (via linear combinations) in the system the same as the difference between the number of metabolites in the system and the number of chemical reactions in the system? [Hint: Use your previous answers and the Rank + Nullity Theorem.]

e. Were the circumstances you identified in part (d) of this exercise met in Exercise 8.8? What about in Exercise 8.9?

f. By definition of the rank of a matrix, $\operatorname{rank}(S)$ satisfies the inequality $\operatorname{rank}(S) \leq m$. As per [7, p. 298], $\operatorname{rank}(S) < m$ whenever conservation relationships hold in the system, for example, that ATP + ADP equals some constant value for the whole system. Using this, give a restatement (in English) of the biological meaning of the Rank + Nullity Theorem for a stoichiometric matrix, tying in your results from part (d).

g. Schilling et al. [7, pp. 298–299] present an analysis of the reaction scheme of a metabolic system consisting of the glyoxylate cycle and related reactions, as pictured therein. Solving for the nullspace N(S) of a stoichiometric matrix S for this metabolic system¹¹ results in a nullspace N(S) with $\dim N(S) = 3$. A basis for N(S) is also pictured below, organized as the columns of the matrix B. For reasons of space, the actual matrix shown is $B^T$; the columns correspond to the following metabolite abbreviations: Eno, Acn, Sdh, Fum, Mdh, AspC, Gdh, Pyk, AceEF, GltA, Icd, Icl, Mas, AspCon, Ppc, GluCon:

Using the data from matrix B above, that is, from N(S) (and without attempting to compute S itself¹²), answer the following questions:

i. How many free variables must have appeared in $\operatorname{rref}(S)$?

ii. If there were no conservation relationships holding among metabolites in the system, how many reactions were there?

8.2.3 Conclusion

Biologically “good” bases for lead to the notion of “extreme paths”. For example, for as in Exercise 8.18 (g), it is possible to find a “good basis” for N(S): check that by setting to be the ith row of , and taking , and , one obtains a “biologically good basis.” In [7] one can see a graph of the associated system with nodes indexed by the metabolites listed, and the problem that poses biochemically. (But given your experience to date, you should guess what that will be!) Since any three linearly independent vectors in span a three-dimensional vector space, there is a geometric interpretation , and as the edges of a convex cone, the “flux cone” of S in the space. (Do see [7] for representative pictures and more discussion.) The edges determine all possible total flux vectors for the system S represents, in that any positive convex linear combination of them (i.e., point in the flux cone) is a total flux vector. This leads to the notion of the as so-called extreme pathways. Although they are biologically a bit difficult to describe, their cone yields all total flux vectors for which the system has no changes in concentrations of the metabolites, so metabolites are conserved, and we arrive back to the ideas we discussed in Section 8.1.
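The cone picture can be illustrated with a minimal sketch. The example below uses a hypothetical two-node toy network (not the glyoxylate system of [7]) with flux order [v1, v2, b1, b2], and two hand-chosen generating vectors for its cone; the names `combine`, `steady_state`, and `pathways` are assumptions for illustration. It checks that a nonnegative combination of the generators is again a steady-state flux vector with no negative entries.

```python
# Hypothetical toy network, flux order [v1, v2, b1, b2].
S = [[-1, -1, 1, 0],   # node A
     [1, 1, 0, -1]]    # node B

# Hand-chosen generators of the toy cone: route A -> B via v1, or via v2.
pathways = [[1, 0, 1, 1],
            [0, 1, 1, 1]]


def combine(weights, vectors):
    """Linear combination sum_k weights[k] * vectors[k]."""
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(vectors[0]))]


def steady_state(S, v):
    """True when S v = 0, so no metabolite concentration changes."""
    return all(sum(a * x for a, x in zip(row, v)) == 0 for row in S)


weights = [2, 3]                      # nonnegative weights
v = combine(weights, pathways)

print(v, steady_state(S, v), all(x >= 0 for x in v))
```

Any other choice of nonnegative weights gives another point of the same cone; a negative weight can produce a vector that still satisfies Sv = 0 but leaves the cone.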

The papers we have noted so far represent an early attempt to apply linear algebraic ideas to this area of metabolite conservation. In a series of papers over the years, their authors and their colleagues have employed other concepts in linear algebra [8] and more significant linear algebraic techniques (e.g., the SVD [9]). They have combined linear algebraic techniques with notions from convex analysis and other areas to try to address questions we have raised here, and to tackle biochemical reaction systems that are much more sophisticated than those addressed by the tutorial examples in this chapter, which are taken from their early papers. The reader is encouraged to check http://gcrg.ucsd.edu/Researchers for a wealth of developments. In this context, a free program, expa, has been developed for computing extreme pathways, followed by much more extensive software, the COBRA toolbox http://opencobra.sourceforge.net/openCOBRA/Welcome.html (requires Matlab or Python). A significant problem that arises early on, when working with more sophisticated reaction systems, is that our naive intuition linking paths in the graph/cartoon of the biochemical reaction system to extreme paths and the stoichiometry quickly breaks down or can become stretched in a biologically incorrect direction. A final section of this chapter, which could be broken apart as a project unto itself, includes a companion tutorial for the free download expa and gives the interested reader a chance to explore these issues and a potential solution to them by working with hypergraphs instead of graphs. These additional materials will also allow you to explore large (and hence, more realistically interesting) biochemical reaction systems, as was proposed would be useful in the introduction. Papers using related linear algebraic techniques combined with convex analysis and differential equations have enabled researchers to isolate new metabolic pathways for E. coli [10], to identify potential disease mechanisms in red blood cells [5], and to pursue a host of other biomechanical engineering and other applications; see, e.g., the survey [11]. Other papers explore the links between varying versions of biologically “special” sets of vectors (such as the “extreme flux modes” of [13], to name just one), e.g., [12], and modeling of biochemical reaction networks using many other types of modeling tools, such as graph theoretic and other algebraic approaches to system dynamics, e.g., [14,15], algebraic geometric methods, e.g., by [16], and more. It is even possible to use these ideas, in combination with aspects of phylogenetics discussed in Chapter 10 of this volume, to explore the evolution of metabolic pathways [17,18].

8.3 Extreme Paths and Model Improvements

This final section was coauthored by Robert J. Kipka and Terrell L. Hodge.

In this section (which could be a stand-alone project), we’ll use software to find extreme paths. These are particular elements of the nullspace of a stoichiometric matrix with the property that every total flux vector in the flux cone is a nonnegative linear combination of them. This section will allow us to explore some of our modeling assumptions in greater depth and to do some simple analysis of the biochemistry of a red blood cell. The objectives of this section are:

a. To reflect on the advantages and disadvantages of representing biochemical processes using directed graphs.

b. To consider directed hypergraphs as a possible alternative and to interpret extreme pathways using directed hypergraphs.

c. To develop some basic familiarity with the ExPA program for finding extreme paths of complex biochemical systems.

The first thing we’ll do is download ExPA and make sure it can run on your computer.

8.3.1 Downloading and Installing expa.exe

The software we’ll use for this section is available for free download at the following Website:

http://gcrg.ucsd.edu/Downloads/ExtremePathwayAnalysis.

The program is meant to be run from a terminal window. If you’re using a PC, you can download an additional file from http://booksite.elsevier.com/9780124157804 that allows you to run the program without having to deal with the terminal window. If you’re using a Mac, things are more complicated. At the time of this writing, the program will not run at all on OSX 10.7 or later. If you’re using OSX 10.6 or older, the program will probably run but you will have to use a terminal window. We encourage you to do so! Very little terminal use is required and the benefits far outweigh the small amount of difficulty the terminal window represents.

8.3.1.1 Instructions for a PC

To install ExPA on your PC, follow these steps in the order given.

1. Open a Web browser and go to http://gcrg.ucsd.edu/Downloads/ExtremePathwayAnalysis.

2. You should download the latest version of expa.exe. At the time of this writing, the name of your download is “A new ExPA program.” Right-click on the colored text which says “A new ExPA program” and choose “Save target as…”

3. Choose a folder that you can find again and save the file in this folder.

4. Find the file you just downloaded. If the file is still zipped (which it likely is), right-click on it and select “Extract all… ” Choose the folder you’d like to place expa.exe in.

5. Once you’ve downloaded the file and unzipped it, it’s ready to use from the command prompt.

Using ExPA on a PC from a Command Prompt

1. Open up “My Computer,” open up the “C” drive, and create a folder called “expa.”

2. Find the folder that was created during the unzipping process, above. It should contain “expa.exe.” Copy the contents of this folder into the expa folder you created on the C drive.

3. Open up a command prompt. This can probably be found in your start menu under “Accessories.”

4. In the command prompt, type in “cd c:\expa”. This changes your working directory to the expa folder we created earlier.

5. To see if the program runs, type in “expa rbc.expa”. A file called “paths.txt” should appear.

6. For our purposes, to run ExPA you’ll only need to type in “expa” followed by the file name of your input file. Output is always written to “paths.txt.” The program only needs to be run from the command prompt. The easiest way to edit the input and output files is with a text editor like Notepad.

Using ExPA on a PC without the Command Prompt

1. Go to http://booksite.elsevier.com/9780124157804 and download the “run-expa.bat” file. Place this file in the folder that was created when you unzipped expa. This is crucial. If you’re not certain which folder that is, look for the files “test.expa” and “rbc.expa” and place the download in the same folder as these files.

2. Open up the expa folder and right click on “test.expa.” Select “Rename.” Rename the file “source.txt.” It is important that you replace the “.expa” part of the name with “.txt” or things will not work properly. The operating system might give you a warning when you do this, but it’s perfectly safe.

3. You’re ready to use ExPA without the command prompt. To see if it works, double-click on the “run-expa.bat” file. A new file called “paths.txt” should appear. If it does not appear, try refreshing the folder (an option under right-click). If it still does not appear, something is wrong.

4. To use expa without the command prompt, all input has to be placed in this “source.txt” file. When you double-click on “run-expa.bat,” the program reads what’s written in “source.txt” and writes its output to “paths.txt.”

8.3.1.2 Instructions for a Mac

The instructions for getting ExPA to run on a Mac are not as detailed as those for a PC. This is because the program may not even run on a Mac. At the time of this writing the program will not work on OSX10.7 and may not work on OSX10.6 without additional effort. To install ExPA on a Mac, take the following steps:

1. Open a Web browser and go to http://gcrg.ucsd.edu/Downloads/ExtremePathwayAnalysis.

2. Download “A new Mac ExPA program” to your desktop.

3. Open up your “Applications” folder and create a new folder called “expa.”

4. Copy the contents of the folder created by the download to the “expa” folder you created in “Applications.”

5. Open a terminal window and type in “cd /Applications/expa”. Press enter. This will change your working directory.

6. Type in “ls” and press enter. This command lists the contents of the directory you’re working in. Check that “macexpa.out” is in the list of files. If it’s not, check that you copied the contents of the downloaded folder (not the whole folder) to the “expa” folder in “Applications.”

7. Once you’ve found “macexpa.out” using the terminal, type in “chmod u+x macexpa.out”. This only needs to be done once. The program is now ready to be used.

8. To test the program, type in “./macexpa.out ./rbc.expa”. The program should create a file called “paths.txt”. This is how the program is run. Simply navigate to the /Applications/expa directory and type in “./macexpa.out” followed by the name of your input file. The program only needs to be run from the terminal window. The easiest way to edit input files or read output files is with a text editor.
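Readers who would rather script these runs than retype the commands could wrap them in a few lines of Python. The following is only a sketch under stated assumptions: the function names expa_command and run_expa are hypothetical, the executable names “expa.exe” and “macexpa.out” and the output name “paths.txt” are taken from the instructions above, and the program itself must already be installed in the given folder.

```python
from pathlib import Path
import subprocess


def expa_command(folder, input_file, on_mac=False):
    """Build the argument list for one run (hypothetical helper).

    The executable names come from the installation steps in this section;
    the absolute path avoids depending on the current working directory.
    """
    program = "macexpa.out" if on_mac else "expa.exe"
    return [str(Path(folder).resolve() / program), input_file]


def run_expa(folder, input_file, on_mac=False):
    """Run the program inside `folder` and return the expected output path.

    Assumes, as the instructions state, that output is written to paths.txt
    in the working directory.
    """
    subprocess.run(expa_command(folder, input_file, on_mac), cwd=folder, check=True)
    return Path(folder) / "paths.txt"


# Inspect the commands without running anything:
print(expa_command("C:/expa", "rbc.expa"))
print(expa_command("/Applications/expa", "rbc.expa", on_mac=True))
```

Calling run_expa("/Applications/expa", "rbc.expa", on_mac=True) would then correspond to step 8 above; it raises an error if the program exits unsuccessfully.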

8.3.2 Analyzing a Modeling Decision: Directed Graphs

In representing biochemical reactions as directed graphs, we’ve chosen to represent metabolites as vertices and place edges between vertices whenever the metabolites are biochemically linked. For example, appears in a directed graph as in Figure 8.6.

Figure 8.6 Graph for .

Of course, there’s nothing that forces us to make this choice; this is a modeling decision and comes with both strengths and weaknesses. In this section we’ll explore this decision in a little bit greater depth and consider an alternative representation. At the end, we’ll use our new understanding to analyze red blood cell metabolism using ExPA.

Let’s start by looking at the following abstract biochemical process:

(8.19)

Suppose that A is an input to the system and F is an output. According to our directed graph conventions, this system has the representation shown in Figure 8.7.

Figure 8.7 Graph for system (8.19), external fluxes added.

To analyze this system with ExPA, we need to create an input file. An input file for ExPA provides the program with a list of internal fluxes (labeled in the chapter as ) and exchange fluxes (labeled in the chapter as ). A biochemical reaction can be reversible or irreversible. All of the reactions in our abstract system are irreversible. Based solely on our directed graph, our input file should look like Table 8.1. The I is for “irreversible.”

Exercise 8.19

The vertical dots in the input pictured in Table 8.1 should be replaced with the data for through . Create an input file¹³ for ExPA, based on Figure 8.7 that will model our abstract system.

Exercise 8.20

Use ExPA to find the extreme paths in the representation from Exercise 8.19.

Exercise 8.21

Now consider the system

(8.20)

Represent (8.20) as a directed graph and compare your answer to the graph for (8.19).

Exercise 8.22

Write down at least one more biochemical system which will give you the directed graph from Exercise 8.21.

Table 8.1

Your answer to the previous exercise highlights a possible weakness of our decision to model processes using directed graphs. In particular, a directed graph may not perfectly capture the stoichiometry of a biochemical process. For this reason, it can be interesting to use hypergraphs. Whereas in a directed graph, each edge must start at a unique vertex and end at a unique vertex, in a directed hypergraph an edge (really “hyperedge”) is allowed to have both multiple starting points and multiple endpoints. For example, a hypergraph representation of (8.19) would look like Figure 8.8, with and hyperedges that are not just edges.
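To see concretely how a directed graph can lose information that a hypergraph keeps, consider the following minimal sketch. The two systems in it are hypothetical and are not the systems (8.19) or (8.20); the function names are illustrative, and the edge convention (one edge from each substrate to each product) is an assumption about how the graphs are drawn.

```python
# Hypothetical systems (assumptions, not from the chapter).
# Each reaction is a pair (set of substrates, set of products).
system_1 = [({"A", "B"}, {"C"})]               # one reaction:  A + B -> C
system_2 = [({"A"}, {"C"}), ({"B"}, {"C"})]    # two reactions: A -> C and B -> C


def directed_graph(reactions):
    """Ordinary edges: one edge from each substrate to each product."""
    return {(s, p) for subs, prods in reactions for s in subs for p in prods}


def hypergraph(reactions):
    """Hyperedges: one per reaction, keeping all substrates and products together."""
    return {(frozenset(subs), frozenset(prods)) for subs, prods in reactions}


# Both systems produce the edges A -> C and B -> C ...
print(directed_graph(system_1) == directed_graph(system_2))   # True
# ... but the hypergraphs differ: one hyperedge versus two.
print(hypergraph(system_1) == hypergraph(system_2))           # False
```

In the first system C cannot be made without both A and B, while in the second either one suffices; only the hypergraph records that difference.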

Exercise 8.23

When we represent a biochemical process using a hypergraph, what do the “hyperedges” represent? In what ways is this the same as or different from the edges in a directed graph?

Figure 8.8 Hypergraph for system (8.19).

Some immediate questions regarding our software arise. In particular, can the software deal with hypergraphs? And if so, how will we represent hypergraphs as input for ExPA? It turns out that ExPA handles everything beautifully. In fact, because ExPA does its analysis on the stoichiometric matrix, all the program requires is the stoichiometry of the biochemical process. And so in order to have ExPA find the extreme paths of this process, all we need to do is enter internal fluxes in the obvious way. For example, will be given as in Table 8.2.
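Since only the stoichiometry matters, it may help to see how a list of reactions turns into a stoichiometric matrix, with one column per reaction (that is, per hyperedge). The sketch below is illustrative only: the two reactions are hypothetical, the function name is an assumption, and it uses the usual sign convention that consumed metabolites get negative coefficients and produced metabolites positive ones. It says nothing about the actual ExPA input syntax.

```python
# Hypothetical internal fluxes (assumptions, not from the chapter):
#   v1: A + B -> C
#   v2: C -> 2 D
# Negative coefficients are consumed, positive coefficients are produced.
reactions = {
    "v1": {"A": -1, "B": -1, "C": 1},
    "v2": {"C": -1, "D": 2},
}


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction names, matrix with one column per reaction)."""
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix


metabolites, names, S = stoichiometric_matrix(reactions)
for metabolite, row in zip(metabolites, S):
    print(metabolite, row)
```

Each column lists every metabolite touched by one reaction, which is exactly the information a hyperedge carries and an ordinary edge does not.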

Exercise 8.24

Create an input file for ExPA which represents Figure 8.8. List the extreme paths given by the program and explain in layman’s terms why your answer makes sense.

Exercise 8.25

Write down the stoichiometric matrix for (8.19) and explain how your input file for ExPA relates to this matrix.

Table 8.2

Now let’s look at something a little more complex. Consider the following abstract biochemical process:

(8.21)

Suppose that A and D are system inputs, while , and L are system outputs.

Exercise 8.26

Draw a hypergraph representation for (8.21).

Exercise 8.27

Using only the hypergraph or representation (8.21), see if you can find the proper inputs to create one molecule of J as an output. Is it possible? What about n molecules of J?

Exercise 8.28

Use ExPA to find the extreme paths of the system, making sure that your input file fully represents (8.21). Explain carefully why these are the only extreme paths for the system using either the biochemical representation (8.21), your hypergraph representation, or both.

Exercise 8.29

Revisit Exercise 8.27, this time using the extreme paths given by the software to determine the proper inputs (if any) necessary for the creation of one molecule of J by the system.

Exercise 8.30

Something interesting happens if you “turn off” the exchange flux at I. Use ExPA to find out what happens and then explain in layman’s terms what’s going on.

Exercise 8.31

Allow I to be a system output again and use ExPA to explore what happens if we also allow G to be a system output. Explain in simple terms what’s happening. How do changes in the exchange flux status of G relate to production of J as a system output?

Now that we’ve had a chance to play around with some “simple” abstract processes, let’s take a look at an actual biochemical process.

Exercise 8.32

The file rbc.expa, which comes with ExPA, is a sample file containing some of the biochemistry of a red blood cell. Use ExPA to find the extreme paths. Summarize your findings by creating a table showing only those pieces of the extreme paths relating to exchange fluxes for the system. Afterwards, compare with the graphical representations in [5], and note further discussion there of applications.

Exercise 8.33

The following reaction represents the production of ATP for metabolic energy and NADH for methemoglobin reduction:

Is this something the red blood cell can do, according to the extreme paths that you found?

Acknowledgments

The author gratefully acknowledges the support of the National Science Foundation under DUE award #0737467. Thanks go to the many linear algebra students who experimented with both prototypes and other extended versions of a module, built for that award, parts of which became this chapter and the supplementary materials. Many thanks also go to Robert J. Kipka for his role in preparing the final section/project, and for other contributions as a TEXpert reviewer.

8.4 Supplementary Data

Supplementary files and materials associated with this article can be found on the volume’s website, http://booksite.elsevier.com/9780124157804

References

1. Voet D, Voet J. Biochemistry. Wiley and Sons 2004.

2. Ackerman J. How Bacteria in Our Bodies Protect Our Health. Scientific American June 2012.

3. http://commons.wikimedia.org/wiki/File:Glucokinase-1GLK.png?useFormat=mobile [as of July 2012].

4. http://upload.wikimedia.org/wikipedia/commons/archive/a/a0/2009111411150621Glycolysis.svg [as of July 2012].

5. Wiback SJ, Palsson BO. Extreme pathway analysis of red blood cell metabolism. Biophys J. 2002;83:808–818.

6. Schilling CH, Palsson BO. The underlying pathway structure of biochemical reaction networks. PNAS. 1998;95:4193–4198.

7. Schilling CH, Schuster S, Palsson BO, Heinrich R. Metabolic pathway analysis: basic concepts and scientific applications in the post-genomic era. Biotechnol Prog. 1999;15:296–303.

8. Famili I, Palsson BO. The convex basis of the left null space of the stoichiometric matrix leads to the definition of metabolically meaningful pools. Biophys J. 2003;85:16–26.

9. Famili I, Palsson BO. Systemic metabolic reactions are obtained by singular value decomposition of genome-scale stoichiometric matrices. J Theor Biol. 2003;224:87–96.

10. Lee SY, Hong SH, Moon SY. In silico metabolic pathway analysis and design: succinic acid production by metabolically engineered Escherichia coli as an example. Genome Informatics. 2002;13:214–223.

11. Maertens J, Vanrolleghem PA. Modeling with a view to target identification in metabolic engineering: a critical evaluation of the available tools. Biotechnol Prog. 2010;26:313–331.

12. Llaneras F, Picó J. Which metabolic pathways generate and characterize the flux space? A comparison among elementary modes, extreme pathways and minimal generators. J Biomed Biotechnol. 2010; Article ID 753904, 13 pages. doi:10.1155/2010/753904.

13. Schuster S, Dandekar T, Fell DA. Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 1999;17:53–60.

14. Banaji M, Craciun G. Graph theoretic approaches to injectivity in general chemical reaction systems. Adv Appl Math. 2010;44:168–184.

15. Craciun G, Pantea C, Rempala GA. Algebraic methods for inferring biochemical networks: a maximum likelihood approach. Comput Biol Chem. 2009;33:361–367.

16. Shiu A, Sturmfels B. Siphons in chemical reaction networks. Bull Math Biol. 2010;72:1448–1463.

17. Mithani A, Preston GM, Hein J. A Bayesian approach to the evolution of metabolic networks on a phylogeny. PLoS Comput Biol. 2010;6:e1000868. doi:10.1371/journal.pcbi.1000868.

18. Mithani A, Preston GM, Hein J. A stochastic model for the evolution of metabolic networks with neighbor dependence. Bioinformatics. 2009;25:1528–1535.

¹Ten times the number of our own!

²Note: Tuberculosis infects about 2 billion people—one third of the Earth’s population! It’s satisfying to think one can turn this threat on its ear and use knowledge of biology to get M. tuberculosis to produce useful compounds.

³As opposed to “elemental equations,” reaction equations associated to the decomposition of compounds into their basic chemical elements, like hydrogen, oxygen, etc.

⁴No computations are necessary to answer this, but see the final section for an associated project wherein one can use free software to explore this example concretely, in more detail.

⁵See the final section for the associated project which explores shortcomings and identifies another model.

⁶By many formal definitions, the result is no longer formally a graph in the mathematical sense, since there are arrows which do not join two nodes, but we will continue to abuse terminology and call this a graph.

⁷This labeling of some fluxes by s and some by s is a notational convenience in what follows. However, the total flux vector will still be denoted by v, with first seven entries the internal fluxes, and last four entries the external fluxes.

⁸Hopefully, this looks familiar up to relabeling. Thus, we will not check that is a basis for N(S) here.

⁹Also called a “transition matrix” in some texts, while in others, this term is reserved for Markov chain processes.

¹⁰This is sometimes just called the “Rank Theorem.”

¹¹As explored in earlier exercises in this module.

¹²This could be made into an additional project.

¹³Note: If you’re using a PC and plan on running the program without opening the command prompt, this input file has to be named “source.txt” and has to be placed in the same directory as “expa.exe”. If you intend to use a command prompt or terminal window, the input file can be given any name.
