Preface

Purpose and Goals

Bioinformatics can refer to almost any collaborative effort between biologists or geneticists and computer scientists, and it thus covers a wide variety of traditional computer science domains, including data modeling, data retrieval, data mining, data integration, data management, data warehousing, data cleaning, ontologies, simulation, parallel computing, agent-based technology, grid computing, and visualization. However, applying each of these domains to biomolecular and biomedical applications raises specific and unexpectedly challenging research issues.

In this book, we focus on data management and in particular data integration, as it applies to genomics and microbiology. This is an important topic because data are spread across multiple sources, preventing scientists from efficiently obtaining the information required to perform their research (on average, a pharmaceutical company uses 40 data sources). In this environment, answering a single question may require accessing several data sources and calling on sophisticated analysis tools (e.g., sequence alignment, clustering, and modeling tools). While data integration is a dynamic research area in the database community, the specific needs of biologists have led to the development of numerous middleware systems that provide seamless data access in a results-driven environment (eight middleware systems are described in detail in this book).

The objective of the book is to provide life scientists and computer scientists with a complete view of biological data management by: (1) identifying specific issues in biological data management, (2) presenting existing solutions from both academia and industry, and (3) providing a framework in which to compare these systems.

Book Audience

This book is intended to be useful to a wide audience. Students, teachers, bioinformaticians, researchers, practitioners, and scientists from both academia and industry may all benefit from its material. It contains a comprehensive description of issues in biological data management and an overview of existing systems, making it appropriate for introductory and instructional purposes. Developers not yet familiar with bioinformatics will appreciate the descriptions of the numerous challenges that need to be addressed and of the various approaches that have been developed to solve them. Bioinformaticians may find the descriptions of existing systems and the list of challenges that remain to be addressed useful. Decision makers will benefit from the evaluation framework, which will aid them in selecting the integration system that best fits the needs of their research laboratory or company. Finally, life scientists, the ultimate users of these systems, may be interested in understanding how they are designed and evaluated.

Topics and Organization

The book is organized as follows: Four introductory chapters are followed by eight chapters presenting systems, an evaluation chapter, a summary, a glossary, and an appendix.

The introduction further refines the focus of this book and provides a working definition of bioinformatics. It also presents the steps that lead to the development of an information system, from its design to its deployment. Chapter 2 introduces the challenges faced by the integration of biological information. Chapter 3 refines these challenges into use cases and provides life scientists with a translation of their needs into technical issues. Chapter 4 illustrates why traditional approaches often fail to meet life scientists’ needs.

The following eight chapters each present an approach that was designed and developed to provide life scientists with integrated access to data from a variety of distributed, heterogeneous data sources. Together, the presented approaches provide a comprehensive overview of current technology. Each of these chapters is written by the main inventors of the presented system, specifies its requirements, and describes both the chosen approach and its implementation. Because these chapters are self-contained, they may be read in any order. Chapter 13 provides users and developers with a methodology for evaluating the presented systems. This methodology may be used to select the system most appropriate for an organization, to compare systems, or to evaluate a system developed in-house. The summary reviews the state of the art, existing solutions, and the new challenges that need to be addressed.

The appendix contains a list of useful biological resources (databases, organizations, and applications) organized in three tables. The acronyms commonly used to refer to them and used in the chapters of this book are spelled out, and current URLs are provided so that readers can access complete information.

Each of the chapters uses various technical terms. Because these terms require expertise in both life science and computer science, a glossary that spells out acronyms and provides short definitions is included at the end of the book.

Acknowledgments

Such a book requires hard work from a large number of individuals and organizations, and although we are not able to explicitly acknowledge everyone involved, we would like to thank as many as possible for their contributions.

We are obviously indebted to the individuals who contributed chapters, as this book would not have been as informative without them. Most of these contributions came in the form of detailed system descriptions. While many bioinformatics data integration systems are currently available, we selected several of the larger, better-known systems to include in this book. We are fortunate that key individuals working on these projects were willing and able to devote their time and energy to providing detailed descriptions of their systems. Because these contributors include the key architects of the systems, their descriptions are much more insightful than would otherwise be possible. We are also fortunate that Su Yun Chung, John Wooley, and Barbara Eckman were able to contribute their insights into a life scientist’s perspective on bioinformatics.

Beyond this obvious group, others contributed, directly and indirectly, to the final version of this book. We would like to thank our reviewers for their extremely helpful suggestions and our publishers for their support and tireless work bringing everything together. The manuscript reviewers included: Johann-Christoph Freytag, Humboldt-Universität zu Berlin; Mark Graves, Berlex; Michael Hucka, California Institute of Technology; Sean Mooney, Stanford University; and Shalom (Dick) Tsur, Ph.D., The Real-Time Enterprise Group. We would also like to thank Tom Slezak and Krishna Rajan for contributions that could not be included in the final version of this book.

Finally, Terence Critchlow would like to thank Carol Woodward for ongoing moral support, and Pete Eltgroth for providing the resources he used to perform this work. He would also like to extend his appreciation to Lawrence Livermore National Laboratory for their support of his effort and to acknowledge that this work was partially performed under the auspices of the U.S. DOE by LLNL under contract No. W-7405-ENG-48.
