IV. Databases and Visualization
by Cynthia Gibas, Per Jambeck
Developing Bioinformatics Computer Skills
Copyright
Preface
Audience for This Book
Structure of This Book
Our Approach to Bioinformatics
URLs Referenced in This Book
Conventions Used in This Book
Comments and Questions
Acknowledgments
I. Introduction
1. Biology in the Computer Age
1.1. How Is Computing Changing Biology?
1.1.1. The Eye of the Fly
1.1.2. Labels in Gene Sequences
1.1.3. Comparing eyeless and aniridia with BLAST
1.2. Isn't Bioinformatics Just About Building Databases?
1.2.1. The First Information Age in Biology
1.3. What Does Informatics Mean to Biologists?
1.4. What Challenges Does Biology Offer Computer Scientists?
1.5. What Skills Should a Bioinformatician Have?
1.6. Why Should Biologists Use Computers?
1.6.1. A New Approach to Data Collection
1.7. How Can I Configure a PC to Do Bioinformatics Research?
1.7.1. Why Use Unix or Linux?
1.8. What Information and Software Are Available?
1.8.1. Why Do I Need to Install a Program from the Web?
1.9. Can I Learn a Programming Language Without Classes?
1.10. How Can I Use Web Information?
1.11. How Do I Understand Sequence Alignment Data?
1.12. How Do I Write a Program to Align Two Biological Sequences?
1.13. How Do I Predict Protein Structure from Sequence?
1.14. What Questions Can Bioinformatics Answer?
2. Computational Approaches to Biological Questions
2.1. Molecular Biology's Central Dogma
2.1.1. Replication of DNA
2.1.2. Genomes and Genes
2.1.3. Transcription of DNA
2.1.4. Translation of mRNA
2.1.5. Molecular Evolution
2.2. What Biologists Model
2.2.1. Accessing 3D Molecules Through a 1D Representation
2.2.2. Abstractions for Modeling Protein Structure
2.2.3. Mathematical Modeling of Biochemical Systems
2.3. Why Biologists Model
2.4. Computational Methods Covered in This Book
2.5. A Computational Biology Experiment
2.5.1. Identifying the Problem
2.5.2. Separating the Problem into Simpler Components
2.5.3. Evaluating Your Needs
2.5.4. Selecting the Appropriate Data Set
2.5.5. Identifying the Criteria for Success
2.5.6. Performing and Documenting a Computational Experiment
2.5.6.1. Documentation issues in computational biology
2.5.6.2. Electronic notebooks
II. The Bioinformatics Workstation
3. Setting Up Your Workstation
3.1. Working on a Unix System
3.1.1. What Does an Operating System Do?
3.1.2. Why Use Unix?
3.1.3. Different Flavors of Unix
3.1.3.1. Linux
3.1.3.1.1. Will Linux run on your computer?
3.1.3.2. Other common flavors
3.1.4. Graphical Interfaces for Unix
3.2. Setting Up a Linux Workstation
3.2.1. Installing Linux
3.2.1.1. System requirements
3.2.1.2. Partitioning your disk
3.2.1.3. Selecting major package groupings
3.2.1.4. Other useful packages to add
3.3. How to Get Software Working
3.3.1. Unix tar Archives
3.3.2. Binary Distributions
3.3.3. RPM Archives
3.3.3.1. GnoRPM
3.3.4. Source Distributions
3.3.5. Perl Scripts
3.3.6. Putting It in Your Path
3.3.7. Sharing Software Among Multiple Users
3.4. What Software Is Needed?
4. Files and Directories in Unix
4.1. Filesystem Basics
4.1.1. Moving Around the Directory Hierarchy
4.1.2. Paths to Files and Directories
4.1.3. Using a Process-Based File Hierarchy
4.1.4. Establishing File-Naming Conventions for Your Work
4.1.5. Structuring a Project: An Example
4.2. Commands for Working with Directories and Files
4.2.1. Moving Around the Filesystem
4.2.1.1. You are here: pwd
4.2.1.2. Changing directories with cd
4.2.2. Finding Files and Directories
4.2.2.1. Listing files with ls
4.2.2.2. Interpreting ls output
4.2.2.3. Finding files with find
4.2.2.4. Finding an executable file with which
4.2.2.5. Finding an executable file with whereis
4.2.3. Manipulating Files and Directories
4.2.3.1. Copying files and directories with cp
4.2.3.2. Moving and renaming files and directories with mv
4.2.3.3. Creating new links to files and directories with ln
4.2.3.4. Creating and removing directories with mkdir and rmdir
4.2.3.5. Removing files with rm
4.3. Working in a Multiuser Environment
4.3.1. Users and Groups
4.3.2. User Directories
4.3.3. File Permissions and Statistics
4.3.3.1. Viewing file attributes with stat
4.3.3.2. Changing file ownership and permissions with chmod
4.3.3.3. Changing file and directory ownership with chown and chgrp
4.3.4. System Administration
4.3.5. Conventions for Organizing Files
4.3.6. Locating Files in System Directories
5. Working on a Unix System
5.1. The Unix Shell
5.1.1. What Flavors of Shell Are There?
5.2. Issuing Commands on a Unix System
5.2.1. The Command-Line Format
5.2.2. Unix Information Commands
5.2.3. Standard Input and Output
5.2.4. Redirection of Command Input and Output
5.2.5. Operators
5.2.6. Wildcard Characters
5.2.7. Running X Commands
5.3. Viewing and Editing Files
5.3.1. Viewing and Combining Files with cat
5.3.2. more: A Step in the Right Direction
5.3.3. less: The Gold Standard
5.3.4. Editing Files with vi and vim
5.3.5. The GNU Emacs Editor
5.3.6. Viewing Binary Files with strings
5.3.7. od and Binary Data
5.4. Transformations and Filters
5.4.1. Extracting the Beginning of a File with head
5.4.2. Extracting the End of a File with tail
5.4.3. Splitting Files with split and csplit
5.4.4. Separating File Components with cut
5.4.5. Combining Files with paste
5.4.6. Merging Datafiles with join
5.4.7. Sorting Files with sort
5.4.7.1. Specifying sort keys
5.5. File Statistics and Comparisons
5.5.1. Comparing Files with cmp and diff
5.5.2. Counting Words with wc
5.6. The Language of Regular Expressions
5.6.1. Searching for Patterns with grep
5.7. Unix Shell Scripts
5.8. Communicating with Other Computers
5.8.1. The Web
5.8.2. IP Addresses and Hostnames
5.8.3. telnet
5.8.4. ftp
5.8.5. Displaying from a Remote Terminal
5.8.6. Communication and File Sharing
5.8.7. Media Compatibility
5.8.8. Accessing Devices as Unix Filesystems
5.8.9. Accessing Devices as DOS Disks
5.9. Playing Nicely with Others in a Shared Environment
5.9.1. Processes and Process Management
5.9.1.1. Checking the load average
5.9.1.2. Listing processes with ps
5.9.1.3. top
5.9.1.4. Signaling processes with kill
5.9.1.5. Setting process priorities with nice and renice
5.9.2. Scheduling Recurring Activities with cron
5.9.2.1. Submitting jobs to cron using crontab
5.9.2.2. Using cron to schedule a recurrent database search
5.9.2.3. Scheduling processes with batch and at
5.9.3. Monitoring Space Usage and File Sizes
5.9.3.1. Checking disk usage with du
5.9.3.2. Checking for free disk space with df
5.9.3.3. Checking your compliance with system quotas with quota
5.9.4. Creating Archives of Your Data
5.9.4.1. tar: Hold the feathers
5.9.4.2. compress
5.9.4.3. gzip
III. Tools for Bioinformatics
6. Biological Research on the Web
6.1. Using Search Engines
6.1.1. Boolean Searching
6.1.2. Search Engine Algorithms
6.2. Finding Scientific Articles
6.2.1. Using PubMed Effectively
6.3. The Public Biological Databases
6.3.1. Data Annotation and Data Formats
6.3.2. 3D Molecular Structure Data
6.3.3. DNA, RNA, and Protein Sequence Data
6.3.4. Genomic Data
6.3.5. Biochemical Pathway Data
6.3.6. Gene Expression Data
6.4. Searching Biological Databases
6.4.1. GenBank
6.4.1.1. Saving search results
6.4.1.2. Saving large result sets
6.4.2. PDB
6.5. Depositing Data into the Public Databases
6.5.1. GenBank Deposition
6.5.2. PDB Deposition
6.6. Finding Software
6.7. Judging the Quality of Information
6.7.1. Authority
6.7.2. Transparency
6.7.3. Timeliness
7. Sequence Analysis, Pairwise Alignment, and Database Searching
7.1. Chemical Composition of Biomolecules
7.2. Composition of DNA and RNA
7.3. Watson and Crick Solve the Structure of DNA
7.4. Development of DNA Sequencing Methods
7.4.1. The Chemical Composition of Proteins
7.4.2. Mechanisms of Molecular Evolution
7.5. Genefinders and Feature Detection in DNA
7.5.1. Predicting Gene Locations
7.5.2. Feature Detection
7.6. DNA Translation
7.7. Pairwise Sequence Comparison
7.7.1. Scoring Matrices
7.7.2. Gap Penalties
7.7.3. Dynamic Programming
7.7.4. Global Alignment
7.7.4.1. Using ALIGN to produce a global sequence alignment
7.7.5. Local Alignment
7.7.5.1. Tools for local alignment
7.8. Sequence Queries Against Biological Databases
7.8.1. Local Alignment-Based Searching Using BLAST
7.8.1.1. The BLAST algorithm
7.8.1.2. NCBI BLAST and WU-BLAST
7.8.1.3. What do the various BLAST programs do?
7.8.1.4. Building a local database with formatdb
7.8.1.5. Evaluating BLAST results
7.8.2. Local Alignment Using FASTA
7.8.2.1. The FASTA algorithm
7.8.2.2. The FASTA programs
7.9. Multifunctional Tools for Sequence Analysis
7.9.1. NCBI SEALS
7.9.2. The Biology Workbench
7.9.3. DoubleTwist
8. Multiple Sequence Alignments, Trees, and Profiles
8.1. The Morphological to the Molecular
8.2. Multiple Sequence Alignment
8.2.1. Progressive Strategies for Multiple Alignment
8.2.2. Multiple Alignment with ClustalW
8.2.3. Viewing and Editing Alignments with Jalview
8.2.4. Sequence Logos
8.3. Phylogenetic Analysis
8.3.1. Phylogenetic Trees Based on Pairwise Distances
8.3.2. Phylogenetic Trees Based on Neighbor Joining
8.3.3. Phylogenetic Trees Based on Maximum Parsimony
8.3.4. Phylogenetic Trees Based on Maximum Likelihood Estimation
8.3.5. Software for Phylogenetic Analysis
8.3.5.1. PHYLIP
8.3.5.1.1. The PHYLIP input format
8.3.5.2. Generating input for PHYLIP with ClustalX
8.4. Profiles and Motifs
8.4.1. Motif Databases
8.4.1.1. Blocks
8.4.1.2. PROSITE
8.4.1.3. Pfam
8.4.1.4. PRINTS
8.4.1.5. COG
8.4.1.6. Accessing multiple databases
8.4.2. Constructing and Using Your Own Profiles
8.4.2.1. Finding new motifs with MEME
8.4.2.2. Searching for motifs with MAST and MetaMEME
8.4.2.3. Motif discovery with other programs
8.4.2.4. HMMer
8.4.3. Incorporating Motif Information into Pairwise Alignment
9. Visualizing Protein Structures and Computing Structural Properties
9.1. A Word About Protein Structure Data
9.2. The Chemistry of Proteins
9.2.1. From 1D to 3D
9.2.2. Interatomic Forces and Protein Structure
9.2.2.1. Covalent interactions
9.2.2.2. Hydrogen bonds
9.2.2.3. Hydrophobic and hydrophilic interactions
9.2.2.4. Charge-charge, charge-dipole, and dipole-dipole interactions
9.2.2.5. Van der Waals forces
9.2.2.6. Repulsive forces
9.2.2.7. Relative strength of interatomic forces
9.3. Web-Based Protein Structure Tools
9.4. Structure Visualization
9.4.1. Molecular Structure Viewers for Your Web Browser
9.4.1.1. RasMol
9.4.1.2. Cn3D
9.4.1.3. SWISS-PDBViewer
9.4.2. Standalone Modeling Packages
9.4.2.1. MolMol
9.4.2.2. MidasPlus
9.4.2.3. VMD
9.4.3. Creating High-Quality Graphics with MolScript
9.4.4. Active Site Visualization with LIGPLOT
9.4.5. dimplot
9.5. Structure Classification
9.5.1. Secondary Structure from Coordinates
9.5.1.1. STRIDE
9.5.2. Topology Cartoons
9.5.2.1. TOPS
9.5.3. Classification Databases
9.5.3.1. SCOP
9.5.3.2. CATH
9.5.3.3. Unique protein structure data sets
9.6. Structural Alignment
9.6.1. Comparing Two Protein Structures
9.6.1.1. ProFit
9.6.2. DALI Domain Dictionary
9.6.3. CE and CL
9.6.4. VAST
9.7. Structure Analysis
9.7.1. Analyzing Structure Quality
9.7.1.1. PROCHECK
9.7.1.2. WHAT IF/ WHAT CHECK
9.7.2. Intramolecular Interactions
9.7.2.1. Computing contacts with HBPLUS
9.8. Solvent Accessibility and Interactions
9.8.1. Computing Solvent Accessibility with naccess
9.8.2. Solvent Accessibility with Alpha Shapes
9.9. Computing Physicochemical Properties
9.9.1. Macromolecular Electrostatics
9.9.2. Visualization of Molecular Surfaces with Mapped Properties
9.9.2.1. GRASP/GRASS
9.10. Structure Optimization
9.10.1. Informatics Plays a Role in Optimization
9.10.2. Rotamer Libraries
9.10.3. PDFs
9.11. Protein Resource Databases
9.11.1. GeneCensus
9.11.2. PRESAGE
9.11.3. BIND
9.12. Putting It All Together
10. Predicting Protein Structure and Function from Sequence
10.1. Determining the Structures of Proteins
10.1.1. Solving Protein Structures by X-ray Crystallography
10.1.2. Solving Structures by NMR Spectroscopy
10.2. Predicting the Structures of Proteins
10.2.1. CASP: The Search for the Holy Grail
10.3. From 3D to 1D
10.4. Feature Detection in Protein Sequences
10.5. Secondary Structure Prediction
10.5.1. Alignment-Based and Hybrid Methods
10.5.2. Single Sequence Prediction Methods
10.5.3. Measuring Prediction Accuracy
10.5.4. Putting Predictions to Use
10.5.5. Predicting Transmembrane Helices
10.5.6. Threading
10.6. Predicting 3D Structure
10.6.1. Homology Modeling
10.6.1.1. Modeller
10.6.1.2. How Modeller builds a model
10.6.1.3. ModBase: a database of automatically generated models
10.6.1.4. The SWISS-MODEL server
10.6.2. Tools for Ab-Initio Prediction
10.7. Putting It All Together: A Protein Modeling Project
10.7.1. Finding Homologous Structures
10.7.2. Looking for Distant Homologies
10.7.3. Predicting Secondary Structure from Sequence
10.7.4. Using Threading Methods to Find Potential Folds
10.7.5. Using Profile Methods to Align Distantly Related Sequences
10.7.6. Building a Homology Model
10.8. Summary
11. Tools for Genomics and Proteomics
11.1. From Sequencing Genes to Sequencing Genomes
11.1.1. Analysis of Raw Sequence Data: Basecalling
11.1.2. Sequencing an Entire Genome
11.1.2.1. The shotgun approach
11.1.2.2. The clone contig approach
11.1.2.3. LIMS: Tracking all those minisequences
11.2. Sequence Assembly
11.3. Accessing Genome Information on the Web
11.3.1. NCBI Genome Resources
11.3.2. TIGR Genome Resources
11.3.3. EnsEMBL
11.3.4. Other Sequencing Centers
11.3.5. Organism-Specific Resources
11.4. Annotating and Analyzing Whole Genome Sequences
11.4.1. Genome Annotation
11.4.1.1. MAGPIE
11.4.2. Genome Comparison
11.4.2.1. PipMaker
11.4.2.2. MUMmer
11.5. Functional Genomics: New Data Analysis Challenges
11.5.1. Sequence-Based Approaches for Analyzing Gene Expression
11.5.2. DNA Microarrays: Emerging Technologies in Functional Genomics
11.5.3. Bioinformatics Challenges in Microarray Design and Analysis
11.5.3.1. Planning array experiments
11.5.3.2. Analyzing scanned microarray images with CrazyQuant
11.5.3.3. Visualizing high-dimensional data
11.5.3.4. Clustering expression profiles
11.5.3.5. A note on commercial software for expression analysis
11.6. Proteomics
11.6.1. Experimental Approaches in Proteomics
11.6.2. Informatics Challenges in 2D-PAGE Analysis
11.6.3. Tools for Proteomics Analysis
11.6.4. Generalizing the Array Approach
11.7. Biochemical Pathway Databases
11.7.1. Illustration of a Complex Metabolic Pathway
11.7.2. EC Nomenclature
11.7.3. WIT and KEGG
11.7.4. PathDB
11.8. Modeling Kinetics and Physiology
11.8.1. Modeling Kinetics with Gepasi
11.8.2. XPP
11.8.3. Using the Virtual Cell Portal
11.9. Summary
IV. Databases and Visualization
12. Automating Data Analysis with Perl
12.1. Why Perl?
12.1.1. Where Do I Get Perl?
12.2. Perl Basics
12.2.1. Hello World
12.2.2. A Bioinformatics Example
12.2.3. Variables
12.2.3.1. Scalars
12.2.3.2. Arrays
12.2.3.3. Hashes
12.2.4. Loops
12.2.5. Subroutines
12.3. Pattern Matching and Regular Expressions
12.4. Parsing BLAST Output Using Perl
12.5. Applying Perl to Bioinformatics
12.5.1. Bioperl
12.5.2. CGI.pm
12.5.3. LWP
12.5.4. PDL
12.5.5. DBI
12.5.6. GD
13. Building Biological Databases
13.1. Types of Databases
13.1.1. Flat File Databases
13.1.1.1. Flat file databases in biology
13.1.2. Relational Databases
13.1.2.1. How tables are organized
13.1.2.2. The database schema
13.1.3. Object-Oriented Databases
13.2. Database Software
13.2.1. Sequence Retrieval System
13.2.2. Oracle
13.2.3. PostgreSQL
13.2.4. Open Source Object DBMS
13.2.5. MySQL
13.3. Introduction to SQL
13.3.1. SQL Datatypes
13.3.2. SQL Commands
13.3.2.1. Adding a new table to a database
13.3.2.2. Changing an existing table
13.3.2.3. Adding data to an existing table
13.3.2.4. Altering existing data in a table
13.3.3. Accessing Your Database with the SQL SELECT Command
13.3.3.1. Choosing fields to select
13.3.3.2. Using a WHERE clause to specify selection conditions
13.3.3.3. Joining output from multiple tables
13.4. Installing the MySQL DBMS
13.4.1. Setting Up MySQL
13.4.1.1. Using the mysql client program
13.4.1.2. Using the mysqladmin client program to set up MySQL
13.4.1.3. Restarting the MySQL server
13.4.2. Securing Your MySQL Server
13.4.3. Setting Up the Data Directory
13.4.4. Creating a New Database
13.5. Database Design
13.5.1. On Entities and Attributes
13.5.2. Creating a Database from Your Data Model
13.5.3. Creating Relationships Between Tables
13.6. Developing Web-Based Software That Interacts with Databases
13.6.1. CGI
13.6.2. XML
13.6.2.1. XML applications
13.6.3. PHP
13.6.3.1. Accessing MySQL databases with PHP
13.6.3.2. Collecting information from a form with PHP
14. Visualization and Data Mining
14.1. Preparing Your Data
14.2. Viewing Graphics
14.2.1. xzgv
14.2.2. Ghostview and gv
14.2.3. The GIMP
14.3. Sequence Data Visualization
14.3.1. Making Publication-Quality Alignments with TEXshade
14.3.2. Viewing Sequence Distances Geometrically
14.4. Networks and Pathway Visualization
14.5. Working with Numerical Data
14.5.1. gnuplot and xgfe
14.5.2. Grace: The Pocketknife of Data Visualization
14.5.3. Multidimensional Analysis: XGobi and XGvis
14.5.4. Programming for Data Analysis
14.5.4.1. R and S-plus
14.5.4.2. Online resources for R
14.5.4.3. Matlab and Octave
14.6. Visualization: Summary
14.7. Data Mining and Biological Information
14.7.1. Problems in Data Mining and Machine Learning
14.7.1.1. Supervised and unsupervised learning
14.7.2. A Collection of Data Mining Techniques
14.7.2.1. Decision trees
14.7.2.2. Neural networks
14.7.2.3. Genetic algorithms
14.7.2.4. Support vector machines
Bibliography
Unix
SysAdmin
Perl
General Reference
Bioinformatics Reference
Molecular Biology/Biology Reference
Protein Structure and Biophysics
Genomics
Biotechnology
Databases
Visualization
Data Mining
Colophon
Part IV. Databases and Visualization
Chapter 12
Chapter 13
Chapter 14