Index

AADM format, 281, 292

Abstract Syntax Notation One (ASN1), 407

abstraction, process, 358–359

access, restriction, in Sequence Retrieval System, 128

accession number, 44

accuracy, 25–27

AADM. See Affymetrix Analysis Data Model

Affymetrix Analysis Data Model, 407

Affymetrix GeneChip microarray, 280

GeneExpress system and, 282

aggregation, 407

algebra, relational, 42–43, 420

algorithm

cell averaging, 280

gene expression data, 286–287

AllGenes project, 53–54

AllGenes query, 57, 58

ampersand, 120

analysis, 404

complexity of, 6–7

analysis package, Kleisli query system and, 165

analysis program, sensitivity of, 26

analysis software, 19

analysis tool in Sequence Retrieval System, 137–139

annotation, gene

as integration challenge, 289–290

standardization involving, 282

annotation data mapping, gene, 295–296

annotation data space, gene, 279

annotation pipeline, genome, 26

anomaly, update, 40

ANSI-SPARC three-schema architecture, 254–257

API. See application programming interface

application programming interface (API), 381, 407

application semantics, 19

architecture

DiscoveryLink, 309–312

federated. See federation

grid, 91–92

K2, 231–232

of KIND model-based mediator, 361

mediator, 256–261

of Sequence Retrieval System, 111

three-schema, 254–257

array

different versions of, 285–286

probe, 280

ASN1. See Abstract Syntax Notation One

Atlas, SMART, 362–364

automated server maintenance in Sequence Retrieval System, 141–143

automatic summary table, 407

autonomous data source, 18

autonomy of databases, 407

bag, 408

Basic Local Alignment Search Tool (BLAST), 25, 137–138, 408

DiscoveryLink and, 311, 313–316

FASTA and, 146

functionality of, 383

integration of, 45–46

querying vs. browsing, 47

batch queue, 139

benchmarks in performance evaluation, 374–375

bi-valued semantics, 90

Binary Large Object (BLOB), 408

bindjoin, 408

bioinformatics

biological data integration, 4–7

definition of, 3

design of system, 75–101. See also design of biological information system

future of, 394–396

problem and scope of, 2–4

system development, 7–10

biological data, nature of, 15–17

biological data integration, 7–10

biological database, Kleisli query system and, 165–166

biological ontology, 216–217

biological resource, 397–405

query processing and, 92–93

biological sample data space, 278–279

biological tool, legacy, 79–80

biology

fusion with information science, 2–3

systems, 421

BLAST. See Basic Local Alignment Search Tool

blastn, 408

blastp, 408

BLOB. See Binary Large Object

Boolean circuit, 408

Boolean query, 24

BottomUpOnce strategy, 172

box plot, 408

browsing

definition of, 408

design of, 89–90

example of, 50–52

querying vs., 46–48

scientific objects, 100–101

semantic, in model-based mediation, 344

strengths and weaknesses of, 61–62

bulk data type, 408

calcium channel protein, example using, 319–322

Call-Level Interface (CLI), 409

canned query, 139

capability, source, 93

capturing, relational schema, 125–126

capturing process knowledge, 340–341

CDATA, 408

cDNA, 409

CDS, 409

cell averaging algorithm, 280

Cell-Centered Database, 345–347, 362–364

CGI, 409

challenges of information integration, 11–31

data integration, 21–24

meta-data specification, 24–25

ontology, 27–30

provenance and accuracy, 25–27

Web presentations, 30–31

Character Large Object (CLOB), 409

CLI. See Call-Level Interface

CLOB. See Character Large Object

CLUSTAL, 137–138

clustering technique, 13

CM. See Conceptual Model

CNS tissue, 409

co-clustered fragment, 409

code

Icarus, 113–115

Perl, 167, 168

code generator, 260

Collection Programming Language (CPL)

definition of, 409

DiscoveryLink and, 308

K2 system and, 228

P/FDM mediator and, 267

query processor and, 205

combining old and new data, 68

Common Object Request Broker Architecture (CORBA), 22, 91, 140, 141

definition of, 410

TAMBIS and, 214–215

comparative genomics, 409

compensation in query optimization, 317–318

compilation of domain maps, 354–355

compiler

condition, 260

execution plan, in KIND model-based mediator, 362

complex DTDs, 121

complex multiple-world scenario, 336–337

complex objects in Sequence Retrieval System, 134

complex value data, 233, 409

composite structure, links to create, 136

composition, view, 68

comprehension syntax-based language, 151

Comprehensive Data Center, 397

computational analysis tool, 19

concept

definition of, 190

parameterized, 356–357

recursive, 356

restricting of, 200–201

role as, 353

in system design, 85–86

concept description, query as, 197–202

concept integration, 4–5

concept overloading, 5

Conceptual Model (CM), 410

conceptual schema, 44, 255

condition compiler, 260

consortium, Gene Ontology, 29

construction, of links, 131–132

context-sensitive optimizations, 171–174

contextual references, in model-based mediation, 349

contextualization, in model-based mediation, 344, 350–351

controlled vocabulary, 40

CORBA. See Common Object Request Broker Architecture

cost model in performance evaluation, 372–374

cost of query processing, 96–97

DiscoveryLink and, 318, 322–326

coverage of information sources, 92

CPL. See Collection Programming Language

CPL2Perl, 176–179

CPU, 410

creating wrapper in DiscoveryLink registration, 313

criterion, 193

curated database, 26

curated gene data source, simple, 37–38

curation, data, definition of, 410

Daplex query language

capabilities of, 264–265

example using, 261, 262, 264

functional data model and, 252, 253

data

model, in K2 information integration system, 232–235

multimedia, 99–100

standardization involving, 282

data cleansing, 410

data curation, 410

data dictionary, 22

data distribution, in system evaluation, 386–387

data-driven integration, 91–92

data driver

decoupled, 242

integrated, 241

data exchange

for integration of third-party gene expression data, 291–293

standards for, 282

data federation, use case, 68–69

data format, updating of, 6

data fusion, 82, 410

data integration. See Integration, data

data loading, 296–297

data management, 35–69

basics, 36–39

gene expression, 277–299. See also gene expression data management

relational model, 41–44

retrieving genes, 38–39

semi-structured text files, 40–41

simple curated gene data source, 37–38

spreadsheets, 39–40

traditional, 41–44

transforming of database structure, 44

data mapping, semantic, 293–296

data mining, 87–89, 411

data model, 411

in K2 information integration system, 232–235

non-relational, 64

relational, 41–44

strengths and weaknesses of, 64

data organization, traditional, 81

data provenance, 25–27

data provider in model-based mediation, 343

data replication approach, 250–251

data repository, 4

data-shipping, 411

data source

characteristics of, 17–19

definition of, 147, 411

DiscoveryLink registration and, 314

gene expression data management and, 290

in K2 information integration system, 240–242

Kleisli query system and, 165–167

mediator and, 349–351

P/FDM mediator and, 265–266

simple curated gene, 37–38

Web, 65–67

data space, gene expression, 278–281

biological sample, 278–279

gene annotation, 279

gene expression measurement, 279–281

data transformation, 5

data type, 411

data warehouse. See Warehousing

databank

definition of, 112

relational, viewing entry from, 128–129

XML, loading from, 135

Databank, in Sequence Retrieval System, 109–116

database

autonomy of, 407

biologic, Kleisli query system and, 165–166

cell-centered, 362–364

definition of, 36, 112, 410

Expressed Sequence Tag, 319

flat files vs., 78

heterogeneous, definition of, 414

link-driven federation of, 415–416

number of, 4

patent, Kleisli query system and, 164

relational, query performance to, 128

virtual, in DiscoveryLink, 305

database management, traditional, 80–81

database management system (DBMS), 36

definition of, 411

relational, 21–22

database structure, transforming, 44

database system, 405

Datalog, 411

DB2 DataJoiner, 306

DCOM. See Microsoft Distributed Component Object Model

DBMS. See database management system

DDL statement, 313–314

declarative access, procedural access vs., 49

declarative query language, 63

decomposition, query, 68

decoupled data driver, 242

definition

integrated view, 345

intensional, 348–349

delivery pattern in query processing, 93

Department of Energy unanswerable query challenge, 226, 228, 229, 375–376

deployment issues in GeneExpress system, 283–284

description, concept, query as, 197–202

description logic ontology, 194

description logics, 411

design of biological information system, 75–101

browsing, 89–90

concepts and ontologies, 85–86

data fusion, 82

engineering vs. experimental science, 76–77

fully structured vs. semi-structured, 82–84

generic system vs. query-driven, 77–78

legacy data and tools, 78–80

queries, 86–98. See also Query

scientific object identity, 84–85

searching, 87–89

tool-driven vs. data-driven, 91–92

traditional database management, 80–81

visualization, 98–101

development process, 9

dictionary

data, 22

in K2 system, 233

difference operation, 42

discovery process, life sciences, 12–14

discoveryHub, efficiency of, 377

DiscoveryLink, 24, 55–58, 303–331

approach, 306–316

architecture, 309–312

registration, 313–316

ease of use, scalability, and performance of, 327–329

efficiency of, 377

functionality of, 383

Kleisli query system and, 181–182

materialized vs. non-materialized approach and, 386

query processing in, 316–326

determining costs, 322–326

example of, 319–322

optimization and, 317–319

system information for, 428

distributed data, 45

distributed database systems, 411

distributed integration approach, 22

distributed object technology, 91

distribution, data, in system evaluation, 386–387

diversity, 15–16, 19–20

DNA, definition of, 412

DNA microarray, 412

DNA sequence, resources for, 397–398

DNA sequencing, 412

domain, constantly changing, 80

domain map, 335

definition of, 412

for model-based mediator system, 352–357

compilation of, 354–355

definition of, 352–353

deriving role hierarchy, 355–356

as logic rules, 354–355

parameterized role and concepts, 356–357

recursive concepts, 356

reified roles as concepts, 353

remarks, 355

role hierarchy, 354

domain semantics, 337

domain-specific benchmark, 374

driver

decoupled data, 242

integrated data, 241

DTD file, complex, 121

DTDGenerator, 120–121

EBI. See European Bioinformatics Institute

EcoCyc, 216

efficiency

as implementation criterion, 377–378

as user criterion, 382

elaboration, process, 358–359

elaboration identifier, 358

EMBOSS, 138

Empty syntax, XML and, 118–119

end user in model-based mediation, 344

engineering

experimental science vs., 76–77

knowledge, 353

entity, general, 119–120

Entrez interface, 88–89

entry ID, hub table as, 126

environment, for life science discovery, 14–15

ENZYME, 403

enzyme, definition of, 412

ER model, 412

error

propagation of, 26

in spreadsheet, 40

EST sequence, definition of, 412

European Bioinformatics Institute (EBI), 91

evaluation, query, 95, 96

evaluation matrix, 372

evaluation of data management system, 9–10, 371–390

implementation criteria for, 376–381

efficiency, 377–378

extensibility, 378–379

functionality, 379

scalability, 379–380

understandability, 380

usability, 381

performance model for, 371–376

benchmarks, 374–375

cost model, 372–374

evaluation matrix, 372

tradeoffs in, 385–389

data distribution and heterogeneity, 386–387

integrating applications, 389

materialized vs. non-materialized approach, 385–386

semi-structured vs. fully structured data, 387–388

user criteria for, 382–385

efficiency, 382

extensibility, 382–383

functionality, 383

scalability, 383

understandability, 384

usability, 384–385

evolution biology, 12

Excel, 39–40

exchange format

Kleisli, 156, 157

self-describing, 156

standards for, 282

for third-party gene expression data integration, 291–293

execution plan compiler in KIND model-based mediator, 362

experimental science, engineering vs., 76–77

explorer window in TAMBIS, 195–197

exporter in P/FDM mediator, 251

exporting from SRS to XML, 136–137

Expressed Sequence Tag database, 319

expression

shorthand, 119–120

table, 421

expression profile, 13

extensibility

as implementation criterion, 378–379

as user criterion, 382–383

extensible markup language (XML), 43–44

for biological Web services, 30–31

browsing and, 90

categories of, 83

database integration into Sequence Retrieval System, 116–124

challenge of, 122–124

procedure for, 120–121

support features, 121–122

uniqueness of, 118–120

definition of, 423

exporting objects from SRS, 136–137

loading from, 135

navigational capabilities of, 90

semi-structured vs. fully structured data and, 387–388

Sequence Retrieval System and, 110, 116–124

TAMBIS and, 215

wrapper, 312

external schema, 254

FASTA, 137–138, 146

Feature table of GenBank, 159

federation, 22

definition of, 412

DiscoveryLink based on, 306

example of, 54–58

link-driven, 415–416

P/FDM mediator and, 249–272

alternative architectures for integration, 250–252

analysis, 266–272

data sources, 265–266

example of, 261–264

functional data model, 252–254

mediator architecture, 257–261

query capabilities, 264–265

schemas in federation, 254–257

Sequence Retrieval System and, 143

use case, 68–69

warehousing vs., 49

fields, SRS, 130

file

hypertext markup language, 147–148

probe intensity, 281

semi-structured text, 40–41

filler, 193

filter, 208

First Order logic, 413

flat file, database vs., 78

flat file databank integration, 112–116

foreign key, 413

format

data

semi-structured text, 40–41

updating of, 6

exchange

Kleisli, 156, 157

self-describing, 156

standards for, 282

for third-party gene expression data integration, 291–293

self-describing exchange, 156

fragment, gene, 289

definition of, 413

frame-based system, 217

frame of reference, terminological, 347

FTP, 413

fully structured data, semi-structured data vs., 387–388

fully structured information system, 82–84

functional data model, 252–254

functional genomics, 413

functional programming language, 413

functionality

as implementation criterion, 379

as user criterion, 383

fuser, result, 261

fusion

data, 82

definition of, 410

vertical loop, 170

future of bioinformatics, 394–396

Garlic project, 306–307

GCG. See Genetics Computer Group

GDB. See Genome DataBase

GenAtlas, querying in, 85

GenBank

accession number, 44

feature table of, 159

identifiers in, 100–101

Kleisli query system and, 150

materialized vs. non-materialized approach and, 385–386

search in, 66–67

gene, definition of, 413

gene annotation

as integration challenge, 289–290

standardization involving, 282

gene annotation data mapping, 295–296

gene annotation data space, 279

gene chip microarray technology, 414

gene data source, simple curated, 37–38

gene discovery, 319

gene expression, 399, 413

Gene Expression Array (GXA), 283–284

gene expression data management, 277–299

data spaces, 278–281

biological sample, 278–279

gene annotation, 279

gene expression measurement, 279–281

GeneExpress system for, 282–284

integration in, 285–290

algorithms and normalization and, 286–287

array versions and, 285–286

gene annotation and, 289–290

sample data and, 288

of third-party gene expression data, 291–298

variability and, 287–288

gene expression measurement data space, 279–281

gene fragment, definition of, 413

Gene Logic, DiscoveryLink and, 308

Gene Nomenclature Committee (HGNC), 28, 402

Gene Ontology (GO) Consortium, 29, 217

description of, 402

gene product, 413

GeneCards, search in, 66–67

GeneChip, 413

GeneChip microarray, 280

GeneExpress, system information for, 427

GeneExpress Data Warehouse (GXDW), 283–284

gene annotation component of, 290

GeneExpress system, 282–284

algorithms in, 286–287

components of, 283

deployment and update issues in, 283–284

integrating third-party expression data in, 291–298

sample data in, 288

general entity, 119–120

generator

code, 260

logic plan, 360–361

generic approach, 49–50

query-driven approach vs., 77–78

strengths and weaknesses of, 63

generic benchmark, 374

generic query optimization, 267–268

genetics, 399

Genetics Computer Group (GCG), 307–308

genome

definition of, 414

resources of, 398

genome annotation pipeline, 26

Genome DataBase (GDB)

Kleisli query system and, 150–151

materialized vs. non-materialized approach and, 385–386

object identity and, 84–85

genome project, 414

genomic data source as integration challenge, 289–290

Genomic Unified Schema, 385–386

genomics, 414

functional, 413

research needs of, 12–13

GenPept report, 153–154

creating warehouse of, 164–165

Glimpse search engine, 88

global-as-view technique, 216

definition of, 414

in model-based mediation, 349, 350

global integration schema, 266

global schema, 45–46, 414

Globus Pallidus External, 351

GO databank in Sequence Retrieval System, 126–127

GRAIL, 202

GRAIL query, 205–206

query planner, 208–211

graphical interface, 179

graphical user interface, for P/FDM, 269, 271

Grid, 414

grid architecture, 91–92

GUI, 414

GXA. See Gene Expression Array

GXDW. See GeneExpress Data Warehouse

hard-coding, 49–50

legacy tools including, 80

strengths and weaknesses of, 63

hardwired access to data sources, 304

hardwiring of mapping in GeneExpress system, 295

hash table, 321

heterogeneity

in semantic data integration, 58–59

syntactic and semantic, 212

heterogeneous data format, 18, 19–20

heterogeneous database, definition of, 414

HGNC. See Gene Nomenclature Committee

hierarchy, role, 355–356

hierarchy, in GeneExpress system, 293

host variable, 414

HTML. See hypertext markup language file

HTTP, 414

hub table, 126–127

HUGO. See Human Genome Organization

HUGO name, withdrawn or approved, 84–85

human computer interaction, 375

Human Genome Initiative, 415

Human Genome Project, 415

Human Genome Organization (HUGO), 28, 402

hybrid integration approach, 64–65

hybridization, 415

hypertext markup language file (HTML), 147–148

hypothesis as design step, 76

Icarus code, 113–114

ICode, 257–258, 261, 262–263

ICode rewriter, 260

ID, entry, hub table as, 126

identifier, elaboration, 358

identity

pre-defined, 81

scientific object, 84–85

IBM DiscoveryLink middleware system, 24

ImMunoGeneTics information system, 403

implementation, experiment as, 76

implementation criteria for system evaluation, 376–381

efficiency, 377–378

extensibility, 378–379

functionality, 379

scalability, 379–380

understandability, 380

usability, 381

in silico discovery kit (ISDK), 160, 161, 415

indexing, SRS support for, 121–122

indexing tool output, 138

industrial merger, 303

information integration

in bioinformatics, 213–215

biologic ontologies, 216–217

data challenges, 21–24

data provenance and accuracy, 25–27

knowledge based, 215–216

meta-data specification, 24–25

ontology, 27–30

Web presentations, 30–31

information integration system, K2, 225–247. See also K2 information integration system

information science, fusion with biology, 2–3

Informax, 307

Infosleuth, 266

initial process semantics, 357

input, processing of, 138

input/output format, 19

integrated data driver, 241

Integrated Taxonomic Information System, 402

integrated view definition, 345

integrated view of biology, 12

integration

schema, 421

in system evaluation, 389

view, 423

integration, data, 4–10, 60–69

browsing vs. querying, 46–48, 61–62

as challenges, 21–24

challenges of, 11–31

concept, 4–5

declarative query language, 63

definition, 410

development process, 9

evaluation of, 9–10

of flat file databanks with SRS, 112–116

of gene expression data, 285–290

algorithms and normalization and, 286–287

array versions and, 285–286

sample data and, 288

gene annotation and, 289–290

variability and, 287–288

generic approach to, 63

hard-coded approach to, 63

hybrid approach to, 64–65

issues of, 4–7

procedural code, 63

relational vs. non-relational, 64

semantic, 58–60

semantic query planning, 65–67

specifications for, 7–8

syntactic vs. semantic, 48–49

technical approach, 8–9

of third-party gene expression data, 291–298

data exchange formats for, 291–293

data loading issues in, 296–297

semantic data mapping issues in, 293–296

structural data transformation issues in, 293

update issues in, 297–298

tool-driven vs. data-driven, 91–92

use case for, 45–46

Web data sources, 66

integration schema, global, 266

intensional definitions, 348–349

intensity file, probe, 281

interaction, human computer, 375

interface

application programming, 407

Entrez, 88–89

graphical, 179

in K2 information integration system, 243–244

keyword-search querying, 24

Kleisli query system and, 166

for P/FDM, 268–271

to Sequence Retrieval System, 139–141

TAMBIS, 195–205

constructing queries, 197–202

exploring ontology, 195–197

query processor, 205–212

reasoning in query formulation, 202–205

intermediary, 8

internal language, of K2 information integration system, 239–240

internal schema, 254, 256

International Classification of Diseases, Ninth Revision, 402

International Organization for Standardization, 415

International Union of Biochemistry and Molecular Biology (IUBMB), 28, 403

International Union of Pure and Applied Chemistry (IUPAC), 28, 403

is a hierarchy, 192

ISA relationship, 415

ISDK. See in silico discovery kit

ISO. See International Organization for Standardization

iteration, 207

IUBMB. See International Union of Biochemistry and Molecular Biology

IUPAC. See International Union of Pure and Applied Chemistry

Java-based visual interface, for P/FDM, 268

Java DataBase Connectivity (JDBC), 229, 415

Java RMI, 241–242

JDBC. See Java DataBase Connectivity

join, 42

joining data in DiscoveryLink query processing, 317–318

joins, spatial, 337

Journal of Nucleic Acid Research, 17

K2 information integration system, 225–247

approach in, 229–232

data model and languages in, 232–235

data sources in, 240–242

example of, 235–239

impact of, 245–246

internal language of, 239–240

Kleisli vs., 228–229

query optimization in, 242–243

scalability of, 244–245

system information for, 426

user interfaces in, 243–244

K2MDL, 231–232, 415

KEGG. See Kyoto Encyclopedia of Genes and Genomes

key, primary, 81

keyword-search querying interface, 24

KIND

mediator prototype, 360–362

system information for, 428–429

understandability of, 381, 384

Kleisli query system, 23–24, 147–184

approach of, 151–153

data model and representation in, 153–157

data sources in, 165–167

DiscoveryLink and, 181–182

efficiency of, 377–378

functionality of, 383

K2 information integration system vs., 228–229

motivating example for, 149–151

Object-Protocol Model and, 182–183

optimizations, 167–169

context-sensitive, 171–174

monadic, 169–170

relational, 174–175

query capability of, 158–163

Sequence Retrieval System and, 179–181

system information for, 425

understandability of, 384

user interfaces, 175–179

graphical, 179

program language, 175–179

warehousing capability of, 163–165

knowledge, process, 340–341

knowledge base, 90

knowledge based information integration, TAMBIS, 215–216

knowledge engineering, 353

knowledge representation in model-based mediator system

domain maps for, 352–357

compilation of, 354–355

definition of, 352–353

deriving role hierarchy, 355–356

as logic rules, 354–355

parameterized role and concepts, 356–357

recursive concepts, 356

reified roles as concepts, 353

remarks, 355

role hierarchy, 354

process maps for, 357–360

domain maps and, 358

initial process, 357

as logic rules, 359–360

process elaboration and abstraction, 358–359

known gene, 416

KRAFT, 266

Kyoto Encyclopedia of Genes and Genomes (KEGG), 416

Laboratory Information Management System (LIMS), 13, 127

definition of, 416

GeneChip, 281

output, 20

language

Daplex, 253

extensible markup. See extensible markup language (XML)

functional programming, 413

of K2 information integration system, 232–235, 239–240

query

definition of, 419

limitations of, 86–87

SRS, 129–130

legacy data and tools

biologic, 78–79

workflows, 79–80

LENS, 86

library, subentry, 116

life sciences discovery process, 12–14

LIMS. See Laboratory Information Management System

link

browsing, 89–90

in browsing scientific objects, 100

link-driven federation of databases, 416

link operator in SRS query language, 132–133

linking, databank, to Sequence Retrieval System, 130–133

LION, 307

LISP, 416

list, definition of, 416

list comprehension, 257

literature reference, 401

loader, object, in Sequence Retrieval System, 133–137

data, 296–297

from XML databank, 135

local-as-view technique, 216

definition of, 416

in model-based mediation, 350–351

local ontology, in model-based mediation, 344

local schema, 45–46

LocusLink, 403

logic

First Order, 413

temporal, 90

logic plan generator, 360–361

logic rule

domain map as, 354

process map as, 359–360

logics, description, 411

LOGSPACE, 416

long-term potentiation in nerve cell, 340

loop design, 76

loosely coupled system, 250

maintenance, automated server, in Sequence Retrieval System, 141–143

management

data, 35–69. See also data management

multimedia, 99–100

schema, 67–69

space, 373

time, 372–373

traditional database, 80–81

map

domain, 335

definition of, 412

in neuroscience, 339–342

process, 335

definition of, 419

simple process, 342

subprocess, 359

mapped role, 208

mapping

P/FDM mediator and, 263

schema, 68

semantic data, in integration of third-party expression data, 293–296

markup language, extensible. See extensible markup language

MAS. See microarray suite, GeneChip

MAS algorithm, 286–287

materialized approach, 385–386

materialized view, 44, 416

matrix

evaluation, 372

GXA, 283–284

MBM. See model-based mediation

measurement data space, gene expression, 279–281

mediation, semantic, 364

mediator

definition of, 417

sources and, 349–351

mediator architecture, 256–261

mediator database system, 22–24

mediator system

K2, 230–231

description of, 237–239

model-based, 335–366. See also model-based mediator system

P/FDM, 249–272. See also P/FDM mediator

prototype, 261–266

MEDLINE, 66

MEDLINE report, 153

merger, industrial, 303

meta-data, 56

Sequence Retrieval System and, 109–110, 111

meta-data specification, 24–25

meta language (ML), 417

MGED. See Microarray Gene Expression Database society

MIAME. See minimum information about microarray experiment

microarray

different versions of, 285–286

DNA, 411–412

microarray analysis, 404

Microarray Gene Expression Database society (MGED), 281, 417

microarray suite algorithm, 286–287

microarray suite (MAS), GeneChip, 280

microarray technology, gene chip, 414

Microsoft Distributed Component Object Model (DCOM), 91

Microsoft Visual Basic, 40

middleware, 417

middleware system, DiscoveryLink, 24. See also DiscoveryLink

minimum information about a microarray experiment (MIAME), 281–282, 417

mining, data, 87–89, 411

mismatch probe, 280

ML. See meta language

model

conceptual, 410

cost, 372–374

data, relational, 41–44

ER, 412

functional data, 252–254

object-oriented, 418

relational, 420

sources and services, 206–208

model-based mediator system, 335–366

background of, 336–337

Cell-Centered Database and SMART Atlas, 362–364

challenges from neurosciences, 338–342

conceptual models and source registration at, 344–349

for Cell-Centered Database, 345–347

contextual references, 349

creating terminological frame of reference, 347

intensional definitions, 348–349

ontological grounding of OM (S), 348

semantics of relationships in, 347–348

domain maps for, 352–357

compilation of, 354–355

definition of, 352–353

deriving role hierarchy, 355–356

as logic rules, 354–355

parameterized role and concepts, 356–357

recursive concepts, 356

reified roles as concepts, 353

remarks, 355

role hierarchy, 354

interplay between mediator and sources, 349–351

KIND mediator prototype, 360–362

process maps for, 357–360

domain maps and, 358

initial process, 357

as logic rules, 359–360

process elaboration and abstraction, 358–359

protagonists in, 343–344

reason-able meta-data, 365–366

related work, 364–365

model-based mediation (MBM), 417

module

optimizer, 260

reordering, 260

monad approach, 228

monadic optimizations, 169–170

motif, 192, 204

motivating use case, 45–46, 47

Mouse Genome Database

syntactic vs. semantic integration, 48–49

use case for integration, 45–46

mRNA, 417

multi-database approach, 251–252, 417

multidisciplinary approach, 15

multimedia data, 99–100

multiple sequence alignment, 404

name, HUGO, withdrawn or approved, 84–85

National Biological Information Infrastructure, 402

NCBI Entrez, 51–52

NCMIR, 338–339

nested object in Sequence Retrieval System, 134

Nested Relational Calculus (NRC), 152, 163, 418

nested relationalized version of SQL, 151–153

nested structure in K2 system, 226

neuroinformatics, 12

neuroscience, data integration in, 338–339

nomenclature, sample data mapping, 294–295

non-databased query, 175–176

non-materialized approach, 385–386

non-materialized view, 44, 418

non-relational data model, 64

relational data model vs., 50

nonsensical question, 201–202

normal syntax, XML and, 118

normalization, gene expression data and, 286–287

novel gene discovery, 319

NP (NPTIME), 418

NP-complete, 418

NRC. See Nested Relational Calculus

number, accession, 44

OASIS, 31

object

browsing of, 100–101

complex and nested, 134

Sequence Retrieval System, 140–141

Object Data Management Group (ODMG), 231–233, 418

Object Definition Language (ODL), 418

object identity, scientific, 84–85

object loader in Sequence Retrieval System, 133–137

complex and nested objects, 134

exporting objects to XML, 136–137

links to create composite structures, 136

support for, 135

Object Management Group (OMG), 22, 28, 419

object model, 344

object-oriented database, 308

object-oriented interface to Sequence Retrieval System, 140–141

object-oriented model, 418

object-oriented programming, 253, 254

object-oriented technology, 22

Object-Protocol Model (OPM), 24

DiscoveryLink and, 308

Kleisli query system and, 162, 182–183

system based on, 85–86

TAMBIS and, 213–214

Object Query Language (OQL), 86, 419

definition of, 418

K2 system and, 228

ODB-Tools, 365

ODBC. See Open DataBase Connectivity

ODL. See Object Definition Language

ODMG. See Object Data Management Group

OIL. See Ontology Inference Layer

OLAP. See on-line analytical processing

OMB. See Ontology for Molecular Biology

OMG. See Object Management Group

on-line analytical processing (OLAP), 419

one-world/multiple-world scenarios, 419

ontological grounds of OM (S), 348

ontology, 27–30

biological, 216–217

definition of, 419

in model-based mediation, 344

neuroscience, 339

in system design, 85–86

TAMBIS, 192–197, 214, 219–220

Ontology Inference Layer (OIL), 418

Ontology for Molecular Biology (OMB), 217

Open DataBase Connectivity (ODBC), 418

OPM. See Object-Protocol Model

optimization, query, 95–98

Daplex and, 264

in DiscoveryLink, 317–319

generic, 267–268

in K2 information integration system, 242–243

Kleisli query system and, 167–169

monadic, 169–170

relational, 174–175

semantic, 258, 267

optimizer module, 260

OQL. See Object Query Language

Oracle, 308

Oracle wrapper, 311

organ resources, 401

organism resources, 401

organization, data, 78–79

traditional, 81

output, processing of, 138

overloading, concept, 5

P (PTLME), 420

P/FDM mediator, 249–272

alternative architectures for integration, 250–252

analysis, 266–272

optimization, 267–268

scalability, 271–272

user interface, 268–271

data sources, 265–266

example of, 261–264

functional data model, 252–254

mediator architecture, 257–261

query capabilities, 264–265

schemas in federation, 254–257

system information for, 427

package, analysis, 165

parameterized roles and concepts, 356–357

parser module, 257

parsing tool output, 138

patent database, 166

pattern, in query processing

delivery, 93

statistical, 93

pattern recognition, 405

perfect-match probe, 280

performance model for system evaluation, 371–376

benchmarks, 374–375

cost model, 372–374

evaluation matrix, 372

performance of DiscoveryLink, 327–329

Perl codes, 167, 168

pharmacogenomics, 400–401

definition of, 420

pharmacology research, 304

phrase-based system, 217

phylogeny and evolution biology, 12

pipeline, genome annotation, 26

planning, query, 94–95

Plant Ontology Consortium, 402

platform, establishing, 8

pre-defined identity, 81

pre-processing, 138

precision, of text retrieval, 388–389

primary key, 81, 419

Prisma, SRS, 141–143

probe, definition of, 419

probe array, 280

probe array version, 285

probe data, 280

probe intensity file, 281

probe pair, 280

procedural access, declarative access vs., 49

procedural code, 63

process

life sciences discovery, 12–14

map, definition of, 419

process elaboration and abstraction, 358–359

process knowledge, capturing, 340–341

process map, 335

in neuroscience, 339

simple, 342

process maps for model-based mediator system, 357–360

domain maps and, 358

initial process, 357

as logic rules, 359–360

process elaboration and abstraction, 358–359

process semantics, initial, 357

processing, query, 92–98

processor, query, 205–212, 220. See also query processor

profile, user, 7–8

program, structural recursion, 162–163

programming, object-oriented, 253

programming interface, application, 407

programming language, functional, 413

projection, 42

Prolog, 254

propagation of errors, 26

protein, calcium channel, 319–322

protein domain, 400

protein family, 400

protein sequence, resources for, 397–398

proteome, definition of, 419

proteomics, 400, 419

prototype mediator, 261–266

KIND, 360–362

provenance, 25–27

provider

data, 343

view, 343–344

Public Catalog of Databases, 17

public data source, 17–18

PubMed

identifiers in, 100–101

search in, 51–52, 66–67, 89

query, 86–98

AllGenes, 57, 58

Boolean, 24

browsing, 89–90

cost of processing, 322–326

Daplex, 252, 261, 262, 264

capabilities of, 264–265

definition of, 420

DiscoveryLink and, 305–306, 316–326

architecture and, 309–310

determining costs, 322–326

example of, 319–322

optimization and, 317–319

efficiency of, 377–378

old and new data, 68

reasoning in formulation of, 202–205

in relational database, 128

searching and mining, 87–89

semantics of, 90

in Sequence Retrieval System, 128, 129–130

SQL, 127

in TAMBIS, 191, 197–202

unanswerable, 226, 228, 229, 375–376

to Web interface, 139

query decomposition, 68

query-driven approach, 77–78

query execution plan, 65

query language

declarative, 63

definition of, 420

SRS, 129–130

standard, 43–44

query optimization

in K2 information integration system, 242–243

semantic, 258

query processing, 92–98

biological resources in, 92–93

optimization in, 95–98

planning in, 94–95

query processor, TAMBIS, 205–212, 220

query planner, 208–211

sources and services model, 206–208

syntactic and semantic heterogeneity, 212

wrappers, 211–212

query rewriter in KIND model-based mediator, 362

query-shipping, 420

query splitter, 260, 268

query system, Kleisli, 147–184. See also Kleisli query system

querying, 420

browsing vs., 46–48

object identity and, 84–85

SRS support for, 121–122

strengths and weaknesses of, 61–62

querying interface, keyword-search, 24

question, nonsensical, 201–202

queue, batch, 139

RDBMS. See relational database management system

RDF. See Resource Description Framework

reason-able meta-data, 365–366

reasoning, in query formulation, 202–205

record, definition of, 420

recursion program, structural, 162–163

recursive concept, 356

reductionist molecular biology, 12

registration

in DiscoveryLink, 309

process of, 313–316

in model-based mediation, 344–349

reified roles as concepts, 353

relational algebra, 42–43, 420

relational data model, 41–44

non-relational model vs., 50

strengths and weaknesses of, 64

relational database, 153

integration into Sequence Retrieval System, 124–129

capturing relational schema, 125–126

hub table selection, 126–127

query performance, 128

restricting access, 128

SQL generation, 127

summary of, 129

viewing entries, 128–129

whole schema integration, 124–125

query performance to, 128

viewing entry from, 128–129

relational database management system (RDBMS), 21–22

Kleisli query system and, 165

relational model, 420

relational optimizations, 174–175

relational schema, capturing, 125–126

relationships, semantics of, in model-based mediation, 347–348

relevance

semantic, 364–365

source, 92–93

reliability, data provenance and, 26–27

reordering module, 260

replication approach, data, 250–251

report

GenPept, 153–154

creating warehouse of, 164–165

MEDLINE, 153

repository, data, 4

research and development, revolution in, 2–3

resolution, concept integration and, 4–5

resource, biological

list of, 397–405

in query processing, 92–93

Resource Description Framework (RDF), 420

restriction

access, 128

concept, 200–201

result fuser, 261

retrieval, text, 388–389

retrieval system, 405

rewriter

ICode, 260

query, in KIND model-based mediator, 362

RiboWeb, 216, 403

RNA, 420

role, 193

as concept, 353

mapped, 208

parameterized, 356–357

in TAMBIS, 207–208

role hierarchy, 355–356

rule

Icarus, 113–115

logic

domain map as, 354

process map as, 359–360

in query optimization, 96

rule-based rewriter, 258

sample data

gene expression, 288

standardization involving, 282

sample data mapping

nomenclature, 295

studies of, 294

sample data space, biological, 278–279

sanctioning, 203

scalability

of DiscoveryLink, 327–329

as implementation criterion, 379–380

of K2 information integration system, 244–245

P/FDM and, 271–272

as user criterion, 383

scaling factor, 287

schema

conceptual, 44

in database federation, 258

definition of, 41–42, 421

global integration, 266

relational, capturing, 125

three-schema architecture, 254–257

whole schema integration, 124–125

schema integration, 421

schema management, 67–69

schema mapping, 68

science, experimental, engineering vs., 76–77

scientific analysis program, sensitivity of, 26

scientific analysis tool in Sequence Retrieval System, 137–139

scientific object, browsing of, 100–101

scientific object identity, 84–85

search, spreadsheet, 40

search engine, Glimpse, 88

searching

definition of, 421

design of, 87–89

and mining, 87–89

selection, 42

self-describing exchange format, 156

semantic browsing in model-based mediation, 344

semantic data integration, 58–60

semantic data mapping in integration of third-party expression data, 293–296

semantic heterogeneity, 212

semantic mediation, 364

semantic query optimization, 258

semantic relevance, 364–365

semantic vs. syntactic integration, 48–49

Semantic Web, 421

semantics

application, 19

of biological data, 5

initial process, 357

in model-based mediation, 347–348

of query, 90

semi-structured data, fully structured data vs., 387–388

semi-structured information system, 82–83

semi-structured text file, advantages and disadvantages, 40–41

SeqStore, 307

sequence

DNA or protein, resources for, 397–398

EST, definition of, 412

sequence data source, searching against, 87

sequence folding, 404

Sequence Retrieval System (SRS), 109–144

architecture of, 111

automated server maintenance, 141–143

integrating flat file databanks, 112–116

subentry libraries, 116

token server, 113–115

interfaces to, 139–141

Kleisli query system and, 179–181

linking databanks, 130–133

object loader, 133–137

complex and nested objects, 134

exporting objects to XML, 136–137

links to create composite structures, 136

support for, 135

query language of, 129–130

relational database integration, 124–129

capturing relational schema, 125–126

hub table selection, 126–127

query performance, 128

restricting access, 128

SQL generation, 127

summary of, 129

viewing entries, 128–129

whole schema integration, 124–125

scientific analysis tools, 137–139

system information for, 425

TAMBIS and, 213

XML database integration, 116–124

challenge of, 122–124

procedure for, 120–121

support features, 121–122

uniqueness of, 118–120

sequence similarity search, 404

sequencing, definition of, 412

server

DiscoveryLink, 309

query processing and, 318

GeneExpress system on, 283

SOAP, TAMBIS and, 214–215

token, 113–115

server in Sequence Retrieval System, 111–112

maintenance of, 141–143

set, definition of, 421

shorthand expression, 119–120

simple curated gene data source, 37–38

simple multiple-world scenario, 336

Simple Object Access Protocol (SOAP), 141, 421

TAMBIS and, 214–215

simple one-world scenario, 336

simple process map, 342

simplified SQL, 148–149

simplified Structured Query Language (sSQL), 148–149, 151–152, 421

simplifier, 257

single channel gene expression microarray system, 279–281

SMART Atlas. See Spatial Markup Rendering Tool Atlas

SML. See Standard Markup Language

SNOMED. See Systematized Nomenclature of Medicine

SOAP. See Simple Object Access Protocol

software, analysis, 19

software benchmark, 374–375

source, data

characteristics of, 17–19

definition of, 147, 411

gene expression data management and, 290

in K2 information integration system, 240–242

Kleisli query system and, 165–167

mediator and, 349–351

P/FDM mediator and, 265–266

simple curated gene, 37–38

types of, 78

Web, 65–67

source dependent query plan, 191

source relevance, 92–93

sources and services model, 206–208

space management, 373

spatial joins, 337

Spatial Markup Rendering Tool (SMART) Atlas, 360, 362–364

specification, meta-data, 24–25

specifications

determining, 7–8

translating into technical approach, 8–9

splitter, query, 260

spreadsheet, 39–40

SQL. See Structured Query Language

SRS. See Sequence Retrieval System

SRS Prisma, 141–143

SRSCS, 140, 141

sSQL. See simplified Structured Query Language

stackPACK, 138

Staged Prisma, 142

Standard Markup Language (SML), definition of, 421

standard query language, 21–22

standardization

benefits and limitations of, 281–282

of gene names, 28

Stanford-IBM Manager of Multiple Information Sources (TSIMMIS), 24

statement, DDL, 313–314

statistical pattern in query processing, 93

statistical technique for gene expression data, 287–288

storage schema, 256

stored procedure, 422

structural data transformation in integration of third-party gene expression data, 293

structural recursion program, 162–163

structure

composite, links to create, 136

database, transformation of, 44

resources of, 399

structure prediction, 404

Structured Query Language (SQL), 43, 86

definition of, 422

DiscoveryLink and, 311

generation of, 127

mining and, 87

plan generator, 362

subentry library in integrating flat file databanks, 116

subprocess map, 359

summary table, automatic, 407

survey, TAMBIS, 218

Swiss-Prot

accession number, 44

query optimization and, 97–98

SYNAPSE, 338–339

syntactic heterogeneity, 212

syntactic vs. semantic integration, 48–49

syntactical problem, SRS solution of, 123–124

synthetic approach to biology, 12

system evaluation, 9–10

system requirements

determining, 7–8

translating into technical approach, 8–9

Systematized Nomenclature of Medicine (SNOMED), 28, 288, 294–295, 402

systems analysis, demands of, 12

systems biology, 422

table

automatic summary, 407

hash, 321

hub, 126–127

table expression, 422

tagged union type, 153

TAMBIS, 24, 66, 149, 189–220

current and future developments in, 217–219

DiscoveryLink and, 308

extensibility of, 378–379

information integration, 213–215

biological ontologies, 216–217

knowledge based, 215–216

ontology, 192–197

P/FDM mediator and, 267

scalability of, 380

semantic integration and, 60

system information for, 426

tools-driven technology used by, 91

understandability of, 384

usability of, 381

user interface

constructing queries, 197–202

exploring ontology, 195–197

query processor, 205–212

reasoning in query formulation, 202–205

technology, gene chip microarray, 413

temporal logic, 90

term, 85

terminological frame of reference, 347

text file, semi-structured, advantages and disadvantages, 40–41

text retrieval, in system evaluation, 388–389

third-party gene expression data, integration of, 291–298

data exchange formats for, 291–293

data loading issues in, 296–297

semantic data mapping issues in, 293–296

update issues in, 297–298

three-level hierarchy, in GeneExpress system, 293

three-schema architecture, 254–257

tightly coupled system, 250

time management, 372–373

tissue resources, 401

token server, 113–115

tool

legacy, 79–80

scientific analysis, in Sequence Retrieval System, 137–139

tool-driven integration, 91–92

traditional database management, 80–81

traditional database system, searching and mining in, 88

transcription, 422

transcriptome, 422

transformation

data, 5

of database structure, 44

translation, 422

Transparent Access to Multiple Bioinformatics Information Sources. See TAMBIS

TSIMMIS. See Stanford-IBM Manager of Multiple Information Sources

tuple, 81

two channel gene expression microarray system, 279–281

two-level hierarchy in GeneExpress system, 293

UML. See Unified Modeling Language

UMLS ontology, 363

unanswerable query challenge, 226, 228, 229, 375–376

understandability

as implementation criterion, 380

as user criterion, 384

Unified Modeling Language (UML), 422

Uniform Resource Locators (URL), 423

union, 42

Universe, Sequence Retrieval System and, 110–111

update anomaly, 40

updating

GeneExpress system, 283–284

in integration of third-party gene expression data, 297–298

URL. See Uniform Resource Locators

usability

as implementation criterion, 381

as user criterion, 384–385

use case, 36–39

combining old and new data, 68

data federation, 68–69

data warehousing, 68

for integration, 45–46

retrieving genes and associated expression results, 38–39

simple curated gene data source, 37–38

user interface

in K2 information integration system, 243–244

for P/FDM, 268–271

in TAMBIS, 220

constructing queries, 197–202

exploring ontology, 195–197

query processor, 205–212

reasoning in query formulation, 202–205

user profile, 7–8

user survey, TAMBIS, 218

variability, 17

in gene expression data, 287–288

variant, definition of, 423

vector, differing meanings of, 29

vertical loop fusion, 170

view

definition of, 423

materialized, 416

non-materialized, 418

view building, 68

view composition, 68

view integration, 228, 423

view provider in model-based mediation, 343–344

viewing entry from relational databank, 128–129

virtual database in DiscoveryLink, 305

visualization

browsing scientific objects, 100–101

multimedia data, 99–100

vocabulary

consistent, 30

controlled, 40

warehousing, 21–22

definition of, 411

DiscoveryLink and, 307–308

example of, 52–54

federation vs., 49

gene expression data management and, 290

GeneExpress system and, 283

in K2 system, 229

in Kleisli query system, 163–165

strengths and weaknesses of, 62–63

use case, 68

Web data source, 65–67

Web interface

for P/FDM, 268–269, 270

to Sequence Retrieval System, 139

Web presentation, 30–31

Web services, 141

webomim-get-detail function in Kleisli system, 166–167

whole schema integration, 124–125

window, explorer, in TAMBIS, 195–197

withdrawn HUGO name, 84–85

workflow

biological tools and, 80

definition of, 423–424

World Wide Web, 30–31

data sources on, 6, 17–18

wrapped sources, 191

wrapper, 23, 49–50

BLAST, 315

in database federation, 260–261

definition of, 424

DiscoveryLink, 56, 308, 310–311

cost of query processing and, 322–326

registration and, 313–316

TAMBIS, 211–212

XA, 423

XML. See extensible markup language

XPath, 90

XQuery, 90, 423
