A
Abstract Syntax Notation One (ASN1),
407
access, restriction, in Sequence Retrieval System,
128
Affymetrix Analysis Data Model,
407
Affymetrix GeneChip microarray,
280
GeneExpress system and,
282
analysis package, Kleisli query system and,
165
analysis program, sensitivity of,
26
analysis tool in Sequence Retrieval System,
137–139
standardization involving,
282
annotation data mapping, gene,
295–296
annotation data space, gene,
279
annotation pipeline, genome,
26
ANSI-SPARC three-schema architecture,
254–257
application programming interface (API),
381,
407
application semantics,
19
of KIND model-based mediator,
361
of Sequence Retrieval System,
111
automated server maintenance in Sequence Retrieval System,
141–143
automatic summary table,
407
autonomous data source,
18
autonomy of databases,
407
C
calcium channel protein, example using,
319–322
Call-Level Interface (CLI),
409
capturing, relational schema,
125–126
capturing process knowledge,
340–341
cell averaging algorithm,
280
challenges of information integration,
11–31
meta-data specification,
24–25
provenance and accuracy,
25–27
Character Large Object (CLOB),
409
co-clustered fragment,
409
Collection Programming Language (CPL)
combining old and new data,
68
Common Object Request Broker Architecture (CORBA),
22,
91,
140,
141
comparative genomics,
409
compensation in query optimization,
317–318
compilation of domain maps,
354–355
execution plan, in KIND model-based mediator,
362
complex multiple-world scenario,
336–337
complex objects in Sequence Retrieval System,
134
composite structure, links to create,
136
comprehension syntax-based language,
151
Comprehensive Data Center,
397
computational analysis tool,
19
concept description, query as,
197–202
Conceptual Model (CM),
410
conceptual schema,
44,
255
consortium, Gene Ontology,
29
context-sensitive optimizations,
171–174
contextual references, in model-based mediation,
349
contextualization, in model-based mediation,
344,
350–351
controlled vocabulary,
40
cost model in performance evaluation,
372–374
cost of query processing,
96–97
coverage of information sources,
92
creating wrapper in DiscoveryLink registration,
313
curated gene data source, simple,
37–38
curation, data, definition of,
410
D
functional data model and,
252,
253
model, in K2 information integration system,
232–235
standardization involving,
282
data distribution, in system evaluation,
386–387
data-driven integration,
91–92
for integration of third-party gene expression data,
291–293
data federation, use case,
68–69
data format, updating of,
semi-structured text files,
40–41
simple curated gene data source,
37–38
transforming of database structure,
44
in K2 information integration system,
232–235
strengths and weaknesses of,
64
data organization, traditional,
81
data provider in model-based mediation,
343
characteristics of,
17–19
DiscoveryLink registration and,
314
gene expression data management and,
290
in K2 information integration system,
240–242
simple curated gene,
37–38
data space, gene expression,
278–281
gene expression measurement,
279–281
relational, viewing entry from,
128–129
Databank, in Sequence Retrieval System,
109–116
biologic, Kleisli query system and,
165–166
Expressed Sequence Tag,
319
heterogeneous, definition of,
414
patent, Kleisli query system and,
164
relational, query performance to,
128
virtual, in DiscoveryLink,
305
database management, traditional,
80–81
database management system (DBMS),
36
database structure, transforming,
44
declarative access, procedural access
vs.,
49
declarative query language,
63
decoupled data driver,
242
delivery pattern in query processing,
93
deployment issues in GeneExpress
description, concept, query as,
197–202
description logic ontology,
194
design of biological information system,
75–101
concepts and ontologies,
85–86
engineering
vs. experimental science,
76–77
fully structured
vs. semi-structured,
82–84
generic system
vs. query-driven,
77–78
legacy data and tools,
78–80
scientific object identity,
84–85
tool-driven
vs. data-driven,
91–92
traditional database management,
80–81
discovery process, life sciences,
12–14
discoveryHub, efficiency of,
377
ease of use, scalability, and performance of,
327–329
materialized
vs. non-materialized approach and,
386
system information for,
428
distributed database systems,
411
distributed integration approach,
22
distributed object technology,
91
distribution, data, in system evaluation,
386–387
DNA sequence, resources for,
397–398
domain, constantly changing,
80
for model-based mediator system,
352–357
parameterized role and concepts,
356–357
reified roles as concepts,
353
domain-specific benchmark,
374
E
as implementation criterion,
377–378
elaboration identifier,
358
end user in model-based mediation,
344
experimental science
vs.,
76–77
entry ID, hub table as,
126
environment, for life science discovery,
14–15
enzyme, definition of,
412
EST sequence, definition of,
412
European Bioinformatics Institute (EBI),
91
evaluation, query,
95,
96
implementation criteria for,
376–381
data distribution and heterogeneity,
386–387
integrating applications,
389
materialized
vs. non-materialized approach,
385–386
semi-structured
vs. fully structured data,
387–388
for third-party gene expression data integration,
291–293
execution plan compiler in KIND model-based mediator,
362
experimental science, engineering
vs.,
76–77
exporter in P/FDM mediator,
251
Expressed Sequence Tag database,
319
as implementation criterion,
378–379
extensible markup language (XML),
43–44
for biological Web services,
30–31
database integration into Sequence Retrieval System,
116–124
exporting objects from SRS,
136–137
navigational capabilities of,
90
semi-structured
vs. fully structured data and,
387–388
F
Feature table of GenBank,
159
DiscoveryLink based on,
306
alternative architectures for integration,
250–252
Sequence Retrieval System and,
143
semi-structured text,
40–41
flat file, database
vs.,
78
flat file databank integration,
112–116
semi-structured text,
40–41
for third-party gene expression data integration,
291–293
self-describing exchange,
156
frame of reference, terminological,
347
fully structured data, semi-structured data
vs.,
387–388
fully structured information system,
82–84
functional programming language,
413
as implementation criterion,
379
G
GenAtlas, querying in,
85
Kleisli query system and,
150
materialized
vs. non-materialized approach and,
385–386
standardization involving,
282
gene annotation data mapping,
295–296
gene annotation data space,
279
gene chip microarray technology,
414
gene data source, simple curated,
37–38
Gene Expression Array (GXA),
283–284
gene expression data management,
277–299
gene expression measurement,
279–281
algorithms and normalization and,
286–287
of third-party gene expression data,
291–298
gene expression measurement data space,
279–281
gene fragment, definition of,
413
Gene Logic, DiscoveryLink and,
308
Gene Nomenclature Committee (HGNC),
28,
402
Gene Ontology (GO) Consortium,
29,
217
GeneCards, search in,
66–67
GeneExpress, system information for,
427
GeneExpress Data Warehouse (GXDW),
283–284
gene annotation component of,
290
deployment and update issues in,
283–284
integrating third-party expression data in,
291–298
query-driven approach
vs.,
77–78
strengths and weaknesses of,
63
generic query optimization,
267–268
Genetics Computer Group (GCG),
307–308
genome annotation pipeline,
26
materialized
vs. non-materialized approach and,
385–386
object identity and,
84–85
genomic data source as integration challenge,
289–290
Glimpse search engine,
88
global-as-view technique,
216
in model-based mediation,
349,
350
global integration schema,
266
Globus Pallidus External,
351
GO databank in Sequence Retrieval System,
126–127
graphical user interface, for P/FDM,
269,
271
H
legacy tools including,
80
strengths and weaknesses of,
63
hardwired access to data sources,
304
hardwiring of mapping in GeneExpress system,
295
in semantic data integration,
58–59
syntactic and semantic,
212
heterogeneous data format,
18,
19–20
heterogeneous database, definition of,
414
hierarchy, in GeneExpress system,
293
HUGO name, withdrawn or approved,
84–85
human computer interaction,
375
Human Genome Initiative,
415
Human Genome Project,
415
Human Genome Organization (HUGO),
28,
402
hybrid integration approach,
64–65
hypertext markup language file (HTML),
147–148
hypothesis as design step,
76
I
ID, entry, hub table as,
126
identifier, elaboration,
358
IBM DiscoveryLink middleware system,
24
ImMunoGeneTics information system,
403
implementation, experiment as,
76
implementation criteria, system evaluation,
376–381
in silico discovery kit (ISDK),
160,
161,
415
indexing tool output,
138
data provenance and accuracy,
25–27
meta-data specification,
24–25
information science, fusion with biology,
2–3
initial process semantics,
357
input, processing of,
138
integrated data driver,
241
Integrated Taxonomic Information System,
402
integrated view definition,
345
integrated view of biology,
12
in system evaluation,
389
declarative query language,
63
of flat file databanks with SRS,
112–116
algorithms and normalization and,
286–287
hard-coded approach to,
63
hybrid approach to,
64–65
relational
vs. non-relational,
64
semantic query planning,
65–67
syntactic
vs. semantic,
48–49
of third-party gene expression data,
291–298
semantic data mapping issues in,
293–296
structural data transformation issues in,
293
tool-driven
vs. data-driven,
91–92
integration schema, global,
266
intensity file, probe,
281
interaction, human computer,
375
application programming,
407
in K2 information integration system,
243–244
keyword-search querying,
24
Kleisli query system and,
166
to Sequence Retrieval System,
139–141
reasoning in query formulation,
202–205
internal language, of K2 information integration system,
239–240
International Classification of Diseases, Ninth Revision,
402
International Organization for Standardization,
415
International Union of Biochemistry and Molecular Biology (IUBMB),
28,
403
International Union of Pure and Applied Chemistry (IUPAC),
28,
403
K
K2 information integration system,
225–247
data model and languages in,
232–235
system information for,
426
keyword-search querying interface,
24
understandability of,
381,
384
data model and representation in,
153–157
K2 information integration system
vs.,
228–229
Sequence Retrieval System and,
179–181
system information for,
425
understandability of,
384
knowledge based information integration, TAMBIS,
215–216
knowledge engineering,
353
knowledge representation in model-based mediator system
parameterized role and concepts,
356–357
reified roles as concepts,
353
process elaboration and abstraction,
358–359
Kyoto Encyclopedia of Genes and Genomes (KEGG),
416
L
Laboratory Information Management System (LIMS),
13,
127
functional programming,
413
life sciences discovery process,
12–14
in browsing scientific objects,
100
link-driven federation of databases,
416
link operator in SRS query language,
132–133
linking, databank, to Sequence Retrieval System,
130–133
literature reference,
401
loader, object, in Sequence Retrieval System,
133–137
local-as-view technique,
216
local ontology, in model-based mediation,
344
long-term potentiation in nerve cell,
340
loosely coupled system,
250
M
maintenance, automated server, in Sequence Retrieval System,
141–143
traditional database,
80–81
semantic data, in integration of third-party expression data,
293–296
materialized view,
44,
416
measurement data space, gene expression,
279–281
mediator database system,
22–24
meta-data specification,
24–25
Microarray Gene Expression Data Society (MGED),
281,
417
microarray suite algorithm,
286–287
microarray suite (MAS), GeneChip,
280
microarray technology, gene chip,
414
Microsoft Distributed Component Object Model (DCOM),
91
Microsoft Visual Basic,
40
minimum information about a microarray experiment (MIAME),
281–282,
417
model-based mediator system,
335–366
Cell-Centered Database and SMART Atlas,
362–364
challenges from neurosciences,
338–342
conceptual models and source registration at,
344–349
for Cell-Centered Database,
345–347
contextual references,
349
creating terminological frame of reference,
347
ontological grounding of OM (S),
348
semantics of relationships in,
347–348
parameterized role and concepts,
356–357
reified roles as concepts,
353
interplay between mediator and sources,
349–351
process elaboration and abstraction,
358–359
model-based mediation (MBM),
417
syntactic
vs. semantic integration,
48–49
use case for integration,
45–46
multidisciplinary approach,
15
multiple sequence alignment,
404
N
name, HUGO, withdrawn or approved,
84–85
National Biological Information Infrastructure,
402
nested object in Sequence Retrieval System,
134
Nested Relational Calculus (NRC),
152,
163,
418
nested relationalized version of SQL,
151–153
nested structure in K2 system,
226
neuroscience, data integration in,
338–339
nomenclature, sample data mapping,
294–295
non-materialized view,
44,
418
non-relational data model,
64
relational data model
vs.,
50
normal syntax, XML and,
118
normalization, gene expression data and,
286–287
novel gene discovery,
319
O
Object Definition Language (ODL),
418
object identity, scientific,
84–85
object loader in Sequence Retrieval System,
133–137
complex and nested objects,
134
links to create composite structures,
136
Object Management Group (OMG),
22,
28,
419
object-oriented database,
308
object-oriented interface to Sequence Retrieval System,
140–141
object-oriented model,
418
object-oriented programming,
253,
254
object-oriented technology,
22
Object-Protocol Model (OPM),
24
Object Query Language (OQL),
86,
419
on-line analytical processing (OLAP),
419
one-world/multiple-world scenarios,
419
ontological grounds of OM (S),
348
in model-based mediation,
344
Ontology Inference Layer (OIL),
418
Ontology for Molecular Biology (OMB),
217
Open DataBase Connectivity (ODBC),
418
optimization, query,
95–98
in K2 information integration system,
242–243
organization, data,
78–79
output, processing of,
138
P
alternative architectures for integration,
250–252
system information for,
427
parameterized roles and concepts,
356–357
pattern, in query processing
performance model for system evaluation,
371–376
performance of DiscoveryLink,
327–329
pharmacology research,
304
phylogeny and evolution biology,
12
pipeline, genome annotation,
26
Plant Ontology Consortium,
402
precision, of text retrieval,
388–389
probe, definition of,
419
probe intensity file,
281
procedural access, declarative access
vs.,
49
life sciences discovery,
12–14
process elaboration and abstraction,
358–359
process knowledge, capturing,
340–341
process maps for model-based mediator system,
357–360
process elaboration and abstraction,
358–359
process semantics, initial,
357
program, structural recursion,
162–163
programming, object-oriented,
253
programming interface, application,
407
programming language, functional,
413
propagation of errors,
26
protein sequence, resources for,
397–398
proteome, definition of,
419
Public Catalog of Databases,
17
public data source,
17–18
R
reasoning, in query formulation,
202–205
record, definition of,
420
recursion program, structural,
162–163
reductionist molecular biology,
12
reified roles as concepts,
353
relational data model,
41–44
non-relational model
vs.,
50
strengths and weaknesses of,
64
integration into Sequence Retrieval System,
124–129
capturing relational schema,
125–126
query performance to,
128
relational database management system (RDBMS),
21–22
Kleisli query system and,
165
relational schema, capturing,
125–126
relationships, semantics of, in model-based mediation,
347–348
reliability, data provenance and,
26–27
replication approach, data,
250–251
research and development, revolution in,
2–3
resolution, concept integration and,
4–5
in query processing,
92–93
Resource Description Framework (RDF),
420
query, in KIND model-based mediator,
362
in query optimization,
96
S
standardization involving,
282
sample data space, biological,
278–279
as implementation criterion,
379–380
of K2 information integration system,
244–245
in database federation,
258
relational, capturing,
125
science, experimental, engineering
vs.,
76–77
scientific analysis program, sensitivity of,
26
scientific analysis tool in Sequence Retrieval System,
137–139
scientific object, browsing of,
100–101
scientific object identity,
84–85
search engine, Glimpse,
88
self-describing exchange format,
156
semantic browsing in model-based mediation,
344
semantic data integration,
58–60
semantic data mapping in integration of third-party expression data,
293–296
semantic heterogeneity,
212
semantic query optimization,
258
semantic
vs. syntactic integration,
48–49
semi-structured data, fully structured data
vs.,
387–388
semi-structured information system,
82–83
semi-structured text file, advantages and disadvantages,
40–41
DNA or protein, resources for,
397–398
sequence data source, searching against,
87
Sequence Retrieval System (SRS),
109–144
automated server maintenance,
141–143
integrating flat file databanks,
112–116
complex and nested objects,
134
links to create composite structures,
136
relational database integration,
124–129
capturing relational schema,
125–126
system information for,
425
sequence similarity search,
404
sequencing, definition of,
412
query processing and,
318
GeneExpress system on,
283
server in Sequence Retrieval System,
111–112
simple curated gene data source,
37–38
simple multiple-world scenario,
336
Simple Object Access Protocol (SOAP),
141,
421
simple one-world scenario,
336
single channel gene expression microarray system,
279–281
characteristics of,
17–19
gene expression data management and,
290
in K2 information integration system,
240–242
simple curated gene,
37–38
source dependent query plan,
191
sources and services model,
206–208
Spatial Markup Rendering Tool (SMART) Atlas,
360,
362–364
specification, meta-data,
24–25
translating into technical approach,
8–9
Standard Markup Language (SML), definition of,
421
standard query language,
21–22
benefits and limitations of,
281–282
Stanford-IBM Manager of Multiple Information Sources (TSIMMIS),
24
statistical pattern in query processing,
93
statistical technique for gene expression data,
287–288
structural data transformation in integration of third-party gene expression data,
293
structural recursion program,
162–163
composite, links to create,
136
database, transformation of,
44
structure prediction,
404
Structured Query Language (SQL),
43,
86
subentry library in integrating flat file databanks,
116
summary table, automatic,
407
query optimization and,
97–98
syntactic heterogeneity,
212
syntactic
vs. semantic integration,
48–49
syntactical problem, SRS solution of,
123–124
synthetic approach to biology,
12
translating into technical approach,
8–9
systems analysis, demands of,
12
T
current and future developments in,
217–219
semantic integration and,
60
system information for,
426
tools-driven technology used by,
91
understandability of,
384
reasoning in query formulation,
202–205
technology, gene chip microarray,
413
terminological frame of reference,
347
text file, semi-structured, advantages and disadvantages,
40–41
text retrieval, in system evaluation,
388–389
third-party gene expression data, integration of,
291–298
semantic data mapping issues in,
293–296
three-level hierarchy, in GeneExpress system,
293
tightly coupled system,
250
scientific analysis, in Sequence Retrieval System,
137–139
tool-driven integration,
91–92
traditional database management,
80–81
traditional database system, searching and mining in,
88
of database structure,
44
Transparent Access to Multiple Bioinformatics Information Sources.
See TAMBIS
two channel gene expression microarray system,
279–281
two-level hierarchy in GeneExpress system,
293