Index

Note: Page numbers followed by f indicate figures and t indicate tables.

A

Acronym resolution, nonrepetitive data 277, 277f
Active indexing 255–256
Aggregated data, life cycle of 34
Agility, Data Vault 2.0 166–167
Analytic tool, call centers 299–300, 299f
AnalytiX DS 137
Application data 60–61, 60f, 322–323, 323f
transformations of 62, 63f
Archival facility 47
Associative word processing 281–282
Atomicity, consistency, isolation, and durability (ACID) compliance 158
Auditing data 360, 360f

B

Bar charts 389t, 390, 391f
bubble chart 393
horizontal bar chart 392, 392f
line chart 392, 393f
stacked bar chart 391–392, 391f
Big data 29, 44–45, 44f, 67, 201, 202f
context in 
nonrepetitive data 79–80, 79–80f
repetitive data 77–78, 77–78f
data in 76–77, 76–77f
definition of 73
existing system interface 211, 212f
context-enriched section of 215–216, 217f
exception based data 213, 214f
into existing systems environment 215, 216f
nonrepetitive raw 214–215
repetitive raw 211–213
structured data/unstructured data analysis 217–218
Hadoop and 70
IBM and 71
hash keys 150–151
high ground 
analogy of 67, 68f
holding the 71
progression of events 67–69, 68f
IBM 360 69
inexpensive storage 74
infrastructure for 9
large volumes 73–74
log data 253–255
metadata in 257–259, 259f
nonrepetitive data in 78–79, 269
online transaction processing 69–70
reengineering 172–174
repetitive big data 9
Roman census approach 74–75
Teradata and MPP processing 70
unstructured data 7, 75–76
Blob 373
Boilerplate contract 112–113
Bubble chart 389t, 393
Bulk data, transformations of 65, 66f
Bulk data mart 48
Bulk data vault 47
Bulk data warehouse 47, 50, 52
Business concept model, DV2 
methodology of 137
modeling of 136
Business concepts 152
Business context 141
Business intelligence (BI) 138–139, 163–164
CMMI 163–166
managed self-service BI 161–162
PMP and SDLC 167–168
Six Sigma 168–169
Business keys 143–144, 147, 152–153
multipart source 155–156
sequence numbers as 154–155
source system sequence-driven 153–155
Business problem 154
Business processes 143–144
Business relevancy, to corporate data 22, 23f, 24–25, 25f
Business requirement meetings, recording of 165
Business value 363–364, 365f
data architecture 5
data relevancy over time 367–368
occurrence of 366–367
proposition, unstructured data 90–91, 90f
tactical decisions 368–369
tactical/strategic 364–365
volume of data vs. 365, 366f

C

Call centers 
analytic tool 299–300, 299f
dashboard information 300–301, 300–304f, 303–304
mapping process 297–298, 298f
textual disambiguation 296–299, 298–299f
Capability Maturity Model Integration (CMMI) 163–167
Capture/edit process, data 33
Categorization, of data 30–31, 30f
Changed data capture (CDC) 222–223, 222f
Classical ETL interface 220–221, 220f
Classical system development life cycle (SDLC) processing 234–235, 234f
COBOL 177
Commercial taxonomies 115
Comparative analysis 213
Computer, commercial uses of 177, 178f
Conditional architecture 172
Context 
challenge of 374–375, 376f
in nonrepetitive data 79–80, 79–80f
in repetitive data 77–78, 77–78f, 243–244
Context-enriched section, big data 215–216, 217f
Contextualization 16
of repetitive unstructured data 99–100
unstructured data 93–95, 94f, 96f
Continental divide, data architecture 4, 4f
Corporate computing 183
Corporate data 7, 49–50
analysis of 27, 28f
categorization 30–31, 30f
data integration 29
diverse sources of 27, 28f
formal analysis 27
informal analysis 27
key performance indicators 31
many problems of 28
normalization 29–30, 30f
business relevancy relates to 22, 23f, 24–25, 25f
classification of 13
demographics of 21–25
division of 21, 22f
nonrepetitive unstructured data 24, 24f
potentially business-relevant records 24
ratios of repetitive data 21, 22f
repetitive unstructured data 21–24, 22f
structured data 23, 24f
Corporate data models 340–343
application models 343, 343f
data warehouse 343, 344f
generic data models 342, 342f
Corporate decision-making 331, 334
Corporate information infrastructure 34
Corporate information systems 33
Coupled processors 41–42, 42f
Crawler technology 256
Current valued data 126–127
Curve of usefulness 34
declining of 35–36, 35–36f
Customer account number (CAN) 141
Customize data, transformations of 61, 61f
Custom variables resolution, nonrepetitive data 274–275

D

Dashboard 394
Data 39
accumulation curve 36
active/passive indexing of 255–256
analyzing points of 246, 247f
automated generation of 48
block of 251, 252f
coupled processors 41–42, 42f
in data lake 53–54, 54f
data vault 44, 44f
data warehouse 43, 43f
DBMS 41, 41f
degradation of integrity of 37, 37f
different forms of 5
disk storage 40–41, 40f
in end-state data architecture 48–50, 49f
file structures of 206, 207f
the great divide 45, 45f
integration of 29
integrity of 121
internal, external 261
internal formatting of 204, 205f
life cycle of 33–37
linkage of 259–260
logical organization of 202, 202f
magnetic tapes 40, 40f
online transaction processing 42, 42f
paper tape and punch cards 39, 39f
parallel data management 43, 43f
phenomenon of 34–35
physical gathering of 28
raw detailed 33–34
resolution problem of 29
standard/universal measurements of 262
transformations in 
into bulk storage 63–64, 65f
into customized state 63, 64f
generated automatically 64–65, 65f
Data architecture 1
aspects of 199
big data 201, 202f
business value 5
continental divide 4, 4f
database concept 207–208, 208f
database management systems 203
data mart 210, 210f
data warehouse 208–209, 209f
different forms of data 5
disk storage 201, 201f
evolution of 199, 200f
file structures of data 206, 207f
great divide of data 3–4
high-level perspective 225
different communities 229
questions types 228–229
redundancy 225–226
system of record 226–228
internal formatting of data 204, 205f
logical organization of data 202, 202f
magnetic tape 200, 201f
master file 207, 208f
nonrepetitive unstructured environment 206, 206f
online database environment 208, 208f
operational data store 209–210, 209f
paper tape 199, 200f
parallel disk storage 201, 201f
parent-child relationship and networked relationship 203, 203f
physical dimension of 199, 200f
punched cards 199, 200f
relational database management system 203, 204f
repetitive/nonrepetitive unstructured data 2–3
structured approach 204, 205f
subdividing data 1–2
textual/nontextual data 4
unstructured approach 204, 205f
unstructured data 205–206, 205–206f
Database concept 207–208, 208f
Database management systems (DBMS) 41, 41f, 180–181, 203, 217
blob 373
requirements of 371
and text 371–372
Data communications (DC) 69
Data encryption 263, 264f
Data flow diagrams, functional decomposition 337–340, 338–340f
Data infrastructure 7
being optimized 10–11
different infrastructures 10
comparison of 12
repetitive big data 9
repetitive data 7
repetitive structured data 7–8
Data integrity 320–321, 320f
Data item set (dis) 193–194
Data lake 48, 50, 347–348, 348–349f
architecture of 57, 57f
data in 53–54, 54f
transformation of 48
Data marts 47, 50, 174–175, 210, 210f, 229, 325–326, 356
Data modeling 
corporate 340–343
end-state architecture 337, 338f
functional decomposition 337–340
generic data models 197–198
ontologies 345–347
operational/data warehouse 198
proactive/reactive 349–351
selective subdivision 347–348
star join/dimensional 343–345
for structured environment 191
data item set 193–194
different levels of 195–196
generic data models 197–198
granular data only 191–192
linkage 196, 197f
operational/data warehouse data models 198
physical data base design 194–195, 195f
purpose of roadmap 191
taxonomies 111–113, 345–347
types of 337
Data organization, visualizations 387
Data over time 247–249
Data ponds 48, 50
Data quality, visualizations 387–388
Data relevancy 367–368
Data sources, visualizations 386–387
Data vault 44, 44f, 47
Data Vault 1.0 modeling 134
Data Vault 2.0 134–136
origins and background 133–136
Data Vault 1.0 (DV1) 139–140
modeling of 134
Data Vault 2.0 (DV2) 134–136
business benefits of 138–139
components of 133
implementation of 138, 171
data marts 174–175
managed self-service BI 175–176
patterns 171–172
reengineering 172–174
issues to solve 135
methodology of 137, 163
agility 166–167
CMMI 163–166
PMP and SDLC 167–168
Six Sigma 168–169
Data Vault 2.0 architecture 138, 157, 158f
components of 157–158
hard and soft business rules 160–161
managed self-service BI 161–162
NoSQL platform 158–159
objectives of 159
Data Vault 2.0 model 136–137, 145, 145f
hybrid model 142
objectives of 159–160
primary key values 
business keys 152–153
hash keys 150–152
multipart source business keys 155–156
sequence numbers 148–149
source system sequence-driven business keys 153–155
Data vault model (DVM) 141, 144–146
business keys 143–144
components of 142
and data warehousing 144
defined as 141–142, 142f
many-to-many link structures 147–148
restructuring of 146
rules of 146–147
Data visualizations 381–382, 382f, 395
framework 383, 384f
data 385–388
define 383–385
design 388–393
distribute 393–394
purpose and context 382, 383f
science and art 382–383, 384f
tools and software 394–395
Data warehouse 43, 43f, 47, 50–52, 51f, 208–209, 209f, 321, 343, 344f, 356, 363
data transformation 357
definition of 321
operational environment interface 219, 219f
changed data capture 222–223, 222f
classical ETL interface 220–221, 220f
ELT processing 223–224
inline transformation 223, 223f
ODS interface 220–221, 221f
staging area 221–222
Data warehouse data models 198
Data warehousing systems 138–139, 142, 144
Date standardization 280, 280f, 293
Date tagging 279
Decision-making 
corporate level of 331, 334
personal level of 331, 334
Deep historic data 356
Denormalization 319–320
transaction response time 314–315, 315f
Derived metadata 258
Detailed data 256–257, 257f, 359, 359f
Deterministic 150
Dimensional data model 343–345, 344–345f
Direct end user 125
Disciplined agile delivery (DAD) 166
Disk storage 40–41, 40f, 201, 201f
Disk technology 179–180
Distillation process 211
repetitive analysis 265–266
repetitive analytics 238–239, 239f
Document classification 284–285
Document fracturing 105–106
Document metadata 284, 284f
Document preprocessing 106, 107f

E

Electronic speed 312
ELT processing 223–224
E-mails 107, 107f
End-state data architecture 47, 337, 338f, 355, 355–356f
architectural components 47–48, 48f
business value 363–364, 365f
data relevancy over time 367–368
occurrence of 366–367
tactical decisions 368–369
tactical/strategic 364–365
volume of data versus 365, 366f
in data lake 53–54, 54f, 57, 57f
data warehouse 50–52, 51f
evolutionary experience 55–56, 56f
evolution of 363, 364f
features of 349–350
kinds of data in 48–50, 49f
metadata in 54, 55f
networked metadata 55, 56f
questions types 52–53, 53f
shaping through models 50, 51f
transformations in 59–61, 60f
application data 62, 63f
bulk data 65, 66f
customizing data 61, 61f
data generated automatically 64–65, 65f
data into bulk storage 63–64, 65f
data into customized state 63, 64f
and redundancy 66, 66f
redundant data 59, 59f
text 61–62, 62f
End user awareness cycle 353–354, 354f
Enterprise data warehousing (EDW) 139
Enterprise information integration (EII) 161, 175
Entity 192
Entity relationship diagram (ERD) 192–193
ETL technology 322
Exception based data 213, 214f
External data 261

F

False-positive correlation 233, 233f
False positives 233
Fast-running transaction 314, 314f
Federated query engines 161
Filtering data 242–243
Filtering process 
repetitive analysis 265–266
repetitive analytics 238–239, 239f
Formal analysis, corporate data analysis 27
Fortran 177
Free-form text 113
Freezing data 236
Frozen business, requirements of 130, 130f
Functional decomposition 337–340, 338–340f
Functional sequencing, within textual ETL 286, 286f

G

Gap analysis 175
Gartner Group, big data definition 73
Generic data models 197–198, 342, 342f
Granular data 191–192, 324–325
The “great divide,” 3–4, 14, 45, 45f
corporate data classification 13
different worlds 18–19
nonrepetitive unstructured data 16–18
repetitive unstructured data 14–15
Greenwich mean time (GMT) 262

H

Hadoop-based satellite 160, 160f
Hadoop distributed file system (HDFS) 158
Hadoop technology 14–15, 15f, 70, 151, 158–159, 254–255, 254f
IBM and 71
Hard and soft business rules 160–161
Hash keys 150–152
Heuristic processing 234–236
High ground 
analogy of 67, 68f
holding the 71
progression of events 67–69, 68f
High-level perspective, data architecture 225
different communities 229
questions types 228–229
redundancy 225–226
system of record 226–228
Historical data 321, 321f
siloed applications 127–128
Homographic resolution, nonrepetitive data 275–276
Horizontal bar chart 389t, 392, 392f
Hourglass analogy 185–187
Hubs 142, 145, 154–155

I

IBM 360 69
Indirect end user 125
Inexpensive storage, big data 74
Infographic 394
Informal analysis, corporate data analysis 27
Information management system (IMS) 69
Information marts 174
Inline contextualization, nonrepetitive data 272–273
Inline transformation 223, 223f
Integrated data 321, 324–327
Integrity of data 320–321, 320f, 333
Intelligent key 152–153
Internal data 261
Internal referential integrity, nonrepetitive data 287, 287f
International Business Machines (IBM) 37
and Hadoop technology 71
I/O operation, transaction response time 312–313, 313f

J

Julian date 262

K

Key performance indicators (KPIs) 327–329, 327–328f
corporate data analysis 31

L

Landing zone 48
Least squares approach 246
Line charts 389t, 392, 393f
Links 142, 145–146
Link structures 147
List processing, nonrepetitive data 280–281
Log data 253–255
Logical foreign key 160
Log tape records 245–246
Lookup process 149

M

Magnetic tapes 40, 40f, 180, 200, 201f
Managed self-service BI 161–163, 175–176
Manual analysis, unstructured data 96–97, 97f
Many-to-many link structures 147–148
Mapping process 291, 292f
call centers 297–298, 298f
definition of 292–293
levels of 195, 196f
repetitive and nonrepetitive records of data 291
repetitive text 291–292
textual disambiguation 102–104, 293
variables name selection 292
MapReduce 96, 96f
Massively parallel processing (MPP) approach 70, 83–84, 83f
Master file 207, 208f
Mechanical speed 312
Medical records 304–307, 308f
Metadata 
in big data 257–259, 259f
in end-state data architecture 54, 55f
Metrics, repetitive analysis 267–268
“Million in one” syndrome 366
Multipart source business keys 155–156
Multiple processors 41–42

N

Named value processing 105–106
Narrative data, classification of 112–113
Narrative information 304
Native metadata 258
Natural language processing (NLP) 17, 95, 374
Negation analysis, nonrepetitive data 277–278
Networked metadata 55, 56f
Networked relationship 203, 203f
Nonrepetitive data 86, 269, 269f
acronym resolution 277, 277f
analytics from 295
call center information 295–303, 304f
medical records 304–307, 308f
associative word processing 281–282
in big data 78–79, 269
context in 79–80, 79–80f
custom variables 274–275
date standardization 280, 280f
date tagging 279
document classification 284–285
document metadata 284, 284f
functional sequencing within textual ETL 286, 286f
homographic resolution 275–276
inline contextualization 272–273
internal referential integrity 287, 287f
list processing 280–281
negation analysis 277–278
numeric tagging 278–279
parsing of 85–86, 85f, 271
preprocessing and postprocessing 287–289
processing of 270
process of reading 270
proximity analysis 285–286
stop word processing 282–283
taxonomy/ontology processing 273–274
word stemming 283
Nonrepetitive nontextual data 4
Nonrepetitive raw big data 214–215
Nonrepetitive records of data 291
Nonrepetitive unstructured data 2–3, 13, 16–18, 24
business relevancy of 24, 24f
business value 90
context in 94
easy to analyze 92
environment of 206, 206f
information of 91, 91f
Nonstructured data 1
Nontextual data 4
Normalization, of data 29–30, 30f
Normal profile 238, 238f
NoSQL 138
NoSQL platform, Data Vault 2.0 architecture 158–159
Number charts 389, 389t, 389f
Numeric tagging, nonrepetitive data 278–279

O

Online database environment 208, 208f
Online real-time system 42, 43f
Online transaction processing systems 42, 42f, 69–70, 180–181, 181f
applications 127–128, 128f
Ontologies 114, 114f
data model 345–347
processing, nonrepetitive data 273–274
Open-ended continuous analysis 231, 232f
Operational analytics 319
application data 322–323, 323f
data integrity lack 320–321, 320f
data marts 325–326
data warehouse 321
denormalization 319–320
historical data 321, 321f
operational environment 309, 310f, 319, 320f
perspectives of data 324–325
relational model 322, 322f
system of record 323, 324f
transaction response time 310–317, 311f
Operational data models 198
Operational data store (ODS) 47, 209–210, 209f, 326–329
interface 220–221, 221f
Operational environment 177
applications 178, 178f
commercial uses of computer 177, 178f
corporate computing 183
DBMS 180–181
disk technology 179–180
Ed Yourdon and structured revolution 178–179
interface, data warehouse 219, 219f
changed data capture 222–223, 222f
classical ETL interface 220–221, 220f
ELT processing 223–224
inline transformation 223, 223f
ODS interface 220–221, 221f
staging area 221–222
response time and availability 181–183, 182f
SDLC 179, 179f
Operational systems 319
Optical character recognition (OCR) software 27–28
Optimization 164
Outliers 247, 247f

P

Paper tape 39, 39f, 199, 200f
Parallel data management 43, 43f
Parallel disk storage 201, 201f
Parallelization 82
MPP approach 83–84, 83f
in Roman census approach 81, 82f, 84
Parallel processing 
big data handling 81, 82f
multiple processors 81
parsing 84, 84f
of nonrepetitive data 85–86, 85f
of repetitive data 85, 85f
Parent-child relationship 203, 203f
Parsing 
of nonrepetitive data 85–86, 85f, 271
of repetitive data 85, 85f, 252
of repetitive unstructured data 99, 100f
Passive indexing 255–256
Passive integration 146
Pattern analysis 213
repetitive analytics 232–234
Performance, operational environment 309, 310f
Personal analytics 
decision-making 331, 332f, 334
sandbox 335, 335f
spreadsheet 332–334, 332–334f
Personal computer 331, 332f
Personal data 331, 334–335
Personal decision-making 331, 334
Physical data base design 194–195, 195f
Pie charts 389t, 390, 390f
Postprocessing, nonrepetitive data 287–289
Potentially business-relevant records 24
PowerPoint 392–393
Preprocessing, nonrepetitive data 287–289
Private taxonomies 115
Proactive data models 349–351, 350f
Probabilistic linkages 260
Project-based analysis 231, 232f
Project management professional (PMP) 167–168
Proximity analysis, nonrepetitive data 285–286
Punch cards 39, 39f, 199, 200f

Q

Qlik Sense 386, 386f, 392
Queue time, transaction response time 313, 313f

R

Racetrack analogy 187, 187f
Raw big data 211
nonrepetitive 214–215
repetitive 211–213
Reactive data model 349–351
Redundancy 
high-level perspective, data architecture 225–226
transformations and 66, 66f
Redundant data 59, 59f
Reengineering, DV2 implementation 172–174
Refine process 47
Relational database management system 203, 204f
Relational model, operational analytics 322, 322f
Repetitive analysis 
archiving results 266–267
filtering and distillation processing 265–266
internal, external data 261
metrics 267–268
security 263–265, 263f
universal identifiers 262
Repetitive analytics 231
analyzing points of data 246, 247f
bias of the sample 241, 241f
data over time 247–249
distillation and filtering processing 238–239, 239f
filtering data 242–243
freezing data 236
heuristic processing 234–236
kinds of analysis 231–232
linking repetitive records 244–245
log tape records 245–246
normal profile 238, 238f
outliers 247, 247f
patterns 232–234
repetitive data and context 243–244
sandbox 237, 237f
subsetting data 240–241
Repetitive big data 9
Repetitive data 269
analysis of 251, 252f
active/passive indexing of data 255–256
block of data 251, 252f
elements in 251, 252f
linking data 259–260
log data 253–255
metadata, in big data 257–259, 259f
parsing of 252
summary/detailed data 256–257, 257f
application-specific nature of 87, 87f
big data context in 77–78, 77–78f
and context 243–244
contextual data on 86, 86f
parsing of 85, 85f
ratios of 21, 22f
types of 7
Repetitive raw big data 211–213
Repetitive records 291
linking of 244–245
Repetitive structured data 7–8
Repetitive text 291–292
Repetitive unstructured data 2–3, 13–15, 21–22, 22f, 111
business relevancy of 23–24, 24f
business value 90
contextualizing of 99–100
easy to analyze 92
output data recast 100, 100f
parsing of 99, 100f
Repetitive unstructured information 91, 91f
Report decompilation 108–109, 110f
Reservation systems 70
Response time 181–183, 182f
elements of 185, 186f
Roadmap, purpose of 191
Roman census approach 9, 74–75
parallelization in 81, 82f, 84

S

Sandbox 237, 237f, 335, 335f
Satellites 142, 145, 147
Scale-free network design 142
Scatter chart 232–233
Scatter diagram 246, 247f
Scatterplot.  See Bubble chart
Scrum 166
Security, repetitive analysis 263–265, 263f
Selective subdivision, data 347–348, 348–349f
Self-service BI 161, 175
Sequence numbers 148–149
as business keys 154–155
Service-level agreement (SLA) 188–189
Siloed applications 121
building of 124–126
challenge of 121–124
characteristics of 126
current valued data 126–127
dismantling of 131, 131f
frozen business requirements 130, 130f
high availability 128–129
minimal historical data 127–128
overlap between 129–130
Siloed systems 121, 124–125
Six Sigma, Data Vault 2.0 168–169
Smart key 152–153
Software development life cycle (SDLC) 163–164, 167–168
Source system sequence-driven business keys 153–155
Stacked bar chart 389t, 391–392, 391f
Staging area 221–222
Standard data warehouse 52
Standard structured DBMS 11f, 12
Standard work unit (swu) 185, 188
hourglass analogy 185–187
racetrack analogy 187, 187f
response time, elements of 185, 186f
Star join 
data marts 325
data model 343–345, 344–345f
Stemming 373
word 283
Stop words 373
nonrepetitive data 282–283
Strategic business value 364–365
Structured approach, data architecture 204, 205f
Structured data 1, 2f, 10, 23, 24f
analysis of 217–218
decisions based 89–90, 90f
merging text based data and 378–379
repetitive structured data 7–8
visualizations 385
Structured DBMS 11–12
Structured revolution 178–179
Subsetting data 240–241
Summary data 256–257, 257f, 359, 359–360f
life cycle of 34
Super Bowl 233
System development life cycle (SDLC) 179, 179f
System of record 226–228
auditing data 360
data flow 357
data types 356, 356f
data updates 358–359, 359f
defined as 354–355, 355f
detailed and summary data 359, 359f
end-state architecture 355
end user awareness cycle 353–354, 354f
operational analytics 323, 324f
other data 358, 358f
text and 360–361

T

Tactical business value 364–365
Tagging 373
Taxonomic resolution 374
Taxonomies 111, 113–114
applicability of 113
commercial/private 115
data models 111–113
maintenance of 118, 119f
in multiple languages 115, 115f
ontology 114, 114f
and textual disambiguation 
dynamics of 116, 116f
separate technologies 116–117, 118f
types of 117, 118f
Taxonomy data model 345–347, 346–348f
Taxonomy processing, nonrepetitive data 273–274
Technical view, DV2 
methodology of 137
modeling of 136–137
Teradata 70, 217
Text 47, 49
management of 371
challenge of 371–374
merging text based and structured data 378–379
secondary analysis 377–378, 378f
visualization 378, 378f
system of record 360–361, 361f
transformations in 61–62, 62f
Text-based data 378–379
Textual data 4, 363
Textual disambiguation 16–18, 17–18f, 47, 79, 85–86, 101, 216, 217f, 262, 270, 271f, 374–375, 376f
call centers 296–299, 298–299f
classification of 271–272, 271f
document fracturing/named value processing 105–106
document preprocessing 106, 107f
e-mails 107, 107f
flow of processing in 271
input into 102, 103f
input/output 104–105
mapping process 102–104, 293
from narrative to analytical data base 101, 102f
nonrepetitive data 
acronym resolution 277, 277f
associative word processing 281–282
custom variables 274–275
date standardization 280, 280f
date tagging 279
document classification 284–285
document metadata 284, 284f
functional sequencing within 286, 286f
homographic resolution 275–276
inline contextualization 272–273
internal referential integrity 287, 287f
list processing 280–281
negation analysis 277–278
numeric tagging 278–279
preprocessing and postprocessing 287–289
proximity analysis 285–286
stop word processing 282–283
word stemming 283
processing components of 376–377
report decompilation 108–109, 110f
secondary analysis 377–378, 378f
spreadsheets 107–108
taxonomies and 115
dynamics of 116, 116f
separate technologies 116–117, 118f
Textual ETL.  See Textual disambiguation
Textual information 89, 89f
Total cost of ownership (TCO) 144
Total quality management (TQM) 169–170
Transaction data 49
Transaction response time 310, 311f
data types 314, 315f
denormalization 314–315, 315f
elements of 311–312, 311f
fast-running transaction 314, 314f
I/O operation 312–313, 313f
long-running programs 315–317, 316f
measurement of 312, 312f
queue time 313, 313f

U

Uniprocessor architecture 41
Universal identifiers 262
Unstructured approach, data architecture 204, 205f
Unstructured big data 7
Unstructured data 89, 205–206, 205–206f
analysis of 217–218
big data 75–76
business value proposition 90–91, 90f
classification of 111, 112f
contextualization 93–95, 94f, 96f
ease of analysis 91–93, 92–93f
manual analysis 96–97, 97f
MapReduce 96, 96f
nonrepetitive 2–3, 16–18
repetitive 2–3, 14–15
repetitive and nonrepetitive unstructured information 91, 91f
structured data, decisions 89–90, 90f
textual information 89, 89f
visualizations 385
US dollar 262

V

Virtualization, data marts 174–175
Visualizations 378, 378f, 388, 389t
bar chart 390, 391f
bubble chart 393
horizontal bar chart 392, 392f
line chart 392, 393f
stacked bar chart 391–392, 391f
number chart 389, 389f
pie charts 390, 390f

W

WhereScape 137
Word stemming, nonrepetitive data 283

Y

Yourdon, Ed 178–179, 202