Index

Note: Page numbers followed by f indicate figures and t indicate tables.

Acronym resolution, nonrepetitive data 277, 277f

Active indexing 255–256

Aggregated data, life cycle of 34

Agility, Data Vault 2.0 166–167

Analytic tool, call centers 299–300, 299f

AnalytiX DS 137

Application data 60–61, 60f, 322–323, 323f

transformations of 62, 63f

Archival facility 47

Associative word processing 281–282

Atomicity, consistency, isolation, and durability (ACID) compliance 158

Auditing data 360, 360f

Bar charts 389t, 390, 391f

bubble chart 393

horizontal bar chart 392, 392f

line chart 392, 393f

stacked bar chart 391–392, 391f

BI See Business intelligence (BI)

Big data 29, 44–45, 44f, 67, 201, 202f

context in

nonrepetitive data 79–80, 79–80f

repetitive data 77–78, 77–78f

data in 76–77, 76–77f

definition of 73

existing system interface 211, 212f

context-enriched section of 215–216, 217f

exception based data 213, 214f

into existing systems environment 215, 216f

nonrepetitive raw 214–215

repetitive raw 211–213

structured data/unstructured data analysis 217–218

Hadoop and 70

IBM and 71

hash keys 150–151

high ground

analogy of 67, 68f

holding the 71

progression of events 67–69, 68f

IBM 360 69

inexpensive storage 74

infrastructure for 9

large volumes 73–74

log data 253–255

metadata in 257–259, 259f

nonrepetitive data in 78–79, 269

online transaction processing 69–70

reengineering 172–174

repetitive big data 9

Roman census approach 74–75

Teradata and MPP processing 70

unstructured data 7, 75–76

Blob 373

Boilerplate contract 112–113

Bubble chart 389t, 393

Bulk data, transformations of 65, 66f

Bulk data mart 48

Bulk data vault 47

Bulk data warehouse 47, 50, 52

Business concept model, DV2

methodology of 137

modeling of 136

Business concepts 152

Business context 141

Business intelligence (BI) 138–139, 163–164

CMMI 163–166

managed self-service BI 161–162

PMP and SDLC 167–168

Six Sigma 168–169

TQM 169–170

Business keys 143–144, 147, 152–153

multipart source 155–156

sequence numbers as 154–155

source system sequence-driven 153–155

Business problem 154

Business processes 143–144

Business relevancy, to corporate data 22, 23f, 24–25, 25f

Business requirement meetings, recording of 165

Business value 363–364, 365f

data architecture 5

data relevancy over time 367–368

occurrence of 366–367

proposition, unstructured data 90–91, 90f

tactical decisions 368–369

tactical/strategic 364–365

volume of data vs. 365, 366f

Call centers

analytic tool 299–300, 299f

dashboard information 300–301, 300–304f, 303–304

mapping process 297–298, 298f

textual disambiguation 296–299, 298–299f

Capability Maturity Model Integration (CMMI) 163–167

Capture/edit process, data 33

Categorization, of data 30–31, 30f

Changed data capture (CDC) 222–223, 222f

Classical ETL interface 220–221, 220f

Classical system development life cycle (SDLC) processing 234–235, 234f

CMMI See Capability Maturity Model Integration (CMMI)

COBOL 177

Commercial taxonomies 115

Comparative analysis 213

Computer, commercial uses of 177, 178f

Conditional architecture 172

Context

challenge of 374–375, 376f

in nonrepetitive data 79–80, 79–80f

in repetitive data 77–78, 77–78f, 243–244

Context-enriched section, big data 215–216, 217f

Contextualization 16

of repetitive unstructured data 99–100

unstructured data 93–95, 94f, 96f

Continental divide, data architecture 4, 4f

Corporate computing 183

Corporate data 7, 49–50

analysis of 27, 28f

categorization 30–31, 30f

data integration 29

diverse sources of 27, 28f

formal analysis 27

informal analysis 27

key performance indicators 31

many problems of 28

normalization 29–30, 30f

business relevancy relates to 22, 23f, 24–25, 25f

classification of 13

demographics of 21–25

division of 21, 22f

nonrepetitive unstructured data 24, 24f

potentially business-relevant records 24

ratios of repetitive data 21, 22f

repetitive unstructured data 21–24, 22f

structured data 23, 24f

Corporate data models 340–343

application models 343, 343f

data warehouse 343, 344f

generic data models 342, 342f

Corporate decision-making 331, 334

Corporate information infrastructure 34

Corporate information systems 33

Coupled processors 41–42, 42f

Crawler technology 256

Current valued data 126–127

Curve of usefulness 34

declining of 35–36, 35–36f

Customer account number (CAN) 141

Customize data, transformations of 61, 61f

Custom variables resolution, nonrepetitive data 274–275

DAD See Disciplined agile delivery (DAD)

Dashboard 394

call centers 300–301, 300–304f, 303–304

Data 39

accumulation curve 36

active/passive indexing of 255–256

analyzing points of 246, 247f

automated generation of 48

big data 44–45, 44f, 76–77, 76–77f

block of 251, 252f

coupled processors 41–42, 42f

in data lake 53–54, 54f

data vault 44, 44f

data warehouse 43, 43f

DBMS 41, 41f

degradation of integrity of 37, 37f

different forms of 5

disk storage 40–41, 40f

in end-state data architecture 48–50, 49f

file structures of 206, 207f

the great divide 45, 45f

integration of 29

integrity of 121

internal, external 261

internal formatting of 204, 205f

life cycle of 33–37

linkage of 259–260

logical organization of 202, 202f

magnetic tapes 40, 40f

online transaction processing 42, 42f

paper tape and punch cards 39, 39f

parallel data management 43, 43f

phenomenon of 34–35

physical gathering of 28

raw detailed 33–34

resolution problem of 29

standard/universal measurements of 262

transformations in

into bulk storage 63–64, 65f

into customized state 63, 64f

generated automatically 64–65, 65f

Data architecture 1

aspects of 199

big data 201, 202f

business value 5

continental divide 4, 4f

database concept 207–208, 208f

database management systems 203

data mart 210, 210f

data warehouse 208–209, 209f

different forms of data 5

disk storage 201, 201f

evolution of 199, 200f

file structures of data 206, 207f

great divide of data 3–4

high-level perspective 225

different communities 229

question types 228–229

redundancy 225–226

system of record 226–228

internal formatting of data 204, 205f

logical organization of data 202, 202f

magnetic tape 200, 201f

master file 207, 208f

nonrepetitive unstructured environment 206, 206f

online database environment 208, 208f

operational data store 209–210, 209f

paper tape 199, 200f

parallel disk storage 201, 201f

parent-child relationship and networked relationship 203, 203f

physical dimension of 199, 200f

punched cards 199, 200f

relational database management system 203, 204f

repetitive/nonrepetitive unstructured data 2–3

structured approach 204, 205f

subdividing data 1–2

textual/nontextual data 4

unstructured approach 204, 205f

unstructured data 205–206, 205–206f

Database concept 207–208, 208f

Database management systems (DBMS) 41, 41f, 180–181, 203, 217

blob 373

requirements of 371

and text 371–372

Data communications (DC) 69

Data encryption 263, 264f

Data flow diagrams, functional decomposition 337–340, 338–340f

Data infrastructure 7

being optimized 10–11

different infrastructures 10

comparison of 12

repetitive big data 9

repetitive data 7

repetitive structured data 7–8

Data integrity 320–321, 320f

Data item set (dis) 193–194

Data lake 48, 50, 347–348, 348–349f

architecture of 57, 57f

data in 53–54, 54f

transformation of 48

Data marts 47, 50, 174–175, 210, 210f, 229, 325–326, 356

Data modeling

corporate 340–343

end-state architecture 337, 338f

functional decomposition 337–340

generic data models 197–198

ontologies 345–347

operational/data warehouse 198

proactive/reactive 349–351

selective subdivision 347–348

star join/dimensional 343–345

for structured environment 191

data item set 193–194

different levels of 195–196

ERD 192–193

generic data models 197–198

granular data only 191–192

linkage 196, 197f

operational/data warehouse data models 198

physical data base design 194–195, 195f

purpose of roadmap 191

taxonomies 111–113, 345–347

types of 337

Data organization, visualizations 387

Data over time 247–249

Data ponds 48, 50

Data quality, visualizations 387–388

Data relevancy 367–368

Data sources, visualizations 386–387

Data vault 44, 44f, 47

Data Vault 1.0 modeling 134

Data Vault 2.0 134–136

origins and background 133–136

Data Vault 1.0 (DV1) 139–140

modeling of 134

Data Vault 2.0 (DV2) 134–136

business benefits of 138–139

components of 133

implementation of 138, 171

data marts 174–175

managed self-service BI 175–176

patterns 171–172

reengineering 172–174

issues to solve 135

methodology of 137, 163

agility 166–167

CMMI 163–166

PMP and SDLC 167–168

Six Sigma 168–169

TQM 169–170

Data Vault 2.0 architecture 138, 157, 158f

components of 157–158

hard and soft business rules 160–161

managed self-service BI 161–162

NoSQL platform 158–159

objectives of 159

Data Vault 2.0 model 136–137, 145, 145f

hybrid model 142

objectives of 159–160

primary key values

business keys 152–153

hash keys 150–152

multipart source business keys 155–156

sequence numbers 148–149

source system sequence-driven business keys 153–155

Data vault model (DVM) 141, 144–146

business keys 143–144

components of 142

and data warehousing 144

defined as 141–142, 142f

many-to-many link structures 147–148

restructuring of 146

rules of 146–147

Data visualizations 381–382, 382f, 395

framework 383, 384f

data 385–388

define 383–385

design 388–393

distribute 393–394

purpose and context 382, 383f

science and art 382–383, 384f

tools and software 394–395

Data warehouse 43, 43f, 47, 50–52, 51f, 208–209, 209f, 321, 343, 344f, 356, 363

data transformation 357

definition of 321

operational environment interface 219, 219f

changed data capture 222–223, 222f

classical ETL interface 220–221, 220f

ELT processing 223–224

inline transformation 223, 223f

ODS interface 220–221, 221f

staging area 221–222

Data warehouse data models 198

Data warehousing systems 138–139, 142, 144

Date standardization 280, 280f, 293

Date tagging 279

DBMS See Database management systems (DBMS)

DC See Data communications (DC)

Decision-making

corporate level of 331, 334

personal level of 331, 334

Deep historic data 356

Denormalization 319–320

transaction response time 314–315, 315f

Derived metadata 258

Detailed data 256–257, 257f, 359, 359f

Deterministic 150

Dimensional data model 343–345, 344–345f

Direct end user 125

Disciplined agile delivery (DAD) 166

Disk storage 40–41, 40f, 201, 201f

Disk technology 179–180

Distillation process 211

repetitive analysis 265–266

repetitive analytics 238–239, 239f

Document classification 284–285

Document fracturing 105–106

Document metadata 284, 284f

Document preprocessing 106, 107f

DV1 See Data Vault 1.0 (DV1)

DV2 See Data Vault 2.0 (DV2)

DVM See Data vault model (DVM)

EDW See Enterprise data warehousing (EDW)

EII See Enterprise information integration (EII)

Electronic speed 312

ELT processing 223–224

E-mails 107, 107f

End-state data architecture 47, 337, 338f, 355, 355–356f

architectural components 47–48, 48f

business value 363–364, 365f

data relevancy over time 367–368

occurrence of 366–367

tactical decisions 368–369

tactical/strategic 364–365

volume of data versus 365, 366f

in data lake 53–54, 54f, 57, 57f

data warehouse 50–52, 51f

evolutionary experience 55–56, 56f

evolution of 363, 364f

features of 349–350

kinds of data in 48–50, 49f

metadata in 54, 55f

networked metadata 55, 56f

question types 52–53, 53f

shaping through models 50, 51f

transformations in 59–61, 60f

application data 62, 63f

bulk data 65, 66f

customizing data 61, 61f

data generated automatically 64–65, 65f

data into bulk storage 63–64, 65f

data into customized state 63, 64f

and redundancy 66, 66f

redundant data 59, 59f

text 61–62, 62f

End user awareness cycle 353–354, 354f

Enterprise data warehousing (EDW) 139

Enterprise information integration (EII) 161, 175

Entity 192

Entity relationship diagram (ERD) 192–193

ERD See Entity relationship diagram (ERD)

ETL technology 322

Exception based data 213, 214f

External data 261

False-positive correlation 233, 233f

False positives 233

Fast-running transaction 314, 314f

Federated query engines 161

Filtering data 242–243

Filtering process

repetitive analysis 265–266

repetitive analytics 238–239, 239f

Formal analysis, corporate data analysis 27

Fortran 177

Free-form text 113

Freezing data 236

Frozen business, requirements of 130, 130f

Functional decomposition 337–340, 338–340f

Functional sequencing, within textual ETL 286, 286f

Gap analysis 175

Gartner Group, big data definition 73

Generic data models 197–198, 342, 342f

Granular data 191–192, 324–325

The “great divide” 3–4, 14, 45, 45f

corporate data classification 13

different worlds 18–19

nonrepetitive unstructured data 16–18

repetitive unstructured data 14–15

Greenwich mean time (GMT) 262

Hadoop-based satellite 160, 160f

Hadoop distributed file system (HDFS) 158

Hadoop technology 14–15, 15f, 70, 151, 158–159, 254–255, 254f

IBM and 71

Hard and soft business rules 160–161

Hash keys 150–152

HDFS See Hadoop distributed file system (HDFS)

Heuristic processing 234–236

High ground

analogy of 67, 68f

holding the 71

progression of events 67–69, 68f

High-level perspective, data architecture 225

different communities 229

question types 228–229

redundancy 225–226

system of record 226–228

Historical data 321, 321f

siloed applications 127–128

Homographic resolution, nonrepetitive data 275–276

Horizontal bar chart 389t, 392, 392f

Hourglass analogy 185–187

Hubs 142, 145, 154–155

IBM See International Business Machines (IBM)

IBM 360 69

IMS See Information management system (IMS)

Indirect end user 125

Inexpensive storage, big data 74

Infographic 394

Informal analysis, corporate data analysis 27

Information management system (IMS) 69

Information marts 174

Inline contextualization, nonrepetitive data 272–273

Inline transformation 223, 223f

Integrated data 321, 324–327

Integrity of data 320–321, 320f, 333

Intelligent key 152–153

Internal data 261

Internal referential integrity, nonrepetitive data 287, 287f

International Business Machines (IBM) 37

and Hadoop technology 71

I/O operation, transaction response time 312–313, 313f

Julian date 262

Key performance indicators (KPIs) 327–329, 327–328f

corporate data analysis 31

Landing zone 48

Least squares approach 246

Line charts 389t, 392, 393f

Links 142, 145–146

Link structures 147

List processing, nonrepetitive data 280–281

Log data 253–255

Logical foreign key 160

Log tape records 245–246

Lookup process 149

Magnetic tapes 40, 40f, 180, 200, 201f

Managed self-service BI 161–163, 175–176

Manual analysis, unstructured data 96–97, 97f

Many-to-many link structures 147–148

Mapping process 291, 292f

call centers 297–298, 298f

definition of 292–293

levels of 195, 196f

repetitive and nonrepetitive records of data 291

repetitive text 291–292

textual disambiguation 102–104, 293

variable name selection 292

MapReduce 96, 96f

Massively parallel processing (MPP) approach 70, 83–84, 83f

Master file 207, 208f

Mechanical speed 312

Medical records 304–307, 308f

Metadata

in big data 257–259, 259f

in end-state data architecture 54, 55f

Metrics, repetitive analysis 267–268

“Million in one” syndrome 366

MPP approach See Massively parallel processing (MPP) approach

Multipart source business keys 155–156

Multiple processors 41–42

Named value processing 105–106

Narrative data, classification of 112–113

Narrative information 304

Native metadata 258

Natural language processing (NLP) 17, 95, 374

Negation analysis, nonrepetitive data 277–278

Networked metadata 55, 56f

Networked relationship 203, 203f

NLP See Natural language processing (NLP)

Nonrepetitive data 86, 269, 269f

acronym resolution 277, 277f

analytics from 295

call center information 295–303, 304f

medical records 304–307, 308f

associative word processing 281–282

in big data 78–79, 269

context in 79–80, 79–80f

custom variables 274–275

date standardization 280, 280f

date tagging 279

document classification 284–285

document metadata 284, 284f

functional sequencing within textual ETL 286, 286f

homographic resolution 275–276

inline contextualization 272–273

internal referential integrity 287, 287f

list processing 280–281

negation analysis 277–278

numeric tagging 278–279

parsing of 85–86, 85f, 271

preprocessing and postprocessing 287–289

processing of 270

process of reading 270

proximity analysis 285–286

stop word processing 282–283

taxonomy/ontology processing 273–274

word stemming 283

Nonrepetitive nontextual data 4

Nonrepetitive raw big data 214–215

Nonrepetitive records of data 291

Nonrepetitive unstructured data 2–3, 13, 16–18, 24

business relevancy of 24, 24f

business value 90

context in 94

easy to analyze 92

environment of 206, 206f

information of 91, 91f

Nonstructured data 1

Nontextual data 4

Normalization, of data 29–30, 30f

Normal profile 238, 238f

NoSQL 138

NoSQL platform, Data Vault 2.0 architecture 158–159

Number charts 389, 389t, 389f

Numeric tagging, nonrepetitive data 278–279

ODS See Operational data store (ODS)

Online database environment 208, 208f

Online real-time system 42, 43f

Online transaction processing systems 42, 42f, 69–70, 180–181, 181f

applications 127–128, 128f

Ontologies 114, 114f

data model 345–347

processing, nonrepetitive data 273–274

Open-ended continuous analysis 231, 232f

Operational analytics 319

application data 322–323, 323f

data integrity lack 320–321, 320f

data marts 325–326

data warehouse 321

denormalization 319–320

historical data 321, 321f

ODS 326–329

operational environment 309, 310f, 319, 320f

perspectives of data 324–325

relational model 322, 322f

system of record 323, 324f

transaction response time 310–317, 311f

Operational data models 198

Operational data store (ODS) 47, 209–210, 209f, 326–329

interface 220–221, 221f

Operational environment 177

applications 178, 178f

commercial uses of computer 177, 178f

corporate computing 183

DBMS 180–181

disk technology 179–180

Ed Yourdon and structured revolution 178–179

interface, data warehouse 219, 219f

changed data capture 222–223, 222f

classical ETL interface 220–221, 220f

ELT processing 223–224

inline transformation 223, 223f

ODS interface 220–221, 221f

staging area 221–222

response time and availability 181–183, 182f

SDLC 179, 179f

Operational systems 319

Optical character recognition (OCR) software 27–28

Optimization 164

Outliers 247, 247f

Paper tape 39, 39f, 199, 200f

Parallel data management 43, 43f

Parallel disk storage 201, 201f

Parallelization 82

MPP approach 83–84, 83f

in Roman census approach 81, 82f, 84

Parallel processing

big data handling 81, 82f

multiple processors 81

parsing 84, 84f

of nonrepetitive data 85–86, 85f

of repetitive data 85, 85f

Parent-child relationship 203, 203f

Parsing

of nonrepetitive data 85–86, 85f, 271

of repetitive data 85, 85f, 252

of repetitive unstructured data 99, 100f

Passive indexing 255–256

Passive integration 146

Pattern analysis 213

repetitive analytics 232–234

Performance, operational environment 309, 310f

Personal analytics

decision-making 331, 332f, 334

sandbox 335, 335f

spreadsheet 332–334, 332–334f

Personal computer 331, 332f

Personal data 331, 334–335

Personal decision-making 331, 334

Physical data base design 194–195, 195f

Pie charts 389t, 390, 390f

PMP See Project management professional (PMP)

Postprocessing, nonrepetitive data 287–289

Potentially business-relevant records 24

PowerPoint 392–393

Preprocessing, nonrepetitive data 287–289

Private taxonomies 115

Proactive data models 349–351, 350f

Probabilistic linkages 260

Project-based analysis 231, 232f

Project management professional (PMP) 167–168

Proximity analysis, nonrepetitive data 285–286

Punch cards 39, 39f, 199, 200f

Qlik Sense 386, 386f, 392

Queue time, transaction response time 313, 313f

Racetrack analogy 187, 187f

Raw big data 211

nonrepetitive 214–215

repetitive 211–213

Reactive data model 349–351

Redundancy

high-level perspective, data architecture 225–226

transformations and 66, 66f

Redundant data 59, 59f

Reengineering, DV2 implementation 172–174

Refine process 47

Relational database management system 203, 204f

Relational model, operational analytics 322, 322f

Repetitive analysis

archiving results 266–267

filtering and distillation processing 265–266

internal, external data 261

metrics 267–268

security 263–265, 263f

universal identifiers 262

Repetitive analytics 231

analyzing points of data 246, 247f

bias of the sample 241, 241f

data over time 247–249

distillation and filtering processing 238–239, 239f

filtering data 242–243

freezing data 236

heuristic processing 234–236

kinds of analysis 231–232

linking repetitive records 244–245

log tape records 245–246

normal profile 238, 238f

outliers 247, 247f

patterns 232–234

repetitive data and context 243–244

sandbox 237, 237f

subsetting data 240–241

Repetitive big data 9

Repetitive data 269

analysis of 251, 252f

active/passive indexing of data 255–256

block of data 251, 252f

elements in 251, 252f

linking data 259–260

log data 253–255

metadata, in big data 257–259, 259f

parsing of 252

summary/detailed data 256–257, 257f

application-specific nature of 87, 87f

big data context in 77–78, 77–78f

and context 243–244

contextual data on 86, 86f

parsing of 85, 85f

ratios of 21, 22f

types of 7

Repetitive raw big data 211–213

Repetitive records 291

linking of 244–245

Repetitive structured data 7–8

Repetitive text 291–292

Repetitive unstructured data 2–3, 13–15, 21–22, 22f, 111

business relevancy of 23–24, 24f

business value 90

contextualizing of 99–100

easy to analyze 92

output data recast 100, 100f

parsing of 99, 100f

Repetitive unstructured information 91, 91f

Report decompilation 108–109, 110f

Reservation systems 70

Response time 181–183, 182f

elements of 185, 186f

Roadmap, purpose of 191

Roman census approach 9, 74–75

parallelization in 81, 82f, 84

Sandbox 237, 237f, 335, 335f

Satellites 142, 145, 147

Scale-free network design 142

Scatter chart 232–233

Scatter diagram 246, 247f

Scatterplot. See Bubble chart

Scrum 166

Security, repetitive analysis 263–265, 263f

Selective subdivision, data 347–348, 348–349f

Self-service BI 161, 175

Sequence numbers 148–149

as business keys 154–155

Service-level agreement (SLA) 188–189

Siloed applications 121

building of 124–126

challenge of 121–124

characteristics of 126

current valued data 126–127

dismantling of 131, 131f

frozen business requirements 130, 130f

high availability 128–129

minimal historical data 127–128

overlap between 129–130

Siloed systems 121, 124–125

Six Sigma, Data Vault 2.0 168–169

SLA See Service-level agreement (SLA)

SLDC See Software development life cycle (SLDC)

Smart key 152–153

Software development life cycle (SLDC) 163–164, 167–168

Source system sequence-driven business keys 153–155

Spreadsheets 107–108, 332–334, 332–334f

Stacked bar chart 389t, 391–392, 391f

Staging area 221–222

Standard data warehouse 52

Standard structured DBMS 11f, 12

Standard work unit (swu) 185, 188

hourglass analogy 185–187

racetrack analogy 187, 187f

response time, elements of 185, 186f

SLA 188–189

Star join

data marts 325

data model 343–345, 344–345f

Stemming 373

word 283

Stop words 373

nonrepetitive data 282–283

Strategic business value 364–365

Structured approach, data architecture 204, 205f

Structured data 1, 2f, 10, 23, 24f

analysis of 217–218

decisions based 89–90, 90f

merging text-based data and 378–379

repetitive structured data 7–8

visualizations 385

Structured DBMS 11–12

Structured revolution 178–179

Subsetting data 240–241

Summary data 256–257, 257f, 359, 359–360f

life cycle of 34

Super Bowl 233

System development life cycle (SDLC) 179, 179f

System of record 226–228

auditing data 360

data flow 357

data types 356, 356f

data updating 358–359, 359f

defined as 354–355, 355f

detailed and summary data 359, 359f

end-state architecture 355

end user awareness cycle 353–354, 354f

operational analytics 323, 324f

other data 358, 358f

text and 360–361

Tactical business value 364–365

Tagging 373

Taxonomic resolution 374

Taxonomies 111, 113–114

applicability of 113

commercial/private 115

data models 111–113

maintenance of 118, 119f

in multiple languages 115, 115f

ontology 114, 114f

and textual disambiguation

dynamics of 116, 116f

separate technologies 116–117, 118f

types of 117, 118f

Taxonomy data model 345–347, 346–348f

Taxonomy processing, nonrepetitive data 273–274

Technical view, DV2

methodology of 137

modeling of 136–137

Teradata 70, 217

Text 47, 49

management of 371

challenge of 371–374

merging text-based and structured data 378–379

secondary analysis 377–378, 378f

visualization 378, 378f

system of record 360–361, 361f

transformations in 61–62, 62f

Text-based data 378–379

Textual data 4, 363

Textual disambiguation 16–18, 17–18f, 47, 79, 85–86, 101, 216, 217f, 262, 270, 271f, 374–375, 376f

call centers 296–299, 298–299f

classification of 271–272, 271f

document fracturing/named value processing 105–106

document preprocessing 106, 107f

e-mails 107, 107f

flow of processing in 271

input into 102, 103f

input/output 104–105

mapping process 102–104, 293

from narrative to analytical data base 101, 102f

nonrepetitive data

acronym resolution 277, 277f

associative word processing 281–282

custom variables 274–275

date standardization 280, 280f

date tagging 279

document classification 284–285

document metadata 284, 284f

functional sequencing within 286, 286f

homographic resolution 275–276

inline contextualization 272–273

internal referential integrity 287, 287f

list processing 280–281

negation analysis 277–278

numeric tagging 278–279

preprocessing and postprocessing 287–289

proximity analysis 285–286

stop word processing 282–283

word stemming 283

processing components of 376–377

report decompilation 108–109, 110f

secondary analysis 377–378, 378f

spreadsheets 107–108

taxonomies and 115

dynamics of 116, 116f

separate technologies 116–117, 118f

Textual ETL. See Textual disambiguation

Textual information 89, 89f

Total cost of ownership (TCO) 144

Total quality management (TQM) 169–170

Transaction data 49

Transaction response time 310, 311f

data types 314, 315f

denormalization 314–315, 315f

elements of 311–312, 311f

fast-running transaction 314, 314f

I/O operation 312–313, 313f

long-running programs 315–317, 316f

measurement of 312, 312f

queue time 313, 313f

Uniprocessor architecture 41

Universal identifiers 262

Unstructured approach, data architecture 204, 205f

Unstructured big data 7

Unstructured data 89, 205–206, 205–206f

analysis of 217–218

big data 75–76

business value proposition 90–91, 90f

classification of 111, 112f

contextualization 93–95, 94f, 96f

ease of analysis 91–93, 92–93f

manual analysis 96–97, 97f

MapReduce 96, 96f

nonrepetitive 2–3, 16–18

repetitive 2–3, 14–15

repetitive and nonrepetitive unstructured information 91, 91f

structured data, decisions 89–90, 90f

textual information 89, 89f

visualizations 385

US dollar 262

Virtualization, data marts 174–175

Visualizations 378, 378f, 388, 389t

bar chart 390, 391f

bubble chart 393

horizontal bar chart 392, 392f

line chart 392, 393f

stacked bar chart 391–392, 391f

number chart 389, 389f

pie charts 390, 390f

WhereScape 137

Word stemming, nonrepetitive data 283

Yourdon, Ed 178–179, 202
