Table of Contents

Cover image

Title page



Chapter 1.1: An Introduction to Data Architecture


Subdividing Data

Repetitive/Nonrepetitive Unstructured Data

The Great Divide of Data

Textual/Nontextual Data

The Different Forms of Data

Business Value

Chapter 1.2: The Data Infrastructure


Two Types of Repetitive Data

Repetitive Structured Data

Repetitive Big Data

The Two Infrastructures

What's Being Optimized?

Comparing the Two Infrastructures

Chapter 1.3: The “Great Divide”


Classifying Corporate Data

The “Great Divide”

Repetitive Unstructured Data

Nonrepetitive Unstructured Data

Different Worlds

Chapter 1.4: Demographics of Corporate Data


Chapter 1.5: Corporate Data Analysis


Chapter 1.6: The Life Cycle of Data: Understanding Data Over Time


Chapter 1.7: A Brief History of Data


Paper Tape and Punch Cards

Magnetic Tapes

Disk Storage

Data Base Management System (DBMS)

Coupled Processors

Online Transaction Processing

Data Warehouse

Parallel Data Management

Data Vault

Big Data

The Great Divide

Chapter 2.1: The End-State Architecture—The “World Map”


Architectural Components

Different Kinds of Data in the End State Architecture

Shaping the Data Through Models

Where Is the Data Warehouse?

Where Different Types of Questions Are Answered Across the End State Architecture

Data in the Data Lake

Metadata in the End State Architecture

Networked Metadata

An Evolutionary Experience

The Data Lake Architecture

Chapter 3.1: Transformations in the End-State Architecture


Redundant Data


Customizing Data

Transforming Text

Transforming Application Data

Transforming Data Into a Customized State

Transforming Data Into Bulk Storage

Transforming Data Generated Automatically

Transforming Bulk Data

Transformation and Redundancy

Chapter 4.1: A Brief History of Big Data


An Analogy—Taking the High Ground

Taking the High Ground

Standardization With the 360

Online Transaction Processing

Enter Teradata and MPP Processing

Then Came Hadoop and Big Data

IBM and Hadoop

Holding the High Ground

Chapter 4.2: What Is Big Data?


Another Definition

Large Volumes

Inexpensive Storage

The Roman Census Approach

Unstructured Data

Data in Big Data

Context in Repetitive Data

Nonrepetitive Data

Context in Nonrepetitive Data

Chapter 4.3: Parallel Processing


Chapter 4.4: Unstructured Data


Textual Information—Everywhere

Decisions Based on Structured Data

The Business Value Proposition

Repetitive and Nonrepetitive Unstructured Information

Ease of Analysis


Some Approaches to Contextualization

Map Reduce

Manual Analysis

Chapter 4.5: Contextualizing Repetitive Unstructured Data


Parsing Repetitive Unstructured Data

Recasting the Output Data

Chapter 4.6: Textual Disambiguation


From Narrative Into an Analytical Data Base

Input Into Textual Disambiguation



Document Fracturing/Named Value Processing

Preprocessing a Document

E-mails—A Special Case


Report Decompilation

Chapter 4.7: Taxonomies


Data Models/Taxonomies

Applicability of Taxonomies

What Is a Taxonomy?

Taxonomies in Multiple Languages

Commercial or Private Taxonomies?

Dynamics of Taxonomies and Textual Disambiguation

Taxonomies and Textual Disambiguation—Separate Technologies

Different Types of Taxonomies

Taxonomies—Maintenance Over Time

Chapter 5.1: The Siloed Application Environment


The Challenge of Siloed Applications

Building Siloed Applications

What Does a Siloed Application Look Like?

Current Valued Data

Minimal Historical Data

High Availability

Overlap Between Siloed Applications

Frozen Business Requirements

Dismantling Siloed Applications

Chapter 6.1: Introduction to Data Vault 2.0


Data Vault Origins and Background

What Is Data Vault 2.0 Modeling?

How Is Data Vault 2.0 Methodology Defined?

Why Do We Need a Data Vault 2.0 Architecture?

Where Does Data Vault 2.0 Implementation Fit?

What Are the Business Benefits of Data Vault 2.0?

What Is Data Vault 1.0?

Chapter 6.2: Introduction to Data Vault Modeling


What Is a Data Vault Model Concept?

Data Vault Model Defined

Components of a Data Vault Model

What Makes Business Keys So Interesting?

What Does This Have to Do With Data Vault and Data Warehousing?

How Does This Translate to Data Vault Modeling?

Why Restructure the Data From the Staging Area?

What Are the Basic Rules of the Data Vault Model?

Why Do We Need Many to Many Link Structures?

Primary Key Options for Data Vault 2.0

Chapter 6.3: Introduction to Data Vault Architecture


What Is a Data Vault 2.0 Architecture?

How Does NoSQL Fit in to the Architecture?

What Are the Objectives of the Data Vault 2.0 Architecture?

What Is the Objective of the Data Vault 2.0 Model?

What Are Hard and Soft Business Rules?

How Does Managed Self Service BI Fit in the Architecture?

Chapter 6.4: Introduction to Data Vault Methodology


Data Vault 2.0 Methodology Overview

How Does CMMI Contribute to the Methodology?

If CMMI Is So Great, Why Should We Care About Agility Then?

Why Include PMP, SDLC If CMMI and Agile Should Be All That's Needed?

So Then, What Does Six Sigma Contribute to the Data Vault 2 Methodology?

Where Does TQM (Total Quality Management) Fit in to All of This?

Chapter 6.5: Introduction to Data Vault Implementation


Implementation Overview

What's So Important About Patterns?

Why Does Reengineering Happen Because of Big Data?

Why Do We Need to Virtualize Our Data Marts?

What Is Managed Self-Service BI?

Chapter 7.1: The Operational Environment: A Short History


Commercial Uses of the Computer

The First Applications

Ed Yourdon and the Structured Revolution


Disk Technology

Enter the DBMS

Response Time and Availability

Corporate Computing Today

Chapter 7.2: The Standard Work Unit


Elements of Response Time

An Hourglass Analogy

The Racetrack Analogy

Your Vehicle Runs as Fast as the Vehicle in Front of It

The Standard Work Unit


Chapter 7.3: Data Modeling for the Structured Environment


The Purpose of the Roadmap

Granular Data Only


The DIS

Physical Data Base Design

Relating the Different Levels of the Data Model

An Example of the Linkage

Generic Data Models

Operational Data Models/Data Warehouse Data Models

Chapter 8.1: A Brief History of Data Architecture


Chapter 8.2: Big Data/Existing System Interface


The Big Data/Existing Systems Interface

The Repetitive Raw Big Data/Existing Systems Interface

Exception Based Data

The Nonrepetitive Raw Big Data/Existing Systems Interface

Into the Existing Systems Environment

The “Context Enriched” Big Data Environment

Analyzing Structured Data/Unstructured Data Together

Chapter 8.3: The Data Warehouse/Operational Environment Interface


The Operational/Data Warehouse Interface

The Classical ETL Interface

The ODS and the ETL Interface

The Staging Area

Changed Data Capture

Inline Transformation

ELT Processing

Chapter 8.4: Data Architecture: A High-Level Perspective


A High Level Perspective


The System of Record

Different Types of Questions

Different Communities

Chapter 9.1: Repetitive Analytics: Some Basics


Different Kinds of Analysis

Looking for Patterns

Heuristic Processing

Freezing Data

The Sandbox

The “Normal” Profile

Distillation, Filtering

Subsetting Data

Bias of the Sample

Filtering Data

Repetitive Data and Context

Linking Repetitive Records

Log Tape Records

Analyzing Points of Data


Data Over Time

Chapter 9.2: Analyzing Repetitive Data


Log Data

Active/Passive Indexing of Data

Summary/Detailed Data

Metadata in Big Data

Linking Data

Chapter 9.3: Repetitive Analysis


Internal, External Data

Universal Identifiers


Filtering, Distillation

Archiving Results


Chapter 10.1: Nonrepetitive Data


Inline Contextualization

Taxonomy/Ontology Processing

Custom Variables

Homographic Resolution

Acronym Resolution

Negation Analysis

Numeric Tagging

Date Tagging

Date Standardization

List Processing

Associative Word Processing

Stop Word Processing

Word Stemming

Document Metadata

Document Classification

Proximity Analysis

Functional Sequencing Within Textual ETL

Internal Referential Integrity

Preprocessing, Postprocessing

Chapter 10.2: Mapping


Chapter 10.3: Analytics From Nonrepetitive Data


Call Center Information

Medical Records

Chapter 11.1: Operational Analytics: Response Time


Transaction Response Time

Chapter 12.1: Operational Analytics


Different Perspectives of Data

Data Marts

The Operational Data Store—ODS

Chapter 13.1: Personal Analytics


Chapter 14.1: Data Models Across the End-State Architecture


The Different Data Models

Functional Decomposition and Data Flow Diagrams

The Corporate Data Model

The Star Join/Dimensional Data Model


The Selective Subdivision of Data

Proactive/Reactive Data Models

Chapter 15.1: The System of Record


The End User Cycle of Awareness

The System of Record

The System of Record in the End State Architecture

The Role of Age in the System of Record

A Simple Example

The Flow of Data in the System of Record

Other Data Than the System of Record

Is Data Updated in the System of Record?

Detailed and Summary Data in the System of Record

Auditing Data and the System of Record

Text and the System of Record

Chapter 16.1: Business Value and the End-State Architecture


The Evolution of the End State Architecture

What Is Meant by “Business Value”

Tactical Business Value/Strategic Business Value

Volume of Data Versus Business Value

The “Million in One” Syndrome

Where Business Value Occurs

Data Relevancy Over Time

Where Tactical Decisions Are Made

Chapter 17.1: Managing Text


The Challenge of Text

The Challenge of Context

The Processing Components of Textual ETL

Secondary Analysis


Merging Text Based Data and Structured Data

Chapter 18.1: An Introduction to Data Visualizations


Introduction to Data Visualizations—Overview

Purpose and Context

Visualization—A Science and an Art

Visualization Framework

Step 1: Define

Step 2: Data

Step 3: Design

Step 4: Distribute

Data Visualization Tools and Software



