Table of Contents

Cover image

Title page

Copyright

Dedication

Chapter 1.1: An Introduction to Data Architecture

Abstract

Subdividing Data

Repetitive/Nonrepetitive Unstructured Data

The Great Divide of Data

Textual/Nontextual Data

The Different Forms of Data

Business Value

Chapter 1.2: The Data Infrastructure

Abstract

Two Types of Repetitive Data

Repetitive Structured Data

Repetitive Big Data

The Two Infrastructures

What's Being Optimized?

Comparing the Two Infrastructures

Chapter 1.3: The “Great Divide”

Abstract

Classifying Corporate Data

The “Great Divide”

Repetitive Unstructured Data

Nonrepetitive Unstructured Data

Different Worlds

Chapter 1.4: Demographics of Corporate Data

Abstract

Chapter 1.5: Corporate Data Analysis

Abstract

Chapter 1.6: The Life Cycle of Data: Understanding Data Over Time

Abstract

Chapter 1.7: A Brief History of Data

Abstract

Paper Tape and Punch Cards

Magnetic Tapes

Disk Storage

Data Base Management System (DBMS)

Coupled Processors

Online Transaction Processing

Data Warehouse

Parallel Data Management

Data Vault

Big Data

The Great Divide

Chapter 2.1: The End-State Architecture—The “World Map”

Abstract

Architectural Components

Different Kinds of Data in the End State Architecture

Shaping the Data Through Models

Where Is the Data Warehouse?

Where Different Types of Questions Are Answered Across the End State Architecture

Data in the Data Lake

Metadata in the End State Architecture

Networked Metadata

An Evolutionary Experience

The Data Lake Architecture

Chapter 3.1: Transformations in the End-State Architecture

Abstract

Redundant Data

Transformations

Customizing Data

Transforming Text

Transforming Application Data

Transforming Data Into a Customized State

Transforming Data Into Bulk Storage

Transforming Data Generated Automatically

Transforming Bulk Data

Transformation and Redundancy

Chapter 4.1: A Brief History of Big Data

Abstract

An Analogy—Taking the High Ground

Taking the High Ground

Standardization With the 360

Online Transaction Processing

Enter Teradata and MPP Processing

Then Came Hadoop and Big Data

IBM and Hadoop

Holding the High Ground

Chapter 4.2: What Is Big Data?

Abstract

Another Definition

Large Volumes

Inexpensive Storage

The Roman Census Approach

Unstructured Data

Data in Big Data

Context in Repetitive Data

Nonrepetitive Data

Context in Nonrepetitive Data

Chapter 4.3: Parallel Processing

Abstract

Chapter 4.4: Unstructured Data

Abstract

Textual Information—Everywhere

Decisions Based on Structured Data

The Business Value Proposition

Repetitive and Nonrepetitive Unstructured Information

Ease of Analysis

Contextualization

Some Approaches to Contextualization

Map Reduce

Manual Analysis

Chapter 4.5: Contextualizing Repetitive Unstructured Data

Abstract

Parsing Repetitive Unstructured Data

Recasting the Output Data

Chapter 4.6: Textual Disambiguation

Abstract

From Narrative Into an Analytical Data Base

Input Into Textual Disambiguation

Mapping

Input/Output

Document Fracturing/Named Value Processing

Preprocessing a Document

E-mails—A Special Case

Spreadsheets

Report Decompilation

Chapter 4.7: Taxonomies

Abstract

Data Models/Taxonomies

Applicability of Taxonomies

What Is a Taxonomy?

Taxonomies in Multiple Languages

Commercial or Private Taxonomies?

Dynamics of Taxonomies and Textual Disambiguation

Taxonomies and Textual Disambiguation—Separate Technologies

Different Types of Taxonomies

Taxonomies—Maintenance Over Time

Chapter 5.1: The Siloed Application Environment

Abstract

The Challenge of Siloed Applications

Building Siloed Applications

What Does a Siloed Application Look Like?

Current Valued Data

Minimal Historical Data

High Availability

Overlap Between Siloed Applications

Frozen Business Requirements

Dismantling Siloed Applications

Chapter 6.1: Introduction to Data Vault 2.0

Abstract

Data Vault Origins and Background

What Is Data Vault 2.0 Modeling?

How Is Data Vault 2.0 Methodology Defined?

Why Do We Need a Data Vault 2.0 Architecture?

Where Does Data Vault 2.0 Implementation Fit?

What Are the Business Benefits of Data Vault 2.0?

What Is Data Vault 1.0?

Chapter 6.2: Introduction to Data Vault Modeling

Abstract

What Is a Data Vault Model Concept?

Data Vault Model Defined

Components of a Data Vault Model

What Makes Business Keys So Interesting?

What Does This Have to Do With Data Vault and Data Warehousing?

How Does This Translate to Data Vault Modeling?

Why Restructure the Data From the Staging Area?

What Are the Basic Rules of the Data Vault Model?

Why Do We Need Many to Many Link Structures?

Primary Key Options for Data Vault 2.0

Chapter 6.3: Introduction to Data Vault Architecture

Abstract

What Is a Data Vault 2.0 Architecture?

How Does NoSQL Fit in to the Architecture?

What Are the Objectives of the Data Vault 2.0 Architecture?

What Is the Objective of the Data Vault 2.0 Model?

What Are Hard and Soft Business Rules?

How Does Managed Self Service BI Fit in the Architecture?

Chapter 6.4: Introduction to Data Vault Methodology

Abstract

Data Vault 2.0 Methodology Overview

How Does CMMI Contribute to the Methodology?

If CMMI Is So Great, Why Should We Care About Agility Then?

Why Include PMP, SDLC If CMMI and Agile Should Be All That's Needed?

So Then, What Does Six Sigma Contribute to the Data Vault 2.0 Methodology?

Where Does TQM (Total Quality Management) Fit in to All of This?

Chapter 6.5: Introduction to Data Vault Implementation

Abstract

Implementation Overview

What's So Important About Patterns?

Why Does Reengineering Happen Because of Big Data?

Why Do We Need to Virtualize Our Data Marts?

What Is Managed Self-Service BI?

Chapter 7.1: The Operational Environment: A Short History

Abstract

Commercial Uses of the Computer

The First Applications

Ed Yourdon and the Structured Revolution

The SDLC

Disk Technology

Enter the DBMS

Response Time and Availability

Corporate Computing Today

Chapter 7.2: The Standard Work Unit

Abstract

Elements of Response Time

An Hourglass Analogy

The Racetrack Analogy

Your Vehicle Runs as Fast as the Vehicle in Front of It

The Standard Work Unit

The SLA

Chapter 7.3: Data Modeling for the Structured Environment

Abstract

The Purpose of the Roadmap

Granular Data Only

The ERD

The DIS

Physical Data Base Design

Relating the Different Levels of the Data Model

An Example of the Linkage

Generic Data Models

Operational Data Models/Data Warehouse Data Models

Chapter 8.1: A Brief History of Data Architecture

Abstract

Chapter 8.2: Big Data/Existing System Interface

Abstract

The Big Data/Existing Systems Interface

The Repetitive Raw Big Data/Existing Systems Interface

Exception Based Data

The Nonrepetitive Raw Big Data/Existing Systems Interface

Into the Existing Systems Environment

The “Context Enriched” Big Data Environment

Analyzing Structured Data/Unstructured Data Together

Chapter 8.3: The Data Warehouse/Operational Environment Interface

Abstract

The Operational/Data Warehouse Interface

The Classical ETL Interface

The ODS and the ETL Interface

The Staging Area

Changed Data Capture

Inline Transformation

ELT Processing

Chapter 8.4: Data Architecture: A High-Level Perspective

Abstract

A High Level Perspective

Redundancy

The System of Record

Different Types of Questions

Different Communities

Chapter 9.1: Repetitive Analytics: Some Basics

Abstract

Different Kinds of Analysis

Looking for Patterns

Heuristic Processing

Freezing Data

The Sandbox

The “Normal” Profile

Distillation, Filtering

Subsetting Data

Bias of the Sample

Filtering Data

Repetitive Data and Context

Linking Repetitive Records

Log Tape Records

Analyzing Points of Data

Outliers

Data Over Time

Chapter 9.2: Analyzing Repetitive Data

Abstract

Log Data

Active/Passive Indexing of Data

Summary/Detailed Data

Metadata in Big Data

Linking Data

Chapter 9.3: Repetitive Analysis

Abstract

Internal, External Data

Universal Identifiers

Security

Filtering, Distillation

Archiving Results

Metrics

Chapter 10.1: Nonrepetitive Data

Abstract

Inline Contextualization

Taxonomy/Ontology Processing

Custom Variables

Homographic Resolution

Acronym Resolution

Negation Analysis

Numeric Tagging

Date Tagging

Date Standardization

List Processing

Associative Word Processing

Stop Word Processing

Word Stemming

Document Metadata

Document Classification

Proximity Analysis

Functional Sequencing Within Textual ETL

Internal Referential Integrity

Preprocessing, Postprocessing

Chapter 10.2: Mapping

Abstract

Chapter 10.3: Analytics From Nonrepetitive Data

Abstract

Call Center Information

Medical Records

Chapter 11.1: Operational Analytics: Response Time

Abstract

Transaction Response Time

Chapter 12.1: Operational Analytics

Abstract

Different Perspectives of Data

Data Marts

The Operational Data Store—ODS

Chapter 13.1: Personal Analytics

Abstract

Chapter 14.1: Data Models Across the End-State Architecture

Abstract

The Different Data Models

Functional Decomposition and Data Flow Diagrams

The Corporate Data Model

The Star Join/Dimensional Data Model

Taxonomies/Ontologies

The Selective Subdivision of Data

Proactive/Reactive Data Models

Chapter 15.1: The System of Record

Abstract

The End User Cycle of Awareness

The System of Record

The System of Record in the End State Architecture

The Role of Age in the System of Record

A Simple Example

The Flow of Data in the System of Record

Other Data Than the System of Record

Is Data Updated in the System of Record?

Detailed and Summary Data in the System of Record

Auditing Data and the System of Record

Text and the System of Record

Chapter 16.1: Business Value and the End-State Architecture

Abstract

The Evolution of the End State Architecture

What Is Meant by “Business Value”

Tactical Business Value/Strategic Business Value

Volume of Data Versus Business Value

The “Million in One” Syndrome

Where Business Value Occurs

Data Relevancy Over Time

Where Tactical Decisions Are Made

Chapter 17.1: Managing Text

Abstract

The Challenge of Text

The Challenge of Context

The Processing Components of Textual ETL

Secondary Analysis

Visualization

Merging Text Based Data and Structured Data

Chapter 18.1: An Introduction to Data Visualizations

Abstract

Introduction to Data Visualizations—Overview

Purpose and Context

Visualization—A Science and an Art

Visualization Framework

Step 1: Define

Step 2: Data

Step 3: Design

Step 4: Distribute

Data Visualization Tools and Software

Summary

Glossary

Index
