Data Architecture: A Primer for the Data Scientist, 2nd Edition
by Mary Levins, Daniel Linstedt, W.H. Inmon
Cover image
Title page
Table of Contents
Copyright
Dedication
Chapter 1.1: An Introduction to Data Architecture
Abstract
Subdividing Data
Repetitive/Nonrepetitive Unstructured Data
The Great Divide of Data
Textual/Nontextual Data
The Different Forms of Data
Business Value
Chapter 1.2: The Data Infrastructure
Abstract
Two Types of Repetitive Data
Repetitive Structured Data
Repetitive Big Data
The Two Infrastructures
What's Being Optimized?
Comparing the Two Infrastructures
Chapter 1.3: The “Great Divide”
Abstract
Classifying Corporate Data
The “Great Divide”
Repetitive Unstructured Data
Nonrepetitive Unstructured Data
Different Worlds
Chapter 1.4: Demographics of Corporate Data
Abstract
Chapter 1.5: Corporate Data Analysis
Abstract
Chapter 1.6: The Life Cycle of Data: Understanding Data Over Time
Abstract
Chapter 1.7: A Brief History of Data
Abstract
Paper Tape and Punch Cards
Magnetic Tapes
Disk Storage
Data Base Management System (DBMS)
Coupled Processors
Online Transaction Processing
Data Warehouse
Parallel Data Management
Data Vault
Big Data
The Great Divide
Chapter 2.1: The End-State Architecture—The “World Map”
Abstract
Architectural Components
Different Kinds of Data in the End State Architecture
Shaping the Data Through Models
Where Is the Data Warehouse?
Where Different Types of Questions Are Answered Across the End State Architecture
Data in the Data Lake
Metadata in the End State Architecture
Networked Metadata
An Evolutionary Experience
The Data Lake Architecture
Chapter 3.1: Transformations in the End-State Architecture
Abstract
Redundant Data
Transformations
Customizing Data
Transforming Text
Transforming Application Data
Transforming Data Into a Customized State
Transforming Data Into Bulk Storage
Transforming Data Generated Automatically
Transforming Bulk Data
Transformation and Redundancy
Chapter 4.1: A Brief History of Big Data
Abstract
An Analogy—Taking the High Ground
Taking the High Ground
Standardization With the 360
Online Transaction Processing
Enter Teradata and MPP Processing
Then Came Hadoop and Big Data
IBM and Hadoop
Holding the High Ground
Chapter 4.2: What Is Big Data?
Abstract
Another Definition
Large Volumes
Inexpensive Storage
The Roman Census Approach
Unstructured Data
Data in Big Data
Context in Repetitive Data
Nonrepetitive Data
Context in Nonrepetitive Data
Chapter 4.3: Parallel Processing
Abstract
Chapter 4.4: Unstructured Data
Abstract
Textual Information—Everywhere
Decisions Based on Structured Data
The Business Value Proposition
Repetitive and Nonrepetitive Unstructured Information
Ease of Analysis
Contextualization
Some Approaches to Contextualization
Map Reduce
Manual Analysis
Chapter 4.5: Contextualizing Repetitive Unstructured Data
Abstract
Parsing Repetitive Unstructured Data
Recasting the Output Data
Chapter 4.6: Textual Disambiguation
Abstract
From Narrative Into an Analytical Data Base
Input Into Textual Disambiguation
Mapping
Input/Output
Document Fracturing/Named Value Processing
Preprocessing a Document
E-mails—A Special Case
Spreadsheets
Report Decompilation
Chapter 4.7: Taxonomies
Abstract
Data Models/Taxonomies
Applicability of Taxonomies
What Is a Taxonomy?
Taxonomies in Multiple Languages
Commercial or Private Taxonomies?
Dynamics of Taxonomies and Textual Disambiguation
Taxonomies and Textual Disambiguation—Separate Technologies
Different Types of Taxonomies
Taxonomies—Maintenance Over Time
Chapter 5.1: The Siloed Application Environment
Abstract
The Challenge of Siloed Applications
Building Siloed Applications
What Does a Siloed Application Look Like?
Current Valued Data
Minimal Historical Data
High Availability
Overlap Between Siloed Applications
Frozen Business Requirements
Dismantling Siloed Applications
Chapter 6.1: Introduction to Data Vault 2.0
Abstract
Data Vault Origins and Background
What Is Data Vault 2.0 Modeling?
How Is Data Vault 2.0 Methodology Defined?
Why Do We Need a Data Vault 2.0 Architecture?
Where Does Data Vault 2.0 Implementation Fit?
What Are the Business Benefits of Data Vault 2.0?
What Is Data Vault 1.0?
Chapter 6.2: Introduction to Data Vault Modeling
Abstract
What Is a Data Vault Model Concept?
Data Vault Model Defined
Components of a Data Vault Model
What Makes Business Keys So Interesting?
What Does This Have to Do With Data Vault and Data Warehousing?
How Does This Translate to Data Vault Modeling?
Why Restructure the Data From the Staging Area?
What Are the Basic Rules of the Data Vault Model?
Why Do We Need Many to Many Link Structures?
Primary Key Options for Data Vault 2.0
Chapter 6.3: Introduction to Data Vault Architecture
Abstract
What Is a Data Vault 2.0 Architecture?
How Does NoSQL Fit in to the Architecture?
What Are the Objectives of the Data Vault 2.0 Architecture?
What Is the Objective of the Data Vault 2.0 Model?
What Are Hard and Soft Business Rules?
How Does Managed Self Service BI Fit in the Architecture?
Chapter 6.4: Introduction to Data Vault Methodology
Abstract
Data Vault 2.0 Methodology Overview
How Does CMMI Contribute to the Methodology?
If CMMI Is So Great, Why Should We Care About Agility Then?
Why Include PMP, SDLC If CMMI and Agile Should Be All That's Needed?
So Then, What Does Six Sigma Contribute to the Data Vault 2 Methodology?
Where Does TQM (Total Quality Management) Fit in to All of This?
Chapter 6.5: Introduction to Data Vault Implementation
Abstract
Implementation Overview
What's So Important About Patterns?
Why Does Reengineering Happen Because of Big Data?
Why Do We Need to Virtualize Our Data Marts?
What Is Managed Self-Service BI?
Chapter 7.1: The Operational Environment: A Short History
Abstract
Commercial Uses of the Computer
The First Applications
Ed Yourdon and the Structured Revolution
The SDLC
Disk Technology
Enter the DBMS
Response Time and Availability
Corporate Computing Today
Chapter 7.2: The Standard Work Unit
Abstract
Elements of Response Time
An Hourglass Analogy
The Racetrack Analogy
Your Vehicle Runs as Fast as the Vehicle in Front of It
The Standard Work Unit
The SLA
Chapter 7.3: Data Modeling for the Structured Environment
Abstract
The Purpose of the Roadmap
Granular Data Only
The ERD
The DIS
Physical Data Base Design
Relating the Different Levels of the Data Model
An Example of the Linkage
Generic Data Models
Operational Data Models/Data Warehouse Data Models
Chapter 8.1: A Brief History of Data Architecture
Abstract
Chapter 8.2: Big Data/Existing System Interface
Abstract
The Big Data/Existing Systems Interface
The Repetitive Raw Big Data/Existing Systems Interface
Exception Based Data
The Nonrepetitive Raw Big Data/Existing Systems Interface
Into the Existing Systems Environment
The “Context Enriched” Big Data Environment
Analyzing Structured Data/Unstructured Data Together
Chapter 8.3: The Data Warehouse/Operational Environment Interface
Abstract
The Operational/Data Warehouse Interface
The Classical ETL Interface
The ODS and the ETL Interface
The Staging Area
Changed Data Capture
Inline Transformation
ELT Processing
Chapter 8.4: Data Architecture: A High-Level Perspective
Abstract
A High Level Perspective
Redundancy
The System of Record
Different Types of Questions
Different Communities
Chapter 9.1: Repetitive Analytics: Some Basics
Abstract
Different Kinds of Analysis
Looking for Patterns
Heuristic Processing
Freezing Data
The Sandbox
The “Normal” Profile
Distillation, Filtering
Subsetting Data
Bias of the Sample
Filtering Data
Repetitive Data and Context
Linking Repetitive Records
Log Tape Records
Analyzing Points of Data
Outliers
Data Over Time
Chapter 9.2: Analyzing Repetitive Data
Abstract
Log Data
Active/Passive Indexing of Data
Summary/Detailed Data
Metadata in Big Data
Linking Data
Chapter 9.3: Repetitive Analysis
Abstract
Internal, External Data
Universal Identifiers
Security
Filtering, Distillation
Archiving Results
Metrics
Chapter 10.1: Nonrepetitive Data
Abstract
Inline Contextualization
Taxonomy/Ontology Processing
Custom Variables
Homographic Resolution
Acronym Resolution
Negation Analysis
Numeric Tagging
Date Tagging
Date Standardization
List Processing
Associative Word Processing
Stop Word Processing
Word Stemming
Document Metadata
Document Classification
Proximity Analysis
Functional Sequencing Within Textual ETL
Internal Referential Integrity
Preprocessing, Postprocessing
Chapter 10.2: Mapping
Abstract
Chapter 10.3: Analytics From Nonrepetitive Data
Abstract
Call Center Information
Medical Records
Chapter 11.1: Operational Analytics: Response Time
Abstract
Transaction Response Time
Chapter 12.1: Operational Analytics
Abstract
Different Perspectives of Data
Data Marts
The Operational Data Store—ODS
Chapter 13.1: Personal Analytics
Abstract
Chapter 14.1: Data Models Across the End-State Architecture
Abstract
The Different Data Models
Functional Decomposition and Data Flow Diagrams
The Corporate Data Model
The Star Join/Dimensional Data Model
Taxonomies/Ontologies
The Selective Subdivision of Data
Proactive/Reactive Data Models
Chapter 15.1: The System of Record
Abstract
The End User Cycle of Awareness
The System of Record
The System of Record in the End State Architecture
The Role of Age in the System of Record
A Simple Example
The Flow of Data in the System of Record
Other Data Than the System of Record
Is Data Updated in the System of Record?
Detailed and Summary Data in the System of Record
Auditing Data and the System of Record
Text and the System of Record
Chapter 16.1: Business Value and the End-State Architecture
Abstract
The Evolution of the End State Architecture
What Is Meant by “Business Value”
Tactical Business Value/Strategic Business Value
Volume of Data Versus Business Value
The “Million in One” Syndrome
Where Business Value Occurs
Data Relevancy Over Time
Where Tactical Decisions Are Made
Chapter 17.1: Managing Text
Abstract
The Challenge of Text
The Challenge of Context
The Processing Components of Textual ETL
Secondary Analysis
Visualization
Merging Text Based Data and Structured Data
Chapter 18.1: An Introduction to Data Visualizations
Abstract
Introduction to Data Visualizations—Overview
Purpose and Context
Visualization—A Science and an Art
Visualization Framework
Step 1: Define
Step 2: Data
Step 3: Design
Step 4: Distribute
Data Visualization Tools and Software
Summary
Glossary
Index