Table of Contents for
Data Lake Architecture: Designing the Data Lake and Avoiding the Garbage Dump
by Bill Inmon
Introduction
Chapter 1 Data Lakes
Enter Big Data
Enter the Data Lake
“One Way” Data Lake
In Summary
Chapter 2 Transforming the Data Lake
Metadata
Integration Mapping
Context
Metaprocess
Data Scientist
General Usability
In Summary
Chapter 3 Inside the Data Lake
Analog Data
Application Data
Textual Data
Another Perspective
In Summary
Chapter 4 Data Ponds
Conditioning Data
Raw Data Pond
Analog Data Pond
Application Data Pond
Textual Data Pond
Data Passing Directly Into the Data Ponds
Archival Data Pond
In Summary
Chapter 5 Generic Structure of the Data Pond
Pond Descriptor
Pond Target
Pond Data
Pond Metadata
Pond Metaprocess
Pond Transformation Criteria
In Summary
Chapter 6 Analog Data Pond
Analog Data Issues
Data Descriptor
Capturing Raw Data/Transforming Raw Data
Transforming/Conditioning Raw Analog Data
Data Excision
Clustering Data
Data Relationships
Probability of Future Usage
Outliers
Specialized Ad Hoc Analysis
In Summary
Chapter 7 Application Data Pond
DNA of Data
Descriptors
Standard Database Format
Basic Organization of Data
Integration of Data
Data Model
Necessity of Integration
Pointing From One Application to the Next
Intersecting Applications
Subsets of Data in the Application Data Pond
In Summary
Chapter 8 Textual Data Pond
Uniform Data and the Computer
Valuable Text
Textual Disambiguation
Text Sent to the Data Pond
Output of Textual Disambiguation
Inherent Complexity
Textual Disambiguation Functionality
Taxonomies and Ontologies
Value of Text and Context
Tracing Text Back to the Source
Mechanics of Disambiguation
Analyzing the Database
Visualizing the Results
In Summary
Chapter 9 Comparing the Ponds
Similarities Across the Data Ponds
Dissimilarities Across the Data Ponds
Relational Format for Final State Data
Technology Differences
Total Expected Volume of Data in the Data Pond
Moving Data From Pond to Pond
Doing Analysis From Multiple Ponds
Using Metadata to Relate Data From Different Ponds
What if…?
In Summary
Chapter 10 Using the Infrastructure
“One Way” Data Lake
Transforming the Data Lake
Transformation Technology
Some Analytical Questions
Querying Textual Data
Real Analysis
In Summary
Chapter 11 Search and Analysis
Confusion Spread by the Vendors
In Summary
Chapter 12 Business Value in the Data Ponds
Business Value in the Analog Data Pond
Business Value in the Application Data Pond
Business Value in the Textual Data Pond
Percent of Records That Have Business Value
In Summary
Chapter 13 Additional Topics
High System Level Documentation
Detailed Data Pond Level Documentation
What Data Flows Into the Data Lake/Data Pond?
Where Does Analysis Occur?
The Age of Data
Security of Data
In Summary
Chapter 14 Analytical and Integration Tools
Visualization
Search and Qualify
Textual Disambiguation
Statistical Analysis
Classical ETL Processing
In Summary
Chapter 15 Archiving Data Ponds
Criteria for Removal
Structural Alteration
Creating Independent Indexes for Archival Data
In Summary
Glossary
References
Index
Designing the Data Lake and Avoiding the Garbage Dump
first edition
Bill Inmon