Contents

Chapter 1 – The Aster Data Architecture

What is Parallel Processing?

Aster Data is a Parallel Processing System

Each vworker holds a Portion of Every Table

The Rows of a Table are Spread Across All vworkers

The Aster Data Architecture

The Queen Node

The Worker Node

The Loader Node

The Backup Node

The Aster Architecture Interconnect

Backup and Loader Nodes Do Not use the Interconnect

The Aster Architecture has Spare Nodes

The Aster Architecture Allows Flexibility based on Need

Aster Data Provides Four Fundamental Hardware Strengths

Replication Failover

Data is Compressed on Data Transfers

Aster Utilizes Dual Optimizers

Aster Allows a Hybrid of SQL and MapReduce

MapReduce History

What is MapReduce?

What is SQL-MR?

Sessionize – An Example of SQL-MR

Support for Mixed Workload Management and Prioritization

Chapter 2 – Fact and Dimension Tables

Aster Tables are defined as Fact or Dimension when Created

Fact Table

A More Detailed Look at the Fact Table Distribution

Dimension Tables are Replicated

A Dimension Table is often Replicated across vworkers

Aster Data has Fact and Dimension Tables

Aster Tables are defined as Fact or Dimension when Created

Fact and Dimension Tables can be Hashed by the same Key

Distribution Key Rules

Aster Data Uses a Hash Formula

The Hash Map Determines which vworker will own the Row

The Hash Formula, Hash Map and vworker

Placing rows on the vworker

Placing rows on the vworker Continued

A Review of the Hashing Process

Like Data Hashes to the Same vworker

Distribution Key Data Types

Run ANALYZE to COLLECT STATISTICS on a Table

Some Examples of ANALYZE

What Columns to Analyze

Chapter 3 – How Aster Processes Data

When a Table is Created, a Table Header is Created

Every vworker has the Exact Same Tables

All Aster Tables are spread across All vworkers

The Table Header and the Data Rows are Stored Separately

A vworker Stores the Rows of a Table inside a Data Block

To Read Rows, a vworker Moves the Data Block into Memory

A Full Table Scan Means All vworkers must Read All Rows

The “Achilles Heel”, or Slowest Process, is Block Transfer

Each Table has a Distribution Key

A Query Using the Distribution Key uses a Single vworker

As Rows are Added, a Data Block will Eventually Split

A Full Table Scan Means All vworkers Read All Blocks

Distribution Key Query uses One vworker

Each vworker Can Have Many Blocks for a Single Table

A Full Table Scan Means All vworkers Read All Blocks

Quiz – How Many Blocks Move into vworker Memory?

Answer – How Many Blocks Move into vworker Memory?

Quiz – How Many Blocks Move Using the Distribution Key?

Answer – How Many Blocks Move Using the Distribution Key?

Chapter 4 – Four Options for Aster Data Table Design

There are Four Options to Aster Table Design

Straight up Distribute by Hash

Straight up Distribute by Hash - Problems

Straight up Distribute by Replication

Partition the Table with Logical Partitioning

This Partitioned Table Sorts Rows by Month of Order_Date

An All vworkers Retrieve By Way of a Single Partition

You can Partition a Table by Range or by List

A Partitioned By List Example with Three Tactical Queries

Aster Data Multi-Level Partitioning

Aster Allows for Multi-Level Partitioning

SQL Commands for Logical Partitioning as One Table

What Partitions are on my Table?

What does a Columnar Table look like?

A Comparison of Data for Normal Vs. Columnar

A Columnar Table is best for Queries with Few Columns

Quiz – How Many Blocks Move to vworker Memory?

Answer – How Many Containers Move to vworker Memory?

When to use a Columnar Table

Chapter 5 – How Joins Work Inside the Aster Engine

Aster Join Quiz

Aster Join Quiz Answer

The Joining of Two Tables

Aster Moves Joining Rows to the Same vworker

Because of the Join Rule – Dimension Tables are Replicated

The Two Different Philosophies for Table Join Design

What Could You Do If Two Tables Joined 1000 Times a Day?

Fact and Dimension Tables can be Hashed by the same Key

Joining Two Tables with the same PK/FK Distribution Key

A Join With Co-Location

A Performance Tuning Technique for Large Joins

The Joining of Two Tables with an Additional WHERE Clause

Aster Performs Joins Using Three Different Methods

The Hash Join

The Merge Join

Nested Loop Joins

Chapter 6 – Temporary and Analytic Tables

Aster has Three Types of Data

Create a Permanent Table Using Create Table AS (CTAS)

Create a Logically Partitioned Table and Populate It

Create a Temporary Table Using Create Table AS (CTAS)

A Temporary Table in Action

A Temporary Table That Uses an Insert/Select

Create an Analytic Table Using an Insert/Select

Create an Analytic Table Using CREATE TABLE AS (CTAS)

Operations that Invalidate an Analytic Table

If an Analytic Table is Invalid

Tera-Tom History

Chapter 7 – Aster Modeling Rules

Modeling Rules for Aster Data

Three Principles that Govern the Modeling Rules

Modeling Rule 1 – Dimensionalize your Model

A Dimensional Model is called a "Star Schema"

To Read a Data Block, a vworker Moves the Block to Memory

A Dimensional Model Moves Less Mass into Memory

Which Move From Disk to Memory Would You Choose?

Vworkers transfer their Fact Table into Memory in Parallel

Modeling Rule 2 – Use Columnar

Which Move From Disk to Memory Would You Choose?

Let's Discuss Modeling and Joins at the Simplest Level

Let's Discuss Modeling and Joins at the Simplest Level

Let's Discuss Joins at the Simplest Level

Modeling Rule 3 – Distribute your Tables Based on Joins

The Two Different Philosophies for Table Join Design

Facts are Hashed and most often the Dimension is Replicated

Fact and Dimension Tables can be Hashed by the same Key

Joining Two Tables with the same PK/FK Primary Index

A Join With No Redistribution or Duplication

Aster Hates Joining Tables with a Different Distribution Key

Aster Hates to Redistribute by Hash to Join Tables

Modeling Rule 4 – Replicate Dimension Tables

Modeling Rule 5 – Partition Your Tables

Modeling Rule 6 – Make Fact Tables Skinny

Modeling Rule 6 – Make Fact Tables Skinny Example

Modeling Rule 7 – Index Your Tables

The B-Tree Index

Which Columns Might You Create an Index?

Answer - Which Columns Might You Create an Index?

Modeling Rule 8 – Denormalize based on Your Environment

Modeling Rule 8 – Denormalize based on Your Environment

Chapter 8 – Tera-Tom's Top Tips

Tera-Tom's Top Tips

Tera-Tom's Top Tips #2

Tera-Tom's Top Tips #3

Tera-Tom's Top Tips #3 Rewritten

Tera-Tom's Top Tips #4

When the GROUP BY Column is NOT the Distribution Key

Example of GROUP BY Column is NOT the Distribution Key

Tera-Tom's Top Tips #5

Tera-Tom's Top Tips #6 – Use EXPLAIN

Query Plan and Estimates

Explain Plan Showing a Hash Join

Explain Plan Showing a Merge Join

Explain Plan Showing a Nested Loop Join

Chapter 9 – Indexes

There are Only Three Types of Scans

Guidelines for Indexes

An Index Syntax Example

The B-Tree Index

Which Columns Might You Create an Index?

Answer - Which Columns Might You Create an Index?

A Visual of an Index (Conceptually)

A Query Using an Index Uses All vworkers

Multicolumn indexes

A NUSI BITMAP Theory

A NUSI Bitmap in Action

Indexes on Expressions

Indexes on Extracts of Dates

GiST Indexes

Five Operational Tips for Efficient Indexing

REINDEX

createCompressedIndexOnCompressedTableByDefault Flag

Chapter 10 – Aster Window Functions

Cumulative Sum

Cumulative Sum - Major and Minor Sort Key(s)

The ANSI CSUM – Getting a Sequential Number

The ANSI OLAP – Reset with a PARTITION BY Statement

PARTITION BY only Resets a Single OLAP not ALL of them

ANSI Moving Sum is Current Row and Preceding n Rows

How ANSI Moving SUM Handles the Sort

Quiz – How is that Total Calculated?

Answer to Quiz – How is that Total Calculated?

Moving SUM every 3-rows vs. a Continuous Sum

Moving Average

Quiz – How is that Total Calculated?

Answer to Quiz – How is that Total Calculated?

Quiz – How is that 4th Row Calculated?

Answer to Quiz – How is that 4th Row Calculated?

Partition By Resets an ANSI OLAP

Moving Average Using BETWEEN

Moving Difference using ANSI Syntax

Moving Difference using ANSI Syntax with Partition By

RANK Defaults to Ascending Order

Getting RANK to Sort in DESC Order

You can use Window Functions in Expressions

RANK() OVER and PARTITION BY

DENSE_RANK() OVER

PERCENT_RANK() OVER

PERCENT_RANK() OVER with 14 rows in Calculation

PERCENT_RANK() OVER with 21 rows in Calculation

RANK With ORDER BY SUM()

COUNT OVER for a Sequential Number

Quiz – What caused the COUNT OVER to Reset?

Answer to Quiz – What caused the COUNT OVER to Reset?

The MAX OVER Command

MAX OVER with PARTITION BY Reset

The MIN OVER Command

Quiz – Fill in the Blank

Answer to Quiz – Fill in the Blank

The Row_Number Command

Quiz – How did the Row_Number Reset?

Answer to Quiz – How did the Row_Number Reset?

NTILE

NTILE Using a Value of 10

NTILE With a Partition

CUME_DIST

CUME_DIST With a Partition

LEAD

LEAD With Partitioning

LAG

LAG with Partitioning

FIRST_VALUE

FIRST_VALUE After Sorting by the Highest Value

FIRST_VALUE with Partitioning

LAST_VALUE

NTH_VALUE

NTH_VALUE With Partition

SUM(SUM(n))

Chapter 11 – SQL-MapReduce

MapReduce History

What is MapReduce?

What is SQL-MapReduce?

SQL-MapReduce Input

SQL-MapReduce Output

Subtle SQL-MapReduce Processing

Aster Data Provides an Analytic Foundation

Path Analysis

Text Analysis

Statistical Analysis

Segmentation (Data Mining)

Graph Analysis

Transformation of Data

Sessionize

Tokenize

SQL-MapReduce Function . . . nPath

nPath SELECT Clause

nPath ON Clause

nPath PARTITION BY Expression

nPath DIMENSION Expression

nPath ORDER BY Expression

nPath MODE Clause has Overlapping or NonOverlapping

nPath PATTERN Clause

Pattern Operators

Pattern Operators Order of Precedence

Matching Patterns Which Repeat

nPath SYMBOLS Clause

nPath RESULTS Clause

Adding an Aggregate to nPath Results

Adding an Aggregate to nPath Results (Continued)

SQL-MapReduce Examples - Use Regular SQL

SQL-MapReduce Examples - Create Objects

SQL-MapReduce Examples - Subquery

SQL-MapReduce Examples - Query as Input

SQL-MapReduce Examples - Nesting Functions

SQL-MapReduce Examples - Functions in Derived Tables

SQL-MapReduce Examples - SMAVG

SQL-MapReduce Examples - Pack Function

SQL-MapReduce Examples - Pack Function (Continued)

SQL-MapReduce Examples - Pivot Columns
