Chapter 3

Hashing of the Primary Index

“The true index of a man's character is the health of his wife.”

– Cyril Connolly

Table of contents Chapter 3 – Hashing of the Primary Index

The Hashing Formula Facts

The Hash Map Determines which AMP will own the Row

The Hash Map Determines which AMP will own the Row Continued

Placing rows on the AMP

Placing rows on the AMP Continued

A Review of the Hashing Process

Non-Unique Primary Indexes have Skewed Data

The Uniqueness Value

The Row Hash and Uniqueness Value make up the Row-ID

A Row-ID Example for a Unique Primary Index

A Row-ID Example for a Non-Unique Primary Index(NUPI)

Two Reasons why each AMP Sorts their rows by the Row-ID

AMPs sort their rows by Row-ID to Group Like Data

AMPs sort their rows by Row-ID to do a Binary Search

Table CREATE Examples with four different Primary Indexes

Null Values all Hash to the Same AMP

A Unique Primary Index (UPI) Example

A Non-Unique Primary Index (NUPI) Example

A Multi-Column Primary Index Example

A No Primary Index (NoPI) Example

Watch the Video on the Hashing the Data

The Hashing Formula Facts

images

images There is only one Hashing Formula.

images A rows Primary Index value is hashed and the output is its Row Hash.

images The Row Hash will be stored with the row (on disk) as part of the Row_ID.

images If the Hashing Formula hashes value 1001 and gets a row hash of 13 then it will produce a 13 every time it hashes a 1001 value. Its consistent!

There is one Hashing Formula in Teradata and it is consistent. The concept is to take the value of a rows Primary Index and run it through the Hash Formula. It will produce a Row Hash number. That Row Hash will stay with the row forever and reside as the first part of the row. The Row Hash also determines which AMP will own the row.

The Hash Map Determines which AMP will own the Row

images

This Hash Map is for a 3-AMP system

A rows will be placed on an AMP after the Parsing Engine (PE) hashes the rows Primary Index value. The output of the Hashing Algorithm is a rows Row Hash. The Row hash goes to a bucket in the Hash Map and is assigned to an AMP.

The Hash Map Determines which AMP will own the Row

images

images

The above example hashed Emp_No 1001 (Primary Index value) and the output was a Row Hash of 13. Teradata counted over to bucket 13 in the Hash Map and it has the number one (1) inside that bucket. This means that this row will go to AMP 1.

Placing rows on the AMP

images

The above example hashed Emp_No 1002 (Primary Index value) and the output was a Row Hash of 5. Teradata counted over to bucket 5 in the Hash Map and it has the number two (2) inside that bucket. This means that this row will go to AMP 2.

Placing rows on the AMP Continued

images

The above example hashed Emp_No 1003 (Primary Index value) and the output was a Row Hash of 9. Teradata counted over to bucket 9 in the Hash Map and it has the number one (3) inside that bucket. This means that this row will go to AMP 3.

A Review of the Hashing Process

images Hash the Primary Index Value for a row with the Hash Formula.

images The output of the Hash Formula is a 32-bit Row Hash.

images Take the Row Hash and find its corresponding bucket in the Hash Map.

images Send the row and its Row Hash to the AMP listed in the Hash Map Bucket.

images

Take a look at the row hash for each row and notice it corresponds with the Hash Map.

Non-Unique Primary Indexes have Skewed Data

Imagine if we made Last_Name the Primary Index for a table. Here is an example of how it would distribute. Notice all duplicates have the same Row Hash.

images

The Hash Formula is consistent so every Smith has the same Row Hash and the same goes for each Jones and each Patel. Therefore duplicate values land on the same AMP.

The Uniqueness Value

images

Each AMP will place a Uniqueness Value after the row hash to track duplicate values.

The Row Hash and Uniqueness Value make up the Row-ID

images

Row-ID equals the Row Hash of the Primary Index column and the Uniqueness Value.

A Row-ID Example for a Unique Primary Index

images

Notice two things for this Unique Primary Index (UPI) example:

1) The Uniqueness Value on each Row-ID is 1.

2) Each AMP sorts their rows by the Row-ID.

Row Hash and the Uniqueness Value make up the Row-ID. AMPs sort by the Row-ID.

A Row-ID Example for a Non-Unique Primary Index(NUPI)

images

Notice two things:

1) Uniqueness Value increases on all duplicate names.

2) Each AMP sorts their rows by the Row-ID.

Row Hash and the Uniqueness Value make up the Row-ID. AMPs sort by the Row-ID.

Two Reasons why each AMP Sorts their rows by the Row-ID

images

The Two Reasons are:

1) So like Data such as ‘Smith’ is grouped together on the AMPs disk.

2) So each AMP can perform a Binary Search, like a Phone Book in Alphabetical order.

AMPs sort rows by Row-ID so like data is grouped together and for Binary searches.

AMPs sort their rows by Row-ID to Group Like Data

images

Query

SELECT * FROM Employee_Table
WHERE Last_Name = ‘Smith’ ;

Parsing Engine (PE) Plan

  1) I see that Last_Name is the Primary Index.

  2) So Hash ‘Smith’ and get the Row Hash.

  3) The Row Hash for ‘Smith’ = 7.

  4) What AMP# is in Bucket 7 of the Hash Map?

  5) Bucket 7 says AMP 1?

  6) Bynet – tell AMP 1 to all retrieve Row Hash 7's.

  7) Bring back all Columns for Row Hash 7 (‘Smith’).

Notice that all of the Smiths are lumped together because of the sorting by Row-ID.

AMPs sort their rows by Row-ID to do a Binary Search

Query

SELECT * FROM Order_Table
WHERE Order_Number = 50;

images

1) Hash Order_Number 50 (Row Hash = 75)

2) Go to bucket 75 in Hash Map

3) What AMP is in bucket 75? AMP 1!

4) Bynet – Tell AMP 1 to get Row Hash 75!

5) Perform a Binary Search

A Binary Search knows the Row-IDs are in numeric order . Its like you using a phone book. Go to the middle first and then go up or down in chunks to find things quickly.

Table CREATE Examples with four different Primary Indexes

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Emp_No );

UPI

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
PRIMARY INDEX ( Dept_No );

NUPI

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
PRIMARY INDEX ( First_Name , Last_Name );

Multi-Column NUPI

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
NO Primary Index

No Primary Index

A table can have only one Primary Index so picking the right one is essential. Above are four different examples for your consideration. Now lets review with these.

Null Values all Hash to the Same AMP

images

If there are NULL values in the Primary Index you could find this is the reason for your skew. A Table with a Unique Primary Index can have only one Null value, but a NUPI table can have many NULL values and each NULL value hashes to the same AMP.

A Unique Primary Index (UPI) Example

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
UNIQUE PRIMARY INDEX ( Emp_No );

images

images

A Unique Primary Index will spread the data perfectly evenly.

A Non-Unique Primary Index (NUPI) Example

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
PRIMARY INDEX ( Dept_No );

images

images

A Non-Unique Primary Index will NOT spread the data perfectly evenly.

A Multi-Column Primary Index Example

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
PRIMARY INDEX ( First_Name , Last_Name );

images

images

A Multi-Column Primary Index is often used to fix a data skew problem.

A No Primary Index (NoPI) Example

images

CREATE TABLE Emp_Intl
  (Emp_No        INTEGER,
   Dept_No        SMALLINT,
   First_Name    VARCHAR(12),
   Last_Name     CHAR(20),
   Salary            DECIMAL(10,2))
NO Primary Index

images

images

All AMPs read all of their rows (full table scan) because there is no Primary Index.

Watch the Video on the Hashing the Data

images

Tera-Tom Trivia

In 1980 Tom Coffing won two Olympic Trials before the United States boycotted the Olympics. Tom is pictured above as the wrestler on the left.

Click on the link below or place it in your browser and watch the video on Hashing.

http://www.coffingdw.com/TbasicsV12/Hashing.wmv

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset