Chapter 9

How Joins work internally

“A Join Index walked up to two tables in a bar and said, mind if I Join you?”

Tera-Tom Coffing

Table of Contents Chapter 9 – How Joins work Internally

Teradata Join Quiz

Teradata Join Quiz Answer

If the Join Condition is the Primary Index no Movement

How the Parsing Engine Decides on a Join Plan

Quiz – Redistribute the Employees by their Dept_No

Quiz – Employees Dept_No landed on AMP with Matches

When Rows are on the same AMP they can be Joined

Redistribution and then a Row Hash Match Scan

Quiz – Redistribute the Orders to the Proper AMP

Answer to Redistribute the Employees by their Dept_No Quiz

A Visual of the Join in Action

Nexus Query Chameleon Example

The Big Table Small Table Join causes Duplication

Visual of Duplication of the Smaller Table across All-AMPs

Duplication of the Smaller Table across All-AMPs

A Visual of Duplication of the Smaller Table on a Single AMP

A Visual of Redistribution on a Single AMP

Teradata Join Quiz

Which Statement is NOT true!

1. Each Table in Teradata has a Primary Index, unless it is a NoPI table.

2. The Primary Index is the mechanism that allows Teradata to physically distribute the rows of a table across the AMPs.

3. Each AMP Sorts their rows by the Row-ID, unless it is a Partitioned table, and then the sort is first by the Partition and then by Row-ID.

4. For two rows to be Joined together Teradata insists that both rows are physically on the same AMP.

5. Teradata will either Redistribute one or both of the tables or Duplicate the smaller table across all AMPs to ensure matching rows are on the same AMP, even if it is only for the life of the Join.

Do you know which statement above is False?

Teradata Join Quiz Answer

Which Statement is NOT true!

1. Each Table in Teradata has a Primary Index, unless it is a NoPI table.

2. The Primary Index is the mechanism that allows Teradata to physically distribute the rows of a table across the AMPs.

3. Each AMP Sorts their rows by the Row-ID, unless it is a Partitioned table, and then the sort is first by the Partition and then by Row-ID.

4. For two rows to be Joined together Teradata insists that both rows are physically on the same AMP.

images

5. Teradata will either Redistribute one or both of the tables or Duplicate the smaller table across all AMPs to ensure matching rows are on the same AMP, even if it is only for the life of the Join.

Do you know which statement above is False? All statements above are TRUE!

If the Join Condition is the Primary Index no Movement

images

CREATE Table Employee_Table
( Employee_No   INTEGER
,Dept_No            INTEGER
,Last_Name        CHAR(20)
,First_Name        VARCHAR(20)
,Salary                DECIMAL(10,2)
) UNIQUE PRIMARY INDEX(Employee_No);

CREATE Table Department_Table
(Dept_No                    INTEGER
,Department_Name     CHAR(20)
,Mgr_No                      INTEGER
,Budget                       DECIMAL(10,2)
)
UNIQUE PRIMARY INDEX(Dept_No);

Teradata knows that it can only JOIN two rows together if they are physically on the same AMP. This can occur naturally if the join condition columns are the Primary Indexes of their respective tables, but most likely Teradata will have to move data to get the matching rows on the same AMP. What will the Parsing Engine decide to do next?

How the Parsing Engine Decides on a Join Plan

images

The Parsing Engine (PE) knows that the Dept_No column is the Primary Index for the Department_Table. It also know that the Dept_No column is NOT the Primary Index for the Employee_Table, so the PE commands the AMPs to Redistribute the entire Employee_Table by Dept_No into spool. This is equivalent to loading the Employee_Table with a Primary Index of Dept_No. Now all matching rows can join.

Quiz – Redistribute the Employees by their Dept_No

images

images

If Teradata decides to Redistribute the Employee_Table by Dept_No which AMPs will hold which Employees? Place their Dept_No and Last_Name on the AMP after Redistribution.

Fill in the quiz above. This is a great opportunity to understand the Teradata engine.

Quiz – Employees Dept_No landed on AMP with Matches

images

images

Each redistributed row landed on the same AMP as its matching row. Turn the page.

When Rows are on the same AMP they can be Joined

images

images

images

Redistribution and then a Row Hash Match Scan

images

images

Above resides the Query, the Parsing Engine's (PE) Analysis and a summary (in laymen's terms) of the PE's PLAN. First we need to get all matching rows to the corresponding AMP so the join can be performed between the two tables.

Quiz – Redistribute the Orders to the Proper AMP

images

images

If Teradata decides to Redistribute the Order_Table by Customer_No which AMPs will hold which Orders? Place their Customer_Number and Order_Total on the AMP after Redistribution.

Fill in the quiz above. This is a great opportunity to understand the Teradata engine.

Answer to Redistribute the Employees by their Dept_No Quiz

images

images

The Teradata Hashing Formula is consistent. It is used to load a Table's rows via the Primary Index of the table. Teradata follows the same Hash Formula to Redistribute for Joins.

Each redistributed row landed on the same AMP as its matching row. Turn the page.

A Visual of the Join in Action

images

images

images

images

The Big Table Small Table Join causes Duplication

images

images

Visual of Duplication of the Smaller Table across All-AMPs

images

The next slide will demonstrate how Teradata Duplicates the smaller table (Department_Table) across All-AMPs.

Duplication of the Smaller Table across All-AMPs

images

Teradata took the Department_Table and gathered up all 5-rows and then in Spool Duplicated the entire 5-row Table across All-AMPs. Now the join can happen!

A Visual of Duplication of the Smaller Table on a Single AMP

images

The picture above shows the smaller table Duplicated in its entirety on this AMP in spool. This same spool table resides on every AMP for only the life of this join. Notice that both Dept_No's in the Table and Spool have corresponding Row Hashes.

A Visual of Redistribution on a Single AMP

images

The picture above shows the Department_Table on the top right that holds two rows on this particular AMP. The Employee_Table has been redistributed by Hash Code on Dept_No and the rows in spool have matches on the same AMP.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset