Chapter 39 – Hashing Functions

Teradata uses Parallel Processing and the most important aspect of this is to spread the rows of a table equally among the AMPs, who read and write data.

Below are the four Hashing Functions available to see the distribution of data:

HASHROW HASHAMP
HASHBUCKET HASHBAKAMP

Teradata uses parallel processing with its architecture of AMPs and PEs. The Primary Index (PI) must exist whenever you create a table because it is the sole determinant of which AMPs owns which rows. This concept pertains to data storage and data retrieval. Picking the proper column(s) for the PI is extremely important for distribution and therefore, performance. The hashing functions introduced in this section provide information pertaining to the selection of the AMP where each individual row is stored.

There are hashing functions, which can be incorporated into SQL, to produce and use the same hash value result for testing current or future distribution levels. In other words, these functions can be used to evaluate the distribution of the rows within any or all tables or determine the acceptability of other columns as a potential primary index.

The HASHROW Function

The basic syntax for using the HASHROW function follows:

HASHROW( [ <data-column-value> […, <data-column-value>…] ] )

Below is an actual example

SELECT HASHROW (NULL) AS NULL_HASH
,HASHROW(‘Coffing’) AS Name_HASH
,HASHROW() AS NO_HASH ;

NULL_HASH	NAME_HASH	NO_HASH
00000000	4B1075E1	FFFFFFFF

If you ran this query over and over again you would get the same result.

Notice that you can Hash a Null and understand that the hash is consistent so running this query with those values will return the same answer time and time again. The HASHROW function is used to produce the 32-bit binary (BYTE(4) data type) Row Hash that is stored as part of the data row. It can return a maximum of 4,294,967,295 unique values. The values produced range from 0 to FFFFFFFF.

The HASHROW Function in a real-world Example

Now that the functionality has been demonstrated on the previous page, a more realistic use might be the following to examine the data distribution and determine the average number of rows per value:

       SELECT
       COUNT(*)/COUNT(DISTINCT(HASHROW(Student_Id))) AS
       AVG_ROW_CT
       FROM Student_Table;

AVG_ROW_CT

From the above answer of 1 it is a great sign that the data is perfectly distributed or even Unique. As good as this is, the HASHROW function does not provide a lot more help in the evaluation process. However, when combined with the other Hashing Functions, it yields some very helpful data demographics.

The HASHBUCKET Function

The basic syntax for using the HASHBUCKET function follows:

HASHBUCKET( [ <row-hash-value> ] )

Below is an actual example

SELECT HASHBUCKET(NULL) AS NULL_BUCKET
,HASHBUCKET() AS NO_BUCKET;

NULL_BUCKET	NO_BUCKET
?	1048575

If you ran this query over and over again you would get the same result.

The HASHBUCKET function is used to produce the 16-bit binary Hash Bucket (the DSW) that is used with the Hash Map to determine the AMP that should store and retrieve the data row. It can return a maximum of just over 1,000,000 unique values. The values range from 0 to 1,048,575, not counting the NULL as a potential result. The input to the HASHBUCKET is the 32-bit Row Hash value.

The HASHBUCKET Function in a real-world Example

SELECT COUNT(*) AS NBR_ROWS
,HASHBUCKET(HASHROW (Student_ID)) AS Bucket_No
FROM Student_Table
GROUP BY 2 ;

NBR_ROWS	Bucket_No
1	1007154
1	499675
1	1020598
1	1019019
1	1017675
1	579440
1	466041
1	975629
1	729631
1	542497

Now that the functionality has been demonstrated on the previous page, a more realistic use might be to see the number of rows in each Hash Bucket for a column in a table:

The HASHAMP Function

The basic syntax for using the HASHAMP function follows:

HASHAMP( <hash-bucket> )

Below is an actual example

SELECT HASHAMP(NULL) AS NULL_BUCKET
,HASHAMP () AS NO_BUCKET;

NULL_BUCKET	NO_BUCKET
?	1

The HASHAMP function returns the identification number of the primary AMP for any Hash Bucket number. The input to the HASHAMP function is an integer value. When no value is passed to the HASHAMP function, it returns a number that is one less than the number of AMPs in the current system configuration. If any other data type is passed to it, a run-time error occurs.

The HASHAMP Function in a real-world Example

SELECT Student_ID
,HASHBUCKET(HASHROW(Student_ID)) AS Bucket_No
,HASHAMP(HASHBUCKET(HASHROW(Student_ID))) AS AMP_No
FROM Student_Table
ORDER BY 1 ;

Student_ID	Bucket_No	AMP_No
123250	1019019	0
125634	542497	1
231222	579440	2
234121	1017675	3
260000	975629	4
280023	499675	3
322133	1007154	2
324652	729631	1
333450	1020598	2
423400	466041	4

Now that the functionality has been demonstrated on the previous page, a more realistic use might be to see the AMP Number for each Student to check for distribution.

A Great HASHAMP Function for Large Tables

SELECT COUNT(*) “Count”
,HASHAMP(HASHBUCKET(HASHROW(Product_ID))) AS AMP_No
FROM Sales_Table
GROUP BY 2
ORDER BY 2 ;

Count	AMP_No
14	0
7	1

The example above is shown using a smaller table on a small system, but the query will brilliantly show the row counts for each AMP. This is a very valuable report.

The HASHBAKAMP Function

The basic syntax for using the HASHBAKAMP function follows:

HASHBAKAMP ( <hash-bucket> )

Below is an actual example

SELECT HASHBAKAMP (NULL) AS NULL_BUCKET
,HASHBAKAMP () AS NO_BUCKET;

NULL_BUCKET	NO_BUCKET
?	1

The HASHBAKAMP function returns the identification number of the Fallback AMP for any Hash Bucket number. The input to the HASHBAKAMP function is an integer. When no value is passed to the HASHAMP function, it returns a number that is one less than the number of AMPs in the current system configuration. If any other data type is passed to it, a run-time error occurs.

A Real-World HASBAKHAMP Function Example

SELECT Student_ID
      ,HASHBUCKET(HASHROW(Student_ID)) AS Bucket_No
      ,HASHBAKAMP(HASHBUCKET(HASHROW(Student_ID)))
            AS BAK_AMP_No
FROM Student_Table ORDER BY 3;

Student_ID	Bucket_No	BAK_AMP_No
333450	1020598	0
322133	1007154	0
231222	579440	0
125634	542497	1
260000	975629	1
234121	1017675	1
123250	1019019	1
324652	729631	1
280023	499675	1
423400	466041	1

The example above shows the Student_ID, the Bucket_No and the AMP that contains the fallback rows for each Student_ID.

A Great way to see distribution for Primary and Fallback rows

SELECT SUM(NbrRows) AS “Rows Per AMP” ,AMP_Nbr
FROM (SEL COUNT(*),
HASHBAKAMP(HASHBUCKET(HASHROW(Student_ID)))
FROM Student_Table GROUP BY 2
UNION ALL
SEL COUNT(*) ,HASHAMP(HASHBUCKET(HASHROW(Student_ID)))
FROM Student_Table GROUP BY 2) DT (NbrRows, AMP_Nbr)
GROUP BY 2
ORDER BY 2 ;

Rows Per AMP	AMP_Nbr
10	0
10	1

The following SELECT can help determine that situation by finding all the Primary rows with their AMP and all the FALLBACK rows with their AMPs and than adding them together for the total (notice it uses a derived table to consolidate the rows counts):

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 39 – Hashing Functions

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 39 – Hashing Functions