Chapter 39

Hashing Functions

“Love, like a chicken salad or restaurant hash, must be taken with blind faith or it loses its flavor. “

– Helen Rowland

Table of Contents Chapter 39 – Hashing Functions

Hashing Functions on Teradata

The HASHROW Function

The HASHROW Function in a real-world Example

The HASHBUCKET Function

The HASHBUCKET Function in a real-world Example

The HASHAMP Function

The HASHAMP Function in a real-world Example

A Great HASHAMP Function for Large Tables

The HASHBAKAMP Function

A Real-World HASBAKHAMP Function Example

A Great way to see distribution for Primary and Fallback rows

Hashing Functions on Teradata

Teradata uses Parallel Processing and the most important aspect of this is to spread the rows of a table equally among the AMPs, who read and write data.

Below are the four Hashing Functions available to see the distribution of data:

HASHROW            HASHAMP
HASHBUCKET     HASHBAKAMP

Teradata uses parallel processing with its architecture of AMPs and PEs. The Primary Index (PI) must exist whenever you create a table because it is the sole determinant of which AMPs owns which rows. This concept pertains to data storage and data retrieval. Picking the proper column(s) for the PI is extremely important for distribution and therefore, performance. The hashing functions introduced in this section provide information pertaining to the selection of the AMP where each individual row is stored.

There are hashing functions, which can be incorporated into SQL, to produce and use the same hash value result for testing current or future distribution levels. In other words, these functions can be used to evaluate the distribution of the rows within any or all tables or determine the acceptability of other columns as a potential primary index.

The HASHROW Function

The basic syntax for using the HASHROW function follows:

HASHROW( [ <data-column-value> […, <data-column-value>…] ] )

Below is an actual example

SELECT HASHROW (NULL) AS NULL_HASH
      ,HASHROW(‘Coffing’) AS Name_HASH
      ,HASHROW() AS NO_HASH ;

NULL_HASH NAME_HASH NO_HASH
00000000 4B1075E1 FFFFFFFF

If you ran this query over and over again you would get the same result.

Notice that you can Hash a Null and understand that the hash is consistent so running this query with those values will return the same answer time and time again. The HASHROW function is used to produce the 32-bit binary (BYTE(4) data type) Row Hash that is stored as part of the data row. It can return a maximum of 4,294,967,295 unique values. The values produced range from 0 to FFFFFFFF.

The HASHROW Function in a real-world Example

Now that the functionality has been demonstrated on the previous page, a more realistic use might be the following to examine the data distribution and determine the average number of rows per value:

       SELECT
       COUNT(*)/COUNT(DISTINCT(HASHROW(Student_Id))) AS
       AVG_ROW_CT
       FROM Student_Table;

AVG_ROW_CT
1

From the above answer of 1 it is a great sign that the data is perfectly distributed or even Unique. As good as this is, the HASHROW function does not provide a lot more help in the evaluation process. However, when combined with the other Hashing Functions, it yields some very helpful data demographics.

The HASHBUCKET Function

The basic syntax for using the HASHBUCKET function follows:

HASHBUCKET( [ <row-hash-value> ] )

Below is an actual example

SELECT HASHBUCKET(NULL) AS NULL_BUCKET
       ,HASHBUCKET() AS NO_BUCKET;

NULL_BUCKET NO_BUCKET
? 1048575

If you ran this query over and over again you would get the same result.

The HASHBUCKET function is used to produce the 16-bit binary Hash Bucket (the DSW) that is used with the Hash Map to determine the AMP that should store and retrieve the data row. It can return a maximum of just over 1,000,000 unique values. The values range from 0 to 1,048,575, not counting the NULL as a potential result. The input to the HASHBUCKET is the 32-bit Row Hash value.

The HASHBUCKET Function in a real-world Example

SELECT COUNT(*) AS NBR_ROWS
      ,HASHBUCKET(HASHROW (Student_ID)) AS Bucket_No
FROM Student_Table
GROUP BY 2 ;

NBR_ROWS Bucket_No
1 1007154
1 499675
1 1020598
1 1019019
1 1017675
1 579440
1 466041
1 975629
1 729631
1 542497

Now that the functionality has been demonstrated on the previous page, a more realistic use might be to see the number of rows in each Hash Bucket for a column in a table:

The HASHAMP Function

The basic syntax for using the HASHAMP function follows:

    HASHAMP( <hash-bucket> )

Below is an actual example

SELECT HASHAMP(NULL) AS NULL_BUCKET
      ,HASHAMP () AS NO_BUCKET;

NULL_BUCKET NO_BUCKET
? 1

The HASHAMP function returns the identification number of the primary AMP for any Hash Bucket number. The input to the HASHAMP function is an integer value. When no value is passed to the HASHAMP function, it returns a number that is one less than the number of AMPs in the current system configuration. If any other data type is passed to it, a run-time error occurs.

The HASHAMP Function in a real-world Example

SELECT Student_ID
      ,HASHBUCKET(HASHROW(Student_ID)) AS Bucket_No
      ,HASHAMP(HASHBUCKET(HASHROW(Student_ID))) AS AMP_No
FROM Student_Table
ORDER BY 1 ;

Student_ID Bucket_No AMP_No
123250 1019019 0
125634 542497 1
231222 579440 2
234121 1017675 3
260000 975629 4
280023 499675 3
322133 1007154 2
324652 729631 1
333450 1020598 2
423400 466041 4

Now that the functionality has been demonstrated on the previous page, a more realistic use might be to see the AMP Number for each Student to check for distribution.

A Great HASHAMP Function for Large Tables

SELECT COUNT(*) “Count”
      ,HASHAMP(HASHBUCKET(HASHROW(Product_ID))) AS AMP_No
FROM Sales_Table
GROUP BY 2
ORDER BY 2 ;

Count AMP_No
14 0
7 1

The example above is shown using a smaller table on a small system, but the query will brilliantly show the row counts for each AMP. This is a very valuable report.

The HASHBAKAMP Function

The basic syntax for using the HASHBAKAMP function follows:

    HASHBAKAMP ( <hash-bucket> )

Below is an actual example

SELECT HASHBAKAMP (NULL) AS NULL_BUCKET
      ,HASHBAKAMP () AS NO_BUCKET;

NULL_BUCKET NO_BUCKET
? 1

The HASHBAKAMP function returns the identification number of the Fallback AMP for any Hash Bucket number. The input to the HASHBAKAMP function is an integer. When no value is passed to the HASHAMP function, it returns a number that is one less than the number of AMPs in the current system configuration. If any other data type is passed to it, a run-time error occurs.

A Real-World HASBAKHAMP Function Example

SELECT Student_ID
      ,HASHBUCKET(HASHROW(Student_ID)) AS Bucket_No
      ,HASHBAKAMP(HASHBUCKET(HASHROW(Student_ID)))
            AS BAK_AMP_No
FROM Student_Table ORDER BY 3;

Student_ID Bucket_No BAK_AMP_No
333450 1020598 0
322133 1007154 0
231222 579440 0
125634 542497 1
260000 975629 1
234121 1017675 1
123250 1019019 1
324652 729631 1
280023 499675 1
423400 466041 1

The example above shows the Student_ID, the Bucket_No and the AMP that contains the fallback rows for each Student_ID.

A Great way to see distribution for Primary and Fallback rows

SELECT SUM(NbrRows) AS “Rows Per AMP” ,AMP_Nbr
FROM (SEL COUNT(*),
      HASHBAKAMP(HASHBUCKET(HASHROW(Student_ID)))
      FROM Student_Table GROUP BY 2
UNION ALL
SEL COUNT(*) ,HASHAMP(HASHBUCKET(HASHROW(Student_ID)))
FROM Student_Table GROUP BY 2) DT (NbrRows, AMP_Nbr)
GROUP BY 2
ORDER BY 2 ;

Rows Per AMP AMP_Nbr
10 0
10 1

The following SELECT can help determine that situation by finding all the Primary rows with their AMP and all the FALLBACK rows with their AMPs and than adding them together for the total (notice it uses a derived table to consolidate the rows counts):

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset