Chapter 17 – Table Create and Data Types

“Create a definite plan for carrying out your desire and begin at once, whether you are ready or not, to put this plan into action. “

- Napoleon Hill

Distribution Strategy 1 - Segmented By Hash

image

The entire row of a table is on a segment, but each column in the row is in a separate block. Vertica spreads the rows of a table evenly across the nodes. A good Distribution Key is the key to good distribution!

Distribution Strategy 2 - Unsegmented

image

When Unsegmented is chosen for distribution, the entire table is copied to each segment. This is often termed replicated. The general idea is to Segment by Hash all large tables and to use Unsegmented on smaller tables.

Sorting the Data in a Table CREATE Statement

image

We have chosen the Order_Date column as the sort key and the Order_No column as the Hash Key.

Even Distribution

image

The data has spread evenly among the segments for this table. Do you know why? The Hash Key is Order_No and it is a unique value. Hashing unique values results in near perfect distribution every single time.

Uneven Distribution Where the Data is Non-Unique

image

The data did not spread evenly among the segments for this table. Do you know why? The Hash Key is Cust_No. All like values went to the same Node. This distribution isn't perfect, but it is reasonable, so it is an acceptable practice.

Matching Distribution Keys for Co-Location of Joins

image

Notice that both tables are distributed by Hash on the column Dept_No. When these two tables are joined WHERE Dept_No = Dept_No, the rows with matching department numbers are on the same segment. This is called Co-Location. This makes joins efficient and fast.

Big Table / Small Table Joins

image

Notice that the Department_Table has only four rows. Those four rows are copied to every segment. This is distributed by UNSEGMENTED. Now, the Department_Table can be joined to the Employee_Table with a guarantee that matching rows are co-located. They are co-located because the smaller table has copied ALL of its rows to each Node. When two joining tables have one large table (fact table) and one small table (dimension table), then use the UNSEGMENTED keyword to distribute the smaller table. This theory is also called a "Big Table/ Small Table Join".

Fact and Dimension Table Distribution Key Designs

image

The fact table (Line_Order_Fact_Table) is the largest table, but the Part_Table is the largest dimension table. That is why you make Part_Key the distribution key for both tables. Now, when these two tables are joined together, the matching Part_Key rows are on the same Node. You can then distribute by UNSEGMENTED, which replicates the other dimension tables to each node. Each table will have all their rows on each Node. Now, everything that joins to the fact table is co-located!

Why a Sort Key Improves Performance

image

There are three basic reasons to use the sortkey keyword when creating a table. 1) If recent data is queried most frequently, specify the timestamp or date column as the leading column for the sort key. 2) If you do frequent range filtering or equality filtering on one column, specify that column as the sort key. 3) If you frequently join a (dimension) table, specify the join column as the sort key. Above, you can see we have made our sortkey the Order_Date column. Look how the data is sorted!

Sort Keys Help GROUP BY, ORDER BY and Window Functions

image

When data is sorted on a strategic column, it will improve (GROUP BY and ORDER BY operations), window functions (PARTITION BY and ORDER BY operations), and even as a means of optimizing compression. But, as new rows are incrementally loaded, these new rows are sorted but they reside temporarily in a separate region on disk. In order to maintain a fully sorted table, you need to run the VACUUM command at regular intervals. You will also need to run ANALYZE.

Syntax for Temporary Tables

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ]

TABLE [schema-name].table-name  {

( column-definition [ , ... ] )

  | [ column-name-list ] }

    [ ON COMMIT { DELETE | PRESERVE } ROWS ]

    [ AS [ AT EPOCH LATEST ] | [ AT TIME 'timestamp' ] query ]

       [ [ ORDER BY table-column [ , ... ] ]

        [ ENCODED BY column-definition [ , ... ]

          [ hash-segmentation-clause | range-segmentation-clause

           | UNSEGMENTED { NODE node | ALL NODES } ]

              [ KSAFE [ k-num ] ]

                | [ NO PROJECTION ] ]

The syntax above is for creating temporary tables. Global tables can be seen outside the session and persist until the end of the session. Global is the default. Local tables can only be seen inside the session and persist until the end of session.

Temporary Tables Explained

Global Temporary Tables - The definition of a global temporary table is permanent in the database catalogs until explicitly removed by using the DROP TABLE command.

Global temporary tables are created in the public schema, and they are visible to all users and sessions. But, the contents (data) of a global table are private to the transaction or session in which the data was inserted. Data is automatically removed when the transaction commits, rolls back, or the session ends. This allows two users to use the same temporary table, but each only sees the data specific to his or her own transactions for the duration of those transactions or sessions.

Local Temporary Tables - A local temporary table is created in the V_TEMP_SCHEMA namespace and is inserted into the user's search path automatically. It can only be seen by the user who created the table, and it lasts for only the duration of the session in which it is created. When the session ends, the table definition is automatically dropped from the database catalogs. Local Temporary Tables can be dropped explicitly.

Above are the major differences between Global and Local Temporary tables.

Key Temporary Table Terms

Global - [Optional] means that the table definition is visible to all sessions. Temporary table data is visible only to the session that materializes (inserts) the data into the table. Temporary tables in default to global.

Local - [Optional] Means that the table definition is visible only to the session in which it is created. Temporary tables always default to global.

On Commit Preserve|Delete rows – Preserve will preserve the rows until session end and then Truncate the table and Delete will Truncate the rows after each COMMIT.

AT EPOCH LATEST | AT TIME - Used with AS query to query historical data. You can specify AT EPOCH LATEST to include data from the latest committed transaction or specify a specific epoch based on a specific time stamp.

Above are the key terms you will want to know when creating a temporary table.

Creating and Populating a Local Temporary Table

image

1) A USER Creates a Local Temporary Table and then 2) populates the Temporary Table with an INSERT/SELECT Statement. Now, the user can query this table all session long. When the session is logged off the table and the data are automatically deleted (Truncated).

Using a Local Temporary Table

CREATE LOCAL Temporary TABLE  Dept_Agg_Local2

                  ( Dept_no                    Integer

                     ,AVG_Salary            Decimal(10,2)

                  )    ON COMMIT PRESERVE ROWS ;

INSERT INTO Dept_Agg_Local2

SELECT    Dept_no

                 ,AVG(Salary)

FROM  Employee_Table GROUP BY Dept_no ;

SELECT E.*, AVG_Salary

FROM Employee_Table as E

INNER JOIN

             Dept_Agg_Local2

ON E.Dept_No = Dept_Agg_Local2.Dept_No

AND Salary > AVG_Salary

image

We created the Local Temporary Table, materialized it and then used it in a join. The above query finds all employees making a greater salary then the AVG (Salary) within their own dept_no.

Creating and Populating a Global Temporary Table

image

1) A USER Creates a Global Temporary Table once and the table definition will persist permanently, until it is dropped. Users can then 2) populates the Global Temporary Table with an INSERT/SELECT Statement. Now, the user can query this table all session long. When the session is logged off the table definition stays, but the data is automatically deleted (Truncated). Many different users can populate the table, but each only sees the table they materialized.

Creating and Populating a Global Temporary Table

CREATE Global Temporary TABLE  Dept_Agg_Global

                   ( Dept_no                   Integer

                      ,AVG_Salary           Decimal(10,2)

                   )    ON COMMIT PRESERVE ROWS ;

image

Both users above can only see
the data they populated

Two users above have materialized the same Global Temporary table, but each only sees their table. Users can not share a Global Temporary table, but only the definition.

Some Great Examples of Creating a Temporary Table Quickly

This table is created from the Sales_Table

CREATE TEMP TABLE Sales_Agg

ON COMMIT PRESERVE ROWS

AS

SELECT Product_ID

              ,SUM(Daily_Sales)

FROM Sales_Table

Group by Product_ID;

This table is materialized from a join

CREATE TEMP TABLE Emp_Dept

ON COMMIT PRESERVE ROWS

AS

SELECT E.*, Department_Name, Budget

FROM Employee_Table as E

INNER JOIN

             Department_Table as D

ON E.Dept_No = D.Dept_No;

Above are two great examples to quickly CREATE a temporary Table from another table.

Creating a Temporary Table That is sorted

This table is sorted by Sale_Date

CREATE GLOBAL TEMP TABLE Temp_Orders

(

       Order_Number INTEGER

       ,Customer_Number INTEGER

       ,Order_Date   Date

       ,Order_Total            Decimal(8,2))

     ON COMMIT PRESERVE ROWS

   ORDER BY Order_Date, Customer_Number;

INSERT INTO Temp_Orders

SELECT * FROM Order_Table;

SELECT *

FROM Temp_Orders;

A great reason to create a temporary table is to have it sorted.

A Temp Table That Populates Some of the Rows

Create a Temporary Table with orders from September

CREATE Temp TABLE Order_Vol

ON COMMIT PRESERVE ROWS

AS (SELECT * FROM Order_Table

  WHERE Extract(Month from Order_Date) = 9);

Above is an example of creating a temporary table that is not an exact copy. It is only populating the table with orders from the month of September.

A Temporary Table with Some of the Columns

This creates a table with only three columns

CREATE Temporary TABLE Order_Vol5
ON COMMIT PRESERVE ROWS
AS (SELECT Customer_Number
                     ,Order_Date, Order_Total
      FROM     Order_Table) ;

Above is an example of creating a Temporary table with three columns. The original table had four columns.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset