“Create a definite plan for carrying out your desire and begin at once, whether you are ready or not, to put this plan into action. “
- Napoleon Hill
The entire row of a table is on a segment, but each column in the row is in a separate block. Vertica spreads the rows of a table evenly across the nodes. A good Distribution Key is the key to good distribution!
When Unsegmented is chosen for distribution, the entire table is copied to each segment. This is often termed replicated. The general idea is to Segment by Hash all large tables and to use Unsegmented on smaller tables.
We have chosen the Order_Date column as the sort key and the Order_No column as the Hash Key.
The data has spread evenly among the segments for this table. Do you know why? The Hash Key is Order_No and it is a unique value. Hashing unique values results in near perfect distribution every single time.
The data did not spread evenly among the segments for this table. Do you know why? The Hash Key is Cust_No. All like values went to the same Node. This distribution isn't perfect, but it is reasonable, so it is an acceptable practice.
Notice that both tables are distributed by Hash on the column Dept_No. When these two tables are joined WHERE Dept_No = Dept_No, the rows with matching department numbers are on the same segment. This is called Co-Location. This makes joins efficient and fast.
Notice that the Department_Table has only four rows. Those four rows are copied to every segment. This is distributed by UNSEGMENTED. Now, the Department_Table can be joined to the Employee_Table with a guarantee that matching rows are co-located. They are co-located because the smaller table has copied ALL of its rows to each Node. When two joining tables have one large table (fact table) and one small table (dimension table), then use the UNSEGMENTED keyword to distribute the smaller table. This theory is also called a "Big Table/ Small Table Join".
The fact table (Line_Order_Fact_Table) is the largest table, but the Part_Table is the largest dimension table. That is why you make Part_Key the distribution key for both tables. Now, when these two tables are joined together, the matching Part_Key rows are on the same Node. You can then distribute by UNSEGMENTED, which replicates the other dimension tables to each node. Each table will have all their rows on each Node. Now, everything that joins to the fact table is co-located!
There are three basic reasons to use the sortkey keyword when creating a table. 1) If recent data is queried most frequently, specify the timestamp or date column as the leading column for the sort key. 2) If you do frequent range filtering or equality filtering on one column, specify that column as the sort key. 3) If you frequently join a (dimension) table, specify the join column as the sort key. Above, you can see we have made our sortkey the Order_Date column. Look how the data is sorted!
When data is sorted on a strategic column, it will improve (GROUP BY and ORDER BY operations), window functions (PARTITION BY and ORDER BY operations), and even as a means of optimizing compression. But, as new rows are incrementally loaded, these new rows are sorted but they reside temporarily in a separate region on disk. In order to maintain a fully sorted table, you need to run the VACUUM command at regular intervals. You will also need to run ANALYZE.
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ]
TABLE [schema-name].table-name {
( column-definition [ , ... ] )
| [ column-name-list ] }
[ ON COMMIT { DELETE | PRESERVE } ROWS ]
[ AS [ AT EPOCH LATEST ] | [ AT TIME 'timestamp' ] query ]
[ [ ORDER BY table-column [ , ... ] ]
[ ENCODED BY column-definition [ , ... ]
[ hash-segmentation-clause | range-segmentation-clause
| UNSEGMENTED { NODE node | ALL NODES } ]
[ KSAFE [ k-num ] ]
| [ NO PROJECTION ] ]
The syntax above is for creating temporary tables. Global tables can be seen outside the session and persist until the end of the session. Global is the default. Local tables can only be seen inside the session and persist until the end of session.
Global Temporary Tables - The definition of a global temporary table is permanent in the database catalogs until explicitly removed by using the DROP TABLE command.
Global temporary tables are created in the public schema, and they are visible to all users and sessions. But, the contents (data) of a global table are private to the transaction or session in which the data was inserted. Data is automatically removed when the transaction commits, rolls back, or the session ends. This allows two users to use the same temporary table, but each only sees the data specific to his or her own transactions for the duration of those transactions or sessions.
Local Temporary Tables - A local temporary table is created in the V_TEMP_SCHEMA namespace and is inserted into the user's search path automatically. It can only be seen by the user who created the table, and it lasts for only the duration of the session in which it is created. When the session ends, the table definition is automatically dropped from the database catalogs. Local Temporary Tables can be dropped explicitly.
Above are the major differences between Global and Local Temporary tables.
Global - [Optional] means that the table definition is visible to all sessions. Temporary table data is visible only to the session that materializes (inserts) the data into the table. Temporary tables in default to global.
Local - [Optional] Means that the table definition is visible only to the session in which it is created. Temporary tables always default to global.
On Commit Preserve|Delete rows – Preserve will preserve the rows until session end and then Truncate the table and Delete will Truncate the rows after each COMMIT.
AT EPOCH LATEST | AT TIME - Used with AS query to query historical data. You can specify AT EPOCH LATEST to include data from the latest committed transaction or specify a specific epoch based on a specific time stamp.
Above are the key terms you will want to know when creating a temporary table.
1) A USER Creates a Local Temporary Table and then 2) populates the Temporary Table with an INSERT/SELECT Statement. Now, the user can query this table all session long. When the session is logged off the table and the data are automatically deleted (Truncated).
CREATE LOCAL Temporary TABLE Dept_Agg_Local2
( Dept_no Integer
,AVG_Salary Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;
INSERT INTO Dept_Agg_Local2
SELECT Dept_no
,AVG(Salary)
FROM Employee_Table GROUP BY Dept_no ;
SELECT E.*, AVG_Salary
FROM Employee_Table as E
INNER JOIN
Dept_Agg_Local2
ON E.Dept_No = Dept_Agg_Local2.Dept_No
AND Salary > AVG_Salary
We created the Local Temporary Table, materialized it and then used it in a join. The above query finds all employees making a greater salary then the AVG (Salary) within their own dept_no.
1) A USER Creates a Global Temporary Table once and the table definition will persist permanently, until it is dropped. Users can then 2) populates the Global Temporary Table with an INSERT/SELECT Statement. Now, the user can query this table all session long. When the session is logged off the table definition stays, but the data is automatically deleted (Truncated). Many different users can populate the table, but each only sees the table they materialized.
CREATE Global Temporary TABLE Dept_Agg_Global
( Dept_no Integer
,AVG_Salary Decimal(10,2)
) ON COMMIT PRESERVE ROWS ;
Both users above can only see
the data they populated
Two users above have materialized the same Global Temporary table, but each only sees their table. Users can not share a Global Temporary table, but only the definition.
This table is created from the Sales_Table
CREATE TEMP TABLE Sales_Agg
ON COMMIT PRESERVE ROWS
AS
SELECT Product_ID
,SUM(Daily_Sales)
FROM Sales_Table
Group by Product_ID;
This table is materialized from a join
CREATE TEMP TABLE Emp_Dept
ON COMMIT PRESERVE ROWS
AS
SELECT E.*, Department_Name, Budget
FROM Employee_Table as E
INNER JOIN
Department_Table as D
ON E.Dept_No = D.Dept_No;
Above are two great examples to quickly CREATE a temporary Table from another table.
This table is sorted by Sale_Date
CREATE GLOBAL TEMP TABLE Temp_Orders
(
Order_Number INTEGER
,Customer_Number INTEGER
,Order_Date Date
,Order_Total Decimal(8,2))
ON COMMIT PRESERVE ROWS
ORDER BY Order_Date, Customer_Number;
INSERT INTO Temp_Orders
SELECT * FROM Order_Table;
SELECT *
FROM Temp_Orders;
A great reason to create a temporary table is to have it sorted.
Create a Temporary Table with orders from September
CREATE Temp TABLE Order_Vol
ON COMMIT PRESERVE ROWS
AS (SELECT * FROM Order_Table
WHERE Extract(Month from Order_Date) = 9);
Above is an example of creating a temporary table that is not an exact copy. It is only populating the table with orders from the month of September.
This creates a table with only three columns
CREATE Temporary TABLE Order_Vol5
ON COMMIT PRESERVE ROWS
AS (SELECT Customer_Number
,Order_Date, Order_Total
FROM Order_Table) ;
Above is an example of creating a Temporary table with three columns. The original table had four columns.