Chapter 38

Collect Statistics

“The non permanent appearance of happiness and distress, and their disappearance in due course,, are like the appearance and disappearance of summer and winter seasons.”

– Bhagavad Gita

Table of Contents Chapter 38 – Collect Statistics

The Purpose of Collect Statistics

What to COLLECT STATISTICS On?

How do you know if Statistics were Collected on a Table?

The Basic Syntax for COLLECT STATISTICS

COLLECT STATISTICS Examples for a better Understanding

The Official Syntax for COLLECT STATISTICS

How to Re-COLLECT STATISTICS on a Table

How does the PE Plan if No Statistics were Collected?

How to Copy a Table with Data and the Statistics?

How to Copy a Table with NO Data and the Statistics?

When to COLLECT STATISTICS Using only a SAMPLE

Examples of COLLECT STATISTICS Using only a SAMPLE

How to Collect Statistics on a PPI Table on the Partition

Teradata V12 and V13 Statistics Enhancements

The Purpose of Collect Statistics

The Teradata Parsing Engine (PE) also called the Optimizer is in charge of creating the PLAN for the AMPs to follow. The PE works best when Statistics have been collected on a table. Then it knows:

 

1. The number of rows in the table

2. The average row size

3. Information on all Indexes in which statistics were collected

4. The range of values for the column(s) in which statistics were collected

5. The number of rows per value for the column(s) in which statistics were collected

6. The number of NULLs for the column(s) in which statistics were collected

The purpose of the COLLECT STATISTICS command is to gather and store demographic data for one or more columns or indices of a table or join index. This process computes a statistical profile of the collected data, and stores the synopsis in the Data Dictionary (DD) inside USER DBC for use during the PE's optimizing phase of SQL statement parsing. The optimizer uses this synopsis data to generate efficient table access and join plans. Do NOT COLLECT Statistics on all columns in the table.

What to COLLECT STATISTICS On?

You don't COLLECT STATISTICS on all columns and indexes because it takes up too much space for unnecessary reasons, but you do collect on:

• All Non-Unique indices

• Non-index join columns

• The Primary Index of small tables

• Primary Index of a Join Index

• Secondary Indices defined on any join index

• Join index columns that frequently appear on any additional join index columns that frequently appear in WHERE search conditions

• Columns that frequently appear in WHERE search conditions or in the WHERE clause of joins.

The first time you collect statistics you collect them at the index or column level. After that you just collect statistics at the table level and all previous columns collected previously are collected again. It is a mistake to collect statistics only once and then never do it again. COLLECT STATISTICS each time a table's data changes by 10%.

How do you know if Statistics were Collected on a Table?

Syntax: HELP Statistics <Table Name>

images

The HELP Statistics command will show you what statistics have been collected.

The Basic Syntax for COLLECT STATISTICS

Here is the syntax for collecting on columns and indexes.

images

Below are three actual examples

images

The COLLECT STATISTICS commands above are excellent examples.

COLLECT STATISTICS Examples for a better Understanding

images

The Official Syntax for COLLECT STATISTICS

images

Collect Statistics on a Table

COLLECT STATISTICS [ USING SAMPLE ]
   ON [ TEMPORARY ] { <table-name> | <join-index-name> | <hash-index-name>
}
  [ COLUMN { <column-name> | (<column-list>) }
     | [ UNIQUE ] INDEX { <index-name> [ ALL ] | (<column-list>) }
            [ ORDER BY { HASH | VALUES } [ <column-name> ] ] ] ;

images

Collect Statistics on an Index

COLLECT STATISTICS [ USING SAMPLE ]
  [ COLUMN { <column-name> | (<column-list>)}
     | [ UNIQUE ] INDEX { <index-name> [ ALL ] | (<column-list>)}
           [ ORDER BY { HASH | VALUES } [ <column-name> ] ] ]
ON [ TEMPORARY ] { <table-name> | <join-index-name> | <hash-index-name> } ;

How to Re-COLLECT STATISTICS on a Table

Here is the syntax for re-collecting statistics on a table.

COLLECT STATISTICS on <Tablename> ;

Below is an actual example

COLLECT STATISTICS on Employee_Table;

images

The first time you collect statistics you do it for each individual column or index that you want to collect on. When a table changes its data by 10% due to Inserts, Updates, or Deletes you merely use the command above and it re-collects on the same columns and indexes previously collected on.

How does the PE Plan if No Statistics were Collected?

If the Parsing Engine finds there are NO Statistics on a table being queried it runs a Random AMP Sample and then Guesstimates!

A Random AMP is selected for a Random Sample. Two things happen:

 

1) Indexes are sampled on the Random AMP and the PE estimates based on the total number of AMPs in the system.

 

2) If a column in the WHERE clause of the SQL is not an Index the PE assumes that 10% of the rows will come back. If two columns are in the WHERE clause then it assumes 7.5% of the rows will come back. If three columns are in the WHERE Clause it assumes 5%.

Today's Teradata systems always perform a Random AMP Sample even if tables have Statistics. Then they compare the Random AMP Sample with the statistics to determine if the statistics are stale.

A Random AMP sample is selected by the PE to perform a Random AMP Sample if the PE finds there are no statistics on the table. Bad distribution could be bad here!

How to Copy a Table with Data and the Statistics?

This next example is pretty amazing. Assume that the original Employee_Table had COLLECT STATISTICS on the columns Employee_No and Dept_No. The New table we have called Employee_Table_New will have DDL exactly like the Employee_Table plus data plus the statistics. Yes, the exact same statistics will be copied to the new table. Below is the actual example!

images

The example above will CREATE a new table called Employee_Table_New and it will have the exact same DDL as the Employee_Table, the exact same data, and the exact same statistics.

How to Copy a Table with NO Data and the Statistics?

This next example is clever. Assume that the original Employee_Table had COLLECT STATISTICS on the columns Employee_No and Dept_No. The New table we have called Employee_Table_99 will have DDL exactly like the Employee_Table but NO data. It will have the Statistics, but they will be Zeroed Statistics.

images

You have just Re-Collected Statistics on Employee_Table_99 for the columns Employee_No and Dept_No. The easy re-collection on the columns previously collected on (after the data is loaded) was the entire purpose of getting the Zeroed Statistics in the first place. Make sure you recollect after your data is loaded though!

When to COLLECT STATISTICS Using only a SAMPLE

You might consider Collecting Statistics with SAMPLE if:

images You are collecting statistics on a very large table.
images When collecting statistics becomes a problem with system performance or cost because the system is so busy.

Don't consider Collecting Statistics with SAMPLE if:

images The tables are small.
images To replace all existing full scan Collect Statistics.
images If the column's data is skewed badly.

COLLECT STATISTICS can be very time consuming because it performs a full table scan and then performs a lot of statistical calculations. Because Collect Statistics runs infrequently and benefits query optimization it is considered a necessary task. Without statistics, query performance will suffer. The bad news about sampled statistics is that they may not be as accurate, which could negatively affect the PE's plans. In most cases sampled statistics are better than no statistics. Don't use Sample unless necessary!

Examples of COLLECT STATISTICS Using only a SAMPLE

images

If you recollect statistics on a sample it recollects with the same sample!

Sampled statistics are generally more accurate for data that is not skewed. For example, columns or indexes that are unique or nearly unique are not skewed. Because the PE needs to be aware of skewed data you should not collect with sample on skewed data. That is why sampling is generally more appropriate for indexes than non-indexed column(s). If you recollect statistics on a Sample it Recollects with the same Sample!

How to Collect Statistics on a PPI Table on the Partition

Here is the syntax for collecting statistics on a PPI table.

COLLECT STATISTICS on <Tablename> COLUMN PARTITION;

Below is an actual example

COLLECT STATISTICS on Order_Table_PPI COLUMN PARTITION;

Three reasons to Collect on the Partition:

images The Parsing Engine will have a better plan for PPI Tables.
images This helps the most with Partition Elimination on Range Queries.
images This is especially helpful when a table has a lot of empty partitions.

The Parsing Engine can use this information to better estimate the query cost when there are a significant number of empty partitions. If PARTITION statistics are not collected, empty partitions may cause the Parsing Engine to underestimate the number of rows in a partition. You shouldn't use WITH SAMPLE to collect on Partitions.

Teradata V12 and V13 Statistics Enhancements

In V12 Extrapolate Statistics is designed to more accurately provide for a statistical estimate for date range-based queries that specify a “future” date that is outside the bounds of the current statistics. This results in less re-collections.

In V12 Stale Statistics Detection compares the Random AMP Sample with the statistics collected and determines if they are stale, and should not be used.

In V13 Statistics can now be collected on Volatile Tables.

In V13 PARTITION statistic capabilities have been added to Global Temporary Tables.

In V13 Multi-Column statistics are now available on Hash Indexes and Join Indexes.

In V13 Sample Statistics are available on Tables, Volatile Tables, Global Temporary Tables, Hash Indexes and Join Indexes, including the Partition Columns.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset