Chapter 21

OLAP Functions

“The best we can do is size up the chances, calculate the risks involved, estimate our ability to deal with them, and then make our plans with confidence.”

– Henry Ford

Table of Contents Chapter 21- OLAP Functions

On-Line Analytical Processing (OLAP) or Ordered Analytics

Cumulative Sum (CSUM) Command and how OLAP Works

OLAP Commands always Sort (ORDER BY) in the Command

Calculate the Cumulative Sum (CSUM) after Sorting the Data

The OLAP Major Sort Key

The OLAP Major Sort Key and the Minor Sort Key(s)

Troubleshooting OLAP – My Data isn't coming back Correct

GROUP BY in Teradata OLAP Syntax Resets on the Group

CSUM the Number 1 to get a Sequential Number

A Single GROUP BY Resets each OLAP with Teradata Syntax

A Better Choice – The ANSI Version of CSUM

The ANSI Version of CSUM – The Sort Explained

The ANSI CSUM – Rows Unbounded Preceding Explained

The ANSI CSUM – Making Sense of the Data

The ANSI CSUM – Making Even More Sense of the Data

The ANSI CSUM – The Major and Minor Sort Key(s)

The ANSI CSUM – Getting a Sequential Number

Troubleshooting The ANSI OLAP on a GROUP BY

The ANSI OLAP – Reset with a PARTITION BY Statement

PARTITION BY only Resets a Single OLAP not ALL of them

The Moving Average (MAVG) and Moving Window

How the Moving Average is Calculated

How the Sort works for Moving Average (MAVG)

GROUP BY in the Moving Average does a Reset

Quiz – Can you make the Advanced Calculation in your mind?

Answer to Quiz for the Advanced Calculation in your mind?

Quiz – Write that Teradata Moving Average in ANSI Syntax

Both the Teradata Moving Average and ANSI Version

The ANSI Moving Window is Current Row and Preceding

How ANSI Moving Average Handles the Sort

Quiz – How is that Total Calculated

Answer to Quiz – How is that Total Calculated?

Quiz – How is that 4th Row Calculated?

Answer to Quiz – How is that 4th Row Calculated?

Moving Average every 3-rows Vs a Continuous Average

Partition By Resets an ANSI OLAP

The Moving Difference (MDIFF)

Moving Difference (MDIFF) Visual

Moving Difference using ANSI Syntax

Moving Difference using ANSI Syntax with Partition By

Trouble Shooting the Moving Difference (MDIFF)

The RANK Command

How to get Rank to Sort in Ascending Order

Two ways to get Rank to Sort in Ascending Order

RANK using ANSI Syntax Defaults to Ascending Order

Getting RANK using ANSI Syntax to Sort in DESC Order

RANK() OVER and PARTITION BY with a QUALIFY

QUALIFY and WHERE

Quiz – How can you simplify the QUALIFY Statement

Answer to Quiz –Can you simplify the QUALIFY Statement

The QUALIFY Statement without Ties

The QUALIFY Statement with Ties

The QUALIFY Statement with Ties Brings back Extra Rows

Mixing Sort Order for QUALIFY Statement

Quiz – What Caused the RANK to Reset?

Answer to Quiz – What Caused the RANK to Reset?

Quiz – Name those Sort Orders

Answer to Quiz – Name those Sort Orders

PERCENT_RANK() OVER

PERCENT_RANK() OVER with 14 rows in Calculation

PERCENT_RANK() OVER with 21 rows in Calculation

Quiz – What Cause the Product_ID to Reset

Answer to Quiz – What Cause the Product_ID to Reset

COUNT OVER for a Sequential Number

Troubleshooting COUNT OVER

Quiz – What caused the COUNT OVER to Reset?

Answer to Quiz – What caused the COUNT OVER to Reset?

The MAX OVER Command

MAX OVER with PARTITION BY Reset

Troubleshooting MAX OVER

The MIN OVER Command

Troubleshooting MIN OVER

Quiz – Fill in the Blank

Answer to Quiz – Fill in the Blank

The Row_Number Command

Quiz – How did the Row_Number Reset?

Testing Your Knowledge

Testing Your Knowledge

Testing Your Knowledge

Testing Your Knowledge

Testing Your Knowledge

Testing Your Knowledge

On-Line Analytical Processing (OLAP) or Ordered Analytics

image

OLAP is often called Ordered Analytics because the first thing every OLAP does before any calculating is SORT all the rows. The query above sorts by Sale_Date!

Cumulative Sum (CSUM) Command and how OLAP Works

SELECT   Product_ID , Sale_Date, Daily_Sales
         ,CSUM(Daily_Sales, Sale_Date) AS “CSum”
FROM Sales_Table ;

image

OLAP always sorts first and then is in a position to calculate starting with the first sorted row and continuing to the last sorted row, thus calculating all Daily_Sales.

OLAP Commands always Sort (ORDER BY) in the Command

SELECT    Product_ID , Sale_Date, Daily_Sales
                  ,CSUM(Daily_Sales, Sale_Date) AS “CSum”
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

image

Calculate the CSUM starting with the first sorted row and go to the last.

Once the data is first sorted by Sale_Date then phase 2 is ready and the OLAP calculation can be performed on the sorted data. Day 1 we made 48850.40! Add the next row's Daily_Sales to get a Cumulative Sum (CSUM) to get 90739.28!

Calculate the Cumulative Sum (CSUM) after Sorting the Data

SELECT   Product_ID , Sale_Date, Daily_Sales
        CSUM(Daily_Sales, Sale_Date) AS “CSUM”
FROM       Sales Table WHERE Product_ID BETWEEN 1000 and 2000

Not all rows are displayed in this answer set

image

This is our first OLAP known as a CSUM. Right now, the syntax wants to see the cumulative sum of the Daily_Sales sorted by Sale_Date. The first thing the above query does before calculating is SORT all the rows on Sale_Date.

The OLAP Major Sort Key

SELECT Product_ID , Sale_Date, Daily_Sales,
        CSUM(Daily_Sales, Sale_Date) AS “CSum”
FROM Sales Table WHERE Product_ID BETWEEN 1000 and 2000

Not all rows are displayed in this answer set

image

In a CSUM, the second column listed is always the major SORT KEY. The SORT KEY in the above query is Sale_Date. Notice again the answer set is sorted by this. After the sort has finished the CSUM is calculated starting with the first sorted row till the end.

The OLAP Major Sort Key and the Minor Sort Key(s)

image

Product_ID is the MAJOR sort key and Sale_Date is the MINOR Sort key above.

Troubleshooting OLAP – My Data isn't coming back Correct

SELECT      Product_ID , Sale_Date, Daily_Sales,
           CSUM(Daily_Sales, Product_ID, Sale_Date) AS “CSum'
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000

ORDER BY Daily_Sales;

image

image

The first thing every OLAP does is SORT. That means you should NEVER put an ORDER BY at the end. It will mess up the ENTIRE result set.

GROUP BY in Teradata OLAP Syntax Resets on the Group

image

image

The GROUP BY Statement cause the CSUM to start over (reset) on its calculating the cumulative sum of the Daily_Sales each time it runs into a NEW Product_ID.

CSUM the Number 1 to get a Sequential Number

image

With “Seq_Number”, because you placed the number 1 in the area where it calculates, it will continuously add 1 to the answer for each row.

A Single GROUP BY Resets each OLAP with Teradata Syntax

image

What does the GROUP BY Statement cause? Both OLAP Commands to reset!

A Better Choice – The ANSI Version of CSUM

image

Not all rows are displayed in this answer set

image

This ANSI version of CSUM is SUM() Over. Right now, the syntax wants to see the sum of the Daily_Sales after it is first sorted by Sale_Date. Rows Unbounded Preceding makes this a CSUM. The ANSI Syntax seems difficult, but only at first.

The ANSI Version of CSUM – The Sort Explained

SELECT Product_ID , Sale_Date, Daily_Sales,
               SUM(Daily_Sales) OVER (ORDER BY Sale_Date
               ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;

Not all rows are displayed in this answer set

image

The first thing the above query does before calculating is SORT all the rows by Sale_Date. The Sort is located right after the ORDER BY.

The ANSI CSUM – Rows Unbounded Preceding Explained

SELECT Product_ID , Sale_Date, Daily_Sales,
               SUM(Daily_Sales) OVER (ORDER BY Sale_Date
               ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;

Not all rows are displayed in this answer set

image

The keywords Rows Unbounded Preceding determines that this is a CSUM. There are only a few different statements and Rows Unbounded Preceding is the main one. It means start calculating at the beginning row and continue calculating until the last row..

The ANSI CSUM – Making Sense of the Data

SELECT  Product_ID , Sale_Date, Daily_Sales,
          SUM(Daily_Sales) OVER (ORDER BY Sale_Date
          ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;

Not all rows are displayed in this answer set

image

The second “SUMOVER” row is 90739.28. That is derived by the first row's Daily_Sales (41888.88) added to the SECOND row's Daily_Sales (48850.40).

The ANSI CSUM – Making Even More Sense of the Data

SELECT Product_ID , Sale_Date, Daily_Sales,
    SUM(Daily_Sales) OVER (ORDER BY Sale_Date
    ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;

Not all rows are displayed in this answer set

image

The third “SUMOVER” row is 138739.28. That is derived by taking the first row's Daily_Sales (41888.88) and adding it to the SECOND row's Daily_Sales (48850.40). Then you add that total to the THIRD row's Daily_Sales (48000.00).

The ANSI CSUM – The Major and Minor Sort Key(s)

SELECT Product_ID , Sale_Date, Daily_Sales,
    SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
    ROWS UNBOUNDED PRECEDING) AS SumOVER
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

You can have more than one SORT KEY. In the top query, Product_ID is the MAJOR Sort and Sale_Date is the MINOR Sort.

The ANSI CSUM – Getting a Sequential Number

image

Not all rows are displayed in this answer set

image

With “Seq_Number”, because you placed the number 1 in the area which calculates the cumulative sum, it'll continuously add 1 to the answer for each row.

Troubleshooting The ANSI OLAP on a GROUP BY

SELECT Product_ID , Sale_Date, Daily_Sales,
        SUM(Daily_Sales) OVER (ORDER BY Sale_Date
           ROWS UNBOUNDED PRECEDING) AS SUMOVER
FROM Sales_Table

GROUP BY Product_ID ;

Error! Why?

Never GROUP BY in a SUM()Over or with any ANSI Syntax OLAP command. If you want to reset you use a PARTITION BY Statement, but never a GROUP BY.

The ANSI OLAP – Reset with a PARTITION BY Statement

SELECT Product_ID , Sale_Date, Daily_Sales,
    SUM(Daily_Sales) OVER (PARTITION BY Product_ID
    ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS SumANSI
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

The PARTITION Statement is how you reset in ANSI. This will cause the SUMANSI to start over (reset) on its calculating for each NEW Product_ID.

PARTITION BY only Resets a Single OLAP not ALL of them

SELECT Product_ID , Sale_Date, Daily_Sales,
  SUM(Daily_Sales) OVER (PARTITION BY Product_ID
  ORDER BY Product_ID, Sale_Date
  ROWS UNBOUNDED PRECEDING) AS Subtotal,
    SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS GRANDTotal
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

Above are two OLAP statements. Only one has PARTITION BY so only it resets.

The Moving Average (MAVG) and Moving Window

image

This is the Moving Average (MAVG). It will calculate the average of 3 rows because that is the Moving Window. It will read the current row and TWO preceding to find the MAVG of those 3 rows. It will be sorted by Product_ID and Sale_Date first.

How the Moving Average is Calculated

SELECT Product_ID , Sale_Date, Daily_Sales,
    MAVG( Daiiy_Saies, 3, Product_ID, Sale_Date) AS AVG3_Rows
FROM Sales_Table

Not all rows are displayed in this answer set

image

With a Moving Window of 3, how is the 46450.23 amount derived in the AVG3_Rows column in the third row? It is the AVG of 48850.40, 54500.22 and 36000.07! The fourth row has AVG3_Rows equal to 43566.91. That was the average of 54500.22, 360000.07 and 40200.43. The calculation is on the current row and the two before.

How the Sort works for Moving Average (MAVG)

image

The sorting is show above.

GROUP BY in the Moving Average does a Reset

image

What does the GROUP BY Product_ID do? It causes a reset on all Product_ID breaks.

Quiz – Can you make the Advanced Calculation in your mind?

image

How is the 44944.44 derived in the 9th row of the AVG_for_3_Rows?

Answer to Quiz for the Advanced Calculation in your mind?

image

Notice there are only two calculations although this has a moving window of 3. That is because the GROUP BY caused the MAVG to reset when Product_ID 2000 came.

Quiz – Write that Teradata Moving Average in ANSI Syntax

SELECT   Product_ID , Sale_Date, Daily_Sales,
   MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG3_Rows
FROM Sales_Table ;

Challenge

Can you place another equivalent Moving Average in the SQL above using ANSI Syntax?

Here is a challenge that almost everyone fails. Can you do it perfectly?

Both the Teradata Moving Average and ANSI Version

SELECT Product_ID , Sale_Date, Daily_Sales,
    MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG_3,
    AVG(Daily_Sales) OVER (ORDER BY Product_ID,
    Sale_Date ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales Table ;

Not all rows are displayed in this answer set

image

The MAVG and AVG(Over) commands above are equivalent. Notice the Moving Window of 3 in the Teradata syntax and that it is a 2 in the ANSI version. That is because in ANSI it is considered the Current Row and 2 preceding.

The ANSI Moving Window is Current Row and Preceding

image

The AVG () Over allows you to do is to get the moving average of a certain column.

How ANSI Moving Average Handles the Sort

image

Much like the SUM OVER Command, the Average OVER places the sort after the ORDER BY.

Quiz – How is that Total Calculated?

SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

With a Moving Window of 3, how is the 46450.23 amount derived in the AVG_3_ANSI column in the third row?

Answer to Quiz – How is that Total Calculated?

SELECT Product_ID , Sale_Date, Daily_Sales,
   AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales Table;

Not all rows are displayed in this answer set

image

AVG of 48850.40, 54500.22, and 36000.07

Quiz – How is that 4th Row Calculated?

SELECT Product_ID , Sale_Date, Daily_Sales,
   AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
    ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table;

Not all rows are displayed in this answer set

image

With a Moving Window of 3, how is the 43566.91 amount derived in the AVG_3_ANSI column in the fourth row?

Answer to Quiz – How is that 4th Row Calculated?

SELECT Product_ID , Sale_Date, Daily_Sales,
   AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
    ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table;

Not all rows are displayed in this answer set

image

AVG of 54500.22, 36000.07, and 40200.43

Moving Average every 3-rows Vs a Continuous Average

SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
    ROWS 2 Preceding) AS AVG3,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;

Not all rows are displayed in this answer set

image

The ROWS 2 Preceding gives the MAVG for every 3 rows. The ROWS UNBOUNDED Preceding gives the continuous MAVG.

Partition By Resets an ANSI OLAP

image

Not all rows are displayed in this answer set

image

Use a PARTITION BY Statement to Reset the ANSI OLAP.

The Moving Difference (MDIFF)

image

This is the Moving Difference (MDIFF). What this does is calculate the difference between the current row and only the 4th row preceding.

Moving Difference (MDIFF) Visual

SELECT Product_ID , Sale_Date, Daily_Sales,
   MDIFF(Daily_Sales, 4, Product_ID, Sale_Date) as “MDiff”
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

How much more did we make for Product_ID 1000 on 2000-10-03 versus Product_ID 1000 which was 4 rows earlier on 2000-09-29?

Moving Difference using ANSI Syntax

SELECT Product_ID, Sale_Date, Daily_Sales,
   Daily_Sales – SUM(Daily_Sales)
    OVER ( ORDER BY Product_ID ASC, Sale_Date ASC
    ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING)
     AS “MDiff_ANSI”
FROM Sales Table ;

Not all rows are displayed in this answer set

image

This is how you do a MDiff using the ANSI Syntax with a moving window of 4.

Moving Difference using ANSI Syntax with Partition By

SELECT Product_ID, Sale_Date (Format 'yyyy-mm-dd'), Daily_Sales,
Daily_Sales – SUM(Daily_Sales) OVER (PARTITION BY Product_ID
   ORDER BY Product_ID ASC, Sale_Date ASC
    ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING)
AS “MDiff_ANSI”
FROM Sales_Table;

Not all rows are displayed in this answer set

image

Wow! This is how you do a MDiff using the ANSI Syntax with a PARTITION BY.

Trouble Shooting the Moving Difference (MDIFF)

image

Do you notice that column Compare2Rows did not produce any data? That is because the GROUP BY Reset before it could get 7 records to find the MDIFF.

The RANK Command

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales) AS “Rank”
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

This is the RANK. In this example, it will rank your Daily_Sales from greatest to least. The default for this type of RANK is to sort DESC.

How to get Rank to Sort in Ascending Order

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales ASC) AS “Rank”
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

This RANK query sorts in Ascending mode.

Two ways to get Rank to Sort in Ascending Order

SELECT   Product_ID ,Sale_Date , Daily_Sales,
   RANK(Daily_Sales ASC) AS Rank1,
RANK(-Daily_Sales)   AS Rank2
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

A minus sign or keyword ASC will sort Both RANK in Ascending mode.

RANK using ANSI Syntax Defaults to Ascending Order

SELECT   Product_ID ,Sale_Date , Daily_Sales,
   RANK() OVER (ORDER BY Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)

Not all rows are displayed in this answer set

image

This is the RANK() OVER. It provides a rank for your queries. Notice how you do not place anything within the () after the word RANK. Default Sort is ASC.

Getting RANK using ANSI Syntax to Sort in DESC Order

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK() OVER (ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

Not all rows are displayed in this answer set

image

Is the query above in ASC mode or DESC mode for sorting?

RANK() OVER and PARTITION BY with a QUALIFY

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK() OVER (PARTITION BY Product_ID
    ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 4

image

What does the PARTITION Statement in the RANK() OVER do? It resets the rank. The QUALIFY statement limits rows once the Rank's been calculated.

QUALIFY and WHERE

SELECT   Product_ID ,Sale_Date , Daily_Sales,
     RANK(Daily_Sales ASC) AS Rank1,
     RANK(-Daily_Sales)   AS Rank2
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank(-Daily_Sales) < 6 ;

image

The WHERE statement is performed first. It limits the rows being calculated. Then the QUALIFY takes the calculated rows and limits the returning rows. QUALIFY is to OLAP what HAVING is to Aggregates. Both limit after the calculations.

Quiz – How can you simplify the QUALIFY Statement

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK(Daily_Sales ASC) AS Rank1,
RANK(-Daily_Sales)   AS Rank2
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank(-Daily_Sales) < 6 ;

image

How can you improve the QUALIFY Statement above for simplicity?

Answer to Quiz –Can you simplify the QUALIFY Statement

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK(Daily_Sales ASC) AS Rank1,
   RANK(-Daily_Sales)   AS Rank2
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank2 < 6 ;

image

QUALIFY Rank2 < 6 (Use the Alias)

The QUALIFY Statement without Ties

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 6 ;

image

A QUALIFY < 6 will provide a result that is 5 rows. Notice there are NO ties, yet!

The QUALIFY Statement with Ties

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales ASC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 6 ;

image

A QUALIFY < 6 will provide a result that is 5 rows. Notice there are Ties!

The QUALIFY Statement with Ties Brings back Extra Rows

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales ASC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 2 ;

image

A QUALIFY < 2 will provide more rows than 1 because of the Ties!

Mixing Sort Order for QUALIFY Statement

image

image

Look at the Rankings and the Daily_Sales. This data come out odd because Rank1 is DESC by default (using this Syntax) and the QUALIFY specifies ASC mode.

Quiz – What Caused the RANK to Reset?

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
GROUP BY Product_ID
QUALIFY Rank1 < 4 ;

image

What caused the data to reset the column Rank1?

Answer to Quiz – What Caused the RANK to Reset?

SELECT   Product_ID ,Sale_Date , Daily_Sales,
    RANK(Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
GROUP BY Product_ID
QUALIFY Rank1 < 4 ;

image

    GROUP BY

Quiz – Name those Sort Orders

RANK() OVER (ORDER BY Daily_Sales) AS ANSI_Rank

Is the default above ASC or DESC?

RANK(Daily_Sales) AS NON_ANSI_Rank

Is the default above ASC or DESC?

Answer the questions above.

Answer to Quiz – Name those Sort Orders

RANK() OVER (ORDER BY Daily_Sales) AS ANSI_Rank

Defaults to ASC

RANK(Daily_Sales) AS NON_ANSI_Rank

Defaults to DESC

Please note that by default these different syntaxes sort completely opposite.

PERCENT_RANK() OVER

SELECT Product_ID ,Sale_Date , Daily_Sales,
   PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
   ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;

image

7 Rows in Calculation for 1000 Product_ID

7 Rows in Calculation for 2000 Product_ID

We now have added a Partition statement which produces 7 rows per Product_ID.

PERCENT_RANK() OVER with 14 rows in Calculation

SELECT Product_ID ,Sale_Date , Daily_Sales,
   PERCENT_RANK() OVER ( ORDER BY Daily_Sales DESC)
    AS PercentRank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

14 Rows in Calculation for both the 1000 and 2000 Product_IDs

Percentage_Rank is just like RANK however, it gives you the Rank as a percent, but only a percent of all the other rows up to 100%.

PERCENT_RANK() OVER with 21 rows in Calculation

SELECT Product_ID ,Sale_Date , Daily_Sales,
   PERCENT_RANK() OVER ( ORDER BY Daily_Sales DESC)
    AS PercentRank1
FROM Sales_Table ;

Not all rows are displayed in this answer set

image

21 Rows in Calculation for all of the Product_IDs

Percentage_Rank is just like RANK however, it gives you the Rank as a percent, but only a percent of all the other rows up to 100%.

Quiz – What Cause the Product_ID to Reset

SELECT Product_ID ,Sale_Date , Daily_Sales,
   PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
   ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;

image

What caused the Product_IDs to be sorted?

Answer to Quiz – What Cause the Product_ID to Reset

SELECT Product_ID ,Sale_Date , Daily_Sales,
   PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
   ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;

image

PARTITION BY caused the data to be sorted!

COUNT OVER for a Sequential Number

SELECT Product_ID ,Sale_Date , Daily_Sales,
   COUNT(*) OVER (ORDER BY Product_ID, Sale_Date
   ROWS UNBOUNDED PRECEDING) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

Not all rows are displayed in this answer set

image

This is the COUNT OVER. It will provide a sequential number starting at 1. The Keyword(s) ROWS UNBOUNDED PRECEDING causes Seq_Number to start at the beginning and increase sequentially to the end.

Troubleshooting COUNT OVER

SELECT   Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (ORDER BY Product_ID, Sale_Date) AS No_Seq
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

Rows Unbounded Preceding is missing in this statement.

image

  14 rows came back

When you don't have a ROWS UNBOUNDED PRECEDING, No_Seq get a value of 14 on every row. Why? Because 14 is the FINAL COUNT NUMBER.

Quiz – What caused the COUNT OVER to Reset?

SELECT Product_ID ,Sale_Date , Daily_Sales,
   COUNT(*) OVER (PARTITION BY Product_ID
    ORDER BY Product_ID, Sale_Date
   ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

What Keyword(s) caused StartOver to reset?

Answer to Quiz – What caused the COUNT OVER to Reset?

SELECT Product_ID ,Sale_Date , Daily_Sales,
   COUNT(*) OVER (PARTITION BY Product_ID
    ORDER BY Product_ID, Sale_Date
   ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

PARTITION BY

The MAX OVER Command

SELECT Product_ID ,Sale_Date , Daily_Sales,
   MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
   ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

After the sort the Max() Over shows the Max Value up to that point.

MAX OVER with PARTITION BY Reset

SELECT Product_ID ,Sale_Date , Daily_Sales,
   MAX(Daily_Sales) OVER (PARTITION BY Product_ID
    ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

Not all rows are displayed in this answer set

The largest value is 64300.00 in the column MaxOver. Once it was evaluated it did not continue until the end because of the PARTITION BY reset.

Troubleshooting MAX OVER

SELECT Product_ID ,Sale_Date , Daily_Sales,
 MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date )
    AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

Rows Unbounded Preceding is missing in this statement.

Not all rows are displayed in this answer set

image

You can also use MAX as a OLAP. 64300.00 came back in MaxOver because that was the MAX value for Daily_Sales in this Answer Set. Notice that it doesn't have a ROWS UNBOUNDED PRECEDING.

The MIN OVER Command

SELECT Product_ID, Sale_Date ,Daily_Sales
   ,Min(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
    ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

After the sort the MIN () Over shows the Max Value up to that point.

Troubleshooting MIN OVER

SELECT Product_ID ,Sale_Date , Daily_Sales,
   MIN(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date )
    AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

Rows Unbounded Preceding is missing in this statement.

Not all rows are displayed in this answer set

image

Min only displayed 32800.50 because there is NOT a ROWS UNBOUNDED PRECEDING statement so it found the lowest Daily_Sales and repeated it.

Quiz – Fill in the Blank

SELECT Product_ID ,Sale_Date , Daily_Sales,
   Min(Daily_Sales) OVER (PARTITION BY Product_ID
   ORDER BY Product_ID, Sale_Date
   ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

The last two answers (MinOver) are blank so you can fill in the blank.

Answer to Quiz – Fill in the Blank

SELECT Product_ID ,Sale_Date , Daily_Sales,
   Min(Daily_Sales) OVER (PARTITION BY Product_ID
   ORDER BY Product_ID, Sale_Date
    ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

The Row_Number Command

SELECT Product_ID ,Sale_Date , Daily_Sales,
   ROW_NUMBER() OVER (ORDER BY Product_ID, Sale_Date)
   AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

Not all rows are displayed in this answer set

image

The ROW_NUMBER() Keyword(s) caused Seq_Number to increase sequentially. Notice that this does NOT have a Rows Unbounded Preceding and it still works!

Quiz – How did the Row_Number Reset?

SELECT Product_ID ,Sale_Date , Daily_Sales,
   ROW_NUMBER() OVER (PARTITION BY Product_ID
    ORDER BY Product_ID, Sale_Date ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

What Keyword(s) caused StartOver to reset?

Quiz – How did the Row_Number Reset?

SELECT Product_ID ,Sale_Date , Daily_Sales,
   ROW_NUMBER() OVER (PARTITION BY Product_ID
   ORDER BY Product_ID, Sale_Date ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;

image

What Keyword(s) caused StartOver to reset?

PARTITION BY

Testing Your Knowledge

SELECT Product_ID , Sale_Date, Daily_Sales,
   CSUM(Daily_Sales, Product_ID, Sale_Date) AS “CSum”
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000
GROUP BY Product_ID ;

This is the CSUM. However, what we want to see is the Sum()Over ANSI version. Use the information in the CSUM and convert this to the equivalent Sum()Over.

Testing Your Knowledge

SELECT Product_ID , Sale_Date, Daily_Sales,
   CSUM(Daily_Sales, Product_ID, Sale_Date) AS “CSum”
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000
GROUP BY Product_ID ;

SELECT Product_ID , Sale_Date, Daily_Sales,
   SUM(Daily_Sales) OVER (PARTITION BY Product_ID
   ORDER BY Product_ID, Sale_Date
   ROWS UNBOUNDED PRECEDING) AS SumANSI
FROM Sales_Table
WHERE Product_ID BETWEEN 1000 and 2000 ;

Both statements are exactly the same except the bottom example uses ANSI syntax.

Testing Your Knowledge

SELECT Product_ID , Sale_Date, Daily_Sales,
 MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG_for_3_Rows
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;

Write the equivalent to the SQL above using ANSI Syntax such as AVG () Over.

Testing Your Knowledge

SELECT Product_ID , Sale_Date, Daily_Sales,
MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG_for_3_Rows
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ;

SELECT Product_ID , Sale_Date, Daily_Sales,
 AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
   ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000

The SQL above is equivalent except the bottom example uses ANSI Syntax.

Testing Your Knowledge

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK(Daily_Sales) AS “Rank”
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

This is the Rank. However, what we want to see is the RANK()Over. Use the information in the Rank to make it the Rank()Over.

Testing Your Knowledge

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK(Daily_Sales) AS “Rank”
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;

SELECT Product_ID ,Sale_Date , Daily_Sales,
   RANK() OVER (ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)

The SQL above is equivalent except the bottom example uses ANSI Syntax. Also notice the sort key. DESC is the default in the top example.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset