“Don’t count the days, make the days count.”
– Mohammed Ali
OLAP is often called Ordered Analytics because the first thing every OLAP does before any calculating is SORT all the rows. The query above sorts by Sale_Date and then does a cumulative sum on the Daily_Sales column.
SELECT |
Product_ID , Sale_Date, Daily_Sales |
|
,CSUM(Daily_Sales, Sale_Date) AS "CSum" |
FROM |
Sales_Table ; |
OLAP always sorts the rows first. That is why they are called Ordered Analytics. They order the data first and then calculate the rows once the sort has taken place. This CSUM will calculate the first sorted row and continuing to the last sorted row, thus calculating the cumulative sum for all Daily_Sales.
SELECT |
Product_ID , Sale_Date, Daily_Sales |
|
,CSUM(Daily_Sales, Sale_Date) AS "CSum" |
FROM |
Sales_Table ; |
Once the data is first sorted by Sale_Date then phase 2 is ready and the OLAP calculation can be performed on the sorted data. Day 1 we made 48850.40! Add the next row’s Daily_Sales to get a Cumulative Sum (CSUM) to get 90739.28!
In a CSUM, the second column listed is always the major SORT KEY. The SORT KEY in the above query is Sale_Date. Notice again the answer set is sorted by this. After the sort has finished the CSUM is calculated starting with the first sorted row till the end.
Product_ID is the MAJOR sort key and Sale_Date is the MINOR Sort key above.
The first thing every OLAP does is SORT. That means you should NEVER put an ORDER BY at the end. It will mess up the ENTIRE result set. The data rows above are calculated correctly, but the reordering makes the data look wrong.
The GROUP BY Statement cause the CSUM to start over (reset) on its calculating the cumulative sum of the Daily_Sales each time it runs into a NEW Product_ID.
With “Seq_Number”, because you placed the number 1 in the area where it calculates, it will continuously add 1 to the answer for each row.
What does the GROUP BY Statement cause? Both OLAP Commands to reset! All Teradata OLAP commands reset simultaneously with one GROUP BY statement. We will soon learn about the ANSI version, which is much better.
This ANSI version of CSUM is SUM() Over. Right now, the syntax wants to see the sum of the Daily_Sales after it is first sorted by Sale_Date. Rows Unbounded Preceding means to start calculating from the first row, and continue calculating until the last row. This Rows Unbounded Preceding makes this a CSUM. The ANSI Syntax seems difficult, but only at first.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS CsumAnsi
FROM Sales_Table
WHERE Product_ID BETWEEN 1000 and 2000 ;
The first thing the above query does before calculating is SORT all the rows by Sale_Date. The Sort is located right after the ORDER BY.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS CsumAnsi
FROM Sales_Table
WHERE Product_ID BETWEEN 1000 and 2000 ;
The keywords Rows Unbounded Preceding determines that this is a CSUM. There are only a few different statements and Rows Unbounded Preceding is the main one. It means start calculating at the beginning row, and continue calculating until the last row.
SELECT |
Product_ID , Sale_Date, Daily_Sales, |
|
SUM(Daily_Sales) OVER (ORDER BY Sale_Date |
|
ROWS UNBOUNDED PRECEDING) AS SUMOVER |
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ; |
The second “SUMOVER” row is 90739.28. That is derived by the first row’s Daily_Sales (41888.88) added to the SECOND row’s Daily_Sales (48850.40).
SELECT |
Product_ID , Sale_Date, Daily_Sales, |
|
SUM(Daily_Sales) OVER (ORDER BY Sale_Date |
|
ROWS UNBOUNDED PRECEDING) AS SUMOVER |
FROM Sales_Table WHERE Product_ID BETWEEN 1000 and 2000 ; |
The third “SUMOVER” row is 138739.28. That is derived by taking the first row’s Daily_Sales (41888.88) and adding it to the SECOND row’s Daily_Sales (48850.40). Then, you add that total to the THIRD row’s Daily_Sales (48000.00).
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS SumOVER
FROM Sales_Table ;
You can have more than one SORT KEY. In the top query, Product_ID is the MAJOR Sort, and Sale_Date is the MINOR Sort.
With “Seq_Number”, because you placed the number 1 in the area which calculates the cumulative sum, it’ll continuously add 1 to the answer for each row.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Sale_Date
ROWS UNBOUNDED PRECEDING) AS AnsiCsum
FROM Sales_Table
GROUP BY Product_ID ;
Error! Why?
Never GROUP BY in a SUM()Over or with any ANSI Syntax OLAP command. If you want to reset you use a PARTITION BY Statement, but never a GROUP BY.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS SumANSI
FROM Sales_Table ;
The PARTITION Statement is how you reset in ANSI. This will cause the SUMANSI to start over (reset) on its calculating for each NEW Product_ID.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS Subtotal,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS GrandTotal
FROM Sales_Table ;
Above are two OLAP statements. Only one has PARTITION BY, so only it resets.
This is the Moving Sum (MSUM). It will calculate the Sum of 3 rows because that is the Moving Window. It will read the current row and TWO preceding to find the MSUM of those 3 rows. It will be sorted by Product_ID and Sale_Date first.
SELECT Product_ID , Sale_Date, Daily_Sales,
MSUM( Daily_Sales, 3, Product_ID, Sale_Date) as MSum3_Rows
FROM Sales_Table
With a Moving Window of 3, how is the 139350.69 amount derived in MSum3_Rows, which is in the third row? It is the SUM of 48850.40, 54500.22 and 36000.07. The fourth row has MSum3_Rows equal to 130700.72. That was the Sum of 54500.22, 360000.07 and 40200.43. The MSum is the current row sum plus the previous two.
The sorting above is a major and minor sort. The major sort is on the product_id column and that sort is done first. For any duplicate product_id rows the minor sort takes effect. Notice that all of the product_id 1000's are further sorted by sale_date.
What does the GROUP BY Product_ID do? It causes a reset on all Product_ID breaks.
How is the 89888.88 derived in the 9th row of the MSum3_Rows?
The GROUP BY reset the column to start over when Product_ID went to 2000. The 89888.88 is the sum of 41888.88 and 48000.00.
SELECT Product_ID , Sale_Date, Daily_Sales,
MSUM( Daily_Sales, 3, Product_ID, Sale_Date) AS MSum3
FROM Sales_Table ;
Challenge
Can you place
another equivalent
Moving Sum
in the SQL above
using ANSI Syntax?
Here is a challenge that almost everyone fails. Can you do it perfectly?
SELECT Product_ID , Sale_Date, Daily_Sales,
MSUM( Daily_Sales, 3, Product_ID, Sale_Date) AS MSum3,
SUM(Daily_Sales) OVER (ORDER BY Product_ID,
Sale_Date ROWS 2 Preceding) AS SUM3_ANSI
FROM Sales_Table ;
Not all rows are displayed in this answer set
The MSUM and SUM (Over) commands above are equivalent. Notice the Moving Window of 3 in the Teradata syntax is a 2 in the ANSI version. That is because in ANSI the moving window is considered the Current Row and 2 preceding.
The SUM () Over allows you to get the moving SUM of a certain column. The moving window in ANSI form always includes the current row. A Rows 2 Preceding statement means the current row and two preceding, which is a moving window of 3.
The SUM OVER places the sort after the ORDER BY.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS Sum3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 139350.69 amount derived in the Sum3_ANSI column in the third row?
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS Sum3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 139350.69 amount derived in the Sum3_ANSI column in the third row? It is the sum of 48850.40, 54500.22 and 36000.07. The current row of Daily_Sales plus the previous two rows of Daily_Sales.
SELECT Product_ID , Sale_Date, Daily_Sales,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS SUM3,
SUM(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;
Not all rows are displayed in this answer set
The ROWS 2 Preceding gives the MSUM for every 3 rows. The ROWS UNBOUNDED Preceding gives the continuous MSUM.
Use a PARTITION BY Statement to Reset the ANSI OLAP. Notice it only resets the OLAP command containing the Partition By statement, but not the other OLAPs.
This is the Moving Average (MAVG). It will calculate the average of 3 rows because that is the Moving Window. It will read the current row and TWO preceding to find the MAVG of those 3 rows. It will be sorted by Product_ID and Sale_Date first.
SELECT Product_ID , Sale_Date, Daily_Sales,
MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG3_Rows
FROM Sales_Table ;
With a Moving Window of 3, how is the 46450.23 amount derived in the AVG3_Rows column in the third row? It is the AVG of 48850.40, 54500.22 and 36000.07! The fourth row has AVG3_Rows equal to 43566.91. That was the average of 54500.22, 360000.07 and 40200.43. The calculation is on the current row and the two before.
The sorting is show above.
What does the GROUP BY Product_ID do? It causes a reset on all Product_ID breaks.
How is the 44944.44 derived in the 9th row of the AVG_for_3_Rows?.
AVG of 41888.88 and 48000.00
Notice there are only two calculations although this has a moving window of 3. That is because the GROUP BY caused the MAVG to reset when Product_ID 2000 came.
SELECT Product_ID , Sale_Date, Daily_Sales,
MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG3_Rows
FROM Sales_Table ;
Challenge
Can you place
another equivalent
Moving Average
in the SQL above
using ANSI Syntax?
Here is a challenge that almost everyone fails. Can you do it perfectly?
SELECT Product_ID , Sale_Date, Daily_Sales,
MAVG( Daily_Sales, 3, Product_ID, Sale_Date) AS AVG_3,
AVG(Daily_Sales) OVER (ORDER BY Product_ID,
Sale_Date ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
The MAVG and AVG(Over) commands above are equivalent. Notice the Moving Window of 3 in the Teradata syntax and that it is a 2 in the ANSI version. That is because in ANSI it is considered the Current Row and 2 preceding.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3
FROM Sales_Table ;
Notice the Moving Window of 3 in the syntax and that it is a 2 in the ANSI version. That is because in ANSI, it is considered the Current Row and 2 preceding.
The AVG () Over allows you to get the moving average of a certain column. The Rows 2 Preceding is a moving window of 3 in ANSI.
Much like the SUM OVER Command, the Average OVER places the sort keys via the ORDER BY keywords.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 46450.23 amount derived in the AVG_3_ANSI column in the third row?
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
AVG of 48850.40, 54500.22, and 36000.07
With a Moving Window of 3, the 46450.23 amount derived in the third row is the average of 48850.40, 54500.22, and 36000.07.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 43566.91 amount derived in the AVG_3_ANSI column in the fourth row?.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG_3_ANSI
FROM Sales_Table ;
With a Moving Window of 3, how is the 43566.91 amount derived in the AVG_3_ANSI column in the fourth row? The current row plus Rows 2 Preceding.
SELECT Product_ID , Sale_Date, Daily_Sales,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS 2 Preceding) AS AVG3,
AVG(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED Preceding) AS Continuous
FROM Sales_Table;
The ROWS 2 Preceding gives the MAVG for every 3 rows. The ROWS UNBOUNDED Preceding gives the continuous MAVG.
Use a PARTITION BY Statement to Reset the ANSI OLAP. The Partition By statement only resets the column using the statement. Notice that only Continuous resets.
This is the Moving Difference (MDIFF). What this does is calculate the difference between the current row and only the 4th row preceding.
SELECT Product_ID, Sale_Date, Daily_Sales,
Daily_Sales - SUM(Daily_Sales)
OVER ( ORDER BY Product_ID ASC, Sale_Date ASC
ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING)
AS "MDiff_ANSI"
FROM Sales_Table ;
This is how you do a MDiff using the ANSI Syntax with a moving window of 4.
SELECT Product_ID, Sale_Date, Daily_Sales,
Daily_Sales - SUM(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID ASC, Sale_Date ASC
ROWS BETWEEN 4 PRECEDING AND 4 PRECEDING) AS "MDiff_ANSI"
FROM Sales_Table;
Wow! This is how you do a MDiff using the ANSI Syntax with a moving window of 4 and with a PARTITION BY statement.
SELECT Product_ID , Sale_Date, Daily_Sales,
MDIFF(Daily_Sales, 7, Product_ID, Sale_Date) as Compare2Rows
FROM Sales_Table GROUP BY Product_ID ;
Do you notice that column Compare2Rows did not produce any data? That is because the GROUP BY Reset before it could get 7 records to find the MDIFF.
SELECT Product_ID, Sale_Date, Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID ORDER BY Sale_Date
RESET WHEN Daily_Sales <= SUM(Daily_Sales)
OVER (PARTITION BY Product_ID
ORDER BY Sale_Date
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)) -1 as Increases
FROM Sales_Table WHERE Product_ID Between 1000 and 2000 ;
This query finds how many consecutive days the Daily_Sales increases per product_id.
SELECT Product_ID, Month(Sale_Date) as Mo, sum(Daily_Sales)
as Monthly_Sum,
ROW_NUMBER() over (PARTITION BY Product_ID
ORDER BY Mo
RESET WHEN sum(Daily_Sales) <=SUM(sum(Daily_Sales))
over (PARTITION BY Product_ID
ORDER BY Mo
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)) -1 as
Balance_Increase
FROM Sales_Table
GROUP BY Product_ID, Mo;
This query finds how many months per Product_ID the revenue has increased.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales) AS "Rank"
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
This is the RANK. In this example, it will rank your Daily_Sales from greatest to least. The default for this type of RANK is to sort DESC.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS "Rank"
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
This RANK query sorts in Ascending mode.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS Rank1,
RANK(-Daily_Sales) AS Rank2
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
A minus sign or keyword ASC will sort RANK in Ascending mode.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000) ;
This is the RANK() OVER. It provides a rank for your queries. Notice how you do not place anything within the () after the word RANK. Default Sort is ASC.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales DESC)
AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Is the query above in ASC mode or DESC mode for sorting?
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (PARTITION BY Product_ID
ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What does the PARTITION Statement in the RANK() OVER do? It resets the rank.
SELECT Product_ID, Sale_Date, Daily_Sales,
RANK() OVER (ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 7
The QUALIFY statement limits rows once the Rank’s been calculated.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK() OVER (PARTITION BY Product_ID
ORDER BY Daily_Sales DESC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 4
What does the PARTITION Statement in the RANK() OVER do? It resets the rank. The QUALIFY statement limits rows once the Rank’s been calculated.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS Rank1,
RANK(-Daily_Sales) AS Rank2
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank(-Daily_Sales) < 6 ;
The WHERE statement is performed first. It limits the rows being calculated. Then the QUALIFY takes the calculated rows and limits the returning rows. QUALIFY is to OLAP what HAVING is to Aggregates. Both limit after the calculations. Notice that our Rank1 and Rank2 are exactly the same because we sorted them both the same.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS Rank1,
RANK(-Daily_Sales) AS Rank2
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank(-Daily_Sales) < 6 ;
How can you improve the QUALIFY Statement above for simplicity?
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS Rank1,
RANK(-Daily_Sales) AS Rank2
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank2 < 6 ;
QUALIFY Rank2 < 6 (Use the Alias)
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 6 ;
A QUALIFY < 6 will provide a result that is 5 rows. Notice there are NO ties, yet! This is merely because we have no ties, but turn to the next page and we will have.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 6 ;
A QUALIFY < 6 will provide a result that is five rows. Notice there are Ties! This is because in ASC mode there are two matches within our first five rows.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales ASC) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
QUALIFY Rank1 < 2 ;
A QUALIFY < 2 will provide more rows than 1 because of the Ties!
Look at the Rankings and the Daily_Sales. This data come out odd because Rank1 is DESC by default (using this Syntax) and the QUALIFY specifies ASC mode.
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
GROUP BY Product_ID
QUALIFY Rank1 < 4 ;
What caused the data to reset the column Rank1?
SELECT Product_ID ,Sale_Date , Daily_Sales,
RANK(Daily_Sales) AS Rank1
FROM Sales_Table
WHERE Product_ID IN (1000, 2000)
GROUP BY Product_ID
QUALIFY Rank1 < 4 ;
GROUP BY
What caused the data to reset the column Rank1? It is the GROUP BY statement.
RANK() OVER (ORDER BY Daily_Sales) AS ANSI_Rank
Is the default above ASC or DESC?
RANK(Daily_Sales) AS NON_ANSI_Rank
Is the default above ASC or DESC?
Answer the questions above.
RANK() OVER (ORDER BY Daily_Sales) AS ANSI_Rank
Defaults to ASC
RANK(Daily_Sales) AS NON_ANSI_Rank
Defaults to DESC
Please note that by default these different syntaxes sort completely opposite.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
We now have added a Partition statement which resets on Product_ID so this produces 7 rows for each of our Product_IDs.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK()
OVER ( ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Percentage_Rank is just like RANK however, it gives you the Rank as a percent, but only a percent of all the other rows up to 100%.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER ( ORDER BY Daily_Sales DESC)
AS PercentRank1
FROM Sales_Table ;
Percentage_Rank is just like RANK, however, it gives you the Rank as a percent but only a percent of all the other rows up to 100%.
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
What caused the Product_IDs to be sorted?
SELECT Product_ID ,Sale_Date , Daily_Sales,
PERCENT_RANK() OVER (PARTITION BY PRODUCT_ID
ORDER BY Daily_Sales DESC) AS PercentRank1
FROM Sales_Table WHERE Product_ID in (1000, 2000) ;
What caused the Product_IDs to be sorted? It was the PARTITION BY statement.
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
This is the COUNT OVER. It will provide a sequential number starting at 1. The Keyword(s) ROWS UNBOUNDED PRECEDING causes Seq_Number to start at the beginning and increase sequentially to the end.
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (ORDER BY Product_ID, Sale_Date) AS No_Seq
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
When you don’t have a ROWS UNBOUNDED PRECEDING, No_Seq gets a value of 14 on every row. Why? Because 14 is the FINAL COUNT NUMBER.
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset?
SELECT Product_ID ,Sale_Date , Daily_Sales,
COUNT(*) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
After the sort, the Max() Over shows the Max Value up to that point.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MAX(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The largest value is 64300.00 in the column MaxOver. Once it was evaluated, it did not continue until the end because of the PARTITION BY reset.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MAX(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date )
AS MaxOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
You can also use MAX as a OLAP. 64300.00 came back in MaxOver because that was the MAX value for Daily_Sales in this Answer Set. Notice that it doesn’t have a ROWS UNBOUNDED PRECEDING.
SELECT Product_ID, Sale_Date ,Daily_Sales
,MIN(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
After the sort, the MIN () Over shows the Max Value up to that point.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MIN(Daily_Sales) OVER (ORDER BY Product_ID, Sale_Date )
AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
Min only displayed 32800.50 because there is NOT a ROWS UNBOUNDED PRECEDING statement so it found the lowest Daily_Sales and repeated it.
SELECT Product_ID, Sale_Date, Daily_Sales,
MIN(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS BETWEEN 1 Following and 1 Following) AS NextSale
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The above example finds the value of a column in the next row for Daily_Sales. Notice it is partitioned, so there is a Null value at the end of each Product_ID.
SELECT Product_Id, Sale_Date,
MIN(Sale_Date) OVER (PARTITION BY Product_Id
ORDER BY Sale_Date
ROWS BETWEEN 1 FOLLOWING
AND UNBOUNDED FOLLOWING) AS Date_Of_Next_Row
,Date_Of_Next_Row -Sale_Date AS Days_To_Next_Row
FROM Sales_Table WHERE Product_ID Between 1000 and 2000 ;
The above query finds gaps in dates.
The above example shows the cumulative SUM for the Daily_Sales for the first three days for each of our Product_IDs.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MIN(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The last two answers (MinOver) are blank, so you can fill in the blank.
SELECT Product_ID ,Sale_Date , Daily_Sales,
MIN(Daily_Sales) OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date
ROWS UNBOUNDED PRECEDING) AS MinOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The last two answers (MinOver) are filled in.
SELECT Product_ID ,Sale_Date , Daily_Sales,
ROW_NUMBER() OVER
(ORDER BY Product_ID, Sale_Date) AS Seq_Number
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
The ROW_NUMBER() Keyword(s) caused Seq_Number to increase sequentially. Notice that this does NOT have a Rows Unbounded Preceding, and it still works!
SELECT Product_ID ,Sale_Date , Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset?
SELECT Product_ID ,Sale_Date , Daily_Sales,
ROW_NUMBER() OVER (PARTITION BY Product_ID
ORDER BY Product_ID, Sale_Date ) AS StartOver
FROM Sales_Table WHERE Product_ID IN (1000, 2000) ;
What Keyword(s) caused StartOver to reset? It is the PARTITION BY statement.
SELECT Counter AS "Typical Rows per Value"
FROM
(SELECT Product_ID, COUNT(*)
FROM Sales_Table GROUP BY 1) AS TeraTom (Col1, Counter),
(SELECT COUNT(DISTINCT(Product_ID))
FROM Sales_Table) AS Derived2 (num_rows)
QUALIFY ROW_NUMBER () OVER
(ORDER BY TeraTom.Col1) = Derived2.num_rows /2 ;
Typical Rows Per Value
7
The query above retrieved the typical rows per value for the column Product_ID.
SELECT Counter AS "Typical Rows per Sale_Date"
FROM
(SELECT Sale_Date, COUNT(*)
FROM Sales_Table GROUP BY 1) AS TeraTom (Col1, Counter),
(SELECT COUNT(DISTINCT(Sale_Date))
FROM Sales_Table) AS Derived2 (num_rows)
QUALIFY ROW_NUMBER () OVER
(ORDER BY TeraTom.Col1) = Derived2.num_rows /2 ;
Typical Rows Per Sale_Date
3
The query above retrieved the typical rows per value for the column Sale_Date.