Chapter 2. Review of SQL Server Features for Developers

Before delving into the new features in SQL Server 2016, let's have a quick recapitulation of the features of SQL Server for developers already available in the previous versions of SQL Server. Recapitulating the most important features will help you remember what you already have in your development toolbox and also understand the need and the benefits of the new or improved features in SQL Server 2016.

Tip

This chapter has a lot of code. As this is not a book for beginners, the intention of this chapter is not to teach you the basics of database development. It is rather a reminder of the many powerful and efficient Transact-SQL (T-SQL) and other elements included in SQL Server version 2014 and even earlier.

The recapitulation starts with the mighty T-SQL SELECT statement. Besides the basic clauses, advanced techniques such as window functions, common table expressions, and the APPLY operator are explained. Then, you will pass quickly through creating and altering database objects, including tables and programmable objects, such as triggers, views, user-defined functions, and stored procedures. You will also review data modification language statements. Of course, errors might appear, so you have to know how to handle them. In addition, data integrity rules might require that two or more statements are executed as an atomic, indivisible block. You can achieve this with the help of transactions.

Note

Note that this chapter is not a comprehensive development guide.

The last section of this chapter deals with the parts of SQL Server Database Engine marketed with a common name: "Beyond Relational." This is nothing beyond the relational model—beyond relational is really just a marketing term. Nevertheless, you will review the following:

  • How SQL Server supports spatial data
  • How you can enhance the T-SQL language with Common Language Runtime (CLR) elements written in a .NET language, such as Visual C#
  • How SQL Server supports XML data

The code in this chapter uses the WideWorldImportersDW demo database. In order to test the code, this database must be present in the SQL Server instance you are using for testing, and you must also have SQL Server Management Studio (SSMS) as the client tool.

This chapter will cover the following points:

  • Core Transact-SQL SELECT statement elements
  • Advanced SELECT techniques
  • Data definition language statements
  • Data modification language statements
  • Triggers
  • Data abstraction: views, functions, and stored procedures
  • Error handling
  • Using transactions
  • Spatial data
  • CLR integration
  • XML support in SQL Server

The mighty Transact-SQL SELECT

You probably already know that the most important SQL statement is the mighty SELECT statement you use to retrieve data from your databases. Every database developer knows the basic clauses and their usage:

  • SELECT to define the columns returned, or a projection of all table columns
  • FROM to list the tables used in the query and how they are associated, or joined
  • WHERE to filter the data to return only the rows that satisfy the condition in the predicate
  • GROUP BY to define the groups over which the data is aggregated
  • HAVING to filter the data after the grouping with conditions that refer to aggregations
  • ORDER BY to sort the rows returned to the client application

Besides these basic clauses, SELECT offers a variety of advanced possibilities as well. These advanced techniques are unfortunately less exploited by developers, although they are really powerful and efficient. Therefore, I advise you to review them and potentially use them in your applications. The advanced query techniques presented here include:

  • Queries inside queries, or shortly, subqueries
  • Window functions
  • The TOP and OFFSET...FETCH expressions
  • The APPLY operator
  • Common tables expressions (CTEs)

Core Transact-SQL SELECT statement elements

Let's start with the simplest concept of SQL that every Tom, Dick, and Harry is aware of! The simplest query to retrieve the data you can write includes the SELECT and the FROM clauses. In the SELECT clause, you can use the star (*) character, literally SELECT *, to denote that you need all columns from a table in the result set. The following code switches to the WideWorldImportersDW database context and selects all data from the Dimension.Customer table. Note that there are many versions of this demo database, so your results might vary.

USE WideWorldImportersDW; 
SELECT * 
FROM Dimension.Customer; 

The code returns 403 rows, all customers with all columns.

Note

Using SELECT * is not recommended in production. Such queries can return an unexpected result when the table structure changes, and it's also not suitable for good optimization.

Better than using SELECT * is to explicitly list only the columns you need. This means you are returning only a projection on the table. The following example selects only four columns from the table:

SELECT [Customer Key], [WWI Customer ID], 
  [Customer], [Buying Group] 
FROM Dimension.Customer; 

Here is the shortened result, limited to the first three rows only:

Customer Key WWI Customer ID Customer                      Buying Group
------------ --------------- ----------------------------- -------------
0            0               Unknown                       N/A
1            1               Tailspin Toys (Head Office)   Tailspin Toys
2            2               Tailspin Toys (Sylvanite, MT) Tailspin Toys

You can see that the column names in the WideWorldImportersDW database include spaces. Names that include spaces are called delimited identifiers. In order to make SQL Server properly understand them as column names, you must enclose delimited identifiers in square parentheses. However, if you prefer to have names without spaces, or if you use computed expressions in the column list, you can add column aliases. The following query completely returns the same data as the previous one, just with columns renamed with aliases to avoid delimited names:

SELECT [Customer Key] AS CustomerKey, 
  [WWI Customer ID] AS CustomerId, 
  [Customer], 
  [Buying Group] AS BuyingGroup 
FROM Dimension.Customer; 

You might have noticed in the result set returned from the last two queries that there is also a row in the table for an unknown customer. You can filter this row with the WHERE clause:

SELECT [Customer Key] AS CustomerKey, 
  [WWI Customer ID] AS CustomerId, 
  [Customer],  
  [Buying Group] AS BuyingGroup 
FROM Dimension.Customer 
WHERE [Customer Key] <> 0; 

In a relational database, you typically have data spread across multiple tables. Each table represents a set of entities of the same kind, like the customers in the examples you have seen so far. In order to get meaningful result sets for the business your database supports, most of the time you need to retrieve data from multiple tables in the same query. You need to join two or more tables based on some conditions.

The most frequent kind of a join is the inner join. Rows returned are those for which the condition in the join predicate for the two tables joined evaluates to true. Note that in a relational database, you have three-valued logic, because there is always a possibility that a piece of data is unknown. You mark the unknown with the NULL keyword. A predicate can thus evaluate to true, false, or NULL. For an inner join, the order of the tables involved in the join is not important. In the following example, you can see the Fact.Sale table joined with an inner join to the Dimension.Customer table:

SELECT c.[Customer Key] AS CustomerKey, 
  c.[WWI Customer ID] AS CustomerId, 
  c.[Customer],  
  c.[Buying Group] AS BuyingGroup, 
  f.Quantity, 
  f.[Total Excluding Tax] AS Amount, 
  f.Profit 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key]; 

In the query, you can see that table aliases are used. If a column's name is unique across all tables in the query, then you can use it without a table name. If not, you need to use the table name in front of the column, to avoid ambiguous column names, in the table.column format. In the previous query, the [Customer Key] column appears in both tables. Therefore, you need to precede this column name with the table name of its origin to avoid ambiguity. You can shorten two-part column names by using table aliases. You specify table aliases in the FROM clause. Once you specify table aliases, you must always use the aliases; you can't refer to the original table names in that query anymore. Note that a column name might be unique in the query at the moment when you write the query. However, later somebody could add a column with the same name in another table involved in the query. If the column name is not preceded by an alias or by the table name, you would get an error when executing the query because of the ambiguous column name. In order to make the code more stable and more readable, you should always use table aliases for each column in the query. The preceding query returns 228,265 rows. It is always recommended to know at least approximately the number of rows your query should return. This number is the first control of the correctness of the result set, or put differently, whether the query is written logically correct. The query returns the unknown customer and the orders associated with this customer, or more precisely put, associated to this placeholder for an unknown customer. Of course, you can use the WHERE clause to filter the rows in a query that joins multiple tables, just as you use it for a single table query. The following query filters the unknown customer's rows:

SELECT c.[Customer Key] AS CustomerKey, 
  c.[WWI Customer ID] AS CustomerId, 
  c.[Customer],  
  c.[Buying Group] AS BuyingGroup, 
  f.Quantity, 
  f.[Total Excluding Tax] AS Amount, 
  f.Profit 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0; 

The query returns 143,968 rows. You can see that a lot of sales are associated with the unknown customer.

Of course, the Fact.Sale table cannot be joined to the Dimension.Customer table only. The following query joins it to the Dimension.Date table. Again, the join performed is an inner join:

SELECT d.Date, f.[Total Excluding Tax], 
  f.[Delivery Date Key] 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Date AS d 
    ON f.[Delivery Date Key] = d.Date; 

The query returns 227,981 rows. The query that joined the Fact.Sale table to the Dimension.Customer table returned 228,265 rows. It looks as if not all Fact.Sale table rows have a known delivery date; not all rows can match the Dimension.Date table rows. You can use an outer join to check this. With an outer join, you preserve the rows from one or both tables, even if they don't have a match in the other table. The result set returned includes all of the matched rows like you get from an inner join plus the preserved rows. Within an outer join, the order of the tables involved in the join might be important. If you use LEFT OUTER JOIN, then the rows from the left table are preserved. If you use RIGHT OUTER JOIN, then the rows from the right table are preserved. Of course, in both cases, the order of the tables involved in the join is important. With FULL OUTER JOIN, you preserve the rows from both tables, and the order of the tables is not important. The following query preserves the rows from the Fact.Sale table, which is on the left-hand side of the join to the Dimension.Date table. In addition, the query sorts the result set by the invoice date descending using the ORDER BY clause:

SELECT d.Date, f.[Total Excluding Tax], 
  f.[Delivery Date Key], f.[Invoice Date Key] 
FROM Fact.Sale AS f 
  LEFT OUTER JOIN Dimension.Date AS d 
    ON f.[Delivery Date Key] = d.Date 
ORDER BY f.[Invoice Date Key] DESC; 

The query returns 228,265 rows. Here is the partial result of the query:

Date       Total Excluding Tax  Delivery Date Key Invoice Date Key
---------- -------------------- ----------------- ----------------
NULL       180.00               NULL              2016-05-31
NULL       120.00               NULL              2016-05-31
NULL       160.00               NULL              2016-05-31
...          ...                ...                 ...
2016-05-31 2565.00              2016-05-31        2016-05-30
2016-05-31 88.80                2016-05-31        2016-05-30
2016-05-31 50.00                2016-05-31        2016-05-30

For the last invoice date (2016-05-31), the delivery date is NULL. The NULL in the Date column from the Dimension.Date table is there because the data from this table is unknown for the rows with an unknown delivery date in the Fact.Sale table.

Joining more than two tables is not tricky if all the joins are inner joins. The order of joins is not important. However, you might want to execute an outer join after all of the inner joins. If you don't control the join order with the outer joins, it might happen that a subsequent inner join filters out the preserved rows of an outer join. You can control the join order with parentheses. The following query joins the Fact.Sale table with an inner join to the Dimension.Customer, Dimension.City, Dimension.[Stock Item], and Dimension.Employee tables, and with a left outer join to the Dimension.Date table:

SELECT cu.[Customer Key] AS CustomerKey, cu.Customer, 
  ci.[City Key] AS CityKey, ci.City,  
  ci.[State Province] AS StateProvince, ci.[Sales Territory] AS SalesTeritory, 
  d.Date, d.[Calendar Month Label] AS CalendarMonth,  
  d.[Calendar Year] AS CalendarYear, 
  s.[Stock Item Key] AS StockItemKey, s.[Stock Item] AS Product, s.Color, 
  e.[Employee Key] AS EmployeeKey, e.Employee, 
  f.Quantity, f.[Total Excluding Tax] AS TotalAmount, f.Profit 
FROM (Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS cu 
    ON f.[Customer Key] = cu.[Customer Key] 
  INNER JOIN Dimension.City AS ci 
    ON f.[City Key] = ci.[City Key] 
  INNER JOIN Dimension.[Stock Item] AS s 
    ON f.[Stock Item Key] = s.[Stock Item Key] 
  INNER JOIN Dimension.Employee AS e 
    ON f.[Salesperson Key] = e.[Employee Key]) 
  LEFT OUTER JOIN Dimension.Date AS d 
    ON f.[Delivery Date Key] = d.Date; 

The query returns 228,265 rows. Note that with the usage of the parenthesis the order of joins is defined in the following way:

  • Perform all inner joins, with an arbitrary order among them
  • Execute the left outer join after all of the inner joins

So far, I have tacitly assumed that the Fact.Sale table has 228,265 rows, and that the previous query needed only one outer join of the Fact.Sale table with Dimension.Date to return all of the rows. It would be good to check this number in advance. You can check the number of rows by aggregating them using the COUNT(*) aggregate function. The following query introduces this function:

SELECT COUNT(*) AS SalesCount 
FROM Fact.Sale; 

Now you can be sure that the Fact.Sale table has exactly 228,265 rows.

Many times, you need to aggregate data in groups. This is the point where the GROUP BY clause becomes handy. The following query aggregates the sales data for each customer:

SELECT c.Customer, 
  SUM(f.Quantity) AS TotalQuantity, 
  SUM(f.[Total Excluding Tax]) AS TotalAmount, 
  COUNT(*) AS InvoiceLinesCount 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0 
GROUP BY c.Customer; 

The query returns 402 rows, one for each known customer. In the SELECT clause, you can have only the columns used for grouping, or aggregated columns. You need to get a scalar, a single aggregated value for each row for each column not included in the GROUP BY list.

Sometimes, you need to filter aggregated data. For example, you might need to find only frequent customers, defined as customers with more than 400 rows in the Fact.Sale table. You can filter the result set on the aggregated data using the HAVING clause, as the following query shows:

SELECT c.Customer, 
  SUM(f.Quantity) AS TotalQuantity, 
  SUM(f.[Total Excluding Tax]) AS TotalAmount, 
  COUNT(*) AS InvoiceLinesCount 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0 
GROUP BY c.Customer 
HAVING COUNT(*) > 400; 

The query returns 45 rows for the 45 most frequent known customers. Note that you can't use column aliases from the SELECT clause in any other clause introduced in the previous query. The SELECT clause logically executes after all other clause from the query, and the aliases are not known yet. However, the ORDER BY clause executes after the SELECT clause, and therefore the columns aliases are already known and you can refer to them. The following query shows all of the basic SELECT statement clauses used together to aggregate the sales data over the known customers, filters the data to include the frequent customers only, and sorts the result set descending by the number of rows of each customer in the Fact.Sale table:

SELECT c.Customer, 
  SUM(f.Quantity) AS TotalQuantity, 
  SUM(f.[Total Excluding Tax]) AS TotalAmount, 
  COUNT(*) AS InvoiceLinesCount 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0 
GROUP BY c.Customer 
HAVING COUNT(*) > 400 
ORDER BY InvoiceLinesCount DESC;

The query returns 45 rows. Here is the shortened result set:

Customer                        TotalQuantity TotalAmount  SalesCount
------------------------------- ------------- ------------ ----------
Tailspin Toys (Vidrine, LA)          18899         340163.80    455
Tailspin Toys (North Crows Nest, IN) 17684         313999.50    443
Tailspin Toys (Tolna, ND)            16240         294759.10    443

Advanced SELECT techniques

Aggregating data over the complete input rowset or aggregating in groups produces aggregated rows only—either one row for the whole input rowset or one row per group. Sometimes you need to return aggregates together with the detailed data. One way to achieve this is by using subqueries, queries inside queries.

The following query shows an example of using two subqueries in a single query. In the SELECT clause, a subquery calculates the sum of quantities for each customer. It returns a scalar value. The subquery refers to the customer key from the outer query. The subquery can't execute without the outer query. This is a correlated subquery. There is another subquery in the FROM clause that calculates the overall quantity for all customers. This query returns a table, though it is a table with a single row and single column. It is a self-contained subquery, independent of the outer query. A subquery in the FROM clause is also called a derived table.

Another type of join is used to add the overall total to each detail row. A cross join is a Cartesian product of two input rowsets—each row from one side is associated with every single row from the other side. No join condition is needed. A cross join can produce an unwanted huge result set. For example, if you cross join just 1,000 rows from the left of the join with 1,000 rows from the right, you get 1,000,000 rows in the output. Therefore, typically you'd want to avoid a cross join in production. However, in the example in the following query, 143,968 rows from the left-hand-side rows are cross-joined to a single row from the subquery, therefore producing 143,968 rows only. Effectively, this means that the overall total column is added to each detail row:

SELECT c.Customer, 
  f.Quantity, 
  (SELECT SUM(f1.Quantity) FROM Fact.Sale AS f1 
   WHERE f1.[Customer Key] = c.[Customer Key]) AS TotalCustomerQuantity, 
  f2.TotalQuantity 
FROM (Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key]) 
  CROSS JOIN  
    (SELECT SUM(f2.Quantity) FROM Fact.Sale AS f2 
   WHERE f2.[Customer Key] <> 0) AS f2(TotalQuantity) 
WHERE c.[Customer Key] <> 0 
ORDER BY c.Customer, f.Quantity DESC; 

Here is the abbreviated output of the query:

Customer                    Quantity TotalCustomerQuantity TotalQuantity
--------------------------- -------- --------------------- -------------
Tailspin Toys (Absecon, NJ) 360      12415                 5667611
Tailspin Toys (Absecon, NJ) 324      12415                 5667611
Tailspin Toys (Absecon, NJ) 288      12415                 5667611

In the previous example, the correlated subquery in the SELECT clause has to logically execute once per row of the outer query. The query was partially optimized by moving the self-contained subquery for the overall total in the FROM clause, where it logically executes only once. Although SQL Server can often optimize correlated subqueries and convert them to joins, there also exists a much better and more efficient way to achieve the same result as the previous query returned. You can do this by using the window functions.

The following query uses the window aggregate function SUM to calculate the total over each customer and the overall total. The OVER clause defines the partitions, or the windows of the calculation. The first calculation is partitioned over each customer, meaning that the total quantity per customer is reset to zero for each new customer. The second calculation uses an OVER clause without specifying partitions, thus meaning the calculation is done over all input rowsets. This query produces exactly the same result as the previous one:

SELECT c.Customer, 
  f.Quantity, 
  SUM(f.Quantity) 
   OVER(PARTITION BY c.Customer) AS TotalCustomerQuantity, 
  SUM(f.Quantity) 
   OVER() AS TotalQuantity 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0 
ORDER BY c.Customer, f.Quantity DESC; 

You can use many other functions for window calculations. For example, you can use the ranking functions, such as ROW_NUMBER(), to calculate some rank in the window or in the overall rowset. However, rank can be defined only over some order of the calculation. You can specify the order of the calculation in the ORDER BY sub-clause inside the OVER clause. Note that this ORDER BY clause defines only the logical order of the calculation, and not the order of the rows returned. A standalone outer ORDER BY clause at the end of the query defines the order of the result.

The following query calculates a sequential number, the row number of each row in the output, for each detail row of the input rowset. The row number is calculated once in partitions for each customer and once ever the whole input rowset. The logical order of calculation is over quantity descending, meaning that row number 1 gets the largest quantity, either the largest for each customer or the largest in the whole input rowset:

SELECT c.Customer, 
  f.Quantity, 
  ROW_NUMBER() 
   OVER(PARTITION BY c.Customer 
        ORDER BY f.Quantity DESC) AS CustomerOrderPosition, 
  ROW_NUMBER() 
   OVER(ORDER BY f.Quantity DESC) AS TotalOrderPosition 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0 
ORDER BY c.Customer, f.Quantity DESC; 

The query produces the following result, again abbreviated to a couple of rows only:

Customer                 Quantity CustomerOrderPosition TotalOrderPosition
------------------------ -------- --------------------- -----------------
Tailspin Toys (Absecon, NJ)   360        1                     129
Tailspin Toys (Absecon, NJ)   324        2                     162
Tailspin Toys (Absecon, NJ)   288        3                     374
...                           ...       ...                    ...
Tailspin Toys (Aceitunas, PR) 288        1                     392
Tailspin Toys (Aceitunas, PR) 250        4                     1331
Tailspin Toys (Aceitunas, PR) 250        3                     1315
Tailspin Toys (Aceitunas, PR) 250        2                     1313
Tailspin Toys (Aceitunas, PR) 240        5                     1478

Note the position, or the row number, for the second customer. The order does not appear completely correct---it is 1, 4, 3, 2, 5, and not 1, 2, 3, 4, 5, as you might expect. This is due to the repeating value for the second largest quantity---the quantity 250. The quantity is not unique, and thus the order is not deterministic. The order of the result is defined over the quantity and not over the row number. You can't know in advance which row will get which row number when the order of the calculation is not defined on unique values. Also, note that you might get a different order when you execute the same query on your SQL Server instance.

Window functions are useful for some advanced calculations, such as running totals and moving averages as well. However, the calculation of these values can't be performed over the complete partition. Additionally, you can frame the calculation to a subset of rows of each partition only.

The following query calculates the running total of the quantity per customer (the Q_RT column alias in the query) ordered by the sale key and framed differently for each row. The frame is defined from the first row in the partition to the current row. Therefore, the running total is calculated over one row for the first row, over two rows for the second row, and so on. Additionally, the query calculates the moving average of the quantity (the Q_MA column alias in the query) for the last three rows:

SELECT c.Customer, 
  f.[Sale Key] AS SaleKey, 
  f.Quantity, 
  SUM(f.Quantity) 
   OVER(PARTITION BY c.Customer 
        ORDER BY [Sale Key] 
      ROWS BETWEEN UNBOUNDED PRECEDING 
                 AND CURRENT ROW) AS Q_RT, 
  AVG(f.Quantity) 
   OVER(PARTITION BY c.Customer 
        ORDER BY [Sale Key] 
      ROWS BETWEEN 2 PRECEDING 
                 AND CURRENT ROW) AS Q_MA 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0 
ORDER BY c.Customer, f.[Sale Key]; 

The query returns the following (abbreviated) result:

Customer                    SaleKey  Quantity    Q_RT        Q_MA
--------------------------- -------- ----------  ----------- ----------
Tailspin Toys (Absecon, NJ)  2869     216         216         216
Tailspin Toys (Absecon, NJ)  2870     2           218         109
Tailspin Toys (Absecon, NJ)  2871     2           220         73

Let's find the top three orders by quantity for the Tailspin Toys (Aceitunas, PR) customer! You can do this by using the OFFSET...FETCH clause after the ORDER BY clause, as the following query shows:

SELECT c.Customer, 
  f.[Sale Key] AS SaleKey, 
  f.Quantity 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.Customer = N'Tailspin Toys (Aceitunas, PR)' 
ORDER BY f.Quantity DESC 
OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY; 

This is the complete result of the query:

Customer                       SaleKey  Quantity
------------------------------ -------- --------
Tailspin Toys (Aceitunas, PR)  36964    288
Tailspin Toys (Aceitunas, PR)  126253   250
Tailspin Toys (Aceitunas, PR)  79272    250

But wait.... Didn't the second largest quantity, the value 250, repeat three times? Which two rows were selected in the output? Again, because the calculation is done over a non-unique column, the result is somehow non-deterministic. SQL Server offers another possibility, the TOP clause. You can specify TOP n WITH TIES, meaning you can get all the rows with ties on the last value in the output. However, this way you don't know the number of rows in the output in advance. The following query shows this approach:

SELECT TOP 3 WITH TIES 
  c.Customer, 
  f.[Sale Key] AS SaleKey, 
  f.Quantity 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.Customer = N'Tailspin Toys (Aceitunas, PR)' 
ORDER BY f.Quantity DESC;  

This is the complete result of the previous query---this time it is four rows:

Customer                     SaleKey  Quantity
---------------------------- -------- --------
Tailspin Toys (Aceitunas, PR)  36964    288
Tailspin Toys (Aceitunas, PR)  223106   250
Tailspin Toys (Aceitunas, PR)  126253   250
Tailspin Toys (Aceitunas, PR)  79272    250

The next task is to get the top three orders by quantity for each customer. You need to perform the calculation for each customer. The APPLY Transact-SQL operator comes in handy here. You use it in the FROM clause. You apply, or execute, a table expression defined on the right-hand side of the operator once for each row of the input rowset from the left side of the operator. There are two flavors of this operator. The CROSS APPLY version filters out the rows from the left rowset if the tabular expression on the right-hand side does not return any rows. The OUTER APPLY version preserves the row from the left-hand side, even if the tabular expression on the right-hand side does not return any rows, just as the LEFT OUTER JOIN does. Of course, columns for the preserved rows do not have known values from the right-hand side tabular expression. The following query uses the CROSS APPLY operator to calculate the top three orders by quantity for each customer that actually does have some orders:

SELECT c.Customer, 
  t3.SaleKey, t3.Quantity 
FROM Dimension.Customer AS c 
  CROSS APPLY (SELECT TOP(3)  
                f.[Sale Key] AS SaleKey, 
                f.Quantity 
                FROM Fact.Sale AS f 
                WHERE f.[Customer Key] = c.[Customer Key] 
                ORDER BY f.Quantity DESC) AS t3 
WHERE c.[Customer Key] <> 0 
ORDER BY c.Customer, t3.Quantity DESC; 

The following is the result of this query, shortened to the first nine rows:

Customer                         SaleKey  Quantity
-------------------------------- -------- --------
Tailspin Toys (Absecon, NJ)        5620     360
Tailspin Toys (Absecon, NJ)        114397   324
Tailspin Toys (Absecon, NJ)        82868    288
Tailspin Toys (Aceitunas, PR)      36964    288
Tailspin Toys (Aceitunas, PR)      126253   250
Tailspin Toys (Aceitunas, PR)      79272    250
Tailspin Toys (Airport Drive, MO)  43184    250
Tailspin Toys (Airport Drive, MO)  70842    240
Tailspin Toys (Airport Drive, MO)  630      225

For the final task in this section, assume that you need to calculate some statistics on the totals of customers' orders. You need to calculate the average total amount for all customers, the standard deviation of this total amount, and the average count of the total count of orders per customer. This means you need to calculate the totals over customers in advance, and then use the AVG() and STDEV()aggregate functions on these aggregates. You could do aggregations over customers in advance in a derived table. However, there is another way to achieve this. You can define the derived table in advance, in the WITH clause of the SELECT statement. Such a subquery is called a common table expression, or CTE.

CTEs are more readable than derived tables, and may also be more efficient. You could use the result of the same CTE multiple times in the outer query. If you use derived tables, then you need to define them multiple times if you want to use them multiple times in the outer query. The following query shows the usage of a CTE to calculate the average total amount for all customers, the standard deviation of this total amount, and the average count of total count of orders per customer:

WITH CustomerSalesCTE AS 
( 
SELECT c.Customer,  
  SUM(f.[Total Excluding Tax]) AS TotalAmount, 
  COUNT(*) AS InvoiceLinesCount 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS c 
    ON f.[Customer Key] = c.[Customer Key] 
WHERE c.[Customer Key] <> 0  
GROUP BY c.Customer 
) 
SELECT ROUND(AVG(TotalAmount), 6) AS AvgAmountPerCustomer, 
  ROUND(STDEV(TotalAmount), 6) AS StDevAmountPerCustomer,  
  AVG(InvoiceLinesCount) AS AvgCountPerCustomer 
FROM CustomerSalesCTE; 

It returns the following result:

AvgAmountPerCustomer  StDevAmountPerCustomer AvgCountPerCustomer
--------------------- ---------------------- -------------------
270479.217661         38586.082621           358
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset