Chapter 3: Writing Simple SELECT Queries

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

CHAPTER 3

Writing Simple SELECT Queries

Chapter 1 had you prepare your computer by installing SQL Server and the AdventureWorks sample database. You learned how to get around in SQL Server Management Studio and a few tips to help make writing queries easier. In Chapter 2, you learned about databases, tables, and the other objects that make up a database.

Now that you’re ready, it’s time to learn how to retrieve data from an SQL Server database, which you will do by using the SELECT statement, starting with the simplest syntax. This chapter will cover the different parts, called clauses, of the SELECT statement so that you will be able to not only retrieve data but also filter and order it. The ultimate goal is to get exactly the data you need from your database—no more, no less.

Beginning in this chapter, you will find many code examples. Even though all the code is available from this book’s catalog pages at www.apress.com, you will probably find that by typing the examples yourself you will learn more quickly. As they say, practice makes perfect! In addition, exercises follow many of the sections so that you can practice using what you have just learned. You can find the answers for each set of exercises at the end of the chapter.

Note If you take a look at SQL Server Books Online, you will find the syntax displayed for each kind of statement. Books Online displays every possible parameter and option, which is not always helpful when learning about a new concept for the first time. In this book, you will find only the syntax that applies to the topic being discussed at the time.

Using the SELECT Statement

You use the SELECT statement to retrieve data from SQL Server. T-SQL requires only the word SELECT followed by at least one item in what is called a select-list.

If SQL Server Management Studio is not running, go ahead and start it. When prompted to connect to SQL Server, enter the name of the SQL Server instance you installed in Chapter 1 or the name of your development SQL Server. You will need the AdventureWorks sample databases installed to follow along with the examples and to complete the exercises. You will find instructions for installing the sample databases in Chapter 1.

Selecting a Literal Value

Perhaps the simplest form of a SELECT statement is that used to return a literal value. A literal value is one that you specify exactly. It is not data that come from the database. Begin by clicking New Query to open a new query window. Listing 3-1 shows two SELECT statements that each return a literal value. Notice the single quote marks that are used to designate the string value. It is recommended that all T-SQL statements be followed with a semicolon (;). At this point, the semicolons are not required, but it is a good practice to get in the habit of using them so you will be ready once they are required. Type each line of the code from Listing 3-1 into your query window.

Listing 3-1. Statements Returning Literal Values

SELECT 1;
SELECT 'ABC';

After typing the code in the query window, press F5 or click Execute to run the code. You will see the results displayed in two windows at the bottom of the screen, as shown in Figure 3-1. Because you just ran two statements, two sets of results are displayed.

Figure 3-1. The results of running your first T-SQL statements

Tip By highlighting one or more statements in the query window, you can run just a portion of the code. For example, you may want to run one statement at a time. Use the mouse to select the statements you want to run, and press F5. You can also click the Execute icon or press CTRL+E.

Notice the Messages tab next to the Results tab. Click Messages, and you will see the number of rows affected by the statements as well as any error or informational messages. In this case, the rows affected are actually just outputted. If an error occurs, you will see the Messages tab selected by default instead of the Results tab when the statement execution completes. You can then find the results, if any, by clicking the Results tab.

Retrieving from a Table

You will usually want to retrieve data from a table instead of literal values. After all, if you already know what value you want, you probably don’t need to execute a query to get that value.

In preparation for retrieving data from a table, either delete the current code or open a new query window. Change to the example database by typing Use AdventureWorks and executing or by selecting the AdventureWorks database from the drop-down list, as shown in Figure 3-2.

Figure 3-2. Choosing the AdventureWorks database

You use the FROM clause to specify a table name in a SELECT statement. The FROM clause is the first part of the statement that the database engine evaluates and processes. Here is the syntax for the SELECT statement with a FROM clause:

SELECT <column1>, <column2> FROM <schema>.<table>;

Note There are many versions of the AdventureWorks database with the SQL Server version appended to the name. Throughout this book, the generic name “AdventureWorks” will be used instead of any specific version.

Type in and execute the code in Listing 3-2 to learn how to retrieve data from a table.

Listing 3-2. Writing a Query with a FROM Clause

USE AdventureWorks;
GO
SELECT BusinessEntityID, JobTitle
FROM HumanResources.Employee;

The first statement in Listing 3-2 switches the connection to the AdventureWorks database if it’s not already connected to it. The word GO doesn’t really do anything except divide the code up into separate distinct code batches. You’ll learn more about the reasons for using batches in Chapter 12.

When retrieving from a table, you still have a select-list as shown in Listing 3-1; however, your select-list typically contains column names from a table. The select-list in Listing 3-2 requests data from the BusinessEntityID and JobTitle columns, which are both found in the Employee table. The Employee table is in turn found in the HumanResources schema.

Figure 3-3 shows the output from executing the code in Listing 3-2. There is only one set of results, because there is only one SELECT statement.

Figure 3-3. The partial results of running a query with a FROM clause

Notice that the FROM clause in Listing 3-2 specifies the table name in two parts: HumanResources.Employee. The first part—HumanResources—is a schema name. In SQL Server, groups of related tables can be organized together as schemas. You don’t always need to provide those schema names, but it’s the best practice to do so. Two schemas can potentially each contain a table named Employee, and those would be different tables with different structures and data. Specifying the schema name as part of your table reference eliminates a source of potential confusion and error.

To retrieve all the columns from a table, you can use the * symbol, also known as asterisk, star, or splat. Run the following statement to try this shortcut: SELECT * FROM HumanResources.Employee. You will see that all the columns from the table are returned.

The asterisk technique is useful for performing a quick query, but you should avoid it in a production application, report, or process. Retrieving more data than you really need may have a negative impact on performance. Why retrieve all the columns from a table and pull more data across the network when you need only a few columns? Besides performance, application code may break if an additional column is added to or removed from the table. Additionally, there might be security reasons for returning only some of the columns. The best practice is to write select-lists specifying exactly the columns that you need and return only the rows you need.

Generating a Select-List

You might think that typing all the required columns for a select-list is tedious work. Luckily, SQL Server Management Studio provides a shortcut for writing good SELECT statements. Follow these instructions to learn the shortcut:

In the Object Explorer, expand Databases.
Expand the AdventureWorks database.
Expand Tables.
Right-click the HumanResources.Employee table.
Select Script Table as…. SELECT To New Query Editor Window.
Run the code.

You now have a properly formed SELECT statement, as shown in Listing 3-3, that retrieves all the columns from the HumanResources.Employee table. You can also easily remove any unneeded columns from the query.

Listing 3-3. A Scripted SELECT Statement

USE [AdventureWorks]
GO

SELECT [BusinessEntityID]
      ,[NationalIDNumber]
      ,[LoginID]
      ,[OrganizationNode]
      ,[OrganizationLevel]
      ,[JobTitle]
      ,[BirthDate]
      ,[MaritalStatus]
      ,[Gender]
      ,[HireDate]
      ,[SalariedFlag]
      ,[VacationHours]
      ,[SickLeaveHours]
      ,[CurrentFlag]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [HumanResources].[Employee]
GO

Notice the brackets around the names in Listing 3-3. Column and table names need to follow specific naming rules so that SQL Server’s parser can recognize them. When a table, column, or database has a name that doesn’t follow those rules, you can still use that name, but you must enclose it within square brackets ([]). Automated tools often enclose all names within square brackets as a just-in-case measure.

Note Another shortcut to typing all the column names is to click and drag the column(s) from the left side of Management Studio into the query window. For example, if you click the Columns folder and drag it to the query window, SQL Server will list all the columns.

Mixing Literals and Column Names

You can mix literal values and column names in one statement. Listing 3-4 shows an example. SQL Server allows you to create or rename a column within a query by using what is known as an alias. You use the keyword AS to specify an alias for the column. This is especially useful when using literal values where you create a column name in the T-SQL statement that doesn’t exist in the table.

Listing 3-4. Mixing Literal Values and Column Names

SELECT 'A Literal Value' AS "Literal Value",
    BusinessEntityID AS EmployeeID,
    LoginID JobTitle
FROM HumanResources.Employee;

Now execute the query in Listing 3-4. You should see results similar to those shown in Figure 3-4. Notice the column names in your results. The column names are the aliases that you specified in your query. You can alias any column, giving you complete control over the headers for your result sets.

Figure 3-4. The partial results of using aliases

The keyword AS is optional. You can specify an alias name immediately following a column name. If an alias contains a space or is a reserved word, you can surround the alias with square brackets, single quotes, or double quotes. If the alias follows the rules for naming objects, the quotes or square brackets are not required.

Be aware that any word listed immediately after a column within the SELECT list is treated as an alias. If you forget to add the comma between two column names, the second column name will be used as the alias for the first. Omitting this comma is a common error. Look carefully at the query in Listing 3-4 and you’ll see that the intent is to display the LoginID and JobTitle columns. Because the comma was left out between those two column names, the name of the LoginID column was changed to JobTitle. JobTitle was treated as an alias rather than as an additional column. Watch for and avoid this common mistake.

Reading about T-SQL and typing in code examples are wonderful ways to learn. The best way to learn, however, is to figure out the code for yourself. Imagine learning how to swim by reading about it instead of jumping into the water. Practice now with what you have learned so far. Follow the instructions in Exercise 3-1, and write a few queries to test what you know.

EXERCISE 3-1

For this exercise, switch to the AdventureWorks database. You can find the solutions to this exercise at the end of the chapter.

Remember that you can expand the tables in the Object Explorer to see the list of table names and then expand the column section of the table to see the list of column names.

Now, try your hand at the following tasks:

Write a SELECT statement that lists the customers along with their ID numbers. Include the StoreID and the AccountNumber from the Sales.Customers table.
Write a SELECT statement that lists the name, product number, and color of each product from the Production.Product table.
Write a SELECT statement that lists the customer ID numbers and sales order ID numbers from the Sales.SalesOrderHeader table.
Answer this question: Why should you specify column names rather than an asterisk when writing the select-list? Give at least two reasons.

Filtering Data

Usually an application requires only a fraction of the rows from a table at any given time. For example, an order-entry application that shows the order history will often need to display the orders for only one customer at a time. There might be millions of orders in the database, but the operator of the software will view only a handful of rows instead of the entire table. Filtering data is a very important part of T-SQL.

Adding a WHERE Clause

To filter the rows returned from a query, you will add a WHERE clause to your SELECT statement. The database engine processes the WHERE clause second, right after the FROM clause. The WHERE clause will contain expressions, called predicates, that can be evaluated to TRUE, FALSE, or UNKNOWN. You will learn more about UNKNOWN in the “Working with NULL” section later in the chapter. The WHERE clause syntax is as follows:

SELECT <column1>,<column2>
FROM <schema>.<table>
WHERE <column> = <value>;

Listing 3-5 shows the syntax and some examples demonstrating how to compare a column to a literal value. The following examples are from the AdventureWorks database. Be sure to type each query into the query window and execute the statement to see how it works. Make sure you understand how the expression in the WHERE clause affects the results returned by each query. Notice that tick marks, or single quotes, have been used around literal strings and dates.

Listing 3-5. How to Use the WHERE Clause

--1
SELECT CustomerID, SalesOrderID
FROM Sales.SalesOrderHeader
WHERE CustomerID = 11000;

--2
SELECT CustomerID, SalesOrderID
FROM Sales.SalesOrderHeader
WHERE SalesOrderID = 43793;

--3
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate = '2005-07-02';

--4
SELECT BusinessEntityID, LoginID, JobTitle
FROM HumanResources.Employee
WHERE JobTitle = 'Chief Executive Officer';

Each query in Listing 3-5 returns rows that are filtered by the expression in the WHERE clause. Be sure to check the results of each query to make sure that the expected rows are returned (see Figure 3-5). Each query returns only the information specified in that query’s WHERE clause.

Figure 3-5. The results of using the WHERE clause

Note Throughout this book you will see many comments in the code. Comments are not processed by SQL Server and will help make your code more readable. To create a one-line comment, type in two dashes (--). To comment out several lines begin the section with a slash and asterisk (/*) and end the section with an asterisk and slash (*/).

Using WHERE Clauses with Alternate Operators

Within WHERE clause expressions, you can use many comparison operators, not just the equals sign. Books Online lists the following operators:

> (greater than)
< (less than)
= (equals)
<= (less than or equal to)
>= (greater than or equal to)
!= (not equal to)
<> (not equal to)
!< (not less than)
!> (not greater than)

Type in and execute the queries in Listing 3-6 to practice using these additional operators in the WHERE clause.

Listing 3-6. Using Operators with the WHERE Clause

--Using a DateTime column
--1
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate > '2005-07-05';

--2
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate < '2005-07-05';

--3
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '2005-07-05';

--4
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate <> '2005-07-05';

--5
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate != '2005-07-05';

--Using a number column
--6
SELECT SalesOrderID, SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail
WHERE OrderQty > 10;

--7
SELECT SalesOrderID, SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail
WHERE OrderQty <= 10;

--8
SELECT SalesOrderID, SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail
WHERE OrderQty <> 10;

--9
SELECT SalesOrderID, SalesOrderDetailID, OrderQty
FROM Sales.SalesOrderDetail
WHERE OrderQty != 10;

--Using a string column
--10
SELECT BusinessEntityID, FirstName
FROM Person.Person
WHERE FirstName <> 'Catherine';

--11
SELECT BusinessEntityID, FirstName
FROM Person.Person
WHERE FirstName != 'Catherine';

--12
SELECT BusinessEntityID, FirstName
FROM Person.Person
WHERE FirstName > 'M';

--13
SELECT BusinessEntityID, FirstName
FROM Person.Person
WHERE FirstName !> 'M';

Take a look at the results of each query to make sure that the results make sense and that you understand why you are getting them. Remember that both != and <> mean “not equal to” and are interchangeable. Using either operator should return the same results if all other aspects of a query are the same.

You may find the results of query 12 interesting. At first glance, you may think that only rows with the first name beginning with the letter N or later in the alphabet should be returned. However, if any FirstName value begins with M followed by at least one additional character, the value is greater than M, so the row will be returned. For example, Ma is greater than M.

Query 13 is also an interesting example that you will probably not see often. This means that the FirstName is not greater than M. That means that it could be M or anything in the alphabet before M.

Using BETWEEN

BETWEEN is another useful operator you can use in the WHERE clause to specify an inclusive range of values. It is frequently used with dates but can be used with string and numeric data as well. Here is the syntax for BETWEEN:

SELECT <column1>,<column2>
FROM <schema>.<table>
WHERE <column> BETWEEN <value1> AND <value2>;

Type in and execute the code in Listing 3-7 to learn how to use BETWEEN.

Listing 3-7. Using BETWEEN

--1
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '2005-07-02' AND '2005-07-04';

--2
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID BETWEEN 25000 AND 25005;

--3
SELECT BusinessEntityID, JobTitle
FROM HumanResources.Employee
WHERE JobTitle BETWEEN 'C' and 'E';

--4 An illogical BETWEEN expression
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID BETWEEN 25005 AND 25000;

Pay close attention to the results of Listing 3-7 shown in Figure 3-6. Query 1 returns all orders placed on the two dates specified in the query as well as the orders placed between the dates. You will see the same behavior from the second query—all orders placed by customers with customer IDs within the range specified. What can you expect from query 3? You will see all job titles that start with C or D. You will not see the job titles beginning with E, however. A job title composed of only the letter E would be returned in the results. Any job title beginning with E and at least one other character is greater than E and therefore not within the range. For example, the Ex in Executive is greater than just E, so any job titles beginning with Executive get eliminated.

Figure 3-6. The partial results of queries with BETWEEN

Query 4 returns no rows at all because the values listed in the BETWEEN expression are switched. No values meet the qualification of being greater than or equal to 25005 and also less than or equal to 25000. Make sure you always list the lower value first and the higher value second when using BETWEEN. Another thing to notice about the results is that all of the datetime values do not contain times. When time values are recorded, then the expressions in the WHERE clause must take this into account. You’ll see an example of this later in this chapter in the section “Filtering on Date and Time.”

Using BETWEEN with NOT

To find values outside a particular range of values, you write the WHERE clause expression using BETWEEN along with the NOT keyword. In this case, the query returns any rows outside the range. Try the examples in Listing 3-8, and compare them to the results from Listing 3-7.

Listing 3-8. Using NOT BETWEEN

--1
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE OrderDate NOT BETWEEN '2005-07-02' AND '2005-07-04';

--2
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID NOT BETWEEN 25000 AND 25005;

--3
SELECT BusinessEntityID, JobTitle
FROM HumanResources.Employee
WHERE JobTitle NOT BETWEEN 'C' and 'E';

--4 An illogical BETWEEN expression
SELECT CustomerID, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE CustomerID NOT BETWEEN 25005 AND 25000;

Query 1 displays all orders placed before July 2, 2005 (2005-07-02) or after July 4, 2005 (2005-07-04)—in other words, any orders placed outside the range specified (see Figure 3-7). Query 2 displays the orders placed by customers with customer IDs less than 25000 or greater than 25005. When using the NOT operator with BETWEEN, the values specified in the BETWEEN expression don’t show up in the results. Query 3 returns all job titles beginning with A and B. It also displays any job titles beginning with E and at least one more character, as well as any job titles starting with a letter greater than E. If a title consists of just the letter E, it will not show up in the results. This is just the opposite of what you saw in Listing 3-7.

Figure 3-7. The partial results of queries with NOT BETWEEN

Query 4 with the illogical BETWEEN expression returns all the rows in the table. Because no customer ID values can be less than or equal to 25005 and also be greater than or equal to 25000, no rows meet the criteria in the BETWEEN expression. By adding the NOT operator, every row ends up in the results, which is probably not the original intent.

Filtering on Date and Time

Some temporal data columns store the time as well as the date. If you attempt to filter on such a column specifying only the date, you may retrieve incomplete results. Type in and run the code in Listing 3-9 to create and populate a temporary table that will be used to illustrate this issue. Don’t worry about trying to understand the table creation code at this point.

Listing 3-9. Table Setup for Date/Time Example

CREATE TABLE #DateTimeExample(
    ID INT NOT NULL IDENTITY PRIMARY KEY,
    MyDate DATETIME2(0) NOT NULL,
    MyValue VARCHAR(25) NOT NULL
);
GO
INSERT INTO #DateTimeExample
    (MyDate,MyValue)
VALUES ('2009-01-02 10:30','Bike'),
    ('2009-01-03 13:00','Trike'),
    ('2009-01-03 13:10','Bell'),
    ('2009-01-03 17:35','Seat'),

Now that the table is in place, type in and execute the code in Listing 3-10 to see what happens when filtering on the MyDate column.

Listing 3-10. Filtering on Date and Time Columns

--1
SELECT ID, MyDate, MyValue
FROM #DateTimeExample
WHERE MyDate = '2009-01-03';

--2
SELECT ID, MyDate, MyValue
FROM #DateTimeExample
WHERE MyDate BETWEEN '2009-01-03 00:00:00' AND '2009-01-03 23:59:59';

Figure 3-8 shows the results of the two queries. Suppose you want to retrieve a list of entries from January 3, 2009 (2009-01-03). Query 1 tries to do that but returns no results. Results will be returned only for entries where the MyDate value is precisely 2009-01-03 00:00:00, and there are no such entries. The second query returns the expected results—all values where the date is 2009-01-03. It does that by taking the time of day into account. To be even more accurate, the query could be written using two expressions: one filtering for dates greater than or equal to 2009-01-03 and another filtering for dates less than 2009-01-04. You will learn more about data types in Chapter 16, but this example will only work because of the specific data type datetime2(0) that was used. You will learn how to write WHERE clauses with two expressions in the next section, which will have an example that is actually a better way to handle filtering on date and time data types.

Figure 3-8. Results of filtering on a date and time column

So what would happen if you formatted the date differently? Will you get the same results if slashes (/), are used or if the month is spelled out (in other words, as January 3, 2009)? SQL Server does not store the date using any particular character-based format but rather as an integer representing the number of days between 1900-01-01 and the date specified. If the data type holds the time, the time is stored as the number of clock ticks past midnight. As long as you pass a date in an appropriate format based on the localization settings of the server where SQL Server is installed, the value will be recognized as a date, but as a best practice always use the 'YYYY-MM-DD' format.

Writing a WHERE clause is a very important skill. Take the time to practice what you have learned so far by completing Exercise 3-2.

EXERCISE 3-2

Use the AdventureWorks database to complete this exercise. Be sure to run each query and check the results. You can go back and review the examples in the section if you don’t remember how to write the queries. You can find the solutions at the end of the chapter.

Write a query using a WHERE clause that displays all the employees listed in the HumanResources.Employee table who have the job title Research and Development Engineer. Display the business entity ID number, the login ID, and the title for each one.
Write a query using a WHERE clause that displays all the names in Person.Person with the middle name J. Display the first, last, and middle names along with the ID numbers.
Write a query displaying all the columns of the Production.ProductCostHistory table from the rows in which the standard cost is between the values of $10 and $13. Be sure to use one of the features in SQL Server Management Studio to help you write this query.
Rewrite the query you wrote in question 1, changing it so the employees who do not have the title Research and Development Engineer are displayed.
Explain why a WHERE clause should be used in many of your T-SQL queries.

Using WHERE Clauses with Two Predicates

So far, the examples have shown only one condition or predicate in the WHERE clause, but the WHERE clause can be much more complex. They can have multiple predicates by using the logical operators AND and OR. Type in and execute the code in Listing 3-11 that demonstrates how to use AND and OR to combine two predicates.

Listing 3-11. How to Use AND and OR

--1
SELECT BusinessEntityID, FirstName, MiddleName, LastName
FROM Person.Person
WHERE FirstName = 'Ken' AND LastName = 'Myer';

--2
SELECT BusinessEntityID, FirstName, MiddleName, LastName
FROM Person.Person
WHERE LastName = 'Myer' OR LastName = 'Meyer';

--3
IF OBJECT_ID('tempdb..#DateTimeExample') IS NOT NULL BEGIN
        DROP TABLE #DateTimeExample;
END;
GO
CREATE TABLE #DateTimeExample(
    ID INT NOT NULL IDENTITY PRIMARY KEY,
    MyDate DATETIME2(0) NOT NULL,
    MyValue VARCHAR(25) NOT NULL
);
GO
INSERT #DateTimeExample (ID, MyDate, MyValue)
VALUES ('2009-01-01 10:30','Bike'),
    ('2009-01-01 11:30','Bike'),
    ('2009-01-02 13:00','Trike'),
    ('2009-01-03 13:10','Bell'),
    ('2009-01-03 17:35','Seat'),
    ('2009-01-04 00:00','Bike'),

--4
SELECT ID, MyDate, MyValue
FROM #DateTimeExample
WHERE MyDate >= '2009-01-02' AND MyDate < '2009-01-04';

Figure 3-9 shows the results. Query 1 returns any rows with the first name Ken and the last name Myer because both expressions must evaluate to TRUE. Query 2 returns any rows with either the last name Myer or the last name Meyer because only one of the expressions must evaluate to TRUE. Query 3 demonstrates the best way to deal with datetime data types. This query returns all rows where MyDate is any time on 2009-01-02 or 2009-01-03. By using greater than or equal to the lower date and less than one more than the upper date, you will get the expected results. Using more than two predicates, especially when including OR along with AND, can get quite complex. You will learn more about this in Chapter 9.

Figure 3-9. The results of queries with two predicates in the WHERE clause

Using the IN Operator

The IN operator is very useful when a set of multiple values must be compared to the same column. Query 2 in Listing 3-11 could have been written using the IN operator. Follow the IN operator with a list of possible values for a column within parentheses. Here is the syntax:

SELECT <column1>,<column2>
FROM <schema>.<table>
WHERE <column> IN (<value1>,<value2>);

Type in and execute the code from Listing 3-12. The queries in this listing demonstrate how to use the IN operator. Review the results to be sure that you understand them.

Listing 3-12. Using the IN Operator

--1
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE FirstName = 'Ken' AND
    LastName IN ('Myer','Meyer'),

--2
SELECT TerritoryID, Name
FROM Sales.SalesTerritory
WHERE TerritoryID IN (2,2,1,4,5);

--3
SELECT TerritoryID, Name
FROM Sales.SalesTerritory
WHERE TerritoryID NOT IN (2,1,4,5);

You will probably find that the operator IN can simplify many queries when you have multiple values to compare. Query 1 requires that the first name must be Ken and the last name can be either Myer or Meyer. You can also use IN with numbers and dates and most data types. Query 2 returns all rows with TerritoryID 2, 1, 4, or 5. Notice that 2 was listed twice. This doesn’t cause the row to be returned twice, however. By using NOT, query 3 returns the opposite results. Figure 3-10 shows the results of the three queries from Listing 3-12.

Figure 3-10. The results of queries using the IN operator

As the WHERE clause becomes more complicated, it becomes very easy to make a mistake. You will spend more time learning about the WHERE clause in Chapter 9. Complete Exercise 3-3 to practice writing WHERE clauses with multiple predicates and the IN operator.

EXERCISE 3-3

Use the AdventureWorks database to complete this exercise. Be sure to check your results to ensure that they make sense. You can find the solutions at the end of the chapter.

Write a query displaying the order ID, order date, and total due from the Sales.SalesOrderHeader table. Retrieve only those rows where the order was placed during the month of September 2005.
Write a query with the same columns as question 1. Include rows where the Total Due is $10,000 or more or the SalesOrderID is less than 43000.
Explain when it makes sense to use the IN operator.

Working with NULL

Probably nothing causes more aggravation to T-SQL developers than NULL values. NULL means that a value has not been entered for a particular column in a row. Suppose you have an e-commerce application that requires the customer to fill in information such as name and address. In this example, the phone number is optional. What does it mean if the customer does not enter a phone number and the table ends up with NULL in the PhoneNumber column of the Customer table? Does it mean that the customer does not have a phone? That’s one possibility. Another is that the customer has at least one phone number but chose not to supply it because it was not required. Either way, the end result is that you do not know if a phone number exists or not.

Think about what would happen if you had a list of 1 million phone numbers and tried to figure out whether any of the phone numbers belonged to this particular customer. Even if you compared each phone number to the customer’s row, one by one, you would never know whether any of the phone numbers were the right one. You would never know because you would be comparing 1 million values to an unknown value. Conversely, can you guarantee that every one of your 1 million phone numbers is not the missing phone number? No, you can’t do that either, because the customer’s phone number is unknown and one of those numbers might actually be correct.

This example should give you an idea about the challenges of working with NULL values. Type in and execute the code in Listing 3-13 to work on some examples using real data.

Listing 3-13. An Example Illustrating NULL

--1 Returns 19,972 rows
SELECT MiddleName
FROM Person.Person;

--2 Returns 291 rows
SELECT MiddleName
FROM Person.Person
WHERE MiddleName = 'B';

--3 Returns 11,182 but 19,681 were expected
SELECT MiddleName
FROM Person.Person
WHERE MiddleName != 'B';

--4 Returns 19,681
SELECT MiddleName
FROM Person.Person
WHERE MiddleName IS NULL
    OR MiddleName !='B';

Query 1 with no WHERE clause returns 19,972 rows, the total number of rows in the table. Query 2 returns 291 rows with the middle name B. Logic follows that query 3 will return the difference of the two numbers: 19,681 rows. When you check the results of query 3, you will find that more than 8,000 rows are not accounted for. That is because the rows with NULL values can’t be found by the expression containing not equal. Comparing NULL to B returns UNKNOWN, so the rows are not returned. You must specifically check for NULL values by using the IS NULL operator, as shown in query 4, which returns the correct number of rows.

Usually comparing the data in a column to a value or comparing the values from two columns returns either TRUE or FALSE. If the expression in the WHERE clause evaluates to TRUE, then the row is returned. If the expression evaluates to FALSE, then the row is not returned. If a value in the expression contains NULL, then the expression is resolved to UNKNOWN. In some ways, the behavior is like FALSE. When an expression resolves to UNKNOWN, the row is not returned. The problems begin when using any operator except for equal to (=). The opposite of FALSE is TRUE, but the opposite of UNKNOWN is still UNKNOWN.

Note Be aware that NULL is not a value so it can’t equal itself or any other value. T-SQL instead provides specific expressions and functions to test for NULL values. To test for a NULL value you will want to use the IS [NOT] NULL expression (http://msdn.microsoft.com/en-us/library/ms188795(v=SQL.120).aspx). Additionally, you may want to change a NULL value to a value like an empty string (''). The function ISNULL will replace a NULL value with another value specified in the query (see Chapter 4).

Neglecting to take possible NULL values into consideration can often cause incomplete results. Always remember to think about NULL values, especially when writing any expression containing NOT. Do the NULL values belong in the results? If so, you will have to check for NULL. You will also need to keep NULL values in mind when using the less than (<) operator. NULL values will be left out of those results as well. Chapter 4 will show you some other options for working with NULL.

Understanding how NULL values can affect the results of your queries is one of the most important skills you will learn. Even experienced T-SQL developers struggle from time to time when working with NULL values. You have covered a lot of ground about the WHERE clause in these sections, but there is much more to learn. You will spend time in Chapter 9 learning even more on this subject. Be sure to complete Exercise 3-4 to practice what you have just learned.

EXERCISE 3-4

Use the AdventureWorks database to complete this exercise. Make sure you consider how NULL values will affect your results. You can find the solutions at the end of the chapter.

Write a query displaying the ProductID, Name, and Color columns from rows in the Production.Product table. Display only those rows where no color has been assigned.
Write a query displaying the ProductID, Name, and Color columns from rows in the Production.Product table. Display only those rows in which the color is known not to be blue.
Write a query displaying ProductID, Name, Style, Size, and Color from the Production.Product table. Include only the rows where at least one of the Size or Color columns contains a value.

Sorting Data

So far, you have learned how to retrieve a list of columns from a table and filter the results. This section covers how to sort the data that are retrieved using the ORDER BY clause. The ORDER BY clause is the last part of the SELECT statement that the database engine will process.

You can specify one or more columns in the ORDER BY clause separated by commas. The sort order is ascending by default, but you can specify descending order by using the keyword DESCENDING or DESC after the column name. You can also specify ASCENDING or ASC if you wish, but the sort order is ascending by default. Here is the syntax for ORDER BY:

SELECT <column1>,<column2>
FROM <schema>.<tablename>
ORDER BY <column1>[<sort direction>],<column2> [<sort direction>]

Type in and execute the code in Listing 3-14 to learn how to use the ORDER BY clause.

Listing 3-14. How to Use ORDER BY

--1
SELECT ProductID, LocationID
FROM Production.ProductInventory
ORDER BY LocationID;

--2
SELECT ProductID, LocationID
FROM Production.ProductInventory
ORDER BY ProductID, LocationID DESC;

Figure 3-11 shows the partial results. The rows from query 1 display in order of LocationID. Query 2 returns the results ordered first by ProductID, and then the results are further sorted by LocationID in descending order.

Figure 3-11. The results when using the ORDER BY clause

When NULL values are part of the results, they will show up first when sorting in an ascending manner. This reminds me of the secretarial skills class I took in high school. When learning about filing, the teacher said to remember “Nothing before something.” Except for learning how to type on an electric typewriter, which I haven’t done in years, that was the most important thing I learned in the class. Another interesting thing you can do is use aliases from the SELECT clause in the ORDER BY clause. Here is an example that actually causes a problem because the same column name is used in the SELECT list.

SELECT BusinessEntityID, 1 as LastName, LastName, FirstName, MiddleName
FROM Person.Person
ORDER BY LastName DESC, FirstName DESC, MiddleName DESC;

Msg 209, Level 16, State 1, Line 11 Ambiguous column name 'LastName'

You may find the ORDER BY clause easy to use, but you should still practice what you have learned about sorting the results of your queries by completing Exercise 3-5.

EXERCISE 3-5

Use the AdventureWorks database to complete this exercise and practice sorting the results of your queries. You can find the solutions at the end of the chapter.

Write a query that returns the business entity ID and name columns from the Person.Person table. Sort the results by LastName, FirstName, and MiddleName.
Modify the query written in question 1 so that the data is returned in the opposite order.

Thinking About Performance

Reading this book and performing the exercises found in each chapter will enable you to become a proficient T-SQL programmer. You will learn how to write the queries, often in more than one way, to get results. Frequently, T-SQL developers don’t learn the best way to write a query, and the performance of their applications and reports suffers. As a result, several chapters of this book, beginning with this chapter, feature a section on performance to get you thinking about how the statements you write can affect performance.

Taking Advantage of Indexes

Indexes help the database engine locate the rows that must be returned by a query. In fact, the database engine, if possible, will retrieve all the required columns from the index instead of accessing the table. I am not advocating creating an index on every column, but strategically designed indexes immensely improve the performance of queries.

When a table contains an index on a column, the database engine will usually use that index to find the rows for the results if the column appears in the WHERE clause. For example, the Person.Person table contains an index called IX_Person_LastName_FirstName_MiddleName, which consists of the LastName, FirstName, and MiddleName columns. To see the index properties, follow these steps:

Using SQL Server Management Studio, connect to your SQL Server instance if you aren’t connected already.
Expand Databases.
Expand AdventureWorks.
Expand Tables.
Expand Person.Person.
Expand Indexes.
Locate the IX_Person_LastName_FirstName_MiddleName index, and double-click it to view the properties.

View the index properties in Figure 3-12. Notice that the LastName column appears first in the list. To take full advantage of this index, the WHERE clause must filter on LastName. Imagine searching a phone book by first name when you don’t know the last name! SQL Server must do the same thing, looking at each entry in the index, when the query filters on FirstName but not LastName.

Figure 3-12. The properties of an index

What happens when you filter on only a nonindexed column? The database engine must check the value of the column in each row of the table to find the rows meeting the criteria. Again, I’m not advocating creating an index for every query, and index creation and tuning are both well beyond the scope of this book. I just want to make you aware that the indexes defined on the table will affect the performance of your queries.

Viewing Execution Plans

By using execution plans, you can determine whether the database engine utilizes an index to return the rows in the query. You can also compare the performance of two or more queries to see which one performs the best. Again, this book doesn’t intend to make you an expert on execution plans but instead just gets you started using them to help you understand how your query performs.

Note To learn more about execution plans, see the book SQL Server 2008 Query Performance Tuning Distilled by Grant Fritchey and Sajal Dam (Apress, 2009).

While you have a query window open, click the Include Actual Execution Plan icon (see Figure 3-13) to turn on this feature for the current session. The setting must be toggled on for each query window; it is not a permanent setting.

Figure 3-13. Clicking the Include Actual Execution Plan icon

Listing 3-15 contains three queries to demonstrate the differences in performance found depending on whether SQL Server can take advantage of an index to return the results. Type in and execute the code in Listing 3-15.

Listing 3-15. Learning How to View Execution Plans

--1
SELECT LastName, FirstName
FROM Person.Person
WHERE LastName = 'Smith';

--2
SELECT LastName, FirstName
FROM Person.Person
WHERE FirstName = 'Ken';

--3
SELECT ModifiedDate
FROM Person.Person
WHERE ModifiedDate BETWEEN '2005-01-01' and '2005-01-31';

Once the query execution completes, click the Execution Plan tab. Figure 3-14 shows the graphical execution plans for the three queries. First, take a look at the query cost for each query shown at the top of each section. The query cost gives you an estimated weight of each query compared to the total. The numbers should add up to approximately 100 percent.

Figure 3-14. The actual execution plans generated from Listing 3-15

Query 1, which has a relative query cost of 0 percent, filters the results on the LastName column. Recall that an index comprised of the LastName, FirstName, and MiddleName columns exists on the Person.Person table. Because the query filters on the first column in the index, the database engine can take full advantage of the index; it performs an index seek without scanning the entire index. This is similar to looking at the phone book when you know the last name; you don’t need to look at every page or every entry to find the name you are looking for. Query 2, which has a relative query cost of 3 percent, filters the results on the FirstName column. The table has an index that contains the FirstName column, but because it appears second in the index, SQL Server must perform an index scan. This means that the database engine must compare the string Ken to every FirstName value in the index. The query was able to take advantage of the index but not to the fullest extent. Because the index contains both columns found in the results, the database engine didn’t have to touch the actual table, pulling all the needed data from the much smaller index. Because the index has less data in it than the table, SQL Server can do less work when retrieving from the index. The execution plan also suggests a new index that will make this query perform better.

Query 3, which has a relative query cost of 96 percent, filters the results on the ModifiedDate column. The table doesn’t have an index containing this column. To filter the rows, the database engine must perform a clustered index scan. The clustered index is the actual table. In this case, the database engine had to look at each row of the table to retrieve the results. Scanning the table is much more work for SQL Server than scanning the nonclustered index or searching either type. Review the “Understanding Indexes” section in Chapter 2 to learn more about clustered indexes and indexes in general.

Viewing and understanding execution plans will help you learn how writing queries will affect the performance of your applications and reports. Don’t rush to your database administrator demanding changes to the database indexes in your production database; this section doesn’t intend to teach index tuning. Think of execution plans as another tool you can use to write better code.

Summary

The SELECT statement is used to retrieve data from tables stored in SQL Server databases. The statement can be broken down into several parts called clauses. The FROM clause specifies the table where the data is stored. The SELECT clause contains a list of columns to be retrieved. To filter the data, use the WHERE clause. To sort the data, use the ORDER BY clause.

This chapter covered a lot of ground, especially many nuances of the WHERE clause. Make sure you really understand the material covered in the chapter before continuing in this book. Everything you learn throughout the rest of the book will depend on a thorough knowledge of the basics. The next chapter explores many of the built-in functions you can use to make data retrieval even more interesting.

Answers to the Exercises

This section provides solutions to the exercises found on writing simple SELECT queries.

Solutions to Exercise 3-1: Using the SELECT Statement