Advanced WHERE Clauses
In Chapter 3, you learned how to write queries with SELECT, FROM, WHERE, and ORDER BY clauses. Chapter 3 covered a lot of options for filtering results with the WHERE clause, however, there is still more to learn. This chapter will discuss using the comparison operator LIKE and matching against a partial value or a pattern, full-text search, and WHERE clauses with more than two predicates and the PATINDEX function.
Sometimes you know only part of the value that will match the data stored in the table. For example, you may need to search for one word within a description. You can perform searches with pattern matching using wildcards to find one value within another value.
Using LIKE
Pattern matching is possible by using the LIKE operator in the expression instead of equal to (=) or one of the other operators. Most of the time, the percent sign (%) is used as a wildcard along with LIKE to represent zero or more characters. You will also see the underscore (_) used as a wildcard to replace exactly one character, but it’s not used as often. Type in and run the code from Listing 9-1 to learn how to use LIKE.
Listing 9-1. Using LIKE with the Percent Sign
--1
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName LIKE 'Sand%';
--2
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName NOT LIKE 'Sand%';
--3
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName LIKE '%Z%';
--4
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName LIKE 'Bec_';
The queries in Listing 9-1 contain the keyword DISTINCT to eliminate duplicates in the results shown in Figure 9-1. Query 1 returns all LastName values where the last name starts with Sand. Query 2 returns the opposite—it returns all the LastName values not returned by query 1, which are those rows where the last name does not start with Sand. Query 3 returns all LastName values that contain a Z anywhere in the last name. Query 4 will return only the last name Beck or any last name starting with Bec and one more character, but not the last name Becker because the underscore matches one and only one character.
Figure 9-1. The partial results of queries with LIKE
Restricting the Characters in Pattern Matches
The value matching a wildcard may be restricted to a list or range of characters. To do this, surround the possible values or range by square brackets ([]). Alternately, include the caret (^) symbol to list characters or the range of characters that you don’t want to use as replacements. Here is the syntax for using brackets as the wildcard:
SELECT <column1>,<column2>
FROM <schema>.<table>
WHERE <column> LIKE 'value[a-c]';
SELECT <column1>,<column2>
FROM <schema>.<table>
WHERE <column> LIKE 'value[abc]';
SELECT <column1>,<column2>
FROM <schema>.<table>
WHERE <column> LIKE 'value[^d]';
Type in and execute the code from Listing 9-2, which shows some examples. You will probably not encounter the square bracket technique very often, but you should be familiar with the syntax in case you run into it.
Listing 9-2. Using Square Brackets with LIKE
--1
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName LIKE 'Cho[i-k]';
--2
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName LIKE 'Cho[ijk]';
--3
SELECT DISTINCT LastName
FROM Person.Person
WHERE LastName LIKE 'Cho[^i]';
Figure 9-2 displays the results of running Listing 9-2. Queries 1 and 2 could return unique rows with a last name of Choi, Choj, or Chok because the pattern specifies the range i to k. Only Choi is actually found in the data. Query 1 specifies the range of values, while query 2 explicitly lists the allowable values that may be replaced. Query 3 returns unique rows that have a last name beginning with Cho and ending with any character except for i.
Figure 9-2. The results of queries restricting characters in matches
Searching for Wildcards
Occasionally you will need to search for the percent sign, the underscore, or one of the brackets in a string. If you just include them in your search term, the database engine will ignore them. To get around this, always escape these symbols by including them within brackets when you are actually trying to return values that contain them. Table 9-1 shows the strings to use when searching for one of these symbols within a string.
Table 9-1. Symbols Used in Search Patterns
Symbol |
Search Pattern |
---|---|
Percent % |
'%[%]%' |
Underscore _ |
'%[_]%' |
Left Bracket [ |
'%[[]%' |
Right Bracket ] |
'%[]]%' |
Combining Wildcards
You may combine wildcards to create even more elaborate patterns. Remember that the percent sign replaces any number of characters, the underscore replaces one character, and the square brackets replace one character based on the values within the brackets. Listing 9-3 demonstrates some examples. Type in and execute the code to see how this works.
Listing 9-3. Combining Wildcards in One Pattern
--1
SELECT LastName
FROM Person.Person
WHERE LastName LIKE 'Ber[rg]%';
--2
SELECT LastName
FROM Person.Person
WHERE LastName LIKE 'Ber[^r]%';
--3
SELECT LastName
FROM Person.Person
WHERE LastName LIKE 'Be%n_';
View Figure 9-3 to see the results of running this code. Query 1 returns all rows with a last name beginning with Ber followed by either r or g (which is signified by the characters within the brackets) and then by any number of characters. Query 2 returns all rows with a last name beginning with Ber followed by any letter except for r and then by any number of characters. Query 3 returns all rows with a last name beginning with Be followed by any number of characters, except that the next-to-last character must be an n.
Figure 9-3. The results of queries with multiple wildcards
You will probably find LIKE used frequently in queries, so it’s important to understand how it works. Practice the skills you have just learned by completing Exercise 9-1.
Use the AdventureWorks database to complete this exercise. Follow the steps in this exercise to test your knowledge of pattern matching and wildcard queries. You can find the solutions at the end of the chapter.
SELECT FirstName
FROM Person.Person
WHERE LastName LIKE 'Ja%es';
SELECT FirstName
FROM Person.Person
WHERE LastName LIKE 'Ja_es';
Using PATINDEX
In Chapter 4 you learned to use the CHARINDEX function that returns the position of one string inside another. There is also a function called PATINDEX that works similarly but includes the use of wildcards in the search condition. Here is the syntax:
PATINDEX(‘%pattern%’,expression)
You can use both the percent sign and the underscore and the other patterns used with LIKE. If the pattern is found in the expression, the position where the expression is found is returned. If the pattern is not found, then the function returns 0. Listing 9-4 demonstrates how to use PATINDEX in the SELECT list and the WHERE clause.
Listing 9-4. Using PATINDEX
--1
SELECT LastName, PATINDEX('Ber[rg]%', LastName) AS Position
FROM Person.Person
WHERE PATINDEX('Ber[r,g]%', LastName) > 0;
--2
SELECT LastName, PATINDEX('%r%',LastName) Position
FROM Person.Person
WHERE PATINDEX('%[r]%',LastName) > 0;
Figure 9-4 shows the partial results of running this code. Query 1 returns any rows where the LastName column starts with Ber, either r or g, and then followed by any number of characters. Query 2 returns all rows where the LastName contains the letter R.
Figure 9-4. Using PATINDEX
Using WHERE Clauses with Three or More Predicates
In Chapter 3, you learned how to write a WHERE clause with one or two predicates. A WHERE clause can contain more than two predicates combined by the logical operators AND and OR. If a WHERE clause contains more than two predicates using both AND and OR, you must be careful to ensure that the query returns the expected results. Type in and execute the code in Listing 9-5 to see how the order of the predicates affects the results and how to use parentheses to enforce the correct logic.
Listing 9-5. WHERE Clauses with Three Predicates
--1
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE FirstName = 'Ken' AND LastName = 'Myer'
OR LastName = 'Meyer';
--2
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE LastName = 'Myer' OR LastName = 'Meyer'
AND FirstName = 'Ken';
--3
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE LastName = 'Meyer'
AND FirstName = 'Ken' OR LastName = 'Myer';
--4
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE FirstName = 'Ken' AND (LastName = 'Myer'
OR LastName = 'Meyer'),
You can see the results of Listing 9-5 in Figure 9-5. Once both logical operators AND and OR are used in the WHERE clause, things can get complicated. The logical operator AND takes precedence over OR; therefore, the database engine evaluates AND first. For example, suppose you want to find a name in the Person.Person table, Ken Meyer, but you can’t remember the spelling of the last name. It could be Myer. Listing 9-5 shows four attempts to solve this problem, but only the last one is logically correct.
Figure 9-5. The results of queries that force precedence to ensure the correct results
Query 1 returns the rows with the name Ken Myer but also returns any row with the last name Meyer. Queries 2 and 3 return identical results—the row with Ken Meyer and any rows with the last name Myer. Finally, by using the parentheses, query 4 returns the expected results.
When using multiple conditions, you must be very careful about the precedence, or order, that the expressions are evaluated. The database engine evaluates the conditions in the WHERE clause from left to right, but AND takes precedence over OR. Rearranging the terms can produce different but possibly still invalid results as in the previous example. To guarantee that the query is correct, always use parentheses to enforce the logic once the logical operator OR is added to the WHERE clause. The code is also more readable with the parentheses in the case where they are not needed. Later when you look at the code, you will know exactly what you had intended.
Another interesting twist when using parentheses is that you can negate the meaning of the expression within them by specifying the keyword NOT. For example, you could try to find the rows where the first name is Ken and the last name can’t be Myer or Meyer. Type in and execute Listing 9-6 to see two ways to write the query.
Listing 9-6. Using NOT with Parentheses
--1
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE FirstName='Ken' AND LastName <> 'Myer'
AND LastName <> 'Meyer';
--2
SELECT BusinessEntityID,FirstName,MiddleName,LastName
FROM Person.Person
WHERE FirstName='Ken'
AND NOT (LastName = 'Myer' OR LastName = 'Meyer'),
Often multiple ways exist to solve the same problem, as in this case. Query 1 contains three expressions. One expression restricts the rows to those where FirstName is Ken. The other two expressions compare LastName to a value using not equal to (<>). In query 2, the expressions within the parentheses are evaluated first. Next, that result is negated by the NOT operator to find all last names that are not Myer or Meyer. Finally, only the rows that also have the first name Ken are returned. You can see the results in Figure 9-6.
Figure 9-6. The identical results of two queries with different techniques
As a best practice, always employ parentheses whether they are needed or not to enforce precedence when the WHERE clause includes the logical operator OR. Not only will this decrease the possibility of an incorrect WHERE clause, but it will also increase the readability of the query.
Complete Exercise 9-2 to practice the use of these skills.
Use the AdventureWorks database to complete this exercise. The solution can be found at the end of the chapter.
Performing a Full-Text Search
You have learned how to use LIKE and PATINDEX to find a character match in data. Full-text search provides the ability to search for words or phrases within string or binary data columns similar to a web search such as Google or Bing. You can only use LIKE for pattern matching and not for searching binary data. Full-text search has support for multiple languages and other features such as synonym searches. Full-text search is especially beneficial for documents stored as binary data in the database.
Full-text search must be installed by running the SQL Server installation either during the initial set up or added later. A special full-text index needs to be created on the table. It is beyond the scope of this book to teach you how to set up and manage full-text search, but here you will see how to write some of the basic queries. For more information about full-text search and some of the features not covered here, like CONTAINSTABLE, see the book Pro Full-Text Search in SQL Server 2008 by Hilary Cotter and Michael Coles (Apress, 2008). There are a number of enhancements to full-text searching introduced with SQL Server 2012 that allow you to search the meta-data, for example, the document author, as well. The AdventureWorks database ships with three full-text indexes already in place. Table 9-2 lists the columns with full-text indexes included by default in AdventureWorks.
Table 9-2. AdventureWorks Columns with Full-Text Indexes
Table Name |
Column Name |
Data Type |
---|---|---|
Production.ProductReview |
Comments |
NVARCHAR(3850) |
Production.Document |
DocumentSummary |
NVARCHAR(MAX) |
Production.Document |
Document |
VARBINARY(MAX) |
HumanResources.JobCandidate |
Resume |
XML |
CONTAINS is one of the functions used to search full-text indexes. The simplest way to use CONTAINS is to search a column for a particular word or phrase. Here is the syntax for CONTAINS:
SELECT <column1>,<column2>
FROM <schema>.<tablename>
WHERE CONTAINS(<indexed column>,<searchterm>);
Listing 9-7 shows how to use CONTAINS. Notice that the second query has a regular predicate in the WHERE clause as well. Be sure to type in and execute the code to learn how to use CONTAINS.
Listing 9-7. Using CONTAINS
--1
SELECT FileName
FROM Production.Document
WHERE CONTAINS(Document,'important'),
--2
SELECT FileName
FROM Production.Document
WHERE CONTAINS(Document,' "service guidelines" ')
AND DocumentLevel = 2;
Figure 9-7 displays the results of running this code. Notice how double quotes are used within single quotes to designate a phrase in query 2. Query 2 also demonstrates that both a full-text predicate and a regular predicate can be used in the same query. You may be wondering why the Document column is not part of the results since that is the search term. The document is actually a binary file, such as a Microsoft Word document, that must be opened by the appropriate application.
Figure 9-7. The results of a full-text search operation
Using Multiple Terms with CONTAINS
You can use CONTAINS to find words in data that are not even next to each other by using AND, OR, and NEAR. You can use the operator AND NOT to find results with one term and not another. Listing 9-8 demonstrates this technique.
Listing 9-8. Multiple Terms in CONTAINS
--1
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE CONTAINS(DocumentSummary,'bicycle AND reflectors'),
--2
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE CONTAINS(DocumentSummary,'bicycle AND NOT reflectors'),
--3
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE CONTAINS(DocumentSummary,'maintain NEAR bicycle AND NOT reflectors'),
Figure 9-8 shows the results of running this code. In this case, a regular string data column, DocumentSummary, is searched so you can verify the results.
Figure 9-8. The results from using multiple search terms
You can search multiple columns or all full-text indexed columns at once without multiple CONTAINS predicates in the WHERE clause. Use the asterisk to specify that all possible columns are searched, or use a comma-delimited list in parentheses to specify a list of columns. Type in and execute the code in Listing 9-9, which demonstrates these techniques.
Listing 9-9. Using Multiple Columns
--1
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE CONTAINS((DocumentSummary,Document),'maintain'),
--2
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE CONTAINS((DocumentSummary),'maintain')
OR CONTAINS((Document),'maintain')
--3
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE CONTAINS(*,'maintain'),
The list of columns to be searched in query 1 is explicitly listed and contained within an inner set of parentheses. Query 2 is equivalent to query 1 by using two CONTAINS expressions, each searching a different column for the same term. By using the asterisk in query 3 within the CONTAINS expression, all columns with a full-text index are searched.
FREETEXT is similar to CONTAINS except that it returns rows that don’t exactly match. It will return rows that have terms with similar meanings to your search terms by using a thesaurus. FREETEXT is less precise than CONTAINS, and it is less flexible. The keywords AND, OR, and NEAR can’t be used with CONTAINS. Avoid using double quotes that specify an exact phrase with FREETEXT, because then SQL Server won’t use the thesaurus and will search only for the exact phrase. The same rules about multiple columns apply. Type in and execute the code in Listing 9-10, which compares FREETEXT to LIKE.
Listing 9-10. Using FREETEXT
--1
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE FREETEXT((DocumentSummary),'provides'),
--2
SELECT FileName, DocumentSummary
FROM Production.Document
WHERE DocumentSummary LIKE '%provides%'
Figure 9-9 displays the results from Listing 9-10. The DocumentSummary values in the rows returned from query 1 do not contain the word provides. Query 1 returns the rows anyway because FREETEXT will find similar words as well as exact matches. In each case, the word provided can be found in the data. Query 2 doesn’t return the rows because the LIKE operator is looking for an exact match.
Figure 9-9. The results from using FREETEXT
Full-text search operations can get much more complicated than the information provided here. This was meant to be an overview of the basic syntax. Be sure to see the book Pro Full-Text Search in SQL Server 2008 by Hilary Cotter and Michael Coles (Apress, 2008) to learn more about full-text search. Practice what you have just learned about full-text search by completing Exercise 9-3.
Use the AdventureWorks database to complete the following tasks. Be sure to take advantage of the full-text indexes in place when writing the queries. You can find the solutions at the end of the chapter.
Thinking About Performance
You learned in Chapter 4 that using a function on a column in the WHERE clause can decrease performance, especially if that column is the key column of an index that could have been used for the query. If all the columns required for the query are part of a nonclustered index, either as a key column or included column, it is still possible that the optimizer will choose to scan the nonclustered index instead of scanning the table. You may be wondering what the difference is. The nonclustered index will almost always be a smaller structure than the table and will be less work to scan. Imagine that you had to find a particular phrase in a book. It would be easier for you to read the index instead of reading the entire book because the index is smaller. Type and execute Listing 9-11 to see some examples.
Listing 9-11. Comparing LIKE with CHARINDEX
--1
SET STATISTICS IO ON;
GO
SELECT Name
FROM Production.Product
WHERE CHARINDEX('Bear',Name) = 1;
--2
SELECT Name
FROM Production.Product
WHERE Name LIKE 'Bear%';
--3
SELECT Name, Color
FROM Production.Product
WHERE CHARINDEX('B',Color) = 1;
--4
SELECT Name, Color
FROM Production.Product
WHERE Color LIKE 'B%';
Query 1 uses the CHARINDEX function to find all rows where the Name value starts with Bear. It will return the exact same results as query 2, which uses the LIKE operator. If you take a look at the Messages tab to view the Logical reads, you will see that query 1 takes six logical reads compared to only two logical reads from query 2. The reason for this is that query 1 is scanning the nonclustered index while query 2 is doing a seek against the same index. In this case, a seek of the index is less work.
Queries 3 and 4 do the same test against a non-indexed column, Color. In this case, there is no difference between using a function and LIKE. Both queries take 15 logical reads, the work to scan the entire table. Figure 9-10 shows the Statistics IO information for the four queries.
Figure 9-10. The logical reads comparison
A term you may see that describes the condition where the optimizer can take full advantage of an index is sargable. The term sarg is short for search argument, which is then turned into the adjective sargable describing the search predicate. Of the four queries in this section, only the predicate in query 2 is sargable.
Summary
This chapter introduced several of the more advanced topics you need to learn to fully take advantage of complex WHERE clauses. In addition, you learned about an optional feature of SQL Server, the full-text index. These indexes are used most often with data containing paragraphs or documents, not the keys and limited values of most databases.
Now that you have learned about all the topics involved with selecting data, it is time to learn how to manipulate data by inserting, updating, and deleting rows in Chapter 10.
Answers to the Exercises
This section provides solutions to the exercises found on writing queries with advanced feature WHERE clauses.
Solutions to Exercise 9-1: Using LIKE
Use the AdventureWorks database to complete this exercise.
SELECT ProductID, Name
FROM Production.Product
WHERE Name LIKE 'Chain%';
SELECT ProductID, Name
FROM Production.Product
WHERE Name LIKE '%Paint%';
SELECT ProductID, Name
FROM Production.Product
WHERE Name NOT LIKE '%Paint%';
SELECT BusinessEntityID,
FirstName, MiddleName,
LastName
FROM Person.Person
WHERE MiddleName LIKE '%[EB]%';
SELECT FirstName
FROM Person.Person
WHERE LastName LIKE 'Ja%es';
SELECT FirstName
FROM Person.Person
WHERE LastName LIKE 'Ja_es';
The first query will find any rows with a last name that starts with Ja and ends with es. There can be any number of characters in between. The second query allows only one character in between Ja and es.
Solution to Exercise 9-2: Using WHERE Clauses with Three or More Predicates
Use the AdventureWorks database to complete this exercise.
SELECT SalesOrderID, OrderDate, TotalDue, CreditCardID
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '2006/01/01'
AND OrderDate < '2007/01/01'
AND (TotalDue > 1000 OR CreditCardID IS NOT NULL);
SELECT SUB.Name AS [SubCategory Name],
P.Name AS [Product Name], ProductID, Color
FROM Production.Product P
JOIN Production.ProductSubcategory SUB
ON P.ProductSubcategoryID = SUB.ProductSubcategoryID
WHERE (SUB.Name LIKE '%mountain%' OR P.name like '%mountain%')
AND Color = 'Silver';
Solution to Exercise 9-3: Performing a Full-Text Search
Use the AdventureWorks database to solve the exercises.
SELECT ProductID, Comments
FROM Production.ProductReview
WHERE CONTAINS(Comments,'socks'),
SELECT Title, FileName
FROM Production.Document
WHERE CONTAINS(*,'reflector'),
SELECT Title, FileName
FROM Production.Document
WHERE CONTAINS(*,'reflector AND NOT seat'),
SELECT Title, FileName, DocumentSummary
FROM Production.Document
WHERE FREETEXT(*,'replaced'),