CHAPTER 16

image

Managing Indexes

by Jason Brimhall

Indexes assist with query processing by speeding up access to the data stored in tables and views. Indexes allow for ordered access to data based on an ordering of data rows. These rows are ordered based upon the values stored in certain columns. These columns comprise the index key columns, and their values (for any given row) are a row’s index key.

This chapter contains recipes for creating, altering, and dropping different types of indexes. I demonstrate how indexes can be created, including the syntax for index options, support for partition schemes, the INCLUDE command, page and row lock disabling, index disabling, and the ability to perform online operations.

For the exercises performed in this chapter, you may wish to back up the AdventureWorks2012 database beforehand so that you can restore it to its original state after going through the recipes.

image Note  For coverage of index maintenance, reindexing, and rebuilding (ALTER INDEX), see Chapter 23. Indexed views are covered in Chapter 14. For coverage of index performance troubleshooting and fragmentation, see Chapter 23.

Index Overview

An index is a database object that, when created, can provide faster access paths to data and can facilitate faster query execution. Indexes are used to provide SQL Server with a more efficient method of accessing the data. Instead of always searching every data page in a table, an index facilitates retrieving specific rows without having to read a table’s entire content.

By default, rows in a regular unindexed table aren’t stored in any particular order. A table in an orderless state is called a heap . To retrieve rows from a heap based on a matching set of search conditions, SQL Server would have to read through all the rows in the table. Even if only one row matched the search criteria and that row just happened to be the first row the SQL Server database engine read, SQL Server would still need to evaluate every single table row because there is no other way for it to know if other matching rows exist. Such a scan for information is known as a full-table scan . For a large table, that might mean reading hundreds, thousands, millions, or even billions of rows just to retrieve a single row. However, if SQL Server knows that there is an index on a column (or columns) of a table, then it may be able to use that index to search for matching records more efficiently.

In SQL Server, a table is contained in one or more partitions. A partition is a unit of organization that allows you to separate allocation of data horizontally within a table and/or index while still maintaining a single logical object. When a table is created, by default, all of its data is contained within a single partition. A partition contains heaps or, when indexes are created, B-tree structures.

When an index is created, its index key data is stored in a B-tree structure. A B-tree structure starts with a root node, which is the beginning of the index. This root node has index data that contains a range of index key values that point to the next level of index nodes, called the intermediate leaf level The bottom level of the node is called the leaf level. The leaf level differs based on whether the actual index type is clustered or nonclustered. If it is a clustered index, the leaf level is the actual data page. If it’s a nonclustered index, the leaf level contains pointers to the heap or clustered index data pages.

A clustered index determines how the actual table data is physically stored. You can designate only one clustered index. This index type stores the data according to the designated index key column or columns. Figure 16-1 demonstrates the B-tree structure of the clustered index. Notice that the leaf level consists of the actual data pages.

9781430242000_Fig16-01.jpg

Figure 16-1. B-tree structure of a clustered index

Clustered index selection is a critical choice, because you can have only one clustered index for a single table. In general, good candidates for clustered indexes include columns that are queried often in range queries because the data is then physically organized in a particular order. Range queries use the BETWEEN keyword and the greater-than (>) and less-than (<) operators. Other columns to consider are those used to order large result sets, those used in aggregate functions, and those that contain entirely unique values. Frequently updated columns and non-unique columns are usually not a good choice for a clustered index key, because the clustered index key is contained in the leaf level of all dependent nonclustered indexes, causing excessive reordering and modifications. For this same reason, you should also avoid creating a clustered index with too many or very wide (many bytes) index keys.

Nonclustered indexes store index pages separately from the physical data, with pointers to the physical data located in the index pages and nodes. Nonclustered index columns are stored in the order of the index key column values. You can have up to 999 nonclustered indexes on a table or indexed view. For nonclustered indexes, the leaf node level is the index key coupled to a row locator that points to either the row of a heap or the clustered index row key, as shown in Figure 16-2.

9781430242000_Fig16-02.jpg

Figure 16-2. B-tree structure of a nonclustered index

When selecting columns to be used for nonclustered indexes, look for those columns that are frequently referenced in WHERE, JOIN, and ORDER BY clauses. Search for highly selective columns that would return smaller result sets (less than 20 percent of all rows in a table). Selectivity refers to how many rows exist for each unique index key value. If a column has poor selectivity, for example, containing only zeros or ones, it is unlikely that SQL Server will take advantage of that query when creating the query execution plan, because of its poor selectivity.

An index, either clustered or nonclustered, is based on one or more key values. The index key refers to columns used to define the index itself. SQL Server also has a feature that allows the addition of non-key columns to the leaf level of the index by using the new INCLUDE clause demonstrated later in the chapter. This feature allows more of your query’s selected columns to be returned or “covered” by a single nonclustered index, thus reducing total I/O, because SQL Server doesn’t have to access the clustered leaf level data pages at all.

You can use up to 16 key columns in a single index, so long as you don’t exceed 900 bytes of all index key columns combined. You can’t use large object data types within the index key, including varchar(max), nvarchar(max), varbinary(max), xml, ntext, text, and the image data types.

A clustered or nonclustered index can be specified as either unique or non-unique. Choosing a unique index ensures that the data values inserted into the key column or columns are unique. For unique indexes using multiple keys (called a composite index), the combination of the key values has to be unique for every row in the table.

As noted earlier, indexes can be massively beneficial in terms of your query performance, but there are also costs associated with them. You should only add indexes based on expected query activity, and you should continually monitor whether indexes are still being used over time. If not, they should be removed. Too many indexes on a table can cause performance overhead whenever data modifications are performed in the table, because SQL Server must maintain the index changes alongside the data changes. Ongoing maintenance activities such as index rebuilding and reorganizations will also be prolonged with excessive indexing.

These next few recipes demonstrate how to create, modify, disable, view, and drop indexes.

image Note  See Chapter 23 to learn how to view which indexes are being used for a query. This chapter also covers how to view index fragmentation and identify whether an index is being used over time. To learn how to rebuild or reorganize indexes, see Chapter 23.

16-1. Creating a Table Index

Problem

You have a table that has been created without any indexes. You need to create indexes on this table.

Solution

Here I show you how to create two types of indexes, one clustered and the other nonclustered. An index is created by using the CREATE INDEX command. This chapter reviews the many facets of this command; however, the basic syntax used in this solution is as follows:

 
CREATE [ UNIQUE ] [ CLUSTERED | NONCLUSTERED ] INDEX index_name
ON {
[ database_name. [ schema_name ] . | schema_name. ] table_or_view_name} ( column [ ASC | DESC ] [ ,...n ] )

The arguments of this command are described in Table 16-1.

Table 16-1. CREATE INDEX Command Arguments

Argument Description
[ UNIQUE ] You can have only one primary key on each table. However, if you wish to enforce uniqueness in other non-key columns, you can designate that the index be created with the UNIQUE constraint. You can create multiple UNIQUE indexes for a single table and can include columns that contain NULL values (although only one NULL value is allowed per column combo).
[ CLUSTERED | NONCLUSTERED ] This specifies the index type, either CLUSTERED or NONCLUSTERED. You can have only one CLUSTERED index but up to 999 NONCLUSTERED indexes.
index_name This defines the name of the new index.
[ database_name. [ schema_name ]. This indicates the table or view to be indexed.
| schema_name. ] table_or_view_name}Column This specifies the column or columns to be used as part of the index key.
[ ASC | DESC ] This defines the specific column order of indexing, either ASC for ascending order or DESC for descending order.

To help demonstrate the creation of indexes for this example, I create a new table in the AdventureWorks2012 database and intentionally exclude a PRIMARY KEY in the table definition:

USE AdventureWorks2012;
GO
If Not Exists (Select 1 from sys.objects where name = 'TerminationReason' and SCHEMA_NAME(schema_id) = 'HumanResources')
BEGIN
CREATE TABLE HumanResources.TerminationReason(
  TerminationReasonID smallint IDENTITY(1,1) NOT NULL,
  TerminationReason varchar(50) NOT NULL,
  DepartmentID smallint NOT NULL,
  CONSTRAINT FK_TerminationReason_DepartmentID FOREIGN KEY (DepartmentID)
REFERENCES HumanResources.Department(DepartmentID)
  );
END

Before I demonstrate how to use CREATE INDEX, it is important to remember that when a primary key is created on a column using CREATE TABLE or ALTER TABLE, that primary key also creates an index. Instead of defining this up front, in this example, I create a CLUSTERED index on TerminationReasonID using ALTER TABLE with ADD CONSTRAINT:

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.TerminationReason
ADD CONSTRAINT PK_TerminationReason PRIMARY KEY CLUSTERED (TerminationReasonID);

Next, I create a nonclustered index on the Departments column.

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_TerminationReason_DepartmentID ON HumanResources.TerminationReason (DepartmentID);

How It Works

In this exercise, the TerminationReason table was created without a primary key defined, meaning that, initially, the table was a heap. The primary key was then added afterward using ALTER TABLE. The word CLUSTERED follows the PRIMARY KEY statement, thus designating a clustered index with the new constraint.

ALTER TABLE HumanResources.TerminationReason
ADD CONSTRAINT PK_TerminationReason PRIMARY KEY CLUSTERED (TerminationReasonID)

Had the TerminationReasonID column not been chosen as the primary key, you could have still defined a clustered index on it by using CREATE INDEX.

USE AdventureWorks2012;
GO
CREATE CLUSTERED INDEX CI_TerminationReason_TerminationReasonID ON HumanResources.TerminationReason (TerminationReasonID);

Had a nonclustered index already existed for the table, the creation of the new clustered index would have caused the nonclustered index to be rebuilt in order to swap the nonclustered leaf level row identifier with the clustered key.

The nonclustered index in the example was created as follows:

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_TerminationReason_DepartmentID ON HumanResources.TerminationReason (DepartmentID);

The only difference in syntax between the two index types is the use of CLUSTERED or NONCLUSTERED between the keywords CREATE and INDEX.

16-2. Enforcing Uniqueness on Non-key Columns

Problem

You need to enforce uniqueness on a non-key column in a table.

Solution

Using the table created in the previous recipe (HumanResources.TerminationReason), execute the following script to create a unique index:

USE AdventureWorks2012;
GO
CREATE UNIQUE NONCLUSTERED INDEX UNI_TerminationReason ON HumanResources.TerminationReason (TerminationReason);

Now, I insert two new rows into the table with success.

USE AdventureWorks2012;
GO
INSERT INTO HumanResources.TerminationReason (DepartmentID, TerminationReason)
  VALUES (1, 'Bad Engineering Skills')
  ,(2, 'Breaks Expensive Tools'),

If I attempt to insert a row with a duplicate TerminationReason value, an error will be raised:

USE AdventureWorks2012;
GO
INSERT INTO HumanResources.TerminationReason (DepartmentID, TerminationReason)
  VALUES (2, 'Bad Engineering Skills'),

This query returns the following (results pivoted for formatting):

Msg 2601, Level 14, State 1, Line 9

Cannot insert duplicate key row in object 'HumanResources.TerminationReason'

with unique index 'UNI_TerminationReason'.

The duplicate key value is (Bad Engineering Skills).

The statement has been terminated.

Selecting the current rows from the table shows that only the first two rows were inserted.

USE AdventureWorks2012;
GO
SELECT TerminationReasonID, TerminationReason, DepartmentID
  FROM HumanResources.TerminationReason;

This query returns the following (results pivoted for formatting):

TerminationReasonID	TerminationReason	DepartmentID
---------------------- ---------------------- ------------
1                 Bad Engineering Skills 1
2                 Breaks Expensive Tools 2

How It Works

A unique index was created on the TerminationReason column, which means that each row must have a unique value. You can choose multiple unique constraints for a single table. NULL values are permitted in a unique index and may not be duplicated, much like non-NULL values. Like a primary key, unique indexes enforce entity integrity by ensuring that rows can be uniquely identified.

16-3. Creating an Index on Multiple Columns

Problem

You need to create a composite index to support queries that utilize multiple columns in a search predicate or result set.

Solution

In previous recipes, I’ve shown you how to create an index on a single column; however, many times you will want more than one column to be used in a single index. Use composite indexes when two or more columns are often searched within the same query or are often used in conjunction with one another.

In this example, I have determined that TerminationReason and DepartmentID will often be used in the same WHERE clause of a SELECT query. With that in mind, I create the following multi-column NONCLUSTERED INDEX:

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason(TerminationReason, DepartmentID);

How It Works

Choosing which columns to index is a bit of an art. You’ll want to add indexes to columns that you know will be commonly queried; however, you must always keep a column’s selectivity in mind. If a column has poor selectivity (containing a few unique values across thousands of rows), for example, it is unlikely that SQL Server will take advantage of that index when creating the query execution plan.

One general rule of thumb when creating a composite index is to put the most selective columns at the beginning, followed by the other, less-selective columns. In this recipe’s example, the TerminationReason was chosen as the first column, followed by DepartmentID. Both are guaranteed to be totally unique in the table and are, therefore, equally selective.

16-4. Defining Index Column Sort Direction

Problem

You need to create an index to support the sort order expected by the application and business requirements.

Solution

The default sort for an indexed column is ascending order. You can explicitly set the ordering using ASC or DESC in the column definition of CREATE INDEX:

( column [ ASC | DESC ] [ ,...n ] )

In this example, I add a new column to a table and then index the column using a descending order:

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.TerminationReason
ADD ViolationSeverityLevel smallint;
GO
CREATE NONCLUSTERED INDEX NI_TerminationReason_ViolationSeverityLevel
  ON HumanResources.TerminationReason (ViolationSeverityLevel DESC);

How It Works

In this recipe’s example, a new column, ViolationSeverityLevel, was added to the TerminationReason table:

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.TerminationReason
ADD ViolationSeverityLevel smallint;
GO

Query authors may want to most commonly sort on this value, showing ViolationSeverityLevel from highest to lowest. Matching index order to how you think users will use ORDER BY in the query can improve query performance, because SQL Server isn’t then required to re-sort the data when the query is processed. The index is created with the DESC instruction after the column name:

(ViolationSeverityLevel DESC)

If you have multiple key columns in your index, each can have its own sort order.

16-5. Viewing Index Metadata

Problem

You have created indexes in your database. Now you need some mechanism for tracking where they have been created and what the names, types, and columns are that define them.

Solution

Use the sp_helpindex system stored procedure to view the index names, descriptions, and keys for indexes on a specific table. This system stored procedure only takes a single argument, the name of the table whose indexes you want to view.

This example demonstrates viewing all indexes on the Employee table:

USE AdventureWorks2012;
GO
EXEC sp_helpindex 'HumanResources.Employee';

This returns the following sample results:

index_name index_description index_keys
AK_Employee_LoginID nonclustered, unique located on PRIMARY LoginID
AK_Employee_NationalIDNumber nonclustered, unique located on PRIMARY NationalIDNumber
AK_Employee_rowguid nonclustered, unique located on PRIMARY rowguid
IX_Employee_OrganizationLevel_OrganizationNode nonclustered located on PRIMARY OrganizationLevel, OrganizationNode
IX_Employee_OrganizationNode nonclustered located on PRIMARY OrganizationNode

For more in-depth analysis of indexes, you can use the sys.indexes system catalog view. For example, the following query shows index options (which are discussed later in the chapter) for the HumanResources.Employee table:

USE AdventureWorks2012;
GO
SELECT index_name = SUBSTRING(name, 1,30) ,
  allow_row_locks,
  allow_page_locks,
  is_disabled,
  fill_factor,
  has_filter
  FROM sys.indexes
  WHERE object_id = OBJECT_ID('HumanResources.Employee'),

This returns the following sample results:

index_name allow_row_locks allow_page_locks is_disabled fill_factor has_filter
PK_Employee_BusinessEntityID 1 1 0 0 0
IX_Employee_OrganizationNode 1 1 0 0 0
IX_Employee_OrganizationLevel_ 1 1 0 0 0
AK_Employee_LoginID 1 1 0 0 0
AK_Employee_NationalIDNumber 1 1 0 0 0

How It Works

You can use the system stored procedure sp_helpindex call to list the indexes on a specific table. The output also returns a description of the indexes, including the type and filegroup location. The key columns defined for the index are also listed.

The sys.indexes system catalog view can also be used to find out more about the configured settings of a specific index.

Several of the options shown in this system catalog view haven’t been covered yet, but Table 16-2 discusses some of them that I’ve discussed.

Table 16-2. A Subset of the sys.indexes System Catalog Columns

Column Description
object_id This is the object identifier of the table or view to which the index belongs. You can use the OBJECT_NAME function to show the table or view name, or you can use OBJECT_ID to convert a table or view name into its database object identifier.
name This indicates the index name.
index_id When index_id is 0, the index is a heap. When index_id is 1, the index is a clustered index. When index_id is greater than 1, it is a nonclustered index.
type This specifies the index type, which can be 0 for heap, 1 for clustered index, 2 for nonclustered, 3 for an XML index, 4 for spatial, 5 for clustered columnstore index, and 6 for nonclustered columnstore index.
type_desc This defines the index type description.
is_unique When is_unique is 1, the index is a unique index.
is_primary_key When is_primary_key is 1, the index is the result of a primary key constraint.
is_unique_constraint When is_unique_constraint is 1, the index is the result of a unique constraint.

16-6. Disabling an Index

Problem

You have had a disk error and would like to defer creation of an index affected by the disk error.

Solution

Disable the index. Consider disabling an index as an index troubleshooting technique or if a disk error has occurred and you would like to defer the index’s re-creation.

image Caution  If you disable a clustered index, keep in mind that the table index data will no longer be accessible. This is because the leaf level of a clustered index is the actual table data itself. Also, reenabling the index means either re-creating or rebuilding it (see the “How It Works” section for more information).

An index is disabled by using the ALTER INDEX command. The syntax is as follows:

ALTER INDEX index_name ON
table_or_view_name DISABLE

The command takes two arguments: the name of the index and the name of the table or view on which the index is created. In this example, I disable the UNI_TerminationReason index on the TerminationReason table:

USE AdventureWorks2012;
GO
ALTER INDEX UNI_TerminationReason
  ON HumanResources.TerminationReason DISABLE

How It Works

This recipe demonstrated how to disable an index. If an index is disabled, the index definition remains in the system tables, although the user can no longer use the index. For nonclustered indexes on a table, the index data is actually removed from the database. For a clustered index on a table, the data remains on disk, but because the index is disabled, you can’t query it. For a clustered or nonclustered index on the view, the index data is removed from the database.

To reenable the index, you can use either the CREATE INDEX with DROP_EXISTING command (see later in this chapter) or ALTER INDEX REBUILD (described in Chapter 23). Rebuilding a disabled nonclustered index reuses the existing space used by the original index.

16-7. Dropping Indexes

Problem

You have determined that an index is no longer used and needs to be removed from the database.

Solution

Drop the index. When you drop an index, it is physically removed from the database. If this is a clustered index, the table’s data remains in an unordered (heap) form. You can remove an index entirely from a database by using the DROP INDEX command. The basic syntax is as follows:

DROP INDEX <table_or_view_name>.<index_name> [ ,...n ]

In this example, I demonstrate dropping a single index from a table:

USE AdventureWorks2012;
GO
DROP INDEX HumanResources.TerminationReason.UNI_TerminationReason;

How It Works

You can drop one or more indexes for a table using the DROP . . . INDEX command. Dropping an index frees the space taken up by the index and removes the index definition from the database. You can’t use DROP INDEX to remove indexes that result from the creation of a PRIMARY KEY or UNIQUE CONSTRAINT. If you drop a clustered index that has nonclustered indexes on it, those nonclustered indexes will also be rebuilt in order to swap the clustered index key for a row identifier of the heap.

16-8. Changing an Existing Index

Problem

You need to alter an existing index to add or remove columns or to reorganize the column order.

Solution

Change the column definition of an existing index by using CREATE INDEX ... DROP_EXISTING. This option also has the advantage of dropping and re-creating an index within a single command (instead of using both DROP INDEX and CREATE INDEX). Also, using DROP_EXISTING on a clustered index will not cause existing nonclustered indexes to be automatically rebuilt, unless the index column definition has changed.

Here I show you how to drop and re-create an index within a single execution, as well as change the key column definition of an existing index. The ALTER INDEX can be used to change index options, rebuild and reorganize indexes (reviewed in Chapter 23), and disable an index, but it is not used to add, delete, or rearrange columns in the index.

The following example demonstrates just rebuilding an existing nonclustered index (no change in the column definition):

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason(TerminationReason, DepartmentID)
WITH (DROP_EXISTING = ON);
GO

Next, a new column is added to the existing nonclustered index.

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason(TerminationReason, ViolationSeverityLevel, DepartmentID DESC)
WITH (DROP_EXISTING = ON);
GO

How It Works

In the first example, CREATE INDEX didn’t change anything about the existing index definition but instead just rebuilt it by using the DROP_EXISTING clause. Rebuilding an index can help defragment the data, something that is discussed in more detail in Chapter 23.

In the second statement, a new column was added to the existing index and placed right before the DepartmentID column. The index was re-created with the new index key column. You cannot use DROP_EXISTING to change the name of the index. For that, use DROP INDEX and CREATE INDEX with the new index name.

Controlling Index Build Performance and Concurrency

So far in this chapter, I’ve reviewed how an index is defined, but note that you can also determine under what circumstances an index is built. For example, when creating an index in SQL Server, to improve the performance, you can designate that a parallel plan of execution is used, instantiating multiple processors to help complete a time-consuming build. In addition to this, you could also direct SQL Server to create the index in tempdb, instead of causing file growth operations in the index’s home database. If you are using Enterprise Edition, you can also allow concurrent user query access to the underlying table during the index creation by using the ONLINE option.

The next three recipes demonstrate methods for improving the performance of the index build, as well as improving user concurrency during the operation.

16-9. Sorting in Tempdb

Problem

You need to mitigate the length of time to create indexes as well as minimize potential for file growth operations in the user database.

Solution

If index creation times are taking too long for what you expect, you can try to use the index option SORT_IN_TEMPDB to improve index build performance (for larger tables). This option pushes the intermediate index build results to the tempdb database instead of using the user database where the index is housed.

In this recipe, I show you how to push index creation processing to the tempdb system database. The tempdb system database is used to manage user connections, temporary tables, temporary stored procedures, or temporary work tables needed to process queries on the SQL Server instance. Depending on the database activity on your SQL Server instance, you can sometimes reap performance benefits by isolating the tempdb database on its own disk array, separate from other databases.

The syntax for this option, which can be used in both CREATE INDEX and ALTER INDEX, is as follows:

WITH (SORT_IN_TEMPDB = { ON | OFF })

The default for this option is OFF. In this example, I create a new nonclustered index with the SORT_IN_TEMPDB option enabled.

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NI_Address_PostalCode
  ON Person.Address (PostalCode)
  WITH (SORT_IN_TEMPDB = ON);

How It Works

The SORT_IN_TEMPDB option enables the use of the tempdb database for intermediate index results. This option may decrease the amount of time it takes to create the index for a large table, but with the trade-off that the tempdb system database will need additional space to participate in this operation.

16-10. Controlling Index Creation Parallelism

Problem

You need to limit the number of processors that index creation can utilize.

Solution

If using SQL Server Enterprise Edition with a multiprocessor server, you can control/limit the number of processors potentially used in an index creation operation by using the MAXDOP index option. Parallelism, which in this context is the use of two or more processors to fulfill a single query statement, can potentially improve the performance of the index creation operation.

The syntax for this option, which can be used in both CREATE INDEX and ALTER INDEX, is as follows:

MAXDOP = max_degree_of_parallelism

The default value for this option is 0, which means that SQL Server can choose any or all of the available processors for the operation. A MAXDOP value of 1 disables parallelism on the index creation.

This example demonstrates how to control the number of processors used in parallel plan execution (parallelism) during an index creation:

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NI_Address_AddressLine1
  ON Person.Address (AddressLine1)
  WITH (MAXDOP = 4);

How It Works

In this recipe, the index creation was limited to four processors.

WITH (MAXDOP = 4)

Just because you set MAXDOP doesn’t guarantee that SQL Server will actually use the number of processors that you designate. It only ensures that SQL Server will not exceed the MAXDOP threshold.

16-11. User Table Access During Index Creation

Problem

Users must have continued access throughout the creation of indexes.

Solution

In this recipe, I show you how to allow query activity to continue to access the index even while an index creation process is executing. If you are using SQL Server Enterprise Edition, you can allow concurrent user query access to the underlying table during index creation by using the new ONLINE option, which is demonstrated in this next recipe.

USE AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_ProductVendor_MinOrderQty
  ON Purchasing.ProductVendor(MinOrderQty)
  WITH (ONLINE = ON); -- Online option is an Enterprise Edition feature

How It Works

With the ONLINE option in the WITH clause of the index creation, long-term table locks are not held during index creation. This can provide better concurrency on larger indexes that contain frequently accessed data. When the ONLINE option is set ON, only intent share locks are held on the source table for the duration of the index creation, instead of the default behavior of a longer-term table lock held for the duration of the index creation.

Index Options

The next three recipes cover options that affect performance, each in its own different way. For example, the INCLUDE keyword allows you to add non-key columns to a nonclustered index. This allows you to create a covering index that can be used to return data to the user without having to access the clustered index data.

The second recipe discusses how the PAD_INDEX and FILLFACTOR options determine how to set the initial percentage of rows to fill the index leaf level pages and intermediate levels of an index. The recipe discusses how the fill factor affects the performance of not only queries but also insert, update, and delete operations.

The third recipe covers how to disable certain locking types for a specific index. As discussed in the recipe, using these options allows you to control both concurrency and resource usage when queries access the index.

16-12. Using an Index INCLUDE

Problem

You need to provide a covering index for a query that requires the use of several non-key columns.

Solution

One solution to this problem is the INCLUDE keyword, which allows you to add up to 1,023 non-key columns to the nonclustered index, helping you improve query performance by creating a covered index. These non-key columns are not stored at each level of the index but instead are found only in the leaf level of the nonclustered index.

A covering query is a query whose referenced columns are found entirely within a nonclustered index. This scenario often results in better query performance, because SQL Server does not have to retrieve the actual data from the clustered index or heap: it only needs to read the data stored in the nonclustered index. The drawback, however, is that you can only include up to 16 columns or up to 900 bytes for an index key.

The syntax for using INCLUDE with CREATE NONCLUSTERED INDEX is as follows:

CREATE NONCLUSTERED INDEX index_name
ON table_or_view_name ( column [ ASC | DESC ] [ ,...n ] ) INCLUDE ( column [ ,... n ] )

The first column list is for key index columns, and the column list after INCLUDE is for non-key columns. In this example, I create a new large object data type column to the TerminationReason table. I drop the existing index on DepartmentID and re-create it with the new non-key value in the index.

USE AdventureWorks2012;
GO
ALTER TABLE HumanResources.TerminationReason
  ADD LegalDescription varchar(max);
Go
DROP INDEX HumanResources.TerminationReason.NI_TerminationReason_TerminationReason_DepartmentID;
Go
CREATE NONCLUSTERED INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason (TerminationReason, DepartmentID)
  INCLUDE (LegalDescription);

How It Works

This recipe demonstrated a technique for enhancing the usefulness of a nonclustered index. The example started with creating a new varchar(max) data type column. Because of its data type (large object, LOB), it cannot be used as a key value in the index; however, using it within the INCLUDE keyword will allow you to reference the new large object data types. The existing index on the TerminationReason table was then dropped and re-created using INCLUDE with the new non-key column.

You can use INCLUDE only with a nonclustered index (where it comes in handy for covering queries), and you still can’t include the deprecated image, ntext, and text data types. Also, if the index size increases too significantly because of the additional non-key values, you may lose some of the query benefits that a covering index can give you, so be sure to test comparatively before and after performance.

16-13. Using PADINDEX and FILLFACTOR

Problem

You need to create an index that will help minimize page splits due to insert operations.

Solution

Set the initial percentage of rows to fill the index leaf level pages and intermediate levels of an index. The fill factor percentage of an index refers to how full the leaf level of the index pages should be when the index is first created. The default fill factor, if not explicitly set, is 0, which equates to filling the pages as full as possible (SQL Server does leave some space available, enough for a single index row). Leaving some space available, however, allows new rows to be inserted without resorting to page splits. A page split occurs when a new row is added to a full index page. To make room, half the rows are moved from the existing full page to a new page. Numerous page splits can slow down INSERT operations. On the other hand, however, fully packed data pages allow for faster read activity, because the database engine can retrieve more rows from fewer data pages.

The PAD_INDEX option, used only in conjunction with FILLFACTOR, specifies that the specified percentage of free space be left open on the intermediate level pages of an index. These options are set in the WITH clause of the CREATE INDEX and ALTER INDEX commands. The syntax is as follows:

WITH (PADINDEX = { ON | OFF } FILLFACTOR = fillfactor)

In this example, an index is dropped and re-created with a 50 percent fill factor and PADINDEX enabled:

USE AdventureWorks2012;
GO
DROP INDEX HumanResources.TerminationReason.NI_TerminationReason_TerminationReason_DepartmentID;
GO
CREATE NONCLUSTERED INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason (TerminationReason ASC, DepartmentID ASC)
  WITH (PAD_INDEX=ON, FILLFACTOR=50);
GO

How It Works

In this recipe, the fill factor was configured to 50 percent, leaving 50 percent of the index pages free for new rows. PADINDEX was also enabled, so the intermediate index pages will also be left half free. Both options are used in the WITH clause of the CREATE INDEX syntax.

WITH (PAD_INDEX=ON, FILLFACTOR=50)

Using FILLFACTOR can be a balancing act between reads and writes. For example, a 100 percent fill factor can improve reads but slow down write activity, causing frequent page splitting because the database engine must continually shift row locations in order to make space in the data pages. Having too low of a fill factor can benefit row inserts, but it can also slow down read operations, because more data pages must be accessed in order to retrieve all required rows. If you’re looking for a general rule of thumb, use a 100 percent (default) fill factor for tables with almost no data modification activity, 80 to 90 percent for low activity, and 70 to 80 percent for moderate to high activity on the index key. When setting this value, test your performance extensively before and after the change to ensure it will work as desired.

16-14. Disabling Page and/or Row Index Locking

Problem

You need to eliminate resource locking during index creation while also troubleshooting performance issues.

Solution

Change the lock resource types that can be locked for a specific index. In Chapter 12, I discussed various lock types and resources within SQL Server. Specifically, various resources can be locked by SQL Server from small (row and key locks) to medium (page locks, extents) to large (table, database). Multiple, smaller-grained locks help with query concurrency, assuming there are a significant number of queries simultaneously requesting data from the same table and associated indexes. Numerous locks take up memory, however, and can lower performance for the SQL Server instance as a whole. The trade-off is larger-grained locks, which increase memory resource availability but also reduce query concurrency.

You can create an index that restricts certain locking types when it is queried. Specifically, you can designate whether page or row locks are allowed.

In general, you should allow SQL Server to automatically decide which locking type is best; however, there may be a situation where you want to temporarily restrict certain resource locking types for troubleshooting or a severe performance issue. The syntax for configuring these options for both CREATE INDEX and ALTER INDEX is as follows:

WITH ( ALLOW_ROW_LOCKS = { ON | OFF } ALLOW_PAGE_LOCKS = { ON | OFF })

This recipe shows you how to disable the database engine’s ability to place row or page locks on an index, forcing it to use table locking instead:

USE AdventureWorks2012;
GO
-- Disable page locks. Table and row locks can still be used.
CREATE INDEX NI_EmployeePayHistory_Rate
  ON HumanResources.EmployeePayHistory (Rate)
  WITH (ALLOW_PAGE_LOCKS=OFF);
-- Disable page and row locks. Only table locks can be used.
ALTER INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason
  SET (ALLOW_PAGE_LOCKS=OFF,ALLOW_ROW_LOCKS=OFF );
-- Allow page and row locks.
ALTER INDEX NI_TerminationReason_TerminationReason_DepartmentID
  ON HumanResources.TerminationReason
  SET (ALLOW_PAGE_LOCKS=ON,ALLOW_ROW_LOCKS=ON );

How It Works

This recipe demonstrated three variations. The first query created a new index on the table, configured so that the database engine couldn’t issue page locks against the index:

WITH (ALLOW_PAGE_LOCKS=OFF)

In the next statement, both page and row locks were turned OFF (the default for an index is for both to be set to ON):

ALTER INDEX NI_TerminationReason_TerminationReason_DepartmentID
ON HumanResources.TerminationReason
SET (ALLOW_PAGE_LOCKS=OFF,ALLOW_ROW_LOCKS=OFF );

In the last statement, page and row locking is reenabled:

SET (ALLOW_PAGE_LOCKS=ON,ALLOW_ROW_LOCKS=ON )

Removing locking options should only be done if you have a good reason to do so; for example, you may have activity that causes too many row locks, which can eat up memory resources. Instead of row locks, you may want to have SQL Server use larger-grained page or table locks instead.

Managing Very Large Indexes

This next set of recipes for this chapter cover methods for managing very large indexes; however, the features demonstrated here can be applied to smaller and medium-sized indexes as well. For example, you can designate that an index be created on a separate filegroup. Doing so can provide benefits from both the manageability and performance sides, because you can then perform separate backups by filegroup, as well as improve I/O performance of a query if the filegroup has files that exist on a separate array.

As was reviewed in Chapter 15, also in addition to table partitioning you can implement index partitioning. Partitioning allows you to break down the index data set into smaller subsets of data. As discussed in the recipe, if large indexes are separated onto separate partitions, this can positively impact the performance of a query (particularly for very large indexes).

SQL Server provides us with the filtered index feature and the ability to compress data at the page and row level. The filtered index feature allows you to create an index and associated statistics for a subset of values. If incoming queries hit only a small percentage of values within a column, for example, you can create a filtered index that will target only those common values, thus reducing the overall index size compared to a full-table index, and also improving the accuracy of the underlying statistics.

As for the compression feature, available in the Enterprise and Developer Editions, you can designate row or page compression for an index or specified partitions. I demonstrated this feature for CREATE TABLE and ALTER TABLE in Chapter 15. In this chapter, I continue this discussion with how to enable compression using CREATE INDEX and ALTER INDEX.

16-15. Creating an Index on a Filegroup

Problem

You have been required to create indexes in a filegroup other than the filegroup containing the tables and data.

Solution

Create an index on a specific filegroup. If not explicitly designated, an index is created on the same filegroup as the underlying table. This is accomplished using the ON clause of the CREATE INDEX command.

ON filegroup_name | default

This option can take an explicit filegroup name or the database default filegroup. (For more information on filegroups, see Chapter 15.)

This example demonstrates how to explicitly define in which filegroup an index is stored. First, I create a new filegroup on the AdventureWorks2012 database.

Use master;
GO
ALTER DATABASE AdventureWorks2012
  ADD FILEGROUP FG2;

Next, I add a new file to the database and the newly created filegroup in a folder on the root of C called Apress.

Use AdventureWorks2012;
GO
ALTER DATABASE AdventureWorks2012
  ADD FILE
--Please ensure the Apress directory exists or change the path in the FILENAME statement
  ( NAME = AW2,FILENAME = 'c:Apressaw2.ndf',SIZE = 1MB )
  TO FILEGROUP FG2;

Lastly, I create a new index, designating that it be stored on the newly created filegroup.

Use AdventureWorks2012;
GO
CREATE INDEX NI_ProductPhoto_ThumnailPhotoFileName
  ON Production.ProductPhoto (ThumbnailPhotoFileName)
  ON [FG2];

How It Works

The first part of the recipe creates a new filegroup in the AdventureWorks2012 database called FG2 using the ALTER DATABASE command. After that, a new database data file is created on the new filegroup. Lastly, a new index is created on the FG2 filegroup. The ON clause designated the filegroup name for the index in square brackets.

ON [FG2]

Filegroups can be used to help manage very large databases, both by allowing separate backups by filegroup and by improving I/O performance if the filegroup has files that exist on a separate array.

16-16. Implementing Index Partitioning

Problem

You have a partitioned table that is being queried. The indexes on this table are touching each partition and performing less than optimally. You need to optimize the index performance.

Solution

Apply partitioning to a nonclustered index. In Chapter 15, I demonstrated table partitioning. Partitioning can provide manageability, scalability, and performance benefits for large tables. This is because partitioning allows you to break down the data set into smaller subsets of data. Depending on the index key(s), an index on a table can also be quite large. Applying the partitioning concept to indexes, if large indexes are separated onto separate partitions, this can positively affect the performance of a query. Queries that target data from just one partition will benefit because SQL Server will only target the selected partition instead of accessing all partitions for the index.

This recipe demonstrates index partitioning using the HitDateRangeScheme partition scheme that was created in Chapter 15 on the Sales.WebSiteHits table:

Use AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NI_WebSiteHits_WebSitePage
  ON Sales.WebSiteHits (WebSitePage)
  ON [HitDateRangeScheme] (HitDate);

How It Works

The partition scheme is applied using the ON clause.

ON [HitDateRangeScheme] (HitDate)

Notice that although the HitDate column wasn’t a nonclustered index key, it was included in the partition scheme, matching that of the table. When the index and table use the same partition scheme, they are said to be aligned.

You can choose to use a different partitioning scheme for the index than the table; however, that scheme must use the same data type argument, number of partitions, and boundary values. Unaligned indexes can be used to take advantage of co-located joins, meaning if you have two columns from two tables that are frequently joined that also use the same partition function, same data type, number of partitions, and boundaries, you can potentially improve query join performance. However, the common approach will most probably be to use aligned partition schemes between the index and table, for administration and performance reasons.

16-17. Indexing a Subset of Rows

Problem

You have a query that is performing subpar. The query searches on a column for a range of values that comprise less than 10 percent of the total rows in the table. You need to optimize this index.

Solution

Add a filtered index to support this query. SQL Server 2008 introduced the ability to create filtered nonclustered indexes in support of queries that require only a small percentage of table rows. The CREATE INDEX command includes a filter predicate that can be used to reduce index size by indexing only rows that meet certain conditions. That reduced index size saves on disk space and potentially improves the performance of queries that now need only read a fraction of the index entries that they would otherwise have to process.

The filter predicate allows for several comparison operators to be used, including IS, IS NOT, =, <>, >, <, and more. In this recipe, I demonstrate how to add filtered indexes to one of the larger tables in the AdventureWorks2012 database, Sales.SalesOrderDetail. To set up this example, let’s assume that I have the following common query against the UnitPrice column:

Use AdventureWorks2012;
GO
SELECT SalesOrderID
  FROM Sales.SalesOrderDetail
  WHERE UnitPrice BETWEEN 150.00 AND 175.00;

Let’s also assume that the person executing this query is the only one who typically uses the UnitPrice column in the search predicate. When she does query it, she is concerned only with values between $150 and $175. Creating a full index on this column may be considered to be wasteful. If this query is executed often, and a full clustered index scan is performed against the base table each time, this may cause performance issues.

I have just described an ideal scenario for a filtered index on the UnitPrice column. You can create that filtered index as follows:

Use AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_UnitPrice_SalesOrderDetail
  ON Sales.SalesOrderDetail(UnitPrice)
  WHERE UnitPrice >= 150.00 AND UnitPrice <= 175.00;

Queries that search against UnitPrice and that also search in the defined filter predicate range will likely use the filtered index instead of performing a full-index scan or using full-table index alternatives.

In another example, let’s assume it is common to query products with two distinct IDs. In this case, I am also querying anything with an order quantity greater than 10. However, this is not my desired filtering scenario, just filtering on the product ID:

Use AdventureWorks2012;
GO
SELECT SalesOrderDetailID
  FROM Sales.SalesOrderDetail
  WHERE ProductID IN (776, 777)
  AND OrderQty > 10;

This query performs a clustered index scan. I can improve the performance of the query by adding a filtered index, which will result in an index seek against that nonclustered index instead of the clustered index scan. Here’s how to create that filtered index:

Use AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_ProductID_SalesOrderDetail
  ON Sales.SalesOrderDetail(ProductID,OrderQty)
  WHERE ProductID IN (776, 777);

The result will be less I/O, because the query can operate against the much smaller, filtered index.

How It Works

This recipe demonstrates how to use the filtered index feature to create a fine-tuned index that requires less storage than the full-table index alternative. Filtered indexes require that you understand the nature of incoming queries against the tables in your database. If you have a high percentage of queries that consistently query a small percentage of data in a set of tables, filtered indexes will allow you to improve I/O performance while also minimizing on-disk storage.

The CREATE INDEX statement isn’t modified much from its original format. To implement the filter, I used a WHERE clause after the ON clause (if using an INCLUDE, the WHERE should appear after it):

Use AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_UnitPrice_SalesOrderDetail
  ON Sales.SalesOrderDetail(UnitPrice)
  WHERE UnitPrice >= 150.00 AND UnitPrice <= 175.00;

The filter predicate allows for simple logic using operators such as IN, IS, IS NOT, =, <>, >, >=, !>, <, <=, and !<. You should also be aware that filtered indexes have filtered statistics created along with them. These statistics use the same filter predicate and can result in more accurate results because the sampling is against a smaller row set.

16-18. Reducing Index Size

Problem

You have discovered that your indexes are significantly large. You need to reduce the size of these indexes without altering the definition of the index.

Solution

Implement compression on the indexes in question. As was covered in Chapter 15, the SQL Server 2012 Enterprise and Developer editions provide options for page and row-level compression of tables, indexes, and the associated partitions. That chapter demonstrated how to enable compression using the DATA_COMPRESSION clause in conjunction with the CREATE TABLE and ALTER TABLE commands. It also covered how you compress clustered indexes and heaps. For nonclustered indexes, you use CREATE INDEX and ALTER INDEX to implement compression. The syntax remains the same, designating the DATA_ COMPRESSION option along with a value of NONE, ROW, or PAGE. The following example demonstrates adding a nonclustered index with PAGE-level compression (based on the example table ArchiveJobPosting created in Recipe 15-14):

Use AdventureWorks2012;
GO
CREATE NONCLUSTERED INDEX NCI_SalesOrderDetail_CarrierTrackingNumber
  ON Sales.SalesOrderDetail (CarrierTrackingNumber)
  WITH (DATA_COMPRESSION = PAGE);

I can modify the compression level after the fact by using ALTER INDEX. In this example, I use ALTER INDEX to change the compression level to row-level compression:

Use AdventureWorks2012;
GO
ALTER INDEX NCI_SalesOrderDetail_CarrierTrackingNumber
ON Sales.SalesOrderDetail
REBUILD
WITH (DATA_COMPRESSION = ROW);

How It Works

This recipe demonstrated enabling row and page compression for a nonclustered index. The process for adding compression is almost identical to that of adding compression for the clustered index or heap, using the DATA_COMPRESSION index option. When creating a new index, the WITH clause follows the index key definition. When modifying an existing index, the WITH clause follows the REBUILD keyword.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset