Chapter 11. Introducing SQL Server In-Memory OLTP

IT systems today are required to deliver maximum performance with zero downtime and maximum flexibility for developers and administrators. As developers, we all know that if the performance of our applications is somehow lacking, then the cause of the slowdown must be in the database!

Major causes of performance problems in database projects can be traced back to database design mistakes, to the limitations built into the database engine to ensure data consistency for concurrent connections, or to a mixture of both.

Microsoft SQL Server adheres to the ACID properties. A simplified explanation of ACID is that it requires changes made to a database to fulfil four properties: Atomicity, Consistency, Isolation, and Durability (a short T-SQL sketch follows the list):

  • Atomicity states that the contents of each transaction are either processed successfully and completely or fail in their entirety. "Half-transactions" are not possible.
  • Consistency states that a transaction brings a database from one valid state to another. This includes adherence to the logical model inside the database (constraints, rules, triggers, and so on). A transaction may not change a database from a logically valid state to a logically invalid state, for example, by performing a data change that violates a unique constraint.
  • Isolation states that the changes made inside a transaction are isolated from and unaffected by other concurrent operations/transactions.
  • Durability states that the changes made inside a transaction remain inside the database permanently, even if the system goes offline immediately after the transaction has completed.
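The interplay of atomicity and consistency is easy to demonstrate. The following is a minimal T-SQL sketch, using a hypothetical dbo.Account table with a CHECK constraint; when the first UPDATE violates the constraint, the CATCH block rolls the transaction back in its entirety:

-- Hypothetical table with a consistency rule: balances may never go negative.
CREATE TABLE dbo.Account
(
    AccountId INT PRIMARY KEY,
    Balance   MONEY NOT NULL CHECK (Balance >= 0)
);

INSERT INTO dbo.Account (AccountId, Balance)
VALUES (1, 100), (2, 50);

BEGIN TRY
    BEGIN TRANSACTION;
        -- Attempt to move 200 from account 1 to account 2.
        -- The CHECK constraint rejects the negative balance ...
        UPDATE dbo.Account SET Balance = Balance - 200 WHERE AccountId = 1;
        -- ... so execution jumps to CATCH and this credit never runs.
        UPDATE dbo.Account SET Balance = Balance + 200 WHERE AccountId = 2;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Atomicity: the half-finished transaction is rolled back completely.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
END CATCH;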

To adhere to the ACID properties, Microsoft SQL Server traditionally employs the pessimistic concurrency model. Pessimistic concurrency assumes there will be conflicting concurrent transactions attempting to change the same data simultaneously. Therefore, transactions must (temporarily) lock the data they are changing to prevent other transactions from interfering with the isolation of those changes.
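To see pessimistic locking in action, consider this hedged two-session sketch (it reuses the hypothetical dbo.Account table from the previous example):

-- Session 1: update a row and leave the transaction open.
BEGIN TRANSACTION;
    UPDATE dbo.Account SET Balance = Balance + 10 WHERE AccountId = 1;
    -- The exclusive row lock is held until COMMIT or ROLLBACK.

-- Session 2 (a second connection): this statement is now blocked and
-- simply waits for Session 1 to release its lock.
UPDATE dbo.Account SET Balance = Balance - 10 WHERE AccountId = 1;

-- Session 1: committing releases the lock and unblocks Session 2.
COMMIT TRANSACTION;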

However, a system with enough concurrent users will eventually reach a logical limit on transactional throughput that is inherent in the design of the pessimistic concurrency model. Even assuming unlimited hardware resources, the locking of data under pessimistic concurrency will become the bottleneck for transaction processing inside SQL Server.

Microsoft has developed a solution to this logical limitation by implementing a method for employing the optimistic concurrency model. This model uses row versioning to avoid the need to lock and block data changes between concurrent users. The implementation was made available in SQL Server 2014 under the code name Hekaton (Greek for a hundred, hinting at the possible 100-fold performance increases) and the official feature name In-Memory OLTP.

This chapter will introduce the In-Memory OLTP feature as it was initially made available in SQL Server 2014. We will discuss the feature's implementation and limitations before the specific additions and improvements in SQL Server 2016 are expanded upon in Chapter 12, In-Memory OLTP Improvements in SQL Server 2016.

Note

For the examples in this chapter, we will be working with SQL Server 2014. Chapter 12, In-Memory OLTP Improvements in SQL Server 2016 will cover the improvements and enhancements of In-Memory OLTP for SQL Server 2016.

In-Memory OLTP architecture

In-Memory OLTP is the name of the optimistic concurrency model implementation offered from SQL Server 2014 onwards. The challenge that In-Memory OLTP is designed to solve is processing massively concurrent transaction workloads while removing the logical limitations of the pessimistic concurrency model used by the traditional transaction processing engine in SQL Server. Microsoft also wanted to introduce In-Memory OLTP as an additional transaction processing engine and not as a replacement for the standard one. This decision ensures that existing applications running on SQL Server remain compatible with newer SQL Server versions, while allowing newer applications to take advantage of the new engine without separating the two systems.

In this section, we will take a look at the architecture of the In-Memory OLTP engine and see how data is stored and processed.

Row and index storage

The first major difference in In-Memory OLTP is the storage structure of both tables and indexes. The traditional storage engine in SQL Server was optimized for disk storage, especially for the block storage of hard disk subsystems. In-Memory OLTP, however, was designed from the ground up to be memory resident, or memory-optimized. This small but important difference allowed a design with byte-addressable storage in mind. It means that memory-optimized tables avoid the need for the data pages and extents that we know from the normal, disk-optimized tables in SQL Server, removing a significant overhead in page and extent management. This reduction in overhead provides a major performance increase in itself.
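As a practical aside: before any memory-optimized table can be created, the database needs a filegroup declared with CONTAINS MEMORY_OPTIMIZED_DATA, which holds the checkpoint files used for durability. A minimal setup sketch, with placeholder database and path names:

CREATE DATABASE InMemoryDemo;
GO
-- The filegroup stores the checkpoint file pairs for memory-optimized data.
ALTER DATABASE InMemoryDemo
    ADD FILEGROUP InMemoryDemo_MOD CONTAINS MEMORY_OPTIMIZED_DATA;
GO
-- The "file" is in fact a container folder; the path is a placeholder.
ALTER DATABASE InMemoryDemo
    ADD FILE (NAME = 'InMemoryDemo_MOD',
              FILENAME = 'C:\Data\InMemoryDemo_MOD')
    TO FILEGROUP InMemoryDemo_MOD;
GO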

Row structure

Rows in memory-optimized tables are stored as heaps. These heaps are not to be confused with heaps in the traditional storage engine, but are memory structures storing the row data itself. These rows have no specific physical ordering and no requirement or expectation of being located near to other rows in memory. They also have no knowledge of connected rows themselves; they are solely created to store row data. The connection of these rows is controlled via the indexes created on memory-optimized tables. Knowing that the index stores and controls the connection of the memory-optimized row data heaps, we can infer that a table must have at least one index. This index (or indexes) must be defined during table creation.
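As a minimal sketch of such a table (the table and column names are hypothetical), the following SQL Server 2014 syntax declares the mandatory index inline with the table definition:

CREATE TABLE dbo.UserSession
(
    SessionId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId    INT NOT NULL,
    LoginTime DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Note that, unlike disk-based tables, a memory-optimized table in SQL Server 2014 cannot have indexes added after creation; the table must be dropped and recreated to change its indexes.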

In the introduction to the chapter, we discussed that the In-Memory OLTP system is an implementation of the optimistic concurrency model. This model uses row versioning to allow concurrent connections to change data and still provide the necessary isolation. Row versioning is an important portion of how memory-optimized tables are structured. Let's look at how a row is structured for a memory-optimized table.

In Figure 11.1, we can see a depiction of a memory-optimized table, which is split into a row header (information about the row) and payload (the actual row data). The row header structure is expanded in the lower part of the diagram:

Figure 11.1: Memory-optimized table row structure - Header and Payload

Row header

The row header is split into five sections:

  • Begin Ts: This is the timestamp of the transaction that inserted the row into the database.
  • End Ts: This is the timestamp of the transaction that deleted the row from the database. For rows that have not yet been deleted, this field contains the value "infinity".
  • StmtId: This field stores a unique statement ID corresponding to the statement inside the transaction that created this row.
  • IdxLinkCount: This field is a counter for the number of indexes that reference this row. An in-memory table stores data only and has no reference to other rows in the table. The indexes that are linked to the table control which rows belong together.
  • Finally, we have a set of pointers that reference the indexes connected to this row. The number of pointers stored here corresponds directly to the number of indexes that reference this row.

Row payload

The row payload is where the actual row data is stored. The key columns and all other columns are stored here. The structure of the payload depends upon the columns and data types that make up the table from the CREATE TABLE command.

Index structure

All memory-optimized tables must have at least one index. The indexes on a memory-optimized table store the connections between the rows. It is also important to remember that, as the name suggests, In-Memory OLTP indexes reside in non-permanent memory (RAM) and are not written to disk. The only portions written to disk are the data rows of the base memory-optimized table and the data changes, which are written to the transaction log (a requirement to fulfil the Durability part of the ACID concept).
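This durability requirement is configurable per table. As a hedged sketch (names hypothetical): DURABILITY = SCHEMA_AND_DATA (the default) logs data changes so that rows survive a restart, while SCHEMA_ONLY persists just the table definition and skips transaction logging entirely:

-- Rows in this table are lost on restart, which can suit staging
-- or session-state workloads that can be rebuilt at any time.
CREATE TABLE dbo.StagingRows
(
    RowId   INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    Payload NVARCHAR(2000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);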

There are two types of index available for memory-optimized tables: the non-clustered index and the hash index.

Non-clustered Index

To be able to discuss the structure of a non-clustered index itself, we first need to understand how the storage engine creates, references, and processes these indexes. The name of this type of index should be well known to all SQL Server developers, and a non-clustered index for a memory-optimized table is broadly similar to a non-clustered index in the traditional storage engine. This type of index uses a variation of the B-tree structure called a Bw-tree and is very useful for searching on ranges of data in a memory-optimized table. A Bw-tree is essentially a B-tree structure without the locking and latching used to maintain a traditional B-tree. The index contains ordered key values which themselves store pointers to the data rows. With the Bw-tree, these pointers do not point to a physical page number (those data page structures don't exist for memory-optimized tables); they direct to a logical Page ID (PID), which is an entry in a Page Mapping Table. The Page Mapping Table stores the physical memory address of the row data. This behavior mimics a covering index from the traditional disk-based world, in that the index itself contains the key columns.

As mentioned, memory-optimized tables run under optimistic concurrency and use versioning to avoid the need for locks and latches. The memory-optimized non-clustered index therefore never updates a page in place when a change occurs; updated pages are simply replaced by adding a new page to the index and adding its physical memory address to the Page Mapping Table.

In Figure 11.2, we see an example of a memory-optimized non-clustered index:

Figure 11.2: Memory-optimized non-clustered index - Bw-tree structure

The Bw-tree looks a lot like a B-tree, except that we have a connection to the Page Mapping Table (instead of an Index Allocation Map for disk-based tables), which translates the index to the physical memory location of the row data. The root page and the non-leaf pages store key values and the PID of the page "below" themselves in the tree. Particularly noteworthy here is that the key value stored is the highest value on the page at the next level down and not, as with a traditional B-tree, the lowest value. The leaf level stores the PID leading to the memory address of the row that the index references. If multiple data rows have the same key, they are stored as a "chain of rows", where the rows reference the same physical address for the value. Data changes in a memory-optimized non-clustered index are processed through a delta record process: the value that was originally referenced is no longer referenced by the index and can be physically removed via the Garbage Collector (GC).
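A hedged sketch of a Bw-tree index in use (the table and index names are hypothetical): the inline NONCLUSTERED index keeps EventTime in sorted order, so a range predicate can traverse the tree instead of scanning all rows:

CREATE TABLE dbo.OrderEvent
(
    EventId   INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    EventTime DATETIME2 NOT NULL,
    -- The Bw-tree (range) index on the event time:
    INDEX ix_OrderEvent_EventTime NONCLUSTERED (EventTime)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- A range query that can seek on the Bw-tree index:
SELECT EventId, EventTime
FROM dbo.OrderEvent
WHERE EventTime >= '20140101' AND EventTime < '20140201';

One design note: in SQL Server 2014 this index can only be traversed in its declared sort order, so the sort direction should match the most common query pattern.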

A memory-optimized non-clustered index page follows a familiar pattern, with each page having a header which carries uniform control information:

  • PID: The Page ID pointer which references the Page Mapping Table
  • Page Type: This indicates the type of the page—leaf, internal, or delta
  • Right PID: The PID of the page to the right of the current page in the Bw-tree
  • Height: The distance of the current page to the leaf
  • Page statistics: The count of records on the page, including the delta records
  • Max Key: The maximum key value on the page

For pages lower down the Bw-tree structure, that is, internal and leaf, there are additional fields containing more detailed information required for deeper traversal than the root page:

  • Values: This is an array of pointers that directs to either (for internal pages) the PID of the page on the next level down, or (for leaf pages) the memory location for the first row in a chain of rows with the same key values
  • Keys: This represents the first value on the page (for internal pages) or the value in a chain of rows (for leaf pages)
  • Offsets: This stores the key value start offset for indexes with variable length keys

Hash indexes

The second type of index for memory-optimized tables is the hash index. Here, an array of pointers is created, with each element in the array representing a single hash value; such an element is known as a hash bucket. A hashing function is applied to the index key column of the table, and the resulting hash value determines which bucket a row is placed in ("bucketized"). Key values that generate the same hash value are stored in the same hash bucket as a linked chain of values. To access this data, we pass in the key value we wish to search for; this value is also hashed to determine which hash bucket should be accessed. The hash index translates the hash value to the pointer and allows us to retrieve all the duplicate values in a chain.

In Figure 11.3, we can see a very simplified representation of the hashing and retrieval of data. The row header has been removed for clarity's sake.

Figure 11.3: Hash index - Simplified hashing and retrieval of data

In Figure 11.3, we can see a table storing company address information. We are interested in the city column, which is the key of a hash index. If we insert a value of London into this hash index, the hashing function returns a value of 23 (the hashing function is simplified here for illustration purposes). The hash bucket for 23 stores the key value London and the memory address of the corresponding row in the memory-optimized table. If we then insert a second row with a city value of Lingen, and our hashing function hashes Lingen to 23 too, then this row is added to the hash bucket for the value 23 and appended to the chain started with London. Each hash index on a memory-optimized table adds an index pointer column (shown here with a dashed outline). This is an internal column that stores the pointer to the next row in the chain. Every row of a chain has a pointer to the next row, except the last row, which has none. When no pointer is in this column, the retrieval routine knows it has reached the end of the chain and can stop loading rows.

The creation of a hash index requires some thought. We have seen that multiple key values can hash to the same hash value, which creates chains of records. However, the idea of a hashing function is to keep these chains as short as possible. An optimal hash index has enough hash buckets to store as many unique values as possible, but as few buckets as possible, to reduce the overhead of maintaining them (more buckets mean more memory consumption for the index). Having more buckets than necessary will not improve performance either, but rather hinder it, as each additional bucket makes a scan of the buckets slower. We must decide how many buckets to create when we initially create the index, as shown in the sketch below. It is also important to realize that the hash index and hashing function consider all key columns of an index together. A multi-column index therefore raises the chance that hash values will be unique and can require more hash buckets to keep those buckets sparsely populated.
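A hedged sketch of bucket sizing, reusing the hypothetical company address example: BUCKET_COUNT is fixed at creation time (SQL Server rounds it up to the next power of two) and should roughly match the number of unique key values expected. The DMV sys.dm_db_xtp_hash_index_stats then shows how well the buckets are utilized:

CREATE TABLE dbo.CompanyAddress
(
    AddressId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    -- SQL Server 2014 requires a BIN2 collation for character index keys:
    City VARCHAR(60) COLLATE Latin1_General_100_BIN2 NOT NULL,
    INDEX ix_City HASH (City) WITH (BUCKET_COUNT = 100000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Long average chains suggest too few buckets (or many duplicate keys);
-- mostly empty buckets suggest memory wasted on too many.
SELECT i.name,
       s.total_bucket_count,
       s.empty_bucket_count,
       s.avg_chain_length,
       s.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id = s.index_id
WHERE s.object_id = OBJECT_ID('dbo.CompanyAddress');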
