
7. Scalability, Consistency, and Performance

Davide Mauri, Silvano Coriani, Anna Hoffman, Sanjay Mishra, and Jovan Popovic

Scalability, as generally defined, is the property of a system to handle a growing amount of work by adding resources to the system. In cloud platforms, where massive standardization and “economy of scale” principles are driving hardware design, the ability to just add resources to a single instance of a service may be limited. That is why, in addition to adopting scale-out architectures, an equally important aspect of scalability will be to design and tune your workload so that it can maximize resource utilization. If you are designing a brand-new application or have the chance to modify an existing one before migrating, this will also tangibly reduce your cloud bills.

While scalability is certainly a key attribute, consistency is another critical aspect that we should expect from an RDBMS solution. Transactional systems make sure that data remains in a consistent state after application and user interactions, and the underlying engine is responsible for governing how multiple concurrent activities from different users access data to maintain this consistency, based on configurable attributes like the level of isolation between these transactional activities.

Consistency is really a key factor in application development, even if it is quite often overlooked: many developers still think that consistency is optional. In some edge cases, that may be true; in most cases, it is not. In fact, it is not by chance that the RDBMS has lately returned to the center of the stage. In the famous paper “F1: A Distributed SQL Database That Scales,” Google clearly stated that “Designing applications to cope with concurrency anomalies in their data is very error-prone, time consuming, and ultimately not worth the performance gains.” And if consistency is not taken care of right in the database, guess who will be asked to write code to deal with it? Right, you, the developer. That’s because your users will, instead, take consistency for granted. So, someone or something has to deal with it, at scale, of course. And database-level consistency at scale is the reason why a whole new generation of post-relational databases like Google Spanner, Amazon Aurora, and Azure SQL Hyperscale came to light.

Transactions

Transactions are units of work within a given application logic that may be composed of multiple lower-level tasks. A transaction is also known as a “Unit of Work,” a very well-known pattern that keeps track of changes and logically commits or rolls them back as a whole, also described by Martin Fowler in his Patterns of Enterprise Application Architecture book. Database systems process transactions while maintaining what are known as the ACID properties (Atomicity, Consistency, Isolation, and Durability) to guarantee data integrity and accuracy. Canonical examples of business logic for a transaction are a money transfer between bank accounts or an ATM cash withdrawal, where you want to make sure a number of operations happen following the “all or nothing” logic. A database platform helps enforce these ACID properties without requiring complex implementations from an application standpoint: developers won’t need to manually implement logic that reverts update actions if at least one update fails.

Now, you may be very familiar with this sample, and you may be thinking that you’re not going to write a banking application, so you don’t really need transactions. While this line of thought may seem correct, keep in mind that even in the simplest application, let’s say a blog engine, you may need transactions. Let’s say, in fact, that you want to keep track of how many people visit your blog, keeping the count both for the whole blog and for each post you have written. Since you want to be sure you can scale as much as needed, you have two counters: one for the whole blog and one for each article. Each time someone visits a page, you need to increment the counter for the whole website and the counter for that page. What happens if you don’t implement and manage transactions properly?

It could happen that, for any reason, something crashes after the website counter has been updated but before the post counter is updated too. This means that you have now completely lost the information. Consistency is gone. If you try to sum up all the single counters, the result will be different from the site counter. Which value is correct now? You may be able to rebuild the information, maybe, with a lot of dedication, time, and effort. In many cases, you just can’t. Now, for a blog counter, that’s not a big deal (and even this is not entirely true: what if you get paid by the number of visits to your blog?), but it shows you the importance of consistency, and thus transactions, even in places where you may not have thought about them. Data consistency should be the cornerstone of every system. Without it, data is just noise.
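
As a sketch of the idea (the table and column names here are invented purely for illustration), wrapping both counter updates in a single transaction guarantees that either both increments are applied or neither is:
BEGIN TRANSACTION;
-- Increment the counter for the whole website
UPDATE dbo.SiteCounter SET Visits = Visits + 1;
-- Increment the counter for the visited post
UPDATE dbo.PostCounter SET Visits = Visits + 1 WHERE PostID = 42;
COMMIT TRANSACTION;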

Luckily for us, in modern database systems, ACID properties are usually achieved through write-ahead logging (WAL), a technique that provides atomicity and durability by recording changes in the transaction log first; only later are they written to the database files by a recurring background process referred to as checkpoint. The log usually stores both redo and undo information that will be used to recover any transaction that was already committed, but not yet written to the database, in situations like recovering from a power loss or during a database restore.

Note

If you are an Apache Kafka user, you may find a lot of similarities between the WAL technique and the Immutable Log principle used by Apache Kafka. Everything is an event, stored in the log, in both cases. By reading the log forward or backward, you can redo or undo any action and even stop at a desired point in time. This is exactly why an RDBMS like Azure SQL can guarantee the end user that no data will be lost and everything can be recovered to the point in time they desire.

In Azure SQL, you can think of several use cases where using a transaction can help. As an example, you may want to insert, update, or delete multiple rows from a table as a single unit or batch process rather than going one by one. Similarly, you may need to modify the state of a table when, and only when, other tables in the same or in a different database (or even server) have been modified as well. As these systems can process many of these operations concurrently, the Azure SQL engine protects consistency, generally, by placing locks on the various resources involved in these transactions (base tables, index keys, etc.), with a level of granularity that can span from a single row to the entire database, depending on how concurrency (also known as isolation) settings have been configured. These locks are released once the transaction is committed or rolled back. Locks are just one of the options, probably the simplest, that Azure SQL can use to make sure data stays consistent. Another, much more sophisticated, is the ability to keep different versions of the same row active at the same time, serving the correct version to the requesting transaction and guaranteeing consistency even if other transactions are changing that very same row. A full discussion of this technique, known as row versioning, is beyond the scope of this book: just keep in mind that Azure SQL will use a mix of the two to make sure you have the best possible experience and performance by default.

Tip

Keeping your transactions as short in duration as possible is a best practice that will help increase concurrency on your database objects and reduce the system resources needed to maintain locks or row versions.

Applications can control transactions by specifying when a transaction starts and ends, successfully or not, through either Transact-SQL commands or API functions exposed by client drivers. Transactions are usually managed at the connection level: when a transaction is started on a connection, all Transact-SQL statements executed on that connection are part of the transaction until the transaction ends.

Note

An exception to this behavior is when you have a Multiple Active Result Sets (MARS) session. MARS is a feature that can be turned on through a connection string attribute and lets an application execute a command on a connection while there is already an active resultset open on the same connection. A typical scenario is when you are cycling over a series of rows retrieved by a SELECT query and you want to execute an UPDATE command using the same connection. In this case, a transaction is scoped at the batch level (instead of at the connection level), and when the batch completes, Azure SQL will automatically roll the transaction back if it has not been explicitly committed or rolled back.

Transactions in Azure SQL are defined as explicit when you explicitly define their start and end through an API function (e.g., the BeginTransaction() method in .NET languages) or by issuing T-SQL statements like BEGIN TRANSACTION, COMMIT TRANSACTION, COMMIT WORK, ROLLBACK TRANSACTION, or ROLLBACK WORK.

When you do not explicitly initiate a transaction, but execute a single command against Azure SQL, then the default mode is called auto-commit. In this mode, every T-SQL statement is committed or rolled back when it completes. If a statement completes successfully, it is committed; if it encounters any error, it is rolled back. Auto-commit mode is also the default mode in which client drivers like ADO, OLE DB, ODBC, and DB-Library operate.

The third mode we have is called implicit. In this mode, Azure SQL starts a new transaction as soon as the current transaction is committed or rolled back. There is nothing to do to start a transaction; you only commit or roll back each one. This last mode is mainly implemented for compliance and backward compatibility reasons. In most use cases, your applications will use auto-commit when executing simple database commands, and controlling complex transactions is not strictly needed. Switching to explicit mode is recommended when some more complex, specific transactional behavior needs to be managed. Transaction mode settings can be controlled at the connection level by issuing an explicit T-SQL statement. Different client drivers may have different transaction mode default settings and may invoke T-SQL statements for you when a new connection is established, so it’s important to check the documentation for the behavior of the one you’re using.
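
As a minimal sketch of implicit mode, assuming a table like the dbo.Orders table used later in this chapter, the SET IMPLICIT_TRANSACTIONS session setting lets you try this behavior from T-SQL:
SET IMPLICIT_TRANSACTIONS ON;
-- The first data modification implicitly opens a transaction
INSERT INTO dbo.Orders VALUES (10, 1, getdate());
SELECT @@TRANCOUNT AS OpenTransactions; -- returns 1
-- Nothing is persisted until the transaction is explicitly committed (or rolled back)
COMMIT TRANSACTION;
SET IMPLICIT_TRANSACTIONS OFF;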

Local transactions

As an application developer, you will mostly interact with transactions executed within the context of a single Azure SQL database instance; these are usually referred to as local transactions, to differentiate them from the distributed transactions that we will discuss later on. In this scenario, local transactions are commonly used while wrapped in some more complex business logic inside a Stored Procedure, which is usually the recommended approach to keep transactions short in duration and avoid multiple roundtrips between application and database code. If, for whatever reason, you are not relying on Stored Procedures but are writing your database access commands within your application code, you can control transactional behavior by invoking specific commands exposed through client drivers’ APIs.

Here is an example of how to use explicit transactions within a T-SQL script, such as a Stored Procedure. In our system, there are two main tables, Orders and Inventory; every time a new order is placed for a given product, we need to decrement its quantity in stock. Business logic in our system must prevent placing a new order for an article that is running out of stock. Out of the many ways of implementing such logic, in our sample we’re wrapping both the new order’s placement and the decrement of the inventory for the same article in an explicit transaction:
-- Create table structures
CREATE TABLE dbo.Orders (ID int PRIMARY KEY, ProductID int, OrderDate datetime);
CREATE TABLE dbo.Inventory (ProductID int PRIMARY KEY, QuantityInStock int
                            CONSTRAINT CHK_QuantityInStock CHECK (QuantityInStock>-1));
-- Fill up with some sample values
INSERT INTO dbo.Orders VALUES (1,1,getdate());
INSERT INTO dbo.Inventory VALUES (1,100);
-- Begin an explicit transaction
BEGIN TRANSACTION mytran
-- Try executing the unit of work in app logic
BEGIN TRY
      INSERT INTO Orders VALUES (2,1,getdate());
      UPDATE Inventory SET QuantityInStock=QuantityInStock-1 WHERE ProductID=1
      -- If both commands are successful, commit the transaction
      COMMIT TRANSACTION
END TRY
BEGIN CATCH
      -- If there is an error, capture the details
      SELECT
             ERROR_NUMBER() AS ErrorNumber
             , ERROR_SEVERITY() AS ErrorSeverity
             , ERROR_STATE() AS ErrorState
             , ERROR_PROCEDURE() AS ErrorProcedure
             , ERROR_LINE() AS ErrorLine
             , ERROR_MESSAGE() AS ErrorMessage;
      -- Explicitly rollback the transaction
      ROLLBACK TRANSACTION
END CATCH
SELECT * FROM dbo.Orders
SELECT * FROM dbo.Inventory

In this script, after creating and filling a couple of tables, you can find the BEGIN TRANSACTION statement followed by an (optional) name.

The BEGIN TRY block will try to execute all statements until the END TRY, and if they are all executed without errors, the transaction will be permanently committed by the COMMIT TRANSACTION statement.
Figure 7-1 Successful transaction results

We now change the quantity in stock for our product and we set it to zero, as we want to test what happens when running out of stock, and then we try to run the transaction again:
UPDATE dbo.Inventory SET QuantityInStock=0 WHERE ProductID=1
Figure 7-2 Transaction rollback results

In this case, an error will be thrown while executing the UPDATE Inventory statement, as we are violating the CHECK constraint placed on the QuantityInStock attribute. Control will pass from within the BEGIN TRY block to the BEGIN CATCH block. In that block, we’re retrieving all the details related to the latest error that happened in that session, but we’re also explicitly calling ROLLBACK TRANSACTION to make sure that modifications made to other tables as part of the same transaction (in the sample, the INSERT INTO Orders command) will not be persisted, and the system’s state will remain consistent.

Let’s see how the same behavior can be achieved developing a sample application using Python.
import os
import pyodbc
from decouple import config
server = config('server')
database = config('database')
username = config('username')
password = config('password')
driver= '{ODBC Driver 17 for SQL Server}'
# Connecting to Azure SQL Database with autocommit=True
cnxn = pyodbc.connect('DRIVER='+driver+';SERVER='+server+';PORT=1433;DATABASE='+database+';UID='+username+';PWD='+ password,autocommit=True)
def prepare_database():
    try:
        # Executing database preparation tasks
        cursor = cnxn.cursor()
        cursor.execute("""IF (EXISTS (SELECT *  FROM INFORMATION_SCHEMA.TABLES
                          WHERE TABLE_SCHEMA = 'dbo'
                          AND  TABLE_NAME = 'Orders'))
                          BEGIN
                              DROP TABLE [dbo].[Orders];
                          END
                          IF (EXISTS (SELECT *  FROM INFORMATION_SCHEMA.TABLES
                          WHERE TABLE_SCHEMA = 'dbo'
                          AND  TABLE_NAME = 'Inventory'))
                          BEGIN
                              DROP TABLE [dbo].[Inventory];
                          END
                          CREATE TABLE [dbo].[Orders] (ID int PRIMARY KEY, ProductID int, OrderDate datetime);
                          CREATE TABLE [dbo].[Inventory] (ProductID int PRIMARY KEY, QuantityInStock int
                            CONSTRAINT CHK_QuantityInStock CHECK (QuantityInStock>-1));
                          -- Fill up with some sample values
                          INSERT INTO dbo.Orders VALUES (1,1,getdate());
                          INSERT INTO dbo.Inventory VALUES (1,0);
                          """)
    except pyodbc.DatabaseError as err:
        # Catch if there’s an error
        print("Couldn't prepare database tables")
def execute_transaction():
    try:
        # Switching to autocommit = False, our app can now control transactional behavior
        cnxn.autocommit = False
        cursor = cnxn.cursor()
        cursor.execute("INSERT INTO Orders VALUES (2,1,getdate());")
        cursor.execute("UPDATE Inventory SET QuantityInStock=QuantityInStock-1 WHERE ProductID=1")
    except pyodbc.DatabaseError as err:
        # If there’s a database error, roll back the transaction
        cnxn.rollback()
        print("Transaction rolled back: " + str(err))
    else:
        # If database commands were successful, commit the transaction
        cnxn.commit()
        print("Transaction committed!")
    finally:
        cnxn.autocommit = True
prepare_database()
execute_transaction()
Listing 7-1

Example of using transactions in Python

In this example, you can see how the auto-commit property plays an important role when controlling transactions with pyodbc in Python. We’re opening the connection with autocommit=True, which means that each command will be automatically committed once executed. In the prepare_database() method, in fact, we’re executing some preparation activities in the database, and each of these statements is committed as soon as it completes. In the execute_transaction() method, though, we’re explicitly switching to autocommit=False, so that we can programmatically control when to commit or roll back our logical unit of work. Remember that our application logic is enforced with a CHECK constraint that prevents placing new orders when a given product is out of stock.

By executing this code sample, you will get back this error message and database tables will not be modified as the transaction is rolled back:
Transaction rolled back: ('23000', '[23000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server] The UPDATE statement conflicted with the CHECK constraint "CHK_QuantityInStock". The conflict occurred in database "WideWorldImporters-Full", table "dbo.Inventory", column 'QuantityInStock'. (547) (SQLExecDirectW)')

In this very simple example, we’re reacting to the exception raised when we get an error executing a database operation, but there are other scenarios where our application logic requires checking the specific state of a given entity in the database (by querying a table) and then deciding whether to commit or roll back the entire unit of work as a whole.

Naming a transaction is a useful practice that improves code readability and, maybe more importantly, is reflected in various transaction-related system views, simplifying the debugging process. BEGIN TRANSACTION [transaction_name] starts a local transaction, but nothing is effectively written to the transaction log until further data manipulation actions such as INSERT, UPDATE, or DELETE trigger an update to the log (although other activities, like acquiring locks to protect the transaction isolation level of SELECT statements, can still happen).

An application or a script can begin, and even commit or roll back, a transaction while another transaction already exists. Azure SQL does not really provide direct support for nested transactions, and committing a transaction opened within another has absolutely no effect. Only the outermost transaction effectively controls what happens to the various data manipulation activities executed in the transactions in between, committing or rolling back everything. As application developers, we should not really rely on this approach, to avoid misleading behaviors and consequences.

So why are nested transactions allowed at all? Well, a common scenario where they can be used is when you are writing Stored Procedures containing transactional logic that can be invoked either stand-alone or from an outer procedure that may itself have already initiated a transaction. A good practice is to always check if there’s already an open transaction on that connection using the @@TRANCOUNT system variable and then decide whether to start a new one or just participate in the one started by the outer procedure, marking the start of the inner part of the transaction with a SAVE TRANSACTION savepoint_name.

Every time a new BEGIN TRANSACTION is executed, @@TRANCOUNT is incremented by 1. It is decremented by 1 every time COMMIT TRANSACTION is executed. In the case of a ROLLBACK TRANSACTION, all transactions are rolled back and @@TRANCOUNT is reset to zero, with the single exception of ROLLBACK TRANSACTION savepoint_name, which does not affect the @@TRANCOUNT counter in any way.
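
A minimal sketch of how the counter moves with nested BEGIN/COMMIT pairs:
BEGIN TRANSACTION;   -- @@TRANCOUNT = 1
BEGIN TRANSACTION;   -- @@TRANCOUNT = 2 (nested, no real effect)
SELECT @@TRANCOUNT;
COMMIT TRANSACTION;  -- @@TRANCOUNT back to 1; nothing is persisted yet
COMMIT TRANSACTION;  -- @@TRANCOUNT = 0; the outer commit persists the work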

When more complex transactional logic is required, in fact, developers can define a location in the transaction flow to return to if only a portion of the transaction is conditionally canceled or rolled back: this is usually referred to as a savepoint. If a transaction is rolled back to a savepoint with ROLLBACK TRANSACTION savepoint_name, then the control flow must proceed to completion, optionally executing more T-SQL statements if needed, and a COMMIT TRANSACTION statement to complete it successfully. Otherwise, the transaction must be canceled altogether by rolling it back to its beginning. To cancel an entire transaction, use the form ROLLBACK TRANSACTION transaction_name.

A T-SQL example that summarizes all these concepts is as follows.
USE AdventureWorks2012;
GO
IF EXISTS (SELECT name FROM sys.objects
           WHERE name = N'SaveTranExample')
    DROP PROCEDURE SaveTranExample;
GO
CREATE PROCEDURE SaveTranExample
    @InputCandidateID INT
AS
    -- Detect whether the procedure was called
    -- from an active transaction and save
    -- that for later use.
    -- In the procedure, @TranCounter = 0
    -- means there was no active transaction
    -- and the procedure started one.
    -- @TranCounter > 0 means an active
    -- transaction was started before the
    -- procedure was called.
    DECLARE @TranCounter INT;
    SET @TranCounter = @@TRANCOUNT;
    IF @TranCounter > 0
        -- Procedure called when there is
        -- an active transaction.
        -- Create a savepoint to be able
        -- to roll back only the work done
        -- in the procedure if there is an
        -- error.
        SAVE TRANSACTION ProcedureSave;
    ELSE
        -- Procedure must start its own
        -- transaction.
        BEGIN TRANSACTION;
    -- Modify database.
    BEGIN TRY
        DELETE HumanResources.JobCandidate
            WHERE JobCandidateID = @InputCandidateID;
        -- Get here if no errors; must commit
        -- any transaction started in the
        -- procedure, but not commit a transaction
        -- started before the transaction was called.
        IF @TranCounter = 0
            -- @TranCounter = 0 means no transaction was
            -- started before the procedure was called.
            -- The procedure must commit the transaction
            -- it started.
            COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- An error occurred; must determine
        -- which type of rollback will roll
        -- back only the work done in the
        -- procedure.
        IF @TranCounter = 0
            -- Transaction started in procedure.
            -- Roll back complete transaction.
            ROLLBACK TRANSACTION;
        ELSE
            -- Transaction started before procedure
            -- called, do not roll back modifications
            -- made before the procedure was called.
            IF XACT_STATE() <> -1
                -- If the transaction is still valid, just
                -- roll back to the savepoint set at the
                -- start of the stored procedure.
                ROLLBACK TRANSACTION ProcedureSave;
                -- If the transaction is uncommittable, a
                -- rollback to the savepoint is not allowed
                -- because the savepoint rollback writes to
                -- the log. Just return to the caller, which
                -- should roll back the outer transaction.
        -- After the appropriate rollback, echo error
        -- information to the caller.
        DECLARE @ErrorMessage NVARCHAR(4000);
        DECLARE @ErrorSeverity INT;
        DECLARE @ErrorState INT;
        SELECT @ErrorMessage = ERROR_MESSAGE();
        SELECT @ErrorSeverity = ERROR_SEVERITY();
        SELECT @ErrorState = ERROR_STATE();
        RAISERROR (@ErrorMessage, -- Message text.
                   @ErrorSeverity, -- Severity.
                   @ErrorState -- State.
                   );
    END CATCH
GO
Listing 7-2

Example of using transactions in a T-SQL script

The logic here checks whether a transaction already exists before modifying the database. If that is the case, it first creates a savepoint, and if the data modification fails, it rolls back only to that savepoint, without involving any previous activities executed in the outer transaction.

Not all database drivers provide direct control over transaction names or savepoints: for example, pyodbc doesn’t provide methods to begin a (named or unnamed) transaction or to create a savepoint, but you can always access these functionalities by simply executing the equivalent T-SQL statements.
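
As a sketch, these are the equivalent T-SQL statements such an application could send (for instance, through separate cursor.execute() calls); the savepoint name my_savepoint and the reuse of the Orders table from the earlier example are purely illustrative:
BEGIN TRANSACTION;
INSERT INTO dbo.Orders VALUES (3, 1, getdate());
SAVE TRANSACTION my_savepoint;
INSERT INTO dbo.Orders VALUES (4, 1, getdate());
-- Undo only the second insert; the transaction is still open
ROLLBACK TRANSACTION my_savepoint;
-- Persist the first insert
COMMIT TRANSACTION;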

In other languages like C# and .NET Core, you can instead control all aspects of your transactional logic, as demonstrated by the following example.
 using (SqlConnection connection = new SqlConnection(connectionString))
{
    connection.Open();
    SqlCommand command = connection.CreateCommand();
    SqlTransaction transaction;
    // Start a local transaction.
    transaction = connection.BeginTransaction("SampleTransaction");
    // Must assign both transaction object and connection
    // to Command object for a pending local transaction
    command.Connection = connection;
    command.Transaction = transaction;
    try
    {
        command.CommandText =
            "Insert into Region (RegionID, RegionDescription) VALUES (100, 'Description')";
        command.ExecuteNonQuery();
        transaction.Save("FirstRegionInserted");
        command.CommandText =
           "Insert into Region (RegionID, RegionDescription) VALUES (101, 'Description')";
        command.ExecuteNonQuery();
        // Rollback to a savepoint
        transaction.Rollback("FirstRegionInserted");
        // Only first insert will be considered
        // Attempt to commit the transaction.
        transaction.Commit();
        Console.WriteLine("Only first insert is written to database.");
    }
    catch (Exception ex)
    {
        Console.WriteLine("Commit Exception Type: {0}", ex.GetType());
        Console.WriteLine("  Message: {0}", ex.Message);
        // Attempt to roll back the transaction.
        try
        {
            transaction.Rollback("SampleTransaction");
        }
        catch (Exception ex2)
        {
            // This catch block will handle any errors that may have occurred
            // on the server that would cause the rollback to fail, such as
            // a closed connection.
            Console.WriteLine("Rollback Exception Type: {0}", ex2.GetType());
            Console.WriteLine("  Message: {0}", ex2.Message);
        }
    }
}
Listing 7-3

Example of using transactions in a .NET Core app

Azure SQL offers several Dynamic Management Views (DMVs) to monitor transactions running in a given database instance. They start with the sys.dm_tran_* prefix. The following example returns detailed information on open transactions at the database level:
SELECT * FROM sys.dm_tran_database_transactions WHERE database_id = db_id()

Distributed transactions across cloud databases

Relational database management systems have traditionally supported transactions across multiple databases and even multiple servers, usually referred to as distributed transactions. These capabilities are enabled through specific components, like distributed transaction coordinators and monitors (such as the Microsoft Distributed Transaction Coordinator, or MS DTC), and two-phase commit protocols like XA. Generally speaking, these underlying technologies were designed to run within traditional data center environments, with low-latency, tightly coupled local connectivity and specific hardware configurations. In highly distributed systems and standardized cloud environments, though, design patterns have evolved to remove dependencies on these traditional approaches, moving instead toward more decoupled, asynchronous patterns like Saga, which are usually managed outside of the database tier.

As of the time of writing, MS DTC is supported only in Azure SQL Managed Instance, where you are able to take full advantage of it. But what if you need distributed transaction support in Azure SQL Database? You’ll be happy to know it is still possible to implement distributed transactions across multiple Azure SQL Database instances using Elastic Transactions, which are available for .NET-based applications.

With this approach, Azure SQL Database effectively coordinates the distributed transaction on behalf of MS DTC, and applications can connect to multiple instances and execute a transaction controlled through the System.Transactions classes. This requires .NET Framework 4.6.1 or greater and, unfortunately, is not yet supported by .NET Core at the time we’re writing this. The following figure visualizes how the various components interact with each other to support this scenario.
Figure 7-3 Distributed transactions on Azure SQL Database

Let’s see how this can be implemented in our applications through a simple example:
using (var scope = new TransactionScope())
{
    using (var conn1 = new SqlConnection(connStrDb1))
    {
        conn1.Open();
        SqlCommand cmd1 = conn1.CreateCommand();
        cmd1.CommandText = string.Format("insert into T1 values(1)");
        cmd1.ExecuteNonQuery();
    }
    using (var conn2 = new SqlConnection(connStrDb2))
    {
        conn2.Open();
        var cmd2 = conn2.CreateCommand();
        cmd2.CommandText = string.Format("insert into T2 values(2)");
        cmd2.ExecuteNonQuery();
    }
    scope.Complete();
}

As you can see, usage is quite straightforward: TransactionScope is doing all the heavy lifting by establishing an ambient transaction (an ambient transaction is one that lives in the current thread), and all connections opened within that scope will participate in that transaction. If these are pointing to multiple databases, then the transaction is automatically elevated to a distributed transaction. When the scope is complete, transactions will commit.

Elastic Transactions can also involve databases belonging to multiple logical servers, but to enable that, these servers first need to be placed in a mutual communication relationship by invoking the New-AzSqlServerCommunicationLink PowerShell cmdlet.

We can monitor the status of a distributed transaction using the same DMVs mentioned for local transactions (e.g., SELECT * FROM sys.dm_tran_active_transactions), but for distributed transactions, the UOW (Unit of Work) column identifies the different child transactions that belong to the same distributed transaction. All transactions within the same distributed transaction carry the same UOW value.
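
As a quick sketch, a query like the following filters active transactions down to those participating in a distributed unit of work by looking at the transaction_uow column:
SELECT transaction_id, name, transaction_begin_time, transaction_uow
FROM sys.dm_tran_active_transactions
WHERE transaction_uow IS NOT NULL;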

Locking and non-locking options

Database engines are generally multi-user systems, where operations performed by a given user may or may not affect tables and rows used by others concurrently. By specifying an isolation level for an individual transactional command, or as a default for a session or for all sessions in a database, we can define the degree to which one transaction must be isolated from resource or data modifications made by other transactions. Isolation levels define which concurrency side effects are allowed, such as dirty reads (reading data from a row that has been modified by another running transaction not yet committed) or phantom reads (rows added or removed by another transaction to the set of records being read). They control things like whether locks are taken when data is read, what type of locks are requested, and how long the read locks are held. If locks are not used to provide consistency, the isolation level helps the engine understand when different versions of the same rows can be deleted because no other transactions are using them.

When read operations reference rows modified by another transaction, the isolation level defines whether they get blocked until the exclusive lock on the row is freed; if locks are not used, it defines which committed version of the row, as it existed at the time the statement or transaction started, should be retrieved. If isolation is set low enough, you can even read uncommitted data. While this is permitted, you should almost never use this ability. Reading uncommitted data means that you are reading something that has a chance of being rolled back; this means that your application may start to do something based on data that, for example, has been entered by mistake and thus undone.

An important concept to understand is that changing the transaction isolation level does not affect how locks are acquired to protect data modifications. A transaction always gets an exclusive lock on any data it modifies and holds that lock until the transaction completes, regardless of the isolation level set for that transaction. Locks are not only a synchronization mechanism but also carry metadata that is very useful for the database engine to know what’s going on anytime it needs to. What can be changed is how locks are used. Luckily, we don’t have to go into nitty-gritty details, as we can express the desired behavior we want, in terms of consistency and concurrency, and Azure SQL will use the correct locks for us.

For reads, isolation levels define the level of protection from modifications made by other transactions. Lower isolation levels increase the ability of multiple transactions to access data concurrently but increase potential concurrency issues like dirty reads and inconsistency. Higher isolation levels increase consistency but increase the chance of blocking other transactions, by using locks that also consume system resources (i.e., memory). Application logic and requirements will determine which isolation level is more appropriate.

Read uncommitted: The lowest isolation level; dirty reads are allowed. One transaction may see not-yet-committed changes made by other transactions.

Read committed: A transaction can access data created or modified, and then committed, by another transaction. If data is modified, Azure SQL keeps write locks (acquired on the selected data) until the end of the transaction. Read locks are released as soon as the SELECT operation is performed.

Repeatable read: Azure SQL keeps the read and write locks acquired on selected data until the end of the transaction. Range locks are not managed, so phantom reads can occur. (A range lock protects all the data in the range from being modified, including modifications done by INSERT or DELETE statements.)

Serializable: The highest level, where transactions are completely isolated from one another. Azure SQL keeps read and write locks on selected data and releases them at the end of the transaction. Range locks are acquired when a SELECT operation uses a ranged WHERE clause, specifically to avoid phantom reads.

In addition to these isolation levels, which are defined as ISO standards for all database management systems, Azure SQL also supports two additional transaction isolation levels that use row versioning.

Read Committed Snapshot (RCSI): In Azure SQL Database, this is the default isolation level for newly created databases; read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S locks (shared schema locks, which simply prevent the table schema from being changed while the table is being read) and no page or row locks. Azure SQL uses row versioning to provide each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions. With this isolation level, writers and readers don’t block each other.

Snapshot: The snapshot isolation level uses row versioning to provide transaction-level read consistency. Read operations acquire no page or row locks; only SCH-S table locks are acquired. When reading rows modified by another transaction, they retrieve the version of the row that existed when the transaction started. You can use snapshot isolation against a database only when the ALLOW_SNAPSHOT_ISOLATION database option is set to ON, which is the default in Azure SQL.
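
As a minimal sketch of how snapshot isolation behaves, using the Inventory table created earlier in this chapter, a reader sees the data as it existed when its transaction started, even if other sessions modify the row in the meantime:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT QuantityInStock FROM dbo.Inventory WHERE ProductID = 1;
-- Other sessions can update (and commit) this row right now without being blocked;
-- re-reading it here still returns the value seen when the transaction started
SELECT QuantityInStock FROM dbo.Inventory WHERE ProductID = 1;
COMMIT TRANSACTION;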

Before moving further, let’s clarify some of the terminology just used, especially around locks. A Shared Schema lock (SCH-S) is used to make sure that no one can change the table definition while you are using it. In a highly concurrent system, it could happen that someone tries to DROP a table or ALTER it while you are running your SELECT. Azure SQL will protect your SELECT by placing a SCH-S lock, so that the commands that want to change the schema have to wait until your SELECT is done.

A Range Lock is used to protect a running transaction from data that doesn’t exist yet, but that, if allowed to be created, could interfere with the running transaction.

The Azure SQL engine manages concurrency between multiple users executing transactions against databases by requesting locks on data. Locks have different modes, such as shared or exclusive. Lock modes define the level of dependency the transaction has on the data. A lock cannot be granted to a transaction if it would conflict with the mode of a lock already granted on that data to another transaction. If a transaction requests a lock mode that conflicts with a lock that has already been granted on the same data, Azure SQL will pause the requesting transaction until the first lock is released. When a transaction modifies a piece of data, it holds the lock protecting the modification until the end of the transaction. How long a transaction holds the locks acquired to protect read operations depends on the transaction isolation level setting. All locks held by a transaction are released when the transaction completes (either commits or rolls back).

Applications do not typically request locks directly; rather, locks are managed internally by a part of the Azure SQL engine, called the lock manager. When processing a Transact-SQL statement, Azure SQL determines which resources are to be accessed. The query processor determines what types of locks are required to protect each resource based on the type of access and the transaction isolation level setting. The query processor then requests the appropriate locks from the lock manager, a subsystem that exists within the Azure SQL engine. The lock manager grants the locks if there are no conflicting locks held by other transactions.

Azure SQL has multi-granular locking that allows different types of resources to be locked by a transaction. To minimize the cost of locking (lock structures are held in memory), Azure SQL automatically locks resources at a level appropriate to the task. Locking at a smaller granularity, such as rows, increases concurrency but has a higher overhead, because more locks must be held if many rows are locked. Locking at a larger granularity, such as tables, is expensive in terms of concurrency, because locking an entire table restricts access to any part of the table by other transactions. However, it has a lower overhead, because fewer locks are being maintained. Azure SQL automatically selects the best lock granularity based on the amount of data that it needs to lock, trying to balance resource costs and concurrency impact.

Azure SQL often must acquire locks at multiple levels of granularity to fully protect a resource. This group of locks at multiple levels of granularity is called a lock hierarchy. For example, to fully protect a read of an index, Azure SQL may have to acquire Shared locks on rows and Intent Shared locks on the pages and the table. Intent locks are used to indicate that, in the future, a lock could be acquired on that resource. This information helps the lock manager avoid future conflicts.

Note

As you have seen, transaction and lock management are pretty complex topics. More insights on how locking works internally, and on how different lock types and levels can impact concurrency on the underlying data, can be found in the “Transaction Locking and Row Versioning Guide” in the Azure docs at this link: https://aka.ms/sstlarv.

As developers, we can control how our applications are dealing with different isolation levels and concurrency through specific T-SQL statements and client driver APIs like Microsoft.Data.SqlClient or SQL Server JDBC Driver.

In a T-SQL script or procedure, you can use SET TRANSACTION ISOLATION LEVEL isolation_level to control locking and row versioning behavior of all the following statements issued over that connection:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
GO
BEGIN TRANSACTION;
GO
SELECT * FROM HumanResources.EmployeePayHistory;
GO
SELECT * FROM HumanResources.Department;
GO
COMMIT TRANSACTION;
GO
Various client driver APIs provide slightly different behaviors and semantics for managing isolation levels when executing transactions. The first example here is related to the Microsoft.Data.SqlClient driver, where you can specify the isolation level when invoking the BeginTransaction() method:
// Start a local transaction.
transaction = connection.BeginTransaction(
              IsolationLevel.ReadCommitted, "SampleTransaction");
In the Microsoft SQL Server JDBC Driver, instead, you specify the isolation level through the setTransactionIsolation() method of the Connection class:
con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);

In addition to driver APIs and connection-level isolation settings, table hints can also override the default behaviors of the Azure SQL engine for the duration of a data manipulation language (DML) statement’s execution, letting you control things like lock type, granularity, and so on by specifying a locking method. For example, using a table hint of type HOLDLOCK (or its equivalent, SERIALIZABLE) makes shared locks more restrictive by holding them until the transaction is completed, instead of releasing the shared lock as soon as the required table or data page is no longer needed. ROWLOCK specifies that row locks are taken when page or table locks would ordinarily be taken (as opposed to TABLOCK or PAGLOCK).

Table hints are specified in the FROM clause of the DML statement and affect only the table or view referenced in that clause; here’s an example of how you can use them:
FROM t WITH (ROWLOCK, XLOCK)

Note that not all hints are compatible with every DML command type; to get all the details on this interesting topic, you can follow this link to the official product documentation: https://aka.ms/uathaaqh.

Note

Slow or long-running queries can contribute to excessive resource consumption and can be the consequence of blocked queries. The cause of the blocking can be poor application design, bad query plans, the lack of useful indexes, and so on. You can use the sys.dm_tran_locks dynamic management view to get information about the current locking activity in the database, as shown in the following example.
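
As a sketch, this query lists the lock requests that are currently waiting (i.e., blocked), together with the session that issued them and the kind of lock requested:
SELECT request_session_id, resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_status = 'WAIT';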

As we mentioned, the default isolation level for Azure SQL is Read Committed Snapshot Isolation (RCSI), and this provides the best trade-off between concurrency and data integrity, letting readers and writers avoid blocking each other in most scenarios. As a best practice, you should limit the usage of other transaction isolation levels or statement-level table hints to very specific use cases where you need specific behaviors. In all other cases, the default settings will work just fine. At the same time, rest assured that if you need to go into great detail and fine-tune even just a single statement, that can be done. This amazing flexibility is unique in the database cloud market!

Natively compiled procedure

Natively compiled Stored Procedures are T-SQL Stored Procedures compiled to machine code, rather than interpreted by the query execution engine like regular Stored Procedures. They work in conjunction with In-Memory OLTP tables in Azure SQL to provide very fast response times for frequently executed queries on performance-critical parts of an application. The performance benefit of using a natively compiled Stored Procedure increases with the number of rows and the amount of logic that is processed by the procedure, and it works best when performing aggregations; nested loops joins on a large number of rows; multi-statement select, insert, update, and delete operations; or complex procedural logic, such as conditional statements and loops.

When we use traditional interpreted Transact-SQL statements to access In-Memory OLTP tables, the query processing pipeline looks very similar to the one for disk-based tables, with the exception that rows are not retrieved from the classical buffer pool but from the in-memory engine.
When using natively compiled Stored Procedures, the process is split into two parts. First, the parser, algebrizer, and optimizer create optimized query execution plans for all the queries in the Stored Procedure; then the Stored Procedure gets compiled into native code and stored in a .dll file.
When the Stored Procedure is invoked, the In-Memory OLTP runtime locates the DLL entry point for the Stored Procedure, the machine code in the DLL is executed, and the results are returned to the client.
The syntax for creating a natively compiled Stored Procedure is quite straightforward, as shown in the following example (which tests how fast 100,000 GET + PUT operations can be done on a table acting as a key-value store cache):
create or alter procedure cache.[Test]
with native_compilation, schemabinding
as
begin atomic with (transaction isolation level = snapshot, language = N'us_english')
     declare @i int = 0;
     while (@i < 100000)
     begin
           declare @r int = cast(rand() * 10000 as int)
           declare @v nvarchar(max) = (select top(1) [value] from dbo.[MemoryStore] where [key]=@r);
           if (@v is not null) begin
                 declare @c int = cast(json_value(@v, '$.counter') as int) + 1;
                 update dbo.[MemoryStore] set [value] = json_modify(@v, '$.counter', @c) where [key] = @r
           end else begin
                 declare @value nvarchar(max) = '{"value": "' + cast(sysdatetime() as nvarchar(max)) + '", "counter": 1}'
                 insert into dbo.[MemoryStore] values (@r, @value)
           end
           set @i += 1;
     end
end
go
Once created, this Stored Procedure can be invoked like any other by our application:
EXEC cache.[Test];
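
The procedure references a memory-optimized table named dbo.MemoryStore acting as the key-value store; since the procedure is schema-bound, that table must exist before the procedure is created. It is not part of this sample, but a possible definition, as a sketch, could look like the following (the bucket count is an arbitrary illustrative choice):
CREATE TABLE dbo.MemoryStore
(
    [key] INT NOT NULL PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 100000),
    [value] NVARCHAR(MAX) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);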

There are a few restrictions on the operators supported within a natively compiled Stored Procedure. For example, the syntax SELECT * FROM table is not supported, and you need to specify the complete column list. For more information on these details, you can look at the product documentation at https://aka.ms/ncsp.

Note

As a best practice, you shouldn’t consider using natively compiled Stored Procedures for query patterns like point lookups. If you need to process only a single row, in fact, using a natively compiled Stored Procedure may not provide a performance benefit compared to traditional interpreted Stored Procedures.

Optimizing database roundtrips

Reducing the number of roundtrips between your application tier and the database tier is likely the single most important optimization you can think about when designing or troubleshooting an application working with Azure SQL. Wherever your application logic permits, batching operations to Azure SQL Database and Azure SQL Managed Instance significantly improves the performance and scalability of your applications. Batching calls to a remote service is a well-known strategy for increasing performance and scalability. There are fixed processing costs to any interactions with a remote service, such as serialization, network transfer, and deserialization. Packaging many separate transactions into a single batch minimizes these costs.

The multitenant characteristics of Azure SQL mean that the efficiency of the data access layer correlates to the overall scalability of the database. In response to usage in excess of predefined quotas, Azure SQL can reduce throughput or respond with throttling exceptions. Efficiencies, such as batching, enable you to do more work before reaching these limits.

There are several batching techniques we can adopt to make our applications more efficient, and these apply to most programming languages and frameworks. Let’s expand on them, starting from the various batching strategies you should consider.

Leveraging transactions

It seems strange to begin a review of batching by discussing transactions. But the use of client-side transactions has a subtle server-side batching effect that improves performance. And transactions can be added with only a few lines of code, so they provide a fast way to improve performance of sequential operations.

Moving from individual calls executed as auto-commit transactions to wrapping several calls in a single transaction effectively delays the hardening of transaction log writes until the transaction is committed, reducing the impact of latency between the Azure SQL engine and the transaction log storage.

Depending on the specific table structures and the number of operations wrapped in a transaction, this technique can provide up to a 10x performance improvement for data modification tasks.
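
As a sketch of the idea, the two T-SQL fragments below do the same work, but the second one defers the synchronous log flush to the single COMMIT instead of paying it on every statement (the table is the illustrative MyTable used later in this chapter):
-- Each INSERT runs as its own auto-commit transaction: one log flush per statement
INSERT INTO MyTable (mytext, num) VALUES ('a', 1);
INSERT INTO MyTable (mytext, num) VALUES ('b', 2);
-- Wrapping the same work in one explicit transaction defers the flush to COMMIT
BEGIN TRANSACTION;
INSERT INTO MyTable (mytext, num) VALUES ('c', 3);
INSERT INTO MyTable (mytext, num) VALUES ('d', 4);
COMMIT TRANSACTION;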

Table-valued parameters

Table-valued parameters support user-defined table types as parameters in Transact-SQL statements, Stored Procedures, and functions. This client-side batching technique allows you to send multiple rows of data within the table-valued parameter. To use table-valued parameters, first define a table type. The following Transact-SQL statement creates a table type named MyTableType:
CREATE TYPE MyTableType AS TABLE (
     mytext NVARCHAR(MAX),
     num INT
);
In your application code, you can create an in-memory data structure with the exact same shape as the table type and pass this data structure as a parameter in a text query or Stored Procedure call. The following example shows this technique in a .NET Core application:
using (SqlConnection connection = new SqlConnection(connString))
{
    connection.Open();
    DataTable table = new DataTable();
    // Add columns and rows. The following is a simple example.
    table.Columns.Add("mytext", typeof(string));
    table.Columns.Add("num", typeof(int));
    for (var i = 0; i < 10; i++)
    {
        table.Rows.Add(DateTime.Now.ToString(),
        DateTime.Now.Millisecond);
    }
    SqlCommand cmd = new SqlCommand(
        "INSERT INTO MyTable(mytext, num) SELECT mytext, num FROM @TestTvp",
        connection);
    cmd.Parameters.Add(
        new SqlParameter()
        {
            ParameterName = "@TestTvp",
            SqlDbType = SqlDbType.Structured,
            TypeName = "MyTableType",
            Value = table,
        });
    cmd.ExecuteNonQuery();
}
To improve the previous example further, use a Stored Procedure instead of a text-based command. The following Transact-SQL command creates a Stored Procedure that takes a table-valued parameter of type MyTableType:
CREATE PROCEDURE [dbo].[sp_InsertRows]
@TestTvp as MyTableType READONLY
AS
BEGIN
INSERT INTO MyTable(mytext, num)
SELECT mytext, num FROM @TestTvp
END
GO
Then we can just change a few lines of the previous example to invoke the Stored Procedure:
SqlCommand cmd = new SqlCommand("sp_InsertRows", connection);
cmd.CommandType = CommandType.StoredProcedure;

In most cases, table-valued parameters have equivalent or better performance than other batching techniques. Table-valued parameters are often preferable because they are more flexible than other options. With table-valued parameters, you can use logic in the Stored Procedure to determine which rows are updates and which are inserts. The table type can also be modified to contain an “Operation” column that indicates whether the specified row should be inserted, updated, or deleted, as sketched in the following example.
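
A possible sketch of that pattern, assuming purely for illustration that num acts as the row key in MyTable (the type and procedure names here are invented):
CREATE TYPE MyTableTypeWithOp AS TABLE (
     mytext NVARCHAR(MAX),
     num INT,
     Operation CHAR(1)   -- 'I' = insert, 'U' = update, 'D' = delete
);
GO
CREATE PROCEDURE [dbo].[sp_ApplyChanges]
@Changes as MyTableTypeWithOp READONLY
AS
BEGIN
INSERT INTO MyTable(mytext, num)
SELECT mytext, num FROM @Changes WHERE Operation = 'I';
UPDATE t SET t.mytext = c.mytext
FROM MyTable t INNER JOIN @Changes c ON t.num = c.num
WHERE c.Operation = 'U';
DELETE t
FROM MyTable t INNER JOIN @Changes c ON t.num = c.num
WHERE c.Operation = 'D';
END
GO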

For more complex application scenarios where table-valued parameters can provide great benefits, both for overall performance and for simplifying your implementation, you can refer to this link to the product documentation: https://aka.ms/piubbs.

Bulk copy

Azure SQL offers specific support for bulk loading large sets of records into a table from an application. This support spans from the TDS wire protocol up to various APIs available in many client drivers (e.g., SqlClient, JDBC, or ODBC).

As an example, .NET applications can use the SqlBulkCopy class to perform bulk insert operations. SqlBulkCopy is similar in function to the command-line tool, Bcp.exe, or the Transact-SQL statement, BULK INSERT. The following code example shows how to bulk copy the rows in the source DataTable, represented by the table variable, to the destination table, MyTable:
using (SqlConnection connection = new SqlConnection(connString))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "MyTable";
        bulkCopy.ColumnMappings.Add("mytext", "mytext");
        bulkCopy.ColumnMappings.Add("num", "num");
        bulkCopy.WriteToServer(table);
    }
}

You may be wondering now when you should use table-valued parameters and when it would be better to use the bulk copy API. Using table-valued parameters is comparable to other ways of using set-based variables; however, using table-valued parameters frequently can be faster for large datasets. Compared to bulk operations, which have a greater startup cost, table-valued parameters perform well for inserting up to a few thousand rows. If you’re batching 10,000 rows or more instead, bulk copy will definitely be the most efficient way of loading data into Azure SQL.

Multiple-row parameterized INSERT statements

One alternative for small batches is to construct a large parameterized INSERT statement that inserts multiple rows. T-SQL language provides direct support to this use case through the following syntax:
INSERT INTO [MyTable] ( mytext, num ) VALUES (@p1, @p2), (@p3, @p4), (@p5, @p6), (@p7, @p8), (@p9, @p10)
From your language of choice, you can then use the specific driver’s API to leverage this capability. As an example, you can take a look at this Python code fragment:
params = [ ('A', 1), ('B', 2) ]
cursor.fast_executemany = True
cursor.executemany("insert into t(name, id) values (?, ?)", params)

Basically, by setting the fast_executemany cursor property to True, the pyodbc driver will pack multiple parameter tuples into a single batch, drastically reducing the number of roundtrips between your application code and the back-end database.

Higher-level frameworks like Entity Framework Core or SQLAlchemy leverage the underlying client drivers’ capabilities and provide equivalent batching options. It’s always a recommended practice to check whether these capabilities are available for your own combination of programming language and framework.

Recommendations and best practices

It’s important to understand the trade-offs between batching/buffering and resiliency. If a database operation fails, the risk of losing an unprocessed batch of business-critical data might outweigh the performance benefit of batching, so implementing proper retry logic techniques introduced in previous chapters is even more important when adopting batching.

If you choose a single batching technique, table-valued parameters offer the best performance and flexibility. For the fastest insert performance, follow these general guidelines, but test your scenario:
  • For < 100 rows, use a single parameterized INSERT command.

  • For < 1000 rows, use table-valued parameters.

  • For >= 1000–2000 rows, use SqlBulkCopy in .NET or equivalent bulk copy techniques in other languages (e.g., SQLServerBulkCopy in Java).

For update and delete operations, use table-valued parameters with Stored Procedure logic that determines the correct operation on each row in the table parameter. Use the largest batch sizes that make sense for your application and business requirements.

Test the largest batch size to verify that Azure SQL does not reject it. Create configuration settings that control batching, such as the batch size or the buffering time window. These settings provide flexibility: you can change the batching behavior in production without redeploying the cloud service.

Avoid parallel execution of batches that operate on a single table in one database. If you do choose to divide a single batch across multiple worker threads, run tests to determine the ideal number of threads. This number depends on several factors, including log generation rate limits for each service tier and compute size, locking effects on data and index pages in the target tables, and so on. The net effect is that more threads can effectively decrease performance rather than increase it, so it’s important to test your own workload and potentially adjust the number of threads.

If you want to know more

Managing data consistency is a tough task. Doing it concurrently and at scale is even more challenging. As mentioned in the introduction of this chapter, it’s not by chance that almost all of the most used databases provide support for transactions. Managing everything in the application code would be tremendously difficult and inefficient. And today the debate between scalability and consistency is almost gone: thanks to new cloud-born architectures, like the one used to build Azure SQL Database Hyperscale, you can have both. The technology behind the Azure SQL Hyperscale database is amazing, and it is well worth learning more about it.