
3. Connecting and Querying Azure SQL

Davide Mauri (Redmond, WA, USA), Silvano Coriani (Legnano, Milano, Italy), Anna Hoffman (Sherman Oaks, CA, USA), Sanjay Mishra (Redmond, WA, USA), and Jovan Popovic (Belgrade, Serbia)

Once you have created and configured a database instance, your next task will be to connect a newly developed or existing application to it and start executing data manipulation or retrieval commands.

Azure SQL is a cloud-native database service that communicates with external applications and processes through multiple Inter-Process Communication (IPC) mechanisms, like TCP/IP sockets, named pipes, or shared memory. Both commands in T-SQL (SQL Server’s own SQL dialect) like SELECT/INSERT/UPDATE/DELETE and resultsets returned from the service are packaged into an application-level protocol called TDS (Tabular Data Stream, https://aka.ms/mstds).

As an application developer, you of course don't have to code against these low-level protocols in your own application. They are abstracted by a comprehensive set of drivers and libraries covering pretty much every modern programming language and framework on the market, running on Windows, Linux, and macOS operating systems.

All the examples in this chapter are edited using the Visual Studio Code editor (https://code.visualstudio.com/) and built and executed through command-line tools and SDKs for the respective runtimes (.NET Core 3.1, OpenJDK 11, and Python 3.6.6) on Windows, macOS, or Linux operating systems.

Drivers and libraries

Most drivers are designed around some basic constructs, mainly representing common entities like
  • A connection with server/database

  • A command to execute over that connection

  • An object to iterate and access records on a returned resultset

Some libraries also offer more advanced data manipulation capabilities, like disconnected caches that can store retrieved rows, track offline modifications, and provide current/previous versions of contained rows to be used in optimistic concurrency multi-user scenarios (e.g., ADO.NET DataSets/DataTable).

Higher-level frameworks and libraries are also available to cover specific scenarios in areas like performance (e.g., connection pooling for libraries that don't provide it natively, like JDBC drivers) or productivity, like Object-Relational Mappers and MicroORMs that speed up development. The following table recaps the client drivers for Azure SQL across programming languages, all available on every platform (Windows, Linux, macOS):

| Language | Driver library | Version |
| --- | --- | --- |
| .NET languages (C#, F#, etc.) | Microsoft ADO.NET for SQL Server | V1.1+ |
| Java | Microsoft JDBC driver for SQL Server | V8.2+ |
| PHP | PHP SQL driver for SQL Server | V5.8+ |
| Node.js | Node.js Tedious driver for SQL Server | V8.0.1+ |
| Python | Python ODBC bridge (pyodbc) | V4.0.30+ |
| Go | Microsoft SQL Server Driver for Go | |
| Ruby | Ruby driver for SQL Server | V2.1.0+ |
| Native languages (e.g., C/C++) | Microsoft ODBC driver for SQL Server | V17.5.1.1+ |

What Azure SQL driver works best for me

For languages like C#, C++, Node.JS, or Python, multiple options are available for drivers, depending on platforms and application needs, so it’s important to understand what combinations are available and recommended.

If your application is targeting full .NET Framework (on Windows) or the multi-platform .NET Core (on Windows, macOS, Linux, or Docker) using a language like C#, you have two main options:
  • Rely on the classic ADO.NET Provider for SQL Server in System.Data.SqlClient namespace, which is available as part of the full .NET Framework version installed at the machine level.

  • Reference in your project the new Microsoft.Data.SqlClient driver, hosted in GitHub (https://aka.ms/ghmdsc) and NuGet (https://aka.ms/ngmdsc).

The latter option provides a much more flexible and direct way for Microsoft to introduce improvements in client libraries for .NET, so it’s the recommended version to use when developing a brand-new app. Installing and referencing this package for a project is pretty straightforward using dotnet CLI:
dotnet add package Microsoft.Data.SqlClient

Programming languages like Perl, PHP, and Python provide lightweight wrappers and interfaces around native ODBC drivers (e.g., pyodbc, DBI, etc.). To support all the newest features available in the latest Azure SQL releases (e.g., Always Encrypted, Data Classification, AAD authentication, etc.), Microsoft ODBC Driver 17 for SQL Server (or higher) is recommended. This driver is available for most major Linux distros and releases, in addition to Windows. Traditional SQL Server client utilities like sqlcmd and bcp are also packaged with this driver. Installation procedures for the ODBC driver largely depend on your target operating system and distribution, and all details can be found at this URL: https://aka.ms/azuresql-odbc-install.

Java developers should use Microsoft JDBC Driver 8.2 for SQL Server, a Type 4 JDBC driver that supports JRE 8, 11, and 12 and works on all main operating systems where Java is supported, including Sun Solaris. When using Maven, the JDBC driver can be added to a project by declaring it as a dependency in the POM.xml file:
<dependency>
    <groupId>com.microsoft.sqlserver</groupId>
    <artifactId>mssql-jdbc</artifactId>
    <version>8.2.0.jre11</version>
</dependency>

As is typical in the Java space, JDBC drivers don't provide native connection pooling capabilities, so many external libraries are available on the market. While we're not endorsing any particular one, HikariCP (https://aka.ms/hikaricp) is one we often encounter when working with customers connecting to Azure SQL instances and has proven to be fast and reliable.

For other programming languages like Python, multiple options are usually available, although Microsoft places its testing efforts and its confidence in the pyodbc driver (https://aka.ms/pyodbc), which, as discussed previously, relies on the ODBC driver for SQL Server and supports most operating systems. You can easily install pyodbc using the following command:
pip install pyodbc
Similarly, for Node.js, the Tedious driver can be added to a project through the canonical package manager by typing
npm install tedious

Getting started

Official Microsoft documentation offers a great "Getting started" section for developers (https://aka.ms/sdcq), which is the best place to start familiarizing yourself with application development for the Azure SQL family using a step-by-step approach.

While you will have time to evaluate all these samples, we will start from scratch here with some of the most popular programming languages and explain key aspects of how to connect to Azure SQL and run some basic queries.

All samples in this chapter refer to a database named "WideWorldImporters-Full" that can be deployed on an Azure SQL Database single database or managed instance. In Chapter 2, you'll find instructions on how to deploy this sample database. Please update connection strings accordingly to make these samples work.

Listing 3-1. Data access method in a .NET Core application
public async Task<IEnumerable<Order>> GetOrders()
{
    List<Order> orders = new List<Order>();
    using (SqlConnection cnn =  new SqlConnection(
                    config.GetConnectionString
                           ("DefaultConnection")))
    // Connection string pattern: "Server=tcp:<servername>.database.windows.net,1433;Initial Catalog=<database>;User ID=<username>;Password=<password>;Connect Timeout=30;"
    {
        SqlCommand cmd = new SqlCommand
            (@"SELECT TOP 5
                  [o].[OrderID],
                  [o].[OrderDate],
                  [c].[CustomerName]
               FROM [Sales].[Orders] AS [o]
                  INNER JOIN [Sales].[Customers] AS [c]
                  ON [o].[CustomerID] = [c].[CustomerID]", cnn);
        await cnn.OpenAsync();
        SqlDataReader dr = await cmd.ExecuteReaderAsync();
        while (dr.Read())
        {
            orders.Add(new Order()
            {
                OrderID = Convert.ToInt32(dr[0]),
                OrderDate = Convert.ToDateTime(dr[1]),
                CustomerName = Convert.ToString(dr[2])
            });
        }
    }
    return orders;
}
In this basic example, you can find a typical .NET Core data access method that can be reused in multiple application types, from a Console app to a WebAPI or Azure Function. Once you add a reference to the Microsoft.Data.SqlClient package from NuGet, you can use the SqlConnection and SqlCommand classes to interact with Azure SQL, using asynchronous methods to open a new connection and execute a command that retrieves a resultset and materializes a collection of objects to be used in your business logic. While plenty of other things are happening behind the scenes when you execute a simple operation like this, and many best practices and much guidance are available to optimize your data access layer, you will find yourself writing similar code in most of the database interactions in your applications. Let's now address the same exact scenario using Java, one of enterprise developers' favorite programming languages.
public List<Order> getOrders() {
     List<Order> order = new ArrayList<>();
     String connectionString = "jdbc:sqlserver://<servername>.database.windows.net:1433;" + "database=WideWorldImporters-Full;user=<username>@<servername>;" + "password=<password>;loginTimeout=30;";
      try {
          try (Connection conn =
               DriverManager.getConnection(connectionString)) {
                String sql =
                "SELECT TOP 5 "
                    + "[o].[OrderID], "
                    + "[o].[OrderDate], "
                    + "[c].[CustomerName] "
                    + "FROM [Sales].[Orders] AS [o] "
                    + "INNER JOIN [Sales].[Customers] AS [c] "
                    + "ON [o].[CustomerID] = [c].[CustomerID]";
                try (Statement stmt =
                    conn.createStatement();
                        ResultSet rs =
                        stmt.executeQuery(sql)) {
                    while (rs.next()) {
                        order.add(new Order(
                            rs.getInt(1),
                            rs.getString(2),
                            rs.getString(3)));
                    }
                }
                conn.close();
            }
      } catch (Exception e) {
            System.out.println();
            e.printStackTrace();
      }
      return order;
}
Listing 3-2

Data access method in a Java application

The Java code sample is very similar to the previous .NET Core one. Once we create an initial code skeleton through Maven for the application type we're looking for (Console app, REST service, etc.), we need to reference the JDBC driver library in the POM.xml file so it is downloaded from the central package repository, and then use it in our code. The DriverManager class represents the entry point for all JDBC drivers loaded in the application, and the jdbc:sqlserver prefix in the connection string indicates which specific driver to use. Connection, Statement, and ResultSet are again the main classes used to encapsulate underlying connection management, command execution, and resultset iteration. These three classes will be the foundation for most of your database interactions.

Python is a language that has gained a lot of traction over the years for writing data access and management applications. Let's see how to use it to connect to an Azure SQL database instance.
@app.route('/order')
def getorders():
     cnxn = pyodbc.connect("DRIVER={ODBC Driver 17 "
     "for SQL Server};" "SERVER="+server+";DATABASE="+database+""
     ";UID="+username+";PWD="+ password)
     cursor = cnxn.cursor()
     tsql = "SELECT TOP 5 "
            " [o].[OrderID], [o].[OrderDate],"
            " [c].[CustomerName]"
            " FROM [Sales].[Orders] AS [o]"
            " INNER JOIN [Sales].[Customers] AS [c]"
            " ON [o].[CustomerID] = [c].[CustomerID]"
     rows = cursor.execute(tsql).fetchall()
     order_list = []
     for row in rows:
        d = collections.OrderedDict()
        d['orderID'] = row.OrderID
        d['orderDate'] = str(row.OrderDate)
        d['customerName'] = row.CustomerName
        order_list.append(d)
     return json.dumps(order_list)
Listing 3-3

Create a simple Python application

In this Python Flask function (you can find the complete app in the companion GitHub repo), the pyodbc module plays a central role. It provides a wrapper around the ODBC driver and higher-level abstractions like connection, cursor, and row, which let you both execute commands and iterate on results. Rows are then transformed into a list of ordered dictionaries and returned as a JSON fragment to function callers. In pure Pythonic spirit, pyodbc represents a pretty efficient and straightforward method for accessing Azure SQL database instances from your Python programs.

As we mentioned previously, tedious is the driver you can use to connect to Azure SQL from Node.JS applications. Let’s take a look at a simple function that executes a query and converts a resultset into a JSON array that can be returned, for example, as REST API result.
exports.getorders = function(req, res) {
  const { Connection, Request } = require("tedious");
  const config = {
    server: "<servername>.database.windows.net",
    options: {
      database: "WideWorldImporters-Full",
      encrypt: true
    },
    authentication: {
      type: "default",
      options: {
        userName: "username",
        password: "password",
      }
    }
  };
  const connection = new Connection(config);
  connection.on("connect", err => {
    if (err) {
      console.error(err.message);
    } else {
      getOrders();
    }
  });
  function getOrders() {
    var data = []
    const request = new Request(
      `SELECT TOP 5 [o].[OrderID],
      [o].[OrderDate], [c].[CustomerName]
      FROM [Sales].[Orders] AS [o]
      INNER JOIN [Sales].[Customers] AS [c]
      ON [o].[CustomerID] = [c].[CustomerID]`,
      (err, rowCount) => {
        if (err) {
          console.error(err.message);
        }
        else {
          res.send(data)
        }
      }
    );
    request.on("row", function(row) {
      data.push({
        orderid: row[0].value,
        orderdate: row[1].value,
        customername: row[2].value
      })
    });
    connection.execSql(request);
  }
};
Listing 3-4

Node.JS function returning JSON array from database

As seen in previous examples for other programming languages, the Connection and Request objects play a central role for opening a new connection to the database, executing a command, and retrieving results (you can find the complete working example in the companion GitHub repo). These two objects offer all properties and capabilities to cover most common scenarios, from defining parameterized queries to transaction management or bulk loads.

This section doesn’t cover all available drivers and libraries but provides a representative overview of how applications are typically connecting with Azure SQL and can definitely be applied to other drivers not mentioned here. Moving forward, we will focus on advanced and specific scenarios that are critical for your application’s data access layer.

Connectivity aspects

As briefly mentioned in Chapter 2, network connectivity is another fundamental aspect of interacting with Azure SQL from your applications. Although more related to infrastructure design and network security than pure app development, you’ll need to consider various options and components that play a role in this process to successfully connect with your database instances. Both single database and managed instance deployment models for Azure SQL Database support two main connectivity options for applications:
  • Private VNET

  • Public connectivity

While a managed instance is automatically associated with an Azure Virtual Network at creation time and public connectivity is optional, for an individual Azure SQL Database it is actually the opposite: public connectivity is the default, and you should rely on the Azure Private Link capability to connect your application, deployed on one of the Azure services that support VNETs, via a private endpoint and over a completely isolated network traffic path.

For those scenarios where public connectivity is required, there are still some options that can be considered to govern network traffic between your application and Azure SQL:
  • Server- and database-level firewall

  • VNET Service Endpoint

  • VNET Network Security Group (NSG)

Server- and database-level firewall rules are designed to define what ranges of IP addresses (coming from public Internet connectivity or from within various Azure services) can establish a connection with Azure SQL Database single instances. If database-level firewall rules exist (today, these can be created through T-SQL commands only), those will be evaluated to understand if a client connection is coming from an allowed range; otherwise, server-level rules (valid for all database instances associated with that virtual server and defined through T-SQL, PowerShell/CLI, or Azure Portal) will be checked. If you decide to trust all network connections coming from an Azure service, then there’s a check box option on the portal to “Allow Azure services and resources to access this server” to simplify your settings.
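As a hedged illustration, a database-level rule can be created by executing the documented sp_set_database_firewall_rule procedure over a regular connection to the target database. In this C# sketch, the connection string variable, rule name, and IP range are hypothetical placeholders, and the caller is assumed to have sufficient permission on the database:
// Minimal sketch: create a database-level firewall rule via T-SQL
using (SqlConnection cnn = new SqlConnection(adminConnectionString))
{
    await cnn.OpenAsync();
    SqlCommand cmd = new SqlCommand(
        @"EXECUTE sp_set_database_firewall_rule
              @name = N'AppClients',
              @start_ip_address = '203.0.113.10',
              @end_ip_address = '203.0.113.20';", cnn);
    await cmd.ExecuteNonQueryAsync();
}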

VNET Service Endpoint is a feature designed to guarantee that network connections targeting a given Azure SQL server will be accepted only if they come from one or more VNET/Subnet pairs where your applications are deployed.

On the application side of the connection, a VNET Network Security Group can be created and associated with a Virtual Machine’s NIC or an entire Subnet to make sure they can only connect with a given range of IP addresses and ports where your Azure SQL instances reside.

Network security is not the only aspect that can affect database connectivity; network latency is the other critical one. Connection policy determines how your database connections will interact with the back-end node hosting your databases in Azure SQL and can be one of these two options:
  • Proxy

  • Redirect

With Proxy, you have maximum flexibility to connect to your instance from anywhere using the server's FQDN and just port 1433, but these connections will always go through a Gateway front-end layer that increases network latency (usually not good for chatty applications).

The Redirect policy instead establishes connections directly to the node hosting the database, reducing latency and improving application throughput, but it does require specific port ranges (11000–11999) to be open between your application host and the database.

The default connection policy is Proxy for all connections coming from outside Azure service IP ranges and Redirect for those within Azure, but you can change the default through PowerShell and the Azure CLI. More details on Azure SQL connectivity architecture can be found in the official documentation (https://aka.ms/sdmica).

Resilient connection and query execution

Building distributed systems on cloud architectures requires a specific approach regarding service-to-service interactions to increase resiliency and availability. This is generally true for all kinds of cloud services and interactions, from transactional solutions to batch processing.

In a traditional on-premises solution, database and application servers are usually sitting next to each other, with physical networking devices like switches and routers dedicated to providing stable and fast connectivity. When hardware or software failures happen, despite proper redundancy and high-quality devices being in place, chances are that they will be quite impactful and will persist until someone physically fixes the problem and brings the system back online again.

In a cloud environment, everything is virtualized and completely automated, so you'll have many more moving parts potentially introducing transient connectivity blips that application developers should consider to make their applications more reliable overall. As an example, Azure SQL will automatically handle hardware or software failures, or planned maintenance operations, on the database node serving application requests at any point in time; but, as fast as the failover operation to a secondary node can be, data access code may face an exception for the few seconds after that event (on average around 8 seconds, and in the worst case less than 60 seconds) before the service responds to requests regularly again.

Other examples of transient connectivity issues could be related to reaching maximum resource limits in Azure SQL instances, based on the service tier and size selected. As we're referring to a multitenant service, it is absolutely critical to preserve the overall system's stability. In case of such an event, applications temporarily won't be able to connect to Azure SQL until resource utilization decreases to within the given thresholds.

Retry logic

To make your applications more reliable, a common approach is to introduce proper retry logic capabilities in database access code. Retry logic usually refers to a technique that
  a. Intercepts application errors provoked by a transient condition

  b. Retries the original operation a certain number of times, introducing a delay period that can be fixed or incremental, to make sure you won't flood instances with requests that are going to fail anyway, usually creating a convoy effect

This logic can be as simple as just retrying to open a connection, after a previous attempt has failed with a certain exception, for a fixed number of times and with some delay between retries.

A trickier use case for retry logic is when an operation ultimately modifies database state, like inserting a new record or updating one or more existing ones. If a transient error happened while such a command was executing, the application is responsible for deciding whether the previous attempt failed before or after the database was effectively modified. In this case, in fact, client applications cannot just blindly re-execute previous commands, as there's no guarantee data wasn't already modified (think about a bank account transaction as an example). Retry logic needs to ensure that either the previous transaction was completely committed or the entire operation was rolled back; otherwise, the database could remain in an inconsistent state. Basically, retry logic for transactional database code can be quite complex and will typically only apply to those use cases where your data modification code is completely idempotent (i.e., it can be executed multiple times and still leave the database in the same state as a single execution).

If correctly modeled, the database will help you in making sure data is consistent: for example, an order with an already existing number will not be allowed to be inserted. That said, you will still need to handle the returned error, and thus implementing solid retry logic will significantly improve your application reliability and stability.
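As a minimal sketch of this idea in C#, a duplicate key error returned by the database on a retried insert can be treated as confirmation that a previous, apparently failed attempt actually committed. The InsertOrderAsync method and order variable are hypothetical placeholders; 2627 and 2601 are the engine's duplicate key error numbers:
try
{
    // Hypothetical method executing an INSERT with a client-supplied order number
    await InsertOrderAsync(order);
}
catch (SqlException se) when (se.Number == 2627 || se.Number == 2601)
{
    // Duplicate key violation: a previous attempt already inserted this order,
    // so the retried operation can safely be considered successful
}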

Transient and persistent errors

A key aspect of implementing a robust retry logic mechanism is to intercept and decode which errors the application should interpret as transient and which are instead permanent (where retry logic won't be able to help).

As mentioned previously, there are several categories of conditions and events that could be categorized as transient, from underlying hardware failures and automatic reconfigurations to temporary resource exhaustion. Transient issues could also happen at different layers in the stack: think about a temporary glitch in software defined networking! While these episodes can be rare and very short in time, it’s very important to proactively address them by adopting proper coding practices. A comprehensive explanation of typical connectivity issues at different layers is offered in Azure SQL’s public documentation at this link: https://aka.ms/tciasb.

Custom code or reusable libraries

Let's see how to implement, in practical terms, a simple retry logic mechanism for database access as described in the previous sections. The following code is quite straightforward and only provides basic retry functionality by wrapping a database interaction method in a for loop and intercepting potential exceptions that can be thrown during its execution.
bool success = false;
int retryCount = 3;
int retryInterval = 8;
List<int> RetriableCodes =
   new List<int> { 4060, 40197, 40501, 40613,
             49918, 49919, 49920, 11001,208 };
for (int retries = 1; retries <= retryCount; retries++)
{
    try
    {
        if (retries > 1)
        {
           Thread.Sleep(1000 * retryInterval);
           // Retry interval increases by 50% every retry
           retryInterval = Convert.ToInt32 (retryInterval * 1.5);
        }
        await MyDatabaseOperation();
        success = true;
        break;
    }
    catch (SqlException se)  
    {
        if (RetriableCodes.Contains(se.Number) == true)
        {
           // Retriable error – log and continue
           continue;
        }
        else
        {
        // Non-retriable error – exit the loop
           break;
        }
    }
    catch (Exception e)
    {
       // Generic exception – exit the loop  
       break;
    }
}
Listing 3-5

Create custom retry logic in a C# application

As you can notice, a lot of "plumbing" and complexity is required to make a quite simple database operation resilient to transient errors. Imagine if this had to be repeated for every method and interaction that your application has with its data layer! Plus, this lacks any option to consistently configure parameters like the number of retries, fixed or incremental delays, or even which exceptions should be considered transient vs. permanent.

Luckily, over the years, a number of reusable libraries have been created covering pretty much every programming language and framework to encapsulate that plumbing code into configurable mechanisms that developers can use to make their applications more reliable.

For .NET applications, one of the best-known retry logic libraries is the Transient Fault Handling (TFH) Application Block, originally part of the Microsoft Enterprise Library framework, which has recently been ported to .NET Core and is freely available for download on NuGet at this URL: https://aka.ms/eltfhc.

It is a fully configurable and comprehensive set of classes that let you define your own RetryStrategy (e.g., FixedInterval, Incremental, ExponentialBackoff) and use a standard (out of the box) or a custom transient error detection logic. These two components combined define a RetryPolicy which is the class you use to wrap database access activities. Let’s look at a practical implementation using this framework.
public async Task<IEnumerable<Order>> GetOrders()
{
    // Get retry strategy
    RetryStrategy retryStrategy =
        config.GetRetryStrategies<FixedInterval>()
            ["MyFixedStrategy"];
    // Create retry policy
    RetryPolicy retry = new
        RetryPolicy<MyTransientErrorDetection>(retryStrategy);
    // Wrap database interaction with retry policy
    return await retry.ExecuteAsync(async () => {
    List<Order> orders = new List<Order>();
    using (SqlConnection cnn =  new SqlConnection(
                    config.GetConnectionString
                           ("DefaultConnection")))
    {
        SqlCommand cmd = new SqlCommand
            (@"SELECT TOP 5
                  [o].[OrderID],
                  [o].[OrderDate],
                  [c].[CustomerName]
               FROM [Sales].[Orders] AS [o]
                  INNER JOIN [Sales].[Customers] AS [c]
                  ON [o].[CustomerID] = [c].[CustomerID]",cnn);
        await cnn.OpenAsync();
        SqlDataReader dr = await cmd.ExecuteReaderAsync();
        while (dr.Read())
        {
            orders.Add(new Order()
            {
                OrderID = Convert.ToInt32(dr[0]),
                OrderDate = Convert.ToDateTime(dr[1]),
                CustomerName = Convert.ToString(dr[2])
            });
        }
    }
    return orders;
    });
}
Listing 3-6

Use a retry logic library in a C# application

In this fragment (you can find the complete example in the companion GitHub repo), we configure retry strategies’ details in a configuration file, essentially defining all the parameters like number of retries, retry intervals, and so on.

You may want to define multiple retry strategies in your applications and use them depending on the kind of database operation you want to retry. For example, for a less frequent data retrieval operation, you may want to define a more aggressive retry strategy with a higher number of retries, while for a different use case, you may want to step back and quickly return the error to the end user so that they can make a different decision based on it.

The next step is to define which error detection strategy you want to use. TFH provides out of the box a class called SqlDatabaseTransientErrorDetectionStrategy, which encapsulates the logic for detecting the most common error codes that Azure SQL emits when facing a transient error. In this example, we instead created a custom strategy through a class that implements the ITransientErrorDetectionStrategy interface, as we wanted to test our retry logic with some non-transient errors.

You then create a RetryPolicy instance by combining your retry strategy and transient error detection class, and that will provide the ExecuteAction() or ExecuteAsync() method to effectively wrap database access code.

The RetryPolicy class also exposes a Retrying event that you can subscribe to in order to be notified when the retry logic intercepts a transient error and retries an operation.
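As a minimal sketch (assuming the event arguments expose the current retry count, delay, and last exception, as in the original Enterprise Library implementation), subscribing to it could look like this:
retry.Retrying += (sender, args) =>
{
    // Log which attempt is being retried, the delay applied, and the transient error
    Console.WriteLine("Retry {0} after {1}: {2}",
        args.CurrentRetryCount, args.Delay, args.LastException.Message);
};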

Another popular library in the .NET space is Polly (https://aka.ms/avnp) which also provides features that cover other app reliability aspects by implementing resiliency patterns like Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in addition to just Retry.
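As a minimal sketch of the same retry behavior with Polly (assuming the Polly and Microsoft.Data.SqlClient packages are referenced), a policy with exponential back-off could wrap the GetOrders() method shown earlier; the list of transient error numbers here is just an example:
List<int> transientErrors = new List<int> { 4060, 40197, 40501, 40613, 49918, 49919, 49920 };

var retryPolicy = Policy
    .Handle<SqlException>(se => transientErrors.Contains(se.Number))
    .WaitAndRetryAsync(
        retryCount: 3,
        sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

// Wrap the database interaction with the policy
var orders = await retryPolicy.ExecuteAsync(() => GetOrders());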

Similar implementations also exist in most other programming languages and frameworks. In Python, a common one is called Tenacity (https://aka.ms/tenacity) and provides similar capabilities. Tenacity comes as a generic decorator (@retry) for the methods you want to retry automatically. You can specify several parameters, like the number of attempts (stop condition) and the delay between attempts (referred to as the wait condition), and of course customize which exception types should be considered retriable. This implementation is quite simple to use; let's see it in practice in the next example.
import pyodbc
import random
from tenacity import *
import logging
def is_retriable(value):
    # Define all retriable error codes from https://docs.microsoft.com/en-us/azure/sql-database/troubleshoot-connectivity-issues-microsoft-azure-sql-database
    RETRY_CODES = [1204,1205,1222,49918,49919,49920,4060,4221,40143,40613,40501,  40540,40197,10929,10928,10060,10054,10053,233,64,20,0]
    ret = value in RETRY_CODES
    return ret
@app.route('/order')
@retry(stop=stop_after_attempt(3), wait=wait_fixed(10), after=after_log(logger, logging.DEBUG))
def getorders():
     cnxn = pyodbc.connect("DRIVER={ODBC Driver 17 "
     "for SQL Server};" "SERVER="+server+";DATABASE="+database+""
     ";UID="+username+";PWD="+ password)
     try:
         cursor = cnxn.cursor()
         tsql = ("SELECT TOP 5 "
                 " [o].[OrderID], [o].[OrderDate],"
                 " [c].[CustomerName]"
                 " FROM [Sales].[Orders] AS [o]"
                 " INNER JOIN [Sales].[Customers] AS [c]"
                 " ON [o].[CustomerID] = [c].[CustomerID]")
         rows = cursor.execute(tsql).fetchall()
         order_list = []
         for row in rows:
             d = collections.OrderedDict()
             d['orderID'] = row.OrderID
             d['orderDate'] = str(row.OrderDate)
             d['customerName'] = row.CustomerName
             order_list.append(d)
     except Exception as e:
         if isinstance(e, pyodbc.ProgrammingError) or \
            isinstance(e, pyodbc.OperationalError):
             if is_retriable(int(e.args[0])):
                 raise
         # non-retriable error: return an empty result
         order_list = []
     return json.dumps(order_list)
Listing 3-7

Implement retry logic in a Python application

In this example, we use Tenacity to decorate a Flask method interacting with our database called getorders(), and we retry three times with a fixed interval of 10 seconds in case of an exception. Instead of specifying in the decorator's attributes which exceptions to retry on, we wrap the pyodbc methods with a try/except block and check if the exception has anything to do with database access. In that case, we check if the underlying database error code is contained in the list of retriable errors we maintain and, if so, we just re-raise that exception so that the @retry decorator can do its job of automatically retrying the method until it succeeds or the maximum number of attempts has been reached.

Even in Java, there are multiple options to achieve the same result. One of the most commonly used is called Failsafe (https://aka.ms/jnfs) and is very similar to the Transient Fault Handling library we mentioned previously. Without demonstrating it in another complete application, the following code fragment shows how to quickly use Failsafe's main classes:
// Define a retry policy instance with familiar parameters
RetryPolicy<Object> retryPolicy = new RetryPolicy<>()
  .handle(ConnectException.class)
  .withDelay(Duration.ofSeconds(1))
  .withMaxRetries(3);
// Wrap your database interaction code with that retry policy
Failsafe.with(retryPolicy).run(() -> connect());

Connectivity best practices

Let’s start from a very basic one: in a database application, latency matters! This means that, for performance reasons, it’s important to make sure your application code runs as close to your database as possible, no matter what Azure service it will be deployed on. This is especially true for applications that are executing a lot of database interactions or roundtrips, where this latency can easily become more impactful than real processing time.
Figure 3-1. Roundtrips to database

At the very least, you should make sure your application gets deployed in the same region as your database instance. This may also have some architectural implications if, for example, you're designing a highly reliable, cross-region solution: you need to plan for failing over not only your data layer but also your application tier accordingly, to minimize the latency impact in case of a malfunction of your primary site.

Along the same line, it’s also important to understand overall Azure SQL connectivity architecture (explained in the official docs: https://aka.ms/sdmica) and make sure that, if your application tier is running in an Azure service like Virtual Machine, App Service, or Azure Kubernetes Service, as an example, it is leveraging the Redirect connection policy, which means that your application will communicate directly with the node hosting your database instance instead of passing through the Gateway layer for every single interaction. If your app is executing anything more than only a few queries every minute, this option will make a significant difference from a performance perspective, and the trade-off required is just to make sure that ports in the range of 11000–11999 are open in networking configuration where your client code resides.

While it may sound trivial, another recommendation related to connectivity is to make sure your code is effectively opening a connection with an Azure SQL database instance as late as possible before executing some meaningful command, and it’s closing that connection as soon as results are consumed. Most driver libraries are, in fact, designed to leverage Connection Pooling, a mechanism that will help you balance between the cost of opening a brand-new physical connection (e.g., a TCP socket) with a remote service, which always comes with a given millisecond overhead, and the cost of keeping too many connections always open as that will increase the amount of resources (memory, worker threads, etc.) consumed on the service side.

Connection pooling works at the process level and keeps a physical connection open for a certain amount of time even if your application code explicitly called a close or dispose method, so that if a new request to open a connection with the same connection string parameters is executed later, the existing physical connection is reused instead of opening a brand-new one.

Thanks to this approach, in a canonical web application or web API scenario, it’s not uncommon that, even if thousands of users are accessing a given page, only a few tens of real database connections are kept open at any given time, significantly reducing the overhead generated for Azure SQL database instances.
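For the .NET drivers, pooling is on by default and is controlled directly from the connection string; the following minimal sketch just makes the behavior explicit, and the pool sizes shown are arbitrary examples, not recommendations:
string connectionString =
    "Server=tcp:<servername>.database.windows.net,1433;" +
    "Initial Catalog=WideWorldImporters-Full;" +
    "User ID=<username>;Password=<password>;" +
    "Pooling=true;Min Pool Size=1;Max Pool Size=100;";

using (SqlConnection cnn = new SqlConnection(connectionString))
{
    await cnn.OpenAsync();   // reuses a pooled physical connection when one is available
    // ... execute commands ...
}                            // Dispose() returns the connection to the pool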

Generally speaking, in most scenarios where a multi-threaded application is executing a conventional database workload (queries, transactions, etc.) against an Azure SQL instance, it is recommended to leverage connection pooling for performance reasons. The only exception to this general rule is where your application really needs to carefully control how specific operations are executed against the database. A good example is Data Definition Language (DDL) commands (CREATE, ALTER, DROP, discussed later, which work on data structures instead of the data itself) that your application may issue against the database, where usually a single connection is used at a time and commands are serialized on that same connection.

As mentioned, most existing drivers provide this capability out of the box and even enable it by default, like the .NET Data Provider for SQL Server, but there's an important exception. In the Java space, connection pooling has historically been implemented separately from JDBC drivers, so SQL Server's driver doesn't provide that functionality.

Thankfully, there are many external packages offering that capability for your Java application, and one of the best known is certainly HikariCP (https://aka.ms/hikaricp), as mentioned before. It's important to note, though, that Java drivers generally have some challenges in detecting what are usually referred to as "stale connections", client-side connection objects that have lost underlying connectivity with a database instance due to a transient issue, without executing a test command (by explicitly invoking java.sql.Connection.isValid(), which pings the database every time to make sure the connection is open). In other drivers, this is usually performed at a lower level by checking the state of a TCP socket, but Java native APIs have issues with that. A similar problem could happen while a command is executing and a resultset is being consumed by your application code. The recommendation here is to carefully configure both your JDBC driver and your connection pooling classes with proper timeouts, to avoid the application hanging forever if a transient error happens at the wrong time. All the details about these configurations are further explained in an article at this URL: https://aka.ms/jdbc-connection-guidance.

Other high-level languages and frameworks like Python and pyodbc may suffer from the same transient connectivity issues, and the same approach and guidance are recommended there as well.

Handling exceptions

Opening a connection to a data source and executing commands are operations naturally subject to errors, which can range from a lack of network connectivity to the server, to the violation of rules coded within the database to maintain data integrity (primary keys, relationships, etc.), or even concurrency management (e.g., locking/blocking) within your application or during maintenance operations (a complete list of database engine error codes is available here: https://aka.ms/eaerde). It is therefore necessary to provide an adequate mechanism for intercepting and managing error conditions in the application code performing database interactions.

Most programming languages and frameworks implement exception handling through the try/catch (or except) approach. In the try block, you generally put the instructions that are presumed to generate exceptions, such as opening the connection, executing a command, or calling a component which in turn can generate an application exception. If one of these operations fails, application control passes to the first catch/except block that specifies an exception type compatible with the one that occurred. Generally, those exception types provide all the information concerning the specific operation that failed; this information is then made available in the catch block so you can code proper logic for how to manage it.
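As a minimal C# sketch of this pattern (reusing connection and command objects like the ones from Listing 3-1), the SqlException thrown by the .NET provider exposes the number, severity, and message of every error returned by Azure SQL:
try
{
    await cnn.OpenAsync();
    await cmd.ExecuteNonQueryAsync();
}
catch (SqlException se)
{
    // A single SqlException can carry multiple errors returned by the engine
    foreach (SqlError err in se.Errors)
    {
        Console.WriteLine("Error {0} (severity {1}): {2}",
            err.Number, err.Class, err.Message);
    }
}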

In the T-SQL language (e.g., within a Stored Procedure), in addition to errors potentially generated by commands interacting with database objects (e.g., a primary key constraint violation), you can also generate exceptions that represent a logical error in your procedure through the RAISERROR or THROW statements, so that your application can behave accordingly.

You can pass an error message (or message ID), a severity level, and a state to the RAISERROR statement, and depending on the severity, client providers will act accordingly:
  • With severity level 10 or below, the connection is not interrupted, and no exception is generated. Messages can still be collected from the client driver.

  • From 11 to 19, the connection remains open, but an exception is thrown.

  • Levels above 19 are considered fatal errors: an exception is thrown, and the connection is terminated.

For Azure SQL, errors generated with a severity level equal to or less than 10 do not really generate an exception but are treated as simple informational or warning messages. Driver libraries capture this information through specific classes and events (e.g., the InfoMessage event of the .NET Provider for SQL Server) alongside the proper exception collection.
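A minimal sketch of capturing these low-severity messages with the .NET provider's InfoMessage event, on a connection like the ones used in the earlier listings, could be:
cnn.InfoMessage += (sender, e) =>
{
    // Raised for messages with severity 10 or below (e.g., PRINT or low-severity RAISERROR)
    Console.WriteLine("Info from server: {0}", e.Message);
};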

At the very minimum, you'll also want to use these details about the error condition as part of your logging strategy, for further analysis by the operations team in charge of running your application. Beyond that, you'll have to decide whether just retrying the operation (as described in the retry logic section) or returning the information back to the caller is the most appropriate option, so that the caller can decide on the best course of action for each specific use case.

Frameworks, ORM, and MicroORM

So far, we mentioned that client drivers are generally providing base abstractions to connect, query, and consume results from Azure SQL database instances. That said, application developers in most cases are typically looking at higher-level abstractions to help them be more productive and eliminate recurring and potentially error-prone data access tasks, like representing and interacting with data entities in their application logic.

To achieve higher productivity, a number of data access frameworks have been created over the last 20 years or so, which can be grouped into two main buckets:
  • Helper classes that just wrap those base abstractions and simplify common tasks

  • Object-Relational Mappers (ORM), providing rich data modeling and mapping capabilities to reduce the impedance mismatch between relational structures and object-oriented programming

The following table represents a (non-exhaustive) list of some of the most popular libraries in the market by programming language:

| Language | Recommended/popular libraries |
| --- | --- |
| .NET | Datasets; Entity Framework (Core); Dapper |
| Java | Spring Data; Hibernate |
| PHP | Doctrine; Eloquent |
| Node.js | RxDB; TypeORM; Sequelize |
| Python | SQLAlchemy; Django; pandas |
| Ruby | ActiveRecord |
| Go | Gorm |

In the next sections, we’ll dig deeper into some of them.

Data access frameworks

Although basic abstractions generally give you control over every possible aspect of your database interactions, they may require a lot of boilerplate code in your application to transform objects and structures representing higher-level entities in your logic into rows and columns within your database structure.

Over the years, a number of libraries and frameworks have been created to simplify this issue while still letting you control how your code interacts with your data. As a practical example, in the .NET space, the ADO.NET Provider for SQL Server has provided from the very first release, together with the SqlConnection, SqlCommand, and SqlDataReader classes (representing what is usually referred to as the "connected" portion of the SqlClient library), a number of other classes to interpret and navigate through query results in a more "object-oriented" fashion, but also to take care of persisting back to the underlying database whatever changes have been made to the original data. We're referring to classes such as SqlDataAdapter and DataSet (typed or untyped).

DataSet is an in-memory representation of resultsets populated from scratch in your code or returned from one or more queries to the database, and it provides additional logic around offline change tracking, data validation, and relationship management between entities. SqlDataAdapter acts as a "connector" between DataSets and database objects: it can automatically generate the T-SQL commands that take changes applied to in-memory data and persist them in the back-end database, or leverage existing commands and Stored Procedures to control all aspects of these database operations for performance or concurrency reasons. DataSets can also be automatically generated from a database schema and become fully typed objects exposing internal resultsets as named collections (e.g., Customers, Orders, etc.) instead of rows and columns. To discover more about these options, you can find complete coverage at this link: https://aka.ms/dnfado.
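As a minimal sketch of this "disconnected" model (reusing the connection string placeholders from the earlier listings), a SqlDataAdapter can fill a DataSet and later persist offline changes back to the database:
var adapter = new SqlDataAdapter(
    "SELECT TOP 10 OrderID, OrderDate, CustomerID FROM Sales.Orders",
    "Server=tcp:<servername>.database.windows.net,1433;" +
    "Initial Catalog=WideWorldImporters-Full;" +
    "User ID=<username>;Password=<password>;");

// SqlCommandBuilder generates the INSERT/UPDATE/DELETE commands used by Update()
var builder = new SqlCommandBuilder(adapter);

var ds = new DataSet();
adapter.Fill(ds, "Orders");                                  // populate the in-memory cache

ds.Tables["Orders"].Rows[0]["OrderDate"] = DateTime.UtcNow;  // offline change

adapter.Update(ds, "Orders");                                // persist tracked changes back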

Similar helper classes are very popular in other programming frameworks as well. Examples include Spring Data Access for Java (https://aka.ms/sfrda) or the SQLAlchemy Core classes for Python.
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, Date, select

def getorders():
    # create a SQLAlchemy engine using the SQL Server dialect
    engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % connectionString)
    # initialize metadata repository
    metadata = MetaData()
    # define metadata for result sets
    orders = Table('Orders', metadata,
        Column('orderid', Integer, primary_key=True),
        Column('customerid', Integer),
        Column('orderdate', Date),
        schema='Sales'
    )
    customers = Table('Customers', metadata,
        Column('customerid',Integer,primary_key=True),
        Column('customername',String),
        schema='Sales'
    )
    # define our query using a SQL-like syntax
    s = (select([orders.c.orderid, orders.c.orderdate, customers.c.customername])
            .select_from(orders.join(customers, orders.c.customerid == customers.c.customerid))
            .limit(10))
    try:
        # open the connection to the database
        cnn = engine.connect()
        # execute SQLAlchemy command
        res = cnn.execute(s)
        # iterate on results
        for row in res:
            print(row)
        # close the connection
        cnn.close()
    except Exception as e:
        print (e)
        pass
Listing 3-8

Using SQLAlchemy in a Python application

In this simple example, you can see how SQLAlchemy lets us define an in-memory representation of our application entities and how they map to database tables. Using an app-level SQL-like syntax, we can then specify our query containing advanced operations like joins, filters, aggregations, projections, and so on, and SQLAlchemy classes will translate this into SQL syntax specific to Azure SQL; you could easily port the same code to connect to other supported database systems as well. While SQLAlchemy can do much more (we've just scratched the surface here), its most advanced features go beyond a simple data access framework and belong to the realm of Object-Relational Mappers, which is the topic of the next section.

ORMs

The purpose of an ORM is to mediate between the relational and object-oriented worlds, letting developers write applications that interact with data stored in relational databases using typed objects that represent the application's domain, and reducing the redundant data access (or "plumbing") code that they would otherwise need to write. The following table recaps various ORM libraries you can use to connect with Azure SQL:

| Language | Platform | ORM(s) |
| --- | --- | --- |
| C# | Windows, Linux, macOS | Entity Framework; Entity Framework Core |
| Java | Windows, Linux, macOS | Hibernate ORM |
| PHP | Windows, Linux, macOS | Laravel (Eloquent); Doctrine |
| Node.js | Windows, Linux, macOS | Sequelize ORM |
| Python | Windows, Linux, macOS | Django |
| Ruby | Windows, Linux, macOS | Ruby on Rails |

One of the first and most successful libraries in this space is Java's Hibernate (https://aka.ms/horm), which appeared in the early 2000s with the goal of providing a better experience than Enterprise JavaBeans entities for persisting data into databases without necessarily using SQL commands. Over the years, these libraries became much more powerful and complex (for some people, even too complex, so that alternatives like "MicroORM" libraries have been created) to cover other aspects of the data access tier like modeling, security, and scalability.

Microsoft's own ORM for the .NET world is Entity Framework (EF), and its most recent incarnation is EF Core (https://aka.ms/ghdnefc). EF Core works with several back-end database stores like SQL Server, Azure SQL Database, SQLite, Azure Cosmos DB, MySQL, PostgreSQL, and other databases through a provider plug-in API. It's the result of 10+ years of development in this space and provides features like LINQ queries, offline change tracking, batched updates, and database schema management through a feature named "migrations."

The core of this library is the DbContext class that developers can use to create their database interaction context and model how database tables will be mapped to entities and collections and how operations against these entities will be transparently transformed into SQL code to read, create, or update records within a database.
using(var dbctx = new WWImportersContext())
{
     var res = dbctx.Orders
           .Include("Customer")
           .Select (o => new
           {o.OrderID,o.OrderDate,o.Customer.CustomerName})
           .Take(10).ToList();
     foreach(var o in res)
     {
         Console.WriteLine("OrderId: {0} - OrderDate: {1} - CustomerName: {2}",o.OrderID,o.OrderDate,o.CustomerName);
     }
}
class Order
{
    public int OrderID {get;set;}
    public DateTime OrderDate {get;set;}
    public int CustomerID {get;set;}
    public Customer Customer {get;set;}
}
class Customer
{
    public int CustomerID {get;set;}
    public String CustomerName {get;set;}
}
class WWImportersContext : DbContext
{
    // Model building deliberately omitted in this fragment
    public DbSet<Order> Orders {get;set;}
    public DbSet<Customer> Customers {get;set;}
}
Listing 3-9

Using Entity Framework Core in a C# application fragment

Let’s focus on the three key aspects of this simple example:
  • First, the two POCO (plain old CLR objects) classes Order and Customer that represent the business entities managed by our application.

  • WWImportersContext class, inheriting from DbContext, which represents the context our app is using to connect and query the database. Specifically, it does contain the two DbSet collections that are mapped to database tables.

  • The LINQ query that interacts with the database context and expresses what entities we want to retrieve.

As you can notice in the complete example (see the GitHub repo for that), it's interesting to see how our WWImportersContext class overrides the OnConfiguring and OnModelCreating methods of its base class to do exactly what their names imply: configuring the way our context communicates with the database and defining a model where our entities map to the respective database tables. We also configured the logging infrastructure to show what the resulting T-SQL code automatically generated by the context looks like. Pretty straightforward, isn't it?
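A minimal sketch of what those two overrides could look like follows; the connection string is a placeholder, and the console logger factory (from the Microsoft.Extensions.Logging.Console package) is our assumption for surfacing the generated T-SQL, while the complete version lives in the companion repo:
class WWImportersContext : DbContext
{
    static readonly ILoggerFactory ConsoleLogger =
        LoggerFactory.Create(builder => builder.AddConsole());

    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options
            .UseLoggerFactory(ConsoleLogger)   // show the generated T-SQL
            .UseSqlServer(
                "Server=tcp:<servername>.database.windows.net,1433;" +
                "Initial Catalog=WideWorldImporters-Full;" +
                "User ID=<username>;Password=<password>;");

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the entities to the existing tables in the Sales schema
        modelBuilder.Entity<Order>().ToTable("Orders", "Sales");
        modelBuilder.Entity<Customer>().ToTable("Customers", "Sales");
    }

    public DbSet<Order> Orders { get; set; }
    public DbSet<Customer> Customers { get; set; }
}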

Entity Framework Core can do much more, and you can start familiarizing with all these capabilities through this free Microsoft Learn online course (https://aka.ms/lmpsefc).

“With great power comes great responsibility,” so there are some basic good practices that you should always keep in mind to get the most out of ORM tools like EF Core:
  • First of all, make sure you’re pulling from the database only the data you really need and making your resultset as efficient as possible.

  • If your application only needs to materialize objects based on query results and display them on a screen, but will never modify and update them, switch change tracking off. This will avoid wasting lots of application resources to track object state information you will never use (see the short sketch after this list).

  • If instead your application will create or update many entities within your database context, make sure you’re leveraging batching (e.g., calling the AddRange() method on your DbSet collections). EF Core will then behind the scenes create a Table-Valued Parameter (TVP) and use it in a MERGE statement to insert multiple rows in a single database roundtrip. This is providing many benefits for Azure SQL both in terms of reduced latency and minimizing transaction log pressure. For very large bulk operations (in the >10,000s rows ballpark), you may want to perform them outside of Entity Framework (e.g., using ADO.NET’s SqlBulkCopy class) or use a nice EF Core extension called EFCore.BulkExtensions (https://aka.ms/efcbe).

  • Where possible, turn on logging and validate T-SQL syntax generated by EF Core.

  • In case of complex database interactions, using raw SQL commands or Stored Procedures through EF Core can solve performance and flexibility issues.
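As a minimal sketch of switching change tracking off for read-only queries and batching inserts with AddRange(), reusing the WWImportersContext from Listing 3-9 (newOrders is a hypothetical collection of Order entities to insert):
using (var dbctx = new WWImportersContext())
{
    // Read-only query: AsNoTracking() skips the change tracking overhead
    var recentOrders = dbctx.Orders
        .AsNoTracking()
        .OrderByDescending(o => o.OrderDate)
        .Take(10)
        .ToList();

    // Batched insert: AddRange() lets EF Core send the new rows in batches
    dbctx.Orders.AddRange(newOrders);
    dbctx.SaveChanges();
}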

MicroORMs

For those scenarios where you don't need the complexity of a full ORM library, a new class of simplified tools has emerged over the last years, usually referred to as MicroORMs. One of the most successful is probably Dapper from StackExchange (https://aka.ms/ghsaed). Basically, Dapper provides a fast and efficient way of materializing objects based on a SQL query, and it's a great solution for application scenarios where you don't necessarily need to track disconnected changes and manage full persistence for your objects into the database. Dapper extends the existing IDbConnection interface (as available in various ADO.NET Data Providers) and provides helper methods to execute T-SQL queries and map results to collections of POCO objects, or to execute commands that modify database state in a performant manner. The following example shows how to use Dapper to retrieve a resultset.
class Program
{
   static void Main(string[] args)
   {
       using (SqlConnection cnn =
           new SqlConnection(
           "Server=tcp:<servername>.database.windows.net,"+
           "1433;Initial Catalog=WideWorldImporters-Full;"+
           "User ID=<username>;Password=<password>;"))
       {
            var orders = cnn.Query<Order>("SELECT TOP 10 OrderID, OrderDate, CustomerID FROM Sales.Orders;");
           foreach (var o in orders)
           {
               Console.WriteLine("OrderId: {0} - OrderDate: {1} - CustomerId: {2}",o.OrderID,o.OrderDate,o.CustomerID);
           }
       }
   }
}
class Order
{
    public int OrderID {get;set;}
    public DateTime OrderDate {get;set;}
    public int CustomerID {get;set;}
}
Listing 3-10

Using Dapper in a C# application

This example shows how MicroORMs like Dapper can be a good compromise between using base SqlConnection, SqlCommand, and SqlDataReader classes and a more sophisticated, but complex, solution like Hibernate or Entity Framework for your data access layers.

Using read-only replicas

One benefit provided by Azure SQL Premium, Business Critical, and Hyperscale database instances is that, as part of their high-availability architecture, several secondary replicas are deployed and maintained in sync with the primary replica at minimal latency by the underlying infrastructure. The Read Scale-Out feature, enabled by default on the Premium and Business Critical tiers, gives you the ability to run a read-only workload against one of these secondary replicas, at no extra cost, without impacting the performance of your primary, read-write replica.

From a data access perspective, this is extremely easy to use as it only requires you to add the ApplicationIntent=ReadOnly; attribute to your connection string, and all traffic for that connection will be automatically redirected to a read-only replica of the database. The Hyperscale tier also provides this capability, but you have to explicitly create at least one secondary replica for your database in order to benefit from it.

Your application can make sure it’s connected to a read-only replica by checking that this command SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability') effectively returns READ_ONLY as result. It's worth remembering that, while read-only replicas are in a transactionally consistent state, in some rare cases there may be some small latency compared to data in the primary replica. Also, at the time of writing these notes, certain features like Query Store, Extended Events, and Audit are not yet supported on read-only replicas, although there are ways of monitoring them using traditional DMVs like sys.dm_db_resource_stats or sys.dm_exec_query_stats, sys.dm_exec_query_plan, and sys.dm_exec_sql_text that work as expected. For additional details on this interesting feature, please refer to the official documentation at https://aka.ms/sqrso.
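A minimal C# sketch of both steps, redirecting the connection to a readable secondary and then verifying it, could look like the following (connection string placeholders as in the earlier listings):
var builder = new SqlConnectionStringBuilder(
    "Server=tcp:<servername>.database.windows.net,1433;" +
    "Initial Catalog=WideWorldImporters-Full;" +
    "User ID=<username>;Password=<password>;")
{
    ApplicationIntent = ApplicationIntent.ReadOnly   // route to a read-only replica
};

using (var cnn = new SqlConnection(builder.ConnectionString))
{
    await cnn.OpenAsync();
    var cmd = new SqlCommand(
        "SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability')", cnn);
    // Expect "READ_ONLY" when connected to a secondary replica
    Console.WriteLine(await cmd.ExecuteScalarAsync());
}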

If you want to know more

This chapter provided you with a comprehensive overview of how to connect to Azure SQL database instances from a variety of traditional and modern programming languages, using conventional drivers and more advanced data access frameworks. It also provided some best practices on how to effectively increase connection reliability and make your application more resilient to the transient connectivity issues that may happen when you design and implement distributed systems in cloud-based environments. To dig deeper into some of these topics, we recommend taking a look at the links referenced throughout the chapter.