© Davide Mauri, Silvano Coriani, Anna Hoffman, Sanjay Mishra, Jovan Popovic 2021
D. Mauri et al., Practical Azure SQL Database for Modern Developers, https://doi.org/10.1007/978-1-4842-6370-9_5

5. Developing with Azure SQL – Advanced

Davide Mauri1 , Silvano Coriani2, Anna Hoffman3, Sanjay Mishra1 and Jovan Popovic4
(1)
Redmond, WA, USA
(2)
Legnano, Milano, Italy
(3)
Sherman Oaks, CA, USA
(4)
Belgrade, Serbia
 

After having discussed the foundational aspects of querying and manipulating data, it's now time to focus on more advanced and development-oriented features that you can use in Azure SQL. If you already took a peek at the chapter content, you may be surprised to find a full section on security. Don't be. Security is not something that can be bolted on later as an afterthought, a nice-to-have rather than a core requirement. On the contrary, security is just as important as performance and maintainability and must be taken into consideration from the ground up. Therefore, the options Azure SQL offers to keep your data secure are a must-know for everyone who wants to create a modern application: modern because it is not only scalable and modular, but also secure.

Programmability

As a developer, you already know how important it is to be able to reuse and encapsulate existing code. Azure SQL provides a wide range of options that you can use to make sure your SQL code is clean, legible, easy to maintain, and reusable.

Variables

There are two types of variables in T-SQL: Scalar and Table. A scalar variable is just a regular variable, like the ones you are used to working with in other programming languages. Variables must be explicitly declared, must have a name that starts with the character “@”, and must be typed:
DECLARE @i INT = 42;
DECLARE @name NVARCHAR(50) = N'John';
SELECT @i AS FamousNumber, @name AS CommonName;
T-SQL doesn’t support arrays, lists, or dictionaries, but if you need to store more than one value in a variable, you can use a Table Variable:
DECLARE @t AS TABLE (
      [Id] INT NOT NULL,
      [Name] NVARCHAR(50) NOT NULL
);
INSERT INTO @t VALUES (42, N'John');
SELECT * FROM @t;

One behavior that may surprise you at first, and that is quite different from variables in other programming languages, is how Azure SQL manages a variable’s scope. A variable is scoped to a batch or to an object. In the case of an object, understanding the scoping is easy, as T-SQL behaves exactly like any other programming language: a variable defined within a Stored Procedure, Trigger, or Function is scoped to that object only.

Variables defined outside an object, instead, are scoped to the batch. A batch is a group of T-SQL statements executed together. In the preceding code sample, you must execute all three statements (DECLARE, INSERT, and SELECT) together; otherwise, the code will not work. If you execute the DECLARE statement alone, it will succeed, but once it completes, the variable will be gone. If you then try to execute the INSERT statement, you’ll get an error telling you that the variable @t is not defined.
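For example, here is a minimal sketch of what happens if you split the batch in two with GO (a batch separator discussed later in this chapter):
DECLARE @t AS TABLE (
      [Id] INT NOT NULL,
      [Name] NVARCHAR(50) NOT NULL
);
GO
-- New batch: @t no longer exists, so this INSERT fails
INSERT INTO @t VALUES (42, N'John');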

Table Variables are generally used in very niche scenarios as they may have a quite heavy impact on performance, and usually they should not contain too many rows. Be aware that this has nothing to do with memory consumption or anything related to that: the issue is that Table Variables are something in between two worlds, Variables and Tables, and for this reason, the Azure SQL engine cannot optimize access to their data as much as it can with regular tables. If you want to learn more details about this, look at the resources at the end of this chapter. In general, if you need a temporary place to store some of your data, use Temporary Tables instead.

Temporary Tables

Especially when creating analytical solutions, it can happen (even pretty frequently) that you need to temporarily store processed data somewhere, so that you can work on it at a later time. Sometimes it is also helpful to split a very complex query into several smaller queries, and you’ll need a place to store the result of one query so that it can be picked up by another one for further processing.

Temporary Tables will help you exactly in those cases. You can create a temporary table using the regular CREATE TABLE command. As long as the table name starts with the hash character (#), that table will be temporary:
CREATE TABLE #t
(
     [Id] INT NOT NULL,
     [Name] NVARCHAR(50) NOT NULL
);

Temporary means that it will exist until manually destroyed with the DROP command or until it goes out of scope. For a Stored Procedure or a Trigger, the scope is the lifetime of the Stored Procedure or Trigger that created the temporary table. Once the Stored Procedure or Trigger has finished executing, the temporary table will be destroyed.
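If you prefer not to wait for the scope to end, you can also remove a temporary table explicitly; a minimal example:
DROP TABLE IF EXISTS #t;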

A quick way to create and fill a temporary table with the result of a query is the SELECT INTO statement:
SELECT
     [OrderLineID],
     [Description],
     [Quantity]
INTO
     #Order37
FROM
     [Sales].[OrderLines]
WHERE
     [OrderID] = 37;

A Temporary Table named #Order37 will be created, automatically inferring the column names and types, and it will be filled with all the rows from the table Sales.OrderLines related to the specified order.

Temporary Tables benefit from special optimizations that Azure SQL can do, as it knows such tables are ephemeral by definition. If you need to park your temporary data somewhere, make sure you use them in place of regular tables.

Views

Views are probably the simplest option available to reuse code and encapsulate data processing logic. A view is nothing more than a query definition, labeled with a name and usable as a table. In fact, views are also known as virtual tables, even if this name is seldom used.

Views are useful for code reuse but also for securing access to your data and to keep backward compatibility with existing applications, by abstracting access to underlying tables.

The first point is quite obvious: instead of writing a complex query with JOINs, Windowing Functions, and all the goodness that Azure SQL provides, you can just save the query definition as a view and then call it later by executing simpler code. For example, let’s create a view to encapsulate the logic to calculate a Running Total:
CREATE OR ALTER VIEW [Sales].[OrderLinesRuninngTotal]
AS
SELECT
     [OrderID],
     [OrderLineID],
     [Description],
     [Quantity],
     SUM(Quantity) OVER (
           PARTITION BY [OrderID]
           ORDER BY [OrderLineID] ROWS BETWEEN
                UNBOUNDED PRECEDING AND
                CURRENT ROW
     ) AS RunningTotal
FROM
     [Sales].[OrderLines];
GO
Now you can execute the same query using this simpler code:
SELECT
     OrderID,
     [OrderLineID],
     [Description],
     [Quantity],
     [RunningTotal]
FROM
     [Sales].[OrderLinesRuninngTotal];
The query executed will be exactly the same as if you had executed the query code directly without using the view. More precisely, the view body is “expanded” and merged with the query’s body. For example, if you write something like this:
SELECT
     OrderID,
     [OrderLineID],
     [Description],
     [Quantity],
     [RunningTotal]
FROM
     [Sales].[OrderLinesRuninngTotal]
WHERE
     [OrderID] IN (41, 42, 43);
GO

GO is not a T-SQL command or keyword. It is a batch separator recognized by tools like SQL Server Management Studio or Azure Data Studio, telling the tool that two sets of T-SQL statements, usually called batches, must be executed separately and independently (but not in parallel!). This is needed by some commands, many CREATE statements, for example, so that you can tell the tool you’re using where a view’s body ends and where another command, potentially even unrelated to the created view, begins.

Azure SQL behaves as if it could push the WHERE clause into the view and apply the filter, so that the running total is calculated only on the requested Orders. More precisely, what really happens is that the view is expanded and merged with the query that is using it. This allows Azure SQL to optimize the query as a whole and avoid, for example, simply following the precedence order, which would require calculating the Running Total for all orders and then removing all the rows that are not in the query's scope. As you can imagine, that would be extremely inefficient. By following this optimization process instead, the view can take advantage of all the filters used in outer queries (when it makes sense) so that it can reduce as much as possible the number of rows on which it has to operate, providing better performance and lower resource usage.

Security is another reason why views are useful: you can grant someone access to a view while denying that user access to the table used by the view. This means that you can be sure a user can only see the result of the query, but not the values stored in the underlying tables used to create that result.
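As a quick sketch, assuming a hypothetical ReportUser has already been created in the database (user creation is covered in the “Permissions” section later in this chapter), you could allow access to the view while blocking direct access to the underlying table:
GRANT SELECT ON OBJECT::[Sales].[OrderLinesRuninngTotal] TO ReportUser;
DENY SELECT ON OBJECT::[Sales].[OrderLines] TO ReportUser;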

The last reason for taking advantage of views is that they help to abstract access to tables by creating an indirection layer, so that your application and the database can stay loosely coupled, giving you the flexibility to make changes to your database without necessarily breaking any working application: views can be used as a backward-compatible interface for existing applications. You are free to refactor your database and evolve its design in some new direction, with the peace of mind that you won’t introduce any breaking changes.

Views can be dropped using the DROP VIEW command:
DROP VIEW [Sales].[OrderLinesRuninngTotal];

Functions

Functions in Azure SQL are of two types: functions that return a table and functions that return a scalar value. Besides this difference, they are very similar to what you probably already expect, given the fact that they are, as the name implies, functions. They allow you to encapsulate and reuse code and processing logic and can accept parameters and return a result.

As Azure SQL has many system-provided functions built in, for example, STRING_SPLIT, the functions created by the user are generally called User-Defined Functions or UDFs.

UDFs that return a table are usually referred to as Table-Valued Functions, or TVFs for short; as such, they can be used anywhere a table can be used, for example:
SELECT
      [OrderID],
      [OrderDate],
      [TotalQuantity],
      [TotalValue]
FROM
      dbo.[GetOrderTotals](40, 42);
If a TVF is made of only one SELECT statement, it becomes very similar to a view, with the exception that it supports parameters. Like a view, such a function can be expanded and merged (this process is usually referred to as inlining) with the query that is using it and thus can take advantage of better optimization. Because of this ability to be inlined, those TVFs are called Inline Table-Valued Functions:
CREATE OR ALTER FUNCTION dbo.GetOrderTotals(@FromOrderId AS INT, @ToOrderID AS INT)
RETURNS TABLE
AS
RETURN
WITH cte AS (
     SELECT
          [OrderId],
          SUM([Quantity]) AS TotalQuantity,
          SUM([Quantity] * [UnitPrice]) AS TotalValue
     FROM
          [Sales].[OrderLines]
     WHERE
          [OrderId] BETWEEN @FromOrderId AND @ToOrderID
     GROUP BY
          [OrderId]
)
SELECT
     o.[OrderId],
     o.[OrderDate],
     ol.[TotalQuantity],
     ol.[TotalValue]
FROM
     cte ol
INNER JOIN
     [Sales].[Orders] o ON [ol].[OrderID] = [o].[OrderID];
GO

A Table-Valued Function that is made of more than one single SELECT statement is called a Multi-Statement Table-Valued Function, and while it provides more flexibility than an Inline TVF, it cannot be optimized as much when called by other queries. For this reason, it should be used only if there are no other options available, as it may have a tangible impact on performance.
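Just to give you an idea, here is a sketch of a Multi-Statement TVF that computes similar order totals; the function name is made up for this example, and as just explained, Azure SQL cannot inline it:
CREATE OR ALTER FUNCTION dbo.GetOrderTotalsMulti(@FromOrderId AS INT, @ToOrderID AS INT)
RETURNS @totals TABLE
(
     [OrderID] INT NOT NULL,
     [TotalQuantity] INT NOT NULL,
     [TotalValue] DECIMAL(18, 2) NOT NULL
)
AS
BEGIN
     -- The body can contain several statements; here a single INSERT fills the result table
     INSERT INTO @totals
     SELECT
          [OrderID],
          SUM([Quantity]),
          SUM([Quantity] * [UnitPrice])
     FROM
          [Sales].[OrderLines]
     WHERE
          [OrderID] BETWEEN @FromOrderId AND @ToOrderID
     GROUP BY
          [OrderID];
     RETURN;
END
GO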

On the other side of the spectrum, there are the Scalar Functions. They return just a single value. The code used in their body can be as complex as you want and can even include several SELECT statements, but at the end of its execution, one and only one scalar value must be returned.
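Here is a minimal sketch of a Scalar Function (the name is hypothetical) that returns the total value of a single order:
CREATE OR ALTER FUNCTION dbo.GetOrderTotalValue(@OrderID AS INT)
RETURNS DECIMAL(18, 2)
AS
BEGIN
     -- Exactly one scalar value is returned
     RETURN (
          SELECT SUM([Quantity] * [UnitPrice])
          FROM [Sales].[OrderLines]
          WHERE [OrderID] = @OrderID
     );
END
GO
Calling it in a query such as SELECT [OrderID], dbo.GetOrderTotalValue([OrderID]) FROM [Sales].[Orders] forces the function to run once per returned row, which is exactly the scenario discussed next.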

While Scalar Functions could give the impression of something you would use as much as possible, especially if you are used to imperative programming languages, you should instead think twice before creating and using a new one. The reason is simple: applying one operation to a set of data is usually much more efficient than executing that operation for each data point (rows, in the case of a relational database) in your set. If you create a Scalar Function and use it to process the values returned by a query, you are forcing Azure SQL to execute that function for each row returned. Each call to a function has its own overhead, which adds to the fact that the Azure SQL Query Optimizer is working with its hands tied, as it doesn’t have much room to optimize your code when the function must be executed for every row. Performance will suffer a lot. It is not by chance that every time a database is forced to process single rows instead of a set of data, people refer to it as RBAR (Row-By-Agonizing-Row). So, use Scalar UDFs sparingly. On the bright side, in the last few years, Microsoft Research has found some clever methods to take the code inside a Scalar Function and rewrite it as more optimizable code; still, if you can write your code so that RBAR is not needed in the first place, you will help the optimizer right away, without having to wait for Microsoft Research, and in turn you’ll get better performance immediately.

Functions can be dropped using the DROP FUNCTION command:
DROP FUNCTION dbo.GetOrderTotals;

Stored Procedures

Stored Procedures are exactly what the name says: procedures stored in the database that can be executed by calling them by name. Stored Procedures can have parameters, can return a single scalar value, and can return zero, one, or more resultsets.

They offer an extremely powerful way to encapsulate complexity, reuse code, and create an abstraction layer: they can be used to create an API interface – a contract – that can shield the user from any changes or evolution that happens during a database’s lifetime. In addition to that, Stored Procedures offer other interesting benefits.

Like many other programmability features, Stored Procedures can also be used to secure access to data. A user can be authorized to execute a Stored Procedure while being denied direct access to the underlying tables or views. This is vital, especially if you are creating public-facing applications like an API or a web app, as you can be sure that, no matter what, a user accessing the database through that solution will only be able to access data via the procedures you have created for that purpose. Even if for some reason a user could gain direct access to the database, bypassing your middle tier or your web app, he or she won’t be able to do anything apart from what the Stored Procedures allow. That’s a pretty important point and a huge help in keeping your data secure.

Stored Procedures also have another important benefit. Once a Stored Procedure is invoked for the first time, Azure SQL caches its execution plan, also known in other database systems as a DAG (Directed Acyclic Graph), which is the list of steps that Azure SQL will execute in order to generate the result of your query. You will learn more about this later in the book, but as you can imagine, especially for complex queries, generating an execution plan can be expensive and resource-intensive, so caching it helps Azure SQL save CPU and time, giving you better overall performance and freeing resources for higher scalability without incurring additional costs.

As a developer, you will love to use Stored Procedures, as they really help you create a very clear separation of concerns, keep the code clean and polished, facilitate a clear definition of the interface between database and application, and ultimately favor a good, loosely coupled architecture.

A Stored Procedure can contain all the code you need to implement, even the most complex data manipulation processes, and they are called using the EXEC (or EXECUTE if you like to be verbose) command:
CREATE OR ALTER PROCEDURE dbo.GetOrderForCustomer
@CustomerInfo NVARCHAR(MAX)
AS
IF (ISJSON(@CustomerInfo) != 1) BEGIN
     THROW 50000, '@CustomerInfo is not a valid JSON document', 16
END
SELECT [Value] INTO #T FROM OPENJSON(@CustomerInfo, '$.CustomerId') AS ci;
SELECT
     [CustomerID],
     COUNT(*) AS OrderCount,
     MIN([OrderDate]) AS FirstOrder,
     MAX([OrderDate]) AS LastOrder
FROM
     Sales.[Orders]
WHERE
     [CustomerID] IN (SELECT [Value] FROM #T)
GROUP BY
     [CustomerID];
GO
In the preceding code, the Stored Procedure requires a string parameter. After a simple validation, the JSON document received as input is read and, just to simulate an intermediate step, the relevant content is saved into a temporary table. After that, the results are retrieved and returned to the user. The created Stored Procedure can be executed just by using the EXEC command:
EXEC dbo.GetOrderForCustomer N'{"CustomerId": [106, 193, 832]}';

The code inside the Stored Procedure will be executed, and one resultset, the one generated by the SELECT ... FROM Sales.[Orders] query, will be returned.

You may be wondering, at this point, when you should use Functions and when you should use Stored Procedures. Good question! As general guidance, keep in mind that while Functions are quite powerful, especially Inline Table-Valued Functions, they also have quite a lot of limitations and constraints. For this reason, unless you have a specific use case that is perfectly suited to a Function, the recommendation is to use Stored Procedures: they offer better performance and don’t have all the limitations that Functions have.

A Stored Procedure can be dropped using the DROP PROCEDURE command:
DROP PROCEDURE dbo.GetOrderForCustomer;

Triggers

Triggers are special Stored Procedures that get executed when something happens. There are two types of Triggers: DML (Data Manipulation Language) Triggers and DDL (Data Definition Language) Triggers.

The first type, DML Triggers, gets executed when an INSERT, UPDATE, or DELETE (or MERGE) command is executed. The latter, DDL Triggers, is instead executed when CREATE, ALTER, or DROP commands, which are used to define data structures, are invoked.

In both cases, you will have access to special objects that will allow you to interact with the event that sparked the Trigger execution.

For DML Triggers, you have access to the Inserted and Deleted virtual tables so that you can access the data as it was before and after the change. You can decide to do whatever you want with that data. For example, you can store the deleted data into another table to provide an easily accessible log of changed rows. You can also instruct the Trigger to prevent the change from happening at all: for example, you may want to be sure that some configuration data is not removed from a core table that your application needs to work properly.

In DDL Triggers, you have access to the EVENTDATA function, which returns an XML document containing all the details about the statement being executed. You can save that information, for example, to keep track of when, how, and who altered a table or dropped an object. Here too, you can prevent the modification from happening.
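For example, here is a sketch of a DDL Trigger that logs every schema change in the database, assuming a hypothetical dbo.DdlAuditLog table and trigger name created just for this example:
CREATE TABLE dbo.DdlAuditLog
(
     [EventTime] DATETIME2 NOT NULL,
     [LoginName] SYSNAME NOT NULL,
     [EventData] XML NOT NULL
);
GO
CREATE OR ALTER TRIGGER [AuditSchemaChanges]
ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
     -- EVENTDATA() returns an XML document describing the DDL statement
     INSERT INTO dbo.DdlAuditLog ([EventTime], [LoginName], [EventData])
     VALUES (SYSUTCDATETIME(), ORIGINAL_LOGIN(), EVENTDATA());
END
GO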

Triggers are part of the transaction being executed, which means that you can decide if that transaction should be allowed to successfully complete or not. A situation where this could happen, for example, is when only a set of selected users are authorized to make changes to the discount rates applied to your products for selected customers. In a Trigger, you may check that the user making the change has the authorization to do it, maybe even logging that operation in another table to keep track of that. If the user doesn’t have that authorization, you can roll back the transaction right within the Trigger, meaning the change won’t happen at all:
CREATE OR ALTER TRIGGER [Warehouse].[ProtectAzure]
ON [Warehouse].[Colors]
FOR UPDATE, DELETE
AS
BEGIN
     IF EXISTS(
           SELECT * FROM [Deleted]
           WHERE [ColorName] IN ('Azure')
           )
     BEGIN
           ROLLBACK TRAN;
           THROW 50000, 'Azure is here to stay.', 16;
     END
END
If you try to DELETE or UPDATE a value in the Warehouse.Colors table so that the color “Azure” will be removed or changed to something else, the Trigger will roll back that action, leaving the “Azure” color intact:
DELETE FROM [Warehouse].[Colors] WHERE ColorID = 1

This capability is very powerful but also quite expensive: since Triggers are part of an active transaction, they can roll it back, but from a resource usage perspective, it would be better to prevent the unwanted change in the first place. Preventing it would avoid spending IO, CPU, and memory just to discover, right at the end of the process, that all the work done must be undone: a waste of precious resources.

Besides the impact on resource usage, Triggers can also have a big impact on performance. They will be only as fast as the code and the logic you put inside them. If in the Trigger you’re trying to aggregate data from a billion-row table, chances are that performance won’t exactly be stellar. And this will happen every time the Trigger is executed.

That’s why, more and more frequently, Triggers are replaced with natively supported features. For example, as you’ll learn in the next sections, keeping track of all the changes done to a table can be done more efficiently and easily using a feature called Temporal Tables, and securing access to specific rows and values can be done with Row-Level Security.

So, in general, try to keep Triggers’ usage as low as possible. You should do all the checks upfront before making any changes to your data, so that you don’t have to undo anything. It will favor performance, concurrency, maintainability, and scalability.

Triggers can be dropped using the DROP TRIGGER command:
DROP TRIGGER [Warehouse].[ProtectAzure];

Non-scalar parameters

As you just learned in the previous sections, Stored Procedures and Functions, as expected, support parameters. In all modern programming languages, parameters can be anything, from scalars to complex objects. In Azure SQL, this is true too, albeit with some differences, since there is no concept of an object, given that SQL is not an object-oriented language. In the next sections, you’ll find the options available to achieve what passing an object would do in other languages.

Table-Valued Parameters

As the name implies, parameters can be tables. Yes, you can pass an entire table into a Function or a Stored Procedure. This is by far the best option you have when you need to pass quite a lot of data from your application to Azure SQL. I’ve written quite a lot because, as a rule of thumb, this option is good when you have up to a few thousand rows to pass. If you are in the realm of hundreds of thousands, millions, or even more, then you should look at the “Bulk operations” section discussed in the previous chapter for really good performance.

Also keep in mind that the Table-Valued Parameter (TVP) feature needs to be supported by the client library, as it needs to be aware of that option to be able to use it. Common languages like .NET, Python, Node, and Go all support TVPs.

How to use the TVP depends on the language you are using, but the common idea is that you can load data into an object that can be enumerated, making sure that the data structure is compatible with the schema of the table type you’ll use as a parameter, and then pass that object to the Stored Procedure you are calling.

You start creating the table type:
CREATE TYPE dbo.PostTagsTableType AS TABLE
(
     Tag NVARCHAR(100) NOT NULL UNIQUE
);
And then you reference it into your Stored Procedure:
CREATE PROCEDURE dbo.AddTagsToPost
@PostId INT,
@Tags dbo.PostTagsTableType READONLY
AS
INSERT INTO dbo.PostTags SELECT @PostId, Tag FROM @Tags
As you can see, the parameter @Tags is really a table, so you can use it as a regular table, with the only limitation that it is read-only. Then, with .NET, for example, you can use the TVP as shown in the following code:
var p2 = new SqlParameter("@Tags", SqlDbType.Structured);
p2.TypeName = "dbo.PostTagsTableType";
p2.Value = tags;
cmd.Parameters.Add(p2);
where the object tags is a .NET DataTable:
var tags = new DataTable("PostTagsTableType");
tags.Columns.Add("Tag", typeof(string));
tags.Rows.Add("azure-sql");
tags.Rows.Add("tvp");

Table-Valued Parameters are heavily optimized for performance, and by sending a batch of rows to Azure SQL, your application does the roundtrip to the database only once, instead of as many times as you have rows to pass. As every call to a resource external to the application has a small overhead, by keeping the number of calls to the database as low as possible (often described as a less chatty, or not chatty, application), you’ll make sure to give the end user the best experience possible, creating applications that are scalable and provide great performance.

JSON

Sometimes passing an object as a table will not be possible. For example, you may have an object representing something like a post, with its own tags, categories, author details, and so on. We could surely find ways to turn such objects into Table-Valued Parameters, but it would hardly be elegant. So why not just pass the object serialized as JSON as a whole? Azure SQL is perfectly capable of doing complex JSON manipulations, as you’ll learn in Chapter 8. You can write a Stored Procedure that accepts JSON as a parameter and then read the needed JSON sections so that you can fill the appropriate tables or pass the extracted values, or even the whole JSON document, to other procedures:
CREATE PROCEDURE dbo.AddTagsToPost
@PostId INT,
@Tags NVARCHAR(MAX)
AS
INSERT INTO dbo.PostTags
SELECT @PostId, T.[value] FROM OPENJSON(@Tags, '$.tags') T
GO
And the Stored Procedure can accept JSON in @Tags:
EXEC dbo.AddTagsToPost 1, '{"tags": ["azure-sql", "string_split", "csv"], "categories": {}}';

As JSON is passed as a regular string, this technique will work with any language, as there is no special knowledge required by the client libraries to take advantage of this option.

CSV

What if you just need to pass a small array of values? Or if the language you are using does not support TVPs and JSON feels like overkill for the goal? One way to elegantly solve the problem is to use the good old, but always present, CSV (Comma-Separated Values) format. Note that the comma is not really mandatory, as you can use the delimiter you prefer, like a pipe or a dash. Azure SQL gives you the Table-Valued System Function STRING_SPLIT for this purpose:
CREATE PROCEDURE dbo.AddTagsToPost
@PostId INT,
@Tags NVARCHAR(MAX)
AS
INSERT INTO dbo.PostTags
SELECT @PostId, T.[value] FROM STRING_SPLIT(@Tags, '|') AS T
And then you can just use the Stored Procedure like this:
EXEC dbo.AddTagsToPost 1, 'azure-sql|string_split|csv'

As with JSON, this option doesn’t require any special handling on the client side, so you can use this technique with any development language that supports regular ODBC connections.

Monitoring data for changes

Finding what data has changed since the last time an application, or a user, accessed it can be extremely helpful to increase application efficiency and scalability. Far fewer resources, in terms of CPU, network, and IO, are needed, for example, to exchange only the changed data instead of a full set of data with a target application. In addition to this already important point, since the burden of tracking changes can be left to Azure SQL, your code will be leaner and solely focused on the specific business case you need to take care of. As a result, you will have more time to take care of it, removing from your plate complexities that are anyway needed and expected in modern applications, like efficiency in data transfer.

Change Tracking

Change Tracking is a simple yet effective technology to understand what data has changed since a specific moment in time. The idea behind this feature is that every time a user accesses a table, he or she can ask for the version number active at that moment. The user must keep that version number safely stored somewhere so that the next time he or she accesses that table, the user can ask Azure SQL for only the data that has changed starting from that version number. Azure SQL will return all the changed data from that version number, along with the operation (insert, update, or delete) that the user should apply to its own dataset to bring it up to the current state, and the new version number to use the next time to get a new set of changes.

This feature is really powerful as it completely removes all the complexity from the client side: the only thing the client must do is preserve the version number and present it to Azure SQL.

Change Tracking must be enabled on the database:
ALTER DATABASE WideWorldImportersStandard
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
And on the table you want to track:
ALTER TABLE [Warehouse].[Colors]
ENABLE CHANGE_TRACKING
From that moment on, you can use the system scalar function CHANGE_TRACKING_CURRENT_VERSION() to get the current version number:
SELECT CHANGE_TRACKING_CURRENT_VERSION()

The version number is calculated and automatically updated at the database level (that’s why the function doesn’t require any parameter) and changes every time some data is modified in one of the tracked tables. Let’s say, for example, that I’ve just read the table Warehouse.Colors and, after doing that, I asked for the current version number. In addition to the resultset, Azure SQL returned the value 42, which means that the values I just read are associated with that version number.

Now let’s say that someone, maybe even myself, makes some changes to the Warehouse.Colors table. After a while, say 3 hours, I need to get the data from the same table again, but now, instead of getting the full dataset, I’d like to get only the changes made since I last saw it, so that I can avoid reading and transferring all the rows that haven’t changed in the meantime and that, most probably, would be the majority.

CHANGETABLE is the function I need to use for this purpose:
SELECT
     SYS_CHANGE_OPERATION, ColorID
FROM
     CHANGETABLE(CHANGES [Warehouse].[Colors], 42) C
The query will return a table like the following:
[Figure: sample CHANGETABLE output showing the SYS_CHANGE_OPERATION and ColorID columns]

where you can see the operation done on each row and the value of that row’s Primary Key column. With the Primary Key, which uniquely identifies rows in a table, we can join that result with the original table, Warehouse.Colors, and get all the data we need. We can now run the CHANGE_TRACKING_CURRENT_VERSION() function again to get the current version; let’s say it is now equal to 50. We keep that value somewhere, so that the next time we query for changes, we’ll use it to get the new changes that happened from that moment on, and so on for the next syncs.
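Here is a sketch of that join, still using version 42 as the starting point; a LEFT JOIN is used because deleted rows no longer exist in the base table:
SELECT
     CT.SYS_CHANGE_OPERATION,
     CT.ColorID,
     C.[ColorName]
FROM
     CHANGETABLE(CHANGES [Warehouse].[Colors], 42) AS CT
LEFT JOIN
     [Warehouse].[Colors] AS C ON CT.ColorID = C.ColorID;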

Easy and extremely efficient – and helpful to keep your client-side code very clean and easy to understand.

To disable Change Tracking on a specific table, you use the ALTER command on that table:
ALTER TABLE [Warehouse].[Colors]
DISABLE CHANGE_TRACKING
To disable it on the entire database, you use the ALTER command on the whole database:
ALTER DATABASE WideWorldImportersStandard
SET CHANGE_TRACKING = OFF

Change data capture

Change data capture is an even more sophisticated technology that uses the internal, immutable transaction log to read all the changes done to data and save them into a specified table. It closely resembles the idea of a Change Feed that some other databases provide. It reads from the transaction log using an asynchronous process and thus is perfect for consuming changes without affecting the performance of the table where those changes happened. At the moment, it is not yet fully available on all Azure SQL offerings, but keep an eye out for it. With solutions like Debezium, it will easily allow the creation of near real-time ETL solutions and integration with Kafka for a completely upgraded developer experience.

Protecting data assets

Aside from storing data and making it available upon request, a database also has the task of keeping the data secure. Azure SQL offers plenty of features to do that, ranging from permission management through data masking and up to encryption, so that you can choose exactly to which degree you want to expose your data and to whom.

In addition to the features that will be presented in the next sections, keep in mind that by default data is automatically encrypted at rest, thanks to a feature named Transparent Data Encryption. As the name implies, it is completely transparent and doesn’t require any change to your code. Your data is secured by default once it is on Azure. Of course, that might not be enough for you, so read on.

Permissions

Managing permissions, and thus data access authorization, is typically a database administrator’s duty, but security is, as discussed already, a key point of every application, so it is always good for a developer to have at least a basic knowledge of the security principles of the tools and systems he or she uses, to make sure security is taken care of from the ground up, avoiding future headaches and unpleasant surprises.

Azure SQL applies the principle of least privilege, and as such, a newly created user cannot really do much in Azure SQL. The administrative user you defined when you created the Azure SQL Server or Azure SQL Managed Instance resource has all possible authorizations, and precisely because it is an administrative account, it should not be used to grant application access to the data: instead, you should create a dedicated user for your application. For example, for an API service hosted in Azure Web API, you could create the WebAPI user:
CREATE USER WebAPI WITH PASSWORD = '94m1-2sx0_1!';
If you try to log in with this user, you’ll discover that it doesn’t have access to anything. To allow it to do a SELECT on a table, for example, you need to GRANT it the correct permission:
GRANT SELECT ON OBJECT::[Sales].[Orders] TO WebAPI;
There are many permissions in Azure SQL that allow you to fine-tune the security access of any user. To simplify the management of security, you can use roles or schemas. Roles allow you to group users together, so that a user inherits the permissions given to the roles he or she has been assigned to. Roles can be created manually, or you can use pre-existing roles. For example, any user in the db_datareader role can read data from any table but won’t be able to modify any data:
ALTER ROLE [db_datareader] ADD MEMBER [WebAPI];
If you want to give some permissions only to a specific set of objects, for example all the tables used by the Web API solution you are creating, you can target an object schema:
GRANT SELECT ON SCHEMA::[Sales] TO WebAPI;

Such code will allow you to use SELECT on any object that belongs to the Sales schema.

As Stored Procedures are not accessed with SELECT, you may have already guessed that you’ll need to grant a user the EXECUTE permission to make sure he or she can execute the desired Stored Procedure:
GRANT EXECUTE ON OBJECT::[dbo].[AddTagsToPost] TO WebAPI;

Every time a user tries to execute an action on an object for which he or she doesn’t have the correct permission, an error will be raised, and the execution of the statement will be halted.

In order to test that a specific user has the correct permissions, you can impersonate that user if you have logged in using the administrative account:
EXECUTE AS USER = 'WebAPI';

From now on, you can execute all the T-SQL code you want to test, and it will be executed as if it were the user WebAPI invoking it.

To revert back to your administrative user, you just need to run
REVERT;
Also, keep in mind that if you want to make sure someone cannot access something, you can explicitly DENY that user the permission to perform an operation on an object:
DENY SELECT ON OBJECT::[Sales].[Orders] TO SampleUser;

A DENY will always win if there are several different and conflicting authorizations for the same object. This could happen if the user is part of one or more roles.

To completely remove a user from a database, so that he or she won’t even be able to connect to it, one easy way is to delete the user:
DROP USER SampleUser;
If you want to temporarily prevent the user from connecting, instead of dropping the user, you can just deny him or her the permission to connect:
DENY CONNECT TO SampleUser;

Permissions and security are huge topics, but this is a good start. Remember to keep security in mind right from the beginning and you’ll already be one step ahead.

If you’re already a security-savvy person, you may be thinking that using a password for logging into a system is not really secure, and you are totally right. You can create an Azure SQL user that does not have a password stored in Azure SQL and that relies on Azure Active Directory instead. Your application can then use a Managed Identity to authenticate itself when connecting to Azure SQL.
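For example, here is a minimal sketch, assuming your application runs with a managed identity named my-api-app (the name is hypothetical); the user is created from the Azure Active Directory identity and then given the permissions it needs:
CREATE USER [my-api-app] FROM EXTERNAL PROVIDER;
ALTER ROLE [db_datareader] ADD MEMBER [my-api-app];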

A full tutorial on this topic is here: https://aka.ms/aswtcmsi.

Row-Level Security

With permissions, you can deny access to certain tables or even just to certain columns. But what if you want to deny access to some specific rows? In Azure SQL, you have a powerful feature called Row-Level Security, or RLS for short, that allows you to do exactly that.

With RLS, you can create a policy that will target one or more tables. Each time a read or a write operation is executed against those tables, the policy will kick in: it will evaluate if the rows affected by the requested operation are accessible to the user executing it or not.

To do the evaluation, the policy applies a provided user-defined Inline Table-Valued Function to each row touched by the active operation. If the function returns 1, the row is accessible; otherwise, it isn’t.

Thanks to the Azure SQL Query Optimizer, the function will be inlined, and thus it won’t really be executed for each row in scope (otherwise, performance would be horrible), but from a logical point of view, that’s exactly what happens.

Here’s an example of a security policy definition:
CREATE SECURITY POLICY OrderSecurityPolicy
ADD FILTER PREDICATE [rls].LoginSecurityPolicy(SalespersonPersonID) ON [Sales].[Orders]
WITH (STATE = ON);
As you can see, it uses the function rls.LoginSecurityPolicy to check authorization for rows in the Sales.Orders table. The function receives the value of the column SalespersonPersonID for each row it needs to evaluate. The function is created as follows:
CREATE FUNCTION rls.LoginSecurityPolicy(@PersonID AS INT)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
SELECT
     1 As [Authorized]
FROM
     [Application].[People]
WHERE
     LoginName = SESSION_CONTEXT(N'Login')
AND
     PersonID = @PersonId;
GO

The code uses the system function SESSION_CONTEXT to retrieve the value of the Login key. This function is incredibly useful as it allows a client to pass some information to Azure SQL as part of the connection data, which can then be accessed at any time during the connection’s lifetime. This comes in handy when you are creating a solution that runs on a middle tier or as a microservice and cannot impersonate the calling user. This way, you still have a way to pass to Azure SQL the information you need, for example, the login name, so that you can use it to apply a security policy. If you are using OAuth2 to authenticate the user calling the solution you are working on, for example, this is absolutely a needed feature; otherwise, Azure SQL won’t be able to know who is really trying to access the data, and you would have to find clever (and complex) ways to solve the security challenge by yourself on the client side.

Once Azure SQL has the information on who is actually trying to access the data, as you can see in the code, it checks whether that login exists in the Application.People table. It also checks that such a person is the one assigned to manage the order he or she is trying to access, by using the value injected into the @PersonID parameter by the active security policy. If all these checks pass, the row is allowed to surface to the user. If not, the row is simply discarded, and the user won’t even know it exists. In fact, when the security policy is active, if Miss Kayla Woodcock (who has PersonID = 2) tries to access the data via the solution we created, she will only see her rows:
[Figure: query results with Row-Level Security active, showing only the orders assigned to SalespersonPersonID = 2]

This is pretty amazing, as it really helps to simplify the code a lot, making sure that only the data a user is authorized to see will actually leave the database.

Other than filtering out rows, a policy can also be instructed to raise an error when someone tries to modify rows for which he or she doesn’t have the needed permissions. This can be useful, for example, to intercept INSERT commands that would insert non-authorized values. As the value doesn’t exist yet, there is no way to use the filter predicate, but you may still want to block its execution: that’s exactly what a BLOCK PREDICATE can do.
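For example, reusing the security policy and the function defined earlier, a sketch of a block predicate that prevents inserting orders assigned to somebody else could look like this:
ALTER SECURITY POLICY OrderSecurityPolicy
ADD BLOCK PREDICATE [rls].[LoginSecurityPolicy](SalespersonPersonID)
ON [Sales].[Orders] AFTER INSERT;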

As usual, the security policy can be deleted with the DROP command:
DROP SECURITY POLICY OrderSecurityPolicy;

A video that goes into detail of RLS and shows how it can be used to create real-world application, along with a GitHub repo with working sample code, is available here: https://aka.ms/rlsvideo.

Dynamic Data Masking

Sometimes you can’t prevent access to certain columns, as the application expects them to be available in order to work properly. For example, imagine you are creating a set of APIs that allow the user to execute ad hoc queries on the available data, so that one can have an ad hoc reporting or analytics solution to run even the most exotic analysis. A typical use case would be connecting Power BI to such an API. Depending on the user accessing the data, you may want to mask some of the returned values, so that an unauthorized user, instead of getting an error, simply sees a predetermined pattern, protecting sensitive information. To do that, you need to add a Mask to the columns you want to obfuscate:
ALTER TABLE [Application].[People]
ALTER COLUMN EmailAddress ADD MASKED WITH (FUNCTION = 'email()');
After the preceding code is executed, anyone who doesn’t have the UNMASK permission will see only masked data for the EmailAddress column.

Figure 5-1. Dynamic Data Masking in action

This obfuscation happens right in the database, so no sensitive data is ever sent outside Azure SQL.
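If a specific user, say a hypothetical ReportingAdmin, must be able to see the real values, you can grant that user the UNMASK permission at the database level:
GRANT UNMASK TO ReportingAdmin;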

As data is only obfuscated, a user who is already in possession of some sensitive data can still use that information to query the database. Let’s say that, for the report you are creating, you need to analyze a specific user and that user gave you her own email. A query like this will work perfectly:
SELECT * FROM [Application].[People] WHERE EmailAddress = '[email protected]'

This gives you quite a lot of flexibility as you can be sure to protect data without limiting its usage for those who had access to that protected information in some other, legitimate way.

Of course, this is a double-edged sword, as it also means that a malicious user could guess obfuscated data by executing brute-force attacks. That’s why Dynamic Data Masking should always be used along with other security features, like Row-Level Security, so that the possibility of a data breach is minimal. Azure also offers a feature called Advanced Threat Protection, which works with Azure SQL and raises alerts for potential threats by using sophisticated Machine Learning algorithms to monitor suspicious database access and anomalous query patterns, so that you can be sure any malicious attempt to access your data is reported immediately.

Always Encrypted

Always Encrypted is a feature that will guarantee that your highly sensitive data, like Credit Card numbers or Social Security numbers, is encrypted with a set of keys that are not shared with the database engine. This means that even database administrators won’t be able to decrypt that data, as the data will be encrypted directly by the application and sent to the database in its encrypted form.

Figure 5-2. CreditLimit has been Always Encrypted
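As a sketch of what this looks like on the database side, a column is declared as encrypted when the table is created (or altered); the example below assumes a column encryption key named MyColumnEncryptionKey already exists, and the table name is made up:
CREATE TABLE [Sales].[CustomerCreditLimits]
(
     [CustomerID] INT NOT NULL,
     [CreditLimit] DECIMAL(18, 2) ENCRYPTED WITH (
          COLUMN_ENCRYPTION_KEY = [MyColumnEncryptionKey],
          ENCRYPTION_TYPE = RANDOMIZED,
          ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
     ) NOT NULL
);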

The keys used for encryption can be stored in the local machine or in Azure Key Vault, and only those applications that have access to those keys can access the encrypted data. Trying to access Always Encrypted data without the correct key will result in an error:
Msg 0, Level 11, State 0, Line 0
Failed to decrypt column 'CreditLimit'.
To encrypt and decrypt the values, the client application must use a connection library that supports Always Encrypted, like
  • Microsoft JDBC Driver 6.0 (or higher)

  • ODBC Driver 13.1 for SQL Server

  • ODBC Driver 17 for SQL Server

  • Microsoft Drivers 5.2 for PHP for SQL Server

  • Microsoft.Data.SqlClient

Those libraries will take care of everything, and encryption and decryption will be completely transparent for the developer.

If you want to know more

You now have complete knowledge of all the supported programmability features that Azure SQL offers, as well as a very good base of security concepts and options. These capabilities are at your disposal to make sure your application is not only scalable and fast but also secure.
