After having discussed the foundational aspects of querying and manipulating data, it's now time to focus on more advanced and development-oriented features that you can use in Azure SQL. If you already took a peek at the chapter content, you may be surprised to find a full section on security. Don't be. Security is not something that can be bolted on later, as an afterthought or a nice-to-have. On the contrary, security is just as important as performance and maintainability and must be taken into consideration from the ground up. Therefore, the options Azure SQL offers you to keep your data secure are a must-know for everyone who wants to create a modern application: modern because it is not only scalable and modular, but because it is also secure.
Programmability
As a developer, you already know how important it is to be able to reuse and encapsulate existing code. Azure SQL provides a wide range of options that you can use to make sure your SQL code is clean, legible, easy to maintain, and reusable.
Variables
One behavior that may surprise you at first, and that is quite different from variables in other programming languages, is how Azure SQL manages a variable's scope. A variable is scoped to a batch or to an object. In the case of an object, understanding the scoping is easy, as T-SQL behaves exactly like any other programming language: a variable defined within a Stored Procedure, Trigger, or Function is scoped to that object only.
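As a minimal sketch of such a batch (the table dbo.Customers and its column are illustrative, not taken from a specific sample database):

```sql
-- All three statements form a single batch and must be executed together
DECLARE @t NVARCHAR(50) = N'Tailspin Toys';

INSERT INTO dbo.Customers (CustomerName) VALUES (@t);

SELECT @t AS InsertedCustomerName;
```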
Variables defined outside an object, instead, are scoped to the batch. A batch is a group of T-SQL statements executed together. In the preceding code sample, you must execute all three statements (DECLARE, INSERT, and SELECT) together; otherwise, the code will not work. If you execute the DECLARE statement alone, it will succeed, but once it completes, the variable will be gone. If you then try to execute the INSERT statement on its own, you'll get an error telling you that the variable @t is not defined.
Table Variables are generally used in very niche places as they can have a quite heavy impact on performance, and usually they should not contain too many rows. Be aware that this has nothing to do with memory consumption or anything related to that: the issue is that Table Variables are something in between two worlds, Variables and Tables, and for this reason, the Azure SQL engine cannot optimize access to their data as much as it can with a regular table. If you want to learn more about this, see the resources listed at the end of this chapter. In general, if you need a temporary place to store some of your data, use Temporary Tables instead.
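For reference, a Table Variable is declared and used like this (the column layout is just an illustrative sketch):

```sql
-- A Table Variable: scoped to the batch, best kept to few rows
DECLARE @Colors TABLE (
    ColorID   INT,
    ColorName NVARCHAR(20)
);

INSERT INTO @Colors VALUES (1, N'Red'), (2, N'Blue');

SELECT * FROM @Colors;
```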
Temporary Tables
Especially if you are creating analytical solutions, it can happen (even pretty frequently) that you need to temporarily store processed data somewhere, so that you can work on it at a later time. Sometimes it is also helpful to split a very complex query into several smaller queries, and you'll need a place to store the result of one query so that it can be picked up by another for further processing.
Temporary means that it will exist until manually destroyed with the DROP command or until it goes out of scope. For a Stored Procedure or a Trigger, the scope is the lifetime of the Stored Procedure or Trigger that created the temporary table. Once the Stored Procedure or Trigger has finished executing, the temporary table will be destroyed.
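A simple way to create one, sketched here against the Sales.OrderLines table mentioned in the text (the OrderID filter value is illustrative):

```sql
-- SELECT ... INTO creates the Temporary Table and fills it in one step
SELECT *
INTO #Order37
FROM Sales.OrderLines
WHERE OrderID = 37;
```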
A Temporary Table named #Order37 will be created, automatically inferring the column names and types, and it will be filled with all the rows from the table Sales.OrderLines related to the specified order.
Temporary Tables benefit from special optimizations that Azure SQL can do, as it knows such tables are ephemeral by definition. If you need to park your temporary data somewhere, make sure you use them in place of regular tables.
Views
Views are probably the simplest option available to reuse code and encapsulate data processing logic. A view is nothing more than a query definition, labeled with a name, and usable as a table. In fact, views are also known as virtual tables, even if this name is seldom used.
Views are useful for code reuse but also for securing access to your data and to keep backward compatibility with existing applications, by abstracting access to underlying tables.
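A minimal sketch of a view definition (the view name and the aggregation are illustrative, assuming Quantity and UnitPrice columns on Sales.OrderLines):

```sql
CREATE VIEW Sales.OrderTotals
AS
SELECT OrderID, SUM(Quantity * UnitPrice) AS OrderTotal
FROM Sales.OrderLines
GROUP BY OrderID;
GO

-- The view can now be queried just like a table
SELECT * FROM Sales.OrderTotals WHERE OrderID = 37;
```

Note the GO after the view body: it separates the CREATE VIEW batch from the query that follows, as discussed next.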
GO is not a T-SQL statement. GO is a keyword recognized by tools, like SQL Server Management Studio or Azure Data Studio, that tells the tool that two sets of T-SQL statements, usually called batches, must be executed separately and independently (but not in parallel!). This is needed by some commands, many CREATE statements for example, so that you can tell the tool you're using when a view's body is finished and when another command, potentially even unrelated to the created view, is starting.
Azure SQL behaves as if it could push the WHERE clause into the view and apply the filter so that the running total is calculated only on the requested Orders. More precisely, what really happens is that the view is expanded and merged with the query that is using it. This allows Azure SQL to optimize the query as a whole and avoid, for example, simply following the logical precedence order, which would require calculating the Running Total for all orders and then removing all the rows that are not in the query's scope. As you can imagine, that would be extremely inefficient. By following this optimization process instead, the view can take advantage of all the filters used in outer queries (when it makes sense) so that it can reduce as much as possible the number of rows on which it has to operate, providing better performance and lower resource usage.
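The scenario just described can be sketched like this (view name and window definition are illustrative, again assuming Quantity and UnitPrice columns):

```sql
-- A view computing a running total per order line
CREATE VIEW Sales.RunningTotals
AS
SELECT OrderID, OrderLineID,
       SUM(Quantity * UnitPrice)
         OVER (PARTITION BY OrderID ORDER BY OrderLineID) AS RunningTotal
FROM Sales.OrderLines;
GO

-- Logically the filter is applied after the view, but the optimizer
-- expands the view and pushes the filter down, so the running total
-- is computed only for the requested order
SELECT * FROM Sales.RunningTotals WHERE OrderID = 37;
```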
Security is another reason why views are useful: you can grant someone access to a view while denying that user access to the table used by the view. This means that you can be sure a user can only see the result of the query, but not the values stored in the underlying tables used to create that result.
The last reason for taking advantage of views is that they help abstract access to tables by creating an indirection layer, so that your application and the database can stay loosely coupled, giving you the flexibility to make changes to your database without necessarily breaking any working application: views can be used as a backward-compatible interface for existing applications. You are free to refactor your database and evolve its design in whatever new direction you want it to take, with the peace of mind that you won't introduce any breaking changes.
Functions
Functions in Azure SQL are of two types: functions that return a table and functions that return a scalar value. Besides this difference, they are very similar to what you probably already expect, given the fact that they are, as the name implies, functions. They allow you to encapsulate and reuse code and processing logic and can accept parameters and return a result.
As Azure SQL has many system-provided functions built in, for example, STRING_SPLIT, the functions created by the user are generally called User-Defined Functions or UDFs.
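An Inline Table-Valued Function, the simplest kind of UDF that returns a table, could be sketched like this (function name and column list are illustrative):

```sql
CREATE FUNCTION Sales.GetOrderLines(@OrderID INT)
RETURNS TABLE
AS
RETURN
    SELECT OrderLineID, Quantity, UnitPrice
    FROM Sales.OrderLines
    WHERE OrderID = @OrderID;
GO

-- An Inline TVF is used exactly like a table
SELECT * FROM Sales.GetOrderLines(37);
```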
A Table-Valued Function that is made of more than one SELECT statement is called a Multi-Statement Table-Valued Function, and while it provides more flexibility than an Inline TVF, it cannot be optimized as much when called by other queries. For this reason, it should be used only if there are no other options available, as it may have a tangible impact on performance.
On the other side of the spectrum, there are the Scalar Functions. They return just a single value. The code in their body can be as complex as you want and even include several SELECT statements, but at the end of the execution, one and only one scalar value must be returned.
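A minimal, illustrative Scalar Function (name and arithmetic are assumptions, not from a specific sample database):

```sql
CREATE FUNCTION Sales.GetOrderTotal(@OrderID INT)
RETURNS DECIMAL(18, 2)
AS
BEGIN
    DECLARE @Total DECIMAL(18, 2);

    SELECT @Total = SUM(Quantity * UnitPrice)
    FROM Sales.OrderLines
    WHERE OrderID = @OrderID;

    RETURN @Total;  -- exactly one scalar value comes back
END
GO

SELECT Sales.GetOrderTotal(37) AS OrderTotal;
```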
While Scalar Functions could give the impression of something you would use as much as possible, especially if you are used to imperative programming languages, you should instead think twice before creating and using a new one. The reason is simple: applying one operation to a set of data is usually much more efficient than executing that operation for each data point (rows, in the case of a relational database) in your set. If you create a Scalar Function and use it to process the values returned by a query, you are forcing Azure SQL to execute that function for each row returned. Each call to a function has its own overhead, and on top of that, the Azure SQL Query Optimizer is working with its hands tied, as it doesn't have much room to optimize your code when the function must be executed for every row. Performance will suffer a lot. It is not by chance that every time a database is forced to process single rows instead of a set of data, people refer to it as RBAR (Row-By-Agonizing-Row). So, use Scalar UDFs sparingly. On the bright side, in the last few years, Microsoft Research has found some clever methods to take the code inside a Scalar Function and rewrite it as more optimizable code; but still, if you can write your code so that RBAR is not needed from the start, you will help the optimizer right away, without having to wait for Microsoft Research, and in turn you'll get better performance immediately.
Stored Procedures
Stored Procedures are exactly what the name says: procedures stored in the database that can be executed by calling them by name. Stored Procedures can have parameters, can return a single scalar value, and can return zero, one, or more resultsets.
They offer an extremely powerful way to encapsulate complexity, reuse code, and create an abstraction layer: they can be used to create an API interface – a contract – that can shield the user from any changes or evolution that happens during a database’s lifetime. In addition to that, Stored Procedures offer other interesting benefits.
Like many other programmability features, Stored Procedures can also be used to secure access to data. A user can be authorized to execute a Stored Procedure while being denied access to the underlying tables or views directly. This is vital, especially if you are creating public-facing applications like an API or a web app, as you can be sure that, no matter what, a user accessing the database through that solution will only be able to access data via the procedures you have created for that purpose. Even if for some reason a user could gain direct access to the database, bypassing your middle tier or your web app, he or she won't be able to do anything apart from what the Stored Procedures allow. That's a pretty important point and a huge help in keeping your data secure.
Stored Procedures also have another important benefit. Once a Stored Procedure is invoked for the first time, Azure SQL caches its execution plan: a list of steps, structured as a Directed Acyclic Graph (DAG), that Azure SQL will execute in order to generate the result of your query. You will learn more about this later in the book, but as you can imagine, especially for complex queries, generating an execution plan can be expensive and resource-intensive, so caching it helps Azure SQL save CPU and time, giving you better overall performance and freeing resources for higher scalability without incurring additional costs.
As a developer, you will love using Stored Procedures, as they really help you create a very clear separation of concerns, keep the code clean and polished, facilitate a clear definition of the interface between database and application, and ultimately favor a good loosely coupled architecture.
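A minimal, illustrative Stored Procedure over the Sales.[Order] table mentioned next (procedure name and parameter are assumptions):

```sql
CREATE PROCEDURE Sales.GetOrder
    @OrderID INT
AS
SET NOCOUNT ON;
SELECT * FROM Sales.[Order] WHERE OrderID = @OrderID;
GO

-- Invoke it by name
EXEC Sales.GetOrder @OrderID = 37;
```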
The code inside the Stored Procedure will be executed, and one resultset, the one generated by the SELECT ... FROM Sales.[Order] query, will be returned.
You may be wondering, at this point, when you should use Functions and when Stored Procedures. Good question! As general guidance, keep in mind that while Functions are quite powerful, especially Inline Table-Valued Functions, they also have quite a lot of limitations and constraints. For this reason, unless you have some specific use case that is perfectly suited to a Function, the recommendation is to use Stored Procedures: they offer better performance and don't have all the limitations that Functions have.
Triggers
Triggers are special Stored Procedures that get executed when something happens. There are two types of Triggers: DML (Data Manipulation Language) Triggers and DDL (Data Definition Language) Triggers.
The first type, DML Triggers, gets executed when an INSERT, UPDATE, DELETE, or MERGE command is executed. The second, DDL Triggers, gets executed when CREATE, ALTER, or DROP commands, which are used to define data structures, are invoked.
In both cases, you will have access to special objects that will allow you to interact with the event that sparked the Trigger execution.
For DML Triggers, you have access to the Inserted and Deleted virtual tables, which let you see the data as it was before and after the change. You can decide to do whatever you want with that data. For example, you can store the deleted data in another table to provide an easily accessible log of changed rows. You can also instruct the Trigger to prevent the change from happening at all: for example, you may want to be sure that some configuration data is not removed from a core table that your application needs to work properly.
In DDL Triggers, you have access to an EVENTDATA function that returns an XML document containing all the details about the statement being executed. You can, for example, save that information to keep track of when, how, and by whom a table was altered or an object dropped. In this case too, you can prevent the modification from happening.
This capability is very powerful but also quite expensive: since Triggers are part of an active transaction, they can roll it back, but from a resource usage perspective, it would be better to prevent the unwanted change before it even starts. Rolling back means having already spent IO, CPU, and memory resources just to discover, right at the end of the process, that all the work done must be undone, which is a waste of precious resources.
Besides the impact on resource usage, Triggers can also have a big impact on performance. A Trigger will only be as fast as the code and logic you build into it. If in the Trigger you're trying to aggregate data from a billion-row table, chances are that performance won't be exactly stellar. And this will happen every time the Trigger is executed.
That's why, more and more frequently, Triggers are replaced with natively supported features. For example, as you'll learn in the next sections, keeping track of all changes done to a table can be done more efficiently and easily using a feature called Temporal Tables, and securing access to specific rows and values can be done with Row-Level Security.
So, in general, try to keep Triggers’ usage as low as possible. You should do all the checks upfront before making any changes to your data, so that you don’t have to undo anything. It will favor performance, concurrency, maintainability, and scalability.
Non-scalar parameters
As you just learned in the previous sections, Stored Procedures and Functions, as expected, support parameters. In all modern programming languages, parameters can be anything, from scalars to complex objects. In Azure SQL, this is true too, albeit with some differences, since there is no concept of an object, given that SQL is not an object-oriented language. In the next sections, you'll find the options available to achieve what passing an object would do in other languages.
Table-Valued Parameters
As the name implies, parameters can be tables. Yes, you can pass an entire table into a Function or a Stored Procedure. This is by far the best option you have when you need to pass quite a lot of data from your application to Azure SQL. I've written quite a lot because, as a rule of thumb, this option is good when you have up to a few thousand rows to pass. If you are in the realm of hundreds of thousands, millions, or even more, then you should look at the "Bulk operations" section discussed in the previous chapter for really good performance.
Also keep in mind that the Table-Valued Parameter (TVP) feature needs to be supported by the client library, as it needs to be aware of that option to be able to use it. Common languages like .NET, Python, Node, and Go all support TVPs.
How to use the TVP depends on the language you are using, but the common idea is that you can load data into an object that can be enumerated, making sure that the data structure is compatible with the schema of the table type you’ll use as a parameter, and then pass that object to the Stored Procedure you are calling.
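On the database side, a TVP setup could be sketched like this (the type, procedure, and column names are illustrative assumptions):

```sql
-- Define a reusable table type
CREATE TYPE dbo.OrderLineList AS TABLE (
    ProductID INT,
    Quantity  INT
);
GO

-- A procedure accepting the table type; TVPs must be declared READONLY
CREATE PROCEDURE Sales.AddOrderLines
    @Lines dbo.OrderLineList READONLY
AS
INSERT INTO Sales.OrderLines (ProductID, Quantity)
SELECT ProductID, Quantity FROM @Lines;
GO

-- From T-SQL, fill a variable of that type and pass it in
DECLARE @l dbo.OrderLineList;
INSERT INTO @l VALUES (17, 3), (42, 1);
EXEC Sales.AddOrderLines @Lines = @l;
```

From a client language, the library fills an equivalent enumerable structure and passes it as the parameter value, as described above.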
Table-Valued Parameters are heavily optimized for performances, and also by sending a batch of rows to Azure SQL, your application will do the roundtrip to the database only once, instead of as many times as you have rows to pass. As every call to resources external to application has small overhead, by keeping the number of times you call the database as low as possible – also referred to as a less or not chatty application – you’ll make sure to give the end user the best experience possible, creating applications that are scalable and provide great performance.
JSON
As the JSON is passed as a regular string, this technique will work with any language, as no special knowledge is required of the client libraries to take advantage of this option.
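A sketch of the server side, using OPENJSON to shred the string into rows (procedure name and JSON property names are illustrative):

```sql
CREATE PROCEDURE Sales.AddOrderLinesFromJson
    @Json NVARCHAR(MAX)
AS
INSERT INTO Sales.OrderLines (ProductID, Quantity)
SELECT ProductID, Quantity
FROM OPENJSON(@Json) WITH (
    ProductID INT '$.productId',
    Quantity  INT '$.quantity'
);
GO

-- The client just passes a plain string
EXEC Sales.AddOrderLinesFromJson
    @Json = N'[{"productId":17,"quantity":3},{"productId":42,"quantity":1}]';
```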
CSV
Same as with JSON, this option doesn't require any special handling on the client side, so you can use this technique with any development language that supports regular ODBC connections.
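On the server side, the built-in STRING_SPLIT function turns the delimited string into a one-column table (the sample values are illustrative):

```sql
-- STRING_SPLIT returns one row per delimited value, in a column named "value"
DECLARE @csv NVARCHAR(MAX) = N'17,42,99';

SELECT CAST([value] AS INT) AS ProductID
FROM STRING_SPLIT(@csv, ',');
```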
Monitoring data for changes
Finding what data has changed since the last time an application, or a user, accessed it can be extremely helpful to increase application efficiency and scalability. Far fewer resources, in terms of CPU, network, and IO, are needed, for example, to exchange only the changed data instead of a full dataset with a target application. On top of that, since the burden of tracking changes can be left to Azure SQL, your code stays leaner and solely focused on the specific business case you need to take care of. As a result, you will have more time to dedicate to it, without having to deal with complexities, like efficient data transfer, that modern applications are expected to handle anyway.
Change Tracking
Change tracking is a simple, yet effective technology to understand what data has changed since a specific moment in time. The idea behind this feature is that every time a user accesses a table, he or she can ask for the version number active at that moment. The user must keep that version number safely stored somewhere so that the next time he or she accesses that table, the user can ask Azure SQL for only the data changed from that version number onward. Azure SQL will return all the changed data from that version number, along with the operation, insert, update, or delete, that the user should apply to their own dataset to bring it up to date, and the new version number that the user must use next time to get a new set of changes.
This feature is really powerful as it completely removes all the complexity from the client side, as the only thing that the client must do is to preserve the version number and present it to Azure SQL.
The version number is calculated and automatically updated at the database level, which is why asking for it doesn't require any parameter, and it will change every time some data changes in one of the tracked tables. Let's say, for example, that I've just read the table Warehouse.Colors, and after doing that, I asked for the current version number. In addition to the resultset, Azure SQL returned the value 42 to me, the number associated with the current version, which means that the values I just read are associated with that version number.
Now let's say that someone, maybe even myself, makes some changes to the Warehouse.Colors table. After a while, say 3 hours, I need to get the data again from the same table, but now, instead of getting the full dataset, I'd like to have only the changes from when I last saw it, so that I can avoid reading and transferring all the rows that haven't changed in the meantime and that, most probably, would be the majority.
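A sketch of the two steps, reusing the version numbers from the narrative (ColorID is assumed to be the Primary Key of Warehouse.Colors):

```sql
-- Step 1: when first reading the data, grab the current version
SELECT CHANGE_TRACKING_CURRENT_VERSION() AS SyncVersion;  -- e.g., 42

-- Step 2 (later): ask only for what changed since version 42
SELECT ct.SYS_CHANGE_OPERATION, ct.ColorID
FROM CHANGETABLE(CHANGES Warehouse.Colors, 42) AS ct;
```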
The result shows the operation performed on each row and the value of that row's Primary Key column. With the Primary Key, which uniquely identifies rows in a table, we can join that result with the original table, Warehouse.Colors, and get all the data we need. We can now run the CHANGE_TRACKING_CURRENT_VERSION() function again to get the current version; let's say it is now equal to 50, and we keep that value somewhere, so that the next time we query for changes, we'll use it to get the changes that happened from that moment on, and so on for the next syncs.
Easy and extremely efficient – and helpful to keep your client-side code very clean and easy to understand.
Change data capture
Change data capture is an even more sophisticated technology that uses the internal immutable transaction log to read all the changes made to data and save them into a specified table. It very closely resembles the idea of a Change Feed that some other databases provide. It reads from the transaction log using an asynchronous process and thus is perfect for consuming changes without affecting the performance of the table where those changes happened. At the moment, it is not yet fully available on all Azure SQL offerings, but keep an eye out for it. With solutions like Debezium, it will easily allow the creation of near real-time ETL solutions and integration with Kafka for a completely upgraded developer experience.
Protecting data assets
Aside from storing data and making it available upon request, a database also has the task of keeping the data secure. Azure SQL offers plenty of features to do that, ranging from permission management through data masking and up to encryption, so that you can choose exactly to which degree you want to expose your data and to whom.
In addition to the features that will be presented in the next sections, keep in mind that by default data is automatically encrypted at rest, thanks to a feature named Transparent Data Encryption. As the name implies, it is completely transparent and doesn't require any change to your code. Your data is secured by default once it is on Azure. Of course, that might not be enough for you, so read on.
Permissions
Managing permissions, and thus data access authorization, is typically a database administrator's duty, but security is, as discussed already, a key point in every application. It is therefore always good for a developer to have at least a basic knowledge of the security principles of the tools and systems he or she uses, so that security can be taken care of from the ground up, avoiding future headaches and unpleasant surprises.
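A minimal sketch of granting permissions at the schema level (the user name WebAPI is the one used later in this section):

```sql
-- Grant SELECT on every object in the Sales schema to the WebAPI user
GRANT SELECT ON SCHEMA::Sales TO WebAPI;
```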
Such code will allow you to use SELECT on any object that belongs to the Sales schema.
Every time a user tries to execute an action on an object for which he or she doesn’t have the correct permission, an error will be raised, and the execution of the statement will be halted.
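To test permissions without logging in as the other user, you can impersonate it; a sketch, assuming the WebAPI user exists:

```sql
-- Impersonate the WebAPI user for the rest of the session
EXECUTE AS USER = 'WebAPI';

SELECT * FROM Sales.[Order];  -- runs with WebAPI's permissions

-- Return to your own security context
REVERT;
```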
From now on, you can execute all the T-SQL code you want to test, and it will be executed as if it were the user WebAPI invoking it.
A DENY will always win if there are several different and conflicting authorizations for the same object. This could happen if the user is part of one or more roles.
Permissions and security are huge topics, but this is a good start. Remember to keep security in mind right from the beginning and you’ll already be one step ahead.
If you're already a security-savvy person, you may be thinking that using a password for logging into a system is not really secure, and you are totally right. You can create an Azure SQL user that does not have a password stored in Azure SQL and that relies on Azure Active Directory instead. Your application can then use a Managed Identity to authenticate itself when connecting to Azure SQL.
A full tutorial on this topic is here: https://aka.ms/aswtcmsi.
Row-Level Security
With permissions , you can deny access to certain tables or even just to certain columns. But what if you want to deny access to some specific rows? In Azure SQL, you have a powerful feature called Row-Level Security, or RLS for short, that allows you to do exactly that.
With RLS, you can create a policy that will target one or more tables. Each time a read or a write operation is executed against those tables, the policy will kick in: it will evaluate if the rows affected by the requested operation are accessible to the user executing it or not.
To do the evaluation, the policy applies a user-defined Inline Table-Valued Function, which you provide, to each row touched by the active operation. If the function returns 1, that row will be accessible; otherwise, it won't.
Thanks to the Azure SQL Query Optimizer, the function will be inlined, and thus it won't literally be executed once for each row in scope (performance would be horrible otherwise), but from a logical point of view, that's exactly what is happening.
The code uses the system function SESSION_CONTEXT to retrieve the value of the Logon key. The function is incredibly useful, as it allows a client to pass some information to Azure SQL as part of the connection data, which can then be accessed anytime during the connection's lifetime. This comes in handy when you are creating a solution that runs on a middle tier or as a microservice and cannot impersonate the calling user. In this way, you can still pass to Azure SQL the information you need, for example, the login name, so that you can use it to apply a security policy. If you are using OAuth2 to authenticate the user calling the solution you are working on, for example, this feature is absolutely needed; otherwise, Azure SQL won't be able to know who is really trying to access the data, and you would have to find clever (and complex) ways to solve the security challenge yourself on the client side.
This is pretty amazing, as it really helps to simplify the code a lot, making sure that only the data a user is authorized to see will actually leave the database.
Other than filtering out rows, a policy can also be instructed to raise an error when someone tries to access rows for which he or she doesn't have the needed permissions. This can be useful, for example, to intercept INSERT commands that would insert non-authorized values. As the value doesn't exist yet, there is no way to use the filter predicate, but you may still want to block its execution: that's exactly what a BLOCK PREDICATE can do.
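Putting the pieces together, an RLS setup could be sketched like this (the Security schema, the SalesPersonLogin column, and the predicate logic are illustrative assumptions; the SESSION_CONTEXT Logon key is the one discussed above):

```sql
-- Predicate function: a row is visible when its SalesPersonLogin
-- matches the login name stored in SESSION_CONTEXT
CREATE FUNCTION Security.fn_SecurityPredicate(@SalesPersonLogin SYSNAME)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS fn_Result
    WHERE @SalesPersonLogin = CAST(SESSION_CONTEXT(N'Logon') AS SYSNAME);
GO

-- The policy filters reads and blocks non-authorized inserts
CREATE SECURITY POLICY Security.SalesFilter
ADD FILTER PREDICATE Security.fn_SecurityPredicate(SalesPersonLogin)
    ON Sales.[Order],
ADD BLOCK PREDICATE Security.fn_SecurityPredicate(SalesPersonLogin)
    ON Sales.[Order] AFTER INSERT
WITH (STATE = ON);
```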
A video that goes into the details of RLS and shows how it can be used to create real-world applications, along with a GitHub repo with working sample code, is available here: https://aka.ms/rlsvideo.
Dynamic Data Masking
Figure 5-1. Dynamic Data Masking in action
This obfuscation happens right inside the database, so no sensitive data is ever sent outside Azure SQL.
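A sketch of adding masks to existing columns (the table and column names are illustrative):

```sql
-- Non-privileged users see email values like aXXX@XXXX.com
ALTER TABLE Sales.Customers
ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');

-- Numeric columns can be masked with a random value from a range
ALTER TABLE Sales.Customers
ALTER COLUMN CreditLimit ADD MASKED WITH (FUNCTION = 'random(1, 100)');
```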
This gives you quite a lot of flexibility, as you can be sure to protect data without limiting its usage for those who have access to that protected information in some other, legitimate way.
Of course, this is a double-edged sword as it also means that by executing brute force attacks, obfuscated data could be guessed by a malicious user. That’s why Dynamic Data Masking should always be used along with other security features, like Row-Level Security, so that the possibility of allowing a data breach is minimal. Azure also offers a feature called Advanced Threat Protection that works with Azure SQL too and that raises alerts for potential threats by using sophisticated Machine Learning algorithms to monitor suspicious database access and anomalous query patterns, so that you can always be sure that any malicious attempt to access your data will be reported immediately.
Always Encrypted
Figure 5-2. CreditLimit has been Always Encrypted
Client-side encryption and decryption is handled by the driver, so Always Encrypted requires a client library that supports it, such as:

Microsoft JDBC Driver 6.0 (or higher)
ODBC Driver 13.1 for SQL Server
ODBC Driver 17 for SQL Server
Microsoft Drivers 5.2 for PHP for SQL Server
Microsoft.Data.SqlClient
Those libraries will take care of everything, and encryption and decryption will be completely transparent for the developer.
If you want to know more
You now have a comprehensive view of the programmability features that Azure SQL offers, as well as a very good grounding in security concepts and options. These capabilities are at your disposal to make sure your application is not only scalable and fast but also secure.
SQL Server 2017 Developer's Guide – www.amazon.com/SQL-Server-2017-Developers-Guide-dp-1788476190/dp/1788476190
What's the difference between a temp table and table variable in SQL Server? – https://dba.stackexchange.com/questions/16385/whats-the-difference-between-a-temp-table-and-table-variable-in-sql-server/16386#16386
Working with Change Tracking – https://docs.microsoft.com/sql/relational-databases/track-changes/work-with-change-tracking-sql-server
Debezium: Stream changes from your database – https://debezium.io/
SQL Server Change Stream – https://medium.com/@mauridb/sql-server-change-stream-b204c0892641
Transparent data encryption – https://docs.microsoft.com/azure/sql-database/transparent-data-encryption-azure-sql?tabs=azure-portal
Always Encrypted – https://docs.microsoft.com/sql/relational-databases/security/encryption/always-encrypted-database-engine
Always Encrypted Client Development – https://docs.microsoft.com/sql/relational-databases/security/encryption/always-encrypted-client-development?view=azuresqldb-current