Batch processing

With columnar storage, the CPU can become a bottleneck. SQL Server solves these problems with batch mode processing. In batch mode processing, SQL Server processes data in batches rather than processing one row at a time. A batch represents roughly 900 rows of data. Each column within a batch is stored as a vector in a separate memory area, meaning that batch mode processing is vector-based. Batch mode processing interrupts a processor with metadata only once per batch rather than once per row, as in row mode processing, which lowers the CPU burden substantially. This means that batch mode spreads the metadata access costs over all of the 900 rows in a batch.

Batch mode processing is orthogonal to columnar storage. This means SQL Server can use it with many different operators, no matter whether the data is stored in row or column storage. However, batch mode processing gives the best results when combined with columnar storage. DML operations, such as insert, update, or delete, work in row mode. Of course, SQL Server can mix batch and row mode operators in a single query.

SQL Server introduced batch mode also in version 2012. The batch mode operators in this version include the following:

Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
Batch hash table build, but only in-memory, no spilling
Bitmap filters limited to a single column, data types represented with a 64-bit integer

In SQL Server 2014, the following batch mode operators were added:

All join types
Union all
Scalar aggregation
Spilling support
Complex bitmap filters, all data types supported

SQL Server 2016 added the following improvements to batch mode processing:

Single-threaded queries
Sort operator
Multiple distinct count operations
Left anti-semi join operators
Window aggregate functions
Window analytical functions
String predicate and aggregate pushdown to the storage engine
Row-level locking on index seeks against a nonclustered index and rowgroup-level locking on full table scans against the columnstore

Finally, SQL Server 2017 added two more improvements to batch mode processing:

Batch mode adaptive joins
Batch mode memory grant feedback

The following table summarizes the most important features and limitations of columnar storage and batch mode processing in SQL Server versions 2012 to 2106:

Columnstore Index/Batch Mode Feature	SQL 2012	SQL 2014	SQL 2016	SQL 2017
Batch execution for multi-threaded queries	yes	yes	yes	yes
Batch execution for single-threaded queries			yes	yes
Batch mode adaptive joins				yes
Batch mode memory grant feedback				yes
Archival compression		yes	yes	yes
Snapshot isolation and read-committed snapshot isolation			yes	yes
Specify CI when creating a table			yes	yes
AlwaysOn supports CIs	yes	yes	yes	yes
AlwaysOn readable secondary supports read-only NCCI	yes	yes	yes	yes
AlwaysOn readable secondary supports updateable CIs			yes	yes
Read-only NCCI on heap or B-tree	yes	yes	yes	yes
Updateable NCCI on heap or B-tree			yes	yes
NCCI online build and rebuild				yes
B-tree indexes allowed on a table that has a NCCI	yes	yes	yes	yes
Updateable CCI		yes	yes	yes
CCI LOB columns support				yes
B-tree index on a CCI			yes	yes
CCI on a memory-optimized table			yes	yes
Filtered NCCI			yes	yes

You can check whether SQL Server uses row or batch mode for an operator by analyzing the properties of the operator in the execution plan. Before checking the batch mode, the following code adds nine time as many rows to the test fact table:

DECLARE @i AS INT = 1; 
WHILE @i < 10 
BEGIN 
SET @i += 1; 
INSERT INTO dbo.FactTest 
(SaleKey, CustomerKey,  
 Customer, CityKey, City, 
 DateKey, StockItemKey, 
 Product, Quantity, 
 TotalAmount, Profit) 
SELECT @i * 1000000 + f.[Sale Key] AS SaleKey, 
  cu.[Customer Key] AS CustomerKey, cu.Customer, 
  ci.[City Key] AS CityKey, ci.City,  
  f.[Delivery Date Key] AS DateKey, 
  s.[Stock Item Key] AS StockItemKey, s.[Stock Item] AS Product,  
  f.Quantity, f.[Total Excluding Tax] AS TotalAmount, f.Profit 
FROM Fact.Sale AS f 
  INNER JOIN Dimension.Customer AS cu 
    ON f.[Customer Key] = cu.[Customer Key] 
  INNER JOIN Dimension.City AS ci 
    ON f.[City Key] = ci.[City Key] 
  INNER JOIN Dimension.[Stock Item] AS s 
    ON f.[Stock Item Key] = s.[Stock Item Key] 
  INNER JOIN Dimension.Date AS d 
    ON f.[Delivery Date Key] = d.Date; 
END;

Let's check how much space this table uses now:

EXEC sys.sp_spaceused N'dbo.FactTest', @updateusage = N'TRUE'; 
GO

The result is as follows:

Name          rows     reserved   data       index_size  unused 
------------  -------  ---------  ---------  ----------  -------
dbo.FactTest  2279810  502152 KB  498528 KB  2072 KB     1552 KB

Now let's start querying this table. Before executing the following query, make sure you turn the actual execution plan on:

SELECT f.StockItemKey, 
 SUM(f.TotalAmount) AS Sales 
FROM dbo.FactTest AS f 
WHERE f.StockItemKey < 30 
GROUP BY f.StockItemKey 
ORDER BY f.StockItemKey;

You can hover the mouse over any of the operators. For example, the following screenshot shows the details of the Hash Match (Partial Aggregate) operator:

Row mode operators

You can see that SQL Server used row mode processing. As mentioned, batch mode is not strictly bound to columnar storage; however, it is much more likely that SQL Server would use it as you use the columnar storage. The following code creates a filtered nonclustered columnstore index. It is actually empty:

CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactTest 
ON dbo.FactTest 
(SaleKey, CustomerKey,  
 Customer, CityKey, City, 
 DateKey, StockItemKey, 
 Product, Quantity, 
 TotalAmount, Profit) 
WHERE SaleKey = 0; 
GO

Now, execute the same query again. As you can see from the following screenshot, this time batch mode is used for the Hash Match (Partial Aggregate) operator:

Batch mode operators

Table of Contents for Batch processing

Create new playlist

Sign In

Sign Up

Table of Contents for
Batch processing