Saturday, April 2, 2022

What are T-SQL window functions? How to use it and Performance Analysis

What are T-SQL window functions?

 T-SQL window functions make creating a lot of queries easier, and they typically outperform previous approaches. Using the LAG function, for example, is far superior to doing a self-join. To improve overall performance, you must first comprehend the notion of framing and how window functions rely on sorting to provide results.


T-SQL Window functions, first introduced in 2005 and improved in 2012, are fantastic improvements to the T-SQL language. We will outline everything you need to know to obtain good results while utilizing these functions in this post.


The OVER clause and sorting

PARTITION BY and ORDER BY are two choices in the OVER clause that might induce sorting. All window functions allow PARTITION BY. However, it's not required. For the majority of the functions, ORDER BY is necessary. The data will be sorted based on the OVER clause, which might be the query's speed bottleneck, depending on what you attempt to do.

The OVER clause's Sequence BY option is necessary so that the database engine can "align up" the rows so that the function may be applied in the right order. Let's imagine you wish to use the ROW NUMBER function in the order of SalesOrderID.

If you want the function to be applied in decreasing the order of TotalDue, the results will be different.

Let's see what the T-SQL query would look like in this scenario : 

USE AdventureWorks<version>; 
GO
SELECT SalesOrderID, 
	TotalDue, 
	ROW_NUMBER() OVER(ORDER BY SalesOrderID) AS RowNum
FROM Sales.SalesOrderHeader;
 
SELECT SalesOrderID, 
	TotalDue, 
	ROW_NUMBER() OVER(ORDER BY TotalDue DESC) AS RowNum
FROM Sales.SalesOrderHeader;


The results from the above queries would look like the below : 

What are T-SQL window functions? How to use it and Pefromance Analysis



how to use window function in SQL



No sorting is required because the first query uses the cluster key as the ORDER BY option.





The second query includes a time-consuming sort procedure.




The ORDER BY phrase in the OVER clause is unrelated to the ORDER BY clause in the overall query, which may be quite different.


The only method to avoid sorting's speed hit is constructing a dedicated index for the OVER clause. Itzik's book Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions explains how to use window functions in high-performance T-SQL. The POC index is recommended by Ben-Gan. (P)PARTITION BY, (O)ORDER BY, and (c)covering are the acronyms for (P)PARTITION BY, (O)ORDER BY, and (c)covering. 

He suggests placing any filtering columns in the key before the PARTITION BY and ORDER BY columns. Then, as included columns, add any extra columns required to construct a covering index. You'll need to test how such an index affects your query and total workload, just like everything else.

Of course, you won't be able to add an index to every query you create, but if the efficiency of a query that employs a window function is critical, you may follow this suggestion.

Here's a list of indexes that will help you enhance your previous query:

CREATE NONCLUSTERED INDEX test ON Sales.SalesOrderHeader
(CustomerID, SalesOrderID) 
INCLUDE (TotalDue);


When you restart the query, you'll see that the sort operation is no longer included in the execution plan:





FRAMING


Framing is, in my opinion, the hardest topic to grasp while studying T-SQL window methods. See the introduction to T-SQL window functions for further information on the syntax. The following items require framing:

ORDER BY window aggregates, such as FIRST VALUE LAST VALUE, are used for running totals or moving averages.

Fortunately, framing isn't necessary most of the time, but it's all too simple to skip the frame and go with the default. RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is always the default frame. Speed will suffer while you will obtain the right results as long as the ORDER BY option contains a unique column or collection of columns.

Here's an example of a comparison between the default and proper frames:

SET STATISTICS IO ON;
GO
SELECT CustomerID, 
	SalesOrderID, 
	TotalDue, 
	SUM(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID)
          AS RunningTotal 
FROM Sales.SalesOrderHeader;
 
SELECT CustomerID, 
	SalesOrderID, 
	TotalDue, 
	SUM(TotalDue) OVER(PARTITION BY CustomerID ORDER BY SalesOrderID
	   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
           AS RunningTotal 
FROM Sales.SalesOrderHeader;


Do let me know the output of the above query. You guys can compare that in the comment section and let me know.

The outcomes are the same, but the execution is vastly different. Unfortunately, the implementation plan does not give you the truth in this scenario. Each query consumed 50% of the resources, according to the report.

The I/O operations on the first query will be in thousands, while in the second query, they would be 0! Yes, you heard that right!


Conclusion


T-SQL window functions have been touted as high-performance options. They make creating inquiries easier in my perspective, but you must understand them well to achieve high performance. Although indexing can help, you can't generate an index for every query you write. Framing is a difficult concept to grasp, but it is critical if you need to scale up to enormous tables.

No comments:

Post a Comment