Query Store and Parameterization Problems

The query store gives us a novel way of identifying those queries that are causing performance problems when they are parameterized by SQL Server for reuse. Although it is relatively simple to ensure that certain troublesome queries avoid the problem, it is laborious to identify these queries. Additionally, Query Store gives us the means to fix the problem for groups of queries by means of plan guides without changing the DDL at all. Dennes Torres explains the details

Although parameterization problems can have a serious impact on the performance of a SQL Server database, it isn’t easy to determine which queries have the problem so that you can fix the issues that are causing them to run slowly. Although it is relatively easy to determine whether a specific query has a parameterization problem, it is more difficult to then identify all the other queries that have this kind of problem in the server. If we can do this, and resolve the problem for each query, it will improve the overall performance of the database.

As well as being a very important new tool in SQL Server 2016, Query Store can also help us to find all the queries with parameterization problems in a SQL Server, and to fix them. In this article I will show you how that’s done.

What is the Parameterization Problem

When SQL Server receives a new query, the query is compiled and the resulting query plan is stored in memory, in the query plan cache, so the plan can, if possible, be reused.: The query itself is also stored with it as a string. Every time the same query, with the same text, is executed again, the query plan will be re-used from the cache.

However, if the text was saved unaltered, all queries that have predicates seeking for specific information, such as the client record for customer 135, would have this specific information stored in the cache. In consequence, a query that, for example, was seeking for a particular client record would only be reused if a new query was made seeking for the same client. If a new query asked for a different client, it would be compiled again and stored, with no reuse. This would result in excessive query compilations, thereby affecting server performance.

To get around this problem, SQL Server ‘parameterizes’ the query string. The way it does this is controlled by a database property called Parameterization. The default configuration, ‘Simple’, allows SQL Server to judge whether to parameterize a query or not, so that some but not all queries are parameterized.

When SQL Server decides to parameterize a query, the information in the query are replaced by parameters, and this parameterized query is stored in cache along with its plan. If another query is made that after this parameter substitution resolves to the same query, then the plan can be reused, thereby potentially saving considerable time and CPU.

You might think that all such queries should be parameterized, and so be tempted to change the Parameterization option to Forced; but there is a catch: Some queries have a terrible behavior when parameterized. When allowed to, with ‘simple’ parameterization, SQL Server parameterizes just a small percentage of queries that it caches, using a complex heuristic, to avoid parameterizing a query that shouldn’t be parameterized.

As well as using the ‘forced’ Parameterization configuration, a query can also be parameterized by the developer. The developer can call sp_execute_sql procedure to send a parameterized query to the database. In fact, several ORM (Object-Relational Mapping) frameworks, such as Entity Framework, uses this procedure automatically.

Batches of queries inside stored procedures are also parameterized by the developer, using the stored procedure parameters, or the batches provided to sp_execute_sql. This means that they can also be subject to parameterization problems.

The problem itself happens when we parameterize a query or batch that shouldn’t be parameterized. If the data distribution for the column used in the predicate is particularly skewed, some queries would have a terrible execution plan for some values of the predicate: Everything goes well if the values are equally distributed, but if they aren’t, then we are likely to have a problem because the strategy in the cached query plan would be poor. Sometimes, the first time that the query is used, it uses an atypical parameter so that the stored plan is inappropriate for the majority of subsequent queries. This can lead to a query running uncharacteristically slowly on occasion. (see Parameter Sniffing)

All this seems a bit nebulous when just described: So, in the first part of this article, we will illustrate the problem practically, and then we’ll show how to investigate it and solve it. First. We must set up a test environment to demonstrate these points.

Demonstration Environment

I will do all the demonstrations of parameterization using the sample database ‘AdventureWorks2016’, which you can download at https://www.microsoft.com/en-us/download/details.aspx?id=49502.

I will also use a function that I’ve developed called ‘plancachefromdatabase’, which allow us to check the content of the plan cache of a specific database. I explained about this function in a previous article I wrote on Simple-Talk called https://www.simple-talk.com/sql/t-sql-programming/checking-the-plan-cache-warnings-for-a-sql-server-database/

Execute the code below in the ‘AdventureWorks2016’ database to create the function, preparing the environment for the demonstrations.

I will also use the tables generated by the script called ‘Make_Big_Adventure’, created by Adam Machanic. You can download the script here (http://sqlblog.com/blogs/adam_machanic/archive/2011/10/17/thinking-big-adventure.aspx) and execute it in ‘AdventureWorks2016’. You need to change the first line of the script, changing the ‘Use’ instruction to ‘Use Adventureworks2016’.

Finally, we also will use a new index over the table ‘BigProduct’. Below you can see the ‘create index’ instruction, execute it in ‘AdventureWorks2016’ database.

Demonstrating Parameterization

These four queries below will illustrate the simple parameterization: two of them will be parameterized and the other two will not. We can check this by using queries over the system tables to look at what is being stored in the query plan cache.

Let’s execute them step-by-step:

  1. Select the database and clear the procedure cache

    This method of clearing the procedure cache of a single database is new to SQL Server 2016. However, you need to take care with it, and avoid using this procedure in a production database, because it will affect the database performance for a while.

  2. Select and execute the following query:

    It’s important to select the query precisely. If you select a single extra space before or after the query the query optimizer will understand as a different query and the demonstration will fail.

    Correct:

    Wrong: (extra white space after the string)

    Wrong again: (preceding white space in the CR/LF sequence)

  3. Do the same, select and execute, each one of the following queries, one by one:

  4. Use the query below to check the plan cache. We are using the custom function we created in the database while preparing the demo:

The result will be similar to the image below. You may notice that one of the queries was compiled for each different value in the predicate while the other wasn’t. Using parameterization ‘Simple’ only a small class of queries are parameterized, while using ‘Forced’ all the queries will be parameterized.

Should we use ‘Forced’ parameterization to avoid plan cache bloat? No, we shouldn’t. ‘Forced’ parameterization will parameterize queries that shouldn’t be parameterized and this will become a bigger problem than plan cache bloat. Let’s see an example.

In the next code section, there are two similar queries. These queries are the same, except for the parameter. However, the optimal execution plans of the queries are different. When we use the value 245.01 the plan uses an index seek, but it uses clustered index scan when we use the value 0.

To demonstrate this, select both queries below in SSMS and click the button Display Estimated Execution Plan. I’m using option (recompile) hint, so the queries will be recompiled and we can check the plan of each query:

The query plans of these queries are below:

These queries are using a predicate over a field whose data isn’t well distributed in the table. There are ten thousand rows with the value ‘0.00’ while only fifty with the value ‘245.1’. By using the option (recompile) hint, we can see that there is a different query plan for each value. That’s the catch: with badly-distributed values, each value will need its own query plan. In this case, if the query is parameterized, which query plan will SQL Server use?

These troublesome queries could be parameterized for one of several reasons, and the configuration ‘Parameterization:Forced’ is only one of them. There are several developer tools, such as ORMs, that will parameterize the queries and some of them will do this by default.

This is the nature of the problem that we are investigating: Some queries work fine with parameterization, some don’t. We need to identify the queries with parameterization problems and solve the problem.

Identifying the Queries with Parameterization Problems

Let’s first try the same queries that we used to demonstrate parameterization. We’ll use ‘Forced’ parameterization and check what happens when these queries are parameterized. Let’s do it step-by-step.

  1. First, change the parameterization configuration and clear the database procedure cache:

  2. Click the Include Actual Execution Plan’ button in SSMS toolbar.
  3. Select the query below and execute it. You should be careful: If you select any extra whitespace before or after the query, the sample will not work.

  4. Repeat the same action for the second query:

    The resulting query plan is exactly the same, the second query is re-using the query plan of the firstt one, because they are parameterized.

  5. Clear the procedure cache again:

  6. Repeat the same procedure, changing the query order. First, execute the following query:

  7. Repeat the same action for the other query:

    Again, the second query plan is the same than the first, the second query is re-using the plan of the first query.

As you have noticed in the demonstration, SQL Server uses the value of the first executed query to build the query plan. The resulting cached plan will be good for some values and terrible for others. We still can use some query hints to force a good enough plan, however the best solution would be to prevent the query being parameterized and for SQL Server to then use the best plan for each value.

We can solve this problem using plan guides, but let’s go a step back. First, we need a way to identify which queries in our server are suffering with parameterization problems. We can do this using query store.

Query Store is a new and very useful tool in SQL Server 2016.

Query Store

The query plan cache only keeps the current plan for each query. Query Store, on the other hand, is able to keep a history of query plans that are used for each query and the execution statistics for each plan. Using this information, we can answer several questions, such as ‘which queries are suffering performance regression?’.

The execution information that Query Store collects can tell us which query is suffering from parameterization problems.

Query Store needs to be enabled in all the databases in which we would like to collect historic query plans and execution statistics. When it is enabled, it stores all the query information in system tables within the database.

Query Store also has some pre-built graphics about the database activity. We can use these graphics whenever we experience regression problems or we can choose to query the information in query store system views (DMVs) to find useful data. However, what I’m proposing here is far from the main objective of query store, but still a very interesting approach to solve an old problem. Due to that, we will use query store system views.

Query Store Schema

Query Store collects information from two different phases of the query processing: The compilation and the execution. From the compilation phase, Query Store collects the query text and query plan, while from the execution phase it collects the execution statistics.

All this information is kept in memory and written asynchronously to the disk, so that the query execution performance isn’t affected by the query store.

The image below illustrates the data capture process used by query store. Since the information is partially in memory until the asynchronous write happens, the query store views join the information in memory and in disk.

The main DMVs in the QS schema are used to store queries, plans and run-time statistics. For each query, there are several plans and for each plan there are several statistic records. Each statistic record is a summary of all executions during a time interval.

The image below illustrates the query store schema and its DMVs.

Identifying Parameterization Problems

When a query is suffering from parameterization problems, some executions of the query will be far worse than others. By analyzing execution statistics, we can identify average execution numbers and we can also identify the standard deviation from these numbers. Standard Deviation is a difference from the average, to less or more, making a range containing at least 68.26% of the values, in this case, 68.26% of the query executions.

If the standard deviation is low, this means most executions are near the average numbers and the query hasn’t a problem. However, if the query suffers from parameterization problems, some executions are way worse than others, making a high standard deviation.

That’s the way we can use to identify parameterization problems: The queries with the highest standard deviation have this problem. It’s possible, sure, that other problems with the query will also cause a high standard deviation, however, this method allows us to retrieve a promising list of queries that probably have parameterization problems.

Before we go into a step-by-step demonstration, we will need to build the query we will use to find the parameterization problems using the Query Store schema.

Building the main query

  1. Firstly, we can find, using query store DMVs, all those queries that are parameterized. We can identify this using the ‘query_parameterization_type’ field

  2. Now, let’s join the result with the ‘sys.query_store_query_text’ DMV to retrieve the text of the query. We can include an additional filter to retrieve queries which have ‘@’ in the query text. This is needed because queries inside a stored procedure are not identified as parameterized, however if they use the stored procedures parameters they can suffer from the same parameterization problems.

  3. By joining the query with the sys.query_store_plan DMV, we can retrieve the most recent query plan used by the query. We also need to include a sub-query in the filter to retrieve only the most recent query plans, rather than all of them.

  4. If we join the query with the ‘sys.query_store_runtime_stats’ DMV, we can retrieve all the execution statistic information from each query plan we retrieved earlier.

  5. The previous query retrieves several execution status rows for each query plan. The Query Store creates a new row for each time interval. We need to group the rows and summarize the values.

  6. We need to filter the result to exclude all system queries. I included several filters to achieve this, however if your application uses queries against system tables or system databases, these queries will be excluded too.

  7. Finally, let’s create a function to turn easier the use of this query. We can’t include the order by inside the function:

Now It’s time to do a step-by-step walk-through that demonstrates how our query can identify parameterization problems.

Identifying Parameterization Problems Via Query Store

  1. First, let’s create a stored procedure. Parameterization problems can happen with ad-hoc queries and stored procedures, let’s do an example with both.

  2. Let’s turn query store on for ‘AdventureWorks2016’ database

  3. We need to execute two queries with parameterization problems several times

  4. It’s also interesting for the example to execute two more queries that have no parameterization problems.

  5. For a complete example, let’s end off by also executing a stored procedure that has parameterization problems

  6. Finally, let’s use our query against query store

    The image below shows the query result:

    There are some points you may have noticed:

    • The query against the Sales.CreditCard table hasn’t parameterization problems, we can deduce this by the low standard deviation values.
    • The query using a letter as parameter was executed inside a stored procedure.
    • The queries using numbers as parameters were executed as ad-hoc queries and parameterized by SQL Server.
    • The standard deviation of the CPU time of the first two queries is too high, showing that they are suffering from parameterization problems.
  7. We can add one more predicate to the query to filter results with low standard deviation.

There is a way to be sure these queries are suffering from parameterization problems. If we check the data distribution in the table we will notice the values aren’t well distributed. This query below does this for the ‘bigproduct’ table:

The image above illustrates a totally uneven distribution of data in the field ‘listprice’. Some values appear in a huge amount of records while others appear in only a few. This is completely natural and our queries need to deal with it.

We can’t solve parameterization problems using query store. Although query store allows us to force a specific query plan, this will not solve this kind of problem. The solution we need is the possibility to use different query plans according to the value of the parameter. We can solve this using “Option (Recompile)” in the queries, so that each execution will generate a new query plan. However, query store isn’t able to add one option to an existing plan. How can we achieve this without changing the source code of the system? The solution is the use of plan guides.

The procedure used to create plan guides, ‘sp_create_plan_guide’, needs the query text and the set of parameters as two different varchar input parameters. We can use some string functions to build these parameters. Check the following query:

The above query works for this demonstration, but there is no guarantee that it will work for you so you’d need to check, because small changes in the query text would generate a wrong result.

After we have decided which queries deserve a plan guide and identified the correctness of the parameters and the query text, we can execute the script below to create the plan guides:

You can check in the ‘Object Explorer’ window, inside SSMS, the plan guides created. They will appear under ‘AdventureWorks2016’ database, ‘Programmability’ -> Plan Guides’

Finally, to be sure that we have achieved the desired result, we need to check the query plans of the queries with problems. We want the result to be a different plan for each value used in the query.

Select each of the following queries, one by one, and click the button ‘Display estimated execution plan’ in SSMS.

As you may have noticed, each query generated a different query plan, the best query plan for each one, thereby solving the parameterization problem.

Conclusion

As well as all the wonderful features that Query Store brings to our performance troubleshooting tasks, we can now also use it to identify parameterization problems in our server before our users complain that the database is running slowly.

Until now, we were able to identify if a specific query suffers from this problem, but we had no solution to find all of those queries in our server that suffer from parameterization problems. This new solution can become a very useful routine maintenance task.

If this is your first contact with query store, you may also like the following articles, which explain the main features and benefits of Query Store: