When your SQL Server is not performing as well as you would like, the fastest way to find out what’s wrong is to look at the wait statistics available in the dynamic management view sys.dm_os_wait_stats. At the bottom of this article we will give you the code to query this DMV to see what events are causing your SQL Server to wait too long or too frequently. But first we will explain a bit about wait states and wait statistics.
SQL Server is always waiting for something: a disk I/O, a slot in the processor queue, a lock, and so on. The SQL Server engine keeps track of the duration and type of these waits and makes the aggregated results available through sys.dm_os_wait_stats. It is simple to query this dynamic management view to find which wait types are causing SQL Server to wait the most.
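For a quick first look, a minimal query against this DMV might look like the following sketch (the columns shown are the standard ones exposed by sys.dm_os_wait_stats; ordering by total wait time is just one common convention):

```sql
-- Top ten wait types by total accumulated wait time since the last
-- restart (or since the wait stats were last cleared).
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;
```

Note that raw results like this include many benign internal waits; the filtered query at the end of this article gives a much cleaner picture.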
Identifying the wait types that have caused SQL Server to slow down is the first step in solving your performance problems. Unfortunately, the second step is more difficult. This step requires learning what these wait states mean and how the problems they cause can be fixed.
This step is difficult because there are hundreds of different wait types in SQL Server, and new ones are added with every new version.
Fortunately, only a relatively small number of wait types commonly cause performance problems. If you learn something about the eight most common ones, you will be well on your way to using wait statistics for performance tuning.
These common waits are the ones we will discuss in this article. We will describe what causes them and, where possible, how to fix them. We also include a link below to a full reference on wait types for the times when you run into one that you don't understand.
Here is our summary of common wait states:
CXPACKET: This wait type is associated with query parallelism. You will always see some of these waits unless you have disabled query parallelism. Unless CXPACKET accounts for a very high percentage of total wait time, it is usually not a performance problem. If necessary, you can experiment with the max degree of parallelism setting. In rare cases you might even want to turn query parallelism off by setting max degree of parallelism to 1.
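If you do decide to experiment with that setting, it is a server-wide sp_configure option. The sketch below caps it at 4; that value is purely illustrative, and the right number depends on your hardware and workload:

```sql
-- Change the server-wide max degree of parallelism.
-- 'max degree of parallelism' is an advanced option, so it must be
-- made visible first. The value 4 is only an example; 1 disables
-- query parallelism entirely.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;
```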
PAGEIOLATCH_*: All the variations of PAGEIOLATCH waits usually indicate a problem with the disk subsystem. However, you should not assume that a lot of these waits means your disk subsystem is inadequate for the load. It may well be that poorly written code, poor indexing, or a lack of sufficient memory to cache data is making the I/O subsystem work much harder than it normally should.
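One way to check whether the disks are actually slow, rather than simply overworked, is to look at per-file latency in sys.dm_io_virtual_file_stats. A sketch:

```sql
-- Average read latency (ms) per database file since startup.
-- Consistently high averages point at the I/O subsystem itself;
-- low averages with PAGEIOLATCH waits suggest excessive I/O demand.
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    vfs.num_of_reads,
    CASE WHEN vfs.num_of_reads = 0 THEN 0
         ELSE vfs.io_stall_read_ms / vfs.num_of_reads
    END AS avg_read_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
ORDER BY avg_read_latency_ms DESC;
```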
ASYNC_NETWORK_IO: The name is deceptive; it is rare for this to be a network problem. Most often it indicates an application consuming data very slowly. This can be triggered by poor application code or by queries that return an enormous amount of data for the application to process. Often the fix is to optimize the poorly performing code.
WRITELOG: In most cases this wait indicates that the disk subsystem serving the transaction log file is inadequate for the task.
OLEDB: Most often this wait is triggered by communication with linked servers. However, linked servers are not the only things that use OLEDB to communicate. Since dynamic management views also use OLEDB, the problem could be a third-party monitoring tool that makes heavy use of DMVs to gather performance data.
LCK_M_*: These waits are often a symptom of blocking. However, the blocking itself can be caused by any number of different things; high on that list are inefficient code and poor indexing.
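When lock waits are happening right now, you can see who is blocking whom with a query along these lines (a sketch using sys.dm_exec_requests):

```sql
-- Sessions currently waiting, together with the session blocking them
-- and the statement they are running.
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    r.wait_resource,
    t.text AS running_sql
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```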
SOS_SCHEDULER_YIELD: Without going into a great deal of complex explanation, it is probably enough to say that this wait indicates pressure on your CPU resources. You need to find and fix code that uses too much CPU, as well as other conditions that consume a lot of CPU. If high values for this wait type persist, you might consider more or faster processors.
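A quick sanity check for CPU pressure is the ratio of signal wait time (time a task spent waiting for a scheduler after its resource became available) to total wait time:

```sql
-- Signal waits as a percentage of total waits. A persistently high
-- percentage (figures around 20-25% are often quoted as a rough
-- rule of thumb) suggests CPU pressure.
SELECT
    CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms)
         AS DECIMAL(5, 2)) AS signal_wait_pct
FROM sys.dm_os_wait_stats;
```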
PAGELATCH_*: This wait occurs when SQL Server is trying to latch a page that is already in memory. Often it is caused by a well-known issue in tempdb, where a high number of processes try to access the allocation pages of the tempdb data file at the same time. If your tempdb is heavily used, this may be the reason.
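The usual mitigation for tempdb allocation contention is to spread allocations across several equally sized data files. The sketch below adds one extra file; the file name, path, and sizes are placeholders for illustration only:

```sql
-- Add an extra tempdb data file so that allocation bitmap pages
-- (PFS/GAM/SGAM) are spread across multiple files.
-- NAME, FILENAME, SIZE, and FILEGROWTH below are illustrative only.
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\tempdb\tempdev2.ndf',
          SIZE = 1024MB,
          FILEGROWTH = 256MB);
```

Keeping all tempdb data files the same size matters, because SQL Server's proportional-fill algorithm only balances allocations across files of equal size.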
If you need to know something about a wait type that does not appear above, you should be able to find what you need below.
Paul Randal has written the query below. It filters out irrelevant waits from internal processes and benign waits that are not a cause for concern, making it easy to see the true state of affairs on your server.
In addition to the link above, I reproduce Paul's query here because it is time-consuming to copy the code from Paul's blog and remove the line numbers manually.
USE master;
-- clear accumulated stats if necessary
--DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);
WITH [Waits] AS
    (SELECT
        [wait_type],
        [wait_time_ms] / 1000.0 AS [WaitS],
        ([wait_time_ms] - [signal_wait_time_ms]) / 1000.0 AS [ResourceS],
        [signal_wait_time_ms] / 1000.0 AS [SignalS],
        [waiting_tasks_count] AS [WaitCount],
        100.0 * [wait_time_ms] / SUM ([wait_time_ms]) OVER() AS [Percentage],
        ROW_NUMBER() OVER(ORDER BY [wait_time_ms] DESC) AS [RowNum]
    FROM sys.dm_os_wait_stats
    WHERE [wait_type] NOT IN (
        N'CLR_SEMAPHORE', N'LAZYWRITER_SLEEP', N'RESOURCE_QUEUE',
        N'SQLTRACE_BUFFER_FLUSH', N'SLEEP_TASK', N'SLEEP_SYSTEMTASK',
        N'WAITFOR', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        N'CHECKPOINT_QUEUE', N'REQUEST_FOR_DEADLOCK_SEARCH',
        N'XE_TIMER_EVENT', N'XE_DISPATCHER_JOIN', N'LOGMGR_QUEUE',
        N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'BROKER_TASK_STOP',
        N'CLR_MANUAL_EVENT', N'CLR_AUTO_EVENT',
        N'DISPATCHER_QUEUE_SEMAPHORE', N'TRACEWRITE',
        N'XE_DISPATCHER_WAIT', N'BROKER_TO_FLUSH',
        N'BROKER_EVENTHANDLER', N'FT_IFTSHC_MUTEX',
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP', N'DIRTY_PAGE_POLL',
        N'SP_SERVER_DIAGNOSTICS_SLEEP')
    )
SELECT
    [W1].[wait_type] AS [WaitType],
    CAST ([W1].[WaitS] AS DECIMAL(14, 2)) AS [Wait_S],
    CAST ([W1].[ResourceS] AS DECIMAL(14, 2)) AS [Resource_S],
    CAST ([W1].[SignalS] AS DECIMAL(14, 2)) AS [Signal_S],
    [W1].[WaitCount] AS [WaitCount],
    CAST ([W1].[Percentage] AS DECIMAL(4, 2)) AS [Percentage],
    CAST (([W1].[WaitS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgWait_S],
    CAST (([W1].[ResourceS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgRes_S],
    CAST (([W1].[SignalS] / [W1].[WaitCount]) AS DECIMAL (14, 4)) AS [AvgSig_S]
FROM [Waits] AS [W1]
INNER JOIN [Waits] AS [W2] ON [W2].[RowNum] <= [W1].[RowNum]
GROUP BY [W1].[RowNum], [W1].[wait_type], [W1].[WaitS], [W1].[ResourceS],
    [W1].[SignalS], [W1].[WaitCount], [W1].[Percentage]
HAVING SUM ([W2].[Percentage]) - [W1].[Percentage] < 95; -- percentage threshold
GO
Now that you are armed with the proper tools you should be able to diagnose problems in SQL Server very rapidly.