Windows/윈도우 공통

Monitoring Disk Usage

99iberty 2013. 11. 21. 15:48

http://social.technet.microsoft.com/wiki/contents/articles/3214.monitoring-disk-usage.aspx

 

Microsoft SQL Server performance depends heavily on the I/O Subsystem (IOS). Latency in the IOS can result in many performance problems. For example, you may experience slow response times and problems caused by tasks timing out. It is critical that you monitor your disk usage.


 

Monitoring Disk I/O

Disk counters that you can monitor to determine disk activity are divided into the following two groups:

Primary

  • PhysicalDisk: Avg. Disk sec/Write
  • PhysicalDisk: Avg. Disk sec/Read

Secondary

  • PhysicalDisk: Avg. Disk Queue Length
  • PhysicalDisk: Disk Bytes/sec
  • PhysicalDisk: Disk Transfers/sec

Each disk should be monitored individually. Note that using the (_Total) instance can be misleading and mask problem areas. This is because the (_Total) instance sums and averages all of the disks together.

NOTE: If you are using mount points we recommend that you use the Logical Disk object instead of the Physical Disk object. Logical Disk displays the mount point path instead of the physical drive number.

In System Monitor, the Avg. Disk sec/Write and Avg. Disk sec/Read counters are considered “primary”. These counters should be examined first and do not need any additional information to evaluate the drive performance. These counters determine the average latency of an I/O request.

Avg. Disk Sec/Read is the average time in seconds of a read of data from the disk. The following list shows ranges of possible values and what the ranges represent:

  • Less than 10 ms - very good
  • Between 10 - 20 ms - okay
  • Between 20 - 50 ms - slow, needs attention
  • Greater than 50 ms – Serious I/O bottleneck

Avg. Disk Sec/Write is the average time in seconds of a write of data to the disk. The guidelines for the Avg. Disk Sec/Read values apply here.

NOTE: The numbers listed in this section are for general reference. If you have a very high requirement of application response time on a busy system, meeting the disk response time with these numbers might not be sufficient.

If all or most of the drives report high latency, the bottleneck is probably in the communication medium (such as SAN HBA, Switches, Fiber, Frontend Adaptor CPUs, and Cache). If only one drive or a select few report latency, the bottleneck will generally be in the JBOD (number of disks). To further examine this, review the secondary counters for the drives reporting high latency. If all drives are below their threshold there is no reason to examine the secondary counters.

NOTE: In System Monitor, it is important to monitor using the maximum field. Using the average field in System Monitor can be misleading.

You should only use the secondary counters for the drive(s) that have high latency. If the drive has acceptable latency, there is no point in moving forward. Disk Bytes/sec and Disk Transfers/sec are used to determine the size of and number of I/O requests. These counters can help determine if the number of disks or the communication medium is the source of the latency. You can also use Avg. Disk Queue Length to validate the communication medium. Generally, a value greater than 32 represents a bottleneck that can increase latency.

Disk Transfers/sec is comprised of both Disk Reads/sec and Disk Writes/sec. You can use these counters to determine if the drive does not have enough supporting disks. When you use these counters you might need to adjust the values for the type of RAID you have implemented. To determine what values to use, use the following formulas:

  • Raid 0 -- I/Os per disk = (reads + writes) / number of disks
  • Raid 1 -- I/Os per disk = [reads + (2 * writes)] / 2
  • Raid 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
  • Raid 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks

For example, if the maximum value for Disk Transfers/sec is 1800, you could determine that the drive will need at least 10 15k RPM disks in its RAID group. Generally, a 15k RPM disk is capable of approximately 180 I/O requests per second (IOPS). 180*10 = 1800. For a higher value you might require more than 10 disks.

NOTE: Consult your hardware vendor to identify the precise amount of IOPS your disks are capable of handling. The average seek time and rotational latency can affect the IOPS output. All disks are NOT created equal.

If the latency is consistently high, you can determine the root cause by using the secondary counters. If the latency is due to the number of disks, consider the following:

  • Use a faster disk drive.
  • Move frequently accessed files to a separate disk, server, or SAN.
  • Add disks to a RAID array if you are using one.
  • Use a faster RAID type, such as RAID 10.
  • Stop sharing disks with other volumes or LUNs.

If the latency is due to the communication medium, consider the following:

  • Increase the queue depth.
  • Move frequently accessed files to a separate disk, server, or SAN.
  • Validate the SAN cache.
  • Use multiple paths.

In addition to these recommendations, consider tuning queries that generate large amounts of I/O. To identify queries that consume large amounts of I/O, use the SYS.DM_EXEC_QUERY_STATS DMV. The dynamic management views (DMVs) have metrics for both reads and writes and are displayed by the query. You can also include the query plan and SQL command text by joining the SYS.DM_EXEC_SQL_TEXT and SYS.DM_EXEC_QUERY_PLAN dynamic management functions with the CROSS APPLY operator.

Following is a sample query using SYS.DM_EXEC_QUERY_STATS :

SELECT TOP 25 execution_count, plan_generation_num, last_execution_time,
total_worker_time, last_worker_time, min_worker_time, max_worker_time,
total_logical_reads, last_logical_reads, min_logical_reads, max_logical_reads
,
total_physical_reads, last_physical_reads, min_physical_reads, max_physical_reads
,
total_logical_writes, last_logical_writes, min_logical_writes, max_logical_writes
,
total_elapsed_time, last_elapsed_time, min_elapsed_time, max_elapsed_time
,
(SUBSTRING(s2.text, statement_start_offset / 2, ( (CASE WHEN statement_end_offset = -1 THEN (LEN(CONVERT(nvarchar(max),s2.text)) * 2) ELSE statement_end_offset END) - statement_start_offset) / 2) ) AS sql_statement,
text, p.query_plan

FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) s2
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) P
ORDER BY total_physical_reads DESC

Additionally, the PENDING_DISK_IO_COUNT column in SYS.DM_OS_SCHEDULERS can indicate disk subsystem problems. You should further investigate any sustained value for each processor. You can use SYS.DM_IO_PENDING_IO_REQUESTS to identify what the waiting requests are and associate them to database files. SYS.DM_IO_VIRTUAL_FILE_STATS reports I/O statistics on both data and log files.

Another resource you can use is the SYS.DM_OS_WAIT_STATS DMV. Use this DMV to determine what the engine is frequently waiting for and target that area for tuning. You might have an IOS bottleneck if PAGEIOLATCH waits represent some of the highest waits. PAGEIOLATCH waits indicate the amount of time the database engine has been waiting for the IOS. PAGEIOLATCH has various modes And more waits on PAGEIOLATCH_SH indicate a read bottleneck, whereas PAGEIOLATCH_EX indicates a write bottleneck.

Isolating Disk Activity Created by SQL Server

You can monitor the following counters to determine the amount of I/O generated by SQL Server components:

  • SQL Server:Buffer Manager:Page reads/sec
  • SQL Server:Buffer Manager:Page writes/sec
  • SQL Server:Buffer Manager:Checkpoint pages/sec
  • SQL Server:Buffer Manager:Lazy writes/sec

In System Monitor, these counters monitor the amount of I/O generated by SQL Server components by examining the following performance areas:

  • Writing pages to disk
  • Reading pages from disk

If the values for these counters approach the capacity limit of the hardware I/O subsystem, attempt to reduce the values by tuning your application or database to reduce I/O operations (such as index coverage, better indexes, or normalization), increasing the I/O capacity of the hardware, or adding memory. For example, you can use the Database Engine Tuning Advisor to analyze typical SQL Server workloads and produce recommendations for indexes, indexed views, and partitioning to improve server performance. For more information about Database Engine Tuning Advisor, see Tuning the Physical Database Design .

NOTE: These two counters do not include I/O generated by transaction log flushes. Most of the transaction log I/Os are writes.