Mike Ruthruff: SQL Server on SAN (SQLCAT)
US: NASDAQ, USDA, Verizon, Raymond James
Europe: London Stock Exchange, Barclays Capital
Asia/Pacific: Korea Telecom, Western Digital, Japan Railways East
ISVs: SAP, Siebel, SharePoint, GE Healthcare
Drives product requirements back into SQL Server from our customers and ISVs
Shares deep technical content with the SQL Server community
Target the most challenging and innovative applications on SQL Server
Investing in large-scale, referenceable SQL Server projects across the world
Provide SQLCAT technical and project experience
Conduct architecture and design reviews covering performance, operations, scalability, and availability
Offer use of the hardware lab in Redmond with direct access to the SQL Server development team
How to validate a configuration using I/O load generation tools
General SQL Server I/O characteristics
How to diagnose I/O bottlenecks
Sample configurations
Generalizing SQL Server I/O patterns is difficult, making storage sizing for a SQL Server deployment non-trivial in some cases.
OLTP (Online Transaction Processing): typically heavy on random reads/writes (8K most common), with some amount of read-ahead
Data warehouse: typically 64KB+ sequential reads (table and range scans); 128-256KB sequential writes (bulk load)
Operational activities: Backup/Restore, index rebuild, etc. (see appendix)
Many mixed workloads observed in customer deployments
Analysis Services has its own I/O patterns
See appendix for more details on I/O characteristics of certain SQL Server operations
User threads fill log buffers and request the log manager to flush all records up to a certain LSN; the log manager thread writes the buffers
Sequential in nature; individual write size varies depending on the nature of the transaction
Transaction commit forces the log buffer to be flushed to disk; writes are up to 60KB in size
While the flush completes, the committing session waits on WRITELOG
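Pressure on the log flush path shows up in the wait statistics; a minimal diagnostic sketch (the query is illustrative, not from the deck):

```sql
-- Check cumulative WRITELOG waits: a high average wait time suggests
-- log write latency is constraining transaction commit throughput.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM   sys.dm_os_wait_stats
WHERE  wait_type = 'WRITELOG';
```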
Dirty data pages are flushed to disk during checkpoint
Types of Checkpoints
User-initiated checkpoints: issued explicitly via the CHECKPOINT command
Reflexive checkpoints: issued internally as part of other operations (e.g. backup, database snapshot creation, clean shutdown)
Concurrency
Background/automatic checkpoints take place one at a time; however, any number of user-initiated or reflexive checkpoints may occur simultaneously as long as they are for different databases
Checkpoint Throttling
Checkpoint measures I/O latency impact and automatically adjusts checkpoint I/O to keep overall latency from being unduly affected
CHECKPOINT [checkpoint_duration]
CHECKPOINT now accepts an optional numeric argument specifying the number of seconds the checkpoint should take; checkpoint makes a best effort to finish in the specified time, and if the specified time is too low it runs at full speed
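For example, requesting that a manual checkpoint be spread over roughly 30 seconds:

```sql
-- Best-effort target of 30 seconds; if that is too little time for the
-- number of dirty pages, the checkpoint simply runs at full speed.
CHECKPOINT 30;
```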
Lazy Writer
Background process which attempts to locate buffer pages which can be returned to the free list
LRU-2 algorithm in SQL 2005 / 2008
Read-Ahead
Attempts to retrieve data pages that will be used in the immediate future
Single read-ahead request: I/O size determined by logical vs. physical ordering; target size of 64 pages (any multiple of 8K up to 512K)
Standard Edition: limited to 128 pages per request; Enterprise Edition: up to 512 pages
Cumulative outstanding limit of 5000 pages
Until the buffer pool is (nearly) full, all single-page requests bring in the entire 8-page extent (Enterprise only)
Helps the server come up to speed more quickly
Determined mainly by hardware capacity and the characteristics of access patterns
Data files can be used to maximize the number of spindles (striping)
Number of data files per FILEGROUP:
In the range of 0.25 to 1 per CPU core, depending on the nature of the workload (also consider growth: will the number of CPU cores grow over time?)
Scalability/performance consideration for allocation-intensive workloads (see slide on Diagnosing Allocation Contention)
Will the target environment for a disaster recovery restore accommodate the file sizes?
Best practices:
Align data files with CPU cores (considering access patterns)
Pre-size data/log files
Use equal sizes for files within a single FILEGROUP
Grow all files in a single FILEGROUP together when possible
Rely on AUTOGROW as little as possible
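As an illustration of these practices (the database, file names, paths, and sizes below are hypothetical), pre-sizing two equally sized files in one filegroup so AUTOGROW is only a safety net:

```sql
-- Add a filegroup with two pre-sized, equally sized data files.
ALTER DATABASE SalesDB ADD FILEGROUP FG_Data;

ALTER DATABASE SalesDB
ADD FILE
    (NAME = SalesData1, FILENAME = 'E:\Data\SalesData1.ndf',
     SIZE = 50GB, FILEGROWTH = 5GB),
    (NAME = SalesData2, FILENAME = 'F:\Data\SalesData2.ndf',
     SIZE = 50GB, FILEGROWTH = 5GB)
TO FILEGROUP FG_Data;
```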
Performance
Filegroups can be used to separate tables/indexes, allowing selective placement of these at the disk level (use with caution)
Separate objects requiring more data files due to a high page allocation rate
Availability
Database is available if the primary filegroup is available; other filegroups can be offline
A filegroup is available if all its files are available
Can specify separate filegroups for in-row data and large-object data
Partitioned Tables
Each partition can be in its own filegroup
May provide a better archiving strategy, as partitions can be SWITCHed in/out of the table
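A sketch of selective placement (the table and filegroup names are hypothetical): in-row data and large-object data on separate filegroups:

```sql
-- In-row data goes to FG_Data; varchar(max) LOB data goes to FG_Lob.
CREATE TABLE dbo.Orders
(
    OrderID int IDENTITY PRIMARY KEY,
    Placed  datetime NOT NULL,
    Notes   varchar(max) NULL
)
ON FG_Data
TEXTIMAGE_ON FG_Lob;
```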
In many modern storage scenarios it may be better to place tempdb on common spindles with data files, utilizing more cumulative disks
Depends on how well you know your workload's use of tempdb (e.g. relational DW workloads may use tempdb heavily for sorts and hash operations)
Many underlying technologies within SQL Server utilize tempdb (index rebuild with sort in tempdb, RCSI, etc.)
More details (Working with tempdb in SQL Server 2005):
http://www.microsoft.com/technet/prodtechnol/sql/2005/workingwithtempdb.mspx
Applies most to allocation-intensive workloads with heavy tempdb utilization
Same practices as data files with respect to sizing and growth
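Applying those practices to tempdb might look like this (file names, sizes, and paths are illustrative):

```sql
-- Pre-size the existing tempdb data file and add a second, equally
-- sized file to spread allocations across more files.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 1GB);

ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf',
          SIZE = 8GB, FILEGROWTH = 1GB);
```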
High rate of allocations to data files can result in contention on allocation structures
PFS/GAM/SGAM are structures within the data file which manage free space
Especially a consideration on servers with many CPU cores
More data files scale out these structures and reduce the potential for contention
Wait resource description is in the form DBID:FILEID:PAGEID
Can be cross-referenced with sys.dm_os_buffer_descriptors to determine the type of page
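A diagnostic sketch (illustrative, not from the deck) that decodes the DBID:FILEID:PAGEID resource for page-latch waiters:

```sql
-- Find tasks waiting on page latches and look up the page type
-- (e.g. PFS_PAGE, GAM_PAGE, SGAM_PAGE) in the buffer descriptors.
SELECT wt.session_id,
       wt.wait_type,
       wt.resource_description,
       bd.page_type
FROM   sys.dm_os_waiting_tasks AS wt
LEFT JOIN sys.dm_os_buffer_descriptors AS bd
  ON  bd.database_id = PARSENAME(REPLACE(wt.resource_description, ':', '.'), 3)
  AND bd.file_id     = PARSENAME(REPLACE(wt.resource_description, ':', '.'), 2)
  AND bd.page_id     = PARSENAME(REPLACE(wt.resource_description, ':', '.'), 1)
WHERE  wt.wait_type LIKE 'PAGELATCH%';
```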
Storage technologies are evolving rapidly and traditional best practices may not apply to all configurations
However, many still do apply (specifically physical isolation practices), especially at the high end (high-volume OLTP, large-scale DW)
There is no one single right way to configure storage for SQL Server on SAN
Already many shared components (ports/switches, array cache, controllers, etc.)
Spindle sharing is becoming a more common practice
Cache does not solve all performance problems
Think about splitting workloads with very different I/O characteristics at the physical level; yes, there is a benefit
Isolation at the physical level can provide 1) predictability and 2) better performance (in some cases)
Best to tune for writes (when possible): low log latency and absorbing checkpoint operations
In shared storage environments cache can be overused across hosts, impacting all users
Array cache strategy
Management/growth strategy
Windows/SQL Server considerations
Array feature utilization (snapshots, replication, etc.)
DBAs need to have knowledge of the physical storage configuration
Storage administrators need some understanding of SQL Server I/O patterns
When performance matters: size based on spindle count, not capacity; consider physical isolation at the spindle level
Shared components can impact everyone
Heterogeneous I/O workloads sharing physical spindles can be problematic
Workloads with overlapping periods of heavy I/O lead to unpredictable performance
Performance degrades over time as capacity utilization increases (increased seek times)
Proactive monitoring strategy in place and trending of response times
Proper host storage configuration; common problems include:
Queue depth set too low
Multipathing improperly configured
Not using vendor-recommended drivers
Disk Partition Alignment: Increase I/O Throughput by up to 10%, 20%, 30% or more (Jimmy May)
Disk Partition Alignment Essentials (Cheat Sheet) (Jimmy May)
[Diagram: components in the I/O path from host to disks: Host, PCI Bus, HBA, Switch, Array Processors/Controllers, Cache, Disks]
Monitors and granularity for diagnosing I/O:
Volume or LUN: disk counters (Logical or Physical, when necessary); latency, number of I/Os
Database files: number of reads, number of writes
Query or Batch: number of reads (logical or physical), number of writes (logical)
Index or Table: number of I/Os and type of access (seek, scan, lookup, write)
Index or Table: PAGEIOLATCH waits
Disk Read & Write Bytes/sec: measure of total disk throughput; ideally larger block scans should be able to heavily utilize connection bandwidth
Average Disk Bytes/Read & Write: measures the size of I/Os being issued; larger I/Os tend to have higher latency (example: BACKUP/RESTORE)
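At database-file granularity, latency and I/O counts can be pulled from the virtual file stats DMV; a minimal sketch (illustrative, not from the deck):

```sql
-- Per-file read/write counts and average stall (latency) in ms.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.name                  AS file_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
FROM   sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN   sys.master_files AS mf
  ON   mf.database_id = vfs.database_id
  AND  mf.file_id     = vfs.file_id
ORDER BY avg_read_ms DESC;
```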
Cache utilization: % of cache utilized, write-pending %; impacts how aggressively the array flushes writes to physical media
Thin Provisioning
Capacity on demand; supports green computing
Requires NTFS quick format & SQL Server instant file initialization
Replication over geographical distance
Virtualization of heterogeneous storage environments
Ability to manage all storage for the application
Storage device based on NAND flash
Fits into a regular HDD slot; utilizes the same command set and interface
Advantages: seeks are free
Disadvantages: cost per GB; shifting bottleneck; limited experience for enterprise use; writes are expensive relative to reads
Database size ~300GB
Random reads, plus random writes for checkpoints and sequential log writes
16-core server completely CPU bound
Counters: Avg Disk sec/Read (total), Disk Reads/sec (total), Avg Disk sec/Write (total), Disk Writes/sec (total), % Processor Time, Batches/sec
Same workload/database as the SSD configuration (OLTP)
Nearly the same sustained IOPS with ~10x the number of spindles
Higher latency; slightly lower throughput
Short stroking the spinning media
Counters: Avg Disk sec/Read (total), Disk Reads/sec (total), Avg Disk sec/Write (total), Disk Writes/sec (total), % Processor Time, Batches/sec
PASS Community Summit 2008 DBA-323 SQL Server 2008 on SAN - Best Practices and Lessons Learned
ICE
http://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032341825&Culture=en-US
Microsoft SQL Server I/O subsystem requirements for the tempdb database
http://support.microsoft.com/kb/917047
Visit the Microsoft Technical Learning Center located in the Expo Hall
Microsoft Data Platform ISV Village
Microsoft Ask the Experts Lounge
Microsoft Chalk Talk Theatre Presentations
Difficult to generalize I/O patterns of SQL Server
SQL Server is a platform on which applications are built, hence I/O patterns may differ significantly from one application to another
Monitoring of I/O is necessary to determine the specifics of each scenario
Understanding the I/O characteristics of common SQL Server operations/scenarios can help determine how to configure storage
General I/O characteristics of common scenarios:
Operation                              Random/Sequential   Read/Write   Size Range
OLTP Log                               Sequential          Write        Sector-aligned, up to 60K
OLTP Log                               Sequential          Read         Sector-aligned, up to 120K
OLTP Data (Index Seeks)                Random              Read         8K
OLTP - Lazy Writer                     Random              Write        Any multiple of 8K up to 256K
OLTP - Checkpoint                      Random              Write        Any multiple of 8K up to 256K
Read Ahead (DSS, Index/Table Scans)    Sequential          Read         Any multiple of 8K up to 256K (512K for Enterprise Edition)
Bulk Insert                            Sequential          Write        Any multiple of 8K up to 128K
Note: these values may change as optimizations are made to take advantage of modern storage enhancements
Operation                                                           Random/Sequential   Read/Write   Size Range
CREATE DATABASE                                                     Sequential          Write        512KB (SQL 2000); up to 4MB (SQL 2005); only the log file is initialized in SQL Server 2005
BACKUP                                                              Sequential          Read/Write   Multiple of 64K (up to 4MB)
RESTORE                                                             Sequential          Read/Write   Multiple of 64K (up to 4MB)
DBCC CHECKDB                                                        Sequential          Read         8K - 64K
ALTER INDEX REBUILD (replaces DBREINDEX), read phase                Sequential          Read         Any multiple of 8K up to 256K
ALTER INDEX REBUILD (replaces DBREINDEX), write phase               Sequential          Write        Any multiple of 8K up to 128K
DBCC SHOWCONTIG (deprecated; use sys.dm_db_index_physical_stats)    Sequential          Read         8K - 64K
Note: these values may change as optimizations are made to take advantage of modern storage enhancements
SAN
Better flexibility provided by virtualization of storage
Speed of deployment (once initial configuration is in place)
Online configuration changes
Better overall utilization of storage resources
More features: storage replication/disaster recovery, snapshots/clones via VDI/VSS integration, thin provisioning, etc.
DAS
Simple and well understood
Contrary to some common perceptions, SAN does not necessarily equal better performance
Which queries are issuing the most I/O and are they properly tuned?
Which data files are incurring the most I/O and highest response times?
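One way to answer the first question is from the plan cache statistics; a sketch (illustrative, not from the deck):

```sql
-- Top 10 cached statements by physical reads.
SELECT TOP (10)
       qs.total_physical_reads,
       qs.total_logical_writes,
       qs.execution_count,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset
                     WHEN -1 THEN DATALENGTH(st.text)
                     ELSE qs.statement_end_offset
                   END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM   sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_physical_reads DESC;
```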
SQLIOSim.exe
Use: ensure correct functionality of the underlying I/O subsystem. Simulates various patterns of SQL Server I/O. Replacement for SQLIOStress.exe.
http://blogs.msdn.com/sqlserverstorageengine/archive/2006/10/06/SQLIOSim-available-fordownload.aspx
SQLIO.exe
Use: test throughput of the I/O subsystem or establish a benchmark of I/O subsystem performance.
http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65cb53442d9e19&displaylang=en
IOMeter
Use: test throughput of the I/O subsystem or establish a benchmark. Open source tool; allows combinations of I/O types to run concurrently against a test file. No support for mount point volumes.
http://www.iometer.org/
SQLIO / IOMeter
SQLIO is an unsupported tool provided by Microsoft that can be used for this purpose
IOMeter is an external tool providing ability to stress storage subsystem with a variety of I/O patterns concurrently
Test and validate the performance of each storage configuration before deploying the SQL Server application (a common pitfall)
Benchmark performance and shake out hardware/driver/multipathing problems early in the configuration
Share the results with your vendor; a good method for comparing different configurations
Test a variety of I/O types and sizes
Make sure your test files are significantly larger than the amount of cache on the array
Exception: if you are testing channel throughput, use files that will fit in the array cache
To get a true representation of disk performance, use test files of approximately the same size as the planned data files; small test files (even if larger than cache) may result in smaller seek times and skew results
Test each I/O path individually, then combinations of the I/O paths
Relatively short tests are okay; however, longer runs may give a more complete understanding of how the storage will perform
Allow time between tests for the hardware to reset (cache flush)
Keep all of the benchmark data to refer to after the SQL Server implementation has taken place
Maximum throughput (IOPs or MB/s) has been reached when latency continues to increase while throughput stays near constant
Consult your storage admin or vendor; they should know whether the results are reasonable for the particular storage configuration
Once you reach saturation (i.e. latency increases but throughput does not):
1. Ensure any multipathing is functional and you are not bound by the capacity of a single HBA, switch port, etc.
2. Ensure the queue depth setting on the HBA is set high enough. If too low, it will seem as though the disk is saturated before it actually is (a common pitfall); default queue depth values are generally not ideal for SQL Server, so consider increasing them.
If test results vary wildly, check whether you are sharing spindles with others on the array or whether shared components are an issue
Monitoring at the host and on the array during tests is ideal
RAID Levels
Complex storage configuration deployed in production
Achieves disaster recovery through storage-level replication across ~30km distance, with average 3-5ms latency
Backup/Restore through snapshot/VDI technologies; also used to quickly re-establish replication
Deep queues per LUN; proactive monitoring of response time
Sized for IOPs: sustains 8K random / 18K sequential per LUN
Arrays: CX700, CX3-80, CX-400i, DMX 3 - 4500, DMX 4
Using SRDF for DR; using clone technologies to enable scale-out of reporting as well as backup/restore
Monitoring Strategy
Work closely with SQL developers to optimize storage for I/O
Professional respect: it really is a team effort
A big part of it is learning to speak a common language: hardware folks speak in I/Os, SQL folks talk in SPIDs and queries; we learned over the years to translate
Internal tool used by the Microsoft Information Security team
Collects inbound and outbound e-mail traffic, login events, and Web browsing into a single database, which is then used to provide forensic evidence
Provides analysis and query capabilities
Gathers data from 85+ sources around the world
Up to 10 concurrent users running ad-hoc queries and fixed reports
SSIS, SSRS, and the DW on the same box
Uses table partitioning to load new data into a new partition quickly
Achieved with minimal hardware and operating cost
4-way single-core 2.2 GHz HP ProLiant DL585 G1, x64, with 8 GB RAM
40 TB across two CX700 arrays; currently @ 30TB
Loading/deleting 500GB-1TB per day (60-day retention period)
SQL data, log, and tempdb volumes on RAID 1+0; backup volumes on RAID 5
200GB LUNs backed by 12 spindles
ERP database migrated from an IBM mainframe to HP Integrity rx8640 this year
12 dual-core Itanium CPUs with 192GB RAM as the database server
Database volume around 5TB
6 teamed 4Gb HBAs
Application Server layer:
Workload during the day created by over 1500 users
Workload during the night created by heavy batch activities
Up to 30,000 random IOPS monitored during high-load phases or parallel index creation