Designing and Tuning High Speed Data Loading - Thomas Kejser
Thomas Kejser
Principal Program Manager
tkejser@microsoft.com
Agenda
Tuning Methodology
Bulk Load API Basics
Design Pattern and Techniques
Parallelism
Table Layout
Tuning Methodology
Generate Hypothesis
Agree on targets for optimization
Save Result
Measure
Actual runtime
CPU, Memory, I/O
Change
Task Manager
WinDbg
KernRate
sys.dm_os_latch_stats
Allows deep dive into LATCH_<X> waits
sys.dm_os_spinlock_stats
When too much CPU seems to be spent
sys.dm_io_virtual_file_stats
Because I/O systems are rarely perfect
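These DMVs report totals accumulated since the counters were last reset, so measuring one load run means taking a snapshot before and after and subtracting. A minimal sketch of that subtraction, assuming the snapshots have already been read into dictionaries; the wait times below are hypothetical, not measurements from this deck:

```python
from typing import Dict, List, Tuple


def wait_deltas(before: Dict[str, int], after: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (wait_type, added_wait_ms) pairs, largest increase first."""
    deltas = []
    for wait_type, after_ms in after.items():
        added = after_ms - before.get(wait_type, 0)
        if added > 0:  # skip wait types that did not grow during the run
            deltas.append((wait_type, added))
    deltas.sort(key=lambda pair: pair[1], reverse=True)
    return deltas


# Hypothetical cumulative wait_time_ms values, taken before and after a load.
before = {"PAGELATCH_UP": 1_000, "WRITELOG": 5_000, "CXPACKET": 200}
after = {"PAGELATCH_UP": 9_000, "WRITELOG": 6_500, "CXPACKET": 200}

top_waits = wait_deltas(before, after)
print(top_waits)
```

The wait type at the head of the list is the natural starting point for the next hypothesis.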
BULK INSERT
CSV or fixed width files
BCP
Like BULK INSERT, but can be run remotely
Minimally logged
Only allocations are logged, not individual rows/pages
Minimally logged:
INSERT Heap WITH (TABLOCK) SELECT ...
If TF610 is on:
INSERT ClusterIndex SELECT ...
Destinations in SSIS
Cons
Only CSV and fixed width
Pro
Can perform transformations
Any OLEDB enabled input
Cons
Takes X-locks on table
Linked Servers or OPENROWSET needed
Pro:
Fastest option
Easy to configure
out remote
Con:
Typically slower than SQL Destination
Con:
Must run on same box as
DTEXEC (1) Task: Get, Do Work, Loop
Priority Queue: Pn ... P5, P4, P3, P2, P1
DTEXEC (2) Task: Get, Do Work, Loop
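The pattern on this slide, several DTEXEC processes each pulling the next item from a shared priority queue, can be sketched with threads standing in for the processes. This is an illustration only: the function name `run_workers`, the use of threads, and the rule that a lower number means higher priority are assumptions, not details from the deck.

```python
import queue
import threading


def run_workers(packages, worker_count=2):
    """Drain a shared priority queue; return (worker_id, name) in completion order."""
    work = queue.PriorityQueue()
    for priority, name in packages:
        work.put((priority, name))  # smallest priority value is served first

    done = []
    done_lock = threading.Lock()

    def worker(worker_id):
        while True:  # Loop
            try:
                _priority, name = work.get_nowait()  # Get
            except queue.Empty:
                return  # nothing left to load
            with done_lock:  # Do Work (placeholder for running the load)
                done.append((worker_id, name))

    threads = [threading.Thread(target=worker, args=(i + 1,))
               for i in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done


packages = [(3, "P3"), (1, "P1"), (2, "P2"), (5, "P5"), (4, "P4")]
print(run_workers(packages, worker_count=2))
```

Because each worker asks for new work only when it is idle, a slow item delays one worker rather than the whole batch.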
[Diagram: hash values 0, 1, 2, 3, 4, 5, 6, ... 253, 254, 255]
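The slide shows only the word "hash" and the values 0 through 255. One reading is that each row's key is hashed into one of 256 buckets so that parallel streams work on disjoint sets of rows. A sketch under that assumption; CRC32 and the key format are illustrative choices, not taken from the deck:

```python
import zlib

BUCKETS = 256  # bucket numbers 0 through 255, as on the slide


def bucket_for(key: str) -> int:
    """Map a key to a stable bucket number in the range 0..255."""
    return zlib.crc32(key.encode("utf-8")) % BUCKETS


# Hypothetical keys; count how many land in each bucket.
keys = [f"customer-{i}" for i in range(10_000)]
counts = [0] * BUCKETS
for key in keys:
    counts[bucket_for(key)] += 1

print(min(counts), max(counts))
```

The same key always lands in the same bucket, so a stream that owns a bucket never competes with another stream for those rows.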
[Diagram: Sales table with partitions 2001, 2002, 2003, 2004; tables Sales_New, Sales_Old, Sales_Delta; labels: Updated, Update Records, BULK INSERT, SWITCH (twice)]
[Diagram: Sales table with partitions 2001, 2002, 2003, 2004; 2001 (Filtered); Sales_Temp (2001 Filtered); Sales_Temp (2001); labels: BULK INSERT, SWITCH (twice)]
Engine
ALLOC_FREESPACE_CACHE - Heap limits
Measure:
sys.dm_os_latch_stats
Long waits for ALLOC_FREESPACE_CACHE
SQL Server Books Online:
more speed
[Chart: MB/Sec (0.0 to 250.0) by Concurrent Bulks (0 to 30)]
PAGELATCH_UP
PFS contention
Measure:
sys.dm_os_wait_stats
Hypothesis Generation
I/O problem?
What can we predict?
to the filegroup!
RESOURCE_SEMAPHORE
- Query memory usage
DW load queries will often
around it
SOS_SCHEDULER_YIELD
Hypothesis: Caused by two bulk commands on the same scheduler
Predict:
We should see multiple bulk commands on same scheduler
Observe: And we do
scheduler_id in sys.dm_exec_requests
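The prediction above can be checked mechanically once the rows are in hand. A minimal sketch, assuming the session_id, scheduler_id and command columns have already been fetched from sys.dm_exec_requests into tuples; the rows below are hypothetical:

```python
from collections import defaultdict


def crowded_schedulers(requests):
    """Map scheduler_id -> session_ids where more than one bulk command is running."""
    by_scheduler = defaultdict(list)
    for session_id, scheduler_id, command in requests:
        if command == "BULK INSERT":
            by_scheduler[scheduler_id].append(session_id)
    # Keep only schedulers shared by two or more bulk commands.
    return {sched: ids for sched, ids in by_scheduler.items() if len(ids) > 1}


# Hypothetical rows: (session_id, scheduler_id, command)
requests = [
    (51, 0, "BULK INSERT"),
    (52, 3, "BULK INSERT"),
    (53, 0, "BULK INSERT"),
    (54, 3, "SELECT"),
]

collisions = crowded_schedulers(requests)
print(collisions)
```

An empty result would count against the hypothesis; any entry names the sessions that are sharing a scheduler.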
Fixing SOS_SCHEDULER_YIELD
How can we fix this?
Two ways:
Terminate and reconnect
Soft NUMA
[Diagram: x CPU cores, one Soft-NUMA node per core]
Core 0: Soft-NUMA Node 0, TCP port 1433, BULK INSERT
Core X: Soft-NUMA Node X, TCP port 1433 + X, BULK INSERT
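With one TCP port per soft-NUMA node, the client decides where each bulk stream runs by choosing the port it connects to. A sketch of the port arithmetic from the diagram (1433 + X); the host name `loadserver`, the helper names, and the round-robin assignment of streams to nodes are assumptions for illustration:

```python
BASE_PORT = 1433  # port of Soft-NUMA Node 0 in the diagram


def port_for_stream(stream_index: int, node_count: int) -> int:
    """Spread streams round-robin over the nodes: node X listens on 1433 + X."""
    return BASE_PORT + (stream_index % node_count)


def server_for_stream(host: str, stream_index: int, node_count: int) -> str:
    """Build a 'tcp:host,port' server name for one bulk stream."""
    return f"tcp:{host},{port_for_stream(stream_index, node_count)}"


# Hypothetical setup: 4 soft-NUMA nodes, 6 bulk streams.
for stream in range(6):
    print(stream, server_for_stream("loadserver", stream, node_count=4))
```

With more streams than nodes, streams wrap around and share a node again, so this only isolates streams while the stream count stays at or below the node count.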
double buffering scheme
Important to feed it fast enough
Also, target SQL
[Diagram labels: CSV, OLEDB, 64KB, 64KB, Parse, Table; waits: IMPROV_IOWAIT, ASYNC_NETWORK_IO, PAGEIOLATCH_EX]
Throughput / DOP
INSERT ... SELECT
Measure:
Sometimes throughput drops with higher DOP
Hypothesis:
backpressure in query execution
[Chart: Throughput (MB/sec), 0.0 to 50.0, by DOP, 1 to 46]
Solution:
OPTION (MAXDOP X)
[Chart: Throughput (MB/sec); second axis 0 to 100,000,000]
Typical Cause
Resolution
PAGELATCH_UP
ALLOC_FREESPACE_CACHE
SOS_SCHEDULER_YIELD
RESOURCE_SEMAPHORE
LCK_X
WRITELOG
PAGEIOLATCH_<X>
Tune I/O
IMPROV_IOWAIT
CXPACKET
OLEDB/ASYNC_NETWORK_IO
Optimize source
buffers to 512
Set client Rx buffers to 512 and client Tx buffers to 256
Link speed 1000 Mbps Full Duplex
Measure
Perfmon shows huge
Hypothesis:
This is caused by small
size to 32K
Increases throughput by 15%
                      Time (s)   Krows/s
SSIS 2008             144        2222
SQL MAXDOP = 0        158        2025
SQL MAXDOP = 1 x 32   162        1975
SQL MAXDOP = 1 x 32   246        1301
SSIS 2008             278        1151
SQL MAXDOP = 0        166        1927
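The two columns can be cross-checked with simple arithmetic: time multiplied by rate gives the number of rows loaded. The sketch below uses the six figures from the slide; the conclusion that every run loaded the same data volume is an inference from that arithmetic, not something the slide states.

```python
# (method, time in seconds, thousand rows per second), as listed on the slide
results = [
    ("SSIS 2008", 144, 2222),
    ("SQL MAXDOP = 0", 158, 2025),
    ("SQL MAXDOP = 1 x 32", 162, 1975),
    ("SQL MAXDOP = 1 x 32", 246, 1301),
    ("SSIS 2008", 278, 1151),
    ("SQL MAXDOP = 0", 166, 1927),
]


def implied_rows(time_s: int, krows_per_s: int) -> int:
    """Rows loaded = seconds * (thousand rows per second) * 1000."""
    return time_s * krows_per_s * 1000


for method, time_s, krows_per_s in results:
    print(f"{method:<22}{implied_rows(time_s, krows_per_s):>14,}")
```

Every run comes out close to 320 million rows, which is consistent with all six runs loading the same data set.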
Baseline of Package
Sanity check:
How much memory does each package use?
How much CPU does each package stream use?
Need enough CPU and Memory to run them all
Performance counters:
Process Private Bytes / Working Set (DTEXEC)
Processor % Processor Time
Network interface
Network / Current Bandwidth
Network / Bytes Total/sec
streams
Measured so far up to 96 cores
cores
Just get rid of all bottlenecks
Q&A
2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S.
and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond
to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
APPENDIX
Practices
Managing and Deploying SQL Server Integration Services
SQL Server 2005 Integration Services: A Strategy for Performance
Integration Services: Performance Tuning Techniques
High Impact Data Warehousing with SQL Server Integration Services