Download as pdf or txt
Download as pdf or txt
You are on page 1of 47

Bob Duffy

Irish SQL Academy 2008. Level 300


DTS 2000
1.75 Developers
SSIS 2005
*Figures are only approximations and should not be referenced or quoted
Minimize staging (else use RawFiles if possible)
Hardware Infrastructure: Disks, RAM, CPU, Network
SQL Infrastructure: File Groups, Indexing, Partitioning
Optimize and
Stabilize the basics
Replace destinations with RowCount
Source->RowCount throughput
Source->Destination throughput
Measure
OVAL performance tuning strategy
The Three Ss
Data Flow Bag of Tricks
Tune
Lookup patterns
Script vs custom transform
Parallelize
Increase the efficiency of
every aspect
Sharpen
Parallelize, partition,
pipeline
Share
Buy faster, bigger, better
hardware
Spend
But be aware of limitations
Row
Based
(synchronous)
Partially
Blocking
(asynchronous)
Blocking
(asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
4 Gb Fiber Channel
Destination
server runs SQL
Server
Database
EMC CX3-80
Destination server:
Unisys ES7000/One
32 sockets each with dual core
Intel 3.4 GHz CPUs
256 GB RAM
Windows Server 2008
SQL Server 2008
1 Gb Ethernet
connections
2 Gb Fiber Channel
Source servers
run SSIS
Source data
EMC CX600
Source servers:
Unisys ES3220L
2 sockets each with 4 core
Intel 2 GHz CPUs
4 GB RAM
Windows Server 2008
SQL Server 2008
Make: Unisys
Model: ES7000/one Enterprise Server
OS: Microsoft Windows Server 2008 x64 Datacenter Edition
CPU: 32 socket dual core Intel Xeon 3.4 GHz (7140M)
RAM: 256 GB
HBA: 8 dual port 4Gbit FC
NIC: Intel PRO/1000 MT Server Adapter
Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4)
Storage: EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC
Quantity: 4
Make: Unisys
Model: ES3220L
OS: Windows2008 x64 Enterprise Edition
CPU: 2 socket quad core Intel Xeon processors @ 2.0GHz
RAM: 4 GB
HBA: 1 dual port 4Gbit Emulex FC
NIC: Intel PRO1000/PT dual port
Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4)
Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
C1
C1
C1
C1
Orders Table
Partition
1
Partition
2
Partition
3
Partition
4
Partition
5
Partition
6
Partition
55
Partition
56
Orders_1 Orders_2 Orders_3 Orders_4 Orders_5 Orders_6 Orders_55 Orders_56
. . .
S
S
I
S
S
S
I
S
S
S
I
S
S
S
I
S
S
S
I
S
S
S
I
S
S
S
I
S
S
S
I
S . . .
orders.tbl.1 orders.tbl.2 orders.tbl.3 orders.tbl.4 orders.tbl.5 orders.tbl.6 orders.tbl.55 orders.tbl.56
(Package details
removed to protect
the innocent)
Iterative design, development & testing
Follow Microsoft
Development Guidelines
People & Processes
Kimballs ETL and SSIS books are an excellent reference
Understand the Business
Resource contention, processing windows,
SSIS does not forgive bad database design
Old principles still apply e.g. load with/without indexes?
Get the big picture
Will this run on IA64 / X64?
No BIDS on IA64 how will I debug?
Is OLE-DB driver XXX available on IA64?
Memory and resource usage on different platforms
Platform considerations
Break complex ETL into logically distinct packages (vs
monolithic design)
Improves development & debug experience
Process
Modularity
Separate sub-processes within package into separate Containers
More elegant, easier to develop
Simple to disable whole Containers when debugging
Package
Modularity
Use Script Task/Transform for one-off problems
Build custom components for maximum re-use
Component
Modularity
Concise naming conventions
Conformed blueprint design patterns
Presentable layout
Annotations
Error Logging
Configurations
Get as close to the data as possible
Limit number of columns
Filter number of rows
Dont be afraid to leverage TSQL
Type conversions, null coercing, coalescing, data type sharpening
select nullif(name, ) from contacts order by 1
select convert(tinyint, code) from sales
Performance Testing & Tuning
Connect Output to RowCount transform
See Performance Best Practices
FastParse for text files
BEFORE:
select
dbo.Tbl_Dim_Store.SK_Store_ID
, Tbl_Dim_Store.Store_Num
,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di
v_ID, 0) as SK_Merch_Div_ID
from dbo.Tbl_Dim_Store
left outer join dbo.Tbl_Dim_Merchant_Division
on dbo.Tbl_Dim_Store.Merch_Div_Num=
dbo.Tbl_Dim_Merchant_Division.Merch_Div_N
um
where Current_Row = 1
AFTER:
select * from etl.uf_FactStoreSales(@Date)
Use the power of TSQL to clean the data 'on the fly'
Too many moving parts is inelegant and likely slow
But dont be afraid to experiment there are many ways to
solve a problem
Avoid over-
design
Allocate enough threads
EngineThreads property on DataFlow Task
See Performance Talk
Maximize
Parallelism
Synchronous vs. Asynchronous components
Memcopy is expensive
Minimize
blocking
For example, minimize data retrieved by LookupTx
Minimize
ancillary data
Full Cache for small lookup datasets
No Cache for volatile lookup datasets
Partial Cache for large lookup datasets
Three Modes of
Operation
Full Cache is optimal, but uses the most memory, also takes time to
load
Partial Cache can be expensive since it populates on the fly using
singleton SELECTs
No Cache uses no memory, but takes longer
Tradeoff
memory vs.
performance
Catch is that it requires Sorted inputs
See SSIS Performance white paper for more details
Can use Merge
Join component
instead
Can written in any .Net language
Must be signed, registered and installed but
can be widely re-used
Quite fiddly for single task
Custom
components
Can be written in VisualBasic.Net or C#
Are persisted within a package and have
limited reuse
Have template methods already created for you
Scripts
http://sqlcat.com
http://technet.microsoft.com/en-us/library/bb961995.aspx
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

You might also like