DTS 2000 1.75 Developers SSIS 2005 *Figures are only approximations and should not be referenced or quoted Minimize staging (else use RawFiles if possible) Hardware Infrastructure: Disks, RAM, CPU, Network SQL Infrastructure: File Groups, Indexing, Partitioning Optimize and Stabilize the basics Replace destinations with RowCount Source->RowCount throughput Source->Destination throughput Measure OVAL performance tuning strategy The Three Ss Data Flow Bag of Tricks Tune Lookup patterns Script vs custom transform Parallelize Increase the efficiency of every aspect Sharpen Parallelize, partition, pipeline Share Buy faster, bigger, better hardware Spend But be aware of limitations Row Based (synchronous) Partially Blocking (asynchronous) Blocking (asynchronous) http://msdn.microsoft.com/en-us/library/ms345346.aspx 4 Gb Fiber Channel Destination server runs SQL Server Database EMC CX3-80 Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008 1 Gb Ethernet connections 2 Gb Fiber Channel Source servers run SSIS Source data EMC CX600 Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008 Make: Unisys Model: ES7000/one Enterprise Server OS: Microsoft Windows Server 2008 x64 Datacenter Edition CPU: 32 socket dual core Intel Xeon 3.4 GHz (7140M) RAM: 256 GB HBA: 8 dual port 4Gbit FC NIC: Intel PRO/1000 MT Server Adapter Database: Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4) Storage: EMC Clariion CX3-80 (Qty 1) 11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC Quantity: 4 Make: Unisys Model: ES3220L OS: Windows2008 x64 Enterprise Edition CPU: 2 socket quad core Intel Xeon processors @ 2.0GHz RAM: 4 GB HBA: 1 dual port 4Gbit Emulex FC NIC: Intel PRO1000/PT dual port Database: Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4) Storage: 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC) C1 C1 C1 C1 Orders Table Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Partition 6 Partition 55 Partition 56 Orders_1 Orders_2 Orders_3 Orders_4 Orders_5 Orders_6 Orders_55 Orders_56 . . . S S I S S S I S S S I S S S I S S S I S S S I S S S I S S S I S . . . orders.tbl.1 orders.tbl.2 orders.tbl.3 orders.tbl.4 orders.tbl.5 orders.tbl.6 orders.tbl.55 orders.tbl.56 (Package details removed to protect the innocent) Iterative design, development & testing Follow Microsoft Development Guidelines People & Processes Kimballs ETL and SSIS books are an excellent reference Understand the Business Resource contention, processing windows, SSIS does not forgive bad database design Old principles still apply e.g. load with/without indexes? Get the big picture Will this run on IA64 / X64? No BIDS on IA64 how will I debug? Is OLE-DB driver XXX available on IA64? Memory and resource usage on different platforms Platform considerations Break complex ETL into logically distinct packages (vs monolithic design) Improves development & debug experience Process Modularity Separate sub-processes within package into separate Containers More elegant, easier to develop Simple to disable whole Containers when debugging Package Modularity Use Script Task/Transform for one-off problems Build custom components for maximum re-use Component Modularity Concise naming conventions Conformed blueprint design patterns Presentable layout Annotations Error Logging Configurations Get as close to the data as possible Limit number of columns Filter number of rows Dont be afraid to leverage TSQL Type conversions, null coercing, coalescing, data type sharpening select nullif(name, ) from contacts order by 1 select convert(tinyint, code) from sales Performance Testing & Tuning Connect Output to RowCount transform See Performance Best Practices FastParse for text files BEFORE: select dbo.Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_Dim_Store.Merch_Div_Num= dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1 AFTER: select * from etl.uf_FactStoreSales(@Date) Use the power of TSQL to clean the data 'on the fly' Too many moving parts is inelegant and likely slow But dont be afraid to experiment there are many ways to solve a problem Avoid over- design Allocate enough threads EngineThreads property on DataFlow Task See Performance Talk Maximize Parallelism Synchronous vs. Asynchronous components Memcopy is expensive Minimize blocking For example, minimize data retrieved by LookupTx Minimize ancillary data Full Cache for small lookup datasets No Cache for volatile lookup datasets Partial Cache for large lookup datasets Three Modes of Operation Full Cache is optimal, but uses the most memory, also takes time to load Partial Cache can be expensive since it populates on the fly using singleton SELECTs No Cache uses no memory, but takes longer Tradeoff memory vs. performance Catch is that it requires Sorted inputs See SSIS Performance white paper for more details Can use Merge Join component instead Can written in any .Net language Must be signed, registered and installed but can be widely re-used Quite fiddly for single task Custom components Can be written in VisualBasic.Net or C# Are persisted within a package and have limited reuse Have template methods already created for you Scripts http://sqlcat.com http://technet.microsoft.com/en-us/library/bb961995.aspx http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx
Learn T-SQL From Scratch: An Easy-to-Follow Guide for Designing, Developing, and Deploying Databases in the SQL Server and Writing T-SQL Queries Efficiently