Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 27

March 2015

When Good Design Goes Bad

Bob Duffy
Database Architect
Prodata SQL Centre of Excellence
Bob Duffy
• 20 years in database sector, 250+ projects
• Senior Consultant with Microsoft 2005-2008
• One of about 25 MCA for SQL Server globally (aka SQL Ranger)
• SQL MCM on SQL 2005 and 2008
• SQL Server MVP 2009+
• SSAS Maestro
• Database Architect at Prodata SQL Centre of Excellence

• http://blogs.prodata.ie/author/bob.aspx
• bob@Prodata.ie

@bob_duffy
What We Will Cover
Stored Procedures
Clustered Tables
Identity and Primary Keys
Indexes
Fragmentation
Naming Conventions
Partitioning
ORM
1. Stored Procedures
Why use Stored Procedures ?
The Dreaded Search Screen

http://www.sommarskog.se/dyn-search.html
The Chunky v Chatty Debate
Two Types of “Chunkiness”
Data Transferred per Call
Number of Calls
Network latency Important here
Stored Procedure Weigh In

Dynamic Design Patterns


Performance Slow Interpreted TSQL
Security Application Agility
Plan Cache Developer Agility
Maintainability The ORM Debate
Chunky Chatty
2. Clustered vs Heap
Best Practise: Cluster ALL Tables ?
Ever Increasing, Narrow, Unique, Static
Always use an Identity Column
When do Clustered Tables go Bad ?
Harder to Scale, especially for some key choices
Large Table Scan Workloads
Non sequential Clustering Keys Cause Fragmentation
Why is Fragmentation the Achilles Heel of table Scans
More Pages => More IO
Kills Read Ahead and disk performance
Heavy NCI Requirement
Best Practise “Clustered Index are Better for Seeks”
Well it depends on if the seek is on a NCI or not!
Clustered Index v Heap

Most Logging Tables


Ascending Keys Insert and Scan heavy Tables
Range Scans OLTP Transaction Tables (banking)
Lots of Deletes Bulk Loading
Tables with Heavy Primary
Seek
3 Always use Identity for Primary/CX Key
Best Practise
Always Use an Identity Column as Primary Key
Extension: Always add a new Surrogate Key

This may shoot you in the foot on large Fact Tables


The Distributed Database
Choice of Identity will cause a lot of pain!

How common is this Issue?


Very Frequent with replication and new MPP Architectures
The Distributed Write Cache ?
Identity creates a bottleneck on the DB
Serializes new records
What if database offline ?
The Over Zealous Dimensional Modeller
This may go bad if you are not a “single hop” data source
Customer DimCustomer

PK CustomerID PK CustomerKey
DimCustomer
CustomerName CustomerID

CustomerAddress PK CustomerKeyKeyKeyKey
CustomerName

CustomerAddress

CustomerKeyKeyKey

CustomerKeyKey

CustomerKey

CustomerID

CustomerName

CustomerAddress
4. We Don’t Need no Indexes?
Best Practise. Add Indexes..
To Reduce IO on important Queries
Seek rather than scan. SELECT * WHERE CustomerID=2
Narrower Scan. SELECT SUM(Qty)

DateKey Customer RegionKey Sales € Qty Cost

Jan 1 1 1000 100 80

Jan 2 1 1000 100 80

Jan 3 1 1000 100 80

Jan 4 1 1000 100 80

Jan 1 1 1000 100 80

Jan 2 1 1000 100 80

Feb 3 1 1000 100 80

Mar 1 1 1000 100 80

April 1 1 1000 100 80

April 1 1 1000 100 80

May 1 1 1000 100 80

June 1 1 1000 100 80

July 1 1 1000 100 80

Aug 1 1 1000 100 80


When Indexes go bad
OLTP
Small Tables
Larger Results – See “The Tipping Point” by Kimberly Tripp
When upsert is more important then select
When every column Indexed
High Throughput Queueing Design Patterns

DWH
Bad “Tipping Points”
Staging Tables
Tables that we scan
When avoiding bad statistics is very hard
Data Analytics
Where we need guaranteed query performance for varied workloads.
Guaranteed Performance !!?!
We have a 1TB Table. Query SLA is 5 mins… Add indexes?
5 Stop Worrying about Fragmentation
Best Practise – Defragment the hell out of your database

Why could this be bad ?


Takes a long time and may interfere with query performance
Why Could this be not worth the bother ?
More Memory will reduce reliance on contiguous disk blocks
Most SANs only do random IO anyway
Its mainly important if our primary concerns are Scans
6 Naming Conventions
Best Practice – use one!
Goes Bad When prefix is meta data (object type, data type, size)
Naming – Common Sense
Project with following prefix standards on SSIS
DATA Source
Transform Type (LOAD, TRANSFORM, EXTRACT)
Package
Control Flow
Shape
7 Partitioning
Best Practise – Partition when table it too big or too slow

Ordered Queries
Maintenance Operations Serial Queries
Parallel Queries Dynamic Parallel Queries
8 ORMs
Best Practise ? Hotly debated

Good For
Developer Agility
Code First, Database Second
Integrated Debugging
Domain Business Model
Cache Management
Key Management
Portable
Query Plan Nightmares

Source: http://www.scarydba.com/2014/12/19/pretty-plans-vs-performance/
When ORMs go Bad
Can write truly horrible TSQL and Plans
Naïve context
Parameterisation
The Disaster Scenario

Lazy/Eager Loading
Can be used as an excuse of lack of database expertise
Hard to Index for (lots of Select *)
Everything has good and bad Aspects

It Depends ;-)

You might also like