SQL Tips Techniques
Table of Contents
SQL Tips and Techniques
- Using CASE statement
- Random Sampling
- Dynamic SQL
- Join and Aggregate Index
- Timestamp Applications
- Performance Reminders
- Summary
SELECT first_name
      ,last_name
      ,CASE WHEN manager_employee_number = 1019 THEN 'employee'
            WHEN employee_number = 1019 THEN 'manager'
            ELSE NULL
       END
FROM employee
WHERE employee_number = 1019
   OR manager_employee_number = 1019;

first_name   last_name   <CASE expression>
----------   ---------   -----------------
Carol        Kanieski    employee
Ron          Kubic       manager
John         Stein       employee
- The total estimated time is 0.15 seconds, versus 0.31 seconds for the equivalent UNION query.
- Using CASE requires only a single table scan.
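To see why the two formulations are interchangeable, here is a minimal sketch in Python using the standard-library sqlite3 module rather than Teradata. The employee numbers 2001 to 2003, the extra row, and the UNION formulation itself are assumptions made for illustration; the source shows only the CASE version. The sketch checks that both queries return the same rows. It does not reproduce the timing figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    " employee_number INTEGER, manager_employee_number INTEGER,"
    " first_name TEXT, last_name TEXT)"
)
# Hypothetical rows: only the names and the number 1019 come from the slide.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (2001, 1019, "Carol", "Kanieski"),
        (1019, 801, "Ron", "Kubic"),
        (2002, 1019, "John", "Stein"),
        (2003, 2001, "Other", "Person"),  # unrelated row, filtered out
    ],
)

# One pass over the table: CASE labels each qualifying row.
case_sql = """
SELECT first_name, last_name,
       CASE WHEN manager_employee_number = 1019 THEN 'employee'
            WHEN employee_number = 1019 THEN 'manager'
            ELSE NULL END
FROM employee
WHERE employee_number = 1019 OR manager_employee_number = 1019
"""

# Assumed UNION alternative: two SELECTs, so the table is read twice.
union_sql = """
SELECT first_name, last_name, 'employee'
FROM employee WHERE manager_employee_number = 1019
UNION
SELECT first_name, last_name, 'manager'
FROM employee WHERE employee_number = 1019
"""

case_rows = sorted(conn.execute(case_sql).fetchall())
union_rows = sorted(conn.execute(union_sql).fetchall())
print(case_rows == union_rows)
```

The CASE query names the table once, while the UNION query names it in two separate SELECTs, which is the structural difference behind the single-scan remark.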
- Useful for business purposes
- Teradata System Calendar provides day of week as a numeric value
- Requires a join to the System Calendar
SELECT dw.char_day "Day of// Week"
      ,SUM(ds.sales) AS Sales
FROM daily_sales ds
    ,sys_calendar.calendar sc
    ,day_of_week dw
WHERE sc.calendar_date = ds.salesdate
  AND sc.day_of_week = dw.numeric_day
GROUP BY 1, dw.numeric_day
ORDER BY dw.numeric_day;

Day of Week   Sales
-----------   -------
Sunday        2950.00
Monday        2200.00
Tuesday       2000.00
Wednesday     2100.00
Thursday      2000.00
Friday        2450.00
Saturday      3250.00
- Requires joining three tables using two join conditions
- Day of Week table has only seven rows
RANDOM Function
The RANDOM function may be used to generate a random number within a specified range. RANDOM (Lower limit, Upper limit) returns a random number between the lower and upper limits, inclusive. Both limits must be specified.

Example: Assign a random number between 1 and 9 to each department.

SELECT department_number, RANDOM(1,9)
FROM department;

department_number   Random(1,9)
-----------------   -----------
501                 2
301                 6
201                 3
600                 7
100                 3
402                 2
403                 1
302                 5
401                 1
Note that random numbers can repeat. The RANDOM function is evaluated for each row processed, so duplicate random values are possible.
Note that no duplicates were generated in this run. Duplicates become less likely, though still possible, when the pool of possible values is over ten times the number of rows to be assigned.
With only three values to distribute over nine rows, duplicates are necessary.
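The effect of the range on duplicates can be sketched in Python. This is an illustration only, not Teradata's generator: `random.randint` stands in for RANDOM (both are inclusive at each end), and the helper names `assign_random` and `has_duplicates` are hypothetical. The department numbers are the nine from the earlier example.

```python
import random

def assign_random(rows, lower, upper, seed=None):
    """Give every row its own draw from lower..upper inclusive."""
    rng = random.Random(seed)
    return {row: rng.randint(lower, upper) for row in rows}

def has_duplicates(assignment):
    """True when at least two rows received the same value."""
    values = list(assignment.values())
    return len(set(values)) < len(values)

departments = [501, 301, 201, 600, 100, 402, 403, 302, 401]

# Nine rows but only three possible values: repeats cannot be avoided.
narrow = assign_random(departments, 1, 3, seed=0)
print(has_duplicates(narrow))

# A wider range makes repeats less likely, but never impossible.
wide = assign_random(departments, 1, 100, seed=0)
print(has_duplicates(wide))
```

With nine rows and three values, every run of the narrow case contains duplicates regardless of the seed; the wide case depends on the draw.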
RANDOM Sampling
Consider the following distribution of employee salaries.
Salary Range       Count
---------------    -----
$0 to < $30K       6
$30K to < $40K     9
$40K to < $50K     4
$50K +             7
Problem: Select a sample representing two thirds of the employees making under $30,000. Use the RANDOM function to accomplish this.

SELECT employee_number
      ,salary_amount
FROM employee
WHERE (salary_amount < 30000 AND RANDOM(1,3) < 3);

employee_number
---------------
1006
1023
1013
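RANDOM is evaluated separately for each row, so each qualifying row has a two-in-three chance of being selected, but the size of the sample varies from run to run. In this run we end up with a 50% sample (3 out of 6) instead of a 67% sample (4 out of 6).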
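How often the RANDOM approach hits the target can be worked out directly. The sketch below assumes each row is kept independently with probability 2/3 (RANDOM(1,3) returns 1 or 2), which makes the sample size binomial; the function name `sample_size_distribution` is hypothetical.

```python
from math import comb

def sample_size_distribution(n_rows, p_keep):
    """Probability of keeping exactly k of n_rows, for k = 0..n_rows."""
    return [
        comb(n_rows, k) * p_keep**k * (1 - p_keep) ** (n_rows - k)
        for k in range(n_rows + 1)
    ]

# Six employees under $30,000, each kept with probability 2/3.
dist = sample_size_distribution(6, 2 / 3)
for k, prob in enumerate(dist):
    print(f"{k} rows: {prob:.3f}")
```

Under these assumptions the intended 4-row sample occurs about 33% of the time (240/729), and the 3-row outcome shown above occurs about 22% of the time (160/729).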
SELECT employee_number
      ,salary_amount
FROM employee
WHERE salary_amount < 30000
SAMPLE .67;

employee_number   salary_amount
---------------   -------------
1006              29450.00
1023              26500.00
1008              29250.00
1014              24500.00
SELECT employee_number, salary_amount
FROM employee
WHERE (salary_amount < 30000 AND RANDOM(1,3) < 3)
   OR (salary_amount BETWEEN 30001 AND 40000 AND RANDOM(1,3) < 3)
   OR (salary_amount BETWEEN 40001 AND 50000 AND RANDOM(1,3) < 3)
ORDER BY 2;

employee_number   salary_amount
---------------   -------------
1014              24500.00
1001              25525.00
1023              26500.00
1009              31000.00
1005              31200.00
1004              36300.00
1003              37850.00
1021              38750.00
1020              39500.00
1002              43100.00
1024              43700.00
1010              46000.00
1007              49700.00
The result shows the following distribution:
- Under $30,000: 3 out of 6 (50%)
- Between $30,000 and $39,999: 6 out of 9 (67%)
- Between $40,000 and $49,999: 4 out of 4 (100%)
The larger the pool of rows relative to the RANDOM range, the more closely the sample tends to match the target percentage.
Static SQL
- pre-constructed SQL compiled into the stored procedure.
- may be parameterized.
- still optimized prior to each execution.
Restrictions
- The creating user must also be the owner of the procedure in order to have the right to use dynamic SQL.
- The size of the SQL command string cannot exceed 32000.
- Multi-statement requests are not supported.
- The ending semi-colon is optional on the SQL command.

The following SQL statements cannot be used as dynamic SQL in stored procedures:
- CALL
- CREATE PROCEDURE
- DATABASE
- EXPLAIN
- HELP
- REPLACE PROCEDURE
- SHOW
- SELECT
- SELECT INTO
- SET SESSION ACCOUNT
- SET SESSION COLLATION
- SET SESSION DATEFORM
- SET TIME ZONE
Join Indexes
A Join Index is an optional index which may be created by the user for one of the following three purposes:
- Pre-join multiple tables (Multi-table Join Index)
- Distribute the rows of a single table on the hash value of a foreign key value (Single-table Join Index)
- Aggregate one or more columns of a single table or multiple tables into a summary table (Aggregate Join Index)

How the optimizer uses a Join Index:
- If possible, the optimizer will use a Join Index rather than access tables directly.
- This typically results in much better performance.
- Join Indexes are automatically updated as the table rows are updated.
- A Join Index may not be accessed directly.
- It is an option which the optimizer may choose if the index covers the query.
49 valid customers have orders. 1 valid customer has no orders. 1 order has an invalid customer.
[Figure: CUSTOMERS and ORDERS tables, with 49 customers matched to orders]
- A join index will not help this query.
- The orders table alone covers the query.
- A join index can help this query.
- Two tables are needed to cover the query.
- Query cost: .39 secs
Fixed Portion

CUST_ID   CUST_NAME
-------   ---------
1001      ABC Corp
1002      BCD Corp

Variable Portion

ORDER_ID   ORDER_STATUS   ORDER_DATE
--------   ------------   ----------
501        C              990120
502        C              990220
503        C              990320
504        C              990420
505        C              990520
506        C              990620
507        C              990122
508        C              990222
509        C              990322
- Same SQL query
- Optimizer picks Join Index rather than doing a join
- Join Index covers query
- Without Join Index: .39 secs
- With Join Index: .17 secs
- Without Join Index: .40 secs
- With Join Index: .17 secs
- All referenced columns are part of the join index
- Join Index covers query
- Optimizer picks Join Index
- Without Join Index: .23 secs
- With Join Index: .15 secs
Traditional Aggregation
SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales
WHERE itemid = 10
  AND Yr IN (1997, 1998)
GROUP BY 1,2
ORDER BY 1,2;

Yr     Mon   Sum(sales)
----   ---   ----------
1997   1     2150.00
1997   2     1950.00
1997   8     1950.00
1997   9     2100.00
1998   1     1950.00
1998   2     2100.00
1998   8     2200.00
1998   9     2550.00

Explanation
-----------
1) First, we do a SUM step to aggregate from PED1.daily_sales by way of
   the primary index "PED1.daily_sales.itemid = 10" with a residual
   condition of ("((EXTRACT(YEAR FROM (PED1.daily_sales.salesdate )))= 1997)
   OR ((EXTRACT(YEAR FROM (PED1.daily_sales.salesdate )))= 1998)"), and the
   grouping identifier in field 1. Aggregate Intermediate Results are
   computed locally, then placed in Spool 2. The size of Spool 2 is
   estimated with high confidence to be 1 to 1 rows.
CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
      ,EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales) AS SumSales
FROM daily_sales
GROUP BY 1,2,3;
Explanation
-----------
1) First, we do a SUM step to aggregate from join index table
   PED1.monthly_sales by way of the primary index
   "PED1.monthly_sales.Item = 10", and the grouping identifier in field 1.
   Aggregate Intermediate Results are computed locally, then placed in
   Spool 2. The size of Spool 2 is estimated with low confidence to be
   4 to 4 rows.
ANSI Timestamp
Timestamp combines date and time into a single column.
- TIMESTAMP(n), where n = 0 to 6
- Consists of 6 fields of information: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND
- Internal format is DATE (4 bytes) + TIME (6 bytes) = 10 bytes

Type           Timestamp representation      Character conversion
------------   --------------------------   --------------------
TIMESTAMP(0)   2001-12-07 11:37:58          CHAR(19)
TIMESTAMP(6)   2001-12-07 11:37:58.213000   CHAR(26)
CREATE TABLE tblb (tmstampb TIMESTAMP);
INSERT INTO tblb (CURRENT_TIMESTAMP);
SELECT * FROM tblb;

tmstampb
--------------------------
2001-11-06 13:48:38.580000
Timestamp + Interval
A timestamp may be combined with any year-month or day-time interval to produce a new timestamp:

TIMESTAMP + interval = TIMESTAMP

where the interval is one of: YEAR, YEAR TO MONTH, MONTH, DAY, DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, HOUR, HOUR TO MINUTE, MINUTE, MINUTE TO SECOND, SECOND.
Subtract 2 yrs and 6 mos from the designated timestamp:

SELECT TIMESTAMP '1999-10-01 09:30:22' - INTERVAL '2-06' YEAR TO MONTH;

1997-04-01 09:30:22

Subtract 1 hr, 20 mins and 10 secs from the designated timestamp:

SELECT TIMESTAMP '1999-10-01 09:30:22' - INTERVAL '01:20:10' HOUR TO SECOND;

1999-10-01 08:10:12
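The same two subtractions can be checked outside the database. This is a minimal sketch in Python's standard datetime module, not Teradata syntax. The helper `subtract_year_month` is hypothetical and is needed because `timedelta` has no month unit; it assumes the day of month exists in the target month.

```python
from datetime import datetime, timedelta

def subtract_year_month(ts, years, months):
    """Move ts back by a year-month interval, keeping day and time.

    Raises ValueError if the day does not exist in the target month
    (for example the 31st moved into a 30-day month).
    """
    total = ts.year * 12 + (ts.month - 1) - (years * 12 + months)
    year, month_index = divmod(total, 12)
    return ts.replace(year=year, month=month_index + 1)

ts = datetime(1999, 10, 1, 9, 30, 22)

# INTERVAL '2-06' YEAR TO MONTH
minus_ym = subtract_year_month(ts, 2, 6)

# INTERVAL '01:20:10' HOUR TO SECOND
minus_hms = ts - timedelta(hours=1, minutes=20, seconds=10)

print(minus_ym.isoformat(sep=" "))
print(minus_hms.isoformat(sep=" "))
```

Both results agree with the answers shown above: 1997-04-01 09:30:22 and 1999-10-01 08:10:12.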
Timestamp Subtraction
TIMESTAMP - TIMESTAMP = interval

where the interval is one of: YEAR, YEAR TO MONTH, MONTH, DAY, DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, HOUR, HOUR TO MINUTE, MINUTE, MINUTE TO SECOND, SECOND.

Given the following two timestamps, calculate the difference between them as directed.

In months?

SELECT (TIMESTAMP '1999-10-20 10:25:40' - TIMESTAMP '1998-09-19 08:20:00') MONTH;

13
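The month count can be reproduced with a short Python sketch. The function `whole_months_between` is hypothetical, and it assumes that only completed months are counted; the source does not state how Teradata treats a partial month, so treat that rule as an assumption rather than documented behaviour.

```python
from datetime import datetime

def whole_months_between(later, earlier):
    """Count completed months from earlier to later (assumed rule)."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # If later has not yet reached earlier's day and time within the
    # month, the final month is incomplete and is not counted.
    if (later.day, later.time()) < (earlier.day, earlier.time()):
        months -= 1
    return months

later = datetime(1999, 10, 20, 10, 25, 40)
earlier = datetime(1998, 9, 19, 8, 20, 0)
months = whole_months_between(later, earlier)
print(months)
```

For these two timestamps the year difference contributes 12 months and the month difference contributes 1, giving 13, which matches the answer above.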
Comparing Intervals
Show the serial number and the number of days required for each TV that took longer than 2 days to repair.

SELECT serial_number
      ,(end_time - start_time) DAY TO MINUTE AS #_DaysHrsMns
FROM Repair_time
WHERE #_DaysHrsMns > INTERVAL '02 00:00' DAY TO MINUTE;
Incorrect Answer
SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt)
    ,(SELECT serial_number
            ,(end_time - start_time) DAY AS Num_Days
      FROM Repair_time
      WHERE Num_Days > INTERVAL '02 00:00' DAY TO MINUTE
     ) AS temp2(serial, Number_days)
GROUP BY cnt;

((100*Count(serial))/cnt)
-------------------------
Correct Answer: 78%
Performance Reminders
- Consider using CASE when testing against a small set of values
- Use appropriate sampling functions: RANDOM or SAMPLE
- Use Dynamic SQL with Stored Procedures
- Join indexes can help query performance by pre-joining tables
- Aggregate indexes are preferable to aggregated views or tables
- Use TIMESTAMP and INTERVALS for time-related processing
Summary
- SQL is a very versatile language
- Usually, if there's a will, there's a way
- Often there are several ways to write a query
- Find the one that performs best, using EXPLAIN