SQL Tips and Techniques

Paul Derouin, Learning Consultant, Teradata Learning

Table of Contents
- SQL Tips and Techniques
- Using CASE Statement
- Random Sampling
- Dynamic SQL
- Join and Aggregate Index
- Timestamp Applications
- Performance Reminders
- Summary

Using Union For Set Tagging


Show the name of manager 1019 and the names of his direct reports.
SELECT first_name
      ,last_name
      ,' employee ' AS "Employee//Type"
FROM employee
WHERE manager_employee_number = 1019
UNION
SELECT first_name
      ,last_name
      ,' manager '
FROM employee
WHERE employee_number = 1019
ORDER BY 2;

first_name  last_name  Employee Type
----------  ---------  -------------
Carol       Kanieski   employee
Ron         Kubic      manager
John        Stein      employee

Using Union For Set Tagging (Cont.)


EXPLAIN
SELECT first_name
      ,last_name
      ,' employee ' AS "Employee//Type"
FROM employee
WHERE manager_employee_number = 1019
UNION
SELECT first_name
      ,last_name
      ,' manager '
FROM employee
WHERE employee_number = 1019
ORDER BY 2;

3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows scan with a condition of ( "CUSTOMER_SERVICE.employee.manager_employee_number = 1019") into Spool 1, which is redistributed by hash code to all AMPs. The size of Spool 1 is estimated with no confidence to be 3 rows. The estimated time for this step is 0.16 seconds.
4) We do a single-AMP RETRIEVE step from CUSTOMER_SERVICE.employee by way of the unique primary index "CUSTOMER_SERVICE.employee.employee_number = 1019" with no residual conditions into Spool 1, which is redistributed by hash code to all AMPs. Then we do a SORT to order Spool 1 by the sort key in spool field1 eliminating duplicate rows. The size of Spool 1 is estimated with high confidence to be 2 to 26 rows. The estimated time for this step is 0.15 seconds.

The total estimated time is 0.31 seconds.

Using CASE For Set Tagging


Show the name of manager 1019 and the names of his direct reports.

SELECT first_name
      ,last_name
      ,CASE WHEN manager_employee_number = 1019 THEN 'employee'
            WHEN employee_number = 1019 THEN 'manager'
            ELSE NULL
       END
FROM employee
WHERE employee_number = 1019
   OR manager_employee_number = 1019;

first_name  last_name  <CASE expression>
----------  ---------  -----------------
Carol       Kanieski   employee
Ron         Kubic      manager
John        Stein      employee

Using CASE For Set Tagging (Cont.)


EXPLAIN
SELECT first_name
      ,last_name
      ,CASE WHEN manager_employee_number = 1019 THEN 'employee'
            WHEN employee_number = 1019 THEN 'manager'
            ELSE NULL
       END
FROM employee
WHERE employee_number = 1019
   OR manager_employee_number = 1019;

3) We do an all-AMPs RETRIEVE step from CUSTOMER_SERVICE.employee by way of an all-rows scan with a condition of ( "(CUSTOMER_SERVICE.employee.employee_number = 1019) OR (CUSTOMER_SERVICE.employee.manager_employee_number = 1019)") into Spool 1, which is built locally on the AMPs. The size of Spool 1 is estimated with no confidence to be 4 rows. The estimated time for this step is 0.15 seconds.

The total estimated time is 0.15 seconds, versus 0.31 seconds for the UNION. The CASE version requires only a single table scan.
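A possible extension, not shown on the slides: because the tag is an ordinary CASE expression, the same single-scan query can also control the row order, here listing the manager ahead of the direct reports. The alias emp_type is an illustrative name, and this is a sketch against the employee table above rather than a tested query.

```sql
/* Sketch: tag each row and sort the manager ahead of the direct reports */
SELECT first_name
      ,last_name
      ,CASE WHEN employee_number = 1019 THEN 'manager'
            ELSE 'employee'          /* remaining rows report to 1019 */
       END AS emp_type
FROM employee
WHERE employee_number = 1019
   OR manager_employee_number = 1019
ORDER BY CASE WHEN employee_number = 1019 THEN 1 ELSE 2 END
        ,last_name;
```

The WHERE clause already limits the rows to the manager and the direct reports, so a two-way CASE is enough; the scan count is the same as in the CASE query above.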

Reporting By Day of Week


Show the sales figures by day of week as seen below.
Day of Week  Sales
-----------  -------
Sunday       2950.00
Monday       2200.00
Tuesday      2000.00
Wednesday    2100.00
Thursday     2000.00
Friday       2450.00
Saturday     3250.00

- Useful for business purposes
- The Teradata System Calendar provides the day of week as a number
- Requires a join to the System Calendar

Creating a Day of Week Table


CREATE TABLE day_of_week
 (numeric_day BYTEINT
 ,char_day    CHAR(9))
UNIQUE PRIMARY INDEX (numeric_day);

INSERT INTO day_of_week VALUES (1, 'Sunday');
INSERT INTO day_of_week VALUES (2, 'Monday');
INSERT INTO day_of_week VALUES (3, 'Tuesday');
INSERT INTO day_of_week VALUES (4, 'Wednesday');
INSERT INTO day_of_week VALUES (5, 'Thursday');
INSERT INTO day_of_week VALUES (6, 'Friday');
INSERT INTO day_of_week VALUES (7, 'Saturday');

Using a Day Of Week Table


Show the sales figures by day of week.

SELECT dw.char_day "Day of// Week"
      ,SUM(ds.sales) AS Sales
FROM daily_sales ds
    ,sys_calendar.calendar sc
    ,day_of_week dw
WHERE sc.calendar_date = ds.salesdate
AND   sc.day_of_week = dw.numeric_day
GROUP BY 1, dw.numeric_day
ORDER BY dw.numeric_day;

Day of Week  Sales
-----------  -------
Sunday       2950.00
Monday       2200.00
Tuesday      2000.00
Wednesday    2100.00
Thursday     2000.00
Friday       2450.00
Saturday     3250.00

- Requires joining three tables using two join conditions
- The Day of Week table has only seven rows

The total estimated cost of this query is approximately 0.47 seconds.

Using CASE Statement


SELECT CASE sc.day_of_week
         WHEN 1 THEN 'Sunday'
         WHEN 2 THEN 'Monday'
         WHEN 3 THEN 'Tuesday'
         WHEN 4 THEN 'Wednesday'
         WHEN 5 THEN 'Thursday'
         WHEN 6 THEN 'Friday'
         WHEN 7 THEN 'Saturday'
         ELSE 'Not Found'
       END AS "Day of// Week"
      ,SUM(ds.sales) AS Sales
FROM daily_sales ds
    ,sys_calendar.calendar sc
WHERE sc.calendar_date = ds.salesdate
GROUP BY 1, sc.day_of_week
ORDER BY sc.day_of_week;

Same result:

Day of Week  Sales
-----------  -------
Sunday       2950.00
Monday       2200.00
Tuesday      2000.00
Wednesday    2100.00
Thursday     2000.00
Friday       2450.00
Saturday     3250.00

Requires joining only two tables using one join condition.

The total estimated cost of this query is approximately 0.35 seconds.

RANDOM Function
The RANDOM function may be used to generate a random integer within a specified range. RANDOM(lower_limit, upper_limit) returns a random integer between the lower and upper limits, inclusive. Both limits must be specified.

Example: Assign a random number between 1 and 9 to each department.

SELECT department_number
      ,RANDOM(1,9)
FROM department;

department_number  Random(1,9)
-----------------  -----------
501                2
301                6
201                3
600                7
100                3
402                2
403                1
302                5
401                1

Note that random numbers can repeat. RANDOM is evaluated once for each row processed, so duplicate values are possible.

Duplicate RANDOM Values


The likelihood of duplicate values can be reduced by increasing the size of the RANDOM interval relative to the size of the table.

Example: Assign a random number between 1 and 100 to each department.

SELECT department_number
      ,RANDOM(1,100)
FROM department;

department_number  Random(1,100)
-----------------  -------------
501                15
301                19
201                71
600                75
100                61
402                41
403                81
302                31
401                59

No duplicates were generated in this run. With a pool of possible values more than ten times the number of rows, duplicates become less likely, although they are still possible.
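A rough way to quantify this, assuming (the slides do not say so) that RANDOM(1, m) draws each of its m values independently and with equal probability: the chance that n rows all receive different values is the classic birthday-problem product.

```latex
P(\text{no duplicates}) = \prod_{k=0}^{n-1} \frac{m-k}{m} = \frac{m!}{(m-n)!\, m^{n}}

% n = 9 departments, RANDOM(1,100): m = 100
P = \frac{100 \cdot 99 \cdot 98 \cdots 92}{100^{9}} \approx 0.69

% n = 9 departments, RANDOM(1,9): m = 9
P = \frac{9!}{9^{9}} = \frac{362880}{387420489} \approx 0.00094
```

So even with RANDOM(1,100), roughly 31% of runs over nine rows would still contain at least one duplicate, while with RANDOM(1,9) a duplicate-free run would be very rare.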

Duplicate RANDOM Values (cont'd)


The likelihood of duplicate values increases as the RANDOM interval shrinks relative to the size of the table.

Example: Assign a random number between 1 and 3 to each department.

SELECT department_number
      ,RANDOM(1,3)
FROM department;

department_number  Random(1,3)
-----------------  -----------
501                2
301                3
201                3
600                1
100                3
402                2
403                1
302                2
401                1

With only three values to distribute over nine rows, duplicates are unavoidable.

RANDOM Sampling
Consider the following distribution of employee salaries.
Salary Range      Count
----------------  -----
$0 to < $30K          6
$30K to < $40K        9
$40K to < $50K        4
$50K +                7

Problem: Select a sample representing two thirds of the employees making under $30,000. Use the RANDOM function to accomplish this.

SELECT employee_number
      ,salary_amount
FROM employee
WHERE (salary_amount < 30000 AND RANDOM(1,3) < 3);

employee_number  salary_amount
---------------  -------------
1006             29450.00
1023             26500.00
1013             24500.00

Because each row is kept or rejected by its own random draw, the number of rows returned varies from run to run. This run produced a 50% sample (3 out of 6) instead of a 67% sample (4 out of 6).
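Under the same assumption of independent, equally likely draws, the number of the 6 qualifying employees kept by RANDOM(1,3) < 3 follows a binomial distribution with p = 2/3. This shows how often a run lands exactly on the 4-row target:

```latex
X \sim \mathrm{Binomial}\left(n = 6,\ p = \tfrac{2}{3}\right), \qquad E[X] = np = 4

P(X = 4) = \binom{6}{4}\left(\tfrac{2}{3}\right)^{4}\left(\tfrac{1}{3}\right)^{2} = \frac{15 \cdot 16}{729} = \frac{240}{729} \approx 0.33

P(X = 3) = \binom{6}{3}\left(\tfrac{2}{3}\right)^{3}\left(\tfrac{1}{3}\right)^{3} = \frac{20 \cdot 8}{729} = \frac{160}{729} \approx 0.22
```

Only about one run in three returns exactly 4 rows, and a 3-row result like the one above occurs in about one run in five.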

Using The SAMPLE Function


The SAMPLE function can also draw a sample from a single group, and does so more accurately: it returns the requested fraction of the qualifying rows instead of deciding each row independently.

Solution 2:

SELECT employee_number
      ,salary_amount
FROM employee
WHERE salary_amount < 30000
SAMPLE .67;

employee_number  salary_amount
---------------  -------------
1006             29450.00
1023             26500.00
1008             29250.00
1014             24500.00

4 out of 6 employees represents a 67% sample.

SAMPLE Function For Multiple Samples


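The SAMPLE function:
- Permits either a fraction (percentage) or a row count specification
- Does not reuse rows: a row placed in one sample set is not available to subsequent sample sets

SELECT department_number
      ,SAMPLEID
FROM department
SAMPLE .25, .25, .50
ORDER BY SAMPLEID;

department_number  SampleId
-----------------  --------
301                1
403                1
302                2
401                2
100                3
402                3
201                3
600                3
501                3

SELECT department_number
      ,SAMPLEID
FROM department
SAMPLE 3, 5, 8
ORDER BY SAMPLEID;

department_number  SampleId
-----------------  --------
301                1
403                1
302                1
401                2
100                2
402                2
201                2
501                2
600                3

Because the table has only nine rows, the third sample, which requested 8 rows, receives only the one row left over.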

Complex RANDOM Sampling


The RANDOM function can be used multiple times in the same SELECT statement. This makes it possible to produce multiple samples, each using separate criteria.

Example: Create a sample consisting of approximately 67% of each salary range under $50,000.

SELECT employee_number, salary_amount
FROM employee
/* Half-open ranges match the salary bands and leave no gaps between them */
WHERE (salary_amount < 30000 AND RANDOM(1,3) < 3)
   OR (salary_amount >= 30000 AND salary_amount < 40000 AND RANDOM(1,3) < 3)
   OR (salary_amount >= 40000 AND salary_amount < 50000 AND RANDOM(1,3) < 3)
ORDER BY 2;

employee_number  salary_amount
---------------  -------------
1014             24500.00
1001             25525.00
1023             26500.00
1009             31000.00
1005             31200.00
1004             36300.00
1003             37850.00
1021             38750.00
1020             39500.00
1002             43100.00
1024             43700.00
1010             46000.00
1007             49700.00

The result shows the following distribution:
- Under $30,000: 3 out of 6 (50%)
- Between $30,000 and $39,999: 6 out of 9 (67%)
- Between $40,000 and $49,999: 4 out of 4 (100%)
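If your Teradata release supports stratified sampling (the SAMPLE WHEN ... END form; check the release documentation before relying on it), each salary band can be given its own fraction, so the per-band sample size is computed rather than left to chance. A sketch reusing the employee table and the bands above:

```sql
/* Sketch: stratified sample, roughly 67% of each band under $50,000 */
SELECT employee_number
      ,salary_amount
FROM employee
SAMPLE WHEN salary_amount < 30000 THEN .67
       WHEN salary_amount >= 30000 AND salary_amount < 40000 THEN .67
       WHEN salary_amount >= 40000 AND salary_amount < 50000 THEN .67
       END
ORDER BY 2;
```

Because no ELSE fraction is given, rows matching none of the WHEN conditions (salaries of $50,000 or more) are expected to be left out of the sample.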

Complex RANDOM Sampling (cont'd)


The RANDOM range can also be widened. RANDOM(1,100) < 68 selects each row with probability 67/100, almost exactly the 2/3 probability of RANDOM(1,3) < 3, yet the sample returned below is quite different, because each execution makes new random draws.

Example: Perform the same query, but change the RANDOM range to 1 through 100.

SELECT employee_number, salary_amount
FROM employee
WHERE (salary_amount < 30000 AND RANDOM(1,100) < 68)
   OR (salary_amount >= 30000 AND salary_amount < 40000 AND RANDOM(1,100) < 68)
   OR (salary_amount >= 40000 AND salary_amount < 50000 AND RANDOM(1,100) < 68)
ORDER BY 2;

employee_number  salary_amount
---------------  -------------
1013             24500.00
1023             26500.00
1005             31200.00
1022             32300.00
1004             36300.00
1003             37850.00
1007             49700.00

This result shows the following distribution:
- Under $30,000: 2 out of 6 (33%)
- Between $30,000 and $39,999: 4 out of 9 (44%)
- Between $40,000 and $49,999: 1 out of 4 (25%)

Sample Sizing Issues


The larger the pool of rows to be drawn from, the closer one can get to achieving a specific percentage of rows in the sample.
SEL COUNT(*) FROM agent_sales
WHERE (sales_amt BETWEEN 20000 AND 39999);

Returns exactly 100 rows.

Each of the following examples attempts to return a 50% sample of the target rows.

SEL COUNT(*) FROM agent_sales
WHERE (sales_amt BETWEEN 20000 AND 39999)
AND RANDOM(1,100) < 51;

Returns 58 rows, or 58%.

SELECT COUNT(*) FROM agent_sales
WHERE (sales_amt BETWEEN 20000 AND 39999)
AND RANDOM(1,10) < 6;

Returns 53 rows, or 53%.

SELECT COUNT(*) FROM agent_sales
WHERE (sales_amt BETWEEN 20000 AND 39999)
AND RANDOM(1,4) < 3;

Returns 50 rows, or 50%.

All three predicates select a row with probability exactly 50% (50 of 100 values, 5 of 10, 2 of 4), so the spread between 58, 53 and 50 rows is ordinary run-to-run variation rather than an effect of the RANDOM range. Repeating any of these queries gives different counts; only a larger pool of rows reliably brings the sample closer to the target percentage.
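Still assuming independent per-row draws, the achieved fraction has a standard deviation that depends only on the selection probability p and the number of candidate rows n. This is why pool size, not the RANDOM range, controls accuracy:

```latex
\hat{p} = \frac{X}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}

% n = 100 target rows in agent_sales, p = 0.5
\sigma_{\hat{p}} = \sqrt{\frac{0.5 \cdot 0.5}{100}} = 0.05 \quad (5\ \text{percentage points})

% n = 6 employees under 30,000, p = 2/3
\sigma_{\hat{p}} = \sqrt{\frac{(2/3)(1/3)}{6}} \approx 0.19 \quad (19\ \text{percentage points})
```

The 58% result is (58 - 50) / 5 = 1.6 standard deviations above the target, well within ordinary variation; quadrupling n halves the spread.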

Limitations On Use Of RANDOM


- RANDOM is not ANSI standard.
- RANDOM may be used in a SELECT list or a WHERE clause, but not both.
- RANDOM may be used when updating, inserting or deleting rows.
- RANDOM may not be used with aggregate or OLAP functions.
- RANDOM cannot be referenced by numeric position in a GROUP BY or ORDER BY clause.

V2R5 Sampling Features


Before V2R5:
- Sampling without replacement
- Proportional allocation: each AMP provides the same proportion of sample rows

With V2R5:
- Sampling with or without replacement (user choice)
- Proportional allocation: each AMP provides the same proportion of sample rows
- Randomized allocation: randomized across the system, not AMP proportional
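A sketch of how the V2R5 options are requested, assuming the SAMPLE WITH REPLACEMENT and RANDOMIZED ALLOCATION keywords as described in the Teradata SQL reference for that release (verify the exact syntax for your version). It reuses the department table from the earlier examples:

```sql
/* Sketch: two samples of 5 rows each, drawn with replacement and */
/* allocated randomly across the system rather than per AMP.      */
SELECT department_number
      ,SAMPLEID
FROM department
SAMPLE WITH REPLACEMENT RANDOMIZED ALLOCATION 5, 5
ORDER BY SAMPLEID;
```

With replacement, a department can appear more than once, either within one sample or in both, unlike the earlier SAMPLE 3, 5, 8 example where each row could be used only once.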

Dynamic SQL and Static SQL


Dynamic SQL
- technique for generating and executing SQL commands dynamically from a stored procedure at runtime.

Static SQL
- pre-constructed SQL compiled into the stored procedure
- may be parameterized
- still optimized prior to each execution

Static SQL Example


REPLACE PROCEDURE static_sql (IN sal DEC(9,2), IN emp_num INT)
BEGIN
  UPDATE emp1
  SET salary_amount = :sal
  WHERE employee_number = :emp_num;
END;

CALL static_sql(50000, 1018);

Dynamic SQL (cont'd)


Dynamic SQL Example

REPLACE PROCEDURE dyn_sql (IN col1 CHAR(15), IN val1 CHAR(10), IN emp_num CHAR(8))
BEGIN
  CALL DBC.SysExecSQL('UPDATE emp1 SET ' || :col1 || '= ' || :val1
                      || ' WHERE employee_number = ' || :emp_num);
END;

CALL dyn_sql('salary_amount','50000','1018');
/* Updates employee 1018 salary_amount to $50,000 */

CALL dyn_sql('job_code','567890','1018');
/* Updates employee 1018 job_code to 567890 */

Dynamic SQL:
- Is constructed as a concatenated character string
- Is passed to DBC.SysExecSQL for execution
- May be subject to run-time errors
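Because dyn_sql splices caller-supplied text straight into the command, a call such as CALL dyn_sql('salary_amount', '50000', '0 OR 1=1') would build WHERE employee_number = 0 OR 1=1 and update every row. One possible mitigation, sketched here and not taken from the slides: accept column names only from a fixed list and take the value and key as numeric parameters, which cannot carry SQL text. The procedure name dyn_sql_checked is hypothetical, and the IF ... IN test is assumed to be accepted by your release's stored procedure language.

```sql
/* Sketch only: dyn_sql_checked is a hypothetical variant of dyn_sql. */
REPLACE PROCEDURE dyn_sql_checked (IN col1    VARCHAR(30)
                                  ,IN val1    DECIMAL(9,2)
                                  ,IN emp_num INTEGER)
BEGIN
  /* A column name cannot be supplied as a data value, so accept */
  /* only the columns this procedure is meant to update.         */
  IF col1 IN ('salary_amount', 'job_code') THEN
    /* val1 and emp_num are numeric, so they cannot carry SQL text; */
    /* TRIM strips the blanks added when they are converted to text. */
    CALL DBC.SysExecSQL('UPDATE emp1 SET ' || col1 || ' = ' || TRIM(val1)
                        || ' WHERE employee_number = ' || TRIM(emp_num));
  END IF;
END;

CALL dyn_sql_checked('salary_amount', 50000, 1018);
```

Any other column name is silently ignored in this sketch; a production version would more likely report an error to the caller.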

Dynamic SQL (cont'd)


The following restrictions apply to the use of Dynamic SQL within stored procedures:

Restrictions
- The creating user must also be the owner of the procedure in order to have the right to use dynamic SQL.
- The size of the SQL command string cannot exceed 32,000 characters.
- Multi-statement requests are not supported.
- The ending semicolon is optional on the SQL command.
- The following SQL statements cannot be used as dynamic SQL in stored procedures: CALL, CREATE PROCEDURE, DATABASE, EXPLAIN, HELP, REPLACE PROCEDURE, SELECT, SELECT INTO, SET SESSION ACCOUNT, SET SESSION COLLATION, SET SESSION DATEFORM, SET TIME ZONE and SHOW.

Join Indexes
A Join Index is an optional index that the user may create for one of the following three purposes:
- Pre-join multiple tables (Multi-table Join Index)
- Distribute the rows of a single table on the hash value of a foreign key (Single-table Join Index)
- Aggregate one or more columns of one or more tables into a summary table (Aggregate Join Index)

When possible, the optimizer will use a Join Index rather than access the tables directly, which typically results in much better performance. Join Indexes are automatically updated as the table rows are updated. A Join Index cannot be accessed directly; it is an option the optimizer may choose if the index covers the query.
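The single-table purpose in the list above is not illustrated elsewhere in these slides. A minimal sketch against the orders table defined on the next slide (the index name ord_cust_ix is hypothetical): it keeps a copy of selected orders columns hashed on cust_id, so those rows land on the same AMPs as the customer rows, whose primary index is also cust_id. That can let the optimizer join without redistributing orders.

```sql
/* Sketch: single-table join index that redistributes selected orders */
/* columns on the hash of the foreign key cust_id.                     */
CREATE JOIN INDEX ord_cust_ix AS
SELECT cust_id
      ,order_id
      ,order_status
      ,order_date
FROM orders
PRIMARY INDEX (cust_id);
```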

Customer and Order Tables


CREATE TABLE customer
 ( cust_id   INTEGER NOT NULL
  ,cust_name CHAR(15)
  ,cust_addr CHAR(25))
UNIQUE PRIMARY INDEX ( cust_id );

CREATE TABLE orders
 ( order_id     INTEGER NOT NULL
  ,order_date   DATE FORMAT 'yyyy-mm-dd'
  ,cust_id      INTEGER
  ,order_status CHAR(1))
UNIQUE PRIMARY INDEX ( order_id );

- 49 valid customers have orders.
- 1 valid customer has no orders.
- 1 order has an invalid customer.

Single Table Query


How many orders have assigned customers?

SELECT COUNT(order_id)
FROM orders
WHERE cust_id IS NOT NULL;

Count(order_id)
---------------
50

A join index will not help this query: the orders table alone covers the query.

Will Join Index Help?


How many orders have assigned valid customers?

SELECT COUNT(o.order_id)
FROM customer c
INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
---------------
49

- A join index can help this query.
- Two tables are needed to cover the query.
- Query cost: .39 secs

Creating a Join Index


CREATE JOIN INDEX cust_ord_ix AS
SELECT (c.cust_id, cust_name)
      ,(order_id, order_status, order_date)
FROM customer c, orders o
WHERE c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);

Fixed Portion          Variable Portion
CUST_ID  CUST_NAME     ORDER_ID  ORDER_STATUS  ORDER_DATE
-------  ---------     --------  ------------  ----------
1001     ABC Corp      501       C             990120
                       502       C             990220
                       503       C             990320
                       504       C             990420
                       505       C             990520
                       506       C             990620
                       507       C             990122
                       508       C             990222
                       509       C             990322
1002     BCD Corp
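In cust_ord_ix the parentheses split the select list into a fixed portion stored once per customer and a variable portion repeated for each order, which is what the table above shows (Teradata describes this as a compressed join index). For comparison, a sketch of the uncompressed form, under the hypothetical name cust_ord_flat_ix, in which cust_name is stored again on every order row and the index therefore takes more space:

```sql
/* Sketch: the same join without the parenthesized column groups. */
/* Customer columns are repeated on every order row.              */
CREATE JOIN INDEX cust_ord_flat_ix AS
SELECT c.cust_id
      ,c.cust_name
      ,o.order_id
      ,o.order_status
      ,o.order_date
FROM customer c, orders o
WHERE c.cust_id = o.cust_id
PRIMARY INDEX (cust_id);
```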

With Join Index


How many orders have assigned valid customers?

SELECT COUNT(o.order_id)
FROM customer c
INNER JOIN orders o
ON c.cust_id = o.cust_id;

Count(order_id)
---------------
49

- Same SQL query
- Optimizer picks the Join Index rather than doing a join
- Join Index covers the query
- Without Join Index: .39 secs
- With Join Index: .17 secs

Join Index Coverage


How many valid customers have assigned orders in January 1999?

SELECT COUNT(c.cust_id)
FROM customer c
INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131;

Count(cust_id)
--------------
9

- Order_date is part of the Join Index
- Join Index covers the query
- Optimizer picks the Join Index
- Without Join Index: .40 secs
- With Join Index: .17 secs

Join Index Comparison


Name the valid customers who have open orders in January 1999.

SELECT c.cust_name
FROM customer c
INNER JOIN orders o
ON c.cust_id = o.cust_id
WHERE o.order_date BETWEEN 990101 AND 990131
AND o.order_status = 'O';

cust_name
---------
JKL Corp

- All referenced columns are part of the join index
- Join Index covers the query
- Optimizer picks the Join Index
- Without Join Index: .23 secs
- With Join Index: .15 secs

Aggregate Join Indexes


Aggregate Join Indexes are:
- Designed for queries which use counts, sums and averages
- Able to hold aggregated data, optionally grouped by extracted months or years
- An alternative to summary tables
- Automatically updated as base tables change
- An option for the optimizer when the index covers the query
- Not compatible with MultiLoad or FastLoad

Traditional Aggregation
Traditional Aggregation

SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales
WHERE itemid = 10
AND Yr IN (1997, 1998)
GROUP BY 1,2
ORDER BY 1,2;

Explanation
1) First, we do a SUM step to aggregate from PED1.daily_sales by way of the primary index "PED1.daily_sales.itemid = 10" with a residual condition of ("((EXTRACT(YEAR FROM (PED1.daily_sales.salesdate )))= 1997) OR ((EXTRACT(YEAR FROM (PED1.daily_sales.salesdate )))= 1998)"), and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 2. The size of Spool 2 is estimated with high confidence to be 1 to 1 rows.

Yr    Mon  Sum(sales)
----  ---  ----------
1997  1    2150.00
1997  2    1950.00
1997  8    1950.00
1997  9    2100.00
1998  1    1950.00
1998  2    2100.00
1998  8    2200.00
1998  9    2550.00

Creating An Aggregate Index


CREATE SET TABLE daily_sales ,NO FALLBACK
 ( itemid    INTEGER
  ,salesdate DATE FORMAT 'YY/MM/DD'
  ,sales     DECIMAL(9,2))
PRIMARY INDEX ( itemid );

CREATE JOIN INDEX monthly_sales AS
SELECT itemid AS Item
      ,EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales) AS SumSales
FROM daily_sales
GROUP BY 1,2,3;

Query Using Aggregate Index


SELECT EXTRACT(YEAR FROM salesdate) AS Yr
      ,EXTRACT(MONTH FROM salesdate) AS Mon
      ,SUM(sales)
FROM daily_sales
WHERE itemid = 10
AND Yr IN (1997, 1998)
GROUP BY 1,2
ORDER BY 1,2;

Yr    Mon  Sum(sales)
----  ---  ----------
1997  1    2150.00
1997  2    1950.00
1997  8    1950.00
1997  9    2100.00
1998  1    1950.00
1998  2    2100.00
1998  8    2200.00
1998  9    2550.00

Explanation
1) First, we do a SUM step to aggregate from join index table PED1.monthly_sales by way of the primary index "PED1.monthly_sales.Item = 10", and the grouping identifier in field 1. Aggregate Intermediate Results are computed locally, then placed in Spool 2. The size of Spool 2 is estimated with low confidence to be 4 to 4 rows.

Join Index Summary


A Join Index:
- Is a denormalization tool
- Pre-joins existing tables
- Aggregates existing columns
- Can improve performance for covered queries
- Can join more than two tables
- Can use inner, outer and cross joins
- Costs additional disk space
- Costs additional maintenance processing for updates
- Cannot be accessed directly by SQL
- Is a choice for the optimizer

ANSI Timestamp
Timestamp combines date and time into a single column.
- TIMESTAMP(n), where n = 0 to 6 digits of fractional seconds
- Consists of six fields of information: YEAR, MONTH, DAY, HOUR, MINUTE, SECOND
- Internal format is DATE (4 bytes) + TIME (6 bytes) = 10 bytes

Timestamp representation                    Character conversion
TIMESTAMP(0)  2001-12-07 11:37:58           CHAR(19)
TIMESTAMP(6)  2001-12-07 11:37:58.213000    CHAR(26)

CREATE TABLE tblb (tmstampb TIMESTAMP);
INSERT INTO tblb (CURRENT_TIMESTAMP);
SELECT * FROM tblb;

tmstampb
--------------------------
2001-11-06 13:48:38.580000
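The six fields listed above can be read back individually with EXTRACT, and the character form obtained with CAST. A sketch against the tblb table just created; the aliases are illustrative names:

```sql
/* Sketch: pull individual fields out of the TIMESTAMP column in tblb */
SELECT EXTRACT(YEAR FROM tmstampb)   AS yr_part
      ,EXTRACT(MONTH FROM tmstampb)  AS mon_part
      ,EXTRACT(HOUR FROM tmstampb)   AS hr_part
      ,EXTRACT(SECOND FROM tmstampb) AS sec_part  /* keeps the fraction */
      ,CAST(tmstampb AS CHAR(26))    AS ts_char   /* TIMESTAMP(6) width */
FROM tblb;
```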

Timestamp + Interval
A timestamp may be combined with any year-month or day-time interval to produce a new timestamp:

TIMESTAMP + INTERVAL = TIMESTAMP (subtraction works the same way)

where the interval may be any of: YEAR, YEAR TO MONTH, MONTH, DAY, DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, HOUR, HOUR TO MINUTE, MINUTE, MINUTE TO SECOND, SECOND.

Subtract 2 years and 6 months from the designated timestamp:

SELECT TIMESTAMP '1999-10-01 09:30:22' - INTERVAL '2-06' YEAR TO MONTH;

Result: 1997-04-01 09:30:22

Subtract 1 hour, 20 minutes and 10 seconds from the designated timestamp:

SELECT TIMESTAMP '1999-10-01 09:30:22' - INTERVAL '01:20:10' HOUR TO SECOND;

Result: 1999-10-01 08:10:12
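One caution when adding month intervals: interval arithmetic does not adjust for month length, so a result that lands on a non-existent day is rejected rather than rounded. A sketch of the pitfall and of ADD_MONTHS, a Teradata extension that clamps to the last day of the month instead:

```sql
/* Sketch: 1999-01-31 plus one month would be 1999-02-31, which is */
/* not a valid date, so this request is expected to fail.          */
SELECT TIMESTAMP '1999-01-31 09:30:22' + INTERVAL '1' MONTH;

/* ADD_MONTHS clamps to the end of the month instead; */
/* this is expected to return 1999-02-28.             */
SELECT ADD_MONTHS(DATE '1999-01-31', 1);
```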

Timestamp Subtraction
TIMESTAMP - TIMESTAMP = INTERVAL, where the result may be expressed as any of: YEAR, YEAR TO MONTH, MONTH, DAY, DAY TO HOUR, DAY TO MINUTE, DAY TO SECOND, HOUR, HOUR TO MINUTE, MINUTE, MINUTE TO SECOND, SECOND.

Given the following two timestamps, calculate the difference between them as directed:

In months?

SELECT (TIMESTAMP '1999-10-20 10:25:40' - TIMESTAMP '1998-09-19 08:20:00') MONTH;

Result: 13

In years?

SELECT (TIMESTAMP '1999-10-20 10:25:40' - TIMESTAMP '1998-09-19 08:20:00') YEAR;

Result: 1

In days?

SELECT (TIMESTAMP '1999-10-20 10:25:40' - TIMESTAMP '1998-09-19 08:20:00') DAY(3);

Result: 396
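The DAY(3) qualifier in the last query matters: the leading field of an interval defaults to two digits, so 396 days would overflow a plain DAY result. The same precision rule applies when asking for the whole difference at once. A sketch using the same two timestamps:

```sql
/* Sketch: the full difference as one interval. DAY(3) is required */
/* because 396 days does not fit the default 2-digit day field.     */
SELECT (TIMESTAMP '1999-10-20 10:25:40'
      - TIMESTAMP '1998-09-19 08:20:00') DAY(3) TO SECOND(0);
```

This should come out as 396 days, 2 hours, 5 minutes and 40 seconds.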

Using Timestamp In An Application


CREATE TABLE Repair_time
 ( serial_number INTEGER
  ,product_desc  CHAR(8)
  ,start_time    TIMESTAMP(0)
  ,end_time      TIMESTAMP(0))
UNIQUE PRIMARY INDEX (serial_number);

SELECT * FROM Repair_time ORDER BY 1;

serial_number  product_desc  start_time           end_time
-------------  ------------  -------------------  -------------------
100            TV            2000-01-15 10:30:00  2000-01-17 13:20:00
101            TV            2000-01-20 08:30:00  2000-01-23 12:20:00
102            TV            2000-01-25 13:40:00  2000-01-26 14:20:00
103            TV            2000-02-02 11:30:00  2000-02-09 08:50:00
104            TV            2000-02-07 09:00:00  2000-02-10 08:50:00
105            TV            2000-02-10 08:40:00  2000-02-12 14:50:00
106            TV            2000-02-15 12:30:00  2000-02-20 15:20:00
107            TV            2000-02-19 14:30:00  2000-02-21 10:50:00
108            TV            2000-02-21 11:30:00  2000-02-23 16:40:00

Calculating Time Intervals


Produce a report showing each TV by serial number and how long, in days, hours and minutes, it took to repair.

SELECT serial_number
      ,(end_time - start_time) DAY TO MINUTE AS work_time
FROM Repair_time
ORDER BY 1;

serial_number  work_time
-------------  ---------
100            2 02:50
101            3 03:50
102            1 00:40
103            6 21:20
104            2 23:50
105            2 06:10
106            5 02:50
107            1 20:20
108            2 05:10

What is the average amount of time it takes to repair a TV? Show the answer in days, hours and minutes.

SELECT AVG((end_time - start_time) DAY TO MINUTE) AS avg_repair_time
FROM Repair_time;

avg_repair_time
---------------
3 01:40
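An average like this can be cross-checked by expressing each repair as a single-field interval in minutes. A sketch against the Repair_time rows above; MINUTE(4) is chosen because the longest repair needs four digits:

```sql
/* Sketch: each repair time as a single MINUTE interval.         */
/* MINUTE(4) allows up to 9999 minutes; the longest repair here  */
/* (serial 103, 6 days 21:20) is 9920 minutes.                   */
SELECT serial_number
      ,(end_time - start_time) MINUTE(4) AS work_minutes
FROM Repair_time
ORDER BY 1;
```

The nine values (3050, 4550, 1480, 9920, 4310, 3250, 7370, 2660 and 3190 minutes) sum to 39780 minutes, and 39780 / 9 = 4420 minutes, which is 3 days 01:40, the same average the query above reports.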

Comparing Intervals
Show the serial number and the repair time for each TV that took longer than 2 days to repair.

SELECT serial_number
      ,(end_time - start_time) DAY TO MINUTE AS #_DaysHrsMns
FROM Repair_time
WHERE #_DaysHrsMns > INTERVAL '02 00:00' DAY TO MINUTE;

serial_number  #_DaysHrsMns
-------------  ------------
106            5 02:50
101            3 03:50
108            2 05:10
100            2 02:50
104            2 23:50
103            6 21:20
105            2 06:10

Advanced Use of Timestamp - Example 1


Produce a list which pairs by serial number any two TVs that were being repaired at the same time.

SELECT a.serial_number
      ,b.serial_number
FROM Repair_time a
CROSS JOIN Repair_time b
WHERE (a.start_time, a.end_time) OVERLAPS (b.start_time, b.end_time)
AND a.serial_number < b.serial_number;

Alternative Approach Using DISTINCT

SELECT DISTINCT a.serial_number
      ,b.serial_number
FROM Repair_time a
CROSS JOIN Repair_time b
WHERE (a.start_time, a.end_time) OVERLAPS (b.start_time, b.end_time);

Note that this version has no condition such as a.serial_number < b.serial_number, so every TV also overlaps itself and each overlapping pair appears in both orders; DISTINCT removes only rows that are exact duplicates. The result below is the one returned by the first query.

serial_number  serial_number
-------------  -------------
106            107
103            104
104            105
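OVERLAPS can also be spelled out as an explicit comparison, which some readers find easier to reason about. For periods whose end_time is later than start_time, two repairs overlap exactly when each one starts before the other ends, so this sketch should return the same three pairs as the first query above:

```sql
/* Sketch: explicit test equivalent to OVERLAPS for periods whose */
/* end_time is later than start_time: each repair starts before   */
/* the other one ends.                                            */
SELECT a.serial_number
      ,b.serial_number
FROM Repair_time a
INNER JOIN Repair_time b
  ON  a.start_time < b.end_time
  AND b.start_time < a.end_time
WHERE a.serial_number < b.serial_number;
```

As with OVERLAPS, repairs that merely touch (one ending at the same instant the other starts) are not counted as overlapping.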

Advanced Use of Timestamp - Example 2


What percentage of all TVs took more than 2 days to repair?

SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt)
    ,(SELECT serial_number
            ,(end_time - start_time) DAY AS Num_Days
      FROM Repair_time
      WHERE Num_Days > INTERVAL '02' DAY) AS temp2(serial, Number_days)
GROUP BY cnt;

((100*Count(serial))/cnt)
-------------------------
33%

Incorrect Answer: (end_time - start_time) DAY reports whole days only, so a repair of 2 days 23:50 counts as 2 days and fails the > INTERVAL '02' DAY test. Only 3 of the 9 TVs qualify.

SELECT (100 * COUNT(serial) / cnt) (FORMAT '99%')
FROM (SELECT COUNT(*) FROM Repair_time) AS temp1(cnt)
    ,(SELECT serial_number
            ,(end_time - start_time) DAY TO MINUTE AS Num_Days
      FROM Repair_time
      WHERE Num_Days > INTERVAL '02 00:00' DAY TO MINUTE) AS temp2(serial, Number_days)
GROUP BY cnt;

((100*Count(serial))/cnt)
-------------------------
78%

Correct Answer: comparing at minute precision, 7 of the 9 TVs qualify.

Performance Reminders
- Consider using CASE for testing small sets of values
- Use the appropriate sampling function: RANDOM or SAMPLE
- Use Dynamic SQL with stored procedures
- Join indexes can help query performance by pre-joining tables
- Aggregate indexes are preferable to aggregated views or tables
- Use TIMESTAMP and INTERVAL for time-related processing

Summary

- SQL is a very versatile language.
- Usually, if there's a will, there's a way.
- Often there are several ways to write a query.
- Find the one that performs best, using EXPLAIN.