SQL SERVER - A Tricky Question and Even Trickier Answer - Index Intersection - Partition Function


Here is the question: Write a SELECT statement that references a single table, only once and without using the JOIN keyword, and that generates an execution plan with two join operators. Use AdventureWorks as the sample database.
Here is the answer submitted by Alphonso Jones:
SELECT Row_number() OVER (ORDER BY OBJECT_ID) num,
       Rank() OVER (ORDER BY OBJECT_ID DESC) num2
INTO #tmp
FROM sys.columns

-- Enable Execution Plan with CTRL+M
SELECT num, SUM(num2) OVER (PARTITION BY num)
FROM #tmp

When I saw this answer I was very happy, because I had not envisioned it as a solution when I asked the question. Here is the execution plan of the T-SQL code above. It's easy to see that there are multiple joins, caused by the windowed SUM() ... OVER (PARTITION BY num) used in the query. What an excellent contribution by Alphonso Jones.
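Why does a query with no JOIN keyword show join operators? One way to see it is that a windowed aggregate such as SUM(num2) OVER (PARTITION BY num) returns the same rows as joining each detail row back to its per-group GROUP BY total. The sketch below assumes a hypothetical temp table #win_demo with made-up values standing in for #tmp, and it runs both forms side by side. It illustrates the logical equivalence only, not the exact operators SQL Server picks.

```sql
-- Hypothetical sample rows standing in for the #tmp table built above
CREATE TABLE #win_demo (num INT NOT NULL, num2 INT NOT NULL);

INSERT INTO #win_demo (num, num2)
SELECT 1, 10 UNION ALL
SELECT 1, 5 UNION ALL
SELECT 2, 7;

-- Windowed aggregate: each detail row carries the total of its partition
SELECT num, SUM(num2) OVER (PARTITION BY num) AS total
FROM #win_demo;

-- Logically equivalent explicit form: detail rows joined back to per-group totals
SELECT w.num, g.total
FROM #win_demo AS w
JOIN (SELECT num, SUM(num2) AS total
      FROM #win_demo
      GROUP BY num) AS g
    ON g.num = w.num;
```

Both queries should return the same three rows: (1, 15), (1, 15) and (2, 7).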

Here is the answer I had in mind when I asked the question. I ran the following query against the AdventureWorks database, and it generated an execution plan with multiple joins:
USE AdventureWorks2012
GO
SELECT *
FROM [Purchasing].[PurchaseOrderHeader]
WHERE [EmployeeID] = 258
AND [VendorID] = 1580
GO

Look at the execution plan of the above query. You can see joins even though I am using a single table and there is no join syntax in the query.

Personally, I prefer Alphonso Jones's solution, as it will always generate multiple joins because of the windowed aggregate. My solution, on the other hand, is a bit tricky: it relies on the indexes on the table [Purchasing].[PurchaseOrderHeader], which lead the optimizer to use index intersection. Index intersection is a technique in which the optimizer uses more than one index on a table to satisfy a given query.
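Whether the optimizer chooses index intersection depends on the data and statistics, so the plan you see on your copy of AdventureWorks may differ. The following is a minimal sketch that uses a hypothetical temp table #OrderDemo with made-up rows and index names mirroring the two filtered columns. SQL Server's table hints documentation states that listing several indexes in one INDEX() hint enforces index ANDing, so the intersection should appear even on a tiny table.

```sql
-- Hypothetical table: clustered primary key plus one nonclustered index per filtered column
CREATE TABLE #OrderDemo
(
    OrderID    INT IDENTITY(1,1) PRIMARY KEY,
    EmployeeID INT   NOT NULL,
    VendorID   INT   NOT NULL,
    Amount     MONEY NOT NULL
);
CREATE INDEX IX_OrderDemo_EmployeeID ON #OrderDemo (EmployeeID);
CREATE INDEX IX_OrderDemo_VendorID   ON #OrderDemo (VendorID);

-- Made-up sample rows
INSERT INTO #OrderDemo (EmployeeID, VendorID, Amount)
SELECT 258, 1580, 100 UNION ALL
SELECT 258, 1492, 250 UNION ALL
SELECT 251, 1580, 75;

-- Both indexes in one INDEX() hint: the optimizer must combine them (index ANDing).
-- With CTRL+M enabled, the actual plan is expected to show the two index accesses
-- combined by a join, plus a lookup for Amount, which neither index covers.
SELECT *
FROM #OrderDemo WITH (INDEX (IX_OrderDemo_EmployeeID, IX_OrderDemo_VendorID))
WHERE EmployeeID = 258
  AND VendorID = 1580;
```

The query should return the single order with EmployeeID 258 and VendorID 1580. Without the hint, the optimizer decides for itself whether intersection is worthwhile.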

BACKGROUND
This article demonstrates some SQL queries that are commonly asked in job interviews. I will cover some common but tricky queries:

(i) Finding the nth highest salary of an employee.
(ii) Finding the TOP X records from each group.
(iii) Deleting duplicate rows from a table.

NOTE: All the SQL in this article has been tested on SQL Server 2005.

(i) Finding the nth highest salary of an employee.


Create a table named Employee_Test and insert some test data as follows:

CREATE TABLE Employee_Test
(
Emp_ID INT Identity,
Emp_name Varchar(100),
Emp_Sal Decimal (10,2)
)
INSERT INTO Employee_Test VALUES ('Anees',1000);
INSERT INTO Employee_Test VALUES ('Rick',1200);
INSERT INTO Employee_Test VALUES ('John',1100);
INSERT INTO Employee_Test VALUES ('Stephen',1300);
INSERT INTO Employee_Test VALUES ('Maria',1400);

It is very easy to find the highest salary:

--Highest Salary
select max(Emp_Sal) from Employee_Test

Now, if you are asked to find the 3rd highest salary, the query is:

--3rd Highest Salary
select min(Emp_Sal) from Employee_Test where Emp_Sal in
(select distinct top 3 Emp_Sal from Employee_Test order by Emp_Sal desc)

The result is 1200.


To find the nth highest salary, replace top 3 with top n (n being an integer 1, 2, 3, etc.). When n is held in a variable, TOP requires parentheses:

--nth Highest Salary
DECLARE @n INT
SET @n = 3 -- n = 1, 2, 3, ...
select min(Emp_Sal) from Employee_Test where Emp_Sal in
(select distinct top (@n) Emp_Sal from Employee_Test order by Emp_Sal desc)
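One caveat with the TOP/MIN query is what happens when n exceeds the number of distinct salaries. In that case the inner TOP simply returns every salary, and MIN reports the lowest one instead of "no such salary". The following sketch uses DENSE_RANK() (available since SQL Server 2005) instead. It assumes a hypothetical temp table #Emp holding the same sample rows plus one made-up duplicate salary, so you can see how ties are handled.

```sql
-- Hypothetical copy of the Employee_Test data, plus a made-up duplicate salary
CREATE TABLE #Emp (Emp_ID INT IDENTITY, Emp_name VARCHAR(100), Emp_Sal DECIMAL(10,2));

INSERT INTO #Emp (Emp_name, Emp_Sal)
SELECT 'Anees', 1000 UNION ALL
SELECT 'Rick', 1200 UNION ALL
SELECT 'John', 1100 UNION ALL
SELECT 'Stephen', 1300 UNION ALL
SELECT 'Maria', 1400 UNION ALL
SELECT 'Ann', 1400;   -- hypothetical row tied with Maria

DECLARE @n INT;
SET @n = 3;   -- the "n" in nth highest

-- DENSE_RANK gives equal salaries the same rank with no gaps,
-- so rank @n is exactly the nth highest distinct salary.
SELECT DISTINCT Emp_Sal
FROM (SELECT Emp_Sal,
             DENSE_RANK() OVER (ORDER BY Emp_Sal DESC) AS sal_rank
      FROM #Emp) AS ranked
WHERE sal_rank = @n;
```

With @n set to 3, this should return 1200.00. Setting @n to 6 returns no rows, whereas the TOP/MIN query would return 1000.00.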

(ii) Finding TOP X records from each group


Create a table named photo_test and insert some test data as follows:

create table photo_test
(
pgm_main_Category_id int,
pgm_sub_category_id int,
file_path varchar(MAX)
)
insert into photo_test values(17,15,'photo/bb1.jpg');
insert into photo_test values(17,16,'photo/cricket1.jpg');
insert into photo_test values(17,17,'photo/base1.jpg');
insert into photo_test values(18,18,'photo/forest1.jpg');
insert into photo_test values(18,19,'photo/tree1.jpg');
insert into photo_test values(18,20,'photo/flower1.jpg');
insert into photo_test values(19,21,'photo/laptop1.jpg');
insert into photo_test values(19,22,'photo/camer1.jpg');
insert into photo_test values(19,23,'photo/cybermbl1.jpg');
insert into photo_test values(17,24,'photo/F1.jpg');

There are three groups of pgm_main_category_id: 17 (four records), 18 (three records), and 19 (three records).
Now, if you want to select the top 2 records from each group, the query is as follows:

select pgm_main_category_id,pgm_sub_category_id,file_path from
(
select pgm_main_category_id,pgm_sub_category_id,file_path,
rank() over (partition by pgm_main_category_id order by pgm_sub_category_id asc) as rankid
from photo_test
) photo_test
where rankid < 3 -- rankid < N+1 returns the top N records per group (e.g. < 4 for the top 3)
order by pgm_main_category_id,pgm_sub_category_id

The result is:

pgm_main_category_id  pgm_sub_category_id  file_path
17                    15                   photo/bb1.jpg
17                    16                   photo/cricket1.jpg
18                    18                   photo/forest1.jpg
18                    19                   photo/tree1.jpg
19                    21                   photo/laptop1.jpg
19                    22                   photo/camer1.jpg
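The sample data has no ties. However, RANK() gives tied rows the same rank, so a group can return more rows than requested when a tie straddles the cutoff. The following sketch assumes a hypothetical #photo temp table with made-up rows, in which group 17 has two rows tied on pgm_sub_category_id = 16. RANK() numbers them 1, 2, 2 and would keep all three, while ROW_NUMBER() keeps exactly two.

```sql
-- Hypothetical data: group 17 has a tie on pgm_sub_category_id = 16
CREATE TABLE #photo (pgm_main_category_id INT, pgm_sub_category_id INT, file_path VARCHAR(100));

INSERT INTO #photo
SELECT 17, 15, 'photo/a.jpg' UNION ALL
SELECT 17, 16, 'photo/b.jpg' UNION ALL
SELECT 17, 16, 'photo/c.jpg' UNION ALL
SELECT 18, 18, 'photo/d.jpg';

-- ROW_NUMBER() numbers rows 1, 2, 3, ... within each group even when the ORDER BY
-- value ties, so rn <= 2 keeps at most two rows per group. The extra file_path
-- sort key makes the choice among tied rows deterministic.
SELECT pgm_main_category_id, pgm_sub_category_id, file_path
FROM (SELECT pgm_main_category_id, pgm_sub_category_id, file_path,
             ROW_NUMBER() OVER (PARTITION BY pgm_main_category_id
                                ORDER BY pgm_sub_category_id, file_path) AS rn
      FROM #photo) AS ranked
WHERE rn <= 2
ORDER BY pgm_main_category_id, pgm_sub_category_id;
```

This should return three rows: (17, 15, photo/a.jpg), (17, 16, photo/b.jpg) and (18, 18, photo/d.jpg). The RANK() version would return four.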

(iii) Deleting duplicate rows from a table


A table with a primary key doesn't contain duplicates. But if the keys have to be disabled for some reason, or duplicates come in when importing data from other sources, it is often necessary to get rid of them.
This can be achieved in two ways:
(a) Using a temporary table.
(b) Without using a temporary table.

(a) Using a temporary or staging table
Let the table employee_test1 contain some duplicate data, such as:

CREATE TABLE Employee_Test1
(
Emp_ID INT,
Emp_name Varchar(100),
Emp_Sal Decimal (10,2)
)
INSERT INTO Employee_Test1 VALUES (1,'Anees',1000);
INSERT INTO Employee_Test1 VALUES (2,'Rick',1200);
INSERT INTO Employee_Test1 VALUES (3,'John',1100);
INSERT INTO Employee_Test1 VALUES (4,'Stephen',1300);
INSERT INTO Employee_Test1 VALUES (5,'Maria',1400);
INSERT INTO Employee_Test1 VALUES (6,'Tim',1150);
INSERT INTO Employee_Test1 VALUES (6,'Tim',1150);

Step 1: Create an empty temporary table with the same structure as the main table:

select top 0 * into employee_test1_temp from employee_test1

Step 2: Insert the result of the GROUP BY query into the temporary table:

insert into employee_test1_temp
select Emp_ID,Emp_name,Emp_Sal
from employee_test1
group by Emp_ID,Emp_name,Emp_Sal

Step 3: Truncate the original table:

truncate table employee_test1

Step 4: Fill the original table with the rows of the temporary table:

insert into employee_test1
select * from employee_test1_temp

Now the duplicate rows have been removed from the main table. Running

select * from employee_test1

gives the result:

Emp_ID  Emp_name  Emp_Sal
1       Anees     1000
2       Rick      1200
3       John      1100
4       Stephen   1300
5       Maria     1400
6       Tim       1150
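Between Step 3 and Step 4 the original table is empty, so a failure at that point leaves it without its data. The following sketch wraps the steps in one transaction with SET XACT_ABORT ON, so that a run-time error rolls everything back. It assumes a hypothetical #Emp1 temp table in place of employee_test1 and a hypothetical #Emp1_staging table. It also uses SELECT DISTINCT ... INTO in place of Steps 1 and 2, which is equivalent here to the GROUP BY over all three columns.

```sql
SET XACT_ABORT ON;   -- any run-time error aborts and rolls back the open transaction

-- Hypothetical copy of Employee_Test1, including the duplicated 'Tim' row
CREATE TABLE #Emp1 (Emp_ID INT, Emp_name VARCHAR(100), Emp_Sal DECIMAL(10,2));

INSERT INTO #Emp1
SELECT 1, 'Anees', 1000 UNION ALL
SELECT 5, 'Maria', 1400 UNION ALL
SELECT 6, 'Tim', 1150 UNION ALL
SELECT 6, 'Tim', 1150;

BEGIN TRANSACTION;

-- Steps 1 and 2 in one statement: create and fill the staging table
SELECT DISTINCT Emp_ID, Emp_name, Emp_Sal
INTO #Emp1_staging
FROM #Emp1;

-- Step 3: empty the original table (TRUNCATE can be rolled back inside a transaction)
TRUNCATE TABLE #Emp1;

-- Step 4: reload the de-duplicated rows
INSERT INTO #Emp1 (Emp_ID, Emp_name, Emp_Sal)
SELECT Emp_ID, Emp_name, Emp_Sal FROM #Emp1_staging;

COMMIT TRANSACTION;

DROP TABLE #Emp1_staging;
```

Afterwards, #Emp1 should hold three rows, with Tim appearing once.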

(b) Without using a temporary table


;with T as
(
select * , row_number() over (partition by Emp_ID order by Emp_ID) as rank
from employee_test1
)
delete
from T
where rank > 1

The result is:

Emp_ID  Emp_name  Emp_Sal
1       Anees     1000
2       Rick      1200
3       John      1100
4       Stephen   1300
5       Maria     1400
6       Tim       1150
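The CTE in approach (b) partitions by Emp_ID alone, so two rows that share an Emp_ID but differ in name or salary would also be treated as duplicates. The following sketch assumes a hypothetical #Emp2 temp table. It first previews the duplicate groups with GROUP BY ... HAVING COUNT(*) > 1, then deletes using all three columns in the PARTITION BY, matching the GROUP BY used in approach (a).

```sql
-- Hypothetical copy of Employee_Test1 with one duplicated row
CREATE TABLE #Emp2 (Emp_ID INT, Emp_name VARCHAR(100), Emp_Sal DECIMAL(10,2));

INSERT INTO #Emp2
SELECT 5, 'Maria', 1400 UNION ALL
SELECT 6, 'Tim', 1150 UNION ALL
SELECT 6, 'Tim', 1150;

-- Preview: which value combinations occur more than once, and how often?
SELECT Emp_ID, Emp_name, Emp_Sal, COUNT(*) AS copies
FROM #Emp2
GROUP BY Emp_ID, Emp_name, Emp_Sal
HAVING COUNT(*) > 1;

-- Delete every copy after the first; partitioning by all columns means only
-- rows identical in every column count as duplicates.
WITH numbered AS
(
    SELECT Emp_ID, Emp_name, Emp_Sal,
           ROW_NUMBER() OVER (PARTITION BY Emp_ID, Emp_name, Emp_Sal
                              ORDER BY Emp_ID) AS rn
    FROM #Emp2
)
DELETE FROM numbered
WHERE rn > 1;
```

The preview should report one group, (6, Tim, 1150.00) with 2 copies, and the DELETE should remove one row.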
