Part 4 - Grouping Data and Subqueries

Database Concepts and Skills for Big Data
Le Duy Dung (Andrew)
SQL – Part 3
SQL Expressions and Functions
Grouping
❑ Grouping is a tool for dividing data into sets
❑ We can perform aggregate calculations on the resulting sets
❑ Groups are created using the GROUP BY clause
Group By
❑ Example: Group the vendors by city and return the total
number of vendors in each city
• To count the vendors in one specific city:
SELECT COUNT(*) AS NumberOfVendors
FROM VENDORS
WHERE City = 'Seattle';

• To count the vendors in every city using GROUP BY:

SELECT City, COUNT(*) AS NumberOfVendors
FROM VENDORS
GROUP BY City;
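The GROUP BY query above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory VENDORS table; the vendor names and the Portland row are made-up sample data, not part of the course database.

```python
import sqlite3

# Hypothetical sample data: two vendors in Seattle, one in Portland.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO VENDORS VALUES (?, ?)",
    [("Acme", "Seattle"), ("Bolt", "Seattle"), ("Cedar", "Portland")],
)

# One output row per distinct City, each with its own COUNT(*).
rows = conn.execute(
    "SELECT City, COUNT(*) AS NumberOfVendors "
    "FROM VENDORS GROUP BY City ORDER BY City"
).fetchall()
print(rows)  # [('Portland', 1), ('Seattle', 2)]
```

The ORDER BY is only there to make the printed order predictable; GROUP BY by itself does not guarantee any row order.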
Group By
❑ The GROUP BY clause calculates the number of vendors in each city.
❑ Using GROUP BY, we don’t need to specify each city. The statement will
find all cities in the City column.
❑ The GROUP BY clause can contain multiple columns
❑ Groups can be nested (a group inside a group): with multiple columns, rows
are grouped by the first column, then by the next column within each group
❑ Most SQL implementations do not allow grouping by a column with a
variable-length data type, such as a text or memo field (we will cover data types later)
❑ The GROUP BY clause comes after any WHERE clause and before any
ORDER BY clause
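As a rough illustration of grouping by multiple columns and of the WHERE, GROUP BY, ORDER BY ordering, the sketch below uses Python's sqlite3 module. The State column and all rows are assumptions added for this example only.

```python
import sqlite3

# Hypothetical VENDORS table with an extra State column (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, City TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO VENDORS VALUES (?, ?, ?)",
    [
        ("Acme", "Seattle", "WA"),
        ("Bolt", "Seattle", "WA"),
        ("Cedar", "Portland", "OR"),
        ("Dune", "Portland", "ME"),
        ("Echo", "Spokane", "WA"),
    ],
)

# Clause order: WHERE (filter rows), then GROUP BY, then ORDER BY.
# Grouping by State and City keeps the two Portlands in separate groups.
rows = conn.execute(
    "SELECT State, City, COUNT(*) AS NumberOfVendors "
    "FROM VENDORS "
    "WHERE City <> 'Spokane' "
    "GROUP BY State, City "
    "ORDER BY State, City"
).fetchall()
print(rows)  # [('ME', 'Portland', 1), ('OR', 'Portland', 1), ('WA', 'Seattle', 2)]
```

Grouping by City alone would merge the two Portland rows into a single group with a count of 2.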
Group By
❑ To filter groups when using GROUP BY, use the HAVING clause instead of
the WHERE keyword.
❑ A WHERE condition is still allowed, but it is applied to rows before grouping.
❑ In other words:
• The WHERE clause specifies which rows will be used to determine the groups.
• The HAVING clause specifies which groups will be used in the final results.
Group By and Having
❑ Example: Find all the cities with at least 2 vendors
• Instead of WHERE, use HAVING:
SELECT City, COUNT(*) AS NumOfVendor
FROM VENDORS
GROUP BY City
HAVING COUNT(*) > 1;

❑ WHERE does not work here because the filter is based on
the aggregated value, not on the values in individual rows
❑ In other words, WHERE filters before data is grouped; HAVING
filters after data is grouped.
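The difference between the two filters can be checked with a minimal sketch, assuming Python's sqlite3 module and made-up vendor rows. SQLite is used here only as a convenient engine; other databases word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE VENDORS (Name TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO VENDORS VALUES (?, ?)",
    [("Acme", "Seattle"), ("Bolt", "Seattle"), ("Cedar", "Portland")],
)

# HAVING filters groups after COUNT(*) has been computed per city.
having_rows = conn.execute(
    "SELECT City, COUNT(*) AS NumOfVendor FROM VENDORS "
    "GROUP BY City HAVING COUNT(*) > 1"
).fetchall()
print(having_rows)  # [('Seattle', 2)]

# WHERE runs before grouping, so an aggregate is not allowed there.
try:
    conn.execute("SELECT City FROM VENDORS WHERE COUNT(*) > 1 GROUP BY City")
    where_failed = False
except sqlite3.Error as exc:
    where_failed = True
    print("WHERE with an aggregate was rejected:", exc)
```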
Group By and Having
❑ Example: Find all customers whose purchases at the store total
more than $1000.
SELECT CustomerID, SUM(Total) AS TotalBought
FROM SALE
GROUP BY CustomerID
HAVING SUM(Total) > 1000;
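To see why this query needs HAVING rather than a row-level WHERE, here is a small sketch using Python's sqlite3 module. The SALE rows are invented for illustration, and the integer Total column is an assumption about the table.

```python
import sqlite3

# Hypothetical SALE rows: (CustomerID, Total).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (CustomerID INTEGER, Total INTEGER)")
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?)",
    [(1, 600), (1, 500), (2, 900), (3, 1500)],
)

# Group-level filter: customers whose purchases sum to more than 1000.
by_total = conn.execute(
    "SELECT CustomerID, SUM(Total) AS TotalBought FROM SALE "
    "GROUP BY CustomerID HAVING SUM(Total) > 1000 ORDER BY CustomerID"
).fetchall()
print(by_total)  # [(1, 1100), (3, 1500)]

# Row-level filter: customers with a single sale above 1000.
by_single_sale = conn.execute(
    "SELECT DISTINCT CustomerID FROM SALE WHERE Total > 1000"
).fetchall()
print(by_single_sale)  # [(3,)]
```

Customer 1 appears only in the first result, because no single sale exceeds 1000 but the two sales together do.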
Subqueries
❑ A subquery is a query embedded in another query
❑ Let’s consider an example:
❑ We want to recognize the best performing salesperson among the
employees.
❑ We need to know the total sales of each employee
❑ Then we get the employee with the highest total.
Subqueries
We want to recognize the best performing salesperson among
the employees.
SELECT TOP 1 EmployeeID, TotalSale
FROM (SELECT EmployeeID, SUM(Total) AS TotalSale
FROM SALE
GROUP BY EmployeeID) AS EmployeeSales
ORDER BY TotalSale DESC;
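The same idea can be run as a sketch with Python's sqlite3 module. Note the assumptions: the SALE rows are made up, and because SQLite has no TOP keyword, the sketch swaps in LIMIT 1 at the end of the outer query.

```python
import sqlite3

# Hypothetical SALE rows: (EmployeeID, Total).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (EmployeeID INTEGER, Total INTEGER)")
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?)",
    [(1, 500), (1, 700), (2, 900), (3, 300), (3, 400)],
)

# Inner query: total sales per employee.
# Outer query: sort those totals and keep the top row (LIMIT 1 replaces TOP 1).
best = conn.execute(
    "SELECT EmployeeID, TotalSale "
    "FROM (SELECT EmployeeID, SUM(Total) AS TotalSale "
    "      FROM SALE GROUP BY EmployeeID) AS EmployeeSales "
    "ORDER BY TotalSale DESC LIMIT 1"
).fetchone()
print(best)  # (1, 1200)
```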

What if we want to retrieve the name of this best performing employee?


Subqueries
• The subqueries are processed starting from the innermost
query and working outward
• Each subquery returns values to the upper-level query
• A subquery used in a WHERE comparison (for example with = or IN) can only return a single column.
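One possible way to retrieve the name of the best performing employee is a single-column subquery in a WHERE clause. The sketch below assumes Python's sqlite3 module, a hypothetical EMPLOYEE table with a Name column, and invented rows; the course database may be structured differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMPLOYEE and its Name column are assumptions made for this example.
conn.execute("CREATE TABLE EMPLOYEE (EmployeeID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE SALE (EmployeeID INTEGER, Total INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [(1, "Alice"), (2, "Bob"), (3, "Carol")],
)
conn.executemany(
    "INSERT INTO SALE VALUES (?, ?)",
    [(1, 500), (1, 700), (2, 900), (3, 300), (3, 400)],
)

# The inner query runs first and returns one column (EmployeeID);
# the outer query then looks up the matching name.
names = conn.execute(
    "SELECT Name FROM EMPLOYEE "
    "WHERE EmployeeID IN (SELECT EmployeeID FROM SALE "
    "                     GROUP BY EmployeeID "
    "                     ORDER BY SUM(Total) DESC LIMIT 1)"
).fetchall()
print(names)  # [('Alice',)]
```

LIMIT 1 is SQLite syntax; in a database that uses TOP, the inner query would be written with SELECT TOP 1 instead.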
QnA!
