Summary of Advanced SQL Features for Analysts


Grouping Data
When data is to be divided into groups by field values, the GROUP BY command
is used:

SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n

Once you know which fields you'll be grouping by, make sure all of them are
listed in both the SELECT block and the GROUP BY block. The aggregate function
itself must not appear in the GROUP BY block; otherwise, the query will fail.
SQL's GROUP BY operates much like the groupby() method in pandas.
GROUP BY can be used with any aggregate function: COUNT, AVG, SUM, MAX,
MIN. You can call several functions in the same query.
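
The template above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module only as a convenient way to run the SQL; the orders table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Oslo", 10.0), ("Oslo", 30.0), ("Rome", 5.0)],
)

# One grouping field (listed in both SELECT and GROUP BY)
# and several aggregate functions in the same query.
query = """
SELECT
    city,
    COUNT(*) AS order_count,
    SUM(amount) AS total_amount,
    AVG(amount) AS avg_amount
FROM
    orders
GROUP BY
    city
"""

# The query itself does not guarantee row order, so sort in Python.
rows = sorted(conn.execute(query).fetchall())
print(rows)
```

Each city yields exactly one output row, no matter how many orders it has.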

Sorting Data
Analysis results are usually presented in a certain order. To sort data by a field,
you use the ORDER BY command.

SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n
ORDER BY -- if needed. List only those fields
-- by which the table data is to be sorted
field_1,
field_2,
...,
field_n,
here_you_are;

Unlike GROUP BY, ORDER BY should list only those fields by which we want to
sort the data.
Two modifiers can be used with the ORDER BY command to sort the data in
columns:

ASC (the default) sorts data in ascending order.

DESC sorts data in descending order.

The ORDER BY modifiers are placed right after the field by which the data is
sorted:

ORDER BY
field_name DESC
-- sorting data in descending order

ORDER BY
field_name ASC;
-- sorting data in ascending order
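
As a minimal sketch of how the modifiers combine, the query below sorts by one field ascending and by another descending. It runs through Python's sqlite3 module; the employees table and its rows are made up for the example.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "ops", 50), ("Bob", "dev", 70), ("Cy", "dev", 90)],
)

# Departments in ascending order; within a department,
# the highest salary comes first.
rows = conn.execute(
    """
    SELECT
        name,
        dept,
        salary
    FROM
        employees
    ORDER BY
        dept ASC,
        salary DESC
    """
).fetchall()
print(rows)
```

Each modifier applies only to the field it follows, so dept and salary can be sorted in opposite directions.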

The LIMIT command caps the number of rows in the result. It comes at the end of
the statement, followed by the maximum number of rows to return (n):

SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n
ORDER BY -- if needed. List only those fields
-- by which the table data is to be sorted
field_1,
field_2,
...,
field_n,
here_you_are
LIMIT -- if needed
n;
-- n: the maximum number of rows to be returned
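
A common use of this template is a "top n" report: group, sort by the aggregate in descending order, then cut the result off with LIMIT. The sketch below assumes a hypothetical orders table and runs the SQL through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Oslo", 10), ("Oslo", 30), ("Rome", 5), ("Kyiv", 50), ("Kyiv", 1)],
)

# Top 2 cities by total order amount.
rows = conn.execute(
    """
    SELECT
        city,
        SUM(amount) AS total_amount
    FROM
        orders
    GROUP BY
        city
    ORDER BY
        total_amount DESC
    LIMIT
        2
    """
).fetchall()
print(rows)
```

Without ORDER BY, LIMIT would still return two rows, but which two would not be guaranteed.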

Processing Data within a Grouping


The WHERE construction filters data row by row: its conditions are applied to
individual table rows. When we need to filter by aggregate function results, we
use the HAVING construction, which has a lot in common with WHERE:

SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n
HAVING
AGGREGATE_FUNCTION(field) > value
ORDER BY -- if needed. List only those fields
--by which the data is to be sorted
field_1,
field_2,
...,
field_n,
here_you_are
LIMIT -- if needed
n;

The result is built only from rows that pass the WHERE condition and, after
grouping, includes only those groups for which the aggregate function produces
a result that meets the HAVING condition.

HAVING and WHERE have a lot in common. So why can't we pass all of our
conditions to one of them? The thing is that WHERE is evaluated before grouping
and aggregation are carried out. That's why it's impossible to filter on the
results of an aggregate function with WHERE. Hence the need for HAVING.
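
The division of labor between the two can be sketched as follows: WHERE discards rows before grouping, and HAVING discards whole groups afterwards. The orders table and its contents are hypothetical, and Python's sqlite3 module is used only to run the SQL.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Oslo", 10), ("Oslo", 30), ("Oslo", 2),
     ("Rome", 5), ("Rome", 8), ("Kyiv", 50)],
)

rows = conn.execute(
    """
    SELECT
        city,
        COUNT(*) AS order_count,
        SUM(amount) AS total_amount
    FROM
        orders
    WHERE              -- row filter: applied before grouping
        amount > 5
    GROUP BY
        city
    HAVING             -- group filter: applied to aggregate results
        COUNT(*) >= 2
    """
).fetchall()
print(rows)
```

Rome has two orders in the table, but only one of them survives the WHERE filter, so the Rome group no longer satisfies the HAVING condition.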

Pay special attention to the order in which the commands are introduced:

1 GROUP BY

2 HAVING

3 ORDER BY

This order is mandatory. A query that breaks it fails with a syntax error.
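
To see the rule in action, the sketch below runs one query with the clauses in the required order and one with ORDER BY placed before GROUP BY. It uses SQLite through Python's sqlite3 module, where the second query is rejected with sqlite3.OperationalError; other database systems report the problem with their own error types. The orders table is hypothetical.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Oslo", 10), ("Oslo", 30), ("Rome", 5)],
)

# Correct order: GROUP BY, then HAVING, then ORDER BY.
good_query = """
SELECT city, COUNT(*) AS order_count
FROM orders
GROUP BY city
HAVING COUNT(*) > 1
ORDER BY city
"""
good_rows = conn.execute(good_query).fetchall()

# Wrong order: ORDER BY placed before GROUP BY.
bad_query = """
SELECT city, COUNT(*) AS order_count
FROM orders
ORDER BY city
GROUP BY city
"""
try:
    conn.execute(bad_query)
    bad_query_failed = False
except sqlite3.OperationalError:
    # SQLite rejects the statement while parsing it.
    bad_query_failed = True

print(good_rows, bad_query_failed)
```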

Operators and Functions for Working with Dates


We have two major functions for working with date and time values: EXTRACT
and DATE_TRUNC (i.e. truncate date). Both functions are called in the SELECT
block.

Here's what the EXTRACT function looks like:

SELECT
EXTRACT(date_fragment FROM column_name) AS new_column_with_date
FROM
Table_with_all_dates;

EXTRACT, unsurprisingly, extracts the information you need from the timestamp.
You can retrieve:

century

day

doy — day of the year, from 1 to 365/366

isodow (day of the week under ISO 8601, the international date and time
format); Monday is 1, Sunday is 7

hour

milliseconds

minute

second

month

quarter

week — week of the year

year
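
To make the fragments concrete, the sketch below computes several of them for one sample timestamp. It does not call EXTRACT: SQLite, which ships with Python, has no such function, so Python's datetime is used as a stand-in. The mapping assumes PostgreSQL-style semantics (in particular, that week means the ISO 8601 week number); the timestamp is hypothetical.

```python
from datetime import datetime

# Hypothetical timestamp: Friday, 15 March 2024, 14:30:45.
ts = datetime(2024, 3, 15, 14, 30, 45)

# Python stand-ins for what EXTRACT(fragment FROM ts) would return,
# assuming PostgreSQL-style semantics.
extracted = {
    "year": ts.year,
    "quarter": (ts.month - 1) // 3 + 1,
    "month": ts.month,
    "week": ts.isocalendar()[1],    # ISO 8601 week of the year
    "day": ts.day,
    "doy": ts.timetuple().tm_yday,  # day of the year
    "isodow": ts.isoweekday(),      # Monday is 1, Sunday is 7
    "hour": ts.hour,
    "minute": ts.minute,
}
print(extracted)
```

Every fragment comes back as a number, which makes the results easy to group by or compare.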

DATE_TRUNC truncates the date when you only need a certain level of precision.
For example, if you need to know what day an order was placed but the hour
doesn't matter, you can use DATE_TRUNC with the argument 'day'. Unlike with
EXTRACT, the fragment to truncate to is passed as a string (in quotes), and the
result is still a date and time value rather than a number. The column from
which the full date is to be taken comes after a comma:

SELECT
DATE_TRUNC('date_fragment_to_be_truncated_to', column_name) AS new_column_with_date
FROM
Table_with_all_dates;

You can use the following arguments with the DATE_TRUNC function:

'microseconds'

'milliseconds'

'second'

'minute'

'hour'

'day'

'week'

'month'

'quarter'

'year'

'decade'

'century'
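
The effect of truncation can be illustrated with a rough Python stand-in. The date_trunc helper below is hypothetical and is not the database function: it covers only a handful of fragments and assumes PostgreSQL-style behavior, where 'week' truncates to the Monday of that week.

```python
from datetime import datetime, timedelta


def date_trunc(fragment, ts):
    """Rough Python stand-in for DATE_TRUNC; covers a few fragments only."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if fragment == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if fragment == "day":
        return midnight
    if fragment == "week":
        # weekday() is 0 for Monday, so this steps back to Monday.
        return midnight - timedelta(days=ts.weekday())
    if fragment == "month":
        return midnight.replace(day=1)
    if fragment == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unsupported fragment: {fragment}")


# Hypothetical timestamp: Friday, 15 March 2024, 14:30:45.
ts = datetime(2024, 3, 15, 14, 30, 45)
for fragment in ("hour", "day", "week", "month", "year"):
    print(fragment, date_trunc(fragment, ts))
```

Everything below the chosen precision is reset to its lowest value, so all timestamps from the same day, week, or month collapse to a single value that can be grouped on.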

Subqueries
A subquery, or inner query, is a query inside a query. It retrieves information that
will later be used in the outer query.

Subqueries can be used at various locations within a query. If a subquery is
inside the FROM block, SELECT will select data from the table generated by the
subquery. That table is given a name (an alias) right after the subquery, and
the outer query uses this name to refer to the table's columns. Subqueries are
always put in parentheses:

SELECT
SUBQUERY_1.column_name,
SUBQUERY_1.column_name_2
FROM -- to make the code readable, put subqueries in new lines
-- indent subqueries
(SELECT
column_name,
column_name_2
FROM
table_name
WHERE
column_name = value) AS SUBQUERY_1;
-- remember to name your subquery in the FROM block
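
A subquery in the FROM block is handy when the outer query needs to work with already aggregated data. In the sketch below, the inner query computes totals per city and the outer query filters them. The orders table, the city_totals alias, and the data are all hypothetical; Python's sqlite3 module is used only to run the SQL.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Oslo", 10), ("Oslo", 30), ("Rome", 5), ("Kyiv", 50)],
)

rows = conn.execute(
    """
    SELECT
        city_totals.city,
        city_totals.total_amount
    FROM
        (SELECT
            city,
            SUM(amount) AS total_amount
        FROM
            orders
        GROUP BY
            city) AS city_totals
    WHERE
        city_totals.total_amount > 20
    ORDER BY
        city_totals.city
    """
).fetchall()
print(rows)
```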

You may need subqueries at various places within your query. Let's put one in the
WHERE block. The main query will compare the result of the subquery with
values from the table in the outer FROM block. When there's a match, the data
will be selected. With the = operator, the subquery should return a single value:

SELECT
column_name,
column_name_1
FROM
table_name
WHERE
column_name =
(SELECT
column_1
FROM
table_name_2
WHERE
column_1 = value);
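
One typical case is comparing a column against a single value computed on the fly. The sketch below selects the employee whose salary equals the maximum salary; the employees table and its rows are hypothetical, and the SQL is run through Python's sqlite3 module.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 50), ("Bob", 70), ("Cy", 90)],
)

# The subquery returns exactly one value, which the outer
# query compares against each row's salary.
rows = conn.execute(
    """
    SELECT
        name,
        salary
    FROM
        employees
    WHERE
        salary =
            (SELECT
                MAX(salary)
            FROM
                employees)
    """
).fetchall()
print(rows)
```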

Now let's add the IN construction to our sample so that the subquery can return
several values, any of which counts as a match:

SELECT
column_name,
column_name_1
FROM
table_name
WHERE
column_name IN
(SELECT
column_1
FROM
table_name_2
WHERE
column_1 = value_1 OR column_1 = value_2);
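
IN is most useful when the list of values comes from another table. As a sketch, the query below picks customers who have at least one order above a given amount. The customers and orders tables and their contents are hypothetical; Python's sqlite3 module is used only to run the SQL.

```python
import sqlite3

# Hypothetical tables and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Cy")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 30), (2, 5), (3, 25), (1, 50)],
)

# The subquery returns several customer ids; IN keeps every
# customer whose id appears among them.
rows = conn.execute(
    """
    SELECT
        name
    FROM
        customers
    WHERE
        id IN
            (SELECT
                customer_id
            FROM
                orders
            WHERE
                amount > 20)
    ORDER BY
        name
    """
).fetchall()
print(rows)
```

Ann appears once in the result even though two of her orders match, because IN only checks membership.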
