Summary of Advanced SQL Features For Analysts Theme 3
SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n
Once you know which fields you'll be grouping by, make sure all those fields are
listed in both the SELECT block and the GROUP BY block. The aggregate function
itself shouldn't be included in the GROUP BY block; otherwise, the query will
fail. SQL's GROUP BY operates much like the groupby() method in pandas.
GROUP BY can be used with any aggregate function: COUNT, AVG, SUM, MAX,
MIN. You can call several functions at a time.
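The grouping rule above can be tried out with a minimal sketch that runs the SQL through Python's built-in sqlite3 module. The orders table, its columns, and its rows are invented for illustration and are not part of the original summary.

```python
import sqlite3

# Hypothetical data: an in-memory "orders" table made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, category TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Berlin", "books", 10.0),
        ("Berlin", "books", 30.0),
        ("Berlin", "games", 50.0),
        ("Paris", "books", 20.0),
    ],
)

# Every non-aggregated field in SELECT also appears in GROUP BY.
# The aggregate functions themselves are not listed in GROUP BY.
grouped_rows = conn.execute(
    """
    SELECT
        city,
        category,
        COUNT(*) AS order_count,
        SUM(amount) AS total_amount,
        AVG(amount) AS avg_amount
    FROM
        orders
    GROUP BY
        city,
        category
    """
).fetchall()

# One row per (city, category) pair; row order is not guaranteed here.
for row in grouped_rows:
    print(row)
```

Several aggregate functions are called in the same query, and each one is computed separately for every (city, category) group.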
Sorting Data
Analysis results are usually presented in a certain order. To sort data by a field,
you use the ORDER BY command.
Unlike GROUP BY, with ORDER BY, only those fields by which we want to sort the
data should be listed in the command block.
Two modifiers can be used with the ORDER BY command to sort the data in
columns: ASC sorts in ascending order (the default), and DESC sorts in
descending order.
The ORDER BY modifiers are placed right after the field by which the data is
sorted:
ORDER BY
field_name DESC
-- sorting data in descending order
ORDER BY
field_name ASC;
-- sorting data in ascending order
The LIMIT command sets a limit to the number of rows in the result. It always
comes at the end of a statement, followed by the maximum number of rows to
return:
SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n
ORDER BY -- if needed. List only those fields
-- by which the table data is to be sorted
field_1,
field_2,
...,
field_n,
here_you_are
LIMIT -- if needed
n;
-- n: the maximum number of rows to be returned
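As a rough illustration of the template above, the following sketch sorts grouped totals and keeps only the top rows. It runs through Python's built-in sqlite3 module; the sales table and its values are assumptions made up for the example.

```python
import sqlite3

# Hypothetical data: a small "sales" table invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 5),
        ("south", 20),
        ("north", 15),
        ("east", 8),
        ("west", 1),
        ("south", 2),
    ],
)

# Sort by the aggregated column (largest first), break ties by region name,
# then keep at most two rows.
top_regions = conn.execute(
    """
    SELECT
        region,
        SUM(amount) AS total_amount
    FROM
        sales
    GROUP BY
        region
    ORDER BY
        total_amount DESC,
        region ASC
    LIMIT
        2
    """
).fetchall()

for row in top_regions:
    print(row)
```

Only the sort fields appear in ORDER BY, and LIMIT is applied last, after the rows have already been sorted.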
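The HAVING command filters groups by the result of an aggregate function. It
comes right after GROUP BY:
SELECT
field_1,
field_2,
...,
field_n,
AGGREGATE_FUNCTION(field) AS here_you_are
FROM
table_name
WHERE -- if needed
condition
GROUP BY
field_1,
field_2,
...,
field_n
HAVING
AGGREGATE_FUNCTION(field) > n;
-- any condition on the aggregate result; the comparison with n is only an example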
The resulting selection will include only those groups for which the aggregate
function produces a result that meets the HAVING condition. Rows that fail the
WHERE condition are discarded before grouping, so they never reach the
aggregate function.
HAVING and WHERE have a lot in common. So why can't we pass all of our
conditions to one of them? The thing is that the WHERE command is evaluated
before grouping and aggregation are carried out. That's why it's
impossible to filter on the results of an aggregate function with
WHERE. Hence the need for HAVING.
Pay special attention to the order in which the commands are introduced:
1. GROUP BY
2. HAVING
3. ORDER BY
This order is mandatory. Otherwise, the code won't work.
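The split between WHERE and HAVING can be sketched as follows, again using Python's built-in sqlite3 module as a stand-in database. The orders table, its columns, and the 100 threshold are hypothetical. The second query is deliberately wrong, to show that an aggregate inside WHERE is rejected.

```python
import sqlite3

# Hypothetical data: an "orders" table invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "paid", 100),
        ("ann", "paid", 50),
        ("bob", "paid", 40),
        ("bob", "paid", 30),
        ("bob", "cancelled", 500),
        ("cy", "paid", 200),
    ],
)

# WHERE drops cancelled rows before grouping; HAVING then filters the groups
# by their aggregated total. The blocks follow the mandatory order:
# GROUP BY, HAVING, ORDER BY.
big_spenders = conn.execute(
    """
    SELECT
        customer,
        SUM(amount) AS total_paid
    FROM
        orders
    WHERE
        status = 'paid'
    GROUP BY
        customer
    HAVING
        SUM(amount) > 100
    ORDER BY
        total_paid DESC
    """
).fetchall()
print(big_spenders)

# Putting the aggregate condition in WHERE fails, because WHERE is evaluated
# before any groups exist.
try:
    conn.execute(
        "SELECT customer FROM orders WHERE SUM(amount) > 100 GROUP BY customer"
    )
    where_rejected_aggregate = False
except sqlite3.Error as exc:
    where_rejected_aggregate = True
    print("WHERE cannot use an aggregate:", exc)
```

Note that bob's cancelled order of 500 never reaches SUM, so bob's total stays at 70 and the group is filtered out by HAVING.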
SELECT
EXTRACT(date_fragment FROM column_name) AS new_column_with_date
FROM
Table_with_all_dates;
EXTRACT, unsurprisingly, extracts the information you need from the timestamp.
You can retrieve:
century
day
isodow (day of the week under ISO 8601, the international date and time
format); Monday is 1, Sunday is 7
hour
milliseconds
minute
second
month
quarter
year
DATE_TRUNC truncates the date when you only need a certain level of precision.
For example, if you need to know what day an order was placed but the hour
doesn't matter, you can use DATE_TRUNC with the argument 'day'. Unlike
EXTRACT, which returns a number, DATE_TRUNC returns a full date with everything
below the chosen precision reset (in PostgreSQL, for instance, a timestamp).
The precision comes first, as a string; the column from which the full date is
to be taken comes after a comma:
SELECT
DATE_TRUNC('date_fragment_to_be_truncated_to', column_name) AS new_column_with_date
FROM
Table_with_all_dates;
You can use the following arguments with the DATE_TRUNC function:
'microseconds'
'milliseconds'
'second'
'minute'
'hour'
'day'
'week'
'month'
'quarter'
'year'
'decade'
'century'
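To make the difference between the two functions concrete, here is a small sketch in plain Python rather than SQL. It only mirrors what EXTRACT and DATE_TRUNC compute, assuming PostgreSQL-style behavior (weeks start on Monday). The sample timestamp and the truncate helper are hypothetical and cover just a few of the units listed above.

```python
from datetime import datetime, timedelta

# Hypothetical sample timestamp: Sunday, 17 March 2024, 14:45:30.
ts = datetime(2024, 3, 17, 14, 45, 30)

# EXTRACT-like behavior: each fragment comes back as a plain number.
extracted = {
    "year": ts.year,
    "quarter": (ts.month - 1) // 3 + 1,
    "month": ts.month,
    "day": ts.day,
    "isodow": ts.isoweekday(),  # ISO 8601: Monday is 1, Sunday is 7
    "hour": ts.hour,
}


def truncate(value, unit):
    """Hypothetical helper: reset everything below `unit`, like DATE_TRUNC."""
    midnight = value.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit == "hour":
        return value.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return midnight
    if unit == "week":
        # Assumption: the week starts on Monday.
        return midnight - timedelta(days=value.weekday())
    if unit == "month":
        return midnight.replace(day=1)
    if unit == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unsupported unit: {unit}")


# DATE_TRUNC-like behavior: the result is still a full date and time.
truncated = {
    unit: truncate(ts, unit) for unit in ("hour", "day", "week", "month", "year")
}

print(extracted)
for unit, value in truncated.items():
    print(unit, value.isoformat())
```

The extracted values are numbers that have lost their context (a bare 3 for the month), while the truncated values remain complete dates that can still be sorted and grouped chronologically.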
Subqueries
A subquery, or inner query, is a query inside a query. It retrieves information that
will later be used in the outer query.
SELECT
SUBQUERY_1.column_name,
SUBQUERY_1.column_name_2
FROM -- to make the code readable, put subqueries in new lines
-- indent subqueries
(SELECT
column_name,
column_name_2
FROM
table_name
WHERE
column_name = value) AS SUBQUERY_1;
You may need subqueries at various places within your query. Let's put one in the
WHERE block. The main query will compare the results of the subquery with
values from the table in the outer FROM block. When there's a match, the data
will be selected:
SELECT
column_name,
column_name_1
FROM
table_name
WHERE
column_name =
(SELECT
column_1
FROM
table_name_2
WHERE
column_1 = value);
Now let's add the IN construction to our sample so that the outer query matches
against several values returned by the subquery:
SELECT
column_name,
column_name_1
FROM
table_name
WHERE
column_name IN
(SELECT
column_1
FROM
table_name_2
WHERE
column_1 = value_1 OR column_1 = value_2);
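Both WHERE patterns above, the single-value comparison with = and the multi-value match with IN, can be sketched with Python's built-in sqlite3 module. The categories and products tables and their contents are assumptions invented for this example.

```python
import sqlite3

# Hypothetical data: two small tables invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE products (name TEXT, category_id INTEGER)")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?)",
    [(1, "fruit"), (2, "dairy"), (3, "tools")],
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("apple", 1), ("milk", 2), ("hammer", 3), ("pear", 1)],
)

# "=" expects the subquery to return a single value.
tool_products = conn.execute(
    """
    SELECT
        name
    FROM
        products
    WHERE
        category_id =
            (SELECT
                id
            FROM
                categories
            WHERE
                name = 'tools')
    """
).fetchall()

# "IN" accepts a subquery that returns several values.
food_products = conn.execute(
    """
    SELECT
        name
    FROM
        products
    WHERE
        category_id IN
            (SELECT
                id
            FROM
                categories
            WHERE
                name = 'fruit' OR name = 'dairy')
    ORDER BY
        name
    """
).fetchall()

print(tool_products)
print(food_products)
```

In the second query the subquery returns two ids, which is why IN is needed there; with = the comparison is meant for a subquery that yields exactly one value.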