Window Lag - Sqlzoo

12/26/21, 9:48 PM Window LAG - SQLZOO

Window LAG

Contents
COVID-19 Data
Window Function
Introducing the covid table
Introducing the LAG function
LAG operation
Number of new cases
Weekly changes
LAG using a JOIN
RANK()
Infection rate
Turning the corner

COVID-19 Data
Notes on the data:
This data was assembled based on work done by Rodrigo Pombo (https://github.com/pomber/covid19), which draws on data from Johns Hopkins
University (https://systems.jhu.edu/research/public-health/ncov/), which in turn draws on the World Health Organisation (https://www.who.int/health-topics/coronavirus).
The data was assembled 21st April 2020 - there are no plans to keep this data set up to date.

Window Function
The SQL Window functions include LAG, LEAD, RANK and NTILE. These functions operate over a "window" of rows - typically these are rows in the table
that are in some sense adjacent.
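
As a rough illustration of what "a window of rows" means, the sketch below runs all four functions over a tiny table. It is not part of the SQLZOO exercises: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (rather than the MySQL/MariaDB engine used on this page), and the table name demo is hypothetical. The four values are the first four Italian figures from the March 2020 results shown later on this page.

```python
import sqlite3

# Hypothetical in-memory table; SQLite 3.25+ is assumed for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (day_ INTEGER, confirmed INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?, ?)",
                 [(1, 1694), (2, 2036), (3, 2502), (4, 3089)])

rows = conn.execute("""
    SELECT day_,
           confirmed,
           LAG(confirmed, 1)  OVER (ORDER BY day_)           AS prev_row,
           LEAD(confirmed, 1) OVER (ORDER BY day_)           AS next_row,
           RANK()             OVER (ORDER BY confirmed DESC) AS rnk,
           NTILE(2)           OVER (ORDER BY day_)           AS half
    FROM demo
    ORDER BY day_
""").fetchall()

for row in rows:
    print(row)
```

LAG and LEAD look one row back and one row forward, so the first row has no prev_row and the last row has no next_row. RANK numbers the rows by confirmed, and NTILE(2) splits the ordered rows into two equal groups.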

Introducing the covid table

1.
The example uses a WHERE clause to show the cases in 'Italy' in March 2020.

Modify the query to show data from Spain

SELECT name, DAY(whn) day_, confirmed, deaths, recovered
FROM covid
WHERE (name = 'Spain') AND (MONTH(whn) = 3)
ORDER BY whn;

Result:
name day_ confirmed deaths recovered
Spain 1 84 0 2
Spain 2 120 0 2
Spain 3 165 1 2
Spain 4 222 2 2
Spain 5 259 3 2
Spain 6 400 5 2
Spain 7 500 10 30
Spain 8 673 17 30
Spain 9 1073 28 32
Spain 10 1695 35 32

https://sqlzoo.net/wiki/Window_LAG 1/6

Introducing the LAG function


Note for MySQL:
If you are using the MariaDB engine you will hit the bug https://jira.mariadb.org/browse/MDEV-23866

You can use the Microsoft SQL Server engine instead


You can include this line before each query:

SET @@sql_mode='ANSI';

2.
The LAG function is used to show data from the preceding row of the table.
When lining up rows the data is partitioned by country name and ordered by the
date whn. That means that, for a row about Italy, only rows from Italy are considered.

Modify the query to show confirmed for the day before.

SET @@sql_mode='ANSI';

SELECT name, DAY(whn) day_, confirmed, LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) confirmed_yday
FROM covid
WHERE (name = 'Italy') AND (MONTH(whn) = 3)
ORDER BY whn;

Result:
name day_ confirmed confirmed_yday
Italy 1 1694
Italy 2 2036 1694
Italy 3 2502 2036
Italy 4 3089 2502
Italy 5 3858 3089
Italy 6 4636 3858
Italy 7 5883 4636
Italy 8 7375 5883

LAG operation
Here is the correct query showing the cases for the day before:

SELECT name, DAY(whn), confirmed,
       LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS dbf
FROM covid
WHERE name = 'Italy'
  AND MONTH(whn) = 3
ORDER BY whn

Notice how the values in the dbf column match the value of the row diagonally above and to the left.

name DAY(whn) confirmed dbf
Italy 1 1694 null
Italy 2 2036 1694
Italy 3 2502 2036
Italy 4 3089 2502
Italy 5 3858 3089
Italy 6 4636 3858
Italy 7 5883 4636
Italy 8 7375 5883
Italy 9 9172 7375
Italy 10 10149 9172
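
To see what PARTITION BY name contributes, the following sketch puts Italy and Spain in the same table and omits the WHERE clause. It assumes Python's sqlite3 module (SQLite 3.25 or later) instead of the page's MySQL engine; the figures are the first three days of March from the result tables above, and the date strings are stored as plain text for simplicity.

```python
import sqlite3

# Two countries interleaved in one table, as in the real covid table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", [
    ("Italy", "2020-03-01", 1694), ("Spain", "2020-03-01", 84),
    ("Italy", "2020-03-02", 2036), ("Spain", "2020-03-02", 120),
    ("Italy", "2020-03-03", 2502), ("Spain", "2020-03-03", 165),
])

# The window restarts for each name, so LAG never crosses countries.
lagged = conn.execute("""
    SELECT name, whn, confirmed,
           LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS dbf
    FROM covid
    ORDER BY name, whn
""").fetchall()

for row in lagged:
    print(row)
```

The first row of each country has no preceding row inside its partition, so dbf is NULL (None in Python) for both 1 March rows. Spain's first row does not pick up Italy's last value.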

...

Number of new cases

3.
The number of confirmed cases is cumulative, but we can use LAG to recover the number of new cases reported for each day.

Show the number of new cases for each day, for Italy, for March.

SET @@sql_mode='ANSI';

SELECT name, DAY(whn) day_, (confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)) AS day_count
FROM covid
WHERE (name = 'Italy') AND (MONTH(whn) = 3)
ORDER BY whn;

Result:
name day_ day_count
Italy 1
Italy 2 342
Italy 3 466
Italy 4 587
Italy 5 769
Italy 6 778
Italy 7 1247
Italy 8 1492
Italy 9 1797
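
The arithmetic behind this can be checked with a small sketch: subtracting the lagged value turns a running total into daily counts, and adding those counts back to the first total should reproduce the last one. The example assumes Python's sqlite3 module (SQLite 3.25 or later) and a hypothetical single-country table named italy holding the first five cumulative figures from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE italy (day_ INTEGER, confirmed INTEGER)")
conn.executemany("INSERT INTO italy VALUES (?, ?)",
                 [(1, 1694), (2, 2036), (3, 2502), (4, 3089), (5, 3858)])

# Cumulative total minus the previous day's total gives the new cases.
daily = conn.execute("""
    SELECT day_,
           confirmed - LAG(confirmed, 1) OVER (ORDER BY day_) AS day_count
    FROM italy
    ORDER BY day_
""").fetchall()

# Day 1 has no previous row, so its day_count is NULL and is skipped here.
new_cases = [count for _, count in daily if count is not None]
rebuilt_total = 1694 + sum(new_cases)

print(daily)
print(rebuilt_total)
```

The daily counts agree with the result table (342, 466, 587, 769), and the rebuilt total equals the day 5 cumulative figure of 3858.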

Weekly changes

4.
The data gathered are necessarily estimates and are inaccurate. However, by taking a longer time span we can mitigate some of the effects.

You can filter the data to view only Monday's figures using WHERE WEEKDAY(whn) = 0.

Show the number of new cases in Italy for each week in 2020 - show Monday only.

SET @@sql_mode='ANSI';

SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date_, (confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)) AS week_count
FROM covid
WHERE (name = 'Italy') AND (WEEKDAY(whn) = 0)
ORDER BY whn;

Result:
name date_ week_count
Italy 2020-01-27
Italy 2020-02-03 2
Italy 2020-02-10 1
Italy 2020-02-17 0
Italy 2020-02-24 226
Italy 2020-03-02 1807
Italy 2020-03-09 7136
Italy 2020-03-16 18808
Italy 2020-03-23 35947
Italy 2020-03-30 37812
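
The weekly query works because the WHERE clause is applied before the window function, so LAG only ever sees Monday rows and "the previous row" means "the previous Monday". The sketch below illustrates this with the ten daily Italian figures from the LAG operation table. It assumes Python's sqlite3 module (SQLite 3.25 or later); SQLite has no WEEKDAY function, so strftime('%w', whn) = '1' is used as a stand-in for the Monday test (in SQLite, '%w' counts from Sunday = 0).

```python
import sqlite3

confirmed = [1694, 2036, 2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149]
rows = [("Italy", f"2020-03-{day:02d}", value)
        for day, value in enumerate(confirmed, start=1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", rows)

# Only Mondays survive the WHERE clause; LAG then steps Monday to Monday.
weekly = conn.execute("""
    SELECT whn,
           confirmed,
           confirmed - LAG(confirmed, 1)
               OVER (PARTITION BY name ORDER BY whn) AS week_count
    FROM covid
    WHERE name = 'Italy' AND strftime('%w', whn) = '1'
    ORDER BY whn
""").fetchall()

print(weekly)
```

With only ten days of data there are two Mondays, 2 March and 9 March. The second one shows a weekly increase of 7136, matching the 2020-03-09 row in the result table.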

LAG using a JOIN

5.
You can JOIN a table using DATE arithmetic. This will give different results if data is missing.

Show the number of new cases in Italy for each week - show Monday only.

In the sample query we JOIN this week tw with last week lw using the DATE_ADD function.

SELECT tw.name, DATE_FORMAT(tw.whn,'%Y-%m-%d') date_, (tw.confirmed - lw.confirmed) week_count
FROM covid tw LEFT JOIN covid lw ON (DATE_ADD(lw.whn, INTERVAL 1 WEEK) = tw.whn) AND (tw.name = lw.name)
WHERE (tw.name = 'Italy') AND (WEEKDAY(tw.whn) = 0)
ORDER BY tw.whn;

Correct answer
name date_ week_count
Italy 2020-01-27
Italy 2020-02-03 2
Italy 2020-02-10 1
Italy 2020-02-17 0
Italy 2020-02-24 226
Italy 2020-03-02 1807
Italy 2020-03-09 7136
Italy 2020-03-16 18808
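
The remark that a JOIN "will give different results if data is missing" can be made concrete with a sketch. It assumes Python's sqlite3 module (SQLite 3.25 or later), with date(lw.whn, '+7 days') standing in for MySQL's DATE_ADD. The table holds Monday rows only, and the row for 2020-03-09 is deliberately left out to simulate a gap; that gap is hypothetical. The 2020-03-16 total of 27980 is derived from the tables above (9172 plus the weekly increase of 18808).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", [
    ("Italy", "2020-03-02", 2036),
    # ("Italy", "2020-03-09", 9172) is omitted on purpose (simulated gap)
    ("Italy", "2020-03-16", 27980),
])

# LAG uses the previous row that exists, whatever its date.
with_lag = conn.execute("""
    SELECT whn,
           confirmed - LAG(confirmed, 1)
               OVER (PARTITION BY name ORDER BY whn) AS week_count
    FROM covid
    ORDER BY whn
""").fetchall()

# The self-join insists on a row dated exactly one week earlier.
with_join = conn.execute("""
    SELECT tw.whn, tw.confirmed - lw.confirmed AS week_count
    FROM covid tw LEFT JOIN covid lw
      ON date(lw.whn, '+7 days') = tw.whn AND lw.name = tw.name
    ORDER BY tw.whn
""").fetchall()

print(with_lag)
print(with_join)
```

With the gap in place, LAG reports 25944 for 16 March, which is really a two-week change, while the JOIN reports NULL because no row exists for the preceding Monday. Which behaviour is preferable depends on whether a silent two-week figure or an explicit blank is the lesser evil.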

RANK()

6.
The query shows the number of confirmed cases together with the world ranking for cases.

United States has the highest number, Spain is number 2...

Notice that while Spain has the second highest confirmed cases, Italy has the second highest number of deaths due to the virus.

Include the ranking for the number of deaths in the table.

SET @@sql_mode='ANSI';

SELECT name, confirmed, RANK() OVER (ORDER BY confirmed DESC) rnk_confirmed, deaths, RANK() OVER (ORDER BY deaths DESC) rnk_deaths
FROM covid
WHERE whn = '2020-04-20'
ORDER BY confirmed DESC;

Correct answer
name confirmed rnk_confirmed deaths rnk_deaths
US 799022 1 45179 1
Spain 200210 2 20852 3
Italy 181228 3 24114 2
France 154402 4 20241 4
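
Two RANK() calls in the same SELECT can each use their own ordering, which is what lets the two rankings sit side by side. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) and a table containing only the four rows listed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, confirmed INTEGER, deaths INTEGER)")
conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", [
    ("US", 799022, 45179),
    ("Spain", 200210, 20852),
    ("Italy", 181228, 24114),
    ("France", 154402, 20241),
])

# Each OVER clause defines an independent ordering for its RANK().
ranking = conn.execute("""
    SELECT name,
           RANK() OVER (ORDER BY confirmed DESC) AS rnk_confirmed,
           RANK() OVER (ORDER BY deaths DESC)    AS rnk_deaths
    FROM covid
    ORDER BY confirmed DESC
""").fetchall()

print(ranking)
```

Spain and Italy swap places between the two rankings, as in the result table.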

Infection rate

7.
The query shown includes a JOIN to the world table so we can access the total population of each country and calculate infection rates (in cases per 100,000).

Show the infection rate ranking for each country.

Only include countries with a population of at least 10 million.

SET @@sql_mode='ANSI';

SELECT world.name, ROUND(100000*confirmed/population,0) infection_rates_per_100000, RANK() OVER (ORDER BY confirmed/population) rank_infection_rates
FROM covid JOIN world ON covid.name = world.name
WHERE whn = '2020-04-20' AND population > 10000000
ORDER BY population DESC;

Correct answer
name infection_rates_per_100000 rank_infection_rates
China 6 39
India 1 1
Indonesia 3 1
Brazil 20 57
Pakistan 5 39
Nigeria 0 1
Bangladesh 2 1
Russia 32 61
Japan 9 39
Mexico 7 39
Philippines 6 39
Vietnam 0 1
Ethiopia 0 1
Egypt 4 1
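
One way to rank by infection rate is to order the window by the scaled rate itself, highest first. The sketch below is an illustration under stated assumptions, not the SQLZOO reference answer: it uses Python's sqlite3 module (SQLite 3.25 or later), and the countries Alpha, Beta, Gamma and Delta with their figures are entirely hypothetical. The factor 100000.0 is written as a real number because SQLite would otherwise perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, confirmed INTEGER)")
conn.execute("CREATE TABLE world (name TEXT, population INTEGER)")
# Hypothetical countries and figures, chosen to give round rates.
conn.executemany("INSERT INTO covid VALUES (?, ?)",
                 [("Alpha", 5000), ("Beta", 1200), ("Gamma", 90000), ("Delta", 800)])
conn.executemany("INSERT INTO world VALUES (?, ?)",
                 [("Alpha", 20000000), ("Beta", 12000000),
                  ("Gamma", 60000000), ("Delta", 500000)])

# Rank on the same scaled expression that is displayed, highest rate first.
rates = conn.execute("""
    SELECT world.name,
           ROUND(100000.0 * confirmed / population, 0) AS rate_per_100000,
           RANK() OVER (ORDER BY 100000.0 * confirmed / population DESC) AS rnk
    FROM covid JOIN world ON covid.name = world.name
    WHERE population >= 10000000
    ORDER BY rnk
""").fetchall()

print(rates)
```

Delta has the highest rate of all four but is filtered out by the population condition before the ranking is computed, so it does not take up a rank.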


Turning the corner

8.
For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.

SET @@sql_mode='ANSI';

SELECT name, date, newcases
FROM (SELECT name, date, newcases,
             RANK() OVER (PARTITION BY a.name
                          ORDER BY a.newcases DESC) rank
      FROM (SELECT name,
                   DATE_FORMAT(whn,'%Y-%m-%d') date,
                   confirmed - LAG(confirmed,1)
                     OVER (PARTITION BY name ORDER BY whn) newcases
            FROM covid) a) b
WHERE newcases >= 1000 AND rank = 1
ORDER BY date;

# a and b are derived tables (aliases for the subqueries)

Correct answer
name date newcases
China 2020-02-13 15136
Ecuador 2020-04-24 11536
Qatar 2020-05-30 2355
Chile 2020-06-06 13990
Pakistan 2020-06-14 12073
Kyrgyzstan 2020-07-18 11505
Equatorial Guinea 2020-07-31 1750
Peru 2020-08-02 21358
Luxembourg 2020-11-02 1967
Italy 2020-11-13 40902
Belize 2020-12-03 1382
Turkey 2020-12-10 823225
Sweden 2020-12-29 32485
Lithuania 2020-12-30 3984
Panama 2021-01-06 5186
Ireland 2021-01-08 8227
US 2021-01-08 303487
United Kingdom 2021-01-08 68192
Sudan 2021-01-10 1215
Lebanon 2021-01-15 6154
Dominican Republic 2021-01-16 2370
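
The two-stage logic (first compute daily new cases with LAG, then rank the days within each country) can also be written with WITH clauses instead of nested subqueries. The sketch below swaps in that style deliberately; it assumes Python's sqlite3 module (SQLite 3.25 or later), uses the column names whn and rnk rather than date and rank, and loads only the ten March 2020 figures for Italy and Spain that appear in the tables earlier on this page.

```python
import sqlite3

italy = [1694, 2036, 2502, 3089, 3858, 4636, 5883, 7375, 9172, 10149]
spain = [84, 120, 165, 222, 259, 400, 500, 673, 1073, 1695]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
for name, series in (("Italy", italy), ("Spain", spain)):
    conn.executemany(
        "INSERT INTO covid VALUES (?, ?, ?)",
        [(name, f"2020-03-{day:02d}", value)
         for day, value in enumerate(series, start=1)])

peaks = conn.execute("""
    WITH daily AS (            -- stage 1: cumulative totals to daily counts
        SELECT name, whn,
               confirmed - LAG(confirmed, 1)
                   OVER (PARTITION BY name ORDER BY whn) AS newcases
        FROM covid
    ),
    ranked AS (                -- stage 2: rank each country's days
        SELECT name, whn, newcases,
               RANK() OVER (PARTITION BY name ORDER BY newcases DESC) AS rnk
        FROM daily
    )
    SELECT name, whn, newcases
    FROM ranked
    WHERE newcases >= 1000 AND rnk = 1
    ORDER BY whn
""").fetchall()

print(peaks)
```

Within these ten days Italy peaks at 1797 new cases on 9 March. Spain's largest daily rise in the sample is 622, below the 1000 threshold, so it is not listed.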

Retrieved from "http:///sqlzoo.net/w/index.php?title=Window_LAG&oldid=39866"

This page was last edited on 7 December 2021, at 21:27.
