
University of Southampton
Mathematical Sciences

MATH6183
Data Mining and Analytics

Introduction to
SQL: Workshop 1

Professor Christine Currie


Introduction to SQL 2

Contents
1 Introduction
1.1 Installing SQLite on Your Computer
1.2 Working with SQLite
1.3 Examples
2 Manipulating Data Using SQL
2.1 Addition, Subtraction, Multiplication and Division
2.2 Concatenate Text
2.3 Manipulating Dates and Times
2.4 Comparison Predicates
2.4.1 Using Logic
2.5 Aggregate Functions
3 Exercise
4 Creating Tables Using SQL (Optional)
4.1 Modifying the Data in your Tables

1 Introduction
In this workshop, we will be using SQL to extract information from a database. We will be working
in SQLite, but the SQL commands should also work, sometimes with minor changes, in other database
applications, e.g. Access, MySQL, PostgreSQL and Oracle. There is a wide range of literature on SQL and you
will need to find a book or e-book that suits you best. The following two books take their readers from the
very basics of SQL and databases up to some relatively advanced ideas.

• SQL for Dummies, 8th Edition: Allen G. Taylor, John Wiley and Sons, New Jersey, USA, 2013.
• A Beginner’s Guide to SQL, 3rd Edition: Andy Oppel and Robert Sheldon, McGraw-Hill, USA, 2009.

Both contain descriptions of all of the SQL commands that you are likely to need for this module. They
also place SQL in a wider context, covering other issues that need to be taken into account when building
databases but that we do not have time to cover in this module, e.g. keeping databases secure.
We will be working with two different databases in the SQL workshops: the library database and the fruit
and vegetables database. Both are described in Section 1.3. There is an exercise in Section 3 of this
worksheet that you should work through to check your understanding of what is covered in this workshop.

1.1 Installing SQLite on Your Computer


SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured
SQL database engine. According to the SQLite website, it is the most widely used database engine in the
world. You can download SQLite here: https://www.sqlite.org/download.html. Windows users may also find
these instructions on how to download and install it useful:
https://www.sqlitetutorial.net/download-install-sqlite/.
In the tutorials, we will be typing SQL code directly into SQLite running in the Command Prompt on
Windows, but there are several tools that provide a GUI (Graphical User Interface) for working with SQLite:

• SQLiteStudio https://sqlitestudio.pl/
• DBeaver https://dbeaver.io/
• DB Browser for SQLite https://sqlitebrowser.org/

While we will not be working with these directly in the tutorials, you may find them useful when working
with SQLite on your own machines.

1.2 Working with SQLite


The SQLite website includes a tutorial that describes how to get started, available at
https://www.sqlite.org/quickstart.html. Below we give the key information needed to create a new
database and open an existing database. For more information, please visit the website.
There are two options for opening SQLite. The second applies to Windows machines, such as those used in
the computer lab sessions.

1. Open the shell or command prompt (search on the task bar if you are unsure where to find this) and
type sqlite3. You should then see some text appear to show that sqlite3 has opened.
2. Double click on sqlite3.exe in File Explorer.

We go through some of the most useful commands in SQLite below but for a more detailed description,
including a list of commands, see https://www.sqlite.org/cli.html.

Create a New Database

To create a new database called test.db, or to open an existing one, type the following command at the
sqlite> prompt.
.open test.db
If test.db does not exist, SQLite will create a new database with that name.
To control which folder the database is created in or opened from, you can include the full path name.
For example, to open test.db in the folder called “work” on the C drive, type the following.
.open c:/work/test.db
It is not straightforward to check what your working directory is. If you are working within the Command
Prompt on Windows, the following command should return the path of the working directory.
.shell cd
(The command .shell allows you to run any command of the program or shell that you are using to access
SQLite. On Windows, cd changes the directory when given a path and prints the current directory when used
on its own.)

Writing Out SQL Results

Our aim in this part of the course is to write SQL to extract useful information from a database. Sometimes
that information might just be one answer, but often the output is a data table. By default, SQLite uses
list mode for output, in which each row of a query result is written on one line of output and the columns
within that row are separated by a separator string. The default separator is the pipe symbol (“|”). List
mode is especially useful when you are going to send the output of a query to another program.
There are 14 output modes available: ascii, box, csv, column, html, insert, json, line, list, markdown,
quote, table, tabs, tcl. You are likely to use list or csv most often. The csv mode writes the query results
as comma-separated values, which you can save to a file.

• .mode on its own will tell you which output mode is being used.
• .mode csv will change the output mode to csv (and similar for other available output modes).

For example, to write the results of a query on the Supplier table to a csv file called dataout.csv in the
Downloads folder on the C drive, you would type the following (replacing username with your own user name).

.headers on
.mode csv
.once c:/users/username/downloads/dataout.csv
SELECT * FROM Supplier;

Note that only the final line, “SELECT * FROM Supplier;”, is SQL; the lines starting with a full stop are
SQLite dot commands that control how the output is produced. Check what the “.headers on” command does in
this example. The computers in some of the university teaching spaces will try to open csv files in
Minitab. It is actually easier to view them in Notepad or Excel. To do this, right-click on the file in
File Explorer, click Open With, then choose the program to use to view the results.
Check that you have understood this by downloading both of the example databases from
Blackboard and saving them on your machine, then try to open them in SQLite. As you go
through the remainder of the worksheet, run the SQL queries on the Fruit and Veg database
and check the output matches that given in the worksheet, before working on the exercises.

1.3 Examples
Example: Fruit and Veg

A warehouse wishes to set up a database to record the movement of fruit and vegetables. These arrive at
the warehouse from farms and are then sent out to supermarkets or other customers. Currently, everything
is recorded manually but a database is needed to allow the end customers to trace the origin of their food.
The standard form attached to each consignment is given below and a database is provided on Blackboard
for you to download. We do not cover structuring of databases as part of the syllabus but you may want
to think about how you would split the information given here into tables within a database to avoid
repeating information.
Product Information
Description: Apples                            Picked: 27/09/08
Arrived: 29/09/08                              Quality: High
Origin: Cider Farm                             Checked in by: Bob
Contact: Mrs G. Smith                          Telephone: 01938 340928
Address: Cider Farm, Gloucester                Postcode: GL1 2NH
Destination: Waitrose, Cirencester             Lorry: B78
Contact: Mr J. Lewis                           Telephone: 01234 567890
Address: Waitrose, High Street, Cirencester    Postcode: CL1 2GH

Example: Library

You have been asked to work with a database that can keep track of the availability of books in a small
lending library and who currently has these books on loan. The library would like to be able to contact
customers who have overdue books and customers would like to be able to search for books on different
subjects. The dataset was created via simulation and a copy is available to download from Blackboard.

2 Manipulating Data Using SQL


Handling data is dealt with by the Data Manipulation Language (DML), which forms a second part of
SQL. The purpose of the DML is principally to deduce information from the data stored in the database
and consequently is arguably the most useful part of SQL for someone working in Operational Research
or Data and Decision Analytics.
The SELECT statement, often in conjunction with WHERE, is used to select the data from the database that
you wish to manipulate. SQL supports the usual set of expressions for manipulating data, and we will work
through a set of these to demonstrate how to use them. Table 1 provides a reference for the data types that
you are most likely to use in SQL.
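SQLite accepts the type names in Table 1 but treats them more loosely than other database systems: it uses the declared type only as a hint (SQLite calls this type affinity), does not enforce the length given in VARCHAR(n) and does not pad CHAR values with blanks. The sketch below lets you check this with the built-in functions length() and typeof(). The table demo_types and its contents are hypothetical, made up for this illustration, and it is created as a TEMP table so that it disappears when you close SQLite.

```sql
-- Hypothetical table used only to check how SQLite treats declared types.
-- A TEMP table disappears when you close SQLite, so your database is unchanged.
CREATE TEMP TABLE demo_types (
    Code   CHAR(5),
    Name   VARCHAR(3),
    Amount DECIMAL(6, 2),
    Picked DATE
);

-- 'Grapefruit' has 10 characters, more than VARCHAR(3) would allow elsewhere.
INSERT INTO demo_types VALUES ('AB', 'Grapefruit', 12.5, '2022-10-03');

SELECT length(Code)   AS CodeLength,  -- 2: CHAR(5) is not padded with blanks
       length(Name)   AS NameLength,  -- 10: the VARCHAR(3) limit is not enforced
       typeof(Amount) AS AmountType,  -- real
       typeof(Picked) AS PickedType   -- text: dates are stored as text
FROM demo_types;
```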

2.1 Addition, Subtraction, Multiplication and Division


The usual arithmetic operators apply in SQL: ‘+’, ‘-’, ‘*’ and ‘/’. They can be used to carry out calculations on the data, either on the values in one field or between fields.
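A minimal sketch of the four operators, using a hypothetical TEMP table demo_stock (the table, its columns and its values are made up for illustration and are not part of the Fruit and Veg database). It also shows one SQLite behaviour worth knowing: dividing an integer by an integer gives an integer, with the fractional part discarded, so multiply by 1.0 first when you want the exact answer.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_stock (
    Item    VARCHAR(20),
    Boxes   INTEGER,
    PerBox  INTEGER,
    Damaged INTEGER
);

INSERT INTO demo_stock VALUES ('Apples', 7, 12, 2);
INSERT INTO demo_stock VALUES ('Pears', 4, 10, 1);

SELECT Item,
       Boxes * PerBox           AS TotalUnits,   -- multiplication: 84 for Apples
       Boxes * PerBox - Damaged AS UsableUnits,  -- subtraction: 82 for Apples
       Boxes + 1                AS BoxesPlusOne, -- addition: 8 for Apples
       Boxes / 2                AS HalfBoxesInt, -- integer division: 7 / 2 gives 3
       Boxes * 1.0 / 2          AS HalfBoxes     -- real division: gives 3.5
FROM demo_stock;
```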

2.2 Concatenate Text


Joining two pieces of text together is described as ‘concatenation’. In SQLite, the concatenation operator
is ‘||’. For example, in order to prepare a list of suppliers where their name and address are included in
the same column, write

Data Type                      Description
INTEGER                        Number with no fractional part.
BIGINT                         An integer with higher precision than INTEGER, where precision is the maximum number of significant digits a number can have.
NUMERIC                        Number with a fractional part: you can specify the precision and the scale (the number of digits in its fractional part).
DECIMAL                        Number with a fractional part, which uses whichever is greater of the default precision and scale and those that you specify.
REAL                           Single-precision floating-point number.
DOUBLE PRECISION               Double-precision floating-point number.
FLOAT                          Number with a fractional part: you can specify the precision, but do not need to specify whether single- or double-precision arithmetic is used.
CHARACTER or CHAR              Character string that you can use for text or for numbers that are codes rather than quantities, e.g. telephone numbers. You can specify the number of characters; the default is 1. If you do not use all of the characters, SQL pads the field with blanks.
VARCHAR or CHARACTER VARYING   Stores exactly the number of characters that a user enters. You specify the maximum number of characters.
BOOLEAN or BIT                 BOOLEAN takes the values TRUE, FALSE or UNKNOWN. Access does not support BOOLEAN; use BIT instead, which gives a Yes/No variable.
DATE                           Year-month-day, YYYY-MM-DD, e.g. 2022-10-03 represents 3 October 2022.
DATETIME                       Year-month-day hours:minutes:seconds, YYYY-MM-DD HH:MI:SS, e.g. 2022-10-03 09:10:06 represents 10 minutes and 6 seconds past 9am on 3 October 2022.

Table 1: A selection of the datatypes that you are most likely to use in SQL. Note that some formats may
vary from one package to another.

SELECT Name || Address
FROM Supplier;

Look at the output from this query: it does not look tidy. This can be improved by adding a space between
the outputs of the two fields. See if you can do this for yourself. (Hint: you will need to use quotes. In
SQL, pieces of text are written between single quotes, ' '.)
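The sketch below shows two more properties of ||, using a hypothetical TEMP table demo_supplier (made-up data) rather than the real Supplier table: numbers are converted to text when they are concatenated, and if either side of || is NULL (a missing value), the whole result is NULL.

```sql
-- Hypothetical table for illustrating concatenation.
CREATE TEMP TABLE demo_supplier (
    SupplierID INTEGER,
    Name       VARCHAR(25),
    PostCode   VARCHAR(10)
);

INSERT INTO demo_supplier VALUES (1, 'Cider Farm', 'GL1 2NH');
INSERT INTO demo_supplier VALUES (2, 'Hill Farm', NULL);  -- postcode not recorded

SELECT '#' || SupplierID || '-' || Name AS Label,        -- e.g. #1-Cider Farm
       Name || '/' || PostCode          AS NamePostCode  -- NULL for Hill Farm
FROM demo_supplier;
```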

2.3 Manipulating Dates and Times


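In some database systems, such as Microsoft SQL Server, dates and times are manipulated using the
DATEADD() function, but this function does not exist in SQLite. Instead, use the DATE() function with a
modifier such as '+10 days' to add a time interval to, or subtract one from, a date.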

SELECT DispatchDate, DATE(DispatchDate, '+10 days'),
       DATE(DispatchDate, '-10 days'), DATE(DispatchDate, '-10 months'),
       DATE(DispatchDate, '-10 years')
FROM Dispatch;

(If you get fed up with seeing a large number of results, add WHERE DispatchID = 1 before the semicolon
and the query will only return the output for the dispatch with DispatchID 1.)
SQL uses the standard date format YYYY-MM-DD, and the DATE() function will not work on dates that are
stored in the database in any other format: it returns NULL instead.
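You can check how the modifiers behave, and what happens to a date in the wrong format, by applying DATE() to literal dates rather than to a table. In this sketch the first four dates are in the standard format. The last is in the dd/mm/yy style used on the paper form in Section 1.3, so DATE() returns NULL for it (shown as a blank in SQLite's default output).

```sql
SELECT DATE('2022-10-03', '+10 days')   AS PlusTenDays,     -- 2022-10-13
       DATE('2022-10-03', '-10 days')   AS MinusTenDays,    -- 2022-09-23
       DATE('2022-10-03', '-10 months') AS MinusTenMonths,  -- 2021-12-03
       DATE('2022-10-03', '-10 years')  AS MinusTenYears,   -- 2012-10-03
       DATE('27/09/08', '+10 days')     AS WrongFormat;     -- NULL: not YYYY-MM-DD
```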
There is also a DATETIME data type, which we use in the library database to record the date and time of
each transaction. To find the date part of a DATETIME field, you can use the DATE() function. For example,
to find the date on which a transaction took place in the library database, we would use DATE(true_date).
Similarly, to find the time at which a transaction took place, we can use TIME(true_date), which returns
just the time part of the true_date field.
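As a small illustration, the sketch below stores the DATETIME example from Table 1 in a hypothetical TEMP table demo_transactions, whose column names are borrowed from the library database, and splits it into its two parts.

```sql
-- Hypothetical table with column names borrowed from the library database.
CREATE TEMP TABLE demo_transactions (
    transaction_id INTEGER,
    true_date      DATETIME
);

INSERT INTO demo_transactions VALUES (1, '2022-10-03 09:10:06');

SELECT transaction_id,
       DATE(true_date) AS TransDate,  -- 2022-10-03
       TIME(true_date) AS TransTime   -- 09:10:06
FROM demo_transactions;
```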
You may also want to extract either the month or the year from a date variable. To do this, you can
use the function strftime(). See https://www.w3resource.com/sqlite/sqlite-strftime.php for full
details of how to use this function.
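A short sketch of strftime(), using a hypothetical TEMP table demo_dates with made-up dates. The first argument is a format string (%Y gives the four-digit year and %m the two-digit month) and the second is the date. The result is text, so in a WHERE clause compare it with '10' rather than 10, or convert it to a number with CAST as in the last column.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_dates (EventID INTEGER, EventDate DATETIME);
INSERT INTO demo_dates VALUES (1, '2022-10-03 09:10:06'), (2, '2022-11-15 14:00:00');

SELECT EventID,
       strftime('%Y', EventDate)                  AS TheYear,   -- '2022' (text)
       strftime('%m', EventDate)                  AS TheMonth,  -- '10', then '11' (text)
       CAST(strftime('%m', EventDate) AS INTEGER) AS MonthNum   -- 10, then 11 (numbers)
FROM demo_dates;
```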

2.4 Comparison Predicates


A predicate is a logical statement, which may form a condition that the data you display have to obey.
For example, insisting that DispatchID is equal to 1 as we describe above is an example of a comparison
predicate. SQL supports the operators: =, <> (not equal to), <, >, <= (less than or equal to), >=
(greater than or equal to).
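The sketch below tries two of these operators on a hypothetical TEMP table demo_dispatch with made-up data. It also shows a practical benefit of the YYYY-MM-DD format: dates written this way sort and compare correctly as text, so <, >, <= and >= work on them directly.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_dispatch (
    DispatchID   INTEGER,
    Lorry        VARCHAR(5),
    DispatchDate DATE
);
INSERT INTO demo_dispatch VALUES (1, 'L4', '2022-09-28');
INSERT INTO demo_dispatch VALUES (2, 'B78', '2022-10-03');
INSERT INTO demo_dispatch VALUES (3, 'L4', '2022-10-11');

-- Every dispatch not made by lorry L4 (returns DispatchID 2).
SELECT * FROM demo_dispatch WHERE Lorry <> 'L4';

-- Dispatches on or after 1 October 2022 (returns DispatchIDs 2 and 3).
SELECT * FROM demo_dispatch WHERE DispatchDate >= '2022-10-01';
```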

2.4.1 Using Logic

You may wish to use a more complicated predicate, in which case you will need to make use of the logic
operators AND, OR and NOT. For example, if you wish to run the date query above only for the dispatches
where DispatchID is equal to 1 or 2, you would add the line

WHERE DispatchID = 1 OR DispatchID = 2;

If you wish to view the products that are grapes and of high quality, you would use the following query.

SELECT * FROM Product
WHERE ProductDescription = 'Grapes' AND Quality = 'High';

Remember that ‘SELECT *’ means that the query will return all of the fields of the data table.
Test this query, and some variations of it, to be sure that you understand how it works.
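When you combine AND with OR, note that AND is evaluated first, so use brackets to make your intention explicit. The sketch below shows the difference on a hypothetical TEMP table demo_product (column names borrowed from the query above, data made up), together with NOT.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_product (
    ProductID          INTEGER,
    ProductDescription VARCHAR(20),
    Quality            VARCHAR(10)
);
INSERT INTO demo_product VALUES (1, 'Grapes', 'High');
INSERT INTO demo_product VALUES (2, 'Grapes', 'Low');
INSERT INTO demo_product VALUES (3, 'Apples', 'Low');
INSERT INTO demo_product VALUES (4, 'Apples', 'High');

-- With brackets: high-quality grapes or apples (ProductIDs 1 and 4).
SELECT * FROM demo_product
WHERE (ProductDescription = 'Grapes' OR ProductDescription = 'Apples')
  AND Quality = 'High';

-- Without brackets AND binds more tightly, so this means
-- "all grapes, plus high-quality apples" (ProductIDs 1, 2 and 4).
SELECT * FROM demo_product
WHERE ProductDescription = 'Grapes' OR ProductDescription = 'Apples'
  AND Quality = 'High';

-- NOT reverses a condition: products that are not high quality (ProductIDs 2 and 3).
SELECT * FROM demo_product WHERE NOT Quality = 'High';
```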

2.5 Aggregate Functions


SQL supports five basic aggregate (or set) functions, which act on the values in the rows of the specified
table that meet the conditions of the query.

1. COUNT: returns the number of rows,
2. MAX: returns the maximum value,
3. MIN: returns the minimum value,
4. SUM: returns the sum of the values,
5. AVG: returns the average of the values.

For example, in the Fruit and Veg database, we may want to return the number of times that a particular
lorry is used to dispatch an order:

SELECT COUNT(DispatchID)
FROM Dispatch
WHERE Lorry = 'L4';
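All five functions can be compared on a small hypothetical TEMP table demo_consignment with made-up data. The sketch also shows how they treat a missing value: COUNT(*) counts every row, whereas COUNT(Boxes) and the other functions ignore rows in which Boxes is NULL.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_consignment (
    ConsignmentID INTEGER,
    Lorry         VARCHAR(5),
    Boxes         INTEGER
);
INSERT INTO demo_consignment VALUES (1, 'L4', 10);
INSERT INTO demo_consignment VALUES (2, 'L4', 30);
INSERT INTO demo_consignment VALUES (3, 'B78', 20);
INSERT INTO demo_consignment VALUES (4, 'B78', NULL);  -- box count not recorded

SELECT COUNT(*)     AS NumRows,     -- 4: counts every row
       COUNT(Boxes) AS NumBoxes,    -- 3: skips the NULL
       MAX(Boxes)   AS MaxBoxes,    -- 30
       MIN(Boxes)   AS MinBoxes,    -- 10
       SUM(Boxes)   AS TotalBoxes,  -- 60
       AVG(Boxes)   AS MeanBoxes    -- 20.0: the NULL is ignored, not treated as 0
FROM demo_consignment;
```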

In combination with these aggregate functions, we may wish to return the top few entries in an ordered
list. We can do this by first ordering by the quantity of interest and then using the LIMIT clause to
specify how many entries to output. For example, the following returns the two lorries that have recorded
the most trips in the Dispatch table.

SELECT Lorry, COUNT(DispatchID) AS numTrips
FROM Dispatch
GROUP BY Lorry
ORDER BY numTrips DESC
LIMIT 2;

The keyword DESC is needed to sort in descending order (the default, ASC, sorts in ascending order), while
GROUP BY ensures that we count the number of trips per lorry. We will see GROUP BY again in Workshop 2.
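To see what each clause contributes, the sketch below runs the same kind of query on a hypothetical TEMP table demo_trips with made-up data. When two lorries have the same number of trips, LIMIT simply cuts the list after two rows, so which of the tied lorries is returned is not guaranteed; adding a second sort key, here Lorry, makes the result predictable.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_trips (DispatchID INTEGER, Lorry VARCHAR(5));
INSERT INTO demo_trips VALUES (1, 'L4'), (2, 'B78'), (3, 'L4'),
                              (4, 'C12'), (5, 'L4'), (6, 'B78'), (7, 'C12');

SELECT Lorry, COUNT(DispatchID) AS numTrips
FROM demo_trips
GROUP BY Lorry                 -- one row per lorry: L4 3, B78 2, C12 2
ORDER BY numTrips DESC, Lorry  -- most trips first; ties broken alphabetically
LIMIT 2;                       -- keeps L4 (3 trips) and B78 (2 trips)
```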

3 Exercise
In carrying out this exercise, you will need to know the names of the tables and fields for the library
dataset. These are listed below.

• users: user_id, first_name, surname, email
• books: book_id, title, author1, author1_intial (spelt this way in the database), author2, author2_initial, year, topic, num_copies
• transactions: transaction_id, true_date, book_id, user_id, trans_type

SQLite provides a way of finding out field names and details, which may be useful when working with
databases that you do not have the full details of (or if you have forgotten the details). After opening the
database, type the following to find out details about a table called table-name.
PRAGMA table_info(table-name);
Alternatively, to view information about all of the tables and their fields, use .schema. For the library
database, this outputs the following.

CREATE TABLE books(book_ID INTEGER NOT NULL, title VARCHAR(50), author1 VARCHAR(20),
author1_intial VARCHAR(10), author2 VARCHAR(20), author2_initial VARCHAR(10),
year INTEGER, topic VARCHAR(20), num_copies INTEGER);
CREATE TABLE transactions(transaction_id INTEGER NOT NULL, true_date DATE, book_id INTEGER,
user_id INTEGER, trans_type VARCHAR(10));
CREATE TABLE users(user_id INTEGER NOT NULL, first_name VARCHAR(30),
surname VARCHAR(30), email VARCHAR(50));
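PRAGMA table_info returns one row per column of the table (cid, name, type, notnull, dflt_value and pk). The sketch below runs it on a hypothetical TEMP table demo_users laid out like the users table above. The last line uses the table-valued form pragma_table_info, which is available in SQLite 3.16.0 and later and lets you select just the details you need.

```sql
-- Hypothetical table laid out like the library users table.
CREATE TEMP TABLE demo_users (
    user_id    INTEGER NOT NULL,
    first_name VARCHAR(30),
    surname    VARCHAR(30),
    email      VARCHAR(50)
);

-- One row per column: cid, name, type, notnull, dflt_value, pk.
PRAGMA table_info(demo_users);

-- The same information as a query, keeping only the column names and types.
SELECT name, type FROM pragma_table_info('demo_users');
```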

Use SQL to obtain the following information from the library database:

1. Output a list of library users who have the first name Michael, including their full name and their
e-mail addresses, where their first and second names are included in the first column and their e-mail
addresses in the second column.
2. Assuming that library books are allowed on loan for 3 weeks, output a list of library books that
have been loaned out, detailing the date that they were loaned out and the date they were due to be
returned. To reduce the amount of output, do this only for the first 10 transactions in the database by
placing a condition on transaction_id.
3. The number of times that Studies in Optimisation has been withdrawn from the library.
4. The most recent transaction in the library.
5. List all of the loans that user 7 has made.
6. (Extension) List all of the transactions that user 7 has made during October.

Table 2 includes a list of SQL commands used in this workshop.

Command   Description
SELECT    An important command that is used to select a set of data that fulfils the criteria provided in the SQL query.
DATE      Use to convert text to a date or to manipulate a date, e.g. add or subtract a number of days or years.
WHERE     Allows the user to specify conditions on the data to be used in the query.
CREATE    Use to create a table. In Access, CREATE can only be used to create a table, but in other implementations of SQL it allows you to create other objects.
DROP      Use to delete anything that has been created by a CREATE statement.
ALTER     Use to alter tables or columns.
DELETE    Use to delete rows.
INSERT    Adds new rows to a table.

Table 2: A selection of key SQL Commands Used in this Workshop. The final few are used in the following
optional section.

4 Creating Tables Using SQL (Optional)


The part of SQL that is used for setting up databases is called the Data Definition Language (DDL).
Different implementations of SQL will support different DDL commands.
SQL supports three types of tables, but we will be working with just one type: persistent base tables.
These are the tables that hold the data stored in a database. To create a table in SQL use the command

CREATE TABLE
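For example, if we return to the fruit and vegetable example, we would create the Supplier table by typing
the following into SQLite. (In the downloaded Fruit and Veg database this will produce an error, because a
Supplier table already exists there; try it in a new database instead.)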
CREATE TABLE Supplier (
    SupplierID INTEGER NOT NULL,
    Name VARCHAR(25),
    Address VARCHAR(50),
    PostCode VARCHAR(10),
    TelephoneNumber VARCHAR(15) );

Each attribute has its data type specified, together with any constraints that you wish to place on it,
e.g. SupplierID, which is intended to be the primary key, has been specified as NOT NULL.
If you need to change the structure of a table or remove it from the database, use the ALTER TABLE or
DROP TABLE commands. For example, to add a yes/no column named Testing to the Supplier table in the
Fruit and Veg database, write
ALTER TABLE Supplier
ADD COLUMN Testing BIT;
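To delete the column again, write the following (this needs SQLite version 3.35.0 or later, which you can check by running SELECT sqlite_version();).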
ALTER TABLE Supplier
DROP COLUMN Testing;
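Before changing the real Supplier table, you may prefer to practise on a copy. The sketch below uses a hypothetical TEMP table demo_supplier2 with made-up data and shows that rows that already exist get NULL in a newly added column. SQLite has no separate yes/no storage type, so BIT values are stored as numbers, usually 1 for yes and 0 for no.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_supplier2 (
    SupplierID INTEGER NOT NULL,
    Name       VARCHAR(25)
);
INSERT INTO demo_supplier2 VALUES (1, 'Cider Farm');

-- Add a yes/no column; the existing row gets NULL in it.
ALTER TABLE demo_supplier2 ADD COLUMN Testing BIT;

-- New rows can now supply a value for Testing (0 for no).
INSERT INTO demo_supplier2 VALUES (2, 'Hill Farm', 0);

SELECT SupplierID, Name, Testing FROM demo_supplier2;
```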

4.1 Modifying the Data in your Tables


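There are a number of ways of adding data to a database. If you wish to add a single row of data to a
table, use an INSERT statement such as the following.
INSERT INTO Supplier VALUES (101, 'Testing Farm', 'Fred Smith', 'Red Street, London', 'W1 2FP',
'fred@myemail.com', '02086728907');
The values must be given in the same order as the columns of the table, with one value per column. This
example supplies seven values, so it will fail with an error on the five-column Supplier table created above.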
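A safer habit is to list the columns you are filling in, so that values are matched to columns by name rather than by position, and any column you leave out is set to NULL (or to its default value, if one was declared). A sketch, using a hypothetical TEMP table demo_supplier3 laid out like the five-column Supplier table created earlier in this section:

```sql
-- Hypothetical table laid out like the five-column Supplier table above.
CREATE TEMP TABLE demo_supplier3 (
    SupplierID      INTEGER NOT NULL,
    Name            VARCHAR(25),
    Address         VARCHAR(50),
    PostCode        VARCHAR(10),
    TelephoneNumber VARCHAR(15)
);

-- The columns are named explicitly, so the order of the values is clear;
-- TelephoneNumber is not listed and is therefore left as NULL.
INSERT INTO demo_supplier3 (SupplierID, Name, Address, PostCode)
VALUES (101, 'Testing Farm', 'Red Street, London', 'W1 2FP');

SELECT * FROM demo_supplier3;
```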
You may also wish to add a large number of entries at the same time. In SQLite, the easiest way to add
data to an existing table called tableName is to save the data to be added as fileName.csv (csv =
comma-separated values; the file name can be your choice) and use the following dot commands.
.mode csv
.import fileName.csv tableName
Note that the csv file should not include column titles: when the table already exists, .import treats every
line of the file, including the first, as data.
It can also be useful to delete specific rows from a database. For example, to delete the records of all
suppliers called ALEX WHITE from the Supplier table, write
DELETE FROM Supplier WHERE Name = 'ALEX WHITE';
Note that SQLite does not ask you to confirm before executing this command, and once the data has been
deleted, it cannot be retrieved.
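Because a DELETE cannot be undone, it is worth running a SELECT with exactly the same WHERE clause first, to see which rows would be removed. The sketch below does this on a hypothetical TEMP table demo_supplier4 with made-up data. It also shows that = compares text case-sensitively by default, so 'Alex White' is not matched by 'ALEX WHITE'.

```sql
-- Hypothetical table used only for this illustration.
CREATE TEMP TABLE demo_supplier4 (SupplierID INTEGER, Name VARCHAR(25));
INSERT INTO demo_supplier4 VALUES (1, 'ALEX WHITE'), (2, 'Alex White'), (3, 'Cider Farm');

-- Step 1: preview the rows that the DELETE would remove (only SupplierID 1).
SELECT * FROM demo_supplier4 WHERE Name = 'ALEX WHITE';

-- Step 2: delete them, using exactly the same WHERE clause.
DELETE FROM demo_supplier4 WHERE Name = 'ALEX WHITE';

-- SupplierIDs 2 and 3 remain.
SELECT * FROM demo_supplier4;
```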
