Download as pdf or txt
Download as pdf or txt
You are on page 1of 43

PostgreSQL

2.021 – UAB – DACSO


Pere Pons
Virtual machine prepared
For this course there is a virtual machine already prepared.

• VirtualBox based
• Has installed:
• PostgreSQL
• Dbeaver
• Sublime text editor

• Has some databases for learning

• Install Oracle VirtualBox, and add the virtual machine to your installation.
• The user is user1 and password user1
What is PostgreSQL
• descripció I breu història
• capacitats (replicas, transaccions, bloqueig sofisticat, vistes, integritat
referencial, tipus definits per l’usuari, herencia, regles)

https://www.postgresql.org/docs/9.1/sql.html
What can PostgreSQL offer us?
It is said to be “the most advanced open-source relational database in the world”.

Key points:
• Highly standard compliant
• Supports high concurrency with clever locks
• Transactions: ensures atomicity, isolation, and durability.
• Support for a high set of datatypes (numeric, shapes, addresses, bit strings, texts, json, etc..)
• Replication capabilities

Comparison between databases: https://www.digitalocean.com/community/tutorials/sqlite-vs-mysql-vs-postgresql-a-comparison-of-relational-database-management-systems


Ranking of databases: https://db-engines.com/en/ranking
PostgresSQL architecture

One process per


connection

Dedicated to
replication

Stored data

https://en.wikibooks.org/wiki/PostgreSQL/Architecture#:~:text=PostgreSQL%20is%20a%20relational%20database,request%20read%20and%20write%20operations.
PostgreSQL connectivity
• Achitecture of PostgreSQL. C/C#/VB/P
ython etc..
Java
application
Application in
any platform
that has
application drivers

• ODBC I JDBC psql command


ODBC (Open
Database
JDBC (Java
Database
Native
line
Connectivity) Connectivity) drivers
• Drivers natius
• us amb php, python, c#??
postgres
processes

Files in the fs
Installing postgresql in Linux
The easiest way to install it:
$ sudo apt install postgresql

Then, indicate how users will be allowed to log in editing:


/etc/postgresql/12/main/pg_hba.conf

Users can be requested to authenticate by different methods


• trust: just trust, no check.
• peer: the user in the OS has to be the same than the one in the database. No password check is performed.
• password: sends the password not encrypted
• md5: sends the password encrypted
• And many others like pam, ldap, with certificate, etc..

Firsts interactions will be done with the user “postgres”


$ sudo su postgres
Creation and removal of databases

Use with postgres user:

create: createdb <db_name>


remove: dropdb <db_name>

list existing dabases: psql –l


Creation of users
to create a user:
$ createuser –d –P –r <username>

(createuser --help for more information)

To log in via de psql user interface:

psql –h <host to connect> -U <username> -W


<database_name>
Interacting with PostgreSQL
our user in this
installation is
To interact with PostgreSQL the system offers you a console based tool: dbuser1 and its
password is
psql dbuser1

To connect to a db use: psql –h <host> -U user –W <database>

with psql you can:


• Send queries interactively
• run “batch queries” from files
psql
with the command \? you will see the help
information about the commands you can
send to psql

With the \h command you will get the help


about the SQL syntax. Try “\h select”.

It is the man page of PostgreSQL.

It is a useful tool, but in development having


only a command line tool is not the best
option...
DBeaver
Dbeaver is a free multi-
platform tool for developers,
database administrators and
analysts. It supports all
popular databases:
PostgreSQL, MySQL, Oracle,
MS SQLServer, Sybase, SQLite,
etc..

It is available in Linux,
Windows and OSX.

A tool to know.
Connect to PostgreSQL with dbeaver
You an create a new connection to a
PostgreSQL database. Just select new
database connection and the PostgreSQL
type.

Then set the connection settings:

• host to connect (127.0.0.1)


• username and password
• database to connect

Then, just select the database and connect.


Let’s get started!
Type here your
commands
In our VM you will find a database of the Martian
Empire, let’s use it to learn SQL.

Press F3, and select “New script”. This will allow us to


write SQL commands interactively.
To run the command press Control+Enter and see the
results in the bottom area.
Get here your
results
Martian Empire’s schema
Queries
We have to prepare a set of reports for the government of Mars, and to do it we will take
advantage of the Planetary Database that we have.

We will prepare the following tasks that we have been assigned:


• to provide a list of Martian population.
• to provide a list of population of all bases. Name, surname and name of the base.
• to provide a list of population that is not assigned to any base.
• to provide a list with the first name, the surname and the base name for everyone. If
someone is not assigned to any base, the name of the base will be empty.
• Calculate the value stored of each product supply per resort
• Calculate the cost of the stock that Mars owns
• Prepare a list with all the know humans
Basics of queries
“to provide a list of Martian population”

The basic form of a query is

SELECT * | <field names>


FROM <table_names>|<subquery>
WHERE <conditions to meet>
[ORDER BY <fields to order> DESC|ASC]

SELECT name, surname


FROM martians
Joining two tables

“provide a list of population of all bases. Name, surname and


name of the base.”

SELECT martians.name, martians.surname, bases.base_name


FROM martians, bases
where martians.base_id = bases.base_id

SELECT m.name, m.surname, b.base_name


FROM martians m, bases b
WHERE m.base_id = b.base_id

SELECT m.name, m.surname, b.base_name


FROM martians m, bases b
WHERE m.base_id = b.base_id
ORDER BY b.base_name asc

We are joining the information of two tables to create a report.


Joining two tables

There is another option

SELECT m.name, m.surname, b.base_name


FROM martians m
INNER JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name

The same result… mmm

Martians Bases

The INNER join is the intersection zone, is to say the rows that
have a base_id that is in the second table.
Unlocated people
“provide a list of population that is not assigned to any base”

We have a foreign key that makes sure that all base_id in the martian table exists in the bases table.
The base_id in martian table may be null
We want to know who has a base_id set to null.

SELECT m.name, m.surname


FROM martians m
WHERE m.base_id is null
Everyone in his base.. even the ones that do not
have.
“to provide a list with the first name, the surname and the base name for everyone. If someone is not assigned to any base,
the name of the base will be empty.”

In this case we want to know who is a martian and its base, even if he does not have a base. Martians Bases
Is important not to forget anyone!

This operation is a LEFT join (the group on the left in the schema)

SELECT m.name, m.surname, b.base_name


FROM martians m
LEFT JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
Everyone in his base.. even the ones that do not
have (cont).

Of course, RIGHT join will show all martians that have a base, but also include the bases that
do not have martians Martians Bases

SELECT m.name, m.surname, b.base_name


FROM martians m
RIGHT JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
Everyone in his base.. even the ones that do not
have (cont).

And, of course the last operation is the FULL join, is to say:


Martians Bases
All martians and all bases, the ones that do not have a correspondence will have the
fields set to NULL

SELECT m.name, m.surname, b.base_name


FROM martians m
FULL JOIN bases b
ON m.base_id = b.base_id
ORDER BY b.base_name
Everyone in his base.. even the ones that do not
have (cont).

There is a fifth case… the ones that are not in the bases, but without looking at the NULL value
Martians Bases
in the base_id.

SELECT m.name, m.surname, b.base_name


FROM martians m
LEFT JOIN bases b
ON m.base_id = b.base_id
WHERE base_name is null
ORDER BY b.base_name
Cartesian product
If we set no restriction between two tables
SQL will make the cartesian product of the sets
Subqueries
Up to now, we have only made queries from
tables, but we can make queries on queries…

SELECT m.name, m.surname, b2.base_name


FROM martians m,
( select b.base_id, b.base_name
from bases b
where b.base_name > 'T') b2

WHERE b2.base_id = m.base_id

We will select the bases whose name is bigger than “T” and use that
selection as a table.
Calculations
“Calculate the value stored of each product supply per resort”
PostgreSQL not only can print values, also can perform multiple
calculations..

select r.name as Resort, s.name as Supply, rs.stock as Quantity,


rs.stock * s.value as Value
from resort_supplies rs, resorts r, supplies s
where rs.resort_id = r.resort_id
and rs.supply_id = s.supply_id

PostgreSQL has hundreds of functions implemented:


• Logical,
• Mathmatical,
• String manipulation,
• patter matching functions
• Date and time functions,
• etc… (up to 25 groups of functions)
https://www.postgresql.org/docs/9.1/functions.html
Conditions
What else can we set in search condition of the WHERE clause?
• Logical operations between clauses AND, OR, NOT
• Comparison clauses =, >, <, >=, <=, <>,
SELECT … WHERE (a > 8) and (b < 10) or (c = 12)
SELECT .. FROM people WHERE age BETWEEN 18 AND 65

• Operations with sets


SELECT … WHERE color IN (‘red’, ‘blue’, ‘green’)
SELECT … FROM users WHERE EXIST (SELECT id FROM allowed WHERE allowed.id=users.id)

• Operations with strings


SELECT … WHERE LENGTH(name) = 4
SELECT … WHERE name LIKE ‘Mi%’

• And many, many others.. refer to the online user manual for more information.
https://www.postgresql.org/docs/12/functions.html the list is huge!!!
Grouping data
“Calculate the total stock value that each resort has stored”
• Let’s calculate
What is the cost of each product that all bases have?

SELECT r."name" as Resort_name, s."name" as "Supply",


rs.stock * s.value as stock_value
FROM resort_supplies rs, supplies s, resorts r
WHERE rs.supply_id = s.supply_id
AND r.resort_id = rs.resort_id

SELECT r."name" as Resort_name, SUM(rs.stock * s.value) as stock_value_per_resort


FROMresort_supplies rs, supplies s, resorts r
WHERE rs.supply_id = s.supply_id
AND r.resort_id = rs.resort_id
GROUP BY by resort_name

We could add a HAVING clause after the group by. Will include an additional contition to be applied in the grouped information.
Paging data
Mars has a table of Earthers for marketing purposes.
It contains the Earther_Id, the name, surname and
email of the person. It has nearly 14 million records!

To list the earthers in groups of 20, to display them


in the application you should use LIMIT and OFFSET

SELECT name, surname , email


FROM earthers e
LIMIT 20 Page 1
OFFSET 0
SELECT name, surname , email
FROM earthers e
LIMIT 20
OFFSET 20 Page 2
UNION
“Prepare a list of all known humans”

We have the table martians and the table


earthers, and both contains humans.

SELECT name, surname, email Same number of


FROM martians m
UNION row and same
SELECT name, surname, email types
FROM earthers e
VIEWS
The concept “View” refers to present a
query like a table.
Can be used to:
• Restrict visibility (show just a list of fields)
• Avoid having to repeat a complex query

For example, to have available the value in


stock of all bases…
CREATE OR REPLACE VIEW value_stock_per_base AS
SELECT r.resort_id, r."name" as Resort_name, SUM(rs.stock * s.value) as
stock_value_per_resort
FROM resort_supplies rs, supplies s, resorts r
WHERE rs.supply_id = s.supply_id
AND r.resort_id = rs.resort_id
GROUP BY r.resort_id, r.name
Summarizing so far..
• We can explore all the areas of the schema of an
intersection of data stored in tables.
More information on queries:
• If we only set the tables and the restrictions, we make a https://www.postgresql.org/docs/9.1/queries.html
INNER join.
• We can use subqueries as tables
• There is a huge number of functions available that can
be used in the SELECT section, or the WHERE section.
• The information can be grouped and make operations
within the grouped values.
• We can page the information, concatenate two queries,
and generate views from the queries.
Inserting data
To insert data in a table you have to send an Insert command:

INSERT INTO <table> (<field1>, <field2>, .., <fieldn>)


VALUES {(<value1>, <value2>, .., <valuen>),}

INSERT INTO martians (name, surname, base_id, email)


VALUES ('Josh', 'Martins', 2, 'jmartins@google.mars’);

INSERT INTO martians (name, surname, base_id, email)


VALUES ('Kira', 'Loshten', 2, 'klosh@google.mars’),
('Sandra', 'Osburn', 2, 'sosburn@google.mars');
Updating information Martians table (original data)

The syntax of the update command is:

UPDATE <table_name>
SET <field1> = <value1>,
<field2> = <value2>
WHERE <condition> Bases table

Let’s set Josh Martins to the base “Orbit 1”:

UPDATE martians
SET base_id = 4
WHERE martian_id = 9

or

UPDATE martians
SET base_id = 4
WHERE name = ‘Josh’ and surname = ‘Martins’
Updating information Martians table (original data)

Sophie Leicester has been identified as the CIA


agent Theresa Loid, please update its data in
the martians table.
Bases table

UPDATE martians
SET name = ‘Theresa’, surname = ‘Loid’
WHERE martian_id = 5
Updating information Supplies table

Some supplies, the ones that com from Mars,


will have to increase its value a 7%. The
affected supplies are: Martian Chicken,
Magazines and Bread.

UPDATE supplies
SET value = value * 1.07
WHERE supply_id IN (1, 9, 11)
Indexes and optimization
“Earthers” is a table, for marketing purposes, of earthers that may be
interested on the visit to Mars. There are more than 13.5 Million
registers!!.
Its structure is:
create table earthers
(
earther_id serial,
name text not null,
surname text not null,
email text,
primary key (earther_id)
);
Indexes and optimization
Search Time (in
seconds)
The search times in my computer (yours may be different) are: Who has the email James_Ruhl@microsoft.com
Select * from Earthers where email=‘James_Ruhl@microsoft.com’ 5.77
Get the register from Leo White
Select * from Earthers where name=‘Leo’ and surname=‘White’ 5.17
Can we do it better? Sure!! Count how many people is named Noah
Select count(*) from Earthers where name=‘Noah’ 4.84
Count how many people is named Noah
Select count(*) from Earthers where surname=‘White’ 5,13

What will the DBMS do to resolve the queries? Use the EXPLAIN
clause..

… Will go one by one… not very good!


SEQUENTIAL SCAN

https://www.postgresql.org/docs/12/sql-createindex.html
Indexes and optimization
Search Time (in
seconds)
Let’s create an index for the email field: Who has the email James_Ruhl@microsoft.com
Select * from Earthers where email=‘James_Ruhl@microsoft.com’ 5.77
Get the register from Leo White
Select * from Earthers where name=‘Leo’ and surname=‘White’ 5.17
CREATE INDEX ndx_earthers_email
Count how many people is named Noah
ON Earthers (email) Select count(*) from Earthers where name=‘Noah’ 4.84
Count how many people is named Noah
Select count(*) from Earthers where surname=‘White’ 5,13
It will take a bit… 2m 42s

How will the DBMS resolve the query now?


INDEX SEARCH
… Will go one by one… not very good!

How long will it take…. 6ms!!!


From 5700 ms to 6ms.. 1000 times faster!!

https://www.postgresql.org/docs/12/sql-createindex.html
Indexes and optimization
So… why not to place index for every field in all tables?

Because the price to pay is:


• Memory usage.
• Update and insert performance

So you have to know which are the queries to perform, and create
indexes only for those queries.

https://www.postgresql.org/docs/12/sql-createindex.html
Materialized views
A Materialized View is a view whose resulting data is stored on the
DBMS.

When is the snapshot taken? When you create it or when you refresh it.
It may not be up to date always? Yes. correct.
You can request the DBMS to refresh its data whenever you want.
It will provide the information much faster.

https://www.postgresql.org/docs/12/rules-materializedviews.html
Backup and recovery
From the terminal command line you have the following tools:

• pg_dump: Generates a dump (backup) of the database.


$ pg_dump –h <host> -U <user> <database>
$ pg_dump –h 127.0.0.1 –U dbuser1 martian_database | gzip > martiandb.dmp.gz

• psql: the main CLI interface for the database


$ psql –l –h 127.0.0.1 –U dbuser1 => will list the databases
$ psql –h 127.0.0.1 –U dbuser1 => will log in as dbuser and allow to launch commands

You might also like