DBS-C01-S02-B-03-Relational Databases

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 4

DBS-C01-S02-B-03-Relational Databases

Relational databases have been commercially available since the 1970s.  They provide an efficient, intuitive, and
flexible way to store and report on highly-structured data. These structures, called schemas, are defined
before any data can be entered into the database. Schemas are designed and built based on reporting
requirements.  

This means that a database’s expected output drives the creation of the database and how data is stored
inside it. Once a schema has been defined, database administrators and programmers work backward from
these requirements to define how data will be stored inside the database. No data can be stored in a database
until this work has been completed.  

Schema changes to existing databases are expensive in terms of time and compute power.   It also has a risk of
corrupting data and breaking existing reports.

 Data in a relational database is stored in tables. 

 Each table--sometimes called a relation--contains one or more rows of data.  

 Each row--sometimes called a record--contains a collection of logically related data that is identified by
a key. 

 The pieces of data stored in a row are called attributes or fields.

Visually, a table looks like a spreadsheet that has rows and columns.  Stored in one of the columns, each table
has a primary key that uniquely identifies the information stored in each row.   Relationships between tables
are created using these keys and there are rules that govern their behaviour.  The primary key in one table is a
foreign key in another.

Data integrity is of particular concern in a relational database, there are a number of constraints that ensure
the data contained in tables is reliable and accurate.

These reliability features--commonly referred to as ACID transactions:

1. Atomicity

2. Consistency

3. Isolation

4. Durability 

 Atomicity refers to the elements that make up a single database transaction.  A transaction could have
multiple parts.  It is treated as a single unit that either succeeds completely or fails completely.

 Consistency refers to the database’s state.  Transactions must take the database from one valid state
to another valid state.

 Isolation prevents one transaction from interfering with another.


 Durability ensures that data changes become permanent once the transaction is committed to the
database.

Data in a relational database must be kept in a known and stable state.  

 As part of the requirements to maintain database stability, Primary and Foreign Keys are constrained--
they have rules that govern them--to ensure the integrity of database tables.

 Entity Integrity ensures that, in a table, the primary key is unique to the table and it has a value. 
Primary keys cannot be blank or null.

 Referential Integrity requires that every value in a Foreign Key column exists as the Primary Key of its
originating table.  If four tables are related and a record is deleted in one of them, then the
corresponding records in related tables must be deleted as well.    

The standard user and application programming interface--or API--of relational databases is the Structured
Query Language, SQL. Pronounced as either Ess-Queue-Ell or Sequel, it can be used either interactively or
programmatically to create, update, and maintain the data inside a relational database. SQL is the dominant
query language for relational databases. SQL is an industry standard, it is interoperable between database
engines and application programming languages, well-documented, and stable.

Security is one of the most important responsibilities of a database administrator.  Relational database engines
have built-in features for securing and protecting data but planning and effort are required to properly
implement them. 

These features include:

1. User Authentication

2. Authorization

3. Audit Logging

As part of the structure, data stored in relational databases is highly normalized. 

 Normalization is a process where information is organized efficiently and consistently before storing
it.

 Duplicate data is discarded.

 Closely related fields are grouped together.

 Data should only be stored one time in a relational database.  Fields that are logically related, like a
first and last name, should be stored in the same table.  

 Removing redundancy and keeping similar data close reduces storage costs and improves the
efficiency of data retrieval.

Relational databases are not partition tolerant.  A data partition, in this case, refers to the disk.  Adding
another disk would be like creating a second copy of the database.  This copy, or partition, is called a shard.  
 When a shard is created, it uses the original database’s schema.  This is a horizontal partition of a
database.

 To use it, logic outside of the database must be created to direct queries to the correct database.

 This is because relational databases are designed to validate how data is stored.   They do not check to
see if information belongs inside it.

To illustrate, here’s a weather database split into a pair of shards, Rain and Snow.  They are identical except for
the information stored inside them. An application determines if data should be stored in Rain or if it should be
stored in Snow. If a record belonging in Rain ends up in Snow and it matches the database schema, it will be
stored. However, since that record belongs in Rain, the reports will be wrong and applications will break when
trying to query data.

Because of this complexity, most of the time relational databases are scaled vertically.  

 Horizontal scaling adds a copy of the database server. 

 Vertical scaling is growing the server; usually by adding memory, CPU, or expanding a disk volume .

Vertical scaling has limits.  There are only so many resources that will fit inside a server.  Once these limits have
been reached, a database will either need to be redesigned or broken into shards.

AWS has six fully-managed database engines available inside the Relational Database Service, RDS:

1. Amazon Aurora

2. MySQL

3. Postgres

4. MariaDB

5. Oracle

6. Microsoft SQL Server

Amazon Aurora is AWS’s cloud-native version of MySQL and Postgres.  

As a review…

 Relational databases are highly-structured data stores.

 The structure is called a schema.

 The schema defines how data is stored in tables.

 Inside tables there are rows and columns.


 A row is a record and each column is an attribute or field of the record.

 Tables have keys that identify data in a table.

 A Primary Key uniquely identifies a row in a table.

 Foreign Keys are used to connect data in a row to rows in other tables.

 Scaling is usually done vertically by adding compute resources to an existing database.

 Horizontal scaling is called sharding and requires logic outside of the database.

 Relational databases are ideal for applications that do online transactional processing. 

 These OLTP applications include online banking, e-commerce sites, inventory management, human
resource management, and financial services.

 OLTP transactions usually perform specific tasks and involve a single record or a small selection of
records. An online banking customer might send money from a checking account to a saving account.
A transaction like this involves two accounts and no other customers of the bank.

But, what about analytical applications where hundreds, thousands, or millions of transactions need to be
processed quickly, efficiently, and at a low cost?  

That’s where non-relational databases are helpful.  Though, unlike the various relational database engines that
have similar needs around structured data, the size & shape of the unstructured or semi-structured data
determine the type of non-relational database to choose.

These non-relational databases are often called NoSQL databases because, when they were first developed,
they used something other than SQL to store and retrieve data.  

However, over time, SQL has been adapted to be used with some of these non-relational databases.   Because
of this, NoSQL can also mean “Not Only SQL.”

If any type of data can be stored in a relational database, why bother with a non-relational database?  

In the next lecture, let’s learn about NoSQL Databases, what they are, and what differentiates them from
relational databases.

You might also like