
AWS Developer Associate

Databases on AWS
Learning Objectives

By the end of this lesson, you will be able to:

Identify the different types of databases offered by AWS

Present an overview of the features and benefits of Amazon RDS

Create a table using DynamoDB console

Delineate the concepts in DynamoDB

List the aspects of Amazon ElastiCache


Introduction to Databases
What Is a Database?

A database is a collection of individual data items stored in a highly structured manner.

Provides the ability to store a large amount of information

Facilitates quick access to information

Allows users to share information at different locations

Ensures data security

AWS offers both relational and non-relational databases.


Relational Databases

A Relational Database is a group of data items having pre-defined relationships with each other.
These items are arranged into a set of tables with rows and columns.

[Figure: example relational schema with four related tables]
Table 1, ratings: id int, rating int, user_id int, movie_id int
Table 2, users: id int, first_name varchar, last_name varchar, email varchar
Table 3, movies: id int, name varchar, description text
Table 4, tags: id int, tag varchar, user_id int, movie_id int

Features of Relational Databases

SQL: is a primary interface

Data integrity: is enforced by a set of constraints

Transactions: result in a COMMIT or a ROLLBACK

ACID compliance: ensures data integrity
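
The COMMIT or ROLLBACK behavior listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration on a local in-memory database, not an Amazon RDS connection; the users table and the add_users helper are assumptions made for the example.

```python
import sqlite3

# Local in-memory database used purely for illustration (not Amazon RDS).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)


def add_users(emails):
    """Insert all emails in one transaction.

    Returns True if the transaction was committed, False if a constraint
    violation caused it to be rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for email in emails:
                conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


def count_users():
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

If one insert in a batch violates the UNIQUE constraint, the whole batch is rolled back, so the table never ends up with a partial result.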


AWS Relational Databases

Here are some Relational Database Engines that Amazon RDS offers:

Amazon Aurora

Oracle

Microsoft SQL Server

MariaDB
Key-Value Databases

A Key-Value Database is a type of Non-relational Database. To store data, it uses a collection of key-value pairs in which the key acts as a unique identifier.

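[Figure: example of data stored as key-value pairs in DynamoDB]
Table: Products. The primary key consists of a partition key (Product ID) and a sort key (Type). The remaining attributes follow a schema that is defined per item.

Item: Product ID 1, Type Book ID; attributes: Odyssey, Homer, 1871
Item: Product ID 2, Type Album ID; attributes: 6 Partitas, Bach
Item: Product ID 2, Type Album ID: Track ID; attributes: Partita No. 1
Item: Product ID 3, Type Movie ID; attributes: The Kid, Chaplin, Drama, Comedy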
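
The Products example can be sketched in plain Python as a mapping from a partition key and a sort key to a free-form item. This is a toy in-memory model, not the DynamoDB API; the KeyValueTable class and its method names are hypothetical.

```python
from collections import defaultdict


class KeyValueTable:
    """Toy key-value table addressed by (partition key, sort key)."""

    def __init__(self):
        # partition key -> {sort key -> item attributes}
        self._partitions = defaultdict(dict)

    def put_item(self, partition_key, sort_key, attributes):
        # Each item carries its own attributes: the schema is defined per item.
        self._partitions[partition_key][sort_key] = dict(attributes)

    def get_item(self, partition_key, sort_key):
        # Direct lookup by the full primary key; returns None if absent.
        return self._partitions.get(partition_key, {}).get(sort_key)


products = KeyValueTable()
products.put_item(1, "Book ID", {"title": "Odyssey", "author": "Homer", "year": 1871})
products.put_item(2, "Album ID", {"title": "6 Partitas", "composer": "Bach"})
```

Note that the two items do not share the same attribute names, which is what "schema is defined per item" means in practice.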


Use Cases of Key-Value Databases

Session Store

Session data is always queried by a primary key. Hence, a fast key-value store is an ideal fit for session data.

Shopping Cart

Key-Value Databases are capable of scaling to large amounts of data and high volumes of state changes.
AWS In-Memory Databases

An In-Memory Database is a type of purpose-built database that primarily depends on memory for data storage.
In-Memory Databases are ideal for applications that need microsecond response times.

[Figure: an application connects to a master server, with data held in RAM across Data Partition 1, Data Partition 2, Data Partition 3, and Data Partition 4]
Use Cases of In-Memory Databases

Real-Time Bidding

In-Memory Databases can ingest, process, and analyze real-time data with sub-millisecond latency.

Gaming Leaderboards

In-Memory Databases can quickly deliver sorting results and update the leaderboard in real time.

Caching

The primary purpose of a cache is to improve data retrieval performance.
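
The caching use case is often implemented as a cache-aside lookup: check the cache first and fall back to the database only on a miss. The sketch below is a minimal in-process illustration; the CacheAside class, the load_from_db callback, and the 60-second default expiry are assumptions, and a real deployment would use a cache service such as ElastiCache instead of a Python dictionary.

```python
import time


class CacheAside:
    """Minimal cache-aside wrapper around a slow lookup function."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical slow database lookup
        self._ttl = ttl_seconds        # how long a cached value stays valid
        self._clock = clock
        self._entries = {}             # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]            # cache hit: no database access
        value = self._load(key)        # cache miss: read from the database
        self._entries[key] = (value, now + self._ttl)
        return value
```

Repeated reads of the same key within the expiry window are served from memory, which is where the retrieval speedup comes from.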
AWS In-Memory Databases

Amazon ElastiCache for Redis

A blazing fast in-memory data store that provides sub-millisecond latency to power Internet-scale, real-time applications

Amazon ElastiCache for Memcached

A Memcached-compatible in-memory key-value store service that can be used as a cache or a data store
Amazon RDS
Amazon RDS

Amazon RDS is a Relational Database Management service.

• Lets you scale CPU, memory, IOPS, and storage independently
• Looks after software patching, updates, backups, recovery, and automatic failure detection
• Facilitates creating backups automatically or manually via snapshots
• Has a primary instance and a simultaneous secondary instance to provide high availability and avoid failure

It is mainly used to manage the data of e-commerce platforms, gaming software, apps, and websites.
Benefits of Amazon RDS

Availability of the MySQL, PostgreSQL, Oracle, and SQL Server engines

Payment only for what is used

Ease in handling of patching, backups, and replication

Simple and fast scaling

Simple and fast deployment

Fast and predictable performance


Amazon RDS Database Engines

Amazon Aurora

PostgreSQL

MySQL

MariaDB

Microsoft SQL Server

Oracle
Amazon Aurora

Amazon Aurora is a Relational Database fully managed by Amazon RDS.

Compatibility with MySQL and PostgreSQL

High speed: up to 5X faster performance than MySQL and 3X faster performance than PostgreSQL

Applicability for cross-region Read Replica

High availability, durability, and security

Cost-effective

An Amazon Aurora storage volume consists of 10 GB logical blocks. It can scale up to 64 TB when required.
Crash Recovery

Traditional Databases

• Replay logs since the last checkpoint
• Generally take about five minutes between checkpoints
• MySQL replays in a single thread, and the number of disk accesses is very high

Amazon Aurora

• Performs redo of records on demand, as part of a disk read
• Performs parallel, distributed, and asynchronous operations
• Does not replay logs on server startup
Use Cases of Amazon RDS

Web and Mobile Applications

• Amazon RDS is the perfect fit for highly demanding applications as it provides high throughput, massive storage scalability, and high availability.

• The absence of licensing constraints best suits the variable usage pattern of these applications.
Use Cases of Amazon RDS

E-Commerce Applications

• Amazon RDS is a flexible, secure, highly scalable, and low-cost database solution that is well suited to small and large e-commerce businesses.

• It helps satisfy PCI compliance and builds a superior customer experience, without the hassle of managing the underlying database.
Use Cases of Amazon RDS

Mobile and Online Games

• Amazon RDS efficiently manages the database by taking care of the provisioning, scaling, and monitoring of database servers.

• It can rapidly increase its capacity by providing familiar database engines to meet user demand.
Database Instances

A DB Instance is an isolated database environment running in the cloud.

It is a basic building block of RDS.

The computation and memory capacity of a DB Instance is determined by its DB Instance class, which is selected as per need.

Every DB Instance can host multiple user-created databases or a single Oracle database with multiple schemas.

Every DB Instance runs on a DB engine.

By default, a customer can have 40 DB Instances.


Backup and Restore in Amazon RDS

[Figure: data flow diagram during backup and restore, showing RDS Instance R1 and EC2 Instance A in VPC A, RDS Instance R2 and EC2 Instance B in VPC B, and an S3 bucket]


Backup and Restore in Amazon RDS

Amazon RDS offers automated backups, point-in-time restores, and database snapshots.

Amazon RDS creates automated backups of DB Instances, based on the specified backup retention period.

The backup retention period can be set between one and 35 days.

Backups can also be created manually via snapshots.

When a DB Instance is deleted, its automated backups are also deleted, but the manual snapshots are retained.
Multi-AZ Deployments in Amazon RDS

When a Multi-AZ DB Instance is provisioned, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ).

Benefits

Enhanced durability

Increased availability

Protected database performance

Automatic failover
Failover Conditions

Amazon RDS automatically switches from the primary DB Instance to a standby replica in another Availability Zone whenever one or more of the following conditions occur:

Failure of the primary DB Instance

Outage of an Availability Zone

Software patching of the operating system of the DB Instance

Change in the DB Instance server type

The normal failover time is 60–120 seconds. This may be exceeded in case of a heavy
recovery process.
Failover Conditions

[Figure: failover across Availability Zone A and Availability Zone B, showing the application servers, a database failure, the primary, the standby, and the new standby]
Read Replicas in Amazon RDS

Read Replicas are one or more copies of a particular relational DB Instance that handle high-volume read traffic.

• Any Amazon RDS activity that is initiated runs only in the current default region.

• Amazon RDS provides high availability and failover support for DB Instances by maintaining a synchronous standby replica in Multi-AZ deployments.

• Amazon RDS keeps the standby replica in a different Availability Zone synchronized with the primary, whereas Read Replicas are updated through asynchronous replication.

[Figure: application servers send read/write traffic to the primary database server, which uses asynchronous replication to update a read replica that serves read-only traffic from a BI/reporting application server]
Costs of Amazon RDS

Amazon RDS offers pay-for-what-you-use pricing. The list below gives the billing procedure for various parameters:

DB Instance hours: Based on the class; a full hour is billed even if the DB Instance is used for a partial hour

Storage (per GB per month): Changes to the provisioned storage capacity within the month are billed pro rata

I/O requests per month: Total number of storage I/O requests

Data transfer: Internet data transfer in and out of your DB Instance
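
The first two billing rules can be worked through as simple arithmetic. The sketch below follows the rules exactly as stated in this lesson (partial hours rounded up, storage billed pro rata); the function names and all rates are made-up placeholders, and actual AWS prices and billing granularity should be checked against the current pricing page.

```python
import math


def instance_hours_billed(seconds_running):
    """A partial hour counts as a full hour, as described in this lesson."""
    return math.ceil(seconds_running / 3600)


def estimate_monthly_cost(seconds_running, hourly_rate,
                          storage_gb, storage_rate_per_gb_month,
                          fraction_of_month=1.0):
    """Estimate instance plus storage cost; all rates are hypothetical."""
    instance_cost = instance_hours_billed(seconds_running) * hourly_rate
    # Storage is pro-rated by the share of the month it was provisioned.
    storage_cost = storage_gb * storage_rate_per_gb_month * fraction_of_month
    return round(instance_cost + storage_cost, 2)
```

For example, an instance that ran for 90 minutes is billed for two instance hours under this rule.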


Assisted Practice
RDS Database Instance

Problem Statement: Create an RDS database instance. Duration: 15 mins


Assisted Practice: Guidelines

Steps to create an RDS Database Instance:

1. Go to the AWS Management Console and click on “RDS”.
2. Select the database engine.
3. Fill in the required details.
4. Click on “launch DB Instance”.
5. Install WAMP 64 and give the path of its location in the command prompt.
6. Enter the endpoint, username, port, and password to connect the WAMP server to the AWS RDS instance.
7. Once the connection is established, perform CRUD operations on the database.
Amazon DynamoDB
Difference Between SQL and NoSQL Databases

Workloads
SQL: Ad hoc queries, data warehousing, OLAP
NoSQL: Web-scale applications

Data model
SQL: Well-defined schema where data is normalized into tables, rows, and columns
NoSQL: Schema-less with a primary key; manages structured or semi-structured data

Data access
SQL: SQL
NoSQL: AWS Management Console or AWS CLI; performs ad hoc tasks

Performance
SQL: Optimized for storage
NoSQL: Optimized for compute

Scaling
SQL: Vertical scaling
NoSQL: Horizontal scaling


Amazon DynamoDB

DynamoDB is a fully managed NoSQL database that supports key-value and document data.

It is used by systems that require millisecond read latency.

Each record in a table is known as an item. A TTL (Time to Live) can be set to automatically delete the items in the table once they expire.

Operations such as create, insert, update, query, scan, and delete are
performed in the table via appropriate API.

For faster performance and data durability, the table data is stored in an SSD
disk and spread across many servers in different availability zones.
Use Cases of Amazon DynamoDB

Ad tech

Gaming

Retail

Banking and Finance

Media and Entertainment

Software and Internet

Read Consistency in DynamoDB

DynamoDB supports both Eventually Consistent Reads and Strongly Consistent Reads.

Eventually Consistent Read


The response might not reflect the results of a recently completed write, so stale data can be returned.
If the read request is repeated after a short time, the response returns the latest data.

Strongly Consistent Read


The response is returned with the most up-to-date data, reflecting the updates from all
prior successful write operations.
Strongly Consistent Read might not be available if there is a network delay or outage.
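
The difference between the two read modes can be illustrated with a toy model in which writes reach a leader copy immediately and a replica copy only after replication runs. This is a simplified sketch, not how DynamoDB is implemented internally; the ReplicatedItemStore class is hypothetical, and its consistent_read flag merely mirrors the idea of the ConsistentRead request parameter.

```python
class ReplicatedItemStore:
    """Toy model: a leader copy plus one lagging replica copy."""

    def __init__(self):
        self._leader = {}
        self._replica = {}

    def put(self, key, value):
        # Writes are acknowledged once the leader has them.
        self._leader[key] = value

    def replicate(self):
        # Stands in for the background propagation of writes.
        self._replica = dict(self._leader)

    def get(self, key, consistent_read=False):
        # Strongly consistent reads go to the leader; eventually
        # consistent reads may be served from the lagging replica.
        source = self._leader if consistent_read else self._replica
        return source.get(key)
```

An eventually consistent read issued right after a write can miss it, while the same read repeated after replication returns the latest value.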
Amazon DynamoDB Global Tables

Amazon DynamoDB global tables provide a complete solution for deploying a multi-region, multi-active database, without the need to build and maintain your own replication solution.

The AWS Regions where the table is to be available can be specified.

DynamoDB executes all the tasks needed to create identical tables in the
specified regions and distributes ongoing data changes to all of them.
How DynamoDB works

1. Create table

2. Add and query items

3. Monitor and manage table
Benefits of Amazon DynamoDB Global Tables

Is a perfect fit for massively scaled applications with globally dispersed users

Promotes fast application performance

Provides automatic multi-active replication to AWS Regions globally

Delivers low-latency data access to users, irrespective of their location
Amazon DynamoDB Pricing

The cost for using DynamoDB depends on the charges for reading, writing, and storing data
in DynamoDB tables, and for optional features, if any.

DynamoDB has two capacity modes that have specific billing options.

On-demand capacity mode

Charges for the data reads and writes the application performs on the tables

Provisioned capacity mode

Charges according to the number of reads and writes per second specified by the user
DynamoDB Use Case: Duolingo

Duolingo is a popular language-learning website and mobile app that delivers lessons for
80 languages. Duolingo uses DynamoDB to store around 31 billion items.

DynamoDB fits the requirements for Duolingo owing to its scalability and performance.
Assisted Practice
DynamoDB

Problem Statement: Create a table using the DynamoDB Console. Duration: 15 mins
Assisted Practice: Guidelines

Steps to create a table using the DynamoDB Console:

1. Go to the AWS Management Console and select the DynamoDB service.
2. Click on create table and enter the table name and primary keys.
3. Now select Items and click on create item to insert data in the table.
4. If the data is inserted successfully, you can read it from the dashboard.
5. If you want to remove an item from the table, click on remove.
6. If you want to delete the table, click on Delete table.
DynamoDB Concepts
Indexes

An index is a data structure that allows the user to perform fast queries on
specific columns in a table.

DynamoDB supports two types of indexes.

01 Local Secondary Index

02 Global Secondary Index
Scan vs Query API Call

Scan API scans the table to look for elements that match the criteria.

Query API performs a direct lookup to a selected partition. The lookup will be based on the partition or hash key.
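
The cost difference between the two calls can be sketched with a dictionary that groups items by partition key. This is a plain-Python illustration rather than the DynamoDB API; the orders table, its keys, and both helper functions are hypothetical. Each helper also returns how many items it had to examine.

```python
# Hypothetical table: partition key -> list of items in that partition.
orders = {
    "user#1": [{"order": 1, "total": 30}, {"order": 2, "total": 75}],
    "user#2": [{"order": 1, "total": 120}],
}


def scan(table, predicate):
    """Examine every item in every partition, keeping those that match."""
    matches, examined = [], 0
    for items in table.values():
        for item in items:
            examined += 1
            if predicate(item):
                matches.append(item)
    return matches, examined


def query(table, partition_key, predicate=lambda item: True):
    """Go straight to one partition and examine only its items."""
    items = table.get(partition_key, [])
    return [item for item in items if predicate(item)], len(items)
```

A scan touches all three items to answer a filter, while a query for one user touches only that user's partition.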
DynamoDB APIs

There are three planes in DynamoDB API.

Control Plane

Data Plane

DynamoDB Streams
Control Plane

The Control Plane lets you create and manage DynamoDB tables.

Operations

CreateTable

DescribeTable

UpdateTable

DeleteTable

ListTables

DescribeLimits
Data Plane

The Data Plane lets you perform CRUD actions on data in a table.

Creating data

Reading data

Updating data

Deleting data
Throughput Capacity

Throughput capacity is the rate of reads and writes that a DynamoDB table is provisioned to support.

Read and Write capacities

A Read Capacity unit represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size.

A Write Capacity unit represents one write per second for an item up to 1 KB in size.

Note

Specify the capacity requirement for Read and Write activity while creating the table.
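
The capacity definitions above translate into a small calculation: round the item size up to the next 4 KB (reads) or 1 KB (writes), multiply by the request rate, and halve the read figure for Eventually Consistent Reads. The sketch below assumes 1 KB means 1,024 bytes; the function names are illustrative.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second,
                        strongly_consistent=True):
    """Read Capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_size_bytes / 4096)  # 4 KB blocks
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        total = total / 2  # one unit covers two eventually consistent reads
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Write Capacity units needed for a steady write rate."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second  # 1 KB blocks
```

For instance, reading a 6,000-byte item 10 times per second needs 20 units with Strongly Consistent Reads and 10 units with Eventually Consistent Reads.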
DynamoDB On-Demand Capacity

DynamoDB On-Demand Capacity is a flexible billing option that requires no capacity planning. The user need not specify the Read and Write Capacity.

On-demand is preferable when:

New tables with unknown workloads must be created.

The application traffic is unpredictable.

Paying only for what is used is preferred.

Note

On-demand mode can be chosen either while creating the table, or later, using the Capacity tab.
DynamoDB Accelerator

DynamoDB Accelerator (DAX) is a caching service, which is:

Fully managed

Highly available

Up to 10 times faster

An in-memory cache
DynamoDB Transactions

DynamoDB transactions help developers operate on multiple items in a single request.

Help the developer implement business logic that requires multiple, all-or-nothing operations across one or more tables

Provide atomicity, consistency, isolation, and durability (ACID) across tables

Bring scale and performance to a broader set of workloads

Offer multiple read and write options to meet different application requirements
Working of DynamoDB Transactions

TransactWriteItems API
Is a batch operation that contains a write set with one or more PutItem, UpdateItem, and DeleteItem operations. It can optionally check prerequisite conditions that must be satisfied before an update is made.

Idempotency
It is an optional feature that prevents application errors if the same operation is submitted multiple times due to connection time-outs or network errors.
Working of DynamoDB Transactions

Error Handling for Writing


A write transaction fails if a condition expression is not met or if more than one action in the same TransactWriteItems request targets the same item.
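
The all-or-nothing behavior, the two failure rules, and idempotent retries can be sketched with a toy in-memory table. This is an illustration of the semantics only, not the DynamoDB API: the ToyTable class, the TransactionCanceled exception, and the action tuple format are hypothetical, and client_token stands in for the idea of a client request token.

```python
class TransactionCanceled(Exception):
    """Raised when the whole write set is rejected."""


class ToyTable:
    def __init__(self):
        self.items = {}
        self._seen_tokens = set()

    def transact_write(self, actions, client_token=None):
        """Apply all actions or none.

        actions: list of (key, new_value, condition) tuples, where
        condition is None or a callable taking the current value.
        A new_value of None deletes the item.
        """
        if client_token is not None and client_token in self._seen_tokens:
            return  # retry of an already applied request: do nothing
        keys = [key for key, _, _ in actions]
        if len(keys) != len(set(keys)):
            raise TransactionCanceled("two actions target the same item")
        # Check every condition before changing anything.
        for key, _, condition in actions:
            if condition is not None and not condition(self.items.get(key)):
                raise TransactionCanceled(f"condition failed for {key!r}")
        for key, new_value, _ in actions:
            if new_value is None:
                self.items.pop(key, None)
            else:
                self.items[key] = new_value
        if client_token is not None:
            self._seen_tokens.add(client_token)
```

Because all conditions are checked before any write is applied, a failed condition leaves every item untouched.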

TransactGetItems API
Is a batch operation that contains a read set with one or more GetItem operations. If it is issued on an item that is part of an active write transaction, the read transaction is cancelled. It can include up to 25 unique items or 4 MB of data.
DynamoDB Transactions

Within a transaction, a conflict can occur during concurrent item-level requests on the same item.

The scenarios in which transactional conflicts could occur are:

A request (put, update, delete) for an item conflicts with an ongoing TransactWriteItems request

A TransactWriteItems request conflicts with an ongoing TransactWriteItems request for the same item

A TransactGetItems request conflicts with an ongoing TransactWriteItems request for the same item
DynamoDB Time To Live

Amazon DynamoDB Time to Live (TTL) supports defining a per-item timestamp. This helps to
determine when an item is no longer needed.

TTL Features

Removes user or sensor data after one year of inactivity in an application

Archives expired items to an Amazon S3 data lake via Amazon DynamoDB Streams and AWS Lambda

Retains sensitive data for a certain amount of time, based on contractual or regulatory obligations
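
The per-item timestamp can be illustrated as an attribute holding an expiry time in epoch seconds, which is compared against the current time. The sketch below is a local simulation only; the expires_at attribute name and both helpers are assumptions, and the real service removes expired items in the background rather than at the exact moment of expiry.

```python
import time


def is_expired(item, ttl_attribute="expires_at", now=None):
    """True if the item's TTL timestamp (epoch seconds) is in the past."""
    now = time.time() if now is None else now
    expiry = item.get(ttl_attribute)
    # Items without a numeric TTL attribute never expire.
    return isinstance(expiry, (int, float)) and expiry <= now


def sweep(items, ttl_attribute="expires_at", now=None):
    """Split items into (kept, expired); expired ones could be archived."""
    kept, expired = [], []
    for item in items:
        (expired if is_expired(item, ttl_attribute, now) else kept).append(item)
    return kept, expired
```

The expired list returned by sweep corresponds to the items that an archiving step would copy elsewhere before they disappear.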
DynamoDB Streams

DynamoDB Streams capture a time-ordered sequence of item-level changes in a table. They can be used to replicate the data from one table to another in a different region.

APIs used for data transfer are:

ListStreams: Retrieves a list of stream descriptors for the current account and endpoint

DescribeStream: Retrieves detailed information about a given stream

GetShardIterator: Retrieves a shard iterator

GetRecords: Retrieves the stream records within a given shard


Routing Policies

Routing Policies are used to route the traffic based on the geographic location
from where the DNS query has originated.

Fast and consistent performance

Fully managed

Fine-grained access control

Highly scalable

Event-driven programming

Flexible in nature
Amazon ElastiCache
Amazon ElastiCache

ElastiCache is an AWS in-memory data store and cache environment. It is used to cache results and reduce overhead and latency on the database.

It is a web service that improves the performance of web applications.

It helps to set up, manage, and scale a distributed in-memory cache environment in the cloud.

It supports two open-source memory engines—Redis and Memcached.


Popular Use Cases of ElastiCache: Adtech

Ad serving

Real-time bidding

ID lookup

Session tracking

User profile management


Popular Use Cases of ElastiCache: IoT

Tracking state

Real-time notification

Metadata and readings from millions of devices
Popular Use Cases of ElastiCache: Gaming

Recording game details

Leaderboards

Session information

Usage history

Logs
Popular Use Cases of ElastiCache: Mobile and Web

Storing user profile

Session details

Personalization setting

Entity-specific metadata
Amazon ElastiCache: Redis

Redis is an in-memory data structure store used as a database, cache, and message broker.

It is single threaded, and its Read Replicas are synced asynchronously.

A collection of one to six Redis nodes is called a shard.

It uses one to 15 shards when cluster mode is enabled and only one shard when it is disabled.

It stores the backups in Amazon S3, with a retention period of 0 to 35 days.


Amazon ElastiCache for Redis: Benefits

Monitoring and management

Simplified administrative tasks

Enhanced Redis Engine

Reliable and efficient open source Redis

Security and compliance

Compliant data protection and help

Scalability

Adjustable usage, based on the needs
Amazon ElastiCache: Memcached

Memcached is used to speed up dynamic, data-driven websites. Hence, it is called a distributed memory caching system.

Memcached is simple to use and is multi-threaded.

Memcached cluster can have a maximum of 100 nodes in a region.

Memcached supports both horizontal and vertical scaling.

Memcached is fast and is well established.


Benefits of Amazon ElastiCache for Memcached

Extreme Performance

By utilizing an end-to-end optimized stack running on customer nodes, it provides blazing fast performance.

Secure and Hardened

It continuously monitors your nodes and applies the necessary patches to keep your environment safe.

Memcached Compatible

It is compliant with Memcached, so popular tools in use today will work seamlessly with the service.
Benefits of Amazon ElastiCache for Memcached

Easily Scalable

It includes sharding to scale the in-memory cache up to 20 nodes and 12.7 TB per cluster.

Fully Managed

There is no longer a need to perform management tasks, as it monitors your cluster to keep your workloads up and running.

Auto-Discovery

It saves users’ time by simplifying the way an application connects to a Memcached cluster.
Amazon ElastiCache Costs

ElastiCache offers usage-based pricing following a free trial. It provides storage space for one snapshot free of charge for each active ElastiCache for Redis cluster.
Shown below are the node pricing options supported by ElastiCache:

On-demand nodes: A user pays for memory capacity by the hour that a node runs.

Reserved nodes: A user can choose to make a full one-time upfront payment, no upfront payment, or a partial one-time upfront payment with low hourly charges for each reserved node.

Note

Additional backup storage for snapshots is charged at $0.085 per GB every month.
Memcached versus Redis

Description
Memcached: Is an in-memory key-value store, originally intended for caching
Redis: Is an in-memory data structure store, used as database, cache, and message broker

Replication
Memcached: Does not support replication
Redis: Supports master-slave replication

Storage type
Memcached: Stores variables in memory and retrieves information directly from the server instead of the DB
Redis: Is like a database that resides in memory
Memcached versus Redis

Read and Write speed
Memcached: Good at handling high-traffic websites
Redis: Can handle both high read traffic and heavy writes

Key length
Memcached: Has a maximum of 250 bytes
Redis: Has a maximum of 512 MB

Ideal for
Memcached: Caching relatively small and static data such as HTML code fragments
Redis: Session cache, full page cache (FPC), queues, leaderboards or counting, and more
Key Takeaways

There are three types of databases offered by AWS: Relational, Key-Value, and In-Memory Databases.

Amazon RDS is a web service that helps to set up, operate, and scale a relational database in the AWS Cloud.

Amazon DynamoDB is a fully-managed NoSQL database service that provides high speed and seamless scalability.

There are three planes in DynamoDB API: Control Plane, Data Plane, and DynamoDB Streams.

Amazon ElastiCache is used to cache results and reduce the overhead and latency on the database.
Storing Application Data in MySQL DB using Amazon RDS

Problem Statement:
You are asked to demonstrate joining multiple VPCs together using a Peering Connection and Private Link

Tools required:
WAMP Server, AWS RDS, Visual Studio Code

Expected Deliverables:
Screenshots for every step
