
Databases on AWS

Amazon Relational Database Service (RDS)


Amazon Relational Database Service (Amazon RDS) is a managed web service that
makes it easy to set up and operate a relational database in the cloud. Because it is a
managed service, Amazon RDS handles administration tasks on your behalf, such as
hardware provisioning, database setup, patching, backups, automatic failure
detection, and recovery. You can use the AWS Database Migration Service to
migrate or replicate your existing databases to Amazon RDS.

Amazon RDS supports the following relational database engines:

1. SQL Server
2. Oracle
3. MySQL
4. PostgreSQL
5. MariaDB
6. Amazon Aurora

DB Instance
A DB instance is an isolated database environment in the cloud. Each DB instance
runs a DB engine. Amazon RDS currently supports six DB engines. The compute
and memory capacity of a DB instance are determined by its DB instance class.

● DB instance storage comes in three types:
○ General Purpose (SSD)
○ Provisioned IOPS (PIOPS)
○ Magnetic
● Amazon RDS uses Network Time Protocol (NTP) to synchronize the time on DB
instances.
● You can stop a DB instance for up to 7 days. If you do not manually start your DB
instance after 7 days, it will be started automatically.
● A DB instance can host multiple databases, or a single Oracle database with
multiple schemas.
● By default, you can have up to a total of 40 Amazon RDS DB instances per AWS Region.
● RDS is not serverless, with the exception of Aurora Serverless.
● DynamoDB is serverless.

DB Parameter Groups
A DB parameter group acts as a container for engine configuration values that are
applied to one or more DB instances. You can manage your DB engine configuration
by associating your DB instances with parameter groups. Amazon RDS defines
parameter groups with default settings that apply to newly created DB instances.
You cannot modify the parameter settings of a default DB parameter group. Instead,
you can define your own parameter groups with customized settings and
then modify your DB instances to use them. You can also specify a
custom parameter group when launching a new RDS instance.
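As a minimal sketch, the custom parameter group workflow could look like this with the AWS SDK for Python (boto3). It assumes you pass in an RDS client such as `boto3.client("rds")`; the helper name, the group name, the `mysql8.0` family, and the `max_connections` value are hypothetical, illustrative choices rather than recommended settings.

```python
def apply_custom_parameter_group(rds, instance_id,
                                 group_name="my-mysql80-params",
                                 family="mysql8.0"):
    """Hypothetical helper: create a custom DB parameter group, override one
    setting, and attach it to an instance. Names and values are illustrative."""
    # Default parameter groups are read-only, so start from a new custom group.
    rds.create_db_parameter_group(
        DBParameterGroupName=group_name,
        DBParameterGroupFamily=family,
        Description=f"Custom settings for {instance_id}",
    )
    # Override one engine setting (illustrative value).
    # "pending-reboot" defers the change until the next reboot.
    rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=[{
            "ParameterName": "max_connections",
            "ParameterValue": "200",
            "ApplyMethod": "pending-reboot",
        }],
    )
    # Associate the instance with the custom group; the instance
    # picks up the new group after its next reboot.
    rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBParameterGroupName=group_name,
        ApplyImmediately=True,
    )
    return group_name
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.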

Option Groups
Some DB engines offer additional features that make it easier to manage data and
databases, and to provide additional security for your database. Amazon RDS uses
option groups to enable and configure these features. An option group can specify
features, called options, that are available for a particular Amazon RDS DB instance.
Options can have settings that specify how the option works. When you associate a
DB instance with an option group, the specified options and option settings are
enabled for that DB instance.

Encrypting Amazon RDS Resources


Amazon RDS supports encryption at rest for all database engines. When encryption is
enabled, the underlying storage of the DB instance is encrypted, as are its automated
backups, read replicas, and snapshots.

Encryption at rest is enabled when you create the DB instance; it cannot be turned on
in place for an existing instance. You also need to ensure that the DB instance class
supports encryption.

You can also add encryption to a previously unencrypted DB instance by creating a
DB snapshot and then creating a copy of that snapshot and specifying a KMS
encryption key. You can then restore an encrypted DB instance from the encrypted
snapshot.
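The snapshot, copy, and restore steps can be sketched with boto3 as follows. This is a hedged example, assuming an RDS client (for example `boto3.client("rds")`) and an existing KMS key; the helper name and the snapshot naming scheme are hypothetical. The `db_snapshot_available` waiter pauses until each snapshot is ready before the next step runs.

```python
def encrypt_existing_instance(rds, instance_id, kms_key_id, new_instance_id):
    """Hypothetical helper: produce an encrypted copy of an unencrypted DB
    instance via snapshot -> encrypted snapshot copy -> restore."""
    snap = f"{instance_id}-unencrypted-snap"      # illustrative naming scheme
    enc_snap = f"{instance_id}-encrypted-snap"
    waiter = rds.get_waiter("db_snapshot_available")

    # 1. Take a manual snapshot of the unencrypted instance.
    rds.create_db_snapshot(DBSnapshotIdentifier=snap,
                           DBInstanceIdentifier=instance_id)
    waiter.wait(DBSnapshotIdentifier=snap)

    # 2. Copy the snapshot; specifying KmsKeyId makes the copy encrypted.
    rds.copy_db_snapshot(SourceDBSnapshotIdentifier=snap,
                         TargetDBSnapshotIdentifier=enc_snap,
                         KmsKeyId=kms_key_id)
    waiter.wait(DBSnapshotIdentifier=enc_snap)

    # 3. Restore a new, encrypted DB instance from the encrypted copy.
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=enc_snap,
    )
    return enc_snap
```

The restored instance is a new instance with its own endpoint, so applications must be pointed at it before the original unencrypted instance is retired.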

Encryption in transit is also supported by all Amazon RDS engines using SSL/TLS.
Once an encrypted connection is established, data transferred between the DB
instance and your application is encrypted in transit.

High Availability (Multi-AZ)
Amazon RDS provides high availability and failover support for DB instances using
Multi-AZ deployments. When you select this option, Amazon RDS automatically
provisions and maintains a secondary standby DB instance in a different Availability
Zone. It is supported by all RDS database engines.

With Multi-AZ, AWS handles the replication for you: your primary DB instance
is synchronously replicated across Availability Zones to the secondary instance. In
the event of planned DB maintenance, a DB instance failure, or an Availability Zone
failure, Amazon RDS automatically fails over to the secondary standby database,
minimizing downtime.

The high-availability feature is not a scaling solution for read-only scenarios. Multi-AZ
is for failover and disaster recovery (DR) only. You cannot use the standby to serve read
traffic. To serve read-only traffic, use a read replica. The RDS console
shows the Availability Zone of the standby replica, called the secondary AZ.
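The Multi-AZ behaviour described above can also be driven from code. This is a hedged boto3 sketch, assuming an RDS client (for example `boto3.client("rds")`) and an existing Single-AZ instance; `enable_multi_az` and `force_failover` are hypothetical helper names, not SDK functions. Rebooting with `ForceFailover=True` is one way to test that applications survive a failover to the standby.

```python
def enable_multi_az(rds, instance_id, apply_now=False):
    """Hypothetical helper: convert a Single-AZ DB instance to Multi-AZ."""
    # RDS creates the standby in another AZ and manages synchronous replication.
    # With ApplyImmediately=False the change waits for the next maintenance window.
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MultiAZ=True,
        ApplyImmediately=apply_now,
    )


def force_failover(rds, instance_id):
    """Hypothetical helper: reboot with failover so the standby becomes primary."""
    # ForceFailover is only valid for instances configured for Multi-AZ.
    return rds.reboot_db_instance(
        DBInstanceIdentifier=instance_id,
        ForceFailover=True,
    )
```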

Read: https://aws.amazon.com/blogs/database/amazon-rds-under-the-hood-multi-az/

Read Replicas
Read Replicas make it easy to scale out (not scale up) a DB instance for read-heavy
database workloads. You can create up to 5 read replicas per source DB instance
and distribute your read traffic among them (read scaling). Updates are applied to
your read replicas after they occur on the source DB instance (asynchronous
replication). You can also create a read replica from another read replica.

You need to turn on automatic backups on your source DB instance before adding
read replicas by setting the backup retention period to a value other than 0. Backups
must remain enabled for read replicas to work.

Read replicas can themselves be Multi-AZ deployments; MySQL, MariaDB,
PostgreSQL, and Oracle support Multi-AZ read replicas. You can also create read replicas
of Multi-AZ source databases. A read replica can be promoted to a standalone DB
instance; however, promotion breaks replication from the source.
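A hedged boto3 sketch of the replica lifecycle: enable automated backups on the source if they are off, create a replica, and optionally promote it. It assumes an RDS client such as `boto3.client("rds")`; the helper names and the 7-day default are illustrative assumptions. The `db_instance_available` waiter may return before the retention change takes effect, so production code may need to poll until `BackupRetentionPeriod` is non-zero.

```python
def add_read_replica(rds, source_id, replica_id, retention_days=7):
    """Hypothetical helper: ensure automated backups on the source, then
    create a read replica. retention_days is an illustrative default."""
    source = rds.describe_db_instances(
        DBInstanceIdentifier=source_id)["DBInstances"][0]
    # Read replicas require a backup retention period greater than 0.
    if source["BackupRetentionPeriod"] == 0:
        rds.modify_db_instance(
            DBInstanceIdentifier=source_id,
            BackupRetentionPeriod=retention_days,
            ApplyImmediately=True,
        )
        # Wait for the instance to report "available" before continuing.
        rds.get_waiter("db_instance_available").wait(
            DBInstanceIdentifier=source_id)
    # Changes reach the new replica asynchronously.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier=replica_id,
        SourceDBInstanceIdentifier=source_id,
    )
    return replica_id


def promote_replica(rds, replica_id):
    """Hypothetical helper: promote a replica to standalone; replication ends."""
    return rds.promote_read_replica(DBInstanceIdentifier=replica_id)
```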

Amazon RDS doesn't support circular replication. You can't configure a DB instance
to serve as a replication source for an existing DB instance; you can only create a
new Read Replica from an existing DB instance. For example, if MyDBInstance
replicates to ReadReplica1, you can't configure ReadReplica1 to replicate back to
MyDBInstance. From ReadReplica1, you can only create a new Read Replica, such as
ReadReplica2.
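To illustrate why this rule prevents cycles, here is a toy in-memory model (plain Python, not the RDS API; the class is hypothetical). Because a replica must always be a brand-new instance, every instance has exactly one source, and the topology stays a tree rooted at the primary.

```python
class ReplicationTopology:
    """Toy model (hypothetical, not an AWS API): replicas are always new
    instances, so replication chains can never loop back."""

    def __init__(self, primary):
        # Maps each instance to its replication source (None for the primary).
        self.source = {primary: None}

    def create_read_replica(self, source, replica):
        if source not in self.source:
            raise ValueError(f"unknown source instance: {source}")
        if replica in self.source:
            # An existing instance cannot become a replica: no circular replication.
            raise ValueError(f"{replica} already exists; circular replication is not supported")
        self.source[replica] = source

    def chain(self, instance):
        """Return the replication path from an instance back to the primary."""
        path = [instance]
        while self.source[path[-1]] is not None:
            path.append(self.source[path[-1]])
        return path
```

For example, after creating ReadReplica1 from MyDBInstance and ReadReplica2 from ReadReplica1, `chain("ReadReplica2")` walks back through ReadReplica1 to MyDBInstance, while trying to make MyDBInstance a replica of ReadReplica1 raises `ValueError`.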

Read replicas are available in Amazon RDS for MySQL, MariaDB, PostgreSQL and
Oracle as well as Amazon Aurora.

https://aws.amazon.com/rds/details/read-replicas/

Multi-AZ Vs Read Replica: https://www.quora.com/What-is-the-difference-between-the-Multi-AZ-deployment-and-Read-Replica-in-AWS-RDS

Read replicas now support Multi-AZ deployments (MySQL, MariaDB, PostgreSQL, Oracle):

https://aws.amazon.com/about-aws/whats-new/2018/01/amazon-rds-read-replicas-now-support-multi-az-deployments/

https://aws.amazon.com/about-aws/whats-new/2018/06/rds-postgres-supports-readreplicas-multiaz/

Backups
There are two types of backups:

● Automated backups
● Database snapshots

Automated backups are enabled by default. They create a storage volume
snapshot of your DB instance, backing up the entire DB instance (not just individual
databases), and store it in Amazon S3.

Amazon RDS creates automated backups of your DB instance during the backup
window of your DB instance. Amazon RDS saves the automated backups of your DB
instance according to the backup retention period that you specify. If necessary, you
can recover your database to any point in time during the backup retention period.

By default, when you create an RDS instance, daily backups are enabled with
a 7-day retention period. You can set the backup retention period to between 1 and
35 days. Setting the backup retention period to 0 disables automated backups.

You can also back up your DB instance manually (user-initiated) by creating
a database snapshot. Manual snapshots are limited to 100 per Region; this limit
does not apply to automated backups.

The first snapshot of a DB instance contains the data for the full DB instance.
Subsequent snapshots of the same DB instance are incremental, which means that
only the data that has changed after your most recent snapshot is saved.
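The full-then-incremental idea can be shown with a toy model (plain Python, not how RDS stores data internally; block numbers and contents are made up, and deleted blocks are not modeled). Each snapshot records only blocks that are new or changed, and restoring layers the deltas in order.

```python
def take_snapshots(volume_states):
    """Toy model (hypothetical): the first snapshot is full, later snapshots
    keep only blocks that are new or changed since the previous one."""
    snapshots, previous = [], {}
    for state in volume_states:
        delta = {blk: data for blk, data in state.items()
                 if previous.get(blk) != data}
        snapshots.append(delta)
        previous = dict(state)
    return snapshots


def restore(snapshots, upto):
    """Rebuild the volume as of snapshot index `upto` by applying deltas in order."""
    volume = {}
    for delta in snapshots[: upto + 1]:
        volume.update(delta)
    return volume
```

With states `{0: "a", 1: "b", 2: "c"}`, then block 1 changed to `"B"`, then block 3 added, the second and third snapshots hold only `{1: "B"}` and `{3: "d"}`, yet restoring any snapshot still yields the complete volume at that point.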

Automated backups occur daily during the preferred backup window. While your
data is being backed up, storage I/O may be briefly suspended while the backup
process initializes (typically a few seconds or less), and you may experience a brief
period of elevated latency.

If you are running a Multi-AZ deployment, automated backups and DB Snapshots are
simply taken from the standby to avoid I/O suspension on the primary. If the backup
requires more time than allocated to the backup window, the backup continues after
the window ends, until it finishes. The backup window can't overlap with the weekly
maintenance window for the DB instance.

All automated backups are deleted when you delete a DB instance, and you can
choose to have Amazon RDS create a final DB snapshot before it deletes your DB
instance. Unlike automated backups, manual snapshots are not deleted when you
delete a DB instance; they are kept until you explicitly delete them.

Amazon RDS also stores transaction logs throughout the day (RDS uploads
transaction logs for DB instances to Amazon S3 every 5 minutes). When you do a
recovery, these logs can be applied on top of automated backups to perform a
point-in-time recovery down to the second. Whenever you restore either an automated
backup or a manual database snapshot, the restored version of the database will be a
new RDS instance with a new DNS endpoint.

Restoring
There are two ways to restore:

● Restoring from a DB Snapshot
○ https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RestoreFromSnapshot.html
● Point-in-Time Recovery
○ https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html
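Both restore paths can be sketched with boto3. This is a hedged example, assuming an RDS client (for example `boto3.client("rds")`); the helper names and identifiers are hypothetical. Note that each call takes a new instance identifier, which is why a restore always produces a new instance with its own DNS endpoint.

```python
def restore_from_snapshot(rds, snapshot_id, new_instance_id):
    """Hypothetical helper: restore a manual or automated snapshot into a
    brand-new DB instance."""
    return rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier=new_instance_id,
        DBSnapshotIdentifier=snapshot_id,
    )


def restore_to_point_in_time(rds, source_id, new_instance_id, restore_time=None):
    """Hypothetical helper: point-in-time recovery into a new DB instance."""
    params = {
        "SourceDBInstanceIdentifier": source_id,
        # New identifier, so the restored instance gets its own DNS endpoint.
        "TargetDBInstanceIdentifier": new_instance_id,
    }
    if restore_time is None:
        # Restore to the latest restorable time (logs are uploaded every ~5 minutes).
        params["UseLatestRestorableTime"] = True
    else:
        # A timezone-aware datetime within the backup retention period.
        params["RestoreTime"] = restore_time
    return rds.restore_db_instance_to_point_in_time(**params)
```

`UseLatestRestorableTime` and `RestoreTime` are mutually exclusive, which is why the sketch sets exactly one of them.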

Monitoring
Monitoring Tools
AWS provides various tools that you can use to monitor Amazon RDS. You can
configure some of these tools to do the monitoring for you, while some of the tools
require manual intervention. It is recommended to automate monitoring tasks
as much as possible.

Automated Monitoring Tools

● Amazon RDS Events - Subscribe to Amazon RDS events to be notified when
changes occur with a DB instance, DB snapshot, DB parameter group, or DB
security group.
● Database log files - View, download, or watch database log files using the
Amazon RDS console or Amazon RDS API actions. You can also query some
database log files that are loaded into database tables.
● Amazon RDS Enhanced Monitoring - Look at metrics in real time (at an interval of
1, 5, 10, 15, 30, or 60 seconds; the default is 60 seconds) for the operating system
(OS) that your DB instance runs on.
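Enhanced Monitoring is turned on per instance by setting a monitoring interval. The boto3 sketch below is hedged: it assumes an RDS client (for example `boto3.client("rds")`) and an existing IAM role that allows RDS to publish OS metrics; the helper name and the role ARN you pass in are hypothetical. An interval of 0 turns Enhanced Monitoring off.

```python
# Granularities (seconds) accepted for Enhanced Monitoring; 0 disables it.
VALID_INTERVALS = (0, 1, 5, 10, 15, 30, 60)


def set_enhanced_monitoring(rds, instance_id, interval, role_arn=None):
    """Hypothetical helper: enable Enhanced Monitoring at `interval` seconds,
    or disable it with 0."""
    if interval not in VALID_INTERVALS:
        raise ValueError(f"interval must be one of {VALID_INTERVALS}")
    params = {
        "DBInstanceIdentifier": instance_id,
        "MonitoringInterval": interval,
        "ApplyImmediately": True,
    }
    if interval != 0:
        # The OS-level agent publishes metrics through an IAM role that RDS assumes.
        if role_arn is None:
            raise ValueError("role_arn is required when interval is not 0")
        params["MonitoringRoleArn"] = role_arn
    return rds.modify_db_instance(**params)
```

Validating the interval locally gives a clearer error than waiting for the API to reject an unsupported value such as 7 or 120.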

In addition, Amazon RDS integrates with Amazon CloudWatch for additional
monitoring capabilities:

● Amazon CloudWatch Metrics - Amazon RDS automatically sends metrics to
CloudWatch for each active database. You are not charged additionally for
Amazon RDS metrics in CloudWatch.
● Amazon CloudWatch Alarms - You can watch a single Amazon RDS metric over
a specific time period, and perform one or more actions based on the value of
the metric relative to a threshold you set.
● Amazon CloudWatch Logs - Most DB engines enable you to monitor, store, and
access your database log files in CloudWatch Logs.
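A CloudWatch alarm on a single RDS metric could look like the following hedged boto3 sketch. It assumes a CloudWatch client (for example `boto3.client("cloudwatch")`) and an existing SNS topic for notifications; the helper name, alarm name, threshold, and periods are illustrative assumptions.

```python
def create_cpu_alarm(cloudwatch, instance_id, sns_topic_arn, threshold=80.0):
    """Hypothetical helper: alarm when the average CPUUtilization of one DB
    instance stays above `threshold` percent. Values are illustrative."""
    return cloudwatch.put_metric_alarm(
        AlarmName=f"{instance_id}-high-cpu",
        Namespace="AWS/RDS",            # namespace RDS publishes its metrics under
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        Statistic="Average",
        Period=300,                     # evaluate 5-minute averages...
        EvaluationPeriods=2,            # ...and require two consecutive breaches
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[sns_topic_arn],   # action taken when the alarm fires
    )
```

Requiring two consecutive periods above the threshold avoids alerting on a single short spike.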

CloudWatch Monitoring

Amazon CloudWatch offers standard monitoring metrics for your database
instances at no additional charge. You can use the RDS Management Console to
view key operational metrics, including CPU, memory, storage, I/O, and DB connections.
Amazon RDS also provides Enhanced Monitoring. Enhanced Monitoring supports all
RDS database engines and it is available in all regions except for AWS GovCloud
(US). By default, standard monitoring sends metrics to CloudWatch every minute.

Read: https://n2ws.com/blog/aws-automation/features-amazon-rds-metrics-monitoring

Differences Between CloudWatch and Enhanced Monitoring Metrics

CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB
instance, and Enhanced Monitoring gathers its metrics from an agent on the
instance. As a result, you might find differences between the measurements,
because the hypervisor layer performs a small amount of work. The differences can
be greater if your DB instances use smaller instance classes, because then there are
likely more virtual machines (VMs) that are managed by the hypervisor layer on a
single physical instance. Enhanced Monitoring metrics are useful when you want to
see how different processes or threads on a DB instance use the CPU.

Read: https://www.sumologic.com/blog/amazon-web-services/amazon-rds-monitoring-strategy/

Amazon RDS Performance Insights


Amazon RDS Performance Insights is a database performance tuning and
monitoring feature that helps you quickly assess the load on your database, and
determine when and where to take action. Performance Insights allows non-experts
to detect performance problems with an easy-to-understand dashboard that
visualizes database load.

Performance Insights uses lightweight data collection methods that don’t impact the
performance of your applications, and makes it easy to see which SQL statements
are causing the load, and why.

https://www.youtube.com/watch?v=4462hcfkApM

Exercise
● Create a MySQL RDS instance. Stick to the default free tier options.
● Create an EC2 instance with a security group that allows inbound SSH and HTTP,
install Apache (httpd), PHP, and PHP-MySQL, and set up info.php and index.php pages.

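sudo yum install httpd php php-mysql -y
--------------
# "sudo echo ... > file" fails: the redirection runs as your user, not root.
# Pipe through "sudo tee" so the file is written with root permissions.
echo '<?php phpinfo(); ?>' | sudo tee /var/www/html/info.php > /dev/null
--------------
# Quoted 'EOF' stops the shell from expanding the PHP $variables.
sudo tee /var/www/html/index.php > /dev/null << 'EOF'
<?php
// Placeholder values: replace with your RDS endpoint and master credentials
$servername = "7.7.1.222";
$username = "root";
$password = "mysql";
// Create connection
$conn = new mysqli($servername, $username, $password);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
echo "Connected successfully";
?>
EOF
--------------
sudo service httpd restart
sudo chkconfig httpd on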

● Modify index.php: set the RDS MySQL endpoint as the server name and set the
appropriate username and password.
● The security group of the RDS instance does not have a rule to allow traffic from
your EC2 instance. Therefore, modify the security group of the RDS instance and add a
new inbound rule for the MySQL port (3306) by selecting the EC2 instance's security
group or its IP (the IP must be in CIDR notation, for example IP/32 for a single address).
● Load the info.php page to check that you can reach the web server on EC2.
● Load the index.php page to check that the web server can connect to MySQL.

Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL database that supports key-value and
document data models.

● Stored on SSD.
● Spread across 3 geographically distinct data centers.
● Consistency across all copies of data is usually reached within a second (one-second
rule).
● This is a serverless service.

Core Components
In DynamoDB, tables, items, and attributes are the core components that you work
with. A table is a collection of items, and each item is a collection of attributes.
DynamoDB uses primary keys to uniquely identify each item in a table and secondary
indexes to provide more querying flexibility.

● Tables - similar to tables in a relational database
● Items - similar to rows
● Attributes - similar to columns
● Primary Key - DynamoDB supports two types of primary keys:
○ Partition key: A simple primary key, composed of one attribute known as the
partition key.
○ Partition key and sort key: Referred to as a composite primary key, this type
of key is composed of two attributes. The first attribute is the partition key,
and the second attribute is the sort key.
● Secondary Indexes - Amazon DynamoDB provides fast access to items in a
table by specifying primary key values. However, many applications might
benefit from having one or more secondary (or alternate) keys available, to allow
efficient access to data with attributes other than the primary key. To address
this, you can create one or more secondary indexes on a table. You can then
query or scan the index just as you would query or scan a table. DynamoDB
supports two types of secondary indexes:
○ Local secondary index (5 local secondary indexes per table)
○ Global secondary index (20 global secondary indexes per table)
● DynamoDB Streams - DynamoDB Streams captures item-level changes in a
DynamoDB table and presents them in a time-ordered sequence. This
information is stored in a log for up to 24 hours.
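The components above map directly onto the `CreateTable` request. Below is a hedged boto3 sketch, assuming a DynamoDB client (for example `boto3.client("dynamodb")`); the `Orders` table, its attribute names, and the `StatusIndex` global secondary index are hypothetical. Only key attributes are declared; all other item attributes are schemaless.

```python
def create_orders_table(dynamodb, table_name="Orders"):
    """Hypothetical helper: table with a composite primary key
    (partition key + sort key), one GSI, and a stream enabled."""
    return dynamodb.create_table(
        TableName=table_name,
        # Declare only attributes used in the table or index key schemas.
        AttributeDefinitions=[
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderDate", "AttributeType": "S"},
            {"AttributeName": "Status", "AttributeType": "S"},
        ],
        KeySchema=[
            {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "OrderDate", "KeyType": "RANGE"},   # sort key
        ],
        # Secondary index so items can also be queried by Status.
        GlobalSecondaryIndexes=[{
            "IndexName": "StatusIndex",
            "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }],
        BillingMode="PAY_PER_REQUEST",
        # Capture item-level changes (DynamoDB Streams) with old and new images.
        StreamSpecification={"StreamEnabled": True,
                             "StreamViewType": "NEW_AND_OLD_IMAGES"},
    )
```

With on-demand billing (`PAY_PER_REQUEST`), neither the table nor the index needs a `ProvisionedThroughput` setting.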

Read consistency
DynamoDB supports:

● Eventually consistent reads (default)
● Strongly consistent reads (return a response with the most up-to-date data)

DynamoDB uses eventually consistent reads unless you specify otherwise.
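Choosing the read consistency is a per-request flag. This hedged boto3 sketch assumes a DynamoDB client (for example `boto3.client("dynamodb")`) and a hypothetical `Orders` table keyed by `CustomerId` and `OrderDate`; the helper name is also hypothetical.

```python
def read_order(dynamodb, customer_id, order_date, strong=False):
    """Hypothetical helper: get one item by its full composite key
    from a hypothetical "Orders" table."""
    resp = dynamodb.get_item(
        TableName="Orders",
        Key={
            "CustomerId": {"S": customer_id},
            "OrderDate": {"S": order_date},
        },
        # False (the default) = eventually consistent, using half the read
        # capacity of a strongly consistent read; True = strongly consistent.
        ConsistentRead=strong,
    )
    # "Item" is absent from the response when no item matches the key.
    return resp.get("Item")
```

Strongly consistent reads are worth the extra cost mainly when a read must reflect a write that happened moments earlier.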

DynamoDB Auto Scaling


Amazon DynamoDB auto scaling uses the AWS Application Auto Scaling service to
dynamically adjust provisioned throughput capacity on your behalf, in response to
actual traffic patterns. This enables a table or a global secondary index to increase
its provisioned read and write capacity to handle sudden increases in traffic, without
throttling. When the workload decreases, Application Auto Scaling decreases the
throughput so that you don't pay for unused provisioned capacity. If you use the AWS
Management Console to create a table or a global secondary index, DynamoDB auto
scaling is enabled by default.
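Outside the console, auto scaling is configured through Application Auto Scaling. This hedged boto3 sketch assumes an `application-autoscaling` client (for example `boto3.client("application-autoscaling")`) and a table in provisioned capacity mode; the helper name, capacity bounds, and 70% target are illustrative assumptions. Write capacity would use `dynamodb:table:WriteCapacityUnits` and `DynamoDBWriteCapacityUtilization` in the same way.

```python
def enable_read_autoscaling(autoscaling, table_name,
                            min_rcu=5, max_rcu=100, target=70.0):
    """Hypothetical helper: let Application Auto Scaling manage a table's
    read capacity between min_rcu and max_rcu. Values are illustrative."""
    resource_id = f"table/{table_name}"
    dimension = "dynamodb:table:ReadCapacityUnits"
    # 1. Register the table's read capacity as a scalable target.
    autoscaling.register_scalable_target(
        ServiceNamespace="dynamodb",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        MinCapacity=min_rcu,
        MaxCapacity=max_rcu,
    )
    # 2. Target tracking: keep consumed/provisioned reads near `target` percent.
    return autoscaling.put_scaling_policy(
        PolicyName=f"{table_name}-read-scaling",
        ServiceNamespace="dynamodb",
        ResourceId=resource_id,
        ScalableDimension=dimension,
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization",
            },
        },
    )
```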

Amazon Redshift
Redshift is Amazon’s fully managed data warehouse service. It is designed to be fast
and inexpensive compared with other data warehouse solutions.

Two configuration types:

● Single node (160 GB)
● Multi-node (for a production environment, using at least two nodes is
recommended)
○ Leader node - Manages client connections and receives queries.
○ Compute node - Stores data and performs queries. You can have up to 128
compute nodes.
● Uses advanced compression (column-based compression).
● Limited to a single Availability Zone (currently, Amazon Redshift only supports
Single-AZ deployments).
● Amazon Redshift always attempts to maintain at least three copies of your data:
○ The original
○ Replica on the compute nodes (Amazon Redshift replicates all your data
within your data warehouse cluster. Single node clusters do not support
data replication)
○ A backup in Amazon S3 (By default Amazon Redshift enables automated
backups of your data to S3 and these snapshots are incremental)
● For disaster recovery, Redshift can also asynchronously replicate your
snapshots to S3 in another region.
● When you restore a backup, AWS will provision a new data warehouse cluster
and restore your data to it.

Redshift pricing
With Redshift, you are charged for:

● Compute node hours (no charges for leader node hours)
● Backups
● Data transfer within a VPC (no charges outside it)

Redshift security
Redshift provides:
● Encryption in transit
● Encryption at rest

Aurora
● Amazon's proprietary database engine.
● Compatible with MySQL and PostgreSQL.
● Maintains 6 copies of your data across 3 Availability Zones.
● Two types of replicas:
○ Aurora replica
○ MySQL replica
● Ways to migrate to Aurora:
○ Create an Aurora read replica and promote it.
○ Create a snapshot and restore from that snapshot.

ElastiCache
● ElastiCache is an in-memory cache service.
● Helps improve web application performance by retrieving information quickly
from in-memory caches instead of relying on slower disk-based databases.
● ElastiCache is a good choice if your database is read-heavy and its data is not
prone to frequent changes.
● Supports two open-source in-memory caching engines:
○ Memcached (simple, easy to get started)
○ Redis (advanced capabilities)
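A common way to use a cache in front of a read-heavy database is the cache-aside (lazy loading) pattern. The toy model below is plain Python: a dict stands in for Memcached or Redis, and `load_from_db` is a hypothetical callback standing in for a database query. It is not the ElastiCache API; it only shows how repeated reads skip the database until an entry's TTL expires.

```python
import time


class CacheAside:
    """Toy cache-aside (lazy loading) model; the dict stands in for
    Memcached/Redis and load_from_db for a database query (hypothetical)."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}        # key -> (value, expiry time)
        self.db_reads = 0      # how many requests reached the database

    def get(self, key):
        hit = self.store.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]                      # cache hit: no database round trip
        value = self.load_from_db(key)         # cache miss or expired entry
        self.db_reads += 1
        self.store[key] = (value, self.clock() + self.ttl)
        return value
```

The TTL bounds how stale a cached value can get, which is why this pattern suits data that is read often but changes rarely.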

Exam tips
● With DynamoDB, you can scale your database on the fly, without any down time.
● With RDS, scaling is not as easy; you usually have to use a bigger instance or
add a read replica.

RDS important FAQ


1. What is a maintenance window? Will my DB instance be available during maintenance events?
2. What should I do if my queries seem to be running slowly?
3. How will I be charged and billed for my use of Amazon RDS?
4. How will I be billed for a stopped DB instance?
5. Will my DB instance remain available during scaling?
6. How do I choose among the Amazon RDS storage types?
7. What is the difference between automated backups and DB Snapshots?
8. Do I need to enable backups for my DB Instance or is it done automatically?
9. What is a backup window and why do I need it? Is my database available during the backup window?
10. Where are my automated backups and DB snapshots stored and how do I manage their retention?
11. What happens to my backups and DB snapshots if I delete my DB instance?
12. Can I encrypt data at rest on my Amazon RDS databases?
13. What are the benefits of a Multi-AZ deployment?
14. Are there any performance implications of running my DB instance as a Multi-AZ deployment?
15. When running my DB instance as a Multi-AZ deployment, can I use the standby for read or write operations?
16. What happens when I convert my RDS instance from Single-AZ to Multi-AZ?
17. What events would cause Amazon RDS to initiate a failover to the standby replica?
18. What happens during Multi-AZ failover and how long does it take?
19. Will my standby be in the same Region as my primary?
20. When would I want to consider using an Amazon RDS read replica?
21. Do I need to enable automatic backups on my DB instance before I can create read replicas?
22. Can I create a read replica in an AWS Region different from that of the source DB instance?
23. Can I use a read replica to enhance database write availability or protect the data on my source DB instance against failure scenarios?
24. Can I create a read replica with a Multi-AZ DB instance deployment as its source?
25. Can I configure my Amazon RDS read replicas themselves Multi-AZ?
26. If my read replica(s) use a Multi-AZ DB instance deployment as a source, what happens if Multi-AZ failover occurs?
27. Can my read replicas only accept database read operations?
28. Can I promote my read replica into a “standalone” DB Instance?
29. How do I delete a read replica? Will it be deleted automatically if its source DB Instance is deleted?

NoSQL - https://aws.amazon.com/nosql/
