Databases - Done
1. SQL Server
2. Oracle
3. MySQL
4. PostgreSQL
5. MariaDB
6. Amazon Aurora
DB Instance
A DB instance is an isolated database environment in the cloud. Each DB instance
runs a DB engine. Amazon RDS currently supports six DB engines. The computation
and memory capacity of a DB instance is determined by its DB instance class.
DB Parameter Groups
A DB parameter group acts as a container for engine configuration values that are
applied to one or more DB instances. You can manage your DB engine configuration
by associating your DB instances with parameter groups. Amazon RDS defines
parameter groups with default settings that apply to newly created DB instances.
You cannot modify the parameter settings of a default DB parameter group. What
you can do is to define your own parameter groups with customized settings and
then modify your DB instances to use your own parameter groups. You can specify a
custom parameter group when launching a new RDS instance.
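As a rough sketch of the workflow above, the request payloads for a custom parameter group could be assembled like this. The keyword names follow the boto3 RDS client calls `create_db_parameter_group` and `modify_db_parameter_group`; the group name, the `mysql8.0` family, and the `max_connections` value are assumptions chosen for illustration, and nothing is sent to AWS here.

```python
def custom_parameter_group_requests(name, family, overrides):
    """Build request dicts for a custom DB parameter group (no AWS calls made)."""
    create = {
        "DBParameterGroupName": name,
        "DBParameterGroupFamily": family,
        "Description": f"Custom settings for {family}",
    }
    modify = {
        "DBParameterGroupName": name,
        "Parameters": [
            {
                "ParameterName": key,
                "ParameterValue": str(value),
                "ApplyMethod": "pending-reboot",
            }
            for key, value in sorted(overrides.items())
        ],
    }
    return create, modify


# Hypothetical names and values, for illustration only.
create_req, modify_req = custom_parameter_group_requests(
    "my-mysql-params", "mysql8.0", {"max_connections": 200}
)
print(create_req)
print(modify_req)
```

With boto3 installed and credentials configured, these dicts could be passed as `boto3.client("rds").create_db_parameter_group(**create_req)` and `modify_db_parameter_group(**modify_req)`; the default group itself stays untouched.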
Option Groups
Some DB engines offer additional features that make it easier to manage data and
databases, and to provide additional security for your database. Amazon RDS uses
option groups to enable and configure these features. An option group can specify
features, called options, that are available for a particular Amazon RDS DB instance.
Options can have settings that specify how the option works. When you associate a
DB instance with an option group, the specified options and option settings are
enabled for that DB instance.
Encryption at rest for a database can be enabled when the database is created. You
also need to ensure that the underlying DB instance class supports encryption.
Encryption in transit is also supported by all Amazon RDS engines using SSL/TLS.
Once an encrypted connection is established, data transferred between the DB
instance and your application is encrypted in transit.
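A minimal sketch of requesting encryption at creation time is shown here. The keys are parameters of the boto3 RDS call `create_db_instance`; the identifier, instance class, storage size, and credentials are placeholders, and the instance class is simply assumed to support encryption.

```python
def encrypted_instance_request(identifier, instance_class, master_user, master_password):
    """Build create_db_instance arguments with encryption at rest enabled."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": instance_class,  # assumed to support encryption
        "AllocatedStorage": 20,             # GiB, illustrative value
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "StorageEncrypted": True,           # requested when the instance is created
    }


# Placeholder values only; never hard-code real credentials.
request = encrypted_instance_request("notes-db", "db.t3.small", "admin", "change-me")
print(request["StorageEncrypted"])
```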
High Availability (Multi-AZ)
Amazon RDS provides high availability and failover support for DB instances using
Multi-AZ deployments. When you select this option, Amazon automatically
provisions and maintains a secondary standby DB instance in a different Availability
Zone. It is supported by all RDS database engines.
With Multi-AZ, AWS will handle the replication for you and your primary DB instance
is synchronously replicated across Availability Zones to the secondary instance. In
the event of planned DB maintenance, DB instance failure or an availability zone
failure, Amazon RDS will automatically failover to the secondary standby database
minimizing the downtime.
The high-availability feature is not a scaling solution for read-only scenarios. Multi-
AZ is for Disaster Recovery (DR) only. You cannot use the standby to serve read
traffic. To service read-only traffic, you should use a Read Replica. The RDS console
shows the Availability Zone of the standby replica, called the secondary AZ.
Read: https://aws.amazon.com/blogs/database/amazon-rds-under-the-hood-multi-az/
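The failover behaviour can be pictured with a toy model. This is not an AWS API: the class, endpoint name, and zone names are hypothetical, and the model only illustrates that the application keeps using one endpoint while the primary and standby roles swap.

```python
from dataclasses import dataclass


@dataclass
class MultiAZDeployment:
    endpoint: str     # DNS name the application connects to
    primary_az: str   # zone currently serving reads and writes
    standby_az: str   # zone holding the synchronous standby (no read traffic)

    def failover(self):
        """Swap roles; the endpoint used by the application does not change."""
        self.primary_az, self.standby_az = self.standby_az, self.primary_az


db = MultiAZDeployment("mydb.example.internal", "us-east-1a", "us-east-1b")
db.failover()
print(db.endpoint, db.primary_az, db.standby_az)
```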
Read Replicas
Read Replicas make it easy to scale out (not scale up) a DB instance with a
read-heavy database workload. You can create up to 5 read replicas per DB instance
(source) and distribute your read traffic amongst them (read scaling). Updates are
applied to your Read Replica(s) after they occur on the source DB instance
(asynchronous replication). You can also create a read replica from a read replica.
You need to turn on automatic backups on your source DB Instance before adding
read replicas, by setting the backup retention period to a value other than 0. Backups
must remain enabled for read replicas to work.
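The two preconditions above (backups enabled, replica limit not reached) can be checked before building a request. In this sketch the returned keys match the boto3 RDS call `create_db_instance_read_replica`; the function name, the instance names, and the way the current replica count is supplied are assumptions.

```python
MAX_READ_REPLICAS = 5  # limit quoted in these notes


def read_replica_request(source_id, replica_id, backup_retention_days, existing_replicas):
    """Validate the preconditions, then build the read replica request."""
    if backup_retention_days == 0:
        raise ValueError("enable automated backups on the source first")
    if existing_replicas >= MAX_READ_REPLICAS:
        raise ValueError("source already has the maximum number of read replicas")
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }


replica_req = read_replica_request("mydb", "mydb-replica-1", 7, existing_replicas=0)
print(replica_req)
```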
Read replicas can themselves be Multi-AZ; currently MySQL, MariaDB and PostgreSQL
support Multi-AZ read replica deployments. You can also create read replicas of
Multi-AZ source databases. A read replica can be promoted to a standalone
database; however, that breaks the replication.
Amazon RDS doesn't support circular replication. You can't configure a DB instance
to serve as a replication source for an existing DB instance; you can only create a
new Read Replica from an existing DB instance. For example, if MyDBInstance
replicates to ReadReplica1, you can't configure ReadReplica1 to replicate back to
MyDBInstance. From ReadReplica1, you can only create a new Read Replica, such as
ReadReplica2.
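The rule can be modelled with a small toy class, using the instance names from the example above. The class is hypothetical and not part of any AWS SDK; it only shows why a replica is always a new instance and why a cycle cannot be formed.

```python
class ReplicationTopology:
    """Toy model: every replica is a brand-new instance attached to a source."""

    def __init__(self, root):
        self.source_of = {root: None}  # instance name -> its replication source

    def create_read_replica(self, source, new_replica):
        if source not in self.source_of:
            raise KeyError(f"unknown source instance: {source}")
        if new_replica in self.source_of:
            # An existing instance cannot be turned into a replica, so no cycles.
            raise ValueError(f"{new_replica} already exists; replicas must be new instances")
        self.source_of[new_replica] = source


topology = ReplicationTopology("MyDBInstance")
topology.create_read_replica("MyDBInstance", "ReadReplica1")
topology.create_read_replica("ReadReplica1", "ReadReplica2")
try:
    topology.create_read_replica("ReadReplica1", "MyDBInstance")
except ValueError as error:
    print("rejected:", error)
```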
Read replicas are available in Amazon RDS for MySQL, MariaDB, PostgreSQL and
Oracle as well as Amazon Aurora.
https://aws.amazon.com/rds/details/read-replicas/
https://aws.amazon.com/about-aws/whats-new/2018/01/amazon-rds-read-replicas-now-support-multi-az-deployments/
https://aws.amazon.com/about-aws/whats-new/2018/06/rds-postgres-supports-readreplicas-multiaz/
Backups
There are two types of backups:
● Automated backups
● Database snapshots
Amazon RDS creates automated backups of your DB instance during the backup
window of your DB instance. Amazon RDS saves the automated backups of your DB
instance according to the backup retention period that you specify. If necessary, you
can recover your database to any point in time during the backup retention period.
By default, when you create an RDS instance in AWS, daily backups are enabled with
a 7-day retention policy. You can set the backup retention period to between 1 and
35 days. Setting the backup retention period to 0 disables automated backups.
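The retention rules can be captured in a few lines. In this sketch the request keys are parameters of the boto3 RDS call `modify_db_instance`; the function names and the instance identifier are assumptions.

```python
def automated_backups_enabled(retention_days):
    """A retention period of 0 turns automated backups off."""
    return retention_days > 0


def retention_change_request(identifier, retention_days):
    """Build modify_db_instance arguments for a new backup retention period."""
    if not 0 <= retention_days <= 35:
        raise ValueError("backup retention period must be between 0 and 35 days")
    return {
        "DBInstanceIdentifier": identifier,
        "BackupRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }


retention_req = retention_change_request("mydb", 14)
print(retention_req, automated_backups_enabled(14), automated_backups_enabled(0))
```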
You can also back up your DB instance manually (user-initiated) by creating a
database snapshot. There is a limit of 100 manual snapshots per region; this limit
does not apply to automated backups.
The first snapshot of a DB instance contains the data for the full DB instance.
Subsequent snapshots of the same DB instance are incremental, which means that
only the data that has changed after your most recent snapshot is saved.
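The idea of incremental snapshots can be illustrated with a toy block store. This is a simplified model, not how RDS stores snapshots internally: blocks are plain dictionary entries and deleted blocks are ignored.

```python
def take_snapshot(blocks, previous_blocks=None):
    """Return the blocks a snapshot has to store.

    The first snapshot stores everything; later ones store only blocks
    that are new or differ from the state at the previous snapshot.
    """
    if previous_blocks is None:
        return dict(blocks)
    return {k: v for k, v in blocks.items() if previous_blocks.get(k) != v}


day1 = {0: "aaa", 1: "bbb", 2: "ccc"}
day2 = {0: "aaa", 1: "BBB", 2: "ccc", 3: "ddd"}  # block 1 changed, block 3 added

first = take_snapshot(day1)
second = take_snapshot(day2, previous_blocks=day1)
print(len(first), sorted(second))  # 3 [1, 3]
```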
Automated backups occur daily during the preferred backup window. While your data
is being backed up, storage I/O may be briefly suspended while the backup process
initializes (typically for a few seconds), and you may experience a brief period of
elevated latency.
If you are running a Multi-AZ deployment, automated backups and DB Snapshots are
simply taken from the standby to avoid I/O suspension on the primary. If the backup
requires more time than allocated to the backup window, the backup continues after
the window ends, until it finishes. The backup window can't overlap with the weekly
maintenance window for the DB instance.
All automated backups are deleted when you delete a DB instance, although you can
choose to have Amazon RDS create a final DB snapshot before it deletes your DB
instance. Unlike automated backups, manual snapshots are not deleted when you
delete a DB instance; they are kept until you explicitly delete them.
Amazon RDS will also store the transaction logs throughout the day (RDS uploads
transaction logs for DB instances to Amazon S3 every 5 minutes) and when you do a
recovery, they can be applied on top of automated backups to do a point in time
recovery down to a second. Whenever you restore either an automated backup or a
manual database snapshot, the restored version of the database will be a new RDS
instance with a new DNS endpoint.
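A point-in-time restore request can be sketched as follows. The returned keys are parameters of the boto3 RDS call `restore_db_instance_to_point_in_time`; the function name and instance names are assumptions, and the window check is a simplification that treats "now" as the latest restorable time.

```python
from datetime import datetime, timedelta, timezone


def point_in_time_restore_request(source_id, target_id, restore_time, retention_days, now):
    """Check the restore time against the retention window and build the request."""
    earliest = now - timedelta(days=retention_days)
    if not earliest <= restore_time <= now:
        raise ValueError("restore time is outside the backup retention period")
    if target_id == source_id:
        raise ValueError("a restore always creates a new DB instance")
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,  # new instance, new DNS endpoint
        "RestoreTime": restore_time,
    }


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
restore_req = point_in_time_restore_request(
    "mydb", "mydb-restored", now - timedelta(days=2), retention_days=7, now=now
)
print(restore_req["TargetDBInstanceIdentifier"], restore_req["RestoreTime"].isoformat())
```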
Restoring
Two ways:
Monitoring
Monitoring Tools
AWS provides various tools that you can use to monitor Amazon RDS. You can
configure some of these tools to do the monitoring for you, while some of the tools
require manual intervention. It is recommended to use automated monitoring tasks
as much as possible.
CloudWatch Monitoring
Read: https://n2ws.com/blog/aws-automation/features-amazon-rds-metrics-monitoring
CloudWatch gathers metrics about CPU utilization from the hypervisor for a DB
instance, and Enhanced Monitoring gathers its metrics from an agent on the
instance. As a result, you might find differences between the measurements,
because the hypervisor layer performs a small amount of work. The differences can
be greater if your DB instances use smaller instance classes, because then there are
likely more virtual machines (VMs) that are managed by the hypervisor layer on a
single physical instance. Enhanced Monitoring metrics are useful when you want to
see how different processes or threads on a DB instance use the CPU.
Read: https://www.sumologic.com/blog/amazon-web-services/amazon-rds-monitoring-strategy/
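To look at the hypervisor-level CPU figure that CloudWatch reports, a query could be put together roughly like this. The keys are parameters of the boto3 CloudWatch call `get_metric_statistics`, and `AWS/RDS` with `CPUUtilization` is the standard namespace and metric for RDS; the function name, instance identifier, and time range are assumptions.

```python
from datetime import datetime, timedelta, timezone


def cpu_utilization_query(instance_id, end_time, hours=1, period_seconds=300):
    """Build get_metric_statistics arguments for an RDS instance's CPU usage."""
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,  # seconds per data point
        "Statistics": ["Average", "Maximum"],
    }


end = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
query = cpu_utilization_query("mydb", end)
print(query["StartTime"].isoformat(), query["Period"])
```

Per-process CPU figures from Enhanced Monitoring are not part of this query; only the CloudWatch metric is covered here.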
Performance Insights uses lightweight data collection methods that don’t impact the
performance of your applications, and makes it easy to see which SQL statements
are causing the load, and why.
https://www.youtube.com/watch?v=4462hcfkApM
Exercise
● Create a MySQL RDS instance. Stick to all default free tier options.
● Create an EC2 instance with SSH/HTTP inbound allowed security group, install
Apache(HTTPD), PHP and PHP-MySQL and set up info.php/index.php pages.
echo '<?php
$servername = "7.7.1.222";
$username = "root";
$password = "mysql";
// Create connection
$conn = new mysqli($servername, $username, $password);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}
echo "Connected successfully";
?>' > /var/www/html/index.php
--------------
sudo service httpd restart
sudo chkconfig httpd on
● Modify the index.php and set MySQL endpoint as server and set the appropriate
username and password.
● The security group of the RDS instance does not have a rule to allow traffic from
your EC2 instance. Therefore, modify the security group of RDS and set a new
inbound rule for MySQL port by selecting EC2’s security group or its IP (IP should
be in IP/SubnetMask format).
● Try to load info.php page and see if you can connect to EC2 properly.
● Try to load index.php page and see if you can connect to MySQL properly.
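The security group step in this exercise can also be expressed as a request payload. In this sketch the keys follow the boto3 EC2 call `authorize_security_group_ingress`; the function name, group IDs, and address are placeholders, and port 3306 is the MySQL default.

```python
import ipaddress


def mysql_ingress_rule(rds_group_id, source_group_id=None, source_cidr=None):
    """Allow MySQL traffic from either an EC2 security group or an IPv4 CIDR."""
    if (source_group_id is None) == (source_cidr is None):
        raise ValueError("give exactly one of source_group_id or source_cidr")
    permission = {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306}
    if source_group_id is not None:
        permission["UserIdGroupPairs"] = [{"GroupId": source_group_id}]
    else:
        # Normalise to IP/SubnetMask (CIDR) form; raises ValueError if malformed.
        network = ipaddress.IPv4Network(source_cidr, strict=False)
        permission["IpRanges"] = [{"CidrIp": str(network)}]
    return {"GroupId": rds_group_id, "IpPermissions": [permission]}


# Placeholder IDs and address, for illustration only.
by_group = mysql_ingress_rule("sg-rds-example", source_group_id="sg-ec2-example")
by_ip = mysql_ingress_rule("sg-rds-example", source_cidr="10.0.1.25/32")
print(by_group)
print(by_ip)
```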
Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL database that supports key-value and
document data models.
● Stored on SSD.
● Spread across 3 geographically distinct data centers.
● Consistency across all copies of data is usually reached within a second (One
second rule).
● This is a serverless service.
Core Components
In DynamoDB, tables, items, and attributes are the core components that you work
with. A table is a collection of items, and each item is a collection of attributes.
DynamoDB uses primary keys to uniquely identify each item in a table and secondary
indexes to provide more querying flexibility.
● Tables - Tables
● Items - Rows
● Attributes - Columns
● Primary Key - DynamoDB supports two types of primary keys:
○ Partition key: A simple primary key, composed of one attribute known as the
partition key.
○ Partition key and sort key: Referred to as a composite primary key, this type
of key is composed of two attributes. The first attribute is the partition key,
and the second attribute is the sort key.
● Secondary Indexes - Amazon DynamoDB provides fast access to items in a
table by specifying primary key values. However, many applications might
benefit from having one or more secondary (or alternate) keys available, to allow
efficient access to data with attributes other than the primary key. To address
this, you can create one or more secondary indexes on a table. You can then
query or scan the index just as you would query or scan a table. DynamoDB
supports two types of secondary indexes:
○ Local secondary index (5 local secondary indexes per table)
○ Global secondary index (20 global secondary indexes per table)
● DynamoDB Streams - DynamoDB Streams captures item-level changes in a
DynamoDB table and provides them in a time-ordered sequence. This
information is stored in a log for up to 24 hours.
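The two primary key types can be seen side by side in a table definition. In this sketch the keys are parameters of the boto3 DynamoDB call `create_table`, where `HASH` marks the partition key and `RANGE` the sort key; the function name, the table and attribute names, the string attribute type, and on-demand billing are assumptions.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build create_table arguments for a simple or composite primary key."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attributes = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attributes.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attributes,  # key attributes only, typed as strings
        "BillingMode": "PAY_PER_REQUEST",
    }


simple = table_definition("Users", "UserId")
composite = table_definition("Music", "Artist", sort_key="SongTitle")
print(simple["KeySchema"])
print(composite["KeySchema"])
```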
Read consistency
DynamoDB supports:
● Eventually consistent reads
● Strongly consistent reads
DynamoDB uses eventually consistent reads (Default), unless you specify otherwise.
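Specifying otherwise is done per request. A minimal sketch, assuming the low-level boto3 DynamoDB call `get_item`, whose `ConsistentRead` flag defaults to false; the function name, table name, and key values are placeholders.

```python
def get_item_request(table_name, key, strongly_consistent=False):
    """Build get_item arguments; reads are eventually consistent unless asked otherwise."""
    return {
        "TableName": table_name,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


key = {"Artist": {"S": "Example Artist"}, "SongTitle": {"S": "Example Song"}}
default_read = get_item_request("Music", key)
strong_read = get_item_request("Music", key, strongly_consistent=True)
print(default_read["ConsistentRead"], strong_read["ConsistentRead"])  # False True
```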
Amazon Redshift
Redshift is Amazon’s fully managed Data Warehouse service. It is designed to be fast
and inexpensive compared with other data warehouse services.
Redshift pricing
With Redshift, you are charged for:
Redshift security
Redshift provides:
● Encryption in transit
● Encryption at rest
Aurora
● Amazon proprietary database.
● Compatible with MySQL and PostgreSQL
● Maintains 6 copies of your data across 3 Availability Zones.
● Two types of replicas:
○ Aurora replica
○ MySQL replica
● Migrate to Aurora:
○ Create an Aurora read replica and promote it.
○ Create a snapshot and restore from that snapshot.
ElastiCache
● ElastiCache is an in-memory cache service.
● Helps to improve web application performance by retrieving information fast
from in-memory caches, instead of relying on slower disk based databases.
● ElastiCache is a good choice if your database is read-heavy and not prone to
frequent changes.
● Supports two open-source in-memory caching engines:
○ Memcached (Simple, easy to get started)
○ Redis (Advanced capabilities)
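The benefit for read-heavy data can be illustrated with a cache-aside sketch. A plain dictionary stands in for Memcached or Redis here, and the class name and sample data are hypothetical; the point is only that repeated reads of the same key reach the database once.

```python
class CacheAside:
    """Read through an in-memory cache, falling back to the database on a miss."""

    def __init__(self, load_from_database):
        self._load = load_from_database
        self._cache = {}
        self.database_reads = 0

    def get(self, key):
        if key not in self._cache:
            self.database_reads += 1          # cache miss: go to the slower store
            self._cache[key] = self._load(key)
        return self._cache[key]

    def invalidate(self, key):
        """Drop a cached entry after the underlying row changes."""
        self._cache.pop(key, None)


products = {"p1": "keyboard"}  # stands in for a database table
cache = CacheAside(products.get)
cache.get("p1")
cache.get("p1")
print(cache.database_reads)  # 1
```

Data that changes often would need frequent `invalidate` calls, which is why caching pays off mainly for data that is not prone to frequent changes.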
Exam tips
● With DynamoDB, you can scale your database on the fly, without any down time.
● With RDS, it is not easy to scale and you usually have to use a bigger instance or
add a read replica.
8. Do I need to enable backups for my DB Instance or is it done automatically?
9. What is a backup window and why do I need it? Is my database available during the backup
window?
10. Where are my automated backups and DB snapshots stored and how do I manage their
retention?
11. What happens to my backups and DB snapshots if I delete my DB instance?
12. Can I encrypt data at rest on my Amazon RDS databases?
13. What are the benefits of a Multi-AZ deployment?
14. Are there any performance implications of running my DB instance as a Multi-AZ deployment?
15. When running my DB instance as a Multi-AZ deployment, can I use the standby for read or write
operations?
16. What happens when I convert my RDS instance from Single-AZ to Multi-AZ?
17. What events would cause Amazon RDS to initiate a failover to the standby replica?
18. What happens during Multi-AZ failover and how long does it take?
19. Will my standby be in the same Region as my primary?
20. When would I want to consider using an Amazon RDS read replica?
21. Do I need to enable automatic backups on my DB instance before I can create read replicas?
22. Can I create a read replica in an AWS Region different from that of the source DB instance?
23. Can I use a read replica to enhance database write availability or protect the data on my source
DB instance against failure scenarios?
24. Can I create a read replica with a Multi-AZ DB instance deployment as its source?
25. Can I configure my Amazon RDS read replicas themselves Multi-AZ?
26. If my read replica(s) use a Multi-AZ DB instance deployment as a source, what happens if
Multi-AZ failover occurs?
27. Can my read replicas only accept database read operations?
28. Can I promote my read replica into a “standalone” DB Instance?
29. How do I delete a read replica? Will it be deleted automatically if its source DB Instance is
deleted?
NoSQL - https://aws.amazon.com/nosql/