
AWS Academy Cloud Architecting

Module 14: Planning for Disaster

© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Module overview

Sections
1. Architectural need
2. Disaster planning strategies
3. Disaster recovery patterns

Lab
• Guided Lab: Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway

Knowledge check

Module objectives

At the end of this module, you should be able to:


• Identify strategies for disaster planning
• Define recovery point objective (RPO) and recovery time objective (RTO)
• Describe four common patterns for backup and disaster recovery and how to
implement them
• Use AWS Storage Gateway for on-premises-to-cloud backup solutions

Module 14: Planning for Disaster

Section 1: Architectural need

Café business requirement

If the café’s infrastructure ever becomes unavailable, the staff must be able to get their applications
running again within an amount of time that is acceptable to the business. They need an
architecture that supports their disaster recovery plans while also optimizing for cost.

Module 14: Planning for Disaster

Section 2: Disaster planning strategies

Planning for failures

"Everything fails, all the time."
– Werner Vogels

• Small-scale events
• Large-scale events
• Colossal events

How do you prepare for these events?

Avoiding and planning for disaster

High availability
• Minimize how often your applications and data become unavailable

Backup
• Make sure that your data is safe in case of disaster

Disaster recovery (DR)
• Recover your data and get your applications back online after a disaster

Selected AWS Well-Architected Framework design principles

Operational Excellence pillar
• Anticipate failure
• Refine operational procedures frequently

Reliability pillar
• Test recovery procedures
• Automatically recover from failure

Recovery point objective (RPO)

Recovery point objective (RPO) is the maximum acceptable amount of data loss,
measured in time.

How often must your data be backed up?

Example RPO: The business can recover from losing (at most) the last 8 hours of
data.

[Figure: RPO timeline. The data loss spans the interval from the last backup to the moment disaster strikes; to meet this RPO, that interval must be 8 hours or less.]
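As a rough illustration of how an RPO constrains backup frequency, the following Python sketch assumes that backups run at a fixed interval. The function names are hypothetical and are not part of any AWS API.

```python
from datetime import datetime, timedelta


def data_loss(last_backup: datetime, disaster: datetime) -> timedelta:
    """Data written after the last backup and before the disaster is lost."""
    if disaster < last_backup:
        raise ValueError("disaster must occur after the last backup")
    return disaster - last_backup


def backup_interval_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """With backups at a fixed interval, the worst case is a disaster that
    strikes just before the next backup, so the worst-case data loss equals
    the full interval. The interval therefore must not exceed the RPO."""
    return backup_interval <= rpo


rpo = timedelta(hours=8)
print(data_loss(datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 6, 30)))  # 6:30:00
print(backup_interval_meets_rpo(timedelta(hours=6), rpo))   # True
print(backup_interval_meets_rpo(timedelta(hours=12), rpo))  # False
```

In this model, an 8-hour RPO rules out a twice-daily (every 12 hours) backup schedule, because a disaster just before the next backup would lose up to 12 hours of data.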

Recovery time objective (RTO)

Recovery time objective (RTO) is the maximum acceptable amount of time after
disaster strikes that a business process can remain out of commission.

How quickly must your applications and data be recovered?

Example RTO: The application can be unavailable for a maximum of 1 hour.

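[Figure: RPO and RTO timeline. The data loss spans the interval from the last backup to the moment disaster strikes (bounded by the RPO). The downtime spans the interval from the moment disaster strikes until applications and data are recovered (bounded by the RTO, 1 hour in this example).]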
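A similar back-of-the-envelope check applies to RTO. This sketch assumes that recovery consists of steps that run one after another, with durations measured during recovery tests. The step names and durations below are hypothetical placeholders.

```python
from datetime import timedelta

# Hypothetical recovery steps and estimated durations; replace them with
# timings measured during your own recovery tests.
RECOVERY_STEPS = {
    "restore data from backup": timedelta(minutes=25),
    "launch instances from AMIs": timedelta(minutes=10),
    "update DNS records": timedelta(minutes=5),
}


def estimated_downtime(steps: dict[str, timedelta]) -> timedelta:
    """Assume the steps run sequentially, so their durations add up."""
    return sum(steps.values(), timedelta())


def meets_rto(steps: dict[str, timedelta], rto: timedelta) -> bool:
    return estimated_downtime(steps) <= rto


rto = timedelta(hours=1)
print(estimated_downtime(RECOVERY_STEPS))  # 0:40:00
print(meets_rto(RECOVERY_STEPS, rto))      # True
```

If any step can run in parallel with another, summing the durations overestimates the downtime, so this check errs on the conservative side.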
Plan for disaster recovery
Be intentional about where your data is stored and where your
applications run.

[Figure: Region 1 and Region 2, each containing storage, compute, networking, database, and deployment orchestration resources.]

The most robust DR plans span more than one Region.


Storage and backup building blocks
AWS Cloud: data storage
• Block: Amazon EBS, EC2 instance store
• File: Amazon EFS, Amazon FSx for Windows File Server
• Object: Amazon S3, Amazon S3 Glacier

Data transfer between the corporate data center and the AWS Cloud
• AWS Direct Connect
• AWS DataSync
• AWS Storage Gateway
• AWS Snowball
Best practice: S3 Cross-Region Replication
Storage type: object (Amazon S3, Amazon S3 Glacier)
• Most S3 storage classes replicate data across Availability Zones within a single Region
• Configure S3 Cross-Region Replication for an additional level of data protection
• Replication is automatic and asynchronous, and applies to objects that are created after you add the replication configuration
• Cross-Region Replication can also help meet compliance requirements and reduce latency for users who access objects from another Region

[Figure: a source S3 bucket in Region A, with replication configured, replicates objects to a destination S3 bucket in Region B.]
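The following Python sketch builds a replication configuration in the shape that the Amazon S3 replication API expects. With boto3, a dictionary like this would be passed as the `ReplicationConfiguration` argument of `s3.put_bucket_replication(Bucket=..., ReplicationConfiguration=...)`. The bucket names, IAM role ARN, and rule ID are hypothetical, and the sketch only builds and prints the configuration; it does not call AWS.

```python
import json


def build_replication_configuration(role_arn: str, destination_bucket: str) -> dict:
    """Replicate every new object in the source bucket to destination_bucket,
    which can be in another Region. Both buckets must have versioning enabled."""
    return {
        "Role": role_arn,  # IAM role that Amazon S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-all-objects",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix: apply to all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


config = build_replication_configuration(
    "arn:aws:iam::111122223333:role/example-replication-role",  # hypothetical
    "example-destination-bucket",  # hypothetical bucket in Region B
)
print(json.dumps(config, indent=2))
```

Amazon S3 accepts a replication configuration only when versioning is enabled on both the source and destination buckets, so enabling versioning is the first step.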
Best practice: EBS volume snapshots
Storage type: block (Amazon EBS)
• Create point-in-time snapshots of EBS volumes
• Snapshots are incremental backups: each snapshot saves only the blocks that changed since the previous snapshot
• Snapshots enable you to restore data to a new EBS volume
• Use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of snapshots
• You cannot snapshot instance store volumes

[Figure: a source EBS volume in Region A is backed up as a snapshot stored in Amazon S3, and a copy of the snapshot is kept in Region B.]
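The incremental behavior of snapshots can be modeled with a small Python simulation. This is a conceptual sketch under simplifying assumptions (a volume is a dictionary of numbered blocks, and blocks are never deleted); it is not how Amazon EBS stores snapshots internally.

```python
Volume = dict[int, bytes]  # block index -> block contents


def take_snapshot(current: Volume, at_last_snapshot: Volume) -> Volume:
    """Store only the blocks that changed (or were added) since the last snapshot."""
    return {
        index: data
        for index, data in current.items()
        if at_last_snapshot.get(index) != data
    }


def restore(snapshot_chain: list[Volume]) -> Volume:
    """Rebuild a full volume by applying snapshots from oldest to newest."""
    volume: Volume = {}
    for snapshot in snapshot_chain:
        volume.update(snapshot)
    return volume


v1 = {0: b"boot", 1: b"app-v1", 2: b"data-a"}
snap1 = take_snapshot(v1, {})         # first snapshot holds every block
v2 = {**v1, 2: b"data-b"}             # only block 2 changes
snap2 = take_snapshot(v2, v1)
print(sorted(snap2))                  # [2]
print(restore([snap1, snap2]) == v2)  # True
```

The second snapshot stores one block instead of three, yet restoring the chain still yields the complete volume, which is why incremental snapshots save storage without losing restore capability.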
Best practice: File system replication
Storage type: file (Amazon EFS, Amazon FSx for Windows File Server)
• Replicate EFS or FSx for Windows File Server file systems across Regions
• Replicate on-premises file systems to the cloud

[Figure: in the AWS Cloud, AWS DataSync replicates a source file system in Region A to a destination EFS or FSx file system in Region B. From on premises, AWS DataSync replicates a source file system to AWS, optionally over AWS Direct Connect.]
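To illustrate the idea of replicating only what changed, here is a minimal one-way sync in Python that copies new or modified files from a source directory to a destination directory. It is a stand-in for the concept, not the algorithm that AWS DataSync uses; the function names are hypothetical, and the sketch does not propagate deletions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Content hash used to decide whether a file changed."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_changed_files(source: Path, destination: Path) -> list[str]:
    """Copy files that are new or changed from source to destination.

    Returns the relative paths that were copied. Files deleted from the
    source are NOT removed from the destination in this sketch.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        relative = src.relative_to(source)
        dst = destination / relative
        if dst.is_file() and file_digest(dst) == file_digest(src):
            continue  # unchanged: skip the transfer
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(relative.as_posix())
    return copied


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    (src_root / "menus").mkdir()
    (src_root / "menus" / "coffee.txt").write_text("espresso")
    print(sync_changed_files(src_root, dst_root))  # ['menus/coffee.txt']
    print(sync_changed_files(src_root, dst_root))  # []
```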
Compute capacity should be quickly recoverable
Obtain and boot new server instances within minutes.

[Figure: Amazon EC2 instances launched from custom Amazon Machine Images (AMIs) in an EC2 Auto Scaling group.]
Strategies for compute disaster recovery

• Use EBS snapshots to back up EC2 instances
  • Snapshots can be taken manually or on a schedule (for example, by using AWS Lambda)
• Use system-level or instance-level backups infrequently and as a last resort
  • They quickly drive up storage costs
• Prefer automated rebuilds from configuration or code repositories instead
• Make cross-Region AMI copies
• Make cross-Region snapshot copies
• Consider transient compute architectures
• Store essential data off of the instance

[Figure: transient compute example. Long-lived resources (a system AMI and S3 buckets) persist over time. An instance is created from the AMI, pulls configuration data from S3, processes it, and writes the data it created back to S3 before the instance is terminated.]
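Scheduled snapshots could be implemented with a Lambda function along the lines of the following sketch, triggered on a schedule (for example, by an Amazon EventBridge rule). It uses the boto3 EC2 client calls `describe_volumes` and `create_snapshot`. The `Backup=true` tag convention is a hypothetical assumption, and the optional `ec2` parameter exists only so that the handler can be exercised with a stand-in client.

```python
from datetime import datetime, timezone


def lambda_handler(event, context, ec2=None):
    """Snapshot every EBS volume tagged Backup=true (hypothetical tag)."""
    if ec2 is None:  # use the real client when running inside AWS Lambda
        import boto3
        ec2 = boto3.client("ec2")

    response = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]
    )
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    snapshot_ids = []
    for volume in response["Volumes"]:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"Scheduled backup {stamp}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return {"snapshots": snapshot_ids}
```

Retention (deleting old snapshots) is not handled here; Amazon Data Lifecycle Manager, mentioned earlier, covers creation, retention, and deletion without custom code.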
Databases: Features that support recovery

Amazon Relational Database Service (Amazon RDS)
• Take snapshots and save them in a separate Region
• Combine read replicas with Multi-AZ deployments to build a resilient disaster recovery strategy
• Retain automated backups

Amazon DynamoDB
• Back up entire tables in seconds
• Use point-in-time recovery to continuously back up tables for up to 35 days
• Initiate backups with a single click in the console or a single application programming interface (API) call
• Use Global Tables to build a multi-Region, multi-master database that provides fast local performance for massively scaled, globally distributed applications
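A point-in-time restore can be requested through the boto3 DynamoDB client's `restore_table_to_point_in_time` call. The sketch below adds a local check against the 35-day window mentioned above. The table names are hypothetical, and the `dynamodb` argument is assumed to be a boto3 DynamoDB client (or a stand-in for testing).

```python
from datetime import datetime, timedelta, timezone

# DynamoDB point-in-time recovery covers (at most) the last 35 days.
PITR_WINDOW = timedelta(days=35)


def restore_table_as_of(dynamodb, source_table, target_table, restore_time, now=None):
    """Restore source_table, as it was at restore_time, into a new table.

    Assumes point-in-time recovery is already enabled on source_table.
    """
    now = now or datetime.now(timezone.utc)
    # Local guard only: DynamoDB itself rejects times outside the table's
    # actual restorable range (which also depends on when PITR was enabled).
    if not (now - PITR_WINDOW <= restore_time <= now):
        raise ValueError("restore_time is outside the 35-day recovery window")
    return dynamodb.restore_table_to_point_in_time(
        SourceTableName=source_table,
        TargetTableName=target_table,  # a restore always creates a new table
        RestoreDateTime=restore_time,
    )
```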
Module 14: Planning for Disaster

Section 3: Disaster recovery patterns

Common disaster recovery patterns on AWS
Four disaster recovery patterns
• Backup and restore
• Pilot light
• Warm standby
• Multi-site

Each pattern is suited to a different combination of:
• Recovery point objective
• Recovery time objective
• Cost-effectiveness
Backup and restore pattern
Back up configuration and state data to Amazon S3. Implement a lifecycle policy to save on cost.
[Figure: backups from the corporate data center are stored in an S3 bucket in the AWS Cloud; a lifecycle policy transitions them to Amazon S3 Standard-IA and then to Amazon S3 Glacier.]

Restore when needed.
[Figure: backups in the S3 bucket are restored through a VPC endpoint to Amazon EC2 instances in a VPC in the DR Region.]
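A lifecycle policy like the one in the backup and restore diagram can be expressed as an S3 lifecycle configuration. The Python sketch below builds one in the shape expected by boto3's `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. The prefix, rule ID, and day counts are example values, not recommendations, and the sketch does not call AWS.

```python
import json


def build_backup_lifecycle(prefix: str = "backups/") -> dict:
    """Move backup objects to S3 Standard-IA after 30 days and to
    S3 Glacier after 90 days (example values)."""
    return {
        "Rules": [
            {
                "ID": "archive-backups",
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


print(json.dumps(build_backup_lifecycle(), indent=2))
```

Amazon S3 requires objects to be at least 30 days old before they transition to S3 Standard-IA, which is why the first transition is set at 30 days.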
AWS Storage Gateway
On-premises servers connect to AWS Storage Gateway, which transfers data to AWS over HTTPS through three interfaces:
• File gateway: stores files as objects in Amazon S3 Standard, with the option to archive them to an S3 Glacier vault
• Volume gateway (iSCSI): stores volumes in an S3 bucket as EBS snapshots, with the option to restore a snapshot to a volume and attach it to an EC2 instance
• Tape gateway: stores data as virtual tape libraries in an S3 bucket, with the option to archive tapes to an S3 Glacier vault
Backup and restore: Checklist

Preparation phase
• Create backups of current systems
• Store backups in Amazon S3
• Document the procedure to restore from backups
• Know:
  • Which AMI to use, and build as needed
  • How to restore the system from backups
  • How to route traffic to the new system
  • How to configure the deployment

In case of disaster
• Retrieve backups from Amazon S3
• Restore the required infrastructure
  • EC2 instances from prepared AMIs
  • Elastic Load Balancing load balancers
  • AWS resources created by an AWS CloudFormation stack: automated deployment to restore or duplicate the environment
• Restore the system from backup
• Route traffic to the new system
  • Adjust Domain Name System (DNS) records accordingly
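Routing traffic to the restored system typically means updating a DNS record. The sketch below uses the boto3 Route 53 client's `change_resource_record_sets` call to UPSERT a CNAME record. The hosted zone ID, record name, and endpoint are hypothetical, and a short TTL is assumed so that resolvers pick up the change quickly.

```python
def point_dns_at_recovery(route53, hosted_zone_id, record_name, recovery_endpoint, ttl=60):
    """UPSERT a CNAME record so that record_name resolves to the recovered system.

    `route53` is expected to be a boto3 client (boto3.client("route53")).
    """
    change_batch = {
        "Comment": "Route traffic to the DR environment",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,  # short TTL so resolvers pick up the change quickly
                    "ResourceRecords": [{"Value": recovery_endpoint}],
                },
            }
        ],
    }
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )
```

For a load balancer target, a Route 53 alias record is often used instead of a CNAME; the CNAME keeps this sketch simple, but note that a CNAME cannot be created at the zone apex (for example, example.com itself).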
Pilot light pattern: Preparation phase

[Figure: a Route 53 hosted zone routes www.example.com traffic to the web servers and app server at the primary site (on premises or in the AWS Cloud). The primary DB is mirrored and replicated to a secondary DB in the AWS Cloud. The web and app servers in the AWS Cloud exist, but are not running.]
Pilot light pattern: In case of disaster

[Figure: a Route 53 hosted zone for www.example.com, the primary site (on premises or in the AWS Cloud) with web servers, an app server, and the primary DB, and the AWS Cloud with a secondary DB that is kept current by data mirroring and replication. In case of disaster, the web and app servers in the AWS Cloud start in minutes.]
Pilot light pattern: Checklist

Preparation phase
• Configure EC2 instances to replicate or mirror servers
• Ensure that all supporting custom software packages are available on AWS
• Create and maintain AMIs of key servers where fast recovery is needed
• Regularly run these servers, test them, and apply any software updates and configuration changes
• Consider automating the provisioning of AWS resources

In case of disaster
• Automatically bring up resources around the replicated core dataset
• Scale the system as needed to handle current production traffic
• Switch over to the new system
• Adjust DNS records to point to AWS
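Bringing up the pilot light servers could be scripted as in the following sketch. It assumes that the standby instances already exist in a stopped state and carry a hypothetical `DRRole=pilot-light` tag, and it uses the boto3 EC2 calls `describe_instances` and `start_instances` plus the `instance_running` waiter.

```python
def start_pilot_light_servers(ec2, tag_value="pilot-light"):
    """Start stopped EC2 instances tagged DRRole=<tag_value> (hypothetical tag)
    and block until they are running. Returns the instance IDs started.

    `ec2` is expected to be a boto3 client (boto3.client("ec2")).
    """
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:DRRole", "Values": [tag_value]},
            {"Name": "instance-state-name", "Values": ["stopped"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.start_instances(InstanceIds=instance_ids)
        ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    return instance_ids
```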
Warm standby pattern: Preparation phase

[Figure: a Route 53 hosted zone routes user or system access to the active web and app servers at the primary site (on premises or in the AWS Cloud). In the AWS Cloud, web and app servers run at low capacity in Auto Scaling groups. The primary DB is mirrored and replicated to a secondary DB in the AWS Cloud.]
Warm standby pattern: In case of disaster

[Figure: the same components as in the preparation phase. In case of disaster, the web and app servers in the AWS Cloud are at low capacity at switchover and start to scale up in their Auto Scaling groups.]
Warm standby pattern: Checklist

Preparation
• Similar to pilot light
• All necessary components are running 24/7, but are not scaled for production traffic
• Best practice: Continuous testing
  • Trickle a statistical subset of production traffic to the DR site

In case of disaster
• Immediately fail over the most critical production load
• Adjust DNS records to point to AWS
• (Automatically) scale the system further to handle all production load
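Scaling the warm standby up to production capacity can be done by raising the Auto Scaling group's capacity, as in this sketch that uses the boto3 `update_auto_scaling_group` call. The group name and capacity values are hypothetical; in practice, scaling policies on the group can also perform this step automatically.

```python
def scale_warm_standby(autoscaling, group_name, desired_capacity, max_size):
    """Raise a low-capacity Auto Scaling group to production capacity.

    `autoscaling` is expected to be a boto3 client
    (boto3.client("autoscaling")). Returns the parameters sent, for logging.
    """
    if desired_capacity > max_size:
        raise ValueError("desired_capacity cannot exceed max_size")
    params = {
        "AutoScalingGroupName": group_name,
        "MinSize": desired_capacity,  # keep at least production capacity
        "DesiredCapacity": desired_capacity,
        "MaxSize": max_size,  # room for scaling policies to scale out further
    }
    autoscaling.update_auto_scaling_group(**params)
    return params
```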
Multi-site pattern
[Figure: a Route 53 hosted zone routes user or system traffic to web and app servers both at the primary site (on premises or in the AWS Cloud) and in the AWS Cloud. The web and app servers in the AWS Cloud run at full capacity at all times. The primary DB is mirrored and replicated to a secondary DB.]
Multi-site: Checklist

Preparation
• Similar to warm standby
• Configured to scale in or out fully to handle production load

In case of disaster
• Immediately fail over all production load
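In a multi-site setup, Route 53 weighted routing is one way to split traffic between sites and to shift all of it during failover. Route 53 sends each weighted record a share of traffic equal to its weight divided by the sum of the weights in the group. The sketch below builds weighted record sets (the names and targets are hypothetical) and computes the resulting shares; the records themselves would be applied with `change_resource_record_sets`.

```python
def weighted_record(name, set_id, target, weight, ttl=60):
    """A Route 53 weighted CNAME record set (hypothetical names and targets)."""
    return {
        "Name": name,
        "Type": "CNAME",
        "SetIdentifier": set_id,  # distinguishes records that share a name
        "Weight": weight,
        "TTL": ttl,
        "ResourceRecords": [{"Value": target}],
    }


def traffic_shares(records):
    """Fraction of traffic each record receives: weight / sum of weights."""
    total = sum(r["Weight"] for r in records)
    if total == 0:  # Route 53 splits traffic evenly when all weights are 0
        return {r["SetIdentifier"]: 1 / len(records) for r in records}
    return {r["SetIdentifier"]: r["Weight"] / total for r in records}


normal = [
    weighted_record("www.example.com", "primary", "primary.example.com", 50),
    weighted_record("www.example.com", "dr", "dr.example.com", 50),
]
failover = [
    weighted_record("www.example.com", "primary", "primary.example.com", 0),
    weighted_record("www.example.com", "dr", "dr.example.com", 100),
]
print(traffic_shares(normal))    # {'primary': 0.5, 'dr': 0.5}
print(traffic_shares(failover))  # {'primary': 0.0, 'dr': 1.0}
```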
Summary of common DR patterns
[Figure: the four patterns arranged from backup and restore to multi-site; along this progression, RTO decreases (from more to less) and cost increases (from less to more).]

• Backup and restore
  • Lower-priority use cases
  • Solutions: Amazon S3, Storage Gateway
• Pilot light
  • Meeting lower RTO and RPO requirements
  • Core services
  • Scale AWS resources in response to a DR event
• Warm standby
  • Solutions that require RTO and RPO in minutes
  • Business-critical services
• Multi-site
  • Automatic failover of your environment in AWS to a running duplicate
DR preparation: Best practices

• Start simple
• Check for software licensing issues
• Practice Game Day exercises
Section 3 key takeaways
• Common disaster recovery patterns on AWS include backup and restore, pilot light, warm standby, and multi-site.
• Backup and restore is the most cost-effective approach. However, it has the highest RTO.
• Multi-site provides the fastest RTO. However, it costs the most because it provides a fully running, production-ready duplicate.
• AWS Storage Gateway provides three interfaces (file gateway, volume gateway, and tape gateway) for data backup and recovery between on-premises environments and the AWS Cloud.
Module 14 – Guided Lab: Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway
Guided lab: Tasks

1. Reviewing the lab architecture


2. Creating the primary and secondary S3 buckets
3. Enabling Cross-Region Replication
4. Configuring the file gateway and creating an NFS file share
5. Mounting the file share to the Linux instance and migrating the data
6. Verifying that the data is migrated

Guided lab: Final product

~45 minutes

Begin Module 14 – Guided Lab: Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway
Guided lab debrief: Key takeaways
Module 14: Planning for Disaster

Module wrap-up

Module summary

In summary, in this module, you learned how to:


• Identify strategies for disaster planning
• Define RPO and RTO
• Describe four common patterns for backup and disaster recovery and how to
implement them
• Use AWS Storage Gateway for on-premises-to-cloud backup solutions

Complete the knowledge check

Sample exam question

Company salespeople upload their sales figures daily. A Solutions Architect needs
a durable storage solution for these documents that also protects against users
accidentally deleting important documents.

Which action will protect against unintended user actions?

A. Store data in an EBS volume and create snapshots once a week.


B. Store data in an S3 bucket and enable versioning.
C. Store data in two S3 buckets in different AWS Regions.
D. Store data on EC2 instance storage.

Additional resources

• Amazon S3 Replication
• Amazon S3 Object Lifecycle Management
• Amazon EBS Snapshots
• Using AWS Lambda with Scheduled Events
• Backup & Restore resource center
• Disaster Recovery with AWS (video)

Thank you

© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. Corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
