SUMSEM-2021-22 CSE4011 ETH VL2021220701890 Reference Material I 20-08-2022 Disaster Recovery Patterns

AWS Academy Cloud Architecting
Module 14: Planning for Disaster
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Module overview
Sections
1. Architectural need
2. Disaster planning strategies
3. Disaster recovery patterns
Lab
• Guided Lab: Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway
Knowledge check
Module objectives
At the end of this module, you should be able to:

• Identify strategies for disaster planning
• Define recovery point objective (RPO) and recovery time objective (RTO)
• Describe four common patterns for backup and disaster recovery and how to
implement them
• Use AWS Storage Gateway for on-premises-to-cloud backup solutions

Section 1: Architectural need

Café business requirement
If the café’s infrastructure ever becomes unavailable, the staff must be able to get their applications
running again within an amount of time that is acceptable to the business. They need an
architecture that supports their disaster recovery plans while also optimizing for cost.

Section 2: Disaster planning strategies

Planning for failures
"Everything fails, all the time."

– Werner Vogels
Small-scale events, large-scale events, and colossal events
How do you prepare for these events?

Avoiding and planning for disaster
High availability
• Minimize how often your applications and data become unavailable
Backup
• Make sure that your data is safe in case of disaster
Disaster recovery (DR)
• Recover your data and get your applications back online after a disaster

Selected AWS Well-Architected Framework design principles
Operational Excellence pillar
• Anticipate failure
• Refine operational procedures frequently
Reliability pillar
• Test recovery procedures
• Automatically recover from failure

Recovery point objective (RPO)
Recovery point objective (RPO) is the maximum acceptable amount of data loss,
measured in time.
How often must your data be backed up?
Example RPO: The business can recover from losing (at most) the last 8 hours of
data.
[Figure: timeline. Data written between the last backup and the moment disaster strikes is lost; to meet this RPO, that interval must be 8 hours or fewer.]
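To connect the RPO to a backup schedule, here is a small worked sketch. It assumes backups start at a fixed interval, each backup captures the state at the moment it starts (a point-in-time copy such as a snapshot), and a backup only becomes usable once it finishes. Under those assumptions the worst-case data loss is one interval plus the backup duration. The function names are illustrative, not part of any AWS API.

```python
from datetime import timedelta


def worst_case_data_loss(interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for backups that start every `interval`.

    If disaster strikes just before an in-progress backup finishes, the newest
    usable recovery point is the previous backup, which started one interval
    earlier, plus the time the unfinished backup had already been running.
    """
    return interval + backup_duration


def meets_rpo(interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if the schedule never loses more data than the RPO allows."""
    return worst_case_data_loss(interval, backup_duration) <= rpo


rpo = timedelta(hours=8)  # the example RPO from this slide
print(meets_rpo(timedelta(hours=6), rpo))                      # True
print(meets_rpo(timedelta(hours=6), rpo, timedelta(hours=3)))  # False: 9 h > 8 h
print(meets_rpo(timedelta(hours=12), rpo))                     # False
```

The second case shows why backup duration matters: a 6-hour schedule looks safe against an 8-hour RPO until a 3-hour backup window is taken into account.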

Recovery time objective (RTO)
Recovery time objective (RTO) is the maximum acceptable amount of time after
disaster strikes that a business process can remain out of commission.
How quickly must your applications and data be recovered?
Example RTO: The application can be unavailable for a maximum of 1 hour.
[Figure: timeline. The RPO covers the data loss from the last backup until disaster strikes; the 1-hour RTO covers the downtime from when disaster strikes until applications and data are recovered.]
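Taken together, the two objectives measure opposite sides of the disaster: data loss runs from the last backup to the disaster, and downtime runs from the disaster to recovery. A small sketch that checks one incident against the slide's example targets (8-hour RPO, 1-hour RTO); the timestamps are hypothetical and the function name is illustrative only.

```python
from datetime import datetime, timedelta


def evaluate_recovery(last_backup: datetime, disaster: datetime, recovered: datetime,
                      rpo: timedelta, rto: timedelta) -> dict:
    """Measure one incident against the RPO and RTO targets."""
    data_loss = disaster - last_backup  # the RPO side of the timeline
    downtime = recovered - disaster     # the RTO side of the timeline
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical incident: backup at 02:00, outage at 09:30, service back at 10:45.
result = evaluate_recovery(
    last_backup=datetime(2024, 1, 15, 2, 0),
    disaster=datetime(2024, 1, 15, 9, 30),
    recovered=datetime(2024, 1, 15, 10, 45),
    rpo=timedelta(hours=8),
    rto=timedelta(hours=1),
)
print(result["rpo_met"], result["rto_met"])  # True False
```

Here 7.5 hours of data loss is within the RPO, but 75 minutes of downtime misses the RTO, which is the kind of gap DR testing is meant to expose.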
Plan for disaster recovery
Be intentional about where your data is stored and where your
applications run.
[Figure: Region 1 and Region 2, each with storage, compute, networking, database, and deployment orchestration resources.]
The most robust DR plans span more than one Region.

Storage and backup building blocks
Data storage (AWS Cloud)
• Block: Amazon EBS, EC2 instance store
• File: Amazon EFS, Amazon FSx for Windows File Server
• Object: Amazon S3, Amazon S3 Glacier
Data transfer (from the corporate data center)
• AWS Direct Connect, AWS DataSync, AWS Storage Gateway, AWS Snowball
Best practice: S3 Cross-Region Replication
• Most S3 storage classes replicate data across Availability Zones within a single Region
• Configure S3 Cross-Region Replication for higher-level data security
• Automatically, asynchronously replicates objects created after you add the replication configuration
• Can also help meet compliance requirements and reduce latency for users who are accessing objects
[Figure: a source S3 bucket in Region A, with replication configured, replicates objects to a destination S3 bucket in Region B.]
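A minimal sketch of the replication configuration document that S3 expects for a Cross-Region Replication rule, built as a plain Python dict. The IAM role ARN and bucket names are hypothetical placeholders, and the rule shown (replicate every new object, do not replicate delete markers) is one reasonable choice, not the only one.

```python
import json


def build_replication_config(role_arn: str, destination_bucket: str,
                             storage_class: str = "STANDARD") -> dict:
    """S3 replication configuration that copies every new object in the
    source bucket to `destination_bucket`."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-all",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


config = build_replication_config(
    role_arn="arn:aws:iam::111122223333:role/example-replication-role",  # hypothetical
    destination_bucket="example-dr-bucket-region-b",                     # hypothetical
)
print(json.dumps(config, indent=2))
```

With boto3, a document like this could be applied to the source bucket with `s3.put_bucket_replication(Bucket=..., ReplicationConfiguration=config)`. S3 rejects the configuration unless versioning is enabled on both the source and destination buckets, and the role needs permission to read from the source and write to the destination.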
Best practice: EBS volume snapshots
• Create point-in-time snapshots of EBS volumes
• Snapshots provide incremental backups (they back up only the blocks that changed since the previous snapshot)
• Snapshots enable you to restore data to a new EBS volume
• Use Amazon Data Lifecycle Manager to automate the creation, retention, and deletion of snapshots
• You cannot snapshot instance storage
[Figure: a source EBS volume in Region A, its snapshot stored in Amazon S3, and a copy of the snapshot in Region B.]
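The idea of an incremental snapshot can be shown with a toy model. This is a conceptual sketch only, not how EBS stores snapshots internally (EBS manages block tracking and retention for you): each snapshot keeps just the blocks written since the previous one, and restoring replays the chain oldest-first into a new volume. `ToyVolume`, `take_snapshot`, and `restore` are hypothetical names.

```python
class ToyVolume:
    """Toy block device (not real EBS): block index -> bytes."""

    def __init__(self):
        self.blocks = {}
        self.dirty = set()  # blocks written since the last snapshot

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)


def take_snapshot(volume: ToyVolume, chain: list) -> dict:
    """Append a snapshot holding only the blocks changed since the previous one."""
    delta = {i: volume.blocks[i] for i in sorted(volume.dirty)}
    volume.dirty.clear()
    chain.append(delta)
    return delta


def restore(chain: list, upto: int) -> dict:
    """Rebuild a new volume as of snapshot `upto` by replaying deltas oldest-first."""
    restored = {}
    for delta in chain[: upto + 1]:
        restored.update(delta)
    return restored


vol = ToyVolume()
chain = []
vol.write(0, b"boot")
vol.write(1, b"data-v1")
take_snapshot(vol, chain)         # first snapshot: every written block
vol.write(1, b"data-v2")
print(take_snapshot(vol, chain))  # {1: b'data-v2'} (only the changed block)
print(restore(chain, 0))          # {0: b'boot', 1: b'data-v1'}
print(restore(chain, 1))          # {0: b'boot', 1: b'data-v2'}
```

The second snapshot stores one block instead of two, which is where the storage savings of incremental backups come from.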
Best practice: File system replication
• Replicate EFS or FSx for Windows File Server file systems across Regions
• Replicate on-premises file systems to the cloud
[Figure: AWS DataSync copies a source file system in Region A to a destination EFS or FSx file system in Region B; AWS DataSync also copies an on-premises source file system to AWS, optionally over AWS Direct Connect.]
Compute capacity should be quickly recoverable
Obtain and boot new server instances within minutes.
[Figure: custom Amazon Machine Images (AMIs) are used to launch Amazon EC2 instances in an EC2 Auto Scaling group.]
Strategies for compute disaster recovery
• Use the Amazon EC2 snapshot capability for backups
  • Snapshots can be performed manually or scheduled (for example, by using AWS Lambda)
• Use system-level or instance-level backups infrequently and as a last resort
  • They drive up storage costs quickly
• Prefer automated rebuilds from configuration data or code repositories instead
• Cross-Region AMI copies
• Cross-Region snapshot copies
• Consider transient compute architectures
  • Store essential data off the instance
[Figure: transient compute example. Over time, an instance is created from a system AMI, pulls data from long-lived S3 buckets, processes it, writes the created data back to S3, and is then terminated.]
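The slide mentions scheduling snapshots with AWS Lambda. A minimal sketch of such a function, assuming the volumes to protect carry a hypothetical `Backup=true` tag and that a schedule rule (for example, in Amazon EventBridge) invokes it. It uses the boto3 EC2 client calls `describe_volumes` and `create_snapshot`, and accepts an injected client so the logic can be exercised without AWS access.

```python
def lambda_handler(event, context, ec2=None):
    """Snapshot every EBS volume tagged Backup=true (hypothetical tag)."""
    if ec2 is None:
        import boto3  # included in the AWS Lambda Python runtime
        ec2 = boto3.client("ec2")

    snapshot_ids = []
    request = {"Filters": [{"Name": "tag:Backup", "Values": ["true"]}]}
    while True:
        page = ec2.describe_volumes(**request)
        for volume in page["Volumes"]:
            snap = ec2.create_snapshot(
                VolumeId=volume["VolumeId"],
                Description=f"Scheduled DR backup of {volume['VolumeId']}",
            )
            snapshot_ids.append(snap["SnapshotId"])
        token = page.get("NextToken")
        if not token:  # no more pages of matching volumes
            break
        request["NextToken"] = token
    return {"snapshots": snapshot_ids}
```

For a managed alternative that also handles retention and deletion, Amazon Data Lifecycle Manager (covered earlier in this module) meets the same need without custom code.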
Databases: Features that support recovery
Amazon Relational Database Service (Amazon RDS)
• Take snapshot data and save it in a separate Region
• Combine read replicas with Multi-AZ deployments to build a resilient disaster recovery strategy
• Retain automated backups
Amazon DynamoDB
• Back up entire tables in seconds
• Use point-in-time recovery to continuously back up tables for up to 35 days
• Initiate backups with a single click in the console or a single application programming interface (API) call
• Use Global Tables to build a multi-Region, multi-master database that provides fast local performance for massively scaled, globally distributed applications
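DynamoDB point-in-time recovery restores into a new table, and only to a moment inside the retention window. A sketch that approximates the window (at most 35 days back, and never before PITR was enabled) and builds keyword arguments for the boto3 DynamoDB call `restore_table_to_point_in_time`. The table name and dates are hypothetical. In practice the exact window is reported by `describe_continuous_backups` (`EarliestRestorableDateTime` and `LatestRestorableDateTime`), and the latest restorable time trails the current time slightly.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)  # maximum PITR retention stated on the slide


def earliest_restorable(now: datetime, pitr_enabled_at: datetime) -> datetime:
    # You can go back at most 35 days, and never before PITR was turned on.
    return max(now - PITR_WINDOW, pitr_enabled_at)


def build_restore_request(source_table: str, target: datetime,
                          now: datetime, pitr_enabled_at: datetime) -> dict:
    """Keyword arguments for restore_table_to_point_in_time.

    Raises ValueError if `target` is outside the approximate restore window.
    PITR always restores into a new table, so a new name is generated.
    """
    if not earliest_restorable(now, pitr_enabled_at) <= target <= now:
        raise ValueError(f"{target.isoformat()} is outside the PITR window")
    return {
        "SourceTableName": source_table,
        "TargetTableName": f"{source_table}-restored-{target:%Y%m%d%H%M}",
        "RestoreDateTime": target,
    }


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
enabled = datetime(2024, 1, 1, tzinfo=timezone.utc)
request = build_restore_request("Orders", now - timedelta(days=2), now, enabled)
print(request["TargetTableName"])  # Orders-restored-202402281200
```

The resulting dict could be passed as `dynamodb.restore_table_to_point_in_time(**request)`; a target 40 days back would raise `ValueError` instead.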

Section 3: Disaster recovery patterns

Common disaster recovery patterns on AWS
Four disaster recovery patterns
• Backup and restore
• Pilot light
• Warm standby
• Multi-site
Each pattern is suited to a different combination of:
• Recovery point objective
• Recovery time objective
• Cost-effectiveness
Backup and restore pattern
Back up configuration and state data to S3. Implement a lifecycle policy to save on cost.
[Figure: backups from the corporate data center are stored in an S3 bucket in the AWS Cloud, where a lifecycle policy moves them to Amazon S3 Standard-IA and then to Amazon S3 Glacier.]
Restore when needed.
[Figure: the same backup path, plus a VPC in the DR Region that restores data from the S3 bucket through a VPC endpoint to Amazon EC2.]
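The lifecycle policy in this pattern can be expressed as an S3 lifecycle configuration. A sketch, assuming backups live under a hypothetical `backups/` prefix and that 30 and 90 days are acceptable transition ages (placeholder values, not recommendations). S3 does not transition objects to Standard-IA until they are at least 30 days old, which is why the first transition here is at 30 days. The helper `storage_class_for_age` is hypothetical and only illustrates where an object of a given age ends up.

```python
LIFECYCLE_CONFIG = {
    "Rules": [
        {
            "ID": "tier-dr-backups",
            "Filter": {"Prefix": "backups/"},  # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}


def storage_class_for_age(age_days: int, config: dict = LIFECYCLE_CONFIG) -> str:
    """Storage class a backup object of this age reaches under the first rule."""
    current = "STANDARD"
    transitions = sorted(config["Rules"][0]["Transitions"], key=lambda t: t["Days"])
    for transition in transitions:
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


for age in (7, 45, 120):
    print(age, storage_class_for_age(age))
# 7 STANDARD
# 45 STANDARD_IA
# 120 GLACIER
```

With boto3, the configuration could be applied with `s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=LIFECYCLE_CONFIG)`. S3 runs transitions asynchronously, so an object may move somewhat after it reaches the stated age.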

AWS Storage Gateway
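[Figure: an on-premises server connects through AWS Storage Gateway to AWS in three ways:
• File gateway: files sent over HTTPS are stored in an Amazon S3 Standard bucket and can be archived to an S3 Glacier vault
• Volume gateway (iSCSI): volumes sent over HTTPS are stored in an S3 bucket as EBS snapshots, with the option to restore to a volume and attach it to an EC2 instance
• Tape gateway: data sent over HTTPS is stored in an S3 bucket as virtual tape libraries and archived to an S3 Glacier vault]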
Backup and restore: Checklist
Preparation phase
• Create backups of current systems
• Store backups in Amazon S3
• Document the procedure to restore from backups
• Know:
  • Which AMI to use, and build as needed
  • How to restore the system from backups
  • How to route traffic to the new system
  • How to configure the deployment
In case of disaster
• Retrieve backups from Amazon S3
• Restore required infrastructure
  • EC2 instances from prepared AMIs
  • Elastic Load Balancing load balancers
  • AWS resources created by an AWS CloudFormation stack (automated deployment to restore or duplicate the environment)
• Restore the system from backup
• Route traffic to the new system
  • Adjust Domain Name System (DNS) records accordingly
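The final step, adjusting DNS records, could be scripted with Amazon Route 53. A sketch that builds the change batch for the boto3 Route 53 call `change_resource_record_sets`, assuming the site is served through a CNAME and the recovered system sits behind a load balancer whose DNS name is a hypothetical placeholder. The 60-second TTL is an assumption; a low TTL limits how long clients keep using the old address.

```python
def repoint_change_batch(record_name: str, new_target: str, ttl: int = 60) -> dict:
    """Route 53 change batch that points a CNAME at the recovered system."""
    return {
        "Comment": f"DR: point {record_name} at {new_target}",
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or overwrite it if it exists
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_target}],
                },
            }
        ],
    }


dr_batch = repoint_change_batch(
    "www.example.com",
    "example-dr-alb-123456.us-west-2.elb.amazonaws.com",  # hypothetical DR load balancer
)
```

The batch would be submitted with `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=dr_batch)`. For load balancer targets, Route 53 alias records are often preferred; they need the load balancer's hosted zone ID, which this sketch avoids.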
Pilot light pattern: Preparation phase
[Figure: the primary site (on premises or in the AWS Cloud) runs web servers, app servers, and the primary DB, and an Amazon Route 53 hosted zone serves www.example.com. In the AWS Cloud DR site, the web and app servers exist but are not running, and data is mirrored and replicated from the primary DB to a secondary DB.]

Pilot light pattern: In case of disaster

[Figure: the same layout at failover. The web and app servers in the AWS Cloud DR site start in minutes, alongside the secondary DB that has been receiving mirrored and replicated data from the primary DB.]

Pilot light pattern: Checklist
Preparation phase
• Configure EC2 instances to replicate or mirror servers
• Ensure that all supporting custom software packages are available on AWS
• Create and maintain AMIs of key servers where fast recovery is needed
• Regularly run these servers, test them, and apply any software updates and configuration changes
• Consider automating the provisioning of AWS resources
In case of disaster
• Automatically bring up resources around the replicated core dataset
• Scale the system as needed to handle current production traffic
• Switch over to the new system
  • Adjust DNS records to point to AWS
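A minimal sketch of the "bring up resources" step, assuming the DR-site servers already exist as stopped EC2 instances that carry a hypothetical `DRRole=pilot-light` tag. It uses the boto3 EC2 client calls `describe_instances` and `start_instances`; the client is passed in so the function can be tested without AWS. Pagination of `describe_instances` is omitted for brevity, which is reasonable only for a small fleet.

```python
def start_pilot_light(ec2, role_tag_value: str = "pilot-light") -> list:
    """Start stopped DR instances tagged DRRole=<role_tag_value> (hypothetical tag)."""
    response = ec2.describe_instances(
        Filters=[
            {"Name": "tag:DRRole", "Values": [role_tag_value]},
            {"Name": "instance-state-name", "Values": ["stopped"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:  # start_instances rejects an empty list
        ec2.start_instances(InstanceIds=instance_ids)
    return instance_ids
```

In a real failover this might be called as `ids = start_pilot_light(boto3.client("ec2", region_name="us-west-2"))` (Region hypothetical), followed by `ec2.get_waiter("instance_running").wait(InstanceIds=ids)` before scaling and switching DNS.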

Warm standby pattern: Preparation phase
[Figure: user or system access reaches the primary site (on premises or in the AWS Cloud) through a Route 53 hosted zone, where the web and app servers are active. The AWS Cloud DR site runs web and app servers in Auto Scaling groups at low capacity. Data is mirrored and replicated from the primary DB to a secondary DB.]

Warm standby pattern: In case of disaster
[Figure: at switchover, user or system access is directed through the Route 53 hosted zone to the AWS Cloud DR site, whose web and app server Auto Scaling groups are at low capacity and start to scale up. The figure also shows the data mirroring and replication path between the databases.]
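When the warm standby takes over, its Auto Scaling groups must grow from low capacity to production size. A sketch under stated assumptions: production load is summarized as requests per second, each instance handles a known (hypothetical) number of requests per second, and 20% headroom is wanted. `update_auto_scaling_group` is a boto3 Auto Scaling client call; the group name and numbers are hypothetical.

```python
import math


def required_capacity(requests_per_second: float, per_instance_rps: float,
                      headroom: float = 0.2) -> int:
    """Instances needed for the full production load plus a safety margin."""
    return math.ceil(requests_per_second * (1 + headroom) / per_instance_rps)


def scale_up_standby(autoscaling, group_name: str, desired: int) -> None:
    """Raise a warm-standby Auto Scaling group to production size."""
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=group_name,
        MinSize=desired,         # keep the group from scaling back down mid-failover
        MaxSize=desired * 2,     # leave room for further automatic scaling
        DesiredCapacity=desired,
    )


desired = required_capacity(requests_per_second=1800, per_instance_rps=250)
print(desired)  # 9
```

In a real account this might run as `scale_up_standby(boto3.client("autoscaling"), "dr-web-asg", desired)`; once the group is at production size, normal scaling policies can take over.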


Warm standby pattern: Checklist
Preparation
• Similar to pilot light
• All necessary components running 24/7, but not scaled for production traffic
• Best practice: Continuous testing
  • Trickle a statistical subset of production traffic to the DR site
In case of disaster
• Immediately fail over most critical production load
• Adjust DNS records to point to AWS
• (Automatically) Scale the system further to handle all production load
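One way to trickle a statistical subset of production traffic to the DR site is Route 53 weighted routing. A sketch that builds the change batch, assuming CNAME records and hypothetical host names. Because resolvers cache answers, the split applies to DNS responses and is only approximate at the request level.

```python
def weighted_change_batch(record_name: str, primary_target: str, dr_target: str,
                          dr_percent: int, ttl: int = 60) -> dict:
    """Weighted CNAME records sending about dr_percent% of DNS answers to the DR site."""
    if not 0 <= dr_percent <= 100:
        raise ValueError("dr_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,  # distinguishes records that share a name
                "Weight": weight,         # share = weight / sum of all weights
                "TTL": ttl,
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {"Changes": [
        record("primary", primary_target, 100 - dr_percent),
        record("dr-site", dr_target, dr_percent),
    ]}


trickle_batch = weighted_change_batch(
    "www.example.com",
    "primary-lb.example.net",  # hypothetical primary endpoint
    "dr-lb.example.net",       # hypothetical DR endpoint
    dr_percent=5,
)
```

The batch would be submitted with `route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=trickle_batch)`. Re-running the function with a higher `dr_percent` is one way to shift more load to the DR site during failover.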

Multi-site pattern
[Figure: a Route 53 hosted zone directs user or system access to two sites, each running web and app servers at full capacity at all times, with data mirroring and replication between them.]


Multi-site: Checklist
Preparation
• Similar to warm standby
• Configured for full scaling in or scaling out for production load
In case of disaster
• Immediately fail over all production load
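In a multi-site setup, DNS can fail over automatically instead of waiting for someone to edit records. One option, sketched here, is Route 53 failover routing: the primary record carries a health check, and Route 53 answers with the secondary record when that check is unhealthy. The host names and health check ID are hypothetical; in practice the health check would be created first (for example, with `route53.create_health_check`). Weighted or latency-based routing with health checks on every record is another common choice for active-active sites.

```python
def failover_routing_batch(record_name: str, primary_target: str, secondary_target: str,
                           primary_health_check_id: str, ttl: int = 60) -> dict:
    """Failover records: answer with the primary site while its health check
    passes, otherwise with the secondary site."""

    def record(role: str, target: str, health_check_id=None) -> dict:
        rrs = {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": role.lower(),
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        }
        if health_check_id is not None:
            rrs["HealthCheckId"] = health_check_id  # unhealthy check triggers failover
        return {"Action": "UPSERT", "ResourceRecordSet": rrs}

    return {"Changes": [
        record("PRIMARY", primary_target, primary_health_check_id),
        record("SECONDARY", secondary_target),
    ]}


multi_site_batch = failover_routing_batch(
    "www.example.com",
    "site-a-lb.example.net",                 # hypothetical site A endpoint
    "site-b-lb.example.net",                 # hypothetical site B endpoint
    "00000000-0000-0000-0000-000000000000",  # hypothetical health check ID
)
```

Because both sites already run at full capacity, the only failover action left is the DNS answer changing, which Route 53 makes on its own once the health check fails.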

Summary of common DR patterns
From backup and restore to multi-site, RTO decreases and cost increases.
Backup and restore
• Lower-priority use cases
• Solutions: Amazon S3, Storage Gateway
Pilot light
• Meeting lower RTO and RPO requirements
• Core services
• Scale AWS resources in response to a DR event
Warm standby
• Solutions that require RTO and RPO in minutes
• Business-critical services
Multi-site
• Automatic failover of your environment in AWS to a running duplicate
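The summary suggests a simple selection rule: pick the cheapest pattern that still meets the RTO and RPO. A sketch of that rule, assuming you supply your own estimated RTO and RPO for each pattern (for example, measured in Game Day exercises). The estimates in the example are hypothetical placeholders, not AWS figures, and `cheapest_pattern` is an illustrative name.

```python
from datetime import timedelta

# Ordered from least to most expensive, following the summary slide.
PATTERNS = ["backup and restore", "pilot light", "warm standby", "multi-site"]


def cheapest_pattern(rto_target: timedelta, rpo_target: timedelta, estimates: dict):
    """Least expensive pattern whose estimated RTO and RPO meet the targets.

    `estimates` maps each pattern to (estimated_rto, estimated_rpo) for your
    own workload. Returns None if no pattern qualifies.
    """
    for pattern in PATTERNS:
        est_rto, est_rpo = estimates[pattern]
        if est_rto <= rto_target and est_rpo <= rpo_target:
            return pattern
    return None


# Hypothetical placeholder estimates, not AWS figures.
estimates = {
    "backup and restore": (timedelta(hours=24), timedelta(hours=24)),
    "pilot light": (timedelta(hours=4), timedelta(minutes=15)),
    "warm standby": (timedelta(minutes=30), timedelta(minutes=5)),
    "multi-site": (timedelta(minutes=1), timedelta(minutes=1)),
}
print(cheapest_pattern(timedelta(hours=1), timedelta(hours=8), estimates))  # warm standby
```

With a 1-hour RTO, pilot light is ruled out by its estimated 4-hour recovery even though its RPO is fine, so the rule moves one step up the cost scale.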
DR preparation: Best practices
• Start simple
• Check for software licensing issues
• Practice Game Day exercises

Section 3 key takeaways
• Common disaster recovery patterns on AWS include backup and restore, pilot light, warm standby, and multi-site.
• Backup and restore is the most cost-effective approach. However, it has the highest RTO.
• Multi-site provides the fastest RTO. However, it costs the most because it provides a fully running, production-ready duplicate.
• AWS Storage Gateway provides three interfaces (file gateway, volume gateway, and tape gateway) for data backup and recovery between on-premises environments and the AWS Cloud.
Module 14 – Guided Lab: Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway
Guided lab: Tasks
1. Reviewing the lab architecture
2. Creating the primary and secondary S3 buckets
3. Enabling Cross-Region Replication
4. Configuring the file gateway and creating an NFS file share
5. Mounting the file share to the Linux instance and migrating the data
6. Verifying that the data is migrated

Guided lab: Final product

~45 minutes
Begin Module 14 – Guided Lab: Hybrid Storage and Data Migration with AWS Storage Gateway File Gateway

Guided lab debrief: Key takeaways

Module wrap-up

Module summary
In summary, in this module, you learned how to:

• Identify strategies for disaster planning
• Define RPO and RTO
• Describe four common patterns for backup and disaster recovery and how to
implement them
• Use AWS Storage Gateway for on-premises-to-cloud backup solutions

Complete the knowledge check

Sample exam question
Company salespeople upload their sales figures daily. A Solutions Architect needs
a durable storage solution for these documents that also protects against users
accidentally deleting important documents.
Which action will protect against unintended user actions?
A. Store data in an EBS volume and create snapshots once a week.
B. Store data in an S3 bucket and enable versioning.
C. Store data in two S3 buckets in different AWS Regions.
D. Store data on EC2 instance storage.

Thank you
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
