Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

18 November 2020

Beginners Guide to
High Availability for
Postgres
Presented by:
Gaby Schilders, Sr. Sales Engineer
Welcome – Housekeeping Items

Slides and recording will be available in next 48 hours


Please do submit your questions through the chat – answers follow during Q&A
We will be sharing information about EDB and PostgreSQL later on in the session
Agenda
1. Concepts of High Availability
2. RPO, RTO and Uptime in High Availability
3. How does High Availability work?
4. High Availability for Postgres using
• Streaming Replication
• Logical Replication
5. Postgres parameters for High Availability (Streaming
Replication)
6. EDB tools for High Availability management and
monitoring

3 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


High Availability
Concepts
High Availability (formally)
High availability (HA) is a characteristic of a system, which aims to ensure an agreed level of
operational performance, usually uptime, for a higher than normal period.

Key principles:
• Eliminate single point of failure
• Reliable crossover
• Detection of failures
Ref: https://en.wikipedia.org/wiki/High_availability

5 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


High Availability (practically)
High availability (HA) in computing is the continuous and consistently correct functioning of a
system or service as provided to (end)users

But what is this factor measured on?


• Hardware
• IT infrastructure
• Business application

6 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Scheduled/Unscheduled downtime
• Scheduled/planned downtime is a result of maintenance that is disruptive to system
operation and usually cannot be avoided with a currently installed system design.
• It include patches to system software that require a reboot or system configuration
changes that only take effect upon a reboot.
• Unscheduled/Unplanned downtime is the result of downtime events due to some physical
failures/events, such as hardware or software failure or environmental anomaly.
• For example, power outages, failed CPU or RAM components (or possibly other
hardware components failure), network failure, security breaches, or various
applications, middleware, and operating system failures result in Unplanned
outage/Unscheduled downtime.

7 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Availability calculation
Calculated/expressed as a percentage of uptime in a given year based on the service level
agreements. Some companies exclude the planned outage/scheduled downtime based on their
agreements with customers on the availability of their services.

Downtime per
Availability % Downtime per year Downtime per month week Downtime per day

99.99% ("four nines") 52.60 minutes 4.38 minutes 1.01 minutes 8.64 seconds
99.995% ("four and a half
nines") 26.30 minutes 2.19 minutes 30.24 seconds 4.32 seconds
864.00
99.999% ("five nines") 5.26 minutes 26.30 seconds 6.05 seconds milliseconds

8 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


RPO
RTO
MTTR
GRO
Recovery Point Objective (RPO)
RPO is a measurement of time from the failure, disaster or comparable loss-causing event as seen
from the system being measured over.

RPO is described as:


• The period of time from the data-loss event back to the last point where data is in a usable
format
• How frequently you need to back-up your data (RPO doesn’t represent additional factors like
restore time and recovery time as we’ll discuss later on)
• How much data is lost following a disaster or loss-causing event (in volume, for instance)

10 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Recovery Time Objective (RTO)
The amount of time an application can be down and not result in significant damage to a business
and the time that it takes for the system to go from loss to recovery
Recovery process includes
• The steps that IT must take to return the application
• And its data to its pre-disaster state.

11 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


RPO vs. RTO
RPOs and RTOs are key concepts for maintaining business continuity and function as business
metrics for calculating how often your business needs to perform data backups.

• RTOs coincide with recovery point objectives (RPOs), a measurement of time from the
failure, disaster or similar loss-causing event.
• RPOs calculate back in time to when your data was last usable, probably the most recent
backup.

12 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Mean Time To Recover (MTTR)
The average time that a device will take to recover from any failure. systems which have to be
repaired or replaced.
• Examples of such devices range from self-resetting fuses (where the MTTR would be very
short, probably seconds), up to whole systems which have to be repaired or replaced.
• Usually part of a maintenance contract, where the user would pay more for a system MTTR
of which was 24 hours, than for one of, say, 7 days
• Does not mean the supplier is guaranteeing to have the system up and running again within
24 hours (or 7 days) of being notified of the failure.

13 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Geography Recovery Objectives (GRO)
If datacenter becomes unavailable, how long it takes for the service to become available again.

• It covers RPO/RTO for making services available across the geography.

14 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


High Availability
For Postgres
Eliminate Single
Point of Failure
Eliminate Single Point of failure

• WAL shipping based replication


• Replication based on the archived WAL
• Streaming replication (SR)
• Streaming WAL files to one or more standbys
• Logical replication
• Streaming logical data modifications from the WAL.

17 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Eliminate Single Point of failure
Hot Standby

• Identical to primary system


• Data is still mirrored in real time
• Allows READ
• On failure, can replace primary
• Approaches
• WAL shipping based
• Streaming WAL (widely used after 9.0)

18 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Eliminate Single Point of failure
Hot Standby: WAL shipping

19 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Eliminate Single Point of failure
Monitor: WAL shipping
• Functions on standby
• pg_is_in_recovery()
• pg_last_xlog/wal_replay_location/lsn()
• pg_last_xact_replay_timestamp()

20 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Eliminate Single Point of failure
Hot Standby: Streaming Replication

21 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Eliminate Single Point of failure
Streaming Replication
• Asynchronous Streaming Replication
• Synchronous Streaming Replication
• synchronous_standby_names
E.g.
• FIRST 1 (standby_east, standby_west)
• ANY 3 (standby_east, standby_west, eu_standby_east, eu_standby_west)
• 'standby_east, standby_west'
• synchronous_commit - off/local/remote_write/on/remote_apply

22 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Eliminate Single Point of failure
Monitor: Streaming Replication
• Views
• Primary: pg_stat_replication
• Standby: pg_wal_receiver

23 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Reliable
CrossOver &
Detection
Reliable Crossover & Detection

• In a redundant system, the crossover point itself becomes a single point of failure.
• Fault-tolerant systems must provide a reliable crossover or automatic switchover
mechanism to avoid failure.
• Detection of failures:
• If the above two principles are proactively monitored, then a user may never see a
system failure.

25 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Reliable Crossover & Detection
EDB Postgres Failover Manager:
• Continuously monitors your PostgreSQL service to automatically detect failures.
• After an outage is confirmed, Failover Manager automatically promotes the most up-to-date
standby database as the new master.

26 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Reliable Crossover & Detection
EDB Postgres Failover Manager:

27 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


RPO
RTO
MTTR
GPO
RPO/RTO/MTTR/GPO
Backup And Recovery

29 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


RPO/RTO/MTTR/GPO
Backup And Recovery Tool

30 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


High
Availability
Monitoring
High Availability Monitoring
Postgres Enterprise Manager

32 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


High Availability Monitoring
Postgres Enterprise Manager

33 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Maintenance
Window/
Planned
Downtime
Maintenance Window/Planned Downtime
Software Updates/Patching
• Three reasons for software updates
• Remedy known software issues
• General stability and reliability of the software
• Security problem

35 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Maintenance Window/Planned Downtime
Software Updates: Strategies
• Three strategies
• All Nodes Patching
• Rolling Patching
• Minimum Downtime Patching

36 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Conclusion

• High Availability components


• Hot Standby (Streaming Replication)
• EDB Postgres Failover Manager
• Postgres Enterprise Manager
• Backup And Recovery Tool
• Design consideration
• Near zero downtime software maintenance
• RPO/RTO/GRO

37 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Resources

• Blog series
• What Does High Availability Really Mean
• Patching Minor Version in Postgres High Availability (HA) Database Cluster
• Plans & Strategies for DBAs
• Key Parameters and Configuration for Streaming Replication in Postgres 12
• Quick and Reliable Failure Detection with EDB Postgres Failover Manager

38 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Expertise
2019 We’re database fanatics who care
Challengers Leaders deeply about PostgreSQL

• Enterprise PostgreSQL innovations


• 4,000+ global customers
• Recognized by Gartner Magic Quadrant for 7 years in a row
Ability to execute

• One of the only sub-$1bn revenue companies


Niche Players Visionaries
• PostgreSQL community leadership
Completeness of vision

1986 1996 2004 2020


The Design Birth of EDB Heap Only Materialized Parallel JIT Serializable Today
Tuples (HOT) Views Query Compilation Parallel Query
of PostgreSQL PostgreSQL is founded

39 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Market Success

40 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


EDB Open Source Leadership
Named EDB open source committers and contributors
Core team Major contributors Contributors

Bruce Momjian Amit Langote Devrim Gündüz Akshay Joshi Amul Sul Ashesh Vashi Ashutosh Sharma Jeevan Chalke

Dave Page Robert Haas Dilip Kumar Jeevan Ladhe Mithun Cy Rushabh Lathia Amit Khandekar

Designates PostgreSQL committers

41 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.


Learn More

Next Webinar: November 25


An overview of reference architectures for
Postgres by Michael Willer

Other resources

Postgres Pulse EDB Youtube

Thank You
Channel

42 © Copyright EnterpriseDB Corporation, 2020. All rights reserved.

You might also like