ITI Lec 06


IT Infrastructure
Lecture # 6

Dr. Muhammad Aamir Khan
Assistant Professor
Department of Informatics and Systems
School of Systems and Technology (SST)
University of Management and Technology

Last Lecture
• Non-functional Attributes
  – Availability
  – Performance
  – Security
• Availability Concepts
• Calculating Availability
• MTBF and MTTR
• Calculating Availability: Examples

SST UMT Lahore IT Infrastructure ‐ Lecture 6 1 SST UMT Lahore IT Infrastructure ‐ Lecture 6 2

Outline
• Failover
• Fallback
  – Hot Site
  – Cold Site
  – Warm Site
• Business Continuity
• RTO and RPO

Announcement # 1
• There will be a quiz on Monday (25th March 2024). Prepare Lectures 1, 2, 3, and 4. If possible, bring A4-size answer sheets.


Availability Patterns

Single Point of Failure (SPOF)
• A single point of failure (SPOF) is a component in the infrastructure that, if it fails, causes downtime of the entire system. SPOFs should be avoided in IT infrastructures, as they pose a large risk to the availability of a system.
• For example, in most storage systems, the failure of one disk does not affect the availability of the storage system. Technologies like RAID (Redundant Array of Independent Disks) can be used to handle the failure of a single disk, eliminating disks as a SPOF.
• Server clusters, double network connections, and dual datacenters are all meant to eliminate SPOFs. The trick is to find the SPOFs that are less obvious.
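One way to hunt for less obvious SPOFs is to model the infrastructure as a graph of components and test, one component at a time, whether its failure alone cuts users off from what they need. The Python sketch below does this with a breadth-first search. The topology and component names are hypothetical, and it assumes the service is available as long as any path of working components connects the two endpoints.

```python
from collections import deque

Graph = dict[str, set[str]]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Turn (a, b) connections into an undirected adjacency map."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def still_connected(graph: Graph, start: str, goal: str, failed: str) -> bool:
    """Breadth-first search from start to goal, skipping the failed component."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in graph[node]:
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def find_spofs(graph: Graph, start: str, goal: str) -> list[str]:
    """Components (other than the two endpoints) whose failure alone breaks every path."""
    candidates = sorted(set(graph) - {start, goal})
    return [c for c in candidates if not still_connected(graph, start, goal, c)]


# Hypothetical topology: a two-node cluster behind one switch, sharing one storage system.
edges = [
    ("users", "switch"),
    ("switch", "node_a"), ("switch", "node_b"),
    ("node_a", "storage"), ("node_b", "storage"),
    ("storage", "database"),
]
print(find_spofs(build_graph(edges), "users", "database"))  # ['storage', 'switch']
```

The two cluster nodes cover for each other, so neither is reported; the single switch and the shared storage system are. This brute-force check only covers components that are modeled as nodes, so shared dependencies such as the building or the electricity provider stay invisible unless they are added to the graph.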


School of Systems and Technology


IT Infrastructure ‐ Lecture 6 1
Single Point of Failure (SPOF)
• While it sounds easy to eliminate single points of failure, in practice it is not always feasible or cost-effective.
• Take, for instance, the internet connection your organization uses to send e-mail. Do you have multiple internet connections from your e-mail server? Are these connections running over separate cables in the building? What about outside the building? Do you use multiple internet providers? Do they share their backbones?
• While eliminating SPOFs is very important, it is good to realize that there is always something shared in an infrastructure (like the building, the electricity provider, the metropolitan area, or the country). We just need to know what is shared and whether the risk of sharing is acceptable.
• To eliminate SPOFs, a combination of redundancy, failover, and fallback can be used.

Redundancy
• Redundancy is the duplication of critical components within a single system to avoid a single point of failure (SPOF)
• Examples:
  – A single component having two power supplies; if one fails, the other takes over
  – Dual networking interfaces
  – Redundant cabling
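Redundancy pays off because a redundant group fails only when all of its members fail, while a chain of required components fails when any one of them fails. Building on the availability calculations from the last lecture, the Python sketch below combines both rules. It assumes failures are independent, and the figures (two 99% internet links that both depend on one 99.9% provider backbone, echoing the e-mail example above) are hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components are required: the chain is up only if every component is up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant components: the group is down only if every component is down."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical figures: two redundant links at 99% each, sharing one backbone at 99.9%.
links = parallel(0.99, 0.99)        # about 0.9999
end_to_end = series(links, 0.999)   # about 0.9989
print(f"redundant links: {links:.4%}, with shared backbone: {end_to_end:.4%}")
```

Doubling the link raises that part of the path to about 99.99%, but the shared backbone pulls the end-to-end figure down to about 99.89%, below the backbone's own 99.9%. This is why it matters to know what is shared and whether that risk is acceptable.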


Failover
• Failover is the (semi-)automatic switch-over to a standby system or component
• Examples:
  – Windows Server failover clustering
  – VMware High Availability
  – Oracle Real Application Clusters (RAC) database
    • Oracle Real Application Clusters (RAC) allow customers to run a single Oracle Database across multiple servers in order to maximize availability and enable horizontal scalability, while accessing shared storage.

Fallback
• Fallback is the manual switchover to an identical standby computer system in a different location
• Typically used for disaster recovery
• Three basic forms of fallback solutions:
  – Hot site
  – Cold site
  – Warm site
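The key difference between failover and fallback is who makes the switch. The Python sketch below models the automatic part of failover with a hypothetical heartbeat monitor: if the active node misses a set number of consecutive heartbeats and the standby is healthy, the standby is promoted. This is not how Windows Server failover clustering, VMware High Availability, or Oracle RAC are implemented; the class names and the `max_missed` parameter are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True


class FailoverMonitor:
    """Hypothetical monitor: promotes the standby after `max_missed` missed heartbeats."""

    def __init__(self, primary: Node, standby: Node, max_missed: int = 3) -> None:
        self.active = primary
        self.standby = standby
        self.max_missed = max_missed
        self.missed = 0

    def heartbeat(self) -> str:
        """Check the active node once and return the name of whichever node is active."""
        if self.active.healthy:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed and self.standby.healthy:
                # Automatic switch-over: the standby becomes the active node.
                self.active, self.standby = self.standby, self.active
                self.missed = 0
        return self.active.name


primary, standby = Node("server-a"), Node("server-b")
monitor = FailoverMonitor(primary, standby, max_missed=3)
primary.healthy = False  # simulate a crash of the primary
print([monitor.heartbeat() for _ in range(4)])
# ['server-a', 'server-a', 'server-b', 'server-b']
```

Waiting for several missed heartbeats is one way to avoid switching over on a single lost message. With fallback, no such monitor promotes the standby site; the switch to the other location is made manually.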


Fallback - Hot Site
• A hot site is a fully configured fallback datacenter:
  – Fully equipped with power and cooling
  – Applications are installed on the servers
  – Data is kept up to date to fully mirror the production system
• A hot site requires constant maintenance of the hardware, software, data, and applications to ensure the site accurately mirrors the state of the production site at all times

Fallback - Cold Site
• A cold site is ready for equipment to be brought in during an emergency, but no computer hardware is available at the site
• Applications will need to be installed and current data fully restored from backups
• If an organization has very little budget for a fallback site, a cold site may be better than nothing

Fallback - Warm Site
• A computer facility readily available with power, cooling, and computers, but the applications may not be installed or configured
• A mix between a hot site and a cold site
• Applications and data must be restored from backup media and tested
  – This typically takes a day

Business Continuity: High Availability


Business Continuity
• An IT disaster is defined as an irreparable problem in a datacenter, making the datacenter unusable
• Natural disasters:
  – Floods
  – Hurricanes
  – Tornadoes
  – Earthquakes
• Manmade disasters:
  – Hazardous material spills
  – Infrastructure failure
  – Bio-terrorism

Business Continuity
• In case of a disaster, the infrastructure could become unavailable, in some cases for an extended period of time
• Business continuity is about identifying the threats an organization faces and providing an effective response
• Business Continuity Management (BCM) and Disaster Recovery Planning (DRP) are processes to handle the effects of disasters


Business Continuity Management
• Business Continuity Management (BCM) includes:
  – IT
  – Managing business processes
  – Availability of people and workplaces in disaster situations
• It also includes disaster recovery, business recovery, crisis management, incident management, emergency management, product recall, and contingency planning.
• A Business Continuity Plan (BCP) describes the measures to be taken when a critical incident occurs in order to continue running critical operations and to halt non-critical processes. The BS 25999 standard describes guidelines on how to implement BCM.

Disaster Recovery Planning
• Disaster recovery planning (DRP) contains a set of measures to take in case of a disaster, when (parts of) the IT infrastructure must be accommodated in an alternative location.
• An IT disaster is defined as an irreparable problem in a datacenter, making the datacenter unusable. In general, disasters can be classified into two broad categories:
  – The first is natural disasters, such as floods, hurricanes, tornadoes, or earthquakes.
  – The second is manmade disasters, including hazardous material spills, infrastructure failure, or bio-terrorism.
• The IT disaster recovery standard BS 25777 can be used to implement DRP. DRP assesses the risk of failing IT systems and provides solutions. A typical DRP solution is the use of fallback facilities and having a Computer Emergency Response Team (CERT) in place. A CERT is usually a team of systems managers and senior management that decides how to handle a crisis once it becomes reality.

RTO and RPO
• RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are objectives that apply in case of a disaster
• Recovery Time Objective (RTO)
  – The maximum duration of time within which a business process must be restored after a disaster in order to avoid unacceptable consequences (like bankruptcy).
  – RTO applies only in case of a disaster; it is not the acceptable downtime under normal circumstances. Measures like failover and fallback must be taken in order to fulfill the RTO requirements.

RTO and RPO
• Recovery Point Objective (RPO)
  – The RPO is the point in time to which data must be recovered, considering some "acceptable loss" in a disaster situation. It describes the amount of data loss a business is willing to accept in case of a disaster, measured in time. For instance, when a backup of all data is made each day and a disaster destroys all data, up to 24 hours of data can be lost: everything written between the last backup and the occurrence of the disaster. Such a backup regime therefore cannot meet an RPO shorter than 24 hours. To lower the RPO, a more frequent backup regime could be implemented.
• NOTE: RTO and RPO are individual objectives
  – They are independent of each other: meeting one does not imply meeting the other
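Because RTO and RPO are separate objectives, a recovery setup has to be checked against each of them on its own. The Python sketch below does that for a hypothetical plan described by two numbers: the interval between backups (which, as in the daily-backup example, bounds the worst-case data loss) and the estimated time to restore the business process. It assumes every backup completes instantly and restores successfully; `RecoveryPlan`, `worst_case_data_loss`, and `check` are illustrative names, not part of any standard.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryPlan:
    """Hypothetical summary of how one business process would be recovered."""
    backup_interval: timedelta    # time between consecutive backups
    estimated_restore: timedelta  # time to get the process running again after a disaster


def worst_case_data_loss(plan: RecoveryPlan) -> timedelta:
    # A disaster just before the next backup loses everything since the last backup.
    return plan.backup_interval


def check(plan: RecoveryPlan, rto: timedelta, rpo: timedelta) -> dict[str, bool]:
    """Evaluate the two objectives separately; meeting one says nothing about the other."""
    return {
        "meets_rto": plan.estimated_restore <= rto,
        "meets_rpo": worst_case_data_loss(plan) <= rpo,
    }


daily_backup = RecoveryPlan(backup_interval=timedelta(hours=24), estimated_restore=timedelta(hours=4))
print(check(daily_backup, rto=timedelta(hours=8), rpo=timedelta(hours=1)))
# {'meets_rto': True, 'meets_rpo': False}
```

In this example, daily backups restored in four hours satisfy an 8-hour RTO but not a 1-hour RPO; only a more frequent backup regime would fix the latter.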


Next Lecture
• In the next lecture(s), we will discuss IT infrastructure non-functional attributes further, in particular performance concepts.
• NOTE: There will be a quiz on Monday (25th March 2024). Prepare Lectures 1, 2, 3, and 4. If possible, bring A4-size answer sheets.

Questions?
