CFGM & Critical INCM - Interlink


Configuration Management & Critical Incident Management interlink

Agenda
Introduction
Critical Incident Management Process in a Nutshell
KPEs, RtOP & EON
The "Critical" Information Flow & Key CI Attributes
Bridging the Gap
Q&A

Introduction
INCM has high visibility in an organization
Critical to Business Continuity and the Emergency Ops Plan
Service availability is the key
The underlying CMDB is a crucial factor
Enables communication and decision-making capabilities

Critical Incident Management Process in a Nutshell

What is an Incident? Definition from ITIL V3:

An unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident, for example, failure of one disk from a mirror set.

What is a Critical Incident?

An Incident causing a complete interruption or extreme degradation of the service delivered to a client's KPE, impacting the environment or business operation.

Critical Incident Management Process Overview

Strategic Incident Management Process (flowchart; steps listed by phase, decision points end in "?")

1.0 SIM Process Initiation
1.1 Svc Call / OVO Alert; SDM / Customer call
1.2 SIM Incident? (YES / NO)
1.3 Standard Incident Management Process
1.4 Service is restored? (YES / NO)
1.5 Incident Closure
1.6 SIM needed? (YES / NO)
1.7 SIM Process Initiation

2.0 SIM Process Incident Handling
2.1 Initial SIM Communication
2.2 KPE affected? (YES / NO)
2.3 SRT / War room Establishment
2.4 Action Plan Creation & Execution
2.5 SIM Update Communication (SMS / E-mail / Exec summary)
2.6 Service Restored? (YES / NO)
2.7 Escalation Process
2.8 Final SIM Communication (SMS / E-mail)
2.9 EMEA RtOP Crisis Process

3.0 SIM Incident Report Activities
3.1 Is SIM IR needed? (YES / NO)
3.2 PRM Handover Needed? (YES / NO)
3.3 SIM IR Document Draft
3.4 SIM IR Review Meeting
3.5 SIM IR Document Distribution
3.6 Problem Management
3.7 Incident Closure
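
The initiation phase (steps 1.1 to 1.7) can be read as a small decision routine. The following is a minimal Python sketch only: the function name `next_step` and its boolean inputs are hypothetical, and every branch target is an assumption inferred from the step numbering, since the arrows of the original flowchart are not preserved in this text.

```python
def next_step(is_sim_incident: bool,
              service_restored: bool = False,
              sim_needed: bool = False) -> str:
    """Return the step that follows triage in phase 1.0.

    All branch targets below are assumptions, not confirmed by the slide.
    """
    if is_sim_incident:        # 1.2 SIM Incident? YES (assumed target: 1.7)
        return "1.7 SIM Process Initiation"
    # 1.2 NO: assumed to continue in 1.3 Standard Incident Management Process
    if service_restored:       # 1.4 Service is restored? YES (assumed: 1.5)
        return "1.5 Incident Closure"
    if sim_needed:             # 1.6 SIM needed? YES (assumed: 1.7)
        return "1.7 SIM Process Initiation"
    # 1.6 NO: assumed to stay in the standard process
    return "1.3 Standard Incident Management Process"
```

For example, `next_step(False, service_restored=False, sim_needed=True)` would hand the incident over to the SIM process under these assumptions.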

Critical Incident Management Time Line


00:00 Ticket Creation
00:05 Inform SIM
00:10 SIM calls DL
00:15 ADM & Tech Teams Informed
00:30 Business Impact Confirmed and escalate to L2
00:45 Confirm path to resolution or start SIM Process
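
The timeline lends itself to a simple overdue check. The sketch below is illustrative only: the milestone offsets come from the slide, while the names `MILESTONES` and `overdue_milestones` and the idea of tracking completed milestones as a set are assumptions.

```python
from datetime import timedelta

# Minutes after ticket creation, as listed on the timeline slide.
MILESTONES = [
    (0, "Ticket Creation"),
    (5, "Inform SIM"),
    (10, "SIM calls DL"),
    (15, "ADM & Tech Teams Informed"),
    (30, "Business Impact Confirmed and escalate to L2"),
    (45, "Confirm path to resolution or start SIM Process"),
]


def overdue_milestones(elapsed: timedelta, completed: set[str]) -> list[str]:
    """List milestones whose target time has passed but are not completed."""
    minutes = elapsed.total_seconds() / 60
    return [name for due, name in MILESTONES
            if due <= minutes and name not in completed]
```

Twelve minutes into an incident with only the first two milestones done, `overdue_milestones(timedelta(minutes=12), {"Ticket Creation", "Inform SIM"})` would flag "SIM calls DL".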

The Key Success Factors


Setting up the Service Restoration Team with minimal delay decides the time frame of service restoration.
Getting the required information is a deciding factor: org details, CI details and relationships, the technical escalation matrix and the current impact.
Know-how on the affected KPE enables SIM to trigger the RtOP and manage the incident efficiently end to end.

KPEs, RtOP & EON

What is a KPE?
A Key Production Environment (KPE) is a service, physically represented or supported by one or more IT components, whose loss or impairment will seriously impact the business of one or more (external or internal) customers and/or their customers. It is also referred to as a Vital Business Function (VBF): a function of a business process that is critical to the success of the business.

An outage or serious reduction of its functionality will result in a Priority 1 Incident.
Documentation is to be stored within CIS (Contract Information System).
Linkage should be made within ESL (Enterprise System List).

KPEs & Supporting CI Layer

What is an RtOP?
RtOP stands for Response to Operational Problems. The RtOP procedure supports the Incident Management process for outages that have a significant business impact on the client.

Purpose :
The Response to Operational Problems (RtOP) procedure was developed to provide a solution to ensure timely communication of all HP P1 incidents to HP Enterprise Services leadership. RtOP is a corporate standard as referred to in the SRA (Standard Reference Architecture). The RtOP procedure is required for all Priority 1 incidents where a Mission Critical Environment (referred to as a Key Production Environment) is impacted or at risk. This procedure is the notification to HP Enterprise Services leadership.

Scope:
An RtOP is invoked when an incident causes a complete interruption of service delivery to the affected customer service entity, key production environment(s) or business operation. Those affected cannot utilize one or more predefined key services until service delivery is restored, and there is no immediate workaround. Note: this is normally when the client's IT Director, CIO or CEO has been made aware of the issue because of its criticality to the business, so a client escalation to HP Management is possible.
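
The purpose and scope statements can be condensed into a single predicate. This is a hedged sketch, not the official procedure: the `Incident` class, its field names and `rtop_required` are hypothetical, and combining the "Priority 1 with KPE impacted or at risk" rule with the "no immediate workaround" condition is an interpretation of the two slides.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    priority: int                  # 1 means Priority 1
    kpe_impacted_or_at_risk: bool  # a Key Production Environment is involved
    workaround_available: bool     # an immediate workaround exists


def rtop_required(incident: Incident) -> bool:
    """True for a P1 with a KPE impacted or at risk and no workaround."""
    return (incident.priority == 1
            and incident.kpe_impacted_or_at_risk
            and not incident.workaround_available)
```

A P1 that does not touch a KPE returns `False` here, which matches the statement that not every P1 qualifies as an RtOP.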

RtOP Types :
RtOP: Critical outage, HP responsibility or 3rd-party vendor (HP owns the support contract)
VRtOP: Critical outage, client responsibility or client 3rd-party vendor (client owns the support contract)
IRtOP: Risk of critical outage, contractual P1 not meeting the RtOP P1 definition, or non-operational issue
-RtOP: Critical outage affecting multiple clients (shared services), Long Running Dissatisfaction Outage, Brand Image jeopardized, Client
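
A rough classification of the first three types might look like the sketch below. The function `rtop_type` and its two boolean inputs are hypothetical simplifications; the fourth type is left out on purpose because its name and description are incomplete in this text.

```python
def rtop_type(critical_outage: bool, hp_owns_support_contract: bool) -> str:
    """Pick one of the three fully named RtOP types (simplified assumption)."""
    if not critical_outage:
        # Risk of outage, contractual P1 outside the RtOP P1 definition,
        # or a non-operational issue.
        return "IRtOP"
    # Critical outage: the owner of the support contract decides the type.
    return "RtOP" if hp_owns_support_contract else "VRtOP"
```

In practice the decision would rest with the crisis manager; the sketch only mirrors the responsibility split stated on the slide.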

RtOP Vs Critical Incident Management


Critical Incident Management is the superset and RtOP is its subset
Not all account P1s classify as RtOPs
RtOP communication involves an executive audience
The RtOP process is triggered for KPE outages only

RtOP Procedure Flow

The "Critical" Information Flow & Key CI Attributes

CMDB Interface in Sev 1 Process Flow


Critical Incident Phases

Incident Detection: Events/End User; Issue Description; Initial Priority; Services Affected; Incident Number; Site & Locations affected; Customer Contact
Recovery, Resolution & Closure / IR Notification & Communication: Resilient and Data Recovery; Special Handling Instructions; H/W-S/W contracts; Downtime Contacts; Technical Escalation; Resolution Confirmation; Problem Management Triggered; PIR Initiation
Classification and Prioritization: Sev 1 Criteria; Impact Analysis; Coverage; Initial Assignment Group; Priority Justification; Capabilities Involved
Investigation and Diagnosis: Services Hosted; Environment; Hands & Eyes; KPE check and Linked CIs; CI Location; Recent Changes; Resilient Technology; CI Usage Description; Contractual Information

Key users of the CMDB
Service Desk ASTs Accounts Community APMs Global Crisis Managers Client Capability leads

Critical Incident Phases CFGM Interlink

Bridging the GAP

The Missing Links
Incomplete KPE linkages
Downtime contacts & system usage
Void business criticality fields
DRP solutions not available for business-critical systems
Obsolete & MTP systems linked to KPEs
Critical changes not captured in CMDB
Hardware / software contact
Hands & Eyes information and DC location details for DC / onsite access
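
Several of these gaps can be detected mechanically on a CI record. The sketch below is illustrative and not tied to any particular CMDB product: the `ConfigItem` class, its field names and the `"obsolete"` status value are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigItem:
    hostname: str
    kpe_names: list[str] = field(default_factory=list)
    business_criticality: str = ""
    downtime_contacts: list[str] = field(default_factory=list)
    lifecycle_status: str = "production"  # hypothetical value set


def audit_ci(ci: ConfigItem) -> list[str]:
    """Return the gaps found on one CI, in a fixed order."""
    gaps = []
    if not ci.business_criticality:
        gaps.append("business criticality is empty")
    if not ci.downtime_contacts:
        gaps.append("no downtime contacts")
    if ci.kpe_names and ci.lifecycle_status == "obsolete":
        gaps.append("obsolete system linked to a KPE")
    return gaps
```

A fully populated production CI yields an empty list, so the output can feed a periodic exception report.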

This Will Help
Periodic KPE audits with regard to KPE/hostname linkages
Accommodate attributes essential to Incident Management in CMDB audits
Talk with Change Management about recent critical changes to be updated in the CMDB
Talk with Incident Management about RtOPs and check for any missing KPE linkages or invalid KPEs
Feedback to AE/ADEs on invalid KPEs found during the KPE audit
Feedback to Availability Management on failed KPE resilience
Interface with Problem Management to resolve the CI discrepancies identified
Lastly, a CI relationship diagram will indeed help
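
The cross-check between RtOPs and KPE linkages could be sketched as follows. Both input shapes are assumptions made for illustration: `rtop_hosts` maps each hostname named in an RtOP to the KPE reported as affected, and `cmdb_links` maps hostnames to the KPEs linked to them in the CMDB.

```python
def missing_kpe_linkages(rtop_hosts: dict[str, str],
                         cmdb_links: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (hostname, KPE) pairs seen in RtOPs but not linked in the CMDB."""
    return sorted(
        (host, kpe)
        for host, kpe in rtop_hosts.items()
        if kpe not in cmdb_links.get(host, set())
    )
```

Each returned pair is a candidate for either a new KPE linkage or feedback that the KPE named in the RtOP was invalid.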

Q&A

Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Thank you
