Incident Management (ITIL Standard)

Incident Management
Description/Summary
Incident Management provides the process, tools and concept for the fast recovery of service
quality in a defined service. It deals with service issues, and with all other service and user
requests recorded by a service desk. It also monitors the completion of requests by the service
desk or by all other service units. Finally, Incident Management has the task of informing the
service requester on the status of a service request.
Objectives
The purpose of Incident Management is to recover normal (i.e. agreed) service operation,
as quickly as possible, after an incident has been detected/recorded.
Incident Management contributes to an integrated Service Management approach by achieving

the following goals:
 Every incident and all required data is recorded.

 Every incident runs through a set of standardized activities and procedures, in order to
ensure effective and efficient processing.
 Every incident is categorized and prioritized regarding its (potential) impact and urgency,
in order to schedule its resolution in a business-oriented way.
 Functional and hierarchical escalation procedures are in place in order to ensure that each
incident is investigated by qualified members of staff, either by internal or external
experts.
 Well-defined functional escalation levels ensures that all incidents are handled in a cost-
efficient way and that experts are relieved from non-expert diagnosis and resolution
activities.
Roles and Functions

Incident Management Specific Roles
Static Process Roles
 Process Owner: Initiator of the process, responsible for defining its strategic goals and
allocating all required resources
 Process Manager (Incident Manager): Manager of the entire process, responsible for its
effectiveness and efficiency
Dynamic Process Roles

 Incident Agent: see IT Service Management Principles Ticket, Ticket Owner and Ticket
Agent for details
 Incident Owner: see IT Service Management Principles Ticket, Ticket Owner and Ticket
Agent for details
Service Specific Roles
 First Level Support – in general this is equivalent to the function of the Service Desk: A
person working in the Service Desk function; acts as a Ticket Agent and Ticket Owner
 Expert / Service Specialist
o Second Level Support member of staff: A person working in an internal IT
department and providing expert qualities in one or more specific areas; acts as a
ticket agent (see below)
o Third/n-th Level Support member of staff: A person working in an internal IT
department, or for an external supplier, and providing specialist qualities in one or
more specific areas; acts as a ticket agent (see below)
Customer Specific Roles
 User: Person who is affected by the Incident

 Caller: The person who reports the Incident. This may be the same as the „User“
 Service Desk
Information Artifacts
This section outlines information/data required or recorded by the process. In general, a process
record (here: the incident record) contains all information needed to execute this process and also
represents the current progress of a process. Additional information items (artifacts) typically can
be realized by considering the information from one or more (up to all) process records. This can
be done by either filtering, merging, correlating or interpreting information from these records;
sometimes this can also be done in the context of information and data from other sources.
The Incident Record
The Incident Record holds any management-relevant information for a specific incident. On
creation, it is based on (filled with) the information provided by the user or system/tool reporting
the new incident.
The following attributes must be considered for an Incident Record:
 Unique Identifier
 Incident Owner
 Incident Agent
 Caller
 User
 Customer
 Status of the Incident (Record)
 Description of the Incident Symptoms
 Service Level Agreement (SLA defined urgency)
 Current Assessed Impact of the Incident
 Priority (according to urgency and impact)
 Services Affected by the Incident
 Related Configuration Item
 General check list result
 Specific check list result
 Investigation results
 Problem(s) or Error(s) Related to the Incident
 Applicable Workaround(s) (link to workaround database)
 Request(s) for change(s) triggered
 Resolution/Recovery describtion
 Resolution date and time
 Testing result
 Result of Service Recovery Confirmation Request
 Closure date and time
 Additional Remarks
Key Concepts
Major Incident Handling
The incident of priority „1“ is a Major Incident. In order to accelerate process execution in such a
case, the following special procedures for the handling of Major Incidents are proposed. The
Service Owner takes over the Ownership of the Ticket from the Service Desk. He is now
accountable for the Incident. See Ticket, Ticket Owner and Ticket AgentTicket Owner for
detailed information.
Priority
Priority is a control parameter in the process and supports the Incident Management Staff in the
efficient management of resources (staff, capacity, time etc.). Services with higher service levels
will have a higher priority („1“ is the highest priority). Priority is defined by combining impact
and urgency:
 Impact – this describes the situation of an incident and is defined by the following
factors:
o Number of affected users
o Percentage of affected users
 Urgency – this defines the priority of incidents when the impact of those incidents are the
same.
Information Duty
First level support / line support has the duty to inform the user about the status of an incident
especially, when the incident cannot be fixed immediately. The user should know about the
status of an incident, between the status „work on incident started“ and „work on incident
stopped“. If user and caller are different persons, both persons should be informed on status.
Regular status information helps to avoid escalations because the user is informed and know that
his request is taken care of. A information can be triggered by event and by time schedules.
Hierarchical Escalation
Incidents can be escalated hierarchically when the process can not be fulfilled in „regular“ way.
This case occurs when:
 available resources are not sufficient

 agreements in ULA/ UC contracts can not be fulfilled
 service level can not be fulfilled
Escalation can be performed at each point of an incident’s life cycle. See Hierarchical

Escalation for detailed information
Incident Controls

status description
new A new incident is identified or reported.
A new incident is registered if all the necessary Information has been stored in
registered
the Incident Record.
service
The affected service is identified.
selected
All important control factors have been determined, e.g.priority of the affected
classified service respectively the Service Level Agreement. Also the type of procedure is
defined.
forwarded The record has been forwarded.
If an incident could not be resolved by the 1st Level, it has to be
assigned
forwarded/functionally escalated to the expert team.
solution
An incident where the solution is found.
found
recovered An incident, which is resolved (with or without Change Management).
An incident, with formerly status recovered, which has not successfully passed
failed
the required tests or the user has the solution rejected is failed.
solved An incident has passed the required test.
The user has confirmed the solution of an incident and that the request has been
confirmed
handled successfully.
aborted An incident which did not completely pass the lifecycle.
end The final status of a completed incident.
Process
High Level Process Flow Chart
Critical Success Factors
 Incident overload and backlog

 Reliable data from the Configuration Management
 Extensive “Knowledge Database” which includes Problem Descriptions, Known Errors,
Resolutions and Workarounds
 Automated Ticket Tracking Systems
 Lack of Service Level Agreements
 Users and IT staff bypassing Incident Management process
Performance Indicators (KPI)
 Total Number of Incidents

 Average Resolution Time
 Percentage of Incidents resolved by first-line support (without routing)
 Number(or percentage) of Incidents with initial incorrect classification
 Average Support Costs per Incident
 Number(or percentage) of Incidents Routed Incorrectly
These can be monitored using the following measures: per priority, per service, per user, per
customer, per location, etc
Process Trigger
Event Triggers
 Any reported incident triggers the incident management process, which may be reported
using one of the following methods:
o User calls, mails, web form
o Event management monitoring tools
o Technical staff calls or mails
Time Triggers
 None (Incident Management is typically event-triggered)
Process Specific Rules
 for each reported Incident a new Incident Record with an unique identifier is created.
 a change of the Incident Owner/Agent is only allowed, if the new Incident Owner/Agent
agrees.
 preferably, a person rather than a group should be the Incident Owner/Agent.
 inform user/caller on the status of ticket handling after each activity
Process Activities
Incident Recording
All upcoming incidents are be logged; a ticket in a trouble ticket system is created for each
incident. In order to facilitate incident ownership, tracking and escalation, a ticket owner is
defined according to the roles indicated above.
Activity Specific Rules:
 Default Incident Owner is set to Service Desk

 in case an Incident was reported by a caller; the member of Service Desk recording the
incident is, from this point in time on, the Incident Owner.
 set Incident Agent = Incident Owner
 set User and Caller
 set Customer
 „Default Incident Priority“ is set to 1
 „Default Incident Status“ is set to new
 document the Incident symptoms in the Incident Describtion
 if the user/customer is valid and is allowed to trigger an incident for this service, then go
to control activity „recorded – accepted“
otherwise go to „recorded – rejected“
 set Configuration Item to „-„
Incident Classification
This classification aims at recording the class/type of the incident and provides essential
information for subsequent prioritization. This activity is crucial for the success of the following
sub processes
In a first step, the affected service and the adequate SLAs are classified (set). Then the priority
can be defined based on SLAs for a certain service. Other control factors will have to be set as
well.
 Service is set to the affected Service or Sub Service

 Service Level Agreement is set to the relevant SLA for the selected user (customer) and
service
 Impact is set by a consideration of the Incident Symptoms Description
 Priority is set according to urgency and impact, as described in the Service Level
Agreement
 Other control factors are set
 Go to control activity „classified“
Incident Initial Support

This activity checks if an incident has already been reported or if a solution exists for the
incident (known incidents).
Classified incidents are checked according to the following general checklist:
 are there any open incidents of same service reported?

 are there any closed incidents of same service documented?
 are there any open problems of same service reported?
 are any workarounds known?
 are there any recently implemented changes addressing same service?
If one of those questions can be answered with „YES“, this may decrease the handling time
needed for an incident’s solution. If none of questions above can be answered, a service specific
check list need to be used. This service specific check list is provided in the service description.
Depending on the quality of the investigation, less incidents will need to be escalated to the
Service Expert Support Level.
Following the result of the Incident Initial Support, an incident should be assigned to the owner
of similar incidents. This should avoid that two different owners handling the similar incident.
Incidents occurring simultaneously will in most cases have the same reason and therefore can be
consolidated. Good documentation about solutions of solved incidents can help to decrease
handling time of new incidents. This correlation of incidents to problems is only possible if a
Problem Management Process exists. Known workarounds of incidents can also improve the
handling times of new incidents.
 execute general checklist

 execute service specific checklist
 describe the result of the general checklist
 describe the result of the service specific checklist
 describe the solution
 if a solution is found within this activity then go to control activity „supported –
successful“
otherwise
 set the Incident Agent to Service Expert by functional escalation

 go to control activity „support – failed“
Incident Investigation
Incident Investigation is the most complex activity of the Incident Management Process. The
incident is investigated by using available information on the incident symptoms with the aim of
achieving a quick resolution of the incident and a restoration of the disrupted service(s). This
available information is provided by the general and specific checklists. Additional information
can be: affected services, CIs, users, related incidents, errors, information out of the CMDB, and
technical expert knowledge.
The first action in this activity is to check whether the classification, especially the determination
of the Service, has been completed correctly. If not, the incident is returned to the Service Desk
for reclassification.
If the classification was correct, then the results of the general and service specific checklist are
reviewed. If the initial support was wrong, then the incident is returned to the Service Desk to
complete.
If both decisions are passed successfully, then the 2nd level support (or 3rd) starts its
investigation. If a solution is found they then execute the recovery activity, then they can pass the
Incident Ticket back to the Service Desk.
If no solution is found, it can be functionally escalated to a Specialist.
 if classification was NOT correct

o set Incident Agent to either Service Desk or the former Incident Agent to
reclassify
o go to control activity „recorded -accepted“
 if initial support was NOT correct
o set Incident Agent to either Service Desk or the former Incident Agent to re-
execute initial support
o go to control activity „classified“
 if no solution was found
o set Service Agent to an appropriate Service Specialist
o inform the Service Owner
o go to control activity „supported- failed“
 otherwise
o if the next activities (recovery, testing, …) are to executed by Service Desk
 set Service Agent to Service Desk
o go to control activity „supported- successful“
Incident Recovery
Once a solution has been found, then different recovery options are possible. If a Change is
needed, then the Change Management Process has to be triggered. Otherwise the Incident Agent
can either perform the recovery or can guide the user/caller to do so. The individual steps must
be documented.
 if a change is needed to recover the Incident Change Management has to be triggered
Otherwise
 the Incident Agent has to perform the tasks to recover the service
or
 the Incident Agent has to guide the user/caller to perform the tasks:
o document recovery
o go to the control activity recovered.
Incident Testing
If a incident is recovered, the functionality of the live-system needs to be tested. The following
actions need to be performed during this testing:
 see the Service Description for Specific Testing Procedures

 testing (test especially if all incident Symptoms have disappeared)
 approval of the implementation
The test result must be documented in the Incident Record. If Change Management was involved
in the recovery and a test was already performed, then this testing can be skipped. If possible,
this tester should be different from the Person who tested the recovery.
 set Incident Agent to Service Desk

 execute test
 record test result
 if test was successful, go to control activity „tested – successful“
otherwise
 set Incident Agent to a Service Expert

 go to control activity „tested – failed“
Incident Closure
When an incident is reported as tested successful, the user has to be informed. An answer is
required from the user (caller/Customer) to confirm or deny that there has been a successful
recovery to normal service operation. If an alert (event) triggered the Incident, then no
conformation is needed.
The solution must then be documented. This is necessary in order to build a Knowledge
Database. The final task to close the Incident Ticket must then be performed. If the user
(caller/Customer) refuses to confirm this is successful, then the Ticket has to be send back to
Initial Support.
 if the user confirms testing is successful, set the Incident Agent to the user
 when the user refuses to confirm, go to control activity „closed – failed“
otherwise
 set Incident Agent to Service Desk

 document the solution
 close the Incident
 go to control activity „closed – successful“

Incident Management (ITIL Standard)

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Incident Management (ITIL Standard)

Uploaded by

Copyright:

Available Formats

Incident Management

Incident Management contributes to an integrated Service Management approach by achieving

 Every incident and all required data is recorded.

Roles and Functions

Static Process Roles

Dynamic Process Roles

Service Specific Roles

Customer Specific Roles

 User: Person who is affected by the Incident

The Incident Record

The following attributes must be considered for an Incident Record:

 available resources are not sufficient

Escalation can be performed at each point of an incident’s life cycle. See Hierarchical

 Incident overload and backlog

Performance Indicators (KPI)

 Total Number of Incidents

 None (Incident Management is typically event-triggered)

Process Specific Rules

Activity Specific Rules:

 Default Incident Owner is set to Service Desk

otherwise go to „recorded – rejected“

 set Configuration Item to „-„

Activity Specific Rules:

 Service is set to the affected Service or Sub Service

Incident Initial Support

Classified incidents are checked according to the following general checklist:

 are there any open incidents of same service reported?

Activity Specific Rules:

 execute general checklist

 set the Incident Agent to Service Expert by functional escalation

If no solution is found, it can be functionally escalated to a Specialist.

Activity Specific Rules:

 if classification was NOT correct

Activity Specific Rules:

 if a change is needed to recover the Incident Change Management has to be triggered

 see the Service Description for Specific Testing Procedures

Activity Specific Rules:

 set Incident Agent to Service Desk

 set Incident Agent to a Service Expert

Activity Specific Rules:

 set Incident Agent to Service Desk

You might also like