
Basic Concepts for Incident Management

• Timescales

Time is of the essence in incident management because every incident represents some loss or deterioration of
service. Every aspect of the process needs to be optimized to produce the fastest end result. Service-level
agreements, operational-level agreements, and underpinning contracts will define how long a support group or
third party has to complete each step, with measurable targets. Service management tool sets should be configured
to capture how long it takes to log and escalate an incident, how many incidents are resolved within the first few
minutes without requiring escalation, and how long support teams take to respond to and to fix incidents. These
times should be monitored, and steps should be taken to identify bottlenecks or underperforming teams so that
improvement actions can be taken.
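
As a rough illustration of the measurements described above, the following sketch computes a first-line resolution rate and a mean time to resolve from captured timestamps. The field names, the 15-minute window, and the sample data are assumptions made for the example, not features of any particular tool set.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class IncidentTimes:
    """Timestamps captured for one incident (hypothetical field names)."""
    logged: datetime
    resolved: datetime
    escalated: Optional[datetime] = None  # None if the service desk resolved it


def first_line_resolution_rate(incidents, window=timedelta(minutes=15)):
    """Fraction of incidents resolved without escalation within `window`."""
    if not incidents:
        return 0.0
    hits = sum(
        1 for i in incidents
        if i.escalated is None and i.resolved - i.logged <= window
    )
    return hits / len(incidents)


def mean_time_to_resolve(incidents):
    """Average time from logging to resolution (expects a non-empty list)."""
    total = sum((i.resolved - i.logged for i in incidents), timedelta())
    return total / len(incidents)


# Invented sample data: one quick first-line fix and one escalated incident.
t0 = datetime(2024, 1, 1, 9, 0)
sample = [
    IncidentTimes(logged=t0, resolved=t0 + timedelta(minutes=5)),
    IncidentTimes(
        logged=t0,
        resolved=t0 + timedelta(hours=2),
        escalated=t0 + timedelta(minutes=10),
    ),
]
```

Running the same calculations per support team is one way to make bottlenecks visible.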

• Incident Models

Using incident models, which are incident templates prepopulated with the necessary steps to resolve common
incidents, is one method of speeding up resolution. They enable faster, more consistent logging and resolution. The
steps may instruct the service desk how to resolve the incident or may predefine the information to be gathered as
well as the correct escalation group.
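
An incident model can be pictured as a small, reusable data structure. The sketch below is only an illustration: the class, the field names, and the password-reset example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentModel:
    """A prepopulated template for a common incident type (illustrative)."""
    name: str
    category: str
    questions: tuple         # information the service desk should gather
    resolution_steps: tuple  # steps the service desk can try first
    escalation_group: str    # where the incident goes if the steps fail


# Hypothetical model for a common request.
PASSWORD_RESET = IncidentModel(
    name="Password reset",
    category="Access",
    questions=("User ID?", "Which system?"),
    resolution_steps=("Verify identity", "Reset password", "Confirm login"),
    escalation_group="Identity Team",
)


def new_incident_from_model(model, user):
    """Create an incident record (a plain dict) prefilled from a model."""
    return {
        "user": user,
        "category": model.category,
        "answers": {q: None for q in model.questions},
        "steps": list(model.resolution_steps),
        "escalation_group": model.escalation_group,
        "status": "Open",
    }
```

Because every record created from the model starts with the same questions and steps, logging stays consistent regardless of which analyst takes the call.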

• Major Incidents

All incidents should be resolved as quickly as possible, but some incidents are so serious, with such an impact on the
business, that they require extra attention. The first step is to agree on exactly what is defined as a major incident.
Some organizations will define all priority one incidents as major; others may restrict priority one incidents to those
whose impact will be felt by the external customers. In this definition, an incident with a major impact within the
organization would not normally be classed as major. An incident that (for example) prevents customers from
ordering goods from the organization’s website and that is therefore affecting both revenue and reputation would
be included. The definition must align with the priority scheme to avoid confusion.
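
The two styles of definition described above can be written down as simple rules, which helps confirm that the definition and the priority scheme agree. This is a sketch only; the field names and the two rule functions are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    priority: int                     # 1 is the highest priority
    affects_external_customers: bool  # hypothetical flag set at logging


def is_major_all_p1(incident):
    """Definition A: every priority one incident is a major incident."""
    return incident.priority == 1


def is_major_customer_facing(incident):
    """Definition B: only priority one incidents felt by external customers."""
    return incident.priority == 1 and incident.affects_external_customers


# An internal outage and a failure of the customer ordering website.
internal_outage = Incident(priority=1, affects_external_customers=False)
website_down = Incident(priority=1, affects_external_customers=True)
```

Under definition B the internal outage is not major even though its priority is one, which is exactly the kind of mismatch that needs to be agreed in advance.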

The purpose of defining an incident as a major incident is so that it can receive special focus. Specific actions to be
undertaken are defined in advance so that when the major incident occurs, everyone knows what they are expected
to do. Typical actions might include the following:

• Notification of key contacts within the service provider organization and the business as soon as the major
incident is declared
• Regular updates posted through agreed channels (intranet, key users, and so on)
• Recorded greeting put on the service desk number to inform callers that the incident has occurred and is
being dealt with, to reduce the number of calls being handled by the desk
• Appointment of a major incident manager (this may be the service desk manager) and the appointment of a
separate team to focus on resolving the incident

As with any incident, some major incidents can be resolved without understanding the cause (perhaps by restarting
a server); some require the underlying cause to be understood. In the second case, problem management would
become involved. It is essential, however, that the focus of incident management remains on restoring service as
quickly as possible.

A major responsibility of the service desk is communicating with the users; this is particularly true in the case of
major incidents. Regular updates should be provided. The service desk staff members are also accountable for
ensuring that the incident record is kept up to date throughout the incident, although it may be the technicians in
other teams who actually enter the information.
An accurate record is essential during the incident so that there is no confusion; it will also be used after the incident
is resolved, as part of the major incident review. Regular updates showing the steps taken and whether they were
successful will allow improvements to be identified for future events.

Incident Status

Incident management tracks incidents through their lifecycle, moving from when the incident is identified through
diagnosis and resolution and finally closure. Incident management must ensure that incidents are resolved as quickly
as possible and so will remind resolving groups of the associated target times, making sure no incident is forgotten or
ignored.

Most service management tool sets will allow a number of statuses to be defined for each incident to facilitate
progress tracking. Typical statuses include the following:

• Open: The incident has been identified and logged. It may be being worked on by a service desk analyst, or
the service desk may be considering which second-line team it should be escalated to. Incidents resolved by
the first-line team may move directly from Open to Closed, because the service desk analyst obtains the user's
confirmation that the incident has been satisfactorily resolved.
• Assigned: This may mean the incident has been sent to a support team but not allocated to a particular
individual.
• Allocated or In Progress: This is usually defined as when a support technician has been allocated the call.
• On Hold: This status is sometimes used when the user is not available or does not have the time to test the
resolution. It is used to “stop the target clock,” because the service provider cannot do anything further to
resolve the incident without the user.
• Resolved: This status indicates that the technician has completed their work, but the customer has not yet
confirmed that it was successful. It is common to use the service management tool's automated email
facility to email the user when an incident is resolved, asking for a response within x days if the
user is still not happy. If no reply is received, the incident is automatically closed.
o If the user is unhappy, the call is put back into In Progress, and further work is carried out to resolve
it.
o The service desk should attempt to contact users to obtain permission to close calls before the
automated closure, especially for high-impact incidents, where the user may not be aware of the
resolution.
• Closed: This status confirms that the incident is over to the user’s satisfaction. The incident management
process has no further involvement, although problem management may now investigate the underlying
cause.
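
The statuses above behave like a small state machine. The sketch below encodes one plausible set of allowed moves. The transition table is an assumption drawn from the descriptions, since real tool sets let each organization configure its own, and stopping the clock at Resolved and Closed (rather than only at On Hold) is likewise an assumption.

```python
from enum import Enum


class Status(Enum):
    OPEN = "Open"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "In Progress"
    ON_HOLD = "On Hold"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Allowed moves between statuses (an assumption based on the descriptions).
TRANSITIONS = {
    Status.OPEN: {Status.ASSIGNED, Status.CLOSED},  # first-line fix closes directly
    Status.ASSIGNED: {Status.IN_PROGRESS},
    Status.IN_PROGRESS: {Status.ON_HOLD, Status.RESOLVED},
    Status.ON_HOLD: {Status.IN_PROGRESS},
    Status.RESOLVED: {Status.IN_PROGRESS, Status.CLOSED},  # reopened or confirmed
    Status.CLOSED: set(),
}


def move(current, new):
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def clock_running(status):
    """True while the target clock counts against the service provider.

    On Hold stops the clock, as described in the text; treating Resolved
    and Closed as stopped too is an assumption of this sketch.
    """
    return status not in {Status.ON_HOLD, Status.RESOLVED, Status.CLOSED}
```

Rejecting moves that are not in the table is what prevents an incident from, for example, jumping from Closed back to Open without a new record being raised.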

Expanded Incident Lifecycle

The expanded incident lifecycle is used by the service design availability management process and within CSI. The
expanded lifecycle breaks the process down into its individual steps so that each can be examined to understand
why targets were missed. For example, the diagnosis of the incident may ascertain very quickly that the resolution
requires the restoration of data, which takes three hours; this information would be used to pinpoint where
improvements should be made. Delays in any step of the lifecycle can be analyzed, and improvements can be
implemented to speed up resolution; implementing a knowledge base or storing spare parts on-site are two typical
measures that are taken to shorten the diagnosis and repair steps.
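
To illustrate the kind of analysis described above, the sketch below splits one incident into stages and reports which stage consumed the most time. The stage names and timestamps are invented for the example; the three-hour restore mirrors the data restoration case mentioned in the text.

```python
from datetime import datetime

# Hypothetical timestamps for one incident; the stage names are assumptions.
events = [
    ("occurred", datetime(2024, 1, 1, 9, 0)),
    ("detected", datetime(2024, 1, 1, 9, 10)),
    ("diagnosed", datetime(2024, 1, 1, 9, 25)),
    ("repaired", datetime(2024, 1, 1, 9, 40)),
    ("restored", datetime(2024, 1, 1, 12, 40)),
]


def stage_durations(events):
    """Map each stage to the time elapsed since the previous event."""
    return {
        name: end - start
        for (_, start), (name, end) in zip(events, events[1:])
    }


def slowest_stage(events):
    """Name of the stage that consumed the most time."""
    durations = stage_durations(events)
    return max(durations, key=durations.get)
```

Here diagnosis took 15 minutes but restoration took three hours, so improvement effort would be better spent on the restore step than on diagnosis.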

Lifecycle of an Incident

o Step 1: Incident Identification

Incident management is a reactive process; we cannot start to resolve an incident
until we know it has occurred. As we said earlier, it is essential that incidents are resolved in the shortest
possible time, because each represents business disruption. Whenever possible, therefore, we should be trying
to realize that an incident has occurred before the user notices or, failing that, before they have reported it to
the service desk. The discussion of event management shows how monitoring tools can be used to identify
failures. The event management process should link directly to incident management so that any incidents
spotted are worked on immediately and resolved quickly.
Where an automated response to an incident is used, such as restarting a server following a failure, an incident
should still be logged for future analysis.

o Step 2: Incident Logging


The incident record contains all the information concerning a particular incident; details of when it was logged,
assigned, resolved, and closed may be required for service-level management reporting. Details of symptoms
and the affected equipment may be used by problem management. Steps taken to resolve the incident may be
used to populate a knowledge base. It is essential therefore that all relevant information is added to the record
as it progresses through its lifecycle.
A good integrated service management tool makes good recordkeeping much easier, because it can
automatically populate the record with user details (from Active Directory or a similar tool) and equipment and
warranty details (based on the CI number). Automatic date and time stamping of each update and identification
of who made the update will both improve the completeness of the information in the record.

Service management tool sets differ, but a typical list of required information in an incident record would include
the following:
■ Unique reference number, generated automatically
■ Incident category (covered in the next section)
■ Incident impact, urgency, and priority
■ Date/time of every update, from logging to closure
■ Name of who logged and updated the incident
■ Method of notification (telephone, automatic, email, in person, and so on)
■ Full user contact details
■ Symptoms, questions asked by the service desk, and the answers given by the user
■ Steps taken to try to resolve the incident (successful or otherwise)
■ Incident status
■ Related CI/problem/known error
■ Assignee group and individual
■ Closure category
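
Several of the items above map naturally onto a record structure with an automatically generated reference and automatically stamped updates. The following is a minimal sketch under stated assumptions: the class names, field names, and the `INC` numbering format are hypothetical, and a real tool would persist the counter and pull user details from a directory service.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

_next_ref = count(1)  # in-memory counter; a real tool would persist this


@dataclass
class Update:
    author: str          # who made the update
    note: str            # what was done or observed
    timestamp: datetime  # when the update was recorded


@dataclass
class IncidentRecord:
    user: str
    symptoms: str
    category: str
    impact: int
    urgency: int
    priority: int
    notified_by: str = "telephone"
    status: str = "Open"
    # Unique reference generated automatically (hypothetical INC format).
    reference: str = field(
        default_factory=lambda: f"INC{next(_next_ref):06d}"
    )
    updates: list = field(default_factory=list)

    def add_update(self, author, note):
        """Append an update stamped with the author and current UTC time."""
        self.updates.append(Update(author, note, datetime.now(timezone.utc)))
```

Stamping the time and author inside `add_update`, rather than asking the analyst to type them, is what keeps the record complete enough for later reporting and review.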
