Incident Management Process
Classification: Internal
Version 1.0
This document describes the Incident Management Process. The Process provides a consistent, simple, and repeatable method for everyone to follow
when system issues are reported by a customer or discovered by Mapal.
Who is accountable?
Product Manager(s) are accountable for the successful resolution of all incidents within agreed SLA targets.
The Director of Support is accountable as the process owner and manager.
Incident management is a defined process for logging, recording and resolving incidents.
The aim of incident management is to restore the service to the customer as quickly as possible. This could be through a workaround or temporary
fix, while a permanent solution is sought under the problem management process.
1.1.1. Primary goal
The primary goal of the Incident Management process is to restore normal service operation as quickly as possible. The aim is to minimize the
adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. ‘Normal service
operation’ is defined here as service operation within SLA limits.
1.2. Process Definition:
Incident Management includes any event which disrupts, or which could disrupt, a service. This includes events which are communicated directly by
users or Mapal staff through the Support Team or through an interface to system monitoring and incident management tools.
1.4.1. CSM
Customer Success Management (CSM) is an approach to managing a company's interactions with current and future customers. It often involves using technology
to organize, automate, and synchronize sales, marketing, customer service, and technical support. Mapal uses Dynamics CRM as the technology
provider/platform to manage this data.
1.4.2. Customer
A customer is an end user, or the overall organisation, engaged in a contracted agreement with Mapal Group.
Mapal Support operates on a local business hours model, and all solutions are developed and managed from the Madrid, Edinburgh, Paris, and
Stockholm offices, including all MAPAL-OS products such as Workforce Management, Inventory, Flow Learning, Compliance, Reputation, and
Analytics.
The end user is simply the person who uses the Mapal software after it has been fully developed, marketed, and installed. It is also the person who
would raise a query should a Mapal solution not be working correctly. Generally, the terms "user" and "end user" have the same meaning.
1.4.5. Incident
An incident is an unplanned interruption to a Mapal Product or Service, or a reduction in its quality. Failure of any item, software, or hardware used
in the support of a Mapal system that has not yet affected service is also an Incident. An incident is often described as a fault, error, defect, bug, or problem, or as
something that doesn’t work as designed.
An incident occurs when the operational status of a Mapal solution changes from working to failing or about to fail, resulting in a condition in which
the product is not functioning as it was designed or implemented. The resolution for an incident involves implementing a repair to restore the item to
its original state.
A design flaw does not create an incident. If the product is working as designed, even though the design may not be perceived as correct, the correction
needs to take the form of a change request or idea to modify the design. The request may be expedited based upon the need, but it is still a
modification, not a repair.
A knowledge gap or user related process gap does not create an incident but rather a Question or Request.
1.4.5.1. Problem
Problem management differs from incident management in that its main goal is the detection of the underlying causes of an incident and the best
resolution and prevention. In many situations, the goals of problem management can be in direct conflict with the goals of incident management. The
Mapal approach is to restore the service as quickly as possible (incident management) while ensuring that all details are recorded. This will enable
problem management to continue once a workaround has been implemented.
1.4.5.2. Incident vs. Problem
An incident is where an error occurs: something does not work the way it is designed.
A problem (is different) and can be:
the occurrence of the same incident many times.
Incident Management distinguishes between Incidents (Service Interruptions) and Service Requests (standard requests from users, e.g. password
resets). Service Requests are not fulfilled by Incident Management; instead, they are handled as data services or change requests. There is a dedicated process for
dealing with emergencies.
1.4.7. Incident Priority
Incident Priority is the value given to an Incident to indicate its relative importance to ensure the appropriate allocation of resources and to determine
the timeframe within which action and resolution is required. The severity and impact of an incident will be used in determining the Incident Priority
for resolution. Incident Priority is based upon a coherent and up-to-date understanding of business impact and severity.
P4 - #
P3 - System operational with difficulty or procedural issues. A valid workaround is available. Low numbers of users affected.
P2 - A core functionality of the system is not operating as expected. No valid workaround is available. Multiple users affected.
P1 - System completely non-operational and no work can be carried out. All users affected.
Impact
Critical P1 P1 P2 P2
High P1 P2 P2 P3
Medium P2 P2 P3 P3
Low P4 P4 P4 P4
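The matrix above can be read as a simple lookup. The following is a minimal sketch in Python, not part of the Mapal tooling. The rows are the impact levels shown in the matrix. The column labels are missing from the matrix as reproduced here, so the sketch assumes the four columns are urgency levels in the order Critical, High, Medium, Low. The function name incident_priority is hypothetical.

```python
# Rows are the impact levels listed in the matrix above.
# ASSUMPTION: the four columns are urgency levels in the order
# Critical, High, Medium, Low. The source matrix does not label them.
LEVELS = ("Critical", "High", "Medium", "Low")

PRIORITY_MATRIX = {
    "Critical": ("P1", "P1", "P2", "P2"),
    "High": ("P1", "P2", "P2", "P3"),
    "Medium": ("P2", "P2", "P3", "P3"),
    "Low": ("P4", "P4", "P4", "P4"),
}


def incident_priority(impact: str, urgency: str) -> str:
    """Look up the Incident Priority for an impact row and an urgency column."""
    if impact not in PRIORITY_MATRIX:
        raise ValueError(f"unknown impact level: {impact!r}")
    if urgency not in LEVELS:
        raise ValueError(f"unknown urgency level: {urgency!r}")
    return PRIORITY_MATRIX[impact][LEVELS.index(urgency)]
```

Under the assumed column order, incident_priority("High", "Low") reads the last cell of the High row. If the real column headings differ, only LEVELS needs to change.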
1.4.7.1. Urgency
Urgency is determined by how many personnel are affected. It is a rating assigned to Incidents, Problems and Changes and, used in conjunction with
Impact, is one of the factors for allocating IT priorities.
High – Multiple users affected. Multiple users from multiple organisations or a single organisation are affected.
Critical – All users affected. All users from multiple organisations or a single organisation are affected.
1.4.7.2. Impact
Impact is determined by how much the user is restricted from performing their work, and is a measure of the effect of an Incident, Problem or Change
on Business Processes. Impact is often based on how service levels will be affected.
The Incident Repository is a database containing relevant information about all Incidents whether they have been resolved or not. General status
information along with notes related to activity should also be maintained in a format that supports standardized reporting. At Mapal, the incident
repository is contained within Customer Service APP, part of Dynamics CRM.
Reports providing incident details including a root cause, timeline of events and corrective measures to be distributed to customers within SLA.
A record containing the details of an Incident. Each Incident record documents the lifecycle of a single Incident, its cause where available, and the
corrective measures taken to resolve it. These records are maintained in Dynamics.
1.4.11. CSAT
The Customer Satisfaction Survey (CSAT) is a management tool used to gauge Mapal end users' satisfaction with the service received. We
use Dynamics for this.
1.4.12. Operational Level Agreement (OLA)
Often referred to as OLA, operational level management ensures that arrangements are in place with internal IT support providers and external suppliers in the form of
Operational Level Agreements (OLAs) and Underpinning Contracts (UCs), respectively.
1.4.13. RCA
Root Cause Analysis (RCA) is an activity that identifies the root cause of an Incident or Problem, and typically concentrates on IT, infrastructure, or
database failures.
RCA reports are required for all P1 and P2 incidents that are of a higher impact and severity as this information is used to provide details and
reassurance of further preventative measures within problem management.
1.4.14. Response
Time elapsed between the time the incident is reported, and the time receipt is acknowledged and assigned to an individual for resolution.
1.4.15. Resolution
Service is restored to a point where the customer can perform their job. In some instances, this may only be a work around solution until the root
cause of the incident is identified and corrected.
Often referred to as the SLA, the Service Level Agreement is the agreement between Mapal and the customer outlining services to be provided, and
operational support levels.
A Contract between Mapal and a Third-Party service provider. The Third-Party provides goods or Services that support delivery of an IT Service to
end users, Mapal’s Customers. The Underpinning Contract defines targets and responsibilities that are required to meet agreed Service Level Targets
in an SLA.
1.5. Metrics
Metrics are the results of processes and data that are measured and reported to help manage a Process, IT Service, or an Activity.
It is important for the success of all processes that roles, responsibilities and owners are agreed and documented. The person who will later have the
responsibility for running a certain process should also participate in its design. This will ensure that as much experience as possible flows into the
process definition, and that the role owners identify themselves closely with any changes to existing working practice. Responsibilities are assigned
and understood by those required to fulfill activities and tasks as part of the process.
KEY
R Responsible Does the work and makes the decisions to ensure a task is achieved
A Accountable Must be one person. Ensures correct and thorough completion of the process
C Consulted Provides information for the process through 2-way communication. Usually several people, subject experts
Performs and is responsible for the major incident management process, and provides notifications and updates on the progress and resolution of each major incident
to key business stakeholders and customers affected by an Incident.
The creation and collection of root cause analysis details for distribution of the RCA Incident report.
Responsible for P1 and P2 incident internal and external communication and in accordance with OLA.
Responsible for governance of Major Incident Management Process with Tier I.
2.3. Customer Support – (Tier II)
To support the incident management process by performing fault-fix activities as per the relevant support model and providing the required communication
on progress to the Service Team as per agreed OLA timescales. Ordinarily, a manager would not be involved in an incident in terms of working with the
support teams to understand the fault, but would ensure that the agreed activity takes place in line with the support and incident management process.
2.4. Technology Operations (TIM – Technology Incident Manager)
To manage the major incident management process within Technology and provide communication on progress to Service Manager (MIM) as per
agreed OLA timescales.
Ensure details and updates provided to incident logs, for root cause analysis and problem records are readily available and in accordance
with OLA.
Responsible for P1 and P2 incident communication to MIM in accordance with OLA.
Responsible for governance of Major Incident Management Process within Technology division.
2.5. Customer
Communication and point of contact for escalation of incidents with a specified support contact for incidents.
The below RACI matrix is a high-level summary of activities of defined roles and responsibilities for the Incident Management process agreed across
multiple departments.
Roles
Raise and validate the incident case within 15 minutes of incident identification
Respond to any P1 incident escalated from TIER I & TIER II within 15 minutes with at least an acknowledgement of the incident escalation and as much useful information as possible.
Send an email notification to all customers concerned (to be coordinated with the Marketing department and Customer Success teams).
Need to have the customer lists (by product, by country)
Provide feedback on the latest progress at least every 30 minutes after the initial response, unless otherwise agreed and stated clearly in communications
The SLA target is to resolve any P1 incident within 4 business hours of it being reported
Tech teams would continue to work on any P1 incidents as a top priority until resolution
Raise and validate the incident case within 30 minutes of incident identification
Respond to any P2 incident escalated from TIER I & TIER II within 1 hour with at least an acknowledgement of the incident escalation and as much useful information as possible.
Send incident notifications via email, the Teams channel (a post in the channel to inform stakeholders and a conversation with the tech and support people concerned), and the status page
Provide feedback on the latest progress at least every 60 minutes after the initial response, unless otherwise agreed and stated clearly in communications
Tech Ops would continue to work on any P2 incidents as a top priority until resolution
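The P1 and P2 timing targets listed above can be captured as data, which makes the deadlines easy to compute. The sketch below is illustrative only: the class EscalationTargets and the functions response_deadline and next_update_due are hypothetical names, and the arithmetic uses plain elapsed time rather than business hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EscalationTargets:
    raise_within: timedelta    # raise and validate the incident case
    respond_within: timedelta  # acknowledge an escalation from Tier I / Tier II
    update_every: timedelta    # progress feedback interval after the first response


# Values taken from the P1 and P2 lists above.
TARGETS = {
    "P1": EscalationTargets(
        raise_within=timedelta(minutes=15),
        respond_within=timedelta(minutes=15),
        update_every=timedelta(minutes=30),
    ),
    "P2": EscalationTargets(
        raise_within=timedelta(minutes=30),
        respond_within=timedelta(hours=1),
        update_every=timedelta(minutes=60),
    ),
}


def response_deadline(priority: str, escalated_at: datetime) -> datetime:
    """Latest time by which the escalation should be acknowledged."""
    return escalated_at + TARGETS[priority].respond_within


def next_update_due(priority: str, last_update_at: datetime) -> datetime:
    """Latest time by which the next progress update should be sent."""
    return last_update_at + TARGETS[priority].update_every
```

For example, a P1 escalated at 09:00 should be acknowledged by 09:15, and each later update is due 30 minutes after the previous one unless a different interval has been agreed.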
Responsible for incident closure, customer, and stakeholder communication for all incidents
Responsible for assigning incidents to the appropriate technology group for resolution, e.g., assigning an incident to System Analysts or Tech Ops to
provide initial investigation.
Performs post-resolution customer review to ensure that all services are functioning properly, and all incident logging is complete.
Responsible for creation and distribution of the RCA.
Prepare reports showing statistics of incidents resolved, SLA achievement and other agreed metrics for Mapal Group Managers,
Directors and the Executive team.
Includes Infrastructure, Development and QA technical staff involved in supporting services including and not limited to System Analyst,
Infrastructure engineer, Database analyst, and Developer.
Correct the issue or provide a work around to the service that will provide functionality that approximates normal service as closely as
possible and minimises the impact.
If an incident reoccurs or is likely to reoccur, create or update the problem management monitoring record so that root-cause analysis can
be performed, and a standard work around can be deployed.
Incident log details completed for cause and corrective measures.
On-going analysis to identify trends and support problem management.
Outside the listed hours of availability for contact and escalation please refer to ‘2.11. Hierarchical Escalation Contact details’
NAME EMAIL MOBILE
3.1. Categorisation
Identify whether what is reported is an incident, the products and services impacted, and the appropriate SLA and escalation timelines.
Indicate what support groups need to be involved.
Provide meaningful reporting on system continuity and reliability.
For each incident, the specific product or service will be identified. It is critical to establish with the user the specific area of the service being
impacted. For example, at Mapal, is it Stock Control, Financial, Human Resources, or another area? If it is Stock Control, is it for Stock Count or
Purchasing? Identifying the impact to operations properly establishes the appropriate Service Level Agreement and relevant Service Level Targets.
In addition, the impact and severity of the incident need to be established. All incidents are important to the user, but incidents that affect large
groups of personnel, business deadlines or critical operational functions need to be addressed before those affecting 1 or 2 users.
Principles of Categorisation:
Does the incident cause a work stoppage for the user, or do they have other means of performing their job? An example would be a broken link on a
web page is an incident but if there is another navigation path to the desired page, the incident’s priority (severity) would be low because the user can
still perform the needed function.
The incident may create a work stoppage for only one person, but the impact is far greater because it is a critical operational function. An example of
this scenario would be the person in payroll having an issue which prevents the payroll from processing. The impact affects many more personnel
than just the user.
An Incident Priority of P1 to P4 is assigned to each incident. It determines how quickly the incident is scheduled for resolution and is set depending upon a
combination of the severity and impact.
Following are the current targets for response and resolution for incidents based upon Incident Priority.
Role Description
Incident Reported
Incidents can be reported by the customer, internal or technical staff through various means, e.g., phone, email, or a self-service web interface
(Help Centre).
Incident identification
Support Team
As far as possible, all key components should be monitored so that failures or potential failures are detected early and the incident management
process can be started quickly. Using all available monitoring, Mapal will always aim to resolve an Incident before the end user is impacted.
All incidents must be fully logged, and date/time stamped, regardless of whether they are raised via a customer or whether automatically detected
via an event monitoring alert. All relevant information relating to the nature of the incident must be logged so that a full historical record is
maintained – and so that if the incident must be referred to other support group(s), they will have all relevant information at hand to assist them.
Incident categorisation
If the customer is calling about an issue that is not related to one of the agreed services, or is not a system issue, then it is not an incident. A
case will still be logged and categorised appropriately as non-system related.
Is this a Question or Request incorrectly categorized as an incident? If so, update the case to reflect that it is and follow the appropriate process.
Incident prioritisation
Before an Incident Priority can be set, the severity and impact need to be assessed. Once the severity and impact are set, the Incident Priority is
derived using the Incident Priority matrix. Refer to ‘1.4.7. Incident Priority’.
Initial diagnosis
A Support Team member must carry out initial diagnosis, using tools and known error information to try to discover the full symptoms of the
incident and to determine exactly what has gone wrong. The Support Team will utilise the collected information on the symptoms and use that
information to initiate a search of the Knowledge available in the Knowledge Hub, Dynamics CRM to find an appropriate solution. If possible,
the Support Team will resolve the incident and close the incident if the resolution is successful.
If this is a major incident, meaning that a service is unavailable in part or whole, the appropriate Mapal stakeholders should be alerted to make
certain any resources necessary to the resolution will be immediately made available.
Incident Closure
Verify with the customer that the resolution was satisfactory, and the customer can perform their work. An incident resolution does not require
that the underlying cause of the incident has been corrected. The resolution only needs to make it possible for normal system activity to resume.
If the customer is satisfied with the resolution, proceed to closure, otherwise continue investigation and diagnosis.
When proceeding with closure the Support Team should also check the following:
Closure categorisation. Check and confirm that the initial incident categorisation was correct or, where the categorisation subsequently turned
out to be incorrect, update the record so that a correct closure categorisation is recorded for the incident – seeking advice or guidance from the
resolving group(s) as necessary.
User satisfaction survey. CSAT survey distributed on CRM incident closure by email.
Incident documentation. Chase any outstanding details and ensure that the Incident log is fully documented so that a full historic record at a
sufficient level of detail is complete. Incident RCA
On-going or recurring problem? Determine (in conjunction with support groups) whether it is likely that the incident could recur and decide
whether any preventive action is necessary to avoid this. In conjunction with Problem monitoring, a problem record should be created from every
incident to document and prevent further root cause problems and for repeat analysis reporting.
Assign to Technology group
Is the necessary information available to resolve the incident? If not, the case should then be assigned to the Development Group that supports
the product.
mapal-os.com 9
5.1. Incident Tech Ops (Tier II) assignment steps:
*All escalation process steps are performed by the Support Team. Some of the steps may be automated.
Description
Examine all open incidents and determine actions based upon incident Priority.
Has the incident been resolved? If not, continue to monitor and provide fortnightly customer updates.
If it is a P1 incident, the Support Manager (MIM) on shift should be contacted by phone to initiate the incident assignment.
Join the Incident Teams Channel and monitor the status of the P1 incident, providing informational updates to the Support Manager (MIM) every 25
minutes.
Has the incident been resolved? If not, continue to monitor and provide updates.
If the incident has been resolved, provide the Support Manager with RCA details.
If it is a P2 incident, the Support Manager (MIM) on shift should be contacted by phone to initiate the incident assignment.
Join the Incident Teams Channel and monitor the status of the P2 incident, providing informational updates to the Support Manager (MIM) every 55
minutes.
Has the incident been resolved? If not, continue to monitor and provide updates.
If the incident has been resolved, provide the Support Manager with RCA details.
If the Tech Ops Team Lead (TIM) is not available and 30 minutes have passed, call the Head of Tech Ops.
If neither the Team Lead nor the Head of Tech Ops is available, follow the hierarchical escalation chart.
Hierarchical escalation should be used if, 30 minutes after the incident has been logged, assigned and contact attempted, the
Incident Manager has not been available or has not responded.
All incidents must be raised to the Support Team for validation and categorisation of an incident and to follow incident management
process. Includes incidents reported from all internal departments.
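The 30-minute hierarchical escalation rule above reduces to a single time comparison. The sketch below is a hedged illustration, not the implementation used in the case management system. It assumes the 30 minutes are measured from the contact attempt, which is the last of the three events named in the rule. The function name needs_hierarchical_escalation is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Waiting period from the escalation rule above.
ESCALATION_WAIT = timedelta(minutes=30)


def needs_hierarchical_escalation(
    contact_attempted_at: datetime,
    responded_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True when the Incident Manager has not responded within the wait.

    ASSUMPTION: the clock starts at the contact attempt, i.e. after the
    incident has already been logged and assigned.
    """
    if responded_at is not None:
        return False
    return now - contact_attempted_at >= ESCALATION_WAIT
```

A check like this would typically run on a schedule against open P1 and P2 cases, so that the next contact in the hierarchical escalation chart is called as soon as the wait expires.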
5.5. P3 and P4 Minor defect escalation process:
Escalation can be applied at any stage of the defect resolution lifecycle, regardless of whether the SLA has been breached, by
updating the incident to P2 to invoke the incident management process.
The primary goal of the Major Incident Management process is to restore normal service operation as quickly as possible and
minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are
maintained. ‘Normal service operation’ is defined here as service operation within SLA limits.
Major incident requiring this process is defined as an event which has significant impact or urgency for the business/organisation,
and which demands a response beyond the routine incident management process. This is an inclusive extension of the Incident
Management process and is implemented before the SLA is breached on P1 and P2 incidents or where otherwise deemed necessary.
Major Incidents may either cause, or have the potential to cause, impact on business-critical services or systems or be an incident
that has significant impact and risk on reputation, revenue, legal compliance, regulation, or security of the business/organisation.
Incidents for which the timescale of disruption – to even a relatively small percentage of users – becomes excessive should also be
regarded as major incidents.
It is possible to define some of these major incidents, but most will be prioritised as they happen based on impact and urgency.
Major incidents at Mapal will normally be classified as P1 or P2 priority incidents.
Major Incident Management Response process is implemented by the Support Manager (MIM).
Role Description
Is it a P1 or P2 incident?
Has it been 4 hours for a P1 or 12 hours for a P2 priority incident, and is it still not resolved?
Follow the standard P1, P2 incident management process until it is clear the SLA resolution
time has been or is likely to be breached. Also invoke MIMR if investigation of the incident has
provided evidence that the resolution time will breach SLA.
1st Action - Commandeer an appropriate meeting room if applicable and start a Teams session.
2nd Action - Provide the Teams meeting join ID via Teams multi-channel broadcast.
Director of Support
Infrastructure Manager
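The decision to invoke the Major Incident Management Response (MIMR) described above can be sketched as a small check. This is an illustration under stated assumptions: the thresholds are the 4-hour (P1) and 12-hour (P2) figures from the check above, elapsed time is measured on the wall clock rather than in business hours, and the function name should_invoke_mimr and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# Elapsed-time thresholds from the check above: 4 hours for P1, 12 hours for P2.
MIMR_THRESHOLDS = {"P1": timedelta(hours=4), "P2": timedelta(hours=12)}


def should_invoke_mimr(
    priority: str,
    reported_at: datetime,
    now: datetime,
    resolved: bool = False,
    breach_expected: bool = False,
) -> bool:
    """Decide whether the major incident response should be invoked.

    breach_expected represents evidence from the investigation that the
    resolution time will breach SLA, which triggers MIMR early.
    ASSUMPTION: wall-clock time is used; business hours are not modelled.
    """
    if resolved or priority not in MIMR_THRESHOLDS:
        return False
    if breach_expected:
        return True
    return now - reported_at >= MIMR_THRESHOLDS[priority]
```

P3 and P4 incidents never trigger the response in this sketch. They would first need to be escalated to P2 under the minor defect escalation process.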
7.1. Reports/Metrics
Reports and Metrics will be produced monthly with quarterly summaries and will include:
Total numbers of Incidents (as a control measure)
Breakdown of incidents at each stage (e.g., recorded, open, closed etc.)
Size of current incident backlog
Number and percentage of P1 and P2 incidents
Percentage of incidents handled within agreed response time as defined by SLAs.
Number of incidents reopened and as a percentage of the total.
Number and percentage of incidents incorrectly prioritised and categorized.
Number and percentage of Incidents closed by the Service Team without reference to other levels of support (often
referred to as ‘first time fix’)
Breakdown of incidents by time of day, to help pinpoint peaks and ensure matching of resources.
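Several of the metrics listed above are simple counts and percentages over the incident records. The sketch below shows one way they could be computed. The IncidentRecord fields and the incident_metrics function are hypothetical and do not reflect the CRM data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class IncidentRecord:
    # Hypothetical fields; the real record layout lives in the CRM.
    priority: str               # "P1" to "P4"
    responded_within_sla: bool  # handled within the agreed response time
    reopened: bool              # reopened after closure
    first_time_fix: bool        # closed without reference to other support levels


def percentage(part: int, total: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there are no records."""
    return round(100.0 * part / total, 1) if total else 0.0


def incident_metrics(records: List[IncidentRecord]) -> Dict[str, float]:
    """Compute a subset of the monthly report figures listed above."""
    total = len(records)
    major = sum(1 for r in records if r.priority in ("P1", "P2"))
    return {
        "total_incidents": total,
        "p1_p2_count": major,
        "p1_p2_pct": percentage(major, total),
        "responded_within_sla_pct": percentage(
            sum(1 for r in records if r.responded_within_sla), total
        ),
        "reopened_pct": percentage(sum(1 for r in records if r.reopened), total),
        "first_time_fix_pct": percentage(
            sum(1 for r in records if r.first_time_fix), total
        ),
    }
```

The remaining items in the list, such as the backlog size and the breakdown by time of day, would need timestamps and status fields that are not modelled here.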
7.2. SLA Alerts & Dashboard
Alerting is available for SLA achievement and tracking of open incidents. It is configured using entitlements in Salesforce, and
email alerts will be distributed to prevent SLA breaches.
There will be multiple levels of alerting for each incident based on age and time left to be able to resolve the incident without SLA
breach and these will be distributed hierarchically to include Tech Ops, Cloud Ops, Technical Support and Service and both
Technology and Operations Leadership.
Additionally, there is an open incident SLA dashboard that displays all open cases aligned to SLA achievement, with the time left
to resolve within SLA or an indication that the SLA has already been breached.
7.3. Meetings
The Director of Support conducts fortnightly sessions (Case Management) with Tech Ops to review previous incidents and incident
management.
Target for all the above is 95% completion and success with the minimum accepted level of 90%.
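The 95% target and 90% minimum stated above can be applied to any reported percentage with a simple threshold check. This is a minimal sketch. The function name rate_against_target and the three result labels are hypothetical and are not terms used by the process.

```python
TARGET_PCT = 95.0   # target completion and success level stated above
MINIMUM_PCT = 90.0  # minimum accepted level stated above


def rate_against_target(achieved_pct: float) -> str:
    """Classify an achieved percentage against the target and the minimum."""
    if achieved_pct >= TARGET_PCT:
        return "on target"
    if achieved_pct >= MINIMUM_PCT:
        return "acceptable"
    return "below minimum"
```

A result of 92.5%, for instance, meets the minimum accepted level but falls short of the target.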
If Mapal already provides a service to a customer, but that customer wants to significantly expand that service or solution usage
beyond the existing cost support model in place, the request should be treated as an additional service request for managed services
or other and forwarded to the Customer Success Manager.
Incidents should be prioritized based upon impact to the customer and the availability of a workaround.
Regardless of where an incident is referred to during its life, ownership of the incident always remains with the Support Team. The
Support Team remains responsible for tracking progress, keeping users informed and ultimately for Incident Closure.
Rules for re-opening incidents - Despite all adequate care, there will be occasions when incidents recur even though they have been
formally closed. If the incident recurs within one working day, then it can be re-opened – but beyond this point a new incident must
be raised but linked to the previous incident(s) as a child case.
Workaround solutions should conform to Mapal standards and policies.
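The re-opening rule above (re-open within one working day, otherwise raise a linked child case) can be sketched as follows. The interpretation of "one working day" is an assumption: the sketch treats it as the same clock time on the next Monday-to-Friday day and ignores public holidays and local business hours. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def add_one_working_day(moment: datetime) -> datetime:
    """Return the same clock time on the next Monday-to-Friday day.

    ASSUMPTION: weekends are skipped; public holidays are not considered.
    """
    result = moment + timedelta(days=1)
    while result.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        result += timedelta(days=1)
    return result


def recurrence_action(closed_at: datetime, recurred_at: datetime) -> str:
    """Choose between re-opening the case and raising a linked child case."""
    if recurred_at <= add_one_working_day(closed_at):
        return "reopen"
    return "new_child_case"
```

With this interpretation, an incident closed on a Friday afternoon can still be re-opened on the following Monday morning, while a recurrence on Tuesday results in a new case linked to the original.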