The Shortcut Guide To: Improving IT Service Support Through ITIL
Rebecca Herold
Introduction
Introduction to Realtimepublishers
by Don Jones, Series Editor
For several years now, Realtime has produced dozens and dozens of high-quality books that just happen to be delivered in electronic format, at no cost to you, the reader. We've made this unique publishing model work through the generous support and cooperation of our sponsors, who agree to bear each book's production expenses for the benefit of our readers. Although we've always offered our publications to you for free, don't think for a moment that quality is anything less than our top priority. My job is to make sure that our books are as good as (and in most cases better than) any printed book that would cost you $40 or more. Our electronic publishing model offers several advantages over printed books: You receive chapters literally as fast as our authors produce them (hence the "realtime" aspect of our model), and we can update chapters to reflect the latest changes in technology. I want to point out that our books are by no means paid advertisements or white papers. We're an independent publishing company, and an important aspect of my job is to make sure that our authors are free to voice their expertise and opinions without reservation or restriction. We maintain complete editorial control of our publications, and I'm proud that we've produced so many quality books over the past years. I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if you've received this publication from a friend or colleague. We have a wide variety of additional books on a range of topics, and you're sure to find something that's of interest to you, and it won't cost you a thing. We hope you'll continue to come to Realtime for your educational needs far into the future. Until then, enjoy.
Don Jones
Table of Contents
Introduction to Realtimepublishers
Chapter 1: ITIL Overview and Challenges
  A High-Level ITIL Overview
    Change Management
    Incident Management
    Problem Management
  The Business Value of ITIL
    Efficient IT Benefits Business
      Customer Retention
      Improved Quality
      Greater Efficiency
    Better Communication
      Measurable Results
      Better Audit Outcomes
    Automation Boosts IT Efficiency
  ITIL Challenges
    ITIL Implementation Takes Time
    ITIL Implementation Requires Resources from Across the Enterprise
    ITIL Implementation Requires Understanding
    Baseline Data Must Be Collected
    Personnel Throughout the Enterprise Must Be Involved
    Integration with Other Frameworks Must Be Carefully Planned
  Getting Started With ITIL
    Implementing ITIL
      #1: Be Realistic; Start Small
      #2: Document, Document, Document!
      #3: Obtain Executive Support
  Summary
Chapter 2: Effective Change Management Through ITIL
  The Change Management Process
  Change Management Benefits
  Inputs, Outputs, and Relationships
    Inputs
    Outputs
    Relationships
    About RFCs
  Planning
  Why Do We Need to Create the CMDB?
  What Should Be in the CMDB?
  Why Is Automation Important?
    Automation Has Positive Business Impact
    Automation Tool Features
    Avoid Common Pitfalls
  Costs
    People Costs
    Technology Costs
  Measuring Success
    Change Efficiency Rate
    Change Success Rate
    Change Reschedule Rate
    Change Incident Rate
    Other Useful Metrics
  Summary
Chapter 3: Effective Incident and Problem Management Through ITIL
  Incidents
  Problems
  Errors
  Relationship Between Incident and Problem Management
  Why Is Incident Management Important?
  The Incident Management Process
    Incident Reporting
    Classification and Initial Support
    Matching
    Investigation and Diagnosis
    Resolution and Recovery
    Incident Closure
  Incident Management Benefits
  Incident Management Inputs, Outputs, and Relationships
    Outputs
    Relationships
  Measuring Incident Management Success
    Incident Resolution Efficiency Rate
    Customer Incident Impact Rate
    Incident Reopen Rate
    Incident Labor Utilization Rate
  Why Is Problem Management Important?
  The Problem Management Process
    Problem Control
    Error Control
    Proactive Problem Management
    Information Generation
  Problem Management Benefits
  Inputs, Outputs, and Relationships
    Outputs
    Relationships
  Putting Incident Management and Problem Management into Action
  Costs
    People Costs
    Technology Costs
  Measuring Problem Management Success
    Customer Impact Rate
    Incident Repeat Rate
    Problem Labor Utilization Rate
    Problem Reopen Rate
    Problem Resolution Rate
    Problem Workaround Rate
  Summary
Chapter 4: Supporting Compliance Through ITIL
  IT Compliance Is Relatively Young
  Frameworks Support Compliance
  ITIL Has Been Validated
  ITIL Service Management Supports Compliance
  SOX Mapping to ITIL Service Management
  ITIL Supports Compliance with Many Laws and Regulations
  Compliance with Policies and Procedures
  ITIL Supports Compliance and Improves Business
    Change Management
    Incident Management
    Problem Management
  Compliance Requires Accountability: ITIL Establishes Accountability
  Summary
Chapter 5: Roadmap for Successful ITIL Service Support Implementation
  Getting Ready
    Realizing Improvements Are Needed
    Get Executive Support
    Choose Team Members
    Create Mission Statements
  Perform a Baseline Assessment
    Identify Stakeholders
    Determine Current Situation
    Identify Trouble Spots
    Perform Benchmarks
  Planning
    Document the Business Case
    Set Goals
    Create the Implementation Plan
    Create Policies
    Identify Responsibilities
  Implementation
    Train Personnel
    Implement the Plan
    Use Tools to Manage Change
  Measurement
    Review Status
    Measure Goals
    Measure Changes
    Document Problems and Vulnerabilities
    Plan for Ongoing Management
  Summary
Copyright Statement
© 2007 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtimepublishers.com, Inc. (the "Materials") and this site and any such Materials are protected by international copyright and trademark laws. THE MATERIALS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials. The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, noncommercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice. The Materials may contain trademarks, services marks and logos that are the property of third parties. You are not permitted to use these trademarks, services marks or logos without prior written consent of such third parties. Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners.
If you have any questions about these terms, or if you would like information about licensing materials from Realtimepublishers.com, please contact us via e-mail at info@realtimepublishers.com.
Chapter 1: ITIL Overview and Challenges
[Editor's Note: This eBook was downloaded from Realtime Nexus, The Digital Library. All leading technology guides from Realtimepublishers can be found at http://nexus.realtimepublishers.com.]
- New Technology Threats
- Poor Communication Interfaces
- Development Errors that Go into Production
- Ineffective Change Management Processes
- Personnel Reluctance to Change
The possibilities are endless. Implementing yet more disconnected processes and procedures alone cannot efficiently address these challenges.
Recently in the U.K., JPMorgan Chase used ITIL to streamline their IT service desk. They have seven sites with more than 700 people that handle 3 million IT service calls per year. In 2004, they merged with Bank One and had to consolidate dozens of IT tools. Before ITIL, four incident-management tools, 14 change-control systems, four knowledge management tools, and 25 request tools were used. After using ITIL to consolidate processes, JPMorgan Chase now has just one incident-management tool, one change-control system, one knowledge management tool, and four request tools. The service desk maintains 93% customer satisfaction ratings and a 75% first-call resolution rate. (Source: ComputerworldUK, April 23, 2007, http://www.computerworlduk.com/management/itbusiness/it-organisation/news/index.cfm?newsid=2689.)
Old management styles that were once used strictly within centralized, single-system computing environments don't work in today's highly diverse and decentralized environments. Within such complexity, errors happen easily, and even small failures can impact the entire business. A single hardware problem can impact multiple virtual machines. For example, if an event console cannot accurately perform root cause analysis, that single hardware problem could be reported as multiple faults, making it extremely difficult to identify the underlying error and fix the problem.
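To make the root cause example concrete, here is a minimal Python sketch that groups faulting virtual machines by the physical host they share. The topology in `VM_TO_HOST`, the machine names, and the `correlate_faults` function are all hypothetical and exist only for illustration; a real event console would draw on much richer topology data.

```python
from collections import defaultdict

# Hypothetical topology: which physical host each virtual machine runs on.
VM_TO_HOST = {
    "vm-web-01": "host-a",
    "vm-web-02": "host-a",
    "vm-db-01": "host-a",
    "vm-mail-01": "host-b",
}

def correlate_faults(faulting_vms, vm_to_host):
    """Group faulting VMs by the physical host they share.

    A known host with more than one faulting VM is flagged as a probable
    single root cause rather than several unrelated faults.
    """
    by_host = defaultdict(list)
    for vm in faulting_vms:
        # VMs missing from the topology are grouped under "unknown".
        by_host[vm_to_host.get(vm, "unknown")].append(vm)
    return {
        host: {
            "vms": sorted(vms),
            "probable_root_cause": host != "unknown" and len(vms) > 1,
        }
        for host, vms in by_host.items()
    }

report = correlate_faults(["vm-web-01", "vm-db-01", "vm-mail-01"], VM_TO_HOST)
```

In this sketch, the two faults on `host-a` collapse into one probable hardware issue, while the lone fault on `host-b` is left as a separate event.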
According to IDC research, "Eighty percent of IT system outages are caused by operator and application errors" (Source: Behr, Kim and Spafford. The Visible OPS Handbook. Information Technology Process Institute. Eugene, Oregon. 2006. pg. 10).
ITIL V3 is currently under review and is reportedly planned for release in 2007.
This guide will look at how the ITIL Service Support processes can be applied to businesses. More specifically, it will explore Change Management, Problem Management, and Incident Management.
Change Management
To handle IT changes efficiently and effectively, there must be one centrally managed Change Management process. The Change Management process must be integrated throughout the entire applications and systems development life cycle (SDLC). Activities that must be managed to process changes include the following, shown in the order they occur:
1. Recording: Ensuring all change sources can submit Requests for Change (RFCs) and
resources, and involving the change advisory board (CAB) where necessary.
5. Coordination: Scheduling, development, testing, and implementation.
6. Evaluation and closure: Determining success and learning from the experience.
ITIL Change Management Glossary
ITIL uses many terms that may not be familiar to those of you new to this methodology. To assist with the discussion of Change Management within this and subsequent chapters, the following list highlights common terms that you will see when ITIL Change Management concepts are discussed:
Change Manager: One of the two authorities within Change Management. This is the position that is responsible for sorting, receiving, and classifying all Requests for Change (RFCs).
Change Advisory Board (CAB): The second of the authorities within Change Management. This is a type of consulting group that meets on a regular basis to review, assess, prioritize, and plan changes.
Configuration Items (CIs): IT components and the services provided with them. Examples include computer hardware, computer software, network components, servers of all types, procedures, documentation, and all other components that the IT area controls.
Configuration Management Database (CMDB): Used to track all the IT components, including the version, status, and relationships for each. All CIs are part of the CMDB.
Process Scope: This is determined in conjunction with the scope of the Configuration Management and Release Management processes. Determining the scope of Configuration Management is dynamic and can change as associated actions occur and as the information within the CMDB changes.
Request for Change (RFC): This is used to propose a change to any component of the IT infrastructure or any part of an IT service. An RFC can be a document or record used to enter the details, justification, and authorization for the proposed change.
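The relationship between CIs, the CMDB, and an RFC can be sketched in a few lines of Python. This is an illustrative model only: the class names, field names, sample CIs, and the `impacted_cis` helper are assumptions made for this example and do not come from ITIL or from any particular CMDB product. The sketch shows how the relationships recorded in a CMDB let you see which other CIs a proposed change might affect.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ConfigurationItem:
    name: str
    version: str
    status: str
    depends_on: list = field(default_factory=list)  # names of other CIs

@dataclass
class RequestForChange:
    target_ci: str
    details: str
    justification: str
    authorized_by: Optional[str] = None  # empty until the change is authorized

def impacted_cis(cmdb, target):
    """Return names of CIs that depend, directly or indirectly, on target."""
    impacted, frontier = set(), [target]
    while frontier:
        current = frontier.pop()
        for ci in cmdb.values():
            if current in ci.depends_on and ci.name not in impacted:
                impacted.add(ci.name)
                frontier.append(ci.name)
    return sorted(impacted)

# Hypothetical CMDB contents, keyed by CI name.
cmdb = {
    ci.name: ci
    for ci in [
        ConfigurationItem("db-server", "9.2", "live"),
        ConfigurationItem("billing-app", "3.1", "live", ["db-server"]),
        ConfigurationItem("billing-docs", "3.1", "live", ["billing-app"]),
        ConfigurationItem("mail-server", "6.5", "live"),
    ]
}

rfc = RequestForChange("db-server", "Apply vendor patch", "Fixes a stability defect")
affected = impacted_cis(cmdb, rfc.target_ci)
```

Here a change to `db-server` reaches `billing-app` and, through it, `billing-docs`, while `mail-server` is unaffected. This is the kind of information a Change Manager or CAB could weigh when assessing an RFC.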
[Figure: Change Management process flowchart. Recoverable labels: "Urgent?" (Yes), "Coordination", "Working as planned?" (No/Yes), "Evaluation".]
Incident Management
Incident Management is responsible for managing all incidents from detection and recording through resolution and closure. Incident Management is reactive. Its objectives are to reduce or eliminate the business impact of actual or likely disturbances within IT services so that personnel can get back to work and business can return to normal as soon as possible. The types of activities that occur within Incident Management include the following, shown in the order they occur:
1. Incident acceptance and recording: Detect or report an incident and then create an incident record.
2. Classification and initial support: Code the incident by type, status, impact, urgency, priority, service level agreement (SLA), and so on. Provide temporary workarounds as applicable.
3. Service request: If necessary, implement the appropriate procedure to request IT services.
4. Matching: Determine whether the incident is known and if there is a workaround.
5. Investigation and diagnosis: If a known solution to the incident does not exist, then investigation occurs.
6. Resolution and recovery: After finding a solution, the issue is resolved.
7. Closure: If the user is satisfied with the solution, the incident is closed.
Progress monitoring and tracking activities occur after each of these steps. During these activities, the incident cycle is monitored to determine how quickly it can be resolved and whether escalation is necessary.
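The classification, matching, and monitoring steps above can be sketched as follows. This is a hedged illustration, not an ITIL-defined algorithm: the 1-to-3 impact and urgency scales, the priority rule, the status strings, the `KNOWN_WORKAROUNDS` catalogue, and the function names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalogue of known incidents and their workarounds.
KNOWN_WORKAROUNDS = {
    "printer offline": "Restart the print spooler service.",
}

@dataclass
class Incident:
    summary: str
    impact: int   # assumed scale: 1 = high, 3 = low
    urgency: int  # assumed scale: 1 = high, 3 = low
    status: str = "recorded"
    workaround: Optional[str] = None

    @property
    def priority(self) -> int:
        # Assumed rule: combine impact and urgency; 1 is the top priority.
        return self.impact + self.urgency - 1

def match(incident, known):
    """Matching step: attach a workaround if the incident is already known."""
    incident.workaround = known.get(incident.summary.lower())
    incident.status = "matched" if incident.workaround else "under investigation"
    return incident

def needs_escalation(incident, hours_open, sla_hours):
    """Progress check: escalate when an open incident exceeds its SLA window."""
    return incident.status != "closed" and hours_open > sla_hours

printer = match(Incident("Printer offline", impact=3, urgency=2), KNOWN_WORKAROUNDS)
outage = match(Incident("Email server down", impact=1, urgency=1), KNOWN_WORKAROUNDS)
escalate = needs_escalation(outage, hours_open=5, sla_hours=4)
```

The known printer incident is matched to its workaround immediately, while the unknown email outage moves to investigation and, having exceeded its assumed four-hour SLA window, is flagged for escalation.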
ITIL Incident Management Glossary
To assist with the discussion of Incident Management within this and subsequent chapters, the following list highlights common terms that you will see when ITIL Incident Management concepts are discussed:
Incident: "Any event which is not part of the standard operation of a service and which causes, or may cause, an interruption to, or a reduction in the quality of that service" (Source: The ITIL Open Guide site at http://www.itlibrary.org/index.php?page=Incident_Management on May 6, 2007).
Request for Change (RFC): This is used to request a change to the IT infrastructure or any IT service. An RFC can be a document or record used to enter the details, justification, and authorization for the proposed change.
Service Request: A request for a change to be made to an IT service. A Service Request should be made under strict, well-defined procedural controls, making it almost risk free. Examples include establishing a new network user ID and transferring a computer from one department to another.
[Figure: Incident Management process flowchart. Recoverable labels: "Classification and initial support", "Service request?" (Yes/No), "Follow urgency procedures", "Matching".]
Problem Management
So how is a problem different from an incident? Generally, a problem is an unwanted or undesirable situation that, if not addressed soon enough, can become the root cause of an incident. Problem Management takes the entire IT infrastructure into account, using all available information to identify existing and potential failures in the delivery of IT services. Problem Management supports Incident Management by providing alternative workarounds and temporary fixes during an incident but does not have responsibility for actually resolving incidents. Problem Management also involves the analysis of incidents and problems to identify trends and then takes proactive actions to prevent the further occurrence of similar incidents and problems. The types of activities that occur within Problem Management include:
1. Problem identification and recording: Identifying known and new problems and
and then closing the records. During each of the first four steps, actions are taken to track and monitor the problem, ensuring clear and comprehensive documentation is maintained. Likewise, during each of steps five through eight, actions are taken to track and monitor the error.
ITIL Problem Management Glossary
To assist with the discussion of Problem Management within this and subsequent chapters, the following list highlights common terms that you will see when ITIL Problem Management concepts are discussed:
Problem: A description of an unwanted situation that specifies the root cause of one or more existing or potential incidents.
Known Error: A problem that has a documented root cause and a workaround.
Post-Implementation Review (PIR): A review that occurs after a change or a project has been implemented, determines whether the change or project was successful, and identifies improvement opportunities.
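The trend analysis that Problem Management performs on incident data can be illustrated with a small Python sketch. The incident categories, the threshold of three, and the `problem_candidates` function are hypothetical choices for this example; in practice the threshold and the grouping criteria would depend on the organization.

```python
from collections import Counter

def problem_candidates(incident_categories, threshold=3):
    """Return categories whose incident count meets or exceeds the threshold.

    Recurring incidents in one category suggest an underlying problem
    worth investigating for a root cause.
    """
    counts = Counter(incident_categories)
    return sorted(cat for cat, n in counts.items() if n >= threshold)

# Hypothetical incident log, reduced to one category label per incident.
incident_log = ["vpn", "email", "vpn", "printer", "vpn", "email"]
candidates = problem_candidates(incident_log)
```

With this sample log, only the `vpn` category recurs often enough to be raised as a candidate problem.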
[Figure: Problem Management process flowchart. Recoverable labels: "Problem classification", "Problem investigation and diagnosis", "Error assessment", "RFC and problem resolution and closure".]
These inefficiencies and negative impacts can be reduced, and most even eliminated, using ITIL.
Efficient IT Benefits Business
ITIL promotes efficient and effective IT practices, which in turn benefits business. All these benefits help to ensure IT becomes a global, efficient, cost-effective, seamless part of the business enterprise.
Customer Retention
IT services become more customer-focused, making your external customers happier and promoting customer loyalty and retention by reducing IT problems that noticeably impact customers. Agreements about IT service quality also improve the relationships with customers. IT services are described more clearly and accurately, in better detail, and in customer language. The customers who depend upon your IT services as part of the product or service they purchased expect that your organization will put them first when they experience problems. If your IT services are not customer-focused when problems occur, you will quickly find yourself on the front page of international news sites, not only concerning your other customers but also keeping potential customers away.
Avoid headlines like these from the May 16, 2007 abcnews.com Web site and widely discussed on Good Morning America (http://abcnews.go.com/GMA/Technology/story?id=3179394&page=1): "Dell Hell: Computer Giant Faces Consumer Lawsuit and Consumers Allege They Didn't Get the Tech Support They Paid For."
Improved Quality
IT service quality, availability, reliability, and costs are managed better when properly using ITIL, saving time, money, and resources and resulting in better justification for costs related to IT service quality. There are many factors that contribute to this quality improvement:
• Documented roles and responsibilities improve the quality of IT service provisioning.
• Following repeatable, consistent processes that are engineered specifically to support business reduces human errors, resulting in better quality output and outcomes.
• Quality management systems based on ISO 9000 and BS15000 are supported.
As a case in point, United Space Alliance, the largest contractor for the NASA space shuttle program, implemented an integrated asset and service management system using ITIL (Source: http://www-306.ibm.com/software/success/cssdb.nsf/CS/LWIS6ZSLKQ?OpenDocument&Site=software&cty=en_us, May 16, 2007). As a result, they measurably improved service quality and efficiency for their 50,000+ hardware assets and 100,000+ software assets by establishing real-time incident, problem, and change management capabilities. Many organizations are now outsourcing critical IT processes. The IT process structure resulting from ITIL also provides a framework to facilitate more effective outsourcing of IT service elements, allowing the organization to realize better quality from their outsourced vendor. A higher-quality IT service support function brings with it more agile and efficient IT service support, which enhances competitiveness and ultimately improves business.
Greater Efficiency
Commonly used IT processes are better integrated when ITIL is successfully implemented. Rework is reduced and redundant work eliminated by centralizing IT processes. Not only does this result in IT processes having improved scalability and consolidation, but the IT area is more clearly structured, more efficient, and better focused on corporate objectives. Because ITIL is business focused, IT is also better integrated with other business processes throughout the enterprise. Better integration results in an improved utilization of IT resources. Changes within the IT infrastructure are easier to manage. Clearly identifiable reference points for internal communications and external communications with vendors and business partners are created, allowing for the effective standardization of procedures. During mergers and acquisitions, IT installations that may have wide-scale differences are consolidated into one coherent management structure by using ITIL concepts and frameworks.
Better Communication
The use of ITIL concepts produces agreed-upon, consistent points of contact within the IT areas, which improves communication. Because of the emphasis on documentation, continuous documented learning occurs from IT experiences, helping to prevent mistakes from recurring. As a result of improved communication, IT services better meet business, customer, and user demands and realize improved performance for the IT service delivery and service support areas.
Measurable Results
Using ITIL, demonstrable performance indicators are created and used to support the business. By being able to monitor and respond to these indicators, mission-critical IT services have improved availability, reliability, and security. The centralization and efficiency impacts measurably reduce latency at every stage of the IT management cycle, dramatically reducing costs. Reducing latency improves IT project deliverables and delivery times. Baseline IT metrics and ongoing measurements become part of the business as an effect of ITIL implementation.
How significant are the savings that can result from ITIL? Consider Transporeon, a leading European e-logistics solution provider. Transporeon implemented ITIL and an IT process automation system and improved their IT staff's productivity in addition to reducing their overall maintenance cost by more than 40% (Source: http://www.opsware.com/Downloads/CS_2007_05_PAS_Transporeon.pdf on May 16, 2007). A side effect was freeing key resources to allow for faster response times that increased customer satisfaction.
Better Audit Outcomes
ITIL makes audits easier by providing better documentation and up-to-date metrics that the auditors can use instead of having to create the metrics themselves from numerous and disparate documents. ITIL allows IT management system audits to be more favorable and take less time. Because IT infrastructure and related services are better controlled, fewer audit findings result. Security controls based on COBIT, which auditors overwhelmingly use, are supported. Compliance requirements for the U.S. Sarbanes-Oxley Act, as well as other laws and regulations, are supported by ITIL.
Automation Boosts IT Efficiency
The efficiency of ITIL implementation can be improved with automation. Many critical processes can be automated to streamline business. Automation can be seamlessly integrated using products specifically engineered to complement the ITIL processes.
Automation makes personnel work easier and less training is needed to accomplish the ITIL objectives.
Good, effective tools allow automation to use the same data model, security model, and other IT models your organization has adopted; automation just makes them timelier, more efficient, more consistent, and more likely to be error-free.
As a case in point, consider EDS. EDS has one of the world's largest IT organizations. They recently automated more than 65,000 servers across more than 400 worldwide locations in support of their ITIL process. As a result, they reduced costs and improved efficiencies in their IT organization by automating the complete life cycle of business application management and the underlying infrastructure. The EDS example points out that ITIL can support a global scale of deployment. Automation solutions must be able to scale as large as possible. Automating ITIL processes does not occur overnight; automation must be built in to solutions. In addition to supporting more successful and efficient global deployment, automation allows IT service support and delivery processes to be delivered more quickly, saving huge amounts of time and ultimately reducing the resource costs involved with performing actions manually.
As an example, consider the BNSF Railway. In 2005, BNSF became one of the first companies in the railway industry to deploy an extensive wireless network and automate the management of its complex IT infrastructure. Prior to automation, the network engineers would log on to each device manually to make configuration changes, which took significant time. Automation allowed the BNSF network engineers to securely automate password and SNMP community string management, deployment of access control lists (ACLs), and configuration change tracking. Now, BNSF pushes out changes as a batch and uses an automated policy compliance manager to ensure that all deployed changes match the company's required security and compliance policies.
As Greg Britz, Network Operations Manager, BNSF Railway said, "Automation will win every time over manual IT management as we begin to roll out new services and the network becomes more complex. We will handle configuration updates in milliseconds, compared to the minutes or hours that it took to configure systems manually" (Source: http://www.opsware.com/about/success_BNSF.php on May 8, 2007).
Automation reduces latency at every stage of the management cycle to dramatically reduce costs.
ITIL processes and solutions must be living. The ITIL rules and capabilities, compliance audits, and other processes must be updated and changed whenever necessary with the least impact. In addition, solutions must be able to self-correct as much as possible and notify administrators when the rules have been changed and when they need to change. Automation allows these living changes to occur much more quickly and efficiently than can be accomplished manually.
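To make the idea of an automated policy compliance check concrete, here is a minimal sketch in Python. It is not the tool BNSF or any vendor uses; the rule names, the configuration keys (`snmp_community`, `acls`), and the device names are all hypothetical. It shows only the general pattern: a batch of planned configurations is checked against a list of rules that can be edited as policies change, and violations are reported so that administrators can be notified.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PolicyRule:
    name: str
    check: Callable[[dict], bool]  # returns True when the config complies


# Hypothetical rules for illustration; a real policy set would be maintained
# by the organization and updated as requirements change.
RULES = [
    PolicyRule(
        "no default SNMP community",
        lambda cfg: cfg.get("snmp_community") not in {"public", "private"},
    ),
    PolicyRule("ACL present", lambda cfg: bool(cfg.get("acls"))),
]


def audit_batch(batch: dict, rules=RULES) -> dict:
    """Return, per device, the names of the rules its planned config violates."""
    violations = {}
    for device, cfg in batch.items():
        failed = [rule.name for rule in rules if not rule.check(cfg)]
        if failed:
            violations[device] = failed
    return violations


# Hypothetical batch of planned configuration changes.
batch = {
    "router-1": {"snmp_community": "public", "acls": ["mgmt-only"]},
    "switch-7": {"snmp_community": "x9!k", "acls": ["mgmt-only"]},
}
report = audit_batch(batch)
```

Because the rules are plain data, adding or retiring a rule does not require changing the audit logic, which is one simple way to keep the checks "living."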
ITIL Challenges
The benefits of ITIL can be realized only if ITIL is used correctly. Organizations face similar challenges using ITIL and must be aware of the common mistakes. Avoid these mistakes by understanding and using ITIL components according to the needs of the business that the IT organization supports.
ITIL Implementation Takes Time
Bringing ITIL into the enterprise can take a long time and requires significant coordination and effort. It may very well require a change to the culture of the organization. Being overly ambitious in bringing ITIL into the organization could be frustrating if objectives are not met.
ITIL Implementation Requires Resources from Across the Enterprise
Without sufficient resources, training, support tools, and time, ITIL will not be implemented to the degree at which it can have the most positive business impact. When ITIL is being introduced, additional resources and personnel may be needed until it is well established.
ITIL Implementation Requires Understanding
Implementing processes without understanding them will not result in any improvements. Those using ITIL, throughout the enterprise, must understand what the appropriate performance indicators are and how to control the processes.
Baseline Data Must Be Collected
Baseline data must be established to be able to measure impacts and improvements. If no baseline data is collected, improvements in the provisioning of services and cost reductions cannot be measured, and business leaders will not know, in quantitative terms they understand, the value ITIL brought to the organization.
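As a small illustration of why the baseline matters, the following Python sketch compares current measurements against baseline values and reports the percentage change for each metric. The metric names and numbers are invented for the example; they are not taken from any organization discussed in this guide.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percentage change from baseline to current (negative means a reduction)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


# Hypothetical metrics captured before and after an ITIL rollout.
baseline = {"mean_hours_to_resolve": 8.0, "failed_changes_per_month": 20}
current = {"mean_hours_to_resolve": 6.0, "failed_changes_per_month": 15}

report = {
    metric: round(percent_change(baseline[metric], current[metric]), 1)
    for metric in baseline
}
```

Without the `baseline` values there is nothing to divide by, which is the quantitative form of the point above: an improvement that was never measured at the start cannot be demonstrated later.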
Personnel Throughout the Enterprise Must Be Involved
Successful implementation requires the participation of personnel at all levels, throughout the entire enterprise. If you try to have one department or team implement ITIL, it may very well isolate that group, and a direction may subsequently be set that the rest of the enterprise does not accept or follow.
Integration with Other Frameworks Must Be Carefully Planned
Implementing ITIL with other frameworks, such as COBIT, Six Sigma, and CMMI, is possible. In fact, one framework alone will not meet the wide range of business needs and processes, so multiple chosen frameworks should be used in harmony. However, harmony between ITIL and other frameworks cannot be achieved by trying to force incompatible components to fit; this will lead to frustration and failure, and full ITIL value will not be realized. Certainly, multiple frameworks have compatible components, but careful analysis must occur to determine which components are truly good fits within your particular organization.
To most successfully implement ITIL, IT organizations can implement an integrated technology solution that addresses the issues previously discussed. Automating the ITIL processes will allow the IT department to more successfully:
• Map services to business needs
• Measure key performance indicators (KPIs) and the actual end-user experience
• Manage IT components across all organizational systems and networks
When choosing a technology solution to support ITIL, organizations should require the solution to support all aspects of IT service support in a unified manner so that use of the product does not end up being counter to ITIL principles.
Implementing ITIL
Expect to have at least some internal resistance to implementing ITIL. There will always be people who would rather stay an ineffective course than put in the time and effort necessary to follow a new path. ITIL initiatives can be successfully championed and initiated by following three simple principles.
#1: Be Realistic; Start Small
Identify the areas within the IT infrastructure where there is the least efficiency, the most problems, and the most user dissatisfaction. Implement ITIL to address those areas. This will give you experience with the ITIL processes while addressing the most significant and business-impacting IT problem areas.
#2: Document, Document, Document!
You will not be able to demonstrate or communicate the value of IT to the business if it is not well documented. Initial measurements and benchmarks must be accurately and consistently documented to validate improvements. Accurately and consistently tracking KPIs will allow IT to continuously measure progress and report this progress to business leaders over time. This is critical for successful and effective management as well as for establishing accountability for business unit and executive management.
#3: Obtain Executive Support
Organizational changes are almost always difficult. It is human nature to want to continue using known processes, even if they are bad or ineffective, instead of learning and implementing something new. Executive support and sponsorship is necessary for successful ITIL implementation, just as it is with any other major enterprise initiative. IT personnel will need to do significant work to implement ITIL processes, so executive support is a must, and its value cannot be overestimated. When executive leaders understand the positive impact ITIL can have upon business, they will support the training and other investments necessary to ensure successful and efficient ITIL implementation.
Summary
By relating IT infrastructure to business value, ITIL also helps demonstrate to business leaders the value of IT, supporting IT investments and initiatives. ITIL integrates data to provide a comprehensive cross-tier representation for IT services, bringing better understanding to business leaders throughout the enterprise for how IT supports business success. ITIL is a powerful set of guidelines that enables IT to deliver greater value and better align itself with business needs in efficient and valuable ways. Organizations implementing the ITIL framework must do so with a clearly defined sense of purpose and realize that, as with most ambitious business objectives, it will be achieved one step at a time. ITIL addresses IT service support challenges, as the following examples illustrate:
Change Management
• Helps prevent problems and incidents that typically occur when IT changes are made to accommodate new technologies that are deployed throughout the enterprise
• Helps ensure that all IT infrastructure or device changes are implemented in a consistent, efficient, and repeatable way, which in turn will minimize IT services downtime resulting from errors and bad planning associated with the changes
• Helps to successfully combine and streamline multiple and diverse systems and applications during mergers and acquisitions
Incident Management
• Lessens the impacts of new technology threats, ensuring more efficient and effective recovery
• Helps restore business services and processes as quickly as possible; incidents are recorded within a central repository, enabling IT to most effectively utilize available skills and ensure important tasks are not overlooked during incident response
• With Problem Management, ensures an effective interface exists between the details of Incident Management and Problem Management systems to most effectively resolve the errors and root causes of incidents to keep the errors from recurring
Problem Management
• Helps to ensure that known errors from the development environment are communicated to the production environment
• Minimizes the impact that errors within the IT infrastructure have on the business and helps to prevent recurrence of incidents related to the errors
The upcoming chapters will discuss in detail how the ITIL Change Management, Problem Management, and Incident Management Service Support processes can be applied to help support business activities and goals. Each chapter will detail:
• The basic concept and objectives for the associated ITIL process
• The benefits of implementing the ITIL process
• How to get started with implementing the ITIL process
• Specific metrics for each ITIL process
• Ways to verify the ITIL process
• The costs of potential problems associated with each ITIL process
Chapter 4 will discuss in more detail how ITIL supports compliance. And, finally, Chapter 5 will tie it all together within a roadmap for successful ITIL implementation of the three processes discussed.
Chapter 2
Successful implementation of the ITIL Change Management process will result in many benefits to the business, including:
• Better estimates for proposed change costs
• Better management information about changes, which allows for better problem diagnosis
• Fewer reversed changes
• More smoothly executed back-outs
• Improved IT personnel productivity because of fewer distractions caused by emergency changes or back-out procedures
• Improved user productivity because of more stable IT services
• Better ability to make more frequent changes without creating an unstable IT environment
• Reduced adverse impacts of changes
How will you know if you are realizing these benefits? By maintaining Change Management metrics, which I discuss later in this chapter. However, first it is important to understand what is involved with the Change Management process to help you understand and appreciate the benefits measurements.
The CMDB data is critical for performing the change impact analysis.
Outputs
The outputs of the Change Management process include:
• The updated FSC
• Triggers to use for Configuration Management and Release Management
• CAB agenda, minutes, discussions, decisions, and action items
• Change Management reports
The FSC, sometimes called a Change Schedule, lists all approved changes and their planned implementation dates.
Figure 2.1 illustrates the inputs and outputs for the Change Management process.
[Figure 2.1: Inputs and outputs for the Change Management process. Labels shown: RFCs; CMDB data; Forward Schedule of Change (FSC); data from other processes; recording, accepting, evaluating, rejecting, and building; updated FSC; triggers; CAB documents.]
Relationships
Change Management has relationships with all the other ITIL processes. Managing these relationships appropriately is important for the success not only of Change Management but of all enterprise-wide ITIL processes. Figure 2.2 illustrates these relationships at a high level.
[Figure 2.2: Relationships between Change Management and the other ITIL processes. Labels shown: Incident Management, Configuration Management, Capacity Management, Problem Management, Availability Management, and Release Management, connected to Change Management by RFCs, change notices, CI relationships, and a PSA report.]
As this figure shows, Change Management activities impact all other ITIL processes in one way or another. It is important for effective communications channels to exist to communicate key activities. Let's step through an example to see how all these processes are related.
ACME Super Duper Supplies is going to implement a new e-commerce Web site that will allow for online merchandise ordering and payments for their new product, Magic Mover. This is a significant change in their IT infrastructure in addition to having a major impact on their business. The change must be implemented in a coordinated way to ensure all impacted areas are aware of the change, and that any potential negative impacts are minimized as much as possible. By following ACME's Change Management process, Ms. Flint, the manager of the Magic Mover business unit, will help ensure the change is implemented as successfully as possible.
• Ms. Flint submits an RFC for the change to the Change Management team.
• The ACME Change Manager gives the RFC to the CAB, which approves the change.
• The Change Manager works closely with the Configuration Management area, which provides data from the associated CMDB to identify the relationships of the configuration item (CI) associated with adding Magic Mover to the site and to determine what is affected by the change.
• The CAB works with the Availability Management team to estimate the potential impact of making the changes to add the Magic Mover to the e-commerce Web site. Availability Management will in turn make the changes necessary to help maintain service availability, which may be affected as a result of the changes.
• The Change Manager notifies the Incident Management, Problem Management, and Release Management teams of the planned change so that they can determine how this change will affect them.
• The Change Manager sends a report to the Service Level Management team that lists the changes that will need to be made to the SLAs along with the impact of the FSC on service availability.
• The Change Manager communicates the change details to the Capacity Management team so that they can determine the cumulative effects, over an extended time, of adding the Magic Mover item to the e-commerce Web site. They may find that response time will be impacted and that more processing power is necessary.
• The Change Manager and the CAB work closely with the IT Service Continuity Management team to ensure it is aware of all the changes that will be made as a result of adding the Magic Mover to the Web site and to determine how this will impact the existing recovery plans. The team can then ensure that the appropriate steps are taken to update the plans so that recovery can be completed successfully.
Figure 2.3 shows how the Change Management process would flow for the Magic Mover Web site implementation.
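[Figure 2.3: Change Management process flow for the Magic Mover Web site implementation. Recoverable steps: Ms. Flint submits an RFC to add the Magic Mover to the e-commerce site; a decision point asks whether the change is urgent (Yes/No); on the "No" path, the CAB creates the change plan and waits for approval, and the Magic Mover team works with the Change Management area on implementation.]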
Table 2.1 provides the high-level descriptions of the relationships between Change Management and the other ITIL processes that I pointed out in the Magic Mover scenario. These relationships will be similar for any type of change.
ITIL Process | Relationship with Change Management
Availability Management | Helps to estimate the potential impact of changes and determines how a change could affect the availability of a service
Capacity Management | Works with Change Management to determine how a change would impact a service and the availability of resources over an extended period of time
Configuration Management | Controls change recording and change impact analysis and keeps track of the relationships between the CI and other CIs; ITIL Service Support guidance recommends integrating with Change Management
Incident Management | Requests changes to repair the impacts of incidents; also takes information from change notices to identify and repair any impacts from those changes
IT Service Continuity Management | Must be aware of changes that could make continuity plans unfeasible or unnecessary and updates plans accordingly
Problem Management | Must be aware of changes to be able to identify new errors that result in new problems; must also communicate change requests to fix errors
Release Management | Change Management controls rollouts of new releases
Service Level Management | Helps to determine the impact of changes on services and business processes; discusses change impacts with customers as appropriate
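The relationships in Table 2.1 can be thought of as a routing table: for any change, each process has a defined follow-up. The Python sketch below encodes that idea in a deliberately simple form. The action wording is a paraphrase of the table, and the function name and message format are assumptions made for illustration, not part of ITIL or of any product.

```python
# Follow-up actions per ITIL process, paraphrased from Table 2.1.
FOLLOW_UPS = {
    "Availability Management": "estimate the impact on service availability",
    "Capacity Management": "assess the long-term effect on resources",
    "Configuration Management": "record the change and analyze CI relationships",
    "Incident Management": "review the change notice for possible incidents",
    "IT Service Continuity Management": "update continuity plans if needed",
    "Problem Management": "watch for new errors introduced by the change",
    "Release Management": "plan the rollout under Change Management control",
    "Service Level Management": "assess the impact on services and SLAs",
}


def change_notices(change_summary: str) -> list:
    """Build one notice per process for a given change (hypothetical format)."""
    return [
        f"{process}: please {action} for '{change_summary}'"
        for process, action in FOLLOW_UPS.items()
    ]


notices = change_notices("Add Magic Mover to the e-commerce site")
```

Keeping the mapping in one place makes it easy to verify that no process is forgotten when a change is announced.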
About RFCs
RFCs come from many different sources, as represented in Figure 2.4.
[Figure 2.4: Sources of RFCs. Labels shown: IT personnel, legislation, customers, project management, suppliers, and Problem Management.]
RFCs can contain a wide variety of information depending upon your own unique organization, business, technologies, and so on. A few examples of the types of information to collect on RFCs include:
• Requestor's name, location, phone number, email address
• Submission date
• RFC identification number
• Problem number
• CI to be changed
• Description of change
• Justification and business benefit for change
• Estimated resources
• Timeframes
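The example fields above map naturally onto a simple record type. The following Python sketch is one possible shape, not a prescribed ITIL schema: the class name, field names, and the choice to treat the problem number as optional are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RFC:
    """Illustrative RFC record built from the example fields listed above."""
    rfc_id: str
    requestor_name: str
    requestor_location: str
    requestor_phone: str
    requestor_email: str
    submitted: date
    ci_to_change: str
    description: str
    justification: str
    estimated_resources: str
    timeframe: str
    problem_number: Optional[str] = None  # assumed optional: not every RFC stems from a problem

    def missing_fields(self) -> list:
        """Names of required fields that were left empty."""
        return [
            name
            for name, value in vars(self).items()
            if name != "problem_number" and value in ("", None)
        ]


# Hypothetical RFC with the description left blank.
rfc = RFC(
    rfc_id="RFC-0001",
    requestor_name="Ms. Flint",
    requestor_location="HQ",
    requestor_phone="555-0100",
    requestor_email="flint@example.com",
    submitted=date(2007, 5, 16),
    ci_to_change="ecommerce-site",
    description="",
    justification="Enable online ordering for Magic Mover",
    estimated_resources="2 developers, 3 weeks",
    timeframe="Q3",
)
incomplete = rfc.missing_fields()
```

A completeness check like `missing_fields` gives Change Management an objective basis for returning an RFC to the requestor before it reaches the CAB.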
The RFC will be recorded when submitted. From the information on the RFC, Change Management will be able to determine whether the request will be treated as a service request, treated as a change, or denied. This categorization is useful because it sorts out the service requests so that the CAB does not need to spend valuable time considering them. Change Management also makes an initial decision to deny RFCs that do not make sense or are impractical, incomplete, or unnecessary; this saves additional time for the CAB. If the CAB accepts an RFC, they give it a priority and determine the category to put it in.
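The initial sorting step described above can be sketched as a small triage function. This is only an illustration in Python: the dictionary keys and the `preapproved_service` flag are hypothetical, and a real Change Management team would apply its own criteria for what counts as incomplete or as a service request.

```python
from enum import Enum


class Disposition(Enum):
    SERVICE_REQUEST = "service request"
    CHANGE = "change"
    DENIED = "denied"


def triage(rfc: dict) -> Disposition:
    """Initial Change Management triage before an RFC reaches the CAB."""
    # Incomplete requests are denied up front so the CAB never sees them.
    if not rfc.get("description") or not rfc.get("justification"):
        return Disposition.DENIED
    # Hypothetical flag marking routine, pre-approved work.
    if rfc.get("preapproved_service"):
        return Disposition.SERVICE_REQUEST
    # Everything else is a change for the CAB to prioritize and categorize.
    return Disposition.CHANGE


password_reset = {
    "description": "Reset user password",
    "justification": "User locked out",
    "preapproved_service": True,
}
new_site = {
    "description": "Add Magic Mover to the e-commerce site",
    "justification": "New product launch",
}
results = [triage(password_reset), triage(new_site), triage({"description": ""})]
```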
Planning
Change Management uses an FSC to keep track of when each change will occur. The FSC will inform the recipients of upcoming changes. The FSC should contain enough information for the person responsible for the change to determine whether the change is going to affect them. The FSC allows both the IT and business areas to schedule changes appropriately. The Change Manager may need to obtain the approval of IT management for major changes before submitting an RFC to the CAB. Approval for major changes is typically necessary for three issues:
• Business approval: The areas impacted by the change may need to provide approval.
• Financial approval: The IT area may need to perform a cost/benefit analysis and budget.
• Technical approval: The IT area will need to determine the impact, necessity, and feasibility of the change.
If these approvals are obtained, the CAB will help plan significant changes and act as an advisory committee. To help facilitate effective use of time and make the most informed decisions, the Change Manager should communicate the details of the RFC to CAB members prior to the CAB meeting.
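The three approvals for major changes amount to a simple gate in front of the CAB. The Python sketch below shows one way to express that gate; the function names and the idea of tracking approvals as a set of strings are assumptions made for illustration only.

```python
# The three approval types named in the text for major changes.
REQUIRED_APPROVALS = frozenset({"business", "financial", "technical"})


def outstanding_approvals(granted: set) -> set:
    """Approvals still needed before a major change can go to the CAB."""
    return set(REQUIRED_APPROVALS - granted)


def ready_for_cab(is_major: bool, granted: set) -> bool:
    """Non-major changes skip the gate; major ones need all three approvals."""
    return (not is_major) or not outstanding_approvals(granted)


pending = outstanding_approvals({"business", "technical"})
```

A check like this also gives the Change Manager a concrete list of what to chase before circulating the RFC details to CAB members.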
The situation would likely be quite different for a very large IT department with the same types of application troubleshooting problems. The resolution process would typically span the Help desk area, many different IT experts, the customer relationship area, and the related business unit managers. For a large organization, simply capturing all configuration details in the CMDB will not improve the coordination effort; too many details will confuse the different players, who only need to know some of them. Instead, this large organization will probably decide to share the many different infrastructure relationships across the Change Management team members. Their CMDB could then contain the most basic device configuration data, relationship information, and information that points to additional sources of more detailed information.
These two CMDB implementations are very different, but both provide benefits to business. They allow for faster application and system problem solving and more efficient and error-free changes. By performing thoughtful, enterprise-wide efforts to prioritize process improvements, take into consideration the players involved, and identify the data sharing and use needs throughout the enterprise, each organization will be able to determine the best items to put into its own CMDB.
The CMDB will provide a centralized enterprise repository of information that contains all the details related to the IT architecture. It will allow for a unified view of every IT component within the enterprise. This centralized, unified capability will allow all the business leaders to make better business, as well as technical, decisions. A CMDB will facilitate the capabilities for:
• Automated discovery
• Service-centric views
• Automated and out-of-the-box change processes
• Integration with other change management solutions
A single, centralized, all-encompassing CMDB should have all the key information available to the entire organization to track all the CIs in the system, map dependencies of CIs, track the status of CIs, determine the history of CIs, and track requests for change for CI verification. Some of the fields you will want to consider using as keys to track the status for each CI are listed in Table 2.2.
Key Field | Description
CI Identifier (ID) | The unique name used to identify the CI
CI Description | The description of the CI
CI ID Number | The unique number generated by the CMDB
CI Category | The category for the CI
Owner | The person responsible for the CI
Customer | The customer using the CI
Date Created | The date the CI was created
License Number | Software license number
Location | The physical location of the device
Make | Manufacturer
Model | Model name
Model Number | Model number
Part Number | Hardware part number
Relationship | How the CI is connected to other CIs; for example, Parent/Child, contained within another CI, using another CI, and so on
Relationship Number | CI IDs used to create the Relationship Number
Scheduled Maintenance | Date for the next scheduled maintenance, if applicable
Serial Number | Hardware serial number
Status | Information regarding if the CI is registered, accepted, rejected, under development, installed, and so on
Supplier | The vendor that supplied the component
Ticket Number | Ticket numbers related to this CI
Version Number | Software version number
These key fields will not only help to make Change Management more efficient; the data can also be used to help measure the success of Change Management activities and to automate some or all of the Change Management process.
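One way the relationship data in a CMDB supports automation is change impact analysis: given the CI being changed, follow the recorded relationships to find everything that depends on it. The Python sketch below uses a breadth-first traversal over a small, invented dependency map. The CI names and the dictionary layout are hypothetical and do not reflect any particular CMDB product.

```python
from collections import deque

# Hypothetical relationship data: each key is a CI, and the list holds the
# CIs that depend on it directly.
DEPENDENTS = {
    "db-server-01": ["order-service"],
    "order-service": ["ecommerce-site"],
    "payment-gateway": ["ecommerce-site"],
    "ecommerce-site": [],
}


def impacted_cis(changed_ci: str, dependents: dict) -> list:
    """Return every CI that directly or indirectly depends on changed_ci."""
    seen = set()
    queue = deque([changed_ci])
    while queue:
        ci = queue.popleft()
        for dependent in dependents.get(ci, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


impact = impacted_cis("db-server-01", DEPENDENTS)
```

In this example, a change to the database server reaches the e-commerce site through the order service, which is exactly the kind of indirect impact that is easy to miss without recorded relationships.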
A federated data model enables a CMDB to provide a single source of record for CIs.
• Coverage of the entire IT infrastructure: all servers, applications, network devices, storage locations, and so on
• Ability to maintain baseline configurations and generate reports against those baselines at any point in time
• Event management between all enterprise IT systems
• Dependency mapping of CIs to support business impact assessment, service desk activities, and event management consoles
• Ability to generate role-defined dashboards
• Software developer kits (SDKs) and application program interfaces (APIs) for data integration
• Configurable triggers for workflows and specific events, including sending notifications and creating incidents to investigate
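Reporting against a baseline configuration, one of the capabilities listed above, comes down to comparing two snapshots and listing what differs. The following Python sketch shows the core of such a comparison. The setting names and values are made up for the example, and real tools would compare far richer data.

```python
def config_drift(baseline: dict, current: dict) -> dict:
    """Map each changed, added, or removed setting to (baseline, current)."""
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        before, after = baseline.get(key), current.get(key)
        if before != after:
            drift[key] = (before, after)
    return drift


# Hypothetical snapshots of one device's configuration.
baseline = {"os_version": "5.1", "ntp_server": "10.0.0.1"}
current = {"os_version": "5.2", "ntp_server": "10.0.0.1", "telnet": "enabled"}

drift = config_drift(baseline, current)
```

A setting that appears on only one side shows up with `None` on the other, so unexpected additions are reported alongside changed values.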
A good and effective automation product that is successfully implemented within an organization can provide the following business benefits:
• Increased ITIL adoption throughout the enterprise for all eight ITIL processes
• ITIL implementation acceleration by making previously manual tasks automated and utilizing standardized ITIL processes, speeding up adoption and business-impact analysis
• Standardization of tasks and workflows throughout the enterprise and at all levels for noticeable positive day-to-day process impact
• Verification that the Change Management processes are being performed correctly and are effectively communicating with the other ITIL processes
• Improved communications between teams, resulting in a decrease in operational costs
• Granular configuration tracking, reconciliation, and auditing capabilities that result in more detailed and accurate compliance reports, enabling audits to be performed more quickly using historical change tracking, point-in-time references, and remediation capabilities
• More efficient provisioning and change management of new application services based on standardized configurations that allow for faster time to production
• Improved IT service availability, as a result of better change and configuration management processes that use automated workflows and federation to integrate across multiple enterprise IT sources
• Reduced hardware costs through server repurposing
• A single source for provisioning and compliance tasks that produces comprehensive tracking, streamlines processes, brings cross-silo teams together, reduces human errors, and results in lower operations costs
• More useful and valuable reports, through the ability to create real-time accurate reports showing the current state of the environment, comparisons to baseline data, and trending analysis on change activity based on CMDB data
Avoid Common Pitfalls
Organizations often fall into common pitfalls during the implementation of ITIL Change Management automation tools:
- Lack of executive sponsorship: Executive visibility and leadership buy-in is essential for describing political and silo concerns and for getting cooperation enterprise-wide
- Lack of integration: Lack of solution integration between the chosen vendor products and with existing third-party tools used within the enterprise
- Cross-silo configurations: Limited configuration capabilities for network components, server, applications, storage, and so on that span multiple enterprise silos without a federated data scheme
- Inability to calculate return on investment (ROI): Limited ROI discussion and long-term projection with only a narrow perspective on short-term cost savings
- Excessive deployment and professional services costs: Planning sessions considering deployment timelines and key ROI objectives with a quarter-to-quarter perspective and focus on the role of integrators for deployment customization and data, event, and interface integrations
- Poor ITIL alignment: Overlooked opportunities to standardize processes based on ITIL-defined process workflows to increase the impact of changes on IT services
- Lack of closed-loop processes: Many vendor solutions do not have a closed-loop change management process; without a closed-loop process, you will need to spend time and resources to integrate the products, which will be a very complicated task
Whatever automation tool you use, it should seamlessly integrate all Change Management processes so that you avoid spending costly in-house time making it fit with your other systems.
Costs
It is important for you to consider the costs involved with implementing ITIL Change Management processes. These costs will basically fall into two categories: people costs and technology costs.
People Costs
You likely already have personnel throughout the enterprise performing Change Management tasks. If you are not already using ITIL, it is likely that they are performing these tasks, but in silos, meaning they are repeating tasks. When implementing Change Management processes, you should be able to use some of these personnel that are now freed up for implementation. However, you will still need to utilize personnel to be on the CAB.
Technology Costs
You will need to plan carefully the hardware and software tools you decide to use for implementing Change Management processes, and ensure that they integrate with the other ITIL processes. A good, integrated technology tool may be a significant up-front investment, but if chosen and implemented correctly, it will result in long-term savings in other areas of the enterprise.
Measuring Success
Change Management metrics can help improve business. But to demonstrate this, it is important to create statistics and metrics to clearly show the improvements. Success must be documented in terms of improvements to the business. As it has often been said, you cannot manage what you cannot measure.
What kind of change management measurements and associated data can be used to measure improvements? The following are some for you to consider and build upon:
- Total changes planned (in the pipeline)
- Total changes implemented
- Number of failed changes
- Number of emergency changes
- Number of unauthorized changes
- Number of rescheduled changes
- Average process time per change
- Number of changes that resulted in incidents
- Change management tooling support level
- Change management process maturity
- Total labor hours to coordinate changes
- Total labor hours available for coordinating changes
- Total labor hours to implement changes
So where do you find this data? It can be found in such places as:
- Change management system reports
- Incident management system reports
- Labor reports
- HR reports
- Audit reports
- CMDB reports
What kind of evaluations can you make from these seemingly nondescript numbers? Now is the fun part, when you get to do some math! The following are just some of the metrics you can calculate from the data.
Change Efficiency Rate
You can determine the change efficiency rate by dividing the total changes implemented by the total changes in the pipeline. For example, if you did 20 changes this week and you had 40 to do in the pipeline, your efficiency rate is 20/40, or 50%. This will tell your management how efficient you are at handling changes. This can be used to demonstrate your improvement in implementing changes by using ITIL compared with when you did not.
Change Success Rate
You can determine the success rate and failure rate percentages for your changes by dividing the number of failed changes by the total number of changes implemented. For example, if you implemented 10 changes, but 2 of them failed, you would have a 2/10, or a 20% failure rate. Subtract this from 100% and this gives you an 80% success rate.
Change Reschedule Rate
You can determine how well you implement changes on schedule by calculating the change reschedule rate. Do so by dividing the number of changes rescheduled by the number of changes you had scheduled. For example, if you had planned for 40 changes this week but rescheduled 5 of them, your reschedule rate would be 5/40, or 12.5%.
Change Incident Rate
A very useful metric to reveal how changes impacted business productivity is the change incident rate. You can calculate this by taking the number of changes that created incidents and dividing it by the total number of changes implemented. For example, if you implemented 30 changes this week and 5 of them caused incidents, your change incident rate would be 5/30, or 16.7%.
Other Useful Metrics
These should give you a good idea of what metrics you can use to determine your successes and challenges with Change Management processes. There are many more you can compute using the data you have gathered in a successfully implemented Change Management process. A few of these include:
- Emergency change rate
- Average process time per change
- Unauthorized change rate
- Personnel time utilization for changes
- Change Management technology tools support utilization
- Change Management process maturity
Metrics such as these will tell you, and more importantly tell your business leaders, how efficient you are at implementing Change Management process components and where improvements are needed.
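The four rates described in this section are simple ratios, so they are easy to compute from weekly counts. The following is a minimal Python sketch, not the output of any particular Change Management tool; the `ChangeCounts` class, its field names, and the `change_metrics` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeCounts:
    """Weekly change counts (hypothetical field names)."""
    planned: int           # total changes in the pipeline / scheduled
    implemented: int       # total changes implemented
    failed: int            # implemented changes that failed
    rescheduled: int       # scheduled changes that were rescheduled
    caused_incidents: int  # implemented changes that resulted in incidents


def rate(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage (0.0 if denominator is 0)."""
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


def change_metrics(c: ChangeCounts) -> dict[str, float]:
    """Compute the change rates described in the text, as percentages."""
    failure = rate(c.failed, c.implemented)
    return {
        "efficiency_rate": rate(c.implemented, c.planned),
        "failure_rate": failure,
        # Success rate is 100% minus the failure rate, as in the text.
        "success_rate": 100.0 - failure,
        "reschedule_rate": rate(c.rescheduled, c.planned),
        "incident_rate": rate(c.caused_incidents, c.implemented),
    }


if __name__ == "__main__":
    week = ChangeCounts(planned=40, implemented=20, failed=4,
                        rescheduled=5, caused_incidents=3)
    for name, value in change_metrics(week).items():
        print(f"{name}: {value:.1f}%")
```

Treating a zero denominator as 0% is an assumption made to keep the sketch short; a real report might instead show such a week as "not applicable".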
Summary
Implementing the ITIL Change Management process will be an evolutionary process. It will take time and investment up front. It will be a learning experience. But, when done correctly, it will make your business more efficient and make IT more valuable in the eyes of your business leaders. Change Management implementation success will take the strong and steady commitment of your executive management to get through these growing pains. Be sure you have that to get the subsequent commitment of your ITIL team members and ultimately improve your Change Management processes.
Chapter 3
Problems
The ITIL Service Support book defines a Problem as "An unknown, underlying cause of one or more incidents." A single problem may generate several incidents. Examples of problems are:
- An application update may have made the application unusable under the same settings as before the update
- A newly installed WAN component may not be working correctly
- The ISP may not have renewed the domain name correctly
Errors
The ITIL Service Support book defines an Error as "A problem for which the root cause has been identified and a workaround or permanent solution has been developed." Errors can be identified through analysis of user complaints or by vendors and development staff prior to production implementation. Examples of errors include:
- The network settings for the desktop or server may have been misconfigured
- A network-monitoring tool may incorrectly flag a WAN circuit as being busy
- The spam filter on the email server may have been configured incorrectly
To demonstrate this relationship, consider a common scenario within IT shops. The Service Desk receives a call from an end user who got an error message when trying to log into the network. The Service Desk logs the report to the incident database. An automated trend analysis determines whether this same incident has been reported before, taking into consideration the time, date, and other related details about the incident. The resulting trend analysis is sent to the Problem Management system, where commonalities between this and the other reported incidents can be identified. Common failures and configuration items (CIs) are identified and matched with known errors. The Problem Management system will provide a workaround or a temporary fix so that the user can get logged into the network as soon as possible. In the meantime, a request for change (RFC) may be generated to resolve the error. If the number of incidents continues to increase, the priority for implementing the RFC will become higher. When the change is implemented, the Known Errors Database will be updated to indicate the error has been resolved. Figure 3.1 shows the relationships between incidents, problems, and errors.
Incident Management is inherently reactive. With regard to IT incidents, the goal of Incident Management is to reduce or eliminate the effects of actual or possible troubles in IT services to ensure users can get back to work, and the business can get back to being productive, as soon as possible. Incident Management has a short-term focus on restoring service.
Incident Management activities include:
- Incident detection and recording
- Classification and initial support
- Investigation and diagnosis
- Resolution and recovery
- Closure
- Incident ownership, monitoring, tracking, and communication
To most effectively address incidents, they need to be recorded and classified and the resolution for each assigned to the appropriate, qualified personnel. Incident resolution must be monitored consistently and closely to ensure incidents have been completely addressed.
[Figure: Incident Management process flowchart. Steps shown: incident is reported; incident recording; service request check (yes/no); matching; match check (yes/no); resolved check (yes/no); incident closure.]
According to the Office of Government Commerce (OGC) Best Management Practice (http://www.best-management-practice.com/gempdf/ITIL_Glossary_V3_1_24.pdf), Service Request is defined as "A request from a User for information, or advice, or for a Standard Change or for Access to an IT Service. For example to reset a password, or to provide standard IT Services for a new User. Service Requests are usually handled by a Service Desk, and do not require an RFC to be submitted."
Incident Reporting
Incidents can be reported from any part of the enterprise as well as a number of sources outside the organization. Following a well-thought-out repeatable process will not only make incident responses more efficient, it will help to prevent similar incidents from recurring. When the incident is reported, it is important that the details of the incident are first recorded as soon as possible. If you try to jump headfirst into incident response thinking you can always come back later and record the details, it is likely that documentation will never occur. It is also important for successful resolution of the incident that ongoing recording of significant details occurs so that progress can be accurately monitored. This documentation will also assist with addressing other incidents; learn from your experiences!
If you fail to record the incident details, you will not be able to monitor compliance with SLA levels.
An important note to make about incident reporting is that each incident should not be recorded in the system more than once. Doing so will skew the incident reports and make your key performance indicator (KPI) metrics inaccurate. A KPI is a valuable metric that indicates the performance level, or success, of a particular operation or process. Management can use KPIs to make better decisions about IT processes and systems.
The OGC Best Management Practice (http://www.best-management-practice.com/gempdf/ITIL_Glossary_V3_1_24.pdf) defines a KPI as "A Metric that is used to help manage a Process, IT Service or Activity. Many Metrics may be measured, but only the most important of these are defined as KPIs and used to actively manage and report on the Process, IT Service or Activity. KPIs should be selected to ensure that Efficiency, Effectiveness, and Cost Effectiveness are all managed."
Classification and Initial Support
Often overlooked in typical incident response plans is classification of the incidents. Classification will allow the incident to be categorized and assist with monitoring and reporting. To create your classifications, use the following parameters:
- Category: This will include information about the origin of the incident or the support group involved. Examples include such things as processor, network, workstation, organization, procedure, Service Request, and so on.
- Priority: This will determine how quickly the incident should be addressed.
- Service: This will provide information about the services involved with the incident as covered within the associated SLA.
- Support group: This is the group that will assist with incident resolution if the Service Desk cannot resolve it.
- Timelines: This will indicate the estimated time it will take to resolve the incident along with planned update times.
- Incident reference number: Assign a number not only to make it easier to find the incident data within your Incident Management system but also to reference.
- Status: Update the status to show where you are within the incident resolution process.
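The classification parameters above map naturally onto a small record type. The following is a minimal Python sketch, not a schema from any Incident Management product; the `IncidentRecord` and `Status` names, the field names, and the `escalate` helper are hypothetical, and the status values are only a subset of the workflow labels this chapter mentions (new, assigned, active, resolved, closed).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    """A subset of possible workflow positions (illustrative only)."""
    NEW = "new"
    ASSIGNED = "assigned"
    ACTIVE = "active"
    RESOLVED = "resolved"
    CLOSED = "closed"


@dataclass
class IncidentRecord:
    """One classified incident; field names are assumptions for this sketch."""
    reference_number: str         # incident reference number
    category: str                 # origin of the incident, e.g. "network"
    priority: int                 # how quickly it should be addressed
    service: str                  # service involved, as covered by the SLA
    reported_at: datetime
    target_resolution: timedelta  # timeline: estimated time to resolve
    support_group: Optional[str] = None  # set when the Service Desk escalates
    status: Status = Status.NEW

    def resolution_due(self) -> datetime:
        """Time by which the incident should be resolved."""
        return self.reported_at + self.target_resolution

    def escalate(self, support_group: str) -> None:
        """Hand the incident to a support group and update its status."""
        self.support_group = support_group
        self.status = Status.ASSIGNED


if __name__ == "__main__":
    incident = IncidentRecord(
        reference_number="INC-0001",
        category="network",
        priority=2,
        service="email",
        reported_at=datetime(2024, 1, 1, 9, 0),
        target_resolution=timedelta(hours=4),
    )
    incident.escalate("network support")
    print(incident.status.value, incident.resolution_due())
```

Keeping the reference number, timeline, and status on the same record is what makes the later monitoring and SLA reporting steps possible.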
Matching
After the incident is classified and all associated data recorded, check to determine whether this type of incident has occurred before. If so, you can streamline the incident response time by seeing what the solution or workaround was for the previous incident and possibly use the same one, depending upon the symptoms or causal problems and/or errors.
Investigation and Diagnosis
If the Service Desk passes an incident on to a support group, the group will investigate the incident and perform diagnosis to provide resolution. If the initial group cannot resolve the incident within the targeted timeframe, they will pass it on to another support group. This will continue until the incident is resolved.
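The matching step can be illustrated with a very simple lookup against previously recorded known errors. The sketch below is only one possible approach, assuming that known errors are stored with a category and a set of symptom keywords; the `KnownError` class, the `find_match` function, and the keyword-overlap scoring are all hypothetical and are not taken from ITIL or from any specific tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnownError:
    """A previously diagnosed error with its recorded workaround (illustrative)."""
    error_id: str
    category: str
    keywords: frozenset
    workaround: str


def find_match(category: str, description: str,
               known_errors: list) -> Optional[KnownError]:
    """Return the known error in the same category that shares the most
    keywords with the incident description, or None if nothing overlaps."""
    words = set(description.lower().split())
    best, best_score = None, 0
    for err in known_errors:
        if err.category != category:
            continue
        score = len(err.keywords & words)
        if score > best_score:
            best, best_score = err, score
    return best


if __name__ == "__main__":
    known = [
        KnownError("KE-1", "network", frozenset({"login", "timeout"}),
                   "Log in through the secondary server"),
        KnownError("KE-2", "workstation", frozenset({"monitor", "blank"}),
                   "Reseat the video cable"),
    ]
    match = find_match("network", "User gets timeout error at login", known)
    print(match.workaround if match else "No match; investigate and diagnose")
```

When no match is found, the incident continues to investigation and diagnosis as described above.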
Resolution and Recovery
When the incident has been successfully solved, the support group will record all the details about the resolution into the system. If a change must occur to prevent a similar incident from recurring, a request for change (RFC) will be submitted into the Change Management process.
It is possible that you may have an incident that does not get resolved. In this hopefully rare situation, the incident will remain open.
Incident Closure
The support group will send notice to the Service Desk that the incident has been resolved. The Service Desk will then check with the person who reported the incident and ask him or her to check the related application or system to ensure that, from his or her point of view and experience, the incident truly has been addressed correctly. The incident record should be updated to indicate what final category the incident is now in, along with the SLA-related metrics. Throughout the Incident Management process, the Service Desk is responsible for monitoring progress and updating users and customers on incident resolution status and escalation to other support groups.
The IT area benefits will include:
- More efficient and effective use of personnel time
- Documented tracking of incidents and service requests with lessened likelihood of losing or incorrectly documenting incident information
- The CMDB is more accurate, with incident information keeping it updated as well as audited with the incident data being recorded and mapped to CIs
- The ability to improve monitoring of and measurement for meeting SLA requirements
- Better management of SLA reporting and service quality
- Customers are happier with IT services because of more effective response to incidents and less downtime
To make incident response as effective and efficient as possible, there should be a basic core of information consistently collected about each incident. These data items will determine the classification of the incident and will contribute to determining the urgency and speed with which the incident should be addressed. The data items will also support how the incident is monitored and provide information for the incident report.
Table 3.1 provides the items that should be collected when an incident is reported; these are the details that are input to the Incident Management process.

| Input Item | Description |
| --- | --- |
| Category | Each incident should be assigned to a category and subcategory to correspond to the incident origin and support group. The following are examples of categories that can be used: Central processing (application, system, mainframe); Network (IP address, segment, router, hub); Organization and Procedures (communication, order, request); Service Request (from the Service Desk); Use and Functionality (availability, backup, capacity, service); Workstation (keyboard, monitor, CPU, storage drive). |
| Priority | Each incident needs to be assigned a priority to help the support groups understand which incidents need to be addressed immediately versus those that can be addressed at a later time. Priority is often computed by taking a number assigned to Urgency multiplied by a number assigned to Impact. For example, if the Urgency is 1 and the Impact is 2, the Priority is 1 × 2, or 2. Another incident may have an Urgency of 3 and an Impact of 1, so the Priority would be 3 × 1, or 3. |
| Service | This is a list to identify the services related to the incident. These should reference the applicable SLA requirements. Included within this list will be the escalation times for the services required by the SLA. |
| Support group | If the Service Desk doesn't resolve the incident within the SLA time requirements, a support group may be called on to address the incident. |
| Timelines | The consideration of the SLA requirements with the priority will be used to determine the timelines for incident resolution. These need to be recorded. |
| Incident reference number | Each incident is assigned a reference number for easy and future reference. |
| Status | The status, also referenced as workflow position, indicates where progress is within the workflow. Status labels could include such terms as new, accepted, planned, assigned, active, suspended, resolved, closed, and so on. |
The escalation of an incident from the Service Desk to a support group is often described as functional escalation.
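The priority calculation described for Table 3.1 (Urgency multiplied by Impact) can be expressed in a few lines. This is a minimal Python sketch under the assumption that both values are positive integers; the function name `incident_priority` is hypothetical, and the text does not say whether a lower or a higher product should be handled first, so the sketch does not assume either.

```python
def incident_priority(urgency: int, impact: int) -> int:
    """Compute priority as urgency multiplied by impact.

    Both inputs are assumed to be positive integers on whatever
    scale the organization has defined.
    """
    if urgency < 1 or impact < 1:
        raise ValueError("urgency and impact must be positive integers")
    return urgency * impact


if __name__ == "__main__":
    # The two examples from the text.
    print(incident_priority(urgency=1, impact=2))  # 1 x 2 = 2
    print(incident_priority(urgency=3, impact=1))  # 3 x 1 = 3
```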
Outputs
Figure 3.3 illustrates the inputs and outputs for the Incident Management process.
[Figure 3.3: Inputs and outputs for the Incident Management process. Inputs: incidents from the Service Desk, Computer Operations, Procedures, and Networking. Activities: matching; investigation and diagnosis; resolution and recovery; incident closure; incident ownership, monitoring, tracking, and communication. Output labels: resolutions and workarounds, reports, incident data, RFCs, configuration details, and routing and monitoring, directed to the Service Desk, Computer Operations, Procedures, Networking, the CMDB, Service Requests, Availability Management, Problem Management, Change Management, and Capacity Management.]
Relationships
Incident Management has relationships with most of the other ITIL processes. It is important for the success of not only Incident Management but of enterprise-wide ITIL processes that these relationships are appropriately managed. Figure 3.4 illustrates these relationships at a high level.
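[Figure 3.4: Incident Management relationships with other ITIL processes. Labels shown: Configuration Management, Availability Management, and Problem Management around Incident Management, with flows of reports, incident data, SLA parameters, and workarounds.]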
As this figure shows, Incident Management activities impact other ITIL processes in one way or another. It is important for effective communications channels to exist to communicate key activities. Table 3.2 provides the high-level descriptions about the relationships between Incident Management and the other ITIL processes.
| ITIL Process | Relationship with Incident Management |
| --- | --- |
| Availability Management | Availability Management uses incident data and records in conjunction with status monitoring data from Configuration Management. Based upon the information, a service can be assigned a status, just like a CI in the CMDB. Information provided by Availability Management records can be used to determine the availability of a service and the response time of the service provider. |
| Capacity Management | Capacity Management uses information about incidents that are associated to capacity (for example, incidents resulting from lack of storage space, unacceptably slow response times, and so on). These events can send a notice to the Incident Management process via systems managers, business managers, or using automated tools. |
| Configuration Management | The CMDB defines the relationships between resources, services, users, and Service Levels. Because Configuration Management defines the position responsible for each infrastructure component, incidents related to specific components can be most efficiently addressed. The CMDB can also be used to develop workarounds, such as diverting traffic to a different email server or temporarily placing a defined user group on a different print server. |
| Problem Management | Problem Management provides requirements for the quality of incident documentation and records that assist with determining the causal errors. It provides information about problems, known errors, temporary fixes, and workarounds. |
| Change Management | How are many incidents resolved? By making changes, such as replacing faulty network components or modifying parameters. Change Management provides information about scheduled changes, change status, and so on that Incident Management needs to determine appropriate actions. Additionally, changes can cause incidents. When this happens, Incident Management will send information and data to Change Management about the incidents. |
| Service Level Management | Service Level Management is involved with monitoring the customer agreements to ensure support provided meets customer expectations. Incident Management must understand the SLA to ensure this information is considered and used when communicating with users about incidents. Incident reports can also reveal whether service levels are provided accordingly. |
So where do you find this data? It can be found in such places as:
What kind of evaluations can you make from these seemingly nondescript numbers? What are your KPIs? Some of these numbers stand on their own to provide meaningful KPIs, such as:
- Total number of incidents reported
- Total number of unique incidents
- Total number of Severity 1 incidents
- Total number of Severity 2 incidents
- Total time to resolve Severity 1 incidents
- Total time to resolve Severity 2 incidents
- Number of incidents resolved within service level agreement parameters
- Total number of High Severity incidents
- Total number of incidents with customer impacts
- Total non-Service Desk labor hours available to work on incidents
- Total non-Service Desk labor hours used resolving incidents
However, you can do a little math and determine additional useful KPIs. The following are just some of the metrics you can calculate from the data.
Incident Resolution Efficiency Rate
You can determine the incident resolution efficiency rate by dividing the total number of incidents resolved within SLA parameters by the total number of incidents reported. For example, if there were 15 incidents reported this week and 12 of them were resolved within the SLA parameters, your resolution efficiency rate is 12/15, or 80%. This will tell your management how successful you are at resolving incidents in alignment with business requirements. The lower your efficiency rate goes, the more evidence you have that you do not have the resources or tools necessary to appropriately resolve incidents or that your SLA parameters are not realistic.
Customer Incident Impact Rate
You can determine the impact of incidents upon customers by dividing the total number of incidents with customer impact by the total number of incidents reported. For example, if 15 of 20 incidents reported during the week noticeably and measurably impacted customers, such as making services unavailable, damaging business files customers depend upon, and so on, you would have a 15/20, or a 75%, customer incident impact rate. This metric will tell you how successful you are at keeping incidents from impacting your customers and can point to where stronger controls are necessary, where systems need to be adjusted, and so on.
Incident Reopen Rate
You can determine the incident reopen rate by dividing the total number of incidents reopened by the total number of incidents reported. For example, if you had 5 incidents reopened during the week, and the total number of incidents reported was 20, your incident reopen rate would be 5/20, or 25%. This metric will tell you how successful you are at permanently resolving incidents. If your incident reopen rate is high, you need to look at your incident response procedures and tools and make changes to lower the rate.
Incident Labor Utilization Rate
A very useful metric to reveal how much of your available labor was used handling incidents is the incident labor utilization rate. You can calculate this by dividing the total non-Service Desk labor hours used to resolve incidents by the total non-Service Desk labor hours available to resolve incidents. For example, if 55 labor hours were used during the week to resolve incidents, and you had 50 hours available to work on incidents, you would have an incident labor utilization rate of 55/50, or 110%. You were over-utilized this week in working on incidents. You should keep your eye on this number to determine whether you are consistently or often over-utilized. This will help you to decide whether you should add personnel who have responsibilities for handling incidents.
Metrics such as these will tell you, and more importantly tell your business leaders, how efficient your Incident Management process components are and where improvements are needed.
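The incident KPIs described above are all ratios of weekly totals. The following is a minimal Python sketch of those calculations; the function name `incident_kpis` and its parameter names are hypothetical, and the 100% threshold used to flag over-utilization simply follows the worked example in the text.

```python
def pct(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage (0.0 if denominator is 0)."""
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


def incident_kpis(reported: int, resolved_within_sla: int,
                  customer_impacting: int, reopened: int,
                  labor_hours_used: float,
                  labor_hours_available: float) -> dict:
    """Compute the incident rates described in the text, as percentages.

    Labor hours refer to non-Service Desk labor only.
    """
    utilization = pct(labor_hours_used, labor_hours_available)
    return {
        "resolution_efficiency_rate": pct(resolved_within_sla, reported),
        "customer_impact_rate": pct(customer_impacting, reported),
        "reopen_rate": pct(reopened, reported),
        "labor_utilization_rate": utilization,
        # Above 100% means more hours were used than were available.
        "over_utilized": utilization > 100.0,
    }


if __name__ == "__main__":
    kpis = incident_kpis(reported=20, resolved_within_sla=16,
                         customer_impacting=15, reopened=5,
                         labor_hours_used=55, labor_hours_available=50)
    for name, value in kpis.items():
        print(f"{name}: {value}")
```

Tracking these values week over week, rather than as single snapshots, is what shows whether over-utilization is consistent or occasional.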
Whereas Incident Management is reactive, Problem Management is primarily proactive by taking actions to determine the reasons why there was a failure in the provision of IT services. However, there are some significant reactive actions within Problem Management, such as identifying the cause of previous incidents and providing recommendations for removing those causes. Problem Management is basically an investigative process whereas Incident Management is basically a resolution process.
Many errors may be the cause of a problem. Many problems may be the result of one error.
Problem Management seeks to identify the cause or causes of a problem. The determination of the cause becomes a known error. An RFC can then be submitted to eliminate the known error along with the associated problem or problems.
Problem Control
Problem control activities seek to identify problems and determine the root cause of the problems. Once the causes are known, the problems can be turned into known errors that are associated with the base cause of the problem and an associated workaround. Any incident could have associated problems if the cause of the incident is not known. The first step in problem control is identifying the existence of a problem along with recording significant details about the problem. The problem should then be classified according to the:
- Appropriate category, such as hardware or software
- Impact upon the business and associated business process and applications
- Priority based upon consideration of urgency, impact, risk, and the resources necessary to resolve the problem
- Status of the problem
- Urgency of finding a solution
The classification of a problem may change throughout the process of resolving the problem. For example, implementing a temporary fix or using a workaround may lessen the urgency and impact. In addition to classification, an impact analysis should be performed to determine how serious the problem is and what potential and actual effects the problem has on IT services. This impact analysis will become the basis to mitigate and manage the risk. Based upon the results of the impact analysis, a priority is assigned to the problem and then the appropriate personnel and resources can be assigned to resolve the problem.
The problem is now ready to be investigated and diagnosed. Investigation and diagnosis will typically need to be repeated multiple times. Each time you will get closer to resolution. Too many IT practitioners believe that resolution should or can occur quickly, but with this attitude, you will be setting yourself up for failure and frustration. Investigation often includes trying to reproduce the problem within an isolated environment. This is a very good tactic. Don't be afraid to call in specialists from the support group to help.
If an acceptable workaround can be established after the cause of the problem is discovered, and the CIs responsible are identified, a relationship between the incident and CIs will allow for a known error to be defined. If an RFC must be submitted to apply a temporary fix, the RFC process must be followed.
Error Control
The error control activities involve monitoring and managing all known errors from the time they are identified until they are resolved. Many areas throughout the enterprise may be involved with error control. When the cause of the problem has been determined and the corresponding CIs identified, the problem can be linked to a known error, which launches the error control process. At this point, data is sent to the Incident Management process to use within any open incidents. An existing workaround for the known error can also be used to assist with incident resolution.
The team working within the Problem Management process will determine what needs to be done to resolve the problem if the errors are known. The team members should compare the possible solutions and choose the one that is the best fit with the associated SLAs, costs, impacts, and urgency. When the decision has been made regarding the best solution for resolving the problem, an RFC can be submitted to Change Management. Although most problems and failures are identified in the production environment, it is important to keep in mind that test and development environments can also have failures and known errors.
When the changes to fix the error have been implemented, a Post Implementation Review (PIR) should be done before closing the problem. Incident Management should be sent the results of the PIR so that they can close the applicable incidents. Throughout the error control activities, there is constant tracking and monitoring to stay abreast of problem and error resolution. Tracking and monitoring will help determine whether the business impact and/or urgency changes, whether the priority changes, and whether the RFC has been successfully implemented and addresses the problem or error.
Proactive Problem Management
The actions that occur within proactive problem management, which basically means actions taken to prevent problems, ensure the quality of the services and underlying infrastructure. Trend analysis occurs along with actions to identify weaknesses. Proactive problem management can have a huge impact on the business by identifying, investigating, and addressing weaknesses throughout the infrastructure components before they result in incidents.
Information Generation
Throughout Problem Management processes, information is generated and shared. The closest relationship is with Incident Management, to which information is passed concerning workarounds and temporary fixes. Information is also obtained from the CMDB to determine the other entities that need to receive information about the problem resolution. The SLA is also used to see what additional entities need to receive information. Figure 3.5 demonstrates the Problem Management process.
[Figure 3.5: The Problem Management process. Recoverable labels: problem tracking and monitoring, problem classification, error assessment.]
Outputs
Problem Management provides output to two other ITIL processes:
• Change Management receives RFCs to help resolve problems.
• Incident Management receives matching information to determine whether a problem has been associated with other incidents.
Figure 3.6 illustrates the inputs and outputs for the Problem Management process.
[Figure 3.6: Problem Management inputs and outputs. Inputs: information from Incident Management, Capacity Management, Configuration Management, and Availability Management, and PIR results from Change Management. Process: Problem Control. Outputs: to Change Management and Incident Management.]
Relationships
Problem Management has relationships with six other ITIL processes. It is important for the success of not only Problem Management but of enterprise-wide ITIL processes that these relationships are appropriately managed. Figure 3.7 illustrates these relationships at a high level.
Figure 3.7: Problem Management relationships with other ITIL processes.
Table 3.3 provides high-level descriptions about the relationships between Problem Management and the other ITIL processes.
ITIL Process | Relationship with Problem Management
Incident Management | Incident Management provides incident record data used by Problem Management to identify problems. Incident Management receives matching information to determine whether this problem has been associated with other incidents.
Change Management | Change Management provides PIR results about associated incidents, problems, and errors. Change Management receives RFCs to help resolve problems.
Configuration Management | Configuration Management provides information critical for resolving problems, such as infrastructure details, software and hardware configurations, services, architecture blueprints, and so on.
Availability Management | Availability Management provides availability design, planning, and monitoring data.
Capacity Management | Capacity Management provides data about storage, bandwidth settings, and other details useful for troubleshooting.
Service Level Management | Service Level Management provides SLA data along with other quality data.
Figure 3.8: Magic Mover Incident Management and Problem Management process flow.
[Flowchart not reproduced. Incident Management steps for the Magic Mover website appear in bold. Recoverable labels: incident data; send data to Problem Management and perform resolution & recovery procedures; incident closure; RFC; problem classification; error assessment; "Problem resolved?" decision points.]
Costs
It is important for you to consider the costs involved with implementing ITIL Incident Management and Problem Management processes. These costs will generally fall into two categories: people costs and technology costs.
People Costs
You likely already have personnel throughout the enterprise performing Incident Management and Problem Management tasks, but in an ad hoc or otherwise uncoordinated way. If you are not already using ITIL, it is likely that they are performing these tasks in silos, meaning they are repeating tasks, leaving out important tasks, or performing conflicting tasks. When implementing Incident Management and Problem Management processes, you should be able to use some of these same personnel who are now freed up for implementation.
Personnel costs will include such things as the time of the personnel who are members of the support groups when they are actively resolving incidents as well as any training they need to receive. There are also personnel costs in maintaining and upgrading the associated Incident Management and Problem Management systems and tools. A typically significant cost is the upfront time necessary to plan, define, communicate, and implement the Incident Management and Problem Management processes.
Technology Costs
Technology costs will include such things as tools to support the Incident Management and Problem Management processes, possibly hiring outside consultants or technicians to assist in implementation of the tools, storage space for incident data, and any training costs that may be necessary. You will need to plan carefully the hardware and software tools you decide to use for implementing the automated portion of the Incident Management and Problem Management processes, and ensure that they integrate with the other ITIL processes.
A good, integrated technology tool may be a significant up-front investment, but if chosen and implemented correctly, it will result in long-term savings in other areas of the enterprise.
So where do you find this data? It can be found in such places as:
What kind of evaluations can you make from these seemingly nondescript numbers? What are your KPIs? Some of these numbers stand on their own to provide meaningful KPIs:
• Total number of major problems
• Total number of problems in the pipeline
• Total number of problems resolved and removed
• Total number of known errors
• Total number of problems reopened
• Total number of problems with customer impact
• Total available labor hours allotted to work on problems
• Total labor hours spent working on problems
However, you can do a little math and determine additional useful KPIs. The following are just some of the metrics you can calculate from the data.
Customer Impact Rate
You can determine the customer impact rate by dividing the total number of problems with customer impact by the total number of problems in the pipeline. For example, if you had 50 problems in the pipeline this week and 22 of them impacted customers, your customer impact rate is 22/50 or 44%. This will tell your management how well you are keeping problems from impacting your customers and point to where you need more resources, tools, or labor to lower the rate to an acceptable level.
Incident Repeat Rate
When incidents repeat and must be reopened, it points to underlying problems that must be discovered. You can determine the incident repeat rate by dividing the total number of repeat incidents by the total number of incidents. For example, if you had 50 incidents during the week, and 25 of them were repeat incidents, your incident repeat rate is 25/50 or 50%. This will tell your management how effective you are at minimizing repeat incidents. The higher the number, the more investigation and research needs to be done to determine any existing problems at the core of the incidents.
Problem Labor Utilization Rate
You can determine the problem labor utilization rate by dividing the total labor hours spent working on problems by the total labor hours available to work on problems. For example, if you spent 80 hours resolving problems during the week and you had allotted 120 hours to be available for problem resolution, your problem labor utilization rate would be 80/120 or 67%. This metric will indicate how much available labor capacity was used handling problems, and can indicate whether the number allotted is too low, whether more personnel are needed, or whether the procedures for fixing problems need to change.
Problem Reopen Rate
The problem reopen rate is found by dividing the total number of problems reopened by the total number of problems in the pipeline. For example, if you had 60 problems in the pipeline for the week, and 20 of the problems were reopened, your problem reopen rate would be 20/60 or 33%. This metric will tell your management how successful you are at permanently removing problems. Your goal will be to get this rate as low as possible.
Problem Resolution Rate
The problem resolution rate is computed by dividing the total number of problems resolved by the total number of problems in the pipeline. For example, if you had 45 problems in the pipeline for the week, and you resolved 30 of them, your problem resolution rate would be 30/45 or 67%. This metric will tell your management the percentage of problems you successfully addressed and removed. The higher the percentage, the better.
Problem Workaround Rate
The problem workaround rate is found by dividing the total number of known errors by the total number of repeat incidents. For example, if the total number of known errors is 100 and the total number of repeat incidents is 120, your problem workaround rate is 100/120 or 83%. This metric will tell your management the percentage of problems for which you implemented workarounds.
Metrics such as these will tell you, and more importantly tell your business leaders, how efficient you are at implementing Problem Management process components and where improvements are needed.
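The rate calculations described above all follow the same pattern: one count divided by another, expressed as a percentage. The following is a minimal Python sketch of that arithmetic using the example figures from the text. The function name `rate` and the dictionary keys are illustrative assumptions, not part of ITIL or of any particular tool.

```python
def rate(part: float, whole: float) -> float:
    """Return part/whole as a percentage (hypothetical helper)."""
    if whole <= 0:
        raise ValueError("the denominator must be positive")
    return 100.0 * part / whole


# Example figures taken from the worked examples in the text.
examples = {
    "customer impact rate": rate(22, 50),             # impacted problems / problems in pipeline
    "incident repeat rate": rate(25, 50),             # repeat incidents / total incidents
    "problem labor utilization rate": rate(80, 120),  # hours spent / hours available
    "problem reopen rate": rate(20, 60),              # problems reopened / problems in pipeline
    "problem resolution rate": rate(30, 45),          # problems resolved / problems in pipeline
    "problem workaround rate": rate(100, 120),        # known errors / repeat incidents
}

for name, value in examples.items():
    # Rounded to whole percentages, as in the text.
    print(f"{name}: {value:.0f}%")
```

Raising an error on a zero denominator is a design choice for this sketch; a reporting tool might instead show the metric as not applicable for a week with no problems in the pipeline.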
Summary
Implementing the ITIL Incident Management and Problem Management processes will be an evolutionary process just as Change Management implementation was. It will take time and investment up front. It will be a learning experience. But, when done correctly, it will make your business more efficient; reduce downtime; prevent incidents from happening; save money that otherwise would have been spent constantly addressing recurring incidents, problems, and errors; and make IT more strategic in the eyes of your business leaders. Incident Management and Problem Management implementation success will also take the strong, consistent commitment of your executive management to get through the inevitable growing pains. Be sure you have that to get the subsequent commitment of your ITIL team members, and ultimately improve your Incident Management and Problem Management processes.
Chapter 4
Recall each of these? Let's quickly review:
• Information Technology Infrastructure Library (ITIL) offers best practice approaches to facilitate the delivery of high-quality information technology (IT) services, the earliest version of which was released in 1985.
• Control Objectives for Information and related Technology (COBIT) provides best practices for IT management and controls created by the Information Systems Audit and Control Association (ISACA) and the IT Governance Institute (ITGI) in 1992.
• ISO/IEC 17799 is an information security standard most recently published in June 2005 by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). This standard was renumbered ISO/IEC 27002:2005 in July 2007.
• Committee of Sponsoring Organizations (COSO) of the Treadway Commission is a U.S. private-sector initiative formed in 1985 that makes recommendations to reduce fraud incidents. COSO has a common definition of internal controls, standards, and criteria against which companies and organizations can assess their control systems.
Much has been written in the past few years about ITIL. Why? Because ITIL is a perfect complement to both COBIT and ISO/IEC 17799. It aligns nicely with them. ITIL, COBIT, and ISO/IEC 17799 interoperate in many ways. Most organizations that use frameworks will typically use more than one; they realize that just one framework does not address all the issues necessary for effective information management within a complex business environment. With the passage of SOX, it has been common to see organizations use COSO and COBIT in conjunction with ITIL. Auditors overwhelmingly use COBIT to determine appropriate controls when doing SOX reviews. IT areas can benefit from following a standardized framework, such as ITIL, to support COBIT constructs, and at the same time ensure SOX compliance. Why is this? Because COBIT and ITIL provide frameworks covering the areas that must be reviewed, along with the necessary criteria to use for evaluations, when considering the effectiveness of IT service management.
It is important to keep in mind that COBIT and ITIL do not provide explicit solutions to the risks being discussed within them. For them to try to do so would be foolhardy considering the very wide range of technology solutions that exist along with the technologies emerging every day. However, COBIT and ITIL, which address general and significant IT control and management issues in basically all organizations, provide an efficient and effective roadmap to follow to successfully implement IT solutions. Because COBIT and ITIL include what are widely accepted as best practices, the documentation and implementation of the concepts will provide the best possible, and defendable, IT management results.
PCAOB "is a private-sector, non-profit corporation, created by the Sarbanes-Oxley Act of 2002, to oversee the auditors of public companies in order to protect the interests of investors and further the public interest in the preparation of informative, fair, and independent audit reports." For more information, see their Web site at http://www.pcaobus.org/.
The PCAOB recommends the COSO and COBIT frameworks be used to meet SOX compliance within various guidance documents they have issued, such as in PCAOB Release No. 2004-001, March 9, 2004, and in their Auditing Standard #2. The PCAOB directed that established frameworks be used by organizations to support consistent and effective internal controls. So, SOX directed the PCAOB to create guidance, and the PCAOB mandated the use of established and effective frameworks for internal controls. ITIL clearly maps to COBIT and COSO. Figure 4.1 demonstrates these relationships.
[Figure 4.1: Relationships among the PCAOB, the guidelines (COSO, COBIT), auditors, and management.]
Now let's drill down a little further to the point where the auditors are using COBIT to evaluate your IT controls. Auditors will use the COBIT 4.0, Manage Changes (AI6, AI7) section. The Control Objective is "Controls provide reasonable assurance that system changes of financial reporting significance are authorized and appropriately tested before being moved to production." What does this have to do with financial reporting controls? The Rationale explains it well: "Managing changes addresses how an organization modifies system functionality to help the business meet its financial reporting objectives. Deficiencies in this area could significantly impact financial reporting. For instance, changes to the programs that allocate financial data to accounts require appropriate approvals and testing prior to the change so that proper classification and reporting integrity is maintained."
This relates to Section 404 of SOX general requirements because they are there to ensure proper internal controls exist for processes, automation, and documentation. IT managers, internal auditors, controllers, process specialists, and IT systems personnel are accountable for ensuring these controls exist. Figure 4.2 shows at a high level how ITIL Service Management processes support SOX Section 404. Details for each are discussed later in the chapter.
Figure 4.2: How ITIL Service Management supports SOX Section 404 requirements.
Change Management
• Requests for program changes, system changes, and maintenance (including changes to system software) are standardized, logged, approved, documented, and subject to formal change management procedures
• Emergency change requests are documented and subject to formal change management procedures
• Controls are in place to restrict migration of programs to production by authorized individuals only
• IT management implements system software that does not jeopardize the security of the data and programs being stored on the system
Rapid disclosure of operations, financial reporting and compliance validation and documentation
Incident Management
• IT management has defined and implemented an incident management system such that data integrity and access control incidents are recorded, analyzed, resolved in a timely manner and reported to management
• A security incident response process exists to support timely response and investigation of unauthorized activities
Problem Management
• The problem management system provides for adequate audit trail facilities, which allow tracing from incident to underlying cause
The general ITIL controls that support all three of these IT Service Management processes include:
• Application controls, such as those for the systems development life cycle (SDLC), logging access activities, and processing and reporting financial activities of all types
• IT general controls, such as access controls, authorization, and records retention
• Document controls, such as the existence of policies, procedures, narratives, flowcharts, and configurations
HIPAA
European Union Data Protection Directive 95/46/EC
Canada's Personal Information Protection and Electronic Documents Act (PIPEDA)
U.S. State Breach Notice Laws
It is important to note that the FTC also typically requires violators of the FTC Act to establish formal information security programs and undergo ongoing independent audits of the adequacy of the programs for a period of 20 years. The ongoing purview of the FTC is often more expensive than the dollar penalty.
So, with all these in mind, let's look at the details for how these three ITIL Service Management processes support not only compliance but also business improvement.
Change Management
One of the key internal control objectives in COBIT is managing change. Managing change is also one of the required General IT controls. The foundation of an effective and efficient IT control environment is effective Change Management. Well-defined documented processes based on best practices frameworks, such as ITIL, and supported by automation where possible, are necessary to achieve compliance. The following Change Management activities support compliance requirements:
• Ensuring system changes are authorized and appropriately tested before being moved to production
• Having a documented change management process and keeping it maintained to reflect the current process
• Having change management procedures for all changes within the production environment, including program changes, system maintenance, and infrastructure changes
• Following procedures to control and monitor change requests
• Following procedures to initiate, approve, and track change requests
• Following documented procedures to appropriately test and approve changes before placing them into production
• Ensuring the approval procedures address all the following: operations, security, IT infrastructure management, and IT management
• Following documented procedures to ensure only authorized/approved changes are moved into production
• Maintaining an audit trail, change request log, and supporting documentation
• Ensuring documented procedures for timely implementation of patches to system software
• Maintaining and following documented procedures to control and supervise emergency changes
• Maintaining an audit trail of all emergency activity and following procedures to have it independently reviewed
• Following documented procedures, including back out activities, for emergency changes
• Following documented procedures to ensure all emergency changes are tested and appropriately approved by systems owners, development staff, and computer operations, as appropriate, before being put into production
• Establishing separation of duties between the staff responsible for moving a program into production and development staff
• Following documented procedures to perform a risk assessment of the potential impact of changes to system software
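Several of the activities listed above (testing before production, approvals covering operations, security, IT infrastructure management, and IT management, and separation of duties) lend themselves to an automated check before a change is released. The following Python sketch shows one way such a gate might look. The `ChangeRequest` fields, the approver labels, and the `release_blockers` function are hypothetical names chosen for illustration; they do not come from ITIL, COBIT, or any specific change management product.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Approval areas named in the text; the string labels themselves are assumptions.
REQUIRED_APPROVERS = frozenset(
    {"operations", "security", "it_infrastructure_management", "it_management"}
)


@dataclass
class ChangeRequest:
    """Hypothetical change record with only the fields this check needs."""
    change_id: str
    developer: str
    deployer: str
    tested: bool = False
    approvals: Set[str] = field(default_factory=set)


def release_blockers(change: ChangeRequest) -> List[str]:
    """Return the reasons a change may not move to production (empty if none)."""
    blockers = []
    if not change.tested:
        blockers.append("change has not been tested")
    missing = REQUIRED_APPROVERS - change.approvals
    if missing:
        blockers.append("missing approvals: " + ", ".join(sorted(missing)))
    # Separation of duties: the person who developed the change must not deploy it.
    if change.developer == change.deployer:
        blockers.append("developer and deployer must be different people")
    return blockers


ready = ChangeRequest(
    change_id="RFC-1",
    developer="alice",
    deployer="bob",
    tested=True,
    approvals=set(REQUIRED_APPROVERS),
)
print(release_blockers(ready))
```

Returning a list of reasons rather than a single yes/no value gives the change record something concrete to log, which fits the audit trail activities in the same list.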
The benefits of following the ITIL Change Management process go beyond compliance. The organizational benefits include:
• Cost savings: According to Nouri Association, Inc. (NAI), organizations save 30% to 50% using frameworks with automated controls compared with those that use manual change management controls.
• Increased customer satisfaction: Change management occurs more consistently and dependably. Customers know the status of their change request throughout the entire change process.
• Production environment stability: NAI research shows there is a 15% to 20% decrease in change-related incidents.
• Supports quality assurance (QA) initiatives: Following the structured, well-documented, and consistent processes within ITIL Change Management supports QA recommendations, such as those found within Six Sigma.
To most efficiently and effectively handle IT changes and compliance requirements, the Change Management process should be centrally managed and integrated throughout the entire applications and SDLC. Activities that should be centrally managed to process changes include:
• Recording: Ensuring all change sources can submit requests for change (RFCs) and that the RFCs are properly recorded
• Acceptance: Filtering submitted RFCs and moving those eligible on for consideration
• Classification, categorization, and prioritization: Putting each RFC into the appropriate category and establishing a priority
• Planning and approval: Consolidating the changes, giving approvals, obtaining resources, and involving the change advisory board (CAB) where necessary
• Coordination: Scheduling, development, testing, and implementation
• Evaluation and closure: Determining success and learning from the experience
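The six centrally managed activities above can be read as an ordered lifecycle that each RFC moves through. The sketch below models that ordering in Python and keeps a history of stages as a simple audit trail. It is an illustration under stated assumptions: the `RfcStage` and `Rfc` names are hypothetical, and the sketch ignores rejection at the acceptance stage and any looping back between stages.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class RfcStage(Enum):
    """Stages named after the activities in the text, in order."""
    RECORDING = 1
    ACCEPTANCE = 2
    CLASSIFICATION = 3
    PLANNING_AND_APPROVAL = 4
    COORDINATION = 5
    EVALUATION_AND_CLOSURE = 6


@dataclass
class Rfc:
    """Hypothetical RFC record that tracks its current stage and stage history."""
    summary: str
    stage: RfcStage = RfcStage.RECORDING
    history: List[RfcStage] = field(default_factory=lambda: [RfcStage.RECORDING])

    def advance(self) -> RfcStage:
        """Move to the next stage and record the transition."""
        if self.stage is RfcStage.EVALUATION_AND_CLOSURE:
            raise ValueError("RFC has already reached evaluation and closure")
        self.stage = RfcStage(self.stage.value + 1)
        self.history.append(self.stage)
        return self.stage


rfc = Rfc("Upgrade database server")
while rfc.stage is not RfcStage.EVALUATION_AND_CLOSURE:
    rfc.advance()
print([stage.name for stage in rfc.history])
```

Because every transition goes through `advance`, the history cannot skip a stage, which is the property an auditor would look for in a change request log.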
Incident Management
The Incident Management process needs to manage all incidents from detection and recording through to resolution and closure. Incident Management is reactive by nature. The objectives of Incident Management are to reduce or eliminate the business impacts and effects of actual or likely disturbances within IT services, so that personnel can get back to work and the business can return to normal as soon as possible.
Another COBIT internal control objective is managing incidents. The following Incident Management activities also support compliance requirements:
• Documenting and maintaining a formal incident management system.
• Establishing and maintaining formally documented incident management procedures.
• Providing training for, and consistently following, incident management procedures.
• Obtaining clearly documented management support for incident management processes.
• Establishing consistent, well-documented incident reports that include information about the incident, how the incident was analyzed, and how it was resolved.
• Establishing incident management audit trails to track the entire incident resolution lifecycle, from initial report to confirmed resolution.
• Establishing procedures to respond to unauthorized activities in a timely manner.
Well-defined documented procedures, automated where possible, help to further support compliance. Automation helps to ensure procedures are consistently and completely followed and reduce the amount of human error. The types of activities that occur within Incident Management that can be automated to support compliance requirements include:
• Incident acceptance and recording: Detecting and reporting an incident and then creating an incident record
• Classification and initial support: Assigning the incident a type, status, impact, urgency, priority, service level agreement (SLA), and so on to help facilitate the most appropriate response; this should include providing temporary workarounds whenever applicable
• Service request: Documenting and implementing automated procedures to request IT services whenever necessary to support incident response
• Matching: Determining whether the incident is known and if there is a workaround in place
• Investigation and diagnosis: Determining whether a known solution to the incident exists and, if not, following procedures to launch an investigation
• Resolution and recovery: Following procedures to find a solution, documenting it, and then automatically notifying the appropriate individuals and areas
• Closure: Upon obtaining confirmation from those notified that the solution is satisfactory, following automated procedures to formally close the incident
• Progress monitoring and tracking: Throughout the incident response life cycle, monitoring progress so that the time it takes to resolve the incident is recorded; in addition, ensuring that, when roadblocks occur, the incident is appropriately escalated to the next level of support.
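The matching and investigation steps in the list above amount to a lookup: if the incident corresponds to a known error with a workaround, apply it; otherwise, start an investigation. The following Python sketch illustrates that decision. The `Incident` fields, the status strings, and the idea of keying known errors by a normalized symptom are all assumptions made for the example; a real service desk tool would match on richer data such as affected configuration items.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Incident:
    """Hypothetical incident record with only the fields this sketch needs."""
    incident_id: str
    symptom: str
    workaround: Optional[str] = None
    status: str = "recorded"


def match_incident(incident: Incident, known_errors: Dict[str, str]) -> Incident:
    """Attach a known workaround if one exists; otherwise flag for investigation."""
    key = incident.symptom.strip().lower()  # naive normalization, an assumption
    workaround = known_errors.get(key)
    if workaround is not None:
        incident.workaround = workaround
        incident.status = "workaround available"
    else:
        incident.status = "needs investigation"
    return incident


# Known errors keyed by symptom; the entries are made-up examples.
known_errors = {"login page times out": "restart the session service"}

matched = match_incident(Incident("INC-1", "Login page times out"), known_errors)
unmatched = match_incident(Incident("INC-2", "Printer queue stuck"), known_errors)
print(matched.status, "|", unmatched.status)
```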
Problem Management
So how is a problem different from an incident? As I discussed in Chapter 1, a problem is generally an unwanted or undesirable situation that, if not addressed soon enough, can become the root cause of an incident. Problem Management takes the entire IT infrastructure into account, using all available information, to identify existing and potential failures in the delivery of IT services. Problem Management supports Incident Management by providing alternative workarounds and temporary fixes during an incident but does not have responsibility for actually resolving incidents. Problem Management also involves the analysis of incidents and problems to identify trends and then subsequently takes proactive actions to prevent further occurrences of similar incidents and problems.
Problem Management also supports COBIT internal control objectives and, as a result, compliance with laws and policies. The following Problem Management activities support compliance requirements:
• Establishing a documented Problem Management system and ensuring it is being used throughout the enterprise
• Establishing formally documented procedures to use the Problem Management system, including consistent reports and review practices
• Following formally documented procedures to create audit trails for Problem Management activities
Well-defined documented Problem Management procedures, automated where possible, help to further support compliance. As with Incident Management, automation helps to ensure procedures are consistently and completely followed and reduce the amount of human error. The types of activities that occur within Problem Management that can be automated to support compliance requirements include:
• Problem identification and recording: Automating problem reporting helps to streamline the identification of known and new problems, in addition to supporting better trend analysis.
• Problem classification and allocation: Determining the category, impact, urgency, priority, and status of a problem then allocating resources for resolution is made more efficient through automation.
• Problem investigation and diagnosis: Determining the cause of the problem and linking it to the appropriate CIs is more accurate and time efficient through automation.
• Temporary fixes: Implementing necessary temporary or emergency fixes to manage known errors until they can be resolved is accomplished much more quickly by using automated processes to identify the temporary fixes.
• Error identification and recording: Identifying the error and then communicating the error to Incident Management, if appropriate, is made easier through automation.
• Error assessment: Determining what is necessary to resolve known problems and errors is made easier through automation.
• Record error resolution: Determining the most appropriate business solution is done more quickly through automation.
• Close error and associated problems: Performing a Post Implementation Review (PIR) and then closing the records is done more accurately and efficiently through automation.
An effective IT control strategy will utilize all these controls and be designed to minimize risk to the business. By implementing these controls following ITIL, regulatory and policy compliance in large part can be achieved.
Summary
As organizations continue to look for better ways to manage IT while meeting regulatory and policy compliance, ITIL continues to grow in popularity. As a result, organizations also realize better integration of IT throughout all enterprise business processes. Putting ITIL in place requires careful planning and commitment, and it is usually expensive. ITIL is often best implemented with other frameworks, particularly COBIT, to meet compliance requirements. However, organizations that take a proactive approach to compliance and frameworks implementation realize they also achieve greater efficiency, reduced operational and legal risk, and lower operational expense.
According to studies of high-performing IT organizations by the IT Process Institute, organizations implementing frameworks as part of their compliance efforts spent fewer than 10 full-time equivalent (FTE) staff-years on SOX Section 404 activities, compared with hundreds of FTEs in other organizations. The organizations working towards frameworks and compliance goals spent less than 5% of their time on IT problem resolution compared with 35% to 45% spent on unplanned, unscheduled work in other IT organizations that were not using frameworks [Behr, K., G. Kim, and G. Spafford, The Visible Ops Handbook, Information Technology Process Institute (ITPI), 2004-2005]. ITIL implementation continues to grow throughout the world, a reminder of the growing importance of international standards.
When you are implementing controls and processes to meet compliance requirements so that you can avoid litigation, fines, and penalties under your applicable laws and policies, take the opportunity to also act strategically to incorporate IT throughout all your organization's business decision-making processes. You will find that taking this risk-based, frameworks approach will create valuable benefits beyond compliance. You will see that the resulting strong IT controls strategy will achieve compliance objectives as well as increase IT efficiency and effectiveness.
Chapter 5
There were processes in place but each business unit had their own unique way of getting their jobs accomplished. There were still old desktop computers and servers being utilized. There was a centralized Help desk, but the IT areas did not use it. In fact, the IT area told their applications and systems customers to contact the IT staff directly if the customers needed help. The only time the Help desk was called was basically for password resets. Okay, so where to start? This sounds like a job for ITIL!
Getting Ready
Organizations of any size can benefit from centralizing IT processes as much as possible. The IT Service Support processes that tend to impact all organizations regardless of size are Change Management, Problem Management, and Incident Management. The Generic Manufacturing Company can benefit from effective centralization of its many IT processes. The company determined that it could probably benefit by clearly establishing one process and one area to be ultimately responsible for change, problem, and incident handling processes.
Consider all the possibilities for where and when to begin. This is the initial stage of ITIL implementation. It is important to get the different areas, currently mistrustful and at odds with each other, to understand the improvements that can be made through cooperative implementation of ITIL. The business management was all for implementing a process to make IT management go more smoothly and, in fact, improve business results. The Generic Manufacturing Company developed the ITIL process implementation roadmap that Figure 5.1 shows. All organizations can take this roadmap and modify it to meet their own enterprise's unique environment.
[Figure 5.1: ITIL process implementation roadmap.]
Planning: Set scope; Identify stakeholders; Determine current situation; Identify trouble spots; Perform benchmark
Implementation: Create awareness; Train personnel; Implement plan; Manage organizational change; Manage cultural change
Measurement: Review status; Measure goals; Measure organizational changes; Measure cultural changes; Document problems and vulnerabilities
It is not practical to implement all aspects of ITIL Service Support at one time! Not all the required process inputs will be available when the first process is initiated. There will be information quality issues where key process input areas are absent. It will be very difficult, and often impossible, to determine the impact of a change on business services availability, capacity, and continuity when supporting processes do not exist. This increases the possibility of problems occurring when deploying the processes.
Be sure that representatives from each of the ITIL Service Support process teams are included in the CAB. These representatives can then be made aware of the impact of the changes on the user community.
Be sure to include information security within your processes. Information security is a critical function within the IT organization. It is critical to include information security in the authorization of all network and information system changes. Doing so ensures that security is accounted for in the development of the change. Information security should also be part of the CAB.
Realizing Improvements Are Needed
The Generic Manufacturing Company performed an assessment to clearly identify and document the problems. Three major trouble areas were discovered:
- The multiple areas of the company were each following different processes to move applications from test to pilot to production environments. One of the areas didn't even have a pilot (end-user quality assurance) environment and moved applications directly from the test environment to the production environment!
- IT problems were handled very differently throughout the enterprise. A couple of the business unit application support areas told their end users to call them directly to handle problems. Other business units directed end users to call the corporate Help desk. Documentation of the problems was inconsistent, was not centralized, and often did not exist. The same problems seemed to occur over and over again.
- IT-related incidents were recurring with increasing frequency. Many times the same incident occurred in different parts of the enterprise, and different teams handled the same incident differently. The teams handling the incidents did not communicate with each other about what worked well and what didn't work for incident resolution.
Generic Manufacturing Company IT leaders realized these problems were likely having a major negative impact on the business. They wanted to determine specifically how much impact, and then determine what could be done to improve the situation. These trouble areas are common to most organizations.
Get Executive Support
Executive management must clearly support the implementation of ITIL processes throughout the enterprise. Without this support, the people who must be involved with implementing the necessary changes will not do so and will continue with business as usual. It is human nature to continue doing things as they are currently done; in a day that already feels too busy, sticking with the status quo is less work. Unless executive leaders tell personnel that changes must be made, there will not be full cooperation. Lack of cooperation will lessen the effectiveness of the ITIL implementation and could result in an unsuccessful project. The Generic Manufacturing Company CIO explained the three major problems to the corporate executives. The explanation described the ITIL Service Support processes and how they could help to improve or eliminate the problems. The IT staff asked for, and got, executive support for a project to determine the extent of the problems throughout the enterprise, and then to determine what specifically should be done to address them.
Choose Team Members
Another key component for success is obtaining the qualified personnel to perform the implementation and ongoing tasks necessary within the ITIL Service Support processes. There will be a need for some full-time employees (FTEs), but there will also be the need to add responsibilities to existing positions. Even with the strong and visible support of executive business leaders, buy-in from all stakeholders, the establishment of the CMDB, the creation of a good process definition, and integration and automation of the processes throughout the enterprise, success will not be accomplished if you do not have personnel performing the necessary activities. You must also ensure that your internal customers will be actively involved in the ITIL process development, implementation, and maintenance. This will make certain your customers have tested and accepted the components of the ITIL process, ensuring the requirements have been met. Be sure to obtain feedback so that customer needs are considered and included within the planning process. There will be similar team members for each of the ITIL processes. The key roles for the ITIL Service Support processes include:
- Change Manager: This position will have authority and accountability to define, validate, and maintain the Change Management process. This position is responsible for oversight of Change Management process monitoring, measuring, reporting, and operations.
- Support Center Manager: This position should help determine how the Incident Management process can provide data relating to the other processes. This is the central point of contact for customers, so this position should ensure that the information identified and classified within the Support Center will be valuable and useful to other parts of the organization.
- Project Manager: This position is responsible for assisting with the initiation of the project, creating the project plan, executing the plan, and then closing the project. The Project Manager is also responsible for providing status updates for each project milestone.
- Human Resources (HR): A representative from HR should create job descriptions for new positions related to the Change Management process. HR can also identify potential internal candidates for the new positions.
- Trainer: A position needs to exist to ensure that training is provided to targeted positions and that awareness communications reach the entire enterprise.
- Purchasing Department: Someone from the purchasing department should be enlisted to negotiate the best prices and services possible from the vendors you will use to implement and support the Change Management process.
- Applications Development: Enlist the experts from your applications development areas to help choose Change Management process software. They can also identify the hardware requirements for the software. They should also discuss the products being considered with the Support Center staff to make sure integration with the other ITIL processes is facilitated.
- Business Unit Managers or Representatives: Your internal customers must be represented within the Change Management process planning, implementation, and ongoing management to ensure the needs of the business are appropriately considered. This person must have in-depth knowledge of how the business works and must be able to translate and communicate the process issues and requirements into business terms for the personnel within the business unit.
- Support Center Staff: These personnel will typically not participate in the implementation of the Change Management process but will provide service delivery through their established communications channels, such as email, telephone, intranet site, and so on. The Support Center owns the Incident Management process. This area is the point of contact for customers and must be able to provide them with good advice and guidance. The Support Center staff must also obtain feedback regarding the problems reported through the Problem Management process. They should also create regular, typically daily, reports of the top-ten most common problems reported each day so that reported problems can be addressed most effectively and the effect of ongoing, repeat problems lessened.
- Incident Team Members: These personnel will work with the application providers to create Service Level Agreements (SLAs) and Operating Level Agreements. They also will manage the Support Center knowledge base and management reporting.
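The daily top-ten problem report described for the Support Center staff can be produced with very little tooling. The following is a minimal sketch only, assuming each reported problem has already been logged with a short category label; the function name `top_problems` and the sample categories are hypothetical and not taken from any particular Support Center product.

```python
from collections import Counter


def top_problems(reports, n=10):
    """Return the n most frequently reported problem categories with counts."""
    # Normalize case and whitespace so "Mailbox full" and "mailbox full" match.
    counts = Counter(r.strip().lower() for r in reports if r.strip())
    return counts.most_common(n)


# Hypothetical problem categories logged by the Support Center in one day.
sample_reports = [
    "Mailbox full", "Cannot send mail", "mailbox full",
    "Password reset", "Cannot send mail", "Mailbox full",
]

for category, count in top_problems(sample_reports):
    print(f"{count:3d}  {category}")
```

A real report would draw its categories from the incident records themselves; the point of the sketch is only that consistent classification at the Support Center makes this kind of summary trivial to produce.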
Create Mission Statements
To achieve success, organizations must first define success. It is necessary to create a mission statement for each of the ITIL processes you are implementing. The following example shows the mission statement Generic Manufacturing Company created. You can use this as an example on which to base your own mission statements. You will need to modify it to fit your own organization's style of writing, along with your industry and environment.
Generic Manufacturing Company Change Management Mission Statement
The mission of the Generic Manufacturing Company Change Management process is to enable technical changes within the production environment in the most efficient and consistent way possible to support business objectives and with the least amount of disruption resulting from making IT changes. The purpose of the Change Management process is to ensure changes made within the IT environments are consistently tracked, reviewed, tested, communicated, implemented, and validated to reduce the negative impacts to the business as much as possible.
The Generic Manufacturing Company determined that the scope of the ITIL processes they would first implement would be Change Management, Problem Management, and Incident Management related to the email system used throughout the enterprise.
Identify Stakeholders
You must understand who the stakeholders are; their input is necessary to understand how to improve the processes and to determine how successfully the processes were implemented. Define, identify, and map the stakeholders for each of the ITIL processes you are implementing. Identify the specific needs of each type of stakeholder you've identified. This information will be used in assessing the success of the ITIL process for the corresponding stakeholders. The Generic Manufacturing Company identified the following stakeholders for the email system:
- Information Technology
- Information Security
- Human Resources
- Legal
- Management
- Email Process Owners
- Email Users
These are similar to the stakeholders other organizations will have for the email system.
Determine Current Situation
Organizations must determine the specific needs for change: the need to improve upon the ways IT processes are currently being performed. This need must be documented to validate to the stakeholders why change is necessary. As a generic example, if your organization has $1000 of revenue coming in every hour of every day, and your changes typically cost you 5 hours of lost revenue each week, that calculates to $5000 of lost revenue per week, or $260,000 of lost revenue per year, as a result of changes. Making your Change Management process more efficient and effective could save you many hours, which translates into many thousands of dollars of additional revenue. To demonstrate this, you will need to keep track of some critical measurements. Here are some key measurements to make when implementing the Change Management process:
- How many changes occur within the scope you've identified each month?
- How much time does it take to implement the changes?
- How many incidents and outages occur as a result of the changes?
- How many changes are backed out each month?
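The revenue arithmetic in the generic example above, together with one of the listed measurements (changes backed out), can be checked with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and the 52-week year and the sample change counts are assumptions made for the example.

```python
def lost_revenue(revenue_per_hour, downtime_hours_per_week, weeks_per_year=52):
    """Return (weekly, yearly) revenue lost to change-related downtime."""
    weekly = revenue_per_hour * downtime_hours_per_week
    return weekly, weekly * weeks_per_year


def back_out_rate(changes, backed_out):
    """Fraction of changes in a period that had to be backed out."""
    return backed_out / changes if changes else 0.0


# Figures from the generic example: $1000 per hour, 5 lost hours per week.
weekly, yearly = lost_revenue(1000, 5)
print(f"Lost per week: ${weekly:,}")
print(f"Lost per year: ${yearly:,}")

# Hypothetical month: 40 changes, 4 of them backed out.
print(f"Back-out rate: {back_out_rate(40, 4):.0%}")
```

Recording these values before implementation gives you the baseline against which later improvements can be shown.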
The Generic Manufacturing Company identified the following for their email system:
- Even though the same type of email system, Lotus Notes, was being used throughout the enterprise, there were four different email servers being separately maintained by different business units, each located in a different country from the others.
- Upgrades and patches were applied to each of the four email servers as determined by each maintenance team. This resulted in different versions of Lotus Notes running on the four servers.
- End users reported their email problems to many different areas: sometimes to the IT staff administering one of the servers, sometimes to the central Help desk, sometimes to the Information Security area, and sometimes to their own managers.
- End users also reported email incidents, such as spam or phishing messages, to many different areas.
- A review of each mail server revealed that each server was unavailable because of changes, problems, or incidents anywhere from 1 hour to 8 hours per work week.
Identify Trouble Spots
Where are the trouble spots within your current processes? You must identify the troublesome activities and components within your current processes to ensure not only that you do not recreate them but, more importantly, that you resolve them with your implementation. There are many different tools, automated and manual, that you can use for identifying your trouble spots: flowcharts, Pareto charts, and fishbone diagrams, just to name a few.
A flowchart is a schematic representation, using well-defined symbols, of an algorithm or a process. A flowchart can be used to identify the flow or sequence of events within a process or service. A Pareto chart, named after Vilfredo Pareto, is a special type of bar chart where the values plotted are arranged in descending order. A Pareto chart can focus efforts on the problems that have the greatest potential for improvement by illustrating their relative frequency or size within a bar graph. A fishbone diagram, also called a "cause and effect" diagram or an "Ishikawa" diagram after its creator, shows the causes of a certain event. A fishbone diagram can allow team members to identify and graphically display all the possible causes related to a problem or condition to help determine the root causes.
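The data behind a Pareto chart is easy to prepare before any charting tool is involved. The sketch below, offered under stated assumptions, sorts problem categories in descending order of frequency as described above and also adds a cumulative percentage, which is a common addition to Pareto charts rather than something this guide requires. The function name and the sample counts are hypothetical.

```python
def pareto_table(counts):
    """Sort categories by count, descending, and add cumulative percentages."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    running = 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# Hypothetical monthly counts of problem causes.
sample_counts = {
    "Inconsistent documentation": 12,
    "Problem went unreported": 30,
    "Wrong area was called": 18,
}

for category, count, cumulative in pareto_table(sample_counts):
    print(f"{category:<28} {count:>3}  {cumulative:>5}%")
```

The rows at the top of the table are the ones where improvement effort is likely to pay off first.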
The Generic Manufacturing Company ITIL implementation team decided they would use fishbone diagrams to identify where their trouble spots were for their email system Change Management, Problem Management, and Incident Management processes. Figure 5.2 shows the diagram they created to show the trouble spots within their email system Problem Management process. This provides an example of how you could use a fishbone diagram to help identify the trouble spots when you are planning for your ITIL Service Support processes.
Figure 5.2: Fishbone diagram of trouble spots within the email system Problem Management process.
Categories shown: Customers, Documentation, Methods, Contacts, Linkages, Knowledge
Causes shown: Inconsistent Problem Reporting; Problems Go Unreported; Handwritten Notes; No Documentation; Inconsistent Documentation; Responsibility for Handling Problems is Not Assigned; Different Areas are Called; Problems are not communicated to other business units; Problems are inconsistently communicated to other business units
Perform Benchmarks
After you've identified the problem areas and the need for improvements, you can establish a benchmark. Why create a benchmark? Simple: to be able to determine how much you've improved your process following implementation and to help continue process improvement. If you do not measure where your organization currently stands (your benchmark), you will not be able to clearly show how much change has occurred as a result of implementing the ITIL processes. ITIL has a process maturity framework (PMF) that organizations can use to measure, or benchmark, a process and then to provide the context for measuring the maturity of the process as time goes on.
The PMF assumes that a Quality Management System (QMS) is in place and that there is a goal to improve one or more aspects of the process: effectiveness, efficiency, economy, or equity. A few of the QMS models ITIL uses include those by Deming, Juran, Baldrige, and Crosby.
Table 5.1 shows the five levels within the ITIL PMF.
Level | PMF | Description
1 | Initial | Little to no documentation or assigned responsibilities for the process.
2 | Repeatable | The process is documented but there are limited operational processes in place; it is not viewed as having significant importance.
3 | Defined | The process is documented and has an owner, objectives, and allocated resources; however, acceptance throughout IT may not exist.
4 | Managed | The process is well-documented and implemented throughout all the business units and IT. The process interfaces with other processes.
5 | Optimized | There is seamless integration of the process throughout IT and business areas. The process has become institutionalized as part of everyday activity.
The Generic Manufacturing Company's assessment clearly and quickly showed that they are at Level 1 within the PMF.
Planning
When you have the trouble spots documented and the benchmark complete, it is time to formally document the plan to address the issues from your findings.
Document the Business Case
The single biggest factor in successfully implementing ITIL Service Support processes will likely be overcoming resistance to change. To overcome this challenge, partner with all your stakeholders to gain their buy-in for the changes and use their input to create the processes you are implementing. Use the information from your assessment and benchmark to build your business case. Because the business case is a detailed investment proposal, include within it a detailed analysis of all the costs, benefits, and risks associated with the proposed investment in implementing Change Management, Incident Management, and Problem Management. Put the investment decision into the context of strategic business goals. Position the business objectives and goals with the options involved with each of the ITIL processes that impact the decision makers. Use one of the many management tools, automated or manual, to help you demonstrate the improvements that the business can realize by implementing the ITIL processes. The Generic Manufacturing Company created their report by linking the current situation with the business goals for revenue, showing how much money they were losing through downtime and through the employee time taken to respond to poorly executed changes, inconsistent handling of incidents, and ongoing problems. They used Pareto charts to demonstrate their current situation and focused on the problems and alternatives projected to have the most positive impact on the business. Figure 5.3 shows the Pareto chart for their problems, current and projected, following Problem Management process implementation. Use this example to inspire your plans to make the business case.
Figure 5.3: Example Pareto chart for Problem Management process implementation.
Set Goals
You must establish documented goals to be able to know whether you are successful in your efforts. Clearly define the goals for your specific organization for implementing ITIL Service Support processes. The following example highlights the goals defined by the Generic Manufacturing Company.
The Generic Manufacturing Company's Goals for Implementing ITIL Service Support Processes
The goal for implementing a formal Incident Management process within the Generic Manufacturing Company is to restore normal service operation as quickly as possible, using consistent practices, and to minimize the negative impact on business operations to ensure the best possible quality and availability of IT service levels as defined within the IT SLA.
The goal for implementing a formal Problem Management process is to minimize the negative impacts of problems and the resulting incidents on the business. The Problem Management process will maximize IT services by correcting problems and preventing recurrences.
The goal for implementing a formal Change Management process is to ensure that standardized and consistent methods and procedures are used to efficiently and promptly handle all IT changes to minimize the impact of change-related problems upon IT service quality, thus improving the day-to-day operations of the organization.
Use the Generic Manufacturing Company Service Support goals as a basis for creating your own customized goals. Communicate the goals within your awareness messages.
Create the Implementation Plan
Carefully develop the implementation plan. Too many organizations spend too little time planning implementations and then end up spending ten or twenty times longer on the actual implementation than they would have if they had invested more planning time up front. Think through the steps necessary for your own organization. Each business will be different. The Generic Manufacturing Company created detailed implementation plans for their ITIL Service Support processes. Table 5.2 shows a sample from their Incident Management process implementation.
Section | Action | Start Date | Due Date
1 | Scope | |
1.1 | Identify areas impacted by the Incident Management process | |
1.2 | Identify linkages to other processes | |
2 | Roles | |
2.1 | Identify existing Incident Management roles | |
2.2 | Update roles and responsibilities | |
2.3 | Identify personnel to fill roles and assume responsibilities | |
2.4 | Identify CAB members | |
2.5 | Identify project team roles | |
3 | Awareness and Training | |
3.1 | Identify groups that need targeted training | |
3.2 | Identify awareness communications necessary | |
3.3 | Decide whether to create training content or bring in from outside | |
3.4 | Create awareness communications | |
3.5 | Create timeline for sending awareness communications | |
3.6 | Identify dates for providing training | |
3.7 | Send awareness communications | |
3.8 | Deliver training | |
Use this plan sample as an example upon which to create your own, customized plan for your business. Remember, this is just a sample; yours will need to be more detailed.
Create Policies
You must document the high-level plans to describe the goals for your organization with regard to the ITIL processes. This documentation is critical for describing management's decisions regarding their commitment, direction, and planned course of action. Create policies for each of the ITIL Service Management processes you decide to implement. The following list shows some of the Change Management policies created by the Generic Manufacturing Company:
- All changes within the enterprise email systems must follow the established and documented Change Management procedure.
- Each submitted change must have a detailed script describing the change.
- The Change Management Project Manager is responsible for reviewing the scripts and approving change requests before the change can be scheduled.
- Following change request approval, any deviation from the pre-approved script requires Change Management approval.
- Each change request must include the following information:
  - Activity description
  - Details of change justification
  - Possible impacts to the customer and other production systems
  - Scheduled start and end times
  - Necessary resources
- Each approved change request must have a documented fallback option.
- Any vendor contracted to do changes must do the work onsite to enable appropriate oversight of the activities.
Use these Change Management policies as examples to help you get started with your own organization's policy development.
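If change requests are captured electronically, the required-information policy can be enforced at submission time. The following is a minimal sketch, not a description of any Change Management product: the `ChangeRequest` class, its field names, and the sample values are all hypothetical, chosen only to mirror the information items and the fallback requirement in the example policies.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ChangeRequest:
    """Hypothetical record mirroring the example policy's required items."""
    activity_description: str
    justification: str
    possible_impacts: str
    scheduled_start: datetime
    scheduled_end: datetime
    resources: list = field(default_factory=list)
    fallback_option: str = ""

    def problems(self):
        """Return a list of policy violations; an empty list means complete."""
        issues = []
        for name in ("activity_description", "justification", "possible_impacts"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        if self.scheduled_end <= self.scheduled_start:
            issues.append("scheduled end must be after scheduled start")
        if not self.resources:
            issues.append("missing resources")
        if not self.fallback_option.strip():
            issues.append("missing fallback_option")
        return issues


# Hypothetical request for an email server patch.
request = ChangeRequest(
    activity_description="Apply patch to mail server",
    justification="Vendor security fix",
    possible_impacts="Mail unavailable for up to 1 hour",
    scheduled_start=datetime(2024, 1, 13, 22, 0),
    scheduled_end=datetime(2024, 1, 13, 23, 0),
    resources=["mail administrator"],
    fallback_option="Restore server from pre-change backup",
)
print(request.problems())
```

A reviewer would still need to judge the quality of the script and the justification; a check like this only confirms that nothing required has been left out.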
Identify Responsibilities
You will need to assign responsibilities within each of the ITIL Service Support processes. Most, if not all, of the people filling these roles will be the same as the implementation team members within the corresponding defined roles. However, these responsibilities are different in that, as opposed to implementation activities, these roles will be responsible for ongoing Service Support activities. These are the ITIL Service Support responsibilities the Generic Manufacturing Company defined:
- 1st Level Support: Registers and classifies incident reports, performs immediate actions, and keeps users informed about the situation status at specified intervals. If 1st Level Support cannot resolve the situation, it is transferred to the appropriate 2nd Level Technical Support Group.
- 2nd Level Support: Takes over situations that cannot be solved immediately by 1st Level Support. If necessary, requests external support, such as from software or hardware manufacturers. If the situation cannot be resolved, 2nd Level Support passes the situation on to Problem Management.
- 3rd Level Support: Resources within the hardware or software manufacturers whose services are requested by 2nd Level Support.
- Incident Manager: Responsible for the effective implementation of the Service Desk and Incident Management process. Carries out reporting procedures. The first point of contact for incidents.
- Problem Manager: Researches the root causes of problems and incidents. If possible, makes workarounds available to the Incident Management team. Develops final solutions for Known Errors.
- Change Manager: Authorizes and documents all IT infrastructure and configuration item changes. Determines and communicates the sequence of individual change stages. Involves the CAB when necessary.
- Release Manager: Responsible for consistently and effectively implementing changes to the IT infrastructure. Plans, monitors, and implements changes in coordination with Change Management.
- Configuration Manager: Prepares and makes available to the Service Management teams the necessary information about the IT infrastructure and services. Maintains the configuration items and related documentation for the components of the IT infrastructure. Documents changes and checks the updated information regularly. Automates the CMDB update process as much as possible.
Implementation
When the mission statements have been clearly articulated, they must be effectively communicated throughout the enterprise. If you do not make your personnel aware of the Service Support processes, they will not follow the processes or will not know what the processes require. The implementation plan and the purpose of each process must be clearly and effectively communicated to each of the key stakeholders. Ask for feedback to ensure that the needs from each of their areas are adequately incorporated into the process as well as to obtain their ongoing support.
Train Personnel
Success cannot be accomplished with a group of implementation folks who have no knowledge about what must be done for each of the ITIL processes. Training the team members for each of your processes is yet another key to success for your process implementation. There must be a clear understanding of the goals for each of the processes. You can provide the team members with this knowledge through a wide variety of methods: assigned reading, classroom training, sending them to industry meetings, taking computer-based training (CBT), or bringing in an outside trainer who is an expert in the ITIL process. When implementing ITIL, it is also a good idea to have at least the leader for each of the ITIL processes invest the time necessary to attain ITIL Foundation Level Certification as well as practitioner-level certification in their specific ITIL process.
For more information about ITIL certifications see http://www.itilofficialsite.com/Qualifications/HowtoStart.asp.
Implement the Plan
At this point, you should have a very clearly documented, detailed implementation plan. Provide enough time between the development and implementation phases to allow for one of Murphy's Laws (everything will take twice as long as anticipated) and to allow for training to effectively occur. Make sure all stakeholders will be provided with advance notice of the upcoming changes. Be sure to clearly communicate to them how they will or may be affected by the changes. Use a phased rollout approach to avoid a "big bang" type of situation. Small changes are difficult enough for personnel to deal with. If you try to throw many changes at them at once, you are setting yourself up for failure at best and disaster at worst.
Use Tools to Manage Change
It is important that any tools you choose to implement the ITIL processes actually follow or support the ITIL philosophy. Keep the following in mind when choosing your ITIL Service Support tools:
- Be sure to include the people who will be using the tools for different activities involved with the processes when identifying tool requirements and testing the tools.
- If you are replacing old tools with new ones, remember that there may be more users for the tools than when you did not have the processes in place.
- Take into consideration the impact the considered tools will have upon the network and supporting infrastructures.
- Be sure to obtain enough licenses to cover all possible users.
- Start defining the requirements you have for supporting tools early in the Service Support planning processes; don't wait until you are planning to implement the processes.
- Assign a ranking or level of importance to each of the tool requirements you identify.
- Be sure to evaluate the tool in terms of how well it will meet your defined goals and requirements.
- Look for tools that are customizable and consider how much effort and cost is involved with that customization.
- Take into consideration the amount of training and expertise the possible tools will require. Will you be able to obtain that expertise in-house or will you need to go outside your enterprise?
- Be sure to test the potential tools thoroughly. Clearly define scenarios and choose testers from throughout your stakeholders to ensure you consider the perspective of all the ultimate tool users.
Be sure to choose tools that will offer seamless integration with the other IT tools throughout the enterprise to reduce integration risks.
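One simple way to combine ranked requirements with a tool evaluation is a weighted score. The sketch below is one possible approach rather than a method prescribed by ITIL or this guide: the importance weights, the 0 to 5 rating scale, the requirement names, and the tools "Tool A" and "Tool B" are all assumptions made for illustration.

```python
def weighted_score(weights, ratings):
    """Weighted average of a tool's ratings, using requirement importance weights."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("at least one requirement needs a non-zero weight")
    # A requirement the tool was not rated on counts as 0.
    return sum(weights[r] * ratings.get(r, 0) for r in weights) / total_weight


# Hypothetical importance ranking (higher means more important).
weights = {"integration": 5, "customizable": 3, "training effort": 2}

# Hypothetical ratings on a 0 to 5 scale gathered from stakeholder testers.
tools = {
    "Tool A": {"integration": 4, "customizable": 2, "training effort": 5},
    "Tool B": {"integration": 3, "customizable": 5, "training effort": 2},
}

ranked = sorted(tools, key=lambda t: weighted_score(weights, tools[t]), reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(weights, tools[name]):.2f}")
```

The score is an aid to discussion, not a decision; agreeing on the weights with your stakeholders before rating any tool helps keep the comparison honest.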
Measurement
Chapters 2 and 3 provided key performance indicators (KPIs) to use to measure the success of Change Management, Problem Management, and Incident Management. Be sure to use them! You must also measure the success of your implementation activities as you go along. Tracking key measurements will help to ensure optimization of your IT investments and will be used to validate to your customers the effectiveness of the processes.
Review Status
Carefully track the status of each of the activities involved with each of the Service Support processes. You first need to know what your current state is for each of the processes you are implementing: your benchmark values. You then need to track how much time the implementation tasks are consuming. Once implementation is complete, you need to determine how much time the changes take. How much time does responding to incidents take? How much time does resolving problems take? Be sure to carefully document all these status metrics, not only to see your progress but also to enable you to answer executive management questions regarding your implementation progress.
Measure Goals
Look at the goals you documented during project planning. How close are you to meeting those goals? Document the goals that have been achieved, along with how close you are to meeting the goals that you have not yet met.
Measure Changes
Document the changes that have occurred not only within IT but also within the entire enterprise as a result of implementing the ITIL Service Support processes. Some of these changes may include:
- Reduced IT costs
- Greater business process productivity
- Improved communications
- Increased IT reliability
It is important to document the changes not only to ensure that improvements can continually be made within the processes but also to provide documentation to validate the value of the processes within the organization. Many organizations have a tendency to abandon processes once improvements have been made. Your business leaders must understand through your well-written documentation that the processes must be preserved, and improved upon as necessary, in order to keep those noticeable and measured improvements in place. It is also important to be realistic in your efforts. You should not expect to be certified as ITIL compliant right away, and perhaps not ever. However, what you should expect from a successful implementation is documented and noticeably improved processes, validated through your measurements.
Document Problems and Vulnerabilities
No ITIL Service Support implementation will occur without snags. And there are always vulnerabilities within the enterprise that will put certain portions of the processes at risk. It is important to identify and document these problems and vulnerabilities so that you can most effectively address them. Some of the common problems and vulnerabilities experienced by organizations include:
- Lack of strong and visible commitment from executive management
- Shortage of resources needed for implementation of one or more of the processes
- Shortage of resources needed for ongoing maintenance of one or more of the processes
- Lack of personnel and/or implementation team awareness and understanding of the processes and how they apply to their respective job responsibilities
- Customer service levels or the processes not clearly defined or inadequately defined
- Poor integration with other processes
- Personnel resistance to change
- Workarounds are not effectively or consistently shared with other support staff
- Change updates are not communicated
- Lack of established customer service levels
- Poor or non-existent tools to support the Service Support processes
Plan for Ongoing Management
As I stated earlier, implementing ITIL Service Support processes is not a one-time activity. To keep the processes effective, organizations need to develop and consistently follow plans for ongoing management and operations activities. These activities can be simplified and made more efficient through automation. The following are some examples of how ongoing management and operations activities can be automated:
• Automated auditing tools can be used to compare the documented infrastructure with what the infrastructure actually looks like.
• Change detection tools can be used to identify changes within the infrastructure and compare them with the approved changes documented within the Change Management database.
• Detection tools can alert IT staff to intrusions and possible threats and vulnerabilities.
• Incident tracking software can be used to identify incidents related to specific changes. This will make it easier to report on them separately from other incidents and will enable incident trends and summary reports to be more easily created.
• IT service monitoring can show the service impact resulting from changes, incidents, and problems.
• Network search tools can be used to identify all the devices throughout the enterprise. Some are especially helpful because they can also identify software versions and hardware configurations.
• Report automation software can standardize, speed up, and simplify reporting tasks.
• Self-service tools can allow end users to perform ITIL Service Support activities themselves, without involving other personnel, which will allow those personnel to continue with their other job responsibilities.
• System infrastructure monitors can be used to monitor device availability during planned outages, changes, incident response, and problem resolution.
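To make the first two examples more concrete, the following is a minimal sketch in Python of the kind of comparison that an auditing or change detection tool performs. It is an illustration only and does not represent any particular product. The ConfigItem record, the audit function, the device names, and the format of the approved change records are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigItem:
    """One device as documented or as discovered (hypothetical record)."""
    name: str
    software_version: str


def audit(documented, discovered, approved_changes):
    """Compare the documented infrastructure with the discovered one.

    documented, discovered: dicts mapping device name -> ConfigItem.
    approved_changes: set of (device name, new version) pairs, standing in
    for approved records in a Change Management database (assumed format).
    Returns a dict of sorted device names keyed by finding type.
    """
    findings = {"undocumented": [], "missing": [], "unapproved_change": []}
    for name, item in discovered.items():
        if name not in documented:
            # Found on the network but never documented.
            findings["undocumented"].append(name)
        elif item != documented[name]:
            # The device differs from its documented state; check whether
            # an approved change explains the new version.
            if (name, item.software_version) not in approved_changes:
                findings["unapproved_change"].append(name)
    for name in documented:
        if name not in discovered:
            # Documented but not found during discovery.
            findings["missing"].append(name)
    for names in findings.values():
        names.sort()
    return findings


# Illustrative data only.
documented = {
    "web01": ConfigItem("web01", "2.4"),
    "db01": ConfigItem("db01", "9.1"),
    "mail01": ConfigItem("mail01", "1.0"),
}
discovered = {
    "web01": ConfigItem("web01", "2.5"),    # matches an approved change
    "db01": ConfigItem("db01", "9.2"),      # no approved change on record
    "test07": ConfigItem("test07", "0.1"),  # never documented
}
approved_changes = {("web01", "2.5")}

report = audit(documented, discovered, approved_changes)
for finding, names in report.items():
    print(finding, names)
```

In practice, the discovered data would come from a network search or discovery tool and the approved changes from the Change Management database. Each finding would then be reviewed by IT staff rather than acted upon automatically.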
Summary
IT services are an integral part of all business processes within most organizations. ITIL provides a collection of best practices for performing effective Service Support processes. All ITIL Service Support processes are linked and, theoretically, are dependent upon each other. However, to be most successful with ITIL implementation, organizations must first identify where their most pressing problems exist within the enterprise, then establish a manageable scope for initially introducing and implementing the Service Support processes into the enterprise. The processes that most organizations are likely to benefit from implementing first are Change Management, Problem Management, and Incident Management. Implementation of the chosen ITIL processes within the chosen scope must be strongly and clearly supported by executive management. The implementation plan must be carefully and thoroughly developed. Mistakes within the implementation plan will not only result in a loss of investment cost and personnel time but will also likely result in pushback from the stakeholders against further implementation of the processes, and may even damage the business through disruption of services. Planning and providing a coordinated approach to IT Service Support design, implementation, and maintenance, otherwise referred to as application life cycle support, will help to ensure that IT operations areas deliver services designed to meet business requirements. Coordinating input from the stakeholder and business unit areas will allow the IT organization to better meet service-level requirements and choose new technologies to re-engineer business processes, resulting in greater effectiveness, more efficiency, and positive business impact.