Infrastructure Management Consolidated CS Slides (Mid-Sem 1 To 6) 20220310A
Dr. Phalachandra HL
Acknowledgements:
Significant portions of the information in the slide sets presented through the course in the class are
extracted from IT Systems Management - Rich Schiesser and other books/sources from Internet. Although
permission for use from the book was requested, there has been no response to the same. Since these were
intended only for presentation in the class room, have continued to use but would like to sincerely thank,
acknowledge and reiterate that the credit/rights remain with the original authors/publishers only
BITS Pilani
Pilani|Dubai|Goa|Hyderabad
SS ZG538 Infrastructure Management
Introduction
Session 1
SS ZG538 Infrastructure Management BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956 2
In the course
• IT Infrastructure, IT Infrastructure Management, its challenges, the support needed from executives (Business Case), and then
designing and structuring the IT Organization, Process owners and the responsibilities of Process owners
• The need for staffing and retaining people with the required skills and skill levels, personal and business ethics (or the
lack of them), and their impact in terms of legislation and what that drives into organizations
• Evolution of IT Systems management into customer-centric services & the approach of sharing best practices
with ITIL
• There are 12 key processes used in IT Systems management which will be discussed as part of the course
• Availability Management • Network Management
• Performance/Tuning • Configuration Management
• Production Acceptance • Capacity Planning
• Change Management • Security
• Problem Management • Business Continuity
• Storage Management • Facilities Management
• How to build world-class processes and integrate the processes described above
Defn:
– IT infrastructure consists of equipment, systems, software and services, used in common across an organization,
that are required to develop, test, deliver, monitor, control or support IT Services to customers regardless of
mission/program/project
Systems Management:
• Is the activity of identifying and integrating various products and processes in order to
cost-effectively provide a stable and responsive IT environment
In your perspective what would you believe are the challenges towards
• The focus of CIOs is typically on the application of cost-effective technology, rather than on
the technology itself.
• A Business case is a clear and succinct cost justification for funds for IT Infrastructure
management
• An effective and thorough business case will itemize all of the associated costs of a new
system or process and compare them to the expected benefits.
• Challenge:
It is often very difficult to predict accurately the true benefits of a new system or process.
Even when the estimated benefits are reasonably accurate, it is very hard to put a $ figure
on them, as some of the benefits could be qualitative rather than quantitative
What are the important things from your perspective which you will need
2. Understanding which IT business goals are most critical to achieve the company’s Business
goals. (Alignment)
3. Determining which systems management functions are most critical to meeting the IT business
goals at that point of time. (Selection)
E.g. Focusing on scalability is more beneficial when the provisioned capacity is crossing an
identified critical level than during startup, when capacity is not a constraint
• Awareness that these could change over time as the goals of the company change
4. Meeting and conferring with IT senior management to confirm and prioritize the systems
management functions to be focused on. (Validation)
5. Estimating all costs associated with the implementation and maintenance of a particular function.
• Cost of software licenses
  • Procurement
  • Enhancement
  • Maintenance
• Cost of Manpower
  • Recruiting
  • Training
• Cost of Hardware
  • Procurement
  • Hardware upgrades
  • Hardware maintenance
• Office space
• Scheduled outages
6. Itemizing all benefits associated with the function.
• Being able to predict capacity shortages before they occur
• Reducing the frequency and duration of outages and hence the un-productive time
• Increasing productivity by improving response times
• Ensuring business continuity during disaster recovery
• Avoiding the cost of rebuilding databases and reissuing transactions
15 Jan 2022 SS ZG538 Infrastructure Management 16
IT Infrastructure Management
Executive Support : Building a Business Case
Steps for developing a business case for a systems management function (Contd.)
8. Building credibility for the proposed Business case through surveys, soliciting testimonials from
customers of other similar companies, and demonstrating real-life benefits of a product in an
actual business setting
9. Presenting it in the business terms the executive audience relates to, with the technical
knowledge supporting it
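The cost and benefit itemization in the steps above can be sketched as a small calculation. This is only an illustrative sketch: the `LineItem` type, the `summarize_business_case` helper and every dollar figure are hypothetical and not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    amount: float  # hypothetical annual figure, in dollars


def summarize_business_case(costs, benefits):
    """Total the itemized costs and benefits and compare them."""
    total_cost = sum(item.amount for item in costs)
    total_benefit = sum(item.amount for item in benefits)
    ratio = total_benefit / total_cost if total_cost else float("inf")
    return {
        "total_cost": total_cost,
        "total_benefit": total_benefit,
        "net_benefit": total_benefit - total_cost,
        "benefit_cost_ratio": ratio,
    }


# Hypothetical figures, for illustration only
costs = [
    LineItem("Software licenses", 40_000),
    LineItem("Hardware", 60_000),
    LineItem("Manpower (recruiting, training)", 30_000),
]
benefits = [
    LineItem("Reduced outage time", 90_000),
    LineItem("Improved response times", 50_000),
    LineItem("Avoided database rebuilds", 25_000),
]

summary = summarize_business_case(costs, benefits)
for key, value in summary.items():
    print(f"{key}: {value:,.2f}")
```

Qualitative benefits do not fit into such a table directly, which is the challenge noted earlier; in practice they would be listed alongside the numbers rather than forced into a dollar figure.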
IT Infrastructure Management
Organizing the IT infrastructure for Systems Management
• Organization structure can be looked at as the way in which an IT organization arranges its
tasks, groups tasks into departments, defines or delegates authority, and allocates people and
other resources.
• IT departments, which are responsible for the IT Infrastructure, need to be organized for
optimal efficiency and effectiveness of the systems management processes.
• This will also typically involve a systematic review of human resources, finances, and
priorities.
• This arrangement builds in relationships between the different departments (and their
members) and assigns roles, responsibilities and authority to carry out different
activities.
• Typically there is a responsible leader and multiple (layers of) subordinates.
• There are different alternative approaches/scenarios for structuring the various groups that
comprise the IT infrastructure. These could be Functional, Divisional, Matrix or Flat.
A Basic IT Organization
IT Organization evolving still further to Business Units, Different technical Services, Project
Management, Budgeting and HR … (combination of functional and divisional)
• Since the Service Desk is the first point of contact for users with the IT organization and leaves a
lasting impression of its quality, positioning it higher in the organization can increase its
effectiveness, visibility and stature.
[Figure: organization chart showing Service Desk, Database Administration and Systems Management]
Process Owners
What would you think should be the responsibilities and skills needed for different IT
process owners?
Having the right process owner is one of the critical factors in the implementation of the systems
management processes. This role will need to lead
• Depending on the specific process involved, for administering other support tasks such as
What would you think are the steps involved in Staffing in your
organization
• A Skill set matrix for an environment for a set of positions can simplify the process
of identifying and quantifying the skill levels needed
Skill Set Matrix
Skill No | Area of focus    | Platform    | No of Positions or FTEs | Skill Level (Intern / Junior / Associate / Senior / Lead / Expert)
1        | Operating System | HP Unix     | 2                       |
         |                  | Other       | 1                       |
2        | DBMS             | Oracle/Unix | 1                       |
         |                  | MongoDB/Ux  | 2                       |
3        | Network Systems  | LAN         | 1                       |
• Skill set matrix like the one above is very generic – more specific matrices will need
to be evolved for specific IT functions
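As a rough sketch, the generic matrix above can be held as simple rows and summed per area of focus to quantify staffing needs. The reading of the numeric column as positions/FTEs and the helper name `positions_by_area` are assumptions made for illustration; the skill-level columns are left out because their values are not given.

```python
from collections import defaultdict

# Rows mirror the generic matrix: (area of focus, platform, positions or FTEs).
# Treating the number as positions/FTEs is an assumption.
skill_matrix = [
    ("Operating System", "HP Unix", 2),
    ("Operating System", "Other", 1),
    ("DBMS", "Oracle/Unix", 1),
    ("DBMS", "MongoDB/Ux", 2),
    ("Network Systems", "LAN", 1),
]


def positions_by_area(rows):
    """Sum the number of positions needed for each area of focus."""
    totals = defaultdict(int)
    for area, _platform, positions in rows:
        totals[area] += positions
    return dict(totals)


totals = positions_by_area(skill_matrix)
for area, count in totals.items():
    print(f"{area}: {count}")
```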
1. The first place to source candidates with the required skill set and skill level
would be to look inside the IT organization to see if there are any qualified
candidates (either with exact skills or with necessary technology skills)
available for redeployment.
3. Opening up a staffing requirement and staffing the positions from outside the
company (through referrals, online postings, recruiters, ads etc.)
Many organizations also consider outsourcing some or all of their IT functions to obtain
the required skills.
Factors to consider while outsourcing their IT environments are
• Overall cost savings or increases
• Scalability of resources
• Potential loss of control
• Total cost of maintaining an outsourcing agreement (Hidden costs)
• Credibility and experience of outsourcer
• Possible conflicts of priority
• Geographic and time-zone differences
• Language barriers
• Cultural clashes/morale of employees
• Knowledge retention
The effects of outsourcing also depend on whether all of the IT is being outsourced or
only part of it.
ETHICS
IT Systems Management : Ethics
• Ethics are the moral principles or viewpoints that govern the behavior of
people and businesses, or the conduct of an activity
• Law involves rules for the conduct of individuals and businesses, which may be
used for punishing violations
• We will briefly look at
• Principles and theories associated with Ethics
• Personal Ethics and Business ethics
• Well-known breaches of ethical behavior in terms of corporate
accounting, fraud and abuse of position, which led to the
punishment of senior executives
• Legislation which has come up to address and manage these
• Some of the approaches which companies have taken to mitigate these
unethical behaviors and approaches to delegate these to others
Business Ethics : Business ethics are the set of values which a Business
and “Individuals when doing Business” use to influence and guide their
behavior and actions.
• People are also one of the critical success factors for ITSM, both as
process owners and as the people who form the process teams.
• We discussed mechanisms which are typically used for getting people who
have the right skills and skill levels.
• These people could have different personal and business ethics, which could breach
the commonly accepted values and behaviors and lead to corporate fraud and abuse,
as at Enron and Tyco globally and Satyam in India. We looked at the
legislation (like the Sarbanes-Oxley Act) which has come into being to address some
of these.
ITIL
(Information Technology Infrastructure Library)
ITSM Frameworks :
• There are a number of ITSM frameworks which businesses can use. Some of the frameworks are targeted
at specific industries or business needs, e.g. telecommunications.
• COBIT (Control Objectives for information and Related Technologies): an IT governance framework
• FitSM: a simplified, streamlined service management framework typically aligned with ISO/IEC 20000
• ISO/IEC 20000: considered the international standard for IT service management and delivery
• ITIL is a framework and is a set of IT service management (ITSM) best practices that
focuses on aligning IT services with the needs of business and also provides
approaches which have worked on some scenarios, for selection, planning, delivery,
maintenance and overall lifecycle of IT services within a Business
• It allows the organization to establish a baseline from which it can plan, implement,
and measure. It is used to demonstrate compliance and to measure improvement.
▪ Non-Proprietary: ITIL is not a single vendor view of IT processes and you don’t have to
pay to apply it in your organization
▪ Comprehensive: ITIL captures all of the essential service support and services delivery
processes, and integrates them to work together. E.g. Incident, Error, Issue, Problem,
Fault
• ITIL started as a process-improvement initiative in the mid 1980s in Great Britain, to improve the quality of
products and services provided by IT infrastructure, which at that time was not providing reliable and
responsive services to the bureaucracy and the various agencies which were dependent on these services.
• In 1986, the British government’s Central Computer and Telecommunications Agency (CCTA)
formally sponsored a program to promote improved management of IT services. As part of this, around
40 IT experts from the public, private, and academic sectors were brought together to establish a
framework of best practices for managing the IT environment. In 1989 this team came out with an initial
set of 42 books that comprised the first version of ITIL
• By 1995, the total number of ITIL books had grown to >60 volumes and had become unwieldy.
• In 2000 a more condensed ITIL Version 2 came in. This reduced the list of available books to ~7, in
which Service Support and Service Delivery were the most prominent.
• In 2007 ITIL V3 came in, with 26 processes and functions grouped into 5 volumes around the
concept of a Service Lifecycle.
• In 2019 ITIL V4 came in, which focusses on business and technology while working with Agile, DevOps
and digital transformation.
• Service life cycle management (SLM) refers to a strategy that supports service organizations
in examining the service opportunities proactively as a life cycle instead of a solitary event or
set of discrete events.
• Then for each of the phases in the Lifecycle, it provides best practice guidance as a set of
processes which can be followed for IT Service Management.
https://wiki.en.it-processmaps.com/index.php/ITIL_Processes
https://www.cio.com/article/2439501/infrastructure-it-infrastructure-library-itil-definition-and-solutions.html
▪ Service strategy: Understand organizational objectives & customer needs and provides strategic guidance
for investments in services. Includes service value definition, business-case development, service assets,
market analysis, and service provider types
Processes involved : Strategy Mgmt, Portfolio Mgmt ..
▪ Service design: Involves turning the service strategy into a plan for delivering the business
objectives, technology service delivery
Processes involved : Service Level Mgmt, Availability Mgmt
▪ Service transition: Developing and improving capabilities for introducing new services into
supported environments. It relates to the delivery of services required by a business into
live/operational use and encompasses the "project" side of IT
Processes involved : Transition Planning & Support, Change Mgmt
▪ Service operation: Involves managing services in supported environments. Aims to provide best practice
for achieving the delivery of agreed levels of services both to end-users and the customers.
Processes involved: incident mgmt., Ops management, Service management, Service desks etc.
▪ Continual service improvement: Includes incremental and large-scale improvements to services
▪ ITIL V4 looks to facilitate value to customers and stakeholders by holistically looking at service
management along the four dimensions of
▪Organizations and people: An organization needs a culture that supports its objectives, and the
right level of capacity and competency among its workforce.
▪Information and technology: This includes the information, knowledge and the technologies
required for the management of services.
▪Partners and suppliers: This refers to an organization’s relationships with those other
businesses that are involved in the design, deployment, delivery, support, and continual
improvement of services.
▪Value streams and processes: How the various parts of the organization work in an integrated
and coordinated way is important to enable value creation through products and services.
▪ ITIL V4 expands from processes to practices of managing IT services, by factoring in elements such
as culture, technology, information and data management to provide a holistic vision of the ways of
working.
▪ ITIL V4 includes 34 management practices, defined as "sets of organizational resources designed for
performing work or accomplishing an objective". There are various types of guidance, such as key
terms and concepts, success factors, key activities, information objects, etc., for each of these
practices.
29 Jan 2022 SS ZG538 Infrastructure Management 11
ITIL V4 –Holistic Service Management
▪ ITIL defines the Service Value System (SVS), built around the Service value chain, a flexible
operating model for the creation, delivery and continual improvement of services enabling
the components and activities of an organization to work together to enable value creation
▪ Service value chain defines six key activities:
1. Plan
2. Engage
3. Design and transition
4. Obtain/build
5. Deliver and support
6. Improve
They can be combined in many different sequences, thus allowing an organization to define a
number of variants of value streams, like the ITIL v3 service lifecycle.
ITIL V4 has 7 guiding principles to help adopt and adapt ITIL guidance to specific and
different needs and circumstances, illustrating the integration of Agile and DevOps principles
& supporting digital transformations
1. Focus on value
2. Start where you are
3. Progress iteratively with feedback
4. Collaborate and promote visibility
5. Think and work holistically
6. Keep it simple and practical
7. Optimize and automate
https://www.axelos.com/news/blogs/february-2019/from-v3-to-4-this-is-the-new-itil
https://www.bmc.com/blogs/itil-service-value-chain/
Customer Service
How IT Evolved into a Service Organization
• IT Services management, with IT process groups that are structured, positioned and
staffed with the requisite skills, needs to focus on the quality of service it can
provide in order to meet the reasonable expectations of customers
• Since IT Systems management involves people who provide services to customers
(employees within the organization), this topic considers customer
service and some of the best practices followed around IT Services for IT systems
management
▪ If you believe users need to be satisfied and happy, could you share your thoughts
on how you (assuming you are empowered) would drive the IT organization you work in
to be more customer oriented?
▪ What, in your opinion, would change with a satisfied and happy IT Service
consumer?
▪ What would delight you as an IT service receiver, or what would make you
satisfied with the IT services offered by your organization?
Good customer service would mean meeting and exceeding expectations of your
customers. There are four elements of achieving good customer service:
A. Identifying key customers
B. Identifying key services of key customers
C. Identifying key processes that support key services
D. Identifying key suppliers that support key processes
Illustrating
A. Identifying key customers
Although all company employees are customers of IT, typically a small subset of the
employee customers can represent the rest of the company employees and can be
used for evaluating Customer service and working towards effective process
improvements
The following could be used as mechanisms/criteria for choosing the key customers
for a typical infrastructure
1. Someone whose success critically depends on the IT services provided
Identifying the groups who are most essential to the core business of the company
and/or applications which are considered mission critical for the company and the
leaders in these (heads or leads) are good candidates for key customers.
Key suppliers provide direct input in terms of products or support to the key processes
E.g. In the earlier example of the Legal department needing the IT service to retrieve
records that have been archived some of the key suppliers would be
• Individuals responsible for storing the data offsite and for retrieving it back to make
it accessible for users.
E.g. In the earlier Human resources example to provide data- security process,
▪ More focus on customers' success in using the services and gaining business value,
rather than on satisfaction
▪ Presuming SLAs will solve all problems which prevent Customer satisfaction
Availability Management
IT Systems Management
Recap - 1
• IT infrastructure includes all of the physical devices and software in an IT environment
required to operate an enterprise, like the Servers, Disk Storage, DBs, Networks and
Desktop environments
• IT Infrastructure Systems Management involves building processes which can manage
the IT Services running on these IT environment components (for its customers), and
providing a stable and responsive IT environment which supports or furthers the
business of the organization, while being Available, Responsive, Cost efficient, Secure,
Scalable,…
• Building and institutionalizing a process needs resources, obtained by putting up a
business case to the executives. Once support for a process is obtained, the
processes being built need to be appropriately hosted within the organization structure to
ensure efficient and effective execution. People, whether process owners or
individuals playing different roles in the process, and their ethics are critical for the
success of the process. We discussed different approaches for staffing,
engagement and retention, and the influence of ethics on the IT organization.
12 Feb 2022 4
IT Systems Management
• We discussed that there are 12 key processes of IT Systems management
which we will cover as part of the course
• Availability Management • Network Management
• Performance/Tuning • Configuration Management
• Production Acceptance • Capacity Planning
• Change Management • Security
• Problem Management • Business Continuity
• Storage Management • Facilities Management
• Availability Management is the first of the key infrastructure processes used
in IT Systems Management
• As part of this, we will look at how we would define & assess the current
state of availability and then look at approaches which will support or
enhance availability
• We can consider that a system/process is available when it is up and running, and
not available when it is not running (regardless of the reason).
• The focus for maximizing availability is on timely recovery from service outages,
and on methods to reduce the frequency and duration of these outages
Defn Availability:
• Availability is the probability that a system or the IT services will work as required
(may be driven by SLAs), when required, during the period when it needs to be
used for a purpose.
• The availability management process involves planning how we are going to
prevent failures from having an impact on the IT services and hence the business,
and what we are going to do when things go wrong. We plan for
optimizing the readiness of production systems by accurately testing, measuring,
analyzing, managing and reducing outages and their impacts on those production
systems and services, so as to meet expectations.
• There can be a lot of components like Datacenter facility, Server H/W components, Server
System SW components, Application S/W, Disks Subsystems HW, DB or Network HW,
Network SW, Desktop HW, Desktop SW which can potentially fail and reduce Availability
• IT Systems management folks face a few dilemmas/challenges while managing Availability in
terms of
• Trading the costs of outages of components (and system) against the costs of total
redundancy
• Accountability challenges due to having multiple components corresponding to
multiple process owners
Identifying a process owner responsible for overall availability across different
components of the Datacenter would be ideal for effective Availability management
12 Feb 2022 SS ZG538 Infrastructure Management 12
Differentiating Slow Responses from Downtime
High Availability:
Fault Tolerance:
3. Downtime
▪ Another approach is to track the quantum of downtime and hence the
Availability occurring on a daily, weekly, and monthly basis.
▪ Infrastructure personnel can pinpoint and proactively correct problem areas by
analyzing the trends, patterns, and relationships of these downtimes
▪ Infrastructure personnel can also track several of the major components like
the server environment, the disk storage environment, databases and
networks
▪ Improving levels of availability often involves capital expenditures which most
companies are reluctant to invest in unless a strong, convincing business
justification is offered like the cost of downtime below
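A minimal sketch of the downtime tracking described above, assuming downtime is logged in minutes per day for a 24x7 service. The function name and the sample log are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(downtime_minutes, period_minutes):
    """Availability as the percentage of the period the service was up."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must lie within the period")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


# Hypothetical downtime log (minutes per day) for one week
daily_downtime = [0, 12, 0, 0, 45, 0, 3]

daily = [availability_percent(d, MINUTES_PER_DAY) for d in daily_downtime]
weekly = availability_percent(sum(daily_downtime), 7 * MINUTES_PER_DAY)

for day, value in enumerate(daily, start=1):
    print(f"day {day}: {value:.3f}%")
print(f"week: {weekly:.3f}%")
```

Keeping the per-day values next to the weekly roll-up is what makes trends and patterns visible; the same function can be applied per component (server, disk storage, database, network) if downtime is logged separately for each.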
4. Rule of Nine
▪ A significant number of service suppliers express their availability as a
percentage of uptime, following the rule of 9s. If you have a working week of 100
hours or working hours of 24x7
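The rule of 9s can be illustrated by converting an availability percentage into the downtime it permits. The sketch below assumes a 24x7 service over a 365-day year by default, and a 100-hour working week as the alternative period; the function name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # 24x7 service, non-leap year (assumption)


def allowed_downtime_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Downtime permitted in the period at the given availability level."""
    return period_hours * (1 - availability_pct / 100.0)


for pct in (99.0, 99.9, 99.99, 99.999):
    per_year = allowed_downtime_hours(pct)
    per_week = allowed_downtime_hours(pct, period_hours=100)
    print(f"{pct}%: {per_year:.3f} h per 24x7 year, "
          f"{per_week * 60:.2f} min per 100-hour week")
```

Three nines over a 24x7 year works out to about 8.76 hours of permitted downtime, and each additional nine cuts that by a factor of ten.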
▪ Goal of availability process owners will be to maximize the uptime of the various online systems
for which they are responsible and make them fault tolerant with minimal impact to the budget
▪ The challenges towards a 100% Available system are the Budget, Component failures, Faulty
code, Human error, Flawed design, Natural disasters, Unforeseen business shifts like mergers,
downturns, political changes
▪ There are several approaches which have been taken to maximize availability by extending
uptime, minimizing downtime, and improving the overall level of service. These are referred
to as the 7 Rs
1. Redundancy
2. Reputation
3. Reliability
4. Repairability
5. Recoverability
6. Responsiveness
7. Robustness
1. Redundancy
▪ Manufacturers practice this by designing redundancy into their products, e.g. power supplies, multiple
processors, segmented memory, redundant disks etc.
▪ This can also refer to entire server systems running in a hot standby mode. Infrastructure analysts can take
a similar approach by configuring disk and tape controllers and servers with dual paths, splitting network
loads over dual lines, and providing alternate control consoles. In short, eliminate as far as possible any
single points of failure that could disrupt service availability
2. Reputation
▪ Build the IT environment using products (servers, disk storage systems, network hardware etc.) from
suppliers of repute. Reputation refers to the track record of key suppliers
▪ Reputation can be validated with percentage of market share, industry analyst reports, track record of
reliability, and customer references (which can also confirm factors such as cost, service, quality of the
product, training of service personnel, and trustworthiness)
3. Reliability
▪ Reliability pertains to the dependability of the components to function under the stated conditions.
▪ It depends on the process which has gone into building them
▪ This can be verified from customer references and industry analysts.
4. Repairability
▪ It’s a measure of how quickly and easily suppliers can fix or replace failing parts
▪ Availability can be increased by planning for better repairability, i.e. reducing
MTTR
▪ MTTR (Mean Time to Repair, Recover, Restore or Resolve) is a common metric
used to evaluate this trait; it measures the average time it takes to do the actual
repair
▪ It is computed as: MTTR = sum of repair times / number of failures
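The MTTR formula above translates directly into code. The second function uses the commonly quoted steady-state relation availability = MTBF / (MTBF + MTTR), which is not stated in these slides and is included only to show why a lower MTTR raises availability; the repair log and MTBF value are hypothetical.

```python
def mttr(repair_times_hours):
    """Mean time to repair: sum of repair times / number of failures."""
    if not repair_times_hours:
        raise ValueError("at least one failure is needed to compute MTTR")
    return sum(repair_times_hours) / len(repair_times_hours)


def steady_state_availability(mtbf_hours, mttr_hours):
    """Common textbook relation (an addition, not from the slides)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical repair log: three failures, repair time in hours
repairs = [2.0, 0.5, 1.5]
mean_repair = mttr(repairs)

# Hypothetical mean time between failures of 500 hours
availability = steady_state_availability(500.0, mean_repair)
print(f"MTTR = {mean_repair:.2f} h, availability = {availability:.4%}")
```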
5. Recoverability
▪ This refers to the ability to overcome a momentary failure in such a way that
there is no impact on end-user availability.
▪ It spans the spectrum from single-bit error recovery to having an
entire server system switch over to its standby system with no loss of data or
transactions.
▪ Recoverability also includes retries of attempted reads and writes out to disk or
tape, as well as the retrying of transmissions down network lines.
▪ This could also include the acknowledgement mechanisms used to detect and recover from
missing or out-of-order delivery of a sequence of data in protocols like TCP
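The retry behaviour mentioned above can be sketched generically: an operation that fails momentarily is attempted again so that the caller never sees the failure. The helper `with_retries` and the simulated `flaky_read` are hypothetical and stand in for a real disk, tape or network operation.

```python
import time


def with_retries(operation, attempts=3, delay_seconds=0.0):
    """Call operation(), retrying on OSError up to `attempts` times in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before the next try
    raise last_error


# Simulated read that fails twice with a transient error, then succeeds
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("transient read error")
    return b"payload"


result = with_retries(flaky_read)
print(result, "after", calls["count"], "attempts")
```

If every attempt fails, the last error is re-raised, which is the point where a momentary failure becomes an outage and the other Rs (repairability, responsiveness) take over.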
6. Responsiveness
▪ It is the sense of urgency that all people involved with high availability need to exhibit,
in terms of quickness and efficiency when responding to problems
▪ This could also be how quickly automated recovery systems get triggered for
action to restore availability
7. Robustness
▪ It describes the overall design of the availability process to withstand a
variety of forces—both internal and external—that could easily disrupt and
undermine availability in a weaker environment
▪ Robustness is achieved through documentation and training to withstand
▪ Technical challenges related to Platforms, Products, Services,
Customers
▪ Personnel changes as they relate to turnover, expansion and Rotation
▪ Business changes as they relate to New direction, Acquisitions, Mergers
8. Resilience:
▪ Whether the system has ML capabilities that let it learn from patterns and
adapt to changes
Techniques that have been used to understand the reasons for disruption
of availability
1. Component Failure Impact Analysis :
Component Failure Impact Analysis (CFIA) can be used to predict and evaluate
the impact on IT service arising from component failures within the technology.
The output from a CFIA can be used to identify where additional resilience
should be considered to prevent or minimize the impact of component failure to
the business operation and users
2. Single Point of Failure analysis
A Single Point of Failure (SPoF) is any component within the IT infrastructure
that has no backup or fail-over capability and therefore has the potential to cause
disruption to the business, customers or users when it fails. It is important that no
unrecognized SPoFs exist within the IT infrastructure design or the actual
technology, and that they are avoided wherever possible.
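A toy sketch of the two analyses above: given which components each service depends on and how many redundant instances of each exist, list the single points of failure and the services hit by a given component failure. The inventory, service names and function names are all hypothetical; a real CFIA would work from the configuration database.

```python
# Hypothetical inventory: service -> {component: number of redundant instances}
services = {
    "payroll": {"app server": 2, "database": 1, "network switch": 2},
    "email": {"mail server": 1, "network switch": 2},
}


def single_points_of_failure(service_map):
    """Components with no backup instance, grouped by service."""
    spofs = {}
    for service, components in service_map.items():
        weak = sorted(name for name, count in components.items() if count < 2)
        if weak:
            spofs[service] = weak
    return spofs


def impacted_services(service_map, failed_component):
    """CFIA-style view: services that go down if this component fails."""
    return sorted(
        service
        for service, components in service_map.items()
        if components.get(failed_component, 0) == 1
    )


spofs = single_points_of_failure(services)
print(spofs)
print(impacted_services(services, "database"))
```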
Techniques that have been used to understand the reasons for disruption
of availability (Contd)
3. Fault Tree Analysis (FTA):
It is a technique that can be used to determine the chain of events that causes a
disruption to IT services. FTA, in conjunction with calculation methods, can offer
detailed models of availability.
Operations can be performed on the resulting fault tree; these operations
correspond with design options
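One way to picture the calculation side of FTA is with AND and OR gates over failure probabilities. The sketch below assumes the basic events are statistically independent, and the tree and all probabilities are hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all input events occur (independent)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one input event occurs (independent)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the service is disrupted if both redundant power
# supplies fail, OR the disk subsystem fails, OR the network fails.
p_power = and_gate([0.01, 0.01])
p_disk = 0.02
p_network = 0.005

p_outage = or_gate([p_power, p_disk, p_network])
print(f"P(service disruption) = {p_outage:.6f}")
```

Design options map onto the tree: adding a redundant component turns a single basic event into an AND gate, which is why the power branch contributes so little here compared with the unprotected disk branch.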
• There are 10 categories or key measures or characteristics about a Process. These could be
categorized into the three objectives of Quality, Efficiency and Effectiveness
Quality                  | Efficiency                 | Effectiveness
1. Executive support     | 4. Supplier involvement    | 8. Customer involvement
2. Process owner         | 5. Process metrics         | 9. Service metrics
3. Process documentation | 6. Process integration     | 10. The training of staff
                         | 7. Streamlining/automation |
• The degree to which each characteristic is put to use in designing and managing a process is
a good measure of its relative robustness
Quality
• Executive support
• Process owner
• Process documentation
Efficiency
• Supplier involvement
• Process metrics
• Process integration
• Streamlining/automation
Effectiveness
• Customer involvement
• Service metrics
• The training of staff
Each characteristic within a category is rated
on a scale of 1 to 4, with 1 indicating no
presence and 4 indicating a large presence
of the characteristic.
• Since all categories are rated in the same fashion, a single column could have
been used to record the ratings of each category
• If we format separate columns for each of the four possible scores, categories
scoring the lowest and highest ratings stand out visually, quantifying areas of
strength and weakness for a given process
• This also acts as a benchmark from where future process refinements can be
quantitatively measured and compared
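The rating scheme above can be sketched in a few lines of Python. This is an illustrative, unweighted version only: the ratings are hypothetical, and the sketch does not model any weighting that a full assessment worksheet may apply.

```python
from collections import defaultdict

MAX_RATING = 4  # 1 = no presence, 4 = large presence

# Hypothetical ratings for one process.
ratings = {
    "Executive support": 3,
    "Process owner": 4,
    "Process documentation": 1,
    "Supplier involvement": 2,
    "Process metrics": 1,
    "Process integration": 2,
    "Streamlining/automation": 1,
    "Customer involvement": 3,
    "Service metrics": 2,
    "The training of staff": 4,
}


def by_score(ratings):
    """Group characteristics into one 'column' per score so extremes stand out."""
    columns = defaultdict(list)
    for name, score in ratings.items():
        if not 1 <= score <= MAX_RATING:
            raise ValueError(f"{name}: rating must be between 1 and {MAX_RATING}")
        columns[score].append(name)
    return dict(columns)


def percent_of_maximum(ratings):
    """Total rating as a percentage of the best possible total (a simple benchmark)."""
    return 100.0 * sum(ratings.values()) / (MAX_RATING * len(ratings))


if __name__ == "__main__":
    columns = by_score(ratings)
    for score in range(1, MAX_RATING + 1):
        print(score, columns.get(score, []))
    print(f"Benchmark: {percent_of_maximum(ratings):.1f}% of maximum")
```

Re-running the same calculation after a process refinement gives a number that can be compared with the earlier benchmark.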
In your organization, what planning and activities do people carry out to ensure
that new or changed services can support the expected SLAs, and what kinds of
measures would you use?
What are the things in the environment that people monitor?
Which performance indicators show whether the availability of key processes in
your organization is supported or not? What Key Performance Indicators (KPIs)
and Critical Success Factors (CSFs) would you use?
Lecture 5
Performance & Tuning
SS ZG538 Infrastructure Management
Performance and Tuning of Infrastructure can be looked at from the perspectives of a Server Environment,
Disk Storage Environment, Network Environment, Desktop compute environment and Databases as the
discrete components of typical IT Infrastructure.
In case of cloud Infrastructure, this could also be looked at from the perspective of managing Virtualized
physical components and managing these abstracted virtual components for performance.
19 Feb 2022 SS ZG538 Infrastructure Management 12
Performance & Tuning of Infrastructure Areas
1. Server Environment
This refers to looking at performance tuning across all compute platforms from mainframe computers,
midrange computers, workstations to servers.
Major places in a server environment where performance bottlenecks can be seen & addressed are
a. Processors
▪ The number and power of processors influence the rate of work accomplished for processor-
oriented transactions.
▪ The extent of utilization of the processor also has a bearing on performance. Keeping
processor utilization (a tunable lever) at about 70% can lead to better performance
b. Main memory and size of swap space
▪ Size of the main memory, and configuration of the main memory for swap space (part of main
memory to hold frequently used portions of programs to reduce time-consuming I/O operations)
are potential bottlenecks and levers for tuning for performance
c. Number and size of Buffers
▪ The number and size of Buffers (which are high-speed registers of main memory that store data
being staged for input or output operations) can trade off the amount of memory available for
process-oriented transactions
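One common way to see why processor utilization is kept well below 100% is a simple single-server queueing approximation. The M/M/1 model used here is an assumption added for illustration and is not part of the slides: mean response time is R = S / (1 - U) for service time S and utilization U. The 10 ms service time is hypothetical.

```python
def mean_response_time(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_time_ms = 10.0  # hypothetical time to serve one transaction
    for utilization in (0.50, 0.70, 0.90, 0.95):
        r = mean_response_time(service_time_ms, utilization)
        print(f"utilization {utilization:.0%}: mean response time {r:.1f} ms")
```

Under this model, response time grows slowly up to moderate utilization and then climbs steeply, which is consistent with treating utilization as a tuning lever rather than driving it as high as possible.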
1. Server Environment (Contd.)
g. Fragmentation
▪ Extents are needed because of disk fragmentation. The higher the fragmentation, the larger the
number of extents, and vice versa. Reallocating files to reduce the number of extents and
defragmenting the disk are good tuning methods to reduce fragmentation
h. Database fragmentation
▪ Database records are rewritten in locations that differ from their optimal starting points. As a result,
data is not optimally placed for its anticipated access pattern, and the fragmented data records
make less efficient use of disk space.
▪ Periodic defragmentation, guided by reports and statistics, can help address this
d. Single Sign-On
▪ There is a trade-off between performance and security. The convenience and performance
savings of logging on only once instead of multiple times for access to the network, operating
system, database management system, and a particular application must be weighed against the
potential security exposures of bypassing normal password checks.
▪ Most applications have robust security features designed into them to mitigate this risk
e. Number of Retries
▪ Transmission errors, particularly transient ones, occur periodically in a network. Network
protocols factor this in and retry for a period of time or a number of attempts before the failure is
flagged as an error. If this value is too large or too small, it can hurt network performance, so it
needs to be adjusted
f. Non-Standard Interfaces and Broadcast Storms
▪ Suppliers sometimes offer non-standard interfaces to increase the compatibility of their devices with a
network. When these work well, they can support the business need at acceptable
productivity/performance levels.
▪ When they are not compatible, they can cause major performance problems such as locking up lines,
introducing interference, or, in extreme cases, flooding the network with nonstop transmissions called
broadcast storms.
▪ Caution and a heightened awareness of network performance are needed in these
scenarios
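The retry limit described under "Number of Retries" can be illustrated with a small Python sketch. The function names, the `TransmissionError` type and the simulated link are all hypothetical; real network protocols implement this inside the protocol stack.

```python
import time
from typing import Callable


class TransmissionError(Exception):
    """Raised when a transmission still fails after all retries (hypothetical type)."""


def send_with_retries(send_once: Callable[[], bool], max_retries: int = 3,
                      delay_seconds: float = 0.0) -> int:
    """Try one transmission plus up to max_retries retries; return attempts used."""
    for attempt in range(1, max_retries + 2):
        if send_once():
            return attempt
        # Wait before the next retry, but not after the final failed attempt.
        if attempt <= max_retries and delay_seconds > 0:
            time.sleep(delay_seconds)
    raise TransmissionError(f"gave up after {max_retries + 1} attempts")


def make_flaky_link(failures_before_success: int) -> Callable[[], bool]:
    """Simulated link that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def send_once() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return send_once


if __name__ == "__main__":
    print("attempts used:", send_with_retries(make_flaky_link(2), max_retries=3))
```

With a limit that is too small, transient errors surface as failures; with one that is too large, a genuinely broken link keeps consuming time and bandwidth before it is flagged.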
5. Desktop Compute Environment
Common Desktop Environment issues which can influence Performance and Tuning are
Maximum Weight – 28
Maximum Rating Value - 4
Once the forms are filled in, assembling the infrastructure from scratch
would involve the following:
• Build up the physical datacenter room
• Install redundant power cabling
• Install racks
• Test the facilities
• Install the server, networking, and storage hardware
• Allow for a burn-in period
• Check the power and cooling usage
• Configure the infrastructure components
• Install systems management tools
• Test systems management processes
• Follow the deployment process
Scenarios for Go-Live: Big Bang, Parallel changeover, or Phased changeover
No formal QA; some change management and a production acceptance process, but
manual. People-focused, with 2 trained expert engineers handling it based on tacit
knowledge. Minimal documentation. Compartmentalized, physically distributed
organization. Metrics collected.
Positive Outlook: Willing to try out new things, wants to improve
Limitations: Ineffective communication, minimal documentation, tacit training, no
formal repeatable process for improvements.
The operations team initiated a formal production acceptance process.
Learnings: Development and operations teams need to work together. The infrastructure
group needs to support the production acceptance process.
Case Study A
• Ensure the operations department is involved very early with new application projects.
This helps ensure that the appropriate operations group provides or receives the proper
resources, capacity, documentation, and training required for a successful deployment.
• Support from infrastructure groups is essential
Case Study B
• Plan for and ensure long range commitments of IT
• Consider a change management process prior to a Production Acceptance process
Case Study C
• Organization structure can help or hinder effectiveness.
• IT executives should ensure that operations control the PA process and that development
is involved in the process design from the start
Case Study D
• Commitment to follow the processes in spite of pressures
• There are significant benefits in standardizing across divisions and sites. This can help in
merger integrations.
…
• Change Management has the most interactions with other disciplines. As part of
going through this process we will look at
• Introduction to Change Management as the process to control and
coordinate all changes to the IT production environment
• Components of Change Management
• Drawbacks of most Change management processes
• Key steps required in developing a Change management process
• Emergency change metrics
• Assessing an Infrastructure’s Change Management Process
• Measuring and Streamlining the Change Management Process
Meet
Speed - Stability
1. Value Realization
Benefit realization (functional and non-functional)
• Responding to changing Business requirements and aligning to Business Goals (making it fit for
purpose)
• Responding to changes in terms of transition to emerging technologies and processes.
(we realize the benefit and would like to change)
Risk Mitigation
• I am managing risk by changing
(Addressing Issues or fixing things with goal of reducing disruption and enhancing effectiveness)
Asset Optimization
• Resource optimization
(Reduction in cost, increased utilization, etc.)
All of these while
• Ensuring changes are properly handled and managed
• Ensuring all changes to CIs achieve the desired outcome and are recorded in the CMS
• Ensuring governance and compliance expectations are met
▪ All changes have an impact on the IT environment and hence need to be assessed for it. The approval and
test and validation processes could differ based on the impact
▪ The scope of the impact could be based on the number of users impacted, number of systems impacted, time
period impacted, risk involved, etc.
▪ If the change is “no-impact” or “low-impact” with limited scope, approval
may be needed only at a lower level and verification may not be as stringent.
▪ If the scope of the change is higher, the change control process could include a more formal process
of requesting, prioritizing, and approving
▪ Change co-ordination
• Change co-ordination involves collaborating and working together, planning and scheduling the change, and
communicating/notifying effectively
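The idea of matching the approval route to the assessed impact can be sketched as below. All thresholds, risk labels and approval routes are hypothetical placeholders; an organization would define its own.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    users_impacted: int
    systems_impacted: int
    outage_minutes: int
    risk: str  # "low", "medium" or "high" (hypothetical labels)


def impact_level(change: ChangeRequest) -> str:
    """Classify a change as 'none', 'low' or 'high' impact (illustrative thresholds)."""
    if (change.risk == "high" or change.users_impacted > 500
            or change.systems_impacted > 5 or change.outage_minutes > 60):
        return "high"
    if change.users_impacted == 0 and change.outage_minutes == 0 and change.risk == "low":
        return "none"
    return "low"


# Hypothetical approval routes: lighter for low impact, more formal for high impact.
APPROVAL_ROUTE = {
    "none": "log only, approval at team level",
    "low": "approval at a lower level, basic verification",
    "high": "formal request, prioritization and CAB approval",
}

if __name__ == "__main__":
    change = ChangeRequest(users_impacted=1000, systems_impacted=2,
                           outage_minutes=30, risk="medium")
    level = impact_level(change)
    print(level, "->", APPROVAL_ROUTE[level])
```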
5. Schedule: Agree upon the date and time of the change, which systems and
customers will be impacted and to what degree, and who will implement the change.
E.g. The change will need to occur at the time the new tax tables would go into effect.
6. Communicate/Notify: Inform all appropriate individuals about the change via agreed-
upon means.
E.g. This could go out to all users, service-desk personnel (in the event users call in
about the change), and support analysts.
The following 13 steps are required to implement an effective change management process.
7. If change metrics exist, collect and analyze them; if not, set up a process to do so.
12. Develop a Charter for a Change Advisory Board (CAB)
The first review meeting of the CAB would set up the CAB charter, with statements like the ones below
(Scope, Desired outcomes, Functioning, Duration and time, Reporting ..)
Scope
• Review all upcoming high‐impact change requests submitted to the CAB by the change coordinator
• Review a summary of the prior week’s changes
• Validate that all emergency changes from the prior week were legitimate.
• Review and track the status of all planned change requests from the prior week as to impact level
and lead time.
Outcome
• Approve if appropriate.
• Modify immediately (if possible) and approve (as appropriate).
• Send back to requester for additional information.
• Cancel at the discretion of the board.
• Analyze total number and types of changes from the prior week to evaluate trends, patterns, and
relationships.
Functioning
• A few of the roles may have vetoes.
• Changes will be approved, modified, or cancelled by a simple majority of the voting members present.
Periodicity
• CAB will meet every Wednesday from 3 p.m. to 4 p.m. in room 1375.
Reporting
• The CAB is typically an independent team to begin with
• The CAB meeting will eventually become part of a systems management meeting at which the status of
problems, service requests, and projects is also discussed.
• Disputes will be escalated to the senior director of the infrastructure department.
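The voting rule in the charter (a simple majority of the voting members present, with some roles holding a veto) can be sketched as follows. How a veto and a tie are treated here are assumptions made for illustration: a rejecting vote from a veto holder blocks approval, and a tie does not approve. The member names are hypothetical.

```python
from typing import AbstractSet, Mapping


def cab_decision(votes: Mapping[str, bool],
                 veto_holders: AbstractSet[str] = frozenset()) -> bool:
    """Return True if the change is approved.

    votes maps each voting member present to True (approve) or False (reject).
    """
    if not votes:
        raise ValueError("at least one voting member must be present")
    # Assumption: a veto holder who is present and rejects blocks the change.
    for member in veto_holders:
        if votes.get(member) is False:
            return False
    approvals = sum(1 for approve in votes.values() if approve)
    # Simple majority of members present; a tie is treated as not approved.
    return approvals > len(votes) / 2


if __name__ == "__main__":
    votes = {"operations": True, "network": True, "security": False}
    print("no veto roles:", cab_decision(votes))
    print("security holds a veto:", cab_decision(votes, veto_holders={"security"}))
```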
13. Use the CAB to Continually Refine and Improve the Change Management Process
▪ A standing action item in the CAB (Change Advisory Board) meeting would be to discuss
improvements to the change management process. Any improvement approved by the CAB
should be assigned and scheduled for implementation, with follow-up at subsequent meetings.
26 Feb 2022 SS ZG538 Infrastructure Management 29
Change Management
Emergency Change Metrics
▪ An emergency change is an urgent, mandatory change requiring manual intervention in less than 24
hours (typically scheduled within ~2-4 hours) to restore or prevent interruption of accessibility,
functionality, or acceptable performance of a production application or a support service
▪ One of the significant metrics for change management is the number of emergency changes
occurring each week, compared to the weekly number of high-impact changes and total
weekly changes
▪ The proportion of emergency changes reflects the proactiveness or reactiveness of the
environment. A figure of ~15-20% could mean the environment is reactive
▪ A higher percentage of proactive changes indicates that changes are thoroughly planned and
properly coordinated well in advance.
▪ The number of emergency changes, trended over time, indicates the maturity of the process
and gives a good indication of the progress towards being proactive.
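A minimal sketch of this metric is shown below: it computes the weekly share of emergency changes and flags weeks that reach the lower end of the ~15-20% range mentioned above. The weekly counts are hypothetical, and using total weekly changes as the base is an assumption.

```python
REACTIVE_THRESHOLD_PCT = 15.0  # lower end of the ~15-20% range; adjust as needed


def emergency_share(emergency_changes: int, total_changes: int) -> float:
    """Emergency changes as a percentage of all changes in the period."""
    if total_changes <= 0 or not 0 <= emergency_changes <= total_changes:
        raise ValueError("counts must satisfy 0 <= emergency <= total and total > 0")
    return 100.0 * emergency_changes / total_changes


def is_reactive(emergency_changes: int, total_changes: int) -> bool:
    return emergency_share(emergency_changes, total_changes) >= REACTIVE_THRESHOLD_PCT


# Hypothetical weekly data: (label, emergency changes, total changes).
weekly = [("week 1", 9, 40), ("week 2", 6, 42), ("week 3", 4, 45)]

if __name__ == "__main__":
    for label, emergency, total in weekly:
        share = emergency_share(emergency, total)
        status = "reactive" if is_reactive(emergency, total) else "proactive"
        print(f"{label}: {share:.1f}% emergency changes ({status})")
```

A share that falls from week to week, as in this made-up data, is the kind of trend that would suggest the process is maturing.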
Process
Maximum Weight – 32
Maximum Rating Value - 4
▪ We can measure and streamline the change management process with the help of the
assessment worksheet.
▪ We can measure the effectiveness of a change management process by analyzing service metrics
such as availability, the type and number of changes logged, the number of changes causing
problems, and the number of changes that had to be rolled back.
▪ Process metrics such as changes logged after the fact, changes with a wrong priority, absences at
CAB meetings, and late metrics reports can help us gauge the efficiency of the process.
▪ Change management process can be streamlined by automating certain actions like changes
logged after the fact, changes with wrong priorities, absences at CAB meetings, and late metrics
reports etc.
• The next process which we will look to design and implement is Problem
Management. As part of going through this process we will look at
• Context of Problem Management
• Definition and Scope of Problem Management
• Distinguishing Between Problem, Change, and Request (Service Request)
Management
• Distinguishing Between Problem Management and Incident Management
• The Role of the Service Desk
• Segregating and Integrating Service Desks
• Key Steps (11) to Developing a Problem Management Process
• Client Issues with Problem Management
• Assessing an Infrastructure’s Problem Management Process
• Measuring and Streamlining the Problem Management Process
2 2-Tier Traditional Problem Management (SRM + Specialized Support)
3 3-Tier - Escalation Management
4 Major Service Disruption - Crisis Management
5 1-Tier Reporting and Service Request management
6 All tiers reporting and Service request management
7 All tiers reporting and both Service request and change management
[Figure labels: trouble call, Problem identified, Service Desk, Problem DB]
In ITIL, the service operation processes, such as problem management, incident
management, event management, request fulfillment and access management, address this
▪ The advantages and disadvantages of a segregated service desk are as below.
Advantages
• Ability to customize specialized support for diverse applications, customers, and
services
• If the set of support to be provided is too diverse, a segregated
support center can offer better depth of service
• Easier to integrate with different independent processes
Disadvantages
• Hard to cross-train
• Customers will need to call many times or have many service desks to choose from,
making it less effective
• Increased cost
▪ A compromise hybrid solution is sometimes used in which all IT customers call a single
service desk number that activates a menu system. The customer is then routed to the
appropriate section of a centralized service desk depending on the specific service
requested.
▪ The following are the 11 steps which are required for developing a robust problem
management process.
1. Select an executive sponsor.
2. Assign a process owner.
3. Assemble a cross‐functional team.
4. Identify and prioritize requirements.
5. Establish a priority and escalation scheme.
6. Identify alternative call‐tracking tools.
7. Negotiate service levels.
8. Develop service and process metrics.
9. Design the call‐handling process.
10. Evaluate, select, and implement the call‐tracking tool.
11. Review metrics to continually improve the process.
………..
Problem Management
Key Steps to Developing a Problem Management Process - 2
………..
Key Steps to Developing a Problem Management Process - 4
5. Establish a scheme for Priority and Escalation of the incidents which reach the Service Center
▪ This is the scheme on whose basis you classify and assign priority to the incidents/problems
which land in the service center.
▪ This is also the step where you will establish a scheme which will prescribe how to handle
escalations (typically these are high-priority, difficult-to-resolve problems)
▪ Typically most organizations attempt to prioritize problems based on severity, impact,
urgency, and aging.
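A priority scheme of this kind can be sketched as a small scoring function. The 1 to 3 scales, the five-day aging rule and the score cut-offs below are hypothetical; they only illustrate how severity, impact, urgency and aging might be combined into one priority.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    severity: int  # 1 (minor) to 3 (critical), hypothetical scale
    impact: int    # 1 (single user) to 3 (many users or key service)
    urgency: int   # 1 (can wait) to 3 (needed immediately)
    age_days: int  # days since the problem was logged


def priority(problem: Problem) -> int:
    """Return a priority from 1 (highest) to 4 (lowest)."""
    for value in (problem.severity, problem.impact, problem.urgency):
        if not 1 <= value <= 3:
            raise ValueError("severity, impact and urgency must be between 1 and 3")
    score = problem.severity + problem.impact + problem.urgency  # 3 to 9
    if problem.age_days > 5:  # aging: old problems move up one step
        score += 1
    if score >= 8:
        return 1
    if score >= 6:
        return 2
    if score >= 5:
        return 3
    return 4


if __name__ == "__main__":
    print(priority(Problem(severity=2, impact=2, urgency=1, age_days=0)))
    print(priority(Problem(severity=2, impact=2, urgency=1, age_days=10)))
```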
………..
Key Steps to Developing a Problem Management Process - 5
6. Identify Call-Tracking Tools with Alternatives to choose from
▪ An effective problem management process revolves around a call-tracking tool, whose requirements
are developed in step 4; a number of alternative tools are identified as part of this step
▪ Companies usually custom-build a tool (cheaper and built to the process of the organization) or buy
an off-the-shelf tool (more flexibility with integration capabilities but expensive)
7. Negotiate Service Levels
▪ External service levels should be enforceable and mutually agreed upon by both the customer service
department and IT.
▪ Internal service levels should be negotiated with internal level 2 support groups and external
suppliers.
8. Develop Service and Process Metrics
▪ Service metrics should be established to support the SLAs that will be in place with key customers.
▪ The following are some common problem management Service metrics:
▪ Wait time when calling help desk
▪ Average time to resolve a problem at level 1
▪ Average time for level 2 to respond to a customer
▪ Average time to resolve problems of each priority type
▪ Percentage of time problem is not resolved satisfactorily
▪ Percentage of time problem is resolved at level 1
▪ Trending analysis of various service metrics
Key Steps to Developing a Problem Management Process - 6
8. Develop Service and Process Metrics (Cont.)
▪ The following are some common problem management Process metrics:
▪ Abandon rate of calls
▪ Percentage of calls dispatched to wrong level 2 group
▪ Total number of calls per day, week, month
▪ Number of calls per level 1 analyst
▪ Percentage of calls by problem type, customer, or device
▪ Trending analysis of various process metrics
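Several of the service and process metrics listed for step 8 can be derived directly from call records. The sketch below assumes a hypothetical `Call` record; a real call-tracking tool would have its own schema and reports.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Call:
    abandoned: bool = False
    resolved_at_level: Optional[int] = None   # 1, 2, or None if not resolved
    dispatched_to_wrong_group: bool = False   # only meaningful for level 2 calls
    minutes_to_resolve: Optional[float] = None


def pct(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def summarize(calls: List[Call]) -> dict:
    answered = [c for c in calls if not c.abandoned]
    level1 = [c for c in answered if c.resolved_at_level == 1]
    level2 = [c for c in answered if c.resolved_at_level == 2]
    times = [c.minutes_to_resolve for c in level1 if c.minutes_to_resolve is not None]
    return {
        "total_calls": len(calls),
        "abandon_rate_pct": pct(len(calls) - len(answered), len(calls)),
        "resolved_at_level1_pct": pct(len(level1), len(answered)),
        "wrong_dispatch_pct": pct(sum(c.dispatched_to_wrong_group for c in level2),
                                  len(level2)),
        "avg_level1_minutes": sum(times) / len(times) if times else None,
    }


# Hypothetical sample of one day's calls.
sample_calls = [
    Call(abandoned=True),
    Call(resolved_at_level=1, minutes_to_resolve=10.0),
    Call(resolved_at_level=1, minutes_to_resolve=20.0),
    Call(resolved_at_level=2),
    Call(resolved_at_level=2, dispatched_to_wrong_group=True),
]

if __name__ == "__main__":
    for name, value in summarize(sample_calls).items():
        print(f"{name}: {value}")
```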
9. Design the Call-Handling Process
▪ The entire cross-functional team will need to design this.
▪ It dictates how problems are first handled, logged, analyzed and communicated
▪ This will also indicate how they might be handed off to level 2 for resolution, closing, and
customer feedback
▪ This will also design how the proactive calls will be selected
10. Evaluate, Select, and Implement the Call-Tracking Tool
▪ Alternative call-tracking tools are evaluated by the cross-functional team to determine the final
selection.
▪ The selected tool is then implemented
▪ In the initial phases, a small subset of calls is passed through the tool to pilot the
implementation.
Key Steps to Developing a Problem Management Process - 7
▪ We can measure and streamline the problem management process with the help of the
assessment worksheet.
▪ We can measure the effectiveness of a problem management process with service metrics
such as calls answered by the second ring, calls answered by a person, calls solved at level
1, response times of level 2 and feedback surveys.
▪ We can also look at process metrics, such as calls dispatched to wrong groups, calls
requiring repeat follow-up, and the amount of overtime spent by level 2 and third-party vendors,
which help us gauge the efficiency of this process.
▪ We can streamline the problem management process by automating actions such as
escalation, paging, exception reporting, and the use of a knowledge database.
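As one illustration of automating escalation and exception reporting, the sketch below lists open tickets that have exceeded a time limit for their priority. The ticket fields and the per-priority limits are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical time limits before an open ticket is escalated, keyed by priority.
ESCALATE_AFTER = {
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=24),
}


def needs_escalation(priority: int, opened_at: datetime, now: datetime,
                     resolved: bool = False) -> bool:
    """True if an unresolved ticket has been open longer than its priority allows."""
    if resolved:
        return False
    limit = ESCALATE_AFTER.get(priority)
    if limit is None:  # priorities without a limit are never auto-escalated
        return False
    return now - opened_at > limit


def exception_report(tickets, now: datetime):
    """Return the ids of tickets that should be escalated."""
    return [t["id"] for t in tickets
            if needs_escalation(t["priority"], t["opened_at"], now,
                                t.get("resolved", False))]


if __name__ == "__main__":
    now = datetime(2022, 2, 26, 12, 0)
    tickets = [
        {"id": "T1", "priority": 1, "opened_at": datetime(2022, 2, 26, 10, 30)},
        {"id": "T2", "priority": 2, "opened_at": datetime(2022, 2, 26, 10, 0)},
    ]
    print("escalate:", exception_report(tickets, now))
```

A scheduler could run such a report at fixed intervals and page the next support level for each ticket it returns.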