
Capacity Management

● Refers to the wide variety of planning actions used to ensure that a business infrastructure has
adequate resources to maximize its potential activities and production output under any
condition.
● Act of ensuring a business maximizes its potential activities and production output
● Measures how much companies can achieve, produce or sell within a given time period
● Maximum throughput that a configuration item or IT service can deliver

Changing conditions or External influences


● Seasonal demand
● Industry changes
● Unexpected macroeconomic events

Capacity Management Actions
● Working overtime
● Outsourcing business operations
● Purchasing additional equipment
● Leasing or selling commercial property

Space Management
● Calculating the proportion of spatial capacity that is actually being used over a certain time
period
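The calculation described above can be sketched in a few lines. This is a minimal illustration, not a standard formula from the notes: the function name `space_utilization`, the choice to average the samples, and the floor-space figures are all assumptions made for the example.

```python
def space_utilization(used_samples, total_capacity):
    """Return the average fraction of capacity in use across the samples."""
    if total_capacity <= 0:
        raise ValueError("total_capacity must be positive")
    if not used_samples:
        raise ValueError("at least one sample is required")
    average_used = sum(used_samples) / len(used_samples)
    return average_used / total_capacity


# Hypothetical data: square metres of floor space occupied on five working days
daily_used_m2 = [600, 650, 700, 650, 600]
utilization = space_utilization(daily_used_m2, total_capacity=1000)
print(f"{utilization:.0%}")  # prints "64%"
```

Averaging over the period is only one option; a peak-based figure (the maximum sample divided by capacity) may be more useful when the concern is running out of space.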
Goal: Ensure that cost-justifiable IT capacity in all areas of IT always exists and is matched to the current
and future agreed need of the business, in a timely manner.
Purpose: Provide a point of focus and management for all capacity and performance related issues.

Capacity Management Information System (CMIS)


- a collection of IT infrastructure usage, capacity and performance information gathered in a consistent
manner and stored in one or more databases.
- single book of record for all usage, capacity and performance data, complete with associated business,
application and service statistics
- used by any IT staffer needing access to capacity management data
- all data is synchronized from a collection period perspective
- it is scrubbed to ensure it is consistent and accurate.
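One way to picture "synchronized from a collection period perspective" and "scrubbed" is the sketch below. It is an assumption about what such processing might look like, not a description of any particular CMIS product: the function name `align_to_periods`, the 300-second period, and the rule for discarding readings are all hypothetical.

```python
from collections import defaultdict


def align_to_periods(samples, period_seconds=300):
    """Group (timestamp, value) samples into fixed periods and average each one."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Scrub missing or obviously invalid readings before storing them
        if value is None or value < 0:
            continue
        period_start = timestamp - (timestamp % period_seconds)
        buckets[period_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical CPU readings: (seconds since start, percent busy)
cpu = [(0, 40.0), (120, 60.0), (310, 80.0), (450, None), (599, 70.0)]
aligned = align_to_periods(cpu)
print(aligned)  # {0: 50.0, 300: 75.0}
```

Once every metric is keyed by the same period boundaries, values from different collectors can be compared or joined directly.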

IT Service Management Processes frequently accessing CMIS Data:


● Capacity Planning
● Performance Management
● Service Level Management
● Help/Service Desk
● Incident Management
● Problem Management
● Configuration Management

Capacity Management Database (CDB)


- was the central data store but ITIL proponents realized that it fell short of what was needed to take
capacity management to the next level
- a collection of data but there were no standards regarding the collection and archival nor integration
between the different technologies

CMIS Contents
● Business performance data
● Financial data
● Business transaction metrics
● Infrastructure upgrade costs
● Application transaction counts
● Power and cooling cost
● Invoices generated
● IT budget information
● IT service performance data
● Component utilization data
● Transaction response times
● Server performance metrics
● Transaction rate
● Network performance
● Workload volumes
● Data storage measurements
● Memory usage
Characteristics of CMIS

1. Openness
- goal of CMIS is to become the central hub for all performance-related data
- a good CMIS needs to make it easy to get information in and out
- this means comprehensive performance data about the infrastructure going in, and efficient access to
that data for analysis and reporting purposes.
● Data collectors use the CMIS to store information
● Performance and other systems management tools use it to access data and share analysis
results
- should be possible to effectively instrument all the critical applications (custom application)
- should be able to implement custom analysis and reports
- should facilitate information sharing with Configuration Management Database (CMDBs),
chargeback application and other tools
- should be able to alert event consoles and service desk tools when adverse events are detected

2. Business-relevant Views
- CMIS should let tools analyze and report on enterprise IT infrastructure from:
○ a component view, useful for problem-solving and technology-specific detailed planning
○ an IT service-based view, which helps facilitate business-aligned analysis and reporting
○ a business process view
- these views allow you to relate operational and planning results at many different levels

3. Real-time Data
- ability to detect and respond to performance bottlenecks will be hampered if your CMIS is
unable to collect and deliver performance data in real-time

4. Heterogeneous Coverage
- one advantage of CMIS is being able to manage the performance and capacity of all those
platforms from a single repository
- CMIS can handle data from the key platforms within your data centers

5. Automation
- a good CMIS has built-in automation to handle most of the repetitive tasks and provides interfaces
where you can automate other related tasks that are specific to your organization's needs.

6. Scalability
- CMIS must have the ability to scale up or down to meet the changing needs of the organization

7. Efficiency
- the best CMIS tools minimize their use of computing resources and networking bandwidth, and require
less data storage to perform their work

8. Security
- prevents unauthorized changes or deletion of historical data
- permits you to restrict access to proprietary data stored in the CMIS (e.g. business plans, to
preserve competitive advantage)

9. Support
- CMIS must have a capable support team available to assist you with the implementation and
ongoing maintenance of CMIS

How Capacity Management Works


Capacity Management Tools
- measure the volume, speed, latency, and efficiency of the movement of data as it is processed by an
organization's applications
- able to examine the operations of all the hardware and software in an environment and capture
critical information about data flow
- must be able to observe the individual performance of IT assets, as well as how these assets interact.
- should be able to monitor and measure the following IT elements:
● Servers
● End-user devices
● Networks and related communications devices
● Storage systems and storage network devices
● Cloud services
- rely on the interception of data movement metrics and the internal processes of individual
components

E.g.
1. IOmeter - free, open-source utility originally developed by Intel that provides details about
processing by servers, clusters of servers, or individual end-user computers
IOPS (input/output operations per second) - basic measure of the transfer rate of data during
processing.
2. Emulation Programs - mimic application programs such as database management systems
(DBMSes) to determine how a system is likely to perform under similar loads in production
environments
3. Application Emulators - include their own sets of test data to help ensure accurate and
consistent results across disparate equipment.
4. Hardware-based monitoring devices - focus on network performance and can provide
comprehensive information on most aspects of data movement.

Components
1. Control devices (servers with specialized software)
2. Network TAPs (Test Access Points) - devices that physically hook into particular
elements of a network to capture information about data traffic as it occurs.
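The IOPS measure mentioned in the IOmeter entry above comes down to simple arithmetic. The sketch below is illustrative only: the function names, the sample counts, and the 4 KiB block size are assumptions, and it does not reproduce how IOmeter itself measures or reports results.

```python
def iops(operations_completed, elapsed_seconds):
    """Input/output operations per second over a measurement window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return operations_completed / elapsed_seconds


def throughput_mib_per_s(iops_value, block_size_kib):
    """Approximate data rate implied by an IOPS figure at a fixed block size."""
    return iops_value * block_size_kib / 1024


# Hypothetical test run: 120,000 operations completed in 60 seconds, 4 KiB blocks
rate = iops(120_000, 60)
data_rate = throughput_mib_per_s(rate, block_size_kib=4)
print(rate, data_rate)  # 2000.0 7.8125
```

An IOPS figure is only comparable between systems when the block size and the read/write mix are the same, which is why the block size is an explicit parameter here.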

Components of Capacity Management


- some tools provide high-level information on a variety of infrastructure components
- others provide detailed metrics related to one segment of the computing environment
- the aim is to gather as much information as possible and then to attempt to correlate those measurements into an
application-centric picture that focuses on the performance and requirements of mission-critical
applications across the environment rather than how individual components are performing.
Performance (throughput)
- key metric in capacity management as it may point to processing bottlenecks that affect overall
application processing performance
- CPU, routers, storage and controllers should be monitored to ensure that their processing capabilities
are not frequently pinned at or near 100% utilization
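A monitoring check along these lines could be sketched as follows. The 90% threshold, the 20% tolerance, and the function names are assumptions chosen for illustration; the notes do not specify what counts as "frequently".

```python
def saturation_ratio(utilization_samples, threshold=90.0):
    """Fraction of samples at or above the utilization threshold (percent)."""
    if not utilization_samples:
        raise ValueError("at least one sample is required")
    hot = sum(1 for u in utilization_samples if u >= threshold)
    return hot / len(utilization_samples)


def is_bottleneck_candidate(utilization_samples, threshold=90.0, max_ratio=0.2):
    """Flag a component whose samples are near saturation too often."""
    return saturation_ratio(utilization_samples, threshold) > max_ratio


# Hypothetical CPU utilization samples, in percent
cpu_percent = [35, 92, 97, 60, 99, 100, 45, 88, 95, 30]
ratio = saturation_ratio(cpu_percent)
flagged = is_bottleneck_candidate(cpu_percent)
print(ratio, flagged)  # 0.5 True
```

The same check applies unchanged to router, storage or controller utilization, provided the samples are expressed as percentages.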

Memory
- a factor in capacity management
- servers and other devices use their installed memory to run applications and process data

Physical Space
- most commonly associated with capacity management
- focus is generally on storage space for applications and data
● Storage Systems
- systems that are near capacity will have longer response times, as it takes longer to locate specific data
when drives (hard disk/solid-state) are full or nearly full
● Processor and memory measurements
- it's important to monitor space usage in devices other than servers and end-user PCs that may
have installed storage that's used for caching data.
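Because storage that is nearly full degrades response time, it helps to estimate when that point will be reached. The sketch below uses a simple straight-line projection, which is an assumption made for illustration rather than a method taken from the notes; the function name and the usage figures are hypothetical.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Estimate days until capacity is reached, using average daily growth.

    Returns None when usage is flat or shrinking, since no fill date can be projected.
    """
    if len(daily_used_gb) < 2:
        raise ValueError("at least two daily samples are required")
    growth_per_day = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if growth_per_day <= 0:
        return None
    remaining = max(capacity_gb - daily_used_gb[-1], 0)
    return remaining / growth_per_day


# Hypothetical daily usage readings in GB for a 1,000 GB volume
usage = [700, 710, 720, 730, 740]
estimate = days_until_full(usage, capacity_gb=1000)
print(estimate)  # 26.0
```

Real workloads rarely grow in a straight line, so a figure like this is best treated as an early warning and not as a firm date.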

What Is a Disaster Recovery Plan (DRP)?


● A documented, structured approach that describes how an organization can quickly resume work
after an unplanned incident
● An essential part of a business continuity plan (BCP)
● Is applied to the aspects of an organization that depend on a functioning IT infrastructure
● Aims to help an organization resolve data loss and recover system functionality
● Step-by-step plan consists of the precautions to minimize the effect of a disaster so the
organization can continue to operate or quickly resume mission-critical functions
● Typically involves an analysis of business processes and continuity needs
● Before generating a detailed plan, an organization performs a business impact analysis (BIA) and
risk analysis (RA) and establishes recovery objectives
● Define data recovery and protection strategies
● Ability to quickly handle incidents can reduce downtime and minimize both financial and
reputational damage
● Ensure that organizations meet all compliance requirements while also providing a clear
roadmap to recovery.

Some types of disasters:


● Application failure
● Communication failure
● Data center disaster
● Building disaster
● Campus disaster
● Citywide disaster
● Regional disaster
● National disaster
● Multinational disaster.

Recovery Plan Considerations


● Recovery time objective (RTO)
○ Describes the target amount of time a business application can be down
● Recovery point objective (RPO)
○ Describes the age of files that must be recovered from backup storage for normal
operations to resume
● Recovery strategies
○ Define an organization's plan for responding to an incident
● Disaster recovery plans
○ Describe how the organization should respond
○ Derived from recovery strategies
● Budget
● Insurance coverage
● Resources (people and physical facilities)
● Management's position on risk
● Technology
● Data
● Suppliers
● Compliance requirements
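The RTO and RPO definitions above can be expressed as two simple time comparisons. This is a minimal sketch under stated assumptions: the function names, the timestamps, and the four-hour objectives are hypothetical and only illustrate how the two objectives differ.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup, incident_time, rpo):
    """True if the newest recoverable data is no older than the RPO allows."""
    return incident_time - last_backup <= rpo


def meets_rto(incident_time, restored_time, rto):
    """True if the application was down no longer than the RTO allows."""
    return restored_time - incident_time <= rto


# Hypothetical incident timeline
incident = datetime(2024, 1, 10, 12, 0)
last_backup = datetime(2024, 1, 10, 9, 0)   # 3 hours before the incident
restored = datetime(2024, 1, 10, 17, 0)     # 5 hours after the incident

rpo_ok = meets_rpo(last_backup, incident, rpo=timedelta(hours=4))
rto_ok = meets_rto(incident, restored, rto=timedelta(hours=4))
print(rpo_ok, rto_ok)  # True False
```

In this example the backup was recent enough to satisfy the RPO, but recovery took longer than the RTO permits, so the two objectives have to be checked separately.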

Types of Disaster Recovery Plans


Environment-Specific Plans

● Virtualized DRP
○ Provides opportunities to implement disaster recovery in a more efficient and simpler
way
○ Can spin up new virtual machine (VM) instances within minutes and provide application
recovery through high availability
○ Testing is easier to achieve
○ Plan must include the ability to validate that applications can be run in disaster recovery
mode and returned to normal operations within the RPO and RTO
● Network DRP
○ Recovering a network gets more complicated as the complexity of the network increases
○ It is important to detail the step-by-step recovery procedure, test it properly and keep it
updated
○ Data in the plan will be specific to the network, such as its performance and networking staff
● Cloud DRP
○ Can range from a file backup in the cloud to a complete replication
○ Can be space, time and cost-efficient but requires proper management for maintenance
○ Manager must know the location of the physical and virtual servers
○ The plan must address security which is a common issue that can be alleviated through
testing
● Data Center DRP
○ Focuses exclusively on the data center facility and infrastructure
○ Operation risk assessment is a key element because it analyzes key components such as
building location, power systems and protection, security and office space
○ Plan must address a broad range of possible scenarios

Scope and Objectives


● Business Continuity Institute and Disaster Recovery Institute International provide free
information and online how-to articles
● DRP checklist
○ Identifying critical IT systems and networks
○ Prioritizing the RTO
○ Outlining the steps needed to restart, reconfigure and recover systems and networks

How to Build a Disaster Recovery Plan


Business Impact Analysis (BIA)
● Identifies the impacts of disruptive events and is the starting point for identifying risk within the
context of disaster recovery
Risk Analysis (RA)
● Identifies threats and vulnerabilities that could disrupt the operation of systems and processes
highlighted in the BIA
● Assesses the likelihood of a disruptive event and outlines its potential severity
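One common way to combine likelihood and severity is to multiply them into a single score and rank threats by it. The notes do not prescribe a scoring method, so the 1 to 5 scales, the multiplication rule, the function name, and the sample threats below are all assumptions made for illustration.

```python
def rank_risks(threats):
    """Sort (name, likelihood, severity) threats by likelihood x severity, highest first."""
    scored = [(name, likelihood * severity) for name, likelihood, severity in threats]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical threats scored on assumed 1-5 scales: (name, likelihood, severity)
threats = [
    ("Power outage", 4, 3),
    ("Regional flood", 1, 5),
    ("Ransomware", 3, 5),
]
ranked = rank_risks(threats)
for name, score in ranked:
    print(f"{score:>2}  {name}")
```

A ranking like this helps decide which systems highlighted in the BIA need recovery procedures first, though a low-likelihood, high-severity threat may still deserve attention regardless of its score.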

DRP Checklist/Steps
● Establishing the range or extent of necessary treatment and activity (the scope of recovery)
● Gathering relevant network infrastructure documents
● Identifying the most serious threats and vulnerabilities and most critical assets
● Reviewing the history of unplanned incidents and outages and how they were handled
● Identifying current disaster recovery strategies
● Identifying the incident response team
● Having management review and approve the DRP
● Testing the plan
● Updating the plan
● Implementing a DRP audit

Elements of DRP
● A statement of intent and disaster recovery policy statement
● Plan goals
● Authentication tools (passwords)
● Geographical risks and factors
● Tips for dealing with media
● Financial and legal information and action steps
● Plan history

Communication Plan
● Another component of DRP
● Details how both internal and external crisis communication will be handled
○ Internal Communications
■ Alerts that can be sent using email, overhead building paging systems, voice
messages or text messages to mobile devices
■ Examples: instructions to evacuate the building, updates on the progress of the
situation
○ External Communications
■ Include instructions on how to notify family members in the case of injury or
death; how to inform and update key clients and stakeholders

Disaster Recovery Plan Template


● Begin with summary of vital action steps and a list of important contacts
● Define roles and responsibilities of disaster recovery team
● Outline the criteria to launch the plan into action
● Specify in detail the incident response and recovery activities

List of Disaster Recovery Test Scenarios


1. LOSS OF KEY STAFF SCENARIOS
■ Plane crash with critical personnel on board
■ Major transit incident prevents staff from getting to the office
■ Major flu epidemic strikes
■ Employees go on strike
2. LOSS OF KEY TECHNICAL INFRASTRUCTURE SCENARIOS
■ Building fire at office
■ Trucker plows through power cable supporting the office building
■ An employee at a branch office smells a "gas-like smell" coming from the locked
server room
■ Office hardware, computers and telephony have been stolen overnight
■ Fans and cooling systems to the data center have lost power
■ Disgruntled employee takes anger out on
■ Loss or corruption of critical application
3. LOSS OR CORRUPTION OF KEY DATA SCENARIOS
■ Faulty backup tapes
■ HR's office is completely cleaned out by burglars in the middle of the night
■ Network has been hacked
■ Your servers have been hacked
■ Rogue file sharing instances have led to a data breach
4. ENVIRONMENTAL SCENARIOS
■ Storm
■ Earthquake
■ Incidents
■ Riots
■ Tsunami
