ROOT CAUSE ANALYSIS OF NETWORK DOWNTIME DUE TO BROADCAST STORM – 20 MAY 2008
FISHBONE DIAGRAM:
Effect: 4 Hour Intermittent Network Connection Due To Broadcast Storm
Branches: PEOPLE, TECHNOLOGY, POLICIES, PROCESS/PROCEDURE
POLICIES: No policy on how to address rush/immediate network change requests.
PROCESS/PROCEDURE: No Change Management Process and Procedures. No new employee deployment process in place.
RP Natividad
TECHNOLOGY
1. Root cause: No monitoring software/tools that can see network health and prevent broadcast storms.
   Corrective action: Identify and set up an open source monitoring system (Zenoss/Cacti) that can monitor the buildup of broadcast packets, CPU utilization, etc. in network devices.
   Target date: EO June 2010
2. Root cause: Broadcast storm control not implemented in switches.
   Corrective action: Reconfigure and implement broadcast storm control in Cisco switches.
   Target date: 15 June 2010
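As an illustration of what monitoring the buildup of broadcast packets could involve, the following is a minimal Python sketch. It is not how Zenoss or Cacti work internally. It assumes a poller has already collected cumulative broadcast-packet counter readings for one switch port (for example over SNMP). The names CounterSample, broadcast_rate and storm_intervals, the sample readings, and the 1,000 packets-per-second threshold are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of a port's cumulative broadcast-packet counter (hypothetical type)."""
    timestamp: float        # seconds since an arbitrary start point
    broadcast_packets: int  # cumulative count as reported by the poller


def broadcast_rate(prev: CounterSample, curr: CounterSample) -> float:
    """Return broadcast packets per second between two consecutive samples."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr.broadcast_packets - prev.broadcast_packets
    if delta < 0:
        # The counter was reset or wrapped; this sketch does not try to recover.
        raise ValueError("counter went backwards")
    return delta / elapsed


def storm_intervals(samples, threshold_pps):
    """Return (start, end, rate) for every interval whose rate exceeds threshold_pps."""
    alerts = []
    for prev, curr in zip(samples, samples[1:]):
        rate = broadcast_rate(prev, curr)
        if rate > threshold_pps:
            alerts.append((prev.timestamp, curr.timestamp, rate))
    return alerts


if __name__ == "__main__":
    # Hypothetical readings taken every 60 seconds on one switch port.
    samples = [
        CounterSample(0, 1_000),
        CounterSample(60, 4_000),
        CounterSample(120, 604_000),
        CounterSample(180, 607_000),
    ]
    # Threshold is an assumed value; a real one depends on normal traffic levels.
    for start, end, rate in storm_intervals(samples, threshold_pps=1_000):
        print(f"{start:.0f}-{end:.0f}s: {rate:.0f} broadcast pps")
```

Working from counter deltas rather than raw totals is the design choice here: a cumulative counter only grows, so it is the rate of change between polls that indicates a storm building up.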
PEOPLE
1. Root cause: IT staff lack training in troubleshooting and configuring routers and switches.
   Corrective action: Send IT staff to basic networking and basic Cisco administration training.
   Target date: TBD – Coordinate with OD/ER
2. Root cause: Fast deployment and frequent relocation of staff result in undocumented network changes.
   Corrective action: Set a standard TAT for deployment and relocation of workstations to allow enough time to document changes.
   Target date: 7 July 2010
3. Root cause: No single person is accountable for, or knowledgeable in, maintaining and documenting the network and servers.
   Corrective action: Hire a Systems Administrator who will be responsible for maintaining servers and network devices.
   Target date: EO May 2010
POLICIES
1. Root cause: No policy on how to address rush/immediate network change requests.
   Corrective action: Develop and implement Change Management Policy.
   Target date: EO June 2010
PROCESS / PROCEDURES
1. Root cause: No Change Management Process and Procedures.
   Corrective action: Document and implement Change Management Process and Procedure.
   Target date: EO June 2010
2. Root cause: No new employee deployment process in place.
   Corrective action: Implement new employee deployment process.
   Target date: EO June 2010
3. Root cause: Unclear procedure on how to troubleshoot broadcast storms and network problems.
   Corrective action: Create basic network troubleshooting process/procedure.
   Target date: EO June 2010