Chapter 3 - Network Operations
Network Monitoring
Network operations starts with monitoring. Monitoring provides visibility that facilitates
all other operational tasks.
Monitoring is a basic requirement of any service. These are the minimal items that
should be monitored for a network:
• Network devices (routers and switches, not endpoints such as PCs):
– Health (up/down status)
– Internal components (blades, slaves, extensions)
– Resource utilization (memory, CPU, and so on)
• Each WAN link and LAN trunk, plus each of the individual links that make up a bonded set:
– Health (up/down status)
– Utilization (how much capacity is in use)
– Error counts
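Link utilization and error counts are usually derived from cumulative interface counters polled at a fixed interval. The following is a minimal sketch, assuming 64-bit octet counters (such as the IF-MIB ifHCInOctets object) and 32-bit error counters (such as ifInErrors); the names Sample, counter_delta, utilization_percent, and new_errors are hypothetical.

```python
from dataclasses import dataclass

OCTET_COUNTER_MOD = 2**64  # assumed 64-bit octet counter (e.g. ifHCInOctets)
ERROR_COUNTER_MOD = 2**32  # assumed 32-bit error counter (e.g. ifInErrors)


@dataclass
class Sample:
    octets: int  # cumulative octet counter at sample time
    errors: int  # cumulative error counter at sample time


def counter_delta(old: int, new: int, modulus: int = OCTET_COUNTER_MOD) -> int:
    """Difference between two readings, allowing for at most one wrap.

    A counter reset (for example, after a device reboot) is not detected.
    """
    return (new - old) % modulus


def utilization_percent(old: Sample, new: Sample, interval_s: float,
                        speed_bps: int) -> float:
    """Percentage of link capacity used over the sampling interval."""
    bits = counter_delta(old.octets, new.octets) * 8
    return 100.0 * bits / (interval_s * speed_bps)


def new_errors(old: Sample, new: Sample) -> int:
    """Errors recorded on the link during the sampling interval."""
    return counter_delta(old.errors, new.errors, ERROR_COUNTER_MOD)
```

For example, a 1 Gb/s link whose octet counter advances by 7,500,000,000 over a 300-second interval is running at 20% utilization. Taking the difference modulo the counter size keeps the result correct across a single counter wrap, which is why high-capacity (64-bit) counters are preferred on fast links.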
Management
Network management is often conflated with monitoring, but the two are entirely separate
things.
Monitoring observes the current state of the network and helps you to detect, predict,
and prevent problems.
Management means controlling device configuration and firmware versions.
Any changes that are made to network devices need to generate an audit trail that
enables one to trace the change back to a person. Audit trails are a regulatory
requirement for many businesses, and extremely useful for all businesses.
The audit logs should be sent off the device to a centralized logging service, for event
correlation and for later review when needed.
Life Cycle
The life cycle of a device is the series of stages it goes through between being
purchased and ultimately disposed of. The life cycle of network devices is considerably
different from the software life cycle.
The life-cycle phases for network devices can be described as in stock, assigned,
installed, deploying, operational, servicing, decommissioning, and disposed.
The life-cycle state should be tracked in the inventory system.
Depending on the type of device and the life-cycle phase that it is in, different levels of
change control are appropriate. The guiding principle is that change management is
needed when the change affects the production network.
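1. In stock: A device is in stock when it has been purchased but not yet earmarked for
any particular use.
2. Assigned: A device moves to assigned state when it has been earmarked for a
particular use.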
3. Installed: A device moves to installed state when it has been unboxed and racked.
Up until this point, no change control has been necessary, except for any datacenter
change control process that restricts when physical installations can take place.
4. Deploying: In the deploying phase, the device is powered up and connected to the
management network and a console server.
The correct version of software is installed, along with some initial configuration.
As long as the network device is sufficiently isolated that it cannot affect the
production network (for example, by injecting routes into the routing table), no
change control is necessary during this phase.
Because the deploying state can involve a considerable amount of work, it is useful to
impose the rule that a device in this state cannot be connected to the production network
in a way that could cause an impact. With that rule and appropriate controls in place, even
the strictest change review board will agree that changes to a device in the deploying
state do not require formal change control.
5. Operational: Moving a device from the deploying state into the operational state
usually involves change control. Once it is in the operational state, it is subject to
standard change-management processes.
6. Servicing: If a device fails in some way, it may be taken temporarily out of service
and put into the servicing state. From here, it may return to the operational state, if it is
put directly back into service, or it may return to stock, if it was replaced by another
device.
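7. Decommissioning: A device moves to the decommissioning state when it is being
removed from service. From there, the device can move back to stock, or it can be
disposed of.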
8. Disposed: After a device has been decommissioned and disposed of, it is marked as
disposed in inventory. Usually devices in the disposed state are visible in inventory for a
fixed period of time, after which information about that device is archived.
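Because the life-cycle state is tracked in the inventory system, it can be modelled as a small state machine. The following is a minimal sketch: the state names come from the text, but the transition table (in particular, which states may enter decommissioning) and the rule that any transition into or out of the operational state needs change control are assumptions for illustration, not a prescribed policy.

```python
from enum import Enum


class State(Enum):
    IN_STOCK = "in stock"
    ASSIGNED = "assigned"
    INSTALLED = "installed"
    DEPLOYING = "deploying"
    OPERATIONAL = "operational"
    SERVICING = "servicing"
    DECOMMISSIONING = "decommissioning"
    DISPOSED = "disposed"


# Allowed transitions. Entry into DECOMMISSIONING only from OPERATIONAL
# is an assumption; adjust to local policy.
TRANSITIONS = {
    State.IN_STOCK: {State.ASSIGNED},
    State.ASSIGNED: {State.INSTALLED},
    State.INSTALLED: {State.DEPLOYING},
    State.DEPLOYING: {State.OPERATIONAL},
    State.OPERATIONAL: {State.SERVICING, State.DECOMMISSIONING},
    State.SERVICING: {State.OPERATIONAL, State.IN_STOCK},
    State.DECOMMISSIONING: {State.IN_STOCK, State.DISPOSED},
    State.DISPOSED: set(),
}


def needs_change_control(current: State, target: State) -> bool:
    """Assumed rule: a move touches production if either end is OPERATIONAL."""
    return State.OPERATIONAL in (current, target)


def transition(current: State, target: State) -> tuple[State, bool]:
    """Validate a move; return the new state and whether change control applies."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target, needs_change_control(current, target)
```

With this table, moving a device from installed to deploying needs no change control, moving it from deploying to operational does, and an attempt to jump straight from in stock to operational is rejected.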
Configuration Management
Configuration management for network devices lags behind the practices used for
workstations and servers.
Audit logs from network devices should be collected centrally for later use and analysis.
Network devices should be configured to log configuration changes to an external log
server. This server records who made each change, along with a timestamp. Integrate this
log with the change-management system so that each modification can be traced to the
change request it belongs to.
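One way to tie logged changes to change requests is to have engineers cite the request ID when they commit a change, and then check the central log against the list of approved requests. The following is a minimal sketch, assuming a hypothetical log-entry layout (device, user, timestamp, free-text comment) and a hypothetical CR-<number> ID format; the device and user names in any example are invented.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Hypothetical change-request ID format, e.g. "CR-1234".
CR_PATTERN = re.compile(r"\bCR-\d+\b")


@dataclass
class ConfigChange:
    device: str
    user: str
    timestamp: datetime
    comment: str  # free text in which the engineer cites the change request


def unmatched_changes(changes: list[ConfigChange],
                      approved: set[str]) -> list[ConfigChange]:
    """Return the changes that cite no approved change request."""
    flagged = []
    for change in changes:
        cited = set(CR_PATTERN.findall(change.comment))
        if not cited & approved:
            flagged.append(change)
    return flagged
```

Each flagged entry still identifies the person and the time of the change, so it can be followed up even though it was not linked to an approved request.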
Software Versions
Network devices are upgraded individually, so they may end up running many different
versions of code. This proliferation of versions makes automation difficult, and makes it
harder to keep up with vulnerability patching and bug tracking. Ideally, each type of device
should run no more than two versions: one should be the strategic version, and the other
should be the version that you are phasing out.
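The two-version rule can be checked automatically against the inventory. The following is a minimal sketch; only the strategic-plus-phasing-out rule comes from the text, while the inventory layout, model names, and version strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    model: str
    version: str


@dataclass
class VersionPolicy:
    strategic: str    # the version all devices of this model should converge on
    phasing_out: str  # still permitted, but scheduled for upgrade


def audit_versions(devices: list[Device],
                   policy: dict[str, VersionPolicy]) -> dict[str, list[str]]:
    """Sort devices into strategic, due-for-upgrade, and out-of-policy lists."""
    report = {"strategic": [], "upgrade": [], "violation": []}
    for dev in devices:
        rule = policy.get(dev.model)
        if rule is None or dev.version not in (rule.strategic, rule.phasing_out):
            # Unknown model or a third version: outside the two-version policy.
            report["violation"].append(dev.name)
        elif dev.version == rule.strategic:
            report["strategic"].append(dev.name)
        else:
            report["upgrade"].append(dev.name)
    return report
```

The "upgrade" list doubles as the work queue for retiring the phasing-out version, and anything in "violation" is a device the policy does not cover.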
Deployment Process
Firmware, or software, releases for network devices should be treated in the same way
as any other software.
The new release should be subjected to an intensive testing process in the lab.
The deployment to the lab equipment and the subsequent testing should both be as
automated as possible. Start with a checklist of what to test and how, and add to it as
you find new failure scenarios, perhaps due to problems in production.
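A checklist that grows over time is easiest to maintain when each item is a small, named, automated check. The following is a minimal sketch of such a registry; the snapshot fields (running_version, adjacencies_up, and so on) are hypothetical placeholders for data that real checks would gather from the lab equipment.

```python
from typing import Callable

Snapshot = dict[str, object]  # hypothetical view of a lab device's state
CHECKLIST: list[tuple[str, Callable[[Snapshot], bool]]] = []


def check(name: str):
    """Register a named check; new failure scenarios are added the same way."""
    def register(func: Callable[[Snapshot], bool]) -> Callable[[Snapshot], bool]:
        CHECKLIST.append((name, func))
        return func
    return register


@check("running expected firmware")
def firmware_matches(s: Snapshot) -> bool:
    return s["running_version"] == s["target_version"]


@check("all routing adjacencies up")
def adjacencies_up(s: Snapshot) -> bool:
    return s["adjacencies_up"] == s["adjacencies_expected"]


def run_checklist(s: Snapshot) -> list[str]:
    """Run every registered check in order and return the names that failed."""
    return [name for name, func in CHECKLIST if not func(s)]
```

When a problem is found in production, it becomes one more decorated function, so every later release is tested against it automatically.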
Documentation
Network documentation serves two purposes. The first is to help people troubleshoot
problems, so that they can understand and resolve them as quickly as possible.
The second is to explain what the network looks like, how it has been put together, and
why.