Disaster Recovery Plan Template
Templates are intended to be a guide for completing department disaster recovery plans and may be modified to meet specific requirements.
For information regarding disaster recovery planning, please review the Disaster Recovery website.
The information presented in this template should be viewed as a guideline to help teams develop a model best suited to their DR requirements. Each owner responsible for DR planning should therefore add or remove areas of this template as necessary to ensure successful recovery of their application or infrastructure production environment in the event of a disaster.
Definition and Scope
Unexpected interruptions to normal production will occasionally occur. Such interruptions may be caused by any number of things, such as a power outage, human error, or a technical failure within the application or infrastructure. Additionally, some disruptions may result from a cascading event at a remote "upstream" or geographically separated site.
Evacuation due to a hazardous materials spill, accessibility problems due to civil unrest, and wide-area power failures are examples of business interruptions caused by remote or external conditions. Most interruptions are temporary, with conditions returning to normal within a time frame considered non-critical for the business environment. Other interruptions can quickly escalate into extended outages that severely impair an organization's ability to conduct business. As that ability is impaired, the customer base dwindles and market share is lost.
The term disaster suggests a large-scale event, usually a natural disaster. In fact, each event must be addressed within the context of its impact on the organization and the surrounding community. What might constitute a nuisance to a large industrial facility could be a disaster to a small business. To be effective, a DR plan must establish the correct scope, which requires a thorough risk assessment. The assessment supports an effective strategy for both risk avoidance (disaster avoidance) and risk mitigation (disaster response).
Types of threats to consider
Business and IT face many potential threats. Some are unique to business continuity planning, others to disaster recovery planning. BC and
DR planners should review this list of potential threats, but not be limited by it. New threats emerge, and old ones mutate into new ones. Let
these serve as thought starters for your DR planning efforts:
a. Natural disasters - Events such as fire, flood, earthquake, and severe weather. What if your normal work area were destroyed, or simply made unavailable, as a result of a natural (or man-made) disaster? Are records replicated? Do you have an alternate work location?
b. Personnel outage (strike, pandemic) - What if an employee strike or a pandemic prevents personnel from coming to work? Who is cross-trained on key business processes? Can employees work from home via the internet? Have employees been provided the right tools, and have they tested remote access?
c. Threats - Threats can include bomb threats, disgruntled employees, etc. How do you respond to these emergency situations (response plans and evacuation plans)?
d. Just-in-time supply - What if critical suppliers were unable to meet expectations? Do you have sufficient inventory?
e. Power/water failure - Technology systems require electricity and cooling; without them, these systems will shut down. Regulations may require that potable water be available to employees, or businesses must close due to health risks. Do you have a backup power and water supply?
f. Failure of IT components - What would you do if critical systems were down for any length of time? Are there manual processes? Have personnel been trained to use them?
g. Data loss - Is your data routinely backed up? Have you tested sample data restoration? What is the process for restoring data, and is the backup stored offsite or in an approved/safe location?
h. Geo-political and civil unrest - The global economy presents unique risks that can be neither predicted nor prevented. In some countries, internal discord could prevent workers from safely reaching their places of employment, or put workers onsite at risk. Do you have plans to cover such an event?
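Item g asks whether sample data restoration has been tested. One way to make that test objective is to compare checksums of a sample of source files with the same files after they have been restored. The Python sketch below assumes that both the original sample and the restored copy are reachable as local directories. The function names are hypothetical and are not part of any corporate tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under source_dir with its counterpart under restored_dir.

    Returns a list of problems; an empty list means every sampled file
    was restored byte-for-byte.
    """
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"mismatch: {rel}")
    return problems
```

An empty result is evidence that the sample restored correctly. Recording the output alongside the test date gives the "tested sample data restoration" question a concrete answer.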
Identify Recovery Assumptions, Expectations, and Sign-Off
A full understanding of assumptions, expectations, and potential impact must be developed as part of the overall DR planning effort. This should be the first step, because it helps identify how to avoid and respond to disasters. Once the DR plan is complete, the application or infrastructure team's DR Coordinator and/or the manager or department head of the application or infrastructure for which the plan is being developed must sign off on the DR plan. Sign-off confirms that the core components are sufficiently documented to enable recovery teams to respond to a disaster affecting the application or infrastructure.
ITO infrastructure support teams recover components such as servers, storage, routers, operating systems, interfacing software, databases, monitoring, etc.
Application owners MUST have properly configured backups scheduled for their application, and are responsible for validating recovery of the application.
Application and Business Owners have contracted appropriate production and recovery systems and maintain current vendor Service Level Agreements.
Recovery plans/tasks are sufficiently documented and available to teams in the event of a business disruption. These may include run books or other procedures that can be accessed during an emergency.
Key personnel and contact methods for support teams are defined.
The DR plan is used by many different teams working together in the recovery effort. Provide enough detail about the application and its associated infrastructure for all of them.
Vital/Critical Dependencies and Resources
Evaluate and cross-reference here any resources (people, tools, documentation, etc.) needed for the recovery of applications and infrastructure, and how to obtain them. Ensure supporting documentation is available off-site.
Hardware, application, and/or service templates required for the recovery of the application
Installation and operations guides required for the recovery of the application
User workaround procedures, Standard Operating Procedures (SOPs), etc.
Architectural/business flow diagrams
Recovery teams, tasks, damage assessment, and roles/responsibilities for applications and infrastructure
Recovery teams need pre-defined tasks (steps) to conduct damage assessment, provide triage, consider mitigation options, and perform recovery operations. Put simply, recovery teams need to know exactly what they have to do. One effective method is a "timeline" document that defines the specific tasks to be performed, when they are to be performed, and by whom. The timeline should include the following:
Planning (define team members, contact information, central communications information, damage assessment, triage, mitigation, coordination with crisis management and other support teams, change control management, etc.)
Core system recovery (recovery team tasks, hardware provisioning, core system recovery, etc.)
Application/database recovery steps (restoration from backup, initial validation, etc.)
Post recovery (final validation, restart of business operations, issues and follow-up, etc.)
Additionally, the timeline captures other (external) tasks that may need to be performed simultaneously, providing a record of events, communications, situation report data, change control tracking, and follow-up issues (for an example of the timeline, see section DR004 in this document).
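The timeline is essentially a simple data structure: each task records its phase, what is to be done, who does it, and when. The Python sketch below is a hypothetical model of it. The `TimelineTask` fields and the `build_timeline` helper are illustrative assumptions, not the DR004 format. The helper orders tasks by the four phases listed above and converts relative start offsets into clock times once the plan is invoked.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The four timeline phases from the template, in execution order.
PHASES = ["Planning", "Core system recovery",
          "Application/database recovery", "Post recovery"]


@dataclass
class TimelineTask:
    phase: str               # one of PHASES
    description: str         # what is to be done
    owner: str               # team or individual responsible (by whom)
    start_offset: timedelta  # when, relative to plan invocation


def build_timeline(tasks, invoked_at: datetime):
    """Return (scheduled start, phase, description, owner) rows,
    ordered by phase and then by start offset."""
    for task in tasks:
        if task.phase not in PHASES:
            raise ValueError(f"unknown phase: {task.phase}")
    ordered = sorted(tasks, key=lambda t: (PHASES.index(t.phase), t.start_offset))
    return [(invoked_at + t.start_offset, t.phase, t.description, t.owner)
            for t in ordered]
```

Because start times are stored as offsets, the same timeline can be reused for a test or a real invocation simply by supplying a different `invoked_at`.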
Alternate Site Planning
Access to facilities is critical to business operations. When situations interrupt those routines (weather, local emergency, power outage, etc.), alternatives must be considered. Technology teams need to plan for these events. Define where employees will go to perform routine operational processes if the primary site is not available, and what access rights they will need there. How are these processes performed? If your primary technician is not available, who will perform the task? What technologies are needed to do the job (VPN permissions, access to file shares, mirrored storage, network redundancy, consoles to gain access to the servers and systems)?
Disaster Recovery Command Center (DRCC)
A DRCC is used by a crisis management team or other business management teams to mitigate the impact of the event, direct recovery efforts, and resume operations.
The facility must be sufficiently staffed and equipped to provide effective incident and communication "command and control" in order to facilitate strategic/tactical decisions and direct communications to those who have a "need to know" (management, recovery teams, customers, and, as needed, the public).
If appropriate, an alternate DRCC may be located away from other business facilities in case a crisis renders the primary DRCC unusable.
Team Awareness and Training
Employees in general, and recovery teams in particular, need the right level of awareness or skill-set training so they understand what to do in response to business interruptions. During an emergency, the routine assignments, roles, and responsibilities of employees and recovery teams are immediately affected (or changed) so that they can focus on preserving life, resolving the incident at hand, and minimizing business interruption and damage to property.
There is a difference between training and awareness. In awareness activities, the learner is the recipient of information, whereas the learner
in a training environment has a more active role.
Awareness relies on reaching broad audiences with attractive packaging techniques. Awareness activities may include (but are not limited to):
How to respond to a disaster-like event (activate or notify crisis management)
How are recovery teams contacted, and where will they report for assignment?
Where is the DR plan located, and how do recovery teams access it?
Training is more formal, with the goal of building the knowledge and skills needed for job performance. Training activities may include (but are not limited to):
Procedures for recovering applications and infrastructure
Performing a tape recovery of applications or databases
Coordination efforts in handling a large outage (communication, triage, task assignments, etc.)
Proficiency increases when lecture, demonstration, and skill-set testing methods are combined. To be effective, training and awareness must be endorsed by management, and may involve one or more of the following methods:
Lecture-based or self-study - Subject matter training targeted at the skill-set proficiency needed for specific recovery team roles, responsibilities, and tasks.
Demonstration - Actual or simulated (hands-on) demonstration of proficiency in specific recovery tasks.
Recovery team proficiency should be validated to gauge the level of training a recovery team has actually achieved. Proficiency may be demonstrated using a combination of lecture Q&A sessions and demonstration results.
Complete the following templates from the Corporate Business Continuity Templates:
Originator: ITO BC/DR team (disaster@ford.com) Page 4 of 30
Confidential
Date Issued: 10/31/2006
Date Revised: 06/12/2013
BCP012 Training and Awareness Plan
BCP012a Training and Awareness Evaluation (Optional)
If your overall Disaster Recovery (DR) plan consists of many disparate documents (e.g., run books, vendor-specific configuration manuals, etc.), these documents should be "published" using appropriate media (paper or electronic). It may be beneficial to use a "front-end" interface that provides user-friendly access to the vital/critical recovery documents (e.g., RoboHelp, a website, Word, Excel, PowerPoint, PDF, etc.).
Security of vital/critical documents is necessary per corporate directives. Documents that contain personal information should include procedures for keeping that information from getting into the wrong hands. Where appropriate, they should be protected per HIPAA and other regulatory laws. Electronic versions should be password protected, and encrypted where appropriate, when stored on portable media such as laptops, USB devices, or CDs/DVDs.
When developing your DR plans, follow corporate documentation and information management standards. Any sensitive information (e.g., personal information, access information, etc.) must be labeled as such (e.g., confidential, proprietary, etc.).
When storing documents off-site and at alternate locations, ensure they are protected from plain view or open access. Do not store DR plans in vehicles. It may be appropriate to use a two-person integrity system to provide separation of duties and to prevent any one person from having full access to sensitive information (e.g., split a password into two sealed envelopes to be opened only by authorized agents).
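A caution on the split-password example: cutting a password into two halves still lets either envelope holder learn half of it. A common alternative is 2-of-2 XOR secret splitting, in which each share on its own reveals nothing except the password's length. This technique is offered as an assumption, not as a stated corporate procedure. The minimal Python sketch below uses the standard `secrets` module; the function names are hypothetical.

```python
import secrets


def split_secret(password: str) -> tuple[str, str]:
    """Split a password into two shares using XOR (2-of-2 secret splitting).

    Share A is random bytes; share B is password XOR share A. Neither share
    alone reveals the password, only its length in bytes.
    """
    data = password.encode("utf-8")
    share_a = secrets.token_bytes(len(data))
    share_b = bytes(x ^ y for x, y in zip(data, share_a))
    # Hex-encode so each share can be printed and sealed in its own envelope.
    return share_a.hex(), share_b.hex()


def join_secret(share_a_hex: str, share_b_hex: str) -> str:
    """Recombine the two shares; both envelopes are required."""
    a = bytes.fromhex(share_a_hex)
    b = bytes.fromhex(share_b_hex)
    if len(a) != len(b):
        raise ValueError("shares must be the same length")
    return bytes(x ^ y for x, y in zip(a, b)).decode("utf-8")
```

Each authorized agent keeps one hex share. The password is recovered only when both envelopes are opened and `join_secret` is applied, which preserves the separation of duties described above.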
Final thoughts
The recovery effort often requires restoring applications or rebuilding infrastructure on an alternate host at an alternate site. As a result, some application and infrastructure components (on the application or on the supporting core operating system) will need to be modified before production can be restarted. Examples include host names and IP addresses.
The DR plan development process may initially be very complex and involve participation from several different groups (e.g., management buy-in, end users, customers, and application and infrastructure development, engineering, and support teams). The process will take time to fully mature from initial scope development to final plan approval. As the big picture unfolds (e.g., what is being recovered, and when, how, and why), thorough DR plan testing and peer review will help surface gaps that must be resolved or mitigated.
DR planning is a continual improvement process that involves thorough evaluation of impact, change, and testing. Over time, through testing and real invocations, an effective DR plan will sustain the business during unplanned outages.
1. Overview of the application/infrastructure mission in support of the strategic importance to Ford Motor Company (DR002)
2. Define routine problem management (support and escalation) processes, Business Criticality Assessments/Operational Level Agreements (BCA/OLA), Recovery Time Objectives (RTO), and Recovery Point Objectives (RPO) (DR002)
6. Define recovery teams, recovery tasks, damage assessment, and roles/responsibilities for applications and infrastructure (DR004)
10. Perform annual employee DR training and roles/responsibility awareness (BCP012, BCP012a)
11. Complete annual DR Plan testing, resolution of issues, lessons learned, and update of DR plans (DR007, BCP013, BCP014)
The items in italicized blue text are for information only, and therefore they will need to be written for your specific requirements.
Note: These are NOT all-inclusive. Include anything that you need to recover your application/infrastructure. Consider alternative ways to access the application if the primary means are not available (e.g., manual processes). Evaluate whether the application or infrastructure uses non-standard tools.
The items in italicized blue text are examples only, and therefore they will need to be written for your specific requirements.
DR004 Instructions and item description: Complete one of these high-level documents for each stand-alone application or infrastructure plan.
Enter all tasks and dates required to recover the application/infrastructure, and assign responsibility for each task to the teams or individuals who will perform it. Include change control tickets if needed.
Review the contact information in your BC plan (BCP009 and BCP010) to ensure there have been no changes since the last update. If differences exist, determine the correct information and update it for use during the event. There may be several different contact lists. Contact lists contain information pertaining to recovery teams, subject matter experts, suppliers, customers, or other key contacts as appropriate.
Contact information must be sufficiently comprehensive (alternative contact methods), easily accessible, and regularly updated for immediate use during emergencies
See the Business Continuity Process Guide for more information on BC and templates for contact and supplier information
During a disaster, do not initiate a task until the application/infrastructure team lead indicates that it is OK to do so
Testing Scope: *Insert test scope here, e.g., This test will recreate the steps needed to test the situation where the primary server FCXXXXX is no longer available and production needs to switch to the backup server ECCXXXXX.
Servers/Databases Required For Test Location Operations Owner Shared OS Time Zone
Disaster Recovery Coordinator: John Disaster, Phone: 313-XXX-XXXX, CDSID@ford.com
Application team: Jane Application, Phone: 313-XXX-XXXX, CDSID@ford.com
Business Owner: John Owner, Phone: 313-XXX-XXXX, CDSID@ford.com
Business Tester: Jane Tester, Phone: 313-XXX-XXXX, CDSID@ford.com
DBA: John Database, Phone: 313-XXX-XXXX, CDSID@ford.com
System Administrator: Jane Administrator, Phone: 313-XXX-XXXX, CDSID@ford.com
Network: John Network, Phone: 313-XXX-XXXX, CDSID@ford.com
Instructions:
Do not initiate a task until the Disaster Recovery Coordinator indicates that it is OK to do so during the test.
Insert tasks within the plan indicating when bridge lines are opened and closed, and the required information for those bridge lines.
DR004 can be used as a starting point for DR004a Part 2. Steps should be modified as required to fit the scope of the test.
Steps in black are generally required for all client-server applications. Steps in blue are generally optional.
Step 16: Conduct test plan review meeting and invite all teams required for the test. Phase: Pre-DR test. Responsible: Disaster Recovery Coordinator. When: At least 2 weeks before the DR test.
DR005 Instructions and item description: Complete one of these high-level documents for each stand-alone application or infrastructure plan.
1. Identify the street address and room location of the alternate hardware site. Both Building 6 and FMCC locations are provided as examples. Remove the building that does not apply to your application/infrastructure.
2. Identify the primary and alternate points of contact (POC) for the facility in case the recovery team needs to contact them. Provide primary and alternate methods of contacting them (phone, email, pager; working hours and after hours, as appropriate). These are the people you will need to contact in the event you need physical access to the alternate hardware.
3. Room Type - General room description (small, medium, large, etc.)
4. Comments - Use this space for any additional details not provided elsewhere on this form.
Alternate site considerations
Is the site equipped to support vital and critical operations (redundant or separate power, cooling, telecommunications, and information technology grids)?
Is equipment readily available to deliver, stage, and present to operations? Is it aligned with your OLA?
Alternate site information: See glossary entry for more information
*Note: Some conference rooms may have inactive phone and network outlets. Check with the facilities owner to find out how to activate in an emergency.
Disaster Recovery Distribution Log (DR009): *Annual
The following items are included in your Business Continuity (BC) plan and may be required while executing your Disaster Recovery plan. Please make sure these BC forms are updated per the schedule and are kept offsite with your Disaster Recovery plans.
Contact List/Calling Tree (BCP009): Quarterly
Supplier/Customer Contact List (BCP010): Quarterly
Wallet Cards (BCP011): Annual
As a member of IT, you can conduct the following activities simultaneously for Disaster Recovery and Business Continuity. Please make sure these activities are conducted at least annually.
Training and Awareness (BCP012): *Annual
Call Tree Testing (BCP013, BCP014): *Annual
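Call tree testing checks that everyone on the contact list is actually reached when the tree is activated. The Python sketch below models a calling tree as a hypothetical mapping from each caller to the people they call. It derives the breadth-first notification order and lists roster members the tree never reaches. The names and structure are illustrative assumptions and are not taken from the BCP009 form.

```python
from collections import deque


def notification_order(tree: dict[str, list[str]], root: str) -> list[str]:
    """Return contacts in the order the call tree reaches them,
    level by level (breadth-first), starting from the root caller."""
    order, seen = [], {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        order.append(caller)
        for callee in tree.get(caller, []):
            if callee not in seen:  # skip anyone already called
                seen.add(callee)
                queue.append(callee)
    return order


def unreached(tree: dict[str, list[str]], root: str, roster: list[str]) -> list[str]:
    """Contacts on the roster that the call tree never reaches."""
    reached = set(notification_order(tree, root))
    return sorted(set(roster) - reached)
```

A non-empty `unreached` result points to a gap in the call tree that should be fixed before the annual call tree test.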
DR007 Instructions and item description: Complete one of these high-level documents for each stand-alone application or infrastructure plan.
*Annual = Item should be reviewed at least annually (more often if significant changes occur).
1. Add core components to this list that are specific to your plan where applicable.
2. BC plan templates are here
3. Indicate who is responsible for completing the review and maintenance of this section.
4. Indicate how frequently this section is reviewed (e.g., monthly, quarterly, semi-annually, or annually). Do not wait for the recommended frequency to perform a needed revision.
5. Indicate dates as sections are completed.
Maintenance considerations:
Maintenance cycles should tie to testing cycles (updates to your plans are usually required following a test).
Vital and critical business processes may drive more frequent review and updates to DR plans (e.g., quarterly).
Building access and important team contact information can change frequently. Building privileges, personnel changes, phone numbers, email addresses, and pagers should be reviewed and updated quarterly.
Documents may include contact cards, awareness materials, recovery tasks, run books, web- or network-based documentation, or regulatory references.
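The review frequencies above can be tracked mechanically. The Python sketch below assumes that each maintained item is recorded with a frequency, using the terms from the DR007 instructions, and the date it was last reviewed. It reports which items are past due. The record layout and function names are hypothetical.

```python
import calendar
from datetime import date

# Review frequencies named in the DR007 instructions, expressed in months.
FREQUENCY_MONTHS = {"monthly": 1, "quarterly": 3, "semi-annually": 6, "annually": 12}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month
    (e.g. Aug 31 + 6 months -> Feb 28 or 29)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def overdue_items(items, today: date):
    """items: iterable of (name, frequency, last_reviewed).
    Returns (name, due_date) for every item whose next review date is before today."""
    result = []
    for name, frequency, last_reviewed in items:
        due = add_months(last_reviewed, FREQUENCY_MONTHS[frequency])
        if due < today:
            result.append((name, due))
    return result
```

This check only flags items whose scheduled review date has passed. As the DR007 instructions note, a needed revision should be made as soon as it is needed rather than at the next scheduled date.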
DR008 Instructions and item description: Complete one of these forms for each application/infrastructure DR plan.
1. Indicate the CDSID of the originator/modification owner of the DR Plan
2. Indicate the date (created/revised)
3. Indicate the version (see example in instructions above).
4. Describe the changes made to the current version
5. Indicate the approver of the DR plan