
Data Recovery Best Practices

Building a responsible backup and recovery system for your databases

By Stephen Wynkoop

Microsoft SQL Server MVP
Founder, The SQL Server Worldwide User’s Group

WHITE PAPER
Data Recovery Best Practices – White Paper

Table of Contents

Introduction

Why Backup is Necessary
    Full Database Recovery and Restore
    Point-in-Time Recovery
    Specific Transaction Recovery

Disaster Planning and Recovery
    How Much Data Can You Afford to Lose?
    About Transaction Logs and Keeping Historical Backups
    Optimize Availability
    Plan for the Failure, Don’t Fail to Plan
    Pointers to Keep in Mind for the Restoration Process Planning

Disk to Disk = Best Practices

Being Prepared for Recovery – The Backup Process

Summary/Conclusion

Copyright © 2003 Lumigent Technologies, All Rights Reserved

Introduction
When people think about Data Recovery, they think largely about backups: the act of backing
up the database and associated files, and the process of restoring those files to the server.
But without a solid plan that covers how the recovery process is set up, tested and executed,
you can quickly get into trouble.

Planning for data recovery is more than just making sure your database is backed up. You
need to understand how the process works, you need to have the right tools in place, and
you need to have practice in using those tools. When the time comes to restore information
to your production systems, you won’t want to be learning about how things work; you’ll want
to get the job done as quickly as possible.

There are many different components to a competent backup and recovery plan. In addition,
there are many types of recovery plans available. Each of these different approaches may suit
what you need for different types of issues that arise. You need to understand and plan for
the differences between a full system restore and a point-in-time recovery. At the most precise
level, you may even need to recover a specific transaction or data element. As you can imagine,
understanding each of these, and how to execute on them, is critical to managing your data
resources.

In this white paper, we’ll explain each of these items, talk about what they mean and how they
apply. We’ll also provide key planning points, and investigate how some different tools can help
you accomplish these tasks.

Why Backup is Necessary


Backup provides you a recovery avenue when things go wrong. Hard drives fail, connections
between systems fail and have to be restored, people make mistakes, all causing the need
to recover at different levels.

Note
Backup processes and planning often revolve around the unsettling question of “how much
can you afford to lose.” This is because you need to determine how often you back up
the transaction logs and databases, while at the same time paying attention to
disk and/or tape space constraints. In addition, you’ll need to decide how you store
backups, how many days of backups you retain and, lastly, whether you want to maintain a
subset of your backups off-site.

Remember, in the worst possible scenario, if your backups are stored right next to your
computer and there is a fire, the backups will go up in smoke too, right along with your
computer. It’s important to have at least a skeleton off-site storage plan.

Keep in mind that responsible planning and management of your systems includes more
than just backing up to a device and then restoring the database should systems fail. There
are really three different types of recoveries you may be faced with, and several shades of gray
between each of these. The major restore options are explained in the next three sections.


FULL DATABASE RECOVERY AND RESTORE


Full database backup and restore is what many people think of when they consider their backup
strategy, and it’s the most drastic recovery path. This requires that you restore the most
recent full database backup, and then apply, in order, all transaction logs that were backed
up after that backup was taken. At the end of the process, your database will be in the state
it was in as of the last transaction log backup. Your data loss in this scenario amounts to
whatever changed after the most recent transaction log backup.
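
The restore order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Backup` record, the "full"/"log" labels and the function names are invented for the example and are not part of SQL Server or any backup tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "log" (labels invented for this sketch)
    finished: datetime   # when the backup completed


def restore_chain(backups):
    """Return the latest full backup followed by every later log backup, oldest first."""
    fulls = [b for b in backups if b.kind == "full"]
    if not fulls:
        raise ValueError("no full database backup available")
    latest_full = max(fulls, key=lambda b: b.finished)
    logs = sorted(
        (b for b in backups if b.kind == "log" and b.finished > latest_full.finished),
        key=lambda b: b.finished,
    )
    return [latest_full] + logs


def exposure(chain, failure_time):
    """Time between the last backup in the chain and the failure: the work that is lost."""
    return failure_time - chain[-1].finished
```

With a 3AM full backup and log backups at 4AM and 5AM, a failure at 5:40AM leaves 40 minutes of changes that a full recovery cannot bring back.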

POINT-IN-TIME RECOVERY
Point-in-Time Recovery lets you recover, typically using transaction logs, to a specific time
when you know the data was valid. You usually need it when you’ve discovered data issues after
some time has passed. It means restoring the most recent backup taken before the problem
occurred, then applying transaction logs up to just before the time when you know the data
began to have issues. This lets you restore to a known good point in time. You can also perform
differential database backups; these back up just the changes made since the last full database
backup, which shortens the chain of transaction logs you need to apply.
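
As a rough sketch of what such a restore looks like in practice, the Python helper below assembles the T-SQL for a point-in-time restore. `RESTORE DATABASE`, `RESTORE LOG`, `NORECOVERY`, `RECOVERY` and `STOPAT` are SQL Server's own statements and options; the function name, database name and file paths are hypothetical, and the strings are built by plain formatting with no escaping, so treat it as an illustration rather than a production script generator.

```python
from datetime import datetime


def point_in_time_script(database, full_path, logs, stop_at):
    """Build T-SQL that restores `database` to the moment `stop_at`.

    `logs` is a list of (finished, path) tuples for the log backups taken
    after the full backup in `full_path`; all names here are hypothetical.
    """
    needed = []
    for finished, path in sorted(logs):
        needed.append(path)
        if finished >= stop_at:      # this log backup covers the target time
            break
    else:
        raise ValueError("no log backup reaches the requested point in time")

    stamp = stop_at.strftime("%Y-%m-%dT%H:%M:%S")
    lines = [f"RESTORE DATABASE [{database}] FROM DISK = N'{full_path}' WITH NORECOVERY;"]
    # Every log except the last is applied in full and leaves the database restoring.
    for path in needed[:-1]:
        lines.append(f"RESTORE LOG [{database}] FROM DISK = N'{path}' WITH NORECOVERY;")
    # The last log is applied only up to the target time, then the database is recovered.
    lines.append(
        f"RESTORE LOG [{database}] FROM DISK = N'{needed[-1]}' "
        f"WITH STOPAT = '{stamp}', RECOVERY;"
    )
    return "\n".join(lines)
```

The generated script should always be reviewed, and ideally rehearsed against a second server, before it is run against production.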

SPECIFIC TRANSACTION RECOVERY


Transaction-based recovery is typically done in one of two different ways. First, your application
can manage transactions in code by beginning a transaction, doing a bit of work, and
then committing the work to the database. If the transaction fails,
it can be rolled back, putting the information in the tables into the same state it was in
when the transaction started. In addition, if the server is forced to restart during the
transaction, SQL Server rolls back the uncommitted transaction, putting the database into a
known state: the values in the database at the time the transaction started.
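
A minimal sketch of application-managed transactions follows. It uses Python's built-in `sqlite3` module as a stand-in so the example runs anywhere; it is not SQL Server, and the `accounts` table and `transfer` function are invented for illustration. The commit-or-roll-back pattern is the same idea.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as a single unit of work."""
    try:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        conn.commit()        # make both updates permanent together
    except Exception:
        conn.rollback()      # undo both updates; the tables return to their prior state
        raise


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    try:
        transfer(conn, 1, 2, 500)    # fails: account 1 only holds 100
    except ValueError:
        pass
    return conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
```

After the failed transfer, `demo()` reports the original balances, because neither update survived the rollback.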

Second, you can roll back specific transactions (either literal transactions or merely changes
to the data in the database) using third party tools. Lumigent’s Log Explorer product will let
you peruse data changes, along with a whole host of information about those changes,
including who made the change and what the value was before the change. From this
information, the tool will allow you to restore specific values, in essence rolling back data
modifications, even without the benefit of transactions.

Disaster Planning and Recovery


Disaster planning must take into account the types of recovery you want and need to support.
You need to have a written plan, and you need to test the plan to make sure it addresses
the different facets of any restore process. Remember, you won’t control when the process
is needed. Your plan should spell out how the process is done, what the expected
outcome will be, and how these processes are supported up to the moment the recovery
efforts need to begin.

What follows are some guidelines to thinking through your plan.


HOW MUCH DATA CAN YOU AFFORD TO LOSE?


As mentioned above, this is perhaps the most telling question you need to be sure you can
answer. If you can’t lose a single transaction or a single change, your disaster planning and
recovery efforts will need to include fail-over systems. This means you’ll be looking into
clustering solutions, and you’ll be working with hot stand-by systems and real-time replication
and archival solutions. These tend to be expensive, so depending on your budget,
“no data loss whatsoever” may not be realistic.

That said, and assuming that you’re not looking into a clustered solution, you’ll need to know
how much data you have in the actual database(s) you’re backing up, and you’ll need to know
what size the transaction logs get to as the database is used.

One of the most common approaches to backups, and one which limits the data loss window to a
maximum of one hour, is to back up the database nightly and the transaction logs hourly.
Typically, you’ll set up SQL Server to keep a specific number of days’ worth of backups as
an archive. When you set up this type of backup structure, you’ll tell SQL Server “Keep 14 days
of backups, back up the database each morning at 3AM and the transaction logs every hour
at all other times.”

Keep in mind that, if you’re using this approach, you need disk space (or tape, if you’re
backing up directly to tape) equal to more than 14 times the size of your database,
since you’ll be keeping 14 archival copies in the queue. In addition, you need to plan for
enough space to hold the transaction log dumps taken between those backups. The size of
transaction log dumps varies widely and depends entirely on the volume of changes processed
by SQL Server.
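
The arithmetic can be sketched as follows. The figures are hypothetical, and the sketch assumes 23 log backups per day (every hour except the 3AM full backup) and that log backups are retained for as long as the full backups; measure your own backup sizes before relying on any estimate.

```python
def backup_space_gb(full_backup_gb, avg_log_backup_gb,
                    retention_days=14, log_backups_per_day=23):
    """Rough disk space needed to hold every retained backup, in GB."""
    full_total = full_backup_gb * retention_days
    log_total = avg_log_backup_gb * log_backups_per_day * retention_days
    return full_total + log_total


# Hypothetical figures: a 20 GB full backup and log backups averaging 0.5 GB.
estimate = backup_space_gb(20, 0.5)
print(f"Plan for at least {estimate:.0f} GB of backup storage")
```

Here the full backups account for 280 GB and the log backups for 161 GB, so the log dumps are far from a rounding error on a busy system.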

ABOUT TRANSACTION LOGS AND KEEPING HISTORICAL BACKUPS


Many people make the mistake of thinking that as long as they have several days of backups,
they can restore to any point in time during those several days. It can be a painful lesson to
learn that this may not be the case, depending on your archive solution. Consider the following
backup policy:

• Nightly backups
• Hourly transaction log dumps
• Database backups are kept online for five days, then archived to a secondary source
• Transaction logs are rotated to keep the most recent 24 hours available

At first glance, this is great. You can recover to the last database backup, then apply the
transaction logs to recover beyond that to the current state, or any time in between. If your
system fails, and you recognize the failure within 24 hours of the last database backup, you’re
correct in saying that you’re covered.

Keep in mind, though, that if you have the possibility of needing to restore further back than
that last database backup, you will be faced with data loss.

This situation comes from the fact that you’ll restore the database from three days ago (as an
example), which would be available online. But if you follow the history configuration for the
transaction logs, you’ll find that the transaction logs are only available for the last 24 hours.


This would mean you wouldn’t be able to move forward beyond that three-day old backup.
You’d be restoring to that point and no further in the database.

Keep this in mind as you architect your recovery solution. You need to consider your transaction
log rotation schedule in addition to your backup rotation schedule. It all goes back to “how
much data can you lose” and how far back you need to be able to recover
that data. If the answer is that you need to be able to restore to any point in time during that
five-day window (from our example of five days of online backup storage), you’ll also need to
store five days of hourly transaction logs.
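
The interaction between the two rotation schedules can be checked with a small sketch like the one below. It is a simplified model with invented names: it assumes the retained transaction logs form an unbroken chain covering everything newer than `now - log_retention`, and nothing older.

```python
from datetime import datetime, timedelta


def can_restore_to(target, now, full_backups, full_retention, log_retention):
    """True if `target` is reachable from a retained full backup plus retained logs."""
    oldest_full = now - full_retention
    oldest_log = now - log_retention
    candidates = [f for f in full_backups if oldest_full <= f <= target]
    if not candidates:
        return False
    base = max(candidates)          # latest full backup at or before the target
    if target == base:
        return True                 # the backup itself is a valid restore point
    return base >= oldest_log       # logs must cover everything after `base`
```

Under the example policy (five days of nightly backups, 24 hours of logs), a target three days back fails this check even though a full backup from before it is still online; extending log retention to five days makes it pass.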

OPTIMIZE AVAILABILITY
When you’re building out your plan, be sure to consider the impact on your users and those
dependent on access to the database. If you’re in a situation that requires access at all times
(financial applications are an example of this), you’ll want to look not only at a recovery plan,
but also a failover plan.

Failover will protect you in cases where a hard drive fails, or other instances where the server
goes offline, taking your database systems with it. Failover typically includes clustered server
capabilities, where you have more than one server working against a given set of data. If one
server does fail, the other server is able to pick up where the failing server left off and the user
experience is largely unaffected by the downtime.

Note
In a clustered environment, if a failover situation does occur, the application working
against the database may need to be restarted to “see” the recovery server. Typically this
is merely a restart of the application, or a reconnection to the web site or other resource
working with your SQL Server. The important point here is that your recovery plan in a
clustered environment should include several phases:

• Bring the applications back online against the recovery server(s).
• Take the server that is down and/or experiencing trouble offline.
• Correct the issue with the original server.
• Bring the original server back into the cluster to begin supporting the cluster again.

On the other hand, if you don’t need full access to the server at all times,
you can work out your plan so you know exactly what you need to do to recover your
system, how to get people working again in the shortest period of time, and how to address
problems that may arise during that process.

PLAN FOR THE FAILURE, DON’T FAIL TO PLAN


Executing on your plans will be key – below you’ll find different things you’ll need to consider
and work through as you design your recovery plans.


Backup Procedure Checks

• Are they working?
  - Check your scheduled task’s history entries.
  - Check the backup directory for the related database and transaction log dump files.
• Are they archiving appropriate numbers of past copies of the backups?
  - Check the directory for past copies of the database and transaction log dump files.
    If you’re expecting a rotation of files, perhaps several days’ worth or more,
    make sure they’re in the directory.
• Are the transaction logs backing up on time?
  - Check the job history.
  - Check the directory that is used for the backups; make sure the transaction log dumps
    are there.

Tip
When you review the backup file sizes, if you see that your transaction log dump files are
rather large, you may want to consider shortening the time between transaction log backups.
Remember, in the case of a restore, you’ll be restoring the database, then the
transaction logs to get caught up. Large transaction log dumps can mean that you
are running a large number of transactions, which in turn means a large number of
transactions are at risk of being lost if a failure occurs between log backups.

• If you’re using SQL LiteSpeed, try running LiteSpeed with the debug option turned on.
This will enable you to see the various messages as the backups are performed. You’ll
need to manually run the backups to be able to review/see these messages. Alternatively,
you can have the output of the backup operations directed to a log file, external to
SQL Server. You can then review this log file for any issues that may arise. For more
information, read about the @logfile option with LiteSpeed.
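
Some of the directory checks above lend themselves to a small script. The sketch below works on lists of backup timestamps, which in practice might be gathered from file modification times in the backup directory; the function names, the one-hour interval and the ten-minute tolerance are assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_log_gaps(log_times, expected_interval=timedelta(hours=1),
                  tolerance=timedelta(minutes=10)):
    """Return (previous, next) pairs where consecutive log backups are too far apart."""
    ordered = sorted(log_times)
    limit = expected_interval + tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


def missing_daily_backups(full_times, today, days_expected):
    """Return the dates in the retention window that have no full backup."""
    have = {t.date() for t in full_times}
    wanted = [today - timedelta(days=n) for n in range(days_expected)]
    return sorted(d for d in wanted if d not in have)
```

If the nightly full backup takes the place of one hourly log backup, that slot will show up as a gap; widen the tolerance or filter it out if that is expected on your schedule.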

Perhaps the most important check is whether your backup files can be restored. It sounds silly,
but a large number of people can attest that they thought they were
successfully backing up and were protected from disaster. When it came time to recover and
restore their files from backup, they found that they didn’t know how (didn’t know the
commands), the backup files were either missing or corrupt, or they couldn’t find the correct
hardware/software combination to get the files back onto the server for restoration. (This last
point pertains largely to tape backup systems.)

Once you have your backup files, you need to make absolutely certain they are valid, that
you know how to restore them, and that the restoration process is documented. Remember,
if you’re encrypting or password protecting your backups, the password should be stored
somewhere safe, but somewhere the right person knows how to get to it. If you’re away on
vacation and the system must be restored, there should be a procedure that can be followed
to complete the restoration, complete with passwords.


Keep in mind that just because you may not be taking vacations, this doesn’t mean you don’t
need a plan. When things go wrong, the last thing you want to be doing is trying to remember
the steps you need to follow to get your systems back online. Take the time now to write out
the steps…then practice them.

HERE ARE SOME POINTERS TO KEEP IN MIND FOR THE RESTORATION PROCESS PLANNING:
• Have a written plan with steps to follow for the restoration and recovery process. One very
important thought on this topic has surfaced given the recent mass power outage in New
York City and the surrounding areas. Consider that, if you were the DBA and the phones
and many transportation systems were out of commission, you would quickly see that you
can’t count on getting back to the office to address issues. While this is extreme, it does
point out that whoever happens to be in the office at the time a critical
issue arises needs to be able to address that issue. You need to have a written plan.
• Try performing your restores against a second server. Make sure you know the process and
that you’ve gone through the steps of restoring the database, checking user permissions,
applying transaction logs.
• If you’re working in a clustered environment, run through a test with a failed node. Note,
of course, that unless you have an extra clustered environment this can be tricky relative to
downtime. Make sure you have a planned maintenance window and that you’re prepared
for issues that may arise. While this will take some meticulous planning to avoid
complications, all the planning and studying to understand the failover technologies will pay
off, not just in the dry run, but in the real thing when the knowledge is needed most.

Disk to Disk = Best Practices


You have several options when considering the actual approach to backing up your system,
especially as it relates to how you’ll store the backups, how you make them available for
restores, and how you archive those backups. Typically, you can expect your backups to be
needed for a restoration process within a reasonably short time. This is because backups are
used to recover a system after a system failure – not to “go back in time” to see data. This is
an important distinction because you’ll want to make sure your most recent backups are both
the most protected and the most readily available.

As a general rule of thumb, you’ll find that disk-to-disk backup is a much better solution than
tape-based alternatives when it comes to recovery options and processes. Some of the
benefits of this approach include:

• Speed – with no tape transfer process to work with, you can access your database and
transaction log backups immediately, providing a much faster path to recovery.
• Additional recovery options – you can use products like Lumigent’s Log Explorer to work
with the transaction logs, making transaction and specific data element recovery possible.
This may be possible with tape backup, but would force a restore to your server or other
location.
• More reliable data storage medium – since you’re backing up to disk, you stand a better
chance of not having the media go “bad” for your backups. That said, of course, make sure
you’re backing up your backup devices, just in case. Keep in mind too that the “Acts of God”
issues still remain: if you’re backing up to a disk on the same server that has your SQL
Server, or you’re backing up to another server physically located near your SQL Server, you
can still be in danger of not being able to recover from fire or other catastrophic disaster.
For this reason, it’s good to keep archive copies (perhaps weekly, for example) off-site as a
last-step recovery mechanism.

By backing up to disk, and keeping those backups online and available, you are able to use
world class tools to quickly provide recovery options. Time is of the essence when you’re
working to bring systems or data elements back online. Backing up to tape requires locating
the tape and then restoring it to your server, both of which take time and introduce variables
that can stand in the way of a successful recovery.

If given a choice, it’s always a better solution to back up to disk.

The table below shows some examples and recovery approaches you can employ with this type
of system in place, based on the scenario you’re facing.

Scenario                                  Recovery approach

Recover a database                        Restore the database, then restore the logs, in order,
                                          starting from the point in time of the last database
                                          backup. The resulting system will include all updates
                                          up to the time of the most recent transaction log
                                          backup.

                                          If you only want to recover to a specific point in
                                          time, determine which log file falls closest before
                                          your target time. Restore the database, then restore
                                          the log files up to that point.

Recover a specific data element change    Using Log Explorer, you can review the transaction
                                          logs, locate the change that was in error and restore
                                          the data to the value prior to the change.

Recover a dropped table                   Restore your database and log files to a new,
                                          temporary database. From this database, you can copy
                                          the lost table back to the production database.

                                          Alternate solution: use Lumigent’s Log Explorer
                                          product to recover the lost table. Recovery is
                                          possible for DROPped or TRUNCATEd tables, depending
                                          on your transaction logs.

Being Prepared for Recovery – The Backup Process
By utilizing disk-based backup procedures, you can optimize your responsiveness and available
uptime to support the recovery methods you’ll need. By using the right tools, you will have a
full circle of options when it comes to restoring and recovering from system and database issues.

How you back up your information is just as important as having the tools and
knowledge available to you to recover your data. Backing up your data with tools or technologies
that can become faulty or cause delays in your recovery cycles is simply not good practice.

A very significant tool you can use to optimize your system, on both the backup and recovery
sides of the equation, is the SQL LiteSpeed product from DBassociatesIT. The product offers
fast, non-CPU-intensive, encrypted and compressed backups. One objection to backing up
to disk has been the amount of disk space required to support a solid recovery model. With
LiteSpeed’s compression technologies, you won’t have to use third-party archive and
compression utilities, and you can save drastically on the disk space you need to store and
manage your database and transaction log backups.

LiteSpeed runs just like the native backup routines in SQL Server, and its syntax is nearly
identical to the native backup options, apart from a few new commands. In addition, you can
address the security issues associated with traditional backups by encrypting your database and
transaction log backups with true encryption that protects the whole of your backup set.

To be best prepared, set up a backup server with a good amount of disk space and use it as the
destination for your backups. Don’t store the backups on the same drive as your databases;
that arrangement provides no recovery path when the disk fails.

Summary/Conclusion

There is much to consider as you build out your backup, restore and recovery plans. It’s more
than the ability to simply restore your database; you need to manage the recovery options and
make sure every option is available to you.

Be sure to write out your plan. Test the plan, practice the plan, and make sure others who
may be in contact with the servers in your absence are also aware of and familiar with your
plans. While restoration of a single point-in-time transaction isn’t something you need to train
everyone on, you should consider training on full system restores, transaction log restores
and how to work with the backup media you use.

Use 3rd party tools as appropriate to make sure your systems are both optimized and providing
the highest level of functionality you need. Having too many options is just not possible
when the users are screaming, the boss is sweating and you’re in the hot seat to get things
right again with your database server.

IF YOU’RE INTERESTED IN MORE INFORMATION ON EITHER OF THE PRODUCTS MENTIONED, YOU CAN VISIT:
DBassociatesIT, SQL LiteSpeed, http://www.dbassociatesit.com
Lumigent Technologies, Log Explorer, http://www.lumigent.com

About Stephen Wynkoop

Stephen Wynkoop is the founder of The SQL Server Worldwide Users Group (www.sswug.org), where
he writes a daily database column and newsletter, and is a Microsoft SQL Server MVP. Stephen is
a best-selling SQL Server author and a well-known speaker at technical conferences. Stephen
first started working with SQL Server when it was first introduced in 1993 and has worked with
SQL Server ever since. In addition, Stephen has authored online and offline columns, books, and
other references on Office Development Technologies, web site design and deployment
technologies and Microsoft Access. To contact Stephen, email swynk@bitonthewire.com.

Lumigent Technologies, Inc. is the leading provider of data integrity solutions for businesses
that need to manage the integrity of their data across the enterprise. Lumigent solutions
provide organizations with unprecedented insight into how their data assets are being used, in
order to address risk reduction, compliance, security, and operations challenges.

Lumigent Technologies, Inc.
289 Great Road
Acton, MA 01720 USA
Toll Free 1 866-LUMIGENT (1 866-586-4436)
Phone +1 978-206-3700
E-mail info@lumigent.com
www.lumigent.com

Copyright © 2003 Lumigent Technologies, Inc. All rights reserved. Lumigent, the Lumigent logo
and Log Explorer are registered trademarks or trademarks of Lumigent Technologies, Inc. All
other names and marks are property of their respective owners.
