ROADMAP TO CHECKING DATA MIGRATION

Introduction
When management decide to replace an IT system, standing data, account balance data, and maybe
some historic transactions data will be migrated from the old into the new system. If the data is
business-critical, the board will expect assurance that the conversion process will not result in data
errors or other problems once the new system goes live. If there are errors, explanations will be sought
and heads may roll. What should audit do to ensure there are no data errors in the new system? The options include:

(A) Test check data after it has been migrated into the new system
(B) Test check data while it is being migrated into the new system
(C) Review management’s methodology for migrating data and checking the conversion but
without testing the data itself
(D) Do nothing and hope for the best.

Obviously option (D) does not provide the assurance the board would expect. Option (C) may provide
sufficient assurance, depending on how thoroughly management checks the conversion. Option (A)
may go a little further towards providing assurance if all went well, but if errors were discovered after
the system had gone live it might not be straightforward to correct those errors before they cause
problems, plus additional testing would probably be required. This article describes option (B), which
has the two key advantages of providing almost 100% assurance on data integrity AND doing so before
the system goes live. This article can also be used as a source of pointers if you opt for options (A) or
(C): the techniques are valid whoever does the checking.

The key challenges


There is a little more to checking a data conversion than counting records and agreeing control totals.
In practice, there are a number of challenges in checking thousands of data items during a data
migration exercise:

1. grabbing the data out of the old and new systems for comparison
2. differences between the old system and the new system in how data is stored
3. changes to live data during the migration process
4. accuracy of cut-off parameters that determine whether data is being selected for conversion at all.

Each of these challenges will be considered in turn. Additionally, we suggest opportunities to streamline the way data is held and managed in the new system.

CHALLENGE 1: Grabbing the data out of the old and new systems
for comparison
Rather than test checking a random sample of data items, you may find it requires little extra effort, and
provides considerably more assurance, to mimic the entire data migration exercise by downloading
almost entire tables of source data from the old system and the new system and comparing data item by
data item. As this may have to be repeated a number of times for tests and the final conversion, it is
helpful as you go to compile a step-by-step instruction guide describing which tables to download, how
to compare them in MS Excel and any quirks that have to be taken into account. This should ensure
that by the time of the final run, the checking process will be refined to the point where it can be
carried out quickly and without holding up go-live.

It is standard practice to do a number of dummy test runs to ensure IT’s method of transferring data
from the old system to the new works properly. Audit will need to piggyback on these dummy test
runs by downloading data from the old system at the same time as IT, and comparing it to the output of
the new system.

Due to its flexibility and ease of use, MS Excel is a useful repository of data for comparing between the
old and new systems, though it is limited to 65,536 rows. If you want to use Excel to compare tables
with a larger number of rows, you can still do so by checking a sample of less than 100% of the data rows. In the absence of a report generator with a sampling function, and provided key values are effectively random, you could try the crude method of selecting all rows whose key value ends in a given digit, giving roughly a 10% sample. While strictly this may be statistically unsound, the chance of a systematic error affecting only the items chosen or excluded is minimal.
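The last-digit selection described above can be sketched in code; a minimal Python illustration, with invented key values:

```python
# Crude ~10% sample: keep only rows whose key value ends in a chosen
# digit. Assumes the last digit of the key is effectively random.

def last_digit_sample(rows, key_field, digit):
    """Return rows whose key value ends in the given digit."""
    return [r for r in rows if str(r[key_field]).endswith(str(digit))]

# Hypothetical extract from the old system
rows = [{"acct": 10007, "balance": 120.00},
        {"acct": 10013, "balance": 75.50},
        {"acct": 10027, "balance": 33.10}]

sample = last_digit_sample(rows, "acct", 7)
# selects accounts 10007 and 10027
```

Within Excel itself the equivalent is a helper column such as =RIGHT(A3,1)="7", used as a filter.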

As each data table has different dimensions, you will have to compare each table in a separate
worksheet. By importing and sorting data from both the old system and the new system and pasting
them side-by-side into a worksheet for each table, it is possible to compare each data item using
Excel’s comparison formulas along the lines of =IF(A3=W3,"OK","ERROR"), placing the
comparison formulae in a third set of columns, and then using Excel’s database filter functionality to
filter out the OKs and focus on the ERRORs. You can even use the =COUNTIF function to count the
errors. One limitation in this approach is that the old system data rows seldom line up exactly with the
new system data rows first time, which means the =IF formulas do not compare like with like. So each
time the data is downloaded into the worksheets you may first have to manipulate the data rows to
ensure they line up. You may be able to do this using a simple cut and paste macro or, if that proves
too laborious (typically where there were more than about 1,000 records), by using LOOKUP formulas
which reach into separately downloaded Excel tables to either match new records to old records or vice
versa, depending on where you feel the highest risk of error is.
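The LOOKUP-style matching can equally be mimicked outside Excel. A minimal Python sketch (table and field names are invented for illustration):

```python
# Match new-system records back to old-system records by key, then
# compare field by field -- the same logic as the Excel comparison
# columns, but without having to line the rows up by hand.

def compare_tables(old_rows, new_rows, key):
    """Return (key value, field) pairs that differ between extracts."""
    old_by_key = {r[key]: r for r in old_rows}
    errors = []
    for new in new_rows:
        old = old_by_key.get(new[key])
        if old is None:
            errors.append((new[key], "missing from old extract"))
            continue
        for field in old:
            if field != key and old[field] != new.get(field):
                errors.append((new[key], field))
    return errors

# Hypothetical customer extracts from the two systems
old = [{"cust": 1, "name": "ACME LTD"}, {"cust": 2, "name": "BOLT PLC"}]
new = [{"cust": 1, "name": "ACME LTD"}, {"cust": 2, "name": "Bolt Plc"}]

errors = compare_tables(old, new, "cust")
# flags (2, "name"): a case difference between the systems
```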

How do you download the data in the first place? Most applications have in-built functionality to
download in MS Excel format or at least into compatible format such as comma-separated. If possible,
it is preferable to use a database interrogation tool such as Cognos Impromptu because:
•	it allows a clearer view of the actual database tables
•	it allows more precise filters with which to mimic the migration parameters
•	it should allow you to save the resulting data tables in MS Excel format.

BEWARE: Our download tool had a few surprises up its sleeve. Firstly, when exporting downloaded
data to MS Excel it was limited to an earlier version of Excel and therefore only exported 16,384 rows
of data, rather than the maximum 65,536 rows allowed by recent versions of Excel. We got around this
by exporting in dBase format first, and then opening the dBase file in Excel. Secondly, our reporting
tool was unable to interpret some dates on contracts which had begun in the 1800s or which had 999
years to expiry (such unusual dates are not unheard of in our industry). Fortunately, in the comparison
spreadsheets these could be manually identified and corrected.

TIP: If you are going to download most or all tables from the old and new systems into spreadsheets
across several dummy test runs, you will end up with a large number of spreadsheets. A file naming
convention can help avoid data overload. For instance, 2003 08 24 customer addr old system.xls shows
the date of the download and name of the table without having to open the file, and because the date
order is reversed it will stay in date order when viewed in MS Windows Explorer. Another benefit of
the file naming convention is that it allows you to go back and restore data from a previous download if
necessary, and it provides an audit trail of the work done.
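Generating such file names programmatically keeps them consistent across runs; a small Python sketch of the convention (the table name is an example):

```python
from datetime import date

# Build download file names with the date reversed (YYYY MM DD) so
# they sort chronologically by name in Windows Explorer.

def download_filename(run_date, table, system):
    return f"{run_date:%Y %m %d} {table} {system} system.xls"

name = download_filename(date(2003, 8, 24), "customer addr", "old")
# -> "2003 08 24 customer addr old system.xls"
```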

CHALLENGE 2: Differences between the old and new systems in how data is stored
Downloading transaction and standing data table-by-table and comparing them field-by-field should
prove most effective because:
•	Comparing whole tables reduces the chances of missing key data fields in the comparison exercise
•	MS Excel lends itself to table comparisons, and each corresponding table set (ie from old and new systems) can be compared in a single MS Excel worksheet
•	It enables audit to gain a solid understanding of how the data tables are arranged in the new system, which may prove useful when specifying audit and exception reports.

In addition, although the new system may look and feel different from a user perspective and may offer
different features (eg, web-enabled, enhanced reporting capabilities etc), it is likely that its underlying
table structure is similar to that of the old system. If so, comparing data tables between old and new
systems should be reasonably straightforward.

Invariably there will be differences between the old system tables and the new system tables, and you
may have to spend a good deal of time tailoring your extraction reports and comparison spreadsheets to
take account of them. They include:

•	field name differences that make it difficult to work out which fields on the old and new systems should match up;
•	exclusion of some fields/tables that were in the old system. Has any critical data been lost? Can it be stored in another field?
•	inclusion of additional fields/tables in the new system (for instance, to take account of new tax classifications). Have they been populated? How? You may need either to re-create those fields manually, or to mimic their logic with formulas, in order to compare the new fields in the new system to what they would have been had they existed in the old system;
•	literal differences between logically identical fields in the old and new systems, for instance where “Yes” in the old system becomes “True” in the new system; or different data formats, particularly in date fields, which can be numeric, date or alphanumeric. Even text fields may be upper case in one system and not in the other. Again, extraction programs or comparison formulae will need to take account of these differences by translating as necessary. This can get a little more complex where one set of codes on the old system maps to a different set of codes on the new, e.g. if the chart of accounts has changed;
•	a combination of all of the above, i.e. a combination of fields on the old system that maps to a different combination of fields on the new, some of which hold the same data in a different way. If these contain flags, dates and other values that taken together are critical in determining how the application processes the data, it may be necessary to understand how they are used on both the old and the new systems in order to verify that the correct conversion rules have been devised and applied.
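Such translations can be applied through simple mapping tables before the comparison is made. A Python sketch, with invented mappings:

```python
# Translate old-system values into their new-system equivalents before
# comparing, so logically identical fields line up. The mappings below
# are invented examples (flag translation, chart-of-accounts remap).

YES_NO_MAP = {"Yes": "True", "No": "False"}
ACCOUNT_MAP = {"4100": "SALES-UK", "4200": "SALES-EU"}

def translate(row):
    out = dict(row)
    out["active"] = YES_NO_MAP.get(row["active"], row["active"])
    out["account"] = ACCOUNT_MAP.get(row["account"], row["account"])
    out["name"] = row["name"].upper()  # normalise case differences
    return out

old_row = {"active": "Yes", "account": "4100", "name": "Acme Ltd"}
translated = translate(old_row)
# -> {"active": "True", "account": "SALES-UK", "name": "ACME LTD"}
```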

Audit will need to understand field-by-field and table-by-table how the new system data is arranged
and how it differs from the old system. Any knowledge audit does not already possess from past
experience of at least the old system will be in manuals, system files, or inside the heads of IT and
those on the accounting team who have helped specify the new system, and getting that knowledge will
require the usual skills in diplomatic questioning.

BEWARE: If you are using a reporting tool, it may only be able to see those fields that have been
defined for it, and therefore data fields which have not been defined are not available for downloading
into MS Excel. In those instances where it matters, you will need to have the additional fields defined
for the reporting tool, or find other ways to retrieve or mimic that data, or test it manually in the new
system, or take account of the differences when the comparison exercise is done.

TIP: In cases where standing data includes multiple sub-records, such as historic address details or
bank account numbers, the default record will probably be flagged, and this flag can be used by audit to
ensure standing data comparisons compare like-with-like. It should also be the flag used by IT in
specifying the data migration process.
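Filtering on the default flag before comparing might look like the following sketch (field names are assumptions for illustration):

```python
# Keep only the default sub-record for each customer so that the
# old-system and new-system standing-data extracts compare like
# with like. Field names here are illustrative assumptions.

def default_records(rows, key="cust", flag="is_default"):
    """Index the default sub-record for each key value."""
    return {r[key]: r for r in rows if r[flag]}

addresses = [
    {"cust": 1, "addr": "1 Old Lane", "is_default": False},
    {"cust": 1, "addr": "2 New Road", "is_default": True},
    {"cust": 2, "addr": "9 High St",  "is_default": True},
]

defaults = default_records(addresses)
# one default address per customer: "2 New Road" and "9 High St"
```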

CHALLENGE 3: Changes to live data during the migration process


To minimise the number of comparison errors arising from updates to data in the live system, the
download from the old system should occur as close as possible to the migration event, both test runs
and final live run. If this doesn’t occur, audit will risk downloading transaction and standing data
which has undergone change in the live system after having been migrated into the new (dormant)
system, resulting in apparent errors when the data between the two systems is analysed. Time is
wasted on such false positives.

BEWARE: There will be a delay of at least one day between the extraction of live data from the old
system and when it gets loaded into the new system for testing. In order to avoid timing differences,
make sure you capture the data from the live old system as near as possible to the same time it is
captured by your IT department, down to the nearest minute if possible.

CHALLENGE 4: Accuracy of cut-off parameters that determine
whether data is being selected for conversion at all
Old data will need to be brought from the old system into the new system if it will retain business
relevance in the future. If not, audit should question whether the data needs to be migrated at all. On
the other hand, an error in cut-off parameters can result in valid data not being migrated. For instance,
during one of our test runs we discovered the migration logic had excluded all expired contracts, but
had not taken into account the fact that some of these contracts were still live and were being
renegotiated by management.

Transactional data will probably have a reasonably clear cut-off date before which the data is of little
relevance to the business. Audit should be able to mimic these cut-offs without too much difficulty.
Standing data, however, is quite different. Standing data may be old, but it may still be of relevance to
the business going forwards (such as a supplier who has worked with the company for 40 years, but not
recently). The conversion process may use an activity based cut-off, such as the elimination of supplier
standing data where the supplier record has no associated transactional data in the last twelve months,
say. Replicating this sort of selection logic poses a slightly greater challenge. However, as the above example illustrates, it also forms a useful test of whether the users’ requirements, even when not explicitly and unambiguously stated, have been correctly interpreted by the IT department.
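An activity-based cut-off of this kind can be replicated along these lines (a Python sketch; the supplier data and dates are invented):

```python
from datetime import date, timedelta

# Migrate supplier standing data only where the supplier has
# transactional activity within the cut-off window. The data and
# the approximate twelve-month window are illustrative assumptions.

def suppliers_to_migrate(suppliers, transactions, run_date, months=12):
    cutoff = run_date - timedelta(days=months * 30)
    active = {t["supplier"] for t in transactions if t["date"] >= cutoff}
    return [s for s in suppliers if s["id"] in active]

suppliers = [{"id": "S1"}, {"id": "S2"}]
transactions = [
    {"supplier": "S1", "date": date(2002, 10, 1)},   # recent activity
    {"supplier": "S2", "date": date(1998, 3, 15)},   # long dormant
]

selected = suppliers_to_migrate(suppliers, transactions, date(2002, 11, 21))
# only S1 survives the cut-off
```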

Types of comparison differences


The following bullets summarise the generic types of errors that may arise in a data migration exercise.
In most cases the errors necessitate refinements to the download programs or comparison analysis
spreadsheets, but in some cases they represent opportunities to add value to the migration process
through correcting logic errors.

•	data wrongly migrated, or not migrated at all, because of errors in defining the migration logic or in specifying or interpreting user requirements
•	entire fields of data inadvertently overlooked by IT and missed by the migration effort; in our case these stood out clearly in audit’s comparison and were quickly rectified
•	timing differences arising from non-synchronous downloads by audit and the IT department, which is why it is important to capture the data at precisely the same time as those migrating it across to the new system
•	differences in standing data that may not matter enough to delay going live. For example, if the selection criteria are supposed to select only suppliers used in the last year and a few are left out, they can be set up again manually in the new system if it turns out they are needed. Similarly, cosmetic differences such as inconsistent use of capital letters in text fields can be sorted out after go-live.

Opportunities to improve the business


The methodology described in this article ensures that the data in the old system is faithfully
reproduced in the new system. While it does not of course ensure that only valid data is converted, it
does give audit a starting point for identifying errors and inconsistencies for correction.

But by obtaining a detailed understanding of the captured data, how it is held in the accounting
information system and how to access it with the available reporting tools, audit will have acquired a
powerful induction into the new system ahead of other users. This is the time to use this knowledge to
design exception reports to flag up the most important data inconsistencies eg:

•	Data entry error combinations such as incompatible codes. This includes combinations that would have been rejected by the new system’s input validation tests had they been keyed in rather than uploaded from the old system.
•	Reports to seek out suppliers and customers with matching bank account numbers, which could suggest supplier or customer fraud.
•	Identification of particularly slow-paying customers.
•	Evidence of non-cancelled duplicate payments to suppliers.
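The matching-bank-accounts report, for example, reduces to a set intersection; a Python sketch with invented account numbers:

```python
# Flag bank account numbers appearing in both the supplier and the
# customer masterfiles -- a possible fraud indicator worth reviewing.
# Names and account numbers are invented for illustration.

def shared_bank_accounts(suppliers, customers):
    supplier_accts = {s["bank_acct"] for s in suppliers}
    customer_accts = {c["bank_acct"] for c in customers}
    return sorted(supplier_accts & customer_accts)

suppliers = [{"name": "Acme Ltd", "bank_acct": "12345678"},
             {"name": "Bolt Plc", "bank_acct": "87654321"}]
customers = [{"name": "J Smith",  "bank_acct": "12345678"}]

matches = shared_bank_accounts(suppliers, customers)
# -> ["12345678"]
```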

Conclusion
Effectively, the methods described above almost amount to re-performing the data conversion. Given
the large scope for error, and the potential consequences, this can be of value in identifying instances
where requirements may have been misunderstood.

As is often the case in audit, the devil is in the detail. There is no substitute for understanding the table
structures of both the old system and the new system. Investing the time in learning the table structure
rapidly repays itself in identifying errors, and suggests ongoing exception reports which can improve
the corporate control environment on a day to day basis.

Chris Kelly FCA (Aust) B.Com is Associate Consultant specialising in internal audit with Resources
Connection Ltd. He has 13 years’ operational and financial experience.

Chris Nelms, FCA, M.Sc, B.Sc (Econ), CISA is the Systems Accountant (and formerly Computer Audit
Manager) at property company MEPC Ltd. He has 19 years’ experience of computer audit and writes
regularly for computer audit journals.
