
EXCEL CLEANUP GUIDE

Overview

This guide covers:

1. Steps for preparing an Excel file for Excel Analytics, Power BI, or
ACL import

2. Steps in ACL to import a prepared Excel file

This guide will help you to:

 Define and understand the benefits of proper “Data Preparation”.


 Determine if you can “prepare” the Excel file yourself, or need help
from a data conversion specialist (expert).
 Perform the basic steps to prepare an Excel file for Excel Analytics,
Power BI, or for import into ACL.
 Perform the steps to import a prepared Excel file into ACL.

Appendices will help you to:

 Prepare Excel data which contains a single file spanning multiple


tabs/spreadsheets/workbooks.
 Understand the characteristics of relatively clean, “do-it-yourself” Excel
files vs. difficult/messy Excel files (and non-Excel files) requiring
assistance from a data preparation/conversion specialist (expert).
 Perform completeness testing before and after data preparation for
Excel Analytics or ACL conversion.

Definition of properly “prepared” data

Bottom line – A properly prepared Excel file is ready for Excel Analytics,
Power BI, or ready for import into ACL when it has the following
characteristics:

 Row-1 contains mnemonic (*) column names.


 Row-2 and below contain a solid block of data with no headings,
breaks, subtotals, totals, blank lines or columns, or any other
non-data elements.
 Numeric and date information are formatted as numbers and dates
respectively, and everything else is text.
 Simple numeric formatting, e.g., 1234.56 or -1234.56, and date
formatting, e.g., 03/05/2015 are used.

© 2018 For information contact Deloitte Touche Tohmatsu Limited


 The columns are sized to fit the widest occurrence of data, and all
non-numeric, and non-date information is left justified.
 Text is in one case, normally uppercase.

(*) Mnemonic column names refer to names which are as short as


possible yet recognizable for what they represent. For example, “INVNO”
could represent an invoice number.
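
The layout characteristics above can be checked mechanically. The following is a minimal sketch in Python, assuming the sheet has already been read into a list of rows (each row a list of cell values, with None or an empty string for an empty cell). The function name find_layout_issues is hypothetical; it is not part of Excel, Excel Analytics, or ACL.

```python
def find_layout_issues(rows):
    """Return a list of layout problems found in `rows`.

    rows[0] is the header row (Row-1); rows[1:] is the data block
    (Row-2 and below).
    """
    def is_blank(cell):
        return cell is None or str(cell).strip() == ""

    if not rows:
        return ["sheet is empty"]

    issues = []
    header = rows[0]

    # Row-1 must name every column.
    for col, name in enumerate(header, start=1):
        if is_blank(name):
            issues.append(f"column {col} has no name in row 1")

    # The data block must not contain blank or ragged rows.
    width = len(header)
    for row_no, row in enumerate(rows[1:], start=2):
        if all(is_blank(cell) for cell in row):
            issues.append(f"row {row_no} is blank")
        elif len(row) != width:
            issues.append(f"row {row_no} has {len(row)} cells, expected {width}")

    # A column is blank when no data row has a value in it.
    for col in range(width):
        if all(col >= len(row) or is_blank(row[col]) for row in rows[1:]):
            issues.append(f"column {col + 1} contains no data")
    return issues


messy = [
    ["INVNO", "AMOUNT", ""],
    ["1001", 10.0, None],
    [None, None, None],
    ["1002", 5.5, None],
]
print(find_layout_issues(messy))
```

An empty result means none of these particular problems was found; it does not check formatting, case, or subtotal rows.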

Benefits of properly prepared data…

 Data prepared in a common, generally accepted format sets the
expectation that the data has a standard, predictable look, feel and
behavior.
 Our tools work with properly prepared data in a predictable
manner. (For example: if a “City” column contains uppercase,
lowercase, and proper case data, generating totals by city will
produce a separate total for each case variation, i.e., for LONDON,
london, and London, rather than one total per city.)
 Certain Excel functionality, such as VLookup, may not function
properly if data is not left justified, i.e., if it contains leading spaces.
 It is easier to teach analytics using Excel Analytics, Power BI, ACL,
and other visualization tools if people adopt a generally accepted
data preparation standard.
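
The “City” example can be reproduced in a few lines. This is an illustrative Python sketch with made-up amounts, not the behavior of any particular tool; total_by_city is a hypothetical helper name.

```python
from collections import defaultdict


def total_by_city(records):
    """Sum amounts per city, keyed on the city text exactly as written."""
    totals = defaultdict(float)
    for city, amount in records:
        totals[city] += amount
    return dict(totals)


records = [
    ("LONDON", 100.0),
    ("london", 50.0),
    ("London", 25.0),
    ("LEEDS", 10.0),
]

# Mixed case: one total per spelling variation.
raw = total_by_city(records)

# One case (uppercase here): one total per city.
clean = total_by_city((city.upper(), amount) for city, amount in records)

print(raw)
print(clean)
```

With mixed case, London appears three times in the result; after converting to one case it appears once.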

Can I do this myself? Or, will a data preparation specialist


(expert) be required?

The bottom line –

This guide will take you through the basic steps to properly prepare a
relatively clean Excel file. (See Appendix-Y to learn about characteristics
of relatively clean vs. “messy” Excel files)

Many financial reporting systems create Excel files. However, data being
in Excel doesn’t necessarily mean you’ll have an easy time preparing the
file for Excel Analytics, Power BI, or for ACL. The data may be structured
in a difficult-to-use report-like format, or have other data issues.

If you cannot properly prepare an Excel file in 10 to 20 minutes, you may


want to consider using a specialist (expert) to perform this task for you.

Oftentimes, Excel files are not an option. Data can come in a variety of
non-Excel formats such as PDF, text files, text reports, delimited files
(CSV), Word documents, Access databases, etc.



Whether you have a messy/hard-to-prepare Excel file, or any one of the
aforementioned non-Excel file formats, there is hope!

Excel data preparation – Summary overview

The following steps provide a brief overview of the data preparation
required for Excel Analytics, for advanced Excel usage (for example,
VLookup or pivot tables), and for importing Excel data into ACL.
Detailed instructions follow this high-level overview:

1. Establish column names in Row 1

2. Ensure data begins in Row 2

3. Eliminate non-data items, such as blank rows, blank columns,
page-breaks, headings/titles, subtotals, totals, etc.

4. Format numbers 0.00/-0.00 and dates DD/MM/YY

5. Format text columns (left justification, consistent case, etc.).

6. Save the spreadsheet

Excel data preparation – Detailed step-by-step instructions

Step 1 — Establish column names (Fieldnames) in row-1

Row-1 in Excel will become the fieldnames used by Excel Analytics and
ACL, if the spreadsheet is subsequently imported to ACL. Whether or not
ACL will be used, it makes sense to use ACL’s rules for fieldnames:

 The first character in the name must be alpha (A-Z)


 The rest of the fieldname can contain 0-9, A-Z or an underscore ( _ )
 No spaces or special characters like %, &, ©, $, or # are allowed.

Make column names mnemonic. Mnemonic means “as short as possible


yet recognizable”. Examples include DOCNO, CUSTNO, DESC, INV_DATE,
AMOUNT, ACCT, and PART_NO.
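
The three fieldname rules can be expressed as a single pattern. The sketch below is an assumption-laden illustration in Python: is_valid_fieldname is a hypothetical helper, and it accepts lowercase letters as well as A-Z, which the rules above do not state either way.

```python
import re

# First character a letter, then letters, digits or underscores only.
FIELDNAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_fieldname(name):
    """True when `name` follows the fieldname rules listed above."""
    return FIELDNAME_PATTERN.fullmatch(name) is not None


for candidate in ["INV_DATE", "PART_NO", "2ND_QTR", "Cust #", "AMOUNT$"]:
    print(candidate, is_valid_fieldname(candidate))
```

Names that start with a digit, contain a space, or contain a special character are rejected.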

Step 2 — Ensure the first row of data appears in row 2,


immediately after the row containing fieldnames.



Step 3 — Eliminate non-data items, such as blank rows, columns,
page-breaks, headings/titles, subtotals, totals, etc.

Below the column names in row-1, the data should begin in row-2, and
be contiguous with no blank rows or blank columns, subtotals or totals or
any non-data such as page breaks or report headings.

Helpful tips:

1. Use the Excel Analytics sheet checker. Click the sheet checker
to eliminate blank rows and columns, totals and subtotals. It may
not fix everything, but it will put you several steps ahead. If you
have page headers interspersed throughout your data, the sheet
checker will not clean these up.

2. Consider sorting on a column with predictable content.


Example: Your spreadsheet looks like a report and contains page
headers, page breaks, titles, column names, blank lines, dashed
lines, totals and subtotals repeatedly throughout. You notice column
“C” is a customer number ranging from 000000 to 999999 and
every row which contains an invoice has a customer number in
column “C”. In column “C”, on the “discard-lines” containing report
headings, totals, subtotals, blank lines, etc., there are no 0-9
values. By sorting on column “C”, all the usable invoice data will
fall into a large contiguous block (000000-999999). Dashed lines
will appear above that block, and all blank lines, totals, subtotals,
headings, etc., will fall below. Simply delete the dashed lines at the
top, and all of the headings, blank lines, totals, subtotals, etc.
which fell below. The end result is column names in row-1 and a
contiguous block of data from row-2 onward. Make sure the data
still foots to your account balance being tested or other expected
amount.
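
The same idea can be sketched in code. Note that this Python illustration filters on column “C” rather than sorting on it; the outcome (only invoice rows remain) is the same. The helper name keep_invoice_rows and the sample rows are hypothetical, and the rows are assumed to be the lines found below the column names in row-1.

```python
import re

SIX_DIGITS = re.compile(r"[0-9]{6}")


def keep_invoice_rows(rows, key_col=2):
    """Keep rows whose key column (default: column C) is a six-digit number."""
    kept = []
    for row in rows:
        key = str(row[key_col]).strip() if len(row) > key_col else ""
        if SIX_DIGITS.fullmatch(key):
            kept.append(row)
    return kept


# Hypothetical report-style lines.
report_rows = [
    ["Open Invoice Report", None, None, None],    # page heading
    ["1001", "05/03/18", "000123", 250.00],
    ["1002", "06/03/18", "000123", 100.50],
    ["------", "------", "------", "------"],     # dashed line
    ["Subtotal", None, None, 350.50],
    [None, None, None, None],                     # blank line
    ["1003", "07/03/18", "004567", 49.50],
    ["Total", None, None, 400.00],
]

data = keep_invoice_rows(report_rows)
footed_total = sum(row[3] for row in data)  # compare with the expected balance
print(len(data), footed_total)
```

Comparing footed_total with the report total is the "foots to your account balance" check described above.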

Step 4 — Ensure dates and numeric data is formatted in a


consistent manner. Date and numeric data formats may vary
across different member firms. The key is to ensure that data that
will be used together is formatted consistently.

Using the UK as an example, the following would be best practice


for consistent formatting:

Numbers: Format columns containing numbers using the 0.00 format.


(Format cells / Number / -1234.10) Make sure to specify decimal places
for consistent formatting.



Dates: Format dates using the DD/MM/YY format. (Format cells / Date /
03/14/01) For example, March 31, 2018, should look like 31/03/18.
Note the leading zero.

Consider the “Format as Date ( )” option in Excel Analytics which


converts any unusually formatted dates into a standard format Excel can
recognize based on a user-defined mask. This is helpful when a column
looks like a date but does not function as a date, and is text.
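
To illustrate what converting text to a standard date format with a user-defined mask involves, here is a minimal Python sketch. It uses Python's own strptime mask syntax, which is an assumption for illustration and is not the mask syntax of Excel Analytics; reformat_date and format_amount are hypothetical helper names.

```python
from datetime import datetime


def reformat_date(text, mask, out_mask="%d/%m/%y"):
    """Parse `text` using a strptime mask and return it as DD/MM/YY."""
    return datetime.strptime(text.strip(), mask).strftime(out_mask)


def format_amount(value):
    """Render a number in the plain 0.00 / -0.00 style (no separators)."""
    return f"{value:.2f}"


print(reformat_date("2018-03-31", "%Y-%m-%d"))
print(reformat_date(" 3/5/2018", "%m/%d/%Y"))
print(format_amount(1234.5))
print(format_amount(-1234.1))
```

A value that does not match the mask raises ValueError, which is a useful signal that the column contains more than one date layout.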

Step 5 — Ensure text columns are formatted in a consistent


manner to ensure Excel Analytics, and native Excel functions
execute in a consistent, predictable manner.

Consider the following examples and how they would make data analytics
using Excel or Excel Analytics more difficult.

 If you were to total the accounts receivable by city, and the city
column contains uppercase, lowercase, and proper case data, you
would get totals by each of the city-variations based on case, not
by city.
 If you were to perform a VLookup in Excel, leading spaces (spaces
on the left) cause matching issues.
 Some data, such as account numbers, customer numbers, etc.,
may physically be numeric in Excel but should be converted to text
and left justified in order to perform key functionality in Excel or
Excel Analytics.
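
The three issues above can be sketched as follows. This Python illustration treats "left justified" as "no leading spaces" and assumes identifiers should be zero-padded to a fixed width; both are assumptions, and clean_text and id_to_text are hypothetical helper names.

```python
def clean_text(value):
    """Strip surrounding spaces and convert a text cell to uppercase."""
    return str(value).strip().upper()


def id_to_text(value, width):
    """Convert a numeric identifier to zero-padded text, e.g. 123 -> '000123'."""
    return str(int(value)).zfill(width)


# A lookup table keyed on clean customer numbers (made-up data).
customers = {"000123": "ACME LTD"}

raw_key = "  000123"  # leading spaces, as often found in exported data
missed = customers.get(raw_key)             # the spaces prevent a match
found = customers.get(clean_text(raw_key))  # matches after cleanup
from_number = customers.get(id_to_text(123.0, 6))

print(missed, found, from_number)
```

The lookup fails on the raw key and succeeds once the key is trimmed or converted to text, which is the same effect a VLookup would show.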

Manipulate Fields

Upper/Lower Case: In the Manipulate Fields menu on the ribbon, the
manipulate text icon contains many features, including converting data
to uppercase, lowercase or proper case. Simply highlight an entire text
column (non-numeric, non-date) and click the icon to convert the entire
column to the required case.

Left Justification: Under Manipulate Fields, the manipulate text icon
contains many features to left justify your text columns. Simply
highlight an entire text column (non-numeric, non-date) and click the
left-justification choice you need.



Excel functions such as VLookup will perform in a more predictable
way with left justified data.

Converting numeric columns to text: Under “Manipulate Fields”, the
manipulate text icon contains many features to clean and
consistently format data. Simply highlight an entire numeric column
which you intend to convert to text, and select the option that converts
numbers to text.

Data such as account numbers or customer numbers which were
originally numeric will then be text and left justified, so that many of the
features in Excel Analytics, and Excel functions such as VLookup, will
perform in a consistent, expected manner.

Many data cleanup tools to consider: Every Excel spreadsheet will
have some unique data characteristics, some of which may affect the
behavior of Excel’s functionality and Excel Analytics. Consider the
different features under the Excel Analytics manipulate text icon as you
prepare your data for performing data analytics on your audit.



Step 6 — Save your spreadsheet, and you are ready for Excel
Analytics and Excel functionality.

ACL Import Steps - The data preparation steps required for Excel
Analytics are the same steps required for Excel data preparation
prior to importing into ACL.

Make sure to save the spreadsheet into a folder where you will create and
access your ACL project. Below are the detailed steps to import a
properly prepared Excel file into ACL.

Step 1 - Create a new ACL project or access an existing one

An ACL project is a small file in which ACL stores Table Layouts, Views
and Scripts. In plain words, an ACL Project points to your data files,
indicates what the columns are, and manages how the data is displayed
on your screen. In Windows explorer, you’ll notice ACL Projects have the
*.ACL extension. An ACL Project is NOT your data file. You should



typically designate a folder under “Documents” in Windows Explorer
which would include the ACL project and all related data files. If the ACL
project and all related files are under one folder, it is easier to share this
data with other team members.

Follow the appropriate steps below, depending on whether you need to
create a new ACL project or access an existing one.

To create an ACL project: Open ACL and select New Project. Point to
the folder you have designated (where the properly prepared Excel file
resides) and enter a project name. Click [Save] and the project will be
created. Warning: If you create a new project with the name of an
existing one, the existing one will be overwritten.

To open an existing ACL project: Open ACL and select Open project.
Point to the folder you have designated (where the existing ACL project
and properly prepared Excel file reside) and double-click the existing
project name.

Step 2 — Access the Data Definition Wizard and locate the


“properly prepared” Excel file

To access the Data Definition Wizard…

Select File / New / Table…

To locate the Excel file…

Select “Local”

Within the “Select Data Source” screen, there should be a black dot next
to “Disk”. Click [Next >]

Point to your Excel file and select [Open]

The Data Definition Wizard should correctly identify the spreadsheet as an


Excel file. Click [Next >]

Step 3 — Select the named range and tell ACL how to detect
column widths and data types

Follow the steps displayed in the screen below. As noted in Step 3 in the
screen below, use the entire spreadsheet to determine the field (column)
widths and data types. Do not use the “First 100 records” option. This
can result in data being truncated. Example: Your file has a column



containing customer names. In the first 100 rows, the longest name
encountered was 17 characters in length. In the entire file, the longest
name encountered is 50 characters. Using the “First 100 records” option
will result in the column being truncated to 17 characters wide and the 33
rightmost characters will be lost.
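
The arithmetic in this example can be checked with a short sketch. This is an illustration in Python of width detection on a sample versus the whole column, under the assumption that width means the longest text length seen; it is not ACL's actual detection logic, and detect_width is a hypothetical name.

```python
def detect_width(values, sample_size=None):
    """Longest text length among the first `sample_size` values (all if None)."""
    sample = values if sample_size is None else values[:sample_size]
    return max((len(str(v)) for v in sample), default=0)


# Made-up column: 100 names of 17 characters, then one 50-character name.
names = ["A" * 17] * 100 + ["B" * 50]

sampled_width = detect_width(names, sample_size=100)
full_width = detect_width(names)
lost = full_width - sampled_width

# What survives of the long name if the column is cut to the sampled width.
truncated = names[-1][:sampled_width]

print(sampled_width, full_width, lost, len(truncated))
```

Sampling only the first 100 values gives a width of 17, so the 50-character name loses its 33 rightmost characters.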

Step 4 — Preview the converted data and if necessary, make final


fieldname, data type, decimal place and date format changes

ACL will analyze your spreadsheet and then display a preview screen.
This will be your final opportunity to make edits to fieldnames, data
types, date formats, number of decimals and column widths.

The following “Preview Data” screen will appear:



The following are key considerations when reviewing and making edits in
the “Preview Data” screen:

Field (column) names: If during Step 1 of the Excel file cleanup, you
created a bad ACL fieldname, you can fix it at this point. For example,
the fieldname Cust # would appear as Cust__ as ACL cannot accept the
space and # symbol and replaces them with underscores. As you click on
each column, ensure you have appropriate ACL fieldnames.
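
The replacement described here (each unacceptable character becomes an underscore) can be mimicked as follows. This Python sketch only reproduces the behavior stated in this guide; ACL's own renaming rules may differ in other cases, for example for names that start with a digit, and sanitize_fieldname is a hypothetical name.

```python
import re


def sanitize_fieldname(name):
    """Replace every character outside A-Z, a-z, 0-9 and _ with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(sanitize_fieldname("Cust #"))
print(sanitize_fieldname("Inv Date"))
```

Running this kind of check in Excel before importing shows which column names will come out looking awkward, so they can be renamed to mnemonic names first.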

“Type” refers to numeric, text or date designation or status. It is


important that dates and numeric fields are formatted as such. For
example, an invoice date must be formatted as a date otherwise ACL’s
Aging Command will not recognize it. Numeric fields must be formatted
with a numeric data type. Examples of numeric fields include amounts,
interest rates, unit costs, quantity on hand, debit/credit, etc. If a field is
not a date, and not a number that you could perform mathematical
operations on, then it should be formatted as text, even if the column
only contains numbers 0-9. For example, customer numbers ranging
from 00000001-99999999 are not numeric fields that you would perform



mathematical operations on. If ACL suggests these are numeric, you
should change them to Text. As you click on each column, ensure each
field has an appropriate data type.

Date format and decimal places: By formatting dates appropriately in


Excel (See Step 4 above), ACL should properly assign the “datetime”
data-type and reflect the correct date format. The date format tells ACL
how the date is structured so ACL can use the field as a date and perform
date-related functions like aging. As for numeric fields, make sure the
assigned number of decimal places is correct. As you click on each
numeric and date column, make sure the date is appropriately formatted
and the number of decimals is correct.

Step 5 — Finish the conversion

After clicking [Next >] on ACL’s preview screen, you’ll be asked to enter a
filename which will become the name of your ACL table. It will have the
extension *.fil when viewing your folder in Windows Explorer.

After the conversion is complete, a screen will display the fieldnames and
data types. This is when you click [Finish].

The last step is to accept the table layout name, which will be the same as
the data file’s name. When ACL says, “Table ‘Untitled’ was changed, save
as”, click [OK].

The next screen is ACL’s main view displaying your imported data. Run
Analyze / Statistics on an amount field in order to verify it agrees with an
expected amount such as the total that you had in Excel and the account
balance being tested.

APPENDIX – X – DATA WHICH SPANS MULTIPLE SPREADSHEETS

Does your data span across multiple spreadsheets or workbooks?

For example, the entity may provide data in separate workbooks, or in a
workbook containing separate spreadsheets (tabs), by accounting period.
The desired result is to have a single spreadsheet which contains all data
(multiple spreadsheets combined).

There are a few scenarios to consider:

Scenario 1: The sum of all rows in all spreadsheets which need to
be combined exceeds Excel’s maximum of 1,048,576 rows. The
solution may be to reach out to a specialist (expert). For more information
on this, please contact your local specialist (expert) team.



Scenario 2: The sum of all rows in all spreadsheets which need to
be combined does not exceed Excel’s maximum of 1,048,576
rows. However, the Excel spreadsheets or workbooks were saved
in Excel 97-2003 which has a maximum number of rows of
65,536. This can be determined by observing the words “Compatibility
Mode” at the top of your screen, and/or navigating to the bottom of the
spreadsheet and observing the maximum number of rows being 65,536.
The solution is to open every workbook and click File/Save as. Beneath
the file name, next to “Save as type:”, change from Excel 97-2003
Workbook (*.xls) to Excel Workbook (*.xlsx). After performing this
“Save as” function on every workbook that needs to be combined, close all
spreadsheets/workbooks, exit Excel, re-open Excel and open the
workbooks/spreadsheets that were saved in the latest format (*.xlsx).
Proceed to the Scenario 3 solution.

Scenario 3: The sum of all rows in all spreadsheets which need to


be combined does not exceed Excel’s maximum of 1,048,576
rows. The solution is outlined below:

1. Open every spreadsheet. There could be multiple spreadsheets


within a single workbook, multiple workbooks, or combination of
the above. It doesn’t matter. It is only important that every
workbook and every spreadsheet that needs to be combined is
open.

2. Inspect to ensure that every spreadsheet which will ultimately be


combined has the exact same respective columns as every other
spreadsheet. Hint: Consider inserting a blank row at the top of
every spreadsheet, and copying/pasting the column headings from
one of the spreadsheets to the blank row at the top of every other
spreadsheet. This would result in two rows of column headings in
each spreadsheet, and then it is easy to compare and ensure all
spreadsheets have the same respective columns. If there is
variability, you will be required to manually fix the spreadsheets so
all of them have the same respective columns.

3. Once you have ensured all spreadsheets have the same respective
columns, access Excel Analytics, and select the icon titled, “Manage
Sheets”, and then select “Append Sheets”. Shift+Click to select
every spreadsheet, regardless of which workbook it is in, which
needs to be combined (appended) into one spreadsheet. The
default options should suffice in most cases and are self-
explanatory. Simply click [OK] and a new tab, named “Appended”
will appear and this will be your combined data. You have the
option of normalizing the data before or after appending sheets;
however, it would be easier to normalize the combined data as one,
as a final series of steps. You will notice a new “Column-A” which
indicates which previous spreadsheet each row was sourced from
(this is a default option; you can choose not to create that column if
it’s not relevant for you).
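
The append logic in steps 2 and 3 can be sketched in code. This is a minimal Python illustration of the idea, not the Excel Analytics “Append Sheets” implementation: it assumes each sheet is a list of rows with the header first, and append_sheets and the SOURCE column name are hypothetical.

```python
EXCEL_MAX_ROWS = 1_048_576  # maximum rows in an .xlsx worksheet


def append_sheets(sheets):
    """Combine sheets that share identical headers into one list of rows.

    `sheets` maps a sheet name to its rows (header row first). A leading
    SOURCE column records which sheet each data row came from.
    """
    names = list(sheets)
    header = sheets[names[0]][0]

    # Step 2: every sheet must have the same columns in the same order.
    for name in names:
        if sheets[name][0] != header:
            raise ValueError(f"sheet {name!r} has different columns")

    # Step 3: append the data rows, tagging each with its source sheet.
    combined = [["SOURCE"] + list(header)]
    for name in names:
        combined.extend([name] + list(row) for row in sheets[name][1:])

    if len(combined) > EXCEL_MAX_ROWS:
        raise ValueError("combined data exceeds Excel's row limit")
    return combined


sheets = {
    "JAN": [["INVNO", "AMOUNT"], ["1001", 10.0]],
    "FEB": [["INVNO", "AMOUNT"], ["1002", 20.0], ["1003", 5.0]],
}
combined = append_sheets(sheets)
print(combined)
```

Raising an error on mismatched headers, rather than appending anyway, mirrors the guide's advice to fix the columns manually before combining.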

Appendix – Y – CHARACTERISTICS OF RELATIVELY CLEAN EXCEL


FILES VS. “MESSY” EXCEL FILES.

Characteristics of relatively clean Excel files include:

 Data falls neatly into columns and generally looks uniform.


Information which should be contained within a column is not
unnaturally split into multiple columns.
 There is no requirement to repeatedly copy/paste information.
 If the data is spread across multiple tabs within a workbook, or
multiple workbooks, the sum of all rows in all spreadsheets does
not exceed the maximum number of rows that Excel can handle.
 Dates are not in a text format. In other words, they can be
formatted as dates, and used in date related math.
 Numeric data, such as an amount column, functions as numbers
and is not formatted as text.
 You’re able to recalculate totals with little effort.

Characteristics of Excel files which may require a data conversion
specialist (expert) include:

 The layout looks exactly like a report, complete with report


headings, page numbers, spacing issues, totals, and subtotals.
 Each record of data actually occupies two rows. For example, an
invoice number, invoice date, description, and amount may occupy
one row, but for each invoice, the row beneath contains an
additional description which essentially needs to be pulled up onto
the row above.
 Headings include data which pertains to subsequent rows, and
there is a requirement to copy and paste this information alongside
the detailed information below. For example, an accounts
receivable open invoice report contains a customer number,
customer name and address on a single row. Beneath that row,
there is a row of data for each open invoice containing the invoice
number, date, description and amount. A person would have to
copy the customer number and name/address alongside the invoice
detail in order to consider the data usable. This could take hours or
even days to accomplish.



 All data is contained within column “A”. It looks like it falls into
columns but when you position the cursor over Column “A”, the
entire row of data is contained in “A”.
 A text report was imported into Excel and parsed into columns.
Data which you would expect to fit into one column is split into two
or more columns. Amounts are text on some rows, numeric in
others.
 Your Excel Workbook actually has 12 tabs containing monthly data.
The sum of the rows for the 12 tabs combined exceeds 1,048,576,
which is Excel’s current limitation.
 You are unable to recalculate totals.

APPENDIX – Z - COMPLETENESS TESTING, BEFORE AND AFTER DATA


PREPARATION FOR EXCEL ANALYTICS OR ACL CONVERSION

Before Attempting the Data Preparation steps…

Whether performing the data preparation steps yourself or relying upon
the data conversion service, there are a few things to do before
starting your data preparation endeavor.

Verify the data contains all of the necessary columns (fields) needed to
perform your analysis.

Make sure the data is at the correct testing unit level of detail.

Verify that the totals at the bottom of the spreadsheet or other form of
data agree or “tie out” to the account balance you are testing, or to
some other expected amount. You should attempt to recalculate totals
from within Excel.

Save a copy of the original file in case you get to a point where you just
want to start over.

After the data preparation steps…

Verify all of the necessary columns (fields) needed to perform your


analysis are formatted properly. For example, dates function as dates,
numeric data is properly prepared and all other data is left justified text.

Verify the data is at the correct testing unit level of detail.

Verify that the totals at the bottom of the spreadsheet or other form of
data still agree to the account balance you are testing, or to some other
expected amount.
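
A before-and-after completeness check can be as simple as comparing row counts and control totals. The following Python sketch is one way to do it, assuming the data rows (without the header) are available as lists and the amount column is known; control_totals and ties_out are hypothetical names. Decimal is used so that currency amounts add up exactly.

```python
from decimal import Decimal


def control_totals(rows, amount_col):
    """Return (row count, total of the amount column) for the data rows."""
    total = sum((Decimal(str(row[amount_col])) for row in rows), Decimal("0"))
    return len(rows), total


def ties_out(rows, amount_col, expected_total):
    """True when the data foots to the expected balance."""
    _, total = control_totals(rows, amount_col)
    return total == Decimal(str(expected_total))


# Made-up data rows: invoice number and amount.
prepared = [["1001", "1234.56"], ["1002", "-234.56"]]

count, total = control_totals(prepared, amount_col=1)
print(count, total, ties_out(prepared, 1, "1000.00"))
```

Recording the count and total before preparation and comparing them afterwards shows whether any data rows were lost or any subtotal rows were left in.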

