# Draft E-mail

To: Sprocket Central Pty Ltd

Subject: Data Quality Evaluation

Dear Sir/Madam,

Below you will find KPMG’s data quality evaluation and our recommendations regarding
the issues found in Sprocket Central’s customer demographic, customer addresses and
past transactions datasets.

Customer Demographic
The key data quality issues in this dataset relate to accuracy, consistency,
completeness and validity.

1. Accuracy
Several first names have been entered incorrectly. It is therefore recommended that
Sprocket Central Pty Ltd review the following Customer IDs to ensure that the correct
first names are recorded:

Customer ID   First Name
2347          L;urette
249630        D’arcy

A similar issue occurs with last names; the affected Customer IDs are listed below:
Customer ID   Last Name
1727          Godehard.sf
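Entries like these could be caught automatically at data entry. The following is a minimal Python sketch, assuming names are available as plain strings keyed by Customer ID, and assuming a hypothetical validity rule (not one taken from Sprocket Central’s systems): a valid name contains only letters joined by single spaces, hyphens or apostrophes. The function name is illustrative.

```python
import re

# Hypothetical validity rule: letters, optionally joined by a single space,
# hyphen, straight apostrophe (') or curly apostrophe (’).
VALID_NAME = re.compile(r"[A-Za-z]+(?:[ '’-][A-Za-z]+)*")


def flag_invalid_names(names_by_id):
    """Return the Customer IDs whose name breaks the validity rule."""
    return [customer_id for customer_id, name in names_by_id.items()
            if not VALID_NAME.fullmatch(name.strip())]


# Names taken from the tables in this email.
sample = {2347: "L;urette", 1727: "Godehard.sf", 1079: "Dennie"}
print(flag_invalid_names(sample))  # [2347, 1727]
```

A rule like this catches stray punctuation, but it cannot judge whether a well-formed name is spelled correctly, so a manual review is still needed.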

The date of birth (DOB) column also contains a significant outlier that should be reviewed:
Customer ID   DOB
34            1843-12-21
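Outliers of this kind can be flagged with a simple plausibility window. A minimal sketch, assuming DOBs are stored as Python `date` values; the 1900 cut-off and the second record are illustrative assumptions, not values from the dataset.

```python
from datetime import date

# Hypothetical plausibility window: the cut-off year is an illustrative choice.
EARLIEST_PLAUSIBLE_DOB = date(1900, 1, 1)


def flag_dob_outliers(dobs_by_id, today=None):
    """Return Customer IDs whose DOB is before the cut-off or in the future."""
    today = today or date.today()
    return [customer_id for customer_id, dob in dobs_by_id.items()
            if dob < EARLIEST_PLAUSIBLE_DOB or dob > today]


sample = {
    34: date.fromisoformat("1843-12-21"),  # the outlier reported above
    9999: date(1980, 6, 15),               # hypothetical, plausible record
}
print(flag_dob_outliers(sample))  # [34]
```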

The following gender entry should also be corrected:

Customer ID   Gender
54            Femal

2. Consistency
It is recommended that Sprocket Central Pty Ltd review the following issues and adopt a
single convention for each.
When entering last names that contain an apostrophe, Sprocket Central should decide
whether or not to include a space after the apostrophe. The affected records are shown
below:

Customer ID   First Name   Last Name
1079          Dennie       L’ Anglois
1784          Louella      O' Timony
3479          Pierette     O' Ronan
1583          Krysta       O' Reagan
3131          Sybilla      O' Markey
1765          Sibella      O' Mara
881           Carmella     O' Lone

*Note: Since the majority of the data already omits the space after the apostrophe, it
is preferable to change the above customers’ last names so that they follow the
existing convention in the dataset.
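Applying this convention is a mechanical string replacement. A minimal sketch, assuming last names are plain Python strings; the function name is hypothetical.

```python
import re


def remove_space_after_apostrophe(last_name):
    """Collapse "O' Timony" to "O'Timony"; handles both ' and ’."""
    return re.sub(r"(['’])\s+", r"\1", last_name)


for name in ["O' Timony", "L’ Anglois", "O'Mara"]:
    print(remove_space_after_apostrophe(name))
# O'Timony
# L’Anglois
# O'Mara
```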

Furthermore, to ensure consistency, it is recommended that Sprocket Central Pty Ltd
categorise the genders of its customers as Female, Male and U (representing undefined),
in line with the current trend. Therefore, the following entries should be reviewed:
Customer ID   Gender
1             F
57            M
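The gender entries flagged in this section (Femal, F and M) could be mapped onto the three agreed categories in one step. A minimal sketch, assuming raw entries are strings; the mapping table is a hypothetical example built from the variants reported here.

```python
# Hypothetical mapping from the raw variants seen above to the agreed categories.
GENDER_MAP = {
    "female": "Female", "femal": "Female", "f": "Female",
    "male": "Male", "m": "Male",
    "u": "U",
}


def normalise_gender(raw):
    """Map a raw gender entry to Female, Male or U.

    Unrecognised values return None so they can be reviewed manually
    instead of being silently recoded.
    """
    return GENDER_MAP.get(raw.strip().lower())


print([normalise_gender(v) for v in ["F", "M", "Femal", "U", "x"]])
# ['Female', 'Male', 'Female', 'U', None]
```

Returning None for unknown values, rather than defaulting to U, keeps genuinely unexpected entries visible for review.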

3. Completeness
The customer demographic dataset also significantly lacks completeness. It is therefore
recommended that Sprocket Central Pty Ltd ensure that it collects the information needed
to complete the dataset, for example by implementing alerts and notifications in the
data collection process.
The following data columns lack information:

Data Column           Missing Values
Date of Birth (DOB)   87 unknowns
Tenure                87 unknowns
Job Title             506 unknowns

It is also recommended that Sprocket Central Pty Ltd investigate whether the missing DOB
values are linked to the missing tenure values, since both columns have 87 unknowns.
Should there be a relationship, it can be mitigated by implementing appropriate controls.
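One way to test for such a link is to count how many records are missing each column and how many are missing both. A minimal sketch, assuming records are Python dictionaries and that a missing value is stored as None or an empty string; the column names and sample records are hypothetical.

```python
def is_missing(value):
    """Assumption: a missing value is stored as None or an empty string."""
    return value is None or value == ""


def missing_overlap(rows, col_a, col_b):
    """Return (rows missing col_a, rows missing col_b, rows missing both)."""
    a = sum(is_missing(r.get(col_a)) for r in rows)
    b = sum(is_missing(r.get(col_b)) for r in rows)
    both = sum(is_missing(r.get(col_a)) and is_missing(r.get(col_b)) for r in rows)
    return a, b, both


# Hypothetical records; column names are assumptions.
rows = [
    {"DOB": None, "tenure": None},
    {"DOB": "1980-06-15", "tenure": 7},
    {"DOB": "", "tenure": None},
]
print(missing_overlap(rows, "DOB", "tenure"))  # (2, 2, 2)
```

If all three counts are equal on the real data, the missing DOBs and missing tenures fall on the same records, which would suggest a single shared cause.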

4. Validity
The default column is unreadable: many of its values lack accuracy, consistency and
legibility. It is therefore recommended to either:
(a) Remove the default column, or
(b) Completely redesign and reconfigure the default column.

Customer Address
The key issues in the customer address dataset relate to consistency, uniqueness and
completeness.

1. Consistency
To ensure that data is entered consistently, the following State entries should be
changed to NSW and VIC respectively, in line with the abbreviations used elsewhere in
Sprocket Central’s data:
State Entered     Number of Customers
New South Wales   86
Victoria          82
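A lookup table is enough to apply this change. A minimal sketch, assuming state values are strings; only the two variants reported above are mapped, and any other value passes through unchanged. The names are hypothetical.

```python
# Mapping for the two variants reported above; other states pass through unchanged.
STATE_ABBREVIATIONS = {"New South Wales": "NSW", "Victoria": "VIC"}


def normalise_state(state):
    """Replace a full state name with the abbreviation used elsewhere."""
    cleaned = state.strip()
    return STATE_ABBREVIATIONS.get(cleaned, cleaned)


print([normalise_state(s) for s in ["New South Wales", "Victoria", "QLD"]])
# ['NSW', 'VIC', 'QLD']
```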

Furthermore, the customer address dataset contains zero-padding (leading zeros), where
street numbers are written as follows:
Incorrectly Written   Correct Version
004 Lawn Trail        4 Lawn Trail
This affects 315 addresses, which should be corrected.
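The padding can be stripped with a single regular expression. A minimal sketch, assuming the street number is at the start of the address string; the function name and the second example address are hypothetical.

```python
import re


def strip_leading_zeros(address):
    """Remove zero-padding from the street number at the start of an address."""
    # The lookahead requires a digit to follow, so a street number is never
    # emptied completely (e.g. "00 X" becomes "0 X").
    return re.sub(r"^0+(?=\d)", "", address.strip())


print(strip_leading_zeros("004 Lawn Trail"))   # 4 Lawn Trail
print(strip_leading_zeros("100 Main Street"))  # 100 Main Street
```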

2. Uniqueness
Some addresses are repeated across different Customer IDs, as shown below. It is
recommended that Sprocket Central review these records and ensure that the correct
addresses have been entered:
Customer ID   Address                  Postcode   State
737           3 Talisman Place         4811       QLD
2475          3 Talisman Place         4017       QLD
2320          64 Macpherson Junction   2208       NSW
3540          64 Macpherson Junction   4061       QLD
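Repeated addresses can be found by grouping Customer IDs by address. A minimal sketch, assuming records are (Customer ID, address) pairs and that addresses are compared after trimming and lower-casing; the function name and the last record are hypothetical.

```python
from collections import defaultdict


def find_shared_addresses(records):
    """Map each address used by more than one Customer ID to those IDs.

    Addresses are compared after trimming and lower-casing.
    """
    ids_by_address = defaultdict(list)
    for customer_id, address in records:
        ids_by_address[address.strip().lower()].append(customer_id)
    return {addr: ids for addr, ids in ids_by_address.items() if len(ids) > 1}


records = [
    (737, "3 Talisman Place"),
    (2475, "3 Talisman Place"),
    (2320, "64 Macpherson Junction"),
    (3540, "64 Macpherson Junction"),
    (9999, "4 Lawn Trail"),  # hypothetical record with a unique address
]
print(find_shared_addresses(records))
# {'3 talisman place': [737, 2475], '64 macpherson junction': [2320, 3540]}
```

This check ignores postcode and state; since the repeated addresses above carry different postcodes, flagged records still need a manual review.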

3. Completeness
The following Customer IDs appear to be missing. Sprocket Central should therefore review
its data to ensure that these customers have not been recorded under an incorrect ID:
Customer ID
3
10
22
23

Furthermore, one customer’s information may be missing, since the customer addresses
dataset contains 3999 IDs while the customer demographic dataset contains 4000.
Sprocket Central should therefore double-check its data.
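Both checks in this section, gaps in the ID sequence and IDs present in one dataset but not the other, reduce to set differences. A minimal sketch, assuming Customer IDs are integers starting at 1; the function names and sample IDs are hypothetical.

```python
def find_id_gaps(ids, first_id=1):
    """Return IDs between first_id and the highest ID present that never appear."""
    present = set(ids)
    return sorted(set(range(first_id, max(present) + 1)) - present)


def ids_missing_from(reference_ids, other_ids):
    """Return IDs in reference_ids that are absent from other_ids."""
    return sorted(set(reference_ids) - set(other_ids))


print(find_id_gaps([1, 2, 4, 5, 6, 7, 8, 9, 11]))  # [3, 10]
print(ids_missing_from([1, 2, 3, 4], [1, 2, 4]))   # [3]
```

Passing the demographic IDs as `reference_ids` and the address IDs as `other_ids` would list exactly the customers who have no address record.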

Transactions Sheet
The transactions sheet for the past three months for Sprocket Central Pty Ltd has issues
with currency, consistency and completeness.

1. Currency
Sprocket Central is missing all transactions from the 31st of December. Although this may
be because of a public holiday, other public holidays such as Australia Day (26 January)
show consistent transactions, so Sprocket Central should review this date and add the
missing data to the transactions sheet.
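Dates with no transactions at all can be listed by walking the calendar between the first and last dates of the period. A minimal sketch, assuming transaction dates are Python `date` values; the function name and the sample dates (including the year) are illustrative only.

```python
from datetime import date, timedelta


def missing_dates(transaction_dates, start, end):
    """Return every calendar date from start to end (inclusive) with no transactions."""
    seen = set(transaction_dates)
    span = (end - start).days
    all_days = (start + timedelta(days=offset) for offset in range(span + 1))
    return [day for day in all_days if day not in seen]


# Illustrative dates only; the year is arbitrary.
dates = [date(2017, 12, 29), date(2017, 12, 30), date(2018, 1, 1)]
print(missing_dates(dates, date(2017, 12, 29), date(2018, 1, 1)))
# [datetime.date(2017, 12, 31)]
```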

2. Consistency
Since Customer IDs are assigned sequentially, the following Customer ID is invalid:
Given Customer ID   Correct Customer ID   Transaction IDs
5034                3501                  8708, 16701, 17469
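Invalid IDs like this one can also be caught by checking every transaction’s Customer ID against the customer demographic dataset. A minimal sketch, assuming, as a stand-in for the real ID list, that valid IDs run from 1 to 4000 (matching the 4000 IDs in the customer demographic dataset); the function name and sample IDs are hypothetical.

```python
def unknown_customer_ids(transaction_customer_ids, valid_customer_ids):
    """Return Customer IDs used in transactions that match no known customer."""
    valid = set(valid_customer_ids)
    return sorted({cid for cid in transaction_customer_ids if cid not in valid})


# Assumption: valid IDs run from 1 to 4000.
print(unknown_customer_ids([3501, 5034, 5034, 5034, 12], range(1, 4001)))  # [5034]
```

The check only flags the invalid ID; the correct replacement (3501 here) still has to be looked up by hand.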

Furthermore, standard costs are consistently rounded to two decimal places; therefore,
the following transactions should be altered to be consistent with all other standard
costs:
Transaction ID   Standard Cost   Correct Standard Cost
17469            667.4000244     $667.40
16701            270.2999878     $270.30
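The corrected values in the table follow from rounding to the nearest cent. A minimal sketch using Python’s `decimal` module, assuming half-up rounding is the intended convention (an assumption, since the data does not state one); the function name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_currency(value):
    """Round a cost to two decimal places, rounding halves up."""
    # Converting via str() avoids carrying binary floating-point noise into Decimal.
    return Decimal(str(value)).quantize(CENT, rounding=ROUND_HALF_UP)


for cost in [667.4000244, 270.2999878]:
    print(f"${to_currency(cost)}")
# $667.40
# $270.30
```

Using `Decimal` rather than float formatting keeps the stored value exact to the cent, which matters if costs are later summed.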

Similarly, some list prices do not follow the two-decimal-place convention and should
therefore be changed:

List Price
360.4
642.7
792.9
1179
1281.6
1403.5
1483.2
1635.3
1636.9
1720.7
1765.3
1777.8
1810
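Prices like these can be detected by inspecting how each value is written. A minimal sketch, assuming prices are checked as raw text (for example, as exported to CSV), since a numeric type does not keep trailing zeros; the function name and the value 53.62 are hypothetical.

```python
import re

# A price string counts as correctly formatted if it has exactly two decimals.
TWO_DECIMALS = re.compile(r"\d+\.\d{2}")


def flag_price_format(raw_prices):
    """Return the raw price strings not written with exactly two decimal places."""
    return [p for p in raw_prices if not TWO_DECIMALS.fullmatch(p.strip())]


print(flag_price_format(["360.4", "1179", "1810", "53.62"]))
# ['360.4', '1179', '1810']
```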

3. Completeness
Many values are missing across several columns; the column names and the number of
missing values are given below. To address this, it is recommended that Sprocket
Central obtain these values from its customers to maintain high levels of data quality.

Column Title          Missing Values
Online Order (T/F)    360
Brand of Bike         197
Product Line          197
Bike Class            197
Bike Size             197
First Sold Date       197
Standard Cost Price   197

Because brand, product line, bike class, bike size, first sold date and standard cost all
have the same number of missing values (197), KPMG also recommends that Sprocket
Central investigate the relationship between these missing values and prevent the issue
from recurring by educating its staff, debugging its programs and integrating alert
systems.
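A first step in that investigation is to check whether the 197 gaps fall on the same transactions. A minimal sketch, assuming rows are Python dictionaries and that a missing value is stored as None or an empty string; the column names and sample rows are hypothetical.

```python
# Hypothetical column names for the product fields listed above.
PRODUCT_COLUMNS = ["brand", "product_line", "product_class",
                   "product_size", "product_first_sold_date", "standard_cost"]


def is_missing(value):
    """Assumption: a missing value is stored as None or an empty string."""
    return value is None or value == ""


def missing_summary(rows, columns=PRODUCT_COLUMNS):
    """Return per-column missing counts and the indices of rows missing every column."""
    counts = dict.fromkeys(columns, 0)
    missing_all = []
    for index, row in enumerate(rows):
        flags = [is_missing(row.get(c)) for c in columns]
        for column, flag in zip(columns, flags):
            counts[column] += flag
        if all(flags):
            missing_all.append(index)
    return counts, missing_all


complete = {c: "x" for c in PRODUCT_COLUMNS}
rows = [dict.fromkeys(PRODUCT_COLUMNS), complete, {**complete, "brand": ""}]
counts, missing_all = missing_summary(rows)
print(counts["brand"], counts["standard_cost"], missing_all)  # 2 1 [0]
```

If every column count equals `len(missing_all)` on the real data, the gaps occur in exactly the same transactions, which would point to a single shared cause.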

Recommendations for Future Operations
Accuracy
Implement double-checking mechanisms and practices between employees and customers
to ensure that all information is accurate and relevant.
Consistency
Enforce standard data recording procedures to ensure that all new data is entered into
the system in a manner consistent with previous data.
Completeness
Implement a double-checking system in which employees responsible for entering data
receive notifications when key data is missing, and account creation is blocked until
the missing data is supplied.
Currency
Create weekly or monthly reminders for managers and supervisors to review newly entered
data regularly, thereby ensuring that it remains current.

Kind Regards,
Prabhav Garg
KPMG Data Analytics Virtual Intern
