#Draft E-MAIL: Godehard - SF
Below you will find KPMG’s data quality evaluation and respective recommendations
regarding the issues found in Sprocket Central’s customer demographic, customer
address and past transaction data sets.
Customer Demographic
The key issues with the quality of the data received relate to accuracy, consistency,
completeness, and validity.
1. Accuracy
Various first names have been entered incorrectly. It is therefore
recommended that Sprocket Central Pty Ltd reviews the following Customer IDs to
ensure that the correct first names have been entered.
A similar issue occurred for last names. The Customer IDs with incorrectly
entered last names are noted below:
Customer ID Last Name
1727 Godehard.sf
Furthermore, the date of birth column contains a significant outlier that should be reviewed:
Customer ID DOB
34 1843-12-21
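As an illustration only, a check along the following lines could surface such outliers automatically. This is a minimal sketch: the function name `implausible_dobs`, the 110-year age ceiling and the reference date are assumptions made for the example, not part of Sprocket Central's systems.

```python
from datetime import date

def implausible_dobs(records, as_of, max_age=110):
    """Return (customer_id, dob) pairs whose implied age is negative or above max_age."""
    flagged = []
    for customer_id, dob in records:
        # Age in whole years as of the reference date.
        had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
        age = as_of.year - dob.year - (0 if had_birthday else 1)
        if age < 0 or age > max_age:
            flagged.append((customer_id, dob))
    return flagged

# Customer 34 comes from the table above; customer 1 is a made-up comparison row.
sample = [(34, date(1843, 12, 21)), (1, date(1980, 5, 17))]
# The reference date is an assumption chosen for the example.
flagged = implausible_dobs(sample, as_of=date(2018, 1, 1))
print(flagged)
```

The age ceiling is a judgment call; a stricter or looser threshold may suit the business better.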
2. Consistency
It is recommended that Sprocket Central Pty Ltd reviews the following issues and chooses
a single convention to apply consistently.
When entering last names, Sprocket Central should decide whether or not to include
a space in last names that contain an apostrophe. This is shown below:
*Note: Since the majority of the data already omits the space in these last names, it
is preferable to change the above customers' last names so that they follow the
current trend within the data set.
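Should Sprocket Central adopt the no-space convention, the clean-up could be sketched as follows. The function name `normalise_apostrophe` and the sample name are hypothetical and are used purely for illustration.

```python
import re

def normalise_apostrophe(last_name):
    """Remove whitespace directly after an apostrophe, so "O' Brien" becomes "O'Brien"."""
    # Handles both the straight (') and the typographic (’) apostrophe.
    return re.sub(r"(['’])\s+", r"\1", last_name.strip())

# Hypothetical example name, not taken from the data set.
print(normalise_apostrophe("O' Brien"))
```

Names that already follow the convention pass through unchanged, so the function can be applied to the whole column.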
3. Completeness
The customer demographic data set is also significantly incomplete;
therefore, it is recommended that Sprocket Central Pty Ltd ensures that it collects the
necessary information to complete the data set. This can be done by implementing
alerts and notifications in the data collection process.
The following data columns lack information:
It is also recommended that Sprocket Central Pty Ltd investigates any link between
missing DOB values and missing customer tenure values. Should
there be a relationship, it can be mitigated by implementing appropriate controls.
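One simple way to look for such a link is to count how often the two fields are missing together. The sketch below assumes each record is a dictionary with "DOB" and "tenure" keys; the key names, the helper `missing_overlap` and the sample rows are all hypothetical.

```python
def missing_overlap(rows, field_a, field_b):
    """Count rows where both, only one, or neither of two fields is missing."""
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for row in rows:
        # A value counts as missing if it is absent, None or an empty string.
        a_missing = row.get(field_a) in (None, "")
        b_missing = row.get(field_b) in (None, "")
        if a_missing and b_missing:
            counts["both"] += 1
        elif a_missing:
            counts["only_a"] += 1
        elif b_missing:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts

# Made-up rows for illustration only.
rows = [
    {"customer_id": 1, "DOB": "1980-05-17", "tenure": "11"},
    {"customer_id": 2, "DOB": "", "tenure": ""},
    {"customer_id": 3, "DOB": "", "tenure": "4"},
    {"customer_id": 4, "DOB": None, "tenure": None},
]
overlap = missing_overlap(rows, "DOB", "tenure")
print(overlap)
```

If the "both" count is high relative to the "only" counts, the two gaps probably share a cause in the data collection process.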
4. Validity
The default column is unreadable: many of its values lack accuracy,
consistency and legibility. It is therefore recommended to either
(a) Remove the default column, or
(b) Completely redesign and reconfigure the default column
Customer Address
The key issues regarding the customer address data set concern consistency,
uniqueness and completeness.
1. Consistency
It is important to ensure that data is entered consistently. The following State
values should therefore be changed to NSW and VIC respectively to follow the current
Sprocket Central data trend:
State Entered Number of Customers
New South Wales 86
Victoria 82
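The state values could be standardised with a small lookup, sketched below. The mapping covers only the two full names found above, and the names `STATE_ABBREVIATIONS` and `normalise_state` are chosen for this example.

```python
# Full state names observed in the data, mapped to the abbreviations used elsewhere.
STATE_ABBREVIATIONS = {"new south wales": "NSW", "victoria": "VIC"}

def normalise_state(state):
    """Return the abbreviation for a known full state name; leave other values as entered."""
    cleaned = state.strip()
    return STATE_ABBREVIATIONS.get(cleaned.lower(), cleaned)

print(normalise_state("New South Wales"))
print(normalise_state("QLD"))
```

Values that are not in the lookup are returned unchanged, so existing abbreviations such as QLD are not affected.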
2. Uniqueness
Some addresses are repeated, as shown below. It is recommended that Sprocket
Central reviews these records and ensures that the correct addresses have been entered:
Customer ID   Address                  Postcode   State
737           3 Talisman Place         4811       QLD
2475          3 Talisman Place         4017       QLD
2320          64 Macpherson Junction   2208       NSW
3540          64 Macpherson Junction   4061       QLD
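A check for repeated street addresses could be sketched as follows, using the four rows listed above as sample input. The helper name `repeated_addresses` and the tuple layout are assumptions made for the example.

```python
from collections import defaultdict

def repeated_addresses(rows):
    """Group customer IDs by street address and keep addresses used more than once."""
    by_address = defaultdict(list)
    for customer_id, address, postcode, state in rows:
        # Compare addresses case-insensitively and ignore surrounding spaces.
        by_address[address.strip().lower()].append(customer_id)
    return {addr: ids for addr, ids in by_address.items() if len(ids) > 1}

rows = [
    (737, "3 Talisman Place", "4811", "QLD"),
    (2475, "3 Talisman Place", "4017", "QLD"),
    (2320, "64 Macpherson Junction", "2208", "NSW"),
    (3540, "64 Macpherson Junction", "4061", "QLD"),
]
duplicates = repeated_addresses(rows)
print(duplicates)
```

Because the postcodes differ within each pair, these may be genuinely different locations that share a street address, which is why a manual review is still recommended.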
3. Completeness
The following customer IDs are possibly missing. Sprocket Central should therefore
review its data to ensure that these customers have not been categorised under an incorrect ID:
Customer ID
3
10
22
23
Furthermore, it is possible that one customer’s information is missing, since there are
3,999 IDs in the customer address data set and 4,000 in the customer demographic data set.
Sprocket Central should therefore double check its data.
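Gaps in an ID sequence can be found with a set difference, as in the sketch below. The function name `missing_ids` and the short sample list are illustrative assumptions; in practice the full ID column would be passed in.

```python
def missing_ids(present_ids, first_id, last_id):
    """Return the IDs in the range first_id..last_id that do not appear in present_ids."""
    expected = set(range(first_id, last_id + 1))
    return sorted(expected - set(present_ids))

# Made-up sample in which IDs 3 and 10 are absent.
sample_ids = [1, 2, 4, 5, 6, 7, 8, 9, 11]
gaps = missing_ids(sample_ids, first_id=1, last_id=11)
print(gaps)
```

The same set difference can be applied between the ID columns of the address and demographic data sets to identify which customer appears in one but not the other.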
Transactions Sheet
The transaction sheet for the past three months for Sprocket Central Pty Ltd lacks currency,
consistency and completeness.
1. Currency
Sprocket Central is missing all transactions from the 31st of December. Although this
may be because of a public holiday, other public holidays such as Australia
Day (26th January) maintain consistent transactions, so Sprocket Central should review
and add the missing data to the transaction sheet.
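Days without any transactions can be detected by walking the date range and comparing it against the dates actually present. The sketch below is illustrative: the function name `missing_dates` is an assumption, and the year in the sample data is made up because the transaction year is not restated here.

```python
from datetime import date, timedelta

def missing_dates(transaction_dates, start, end):
    """Return every calendar day from start to end (inclusive) with no transaction."""
    seen = set(transaction_dates)
    gaps = []
    day = start
    while day <= end:
        if day not in seen:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps

# Made-up sample dates; the year is a placeholder for illustration.
sample_dates = [date(2017, 12, 29), date(2017, 12, 30), date(2018, 1, 1), date(2018, 1, 2)]
no_activity = missing_dates(sample_dates, start=date(2017, 12, 29), end=date(2018, 1, 2))
print(no_activity)
```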
2. Consistency
Since Customer IDs are assigned chronologically, the following Customer ID is
invalid:
Given Customer ID   Correct Customer ID   Transaction Numbers
5034                3501                  8708, 16701, 17469
Furthermore, some list prices do not follow the two-decimal-place trend and
should therefore be changed:
List Price
360.4
642.7
792.9
1179
1281.6
1403.5
1483.2
1635.3
1636.9
1720.7
1765.3
1777.8
1810
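The list prices above could be brought to two decimal places with Python's decimal module, as sketched below. The helper name `format_price` is an assumption, and the prices are read as text so that no precision is lost.

```python
from decimal import Decimal

def format_price(value):
    """Return the price as text with exactly two decimal places."""
    # Converting through str avoids binary floating-point artefacts.
    return str(Decimal(str(value)).quantize(Decimal("0.01")))

# Three of the list prices from the table above.
for price in ["360.4", "1179", "1810"]:
    print(format_price(price))
```

This only pads the values; it does not change the amounts.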
3. Completeness
A number of values are missing in each column; the column name
and the amount of information missing are given below. To account for this, it is
recommended that Sprocket Central obtains these values from its customers to
maintain high levels of data quality.
Because values for brand, product line, standard cost, bike class,
size and first sold date are repeatedly missing, KPMG also recommends that Sprocket Central
investigates the relationship between these missing values and prevents the
issue from recurring by educating its staff, debugging its programs or
integrating alert systems.
Kind Regards,
Prabhav Garg
KPMG Data Analytics Virtual Intern