
INFS 6018: Managing Business Intelligence

Week 3: Dirty Data and Data Quality

Dr. Daniel Gozman


Prof. John Buchanan
S1 2023

The University of Sydney


5 Levels of Data

(Source: Molyneaux 2002)
– Demographic: Human characteristics data, e.g. age, sex
– Transactional: Data about business events involving customers
– Behavioural: Data about how a customer interacts with a company
– Relationship: Data for relationships between people in the business
– Derived: Data collected by analysis tools (strictly information, as they are processed data), e.g. customer profitability



Data Cleansing

– Source systems contain “dirty data” that must be cleansed

– Specialized data cleansing software is often used. It is important for performing name and address correction
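
As a rough illustration of name and address correction, the sketch below normalises whitespace, casing and a few street-type abbreviations. The function names and the abbreviation map are hypothetical, invented for this example. Dedicated cleansing software works from much larger reference tables and handles cases this sketch does not (for example, names such as "McDonald", which simple title-casing would get wrong).

```python
# Hypothetical abbreviation map for illustration only; a real cleansing
# tool would rely on a far larger, locale-specific reference table.
STREET_ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def clean_name(raw: str) -> str:
    """Collapse repeated whitespace and apply simple title case."""
    return " ".join(raw.split()).title()


def clean_address(raw: str) -> str:
    """Collapse whitespace and expand known street-type abbreviations.

    Trailing '.' or ',' on an abbreviation is dropped when it is expanded.
    """
    words = []
    for word in raw.split():
        key = word.rstrip(".,").lower()
        words.append(STREET_ABBREVIATIONS.get(key, word))
    return " ".join(words)


if __name__ == "__main__":
    print(clean_name("  jOHN   smith "))
    print(clean_address("12  George st."))
```

Keeping the abbreviation map as data rather than code means the correction rules can be reviewed and extended by the business without touching the logic.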



Data Quality
– Harvesting
– Data Cleansing
– Filtering
– Validation
– Analysis
– Synthesis
– Presentation (Visualization)
– Exploration (Visual analytics)
– Personalisation/Customization

– Critical thinking reflection: How do these activities help or hinder the value of information?

Reasons for “Dirty” Data
· Dummy Values

· Absence of Data

· Multipurpose Fields

· Cryptic Data

· Contradicting Data

· Inappropriate Use of Address Lines

· Violation of Business Rules

· Reused Primary Keys, Non-Unique Identifiers

· Data Integration Problems
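
Several of the causes listed above can be detected with simple profiling. The sketch below counts absent values, dummy values and reused primary keys. The sample records, the field names and the `DUMMY_VALUES` list are made up for illustration; each organisation would need to define its own placeholder values.

```python
from collections import Counter

# Assumed placeholder values (compared case-insensitively); illustrative only.
DUMMY_VALUES = {"", "n/a", "unknown", "1900-01-01"}

# Made-up sample records, not real data.
records = [
    {"customer_id": 1, "phone": "0400 111 222", "dob": "1985-03-12"},
    {"customer_id": 2, "phone": None, "dob": "1900-01-01"},
    {"customer_id": 2, "phone": "N/A", "dob": "1990-07-30"},
]


def is_missing(value) -> bool:
    """Absence of data: the field holds no value at all."""
    return value is None


def is_dummy(value) -> bool:
    """Dummy value: a placeholder string entered in place of real data."""
    return isinstance(value, str) and value.strip().lower() in DUMMY_VALUES


def profile(rows, key="customer_id"):
    """Count missing and dummy values, and list keys used more than once."""
    missing = sum(is_missing(v) for row in rows for v in row.values())
    dummy = sum(is_dummy(v) for row in rows for v in row.values())
    key_counts = Counter(row[key] for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    return {"missing": missing, "dummy": dummy, "duplicate_keys": duplicate_keys}


report = profile(records)
print(report)
```

Checks like these only flag suspect records. Deciding whether a flagged value is truly wrong still depends on the business rules behind the data.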


The Three Dimensions of Information Quality



Data Quality
(Source: English 1999)

How would you define the four aspects of data quality in an organization you know well? Pick 2-3 examples to reflect upon.

– Accuracy (“Right data for what happened”): The data correctly defines the event
– Completeness (“Got the lot”): All the data are present
– Validity (“Defined performance”): The data fall between acceptable ranges defined by the business
– Consistency (“Apples vs. oranges”): The data elements are consistently defined and understood
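
Two of these aspects, completeness and validity, lend themselves to simple measurement. The sketch below is one possible way to compute them, assuming records are held as dictionaries. The `orders` data, the field names and the acceptable range of 1 to 100 are invented for illustration; in practice the range would come from the business.

```python
def completeness(rows, field):
    """Share of rows in which `field` is present and not None."""
    if not rows:
        return 0.0
    present = sum(1 for row in rows if row.get(field) is not None)
    return present / len(rows)


def invalid_rows(rows, field, low, high):
    """Rows whose `field` is present but outside the inclusive range [low, high]."""
    return [
        row
        for row in rows
        if row.get(field) is not None and not (low <= row[field] <= high)
    ]


# Made-up sample data, not real orders.
orders = [
    {"order_id": "A1", "quantity": 3},
    {"order_id": "A2", "quantity": None},
    {"order_id": "A3", "quantity": -5},
    {"order_id": "A4", "quantity": 40},
]

quantity_completeness = completeness(orders, "quantity")
out_of_range = invalid_rows(orders, "quantity", low=1, high=100)
print(quantity_completeness, out_of_range)
```

Missing values are deliberately excluded from the validity check so that one defect is not counted under two aspects.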



Data/Information Quality Recap
(Source: TDWI report)
– Accuracy – Does the data accurately represent reality?

– Integrity – Is the structure of data and relationships among entities and attributes maintained consistently?

– Consistency – Are data elements consistently defined and understood?

– Completeness – Is all necessary data present?

– Validity – Do data values fall within an acceptable range?

– Timeliness – Is data available when it is needed?

– Accessibility – Is the data easily accessible, understandable and usable?



Value & Risk Assessment of Different Organisational Information Assets (Hawley 1995)

