Rose Collins Data Warehouse Design Assignment 1
Data Warehouse Design
Assignment 1
Contents
Data Quality
Steps
Stages in data cleansing
Parsing
Correction
Standardization
Enhancement
Matching
Consolidation
Problems
Data Governance
Data quality metrics
Uniqueness
Accuracy
Consistency
Completeness
Timeliness
Currency
Conformance
Referential Integrity
Technology that supports your metrics
Monitor and Manage Ongoing Quality of Data
Data quality scorecard
References
Data Quality
Data quality means the data is of good quality, e.g. it contains no errors and no incorrect data has
been entered. There are stages and events you go through to get the data to that state.
You should run a baseline on your data quality, which will allow you to see where failures are likely
and prevent them from happening.
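As a rough illustration of what a baseline could look like, the sketch below counts missing values per field and repeated key values in a small set of records. The function name `baseline_profile`, the field names and the sample customers are all assumptions made for this example, not part of any particular tool.

```python
from collections import Counter


def baseline_profile(rows, key_field):
    """Return simple baseline counts for a list of dict records (hypothetical helper)."""
    # Collect every field name that appears in any record.
    fields = set()
    for row in rows:
        fields.update(row.keys())

    # Count empty or missing values per field.
    missing = Counter()
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1

    # Find key values that occur more than once.
    key_counts = Counter(row.get(key_field) for row in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)

    return {
        "total_rows": len(rows),
        "missing": dict(missing),
        "duplicate_keys": duplicate_keys,
    }


# Made-up sample data for illustration only.
customers = [
    {"customer_no": 1, "name": "Ann Byrne", "email": "ann@example.com"},
    {"customer_no": 2, "name": "", "email": "joe@example.com"},
    {"customer_no": 2, "name": "Joe Daly", "email": None},
]

profile = baseline_profile(customers, "customer_no")
print(profile)
```

Running the same profile again after cleansing gives a simple before-and-after comparison against the baseline.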
Steps:
Matching: comparing the data to see if you have similar data in your data warehouse. By
removing the duplicate data you will be able to see accurate data on your customers.
Problems with data matching:
A company changes its name
Two companies merge
Business acronyms, e.g. IBM
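One possible way to handle some of these matching problems is to normalise company names before comparing them. The sketch below is only an illustration: the `ALIASES` table, the list of legal suffixes and the 0.85 threshold are assumptions chosen for the example, and a real system would maintain its own lookup of acronyms, former names and merged companies.

```python
import re
from difflib import SequenceMatcher

# Hypothetical alias table mapping acronyms or old names to one standard name.
ALIASES = {"ibm": "international business machines"}

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "corp", "corporation", "plc", "co"}


def normalise(name):
    """Lower-case the name, strip punctuation and legal suffixes, then apply aliases."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in LEGAL_SUFFIXES]
    key = " ".join(words)
    return ALIASES.get(key, key)


def is_match(a, b, threshold=0.85):
    """Treat two names as the same company if their normalised forms are similar enough."""
    ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
    return ratio >= threshold


print(is_match("IBM", "International Business Machines Corp."))
print(is_match("Acme Ltd", "ACME Limited"))
print(is_match("Acme Ltd", "Apex Foods"))
```

A name change or a merger cannot be detected from the text alone, so in this sketch those cases would have to be added to the alias table by hand.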
Consolidation: once you have matched your data you can see relationships between
customers. This helps with target marketing and you get to know the customer better. You can have a
one-to-one relationship with your customer, e.g. you will know which products they will be interested
in by looking at their history. Even by looking at neighbourhoods you can see which products sell best
in which areas.
Two methods for consolidation:
1. Looking at all of the data on a customer.
2. Looking at links between customers, e.g. business grouping and householding grouping.
Householding links are between people who live in the same house, e.g. same address, same last
name.
Business links are between customers who work in the same company, e.g. same company name, address, dept.
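The householding idea can be sketched as grouping customers on last name plus address. The field names (`customer_no`, `last_name`, `address`) and the sample records below are assumptions for illustration; business grouping would work the same way with company name, address and department as the key.

```python
from collections import defaultdict


def household_key(customer):
    """Build a grouping key from last name and address, ignoring case and extra spaces."""
    address = " ".join(customer["address"].lower().split())
    return (customer["last_name"].strip().lower(), address)


def group_households(customers):
    """Return only the groups that contain more than one customer."""
    groups = defaultdict(list)
    for customer in customers:
        groups[household_key(customer)].append(customer["customer_no"])
    return {key: members for key, members in groups.items() if len(members) > 1}


# Made-up sample data for illustration only.
customers = [
    {"customer_no": 1, "last_name": "Murphy", "address": "12 Main St"},
    {"customer_no": 2, "last_name": "murphy", "address": "12  main st"},
    {"customer_no": 3, "last_name": "Kelly", "address": "12 Main St"},
]

households = group_households(customers)
print(households)
```

This exact-key approach only works once addresses have been standardized; otherwise "12 Main St" and "12 Main Street" would fall into different households.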
Problems
Some causes of data quality problems can be:
Increasing complexity of the data warehouse
Missing values in data sources
Failure to update sources in a timely manner
Poor data quality testing
Incorrect data relationships
Spelling mistakes
Abbreviations used instead of full names
Organizations see data quality as a system, technical or process problem when in fact it is very rarely
any of these things. Poor data quality is a behavioural problem, not a technology problem. By
behavioural I mean end users: it is the end users who input, append, update and maintain the data.
Data Governance
Data governance needs a governing body, a set of procedures and a plan to execute these procedures.
You must create a plan that says who is accountable for which aspects of the data, e.g. accuracy (is all
the data correct and up to date?) and accessibility (can I access the data and, if not, why?).
Then you need to define your processes for how the data is stored, backed up, etc.
Next, a set of procedures must be developed that defines how the data is used.
Finally, a set of controls and audit procedures must be put in place to ensure ongoing compliance
with government regulations.
Data quality metrics
Timeliness: the time between when the information is expected and when it is available.
Currency: how up to date the data is, i.e. the lifetime of the data before it needs to be checked and updated.
Conformance: whether instances of the data are stored in the expected form, e.g. data type, format.
Referential Integrity: unique keys are assigned to objects, e.g. customer no., product no. [2] This
introduces the expectation that any time an object identifier is used as a foreign key within a data set
to refer to the core representation, that core representation actually exists.
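The four metrics above could be checked along the following lines. This is a minimal sketch under stated assumptions: the function names, the `C` plus five digits customer number format, the 90-day lifetime and the sample rows are all invented for the example.

```python
import re
from datetime import datetime, timedelta


def timeliness(expected_at, available_at):
    """Delay between when the data was expected and when it became available."""
    return available_at - expected_at


def is_current(last_updated, now, max_age):
    """True if the data is still within its assumed lifetime."""
    return now - last_updated <= max_age


def conforms(value, pattern):
    """True if the whole value matches the expected format."""
    return re.fullmatch(pattern, value) is not None


def orphaned_keys(child_rows, parent_rows, key):
    """Foreign key values in child_rows that have no matching row in parent_rows."""
    parent_keys = {row[key] for row in parent_rows}
    return sorted({row[key] for row in child_rows if row[key] not in parent_keys})


# Made-up sample values for illustration only.
delay = timeliness(datetime(2011, 3, 1, 9, 0), datetime(2011, 3, 1, 11, 30))
fresh = is_current(datetime(2011, 1, 1), datetime(2011, 3, 1), timedelta(days=90))
valid_id = conforms("C00042", r"C\d{5}")

customers = [{"customer_no": 1}, {"customer_no": 2}]
orders = [{"order_no": 10, "customer_no": 1}, {"order_no": 11, "customer_no": 3}]
orphans = orphaned_keys(orders, customers, "customer_no")

print(delay, fresh, valid_id, orphans)
```

In this sample the order for customer 3 fails the referential integrity check because no customer 3 exists in the customer data.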
References
[1] Data Quality: The Foundation of One-to-One Customer Relationships, i.d centric from First Logic, white paper.
http://moodle1011.dkit.ie/file.php/12080/section_7/FirstLogic_256.pdf
[2] Monitoring Data Quality Performance Using Data Quality Metrics, with David Loshin, white paper.
http://moodle1011.dkit.ie/file.php/12080/2010_Additions/Monitoring_Data_Quality_Performance.pdf
[3] Monitoring Data Quality Performance Using Data Quality Metrics, with David Loshin, white paper.
http://moodle1011.dkit.ie/file.php/12080/2010_Additions/Monitoring_Data_Quality_Performance.pdf