Rose Collins Data Warehouse Design Assignment 1



Contents
Data Quality
  Steps
Stages in data cleansing
  Parsing
  Correction
  Standardization
  Enhancement
Matching
Consolidation
  Problems
Data Governance
Data quality metrics
  Uniqueness
  Accuracy
  Consistency
  Completeness
  Timeliness
  Currency
  Conformance
  Referential Integrity
  Technology that supports your metrics
  Monitor and Manage Ongoing Quality of Data
Data quality scorecard
References

Data Quality
Data quality describes how well the data serves its purpose: good-quality data contains no errors, and no incorrect data has been entered. There are stages and processes you go through to bring the data to a complete, usable state.
You should take a baseline measurement of your data quality. This lets you see where failures are likely to occur so you can prevent them from happening.
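A baseline can be as simple as profiling each field before any cleansing is done, so that later measurements have something to be compared against. The sketch below is a minimal illustration, assuming records are held as Python dicts; the function name `profile` and the sample customers are hypothetical.

```python
def profile(records, fields):
    """Baseline profile: for each field, count blank values and distinct non-blank values."""
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat None and whitespace-only strings as blank.
        filled = [v for v in values if v is not None and str(v).strip() != ""]
        report[field] = {"blank": len(values) - len(filled), "distinct": len(set(filled))}
    return report

# Hypothetical customer records before any cleansing.
customers = [
    {"id": 1, "name": "Robert Smith", "town": "Dundalk", "county": "Louth"},
    {"id": 2, "name": "Mary Byrne", "town": "", "county": "Louth"},
    {"id": 3, "name": "Rob Smith", "town": "Dundalk", "county": None},
]
baseline = profile(customers, ["name", "town", "county"])
for field, stats in baseline.items():
    print(field, stats)
```

Running the same profile again after each load and comparing it with the saved baseline shows whether blank fields are increasing over time.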

Steps:

Extraction: extracting the data from the source systems.
Transformation: transforming the data, then loading it so it can be cleansed.
Consolidation
Maintenance
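The first steps can be sketched as a chain of small functions, one per step. This is a minimal sketch, assuming each source system is just a Python list of dicts and the warehouse is a dict keyed by a hypothetical `id` field; a real warehouse would use a database and an ETL tool, and consolidation and maintenance are not shown here.

```python
def extract(sources):
    """Extraction: pull every row out of each source system (plain lists here)."""
    return [dict(row) for source in sources for row in source]

def transform(rows):
    """Transformation: trim stray whitespace and drop rows that have no key."""
    cleaned = []
    for row in rows:
        tidy = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if tidy.get("id"):
            cleaned.append(tidy)
    return cleaned

def load(rows, warehouse):
    """Loading: store each row in the warehouse (a dict here), keyed by its id."""
    for row in rows:
        warehouse[row["id"]] = row
    return warehouse

# Hypothetical source systems.
crm = [{"id": "C1", "name": " Robert Smith "}]
billing = [{"id": "C2", "name": "Mary Byrne"}, {"id": "", "name": "no key"}]

warehouse = load(transform(extract([crm, billing])), {})
print(warehouse)
```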

[1] Data quality is achieved in three stages: cleansing, matching, and consolidation.

Stages in data cleansing:


1. Data cleansing: the data is parsed, corrected, standardized, and enhanced for accurate matching. This means cleaning the data up and putting it into the appropriate fields.

Parsing: locates, identifies, and isolates individual data elements in the source data.

Problems with parsing:
Information in a field may not match its metadata profile
Data placed in the wrong field, e.g. a town in the county field
Blank data fields
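To make parsing concrete, the sketch below splits a free-text address line into separate street, town and county elements and reports any element it could not fill. It is a minimal sketch under simple assumptions: addresses are comma-separated, a part containing a digit is the street, and counties are recognised from a small hypothetical reference list `KNOWN_COUNTIES`. Real parsing tools use much richer reference data.

```python
# Hypothetical reference list of county names used to recognise the county element.
KNOWN_COUNTIES = {"Louth", "Meath", "Monaghan"}

def parse_address(raw):
    """Split a comma-separated address line into street, town and county elements."""
    parsed = {"street": "", "town": "", "county": ""}
    for part in (p.strip() for p in raw.split(",")):
        name = part[4:] if part.startswith("Co. ") else part
        if name in KNOWN_COUNTIES:
            parsed["county"] = name          # isolated county element
        elif any(ch.isdigit() for ch in part):
            parsed["street"] = part          # a house number suggests the street line
        elif part:
            parsed["town"] = part            # anything else is treated as the town
    return parsed

def blank_fields(parsed):
    """List the elements that parsing could not fill."""
    return [field for field, value in parsed.items() if not value]

print(parse_address("12 Main Street, Dundalk, Co. Louth"))
print(blank_fields(parse_address("Dundalk, Louth")))
```

Because the county is recognised by value rather than by position, a town typed into the county slot would be placed in the town field instead.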

Correction: cleansing and validating the data.

Problems with correction:
Spelling mistakes
Abbreviated names, e.g. Rob instead of Robert
Out-of-date data, e.g. a customer has changed address or has got married
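Two of these problems, abbreviated names and spelling mistakes, can be corrected against reference data. The sketch below assumes hypothetical lookup tables `NICKNAMES` and `VALID_TOWNS` and uses the standard library's `difflib.get_close_matches` to pick the nearest valid spelling. It is an illustration only: a nickname such as "rob" could also be short for Roberta, so real systems usually flag such changes for review, and out-of-date data needs a fresh source rather than a lookup.

```python
import difflib

# Hypothetical lookup tables; a real system would use maintained reference data.
NICKNAMES = {"rob": "Robert", "bob": "Robert", "liz": "Elizabeth"}
VALID_TOWNS = ["Dundalk", "Drogheda", "Ardee"]

def correct_first_name(name):
    """Expand a known short form (e.g. rob -> Robert), otherwise just tidy the case."""
    cleaned = name.strip()
    return NICKNAMES.get(cleaned.lower(), cleaned.title())

def correct_town(town, cutoff=0.8):
    """Replace a misspelt town with the closest valid town, if one is close enough."""
    matches = difflib.get_close_matches(town.strip().title(), VALID_TOWNS, n=1, cutoff=cutoff)
    return matches[0] if matches else town

print(correct_first_name("rob"), correct_town("dundalck"))
```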

Standardization: arranging the data into the correct format.

Problems with standardization:
Inconsistent abbreviations, e.g. Los Angeles as LA
Spelling mistakes, e.g. Melanie as Meliane
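Standardization usually means mapping every variant onto one agreed form. The sketch below does this for city names and dates. It assumes a hypothetical abbreviation table `CITY_ABBREVIATIONS` and a hypothetical list of incoming date layouts; it also assumes slash dates are day-first, which would be wrong for data from a source that writes dates month-first.

```python
from datetime import datetime

# Hypothetical mapping of inconsistent abbreviations to one standard form.
CITY_ABBREVIATIONS = {"LA": "Los Angeles", "NYC": "New York City"}

# Date layouts assumed to arrive from different source systems (day-first slashes).
DATE_FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]

def standardize_city(city):
    """Collapse extra spaces, expand known abbreviations, and use title case."""
    city = " ".join(city.split())
    return CITY_ABBREVIATIONS.get(city.upper(), city.title())

def standardize_date(text):
    """Return the date in ISO format (YYYY-MM-DD), or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

print(standardize_city("la"), standardize_date("25/12/2010"))
```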

Enhancement: adding new data and completing missing fields.

Problems with enhancement:
It takes time and costs money
Customers may not want to provide information, e.g. credit card details
Some data is sensitive and cannot be leaked to outside sources, e.g. bank details
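A simple form of enhancement fills a missing field from reference data without overwriting anything the customer already supplied, while sensitive fields are kept out of anything shared externally. The sketch assumes a hypothetical `TOWN_TO_COUNTY` reference table and a hypothetical set of `SENSITIVE_FIELDS`; in practice the reference data would come from a bought-in or maintained source.

```python
# Hypothetical reference data and sensitive-field list.
TOWN_TO_COUNTY = {"Dundalk": "Louth", "Navan": "Meath"}
SENSITIVE_FIELDS = {"card_number", "bank_account"}

def enhance(record):
    """Fill a blank county from the town; never overwrite an existing value."""
    enhanced = dict(record)
    if not enhanced.get("county") and enhanced.get("town") in TOWN_TO_COUNTY:
        enhanced["county"] = TOWN_TO_COUNTY[enhanced["town"]]
    return enhanced

def export_view(record):
    """Copy of the record with sensitive fields removed, for outside use."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}

customer = {"name": "Mary Byrne", "town": "Dundalk", "county": "", "card_number": "4111"}
print(export_view(enhance(customer)))
```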

Matching: comparing the data to see if you have similar records in your data warehouse. By removing the duplicate data you will be able to see accurate data on your customers.
Problems with data matching:
A company changes its name
Two companies merge
Business acronyms, e.g. IBM instead of International Business Machines
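One common way to match records is to reduce each name to a normalised "match key" and group records that share a key. The sketch below lowercases names, strips punctuation and legal suffixes such as Ltd, and expands acronyms from a hypothetical `ACRONYMS` table, so that IBM and International Business Machines Corp. match. It does not handle name changes or mergers; those need a history of company names, which is not shown.

```python
import re
from collections import defaultdict

ACRONYMS = {"ibm": "international business machines"}  # hypothetical lookup
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "corp", "corporation", "plc"}

def match_key(company):
    """Normalise a company name so that variants of the same name compare equal."""
    words = re.sub(r"[^a-z0-9 ]", " ", company.lower()).split()
    key = " ".join(w for w in words if w not in LEGAL_SUFFIXES)
    return ACRONYMS.get(key, key)

def find_duplicates(records):
    """Group record ids by match key and return only the groups with more than one id."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec["company"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical customer records.
records = [
    {"id": 1, "company": "IBM"},
    {"id": 2, "company": "International Business Machines Corp."},
    {"id": 3, "company": "Acme Ltd"},
    {"id": 4, "company": "ACME Limited"},
    {"id": 5, "company": "Globex"},
]
duplicates = find_duplicates(records)
print(duplicates)
```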

Consolidation: once you have matched your data you can see relationships between customers. This helps with targeted marketing and lets you get to know each customer better, giving you a one-to-one relationship with them; e.g. by looking at a customer's history you will know which products they are likely to be interested in. By looking at neighbourhoods you can even see which products sell best in which areas.
There are two methods of consolidation:
1. Looking at all of the data on a customer
2. Looking at links between customers, e.g. business grouping and household grouping.
Householding links are between people who live in the same house, e.g. same address, same last name.
Business links are between customers who work in the same company, e.g. same company name, address, department.
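Both kinds of link can be found the same way: build a key from the linking fields and group customers that share it. The sketch below is a minimal illustration with hypothetical field names (`last_name`, `address`, `company`, `dept`); it only ignores case and extra spaces, so it assumes addresses have already been standardized.

```python
from collections import defaultdict

def normalise(value):
    """Lowercase and collapse whitespace so small formatting differences don't matter."""
    return " ".join(str(value).lower().split())

def group_links(customers, key_fields):
    """Return groups of customer ids that share the same values in key_fields."""
    groups = defaultdict(list)
    for c in customers:
        key = tuple(normalise(c[f]) for f in key_fields)
        groups[key].append(c["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical matched customer records.
customers = [
    {"id": 1, "last_name": "Collins", "address": "4 Park Road", "company": "Acme", "dept": "Sales"},
    {"id": 2, "last_name": "collins", "address": "4  Park Road", "company": "Globex", "dept": "IT"},
    {"id": 3, "last_name": "Byrne", "address": "9 Hill Street", "company": "Acme", "dept": "Sales"},
]
households = group_links(customers, ["last_name", "address"])  # householding links
businesses = group_links(customers, ["company", "dept"])        # business links
print(households, businesses)
```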

Problems
Some causes of data quality problems can be:
The increasing complexity of the data warehouse
Missing values in data sources
Failure to update sources in a timely manner
Poor data quality testing
Incorrect data relationships
Spelling mistakes
Abbreviations used instead of full names
Organizations see data quality as a system, technical or process problem, when in fact it is very rarely any of these things. Poor data quality is a behavioural problem, not a technology problem. By behavioural I mean the end users: it is the end users who input, append, update and maintain the data.

Data Governance
Data governance needs a governing body, a set of procedures and a plan to execute these procedures.
You must create a plan that states who is accountable for which aspects of the data, e.g. accuracy (is all the data correct and up to date?) and accessibility (can I access the data, and if not, why not?).
Then you need to define your processes for how the data is stored, backed up, etc.
Next, a set of procedures must be developed that defines how the data is used.
Finally, a set of controls and audit procedures must be put in place to ensure ongoing compliance with government regulations.

Data quality metrics


Uniqueness: no entity exists more than once, and each entity has a unique key that accesses it, e.g. each product has a unique product number.

Accuracy: the data correctly represents the true values.

Consistency: the same data has the same value in one data set as in another.
Completeness: mandatory fields must contain a value, and some fields must contain a value when a given condition holds.
Timeliness: the time between when the information is expected and when it is actually available.
Currency: how up to date the data is, i.e. the lifetime of the data before it needs to be checked and updated.
Conformance: instances of the data are stored in the expected form, e.g. data type, format.
Referential Integrity: unique keys are assigned to objects, e.g. customer number, product number. [2] This introduces new expectations that any time an object identifier is used as a foreign key within a data set to refer to the core representation, that core representation actually exists.
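Several of these metrics can be expressed as simple ratios between 0 and 1. The sketch below computes uniqueness, completeness (mandatory plus conditional fields), conformance, referential integrity and currency over small in-memory tables. The tables, field names, phone pattern and one-year lifetime are all hypothetical choices for illustration. Accuracy, consistency and timeliness are left out because they need a trusted source of true values, a second data set, or expected arrival times.

```python
import re
from datetime import date

# Hypothetical sample data: a customer table and an orders table that refers to it.
customers = [
    {"cust_no": "C001", "name": "Robert Smith", "phone": "042-1234567",
     "married": False, "spouse": "", "verified": date(2010, 9, 1)},
    {"cust_no": "C002", "name": "Mary Byrne", "phone": "0421234567",
     "married": True, "spouse": "", "verified": date(2009, 1, 15)},
    {"cust_no": "C002", "name": "", "phone": "042-7654321",
     "married": False, "spouse": "", "verified": date(2010, 10, 20)},
]
orders = [{"order_no": 1, "cust_no": "C001"}, {"order_no": 2, "cust_no": "C009"}]

def uniqueness(rows, key):
    """Share of key values that are distinct (1.0 means no entity appears twice)."""
    keys = [r[key] for r in rows]
    return len(set(keys)) / len(keys)

def completeness(rows, mandatory, conditional):
    """Share of required values present: mandatory fields always,
    conditional fields only when their condition holds for that row."""
    checks = passed = 0
    for r in rows:
        required = list(mandatory) + [f for f, cond in conditional if cond(r)]
        for f in required:
            checks += 1
            passed += 1 if r.get(f) else 0
    return passed / checks if checks else 1.0

def conformance(rows, field, pattern):
    """Share of values whose format matches the expected pattern."""
    regex = re.compile(pattern)
    return sum(1 for r in rows if regex.fullmatch(str(r.get(field, "")))) / len(rows)

def referential_integrity(child_rows, fk, parent_rows, pk):
    """Share of foreign keys that point at a parent record that actually exists."""
    parent_keys = {p[pk] for p in parent_rows}
    return sum(1 for c in child_rows if c[fk] in parent_keys) / len(child_rows)

def currency(rows, field, today, max_age_days):
    """Share of records checked within the allowed lifetime."""
    return sum(1 for r in rows if (today - r[field]).days <= max_age_days) / len(rows)

scores = {
    "uniqueness": uniqueness(customers, "cust_no"),
    "completeness": completeness(customers, ["cust_no", "name"],
                                 [("spouse", lambda r: r["married"])]),
    "conformance": conformance(customers, "phone", r"\d{3}-\d{7}"),
    "referential integrity": referential_integrity(orders, "cust_no", customers, "cust_no"),
    "currency": currency(customers, "verified", date(2010, 11, 1), 365),
}
for metric, score in scores.items():
    print(f"{metric:<22} {score:.0%}")
```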

Technology that supports your metrics


A framework that monitors the data must integrate technology that can access the data and discover data quality issues. Use data quality rules: these rules distinguish between valid and invalid data, support cleansing of the invalid data, and allow conformance to the rules to be managed, measured, and reported.
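Data quality rules can be written as named checks, each with an optional cleansing step, and applied to every record. The sketch below splits records into valid and invalid sets and reports the pass rate for each rule. The two rules, their names and the sample records are hypothetical; a cleansing step is tried before a record is judged against its rule.

```python
import re

# Hypothetical rules: (name, check function, optional cleansing function or None).
RULES = [
    ("email has @", lambda r: "@" in r["email"], None),
    ("country is 2-letter code",
     lambda r: re.fullmatch(r"[A-Z]{2}", r["country"]) is not None,
     lambda r: {**r, "country": r["country"].strip().upper()}),
]

def apply_rules(records, rules):
    """Cleanse where possible, then split records into valid/invalid and report pass rates."""
    valid, invalid = [], []
    failures = {name: 0 for name, _, _ in rules}
    for rec in records:
        failed = False
        for name, check, fix in rules:
            if not check(rec) and fix is not None:
                rec = fix(rec)  # try to cleanse before judging
            if not check(rec):
                failures[name] += 1
                failed = True
        (invalid if failed else valid).append(rec)
    report = {name: 1 - count / len(records) for name, count in failures.items()}
    return valid, invalid, report

records = [
    {"email": "rose@example.com", "country": "IE"},
    {"email": "mary.example.com", "country": "ie "},
    {"email": "rob@example.com", "country": "Ireland"},
]
valid, invalid, report = apply_rules(records, RULES)
print(len(valid), len(invalid), report)
```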

Monitor and Manage Ongoing Quality of Data


[3] The most important component of data quality metrics is the ability to collect the statistics associated with the metrics, report them in a way that enables action to be taken, and provide a history of them so they can be tracked over time.
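Keeping a history can be as simple as storing dated snapshots of the metric scores and comparing the latest one with the previous one. The sketch below is a minimal in-memory version; the class name `MetricHistory`, the threshold and the sample scores are hypothetical, and a real system would store the snapshots in a table.

```python
from datetime import date

class MetricHistory:
    """Keeps a dated history of metric scores so changes can be tracked over time."""

    def __init__(self):
        self.snapshots = []  # list of (date, {metric: score}), kept in date order

    def record(self, when, scores):
        self.snapshots.append((when, dict(scores)))
        self.snapshots.sort(key=lambda snapshot: snapshot[0])

    def changes(self):
        """Difference between the latest snapshot and the previous one, per metric."""
        if len(self.snapshots) < 2:
            return {}
        (_, prev), (_, latest) = self.snapshots[-2], self.snapshots[-1]
        return {m: latest[m] - prev[m] for m in latest if m in prev}

    def alerts(self, threshold):
        """Metrics whose latest score is below the threshold, so action can be taken."""
        if not self.snapshots:
            return []
        _, latest = self.snapshots[-1]
        return sorted(m for m, score in latest.items() if score < threshold)

history = MetricHistory()
history.record(date(2010, 10, 1), {"completeness": 0.95, "uniqueness": 0.99})
history.record(date(2010, 11, 1), {"completeness": 0.90, "uniqueness": 0.99})
print(history.changes(), history.alerts(0.92))
```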

Data quality scorecard


A data quality scorecard is a management tool that captures snapshots of the quality of the data, shows the flaws that are impacting business operations, and shows where the worst flaws are in the system. You can then take action to fix and remove the sources of poor data quality quickly and efficiently.
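A scorecard needs some way of deciding which flaws are "worst". One simple approach, sketched below, multiplies each check's failure rate by a business impact weight and sorts the results so the most damaging flaws come first. The check names, counts and weights are hypothetical; in practice the weights would be agreed with the business.

```python
def scorecard(results, weights):
    """Rank data quality flaws so the worst ones (by weighted failure rate) come first.

    results: {check name: (records failed, records checked)}
    weights: {check name: business impact weight}; unlisted checks default to 1.0
    """
    rows = []
    for check, (failed, checked) in results.items():
        rate = failed / checked if checked else 0.0
        rows.append((check, rate, rate * weights.get(check, 1.0)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical snapshot of one day's checks and business-assigned weights.
results = {"missing address": (50, 1000), "duplicate customer": (20, 1000),
           "invalid phone": (100, 1000)}
weights = {"missing address": 5, "duplicate customer": 10, "invalid phone": 1}

card = scorecard(results, weights)
for check, rate, impact in card:
    print(f"{check:<20} failure rate {rate:6.1%}  weighted impact {impact:.2f}")
```

Note that the check with the highest raw failure rate (invalid phone) ends up last once business impact is taken into account.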

References
[1] "Data Quality: The Foundation of One-to-One Customer Relationships", i.d. centric, First Logic, white paper. http://moodle1011.dkit.ie/file.php/12080/section_7/FirstLogic_256.pdf
[2] "Monitoring Data Quality Performance Using Data Quality Metrics", with David Loshin, white paper. http://moodle1011.dkit.ie/file.php/12080/2010_Additions/Monitoring_Data_Quality_Performance.pdf
[3] "Monitoring Data Quality Performance Using Data Quality Metrics", with David Loshin, white paper. http://moodle1011.dkit.ie/file.php/12080/2010_Additions/Monitoring_Data_Quality_Performance.pdf
