
INTRODUCTION TO ANALYTICS
2022 – 2023
LESSON 2. DATA LIFE CYCLE
Learning Objectives

• Name and understand the phases of the data lifecycle
• Identify the processes and activities of each phase
• Recognize DAMA Framework knowledge areas
• Interpret a simple context diagram
• Describe how analytics fits into DAMA framework
• Discuss good and bad data
• Interpret XML data format
Agenda

1. Data lifecycle phases and activities
2. Context diagram example
3. DAMA DMBOK knowledge areas
4. Qualities of good data; five C’s
5. XML data format
Does the data have a life cycle?

Discuss the article given out as home assignment.

What happens to the data?

Where does it come from?

Where does it go?


DATA LIFE CYCLE
Module 2
Data Life Cycle

1. Sourcing
2. Storage & Preparation
3. Protection & Usage
4. Sharing
5. Archiving
6. Destruction
Data Life Cycle
Sourcing: Collecting and capturing data values from various sources. A.k.a. data capture / data acquisition.

Storage & preparation: Storing, maintaining and preparing data for usage. A.k.a. storage & maintenance.

Protection & usage: Applying data to the tasks needed to operate the enterprise while protecting the data. A.k.a. permitted use of data.

Sharing: Sending data to users or entities that require it for certain purposes, both inside and outside the enterprise. A.k.a. "publication".

Archiving: Retaining data that is no longer actively used, for a defined retention period.

Destruction: Removing every copy of a data item from the enterprise. A.k.a. purging / permanently destroying.
Data Life Cycle – processes: Sourcing

• Obtain data externally
• Create or enter data
• Receive and capture data signals
Data Life Cycle – processes: Storage & Preparation

• Move and store data
• Cleanse and enrich data
• Transform and synthesise data
• Integrate data from multiple sources
Data Life Cycle – processes: Protection & Usage

• Apply data to enterprise tasks
• Protect, monitor and audit usage
• Search, classify and explore data
• Model and analyse data
Data Life Cycle – processes: Sharing

• Data publication
• Visualization
• Data sharing, moving and copying
• Delivering data products to customers
Data Life Cycle – processes: Archiving

• Copying data into archive
• Removing archived data from active environments
Data Life Cycle – processes: Destruction

• Permanently destroying data

Data life cycle: group discussions

Why do enterprises purge (destroy) data?


Later in the course

Module 7: Analytics project basics
• Phases in analytics projects: how do they relate to the data life cycle?

Module 8: Legislative & security issues
• Permitted uses of data
• Data protection

Module 9: Ethical issues in analytics
• Ethical sharing of data
Data Life Cycle – processes

What knowledge and skills are needed to manage data through its lifecycle?

Phases: Sourcing, Storage & Preparation, Protection & Usage, Sharing, Archiving, Destruction
DMBOK KNOWLEDGE AREAS
Module 2
DAMA and DMBOK

DAMA International is a not-for-profit, vendor-independent, global association of technical and business professionals dedicated to advancing the concepts and practices of information and data management.

DAMA DMBOK ®: Data Management Association (DAMA) Data Management Body of Knowledge

https://dama.org/content/body-knowledge
DMBOK Data Management Knowledge Areas

Data Management is an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data. These processes overlap and interact within each data management knowledge area.

DAMA DMBOK Framework


Data Governance

DMBOK definition: Planning, oversight, and control over management of data and the use of data and data-related resources.

Processes & activities
Enforce:
• Consistent definitions
• Rules
• Business metrics
• Policies and procedures on how to use data
• Reference data
• Data ownership
Data Architecture

DMBOK definition: The overall structure of data and data-related resources as an integral part of the enterprise architecture.

Processes & activities
Define:
• Data needed to meet business needs
• Data, facts and dimensions
• Logical data models
• Enterprise data flows
Examine:
• Completeness and correctness of the source systems needed to obtain data
Context Diagram — example

[Figure: context diagram. Systems shown: Customer Self-Service App, Customer Relationship Management, Customer Transaction History, Order Management. Labeled data flows: service requests, customer information, request status & notifications, customer transactions, customer address, order status.]
Data Modeling & Design

DMBOK definition: Analysis, design, building, testing, and maintenance of data structures.

Processes & activities
Design and build:
• Conceptual, logical and physical data modeling
• Master data modeling
• Modeling and design for different architectures (data warehouse, data lake, cloud data storage etc.)
Data Storage & Operations

DMBOK definition: Deployment and management of storage for structured physical data assets.

Processes & activities
Manage:
• Building and operating data storage solutions
• Performance management, back-up and recovery of data assets
• Monitoring, archiving and purging of data assets
Data Security

DMBOK definition: Ensuring privacy, confidentiality and appropriate access to data.

Processes & activities
Define:
• Privacy and security
• Access management
• Security governance (monitoring, audit, breach responses)
• Data protection (encryption)
Data Integration & Interoperability

DMBOK definition: Acquisition, extraction, transformation, movement, delivery, replication, federation, virtualization and operational support of data assets.

Processes & activities
Manage:
• Data acquisition and movement
• Transformation
• Interoperability and integration
• Data migration and conversion
Documents & Content

DMBOK definition: Storing, protecting, indexing, and enabling access to data found in unstructured sources (electronic files and physical records), and making this data available for integration and interoperability with structured (database) data.

Processes & activities
Govern:
• Content management (classification, tagging, indexing)
• Managing physical documents
• Managing electronic records (documents, images, scans, multimedia)
Reference & Master Data

DMBOK definition: Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values.

Processes & activities
Govern:
• Establishing and managing systems of record
• Acquiring or creating systems of reference (business, spatial, market data)
• Data business rules
Data Warehousing & Business Intelligence

DMBOK definition: Managing analytical data processing and enabling access to decision support data for reporting and analysis.

Processes & activities
Govern:
• Data profiling and warehousing
• Data discovery, searching and querying
• Operational and analytical reporting
• Analytics
Metadata

DMBOK definition: Collecting, categorizing, maintaining, integrating, controlling, managing, and delivering metadata.

Processes & activities
Manage:
• Business glossary / data dictionary
• Data classification
Describing data: metadata

Image credit: John O’Gorman


Metadata: information about data

Metadata: description of the data as it is created, stored, transformed, accessed and consumed by the enterprise.

Business metadata: description of the data from a business perspective
• Business definition
• Meaning
• Source of the data

Technical metadata: description of the data as it is processed by software tools
• Format
• Size
• Mapping

Sources: Textbook Chapter 4



Metadata - example
Data Quality

DMBOK definition: Defining, monitoring, maintaining data integrity, and improving data quality.

Processes & activities
Govern:
• Planning data quality
• Implementing data quality measures
• Monitoring data quality
Business Insights & Analytics: how does it fit in?

Phases: Sourcing, Storage & Preparation, Protection & Usage, Sharing, Archiving, Destruction
GOOD AND BAD DATA
Module 2
The five C’s of data

Clean: Clean data must be accurate, have no missing data points, conform to the format and contain no invalid entries.

Consistent: Consistent data must follow the same standard and definitions, and use the same codes and ranges of values to reflect the same meaning.

Conformed: Conformed data must be shareable across the same dimensions with the same business meaning.

Current: Current data must be as recent as required for business purposes.

Comprehensive: Comprehensive data must be sufficient and complete for the purpose that this data is to be used for.

Sources: Textbook Chapter 1
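
To make some of the five C's concrete, here is a minimal Python sketch that flags records failing the "clean", "consistent" and "current" checks. The records, the field names, the allowed country codes, the reporting date and the one-year freshness rule are all assumptions invented for illustration; real rules would come from the business.

```python
from datetime import date

# Hypothetical customer records; fields and values are made up for illustration.
records = [
    {"id": 1, "country": "CA", "updated": date(2023, 1, 15)},
    {"id": 2, "country": "Canada", "updated": date(2019, 6, 1)},
    {"id": 3, "country": None, "updated": date(2023, 2, 1)},
]

ALLOWED_COUNTRIES = {"CA", "US"}  # assumed standard code list (consistency)
AS_OF = date(2023, 3, 1)          # assumed reporting date
MAX_AGE_DAYS = 365                # assumed business rule for "current"


def quality_issues(record):
    """Return a list of five-C violations found in one record."""
    issues = []
    # Clean: no missing data points
    if any(value is None for value in record.values()):
        issues.append("not clean: missing value")
    # Consistent: same codes used to reflect the same meaning
    if record["country"] is not None and record["country"] not in ALLOWED_COUNTRIES:
        issues.append("not consistent: non-standard country code")
    # Current: as recent as the business requires
    if (AS_OF - record["updated"]).days > MAX_AGE_DAYS:
        issues.append("not current: older than allowed")
    return issues


for record in records:
    print(record["id"], quality_issues(record))
```

Record 1 passes all three checks, record 2 uses a non-standard country value and is stale, and record 3 has a missing value.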


Can data be bad?

Where can bad data come from?

Provide an example of bad data from your personal or professional life.


https://www.dataquest.io/blog/advanced-data-cleaning-r-course/
XML DATA FORMAT
Module 2
Structured/Semi-Structured/Unstructured Examples

Structured
• Numbers
• Categories
• Codes
• Dates
• Character strings
• Binary (True/False)
• Rectangular datasets (spreadsheets, database tables)

Semi-Structured
• XML files
• Email
• JSON messages
• Digital photo files
• Accessible PDFs
• Website content

Unstructured
• Text
• Social media
• Satellite images
• Presentations
• PDFs
• Audio recordings
• Video
XML Basics

XML (eXtensible Markup Language):


• Text-based format used to share data
• Markup language – uses tags to describe pieces of data
• Metalanguage - allows users to define their own markup languages
• A specification for storing information
• A specification for describing the structure of that information
• Has a well-defined structure – must follow a set of rules

Example: https://learning-oreilly-com.ezproxy.humber.ca/library/view/xml-visual-quickstart/9780321602589/ch02.html
XML structure

A root element is required
Every XML document must contain one, and only one, root element. This root element contains all the other elements in the document.

All data (values) must be enclosed within tags
Every piece of data must have a defined place in an XML file, between a starting and a closing tag. The closing tag has the same name as the starting tag, with '/' in front.

Tags can have any names, but should describe the content
A user can pick any name for a tag; however, it should describe the element's purpose and contents.

Closing tags are required
Every element must have a closing tag. (An empty element may instead be written as a single self-closing tag, such as <child/>.)

Elements must be properly nested
If you start element A, then start element B, you must first close element B before closing element A.

Tags can have attributes (zero to many)
Information contained in an attribute is considered metadata: information about the data in the element, as opposed to the data itself. An element can have as many attributes as desired, as long as each has a unique name.

Indentation
It is good practice to indent child elements relative to their parents to make XML documents easier for a human to read and interpret (see examples in the source).

Source: XML: Visual QuickStart Guide, Second Edition, by Kevin Howard Goldberg. Published by Peachpit Press, 2008.
Nesting

Element hierarchy: the root element contains child elements, which in turn contain grandchild elements.

Properly nested:

<root>
  <child>
    <grandchild>
      Toopy
    </grandchild>
  </child>
</root>

Improperly nested (</child> appears before </grandchild> is closed):

<root>
  <child>
    <grandchild>
      Toopy
  </child>
    </grandchild>
</root>
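
The nesting rule can be checked mechanically. The sketch below uses Python's standard `xml.etree.ElementTree` parser, which rejects any document that is not well-formed; the helper name `is_well_formed` is an illustrative choice, and the two strings are compact versions of the nesting examples on this slide.

```python
import xml.etree.ElementTree as ET

# Compact versions of the two nesting examples
WELL_NESTED = "<root><child><grandchild>Toopy</grandchild></child></root>"
BADLY_NESTED = "<root><child><grandchild>Toopy</child></grandchild></root>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for mismatched, unclosed or improperly nested tags
        return False


print(is_well_formed(WELL_NESTED))
print(is_well_formed(BADLY_NESTED))
```

The same helper also catches a second root element or a closing tag whose capitalization differs from its starting tag.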
XML syntax

XML declaration
Should be included at the beginning of each XML file: <?xml version="1.0"?>

Case matters
XML is case sensitive. Starting and closing tags must use the same capitalization.

Tag names
Names must begin with a letter, underscore, or colon, and may contain letters, digits, underscores, hyphens, periods, and colons. Spaces are not allowed. Although colons, dashes, and periods are valid, it is recommended to avoid them within your names. In addition, you may not use names that begin with the letters xml, in any combination of upper- and lowercase.

Tag contents do not require any additional format
Everything between the starting and closing tags is considered the tag content.

Attribute values must be enclosed in quotation marks
An attribute's value must always be enclosed in matching single or double quotation marks. By convention, no spaces are placed between the attribute name, the equals sign, and the value.

White Space
You can add extra white space, including line breaks, around the elements in your XML code to make it easier to edit and view. While extra white space is visible in the file and when passed to other applications, it is ignored by the XML processor.

Language support
Tag and element names do not need to be in English; they can be in any language supported by the software used.

Comments
Comments, enclosed in <!-- and --> (double hyphen), can be inserted anywhere outside other markup.

Source: XML: Visual QuickStart Guide, Second Edition, by Kevin Howard Goldberg. Published by Peachpit Press, 2008.
Special characters in XML

Special character    XML replacement
<                    &lt;
>                    &gt;
&                    &amp;
"                    &quot;
'                    &apos;

Example: Dun & Bradstreet is written as Dun &amp; Bradstreet
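
In practice these replacements are rarely typed by hand. As a minimal sketch, Python's standard `xml.sax.saxutils` module provides `escape` and `unescape`; by default `escape` handles `&`, `<` and `>`, and the quote replacements are passed in as an extra mapping. The sample strings are illustrative.

```python
from xml.sax.saxutils import escape, unescape

raw = "Dun & Bradstreet"
escaped = escape(raw)
print(escaped)  # Dun &amp; Bradstreet

# Quotes are not escaped by default; supply them as an extra mapping
quote_entities = {'"': "&quot;", "'": "&apos;"}
attr_safe = escape('5 < 6 & "six"', quote_entities)
print(attr_safe)  # 5 &lt; 6 &amp; &quot;six&quot;

# unescape() reverses the default replacements
restored = unescape(escaped)
print(restored)  # Dun & Bradstreet
```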
XML example
<?xml version="1.0"?>
<family>
  <Parent>Yulia </Parent>
  <child>
    <name>Lucy</name>
    <DoB>7 /7 /2005 </DoB>
    <gender>female</gender>
  </child>
  <child>
    <name>Matt</name>
    <DoB>7/12/2002</DoB>
  </child>
  <child>
    <name>Preetika</name>
    <DoB>7/7/2007</DoB>
  </child>
</family>
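
As a hedged illustration of how a program might read this document, the sketch below parses the family example with Python's standard `xml.etree.ElementTree`. The function name `read_children` and the clean-up choices (trimming names, removing spaces inside dates) are assumptions for illustration, not part of the lesson. Note that tag lookups are case sensitive and that a missing element, such as `<gender>` for two of the children, comes back as `None`.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """<?xml version="1.0"?>
<family>
  <Parent>Yulia </Parent>
  <child>
    <name>Lucy</name>
    <DoB>7 /7 /2005 </DoB>
    <gender>female</gender>
  </child>
  <child>
    <name>Matt</name>
    <DoB>7/12/2002</DoB>
  </child>
  <child>
    <name>Preetika</name>
    <DoB>7/7/2007</DoB>
  </child>
</family>"""

root = ET.fromstring(FAMILY_XML)


def read_children(family):
    """Collect one dictionary per <child> element, tidying stray spaces."""
    rows = []
    for child in family.findall("child"):
        rows.append({
            "name": (child.findtext("name") or "").strip(),
            "dob": (child.findtext("DoB") or "").replace(" ", ""),
            "gender": child.findtext("gender"),  # None when the element is absent
        })
    return rows


children = read_children(root)
for row in children:
    print(row)

# Case matters: <Parent> exists, <parent> does not
print(root.find("parent") is None)
```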

<Average_daily>

<Average_monthly>
XML example – dates

Using a date attribute:

<note date="2008-01-10">
  <to>Tove</to>
  <from>Jani</from>
  <subj>Hello there</subj>
</note>

Using a <date> element:

<note>
  <date>2008-01-10</date>
  <to>Tove</to>
  <from>Jani</from>
</note>

Using an expanded <date> element:

<note>
  <date>
    <year>2008</year>
    <month>01</month>
    <day>10</day>
  </date>
  <to>Tove</to>
  <from>Jani</from>
</note>
https://www.w3schools.com/xml/xml_attributes.asp
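
The three layouts carry the same date, but a program has to read each one differently. The following Python sketch, using the standard `xml.etree.ElementTree` and `datetime` modules, shows one way to handle all three; the function name `note_date` is hypothetical and the XML strings are compact versions of the examples above.

```python
import xml.etree.ElementTree as ET
from datetime import date

AS_ATTRIBUTE = '<note date="2008-01-10"><to>Tove</to><from>Jani</from></note>'
AS_ELEMENT = "<note><date>2008-01-10</date><to>Tove</to><from>Jani</from></note>"
AS_EXPANDED = (
    "<note><date><year>2008</year><month>01</month><day>10</day></date>"
    "<to>Tove</to><from>Jani</from></note>"
)


def note_date(xml_text: str) -> date:
    """Extract the note's date regardless of which layout is used."""
    note = ET.fromstring(xml_text)
    # Layout 1: the date is metadata stored in an attribute
    if "date" in note.attrib:
        return date.fromisoformat(note.attrib["date"])
    date_el = note.find("date")
    if date_el is None:
        raise ValueError("note has no date")
    # Layout 2: a <date> element with text and no child elements
    if len(date_el) == 0:
        return date.fromisoformat(date_el.text.strip())
    # Layout 3: an expanded <date> element with year, month and day children
    return date(
        int(date_el.findtext("year")),
        int(date_el.findtext("month")),
        int(date_el.findtext("day")),
    )


for layout in (AS_ATTRIBUTE, AS_ELEMENT, AS_EXPANDED):
    print(note_date(layout))
```

The expanded layout needs no string parsing at all, which is one reason it can be easier to validate and query.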
XML vs JSON example
XML:

<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>

JSON:

{
  "student": [
    {
      "id":"01",
      "name": "Tom",
      "lastname": "Price"
    },
    {
      "id":"02",
      "name": "Nick",
      "lastname": "Thameson"
    }
  ]
}
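
To show that the two formats describe the same data, here is a minimal Python sketch that converts the XML version into the JSON version using the standard `xml.etree.ElementTree` and `json` modules. It assumes the flat structure of this example (each `<student>` holds only simple text elements) and is not a general-purpose converter.

```python
import json
import xml.etree.ElementTree as ET

STUDENTS_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<root>
  <student>
    <id>01</id>
    <name>Tom</name>
    <lastname>Price</lastname>
  </student>
  <student>
    <id>02</id>
    <name>Nick</name>
    <lastname>Thameson</lastname>
  </student>
</root>"""

root = ET.fromstring(STUDENTS_XML)

# Each <student> becomes a dictionary keyed by its child tag names
students = [
    {field.tag: field.text for field in student}
    for student in root.findall("student")
]

as_json = json.dumps({"student": students}, indent=2)
print(as_json)
```

The values stay strings ("01", not 1), matching the JSON on the slide; XML text content carries no data type of its own.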

JSON vs XML: What’s the Difference?
