BIA 5000 Introduction To Analytics - Lesson 2
2022 – 2023
LESSON 2.
DATA LIFE CYCLE
Learning Objectives
Sourcing → Storage & Preparation → Protection & Usage → Sharing → Archiving → Destruction
Data Life Cycle
Sourcing: Collecting and capturing data values from various sources. A.k.a. data capture / data acquisition.
Storage & preparation: Storing, maintaining and preparing data for usage. A.k.a. storage & maintenance.
Protection & usage: Applying data to the tasks needed to operate the enterprise while protecting the data. A.k.a. permitted use of data.
Sharing: Sending data to users or entities that require it for specific purposes, both inside and outside the enterprise. A.k.a. "publication".
Archiving: Keeping data that is no longer actively used for a defined retention period.
Destruction: Removing every copy of a data item from the enterprise. A.k.a. purging / permanent destruction.
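The stage order and the archiving rule above can be made concrete in a few lines. The sketch below is a simplification that assumes the stages form a strict linear sequence, and it uses a hypothetical seven-year retention period; real retention periods come from enterprise policy or regulation, not from this lesson.

```python
from datetime import date, timedelta
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The six data life cycle stages, in the order listed above."""
    SOURCING = 1
    STORAGE_AND_PREPARATION = 2
    PROTECTION_AND_USAGE = 3
    SHARING = 4
    ARCHIVING = 5
    DESTRUCTION = 6


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after Destruction."""
    if stage is Stage.DESTRUCTION:
        return None
    return Stage(stage + 1)


# Hypothetical retention period (7 x 365 days, ignoring leap days).
RETENTION_PERIOD = timedelta(days=7 * 365)


def due_for_destruction(archived_on: date, today: date) -> bool:
    """True once an archived item has been kept for the full retention period."""
    return today - archived_on >= RETENTION_PERIOD


print(next_stage(Stage.SHARING).name)                           # ARCHIVING
print(due_for_destruction(date(2015, 1, 1), date(2023, 1, 1)))  # True
```

In practice data can loop back (for example, archived data restored for a new analysis), so a real model would allow more transitions than `next_stage` does.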
Data Life Cycle – processes: Sourcing
• Obtain data externally
• Create or enter data
• Receive and capture data signals
Data Life Cycle – processes: Storage & Preparation
• Move and store data
• Cleanse and enrich data
• Transform and synthesise data
• Integrate data from multiple sources
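The cleanse, transform, enrich and integrate steps listed above often happen together. A minimal sketch, assuming two hypothetical sources (a CRM list and a billing list, with made-up field names and records) that share a customer id:

```python
# Two hypothetical sources that describe the same customers in different ways.
crm = [
    {"id": "C001", "name": "  Lucy Chen ", "email": "Lucy@Example.com"},
    {"id": "C002 ", "name": "Matt Ross", "email": " matt@example.com"},
]
billing = [
    {"customer_id": "C001", "balance": "120.50"},
    {"customer_id": "C002", "balance": "0"},
]


def cleanse(record: dict) -> dict:
    """Remove stray spaces and put e-mail addresses in lower case."""
    return {
        "id": record["id"].strip(),
        "name": record["name"].strip(),
        "email": record["email"].strip().lower(),
    }


def integrate(customers: list, accounts: list) -> list:
    """Join cleansed customers to their balances on the shared customer id."""
    # Transform: balances arrive as text and are converted to numbers.
    balances = {a["customer_id"]: float(a["balance"]) for a in accounts}
    merged = []
    for customer in map(cleanse, customers):
        # Enrich each customer with its balance; None means no billing record was found.
        merged.append({**customer, "balance": balances.get(customer["id"])})
    return merged


for row in integrate(crm, billing):
    print(row)
# {'id': 'C001', 'name': 'Lucy Chen', 'email': 'lucy@example.com', 'balance': 120.5}
# {'id': 'C002', 'name': 'Matt Ross', 'email': 'matt@example.com', 'balance': 0.0}
```

Note that the trailing space in `"C002 "` would break the join if cleansing were skipped, which is why cleansing usually comes before integration.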
Data Life Cycle – processes: Protection & Usage
• Apply data to enterprise tasks
• Protect, monitor and audit usage
• Search, classify and explore data
• Model and analyse data
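"Protect, monitor and audit usage" can be illustrated with field masking plus an access log. This is a sketch only: the roles, field names and masking rule are hypothetical, and production systems would rely on database or platform access controls rather than application code like this.

```python
from datetime import datetime, timezone

# Hypothetical policy: the fields each role is permitted to see unmasked.
PERMITTED_FIELDS = {
    "analyst": {"customer_id", "province", "balance"},
    "billing_clerk": {"customer_id", "name", "province", "balance"},
}

audit_log = []  # one entry per access, kept for later monitoring and audit


def read_record(record: dict, user: str, role: str) -> dict:
    """Return a copy of `record` with non-permitted fields masked, and log the access."""
    allowed = PERMITTED_FIELDS.get(role, set())  # unknown roles see nothing unmasked
    view = {field: (value if field in allowed else "***") for field, value in record.items()}
    audit_log.append({
        "user": user,
        "role": role,
        "masked": sorted(field for field in record if field not in allowed),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return view


customer = {"customer_id": "C001", "name": "Lucy", "province": "ON", "balance": 120.5}
print(read_record(customer, user="asmith", role="analyst"))
# {'customer_id': 'C001', 'name': '***', 'province': 'ON', 'balance': 120.5}
print(audit_log[-1]["masked"])  # ['name']
```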
Data Life Cycle – processes: Sharing
• Data publication
• Visualization
• Data sharing, moving and copying
• Delivering data products to customers
Module 7: Analytics project basics
• Phases in analytics projects – how do they relate to the data life cycle?
DMBOK KNOWLEDGE AREAS
Module 2
DAMA and DMBOK
DAMA DMBOK®: Data Management Association (DAMA) Data Management Body of Knowledge
https://dama.org/content/body-knowledge
DMBOK Data Management Knowledge Areas
Data Management is an overarching term that describes the processes used to plan, specify, enable, create, acquire, maintain, use, archive, retrieve, control, and purge data. These processes overlap and interact within each data management knowledge area.
DMBOK definition: Planning, oversight, and control over management of data and the use of data and data-related resources.
Order
Management
Data Modeling & Design
DMBOK definition: Managing shared data to reduce redundancy and ensure better data quality through standardized definition and use of data values.
Business Insights & Analytics: how does it fit in?
GOOD AND BAD DATA
Module 2
The five C's of data
Consistent: consistent data follows the same standards and definitions, and uses the same codes and ranges of values to convey the same meaning.
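Consistency can be enforced by mapping source variants onto one standard code set and checking values against an agreed range. A minimal sketch; the province code list, the variant spellings and the age range are hypothetical choices for illustration:

```python
# Hypothetical standard: the allowed province codes and an agreed age range.
ALLOWED_PROVINCES = {"ON", "QC", "BC"}
PROVINCE_VARIANTS = {"ontario": "ON", "ont.": "ON", "quebec": "QC", "british columbia": "BC"}
AGE_RANGE = (0, 120)


def standardise_province(value: str) -> str:
    """Map a raw value onto the standard code, or raise if it is outside the code set."""
    key = value.strip().lower()
    code = PROVINCE_VARIANTS.get(key, key.upper())
    if code not in ALLOWED_PROVINCES:
        raise ValueError(f"{value!r} is not a recognised province code")
    return code


def age_in_range(age: int) -> bool:
    """True if the age falls inside the agreed range of values."""
    low, high = AGE_RANGE
    return low <= age <= high


print([standardise_province(v) for v in ["Ontario", "ON", " ont. ", "qc"]])  # ['ON', 'ON', 'ON', 'QC']
print(age_in_range(34), age_in_range(-2))  # True False
```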
Text, Numbers, Categories, Codes, Dates, Character strings, Binary (True/False), Rectangular datasets (spreadsheets, database tables)
Social media, XML files, Satellite images, Email, Presentations, JSON messages, PDFs, Digital photo files, Audio recordings, Accessible PDFs, Video, Website content
XML Basics
Example: https://learning-oreilly-com.ezproxy.humber.ca/library/view/xml-visual-quickstart/9780321602589/ch02.html
XML structure
A root element is required
Every XML document must contain one, and only one, root element. This root element contains all the other elements in the document.
All data (values) must be enclosed within tags
Every piece of data must have a defined place in an XML file, between a starting tag and a closing tag. The closing tag has the same name as the starting tag, with '/' in front.
Tags can have any valid name, but should describe the content
A user can pick any name for a tag; however, it should describe the element's purpose and contents.
Closing tags are required
Every element must have a closing tag. An empty element may instead use a single self-closing tag, such as <gender/>.
Elements must be properly nested
If you start element A and then start element B, you must close element B before closing element A.
Tags can have attributes (zero to many)
Information contained in an attribute is considered metadata: information about the data in the element, as opposed to the data itself. An element can have as many attributes as desired, as long as each has a unique name.
Indentation
It is good practice to indent child elements relative to their parents to make XML documents easier for a human to read and interpret (see examples in the source).
XML: Visual QuickStart Guide, Second Edition, by Kevin Howard Goldberg. Published by Peachpit Press, 2008
Nesting
Properly nested (root element > child element > grandchild element):
<root>
   <child>
      <grandchild>
      </grandchild>
   </child>
</root>
Improperly nested (<child> is closed while <grandchild> is still open):
<root>
   <child>
      <grandchild>
      Toopy
   </child>
      </grandchild>
</root>
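The structure rules can be tried out with a parser. A minimal sketch, assuming Python 3 and its standard-library `xml.etree.ElementTree` module (a tool choice made here, not one the slides prescribe), that reports whether a string is well-formed:

```python
import xml.etree.ElementTree as ET

# The two nesting examples from the slide, written as strings.
properly_nested = """<root>
   <child>
      <grandchild>
      </grandchild>
   </child>
</root>"""

improperly_nested = """<root>
   <child>
      <grandchild>
      Toopy
   </child>
      </grandchild>
</root>"""


def is_well_formed(text: str) -> bool:
    """Return True if ElementTree's parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(properly_nested))    # True
print(is_well_formed(improperly_nested))  # False: </child> arrives while <grandchild> is open
print(is_well_formed("<a></a><b></b>"))   # False: two root elements
print(is_well_formed("<a><b></a>"))       # False: </a> arrives while <b> is still open
print(is_well_formed("<a><b/></a>"))      # True: self-closing empty element
```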
XML syntax
XML declaration
Should be included at the beginning of each XML file: <?xml version="1.0"?>
Case matters
XML is case sensitive. Starting and closing tags must use the same capitalization.
Tag names
Names must begin with a letter, underscore, or colon, and may contain letters, digits, underscores, hyphens, periods, and colons. Spaces are not allowed.
Although these characters are valid, it is recommended to avoid colons, hyphens, and periods within your names. In addition, you may not use names that begin with the letters xml, in any combination of upper- and lowercase.
Tag content does not require any additional formatting
Everything between the starting and closing tags is considered the element's content.
Attribute values must be enclosed in quotation marks
An attribute's value must always be enclosed in matching single or double quotation marks. The XML specification allows white space around the equals sign, but the usual convention is to write name="value" with no spaces.
White Space
You can add extra white space, including line breaks, around the elements in your XML code to make it easier to edit and view. This white space stays in the file and is passed along to applications, but white space between elements is generally treated as insignificant.
Language support
Tag and element names do not need to be in English; they can be in any language supported by the software used.
Comments
Comments can be inserted between elements (but not inside a tag or before the XML declaration), enclosed in <!-- and -->. Each delimiter contains a double hyphen, so a comment must not itself contain --.
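Several of the syntax rules above can be checked the same way. This sketch again assumes Python's standard-library `xml.etree.ElementTree`; the sample tags and values are made up for illustration:

```python
import xml.etree.ElementTree as ET


def parses(text: str) -> bool:
    """Return True if `text` is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# The XML declaration must come first; a comment in front of it is rejected.
print(parses('<?xml version="1.0"?><a/>'),
      parses('<!-- note --><?xml version="1.0"?><a/>'))  # True False
# Case matters: <Name> cannot be closed by </name>.
print(parses("<Name>Lucy</Name>"), parses("<Name>Lucy</name>"))  # True False
# Names may start with a letter or underscore, but not a digit, and cannot contain spaces.
print(parses("<_note/>"), parses("<1note/>"), parses("<my note/>"))  # True False False
# Attribute values need matching quotes; white space around '=' is permitted.
print(parses('<child id="1"/>'), parses("<child id='1'/>"),
      parses("<child id=1/>"), parses('<child id = "1"/>'))  # True True False True
# Comments between elements are fine; ElementTree's default parser drops them from the tree.
doc = ET.fromstring('<child id="1"><!-- illustrative comment --><name>Lucy</name></child>')
print(doc.get("id"), doc.find("name").text, len(doc))  # 1 Lucy 1
```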
Special characters in XML
&  is written as  &amp;
<  is written as  &lt;
>  is written as  &gt;
"  is written as  &quot;
'  is written as  &apos;
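Python's standard library can do this substitution. In this sketch, `xml.sax.saxutils.escape` always replaces &, < and >, and the extra mapping passed to it adds the two quote characters; `unescape` reverses the process. The sample sentence is made up:

```python
from xml.sax.saxutils import escape, unescape

# escape() always replaces &, < and >; this extra mapping adds the two quote characters.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {v: k for k, v in QUOTE_ENTITIES.items()}

raw = 'Tom & Jerry\'s note: "5 < 6"'
encoded = escape(raw, QUOTE_ENTITIES)
print(encoded)                                # Tom &amp; Jerry&apos;s note: &quot;5 &lt; 6&quot;
print(unescape(encoded, QUOTE_CHARS) == raw)  # True
```

Inside element content only & and < strictly need escaping; the quote entities matter mainly inside attribute values that use the same kind of quotation mark.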
XML example
<?xml version="1.0"?>
<family>
   <Parent>Yulia </Parent>
   <child>
      <name>Lucy</name>
      <DoB>7 /7 /2005 </DoB>
      <gender>female</gender>
   </child>
   <child>
      <name>Matt</name>
      <DoB>7/12/2002</DoB>
   </child>
   <child>
      <name>Preetika</name>
      <DoB>7/7/2007</DoB>
   </child>
</family>
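Reading this example programmatically shows why the date values matter. The sketch below uses Python's `xml.etree.ElementTree` and assumes, as a hypothetical choice, that the dates are written day/month/year; under a month/day/year reading, 7/12/2002 would instead mean July 12. It strips the stray spaces and converts each date to ISO 8601 (YYYY-MM-DD):

```python
import xml.etree.ElementTree as ET
from datetime import datetime

FAMILY_XML = """<?xml version="1.0"?>
<family>
  <Parent>Yulia </Parent>
  <child><name>Lucy</name><DoB>7 /7 /2005 </DoB><gender>female</gender></child>
  <child><name>Matt</name><DoB>7/12/2002</DoB></child>
  <child><name>Preetika</name><DoB>7/7/2007</DoB></child>
</family>"""

root = ET.fromstring(FAMILY_XML)


def to_iso(raw: str) -> str:
    """Remove stray spaces and convert an ASSUMED day/month/year date to ISO 8601."""
    cleaned = raw.replace(" ", "")
    return datetime.strptime(cleaned, "%d/%m/%Y").date().isoformat()


# Tag names are case sensitive: <Parent> is found, "parent" is not.
print(root.findtext("Parent").strip(), root.find("parent"))  # Yulia None
for child in root.findall("child"):
    name = child.findtext("name")
    dob = to_iso(child.findtext("DoB"))
    gender = child.findtext("gender", default="(not recorded)")
    print(name, dob, gender)
# Lucy 2005-07-07 female
# Matt 2002-12-07 (not recorded)
# Preetika 2007-07-07 (not recorded)
```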
<Average_daily>
<Average_monthly>
XML example – dates