Ajay Bigdata Unit 2
AJAY KUMAR
Assistant Professor-II
Computer Science & Engineering
Information life-cycle
management forms one of
the foundational pillars in
the management of data
within an enterprise.
It is the platform on which
the three pillars of data
management are designed.
The first pillar represents
process, the second
represents the people, and
the third represents the
technology.
Phases of the Information Life Cycle
Capturing Data
Preserving data
Grouping data
Processing data
Publishing data
Archiving data
Removing data
Governance
Information and program governance
are two important aspects of managing
information within an enterprise.
Information governance deals with
setting up governance models for data
within the enterprise and program
governance deals with implementing
the policies and processes set forth in
information governance.
Both of these tasks are fairly people-
specific as they involve both the
business user and the technology teams.
A governance process is a
multistructured organization of people
who play different roles in managing
information.
Data Governance Teams
Technology
Implementing the program from a concept to reality within data governance falls in the technology layers. There are several different technologies that are used to implement the different aspects of governance. These include tools and technologies used in data acquisition, data cleansing, data transformation, and database code such as stored procedures, programming modules coded as application programming interface (API), semantic technologies, and metadata libraries.
Data quality
● Is implemented as a part of the data movement and transformation processes.
● Is developed as a combination of business rules developed in ETL/ELT programs and third-party data enrichment processes.
● Is measured in percentage of corrections required per execution per table. The lower the percentage of corrections, the higher the quality of data.
Data enrichment
● We have always enriched data to improve its accuracy and information quality.
● In the world of Big Data, data enrichment is accomplished by integrating taxonomies, ontologies, and third-party libraries as a part of the data processing architecture.
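The data quality measure in this section (percentage of corrections required per execution per table) can be illustrated with a small sketch. The slides do not state the denominator, so this example assumes it is the number of rows processed in that execution. The `TableRun` record and the `correction_pct` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableRun:
    """Counts from one ETL/ELT execution against one table (hypothetical shape)."""
    table: str
    rows_processed: int
    rows_corrected: int


def correction_pct(run: TableRun) -> float:
    """Percentage of processed rows that needed correction in this execution.

    Assumption: the denominator is rows processed; the slides do not specify it.
    """
    if run.rows_processed == 0:
        return 0.0  # nothing processed, so nothing needed correcting
    return 100.0 * run.rows_corrected / run.rows_processed


# Illustrative counts only, not real measurements.
runs = [
    TableRun("customer", rows_processed=2000, rows_corrected=50),
    TableRun("product", rows_processed=500, rows_corrected=100),
]

# One figure per table for this execution; a lower value means higher quality.
report = {run.table: correction_pct(run) for run in runs}
```

Tracking this figure per table across successive executions shows whether the quality rules are reducing the number of corrections over time.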
Enriched data will provide the user capabilities:
● To define and manage hierarchies.
● To create new business rules on-the-fly for tagging and classifying the data.
● To process text and semi-structured data more efficiently.
● To explore and process multilingual and multistructured data analysis.
Data transformation
● Is implemented as part of ETL/ELT processes.
● Is defined as business requirements by the user teams.
● Uses master data and metadata program outputs for referential data processing and data standardization.
● Is developed by IT teams.
● Includes auditing and traceability framework components for recording data manipulation language (DML) outputs and rejects from data quality and integrity checks.
Data archival and retention
● Is implemented as part of the archival and purging process.
● Is developed as a part of the database systems by many vendors.
● Is often misquoted as a database feature.
● Often fails when legacy data is imported back due to lack of correct metadata and underlying structural changes. This can be avoided easily by exporting the metadata and the master data along with the data set.
Master data management
● Is implemented as a standalone program.
● Is implemented in multiple cycles for customers and products.
● Is implemented for location, organization, and other smaller data sets as an add-on by the implementing organization.
● Is measured as a percentage of changes processed every execution from source systems.
● Is operationalized as business rules for key management across operational, transactional, warehouse, and analytical data.
Metadata
● Is implemented as a data definition process by business users.
● Has business-oriented definitions for data for each business unit. One central definition is regarded as the enterprise metadata view of the data.
● Has IT definitions for metadata related to data structures, data management programs, and semantic layers within the database.
● Has definitions for semantic layers implemented for business intelligence and analytical applications.
All the technologies used in the processes described above have a database, a user interface for managing data, rules and definitions, and reports available on the processing of each component and its associated metrics.
Measuring the impact of information life-cycle management