Chapter 14 Big Data & Data Science
REPUBLIC OF INDONESIA
INSPEKTORAT JENDERAL
WORKSHOP DAMA-DMBOK
BIG DATA & DATA SCIENCE
INSPEKTORAT I
Jakarta, 29 January 2021
2/3/2021 2
Inspektorat Jenderal toward IACM 4 2
1. Introduction
1. Introduction (2)
1. Introduction (3)
Diagram Context:
Big Data & Data Science
2 Essential Concepts
Sources of Big Data
Data Lake
Service-Based Architecture
Machine Learning
Sentiment Analysis
Data & Text Mining
Predictive Analysis
Prescriptive Analysis
Unstructured Data Analysis
Operational Analysis
Data Visualization
Data Mashups
2 Essential Concepts (2)
Data Science
Data Science Process
Big Data
Big Data Architecture Components
2 Essential Concepts (3)
Sources of Big Data
Data Lake
Service-Based Architecture
Machine Learning
2 Essential Concepts (4)
Sentiment Analysis
• Media monitoring and text analysis are automated methods for retrieving insights from large unstructured or semi-structured data.
• They are used to understand what people say and feel about brands, products, services, etc.
• Using Natural Language Processing (NLP), semantic analysis can detect sentiment and also reveal changes in sentiment to predict possible scenarios.

Data & Text Mining
• Data mining: analysis that reveals patterns in data using various algorithms.
• Text mining: analyzes documents to classify content automatically.
• Data and text mining use a range of techniques:
➢ Profiling
➢ Data Reduction
➢ Association
➢ Clustering
➢ Self-organizing maps

Predictive Analysis
• Sub-field of supervised learning where users attempt to model data elements and predict future outcomes through evaluation of probability estimates.
• Insight: What is likely to happen?

Prescriptive Analysis
• Defines actions that will affect outcomes, rather than just predicting the outcomes from actions that have occurred.
• Prescriptive analytics can continually take in new data to re-predict and re-prescribe. This process can improve prediction accuracy and result in better prescriptions.
• Scenario: What should we do to make things happen?
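As a rough illustration of the sentiment analysis idea above, the sketch below scores short comments against a tiny word list. The lexicon, the weights, and the sample comments are all hypothetical and chosen only for the example; production NLP would rely on much larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicon for illustration only (an assumption, not from
# the slides); real systems use far larger lexicons or trained NLP models.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "slow": -1, "terrible": -2}


def sentiment_score(text: str) -> int:
    """Sum the lexicon weights of the words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def label(score: int) -> str:
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up comments standing in for unstructured feedback about a service.
comments = [
    "Great service, I love it",
    "The app is slow and the support is terrible",
    "I received the letter on Monday",
]
labels = [label(sentiment_score(c)) for c in comments]
print(labels)
```

Tracking how these labels shift over time is one simple way to notice the changes in sentiment mentioned on the slide.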
2 Essential Concepts (5)
Unstructured Data Analysis
• Combines text mining, association, clustering, and other unsupervised learning techniques to codify large data sets.

Operational Analysis
• Activities like user segmentation, sentiment analysis, geocoding, and other techniques applied to data sets for marketing campaign analysis, etc.
• Operational analytics involves tracking and integrating real-time streams of information, deriving conclusions based on predictive models of behavior, and triggering automatic responses and alerts.

Data Visualization
• Process of interpreting concepts, ideas, and facts by using pictures or graphical representations.
• Data visualizations condense and encapsulate the characteristics of the data, making them easier to see.
• In doing so, they can surface opportunities, identify risks, or highlight messages.

Data Mashups
• Combine data and services to create visualization for insight or analysis.
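The operational analytics loop described above (track a stream, compare it with expected behavior, trigger an alert) might be sketched as follows. The rolling mean here is only a stand-in for a real predictive model, and the function name, window size, threshold, and readings are assumptions made for illustration.

```python
from collections import deque


def monitor(stream, window=3, threshold=1.5):
    """Yield (index, value) whenever a reading exceeds `threshold` times
    the mean of the previous `window` readings.

    The rolling mean is a simple stand-in for a predictive model of
    expected behavior (an assumption for this sketch).
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) == window:
            baseline = sum(recent) / window
            if value > threshold * baseline:
                yield i, value  # this is where an automatic alert would fire
        recent.append(value)


# Made-up readings, e.g. transactions per minute.
readings = [10, 11, 9, 10, 30, 12, 11]
alerts = list(monitor(readings))
print(alerts)
```

Because `monitor` is a generator, it can consume an unbounded stream and emit alerts as readings arrive rather than after the fact.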
3 Activities
1. Define Big Data Strategy & Business Need
❑Define the requirements that identify desired outcomes with measurable, tangible benefits.
❑A Big Data strategy must include criteria to evaluate:
✓What problems the organization is trying to solve
✓What data sources to use or acquire
✓The timeliness and scope of the data to provision
✓The impact on and relation to other data structures
✓The influence on existing modelled data

2. Choose Data Sources
❑Identify gaps in the current data asset base and find data sources to fill those gaps.
❑As more data becomes available, data needs to be evaluated for worth and reliability.
3 Activities (2)
3. Acquire and Ingest Data Sources
❑Data sources need to be found and loaded into the Big Data environment. During this process, capture critical Metadata about the source.
❑Once the data is in a data lake, it can be assessed for suitability for multiple analysis efforts.
❑Before integrating the data, assess its quality. The assessment process provides valuable insight into how the data can be integrated with other data sets.

4. Develop Data Hypotheses and Methods
❑Define model algorithm inputs, types, or model hypotheses and methods of analysis.
❑Each model will operate depending on the analysis method chosen. It should be tested for a range of outcomes.
❑Models depend on both the quality of input data and the soundness of the model itself.

5. Integrate and Align Data for Analysis
❑Preparing the data for analysis involves understanding what is in the data, finding links between data from the various sources, and aligning common data for use.
❑Apply appropriate data integration and cleansing techniques to increase quality and usefulness of provisioned data sets.
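One possible way to assess quality before integration, and to check how well two sources link, is sketched below. The two checks (field completeness and key overlap), the function names, and the sample records are assumptions made for illustration; they are not a method prescribed by the slides.

```python
def completeness(records, fields):
    """Fraction of records with a non-empty value, per field."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) not in (None, "")) / total
        for f in fields
    }


def key_overlap(left, right, key):
    """Share of distinct left-side key values that also appear on the right."""
    left_keys = {r[key] for r in left if r.get(key) not in (None, "")}
    right_keys = {r[key] for r in right if r.get(key) not in (None, "")}
    return len(left_keys & right_keys) / len(left_keys) if left_keys else 0.0


# Hypothetical source data sets sharing a `unit_id` key.
audits = [
    {"unit_id": "U1", "finding": "late report"},
    {"unit_id": "U2", "finding": ""},
    {"unit_id": "U3", "finding": "missing receipt"},
    {"unit_id": None, "finding": "duplicate payment"},
]
units = [
    {"unit_id": "U1", "name": "Unit A"},
    {"unit_id": "U3", "name": "Unit C"},
]

quality = completeness(audits, ["unit_id", "finding"])
overlap = key_overlap(audits, units, "unit_id")
print(quality, overlap)
```

A low completeness or overlap figure signals that cleansing or an alternative linking key may be needed before the data sets are aligned for analysis.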
3 Activities (3)
4 Tools
Advances in technology have created the Big Data and Data Science industry.

MPP Shared-nothing Technologies & Architecture
In MPP databases, data is logically distributed across multiple processing servers, with each server having its own dedicated memory to process data locally.

Distributed File-based Databases
Distributed file-based solution technologies, such as the open source Hadoop, are an inexpensive way to store large amounts of data in different formats.

In-database Algorithms
An in-database algorithm uses the principle that each of the processors in an MPP Shared-nothing platform can run queries independently, so a new form of analytics processing could be accomplished by providing mathematical and statistical functions at the computing node level.

Big Data Cloud Solutions
There are vendors who provide cloud storage and integration for Big Data, including analytic capabilities.

Statistical Computing and Graphical Languages
R is an open source scripting language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques.

Data Visualization Tools
Advanced visualization and discovery tools use in-memory architecture to allow users to interact with the data. A visual pattern can be picked up quickly when thousands of data points are loaded into a sophisticated display.
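The shared-nothing and in-database ideas above can be imitated in a few lines: each "node" computes partial statistics on its own partition, and a coordinator merges only those small partial results. This is a single-process sketch under assumed names (`partition`, `local_stats`, `combine`) and made-up values, not how any particular MPP product implements it.

```python
import math


def partition(rows, n_nodes):
    """Distribute rows round-robin across n_nodes independent partitions."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts


def local_stats(values):
    """Runs on one node: partial count, sum, and sum of squares."""
    return len(values), sum(values), sum(v * v for v in values)


def combine(partials):
    """Coordinator: merge partial results into a global mean and
    population standard deviation without moving the raw rows."""
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    variance = ss / n - mean * mean
    return mean, math.sqrt(max(variance, 0.0))


values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample data
partials = [local_stats(p) for p in partition(values, 3)]
mean, std = combine(partials)
print(mean, std)
```

Only three numbers per node travel to the coordinator, which is the appeal of pushing statistical functions down to the computing node level.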
5 Techniques
Analytic Modelling
❑Apply proven data modelling techniques while accounting for the variety of sources.
❑Develop the subject area model so it can be related to proper contextual entities and placed into the overall roadmap.
❑Understand how the data links across data sets.
6 Implementation Guidelines
7 Big Data & Data Science Governance
Sourcing:
What to source, when to source, and what the best source of data is for a particular study
Sharing:
What data sharing agreements and contracts to enter into, and under what terms and conditions, both inside and outside the organization
Metadata:
What the data means on the source side, and how to interpret the results on the output side
Enrichment:
Whether to enrich the data, how to enrich the data, and the benefits of enriching the data
Access:
What to publish, to whom, how, and when
THANK YOU