Download as pdf or txt
Download as pdf or txt
You are on page 1of 84

MACHIN

E LEARNING
Day One
PROFESSIONAL
TRAINING
MACHINE LEARNING FOR
PROFESSIONALS

Day One Session Three


Machine Learning Library
MACHINE LEARNING LIBRARIES
MACHINE LEARNING LIBRARIES

• Google Collab environment or Jupyter Notebook


• Python
• Numpy
• Pandas
• Matplotlib/ Seaborn
GOOGLECOLLAB AND JUPYTERNOTEBOOK

• Google Collab environment or Jupyter Notebook


PYTHON

• Python
NUMPY

A python
library used for working
with arrays. It also has
functions for working in
domain of linear
algebra, fourier
transform, and matrices.
SCIPY
Advanced maths
library based on
NumPy
PANDAS FORDATA ANALYSIS
MATPLOTLIB/ SEABORN
MATHPLOT
LIB 3.3.2
• Plotting library
SEABORN
0.11
• Data visualization
library based on
matplotlib
ALGORITHMIC LIBRARIES
SCIKIT-
LEARN 0.23.2
• Machine learning
library
STATSMOD
ELS0.12.0
Provides classes and
functions for the
estimation of many
different statistical
models, as well as
for conducting
statistical tests, and
statistical data
exploration
MACHINE LEARNING FOR
PROFESSIONALS

Day One Session Four


Data Analyisis and Visualisation
DATA ANALYSIS AND VISUALISATION

• Statistical Analysis
• Visualisation
STATISTICAL ANALYSIS

• Summarize main characteristics of the data


• Gain better understanding of the dataset
• Uncover relationships between different features/ variables
STATISTICAL ANALYSIS
STATISTICAL ANALYSIS

• Boxplot
STATISTICAL ANALYSIS

• Relation between features


(and classes)
STATISTICAL ANALYSIS

• Analysis of Variant(ANOVA)
• A statistical test that can be used to find the correlation between different groups of a categorical
variable.
• The ANOVA test returns two variables
• F-test score: ratio of variation between the groups’ mean over the variation within each of the sample
groups.
• P-value shows whether the obtained result is statistically significant.
STATISTICAL ANALYSIS
• F-test score
STATISTICAL ANALYSIS
• Correlation
STATISTICAL ANALYSIS
• (Pearson) Correlation
STATISTICAL ANALYSIS
• (Pearson) Correlation
STATISTICAL ANALYSIS
• Heatmap of Correlations
DATA
VISUALIZATION

DS / A I Professional Course

ITB
DAT
A The representation and
VISUALIZATI presentation of data that
ON exploits our visual
perception abilities in
order to amplify cognition
REPRESENTATION

• The way you decide to depict data through a choice of physical forms.

Taken from bbc.co.uk


• You are taking data as the raw material and creating a representation to
best portray its attributes.

31/03/21 IF4061 Visualisasi Data dan Informasi _DPL


PRESENTATION

• It goes beyond the representation


of data
• Concerns how you integrate your
data representation into the
overall communicated work

Including the choice of:


• Colors
• Layout
• Annotations
• interactive features, etc

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


PRESENTATION(2)

31/03/21 IF4061 Visualisasi Data dan Informasi _DPL


AMPLIFY COGNITION

Maximizing how efficiently and effectively we


are able to process information into thought,
insights, and knowledge.

IF4061 Visualisasi Data dan Informasi _ DPL 31/03/21


DATA VIZ PURPOSE

1. Data Analysis
To understand data and to take information comprehensively
The greatest value of a picture is when it forces us to notice what we never expected to see
John W Tukey

A picture is worth 10,000 words


(anonymous)

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


WHAT CAN WE FIND?

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


LETSVISUALIZINGIT!

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


LETSVISUALIZINGIT!!

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


LETSVISUALIZINGIT!!

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


LETSVISUALIZINGIT!!

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


ADVANTAGE
OFDATA
ANALYSIS • Understand large data (faster)

USIN • Capture important properties of data

G • Capture problems

VISUALIZATI • Tool for quality control


ON • Facilitate new hypothesis
DATA VIZ PURPOSE

2. Communication

To communicate information:

Incorporated simplification and tonal (feeling)

IF4061 Visualisasi Data dan Informasi _DPL 31/03/21


IF4061 Visualisasi Data dan Informasi _DPL 31/03/21
IF4061 Visualisasi Data dan Informasi _DPL 31/03/21
METODOLOGI
7 STAGES OF VISUALIZING DATA
(FRY, 2008), PP. 5

1. Acquire:Obtain the data.


2. Parse:Provide some structure for the data’s meaning and order it into
categories.
3. Filter: Remove all but the data of interest.
4. Mine: Apply methods from statistics or data mining as a way to
discern patterns or place the data in mathematical context.
5. Represent:Choose a basic visual model, such as a bar graph, list, or
tree.
6. Refine: Improve the basic representation to make it clearer and more
visually engaging.
7. Interact:Add methods for manipulating the data or
controlling what features are visible.

Note: stages are often iterative and may have a flexible order or
even be omitted in some projects.

IF4061 - Data and Information Visualization


79
CHOOSINGTHEAPPROPRIATECHART
TYPE
1. Does it accommodate the physical
properties of your data?
2. Does it facilitate the desired degree of
accuracy?
3. Is it potentially capable of conveying a
certain metaphorical and design
consistency with our subject matter?
80
FIVE VISUALIZATIONMETHODS

01 02 03 04 05

Comparison Part-of-a- Changes Charting Map


Spatial
Whole over Time Relationship Data
1. COMPARING CATEGORICAL VALUES

To facilitate comparisons between the relative and absolute sizes of


categorical values.

Example: bar chart

82
COMPARING CATEGORIES: DOTPLOT

• Data variables: 2 x categorical, 1 x quantitative ratio.


• Visual variables: Position, color-hue, symbol.
83
COMPARING CATEGORIES:
FLOATINGBAR(ORGANTTCHART)

• Data variables: 1 x categorical-nominal, 2 x quantitative.


• Visual variables: Position, length

84
COMPARINGCATEGORIES: HISTOGRAM

Histogram vs Bar Chart?

• Histograms show distribution


through the frequency of
quantitative values (y axis)
against defined intervals of
quantitative values(x axis).
• By contrast, bar charts facilitate
comparison of categorical
values.
• One of the distinguishing
features of a histogram is the
lack of gaps between the bars

• Data variables: 1 x quantitative-interval, 1 x quantitative-ratio.

• Visual variables: Height,width.

85
COMPARINGCATEGORIES: SLOPEGRAPH
(ORBUMPS CHARTORTABLECHART)

• for comparing two (or


more) sets of
quantitative values
when they are
associated with the
same categorical
value.
• A before and after
view or comparison
of two different points
in time

• Data variables: 1 x categorical, 2 x quantitative.


• Visual variables: Position, connection, color-hue. 86
COMPARING CATEGORIES: RADIAL

CHART

• Data variables: Multiple x categorical, 1 x categorical-ordinal.


• Visual variables: Position, color-hue, color-saturation/lightness, texture. 87
COMPARINGCATEGORIES:
SANKEYDIAGRAM

• Data variables: Multiple x categorical, multiple x quantitative.


• Visual variables: Height, position, link, width, color-hue. 88
COMPARING CATEGORIES:
WORDCLOUD

• Data variables: 1 x categorical, 1 x quantitative-ratio.


• Visual variables: Size.
89
2. ASSESSING HIERARCHIES AND PART-
OF-A-WHOLE RELATIONSHIPS
• a breakdown of categorical values in their
relationship to a population of values
• or as constituent elements of hierarchical structures.

• Example: pie chart.

90
ASSESSING HIERARCHIES: SQUA

REPIE (ORUNIT CHARTORWAFFLECHA RT)

• Data variables: 1 x categorical, 1 x quantitative-ratio.


• Visual variables: Position, color-hue, symbol.
91
ASSESSING HIERARCHIES:
STAC KEDBAR CHART(ORSTACKED COLUMNCHART)

• Data variables: 2 x categorical, 1 x quantitative-ratio.


• Visual variables: Length, color-hue, position, color-saturation/lightness.
92
ASSESSING HIERARCHIES:
TREEMAP

• Data variables: Multiple x categorical-nominal, 1 x quantitative-


ratio. 93
ASSESSING HIERARCHIES:
BUBBLE HIERARCHY

• Data variables: Multiple x categorical, 1 x quantitative-ratio.


94
• Visual variables: Area, position, color-hue.
3. SHOWING CHANGESOVERTIME

• To exploit temporal data and show the changing trends


• Patterns of values over a continuous timeframe.

Example:
line
chart.

95
SHOWING CHANGESOVERTIME:
SPARKLINES

• Data variables: 1 x quantitative-interval, 1 x quantitative-ratio.


• Visual variables: Position, slope.
96
SHOWING CHANGESOVERTIME:
AREA CHART

• Data variables: 1 x quantitative-interval, 1 x categorical, 1 x quantitative-


ratio.
• Visual variables: Height, slope, area, color-hue.

Area chart vs Line Chart? 97


SHO WING CHANGESOVERTIME:
HORI ZON CHART

• Data variables: 1 x quantitative-interval, 1 x categorical, 2 x


quantitative-ratio.
• Visual variables: Height, slope, area, color-hue, color- 98
SHOWING CHANGESOVERTIME:
STACKED

AREACHART

• Data variables: 1 x quantitative-interval, 1 x categorical, 1 x


quantitative-ratio.
99
• Visual variables: Height, area, color-hue.
SHOWING CHANGESOVERTIME:
STREAM GRAPH

• Data variables: 1 x quantitative-interval, 1 x categorical, 1 x


quantitative-ratio.
100
• Height, area, color-hue.
SHOWING CHANGESOVERTIME:
CANDLES

TICKCHART

• Data variables: 1 x quantitative-interval, 4 x quantitative-ratio.


101
• Visual variables: Position, height, color-hue.
SHOWING CHANGESOVERTIME:
BARCODECHART

• Data variables: 1 x quantitative-interval, 3 x categorical.


• Visual variables: Position, symbol, color-hue. 102
4. CHARTING ANDGRAPHING
RELATIONSHIPS
• To assess the associations, distributions, and patterns that exists between
multivariate datasets.

• Usually focuses on facilitating exploratory analysis.

Example : scatter plot.

103
PLOTTING CONNECTIONSAND
RELATIONSHIPS: BUBBLEPLOT

• Data variables: 3 x quantitative, 1 x categorical.


104
• Visual variables: Position, area, color-hue.
200 COUNTRIES, 200 YEARS, 4 MINUTES - THE
JOY OFSTATS

105
PLOTTING CONNECTIONSAND
RELATIONSHIPS:
SCATTERPLOTMATRIX

• Data variables: 2 x quantitative, 2 x categorical.


• Visual variables: Position, area, color-hue.
106
PLOTTING CONNECTIONSAND
RELATIONSHIPS:
HEATMAP (ORMATRIXCHART)

• Data variables: Multiple x categorical, 1 x quantitative-ratio.


• Visual variables: Position, color-saturation. 107
PLOTTING CONNECTIONSAND
RELATIONSHIPS:
PARALLELSETS(ORPARALLELCOORDINATES)

• Data variables: Multiple x categorical, multiple x quantitative-ratio.


108
• Visual variables: Position, width, link, color-hue.
PLOTTING CONNECTIONSAND
RELATIONSHIPS:
RADIAL NETWORK(ORCHORDDIAGRAM)

• Data variables: Multiple x categorical, 2 x quantitative-ratio.


• Visual variables: Position, connection, width, color-hue,
color-lightness, symbol, size. 109
PLOTTING CONNECTIONSAND
RELATIONSHIPS:
NETWORKDIAGRAM

• Data variables: Multiple x categorical-nominal, 1 xquantitative-


ratio. 110
5. MAPPING GEO-SPATIAL DATA
To plot and present datasets with geo-spatial properties via the many
different mapping frameworks.

Example:
choropleth map

111
MAPPING GEO-SPATIAL DATA:
DOTPLOTMAP
• Data variables: 2 x quantitative-interval.
• Visual variables : Position.

112
MAPPING GEO-SPATIAL DATA:
BUBBLE PLOTMAP
• Data variables: 2 x quantitative-interval, 1 x quantitative-ratio, 1
x categorical-nominal.
• Vis ual variables : Position, area, color-hue.

113
MAPPING GEO-SPATIAL DATA:
ISARITHMIC MAP
• Data variables: Multiple x quantitative, multiple x categorical.
• Visual variables: Position, color-hue, color-saturation, color-
darkness.

114
MAPPING GEO-SPATIAL DATA:
PARTICLEFLOWMAP
• Data variables: Multiple x quantitative.
• Visual variables: Position, direction, thickness, speed.

115
MAPPING GEO-SPATIAL DATA:
CARTOGRAM
• Data variables: 2 x quantitative-interval, 1 x quantitative-
ratio.
• Visual variables: Position, size.

116
MAPPING GEO-SPATIAL DATA:
DORLING CARTOGRAM
• Data variables: 2 x categorical, 1 x quantitative-ratio.
• Visual variables: Position, size, color-hue.

117
MAPPING GEO-SPATIAL DATA:
NETWORKCONNECTIONMAP
• Data variables: 2 x quantitative-interval, 1 x categorical-nominal.
• Visual variables: Position, link, color-hue.

118

You might also like