Professional Documents
Culture Documents
DSAI-ProfTraining-DayOne - Session 3 - 4
DSAI-ProfTraining-DayOne - Session 3 - 4
E LEARNING
Day One
PROFESSIONAL
TRAINING
MACHINE LEARNING FOR
PROFESSIONALS
• Python
NUMPY
A python
library used for working
with arrays. It also has
functions for working in
domain of linear
algebra, fourier
transform, and matrices.
SCIPY
Advanced maths
library based on
NumPy
PANDAS FORDATA ANALYSIS
MATPLOTLIB/ SEABORN
MATHPLOT
LIB 3.3.2
• Plotting library
SEABORN
0.11
• Data visualization
library based on
matplotlib
ALGORITHMIC LIBRARIES
SCIKIT-
LEARN 0.23.2
• Machine learning
library
STATSMOD
ELS0.12.0
Provides classes and
functions for the
estimation of many
different statistical
models, as well as
for conducting
statistical tests, and
statistical data
exploration
MACHINE LEARNING FOR
PROFESSIONALS
• Statistical Analysis
• Visualisation
STATISTICAL ANALYSIS
• Boxplot
STATISTICAL ANALYSIS
• Analysis of Variant(ANOVA)
• A statistical test that can be used to find the correlation between different groups of a categorical
variable.
• The ANOVA test returns two variables
• F-test score: ratio of variation between the groups’ mean over the variation within each of the sample
groups.
• P-value shows whether the obtained result is statistically significant.
STATISTICAL ANALYSIS
• F-test score
STATISTICAL ANALYSIS
• Correlation
STATISTICAL ANALYSIS
• (Pearson) Correlation
STATISTICAL ANALYSIS
• (Pearson) Correlation
STATISTICAL ANALYSIS
• Heatmap of Correlations
DATA
VISUALIZATION
DS / A I Professional Course
ITB
DAT
A The representation and
VISUALIZATI presentation of data that
ON exploits our visual
perception abilities in
order to amplify cognition
REPRESENTATION
• The way you decide to depict data through a choice of physical forms.
1. Data Analysis
To understand data and to take information comprehensively
The greatest value of a picture is when it forces us to notice what we never expected to see
John W Tukey
G • Capture problems
2. Communication
To communicate information:
Note: stages are often iterative and may have a flexible order or
even be omitted in some projects.
01 02 03 04 05
82
COMPARING CATEGORIES: DOTPLOT
84
COMPARINGCATEGORIES: HISTOGRAM
85
COMPARINGCATEGORIES: SLOPEGRAPH
(ORBUMPS CHARTORTABLECHART)
CHART
90
ASSESSING HIERARCHIES: SQUA
Example:
line
chart.
95
SHOWING CHANGESOVERTIME:
SPARKLINES
AREACHART
TICKCHART
103
PLOTTING CONNECTIONSAND
RELATIONSHIPS: BUBBLEPLOT
105
PLOTTING CONNECTIONSAND
RELATIONSHIPS:
SCATTERPLOTMATRIX
Example:
choropleth map
111
MAPPING GEO-SPATIAL DATA:
DOTPLOTMAP
• Data variables: 2 x quantitative-interval.
• Visual variables : Position.
112
MAPPING GEO-SPATIAL DATA:
BUBBLE PLOTMAP
• Data variables: 2 x quantitative-interval, 1 x quantitative-ratio, 1
x categorical-nominal.
• Vis ual variables : Position, area, color-hue.
113
MAPPING GEO-SPATIAL DATA:
ISARITHMIC MAP
• Data variables: Multiple x quantitative, multiple x categorical.
• Visual variables: Position, color-hue, color-saturation, color-
darkness.
114
MAPPING GEO-SPATIAL DATA:
PARTICLEFLOWMAP
• Data variables: Multiple x quantitative.
• Visual variables: Position, direction, thickness, speed.
115
MAPPING GEO-SPATIAL DATA:
CARTOGRAM
• Data variables: 2 x quantitative-interval, 1 x quantitative-
ratio.
• Visual variables: Position, size.
116
MAPPING GEO-SPATIAL DATA:
DORLING CARTOGRAM
• Data variables: 2 x categorical, 1 x quantitative-ratio.
• Visual variables: Position, size, color-hue.
117
MAPPING GEO-SPATIAL DATA:
NETWORKCONNECTIONMAP
• Data variables: 2 x quantitative-interval, 1 x categorical-nominal.
• Visual variables: Position, link, color-hue.
118