
Data Analytics

Box Plots
• Box plots are useful as they show:
• the median (middle value) of a data set
• the skewness of a data set
• the dispersion of a data set
• outliers within a data set
Box plots: skewness
• Symmetric distribution: the median is in the middle of the box, and the whiskers are about the same length on both sides of the box.
• Skewed right: the median is closer to the bottom of the box, and the whisker is shorter on the lower end of the box.
• Skewed left: the median is closer to the top of the box, and the whisker is shorter on the upper end of the box.
Box plots: dispersion (variability, spread,
scatter)

• Range: the smallest and largest values are found at the ends of the ‘whiskers’ and provide a visual indicator of the spread of scores (largest value minus smallest value).

• The interquartile range (IQR) is shown by the box itself, which spans the middle 50% of scores; it is calculated by subtracting the lower quartile from the upper quartile (Q3 − Q1).
• The longer the box, the more dispersed the data.
Box Plot: Outliers
• When reviewing a box plot, an outlier is defined as a data point located outside the whiskers of the box plot (commonly, more than 1.5 × IQR below the lower quartile or above the upper quartile).
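The 1.5 × IQR convention above (Tukey's fences) can be sketched in plain Python. The quartile interpolation and the sample sales figures are illustrative assumptions, not part of the slides:

```python
# Sketch of the common 1.5 * IQR outlier rule (Tukey's fences),
# one typical way box-plot whiskers are placed.
def find_outliers(values):
    """Return values lying outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    data = sorted(values)
    n = len(data)

    def quartile(p):
        # Simple linear-interpolation percentile (illustrative; statistical
        # packages offer several quartile conventions).
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        frac = idx - lo
        return data[lo] + frac * (data[hi] - data[lo])

    q1, q3 = quartile(0.25), quartile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

sales = [12, 14, 15, 15, 16, 17, 18, 19, 20, 55]
print(find_outliers(sales))  # the 55 lies beyond the upper fence
```

On this data Q1 ≈ 15 and Q3 ≈ 18.75, so the fences sit at roughly 9.4 and 24.4, flagging only the 55.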
Example: Sales Data
Univariate, Bivariate and
Multivariate analysis
Univariate Analysis
• When the data contains only one variable and cause-and-effect relationships are not of interest, univariate analysis is the technique to use.
The key objective of univariate analysis is simply to describe the data
and find patterns within it.
• This is done by looking at the mean, median, mode, dispersion,
variance, range, standard deviation, etc.
• Univariate analysis is conducted in several ways, most of which are
descriptive in nature:
• Frequency Distribution Tables
• Histograms
• Frequency Polygons
• Pie Charts
• Bar Charts
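The descriptive measures listed above can be sketched with Python's standard library; the score values below are made up for illustration:

```python
# Univariate summary statistics using only the standard library.
import statistics

scores = [70, 72, 75, 75, 78, 80, 82, 85, 90]

print("mean:   ", statistics.mean(scores))
print("median: ", statistics.median(scores))   # middle value: 78
print("mode:   ", statistics.mode(scores))     # most frequent: 75
print("range:  ", max(scores) - min(scores))   # largest - smallest: 20
print("variance (population):", statistics.pvariance(scores))
print("std dev  (population):", statistics.pstdev(scores))
```

Each line describes the single variable on its own, with no reference to any second variable, which is exactly what distinguishes univariate analysis.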
Bivariate analysis
• When the data set contains two variables and the aim is to
compare them, bivariate analysis is the right type of analysis technique.
• For example: the proportion of students who scored above 85%, broken down
by gender
• two variables
• gender = X (independent variable)
• result = Y (dependent variable)
A bivariate analysis will measure the correlation between the two
variables.
Bivariate analysis
• Correlation coefficients
• Correlation is a statistical technique in which the strength of the
association between two variables is observed. The strength is rated,
from strong to weak, on a scale of −1
to 1, where 1 is a perfect direct correlation, −1 is a perfect inverse
correlation, and 0 is no correlation.
Revisit variance
• Variance is calculated by taking the difference between each number in a data set and
the mean, squaring those differences to make them positive, and dividing the sum
of the resulting squares by the number of values in the set.

• Variance treats all deviations from the mean of the data set in the same way,
regardless of direction.
• Squaring ensures that the deviations cannot sum to zero, which would give
the appearance that there was no variability in the data set at all.
• It gives added weight to numbers that are far from the mean, i.e. outliers.
• Squaring these numbers can at times result in skewed interpretations of the data set as a
whole.
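The verbal recipe above translates directly into code. This is a minimal sketch of the population variance, with an invented data set:

```python
# Step-by-step variance exactly as described: take each deviation from
# the mean, square it, sum the squares, divide by the number of values.
def population_variance(values):
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return sum(squared_deviations) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5
print(population_variance(data))  # 4.0
```

Note how the squared deviations (9, 1, 1, 1, 0, 0, 4, 16) are all non-negative, so they cannot cancel to zero, and the outlier-ish 9 contributes the single largest term.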
Covariance
• Covariance is a measure of how much two random variables vary
together.
• Similar to variance, but where variance tells you how a single variable
varies, covariance tells you how two variables vary together.
Covariance
• Covariance signifies the direction of the linear relationship between
the two variables.
• if the variables are directly proportional or inversely proportional to each
other.
• Increasing the value of one variable might have a positive or a negative
impact on the value of the other variable.
• The value of covariance can be any number from −∞ to +∞.
• Covariance only measures how two variables change together, not the
dependency of one variable on another.
Covariance
• Covariance can be used to measure variables that have different units of
measurement.
• Covariance shows whether the variables increase or decrease together,
but it cannot quantify the degree to which the
variables are moving together, because covariance does
not use one standardized unit of measurement.
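A minimal sketch of population covariance from its definition: the average product of the paired deviations from each variable's mean. The study-hours/marks numbers are made up:

```python
# Population covariance: average of the products of paired deviations.
def covariance(x, y):
    assert len(x) == len(y), "covariance needs paired observations"
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y)
               for xi, yi in zip(x, y)) / len(x)

hours = [1, 2, 3, 4, 5]
marks = [52, 58, 62, 70, 78]   # rises with hours -> positive covariance
print(covariance(hours, marks))  # 12.8
```

The sign (positive here) tells us the variables move in the same direction, but the magnitude 12.8 is in mixed units (hours × marks), which is why covariance alone cannot say how strong the relationship is.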
Correlation or joint variability
• Correlation or joint variability tells you about the relation between a
pair of variables in a dataset. Useful measures include covariance and
the correlation coefficient.
• relationship between the corresponding elements of two variables in
a dataset.
• two variables, 𝑥 and 𝑦, with an equal number of elements, 𝑛.
• Let 𝑥₁ from 𝑥 correspond to 𝑦₁ from 𝑦,
• 𝑥₂ from 𝑥 to 𝑦₂ from 𝑦, and so on.
• there are 𝑛 pairs of corresponding elements: (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂), and so on.
Correlation
• The correlation coefficient is the term used to refer to the resulting correlation measurement. It will always
maintain a value between one and negative one.
• When the correlation coefficient is one, the variables under examination have a perfect positive correlation.
In other words, when one moves, so does the other in the same direction, proportionally.
• If the correlation coefficient is less than one, but still greater than zero, it indicates a less than perfect
positive correlation. The closer the correlation coefficient gets to one, the stronger the correlation between
the two variables.
• When the correlation coefficient is zero, it means that there is no identifiable relationship between the
variables. If one variable moves, it’s impossible to make predictions about the movement of the other
variable.
• If the correlation coefficient is negative one, this means that the variables are perfectly negatively or
inversely correlated. If one variable increases, the other will decrease at the same proportion. The variables
will move in opposite directions from each other.
• If the correlation coefficient is greater than negative one but less than zero, it indicates an imperfect negative
correlation. As the coefficient approaches negative one, the negative correlation grows stronger.
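The correlation coefficient can be sketched as the covariance divided by the product of the two standard deviations (the population form of Pearson's r); the data sets are invented to show the two perfect cases:

```python
# Pearson's r: covariance scaled by both standard deviations, which
# normalises the result to the range [-1, 1].
import math

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    std_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    std_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return cov / (std_x * std_y)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 4))   # y = 2x: perfect positive, 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 4))   # decreasing: perfect negative, -1.0
```

Because the covariance is divided by both standard deviations, the units cancel, which is what gives the coefficient its fixed −1 to 1 scale regardless of the variables' units.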
Examples
Pearson Correlation
The measure of correlation is called the coefficient of correlation.

Assumptions:
• Each observation should have a pair of values.
• Each variable should be continuous.
• Each variable should be normally distributed.
• There should be no outliers.
Limitation:
• Pearson correlation cannot tell the difference
between dependent variables and independent
variables.
Visual
• Scatter plots
• Heat maps – correlation matrix
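A correlation matrix, the usual input to a heat map, can be sketched with pandas (assumed to be available); the column names and figures are invented for illustration:

```python
# Pairwise Pearson correlations across several columns at once.
import pandas as pd

df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score":    [52, 58, 62, 70, 78],
    "absences":      [9, 7, 6, 3, 2],
})

corr = df.corr()       # symmetric matrix, diagonal is all 1.0
print(corr.round(2))

# The matrix can then be drawn as a heat map, e.g. with seaborn:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, vmin=-1, vmax=1)
```

Each cell holds the correlation coefficient for one pair of columns, so strongly related pairs (here, hours studied and exam score) stand out at a glance in the colored grid.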
Explore Other Correlations
