
IBM ICE (Innovation Centre for Education)

Welcome to:
Distribution and Statistical Inference

9.1
Unit objectives
IBM Power Systems

After completing this unit, you should be able to:

• Learn about the applied probability techniques used in data analytics and visualization

• Gain knowledge on the probability distributions used in data analytics and visualization

• Learn about various tests such as hypothesis testing and parametric & non-parametric tests
like the t-test, chi-square test and ANOVA.

• Gain an insight into the concept of clustering analysis

• Understand the concept of dimension reduction such as PCA and factor analysis
Basic concepts of probability theory

• Experiments
– Deterministic.
– Random.

• Basic terminologies.
– Outcome.
– Trial.
– Sample space.
– Event.
– Mutually exclusive events.
– Exhaustive events.
– Equally likely events.
– Probability space.
Defining probability (1 of 2)

• Classical definition of probability: If a random experiment results in "n" exhaustive, mutually
exclusive, and equally likely outcomes, of which "m" are favourable to the occurrence of an
event E, then the probability of E, denoted P(E), is given by P(E) = m/n.

• Von Mises’s statistical (empirical or frequency) definition of probability: If trials are repeated
a large number of times under essentially the same conditions, the limiting value of the ratio
of the number of times an event occurs to the total number of trials, as the number of trials
increases indefinitely, is called the probability of that event.

• Axiomatic definition of probability: The function P(A), defined for every event A, is called the
probability of A when the following three axioms are met.
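As a small illustration (not part of the original slides), both the classical and the frequency definitions can be sketched in Python for a fair die; the sample space, the event and the trial count below are hypothetical choices:

```python
import random
from fractions import Fraction

# Classical definition: P(E) = m / n for equally likely outcomes.
sample_space = [1, 2, 3, 4, 5, 6]                 # n = 6 outcomes of a fair die
event = [x for x in sample_space if x % 2 == 0]   # E = "even number", m = 3
p_classical = Fraction(len(event), len(sample_space))
print(p_classical)                                 # 1/2

# Frequency definition: the relative frequency approaches P(E) as trials grow.
random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials) if random.choice(sample_space) % 2 == 0)
print(abs(hits / trials - 0.5) < 0.01)             # close to 1/2 for many trials
```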
Defining probability (2 of 2)

• Some theorems based on the axiomatic definition of probability:

• The multiplication theorem can be stated as follows: for two events A and B,
Bayes’ theorem

• If an event A can occur only in combination with one of the mutually exclusive events B1, B2,
B3, ... , Bk, and if P(Bi) ≠ 0 for i = 1, 2, 3, ... , k, then,
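A minimal numeric sketch of the total probability formula and Bayes' theorem, using a hypothetical three-machine quality-control example (the priors and defect rates below are invented for illustration):

```python
# Three machines produce 50%, 30%, 20% of items (the mutually exclusive B_i),
# with defect rates 1%, 2%, 3%; A = "item is defective".
prior = [0.5, 0.3, 0.2]          # P(B_i)
likelihood = [0.01, 0.02, 0.03]  # P(A | B_i)

p_a = sum(p * l for p, l in zip(prior, likelihood))           # total probability of A
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]  # Bayes: P(B_i | A)

print(round(p_a, 4))                      # 0.017
print([round(x, 3) for x in posterior])   # posteriors sum to 1
```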
Random variables (1 of 5)

• Random variables are a fundamental tool for probabilistic modelling. They help us to model
unknown numerical quantities.

Figure: Random variables


Source: https://miro.medium.com/max/639/1*7DwXV_h_t7_-TkLAImKBaQ.png
Random variables (2 of 5)

• Probability mass function: To describe a discrete random variable, we specify the likelihood
of each value it can take.

• Definition: Let (Ω, F, P) be a probability space and X : Ω → Z a random variable. The
Probability Mass Function (PMF) of X is defined as:
– pX (x) := P ({ω | X (ω) = x}).
– In words, pX (x) is the probability that X is equal to x. We usually say that a random variable is
distributed according to a certain PMF.
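The PMF definition can be made concrete for a fair die (a hypothetical example, not from the slides): every value 1 to 6 gets probability 1/6, and the probabilities sum to 1.

```python
from fractions import Fraction

# PMF of a fair die: p_X(x) = P(X = x) = 1/6 for x in {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

assert sum(pmf.values()) == 1                     # a PMF always sums to 1
print(pmf[4])                                     # 1/6
print(sum(p for x, p in pmf.items() if x >= 5))   # P(X >= 5) = 1/3
```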

Figure: Probability mass function of the random variable X


Random variables (3 of 5)

• Continuous random variables: Some physical quantities are more naturally described as
continuous: distance, frequency, weight, temperature etc.

• To model these quantities probabilistically, we could discretise their domains and define
them as discrete random variables, but it is often more natural to treat them as continuous.

• The Cumulative Distribution Function (CDF) of the random variable X is defined as: F(x) =
P(X ≤ x).
Random variables (4 of 5)

• Probability density function: Definition - Let X : Ω → R be a random variable with CDF FX. If
FX is differentiable, the Probability Density Function (PDF) of X is defined as fX(x) :=
dFX(x)/dx.

• For a small interval of length Δx around x, fX(x)Δx approximates the probability that X lies
near x as Δx → 0. By the fundamental theorem of calculus, the probability that X belongs to
an interval is

• P(a < X ≤ b) = FX(b) − FX(a) = ∫_a^b fX(x) dx
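The identity P(a < X ≤ b) = FX(b) − FX(a) can be checked numerically; the exponential CDF with rate 1 used below is a hypothetical choice for illustration:

```python
import math

# Exponential(lmbda) as a concrete example: F_X(x) = 1 - exp(-lmbda * x) for x > 0.
def cdf(x, lmbda=1.0):
    return 1 - math.exp(-lmbda * x) if x > 0 else 0.0

a, b = 1.0, 2.0
p = cdf(b) - cdf(a)   # P(a < X <= b) = F_X(b) - F_X(a)
print(round(p, 4))    # 0.2325
```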


Random variables (5 of 5)

• Example of a probability distribution function:

• Probability mass function and cumulative distribution function for rolling a die:
Expectation and variance of a random variable (1 of 2)

• The probability density function (or probability mass function) and the cumulative distribution
function are helpful in characterizing the features of a random variable.

• Some other features of random variables are characterized by the concepts of:
– Expectation
– Variance

• Expectation: Definition - The expectation of a continuous random variable X, having the
probability density function f(x) with ∫ |x| f(x) dx < ∞, is defined as

• For a discrete random variable X, which takes the values x1, x2,... with respective
probabilities p1, p2,..., the expectation of X is defined as
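A quick sketch of the discrete expectation formula E(X) = Σ xi·pi, again using a fair die as a hypothetical example:

```python
from fractions import Fraction

# Discrete expectation: E(X) = sum of x_i * p_i, here for a fair die.
values = range(1, 7)
probs = [Fraction(1, 6)] * 6
ex = sum(x * p for x, p in zip(values, probs))
print(ex)   # 7/2
```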
Expectation and variance of a random variable (2 of 2)

• Variance: The spread of a random variable is described by its variance. This gives some
insight into how values are clustered and distributed around the distribution's arithmetic mean.
– The variance of a random variable X is Var(X) = E[X − E(X)]².

• For a continuous random variable X, the variance is

• Similarly, the variance of a discrete random variable X is

• The variance is usually denoted σ² = Var(X). The standard deviation is the positive square
root of the variance.
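The variance definition can be verified for the same hypothetical fair die, using the shortcut Var(X) = E(X²) − E(X)²:

```python
from fractions import Fraction

# Var(X) = E[(X - E(X))^2] = E(X^2) - E(X)^2 for a fair die.
values = range(1, 7)
p = Fraction(1, 6)
ex = sum(x * p for x in values)         # E(X)   = 7/2
ex2 = sum(x * x * p for x in values)    # E(X^2) = 91/6
var = ex2 - ex ** 2
print(var)                               # 35/12
```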
Probability distribution

• Probability distribution: It is a statistical function that lists all possible outcomes that a
random variable can take in a random process, together with their corresponding
probabilities of occurrence.
Discrete probability distribution (1 of 2)

• Types of discrete probability distribution:


– Bernoulli distribution.
– Binomial distribution.
– Poisson distribution.

• Bernoulli distribution: A Bernoulli distribution has only two possible outcomes, 1 (success)
and 0 (failure), and a single trial. Thus a random variable X with a Bernoulli distribution takes
value 1 with the probability of success, say p, and value 0 with the probability of failure,
q = 1 − p.

• The probability mass function is given by: p^x (1 − p)^(1 − x), where x ∈ {0, 1}.

• The mean and variance of a Bernoulli random variable are determined to be,
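The Bernoulli mean p and variance p(1 − p) can be recovered directly from the PMF; p = 0.3 below is an arbitrary choice for illustration:

```python
# Bernoulli(p): PMF p^x * (1-p)^(1-x), mean p, variance p*(1-p).
def bernoulli_pmf(x, p):
    return p ** x * (1 - p) ** (1 - x)

p = 0.3
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean)              # 0.3
print(round(var, 10))    # 0.21 = p * (1 - p)
```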
Discrete probability distribution (2 of 2)

• Binomial distribution's properties are:

– Every trial is independent.
– Each trial has only two possible outcomes, either success or failure.
– A fixed number n of identical trials is performed.
– The probability of success is the same in every trial.

• One characteristic of the binomial distribution graph is that the probability of success is not
necessarily equal to the probability of failure.
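A sketch of the binomial PMF, C(n, k)·p^k·(1 − p)^(n − k); the values n = 10 and p = 0.5 are hypothetical:

```python
from math import comb

# Binomial(n, p): P(X = k) = C(n, k) * p^k * (1-p)^(n-k).
def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5
print(round(binom_pmf(5, n, p), 4))   # 0.2461
# The PMF sums to 1, and the mean equals n * p.
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
print(round(mean, 10))                # 5.0
```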
Continuous probability distributions (1 of 2)

• These variables are defined over an infinite number of potential outcomes by a continuous
distribution function f(x). The probability of any single point is therefore 0, i.e. P(X = x) = 0.

• Type of continuous probability distribution:


– Uniform distribution.
– Normal distribution.
– Exponential distribution.

• Uniform distribution: The continuous uniform distribution on R is the continuous counterpart
of the discrete uniform distribution on a closed interval. A continuous random variable X is
said to follow a uniform distribution on the interval [a, b], written X ~ U(a, b), when its
Probability Density Function (PDF) is
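The uniform PDF and CDF on [a, b] can be sketched as follows (the interval [2, 6] is a hypothetical choice):

```python
# Uniform(a, b): f(x) = 1/(b-a) on [a, b]; F(x) = (x-a)/(b-a), clipped to [0, 1].
def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    return min(max((x - a) / (b - a), 0.0), 1.0)

a, b = 2.0, 6.0
print(uniform_pdf(3.0, a, b))                            # 0.25
print(uniform_cdf(5.0, a, b) - uniform_cdf(3.0, a, b))   # P(3 < X <= 5) = 0.5
```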
Continuous probability distributions (2 of 2)

• Normal distribution: It is the most common statistical distribution. It is named after the
German mathematician Carl Friedrich Gauss (1777-1855), and is therefore also referred to
as the Gaussian distribution.

• A random variable X follows a normal distribution with parameters μ and σ² if its PDF is
given by

• E(X) = μ and Var(X) = σ²

• If the remaining lifespan is assumed to be independent of the lifespan already elapsed (that
is, no "aging" mechanism is at work), waiting times can be modelled as exponentially
distributed.

• A random variable X follows an exponential distribution with parameter λ > 0 if its PDF is
given by
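The normal and exponential PDFs can be written out directly; the parameter choices below (standard normal, λ = 2) are hypothetical:

```python
import math

# Normal(mu, sigma^2) and Exponential(lmbda) density functions.
def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def exp_pdf(x, lmbda):
    return lmbda * math.exp(-lmbda * x) if x >= 0 else 0.0

print(round(normal_pdf(0, 0, 1), 4))   # 0.3989, the peak of the standard normal
print(round(exp_pdf(0, 2.0), 4))       # 2.0, the density at 0 equals lambda
```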
Statistical analysis

• There are two main categories within statistics


– Descriptive statistics: Through descriptive statistics, either through quantitative formulas or graphs or
tables, you describe, present, summarize and organize the data (population).
– Inferential statistics: Inferential statistics involve more complex mathematical calculations and allow
us to infer trends and make assumptions and predictions about a population based on an analysis
of samples.

• Descriptive statistical analysis helps to understand the data and is a very important part of
analytics.

Figure: Mean, median, mode


Source: https://miro.medium.com/max/2093/0*N7NAMAeUN41oToZ-.png
Measures of variability

Figure: Measures of variability


Source: https://miro.medium.com/max/1823/0*1A7jzRbqRHafgfiP.jpg
Modality

• A distribution's modality depends on how many peaks it contains. Some distributions have
only one peak, but distributions with two or more peaks are also possible.

• The following picture shows graphical representations of the three modality types:

Figure: Modality
Source: https://miro.medium.com/max/1693/0*m_Fd3Opt6L70LiYS.png
Skewness

• Skewness is a representation of a distribution's symmetry.

• Illustration of the different types of skewness.

Figure: Skewness
Source: https://miro.medium.com/max/750/0*F1mkGYUbmqtZzLKF.jpg
Data visualizations for descriptive analytics: Box plot

• Box plot
– A box plot is based on a five-number summary (minimum, maximum and the three quartiles, written
in increasing order) and can be used to provide a graphical overview of the centre and spread of
the observed parameter values in a data set.
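The five-number summary behind a box plot can be computed with Python's standard library; the data values are hypothetical, and `statistics.quantiles` uses the exclusive quartile method by default:

```python
import statistics

# Five-number summary behind a box plot: min, Q1, median, Q3, max.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 14]
q1, q2, q3 = statistics.quantiles(data, n=4)   # the three quartiles
print(min(data), q1, q2, q3, max(data))        # 2 4.0 7.5 11.25 14
```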

Figure: Box plot


Source: Introduction to Statistics and Data Analysis in R.pdf
Data visualizations for descriptive analytics: Quantile plot

• Quantile plots (QQ-Plots)


– Example Patterns

Figure: QQ plots
Source: Introduction to Statistics and Data Analysis in R.pdf
Data visualizations for descriptive analytics: Histogram

• Histogram

Figure: Histogram
Source: Introduction to Statistics and Data Analysis in R.pdf
Data visualizations for descriptive analytics: Kernel density plots

• Kernel Density Plots


• The following function can be used to create a kernel density plot:

– Where n = sample size.


– h = bandwidth.
– K = kernel function.
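The kernel density formula above, f̂(x) = (1/(nh)) Σ K((x − xi)/h), can be sketched with a Gaussian kernel; the sample and the bandwidth h below are hypothetical:

```python
import math

# Kernel density estimate with the Gaussian kernel K(u) = exp(-u^2/2)/sqrt(2*pi).
def kde(x, sample, h):
    n = len(sample)
    k = lambda u: math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    return sum(k((x - xi) / h) for xi in sample) / (n * h)

sample = [1.0, 1.2, 1.9, 2.1, 3.4]
density = kde(2.0, sample, h=0.5)
print(density > 0)   # the estimate is positive and integrates to 1 over R
```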
Inferential statistics

• Inferential statistics allows you to make predictions (“inferences”) from data.

• With inferential statistics, the user can take data from samples and make generalizations
about a population.

• The following questions can be answered using inferential statistics:

• Inferring properties of the population from the sample.
• Determining differences between the sample and the population.
Inferential statistics: Population distribution

• Population distribution
– The sampling distribution helps to estimate the population statistic.
– Example of the effect of sample size on the sampling distribution.
Inferential statistics: Confidence interval

• Confidence interval
– It is a type of interval estimate, based on the sampling distribution, that provides a range of values
which may include the population statistic.
– Formally, Confidence Interval can be described as:

– Example:
– Where, X̄ = the sample mean
– Zα/2 = Z value for the desired confidence level
– σ = standard deviation of the population
– For a 95% confidence interval, α = 0.05 and
– Z = 1.96
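A worked sketch of the confidence-interval formula X̄ ± Zα/2 · σ/√n, using invented sample numbers:

```python
import math

# 95% confidence interval for the mean with known population sigma:
# x_bar +/- z * sigma / sqrt(n)
x_bar, sigma, n = 52.0, 4.5, 36   # hypothetical sample mean, sigma, sample size
z = 1.96                          # z value for a 95% confidence level
margin = z * sigma / math.sqrt(n)
print(round(x_bar - margin, 3), round(x_bar + margin, 3))   # 50.53 53.47
```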

Inferential statistics: Hypothesis

• Hypothesis
– Null hypothesis: H0 is usually referred to as the hypothesis of "no difference."
• Significance analysis often starts with the presumption that the null hypothesis is true.
– Alternative hypothesis: The negation of the null hypothesis is the alternative hypothesis.
• It is set up so that the alternative hypothesis is considered when the null hypothesis is rejected.
– Test statistic: The test statistic is computed from sample data.
• Based on the significance of the test statistic, a decision is made to reject or fail to reject the null hypothesis.
Inferential statistics: Type I and type II errors

• Type I error.
• Type II error.

Figure: Type I & II error


Source: Introduction to Statistics and Data Analysis in R.pdf
Inferential statistics: Statistical test

• Steps involved in conducting a statistical test.

• Generally speaking, a hypothesis about a population parameter can be tested using the
steps described below.
– State the assumed distribution of the random variable in terms of population parameters (for
example, μ and σ). This is part of a parametric analysis.
– In addition, as the sample size increases, some tests can relax the distributional assumptions.

• Express the null hypothesis and the alternative hypothesis.


Independent sample test

• Hypothesis
– H0: There is no significant difference between the two treatments.
– H0: μ1 = μ2 (or) μ1 − μ2 = 0
– H1: There is a significant difference between the two treatments.
– H1: μ1 ≠ μ2 (or) μ1 − μ2 ≠ 0

• Test Statistic
– The test statistic is given by:

– Where,
– n1 and n2 = number of observations in the two groups respectively.
– x̄1 and x̄2 = sample means of the two groups.
– S1² and S2² = sample variances of the two groups.
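The statistic can be sketched in the unequal-variance (Welch) form, t = (x̄1 − x̄2)/√(S1²/n1 + S2²/n2); the group data below are hypothetical:

```python
import math
import statistics

# Independent two-sample t statistic (unequal-variance / Welch form).
def two_sample_t(g1, g2):
    n1, n2 = len(g1), len(g2)
    m1, m2 = statistics.mean(g1), statistics.mean(g2)
    v1, v2 = statistics.variance(g1), statistics.variance(g2)   # sample variances
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)

g1 = [20, 22, 19, 24, 25]
g2 = [28, 27, 26, 30, 31]
print(round(two_sample_t(g1, g2), 3))   # -4.355
```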
Paired t-test

• Test Statistic
– It is given by :

– Where,
– n = number of observations in the given sample.
– d̄ = mean difference between the paired observations in the sample.
– Sd = standard deviation of the differences.

• Assumptions
– The pairs should be a random sample from the population and independent of one another.
– The distribution of the differences between pairs in the population should be normally or
approximately normally distributed.
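A sketch of the paired t statistic t = d̄/(Sd/√n), on invented before/after data:

```python
import math
import statistics

# Paired t statistic: t = d_bar / (S_d / sqrt(n)), where d_i = after_i - before_i.
before = [72, 75, 71, 80, 78]
after = [70, 73, 72, 76, 75]
d = [a - b for a, b in zip(after, before)]   # paired differences
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)                    # standard deviation of differences
t = d_bar / (s_d / math.sqrt(len(d)))
print(round(t, 3))   # -2.39
```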
ANOVA

• ANOVA's aim is much the same as that of the t test.

• The goal is to decide whether the mean differences obtained for sample data are significant
enough to support a hypothesis that mean differences exist between the populations from
which the samples are taken.
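The one-way ANOVA F statistic, F = MS_between / MS_within, can be computed by hand; the three groups below are hypothetical:

```python
import statistics

# One-way ANOVA F statistic: F = MS_between / MS_within.
def anova_f(*groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = statistics.mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

g1, g2, g3 = [4, 5, 6], [7, 8, 9], [10, 11, 12]
print(round(anova_f(g1, g2, g3), 2))   # 27.0
```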
Non-parametric test

• Assumptions
– Does not require normal distributions or homogeneity of variances,
– but does require independent observations, and assumes the dependent variable is continuous.

• Test Statistic

– Where,
Kruskal-Wallis test

• Test statistic
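The Kruskal-Wallis H statistic (assuming no tied ranks) is H = 12/(N(N+1)) · Σ Ri²/ni − 3(N+1), where Ri is the rank sum of group i; a sketch on hypothetical groups:

```python
# Kruskal-Wallis H statistic (no ties):
# H = 12 / (N*(N+1)) * sum(R_i^2 / n_i) - 3*(N+1)
def kruskal_h(*groups):
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    rank_sums = [0.0] * len(groups)
    for rank, (x, gi) in enumerate(pooled, start=1):
        rank_sums[gi] += rank              # accumulate rank sum R_i per group
    n = len(pooled)
    return 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)

g1, g2, g3 = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(round(kruskal_h(g1, g2, g3), 2))   # 7.2
```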
Checkpoint (1 of 2)

Multiple Choice Questions:

1. Advantages of the classical definition of probability

a) The concept of probability is simple and easily comprehensible.
b) The probability of an event is measured only on the basis of the given set of points in the sample
space and the set of points in the event.
c) This is the only concept that helps us to determine exactly the probability of an occurrence.
d) All of the above

2. Which of the following is a property of the binomial distribution?

a) Every trial is dependent on the others.
b) In a trial, there is only one possible outcome, a win.
c) A fixed number n of identical trials is performed.
d) All trials are certain to succeed.

3. A box plot is based on a representation of _____ numbers


a) Six
b) Five
c) Seven
d) Two
Checkpoint solutions (1 of 2)

Multiple Choice Questions


1. Advantages of the classical definition of probability
Answer: d) All of the above

2. Which of the following is a property of the binomial distribution?
Answer: c) A fixed number n of identical trials is performed.

3. A box plot is based on a representation of _____ numbers
Answer: b) Five
Checkpoint (2 of 2)

Fill in the blanks:

1. The goal of ANOVA is to decide whether the ____ obtained for sample data are significant
enough to support a hypothesis.
2. The Mann-Whitney U test is a ____ method that decides how ranked scores differ between
two separate groups.
3. The central tendency describes the inclination of data values to cluster around its ______.
4. An experiment is called _______ when just two outcomes are possible and it is replicated
numerous times.

True or False:

1. A random experiment's outcome or combination of outcomes is known as an event.


True/False
2. For probabilistic modelling, random variables are not a fundamental method. True/False
3. In measures of variability, the most common indicators of variation are the range and the
interquartile range (IQR). True/False
Checkpoint solutions (2 of 2)

Fill in the blanks

1. The goal of ANOVA is to decide whether the mean differences obtained for sample data
are significant enough to support a hypothesis.
2. The Mann-Whitney U test is a non-parametric method that decides how ranked scores
differ between two separate groups.
3. The central tendency describes the inclination of data values to cluster around its average.
4. An experiment is called binomial when just two outcomes are possible and it is replicated
numerous times.

True or False

1. A random experiment's outcome or combination of outcomes is known as an event. True


2. For probabilistic modelling, random variables are not a fundamental method. False
3. In measures of variability, the most common indicators of variation are the range and the
interquartile range (IQR). True
Question bank

Two mark questions:


1. Define classical probability. List the properties and advantages of classical probability.
2. Define Von Mises’s statistical probability and list its advantages.
3. Define the two types of statistical analysis.
4. Define the following a) Kernel Density Plots b) Inferential Statistics.

Four mark questions:


1. Describe the two experimental forms of probability theory with a suitable example.
2. Illustrate Probability density function with suitable graphical representations.
3. Describe Skewness and Kurtosis with examples.
4. Describe box plots with a graphical representation of their quartiles.

Eight mark questions:


1. Define probability distribution. Explain the types of probability distribution with suitable
examples and graphical representations.
2. Describe testing hypothesis and its types of errors with suitable examples.
Unit summary

Having completed this unit, you should be able to:

• Learn about the applied probability techniques used in data analytics and visualization

• Gain knowledge on the probability distributions used in data analytics and visualization

• Learn about various tests such as hypothesis testing and parametric & non-parametric tests
like the t-test, chi-square test and ANOVA.

• Gain an insight into the concept of clustering analysis

• Understand the concept of dimension reduction such as PCA and factor analysis
