
CAREERERA®
A Warm Welcome To Careerera Family
MEASURES OF CENTRAL TENDENCY
• A measure of central tendency represents the center point.
• These measures indicate where most values in a distribution fall
and are also referred to as the central location of a distribution.
• In statistics, the three most common measures of central tendency
are the mean, median, and mode.
• Each of these measures calculates the location of the central point
using a different method.
MEAN
• The mean is the arithmetic average, and it is probably the measure of central tendency that you are most familiar with. Calculating the mean is very simple: you just add up all of the values and divide by the number of observations in your dataset.
MEDIAN
• The median is the middle value. It is the value that splits the dataset in half. To find the median, order your data from smallest to largest, and then find the data point that has an equal number of values above it and below it.
MODE

• The mode is the value that occurs the most frequently in your data
set.
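As a quick illustration, the sketch below computes all three measures for a small, hypothetical dataset using Python's built-in statistics module (the numbers are chosen only for illustration).

```python
import statistics

# Hypothetical dataset (illustration only)
data = [4, 7, 7, 9, 10, 12, 15]

mean_value = statistics.mean(data)      # sum of values / number of observations
median_value = statistics.median(data)  # middle value after sorting
mode_value = statistics.mode(data)      # most frequently occurring value

print("Mean:", mean_value)      # 9.142857...
print("Median:", median_value)  # 9
print("Mode:", mode_value)      # 7
```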
MEASURES OF VARIABILITY
• A measure of variability is a summary statistic that represents the
amount of dispersion in a dataset.
• Measures of variability define how far away the data points tend to
fall from the center. 
• A low dispersion indicates that the data points tend to be clustered
tightly around the center. High dispersion signifies that they tend to
fall further away.
DIAGRAMS SHOWING THE VARIABILITY
VARIANCE
• Variance is the average squared difference of the values from
the mean.
• The variance includes all values in the calculation by comparing
each value to the mean.
• Calculate a set of squared differences between the data points
and the mean, sum them, and then divide by the number of
observations.
POPULATION VARIANCE FORMULA

• σ² = Σ(xᵢ − μ)² / N, where
• σ² is the population parameter for the variance,
• μ is the parameter for the population mean,
• N is the number of data points.
SAMPLE VARIANCE FORMULA

• s² = Σ(xᵢ − M)² / (N − 1), where
• s² is the sample variance, and M is the sample mean.
• N − 1 in the denominator corrects for the tendency of a sample to underestimate the population variance.
STANDARD DEVIATION

• The standard deviation is just the square root of the variance.


• It’s the standard or typical difference between each data point and
the mean.
• When the values in a dataset are grouped closer together, we have a
smaller standard deviation.
• On the other hand, when the values are spread out more, the
standard deviation is larger because the standard distance is greater.
STANDARD DEVIATION FORMULA

• Population: σ = √( Σ(xᵢ − μ)² / N )
• Sample: s = √( Σ(xᵢ − M)² / (N − 1) )
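As a minimal sketch of these formulas, the snippet below computes the population and sample versions of the variance and standard deviation with NumPy for a small hypothetical dataset; the ddof argument controls whether the denominator is N (population) or N − 1 (sample).

```python
import numpy as np

# Hypothetical dataset (illustration only)
data = np.array([4, 7, 7, 9, 10, 12, 15])

pop_var = np.var(data, ddof=0)     # divide by N      -> population variance
sample_var = np.var(data, ddof=1)  # divide by N - 1  -> sample variance

pop_std = np.std(data, ddof=0)     # square root of the population variance
sample_std = np.std(data, ddof=1)  # square root of the sample variance

print("Population variance:", pop_var)
print("Sample variance:", sample_var)
print("Population std dev:", pop_std)
print("Sample std dev:", sample_std)
```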
SAMPLING
• Sampling is a technique that you can apply to select a predetermined number of observations from a larger population having similar characteristics.
• When you perform an investigation, a lack of resources and time means you are not able to gather information from each and every person; in such cases you can use a sampling method to select participants.
• Sampling is done to draw conclusions about populations from
samples.
TYPES OF SAMPLING TECHNIQUES
TYPES OF SAMPLING METHOD

Sampling methods can be categorized into two types:

• Probability sampling: a type of sampling method in which you select participants at random.
• Non-probability sampling: a type of sampling method that is most suitable for performing qualitative and exploratory investigations.
PROBABILISTIC SAMPLING
 
Stratified Sampling:
• You can apply the stratified sampling method where the population has mixed characteristics and you want to make sure that each characteristic is represented equally.
Simple Random Sampling:
• It is a type of probability sampling method in which every member of the population has an equal chance of being selected as a participant.
PROBABILISTIC SAMPLING

Systematic Sampling:
• It is a probability sampling method that starts from a listing of all people. In a systematic sampling technique, people are chosen at regular intervals from that list.
Cluster Sampling:
• It is a type of sampling technique that mainly involves dividing the population into subgroups (clusters). While applying this technique you need to make sure that every group consists of similar characteristics.
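The sketch below shows, under simplified assumptions, how simple random, systematic, and stratified selection could be implemented in Python for a small hypothetical population; real survey designs involve more care than this.

```python
import random

random.seed(42)

# Hypothetical population of 20 labelled people (illustration only)
population = [f"person_{i}" for i in range(20)]

# Simple random sampling: every member has an equal chance of selection
simple_random = random.sample(population, k=5)

# Systematic sampling: pick every k-th member from the ordered list
k = 4
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: sample from each stratum so each group is represented
strata = {
    "group_A": population[:10],   # hypothetical stratum 1
    "group_B": population[10:],   # hypothetical stratum 2
}
stratified = [person for members in strata.values()
              for person in random.sample(members, k=3)]

print(simple_random)
print(systematic)
print(stratified)
```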
OVER SAMPLING
Oversampling:
• Duplicating samples from the minority class.
• Random oversampling involves selecting random examples from the minority class with replacement and supplementing the training data with multiple copies of these instances; hence a single instance may be selected multiple times.
UNDER SAMPLING

• Undersampling: deleting samples from the majority class.
• Undersampling attempts to reduce the bias (error) associated with imbalanced classes of data.
• You can undersample the majority class, oversample the minority class, or combine the two techniques.
• In general, undersampling (instead of oversampling) the majority class works best for large data sets.
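A minimal sketch of random oversampling and undersampling with NumPy is shown below; the class labels and sizes are hypothetical, and in practice a dedicated library such as imbalanced-learn is often used instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced labels: 90 majority (0) vs 10 minority (1)
majority = np.zeros(90, dtype=int)
minority = np.ones(10, dtype=int)

# Random oversampling: draw minority samples WITH replacement
# until the classes are balanced (duplicates are expected).
oversampled_minority = rng.choice(minority, size=majority.size, replace=True)
balanced_over = np.concatenate([majority, oversampled_minority])

# Random undersampling: draw majority samples WITHOUT replacement
# down to the minority class size (information is discarded).
undersampled_majority = rng.choice(majority, size=minority.size, replace=False)
balanced_under = np.concatenate([undersampled_majority, minority])

print("Oversampled class counts:", np.bincount(balanced_over))    # [90 90]
print("Undersampled class counts:", np.bincount(balanced_under))  # [10 10]
```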
PROBABILITY DISTRIBUTION

• Continuous PD: Normal Distribution
• Discrete PD: Binomial Distribution, Poisson Distribution
CONTINUOUS PROBABILITY DISTRIBUTION

Normal Distribution:
• It is also known as the Gaussian distribution. It is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
• In graph form, the normal distribution appears as a bell curve.
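As an illustration, the sketch below draws samples from a normal distribution and evaluates its bell-shaped density with SciPy; the mean and standard deviation used here are hypothetical.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 50, 10          # hypothetical mean and standard deviation

# Draw random samples from N(mu, sigma^2)
rng = np.random.default_rng(1)
samples = rng.normal(loc=mu, scale=sigma, size=10_000)

# The density is highest at the mean and symmetric around it
print(norm.pdf(mu, loc=mu, scale=sigma))       # peak of the bell curve
print(norm.pdf(mu - 20, loc=mu, scale=sigma))  # same value 20 below the mean...
print(norm.pdf(mu + 20, loc=mu, scale=sigma))  # ...as 20 above the mean

# Roughly 95% of values fall within about 2 standard deviations of the mean
within_2sd = np.mean(np.abs(samples - mu) <= 2 * sigma)
print(within_2sd)
```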
DISCRETE PROBABILITY DISTRIBUTION
Binomial Distribution
• The binomial distribution with parameters n and p is the discrete
probability distribution of the number of successes in a sequence
of n independent experiments, each asking a yes–no question, and
each with its own Boolean-valued outcome: success or failure.
DISCRETE PROBABILITY DISTRIBUTION
Poisson Distribution
• A Poisson distribution is a distribution that can be used to show how many times an event is likely to occur within a specified period of time. It helps to understand independent events that occur at a constant rate within a given interval of time.
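The sketch below evaluates both discrete distributions with SciPy for hypothetical parameters: 10 yes/no trials with success probability 0.5 for the binomial, and an average rate of 3 events per interval for the Poisson.

```python
from scipy.stats import binom, poisson

# Binomial: number of successes in n independent yes/no trials
n, p = 10, 0.5                    # hypothetical parameters
print(binom.pmf(5, n, p))         # P(exactly 5 successes)
print(binom.cdf(5, n, p))         # P(at most 5 successes)

# Poisson: number of events in a fixed interval at a constant average rate
rate = 3                          # hypothetical average events per interval
print(poisson.pmf(2, rate))       # P(exactly 2 events)
print(1 - poisson.cdf(4, rate))   # P(more than 4 events)
```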
BAYES' THEOREM 
• Bayes' theorem is a way to figure out conditional probability.
• Bayes' theorem states that the conditional probability of an event, X, given the occurrence of another event, Y, is equal to the product of the likelihood of Y given X and the probability of X, divided by the probability of Y.
FORMULA FOR BAYES' THEOREM

• Bayes' Theorem (also known as Bayes' rule) is a deceptively simple formula used to calculate conditional probability.
• The Theorem was named after English mathematician Thomas Bayes (1701-1761). The formal definition for the rule is:

P(X | Y) = P(Y | X) · P(X) / P(Y)
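A small numeric sketch of the rule follows, using entirely hypothetical probabilities (a 1% base rate, a 95% true-positive rate, and a 10% false-positive rate) just to show the arithmetic.

```python
# Hypothetical inputs (illustration only)
p_x = 0.01              # P(X): prior probability of event X
p_y_given_x = 0.95      # P(Y | X): likelihood of Y when X holds
p_y_given_not_x = 0.10  # P(Y | not X): false-positive rate

# Total probability of Y (law of total probability)
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' theorem: P(X | Y) = P(Y | X) * P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
print(round(p_x_given_y, 4))  # about 0.0876
```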
CENTRAL LIMIT THEOREM
• The central limit theorem (CLT) states that the distribution of
sample means approximates a normal distribution as the sample
size gets larger.
• Sample sizes equal to or greater than 30 are considered sufficient
for the CLT to hold.
FORMULA

• The central limit theorem is applicable for sufficiently large sample sizes (n ≥ 30).
• The formula for the central limit theorem can be stated as follows:

z = (x̄ − μ) / (σ / √n)

where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size.
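As a quick simulation of the theorem, the sketch below repeatedly averages samples drawn from a skewed (exponential) distribution; the resulting sample means are approximately normal, with standard deviation close to σ/√n. All parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

n = 40              # sample size (>= 30)
num_samples = 5_000
scale = 2.0         # hypothetical exponential scale; mean = 2, std = 2

# Draw many samples from a skewed distribution and record each sample mean
sample_means = rng.exponential(scale, size=(num_samples, n)).mean(axis=1)

print(sample_means.mean())       # close to the population mean (2.0)
print(sample_means.std(ddof=1))  # close to sigma / sqrt(n) = 2 / sqrt(40) ≈ 0.316
```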
Type 1 and Type 2 errors
Type 1 error

• A Type I error, in statistical hypothesis testing, is the error caused by rejecting a null hypothesis when it is true.
• A Type I error is denoted by α (alpha), also known as the alpha error; α is also called the level of significance of the test.
• A Type I error occurs when the null hypothesis is rejected even when there is no relationship between the variables.
Type II error

• A Type II error is the error that occurs when the null hypothesis is accepted when it is not true.
• A Type II error results in a false negative result.
• The Type II error is denoted by β (beta) and is also termed the beta error.
• The null hypothesis is set to state that there is no relationship between two variables and that the cause-effect relationship between two variables, if present, is caused by chance.
Type I error VS Type II error

Type I error:
• It is the error, in statistical hypothesis testing, caused by rejecting a null hypothesis when it is true.
• It is a false rejection of a true hypothesis.
• It is denoted by α.
• Its probability is equal to the level of significance.
• It can be reduced by decreasing the level of significance.
• It is similar to a false hit (false positive).

Type II error:
• It is the error that occurs when the null hypothesis is accepted when it is not true.
• It is the false acceptance of an incorrect hypothesis.
• It is denoted by β.
• Its probability is equal to one minus the power of the test.
• It can be reduced by increasing the level of significance.
• It is similar to a miss (false negative).
HYPOTHESIS TESTING
Hypothesis testing is a well-defined procedure which helps us to decide objectively whether to accept or reject a hypothesis based on the information available from the sample.
NULL HYPOTHESIS

• The Null Hypothesis (H0) is the hypothesis which is tested for possible rejection under the assumption that it is true.
• It is used to determine whether the hypothesis assumed for the sample of data stands true for the entire population or not.
• A null hypothesis proposes that no statistical significance exists in a set of given observations.
• The hypothesized value of the parameter is called the null hypothesis.
ALTERNATIVE HYPOTHESIS

• The Alternative Hypothesis (Ha) is the logical opposite of the null hypothesis.
• The alternative hypothesis is a statement that will be accepted as a result of the null hypothesis being rejected.
• The alternative hypothesis is the residual of the null hypothesis.
• The alternative hypothesis includes the outcomes not covered by the null hypothesis.
STEPS OF HYPOTHESIS TESTING

• Step 1: Specify the Null Hypothesis.
• Step 2: Specify the Alternative Hypothesis.
• Step 3: Set the Significance Level (α).
• Step 4: Calculate the Test Statistic and Corresponding P-Value.
• Step 5: Draw a Conclusion.
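As a sketch of these five steps, the example below tests a hypothetical claim about a proportion (H0: p = 0.5 versus Ha: p ≠ 0.5) using a normal approximation; the sample numbers are made up for illustration.

```python
from math import sqrt
from scipy.stats import norm

# Steps 1 & 2: specify H0: p = 0.5 and Ha: p != 0.5 (hypothetical claim)
p0 = 0.5

# Step 3: set the significance level
alpha = 0.05

# Hypothetical sample: 62 successes out of 100 trials
n, successes = 100, 62
p_hat = successes / n

# Step 4: test statistic and two-sided p-value (normal approximation)
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))

# Step 5: draw a conclusion
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0 at the 5% significance level.")
else:
    print("Fail to reject H0.")
```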
QUESTIONS ON HYPOTHESIS TESTING
Q1. A test of H0: μ = 20 versus Ha: μ > 20 rejects the null hypothesis. Later it is discovered that μ = 19.9. What type of error, if any, has been made?
Soln: Null Hypothesis: H0: μ = 20
Alternative Hypothesis: Ha: μ > 20
The new population mean, μ = 19.9
Conclusion:
In this question, we are rejecting the null hypothesis when it is false because the actual mean is μ = 19.9. Therefore, it is neither a Type I error nor a Type II error.
Option (c) is correct.
QUESTIONS ON HYPOTHESIS TESTING
• 4. Assume that the difference between the observed, paired
sample values is defined in the same manner and that the
specified significance level is the same for both hypothesis tests.
Using the same data, the statement that “a paired/dependent two
sample t-test is equivalent to a one sample t-test on the paired
differences, resulting in the same test statistic, same p-value, and
same conclusion” is:
• a. Always True b. Never True
• c. Sometimes True d. Not Enough Information.
Soln : a . Always True
QUESTIONS ON HYPOTHESIS TESTING
Q. A statistics instructor believes that fewer than 20% of Evergreen
Valley College (EVC) students attended the opening night midnight
showing of the latest Harry Potter movie. She surveys 84 of her
students and finds that 11 attended the midnight showing. An
appropriate alternative hypothesis is:
1. p = 0.20    2. p > 0.20
3. p < 0.20    4. p ≤ 0.20

Soln : 3. p<0.20
Z-TESTS
• Z-test is a statistical test to determine whether two population
means are different when the variances are known and the
sample size is large.
• Z-test is a hypothesis test in which the z-statistic follows a
normal distribution. 
• A z-statistic, or z-score, is a number representing the result
from the z-test.
• Z-tests assume the population standard deviation is known.
FORMULA FOR Z TEST

z = (x̄ − μ) / (σ / √n)

where x̄ is the sample mean, μ is the hypothesized population mean, σ is the known population standard deviation, and n is the sample size.
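A minimal sketch of this formula in Python is shown below; the sample mean, hypothesized mean, known population standard deviation, and sample size are all hypothetical.

```python
from math import sqrt
from scipy.stats import norm

x_bar = 52.3   # hypothetical sample mean
mu = 50.0      # hypothesized population mean
sigma = 8.0    # known population standard deviation
n = 64         # large sample size

z = (x_bar - mu) / (sigma / sqrt(n))
p_value = 2 * norm.sf(abs(z))   # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```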
ONE SAMPLE T-TEST
• The One Sample T-Test (Student's T Test) compares a sample mean to some hypothesized value to determine whether they are significantly different.
• The t-test is used when the population standard deviation is unknown.
• The t-test is applied when the sample size is below 30; otherwise use the Z-test (for known variance).
ONE SAMPLE T-TEST FORMULA

t = (x̄ − μ₀) / (s / √n), with n − 1 degrees of freedom

where x̄ is the sample mean, μ₀ is the hypothesized mean, s is the sample standard deviation, and n is the sample size.
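SciPy provides this test directly; the sketch below runs a one-sample t-test on a small hypothetical sample against a hypothesized mean of 50.

```python
from scipy import stats

# Hypothetical small sample (n < 30), population standard deviation unknown
sample = [48.2, 51.5, 53.0, 49.8, 52.4, 50.9, 47.6, 54.1]

t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```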
TWO-SAMPLE T-TEST

• The two-sample t-test (also known as the independent samples t-test) is a method used to test whether the unknown population means of two groups are equal or not.
• The assumptions of the two-sample t-test are:
1. The data are continuous.
2. The data follow the normal probability distribution.
3. The variances of the two populations are equal.
4. The two samples are independent.
TWO SAMPLE T-TEST FORMULA

t = (x̄₁ − x̄₂) / ( Sp · √(1/n₁ + 1/n₂) )

• Where n₁ and n₂ are the sample sizes
• x̄₁ and x̄₂ are the sample means
• Sp is the pooled standard deviation
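A minimal sketch using SciPy follows; equal_var=True matches the equal-variance (pooled) assumption above, and the two groups are hypothetical.

```python
from scipy import stats

# Hypothetical independent samples from two groups
group_1 = [23.1, 25.4, 22.8, 26.0, 24.3, 25.1]
group_2 = [21.0, 22.5, 20.9, 23.4, 22.1, 21.8]

# equal_var=True applies the classic pooled-variance two-sample t-test
t_stat, p_value = stats.ttest_ind(group_1, group_2, equal_var=True)
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```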
CONFIDENCE INTERVAL

• A confidence interval gives a range of values, bounded above and below the statistic's mean, that likely would contain an unknown population parameter.
• A confidence interval (CI) is a type of estimate computed from the
statistics of the observed data.
• The confidence level is designated before examining the data.
Most commonly, a 95% confidence level is used.
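The sketch below computes a 95% confidence interval for a population mean from a hypothetical sample, using the t distribution because the population standard deviation is unknown.

```python
import numpy as np
from scipy import stats

# Hypothetical sample (illustration only)
sample = np.array([48.2, 51.5, 53.0, 49.8, 52.4, 50.9, 47.6, 54.1])

mean = sample.mean()
sem = stats.sem(sample)       # standard error of the mean
df = len(sample) - 1          # degrees of freedom

lower, upper = stats.t.interval(0.95, df, loc=mean, scale=sem)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```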
HOW ARE CONFIDENCE INTERVALS USED?

• Statisticians use confidence intervals to measure uncertainty in a sample variable.
• For example, a researcher randomly selects different samples from the same population and computes a confidence interval for each sample to see how it may represent the true value of the population variable.
• The resulting datasets are all different: some intervals include the true population parameter and others do not.
ANOVA
• Analysis of variance, or ANOVA, is a statistical method that
separates variance data into different components to use for
additional tests.
• A one-way ANOVA is used for three or more groups of data, to
gain information about the relationship between the dependent
and independent variables.
• If no true variance exists between the groups, the ANOVA's F-ratio should be close to 1.
NEED FOR ANOVA
• The ANOVA test allows a comparison of more than two groups at
the same time to determine whether a relationship exists between
them.
• The result of the ANOVA formula, the F statistic (also called the F-
ratio), allows for the analysis of multiple groups of data to
determine the variability between samples and within samples.
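A minimal one-way ANOVA sketch with SciPy follows; the three groups are hypothetical, and an F-ratio near 1 with a large p-value would suggest no real difference between the group means.

```python
from scipy import stats

# Hypothetical measurements from three independent groups
group_a = [85, 86, 88, 75, 78, 94, 98, 79]
group_b = [91, 92, 93, 85, 87, 84, 82, 88]
group_c = [79, 78, 88, 94, 92, 85, 83, 85]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}")
```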
CHI-SQUARE
• The Chi Square test is a type of Non-Parametric test, which deals
with multiple samples with categorical data.
• Chi-square is a statistical test commonly used to compare
observed data with data we would expect to obtain according to a
specific hypothesis.
• Chi-square is used when we have categorical data.
CHI SQUARE TEST
• The Chi Square Test is a statistical test which consists of three
different types of analysis
• 1) Test for Homogeneity
• 2) Chi Square test of Association or Test of Independence
• 3) Goodness of fit
CHI-SQUARE TESTING
• In the chi-square test, two hypotheses are tested.
• The null hypothesis (H0) states that there is no difference between the observed and expected values; they are statistically the same and any difference that may be detected is due to chance.
• The alternative hypothesis (Ha) states that the two sets of data,
the observed and expected values, are different; the difference is
statistically significant and must be due to some reason other than
chance.

χ² = Σ (observed − expected)² / expected
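The goodness-of-fit version of this formula can be sketched as below; the observed counts and the expected (equal-frequency) hypothesis are hypothetical.

```python
from scipy.stats import chisquare

# Hypothetical observed counts for a six-sided die rolled 120 times
observed = [25, 17, 15, 23, 24, 16]
expected = [20, 20, 20, 20, 20, 20]   # H0: all faces equally likely

chi2_stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2_stat:.2f}, p-value = {p_value:.4f}")
```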
Thank You !!!

(USA)
2-Industrial Park Drive, E-Waldorf, MD 20602, United States
+1-844-889-4054

(INDIA)
B-44, Sector-59, Noida, Uttar Pradesh 201301
+91-92-5000-4000

(Singapore)
3 Temasek Avenue, Singapore 039190

info@careerera.com
www.careerera.com
