
Predictive analytics 1

Comparisons of Means
• A hypothesis is a claim or belief.
• Hypothesis testing is a statistical process for either rejecting or retaining
a claim, belief, or association.
• Hypothesis testing involves two complementary statements, the null
hypothesis and the alternative hypothesis; only one of them is true.
– In general, the null hypothesis states that there is no relationship between the two
variables under consideration. It is denoted by H0.
H0: There is no difference between men and women in average time spent on social media.
– The alternative hypothesis, denoted by Ha or H1, is the complement of the null hypothesis.
Ha: Men and women differ in average time spent on social media. A one-sided version would be 'women use social media more than men'.

• Hypothesis testing is an integral part of predictive analytic techniques
such as multiple linear regression and logistic regression.
• Hypothesis testing is used to check the validity of a claim using
evidence found in sample data.
• Steps in hypothesis testing
1. Describe the hypothesis in words, then state it in null
and alternative form. Initially we assume that the null
hypothesis is true. Null and alternative hypotheses are
defined in terms of a population parameter.
2. Identify the test statistic to be used for testing the
validity of the null hypothesis. The test statistic measures
how far the sample evidence departs from what the null
hypothesis predicts. The choice of test statistic depends on the
sampling distribution of the estimator.
3. Decide the criterion for rejecting or retaining the null hypothesis.
This is the significance level, traditionally denoted by α.
The value of α depends on the context; 0.1, 0.05, and
0.01 are commonly used. The significance level α is the probability
of a Type I error (rejecting a null hypothesis that is true).
4. Calculate the p-value (probability value), which is the conditional
probability of observing a test statistic at least as extreme as the
one obtained, given that the null hypothesis is true. In simple terms,
the smaller the p-value, the weaker the support for the null hypothesis.
5. Decide whether to reject or retain the null hypothesis based on the
p-value and the significance level α. The null hypothesis is rejected when
the p-value is less than α and retained when the p-value is greater than
or equal to α.
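The five steps can be sketched in a few lines of Python. This is only an illustration: the sample values, the hypothesised mean of 120 minutes, and the choice of a two-sided one-sample t-test are assumptions made for the example, not part of the slides.

```python
import math
import statistics
from scipy import stats

# Made-up sample: daily minutes on social media for 10 users (illustrative data).
minutes = [132, 145, 118, 160, 127, 151, 139, 142, 125, 158]

# Step 1: H0: population mean = 120, H1: population mean != 120 (assumed value).
mu0 = 120

# Step 2: test statistic for a mean with unknown population variance (t statistic).
n = len(minutes)
sample_mean = statistics.mean(minutes)
sample_sd = statistics.stdev(minutes)          # uses n - 1 in the denominator
t_stat = (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Step 3: significance level chosen before looking at the data.
alpha = 0.05

# Step 4: two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Step 5: decision rule.
decision = "reject H0" if p_value < alpha else "retain H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, decision: {decision}")
```

For this made-up sample the t statistic is about 4.39, so the null hypothesis is rejected at α = 0.05.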
• Decision-making criteria in hypothesis testing
Criteria        Decision
p-value < α     Reject the null hypothesis
p-value ≥ α     Retain (or fail to reject) the null hypothesis

• The analyst selects the significance level, α, a priori for testing H0.
Typically, the 0.05 level is selected. We calculate the p-value and
then compare it with α.
• If p < 0.05, reject the null hypothesis (in favour of the alternative
hypothesis).
• If p ≥ 0.05, retain (fail to reject) the null hypothesis.
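The decision rule is simple enough to write as a small helper. The function name `decide` and its return strings are hypothetical, chosen only for this sketch; note that a p-value exactly equal to α falls on the "retain" side.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule: reject H0 only when p-value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    return "reject H0" if p_value < alpha else "retain H0"


print(decide(0.03))        # below the default alpha of 0.05
print(decide(0.05))        # equal to alpha, so H0 is retained
print(decide(0.07, 0.10))  # a looser alpha changes the decision
```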
One Sample T-Test
• The one-sample t-test tells us whether a sample comes
from a population with a specified mean.
• For this variant of the t-test, the alternative hypothesis
states that there is a difference between the
specified value and the population mean.
• The null, on the contrary, states that there is no
difference between the specified value and the population
mean. Thus,
• H0: population mean = specified value
• H1: population mean ≠ specified value
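As a minimal sketch, a one-sample t-test can be run with `scipy.stats.ttest_1samp`. The scores and the specified value of 70 are invented for illustration.

```python
from scipy import stats

# Made-up test scores; the specified value of 70 is an assumption for the example.
scores = [68, 72, 71, 69, 74, 66, 70, 73, 67, 71]
specified_value = 70

# Two-sided test of H0: population mean = specified value.
result = stats.ttest_1samp(scores, popmean=specified_value)
t_stat, p_value = result.statistic, result.pvalue

alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "retain H0")
```

The sample mean here (70.1) is very close to the specified value, so the p-value is large and the null hypothesis is retained.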
Assumptions of One-Sample T-test
• The data must be continuous.
• The sample observations are independent of one another.
• Outliers should be checked. The scores are standardized, and
values outside the range –3.29 to 3.29 are removed.
• The data should be approximately normally distributed. Real-world
data are rarely perfectly normal, so approximate normality is
considered sufficient. Whether a data set is approximately normal
can be checked using a histogram and the Kolmogorov–Smirnov test.
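The outlier and normality checks might look like the sketch below. The data are simulated, and the injected outlier of 120 is an assumption for the example. One caveat: the mean and standard deviation are estimated from the same sample that is being tested, so the Kolmogorov–Smirnov p-value is only approximate.

```python
import numpy as np
from scipy import stats

# Simulated data (illustrative): 200 roughly normal values plus one injected outlier.
rng = np.random.default_rng(0)
data = np.append(rng.normal(loc=50, scale=5, size=200), 120.0)

# Standardize the scores and drop values outside the range -3.29 to 3.29.
z_scores = stats.zscore(data)
cleaned = data[np.abs(z_scores) <= 3.29]
print(f"removed {data.size - cleaned.size} outlier(s)")

# Kolmogorov-Smirnov test against a normal distribution whose parameters
# are estimated from the cleaned sample (this makes the p-value approximate).
ks = stats.kstest(cleaned, "norm", args=(cleaned.mean(), cleaned.std(ddof=1)))
print(f"KS statistic = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")
```

A large p-value means the test found no evidence against normality; it does not prove the data are normal, so the histogram is still worth inspecting.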
Paired Sample T-test

• This test is also known as the dependent t-test
and the repeated measures t-test.
• In this variant of the t-test, the same unit is measured
twice, resulting in pairs of values. The pairs can
be generated in two ways:
– The same unit (individual) is measured at two different
times, such as pre- and post-training test scores, or the performance
rating of the same individual in year 1 and year 2.
– The same unit (individual) is measured for two different
interventions, such as scores for Java and C++ training of the
same set of individuals.
• For this variant of the t-test, the alternative
hypothesis states that there is a
difference between the mean values of the
pairs. The null, on the contrary, states that
there is no difference between the means
of the pairs. Thus,
• H0: Difference of means of pairs = 0
• H1: Difference of means of pairs ≠ 0
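A paired test can be sketched with `scipy.stats.ttest_rel`. The pre- and post-training scores are made up for illustration. The second call shows that the paired t-test is the same as a one-sample t-test on the within-pair differences with a hypothesised mean of 0.

```python
from scipy import stats

# Made-up scores for the same 8 individuals before and after training.
pre = [62, 70, 58, 75, 66, 71, 64, 69]
post = [68, 74, 63, 79, 70, 78, 66, 75]

# Paired (dependent) t-test of H0: mean difference = 0.
paired = stats.ttest_rel(post, pre)

# Equivalent formulation: one-sample t-test on the differences.
differences = [after - before for after, before in zip(post, pre)]
one_sample = stats.ttest_1samp(differences, popmean=0)

print(f"paired:     t = {paired.statistic:.3f}, p = {paired.pvalue:.5f}")
print(f"one-sample: t = {one_sample.statistic:.3f}, p = {one_sample.pvalue:.5f}")
```

Both calls give the same t statistic (about 8.50 for these numbers), because only the differences within each pair enter the calculation.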
Independent Sample t-test
• Also known as the unpaired t-test, two-sample t-test, or
unrelated t-test
• It checks whether the means of two independent samples
are different
• Features:
– One dependent variable
– Two groups, or levels of the independent variable
• Independent samples (between groups): the two groups
are not related in any way
• Interval/ratio measurement of the dependent variable
• For this variant of the t-test, the alternative
hypothesis states that there is a difference
between the mean values of the groups. The
null states that there is no difference
between the means of the groups.
• H0: Difference of means of groups = 0
• H1: Difference of means of groups ≠ 0
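A minimal sketch with `scipy.stats.ttest_ind`, reusing the social media example from the start of the deck. The minutes are invented. The `equal_var=False` call runs Welch's variant, which is not covered in the slides and is shown only as an option when the two group variances may differ.

```python
from scipy import stats

# Made-up daily minutes on social media for two unrelated groups.
men = [95, 110, 102, 120, 88, 105, 99, 115]
women = [125, 118, 140, 132, 121, 137, 129, 144]

# Independent-samples t-test assuming equal variances (SciPy's default).
pooled = stats.ttest_ind(men, women, equal_var=True)

# Welch's variant (assumption: added here for comparison, not from the slides).
welch = stats.ttest_ind(men, women, equal_var=False)

alpha = 0.05
print(f"pooled: t = {pooled.statistic:.3f}, p = {pooled.pvalue:.5f}")
print(f"welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.5f}")
print("reject H0" if pooled.pvalue < alpha else "retain H0")
```

The t statistic is negative because the first group (men) has the lower mean; the sign depends only on the order in which the groups are passed.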
ANOVA
• A technique to compare the difference in mean
output/attitude/performance (a continuous
variable) between three or more groups of
employees/product lines/sales
teams/students.
• Used when there is only one dependent variable.
• Not meant for a categorical dependent variable.
• Assumptions
– Populations are normally distributed.
– Populations have equal variances.
– Samples are randomly and independently drawn.

• Post hoc test
– Can provide the pairwise mean differences between the groups
– Use Tukey's post hoc test when equal variances are assumed
– Use the Games-Howell (GH) test when equal variances are not
assumed
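One-way ANOVA followed by Tukey's post hoc test can be sketched as follows. The three sales teams and their figures are made up, and `scipy.stats.tukey_hsd` assumes SciPy 1.8 or newer. The Games-Howell test is not shown in this sketch.

```python
from scipy import stats

# Made-up weekly sales for three teams (illustrative data).
team_a = [20, 22, 19, 24, 21]
team_b = [28, 30, 27, 31, 29]
team_c = [21, 23, 20, 22, 24]

# One-way ANOVA of H0: all three population means are equal.
anova = stats.f_oneway(team_a, team_b, team_c)
print(f"F = {anova.statistic:.2f}, p = {anova.pvalue:.6f}")

# Tukey's HSD gives a p-value for every pair of groups (equal variances assumed).
tukey = stats.tukey_hsd(team_a, team_b, team_c)
labels = ["A", "B", "C"]
for i in range(3):
    for j in range(i + 1, 3):
        print(f"{labels[i]} vs {labels[j]}: p = {tukey.pvalue[i, j]:.4f}")
```

With these numbers the ANOVA rejects equality of means, and the post hoc comparison shows that team B differs from both A and C, while A and C do not differ significantly.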
Thank you!
