2021 Stat Notes
Regression
Linear Regression
• After having established the fact that 2 variables are closely related, we should
estimate/predict the value of one variable given the value of another.
• Eg. If advertising and sales are correlated, we can find out:
• The expected amount of sales for a given advertisement expenditure or
• The required amount of expenditure for achieving a fixed sales target
• The statistical tool with the help of which we are in a position to estimate or predict the
unknown values of one variable from known values of another variable is called regression
• It helps to find the average probable change in one variable given a certain amount of change
in another
• Regression is a technique for measuring the linear association between a dependent (Y) and
an independent variable (X)
• Regression analysis attempts to predict the values of a continuous DV from specific values of
the IV
3 advantages of regression analysis
• It provides estimates of the values of the DV from the values of the IV, with
the help of the regression line, which describes the average relationship
existing between X and Y variables
• It calculates the standard error: the error involved in using the
regression line as a basis for estimation
• If the scatter is lesser and the line fits the data closely, it means that there is
relatively little scatter of observations around the regression line. This means,
we can make a good estimate of Y, the DV
• But, if the observations are scattered around the fitted regression line, it will
not produce accurate estimates of the DV
Difference between correlation and
regression
Correlation:
• Precedes regression
• Tool for ascertaining the degree of relationship
Regression:
• Succeeds correlation
• Tool for studying the nature of the relation, i.e. the cause-and-effect
relationship
Linear Bivariate Regression Model
• In this, we proceed by observing the sample data,
• and use the results obtained as estimates of the corresponding population relationship
• For a bivariate population, the model chosen is simple linear regression model
• Assumptions:
• The value of Y is dependent upon the value of X
• The average relationship between X & Y can be described as a linear equation
Y=a+bX, which gives a straight line graph
• Y: DV,
• X: IV,
• a: Y-intercept and
• b : slope, i.e. the average amount of change of Y per unit of change in the value of
X. The sign of b indicates the type of relationship between X & Y (direct/ inverse)
Regression lines
• Considering 2 variables X & Y, we have 2 regression lines
• Regression line of X on Y (is the line which gives the best estimate for the value of X for any specified value of
Y)
• Regression line of Y on X (is the line which gives the best estimate for the value of Y for any specified value of
X)
• The farther these lines are from each other, the lesser the degree of
correlation between them and vice versa
• The 2 regression lines show the average relationship between the 2 variables
based on the 2 equations known as Regression equations
• Regression equations: are algebraic expressions of the regression lines. Since
there are 2 regression lines, there are 2 regression equations (RE)
• RE of X on Y: describes the variations in the values of X for given change(s) in Y
and
• RE of Y on X: describes the variations in the values of Y for given change(s) in X
Regression Equation of Y on X
• Y= a + b X
• Y is the DV or criterion variable to be estimated and X is the IV or the predictor variable
• a & b are 2 unknown constants (fixed numerical values) which determine the position of the line
completely
• The constants are called parameters of the line
• If the value of either or both of them is changed, another line is determined
• Parameter ‘a’ determines the level of the fitted line (i.e. the distance of the line directly above or
below the origin)
• Parameter ‘b’ determines the slope of the line (i.e. change in Y for unit change in X)
• Once the values of a and b are obtained, we can determine the line. This is done by the METHOD OF
LEAST SQUARES
• It states that the line should be drawn through the plotted points in such a manner that the sum of the
squares of the vertical deviations of the actual Y values from the estimated Y values IS THE LEAST
• i.e. ∑(Actual Y − Estimated Y)² is minimum
• Such a line is called the LINE OF BEST FIT
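The method of least squares described above can be sketched in a few lines of Python; the (x, y) data here are purely illustrative:

```python
# Least-squares fit of the line of best fit Y' = a + bX.
# Hypothetical data: x could be advertising expenditure, y sales.
def least_squares(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = sy / n - b * sx / n                        # Y-intercept
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(a, b)              # fitted line y' = a + b*x
print(a + b * 6)         # estimate of Y for X = 6
```

Note that the fitted line has the stated property: the sum of the vertical deviations of the actual Y values from the estimated Y values is zero, and the sum of their squares is the least possible.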
The line of best fit
Unit 6
• Introduction to Probability
• Basic Concepts of Probability
Uncertainties
An experiment is any process that generates well-defined outcomes. The sample
space for an experiment is the set of all experimental outcomes. An
experimental outcome is also called a sample point.
Probability as a Numerical Measure
of the Likelihood of Occurrence
(Probability scale: 0 = impossible, 0.5 = as likely as not, 1 = certain)
Two basic requirements for assigning probabilities:
1. 0 ≤ P(Ei) ≤ 1, where Ei is the ith experimental outcome
and P(Ei) is its probability
2. The sum of the probabilities for all experimental
outcomes must equal 1
Subjective Method
Assigning probabilities based on judgment
Classical Method
Assigning probabilities assuming equally likely outcomes: if an experiment
has n possible outcomes, the classical method would assign a probability of
1/n to each outcome.
Example: rolling a die, where each of the 6 faces has probability 1/6.
Venn Diagram
(Event A and its complement Ac shown within the sample space S)
Union of Two Events
(Venn diagram: events A and B within the sample space S; the union A ∪ B
contains all sample points in A, in B, or in both)
Intersection of Two Events
(Venn diagram: events A and B within the sample space S; the overlap is the
intersection of A and B, A ∩ B)
Addition Law
(Venn diagram: events A and B within the sample space S)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Mutually Exclusive Events
For mutually exclusive events P(A ∩ B) = 0, so there is no need to
include “− P(A ∩ B)”: P(A ∪ B) = P(A) + P(B)
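The addition law can be checked numerically on the die-rolling experiment; a small sketch using exact fractions:

```python
# Checking the addition law P(A U B) = P(A) + P(B) - P(A n B)
# on the classical die-rolling experiment (each outcome has probability 1/6).
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # sample space
A = {2, 4, 6}                   # event: even number
B = {4, 5, 6}                   # event: number greater than 3

def P(event):
    return Fraction(len(event), len(S))

lhs = P(A | B)                  # P(A U B) counted directly
rhs = P(A) + P(B) - P(A & B)    # addition law
print(lhs, rhs)                 # both equal 2/3
```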
Independent Events
Two events A and B are independent if P(A ∩ B) = P(A) × P(B)
• Importance
• Components
• Trend
• Free hand method
• Methods of semi-averages, moving averages and least squares
• Problems based on these
• Forecasting, methods of forecasting
Introduction
• When estimates of future conditions are made on a systematic
basis, the process is referred to as forecasting
• The figure or statement obtained is called forecast
• Forecasting is a service whose purpose is to offer the best available
basis for ‘management expectations of the future’
• Forecasting aims at reducing the area of uncertainty that surrounds
management decision-making, with respect to costs, profit, sales,
production, pricing, capital investment etc.
• Forecasting is the process of making predictions of the future,
based on the past & present data by the analysis of trends
• The knowledge of forecasting methods is essential for decision
makers to make reliable and accurate estimates and assess or
evaluate the future consequences of decisions in the face of
uncertainty
What Is Forecasting?
• An essential tool in any decision-making process
• Process of predicting a future event
• Underlying basis of all business decisions
• Planning production in expectation of certain levels of
sales
• Building warehouses in expectation of certain levels of
stocks and sales
• Setting prices in expectation of certain levels of raw
material costs, financial constraints, wages and sales
• Recruiting labour, buying materials, arranging finance or planning
factories in expectation of certain levels of sales and
other activity
Objectives of forecasting
• Creating plans of action
• Monitoring its progress
• Developing a warning-system of the critical factors
Types of forecasts
• Demand forecasts: prediction of demands for products/services based
on the sales and marketing information
• Environmental forecasts: concerned with the social, political and
economic environments of the state/ country
• Technological forecasts: concerned with the new developments in
existing technologies
Timing of forecasts
• Short-range: 0-3 months, may go up to 1 year; for job scheduling, work
force levels, job assignments etc.
• Uses mathematical techniques like moving averages, exponential
smoothing, trend exploration
• Medium-range: 1-3 yrs time span- used for sales planning, production
planning, budgeting etc.
• Long-range: >=3 yrs: for designing/ installing new plants, facility
location, R&D etc.
Forecasting Approaches
Qualitative Methods:
• Used when situation is vague & little data exist
• New products, new technology
• Involve intuition, experience
Quantitative Methods:
• Used when situation is ‘stable’ & historical data exist
• Existing products, current technology
• Involve mathematical techniques
Quantitative Forecasting - process
• Limitations of the free hand method:
• Highly subjective, as the trend line depends on personal judgement
• Time-consuming
Smoothing methods
• This provides pattern of movements in the data over time, by eliminating random
variations due to irregular components of time series
• 3 smoothing methods are:
Moving averages: a subjective method which depends on the length of the period
for calculating moving averages
• It is a technique to get an overall idea of the trends in a data set
• It is described as moving because old data points get replaced by new figures in its calculation
• Focuses on long-term trend in a time series
Weighted moving averages: a moving average where some time periods are
weighted differently than others
• The most recent observations are assigned larger weights, and the weight
decreases for older data values
• WMA = ∑(weight for period n × data value in period n) / ∑ weights
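The WMA formula can be sketched as follows; the data (the weekly sales series used later in these notes) and the weights 1, 2, 3 (oldest to newest) are illustrative:

```python
# Weighted moving average forecast: the most recent observation gets the
# largest weight. The weights 1, 2, 3 (oldest..newest) are illustrative.
def wma_forecast(series, weights):
    recent = series[-len(weights):]               # the last k observations
    num = sum(w * y for w, y in zip(weights, recent))
    return num / sum(weights)

sales = [17, 21, 19]                          # weeks 1-3 (thousands of Rs.)
f4 = wma_forecast(sales, weights=[1, 2, 3])   # forecast for week 4
print(f4)                                     # (1*17 + 2*21 + 3*19) / 6
```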
Semi-average method: used to estimate the slope and intercept of the trend line, if
time series is represented by a linear function
Steps in semi-average method
1. Data are divided into 2 parts
2. Their respective arithmetic means are computed
3. These 2 means are plotted corresponding to the midpoint of the data/
class interval covered by the respective part
4. These points are joined by a straight line to get the required trend line
5. The AM of the 1st part is the intercept value ‘a’
6. Slope: ratio of the difference between the two AMs to the number of years
between them, b = Δy/Δx = (AM1 − AM2)/(year 1 − year 2), where year 1 and
year 2 are the midpoints of the two parts
7. Time series of the form y’ =a+bx, where y’ is the predicted y
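The steps above can be sketched in Python; the annual data for 2015-2020 are hypothetical:

```python
# Semi-average trend line: split the series into two halves, average each
# half, and join the two means. Hypothetical annual data.
years  = [2015, 2016, 2017, 2018, 2019, 2020]
values = [10, 12, 14, 16, 18, 20]

half = len(values) // 2
am1 = sum(values[:half]) / half          # AM of first part
am2 = sum(values[half:]) / half          # AM of second part
mid1 = years[half // 2]                  # midpoint year of first part
mid2 = years[half + half // 2]           # midpoint year of second part

b = (am2 - am1) / (mid2 - mid1)          # slope
a = am1                                  # intercept (AM of the 1st part)

def trend(year):                         # y' = a + b*x, x measured from mid1
    return a + b * (year - mid1)

print(b, trend(2021))                    # slope and a one-year-ahead estimate
```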
Trend projection method
(linear, exponential or quadratic)
• It fits a trend line to a time series data and then projects medium-to-
long range forecasts
• Helps to describe the long-term general direction of any business
activity over a long period of time
• The study of trend facilitates making intermediate and long-term
forecasting projections
Linear projection method
• The method of least squares from regression analysis is used to find
the trend line of best fit to a time series data.
• This line is defined as y’ = a + bx, where
• y’ is the predicted value of the DV
• a is the y intercept
• b is the slope of regression line Δy/ Δx
• x is the IV represented as time in year/ month etc.
Characteristics of the trend line of best fit
• The sum of all vertical deviations about the line of best fit is zero
• ∑(y-y’)=0
• The sum of all vertical deviations squared is minimum
• ∑(y-y’)^2 is the least
• The line of best fit passes through the mean values of variables x & y
• The values of the 2 constants a & b can be found by the simultaneous
solution of the normal equations: ∑y = na + b∑x and ∑xy = a∑x + b∑x²
What is a Time Series?
• It is used to detect patterns of change in statistical
information over regular intervals of time
• We project these patterns to arrive at an estimate for the
future
• Thus it helps to cope with uncertainty about the future
Time Series Patterns
Components of a time series: Secular trend, Cyclical variations,
Seasonal variations, Irregular variations
(© Wiley 2010)
Secular Trend Component
Value of the variable tends to increase or decrease over a long
period of time: an overall upward or downward movement in the
average value of the forecast variable
• Due to long-term factors like population increase, changing
demographic characteristics, technology, consumer preferences
etc.
• Data taken over a period of years
(Chart: Sales vs Time, showing an upward trend)
Seasonal Component
• Fluctuations are repeated within a year- daily, weekly, monthly, quarterly etc.
• Regular patterns of upward or downward swings- high degree of regularity
• Due to climate, weather, customs, traditions etc.- tend to be repeated from year to
year (Observed Within One Year mostly)
• daily traffic volume shows within-the-day “seasonal” behavior, with peak levels
occurring during rush hours, moderate flow during the rest of the day and early
evening, and light flow from midnight to early morning
(Chart: Sales vs Time (monthly or quarterly), with recurring seasonal swings
across Winter, Spring, Summer and Fall)
Irregular Component
• Rapid changes caused by short-term, unanticipated and non-recurring
factors
• Due to random variation or unforeseen events
• Nature (flood, earthquake), accidents, Union strike, War
• Erratic, unpredictable, unsystematic, random, ‘residual’ fluctuations
• Short duration & non-repeating
Time Series Forecasting
(Flowchart: Time Series → Trend? If No → Smoothing Methods: Moving Average,
Exponential Smoothing. If Yes → Trend Models: Linear, Quadratic, Exponential,
Auto-Regressive)
3 forecasting methods
• Three forecasting methods that are appropriate for a time series with
a horizontal pattern:
• moving averages, weighted moving averages, and exponential smoothing
• Since the objective of each of these methods is to “smooth out” the random
fluctuations in the time series, they are referred to as smoothing
methods
Moving Averages
• The moving averages method uses the average of the most
recent data values in the time series as the forecast for the
next period
• The term moving is used because every time a new
observation becomes available for the time series, it replaces
the oldest observation in the equation and a new average is
computed
• Eg: sales of women’s blouses in the first three weeks (in
thousands of Rs.) are 17, 21 and 19
• forecast of sales in week 4 using the average of the time
series values in weeks 1–3, F4= average of weeks 1-
3=(17+21+19)/3=19
• Thus, the moving average forecast of sales in week 4 is 19 or
Rs. 19,000
• But the actual value observed in week 4 is 23, so the forecast
error in week 4 is 23 − 19 = 4 (Rs. 4000)
• Next, we compute the forecast of sales in week 5 by averaging the time
series values in weeks 2–4.
• F5 average of weeks 2-4= (21+19+23)/3 =21
• Hence, the forecast of sales in week 5 is 21; the actual value observed in
week 5 being 18, the error associated with this forecast is 18 − 21 = −3
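The moving-average example above can be reproduced as a short sketch:

```python
# Three-week moving average forecast for the blouse-sales series.
def moving_average_forecast(series, k=3):
    return sum(series[-k:]) / k       # average of the k most recent values

sales = [17, 21, 19]                  # weeks 1-3 (thousands of Rs.)
f4 = moving_average_forecast(sales)   # forecast for week 4
sales.append(23)                      # actual week-4 value arrives
f5 = moving_average_forecast(sales)   # forecast for week 5 uses weeks 2-4
print(f4, f5)                         # 19.0 21.0
```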
WEIGHTED MOVING AVERAGES
• Select a different weight for each data value and then compute a
weighted average of the most recent values as the forecast.
• In most cases, the most recent observation receives the most weight, and
the weight decreases for older data values.
Exponential smoothing
• Exponential smoothing also uses a weighted average of past time
series values as a forecast; it is a special case of the weighted moving
averages method in which we select only one weight—the weight for
the most recent observation.
• The weights for the other data values are computed automatically
and become smaller as the observations move farther into the past
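A sketch of exponential smoothing on the same hypothetical weekly sales; the smoothing constant α = 0.2 and the convention that the first forecast equals the first observation are assumptions for illustration:

```python
# Exponential smoothing: F(t+1) = alpha*Y(t) + (1 - alpha)*F(t).
# Convention assumed here: the first forecast equals the first observation.
def exp_smooth_forecasts(series, alpha):
    forecasts = [series[0]]                      # F2 = Y1
    for y in series[1:]:
        f = alpha * y + (1 - alpha) * forecasts[-1]
        forecasts.append(f)
    return forecasts                             # F2, F3, F4, ...

sales = [17, 21, 19, 23]                # weekly sales (thousands of Rs.)
f = exp_smooth_forecasts(sales, alpha=0.2)
print(f)                                # F2 = 17, F3 ≈ 17.8, F4 ≈ 18.04
```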
Steps in forecasting
• Define organizational objective of forecasting
• Select the variables to be forecasted- eg. Capital investment,
employment level, etc.
• Determine the time horizon- short/ medium or long-term of the
forecast, to predict the future
• Select appropriate forecasting method
• Collect relevant data for forecasting
• Make the forecast and implement the results
Unit – VII: Introduction to Inferential
Statistics
• Meaning & Purpose of inferential statistics, Introduction to testing of Hypothesis:
Procedure for testing hypothesis - Setting of Hypothesis -Null and alternative
hypotheses,
• Computation of Test statistics ( simple problems)-
• Types of errors in hypothesis testing - Level of significance, Critical region and
value - Decision making.
• Test of significance for Large and small sample tests, Z and t tests for mean and
proportion,
• One way ANOVA, Chi-square test for goodness of fit and independence of
attributes.
• (Simple problems)
Hypothesis
• An unproven statement or supposition that tentatively explains
certain facts or phenomena; a proposition that is empirically testable
• Null hypothesis (H0): a statement in which no difference or no effect
is expected. It is the hypothesis that is always tested.
• Alternative hypothesis (H1): A statement that some difference or
effect is expected. It is a statement indicating the opposite of the null
hypothesis
The role of hypothesis in a research study
Types of hypothesis
1. Hypotheses based on Empirical Uniformities.
2. Hypotheses based on association between Variables.
Stating hypotheses
Stated in declarative form.
States generally a relationship between variables.
Ideally reflects the theoretical framework of the study
based on a theory/body of literature.
Is brief and to the point.
Example of Hypothesis
(Table: a Research Idea, its Objective, and the corresponding Hypothesis)
Right-tailed test: the critical (rejection) region is the right tail, beyond +zα.
Left-tailed test: the critical region is the left tail, beyond −zα.
2-tailed test: the area of acceptance (95%) lies between the critical values
−zα/2 and +zα/2; at α = 0.05 these are ±1.96, with 0.025 in each tail.
What does the critical region mean?
• The critical region of the sampling distribution of a
statistic is also known as the alpha region.
• The critical region of a hypothesis test is the set of
all outcomes which, if they occur, cause the H0 to be
rejected and the H1 accepted.
• The values within the acceptance region are called
acceptable at the 95% Confidence Level and if we
find that our sample mean lies within this region, we
would conclude that Ho is true and we accept it.
• The critical region CR, or rejection region RR, is a set of values of the test
statistic for which the null hypothesis is rejected in a hypothesis test.
• That is, the sample space for the test statistic is partitioned into two regions;
one region (the critical region) will lead us to reject the null hypothesis Ho, the
other will not.
• So, if the observed value of the test statistic is a member of the critical region,
we conclude "Reject Ho"; if it is not a member of the critical region then we
conclude "Do not reject Ho".
• For instance, if the acceptance region at a 95% confidence level is between
10 and 20, a sample mean falling inside that range leads us to conclude
"Do not reject Ho".
Hypotheses pertaining to left, right and 2-tailed tests
• There are three ways to set up the null and alternative
hypotheses, mathematically.
Type I and Type II errors
• Whenever we draw inferences about a population, there is a risk that
an incorrect conclusion will be reached
• Two types of errors can occur: Type I error and Type II error
• H0: Patient is alive (because null hypothesis represents no
change)
• H1: Patient is not alive (dead)
• Possible states of nature: (based on H0)
• Patient is alive (H0 true & H1 false)
• Patient is dead (H0 false & H1 true)
• Decisions are something the researcher has control over; we
make a correct or an incorrect decision
• Possible decisions (based on Ho) / conclusions (based on claim)
• Reject Ho: sufficient evidence to say patient is dead
• Fail to reject or accept Ho: insufficient evidence to say patient is
dead
• Four possibilities that can occur based on 2 possible states of
nature and the 2 decisions which we can make:
Testing of Hypotheses
Errors in hypothesis testing

Condition      Decision: Accept H0    Decision: Reject H0
H0 is True     Correct Decision       Type I Error
H0 is False    Type II Error          Correct Decision
Type I error
• Occurs when the sample results lead to the rejection of H0 when it is in fact true
• Type I error in this eg. would occur if we concluded, based on the sample data,
that the proportion of customers preferring the new service plan was greater than
0.40 when in fact it was less than or equal to 0.40 (i.e. when H0 is true). Hence
we make the mistake of introducing the service, incurring a huge loss.
• The probability of Type I error (α) is also called the level of significance
Type II error (β)
• Occurs when, based on the sample results, H0 is not rejected when it is
in fact false.
• In our eg., a Type II error would occur if we concluded, based on the sample
data, that the proportion of customers preferring the new service plan was less
than or equal to 0.40 when, in fact, it was greater than 0.40. Hence we make
the mistake of not introducing the service.
1. Test of hypothesis concerning
population mean
• Test concerning mean of one population
To test Ho: μ = μo against
a) H1: μ > μo
b) H1: μ < μo
c) H1: μ ≠ μo
• A sample of size n (n>30) is taken from the population with unknown
mean μ and known SD σ
• Let x be the sample mean
• Test statistic: z = (x − μo) / (σ/√n)
Case a) H1: μ> μo
• This is right tailed test.
• The rule is: “If z > zα (the tabled value), the test is significant. There is a
significant difference between the sample mean and the hypothetical
mean, and hence we reject Ho at (1−α)100% confidence level
(i.e. at the α level of significance)”
• “If z < zα (the tabled value), the test is not significant. There is no
significant difference between the sample mean and the hypothetical
mean, and we fail to reject Ho at (1−α)100% confidence level
(i.e. we accept Ho at the α level of significance)”
Case b) H1: μ< μo
• This is left tailed test.
• The rule is: “If z ≤ −zα (the tabled value), the test is significant. There is a
significant difference between the sample mean and the hypothetical
mean, and hence we reject Ho at (1−α)100% confidence level”
• “If z > −zα (the tabled value), the test is not significant. There is no
significant difference between the sample mean and the hypothetical
mean, and we fail to reject Ho at (1−α)100% confidence level”
Case c) H1: μ ≠ μo
• This is two tailed test.
• The rule is: “If the absolute value of z, i.e. |z| > zα/2 (the tabled value), the test is
significant. There is a significant difference between the sample mean and the
hypothetical mean, and hence we reject Ho at (1−α)100% confidence level”
• “If |z| < zα/2 (the tabled value), the test is not significant. There is no significant
difference between the sample mean and the hypothetical mean, and
we fail to reject Ho at (1−α)100% confidence level”
• To save time and effort, the table below relates critical z values to alpha
levels and type of test (one-tailed or two-tailed).

Alpha   Tails   Critical z
0.05    two     ±1.96
0.05    right   +1.645
0.05    left    −1.645
0.01    two     ±2.58
0.01    right   +2.33
0.01    left    −2.33
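These tabled values can be verified with the standard-normal inverse CDF in Python's standard library (`statistics.NormalDist`, available from Python 3.8), a small verification sketch:

```python
# Verifying the table of critical z values with the standard-normal
# inverse CDF from the Python standard library.
from statistics import NormalDist

z = NormalDist()                          # standard normal: mean 0, sd 1

two_tailed_05 = z.inv_cdf(1 - 0.05 / 2)   # ~ 1.96
right_05      = z.inv_cdf(1 - 0.05)       # ~ 1.645
two_tailed_01 = z.inv_cdf(1 - 0.01 / 2)   # ~ 2.58
right_01      = z.inv_cdf(1 - 0.01)       # ~ 2.33
print(two_tailed_05, right_05, two_tailed_01, right_01)
```

By symmetry, the left-tailed values are the negatives of the right-tailed ones.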
Practice 1
• A sample of 100 students is taken from the students of a college, with
heights having standard deviation 10 cm. The mean height of the
sample of students was found to be 168.8 cm. Can we accept the
assumption that the mean height of the students of the college is 170
cm? Significance level = 0.05
Solution 1
• σ=10
• x = 168.8
• n=100
• To test Ho: μ = 170 against H1: μ ≠ 170
• This is a 2-tailed test
• α= 0.05, then zα/2 = 1.96
• Applying the formula, z= -1.2
• Here IzI < zα/2 and hence we accept the assumption that the mean
height of the students of the college is 170 cm
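The calculation above can be reproduced in a few lines of Python (a verification sketch):

```python
# Two-tailed z test for Practice 1: Ho: mu = 170 vs H1: mu != 170.
from math import sqrt

sigma, n = 10, 100
x_bar, mu0 = 168.8, 170
z_crit = 1.96                      # z(alpha/2) at alpha = 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))
print(z)                           # ~ -1.2
print(abs(z) < z_crit)             # True -> fail to reject Ho
```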
Practice 2
• A sample of 400 observations was taken from a population with
standard deviation 15. If the mean of the sample is 27, test the
hypothesis that the mean of the population is less than 24.
• α = 0.05
Solution 2
• To test Ho: μ = 24 against H1: μ < 24
• σ = 15, α = 0.05
• x = 27, so −zα = −1.645
• n = 400
• Applying the formula, z = (27 − 24)/(15/20) = 4
• This is a left-tailed test. Since z > −zα, the test is not
significant. We accept Ho at 95% CL.
• Hence, the mean is reasonably accepted to be 24.
2. Test of hypothesis concerning population
proportions
• Test concerning one population proportion
To test Ho: p = po against (read po as “p-nought”, the hypothesized proportion)
a) H1: p > po
b) H1: p < po
c) H1: p ≠ po
• Given a sample of size n from the population.
• x is the number of items having a particular characteristic
• Sample proportion p=x/n
• Formula to calculate z: z = (p − po) / √(po(1 − po)/n)
Practice 1
• In a survey of 70 business firms, it was found that 45 are planning to
expand their capacities next year. Does the sample information
contradict the hypothesis that 70% of the firms are planning to
expand next year?
Solution
• To test H0: p = 0.7 against H1: p ≠ 0.7
• This is a 2-tailed test. At α = 0.05, zα/2 = 1.96
• n = 70, x = 45
• p = x/n = 45/70 ≈ 0.643
• z = (p − po)/√(po(1 − po)/n)
= (0.643 − 0.70)/√((0.7 × 0.3)/70)
= −0.057/0.0548 ≈ −1.04
|z| = 1.04 < zα/2 (i.e. 1.96)
The test is not significant and we accept Ho at 95% CL. There is no reason to doubt the
hypothesis that 70% of the companies are going to expand their capacities.
• Practice 2
• An e-commerce research company claims that 60%
or more graduate students have bought
merchandise on-line. A consumer group is
suspicious of the claim and thinks that the
proportion is lower than 60%. A random sample of
80 graduate students show that only 22 students
have ever done so. Is there enough evidence to
show that the true proportion is lower than 60%?
Conduct the test at 5% Type I error rate, and use
the rejection region approaches.
• Left tailed test
• To test H0: p= 0.6 against H1: p< 0.6
• n=80, x=22; p= x/n =22/80=0.275;
• po = 0.6; 1- po = 0.4
• Z= (p- po) / √ po(1- po)/n
• =(0.275−0.6)/√[0.6×0.4]/80= −5.93
• Z< - zα; Test is significant and we reject Ho at
5% SL. There is enough evidence to show
that the true proportion is lower than 60%
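This proportion test can be reproduced as a short sketch:

```python
# Left-tailed z test for a proportion: Ho: p = 0.6 vs H1: p < 0.6.
from math import sqrt

n, x = 80, 22
p0 = 0.6
p_hat = x / n                                  # 0.275

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
print(z)                                       # ~ -5.93
print(z < -1.645)                              # True -> reject Ho at 5% SL
```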
3. Test of hypothesis concerning Chi Square
Statistic
• When the assumption that ‘the samples are drawn from a normal population’ cannot be
justified, we use statistical procedures generally referred to as non-parametric tests.
• Chi square is one such test belonging to this category, first used by Karl Pearson
Properties
• Chi square distribution is a continuous probability distribution which has a
value zero as its lower limit and extends to infinity in the positive direction
• It can never have a negative value, as the difference between observed
and expected frequencies is squared
• The exact shape of the distribution depends upon the degrees of freedom
• For a small df, the shape of the curve is positively skewed. As the df
becomes larger, it becomes symmetrical and approximates to the shape of
a normal distribution
• It makes no assumptions about the population being sampled
• The greater the chi square value, the greater is the discrepancy between
observed and expected frequencies
3. Test of hypothesis concerning Chi Square
Statistic
Calculate the chi square statistic by completing the following steps:
• For each observed number in the table, subtract the corresponding
expected number (O − E).
• Square the difference: (O − E)².
• Divide the square obtained for each cell in the table by the expected
number for that cell: (O − E)²/E.
• Sum all the values of (O − E)²/E. This is the chi square statistic.
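As a sketch, the steps can be applied to a hypothetical goodness-of-fit example: testing whether a die is fair from 60 hypothetical rolls (the tabled chi-square for df = 5 at α = 0.05 is 11.07):

```python
# Chi-square goodness-of-fit sketch: is a die fair? The observed counts
# from 60 rolls are hypothetical; expected frequency under Ho is 10 per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / len(observed)] * len(observed)

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)                 # 1.0

# df = 6 - 1 = 5; tabled chi-square at alpha = 0.05 is 11.07
print(chi_sq > 11.07)         # False -> not significant, accept Ho
```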
Expected value
• Eij = (Ri * Cj)/n
where
Ri= total observed frequency in the ith row
Cj= total observed frequency in the jth column
and n is the sample size
Conditions
• Minimum 50 observations in the sample (n > 50)
• Each cell frequency should not be less than 5 observations, otherwise
increase the sample size per cell
• The data should be expressed in original units (frequencies/counts),
i.e. frequencies and not in percentage or ratio form
• Sample data to be drawn at random from the target population
• Formulate the null hypothesis and determine the expected
frequency of each answer
• Ho: the two attributes are independent/ there is no association
between the attributes
• H1: X is dependent on Y/ there is a significant association
between the 2 attributes
• Determine the appropriate significance level
• Calculate the chi-square value, using the observed (from
sample) and expected frequencies
• Make the statistical decision by comparing the calculated chi
square with the critical (tabled) value
Decision rule- X2 test
• The tabled X² value is X²(k−1, α), where k is the number of classes and k−1
is the degrees of freedom used to find the tabled value.
• If calculated X² > tabled X²(k−1, α), the test is significant and we reject H0 at
(1−α)100% CL.
• Otherwise we accept H0
Problem
• A company has to choose among three proposed pension plans. The company
wishes to test the hypothesis ‘preference for plans is independent of job
classification’. It asks the opinion of a sample of employees and obtains the
information presented in the table. Test the hypothesis which the company
wishes to do.

Observed frequencies (no. of employees favouring each plan):

Job classification    Plan A   Plan B   Plan C   Total
Factory employees       160      30       10      200
Clerical employees      148      40       20      208
Supervisors              72      10       10       92
Executives               70      20       10      100
Total                   450     100       50      600

Expected frequencies, Eij = (Ri × Cj)/n:

Factory employees       150     33.33    16.67
Clerical employees      156     34.67    17.33
Supervisors              69     15.33     7.67
Executives               75     16.67     8.33

Ho: preference for plans is independent of job classification
H1: preference for plans is dependent on job classification

X² = ∑(O − E)²/E ≈ 9.34 (for example, the Plan A cells contribute
100/150 = 0.67, 64/156 = 0.41, 9/69 = 0.13 and 25/75 = 0.33)
dof = (4 − 1) × (3 − 1) = 6
X² calc (9.34) < X² 6,0.05 (= 12.59), so the test is not significant: we accept
Ho that preference for plans is independent of job classification.
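The chi-square computation for this table can be verified in Python (the tabled value 12.59 for df = 6, α = 0.05 is from the chi-square table):

```python
# Chi-square test of independence for the pension-plan table:
# expected frequencies Eij = (row total * column total) / n.
observed = [
    [160, 30, 10],   # factory employees
    [148, 40, 20],   # clerical employees
    [72, 10, 10],    # supervisors
    [70, 20, 10],    # executives
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (4-1)*(3-1) = 6
print(round(chi_sq, 2), df)        # ~ 9.34, df = 6
print(chi_sq > 12.59)              # False -> accept Ho (independence)
```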
• This has a X2 distribution with (r-1)*(c-1) degrees of freedom
• If calculated X2 > tabled X2 (r-1)*(c-1) ,α, the test is significant and we reject H0
at (1-α)100% CL. There is evidence to believe that the two attributes
(variable 1 and variable 2) are dependent or related
• Otherwise we accept H0
• Cell: section of a table representing a specific combination of two
variables or a specific value of a variable
Chi square problem
• Of the 1000 workers in a factory exposed to Covid-19, 700 in all were
attacked; 400 had been inoculated, and of these 200 were attacked
• On the basis of this information, can it be said that inoculation and
attack are independent?
Table (cell frequencies derived from the information above):

                  Attacked   Not attacked   Total
Inoculated           200          200         400
Not inoculated       500          100         600
Total                700          300        1000
Tests based on statistics following Student’s t distribution:
the study of statistical inference with small samples
• If the original population is normally distributed and the SD of the
population is unknown, the sampling distribution of the mean
derived from small samples (n<30) will follow a t-distribution
• The shape of the t-distribution is influenced by its degrees of
freedom (d.o.f or d.f)
• The number of d.o.f is equal to the number of useful items of
information generated by a sample of given size with respect to the
estimation of a given population parameter
• It is calculated as df= n-1
• In statistics, Student's t-
distribution (or simply
the t-distribution) is a
probability distribution
that arises in the problem
of estimating the mean of
a normally distributed
population when the
sample size is small
• Assumptions:
1.Population is normal
2.SD of population is
unknown
• Properties:
1. It ranges from -∞ to +∞
2. It is bell shaped and symmetrical around the mean
3. Its shape changes with the change in df
4. It is more platykurtic than the normal distribution
5. As n approaches 30, the t-distribution approaches the normal form
• The t-table:
1. Its value is called tα or tα/2
2. Determined from the table given a particular df and level of significance
• A sample of size n (n<30) is taken from a normal population with
unknown population standard deviation
• Let x be the sample mean and s be the sample SD
• Then t = (x − μ) / (s/√n)
• μ is the hypothesized population mean
• s = √( ∑(xi − x)² / (n − 1) ), where the xi are the sample observations
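A sketch of the t computation with a hypothetical small sample (n = 8 packet weights, testing Ho: μ = 52; the tabled two-tailed t(0.05, 7) = 2.365 is from the t-table):

```python
# One-sample t statistic for a hypothetical small sample (n = 8),
# using the n-1 sample standard deviation.
from math import sqrt
from statistics import mean, stdev   # stdev divides by n-1

sample = [48, 50, 49, 51, 52, 47, 50, 53]   # hypothetical data
mu0 = 52                                    # hypothesized mean

n = len(sample)
x_bar = mean(sample)                 # 50.0
s = stdev(sample)                    # 2.0
t = (x_bar - mu0) / (s / sqrt(n))
print(t)                             # ~ -2.83; |t| > 2.365 -> reject Ho
```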
Questions you may ask to arrive at a decision
between t and Z
• Is the population standard deviation (σ) known?
• If the answer is yes, the Z-distribution is appropriate
• When σ is unknown, a second question is asked: “Is the sample size
greater than 30?”
• If the answer is no, the t-distribution should be used, if it is yes, the Z-
distribution should be used (because as the sample size increases, the t-
distribution becomes increasingly similar to the Z-distribution)
Test concerning mean of one population
Case a) H1: μ> μo
• This is right tailed test.
• The golden rule is: “If calculated t> tα,n-1 (the tabled value with n-1 degrees
of freedom), the test is significant. We reject Ho at (1- α)*100% confidence
level”
• “Otherwise, we accept (fail to reject) Ho at (1- α)100% confidence level”
Case b) H1: μ < μo
• This is a left-tailed test.
• The golden rule is: “If calculated t < -tα,n-1 (the tabled value with n-1 degrees of freedom), the test is significant. We reject Ho at the (1-α)100% confidence level”
• “Otherwise, we accept (fail to reject) Ho at the (1-α)100% confidence level”
Case c) H1: μ ≠ μo
• This is a two-tailed test.
• The golden rule is: “If the absolute value of the calculated t, |t| > tα/2,n-1 (the tabled value with n-1 degrees of freedom), the test is significant. We reject Ho at the (1-α)100% confidence level”
• “Otherwise, we accept (fail to reject) Ho at the (1-α)100% confidence level”
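The three golden rules can be sketched as one decision function; the function name and example numbers are my own, while the critical values are standard t-table entries for 9 degrees of freedom:

```python
def t_test_decision(t_calc: float, t_crit: float, tail: str) -> str:
    """Apply the golden rules above. t_crit is t(alpha, n-1) from the
    t-table (use t(alpha/2, n-1) for a two-tailed test)."""
    if tail == "right":        # Case a) H1: mu > mu0
        reject = t_calc > t_crit
    elif tail == "left":       # Case b) H1: mu < mu0
        reject = t_calc < -t_crit
    else:                      # Case c) "two": H1: mu != mu0
        reject = abs(t_calc) > t_crit
    return "reject H0" if reject else "fail to reject H0"

# From a t-table: t(0.05, 9) = 1.833 and t(0.025, 9) = 2.262
print(t_test_decision(2.10, 1.833, "right"))  # reject H0
print(t_test_decision(2.10, 2.262, "two"))    # fail to reject H0
```

Note how the same calculated t can be significant one-tailed but not two-tailed, because the two-tailed critical value is larger.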
Analysis Of Variance (ANOVA)
• Analysis of variance (ANOVA) is a collection of statistical models and their associated
estimation procedures (such as the "variation" among and between groups) used to
analyze the differences among means. ANOVA was developed by the statistician Ronald
Fisher.
• When the means of more than two groups or populations are to be compared, one-way
analysis of variance, a bivariate statistical technique, is the appropriate statistical tool
• One way because there is only one independent variable (though several levels of that
variable may be present)
• It is the analysis of the effects of one treatment variable on an interval-scaled or ratio-
scaled dependent variable; a technique to determine if statistically significant
differences in means occur between two or more groups
• E.g. Students from different colleges take the same exam. You want to see if one college outperforms the others.
Example of an ANOVA problem
• To compare women who are working full-time outside the
home, working part-time outside the home, and not working
outside the home on their willingness to purchase a personal
computer
• This eg: has only one IV- working status with 3 levels:
• Full time employment
• Part-time employment and
• No employment outside the home
• Because there are 3 levels (groups), a t-test cannot be used to
test for statistical significance.
Contd…
• The null hypothesis, here, can be stated as “All the means are equal”
or
• Ho: μ1= μ2 = μ3
Means: X̄1 = 104.75, X̄2 = 134.75, X̄3 = 119.25
Grand mean (the mean of all 3 group means, since each group has the same size): X̄ = 119.58
• Total sum of squares = within-group sum of squares + between-group sum of squares
i.e. SStotal = SSwithin + SSbetween,
where SSwithin = Σi Σj (Xij - X̄j)², summing over the observations i within each of the c groups j
Applying the formula…
• SSwithin = (130-104.75)² + (118-104.75)²
+ (87-104.75)² + (84-104.75)²
+ (145-134.75)² + (143-134.75)²
+ (120-134.75)² + (131-134.75)²
+ (153-119.25)² + (129-119.25)²
+ (96-119.25)² + (99-119.25)²
= 4148.25
To calculate SSbetween
• SSbetween, the variability of the group means about the grand mean, is calculated by squaring the deviation of each group mean from the grand mean, multiplying by the number of items in the group, and summing these scores:
• SSbetween = Σj=1..c nj (X̄j - X̄)² where
• X̄ = grand mean
• nj = number of items in the jth group (here nj = 4 for every group)
Applying the formula…
• SSbetween = 4(104.75-119.58)²
+ 4(134.75-119.58)²
+ 4(119.25-119.58)²
= 1800.68
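Both sums of squares can be reproduced from the twelve observations in the example. A small caveat: using the unrounded grand mean (119.583…) gives SSbetween ≈ 1800.67, while the notes' 1800.68 comes from rounding the grand mean to 119.58 first:

```python
# Observations recovered from the SSwithin calculation above
groups = [[130, 118, 87, 84],    # group 1, mean 104.75
          [145, 143, 120, 131],  # group 2, mean 134.75
          [153, 129, 96, 99]]    # group 3, mean 119.25

means = [sum(g) / len(g) for g in groups]
all_obs = [x for g in groups for x in g]
grand = sum(all_obs) / len(all_obs)  # 119.583...

ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
ss_total = sum((x - grand) ** 2 for x in all_obs)

print(round(ss_within, 2), round(ss_between, 2))  # 4148.25 1800.67
# The decomposition SStotal = SSwithin + SSbetween holds:
print(round(abs(ss_total - ss_within - ss_between), 6))  # 0.0
```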
• The next calculation requires dividing the various sums of squares by
their appropriate degrees of freedom.
• These divisions produce the variances, or mean squares
MSbetween
• To obtain mean square between groups, SSbetween is divided by c-1
degrees of freedom:
• MSbetween = SSbetween / (c-1) = 1800.68 / (3-1) = 900.34
MSwithin
• To obtain the mean square within groups, SSwithin is divided by cn-c
degrees of freedom
• MSwithin = SSwithin / (cn-c) = 4148.25 / (12-3) = 460.91
F-ratio
• F-ratio is calculated by taking the ratio of the mean square between
groups to the mean square within groups
• The between-groups mean square is used as the numerator and the
within-groups mean square is used as the denominator:
F = MSbetween / MSwithin = 900.34 / 460.91 = 1.95
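Continuing the same sketch, the mean squares and F-ratio follow directly from the sums of squares (c = 3 groups, 12 observations in total):

```python
ss_between = 1800.68   # from the worked example
ss_within = 4148.25
c, n_total = 3, 12

ms_between = ss_between / (c - 1)      # 900.34
ms_within = ss_within / (n_total - c)  # 460.92 (the notes truncate to 460.91)
f_ratio = ms_between / ms_within

print(round(f_ratio, 2))  # 1.95
```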
Summary for Analysis Of Variance
Source of variation | Sum of squares                | Degrees of freedom | Mean square                  | F-ratio
Between groups      | SSbetween = Σj nj (X̄j - X̄)²   | c-1                | MSbetween = SSbetween/(c-1)  | --
Within groups       | SSwithin = Σi Σj (Xij - X̄j)²  | cn-c               | MSwithin = SSwithin/(cn-c)   | F = MSbetween/MSwithin
From the F-distribution table, the critical value at the 0.05 level for 2 (numerator, n1) and 9 (denominator, n2) degrees of freedom is F = 4.26
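The final comparison against the tabled critical value is then a one-line decision (the critical value is the F-table entry just stated):

```python
f_calc = 1.95
f_crit = 4.26  # F(0.05; df1=2, df2=9) from the F-table

decision = "reject H0" if f_calc > f_crit else "fail to reject H0"
print(decision)  # fail to reject H0
```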
Pricing experiment: ANOVA table
• As the calculated F = 1.95 < 4.26 (the tabled value), we fail to reject H0 at the 95% confidence level
• We conclude that all the price treatments produce approximately the same sales volume

Source of variation | Sum of squares | Degrees of freedom | Mean square | F-ratio
Between groups      | 1800.68        | 2                  | 900.34      | --
Within groups       | 4148.25        | 9                  | 460.91      | 1.953
Total               | 5948.93        | 11                 | --          | --
All the best!