STAT-702 Unit # 1

Introduction to Statistics
Objectives
• Understand the complexity of managerial decisions.
• Recognize the need for a quantitative approach to managerial decisions.
• Appreciate the role of statistical methods in data analysis.
INTRODUCTION TO STATISTICS
• Statistics:
Statistics may be defined as the science of collecting, presenting, analyzing, and interpreting numerical data under conditions of uncertainty.
• Descriptive Statistics:
Gives numerical and graphical procedures to summarize a collection of data in a clear and understandable way.
• Inferential Statistics:
Provides procedures to draw inferences about a population from a sample.
Statistics presents a rigorous scientific method for gaining insight into data. For example, suppose we measure the weight of 100 patients in a study. With so many measurements, simply looking at the data fails to provide an informative account. However, statistics can give an instant overall picture of the data through graphical presentation or numerical summarization, irrespective of the number of data points. Besides data summarization, another important task of statistics is to make inferences and predict relations between variables.
 Population
A population is the totality of the observations made on all the objects
(under investigation) possessing some common specific characteristics,
which are of particular interest to researchers. It is the entire group
whose characteristics are to be estimated.
For example, the heights of all the students enrolled at UAF in M.Com
degree in Spring 2014, the wages of all employees of a mill in a given
year, etc. A population may be finite or infinite. The number of
observations in a finite population is called the size of the population
and is denoted by the letter N.
 Sample
A sample is a representative part of the population which is selected to
obtain information concerning the characteristics of the population.
The number of observations in a sample is called the size of the sample
which is denoted by n.
 Sampling
The process of drawing a sample from a population is called sampling.
Why take a sample instead of studying every member of the population?
• A sample of registered voters is necessary because of the prohibitive cost of contacting millions of voters before an election.
• Testing wheat for moisture content destroys the wheat, thus making a sample imperative.
• If the soft drink tasters tested all the soft drinks, none would be available for sale.
 Parameter
A parameter is a numerical characteristic of a population, such as its mean or standard deviation. Parameters are fixed constants that characterize a population and are denoted by Greek letters.
 Statistic
A statistic is a numerical characteristic of a sample, such as its mean or standard deviation. Statistics are used to draw valid inferences about the population and are denoted by Latin letters. Unlike a parameter, a statistic is a variable quantity: its value changes from sample to sample.
Variables
A variable is any characteristic that may vary with respect to individual, time, or place. For example:
• Number of products produced by a machine during a specified period of time
• Number of workers
• Weight of an individual
• Price, sales, advertising expenditures
• Quality, design, performance
Variables are usually represented by the last letters of the alphabet, such as X, Y, Z.
Types of variables
Fixed variables:
1. Design
2. Quality
3. Adv. Expenditures
4. Diet
5. Dose of a medicine
Random variables:
1. Sales
2. Growth
3. Recovery Time
Types of variables
• Qualitative variable
A characteristic that varies in quality (not numerically) from one individual to another, also called an attribute, e.g. eye color, education level, behavior, quality, design, performance.
• Quantitative variable
A variable is called quantitative when it varies in quantity (numerically) from one individual to another, e.g. age, income, temperature, price, sales, advertising expenditures.
Types of quantitative variable
• Discrete variable
A variable that can take only specified values, changing by jumps or breaks, e.g. number of rooms in a house, number of deaths in an accident.
• Continuous variable
A variable that can assume any value (fractional or integral) between two specified values 'a' and 'b', e.g. height of a plant, speed of a car, sales, price.
Measurement
• Measurement is the process of assigning numbers or labels to objects, persons, states, or events in accordance with specific, logically accepted rules for representing quantities or qualities of attributes or characteristics. Data can be classified according to levels of measurement. The level of measurement often dictates the calculations that can be done to summarize and present the data, and it also determines the statistical tests that should be performed. There are four levels of measurement: nominal, ordinal, interval, and ratio [Stevens 1951].
• The lowest, or most primitive, level is the nominal level. The highest, the level that gives the most information about the observation, is the ratio level.
Levels of Measurement
• Nominal level
The data are only descriptive labels, e.g. eye color, religion, gender, country name, region, product name, Reg. #.
• Ordinal level
The data have a rank order, though the intervals between data points cannot be considered equal, e.g. high/medium/low income, cricket team standings in the ICC ranking, students' grades.
• Interval level
The data have equal intervals between data points but no true zero, e.g. temperature in °C or °F, shoe size, IQ scores.
• Ratio level
The data have equal intervals between data points and a true zero, e.g. bank balance, weight, height.
 Observation
The numerical recording of information is called an observation or datum.
Observations can be divided into three types:
• Categorical, where the observations fall into a limited number of categories that have no obvious scale (e.g. 'Pass', 'Fail', 'Yes', 'No').
• Discrete, where there is a real scale but not all values are possible (e.g. number of products, number of students).
• Continuous, where any value is theoretically possible, restricted only by the measuring device (e.g. lengths, concentrations, weight).
 Data
A collection of related observations is called data.
• Classification of data
Data that have been originally collected and have not undergone any sort of statistical treatment are called primary data, while data that have undergone statistical treatment at least once are called secondary data.
Data may be available from existing sources, e.g. records and publications, or may have to be collected afresh.
 Collection of primary data
(1) Direct personal investigation
(2) Personal interview
(3) Collection through questionnaires
(4) Collection through enumerators
(5) Collection through local sources
• Collection of secondary data
(1) Official publications
  • Federal Bureau of Statistics
  • Population Census Organization
  • Ministries of Health, Food, Agriculture, Finance, etc.
  • Provincial Bureaus of Statistics
(2) Semi-official sources
  • Publications of the State Bank of Pakistan
  • NBP
  • District Councils
  • WAPDA
(3) Private sources
  • Chambers of Commerce & Industry
  • Co-operative Societies
(4) Research organizations
  • PARC, NARC, universities
A Taxonomy of Statistics
Arithmetic Mean
The arithmetic mean is defined as the value obtained by dividing the sum of all the observations by their number, that is
$$\text{Arithmetic Mean} = \frac{\text{Sum of all the observations}}{\text{Number of observations}}$$
If $X_1, X_2, \dots, X_n$ are n observations of a variable X, then their arithmetic mean (AM) is defined as:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Arithmetic Mean
The marks obtained by 8 students are given below. Find the arithmetic mean. Let X = marks; then

X
67
72
68
70
65
68
75
63
Total: 548

$$\sum X = 548, \quad n = 8$$
$$\bar{X} = \frac{\sum X}{n} = \frac{548}{8} = 68.5 \text{ marks}$$
NOTE: Unless all the observations are equal, at least one observation will be below the mean and at least one will be above it.
Example: The heights of 15 plants are given below. Find the arithmetic mean. Let X = plant height.

Listing   X
1         14
2         17
3         31
4         28
5         42
6         43
7         51
8         51
9         66
10        70
11        67
12        70
13        78
14        62
15        47
Total     737

$$\sum X = 737, \quad n = 15$$
$$\text{A.M.} = \bar{X} = \frac{\sum X}{n} = \frac{737}{15} = 49.13$$
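The calculation above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `arithmetic_mean` is an assumption made for this example and is not part of the course material.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their number."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Plant heights from the example (15 observations, total 737)
plant_heights = [14, 17, 31, 28, 42, 43, 51, 51, 66, 70, 67, 70, 78, 62, 47]
mean_height = arithmetic_mean(plant_heights)
print(round(mean_height, 2))
```

The same function applied to the eight marks 67, 72, 68, 70, 65, 68, 75, 63 gives the 68.5 obtained by hand.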
Example: Days Off per Year
The data represent the number of days off per year for a sample of individuals selected from nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n}$$
$$\bar{X} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} = 30.7$$
The mean number of days off is 30.7 days.
EXAMPLE 1: The receipts of a news agent for the seven days of a particular week are given below:

Day          Receipt of News Agent
Monday       £ 9.90
Tuesday      £ 7.75
Wednesday    £ 19.50
Thursday     £ 32.75
Friday       £ 63.75
Saturday     £ 75.50
Sunday       £ 50.70
Total        £ 259.85

Mean sales per day in this week:
$$\bar{X} = \frac{\sum_{i=1}^{7} X_i}{n} = \frac{259.85}{7} = 37.12$$
that is, £ 37.12 per day (to the nearest penny).
Interpretation:
The mean, £ 37.12, represents the amount (in pounds sterling) that would have been received on each day if the receipts had been the same on every day of the week.
To calculate the approximate value of the mean for grouped data, the observations in each class are assumed to be identical with the class midpoint $X_i$. (This is based on the assumption that the observations in the class are evenly scattered between the two extremes of the class interval.)
The midpoint of every class is known as its class mark. In other words, the midpoint of a class 'marks' that class.
Measures of Dispersion
The scatter of the values about their center is called dispersion, and the measures used to quantify the amount of scatter about the center are called measures of dispersion.
Measures of dispersion (variation) express the variation present among the values in a data set as a single number, so they are summary measures of the spread of the values in the data.
Types of Measures of Dispersion
There are two main types of measures of dispersion:
1. Absolute Measure of Dispersion
2. Relative Measure of Dispersion
1. Absolute Measure of Dispersion
The absolute measure of dispersion measures the variation present
among the observations in the unit of the variable or square of
the unit of the variable.
2. Relative Measure of Dispersion
The relative measure of dispersion measures the variation present
among the observations relative to their average. It is expressed
in the form of ratio, or percentage. It is independent of the unit
of measurement.
Measures of Dispersion
The commonly used measures of absolute dispersion
are:
1. Range
2. Quartile Deviation
3. Mean (Average) Deviation
4. Variance and Standard Deviation
Their corresponding measures of relative dispersion are:
1. Coefficient of Range/Coefficient of dispersion
2. Coefficient of Quartile Deviation
3. Coefficient of Mean (Average) Deviation
4. Coefficient of Variation (CV)
Measures of Spread
– Distance Based Measures of Spread
• The range
• The Semi interquartile range
– Centre Based Measures of Spread
• The mean deviation
• The variance
• The standard deviation
Variance: defined as the arithmetic mean of the squared deviations of the observations from their mean. It is denoted by $S^2$.
Biased estimate of variance:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n} \quad \text{or} \quad S^2 = \frac{1}{n}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]$$
Unbiased estimate of variance:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n-1} \quad \text{or} \quad S^2 = \frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]$$
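Q. Find the variance $S^2$ of the given data.

X       $(X - \bar{X})^2$
4       36
6       16
9       1
12      4
13      9
16      36
Total   60      102

$$\bar{X} = \frac{\sum X}{n} = \frac{60}{6} = 10$$
Biased estimate:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n} = \frac{102}{6} = 17$$
Unbiased estimate:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n-1} = \frac{102}{5} = 20.4$$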
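Q. Find the variance $S^2$ of the given data (n = 6), using the sums $\sum X$ and $\sum X^2$.

X (cm)   $X^2$
4        16
6        36
9        81
12       144
13       169
16       256
Total    60       702

$$\bar{X} = \frac{\sum X}{n} = \frac{60}{6} = 10$$
Biased estimate:
$$S^2 = \frac{1}{n}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right] = \frac{1}{6}\left[702 - \frac{(60)^2}{6}\right] = \frac{102}{6} = 17 \text{ cm}^2$$
Unbiased estimate:
$$S^2 = \frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right] = \frac{1}{5}\left[702 - \frac{(60)^2}{6}\right] = \frac{102}{5} = 20.4 \text{ cm}^2$$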
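As a quick check that the deviation formula and the sum-of-squares formula for the variance agree, here is a small Python sketch. The function names `variance` and `variance_shortcut` and the `unbiased` flag are assumptions chosen for this illustration.

```python
import math


def variance(values, unbiased=False):
    """Variance from squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1 if unbiased else n)


def variance_shortcut(values, unbiased=False):
    """Variance from sum(X^2) - (sum X)^2 / n, without computing deviations."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    ss = sum_x2 - sum_x ** 2 / n
    return ss / (n - 1 if unbiased else n)


data = [4, 6, 9, 12, 13, 16]  # observations from the worked example (cm)
biased = variance(data)
unbiased = variance(data, unbiased=True)
print(biased, unbiased)
print(round(math.sqrt(biased), 3), round(math.sqrt(unbiased), 3))
```

Both functions return the same values for this data set; the square roots are the corresponding standard deviations.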
Standard Deviation: defined as the positive square root of the arithmetic mean of the squared deviations of the observations from their mean. It is denoted by $S$.
Biased estimate of S.D.:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \quad \text{or} \quad S = \sqrt{\frac{1}{n}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]}$$
Unbiased estimate of S.D.:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} \quad \text{or} \quad S = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]}$$
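Q. Find the S.D. $S$ of the given data.

X       $(X - \bar{X})^2$
4       36
6       16
9       1
12      4
13      9
16      36
Total   60      102

$$\bar{X} = \frac{\sum X}{n} = \frac{60}{6} = 10$$
Biased estimate:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{102}{6}} = 4.123$$
Unbiased estimate:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n-1}} = \sqrt{\frac{102}{5}} = 4.517$$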
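Q. Find the S.D. and coefficient of variation of the given data (n = 6).

X (cm)   $X^2$
4        16
6        36
9        81
12       144
13       169
16       256
Total    60       702

$$\bar{X} = \frac{\sum X}{n} = \frac{60}{6} = 10$$
Biased estimate:
$$S = \sqrt{\frac{1}{n}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]} = \sqrt{\frac{1}{6}\left[702 - \frac{(60)^2}{6}\right]} = \sqrt{17} = 4.123 \text{ cm}$$
Unbiased estimate:
$$S = \sqrt{\frac{1}{n-1}\left[\sum X^2 - \frac{(\sum X)^2}{n}\right]} = \sqrt{\frac{1}{5}\left[702 - \frac{(60)^2}{6}\right]} = \sqrt{20.4} = 4.517 \text{ cm}$$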
Coefficient of Variation (CV)
• Always expressed as a percentage (%)
• Shows relative variability, that is, variation relative to the mean
• Can be used to compare two or more sets of data measured in different units, or in the same units but with different average sizes
$$C.V = \frac{S}{\bar{X}} \times 100\%$$
Coefficient of Variation (CV)
$\bar{X} = 10$, $S = 4.123$
$$CV = \frac{4.123}{10} \times 100\% = 41.23\%$$
Comparing Coefficient of Variation
Mr. Ali
Average marks = $\bar{X}_1$ = 80
Standard deviation = $S_1$ = 5
$$CV_{Ali} = \frac{S_1}{\bar{X}_1} \times 100\% = \frac{5}{80} \times 100\% = 6.25\%$$
Mr. Zain
Average marks = $\bar{X}_2$ = 80
Standard deviation = $S_2$ = 15
$$CV_{Zain} = \frac{S_2}{\bar{X}_2} \times 100\% = \frac{15}{80} \times 100\% = 18.75\%$$
Mr. Ali has the smaller CV, so he is more consistent in performance.
Comparing Coefficient of Variation
• Stock A:
o Average price last year = $50
o Standard deviation = $5
CV_A = (s / x̄) × 100% = ($5 / $50) × 100% = 10%
• Stock B:
o Average price last year = $100
o Standard deviation = $5
CV_B = (s / x̄) × 100% = ($5 / $100) × 100% = 5%
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
Coefficient of Variation
Summary statistics for WEIGHT and HEIGHT (both ratio variables) of Pakistani adults in different units:

Variable   Mean               SD
Weight     160 pounds         30 pounds
           72.6 kilograms     13.6 kilograms
           0.08 tons          0.015 tons
Height     66 inches          4 inches
           5.5 feet           0.33 feet
           168 centimeters    10.2 centimeters

Which variable (WEIGHT or HEIGHT) has greater dispersion? No meaningful answer can be given, because the two are measured in different units.
Which variable has greater dispersion relative to its average, i.e. the greater coefficient of variation (SD relative to mean)?
CV_Weight = (S / X̄) × 100%: 30/160 ≈ 13.6/72.6 ≈ 0.015/0.08 ≈ 18.7%
CV_Height = (S / X̄) × 100%: 4/66 ≈ 0.33/5.5 ≈ 10.2/168 ≈ 6.1%
Note that the coefficient of variation is a pure number, not expressed in any units, and is the same whatever units the variable is measured in.
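The unit-independence of the coefficient of variation can be illustrated with a short Python sketch. The function name `coefficient_of_variation` and the conversion constants are assumptions introduced for this example (the constants are the standard pound-to-kilogram and inch-to-centimeter factors, not values taken from the slides).

```python
def coefficient_of_variation(mean, sd):
    """Return the standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100.0


LB_TO_KG = 0.45359237  # standard conversion factor (assumption, not from the slides)
IN_TO_CM = 2.54        # standard conversion factor (assumption, not from the slides)

# Weight: mean 160 lb, SD 30 lb; height: mean 66 in, SD 4 in
cv_weight_lb = coefficient_of_variation(160, 30)
cv_weight_kg = coefficient_of_variation(160 * LB_TO_KG, 30 * LB_TO_KG)
cv_height_in = coefficient_of_variation(66, 4)
cv_height_cm = coefficient_of_variation(66 * IN_TO_CM, 4 * IN_TO_CM)

print(round(cv_weight_lb, 2), round(cv_weight_kg, 2))
print(round(cv_height_in, 2), round(cv_height_cm, 2))
```

Because the mean and the standard deviation are multiplied by the same factor when the unit changes, the factor cancels in the ratio; the small differences in the slide's figures come only from rounding the converted values.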
STATISTICAL INFERENCE
• Statistical inference is the process of reaching conclusions about characteristics of an entire population using data from a subset, or sample, of that population.
• It is the process of making guesses about the true value of a population parameter from a sample statistic.
• Its aim is to draw conclusions about a population parameter by using sample information.

Population parameters (the truth, not observable):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Sample statistics (computed from the observed sample and used to make guesses about the whole population):
$$\hat{\mu} = \bar{X}_n = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X}_n)^2}{n-1}$$
The hat notation (^) is often used to indicate an "estimate".
Estimation…
• There are two types of inference: estimation and hypothesis testing; estimation is introduced first.
• The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
• E.g., the sample mean ($\bar{x}$) is employed to estimate the population mean ($\mu$).
• There are two types of estimators:
  • Point Estimator
  • Interval Estimator
Point Estimator…
• A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.
• We saw earlier that point probabilities in continuous distributions are virtually zero. Likewise, we would expect a point estimator to get closer to the parameter value as the sample size increases, but a point estimate by itself does not reflect the effect of a larger sample size. Hence we employ the interval estimator to estimate population parameters.
Interval Estimator…
• An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
• That is, we say (with some ___% certainty) that the population parameter of interest is between some lower and upper bounds.
Point & Interval Estimation…
• For example, suppose we want to estimate the mean summer income of a class of business students. For n = 25 students, the sample mean is calculated to be 400 $/week. This is a point estimate.
• An alternative statement, an interval estimate, is: the mean income is between 380 and 420 $/week.
Testing of Hypothesis
• Testing of hypothesis is a procedure that enables us to decide, on the basis of information obtained from a sample taken from the population, whether or not to reject a specified statement (hypothesis) regarding the value of a population parameter.
• In other words, it is a procedure to determine whether or not an assumption about some parameter of a population is supported by the sample information.
Hypothesis Testing
Left-tailed Test: Average marks at least 45
H0: µ ≥ 45
H1: µ < 45
The inequality in H1 points left, so the rejection region is the left tail. Reject H0 for values that are significantly below 45; otherwise fail to reject H0.

Right-tailed Test: Average marks at most 45
H0: µ ≤ 45
H1: µ > 45
The inequality in H1 points right, so the rejection region is the right tail. Reject H0 for values that are significantly above 45; otherwise fail to reject H0.

Two-tailed Test: Average marks equal to 45
H0: µ = 45
H1: µ ≠ 45 (means less than or greater than 45)
α is divided equally between the two tails of the critical region. Reject H0 for values that differ significantly from 45 in either direction; otherwise fail to reject H0.
Type I and Type II Errors

                         True State of Nature
Decision                 H0 is true                        H0 is false
We reject H0             Type I error (α)                  Correct decision, no error (1 - β)
We don't reject H0       Correct decision, no error        Type II error (β)
                         (1 - α)
Significance Level
The probability of committing a Type-I error is called the level of significance, denoted by α. By α = 5% we mean that there are 5 chances in 100 of incorrectly rejecting a true null hypothesis. Put another way, when the null hypothesis is true we are 95% confident of making the correct decision.
Level of Confidence
The probability of not committing a Type-I error, (1 - α), is called the level of confidence, or confidence coefficient.
Power of a Test
The probability of not committing a Type-II error, (1 - β), is called the power of the test.
Test Statistic
• A statistic on which the decision to reject or not reject the null hypothesis is based is called a test statistic.
• In testing of hypothesis, the sampling distribution of the test statistic is based on the assumption that the null hypothesis is true.
Decision Rule and Critical Value
• Critical region (rejection region)
The critical region (CR) is that part of the sampling distribution of a statistic for which H0 is rejected. A null hypothesis is rejected if the value of the test statistic is not consistent with H0. The CR is associated with H1.
• Non-rejection region
The non-rejection region is that part of the sampling distribution of a statistic for which H0 is not rejected.
• Critical values
The values that separate the rejection and non-rejection regions are called critical values.
Conclusion:
Reject H0 if the calculated value of the test statistic falls in the rejection region; otherwise do not reject H0.
General Procedure for Hypothesis Testing
Step-1: Formulate the null and alternative hypotheses.
Step-2: Decide upon a significance level, α.
Step-3: Choose an appropriate test statistic.
Step-4: Calculate the value of the test statistic from the sample data.
Step-5: Determine the critical region (CR). The location of the CR depends upon the form of the alternative hypothesis:
• If H1 contains >, choose the right tail as the CR
• If H1 contains <, choose the left tail as the CR
• If H1 contains ≠, choose a two-tailed CR
Step-6: Conclusion: Reject the null hypothesis if the computed value of the test statistic falls in the CR; otherwise do not reject it. Then state the decision in managerial terms.
EXAMPLE: It is claimed that an automobile is driven on average more than 12,000 miles per year. To test this claim, a random sample of 100 automobile owners is asked to keep a record of the miles they travel. Would you agree with the claim if the random sample showed an average of 12,500 miles and a standard deviation of 2,400 miles?
Sample: n = 100, $\bar{X} = 12500$, $S = 2400$
Step-1: Construction of hypotheses
$H_0: \mu \le 12000$
$H_1: \mu > 12000$
Step-2: Level of significance
α = 5%
Step-3: Test statistic
$$t = \frac{\bar{X} - \mu}{\sqrt{s^2 / n}}$$
Step-4: Calculation
$$t_{cal} = \frac{12500 - 12000}{\sqrt{2400^2 / 100}} = \frac{500}{240} = 2.08$$
Step-5: Critical region
$$t > t_{\alpha,\,(n-1)\,d.f.} = t_{0.05,\,99\,d.f.} = 1.66$$
Step-6: Conclusion
Since $t_{cal} = 2.08$ exceeds $t_{tab} = 1.66$, it falls in the rejection region, so we reject $H_0$.
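The right-tailed test above can be reproduced with a short Python sketch. The function name `one_sample_t` is an assumption for this illustration, and the critical value 1.66 is simply the tabulated value quoted in the example rather than something the code computes.

```python
import math


def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for H0: mu = mu0, using the sample standard deviation."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


# Values from the automobile example
t_cal = one_sample_t(sample_mean=12500, mu0=12000, sd=2400, n=100)
t_tab = 1.66  # tabulated t(0.05, 99 d.f.), as quoted in the example

# Right-tailed test: reject H0 when the statistic exceeds the critical value
reject_h0 = t_cal > t_tab
print(round(t_cal, 2), reject_h0)
```

For a left-tailed test the comparison would be `t_cal < -t_tab`, and for a two-tailed test `abs(t_cal) > t_tab` with the critical value looked up at α/2.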
EXAMPLE: It is claimed that an automobile is driven on average at most 12,000 miles per year. To test this claim, a random sample of 100 automobile owners is asked to keep a record of the miles they travel. Would you agree with the claim if the random sample showed an average of 12,500 miles and a standard deviation of 2,400 miles?
Sample: n = 100, $\bar{X} = 12500$, $S = 2400$
Step-1: Construction of hypotheses
$H_0: \mu \le 12000$
$H_1: \mu > 12000$
Step-2: Level of significance
α = 5%
Step-3: Test statistic
$$t = \frac{\bar{X} - \mu}{\sqrt{s^2 / n}}$$
Step-4: Calculation
$$t_{cal} = \frac{12500 - 12000}{\sqrt{2400^2 / 100}} = \frac{500}{240} = 2.08$$
Step-5: Critical region
$$t > t_{\alpha,\,(n-1)\,d.f.} = t_{0.05,\,99\,d.f.} = 1.66$$
Step-6: Conclusion
Since $t_{cal} = 2.08$ exceeds $t_{tab} = 1.66$, it falls in the rejection region, so we reject $H_0$; the sample does not support the claim that the average is at most 12,000 miles.
EXAMPLE: It has been found from experience that the mean breaking strength of a particular brand of thread is 9.63 N. Recently a sample of 36 pieces of thread showed a mean breaking strength of 8.93 N and a standard deviation of 1.40 N. Can we conclude that the thread has become inferior?
Sample: n = 36, $\bar{X} = 8.93$, $S = 1.40$
Step-1: Construction of hypotheses
$H_0: \mu \ge 9.63$
$H_1: \mu < 9.63$
Step-2: Level of significance
α = 5%
Step-3: Test statistic
$$t = \frac{\bar{X} - \mu}{\sqrt{s^2 / n}}$$
Step-4: Calculation
$$t_{cal} = \frac{8.93 - 9.63}{\sqrt{1.40^2 / 36}} = \frac{-0.70}{0.2333} = -3.0$$
Step-5: Critical region
$$t < -t_{\alpha,\,(n-1)\,d.f.} = -t_{0.05,\,35\,d.f.} = -1.690$$
Step-6: Conclusion
Since $t_{cal} = -3.00$ is less than $-1.690$, it falls in the rejection region (the left tail), so we reject $H_0$.
EXAMPLE: The mean lifetime of bulbs produced by a company has in the past been 1120 hours. A sample of 9 electric light bulbs recently chosen from a supply of newly produced bulbs showed a mean lifetime of 1170 hours with a standard deviation of 120 hours. Test the hypothesis that the mean lifetime of the bulbs has not changed.
Sample: n = 9, $\bar{X} = 1170$, $S = 120$
Step-1: Construction of hypotheses
$H_0: \mu = 1120$
$H_1: \mu \ne 1120$
Step-2: Level of significance
α = 5%
Step-3: Test statistic
$$t = \frac{\bar{X} - \mu}{\sqrt{s^2 / n}}$$
Step-4: Calculation
$$t_{cal} = \frac{1170 - 1120}{\sqrt{120^2 / 9}} = \frac{50}{40} = 1.25$$
Step-5: Critical region
$$t < -t_{\alpha/2,\,(n-1)\,d.f.} \quad \text{or} \quad t > t_{\alpha/2,\,(n-1)\,d.f.}, \qquad t_{0.025,\,8\,d.f.} = 2.306$$
Step-6: Conclusion
Since $t_{cal} = 1.25$ lies between $-2.306$ and $2.306$, it does not fall in the rejection region, so we do not reject $H_0$.
Example: A researcher wishes to estimate the average marks of the students in the Math-101 course of Section A. A random sample of 25 students is selected, and the sample mean is found to be 50 with a standard deviation of 2. Estimate a 90% confidence interval for the average marks.
Sample: n = 25, $\bar{X} = 50$, $S = 2$; α = 0.10, α/2 = 0.05
$$\bar{X} \pm t_{\alpha/2,\,(n-1)\,d.f.} \sqrt{\frac{S^2}{n}}$$
$$50 \pm t_{0.05,\,24} \sqrt{\frac{2^2}{25}} = 50 \pm (1.711)(0.4) = 50 \pm 0.684$$
Confidence interval: (50 - 0.684, 50 + 0.684) = (49.316, 50.684)
Width of C.I. = 50.684 - 49.316 = 1.368
Example: A researcher wishes to estimate the average marks of the students in the Math-101 course of Section A. A random sample of 25 students is selected, and the sample mean is found to be 50 with a standard deviation of 2. Estimate a 95% confidence interval for the average marks.
Sample: n = 25, $\bar{X} = 50$, $S = 2$; α = 0.05, α/2 = 0.025
$$\bar{X} \pm t_{\alpha/2,\,(n-1)\,d.f.} \sqrt{\frac{S^2}{n}}$$
$$50 \pm t_{0.025,\,24} \sqrt{\frac{2^2}{25}} = 50 \pm (2.064)(0.4) = 50 \pm 0.826$$
Confidence interval: (50 - 0.826, 50 + 0.826) = (49.174, 50.826)
Width of C.I. = 50.826 - 49.174 = 1.652
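The two interval calculations for the Math-101 sample can be sketched in Python as follows. The function name `t_confidence_interval` is an assumption for this illustration; the t values 1.711 and 2.064 are the tabulated values quoted in the examples and are passed in by hand rather than computed.

```python
import math


def t_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) limits: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin


# 90% interval: t(0.05, 24 d.f.) = 1.711 from the t table
low90, high90 = t_confidence_interval(mean=50, sd=2, n=25, t_crit=1.711)
# 95% interval: t(0.025, 24 d.f.) = 2.064 from the t table
low95, high95 = t_confidence_interval(mean=50, sd=2, n=25, t_crit=2.064)

print(round(low90, 3), round(high90, 3))
print(round(low95, 3), round(high95, 3))
```

Comparing the two results shows the usual trade-off: a higher confidence level gives a wider interval for the same sample.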
Example: A researcher wishes to estimate the average amount of money that a university student spends on food per day. A random sample of 36 students is selected, and the sample mean is found to be Rs. 45 with a standard deviation of Rs. 3. Estimate 90% confidence limits for the average amount of money that the students from the university spend on food per day.
Sample: n = 36, $\bar{X} = 45$, $S = 3$; α = 0.10, α/2 = 0.05
$$\bar{X} \pm t_{\alpha/2,\,(n-1)\,d.f.} \sqrt{\frac{S^2}{n}}$$
$$45 \pm t_{0.05,\,35} \sqrt{\frac{3^2}{36}} = 45 \pm (1.69)(0.5) \approx 45 \pm 0.84$$
Confidence limits: (45 - 0.84, 45 + 0.84) = (44.16, 45.84)
TEST OF HYPOTHESIS FOR DIFFERENCE BETWEEN POPULATION MEANS
EXAMPLE: The average marks of 20 students in Math-101 of Section A are 50 with a standard deviation of 2, and the average marks of 15 students in Math-101 of Section B are 40 with a standard deviation of 2.5. On the basis of this sample information, can we conclude that students of Section A perform better than students of Section B? (Assume the population variances are equal.) Use a 5% level of significance.
Sample: $n_1 = 20$, $\bar{X}_1 = 50$, $S_1 = 2$; $n_2 = 15$, $\bar{X}_2 = 40$, $S_2 = 2.5$
Step-1: Construction of hypotheses
$H_0: \mu_1 \le \mu_2$, i.e. $\mu_1 - \mu_2 \le 0$
$H_1: \mu_1 > \mu_2$, i.e. $\mu_1 - \mu_2 > 0$
Step-2: Level of significance
α = 5%
Step-3: Test statistic
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$
Step-4: Calculation
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{(20 - 1)(4) + (15 - 1)(6.25)}{20 + 15 - 2} = \frac{76 + 87.50}{33} = 4.95$$
$$t_{cal} = \frac{(50 - 40) - 0}{\sqrt{4.95 \left(\frac{1}{20} + \frac{1}{15}\right)}} = \frac{10}{0.760} \approx 13.15$$
Step-5: Critical region
$$t > t_{\alpha,\,(n_1 + n_2 - 2)\,d.f.} = t_{0.05,\,33\,d.f.} = 1.692$$
Step-6: Conclusion
Since $t_{cal} = 13.15$ exceeds $t_{tab} = 1.692$, it falls in the rejection region, so we reject $H_0$.
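The pooled-variance calculation for the two sections can be verified with a brief Python sketch. The function name `pooled_t` is an assumption for this illustration, and the critical value 1.692 is the tabulated value from the example, not something the code derives.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic assuming equal population variances.

    Returns (t, pooled_variance) for H0: mu1 - mu2 = 0.
    """
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, sp2


# Section A: n=20, mean 50, SD 2; Section B: n=15, mean 40, SD 2.5
t_cal, sp2 = pooled_t(50, 2, 20, 40, 2.5, 15)
t_tab = 1.692  # tabulated t(0.05, 33 d.f.), as quoted in the example
reject_h0 = t_cal > t_tab  # right-tailed test

print(round(sp2, 2), round(t_cal, 2), reject_h0)
```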
Comparing more than two population means
We can use the two-sample t-test to test the equality of more than two population means, but this procedure
– requires a large number of two-sample t-tests;
– tends to inflate the overall α risk, because many two-sample t-tests are each performed at level α.
For example, to test the equality of 10 population means, we have to perform 45 t-tests. If the tests are independent and each test uses α = 0.05, then the overall α = 1 - (0.95)^45 ≈ 0.90, and the expected number of false rejections is 45(0.05) = 2.25.
We therefore require a procedure for testing the equality of several population means simultaneously.
– We can use the F-distribution in ANOVA, which yields a single test statistic for comparing all means, so that the overall risk of Type-I error is controlled.
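To make the inflation of the overall α concrete, the following Python sketch counts the pairwise tests needed for 10 means and computes the chance of at least one false rejection. It assumes independent tests, as the text does; the function name `familywise_error` is chosen for this illustration.

```python
from math import comb


def familywise_error(alpha, m):
    """Probability of at least one Type-I error in m independent tests."""
    return 1 - (1 - alpha) ** m


pairs = comb(10, 2)                       # number of two-sample tests for 10 means
overall_alpha = familywise_error(0.05, pairs)
expected_false_rejections = pairs * 0.05  # a count, not a probability

print(pairs, round(overall_alpha, 2), expected_false_rejections)
```

The product `pairs * 0.05` exceeds 1, which is why it can only be read as an expected count of false rejections and not as an error probability.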
Analysis of Variance (ANOVA)
Analysis of variance is a procedure that partitions the total variability present in a data set into meaningful and distinct components. Each component represents the variation due to a recognized source of variation; in addition, one component represents the variation due to uncontrolled factors and random errors.
Assumptions:
NORMALITY: The k populations from which the samples are drawn should be normal.
INDEPENDENCE: The k samples should be independent.
RANDOMNESS: The k samples should be random.
HOMOSCEDASTICITY (common variance): The k populations should have a common variance.
C.F = Correction Factor
TSS = Total Sum of Squares
SSM = Method Sum of Squares
SSE = Error Sum of Squares
ANOVA TABLE

SOV       d.f.   SS   MS   F.cal   F.tab
Methods   m-1
Error
Total     n-1
(One-Way ANOVA) Four groups of students (all of approximately the same attributes) were subjected to different teaching techniques and tested at the end of a specified period of time. Due to dropouts in the experimental groups (sickness, transfers, etc.) the number of students varied from group to group.

Method 1   Method 2   Method 3   Method 4
65         75         59         94
87         69         78         89
73         83         67         80
79         81         62         88
81         72         83
69         79         76
           90

Do the data provide sufficient evidence to indicate a difference in the mean achievements for the 4 teaching techniques?
Step-1: Construction of hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$ (i.e. mean achievements from the 4 methods are the same)
$H_1$: At least two $\mu$'s are different
Step-2: Level of significance
α = 5%
Step-3: Test statistic
F test (ANOVA)
Step-4: Calculation

          Method 1   Method 2   Method 3   Method 4   Total
          65         75         59         94
          87         69         78         89
          73         83         67         80
          79         81         62         88
          81         72         83
          69         79         76
                     90
Total     454        549        425        351        1779 (G.T)

$$C.F = \frac{(G.T)^2}{n} = \frac{(1779)^2}{23} = 137601.8$$
$$TSS = (65)^2 + (87)^2 + \cdots + (88)^2 - C.F = 139511 - 137601.8 = 1909.2$$
$$SS_{Methods} = \frac{(454)^2}{6} + \frac{(549)^2}{7} + \frac{(425)^2}{6} + \frac{(351)^2}{4} - C.F = 138314.4 - 137601.8 = 712.6$$
$$SS_{Error} = TSS - SS_{Methods} = 1909.2 - 712.6 = 1196.6$$

SOV       d.f.         SS       MS      F.cal              F.tab
Methods   m-1 = 3      712.6    237.5   237.5/63 = 3.77    F0.05(3,19) = 3.13
Error     19           1196.6   63
Total     n-1 = 22     1909.2

Result: As F.cal = 3.77 > F0.05(3,19) = 3.13, we reject $H_0$ and conclude that there is a difference in the mean achievements for the four teaching methods.
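The sums of squares in this one-way layout can be recomputed with a short Python sketch that follows the same correction-factor approach. The function name `one_way_anova` and the dictionary keys are assumptions for this illustration; the tabulated F value is taken from the example.

```python
def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method; groups may differ in size."""
    n = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n                            # correction factor
    tss = sum(x * x for g in groups for x in g) - cf     # total SS
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ss_error = tss - ss_between
    df_between = len(groups) - 1
    df_error = n - len(groups)
    f_cal = (ss_between / df_between) / (ss_error / df_error)
    return {"tss": tss, "ss_between": ss_between, "ss_error": ss_error,
            "df_between": df_between, "df_error": df_error, "f": f_cal}


methods = [
    [65, 87, 73, 79, 81, 69],        # Method 1
    [75, 69, 83, 81, 72, 79, 90],    # Method 2
    [59, 78, 67, 62, 83, 76],        # Method 3
    [94, 89, 80, 88],                # Method 4
]
result = one_way_anova(methods)
f_tab = 3.13  # tabulated F(0.05; 3, 19), as quoted in the example
print(round(result["ss_between"], 1), round(result["ss_error"], 1),
      round(result["f"], 2), result["f"] > f_tab)
```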
(Two-Way ANOVA) Four breeds of cattle were fed on three different diets (rations). Gains in weight, in pounds, over a given period were recorded.

Rations   Breeds
          B1     B2     B3     B4
R1        46.5   62     41     45
R2        47.5   41.5   22     31.5
R3        50     40     25.5   28.5

Is there a significant difference (i) between breeds, (ii) between rations?
Step-1: Construction of hypotheses
$H_0: \mu_{B1} = \mu_{B2} = \mu_{B3} = \mu_{B4}$
$H_1$: At least two $\mu_i$ are different
$H'_0: \mu_{R1} = \mu_{R2} = \mu_{R3}$
$H'_1$: At least two $\mu_i$ are different

Rations   B1     B2      B3     B4     Total
R1        46.5   62      41     45     194.5
R2        47.5   41.5    22     31.5   142.5
R3        50     40      25.5   28.5   144.0
Total     144    143.5   88.5   105    481 (G.T)

$$C.F = \frac{(G.T)^2}{n} = \frac{(481)^2}{12} = 19280.08$$
$$TSS = (46.5)^2 + (47.5)^2 + \cdots + (28.5)^2 - C.F = 20729.5 - 19280.08 = 1449.42$$
$$SS_{Breed} = \frac{(144)^2}{3} + \frac{(143.5)^2}{3} + \frac{(88.5)^2}{3} + \frac{(105)^2}{3} - C.F = 20061.83 - 19280.08 = 781.75$$
$$SS_{Ration} = \frac{(194.5)^2}{4} + \frac{(142.5)^2}{4} + \frac{(144)^2}{4} - C.F = 19718.125 - 19280.08 = 438.05$$
$$SS_{Error} = TSS - SS_{Breed} - SS_{Ration} = 1449.42 - 781.75 - 438.05 = 229.63$$
ANOVA TABLE

SOV      d.f.        SS        MS       F.cal       F.tab
Breed    b-1 = 3     781.75    260.58   F1 = 6.81   F.05(3,6) = 4.76
Ration   r-1 = 2     438.05    219.02   F2 = 5.72   F.05(2,6) = 5.14
Error    6           229.63    38.27
Total    n-1 = 11    1449.42

Step-6: Conclusion. Since F1.cal = 6.81 > F.tab = 4.76, we reject $H_0$, and since F2.cal = 5.72 > F.tab = 5.14, we reject $H'_0$.
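The two-way layout (one observation per cell) can be checked with the following Python sketch, which mirrors the correction-factor computation for the cattle data. The function name `two_way_anova` and the returned keys are assumptions made for this illustration.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication; rows and columns are the two factors."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                                   # correction factor
    tss = sum(x * x for row in table for x in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / c - cf
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - cf
    ss_error = tss - ss_rows - ss_cols
    mse = ss_error / ((r - 1) * (c - 1))
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_error": ss_error,
        "mse": mse,
        "f_rows": (ss_rows / (r - 1)) / mse,
        "f_cols": (ss_cols / (c - 1)) / mse,
    }


# Rows are rations R1..R3, columns are breeds B1..B4 (weight gains in pounds)
gains = [
    [46.5, 62.0, 41.0, 45.0],
    [47.5, 41.5, 22.0, 31.5],
    [50.0, 40.0, 25.5, 28.5],
]
res = two_way_anova(gains)
print(round(res["ss_cols"], 2), round(res["ss_rows"], 2), round(res["ss_error"], 2))
print(round(res["f_cols"], 2), round(res["f_rows"], 2))  # breeds, rations
```

Here `f_cols` is the F ratio for breeds and `f_rows` the F ratio for rations; each is compared with its own tabulated value, F.05(3,6) and F.05(2,6) respectively.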
Two-Way ANOVA. The Black Rock candy company was planning a test of three new candy flavors (F1, F2, F3). In the test the company also wished to measure the effect of three different retail price levels (P1 = 79 cents, P2 = 89 cents, P3 = 99 cents). Because each flavor was to be tested at each price, a total of nine different flavor-price combinations were to be tested. The following data represent the number of candies sold (in hundreds).

Price    Candy Flavors
         F1    F2    F3    Total
P1       8     13    5     26
P2       4     18    6     28
P3       4     22    10    36
Total    16    53    21    90

Do the data provide sufficient evidence to indicate a difference in the mean sales for flavors and for prices?
Step-1: Construction of hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3$, i.e. all the flavors have equal mean sales
$H_1$: At least one $\mu$ is different
$H'_0: \mu_1 = \mu_2 = \mu_3$, i.e. all the prices have equal mean sales
$H'_1$: At least one $\mu$ is different
Grand total (G.T) = 90, n = 9
$$C.F = \frac{(G.T)^2}{n} = \frac{(90)^2}{9} = 900.00$$
$$TSS = (8)^2 + (4)^2 + \cdots + (10)^2 - C.F = 1234 - 900 = 334$$
$$SS_F = \frac{(16)^2}{3} + \frac{(53)^2}{3} + \frac{(21)^2}{3} - C.F = 1168.67 - 900 = 268.67$$
$$SS_P = \frac{(26)^2}{3} + \frac{(28)^2}{3} + \frac{(36)^2}{3} - C.F = 918.67 - 900 = 18.67$$
$$SS_E = TSS - SS_F - SS_P = 334.00 - 268.67 - 18.67 = 46.66$$
ANOVA TABLE

SOV       d.f.       SS       MS       F.cal        F.tab
Flavors   f-1 = 2    268.67   134.33   F1 = 11.51   F.05(2,4) = 6.94
Price     p-1 = 2    18.67    9.33     F2 = 0.80    F.05(2,4) = 6.94
Error     4          46.66    11.67
Total     n-1 = 8    334.00

Step-6: Conclusion. Since F1.cal = 11.51 > F.tab = 6.94, we reject $H_0$, and since F2.cal = 0.80 < F.tab = 6.94, we do not reject $H'_0$.
