
Chapter 4: DATA MANAGEMENT
4.1 • Descriptive Statistics

4.2 • Data Collection Method

4.3 • Frequency Distribution

4.4 • Measures of Central Tendency

4.5 • Measures of Variability

4.6 • Testing a Statistical Hypothesis


Definition: Statistics
Statistics is the branch of mathematics that deals with
data collection, organization, analysis, interpretation,
and presentation.

o Data collection is the procedure of collecting,
  measuring, and analyzing accurate insights for research
  using standard validated techniques.

o Data organization refers to the method of classifying
  and organizing data sets to make them more useful. It
  can be applied to physical or digital records.

o Data analysis is the process of inspecting, cleansing,
  transforming, and modeling data with the goal of
  discovering useful information, informing conclusions,
  and supporting decision-making.

o Interpretation of data is the process of assigning
  meaning to the collected information and determining
  the conclusions, significance, and implications of the
  findings.

o Presentation of data refers to the organization of data
  into tables, graphs, or charts so that logical and
  statistical conclusions can be derived from the
  collected measurements.

4.1 Descriptive Statistics

Descriptive statistics describe the characteristics of a
specific data set by giving short summaries of the sample
and the measures of the data.

Basic Statistical Concepts

A population consists of the totality of the observations,
and a sample is a part of the population.

A variable is any characteristic, number, or quantity that
can be measured or counted.
There are two kinds of variables:

1. Qualitative variables, also called categorical
   variables, are variables that are not numerical. They
   describe data that fit into categories.

2. Quantitative variables are numerical. They can be
   ranked and have order.

Quantitative variables can be classified further into
discrete variables and continuous variables.

A discrete variable is a variable whose value is obtained
by counting.

A continuous variable can assume an infinite number of
values between any two specific values. Its value is
obtained by measuring and often includes fractions and
decimals.

Examples:

Discrete
number of students present
number of red marbles in a jar
number of heads when flipping three coins
students’ grade level

Continuous
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes

Types of Statistical Data

1. Numerical data
These data have meaning as a measurement, such as a
person’s height, weight, IQ, or blood pressure, or as a
count, such as the number of shares of stock a person owns.

2. Categorical data
Categorical data represent characteristics such as a
person’s gender, marital status, hometown, or the
types of movies they like. Categorical data can take on
numerical values (such as “1” indicating male and “2”
indicating female), but those numbers don’t have
mathematical meaning.

Four Levels of Measurement

1. Nominal – deals with names, categories, or labels.
   – possesses the identity property.
   Examples: eye color, yes or no responses to a survey,
   occupation of parents, and favorite breakfast cereal.

2. Ordinal – data at this level possess the properties of
   both identity and order.
   – data at the ordinal level should not be used in
   calculations.
   Examples: social classes, teachers’ ranks, taste
   preferences, and letter grades (we can order them so that
   A is higher than B, but without any other information).

3. Interval – data at this level possess the identity
   property and can be ordered, but have no absolute zero.
   Data at the interval level can be used in calculations.
   Examples: temperatures and intelligence scores

4. Ratio – the highest level of measurement. Data possess
   all of the features of the interval level, in addition to
   an absolute zero. Because of the absolute zero, it makes
   sense to compare the ratios of measurements.
   Examples: length, density, and weight


Exercises:

Identify the level of measurement of the following data
as nominal, ordinal, interval, or ratio.

1) Students are classified by their reading ability:
   above average, below average, normal
2) Amount of money in a savings account
3) Intelligence quotient
4) Religious affiliation
5) Flavors of frozen yogurt
4.2 Data Collection Method

1) In-Person Interviews
Pros: In-depth, with a high degree of confidence in the data
Cons: Time-consuming, expensive, and can be dismissed as anecdotal
2) Mail Surveys
Pros: Can reach anyone and everyone; no barrier
Cons: Expensive, prone to data collection errors, lag time
3) Phone Surveys
Pros: High degree of confidence in the data collected; can reach almost anyone
Cons: Expensive, cannot be self-administered, need to hire an agency
4) Web/Online Surveys
Pros: Cheap, can be self-administered, very low probability of data errors
Cons: Not all of your customers may have an email address or be on the internet;
customers may be wary of divulging information online

Three Ways of Presenting Data

1. Textual – data are presented in a paragraph or a number
   of paragraphs.

2. Tabular – data are presented in a statistical table, a
   systematic organization of data in columns and rows.

3. Graphical – data are presented in a chart that represents
   the quantitative variations or changes of variables in
   pictorial or diagrammatic form.
4.3 Frequency Distribution

Definition: Frequency

Frequency measures how often something occurs.

Example:
Jack’s team has scored the following numbers of goals in
their games:
3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2
Jack put the numbers in order, then added up:
how often 1 occurs (2 times),
how often 2 occurs (5 times),
how often 3 occurs (4 times),
how often 4 occurs (2 times),
how often 5 occurs (1 time).
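
Jack's tally can be reproduced in a few lines of Python. This is only a sketch using the standard-library `collections.Counter`; the variable names `goals` and `frequency` are chosen here for illustration.

```python
from collections import Counter

# Goals scored by Jack's team in each game (from the example above).
goals = [3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(goals)

# Print the values in order, as Jack did.
for value in sorted(frequency):
    print(f"{value} occurs {frequency[value]} time(s)")
```

The frequencies add up to the number of games played, which is a quick check that no entry was missed.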
Frequency Distributions
A frequency distribution is a table that shows classes or
intervals of data with a count of the number in each class.
The frequency f of a class is the number of data points in
the class.

Class      Frequency, f
1 – 4      4
5 – 8      5
9 – 12     3
13 – 16    4
17 – 20    2

The lower class limits are 1, 5, 9, 13, and 17. The upper
class limits are 4, 8, 12, 16, and 20. The second column
lists the frequencies.

Larson & Farber, Elementary Statistics: Picturing the World, 3e 18


The class width is the distance between lower (or upper)
limits of consecutive classes.

Class      Frequency, f    Difference of consecutive lower limits
1 – 4      4
5 – 8      5               5 – 1 = 4
9 – 12     3               9 – 5 = 4
13 – 16    4               13 – 9 = 4
17 – 20    2               17 – 13 = 4

The class width is 4.

The range is the difference between the maximum and
minimum data entries.
Constructing a Frequency Distribution Table

Guidelines
1. Decide on the number of classes to include. The number of
classes should be between 5 and 20; otherwise, it may be
difficult to detect any patterns. (Alternatively, take the number
of classes to be about $\sqrt{N}$, where N is the number of data entries.)
2. Find the class width as follows. Determine the range of the
data, divide the range by the number of classes, and round up
to the next convenient number.
3. Find the class limits. You can use the minimum entry as the
lower limit of the first class. To find the remaining lower limits,
add the class width to the lower limit of the preceding class.
Then find the upper class limits.
4. Make a tally mark for each data entry in the row of the
appropriate class.
5. Count the tally marks to find the total frequency f for each
class.



Constructing a Frequency Distribution Table

Example:
The following data represents the ages of 30 students in a
statistics class. Construct a frequency distribution that
has five classes.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21

1. The number of classes (5) is stated in the problem.

2. The minimum data entry is 18 and the maximum entry is
54, so the range is 54 – 18 = 36. Divide the range by the
number of classes to find the class width.

Class width = 36/5 = 7.2, rounded up to 8.

3. The minimum data entry of 18 may be used for the
lower limit of the first class. To find the lower class
limits of the remaining classes, add the width (8) to each
lower limit.
The lower class limits are 18, 26, 34, 42, and 50.
The upper class limits are 25, 33, 41, 49, and 57.
4. Make a tally mark for each data entry in the
appropriate class.

5. The number of tally marks for a class is the frequency
for that class.

Ages of Students
Class (ages)    Frequency, f (number of students)
18 – 25         13
26 – 33         8
34 – 41         4
42 – 49         3
50 – 57         2
                Σf = 30

Check that the sum equals the number in the sample.
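
The construction steps for the "Ages of Students" example can be sketched in Python as follows. This is a minimal illustration, not a general-purpose routine: it assumes whole-number data, and it interprets "round up to the next convenient number" as rounding up to the next whole number with `math.ceil`. The variable names are chosen here for illustration.

```python
import math

# Ages of the 30 students (from the example above).
ages = [18, 20, 21, 27, 29, 20,
        19, 30, 32, 19, 34, 19,
        24, 29, 18, 37, 38, 22,
        30, 39, 32, 44, 33, 46,
        54, 49, 18, 51, 21, 21]

num_classes = 5                       # step 1: stated in the problem
data_range = max(ages) - min(ages)    # step 2: 54 - 18

# Assumption: round the raw width up to the next whole number.
# If range / num_classes were already whole, a larger width would be
# needed so that the last class still contains the maximum entry.
class_width = math.ceil(data_range / num_classes)

# Step 3: lower limits start at the minimum entry; for whole-number
# data each upper limit is one less than the next lower limit.
lower_limits = [min(ages) + i * class_width for i in range(num_classes)]
upper_limits = [low + class_width - 1 for low in lower_limits]

# Steps 4 and 5: count the entries that fall in each class.
frequencies = [sum(low <= age <= high for age in ages)
               for low, high in zip(lower_limits, upper_limits)]

for low, high, f in zip(lower_limits, upper_limits, frequencies):
    print(f"{low} - {high}: {f}")
print("sum of f =", sum(frequencies))
```

The final line performs the same check as the table: the frequencies must add up to the number of entries in the sample.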


Class Mark/Midpoint
The midpoint of a class is the sum of the lower and upper
limits of the class divided by two. The midpoint is
sometimes called the class mark.

$\text{Class mark} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$

Class      Frequency, f    Class Mark
1 – 4      4               2.5

$\text{Class mark} = \frac{1 + 4}{2} = \frac{5}{2} = 2.5$


Example:
Find the class marks for the “Ages of Students” frequency
distribution.

Ages of Students
Class      Frequency, f    Class Mark
18 – 25    13              21.5
26 – 33    8               29.5
34 – 41    4               37.5
42 – 49    3               45.5
50 – 57    2               53.5
           Σf = 30

For the first class, $\frac{18 + 25}{2} = 21.5$.


Class Boundaries
Class boundaries are the numbers that separate the classes
without forming gaps between them.

Ages of Students
Class      Frequency, f    Class Boundaries
18 – 25    13              17.5 – 25.5
26 – 33    8               25.5 – 33.5
34 – 41    4               33.5 – 41.5
42 – 49    3               41.5 – 49.5
50 – 57    2               49.5 – 57.5
           Σf = 30

The distance from the upper limit of the first class to the
lower limit of the second class is 1. Half this distance is
0.5, so subtract 0.5 from each lower limit and add 0.5 to
each upper limit to obtain the boundaries.
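
The class marks and class boundaries of the "Ages of Students" distribution can be computed directly from the class limits. The sketch below assumes the limits found earlier and equal gaps between consecutive classes; the variable names are illustrative.

```python
lower_limits = [18, 26, 34, 42, 50]
upper_limits = [25, 33, 41, 49, 57]

# Class mark (midpoint): average of the lower and upper class limits.
class_marks = [(low + high) / 2
               for low, high in zip(lower_limits, upper_limits)]

# Half the gap between the first upper limit and the second lower limit.
half_gap = (lower_limits[1] - upper_limits[0]) / 2

# Widen every class by half the gap on each side to close the gaps.
class_boundaries = [(low - half_gap, high + half_gap)
                    for low, high in zip(lower_limits, upper_limits)]

for mark, (lo, hi) in zip(class_marks, class_boundaries):
    print(f"class mark {mark}, boundaries {lo} - {hi}")
```

Each upper boundary equals the lower boundary of the next class, which is what "without forming gaps" means.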


Relative Frequency
The relative frequency of a class is the portion or
percentage of the data that falls in that class. To find the
relative frequency of a class, divide the frequency f by the
sample size n.

$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Sample size}} = \frac{f}{n}$

Class      Frequency, f    Relative Frequency
1 – 4      4               0.222
           Σf = 18

$\text{Relative frequency} = \frac{f}{n} = \frac{4}{18} \approx 0.222$
Example:
Find the relative frequencies for the “Ages of Students”
frequency distribution.

Ages of Students
Class      Frequency, f    Relative Frequency (portion of students)
18 – 25    13              0.433
26 – 33    8               0.267
34 – 41    4               0.133
42 – 49    3               0.1
50 – 57    2               0.067
           Σf = 30         Σ(f/n) = 1

For the first class, $\frac{f}{n} = \frac{13}{30} \approx 0.433$.
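
As a quick check of the relative frequencies, the division can be carried out for every class at once. This sketch takes the class frequencies from the "Ages of Students" table as given; the names `relative` and `rounded` are illustrative.

```python
frequencies = [13, 8, 4, 3, 2]   # "Ages of Students" class frequencies
n = sum(frequencies)             # sample size

# Relative frequency of each class: f / n.
relative = [f / n for f in frequencies]

# Rounded to three decimal places, as in the table.
rounded = [round(r, 3) for r in relative]
print(rounded)
```

The unrounded relative frequencies sum to 1 (up to floating-point error); the rounded ones may differ from 1 slightly.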
Cumulative Frequency
The cumulative frequency of a class is the sum of the
frequency for that class and all the previous classes.

Ages of Students
Class      Frequency, f    Cumulative Frequency
18 – 25    13              13
26 – 33    8               13 + 8 = 21
34 – 41    4               21 + 4 = 25
42 – 49    3               25 + 3 = 28
50 – 57    2               28 + 2 = 30 (total number of students)
           Σf = 30
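
A running total like this is what the standard-library function `itertools.accumulate` produces. The sketch below applies it to the "Ages of Students" frequencies.

```python
from itertools import accumulate

frequencies = [13, 8, 4, 3, 2]   # "Ages of Students" class frequencies

# accumulate yields the running sum: f1, f1 + f2, f1 + f2 + f3, ...
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of data entries.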


Frequency Distribution Table
Example: Frequency distribution table of the ages of students

Class      Frequency   Class    Class           Cumulative   Relative
Limits     (f)         Marks    Boundaries      Frequency    Frequency
18 – 25    13          21.5     17.5 – 25.5     13           0.433
26 – 33    8           29.5     25.5 – 33.5     21           0.267
34 – 41    4           37.5     33.5 – 41.5     25           0.133
42 – 49    3           45.5     41.5 – 49.5     28           0.1
50 – 57    2           53.5     49.5 – 57.5     30           0.067


4.3 Graphical Representation of Frequency Distribution

Frequency Histogram
A frequency histogram is a bar graph that represents
the frequency distribution of a data set.
1. The horizontal scale is quantitative and measures
the data values.
2. The vertical scale measures the frequencies of the
classes.
3. Consecutive bars must touch.

The horizontal scale of a histogram can be marked with
either the class boundaries or the class marks.


Example:
Draw a frequency histogram for the “Ages of Students”
frequency distribution. Use the class boundaries.

[Figure: frequency histogram titled “Ages of Students”. Horizontal axis: Age (in years), broken axis, marked at the class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, and 57.5. Vertical axis: frequency f, from 0 to 14. Bar heights: 13, 8, 4, 3, and 2.]
Frequency Polygon
A frequency polygon is a line graph that emphasizes the
continuous change in frequencies.

[Figure: frequency polygon titled “Ages of Students”. Horizontal axis: Age (in years), broken axis, marked at the class marks 13.5, 21.5, 29.5, 37.5, 45.5, 53.5, and 61.5. Vertical axis: frequency f, from 0 to 14. The line is extended to the x-axis at both ends.]


Cumulative Frequency Graph
A cumulative frequency graph, or ogive, is a line graph
that displays the cumulative frequency of each class at
its upper class boundary.

[Figure: ogive titled “Ages of Students”. Horizontal axis: Age (in years), marked at 17.5, 25.5, 33.5, 41.5, 49.5, and 57.5. Vertical axis: cumulative frequency (portion of students), from 0 to 30. The graph ends at the upper boundary of the last class.]
Pie Chart
A pie chart is a circle that is divided into sectors that
represent categories. The area of each sector is proportional
to the frequency of each category.

Accidental Deaths in the USA in 2002
Type                        Frequency
Motor Vehicle               43,500
Falls                       12,200
Poison                      6,400
Drowning                    4,600
Fire                        4,200
Ingestion of Food/Object    2,900
Firearms                    1,400
(Source: US Dept. of Transportation)
To create a pie chart for the data, find the relative frequency
(percent) of each category. Next, find the central angle. To find
the central angle, multiply the relative frequency by 360°.

Type                        Frequency    Relative Frequency    Angle
Motor Vehicle               43,500       0.578                 208.2°
Falls                       12,200       0.162                 58.4°
Poison                      6,400        0.085                 30.6°
Drowning                    4,600        0.061                 22.0°
Fire                        4,200        0.056                 20.1°
Ingestion of Food/Object    2,900        0.039                 13.9°
Firearms                    1,400        0.019                 6.7°
                            n = 75,200
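
The relative frequencies and central angles can be checked with a short Python sketch. It uses the frequencies from the table; the dictionary name `deaths` and the output formatting are illustrative. The angles in the table agree with the ones obtained from the unrounded relative frequencies, so the sketch multiplies before rounding.

```python
# Accidental deaths in the USA in 2002, by type (from the table above).
deaths = {
    "Motor Vehicle": 43_500,
    "Falls": 12_200,
    "Poison": 6_400,
    "Drowning": 4_600,
    "Fire": 4_200,
    "Ingestion of Food/Object": 2_900,
    "Firearms": 1_400,
}

n = sum(deaths.values())  # total frequency

# Relative frequency f / n and central angle (f / n) * 360 degrees.
relative = {kind: f / n for kind, f in deaths.items()}
angles = {kind: rel * 360 for kind, rel in relative.items()}

for kind in deaths:
    print(f"{kind:<26}{relative[kind]:>7.3f}{angles[kind]:>8.1f} degrees")
```

Because the relative frequencies sum to 1, the central angles sum to 360°, a full circle.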
[Figure: pie chart, “Graph of the Accidental Deaths in the USA in 2002”. Sector labels: Motor Vehicle 58%, Falls 16%, Poison 8%, Drowning 6%, Fire 6%, Ingestion 4%, Firearms 2%.]
Scatter Plot
When each entry in one data set corresponds to an entry in
another data set, the sets are called paired data sets.

In a scatter plot, the ordered pairs are graphed as points
in a coordinate plane. The scatter plot is used to show the
relationship between two quantitative variables.

The following scatter plot represents the relationship
between the number of absences from a class during the
semester and the final grade.

Absences, x    Final grade, y
8              78
2              92
5              90
12             58
15             43
9              74
6              81

[Figure: scatter plot of the paired data. Horizontal axis: Absences (x), from 0 to 16. Vertical axis: Final grade (y), from 40 to 100.]

From the scatter plot, you can see that as the number of
absences increases, the final grade tends to decrease.
Time Series Chart
A data set that is composed of quantitative data entries
taken at regular intervals over a period of time is a time
series. A time series chart is used to graph a time series.

Example:
The following table lists the number of minutes Robert used
on his cell phone for the last six months. Construct a time
series chart for the number of minutes used.

Month       Minutes
January     236
February    242
March       188
April       175
May         199
June        135

[Figure: time series chart titled “Robert’s Cell Phone Usage”. Horizontal axis: Month (Jan to June). Vertical axis: Minutes, from 0 to 250.]


4.4 Measures of Central Tendency

A measure of central tendency is a value that represents a
typical, or central, entry of a data set. The three most
commonly used measures of central tendency are the mean,
the median, and the mode.

Mean

The mean of a data set is the sum of the data entries
divided by the number of entries.

Population mean (“mu”): $\mu = \frac{\sum x}{N}$

Sample mean (“x-bar”): $\bar{x} = \frac{\sum x}{n}$


Example 1:
The following are the ages of all seven employees of a small
company:
53 32 61 57 39 44 57
Calculate the population mean.

Solution:
$\mu = \frac{\sum x}{N} = \frac{53 + 32 + 61 + 57 + 39 + 44 + 57}{7} = \frac{343}{7} = 49$

The mean age of the employees is 49 years.


Example 2:
What is the average time for rats to finish a maze?

Rat 1 took 34 seconds
Rat 2 took 49 seconds
Rat 3 took 48 seconds
Rat 4 took 46 seconds

The average time for the rats to finish the maze is
$\frac{34 + 49 + 48 + 46}{4} = \frac{177}{4} = 44.25$ seconds.
(Source: mathsisfun.com)
Median
The median of a data set is the value that lies in the
middle of the data when the data set is ordered.
If the data set has an odd number of entries, the median is the
middle data entry. If the data set has an even number of entries,
the median is the mean of the two middle data entries.

Example 1:
Calculate the median age of the seven employees.
53 32 61 57 39 44 57

To find the median, sort the data.
32 39 44 53 57 57 61

The fourth of the seven sorted entries is the middle one, so
the median age of the employees is 53 years.
Mode
The mode of a data set is the data entry that occurs with
the greatest frequency.
If no entry is repeated, the data set has no mode. If two entries
occur with the same greatest frequency, each entry is a mode and
the data set is called bimodal.

Example 1:
Find the mode of the ages of the seven employees.
53 32 61 57 39 44 57

The mode is 57 because it occurs the most times.


Example 2:
Consider the data set {6, 3, 9, 6, 6, 5, 9, 3}. Find the
modal value of the given set.

Since 6 occurs most often, the modal value is 6.
(Source: mathsisfun.com)
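
The three measures can be checked with Python's standard-library `statistics` module. The sketch below uses the employee ages from Example 1 and the data set from the second mode example; the variable names are illustrative.

```python
import statistics

# Ages of the seven employees (Example 1).
ages = [53, 32, 61, 57, 39, 44, 57]

mean_age = statistics.mean(ages)      # sum of entries / number of entries
median_age = statistics.median(ages)  # middle entry of the sorted data
mode_age = statistics.mode(ages)      # entry with the greatest frequency

# multimode returns every entry tied for the greatest frequency,
# so a bimodal data set yields a list of two values.
modes = statistics.multimode([6, 3, 9, 6, 6, 5, 9, 3])

print(mean_age, median_age, mode_age, modes)
```

One difference from the convention stated above: when no entry is repeated, `multimode` returns all of the entries rather than reporting that the data set has no mode, so that case has to be handled separately.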


Weighted Mean

A weighted mean is the mean of a data set whose entries have
varying weights. A weighted mean is given by
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$
where w is the weight of each entry x.

Example:
Grades in a statistics class are weighted as follows:
Tests are worth 50% of the grade, homework is worth 30% of the
grade and the final is worth 20% of the grade. A student receives a
total of 80 points on tests, 100 points on homework, and 85 points
on his final. What is his current grade?

Begin by organizing the data in a table.

Source      Score, x    Weight, w    xw
Tests       80          0.50         40
Homework    100         0.30         30
Final       85          0.20         17
                        Σw = 1.00    Σ(xw) = 87

$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{87}{1.00} = 87$

The student’s current grade is 87 out of 100, or 87%.
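
The weighted mean of the grade example can be computed as a sketch in Python. The scores and weights come from the table; the variable names are illustrative.

```python
scores = [80, 100, 85]         # tests, homework, final
weights = [0.50, 0.30, 0.20]   # share of the grade for each source

# Weighted mean: sum of (score * weight) divided by the sum of the weights.
weighted_sum = sum(x * w for x, w in zip(scores, weights))
weighted_mean = weighted_sum / sum(weights)

print(round(weighted_mean, 2))
```

Dividing by the sum of the weights keeps the formula correct even when the weights do not add up to 1, for example when they are given as 50, 30, and 20.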


It is worth noting that the mean has the following characteristics:
a) The mean is affected by the presence of extreme values.
b) The sum of the deviations of the observations from the mean is
zero.
c) The sum of the squared deviations of the observations from the
mean is a minimum.
d) It is a good measure for interval and ratio types of data.

The median has the following characteristics:
a) It is not affected by the presence of extreme observations.
b) The sum of the absolute deviations of the observations from the
median is a minimum.
c) It is an appropriate measure for ordinal data.

The mode has the following characteristics:
a) The mode is determined by frequency.
b) It is an appropriate measure for nominal data.
Exercises:

1. The mean of four numbers is 71.5. If three of the
numbers are 58, 76, and 88, what is the value of the
fourth number?

2. Consider the following data set:
12, 11, 14, 10, 8, 13, 11, 9
Find the mean, median, and modal value.


4.5 Measures of Variability

Variability or dispersion refers to how much the
observations spread out from the mean. The higher the
variability, the more dispersed the observations are; the
lower it is, the more consistent the observations are.
Some measures of variability are the range, the variance,
and the standard deviation.

Range
The range of a data set is the difference between the maximum and
minimum data entries in the set.
Range = (Maximum data entry) – (Minimum data entry)

Example:
The following data are the closing prices for a certain stock
on ten successive Fridays. Find the range.

Stock 56 56 57 58 61 63 63 67 67 67

The range is 67 – 56 = 11.


Variance and Standard Deviation
Definition: Variance

- It is defined as the average of the squared deviations
  (differences) from the mean.
- It is a measure that considers the position of each
  observation relative to the mean.

Definition: Standard Deviation

- It is a measure of the spread or dispersion of scores
  from the mean of the distribution. It is the square root
  of the variance.


The population variance (“sigma squared”) of a population data set
of N entries is

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$.

The population standard deviation (“sigma”) of a population data
set of N entries is the square root of the population variance:

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$.


The sample variance of a sample data set of n entries is

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$.

The sample standard deviation of a sample data set of n entries is
the square root of the sample variance:

$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$.


Finding the Population Standard Deviation

Guidelines (in words, then in symbols)
1. Find the mean of the population data set: $\mu = \frac{\sum x}{N}$
2. Find the deviation of each entry: $x - \mu$
3. Square each deviation: $(x - \mu)^2$
4. Add to get the sum of squares: $SS_x = \sum (x - \mu)^2$
5. Divide by N to get the population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
6. Find the square root of the variance to get the population
   standard deviation: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$


Finding the Sample Standard Deviation

Guidelines (in words, then in symbols)
1. Find the mean of the sample data set: $\bar{x} = \frac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of squares: $SS_x = \sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root of the variance to get the sample
   standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$


Finding the Population Standard Deviation

Example 1:
The following data are the closing prices for a certain stock on five
successive Fridays. The population mean is 61. Find the population
standard deviation.

Stock, x    Deviation, x – μ    Squared, (x – μ)²
56          –5                  25
58          –3                  9
61          0                   0
63          2                   4
67          6                   36
Σx = 305    Σ(x – μ) = 0        Σ(x – μ)² = 74

The squared deviations are never negative.

$SS_x = \sum (x - \mu)^2 = 74$

$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{74}{5} = 14.8$

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} = \sqrt{14.8} \approx 3.85$

The population standard deviation is about 3.85 dollars.
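
The six guideline steps for the population standard deviation can be followed one line at a time in Python. This sketch uses the five closing prices from Example 1; the variable names are illustrative, and `statistics.pstdev` from the standard library is used only as a cross-check.

```python
import math
import statistics

prices = [56, 58, 61, 63, 67]   # closing prices on five Fridays

mu = sum(prices) / len(prices)          # step 1: population mean
deviations = [x - mu for x in prices]   # step 2: deviation of each entry
squared = [d ** 2 for d in deviations]  # step 3: squared deviations
ss_x = sum(squared)                     # step 4: sum of squares
variance = ss_x / len(prices)           # step 5: divide by N
sigma = math.sqrt(variance)             # step 6: square root

print(round(variance, 1), round(sigma, 2))

# Cross-check against the library's population standard deviation.
print(round(statistics.pstdev(prices), 2))
```

The deviations themselves sum to zero, which is why they are squared before being added.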
Example 2:
You and your friends have just measured the
heights of a sample of 5 dogs (in centimeters).

The heights (at the shoulders) are 60 cm, 47 cm, 17 cm,
43 cm, and 30 cm.

Find the mean, variance, and standard deviation.
(Source: mathsisfun.com)
Solution:

A) Mean:
$\bar{x} = \frac{60 + 47 + 17 + 43 + 30}{5} = \frac{197}{5} = 39.4$

The average height is 39.4 centimeters.

B) Sample variance:
$s^2 = \frac{(60 - 39.4)^2 + (47 - 39.4)^2 + (17 - 39.4)^2 + (43 - 39.4)^2 + (30 - 39.4)^2}{5 - 1} = \frac{1085.2}{4} = 271.3$

C) Sample standard deviation:
$s = \sqrt{271.3} \approx 16.47$ cm
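
The dog-height results can be reproduced with the standard-library `statistics` module, whose `variance` and `stdev` functions divide by n – 1 (sample formulas), while `pvariance` and `pstdev` divide by N (population formulas). The sketch below is only a check of the arithmetic; the variable names are illustrative.

```python
import statistics

heights = [60, 47, 17, 43, 30]   # shoulder heights in cm (sample of 5 dogs)

mean_height = statistics.mean(heights)

# Sample formulas: divide the sum of squares by n - 1.
sample_variance = statistics.variance(heights)
sample_sd = statistics.stdev(heights)

# Population formula, shown for comparison: divide by n instead.
population_variance = statistics.pvariance(heights)

print(round(mean_height, 1), round(sample_variance, 1), round(sample_sd, 2))
print(round(population_variance, 2))
```

For the same data the sample variance is always larger than the population variance, because the same sum of squares is divided by a smaller number.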
So, using the standard deviation, we have a "standard"
way of knowing what is normal and what is extra large
or extra small.

[Figure: the dogs’ heights, with the standard deviation of 16.47 cm marked on either side of the mean.]

Rottweiler dogs are tall and Dachshunds are a bit short,
right?
4.6 Testing a Statistical Hypothesis

Hypothesis testing is the most significant area of statistical
inference. It is a step-by-step process for making inferences
(conclusions) about a population.

The truth of a statistical hypothesis can only be assessed by
taking a portion of the population of interest (a sample) and
using the information obtained from this portion to decide
whether the statistical hypothesis is likely to be true or false.
We either “reject” the statistical hypothesis when the sample is
inconsistent with it, or “do not reject” it otherwise.

Significance is defined as the quality of being statistically
significant.

Level of Significance
• It is denoted by alpha (α) and refers to the degree of
significance at which we accept or reject the null
hypothesis.
• 100% accuracy is not possible in accepting or rejecting a
hypothesis.
• The significance level is also the probability of making the
wrong decision (rejecting the null hypothesis) when the null
hypothesis is true.

Types of Statistical Hypothesis

The two types are the null hypothesis, denoted H0, and the
alternative hypothesis, denoted H1.

Types of Statistical Tests

1) One-tailed test (directional)
If the alternative hypothesis of a statistical test is
one-sided, it is said to be a one-tailed test.

Examples: $H_1: x < 8$ or $H_1: x > 8$

2) Two-tailed test (non-directional)
If the alternative hypothesis is two-sided, the test is said
to be two-tailed.

Example: $H_1: x \neq 8$

Possibilities in the Decision Procedure

i. The null hypothesis is accepted when it is true.
   Correct decision

ii. The null hypothesis is accepted when it is false.
   Error (Type II)

iii. The null hypothesis is rejected when it is true.
   Error (Type I)

iv. The null hypothesis is rejected when it is false.
   Correct decision

Types of Error

1) Type I error
The null hypothesis is rejected when in fact it is true.

2) Type II error
The null hypothesis is accepted when in fact it is false.

Constructing the Null and Alternative Hypothesis

Examples: Formulate the null hypothesis H0 and the alternative
hypothesis H1 for the following problems:

1) A researcher wants to know if the average test score of
the students taking a particular examination is 80.

$H_0: \mu = 80$ (the average test score of the students taking
a particular examination is 80).

$H_1: \mu \neq 80$ (the average test score of the students taking
a particular examination is not 80).

2) A researcher wants to study if the customer satisfaction
level of cable television company A is greater than that of
cable television company B.

$H_0: \mu_1 = \mu_2$ (the customer satisfaction levels of the two
competing cable television companies are the same).

$H_1: \mu_1 > \mu_2$ (the customer satisfaction level of cable
television company A is greater than that of cable television
company B).
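
The earlier statement that the significance level is the probability of rejecting a null hypothesis that is actually true can be illustrated with a small simulation built on problem 1 (H0: μ = 80). Everything beyond that hypothesis is an assumption made for this sketch: the population is taken to be normal with a known standard deviation of 10, the sample size is 30, and the decision uses a two-tailed z-test, a procedure this chapter does not develop.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(0)   # fixed seed so the run is repeatable

MU_0 = 80        # value claimed by H0 (problem 1)
SIGMA = 10       # assumed known population standard deviation (hypothetical)
N = 30           # hypothetical sample size
ALPHA = 0.05     # level of significance
TRIALS = 2000    # number of simulated samples

# Two-tailed critical value: reject H0 when |z| exceeds it.
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)

rejections = 0
for _ in range(TRIALS):
    # Draw a sample from a population in which H0 is actually true.
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    z = (mean(sample) - MU_0) / (SIGMA / math.sqrt(N))
    if abs(z) > z_critical:
        rejections += 1   # rejecting a true H0 is a Type I error

type_i_rate = rejections / TRIALS
print(f"critical z = {z_critical:.2f}, Type I error rate = {type_i_rate:.3f}")
```

Because every simulated sample comes from a population where H0 holds, each rejection is a Type I error, and the observed rejection rate should land close to α = 0.05.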
