(M5-MAIN) Data Management

Module 5
Data Management
About This Module
This module teaches basic statistics: types of data,
organizing and presenting data, statistical measures, and
probability.
Subtopics
Learn the basic terms and applications of statistics
Learn how to organize data in tables and graphs. Also
learn how to interpret tables and graphs.
Learn how to create a frequency distribution table from a
given set of raw data. Learn how to interpret values in
the frequency distribution table.
Learn how to compute basic statistical measures and
their meaning.
Learn how to compute and interpret other measures of
location.
Learn the basics of classical probability.
You may stop here and proceed
to the subtopic materials or you
may continue and read through
the full course materials.
Data Management
Introduction to Statistics
Intended Learning
Outcomes
• Understand the steps in conducting a statistical
investigation.
• Understand basic statistical terms
• Differentiate between descriptive and inferential
statistics
Statistics
Numerical data - representing counts or measurements
Categorical data – descriptions or characteristics
Any recording of information is called an observation

Statistics is concerned with the
• Collection,
• Presentation,
• Analysis, and
• Interpretation of data.
Descriptive statistics comprises those methods concerned
with collecting and describing a set of data so as to yield
meaningful information.
Statistical Inference comprises those methods concerned
with the analysis of a subset of data leading to predictions or
inferences about the entire set of data.
• Infer the expected amount of rain for July next year based
on the average precipitation data for July in the past 30
years.
• Estimate the average depth of a lake from measurements taken
from a set of random areas.
Classify each statement as descriptive statistics or statistical inference:
1. As a result of recent cutbacks by oil-producing nations, we can expect
the price of gasoline to double in the next year.
2. At least 5% of all fires reported last year in a certain city were
deliberately set by arsonists.
3. Of all patients who have received this particular type of drug at a local
clinic, 60% later developed significant side effects.
4. Assuming that less than 20% of the Colombian coffee beans were
destroyed by frost this past winter, we should expect an increase of no
more than 30 cents for a kilogram of coffee by the end of the year.
5. As a result of a recent poll, most Americans are in favor of building
additional nuclear power plants.
A population consists of the totality of the observations with
which we are concerned. A population may be finite or infinite.
A sample is a subset of a population. For the inferences from the
sample to be valid, the sample must be representative of the
population.
A simple random sample of 𝑛 observations is a sample that is
chosen in such a way that every subset of 𝑛 observations of the
population has the same probability of being selected.
Define a suitable population from which each of the following
samples is selected:
1. Persons in 200 homes in the city of Richmond were asked
who their favored candidate for the school board was.
Population: favored candidates of the persons in all homes in the city of Richmond
2. A coin is tossed 100 times and 34 tails are recorded.
Population: outcomes of all tosses of the coin
3. 200 pairs of a new type of tennis shoe are tested on the
professional tour and, on the average, lasted 4 months.
Population: lives of all pairs of the new type of tennis shoe
A useful tool in choosing a random sample from any population [1]
Ex. Select a random sample of 15 out of 324 participants. [1]
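The selection in the example can also be sketched in code. This is only an illustration, not the tool the slide refers to: it assumes the participants are labeled 1 to 324 (the module does not say how they are labeled) and uses an arbitrary fixed seed so the draw is repeatable. Python's `random.sample` draws without replacement, so every subset of 15 labels is equally likely, which matches the definition of a simple random sample.

```python
import random

# Hypothetical labels: participants are assumed to be numbered 1..324.
participant_ids = range(1, 325)

# Fixed seed (an arbitrary choice) so that the same sample is drawn each run.
rng = random.Random(2024)

# Draw 15 distinct participants; each subset of size 15 is equally likely.
sample = sorted(rng.sample(participant_ids, k=15))
print(sample)
```

Dropping the seed gives a fresh sample on every run.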
Data Management
Tables and Graphs

Intended Learning
Outcomes
• Organize collected data into tables and graphs.
• Interpret tables and graphs.
Tables and Graphs
Tables summarize raw data into an organized list or distribution of counts
and may include sums and proportions.
         Fruits
Branch   Apples   Oranges   Total
1        125      75        200
2        118      84        202
3        164      72        236
Total    407      231       638
Bar graphs are often used to compare quantities in
different categories.
Ex.
• Net value of different companies
• Number of subscribers of different networks
[Figure: bar graph "Number of fruits supplied to each branch per week"; horizontal axis: branch (1, 2, 3); bars: Apples and Oranges]
A pie graph is used to show the distribution or
proportions of parts to a whole
Ex.
• National budget allocation
• Caloric distribution
Recommended Diet Distribution
Food         Recommended Diet
Fruit        36%
Protein      28%
Vegetables   14%
Dairy        13%
Grain        9%
[Figure: pie graph "Recommended Diet" with sectors Fruit, Protein, Vegetables, Dairy, and Grain]

Line Graphs show information that is connected in some
way like changes through time.
Growth and Profit of Product A
Month      Growth   Profit
January    -3%      -2%
February   -2%      5%
March      1%       11%
April      3%       25%
May        7%       40%
June       5%       42%
[Figure: line graph "Growth and Profit of Product A"; horizontal axis: month (January to June); vertical axis: percent; lines: Growth and Profit]
Data Management
Frequency Distribution Table

Intended Learning
Outcomes
• Construct a frequency distribution table from a set
of raw data.
• Interpret values on the frequency distribution table
Frequency Distribution
Table
A frequency distribution is the organization of raw data in table form, using
classes and frequencies. When the range of the data is large, the data
must be grouped into classes that are more than one unit in width, in what is
called a grouped frequency distribution.
Each class is defined by its class limits, which are the smallest and largest
data values that can be included in the class.
Class boundaries are numbers used to separate the classes so that there
are no gaps in the frequency distribution.
There should be 5 to 20 classes that are mutually exclusive and exhaustive.
They must be continuous and of equal width.
A school has conducted a survey of 60 students to investigate the time it
takes for students to travel to school. The following data gives the travel time
to the nearest minute:
12 15 16 8 10 17 25 34 42 18
24 18 45 33 38 45 40 3 20 12
10 10 27 16 37 45 15 16 26 32
35 8 14 18 15 27 19 32 6 12
14 20 10 16 14 28 31 21 25 8
32 46 14 15 20 18 8 10 25 22
Suppose you want to create a frequency distribution table with 5 classes.
STEP 1: Determine the classes.
• Find the range and select the number of classes desired.
• Find the width (class size) by dividing the range by the number of classes
and rounding up.
• Select a starting point (usually the lowest value or any convenient
number less than the lowest value), then add the width repeatedly to get
the lower limits of the succeeding classes.
• Subtract one unit from the lower limit of the second class to get the upper
limit of the first class. Add the width repeatedly to get the succeeding
upper limits.
• Find the class boundaries by subtracting 0.5 from each lower limit
and adding 0.5 to each upper limit.
STEP 2: Tally the data and find the frequency for each class.
Count the number of data that belongs to each class.
STEP 3: Find the cumulative frequencies. Cumulative
frequencies are used to show how many data values are
accumulated up to and including a specific class.
STEP 4: Find the class marks to be the representative value
of each class.
\[ \text{class mark} = \frac{\text{lower limit} + \text{upper limit}}{2} \]
Class Limits   Class Boundaries   Class Mark   Frequency   Cumulative Frequency
2 – 10         1.5 – 10.5         6            11          11
11 – 19        10.5 – 19.5        15           21          32
20 – 28        19.5 – 28.5        24           13          45
29 – 37        28.5 – 37.5        33           8           53
38 – 46        37.5 – 46.5        42           7           60
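The construction steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a general-purpose routine: it assumes whole-number data (so that boundaries sit 0.5 beyond the limits) and uses the starting point 2 chosen in the table. The variable names are illustrative.

```python
import math

# Travel times (minutes) of the 60 surveyed students.
travel_times = [
    12, 15, 16, 8, 10, 17, 25, 34, 42, 18,
    24, 18, 45, 33, 38, 45, 40, 3, 20, 12,
    10, 10, 27, 16, 37, 45, 15, 16, 26, 32,
    35, 8, 14, 18, 15, 27, 19, 32, 6, 12,
    14, 20, 10, 16, 14, 28, 31, 21, 25, 8,
    32, 46, 14, 15, 20, 18, 8, 10, 25, 22,
]
num_classes = 5
start = 2  # convenient starting point below the lowest value (3)

# Step 1: range, width (rounded up), class limits, and class boundaries.
data_range = max(travel_times) - min(travel_times)   # 46 - 3 = 43
width = math.ceil(data_range / num_classes)          # 8.6 rounded up to 9
class_limits = [
    (start + c * width, start + (c + 1) * width - 1) for c in range(num_classes)
]
class_boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in class_limits]

# Step 2: count the data values that fall in each class.
frequencies = [sum(lo <= t <= hi for t in travel_times) for lo, hi in class_limits]

# Step 3: running totals give the cumulative frequencies.
cumulative_frequencies = []
running_total = 0
for f in frequencies:
    running_total += f
    cumulative_frequencies.append(running_total)

# Step 4: class mark = (lower limit + upper limit) / 2.
class_marks = [(lo + hi) / 2 for lo, hi in class_limits]

for row in zip(class_limits, class_boundaries, class_marks,
               frequencies, cumulative_frequencies):
    print(row)
```

When the range divides evenly by the number of classes, `math.ceil` does not enlarge the width, so it is worth checking that the last class still covers the highest value.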
[Figure: histogram of the travel times; horizontal axis: class boundaries (1.5 – 10.5 through 37.5 – 46.5); vertical axis: frequency (11, 21, 13, 8, 7)]
A histogram is a bar graph of the frequencies against the class boundaries.

Class Mark   Frequency
1.5          0
6            11
15           21
24           13
33           8
42           7
46.5         0
[Figure: frequency polygon; horizontal axis: class mark; vertical axis: frequency]
A frequency polygon is the line graph of the frequencies against the class marks.
Close the polygon at the lowest and highest class boundaries.
Class Boundary   Cumulative Frequency
1.5              0
10.5             11
19.5             32
28.5             45
37.5             53
46.5             60
[Figure: cumulative frequency graph; horizontal axis: class boundaries; vertical axis: cumulative frequency]
The cumulative frequency graph is the line graph of the cumulative frequencies against the upper class boundaries.

Data Management
Measures of Center and Dispersion

Intended Learning
Outcomes
• Compute measures of central tendency
• Compute measures of dispersion
• Interpret values
Statistical Measures 1
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]
Example: \( x = \{1, 5, 2, -3, 0, -2\} \)
\[ \sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 1 + 5 + 2 + (-3) + 0 \]
\[ \sum_{i=1}^{3} i \cdot x_i = 1(x_1) + 2(x_2) + 3(x_3) = 1(1) + 2(5) + 3(2) \]
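As a quick check of the two sums in the example, the summation notation maps directly onto Python's built-in `sum`. The variable names are illustrative; note that Python lists are indexed from 0, so the slice `x[:5]` holds the first five values and `enumerate(..., start=1)` restores the 1-based index used in the formula.

```python
x = [1, 5, 2, -3, 0, -2]

# Sum of the first five observations: x_1 + x_2 + x_3 + x_4 + x_5.
sum_first_five = sum(x[:5])

# Sum of i * x_i for i = 1, 2, 3: 1(1) + 2(5) + 3(2).
weighted_sum = sum(i * x_i for i, x_i in enumerate(x[:3], start=1))

print(sum_first_five, weighted_sum)  # 5 17
```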
These values are used to represent a set of data.
• Mean
• Median
• Mode
Population Mean
Given a population of size 𝑁, the population mean is defined as
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
where the \(x_i\)'s are the observation values.
Sample Mean
Given a sample of size 𝑛, the sample mean is defined as
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where the \(x_i\)'s are the observation values.
The median is the middle number when all observations are arranged in
increasing or decreasing order.
Median
Arrange the observations from lowest to highest.
Let 𝑛 be the number of observations.
If 𝑛 is odd, let \(k = \frac{n+1}{2}\). Then the median is the 𝑘th observation, \(x_k\).
If 𝑛 is even, let \(k = \frac{n}{2}\). Then the median is the average of the 𝑘th and (𝑘 + 1)th observations,
\[ \frac{x_k + x_{k+1}}{2}. \]
The mode of a set of observations is the value that occurs
most often, that is, with the greatest frequency.
Find the mean, median, and mode of the given set of data:
26 29 29 29 30 31 34 41 42 42 47 48
Mean:
\[ \bar{x} = \frac{26+29+29+29+30+31+34+41+42+42+47+48}{12} = \frac{428}{12} \approx 35.67 \]
Median: There are 12 observations, so \(n = 12\) is even and \(k = \frac{12}{2} = 6\), \(k + 1 = 7\). Since \(x_6 = 31\) and \(x_7 = 34\), the median is
\[ \frac{31 + 34}{2} = 32.5 \]
Mode: 29
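The three measures in this example can be checked with Python's standard `statistics` module. This is a convenience sketch; the module's `median` applies the same odd/even rule described above, and `mode` returns the most frequent value.

```python
import statistics

data = [26, 29, 29, 29, 30, 31, 34, 41, 42, 42, 47, 48]

mean = statistics.mean(data)      # 428 / 12
median = statistics.median(data)  # average of the 6th and 7th values
mode = statistics.mode(data)      # most frequent value

print(round(mean, 2), median, mode)  # 35.67 32.5 29
```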
Weighted Mean
Used when the data points \(x_i\) contribute to the mean with different weights \(w_i\).
\[ \bar{x} = \frac{\sum_{i=1}^{N} w_i x_i}{\sum_{i=1}^{N} w_i} \]
Examples:
Grade point average of subjects with varying units.
Average daily balance of a bank account.
Mean computed from the frequency distribution table
Compute the weighted mean grade:
\[ \bar{x} = \frac{3(2) + 3(1.5) + 5(1) + 3(1.5) + 3(1.5)}{3 + 3 + 5 + 3 + 3} = 1.44 \]
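The same computation can be sketched in Python. The table of subjects is not reproduced in the text, so the code assumes, from the form of the expression, that the first factor in each product is the number of units (the weight) and the second is the grade; the list names are illustrative.

```python
# Assumed reading of the worked expression: weights are units, values are grades.
units = [3, 3, 5, 3, 3]
grades = [2, 1.5, 1, 1.5, 1.5]

# Weighted mean = sum of (weight * value) divided by the sum of the weights.
weighted_mean = sum(w * x for w, x in zip(units, grades)) / sum(units)

print(round(weighted_mean, 2))  # 1.44
```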
These values are used to describe the distribution of a
set of data
• Range
• Variance
• Standard Deviation
Range
The difference between the largest and smallest number in the set.
\[ \text{range} = \text{highest} - \text{lowest} \]
Example:
26 29 29 29 30 31 34 41 42 42 47 48
\[ \text{range} = 48 - 26 = 22 \]
Sometimes the range is not enough to describe the
distribution of values.
A = {3, 4, 5, 6, 8, 9, 10, 12, 15}, \(\text{range}_A = 15 - 3 = 12\)
B = {3, 7, 7, 7, 8, 8, 8, 9, 15}, \(\text{range}_B = 15 - 3 = 12\)
What can you say about the variation of sets A and B?

Population Variance
Given a population of size 𝑁 with mean 𝜇, the population variance is defined as
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where the \(x_i\)'s are the observation values.
Sample Variance
Given a sample of size 𝑛 with mean \(\bar{x}\), the sample variance is defined as
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where the \(x_i\)'s are the observation values.
Standard Deviation
Population standard deviation: \(\sigma = \sqrt{\sigma^2}\)
Sample standard deviation: \(s = \sqrt{s^2}\)
Compute the variance and standard deviation of each set of sample
data:
A = {3, 4, 5, 6, 8, 9, 10, 12, 15}, \(\bar{x}_A = 8\)
B = {3, 7, 7, 7, 8, 8, 8, 9, 15}, \(\bar{x}_B = 8\)
\[ s_A^2 = \frac{(3-8)^2 + (4-8)^2 + (5-8)^2 + (6-8)^2 + (8-8)^2 + (9-8)^2 + (10-8)^2 + (12-8)^2 + (15-8)^2}{9-1} = 15.5 \]
\[ s_B^2 = \frac{(3-8)^2 + (7-8)^2 + (7-8)^2 + (7-8)^2 + (8-8)^2 + (8-8)^2 + (8-8)^2 + (9-8)^2 + (15-8)^2}{9-1} = 9.75 \]
\[ s_A = \sqrt{15.5} \approx 3.94 \qquad s_B = \sqrt{9.75} \approx 3.12 \]
The higher the variance, the more varied is the data.
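The comparison of sets A and B can be reproduced with the standard `statistics` module. In this sketch `variance` and `stdev` use the sample formulas (divisor 𝑛 − 1); the population versions are `pvariance` and `pstdev`. Both sets have the same range and mean, yet A has the larger variance.

```python
import statistics

set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
var_a = statistics.variance(set_a)
var_b = statistics.variance(set_b)

# Sample standard deviation: square root of the sample variance.
sd_a = statistics.stdev(set_a)
sd_b = statistics.stdev(set_b)

print(var_a, round(sd_a, 2))  # 15.5 3.94
print(var_b, round(sd_b, 2))  # 9.75 3.12
```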

An observation 𝑥 from a population with mean 𝜇 and
standard deviation 𝜎 has a 𝑧-score or 𝑧-value defined by
\[ z = \frac{x - \mu}{\sigma} \]
The 𝑧-score is used to compare values from different sets
with different means and standard deviations.
Different typing skills are required for secretaries depending on whether one is working in a
law office, an accounting firm, or for a research group. Given this table of standardized
samples and the applicant's scores, for what type of position does this applicant seem to be
best suited?
Sample       Applicant's Score   Mean      Standard Deviation
Law          141 sec             180 sec   30 sec
Accounting   7 min               10 min    2 min
Research     33 min              26 min    5 min
\[ z_L = \frac{141 - 180}{30} = -1.3 \qquad z_A = \frac{7 - 10}{2} = -1.5 \qquad z_R = \frac{33 - 26}{5} = 1.4 \]
The scores are times, so the best fit is the position with the lowest relative time, that is,
the lowest 𝑧-score. Thus the applicant is most suitable for Accounting.
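A short sketch of the same comparison in Python follows. The function and dictionary names are illustrative; the units (seconds or minutes) cancel inside each 𝑧-score, which is what makes the three positions comparable.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (applicant's score, mean, standard deviation) for each sample in the table.
positions = {
    "Law": (141, 180, 30),
    "Accounting": (7, 10, 2),
    "Research": (33, 26, 5),
}

z = {name: z_score(*values) for name, values in positions.items()}

# Lower times are better here, so pick the position with the lowest z-score.
best = min(z, key=z.get)

print({name: round(value, 1) for name, value in z.items()})
print(best)  # Accounting
```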
Data Management
Other Measures of Location

Intended Learning
Outcomes
• Compute measures of location
• Interpret these values
Statistical Measures 2
• Percentile
• Decile
• Quartile
A percentile is a measure indicating the value below which a given
percentage of the observations in a group of observations fall.
The 𝑟th percentile is denoted by 𝑃𝑟 , where 𝑟 is a whole number from 1 to 99.

Percentile
Arrange the observations from lowest to highest.
Let \(k = \frac{r}{100}(N + 1)\), where 𝑁 is the number of observations.
Let \(I_k\) be the integer part of 𝑘 and \(D_k\) be its decimal part. Identify the \(I_k\)th and
\((I_k + 1)\)st observations.
Then the 𝑟th percentile is computed as
\[ P_r = x_{I_k} + D_k \left( x_{I_k + 1} - x_{I_k} \right) \]
Find the 85th percentile or 𝑃85 in the given data set below.
3 6 8 8 8 8 10 10 10 10
10 12 12 12 14 14 14 14 15 15
15 15 16 16 16 16 17 18 18 18
18 19 20 20 20 21 22 24 25 25
25 26 27 27 28 31 32 32 32 33
34 35 37 38 40 42 45 45 45 46
\[ r = 85, \qquad k = \frac{85}{100}(60 + 1) = 51.85 \]
\[ D_k = 0.85, \quad I_k = 51, \quad I_k + 1 = 52, \quad x_{51} = 34, \quad x_{52} = 35 \]
\[ P_{85} = 34 + 0.85(35 - 34) = 34.85 \]

Deciles and quartiles are special percentiles.
There are 9 deciles and their equivalent percentiles are
\[ D_1 = P_{10}, \; D_2 = P_{20}, \; \ldots, \; D_9 = P_{90}. \]
There are 3 quartiles and their equivalent percentiles are
\[ Q_1 = P_{25}, \; Q_2 = P_{50}, \; Q_3 = P_{75}. \]
Example: Find \(D_6\) and \(Q_1\) from the data set used for the 85th percentile.
Find the 6th decile, \(D_6 = P_{60}\).
\[ r = 60, \qquad k = \frac{60}{100}(60 + 1) = 36.6 \]
\[ D_k = 0.6, \quad I_k = 36, \quad I_k + 1 = 37, \quad x_{36} = 21, \quad x_{37} = 22 \]
\[ D_6 = 21 + 0.6(22 - 21) = 21.6 \]
Find the 1st quartile, \(Q_1 = P_{25}\), in the same data set.
\[ r = 25, \qquad k = \frac{25}{100}(60 + 1) = 15.25 \]
\[ D_k = 0.25, \quad I_k = 15, \quad I_k + 1 = 16, \quad x_{15} = 14, \quad x_{16} = 14 \]
\[ Q_1 = 14 + 0.25(14 - 14) = 14 \]
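The percentile procedure of this section, including the decile and quartile examples, can be sketched as one small function. The name `percentile` is illustrative, and the handling of positions that fall before the first or after the last observation is an assumption added here, since the module does not cover that case. Library routines such as `statistics.quantiles` or `numpy.percentile` may use other interpolation rules and can return slightly different values.

```python
import math

def percentile(sorted_values, r):
    """r-th percentile using k = r(N + 1)/100 and linear interpolation."""
    n = len(sorted_values)
    k = r * (n + 1) / 100
    i = math.floor(k)   # integer part I_k
    d = k - i           # decimal part D_k
    # Assumption (not in the module): clamp when k falls outside 1..N.
    if i < 1:
        return sorted_values[0]
    if i >= n:
        return sorted_values[-1]
    lower = sorted_values[i - 1]   # x_{I_k}; Python lists are 0-based
    upper = sorted_values[i]       # x_{I_k + 1}
    return lower + d * (upper - lower)

data = [
    3, 6, 8, 8, 8, 8, 10, 10, 10, 10,
    10, 12, 12, 12, 14, 14, 14, 14, 15, 15,
    15, 15, 16, 16, 16, 16, 17, 18, 18, 18,
    18, 19, 20, 20, 20, 21, 22, 24, 25, 25,
    25, 26, 27, 27, 28, 31, 32, 32, 32, 33,
    34, 35, 37, 38, 40, 42, 45, 45, 45, 46,
]  # already arranged from lowest to highest

p85 = percentile(data, 85)
d6 = percentile(data, 60)   # 6th decile = 60th percentile
q1 = percentile(data, 25)   # 1st quartile = 25th percentile

print(round(p85, 2), round(d6, 2), round(q1, 2))
```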
Data Management
Probability
Intended Learning
Outcomes
• Find basic probabilities
• Apply properties of probabilities in solving
problems.
Classical Probability
Basic Concepts
Definition
A probability experiment is a chance process that leads to well-defined results called
outcomes.
An outcome is a result of a single trial of a probability experiment.
A sample space is the set of all possible outcomes of a probability experiment.
MMW5.6 Data Mana

Examples
Experiment                         Sample Space
Toss one coin                      head, tail
Roll a die                         1, 2, 3, 4, 5, 6
Answer a true or false question    true, false
Toss two coins                     head-head, head-tail, tail-head, tail-tail

Rolling Dice
The sample space for rolling two dice
Die 2
Die 1 1 2 3 4 5 6
1 (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
2 (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
3 (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
4 (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
5 (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
6 (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
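The table above can be generated rather than written out by hand. The sketch below uses `itertools.product` to list every ordered pair (Die 1, Die 2); the variable name is illustrative.

```python
from itertools import product

# Every ordered pair (die 1, die 2) with faces 1 through 6.
sample_space = list(product(range(1, 7), repeat=2))

print(len(sample_space))  # 36
print(sample_space[:6])   # first row of the table: (1, 1) through (1, 6)
```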

Gender of Children
Use a tree diagram to find the sample space for the gender of the children if a family has three
children. Use B for boy and G for girl.

Definition
An event consists of a set of outcomes of a probability experiment.
Equally likely events are events that have the same probability of occurring.
Classical probability assumes that all outcomes in the sample space are equally likely to
occur.

Formula for Classical Probability
The probability of any event E is
\[ P(E) = \frac{\text{Number of outcomes in } E}{\text{Total number of outcomes in the sample space}} \]
This probability is called classical probability, and it uses the sample space, S.

Example
Drawing Cards
A card is drawn from an ordinary deck. Find the probability that the card is a queen.
The sample space consists of all the 52 different cards in the deck.
The event, E, of drawing a queen consists of 4 different queens: the queen of hearts, queen of
diamonds, queen of spades, and queen of clubs.
\[ P(E) = \frac{4}{52} = \frac{1}{13} \]

Example
Gender of Children
If a family has three children, find the probability that exactly two of the children are girls.
The sample space is
S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}
The event of having exactly 2 girls is
E = {BGG, GBG, GGB}
Thus
\[ P(E) = \frac{3}{8} \]
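The same count can be done by enumeration. This sketch builds the sample space with `itertools.product`, keeps the outcomes containing two G's, and applies the classical probability formula with `fractions.Fraction` so the result stays exact. It relies on the classical assumption that all eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All birth orders for three children: B = boy, G = girl.
sample_space = ["".join(p) for p in product("BG", repeat=3)]

# Event: the outcome contains two G's.
event = [outcome for outcome in sample_space if outcome.count("G") == 2]

# Classical probability: outcomes in E over outcomes in S.
probability = Fraction(len(event), len(sample_space))

print(sample_space)
print(event, probability)  # ['BGG', 'GBG', 'GGB'] 3/8
```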

Probability Rules
1. For any event E, 0 ≤ P(E) ≤ 1.
2. If an event E cannot occur, then P(E) = 0.
3. If an event E is certain, then P(E) = 1.
4. The sum of the probabilities of all outcomes in a sample space is 1.
5. If E′ is the complement of event E, then
P(E′) = 1 − P(E)
P(E) = 1 − P(E′)
P(E) + P(E′) = 1
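The rules can be illustrated with a small sketch. The helper `classical_probability` is a hypothetical name introduced here, not something defined in the module; it assumes equally likely outcomes and uses a single die and the 26-letter English alphabet as sample spaces.

```python
from fractions import Fraction
import string

def classical_probability(event, sample_space):
    """Outcomes of the event that lie in S, divided by the outcomes in S."""
    event, sample_space = set(event), set(sample_space)
    return Fraction(len(event & sample_space), len(sample_space))

die = range(1, 7)

# Rule 2: an event that cannot occur has probability 0.
p_nine = classical_probability([9], die)

# Rule 3: an event that is certain has probability 1.
p_below_seven = classical_probability(range(1, 7), die)

# Rule 4: the probabilities of all outcomes add up to 1.
total = sum(classical_probability([face], die) for face in die)

# Rule 5: the complement of "vowel" is "consonant".
letters = string.ascii_uppercase
p_vowel = classical_probability("AEIOU", letters)
p_consonant = 1 - p_vowel

print(p_nine, p_below_seven, total, p_vowel, p_consonant)
```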
Examples
1. When a single die is rolled, find the probability of getting a 9. Ans: 0
2. When a single die is rolled, what is the probability of getting a number less than 7? Ans: 1
3. In the event of drawing a letter from the English alphabet,
(1) what is the complement of the event of drawing a vowel? Ans: drawing a consonant
(2) what is the probability of drawing a vowel? Ans: \(\frac{5}{26}\)
(3) what is the probability of drawing a consonant? Ans: \(1 - \frac{5}{26} = \frac{21}{26}\)

[1] Bluman, A.G., Elementary Statistics: A Step by Step
Approach, McGraw-Hill, 2015
