
Elements of Statistics and Probability

Lecture: 3

Tarikul Islam
Lecturer, BRACU

Tarikul Islam Elements of Statistics and Probability 1 / 48


Outline

Quantitative Data Representation


Graphical Representation



Quantitative Data Representation



Frequency distribution

• A frequency distribution for quantitative data lists all the
classes and the number of values that belong to each class.
• Data presented in the form of a frequency distribution are
called grouped data.
• There are several methods to group quantitative data into
classes. For example:
  • Single-value grouping
  • Limit grouping
  • Cutpoint grouping



Single value grouping

• In some cases, the most appropriate way to group quantitative
data is to use classes in which each class represents a single
possible value.
• Such classes are called single-value classes, and this method
of grouping quantitative data is called single-value grouping.


• In single-value grouping, we use the distinct values of the
observations as the classes, a method completely analogous to
that used for qualitative data.
• Single-value grouping is particularly suitable for discrete data in
which there are only a small number of distinct values.
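
As a rough illustration of single-value grouping, the sketch below tallies each distinct value and its relative frequency. The function name `single_value_distribution` and the ten household values are hypothetical and are not the lecture's data set.

```python
from collections import Counter

def single_value_distribution(observations):
    """Return (value, frequency, relative frequency) rows, one per distinct value."""
    counts = Counter(observations)
    n = len(observations)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

# Hypothetical data: number of TVs in 10 households (illustration only).
tvs = [1, 2, 2, 3, 1, 0, 2, 4, 2, 1]
for value, freq, rel in single_value_distribution(tvs):
    print(f"{value}\t{freq}\t{rel:.2f}")
```

Each distinct value becomes its own class, which is why this approach only stays readable when the number of distinct values is small.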



Single value representation

Number of TVs in 50 households:



Limit Grouping

• A second way to group quantitative data is to use class limits.
With this method, each class consists of a range of values.
• The smallest value that could go in a class is called the lower
limit of the class, and the largest value that could go in the
class is called the upper limit of the class.
• This method of grouping quantitative data is called limit
grouping.
• It is particularly useful when the data are expressed as whole
numbers and there are too many distinct values to employ
single-value grouping.



General rules for forming frequency distribution
The following steps can be considered as the general rules for
constructing frequency distributions:
• Determine the largest and smallest numbers in the raw data
and thus find the range (the difference between the largest and
smallest numbers).
• Determine the number of classes.
• Calculate the class interval using the following formula:

$$ i = \text{Size of the class interval} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}} $$

• Determine the number of observations falling into each class
interval, i.e. find the class frequencies. This is best done by
using a tally or score sheet.
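
The steps above can be sketched in a few lines of Python. This is only one possible reading of the rules: rounding the class interval up to a whole number, starting the first class at the smallest observation, and the names `limit_grouping` and `scores` are all assumptions made for illustration.

```python
import math

def limit_grouping(data, num_classes):
    """Tally whole-number data into classes given by (lower limit, upper limit)."""
    low, high = min(data), max(data)
    # Class interval = (highest - lowest) / number of classes, rounded up (assumption).
    width = math.ceil((high - low) / num_classes)
    # Widen by one if the largest value would otherwise fall outside the last class.
    if low + width * num_classes <= high:
        width += 1
    classes = [(low + i * width, low + (i + 1) * width - 1) for i in range(num_classes)]
    freqs = [sum(1 for x in data if lo <= x <= hi) for lo, hi in classes]
    return classes, freqs

# Hypothetical whole-number scores (illustration only).
scores = [30, 34, 41, 45, 47, 52, 58, 59, 63, 66, 71, 79]
classes, freqs = limit_grouping(scores, 5)
for (lo, hi), f in zip(classes, freqs):
    print(f"{lo}-{hi}: {f}")
```

In practice the lower limit of the first class is often moved to a convenient round number; the sketch skips that refinement.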



Number of classes

• One's professional judgment can determine the number of
classes.
• Too many or too few classes might not reveal the basic
shape of the data set.
• As a general rule, it is best to use no fewer than 5 and no more
than 15 classes in the construction of a frequency distribution.
• The number of classes can be estimated from the number
of observations, n, using one of the following two rules.



Rules for estimating number of classes

• The $2^k$ rule: As the number of classes, select the smallest
integer k (whole number) such that $2^k > n$.
• Sturges' rule: Estimate the number of classes using the formula:

$$ \text{Number of classes} = 1 + 3.322 \times \log_{10}(n) $$
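
Both rules are easy to evaluate by machine. The sketch below is a minimal version; the function names are hypothetical, and Sturges' estimate is returned unrounded because the slide does not say how to round it.

```python
import math

def classes_by_power_rule(n):
    """Smallest whole number k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def classes_by_sturges(n):
    """Sturges' estimate 1 + 3.322 * log10(n), left unrounded."""
    return 1 + 3.322 * math.log10(n)

n = 50  # hypothetical number of observations
print(classes_by_power_rule(n))
print(round(classes_by_sturges(n), 2))
```

The two rules usually land within a class or two of each other, so either one serves as a starting point that judgment can then adjust.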



Example



Some common terms in Limit grouping

• Lower class limit: The smallest value that could go in a class.
• Upper class limit: The largest value that could go in a class.
• Class width: The difference between the lower limit of a class
and the lower limit of the next class.
• Class mark or midpoint: The average of the two class limits
(upper and lower limits) of a class.
For example: for the class 50-59 in the example, the lower limit is
____, the upper limit is ____, the class width is ____, and the class mark is
____?
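
The two definitions can be checked numerically. The sketch below uses the hypothetical adjacent classes 30-39 and 40-49 purely for illustration; the helper names are not from the lecture.

```python
def class_width(lower_limit, next_lower_limit):
    """Width under limit grouping: lower limit of the next class minus this lower limit."""
    return next_lower_limit - lower_limit

def class_mark(lower_limit, upper_limit):
    """Class mark: the average of the lower and upper class limits."""
    return (lower_limit + upper_limit) / 2

# Hypothetical classes 30-39 and 40-49.
print(class_width(30, 40))  # 10
print(class_mark(30, 39))   # 34.5
```

Note that the width is 10, not 39 - 30 = 9, because it is measured between consecutive lower limits.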



Shortcomings/Disadvantages of limit grouping

• A value that lies between the upper limit of one class and the
lower limit of the next class does not belong to any class.
For example, with the classes 30-39 and 40-49, the value 39.5
cannot be assigned to either class.
• To overcome this issue, we employ cutpoint grouping.



Cutpoint grouping

• Each class consists of a range of values.
• The smallest value that could go in a class is called the lower
cutpoint of the class, and the smallest value that could go in
the next higher class is called the upper cutpoint of the class.
• Note that the lower cutpoint of a class is the same as its lower
limit and the upper cutpoint of a class is the same as the lower
limit of the next higher class.
• The method of grouping quantitative data by using cutpoints is
called cutpoint grouping.



Example: Cutpoint grouping
The U.S. National Center for Health Statistics publishes data on
weights and heights by age and sex in the document Vital and
Health Statistics. The table shows the weights, in
pounds, of 37 males aged 18-24 years.



Some common terms in cutpoint grouping

• Lower class cutpoint: The smallest value that could go in a
class.
• Upper class cutpoint: The smallest value that could go in the
next-higher class (equivalent to the lower cutpoint of the
next-higher class).
• Class width: The difference between the cutpoints of a class.
• Class mark or midpoint: The average of the two cutpoints of
a class.
For example: for the class 160-under 180 in the example, the lower
cutpoint is ____, the upper cutpoint is ____, the class width is ____, and the
class mark/midpoint is ____?
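
A minimal sketch of cutpoint grouping follows. Each class is treated as "lower cutpoint to under upper cutpoint", so a value equal to a cutpoint goes into the higher class. The cutpoints and weights are hypothetical (they are not the 37 weights from the example), and values outside the cutpoint range are simply ignored.

```python
from bisect import bisect_right

def cutpoint_grouping(data, cutpoints):
    """Count values in classes [c0, c1), [c1, c2), ... defined by sorted cutpoints."""
    freqs = [0] * (len(cutpoints) - 1)
    for x in data:
        if cutpoints[0] <= x < cutpoints[-1]:
            # bisect_right places a value equal to a cutpoint in the higher class.
            freqs[bisect_right(cutpoints, x) - 1] += 1
    return freqs

# Hypothetical cutpoints and weights in pounds (illustration only).
cutpoints = [120, 140, 160, 180]
weights = [129.2, 140.0, 155.5, 159.9, 160.0, 171.3]
for lo, hi, f in zip(cutpoints, cutpoints[1:], cutpoint_grouping(weights, cutpoints)):
    width = hi - lo           # difference between the two cutpoints
    midpoint = (lo + hi) / 2  # average of the two cutpoints
    print(f"{lo}-under {hi}: frequency {f}, width {width}, midpoint {midpoint}")
```

Because every value between two cutpoints has a class, the 39.5-style gap that arises with class limits cannot occur here.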



Class Boundaries or Cut points

• If non-overlapping limits are considered, for example
118-127, then 128-137, then the first class theoretically
includes all measurements from 117.5 to 127.5.
• These exact numbers, 117.5 and 127.5, are called class
boundaries or true class limits.
• Note that in the case of overlapping class intervals, the class
boundaries are the same as the class limits.
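
The conversion from non-overlapping limits to boundaries can be sketched as follows. The function name is hypothetical, and the code assumes at least two classes with the same gap between every pair of consecutive classes.

```python
def class_boundaries(limits):
    """Convert non-overlapping (lower, upper) class limits into class boundaries."""
    # Half the gap between one class's upper limit and the next class's lower limit.
    half_gap = (limits[1][0] - limits[0][1]) / 2
    return [(lo - half_gap, hi + half_gap) for lo, hi in limits]

print(class_boundaries([(118, 127), (128, 137)]))
# [(117.5, 127.5), (127.5, 137.5)]
```

The upper boundary of one class coincides with the lower boundary of the next, which is what lets histogram bars touch.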



Which grouping method should we use?

• Single-value grouping: Use with discrete data in which there
are only a small number of distinct values.
• Limit grouping: Use when the data are expressed as whole
numbers and there are too many distinct values to employ
single-value grouping.
• Cutpoint grouping: Use when the data are continuous and
are expressed with decimals.



Graphical Representation



Histogram

• A histogram displays the classes of the quantitative data on a
horizontal axis and the frequencies (relative frequencies,
percents) of those classes on a vertical axis.
• The frequency (relative frequency, percent) of each class is
represented by a vertical bar whose height is equal to the
frequency (relative frequency, percent) of that class.
• The bars should be positioned so that they touch each other.



Value beneath the Histogram

• For single-value grouping, we use the distinct values of the
observations to label the bars, with each such value centered
under its bar.
• For limit grouping or cutpoint grouping, we use the lower
class limits (or, equivalently, lower class cutpoints) to label the
bars.
• Some statisticians and technologies use class marks or class
midpoints centered under the bars.



How to construct a Histogram

• Obtain a frequency (relative-frequency, percent) distribution of
the data.
• Draw a horizontal axis on which to place the bars and a
vertical axis on which to display the frequencies (relative
frequencies, percents).
• For each class, construct a vertical bar whose height equals the
frequency (relative frequency, percent) of that class.
• Label the bars with the classes, the horizontal axis with the
name of the variable, and the vertical axis with Frequency
(Relative frequency, Percent).
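
Before any drawing is done, the bar heights have to be computed. The sketch below covers only that step for the three kinds of histogram; the function name `bar_heights` and the sample frequencies are hypothetical, and plotting itself is left to whatever tool is at hand.

```python
def bar_heights(frequencies, kind="frequency"):
    """Heights of histogram bars for a frequency, relative-frequency, or percent histogram."""
    total = sum(frequencies)
    if kind == "frequency":
        return list(frequencies)
    if kind == "relative":
        return [f / total for f in frequencies]
    if kind == "percent":
        return [100 * f / total for f in frequencies]
    raise ValueError("kind must be 'frequency', 'relative', or 'percent'")

freqs = [2, 3, 5]  # hypothetical class frequencies
print(bar_heights(freqs, "relative"))
print(bar_heights(freqs, "percent"))
```

The three histograms have the same shape; only the vertical scale changes.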



Histogram: Example



Histogram: Single value grouping



Histogram: Limit grouping



Histogram: Cutpoint grouping



Some insights of Histogram

• Relative-frequency (or percent) histograms are better than
frequency histograms for comparing two data sets.
• The vertical scale of a frequency histogram depends on the
number of observations, making comparison more difficult.
• The same vertical scale is used for all relative frequency
histograms (a minimum of 0 and a maximum of 1) making
direct comparison easy.
• Histograms based on limit or cutpoint grouping can be
sensitive to the choice of classes.
• It is sometimes important to experiment with making different
choices for the classes in order to see whether and how such
choices affect the shape or other aspects of the histogram.



Relative Frequency Polygon

• In a relative frequency polygon, a point is plotted
  • above each class mark in limit grouping
  • above each class midpoint in cutpoint grouping
at a height equal to the relative frequency of the class.
• Then the points are connected with lines.
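
The points of a relative frequency polygon can be computed as below. This is a sketch with hypothetical names and data; each class is passed as a pair of limits (or cutpoints), and the x-coordinate is their average in both cases.

```python
def polygon_points(classes, frequencies):
    """(class mark or midpoint, relative frequency) for each class."""
    total = sum(frequencies)
    return [((lo + hi) / 2, f / total) for (lo, hi), f in zip(classes, frequencies)]

# Hypothetical limit-grouped classes and frequencies.
points = polygon_points([(30, 39), (40, 49)], [1, 3])
print(points)  # [(34.5, 0.25), (44.5, 0.75)]
```

Joining consecutive points with straight line segments gives the polygon.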



Cumulative Frequency Distribution

• A cumulative frequency distribution gives the total number
of values that fall below the upper boundary of each class.
• The cumulative relative frequencies are obtained by dividing
the cumulative frequencies by the total number of observations
in the data set.
• The cumulative percentages are obtained by multiplying the
cumulative relative frequencies by 100.
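
The three cumulative columns can be built from the class frequencies in one pass. The sketch below uses a hypothetical function name and made-up frequencies.

```python
from itertools import accumulate

def cumulative_table(frequencies):
    """Rows of (cumulative frequency, cumulative relative frequency, cumulative percentage)."""
    n = sum(frequencies)
    # 100 * c / n is the cumulative relative frequency c / n multiplied by 100.
    return [(c, c / n, 100 * c / n) for c in accumulate(frequencies)]

for row in cumulative_table([2, 3, 5]):  # hypothetical class frequencies
    print(row)
```

The last row always shows the total number of observations, a cumulative relative frequency of 1, and a cumulative percentage of 100.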


• Frequency distribution and cumulative frequency distribution
for the data on iPods sold.

Question: From the iPod data, find the (i) relative frequency, (ii)
percentage, (iii) cumulative frequency, (iv) cumulative relative
frequency, and (v) cumulative percentage.



Ogive plot

• Cumulative information can be portrayed using a graph called
an ogive.
• An ogive is usually used to determine the values of certain
quantities such as the median, quartiles, and percentiles.
• To construct an ogive, we first make a table that displays
cumulative frequencies and cumulative relative frequencies.
• A cumulative frequency is obtained by summing the frequencies
of all classes representing values less than a specified lower
class limit (or cutpoint).


• A cumulative relative frequency is found by dividing the
corresponding cumulative frequency by the total number of
observations.
• In an ogive, a point is plotted above each lower class limit (or
cutpoint) at a height equal to the cumulative relative frequency.
• Then the points are connected with lines.
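
The construction above, and the way a median or percentile is read off the graph, can be sketched as follows. The function names and the data are hypothetical. The point above the last cutpoint (height 1) is included so the curve is complete, and reading a percentile is done by linear interpolation along the connecting lines.

```python
from itertools import accumulate

def ogive_points(cutpoints, frequencies):
    """Points (cutpoint, cumulative relative frequency of all classes below it)."""
    n = sum(frequencies)
    cumulative = [0] + list(accumulate(frequencies))
    return [(c, total / n) for c, total in zip(cutpoints, cumulative)]

def read_percentile(points, p):
    """Value at cumulative relative frequency p (0 to 1), by linear interpolation."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= p <= y1 and y1 > y0:
            return x0 + (p - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("p must lie between 0 and 1")

# Hypothetical cutpoints and class frequencies.
points = ogive_points([120, 140, 160, 180], [1, 3, 4])
print(points)
print(read_percentile(points, 0.5))   # median read from the ogive
print(read_percentile(points, 0.75))  # 75th percentile read from the ogive
```

Values read this way are estimates from grouped data and may differ slightly from percentiles computed on the raw observations.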



Ogive example

Find the 25th, 50th, and 75th percentiles for apple and orange from
the following figure:



Dot plot

• In a dot plot, each observation is plotted as a dot at an
appropriate place above a horizontal axis.
• Observations having equal values are stacked vertically.
• A dot plot only has an x-axis. The y-axis is never drawn.
• Advantage of a dot plot: Moderate amounts of discrete
quantitative data can be quickly visualized.



How to construct a Dot plot

• Step 1: Draw a horizontal axis that displays the possible
values of the quantitative data.
• Step 2: Record each observation by placing a dot over the
appropriate value on the horizontal axis.
• Step 3: Label the horizontal axis with the name of the variable.
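
For a quick look without graphing software, the steps above can be imitated in plain text. The sketch below assumes whole-number data, takes every whole number between the minimum and maximum as the possible values on the axis, and leaves the axis label to the caller; the function name and the data are hypothetical.

```python
from collections import Counter

def dot_plot(observations):
    """Text dot plot for whole-number data: one dot per observation, stacked by value."""
    counts = Counter(observations)
    values = range(min(observations), max(observations) + 1)
    width = max(len(str(v)) for v in values)
    rows = []
    # Build the plot from the top stack level down to the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = [("." if counts[v] >= level else " ").center(width) for v in values]
        rows.append(" ".join(cells).rstrip())
    rows.append(" ".join(str(v).center(width) for v in values))
    return "\n".join(rows)

print(dot_plot([1, 2, 2, 4]))  # hypothetical observations
```

Equal observations stack vertically above their value, and no vertical axis is drawn, matching the description of a dot plot.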



Example
One of Professor Weiss's sons wanted to add a new DVD player to
his home theater system. He used the Internet to shop and went to
pricewatch.com. There he found 16 quotes on different brands and
styles of DVD players.



Dot plot of prices of DVD players



Stem and leaf diagram

• Statisticians continue to invent ways to display data.
• One method, developed in the 1960s by the late Professor John
Tukey of Princeton University, is called a stem and leaf
diagram.
• This ingenious diagram is often easier to construct than either
a frequency distribution or a histogram and generally displays
more information.
• In a stem and leaf diagram (or stem plot), each observation is
separated into two parts, namely, a stem consisting of all but
the rightmost digit and a leaf, the rightmost digit.



How to construct a stem and leaf Diagram

• Step 1: Think of each observation as a stem consisting of all
but the rightmost digit and a leaf, the rightmost digit.
• Step 2: Write the stems from smallest to largest in a vertical
column to the left of a vertical rule.
• Step 3: Write each leaf to the right of the vertical rule in the
row that contains the appropriate stem.
• Step 4: Arrange the leaves in each row in ascending order.
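
The four steps translate almost directly into code. The sketch below assumes non-negative whole numbers and uses one line per stem; listing stems that have no leaves is a design choice made here so that gaps in the data stay visible. The function name and the data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(observations):
    """Return 'stem | leaves' lines for non-negative whole numbers, one line per stem."""
    leaves = defaultdict(list)
    # Sorting first means the leaves in each row end up in ascending order (Step 4).
    for x in sorted(observations):
        stem, leaf = divmod(x, 10)  # all but the rightmost digit, and the rightmost digit
        leaves[stem].append(leaf)
    first, last = min(leaves), max(leaves)
    width = len(str(last))
    lines = []
    for stem in range(first, last + 1):
        row = " ".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem:>{width}} | {row}".rstrip())
    return lines

# Hypothetical observations (illustration only).
for line in stem_and_leaf([210, 213, 199, 225, 221]):
    print(line)
```

Because every leaf is an actual digit of an observation, the original values can be read back from the diagram, which a histogram does not allow.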



Stem and Leaf Diagram

Tukey (1977) first proposed the technique. It allows us to use the
information contained in a frequency distribution to show
• The range of scores
• The concentration of scores
• The shape of the distribution
• The presence of any specific values or scores not represented in the
entire data set
• Whether there are any stray or extreme values in the
distribution.



Exercise

Cholesterol levels for 20 high-level patients are given below:

Construct a stem-and-leaf diagram for these data by using
• one line per stem
• two lines per stem.

Thank You
