Week3 Frequency Analysis

STATISTICAL ANALYSIS FOR EXPERIMENTAL DESIGN AND DATA PROCESSING
Prof. Mustafa Sait YAZGAN
Assoc. Prof. Alpaslan EKDAL
Frequency Analysis and Parameter Estimation
• A sample is a set of observations collected to determine the statistical
properties of a random variable.
• Each element in the sample is an event belonging to the random variable or is a
value that the random variable has taken.
Characteristics of a Qualitatively Adequate Sample
• Data in the sample should be homogeneous (Dam)
• There should be no systematic errors in the measurement
of the elements of the sample
• Random errors should be minimized
Characteristics of a Quantitatively Adequate Sample
• The number of elements in the sample should be sufficiently large
• Samples with fewer than 20-30 elements are called small samples
Uncertainty
• Estimates of the properties of the population
(probability distribution function, parameters) obtained by statistical
analysis of the sample are not equal to the true values of the
population, for two reasons:
- Qualitative inadequacy of the sample (random errors,
unnoticed non-homogeneity and systematic errors)
- Limited number of elements in the sample (sampling
errors)
Frequency Analysis
It is not possible to observe the entire population of a random
variable, so the probability distribution is assumed to be
equivalent to the frequency distribution obtained from the
analysis of the sample.
Frequency Analysis - Definitions
Raw Data
Raw data are collected data that have not been organized
numerically. An example is the set of weights of 100 male students
obtained from an alphabetical listing of university records.
Array
An array is an arrangement of raw numerical data in ascending or
descending order of magnitude.
Range
The difference between the largest and smallest numbers is called
the range of the data. For example, if the largest weight of 100 male
students is 74 kg and the smallest weight is 60 kg, the range is 14 kg.
Frequency Distributions
When summarizing large masses of raw data, it is often useful to
distribute the data into classes, or categories, and to determine the
number of individuals belonging to each class, which is called the
class frequency.
A tabular arrangement of data by classes together with the
corresponding class frequencies is called a frequency distribution,
or frequency table.
Table 2.1 lists the weights of 100 male students in the MAT271E course in
kilograms (rounded to the nearest kilogram), and Table 2.2 is the
frequency distribution of the weights.
Frequency Distributions
Table 2.1 Weights of 100 Male Students at MAT271E Course
66 68 63 68 68 67 70 69 69 63
67 67 67 67 68 67 70 71 69 64
66 68 64 67 68 66 67 70 69 64
66 68 65 62 64 65 70 68 66 61
68 70 70 67 69 64 64 65 69 65
71 68 68 67 66 68 63 71 71 65
68 66 67 67 69 66 66 68 65 70
70 70 69 67 68 69 71 62 64 71
68 66 67 67 69 66 70 70 63 60
61 73 72 64 74 74 73 73 72 74
The first class (or category), for example, consists of
weights from 60 to 62 kg and is indicated by the
range symbol 60–62. Since five students have
weights belonging to this class, the corresponding
class frequency is 5.
Data organized and summarized as in the given
frequency distribution are often called grouped
data. Although the grouping process generally
destroys much of the original detail of the data, it
gives a clear "overall" picture of the data collected.
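As an illustration, the tally behind such a frequency distribution can be sketched in Python. The weights are copied from Table 2.1. The class limits 60–62, 63–65, ..., 72–74 are assumed to continue the 3 kg pattern of the first class named in the text, and the variable names are illustrative only.

```python
# Weights (kg) of the 100 students, copied from Table 2.1.
weights = [
    66, 68, 63, 68, 68, 67, 70, 69, 69, 63,
    67, 67, 67, 67, 68, 67, 70, 71, 69, 64,
    66, 68, 64, 67, 68, 66, 67, 70, 69, 64,
    66, 68, 65, 62, 64, 65, 70, 68, 66, 61,
    68, 70, 70, 67, 69, 64, 64, 65, 69, 65,
    71, 68, 68, 67, 66, 68, 63, 71, 71, 65,
    68, 66, 67, 67, 69, 66, 66, 68, 65, 70,
    70, 70, 69, 67, 68, 69, 71, 62, 64, 71,
    68, 66, 67, 67, 69, 66, 70, 70, 63, 60,
    61, 73, 72, 64, 74, 74, 73, 73, 72, 74,
]

# Array: the raw data arranged in ascending order.
array = sorted(weights)

# Range: largest value minus smallest value.
data_range = array[-1] - array[0]

# Class limits (lower, upper): 60-62, 63-65, 66-68, 69-71, 72-74 (assumed).
class_limits = [(lower, lower + 2) for lower in range(60, 75, 3)]

# Class frequency: number of weights that fall inside each class.
frequencies = [
    sum(lower <= w <= upper for w in weights) for lower, upper in class_limits
]

for (lower, upper), f in zip(class_limits, frequencies):
    print(f"{lower}-{upper} kg: {f}")
print("range:", data_range, "kg")
```

The first class comes out with frequency 5 and the range with 14 kg, in agreement with the values quoted in the text.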
Histograms and Frequency Polygons
Class Intervals and Class Limits
A symbol defining a class, such as 60–62 in Table 2.1, is called a class
interval. The end numbers, 60 and 62, are called class limits; the smaller
number (60) is the lower class limit, and the larger number (62) is the
upper class limit.
The terms class and class interval are often used interchangeably, although the
class interval is actually a symbol for the class.
Class Mark
The class mark is the midpoint of the class interval, calculated by dividing the
sum of the lower and upper class limits by 2.
Class Boundaries
When weights are recorded to the nearest kg, the class interval 60–62
theoretically includes all weight measurements from 59.5 kg to 62.5 kg. These
numbers, 59.5 and 62.5, are called class boundaries, or true
class limits. The smaller number (59.5) is the lower class boundary, and
the larger number (62.5) is the upper class boundary.
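The class mark and class boundaries of the class 60–62 can be computed with a short sketch. The function names and the `unit` parameter (the precision to which the data are recorded, assumed here to be 1 kg) are illustrative and not part of the course material.

```python
def class_mark(lower_limit, upper_limit):
    """Midpoint of the class interval."""
    return (lower_limit + upper_limit) / 2


def class_boundaries(lower_limit, upper_limit, unit=1.0):
    """True class limits for data recorded to the nearest `unit`."""
    half = unit / 2
    return lower_limit - half, upper_limit + half


mark = class_mark(60, 62)                         # 61.0
lower_boundary, upper_boundary = class_boundaries(60, 62)  # 59.5, 62.5
class_width = upper_boundary - lower_boundary     # 3.0

print(mark, lower_boundary, upper_boundary, class_width)
```

Note that the class width (3 kg) is the difference between the boundaries, not between the class limits.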
A histogram is a graphic representation of a frequency distribution.
1. A histogram consists of a set of rectangles having
(a) bases on a horizontal axis (the X axis), with centers at the class
marks and lengths equal to the class interval sizes, and
(b) areas proportional to the class frequencies.
If the class intervals all have equal size, the heights of the rectangles
are proportional to the class frequencies, and it is then customary to
take the heights numerically equal to the class frequencies.
2. A frequency polygon is a line graph of the class frequency plotted
against the class mark. It can be obtained by connecting the midpoints
of the tops of the rectangles in the histogram.
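The geometry described above can be sketched as follows. The class boundaries assume 3 kg classes from 59.5 kg to 74.5 kg; the frequencies 5, 18 and 42 are quoted in the text, while 27 and 8 are tallied from the raw weights in Table 2.1. All names are illustrative.

```python
# Class boundaries (kg) and class frequencies for the student weights.
boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
frequencies = [5, 18, 42, 27, 8]

# One rectangle per class, stored as (centre, width, height).
rectangles = []
for left, right, f in zip(boundaries[:-1], boundaries[1:], frequencies):
    width = right - left
    centre = (left + right) / 2          # the class mark
    rectangles.append((centre, width, f))  # equal widths: height = frequency

# Frequency polygon: class frequency plotted against class mark,
# i.e. the midpoints of the tops of the rectangles.
polygon = [(centre, height) for centre, _, height in rectangles]

# With equal widths, each area is the same multiple of its class frequency.
areas = [width * height for _, width, height in rectangles]

print(polygon)
print(areas)
```

If the class widths were unequal, the heights would instead have to be taken as frequency divided by width so that the areas stay proportional to the frequencies.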
The histogram and frequency polygon corresponding to the frequency
distribution of weights in Table 2.1 are shown on the same set of axes in
the Figure.
Relative Frequency Histogram
If the frequencies in Table 2.1 are replaced with the corresponding relative
frequencies, the resulting table is called a relative–frequency distribution,
percentage distribution, or relative–frequency table.
Graphic representation of relative–frequency distributions can be
obtained from the histogram or frequency polygon simply by changing the
vertical scale from frequency to relative frequency, keeping exactly the
same diagram. The resulting graphs are called relative–frequency
histograms (or percentage histograms) and relative–frequency polygons
(or percentage polygons), respectively.
Relative Frequency Histogram
The relative frequency of a class is the frequency of the class divided by
the total frequency of all classes and is generally expressed as a percentage.
For example, the relative frequency of the class 66–68 in Table 2.1 is
42/100 = 42%.
The sum of the relative frequencies of all classes is clearly 1, or 100%.
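A minimal sketch of this calculation for the student weights is given below. The frequencies 5, 18 and 42 are quoted in the text; 27 and 8 are tallied from the raw data in Table 2.1. The dictionary layout is an illustrative choice.

```python
# Class frequencies of the student weights (kg).
frequencies = {"60-62": 5, "63-65": 18, "66-68": 42, "69-71": 27, "72-74": 8}

total = sum(frequencies.values())

# Relative frequency = class frequency / total frequency.
relative = {cls: f / total for cls, f in frequencies.items()}

for cls, r in relative.items():
    print(f"{cls}: {r:.2f} ({100 * r:.0f}%)")

print("sum of relative frequencies:", round(sum(relative.values()), 10))
```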
Cumulative–Frequency Distributions and Ogives
The total frequency of all values less than the upper class boundary of a
given class interval is called the cumulative frequency up to and including
that class interval. For example, the cumulative frequency up to and
including the class interval 66–68 in Table 2.1 is 5 + 18 + 42 = 65, signifying
that 65 students have weights less than 68.5 kg.
A table presenting such cumulative frequencies is called a cumulative–frequency
distribution, cumulative–frequency table, or briefly a
cumulative distribution, and is given in Table 2.2 for the student weight
distribution of Table 2.1.
A graph showing the cumulative frequency less than any upper class
boundary plotted against the upper class boundary is called a
cumulative–frequency polygon, or ogive, and is shown in Fig. 2–2
for the student weight distribution of Table 2.1.
For some purposes, it is desirable to consider a cumulative–frequency
distribution of all values greater than or equal to the
lower class boundary of each class interval. Because in this case we
consider weights of 59.5 kg or more, 62.5 kg or more, etc., this is
sometimes called an "or more" cumulative distribution, while the
one considered above is a "less than" cumulative distribution. One
is easily obtained from the other. The corresponding ogives are then
called "or more" and "less than" ogives. Whenever we refer to
cumulative distributions or ogives without qualification, the "less
than" type is implied.
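Both types of cumulative distribution can be obtained from the class frequencies, as in the sketch below. It assumes the 3 kg classes with boundaries 59.5, 62.5, ..., 74.5 kg; the frequencies 27 and 8 for the last two classes are tallied from the raw weights rather than quoted in the text.

```python
from itertools import accumulate

boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]  # class boundaries (kg)
frequencies = [5, 18, 42, 27, 8]                   # class frequencies
total = sum(frequencies)

# "Less than" distribution: number of values below each boundary.
less_than = [0] + list(accumulate(frequencies))

# "Or more" distribution: number of values at or above each boundary.
or_more = [total - c for c in less_than]

for b, lt, om in zip(boundaries, less_than, or_more):
    print(f"less than {b} kg: {lt:3d}    {b} kg or more: {om:3d}")
```

The two columns always add up to the total frequency, which is why one distribution is easily obtained from the other. Plotting `less_than` against `boundaries` gives the points of the "less than" ogive.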
Frequency Analysis of Continuous Variables
General Rules for Forming Frequency Distributions of
Continuous Data
1. Determine the largest and smallest numbers in the raw data and
find the range (the difference between the largest and smallest
numbers).
2. Divide the range into a convenient number of class intervals
having the same size. A convenient number of classes can be
found with the formula
M = 1 + 3.3 log N
where M is the number of classes, N is the number of data values, and
log is the base-10 logarithm. The value of M is rounded to an integer.
3. Divide the range by M to obtain the class interval size. The
interval size is usually rounded to a simpler number (e.g., 92 is
rounded to 100, 22 to 25).
4. Start the lower class limit of the first interval at the lowest
number in the data set or slightly below it (e.g., if the lowest
number is 23, start from 20).
5. Form each class by successively adding the interval size obtained
in step 3. Make sure that the upper class limit of the last interval
includes the largest number in the data set.
The number of class intervals is usually taken between 5 and 20,
depending on the data.
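The rules above can be sketched in Python as follows. The data summary used here (100 values between 23 and 759) is hypothetical and chosen only so that the numbers match the rounding examples in the rules; the function names are illustrative. Rounding the interval size to a "simpler" number is a judgment call, so it is entered by hand.

```python
import math


def number_of_classes(n):
    """Rule 2: M = 1 + 3.3 log10(N), rounded to an integer."""
    return round(1 + 3.3 * math.log10(n))


def class_edges(start, width, largest):
    """Rules 4-5: add `width` repeatedly until `largest` is covered."""
    edges = [start]
    while edges[-1] <= largest:
        edges.append(edges[-1] + width)
    return edges


# Hypothetical data summary (not from the course material).
n, smallest, largest = 100, 23, 759

data_range = largest - smallest      # rule 1: 736
m = number_of_classes(n)             # rule 2: 8
raw_width = data_range / m           # rule 3: 92.0
width = 100                          # 92 rounded to a simpler number
start = 20                           # rule 4: a bit below the lowest value, 23
edges = class_edges(start, width, largest)

print(m, raw_width, edges)
```

Each class is taken here as the half-open interval from one edge up to, but not including, the next, which is one common convention for continuous data.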
Discrete Data
Since discrete data sets include only integer numbers, it is generally
not necessary to group the values in order to form intervals.
Instead, each distinct value may be used directly as a class.
If the discrete data set is more widely dispersed, the values may be grouped
as explained in the rules for continuous data.
• Two dice are thrown 65 times, and the number of observations of
each sum is given in Table 2.3.
• The sum of two dice is a discrete variable, and its histogram
can generally be drawn directly without grouping.
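A tally of discrete values without grouping can be sketched as below. The observed counts of Table 2.3 are not reproduced here, so the sketch instead tallies the 36 possible outcomes of two six-sided dice; it illustrates the technique only and is not the observed data.

```python
from collections import Counter
from itertools import product

# All 36 possible outcomes of two six-sided dice
# (an assumption for illustration, not the 65 observed throws).
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

# Each distinct sum (2..12) is its own class; no grouping is needed.
frequency = Counter(sums)

# Text histogram: one '#' per occurrence.
for value in sorted(frequency):
    print(f"{value:2d} {'#' * frequency[value]}")
```

Replacing `sums` with a list of observed sums would produce the frequency table and histogram for real throws.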
Types of Frequency Curves
Frequency curves arising in practice take on certain characteristic
shapes, as shown in the figure.
1. The symmetrical, or bell–shaped, frequency curves are
characterized by the fact that observations equidistant from the
central maximum have the same frequency. An important
example is the normal curve.
2. In the moderately asymmetrical, or skewed, frequency curves
the tail of the curve to one side of the central maximum is
longer than that to the other. If the longer tail occurs to the
right, the curve is said to be skewed to the right or to have
positive skewness, while if the opposite is true, the curve is said
to be skewed to the left or to have negative skewness.
3. In a J–shaped or reverse J–shaped curve a maximum occurs at
one end.
4. A U–shaped frequency curve has maxima at both ends.
5. A bimodal frequency curve has two maxima.
6. A multimodal frequency curve has more than two maxima.
• The annual rainfall of a city, in mm, is given in the following table.
700 315 450 615 625 625
420 645 635 895 500 565
650 585 665 555 410 715
365 455 535 545 575 645
550 735 615 675 835 595
The histogram of the data is shown in the figure.
The frequency histogram of the data is shown in the figure.
The cumulative frequency distribution of the data is shown in the
figure. It is seen that 50% of the annual rainfall values are below 600 mm.
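The frequency and cumulative frequency distributions of the rainfall data can be reproduced with the sketch below. The classes (100 mm wide, starting at 300 mm) are an assumption that follows the general rules for N = 30 values; the classes used in the original figures may differ.

```python
# Annual rainfall (mm), copied from the table above.
rainfall = [
    700, 315, 450, 615, 625, 625,
    420, 645, 635, 895, 500, 565,
    650, 585, 665, 555, 410, 715,
    365, 455, 535, 545, 575, 645,
    550, 735, 615, 675, 835, 595,
]

# Assumed class edges: 300, 400, ..., 900 mm (lower edge included).
edges = list(range(300, 1000, 100))

counts = [
    sum(lower <= x < upper for x in rainfall)
    for lower, upper in zip(edges[:-1], edges[1:])
]

# Cumulative percentage of values below each upper edge.
n = len(rainfall)
running = 0
percent_below = {}
for upper, c in zip(edges[1:], counts):
    running += c
    percent_below[upper] = 100 * running / n

print(counts)
print(percent_below[600])
```

With these classes, 15 of the 30 values fall below 600 mm, which matches the 50% read from the cumulative frequency distribution.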
The appearance of the frequency histogram is affected by the
number of class intervals. The use of too few classes causes too
much loss of information, whereas too many class intervals may
lead to irregular histograms, with very few observations (or maybe
none) in some intervals. Thus, selection of the number of class
intervals is important.
Example 1
Frequency distribution of the monthly salaries of 65 employees
working for P&R company is given in the table.
Class     Salary (TL)        Number of Employees
Class 1   2500 - 2599.99      8
Class 2   2600 - 2699.99     10
Class 3   2700 - 2799.99     16
Class 4   2800 - 2899.99     14
Class 5   2900 - 2999.99     10
Class 6   3000 - 3099.99      5
Class 7   3100 - 3199.99      2
Total                        65
Example 1
Answer the following questions using the data given in the table:
a) What is the lower limit of class 6?
b) What is the upper limit of class 4?
c) What is the class mark of class 3?
d) What are the boundaries of class 5?
e) What is the width of class 5?
f) What is the frequency of class 3?
g) What is the relative frequency of class 3?
h) What is the class interval that has the highest frequency?
i) What is the percentage of employees earning less than 2800 TL?
j) What is the percentage of employees earning less than 3000 TL but
at least 2600 TL?
Example 1
a) 3000 TL
b) 2899.99 TL
c) ½(2700 + 2799.99) = 2749.995 TL (rounded to 2750 TL for
practical purposes)
d) Lower class boundary = ½(2899.99 + 2900) = 2899.995 TL
Upper class boundary = ½(2999.99 + 3000) = 2999.995 TL
e) 2999.995 - 2899.995 = 100 TL
f) 16
g) 16/65 = 0.246 = 24.6%
h) 2700 – 2799.99 TL
i) Number of employees earning less than 2800 TL = 8 + 10 + 16 = 34
Percentage of employees earning less than 2800 TL = 34/65 = 52.3%
j) Number of employees earning less than 3000 TL but at least
2600 TL = 10 + 16 + 14 + 10 = 50
Percentage of employees earning less than 3000 TL but at least
2600 TL = 50/65 = 76.9%
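The answers to Example 1 can be checked with a short sketch. The class limits and frequencies are copied from the table of Example 1; the variable names are illustrative.

```python
# Class limits (TL) and frequencies from the table in Example 1.
limits = [
    (2500, 2599.99), (2600, 2699.99), (2700, 2799.99), (2800, 2899.99),
    (2900, 2999.99), (3000, 3099.99), (3100, 3199.99),
]
frequencies = [8, 10, 16, 14, 10, 5, 2]
total = sum(frequencies)

# c) Class mark of class 3: midpoint of its limits.
mark_3 = (limits[2][0] + limits[2][1]) / 2

# d) Boundaries of class 5: midway between adjacent class limits.
lower_b5 = (limits[3][1] + limits[4][0]) / 2
upper_b5 = (limits[4][1] + limits[5][0]) / 2

# e) Width of class 5: difference between its boundaries.
width_5 = upper_b5 - lower_b5

# g) Relative frequency of class 3.
rel_3 = frequencies[2] / total

# h) Class with the highest frequency.
modal_class = limits[frequencies.index(max(frequencies))]

# i) Percent earning less than 2800 TL (classes 1-3).
pct_below_2800 = 100 * sum(frequencies[:3]) / total

# j) Percent earning at least 2600 TL but less than 3000 TL (classes 2-5).
pct_2600_to_3000 = 100 * sum(frequencies[1:5]) / total

print(f"c) {mark_3:.3f}  d) {lower_b5:.3f}, {upper_b5:.3f}  e) {width_5:.0f}")
print(f"g) {rel_3:.3f}  h) {modal_class}")
print(f"i) {pct_below_2800:.1f}%  j) {pct_2600_to_3000:.1f}%")
```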
Example 2
By using the data given in the table in Example 1:
a) Calculate the cumulative frequency distribution
b) Calculate the percent cumulative distribution
c) Draw the cumulative frequency diagram
d) Draw the percent cumulative frequency diagram
Example 2
a) Calculate the cumulative frequency distribution
Salary (TL)   Cumulative Frequency
< 2500         0
< 2600         8
< 2700        18
< 2800        34
< 2900        48
< 3000        58
< 3100        63
< 3200        65
Example 2
b) Calculate the percent cumulative distribution
Salary (TL)   Percent Cumulative Frequency
< 2500          0.0
< 2600         12.3
< 2700         27.7
< 2800         52.3
< 2900         73.8
< 3000         89.2
< 3100         96.9
< 3200        100.0
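The two tables of Example 2 can be generated from the class frequencies of Example 1, as in this sketch. The names are illustrative, and the percentages are rounded to one decimal place as in the table.

```python
from itertools import accumulate

# Upper bounds of the "less than" rows and class frequencies (Example 1).
upper_bounds = [2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200]
frequencies = [8, 10, 16, 14, 10, 5, 2]

# a) "Less than" cumulative frequencies; nothing lies below 2500 TL.
cumulative = [0] + list(accumulate(frequencies))

# b) Percent cumulative frequencies.
total = cumulative[-1]
percent = [round(100 * c / total, 1) for c in cumulative]

for bound, c, p in zip(upper_bounds, cumulative, percent):
    print(f"< {bound}  {c:3d}  {p:5.1f}")
```

The pairs `(upper_bounds[i], cumulative[i])` and `(upper_bounds[i], percent[i])` are the points to connect when drawing the diagrams asked for in parts c) and d).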
Example 2
c) Cumulative frequency diagram (figure)
d) Percent cumulative frequency diagram (figure)
Example 3
By using the data given in the table in Example 1, calculate the
"or more" cumulative frequency distribution and draw its
graph.
Salary (TL)     Cumulative Frequency
2500 or more    65
2600 or more    57
2700 or more    47
2800 or more    31
2900 or more    17
3000 or more     7
3100 or more     2
3200 or more     0
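The cumulative frequencies of Example 3 can be obtained by subtracting each class frequency in turn from the total, as sketched below with the frequencies of Example 1. The names are illustrative.

```python
# Lower bounds of the rows and class frequencies (Example 1).
lower_bounds = [2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200]
frequencies = [8, 10, 16, 14, 10, 5, 2]

# Start from the total and remove one class at a time.
remaining = sum(frequencies)
or_more = []
for f in frequencies:
    or_more.append(remaining)
    remaining -= f
or_more.append(remaining)  # nobody earns 3200 TL or more

for bound, c in zip(lower_bounds, or_more):
    print(f"{bound} or more: {c}")
```

Plotting `or_more` against `lower_bounds` gives a curve that falls from 65 to 0, the mirror image of the rising "less than" diagram of Example 2.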