Week3 Frequency Analysis


STATISTICAL ANALYSIS FOR EXPERIMENTAL DESIGN AND DATA PROCESSING

Prof. Mustafa Sait YAZGAN


Assoc. Prof. Alpaslan EKDAL

Frequency Analysis and Parameter Estimation

• A sample is a set of observations collected to determine the statistical
properties of a random variable.
• Each element in the sample is an event belonging to the random variable, or
a value that the random variable has taken.

Characteristics of a Qualitatively Adequate Sample

• Data in the sample should be homogeneous (Dam)

• There should be no systematic errors in the measurement of the elements
of the sample

• Random errors should be minimized

Characteristics of a Quantitatively Adequate Sample

• The number of elements in the sample should be sufficiently large

• Samples with fewer than 20-30 elements are called small samples

Uncertainty
• Estimates of the properties of the population (probability distribution
function, parameters) obtained by statistical analysis of the sample are
not equal to the true values of the population, because of:

- Qualitative inadequacy of the sample (random errors, unnoticed
non-homogeneity and systematic errors)
- Limited number of elements in the sample (sampling errors)

Frequency Analysis
It is not possible to observe the entire population of a random
variable, so the probability distribution is assumed to be
equivalent to the frequency distribution obtained by analysis
of the sample.

Frequency Analysis - Definitions
Raw Data
Raw data are collected data that have not been organized
numerically. An example is the set of weights of 100 male students
obtained from an alphabetical listing of university records.

Array
An array is an arrangement of raw numerical data in ascending or
descending order of magnitude.

Range
The difference between the largest and smallest numbers is called
the range of the data. For example, if the largest weight of 100 male
students is 74 kg and the smallest weight is 60 kg, the range is 14 kg.

Frequency Distributions
When summarizing large masses of raw data, it is often useful to
distribute the data into classes, or categories, and to determine the
number of individuals belonging to each class, called the class
frequency.

A tabular arrangement of data by classes together with the
corresponding class frequencies is called a frequency distribution,
or frequency table.

Table 2.1 lists the weights of 100 male students in the MAT271E course in
kilograms (rounded to the nearest kilogram), and Table 2.2 is the
frequency distribution of the weights.

Table 2.1 Weights of 100 Male Students at MAT271E Course
66 68 63 68 68 67 70 69 69 63
67 67 67 67 68 67 70 71 69 64
66 68 64 67 68 66 67 70 69 64
66 68 65 62 64 65 70 68 66 61
68 70 70 67 69 64 64 65 69 65
71 68 68 67 66 68 63 71 71 65
68 66 67 67 69 66 66 68 65 70
70 70 69 67 68 69 71 62 64 71
68 66 67 67 69 66 70 70 63 60
61 73 72 64 74 74 73 73 72 74

The first class (or category), for example, consists of weights from 60 to
62 kg and is indicated by the range symbol 60–62. Since five students have
weights belonging to this class, the corresponding class frequency is 5.
Data organized and summarized as in the given frequency distribution are
often called grouped data. Although the grouping process generally
destroys much of the original detail of the data, it gives a clear
"overall" picture of the data collected.
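The tally described above can be sketched in a few lines of Python. This is only an illustration: the class width of 3 kg and the starting limit of 60 kg are assumptions inferred from the 60–62 class mentioned in the text, and the names `weights`, `class_label` and `frequencies` are hypothetical.

```python
from collections import Counter

# Weights (kg) of the 100 students, copied row by row from Table 2.1.
weights = [
    66, 68, 63, 68, 68, 67, 70, 69, 69, 63,
    67, 67, 67, 67, 68, 67, 70, 71, 69, 64,
    66, 68, 64, 67, 68, 66, 67, 70, 69, 64,
    66, 68, 65, 62, 64, 65, 70, 68, 66, 61,
    68, 70, 70, 67, 69, 64, 64, 65, 69, 65,
    71, 68, 68, 67, 66, 68, 63, 71, 71, 65,
    68, 66, 67, 67, 69, 66, 66, 68, 65, 70,
    70, 70, 69, 67, 68, 69, 71, 62, 64, 71,
    68, 66, 67, 67, 69, 66, 70, 70, 63, 60,
    61, 73, 72, 64, 74, 74, 73, 73, 72, 74,
]

# Range = largest value minus smallest value.
data_range = max(weights) - min(weights)

# Assumed grouping: classes of width 3 kg starting at 60 kg (60-62, 63-65, ...).
CLASS_WIDTH = 3
FIRST_LOWER_LIMIT = 60


def class_label(weight):
    """Return the class interval symbol (e.g. '60-62') that contains weight."""
    offset = (weight - FIRST_LOWER_LIMIT) // CLASS_WIDTH
    lower = FIRST_LOWER_LIMIT + offset * CLASS_WIDTH
    return f"{lower}-{lower + CLASS_WIDTH - 1}"


# Class frequency = number of observations falling in each class.
frequencies = Counter(class_label(w) for w in weights)

print("range:", data_range, "kg")
for label in sorted(frequencies):
    print(label, frequencies[label])
```

With this grouping the first class, 60–62, receives five observations, in agreement with the class frequency quoted above.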
Histograms and Frequency Polygons
Class Intervals and Class Limits

A symbol defining a class, such as 60–62 in Table 2.1, is called a class
interval. The end numbers, 60 and 62, are called class limits; the smaller
number (60) is the lower class limit, and the larger number (62) is the
upper class limit.

The terms class and class interval are often used interchangeably, although
the class interval is actually a symbol for the class.

Class Mark

The class mark is the midpoint of the class interval and is calculated by
dividing the sum of the lower and upper class limits by 2.

Class Boundaries

When weights are recorded to the nearest kilogram, the class interval 60–62
theoretically includes all weight measurements from 59.5 kg to 62.5 kg.
These exact numbers, 59.5 and 62.5, are called class boundaries, or true
class limits. The smaller number (59.5) is the lower class boundary, and
the larger number (62.5) is the upper class boundary.
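As a small illustration of class marks and class boundaries, the sketch below applies the two definitions to weight classes of width 3 kg. Only the class 60–62 is quoted in the text here; the remaining classes are an assumption that continues the same pattern, and all variable names are hypothetical.

```python
# Class limits (kg); classes after 60-62 are assumed to follow the same pattern.
class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]

# Weights are recorded to the nearest kilogram, so each class boundary lies
# half a unit beyond the corresponding class limit.
HALF_UNIT = 0.5

# Class mark = (lower class limit + upper class limit) / 2.
class_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Class boundaries = limits widened by half a measurement unit on each side.
class_boundaries = [
    (lower - HALF_UNIT, upper + HALF_UNIT) for lower, upper in class_limits
]

# Class width = upper class boundary - lower class boundary.
class_widths = [upper_b - lower_b for lower_b, upper_b in class_boundaries]

for limits, mark, bounds in zip(class_limits, class_marks, class_boundaries):
    print(limits, "mark:", mark, "boundaries:", bounds)
```

Note that the upper boundary of one class coincides with the lower boundary of the next, so the boundaries leave no gaps between classes.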

A histogram is the graphic representation of a frequency distribution.

1. A histogram consists of a set of rectangles having
(a) bases on the horizontal axis (the X axis), with centers at the class
marks and lengths equal to the class interval sizes, and
(b) areas proportional to the class frequencies.

If the class intervals all have equal size, the heights of the rectangles
are proportional to the class frequencies, and it is then customary to
take the heights numerically equal to the class frequencies.

2. A frequency polygon is a line graph of the class frequency plotted
against the class mark. It can be obtained by connecting the midpoints
of the tops of the rectangles in the histogram.
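The geometry of both graphs can be sketched without any plotting library. In the example below, the frequencies 5, 18 and 42 are quoted elsewhere in the slides, while 27 and 8 come from tallying the raw weights in Table 2.1 with 3 kg classes, which is an assumption about how the missing frequency table was built. Closing the polygon with zero-frequency class marks at both ends is a common convention, not something the slides state.

```python
# Class marks (kg) and class frequencies for the weight data (see lead-in).
class_marks = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]
CLASS_WIDTH = 3  # equal class sizes, so heights can equal the frequencies

# Histogram: one rectangle per class, centred on the class mark.
# Each entry is (left edge, right edge, height).
rectangles = [
    (mark - CLASS_WIDTH / 2, mark + CLASS_WIDTH / 2, freq)
    for mark, freq in zip(class_marks, frequencies)
]

# Frequency polygon: (class mark, frequency) vertices, closed at both ends
# with the neighbouring class marks at zero frequency (assumed convention).
polygon = (
    [(class_marks[0] - CLASS_WIDTH, 0)]
    + list(zip(class_marks, frequencies))
    + [(class_marks[-1] + CLASS_WIDTH, 0)]
)

# Total area of the rectangles = class width x total frequency.
total_area = sum((right - left) * height for left, right, height in rectangles)

# Rough text rendering of the histogram, one row per class.
for mark, freq in zip(class_marks, frequencies):
    print(f"{mark:>3} | {'#' * freq}")
```

The list `polygon` holds the points that would be joined by straight lines; `rectangles` holds what a plotting routine would draw as bars.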

The histogram and frequency polygon corresponding to the frequency
distribution of weights in Table 2.1 are shown on the same set of axes in
the Figure.

Relative Frequency Histogram
If the frequencies in Table 2.1 are replaced with the corresponding relative
frequencies, the resulting table is called a relative–frequency distribution,
percentage distribution, or relative–frequency table.

Graphic representation of relative–frequency distributions can be
obtained from the histogram or frequency polygon simply by changing the
vertical scale from frequency to relative frequency, keeping exactly the
same diagram. The resulting graphs are called relative–frequency
histograms (or percentage histograms) and relative–frequency polygons
(or percentage polygons), respectively.

The relative frequency of a class is the frequency of the class divided by
the total frequency of all classes and is generally expressed as a percentage.

For example, the relative frequency of the class 66–68 in Table 2.1 is
42/100 = 42%.

The sum of the relative frequencies of all classes is clearly 1, or 100%.
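A minimal sketch of this calculation follows. The class frequencies 27 and 8 for the last two classes are not quoted on the slides; they are taken from a tally of the raw weights with 3 kg classes and should be read as an assumption.

```python
import math

# Class intervals and frequencies for the weight data (last two assumed).
class_labels = ["60-62", "63-65", "66-68", "69-71", "72-74"]
frequencies = [5, 18, 42, 27, 8]

total = sum(frequencies)

# Relative frequency = class frequency / total frequency of all classes.
relative = [freq / total for freq in frequencies]

for label, freq, rel in zip(class_labels, frequencies, relative):
    print(f"{label}: {freq:>3}  {rel:.2f}  {100 * rel:.0f}%")

# The relative frequencies of all classes add up to 1 (100%).
print("sum:", round(sum(relative), 10))
```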

Cumulative–Frequency Distributions and Ogives

The total frequency of all values less than the upper class boundary of a
given class interval is called the cumulative frequency up to and including
that class interval. For example, the cumulative frequency up to and
including the class interval 66–68 in Table 2.1 is 5 + 18 + 42 = 65, signifying
that 65 students have weights less than 68.5 kg.

A table presenting such cumulative frequencies is called a cumulative–
frequency distribution, cumulative–frequency table, or briefly a
cumulative distribution, and is given in Table 2.2 for the student weight
distribution of Table 2.1.

A graph showing the cumulative frequency less than any upper class
boundary plotted against the upper class boundary is called a
cumulative–frequency polygon, or ogive, and is shown in Fig. 2–2
for the student weight distribution of Table 2.1.
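The running totals and the points of the ogive can be sketched as follows. The frequencies 5, 18 and 42 appear in the text; 27 and 8 are assumed from a tally of the raw weights, and the variable names are hypothetical.

```python
from itertools import accumulate

# Class boundaries (kg) and class frequencies (last two frequencies assumed).
class_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
frequencies = [5, 18, 42, 27, 8]

# Cumulative frequency "less than" each boundary; nothing lies below 59.5 kg.
cumulative = [0] + list(accumulate(frequencies))

# Ogive: cumulative frequency plotted against the upper class boundary.
ogive_points = list(zip(class_boundaries, cumulative))

for boundary, count in ogive_points:
    print(f"less than {boundary} kg: {count}")
```

The entry for 68.5 kg reproduces the worked value 5 + 18 + 42 = 65 given above.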


For some purposes, it is desirable to consider a cumulative–
frequency distribution of all values greater than or equal to the
lower class boundary of each class interval. Because in this case we
consider weights of 59.5 kg or more, 62.5 kg or more, etc., this is
sometimes called an "or more" cumulative distribution, while the
one considered above is a "less than" cumulative distribution. One
is easily obtained from the other. The corresponding ogives are then
called "or more" and "less than" ogives. Whenever we refer to
cumulative distributions or ogives without qualification, the "less
than" type is implied.
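To illustrate how one distribution is obtained from the other, the sketch below subtracts each "less than" count from the total. As before, the frequencies 27 and 8 are an assumption based on tallying the raw weights.

```python
from itertools import accumulate

# Class boundaries (kg) and class frequencies (last two frequencies assumed).
class_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
frequencies = [5, 18, 42, 27, 8]
total = sum(frequencies)

# "Less than" cumulative frequencies at each boundary.
less_than = [0] + list(accumulate(frequencies))

# "Or more" cumulative frequency = total - "less than" at the same boundary.
or_more = [total - count for count in less_than]

for boundary, below, above in zip(class_boundaries, less_than, or_more):
    print(f"{boundary} kg: less than = {below:>3}, or more = {above:>3}")
```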

Frequency Analysis of Continuous Variables

General Rules for Forming Frequency Distributions of
Continuous Data

1. Determine the largest and smallest numbers in the raw data and
find the range (the difference between the largest and smallest
numbers).
2. Divide the range into a convenient number of class intervals
having the same size. A convenient number of classes can be found
with the formula

M = 1 + 3.3 log10(N)

where M is the number of classes, N is the number of data values, and
log10 is the base-10 logarithm. The value of M is rounded to an integer.


3. Divide the range by M to get the class interval size. The interval
size is usually rounded to a simpler number (e.g. 92 is rounded to
100, 22 to 25, etc.).
4. Start the lower class limit of the first interval at the lowest
number in the data set or slightly below it (e.g. if the lowest
number is 23, start from 20).
5. Form each class by adding the interval size obtained in step 3.
Make sure that the upper class limit of the last interval includes
the largest number in the data set.

The number of class intervals is usually taken between 5 and 20,
depending on the data.
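The five steps can be sketched in Python as shown below. The helper names are hypothetical, and `nice_width` is only one possible way to "round to a simpler number": the slides do not prescribe a scheme, so rounding up to 1, 2, 2.5, 5 or 10 times a power of ten is an assumption. The summary values N = 30, minimum 315 and maximum 895 are used purely for illustration.

```python
import math


def sturges_classes(n):
    """Number of classes from M = 1 + 3.3 log10(N), rounded to an integer."""
    return round(1 + 3.3 * math.log10(n))


def nice_width(raw_width):
    """Round an interval size up to 1, 2, 2.5, 5 or 10 times a power of ten.

    Hypothetical helper; the rounding scheme is an assumption of this sketch.
    """
    power = 10 ** math.floor(math.log10(raw_width))
    for multiplier in (1, 2, 2.5, 5, 10):
        if multiplier * power >= raw_width:
            return multiplier * power


def class_edges(minimum, maximum, width):
    """Lower limits of successive classes, followed by the final upper edge."""
    start = math.floor(minimum / width) * width  # at or just below the minimum
    edges = [start]
    while edges[-1] <= maximum:  # keep adding classes until the maximum fits
        edges.append(edges[-1] + width)
    return edges


# Illustrative summary values, chosen only for this sketch.
n, smallest, largest = 30, 315, 895

m = sturges_classes(n)                          # step 2
raw_width = (largest - smallest) / m            # step 3, before rounding
width = nice_width(raw_width)                   # step 3, after rounding
edges = class_edges(smallest, largest, width)   # steps 4 and 5

print("classes:", m, "raw width:", round(raw_width, 2), "width:", width)
print("class edges:", edges)
```

Each class here is treated as half-open (lower edge included, upper edge excluded), so a value equal to an edge belongs to the class that starts there.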

General Rules for Forming Frequency Distributions of
Discrete Data

Since discrete data sets include only integer values, it is generally
not necessary to group the values into intervals. Instead, each value
may be used directly as a class.

If the discrete data set is more widely dispersed, the values may be
grouped as explained in the rules for continuous data.

• Two dice are thrown 65 times, and the number of times each sum is
observed is given in Table 2.3.

• The sum of two dice is a discrete variable, and its histogram can
generally be drawn directly without grouping.
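A frequency table for discrete data can be built by counting each distinct value, as in the sketch below. The throws here are simulated with a fixed random seed; they are not the observations of Table 2.3, which is not reproduced in this text.

```python
import random
from collections import Counter

# Simulated experiment (NOT the data of Table 2.3): 65 throws of two dice.
rng = random.Random(0)  # fixed seed so that the run is repeatable
sums = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(65)]

# Each possible sum (2..12) is its own class; no grouping is needed.
frequencies = Counter(sums)

# Text histogram: one row per possible sum.
for value in range(2, 13):
    print(f"{value:>2} | {'#' * frequencies[value]}")
```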

Types of Frequency Curves
Frequency curves arising in practice take on certain characteristic
shapes, as shown in Figure.

1. The symmetrical, or bell–shaped, frequency curves are
characterized by the fact that observations equidistant from the
central maximum have the same frequency. An important
example is the normal curve.

2. In the moderately asymmetrical, or skewed, frequency curves
the tail of the curve to one side of the central maximum is
longer than that to the other. If the longer tail occurs to the
right, the curve is said to be skewed to the right or to have
positive skewness, while if the opposite is true, the curve is said
to be skewed to the left or to have negative skewness.

3. In a J–shaped or reverse J–shaped curve a maximum occurs at one end.

4. A U–shaped frequency curve has maxima at both ends.

5. A bimodal frequency curve has two maxima.

6. A multimodal frequency curve has more than two maxima.

• The annual rainfall of a city, in mm, is given in the following table.

700 315 450 615 625 625
420 645 635 895 500 565
650 585 665 555 410 715
365 455 535 545 575 645
550 735 615 675 835 595


The histogram of the data is shown in Figure.


The frequency histogram of the data is shown in Figure.

The cumulative frequency distribution of the data is shown in the
Figure. It is seen that 50% of the annual rainfall values are below 600 mm.
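The 50% figure can be checked directly from the 30 rainfall values. The sketch below groups them into classes of width 100 mm starting at 300 mm; this grouping is an assumption, since the figure itself is not reproduced here.

```python
from itertools import accumulate

# Annual rainfall (mm), copied row by row from the table above.
rainfall = [
    700, 315, 450, 615, 625, 625,
    420, 645, 635, 895, 500, 565,
    650, 585, 665, 555, 410, 715,
    365, 455, 535, 545, 575, 645,
    550, 735, 615, 675, 835, 595,
]

# Assumed classes: 300-400, 400-500, ..., 800-900 (upper edge excluded).
FIRST_EDGE, CLASS_WIDTH, N_CLASSES = 300, 100, 6

counts = [0] * N_CLASSES
for value in rainfall:
    counts[(value - FIRST_EDGE) // CLASS_WIDTH] += 1

# Cumulative frequency below each upper edge, as counts and as percentages.
cumulative = list(accumulate(counts))
percent_cumulative = [100 * c / len(rainfall) for c in cumulative]

for i, (count, pct) in enumerate(zip(cumulative, percent_cumulative)):
    upper_edge = FIRST_EDGE + (i + 1) * CLASS_WIDTH
    print(f"below {upper_edge} mm: {count:>2} values ({pct:.1f}%)")
```

With these classes, 15 of the 30 values fall below 600 mm.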

The appearance of the frequency histogram is affected by the
number of class intervals. The use of too few classes causes too
much loss of information, whereas too many class intervals may
lead to irregular histograms, with very few observations (or maybe
none) in some intervals. Thus, selection of the number of class
intervals is important.
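The effect can be seen by tabulating the same rainfall values with different class sizes. The three widths used below (300, 100 and 20 mm) are arbitrary choices for illustration, and `class_counts` is a hypothetical helper.

```python
# Annual rainfall (mm) from the table above.
rainfall = [
    700, 315, 450, 615, 625, 625,
    420, 645, 635, 895, 500, 565,
    650, 585, 665, 555, 410, 715,
    365, 455, 535, 545, 575, 645,
    550, 735, 615, 675, 835, 595,
]


def class_counts(data, first_edge, last_edge, width):
    """Count the values per class of the given width on [first_edge, last_edge)."""
    counts = [0] * ((last_edge - first_edge) // width)
    for value in data:
        counts[(value - first_edge) // width] += 1
    return counts


few = class_counts(rainfall, 300, 900, 300)       # 2 classes: shape is lost
moderate = class_counts(rainfall, 300, 900, 100)  # 6 classes
many = class_counts(rainfall, 300, 900, 20)       # 30 classes: ragged, gaps

for name, counts in (("few", few), ("moderate", moderate), ("many", many)):
    print(f"{name:>8}: {len(counts):>2} classes, "
          f"{counts.count(0):>2} empty, counts = {counts}")
```

With only two classes the peak of the distribution disappears, while with thirty classes many intervals hold a single observation or none.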

Example 1
Frequency distribution of the monthly salaries of 65 employees
working for P&R company is given in the table.

Class     Salary (TL)         Number of Employees
Class 1   2500 - 2599.99       8
Class 2   2600 - 2699.99      10
Class 3   2700 - 2799.99      16
Class 4   2800 - 2899.99      14
Class 5   2900 - 2999.99      10
Class 6   3000 - 3099.99       5
Class 7   3100 - 3199.99       2
Total                         65

Answer the following using the data given in the table.

a) What is the lower limit of class 6?
b) What is the upper limit of class 4?
c) What is the class mark of class 3?
d) What are the boundaries of class 5?
e) What is the width of class 5?
f) What is the frequency of class 3?
g) What is the relative frequency of class 3?
h) What is the class interval that has the highest frequency?
i) What is the percent of employees earning less than 2800 TL?
j) What is the percent of employees earning less than 3000 TL but
at least 2600 TL?

a) 3000 TL
b) 2899.99 TL
c) ½(2700+2799.99) =2749.995 TL (Round up to 2750 TL for
practical purposes)
d) Lower class boundary = ½(2900+2899.99) = 2899.995 TL
Upper class boundary = ½(2999.99+3000) = 2999.995 TL
e) 2999.995 - 2899.995 = 100 TL

f) 16
g) 16/65 = 0.246 = 24.6%
h) 2700 – 2799.99 TL
i) Number of employees earning less than 2800 TL = 16+10+8 = 34
Percent of employees earning less than 2800 TL = 34/65 = 52.3%
j) Number of employees earning less than 3000 TL but at least
2600 TL = 10+14+16+10 = 50
Percent of employees earning less than 3000 TL but at least
2600 TL = 50/65 = 76.9%
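The answers above can be reproduced with a short script. This is only a cross-check of the arithmetic; the variable names are hypothetical and the class table is copied from Example 1.

```python
# (lower limit, upper limit, frequency) for classes 1..7 of Example 1.
classes = [
    (2500, 2599.99, 8),
    (2600, 2699.99, 10),
    (2700, 2799.99, 16),
    (2800, 2899.99, 14),
    (2900, 2999.99, 10),
    (3000, 3099.99, 5),
    (3100, 3199.99, 2),
]
total = sum(freq for _, _, freq in classes)

# c) class mark of class 3 = midpoint of its class limits
class_mark_3 = (classes[2][0] + classes[2][1]) / 2

# d) boundaries of class 5 = midpoints between adjacent class limits
lower_boundary_5 = (classes[3][1] + classes[4][0]) / 2
upper_boundary_5 = (classes[4][1] + classes[5][0]) / 2

# e) width of class 5 = upper boundary - lower boundary
width_5 = upper_boundary_5 - lower_boundary_5

# g) relative frequency of class 3
relative_3 = classes[2][2] / total

# h) class with the highest frequency
modal_class = max(classes, key=lambda c: c[2])

# i) employees earning less than 2800 TL
below_2800 = sum(f for lower, upper, f in classes if upper < 2800)

# j) employees earning at least 2600 TL but less than 3000 TL
between = sum(f for lower, upper, f in classes if lower >= 2600 and upper < 3000)

print(f"c) {class_mark_3:.3f}  d) {lower_boundary_5:.3f}, {upper_boundary_5:.3f}")
print(f"e) {width_5:.2f}  g) {100 * relative_3:.1f}%  h) {modal_class[:2]}")
print(f"i) {100 * below_2800 / total:.1f}%  j) {100 * between / total:.1f}%")
```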

Example 2
Using the data given in the table in Example 1:

a) Calculate the cumulative frequency distribution
b) Calculate the percent cumulative distribution
c) Draw the cumulative frequency diagram
d) Draw the percent cumulative frequency diagram

a) Calculate the cumulative frequency distribution

Salary (TL)   Cumulative Frequency
< 2500         0
< 2600         8
< 2700        18
< 2800        34
< 2900        48
< 3000        58
< 3100        63
< 3200        65

b) Calculate the percent cumulative distribution

Salary (TL)   Percent Cumulative Frequency
< 2500          0.0
< 2600         12.3
< 2700         27.7
< 2800         52.3
< 2900         73.8
< 3000         89.2
< 3100         96.9
< 3200        100.0

c) Draw the cumulative frequency diagram

d) Draw the percent cumulative frequency diagram
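The points of both diagrams can be computed as in the sketch below, which takes the class frequencies from Example 1. The names are hypothetical, and the percentages are rounded to one decimal place.

```python
from itertools import accumulate

# Lower class limits (TL) and class frequencies from Example 1.
lower_limits = [2500, 2600, 2700, 2800, 2900, 3000, 3100]
frequencies = [8, 10, 16, 14, 10, 5, 2]
CLASS_WIDTH = 100

# Thresholds of the "less than" distribution: 2500, 2600, ..., 3200.
thresholds = lower_limits + [lower_limits[-1] + CLASS_WIDTH]

# a) cumulative frequency: running total of the class frequencies.
cumulative = [0] + list(accumulate(frequencies))

# b) percent cumulative frequency, rounded to one decimal place.
total = cumulative[-1]
percent_cumulative = [round(100 * c / total, 1) for c in cumulative]

# c), d) each printed row is one point of the corresponding diagram.
for threshold, count, percent in zip(thresholds, cumulative, percent_cumulative):
    print(f"< {threshold}: {count:>2}  {percent:>5}%")
```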

Example 3
Using the data given in the table in Example 1, calculate the
"or more" cumulative frequency distribution and draw its graph.

Salary (TL)   Cumulative Frequency
≥ 2500        65
≥ 2600        57
≥ 2700        47
≥ 2800        31
≥ 2900        17
≥ 3000         7
≥ 3100         2
≥ 3200         0
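This table can be reproduced by starting from the total and removing one class at a time, as in the sketch below. The variable names are hypothetical; the frequencies are those of Example 1.

```python
# Lower class limits (TL) and class frequencies from Example 1.
thresholds = [2500, 2600, 2700, 2800, 2900, 3000, 3100, 3200]
frequencies = [8, 10, 16, 14, 10, 5, 2]

# Number of employees at or above each threshold: start with everyone,
# then drop the class that lies just above the current threshold.
remaining = sum(frequencies)
or_more = []
for freq in frequencies:
    or_more.append(remaining)
    remaining -= freq
or_more.append(remaining)  # nobody earns 3200 TL or more

for threshold, count in zip(thresholds, or_more):
    print(f"{threshold} TL or more: {count:>2}")
```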
