Week5 PDF

MECHANICAL ENGINEERING SYSTEMS LABORATORY

Group 02

Asst. Prof. Dr. E. İlhan KONUKSEVEN


STATISTICAL TREATMENT OF EXPERIMENTAL DATA

DISCRETE FREQUENCY DISTRIBUTIONS

Assume that a total of n = 10 measurements, $x_i$ (i = 1, …, 10), are made as:

x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
14  16  13  19  18  14  14  15  18  15

Note that the span of the measurements is 6, ranging from 13 to 19.
FREQUENCY (nj)

IS THE NUMBER OF OCCURRENCES OF THE jth MEASUREMENT VALUE

In this example, frequencies are:

j       1   2   3   4   5   6   7
value  13  14  15  16  17  18  19
nj      1   3   2   1   0   2   1
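The frequency table above can be reproduced with a short script. This is only a sketch: the variable names (`x`, `values`, `n_j`) are illustrative, and the choice to list every integer between the smallest and largest reading (so that the empty class 17 appears with a count of zero) is an assumption about how the classes are defined.

```python
from collections import Counter

# The ten measurements from the example.
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]

# Count how many times each value occurs.
counts = Counter(x)

# Assumption: one class per integer between the smallest and largest reading,
# so that a value that never occurs (17) is listed with frequency 0.
values = list(range(min(x), max(x) + 1))
n_j = [counts[v] for v in values]

for j, (v, c) in enumerate(zip(values, n_j), start=1):
    print(j, v, c)
```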
RELATIVE FREQUENCY (fj)

IS THE NUMBER OF OCCURRENCES OF THE jth VALUE RELATIVE TO THE TOTAL NUMBER OF OCCURRENCES

$$f_j = \frac{n_j}{n}, \qquad \sum_{j=1}^{m} f_j = 1 \quad \text{and} \quad \sum_{j=1}^{m} n_j = n$$

THERE ARE 7 GROUPS, i.e. m = 7

j        1    2    3    4    5    6    7
value   13   14   15   16   17   18   19
fj     0.1  0.3  0.2  0.1  0.0  0.2  0.1
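As a quick check of the two summation identities, the relative frequencies can be computed from the raw data. The helper name `relative_frequencies` is hypothetical, and treating every integer in the span as a class is an assumption carried over from the table.

```python
from collections import Counter

def relative_frequencies(data):
    """Return (values, f_j), where f_j = n_j / n for each integer class in the span."""
    n = len(data)
    counts = Counter(data)
    values = list(range(min(data), max(data) + 1))
    return values, [counts[v] / n for v in values]

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
values, f_j = relative_frequencies(x)
m = len(values)  # number of groups

print(m, f_j)
print(sum(f_j))  # should be 1 up to floating-point rounding
```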
Frequency Graph: These measurements may be shown graphically on a histogram called a “Frequency Graph”:

[Figure: frequency graph. Horizontal axis: measured value (13 to 19). Left axis: frequency nj (0 to 4). Right axis: relative frequency fj (0.0 to 0.4). Each measurement x1 … x10 is stacked above its value.]
MEASURES OF CENTRAL TENDENCY

ARITHMETIC MEAN (Average)

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

IT PROVIDES THE BEST ESTIMATE FOR AN UNBIASED DISTRIBUTION OF DATA

$\bar{x}$ is the most commonly used measure of central tendency because it usually provides the “best estimate” of the most typical value in the distribution of data.

$\bar{x} = 15.6$ for the last example
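The value 15.6 can be checked in two equivalent ways: directly from the ten readings, or as a frequency-weighted sum over the distinct values. The sketch below assumes nothing beyond the example data; the variable names are illustrative.

```python
from collections import Counter

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

# Direct definition: sum of all readings divided by their number.
mean_direct = sum(x) / n

# Equivalent form: each distinct value weighted by its relative frequency n_j / n.
counts = Counter(x)
mean_weighted = sum(value * count / n for value, count in counts.items())

print(mean_direct, mean_weighted)
```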

BIAS:
In statistics, bias is systematic favoritism (a tendency to make systematic errors) present in data collection, analysis, or reporting of quantitative research.
MEDIAN

IT IS THE VALUE AT THE MIDDLE POSITION OF A DISTRIBUTION OF DATA

IT IS USUALLY USED WHEN THE DISTRIBUTION IS BIASED

The median is the middle value of the data when sorted in ascending order. When the number of values is even, the median is the average of the two middle values.

13, 14, 14, 14, 15, 15, 16, 18, 18, 19

(It is 15 for the last example)
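A minimal sketch of the median rule described above, covering both the odd and the even case. The function name `median` is chosen for illustration; Python's standard library offers `statistics.median` with the same behaviour.

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
print(sorted(x))
print(median(x))
```

With ten values, the two middle elements of the sorted list are the fifth and sixth (both 15), so their average is 15.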
MODE

IT IS THE VALUE HAVING THE HIGHEST FREQUENCY IN THE SAMPLE DISTRIBUTION

(It is not very meaningful unless n is very large)

(It is 14 for the last example)
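The mode can be read off the frequency counts. The sketch below returns every value that shares the highest frequency, since a small sample may have more than one; the helper name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return all values that occur with the highest frequency, in ascending order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
print(modes(x))
```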


GEOMETRIC MEAN (Log - Mean)

$$x_g = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

$$\log(x_g) = \frac{1}{n} \sum_{i=1}^{n} \log(x_i)$$

IT IS IMPORTANT WHEN DEALING WITH RATIOS OR PERCENTAGES

(It is 15.5 for the last example)
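Both forms of the geometric mean (the n-th root of the product and the exponential of the mean logarithm) should agree. The following sketch evaluates them for the example data, assuming Python 3.8 or later for `math.prod`.

```python
import math

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

# Product form: n-th root of the product of all readings.
x_g_product = math.prod(x) ** (1 / n)

# Log form: exponential of the arithmetic mean of the logarithms.
x_g_log = math.exp(sum(math.log(v) for v in x) / n)

print(round(x_g_product, 2), round(x_g_log, 2))
```

The log form is usually preferable for long data sets because the running product can overflow or underflow.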


HARMONIC MEAN

$$x_h = \frac{n}{\sum_{i=1}^{n} (1 / x_i)}$$

(It is 15.4 for the last example)
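A short check of the harmonic mean for the example data, written directly from the formula. The variable names are illustrative; the standard library function `statistics.harmonic_mean` gives the same result.

```python
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

# n divided by the sum of the reciprocals of the readings.
x_h = n / sum(1 / v for v in x)

print(round(x_h, 2))
```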


QUADRATIC MEAN (ROOT - MEAN - SQUARE)

$$x_{rms} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$$

Its square can be considered as the second moment of a set of data about its origin. (It is 15.7 for the last example)
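The sketch below evaluates the root-mean-square value and, as a sanity check, compares it with the other three means. For positive data that are not all equal, the ordering harmonic < geometric < arithmetic < quadratic always holds; the variable names are illustrative.

```python
import math

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)

# Root-mean-square: square root of the mean of the squared readings.
x_rms = math.sqrt(sum(v * v for v in x) / n)

# The other means, for comparison.
x_mean = sum(x) / n
x_g = math.exp(sum(math.log(v) for v in x) / n)
x_h = n / sum(1 / v for v in x)

print(round(x_h, 2), round(x_g, 2), round(x_mean, 2), round(x_rms, 2))
```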
MEASURES OF DISPERSION OF DATA

VARIANCE (MEAN SQUARE DEVIATION)

$$\mathrm{VAR} = \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

It is 3.84 for the last example


STANDARD DEVIATION

$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\overline{x^2} - (\bar{x})^2}$$

where $\overline{x^2}$ is the mean of the squared values.

It is 1.96 for the last example
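The two expressions for the standard deviation (mean squared deviation, and mean of squares minus square of the mean) can be compared numerically. This is a sketch for the example data only; the names `var_def` and `var_short` are illustrative.

```python
import math

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)
mean = sum(x) / n

# Definition: mean of the squared deviations from the mean.
var_def = sum((v - mean) ** 2 for v in x) / n

# Shortcut: mean of the squares minus the square of the mean.
var_short = sum(v * v for v in x) / n - mean ** 2

sigma = math.sqrt(var_def)
print(round(var_def, 2), round(var_short, 2), round(sigma, 2))
```

The shortcut form needs only one pass over the data, but it subtracts two nearly equal numbers and can lose precision when the spread is small compared with the mean.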


RANGE

IT IS THE DIFFERENCE BETWEEN THE LARGEST AND SMALLEST VALUES OF THE ENTIRE SET OF DATA

(It is 6 for the last example)


AVERAGE DEVIATION

$$A.D. = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right|$$

It is 1.72 for the last example
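The range and the average deviation are both one-line computations. The sketch below evaluates them for the example data; the variable names are illustrative.

```python
x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)
mean = sum(x) / n

# Range: largest value minus smallest value.
data_range = max(x) - min(x)

# Average deviation: mean of the absolute deviations from the mean.
avg_dev = sum(abs(v - mean) for v in x) / n

print(data_range, round(avg_dev, 2))
```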


UNBIASED ESTIMATES

If a “random sample” $(x_1, x_2, \dots, x_n)$ is drawn from a “population” (or “universe”) with mean $\mu$ and standard deviation $\sigma$:

[Figure: a random sample (x1, x2, …, xn) drawn from a population or universe with mean μ and S.D. σ]

A) THE SAMPLE MEAN $\bar{x}$ IS THE BEST AVAILABLE ESTIMATE OF THE UNKNOWN MEAN OF THE UNIVERSE, $\mu$
B) THE BEST AVAILABLE ESTIMATE OF THE UNKNOWN STANDARD DEVIATION OF THE UNIVERSE, $\sigma$, IS GIVEN BY

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} = \sqrt{\frac{n}{n-1} \left[ \overline{x^2} - (\bar{x})^2 \right]}$$

THE USE OF THIS EXPRESSION BECOMES IMPORTANT ESPECIALLY WHEN n IS SMALL

FOR LARGE VALUES OF n, $s \approx \sigma_{sample}$

HOWEVER, $s > \sigma_{sample}$ ALWAYS

Here $\sigma_{sample}$ is the standard deviation of the sample computed with 1/n (1.96 for the last example).

(For the last example, s = 2.07)
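To make the difference between the 1/n and 1/(n-1) forms concrete, the sketch below computes both for the example data and checks the relation s = σ_sample · sqrt(n/(n-1)). The variable names are illustrative; `statistics.pstdev` and `statistics.stdev` in the standard library implement the same two definitions.

```python
import math

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)
mean = sum(x) / n
sum_sq_dev = sum((v - mean) ** 2 for v in x)

# Standard deviation of the sample itself (divide by n).
sigma_sample = math.sqrt(sum_sq_dev / n)

# Estimate of the population standard deviation (divide by n - 1).
s = math.sqrt(sum_sq_dev / (n - 1))

print(round(sigma_sample, 2), round(s, 2))
print(round(s / sigma_sample, 4), round(math.sqrt(n / (n - 1)), 4))
```

The ratio sqrt(n/(n-1)) is about 1.054 for n = 10 and approaches 1 as n grows, which is why the correction matters mainly for small samples.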


C) IF MORE THAN ONE (SAY m) EQUAL-SIZED RANDOM SAMPLES ARE DRAWN FROM THE SAME UNIVERSE, THEN THEIR RESPECTIVE MEANS AND STANDARD DEVIATIONS ARE EXPECTED TO BE APPROXIMATELY EQUAL TO EACH OTHER

$$\bar{x}_1 \approx \bar{x}_2 \approx \dots \approx \bar{x}_m$$

$$s_1 \approx s_2 \approx \dots \approx s_m$$

[Figure: Sample 1, Sample 2, …, Sample m drawn from the same population or universe]

It is also possible to treat $\bar{x}_j$ and $s_j$ as statistical quantities and define their standard deviations.
STANDARD ERROR OF THE MEAN

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

THIS QUANTITY REPRESENTS THE STANDARD DEVIATION OF $\bar{x}$ FROM $\mu$

(For the last example, $s_{\bar{x}} \approx 0.653$)


STANDARD ERROR OF THE STANDARD DEVIATION

$$s_s = \frac{s}{\sqrt{2n}} = \frac{s_{\bar{x}}}{\sqrt{2}}$$

THIS QUANTITY REPRESENTS THE STANDARD DEVIATION OF s FROM $\sigma$

(For the last example, $s_s \approx 0.462$)
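Both standard errors follow directly from s and n. The sketch below evaluates them for the example data with unrounded intermediate values; small differences in the third decimal place can arise if s is first rounded to 2.07. The variable names are illustrative.

```python
import math

x = [14, 16, 13, 19, 18, 14, 14, 15, 18, 15]
n = len(x)
mean = sum(x) / n

# Estimate of the population standard deviation (divide by n - 1).
s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))

# Standard error of the mean.
s_mean = s / math.sqrt(n)

# Standard error of the standard deviation.
s_s = s / math.sqrt(2 * n)

print(round(s_mean, 3), round(s_s, 3))
```

A common way to report the result is mean ± standard error, here roughly 15.6 ± 0.65.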


CONTINUOUS DISTRIBUTIONS

IN ACTUAL EXPERIMENTS VALUES WILL BE LESS DISCRETE (23.26, 25.12, etc.)

For example, instead of

x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
14  16  13  19  18  14  14  15  18  15

the measurements might read

x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
14.21  16.36  13.16  18.74  17.59  14.43  14.02  14.77  18.01  15.16
IF WE HAD A SET OF 100 DATA VALUES SUCH AS 23.26, 25.12, ..., etc., THEN THE FREQUENCY GRAPH WOULD PROBABLY HAVE VERY FEW VALUES THAT WERE THE SAME

[Figure: relative frequency fj (0.0 to 0.2) versus measured value (13 to 19)]
THE ONLY MEANINGFUL QUANTITY APPEARS TO BE THE DENSITY OF THE “DOTS”
LET US DIVIDE THE DATA INTO INCREMENTS (INTERVALS)

NOW LET US COUNT HOW MANY DATA POINTS ARE BETWEEN 22.51 AND 23.50
If all intervals of interest are plotted, the result would be a bar graph. For the original data:

x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
14  16  13  19  18  14  14  15  18  15

[Figure: frequency graph of the original data. Horizontal axis: measured value (13 to 19). Left axis: frequency nj (0 to 4). Right axis: relative frequency fj (0.0 to 0.4).]
IF MORE MEASUREMENTS WITH A MORE ACCURATE DEVICE WERE TAKEN

x1     x2     x3     x4     x5     x6     x7     x8     x9     x10
14.21  16.36  13.16  18.74  17.59  14.43  14.02  14.77  18.01  15.16

[Figure: relative frequency fj (0.0 to 0.2) versus measured value (13 to 19)]
AND IF THE NUMBER OF DATA POINTS WERE INCREASED

[Figure: relative frequency fj (0.00 to 0.10) versus measured value (13 to 19)]
When all intervals of interest are plotted, the result is a bar graph:

[Figure: bar graph of relative frequency fj (0.00 to 0.08) versus measured value (13 to 19), with the envelope of the bars drawn over it]
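The counting step behind such a bar graph can be sketched as follows, using the ten higher-resolution readings from the example. The interval width of 1 and the convention that each interval is centred on an integer (for example 13.5 up to, but not including, 14.5) are assumptions made for illustration; `bin_center` is a hypothetical helper.

```python
import math
from collections import Counter

x = [14.21, 16.36, 13.16, 18.74, 17.59, 14.43, 14.02, 14.77, 18.01, 15.16]
n = len(x)

def bin_center(value, width=1.0):
    """Centre of the interval [c - width/2, c + width/2) that contains value."""
    return math.floor(value / width + 0.5) * width

# Count how many readings fall into each interval.
counts = Counter(bin_center(v) for v in x)

centers = list(range(13, 20))
n_j = [counts[c] for c in centers]      # frequency per interval
f_j = [c / n for c in n_j]              # relative frequency per interval

for c, nj, fj in zip(centers, n_j, f_j):
    print(c, nj, fj)
```

With this choice of intervals, the binned counts coincide with the frequencies of the original integer data, since those values are the continuous readings rounded to the nearest integer.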
THE INTERVAL MUST BE CHOSEN

* LARGE ENOUGH TO BE MEANINGFUL
* SMALL ENOUGH TO GIVE DETAIL

N = 5 log n          for large n
N = 1 + 3.3 log n    for n < 25 (Sturges' rule)

where n is the number of data points and N is the suggested number of class intervals.
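The two rules of thumb can be evaluated as below. This is a sketch under two assumptions: that "log" means the base-10 logarithm (the coefficient 3.3 in Sturges' rule is approximately 1/log10(2)), and that the result is rounded to the nearest whole number of intervals. The function names are hypothetical.

```python
import math

def classes_sturges(n):
    """Sturges' rule: N = 1 + 3.3 log10(n)."""
    return 1 + 3.3 * math.log10(n)

def classes_large_n(n):
    """Rule quoted for large n: N = 5 log10(n)."""
    return 5 * math.log10(n)

for n in (10, 20, 100, 1000):
    # Assumption: round to the nearest integer to get a usable interval count.
    print(n, round(classes_sturges(n)), round(classes_large_n(n)))
```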
