
Topic III

Evaluation of Analytical Data
“It is impossible to perform a chemical analysis that is totally free of errors, or uncertainties. All we can hope to do is to minimize these errors and to estimate their size with acceptable accuracy.”
Errors in Chemical Analysis
 Every measurement is influenced by many
uncertainties, which combine to produce a scatter of
results.
 Measurement uncertainties can never be completely
eliminated, so the true value for any quantity is
always unknown.
 Therefore, it would be wrong to say that blood testing is “error-free” when presenting data in court.
Remember: data of unknown quality are worthless.
Measures of Central Tendency
 Chemists usually carry three to five replicates
(portions) of a sample through an analytical
procedure.
 Replicate samples, or replicates, are portions of a
material of approximately the same size that are
carried through an analytical procedure at the same
time and in the same way.
 Individual results from a set of measurements are
seldom the same, so a central or “best” value is used
for the set.
1. Mean, x̄
 arithmetic mean or average
 the quantity obtained by dividing the sum of
replicate measurements (xi) by the number of
measurements (N) in the set.

Mathematically speaking:

x̄ = (Σxi) / N = (x1 + x2 + x3 + x4 + … + xN) / N
2. Median, M
 middle value of a sample of results arranged in
order of increasing/decreasing magnitude.
 odd number of results
 take the middle value
 even number of results
 take the mean of the two middle values
Example 1
Calculate the mean and the median for the following data:
20.3, 19.4, 19.8, 20.1, 19.6 , 19.5
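As a quick check of Example 1, the mean and median can be computed with Python's statistics module (a sketch, not part of the original example):

```python
import statistics

data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]

mean = statistics.mean(data)      # sum of the six results divided by N = 6
median = statistics.median(data)  # even N, so the mean of the two middle values

print(round(mean, 2))    # 19.78
print(round(median, 1))  # 19.7
```

Sorting the data (19.4, 19.5, 19.6, 19.8, 20.1, 20.3) shows why the median is 19.7: with an even number of results, it is the average of 19.6 and 19.8.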
 Ideally, the mean and the median are identical. Frequently they are not, particularly when the number of measurements in the set is small.
 The median is used advantageously when a set of data contains an outlier.
 An outlier is a result that differs significantly from the others in the set.
 An outlier can have a significant effect on the mean but has less effect on the median.
3. Mode
 the value that occurs most frequently in a set of
determinations.
Precision
 describes the reproducibility of the measurements.
 tells how close the results are, provided that they are obtained in exactly
the same way.
 deals with repeatability (within-runs) and reproducibility (between-runs).
 three terms are widely used to describe the precision of a set of replicate
data:
 standard deviation
 Variance
 coefficient of variation.
All these terms are a function of the deviation from the mean, which is defined as:
 deviation from the mean, di = |xi - x̄|
Accuracy
 indicates the closeness of a measurement to the true or accepted value and is expressed by the error.
 is expressed in terms of:
a. absolute error, E = xi - xt
b. relative error, Er

Er = [(xi - xt) / xt] × 100%  (in terms of %)

Er = [(xi - xt) / xt] × 1000  (in terms of ppt)
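A sketch of these definitions applied to the Example 1 data (the mean is first rounded to 19.8, as in the worked answer):

```python
import statistics

data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]
x_true = 20.0

x_mean = round(statistics.mean(data), 1)   # 19.8, rounded as in the example

E = x_mean - x_true                        # absolute error
Er_pct = E / x_true * 100                  # relative error, %
Er_ppt = E / x_true * 1000                 # relative error, ppt

print(round(E, 1), round(Er_pct), round(Er_ppt))   # -0.2 -1 -10
```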
Illustration
Tell whether the accuracy (high or low) and the precision (high or low) for each of the four targets.
(X represents the true value; the dots represent the replicates.)
 top left: low accuracy, low precision
 top right: low accuracy, high precision
 bottom left: high accuracy, low precision
 bottom right: high accuracy, high precision
Example 2
Determine the absolute error and the relative error (in % and ppt) for the mean in Example 1, given that the true value is 20.0.

Answer:
Absolute error, E = -0.2
Relative error, Er = -1% = -10 ppt
Note: Accuracy measures the agreement between a
result and its true value. Precision describes the
agreement among several results that have been
measured in the same way.
Types of Errors
1. Random / Indeterminate Error
 causes data to be scattered more or less symmetrically around a mean value.
 In general, then, the random error in a measurement is reflected
by its precision
2. Systematic / Determinate Error
 causes the mean of a set of data to differ from the accepted value.
 causes the results in a series of replicate measurements to be all high or low.
3. Gross Error
 occur only occasionally, are often large, and may cause a result to be either high or low.
 lead to outliers, results that obviously differ significantly from the rest of the data in a set of replicate measurements.
Sources of Systematic Errors
a. Instrumental error
 caused by imperfection of measuring devices and
instabilities in their power supplies.
 glasswares used at temperatures that differ from their
calibration temperature
 distortion of container walls
 errors in the original calibration
 contaminants on the inner surface of the containers
Sources of Systematic Errors
b. Methodic error
 arises from non-ideal chemical or physical behavior of
analytical systems.
 due to slowness of some reactions
 incompleteness of a reaction
 instability of some species
 nonspecificity of the reagents
 possible occurrence of side reactions
Sources of Systematic Errors
c. Personal error
 results from the carelessness, inattention or personal limitations of
the experimenter.
 estimating the level of the liquid between two scale divisions
 the color of the solution at the end point in a titration

Persons who make measurements must guard against personal bias


to preserve the integrity of the collected data. Of the three types of
systematic errors encountered in a chemical analysis, methodic
errors are usually the most difficult to identify and correct.
Classification of Systematic Errors:

a. constant error
 independent of the magnitude of the measured quantity (the absolute error stays essentially the same as the size of the measured quantity is varied).
 becomes less significant as the magnitude increases.
e.g. constant end-point error of 0.10 mL
sample 1: 10.0 mL titrant:
Er = (0.10 mL / 10.0 mL) × 100% = 1.0 %
sample 2: 50.0 mL titrant:
Er = (0.10 mL / 50.0 mL) × 100% = 0.20 %
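The arithmetic above can be sketched in a few lines; the relative error falls fivefold as the titrant volume grows from 10.0 to 50.0 mL:

```python
constant_error_mL = 0.10   # constant end-point error

for volume_mL in (10.0, 50.0):
    Er = constant_error_mL / volume_mL * 100   # relative error, %
    print(f"{volume_mL:.1f} mL titrant: Er = {Er:.2f} %")
```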
Classification of Systematic Errors:

b. proportional error
 increases or decreases in proportion to the size of
the sample taken for analysis.
 common cause of this error is the presence of
interfering contaminants in the sample.
Classification of Systematic Errors:

b. proportional error
Illustration: Determination of copper based upon the reaction of copper(II) ion with potassium iodide to give iodine.
 First, the quantity of iodine is measured; it is proportional to the amount of copper(II) in the sample.
 If iron(III) is present, it will also liberate iodine from potassium iodide.
 Unless steps are taken to eliminate the iron(III) interference, high results are observed for the percentage of copper, because the iodine produced will be a measure of both the copper(II) and the iron(III) in the sample.
Detection of Systematic Error:
a. systematic instrument error
 usually found and corrected by calibration.
 periodic calibration of the equipment is always desirable because
the response of most instruments changes with time as a result of
wear, corrosion and mistreatment.
b. systematic personal error
 can be minimized by self-discipline.
 a good habit is to check the instrument readings, notebook entries,
and calculations systematically.
c. systematic methodic error
 each analytical method has its own biases, which are difficult to detect.
Detection of Systematic Error:
 One or more of the following steps recognize and
adjust for a systematic error in an analytical method:
a. analysis of standard samples
 the analysis of the standard reference materials, SRM
(materials that contain one or more analytes with exactly
known concentration levels.)
 standard material can be purchased or sometimes prepared by synthesis, but unfortunately this is often impossible or so difficult and time-consuming that this approach is not practical.
Standard Reference Materials (SRM)
 SRMs can be purchased from a number of governmental and industrial sources (e.g. the National Institute of Standards and Technology, NIST, which offers over 900 SRMs).
 The concentration of an SRM has been determined in one of three ways:
 through analysis by previously validated reference method,
 through analysis by two or more independent, reliable
measurement methods,
 through analysis by a network of cooperating laboratories,
technically competent and thoroughly knowledgeable with the
material being tested.
Detection of Systematic Error:
b. independent analysis
 a second independent and reliable analytical method to
be used in parallel with the method being evaluated.
 should differ as much as possible from the method
used.
 This minimizes the possibility that some common
factor in the sample has the same effect on both
methods.
Detection of Systematic Error:
c. blank determination
 useful for detecting certain types of constant errors.
 all steps of the analysis are performed in the absence of the sample.
 the results from the blank are then applied as a correction to the
sample measurements.
 this reveals errors due to interfering contaminants from the
reagents and vessels employed in the analysis.
 this also allows the analyst to correct titration data for the volume of reagent needed to cause an indicator to change color at the end-point.
Detection of Systematic Error:
d. variation in sample size
 can detect constant errors (as the size of the
measurement increases, the effect of a constant
error decreases).
Random Errors
 arise when a system of measurement is extended
to its maximum sensitivity. This type of error is
caused by many uncontrollable variables that are
an inevitable part of every physical or chemical
measurement.
 the accumulated effect of the individual
indeterminate uncertainties, however, causes
replicate measurements to fluctuate randomly
around the mean of the set.
A histogram showing distribution of the 50
results in calibrating a 10-mL pipet
Frequency Data from the Histogram
Sources of Random Errors
The table below shows the
possible ways that four
errors can combine to give
the indicated deviations from
the mean value.
 When a very large number of individual errors are combined and the magnitude of error is plotted against relative frequency, a bell-shaped curve is obtained.
 This plot is known as the Gaussian or normal error curve – a curve that shows the symmetrical distribution of data around the mean of a set of data.
Sources of Random Errors
 The frequency distribution of some results or measurements can be plotted in different forms of graphs:
(a) Frequency distribution for measurements containing 4 random uncertainties
(b) Frequency distribution for measurements containing 10 random uncertainties
(c) Frequency distribution for measurements containing a very large number of random uncertainties
Statistical Treatment of Random Errors

 sample – a finite number of experimental


observations; a tiny fraction of infinite number of
observations.
 population or universe – theoretical infinite
number of data.
 population mean, μ – true mean of the population;
in the absence of any systematic error, this is also
the true value for the measured quantity.
Statistical Treatment of Random Errors

 population mean, μ – true mean of the population;


in the absence of any systematic error, this is also
the true value for the measured quantity.
µ = (Σxi) / N   where N → ∞
Statistical Treatment of Random Errors
 sample mean, x̄ – the mean of a limited sample drawn from the population of the data.

x̄ = (Σxi) / N   where N is finite
Measures of Precision
 population standard deviation, σ – a measure of the precision of a population of data, mathematically given by:

σ = √[ Σ(xi - µ)² / N ]
Measures of Precision
 sample standard deviation, s – a measure of the precision of a sample of data, mathematically given by:

s = √[ Σ(xi - x̄)² / (N - 1) ]
Measures of Precision
 standard deviation of the mean, sm

sm = s / √N
Other ways of expressing precision:
 Variance, s² – simply the square of the standard deviation:

s² = Σ(xi - x̄)² / (N - 1)
Other ways of expressing precision:
 Coefficient of Variation, CV
CV = (s / x̄) × 100%
 Relative Standard Deviation, RSD
RSD = (s / x̄) × 1000 ppt
Other ways of expressing precision:

3. spread or range, w
 the difference between the largest value and the
smallest in the set of data.
w = highest value – lowest value
The standard deviation for curve B is twice that for curve A; that is, σB = 2σA.

In (a) the abscissa is the deviation from the mean (x – µ) in the units of measurement.

In (b) the abscissa is the deviation from the mean in units of σ. For this plot, the two curves A and B are identical.
Example 3
The following results were obtained in the replicate
determination of the lead content of a blood sample:
0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb.
Calculate the mean, standard deviation, standard
deviation of the mean and relative standard
deviation.
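Example 3 can be sketched in Python; `statistics.stdev` uses the sample (N − 1) formula defined above:

```python
import math
import statistics

pb = [0.752, 0.756, 0.752, 0.751, 0.760]   # ppm Pb

mean = statistics.mean(pb)
s = statistics.stdev(pb)            # sample standard deviation (divides by N - 1)
s_m = s / math.sqrt(len(pb))        # standard deviation of the mean
rsd_ppt = s / mean * 1000           # relative standard deviation, ppt

print(round(mean, 4))     # 0.7542
print(round(s, 4))        # 0.0038
print(round(s_m, 4))      # 0.0017
print(round(rsd_ppt, 1))  # 5.0
```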
Reliability of s as a Measure of Precision

 Most of the statistical tests described are based


upon sample standard deviations, and the
probability of correctness of the results of these
tests improves as the reliability of s becomes
greater. Uncertainty in the calculated value of s
decreases as N increases. When N is greater than
20, s and σ can be assumed to be identical for all
practical purposes.
Pooling Data to Improve the Reliability of s

 data from a series of similar samples accumulated over time can


often be pooled to provide an estimate of s superior to the value of
the individual subset.
 mathematical equation of the pooled standard deviation:

s_pooled = √[ ( Σ(xi - x̄1)² + Σ(xj - x̄2)² + … ) / (N1 + N2 + … - NT) ]

where: N1 = number of data in set 1
N2 = number of data in set 2
NT = number of data sets that are pooled
N1 + N2 + … - NT = degrees of freedom
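The pooled calculation can be sketched with hypothetical replicate sets (the glucose table in the example that follows appears only as an image, so the values below are illustrative assumptions):

```python
import math

# Hypothetical replicate sets, not the glucose data from the slides.
set1 = [100.2, 99.8, 100.5, 100.1]
set2 = [99.9, 100.4, 100.0]

def sum_sq_dev(data):
    """Sum of squared deviations from the set's own mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data)

n_sets = 2
dof = len(set1) + len(set2) - n_sets   # N1 + N2 - NT degrees of freedom
s_pooled = math.sqrt((sum_sq_dev(set1) + sum_sq_dev(set2)) / dof)
print(round(s_pooled, 3))   # 0.279
```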
Example: S pooled
Glucose levels are routinely monitored in patients suffering from
diabetes. The glucose concentrations in a patient with mildly elevated
glucose levels were determined in different months by a
spectrophotometric analytical method. The patient was placed on a low-
sugar diet to reduce the glucose levels. The following results were
obtained during a study to determine the effectiveness of the diet.
Calculate a pooled estimate of the standard deviation for the method.
The Confidence Limit
 The exact value of the mean, μ, for a population of
data can never be determined exactly because such a
determination requires that an infinite number of
measurements be made. Statistical theory, however,
allows us to set limits around an experimentally
determined mean within which the population mean
lies with a given degree of probability. These limits
are called confidence limits, and the interval they
define is known as the confidence interval, CI.
CI for µ = x̄ ± t·s / √N
The Confidence Limit

The value of t
depends on the
desired confidence
level and on the
number of degrees of
freedom ( N – 1 ) in
the calculation of s.
Example: Confidence Limit
 Determine the 80% and 95% confidence
intervals for the following glucose reading:
1108, 1122, 1075, 1099, 1115, 1083, 1100
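A sketch of the glucose example; the two-tailed t values for N − 1 = 6 degrees of freedom (1.440 at 80%, 2.447 at 95%) are read from a standard t table:

```python
import math
import statistics

glucose = [1108, 1122, 1075, 1099, 1115, 1083, 1100]

N = len(glucose)
mean = statistics.mean(glucose)
s = statistics.stdev(glucose)

t_80, t_95 = 1.440, 2.447   # t-table values for 6 degrees of freedom

half_80 = t_80 * s / math.sqrt(N)
half_95 = t_95 * s / math.sqrt(N)

print(f"80% CI: {mean:.1f} ± {half_80:.1f}")   # 1100.3 ± 9.1
print(f"95% CI: {mean:.1f} ± {half_95:.1f}")   # 1100.3 ± 15.5
```

The 95% interval is wider than the 80% interval: more confidence that µ is bracketed requires a larger interval.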
Detection of Gross Error
 When a set of data contains an outlying result that
appears to differ excessively from the average, the
decision must be made whether to retain or reject
it. It is an unfortunate fact that no universal rule
can be invoked to settle the question of retention or
rejection.
The Q-test
 is a simple, widely used statistical test; Qexp is the absolute value of the difference between the questionable result Xq and its nearest neighbor Xn (with the results arranged in increasing or decreasing order), divided by the range or spread, w, of the entire set:

Qexp = |Xq - Xn| / w
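A sketch of the Q-test using the first data set from Exercise 2 below; the 95% critical value for N = 4 (0.829) is taken from a published Dixon Q table and is an assumption of this sketch:

```python
data = [41.27, 41.71, 41.84, 41.78]

ordered = sorted(data)        # [41.27, 41.71, 41.78, 41.84]
w = ordered[-1] - ordered[0]  # range (spread) of the entire set

# The suspect value is the one farthest from its neighbor; here it is 41.27.
x_q, x_n = ordered[0], ordered[1]
Q_exp = abs(x_q - x_n) / w

Q_crit_95 = 0.829   # Dixon Q critical value for N = 4 at 95% (from a Q table)

print(round(Q_exp, 3))                              # 0.772
print("reject" if Q_exp > Q_crit_95 else "retain")  # retain
```

Since Qexp < Qcrit, the doubtful result 41.27 is retained at the 95% level.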
Recommendation for Treatment of Outliers:

 Reexamine carefully all data relating to the outlying result to see if


a gross error could have affected its value.
 If possible, estimate the precision that can be reasonably expected
from the procedure to be sure that the outlying result actually is
questionable.
 If more data cannot be secured, apply the Q-test to the existing set to decide whether the doubtful result should be retained or rejected on statistical grounds.
 If the Q-test indicates retention, consider reporting the median of the set rather than the mean. The median has the great virtue of allowing inclusion of all data in a set without undue influence from an outlying value. In addition, the median of a normally distributed set containing 3 measurements provides a better estimate of the correct value than the mean of the set after an outlying value has been discarded.
Least-Square Method
(A tool for calibration plots)
 Most analytical methods are based on
experimentally determined/derived calibration
plots/curves in which a measured quantity, y, is
plotted as a function of x. The ordinate is the
dependent variable and the abscissa is the
independent variable. As is typical and desirable,
the plot approximates a straight line. Note, however, that indeterminate errors may arise along the way, and consequently not all the data fall exactly on the line.
Least-Square Method
(A tool for calibration plots)

Assumptions:
1. There is actually a linear relationship between the measured
variable, y, and the analyte concentration, x.

Recall: Equation of the Line: y = mx + b


where b = y-intercept (the value of y when x is zero)
m = slope of the line.

2. Any deviations of the individual points from the straight line result from error in the measurement; that is, there is no error in the x values of the points.
Least-Square Method
(A tool for calibration plots)

 The equations shown on


the right are used to
calculate the slope, m
and the intercept, b of
the line.
Application of Statistics to Data Treatment

and Evaluation
Experimentalists use statistical calculations to sharpen their judgments concerning the quality of experimental measurements. The most common applications of statistics to analytical chemistry include:
a. establishing confidence limits for the mean of a set of replicate data.
b. determining the number of replications required to decrease the confidence
limit for a mean for a given level of confidence.
c. determining at a given probability whether an experimental mean is different
from the accepted value for the quantity being measured.
d. determining at a given probability level whether two experimental means are
different.
e. determining at a given probability level whether precision of two sets of
measurements differs.
f. deciding whether an outlier is probably the result of a gross error and should
be discarded in calculating a mean.
g. defining and estimating detection limits
h. treating calibration data.
Exercises
1. Consider the following sets of replicate samples.
A B C D
69.94 0.0902 2.3 0.624
69.92 0.0884 2.6 0.613
69.80 0.0886 2.2 0.596
69.88 0.1000 2.4 0.607
69.84 0.0935 2.9 0.582

For each set, calculate the:


a. mean,
b. median,
c. range,
d. standard deviation,
e. coefficient of variation,
f. standard deviation of the mean
g. 95% confidence limit.
(Observe rules on significant figures)
Exercises
2. Apply Q-test to the following set of data and determine
whether the outlying result is retained or rejected at 95%.
What is the reported value after applying Q-Test?
 41.27, 41.71, 41.84, 41.78
 7.290, 7.284, 7.388, 7.292
 85.10, 84.72, 84.70, 84.68
Exercises
3. The sulfate ion concentration in natural water can be determined by measuring the turbidity that results when an excess of BaCl2 is added to a measured quantity of the sample. A turbidimeter, the instrument used for this analysis, was calibrated with a series of standard Na2SO4 solutions. The following data were obtained for the calibration:

Cx, mg SO42-/L      Turbidimeter reading, R
0.00                0.06
5.00                1.48
10.00               2.28
15.00               3.98
20.00               4.61

Assume that a linear relationship exists between the instrument reading and concentration.
a. Derive an equation of the line that relates Cx and R.
b. What is the sulfate concentration when the turbidimeter reading is 3.67? (Answer: 15.12 mg/L)
Exercises
4. The following data were obtained in calibrating a calcium ion electrode for the determination of pCa. A linear relationship between the potential E and pCa is known to exist.

pCa      E, mV
5.00     -53.8
4.00     -27.7
3.00     +2.7
2.00     +31.9
1.00     +65.1

a. Derive an equation of the line that relates the potential to pCa.
b. What is the pCa of a serum solution whose electrode potential is +20.3 mV? (Answer: pCa = 2.44)
[Spreadsheet trendlines for the calibration exercises]
No. 3: f(x) = 0.232x + 0.162, R² = 0.9834
No. 4: f(x) = -29.74x + 92.86, R² = 0.9985
Random Errors
Standard Deviation of
Calculated Results
Standard Deviation of Calculated Results
Standard Deviation of a Sum or Difference

 Consider a summation in which the numbers in parentheses are absolute standard deviations.
 The variance of a sum or difference is equal to the sum of the individual variances.
Standard Deviation of a Sum or Difference

 The most probable value for a standard deviation of


a sum or difference can be found by taking the
square root of the sum of the squares of the
individual absolute standard deviations. So, for the
computation:
Standard Deviation of a Sum or Difference

 Hence, the standard deviation of the result, sy, is:

sy = √( sa² + sb² + sc² )

where sa, sb, and sc are the standard deviations of the three terms making up the result.
 Substituting the standard deviations from the example gives a result that should be reported as 2.63 (±0.06).


Standard Deviation of a Product or
Quotient
 Consider the following computation where the
numbers in parentheses are again absolute standard
deviations:

 In this situation, the standard deviations of two of the


numbers in the calculation are larger than the result
itself.
 Evidently, we need a different approach for
multiplication and division.
Standard Deviation of a Product or
Quotient
 As shown in Table 6-4, the relative standard
deviation of a product or quotient is determined by
the relative standard deviations of the numbers
forming the computed result. For example, in the
case of:

 we obtain the relative standard deviation sy/y of the result by summing the squares of the relative standard deviations of a, b, and c, then calculating the square root of the sum:

sy/y = √[ (sa/a)² + (sb/b)² + (sc/c)² ]

 Applying this equation to the numerical example gives:
Standard Deviation of a Product or
Quotient
 To complete the calculation, we must find the
absolute standard deviation of the result,

 and we can write the answer and its uncertainty as 0.0104 (±0.0003).
 Note that if y is a negative number, we should treat
sy/y as an absolute value.
Standard Deviation of Calculated Results

 For a sum or a difference, the standard deviation of


the answer is the square root of the sum of the
squares of the standard deviations of the numbers
used in the calculation.

 For multiplication or division, the relative standard


deviation of the answer is the square root of the sum
of the squares of the relative standard deviations of
the numbers that are multiplied or divided.
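These two rules can be sketched in Python with hypothetical measured values (the slides' own numbers appear only in images and are not reproduced here):

```python
import math

# Hypothetical measured values and their absolute standard deviations.
a, s_a = 1.76, 0.03
b, s_b = 0.59, 0.02
c, s_c = 0.41, 0.01

# Sum or difference: the variance of the result is the sum of the variances.
y_sum = a + b - c
s_y_sum = math.sqrt(s_a**2 + s_b**2 + s_c**2)

# Product or quotient: the *relative* variance of the result is the sum of
# the squares of the relative standard deviations.
y_quot = a * b / c
s_y_quot = abs(y_quot) * math.sqrt((s_a / a)**2 + (s_b / b)**2 + (s_c / c)**2)

print(f"sum:      {y_sum:.2f} ± {s_y_sum:.2f}")
print(f"quotient: {y_quot:.2f} ± {s_y_quot:.2f}")
```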
Example
Least squares method:
A Tool for Calibration Plots
 Most analytical methods are based on a calibration curve in which a
measured quantity y is plotted as a function of the known
concentration x of a series of standards.
 As is typical (and desirable), the plot approximates a straight line.
 However, because of the indeterminate errors in the measurement
process, not all the data fall exactly on the line. Thus, the
investigator must try to draw the “best” straight line among the
points.
 A statistical technique called regression analysis provides the
means for objectively obtaining such a line and also for specifying
the uncertainties associated with its subsequent use.
 We consider here only the basic procedure for 2-dimensional data
that conform to a linear regression model: the method of least
squares.
Assumptions of the Least-Squares Methods

1. There is actually a linear relationship between the measured variable (y) and the
analyte concentration (x). The mathematical relationship that describes this
assumption is called the regression model, which may be represented as
y = mx + b
b= y-intercept (value of y when x is zero)
m = slope of the line

Figure 3.4. Slope-intercept form of a straight line

2. Any deviation of individual points from the straight line results from error in the measurement; that is, we must assume that there is no error in the x values of the points. For the example of a calibration curve, we must assume that the exact concentrations of the standards are known.
Computing Regression Coefficients and
Finding the Least-Squares Line
 A vertical deviation of each point from the straight line is called a
residual.
 The line generated by the least-squares method is one that
minimizes the sum of the squares of the residuals for all the points.
 In addition to providing the best fit between the experimental points
and the straight line, the method gives the standard deviations for m
and b.
 Definition of 3 important quantities: Sxx, Syy, Sxy

Sxx = Σ(xi - x̄)² = Σxi² - (Σxi)²/N
Syy = Σ(yi - ȳ)² = Σyi² - (Σyi)²/N
Sxy = Σ(xi - x̄)(yi - ȳ) = Σxi·yi - (Σxi·Σyi)/N
Computing Regression Coefficients and
Finding the Least-Squares Line

where xi & yi = individual pairs of data for x and y
N = number of pairs of data used in preparing the calibration curve
x̄ & ȳ = average values for the variables: x̄ = Σxi / N and ȳ = Σyi / N

6 useful quantities can be derived from Sxx, Syy and Sxy:
1. slope of the line, m:  m = Sxy / Sxx
2. intercept, b:  b = ȳ - m·x̄
3. standard deviation about regression:  Sr = √[(Syy - m²·Sxx) / (N - 2)]
4. standard deviation of the slope:  Sm = √(Sr² / Sxx)
5. standard deviation of the intercept:  Sb = Sr·√[Σxi² / (N·Σxi² - (Σxi)²)]
6. standard deviation for results obtained from the calibration curve (M = number of replicate measurements of the unknown, ȳc = their mean response):
Sc = (Sr / m)·√[1/M + 1/N + (ȳc - ȳ)² / (m²·Sxx)]
Sample Problem
(Least-squares analysis)

Carry out a least-squares analysis of the


following experimental data:
Calibration Data for the Chromatographic
Determination of Isooctane in a Hydrocarbon Mixture
Mole Percent Isooctane, xi    Peak Area, yi
0.352                         1.09
0.803                         1.78
1.08                          2.60
1.38                          3.03
1.75                          4.01
Sample Problem
(Least-squares analysis)

Mole Percent      Peak Area
Isooctane, xi     yi        xi²        yi²        xi·yi
0.352             1.09      0.12390    1.1881     0.38368
0.803             1.78      0.64481    3.1684     1.42934
1.08              2.60      1.16640    6.7600     2.80800
1.38              3.03      1.90440    9.1809     4.18140
1.75              4.01      3.06250    16.0801    7.01750
∑ =   5.365       12.51     6.90201    36.3775    15.81992
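The sums in the table feed directly into the formulas for m, b, Sr, Sm, and Sb; a sketch that reproduces the worked answers (m = 2.09, b = 0.26, Sr = 0.14, Sm = 0.13, Sb = 0.16):

```python
import math

x = [0.352, 0.803, 1.08, 1.38, 1.75]   # mole percent isooctane
y = [1.09, 1.78, 2.60, 3.03, 4.01]     # peak area
N = len(x)

x_bar = sum(x) / N
y_bar = sum(y) / N

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / N
Syy = sum(yi**2 for yi in y) - sum(y)**2 / N
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / N

m = Sxy / Sxx                                   # slope
b = y_bar - m * x_bar                           # intercept
Sr = math.sqrt((Syy - m**2 * Sxx) / (N - 2))    # std dev about regression
Sm = math.sqrt(Sr**2 / Sxx)                     # std dev of the slope
Sb = Sr * math.sqrt(sum(xi**2 for xi in x) /
                    (N * sum(xi**2 for xi in x) - sum(x)**2))  # std dev of intercept

print(round(m, 2), round(b, 2))                   # 2.09 0.26
print(round(Sr, 2), round(Sm, 2), round(Sb, 2))   # 0.14 0.13 0.16
```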
Sample Problem
(Least-squares analysis)
Sample Problem
(Least-squares analysis)
Answers: (Sxx = 1.14537; Syy = 5.07748; Sxy = 2.39669; m = 2.09; b = 0.26; equation of the least-squares line: y = 2.09x + 0.26; Sr = 0.14; Sm = 0.13; Sb = 0.16)
 2. The calibration curve determined in the previous example
was used for the chromatographic determination of isooctane
in a hydrocarbon mixture. A peak area of 2.65 was obtained.
Calculate the mole percent of isooctane and the standard
deviation for the result if the peak area was (a) the result of a
single measurement and (b) the mean of four measurements.
 Answers: ( x = 1.14 mol %;
(a) Sc = 0.074 mol % (b) 0.046 mol % )
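The same least-squares quantities give the unknown concentration and its standard deviation; a sketch (values for Sc may differ slightly from the printed answers because the slide's arithmetic rounds Sr and m before substituting):

```python
import math

x = [0.352, 0.803, 1.08, 1.38, 1.75]
y = [1.09, 1.78, 2.60, 3.03, 4.01]
N = len(x)

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / N
Syy = sum(yi**2 for yi in y) - sum(y)**2 / N
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / N
m = Sxy / Sxx
b = sum(y) / N - m * sum(x) / N
Sr = math.sqrt((Syy - m**2 * Sxx) / (N - 2))

y_c = 2.65                   # measured peak area of the unknown
x_c = (y_c - b) / m          # mole percent isooctane
print(round(x_c, 2))         # 1.14

y_bar = sum(y) / N
Sc = {}
for M in (1, 4):             # M = number of replicate measurements
    Sc[M] = (Sr / m) * math.sqrt(1/M + 1/N + (y_c - y_bar)**2 / (m**2 * Sxx))
    print(M, round(Sc[M], 3))
```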
Correlation Coefficient and Coefficient of
Determination
 Correlation coefficient – used as a measure of the correlation between two
variables.
 When x and y are correlated rather than being functionally related (i.e., are not
directly dependent upon one another), we do not speak of the “best” y value
corresponding to a given x value, but only the most “probable” value.
 The closer the observed values are to the most probable values, the more definite is
the relationship between x and y.
 The Pearson correlation coefficient is one of the most convenient to calculate. It is given by:

r = Σ(xi - x̄)(yi - ȳ) / (n·sx·sy)

where r = correlation coefficient, n = the number of observations, sx = the standard deviation of x, sy = the standard deviation of y, xi and yi are the individual values of the variables, and x̄ and ȳ are their means.
Pearson correlation coefficient, r

 The use of differences in the calculation is cumbersome, and the previous equation can be transformed to a more convenient form:

r = (Σxi·yi - n·x̄·ȳ) / √[(Σxi² - n·x̄²)(Σyi² - n·ȳ²)]
  = [n·Σxi·yi - (Σxi)(Σyi)] / √{[n·Σxi² - (Σxi)²][n·Σyi² - (Σyi)²]}

 The maximum value of r is 1. When this occurs, there is exact correlation between the two variables.
 When the value of r is zero (this occurs when the numerator is zero), the variables are completely independent.
 The minimum value of r is -1. A negative correlation coefficient indicates that the assumed dependence is opposite to what exists and would be a positive coefficient for the reversed relation.
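Applied to the isooctane calibration data from the earlier example, the convenient form reproduces the r² = 0.9877 reported by the spreadsheet fit:

```python
import math

x = [0.352, 0.803, 1.08, 1.38, 1.75]
y = [1.09, 1.78, 2.60, 3.03, 4.01]
n = len(x)

# n·Σxy − (Σx)(Σy) over the root of the product of the corrected sums of squares
num = n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)
den = math.sqrt((n * sum(xi**2 for xi in x) - sum(x)**2) *
                (n * sum(yi**2 for yi in y) - sum(y)**2))
r = num / den

print(round(r, 4))       # 0.9938
print(round(r**2, 4))    # 0.9877
```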
Pearson correlation coefficient, r , and r 2
(coeff. of determination)

 A correlation coefficient near 1 means there is a direct relationship


between two variables, e.g., absorbance and concentration.
 A correlation coefficient can be calculated for a calibration curve to
ascertain the degree of correlation between the measured instrumental
variable and the sample concentration.
 As a general rule, 0.90 < r < 0.95 indicates a fair curve, 0.95 < r < 0.99, a
good curve, and r > 0.99 indicates excellent linearity.
 A more conservative measure of closeness of fit is the square of the correlation coefficient, r², and this is what most statistical programs (including Excel) calculate.
 r², the coefficient of determination, is a better measure of goodness of fit.
 The goodness of fit is judged by the number of 9's, so three 9's (0.999) or better represents an excellent fit.
Interpretation of Least-Squares Results
 The closer the data points are to the line predicted by the least-squares analysis, the smaller the residuals are.
 The sum of the squares of the residuals, SSresid, measures the variation in the observed values of the dependent variable (y values) that is not explained by the presumed linear relationship between x and y:

SSresid = Σ[ yi - (b + m·xi) ]²

 We can also define a total sum of the squares, SStot, as:

SStot = Syy = Σ(yi - ȳ)² = Σyi² - (Σyi)²/N

SStot is a measure of the total variation in the observed values of y, because the deviations are measured from the mean value of y.
 The coefficient of determination (R²) measures the fraction of the observed variation in y that is explained by the linear relationship:

R² = 1 - (SSresid / SStot)

The closer R² is to unity, the better the linear model explains the y variation.
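Computing R² from the residuals of the isooctane calibration (slope and intercept from the least-squares fit of the earlier example) gives the same 0.9877:

```python
x = [0.352, 0.803, 1.08, 1.38, 1.75]
y = [1.09, 1.78, 2.60, 3.03, 4.01]
N = len(x)

Sxx = sum(xi**2 for xi in x) - sum(x)**2 / N
Sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / N
m = Sxy / Sxx
b = sum(y) / N - m * sum(x) / N

# Variation not explained by the line vs. total variation about the mean of y.
SS_resid = sum((yi - (b + m * xi))**2 for xi, yi in zip(x, y))
SS_tot = sum(yi**2 for yi in y) - sum(y)**2 / N   # = Syy

R2 = 1 - SS_resid / SS_tot
print(round(R2, 4))   # 0.9877
```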
Using Excel to Do Least-Squares

Mole Percent Isooctane, xi    Peak Area, yi
0.352                         1.09
0.803                         1.78
1.08                          2.60
1.38                          3.03
1.75                          4.01

Excel trendline: y = 2.0925x + 0.2567, R² = 0.9877
Statistical Treatment of Random Errors
 sample mean, x̄ – the mean of a limited sample drawn from the population of the data.

WAYS OF EXPRESSING PRECISION
 population standard deviation, σ – a measure of the precision of a population of data; (xi - µ) is the deviation of an individual result from the population mean.
 sample standard deviation, s – a measure of the precision of a sample of data; (xi - x̄) is the deviation of an individual result from the sample mean, and N - 1 is the number of degrees of freedom (the number of independent data that go into the computation of s).
Other Ways of Expressing Precision

 Average Deviation (from the mean), d – often given in scientific papers as a measure of variability:
d = Σ|xi - x̄| / n
 For a large group of normally distributed data, the average deviation approaches 0.8σ.
 Relative Average Deviation (%) = (d / x̄) × 100%
 Relative Average Deviation (ppt) = (d / x̄) × 1000
 Coefficient of Variation (CV) or %RSD = (s / x̄) × 100%
 Variance = s²
 Relative Standard Deviation (RSD) in ppt = (s / x̄) × 1000
 Range or Spread, w = highest value - lowest value
 Standard deviation of the mean (standard error of the mean), sm = s / √n
Sample Problem
1. The normality of a solution is determined by 4 separate titrations, the results being 0.2041, 0.2049, 0.2039, and 0.2043. Calculate the mean, median, range, average deviation, relative average deviation in ppt, standard deviation, coefficient of variation, and standard deviation of the mean.
Answers: (0.2043 N ; 0.2042 N; 0.0010 ; 0.0003; 1.5 ppt ;
0.0004; 0.2%; 0.0002)
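The answers can be verified with a short sketch built from the definitions above:

```python
import math
import statistics

N_vals = [0.2041, 0.2049, 0.2039, 0.2043]   # normalities from 4 titrations
n = len(N_vals)

mean = statistics.mean(N_vals)
median = statistics.median(N_vals)
w = max(N_vals) - min(N_vals)                      # range (spread)
d = sum(abs(x - mean) for x in N_vals) / n         # average deviation
rad_ppt = d / mean * 1000                          # relative average deviation
s = statistics.stdev(N_vals)                       # sample standard deviation
cv = s / mean * 100                                # coefficient of variation, %
s_m = s / math.sqrt(n)                             # std deviation of the mean

print(round(mean, 4), round(median, 4))           # 0.2043 0.2042
print(round(w, 4), round(d, 4))                   # 0.001 0.0003
print(round(rad_ppt, 1))                          # 1.5
print(round(s, 4), round(cv, 1), round(s_m, 4))   # 0.0004 0.2 0.0002
```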
