Evaluation of Analytical Data
“It is impossible to perform a chemical
analysis that is totally free of errors, or
uncertainties. All one can hope is to minimize
these errors and to estimate their size with
acceptable accuracy.”
Errors in Chemical Analysis
Every measurement is influenced by many
uncertainties, which combine to produce a scatter of
results.
Measurement uncertainties can never be completely
eliminated, so the true value for any quantity is
always unknown.
Therefore, it would be wrong to say that blood
testing is “error-free” when presenting data in court.
Remember: data of unknown quality are worthless.
Measures of Central Tendency
Chemists usually carry three to five replicates
(portions) of a sample through an analytical
procedure.
Replicate samples, or replicates, are portions of a
material of approximately the same size that are
carried through an analytical procedure at the same
time and in the same way.
Individual results from a set of measurements are
seldom the same, so a central or “best” value is used
for the set.
1. Mean, x̄
arithmetic mean or average
the quantity obtained by dividing the sum of
replicate measurements (xi) by the number of
measurements (N) in the set.
Mathematically speaking:
x̄ = (Σ xi) / N = (x1 + x2 + x3 + x4 + … + xN) / N, the sum taken over i = 1 to N
2. Median, M
middle value of a sample of results arranged in
order of increasing/decreasing magnitude.
odd number of results
take the middle value
even number of results
take the mean of the two middle values
Example 1
Calculate the mean and the median for the following data:
20.3, 19.4, 19.8, 20.1, 19.6 , 19.5
Ideally, the mean and the median are identical.
Frequently they are not, particularly when the
number of measurements in the set is small.
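The mean and median in Example 1 can be verified with a short Python sketch (illustrative only):

```python
import statistics

# Example 1 data
data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]

mean = sum(data) / len(data)      # arithmetic mean
median = statistics.median(data)  # N is even, so: mean of the two middle values

print(f"mean   = {mean:.2f}")    # 19.78
print(f"median = {median:.2f}")  # 19.70
```

Note that the mean (19.78) and median (19.70) differ slightly, as the text above anticipates for a small set.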
Illustration
Tell whether accuracy is high or low and precision is high or low.
(Four target diagrams; X represents the true value, the dots represent the replicates.)
Panel 1: low accuracy, low precision
Panel 2: low accuracy, high precision
Panel 3: high accuracy, low precision
Panel 4: high accuracy, high precision
Example 2
Determine the absolute error and the relative error (in
% and ppt) for the mean in Example 1 above, given
that the true value is 20.0.
Answer:
Absolute error, E = -0.2
Relative error, Er = -1% = -10 ppt
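The answer can be reproduced with a short Python sketch; the absolute error is rounded to one decimal place before the relative error is computed, matching the figures quoted above:

```python
data = [20.3, 19.4, 19.8, 20.1, 19.6, 19.5]
true_value = 20.0

mean = sum(data) / len(data)      # 19.78 (from Example 1)
E = round(mean - true_value, 1)   # absolute error, rounded: -0.2
Er_pct = E / true_value * 100     # relative error in percent
Er_ppt = E / true_value * 1000    # relative error in parts per thousand

print(f"E = {E}, Er = {Er_pct:.0f}% = {Er_ppt:.0f} ppt")  # E = -0.2, Er = -1% = -10 ppt
```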
Note: Accuracy measures the agreement between a
result and its true value. Precision describes the
agreement among several results that have been
measured in the same way.
Types of Errors
1. Random / Indeterminate Error
causes data to be scattered more or less symmetrically around a mean value.
In general, then, the random error in a measurement is reflected
by its precision.
2. Systematic / Determinate Error
causes the mean of a set of data to differ from the accepted value.
causes the results in a series of replicate measurements to be all high or low.
3. Gross Error
occurs only occasionally, is often large, and may cause a result to be either
high or low.
leads to outliers, results that obviously differ significantly from the rest of
the data in a set of replicate measurements.
Sources of Systematic Errors
a. Instrumental error
caused by imperfection of measuring devices and
instabilities in their power supplies.
glassware used at temperatures that differ from its
calibration temperature
distortion of container walls
errors in the original calibration
contaminants on the inner surface of the containers
b. Methodic error
arises from non-ideal chemical or physical behavior of
analytical systems.
due to slowness of some reactions
incompleteness of a reaction
instability of some species
nonspecificity of the reagents
possible occurrence of side reactions
c. Personal error
results from the carelessness, inattention or personal limitations of
the experimenter, e.g.:
estimating the level of the liquid between two scale divisions
judging the color of the solution at the end point in a titration
Classification of Systematic Errors:
a. constant error
independent of the magnitude of the measured quantity (the magnitude of the
error stays essentially the same as the size of the quantity measured is varied).
becomes less significant as the magnitude of the measurement increases.
e.g. a constant end-point error of 0.10 mL
sample 1: 10.0 mL titrant:
Er = (0.10 mL / 10.0 mL) x 100% = 1.0 %
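Repeating this calculation for larger titrant volumes shows how a constant error becomes less significant as the magnitude increases (Python sketch; volumes other than the slide's 10.0 mL case are chosen for illustration):

```python
constant_error_mL = 0.10  # constant end-point error (from the slide)

for volume_mL in (10.0, 25.0, 50.0, 100.0):
    Er = constant_error_mL / volume_mL * 100  # relative error in percent
    print(f"{volume_mL:5.1f} mL titrant -> Er = {Er:.2f} %")
# 10.0 mL gives 1.00 %, while 100.0 mL gives only 0.10 %
```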
b. proportional error
increases or decreases in proportion to the size of
the sample taken for analysis.
common cause of this error is the presence of
interfering contaminants in the sample.
Illustration: Determination of copper based upon the reaction of
copper(II) ion with potassium iodide to give iodine.
The quantity of iodine is measured and is proportional to the amount
of copper. If iron(III) is also present, high results are observed for
the percentage of copper, because the iodine produced is a measure
of both the copper(II) and the iron(III) in the sample.
Detection of Systematic Error:
a. systematic instrument error
usually found and corrected by calibration.
periodic calibration of the equipment is always desirable because
the response of most instruments changes with time as a result of
wear, corrosion and mistreatment.
b. systematic personal error
can be minimized by self-discipline.
a good habit is to check the instrument readings, notebook entries,
and calculations systematically.
c. systematic methodic error
each analytical method has its own biases, which are often difficult to detect.
Detection of Systematic Error:
One or more of the following steps can be used to recognize and
adjust for a systematic error in an analytical method:
a. analysis of standard samples
the analysis of the standard reference materials, SRM
(materials that contain one or more analytes with exactly
known concentration levels.)
the standard material can be purchased or sometimes prepared
by synthesis, but unfortunately this is often impossible or so
difficult and time consuming that the approach is not
practical.
Standard Reference Materials (SRM)
Statistical Treatment of Random Errors
population mean, µ = (Σ xi) / N, where N → ∞
sample mean, x̄ = (Σ xi) / N, where N is finite
Measures of Precision
population standard deviation, σ – a measure of
the precision of a population of data and is
mathematically given by:
σ = √[ Σ (xi − µ)² / N ]
sample standard deviation, s – a measure of the
precision of a sample of data and is mathematically
given by:
s = √[ Σ (xi − x̄)² / (N − 1) ]
standard deviation of the mean, sm
sm = s / √N
Other ways of expressing precision:
1. variance, s²
s² = Σ (xi − x̄)² / (N − 1)
2. Coefficient of Variation, CV, and Relative Standard Deviation, RSD
CV = (s / x̄) x 100%
RSD = (s / x̄) x 1000 ppt
3. spread or range, w
the difference between the largest value and the
smallest in the set of data.
w = highest value – lowest value
(Normal error curves) The standard deviation for curve B is twice that for curve A;
that is, σB = 2σA.
In (a) the abscissa is the deviation from the mean (x − µ) in the units of
measurement.
In (b) the abscissa is the deviation from the mean in units of σ. For this plot, the
two curves A and B are identical.
Example 3
The following results were obtained in the replicate
determination of the lead content of a blood sample:
0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb.
Calculate the mean, standard deviation, standard
deviation of the mean and relative standard
deviation.
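A Python sketch of Example 3 (the statistics module's stdev uses the N − 1 denominator, i.e. the sample standard deviation defined above):

```python
import math
import statistics

data = [0.752, 0.756, 0.752, 0.751, 0.760]  # ppm Pb

mean = statistics.mean(data)        # x-bar
s = statistics.stdev(data)          # sample standard deviation (N - 1 denominator)
s_mean = s / math.sqrt(len(data))   # standard deviation of the mean
rsd_ppt = s / mean * 1000           # relative standard deviation in ppt

print(f"mean = {mean:.4f} ppm")    # 0.7542 ppm
print(f"s    = {s:.4f} ppm")       # 0.0038 ppm
print(f"sm   = {s_mean:.4f} ppm")  # 0.0017 ppm
print(f"RSD  = {rsd_ppt:.0f} ppt") # 5 ppt
```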
Reliability of s as a Measure of Precision
Confidence interval for µ: CI = x̄ ± t·s / √N
The value of t
depends on the
desired confidence
level and on the
number of degrees of
freedom ( N – 1 ) in
the calculation of s.
Example: Confidence Limit
Determine the 80% and 95% confidence
intervals for the following glucose readings:
1108, 1122, 1075, 1099, 1115, 1083, 1100
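A Python sketch for this example, using CI = x̄ ± t·s/√N with two-tailed t-table values for N − 1 = 6 degrees of freedom (t = 1.440 at 80% and t = 2.447 at 95%):

```python
import math
import statistics

data = [1108, 1122, 1075, 1099, 1115, 1083, 1100]

mean = statistics.mean(data)
s = statistics.stdev(data)       # sample standard deviation
sm = s / math.sqrt(len(data))    # standard deviation of the mean

# two-tailed t values for 6 degrees of freedom, from a t table
for level, t in ((80, 1.440), (95, 2.447)):
    half_width = t * sm
    print(f"{level}% CI: {mean:.1f} +/- {half_width:.1f}")
```

As expected, the 95% interval (about ±15.5) is wider than the 80% interval (about ±9.1): higher confidence requires a wider range.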
Detection of Gross Error
When a set of data contains an outlying result that
appears to differ excessively from the average, the
decision must be made whether to retain or reject
it. It is an unfortunate fact that no universal rule
can be invoked to settle the question of retention or
rejection.
The Q-test
is a simple, widely used statistical test; Qexp is the
absolute value of the difference between the questionable
result Xq and its nearest neighbor Xn (provided that the
results are arranged in increasing or decreasing order),
divided by the range or spread, w, of the entire set:
Qexp = |Xq − Xn| / w
If Qexp > Qcrit (the tabulated critical value), the
questionable result may be rejected.
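A minimal Q-test sketch in Python; the data set is hypothetical, and Qcrit = 0.710 is the tabulated critical value for N = 5 at the 95% confidence level:

```python
def q_test(data, qcrit):
    """Dixon Q-test applied to the most extreme value of the data set."""
    xs = sorted(data)
    w = xs[-1] - xs[0]           # range (spread) of the entire set
    # question whichever end value is farther from its nearest neighbor
    gap_low = xs[1] - xs[0]
    gap_high = xs[-1] - xs[-2]
    q_exp = max(gap_low, gap_high) / w
    return q_exp, q_exp > qcrit  # reject only if Qexp > Qcrit

# hypothetical replicate data; 56.23 looks suspicious
data = [55.95, 56.00, 56.04, 56.08, 56.23]
q_exp, reject = q_test(data, qcrit=0.710)  # Qcrit for N = 5 at 95% CL
print(f"Qexp = {q_exp:.3f}, reject outlier: {reject}")
```

Here Qexp ≈ 0.536 < 0.710, so the suspect value is retained.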
Recommendation for Treatment of Outliers:
Statistical Data Treatment and Evaluation
Experimentalists use statistical calculations to sharpen their judgments concerning
the quality of experimental measurements. The most common applications of
statistics to analytical chemistry include:
a. establishing confidence limits for the mean of a set of replicate data.
b. determining the number of replications required to decrease the confidence
limit for a mean for a given level of confidence.
c. determining at a given probability whether an experimental mean is different
from the accepted value for the quantity being measured.
d. determining at a given probability level whether two experimental means are
different.
e. determining at a given probability level whether precision of two sets of
measurements differs.
f. deciding whether an outlier is probably the result of a gross error and should
be discarded in calculating a mean.
g. defining and estimating detection limits
h. treating calibration data.
Exercises
1. Consider the following sets of replicate samples.
A B C D
69.94 0.0902 2.3 0.624
69.92 0.0884 2.6 0.613
69.80 0.0886 2.2 0.596
69.88 0.1000 2.4 0.607
69.84 0.0935 2.9 0.582
(Chart for No. 3: plot of the data; y-axis 0–3, x-axis 0–25.)
(Chart for No. 4: scatter plot with least-squares line f(x) = −29.74x + 92.86, R² = 0.9985.)
Standard Deviation of Calculated Results
Standard Deviation of a Sum or Difference
For y = a + b − c,
sy = √(sa² + sb² + sc²)
where sa, sb, and sc are the standard deviations of the three terms
making up the result.
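A numerical sketch of this propagation rule (Python, with hypothetical values and standard deviations):

```python
import math

# hypothetical values for y = a + b - c, each with its standard deviation
a, sa = 1.76, 0.03
b, sb = 1.89, 0.02
c, sc = 0.59, 0.02

y = a + b - c
sy = math.sqrt(sa**2 + sb**2 + sc**2)  # absolute std devs combine in quadrature

print(f"y = {y:.2f} +/- {sy:.2f}")  # 3.06 +/- 0.04
```

Note that sy (0.04) is larger than any single contribution but smaller than their plain sum (0.07), which is the point of adding in quadrature.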
1. There is actually a linear relationship between the measured variable (y) and the
analyte concentration (x). The mathematical relationship that describes this
assumption is called the regression model, which may be represented as
y = mx + b
b= y-intercept (value of y when x is zero)
m = slope of the line
2. Any deviation of individual points from the straight line results from error in
the measurement, that is, we must assume, that there is no error in the x values
of the points. For the example of a calibration curve, we must assume that exact
concentrations of the standards are known.
Computing Regression Coefficients and
Finding the Least-Squares Line
A vertical deviation of each point from the straight line is called a
residual.
The line generated by the least-squares method is one that
minimizes the sum of the squares of the residuals for all the points.
In addition to providing the best fit between the experimental points
and the straight line, the method gives the standard deviations for m
and b.
Definition of 3 important quantities Sxx, Syy, Sxy:
Sxx = Σ (xi − x̄)² = Σ xi² − (Σ xi)²/n
Syy = Σ (yi − ȳ)² = Σ yi² − (Σ yi)²/n
Sxy = Σ (xi − x̄)(yi − ȳ) = Σ xiyi − (Σ xi)(Σ yi)/n
The use of differences in calculation is cumbersome, and the previous equation can
be transformed to a more convenient form:
r = (Σ xiyi − n·x̄·ȳ) / √[ (Σ xi² − n·x̄²)(Σ yi² − n·ȳ²) ]
  = (n·Σ xiyi − Σ xi·Σ yi) / √[ (n·Σ xi² − (Σ xi)²)(n·Σ yi² − (Σ yi)²) ]
The maximum value of r is 1. When this occurs, there is exact correlation between
the two variables.
When the value of r is zero (this occurs when Sxy is equal to zero), there is complete
independence of the variables.
The minimum value of r is −1. A negative correlation coefficient indicates that the
assumed dependence is opposite to what exists; it would therefore be a positive
coefficient for the reversed relation.
Pearson correlation coefficient, r , and r 2
(coeff. of determination)
The closer the data points are to the line predicted by the least-squares analysis, the
smaller are the residuals.
The sum of the squares of the residuals, SSresid , measures the variation in the observed
values of the dependent variable (y values) that are not explained by the presumed linear
relationship between x and y.
SSresid = Σ [ yi − (b + m·xi) ]²
We can also define a total sum of the squares, SStot, as
SStot = Syy = Σ (yi − ȳ)² = Σ yi² − (Σ yi)²/N
SStot is a measure of the total variation in the observed values of y, because the
deviations are measured from the mean value of y.
The coefficient of determination (R²) measures the fraction of the observed
variation in y that is explained by the linear relationship and is given by
R² = 1 − (SSresid / SStot)
The closer R² is to unity, the better the linear model explains the y variation.
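The quantities Sxx, Syy, and Sxy, the regression coefficients, and r and R² can be combined in one Python sketch (the calibration data here are hypothetical):

```python
import math

# hypothetical calibration data
x = [0.0, 2.0, 4.0, 6.0, 8.0]       # standard concentrations
y = [0.03, 0.38, 0.80, 1.18, 1.62]  # measured signals

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

m = Sxy / Sxx                    # slope of the least-squares line
b = y_bar - m * x_bar            # y-intercept
r = Sxy / math.sqrt(Sxx * Syy)   # Pearson correlation coefficient

ss_resid = sum((yi - (b + m * xi)) ** 2 for xi, yi in zip(x, y))
r2 = 1 - ss_resid / Syy          # coefficient of determination, SStot = Syy

print(f"m = {m:.4f}, b = {b:.4f}, r = {r:.4f}, R^2 = {r2:.4f}")
```

For an unweighted straight-line fit, R² equals r², a useful consistency check on the computation.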
Using Excel to Do Least-Squares
(Chart: scatter plot of the data below with a linear trendline, Series1.)
x: 1.08, 1.38, 1.75
y: 2.60, 3.03, 4.01