Lecture 2-Data Analysis - Part 1


Analytical Chemistry

Chapter 2
Statistics in Analytical Chemistry

Instructor: Nguyen Thao Trang


Semester I 2016-2017
Outline
• Errors in chemical analysis
– Important terms
– Significant figures
– Systematic errors
– Random errors

• Statistical treatment of random errors
– Gaussian distribution
– Error propagation
– Confidence interval

Introduction
• All measurements involve errors and uncertainties.
• Example: Errors involved in a titration

Image sources: chem.uiuc.edu, chem-ilp.net

– Difference in the color of the solution at the endpoint: caused by the
experimenter.
– Difference in the volume of titrant used: caused by personal error, faulty
calibration of the buret, …

Introduction
• Measurement errors are an inherent part of the quantized world
we live in → it is impossible to perform a chemical analysis that
is free of errors and uncertainties.

• Errors and uncertainties need to be minimized in a chemical
analysis. Frequent calibration, analysis of known samples, and
careful technique will lessen some errors.

• The size of an error must be determined to judge whether it is acceptable.

Important terms
• Mean: X̄ is the numerical average:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where Xi is the ith measurement, and n is the number of
independent measurements.

Important terms
• Median:
– Xmed is the middle value when the data are ordered from the smallest to
the largest value.
– Odd number of measurements: the median is the middle value.
– Even number of measurements: the median is the average of the (n/2)th and
the (n/2 + 1)th ordered values, where n is the number of measurements.

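The two definitions above can be checked with a short sketch. It assumes the six replicate iron results (in ppm) that appear later in these notes and uses Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

# Six replicate results (ppm Fe) from the iron example used later in these notes.
results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]

x_bar = mean(results)    # sum of the values divided by n
x_med = median(results)  # n is even, so the 3rd and 4th ordered values are averaged

print(f"mean   = {x_bar:.2f} ppm")
print(f"median = {x_med:.2f} ppm")
```

Because n = 6 is even, the median is the average of 19.6 and 19.8.
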
Important terms
• Precision: refers to the closeness of results
obtained from identical measurements →
describes reproducibility.

• Accuracy: describes how close a measurement is
to the true value and is expressed by the error.

• Precision and accuracy: are both achieved
when results are close to each other and to the
true value.

Significant figures
• Significant figures: the number of digits reported in a
measurement reflects the accuracy of the measurement and
the precision of the measurement device.

• Significant figures are all certain figures plus one extra figure
having some uncertainty.
• Example:

Significant figures
• Rule 1: Disregard all initial zeros; all remaining digits, including
terminal zeros and zeros between nonzero digits, are
significant.

• Examples: Determine the number of significant figures of
a. 0.005
b. 0.030
c. 0.207
d. 92500

Significant figures
• Rule 2: For addition and subtraction, the smallest number of
digits to the right of the decimal point sets the significance.
• Examples:

  1.362          22.989 770
+ 3.111        + 35.453
  4.473          58.442 770 → 58.443

In the second sum the last three digits (770) are not significant, so the result is rounded up to 58.443.

Rule for rounding to drop all insignificant digits:
round up for digits ≥ 5, round down for digits < 5.

• Exercises:
1) Round to 3 significant figures: 0.135 2; 0.021 674
2) Write the answer with the correct number of digits: 12.3 – 1.63 = ;
1.021 + 1.63 =

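Rule 2 can be sketched in code as follows. This is a minimal illustration, assuming the terms are passed as decimal strings so that trailing zeros are preserved; the helper name `add_with_sig_rule` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def add_with_sig_rule(*terms: str) -> Decimal:
    """Add decimal strings and round to the fewest decimal places among the terms."""
    values = [Decimal(t) for t in terms]
    # The exponent is negative for digits after the decimal point,
    # e.g. Decimal("35.453") has exponent -3.
    fewest_places = max(v.as_tuple().exponent for v in values)
    total = sum(values, Decimal(0))
    # ROUND_HALF_UP matches the slide's rule: round up for digits >= 5.
    return total.quantize(Decimal(1).scaleb(fewest_places), rounding=ROUND_HALF_UP)

print(add_with_sig_rule("22.989770", "35.453"))
print(add_with_sig_rule("1.362", "3.111"))
```

The first call reproduces the rounding of 58.442 770 to three decimal places shown in the example.
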
Significant figures
• Rule 3: For multiplication and division, the smallest number of
significant digits determines the significance.
• Examples:

  3.26 × 10^-5          34.60
× 1.78                ÷ 2.462 87
  5.80 × 10^-5          14.05

• Exercise:
Write the answer with the correct number of digits: 4.34 × 9.2 = 39.928

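A sketch of Rule 3, assuming values are entered as decimal strings so that the digits actually written can be counted. The helpers `sig_figs` and `round_sig` are hypothetical names, not part of any library.

```python
from decimal import Decimal, ROUND_HALF_UP

def sig_figs(d: Decimal) -> int:
    """Count significant figures as written (leading zeros are not stored)."""
    return len(d.as_tuple().digits)

def round_sig(d: Decimal, n: int) -> Decimal:
    """Round d to n significant figures, rounding up for digits >= 5."""
    if d == 0:
        return d
    # adjusted() is the exponent of the leading digit, e.g. -5 for 5.8028E-5.
    quantum = Decimal(1).scaleb(d.adjusted() - n + 1)
    return d.quantize(quantum, rounding=ROUND_HALF_UP)

a, b = Decimal("3.26e-5"), Decimal("1.78")
n = min(sig_figs(a), sig_figs(b))   # the factor with the fewest significant figures
result = round_sig(a * b, n)
print(result)
```

The trailing zero in the printed value is kept on purpose: it is the third significant figure.
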
Significant figures
• Rule 4:
– Number of digits in the mantissa of log x = number of significant figures in
x
• Example:

– Number of digits in antilog x (10^x) = number of significant figures in
the mantissa of x:
• Example:

• Exercises: write each result with the correct number of significant figures:
log 0.001 237 = ? ; log 3.2 = ?
antilog 4.37 = ? ; 10^2.600 = ?

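Rule 4 can be illustrated with a small sketch. The inputs 339 and −3.42 are illustrative values chosen here, not taken from the slides, and the two helper names are hypothetical.

```python
import math

def log10_with_sig(x: float, sig_figs_in_x: int) -> str:
    """log10(x) written with as many mantissa digits as x has significant figures."""
    return f"{math.log10(x):.{sig_figs_in_x}f}"

def antilog10_with_sig(x: float, mantissa_digits: int) -> str:
    """10**x written with as many significant figures as the mantissa of x has digits."""
    return f"{10 ** x:.{mantissa_digits - 1}e}"

log_text = log10_with_sig(339, 3)             # 339 has 3 significant figures
antilog_text = antilog10_with_sig(-3.42, 2)   # the mantissa ".42" has 2 digits
print(log_text, antilog_text)
```

The digit before the decimal point of a logarithm (the characteristic) only locates the decimal point of x, which is why it is not counted.
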
Errors
• Absolute error E: the absolute error in the measurement of a quantity X is
given by the equation:

$$E = X_i - X_t$$

where Xt is the true or accepted value.

– Example: Results from 6 replicate determinations of iron in aqueous
samples of a standard solution containing 20.0 ppm iron(III).

1st: 19.4; 2nd: 19.5; 3rd: 19.6; 4th: 19.8; 5th: 20.1; 6th: 20.3.

• Absolute error of the 4th replicate:
E = 19.8 - 20.0 = - 0.2 ppm
• Absolute error of the 5th replicate:
E = 20.1 - 20.0 = 0.1 ppm

– The sign is retained when stating the absolute error.

Errors
• Relative error Er: is often a more useful quantity than the absolute
error.

$$E_r = \frac{X_i - X_t}{X_t} \times 100\%$$

where Xt is the true or accepted value.

– Example: Results from six replicate determinations of iron in aqueous
samples of a standard solution containing 20.0 ppm iron(III).

1st: 19.4; 2nd: 19.5; 3rd: 19.6; 4th: 19.8; 5th: 20.1; 6th: 20.3
Mean = 19.8
Relative error for the mean:
Er = (19.8 - 20.0) × 100%/20.0 = - 1%

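The absolute and relative errors for the iron example can be reproduced with a few lines. This is a sketch using the data quoted above; the function names are illustrative, and the relative error is taken with respect to the true value of 20.0 ppm.

```python
results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # ppm Fe, replicate results
x_true = 20.0                                    # ppm Fe, accepted value

def absolute_error(x: float, true_value: float) -> float:
    """E = x - x_true; the sign is kept."""
    return x - true_value

def relative_error_percent(x: float, true_value: float) -> float:
    """Er = (x - x_true) / x_true * 100%."""
    return (x - true_value) / true_value * 100.0

x_mean = sum(results) / len(results)
e_4th = absolute_error(results[3], x_true)
er_mean = relative_error_percent(x_mean, x_true)

print(f"E (4th replicate) = {e_4th:+.1f} ppm")
print(f"Er (mean)         = {er_mean:+.0f} %")
```
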
Errors
• Results can be precise without being accurate or accurate
without being precise.

Fundamentals of analytical chemistry, Skoog, D. A
Errors
• Every measurement has some uncertainty, called
experimental error.
• Experimental error is classified as systematic or random.
• Systematic errors:
– Also called determinate errors, they arise from a flaw in the equipment or
the design of an experiment. If you conduct the experiment again in
exactly the same manner, the error is reproducible.
– In principle, systematic error can be discovered and corrected.
– Example: a solution of known pH 7.20 gives a measured pH of 7.38 → the
reading is 0.18 unit too high. When you read a pH of 7.00, the actual pH of the
sample is ?

Image source: www.twinklinghope.wordpress.com

Systematic errors
• Three types of systematic errors:

– Instrumental errors: are caused by nonideal instrument behavior, by
faulty calibrations, or by use under inappropriate conditions.
Calibration or proper use eliminates most systematic errors of this
type.

– Method errors: arise from nonideal chemical or physical behavior of
analytical systems. Errors inherent in a method are often difficult to
detect and are thus the most serious of the three types of systematic
error.

– Personal errors: result from the carelessness, inattention, or personal
limitations of the experimenter.

Random errors
• Random errors:
– Also called indeterminate errors, they arise from uncontrolled variables in
the measurement.
– Random error has an equal chance of being positive or negative.
– It is always present and very difficult to correct.
– Example: Reading a scale

58.? (58.2, 58.3 or 58.4)

Gross errors
• Gross errors:
– Gross errors differ from indeterminate and determinate errors. They
usually occur only occasionally, are often large, and may cause a result
to be either high or low.

– They are often the product of human error.

– Example: Loss of precipitate before weighing → low result; touching a
weighing bottle with bare hands after zeroing the balance → high mass reading.

– Gross errors lead to outliers, results that appear to differ markedly
from all other data in a set of replicate measurements.

– Statistical tests can be performed to determine if a result is an outlier.

Statistical treatment of random errors
• Random or indeterminate errors exist in every measurement.

• They can never be totally eliminated and are often the major source of
uncertainty in a determination.

• The accumulated effect of the individual uncertainties causes
replicate measurements to fluctuate randomly around the
mean of the set.

Statistical treatment of random errors
• Distribution of random errors:
– Example: Calibration of a 10 mL pipet, replicated 50 times.

→ The distribution of replicate data from most quantitative analytical
experiments approaches that of the Gaussian curve (bell-shaped curve):

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$

where µ is the population mean and σ is the standard deviation.

Statistical treatment of random errors
• Statistical analysis is based on the assumption that random
errors in analytical results follow a Gaussian, or normal,
distribution.

• A population is the collection of all measurements of interest;
it can be real and finite, or hypothetical or conceptual.

• A population is characterized by taking a sample from it.

• The larger the number of measurements in the sample, the more closely
their distribution approaches the normal curve.

Properties of Gaussian curve
• Difference between sample mean X̄ and population mean µ:

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$
where N represents the number of measurements in the sample set.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
where N represents the number of measurements in the population.

• When no systematic error is present, the population mean is also the true
value.
• When the number of measurements in the sample set is small, X̄ differs
from µ.
• The probable difference between X̄ and µ decreases as the
number of measurements making up the sample increases.

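The last point can be illustrated with a small simulation. This is only a sketch: the population parameters (µ = 10.00, σ = 0.02) and the helper name `typical_gap` are assumptions made for the illustration, not values from the slides.

```python
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
MU, SIGMA = 10.00, 0.02   # hypothetical population mean and standard deviation

def typical_gap(n: int, trials: int = 400) -> float:
    """Average |sample mean - MU| over many simulated samples of size n."""
    gaps = []
    for _ in range(trials):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        gaps.append(abs(statistics.fmean(sample) - MU))
    return statistics.fmean(gaps)

for n in (3, 30, 300):
    print(f"N = {n:3d}: typical |mean - mu| = {typical_gap(n):.4f}")
```

The printed gap should shrink roughly in proportion to 1/√N, which is the behavior the confidence interval formulas later in the chapter rely on.
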
Properties of Gaussian curve
• Population standard deviation σ: is a measure of the precision
of a population of data.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

where N represents the number of data points that make up the population.

If $z = \frac{X - \mu}{\sigma}$, then $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-z^2/2}$

Properties of Gaussian curve
• Area under the Gaussian curve: the area between a pair of limits gives
the probability that a measured value falls between those limits.
– Example: calculate the probability of a measured value within ± σ of µ.

$$\text{area} = \int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}\, dx = \int_{-1}^{+1} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz = 0.683$$

~ 68.3% of the values will lie within ± σ (z = ± 1)
~ 99.7% of the values will lie within ± 3σ (z = ± 3)

Properties of Gaussian curve
• The area under the entire Gaussian curve = 1 → 100% of the values
making up the population will lie within ±∞.

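The areas quoted above can be verified numerically. For a Gaussian, the area between −z and +z equals erf(z/√2); the sketch below uses only Python's standard `math` module, and the function name is illustrative.

```python
import math

def area_within(z: float) -> float:
    """Area under the standard Gaussian curve between -z and +z."""
    return math.erf(z / math.sqrt(2.0))

for z in (1, 2, 3):
    print(f"within ±{z}σ: {area_within(z):.3f}")
```
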
Properties of Gaussian curve

Quantitative chemical analysis, Daniel Harris

Sample standard deviation
• Sample standard deviation s (absolute standard deviation):

$$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\sum_{i=1}^{N} d_i^2}{N-1}}$$

• Where (Xi − X̄) represents the deviation di of value Xi from the
mean X̄.
• (N − 1) is the number of degrees of freedom.
• Alternative expression of s:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1}} = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N-1}}$$

Sample standard deviation
• Sample standard deviation s:
– Example:

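As an illustration, the sketch below evaluates s for the six iron results used earlier in these notes, once from the definition, once from the alternative expression, and once with the standard library for comparison. The choice of data set is an assumption made here for the example.

```python
import math
import statistics

results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # ppm Fe, replicate results
n = len(results)
x_bar = sum(results) / n

# Definition: squared deviations from the mean, divided by N - 1.
s_def = math.sqrt(sum((x - x_bar) ** 2 for x in results) / (n - 1))

# Alternative expression: uses the sum of squares and the square of the sum.
s_alt = math.sqrt((sum(x * x for x in results) - sum(results) ** 2 / n) / (n - 1))

# Library check: statistics.stdev also divides by N - 1.
s_lib = statistics.stdev(results)

print(f"s (definition)  = {s_def:.3f} ppm")
print(f"s (alternative) = {s_alt:.3f} ppm")
print(f"s (library)     = {s_lib:.3f} ppm")
```

The alternative expression subtracts two large, nearly equal numbers, so it is more sensitive to rounding; intermediate values should be carried with extra digits when it is used by hand.
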
Sample standard deviation
• Pooling data to increase the reliability of s:
– The pooled estimate of σ, s_pooled, is a weighted average of the individual
estimates:

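A minimal sketch of pooling, assuming the commonly used form in which the squared deviations of every data set from its own mean are summed and divided by the total number of measurements minus the number of data sets. The data sets and the function name `pooled_std` are hypothetical.

```python
import math

def pooled_std(datasets: list[list[float]]) -> float:
    """Pooled standard deviation of several data sets measured by the same method."""
    sum_sq = 0.0
    n_total = 0
    for data in datasets:
        m = sum(data) / len(data)                  # each set uses its own mean
        sum_sq += sum((x - m) ** 2 for x in data)
        n_total += len(data)
    # One degree of freedom is lost for every data set pooled.
    return math.sqrt(sum_sq / (n_total - len(datasets)))

# Hypothetical replicate results for three different samples.
sets = [
    [10.1, 10.3, 10.2],
    [15.6, 15.4, 15.5, 15.7],
    [12.0, 12.2],
]
s_pooled = pooled_std(sets)
print(f"s_pooled = {s_pooled:.3f}")
```

Pooling only makes sense when all sets are assumed to share the same underlying σ, for example replicate analyses of similar samples by one method.
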
Sample standard deviation
• Variance (s²): can be used to describe the precision of the
data.

$$s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N-1} = \frac{\sum_{i=1}^{N} d_i^2}{N-1}$$

• Relative standard deviation (RSD):

$$RSD = \frac{s}{\bar{X}}$$

– The result is often expressed in ppt (parts per thousand):

$$RSD \text{ in ppt} = \frac{s}{\bar{X}} \times 1000 \text{ ppt}$$

– The result is also expressed in percent, as the coefficient of variation (CV):

$$CV = \frac{s}{\bar{X}} \times 100\%$$

• Spread or range (w): describes the precision of a set of
replicate results.

$$w = \text{largest value} - \text{smallest value}$$

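These precision measures can be computed together. The sketch below again assumes the six iron results from earlier in the notes as sample data.

```python
import statistics

results = [19.4, 19.5, 19.6, 19.8, 20.1, 20.3]  # ppm Fe, replicate results

x_bar = statistics.fmean(results)
s = statistics.stdev(results)          # sample standard deviation (N - 1)
variance = statistics.variance(results)

rsd = s / x_bar                        # relative standard deviation (a fraction)
rsd_ppt = rsd * 1000                   # parts per thousand
cv = rsd * 100                         # coefficient of variation, percent
w = max(results) - min(results)        # spread or range

print(f"s^2 = {variance:.3f}, RSD = {rsd_ppt:.1f} ppt, CV = {cv:.2f} %, w = {w:.1f} ppm")
```
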
Error propagation
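• Addition/subtraction:

If $y = a + b - c$, then $s_y = \sqrt{s_a^2 + s_b^2 + s_c^2}$

• Example:
Standard deviation of the result:

• Multiplication/Division:

If $y = a \times b / c$, then $\frac{s_y}{y} = \sqrt{\left(\frac{s_a}{a}\right)^2 + \left(\frac{s_b}{b}\right)^2 + \left(\frac{s_c}{c}\right)^2}$

• Example:
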
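Both rules can be sketched in a few lines. The numerical values below are illustrative and are not taken from the slides; the helper names are hypothetical.

```python
import math

def std_sum(*stds: float) -> float:
    """s_y for y = a + b - c: absolute standard deviations add in quadrature."""
    return math.sqrt(sum(s ** 2 for s in stds))

def rel_std_product(*pairs: tuple[float, float]) -> float:
    """s_y / y for y = a * b / c: relative standard deviations add in quadrature."""
    return math.sqrt(sum((s / value) ** 2 for value, s in pairs))

# y = 0.50 (±0.02) + 4.10 (±0.03) - 1.97 (±0.05)
y_sum = 0.50 + 4.10 - 1.97
s_sum = std_sum(0.02, 0.03, 0.05)
print(f"y = {y_sum:.2f} ± {s_sum:.2f}")

# y = 4.10 (±0.02) × 0.0050 (±0.0001) / 1.97 (±0.04)
y_prod = 4.10 * 0.0050 / 1.97
rel = rel_std_product((4.10, 0.02), (0.0050, 0.0001), (1.97, 0.04))
print(f"y = {y_prod:.4f} ± {y_prod * rel:.4f}")
```

Note that subtraction does not reduce the uncertainty: the variances add regardless of the sign of each term.
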
Error propagation
• Exponential:

If $y = a^x$, then $\frac{s_y}{y} = x\,\frac{s_a}{a}$
(the exponent x can be considered free of uncertainty).

• Example:

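A short sketch of the exponential rule, with hypothetical numbers (a = 2.00 ± 0.02, x = 3). The absolute value of x is used so that the result stays positive for negative exponents, which is an addition to the slide's formula.

```python
def power_std(a: float, s_a: float, x: float) -> tuple[float, float]:
    """Return y = a**x and its standard deviation, treating x as exact."""
    y = a ** x
    rel = abs(x) * s_a / a       # s_y / y = x * (s_a / a)
    return y, abs(y) * rel

y, s_y = power_std(2.00, 0.02, 3)
print(f"y = {y:.2f} ± {s_y:.2f}")
```

The relative uncertainty is multiplied by x rather than combined in quadrature because the same quantity a appears in every factor, so its errors are fully correlated.
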
Error propagation
• Logarithm and antilogarithm:

If $y = \log x$, then $s_y = \frac{1}{\ln 10}\,\frac{s_x}{x} \approx 0.434\,29\,\frac{s_x}{x}$

If $y = 10^x$, then $\frac{s_y}{y} = (\ln 10)\, s_x \approx 2.302\,6\, s_x$

• Examples:

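The two rules can be sketched as follows. The input values are hypothetical and the function names are illustrative.

```python
import math

def log10_std(x: float, s_x: float) -> tuple[float, float]:
    """y = log10(x); s_y = s_x / (x ln 10)."""
    return math.log10(x), s_x / (x * math.log(10))

def antilog10_std(x: float, s_x: float) -> tuple[float, float]:
    """y = 10**x; s_y / y = ln(10) * s_x."""
    y = 10 ** x
    return y, y * math.log(10) * s_x

y_log, s_log = log10_std(2.00e-4, 0.02e-4)
y_anti, s_anti = antilog10_std(1.200, 0.003)
print(f"log:     y = {y_log:.3f} ± {s_log:.3f}")
print(f"antilog: y = {y_anti:.2f} ± {s_anti:.2f}")
```

For a logarithm the absolute uncertainty of y depends on the relative uncertainty of x; for an antilogarithm it is the other way around.
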
Confidence intervals (CI)
• The confidence interval for the mean is the range of values within
which the population mean µ is expected to lie with a certain
probability.
• Example: it is 99% probable that the true population mean for a
set of calcium measurements lies in the interval 7.25% ±
0.15% Ca. Thus, the mean should lie in the interval from
7.10% to 7.40% Ca with 99% probability.

• 99% → confidence level
• 7.10% to 7.40% → confidence interval
• 7.10%, 7.40% → confidence limits

[Figure: % calcium (Ca) scale marked at 7.10%, 7.25%, and 7.40%; 99% chance that the true value lies in this interval]

CI when σ is known or s is a good approximation of σ

• For a single measurement:
– CI for µ = X ± zσ (z comes from the area under the Gaussian curve)
– % confidence is the % area defined by ± z.

z = ± 0.67 → 50% probability that µ will fall in the interval X ± 0.67σ
z = ± 2.58 → 99% probability that µ will fall in the interval X ± 2.58σ

– The probability that a result is outside the confidence
interval is often called the significance level.

CI when σ is known or s is a good approximation of σ
– Values for z at various confidence levels are listed in Table 7-1.

• For a series of measurements:
– CI for µ, for the experimental mean X̄ of N measurements:

$$\text{CI for } \mu = \bar{X} \pm \frac{z\sigma}{\sqrt{N}}$$

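When the table is not at hand, z for a given confidence level can be computed from the inverse of the Gaussian cumulative distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the function name `z_for` is illustrative.

```python
from statistics import NormalDist

def z_for(confidence: float) -> float:
    """Two-sided z value: the central area between -z and +z equals `confidence`."""
    tail = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)

for level in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_for(level):.2f}")
```
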
CI when σ is known or s is a good approximation of σ

• Example 1: Determine the 80% and 95% confidence intervals
for (a) the first entry (1108 mg/L glucose) and (b) the mean
value for month 1. Assume that in each part, s = 19 is a good
estimate of σ.

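Part (a) can be worked through with a short sketch. It uses only the values stated in the example (x = 1108 mg/L, σ ≈ s = 19); the function name is illustrative. Part (b) follows from the same function by passing the month 1 mean and its number of measurements as `n`, which are not reproduced in these notes.

```python
import math
from statistics import NormalDist

def ci_known_sigma(x: float, sigma: float, level: float, n: int = 1) -> tuple[float, float]:
    """Confidence interval x ± z·σ/√n when σ is known (n = 1 for a single result)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return x - half_width, x + half_width

for level in (0.80, 0.95):
    low, high = ci_known_sigma(1108.0, 19.0, level)
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} mg/L (± {(high - low) / 2:.1f})")
```
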
CI when σ is known or s is a good approximation of σ

• Example 2: How many replicate measurements in month 1 are
needed to decrease the 95% confidence interval to 1100.3 ±
10.0 mg/L of glucose?

→ 14 measurements are needed to provide a slightly better than
95% chance that the population mean will lie within ± 10 mg/L
of the experimental mean.

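The result follows from setting zσ/√N equal to the desired half-width and solving for N, then rounding up to a whole number of measurements. A sketch, assuming σ ≈ s = 19 mg/L as in Example 1:

```python
import math
from statistics import NormalDist

sigma = 19.0        # mg/L, taken as a good estimate of σ (from Example 1)
half_width = 10.0   # mg/L, desired ± range
level = 0.95

z = NormalDist().inv_cdf(0.5 + level / 2.0)
n_exact = (z * sigma / half_width) ** 2   # solve z·σ/√N = half_width for N
n_needed = math.ceil(n_exact)             # round up to whole measurements

print(f"N = {n_exact:.1f} → {n_needed} measurements")
```

Rounding up is why the chance is slightly better than 95%.
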
CI when σ is unknown
• Often, limitations in time or in the amount of available sample
prevent us from making enough measurements to assume that s is a good
estimate of σ.
• Use the statistical parameter t (Student's t), which is defined in
exactly the same way as z except that s is substituted for σ.
• For a single measurement with result x:

$$t = \frac{x - \mu}{s}$$

• For the mean of N measurements:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{N}}$$

• CI for the mean of N replicate measurements:

$$\text{CI for } \mu = \bar{x} \pm \frac{t s}{\sqrt{N}}$$

Note: t depends on the desired confidence level and the number of degrees
of freedom (N − 1) in the calculation of s.

CI when σ is unknown
• Values of t at different degrees of freedom and confidence
levels:

CI when σ is unknown
• Example 1: A chemist obtained the following data for the
alcohol content of a sample of blood: % C2H5OH: 0.084, 0.089,
and 0.079. Calculate the 95% confidence interval for the mean
assuming:
(a) The three results obtained are the only indication of the precision of
the method

CI when σ is unknown
(b) From previous experience on hundreds of samples, we know that the
standard deviation of the method, s = 0.005% C2H5OH, is a good
estimate of σ

→ A sure knowledge of σ (± 0.006%, as compared to ± 0.012%
when σ is unknown) can decrease the confidence interval by a
significant amount.

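Both parts of the example can be reproduced with a short sketch. The small dictionary of 95% t values is an assumption introduced here from standard Student's t tables (it is not reproduced from the slide's table), and the variable names are illustrative.

```python
import math
import statistics
from statistics import NormalDist

data = [0.084, 0.089, 0.079]   # % C2H5OH
n = len(data)
x_bar = statistics.fmean(data)
s = statistics.stdev(data)

# Two-sided 95% Student's t values, keyed by degrees of freedom
# (standard table values, listed here as an assumption).
T_95 = {1: 12.7, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57}

# (a) σ unknown: use t with N - 1 degrees of freedom and the sample s.
half_t = T_95[n - 1] * s / math.sqrt(n)

# (b) σ known (0.005 % from long experience): use z instead of t.
z_95 = NormalDist().inv_cdf(0.975)
half_z = z_95 * 0.005 / math.sqrt(n)

print(f"(a) {x_bar:.3f} ± {half_t:.3f} % C2H5OH")
print(f"(b) {x_bar:.3f} ± {half_z:.3f} % C2H5OH")
```

With only two degrees of freedom, t is more than twice z, which accounts for the wider interval in part (a).
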
