College of Health and Allied Professions

Benigno S. Aquino Drive, Bacolod City

STATISTICS IN ANALYTICAL CHEMISTRY

Modern analytical chemistry is concerned with the detection, identification, and measurement of
the chemical composition of unknown substances using existing instrumental techniques, and the
development or application of new techniques and instruments. It is a quantitative science, meaning
that the desired result is almost always numeric.

Quantitative results are obtained using devices or instruments that allow us to determine the
concentration of a chemical in a sample from an observable signal. There is always some variation
in that signal over time due to noise and/or drift within the instrument. We also need to calibrate the
response as a function of analyte concentration in order to obtain meaningful quantitative data. As
a result, there is always an error, a deviation from the true value, inherent in that measurement. One
of the uses of statistics in analytical chemistry is therefore to provide an estimate of the likely value
of that error; in other words, to establish the uncertainty associated with the measurement.

We can use statistical methods to evaluate the random errors. Generally, we base statistical
analyses on the assumption that random errors in analytical results follow a Gaussian, or normal,
distribution, such as that illustrated in the next figure.

Figure 1. Normal distribution curve.
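For reference, the curve in Figure 1 is usually written as the following density. This is the standard textbook form, not an equation taken from this handout; the symbols 𝜇 (center of the distribution) and 𝜎 (its width) are the population mean and population standard deviation.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right]
```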

Statistical analysis only reveals information that is already present in a data set. No new information
is created by statistical treatments. Statistical methods do allow you to categorize and characterize
data in different ways and to make objective and intelligent decisions about data quality and
interpretation.

Sample and Population

Typically, in a scientific study, we infer information about a population or universe from observations
made on a subset or sample.

The population is the collection of all measurements of interest and must be carefully defined by the
experimenter. In some cases, the population is finite and real, while in others, the population is
hypothetical or conceptual in nature.

JSEspanola/ MSAlfaras /JDJavier/JMBedrio Page 1 of 4


In many of the cases encountered in analytical chemistry, the population is conceptual. For example, in
determining glucose in the blood of a patient, we could hypothetically make an extremely large
number of measurements if we used the entire blood supply. The subset of the population analyzed
in this case is the sample. Again, we infer characteristics of the population from those obtained with
the sample. Therefore, it is very important to define the population being characterized.

Parameter and Statistic

The term parameter refers to quantities such as population mean 𝜇 and population standard
deviation 𝜎 that define a population or distribution. Data values such as 𝑥 are variables. The term
statistic refers to an estimate of a parameter that is made from a sample of data.

Population Mean and Sample Mean

The sample mean 𝑥̅ is the arithmetic average of a limited sample drawn from a population of data.
The sample mean is defined as the sum of the measurement values divided by the number of
measurements.

The population mean 𝜇, in contrast, is the true mean for the population. The equation is just the same
as that of the sample mean except that the denominator represents the total number of
measurements in the population.
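The two definitions above can be written out as follows. This is a sketch in standard notation, assuming 𝑁 denotes the number of measurements in the sample for the first equation and the total number of measurements in the population for the second.

```latex
\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}
\qquad\text{(sample mean)}

\mu = \frac{\sum_{i=1}^{N} x_i}{N}
\qquad\text{(population mean)}
```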

If there is no systematic error in the population, the population mean is also the true value for the
measured quantity.

Median

The median is the middle value when we order our data from the smallest to the largest value. When
the data set has an odd number of values, the median is the middle value. For an even number of values,
the median is the average of the two middle values, that is, the (n/2)th and the (n/2 + 1)th values,
where n is the size of the data set.
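The odd and even cases can be sketched in a few lines of Python. The function name and the two data sets are hypothetical, chosen only to illustrate the rule.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value
        return ordered[mid]
    # Even n: average of the (n/2)th and (n/2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical replicate readings (not from the handout)
odd_set = [10.0, 12.0, 11.0, 15.0, 9.0]
even_set = [10.0, 12.0, 11.0, 15.0]

print(median(odd_set))   # 11.0
print(median(even_set))  # 11.5
```

Python's standard library offers the same calculation as `statistics.median`.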

Population Standard Deviation, 𝝈

The population standard deviation, which is a measure of the precision of the population, is given
by summing the squares of the deviations from the mean, dividing by the number of measurements
𝑁, and taking the square root of the result:

\[
\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}
\]

The square of the standard deviation, 𝜎², is also important. This quantity is called the variance.

Sample Standard Deviation, 𝒔

The previous equation must be modified when it is applied to a small sample of data. Thus, the
sample standard deviation 𝑠 is given by the equation:

\[
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}
\]
The quantity 𝑁 − 1 is the number of degrees of freedom. When 𝑁 − 1 is used instead of 𝑁, the sample
variance 𝑠² is an unbiased estimator of the population variance 𝜎², and 𝑠 is a better estimate of the
population standard deviation 𝜎. If this substitution is not made, the calculated 𝑠 will on average be
smaller than the true standard deviation 𝜎; that is, 𝑠 will have a negative bias.
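The negative bias can be checked exactly on a very small case. The sketch below assumes a hypothetical three-value population and enumerates every possible sample of size 2 (drawn with replacement), then averages the variance computed with each divisor. It works with variances rather than standard deviations because that is where the comparison comes out exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, stored as exact fractions to avoid rounding
population = [Fraction(1), Fraction(2), Fraction(3)]


def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


# Population variance: divide by N
pop_var = sum_sq_dev(population) / len(population)

# All 9 ordered samples of size 2, drawn with replacement
samples = list(product(population, repeat=2))

# Average sample variance with each choice of divisor
avg_var_n_minus_1 = sum(sum_sq_dev(s) / (len(s) - 1) for s in samples) / len(samples)
avg_var_n = sum(sum_sq_dev(s) / len(s) for s in samples) / len(samples)

print(pop_var, avg_var_n_minus_1, avg_var_n)  # 2/3 2/3 1/3
```

Dividing by 𝑁 − 1 reproduces the population variance on average, while dividing by 𝑁 gives a value that is too small.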
Example 1: The following results were obtained in the replicate determination of the lead content
of a blood sample: 0.752, 0.756, 0.752, 0.751, and 0.760 ppm Pb. Find the mean and the standard
deviation of this set of data.

Solution:

To calculate the sample mean:


\[
\bar{x} = \frac{0.752 + 0.756 + 0.752 + 0.751 + 0.760}{5} = 0.754\ \text{ppm}
\]

To calculate the sample standard deviation, we can create a table:

Sample    𝒙𝒊       𝒙𝒊 − 𝒙̅
1         0.752    0.752 - 0.754 = -0.002
2         0.756    0.756 - 0.754 =  0.002
3         0.752    0.752 - 0.754 = -0.002
4         0.751    0.751 - 0.754 = -0.003
5         0.760    0.760 - 0.754 =  0.006

\[
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}
  = \sqrt{\frac{(-0.002)^2 + (0.002)^2 + (-0.002)^2 + (-0.003)^2 + (0.006)^2}{5 - 1}}
\]

\[
s = 0.004\ \text{ppm}
\]

Alternative Solution (Calculator Shortcut):

Most scientific calculators can perform basic statistical operations. On your calculator, press
MODE, then select STAT and, if options are available, choose 1-VAR. A data table will be
displayed.

In the table, enter the raw data as given in the problem, pressing the “=” key after each
entry. Then press AC.

Look for the STAT key on your calculator; on most models, it is reached by pressing Shift + 1. Select
Var, then select 𝑥̅ to get the sample mean. This should give 0.7542, or 0.754 ppm.

To get the sample standard deviation, press STAT again, select Var, then select 𝑠𝑥. This gives
0.00377, or 0.004 ppm.
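The same check can be made in software. The following is a minimal sketch using Python's standard `statistics` module, whose `stdev` function uses the 𝑁 − 1 divisor; the variable names are illustrative.

```python
import statistics

# Replicate lead results from Example 1, in ppm Pb
lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]

mean = statistics.mean(lead_ppm)   # sample mean
s = statistics.stdev(lead_ppm)     # sample standard deviation (N - 1 divisor)

print(f"mean = {mean:.4f} ppm")    # mean = 0.7542 ppm
print(f"s    = {s:.5f} ppm")       # s    = 0.00377 ppm
```

Rounded to three decimal places, these agree with the hand calculation: 0.754 ppm and 0.004 ppm.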

When you make statistical calculations, remember that, because of the uncertainty in 𝑥̅ , a sample
standard deviation may differ significantly from the population standard deviation. As N becomes
larger, 𝑥̅ and 𝑠 become better estimators of 𝜇 and 𝜎.

Sample Variance, 𝒔𝟐

The variance is just the square of the standard deviation. The sample variance 𝑠² is an estimate of
the population variance 𝜎² and is given by:

\[
s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}
\]

Note that the standard deviation has the same units as the data, while the variance has the units of
the data squared. Scientists tend to use standard deviation rather than variance because it is easier
to relate a measurement and its precision if they both have the same units.
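As an illustration of the units, the sketch below computes both quantities for the Example 1 lead data with Python's standard `statistics` module (variable names are illustrative). The standard deviation comes out in ppm, while the variance is in ppm squared.

```python
import statistics

# Replicate lead results from Example 1, in ppm Pb
lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]

s = statistics.stdev(lead_ppm)       # sample standard deviation, ppm
s2 = statistics.variance(lead_ppm)   # sample variance, ppm^2 (N - 1 divisor)

print(f"s   = {s:.5f} ppm")          # s   = 0.00377 ppm
print(f"s^2 = {s2:.2e} ppm^2")       # s^2 = 1.42e-05 ppm^2
```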

Relative Standard Deviation, 𝑹𝑺𝑫

Frequently standard deviations are given in relative rather than absolute terms. Calculate the relative
standard deviation (RSD) by dividing the standard deviation by the mean value of the data set.

\[
RSD = \frac{s}{\bar{x}}
\]

Coefficient of Variation, 𝑪𝑽

The RSD multiplied by 100% is called the coefficient of variation (CV).

\[
CV = RSD \times 100\% = \frac{s}{\bar{x}} \times 100\%
\]
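Applied to the Example 1 lead data, the RSD and CV can be computed as in this short sketch. It uses Python's standard `statistics` module, and the variable names are illustrative.

```python
import statistics

# Replicate lead results from Example 1, in ppm Pb
lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]

mean = statistics.mean(lead_ppm)
s = statistics.stdev(lead_ppm)

rsd = s / mean      # relative standard deviation (dimensionless)
cv = rsd * 100      # coefficient of variation, in percent

print(f"RSD = {rsd:.4f}")   # RSD = 0.0050
print(f"CV  = {cv:.2f}%")   # CV  = 0.50%
```

Because the units cancel, the RSD and CV allow the precision of data sets with different magnitudes to be compared directly.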

Spread or Range, 𝒘

The spread, or range, is another term that is sometimes used to describe the precision of a set of
replicate results. It is the difference between the largest value in the set and the smallest.
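For the Example 1 lead data, the spread is simply the largest result minus the smallest, as in this brief sketch (variable names are illustrative):

```python
# Replicate lead results from Example 1, in ppm Pb
lead_ppm = [0.752, 0.756, 0.752, 0.751, 0.760]

# Spread (range): largest value minus smallest value
w = max(lead_ppm) - min(lead_ppm)

print(f"w = {w:.3f} ppm")   # w = 0.009 ppm
```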

EXERCISE

Consider the following sets of replicate measurements:


A      B        C        D      E        F
9.5    55.35    0.612    5.7    20.63    0.972
8.5    55.32    0.592    4.2    20.65    0.943
9.1    55.20    0.694    5.6    20.64    0.986
9.3             0.700    4.8    20.51    0.937
9.1                      5.0             0.954

For each set, calculate the (a) mean, (b) median, (c) spread or range, (d) standard deviation, and (e)
coefficient of variation.
