
Data Analysis for Physics Laboratory

Dr M. Lloyd and Dr A. Markwick September 2015

These notes cover the most important points from the PHYS 10181B Data Analysis
course. They are intended as an aide-memoire, and do not replace the e-learning module
lecture notes. Further material can be found in textbooks, such as Practical Physics
by G.L. Squires.

It is good laboratory practice, which helps with data analysis, to keep good records of
everything that could possibly be relevant, recording all of your readings in ink, directly
into a neat laboratory notebook.

Standard errors
When we measure some quantity, our measured value, x, is very unlikely to be exactly
equal to the true value, X, of that quantity. It is useful to have a good estimate of how
close to the true value any given measurement is likely to be. This is called the Standard
Error. Although we use the word error, this does not imply that the measurement has
been done badly, or incorrectly. It is better to think of the standard error as a
description of the uncertainty inherent in the measurement. The value of the standard error
does not just apply to our one measurement, but will be the same for any measurement
of the same quantity using the same apparatus and technique.

Uncertainties don't represent absolute limits: for example, a result $x \pm \sigma$ implies 68.3%
confidence that the true value is in the range $x - \sigma$ to $x + \sigma$, and 95.5% confidence that
the true value lies in the range $x - 2\sigma$ to $x + 2\sigma$.
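
These confidence figures are simply areas under a Gaussian curve. As a quick numerical check (a minimal sketch using scipy, not part of the original notes):

```python
from scipy.stats import norm

# Fraction of a Gaussian distribution lying within +/-1 and +/-2
# standard deviations of its mean.
p1 = norm.cdf(1) - norm.cdf(-1)   # ~0.683
p2 = norm.cdf(2) - norm.cdf(-2)   # ~0.954
print(f"within 1 sigma: {p1:.3f}, within 2 sigma: {p2:.3f}")
```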

There are two important types of errors: random errors due to factors such as the
intrinsic accuracy of the apparatus; and systematic errors, which cause measurements
to deviate from true values in a systematic way.

Repeated measurements and the Gaussian (normal) distribution


Consider a sample of $n$ repeated measurements, $x_i$, of the same quantity, each with the
same standard error, $\sigma$. The best estimate of the true value is the average, $\bar{x}$. The
mean squared deviation (or variance), $s^2$, of the sample is a measure of the spread of
the distribution of measurements:
\[ s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 . \]
The standard deviation (the root mean squared deviation), s, of the sample is just the
square root of this:
\[ s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} . \]
We can calculate the values of $\bar{x}$ and $s$ directly from our sample of measurements $x_i$,
and use them to obtain our best estimate, with uncertainty, of the true value of $X$.

The sample represents a finite set of measurements from an underlying distribution


which, in the ideal case, is a Gaussian (also called normal) distribution:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x - X)^2}{2\sigma^2} \right) . \]
The quantity $X$ is the mean value of the Gaussian distribution and $\sigma$ is the standard
deviation of this distribution. It represents the standard error on a single measurement, i.e.
it is related to the precision of the apparatus and the care with which the measurements
are made.
Although we cannot measure $X$ or $\sigma$ directly, we can use our value of $\bar{x}$ as our best
estimate of $X$, and a good estimate of $\sigma$ may be obtained from the standard deviation
of the finite sample of $n$ measurements, using:
\[ \sigma \approx \sqrt{\frac{n}{n-1}}\, s . \]
In practice, the number of measurements, $n$, is usually large enough that $\sqrt{n/(n-1)}$
can be taken as unity. In other words, the standard deviation of the finite sample of
measurements is a very good approximation to the standard deviation of the underlying
(infinite) Gaussian distribution.

The standard error on our average value, $\bar{x}$, is given by the standard deviation of the
distribution (i.e. the standard error on an individual measurement) divided by the square
root of the number of measurements in our sample:
\[ \sigma_m = \frac{\sigma}{\sqrt{n}} = \frac{s}{\sqrt{n-1}} . \]
From such a set of measurements, the quoted result would therefore be $\bar{x} \pm \sigma_m$.

From the point of view of the design of an experiment involving repeated measurement,
the final precision therefore depends on both the precision of each measurement and the
number of measurements taken.
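
As an illustration, the mean, standard deviation and standard error on the mean can be computed directly from a set of repeated readings (a minimal sketch in Python; the readings are invented for illustration):

```python
import numpy as np

# Hypothetical repeated readings of the same quantity (e.g. a period in s).
x = np.array([2.31, 2.29, 2.33, 2.30, 2.32, 2.28, 2.31, 2.30])
n = len(x)

mean  = x.mean()             # best estimate of the true value X
s     = x.std(ddof=0)        # standard deviation of the sample (divide by n)
sigma = x.std(ddof=1)        # estimate of sigma of the parent distribution (divide by n-1)
sig_m = sigma / np.sqrt(n)   # standard error on the mean, equal to s/sqrt(n-1)
print(f"x = {mean:.3f} +/- {sig_m:.3f}")
```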

Estimating the error on a single measurement


Most often, one doesn't make many measurements of the same quantity. When only
one, or a few, measurements are taken (or repeated measurement would just give the
same answer) we need to estimate the uncertainty. One method is to ask yourself: over
what range of values am I almost certain (say 95% confidence) that the true value lies?
That range would correspond to $\pm 2\sigma$, i.e. the full range is four times the uncertainty.
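
For instance (an invented illustration): if you are about 95% confident that a length lies between 10.2 cm and 10.6 cm, the uncertainty is roughly a quarter of that range:

```python
# Hypothetical 95%-confidence limits on a length, in cm.
lo, hi = 10.2, 10.6
sigma = (hi - lo) / 4.0      # the 95% range spans roughly 4 sigma
best  = (hi + lo) / 2.0
print(f"{best:.2f} +/- {sigma:.2f} cm")   # 10.40 +/- 0.10 cm
```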

Fitting functions to data


One often investigates the dependence of a quantity (a dependent variable y) on another
quantity (an independent variable x) which is deliberately varied in definite steps.

For a set of $N$ measurements $(x_i, y_i \pm \sigma_i)$ compared to a theoretical curve, $y = f(x)$, the
value of chi-squared is given by:
\[ \chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i)}{\sigma_i} \right)^2 . \]
This value is a measure of how likely it is that these measured data points come from a
physical system which behaves according to the function $y = f(x)$. In a fit to the data, in
which some parameters of $f(x)$ are allowed to vary (usually by a computer), the best fit
corresponds to the set of parameter values which gives the minimum value of chi-squared.

The number of degrees of freedom, $N_{\rm dof}$, is equal to the number of data points, $N$,
minus the number of free parameters in the fit. A fit to a straight line, $y = mx + c$, has
two free parameters, $m$ and $c$.

For a good fit, the value of chi-squared will be approximately equal to the number of
degrees of freedom. As a rule of thumb, values of chi-squared in the range from half to
twice $N_{\rm dof}$ may be regarded as indicating an acceptable fit if $N_{\rm dof}$ is less than about
20. For larger $N_{\rm dof}$, values of $\chi^2/N_{\rm dof}$ in the range $1 - \sqrt{8/N_{\rm dof}}$ to $1 + \sqrt{8/N_{\rm dof}}$
would be acceptable. A good fit tells us that this particular set of data points could
reasonably have been measured from a physical system governed by this function. Note
that there may also be other functions or parameter values which give a good fit.
Values outside of these acceptable ranges usually imply that something is wrong, either
with the uncertainties $\sigma_i$, where an under-estimate would give a high chi-squared and
an over-estimate would give a low chi-squared, or with the assumed function $f(x)$. Always
look at a plot of your data points and best-fit function to help you to interpret your
calculated value of chi-squared sensibly.
Often, the normalised value of chi-squared ($\chi^2/N_{\rm dof}$) is used; this is referred to as the
reduced chi-squared, or $\chi^2_r$.
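
A minimal sketch of such a fit for a straight line, using scipy (not part of the original notes; the data points and uncertainties are invented, and curve_fit with weights minimises the same weighted sum of squares as chi-squared):

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, c):
    return m * x + c

# Hypothetical measurements y_i with uncertainties sigma_i at chosen x_i.
x     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y     = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma = np.array([0.2, 0.2, 0.3, 0.2, 0.3])

# Minimising chi-squared is equivalent to this weighted least-squares fit.
popt, pcov = curve_fit(line, x, y, sigma=sigma, absolute_sigma=True)
m, c = popt

chi2 = np.sum(((y - line(x, *popt)) / sigma) ** 2)
ndof = len(x) - len(popt)      # N data points minus 2 free parameters
print(f"m = {m:.2f} +/- {np.sqrt(pcov[0, 0]):.2f}, "
      f"c = {c:.2f} +/- {np.sqrt(pcov[1, 1]):.2f}")
print(f"chi2 = {chi2:.1f} for {ndof} degrees of freedom "
      f"(reduced chi2 = {chi2 / ndof:.2f})")
```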

Combining or propagating errors


A set of independently measured quantities $x, y, z, \ldots$, each with uncertainties ($\sigma_x$, $\sigma_y$, $\sigma_z$,
...), may be combined to get the value of some function $f = f(x, y, z, \ldots)$ and its
uncertainty ($\sigma_f$). The general formula for combining errors involves the partial derivatives
of the function:
\[ \sigma_f^2 = \left( \frac{\partial f}{\partial x} \right)^2 \sigma_x^2
             + \left( \frac{\partial f}{\partial y} \right)^2 \sigma_y^2
             + \left( \frac{\partial f}{\partial z} \right)^2 \sigma_z^2 + \ldots \]
The terms represent the contribution to the uncertainty in $f$ from the uncertainty in each
of the measured variables. The contributions are added in quadrature. (For complicated
functions, rather than working out tricky partial derivatives, it may be easier to find
these contributions by varying the value of each measurement in turn by its uncertainty,
while keeping all other quantities fixed, and recalculating the value of $f$.)
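
The numerical approach described in the parenthesis can be sketched as follows (a minimal illustration; the function and measured values are invented):

```python
import numpy as np

def f(x, y, z):
    # Hypothetical function of three measured quantities.
    return x * y / z

# Measured values and their uncertainties (invented for illustration).
vals   = {"x": 4.0, "y": 2.5, "z": 5.0}
sigmas = {"x": 0.1, "y": 0.1, "z": 0.2}

f0 = f(**vals)
contributions = {}
for name in vals:
    shifted = dict(vals)
    shifted[name] += sigmas[name]        # vary one input by its uncertainty
    contributions[name] = f(**shifted) - f0

# Add the individual contributions in quadrature.
sigma_f = np.sqrt(sum(c ** 2 for c in contributions.values()))
print(f"f = {f0:.3f} +/- {sigma_f:.3f}")
```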

The formula leads to some simple rules, such as the following (a short numerical check is sketched after the list):


1. When independent measurements are added or subtracted, then the absolute errors
are combined in quadrature; for example, if $f = x + y - z$, then
\[ \sigma_f^2 = \sigma_x^2 + \sigma_y^2 + \sigma_z^2 . \]

2. When independent quantities are multiplied or divided, then the fractional (or
percentage) errors are combined in quadrature. For example, if $f = xy/z$, then
\[ \left( \frac{\sigma_f}{f} \right)^2 = \left( \frac{\sigma_x}{x} \right)^2
                                       + \left( \frac{\sigma_y}{y} \right)^2
                                       + \left( \frac{\sigma_z}{z} \right)^2 . \]

3. When a measured quantity is raised to a power, the fractional error is multiplied
by the (modulus of the) power. For example, if $f = x^{3/2}$, then
\[ \frac{\sigma_f}{f} = \frac{3}{2} \frac{\sigma_x}{x} . \]

4. For the natural log function $f = \ln x$, then $\sigma_f = \sigma_x / x$.
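
As a quick numerical check of these rules against the general approach, rule 3 can be verified with invented numbers (a minimal sketch):

```python
import numpy as np

# Hypothetical measurement, used to check rule 3 for f = x**1.5.
x, sx = 4.0, 0.1

f = x ** 1.5
frac_rule   = 1.5 * sx / x                      # rule 3: (3/2) * sigma_x / x
frac_direct = ((x + sx) ** 1.5 - f) / f         # vary x by sigma_x and recompute f
print(f"fractional error: rule {frac_rule:.4f}, direct variation {frac_direct:.4f}")
```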

The weighted average


For a set of $n$ measurements, $x_i$, of the same quantity, each with a different error, $\sigma_i$,
the best estimate of the true value is the weighted average, given by:
\[ W_{\rm mean} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]

where $w_i$ is the weight of a measurement, $x_i$, given by:
\[ w_i = \frac{1}{\sigma_i^2} . \]
The standard error on the weighted average is given by:
\[ \sigma(W_{\rm mean}) = \frac{1}{\sqrt{\sum_i w_i}} . \]
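
A minimal sketch of these formulae in Python (the measurements and errors are invented for illustration):

```python
import numpy as np

# Hypothetical measurements of the same quantity with different errors.
x     = np.array([9.81, 9.79, 9.84])
sigma = np.array([0.02, 0.05, 0.03])

w      = 1.0 / sigma**2                # weights w_i = 1/sigma_i^2
w_mean = np.sum(w * x) / np.sum(w)     # weighted average
w_err  = 1.0 / np.sqrt(np.sum(w))      # standard error on the weighted average
print(f"{w_mean:.3f} +/- {w_err:.3f}")
```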
Counting Experiments and the Poisson distribution
Many types of processes involving discrete occurrences, such as radioactive decay or
supernova explosions, follow a Poisson distribution. This distribution applies in situations
where we are counting events, and each individual event is rare and happens randomly.
The Poisson distribution has the form
\[ f(N, \lambda) = \frac{\lambda^N e^{-\lambda}}{N!} \]
where $N$ is the number of events counted and $\lambda$ is the average rate at which these events
are expected to occur.
The expectation value (i.e. the mean) is $\lambda$, and for this distribution function, the
variance is also $\lambda$. This means that the standard deviation of the Poisson distribution
is $\sqrt{\lambda}$.

The shape of the Poisson distribution is not symmetric; however, as the value of $\lambda$
increases it becomes very like the Gaussian distribution, and for values of $\lambda$ above about
10, it is reasonable to treat it as a (discrete) Gaussian distribution. In an experiment
where $N$ counts are recorded, a statistical error of $\sqrt{N}$ may be assigned, to allow for the
random fluctuations in the counting rate. So our best estimate of $\lambda$ is $N \pm \sqrt{N}$.
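
A minimal sketch of assigning a counting error, and of how the Poisson shape approaches a Gaussian at large $\lambda$ (not part of the original notes; the counts and mean rate are invented):

```python
import numpy as np
from scipy.stats import poisson, norm

# Hypothetical counting experiment: N counts recorded in one interval.
N = 144
print(f"best estimate of lambda: {N} +/- {np.sqrt(N):.1f}")   # N +/- sqrt(N)

# For a reasonably large mean the Poisson distribution looks Gaussian.
lam = 25
k = np.arange(10, 41)
p_poisson = poisson.pmf(k, lam)
p_gauss   = norm.pdf(k, loc=lam, scale=np.sqrt(lam))
print(f"largest difference at lambda={lam}: "
      f"{np.max(np.abs(p_poisson - p_gauss)):.4f}")
```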
