
Standard deviation

http://en.wikipedia.org/wiki/Standard_deviation, From Wikipedia, the free encyclopedia


In probability and statistics, the standard deviation of a probability distribution, random variable,
or population or multiset of values is a measure of the spread of its values. It is usually denoted with
the letter σ (lower case sigma). It is defined as the square root of the variance.
To understand standard deviation, keep in mind that variance is the average of the squared
differences between data points and the mean. Variance is tabulated in units squared. Standard
deviation, being the square root of that quantity, therefore measures the spread of data about the
mean, measured in the same units as the data.
Said more formally, the standard deviation is the root mean square (RMS) deviation of values from
their arithmetic mean.
For example, in the population {4, 8}, the mean is 6 and the deviations from the mean are {−2, 2}.
Those deviations squared are {4, 4}, the average of which (the variance) is 4. Therefore, the
standard deviation is 2. In this case 100% of the values in the population lie exactly one standard
deviation from the mean.
The standard deviation is the most common measure of statistical dispersion, measuring how widely
spread the values in a data set are. If the data points are close to the mean, then the standard
deviation is small. Conversely, if many data points are far from the mean, then the standard deviation
is large. If all the data values are equal, then the standard deviation is zero.
For a population, the standard deviation can be estimated by a modified standard deviation (s) of a
sample. The formulas are given below.
[Figure: given a random variable (in blue), the standard deviation σ is a measure of the spread of
the values of the random variable away from its mean μ.]
Contents
1 Definition and calculation
    1.1 Standard deviation of a random variable
    1.2 Estimating population standard deviation from sample standard deviation
242188303.doc, Page 1 of 11
2 Example
3 Interpretation and application
    3.1 Real-life examples
        3.1.1 Weather
        3.1.2 Sports
        3.1.3 Finance
    3.2 Geometric interpretation
    3.3 Rules for normally distributed data
    3.4 Chebyshev's inequality
4 Relationship between standard deviation and mean
5 Rapid calculation methods
6 See also
7 External links
1. Definition and calculation
1.1 Standard deviation of a random variable
The standard deviation of a random variable X is defined as:
$$\sigma = \sqrt{\operatorname{E}\left((X - \operatorname{E}(X))^2\right)} = \sqrt{\operatorname{E}(X^2) - (\operatorname{E}(X))^2}$$
where E(X) is the expected value of X.
Not all random variables have a standard deviation, since these expected values need not exist. For
example, the standard deviation of a random variable which follows a Cauchy distribution is
undefined.
If the random variable X takes on the values $x_1, \ldots, x_N$ (which are real numbers) with equal
probability, then its standard deviation can be computed as follows. First, the mean of X, $\bar{x}$, is
defined as a summation:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where N is the number of samples taken. Next, the standard deviation simplifies to
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
In other words, the standard deviation of a discrete uniform random variable X can be calculated as
follows:
1. For each value $x_i$ calculate the difference between $x_i$ and the average value $\bar{x}$.
2. Calculate the squares of these differences.
3. Find the average of the squared differences. This quantity is the variance $\sigma^2$.
4. Take the square root of the variance.
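The above expression can also be replaced with
$$\sigma = \sqrt{\left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right) - \bar{x}^2}$$
Equality of these two expressions can be shown by a bit of algebra:
$$\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\bar{x}\left(\frac{1}{N}\sum_{i=1}^{N} x_i\right) + \bar{x}^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - 2\bar{x}^2 + \bar{x}^2 = \left(\frac{1}{N}\sum_{i=1}^{N} x_i^2\right) - \bar{x}^2$$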
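The four steps listed above, together with the equivalent shortcut expression, can be sketched in Python. This is only an illustration under simple assumptions (a small in-memory list of numbers); the function names `population_std` and `population_std_shortcut` are made up for this example.

```python
import math


def population_std(values):
    """Population standard deviation, computed step by step."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]   # step 1: differences from the mean
    squares = [d * d for d in deviations]     # step 2: squared differences
    variance = sum(squares) / n               # step 3: average of the squares
    return math.sqrt(variance)                # step 4: square root of the variance


def population_std_shortcut(values):
    """Same quantity via the mean of the squares minus the squared mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mean * mean)


print(population_std([4, 8]))           # 2.0
print(population_std_shortcut([4, 8]))  # 2.0
```

Both functions agree on the population {4, 8} used in the introduction. The shortcut form subtracts two potentially large numbers, so it is more sensitive to round-off error than the step-by-step form.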
1.2 Estimating population standard deviation from sample standard deviation
In the real world, finding the standard deviation of an entire population is unrealistic except in
certain cases, such as standardized testing, where every member of a population is sampled. In most
cases, the standard deviation is estimated by examining a random sample taken from the population.
The most common measure used is the sample standard deviation, which is defined by
$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
where $\{x_1, x_2, \ldots, x_N\}$ is the sample and $\bar{x}$ is the mean of the sample. The denominator
N − 1 is the number of degrees of freedom in the vector $(x_1 - \bar{x}, \ldots, x_N - \bar{x})$.
The reason for this definition is that $s^2$ is an unbiased estimator for the variance $\sigma^2$ of the
underlying population, if that variance exists and the sample values are drawn independently with
replacement. However, s is not an unbiased estimator for the standard deviation σ; it tends to
underestimate the population standard deviation. Although an unbiased estimator for σ is known when
the random variable is normally distributed, the formula is complicated and amounts to a minor
correction. Moreover, unbiasedness, in this sense of the word, is not always desirable; see bias of an
estimator.
Another estimator sometimes used is the similar expression
$$\sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
This form has a uniformly smaller mean squared error than does the unbiased estimator, and is the
maximum-likelihood estimate when the population is normally distributed.
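The bias statements in this section can be checked exactly on a toy case. The sketch below assumes a tiny population, {5, 6, 8, 9}, and enumerates every ordered sample of size 2 drawn with replacement; the population and the variable names are chosen only for illustration.

```python
import itertools
import math
import statistics

population = [5, 6, 8, 9]
sigma2 = statistics.pvariance(population)   # population variance (N denominator)
sigma = math.sqrt(sigma2)

# All 16 equally likely ordered samples of size 2, drawn with replacement.
samples = list(itertools.product(population, repeat=2))

# Average of the N-1 variance estimate over all samples: matches sigma2.
mean_s2 = statistics.mean([statistics.variance(s) for s in samples])

# Average of the sample standard deviation s: falls below sigma.
mean_s = statistics.mean([statistics.stdev(s) for s in samples])

# Average of the N-denominator variance estimate: biased low.
mean_var_n = statistics.mean([statistics.pvariance(s) for s in samples])

print(sigma2, mean_s2, mean_var_n)
print(sigma, mean_s)
```

Here the average of $s^2$ equals the population variance, while the average of s comes out smaller than σ, in line with the statement that s tends to underestimate the population standard deviation.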
2. Example
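We will show how to calculate the standard deviation of a population. Our example will use the
ages of four young children: {5, 6, 8, 9}.
Step 1. Calculate the mean, $\bar{x}$:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
We have N = 4 because there are four data points:
$$x_1 = 5, \quad x_2 = 6, \quad x_3 = 8, \quad x_4 = 9$$
Replacing N with 4
$$\bar{x} = \frac{1}{4}\sum_{i=1}^{4} x_i = \frac{1}{4}(5 + 6 + 8 + 9) = 7$$
This is the mean.
Step 2. Calculate the standard deviation σ:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
Replacing N with 4
$$\sigma = \sqrt{\frac{1}{4}\sum_{i=1}^{4} (x_i - \bar{x})^2}$$
Replacing $\bar{x}$ with 7
$$\sigma = \sqrt{\frac{1}{4}\sum_{i=1}^{4} (x_i - 7)^2} = \sqrt{\frac{1}{4}\left[(-2)^2 + (-1)^2 + 1^2 + 2^2\right]} = \sqrt{\frac{10}{4}} = \sqrt{\frac{5}{2}}$$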
So, the standard deviation is the square root of five halves, or approximately 1.58.
Were this set a sample drawn from a larger population of children, and the question at hand was the
standard deviation of the population, convention would replace the denominator N (or 4) in step 2
here with N − 1 (or 3).
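The worked example can be reproduced in a few lines of Python. This is a sketch that follows the two steps above for the ages {5, 6, 8, 9}; the variable names are illustrative.

```python
import math

ages = [5, 6, 8, 9]
n = len(ages)

# Step 1: the mean.
mean = sum(ages) / n                                   # 7.0

# Step 2: the sum of squared deviations, then the two denominators.
sum_sq = sum((x - mean) ** 2 for x in ages)            # 10.0
population_sd = math.sqrt(sum_sq / n)                  # sqrt(5/2), about 1.58
sample_sd = math.sqrt(sum_sq / (n - 1))                # sqrt(10/3), about 1.83

print(mean, sum_sq)
print(round(population_sd, 2), round(sample_sd, 2))    # 1.58 1.83
```

Treating the four ages as a sample rather than a whole population raises the estimate from about 1.58 to about 1.83, because the denominator drops from 4 to 3.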
3. Interpretation and application
A large standard deviation indicates that the data points are far from the mean and a small standard
deviation indicates that they are clustered closely around the mean.
For example, each of the three data sets {0, 0, 14, 14}, {0, 6, 8, 14} and {6, 6, 8, 8} has a mean of
7. Their standard deviations are 7, 5, and 1, respectively. The third set has a much smaller standard
deviation than the other two because its values are all close to 7. In a loose sense, the standard
deviation tells us how far from the mean the data points tend to be. It will have the same units as the
data points themselves. If, for instance, the data set {0, 6, 8, 14} represents the ages of four siblings
in years, the standard deviation is 5 years.
As another example, the data set {1000, 1006, 1008, 1014} may represent the distances traveled by
four athletes in 3 minutes, measured in meters. It has a mean of 1007 meters, and a standard
deviation of 5 meters.
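The figures quoted for these data sets can be checked with Python's standard `statistics` module, using `pstdev` (the population form, with denominator N). The dictionary keys below are labels invented for this sketch.

```python
import statistics

data_sets = {
    "wide": [0, 0, 14, 14],
    "medium": [0, 6, 8, 14],
    "narrow": [6, 6, 8, 8],
}

for name, values in data_sets.items():
    # Each set has mean 7; the spreads are 7, 5 and 1.
    print(name, statistics.mean(values), statistics.pstdev(values))

# Adding 1000 to every value of {0, 6, 8, 14} shifts the mean but not the spread.
distances = [1000, 1006, 1008, 1014]
print(statistics.mean(distances), statistics.pstdev(distances))
```

The athletes' distances are the siblings' ages shifted by 1000, which is why both sets share a standard deviation of 5 in their respective units.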
Standard deviation may serve as a measure of uncertainty. In physical science for example, the
reported standard deviation of a group of repeated measurements should give the precision of those
measurements. When deciding whether measurements agree with a theoretical prediction, the
standard deviation of those measurements is of crucial importance: if the mean of the measurements
is too far away from the prediction (with the distance measured in standard deviations), then we
consider the measurements as contradicting the prediction. This makes sense since they fall outside
the range of values that could reasonably be expected to occur if the prediction were correct and the
standard deviation appropriately quantified. See prediction interval.
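As a rough sketch of "distance measured in standard deviations", the function below divides the gap between the measured mean and a prediction by the sample standard deviation of the measurements. The measurement values, the two predictions, and the function name `sigmas_from_prediction` are all hypothetical, and what counts as "too far" is a convention the text does not fix.

```python
import statistics


def sigmas_from_prediction(measurements, prediction):
    """Distance between the measured mean and the prediction, in standard deviations."""
    mean = statistics.mean(measurements)
    spread = statistics.stdev(measurements)   # sample standard deviation (N-1)
    return abs(mean - prediction) / spread


# Hypothetical repeated measurements of one quantity.
measurements = [10.0, 10.2, 9.8, 10.1, 9.9]

print(sigmas_from_prediction(measurements, 10.1))   # well under 1 standard deviation
print(sigmas_from_prediction(measurements, 10.5))   # more than 3 standard deviations
```

A prediction of 10.1 sits comfortably inside the scatter of these measurements, whereas 10.5 lies more than three standard deviations from their mean.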
3.1 Real-life examples
The practical value of understanding the standard deviation of a set of values is in appreciating how
much variation there is from the "average" (mean).
3.1.1 Weather
As a simple example, consider average temperatures for cities. While two cities may each have an
average temperature of 60 °F, it's helpful to understand that the range for cities near the coast is
smaller than for cities inland, which clarifies that, while the average is similar, the chance for
variation is greater inland than near the coast.
So, an average of 60 °F occurs for one city with highs of 80 °F and lows of 40 °F, and also occurs for
another city with highs of 65 °F and lows of 55 °F. The standard deviation allows us to recognize that
the average for the city with the wider variation, and thus a higher standard deviation, will not offer
as reliable a prediction of temperature as the city with the smaller variation and lower standard
deviation.
3.1.2 Sports
Another way of seeing it is to consider sports teams. In any set of categories, there will be teams
that rate highly at some things and poorly at others. Chances are, the teams that lead in the standings
will not show such disparity, but will be pretty good in most categories. The lower the standard
deviation of their ratings across categories, the more balanced and consistent they tend to be. So, a
team that is consistently bad in most categories will have a low standard deviation and will
probably lose more often than win. A team that is consistently good in most categories will also
have a low standard deviation and will therefore end up winning more than losing. A team with a
high standard deviation might be the type of team that scores a lot (strong offense) but gets scored
on a lot too (weak defense); or vice versa, it might have a poor offense but compensate by being
difficult to score on. Teams with a higher standard deviation will be more unpredictable.
Trying to predict which teams, on any given day, will win, may include looking at the standard
deviations of the various team "stats" ratings, in which anomalies can match strengths vs
weaknesses to attempt to understand what factors may prevail as stronger indicators of eventual
scoring outcomes.
In racing, a driver is timed on successive laps. A driver with a low standard deviation of lap times is
more consistent than a driver with a higher standard deviation. This information can be used to help
understand where opportunities might be found to reduce lap times.
3.1.3 Finance
In finance, standard deviation is a representation of the risk associated with a given security (stocks,
bonds, property, etc.), or the risk of a portfolio of securities. Risk is an important factor in
determining how to efficiently manage a portfolio of investments because it determines the
variation in returns on the asset and/or portfolio and gives investors a mathematical basis for
investment decisions. The overall concept of risk is that as it increases, the expected return on the
asset will increase as a result of the risk premium earned; in other words, investors should expect a
higher return on an investment when said investment carries a higher level of risk.
For example, you have a choice between two stocks: Stock A historically returns 5% with a
standard deviation of 10%, while Stock B returns 6% and carries a standard deviation of 20%. On
the basis of risk and return, an investor may decide that Stock A is the better choice, because the
additional percentage point of return (an additional 20% in dollar terms) generated by Stock B is not
worth double the degree of risk associated with Stock A. Stock B is likely to fall short of the initial
investment more often than Stock A under the same circumstances, and will return only one
percentage point more on average. In this example, Stock A has the potential to earn 10% more than
the expected return, but is equally likely to earn 10% less than the expected return.
Calculating the average return (or arithmetic mean) of a security over a given number of periods
will generate an expected return on the asset. For each period, subtracting the expected return from
the actual return results in the deviation for that period. Square the deviation in each period to find
the effect of the result on the overall risk of the asset. The larger the deviation in a period, the
greater risk the security carries. Taking the average of the squared deviations results in the variance,
the measurement of overall units of risk associated with the asset. Finding the square root of this
variance will result in the standard deviation of the investment tool in question. Use this
measurement, combined with the average return on the security, as a basis for comparing securities.
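The procedure just described (average return, per-period deviations, average squared deviation, square root) can be sketched as follows. The per-period returns are hypothetical figures in percent, and `return_and_risk` is a name invented for this example.

```python
import math


def return_and_risk(period_returns):
    """Return (expected return, standard deviation) for a list of per-period returns."""
    n = len(period_returns)
    expected = sum(period_returns) / n                    # average (expected) return
    deviations = [r - expected for r in period_returns]   # actual minus expected
    variance = sum(d * d for d in deviations) / n         # average squared deviation
    return expected, math.sqrt(variance)


# Hypothetical returns over four periods, in percent.
returns = [15, -5, 5, 5]
expected, risk = return_and_risk(returns)
print(expected, round(risk, 2))   # 5.0 7.07
```

Comparing the two numbers for each security, rather than the average return alone, is what allows a choice such as the one between Stock A and Stock B above.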
3.2 Geometric interpretation
To gain some geometric insights, we will start with a population of three values, $x_1$, $x_2$, $x_3$. This
defines a point $P = (x_1, x_2, x_3)$ in $\mathbf{R}^3$. Consider the line L = {(r, r, r) : r in R}. This is
the "main diagonal" going through the origin. If our three given values were all equal, then the standard
deviation would be zero and P would lie on L. So it is not unreasonable to assume that the standard
deviation is related to the distance of P to L. And that is indeed the case. Moving orthogonally from
P to the line L, one hits the point:
$$R = (\bar{x}, \bar{x}, \bar{x})$$
whose coordinates are the mean of the values we started out with. A little algebra shows that the
distance between P and R (which is the same as the distance between P and the line L) is given by
$\sigma\sqrt{3}$. An analogous formula (with 3 replaced by N) is also valid for a population of N values; we
then have to work in $\mathbf{R}^N$.
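The distance claim can be checked numerically. The sketch below picks an arbitrary point P = (2, 4, 9), which is an assumption made only for illustration, builds the point on the diagonal whose coordinates all equal the mean, and compares the distance between them with σ√3. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math

p = (2.0, 4.0, 9.0)                  # an arbitrary population of three values
n = len(p)
mean = sum(p) / n
r = (mean,) * n                      # the point on the main diagonal

sigma = math.sqrt(sum((x - mean) ** 2 for x in p) / n)
distance = math.dist(p, r)           # Euclidean distance from P to R

# P - R is orthogonal to the direction (1, 1, 1) of the line L.
dot_with_diagonal = sum(x - mean for x in p)

print(distance, sigma * math.sqrt(n))
print(dot_with_diagonal)
```

The two printed distances agree up to rounding, and the zero dot product confirms that moving from P to R is a move orthogonal to the line L.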
3.3 Rules for normally distributed data
[Figure: dark blue is less than one standard deviation from the mean. For the normal distribution, this
accounts for 68.27 % of the set; while two standard deviations from the mean (medium and dark
blue) account for 95.45 %; and three standard deviations (light, medium, and dark blue) account for
99.73 %.]
In practice, one often assumes that the data are from an approximately normally distributed
population. This is frequently justified by the classical central limit theorem, which says that sums
of many independent, identically-distributed random variables tend towards the normal distribution
as a limit. If that assumption is justified, then about 68 % of the values are within 1 standard
deviation of the mean, about 95 % of the values are within two standard deviations and about
99.7 % lie within 3 standard deviations. This is known as the 68-95-99.7 rule, or the empirical rule.
The confidence intervals are as follows:
Range   Proportion of values within range
σ       68.26894921371%
2σ      95.44997361036%
3σ      99.73002039367%
4σ      99.99366575163%
5σ      99.99994266969%
6σ      99.99999980268%
7σ      99.99999999974%
For normal distributions, the two points of the curve which are one standard deviation from the
mean are also the inflection points.
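The proportions for a normal distribution can be recomputed from the error function: the probability that a normal variable lies within k standard deviations of its mean is erf(k/√2). The sketch below uses `math.erf` from the Python standard library; `normal_coverage` is a name chosen for this example.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in range(1, 8):
    print(f"{k} sigma: {100 * normal_coverage(k):.11f}%")
```

Rounded to two or three significant figures, the first three values give the 68-95-99.7 rule.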
3.4 Chebyshev's inequality
Chebyshev's inequality proves that in any data set, nearly all of the values will be close to the
mean value, where the meaning of "close to" is specified by the standard deviation. Chebyshev's
inequality entails that for any distribution whose mean and standard deviation are defined, not just
normal ones, we have the following weaker bounds:
At least 50% of the values are within 1.41 standard deviations from the mean.
At least 75% of the values are within 2 standard deviations from the mean.
At least 89% of the values are within 3 standard deviations from the mean.
At least 94% of the values are within 4 standard deviations from the mean.
At least 96% of the values are within 5 standard deviations from the mean.
At least 97% of the values are within 6 standard deviations from the mean.
At least 98% of the values are within 7 standard deviations from the mean.
And in general:
At least $(1 - 1/k^2) \times 100\%$ of the values are within $k$ standard deviations from the mean.
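The general bound is easy to tabulate and to compare against actual data. In the sketch below, `chebyshev_bound` and `fraction_within` are names invented for illustration, and the skewed data set is hypothetical, chosen because it is far from normal.

```python
import math


def chebyshev_bound(k):
    """Lower bound on the fraction of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values strictly within k population standard deviations."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(abs(x - mean) < k * sigma for x in values) / n


# Hypothetical, strongly skewed data set with one large outlier.
skewed = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

for k in (math.sqrt(2), 2, 3, 4):
    print(round(k, 2), round(chebyshev_bound(k), 4), fraction_within(skewed, k))
```

For every k the observed fraction is at least the bound, although the bound is usually far from tight, which is why it is described as weaker than the figures for the normal distribution.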
4. Relationship between standard deviation and mean
The mean and the standard deviation of a set of data are usually reported together. In a certain
sense, the standard deviation is a "natural" measure of statistical dispersion if the center of the data
is measured about the mean. This is because the standard deviation from the mean is smaller than
from any other point. The precise statement is the following: suppose $x_1, \ldots, x_n$ are real numbers
and define the function:
$$\sigma(r) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - r)^2}$$
Using calculus, it is possible to show that σ(r) has a unique minimum at the mean:
$$r = \bar{x}$$
(This can also be done with fairly simple algebra alone, since $\sigma^2(r)$ is equal to a quadratic
polynomial in r).
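The minimum at the mean can be seen numerically. The sketch below evaluates the root mean square deviation about several trial points r for the data {5, 6, 8, 9} and checks the quadratic identity σ²(r) = σ² + (r − mean)², which is the algebra referred to above; `rms_deviation` is an illustrative name.

```python
import math


def rms_deviation(values, r):
    """Root mean square deviation of the values about an arbitrary point r."""
    return math.sqrt(sum((x - r) ** 2 for x in values) / len(values))


values = [5, 6, 8, 9]
mean = sum(values) / len(values)
sigma = rms_deviation(values, mean)          # the ordinary standard deviation

for r in (5.0, 6.5, 7.0, 7.5, 9.0):
    squared = rms_deviation(values, r) ** 2
    quadratic = sigma ** 2 + (r - mean) ** 2
    print(r, round(squared, 6), round(quadratic, 6))
```

Because the extra term (r − mean)² is zero only at r equal to the mean, the deviation about any other point is strictly larger.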
The coefficient of variation of a sample is the ratio of the standard deviation to the mean. It is a
dimensionless number that can be used to compare the amount of variance between populations
with different means.
5. Rapid calculation methods
A slightly faster (significantly for running standard deviation) way to compute the population
standard deviation is given by the following formula (though considerations must be made for
round-off error, arithmetic overflow, and arithmetic underflow conditions):
$$\sigma = \frac{1}{N}\sqrt{N\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}$$
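or
$$\sigma = \frac{1}{s_0}\sqrt{s_0 s_2 - s_1^2}$$
where the power sums $s_0$, $s_1$, $s_2$ are defined by
$$s_j = \sum_{i=1}^{N} x_i^j$$
Similarly for sample standard deviation:
$$s = \sqrt{\frac{N\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2}{N(N-1)}}$$
Or from running sums:
$$s = \sqrt{\frac{s_0 s_2 - s_1^2}{s_0 (s_0 - 1)}}$$
See also algorithms for calculating variance.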
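A running calculation based on the three power sums might look like the sketch below. The class name `RunningStd` and its methods are invented for this example, and the code makes no attempt to guard against the round-off and overflow problems mentioned above.

```python
import math


class RunningStd:
    """Accumulates the power sums s0, s1, s2 so the spread can be read at any time."""

    def __init__(self):
        self.s0 = 0      # number of values seen
        self.s1 = 0.0    # sum of the values
        self.s2 = 0.0    # sum of the squared values

    def add(self, x):
        self.s0 += 1
        self.s1 += x
        self.s2 += x * x

    def population_std(self):
        return math.sqrt(self.s0 * self.s2 - self.s1 ** 2) / self.s0

    def sample_std(self):
        return math.sqrt((self.s0 * self.s2 - self.s1 ** 2) / (self.s0 * (self.s0 - 1)))


running = RunningStd()
for age in [5, 6, 8, 9]:
    running.add(age)

print(round(running.population_std(), 2), round(running.sample_std(), 2))   # 1.58 1.83
```

Only three numbers are stored however many values arrive, which is what makes this form attractive for running calculations. When the values are large relative to their spread, the subtraction under the square root loses precision.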
6. See also
Algorithms for calculating variance
An inequality on location and scale parameters
Chebyshev's inequality
Confidence interval
Cumulant
Deviation (statistics)
Kurtosis
Mean absolute error
Mean
Pooled standard deviation
Raw score
Root mean square
Sample size
Saturation (color theory)
Skewness
Standard error
Standard score
Variance
Volatility
Yamartino Method for calculating standard deviation of wind direction
7. External links
Standard Deviation, an elementary introduction
Standard Deviation, a simpler explanation for writers and journalists
Standard Deviation Calculator
Standard Deviation In 30 Seconds - a non-maths explanation
Statistics
Descriptive statistics: Mean (Arithmetic, Geometric) - Median - Mode - Power - Variance -
Standard deviation
Inferential statistics: Hypothesis testing - Significance - Null hypothesis/Alternate hypothesis -
Error - Z-test - Student's t-test - Maximum likelihood - Standard score/Z score - P-value -
Analysis of variance
Survival analysis: Survival function - Kaplan-Meier - Logrank test - Failure rate - Proportional
hazards models
Probability distributions: Normal (bell curve) - Poisson - Bernoulli
Correlation: Pearson product-moment correlation coefficient - Rank correlation (Spearman's rank
correlation coefficient, Kendall tau rank correlation coefficient)
Regression analysis: Linear regression - Nonlinear regression - Logistic regression
