Descriptive Stat Excel

W.R. Wilcox, Clarkson University


Last revised September 17, 2012
Definitions of descriptive statistics of a single variable
generated by the Descriptive Statistics tool in Excel's Data Analysis
Background
Imagine that we want to know the distance from the front wall of this room to its back wall. We
measure it. We measure it again and obtain a slightly different result. We might guess that the
average of these two measurements would be closer to the true (unknown) value, and that the
more measurements we make the closer the average will be to the true value. In principle, the
number of possible measurements is unlimited.
We might also measure the diameter of pistons being produced in an automotive plant. Each of
these will be somewhat different, reflecting not only errors in our method of measuring but also
real variations in the actual diameter. Again, in principle, there is no limit to the number of pistons
that could be produced and measured.
In both our examples, we define the "population" as the number of measurements that could be
made and "samples" as the actual measurements made. The challenge of statistics is to use the
samples to estimate characteristics of the population. Often, we use different symbols for these
characteristics, depending on whether they are for the population or for the samples. For example,
the population mean (average) is generally given the Greek letter mu, $\mu$, and the sample mean is
written $\bar{x}$. The square root of the average square of the deviation of individual values of the
population from $\mu$ is the population standard deviation, and is given the Greek letter sigma, $\sigma$. The
sample standard deviation, s, is defined below and is an estimate of $\sigma$. As the sample size n is
increased, $\bar{x}$ becomes closer to $\mu$ and s closer to $\sigma$.
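The claim that the sample mean and sample standard deviation approach the population values as n grows can be illustrated with a small simulation. This is only a sketch: the population mean and standard deviation (MU, SIGMA), the normal distribution, and the function name sample_stats are assumptions made for illustration, not values from this handout.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

# Hypothetical population parameters (assumed for illustration only).
MU, SIGMA = 10.0, 0.5


def sample_stats(n):
    """Draw n simulated measurements; return (sample mean, sample std dev)."""
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    return statistics.mean(sample), statistics.stdev(sample)


# Larger samples should, on the whole, land closer to MU and SIGMA.
for n in (5, 50, 500, 5000):
    x_bar, s = sample_stats(n)
    print(f"n={n:5d}  mean={x_bar:.4f}  std dev={s:.4f}")
```

Any single small sample can still land far from the population values; the tendency only shows up on average as n increases.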
In the following, we denote the individual value of the sample or measurement as $x_i$, where i goes
from 1 to n. The terms below appear in the order they are produced by Excel's Descriptive
Statistics. Each term is followed in capital letters by the Excel function that produces the same
value, a definition or explanation of the statistic, and then the relevant equation.
Note that the mean, standard error, median, mode, standard deviation, range, minimum, maximum,
sum and confidence level all have the same units as the sample values $x_i$.
Mean (AVERAGE): The sum of all samples divided by the number of values:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Standard Error: The population standard deviation of many measurements of a mean of n samples. It
is estimated by the standard deviation of one measurement of the mean divided by the square root of n:
$$\frac{s}{\sqrt{n}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n(n-1)}}$$
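The mean and standard error definitions can be checked by hand with a few lines of Python. This is a minimal sketch of the two formulas as written above, not Excel's own code. The function names and the measurement values are hypothetical.

```python
import math


def mean(values):
    """Sample mean: the sum of all values divided by the number of values."""
    return sum(values) / len(values)


def standard_error(values):
    """Standard error of the mean: sqrt(sum((x_i - mean)^2) / (n (n - 1)))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    x_bar = mean(values)
    sum_sq_dev = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(sum_sq_dev / (n * (n - 1)))


# Hypothetical repeated measurements of a room length, in metres.
lengths = [7.52, 7.49, 7.55, 7.50, 7.48, 7.53]
print(mean(lengths), standard_error(lengths))
```

Both results carry the units of the measurements themselves.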

Median (MEDIAN): If n is odd, the value of $x_i$ for which half of the remaining values are larger and half
are smaller. If n is even, the average of the two values in the middle.
Mode (MODE): The most frequently occurring value, if any.
Standard Deviation (STDEV): From Excel's Help on this function, "The standard deviation is a
measure of how widely values are dispersed from the average value (the mean)."
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

Sample variance (VAR): Square of the standard deviation:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
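The standard deviation and sample variance formulas, both with the n - 1 divisor, can be sketched as follows. The function names and the piston diameters are hypothetical. Python's standard statistics module is used only as a cross-check, since its stdev and variance functions also divide by n - 1.

```python
import math
import statistics


def sample_variance(values):
    """s^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """s, the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Hypothetical piston diameters, in millimetres.
diameters = [82.01, 81.98, 82.03, 82.00, 81.97, 82.02]
print(sample_std_dev(diameters), statistics.stdev(diameters))
print(sample_variance(diameters), statistics.variance(diameters))
```

The variance has the square of the units of the measurements, which is why it is absent from the list of statistics sharing the sample's units.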

Kurtosis (KURT): From Excel's Help on this function, "Kurtosis characterizes the relative peakedness
or flatness of a distribution compared with the normal distribution. Positive kurtosis indicates a
relatively peaked distribution. Negative kurtosis indicates a relatively flat distribution." The
kurtosis of a sample is consistent with a normal distribution for a population if it is small, e.g.
less than 0.3.
Skewness (SKEW): "Skewness characterizes the degree of asymmetry of a distribution around its mean.
Positive skewness indicates a distribution with an asymmetric tail extending toward more positive
values. Negative skewness indicates a distribution with an asymmetric tail extending toward more
negative values." The skewness of a sample is consistent with a normal distribution for a population
if its absolute value is small, e.g. less than 0.3.
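No equations are given here for kurtosis or skewness. The sketch below uses the sample formulas as they are commonly documented for Excel's KURT and SKEW functions. Treat that correspondence as an assumption and check it against Excel's Help before relying on it. The function names are hypothetical.

```python
import math


def _standardized(values):
    """Return (x_i - mean) / s for each value, with s using the n - 1 divisor."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical")
    return [(x - x_bar) / s for x in values]


def skew(values):
    """n / ((n-1)(n-2)) * sum(z_i^3); needs at least three values."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    z = _standardized(values)
    return n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)


def kurt(values):
    """Excess kurtosis with small-sample correction; needs at least four values."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are needed")
    z = _standardized(values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * sum(v ** 4 for v in z) - correction


data = [1, 2, 3, 4, 5]  # evenly spaced, symmetric values
print(skew(data), kurt(data))
```

Both quantities are dimensionless, because each deviation is divided by s before being raised to a power.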
Range: Maximum value minus minimum value. (Usually increases as n increases, making it a poor
measure of the dispersion or spread of the population values.)
Minimum (MIN): Minimum value.
Maximum (MAX): Maximum value.
Sum (SUM): Sum of all values, $\sum_{i=1}^{n} x_i$
Count (COUNT): Number of values, n

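The simpler statistics (median, mode, range, minimum, maximum, sum and count) can be gathered in one pass. This is a minimal sketch with hypothetical function names. Returning None when no value repeats is a choice made here to mirror "if any" in the definition of the mode. Ties between equally frequent values are resolved by first occurrence in this sketch, which may differ from Excel's behaviour.

```python
import statistics
from collections import Counter


def mode_or_none(values):
    """Most frequently occurring value, or None when no value repeats."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > 1 else None


def simple_summary(values):
    """Collect the statistics that need no formula beyond sorting and adding."""
    return {
        "median": statistics.median(values),  # averages the middle pair if n is even
        "mode": mode_or_none(values),
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": len(values),
    }


print(simple_summary([3, 1, 4, 1, 5]))
```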

Confidence Level (chosen %):
If the population is normally distributed and you choose the default of 95% ($\alpha = 0.05$), then the
probability is 95% that $\mu$ lies within $\bar{x} \pm$ Confidence Level. The Confidence Level $= ts/\sqrt{n}$,
where t is Student's t (or, often, just t). Thus the probability is $1 - \alpha$ that $\mu$ lies within
$\bar{x} \pm ts/\sqrt{n}$, or $\alpha$ that the true value of $\mu$ lies outside these confidence limits. The value of
t can be calculated by Excel's TINV function, in which $\nu = n-1$ is the degrees of freedom and $\alpha$ is
the probability (chance that the confidence limits do not include the true $\mu$). There are several
important things to note:
The Excel function CONFIDENCE does not give the same results unless n is greater than about
100. The reason is that the Descriptive Statistics tool correctly uses the Student's t distribution for a
finite sized sample, while CONFIDENCE uses the normal distribution, which is for an infinite
population. See "normally distributed" for a more detailed explanation and for MATLAB programs to
calculate Student's t and descriptive statistics.
The more the absolute values of skewness or kurtosis exceed 1, the greater is the probability that the
population is not normally distributed, and the less chance that the confidence level calculated by
Excel is correct.
Exercise Ha shows how Excel can provide a graphical test of normality.
The probability $\alpha$ that $|\mu - \bar{x}| > a$ can be found using Excel as follows. Calculate
$t = a\sqrt{n}/s$. Then $\alpha$ = TDIST(t,n-1,2), where n-1 is the degrees of freedom. This is called a
two-tailed test.
The probability that $\mu - \bar{x} > a$ is ½ of TDIST(t,n-1,2), or TDIST(t,n-1,1). This is called a
one-tailed test.
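The confidence-level and tail-probability calculations described above can be reproduced outside Excel. The following is a minimal sketch, not Excel's implementation. It integrates the Student's t density by Simpson's rule and inverts it by bisection. It assumes n - 1 degrees of freedom and ordinary values of $\alpha$ such as 0.05. All function names and the sample data are hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(t, dof):
    """Probability density of Student's t with dof degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))


def two_tailed_prob(t, dof, steps=2000):
    """P(|T| > t): the quantity described for TDIST(t, dof, 2).

    Uses Simpson's rule on [0, |t|]; steps must be even.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    return 1.0 - 2.0 * total * h / 3.0


def one_tailed_prob(t, dof):
    """Half of the two-tailed probability, as for TDIST(t, dof, 1)."""
    return two_tailed_prob(t, dof) / 2.0


def t_critical(alpha, dof):
    """t with P(|T| > t) = alpha: the quantity described for TINV(alpha, dof)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    lo, hi = 0.0, 1.0
    while two_tailed_prob(hi, dof) > alpha:  # expand until the root is bracketed
        lo, hi = hi, hi * 2.0
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2.0
        if two_tailed_prob(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def confidence_level(values, alpha=0.05):
    """Half-width t * s / sqrt(n) of the confidence interval for the mean."""
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    return t_critical(alpha, n - 1) * s / math.sqrt(n)


# Normal-distribution multiplier for 95%, for comparison with Student's t.
z = NormalDist().inv_cdf(1 - 0.05 / 2)
for n in (5, 10, 30, 100):
    print(n, round(t_critical(0.05, n - 1), 3), round(z, 3))
```

The loop at the end prints the t and normal multipliers side by side, so the size of the difference between them can be read off for each n.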
Outliers
Outliers are values $x_i$ which differ significantly from the mean $\bar{x}$. The most modern criterion seems to
be Grubbs' Test (the t discussed on that page is Student's t). If an outlier is so identified, you should
look at the source of the data to see if there is any reason why this value might be invalid. If so, it is
permissible to throw it out and recalculate all of the statistics. But it should not be thrown out simply
because it is an outlier.
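Grubbs' Test is only named here, not defined. As a hedged sketch, the usual form of its test statistic is the largest absolute deviation from the mean divided by the sample standard deviation. The code below computes only that statistic and the position of the suspect value. The critical value it must be compared against comes from Student's t and is not computed here. The function name and the data are hypothetical.

```python
import math


def grubbs_statistic(values):
    """Return (G, index) where G = max|x_i - mean| / s and index locates that value."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    x_bar = sum(values) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical")
    index = max(range(n), key=lambda i: abs(values[i] - x_bar))
    return abs(values[index] - x_bar) / s, index


# Hypothetical measurements with one suspicious value.
readings = [10.0, 10.1, 9.9, 10.0, 12.0]
g, suspect = grubbs_statistic(readings)
print(f"G = {g:.3f} for value {readings[suspect]} at position {suspect}")
```

A large G only flags a candidate. The decision to discard it still rests on finding a reason in the source of the data.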
Comments and suggestions always welcome. Email to wilcox@clarkson.edu.