
Mean

Defining The Term


 The mean is the arithmetic average of a group of observations (or
numbers).
 It is computed by summing all the observations and dividing by
the total number of observations.
 The population mean is represented by the Greek letter mu ($\mu$),
and the sample mean is represented by $\bar{x}$.
 The formulae for computing the population mean and the sample
mean are given below:
o Population Mean: $\mu = \frac{\sum x}{N} = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
o Sample Mean: $\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
 Let's break down the formulae:
o The capital Greek letter sigma ($\Sigma$) is commonly used in
mathematics to represent the sum of all the numbers in a
grouping.
o N is the number of observations in the population, and n is the
number of observations in the sample.
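As a minimal sketch of the formulae above, the Python function below sums the observations and divides by their count. The function name `arithmetic_mean` and the example values are hypothetical, chosen only for illustration. The same arithmetic gives either $\mu$ or $\bar{x}$; what changes is whether the values are the whole population (divide by N) or a sample (divide by n).

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum all observations and divide by how many there are.

    The same arithmetic gives the population mean (mu, divide by N)
    or the sample mean (x-bar, divide by n); only the notation differs.
    """
    if not values:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / len(values)


# Hypothetical example values, chosen only for illustration
values = [2, 4, 6, 8]
print(arithmetic_mean(values))  # 5.0

# Cross-check against the standard library's statistics.mean
assert arithmetic_mean(values) == mean(values)
```

For everyday work, Python's built-in `statistics.mean` (or `statistics.fmean` for a float result) performs the same calculation.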

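Compute the mean of the following 13 observations: 643000, 327000,
233000, 204000, 167000, 144000, 20000, 12000, 10000, 9000, 9000,
7000, 6000.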

 Solution:

Here we have a total of 13 observations, so N = 13.

$\mu = \frac{643000 + 327000 + 233000 + 204000 + 167000 + 144000 + 20000 + 12000 + 10000 + 9000 + 9000 + 7000 + 6000}{13}$

$\mu = \frac{1791000}{13} \approx 137769.23$
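The same calculation can be checked with a short Python sketch. It simply reproduces the worked solution: count the observations, sum them, and divide.

```python
# The 13 observations from the worked example
observations = [643000, 327000, 233000, 204000, 167000, 144000,
                20000, 12000, 10000, 9000, 9000, 7000, 6000]

N = len(observations)        # number of observations
total = sum(observations)    # sum of all observations
mu = total / N               # population mean

print(N, total, round(mu, 2))  # 13 1791000 137769.23
```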

The Outlier Problem


 What is an outlier?
o Outliers are data points that lie far from the other data
points, i.e. they are unusual or unexpected values in a dataset.
 For example, in the scores 10, 25, 27, 29, 31, 34, 50, both 10 and
50 are "outliers".
 The main disadvantage of the mean is its sensitivity to the
influence of outliers.
 When the data contains outliers, the mean loses its ability to
provide the best central location for the data, because the outliers
drag the mean away from the typical value.
 Let's take an example to understand this. Let's say we have a
dataset with the following numbers: 50, 49, 55, 52, 53, 48, 49, 55,
56, 55, 50.
 The mean(μ) of the dataset is:
o $\mu = \frac{50 + 49 + 55 + 52 + 53 + 48 + 49 + 55 + 56 + 55 + 50}{11} = \frac{572}{11} = 52$
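A minimal Python check of this mean, using the 11 values listed above:

```python
# The 11 values from the example dataset
data = [50, 49, 55, 52, 53, 48, 49, 55, 56, 55, 50]

mu = sum(data) / len(data)  # sum of values divided by their count

print(len(data), sum(data), mu)  # 11 572 52.0
```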

Let's now visualise the data and mean value.


From the above figure, we can see that the mean value represents
the centre value of the dataset.
 Now, adding outliers to this dataset will change the mean value,
and the outliers will draw the mean further away from the centre.
 Let's add 85 and 84 to this dataset.
 The new mean ($\mu_{new}$) becomes:
o $\mu_{new} = \frac{50 + 49 + 55 + 52 + 53 + 48 + 49 + 55 + 56 + 55 + 50 + 85 + 84}{13} = \frac{741}{13} = 57$
 The presence of outliers in the dataset has shifted the mean, and
it no longer represents the centre value of the dataset.
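The shift can be reproduced with a small Python sketch: compute the mean of the original 11 values, append the two outliers (85 and 84), and compute it again.

```python
# Original dataset and the same dataset with the two outliers added
data = [50, 49, 55, 52, 53, 48, 49, 55, 56, 55, 50]
with_outliers = data + [85, 84]

mu = sum(data) / len(data)                          # mean without outliers
mu_new = sum(with_outliers) / len(with_outliers)    # mean with outliers

print(mu, mu_new, mu_new - mu)  # 52.0 57.0 5.0
```

Two values out of 13 are enough to move the mean 5 units above the value it had for the original data.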
 This problem occurs because the mean is affected by every value in
the dataset. If the dataset contains unusually large or small values
(i.e. outliers), they pull the mean towards the extreme values (as
seen in the above figure).
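To illustrate that every value affects the mean, the sketch below adds a single extra value to the 11-value dataset, making it more extreme each time. The extra values (60, 85, 200, 1000) are hypothetical, chosen only to show the trend: the further the added value lies from the rest, the further it drags the mean.

```python
# Original 11-value dataset (mean 52.0)
data = [50, 49, 55, 52, 53, 48, 49, 55, 56, 55, 50]
base_mean = sum(data) / len(data)

means = []
# Hypothetical extra values, each more extreme than the last
for extra in [60, 85, 200, 1000]:
    extended = data + [extra]
    new_mean = sum(extended) / len(extended)
    means.append(new_mean)
    print(f"added {extra:>4}: mean = {new_mean:.2f}")

# added   60: mean = 52.67
# added   85: mean = 54.75
# added  200: mean = 64.33
# added 1000: mean = 131.00
```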
 So, if we have outliers in our dataset, the mean loses its ability to
provide the best central location for the data because the outliers
will drag the mean away from the typical value.

When To Use Mean


 We use the mean when both of the following conditions are met:
o The data is on an interval or ratio scale:
 Data with equal intervals between values, such as speed,
weight, height, temperature, etc.
o The data does not contain outliers:
 The mean is sensitive to outliers, so we should use the
mean only when the dataset does not contain outliers.
