Rini Darathy
Business Statistics: Assignment 2 & 3

Assignment 2:

Q. Write a short note (250- 300 words each) on the following with suitable
business examples. (20 Marks)

1. Mean

2. Median

3. Standard Deviation

4. Skewness

Answer: Descriptive statistics are the basic tools we use to describe a sample
and to summarise its main characteristics.

We have 3 measures in descriptive statistics:

• Measures of Central Tendency
• Measures of Dispersion
• Measures of Distribution Shape
Measures of Central Tendency:

Central tendency describes the typical, or central, value around which the data
we have collected tend to cluster.

There are 3 ways to measure central tendency:

• Mean
• Median
• Mode


Mean:

The mean is the average of a data set and the most commonly calculated measure
of its central value. For a probability distribution it is also referred to as
the expected value. Along with the median and mode, it is one of the most
important and widely used measures of central tendency. In business it is mostly
used as a single representative value against which individual observations can
be compared. The mean also plays a major role in finance, where it is usually
just called the average.

Mean = Sum of observations / Total number of observations

Total number of observations is denoted as n.

Population Mean:

A population may be a collection of people, items or things. The population mean
is the average over every member of that population and is represented as µ.

Sample Mean:

It is the average of a sample drawn from the population and is mainly used to
estimate the central tendency of that population. It is represented as x̄
(x-bar).


For example, in our organisation six employees work in the learning and
development department under the Delivery Assurance team for the North America
business unit. Their monthly salaries are 55000, 50000, 40000, 48000, 23000 and
35000. The Delivery Assurance team head wants to know the average amount spent
on salary for the learning and development team. To calculate the average, the
first step is to add the salaries of the six employees, which gives 251000. The
second step is to divide this sum by the total number of employees, 6, which
gives an average salary of about 41,833. The mean is sensitive to outliers,
whether low or high: a single unusually low or high salary would pull the
average towards it.
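
The calculation in this example can be checked with a few lines of Python. This
is only a minimal sketch using the six salaries listed above; the variable names
are illustrative and not part of the assignment.

```python
# Monthly salaries of the six learning and development employees (from the example).
salaries = [55000, 50000, 40000, 48000, 23000, 35000]

# Mean = sum of observations / total number of observations (n).
total = sum(salaries)
n = len(salaries)
mean_salary = total / n

print(total)               # 251000
print(round(mean_salary))  # 41833
```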


Median:

The median is also a measure of central tendency. It is the middle value of a
data set once the values have been sorted in ascending or descending order, so
the first step is always to arrange the numbers from lowest to highest. When a
data set contains extreme values, the median is often more descriptive of a
typical value than the mean.

If the data set has an odd number of values, the median is the value at the
((n+1)/2)th location, that is, the middle number of the sorted data set. If it
has an even number of values, the median is
(value at the (n/2)th location + value at the (n/2 + 1)th location) / 2, that
is, the two middle values are summed and then divided by two.

When the two are compared, the median often gives a fairer summary than the
mean, because it does not move when an extreme value becomes even more extreme.


Consider again the six employees working in the learning and development
department under the Delivery Assurance team for the North America business
unit, with monthly salaries of 55000, 50000, 40000, 48000, 23000 and 35000. With
a short list like this it is easy to find the value in the middle: half of the
values are lower than the middle value and half are higher. As a first step, we
sort the salaries from lowest to highest: 23000, 35000, 40000, 48000, 50000,
55000. Next we check whether the data set has an odd or even number of values.
Here n = 6, which is even, so the median is the average of the values at the 3rd
and 4th locations: (40000 + 48000) / 2 = 44000. In this small list the median
(44000) and the mean (about 41,833) differ only slightly. In a list with more
extreme values, the difference between the median and the mean would be larger.
While working on compensation we should prefer the median over the mean, because
the median is less affected by outliers.
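
The odd/even rule for the median can be sketched in Python as follows. The
function name `median` and the variable names are illustrative assumptions; the
salaries are the ones from the example, and the result is cross-checked against
Python's built-in `statistics.median`.

```python
import statistics

salaries = [55000, 50000, 40000, 48000, 23000, 35000]

def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: value at the ((n + 1) / 2)th location (index mid, counting from 0).
        return ordered[mid]
    # Even n: average of the (n/2)th and (n/2 + 1)th locations.
    return (ordered[mid - 1] + ordered[mid]) / 2

median_salary = median(salaries)
mean_salary = sum(salaries) / len(salaries)

print(sorted(salaries))     # [23000, 35000, 40000, 48000, 50000, 55000]
print(median_salary)        # 44000.0
print(round(mean_salary))   # 41833
print(median_salary == statistics.median(salaries))  # True
```

The median comes out a little higher than the mean because the lowest salary,
23000, pulls the mean down but does not change which values sit in the middle.
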

Standard Deviation:

Standard deviation is a statistic that measures how widely a set of data is
scattered around its mean, and it is calculated as the square root of the
variance. It reflects the typical distance between each data point and the mean.
A small standard deviation means the data points lie close to the mean; a large
standard deviation means they are widely spread.

There are two types of standard deviation:

Population standard deviation:

σ = √( Σ(xi - μ)² / N )

σ = population standard deviation, μ = population mean, N = population size,
xi = value in the dataset at the ith location.

Sample standard deviation:

Sx = √( Σ(xi - x̄)² / (n - 1) )

Sx = sample standard deviation, x̄ = sample mean, n = sample size, xi = value in
the dataset at the ith location.

Normal Distribution:

In a normal distribution the values are distributed symmetrically around the
mean. Its density curve is called the bell curve, or sometimes the normal curve.
The central value of a normal distribution is the mean, denoted µ. Most data
points lie near the mean, and fewer and fewer lie far from it.

Empirical rule:

The empirical rule states that for a normal distribution almost all of the data
fall within three standard deviations of the mean: about 68% of the data fall
within one standard deviation, about 95% within two standard deviations and
about 99.7% within three standard deviations.
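
The percentages in the empirical rule can be reproduced from the normal
distribution itself. The sketch below assumes Python 3.8 or later, where
`statistics.NormalDist` is available; the helper name `share_within` is
illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```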


As a training team lead, we have asked the participants to rate different
trainers from 1 to 5, so that each trainer can see whether their ratings are
average, below average or above average. With the sample data, the first step is
to calculate the mean by adding all the ratings and dividing by the total number
of data points. The second step is to subtract the mean from each data point.
The third step is to square each result, sum the squares and divide the sum by
the number of data points minus 1. The final step is to take the square root,
which gives the standard deviation.
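
The four steps can be followed one by one in Python. The ratings below are
hypothetical values made up for illustration (the assignment does not list any),
and the result is compared with `statistics.stdev`, which also uses the n - 1
formula.

```python
import math
import statistics

# Hypothetical ratings (1 to 5) given to one trainer by eight participants.
ratings = [4, 5, 3, 4, 2, 5, 4, 5]
n = len(ratings)

# Step 1: mean = sum of data points / number of data points.
mean_rating = sum(ratings) / n

# Step 2: subtract the mean from each data point.
deviations = [x - mean_rating for x in ratings]

# Step 3: square the deviations, sum them, and divide by n - 1.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

# Step 4: take the square root.
sample_sd = math.sqrt(sample_variance)

print(mean_rating)           # 4.0
print(round(sample_sd, 3))   # 1.069
print(math.isclose(sample_sd, statistics.stdev(ratings)))  # True
```

Dividing by n instead of n - 1 in step 3 would give the population standard
deviation, which for these ratings is exactly 1.0.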

Skewness:

Skewness is a measure of how far a distribution departs from the symmetry of the
bell curve (normal distribution). A distribution is said to be skewed when the
data points cluster more towards one side of the scale than the other. Skewness
can be shown using a list of data or a graph. There are two types of skewness:

Positive Skewness:

When the frequency curve of the distribution has a long tail on the right side,
it is called positive skewness.

Negative Skewness:

When the frequency curve of the distribution has a long tail on the left side,
it is called negative skewness.

Skewness = [ n / ((n - 1)(n - 2)) ] × Σ(xi - x̄)³ / S³

n = sample size, x̄ = sample mean, S = sample standard deviation, xi = value in
the dataset at the ith location.

If skewness is less than -1 or greater than +1, the distribution is highly
skewed. If skewness is between -1 and -1/2 or between +1/2 and +1, the
distribution is moderately skewed. If skewness is between -1/2 and +1/2, the
distribution is approximately symmetric.
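
The skewness formula and the rule of thumb can be sketched in Python as below.
The function names and the sales figures are hypothetical and chosen only to
show a long right tail; they are not taken from real data.

```python
import math

def sample_skewness(values):
    """Skewness = n * sum((xi - mean)^3) / ((n - 1) * (n - 2) * S^3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least three values")
    mean = sum(values) / n
    # S is the sample standard deviation (n - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed_sum = sum((x - mean) ** 3 for x in values)
    return n * cubed_sum / ((n - 1) * (n - 2) * s ** 3)

def describe(skewness):
    """Apply the rule of thumb for interpreting a skewness value."""
    if abs(skewness) > 1:
        return "highly skewed"
    if abs(skewness) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

# Hypothetical monthly sales in thousands: most values are modest, one is very high.
sales = [20, 25, 30, 35, 40, 45, 50, 120]
skew = sample_skewness(sales)

print(skew > 0)                                    # True: the long tail is on the right
print(describe(skew))                              # highly skewed
print(describe(sample_skewness([1, 2, 3, 4, 5])))  # approximately symmetric
```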


In a marketing company, the monthly sales of each salesperson in a particular
region run into the thousands. The majority of people sell products worth 20000
to 50000. Very few sell products worth less than 10000 and very few sell
products worth more than 100000. Here the central value is around 50000, and we
see a long tail on the right side of the central value. As the tail is on the
positive side of the central value, we say that the distribution is positively
skewed.

Assignment 3:

" Correlation does not necessarily mean causation." Do you agree or disagree
with this statement? In either case, support your answers with two different
business examples. (20 Marks)


Correlation is used to understand the relationship between two variables.

Correlation is a statistical measure of the extent to which two variables are
linearly related (i.e. they change together at a constant rate). It is a common
tool for describing simple relationships without making any statement about
cause and effect.


Causation refers to the relationship between two events in which one event is
brought about by the other. That is, the value of one variable increases or
decreases as a result of a change in the other. Causation can be claimed with
confidence only when a controlled experiment has shown that one variable is
affected by the other.

Correlation does not necessarily mean causation: I agree with this statement,
because correlation alone cannot establish a cause-and-effect relationship
between two variables. Correlation analysis looks at only two variables (X and
Y) and takes no account of any third factor. It has two parameters, strength and
direction. Strength has three categories: low, medium and high. Direction has
two categories: positive and negative. The coefficient of correlation is denoted
r and lies between -1 and +1. If r is positive (up to +1), the direction of the
relationship between the two variables is positive. If r is negative (down to
-1), the direction is negative. If r is 0, there is no linear correlation. We
calculate the correlation coefficient not only to determine whether both
variables move in the same direction but also to determine how strong the
relationship between the two variables is.
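
As a small illustration of how r captures both direction and strength, the
sketch below computes the Pearson correlation coefficient by hand. The function
name `pearson_r` and the training-hours and test-score figures are hypothetical,
chosen only for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: training hours attended and test scores of five employees.
training_hours = [5, 10, 15, 20, 25]
test_scores = [60, 65, 75, 80, 95]

r = pearson_r(training_hours, test_scores)
print(round(r, 2))   # 0.98: positive direction, high strength
```

A value this close to +1 only says that the two lists move together; on its own
it says nothing about whether the training caused the higher scores.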

Whenever we look at the relationship between two variables, it is prudent to be
conservative and to assume that the relationship is a correlation rather than a
causal one. Even if there is a correlation between two variables, it cannot be
concluded that one variable causes a change in the other. The relationship may
be coincidental, or a third factor may cause both variables to change.
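
The effect of a third factor can be illustrated with a small simulation. This is
only a sketch under made-up assumptions: the factor Z, the noise levels and the
sample size are all hypothetical, and X is never used to compute Y.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson coefficient of correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(42)  # fixed seed so the run is repeatable
n = 5000

# Z: hypothetical third factor (for example, prior work experience).
z = [random.gauss(0, 1) for _ in range(n)]
# X and Y each depend on Z plus their own independent noise.
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

# X and Y are strongly correlated even though neither causes the other.
r_xy = pearson_r(x, y)

# Once Z's contribution is removed, almost no correlation is left.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson_r(x_without_z, y_without_z)

print(r_xy > 0.7)               # True
print(abs(r_without_z) < 0.1)   # True
```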

There are four main reasons why a correlation may not reflect causation. The
first is that some third variable (Z) affects both X and Y at the same time,
moving X and Y together. The second is reverse causality: X and Y move together
not because X causes Y but because Y causes X. The third is sample selection:
the sample we see is not representative of the people we are interested in. The
fourth is measurement error: the results we are interested in are difficult to
measure and can only be fully

1. Employees who attend more training are the most knowledgeable.

It may seem that training makes people more knowledgeable and gets them
deployed at client locations faster, but this is not always true. Other
factors may be involved: they might have worked on multiple projects
earlier and gained more work experience in that area.
2. In some organisations a few team leads practise micromanagement on their
team members. It may seem that when micromanagement increases, employee
performance decreases, and when micromanagement decreases, employee
performance increases, but this is not always true. Other factors may be
behind the change in performance.

