STATISTICS (Tanya) Pg 1 - 28

 What are statistics?

Anything involving numerical facts and figures, but also how these numbers are chosen and
interpreted.

EXAMPLE of wrong interpretation:

A new advertisement for Ben and Jerry's ice cream, introduced in late May of last year,
was followed by higher sales, and so it was judged to be effective.
A major flaw is that ice cream consumption generally increases in the months of June, July,
and August regardless of advertisements. This effect is called a history effect and leads
people to interpret outcomes as the result of one variable when another variable is
actually responsible.

Therefore, “statistics” refers to a range of techniques and
procedures for analysing, interpreting, displaying, and making
decisions based on data.

 Importance
1) Lends credibility to an argument.
2) It provides tools that you need in order to react intelligently to information you hear
or read.

 Descriptive stats:
Numbers used to summarize and describe data
No assumptions/inferences can be made.
Just descriptive in nature.

Average salaries for various occupations in 1999.

$112,760 paediatricians
$106,130 dentists
$100,090 podiatrists
$76,140 physicists
$53,410 architects,
$49,720 school, clinical, and counselling psychologists
$47,910 flight attendants
$39,560 elementary school teachers
$38,710 police officers
$18,980 floral designers
Prima facie description:
We pay the people who educate our children and who protect our citizens a great
deal less than we pay people who take care of our feet or our teeth.

No inference should be made from this data as there might be various other factors
involved too.

 Inferential stats:
The mathematical procedures whereby we convert information about the sample
into intelligent guesses about the population.

1) Sample: a small subset of a larger set of data, used to draw inferences about the
larger set.
2) Population: the larger set from which the sample is drawn.

The underlying assumption is that sampling is random and covers all varieties of specimens in the
population.

 Simple Random Sampling:

1) Every member of the population has an equal chance of being selected
into the sample.
2) The selection of one member must be independent of the selection of every
other member.

In this sense, we can say that simple random sampling chooses a sample by pure
chance.

Sample size matters. Only a large sample size makes it likely that our sample is
close to representative of the population.

 Random Assignment: (DOUBT)

The sample is randomly divided into groups, with no prior
differentiation between them, and each group is subjected to a different treatment.
A non-random sample simply restricts the generalizability of the results.

 Stratified Sampling:
Used to make the sample more representative of the population.
Used if the population has a number of distinct “strata” or groups.

You first identify members of your sample who belong to each group. Then
you randomly sample from each of those subgroups in such a way that the
sizes of the subgroups in the sample are proportional to their sizes in the
population.
(subgroup size in sample : sample size = subgroup size in population : population size)
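The proportional rule above can be sketched in a few lines of Python. This is only an illustration: the population of undergraduates ("UG") and postgraduates ("PG"), the function name `stratified_sample` and the rounding of subgroup sizes are assumptions made for the example, not part of the notes.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a sample whose strata are (roughly) proportional to the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # Subgroup size in the sample is proportional to its share of the population.
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 undergraduates and 40 postgraduates.
population = [(i, "UG") for i in range(60)] + [(i, "PG") for i in range(60, 100)]
sample = stratified_sample(population, lambda m: m[1], sample_size=10)
counts = {s: sum(1 for m in sample if m[1] == s) for s in ("UG", "PG")}
print(counts)  # {'UG': 6, 'PG': 4}
```

With a 60:40 population and a sample of 10, the sample keeps the same 6:4 split. When the proportions do not divide evenly, the rounding step may make the sample slightly larger or smaller than requested.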
 Variables:
Variables are properties or characteristics of some event, object, or person that can
take on different values or amounts.

1) Independent Variables: Variables that are manipulated by the experimenter only
and are not affected by the other variables in the study.

Dependent Variables: Variables that vary when an independent variable is
manipulated by the experimenter, i.e. they depend on the independent
variable.

2) Qualitative Variables (also called categorical variables): Those that express a
qualitative attribute such as hair
colour, eye colour, religion, favourite movie, gender, and so on. The values of a
qualitative variable do not imply a numerical ordering.

Quantitative Variables: Those variables that are
measured in terms of numbers, such as height, weight, and shoe size.

3) Discrete Variables: Variables such as number of children in a household are
called discrete variables since the possible scores are discrete points on the
scale. For example, a household could have three children or six children, but
not 4.53 children.

Continuous Variables: Variables such as “time to respond to a question” are
continuous variables since the scale is continuous and not made up of discrete
steps. The response time could be 1.64 seconds or 1.64237123922121 seconds.

 Percentiles:(DOUBT)
The percentage of people whose scores fall below your score on a test (or other measure).
It shows your performance relative to your competitors.
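A minimal sketch of this "percentage below" idea, assuming a made-up list of ten test scores. The function name `percentile_rank` is illustrative, and other definitions of a percentile exist that would give slightly different answers.

```python
def percentile_rank(scores, your_score):
    """Percentage of scores that are strictly below your_score."""
    below = sum(1 for s in scores if s < your_score)
    return 100 * below / len(scores)

scores = [35, 42, 50, 58, 61, 67, 73, 80, 88, 95]  # hypothetical test scores
rank = percentile_rank(scores, 80)
print(rank)  # 70.0
```

Seven of the ten scores are below 80, so a score of 80 sits at the 70th percentile under this definition.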
STATISTICS (PurabSoni)Pg 29-52
The first two definitions are self-explanatory, so only the third is discussed.
For an example, refer to Pg no. 30.
The explanation is given in the audio.

Levels of Measurement-
Types of Scales
Nominal Scales
Ordinal Scales
Interval Scales
Ratio Scales

Nominal: The essential point about nominal scales is that they do not imply
any ordering among the responses. For example, when classifying people
according to their favorite color, there is no sense in which green is placed
“ahead of” blue. Responses are merely categorized. Nominal scales embody
the lowest level of measurement.

Ordinal: EX: Rating surveys in restaurants. A customer gets a paper or
online survey with a question: “How satisfied are you with the dining
experience?” with options from 0 to 10, 0 being extremely dissatisfied and 10 being
extremely satisfied.
Thus, an ordinal scale is used as a comparison parameter to understand
whether the values are greater or lesser than one another by ordering them. The
appropriate measure of central tendency for an ordinal scale is the median.
Limitation: Survey respondents will choose between these options of
satisfaction, but the answer to “how much?” (the size of the gap between two options)
will remain unanswered.

Interval: An interval scale is defined as a numerical scale where the order of
the variables is known as well as the difference between these variables.
Variables which have familiar, constant and computable differences are
classified using the interval scale. It is easy to remember the primary role of this
scale too: ‘interval’ indicates ‘distance between two entities’, which is what
the interval scale helps in measuring.
These scales are effective as they open doors for the statistical analysis of
provided data. Mean, median or mode can be used to calculate the central
tendency in this scale.
Limitation: The only drawback of this scale is that there is no pre-decided
starting point or true zero value.
EX: Time of day is a very common example of an interval scale, as the values are already
established, constant and measurable.

Ratio: A ratio scale is defined as a variable measurement scale that not only
produces the order of variables but also makes the difference between
variables known, along with information on the value of true zero. The ratio scale
provides the most detailed information, as researchers and statisticians can
calculate the central tendency using the mean,
median or mode; measures such as the geometric mean, the coefficient of
variation or the harmonic mean can also be used on this scale.
Ratio Scale Example:
The following question falls under the ratio scale category:
What is your daughter’s current height?
 Less than 5 feet.
 5 feet 1 inch – 5 feet 5 inches
 5 feet 6 inches- 6 feet
 More than 6 feet

Discrete: Fixed frequencies are distributed to different categories.
Continuous: Ranges are made, with frequencies for the different ranges.
Probability Distribution: The probability of each frequency occurring is calculated.
The chance of each frequency appearing can affect the ultimate data table.

Example: Discrete
Colour    Frequency
Red       12
Brown     1
Blue      7
Purple    6
[Bar chart: frequency of each colour (Red, Brown, Blue, Purple)]

Probability Distribution
Colour    Probability
Red       0.4
Brown     0.15
Blue      0.25
Purple    0.2
[Bar chart: probability of each colour (Red, Brown, Blue, Purple)]

A hand gesture and average time men take to respond.

Time to Respond (in milliseconds)    Frequency
500-600                              3
600-700                              9
700-800                              2
800-900                              6
[Histogram: frequency for each response-time interval (500-600, 600-700, 700-800, 800-900)]

Probability Density
To represent the probability of any given event associated with an arbitrary movement like
the one mentioned above, we plot the frequencies over the stipulated range of times. To
account for all possible outcomes, we make the curve continuous so as not to miss
any outcome. A normal, bell-shaped curve is the most common curve used to represent such
data.

Here, the probability for the event to occur is highest at the centre and lowest
at the ends, where the curve approaches the X-axis.

 Area under the curve is equal to 1 because the curve shows the summation of all
probabilities for different events.
 Second, the probability of any exact value of X is 0 as the probability that his
movement takes exactly 698.956432342346576 milliseconds is essentially zero.

Shapes of distributions
Not all distributions look like a bell. Not all have their probability density centred at
the middle. They could be more spread out.
Note: A normal bell curve would have its mean, median and mode all at the centre. When the
median deviates from the mean, we say the distribution is skewed, or more spread out on one side.
A distribution with the longer tail extending in the positive direction is said to have a positive skew.
It is also described as “skewed to the right.”

A distribution with the longer tail extending in the negative direction is said to have a negative skew.
It is also described as “skewed to the left”.


Take 3 numbers (49, 50, 51), for which the mean and the median are both 50. If a much smaller
number is added, say 40, the mean (47.5) falls to the left of the median (49.5). The longer tail is
then on the left and the distribution is negatively skewed. On the other hand, if a much larger number
is added, say 60, the mean (52.5) moves to the right of the median (50.5), making it positively skewed.

Talking in terms of placements at B-schools, usually the figures are positively skewed, as a few very
high values pull the mean to the right of the median.
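The direction of skew can be checked by comparing the mean with the median. The sketch below uses the three numbers from the example plus one extreme value on either side; the extreme values 40 and 60 are illustrative assumptions.

```python
from statistics import mean, median

base = [49, 50, 51]
low_outlier = base + [40]    # one extreme low value: longer tail on the left
high_outlier = base + [60]   # one extreme high value: longer tail on the right

for data in (base, low_outlier, high_outlier):
    print(sorted(data), "mean =", mean(data), "median =", median(data))
```

A mean below the median goes with a negative (left) skew, and a mean above the median goes with a positive (right) skew.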

 Distributions also differ from each other in terms of how large or “fat” their tails are. The left
distribution has relatively more scores in its tails; its shape is called leptokurtic. The right
distribution has relatively fewer scores in its tails; its shape is called platykurtic.
STATISTICS (Nivi ) Page –(52-91)

 Summation – Mathematical formulae often involve the addition of a large number of
values. Summation or Sigma notation is used as a concise and convenient
expression for denoting the sum of the values of a variable.
For Ex: Suppose variable X denotes the weight of three students:

Student Weight - X
Harry 60
Ron 54
Hermione 57

\sum_{i=1}^{3} X_i = 60 + 54 + 57 = 171

Many formulas involve squaring numbers before they are summed. This is indicated as:

\sum X^2 = 60^2 + 54^2 + 57^2 = 9765

Some formulas involve the sum of cross products:
X and Y denote variables and XY denotes their cross product.

X    Y    XY
1    3    3
2    2    4
3    4    12

\sum XY = 3 + 4 + 12 = 19
This indicates the summation of cross products.

Also, (\sum X)^2 \neq \sum X^2, because the expression on the left means to sum up all
the values of X and then square the sum (for the weights above, 171^2 = 29241), whereas the
expression on the right means to square the numbers and then sum the squares (9765).
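The summation expressions in this section map directly onto Python's built-in `sum`. The sketch below reuses the weights and the X, Y pairs from the notes; the variable names are illustrative.

```python
weights = [60, 54, 57]                         # X: weights of Harry, Ron, Hermione

sum_x = sum(weights)                           # sum of X
sum_x_squared = sum(x ** 2 for x in weights)   # sum of X squared
square_of_sum = sum_x ** 2                     # (sum of X) squared

xs = [1, 2, 3]
ys = [3, 2, 4]
sum_xy = sum(x * y for x, y in zip(xs, ys))    # sum of cross products

print(sum_x, sum_x_squared, square_of_sum, sum_xy)  # 171 9765 29241 19
```

The two middle values differ (9765 against 29241), which shows that squaring before summing is not the same as summing before squaring.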

 Linear transformation – Often it is necessary to transform data from one
measurement scale to another. Cases where the transformation from one scale
to another consists of multiplying by one constant and then adding a second
constant are called linear transformations.
For Ex: Conversion from Centigrade to Fahrenheit, F = 1.8 C + 32. Here
the first constant is 1.8 and the second is 32. Also,
the plot of degrees Centigrade as a function of degrees Fahrenheit will always form a
straight line, and this is the case for all linear transformations.
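A short sketch of the Centigrade to Fahrenheit conversion as a linear transformation. The sample temperatures are arbitrary, and the final check only confirms that equal steps in Centigrade give equal steps in Fahrenheit, which is what a straight line means.

```python
def celsius_to_fahrenheit(c):
    """Linear transformation: multiply by one constant, then add a second."""
    return 1.8 * c + 32

temps_c = [0, 25, 50, 75, 100]                  # arbitrary, equally spaced values
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]

# Differences between neighbouring Fahrenheit values (all 45, up to rounding).
steps = [round(b - a, 6) for a, b in zip(temps_f, temps_f[1:])]
print([round(f, 1) for f in temps_f], steps)
```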

 Logarithms –

 The logarithm of a number to the base b is the power to which b must be
raised to produce the number. Thus, suppose y = b^x. Then, log_b(y) = x, or
log_b(b^x) = x.
 Logs are, in a sense, the opposite of exponents. Consider the following
simple expression: 10^2 = 100. Here we can say the base of 10 is raised to the
second power. Here is an example of a log: Log10(100) = 2.
 Natural logarithms can be indicated either as: Ln(x) or loge(x). Changing the
base of the log changes the result by a multiplicative constant. To convert
from Log10 to natural logs, you multiply by 2.303. Analogously, to convert in
the other direction, you divide by 2.303.
 Taking the antilog of a number undoes the operation of taking the log.
Therefore, since Log10(1000) = 3, the antilog10 of 3 is 1,000.
 A series of numbers that increases proportionally will increase in equal
amounts when converted to logs. For example, if one student increased their
score from 100 to 200 while a second student increased theirs from 150 to
300, the percentage change (100%) is the same for both students. The log
difference is also the same, as shown below.

Log10(100) = 2.000 Log10(200) = 2.301 Difference: 0.301

Log10(150) = 2.176 Log10(300) = 2.477 Difference: 0.301
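The log values in the table, and the factor of 2.303 between base-10 and natural logs, can be reproduced with Python's `math` module. This is a quick numerical check rather than anything specific to the notes.

```python
import math

# Equal percentage changes give equal differences on a log scale.
diff_a = math.log10(200) - math.log10(100)
diff_b = math.log10(300) - math.log10(150)
print(round(diff_a, 3), round(diff_b, 3))   # 0.301 0.301

# Ratio of the natural log to the base-10 log of the same number.
x = 1000
ratio = math.log(x) / math.log10(x)
print(round(ratio, 3))                      # 2.303
```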

 Statistical Literacy - Statistical literacy is the ability to understand and reason
with statistics and data. The abilities to understand and reason with data, or
arguments that use data, are necessary for citizens to understand material
presented in publications such as newspapers, television, and the Internet. Being
statistically literate is sometimes taken to include the ability both to
critically evaluate statistical material and to appreciate the relevance of statistically
based approaches to all aspects of life in general or to the evaluation, design, and/or
production of scientific work.

 Graphing qualitative data - Graphing is the most important aspect of data
analysis, through which we can convert tedious data into concise information. The key
point about qualitative data is that they do not come with a pre-established
ordering. For ex: Consider consumers opting for vanilla, butterscotch and chocolate
ice creams. Consider the frequency table:

Ice cream       Frequency    Relative frequency
Vanilla         85           85/500 = 0.17
Butterscotch    60           60/500 = 0.12
Chocolate       355          355/500 = 0.71
Total           500          1
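The relative frequencies in this table are each count divided by the total. A minimal sketch using the ice cream counts from the notes (the dictionary layout is just one convenient choice):

```python
frequencies = {"Vanilla": 85, "Butterscotch": 60, "Chocolate": 355}

total = sum(frequencies.values())
relative = {flavour: count / total for flavour, count in frequencies.items()}

for flavour, rel in relative.items():
    # Relative frequency, and the same value as a percentage (as used for pie slices).
    print(f"{flavour:<13}{frequencies[flavour]:>5}{rel:>7.2f}{rel * 100:>6.0f}%")
```

The relative frequencies always add up to 1, which is a handy check on the arithmetic.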

 Pie – chart - Pie charts are effective for displaying the relative frequencies of
a small number of categories. In a pie chart, each category is represented by
a slice of the pie. The area of the slice is proportional to the percentage of
responses in the category. This is simply the relative frequency multiplied by
100.
 Bar charts - Bar charts can also be used to represent frequencies of different
categories. Here, the frequencies (the number of consumers opting for a particular
ice cream) are shown on the Y-axis and the ice-cream flavours are shown on the X-axis.

Bar charts are better when there are more than just a few categories and for
comparing two or more distributions.
 Some common mistakes to avoid – Unnecessary fanciness can lead to
unacceptable distortion and misrepresent the information you are trying to convey.
For example, using 3D charts, setting the baseline to a value other than zero,
using a line graph for qualitative variables, etc.

 Graphing quantitative variables - Quantitative variables are variables
measured on a numeric scale. Height, weight, response time, subjective rating of
pain, temperature, and score on an exam are all examples of quantitative variables.

 Stem and leaf displays - A stem and leaf display is a graphical method of
displaying data. It is particularly useful when your data are not too numerous.
The 'stem' is on the left and displays the first digit or digits. The 'leaf' is on the
right and displays the last digit.
For example, if we have a distribution:
3, 6, 9, 10, 10, 11, 14, 17, 19, 20, 22, 22, 27, 28, 29, 31, 31, 33, 33, 33
The numbers 3, 2, 1 and 0 (for single digits) are arranged as stems and the
numbers to the right of the bar are leaves, and they represent the 1’s digits.
3 |11333 (31, 31, 33, 33, 33)
2 |022789 (20, 22, 22, 27, 28, 29)
1 |001479 (10, 10, 11, 14, 17, 19)
0 |369 (3, 6, 9)
We repeat the leaf as many times as the value occurs in the dataset. We can
also refine this figure by splitting each stem into two parts. For example, if we
allocate each row a specified interval:
Row 1 (30-34): 3 |11333
Row 2 (25-29): 2 |789
Row 3 (20-24): 2 |022
Row 4 (15-19): 1 |79
Row 5 (10-14): 1 |0014
Row 6 (5-9):   0 |69
Row 7 (0-4):   0 |3
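The basic (unsplit) display can be generated with a few lines of code. This sketch assumes non-negative whole numbers, uses the tens digit(s) as the stem and the units digit as the leaf, and simply omits stems that have no values; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display rows, highest stem first, for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)   # stem = tens digit(s), leaf = units digit
    return [f"{stem} |{''.join(map(str, leaves[stem]))}"
            for stem in sorted(leaves, reverse=True)]

data = [3, 6, 9, 10, 10, 11, 14, 17, 19, 20, 22, 22, 27, 28, 29, 31, 31, 33, 33, 33]
rows = stem_and_leaf(data)
print("\n".join(rows))
```

For the distribution above this prints the same four rows as the hand-drawn display, with each leaf repeated once per occurrence.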

We can also use a stem and leaf display for comparison. For instance, suppose
girls and boys in a school each read a certain number of books. This can be represented
on a back-to-back stem and leaf display. From such a plot we can infer, for example, that there are two girls
who read 51 books while there is only one boy who read 51 books, and
so on.

Decimal numbers and negative numbers can also be plotted on a stem and
leaf display. We can round off the decimal to nearest whole number and to
represent negative numbers we can use negative stems. For instance,
43.9 can be rounded to 44, 51.2 can be rounded to 51 and so on.
Consider the data set: 43.9, 51.2, -27.4, -15.4, 1.2, -0.2, -6.3, -6.7, -8.8
Now to plot this on stem and leaf display:

5 |1       51.2 rounded off to 51
4 |4       43.9 rounded off to 44
0 |1       1.2 rounded off to 1
-0|0679    -0.2 rounded off to negative 0, -6.3 to -6, -6.7 to -7, -8.8 to -9
-1|5       -15.4 rounded off to -15
-2|7       -27.4 rounded off to -27

Although stem and leaf displays are unwieldy for large data sets, they are
often useful for data sets with up to 200 observations. If we use a
stem and leaf display for values with many digits, such as population figures, we need to
round the values to two significant digits. For example, 493,559 can
be rounded to 490,000 and then plotted with a stem of 4 and a leaf of 9.
Whether your data can be suitably represented by a stem and leaf display
depends on whether they can be rounded without loss of important
information.
 Histograms - A histogram is a graphical method for displaying the shape of a
distribution. It is particularly useful when there are a large number of
observations. It groups the observations into ranges or intervals. The height
of each bar shows how many elements fall into each interval, known as the class
frequency. Let us consider the height distribution of trees in an orchard. We
can group the heights into class intervals 50 cm wide. The first bar would depict the number of trees
from 100 cm to 150 cm, the second bar the number of trees from 150 cm to
200 cm, and so on.

Histograms can be based on relative frequencies instead of actual

frequencies. Histograms based on relative frequencies show the proportion
of scores in each interval rather than the number of scores. In this case, the
Y-axis runs from 0 to 1 (or somewhere in between if there are no extreme
proportions). You can change a histogram based on frequencies to one based
on relative frequencies by (a) dividing each class frequency by the total
number of observations, and then (b) plotting the quotients on the Y-axis
(labelled as proportion).
There is more to be said about the widths of the class intervals, sometimes
called bin widths. Your choice of bin width determines the number of class
intervals. The best advice is to experiment with different choices of width,
and to choose a histogram according to how well it communicates the shape
of the distribution. There are some “rules of thumb” that can help you
choose an appropriate width.
1) Sturges’ rule is to set the number of intervals as close as possible to 1
+ Log2(N), where Log2(N) is the base 2 log of the number of
observations. According to Sturges’ rule, 1000 observations would
be graphed with 11 class intervals, since 10 is the closest integer to
Log2(1000).
2) Rice rule is to set the number of intervals to twice the cube root of
the number of observations. In the case of 1000 observations, the
Rice rule yields 20 intervals instead of the 11 recommended by
Sturges' rule.
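Both rules of thumb are one-line formulas. The sketch below rounds each result to the nearest whole number of intervals; the function names are illustrative.

```python
import math

def sturges_bins(n):
    """Sturges' rule: about 1 + log2(N) class intervals."""
    return round(1 + math.log2(n))

def rice_bins(n):
    """Rice rule: about twice the cube root of N class intervals."""
    return round(2 * n ** (1 / 3))

print(sturges_bins(1000), rice_bins(1000))  # 11 20
```

Both are starting points only. The advice to try several bin widths and keep the one that best communicates the shape still applies.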

 Frequency Polygons - Frequency polygons are a graphical device for
understanding the shapes of distributions. They serve the same purpose as
histograms, but are especially helpful for comparing sets of data. Frequency
polygons are also a good choice for displaying cumulative frequency
distributions. To draw a frequency polygon first start with choosing the
interval. Then represent the middle point value of each class data on X- axis
and frequency of each class on Y-axis. Place a point in the middle of each
class interval at the height corresponding to its frequency. Finally, connect
the points. You should include one class interval below the lowest value in
your data and one above the highest value. The graph will then touch the X-
axis on both sides. For example the frequency polygon below shows the
number of students for each score interval. The point 13 corresponds to
score range 11.5-14.5, the point 16 corresponds to score range 14.5-17.5 and
so on. So between 11.5-14.5 we have 1 student, between 14.5-17.5 we have
2 students etc.
A cumulative frequency polygon can be plotted by adding to each class frequency the
frequencies of all the lower class intervals. Consider the following table:

Limit Frequency Cumulative frequency

25 10 10
30 12 10+12=22
35 8 22+8=30
40 20 30+20=50
45 11 50+11=61
50 4 61+4=65
55 5 65+5=70
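The running totals in the last column can be produced with `itertools.accumulate`. The limits and frequencies below are copied from the table; the variable names are illustrative.

```python
from itertools import accumulate

limits = [25, 30, 35, 40, 45, 50, 55]
frequencies = [10, 12, 8, 20, 11, 4, 5]

# Each entry is the frequency of its class plus all lower classes.
cumulative = list(accumulate(frequencies))

for limit, freq, cum in zip(limits, frequencies, cumulative):
    print(limit, freq, cum)
```

The last cumulative value (70) equals the total number of observations.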

Frequency polygons are useful for comparing distributions. This is achieved by
overlaying the frequency polygons drawn for different data sets.
It is also possible to plot two cumulative frequency distributions in the same graph.
This is illustrated in the figures below:

STATISTICS - PART 4 [ Paul Vinod] Pages # 123 - 158


Summarizing distributions, or descriptive statistics, helps in describing an
entire set of numbers or values with a few numbers.

4.1 Objectives:
 Central Tendency
o Mean, Median and Mode
o Calculation
 Variability
o Standard Deviation
o Variance
 Shape and Transformations
o Variance Sum Law I

4.2 Central Tendency

There are three ways of defining the centre of a distribution:

1. Balance Scale

The point in the given data set where the distribution is in balance. For
example, for a given data set S = { 2, 3, 4, 9, 16} the following image represent
the balance scale.

The balance point or the fulcrum would vary depending on the type of
distribution, whether it is a symmetric or asymmetric distribution. Examples as
shown below.

=> Symmetric Distribution

=> Asymmetric Distribution.

NOTE: The balance point/ the point where fulcrum is placed denotes the
centre of distribution

2. Smallest Absolute Deviation

The centre of a distribution can be defined as the value for which the sum of the
absolute deviations (differences) from it is the
minimum.

For a data set S = {2, 3, 4, 9, 16}, we need to find a value for which the sum of
absolute deviations is minimum.

Two candidate values V are taken for calculating the absolute deviations of the set:
A. 10
B. 5

Numbers in Set S    V = 10    V = 5
2                   8         3
3                   7         2
4                   6         1
9                   1         4
16                  6         11
Total               28        21
So, here we can see that V = 5 has a smaller sum of absolute deviations
than V = 10, and would be considered the better candidate for the centre.

3. Smallest Squared Deviation

The centre of a distribution can also be defined as the value for which the sum of the
squared deviations (differences) is minimum.

For the same sample set S = {2, 3, 4, 9, 16}, the squared deviations from V = 10 and V = 5 are:

Numbers in Set S    V = 10    V = 5
2                   64        9
3                   49        4
4                   36        1
9                   1         16
16                  36        121
Total               186       151
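The two tables can be reproduced, and extended to other candidate values, with a short script. It also checks a standard result that the tables only hint at: the sum of absolute deviations is smallest at the median, and the sum of squared deviations is smallest at the mean. The helper names are illustrative.

```python
from statistics import mean, median

S = [2, 3, 4, 9, 16]

def sum_abs_dev(v):
    """Sum of absolute deviations of S from the candidate value v."""
    return sum(abs(x - v) for x in S)

def sum_sq_dev(v):
    """Sum of squared deviations of S from the candidate value v."""
    return sum((x - v) ** 2 for x in S)

print(sum_abs_dev(10), sum_abs_dev(5))   # 28 21
print(sum_sq_dev(10), sum_sq_dev(5))     # 186 151

# The median (4) minimises the absolute deviations; the mean (6.8) the squared ones.
print(median(S), sum_abs_dev(median(S)))         # 4 20
print(mean(S), round(sum_sq_dev(mean(S)), 2))    # 6.8 134.8
```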

4.3 Measures of Central Tendency

1. Arithmetic Mean
The arithmetic mean is the sum of all the numbers in a
given data set divided by the number of values in it. The symbol “μ”
is used for the mean of a population. The symbol “M” is used for the mean of a
sample.

Example: For the data set S = {2, 3, 4, 9, 16},

μ = (2 + 3 + 4 + 9 + 16) / 5 = 34 / 5 = 6.8
2. Median
The median is the measure of central tendency that is
the midpoint of the distribution, i.e. the same number of scores is above and below
the median.
There are 2 cases in computing the median of a set of n numbers arranged in order.
Case 1: n is even
Median = average of the (n/2)th term and the (n/2 + 1)th term
Case 2: n is odd
Median = middle term, i.e. the ((n+1)/2)th term.
Note: When there are numbers with the same value then the formula of the
third definition of the 50th percentile should be used.

3. Mode
The mode is the measure of central tendency that gives the most frequently
occurring value.
For continuous data, the mode is found from a grouped frequency table, which gives the
frequency with which values fall in each range (as in the response-time table earlier in these notes).

The mode for the given sample data set would be between 600-700 as this
range has the highest frequency. Thus, the mode is the middle of the interval,
650.
Note: The mean, median and mode would be same for a symmetric
distribution, for an asymmetric distribution the trio would not be same and it
would vary.

4.4 Additional Measures of Central Tendency

1. Trimean
The trimean is the weighted average of the 25th, 50th and 75th
percentiles.
Trimean = (P25 + 2*P50 + P75) / 4
Eg: For a given data set
S = { 37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19,
19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6}

Trimean = (15 + 2* 20 + 23) / 4 = 78/4 = 19.5

2. Geometric Mean
a. The geometric mean is computed by multiplying the n numbers in a given data set
and taking the nth root of the product.
b. GM is an ideal measure for averaging rates.
Eg: Consider a stock portfolio with an initial value of
$1000 and annual returns of 13%, 22%, 12%, -5%, and -13%.
Here each return is converted to a multiplier (1.13, 1.22, 1.12, 0.95, 0.87) indicating how much
the value grew or shrank in each year.
Thus the geometric mean of the multipliers is 1.05, i.e. an average growth of about 5% per year.
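The geometric mean of the multipliers can be checked with `math.prod` (available from Python 3.8). The returns are the ones quoted in the example; the check at the end shows why the geometric mean is the right average for rates: growing at that constant rate for five years gives the same final value.

```python
import math

returns = [0.13, 0.22, 0.12, -0.05, -0.13]      # annual returns from the example
multipliers = [1 + r for r in returns]          # 1.13, 1.22, 1.12, 0.95, 0.87

geometric_mean = math.prod(multipliers) ** (1 / len(multipliers))
print(round(geometric_mean, 2))                 # 1.05

# Final value of the $1000 portfolio, computed two equivalent ways.
final_value = 1000 * math.prod(multipliers)
same_value = 1000 * geometric_mean ** len(multipliers)
print(round(final_value, 2), round(same_value, 2))
```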

3. Trimmed Mean
Removing a given percentage of the highest and lowest scores and computing the mean of the
remaining scores gives the trimmed mean.
Representation: Mean trimmed 10% means that a mean is computed with
10% of its scores trimmed off, which is 5% from the bottom of the data set and
5% from the top of the data set.
S = { 37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19,
19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6}
To calculate the mean of S trimmed 20%, remove 10% from the
bottom of S and 10% from the top of the set.
I. Total number of elements in S = 31
II. 10% from the top and from the bottom implies 3 elements each (10% of 31 is 3.1, rounded down to 3)
So the new set S’, which is the trimmed set, is as follows:
S’ = { 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19, 19, 18, 18, 18,
18, 16, 15, 14, 14, 14, 12 }
Top 10% of S = { 37, 33, 33 }
Bottom 10% of S = { 12, 9, 6 }
Therefore, the trimmed mean = mean of S’ = 504 / 25 = 20.16 [n’ = 31 - 6 = 25 terms]
[Note: The number of terms for calculating the mean is to be used from S’]
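The trimming steps can be sketched as a small function. It assumes that the number of scores cut from each end is rounded down (10% of the 31 listed values gives 3); the function name and the `total_percent` parameter are illustrative.

```python
S = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20, 20, 19,
     19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

def trimmed_mean(values, total_percent):
    """Mean after dropping total_percent / 2 percent of the scores from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * total_percent / 200)   # scores cut from each end, rounded down
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

print(len(S), round(trimmed_mean(S, 20), 2))   # 31 20.16
print(round(trimmed_mean(S, 0), 2))            # untrimmed mean: 20.45
```

Trimming the three highest and three lowest scores moves the mean only slightly here (20.45 to 20.16), because the extreme values on the two sides roughly offset each other.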

4.5 Comparing the Measures of Central Tendency

1. For a symmetric distribution the mean, median, trimean and trimmed mean
are equal.
2. There are differences among the measures of central tendency in case of
skewed distributions. As the skew varies, the measures change:
a) For a distribution with a small skew, the measures of central tendency (median,
mode, trimmed mean, mean etc.) would be close to each other.
Considering any one measure would give a summarized result of the entire
distribution.
b) For a distribution with a large skew, the measures of central tendency would
differ, and considering one measure alone would not be able to describe the
whole story.

4.6 Measures of Variability

1. Variability refers to how “spread out” the distribution is.
2. It is synonymous with the spread or dispersion of the data set.
3. Four commonly used dispersion/ variability measures:
i. Range
ii. Inter-quartile range
iii. Variance
iv. Standard deviation.
DATA for the variability
Understanding the Graphs:

1. Axes of the graphs: y axis -> frequency x-axis -> respective values (here
scores of quiz)
2. 75th and 25th percentile calculation
a. Find the summation of the frequencies for different scores, total =
b. Calculate the positions of the 75th and 25th percentiles in the ordered scores, which would be 15
and 4 respectively.
c. Find the scores at which the cumulative frequencies add up to
15 and 4, which are scores 9 and 5 respectively.

1. Range
This describes the spread of the entire distribution by calculating
the difference between the highest and lowest scores.

2. Interquartile Range
The IQR is the range of the middle 50% of scores in a distribution.
IQR = 75th percentile - 25th percentile
For Quiz 1 the IQR would be,
IQR Quiz 1 = 8 - 6 = 2
For Quiz 2 the IQR would be,
IQR Quiz 2 = 9 - 5 = 4
A related measure of variability is the semi-interquartile range. It is defined as
half of the interquartile range. For a symmetric distribution, the median +/-
the semi-interquartile range contains half the scores in the distribution.
3. Variance
Variability can also be defined in terms of how close the scores in the
distribution are to the middle of the distribution. The variance is defined as the
average squared difference of the scores from the mean.
For a population, the variance is calculated as follows:

\sigma^2 = \frac{\sum (X - \mu)^2}{N}

When the variance of the population is estimated from a sample, the formula is:

s^2 = \frac{\sum (X - M)^2}{N - 1}

s^2 is the estimate of the variance and M is the sample mean. Note that M is the
mean of a sample taken from a population with a mean of μ. Since, in practice,
the variance is usually computed in a sample, this formula is most often used.
4. Standard Deviation
The standard deviation is simply the square root of the variance. The standard
deviation is an especially useful measure of variability when the distribution is
normal or approximately normal because the proportion of the distribution
within a given number of standard deviations from the mean can be calculated.
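Python's `statistics` module provides both versions of the variance: `pvariance` and `pstdev` divide by N (population formulas), while `variance` and `stdev` divide by N - 1 (estimates from a sample). The sketch below applies them to the set S = {2, 3, 4, 9, 16} used earlier in these notes.

```python
from statistics import pvariance, pstdev, variance, stdev

S = [2, 3, 4, 9, 16]   # mean 6.8, sum of squared deviations 134.8

print(round(pvariance(S), 2), round(pstdev(S), 3))   # divide by N:     26.96 and its root
print(round(variance(S), 2), round(stdev(S), 3))     # divide by N - 1: 33.7 and its root
```

In both cases the standard deviation is simply the square root of the corresponding variance.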

[Repeat]: Please take the standard deviation section again when taking normal distributions.
4.6 Shapes of Distribution
1. Skew
a) Distributions with a large positive skew will have means larger than
their medians.
b) In a highly skewed distribution, the mean can be more than twice the median.
c) To calculate the skew index of the graph, Pearson's formula is:

\frac{3(\text{mean} - \text{median})}{\sigma}

A more commonly used measure of skew is the third moment about the mean:

\frac{\sum (X - \mu)^3}{N \sigma^3}

2. Kurtosis
It is another measure of the shape of a distribution:

\frac{\sum (X - \mu)^4}{N \sigma^4} - 3

The value “3” is subtracted to
define “no kurtosis” as the kurtosis of a normal distribution. Otherwise, a
normal distribution would have a kurtosis of 3.

Q. Difference between Skewness and Kurtosis.

R. Skewness is an indicator of lack of symmetry with respect to the central
point. Kurtosis measures whether the data are peaked (heavy-tailed) or flat (light-tailed)
relative to a normal distribution.
Q. What is the meaning of Moment?
R. Moments are quantities that describe the shape of a probability distribution. The zeroth
moment is the total probability, the first moment is the mean, the second central
moment is the variance, the third standardized moment is the skewness and
the fourth standardized moment is the kurtosis.

Statistical variance has the following properties:

1) The value of variance is never negative. It is either zero or positive.

2) Zero variance indicates that all the values are equal in the distribution.

3) If a constant is added to all values in a frequency distribution, then the variance of the
distribution does not change, i.e.

var (x) = var (x + a)

4) If all the values of a variable in a distribution are multiplied by a constant, then the variance of
the distribution is multiplied by the square of that constant.

var (a x) = a^2 var (x)

5) If we have multiple distributions having same mean and if their variances are given, then the
total variance can be calculated using the following formulae.

4.7 Variance Sum Law I

The variance of the sum or difference of two independent variables A and B can be
determined by summation of the individual variances:

\sigma^2_{A \pm B} = \sigma^2_A + \sigma^2_B

Note: These formulas for the sum and difference of variables given above only
apply when the variables are independent.
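The variance sum law, together with the rules for adding and multiplying by a constant, can be checked numerically. In this sketch A is the set used earlier in these notes and B is a made-up second variable. Pairing every value of A with every value of B is an assumption that makes the two exactly independent, so the population variances add exactly.

```python
from itertools import product
from statistics import pvariance

A = [2, 3, 4, 9, 16]
B = [1, 5, 6]            # hypothetical second variable

# Every value of A paired with every value of B: A and B are independent.
pairs = list(product(A, B))
var_sum = pvariance([a + b for a, b in pairs])
var_diff = pvariance([a - b for a, b in pairs])
print(round(var_sum, 4), round(var_diff, 4), round(pvariance(A) + pvariance(B), 4))

# Adding a constant leaves the variance unchanged;
# multiplying by a constant multiplies it by the constant squared.
shifted = pvariance([x + 10 for x in A])
scaled = pvariance([3 * x for x in A])
print(round(shifted, 2), round(scaled, 2), round(9 * pvariance(A), 2))
```

The variance of the sum and the variance of the difference come out equal, because for independent variables the individual variances add in both cases.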
Pranay Saha ( Pg : 164- 179)
Introduction to Bivariate Data
What is Bivariate Data?
In statistics, bivariate data is data on each of two variables, where each value of one of the variables is
paired with a value of the other variable. Typically it would be of interest to investigate the possible
association between the two variables.
For Example:
The figure below shows a scatter plot of Arm Strength and Grip Strength from individuals working in
physically demanding jobs including electricians, construction and maintenance workers, and auto
mechanics.

Figure. Scatter plot of Grip Strength and Arm Strength.

Not surprisingly, the stronger someone's grip, the stronger their arm tends to be. There is therefore a
positive association between these variables. Although the points cluster along a line, they are not
clustered quite as closely as they are for the scatter plot of spousal age.
However, not all scatter plots show linear relationships. Figure below shows the results of an
experiment conducted by Galileo on projectile motion.

Figure. Galileo's data showing a non-linear relationship.

In the experiment, Galileo rolled balls down an incline and measured how far they travelled as a
function of the release height. It is clear from the figure above that the relationship between “Release
Height” and “Distance Travelled” is not described well by a straight line: If you drew a line
connecting the lowest point and the highest point, all of the remaining points would be above the line.
The data are better fit by a parabola.
Scatter plots that show linear relationships between variables can differ in several ways
including the slope of the line about which they cluster and how tightly the points cluster about
the line.
Values of the Pearson Correlation

What is Pearson product-moment correlation coefficient?

The Pearson product-moment correlation coefficient is a measure of the strength of the linear
relationship between two variables. It is referred to as Pearson's correlation or simply as the
correlation coefficient. If the relationship between the variables is not linear, then the
correlation coefficient does not adequately represent the strength of the relationship between
the variables.
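
A small sketch can make this limitation concrete. The data below are invented: Y is fully determined by X through Y = X^2, yet the correlation coefficient comes out as 0 because the relationship is not linear. The helper pearson_r is a hypothetical name written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    return sum_xy / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

xs = [-3, -2, -1, 0, 1, 2, 3]          # hypothetical X values
ys_curved = [v ** 2 for v in xs]       # perfect but non-linear relationship
ys_linear = [2 * v + 1 for v in xs]    # perfect linear relationship

r_curved = pearson_r(xs, ys_curved)
r_linear = pearson_r(xs, ys_linear)

print(r_curved, r_linear)
```

Here r is 0 for the curved data and 1 for the linear data, even though X predicts Y perfectly in both cases.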

The symbol for Pearson's correlation is “ρ” when it is measured in the population and “r”
when it is measured in a sample. Because we will be dealing almost exclusively with
samples, we will use “r” to represent Pearson's correlation unless otherwise noted.

Pearson's r can range from -1 to 1. An “r” of -1 indicates a perfect negative linear
relationship between variables, an r of 0 indicates no linear relationship between variables,
and an r of 1 indicates a perfect positive linear relationship between variables. The figures
below show scatter plots for which r = 1, r = -1 and r = 0.

Figure 3. A scatter plot for which r = 0. Notice that there is no relationship between X and Y.

However, with real data, you would not expect to get values of r of exactly -1, 0, or 1.
For example:

Properties of Pearson's r
i. A basic property of Pearson's r is that its possible range is from -1 to 1. A
correlation of -1 means a perfect negative linear relationship, a correlation of 0
means no linear relationship, and a correlation of 1 means a perfect positive
linear relationship.

ii. Pearson's correlation is symmetric in the sense that the correlation of X with Y
is the same as the correlation of Y with X. For example, the correlation of
Weight with Height is the same as the correlation of Height with Weight.

iii. A critical property of Pearson's r is that it is unaffected by linear
transformations. This means that multiplying a variable by a positive constant
and/or adding a constant does not change the correlation of that variable with
other variables. (Multiplying by a negative constant reverses the sign of r but
leaves its magnitude unchanged.) For instance, the correlation of Weight and
Height does not depend on whether Height is measured in inches, feet, or even miles.
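
The symmetry and linear-transformation properties can be checked with a short sketch. The height and weight values are invented for illustration, and pearson_r is a hypothetical helper, not a function from any library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviation scores."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [v - mean_x for v in xs]
    dy = [v - mean_y for v in ys]
    sum_xy = sum(p * q for p, q in zip(dx, dy))
    return sum_xy / math.sqrt(sum(p * p for p in dx) * sum(q * q for q in dy))

# Hypothetical measurements: height in inches, weight in pounds.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 155, 160, 180]

r_hw = pearson_r(height_in, weight_lb)
r_wh = pearson_r(weight_lb, height_in)            # property ii: symmetry

height_ft = [h / 12 for h in height_in]           # change of units (scale by 1/12)
r_feet = pearson_r(height_ft, weight_lb)          # property iii: unchanged

height_shifted = [h + 100 for h in height_in]     # adding a constant
r_shifted = pearson_r(height_shifted, weight_lb)  # property iii: unchanged

height_negated = [-h for h in height_in]          # negative multiplier
r_negated = pearson_r(height_negated, weight_lb)  # same size, opposite sign

print(r_hw, r_wh, r_feet, r_shifted, r_negated)
```

The first four values agree up to floating-point rounding; the last has the same magnitude with the opposite sign.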
Computing Pearson's r
We are going to compute the correlation between the variables X and Y shown
in Table 1.

Step 1: Compute the mean of X and subtract this mean from all values of X. The new
variable is called “x.”
Step 2: The variable “y” is computed similarly.

The variables x and y are said to be deviation scores because each score is a deviation from
the mean. Notice that the means of x and y are both 0.
Next we create a new column by multiplying x and y.

Please Note:
Before proceeding with the calculations, let's consider why the sum of the xy column
reveals the relationship between X and Y. If there were no relationship between X and
Y, then positive values of x would be just as likely to be paired with negative values of y
as with positive values. This would make negative values of xy as likely as positive
values and the sum would be small. On the other hand, consider Table 1 in which high
values of X are associated with high values of Y and low values of X are associated with
low values of Y. You can see that positive values of x are associated with positive values
of y and negative values of x are associated with negative values of y. In all cases, the
product of x and y is positive, resulting in a high total for the xy column. Finally, if there
were a negative relationship then positive values of x would be associated with negative
values of y and negative values of x would be associated with positive values of y. This
would lead to negative values for xy.

Pearson's correlation is computed by dividing the sum of the xy column (Σxy)
by the square root of the product of the sum of the x² column (Σx²) and the sum
of the y² column (Σy²). The resulting formula is:

\[ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} \]
So as per the table

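An alternative computational formula that avoids the step of computing
deviation scores is:

\[ r = \frac{\sum XY - \frac{\sum X \sum Y}{N}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{N}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{N}\right)}} \]

where N is the number of (X, Y) pairs.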
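
The deviation-score procedure and the computational shortcut can be sketched side by side. The X and Y values below are invented for illustration and are not the values from Table 1; the function names are likewise hypothetical.

```python
import math

def pearson_r_deviation(X, Y):
    """r from deviation scores: sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    mean_X = sum(X) / len(X)
    mean_Y = sum(Y) / len(Y)
    x = [v - mean_X for v in X]   # Step 1: deviation scores for X
    y = [v - mean_Y for v in Y]   # Step 2: deviation scores for Y
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    sum_y2 = sum(q * q for q in y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

def pearson_r_computational(X, Y):
    """r from raw sums only, without computing deviation scores."""
    N = len(X)
    sum_X, sum_Y = sum(X), sum(Y)
    sum_XY = sum(p * q for p, q in zip(X, Y))
    sum_X2 = sum(p * p for p in X)
    sum_Y2 = sum(q * q for q in Y)
    numerator = sum_XY - sum_X * sum_Y / N
    denominator = math.sqrt((sum_X2 - sum_X ** 2 / N) * (sum_Y2 - sum_Y ** 2 / N))
    return numerator / denominator

# Hypothetical data (not the notes' Table 1).
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

r_dev = pearson_r_deviation(X, Y)
r_comp = pearson_r_computational(X, Y)
print(r_dev, r_comp)
```

For these values Σxy = 6, Σx² = 10 and Σy² = 6, so both functions return 6 / √60, roughly 0.77.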

Variance Sum Law II

When the variables X and Y are independent, the variance of the sum or
difference between X and Y can be written as follows:

\[ \sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y \]

The variance of X plus or minus Y is equal to the variance of X plus the
variance of Y.
When X and Y are correlated, the following formula should be used:

\[ \sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y \pm 2\rho\,\sigma_X \sigma_Y \]

where ρ is the correlation between X and Y in the population. For example, if
the variance of verbal SAT were 10,000, the variance of quantitative SAT were
11,000 and the correlation between these two tests were 0.50, then the variance
of total SAT (verbal + quantitative) would be:

\[ \sigma^2_{V+Q} = 10{,}000 + 11{,}000 + 2(0.5)\sqrt{10{,}000}\sqrt{11{,}000} \]

which is equal to 31,488. The variance of the difference is:

\[ \sigma^2_{V-Q} = 10{,}000 + 11{,}000 - 2(0.5)\sqrt{10{,}000}\sqrt{11{,}000} \]

which is equal to 10,512.
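
The SAT arithmetic above can be reproduced with a few lines of code. This is a minimal sketch; the function names are chosen for this example only.

```python
import math

def variance_of_sum(var_x, var_y, rho):
    """Variance of X + Y for variables with correlation rho."""
    return var_x + var_y + 2 * rho * math.sqrt(var_x) * math.sqrt(var_y)

def variance_of_difference(var_x, var_y, rho):
    """Variance of X - Y for variables with correlation rho."""
    return var_x + var_y - 2 * rho * math.sqrt(var_x) * math.sqrt(var_y)

var_verbal = 10_000        # variance of verbal SAT (from the example above)
var_quantitative = 11_000  # variance of quantitative SAT
rho = 0.50                 # correlation between the two tests

var_total = variance_of_sum(var_verbal, var_quantitative, rho)
var_diff = variance_of_difference(var_verbal, var_quantitative, rho)

print(round(var_total), round(var_diff))
```

Setting rho to 0 reduces both functions to the independent case, where the variance of the sum and of the difference are both 21,000.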

If the variances and the correlation are computed in a sample, then the following
notation is used to express the variance sum law:

\[ s^2_{X \pm Y} = s^2_X + s^2_Y \pm 2r\,s_X s_Y \]
