Statistics
Introduction to Statistics
Meaning and Definition of Statistics
Statistics is the practice or science of collecting and analyzing numerical data in large quantities,
especially for the purpose of inferring proportions in a whole from those in a representative sample.
(Reference : Oxford dictionary)
There are two categories in statistics:
1) Descriptive statistics
2) Inferential statistics.
From a GMAT perspective, our focus would be on some of the measures of descriptive statistics.
Descriptive statistics
Descriptive statistics is a summary of certain data, whose purpose is to give an overview.
The most commonly used example of this is the average, like the average marks obtained by a
student in math in a period of 3 years.
Categories of Descriptive statistics
Measures of Central tendency describe a single value around which a data set is centered.
Measures of Dispersion describe how far the values in a data set are spread out, typically
around the mean value.
Measures of Central Tendency
The most common measures of central tendency are mean, median and mode.
Mean
The Mean, also called the Arithmetic Mean or the Average, of a set of numbers is obtained by
calculating the sum of all elements in the set divided by the number of elements in the set.
Formula for calculating the Mean / Average
Mean = Sum of the elements/Number of elements in the set.
Consider the following example:
Tom scored 88 in English, 97 in Math, 90 in Science and 85 in Social Studies. Calculate his
average marks.
Solution: Average marks obtained by Tom = (88+97+90+85)/4 = 90.
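The formula above can be sketched in a few lines of Python. This is only an illustration; the function name mean and the variable tom_marks are chosen here and are not part of the original text.

```python
def mean(values):
    """Arithmetic mean: sum of the elements divided by the number of elements."""
    if not values:
        raise ValueError("the mean of an empty set is undefined")
    return sum(values) / len(values)


# Tom's marks in English, Math, Science and Social Studies
tom_marks = [88, 97, 90, 85]
print(mean(tom_marks))  # 90.0
```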
Median
The Median of a data set is the middle value of the set when the elements are arranged in
ascending or descending order.
When a data set has an odd number of elements, we choose the middle value.
When the data set has an even number of elements, the average or the mean of the two middle
values becomes the median.
Consider the following examples:
1) Find the median of the set {4, 7, 1, 0, 9}.
Solution: Arranging the set in ascending order gives {0, 1, 4, 7, 9}.
The middle value, 4, is the median of this set.
2) Find the median of the set {3, 2, 5, 10, 8, 7}
Solution: Arranging the set in ascending order gives {2, 3, 5, 7, 8, 10}.
The two middle values are 5 and 7, so the median of this set is (5+7)/2 = 6.
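Both cases (odd and even number of elements) can be handled by one short function. The sketch below is illustrative; the name median is an assumption of this example, and Python's standard library offers statistics.median for the same purpose.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if the count is even."""
    if not values:
        raise ValueError("the median of an empty set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([4, 7, 1, 0, 9]))      # 4
print(median([3, 2, 5, 10, 8, 7]))  # 6.0
```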
Note:
When a set is evenly spaced, which means the difference between consecutive elements of the
sorted set is constant, the median and mean of the set are equal.
This can be verified with the help of an example.
Find the mean and median of the set {4,8,10,6}
Mean = (4+8+10+6)/4 = 28/4 = 7
Arranging the set in order gives {4,6,8,10}, so Median = (6+8)/2 = 14/2 = 7
Mode
The Mode of a data set is the most frequently occurring value in the set.
A set may have more than one mode or no mode at all.
Consider the following examples:
1) 3, 4, 7, 3, 1, 2, 3, 9, 13
3 is the mode in this set.
2) 21, 34, 9, 57, 64, 34, 90, 9, 12, 2, 34, 9
This is a bimodal set. 34 and 9 are its modes.
3) 6, 7, 36, 2, 1, 41
This set has no mode.
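The three cases above (one mode, two modes, no mode) can be reproduced with a small counting sketch. The function name modes is hypothetical, and the rule "no value repeats means no mode" follows the convention used in these examples; other texts treat ties differently.

```python
from collections import Counter


def modes(values):
    """Return the most frequent values, or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once: by the convention above, there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 4, 7, 3, 1, 2, 3, 9, 13]))                  # [3]
print(modes([21, 34, 9, 57, 64, 34, 90, 9, 12, 2, 34, 9]))  # [9, 34]
print(modes([6, 7, 36, 2, 1, 41]))                          # []
```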
Take a look at this example:
The mean of 2,6,9,13,x is 9. Find the median of {22,x,38,11,5,9}.
Solution:
(2+6+9+13+x)/5 = 9
30+x = 5*9
x = 45-30 = 15.
To find the median, arrange the numbers in ascending order:
{5,9,11,15,22,38}
The median is (11+15)/2 = 26/2 = 13
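The same two-step reasoning can be checked in Python. This is a sketch only; the variable names are chosen for this illustration, and statistics.median is the standard-library median function.

```python
import statistics

known = [2, 6, 9, 13]
target_mean = 9

# The five values must sum to mean * count, so x is whatever is left over.
x = target_mean * (len(known) + 1) - sum(known)
print(x)  # 15

median_value = statistics.median([22, x, 38, 11, 5, 9])
print(median_value)  # 13.0
```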
Measures of Dispersion
We will be focusing on two measures of Dispersion: Range and Standard deviation.
Range
This is probably the simplest measure of dispersion. It is obtained by calculating the difference
between the highest and the lowest values of a set.
The range of the data set {3, 4, 10, 14, 8} is 14-3 = 11.
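As a one-line sketch in Python (the name value_range is chosen here to avoid clashing with the built-in range):

```python
def value_range(values):
    """Difference between the highest and the lowest value of the set."""
    return max(values) - min(values)


print(value_range([3, 4, 10, 14, 8]))  # 11
```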
Standard deviation
The Standard deviation of a set is calculated in five steps.
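The individual steps are not listed at this point in the text. A common five-step formulation, assumed here, is: find the mean, find each deviation from the mean, square the deviations, average the squares, and take the square root. The sketch below follows that formulation for the population standard deviation (dividing by n); the data set is hypothetical and chosen so that the result is a whole number.

```python
import math


def population_std_dev(values):
    """Population standard deviation, computed in five explicit steps."""
    n = len(values)
    mean = sum(values) / n                    # step 1: mean
    deviations = [v - mean for v in values]   # step 2: deviation of each value
    squared = [d * d for d in deviations]     # step 3: square the deviations
    variance = sum(squared) / n               # step 4: average of the squares
    return math.sqrt(variance)                # step 5: square root


# Hypothetical data: mean is 5, squared deviations sum to 32, variance is 4.
print(population_std_dev([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Dividing by n - 1 instead of n in step 4 gives the sample standard deviation, which is a different convention.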
Class interval   Observations   Frequency
0 to 20          -              0
41 to 60         55             1
61 to 80         62, 75, 79     3
81 to 100        89, 96         2
Total                           10
SOME BASIC DEFINITIONS
(i) Variate: Variate is a quantity that may vary from observation to observation.
(ii) Range: Range is the difference between the maximum and minimum observations.
(iii) Class Interval: When data are divided in groups, each group is called a class interval.
(iv) Class Limit: Every class interval has two limits. The smallest observation of the interval is
called lower limit and the largest observation of the interval is called upper limit.
(v) Class Mark: The mid value of any class is called its class mark.
Class Mark = (Lower limit + Upper limit)/2
(vi) Class Size: Class size is defined as the difference between two successive class marks. It is
also the difference between the upper and lower limits of any class interval.
(vii) Frequency: The number of observations that fall in a particular class is called its frequency,
or more specifically its class frequency.
(viii) Cumulative Frequency: The cumulative frequency of any class is obtained by adding all the
frequencies successively prior to that class i.e. it is the sum of all frequencies up to that class.
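Several of these definitions can be illustrated together. The sketch below assumes the class intervals 10-20, 20-30, 30-40, 40-50 with frequencies 10, 8, 15, 4 (the exclusive distribution given later in this text); the variable names are chosen for this example.

```python
from itertools import accumulate

classes = [(10, 20), (20, 30), (30, 40), (40, 50)]  # (lower limit, upper limit)
frequencies = [10, 8, 15, 4]

# Class mark: mid value of each class.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Class size: difference between two successive class marks.
class_size = class_marks[1] - class_marks[0]

# Cumulative frequency: running total of the frequencies up to each class.
cumulative = list(accumulate(frequencies))

print(class_marks)  # [15.0, 25.0, 35.0, 45.0]
print(class_size)   # 10.0
print(cumulative)   # [10, 18, 33, 37]
```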
Inclusive and Exclusive distributions:
Inclusive Distribution: When in a distribution, the upper limit does not coincide with the lower limit
of the next class then the distribution is called an inclusive distribution. e.g.
Inclusive Form
150-152 4
153-156 10
157-169 6
170-173 3
Exclusive Distribution: An exclusive distribution is that distribution in which the upper limit of one
class coincides with the lower limit of the next class. e.g.
Exclusive Form
10-20 10
20-30 8
30-40 15
40-50 4
True Class Limit: In the case of exclusive classes the upper and lower limits are respectively
known as its true upper limits and true lower limits.
In the case of inclusive classes, the true lower and upper limits are obtained by subtracting 0.5 from
the lower limit and adding 0.5 to the upper limit.
True upper limits and true lower limits are also known as boundaries of the class.
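For the inclusive classes shown earlier (150-152, 153-156, 157-169, 170-173) the adjustment can be sketched as follows. The function name true_limits is hypothetical. The code uses half the gap between adjacent classes, which equals the 0.5 stated above when limits are whole numbers one unit apart.

```python
def true_limits(classes):
    """Convert inclusive class limits to true limits (class boundaries)."""
    # Gap between the upper limit of one class and the lower limit of the next.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(150, 152), (153, 156), (157, 169), (170, 173)]
print(true_limits(inclusive))
# [(149.5, 152.5), (152.5, 156.5), (156.5, 169.5), (169.5, 173.5)]
```

After the adjustment, the upper boundary of each class coincides with the lower boundary of the next, as in an exclusive distribution.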
Tally: The tally method is used to keep the chance of counting errors to a minimum. A bar (|), called
a tally mark, is put against an item each time it occurs. The fifth occurrence of an item is
represented by drawing a diagonal tally across the first four tallies.
FREQUENCY DISTRIBUTION TABLE
The tabular arrangement of data showing the frequency of each item is called a frequency
distribution table. It is a method to present raw data in the form from which one can easily
understand the information contained in the raw data.
Frequency distributions are of two types:
(i) Discrete frequency distribution: In this type of frequency distribution, in the first column of
frequency table we write all possible values of the variables from the lowest to the highest, in the
second column we write tally marks and in the third column we show frequency of each item. In this
method data are not divided into groups or classes.
(ii) Continuous or Grouped Frequency Distribution: In this type of frequency distribution, data are
divided into groups or classes. This method is used where the raw data contain many distinct values
and the difference between the greatest and the smallest observations is large; where values largely
repeat and that difference is small, the discrete form is sufficient.
PREPARATION OF A FREQUENCY DISTRIBUTION TABLE:
The following steps are taken to prepare a frequency distribution table:
(i) First of all we arrange the data in an array.
(ii) Then draw a table consisting of 3 columns. First column is used for class, the second column
for tally and the third column for frequency.
(iii) Then in the first column we write the classes keeping the lowest and the highest scores in view.
(iv) In second column we put tally marks against each class according to the scores.
(v) Then we write frequency of each class in the third column after counting the tally.
(vi) Figures in first column and third column taken together represent the frequency table.
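The steps above can be sketched in Python. The scores below are hypothetical, the function name frequency_table is chosen for this illustration, and classes are treated as exclusive (lower limit included, upper limit excluded). For simplicity the tally column prints one bar per observation and does not group bars in fives.

```python
def frequency_table(scores, start, stop, width):
    """Return (class, tally, frequency) rows for exclusive classes [lower, upper)."""
    rows = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for s in scores if lower <= s < upper)
        rows.append((f"{lower}-{upper}", "|" * count, count))
    return rows


scores = [12, 25, 37, 41, 55, 58, 62, 75, 79, 96]  # hypothetical raw data
table = frequency_table(scores, 0, 100, 20)

for label, tally, freq in table:
    print(f"{label:<8}{tally:<6}{freq}")
```

Note that with exclusive classes a score equal to the final upper limit (100 here) would fall outside the table, so the last class needs separate treatment if such a score can occur.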
CUMULATIVE FREQUENCY TABLE
Cumulative frequency table is obtained from the ordinary frequency table by successively adding
the several frequencies. Thus to form a cumulative frequency table we add a column of cumulative
frequency in the frequency distribution table. It is obvious that the cumulative frequency of the last
class is the sum of the frequencies of all the classes.
Cumulative frequency series are of two types:
(i) Less than series
(ii) More than series
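The two series are not defined further here. Under the usual convention, assumed in this sketch, a less than series accumulates frequencies from the first class downwards against each upper limit, and a more than series accumulates from the last class upwards against each lower limit. The data are the exclusive classes 10-20, 20-30, 30-40, 40-50 with frequencies 10, 8, 15, 4.

```python
from itertools import accumulate

classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [10, 8, 15, 4]

# Less than series: (upper limit, number of observations below it).
less_than = list(zip((upper for _, upper in classes), accumulate(frequencies)))

# More than series: (lower limit, number of observations at or above it).
from_the_end = list(accumulate(reversed(frequencies)))
more_than = list(zip((lower for lower, _ in classes), reversed(from_the_end)))

print(less_than)  # [(20, 10), (30, 18), (40, 33), (50, 37)]
print(more_than)  # [(10, 37), (20, 27), (30, 19), (40, 4)]
```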
GRAPHICAL REPRESENTATION OF DATA:
Data can also be represented graphically. There are various methods of graphical
representation of a frequency distribution. Here we shall study only four of them:
1. Bar Graphs
2. Histogram
3. Frequency Polygon
4. Cumulative frequency curve or ogive
BAR GRAPH
The frequency distribution of a discrete variable is best represented by a bar graph. The height of
each bar is proportional to the frequency of the corresponding variate-value. In a bar graph the bars
must be kept distinct to show that the variate-values are distinct. The bars are of equal width and
are drawn with equal spacing between them on the x-axis, which depicts the variable. The
frequencies are shown on the y-axis.
HISTOGRAM
A histogram is a graphical representation of a grouped frequency distribution with continuous classes.
It consists of a set of rectangles where heights of rectangles are proportional to their class
frequencies, for equal class intervals. There is no gap between two successive rectangles. The
rectangles are constructed with base as the class size and their heights representing the
frequencies.
A.M. = (x1 + x2 + ..... + xn)/n
Properties of Arithmetic Mean
(1) If x̄ is the mean of n observations x1, x2, ....., xn, then the mean of the observations x1 + a,
x2 + a, ....., xn + a is x̄ + a, i.e. if each observation is increased by a, then the mean is also
increased by a.
(2) If x̄ is the mean of n observations x1, x2, ....., xn, then the mean of the observations x1 – a,
x2 – a, ....., xn – a is x̄ – a, i.e. if each observation is decreased by a, then the mean is also
decreased by a.
(3) If x̄ is the mean of x1, x2, ....., xn, then the mean of ax1, ax2, ....., axn is ax̄, where a is any
number different from zero, i.e. if each observation is multiplied by a non-zero number a, then the
mean is also multiplied by a.
(4) If x̄ is the mean of n observations x1, x2, ....., xn, then the mean of x1/a, x2/a, ....., xn/a is
x̄/a, where a ≠ 0, i.e. if each observation is divided by a non-zero number, then the mean is also
divided by it.
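The four properties can be checked numerically. The sketch below reuses the set {4, 8, 10, 6} from the earlier note on mean and median; the constant a = 3 is a hypothetical choice, and exact fractions are used so that the division case compares without rounding error.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean using rational arithmetic."""
    return sum(values, Fraction(0)) / len(values)


xs = [4, 8, 10, 6]
a = 3  # hypothetical non-zero constant

m = mean(xs)  # 7

assert mean([x + a for x in xs]) == m + a            # property (1)
assert mean([x - a for x in xs]) == m - a            # property (2)
assert mean([a * x for x in xs]) == a * m            # property (3)
assert mean([Fraction(x, a) for x in xs]) == m / a   # property (4)
print("all four properties hold for this data set")
```

A numerical check on one data set illustrates the properties but does not prove them; each follows directly from factoring a out of the sum in the definition of the mean.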
Arithmetic mean of Grouped Data:
Let x1, x2, x3, ....., xn be n observations whose frequencies are f1, f2, f3, ....., fn respectively, then
the arithmetic mean of this distribution is given by
x̄ = (f1x1 + f2x2 + ..... + fnxn)/(f1 + f2 + ..... + fn)
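A minimal Python sketch of this frequency-weighted mean (the sum of each value times its frequency, divided by the sum of the frequencies) follows. The function name grouped_mean is hypothetical. As an assumption of this example, the class marks 15, 25, 35, 45 of the classes 10-20, 20-30, 30-40, 40-50 stand in for the observations, which is the usual approach for a grouped table.

```python
def grouped_mean(values, frequencies):
    """Mean of values weighted by their frequencies."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, frequencies)) / total


class_marks = [15, 25, 35, 45]
frequencies = [10, 8, 15, 4]

# Weighted sum is 150 + 200 + 525 + 180 = 1055; total frequency is 37.
print(grouped_mean(class_marks, frequencies))  # 1055 / 37, roughly 28.51
```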
Combined Mean
Let x̄1 and x̄2 be the means of two groups of observations with number of observations n1 and n2
respectively, then the combined mean of two groups is given by,
x̄ = (n1x̄1 + n2x̄2)/(n1 + n2)
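The combined mean weights each group mean by its number of observations. A short sketch, with a hypothetical function name and hypothetical group sizes and means:

```python
def combined_mean(n1, mean1, n2, mean2):
    """Mean of two groups taken together, weighted by group size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# Hypothetical groups: 30 observations with mean 60, 20 observations with mean 70.
print(combined_mean(30, 60, 20, 70))  # 64.0
```

The result lies closer to the mean of the larger group, which is why simply averaging the two means (65 here) would be incorrect.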
(ii) If n is even, then we have two middle terms i.e. (n/2)th observation and (n/2 + 1)th observation.
Median of the given data will be mean of these two middle observations.