Unit-4

UNIT 4 ELEMENTARY STATISTICS
Elementary Statistics
Structure
4.0 Objectives
4.1 Introduction
4.2 Meaning of Statistics
4.3 Application of Statistics
4.3.1 Business
4.3.2 Finance
4.4 Collection of Data
4.5 Organisation and Presentation of Data
4.6 Measures of Central Tendency
4.6.1 Arithmetic Mean
4.6.2 Median
4.6.3 Mode
4.7 Measures of Dispersion
4.7.1 Standard Deviation
4.7.2 Variance
4.7.3 Coefficient of Variation
4.8 Other important topics
4.8.1 Sampling
4.8.2 Simple and Bivariate Relations
4.8.3 Hypothesis Testing
4.8.4 Estimation
4.9 Let Us Sum Up
4.10 Answers/Hints to Check Your Progress Exercises
4.0 OBJECTIVES
After studying this Unit, you should be able to:
• describe the importance of statistics in making financial decisions;
• use different methods of collecting data;
• organize (classify and arrange) and condense statistical data;

Dr. Amit Singh Khokhar, Assistant Professor, Delhi Skill and
Entrepreneurship University
• present the statistical data in the form of tables, graphs, diagrams, etc.;
• compute numerical quantities that measure the central tendency of a set of
data, such as the mean, median, and mode, and use these measures;
• explain the concept of dispersion;
• compute numerical quantities that measure the dispersion of a set of data; and
• explain the concepts of sampling, hypothesis testing, and estimation.
4.1 INTRODUCTION
Statistics is a discipline that deals with the collection, organisation, presentation,
analysis, and interpretation of data and numerical facts. The application of
statistical knowledge helps business managers and policymakers to make
business decisions and formulate policy more effectively. Statistics is also a
tool that students, researchers, and philosophers use to test theories and to
find solutions to everyday problems. Indeed, statistics has become one of the
most important courses in business or financial education, because a background
in applied statistics is a key ingredient in understanding accounting, finance,
marketing, economics, production, and other business courses.
Almost every person deals with and interprets statistics every day, but seldom
recognises it. For example, many Indians play and watch cricket matches and
discuss the batting scores and bowling figures without noticing that these figures
are match statistics. When people say, “India has won three out of first five one-
day matches in a bilateral cricket series,” they are probably referring to the match
statistics. Runs scored by different batsmen of each team are examples of
statistics. The average run rate, the total runs scored by a batsman in the
tournament, and the win-loss analysis are some more examples of statistics.
Let’s take another example. Before every election, the media conducts an opinion
poll and presents the results of the voter survey to its audience. These opinion
polls ask a set of questions about candidate preferences to a sample of voters
instead of the whole population of voters. The media usually makes projections
based on this survey and also states the possible margin of error. A margin of error
of 5 per cent means that the actual vote share of a political
party may differ from the poll estimate by as much as 5 percentage points in either
direction (“plus or minus”).
In business and finance, managers frequently employ statistics to help them make
better decisions. A car manufacturer, for instance, needs to forecast future sales
to determine whether to expand production or curtail it. Statistical analysis guides
business owners/managers in decision making.
The government under its constitutional obligation releases a statement (budget)
of its estimated revenue and expenses over a specified future period. In addition,
the government releases many other economic estimates. The most
popular among these are the gross domestic product (GDP), the unemployment
rate, the money supply, the consumer price index (CPI), and the foreign exchange
rate. All these measures are statistics that are used to summarize the general state
of the economy. And, of course, academic economists, businesses, and the
government use statistical methods to predict these macroeconomic variables.
The uses of statistics are widespread not only in business and economics but also
in everyday life.
TV Show Ratings: Advertisers and TV channel owners use TRP ratings to
determine which channels and programmes are viewed the most; the rating indicates
the popularity of a TV channel or a programme. TRP is calculated by an Indian
agency, the Broadcast Audience Research Council (BARC), using “BAR-O-meters.”
BARC releases weekly TRP results every Thursday, ranking all TV channels and
TV programmes. These TRP figures are then used to infer the viewing
habits of the entire Indian population.
4.2 MEANING OF STATISTICS

The term statistics is derived from the New Latin word ‘statisticum collegium’
("council of state") and the Italian word ‘statista’ ("statesman" or "politician").
The German ‘Statistik’, first introduced by Gottfried Achenwall (1749),
originally designated the analysis of data about the state. By the end of the 18th
century, the term "statistics" was used to refer to the systematic collection of
demographic and economic data by states. It was introduced into English in 1791
by Sir John Sinclair.
Early government records show statistical information on some aspects of
population, land records, the military strength of different wings, mortality during
epidemics, and so on. Perhaps it was because of this that Statistics was called the
science of kings. But as humanity developed, the usage, as well as the
understanding of Statistics, increased and now it is difficult to imagine a field of
knowledge that can do without statistics. It has become an important tool of
analysis. The invention of computers has led to a rise in the use of statistics as an
analytical tool. Statistics plays an important role in computer science and vice
versa. Statistics is used for data mining, speech recognition, vision and image
analysis, data compression, artificial intelligence, and network and traffic
modelling.
The term “statistics” may be used in two different senses – singular and plural. In
the plural sense, it means a systematic collection of numerical facts, while in the
singular sense, it is the science of collecting, classifying and using statistics.
Statistics in the plural sense means the statistical data, and statistics in the
singular sense means the statistical methods.
There are two basic types of statistics – descriptive and inferential. Descriptive
statistics deals with the presentation and organization of data. Various measures
of central tendency, such as the mean, median, or mode, and measures of
dispersion, such as the standard deviation and coefficient of variation, are
descriptive statistics. For example, a financial analyst computes financial returns
on corporate bonds and stocks to compare their performance during, say, the past
10 years. Because the financial analyst is collecting and summarizing the
available data, we conclude that he or she is using descriptive statistics. NITI
Aayog presents the poverty figures based on National Family Health Survey
(NFHS). The statistics on deprivation presented in Fig. 4.1 are an example of
descriptive statistics. The descriptive statistics help summarise the data.
Fig. 4.1: Percentage of the Total Population of India which is Deprived in each Indicator.
Source: Baseline report, National Multidimensional Poverty Index, NITI Aayog,
2021.
Inferential statistics deals with the use of sample data to infer general conclusions
about a much larger population. In statistics, we refer to the population as the
whole group of individuals we are interested in studying. The sample is a subset
of such a population. In the pre-election opinion poll example presented earlier,
the surveyors took a sample because it would have been too expensive and time-
consuming to collect responses from every voter. For example, statistics may be
used to summarize the performance of the stock market on a given day. The
Sensex, an index based on 30 of the largest and most actively traded
stocks on the BSE, is used as a barometer of the overall stock market’s
performance. Other indexes, such as the Nifty 50, are also calculated to generate
summary measures of stock market performance. Each of these measures is
derived through inferential statistics, because a sample is used to provide
representative— though incomplete—information about the stock market at
large.
4.3 APPLICATION OF STATISTICS
The range of applications of statistics has widened in the modern age of
information technology. Economics, the social sciences, the natural sciences, and government
are some of the important areas of statistical application. The areas of
application specific to this course are discussed below:
4.3.1 Business
In the modern world, the management of a business organisation has become a
complicated exercise as a result of changes in size, technical know-how, the
quantum of production, the capital and number of workers
employed, and the increasing level of competition. Management, while planning,
organizing, controlling and communicating, is confronted with alternative
courses of action. In the face of uncertainty, management cannot adopt a trial-
and-error method. It is here that statistical data and powerful statistical
techniques of probability, expectations, sampling, the test of significance,
estimation theory, and forecasting, etc. play an indispensable role. In the words
of Chao, “Statistics is a method of decision-making in the face of uncertainty
based on numerical data and calculated risks”. Statistics, thus, provides
information to the business units which help in deciding the location and size of
business, demand forecasting, production planning, quality control, marketing
decisions and personnel administration. In Industry, statistics is extensively used
in ‘Quality Control’.
4.3.2 Finance
Financial analysts use a variety of statistical data and tools to guide their
investment and to give recommendations to investors, policymakers, and
government officials. In the case of stocks, the analysts review a variety of
financial data including dividend yields and price/earnings ratios. A financial
analyst can begin to draw a conclusion on whether an individual stock is over- or
under-priced by comparing the trend for that stock with the trend of
stock market averages.
Some of the common questions asked in the finance sector, which can be
answered with the application of statistics, are as follows: Which of the several
brokerages has a reliable record of higher-than-average return on investment? Is
the share price for a given firm rising predictably enough for a day-trader to
invest in it? What has our return on investment been for the given two brands of
computer equipment over the last two years? Should one pay the higher premium
being proposed by his/her insurance company for liability insurance? Should one
invest in bonds or in stocks in the current year? What are the average rates of
return? Which country has the greatest chance of maximizing our profit on
investment over the next decade?
Check Your Progress 1
Note: i) Use the space given below for your answers.
ii) Check your progress with those answers given at the end of the unit.
1) State any three examples when we use statistics in everyday life.
………………………………………………………………………………….
………………………………………………………………………………….
2) Differentiate between the singular and plural meanings of the term statistics.
………………………………………………………………………………….
………………………………………………………………………………….
3) Differentiate between descriptive and inferential statistics.
………………………………………………………………………………….
………………………………………………………………………………….
4.4 COLLECTION OF DATA

The collection of reliable and sufficient data/statistical information is a pre-
requisite of any statistical inquiry. For any statistical investigation, whether it is
related to management, business, social or natural sciences, the fundamental
requirement is to collect the information relating to the concerned phenomenon.
Statistical data can be collected either by a survey or by conducting an
experiment. Surveys are more popular in social sciences like economics and
business. In natural/physical sciences, experimentation is a more commonly used
method of investigation. The entire structure of statistical analysis for any
enquiry is based on the systematic collection of data. The collection of data
requires careful planning and execution of a statistical survey. If this is not so
then the result obtained may be misleading or incomplete and hence useless. Data
can be classified into the following types:
Primary and Secondary Data

A proper choice of type of data needed for any statistical investigation basically
depends on consideration of various factors such as nature, objective and scope
of enquiry, availability of financial resources, time, accuracy expected and the
status of the agency.
Primary Data
Also called raw data, primary data is the data gathered through surveys,
interviews, or experiments. A typical example of primary data is household
surveys. When the data required for a particular study cannot be found either in
the internal records of the enterprise or in published sources, it may become
necessary to collect original data, i.e., to conduct a first-hand investigation.
Therefore, primary data can also be called first-hand data. The work of collecting
original data is usually limited by the money, time, and manpower available for
the study. The investigator has to decide whether to collect data from the entire
population of a group or a part of it. When the data to be collected are very large
in volume, the investigator may choose to collect data from a small portion of the
population called a sample. The size of the sample is always less than the total size
of the population.
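As a rough illustration of drawing a sample from a population, the sketch below uses Python's standard `random` module. The population of 1,000 household ID numbers, the sample size of 50, and the choice of simple random sampling without replacement are all assumptions made only for this illustration; they do not come from the text.

```python
import random

# Hypothetical population: ID numbers of 1,000 households (assumed for illustration).
population = list(range(1, 1001))

# Use a seeded generator so that the same sample is drawn on every run.
rng = random.Random(42)

# Draw a simple random sample of 50 households without replacement.
sample = rng.sample(population, 50)

# The sample is smaller than the population and contains no repeated household.
print(len(population), len(sample), len(set(sample)))
```

Because the sample is drawn without replacement, no household appears twice, and the sample size is always smaller than the population size.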
There are different methods of primary data collection. Primary data may be
collected by any one of the following methods:
• Information received from correspondents or local agencies
• Schedule sent through enumerators
• Direct personal interview
• Indirect oral interview
• Mailed questionnaire method
• Telephonic survey
Secondary Data
Alternatively, information can also be obtained through a secondary source. It
means drawing or collecting data from the already collected data of some other
agency. Technically, the data so collected are called secondary data. This data is
already available in some reports, research studies, journals or newspapers.
Needless to say, before using secondary data, the investigator must weigh the
advantage of saving money, time and effort against the risk of
reaching misleading conclusions. Whether secondary data is safe to use should be
judged from its adequacy, suitability and reliability.
The sources of secondary data may be classified into the following two
categories:
Published Sources: By published sources of secondary data we refer to data
relating to trade, business, price, stock market, investment, financial institutions,
etc. published by national/international organisations/agencies. These
publications are very useful sources of secondary data. Some of the important
sources of secondary data in this category include the following:
• Publications of the central and state governments, of international bodies like
ADB, ILO, IMF, UNO, WTO, World Bank, and of foreign governments etc.
• Publications of autonomous institutions and research bodies e.g., Reserve
Bank of India Bulletin, CSO, and NSSO.
• Weekly/monthly/annual publications like Yojana, Economic and Political
Weekly, etc.
Unpublished Sources: These are sources of secondary data where records are
maintained by business organisations or private agencies for their own use and are
seldom available to the general public. Data collected by research
institutions also come under the purview of unpublished sources of secondary
data.
4.5 ORGANISATION AND PRESENTATION OF DATA
In the preceding section, we discussed the methods of collection of data either by
a statistical survey (or inquiry) or from some secondary source. Data collected
either from census or sample inquiry, that is from the primary source, are always
unarranged and in raw form. To start with, they are contained in hundreds and
thousands of questionnaires. To draw insights from them, they must be
organised, (i.e., classified and arranged) and condensed or summarised. For this
purpose, we can use various methods, such as preparing master sheets in which
the information is recorded directly from the questionnaires. From these
sheets, small summary tables can be prepared manually. Nowadays computers
can be used for organisation and condensation of data more swiftly, efficiently
and in much less time. Computer applications help us to construct various types
of graphs and diagrams.
Once the data has been collected the next important step in statistical inquiry is
the organisation and presentation of the collected data. This step broadly covers
the following aspects - classification of data, tabulation of data, diagrammatic
presentation of data, and graphic representation of data. The following tools are
frequently used to organise and present the data for statistical analysis:
Tables
All data tables have four elements: cells, row labels, column labels, and caption.
The caption describes the information that is contained in the table. Table 4.1
illustrates all these elements. The row labels identify the information in rows,
such as housing, electricity, and nutrition. The column labels include censored
headcount, weight, and contribution. A cell is defined by the intersection of a
specific row and a specific column. The entry at the cross-section of column 1
and row 1 is 37.60 %. This is an example of a cell.
Table 4.1: Common Labels in any Table
Source: Baseline report, National Multidimensional Poverty Index, NITI Aayog, 2021.
When the amount of information presented through tables is large, it takes time
to understand it. Only experts can interpret such tables. Graphs and diagrams,
though they provide fewer details than tables, have the advantage of presenting
data in a more understandable and memorable manner. In most graphical
presentations, except a few – like pie diagrams, the independent variable is
plotted on the horizontal axis (the x-axis) and the dependent variable on the
vertical axis (the y-axis). The graphical/diagrammatic presentation of data
provides the following advantages: (1) easy to understand, (2) huge volumes of
data can be represented in a simplified manner, (3) reveal hidden facts, (4) quick
to grasp and easy to compare, and (5) have universal acceptability. Some of the
diagrammatical/graphical tools to present the data are as follows:
Line Charts
Line charts are constructed by graphing data points and drawing lines to connect
the points. Figure 4.2 shows the year-on-year, quarter-on-quarter, and moving
average estimates for economic growth between 2017 and 2021, so this is a
time-series graph. The dependent variables are often in percentages.
Fig. 4.2: Line Charts for the Presentation of Data

Source: RBI Annual Report 2020-21.
Bar Diagram
A bar graph is a chart that plots data using rectangular bars or columns whose
length represents the total number of observations in the data for each category.
Bar charts can be displayed with vertical columns, horizontal bars, comparative
bars (multiple bars to show a comparison between values), or stacked bars (bars
containing multiple types of information). Fig. 4.3 shows the different types of
bar diagrams.
One Dimensional Bar Diagram
A bar is defined as a thick line, often made thicker to attract the attention of a
reader. The height of the bar represents the value of the variable, while its width
carries no information. The value therefore has nothing to do with the area of the bar.
Further, the bars of the bar diagram are separated from one another so that the
gap between successive bars is the same. In a bar diagram, the bars can be
placed either vertically or horizontally.
Fig. 4.3: Various Types of Bar Diagrams

Source: RBI Annual Report 2020-21.
Component Bar Diagram
A component bar diagram is used when it is desired to represent the comparative
values of different components of a phenomenon. In this diagram, each bar,
corresponding to one phenomenon, is divided into various components. The
portion of the bar occupied by each component denotes its share in the total. The
subdivisions of different bars must always be done in the same order and these
should be distinguished from each other by using different colours or shades.
Notably, using bar diagrams is most appropriate when we are comparing only a
few items.
Pie Diagram or Pie Chart
A pie chart, also known as an angular diagram, is a type of graph that displays
data in a circular form. It is used to represent percentage breakdowns
of the given data. The pieces of the graph are
proportional to the fraction of the whole in each category. In other words, each
slice of the pie is relative to the size of that category in the group as a whole. The
entire “pie” represents 100 per cent of a whole, while the pie “slices” represent
portions of the whole. Fig. 4.4 presents a pie chart showing the ministry-wise
breakdown of capital expenditure - 2021-22 as per cent of total capital
expenditure. Each slice of the pie depicts the proportion of capital expenditure by
a specific ministry. The larger the proportion of a ministry, the larger is the size
of its slice.
Fig. 4.4: Pie Chart Showing the Ministry-Wise Capital Expenditure as a
Percentage of Total Capital Expenditure.
Histogram
The histogram is a very common type of graph for displaying classified data. It
was first introduced by Karl Pearson. It is a set of rectangles erected vertically. It
has the following features:
a) It is a rectangular diagram.
b) Since the rectangles are drawn with specified width and height, the histogram
is a two-dimensional diagram.
c) The area of each rectangle is proportional to the frequency of the respective
class.
A histogram is a chart that depicts the frequency of a numerical variable in
non-overlapping intervals, called 'bins', that span the entire range of the data. In
essence, a histogram is a pictorial representation of a frequency table. While we
have used bar charts for categorical variables, a histogram is the
equivalent kind of chart for numerical data. Histograms are useful because, at
a glance, we can see the shape of the data. For example, does it look
bell-shaped, or does it seem to be skewed to the left or to the right?
Fig. 4.5 is a histogram displaying the distribution of heights among the students.
The figure shows that the highest number of students measure between 165 and 170
cm. The figure also reveals that the height-wise distribution of students is
unimodal, as there is only one class with the highest frequency.
Fig. 4.5: Histogram Presenting the Number of Students in Each Height Category.
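Since a histogram is a picture of a frequency table, the table itself can be built in a few lines. The sketch below groups heights into 5 cm bins and prints a simple text histogram. The twelve height values, the bin width, and the starting boundary of 150 cm are hypothetical and chosen only to illustrate the idea; they are not the data behind the figure.

```python
# Hypothetical heights (in cm) of 12 students; assumed for illustration only.
heights = [152, 158, 161, 163, 166, 167, 168, 169, 171, 173, 176, 181]

bin_width = 5   # width of each class (bin), in cm
lowest = 150    # lower boundary of the first class

# Count how many heights fall in each class [lower, lower + bin_width).
frequency = {}
for h in heights:
    lower = lowest + ((h - lowest) // bin_width) * bin_width
    frequency[lower] = frequency.get(lower, 0) + 1

# Print a text histogram: one row per class, one '*' per student.
for lower in sorted(frequency):
    print(f"{lower}-{lower + bin_width}: {'*' * frequency[lower]}")

# The modal class is the bin with the highest frequency.
modal_class = max(frequency, key=frequency.get)
```

With these assumed values, the 165-170 class has the most students, so the distribution has a single peak.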
Check Your Progress 2
1) Name some primary data collection methods used in statistics.
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
2) What are the advantages of organising the data?
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
3) Differentiate between a bar graph and a histogram.
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
4) List the important elements of tables in statistics.
………………………………………………………………………………….
………………………………………………………………………………….
4.6 MEASURES OF CENTRAL TENDENCY
There are several measures of central tendency, including the mean, median and
mode, each having its own advantages and disadvantages. The choice
among these averages depends upon which one best fits the property under
discussion.
4.6.1 Arithmetic Mean

The mean is usually the best measure of central tendency when there are no
outliers/extreme values. It can be calculated either from the population or sample
values. However, most of the time, the population values are unknown and hence
we may use sample values to estimate the population mean. There are three types
of means, each suited to a particular type of data: the Arithmetic
Mean, the Geometric Mean, and the Harmonic Mean. In this unit, only the Arithmetic Mean
is covered.
The mean is defined as the sum of the magnitudes of the items divided by the
number of items. The arithmetic mean (A.M.) of $X_1, X_2, \ldots, X_n$ is denoted by $\bar{X}$
and is given by
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$
When the data are arranged or given in the form of a frequency distribution, i.e.
there are k variate values such that the value $x_i$ has frequency $f_i$ ($i = 1, 2, \ldots, k$), then
the arithmetic mean will be
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where k is the number of classes and $\sum_{i=1}^{k} f_i = n$.
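The two formulas can be checked with a short sketch. The first list reuses the four sample values from the example in Section 4.7.1; the small frequency distribution (values 1, 2, 3 with frequencies 2, 5, 3) is hypothetical and assumed only for illustration.

```python
from statistics import mean

# Ungrouped data: the mean is the sum of the values divided by their number.
values = [5, 10, 12, 17]
simple_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = mean(values)

# Hypothetical discrete frequency distribution: value x_i occurs f_i times.
x = [1, 2, 3]
f = [2, 5, 3]

# Frequency-weighted mean: sum of f_i * x_i divided by the sum of f_i.
weighted_mean = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)

print(simple_mean, library_mean, weighted_mean)
```

The weighted form is only a compact way of writing the simple mean: listing 1 twice, 2 five times and 3 three times, and averaging the ten numbers gives the same answer.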
Arithmetic Mean for Grouped Data
If data are given in the form of a continuous frequency distribution, then the
arithmetic mean is obtained as follows:
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i}$$
where $m_i$ is the class mark of the ith class and $f_i$ is the frequency of
the ith class.

Example: Find the arithmetic mean for the following frequency distribution.
Class interval    Frequency    Class mark
3-5               3            4
6-8               2            7
9-11              5            10
12-14             4            13
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{3 \times 4 + 2 \times 7 + 5 \times 10 + 4 \times 13}{3 + 2 + 5 + 4} = \frac{128}{14} = 9.14$$
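The grouped-data calculation above can be reproduced with a minimal sketch. The variable names are illustrative; the class limits and frequencies are those of the example.

```python
# Class intervals and frequencies from the example above.
classes = [(3, 5), (6, 8), (9, 11), (12, 14)]
frequencies = [3, 2, 5, 4]

# The class mark is the midpoint of each class interval.
class_marks = [(lower + upper) / 2 for lower, upper in classes]

# Sum of f_i * m_i, and the total frequency n.
total_fm = sum(f * m for f, m in zip(frequencies, class_marks))
n = sum(frequencies)

grouped_mean = total_fm / n
print(total_fm, n, round(grouped_mean, 2))
```

The class mark stands in for every observation in its class, so the grouped mean is an approximation to the mean of the underlying raw data.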
4.6.2 Median
In any distribution, the median is the value of the variable which divides the data
into two equal halves. In an ordered series of data, the median is an observation
lying exactly in the middle of the series. It is the middlemost value in the sense
that the number of values less than the median is equal to the number of values
greater than it.
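Let $X_1, X_2, \ldots, X_n$ be the observations. Arranged in ascending
order, they are written as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, where $x_{(i)}$ is the ith smallest value.
Here, we find that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
The median is denoted by M.
Median for ungrouped data:
$$M = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right), & \text{if } n \text{ is even} \end{cases}$$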
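Example: Find the median of the following data:

3, 8, 4, 7, 7, 5, 6, 8, 7, 4, 6, 8, 9, 7, 6
Arrange the given data in either increasing or decreasing order:
3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9
Here n = 15 is odd, so the median is the 8th value: Median = 7.
For an even number of observations, consider the 14 values
3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8
Here the median is the average of the 7th and 8th values: Median = (6 + 7)/2 = 6.5.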
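The odd and even cases of the median rule can be sketched as a small function. The function name `median_of` is illustrative; the two data sets are those of the example above, and Python's built-in `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_of(values):
    """Return the median of a list of numbers using the odd/even rule."""
    ordered = sorted(values)      # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th value, which is index mid (0-based).
        return ordered[mid]
    # n even: average of the (n/2)-th and (n/2 + 1)-th values.
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 8, 4, 7, 7, 5, 6, 8, 7, 4, 6, 8, 9, 7, 6]   # n = 15
even_data = [3, 4, 4, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8]     # n = 14

print(median_of(odd_data), median_of(even_data))
print(median(odd_data), median(even_data))  # cross-check with the library
```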
Median for grouped data: If data are given in the form of a continuous frequency
distribution, the median is defined as:
$$M = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - f_c\right)$$
where
$L_{med}$ = the lower class boundary of the median class,
$f_{med}$ = the frequency of the median class,
$f_c$ = the cumulative frequency (less than type) preceding the median class,
$w$ = the size of the median class, and $n$ = the total number of observations.
Note: The median class is the class with the smallest cumulative frequency (less
than type) greater than or equal to n/2.
Example: Find the median wage of the following distribution.

Wages (in Rs)     2000-3000   3000-4000   4000-5000   5000-6000   6000-7000
No. of workers    3           5           20          10          5
Solution:
Wages (in Rs)    No. of workers    Cumulative frequency (cf)
2000-3000        3                 3
3000-4000        5                 8
4000-5000        20                28
5000-6000        10                38
6000-7000        5                 43
Here n/2 = 43/2 = 21.5. The smallest cumulative frequency greater than or equal to 21.5 is 28,
and the corresponding class is 4,000-5,000, so the median class is 4,000-5,000, and
$$M = 4000 + \frac{1000}{20}(21.5 - 8) = 4000 + 675 = 4675$$
So the median wage is Rs 4,675.
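The steps of the grouped-median calculation (accumulate frequencies, locate the median class, then interpolate) can be sketched as follows. The function name `grouped_median` and its argument layout are illustrative assumptions; the data are those of the wage example.

```python
def grouped_median(boundaries, frequencies):
    """Median of a continuous frequency distribution.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    frequencies: list of class frequencies in the same order.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cumulative = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(boundaries, frequencies):
        # The median class is the first class whose cumulative frequency
        # reaches n / 2.
        if cumulative + f >= n / 2:
            width = upper - lower
            return lower + (width / f) * (n / 2 - cumulative)
        cumulative += f

wage_classes = [(2000, 3000), (3000, 4000), (4000, 5000),
                (5000, 6000), (6000, 7000)]
workers = [3, 5, 20, 10, 5]

print(grouped_median(wage_classes, workers))
```

The interpolation assumes that the observations are spread evenly within the median class, which is the assumption behind the formula itself.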
4.6.3 Mode
The mode is a value that occurs most frequently in a set of values, and which
occurs more than once. It is the most repeated observation in the dataset. The
mode may not exist and even if it does exist, it may not be unique. In the case of
discrete distribution, the value having the maximum frequency is the modal
value. If in a set of observed values, all values occur once or an equal number of
times, then, there is no mode.
Example
a) Find the mode of 5, 3, 5, 8, 9
Solution: Mode = 5, because 5 occurs more often than any other observation in the
given data.
b) Find the mode of 8, 9, 9, 7, 8, 2, and 5.
Solution: It is bimodal data as it has two modes, 8 and 9.
c) Find the mode of 4, 12, 3, 6, and 7.
Solution: No mode for this data, because all observations appear only once in
the dataset.
The mode of a set of numbers X1, X2, …, Xn is usually denoted by Z.
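The three cases in the example (one mode, two modes, no mode) can be sketched with a frequency count. The function name `modes` is illustrative, and returning an empty list when every value occurs equally often follows the convention stated in this section rather than any library default.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list if there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # No mode if every value occurs only once ...
    if highest == 1:
        return []
    # ... or if several distinct values all occur equally often.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 3, 5, 8, 9]))        # one mode
print(modes([8, 9, 9, 7, 8, 2, 5]))  # two modes (bimodal)
print(modes([4, 12, 3, 6, 7]))       # no mode
```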
Mode for Grouped data
If data are given in the form of a continuous frequency distribution, the mode is
defined as:
$$Z = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$$
where:
$Z$ = the mode of the distribution
$L_{mod}$ = the lower class boundary of the modal class
$\Delta_1 = f_{mod} - f_1$
$\Delta_2 = f_{mod} - f_2$
$f_{mod}$ = frequency of the modal class
$f_1$ = frequency of the class preceding the modal class
$f_2$ = frequency of the class succeeding the modal class
$W$ = the size of the modal class
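Example: Find the mode for the frequency distribution given below:
Class interval    3-6    6-9    9-12    12-15
Frequency         4      8      10      3
The modal class is 9-12, so $L_{mod} = 9$, $\Delta_1 = 10 - 8 = 2$, $\Delta_2 = 10 - 3 = 7$ and $W = 3$.
$$Z = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W = 9 + \left(\frac{2}{2 + 7}\right) \times 3 = 9 + \frac{2}{3} = \frac{29}{3} \approx 9.67$$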
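A minimal sketch of the grouped-mode formula follows. The function name `grouped_mode` is illustrative. Two choices in it are assumptions not stated in the text: when the modal class is the first or last class, the missing neighbouring frequency is taken as zero, and when two classes tie for the highest frequency, the first one is used.

```python
def grouped_mode(boundaries, frequencies):
    """Mode of a continuous frequency distribution (single modal class)."""
    # Locate the modal class: the class with the highest frequency.
    i = frequencies.index(max(frequencies))
    lower, upper = boundaries[i]
    f_mod = frequencies[i]
    # Frequencies of the preceding and succeeding classes
    # (assumed to be 0 when the modal class is at either end).
    f_prev = frequencies[i - 1] if i > 0 else 0
    f_next = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    delta1 = f_mod - f_prev
    delta2 = f_mod - f_next
    width = upper - lower
    return lower + delta1 / (delta1 + delta2) * width

classes = [(3, 6), (6, 9), (9, 12), (12, 15)]
freqs = [4, 8, 10, 3]

print(round(grouped_mode(classes, freqs), 2))
```

For the example data this reproduces the value 29/3 obtained above.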
4.7 MEASURES OF DISPERSION

The measures of central tendency are not adequate to describe a set of
observations unless all observations are the same. Two data sets can have the
same mean and yet be entirely different. So, we need some additional
information, such as:
1) The extent to which the items in a particular distribution are scattered around
the central tendency i.e. the measure of dispersion.
2) The direction of scatter, that is, whether more items are concentrated towards higher or
lower values, i.e., the measure of skewness.
The measure of scatter or spread of items of a distribution is known as dispersion
or variation. In other words, the degree to which numerical data tend to spread
about an average value is called dispersion or variation of the data. Measures of
dispersions are statistical measures that provide ways of measuring the extent to
which the data are dispersed or spread out.
Objectives of measuring Variation:
• To judge the reliability of measures of central tendency
• To control variability itself.
• To compare two or more groups of numbers in terms of their variability.
• To make further statistical analysis.
Various measures of dispersion are in use. The most commonly used and
reliable measures of dispersion are the standard deviation, the variance, and the coefficient
of variation.
4.7.1 Standard Deviation
Standard deviation (SD or σ) is the most commonly used measure of dispersion.

It is a measure of the spread of data about the mean. SD is the square root of the
sum of squared deviations from the mean divided by the number of observations.
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} \quad \text{for ungrouped data (without class frequencies)}$$
$$\sigma = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}} \quad \text{for a grouped frequency distribution, where } N = \sum_{i=1}^{k} f_i$$
Population standard deviation = $\sigma = \sqrt{\sigma^2}$

Sample standard deviation = $s = \sqrt{s^2}$
If the standard deviation of a set of data is small, the values are more concentrated
around the mean, and if the standard deviation is large, the values are more widely
scattered around the mean. One of the shortcomings of SD is that if the unit of
measure of variables in two series is not the same, the variability cannot be
compared by comparing the values of SD for the two series.
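Example: Find the variance and standard deviation of the following sample data: 5, 10, 12, 17.
Solution: $\bar{X} = \frac{\sum_{i=1}^{4} X_i}{4} = \frac{44}{4} = 11$
X_i                5     10    12    17    Total
(X_i − X̄)²         36    1     1     36    74
$$s^2 = \frac{\sum_{i=1}^{4} (X_i - \bar{X})^2}{4 - 1} = \frac{74}{3} = 24.67 \;\Rightarrow\; s = \sqrt{s^2} = \sqrt{24.67} = 4.97$$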
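The worked example can be reproduced step by step and cross-checked against Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample divisor n − 1. The variable names are illustrative.

```python
from statistics import mean, stdev, variance

data = [5, 10, 12, 17]

# Step 1: the sample mean.
x_bar = mean(data)

# Step 2: squared deviations from the mean.
squared_deviations = [(x - x_bar) ** 2 for x in data]

# Step 3: divide the sum of squared deviations by n - 1 (sample variance).
sample_variance = sum(squared_deviations) / (len(data) - 1)

# Step 4: the standard deviation is the square root of the variance.
sample_sd = sample_variance ** 0.5

print(x_bar, squared_deviations, round(sample_variance, 2), round(sample_sd, 2))
print(round(variance(data), 2), round(stdev(data), 2))  # library cross-check
```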
4.7.2 Variance
The variance is defined as the square of the standard deviation, i.e., it is the sum of
the squared deviations from the mean divided by the number of observations (or, for a
sample, by one less than the number of observations). The
population variance is denoted by $\sigma^2$ and the sample variance by $s^2$. The
following box summarises the relation between SD and variance, and population
and sample measures of dispersion.
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Population Standard Deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample Standard Deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Example: Suppose a sample of sales is taken from four firms and the figures are
₹ 2.3 crores, ₹ 1.1 crores, ₹ 0.7 crores, and ₹ 6.8 crores. Find the sample variance
and standard deviation.
Solution:
Sales (x)     x − x̄      (x − x̄)²     x²
2.3          −0.425       0.18        5.29
1.1          −1.625       2.64        1.21
0.7          −2.025       4.10        0.49
6.8           4.075      16.61       46.24
Total 10.9    0          23.53       53.23
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
\[ \text{Mean} = \bar{x} = \frac{10.9}{4} = 2.725 \]
\[ \text{Variance} = s^2 = \frac{23.53}{4-1} = 7.84 \]
\[ \text{Standard Deviation} = s = \sqrt{7.84} = 2.8 \]
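The x² column in the solution table is useful because the sum of squared deviations can also be obtained from the shortcut Σx² − n·x̄². The sketch below, with illustrative variable names, computes the sum both ways for the sales figures and then derives the sample variance and standard deviation. The shortcut is a standard identity, although the unit does not state it explicitly.

```python
import math

sales = [2.3, 1.1, 0.7, 6.8]     # sales in crores of rupees, from the example
n = len(sales)

total = sum(sales)                        # column total of x
total_sq = sum(x ** 2 for x in sales)     # column total of x^2
mean = total / n

# Definitional form: sum of squared deviations from the mean
ss_direct = sum((x - mean) ** 2 for x in sales)

# Shortcut form: sum(x^2) - n * mean^2 gives the same quantity
ss_shortcut = total_sq - n * mean ** 2

variance = ss_direct / (n - 1)            # sample variance, n - 1 divisor
sd = math.sqrt(variance)

print(f"mean = {mean:.3f}")
print(f"sum of squared deviations = {ss_direct:.2f} (shortcut: {ss_shortcut:.2f})")
print(f"sample variance = {variance:.2f}, sample SD = {sd:.2f}")
```

Both routes give the same sum of squared deviations, so the x² column offers a quick check on the deviation columns.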
4.7.3 Coefficient of Variation

It is defined as the ratio of the standard deviation to the mean, usually expressed as
a percentage. This measure is used to compare the variability (or, equivalently, the
homogeneity, stability, uniformity or consistency) of two or more sets of data. The
data set with the higher coefficient of variation is said to be more dispersed, or less
uniform. Because it is a unit-free ratio, the coefficient of variation overcomes the
shortcoming of the standard deviation and can compare the level of dispersion even
when the unit of measurement is different for the given two series.
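\[ \text{C.V.} = \frac{\sigma}{\mu} \times 100 \quad \text{for the population} \]
\[ \text{C.V.} = \frac{S}{\bar{X}} \times 100 \quad \text{for the sample} \]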
Example: Consider the distribution of per diem (per employee) of two companies.
For the first company, the mean and standard deviation (S.D.) are 60 and 10
respectively. For the second company, the mean and S.D. are 50 and 9
respectively. Hence, their coefficients of variation are given by:
\[ CV_1 = \frac{10}{60} \times 100 = 16.7\% \]
\[ CV_2 = \frac{9}{50} \times 100 = 18.0\% \]
This shows that the variability in the first company is less than that in
the second company, even though its standard deviation is larger.
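The comparison above can be reproduced with a small helper. This is a minimal sketch: the function name `coefficient_of_variation` is an illustrative choice, and the guard against a zero mean is an added assumption that the unit does not discuss.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100

cv1 = coefficient_of_variation(sd=10, mean=60)   # first company
cv2 = coefficient_of_variation(sd=9, mean=50)    # second company

print(f"CV1 = {cv1:.1f}%, CV2 = {cv2:.1f}%")
more_variable = "first" if cv1 > cv2 else "second"
print(f"The {more_variable} company shows more relative variability.")
```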
4.8 OTHER IMPORTANT TOPICS

Under this heading we consider: (i) sampling, (ii) simple and bivariate relations, (iii)
hypothesis testing and (iv) estimation.
4.8.1 Sampling
It has been discussed in previous sections that collecting data from the whole
population is neither feasible nor essential in most cases. Therefore, we collect a
sample of data from the population and use it to make inferences about the
population. Very often, we will be interested in estimating a population
parameter. To do this, we need to define our terms carefully:
Population: The entire group of individuals or objects of interest under
investigation or study.
Unit: An element of the population. This will be a person or object on which
observations can be made or from which information can be obtained.
Sampling Frame: A list of all the units of the population from which the
sample may be drawn.
Target Population: The population about which one wishes to make an inference.
Sample Size: The number of individuals in the sample.
Sampling: It is the process of selecting a sample from the population.
Importance of Sampling
1) Sampling is economical.
2) Sampling saves the investigator's time.
3) Sampling is convenient to organise.
4) Sample-based results are good estimates of population parameters.
5) Sampling is suitable when resources are limited.
Type of Errors in Sampled Data
An estimate based on a sample will not be exact; there may be some error
involved in it. This error is the difference between an estimate and the true value
of the parameter being estimated. In general, errors that occur during estimation
based on a sample can be categorised into two types:
• Sampling Errors
• Non-Sampling Errors
Sampling Errors: A sampling error is the discrepancy between the population
parameter and the sample statistic. It arises because only a part of the population
is observed. If the sample is not representative of the population, our estimates of
parameters will often be inaccurate. Even if we have a representative sample, there
may still be an error if the sample size is small. The selection of
the sample should be done carefully to minimise the sampling error.
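The effect of sample size on sampling error can be illustrated with a small simulation. The sketch below is hedged throughout: the population is hypothetical (simulated values, not data from this unit), the seed is fixed only for reproducibility, and the helper `average_sampling_error` is an illustrative name.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 simulated values (not data from the unit)
population = [random.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def average_sampling_error(sample_size: int, repetitions: int = 200) -> float:
    """Average absolute gap between the sample mean and the population mean."""
    errors = []
    for _ in range(repetitions):
        sample = random.sample(population, sample_size)  # simple random sample
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

error_small = average_sampling_error(10)
error_large = average_sampling_error(1000)
print(f"n = 10:   average sampling error = {error_small:.2f}")
print(f"n = 1000: average sampling error = {error_large:.2f}")
```

Under these assumptions the larger samples track the population mean much more closely, which is the point made above about small sample sizes.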
Non-Sampling Errors: Even if we have a representative sample that is large
enough to estimate the parameter with a good degree of precision, errors may
still arise, such as recording errors, measurement errors, respondent bias,
non-response errors, errors in processing the data, interviewer errors, and
reporting errors. Measurement errors and recording errors occur if there is an
error in measuring the item being studied or in recording its result.
Non-response can be due to refusals. Another common form of error is the
interviewer error, which occurs in surveys when an interviewer introduces bias
into an interview. A badly designed questionnaire can bias the responses in a
similar way.
Sampling Methods
Sampling techniques can be grouped into two categories:
Random (Probability): The sampling method in which the items are included in
the sample on a random basis, i.e., each unit in the population has a known,
non-zero chance of being selected (an equal chance under simple random sampling).
Examples: simple random sampling, stratified random sampling, cluster sampling,
and systematic sampling.
Non-Random (Non-Probability): In non-probability sampling, the sample is not
based on chance. It is rather determined by the personal judgment of the
researcher. This method is cost-effective; however, we cannot make objective
statistical inferences. Depending on the technique used, non-probability samples
are classified into quota, judgment, purposive, and convenience samples.
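Three of the probability sampling methods named above can be sketched in a few lines of Python. The sampling frame here is hypothetical (100 unit IDs with an invented urban/rural stratum label), and proportional allocation across strata is one common choice rather than something the unit prescribes. Cluster sampling is omitted for brevity.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 unit IDs, each tagged with a stratum
frame = [{"id": i, "stratum": "urban" if i <= 60 else "rural"}
         for i in range(1, 101)]
sample_size = 10

# Simple random sampling: every unit has the same chance of selection
simple = random.sample(frame, sample_size)

# Systematic sampling: random start, then every k-th unit in the frame
k = len(frame) // sample_size
start = random.randrange(k)
systematic = frame[start::k]

# Stratified random sampling: sample each stratum in proportion to its size
stratified = []
for name in ("urban", "rural"):
    members = [u for u in frame if u["stratum"] == name]
    quota = round(sample_size * len(members) / len(frame))
    stratified.extend(random.sample(members, quota))

print("simple:    ", sorted(u["id"] for u in simple))
print("systematic:", [u["id"] for u in systematic])
print("stratified:", sorted(u["id"] for u in stratified))
```

All three samples contain ten units. The stratified sample is guaranteed to contain six urban and four rural units, whereas the simple random sample reflects the strata only on average.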
4.8.2 Simple and Bivariate Relations
Univariate Data: This type of data consists of only one variable. The analysis of
univariate data is thus the simplest form of analysis because the information deals
with changes in only one variable. It does not deal with causes of changes or
relationships among variables. The main purpose of the analysis is to describe the
data and find patterns that exist within it. An example of univariate data can be
the height of people.
Height (in cm)    143    152    176    139    148    122    171
The above data has only one variable and that is the height of the people selected
for the study.
Bivariate Data: This type of data involves two different variables. The analysis
of this type of data deals with causes and relationships between the two variables.
An example of bivariate data can be rainfall and agricultural production in a
particular year.
Rainfall (in cm)            80     91     84     78     86     89     82
Wheat Quantity (Tonne)     131    142    136    125    139    141    132
The above table presents data about rainfall and wheat production in a particular
year. The data show a positive relation between rainfall and wheat production:
observations with higher rainfall tend to have higher wheat output. Thus,
bivariate data analysis involves relationships, comparisons, explanations, and
causes. The two variables are often plotted on the X and Y axes of a graph for a
better understanding of the data, with one variable treated as independent and
the other as dependent.
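The strength of the positive relation in the rainfall and wheat table can be summarised with Pearson's correlation coefficient. This measure is not defined in this unit and is brought in here as an assumption for illustration. The sketch computes it from the sums of squared deviations and cross-products.

```python
import math

rainfall = [80, 91, 84, 78, 86, 89, 82]          # cm, from the table
wheat = [131, 142, 136, 125, 139, 141, 132]      # tonnes, from the table

n = len(rainfall)
mean_x = sum(rainfall) / n
mean_y = sum(wheat) / n

# Sums of cross-products and squared deviations about the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rainfall, wheat))
s_xx = sum((x - mean_x) ** 2 for x in rainfall)
s_yy = sum((y - mean_y) ** 2 for y in wheat)

# Pearson's r lies between -1 and +1; values near +1 indicate
# a strong positive linear relation
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.2f}")
```

A value of r close to +1 is consistent with the positive relation read off the table. On its own it does not establish that rainfall causes the change in output.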
4.8.3 Hypothesis Testing

Testing a statistical hypothesis is an important part of inferential statistics. A
statistical hypothesis is an assumption or a statement about one or more
parameters of one or more populations. A statistical hypothesis
may or may not be true. We need to decide, based on the data in a sample or
samples, whether the stated hypothesis is true or not. If we could observe all the
members of the population, it would be possible to say with certainty whether or
not the hypothesis is true. However, in most cases, it is impossible or impractical
to examine the entire population. Due to scarcity of resources, lack of time, and
the tedious calculations a whole population would require, we can only examine a
sample that hopefully represents that population very well. So, the truth or
falsity of a statistical hypothesis is never known with certainty.
Testing a statistical hypothesis is a technique, or a procedure, by which we can
gather some evidence, using the data of the sample, to support, or reject, the
hypothesis we have in mind. This is also one way of making inferences about
population parameters, where the investigator has a prior notion about the value
of the parameter.
The Null and Alternative Hypotheses
The first step, in testing a statistical hypothesis, is to set up a null hypothesis and
an alternative hypothesis. When we postulate a statement, about one parameter of
a population, or two parameters of two populations, we usually keep in mind an
alternative postulation to the first one. Only one of the postulations can be true.
So, in essence, we are weighing the truth of one postulation against the truth of
the other. This idea is the first basic principle in testing a statistical hypothesis.
For example, a person is accused of a crime; he/she faces a trial. The prosecution
presents its case, and a judge must make a decision on the basis of the evidence
presented. In fact, the judge conducts a test of hypothesis.
Typically, the question of interest to the analyst is represented by the alternative
hypothesis, as illustrated in the following example:
Example: Suppose a stockbroker has become interested in the performance of
the shares of ABC Bank; he wants to know if the data for the last three years
support the view that the growth rate is at least 6% per year. If it is, he will
recommend to a client interested in long-term investments that the investment fits
the client’s profile.
H0: The growth rate is less than 6% per year.
H1: The growth rate is 6% per year or more.
4.8.4 Estimation
The objective of estimation is to determine the approximate value of a population
parameter on the basis of a sample statistic. For example, the sample mean is
employed to estimate the population mean. We refer to the sample mean as the
estimator of the population mean. Once the sample mean has been computed, its
value is called the estimate. There are two alternative methods of estimating the
population parameter: point estimation and interval estimation.
Point Estimation: A point estimate is a single statistic used to estimate a
population parameter. Suppose a firm wants to estimate the mean age of buyers
of smartphones. The firm selects a random sample of 100 recent purchasers, records
the age of each purchaser, and computes the mean age of the buyers in the sample.
The mean of this sample is a point estimate of the mean of the population.
Generally, a point estimate is a statistic, computed from sample information,
which is used to estimate the population parameter.
There are drawbacks to using point estimators.
• It is virtually certain that the estimate will be wrong. (The probability that a
continuous random variable will equal a specific value is 0; that is, the
probability that x̄ will exactly equal μ is 0.)
• In drawing inferences about a population, it is intuitively reasonable to expect
that a large sample will produce more accurate results because it contains
more information than a smaller sample does. But point estimators don’t have
the capacity to reflect the effects of larger sample sizes.
As a consequence, we use the second method of estimating a population parameter,
the interval estimator.
Interval Estimator: An interval estimator draws inferences about a population
by estimating the value of an unknown parameter using an interval. An interval
estimate is a range of values, calculated on the basis of information in the sample,
within which the population parameter is expected to lie with a stated degree of
confidence.
The purpose of an interval estimate is to provide information about how close the
point estimate, provided by the sample, is to the value of the population
parameter. The general form of an interval estimate of a population mean is
\[ \bar{x} \pm \text{Margin of error} \]
Similarly, the general form of an interval estimate of a population proportion is
\[ \hat{p} \pm \text{Margin of error} \]
Sampling distributions play a key role in computing these interval estimates.
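The general form x̄ ± margin of error can be made concrete with a short sketch. The unit does not specify how the margin is computed, so the code assumes one common large-sample choice, z·s/√n at 95% confidence. The ages are hypothetical simulated values standing in for the 100 smartphone purchasers of the earlier example.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical ages of 100 recent smartphone purchasers (simulated, not real data)
ages = [random.gauss(32, 8) for _ in range(100)]

n = len(ages)
point_estimate = statistics.fmean(ages)      # sample mean
s = statistics.stdev(ages)                   # sample SD (n - 1 divisor)

# Assumed margin of error: z * s / sqrt(n) at 95% confidence
# (a large-sample approximation, not a formula given in the unit)
z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96
margin = z * s / math.sqrt(n)

lower, upper = point_estimate - margin, point_estimate + margin
print(f"point estimate = {point_estimate:.1f}")
print(f"95% interval estimate = ({lower:.1f}, {upper:.1f})")
```

Under this assumption the margin is divided by √n, so a larger sample yields a narrower interval. This is how an interval estimate reflects sample size, which a point estimate cannot do.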

Check Your Progress 3
1) Describe different measures of central tendency.
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
2) What information do the measures of dispersion provide?
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
3) What is the importance of hypothesis testing in statistics?
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
4) Define point estimate.
………………………………………………………………………………….
………………………………………………………………………………….
………………………………………………………………………………….
4.9 LET US SUM UP

In this unit, you have learned about the meaning and types of statistics. You also
learned the application of statistics in business and finance. Statistics can be used
in a singular or plural sense with the former meaning the methods of statistics
and the latter meaning the available information. The data can be sourced from
primary sources by directly collecting it for self-use or it can be sourced from
published or unpublished secondary sources. You also learned the tabular and
graphical presentation of data. The collected data can be arranged in the form of
tables and it can also be presented graphically with the help of tools like line
graphs, pie charts, histograms, etc. You also learned to compute various measures
of central tendency. These measures of central tendency can be divided into two
broad categories, namely mathematical averages and positional averages.
Positional averages are mode and median, while the arithmetic mean is a
mathematical average. The measures of central tendency do not provide
information about the spread of the data. This information is revealed by the
measures of dispersion. The measures of dispersion you learned about in this unit
are the variance, standard deviation and coefficient of variation, which can be
computed for both ungrouped and grouped data. The coefficient of variation is
used to compare the dispersion of two distributions having either different means
(even when their variables are measured in the same units) or different units of
measurement of their variables. The unit ended with a brief discussion of
sampling, univariate and bivariate data, hypothesis testing, and point and
interval estimation.
4.10 ANSWERS TO CHECK YOUR PROGRESS EXERCISES
Check Your Progress 1
1) a) Advertisers and TV channel owners use statistics when they use TRP
ratings to determine which channels and programmes are viewed most. b)
In business and finance, managers frequently employ statistics to help them
make better decisions. c) Before every election, the media use statistics to
conduct an opinion poll and present the results of the voter survey to their
audience.
2) In the plural sense, it means a systematic collection of numerical facts, while
in the singular sense, it is the science of collecting, classifying and using
statistics. Statistics in the plural sense means the statistical data, and statistics
in the singular sense means the statistical methods.
3) Descriptive statistics deals with the presentation and organisation of data.
Various measures of central tendency, such as the mean, median, or mode,
and measures of dispersion are descriptive statistics. On the other hand,
inferential statistics deals with the use of sample data to infer general
conclusions about a much larger population.
Check Your Progress 2
1) Primary data may be collected through information received from
correspondents or local agencies, schedules sent through enumerators, direct
personal interviews, indirect oral interviews, the mailed questionnaire method
and telephonic surveys.
2) Organising the data is required to draw insights from raw data.
Completed questionnaires sometimes run into hundreds or thousands; they must
be organised (i.e., classified and arranged) and condensed or summarised to
draw inferences from them.
3) A histogram is a chart that depicts the frequency of a numerical variable in
non-overlapping intervals, called ‘bins’, which span the entire range of the
data. In essence, a histogram is a pictorial representation of a frequency table.
While bar charts are for categorical variables, a histogram is the
equivalent kind of chart for numerical data.
4) All data tables have four elements: cells, row labels, column labels, and
caption.

Check Your Progress 3
1) The mean is the arithmetic average and is usually the best measure of
central tendency when there are no outliers or extreme values. In an ordered
series of data, the median is the observation lying exactly in the middle of the
series, dividing the data into two equal halves. The mode is the value that
occurs most frequently in a set of values, and which occurs more than once. It
is the most repeated observation in the dataset.
2) Measures of dispersion help to:
• judge the reliability of measures of central tendency
• control variability itself.
• compare two or more groups of numbers in terms of their variability.
• make further statistical analysis.
3) Hypothesis testing is a technique, or a procedure, by which we can gather
some evidence, using the data of the sample, to support, or reject, the
hypothesis we have in mind. It is one way of making inferences about
population parameters, where the investigator has a prior notion about the
value of the parameter.
4) A point estimate is a single statistic computed from sample information,
which is used to estimate the population parameter.