Unit-4
Elementary Statistics
Structure
4.0 Objectives
4.1 Introduction
4.2 Meaning of Statistics
4.3 Application of Statistics
4.3.1 Business
4.3.2 Finance
4.4 Collection of Data
4.5 Organisation and Presentation of Data
4.6 Measures of Central Tendency
4.6.1 Arithmetic Mean
4.6.2 Median
4.6.3 Mode
4.7 Measures of Dispersion
4.7.1 Standard Deviation
4.7.2 Variance
4.7.3 Coefficient of Variation
4.8 Other important topics
4.8.1 Sampling
4.8.2 Simple and Bivariate Relations
4.8.3 Hypothesis Testing
4.8.4 Estimation
4.9 Let Us Sum Up
4.10 Answers/Hints to Check Your Progress Exercises
4.0 OBJECTIVES
After studying this Unit, you should be able to:
• describe the importance of statistics in making financial decisions;
• use different methods of collecting data;
• organize (classification and arrangement) and condense statistical data;
• present the statistical data in the form of tables, graphs, diagrams, etc.;
• compute numerical quantities that measure the central tendency of a set of data, such as the mean, median, and mode, and use these measures;
• explain the concept of dispersion;
• compute numerical quantities that measure the dispersion of a set of data; and
• explain the concepts of sampling, hypothesis testing, and estimation.

Elementary Statistics and Spreadsheets
Dr. Amit Singh Khokhar, Assistant Professor, Delhi Skill and Entrepreneurship University
4.1 INTRODUCTION
Statistics is a discipline that deals with the collection, organisation, presentation, analysis, and interpretation of data and numerical facts. Applying statistical knowledge helps business managers and policymakers make business decisions and formulate policy more effectively. At the same time, statistics is a tool that students, researchers, and philosophers use to test theories and to find solutions to everyday problems. Indeed, statistics has become one of the most important courses in business and financial education, because a background in applied statistics is a key ingredient in understanding accounting, finance, marketing, economics, production, and other business courses.
Almost everyone deals with and interprets statistics every day, but seldom recognises it. For example, many Indians play and watch cricket and discuss batting scores and bowling figures without noticing that these figures are match statistics. When people say, “India has won three of the first five one-day matches in a bilateral cricket series,” they are referring to match statistics. The runs scored by each batsman of each team, the average run rate, the total runs scored by a batsman in a tournament, and the win-loss record are all examples of statistics.
Let’s take another example. Before every election, the media conducts an opinion
poll and presents the results of the voter survey to its audience. These opinion
polls ask a set of questions about candidate preferences to a sample of voters
instead of the whole population of voters. The media usually makes projections based on this survey and also states the possible margin of error. A margin of error of 5 percentage points means that a party's actual share (of votes or of constituencies won) is likely, typically at a 95 per cent level of confidence, to lie within 5 percentage points of the poll estimate in either direction (“plus or minus”).
In business and finance, managers frequently employ statistics to help them make
better decisions. A car manufacturer, for instance, needs to forecast future sales
to determine whether to expand production or curtail it. Statistical analysis guides
business owners/managers in decision making.
The government, under its constitutional obligation, releases a statement (the budget) of its estimated revenue and expenditure over a specified future period. In addition, the government releases many other economic statistics. The most popular among these are the gross domestic product (GDP), the unemployment rate, the money supply, the consumer price index (CPI), and the foreign exchange rate. All these measures are statistics used to summarize the general state of the economy. And, of course, academic economists, businesses, and the government use statistical methods to predict these macroeconomic variables.
The uses of statistics are widespread not only in business and economics but also
in everyday life.
TV Show Ratings: Advertisers and TV channel owners use TRP (Television Rating Point) ratings to determine which channels and programmes are watched the most, that is, to gauge the popularity of a TV channel or programme. In India, TRP is calculated by the Broadcast Audience Research Council (BARC) using “BAR-O-meters” installed in a sample of households. BARC releases weekly TRP results every Thursday, ranking all TV channels and TV programmes. These sample-based ratings are then used to draw conclusions about the viewing habits of the entire Indian population.
4.3.1 Business
In the modern world, managing a business organisation has become a complicated exercise because of changes in size, technical know-how, the quantum of production, the capital and number of workers employed, and the increasing level of competition. While planning, organising, controlling and communicating, management is confronted with alternative courses of action. In the face of uncertainty, management cannot adopt a trial-and-error method. It is here that statistical data and powerful statistical techniques such as probability, expectations, sampling, tests of significance, estimation theory and forecasting play an indispensable role. In the words of Chao, “Statistics is a method of decision-making in the face of uncertainty based on numerical data and calculated risks”. Statistics thus provides business units with information that helps in deciding the location and size of a business, demand forecasting, production planning, quality control, marketing decisions and personnel administration. In industry, statistics is used extensively in ‘Quality Control’.
4.3.2 Finance
Financial analysts use a variety of statistical data and tools to guide their investment decisions and to give recommendations to investors, policymakers, and government officials. In the case of stocks, analysts review a variety of financial data, including dividend yields and price/earnings ratios. By comparing the trend of an individual stock with the trend of stock market averages, a financial analyst can begin to draw a conclusion on whether the stock is over- or under-priced.
Some of the common questions asked in the finance sector that can be answered with the application of statistics are as follows:
• Which of several brokerages has a reliable record of higher-than-average return on investment?
• Is the share price of a given firm rising predictably enough for a day-trader to invest in it?
• What has our return on investment been for two given brands of computer equipment over the last two years?
• Should one pay the higher premium proposed by one's insurance company for liability insurance?
• Should one invest in bonds or in stocks in the current year?
• What are the average rates of return?
• Which country has the greatest chance of maximizing our profit on investment over the next decade?
Check Your Progress 1
Note: i) Use the space given below for your answers.
ii) Check your progress with those answers given at the end of the unit.
1) State any three examples when we use statistics in everyday life.
………………………………………………………………………………….
………………………………………………………………………………….
2) Differentiate between the singular and plural meanings of the term statistics.
………………………………………………………………………………….
………………………………………………………………………………….
3) Differentiate between descriptive and inferential statistics.
………………………………………………………………………………….
………………………………………………………………………………….
in volume, the investigator may choose to collect data from a small portion of the population, called a sample. The size of the sample is always less than the total size of the population.
There are different methods of primary data collection. Primary data may be
collected by any one of the following methods:
• Information received from correspondents or local agencies
• Schedules sent through enumerators
• Direct personal interview
• Indirect oral interview
• Mailed questionnaire method
• Telephonic survey
Secondary Data
Alternatively, information can be obtained from a secondary source, that is, by drawing on data already collected by some other agency. Technically, data so collected are called secondary data. Such data are already available in reports, research studies, journals or newspapers. Needless to say, before using secondary data, the investigator must weigh the advantages in terms of saving money, time and effort against the risk of reaching misleading conclusions. Whether secondary data are safe to use should be judged from their adequacy, suitability and reliability.
The sources of secondary data may be classified into the following two
categories:
Published Sources: By published sources of secondary data we refer to data
relating to trade, business, price, stock market, investment, financial institutions,
etc. published by national/international organisations/agencies. These
publications are very useful sources of secondary data. Some of the important
sources of secondary data in this category include the following:
• Publications of the central and state governments, of international bodies like
ADB, ILO, IMF, UNO, WTO, World Bank, and of foreign governments etc.
• Publications of autonomous institutions and research bodies e.g., Reserve
Bank of India Bulletin, CSO, and NSSO.
• Weekly/monthly/annual publications like Yojana, Economic and Political
Weekly, etc.
Unpublished Sources: These are sources of secondary data whose records are maintained by business organisations or private agencies for their own use and are seldom available to the general public. Data collected by research institutions also come under the purview of unpublished sources of secondary data.
4.5 ORGANISATION AND PRESENTATION OF DATA
In the preceding section, we discussed methods of collecting data, either by a statistical survey (or inquiry) or from a secondary source. Data collected from a census or a sample inquiry, that is, from a primary source, are initially unarranged and in raw form, typically spread across hundreds or thousands of questionnaires. To draw insights from them, they must be organised (i.e., classified and arranged) and condensed or summarised. For this purpose, we can prepare master sheets on which the information from the questionnaires is recorded directly; small summary tables can then be prepared manually from these sheets. Nowadays, computers can organise and condense data far more quickly and efficiently, and computer applications help us construct various types of graphs and diagrams.
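The step from raw questionnaires to a small summary table can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: `raw_responses` is hypothetical data (one answer per questionnaire) and `summary_table` is a helper name introduced here for illustration; only `collections.Counter` from the standard library is used.

```python
from collections import Counter

# Hypothetical raw answers, one per questionnaire (illustrative data only).
raw_responses = ["Yes", "No", "Yes", "Undecided", "Yes", "No", "Yes"]


def summary_table(responses):
    """Condense raw responses into (category, frequency, percentage) rows,
    ordered from the most to the least frequent category."""
    counts = Counter(responses)
    total = len(responses)
    return [
        (category, freq, round(100 * freq / total, 1))
        for category, freq in counts.most_common()
    ]


for category, freq, pct in summary_table(raw_responses):
    print(f"{category:<10} {freq:>3} {pct:>6.1f}%")
```

Each row gives a category, its frequency and its percentage share, which is the kind of condensed summary a master sheet is meant to feed.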
Once the data have been collected, the next important step in a statistical inquiry is the organisation and presentation of the collected data. This step broadly covers the classification, tabulation, diagrammatic presentation and graphic representation of data. The following tools are frequently used to organise and present data for statistical analysis:
Tables
All data tables have four elements: cells, row labels, column labels, and a caption. The caption describes the information contained in the table. Table 4.1 illustrates all these elements. The row labels identify the information in the rows, such as housing, electricity, and nutrition. The column labels include censored headcount, weight, and contribution. A cell is defined by the intersection of a specific row and a specific column; for example, the entry at the intersection of column 1 and row 1, 37.60%, is a cell.
Table 4.1: Common Labels in any Table
One-Dimensional Bar Diagram
A bar is a thick line, often made thicker to attract the reader's attention. Only the height (length) of a bar represents the value of the variable; its width represents nothing, so the area of the bar carries no meaning. The bars are separated from one another by equal gaps, and they can be drawn either vertically or horizontally.
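To see that only bar length carries information in a bar diagram, here is a minimal text-based sketch. The function `text_bar_diagram` and the product values are hypothetical, and the sketch assumes all values are non-negative; a plotting library would normally be used for a real chart.

```python
def text_bar_diagram(data, max_width=40):
    """Render a horizontal bar diagram as text.

    Bar length is proportional to the value; every bar has the same
    thickness (one line of text), and successive bars are separated by
    the same one-line gap, so only the length carries information.
    """
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        length = round(value / largest * max_width)
        lines.append(f"{label:<10}|{'#' * length} {value}")
        lines.append("")  # equal gap before the next bar
    return "\n".join(lines).rstrip("\n")


# Hypothetical values for three categories (illustrative only).
print(text_bar_diagram({"Product A": 30, "Product B": 45, "Product C": 15}))
```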
Fig. 4.3: Pie Chart Showing the Ministry Wise Capital Expenditure as a
Percentage of Total Capital Expenditure.
Histogram
The histogram is a very common type of graph for displaying classified data. It
was first introduced by Karl Pearson. It is a set of rectangles erected vertically. It
has the following features:
a) It is a rectangular diagram.
b) Since the rectangles are drawn with specified width and height, the histogram
is a two-dimensional diagram.
c) The area of each rectangle is proportional to the frequency of the respective
class.
A histogram is a chart that depicts the frequency of a numerical variable in non-overlapping intervals, called 'bins', that span the entire range of the data. In essence, a histogram is a pictorial representation of a frequency table. While we have used bar charts for categorical variables, a histogram is the equivalent kind of chart for numerical data. Histograms are useful because, at a glance, we can see the shape of the data: does it look bell-shaped, or does it seem to be skewed to the left or to the right?
Fig. 4.5 is a histogram displaying the distribution of heights among the students. The figure shows that the largest number of students measure between 165 and 170 cm. It also reveals that the height-wise distribution of students is unimodal, since only one class has the highest frequency.
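A minimal sketch of how raw numerical data are grouped into bins before a histogram is drawn. It assumes equal-width bins starting at a chosen value; the helper `histogram_table` is introduced here for illustration, and the heights are the seven values from the univariate-data example later in this unit (not the data behind Fig. 4.5). The density column (frequency divided by width) is the bar height that keeps each bar's area equal to its frequency, which matters when class widths are unequal.

```python
def histogram_table(data, start, width, n_bins):
    """Count observations in non-overlapping bins [lower, lower + width).

    Returns (lower, upper, frequency, density) rows, where
    density = frequency / width is the bar height that makes each bar's
    area equal to its frequency. Values outside the bins are ignored.
    """
    rows = []
    for k in range(n_bins):
        lower = start + k * width
        upper = lower + width
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, freq, freq / width))
    return rows


# Heights in cm from the univariate-data example in this unit.
heights = [143, 152, 176, 139, 148, 122, 171]
for lower, upper, freq, density in histogram_table(heights, 120, 10, 6):
    print(f"{lower}-{upper}: {'*' * freq} ({freq})")
```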
When the data are arranged in the form of a frequency distribution, i.e., there are k variate values such that the value $x_i$ has frequency $f_i$ ($i = 1, 2, \ldots, k$), the arithmetic mean is

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}, \qquad \text{where } k \text{ is the number of classes and } \sum_{i=1}^{k} f_i = n.$$

Arithmetic Mean for Grouped Data
If the data are given in the form of a continuous frequency distribution, the arithmetic mean is obtained as

$$\bar{X} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i},$$

where $m_i$ is the class mark of the ith class and $f_i$ is the frequency of the ith class.
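Both formulas can be checked with a short Python sketch. The helper names `frequency_mean` and `grouped_mean` are hypothetical, as are the small distributions used; the grouped version treats every observation in a class as if it were located at the class mark $m_i$.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / n


def grouped_mean(class_limits, freqs):
    """Mean of a continuous frequency distribution, using each class mark
    m_i = (lower + upper) / 2 as the representative value of class i."""
    marks = [(lower + upper) / 2 for lower, upper in class_limits]
    return frequency_mean(marks, freqs)


# Hypothetical distributions (illustrative only).
print(frequency_mean([1, 2, 3], [2, 3, 5]))
print(grouped_mean([(0, 10), (10, 20), (20, 30)], [3, 5, 2]))
```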
4.6.2 Median
In any distribution, the median is the value of the variable which divides the data
into two equal halves. In an ordered series of data, the median is an observation
lying exactly in the middle of the series. It is the middlemost value in the sense
that the number of values less than the median is equal to the number of values
greater than it.
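Let $X_1, X_2, \ldots, X_n$ be the observations, and let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be the same observations arranged in ascending order, where $x_{(i)}$ is the ith smallest value, so that $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$.
The median is denoted by M.
Median for ungrouped data:

$$M = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\[4pt] \dfrac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right), & \text{if } n \text{ is even} \end{cases}$$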
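The odd/even rule translates directly into code. The sketch below is a minimal illustration; `median` is a helper written here (Python's standard `statistics.median` applies the same rule), and the two calls reuse the heights and the four-value sample that appear elsewhere in this unit. Python lists are indexed from 0, so $x_{(i)}$ corresponds to `xs[i - 1]`.

```python
def median(observations):
    """Median of ungrouped data, following the odd/even rule for M."""
    xs = sorted(observations)  # x_(1) <= x_(2) <= ... <= x_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]  # x_((n+1)/2) in the 1-based notation above
    return (xs[mid - 1] + xs[mid]) / 2  # (x_(n/2) + x_(n/2 + 1)) / 2


print(median([143, 152, 176, 139, 148, 122, 171]))  # n = 7, odd
print(median([5, 10, 12, 17]))                      # n = 4, even
```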
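Median for grouped data: If the data are given in the form of a continuous frequency distribution, the median is defined as

$$M = L_{med} + \frac{w}{f_{med}}\left(\frac{n}{2} - f_c\right),$$

where $L_{med}$ is the lower limit of the median class (the first class whose cumulative frequency reaches $n/2$), $w$ is its class width, $f_{med}$ is its frequency, $f_c$ is the cumulative frequency of the class preceding it, and $n$ is the total frequency.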
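A minimal sketch of the grouped-median formula, assuming contiguous classes listed in ascending order. The helper `grouped_median` and the four-class distribution are hypothetical. It locates the median class as the first class whose cumulative frequency reaches n/2, then applies the formula.

```python
def grouped_median(class_limits, freqs):
    """M = L_med + (w / f_med) * (n/2 - f_c) for a continuous distribution.

    class_limits is a list of (lower, upper) pairs in ascending order.
    """
    n = sum(freqs)
    if n == 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cumulative = 0  # f_c: cumulative frequency of the classes before this one
    for (lower, upper), f in zip(class_limits, freqs):
        if cumulative + f >= half:  # first class reaching n/2: median class
            return lower + (upper - lower) / f * (half - cumulative)
        cumulative += f
    raise ValueError("class_limits and freqs do not match")


# Hypothetical distribution (illustrative only).
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
print(round(grouped_median(classes, [5, 8, 12, 5]), 2))
```

For these hypothetical data, n = 30 and n/2 = 15; the cumulative frequency first reaches 15 in the class 20-30, with $f_c = 13$, so $M = 20 + (10/12)(15 - 13) \approx 21.67$.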
4.6.3 Mode
The mode is the value that occurs most frequently in a set of values, provided it occurs more than once; it is the most repeated observation in the dataset. The mode may not exist, and even if it exists, it may not be unique. In a discrete distribution, the value with the maximum frequency is the modal value. If all values in a set of observations occur once, or all occur an equal number of times, there is no mode.
Example
a) Find the mode of 5, 3, 5, 8, 9.
Solution: Mode = 5, because it occurs more often than any other observation in the given data.
b) Find the mode of 8, 9, 9, 7, 8, 2, and 5.
Solution: The data are bimodal, with two modes, 8 and 9, each occurring twice.
c) Find the mode of 4, 12, 3, 6, and 7.
Solution: There is no mode, because every observation appears only once in the dataset.
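The three examples above can be reproduced with a short sketch. `modes` is a helper introduced here; it follows this unit's conventions (a mode must occur more than once, and equal frequencies mean no mode), which differ from Python's standard `statistics.multimode`, which returns every value when all occur once.

```python
from collections import Counter


def modes(observations):
    """Return the sorted list of modes; an empty list means 'no mode'.

    Follows the rules in the text: a mode must occur more than once, and
    if all values occur equally often there is no mode.
    """
    counts = Counter(observations)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs only once
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # all values occur an equal number of times
    return sorted(value for value, c in counts.items() if c == top)


print(modes([5, 3, 5, 8, 9]))
print(modes([8, 9, 9, 7, 8, 2, 5]))
print(modes([4, 12, 3, 6, 7]))
```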
The mode of a set of numbers X1, X2, …, Xn is usually denoted by Z.
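Mode for Grouped Data
If the data are given in the form of a continuous frequency distribution, the mode is defined as

$$Z = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W,$$

where $L_{mod}$ is the lower limit of the modal class (the class with the highest frequency), $W$ is the class width, $\Delta_1$ is the frequency of the modal class minus that of the preceding class, and $\Delta_2$ is the frequency of the modal class minus that of the following class.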
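Example: A continuous frequency distribution with class width 3 has class frequencies 4, 8, 10 and 3. The modal class is the one with frequency 10, and its lower limit is 9. Here $\Delta_1 = 10 - 8 = 2$ and $\Delta_2 = 10 - 3 = 7$, so

$$Z = L_{mod} + \left(\frac{\Delta_1}{\Delta_1+\Delta_2}\right) W = 9 + \left(\frac{2}{2+7}\right)3 = 9 + \left(\frac{2}{9}\right)3 = \frac{29}{3} \approx 9.67$$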
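The worked example can be checked with a minimal sketch. `grouped_mode` is a hypothetical helper; it takes the class frequencies in order plus the lower limit and width of the modal class (the other class limits are not needed), and it assumes that a modal class at either end of the table has a neighbour of frequency 0.

```python
def grouped_mode(freqs, modal_lower, width):
    """Z = L_mod + (d1 / (d1 + d2)) * W for a continuous distribution.

    freqs lists the class frequencies in order; modal_lower is the lower
    limit of the modal class and width the (equal) class width. A missing
    neighbour of the modal class is treated as having frequency 0.
    """
    m = freqs.index(max(freqs))  # position of the modal class
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m + 1 < len(freqs) else 0
    d1 = freqs[m] - f_prev
    d2 = freqs[m] - f_next
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return modal_lower + d1 / (d1 + d2) * width


print(round(grouped_mode([4, 8, 10, 3], modal_lower=9, width=3), 2))
```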
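$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}} \quad \text{for ungrouped data (without class frequencies)}$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{k} f_i (X_i - \mu)^2}{N}} \quad \text{for a grouped frequency distribution, where } N = \sum_{i=1}^{k} f_i$$

Example (sample standard deviation): For the sample 5, 10, 12, 17, the mean is $\bar{x} = 44/4 = 11$.

x_i:              5    10    12    17    Total 44
(x_i − x̄)²:      36     1     1    36    Total 74

$$S^2 = \frac{\sum_{i=1}^{4}(x_i - \bar{x})^2}{4-1} = \frac{74}{3} = 24.67 \;\Rightarrow\; S = \sqrt{S^2} = \sqrt{24.67} = 4.97$$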
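The population and sample standard deviations differ only in the divisor, which a short sketch makes explicit. The helper names are hypothetical; Python's standard `statistics.pstdev` and `statistics.stdev` compute the population and sample versions respectively. The data are the four values from the example above.

```python
import math


def population_sd(xs):
    """sigma = sqrt(sum((x - mu)^2) / N)."""
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def sample_sd(xs):
    """S = sqrt(sum((x - x_bar)^2) / (n - 1)); needs at least two values."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample SD needs at least two observations")
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


def grouped_sd(values, freqs):
    """sigma = sqrt(sum(f_i * (x_i - mu)^2) / N) with N = sum(f_i)."""
    n = sum(freqs)
    mu = sum(f * x for x, f in zip(values, freqs)) / n
    return math.sqrt(sum(f * (x - mu) ** 2 for x, f in zip(values, freqs)) / n)


data = [5, 10, 12, 17]
print(round(sample_sd(data), 2), round(population_sd(data), 2))
```

For these data the sample SD is about 4.97, matching the hand calculation, while the population formula (divisor 4 instead of 3) gives a smaller value, about 4.30.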
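4.7.2 Variance
The variance is the square of the standard deviation, i.e., it is based on the sum of the squared deviations from the mean. For a population, this sum is divided by the number of observations N; for a sample, it is divided by n − 1. The population variance is denoted by σ² and the sample variance by s². The following box summarises the relation between the SD and the variance, and between population and sample measures of dispersion.

Population:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i-\mu)^2}{N}}$$

Sample:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$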
Example: Suppose a sample of sales is taken from four firms and the figures are ₹ 2.3 crores, ₹ 1.1 crores, ₹ 0.7 crores, and ₹ 6.8 crores. Find the sample variance and standard deviation.
Solution:

Sales (x)     x − x̄      (x − x̄)²     x²
2.3           −0.425      0.18          5.29
1.1           −1.625      2.64          1.21
0.7           −2.025      4.10          0.49
6.8            4.075     16.61         46.24
Total 10.9     0         23.53         53.23
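$$s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$

Mean $= \bar{x} = \dfrac{10.9}{4} = 2.725$

Variance $= s^2 = \dfrac{23.53}{4-1} \approx 7.84$

Standard deviation $= s = \sqrt{7.84} \approx 2.80$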
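The $x^2$ column in the table points to a computational shortcut, $\sum x^2 - (\sum x)^2/n$, for the sum of squared deviations. The sketch below, with a hypothetical helper name, checks that the shortcut and the definitional formula agree for the sales data.

```python
import math


def sample_variance_two_ways(xs):
    """Return the sample variance computed by the definitional formula and by
    the computational shortcut (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    definitional = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    shortcut = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)
    return definitional, shortcut


sales = [2.3, 1.1, 0.7, 6.8]  # ₹ crore, from the example above
var_def, var_short = sample_variance_two_ways(sales)
print(f"s^2 = {var_def:.4f} (shortcut {var_short:.4f}), s = {math.sqrt(var_def):.2f}")
```

Both routes give $s^2 = 7.8425 \approx 7.84$ and $s \approx 2.80$ (₹ crore). By hand, the shortcut avoids subtracting the mean from every value; in floating-point arithmetic, however, the definitional form is generally the safer choice when values are large and close together.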
Example: Consider the distribution of per diem allowance (per employee) in two companies. For the first company, the mean and standard deviation (S.D.) are 60 and 10, respectively. For the second company, the mean and S.D. are 50 and 9, respectively. Their coefficients of variation are:

$$CV_1 = \frac{10}{60} \times 100 = 16.7\%, \qquad CV_2 = \frac{9}{50} \times 100 = 18.0\%$$

This shows that the relative variability in the first company is less than that in the second company.
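A minimal sketch of the comparison; the helper `coefficient_of_variation` is introduced here for illustration and assumes a non-zero mean, since the CV is undefined otherwise.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) * 100, in per cent."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return sd / mean * 100


cv1 = coefficient_of_variation(10, 60)  # first company
cv2 = coefficient_of_variation(9, 50)   # second company
print(f"CV1 = {cv1:.1f}%, CV2 = {cv2:.1f}%")
print("less relative variability:", "first" if cv1 < cv2 else "second")
```

Because the CV divides by the mean, it compares variability on a relative scale, which is why it is preferred to the raw S.D. when the two means differ.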
4.8.1 Sampling
As discussed in the previous sections, collecting data from the whole population is neither feasible nor necessary in most cases. Therefore, we collect a sample of data from the population and use it to make inferences about the population. Very often, we are interested in estimating a population parameter. To do this, we need to define our terms carefully:
Population: The entire group of individuals or objects of interest under
investigation or study.
Unit: An element of the population. This will be a person or object on which
observations can be made or from which information can be obtained.
Sampling Frame: A list of all the units from the population from which the
sample may be drawn is called sampling frame.
Target Population: The population about which one wishes to make an inference.
Sample Size: The number of individuals in the sample.
Sampling: It is the process of selecting a sample from the population.
Importance of Sampling
1) Sampling is economical.
2) Sampling saves the investigator's precious time.
3) Sampling is convenient to organise.
4) Results based on a properly drawn sample give good estimates of population parameters.
5) Sampling is suitable when resources are limited.
Types of Errors in Sampled Data
An estimate based on a sample will not be exact; some error is involved in it. This error is the difference between the estimate and the true value of the parameter being estimated. In general, errors that occur during estimation based on a sample can be categorised into two types:
• Sampling Errors
• Non-Sampling Errors
Sampling Errors: A sampling error is the discrepancy between the population parameter and the sample statistic. It arises because only a part of the population is observed, and it is larger when the selected sample is less representative of the population. Even with a representative sample, there may still be a sizeable error if the sample size is small, and if the sample is not representative, our estimates of parameters will often be inaccurate. The sample should therefore be selected carefully to minimise the sampling error.
Non-Sampling Errors: Even if we have a representative sample that is large enough to give parameter estimates with a good degree of precision, errors may still arise, such as recording errors, measurement errors, respondent bias, non-response errors, errors in processing the data, interviewer errors, and reporting errors. Measurement and recording errors occur when there is an error in measuring the item being studied or in recording the result. Non-response can be due to refusals. Another common form of error is interviewer error, which can occur in surveys when an interviewer introduces bias into an interview or when a questionnaire is badly designed.
Sampling Methods
Sampling techniques can be grouped into two categories:
Random (Probability): In this method, items are included in the sample on a random basis, i.e., each unit of the population has a known, non-zero chance of being selected (in simple random sampling, every unit has an equal chance). Examples are simple random sampling, stratified random sampling, cluster sampling, and systematic sampling.
Non-Random (Non-Probability): In non-probability sampling, selection is not based on chance; it is determined by the personal judgement of the researcher. This method is cost-effective; however, it does not allow objective statistical inferences. Depending on the technique used, non-probability samples are classified into quota, judgement, purposive, and convenience samples.
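Two of the probability methods listed above can be sketched with Python's standard `random` module. The helper names and the 100-unit sampling frame are hypothetical; `random.Random(seed)` is used only so that a draw can be repeated. The systematic version assumes the frame is at least as large as the sample and picks a random start within the first sampling interval.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first interval of size k = len(frame) // n,
    then take every k-th unit from the frame."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n  # sampling interval
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical sampling frame of 100 customer IDs (illustrative only).
frame = [f"C{i:03d}" for i in range(1, 101)]
print(simple_random_sample(frame, 5, seed=1))
print(systematic_sample(frame, 5, seed=1))
```

Stratified random sampling would apply the same simple random draw separately within each stratum.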
Univariate Data: This type of data consists of only one variable. The analysis of univariate data is thus the simplest form of analysis, because the information deals with changes in only one variable. It does not deal with causes of changes or relationships among variables; its main purpose is to describe the data and find patterns that exist within them. An example of univariate data is the heights of people:

Height (cm): 143, 152, 176, 139, 148, 122, 171

The above data have only one variable, namely the height of the people selected for the study.
Bivariate Data: This type of data involves two different variables. The analysis of this type of data deals with causes and relationships between the two variables. An example of bivariate data is rainfall and agricultural production in a particular year:

Rainfall (cm):                 80    91    84    78    86    89    82
Wheat production (tonnes):    131   142   136   125   139   141   132

The above table presents data on rainfall and wheat production in a particular year. The data show a positive relation between average annual rainfall and wheat production: higher rainfall tends to go with higher output. Thus, bivariate data analysis involves relationships, comparisons, explanations, and causes, although an observed association alone does not prove causation. The two variables are often plotted on the X and Y axes of a graph for a better understanding of the data, with one variable treated as independent and the other as dependent.
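The positive relation in the rainfall and wheat table can be summarised by a single number. The sketch below computes Karl Pearson's correlation coefficient, a standard measure of linear association that this section does not itself introduce; the helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient:
    r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rainfall = [80, 91, 84, 78, 86, 89, 82]        # cm, from the table above
wheat = [131, 142, 136, 125, 139, 141, 132]    # tonnes, from the table above
print(round(pearson_r(rainfall, wheat), 2))
```

For these seven pairs of observations r comes out at roughly 0.97, consistent with the strong positive relation described in the text; a high r by itself does not show that rainfall causes higher output.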
4.8.4 Estimation
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. For example, the sample mean is employed to estimate the population mean. We refer to the sample mean as the estimator of the population mean; once the sample mean has been computed, its value is called the estimate. There are two alternative methods of estimating a population parameter.
Point Estimation: A point estimate is a single statistic used to estimate a population parameter. Suppose a firm wants to estimate the mean age of buyers of smartphones. It selects a random sample of 100 recent purchasers, records the age of each purchaser, and computes the mean age of the buyers in the sample. This sample mean is a point estimate of the population mean. In general, a point estimate is a statistic, computed from sample information, that is used to estimate the population parameter.
There are two drawbacks to using point estimators.
• It is virtually certain that the estimate will be wrong. (The probability that a continuous random variable will equal a specific value is 0; that is, the probability that $\bar{x}$ will exactly equal μ is 0.)
• In drawing inferences about a population, it is intuitively reasonable to expect that a large sample will produce more accurate results because it contains more information than a smaller sample does. But point estimators do not have the capacity to reflect the effects of larger sample sizes. As a consequence, we use the second method of estimating a population parameter, the interval estimator.
Interval Estimator: An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval. An interval estimate is a range of values, calculated from the information in the sample, within which the population parameter is expected to lie with a stated degree of confidence.
The purpose of an interval estimate is to show how close the point estimate provided by the sample is likely to be to the value of the population parameter. The general form of an interval estimate of a population mean is

$$\bar{x} \pm \text{Margin of error}$$

Similarly, the general form of an interval estimate of a population proportion is

$$\hat{p} \pm \text{Margin of error}$$
The sampling distributions play key roles in computing these interval estimates.
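A minimal sketch of the two general forms above, assuming large samples so that the standard normal value 1.96 gives roughly 95 per cent confidence. The helper names and the figures (100 buyers with mean age 30 and SD 8; a poll of 400 voters with 50 per cent support) are hypothetical.

```python
import math

Z_95 = 1.96  # standard normal value for about 95% confidence


def mean_interval(x_bar, s, n, z=Z_95):
    """Large-sample interval x_bar +/- z * s / sqrt(n) for a population mean."""
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def proportion_interval(p_hat, n, z=Z_95):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical figures (illustrative only): 100 smartphone buyers with mean
# age 30 and SD 8; a poll of 400 voters in which 50% back a party.
print(mean_interval(30, 8, 100))
print(proportion_interval(0.5, 400))
```

The poll interval has a margin of error of about 0.049, i.e. roughly ±4.9 percentage points, the kind of figure quoted with opinion polls. For small samples with an unknown population SD, a value from the t-distribution would normally replace 1.96.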