Lesson 1 - 14

Department of Distance and Continuing Education
University of Delhi
Generic Elective - Economics
All UG Courses except B.A. (Hons.) Economics
Course Credit-4
BASIC STATISTICS FOR ECONOMICS
(Department of Economics)
As per the UGCF-2022 and National Education Policy 2020
Editorial Board
Prof. J. Khuntia, V.A.Rama Raju, Vajala Ravi,
Bhavna Rajput, Anupama, Devender
Content Writers
Suramya Sharma, Dr. Pooja Sharma
Taramati, Sugandh Kumar Choudhary,
Tasha Agarwal, Ashish Kumar Garg
Academic Coordinator
Deekshant Awasthi
© Department of Distance and Continuing Education

ISBN: 978-81-19169-58-0
1st edition: 2023
E-mail: ddceprinting@col.du.ac.in
economics@col.du.ac.in
Published by:
Department of Distance and Continuing Education under
the aegis of Campus of Open Learning/School of Open Learning,
University of Delhi, Delhi-110 007
Printed by:
School of Open Learning, University of Delhi
This Study Material is duly recommended in the meeting of Standing Committee held on 08/05/2023
and approved in Academic Council meeting held on 26/05/2023 vide item no. 1014 and subsequently
Executive Council Meeting held on 09/06/2023 vide item no. 14 {14-1(14-1-11)}
• Corrections/Modifications/Suggestions proposed by Statutory Body, DU/Stakeholder/s in the Self
Learning Material (SLM) will be incorporated in the next edition. However, these
corrections/modifications/suggestions will be uploaded on the website https://sol.du.ac.in. Any
feedback or suggestions can be sent to the email- feedbackslm@col.du.ac.in

BASIC STATISTICS FOR ECONOMICS

Study Material: Lesson 1-14
TABLE OF CONTENTS
| Lesson | Name of Lesson | Content Writer |
|---|---|---|
| Lesson 1 | Introduction to Population and Sample | Suramya Sharma |
| Lesson 2 | Pictorial Methods in Descriptive Statistics | Suramya Sharma |
| Lesson 3 | Measures of Location and Variability | Suramya Sharma |
| Lesson 4 | Sample Space, Events, and Probability | Pooja Sharma |
| Lesson 5 | Conditional Probability | Pooja Sharma |
| Lesson 6 | Random Variables and Probability Distributions | Pooja Sharma |
| Lesson 7 | Cumulative Distribution Function, Density Function, Expected Value, and Variance | Pooja Sharma |
| Lesson 8 | Discrete Distribution | Taramati |
| Lesson 9 | Continuous Distribution | Taramati |
| Lesson 10 | Joint Probability Distribution and Mathematical Expectations | Sugandh Kumar Choudhary |
| Lesson 11 | Correlation and Covariance | Tasha Agarwal |
| Lesson 12 | Characteristics of Estimators | Ashish Kumar Garg |
| Lesson 13 | Statistical Hypothesis | Ashish Kumar Garg |
| Lesson 14 | Error in Hypothesis Testing and Power of Test | Ashish Kumar Garg |
About Contributors
| Contributor's Name | Designation |
|---|---|
| Suramya Sharma | Guest Faculty, NCWEB, Hansraj College, University of Delhi, Delhi |
| Dr. Pooja Sharma | Associate Professor, Daulat Ram College, University of Delhi |
| Taramati | Guest Faculty, Kirori Mal College, University of Delhi |
| Sugandh Kumar Choudhary | Assistant Professor, Department of Economics, S.S. Khanna Girls' Degree College, University of Allahabad |
| Tasha Agarwal | Ph.D. Scholar, Ambedkar University, Delhi |
| Ashish Kumar Garg | Assistant Professor, Ramjas College, University of Delhi |

LESSON 1
INTRODUCTION TO POPULATION AND SAMPLE
STRUCTURE
1.1 Learning Objectives

1.2 Introduction
1.3 Type of Data
1.3.1 Quantitative Data
1.3.2 Qualitative Data
1.4 Population, Sample and Processes
1.4.1 Population
1.4.2 Sample
1.5 Sampling Techniques
1.5.1 Probability Sampling Techniques
1.5.2 Non-Probability Sampling Techniques
1.6 Branches of Statistics
1.6.1 Descriptive Statistics
1.6.2 Inferential Statistics
1.7 Summary
1.8 Glossary
1.9 Answers to In-Text Questions
1.10 Self-Assessment Questions
1.11 References
1.12 Suggested Reading
1.1 LEARNING OBJECTIVES
After reading this lesson, students will be able:
1. To gain a thorough understanding of the meaning, importance, and application of
statistics
2. To distinguish between quantitative and qualitative data
3. To learn about the various scales of measurement, viz. ratio, interval, ordinal, and
nominal scales
4. To identify and comprehend the differences between a statistical population and a
sample
5. To distinguish between various sampling techniques
6. To demonstrate knowledge of the two branches of statistics, descriptive and
inferential statistics
1.2 INTRODUCTION
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read
and write." Samuel S. Wilks (1906 - 1964).
Over the past several decades, statistics has become an indispensable part of our lives. We often
come across statistics in one form or another. If we open a newspaper or turn on the television,
we are likely to find surveys that establish a relationship between particular issues, say,
eating fast food and the risk of having a health issue; or we may find graphs depicting growth
rates or changes in some variables over time, say the growth rate of GDP or the inflation level
in a country. The scope of statistics is not limited to data collection and representation; it
also establishes a basis for decision making and problem solving. We can create models not
only to study past trends but also to extend them to study the uncertain future. Statistics helps
us to make informed decisions in a world of uncertainty and variability. We would not have
needed statistics had there been no uncertainty and variability around us.
The word Statistics is derived from the Latin word “status,” which means “state” or “condition”;
the term later came to refer to “a group of numbers or figures that represent some information
of human interest.” Statistics may be formally defined as “the study of collecting, organizing,
analyzing and interpreting information in the form of data.”
Statistics yields valuable insights not only in Economics but also in finance, engineering,
medical research and other science and social-science disciplines. In the financial sector,
statistical analysis may be used at the micro and macro levels. At the micro level, it
facilitates understanding of the performance of a company or business, such as determining its
revenue-generating capacity, the relationship between advertising and sales, etc. At the macro
level, statistical analysis allows a country to assess its financial condition and measure
economic growth. In the field of engineering, statistics is indispensable for robust analysis,
probabilistic risk assessment, measurement of error, etc. Statistics allows clinical researchers
to compare various medical treatments, evaluate the benefits of alternative therapies, establish
optimal treatment combinations, etc. Since the scope of statistics has broadened and it is now
used in a number of practical fields, it is also referred to as applied statistics.
As for the nature of statistics, it is considered a science as well as an art. It is a science
since statistical techniques are systematic and have broad application. In several instances,
statistics is used to study cause-and-effect relationships, and the results can be generalized
in the same way as any scientific experiment or law. Statistics is also an art, which refers to
the “skill of handling facts so as to achieve a given objective.” Managing, presenting, and
drawing relevant conclusions from data is considered an art.
1.3 TYPE OF DATA
We’ve learnt that statistics is the study of collecting, organizing, analyzing and interpreting
data. But you may wonder, what exactly is data? Data is nothing but pieces of raw information,
facts and figures that are used for analysis. Broadly, data can be of two kinds:
1.3.1 Quantitative data
1.3.2 Qualitative data
1.3.1 Quantitative data

As the name suggests, all kinds of numerical data that can be measured comprise quantitative
data. Information like age, height, distance, income, saving, GDP, imports, exports, rate of
employment, etc. are quantitative data.
Now quantitative data can further be classified into two types:
i. Discrete data – Data which can only take specific, separate values are termed discrete
data. For instance, the number of computers in a school can only be a whole number. We will
never witness 7.2 or 15.7 computers in a school. Similarly, the number of students in a
class can only take whole numbers. We never observe 39.7 or 45.2 students in a class. The
age of respondents, when recorded in completed years, is another example. So discrete data
generally take only whole numbers and are finite and countable.
ii. Continuous data – Data which can take any value within an interval are referred to as
continuous data. This means that such data can take any value between two numbers. For
example, the daily temperature recorded in an Indian city in degrees Celsius can take any
value in the range of -50° to 50°. Here, 4.7° or 42.3° are acceptable observations.
Similarly, for the daily income of an ice-cream seller, Rs. 346 or Rs. 1787.5 are suitable
values. So continuous data can take decimal values, and the possible values are infinite
and cannot be counted.
1.3.2 Qualitative data
In contrast, any information which is not directly measurable is known as qualitative data.
Qualitative data represent the qualities or characteristics of the units under study. Information
like political ideology, physical attributes of a person, problems faced by workers, etc. are
examples of qualitative data.
Scales of measurement: data can be measured on four scales. The nominal and ordinal scales apply
to qualitative data, while the interval and ratio scales apply to quantitative data.
i. Nominal scale data – Information which cannot be sorted or put in any order
is known as nominal data. Such data are labels or categories, so changing the order
in which the categories are listed does not change their meaning. For example, for the
occupation of respondents, the values may include teacher, farmer, shopkeeper,
unemployed, etc. These data cannot be measured, nor can they be sorted in any
meaningful way. Another example of nominal data is the marital status of respondents.
The variable may take values like married, unmarried, divorced, widowed, etc. Changing
the order of the responses does not make any difference to the understanding of the
sample.
ii. Ordinal scale data – In contrast, ordinal data follows a natural order. Although
these too cannot be measured explicitly, we can sort the data or order them in a way
to observe basic comparison between values. For example, the education level of
respondents- such a variable can take values from nursery to post graduation and
the data has a specific order. Opinion of respondents towards relevance of CCTV
cameras in workplaces could take values like strongly agree, somewhat agree,
somewhat disagree, strongly disagree, etc. These values can also be arranged in a
particular order.
iii. Interval scale – This scale pertains to numerical data in which differences between
values represent real differences in the variable. With such variables, we know not
only that one value is greater than another but also that the distances between
adjacent points on the scale are the same. Examples are the temperature in Fahrenheit
or Celsius, year of birth, etc. Here a temperature of 92°F is greater than 90°F, and
the difference between 92°F and 90°F is the same as the difference between 90°F and
88°F.
iv. Ratio scale – Data on a ratio scale are quantitative measurements that can be ordered
and have evenly spaced intervals between values. In addition, these scales have a real
absolute zero representing the total absence of the variable being measured. Hence
ratio scale variables are the same as interval scale variables except that they also
have a “true zero.” Examples are the weight of a commodity, the height of a person,
etc. Note that a zero value indicates that the commodity is weightless.
IN-TEXT QUESTIONS
1. Sort the following data into quantitative and qualitative data:

A. Gender of respondents
B. Number of lectures attended
C. Percentage marks obtained in Economics subject
D. Revenue in lakhs
E. Whether interested in buying a washing machine
2. Mark whether the statements are true or false:
A. Nominal data can be arranged in a particular order
B. Amount of time taken to complete a class project is a continuous variable
C. Statistics helps us to make informed decisions
D. Qualitative data can be measured directly
E. Discrete data are finite and countable
3. Match the following:
| Item | Type of data |
|---|---|
| A. Economic status: low, medium and high | 1. Discrete |
| B. Weight of students in a class | 2. Continuous |
| C. Employees in a company | 3. Nominal |
| D. Religion: Hindu, Muslim, Sikh, Christian, other | 4. Ordinal |
4. Fill in the blank:

A teacher notes down the weight of each student in the class. The scale of measurement
being used here is ____________.
1.4 POPULATION, SAMPLE AND PROCESSES
We can also categorize data into univariate, bivariate and multivariate data. Uni means one
and variate refers to variable. Hence univariate data consist of only a single variable. It is
the simplest form of data. An example is the sales of cold drinks by a street vendor on
weekdays. The data may look as below:
| Day | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Sales (in Rs.) | 2,500 | 2,700 | 2,000 | 3,200 | 3,800 |
In the above example, you may notice that we can only describe the data; no relationship or
comparison can be drawn. On the other hand, bivariate and multivariate data allow a researcher
to establish relationships and correlations between variables. Here, bi means two and hence
bivariate data involve two distinct variables. A researcher may use such data to establish a
relationship between the two variables. An example is the sales of cold drinks by a street
vendor and the daily temperature of the city the vendor resides in. The data could look like:
| Day | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Sales (in Rs.) | 2,500 | 2,700 | 2,000 | 3,200 | 3,800 |
| Temperature (in °C) | 34 | 37 | 36 | 37 | 38 |
A researcher may draw a conclusion that sales and temperature have a positive relationship
since, as the temperature in the city rises, the street vendor's cold drink sales generally rise
too.
Finally, multivariate data consist of more than two variables. Suppose a researcher wishes to
analyze the determinants of cold drinks sales in a city. So, she gathers data on cold drinks
sales, temperature of the city, and price of cold drinks.
| Day | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Sales (in Rs.) | 2,500 | 2,700 | 2,000 | 3,200 | 3,800 |
| Temperature (in °C) | 34 | 37 | 36 | 37 | 38 |
| Price (in Rs.) | 20 | 18 | 21 | 16 | 15 |
The researcher can identify several relationships between these variables.
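As an illustration of how such relationships might be quantified, the sketch below computes the Pearson correlation coefficient for the five observations in the tables above. The coefficient is not introduced in this section, and the helper name `pearson_r` is only an illustrative choice. With just five data points, the result should be read as a rough indication rather than firm evidence.

```python
from math import sqrt

# Data copied from the tables above (five weekdays).
sales = [2500, 2700, 2000, 3200, 3800]   # Sales (in Rs.)
temperature = [34, 37, 36, 37, 38]       # Temperature (in °C)
price = [20, 18, 21, 16, 15]             # Price (in Rs.)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means.
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    dev_x = sum((a - mean_x) ** 2 for a in x)
    dev_y = sum((b - mean_y) ** 2 for b in y)
    return co_dev / sqrt(dev_x * dev_y)


r_sales_temp = pearson_r(sales, temperature)
r_sales_price = pearson_r(sales, price)

print(f"sales vs temperature: {r_sales_temp:.2f}")   # positive relationship
print(f"sales vs price:       {r_sales_price:.2f}")  # negative relationship
```

A value near +1 suggests that the two variables rise together, and a value near -1 suggests that one falls as the other rises. Here sales move with temperature and against price.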

Now, to get reliable results from our analysis, it is crucial that the data we use are
relevant and representative. To ensure this, we need to know about populations and samples,
the differences between them, and why there is a need to use a sample at all. We will finally
discuss some of the commonly used sampling techniques.
1.4.1 Population
If we ask you, what do you understand by the term population, what will your answer be?
Maybe you’ll say that population is a group of individuals who live in a country or a state or
even in a city. You may also say that population may not just be related to human beings, but
the scope of population can be expanded to all living beings including animals and birds.
However, in statistics, the definition of population is slightly different. Here, population
refers to all the individuals/entities possessing similar characteristics and belonging to a
particular group under study. The population could vary from study to study. For instance, a
researcher might be interested in analyzing the results of the Common Admission Test (CAT)
examination in India. Here, the population would comprise all the candidates who appear for CAT
in a particular year. Say in a year, about 2.3 lakh candidates appear for the exam. To study the
population, the researcher would have to gather data for all 2.3 lakh candidates; no one from
the population can be left out. The best example to understand the concept of a population is a
census. In India, the census is the process of collecting and analyzing data on the demographic,
economic and social aspects of Indian residents. Since a census is a study of the whole
population, the data are collected from each and every Indian resident. The scale of the data
collection is such that the census is conducted only once every 10 years.
In statistics, the population depends on the topic and scope of research. Other examples of a
population could be all the people working under the NREGS, the voters in India, the students
enrolled in private schools in a state, all the accidents that have taken place in India, etc.
Studying a population helps a researcher to gain useful insights into the characteristics of all
the elements under study. Since each and every unit is considered, the results are considered to
be reliable and representative of the population. This also enables a researcher to study more
than one aspect of the population and carry out an intensive study. Such data can also form
the basis of further investigations. However, studying a population is more suitable when we
have a small scope of study.
The descriptive statistics taken from the population are termed as ‘population parameter’ or
simply a ‘parameter’. So, a parameter describes the characteristics of a population. They are
usually denoted by Greek letters such as µ (Mu) for mean and σ (Sigma) for standard deviation.
Based on the data collected from the entire population of 2.3 Lakh candidates, if we want to
say that the average marks a candidate scores in the CAT examination are 85, we could denote
it as: 𝜇 = 85.
1.4.2 Sample
As you may have identified, the major challenge of analyzing a population is that such research
requires data to be collected from each and every member of the population, which is
undeniably a tedious task. The possibility of making errors in studying a population is also
significant, since there may be missing data due to non-response by respondents, measurement
complexities due to the large amount of data, etc. Gathering data from a population not only
requires extra effort, but also consumes a lot of time and is expensive. Constraints on scarce
resources often render a population survey infeasible.
To overcome these drawbacks, a subset of the population, referred to as a sample, may be used
instead. A sample is an unbiased subset of the statistical population which is representative of
the entire population. This means that a researcher randomly draws and analyzes some
observations from the population to make inferences about the whole population. Figure 1
depicts the relationship between population and sample:
[Figure 1: Sample is a subset of Population]

For instance, consider again the example of a researcher who wishes to analyze the results of
the CAT examination. Now, instead of collecting data from all 2.3 lakh candidates, the
researcher may select a subset from the population, say 1000 randomly selected candidates,
and then perform the analysis. If the sample is well representative of the population, the
researcher could generalize the results to all the 2.3 lakh candidates.
So, it is advisable to use a sample when:
a. The population is too large. For example, if a researcher wants to understand the
relationship between the level of inflation and unemployment in a country, studying the
population would mean collecting data from each and every unemployed person in the
country, which is not practical. Instead, the researcher may select a small sample from
the population to understand the relationship at the country level.
b. The research is time sensitive. Suppose a researcher wishes to analyze the short-term
sentiments of the population about the Covid-19 lockdown in a country. Clearly, it will
take a lot of time to collect the information from the entire population. During that time,
it is possible that the sentiments of the public change. Due to the time-consuming
process of data collection, the research may not provide accurate results by the time it
is completed. In such cases, using a sample is a time-saving way to conduct research.
c. Data collection is expensive. Data collection from a population involves several
expenses such as the cost of employing survey teams, computers/laptops,
transportation, stationery, and other miscellaneous expenses. On the other hand,
samples are a cost-effective way to conduct research.
Corresponding to a parameter, the descriptive values taken from a sample are known as
‘sample statistics’ or just ‘statistics’. So, a statistic describes the characteristics of a
sample. Statistics are denoted by Latin letters such as 𝑥̅ (x bar) for mean and 𝑠 for standard
deviation. Now, based on the data collected from a sample of 1000 candidates, if we want to say
that the average marks a candidate scores in the CAT examination are 86.8, we could denote it
as: 𝑥̅ = 86.8.
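The distinction between a parameter and a statistic can be sketched with a small simulation. The scores below are randomly generated and are not real CAT data. The population size of 2.3 lakh and the sample size of 1000 follow the example in the text, while the normal distribution and its settings are assumptions made purely for illustration.

```python
import random
from statistics import fmean, pstdev, stdev

random.seed(42)  # fixed seed so that the run is repeatable

# Hypothetical population: 2.3 lakh made-up scores (NOT real CAT results).
N = 230_000
population = [random.gauss(85, 15) for _ in range(N)]

# Parameters: computed from every unit in the population.
mu = fmean(population)      # population mean
sigma = pstdev(population)  # population standard deviation

# Statistics: computed from a simple random sample of n units.
n = 1000
sample = random.sample(population, n)
x_bar = fmean(sample)       # sample mean
s = stdev(sample)           # sample standard deviation

print(f"Parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values will usually be close to the population values without matching them exactly, which is why a statistic is treated as an estimate of the corresponding parameter.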
The process of selecting a sample from the population is known as ‘sampling’. When selecting
a sample from a population, deciding on the size of the sample is also important. The sample
size is simply the number of observations we select out of a population as a sample. For
instance, say we want to investigate the work environment of a leading e-commerce
company for its female employees. Assume that the total number of full-time and part-time
female employees working for the company is 2 lakh. So, this will be our study population.
Since we already know that it is difficult to conduct research based on the whole population,
we should draw a sample for our study. Now the sample size that we select could either be:
• Too small. What if we select only 50 female employees as our sample and
conduct our research based on their experience? There is a very high chance that our
results will be unreliable and unrepresentative of the whole population; or
• Too big. The other extreme could be if we select a sample of 1 lakh female employees.
Since this sample size is quite large, we will get more accurate results. However, the
process of collecting data will be almost as complex, time-consuming and costly as it would
be if we studied the whole population.
Hence, we now understand the importance of an appropriate sample size in a study. We will not
study the variables and formulas required to calculate the sample size since it is out of the
scope of this lesson.
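The effect of sample size can be seen in a short simulation. The population below is a made-up set of 2 lakh scores on a 0 to 10 scale, the function name `spread_of_sample_means` is hypothetical, and 200 repetitions is an arbitrary choice. The sketch draws many samples of each size and measures how much the sample means differ from one sample to the next.

```python
import random
from statistics import fmean, pstdev

random.seed(7)  # fixed seed so that the run is repeatable

# Hypothetical study population: 2 lakh made-up scores between 0 and 10.
population = [random.uniform(0, 10) for _ in range(200_000)]


def spread_of_sample_means(sample_size, repetitions=200):
    """Draw repeated simple random samples and return the standard
    deviation of their means (how much the means vary across samples)."""
    means = [fmean(random.sample(population, sample_size))
             for _ in range(repetitions)]
    return pstdev(means)


spread_small = spread_of_sample_means(50)
spread_large = spread_of_sample_means(1000)

print(f"n = 50:   sample means vary by about {spread_small:.3f}")
print(f"n = 1000: sample means vary by about {spread_large:.3f}")
```

The means from samples of 50 scatter much more widely than those from samples of 1000, so a single small sample is more likely to land far from the population value.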
The following table summarizes the concepts of population and sample:
| | Population | Sample |
|---|---|---|
| Definition | Set of all items or observations that possess common characteristics | Subset or a part of population that is representative of population |
| Characteristics | Parameter | Statistic |
| Symbols | Population size = N; Population mean = µ (Mu); Population standard deviation = σ (Sigma) | Sample size = n; Sample mean = x̅ (x bar); Sample standard deviation = s |
| Data collection | Census | Sampling |
| Advantages | 1. Results are representative of population; 2. Intensive study; 3. Suitable for small universe | 1. Convenient; 2. Low cost and less time consuming; 3. Better accuracy if sample is representative |
| Disadvantages | 1. Time consuming and costly; 2. Possibility of errors | 1. Possibility of bias; 2. Difficulty in selecting a representative sample |
IN-TEXT QUESTIONS
5. Select the correct option and fill in the blank:
Suppose a researcher wishes to compare the popularity of five advertisements on a
website, and gathers data on the click rates for teens, adults and the elderly. The data
collected in this case is _____________ data. (Univariate / Bivariate / Multivariate)
6. Mark whether the following are parameter or statistic:
A. 𝑠 = 5.7
B. 𝜇 = 120
C. 𝜎 = 32
D. 𝑥̅ = 18
7. Select the correct option:
I. Sample is used when:
A) Data collection is inexpensive B) Research is time sensitive
C) Population is small D) Population is unknown
II. A mean is called a statistic if it is calculated from the:
A) Sample B) Population
C) Parameter D) Standard deviation
1.5 SAMPLING TECHNIQUES
To ensure that the samples drawn are representative of the population, it is crucial to understand
the different ways in which we can select a sample. The two broad methods of sampling are:
1.5.1 Probability sampling techniques – These are commonly used sampling techniques
where each unit of population has an equal chance (or probability) of getting selected in a
sample. This means that the samples selected are random and unbiased and hence are likely to
be representative of the population. These techniques are also known as random sampling
techniques. The techniques under probability sampling discussed here are:
a. Simple random sampling – As the name suggests, it is the most basic form
of random sampling. Here each unit of population has an equal and independent chance
of being chosen. Example: in a lottery system, the name of each unit of population is
written on a chit and, after thorough shuffling, the researcher picks the chits one by one
and notes down the names. Another way of simple random sampling is through random
number generation, where all the units of population are assigned a number in sequential
order. Then random numbers are generated using software and sample selection is
carried out.
b. Systematic random sampling – Under systematic random sampling, the sample is
selected from the population at a fixed interval. This technique is often more feasible
than simple random sampling. To draw a sample using systematic random sampling, the first
step is to arrange the units of population in an order and assign a number to each unit
from 1 to 𝑁. Then the sampling interval is calculated using the formula: 𝐾 = 𝑁/𝑛,
where 𝐾 is the interval, 𝑁 is the population size and 𝑛 is the sample size. Finally, we
select one unit at random from the first 𝐾 units and then select the following units at the
equal interval 𝐾. For example, suppose we have to collect information from residents of a
city where the houses are numbered from 1 to 100,000. So, if the size of the population is
100,000 and we need a sample size of 1000 houses, then the interval should be
100,000/1000 = 100. The researcher will choose one house at random from the first 100
houses and then will select every 100th house thereafter to get the sample.
c. Stratified random sampling – The above two types of sampling techniques assume
that the population is homogeneous. In case the population is heterogeneous, we use the
stratified random sampling technique. Under this technique, we divide the population
into sub-groups based on homogeneous characteristics such as gender, age group,
income level, etc., called strata, and then select random samples from each sub-group
or stratum. For instance, if a company has 700 male employees and 300 female
employees, then simple and systematic random sampling techniques may give us samples
in which one group is over-represented. To avoid this bias, we use stratified random
sampling, wherein we create two groups based on gender and then select random samples
from both groups. The technique has two further approaches to selecting a sample:
proportionate stratified sampling and disproportionate stratified sampling.
d. Multi-stage sampling – At times the population of interest is quite large and
geographically diverse. In such cases, one sampling technique is not enough to select a
sample and using multi-stage sampling is suitable. Here, a sample is selected in stages,
combining different sampling techniques as described above. For instance, to study the
issues faced by primary school children, a researcher may first divide the population
into states, then use simple random sampling to create a sample of states. Next, the
researcher could again use simple random sampling to select a few districts, and finally
use systematic random sampling to identify a few schools within a district. The multi-
stage sampling method is frequently used to scale down large data sets into workable
sizes. Although there is no restriction on the number of stages you could use to select a
sample, it is important to note that all the sampling techniques used must be probability
sampling methods.
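The first three techniques above can be sketched in a few lines of Python using the examples from the text: houses numbered 1 to 100,000 and a company with 700 male and 300 female employees. The employee IDs and the stratified sample size of 100 are hypothetical, and the stratified part shows only the proportionate approach.

```python
import random

random.seed(1)  # fixed seed so that the run is repeatable

# --- Simple random sampling: every house has the same chance ---
N = 100_000                  # population size (houses numbered 1 to N)
n = 1_000                    # required sample size
houses = range(1, N + 1)
simple_sample = random.sample(houses, n)   # n distinct houses

# --- Systematic random sampling: random start, then every K-th house ---
K = N // n                   # sampling interval: 100,000 / 1000 = 100
start = random.randint(1, K)               # random start among the first K houses
systematic_sample = list(range(start, N + 1, K))

# --- Proportionate stratified sampling: sample within each stratum ---
# Hypothetical employee IDs for the 700 male / 300 female example.
strata = {
    "male": [f"M{i}" for i in range(1, 701)],
    "female": [f"F{i}" for i in range(1, 301)],
}
total = sum(len(units) for units in strata.values())
sample_size = 100            # assumed overall sample size
stratified_sample = {
    # Each stratum contributes in proportion to its share of the population.
    name: random.sample(units, round(sample_size * len(units) / total))
    for name, units in strata.items()
}

print(len(simple_sample), len(systematic_sample))
print({name: len(units) for name, units in stratified_sample.items()})
```

With proportionate allocation, the sample of 100 contains 70 male and 30 female employees, mirroring the 700:300 split in the population. Multi-stage sampling would chain such steps together, for example by sampling states first and then districts within the selected states.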
1.5.2 Non-probability sampling techniques – Under non-probability sampling or non-random
sampling techniques, not every unit of population has an equal chance of getting
selected in a sample, that is, the samples are not random; they may be biased and may not
represent the population accurately. Hence, non-probability sampling techniques may not
produce results that can be generalized. Yet, there are many occasions when non-probability
sampling methods are preferred over probability sampling methods, which will be discussed in
the following passages. The most commonly used non-probability sampling techniques are:
a. Convenience sampling – As is clear from the term itself, convenience sampling refers
to the sampling technique in which a researcher collects the data from convenient
sources. For instance, as part of an undergraduate course, a student undertakes a
research project in which she tries to understand consumer sentiments related to
rooftop solar panels. Considering the time, cost and effort constraints, the student may
choose to collect data from the location most convenient to her; it may be within the
city she resides in or within the district. It is worth noting that such research may not
be representative of the population and generalization of the results may not be
appropriate. That said, such a technique is useful when a researcher carries out a pilot
survey to test the questionnaire.
b. Judgement sampling – Under this technique, the researcher selects a sample based on
her own judgement about the characteristics of the individuals. In other words, the
researcher uses her expertise to identify the best fit for her sample. Take the example
of a researcher who is interested in understanding the challenges faced by disabled
employees in a company. In such a case, the researcher can easily identify her sample
by using her sense of judgement about the characteristics of disabled persons.
c. Snowball sampling – At times it is difficult to locate the appropriate target group for
a study. In such cases, snowball sampling techniques may be used. This is generally
used in social science research. Under snowball sampling, the existing respondents are
asked to identify or suggest other individuals who are well-suited for the research. So,
based on the references, the sample size keeps on increasing, just like a snowball. For
example, a researcher may want to understand the plight of immigrants in a city. Since
there may not be any official record of immigrants readily available, the researcher
could identify a few immigrants and then ask them to help locate other immigrants for
the study. Such a technique comes in handy when we’re interested in researching
hard-to-find groups. However, the risk of obtaining biased results is quite high since the
initial respondents may refer their friends and family, who share common
characteristics and beliefs.
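The way a snowball sample grows through referrals, and why it can miss parts of the population, can be sketched as follows. The referral network and the function `snowball_sample` are hypothetical and serve only to illustrate the idea.

```python
from collections import deque

# Hypothetical referral network: each respondent names people they know.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
    "F": [],
    "G": ["H"],  # G and H are not referred by anyone reachable from A
    "H": [],
}


def snowball_sample(seeds, referrals, max_size):
    """Grow a sample by following referrals from the initial respondents."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)  # interview this respondent
        for referred in referrals.get(person, []):
            if referred not in seen:  # avoid contacting anyone twice
                seen.add(referred)
                queue.append(referred)
    return sample


sample = snowball_sample(["A"], referrals, max_size=5)
print(sample)  # ['A', 'B', 'C', 'D', 'E']
```

Respondents G and H can never enter the sample when the researcher starts from A, however large the sample is allowed to grow. This is one way to see the risk of bias described above.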
The following figure summarizes all the sampling techniques we have discussed:
[Figure 2: Sampling techniques]
IN-TEXT QUESTIONS
8. What type of sampling technique is being used in the following examples:
A. A manager wants to select a sample of her clients to ask for donations. She
arranges the list of clients in alphabetical order and randomly selects the first
client. She then proceeds to select every 5th client from the list.
B. A news reporter gathers consumer sentiments regarding a government policy by
interviewing people on the street.
C. A research student asks the respondents to identify other potential research
participants.
D. A researcher writes the names of different states in India on separate chits and
puts them in a bowl. She then selects chits without looking to get a sample of 7
states.
9. A sampling technique that does not involve probability is known as ___________.
10. Which of the following is not an example of Random sampling:
A) Simple random sampling B) Stratified sampling
C) Judgement sampling D) Systematic sampling
1.6 BRANCHES OF STATISTICS
A researcher may apply statistics to simply summarize and describe the characteristics of data
or employ statistics to draw some conclusions or inferences from the data. Most
research applies two types of statistical analysis:
1. Descriptive statistics
2. Inferential statistics
1.6.1 Descriptive Statistics

As the name indicates, descriptive statistics are used to ‘describe’ the data. The term refers
to the different techniques of summarizing and displaying data. This includes communicating the
patterns in the data or conveying a summary of the data through graphs, tables, numerical measures or
simple charts. Calculating measures of central tendency such as the mean and median and measures
of variability such as the range, standard deviation and variance, along with creating histograms,
bar charts, dot plots etc., constitutes descriptive statistics.
It is important to note that we do not use descriptive statistics to draw any conclusions or
generalize the results. It is used to simply state the situation as represented by the data collected.
For example, suppose a research institute gathers data from a village consisting of 100
households. The institute can use descriptive statistics to learn the average education level
of the household heads in that village. Let’s say that the education level of the head of the
household varies from illiterate to graduate and, on average, heads of households have
studied up to class 10. The institute can further study the relationship between the education
level of the head of household and the household’s savings. Let’s assume it finds that
households whose heads were at least graduates saved more. In descriptive statistics, we
can only report these findings. It cannot be concluded, merely by looking at the descriptive
statistics, that households with highly educated heads generally save more across India.
So, we see that descriptive statistics cannot be used to make general estimates or predictions.
Yet, descriptive statistics are extremely useful as they can provide a snapshot of the whole data
in meaningful ways. It helps simplify large data and present it in visually attractive ways. We
will learn about the various components of descriptive statistics in later chapters.
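As a rough illustration of the measures named above, the following sketch computes the mean, median, range, variance and standard deviation with Python's standard `statistics` module. The list `years_of_schooling` is hypothetical data invented for this example (ten household heads rather than the 100 households in the text); it is not taken from any survey.

```python
import statistics

# Hypothetical years of schooling for ten household heads (illustrative only).
years_of_schooling = [0, 5, 8, 10, 10, 10, 12, 12, 15, 18]

# Measures of central tendency.
mean_years = statistics.mean(years_of_schooling)
median_years = statistics.median(years_of_schooling)

# Measures of variability, treating the ten heads as the whole group of interest.
data_range = max(years_of_schooling) - min(years_of_schooling)
variance = statistics.pvariance(years_of_schooling)
std_dev = statistics.pstdev(years_of_schooling)

print("Mean:", mean_years)
print("Median:", median_years)
print("Range:", data_range)
print("Variance:", variance)
print("Standard deviation:", round(std_dev, 2))
```

These numbers only describe the ten values in the list. Nothing in the calculation says anything about households outside it, which is exactly the limitation of descriptive statistics discussed above.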
1.6.2 Inferential Statistics
On the other hand, inferential statistics is used to predict, estimate, and make other generalized
approximations based on the data. It is usually based on sample data to draw conclusions and
generalize the results to the larger population. Hypothesis testing and regression analysis are
two examples of inferential statistics.
Inferential statistics is very popular among researchers since it allows them to gather limited
data and extrapolate the results to a larger population. This saves them a lot of cost as well as
time.
If we consider the above example again, the research institute now gathers data from 10
randomly selected villages in India with total observations equal to roughly 1000 households.
Now the institute may generalize its results to state or national level through inferential
statistics.
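The idea of using a sample to learn about a population can be sketched with a small simulation. Everything here is an assumption made for illustration: the population is 50,000 simulated household savings figures, the sample is a simple random sample of 1,000 households (the text's example samples villages instead), and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: yearly savings (in thousand rupees) of 50,000 households.
population = [random.gauss(40, 12) for _ in range(50_000)]

# Simple random sample of 1,000 households, drawn without replacement.
sample = random.sample(population, 1_000)

population_mean = statistics.mean(population)  # the parameter, usually unknown
sample_mean = statistics.mean(sample)          # the statistic we actually observe

print("Population mean:", round(population_mean, 2))
print("Sample mean:", round(sample_mean, 2))
print("Difference:", round(abs(sample_mean - population_mean), 2))
```

In practice only the sample mean is available. Inferential techniques such as hypothesis testing quantify how far it may plausibly lie from the population value.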
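Figure 3 summarizes the branches of statistics we discussed:
[Figure 3 is a tree diagram. Statistical analysis branches into descriptive statistics and inferential statistics. Descriptive statistics branches into measures of central tendency, measures of variability and graphs. Inferential statistics branches into hypothesis testing and regression analysis. The lowest row of the diagram lists: mean, median, quartiles and percentiles, range, standard deviation and variance, and bar graph, histogram and dot plot.]
Figure 3: Branches of statistics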

IN-TEXT QUESTIONS
11. Fill in the blanks:
A. We can make predictions and estimations using __________ statistics.
B. Histogram is an example of __________ statistics.
C. Standard deviation is calculated as a part of ___________ statistics.
12. Select the correct option:
Inferential statistics can be used to
A) Estimate B) Generalize
C) Only A) D) Both A) and B)
1.7 SUMMARY
Statistics is defined as the study of collecting, organizing, analyzing, and interpreting

information in the form of data. Since the scope of statistics has broadened and is now used in
a number of practical fields, it is also referred to as applied statistics. It is both a science as well
as an art. Data are pieces of raw information, facts and figures that are used for analysis. The
two broad types of data used in statistics are quantitative and qualitative data. Quantitative
data is measurable in terms of some numbers, whereas qualitative data cannot be directly
measured in numbers. Quantitative data is further classified into discrete and continuous data
whereas qualitative data can be categorized into nominal scale, ordinal scale, interval scale and
ratio scale. Data can also be sorted into univariate, bivariate and multivariate data. Univariate
data consists of only a single variable. It is the simplest form of data. While bivariate data
involves two distinct variables, multivariate data includes more than two variables. Bivariate
and multivariate data allow a researcher to establish relationships and correlations between
variables. Population refers to all the individuals/entities possessing similar characteristics and
belonging to a particular group under study; a sample is a subset of the population. Descriptive
values computed from the population are termed ‘parameters’, and
descriptive values computed from a sample are known as ‘statistics.’ The process of selecting a
sample from the population is known as ‘sampling’. The sampling techniques are broadly
classified into Probability sampling techniques and non-probability sampling techniques.
Simple random sampling, systematic random sampling, stratified random sampling and multi-
stage sampling are examples of probability sampling techniques, whereas convenience
sampling, judgment sampling and snowball sampling are examples of non-probability
sampling techniques. The two branches of statistical analysis are descriptive and inferential
statistics. Descriptive statistics is defined as one that describes the data through graphs,
measures of central tendency and variability. On the other hand, inferential statistics utilize
data to predict, estimate or make other generalized approximations.
1.8 GLOSSARY
• Convenience Sampling: Data is collected from convenient sources

• Descriptive statistics: It is used to describe the data through graphs, measures of
central tendency and variability
• Inferential statistics: Used to predict, estimate, or make other generalized
approximations.
• Judgment Sampling: Sample is selected based on researchers’ judgement about the
characteristics of the individuals
• Multi-stage sampling: A sample is selected in stages, combining different sampling
techniques
• Non-probability sampling: Each unit of population does not have an equal chance of
getting selected in a sample
• Parameter: The descriptive statistics taken from the population
• Population: All the individuals/entities possessing similar characteristics and
belonging to a particular group under study.
• Probability Sampling: Where each unit of population has an equal chance (or
probability) of getting selected in a sample.
• Sample: Unbiased subset of the statistical population which is representative of the
entire dataset.
• Sample Size: Number of observations we select out of a population as a sample.
• Sampling: Process of selecting a sample from the population
• Simple Random Sampling: Each unit of population has an equal and independent
chance of being chosen
• Snowball Sampling: Existing respondents identify other participants for the study
• Statistic: The descriptive statistics taken from the sample
• Statistics: Study of collecting, organizing, analyzing and interpreting information in
the form of data.
• Stratified Random Sampling: Divide the population into sub-groups based on
homogeneous characteristics and then select random samples from each sub-group
• Systematic Random Sampling: Sample set is selected from the population in a fixed
interval

1.9 ANSWERS TO IN-TEXT QUESTIONS
1. A) Qualitative B) Quantitative C) Quantitative D) Quantitative E) Qualitative

2. A) False B) True C) True D) False E) True
3. A = 4; B = 2; C = 1; D = 3
4. Ratio
5. Multivariate
6. A. Statistic B. Parameter C. Parameter D. Statistic
7. I. B) Research is time sensitive
II. A) Sample
8. A. Systematic random sampling
B. Convenience sampling
C. Snowball sampling
D. Simple random sampling
9. Non-Probability sampling
10. C) Judgement sampling
11. A) Inferential B) Descriptive C) Descriptive
12. D) Both A) and B)
1.10 SELF-ASSESSMENT QUESTIONS
Q.1 Define Statistics. Is it science or art? Justify your answer.

Q.2 Give an example of a possible sample of size 4 from each of the following populations:
a. All news channels aired in India
b. All students in your university
c. All fast-food chains operating in India
d. Income of residents of India
e. Marks obtained out of 100 by first semester Statistics students
Q.3 What is the difference between a parameter and a statistic? Identify parameters and
statistics in the following hypothetical cases:
a. A dietician wants to compute the average number of sweets consumed by children
under the age of 7 within a month. From a random sample of 50 children, she
discovers a mean of 59 sweets a month. Whereas when she gathered the population
data, she came to know that the mean number of sweets consumed was actually 65.
b. On the occasion of National Milk Day, the government wants to estimate the
number of cows a dairy farmer owns in a particular state. Using the census
approach, the government finds that about 40 lakh households are engaged in dairy
farming in the state with the average number of cows owned equal to 22. When a
government official took a random sample of 100 households in a district engaged
in dairy farming, she found that the average number of cows owned was
37.
Q.4 In a college, parking on the premises has become a major problem. To deal with the
problem, the college administration wishes to compute the average parking time of the
students who park their vehicles in the college parking lot. One of the college officials
quietly follows 150 students and records the duration of time the students keep their
vehicle in the parking lot.
a. Identify the population of interest to the college administration.
b. What is the sample size that the college administration is examining?
Q.5 Briefly describe any two methods of drawing non-probability samples.
Q.6 Why are descriptive statistics used? Can we use descriptive statistics to make
generalized predictions based on the data? If not, then how can we do so?
1.11 REFERENCES
• Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences.
Cengage learning.
• Larsen, R. J., & Marx, M. L. (2012). An introduction to mathematical statistics and its
applications. Prentice Hall.
• McClave, J. T., Benson, P. G., & Sincich, T. (2018). Statistics for business and
economics. Pearson Education.
1.12 SUGGESTED READING
• Gupta, S. C. (2019). Fundamentals of statistics. New Delhi, India: Himalaya Publishing
House.

LESSON 2
PICTORIAL METHODS IN DESCRIPTIVE STATISTICS
STRUCTURE

2.1 Learning Objectives
2.2 Introduction
2.3 Stem and Leaf Plot
2.4 Dot Plots
2.5 Bar Charts
2.6 Histograms
2.7 Summary
2.8 Glossary
2.9 Answers to In-Text Questions
2.10 Self-Assessment Questions
2.11 References
2.1 LEARNING OBJECTIVES

1. To understand the different types of graphical techniques
2. To comprehend the advantages and disadvantages of various graphical techniques and
3. To gain proficiency in creating various types of graphs
2.2 INTRODUCTION
We’ve mentioned in the earlier chapters that descriptive statistics involve the computation of
the basic statistics of the data such as mean, median, standard deviation etc. These give a basic
idea about the distribution of the dataset. Visual representation of the data is also an integral
part of descriptive statistics. In this chapter we will take a closer look at the most common
graphic methods to present the data.
We will learn about the following graphical techniques:

1. Stem and Leaf plot
2. Dot plots
3. Bar charts
4. Histograms
2.3 STEM AND LEAF PLOT
A stem and leaf plot is a convenient way to visualize continuous data. The plot can easily be
constructed by hand and gives an overview of the distribution of the observations in the data at
first glance. To create a plot, the data is arranged in order and each value is
split into two parts, a stem and a leaf, which are presented in a two-column table.
The first column, referred to as the stem, holds the leading digits (the tens,
hundreds or thousands, depending on the values of the data) and the second column, known as
the leaf, contains the rest of the digits. The concept will become clear with the following
example.
Say a researcher has collected data of the weight of 10 college students, chosen at random. The
weights, in kgs, are as follows:
71, 43, 66, 52, 59, 83, 92, 67, 74, 61

The first step in creating a stem and leaf plot would be to arrange the data in ascending order:
43, 52, 59, 61, 66, 67, 71, 74, 83, 92

Now, we create a table with two columns- Stem and Leaf, with the Stem column consisting of
the tens digits and the leaf column containing the unit’s digit. The table will look like this:
Stem Leaf
4 3
5 29
6 167
7 14
8 3
9 2

Here, the first row with a stem of 4 and a leaf of 3 denotes a weight of 43 kg and the last row
with a stem of 9 and a leaf of 2 denotes a weight of 92 kg. Simply looking at the table, we can
work out that, on average, the college students weigh in the sixties. We can observe that the
shape of the display gradually rises, peaks at stem 6 and then steadily declines. We call this a
bell-shaped curve, which is symmetric in shape. We will learn about other features of a symmetrical
distribution later in the unit.
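The construction described above can be sketched in a few lines of Python. The function name `stem_and_leaf` is a choice made for this illustration, and the sketch assumes whole numbers whose stem is the tens digit and whose leaf is the units digit, as in the weights example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for value in sorted(values):          # step 1: arrange in ascending order
        stem, leaf = divmod(value, 10)    # step 2: split into tens and units
        plot[stem].append(leaf)
    return dict(plot)


weights = [71, 43, 66, 52, 59, 83, 92, 67, 74, 61]
plot = stem_and_leaf(weights)

print("Stem Leaf")
for stem in sorted(plot):
    print(stem, "".join(str(leaf) for leaf in plot[stem]))
```

For the ten weights, the weight of 83 kg appears as stem 8 with leaf 3, and stem 6 carries the most leaves (1, 6 and 7).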
Let us consider another example in which we have a data set consisting of the number of hours
of daily sleep that 100 college students get. The stem and leaf display looks like this:
Stem Leaf
5 4455567889
6 00223446667899
7 000112233566789999
8 00000011222234445577778889
9 0001333355777779
10 01344446899
11 12247
Note that here, the first row with a stem of 5 and a leaf of 4 denotes 5.4 hours of sleep and the last
row with a stem of 11 and a leaf of 7 denotes 11.7 hours of sleep. We can clearly observe that the
most common amount of sleep is about 8 hours. If we ask you how many students sleep for 10 hours
or more a day, you simply have to add up the number of leaves written in front of stems 10
and 11. So the answer will be 16 students.
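The leaf counting can be checked with a short sketch. The dictionary below copies the display row by row; the variable names are chosen for this illustration. Working in tenths of an hour (stem × 10 + leaf) avoids decimal rounding when comparing against 10 hours.

```python
# Stem -> string of leaves, copied from the sleep display above.
sleep_plot = {
    5: "4455567889",
    6: "00223446667899",
    7: "000112233566789999",
    8: "00000011222234445577778889",
    9: "0001333355777779",
    10: "01344446899",
    11: "12247",
}

# Every leaf is one student, so the total is the number of leaves.
total_students = sum(len(leaves) for leaves in sleep_plot.values())

# Sleep in tenths of an hour, e.g. stem 10 with leaf 3 -> 103 (10.3 hours).
tenths = [stem * 10 + int(leaf) for stem, leaves in sleep_plot.items() for leaf in leaves]

ten_or_more = sum(1 for t in tenths if t >= 100)
strictly_more_than_ten = sum(1 for t in tenths if t > 100)

print("Students in the display:", total_students)
print("Sleeping 10 hours or more:", ten_or_more)
print("Sleeping strictly more than 10 hours:", strictly_more_than_ten)
```

Stems 10 and 11 together hold 16 leaves. One of them is the value 10.0, so 15 students sleep strictly more than 10 hours.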
A stem and leaf plot is helpful for getting a basic understanding of the dataset; however, it gets
difficult to create as the number of observations increases.
IN-TEXT QUESTIONS
1. The correct list of data for the following stem and leaf plot is:
Stem Leaf
0 3
3 27
5 119
7 0
A. 03, 32, 37, 11, 51, 59,70 B. 03, 27, 37, 51, 51, 59,70
C. 03, 32, 37, 51, 51, 59,70 D. 03, 32, 37, 51, 59,70
2. The following stem-and-leaf plot depicts the number of cakes that a home-baker sells
each week. If 1|7 represents 17 cakes, then,
Stem Leaf
0 689
1 024479
2 01136889
3 122
A. For how many weeks did the home-baker sell cakes?
B. In how many weeks did the home-baker sell more than 25 cakes?
2.4 DOT PLOT
A dot plot is another simple way to visualize data, with dots representing each
unit of observation. The dots are stacked over one another so that the height of each stack represents the frequency of the
value in our dataset. For example, suppose a researcher would like to know the number of
vaccinated children in a city. To make the analysis simpler, she divides the city into 5 localities
and collects the number of vaccinated children for each locality. The data collected is tabulated
in the following manner:
Locality No. of vaccinated children
1 6
2 1
3 3
4 11
5 8

The dot plot for the data would look like figure 1 where the height of each column denotes the
frequency of the observation.
Figure 1: Dot plot of vaccinated children in a city
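A dot plot of the locality table can be approximated in plain text. This is only a sketch: the function name `dot_plot_rows` and the use of `*` as the dot symbol are choices made for this illustration, and a drawn figure would normally be produced by hand or with charting software.

```python
# Counts from the table above: locality -> number of vaccinated children.
vaccinated = {1: 6, 2: 1, 3: 3, 4: 11, 5: 8}


def dot_plot_rows(counts):
    """Return the lines of a text dot plot, tallest level first."""
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        # A column gets a dot at this level only if its count reaches the level.
        rows.append(" ".join("*" if n >= level else " " for n in counts.values()))
    rows.append(" ".join(str(key) for key in counts))  # axis labels (localities)
    return rows


rows = dot_plot_rows(vaccinated)
print("\n".join(rows))
```

Each column of stars has as many dots as the count for that locality, so locality 4 forms the tallest stack and locality 2 the shortest.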

Dot plots can be created for continuous data as well. Consider the male literacy rate in 10 Indian
states:
States Male literacy Rate (%)

Andhra Pradesh 73.4
Assam 90.1
Bihar 79.7
Chhattisgarh 85.4
Goa 92.8
Gujarat 89.5
Haryana 88
Kerala 97.4
Meghalaya 77.1
Uttar Pradesh 81.8
Since the data is continuous and every value is unique, plotting the raw values would give just one dot for each state. Instead, to
make the dot plot more informative, we group the data into class intervals.
Male literacy Rate (%) No. of states
70-75 1
75-80 2
80-85 1
85-90 3
90-95 2
95-100 1
Now we can easily create the dot plot using the above table. Try making the dot plot on your
own using the above table. Your dot plot should look something like figure 2:
Figure 2: Dot plot of male literacy level of 10 Indian states
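The grouping into class intervals can be reproduced with a short sketch. The helper `class_counts` is a name invented for this illustration, and it assumes that a value equal to a class boundary is counted in the interval that starts at that value.

```python
# Male literacy rates (%) from the table above.
literacy = {
    "Andhra Pradesh": 73.4, "Assam": 90.1, "Bihar": 79.7, "Chhattisgarh": 85.4,
    "Goa": 92.8, "Gujarat": 89.5, "Haryana": 88, "Kerala": 97.4,
    "Meghalaya": 77.1, "Uttar Pradesh": 81.8,
}


def class_counts(values, start, stop, width):
    """Count how many values fall in each interval [lower, lower + width)."""
    values = list(values)
    counts = {}
    for lower in range(start, stop, width):
        upper = lower + width
        counts[f"{lower}-{upper}"] = sum(1 for v in values if lower <= v < upper)
    return counts


states_per_class = class_counts(literacy.values(), 70, 100, 5)
for interval, count in states_per_class.items():
    print(interval, "*" * count)
```

The printed stars give a sideways version of the dot plot, with one star per state in each class interval.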

Dot plots are useful for highlighting clusters of observations when we have continuous data.
However, dot plots, too, become difficult and inconvenient to create as the size of the data increases.

IN-TEXT QUESTIONS
3. The following dot plot illustrates the number of hours a student spends talking on phone
per week:
A. How many students report that they did not talk on the phone at all during the week?
B. How many students spend at least 2 hours on the phone?
4. True or False: We cannot create dot plots for continuous data.
2.5 BAR CHARTS
One of the most popularly used graph types is a bar chart that uses horizontal or vertical bars
to depict the observations in the dataset. Bar charts can further be of two types: Stacked bar
chart or grouped bar chart. Let’s understand all the kinds of bar graphs using an example.
Suppose we have the following data on two-wheeler sales in a city over the years:
Year Two-wheeler Sales
2010 12,500
2015 17,000
2020 24,000
To create a vertical bar graph, we plot the years on the X-axis and the sale numbers on Y-axis.
The height of the vertical rectangular bar represents the value of the data, as presented below:
Figure 3: Bar chart of sale of two-wheelers in 2010, 2015 and 2020

Here each bar represents the number of two-wheelers sold in a given year.
Similarly, we can create a horizontal bar graph as well. Before we move on to stacked and
grouped bar charts, it is important to note two points about bar charts:
a. The numerical axis must start from zero
b. The width of the bars and the distance between each bar remains constant.

Stacked bar charts are designed in a way that two or more categories of the same data are
presented on the same bar. Stacked bar charts allow the reader to easily compare the value of
various categories simultaneously. Let us take the above example one step further. Suppose
that we have the following data for two-wheeler, three-wheeler and four-wheeler sales in a city:
Year Two-wheeler Sales Three-wheeler Sales Four-wheeler Sales
2010 12,500 8,900 26,000
2015 17,000 11,000 33,500
2020 24,000 6,500 35,000
To create stacked bar charts, we draw the bars for each category on top of each other for a
particular year. The final chart will look something like this:
Figure 4: Stacked bar chart of sale of two-, three- and four-wheelers in 2010, 2015 and 2020
Here we can compare the sales of each category of vehicle for each year. We can also create
100% stacked bar charts to present the same data in a more visually appealing way. A 100%
stacked bar chart represents the share of each category in the data out of 100. The height of all
the bars is equal to hundred percent and we can observe the relative changes in the values from
the size of the sub-bars:
Figure 5: 100% Stacked bar chart of sale of two-, three- and four-wheelers in 2010, 2015 and 2020
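The percentage shares behind a 100% stacked bar chart can be worked out as follows. This is a minimal sketch using the sales table above; the function name `percentage_shares` is an assumption made for the example.

```python
# Vehicle sales from the table above: year -> category -> units sold.
sales = {
    2010: {"Two-wheeler": 12_500, "Three-wheeler": 8_900, "Four-wheeler": 26_000},
    2015: {"Two-wheeler": 17_000, "Three-wheeler": 11_000, "Four-wheeler": 33_500},
    2020: {"Two-wheeler": 24_000, "Three-wheeler": 6_500, "Four-wheeler": 35_000},
}


def percentage_shares(by_category):
    """Convert absolute values into shares that add up to 100."""
    total = sum(by_category.values())
    return {name: 100 * value / total for name, value in by_category.items()}


shares = {year: percentage_shares(categories) for year, categories in sales.items()}

for year, by_category in shares.items():
    parts = ", ".join(f"{name}: {share:.1f}%" for name, share in by_category.items())
    print(year, "->", parts)
```

Each sub-bar in the 100% chart has a height equal to its share. For instance, two-wheelers make up about 26.4% of 2010 sales, while the three-wheeler share falls from about 18.8% in 2010 to about 9.9% in 2020.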
Grouped bar charts are a convenient way to compare different categories of data. In a grouped
bar chart, the bars for each category are placed adjacent to each other instead of on top of each
other. In this case, we can easily observe the absolute changes in the data of each category.
When interpreting stacked and grouped bar charts, it is crucial to pay extra attention to the
legend that is usually displayed at the bottom of a chart.
It is important to remember that it can become complex to present too many categories of data
in a stacked or grouped bar chart.
Figure 6: Grouped bar chart of sale of two-, three- and four-wheelers in 2010, 2015 and 2020

IN-TEXT QUESTIONS
5. The following bar graph displays the favorite color of 200 kindergarten students in a
school:
A. Which is the most preferred and least preferred color among the students?
B. How many students like orange?
6. The following grouped bar chart shows the daily number of customers visiting a
shopping center in morning (M) and evening(E).
A. On which day/days does the shopping center receive an equal number of customers
in the morning as well as evening?
B. On which day/days do we see fewer customers shopping in the evening than in the
morning?
7. A travel agent organizes trips to destinations either in South India or North India. The
following 100% stacked bar chart presents the share of tourists visiting both the regions
between 2010 and 2014.
A. Looking at the bar graph can you say that over the years the popularity of North Indian
destinations is increasing? Why or why not?
B. In which year did maximum tourists visit South Indian destinations as compared to
North Indian destinations?
2.6 HISTOGRAMS
At first, a histogram may look similar to a bar chart, but the two are significantly different from each
other. A histogram is also represented in the form of bars placed adjacent to each other, but
each bar here represents the frequency with which an observation occurs in the dataset. The
term frequency of any particular value is simply the number of times that value occurs in the
data set. Hence, a histogram is also said to represent the ‘frequency distribution of variables.’
On the horizontal axis, we usually take the range/class intervals and on the vertical axis we
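place the frequency. We construct class intervals in such a way that each observation is
contained in exactly one interval. In the tables of this lesson, a value that falls on a class boundary (such as 70) is counted in the interval that begins at that value (70-80).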
For instance, following are the economics marks of 20 college students:
86, 57, 69, 64, 67, 59, 81, 34, 47, 46, 38, 51, 66, 91, 42, 73, 62, 70, 77, 55
To construct a histogram, we divide the dataset into class intervals with each interval
representing 10 marks. Next, we’ll insert the frequency with which an observation occurs
within each interval:

Marks Frequency
30-40 2
40-50 3
50-60 4
60-70 5
70-80 3
80-90 2
90-100 1
The histogram for the above data looks like:
Figure 7: Histogram of economics marks of 20 college students
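The frequency table can be checked with a short sketch that uses `collections.Counter` from the Python standard library. Integer division by 10 is a shortcut that works here because every class is 10 marks wide and starts at a multiple of 10; other class widths would need a different rule.

```python
from collections import Counter

economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]

# (mark // 10) * 10 is the lower limit of the 10-mark class containing the mark,
# so a boundary value such as 70 is counted in the class 70-80.
frequencies = Counter((mark // 10) * 10 for mark in economics_marks)

# Print a sideways text histogram: one '#' per student in the class.
for lower in sorted(frequencies):
    count = frequencies[lower]
    print(f"{lower}-{lower + 10}: {'#' * count} ({count})")
```

The printed rows rise to a peak at 60-70 and fall away on both sides, matching the table.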

One clear difference between a bar chart and a histogram is that in a histogram, the bars do not
have space in between them.
If there are areas of the measuring scale with a large concentration of data values and other
areas with relatively sparse data, equal-width class intervals may not be the best option. The
following dot plot displays data with a strong concentration of values in the center and only a few
values on either side.
When there are only a few class intervals with equal widths, there are chances that all
observations fall into just one or two of the classes. There may be some classes that have zero
frequency if equal widths of class intervals are used. Using a few bigger intervals close to
extreme observations and narrower intervals in the area of high concentration is a wise
decision. To construct a histogram with unequal class widths, first determine the frequencies.
Then relative frequencies may be calculated by the following formula:
\[ \text{Relative frequency} = \frac{\text{Number of times the value occurs}}{\text{Number of observations in the data set}} \]
This signifies the proportion of times the value occurs in the data.
The height of each bar can then be computed using the formula given by:
\[ \text{Bar height} = \frac{\text{Relative frequency of class}}{\text{Class width}} \]
The resulting rectangle or bar heights are typically called densities.
Such a histogram has an interesting property. If we multiply the bar height by the class width,
we will get,
\[ \text{Relative frequency of class} = \text{Bar height} \times \text{Class width} \]

Since the class width is nothing but the bar width, we can rewrite the above equation as:
\[ \text{Relative frequency of class} = \text{Bar height} \times \text{Bar width} = \text{Area of rectangle} \]
This means that the area of each rectangle or bar represents the relative frequency of the
corresponding class interval. Moreover, the sum of relative frequencies should be one and
hence the total area of all rectangles in such a density histogram is one.
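The area property can be verified numerically. The sketch below reuses the 20 economics marks from earlier in this section, but the unequal class limits (30-50, 50-60, 60-70, 70-100) are an assumption chosen purely for illustration; they do not appear in the text.

```python
economics_marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
                   38, 51, 66, 91, 42, 73, 62, 70, 77, 55]

# Assumed unequal classes: wide at the sparse ends, narrow in the dense middle.
class_limits = [(30, 50), (50, 60), (60, 70), (70, 100)]

n = len(economics_marks)
rows = []
for lower, upper in class_limits:
    frequency = sum(1 for m in economics_marks if lower <= m < upper)
    relative_frequency = frequency / n
    width = upper - lower
    density = relative_frequency / width  # bar height
    rows.append((lower, upper, frequency, relative_frequency, density))

# Area of each bar = height x width = relative frequency of the class.
total_area = sum(density * (upper - lower) for lower, upper, _, _, density in rows)

for lower, upper, frequency, relative_frequency, density in rows:
    print(f"{lower}-{upper}: frequency={frequency}, "
          f"relative frequency={relative_frequency:.2f}, bar height={density:.4f}")
print("Total area:", round(total_area, 10))
```

Because bar height is relative frequency divided by width, the wide 70-100 class gets a low bar even though it holds six observations, and the bar areas add up to one.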
SHAPES OF HISTOGRAMS
The histogram in the first example follows a fairly symmetric, bell-shaped distribution. This
means that if we place a mirror in the exact center of the distribution, the left and right sides of
the distribution will have roughly the same shape. A symmetric, bell-shaped distribution of this kind is
commonly associated with the ‘normal distribution.’

However, this is not the case always. Asymmetric distributions are called skewed. There are
two types of skewed distributions – right skewed distribution and left skewed distribution.
When the dataset has a greater number of observations on the left side of mean, then it is called
a right or positively skewed distribution. Conversely, when the dataset has a greater number of
observations on the right side of mean, then it is called a left or negatively skewed distribution.
Now consider the mathematics marks of the same 20 students:
45, 58, 51, 54, 49, 59, 88, 44, 49, 41, 48, 61, 66, 93, 40, 64, 69, 77, 72, 51
Follow the same steps to create intervals:
Marks Frequency
40-50 7
50-60 5
60-70 4
70-80 2
80-90 1
90-100 1
The histogram looks like this:
Figure 8: Histogram of mathematics marks of 20 college students

It is evident from the above histogram that it does not follow a symmetrical distribution. The
average mathematics score of a student in this class is about 59. Since our data is concentrated on the
left side of the mean, we call such a distribution a right skewed distribution.
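The claim about the mean can be checked directly from the 20 mathematics marks. The sketch below counts the observations on each side of the mean, following the definition of skewness used in this lesson; comparing the mean with the median is an additional, commonly used check that the text itself does not rely on.

```python
import statistics

maths_marks = [45, 58, 51, 54, 49, 59, 88, 44, 49, 41,
               48, 61, 66, 93, 40, 64, 69, 77, 72, 51]

mean_mark = statistics.mean(maths_marks)
median_mark = statistics.median(maths_marks)

# Number of observations on the left and right side of the mean.
below_mean = sum(1 for m in maths_marks if m < mean_mark)
above_mean = sum(1 for m in maths_marks if m > mean_mark)

print("Mean:", mean_mark)
print("Median:", median_mark)
print("Observations below the mean:", below_mean)
print("Observations above the mean:", above_mean)
```

The mean works out to 58.95, with 11 marks below it and 9 above it. The few high marks (88 and 93) pull the mean above the median of 56, which is the usual signature of a right skewed distribution.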
Lastly, consider English marks of the 20 students we have been examining:
75, 62, 93, 84, 96, 53, 81, 92, 87, 86, 91, 95, 82, 79, 64, 97, 62, 54, 71, 40
Creating equal intervals of 10 marks each:
Marks Frequency
40-50 1
50-60 2
60-70 3
70-80 3
80-90 5
90-100 6
Using the table, we can create the following histogram:
Figure 9: Histogram of English marks of 20 college students

The average English mark in this class is about 77, and you may observe that we have more
observations on the right side of the mean. This depicts a left-skewed distribution. We will
discuss skewness in detail in the next lesson.
Histograms can take some more unusual shapes. Up till now we’ve seen histograms with only
one peak, in the center, on the right hand or on the left-hand side of the data. Such a histogram
with a single peak is known as a unimodal histogram. However, a histogram can have more
than one peak. A histogram with two peaks is referred to as a bimodal histogram. Such a
histogram can arise when the data set consists of observations of two very dissimilar kinds of
individuals or objects. Let us understand this with an example. A restaurant experiences high
footfall during lunchtime and dinner time. Hence if a researcher collects data on the number of
customers entering a restaurant during a day, the histogram will display two peaks. A bimodal
histogram with hypothetical numbers would look something like this:
Figure 10: Bimodal histogram of the number of customers entering a restaurant in a day
Similarly, multimodal histograms have more than two peaks. Such a graph suggests that the
data may have different patterns of response. Suppose a researcher has some data on the heights
of plants belonging to 3 different kinds of species. One of the species has tall plants, the other
has medium plants and the third species has shorter plants. She creates a histogram to find out
the pattern in her data. The following figure illustrates a histogram she created:
Figure 11: Multimodal histogram of height of plant species

This is clearly a multimodal histogram where each peak represents the most common height of
each species of plants.
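One crude way to tell unimodal, bimodal and multimodal histograms apart is to count the bars that are taller than both of their neighbours. The sketch below does this; the function name `count_peaks` and the restaurant and plant frequency lists are hypothetical, made up for this illustration. With real data, small random bumps would also register as peaks, so a count like this is only a rough guide.

```python
def count_peaks(frequencies):
    """Count bars strictly taller than their neighbours on both sides."""
    peaks = 0
    last = len(frequencies) - 1
    for i, height in enumerate(frequencies):
        # An end bar has only one neighbour; treat the missing side as empty.
        left = frequencies[i - 1] if i > 0 else -1
        right = frequencies[i + 1] if i < last else -1
        if height > left and height > right:
            peaks += 1
    return peaks


economics = [2, 3, 4, 5, 3, 2, 1]               # frequency table of economics marks
restaurant = [2, 5, 12, 6, 3, 4, 9, 14, 7, 2]   # hypothetical hourly customer counts
plants = [3, 9, 4, 2, 8, 3, 1, 7, 2]            # hypothetical plant-height classes

labels = {1: "unimodal", 2: "bimodal"}
for name, data in [("economics", economics), ("restaurant", restaurant), ("plants", plants)]:
    peaks = count_peaks(data)
    print(name, peaks, labels.get(peaks, "multimodal"))
```

The economics marks give a single peak, the hypothetical restaurant counts give two (lunch and dinner), and the hypothetical plant heights give three, one per species.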
Although histograms are a useful way to present the data, one major drawback of histograms
is that the reader cannot identify the individual values of the data by simply looking at the
graph.
IN-TEXT QUESTIONS
8. Based on the following histogram, answer the questions:
A. What is the frequency of the class interval 20-25?

B. Which class interval has the least frequency?
C. What is the cumulative frequency of the class interval 25-30?
9. The following histogram is an example of:
A. Symmetrical histogram B. Left skewed histogram

C. Right skewed histogram D. None of the above

10. The following histogram depicts a __________ histogram (unimodal / bimodal /
multimodal)
11. Choose the correct option from the bracket and fill in the blank:
When a dataset has greater number of observations on the left side of mean, then it is
called a _______ (right/ left) skewed distribution.
2.7 SUMMARY
In this lesson some of the most popular ways to display data, as a part of descriptive statistics,
were discussed. In a stem and leaf plot, data is divided into two columns – stem and leaf. The
plot gives a visual understanding of the distribution of the observations. Dot plots
use dots to represent each unit of observation. The dots are stacked over
one another so that the height of each stack represents the frequency of the value in our dataset. Bar charts use horizontal
or vertical bars to depict the observations in the dataset. Bar charts can further be of two types:
stacked bar chart and grouped bar chart. Histograms are among the most widely used graphs in
statistics. Adjacent vertical bars are used to depict the frequency of observations in each class interval. When the dataset has
a greater number of observations on the left side of mean, it is called a right or positively
skewed distribution whereas when the dataset has a greater number of observations on the right
side of mean, then it is called a left or negatively skewed distribution. A histogram with a single
peak is known as a unimodal histogram. A histogram with two peaks is referred to as a bimodal
histogram. Histograms with more than two peaks are referred to as multimodal histograms.
2.8 GLOSSARY
• Bar Charts : A graph that displays data through vertical or horizontal rectangular bars
with heights of each bar representing the values of the observation
• Dot Plots : A plot illustrating the frequency of observations through dots on a simple scale
• Histograms : A graph that shows the frequency of data using rectangular bars with heights
of each bar representing the frequency of observations lying in that range
• Negative Skewness : When dataset has greater number of observations on right side of
mean
• Positive Skewness : When dataset has greater number of observations on left side of mean
• Stem and Leaf Plot : A plot where each observation in data is presented in two columns-
Stem (the first digit) and leaf (the other digits).
2.9 ANSWERS TO IN-TEXT QUESTIONS
1. C. 03, 32, 37, 51, 51, 59,70

2. A. 20 B. 7
3. A. 10 B. 8
4. False
5. A. Most preferred color- Blue; Least preferred color- Green
B. 40
6. A. Tuesday B. Monday and Thursday
7. A. No, since the share of trips to North Indian destinations is declining over the
years.
B. 2014
8. A. 30 B. 35-40 C. 90
9. B. Left skewed histogram
10. Bimodal
11. Right
2.10 SELF-ASSESSMENT QUESTIONS
Q.1 Following are the weights of individuals who have taken membership of two local gyms
in a city:
Gym 1: 94 90 95 93 128 95 125 91 104 116 162 102 90 110 92 113 116 90 97 103 95 120 109
91 138
Gym 2: 123 116 90 158 122 119 125 90 96 94 137 102 105 106 95 125 122 103 96 111 81 113
128 93 92

Create stem and leaf plot for both the gyms and interpret the plots.
Q.2 The following table summarizes the data collected by a researcher on the time taken to
eat breakfast by 40 respondents:
Minutes: 0 1 2 3 4 5 6 7 8 9 10 11 12
People: 6 2 3 5 2 5 0 0 2 3 7 4 1
A. Create a dot plot taking minutes on the X-axis.

B. Is the shape of the plot symmetrical?
C. How many people skip their breakfast in the morning?
Q.3 Honey has recently passed class 12th and the following table presents her marksheet:
Business
Subject English Hindi Accountancy Mathematics Economics
studies
Marks 87 93 96 99 97 92
A. Create a bar graph to depict her marks for each subject.

B. How can you show a comparison of her marks with her classmate Hazel who secured
the following marks? Use a suitable bar graph to depict the marks of both the students
on the same graph.
Business
Subject English Hindi Accountancy Mathematics Economics
studies
Marks 82 97 88 71 79 95
In which subject/subjects was the difference in the marks between Honey and Hazel highest?
Q.4 Mr. Kapoor owns a garden with 30 cherry trees. The height of the trees in inches are:
61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 67.5, 73.5, 74, 74.5, 76, 76.2, 76.5, 77, 77.5,
78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87
A. Create a histogram of the above data by creating intervals of 5 inches. Comment on the
shape of the graph.
B. Recently Mrs. Kapoor bought 10 more cherry plants to propagate them in their garden.
Their heights in inches are: 57.5, 66, 40.5, 59, 46, 69.5, 67, 51.5, 52, 62. Incorporate
this additional information and create a new histogram for all 40 cherry trees.
Comment and compare the shapes of both the histograms.
Q.5. Draw a histogram for the following dataset:
Class Intervals 50-60 60-70 70-80 80-90 90-100 100-110
Frequency 35 25 45 15 20 40
Comment on the shape of the histogram.

Q.6 The number of contaminating particles on a silicon wafer prior to a certain rinsing
process was determined for each wafer in a sample of size 100, resulting in the
following frequencies:
Number of particles 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Frequency 1 2 3 12 11 15 18 10 12 4 5 3 1 2 1
a. What proportion of the sampled wafers had at least one particle? At least five
particles?
b. What proportion of the sampled wafers had between five and ten particles,
inclusive? Strictly between five and ten particles?
c. Draw a histogram using relative frequency on the vertical axis. How would you
describe the shape of the histogram?
2.11 REFERENCES
Cengage learning.
• Gupta, S. C. (2019). Fundamentals of statistics. New Delhi, India: Himalaya Publishing House.

LESSON 3
MEASURES OF LOCATION AND VARIABILITY
STRUCTURE

3.2 Introduction
3.3 Measures of Central Tendency
3.3.1 Mean
3.3.2 Median
3.3.3 Mode
3.3.4 Relationship between Mean, Median and Mode
3.3.5 Other Measures of Central Tendency- Quartiles, Percentiles, Deciles and
Trimmed Mean
3.4 Measures of variability
3.4.1 Range and Inter-Quartile Range
3.4.2 Standard Deviation and Variance
3.5 Effect of Change in Origin and Scale
3.5.1 Change in Origin
3.5.2 Change in Scale
3.6 Skewness
3.7 Boxplots
3.8 Summary
3.9 Glossary
3.10 Answers to In-Text questions
3.12 References

1. To recognize the different types of measures of location and variability
2. To establish the relationship between various measures of location and variability

3. To compute measures of location and variability
4. To understand the major advantages and disadvantages of measures of location and
variability
5. To demonstrate the effects of change in origin and scale on the measures of location
and variability
6. To identify the various shapes of distributions
7. To develop the ability to present mean and variance through boxplots
3.2 INTRODUCTION
Continuing with our discussion about descriptive statistics, we now move on to the core
elements of the descriptive statistics, also known as summary statistics. Recall that under
descriptive statistics we quantitatively present an overview of the data. Descriptive statistics
can be divided into two sub-groups:
a. Measures of central tendency – Measures of central tendency comprise measurements
that give us a typical or central value around which the data are generally clustered.
These measures are also referred to as averages. In this course, we will concentrate on
the three major measures of central tendency: mean, median, and mode. We will also
briefly study quartiles, percentiles, deciles and trimmed mean.
b. Measures of variability – Measures of variability represent the spread of the data around
the averages. They denote the variation in the data. The most widely used measures of
variability are range, standard deviation, and variance.
Let us now look at the two in detail.
3.3 MEASURES OF CENTRAL TENDENCY
3.3.1 Mean
The most common measure of central tendency is arithmetic mean or simply, mean of the
data set. You must have studied about mean at some point in your mathematics course. Mean
is simply the average value of the dataset, which can be calculated by adding the values of
all the observations and dividing the sum by the total number of observations, i.e.,
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $\mu$ denotes the mean of $N$ observations and $\sum_{i=1}^{N} x_i = x_1 + x_2 + x_3 + \cdots + x_N$.
Recall from previous lesson, population mean (parameter) is denoted by 𝜇 and sample mean
(statistic) is denoted by 𝑥̅ .
Let us calculate the mean of following 5 numbers to understand the formula better:
12, 31, 78, 45, 29

Here,
$$\mu = \frac{12 + 31 + 78 + 45 + 29}{5} = \frac{195}{5} = 39$$
Hence, we can say that the mean of the dataset is 39.
Let us take another example. A researcher has a sample of the ages of 10 people.
45, 39, 53, 45, 43, 48, 50, 40, 40, 45

The researcher would like to know the average age of the group.
Again,
$$\bar{x} = \frac{45 + 39 + 53 + 45 + 43 + 48 + 50 + 40 + 40 + 45}{10} = \frac{448}{10} = 44.8$$
So, we can conclude that the average age in the data set is 44.8 years.
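The two calculations above can be reproduced in a few lines of Python. This is a minimal sketch; the helper name `arithmetic_mean` is chosen here for illustration and is not part of the course material.

```python
def arithmetic_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("mean is undefined for an empty dataset")
    return sum(values) / len(values)


numbers = [12, 31, 78, 45, 29]
ages = [45, 39, 53, 45, 43, 48, 50, 40, 40, 45]

mean_numbers = arithmetic_mean(numbers)  # 195 / 5
mean_ages = arithmetic_mean(ages)        # 448 / 10

print(mean_numbers)  # 39.0
print(mean_ages)     # 44.8
```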
Note here that we denote the mean as 𝑥̅. This is the sample mean, which is calculated from the
sample data. The mean computed from the population data is denoted by µ, as mentioned
earlier.
Mean is a useful measure of central tendency which allows the reader to get an approximate idea
about the typical value in a dataset. The relevance of the mean is clearer when we have very
large datasets. Say a researcher gathers the prices of about 1000 houses in a locality. The prices
range from Rs. 45 lakh to Rs. 1.9 crore. If the researcher calculates the average price of the
houses as Rs. 1.2 crore, then we can easily interpret that a
typical house in that locality is worth Rs. 1.2 crore. So, mean is a convenient way to locate the
average value of the dataset.
Before we move on to median, we should note some pros and cons of using mean. Arithmetic
mean is the simplest measure of central tendency and is rigidly defined. The mean is not
affected by the order of the data, i.e., the data may be in ascending or in descending order. Each
and every observation in the dataset is used to calculate mean. This ensures that there is no loss
of information. Finally, Arithmetic mean is capable of further mathematical treatment. If we
have separate means of two groups of data, we can easily get the combined mean. However,
mean also suffers from some limitations. First, mean is affected by extreme observations. A few
extreme observations can shift the mean so much that it no longer represents the dataset
accurately. Second, mean cannot be calculated if even one observation is missing. Third, we
cannot determine the mean by merely glancing at the dataset; it needs to be calculated each
time using the formula. Finally, mean cannot be calculated for open-ended class intervals.
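The remark about combining group means can be made concrete. If one group has n1 observations with mean x̄1 and another has n2 observations with mean x̄2, the combined mean is (n1·x̄1 + n2·x̄2) / (n1 + n2). The sketch below checks this on the ten ages used earlier; the split into two groups and the function name `combined_mean` are arbitrary choices made for illustration.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Weighted average of two group means, weighted by group size."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)


# The ten ages from the earlier example, split arbitrarily into two groups.
group_1 = [45, 39, 53, 45]
group_2 = [43, 48, 50, 40, 40, 45]

mean_1 = sum(group_1) / len(group_1)
mean_2 = sum(group_2) / len(group_2)

overall = combined_mean(len(group_1), mean_1, len(group_2), mean_2)
print(round(overall, 1))  # 44.8, the mean of all ten ages
```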
Sometimes when we’re collecting data, we may come across extremely large or extremely
small values in our data that may either be incorrectly stated by the respondent or incorrectly
recorded by the researcher. In such cases, we see that mean gets impacted by extreme values
in our dataset. This is one of the major limitations of using mean as a central value.
The following example will make the argument clear: suppose that the salary of 10 employees
in a firm, in lakhs per annum, is given below:
3.2 , 4.5, 3.5 , 3 , 4.2, 16.5, 3.7, 3, 15, 4.4

Calculating mean using the formula:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
We get,
$$\mu = \frac{3.2 + 4.5 + 3.5 + 3 + 4.2 + 16.5 + 3.7 + 3 + 15 + 4.4}{10} = \frac{61}{10} = 6.1$$
We conclude that the average salary in the firm is Rs 6.1 lakhs per annum. However, if we take
a closer look at the observations, we realize that eight out of the 10 salaries lie between 3 and
4.5 lakhs and the two extreme values influenced the mean. So, we see here that mean is not
representative of the above data.
When the data contain extreme values or outliers, we prefer to use the median over the mean.
We will see why and how in the next section.
IN-TEXT QUESTIONS
1. The mean of the first 10 whole numbers is ___.

2. The mean of the following numbers is 24.
16, 18, 19, 21, P, 23, 23, 27, 29, 29
The value of ‘P’ is:
A. 20.5 B. 30
C. 24 D. 35
3. The mean of 6, 8, x + 2, 10, 2x − 1 and 2 is 9. The value of x and the values of the
two unknown observations in the data are:
A. x = 11; Observations = 13 and 21 B. x = 9; Observations = 11 and 17
C. x = 10; Observations = 12 and 19 D. x = 12; Observations = 14 and 23
4. The average marks of 39 students of a class are 50. The marks obtained by the 40th
student are 39 more than the average marks of all 40 students. Find the mean marks of
all 40 students.
3.3.2 Median
The word Median is synonymous with ‘middle’. Median is the middle observation in the data
when the data is arranged in ascending or descending order of magnitude. Median is less
affected by extreme values. The population median (parameter) is denoted by 𝜇̃ while sample
median (statistic) is symbolized as 𝑥̃. There are two formulas to calculate median if we have N
observations:
i. When there is an odd number of observations, the median is simply the middle
value, i.e., when N is odd,
$$\tilde{\mu} = \left(\frac{N+1}{2}\right)^{th} \text{ ordered value}$$
ii. When there is an even number of observations, the median is the average of
the two middle values, i.e., when N is even,
$$\tilde{\mu} = \text{average of the } \left(\frac{N}{2}\right)^{th} \text{ and } \left(\frac{N}{2}+1\right)^{th} \text{ ordered values}$$
Let us consider the same example of salaries of 10 employees in a firm, in lakhs per annum:
3.2 , 4.5, 3.5 , 3 , 4.2, 16.5, 3.7, 3, 15, 4.4

To calculate the median, the first step is to arrange the data in ascending order:
3, 3, 3.2, 3.5, 3.7, 4.2, 4.4, 4.5, 15, 16.5

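Since we have an even number (10) of observations, we will use the second formula, i.e., the
average of the $\left(\frac{N}{2}\right)^{th}$ and $\left(\frac{N}{2}+1\right)^{th}$ ordered values. As we know, N = 10. So,
$$\tilde{\mu} = \text{average of the } \left(\frac{10}{2}\right)^{th} \text{ and } \left(\frac{10}{2}+1\right)^{th} \text{ ordered values}$$
$$= \text{average of the } 5^{th} \text{ and } 6^{th} \text{ ordered values}$$
$$= \text{average of } 3.7 \text{ and } 4.2 = \frac{3.7 + 4.2}{2} = 3.95$$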
So, we can conclude that the median salary in the firm is Rs. 3.95 lakh per annum.
Note here that if we had salaries of only 9 employees, then since 9 is odd, the median would
simply be the middle observation, i.e., the $\left(\frac{N+1}{2}\right)^{th}$ observation. This means that the median
would have been the 5th observation, i.e., 3.7. We would then have concluded that the median
salary in the firm is Rs. 3.7 lakh per annum.
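The two positional rules for the median can be sketched as a single function. The function name `median_by_position` is illustrative, and the nine-employee case assumes that the largest salary is the one left out, which the text does not specify.

```python
def median_by_position(values):
    """Median via the ordered-position rule: the ((N+1)/2)-th value for odd N,
    the average of the (N/2)-th and (N/2 + 1)-th values for even N."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median is undefined for an empty dataset")
    if n % 2 == 1:
        # Positions are counted from 1, list indexes from 0.
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


salaries = [3.2, 4.5, 3.5, 3, 4.2, 16.5, 3.7, 3, 15, 4.4]
print(round(median_by_position(salaries), 2))   # 3.95 (N = 10, even)

# Assumption: the largest salary is dropped, leaving N = 9 ordered values.
nine_salaries = sorted(salaries)[:9]
print(median_by_position(nine_salaries))        # 3.7 (N = 9, odd)
```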
It is worth noting here that even if we increase the values of the extreme observations from 15
and 16.5 to say 40 and 50, we will still get the same value of median. As median does not get
affected by extreme values in a dataset, it is said to be representative of the sample. Hence, in
cases when we have extreme observations, median is a better measure of central tendency than
mean.
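The claim that the median does not move when the extreme salaries grow can be checked directly. The sketch below uses Python's standard `statistics` module; the variable name `inflated` is chosen here for illustration.

```python
from statistics import mean, median

salaries = [3.2, 4.5, 3.5, 3, 4.2, 16.5, 3.7, 3, 15, 4.4]

# Replace the two extreme salaries (15 and 16.5) with 40 and 50.
inflated = [40 if s == 15 else 50 if s == 16.5 else s for s in salaries]

print(round(mean(salaries), 2), round(median(salaries), 2))   # 6.1 3.95
print(round(mean(inflated), 2), round(median(inflated), 2))   # 11.95 3.95
```

The mean almost doubles while the median stays at 3.95, because the two middle ordered values (3.7 and 4.2) are untouched.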
Median as a measure of central tendency is quite useful since it is easy to understand and
calculate and, in some cases, median can be located by simply looking at the data. Median is
preferred to mean when the data contain extreme values, since it is largely unaffected by
them. Also, it can be calculated for open-ended distributions. However, the limitations of
using median are that the data must first be arranged in either ascending or descending order.
If we have a very large dataset, arranging the data may be time-consuming. Median is not
based on all the observations and so may not be representative of the dataset. Finally, it is
not capable of further mathematical treatment.
IN-TEXT QUESTIONS
5. The import of electronic products in million dollars in a country for eight years was
recorded as 27.4, 16.6, 1.7, 14.1, 32.9, 18.7, 3.8, 22.5. The median import of the country
is ____.
6. The following numbers are arranged in ascending order:
15 , 𝑥 , 22 , 𝑥 + 7, 32 , 56 , 88
If the median of the data is 25, then the value of x will be:
A. 17 B. 25
C. 18 D. 19
7. The runs scored in a cricket match by 11 players are as follows:
7, 16, 167, 41, 110, 57, 1, 16, 9, 0, 16
3.3.3 Mode
In simple terms, mode is that value in the dataset which appears most frequently. Like median,
the value of mode too does not get affected by extreme observations. Mode is easy to
understand and calculate and, in some cases, its value can be located by simply looking at the
data. Mode can also be calculated for open-ended distributions.
For example, the following is a person’s daily expenditure on lunch, in Rs., over a week:
130, 115, 130, 130, 165, 150, 130

Clearly, we can observe that the person spends Rs. 130 four times a week on lunch. Hence, the
mode is 130.
Data may have a single mode, two modes (known as bimodal) or more than two modes
(multimodal).
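Counting frequencies is all that finding the mode requires. The sketch below returns a list so that bimodal and multimodal data are handled as well; the function name `modes` and the second test list are illustrative.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


lunch_spend = [130, 115, 130, 130, 165, 150, 130]
print(modes(lunch_spend))           # [130]
print(Counter(lunch_spend)[130])    # 4

# A hypothetical bimodal dataset: 1 and 2 both occur twice.
print(modes([1, 1, 2, 2, 3]))       # [1, 2]
```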
Mode also suffers from some limitations. First, it is not rigidly defined. Second, mode is not
based on all the observations and so may not be representative of the dataset. Finally, it is not
capable of further mathematical treatment.
3.3.4 Relationship between Mean, Median and Mode
Now that we have studied the three measures of central tendency, you may believe that since
all of them denote the central value in a dataset, they must always be the same. In this
section we will see that this is not always true.
First take the following 16 observations and try to calculate the mean, median and mode by
yourself:
4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10
$$\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{4 + 5 + 6 + 6 + 6 + 7 + 7 + 7 + 7 + 7 + 7 + 8 + 8 + 8 + 9 + 10}{16} = \frac{112}{16} = 7$$
We have mean equal to 7.
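Since we have an even number (16) of observations, for median, we use the following formula:
$$\tilde{x} = \text{average of the } \left(\frac{n}{2}\right)^{th} \text{ and } \left(\frac{n}{2}+1\right)^{th} \text{ ordered values}$$
$$= \text{average of the } \left(\frac{16}{2}\right)^{th} \text{ and } \left(\frac{16}{2}+1\right)^{th} \text{ ordered values}$$
$$= \text{average of the } 8^{th} \text{ and } 9^{th} \text{ ordered values} = \frac{7 + 7}{2} = 7$$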
Hence, we get median equal to 7.
Finally, we can look at the data and can infer that the mode is 7 since it occurs the maximum
number of times in the dataset.
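The three hand calculations can be cross-checked with Python's standard `statistics` module. This is only a verification sketch, not a replacement for working through the formulas.

```python
from statistics import mean, median, mode

data = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]

# All three measures of central tendency coincide for this dataset.
print(mean(data) == median(data) == mode(data) == 7)   # True
```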
In this example, it is evident that Mean = Median = Mode. We can look at the central tendencies
using a histogram as well:
Figure 1: Histogram of symmetric data
Figure 2: Histogram of symmetric data

We can see that the distribution is symmetric and bell-shaped. We can conclude from
here that when the distribution is symmetric with a single peak, the three measures of
central tendency coincide at the same point.
However, when we have unsymmetrical or skewed data, the three measures are not equal. We
will learn about skewness in detail later in the lesson.
For moderately skewed data, the relationship between Mean, Median and Mode can be
approximated by the following empirical formula:
$$3 \times \text{Median} = 2 \times \text{Mean} + \text{Mode}$$
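The relationship can be rearranged to solve for whichever measure is unknown. The figures in the last line are hypothetical and serve only to illustrate the arithmetic; the relationship itself is an approximation that works best for moderately skewed data.

```latex
\begin{align*}
3\,\text{Median} &= 2\,\text{Mean} + \text{Mode} \\
\text{Mode} &= 3\,\text{Median} - 2\,\text{Mean} \\
\text{Mean} &= \frac{3\,\text{Median} - \text{Mode}}{2} \\[4pt]
\text{e.g. Mean} = 30,\ \text{Median} = 28 \;\Rightarrow\; \text{Mode} &= 3(28) - 2(30) = 24
\end{align*}
```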
IN-TEXT QUESTIONS
8. A researcher records the age of each participant in her study. The ages are:
21, 59, 62, 21, 66, 28, 66, 48, 79, 59, 28, 62, 63, 63, 48, 66, 59, 66, 48, 79, 19, 79
The mode of the above data is:
A. 66 B. 59
C. 79 D. 62
9. A researcher has computed the mean of her data as 22.5 and median as 20. Calculate
the value of mode using these values. Is the distribution symmetrical?
10. A researcher calculates the following values of the median and mode of a distribution:
Median = 17.5
Mode = 20.5
A. Calculate Mean.
B. Do these values represent a symmetrical distribution?
3.3.5 Other measures of central tendency- Quartiles, percentiles, deciles and trimmed
mean
Mean, median and mode are not the only measures of location. There are several others as
well. We will briefly introduce the concepts of quartiles, percentiles, deciles and trimmed mean.
Just as the median divides the dataset into two equal halves, quartiles divide the dataset into 4
equal parts. There are 3 quartiles, and each of the four parts contains 25% of the observations.
The first quartile, Q1, is the value below which the first 25% of observations lie; it is known
as the lower quartile. The second quartile, Q2, is the median, which divides the dataset into
two halves. The third quartile, Q3, is known as the upper quartile: 75% of observations lie
below it and 25% of the observations are greater than it.
To calculate quartiles, we first arrange the observations in ascending or descending order. We
can then find the value of each quartile by using the following formulae:
$$Q_1 = \left(\frac{n+1}{4}\right)^{th} \text{ observation}$$
$$Q_2 = \left(\frac{n+1}{2}\right)^{th} \text{ observation} = \text{Median}$$
$$Q_3 = \left(3 \times \frac{n+1}{4}\right)^{th} \text{ observation}$$
Consider the following dataset to understand the quartiles better:
0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77.
Since there is an odd number of observations, you can easily identify the median in the above
dataset. The median is 10. This is our second quartile, i.e., Q2. Now, as per the formula of the
first quartile,
$$Q_1 = \left(\frac{11+1}{4}\right)^{th} \text{ observation} = \left(\frac{12}{4}\right)^{th} \text{ observation} = 3^{rd} \text{ observation} = 5$$
Similarly, for third quartile,
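$$Q_3 = \left(3 \times \frac{n+1}{4}\right)^{th} \text{ observation} = \left(3 \times \frac{11+1}{4}\right)^{th} \text{ observation}$$
$$= \left(3 \times \frac{12}{4}\right)^{th} \text{ observation} = 9^{th} \text{ observation} = 35$$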
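The positional formulae for the quartiles can be sketched in code. The function name `quartile` is illustrative. One assumption goes beyond the text: when the position j(n + 1)/4 is not a whole number, the sketch interpolates linearly between the two neighbouring ordered values. The worked example above only involves whole positions, so that branch is not exercised here.

```python
def quartile(values, j):
    """j-th quartile (j = 1, 2, 3), located at position j * (n + 1) / 4
    in the ordered data, counting positions from 1."""
    ordered = sorted(values)
    n = len(ordered)
    position = j * (n + 1) / 4
    lower = int(position)            # whole part of the position
    fraction = position - lower      # fractional part, 0 for whole positions
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    # Assumption: linear interpolation between neighbours for fractional positions.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = [0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77]
print([quartile(data, j) for j in (1, 2, 3)])   # [5.0, 10.0, 35.0]
```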
For grouped data, a quartile can be calculated using the following formula:
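$$Q_j = L + \frac{\frac{jN}{4} - Pcf}{f} \times i, \quad \text{for } j = 1, 2, 3.$$
Here, L = lower limit of the quartile class, N = total number of observations, Pcf = cumulative
frequency of the class preceding the quartile class, f = frequency of the quartile class and
i = size (width) of the quartile class.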
We can easily use the above formula to obtain each quartile in the following manner:
$$Q_1 = L + \frac{\frac{N}{4} - Pcf}{f} \times i$$
$$Q_2 = L + \frac{\frac{N}{2} - Pcf}{f} \times i$$
and,
$$Q_3 = L + \frac{\frac{3N}{4} - Pcf}{f} \times i$$
Percentiles, on the other hand, simply denote that observation below which a particular
percentage of observations fall. The value of percentiles varies on the scale from 1 to 100. For
instance, 90th percentile would indicate that observation in the dataset below which 90% of
observations fall. A percentile of an observation ‘x’ can be calculated by the following formula:
$$\text{Percentile} = \frac{\text{Number of values below } x}{\text{Total number of values}} \times 100$$
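The percentile of a single observation follows directly from this ratio. The sketch below applies it to the ten salaries used earlier in the lesson; the function name `percentile_rank` is an illustrative choice.

```python
def percentile_rank(values, x):
    """Percentage of observations that lie strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


salaries = [3.2, 4.5, 3.5, 3, 4.2, 16.5, 3.7, 3, 15, 4.4]

# Eight of the ten salaries are below 15, so 15 sits at the 80th percentile.
print(percentile_rank(salaries, 15))   # 80.0
```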
For grouped data, a percentile can be calculated by using the following formula:
$$P_j = L + \frac{\frac{jN}{100} - Pcf}{f} \times i, \quad \text{for } j = 1, 2, 3, \ldots, 99.$$
Similarly, deciles divide a dataset into 10 equal parts, which is in contrast to a percentile, which
divides a dataset into 100 parts. As seen above, we can derive a similar formula for computing
a decile:
$$D_j = L + \frac{\frac{jN}{10} - Pcf}{f} \times i, \quad \text{for } j = 1, 2, 3, \ldots, 9.$$
where the symbols have the usual meaning and interpretation. You must note here that we will
have ninety-nine percentiles ($P_1, P_2, \ldots, P_{99}$) and nine deciles ($D_1, D_2, \ldots, D_9$). For both
percentiles and deciles, the middle values, $P_{50}$ and $D_5$, represent the median.
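The grouped-data formulae for quartiles, deciles and percentiles differ only in the divisor (4, 10 or 100), so one function can cover all three. The sketch below assumes equal class widths and borrows the class intervals and frequencies from Q.5 of the Lesson 2 exercises purely as sample input; the function and parameter names are illustrative.

```python
def grouped_quantile(lower_limits, width, frequencies, j, parts=4):
    """j-th quantile for grouped data: L + ((j * N / parts - Pcf) / f) * i.
    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles."""
    total = sum(frequencies)                 # N
    target = j * total / parts               # jN/4, jN/10 or jN/100
    cumulative = 0                           # running Pcf
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("quantile class not found")


# Classes 50-60, 60-70, ..., 100-110 (width 10), N = 180.
lower_limits = [50, 60, 70, 80, 90, 100]
frequencies = [35, 25, 45, 15, 20, 40]

q1, q2, q3 = (grouped_quantile(lower_limits, 10, frequencies, j) for j in (1, 2, 3))
print(q1, round(q2, 2), q3)   # 64.0 76.67 97.5

# The 5th decile and the 50th percentile both reproduce the median, Q2.
d5 = grouped_quantile(lower_limits, 10, frequencies, 5, parts=10)
p50 = grouped_quantile(lower_limits, 10, frequencies, 50, parts=100)
print(d5 == q2 == p50)        # True
```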
As we already know, the mean is sensitive to extreme observations. A trimmed mean reduces
this sensitivity by eliminating the extreme observations from the calculation, which makes it a
more robust measure than the regular mean when outliers are present. For instance, to compute
a 5% trimmed mean, we eliminate the smallest 5% and the largest 5% of the sample observations
and then calculate the mean of the remaining observations in the regular way.
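A trimmed mean is straightforward to sketch. The version below rounds the number of observations trimmed from each end down to a whole number, which is one possible convention and an assumption of this example; the salary data are the ten values used earlier in the lesson.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping the smallest and largest `proportion` of observations.
    The count dropped from each end is rounded down to a whole observation."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)      # observations cut from each tail
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("proportion too large: nothing left to average")
    return sum(kept) / len(kept)


salaries = [3.2, 4.5, 3.5, 3, 4.2, 16.5, 3.7, 3, 15, 4.4]

print(round(trimmed_mean(salaries, 0.10), 2))   # 5.19 (drops 3 and 16.5)
print(round(trimmed_mean(salaries, 0.20), 2))   # 3.92 (drops 3, 3, 15 and 16.5)
```

With 20% trimmed from each end, both extreme salaries are removed and the result moves close to the median of 3.95.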
IN-TEXT QUESTIONS
11. For the following data set, the value of upper quartile is _______.
18, 30, 32, 39, 54, 57, 61, 62, 81, 88, 90
12. 2nd Quartile = 5th Decile = 50th Percentile =
A. Mode B. Median
C. Mean D. Trimmed mean
3.4 MEASURES OF VARIABILITY
Reporting the central values of a dataset provides only partial information about the entire data.
It is possible for two datasets to have similar measures of central tendency and yet differ
in the spread of their values. The logic will become clear through the following
example.
Consider two restaurants selling pizzas that are located in the same town. A researcher collected
data on the delivery time of both the restaurants, in minutes, for a week, as given below:
Restaurant A: 42, 50, 47, 43, 52, 55, 40

Restaurant B: 47, 32, 70, 55, 65, 35, 25
Before you continue reading, you should try to compute a measure of central tendency for each
restaurant. Since no value is repeated in either dataset, the mode is not informative here, so
calculate the mean or the median.
To calculate mean, use the following formula:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For restaurant A,
$$\bar{x}_A = \frac{42 + 50 + 47 + 43 + 52 + 55 + 40}{7} = \frac{329}{7} = 47$$
For restaurant B,
$$\bar{x}_B = \frac{47 + 32 + 70 + 55 + 65 + 35 + 25}{7} = \frac{329}{7} = 47$$
It is your task to check if the value of median for both the restaurants is also equal to 47 or not.
Moving on, we can conclude that the average delivery time of both the restaurants is 47
minutes. You may also try to create a dot plot of the two datasets. It will look something like
this:
Now, since both the datasets have the same central value, can we claim that both the datasets
convey the same information? No.
If you observe the values in each dataset carefully, you will notice that the delivery time of
restaurant A ranges between 40 and 55 minutes, whereas the delivery time of restaurant B
ranges between 25 and 70 minutes. Even in the Dot plots, you may observe that the data points
of restaurant A are clustered together, whereas the data points of restaurant B are spread out.
What can you infer from this extra piece of information? This means that restaurant A is more
consistent in delivering pizzas between 40 and 55 minutes, that is, the time frame of delivery
is shorter. Whereas the time taken by restaurant B to deliver a pizza is subject to more variation,
that is the time frame of delivery is longer. Knowledge about such variation is important when
we have to make important decisions. In this example, say you are starving, but you have just
entered an Economics class that will finish in exactly 45 minutes. If you have to take the
decision now to order a pizza, which restaurant would you prefer? You should prefer restaurant
A since it is possible that restaurant B will deliver the pizza 20 minutes earlier and you are sure
that neither would the professor finish the class, nor would you be able to leave the class that
early.
Figure 3: Dot plot of delivery time of two restaurants

This is a very minor decision to make, you may argue. However, measures of variability are a
basis for making many more important distinctions and decisions. You may now appreciate the
importance of studying variability.
So, measures of variability denote dispersion in a dataset. In other words, they show how far
away the data points are from the measure of central tendency. Low dispersion suggests that
the data points are clustered closely around the average, whereas high dispersion
indicates that the data points are spread further away from it. High dispersion also means that
there is more variability in the data and hence, more chances of getting extreme values that
may distort our estimates.
In this lesson we will study about three measures of variability – range, standard deviation, and
variance.
3.4.1 Range and inter-quartile range
Range is the most straightforward measure of variability. It is simply the difference between
the largest and the smallest value in the dataset.
𝑅𝑎𝑛𝑔𝑒 = 𝐻 − 𝐿
Where H is the highest value and L is the lowest value of a dataset. In the above example, the
range for restaurant A = 55 − 40 = 15. Whereas the range for restaurant B = 70 − 25 = 45.
This clearly indicates that there is more variability in the data points of restaurant B as
compared to restaurant A.
The concept of range is extensively used in statistical quality control. Range is helpful in
studying the variation in the prices of shares and debentures and other commodities that are
very sensitive to price changes from one period to another. For the meteorological department
too, range is a good indicator for weather forecast.
The relative measure corresponding to range, called coefficient of range, is obtained by the
formula:
$$\text{Coefficient of Range} = \frac{H - L}{H + L}$$
Although range is easy to understand and compute, the usage of this measure is limited since
it takes into consideration only the extreme data points and other observations in the data are
simply ignored. So, range can be affected by extreme values. Moreover, as the size of the
dataset increases, range loses its relevance as a measure of variability.
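Both the range and the coefficient of range depend only on the largest and smallest values, as the following sketch for the two restaurants shows. The function names are illustrative.

```python
def value_range(values):
    """Difference between the highest and lowest observation (H - L)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure of range: (H - L) / (H + L)."""
    high, low = max(values), min(values)
    return (high - low) / (high + low)


restaurant_a = [42, 50, 47, 43, 52, 55, 40]
restaurant_b = [47, 32, 70, 55, 65, 35, 25]

print(value_range(restaurant_a), value_range(restaurant_b))   # 15 45
print(round(coefficient_of_range(restaurant_a), 3))           # 0.158
print(round(coefficient_of_range(restaurant_b), 3))           # 0.474
```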
A slightly better measure of variability than range is the inter-quartile range. Here, instead of
taking the difference between the extreme observations in the dataset, we take the difference
between the upper and lower quartile. It is calculated as Q3 – Q1. This is a better measure than
range since it takes into account only the middle 50% of the observations and the extreme
observations do not affect the measure.
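The inter-quartile range of the two restaurants can be computed with the positional quartile rule from Section 3.3.5. This sketch deliberately handles only datasets for which (n + 1)/4 is a whole number, as it is for n = 7; the function name is illustrative.

```python
def interquartile_range(values):
    """Q3 - Q1, with the quartiles located at the (n + 1)/4-th and
    3(n + 1)/4-th ordered observations. Assumes (n + 1) is divisible by 4."""
    ordered = sorted(values)
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch needs (n + 1) / 4 to be a whole number")
    q1 = ordered[(n + 1) // 4 - 1]        # 1-based position -> 0-based index
    q3 = ordered[3 * (n + 1) // 4 - 1]
    return q3 - q1


restaurant_a = [42, 50, 47, 43, 52, 55, 40]
restaurant_b = [47, 32, 70, 55, 65, 35, 25]

print(interquartile_range(restaurant_a))   # 10  (Q1 = 42, Q3 = 52)
print(interquartile_range(restaurant_b))   # 33  (Q1 = 32, Q3 = 65)
```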
3.4.2 Standard deviation and Variance
Standard deviation and Variance are considered significantly better measures of variability as
they are based on all the observations in the dataset and hence are more sensitive than range
and inter quartile range. Since standard deviation is just the square root of variance, we will
begin our discussion with variance.
As the name suggests, variance illustrates the variation in a dataset, that is, how far each data
point is from the average. Formally, variance is calculated by dividing the sum of the squared
deviations from the mean by (n – 1), where n denotes the sample size. We symbolize sample
variance by $s^2$ and the formula can be written as:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
Here, the term $\sum_{i=1}^{n}(x_i - \bar{x})^2$ represents the sum of squared deviations of each data point
from the sample mean $\bar{x}$. To make this simpler, we break this calculation into three steps:
Step 1: Calculate the difference between each observation in the dataset and the mean
Step 2: Square each difference obtained in Step 1
Step 3: Add all the values you get from Step 2, i.e., the squares of the differences
You should note that we add up the squares of deviations (differences) instead of simply adding
only the differences since by merely adding deviations, we will get zero. To understand it
better, consider again the example of the two restaurants selling pizzas. The delivery time of
restaurant A, in minutes was:
42, 50, 47, 43, 52, 55, 40

We have already calculated the mean delivery time, i.e., 47 minutes.
Now deviation of each data point from the mean would be:
(𝑥1 − 𝑥̅ ) = 42 − 47 = −5
(𝑥2 − 𝑥̅ ) = 50 − 47 = 3
(𝑥3 − 𝑥̅ ) = 47 − 47 = 0
(𝑥4 − 𝑥̅ ) = 43 − 47 = −4
(𝑥5 − 𝑥̅ ) = 52 − 47 = 5
(𝑥6 − 𝑥̅ ) = 55 − 47 = 8
(𝑥7 − 𝑥̅ ) = 40 − 47 = −7
Now adding up all the deviations, we get,

$$\sum_{i=1}^{7}(x_i - \bar{x}) = -5 + 3 + 0 - 4 + 5 + 8 - 7 = 0$$
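Why the deviations always sum to zero can be seen from the definition of the sample mean. A short derivation, using only the symbols already defined above:

```latex
\begin{align*}
\sum_{i=1}^{n}(x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - n\bar{x} \\
  &= n\bar{x} - n\bar{x} \qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \\
  &= 0
\end{align*}
```

This holds for every dataset, not only for the delivery times of restaurant A.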
Since the positive and negative deviations from the mean cancel each other out, we consider
the sum of squared deviations from the mean. In this way we can eliminate all the negative
values. So, in our example,
$(x_1 - \bar{x})^2 = (-5)^2 = 25$
$(x_2 - \bar{x})^2 = 3^2 = 9$
$(x_3 - \bar{x})^2 = 0^2 = 0$
$(x_4 - \bar{x})^2 = (-4)^2 = 16$
$(x_5 - \bar{x})^2 = 5^2 = 25$
$(x_6 - \bar{x})^2 = 8^2 = 64$
$(x_7 - \bar{x})^2 = (-7)^2 = 49$
Now adding up all the squared deviations, we get,

$$\sum_{i=1}^{7}(x_i - \bar{x})^2 = 25 + 9 + 0 + 16 + 25 + 64 + 49 = 188$$
Finally, to get the variance, divide the sum of squared deviations by 𝑛 − 1.

$$s^2 = \frac{188}{7 - 1} \quad \therefore \; s^2 \approx 31.3$$
Sample standard deviation, s, is simply the square root of the variance, that is,
$$s = \sqrt{s^2}$$
In our example, standard deviation = $\sqrt{31.3} \approx 5.6$.
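The three steps and the final division translate almost line by line into code. The sketch below reproduces the figures for restaurant A; the function name `sample_variance` is illustrative.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]      # Step 1: deviations from the mean
    squared = [d ** 2 for d in deviations]        # Step 2: square each deviation
    return sum(squared) / (n - 1)                 # Step 3: add, then divide by n - 1


restaurant_a = [42, 50, 47, 43, 52, 55, 40]
s2 = sample_variance(restaurant_a)
s = sqrt(s2)
print(round(s2, 1), round(s, 1))   # 31.3 5.6
```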

We can interpret the value of standard deviation as a typical deviation from the sample mean.
In other words, a typical delivery by restaurant A takes 47 ± 5.6 minutes, i.e., somewhere
between 41.4 and 52.6 minutes. In the above example, it is now your task to calculate the
variance and standard deviation of the delivery time of restaurant B. Check if you get
$s^2 = 295$ (the sum of squared deviations is 1770) and $s \approx 17.2$. Try and interpret the
value of standard deviation you got for restaurant B on your own. What is the typical time
span of deliveries from restaurant B?
We can clearly see that the variance and standard deviation of restaurant B are higher than
those of restaurant A. This implies that there is more variation in the delivery time of
restaurant B as compared to restaurant A.
Note that both $s$ and $s^2$ are non-negative.

Just as we differentiated the symbols of population mean and sample mean, we do the same in
the case of sample variance/standard deviation and population variance/standard deviation.
While sample variance is denoted by $s^2$ and sample standard deviation is denoted by $s$,
population variance is denoted by $\sigma^2$ (using the Greek letter sigma) and population standard
deviation is denoted by $\sigma$. The basic understanding of the population parameters remains
the same: the population variance denotes the variability in the population, and the
population standard deviation denotes the typical deviation of a population value from the
population mean µ.
The formal representation of the population variance and standard deviation also gets modified.
In terms of the population parameters, the population variance can be written as:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$
Population standard deviation can be written as:
𝜎 = √𝜎 2
Just as we use $\bar{x}$ to make inferences about µ, we use $s^2$ to make inferences about $\sigma^2$.
Note here that for the population variance we divide the sum of squared deviations from the
mean by N, whereas for the sample variance we divide by n − 1, since $s^2$ is based on n − 1
degrees of freedom. Degrees of freedom refers to the number of values in a sample that are
free to vary independently. In other words, if we fix $\bar{x}$, then once (n − 1) of the elements
in the sample are known, the nth element of the sample is determined.
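The difference between the two divisors, and the idea of degrees of freedom, can be illustrated with the delivery times of restaurant A. The sketch uses `variance` (divides by n − 1) and `pvariance` (divides by N) from Python's standard `statistics` module. Treating the same seven values first as a sample and then as a whole population is done only for comparison.

```python
from statistics import pvariance, variance

restaurant_a = [42, 50, 47, 43, 52, 55, 40]

# Treated as a sample: sum of squared deviations (188) divided by n - 1 = 6.
print(round(variance(restaurant_a), 2))    # 31.33
# Treated as a whole population: the same sum divided by N = 7.
print(round(pvariance(restaurant_a), 2))   # 26.86

# Degrees of freedom: with the mean fixed at 47, six values determine the seventh.
known = restaurant_a[:6]
seventh = 47 * 7 - sum(known)
print(seventh)                             # 40
```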
Coefficient of Variation (C.V.)
A very popular and frequently used relative measure of variation is the coefficient of variation
denoted by C.V. This is simply the ratio of the standard deviation to arithmetic mean expressed
as a Percentage.
$$\text{Coefficient of Variation} = C.V. = \frac{\sigma}{\bar{x}} \times 100$$
The smaller the C.V. of a dataset, the less variable, or more consistent, the data are said to be.
Consider the following data on the mean daily sales and standard deviation of four regions:
Region    Mean daily sales (Rs.’000)    Standard deviation (Rs.’000)
1         82                            10.41
2         44                            5.85
3         70                            9.52
4         60                            11.22
To determine which region is most consistent in terms of daily sales, we can calculate the
coefficient of variation:
$$CV_1 = \frac{10.41}{82} \times 100 \approx 12.70$$
$$CV_2 = \frac{5.85}{44} \times 100 \approx 13.30$$
$$CV_3 = \frac{9.52}{70} \times 100 = 13.60$$
$$CV_4 = \frac{11.22}{60} \times 100 = 18.70$$
The coefficient of variation is lowest for Region 1 (about 12.70). Hence, the most consistent
region for sales is Region 1.
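The comparison across regions is easy to automate. The sketch below stores each region's mean and standard deviation from the table and picks the region with the smallest coefficient of variation; the dictionary layout and function name are illustrative choices.

```python
# Region -> (mean daily sales, standard deviation), both in Rs.'000.
regions = {1: (82, 10.41), 2: (44, 5.85), 3: (70, 9.52), 4: (60, 11.22)}


def coefficient_of_variation(mean_value, std_dev):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean_value * 100


cv = {region: coefficient_of_variation(m, s) for region, (m, s) in regions.items()}
most_consistent = min(cv, key=cv.get)

for region, value in cv.items():
    print(region, round(value, 2))
print("Most consistent region:", most_consistent)   # Most consistent region: 1
```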
IN-TEXT QUESTIONS
13. If the standard deviation of a dataset is 0.012, the variance will be:
A) 0.144 B) 0.00144
C) 0.000144 D) 0.0000144
14. The variance of the first 10 whole numbers is _______.
15. In a class of 100 students, the mean marks on a particular exam was 75, and the standard
deviation was 0. This implies that:
A) All students scored 75 marks B) Variance is 0.75
C) Standard deviation cannot be zero D) None of the above
16. If the mean of certain observations is given as 60 and the standard deviation is 12, then
the coefficient of variation is 20%. (True/False)
3.5 EFFECT OF CHANGE IN ORIGIN AND SCALE
Now that we are clear about the calculation of the measures of central tendency and variability,
we will examine the effects of a change in origin and scale on both the mean and the standard
deviation/variance. It is important to learn about these effects since a researcher may
incorrectly report the values in the dataset, and recalculating the mean and standard deviation
can become a lengthy task. To avoid such inefficiency, we will learn how the mean and
variance respond if every value in the dataset is changed.
3.5.1 Change in origin
Change in origin can also be understood in simpler terms as shifting of data. This suggests a
situation in which we add or subtract a constant value from all the observations in our dataset.
We will now see how this impacts the value of mean and standard deviation. Let us understand
this concept through a simple example.
Suppose we have the following 5 observations in our sample:
3, 9, 12, 18, 23
For convenience, we will call these observations the ‘original dataset’. At this point, you can
easily compute the mean and standard deviation of the original dataset.
$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3+9+12+18+23}{5} = \frac{65}{5} = 13$$
$$\text{Variance} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{100+16+1+25+100}{4} = \frac{242}{4} = 60.5$$
$$\text{Standard deviation} = \sqrt{\text{variance}} = \sqrt{60.5} \approx 7.78$$

Now, let us add a constant value 5 to each observation in our original dataset. The new values
will become:
8, 14, 17, 23, 28
Again, let us calculate the mean and standard deviation of the new values. We get,
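$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{8+14+17+23+28}{5} = \frac{90}{5} = 18$$
$$\text{Variance} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{100+16+1+25+100}{4} = \frac{242}{4} = 60.5$$
$$\text{Standard deviation} = \sqrt{\text{variance}} = \sqrt{60.5} \approx 7.78$$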
Did you notice the pattern in the new estimates? As compared to the original dataset, we
observe that the value of mean increased by 5 whereas the value of standard deviation and
variance remained unchanged. You need to remember that this pattern remains the same
whatever constant we add or subtract from our observations. Try repeating the exercise by
subtracting 5 from each observation in the original dataset and calculating the mean and
standard deviation of the new values. You will see that the value of mean reduces by 5 and
standard deviation again remains the same. So, in general we can conclude that:
If $y_i = x_i + a$, where $a$ is a positive or negative constant, then,
Mean: $\bar{y} = \bar{x} + a$,
Standard deviation: $s_y = s_x$
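The rule for a change in origin can be checked numerically. The sketch below uses Python's standard `statistics` module, whose `variance` function divides by (n − 1) as in this lesson; the helper name `shift` is an illustrative assumption.

```python
import statistics

original = [3, 9, 12, 18, 23]

def shift(data, a):
    """Change of origin: add the constant a to every observation."""
    return [x + a for x in data]

for a in (5, -5):
    shifted = shift(original, a)
    print(f"a = {a:+d}: "
          f"mean {statistics.mean(original)} -> {statistics.mean(shifted)}, "
          f"variance {statistics.variance(original)} -> {statistics.variance(shifted)}")
```

Whatever constant is tried, the mean moves by exactly that constant, while the deviations from the mean, and hence the variance and standard deviation, stay the same.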
3.5.2 Change in scale

The other way we can modify data is through scaling. Scaling the data means multiplying
or dividing all the observations in the data by a constant. Let us see the impact of a change in
scale on the mean and standard deviation. Consider again the original dataset in the above example:
3, 9, 12, 18, 23
We will change the scale of the dataset by multiplying each observation by 3.
The new dataset we will get is:
9, 27, 36, 54, 69
Calculate the mean and standard deviation:
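$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{9+27+36+54+69}{5} = \frac{195}{5} = 39$$
$$\text{Variance} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{900+144+9+225+900}{4} = \frac{2178}{4} = 544.5$$
$$\text{Standard deviation} = \sqrt{\text{variance}} = \sqrt{544.5} \approx 23.33$$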

Now compare the above values with the original mean and standard deviation. We find that by
multiplying each observation by 3, the mean and standard deviation also get multiplied by 3
(the mean goes from 13 to 39), while the variance gets multiplied by 3² = 9 (from 60.5 to 544.5).
Hence, scaling the observations has also scaled the measures of central tendency and
variability. Try dividing all the observations in the initial dataset by 3 and check whether the
mean and standard deviation too get divided by 3. Since this pattern will follow every time we
scale a dataset, we can conclude generally that,
If $y_i = b x_i$, where $b$ is a positive constant, then,
Mean: $\bar{y} = b\bar{x}$,
Standard deviation: $s_y = b s_x$
In conclusion, the mean is affected by both a change in origin and a change in scale, whereas
the standard deviation is affected only by a change in scale.
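The rule for a change in scale can be checked in the same way. This is a minimal sketch using Python's standard `statistics` module; the helper name `rescale` is an illustrative assumption, and the second pass (b = 1/3) corresponds to the "try dividing by 3" exercise suggested above.

```python
import statistics

original = [3, 9, 12, 18, 23]

def rescale(data, b):
    """Change of scale: multiply every observation by the positive constant b."""
    return [b * x for x in data]

for b in (3, 1 / 3):
    scaled = rescale(original, b)
    mean_ratio = statistics.mean(scaled) / statistics.mean(original)
    sd_ratio = statistics.stdev(scaled) / statistics.stdev(original)
    var_ratio = statistics.variance(scaled) / statistics.variance(original)
    print(f"b = {b:.3f}: mean x{mean_ratio:.3f}, sd x{sd_ratio:.3f}, variance x{var_ratio:.3f}")
```

The mean and standard deviation ratios equal b, while the variance ratio equals b², which is why the standard deviation is usually the more convenient measure to carry through a change of units.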
IN-TEXT QUESTIONS
17. Suppose the standard deviation of a dataset is 6. If each observation is divided by 3 then
the standard deviation of the new dataset will be:
A) 3 B) 2
C) 18 D) 9
18. A researcher measures the weight of 10 students. The mean weight she calculates is
57 kg. Later she realizes that the weighing scale was misreporting each weight and that she
has to add 3 kg to the weight of each student. The new mean weight of the students will
be _____.
19. If the standard deviation of 11, 21, 31…,71, 81, 91 is ‘K’, then the standard deviation
of 15, 25, 35…,75, 85, 95 will be:
A) K - 4 B) K +4
C) K D) 4K
3.6 SKEWNESS
The measures of central tendencies and variation discussed above do not reveal all the
characteristics of a given set of data. For example, two distributions may have the same mean,
variance and standard deviation but may differ widely in terms of their shape and peakedness.
The given data is either symmetrical or it is not. It may be flat, normal or peaked.
If the distribution of data is not symmetrical, it is called asymmetrical or skewed. Thus,
skewness refers to the lack of symmetry in distribution.
A simple method of detecting the direction of skewness is to look at the tails of the distribution.
The rules are:
1. Data are symmetrical when there are no extreme values in a particular direction, so that
low and high values balance each other. In this case, mean = median = mode, or $\bar{x}$ =
$Q_2$ = mode.
Figure 4: Histogram of symmetric distribution
2. If the longer tail is towards the lower value or left-hand side, the skewness is negative.
Negative skewness arises when the mean is decreased by some very low values. Then
we have, mean < median < mode.
Figure 5: Histogram of negatively skewed distribution
3. If the longer tail of the distribution is towards the higher values or right-hand side, then
skewness is positive. Positive skewness occurs when mean is increased by some very
high valued observations. In this case, mean > median > mode.
Figure 6: Histogram of positively skewed distribution

Relative Skewness
Karl Pearson's coefficient of skewness, used to compare two distributions, is given by:
$$SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{x} - \text{Mode}}{\sigma}$$
If the mode is not given, we can use the approximate relationship studied earlier in the lesson, i.e.
Mode = 3 Median – 2 Mean. Hence, we can write the coefficient of skewness as:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
Now, if,
• SK = 0, it is a symmetrical distribution.
• SK > 0, the distribution is positively skewed.
• SK < 0, then it is a negatively skewed distribution.
In practice, however, the value of SK usually lies in the range −1 ≤ SK ≤ +1.
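The two forms of Karl Pearson's coefficient given above can be sketched as a small function. The function names and the summary statistics in the example are hypothetical, chosen only for illustration.

```python
def pearson_skewness(mean, sd, mode=None, median=None):
    """Karl Pearson's coefficient: (mean - mode) / sd, or
    3 * (mean - median) / sd when the mode is not available."""
    if mode is not None:
        return (mean - mode) / sd
    if median is not None:
        return 3 * (mean - median) / sd
    raise ValueError("either mode or median must be supplied")

def describe(sk):
    """Translate the sign of the coefficient into the direction of skewness."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"

# Hypothetical summary statistics, chosen only for illustration.
sk_mode = pearson_skewness(mean=50, sd=10, mode=45)
sk_median = pearson_skewness(mean=50, sd=10, median=48)
print(sk_mode, describe(sk_mode))
print(sk_median, describe(sk_median))
```

The two forms need not give the same number, because Mode = 3 Median – 2 Mean is only an approximate relationship; they should, however, normally agree on the sign.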
For open-ended distributions, or data with extreme values, where positional measures such as the
median and quartiles are more appropriate, Bowley’s coefficient of skewness is used:
$$SK = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$$
Here, $Q_2$ is the median. Again, if
• SK = 0, it is a symmetrical distribution.
• SK > 0, the distribution is positively skewed.
• SK < 0, then it is a negatively skewed distribution.
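Bowley's coefficient depends only on the three quartiles, so it can be sketched very compactly. The function name and the quartile values below are hypothetical, used only to show how the sign changes with the position of the median.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient: (Q1 + Q3 - 2*Q2) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 are equal, so the coefficient is undefined")
    return (q1 + q3 - 2 * q2) / (q3 - q1)

# Hypothetical quartiles, chosen only for illustration.
print(bowley_skewness(20, 30, 50))   # median closer to Q1: positive value
print(bowley_skewness(20, 35, 50))   # median midway between the quartiles: zero
print(bowley_skewness(20, 40, 50))   # median closer to Q3: negative value
```

Because the coefficient uses only quartiles, very large or very small observations in the tails do not change its value.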
IN-TEXT QUESTIONS
20. For the frequency distribution of a variable x, mean = 32, median = 30 and mode = 26.
The distribution is:
A. Positively Skewed B. Negatively skewed
C. Symmetric D. None of the above
21. A researcher gathers data on the number of years of experience professors in a
university have. The mean, median, mode and standard deviation are 25, 24, 26 and 5,
respectively. Karl Pearson’s coefficient of skewness is _______ (0.20 / - 0.20).
3.7 BOXPLOTS
We will conclude this lesson by discussing boxplots, another extremely informative type of
graphical representation. Stem-and-leaf plots, bar graphs and histograms each depict a
particular aspect of the data, while measures of central tendency and variability each focus on
a separate feature of the data. Is there a way to visualize the data and trace its centre and
variability at the same time? There is, and the answer is the boxplot. A boxplot is a
comprehensive graphical representation that illustrates not only the central value and the
variability in the data, but also the extreme values (outliers) and the shape of the distribution.
The following figure displays a boxplot and its features:
Figure 7: Boxplot
The left side of the box represents the first quartile of the dataset, whereas the right side denotes
the third quartile. The difference between the two, i.e., the interquartile range or fourth spread
(fs), is the length of the box. You can see the median, or second quartile, marked inside the box;
here it sits in the middle, dividing the box into two equal halves. The whiskers (the two lines
extending out of the box on the left and right) extend to the smallest and the largest values in
the dataset that are not outliers. The small dots beyond the ends of the whiskers are termed
outliers.
Let us create a boxplot together for better understanding. Suppose we have the following
dataset containing 10 numbers:
34, 29, 25, 35, 28, 37, 30, 35, 29, 38
To create a boxplot, we first need the 5-number summary of the data, i.e., the smallest number,
Q1, Median (Q2), Q3 and the largest number. To get these, it is advisable to arrange the data in
ascending or descending order. So, we get:
25, 28, 29, 29, 30, 34, 35, 35, 37, 38
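You should now try to identify the 5-number summary from the above data. With 10 observations,
the median is the average of the 5th and 6th values, (30 + 34)/2 = 32. Q1 is the median of the
lower half (25, 28, 29, 29, 30) and Q3 is the median of the upper half (34, 35, 35, 37, 38).
We will get:
Smallest number = 25
Q1 = 29
Median (Q2) = 32
Q3 = 35
Largest number = 38
Interquartile range = 35 – 29 = 6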
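The 5-number summary above can be reproduced with a short sketch. It assumes that Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data, which matches the values obtained here; the function name is illustrative, and other quartile conventions used by statistical software can give slightly different quartiles for some datasets.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), taking Q1 and Q3 as the medians of the
    lower and upper halves of the sorted data (the middle value is left out of
    both halves when the number of observations is odd).
    Assumes at least two observations."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return (ordered[0], statistics.median(lower), statistics.median(ordered),
            statistics.median(upper), ordered[-1])

data = [34, 29, 25, 35, 28, 37, 30, 35, 29, 38]
smallest, q1, median, q3, largest = five_number_summary(data)
print("5-number summary:", smallest, q1, median, q3, largest)
print("Interquartile range:", q3 - q1)
```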
Now start by drawing an X-axis with appropriate labels and then draw a box around the first
and third quartile. Mark the median value in the middle. The length of the box must equal the
interquartile range. Next, draw two whiskers from both the ends of the box extending to the
smallest value on the left and largest value on the right. You should get a graph that looks like:
Figure 8: Boxplot
The boxplots can also be created in a vertical manner.
In a larger dataset, we can differentiate between the highest and lowest values of the dataset
and the extreme values that are also known as outliers. We use the following formulas to
calculate a lower and an upper limit for the data (the minimum and maximum values that are
not treated as outliers); any value below the lower limit or above the upper limit is termed an
outlier:
Lower limit: Q1 – 1.5 × IQR
Upper limit: Q3 + 1.5 × IQR
where IQR is the interquartile range. We denote outliers in the boxplot as dots. Formally,
any observation farther than 1.5fs from the closest quartile is termed an outlier. An outlier is
extreme if it is more than 3fs from the nearest quartile, and it is mild otherwise. Having an idea
about outliers is important since, as we have seen in earlier sections, extreme values can affect
our measures of central tendency and variability. An extreme value may also be the result of
an error at some step of the research. By identifying such values, we can be
cautious in the rest of our study.
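The outlier rule can be sketched as follows, using the quartiles of the 10-number example (Q1 = 29, Q3 = 35). The function name and the two extra observations are hypothetical; for simplicity the sketch also keeps the quartiles fixed when the extra observations are added, whereas in practice they would be recomputed.

```python
def classify_outliers(data, q1, q3):
    """Label each observation beyond 1.5 * IQR from the nearest quartile as a
    'mild' outlier, or 'extreme' if it is more than 3 * IQR away."""
    iqr = q3 - q1
    labels = {}
    for x in data:
        distance = max(q1 - x, x - q3)   # distance beyond the nearest quartile
        if distance > 3 * iqr:
            labels[x] = "extreme"
        elif distance > 1.5 * iqr:
            labels[x] = "mild"
    return labels

# Quartiles from the 10-number example: Q1 = 29, Q3 = 35, so IQR = 6,
# lower limit = 29 - 9 = 20 and upper limit = 35 + 9 = 44.
data = [25, 28, 29, 29, 30, 34, 35, 35, 37, 38]
print(classify_outliers(data, q1=29, q3=35))

# Two hypothetical extra observations, added only to show the labels.
print(classify_outliers(data + [50, 60], q1=29, q3=35))
```

The first call finds no outliers, which is why the whiskers in the example boxplot run all the way to 25 and 38.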
Boxplots are also useful in indicating the shape of the distribution of the data, i.e., whether
the data are symmetrical or skewed. When the median sits exactly in the centre of the box and
the whiskers on both sides are of equal length, we can say that the distribution is
symmetrical. However, when the median lies towards the right end of the box and the
right whisker is shorter than the left one, the distribution is said to be left skewed, and vice
versa. The following figure represents the relationship between the shape of distribution and
boxplots:
Figure 9: Boxplot and shape of distribution

We can easily compare multiple datasets using boxplots. To do so, we draw the boxplots
adjacent to each other on the same scale. For instance, suppose a researcher is studying
the duration of Indian classical songs and rap songs. She collects data on 100 classical and
rap songs and draws the two boxplots given below:
Figure 10: Comparative boxplot of duration of Indian classical songs and rap songs
Try to interpret both boxplots yourself. What differences can you observe between
the two plots? What do these differences signify?
Let us begin by interpreting the median. We can clearly see that the median length of classical
songs is higher than that of rap songs: the median length of classical songs is 4.8
minutes, whereas the median length of rap songs is 4 minutes. Next, interpret the length of the
box, i.e. the interquartile range. The middle 50% of the data lies within this range. So, we can say
that half of the classical songs are 4.40 to 5 minutes long, whereas half of the rap songs are
3.60 to 4.40 minutes long. We can also interpret the range of the data, i.e., the difference
between the maximum and minimum values. For Indian classical songs, the range is 5.20 – 3.80
= 1.4 minutes, whereas for rap songs the range is 4.80 – 3.20 = 1.6 minutes. So, we can
say that the length of rap songs is more variable than that of classical songs. Finally, we can observe
the shape of the distributions by studying the location of the median value inside the box. The
median length of Indian classical songs is towards the right end of the box, indicating that the
distribution is left skewed. This means that most of the observations lie towards the right of
the mean. In other words, we can say that most of the Indian classical songs in the data have a
longer duration. In contrast, the median length of rap songs lies right in the middle of the box.
This means that the distribution is quite symmetric.
IN-TEXT QUESTIONS
22. The following boxplots illustrate the data about the ages of actors and actresses who
have won the National Film Award since 1967.
Mark all the statements that support the data shown by the boxplots:
A. The first quartile age of Best Actor winner is less than the last quartile age of
Best Actress winner
B. The minimum age of Best Actor winner is equal to the minimum age of Best
Actress winner
C. The range of age of Best Actor winner is higher than the range of age of Best
Actress winner
D. Both the distributions are left skewed
23. Inspired by his statistics class; a student started maintaining a record of the number of
minutes he was late to enter the classroom every day. He recorded the time for 15 days
and the following list displays the data (in minutes):
19, 12, 9, 7, 17, 10, 6, 18, 9, 14, 19, 8, 5, 17, 9
Which of the given boxplots accurately depicts the data?
[Boxplot options A, B, C and D]
3.8 SUMMARY
This lesson focused on the quantitative features of data in terms of measures of central tendency
and variability and their applications. Measures of central tendency refer to a typical or central
value of the data around which the data is generally clustered. Arithmetic mean is the average
value of the dataset which can be calculated by adding the value of all the observations and
dividing the sum by the total number of observations. Since the mean is affected by extreme values,
the median is often preferred when such values are present. The median is the middle observation
in the data when the data is arranged in ascending or descending order of magnitude.
When the distribution is symmetric, the three measures of central tendencies converge
at the same point. Quartiles divide the data set into 4 equal parts. Percentiles denote the
observation below which a particular percentage of observations fall. Deciles divide a dataset
into 10 equal parts. Since arithmetic mean is affected by extreme values, trimmed mean is used
to eliminate the extreme observations from analysis. Under measures of variability, range is
the simplest measure. It is the difference between the largest and the smallest value in the
dataset. Inter-quartile range is calculated by taking the difference between the upper and lower
quartile. Variance is calculated by dividing the sum of the squared deviations from the mean
by (n – 1). Standard deviation is simply the square root of variance. Degrees of freedom refers
to the maximum number of independent values that have the freedom to vary in a sample.
Coefficient of variation is a relative measure of variation. The mean is affected by both a change in
origin and a change in scale, whereas the standard deviation is affected only by a
change in scale. Skewness refers to the lack of symmetry in a distribution. In case of a right or
positively skewed distribution, the value of mean is the largest, followed by the median and
mode. The opposite is true in the case of a left or negatively skewed distribution, i.e., the value
of mode is the largest and mean has the smallest value. Boxplots are capable of illustrating the
central values, variability in the data, the extreme values (outliers) as well as the shape of the
distribution. The boxplots are extremely useful to compare two datasets and identify outliers.
3.9 GLOSSARY
• Boxplot: Graphical representation of measure of central tendency, variability and
skewness of numerical data using quartiles
• Change in Origin: Addition or subtraction of a constant value from all observations in the
dataset
• Change in Scale: Multiplying or dividing all the observations in the dataset with a constant
term
• Coefficient of Variation: Relative measure of variation
• Deciles: Divide dataset into 10 equal parts
• Degree of Freedom: Maximum number of independent values, that have the freedom to
vary
• Inter-Quartile Range: Difference between the upper and lower quartile
• Mean: Average value of the dataset
• Median: Middle observation in the data when the data is arranged in ascending or
descending order
• Measures of Central Tendency: Certain measurements that give a typical or central value
from the data
• Measures of Variability: Represent spread of data around averages
• Mode: Most frequently occurring value in the dataset
• Percentiles: That observation below which a particular percentage of observations fall
• Quartile: Divide the data set into 4 equal parts
• Range: Difference between largest and smallest value in the dataset
• Skewness: Lack of symmetry in distribution
• Standard Deviation: Square root of variance
• Trimmed Mean: Mean computed after eliminating the extreme observations
• Variance: Sum of the squared deviations from the mean divided by (n – 1)
3.10 ANSWERS TO IN-TEXT QUESTIONS
1. 4.5
2. D. 35
3. B. x = 9; Observations = 11 and 17
4. 51
5. 17.65 million dollars

6. C. 18
7. A. 40
B. False, due to extreme values
C. 16
D. True
8. A. 66
9. 15. Unsymmetrical
10. A. 16. B. No.
11. 81
12. B. Median
13. C) 0.000144
14. 8.25
15. A. All students scored 75 marks
16. True
17. B) 2
18. 60 kg
19. C) K
20. A. Positively Skewed
21. - 0.20
22. A, D
23. B
3.11 SELF-ASSESSMENT QUESTIONS
Q.1 The mean of four numbers is 37. The mean of the smallest three numbers is 34.
If the range of the data is 15, what is the mean of the largest three numbers?
Q.2 The mean and variance of 10 observations are 4 and 2, respectively. If each observation
is multiplied by 2, calculate the mean and variance of the new data.
Q.3 If V is the variance and M is the mean of the first 10 natural numbers, then what is the value
of V + M²?
Q.4 Let ‘x’ be the median of the following observations:
33, 42, 28, 49, 32, 37, 52, 57, 35, 41
If 32 is replaced by 36 and 41 is replaced by 63, then the median obtained is ‘y’.
Calculate the value of (x + y).
Q.5 The mean of 5 observations is 3 and variance is 2. If three of the five observations are
1, 3, 5, find the other two.
Q.6 Show that for the given (n+1) numbers,
If the average monthly spending by 21 women in a kitty group was Rs. 5240, what is
the new average spending if another member is added whose average monthly spending
is 5540? Use the formula above to answer.
Q.7 The sum of deviations of a certain number of observations from 12 is 166 while the
sum of deviations of these observations from 16 is (-54). Find the number of
observations and their mean.
Q.8 Consider the following data that depicts the daily income of two ice cream sellers in
two different regions:
Region    Day 1    Day 2    Day 3    Day 4    Day 5    Day 6
A         500      900      800      900      700      400
B         300      540      480      540      420      240
Calculate the coefficient of variation for each region and compare your results.
Q.9 If the coefficient of skewness of a distribution is 0.32, the standard deviation is 6.5 and
the mean is 29.6 then find the mode of the distribution.
Q.10 There are 40 students in a class preparing for a statistics test. There are two strategies
that these students can adopt to ace the test. After the test, the professor interviewed the
students and noted down their strategies. The professor observed that 20 students
followed strategy A and the other 20 followed strategy B. Given below are the marks
of the students based on the strategies they adopted:
Strategy A: 78, 78, 79, 80, 80, 82, 82, 83, 83, 86, 86, 86, 86, 87, 87, 87, 88, 88, 88, 91
Strategy B: 66, 66, 66, 67, 68, 70, 72, 75, 75, 78, 82, 83, 86, 88, 89, 90, 93, 94, 95, 98
Write down the 5-number summary for both the strategies. Create a boxplot for each
strategy using the 5-number summary and comment on the shapes of the boxplots. According to
you, which strategy is more likely to fetch you good marks in the test and why?
LESSON 4
SAMPLE SPACE, EVENTS, AND PROBABILITY
STRUCTURE

4.2 Introduction
4.3 Sample and Population
4.3.1 Statistical or Random Experiments
4.3.2 Sample Point, Event
4.3.3 Population or Sample Space of an Experiment
4.3.4 Events, Set Theory and Venn Diagrams
4.3.5 De Morgan’s laws
4.4 Probability
4.4.1 Classical Definition of Probability
4.4.2 Relative Definition of Probability (by Von Mises)
4.4.3 Axiomatic Definition of Probability
4.5 Summary
4.6 Glossary
4.9 References
4.10 Suggested Readings
4.1 LEARNING OBJECTIVES
1. To understand the concept of sample space and population and their significance.
2. To comprehend the need for the concept of sample and population in the context of
probability.
3. To understand the concept of probability in the context of random experiments.
4. To visualize the applications of probability in real life and understand the definition
of probability.
5. To be able to differentiate between the sample space, events, sample points, and
random experiments.
6. To get familiarized with the technique of the Venn diagram, its usage in defining
events, types of events and
7. To understand the properties of probability and various operations to comprehend the
working of probabilities.
4.2 INTRODUCTION
This unit introduces the concept of ‘probability’. Probability indicates the presence of
randomness and the existence of some element of uncertainty. Whenever we face a situation in
which more than one outcome is possible, the concept of probability provides a technique for
quantifying the chance or likelihood associated with every possible outcome. There are several
instances that involve chance, and to which the notion of probability is therefore applicable.
For example, in political elections, exit polls make it possible to predict that a certain political
party is likely to come to power. Using data from previous days and considering parameters such
as temperature, humidity and pressure, meteorologists use specific tools and techniques to
forecast the weather and determine, for instance, that there is a 60 out of 100 chance that it
will rain today.
Another example from day-to-day life is the statement ‘since it is supposed to rain tomorrow, it
is very likely I will use my raincoat when I go to work’. Similarly, in flipping a fair coin the
probability of getting a head (or a tail) is 0.5, and in rolling a die there is a one in six
chance that the required number will come up. Thus, the concept of probability can be applied to
several interesting events.
Probability is a mathematical term, and the study of probability as a branch of mathematics is
over 300 years old. This chapter enables students to understand and estimate the likelihood of
various events and outcomes. Various elementary concepts used in understanding probability
will be discussed and explained, such as sample, population, random experiments, Venn
diagrams, sample points, events and types of events.
4.3 SAMPLE AND POPULATION

The discipline of Statistics deals with organizing and summarizing data and with drawing
conclusions based on the information collected in the form of data. An investigation or
experiment typically focuses on a well-defined collection of objects, which constitutes what is
known as a ‘population’.
There can be several types of population. A study of a particular type of medicine might involve
the population of all capsules of that medicine produced during a specified period. Another
investigation might involve a population consisting of students enrolled in BA (Honours)
Economics. If the desired information is available for all the objects in the population, we have
what is called a ‘census’.
A subset of the population is called a ‘sample’. A sample is selected in some prescribed
manner. “Sample is a means to an end rather than the end itself”. Techniques for
generalizing from a sample to a population are gathered within the branch of our discipline
called “Inferential Statistics”.
Figure 1: Relationship between Population and Sample, ‘a two-way process’: probability (deductive reasoning) runs from population to sample; inferential statistics (inductive reasoning) runs from sample to population.

As the figure above shows, inference can run in either direction between a population and a
sample. There are two fundamental approaches to inference: deductive and inductive reasoning.
When a sample is drawn from a given population whose properties are known, the concept of
probability is used to infer what the sample is likely to look like. This method of inference is
called deductive reasoning. However, when the sample is used to draw conclusions about the
population, inferential statistics is deployed. This technique is referred to as ‘inductive
reasoning’. Thus, the role of probability is explicit and well-defined: it plays a critical role in
reasoning about a sample derived from the population, and it is crucial in the deductive method
of statistical inference or research.
Having understood the difference between the sample and the population, the relationship
between them, and the role they play in statistical inference, it is crucial to understand the
kinds of experiments through which data are collected.
4.3.1 Statistical or Random Experiments
Any activity or process whose outcome is subject to uncertainty is considered an experiment.
The word ‘experiment’ generally suggests carefully controlled or planned testing in a
laboratory. However, in the discipline of statistics, experiments cover a wider range of trials,
such as tossing a coin once or several times, selecting a card from a deck, obtaining a
particular blood type from a group of individuals, etc.
Any process of observation or measurement that has more than one possible outcome and for
which there is uncertainty about which outcome will actually materialize is referred to as a
‘random experiment’. For example, tossing a coin, throwing a pair of dice, or drawing a card from
a deck of cards.
4.3.2 Sample Point, Event
Each member or outcome of a sample space or population is called a sample point. It is also
called an element of the sample space. Any collection of sample points, i.e. any subset of the
sample space, is called an event. Let us consider the example of the toss of a coin, for which
the sample space is S = {H, T}. The number of elements in the sample space or population is
n(S) = 2. Each element of the sample space, that is, H and T, is a sample point. In general,
n(S) denotes the number of sample points in the sample space.
In a random experiment of the toss of a coin, suppose event A denotes the event that a Head
appears: A = {H}. The number of elements in event A is 1, denoted by n(A) = 1.
Consider an event B defined as ‘Tail appears’: B = {T}. The number of elements in event B is 1,
denoted by n(B) = 1.
Together, A and B make up the sample space: A ∪ B = {H} ∪ {T} = {H, T} = S
Let us consider another example of tossing two fair coins. The sample space or population for
this experiment is given by S = {HH, HT, TH, TT}.
The number of elements in the sample space is 4, denoted by n(S) = 4.
Consider an event B that at least one head appears when two coins are tossed
simultaneously. Event B can be represented as
B = {HH, HT, TH}.
The number of elements in event B is 3, represented by n(B) = 3.
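The sample space and the event B above can also be listed mechanically. The sketch below uses `itertools.product` from Python's standard library; the variable names are illustrative.

```python
from itertools import product

# Sample space for tossing two coins: every ordered pair of H and T.
sample_space = {"".join(outcome) for outcome in product("HT", repeat=2)}

# Event B: at least one head appears.
event_b = {point for point in sample_space if "H" in point}

print(sorted(sample_space), len(sample_space))
print(sorted(event_b), len(event_b))
```

An event is simply a subset of the sample space, so n(B) can never exceed n(S).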
Trial & Events: When an experiment is repeated under essentially identical conditions, it does
not give a unique result but may result in any one of several possible outcomes. The experiment
is called a Trial and the outcomes are called Events. For example, tossing a coin once is a trial,
and getting a Head or a Tail is an event. Planting a sapling is a trial and whether it survives or
dies is an event. Sitting for an examination is a trial and getting a grade such as A, B, C, D or E
is an event.
Exhaustive Events: All possible outcomes of an experiment collectively constitute the exhaustive
events. For example, tossing a coin results in two exhaustive cases, Head and Tail.
Planting a sapling leads to two exhaustive cases, Survival and Death. Sitting for an
examination in which only 5 grades are awarded results in 5 exhaustive cases.
Favourable Events: The outcomes of an experiment that correspond to the event of interest
are called favourable events. For example, in a game of cards where every draw decides the
winner or loser, a gambler betting on an Ace has 4 favourable events, and one betting on a
black card has 13 + 13 = 26 favourable events.
Mutually Exclusive Events: Events are said to be mutually exclusive if the happening of one
event prevents the occurrence of the other events at the same time. Such events are also
referred to as disjoint events since they have no element in common. For example, in an
athletics meet involving 10 challengers, if any one of them wins then none of the remaining 9
can win, and hence the events are mutually exclusive. Similarly, in a toss of a coin, the
occurrences of Head and Tail are mutually exclusive.
Equally Likely Events: Two events are said to be equally likely if one of them is as likely to
happen as the other. For example, in tossing a fair coin once, the outcomes Head and Tail are
equally likely. In a throw of a six-faced die, all the six numbers 1, 2, 3, 4, 5, 6 are equally likely. If
a person suffers a minor heart attack, the death and survival outcomes are not equally likely.
Independent Events: If the happening of one event is not affected by the happening (or not
happening) of another event, such events are said to be independent. For example, when darts
are thrown successively at a dartboard, getting a perfect score on one throw and getting a
perfect score on another are independent events. However, if a person throws the dart once,
practises, and then throws it a second time, the events of getting a perfect score on the two
throws are not independent.
Example 1: Trial: Tossing of one fair coin
Events: Occurrence of Head, occurrence of Tail
Exhaustive events: Head and Tail
Mutually exclusive events: Head and Tail
Equally likely events: Head and Tail
Example 2: Trial: Tossing of two fair coins
Events: Occurrence of two Heads
        Occurrence of one Head
        Occurrence of zero Heads
Exhaustive events: HH, HT, TH, TT
Favourable events: a) HH (two Heads)
                   b) HT, TH (one Head)
                   c) TT (zero Heads)
Mutually exclusive events: Occurrence of two Heads and occurrence of two Tails
Equally likely events: a) Getting at least one Head (HH, HT, TH) is equally
likely as getting at least one Tail (TT, TH, HT).
b) Getting two Heads is not equally likely as getting at least one Head.
Independent events: Getting a Head in the second toss is independent of
getting a Head in the first toss.
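The claims in Example 2 can be checked by direct enumeration. The sketch below assumes the classical definition of probability (favourable outcomes divided by total outcomes, with all four outcomes equally likely), which this lesson introduces in Section 4.4.1. The independence check uses the product rule P(A and B) = P(A) × P(B), which is an assumption brought in for illustration and is not stated in the text above.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product("HT", repeat=2))   # ('H', 'H'), ('H', 'T'), ...

def prob(event):
    """Classical probability, assuming all four outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

two_heads = {s for s in sample_space if s.count("H") == 2}
two_tails = {s for s in sample_space if s.count("T") == 2}
at_least_one_head = {s for s in sample_space if "H" in s}
at_least_one_tail = {s for s in sample_space if "T" in s}
head_first = {s for s in sample_space if s[0] == "H"}
head_second = {s for s in sample_space if s[1] == "H"}

print("Mutually exclusive:", two_heads.isdisjoint(two_tails))
print("Equally likely:", prob(at_least_one_head) == prob(at_least_one_tail))
print("Not equally likely:", prob(two_heads), "vs", prob(at_least_one_head))
print("Independent:",
      prob(head_first & head_second) == prob(head_first) * prob(head_second))
```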
4.3.3 Population or Sample Space of an Experiment
The set of all possible outcomes of an experiment is called the Population or simply the Sample
Space, denoted by S. Let us consider an example of tossing one fair coin. This is a random
experiment since it has two possible outcomes: a head or a tail can appear in a single toss of a
fair coin. For such an experiment the total number of outcomes is two, therefore
The sample space: S = {H, T},
The number of elements in the sample space or population is n(S) = 2
Let us consider another example of tossing two fair coins. Either both tosses result in a Head,
or both tosses result in a Tail, or the first toss results in a Head while the second results in a
Tail, or the first toss results in a Tail and the second results in a Head. The sample space or
population for this experiment is therefore given by
Sample Space: S = {HH, HT, TH, TT}
The number of elements in the sample space or population is 4, n(S) = 4.
Consider another example of rolling a die,
Sample space S = {1,2,3,4,5,6},
The number of elements in the sample space is n(S) = 6
If the same die is rolled twice, the sample space consists of ordered pairs and n(S) = 6² = 36.
If it is rolled thrice, the sample space consists of ordered triples and n(S) = 6³ = 216.
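The sample spaces listed above can be enumerated mechanically. The following is a minimal Python sketch that uses itertools.product from the standard library; the helper name sample_space is hypothetical and is introduced only for illustration.

```python
from itertools import product

def sample_space(faces, trials):
    """Return every ordered outcome of repeating one trial `trials` times."""
    return list(product(faces, repeat=trials))

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Two coins: join each ordered pair into a string such as "HT"
two_coins = ["".join(outcome) for outcome in sample_space(coin, 2)]
print(two_coins)                     # ['HH', 'HT', 'TH', 'TT']

# One, two and three rolls of a die: n(S) = 6, 6^2, 6^3
print(len(sample_space(die, 1)))     # 6
print(len(sample_space(die, 2)))     # 36
print(len(sample_space(die, 3)))     # 216
```

Each additional trial multiplies the number of outcomes by the number of faces, which is why n(S) grows as a power of 6 for the die.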
A CASE STUDY
Consider again the random experiment of rolling a die. The sample space is given by
S = {1,2,3,4,5,6},
The number of elements in the sample space is 6, denoted by n(S) = 6,
Let E be the event that an even number appears on the die, represented by
E = {2, 4, 6},
The number of elements in event E is 3, represented by n(E) = 3
There are several varieties of events as described in the next section.
IN-TEXT QUESTIONS
1. Events are said to be _____________ if the occurrence of one event prevents the
occurrence of another event at the same time.
2. If event A represents the event that at least one head appears, and event B represents the
event that only tails appear, then events A and B are equally likely. (True/False)
3. In a single throw of a coin, the event {Head} and the event {Tail} are disjoint. The two
events are called
a) Mutually exclusive b) Equally likely c) Both
4. In an experiment consisting of tossing two coins, if event A represents the event that at
least one Head occurs and event B represents the event that at least one Tail occurs, then
a) Events A and B are equally likely (True/False)
b) Events A and B are mutually exclusive (True/False)
c) Events A and B together form an exhaustive set (True/False)
4.3.4 Events, Set theory and Venn Diagrams
An event can be considered a set; therefore the relationships and results from elementary set
theory can be used to study events of any random experiment. Some of the fundamental
operations of set theory that can be applied to events are as follows.
1. The complement of an event A is denoted by A'. The complement A' is the set of all
outcomes in the sample space S that are not contained in A.
2. The union of the two events A and B is denoted by A ∪ B and is read as “A or B”. The
union consists of all outcomes that are in at least one of the events, that is, outcomes
for which both A and B occur as well as outcomes for which exactly one occurs.
3. The intersection of the two events A and B, denoted by A ∩ B, is read as “A and B”.
The intersection of two events is the event consisting of all outcomes that are in
both A and B.
4. A null event is an event consisting of no outcomes whatsoever and is denoted by ∅.
If two events A and B are such that A ∩ B = ∅, then A and B are
said to be mutually exclusive or disjoint events.
4.3.5 De Morgan’s laws
a. The complement of the union of events A and B is equal to the intersection of the
complement of A and the complement of B.
(A ∪ B)' = A' ∩ B'
b. The complement of the intersection of events A and B is equal to the union of the
complement of A and the complement of B.
(A ∩ B)' = A' ∪ B'
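The set operations and De Morgan's laws can be checked on a small example. The sketch below uses Python's built-in set type for one roll of a die; the events A and B and the helper complement are illustrative choices, not taken from the text.

```python
S = {1, 2, 3, 4, 5, 6}     # sample space: one roll of a die
A = {2, 4, 6}              # illustrative event: an even number
B = {1, 2, 3}              # illustrative event: a number less than 4

def complement(event, space=S):
    """Outcomes of the sample space that are not in the event."""
    return space - event

union = A | B              # outcomes in at least one of A, B
intersection = A & B       # outcomes in both A and B

# De Morgan's laws: (A ∪ B)' = A' ∩ B'  and  (A ∩ B)' = A' ∪ B'
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)

print(sorted(union), sorted(intersection))   # [1, 2, 3, 4, 6] [2]
print(de_morgan_1, de_morgan_2)              # True True
```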
The events can be represented by using the Venn diagram as shown in the diagrams below.
Fig 1: Population or Sample Space
Fig 2: Events A and B are disjoint
Fig 3: Events A and B are not disjoint (the overlapping region is A ∩ B)

All elements in the sample space belong to the rectangle that represents the entire population,
as shown in figure 1. In figure 2, event A is represented by the orange oval and event B by the
blue oval inside the rectangle. The rectangle is the population or sample space, and events A
and B are subsets of the sample space. Here events A and B have nothing in common; such events
are referred to as disjoint events or mutually exclusive events. In figure 3, events A and B are
again represented by the orange and blue ovals inside the rectangle, but in this case they have
common elements; therefore, events A and B are not disjoint sets.
IN-TEXT QUESTIONS
1. Consider an experiment in which each of the three vehicles taking a particular freeway
exit turns left (L) or right (R) at the end of the exit ramp. Outline the sample space and
events.
2. The two events E1 and E2 are mutually exclusive, where E1 is the event consisting of
numbers less than 3 and E2 is the event that consists of numbers greater than 4. (True/
False)
3. If the two events have some common elements, the two events are not ____________.
4.4 PROBABILITY
In the realm of random experiments, the objective of probability is to assign a number P(A) to
any event A. This value P(A) is called the probability of event A, and it gives a precise
measure of the chance that the event will occur.
In other words, probability is the chance of occurrence of an event, as in statements such as
“it might rain today”, “team X will probably win today”, or “I may win the lottery”. Broadly,
probability is a measure of uncertainty.
4.4.1 Classical Definition of Probability
It is also called the a priori or mathematical definition of probability. The probabilities are
derived from purely deductive reasoning. This implies that one does not need to throw a coin to
state that the probability of obtaining a head, or a tail, is ½. However, there are cases where
the possibilities that arise cannot be regarded as equally likely, for example, the probability
of a recession next year or the probability of a particular GDP value next year. Similarly, the
outcomes of whether it will rain, or of an election, are not equally likely.
Suppose an experiment results in mutually exclusive and equally likely outcomes. If m outcomes
are favourable to event A and n is the total number of outcomes in the sample space, then
$$P(A) = \frac{m}{n} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes}} = \frac{\text{favourable number of events}}{\text{exhaustive number of events}} = \frac{n(A)}{n(S)}$$
In a single throw of a die, the total number of outcomes in the sample space is n = 6. All are
mutually exclusive and equally likely.
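The classical definition amounts to counting. The sketch below is a minimal illustration, assuming equally likely outcomes; the function name classical_probability is hypothetical, and Fraction is used so that results stay as exact ratios.

```python
from fractions import Fraction
from itertools import product

def classical_probability(is_favourable, space):
    """P(A) = n(A) / n(S); valid only when all outcomes are equally likely."""
    favourable = [outcome for outcome in space if is_favourable(outcome)]
    return Fraction(len(favourable), len(space))

# One throw of a die: probability of an even number
die = [1, 2, 3, 4, 5, 6]
p_even = classical_probability(lambda x: x % 2 == 0, die)

# Two coins: probability of at least one head
two_coins = list(product("HT", repeat=2))
p_at_least_one_head = classical_probability(lambda o: "H" in o, two_coins)

print(p_even)                 # 1/2
print(p_at_least_one_head)    # 3/4
```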
4.4.2 Relative Definition of Probability (by Von Mises)

If a trial is repeated a large number of times under essentially homogeneous and identical
conditions, then the limiting value of the relative frequency, which is the ratio of the number
of times the event occurs (m) to the total number of trials (n), is called the probability of
happening of the event.
$$P(A) = \lim_{n \to \infty} \frac{m}{n}$$
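The limiting behaviour of the relative frequency can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: a fair coin is simulated with Python's pseudo-random generator, the seed is fixed so that runs are repeatable, and the function name relative_frequency is hypothetical. A finite simulation can suggest the limit but cannot prove it.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Relative frequency m/n of heads in n simulated tosses of a fair coin."""
    rng = random.Random(seed)            # fixed seed: repeatable illustration
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

# As n grows, m/n is expected to settle close to 0.5
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```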
IN-TEXT QUESTIONS
6. In a toss of two coins simultaneously, find the probability of getting exactly 2 heads, using
P(E) = number of favourable outcomes / total number of outcomes.
7. In a toss of three coins simultaneously, find the probability of getting exactly two heads.
8. What is the probability of getting at least 1 head when two coins are tossed
simultaneously?
9. Find the probability of getting at most 2 tails when three coins are tossed simultaneously.
10. Find the probability of getting at least 2 heads when three coins are tossed simultaneously.
11. Find the probability of getting more tails than heads when three coins are tossed
simultaneously.
4.4.3 Axiomatic Definition of Probability
The axiomatic approach to probability was provided by the Russian mathematician A.N.
Kolmogorov and includes both the above definitions. In order to ensure that the assignment of
probabilities P(A) to events in the sample space S is consistent with the intuitive notion of
probability, all assignments of probability P(A) must satisfy the following properties or
Axioms.
1. For any event A, the probability of event A is non-negative, P(A) ≥ 0. In other words,
the probability that event A will occur can either be zero or some positive
number. The probability of event A can never be negative.
Axiom 1 reflects the intuitive notion that the chance of A occurring should be non-negative
and is known as the Axiom of non-negativity.
2. The probability of the entire sample space is 1, that is P(S) = 1. In other words, the
probability that the entire sample space will occur is 100 percent, which means it will
surely occur. This is known as the Axiom of Certainty.
The sample space by definition is the event that must occur when the experiment is
performed. The sample space S contains all possible outcomes, therefore the maximum
possible probability is assigned to sample space S.
3. If A1, A2, A3, … is an infinite collection of disjoint events, then
$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = \sum_{i=1}^{\infty} P(A_i)$$
This indicates that the probability of the union of disjoint events belonging to the sample
space is the sum of the probabilities of the individual events.
The third Axiom formalizes the idea that if we wish to find the probability that at least one of
a number of events will occur, given that no two events can occur simultaneously, then the
chance of at least one occurring is the sum of the chances of the individual events. This is
known as the axiom of additivity; applied to a finite collection of disjoint events, it gives
finite additivity.
4. The probability of an event always lies between 0 and 1.
0 ≤ P(A) ≤ 1,
P(A) = 0 means event A will not occur.
P(A) = 1 means event A will occur certainly.
5. Let ∅ be the null event, that is, the event containing no outcomes whatsoever. The
following property is a consequence of Axiom 3 applied to a collection of disjoint events.
P(∅) = 0, the probability of a null event is zero.
6. If A, B, C, … are mutually exclusive events, the probability that any one of them will
occur is equal to the sum of the probabilities of their individual occurrences.
P(A+B+C+…) = P(A ∪ B ∪ C ∪ …) = P(A) + P(B) + P(C) + …
7. If A, B, C, … are a mutually exclusive and collectively exhaustive set of events, the
sum of the probabilities of their individual occurrences is 1. Separately, events A, B, C, …
are said to be statistically independent if the probability of their occurring together
is equal to the product of their individual probabilities, P(A∩B∩C) = P(A) P(B) P(C).
Here P(A∩B∩C) is the probability of events A, B, and C occurring together (jointly or
simultaneously), also referred to as the joint probability.
P(A), P(B), and P(C) are called unconditional, marginal, or individual probabilities.
8. If events A and B are not mutually exclusive, then
P(A+B) or P(A ∪ B) = P(A) + P(B) - P(A∩B)
where P(AB) is the joint probability that the two events occur simultaneously, that is,
P(A∩B). However, if A and B are mutually exclusive, then
P(A∩B) = P(∅) = 0
For every event A, there is an event A', called the complement of A, such that
P(A + A') = P(A ∪ A') = P(S) = 1
P(A A') = P(A ∩ A') = P(∅) = 0
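The addition rule and the complement properties can be verified by counting on a single roll of a die. The sketch below assumes equally likely outcomes; the events A and B and the helper P are illustrative and not taken from the text.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space: one roll of a die

def P(event):
    """Classical probability n(A)/n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {2, 4, 6}                     # illustrative event: an even number
B = {3, 6}                        # illustrative event: a multiple of 3

# Addition rule for events that are not mutually exclusive
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)                   # 2/3 2/3

# Complement: P(A ∪ A') = 1 and P(A ∩ A') = 0
A_complement = S - A
print(P(A | A_complement))        # 1
print(P(A & A_complement))        # 0
```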
IN-TEXT QUESTIONS
12. An unbiased dice is thrown. What is the probability of getting

(i) a multiple of 3
(ii) a number less than 5
(iii) an even prime number
(iv) a prime number
(v) a factor of 6
13. A dice is thrown once, find the probability of getting
a) An odd number
b) A multiple of 3
c) A factor of 5
14. Two dice are thrown together, find the probability of getting
a) An even number on both
b) Sum as a perfect square
c) Different numbers on both
d) A total of at least 10
e) Sum as a multiple of 3
f) A multiple of 2 on one and a multiple of 3 on other
g) Sum as an even number
4.5 SUMMARY
This lesson familiarized the students with the basic concepts of sample space and population
along with their significance. The notion of probability was introduced with the help of random
experiments. Various applications of probability in real life are presented in the chapter.
Certain important concepts related to probability, such as sample space, events, sample points,
and random experiments, are described in the chapter. The basic differences between the sample,
population, sample points, and events have been emphasized. The types of events, such as
disjoint (mutually exclusive) events and collectively exhaustive events, have been explained.
Further, the concept of the Venn diagram is also presented in the chapter. The notion of
probability using the classical and relative frequency definitions has been introduced. Finally,
the properties of probabilities are discussed in the chapter.
4.6 GLOSSARY
1. Sample: “Sample is a means to an end rather than the end itself”.

2. Population: An investigation or experiment that results in a well-defined collection
of objects, constitutes what is known as ‘Population’.
3. Deductive Reasoning: When a sample is derived from the given population, then the
concept of probability is used to infer anything regarding the population. This method
of inference is called deductive reasoning.
4. Inductive Reasoning: When the sample is used to infer something about the population,
inferential statistics is deployed for the inference. The technique is referred
to as ‘inductive reasoning’.
5. Random Experiment: Any process of observation or measurement that has more than
one possible outcome and for which there is uncertainty about which outcome will
actually materialize. Such an experiment is referred to as ‘random experiment’.
6. Sample Point or Event: Each member or outcome of sample space or population is
called Sample Point. It is also called an element of sample space.
7. Mutually Exclusive: Events are said to be mutually exclusive if the occurrence of one
event prevents the occurrence of another event at the same time. Such events are also
referred to as disjoint events since they have no element in common.
8. Equally Likely: Two events are said to be equally likely if one event is as likely to
occur as the other.
9. Collectively Exhaustive: The events are collectively exhaustive if the events exhaust
all possible outcomes of an experiment.
10. De Morgan’s Law: The complement of the union of two sets A and B is equal to
the intersection of the complement of the sets A and B. This is De Morgan’s
first law.
1. Mutually Exclusive
2. False
3. Both
3. The sample space S = {LLL, RLL, LRL, LLR, LRR, RLR, RRL, RRR}
The event that exactly one of the three vehicles turns right: A
The elements in event A: {RLL, LRL, LLR}
The event that at most one of the vehicles turns right: B
The elements in the event B: {LLL, RLL, LRL, LLR}
The event that all three vehicles turn in the same direction: C
The elements in the event C: {LLL, RRR}
4. E1 = {1,2}, E2 = {5,6}. The two events are mutually exclusive. True
5. Disjoint
6. ¼
7. 3/8
8. ¾
9. 7/8
10. ½
11. ½
12. Total number of possible outcomes = 6= n(S)
(i) a multiple of 3
Number of favorable outcomes = 2 {3 and 6}
Hence P (getting multiple of 3) = 2/6 = 1/3
ii) a number less than 5
Number of favorable outcomes = 4 {1, 2, 3, 4}
Hence, P (getting number less than 5) = 4/6 = 2/3
iii) an even prime number
Number of favorable outcomes = 1 {2}
Hence, P (getting an even prime number) = 1/6
iv) a prime number
Number of favorable outcomes = 3 {2,3,5}
Hence the P (getting a prime number) = 3/6 = 1/2
v) a factor of 6
Number of favorable outcomes= 4 {1, 2, 3, 6}
Hence, P (getting a factor of 6) = 4/6 = 2/3
13. a) 1/2 b) 1/3 c) 1/3
14. a) 1/4 b) 7/36 c) 5/6 d) 1/6 e) 1/3 f) 11/36 g) ½
1. Two six-faced dice are rolled together, or dice is rolled twice. The total number of
possible outcomes are 36.
2. (i) Prove that the probability of null event is zero, P (∅) = 0.
(ii) Prove that for any two events A and B
P(AUB) = P(A) +P(B) - P(AB)
4.9 REFERENCES
Cengage Learning.
• Freund, J. E., Miller, I., & Miller, M. (2004). John E. Freund's Mathematical Statistics:
With Applications. Pearson Education India.
4.10 SUGGESTED READINGS
• Rice, J. A. (2006). Mathematical statistics and data analysis. Cengage Learning.
• Larsen, R. J., & Marx, M. L. (2005). An introduction to mathematical statistics.

Prentice Hall.
LESSON 5
CONDITIONAL PROBABILITY
STRUCTURE

5.2 Introduction
5.3 Conditional Probability
5.3.1 Computation of Conditional Probability
5.4 Bayes' Theorem
5.5 Independence of Events
5.6 Summary
5.7 Glossary
5.10 References
1. To understand the concept of conditional probability and its significance in real life.
2. To comprehend the significance of the initial assignment of probability that may be
followed by partial information relevant to the outcome.
3. To visualize that partial information may affect the assignment of probability. This
leads us to the concept of conditional probability and Bayes’ Theorem.
4. To comprehend the concept of Bayes’ Theorem and its applications.
5. To learn the computation of a posterior probability from the given prior probabilities
and conditional probabilities, which plays a critical role in Bayes’ Theorem.
6. To practice several cases and examples for the application of Bayes’ Theorem.
5.2 INTRODUCTION
In the previous chapter, we introduced the topic of probability. In this chapter, we expose
students to deeper concepts and situations related to probability. The probabilities assigned
to various events or occurrences depend on what is known about the experimental situation. The
initial assignment may be followed by partial information relevant to the outcome, and this
partial information may affect the assignment of probability. This leads us to the concept
of conditional probability and Bayes’ Theorem.
Conditional probability is a measure of the likelihood of an event occurring, given that
another event or outcome has already occurred. For instance, a student aims to receive an
academic scholarship while applying for admission. For every 1000 applications, the college
accepts 100 applications and awards an academic scholarship to 10 of every 500 accepted
students. Of the scholarship recipients, 50% receive university stipends for books, meals,
housing etc. As a result, the chance of a student being accepted and then receiving a
scholarship is 0.2%, given by 0.1 multiplied by 0.02. The chance of being accepted, receiving
the scholarship and then receiving the stipend for books etc. is 0.1%, given by 0.1 multiplied
by 0.02 multiplied by 0.5.
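The scholarship arithmetic can be written out step by step. The sketch below is a minimal illustration; it assumes, as the multiplication in the text does, that 10 of every 500 refers to accepted students, so that 0.02 is a conditional probability. The variable names are hypothetical.

```python
from fractions import Fraction

p_accept = Fraction(100, 1000)                    # P(accepted) = 0.1
p_scholarship_given_accept = Fraction(10, 500)    # P(scholarship | accepted) = 0.02
p_stipend_given_scholarship = Fraction(1, 2)      # P(stipend | scholarship) = 0.5

# Multiply along the chain of events
p_accept_and_scholarship = p_accept * p_scholarship_given_accept
p_all_three = p_accept_and_scholarship * p_stipend_given_scholarship

print(float(p_accept_and_scholarship))   # 0.002, i.e. 0.2%
print(float(p_all_three))                # 0.001, i.e. 0.1%
```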
In the realm of conditional probability, Bayes’ Theorem (also called Bayes’ Rule or Bayes’ Law)
is a mathematical equation used to calculate conditional probability. The computation of a
posterior probability from the given prior probabilities and conditional probabilities plays a
critical role in Bayes’ Theorem.
5.3 CONDITIONAL PROBABILITY
Conditional probability is defined as the likelihood of an event or an outcome occurring based
on the occurrence of a previous event or outcome. The conditional probability is conditional
upon the occurrence of some event that has happened earlier. The joint probability of the two
events is therefore computed by multiplying the probability of the preceding event by the
updated probability of the succeeding or conditional event.
For a particular event A, we have used P(A) to represent the probability of event A. The
probability P(A) can be considered an unconditional probability, which simply means that it
does not take into account the occurrence of any other event. Suppose now we introduce another
event B whose occurrence affects the probability assigned to event A. The probability of event
A given that event B has already occurred is denoted by P(A | B), also written P(A/B). This
represents the conditional probability of A given that event B has already occurred. Here, B is
the conditioning event.
For two events A and B,
P(AB) = P(A) * P(B | A), P(A) > 0
= P(B) * P(A | B), P(B) > 0
where P(B | A) represents the conditional probability of occurrence of B when A has already
occurred, and P(A | B) is the conditional probability of occurrence of A when B has already
occurred.
Let us take the example of the COVID-19 virus. Event A might refer to an individual being
infected with the COVID-19 virus, assessed in the presence of symptoms. Suppose a blood test is
then performed on that individual and the result is negative. Consider this situation as event
B, where B reflects a negative blood test. Given B, the probability of having COVID-19 will
change, that is, P(A | B) will differ from P(A). It is natural to think that the probability
will decrease, but it may not be zero because the blood test is not fully reliable.
As another example, let event A be that a student who applies to a college is accepted, and
suppose there is an 80% chance of acceptance, so P(A) = 0.8. Let event B be that the student is
given hostel accommodation in that college. Hostel accommodation is provided for only 60% of
all accepted students, so the probability that event B occurs given that event A has occurred
is P(B | A) = 0.6. The probability that a student is both accepted and given hostel
accommodation is P(AB) = P(A) * P(B | A) = 0.8 * 0.6 = 0.48.
5.3.1 Computation of Conditional Probability
If there are two events, A and B, the probability that event A occurs knowing that event B has
already occurred is called the conditional probability of A given B.
Conditional probability is denoted by P(A | B):
$$P(A \mid B) = \frac{P(AB)}{P(B)}, \quad P(B) > 0$$
where P(AB) is the joint probability of events A and B occurring together, also read as the
probability of events A and B, P(A∩B), or the probability of event A intersection B.
The computation of the conditional probability P(A | B) requires that P(B) is positive; the
probability of event B occurring cannot be zero.
Similarly, the conditional probability of B, given A, is denoted by P(B | A):
P(B | A) = P(AB) / P(A), P(A) > 0
P(AB) = P(B | A) * P(A), P(A) > 0
The conditional probability is expressed as a ratio of the unconditional probabilities. The
numerator is simply the probability of the intersection of the two events, whereas the
denominator is the probability of the conditioning event B. The conditional probability can be
represented by the following Venn diagram.
Figure 1: Conditional Probability

Suppose event B is given and has already occurred. As a result, the sample space is no longer
the entire population but consists only of the outcomes in B. Event A then occurs if and only if
one of the outcomes in the intersection A∩B occurs. Therefore, the conditional probability of A
given B is proportional to P(A∩B).
The factor 1/P(B) is the proportionality constant. It ensures that the probability of the new
sample space, P(B | B), is equal to 1.
For instance, a box contains 20 TVs, of which 5 are defective. If 3 of the TVs are selected at
random and removed from the box in succession without replacement, what is the probability
that all three TVs are defective?
Let A be the event that the first TV is defective, B the event that the second TV is
defective, and C the event that the third TV is defective. Then
P(A) = 5/20, P(B | A) = 4/19, P(C | A ∩ B) = 3/18
P(A ∩ B ∩ C) = P(A) * P(B | A) * P(C | A ∩ B) = 5/20 * 4/19 * 3/18 = 1/114
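The chain of conditional probabilities in the TV example can be cross-checked by direct counting. The sketch below computes the product above and compares it with the number of ways of choosing 3 defective TVs out of 5 divided by the number of ways of choosing any 3 TVs out of 20; the counting check is an added illustration, not part of the original solution.

```python
from fractions import Fraction
from math import comb

# Multiplication rule: P(A) * P(B | A) * P(C | A ∩ B)
p_chain = Fraction(5, 20) * Fraction(4, 19) * Fraction(3, 18)

# Counting check: choose 3 of the 5 defective TVs out of all 3-TV selections
p_count = Fraction(comb(5, 3), comb(20, 3))

print(p_chain, p_count)    # 1/114 1/114
```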
Points to remember
(a) Two possible mutually disjoint events are always dependent.
Proof: Let A and B be disjoint events, i.e. A ∩ B = Ø
So, P(A ∩ B) = 0
We know that P(A ∩ B) = P(A) * P(B | A), P(A) ≠ 0
= P(B) * P(A | B), P(B) ≠ 0
Since P(A) ≠ 0 and P(B) ≠ 0,
this implies P(B | A) = 0 ≠ P(B) and P(A | B) = 0 ≠ P(A),
which implies A and B are dependent events.
(b) Two possible and independent events cannot be mutually disjoint.
Proof: Let A and B be two independent events such that both are possible, i.e.
P(A ∩ B) = P(A) * P(B) such that P(A) > 0 and P(B) > 0
This implies P(A ∩ B) ≠ 0
Hence, A and B cannot be disjoint.
For example, in a management class, let us define two independent events, namely choosing a
tall person and choosing an intelligent person. Prove that the events of choosing a short
person and choosing an unintelligent person are also independent.
Let A be the event of choosing a tall person and B the event of choosing an intelligent person.
Then, P(AB) = P(A) * P(B) as A and B are independent.
Now, P(A' ∩ B') = P[(A ∪ B)'] = 1 – P(A ∪ B)
= 1 – [P(A) + P(B) – P(AB)]
= 1 – [P(A) + P(B) – P(A) * P(B)]
= [1 – P(A)] [1 – P(B)]
= P(A') P(B')
Hence, A' (choosing a short person) and B' (choosing an unintelligent person) are also
independent.
CASE STUDY
Suppose that of all the individuals who buy the smartphone, 60% include an optional memory
card in their purchase, 40% include an extra battery and 30% include both a card and a battery.
Solution:
Let us consider a randomly selected buyer, and let event A be that a memory card is purchased,
while event B is that a battery is purchased. Thus, P(A) = 0.6 and P(B) = 0.4. The probability
that both a memory card and a battery are purchased, P(A∩B) or P(AB), is 0.3.
Given that the selected individual purchased an extra battery, the probability that an optional
card was also purchased is P(memory card | battery) or P(A | B).
P(memory card | battery) = P(A | B) = P(A∩B) / P(B), P(B) > 0
= 0.3/0.4 = 0.75
This implies that of all those purchasing an extra battery, 75% purchased an optional memory
card. Similarly, given that the memory card was purchased, the probability of buying a battery
is given by P(battery | memory card) or P(B | A).
P(battery | memory card) = P(B | A) = P(A∩B) / P(A), P(A) > 0
= 0.3/0.6 = 0.5
It can be observed that P(A | B) ≠ P(A) and P(B | A) ≠ P(B).
This means that, in this example, the conditional probability is not equal to the unconditional
probability.
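The two conditional probabilities in this case study follow from one small function. The sketch below is a minimal illustration; the function name conditional is hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def conditional(p_joint, p_condition):
    """P(X | Y) = P(X ∩ Y) / P(Y); requires P(Y) > 0."""
    if p_condition <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_condition

p_a = Fraction(6, 10)     # P(A): memory card purchased
p_b = Fraction(4, 10)     # P(B): battery purchased
p_ab = Fraction(3, 10)    # P(A ∩ B): both purchased

p_a_given_b = conditional(p_ab, p_b)
p_b_given_a = conditional(p_ab, p_a)

print(p_a_given_b, p_b_given_a)    # 3/4 1/2
print(p_ab == p_a * p_b)           # False: A and B are not independent
```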
IN-TEXT QUESTIONS
1. A card is drawn from a deck of cards. What is the probability that it will be either a
heart or a queen?
2. The numerator is the union of two events in the computation of conditional probability.
(True/false)
3. In a class, there are 500 students, of which 300 are males and 200 are females. Of these,
100 males and 60 females plan to major in accounting. A student is selected at random
from this class and it is found that this student plans to be an accounting major. What
is the probability that the student is a male?
4. If there are two events, A and B, then the probability that event A occurs knowing that
event B has already occurred is referred to as ______________.
5. If we randomly pick two TV tubes in succession from a shipment of 240 TV tubes of
which 15 are defective, what is the probability that they will both be defective?
5.4 BAYES’ THEOREM
Bayes’ Theorem is primarily a mathematical formula for computing conditional probability
and was named after the 18th-century British mathematician Thomas Bayes.1 This theorem is also
known as Bayes’ Rule or Bayes’ Law and is also considered the foundation of Bayesian
statistics. As discussed in the above section, conditional probability indicates the likelihood
of a particular outcome occurring, based upon the results of an earlier or previous event that
has already occurred. There is a wide range of applications of Bayes’ Theorem in the field of
finance, such as assessing the risk of lending money to borrowers. Furthermore, Bayes’ Theorem
plays an instrumental role in the implementation of machine learning.
There are many situations where the outcome of the experiment is conditional upon or depends
on the outcomes associated with various intermediate stages. To comprehend such intermediate
stages let us consider one example.
Suppose the completion of a construction assignment may be delayed because of some political
emergency such as a curfew. Suppose the probability is 0.60 that there will be a political
emergency, 0.85 that the construction assignment will be completed on time if there is no
emergency, and 0.35 that the construction work will be completed on time if there is a
political emergency. What is the probability that the construction assignment will be completed
on time?
Solution: Let A be the event that the construction assignment will be completed on time and B
the event that there will be a political emergency. It is given that P(B) = 0.60. The
probability of event A given that B does not occur is P(A | B') = 0.85, while the probability
of event A given that event B has occurred is P(A | B) = 0.35.
By using the formula P(A) = P[(A∩B) ∪ (A∩B')]
= P(A∩B) + P(A∩B')
= P(B) * P(A | B) + P(B') * P(A | B')
P(A) = (0.60) * (0.35) + (1 – 0.60) * (0.85) = 0.55
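The construction example is a two-branch case of the rule of total probability. The sketch below reproduces the arithmetic with exact fractions; the variable names are hypothetical.

```python
from fractions import Fraction

p_b = Fraction("0.60")                # P(B): political emergency
p_a_given_b = Fraction("0.35")        # P(A | B): on time despite an emergency
p_a_given_not_b = Fraction("0.85")    # P(A | B'): on time with no emergency

# P(A) = P(B) * P(A | B) + P(B') * P(A | B')
p_a = p_b * p_a_given_b + (1 - p_b) * p_a_given_not_b

print(float(p_a))    # 0.55
```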
Such a case can be generalized where the intermediate stage permits k different alternatives
denoted by B1, B2, B3………… Bk. The following theorem connects these intermediate stages
by what is known as the rule of total probability or the rule of elimination.
The B’s constitute a partition of the sample space if they are pairwise mutually exclusive and
if their union equals S, as shown in figure 2. Since the Bi’s are mutually exclusive and
exhaustive, if A occurs it must be in conjunction with exactly one of the Bi’s. This implies
A = (B1 ∩ A) ∪ (B2 ∩ A) ∪ … ∪ (Bk ∩ A), where all the events (Bi ∩ A) are mutually exclusive.
This “partitioning of A” is illustrated in figure 2 below.
Figure 2: Partitioning of A by mutually exclusive and exhaustive Bi’s.
1 https://www.investopedia.com/terms/b/bayes-theorem.asp

Therefore, if events B1, B2, B3, …, Bk constitute a partition of the sample space S and
P(Bi) ≠ 0 for all i = 1, 2, 3, …, k, then for any event A in S
$$P(A) = \sum_{i=1}^{k} P(B_i)\, P(A \mid B_i)$$
For any event A in S such that P(A) ≠ 0,
$$P(B_r \mid A) = \frac{P(B_r)\, P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i)\, P(A \mid B_i)}, \quad \text{for all } r = 1, 2, 3, \ldots, k$$
where
$$\sum_{i=1}^{k} P(B_i)\, P(A \mid B_i) = P(B_1)\, P(A \mid B_1) + P(B_2)\, P(A \mid B_2) + P(B_3)\, P(A \mid B_3) + \cdots + P(B_k)\, P(A \mid B_k)$$
Equivalently,
$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)}$$
In figure 2, given that event A has occurred, the probability that A occurred through
partition B4 is given by
$$P(B_4 \mid A) = \frac{P(B_4)\, P(A \mid B_4)}{\sum_{i=1}^{4} P(B_i)\, P(A \mid B_i)}$$
where
$$\sum_{i=1}^{4} P(B_i)\, P(A \mid B_i) = P(B_1)\, P(A \mid B_1) + P(B_2)\, P(A \mid B_2) + P(B_3)\, P(A \mid B_3) + P(B_4)\, P(A \mid B_4)$$
P(B4 | A): Probability that partition B4 occurs given that event A has occurred.
Example: The probability of receiving a spam message given that the computer programme
filter has confirmed the probability to be more than 0.6. These are related probabilities that can
be calculated by using Bayes’ Theorem. Using the same notations, we find two mutually
exclusive and collectively exhaustive events A and B as follows.
A : The incoming mail is a spam message.
B : The incoming mail is not a spam message.
The other events defined in the context of the same experiment are:
C : Filter test confirms spam
D : Filter test did not confirm spam.
The data given to us are:
P(A) : Probability of finding spam = 0.6
P(B) : Probability of not finding spam = 0.4
P(C|A) : Probability test predicts correctly when spam is actually confirmed or found.
P(D|A) : Probability test predicts incorrectly when spam is actually found.
P(D|B) : Probability test predicts correctly when actually spam is not there.
P(C|B) : Probability test predicts incorrectly when actually no spam is found.
We are interested in finding:
P(C) : Probability that the test says spam is there.
P(D) : Probability that the test says no spam is there.
P(A|C) : Probability of finding spam, given positive test results.
P(A|D) : Probability of finding spam, given negative test results.
P(B|C) : Probability of not finding spam, given positive test results.
P(B|D) : Probability of not finding spam, given negative test results.
Applying Bayes’ Theorem
$$P(A|C) = \frac{P(C|A)\,P(A)}{P(C|A)\,P(A) + P(C|B)\,P(B)} = \frac{0.9 \times 0.6}{0.9 \times 0.6 + 0.3 \times 0.4} = 0.818$$
$$P(B|C) = \frac{P(C|B)\,P(B)}{P(C|B)\,P(B) + P(C|A)\,P(A)} = \frac{0.3 \times 0.4}{0.3 \times 0.4 + 0.9 \times 0.6} = 0.182$$
$$P(A|D) = \frac{P(D|A)\,P(A)}{P(D|A)\,P(A) + P(D|B)\,P(B)} = \frac{0.1 \times 0.6}{0.1 \times 0.6 + 0.7 \times 0.4} = 0.176$$
$$P(B|D) = \frac{P(D|B)\,P(B)}{P(D|B)\,P(B) + P(D|A)\,P(A)} = \frac{0.7 \times 0.4}{0.1 \times 0.6 + 0.7 \times 0.4} = 0.824$$
We also know that

P(C) = P (C | A). P(A) + P(C|B). P(B) = 0.9 *0.6 + 0.3*0.4 = 0.66
P(D) = P (D | A). P(A) + P(D|B). P(B) = 0.1 *0.6 + 0.7*0.4 = 0.34
We also find that
P(C) + P (D) = 0.66 + 0.34 = 1
P(A|C) + P(B|C) = 0.818 + 0.182 = 1
P(A|D) + P(B|D) = 0.176 + 0.824 = 1
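The calculations in this example can be reproduced with a short script. The following is only a minimal sketch in Python: the dictionary layout and the function names `total_probability` and `posterior` are illustrative choices and are not part of the lesson, while the numerical values are those used in the example.

```python
# Priors and filter accuracies taken from the spam example.
prior = {"A": 0.6, "B": 0.4}      # A: spam, B: not spam

# likelihood[result][mail type] = P(result | mail type)
likelihood = {
    "C": {"A": 0.9, "B": 0.3},    # C: filter test confirms spam
    "D": {"A": 0.1, "B": 0.7},    # D: filter test does not confirm spam
}


def total_probability(result):
    """P(result), summing P(result | h) * P(h) over the partition {A, B}."""
    return sum(likelihood[result][h] * prior[h] for h in prior)


def posterior(hypothesis, result):
    """P(hypothesis | result) by Bayes' theorem."""
    numerator = likelihood[result][hypothesis] * prior[hypothesis]
    return numerator / total_probability(result)


for result in ("C", "D"):
    print(f"P({result}) = {total_probability(result):.2f}")
    for h in ("A", "B"):
        print(f"  P({h}|{result}) = {posterior(h, result):.3f}")
```

For each test result, the two posterior probabilities printed by the loop add up to one, which matches the check made at the end of the example.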
CASE STUDY
Suppose a consulting firm rents motorbikes from three rental agencies: 60 percent from agency 1, 30 percent from agency 2, and 10 percent from agency 3. Suppose 9 percent of the bikes from agency 1, 20 percent of the bikes from agency 2, and 6 percent of the bikes from agency 3 need a tune-up. What is the probability that a rental bike delivered to the firm will need a tune-up?
Let A be the event that the bike needs a tune-up and let B1, B2, and B3 be the events that the bike comes from rental agency 1, 2, or 3 respectively. Then P(B1) = 0.60, P(B2) = 0.30, P(B3) = 0.10, P(A/B1) = 0.09, P(A/B2) = 0.20, and P(A/B3) = 0.06.
According to the rule of total probability,
P(A) = (0.60) * (0.09) + (0.30) * (0.20) + (0.10) * (0.06) = 0.12
If a rental bike delivered to the consulting firm needs a tune-up, then what is the probability that it came from rental agency 2?
According to Bayes’ Theorem,
P(B2/A) = (0.30) * (0.20) / [(0.60) * (0.09) + (0.30) * (0.20) + (0.10) * (0.06)]
= 0.060 / 0.120 = 0.5
It is observed that although only 30 percent of the bikes delivered to the firm come from agency 2, 50 percent of those that require a tune-up come from that agency.
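The same case study can be checked with exact fractions. The sketch below is illustrative only: the variable names `share`, `tune_up_rate`, `p_tune_up` and `posterior` are assumptions made for this example, and the shares and tune-up rates are the figures given in the case study.

```python
from fractions import Fraction

# P(B_i): share of bikes rented from each agency.
share = {1: Fraction(60, 100), 2: Fraction(30, 100), 3: Fraction(10, 100)}

# P(A | B_i): proportion of each agency's bikes that need a tune-up.
tune_up_rate = {1: Fraction(9, 100), 2: Fraction(20, 100), 3: Fraction(6, 100)}

# Rule of total probability: P(A) = sum over i of P(B_i) * P(A | B_i).
p_tune_up = sum(share[i] * tune_up_rate[i] for i in share)

# Bayes' theorem: P(B_i | A) for every agency.
posterior = {i: share[i] * tune_up_rate[i] / p_tune_up for i in share}

print("P(A) =", float(p_tune_up))
for agency, p in posterior.items():
    print(f"P(B{agency}|A) = {float(p):.2f}")
```

Because the three agencies form a partition, the three posterior probabilities sum to one.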
IN-TEXT QUESTIONS
6. A balanced die is tossed twice. If A is the event that an even number comes up on the
first toss, B is the event that an even number comes up on the second toss, and C is the
event that both tosses result in the same number, are the events A, B and C
a. Pairwise independent
b. Independent?
7. The probability of simultaneous occurrence of two events can never exceed the sum of the probabilities of these events. (True/False)
8. The conditional probability of an event given another event can never be less than the probability of the joint occurrence of these events. (True/False)
5.5 INDEPENDENCE OF EVENTS
The concept of conditional probability suggests that the probability of an event A, P(A), must be revised when another event B, whose outcome affects the occurrence of A, is known to have occurred. The new probability assigned to A is written P(A|B) and is the conditional probability of event A given that event B has already occurred. In general, the conditional probability P(A|B) differs from the unconditional probability P(A), which indicates that the information that B has occurred changes the chance of A occurring.
It may, however, happen that the occurrence of A is not affected by the fact that B has occurred, so that P(A|B) = P(A). In other words, the occurrence or non-occurrence of one event has no consequence on the chances that the other will occur. Such events are referred to as independent events.
Two events A and B are independent if P(A|B) = P(A), and dependent if P(A|B) ≠ P(A). There is thus a strong connection between the concept of independence and conditional probability.
The conditional probability formulas for P(A|B) and P(B|A) are given below.
P(A|B) = P(AB)/ P(B), P(B) > 0 eq (1)
P(B|A) = P(AB)/ P(A), P(A) > 0 eq (2)
From equation 1, P(AB) = P(A|B) * P(B) eq (3)
Substituting equation 3 in equation 2, we get
P(B|A) = P(A|B) * P(B) / P(A)
If the conditional and unconditional probabilities of A are the same, that is, if events A and B are independent, then
P(A|B) = P(A)
As a result, P(B|A) = P(A) * P(B) / P(A)
P(B|A) = P(B)
In this case equation 3 also gives P(AB) = P(A) * P(B), which is the product rule commonly used to check independence.
CASE STUDY
Suppose a coin is tossed three times and each of the eight outcomes is equally likely to occur. Let A be the event that a head occurs on each of the first two tosses, B the event that a tail occurs on the third toss, and C the event that exactly two tails occur in the three tosses. Show that
a. Events A and B are independent
b. Events B and C are dependent.
Sample space: S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
A = {HHH, HHT}, B = {HHT, HTT, THT, TTT}, C = {HTT, THT, TTH}
A ∩ B = {HHT}, B ∩ C = {HTT, THT}
P(A) = ¼, P(B) = ½, P(C) = ⅜, P(A ∩ B) = ⅛, and P(B ∩ C) = ¼
Since P(A) * P(B) = ¼ * ½ = ⅛ = P(A ∩ B), events A and B are independent.
Since P(B) * P(C) = ½ * ⅜ = 3/16 ≠ P(B ∩ C) = ¼, events B and C are not independent.
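The independence checks in this case study can also be carried out by listing the sample space explicitly. The following is a rough sketch in Python, assuming equally likely outcomes as stated above; the helper names `prob` and `independent` are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three tosses, e.g. "HHT".
sample_space = ["".join(t) for t in product("HT", repeat=3)]


def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(sample_space))


A = {s for s in sample_space if s[:2] == "HH"}      # heads on the first two tosses
B = {s for s in sample_space if s[2] == "T"}        # tail on the third toss
C = {s for s in sample_space if s.count("T") == 2}  # exactly two tails


def independent(E, F):
    """Two events are independent when P(E and F) = P(E) * P(F)."""
    return prob(E & F) == prob(E) * prob(F)


print("A and B independent:", independent(A, B))
print("B and C independent:", independent(B, C))
```

Using exact fractions avoids rounding issues when comparing the product P(E) * P(F) with the joint probability.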
IN-TEXT QUESTIONS
9. Prove that if A and B are independent, then A' and B' are also independent.
10. Two mutually exclusive events must be independent. (True / False)
11. A bag contains 7 red and 4 blue balls. Two balls are drawn at random with replacement.
The probability of getting the balls of different colors is:
a. 28/121
b. 56/121
c. ½
d. None of these
12. The conditional and unconditional probabilities of random variables are always equal. (True / False)
5.6 SUMMARY
The lesson presents conditional probability, a measure of the likelihood of an event occurring given that another event or outcome has already occurred. Conditional probability has a wide range of applications, and its computation has been explained systematically in the lesson. The role of conditional probability in defining the independence and dependence of events has been described. In the case of independent events, the conditional and unconditional probabilities are the same. The notion and relevance of Bayes’ theorem, which is primarily an application of conditional probability, have also been explained.
5.7 GLOSSARY
1. Conditional Probability: Conditional probability is defined as the likelihood of an event or outcome occurring given the occurrence of a previous event or outcome. It is conditional upon the occurrence of some event that has happened earlier.
2. Unconditional Probability: It is the chance that one outcome occurs out of many possible outcomes. It refers to the likelihood that an outcome occurs irrespective of other outcomes.
3. Bayes’ Theorem: It states that the conditional probability of an event B given that another event A has occurred equals the conditional probability of A given B multiplied by the probability of B, divided by the probability of A.
4. Independence of Events: The occurrence or non-occurrence of one event has no consequence on the chances that the other will occur. Such events are referred to as independent events. In other words, two events are said to be independent if and only if the conditional probability is equal to the unconditional probability.
5.8 ANSWERS TO THE QUESTIONS
1. 4/13
Hint: P(S), n(S) = 52
A: A card drawn from 52 cards
H: Heart appearing
Q: Queen appearing
P(HUQ) = P(H) +P(Q) - P(HQ)
P(H) = n(H)/n(S), = 13/52
P(Q) = n(Q) / n(S) = 4/52
P(HQ)= n(HQ)/n(S) = 1/52
P(HUQ) = 13/52 + 4/52 - 1/52 = 4/13
2. False
3. 5/8
Hint: Accounting major: A: n(A) = 160
n(S) = 500
M: n(M) = 300, n(MA) = 100, P(MA) = n(MA)/n(S) = 100/500
F: n(F) = 200
P(A) = n(A)/n(S) = 160/500
P (M/A) = P(MA) or P(AM)/ P(A) = 100/160 = ⅝
4. Conditional Probability
5. 7/1,912
Hint: Let A be the event that the first randomly picked TV set out of the two is defective, A: 1st TV defective. The number of elements in the sample space, n(S), is 240.
The probability of event A, P(A) = n(A)/n(S) = 15/240.
Let B be the event that the second randomly picked TV set out of the two is defective, B: 2nd TV defective, P(B/A) = 14/239
P(AB) = P(A)*P(B|A) = 15/240 * 14/239 = 7/1,912
6. a) the events are pairwise independent
b) the events are not independent
7. True
8. True
9. Hint: Event A can be expressed as (A ∩ B) ∪ (A ∩ B')
Also note that (A ∩ B) and (A ∩ B') are mutually exclusive. It is given that A and B
are independent.
10. False
11. (b)
12. False
1. Two six-faced dice are rolled together, or dice is rolled twice. The total number of
possible outcomes are 36.
2. (i) Prove that the probability of null event is zero, P (∅) = 0.
(ii) Prove that for any two events A and B
P(AUB) = P(A) +P(B) - P(AB)
LESSON 6
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
STRUCTURE
6.2 Introduction
6.3 Random Variables
6.3.1 Types of Random Variables
6.4 Probability Mass Function
6.5 Probability Density Function
6.5.1 Properties of Probability Density Function
6.6 Summary
6.7 Glossary
6.10 References
After reading this lesson, students will be able:
1. To understand the concept of random variables and their significance in statistical analysis.
2. To distinguish between the two fundamental types of random variables, namely the discrete random variable and the continuous random variable.
3. To become familiar with some commonly used discrete and continuous distributions of random variables.
4. To understand the concept of the probability distribution function or probability mass function, and
5. To comprehend the derivation of the probability distribution function.
6.2 INTRODUCTION
We have seen in earlier units how the concept of probability enables us to compute the extent
of uncertainty associated with random experiments. An experiment may yield both qualitative
and quantitative outcomes. Statistical analysis focuses on the numerical aspect of the data or
experiment. Thus, the term random variable is introduced to represent any event or outcome
that can take different values. Such a variable takes the values that are the plausible outcomes
of any event or experiment.
Since the values are random and associated with random experiment such variables are termed
as random variables. The concept of random variable allows us to pass from the experimental
outcomes themselves to a numerical function of the outcomes.
There are two types of random variables: discrete random variables and continuous random variables. This chapter defines the concept of a random variable along with these two fundamental types. It also explains the derivation of the probability distribution function for both discrete and continuous random variables.
6.3 RANDOM VARIABLE
Each outcome of an experiment can be associated with a number by specifying a rule of association, for example the total weight of baggage for a sample of 25 airline passengers. Such a rule of association is called a random variable.
A random variable is a variable because the observed value depends on which of the possible experimental outcomes results.
Fig 1: Rule that connects each outcome of the sample space to a value

For a given sample space S of some experiment, a random variable (rv) is any rule that associates a number with each outcome in S. In other words, a random variable is a function whose domain is the sample space and whose range is a set of real numbers, as shown in figure 1. In the figure, X1 and X2 are the numbers that the random variable assigns to outcomes in the sample space, and f(x1) and f(x2) are the probabilities that a further rule or function f associates with these values.
A random variable is denoted by a capital letter such as X or Y, and the values it takes are denoted by the corresponding lowercase letters x, y, and so on.
In a toss of two coins, S = {HH, HT, TH, TT}.
Let A denote the number of heads appearing in the toss of two coins,
A: {no. of heads in the toss}. A variable X can then be assigned as a random variable to A, where the values that X takes (0, 1 or 2) are random, since they depend on the outcome of the experiment, i.e. the toss of two coins.
Similarly, let B denote the number of tails in the toss of two coins,
B: {no. of tails in the toss}. A variable Y can then be assigned as a random variable to B, where the values that Y takes are random, since they depend on the outcome of the experiment, i.e. the toss of two coins.
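The idea that a random variable is a function from the sample space to numbers can be illustrated with a few lines of code. This is a minimal sketch in Python; the function names `X` and `Y` simply mirror the notation used above and the layout is an illustrative assumption.

```python
from itertools import product

# Sample space of the toss of two coins: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]


def X(outcome):
    """Random variable X: number of heads in the outcome."""
    return outcome.count("H")


def Y(outcome):
    """Random variable Y: number of tails in the outcome."""
    return outcome.count("T")


# Each outcome is mapped to one value of X and one value of Y.
for s in sample_space:
    print(s, "-> X =", X(s), ", Y =", Y(s))
```

Note that two different outcomes (HT and TH) are mapped to the same value X = 1, which is why the probability of X = 1 later turns out to be larger than the probability of either single outcome.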
6.3.1 Types of random variables
A random variable takes values that are determined by the outcomes of the random experiment. On the basis of the values it can take, and in particular whether those values can be counted, a random variable is classified as either a discrete rv or a continuous rv.
a) A discrete random variable: A discrete random variable is an rv whose possible values either constitute a finite set or else can be listed in an infinite sequence in which there is a first element, a second element, and so on (countably infinite).
b) A continuous random variable: A continuous random variable is an rv whose set of possible values consists either of all the numbers in a single interval on the number line (possibly infinite in extent, for example from -∞ to ∞) or of all numbers in a disjoint union of such intervals. No possible value of the variable has a positive probability, that is, P(X = c) = 0 for any possible value c.
On the basis of the values taken by the random variable, we define functions that yield the probability corresponding to each specific value of the random variable. Such a function is called a probability distribution, and it gives the probabilities of occurrence of the different possible outcomes of an experiment. Depending on whether the random variable takes discrete or continuous values, the function is referred to as a probability mass function (Pmf) or a probability density function (pdf).
6.4 PROBABILITY MASS FUNCTION
For a discrete random variable X that can take at most a countably infinite number of values x1, x2, …, we associate with each value x a probability
p(x) = P [X = x] = P [all s ∈ S such that X(s) = x]
that must satisfy the following conditions,
1. p(x) ≥ 0 for all x, which implies that the value of the probability distribution is non-negative at every value taken by the random variable X.
2. ∑𝑥 𝑝(𝑥) = 1, which implies that the values taken by the random variable together exhaust the sample space, so the probabilities of all the values of the random variable sum to 1.
Example 1:
Let us consider an example of a lab in the Department of Economics, where six computers are
reserved for an Economics major. Let the random variable X denote the number of these
computers that are in use at a particular time in a day. Suppose the probability distribution of
X corresponding to each value of X is given below.
x           0      1      2      3      4      5      6
P (X = x)   0.05   0.10   0.15   0.25   0.20   0.15   0.10
It is easy to verify that the above values satisfy both the properties of a Probability Mass
Function as each p(x) is positive and all of them sum to unity.
Using the probabilities given above, various other probabilities can be computed.
The probability that at most 2 computers are in use is given by P (X ≤ 2), the probability that the random variable takes at most the value 2.
P (X ≤ 2) = P (X = 0 or 1 or 2) = p (0) + p (1) + p (2) = 0.05 + 0.10 + 0.15 = 0.30
The probability that at least 3 computers are in use is given by P (X ≥ 3). Since the event that at least 3 computers are in use is complementary to the event that at most 2 computers are in use, the probability can be computed as follows.
P (X ≥ 3) = 1 - P (X ≤ 2)
= 1 - 0.30
= 0.70
Another way to compute the probability of the event that at least 3 computers are in use is by
adding values of probability when X takes the values 3, 4, 5 and 6.
The probability that between 2 and 5 computers are in use is given by
P (2 ≤ X ≤ 5) = P (X = 2, 3, 4 or 5) = 0.15 + 0.25 + 0.20 + 0.15 = 0.75
The probability that the number of computers in use is strictly between 2 and 5 is
P (2 < X < 5) = P (X = 3 or 4) = 0.25 + 0.20 = 0.45
In the above example of the number of computers in use in the computer lab of the Department of Economics, let us verify that the probability distribution function satisfies the two properties. Firstly, every value of the probability distribution function is non-negative, so the first property is satisfied. Secondly, the sum of all the probabilities is 1, so the second property is also satisfied. Thus, the given function can serve as a probability distribution function.
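The checks and computations of Example 1 can be sketched in code as follows. This is only an illustration in Python: the dictionary `pmf` holds the probabilities from the table, while the helper names `is_valid_pmf` and `prob` and the tolerance used for floating-point comparison are assumptions made here.

```python
# Probability mass function of X, the number of computers in use (Example 1).
pmf = {0: 0.05, 1: 0.10, 2: 0.15, 3: 0.25, 4: 0.20, 5: 0.15, 6: 0.10}


def is_valid_pmf(p, tol=1e-9):
    """Check the two pmf conditions: p(x) >= 0 and the probabilities sum to 1."""
    non_negative = all(v >= 0 for v in p.values())
    sums_to_one = abs(sum(p.values()) - 1.0) < tol
    return non_negative and sums_to_one


def prob(p, condition):
    """P(condition holds for X): add p(x) over the values x meeting the condition."""
    return sum(v for x, v in p.items() if condition(x))


print("valid pmf:      ", is_valid_pmf(pmf))
print("P(X <= 2)      =", round(prob(pmf, lambda x: x <= 2), 2))
print("P(X >= 3)      =", round(prob(pmf, lambda x: x >= 3), 2))
print("P(2 <= X <= 5) =", round(prob(pmf, lambda x: 2 <= x <= 5), 2))
print("P(2 < X < 5)   =", round(prob(pmf, lambda x: 2 < x < 5), 2))
```

The values are rounded before printing because decimal probabilities are stored only approximately in floating point.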
Let us create a probability distribution function for the toss of two fair coins. In this random experiment, consider the random variable X, which is the number of heads that appear.
S: Sample space   X: Random variable (r.v.) (no. of heads)   Probability of the outcome
TT                0                                           ½ * ½ = ¼
HT                1                                           ½ * ½ = ¼
TH                1                                           ½ * ½ = ¼
HH                2                                           ½ * ½ = ¼
Now, one can create the probability distribution function defined at the specific values x taken by the random variable X, the number of heads that appear in the toss of two fair coins. X can take the values 0, 1 and 2 because in two tosses of a fair coin we can have no heads, exactly one head, or two heads. Since X = 1 arises from the two outcomes HT and TH, p(1) = ¼ + ¼ = ½.
The probability distribution function of this discrete random variable can be represented by the function p(x) defined below.
$$p(x) = \begin{cases} 1/4, & \text{if } x = 0 \\ 1/2, & \text{if } x = 1 \\ 1/4, & \text{if } x = 2 \end{cases}$$
1. The value of the probability distribution function is never negative. This implies p(x) ≥ 0.
2. The sum of all values of the probability distribution function at the given values of x is one.
$$\sum_{x=0}^{2} p(x) = 1$$
The above function satisfies both conditions; therefore, it serves as a probability distribution function (Pdf) or probability mass function (Pmf), which can be depicted in the form of a graph as below.
Figure 2: Graph of the Probability Distribution Function (Pdf) or Probability Mass Function (Pmf)

The function depicted in figure 2 is the graph of the probability distribution function or probability mass function defined in the example. A probability mass function (Pmf) is defined only at discrete points, and its graph can be drawn as a probability histogram, as in figure 2.
CASE STUDY
Find the probability distribution or the pmf of the sum of numbers obtained on throwing
a pair of dice.
Solution: The Sample Space of the experiment is:
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
The sums of the numbers are as follows:
2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12
The probability distribution or pmf is therefore given by:
x p(x) = P(X=x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36
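The table above can be generated by enumerating the 36 outcomes and counting how often each sum occurs. The following is a rough sketch in Python, assuming two fair six-faced dice as in the case study; the names `counts` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# p(x) = (number of outcomes with sum x) / 36.
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in pmf.items():
    # Fraction reduces automatically, so the raw count is printed as well.
    print(f"x = {total:2d}   p(x) = {counts[total]}/36 = {p}")

print("sum of probabilities:", sum(pmf.values()))
```

The `Fraction` type stores probabilities in lowest terms, so 6/36 is held as 1/6; the count over 36 is printed alongside to match the table.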
IN-TEXT QUESTIONS
1. A random variable that can assume any possible value between two points is called a
________________ (discrete/continuous) random variable.
2. If c is a constant in a continuous probability distribution, then P(X = c) is always equal to zero. This statement is True/False.
3. A listing of all the outcomes of an experiment and the probability associated with each
outcome is called:
a) Probability density function
b) Cumulative distribution function
c) Probability distribution
d) Probability tabulation
CASE STUDY
Check if the following function given by
p(x) = (x + 2) / 25, for x = 1, 2, 3, 4, 5
can serve as the probability distribution of a discrete random variable.
Solution: For all the values of x, p(x) is computed as follows,
For x = 1, p(1) = 3/25
x = 2, p(2) = 4/25
x = 3, p(3) = 5/25
x = 4, p(4) = 6/25
x = 5, p(5) = 7/25
For every x, p(x) > 0, and the sum of all p(x) is (3 + 4 + 5 + 6 + 7)/25 = 1. Thus, both properties of the probability distribution function have been satisfied.
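The same two-step check can be written out with exact fractions. This is a small illustrative sketch in Python; the function `q`, defined as x/25, is a hypothetical alternative that does not appear in the lesson and is included only to show a function that fails the check.

```python
from fractions import Fraction

support = range(1, 6)  # x = 1, 2, 3, 4, 5


def p(x):
    """The function from the case study: p(x) = (x + 2) / 25."""
    return Fraction(x + 2, 25)


def q(x):
    """Hypothetical function used only for contrast: q(x) = x / 25."""
    return Fraction(x, 25)


def can_serve_as_pmf(func, values):
    """Both conditions: every value non-negative, and the total equal to 1."""
    probabilities = [func(x) for x in values]
    return all(v >= 0 for v in probabilities) and sum(probabilities) == 1


print("p can serve as a pmf:", can_serve_as_pmf(p, support))
print("q can serve as a pmf:", can_serve_as_pmf(q, support))
```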
IN-TEXT QUESTIONS
4. Find the probability distribution of the total number of heads obtained in four tosses of
a balanced coin.
5. The probability distribution of a random variable is defined as
x -1 -2 0 1 2
p(x) c 2c 3c 4c 6c
Then, c is equal to
a) 0
b) ¼
c) 1
d) 1/16
6. The suitable graph for the probability distribution of a discrete random variable is
a) Probability Histogram
b) Stepwise Function
c) Both (a) and (b)
6.5 PROBABILITY DENSITY FUNCTION

For a continuous random variable X that can take all possible values between certain limits, we consider the small interval (x, x+∆x) of length ∆x. Let f(x) be a continuous function of x such that f(x) ∆x represents the probability that X lies in the infinitesimally small interval (x, x+∆x),
that is, P [x ≤ X ≤ x + ∆x] = f(x) ∆x
The function f(x) so defined is known as the probability density function and is given by
$$f(x) = \lim_{\Delta x \to 0} \frac{P[x \le X \le x + \Delta x]}{\Delta x}$$
Let X be a continuous rv. The probability distribution or probability density function (pdf) of X is a function f(x) such that for any two numbers a and b with a ≤ b,
$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$
That is, the probability that X takes a value in the interval [a, b] is the area above this interval and under the graph of the density function. The value f(c) of the function at a single point c does not give P (X = c), unlike in the discrete case. For a continuous random variable, probabilities are always associated with intervals, and therefore P (X = c) = 0 for any real constant c.
Figure 3: The density curve between a and b given by P (a ≤ X ≤ b)
The probability that X lies between a and b can be computed by integrating the density function, as depicted in figure 3. That is, P (a ≤ X ≤ b) is obtained as $\int_{a}^{b} f(x)\,dx$.
6.5.1 Properties of Probability Density Function
Every probability density function satisfies certain properties. The first and foremost are the two conditions that any function of a continuous random variable must satisfy in order to qualify as a probability density function.
1. A function can serve as the probability density of a continuous random variable X if its values, f(x), satisfy the following two conditions.
(a) f(x) ≥ 0 for all x, -∞ < x < ∞
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
2. If X is a continuous random variable and a and b are real constants with a ≤ b, then
P (a ≤ X ≤ b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a < X < b)
CASE STUDY
For example, if the random variable X has the probability density given by
$$f(x) = \begin{cases} k\,e^{-3x}, & \text{for } x > 0 \\ 0, & \text{elsewhere} \end{cases}$$
find k and P (0.5 ≤ X ≤ 1).
The given function must satisfy the two necessary conditions for a probability density function.
(a) f(x) ≥ 0 for -∞ < x < ∞, which holds for any k ≥ 0
(b) $\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} k\,e^{-3x}\,dx = \frac{k}{3} = 1$, so that k = 3
For k = 3,
$$P(0.5 \le X \le 1) = \int_{0.5}^{1} 3\,e^{-3x}\,dx = e^{-1.5} - e^{-3} = 0.173$$
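The result of this case study can be cross-checked numerically. The sketch below, in Python, compares the closed-form answer with a simple midpoint-rule approximation of the integral; the helper `integrate`, the number of sub-intervals, and the upper limit of 20 used in place of infinity are all assumptions made for illustration.

```python
import math


def f(x, k=3.0):
    """Density from the case study: k * exp(-3x) for x > 0, and 0 elsewhere."""
    return k * math.exp(-3.0 * x) if x > 0 else 0.0


def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


# Closed form of the integral of 3*exp(-3x) from 0.5 to 1.
exact = math.exp(-1.5) - math.exp(-3.0)

# Numerical versions: the probability, and the total area under the curve.
approx = integrate(f, 0.5, 1.0)
total_area = integrate(f, 0.0, 20.0)  # 20 stands in for infinity here

print(f"exact  P(0.5 <= X <= 1) = {exact:.3f}")
print(f"approx P(0.5 <= X <= 1) = {approx:.3f}")
print(f"total area under f      = {total_area:.3f}")
```

The total area comes out very close to 1 because the density beyond x = 20 is negligibly small.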
IN-TEXT QUESTIONS
The pdf of a continuous random variable X is given by:
$$f(x) = \begin{cases} 0.075x + 0.2, & 3 \le x \le 5 \\ 0, & \text{otherwise} \end{cases}$$
7. The total area under the density curve of X is
(a) 1
(b) 0
(c) ½
(d) ¼
8. Find P(X≤ 4).
9. P(X≤ 4) will be the same as P(X<4). This statement is True/False.
6.6 SUMMARY
The lesson describes the concept of a random variable, which takes values determined by the outcomes of random experiments. The two types of random variables, namely discrete and continuous random variables, have been described. Discrete random variables are variables whose possible values form a finite or countably infinite set, while the possible values of a continuous random variable make up an interval on the number line or a disjoint union of intervals. The probability distribution functions, also known as probability mass functions, which give the probability corresponding to each value of a discrete random variable, have been derived, and the corresponding graph has been plotted. Similarly, the probability density function, which gives the probability associated with intervals of values of a continuous random variable, has been derived. Finally, the properties of the probability distribution and probability density functions have been presented in the lesson.
6.7 GLOSSARY
1. Random Variable: A random variable is a rule that assigns a numerical value to each
outcome in a sample space. It is a variable because the observed value depends on which
of the possible experimental outcomes results.
2. Discrete random variable: A discrete random variable is an rv whose possible values
either constitute a finite set or else can be listed in an infinite sequence in which there
is a first element, a second element, and so on
3. Continuous random variable: The continuous random variable consists either of all
the numbers in a single interval on the number line (infinite from - infinity to infinity)
or all numbers in a disjoint union of such intervals.
4. Probability distribution function: The probabilities assigned to the various outcomes in S in turn determine the probabilities associated with the values of any particular random variable X. The probability distribution function provides the relative likelihood of occurrence of all possible outcomes of an experiment.
5. Probability density function: The function that describes the probability of a continuous random variable falling within a specified range of values. It provides the relative likelihood of the values of a continuous random variable.
1. Discrete
2. True
3. (c) Probability Distribution
4. p(x) = 4Cx /16, for x = 0, 1,2, 3, 4
5. (d) 1/16
6. (a) Probability Histogram
7. (a) 1
8. 0.4625
9. True
1. What is the difference between discrete and continuous random variables?
2. An event management company has overbooked the tickets for an upcoming music
concert. The available seating capacity is 40 while the company has sold 45 tickets.
Suppose X denotes the number of ticketed people who actually show up for the concert.
The probability mass function of X is given by:
x      35    36    37    38    39    40    41    42    43    44    45
p(x)   0.05  0.10  0.13  0.13  0.25  0.18  0.05  0.05  0.03  0.02  0.01
What is the probability that the event management company will be able to accommodate
all ticketed people who show up?
3. Prove that the following function defined by
$$f(x) = \frac{1}{5}, \quad 2 < x < 7$$
can serve as a valid pdf for a random variable X.
4. The pdf of a continuous random variable Y is given by
$$f(y) = \frac{k}{\sqrt{y}}, \quad 0 < y < 4$$
Find the value of k. Also, find P (Y ≥ 1)
LESSON 7
CUMULATIVE DISTRIBUTION FUNCTION, DENSITY FUNCTION, EXPECTED
VALUE, AND VARIANCE
STRUCTURE
7.2 Introduction
7.3 Cumulative Distribution Function (CDF)
7.3.1 Properties of Cumulative Distribution Function
7.4 Cumulative Density Function
7.4.1 Properties of Cumulative Density Function
7.5 Expected Value
7.5.1 Properties of Expected Value
7.6 Variance V(X)
7.6.1 Properties of Variance
7.7 Summary
7.8 Glossary
7.11 References
After reading this lesson, students will be able :
1. To understand the need to evaluate the descriptive statistics of the probability
distribution function.
2. To learn the formula and method to derive the expected value of the probability
distribution function.
3. To compute and apply the expected value in different random experiments by deriving
the probability distribution functions.
4. To learn the formula and method to derive the variance of the probability distribution
function.
5. To compute and apply the variance in different random experiments by deriving the
probability distribution functions and
6. To provide exposure to various useful properties of expected value and variance to the
students and their applications.
7.2 INTRODUCTION
In lesson 6, we discussed random variables, types of random variables, and probability distributions. Once we have the probability distribution function of a random variable, it is essential to evaluate descriptive statistics such as its expected value and variance. The population mean of a random variable is a measure of centre for its distribution. The expected value is essentially a formula for this mean: as more and more values of the random variable are collected, whether by trials, random experiments or any other kind of experiment involving probability, the sample mean becomes closer and closer to the expected value. It is obtained by summing the products of each value of the random variable and its associated probability over all the values of the random variable.
An important measure of dispersion is the variance. It measures the spread of the distribution of a random variable and reflects the degree of variability of the values of the random variable around the expected value. The variance for a given probability distribution function is obtained by summing, across all the values of the random variable, the product of the squared difference between each value and the expected value and the probability associated with that value.
7.3 CUMULATIVE DISTRIBUTION FUNCTIONS (CDF)
The cumulative distribution function (cdf) of a discrete random variable X with pmf f(x) is defined for every number x by
$$F(x) = P(X \le x) = \sum_{y \le x} f(y)$$
For any number x, F(x) is the probability that the observed value of X will be at most x.
Let us compute the Cumulative Distribution Function for the toss of two coins.
S: Sample space   X: Random variable (r.v.) (no. of heads)   f(x): pmf   F(x): CDF (cdf)
TT                0                                           ¼           ¼
HT, TH            1                                           ½           ¾ = ¼ + ½
HH                2                                           ¼           1 = ¾ + ¼
The first column represents the elements in the sample space. The next column presents the values of the random variable X, defined as the number of heads in the toss of two coins. The values that X takes are denoted by x, and these values are 0, 1 or 2. The third column gives the probability associated with each value of X. The fourth column is obtained by adding, or cumulating, the probabilities of all values up to and including the current value of the random variable.
The Cumulative Distribution Function for the toss of two coins can be presented in the following way.
$$F(x) = \begin{cases} 0, & \text{if } x < 0 \\ 1/4, & \text{if } 0 \le x < 1 \\ 3/4, & \text{if } 1 \le x < 2 \\ 1, & \text{if } x \ge 2 \end{cases}$$
1. f(x) ≥ 0
The probability distribution function is non-negative for each value of the random variable X.
2. F(x) = ∑ f(x) = 1 when the sum is taken over the set D of all possible values of X
This property signifies that the value of the cumulative distribution function at the last value of the random variable for which it is defined is equal to 1. This is because, in cumulating the probabilities up to the last value of X for which the pmf is defined, we exhaust all possible outcomes of the sample space, and the probability of the whole sample space is 1.
The graph of the above cumulative distribution function can be presented in figure 1 below.
Figure 1: Graph of the cumulative distribution function cdf.
The graph of the cumulative distribution function is a step function. For all values of x less than zero, the value of the CDF is zero, since the probability distribution function is zero there. The CDF is constant on intervals whose right end point is not included in the interval concerned but belongs to the next interval. At each value taken by the random variable, the corresponding probability is added, so the CDF looks like a ladder or a set of steps moving upwards. At the last value of the random variable, all the probabilities have been added and the CDF reaches its maximum value of one, where it remains for all larger values of x.
Consider whether the next person buying a computer at a university bookstore buys a laptop or
a desktop model. Define
X = 1, if the customer purchases a laptop computer
    0, if the customer purchases a desktop computer
If 20% of all purchasers during a week select a laptop computer, the pmf of X is
For X = 0, p (0) = P (X = 0) = P (next customer purchases a desktop model) = 0.8
For X = 1, p (1) = P (X = 1) = P (next customer purchases a laptop model) = 0.2
p(x) = P (X = x) = 0 for x ≠ 0, 1
For the above example of the next person buying a laptop or a desktop model at a university
bookstore, the cumulative distribution function can be derived in the following manner.
X    f(x)    Prob        F(x)
0    0.8     P (X ≤ 0)   0.8
1    0.2     P (X ≤ 1)   1

       0,    if x < 0
F(x) = 0.8,  if 0 ≤ x < 1
       1,    if x ≥ 1
The value of CDF remains zero for all negative values of x for which the probability
distribution function is zero, while the CDF takes a value of 0.8 between 0 and 1 and finally 1
for all values 1 and greater than 1.
Figure 1: Graph of the cumulative distribution function cdf.

The probability distribution function is defined for two values of random variables 0 and 1.
Therefore, while computing the cumulative distribution function the value of CDF remains
zero for all negative values of x for which the probability distribution function is zero, while
the CDF takes a value of 0.8 till the points are strictly less than 1. Finally, the CDF value
becomes 1 when the probability distribution function value at 1 is added. The CDF continues
to attain the value 1 for all values of random variable X.
7.3.1 Properties of Cumulative Distribution Function
The value of F(x) of the distribution function of a discrete random variable X satisfies the
conditions:
1. F (- ∞) = 0 and F ( ∞ ) = 1
The value of Cumulative Distribution Function CDF is 0 for all negative values of random
variable X for which the probability distribution function remains zero.
2. If a <b, then F (a) ≤ F (b) for any real numbers a and b.
The CDF is a non-decreasing function of x. This implies that for a greater value of x, the
value of the cumulative distribution function is at least as large; it never decreases.
If the probability distribution of a discrete random variable is given, the corresponding
distribution function can be derived.
The distribution function of the total number of heads obtained in four tosses of a balanced
coin can be obtained as below. For x = 0,1,2,3,4.
f (0) = 1/16, f(1) = 4/16 , f(2) = 6/16, f(3) = 4/16 , and f(4) = 1/16
It follows that
F (0) = f (0) = 1/16
F (1) = f (0) + f(1) = 5/16
F (2) = f (0) + f(1) + f(2) = 11/16
F (3) = f(0) + f(1) + f(2) + f(3) = 15/16
F (4) = f (0) + f(1) + f(2) + f(3) + f (4) = 1
The properties of Distribution Function are satisfied by the above CDF.
1. F(- ∞ ) = 0 and F ( ∞ ) = 1
2. If a < b, then F (a) ≤ F (b) for any real numbers a and b.
Therefore, the distribution function is given by
0 for x < 0
1/16 for 0 ≤ x < 1
F(x) = 5/16 for 1 ≤ x < 2
11/16 for 2 ≤ x < 3
15/16 for 3 ≤ x < 4
1 for x ≥ 4
The distribution function is defined not only for the values taken on by the given random
variable but for all real numbers.
For example, F (1.7) = 5/16 and F (100) = 1, although the probabilities of getting “at most 1.7
heads” or “at most 100 heads” in four tosses of a balanced coin may not be of any real significance.
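The cumulation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `pmf` and `cdf` are our own, and exact fractions are used so that the results match the hand computation for four tosses of a balanced coin.

```python
from fractions import Fraction
from math import comb

# pmf of the number of heads in four tosses of a balanced coin
pmf = {x: Fraction(comb(4, x), 16) for x in range(5)}

def cdf(x):
    """F(x) = P(X <= x): add up f(t) for every possible value t <= x."""
    return sum((p for t, p in pmf.items() if t <= x), Fraction(0))

# F is defined for every real number, not only for 0, 1, 2, 3, 4
for point in (-1, 0, 1.7, 3, 100):
    print(point, cdf(point))
```

Evaluating `cdf` at points such as 1.7 or 100 shows the step-function behaviour: the value only changes when x passes one of the values taken by the random variable.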
If the range of a random variable X consists of the values x1 < x2 < x3 < x4 < …… < xn, then
$f(x_1) = F(x_1)$ and $f(x_i) = F(x_i) - F(x_{i-1})$, for i = 2, 3, …., n

The above equation reveals that the probability distribution function can be computed from the
given cumulative distribution function by taking the difference of the CDF at consecutive
values of the random variable.
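As a small check, the differencing rule can be applied to the CDF of the four-toss example. The sketch below assumes the CDF values F(0), …, F(4) computed earlier; the list names `xs`, `F` and `f` are our own.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4]
# CDF values F(0), ..., F(4) for the number of heads in four tosses
F = [Fraction(1, 16), Fraction(5, 16), Fraction(11, 16), Fraction(15, 16), Fraction(1)]

# f(x1) = F(x1); f(xi) = F(xi) - F(x(i-1)) for the remaining values
f = [F[0]] + [F[i] - F[i - 1] for i in range(1, len(F))]

for x, p in zip(xs, f):
    print(x, p)
```

The recovered probabilities are 1/16, 4/16, 6/16, 4/16 and 1/16 (printed in lowest terms), and they add up to 1.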
IN-TEXT QUESTIONS
1. A bulb manufacturing company is testing the number of defective bulbs. Let X denote
the number of defective bulbs, with pmf
       0.25  x = 0
f(x) = 0.10  x = 1
       0.30  x = 2
       0.35  x = 3
(A) Calculate the probability that at most 1 bulb is defective.
(B) Find the CDF.
2. Let X be the number of groups that attended the festival of the college.
X 0 1 2 3 4 5
P(X) 0.20 0.10 0.05 0.15 0.30 0.20
(a) Find the probability that at most two groups attended the festival.
(b) Find the probability that at least four groups attended the festival.
3. The cumulative distribution function of a random variable Y, evaluated at y, is the
probability that Y takes a value _____
(a) Equal to y
(b) Greater than y
(c) Less than or equal to y
(d) Zero
7.4 CUMULATIVE DENSITY FUNCTION

A cumulative density function is the cumulative distribution function for a continuous random
variable. If X is a continuous random variable and the value of its probability density at x is
f(x), then the function given by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$, for $-\infty < x < \infty$,
is called the distribution function or the cumulative distribution of X.
For each x, F(x) is defined as the area under the density curve to the left of x, as shown in
figure 2. The value of F(x) never decreases as x increases.
Figure 2 (a): Probability density function. Figure 2 (b): Cumulative density function.
Figure 2 (a) depicts the probability density function while figure 2(b) depicts the cumulative
density function.
7.4.1 Properties of Cumulative Density Function
There are certain properties of cumulative density function:
1. F (-∞) = 0 and F ( ∞ ) = 1
2. For any two numbers a and b with a ≤ b,

P (a≤ X ≤ b) = F (b) - F (a - )
a - represents the largest possible X value that is strictly less than a.
3. If a and b are integers, then

P (a ≤ X ≤ b) = P (X = a or a +1 or …...or b )
= F (b) - F (a - 1)
Taking a = b yields P (X = a) = F (a) - F (a -1) in this case
4. For any real constants a and b with a ≤ b,
P (a ≤ X ≤ b) = F (b) - F (a), and
f (x) = dF(x) / dx
where the derivative exists.
The fourth property indicates that one can obtain the probability density function from the
given distribution or cumulative density function by simply differentiating the distribution
function.
CASE STUDY
In the case study depicted in lesson 7, section 7.5.1, the random variable X has the probability
density given by
f (x) = k e^(-3x) for x > 0
        0 elsewhere
With k = 3, the distribution function derived in lesson 7, section 7.5.1 is, for x > 0,
$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{0}^{x} 3e^{-3t}\,dt = \left[-e^{-3t}\right]_{0}^{x} = 1 - e^{-3x}$
Since F(x) = 0 for x ≤ 0,
F(x) = 0 for x ≤ 0
     = 1 - e^(-3x) for x > 0
Now, to determine the probability P (0.5 ≤ X ≤ 1), the cumulative density function F (x) can be
used:
P (0.5 ≤ X ≤ 1) = F (1) - F (0.5)
$= (1 - e^{-3}) - (1 - e^{-1.5})$
$= e^{-1.5} - e^{-3}$
= 0.173
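The calculation above can be reproduced numerically. The following is a minimal sketch, assuming the density f(x) = 3e^(-3x) for x > 0 from the case study; the function name `F` simply mirrors the notation of the text.

```python
from math import exp

def F(x):
    """CDF for the density f(x) = 3*exp(-3x), x > 0 (zero elsewhere)."""
    return 1 - exp(-3 * x) if x > 0 else 0.0

# P(0.5 <= X <= 1) = F(1) - F(0.5)
prob = F(1) - F(0.5)
print(round(prob, 3))
```

Because X is continuous, including or excluding the end points 0.5 and 1 does not change the probability.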
IN -TEXT QUESTIONS
4. Given the following probability density function (pdf)
f(x) = p x²   0 ≤ x ≤ 3
       0      otherwise
(A) Find the value of p such that the pdf is valid.
(B) Find P (X > 2).
7.5 EXPECTED VALUE

Let X be a discrete rv with a set of possible values D and pmf f(x). The expected value or mean
value of X, denoted by E (X) or μₓ, is
E (X) = μₓ = ∑𝐷 𝑥. 𝑓(𝑥)
The expected value of a random variable X is obtained by summing the products of each value x
of the random variable and the corresponding probability f(x), where D is the set of all values
that X can take.
Let us consider the computation of the expected value of the toss of two coins as shown below.
In the first column, the random variable X that denotes the number of heads is defined. The
random variable X takes values as small x, which could be 0, 1, or 2. The corresponding
probability for each value of X is depicted as a probability distribution function or probability
mass function in the second column.
For computing the expected value, the third column presents the product of each value of the
random variable and the corresponding probability. The final column gives the value of each
product, and the sum of these values is the expected value.
X: Random variable (no. of heads)    f(x): pmf    x. f(x)    Value
0                                    ¼            0 * 1/4    0
1                                    ½            1 * 1/2    ½
2                                    ¼            2 * 1/4    ½
E (X) = ∑𝐷 𝑥. 𝑓(𝑥) = 0 + ½ + ½ = 1
If the r.v. X has a set of possible values D and pmf f(x), then the expected value of any function
h(x), denoted by E [ h(x)] or μh(x) is computed by
E [ h(x)] or μh(x) = ∑𝐷 ℎ(𝑥). 𝑓(𝑥)
The expected value of h(x) is given by the sum of the product of h(x) and the corresponding
probability distribution function f(x) as shown in the above expression, where D is the domain.
Example 1: Suppose there is a collection of 12 audio sets that include 2 with white cords. If
three of the sets are chosen at random for shipment to a hotel, how many sets with white cords
can the shipper expect to send to the hotel?
First, we need to construct the probability distribution of X, the number of sets with white cords
shipped to the hotel, given by
$f(x) = \dfrac{{}^{2}C_{x}\;{}^{10}C_{3-x}}{{}^{12}C_{3}}$ for x = 0, 1, 2
X      0      1      2
f(x)   6/11   9/22   1/22
E(X) = 0 * 6/11 + 1 * 9/22 + 2 * 1/22 = ½

If X is the number of points rolled with a balanced die, find the expected value of
g(X) = 2X² + 1
Since the probability of each outcome is ⅙, we get
E [ g(X)] = ∑ (2x² + 1) * ⅙, summed over x = 1, 2, …, 6
= (2 * 91 + 6) / 6
= 94/3
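Both expected values above can be verified with a short Python sketch. The helper name `expectation` and the dictionary names `cords` and `die` are our own choices for illustration; exact fractions are used so that the output can be compared directly with ½ and 94/3.

```python
from fractions import Fraction
from math import comb

def expectation(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * f(x) over all possible values x."""
    return sum(h(x) * p for x, p in pmf.items())

# Example 1: number of white-cord sets among 3 chosen from 12 (2 have white cords)
cords = {x: Fraction(comb(2, x) * comb(10, 3 - x), comb(12, 3)) for x in range(3)}
e_cords = expectation(cords)

# Balanced die with g(X) = 2X^2 + 1
die = {x: Fraction(1, 6) for x in range(1, 7)}
e_g = expectation(die, lambda x: 2 * x**2 + 1)

print(e_cords, e_g)
```

Passing a different function `h` gives the expected value of any other function of X without changing the pmf.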
IN-TEXT QUESTION
5. Suppose the probability distribution function is given as below
x -3 -2 -1 0 1 2 3
f(x) 0.20 0.05 0.15 0.05 p 0.15 0.25
(A) Find the CDF F(x)

(B) Find the expected value E(X)
6. If ‘Y’ is a continuous random variable then the expected value of Y is
(A) P(y)
(B) $\int_{-\infty}^{\infty} y\,f(y)\,dy$
(C) ∑𝐷 𝑦. 𝑓(𝑦)
(D) None of these
7.5.1 Properties of Expected Value
The expected value is the long-run average of a random variable, for example the average
number of heads obtained when a coin is tossed a large number of times.
Some of the properties of expected value E(X) are
1. The expected value of a constant is a constant
E(b) = b
If b =2, then E (2) = 2
2. Expectation of the sum of the two random variables X and Y is equal to the sum of the
expectations of those random variables.
E (X+Y) = E (X) + E (Y)
3. The expected value of the ratio of two random variables is, in general, not equal to the
ratio of the expected values of those random variables.
E (X/ Y) ≠ E(X) / E(Y)
4. The expected value of the product of two random variables that are dependent is, in
general, not equal to the product of the expectations of those random variables.
E (XY) ≠ E(X) * E(Y)
However, if X and Y are independent random variables then
E (XY) = E(X) * E(Y)
Hint: For independent random variables, the joint probability mass function is equal to the
product of the individual (marginal) probability mass functions of the two random variables
for all values of the variables:
f (x, y) = fX (x) * fY (y)
5. The expected value of the square of X is, in general, not equal to the square of the
expected value of X
E (X²) ≠ [ E(X) ]²
6. If a is a constant, then
E (aX) = a E (X)
7. If a and b are constants,

E(aX + b ) = a E ( X ) + E (b)
= a E(X) + b
E (4X + 7) = 4 E (X) + 7
8. For a bivariate probability distribution with joint pmf f(x, y), the expected value of the
product XY is
E (XY) = ∑𝑥 ∑𝑦 𝑥. 𝑦 𝑓(𝑥, 𝑦)
IN-TEXT QUESTION
7. If E(XY)= E(X). E(Y) then X and Y are
(A) Dependent
(B) Independent
(C) Correlated
(D) None of these
8. E(X) = 4
E(Y) = 1
E(X-Y) is ______
(A) 2
(B) 4
(C) 3
(D) None of these
7.6 VARIANCE V(X)

The variance is the mean of the squared deviations of the values of a random variable from
the expected value or population mean. The variance of the random variable X is denoted
by σₓ² or Var (X). The symbol σₓ represents the standard deviation, which is the square root
of the variance.
V(X) = E(X²) - [ E (X) ]² or E(X²) - [ μₓ ]²
Proof:
V(X) = E [ X - E(X) ]² = E [ X - μₓ ]²
= E [ X² - 2 μₓ X + μₓ² ]
= E (X²) - 2 μₓ E(X) + E (μₓ²)
= E (X²) - 2 μₓ² + μₓ²   (since the expected value of a constant is the constant itself)
= E (X²) - μₓ²
= E (X²) - [ E(X) ]²
Let X have pmf f(x) and expected value E(X) = μₓ. Then the variance of X, denoted by V(X)
or σₓ², is
V(X) = σₓ² = ∑ (x - μₓ)² . f(x) = E (X - μₓ)²
The standard deviation (sd) of X is σₓ = √σₓ²
Let us consider the computation of the variance for the toss of two coins.
X: Random variable (no. of heads)    f(x): pmf    x. f(x)    x². f(x)
0                                    ¼            0 * 1/4    0 * ¼ = 0
1                                    ½            1 * 1/2    1 * ½ = ½
2                                    ¼            2 * 1/4    4 * ¼ = 1
E (X) = ∑𝐷 𝑥. 𝑓(𝑥) = 1 and E (X²) = ∑𝐷 𝑥². 𝑓(𝑥) = 1.5
V (X) = E (X²) - [E (X)]² = 1.5 - 1 = 0.5
A CASE STUDY
Calculate the variance of a random variable X that represents the number of points rolled with
a balanced die.
X      1     2     3     4     5     6
f(x)   1/6   1/6   1/6   1/6   1/6   1/6

For this problem, we need to first compute the expected value of X, E (X)
E(X) = 1 * ⅙ + 2 * ⅙ + 3 * ⅙ + 4 * ⅙ + 5 * ⅙ + 6 * ⅙ = 7/2
E (X²) = 1² * ⅙ + 2² * ⅙ + 3² * ⅙ + 4² * ⅙ + 5² * ⅙ + 6² * ⅙ = 91/6
V(X) = E (X²) - [ E(X) ]²
= 91/6 - (7/2)²
= 35/12
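The same variance can be obtained in Python both from the shortcut formula and directly from the definition. This is only a sketch; the variable names are our own.

```python
from fractions import Fraction

# pmf of the number of points rolled with a balanced die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())          # E(X)
second = sum(x**2 * p for x, p in pmf.items())     # E(X^2)

var_shortcut = second - mean**2                    # E(X^2) - [E(X)]^2
var_definition = sum((x - mean)**2 * p for x, p in pmf.items())

print(mean, second, var_shortcut, var_definition)
```

Both routes give the same value, which illustrates the identity proved above.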
IN-TEXT QUESTION
9. Variance of the first 5 natural numbers is ______
(A) 4
(B) 2
(C) 3
(D) 1
7.6.1 Properties of Variance

Variance thus defined shows how the individual values are spread or distributed around its
expected or mean value. Some of the useful properties of variance are mentioned below.
1. If all X values are equal to E (X), the variance is 0, whereas if they are widely
spread around the expected value, the variance will be relatively large.
2. The variance of a constant is zero. It implies V (a) = 0
3. If X and Y are independent random variables then,
V (X + Y) = V (X) + V (Y)
V (X - Y) = V (X) + V (Y)
4. If b is a constant then,
V (X + b) = V (X) + V (b)
= V (X) + 0
= V (X)
5. If a is a constant then,
V (aX) = a² V (X)
V (5X) = 25 V (X)
6. If a and b are constants then,
V (aX + b) = a² V (X) + 0
V (5X + 9) = 25 V (X)
7. If X and Y are independent random variables and a & b are constants then,
V (aX + bY) = a² V (X) + b² V (Y)
V (3X + 5Y) = 9 V (X) + 25 V (Y)
8. The variance can be computed as
V (X) = E (X²) - [ E (X) ]²
E (X²) = ∑𝐷 𝑥². 𝑓(𝑥)
E (X) = ∑𝐷 𝑥. 𝑓(𝑥)
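Some of these properties can be checked by brute-force enumeration. The sketch below assumes, purely for illustration, that X and Y are the scores of two independent fair dice; the helper `var` and the joint pmf `joint` are our own constructions. Note that X - Y is the case a = 1, b = -1 of property 7, so its variance is V(X) + V(Y).

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Joint pmf of two independent fair dice: every pair has probability 1/36
joint = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def var(g):
    """Variance of g(X, Y), computed from the joint pmf by definition."""
    mean = sum(g(x, y) * p for (x, y), p in joint.items())
    return sum((g(x, y) - mean) ** 2 * p for (x, y), p in joint.items())

v = var(lambda x, y: x)                 # V(X) = V(Y) = 35/12
v_sum = var(lambda x, y: x + y)         # V(X + Y)
v_diff = var(lambda x, y: x - y)        # V(X - Y)
v_shift = var(lambda x, y: 5 * x + 9)   # V(5X + 9)
v_comb = var(lambda x, y: 3 * x + 5 * y)

print(v, v_sum, v_diff, v_shift, v_comb)
```

The enumeration confirms that adding a constant leaves the variance unchanged, while multiplying by a constant scales it by the square of that constant.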
IN-TEXT QUESTION
10. V(X) = 9, Find V(2X).
(A) 18
(B) 9
(C) 36
(D) 72
11. If the standard deviation of a set of observations is 5 and if each observation is divided
by 5, then the new standard deviation is.
(A) 1
(B) 2
(C) 4
(D) 5
7.7 SUMMARY
The lesson presents the cumulative distribution function for both discrete and continuous
random variables. The properties of the cumulative distribution function have been discussed
with respect to both the discrete random variable and the continuous random variable. The
method of deriving the probability density function from the cumulative density function by
differentiating the cumulative density function is explained. Further, descriptive statistics
such as the expected value and the variance are computed for a probability distribution
function. Several useful and interesting properties of the expected value and the variance
are also explained in the lesson.
7.8 GLOSSARY
1. Cumulative Distribution Function: The cumulative distribution function is another
way of defining the distribution of a discrete random variable. The cumulative
distribution function (cdf) of a discrete r.v. X with pmf p(x) is defined for every
number x by F(x) = P (X ≤ x).
2. Cumulative Density Function: The cumulative density function is the cumulative
distribution function of a continuous random variable, obtained by cumulating its
probability density.
3. Expected value of Distribution Function: Let X be a discrete rv with a set of possible
values D and pmf p(x). The expected value or mean value of X, denoted by E (X) or μₓ,
is the sum of the products of each value of the random variable and the associated
probability.
4. Variance of Distribution Function: The variance is the mean of the squared
deviations of the values of a random variable from the expected value or population
mean. The variance of the random variable X is denoted by σ² or Var (X). The
symbol σ represents the standard deviation, which is the square root of the variance.
1. (A) 0.35
0 x<0
0.25 0 ≤ x<1
(B) F(x) = 0.35 1≤x<2
0.65 2≤x<3
1 x≥3
2. (A) 0.35
(B) 0.50
3. (C) less than or equal to x
4. (A) 1/9
(B) 19/27
5. (A) 0.15
(B) 0.35
6. (B) $\int_{-\infty}^{\infty} y\,f(y)\,dy$
7. (b) independent
8. (C) 3
9. (b) 2
10. (c) 36
11. (a) 1
Q. 1) If distribution function of X is given by
0 x<2
2/10 2 ≤ x<4
F(x) = 6/10 4≤x<6
7/10 6≤x<8
1 x≥8
(a) Find the probability distribution of the random variable f(x)

(b) P(X≥6). Probability of X taking the value at least 6.
Q 2) Suppose the probability distribution

X 0 1 2 3 4
P(x) 1/6 2/6 0 p 1/3
Find the value of the following

(a) F(x) CDF
(b) E(X), Expected value
(c) E(X+2)
(d) V(X), Variance
(e) V(X+2)
Q 3) The probability density function of random variable Y is given by

f(y) = c/√y   0 < y < 9
       0      otherwise
Find the value of
(a) c
(b) P(Y>4)
(c) E(Y), Expected value of Y
Q.4 A coin is tossed thrice. Let X denote the number of tails. Find its expectation and
variance.
Q.5 The probability density function of a random variable Y is given as below:
f(y) = 1/16y 0≤y≤9

0 Otherwise
Find the value of c such that P(Y≤c) = 1/2

LESSON-8
DISCRETE DISTRIBUTION
STRUCTURE
8.2 Introduction
8.3 Probability Distribution for Discrete Random Variable
8.3.1 Uniform Distribution
8.3.2 Bernoulli Distribution
8.3.3 Binomial Distribution
8.3.4 Poisson Distribution
8.3.5 Limiting Case of Binomial Distribution
8.3.6 Hyper Geometric Distribution
8.4 Summary
8.5 Answer to In-Text Questions
8.7 References
After reading this lesson, students will be able to learn about:
1. Different kinds of discrete distributions.
2. Uniform distributions, Bernoulli distributions and Bernoulli trials.
3. Binomial distributions and Poisson distributions, with some important numerical examples.
4. Waiting distributions, i.e., geometric, negative binomial and hypergeometric
distributions.
8.2 INTRODUCTION
In this unit we will study the different types of discrete distributions. We have studied the
discrete random variable in unit 3. Discrete random variables give rise to discrete probability
distributions: the possible values of a discrete random variable, along with their
probabilities, form the discrete probability distribution. The distributions covered are:
1. Uniform Distribution
2. Bernoulli Distribution
3. Binomial Distribution
4. Poisson Distribution
5. Limiting case of Binomial distribution
6. Hyper-geometric distribution
8.3 PROBABILITY DISTRIBUTION FOR DISCRETE RANDOM VARIABLE

8.3.1 Uniform Distribution
Under this distribution, the random variable can take ‘n’ different values with equal probability.
For example, on rolling a fair die we get 1, 2, 3, 4, 5, 6 as outcomes, each with probability
1/6. So the occurrence of each event is equally likely. The probability mass function is
f (x) = 1/n   x = 1, 2, 3, ...., n
        0     otherwise
with mean E(x) and variance V(x) given by
$E(x) = \dfrac{n+1}{2}, \quad V(x) = \dfrac{n^{2}-1}{12}$
Note that x starts from 1, i.e., x = 1, 2, 3, ...., n, and not from 0.
If the values start from 0 instead, then
f (x) = 1/(n + 1)   x = 0, 1, 2, ...., n
        0           otherwise
since there will be n + 1 terms. In this case the mean E(x) and variance V(x) are
$E(x) = \dfrac{n}{2}, \quad V(x) = \dfrac{n(n+2)}{12}$
8.3.2 Bernoulli Distribution

A random experiment which gives rise to only two outcomes, say ‘pass’ and ‘fail’, is known as
a Bernoulli experiment. A random variable X is said to have a Bernoulli distribution if its
probability mass function is
$f(x; p) = p^{x}(1-p)^{1-x}$ ; x = 0, 1 ; 0 ≤ p ≤ 1
f(x; p) = 0 otherwise
Here the probability of pass is p and the probability of fail is (1 – p).

So, when x = 0, f (0; p) = 1 – p, and when x = 1, f (1; p) = p.
It is to be noted that under the Bernoulli distribution the number of trials is only 1. For
example, if we toss a coin to get ‘head’ as the outcome, then the trial is just one toss to decide
the outcome. So, the probability of getting a head is p and the probability of getting a tail is 1 – p.
Mean = E(x) = p
Variance = V(x) = p (1-p)
8.3.3 Binomial Distribution
Binomial distribution was introduced by James Bernoulli in the year 1700. Binomial
distribution considers a sequence of Bernoulli trials i.e., having only two outcomes i.e. ‘pass’
and ‘fail’. The n trials are conducted under identical conditions and are independent with
constant probability.
Let X denote the number of successes (passes) in n independent trials, with probability p of
success and (1 – p) = q of failure in each trial.
So, $f(X = r) = {}^{n}C_{r}\,p^{r}q^{n-r}$ ; r = 0, 1, 2, 3, ….., n ; 0 ≤ p ≤ 1 ; 0 ≤ q ≤ 1
where n and p are the parameters of the binomial distribution.
Mean = np and variance = npq or np (1–p)
Example: The probability that a student will pass an examination is 0.4. Determine the
probability that out of 5 students: (i) at least one, (ii) at least two will pass the exam.
(i) p = P (passing the exam) = 0.4
q = P (failing the exam) = 1 – 0.4 = 0.6
P (at least one will pass the exam) = 1 – P (none will pass)
P (none will pass) = P (X = 0) = (0.6)⁵ = 0.07776
P (at least one will pass the exam) = 1 – 0.07776
= 0.92224
(ii) P (at least two will pass) = P (X ≥ 2) = 1 – P (X < 2)
P (X ≥ 2) = 1 – [P (X = 0) + P (X = 1)]
= 1 – [0.07776 + ${}^{5}C_{1}\,p^{1}q^{4}$]
= 1 – [0.07776 + ${}^{5}C_{1}(0.4)^{1}(0.6)^{4}$]
= 1 – [0.07776 + 0.2592]
= 0.66304
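The two probabilities in this example can be checked with a short Python sketch. The function name `binom_pmf` is our own; it simply evaluates the binomial pmf given above.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for a binomial distribution with parameters n and p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 5, 0.4
at_least_one = 1 - binom_pmf(0, n, p)
at_least_two = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

print(round(at_least_one, 5), round(at_least_two, 5))
```

Working with the complement ("none" or "fewer than two") keeps the number of terms small, exactly as in the hand calculation.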
8.3.4 Poisson Distribution
The Poisson distribution was developed by Simeon Denis Poisson in 1837. Here the random
variable X represents the number of successes in a specific interval of space or time. Its
probability mass function is given by
$f(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$ ; x = 0, 1, 2, 3, …... ; λ > 0
f(X = x) = 0 ; otherwise
where λ represents the average number of successes.
E(X) = λ and V (X) = λ, i.e., the mean and the variance are both equal to λ.
Some examples where the Poisson distribution can be used are:
The number of deaths in an area during a month.
The number of cars arriving at a parking lot during a given period of time.
The number of errors made by a typist per page.
The number of defective bulbs in a manufacturing unit.
The number of customers arriving at a shop per hour.
Visits to a particular website, or e-mail messages sent to a particular address.
Accidents in an industrial facility.
Cosmic ray showers observed by astronomers at a particular observatory.
For example, the average number of customers at a shop is 4 in an hour. Find the probability
that during an hour (i) no customer arrives (ii) 2 or more customers arrive at the shop.
Let X be the number of customers at the shop in an hour.
So, X follows a Poisson distribution with λ = 4, as λ represents the average number of
successes; that is, on average 4 customers arrive at the shop every hour.
So, $f(X = x) = \dfrac{e^{-4}4^{x}}{x!}$ ; x = 0, 1, 2, 3, …...
(i) $f(X = 0) = \dfrac{e^{-4}4^{0}}{0!} = 0.01831$
(ii) f (X ≥ 2) = 1 – f (X < 2)
= 1 – [f (X = 0) + f (X = 1)]
= 1 – [0.01831 + $\dfrac{e^{-4}4^{1}}{1!}$]
= 1 – [0.01831 + 0.07326]
= 0.90843
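The shop example can be reproduced with the following minimal sketch. The function name `poisson_pmf` is our own; it evaluates the Poisson pmf stated above with λ = 4.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 4  # average number of customers per hour
p_none = poisson_pmf(0, lam)
p_two_or_more = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

print(round(p_none, 5), round(p_two_or_more, 4))
```

Small differences in the last decimal place compared with a hand calculation can arise from rounding the intermediate terms.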
8.3.5 Limiting Case of Binomial Distribution
If the probability of success in the binomial distribution is very small and the number of
trials is large, then the binomial distribution can be approximated by the Poisson
distribution. In such an approximation the average number of successes is the mean of the
binomial distribution, which is np, i.e.,
λ = np
Mathematically, if n → ∞ and p → 0 such that np remains finite, then the binomial distribution
is approximated by the Poisson distribution.
For example, the probability that a covid vaccine is ineffective is 0.002. Determine the
probability that out of 1000 vaccinated individuals:
(i) exactly 2 will suffer Covid infection after being vaccinated.
(ii) more than 2 will suffer from infection.
Answer. Since p is 0.002, i.e., p → 0, and n is large, i.e., 1000, the binomial distribution
will approximate to the Poisson distribution with λ = np.
λ = np = 1000 × 0.002 = 2
(i) $f(X = 2) = \dfrac{e^{-2}2^{2}}{2!} = 0.2706$
(ii) f (X > 2) = 1 – f (X ≤ 2)
= 1 – [f (X = 0) + f (X = 1) + f (X = 2)]
= 1 – [ $\dfrac{e^{-2}2^{0}}{0!} + \dfrac{e^{-2}2^{1}}{1!} + \dfrac{e^{-2}2^{2}}{2!}$ ]
= 1 – [0.1353 + 0.2706 + 0.2706]
= 0.3235
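To see how close the approximation is, the exact binomial probabilities can be compared with the Poisson values. The sketch below uses n = 1000 and p = 0.002 from the example; the function names are our own.

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p  # Poisson parameter, lambda = np = 2

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    return exp(-lam) * lam**x / factorial(x)

exact_2 = binom_pmf(2)
approx_2 = poisson_pmf(2)
exact_more_than_2 = 1 - sum(binom_pmf(x) for x in range(3))
approx_more_than_2 = 1 - sum(poisson_pmf(x) for x in range(3))

print(round(exact_2, 4), round(approx_2, 4))
print(round(exact_more_than_2, 4), round(approx_more_than_2, 4))
```

The two sets of values agree to about three decimal places. Without rounding the intermediate terms, the Poisson value for part (ii) comes out near 0.3233, slightly below the hand-rounded figure.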
IN-TEXT QUESTIONS
1. The probability of a car having a flat tire while crossing a bridge is 0.00006. Use the
Poisson distribution to approximate the binomial probability that, among 10,000 cars
crossing this bridge, exactly one will have a flat tire.
2. ________________ distribution has only one trial.
3. If the probability of success in a binomial distribution is ___________ and the number of
trials is ______________, then the binomial distribution can be approximated by the Poisson
distribution.
4. Fit Binomial distribution to the following data.
X 0 1 2 3 4
F 25 68 45 12 5
8.3.6 Hyper Geometric Distribution

A hypergeometric distribution is obtained if the population is finite, sampling is done
without replacement, and the events, though random, are statistically dependent. Consider an
urn with N balls of which K are red and the remaining N – K are white. Let us draw a random
sample of n balls without replacement. Then the probability of getting x red balls out of ‘n’
is given by
$f(x) = \dfrac{{}^{K}C_{x}\;{}^{N-K}C_{n-x}}{{}^{N}C_{n}}$ ; x = 0, 1, 2, ...., n
f(x) = 0 otherwise
For example: Let there be 5 economics and 10 commerce graduates. An organization requires
5 analysts, who are to be chosen from the economics and commerce students. Find the
probability of selecting 3 economics students for this job.
K = 5 = number of economics students
N – K = 10 = number of commerce students
N = 15 = total number of students
x = 3 = number of economics students selected as analysts
n = 5 = total number of students selected as analysts
$f(x) = \dfrac{{}^{K}C_{x}\;{}^{N-K}C_{n-x}}{{}^{N}C_{n}}$
$f(x = 3) = \dfrac{{}^{5}C_{3}\;{}^{10}C_{2}}{{}^{15}C_{5}} = \dfrac{10 \times 45}{3003}$
= 0.1498
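The hypergeometric probability above can be checked as follows. This is an illustrative sketch; the function name `hypergeom_pmf` and its argument order are our own.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(x 'red' items in a sample of n drawn without replacement
    from N items of which K are 'red')."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# 5 economics and 10 commerce students, 5 analysts chosen, 3 from economics
prob = hypergeom_pmf(3, N=15, K=5, n=5)
print(round(prob, 4))

# The probabilities over all possible x add up to 1
total = sum(hypergeom_pmf(x, 15, 5, 5) for x in range(6))
```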
8.4 SUMMARY
A thorough knowledge of all the discrete distributions will help to assess the level of
uncertainty and plan accordingly. All the distributions, along with their probability mass
functions, means and variances, are shown in the following table.
Distribution    Probability Mass Function (PMF)           Mean         Variance
Uniform         f(x) = 1/n ; x = 1, 2, 3, ....., n        (n + 1)/2    (n² – 1)/12
Bernoulli       p^x (1 – p)^(1–x) ; x = 0, 1              p            p (1 – p)
Binomial        nCr p^r q^(n–r) ; r = 0, 1, ...., n       np           np (1 – p)
Poisson         e^(–λ) λ^x / x! ; x = 0, 1, 2, .....      λ            λ
8.5 ANSWER TO IN-TEXT QUESTIONS

1. Since p is 0.00006, i.e., p tends to 0, and n is large, i.e., 10,000, the binomial
distribution will approximate to the Poisson distribution with λ = np.
λ = np = 0.00006 × 10,000 = 0.6
$f(X = 1) = \dfrac{e^{-0.6}(0.6)^{1}}{1!} = 0.3293$
So, the answer is 0.3293.
2. Bernoulli
3. small; large
4. We have n = 4 and N = ∑f = 155.
The mean of the given distribution is
Mean = ∑fX / ∑f = (0×25 + 1×68 + 2×45 + 3×12 + 4×5) / 155 = 214/155 = 1.380
The mean of the binomial distribution is n * p, so
4p = 214/155
p = 107/310 ; q = 203/310
The expected binomial frequencies are
$f(x) = N\,P(X = x) = 155 \times {}^{4}C_{x}\left(\tfrac{107}{310}\right)^{x}\left(\tfrac{203}{310}\right)^{4-x}$
X    P(X = x)                                 Expected binomial frequency N P(x) = 155 P(x)
0    4C0 (107/310)^0 (203/310)^4 = 0.1839     28.50
1    4C1 (107/310)^1 (203/310)^3 = 0.3877     60.09
2    4C2 (107/310)^2 (203/310)^2 = 0.3065     47.51
3    4C3 (107/310)^3 (203/310)^1 = 0.1077     16.70
4    4C4 (107/310)^4 (203/310)^0 = 0.0142     2.20
Rounding so that the frequencies total 155, the fitted distribution is
X    0    1    2    3    4
F    28   60   48   17   2
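The fitting steps in answer 4 can be sketched in Python. The variable names are our own; the code estimates p from the sample mean and then multiplies each binomial probability by the total frequency N.

```python
from math import comb

xs = [0, 1, 2, 3, 4]
freq = [25, 68, 45, 12, 5]        # observed frequencies
n = 4                             # number of trials
N = sum(freq)                     # total frequency

mean = sum(x * f for x, f in zip(xs, freq)) / N
p = mean / n                      # since mean = n * p
q = 1 - p

probs = [comb(n, x) * p**x * q**(n - x) for x in xs]
expected = [N * pr for pr in probs]

for x, pr, e in zip(xs, probs, expected):
    print(x, round(pr, 4), round(e, 2))
```

The expected frequencies are not whole numbers, so they are rounded in such a way that their total equals N.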

(1) The probability of a car having a flat tyre while crossing a bridge is 0.00005. Use the
Poisson distribution to approximate the binomial probabilities that, among 10,000 cars
crossing the bridge,
(a) exactly two have a flat tyre
(b) at most one has a flat tyre
(2) Suppose car accidents in Delhi follow a Poisson distribution with an average of one
accident per day. What is the probability that more than 5 accidents will occur in a week?
(3) Let X ~ B (n, p) with n = 25 and p = 0.3. Find P (X < µ – 2σ).
(4) In an air pollution survey, an inspector decides to examine 3 of a company’s 30
factories. Suppose 6 of the company’s factories emit excessive pollutants and 24 factories
follow all the standards. What is the probability that one of the factories will be included
in the examination?
LESSON 9
CONTINUOUS DISTRIBUTION
STRUCTURE
9.1 Learning Objective
9.2 Introduction
9.2.2 Exponential Distribution
9.2.3 Normal Distribution
9.2.4 Standard Normal Distribution
9.2.5 Central Limit Theorem
9.3 Summary
9.6 References
After reading this lesson, students will be able to learn :
1. Uniform and exponential distributions.
2. Normal distribution with its properties.
3. Standard Normal distribution with its applications and
4. Central limit theorem and its applications.
9.2 INTRODUCTION
In previous lessons, you have learned about continuous random variables and associated
functions. In this chapter you will read about different kinds of continuous distribution. Normal
distribution is used extensively in economics and statistics. Several important applications of
normal distribution with cognizable examples have been discussed. The central limit theorem
is one of the most celebrated theorems in statistics and is used extensively.


A random variable X is said to have a continuous uniform distribution over an interval (a, b) if
its probability density function is constant over the entire range of X. The random variable X
takes values between a and b.
So, the total probability is evenly distributed over the entire interval, such that subintervals
of the same length have the same probability.
$f(x) = \dfrac{1}{b-a}$ ; a ≤ x ≤ b
f(x) = 0 ; otherwise
$\text{Mean} = \dfrac{b+a}{2}, \quad \text{Variance} = \dfrac{(b-a)^{2}}{12}$
The CDF, for a ≤ x ≤ b, is
$P(X \le x) = \int_{a}^{x} \dfrac{1}{b-a}\,dt = \dfrac{x-a}{b-a}$
For example, if Y ~ U (50, 140), what are P (Y ≥ 70) and P (50 ≤ Y ≤ 130)?
$f(x) = \dfrac{1}{140-50} = \dfrac{1}{90}$, since a = 50 and b = 140
$P(Y \ge 70) = \int_{70}^{140} \dfrac{1}{90}\,dx = \left[\dfrac{x}{90}\right]_{70}^{140} = \dfrac{140}{90} - \dfrac{70}{90} = \dfrac{70}{90} = 0.78$
$P(50 \le Y \le 130) = \int_{50}^{130} \dfrac{1}{90}\,dx = \dfrac{130}{90} - \dfrac{50}{90} = \dfrac{80}{90} = 0.89$
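Both probabilities follow directly from the CDF (x - a)/(b - a). A minimal sketch, with a helper name `uniform_cdf` of our own choosing:

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for a continuous uniform distribution on (a, b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 50, 140
p_at_least_70 = 1 - uniform_cdf(70, a, b)
p_50_to_130 = uniform_cdf(130, a, b) - uniform_cdf(50, a, b)

print(round(p_at_least_70, 2), round(p_50_to_130, 2))
```

Each probability is simply the length of the interval of interest divided by the total length b - a = 90.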
9.2.2 Exponential Distribution
A continuous non-negative random variable X is said to have an exponential distribution with
parameter λ if its pdf is given by
$f(x) = \lambda e^{-\lambda x}$ ; x ≥ 0
f(x) = 0 ; otherwise
The exponential distribution is also known as the waiting distribution, where the waiting
parameter is λ.
So, $E(X) = \dfrac{1}{\lambda}$ and $V(X) = \dfrac{1}{\lambda^{2}}$
The cumulative density function is
$P(X \le x) = \int_{0}^{x} \lambda e^{-\lambda t}\,dt = \left[\dfrac{\lambda e^{-\lambda t}}{-\lambda}\right]_{0}^{x} = 1 - e^{-\lambda x}$
Let us consider that the probability of getting a success in a short time interval between t
and t + Δt is proportional to λ. The probability of waiting longer than x is then
$P(X > x) = 1 - P(X \le x) = 1 - [1 - e^{-\lambda x}] = e^{-\lambda x}$
Example: A call center receives 4 calls per hour. What is the probability that the next call
arrives after ½ hour?
Solution: Here λ = 4, so
$P\left(X > \tfrac{1}{2}\right) = \int_{1/2}^{\infty} 4e^{-4x}\,dx = \left[\dfrac{4e^{-4x}}{-4}\right]_{1/2}^{\infty}$
$= -e^{-\infty} + e^{-4/2}$
$= 0 + e^{-2}$
= 0.1353
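The call-center example can be checked with the tail formula P(X > x) = e^(-λx). A small sketch, where the function name `exp_survival` is our own:

```python
from math import exp

def exp_survival(x, lam):
    """P(X > x) for an exponential distribution with rate lam."""
    return exp(-lam * x) if x >= 0 else 1.0

lam = 4                      # calls per hour
p_wait_over_half_hour = exp_survival(0.5, lam)
mean_wait = 1 / lam          # E(X) = 1/lambda, in hours

print(round(p_wait_over_half_hour, 4), mean_wait)
```

With 4 calls per hour the average wait is a quarter of an hour, so waiting more than half an hour is fairly unlikely.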
9.2.3 Normal Distribution
This is the most important distribution, developed in 1733 by the French mathematician De
Moivre. The normal distribution is also called the Gaussian distribution, as the German
mathematician Friedrich Gauss (1777-1855) derived its equation. It is a symmetric bell-shaped
curve.
A random variable X is called a normal random variable if its probability density function is
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$ ; −∞ < x < ∞
where the constants are π ≈ 3.14 and e ≈ 2.718, µ and σ are the two parameters of the
distribution, and x is a real number denoting a value of the continuous random variable of
interest.
Properties
• It is symmetric about the mean.
• The points of inflexion lie at a distance of ±σ from the mean and, because of the
symmetry, the normal curve has a bell shape.
• The right and left tails of the curve extend infinitely without touching the horizontal x
axis.
• In a normal distribution, Mean = Median = Mode.
• The probability that X lies between two points a and b is the area under the curve
between a and b.
• Area between µ − σ and µ + σ is 68.27%.
• Area between µ − 2σ and µ + 2σ is 95.44%.
• Area between µ − 3σ and µ + 3σ is 99.73%.
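The three areas listed above can be computed without tables by using the error function from Python's standard library, since for a normal variable P(µ − kσ ≤ X ≤ µ + kσ) = erf(k/√2). The function name `area_within` is our own. Computed directly, the two-sigma area rounds to 95.45%; the figure 95.44% corresponds to doubling the four-place table value 0.4772.

```python
from math import erf, sqrt

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
```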
9.2.4 Standard Normal Distribution

Making the transformation $Z = \dfrac{X-\mu}{\sigma}$, we obtain the standard normal variable Z
that has mean µ = 0 and standard deviation σ = 1.
Area under the standard normal curve = 1.
Properties
(1) P (a ≤ X ≤ b) = P (a < X ≤ b) = P (a ≤ X < b) = P (a < X < b)

(2) In the standard normal curve, the areas to the right and to the left of 0 are each 0.5.
The normal curve is symmetric, so the area between 0 and a particular point c and the area
between 0 and the point −c will be the same.
P (−c ≤ Z ≤ 0) = P (0 ≤ Z ≤ c)
For example, the area between 0 and 1.45 will be equal to the area between 0 and −1.45.
P (0 ≤ Z ≤ 1.45) = P (−1.45 ≤ Z ≤ 0) = 0.4265

Now you all must be wondering how the value 0.4265 was obtained. For this you have to learn to
read the standard normal table.
The table gives, for different values of z, the probability P (0 ≤ Z ≤ z), which is the area
under the standard normal curve between 0 and z.
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
156 | P a g e
So, to get the value of $P(0 \le Z \le 0.34)$, we look for 0.3 in the first column and 0.04 in the first row. At their intersection we get the value 0.1331.
Similarly, suppose we want $P(-0.34 \le Z \le 0.34)$. We know that the standard normal curve is symmetrical, so
$P(0 \le Z \le 0.34) = P(-0.34 \le Z \le 0)$
So,
$P(-0.34 \le Z \le 0.34) = P(-0.34 \le Z \le 0) + P(0 \le Z \le 0.34)$
$= P(0 \le Z \le 0.34) + P(0 \le Z \le 0.34)$
$= 2\,P(0 \le Z \le 0.34)$
$= 2 \times 0.1331$
$= 0.2662$
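The table entries themselves can be reproduced with a one-line formula. This is a minimal sketch, assuming only Python's standard `math.erf`; the function name `area_0_to_z` is illustrative and not part of the lesson.

```python
import math

def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

# Table entry in row 0.3, column 0.04
print(round(area_0_to_z(0.34), 4))
# P(-0.34 <= Z <= 0.34), using symmetry about 0
print(round(2 * area_0_to_z(0.34), 4))
```

Doubling the unrounded area can differ from doubling the rounded table entry in the fourth decimal place, which is an ordinary rounding effect.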
Suppose X is a normally distributed random variable with mean $\mu$ and variance $\sigma^2$. Any normally distributed random variable can be transformed to the standard normal distribution by
$$z = \frac{X - \mu}{\sigma}$$
The new random variable z has mean 0 and standard deviation 1.
Example: If X is normally distributed with mean $\mu = 30$ and variance $\sigma^2 = 16$, find
1. $P(X \ge 15)$
2. $P(10 < X < 25)$
3. $P(12 \le X \le 36)$
Sol. First, we have to convert X to Z by the transformation
$$Z = \frac{X - \mu}{\sigma} = \frac{X - 30}{4}$$
(1) For $X = 15$, $Z = \frac{15 - 30}{4} = \frac{-15}{4} = -3.75$
$P(X \ge 15) = P(Z \ge -3.75) = P(-3.75 \le Z \le 0) + 0.5$
$= 0.4999 + 0.5$
$= 0.9999$
(2) $P(10 < X < 25) = P\left(\frac{10 - 30}{4} < \frac{X - \mu}{\sigma} < \frac{25 - 30}{4}\right)$
$= P(-5 < Z < -1.25)$
$= P(-5 < Z \le 0) - P(-1.25 \le Z \le 0)$
$= 0.4999 - 0.3944$
$= 0.1055$
(3) $P(12 \le X \le 36) = P\left(\frac{12 - 30}{4} \le \frac{X - \mu}{\sigma} \le \frac{36 - 30}{4}\right)$
$= P(-4.5 \le Z \le 1.5)$
$= P(-4.5 \le Z \le 0) + P(0 \le Z \le 1.5)$
$= 0.4999 + 0.4332$
$= 0.9331$
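The three answers can be cross-checked with Python's standard library. This is a sketch under the assumption that part 1 asks for $P(X \ge 15)$, as the solution indicates; the variable names are illustrative. Results may differ from the table-based answers in the fourth decimal place because the table rounds extreme areas to 0.4999.

```python
from statistics import NormalDist

# Variance 16 means a standard deviation of 4
X = NormalDist(mu=30, sigma=4)

p1 = 1 - X.cdf(15)           # P(X >= 15)
p2 = X.cdf(25) - X.cdf(10)   # P(10 < X < 25)
p3 = X.cdf(36) - X.cdf(12)   # P(12 <= X <= 36)
print(f"{p1:.4f} {p2:.4f} {p3:.4f}")
```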
IN-TEXT QUESTIONS
1. In normal distribution, 34% of the items are under 50 and 5% of items are over 70. Find
the mean and variance of the distribution.
2. Standard normal distribution has mean ………... and variance…………
SKEWNESS
When a distribution departs from symmetry, it is said to be skewed.
There are two types of skewed distribution: positively skewed and negatively skewed. A positively skewed distribution is skewed to the right, i.e., it has a longer tail on the right side. Similarly, a negatively skewed distribution is skewed to the left, i.e., it has a longer tail on the left side.
KURTOSIS
Kurtosis measures the degree of peakedness of a distribution. A distribution with a high peak is called leptokurtic, and a distribution with a low peak is called platykurtic. The normal distribution is mesokurtic: neither too peaked nor too flat.
So, the normal distribution is symmetric, i.e., neither positively nor negatively skewed, and it is mesokurtic, i.e., neither too high-peaked nor too low-peaked.
9.2.5 Central Limit Theorem
According to the central limit theorem, if $X_1, X_2, X_3, \dots, X_n$ are independently and identically distributed (IID) random variables following a normal distribution with mean $\mu$ and variance $\sigma^2$, then we can find the distribution of the sum $\sum X_i$ and of the sample mean $\bar{X}$.
First, to find the distribution of the random variable $\sum X_i$:
# Rule: $E(X + Y) = E(X) + E(Y)$
Since $X_1, X_2, X_3, \dots, X_n \sim N(\mu, \sigma^2)$, we have $E(X_1) = \mu$, $E(X_2) = \mu$ and so on.
$\sum X_i = X_1 + X_2 + \dots + X_n$
$E(\sum X_i) = E(X_1 + X_2 + \dots + X_n)$
$E(\sum X_i) = E(X_1) + E(X_2) + \dots + E(X_n)$
$E(\sum X_i) = \mu + \mu + \dots + \mu$
So, adding $\mu$ n times gives the result
$E(\sum X_i) = n\mu$    Eq. (1)
To find the variance of $\sum X_i$:
$V(\sum X_i) = V(X_1 + X_2 + X_3 + \dots + X_n)$
Since $X_1, X_2, \dots, X_n \sim N(\mu, \sigma^2)$, we have $V(X_1) = \sigma^2$, $V(X_2) = \sigma^2$, ..., $V(X_n) = \sigma^2$. Because the variables are independent, the variance of the sum is the sum of the variances:
$V(\sum X_i) = V(X_1) + V(X_2) + V(X_3) + \dots + V(X_n)$
$V(\sum X_i) = \sigma^2 + \sigma^2 + \dots + \sigma^2$
So, adding $\sigma^2$ n times gives the result
$V(\sum X_i) = n\sigma^2$    Eq. (2)
So, $\sum X_i \sim N(n\mu, n\sigma^2)$ and
$$z = \frac{\sum X_i - n\mu}{\sqrt{n\sigma^2}} \sim N(0, 1)$$
Now, let us find the distribution of the random variable $\bar{X}$:
$$\bar{X} = \frac{\sum X_i}{n}$$
$E(\bar{X}) = E\left(\frac{\sum X_i}{n}\right)$
$= \frac{1}{n} E(\sum X_i)$
$= \frac{1}{n}\, n\mu$    (from (1), $E(\sum X_i) = n\mu$)
$= \mu$
$V(\bar{X}) = V\left(\frac{\sum X_i}{n}\right)$    # Rule: $V(aX) = a^2 V(X)$
$V(\bar{X}) = \frac{V(\sum X_i)}{n^2}$
$V(\bar{X}) = \frac{n\sigma^2}{n^2}$    (from (2), $V(\sum X_i) = n\sigma^2$)
$V(\bar{X}) = \frac{\sigma^2}{n}$
Hence
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
Ques. In a large population, the distribution of a variable has a mean of 165 and a standard deviation of 25 units. If a random sample of size 35 is chosen, find the approximate probability that the sample mean lies between 162 and 170.
Solution: The population has $\mu = 165$ and $\sigma^2 = 25^2$, and the sample size is $n = 35$.
We have to find the distribution of the sample mean. Approximately,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(165, \frac{25^2}{35}\right)$$
Note that $\frac{25^2}{35}$, i.e., $\frac{\sigma^2}{n}$, is the variance. The standard deviation is $\sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{35}}$.
To find $P(162 \le \bar{X} \le 170)$:
$$P(162 \le \bar{X} \le 170) = P\left(\frac{162 - 165}{25/\sqrt{35}} \le z \le \frac{170 - 165}{25/\sqrt{35}}\right)$$
$= P(-0.71 \le z \le 1.18)$
$= P(0 \le z \le 1.18) + P(0 \le z \le 0.71)$, as $P(-0.71 \le z \le 0) = P(0 \le z \le 0.71)$
$= 0.3810 + 0.2611$
$= 0.6421$
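The same calculation can be sketched in Python, treating the sample mean as approximately normal with standard deviation $\sigma/\sqrt{n}$. The variable names are illustrative. Software carries the z-values to more decimal places than the table, so the result is close to 0.64 and can differ from a table-based answer in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 165, 25, 35

# Approximate sampling distribution of the sample mean
xbar = NormalDist(mu, sigma / sqrt(n))

z_low = (162 - mu) / xbar.stdev
z_high = (170 - mu) / xbar.stdev
prob = xbar.cdf(170) - xbar.cdf(162)
print(f"z from {z_low:.2f} to {z_high:.2f}, probability {prob:.4f}")
```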
As the sample size increases, the distribution of $\bar{X}$ tends to a normal distribution, whatever the distribution of the population. As a rule of thumb, the sample size should be at least 30, i.e., $n \ge 30$, for the normal approximation to be used. With a large enough sample, even the sample mean of a discrete distribution is approximately normal.
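A small simulation can illustrate this behaviour. The sketch below is an assumption-laden illustration rather than part of the lesson: it draws samples from an exponential population with mean 1 and standard deviation 1 (a clearly non-normal, skewed distribution) and compares the spread of the sample means with the theoretical value $\sigma/\sqrt{n}$. The function name `sample_means` and the fixed seed are hypothetical choices.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so that the run is repeatable

def sample_means(n: int, reps: int = 5000) -> list:
    """Means of `reps` samples of size n from an exponential population (mean 1, sd 1)."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

for n in (2, 30):
    m = sample_means(n)
    # columns: n, average of sample means, sd of sample means, theoretical sigma/sqrt(n)
    print(n, round(mean(m), 3), round(pstdev(m), 3), round(1 / n ** 0.5, 3))
```

The average of the sample means stays near the population mean, while their standard deviation shrinks roughly in line with $\sigma/\sqrt{n}$.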

IN-TEXT QUESTIONS
3. Consider a random sample of size 30 taken from a normal distribution with mean 60 and variance 25. Let the sample mean be denoted by $\bar{X}$. Calculate the probability that $\bar{X}$ assumes a value greater than 62.
4. The mean and variance of the distribution of the random variable $\bar{X}$ are ……………... and ……………. respectively.
9.3 SUMMARY
Continuous distributions are used extensively in economics. A thorough knowledge of these distributions will help you to solve the self-assessment questions properly. The central limit theorem is one of the most celebrated theorems of statistics. It states that, for samples drawn from any discrete or continuous distribution, the distribution of the sample mean approaches the normal distribution as the sample size increases; in practice, $n \ge 30$ is taken as large enough. We can also represent all the continuous distributions with their probability density function, mean and variance.
Distribution | Probability density function | Mean | Variance
Uniform distribution | $\frac{1}{b - a}$ | $\frac{b + a}{2}$ | $\frac{(b - a)^2}{12}$
Exponential distribution | $\lambda e^{-\lambda x}$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$
Normal distribution | $\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$ | $\mu$ | $\sigma^2$
Standard normal distribution | $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}$ | 0 | 1
1. As per the given information,
$P(X < 50) = 0.34$ and $P(X > 70) = 0.05$
Let $\mu$ and $\sigma$ be the mean and standard deviation of X. Converting X into the standard normal variable z:
for $X = 50$, $z_1 = \frac{50 - \mu}{\sigma}$, and for $X = 70$, $z_2 = \frac{70 - \mu}{\sigma}$.
The respective areas are represented on the graph.
$P(Z < z_1) = 0.34$
$P(z_1 \le Z \le 0) = 0.5 - 0.34 = 0.16$
By symmetry, this equals the area between 0 and $-z_1$:
$P(0 \le Z \le -z_1) = 0.16$
So, the value of $-z_1$ can be found from the standard normal table. The tabulated area closest to 0.16 is 0.1591, which corresponds to 0.41.
So, $-z_1 = 0.41$
$-\left(\frac{50 - \mu}{\sigma}\right) = 0.41$
$50 - \mu = -0.41\sigma$    (1)
$P(X > 70) = 0.05$
$P(Z > z_2) = 0.05$
$P(0 \le Z \le z_2) = 0.5 - 0.05 = 0.45$
From the standard normal table, the area 0.45 lies midway between the entries for 1.64 and 1.65, so
$z_2 = 1.645$
$\frac{70 - \mu}{\sigma} = 1.645$
$70 - \mu = 1.645\sigma$    (2)
Subtracting eqn. (1) from eqn. (2) gives $20 = 2.055\sigma$, so
$\sigma \approx 9.73$ (variance $\sigma^2 \approx 94.7$) and $\mu \approx 54$
2. The standard normal distribution has a mean of 0 and variance 1.
3. $n = 30$, $\mu = 60$, $\sigma^2 = 25$
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(60, \frac{25}{30}\right)$$
$$P(\bar{X} > 62) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{62 - 60}{5/\sqrt{30}}\right)$$
$P(Z > 2.19) = 0.5 - P(0 \le Z \le 2.19)$
$P(Z > 2.19) = 0.5 - 0.4857$
$P(Z > 2.19) = 0.0143$
4. $\mu$ ; $\frac{\sigma^2}{n}$
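For answer 1, the two z-values can also be obtained without a table. The sketch below assumes Python's `statistics.NormalDist.inv_cdf` and solves the two equations $50 = \mu + z_1\sigma$ and $70 = \mu + z_2\sigma$; the variable names are illustrative. Because it uses unrounded quantiles, the result (mean close to 54, standard deviation a little above 9.7) can differ slightly from a table-based answer.

```python
from statistics import NormalDist

std = NormalDist()           # standard normal: mean 0, sd 1
z1 = std.inv_cdf(0.34)       # P(Z <= z1) = 0.34, so z1 is negative
z2 = std.inv_cdf(1 - 0.05)   # P(Z >= z2) = 0.05

# Solve 50 = mu + z1*sigma and 70 = mu + z2*sigma
sigma = (70 - 50) / (z2 - z1)
mu = 50 - z1 * sigma
print(round(z1, 3), round(z2, 3), round(mu, 2), round(sigma, 2), round(sigma ** 2, 1))
```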

1. A magazine claims that 30% of its readers are students. A random sample of 100 readers is taken and is found to contain 25 students. Calculate the probability of obtaining 25 or fewer student readers, assuming that the magazine's claim is correct.
2. A fair die is tossed 150 times. Determine the probability that the face 6 will appear
1. between 20 and 30 times inclusive
2. less than 20 times
3. A call center receives 6 calls per hour. What is the probability that the next call arrives after ¼ hour?
4. Only 4 students came to attend the class today. Find the probability that exactly 6 students attend the class tomorrow.
5. A random variable X has the distribution B(10, p). Given that p = 0.30, find
1. P (X < 4)
2. P(X8)
9.6 REFERENCES
• Devore, J. L. (2012). Probability and Statistics for Engineering and the Sciences, 8th ed. Cengage Learning.
• Rice, J. A. (2007). Mathematical Statistics and Data Analysis, 3rd ed. Thomson Brooks/Cole.
• Miller, I., Miller, M. (2017). J. Freund's Mathematical Statistics with Applications, 8th ed. Pearson.
• Hogg, R., Tanis, E., Zimmerman, D. (2021). Probability and Statistical Inference, 10th ed. Pearson.
• Larsen, R., Marx, M. (2011). An Introduction to Mathematical Statistics and its Applications. Prentice Hall.
LESSON 10
JOINT PROBABILITY DISTRIBUTION AND MATHEMATICAL EXPECTATIONS
STRUCTURE
10.2 Introduction
10.3 Joint Probability Mass Function
10.3.1 Conditional Probability Distributions
10.3.2 Independence of Random Variables
10.3.3 Marginal Probability Mass Functions
10.3.4. Expectations of Probability Mass Functions
10.4 Continuous Random Variables
10.4.1 Marginal Probability Density Functions
10.4.2 Expected Value of a Probability Density Function
10.4.3 Conditional Probability Distributions
10.5 Summary
10.6 Glossary
10.9 References
10.1 LEARNING OBJECTIVE

After reading this lesson, students will be able to:
1. Identify the probability distribution of two or more events occurring together
2. Calculate marginal distributions of more than one variable for discrete and continuous distributions
3. Calculate conditional probability and verify independence of probability distributions
4. Calculate mathematical expectations of joint probability mass functions and joint probability density functions
10.2 INTRODUCTION
This chapter deals with the probability distribution of two or more random variables called
joint probability distribution. There are two types of joint probability distribution. One is
probability mass function (PMF) and the other is probability density function (PDF). In the
case of joint probability distribution of two discrete variables, the probability distribution
function is called probability mass function. In the case of continuous variables, it is called
probability density function. The chapter first deals with the joint probability mass function
and then probability density function in the second half of the chapter.
10.3 JOINT PROBABILITY MASS FUNCTION
Joint probability mass function is related to the probability distribution of two discrete
variables. It is characterized by the following features.
Let X and Y be two discrete random variables defined on the sample space S. The joint probability mass function (PMF) is given by
$$P(x, y) = P(X = x \text{ and } Y = y)$$
where $(x, y)$ is a pair of possible values for the pair of random variables $(X, Y)$, and $P(x, y)$ must satisfy the following conditions:
(a) $0 \le P(x, y) \le 1$
(b) $\sum_x \sum_y P(x, y) = 1$
(c) The probability $P[(X, Y) \in A]$ is obtained by summing the joint PMF over the pairs in A:
$$P[(X, Y) \in A] = \sum_{(x, y) \in A} P(x, y)$$
It must be noted that conditions (a) and (b) are required for $P(x, y)$ to be a valid joint PMF.
Example-1
Consider two random variables X and Y with joint PMF as shown in the table below:
        Y=0    Y=1    Y=2
X=0     1/6    1/4    1/8
X=1     1/8    1/6    1/6
Find the following:
(i) $P(X = 0, Y \le 1)$
(ii) $P(Y = 2, X \le 1)$
Solution:
(i) $P(X = 0, Y \le 1) = P_{XY}(0, 0) + P_{XY}(0, 1) = \frac{1}{6} + \frac{1}{4} = \frac{5}{12}$
(ii) $P(Y = 2, X \le 1) = P_{XY}(0, 2) + P_{XY}(1, 2) = \frac{1}{8} + \frac{1}{6} = \frac{7}{24}$
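The calculation in Example-1 amounts to summing the joint PMF over the pairs that satisfy an event. A minimal sketch with exact fractions is given below; the dictionary layout and the helper name `prob` are illustrative choices, and part (ii) is taken to be $P(Y = 2, X \le 1)$, which is what the solution computes.

```python
from fractions import Fraction as F

# Joint PMF of Example-1, keyed by (x, y)
pmf = {
    (0, 0): F(1, 6), (0, 1): F(1, 4), (0, 2): F(1, 8),
    (1, 0): F(1, 8), (1, 1): F(1, 6), (1, 2): F(1, 6),
}

def prob(event) -> F:
    """Sum the joint PMF over all pairs (x, y) for which event(x, y) is true."""
    return sum((p for (x, y), p in pmf.items() if event(x, y)), F(0))

total = sum(pmf.values())                      # must equal 1 for a valid PMF
p_i = prob(lambda x, y: x == 0 and y <= 1)     # part (i)
p_ii = prob(lambda x, y: y == 2 and x <= 1)    # part (ii)
print(total, p_i, p_ii)
```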
Example-2
A function f is given by $f(x, y) = cxy$ for $x = 1, 2, 3$; $y = 1, 2, 3$.
Determine the value of c for which the function $f(x, y)$ is a valid p.m.f.
Solution:
From the question,
$f(x, y) = cxy$
where $x = 1, 2, 3$ and $y = 1, 2, 3$.
There are 9 possible pairs of X and Y, namely (1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2) and (3,3). The probabilities associated with the pairs are:
$f(1,1) = c(1)(1) = c$
$f(1,2) = 2c$, $f(1,3) = 3c$, $f(2,1) = 2c$
$f(2,2) = 4c$, $f(2,3) = 6c$, $f(3,1) = 3c$
$f(3,2) = 6c$, $f(3,3) = 9c$
For $f(x, y)$ to be a valid joint p.m.f.,
$$\sum_x \sum_y f(x, y) = 1$$
Hence,
$$\sum_{x=1}^{3} \sum_{y=1}^{3} f(x, y) = c + 2c + 3c + 2c + 4c + 6c + 3c + 6c + 9c = 1$$
$36c = 1$
$c = \frac{1}{36}$
Thus, for $c = \frac{1}{36}$ the given function is a valid probability mass function.
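The normalising constant in Example-2 can be found mechanically: add up $xy$ over the support and take the reciprocal. The short sketch below does this with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# All nine (x, y) pairs of Example-2
support = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]

weight_sum = sum(x * y for x, y in support)   # sum of xy over the support
c = Fraction(1, weight_sum)                   # constant that makes the PMF sum to 1
pmf = {(x, y): c * x * y for x, y in support}
print(weight_sum, c, sum(pmf.values()))
```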
10.3.1 Conditional Probability

Conditional probability has already been discussed earlier. To reiterate, conditional probability is a measure of the probability of an event occurring given that another event has already occurred.
Conditional probability is denoted by $P(A \mid B)$, where
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}; \quad P(B) \ne 0$$
In this chapter, we deal with the joint probability of two random variables X and Y, for which the conditional probability is given by
$$P[X \in C \mid Y \in D] = \frac{P(X \in C, Y \in D)}{P(Y \in D)}$$
where $C, D \subset \mathbb{R}$.
For discrete random variables X and Y, the conditional PMFs of X given Y and of Y given X are, respectively,
$$P_{X|Y}(x_i \mid y_j) = \frac{P_{XY}(x_i, y_j)}{P_Y(y_j)}$$
$$P_{Y|X}(y_j \mid x_i) = \frac{P_{XY}(x_i, y_j)}{P_X(x_i)}$$
for any $x_i \in R_X$ and $y_j \in R_Y$.
Example-3
Consider two random variables X and Y with joint PMF as shown in the table below:
        Y=2     Y=4     Y=5
X=1     1/12    1/24    1/24
X=2     1/6     1/12    1/8
X=3     1/4     1/8     1/12
Find the following:
(a) $P(X \le 2, Y \le 4)$
(b) $P(Y = 2 \mid X = 1)$
Solution:
(a) $P(X \le 2, Y \le 4) = P_{XY}(1, 2) + P_{XY}(1, 4) + P_{XY}(2, 2) + P_{XY}(2, 4)$
$= \frac{1}{12} + \frac{1}{24} + \frac{1}{6} + \frac{1}{12} = \frac{3}{8}$
(b) $P(Y = 2 \mid X = 1) = \frac{P(X = 1, Y = 2)}{P(X = 1)} = \frac{P_{XY}(1, 2)}{P_X(1)} = \frac{1}{12} \div \frac{1}{6} = \frac{1}{2}$
10.3.2 Independence of Random Variables

In the case of a joint PMF, the criterion for the independence of two discrete random variables X and Y is
$$P_{XY}(x, y) = P_X(x) \cdot P_Y(y) \quad \text{for all } x, y$$
This condition must hold for every pair of values for the two discrete random variables X and Y to be independent.
Example-4
From the question in Example-3, check whether X and Y are independent.
Solution:
For X and Y to be independent,
$P(X = x_i, Y = y_j) = P(X = x_i) \cdot P(Y = y_j)$ for all $x_i \in R_X$ and for all $y_j \in R_Y$.
Take $x_i = 2$ and $y_j = 2$:
$P(X = 2, Y = 2) = \frac{1}{6}$
$P(X = 2) \cdot P(Y = 2) = \frac{3}{8} \times \frac{1}{2} = \frac{3}{16}$
$\frac{1}{6} \ne \frac{3}{16}$
$\therefore$ X and Y are not independent.
10.3.3 Marginal Probability Mass Functions
If $(X, Y)$ are discrete random variables, the marginal probability distribution of one variable gives the probabilities of its values irrespective of the value taken by the other variable.
The marginal probability mass function of $X_i$ is obtained from the joint PMF as shown below:
$$P_{X_i}(x) = \sum_{(x_1, \dots, x_k) \in R_X:\ x_i = x} P_X(x_1, x_2, \dots, x_k)$$
In words, the marginal PMF of $X_i$ at the point x is obtained by summing the joint PMF $P_X$ over all the vectors belonging to $R_X$ whose i-th component is equal to x.
Example-5
Carrying forward from Example-3, find the marginal PMFs of X and Y.
Solution
$R_X = \{1, 2, 3\}$, $R_Y = \{2, 4, 5\}$
Marginal PMFs are given by
$$P_X(x) = \begin{cases} \frac{1}{6}, & \text{for } X = 1 \\ \frac{3}{8}, & \text{for } X = 2 \\ \frac{11}{24}, & \text{for } X = 3 \\ 0, & \text{otherwise} \end{cases}$$
$$P_Y(y) = \begin{cases} \frac{1}{2}, & \text{for } Y = 2 \\ \frac{1}{4}, & \text{for } Y = 4 \\ \frac{1}{4}, & \text{for } Y = 5 \\ 0, & \text{otherwise} \end{cases}$$
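The marginal PMFs, the conditional probability of Example-3 and the independence check of Example-4 can all be computed from the same table. The following is a sketch with exact fractions; the dictionary layout and variable names are illustrative choices.

```python
from fractions import Fraction as F

# Joint PMF of Example-3, keyed by (x, y)
pmf = {
    (1, 2): F(1, 12), (1, 4): F(1, 24), (1, 5): F(1, 24),
    (2, 2): F(1, 6),  (2, 4): F(1, 12), (2, 5): F(1, 8),
    (3, 2): F(1, 4),  (3, 4): F(1, 8),  (3, 5): F(1, 12),
}
xs = sorted({x for x, _ in pmf})
ys = sorted({y for _, y in pmf})

# Marginals: sum the joint PMF over the other variable
p_x = {x: sum(pmf[x, y] for y in ys) for x in xs}
p_y = {y: sum(pmf[x, y] for x in xs) for y in ys}

# Conditional probability P(Y = 2 | X = 1)
p_y2_given_x1 = pmf[1, 2] / p_x[1]

# Independence requires the product rule to hold for every pair
independent = all(pmf[x, y] == p_x[x] * p_y[y] for x in xs for y in ys)
print(p_x, p_y, p_y2_given_x1, independent)
```

Note that a single pair violating the product rule is enough to rule out independence, even if other pairs happen to satisfy it.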
10.3.4 Expectation of a PMF
Let X and Y be jointly distributed discrete random variables with probability mass function $P(x, y)$. Then the expected value of a function $g(X, Y)$ is given by
$$E[g(X, Y)] = \sum_x \sum_y g(x, y) \cdot P(x, y)$$
Example-6
Find E(XY) for the data given in Example-3.
Solution:
$E(XY) = (1 \times 2 \times \frac{1}{12}) + (1 \times 4 \times \frac{1}{24}) + (1 \times 5 \times \frac{1}{24}) + (2 \times 2 \times \frac{1}{6}) + (2 \times 4 \times \frac{1}{12})$
$+ (2 \times 5 \times \frac{1}{8}) + (3 \times 2 \times \frac{1}{4}) + (3 \times 4 \times \frac{1}{8}) + (3 \times 5 \times \frac{1}{12})$
$= \frac{177}{24} = 7.38$
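The expectation formula is a weighted sum over the joint table, which the sketch below spells out for $g(X, Y) = XY$. It assumes the joint PMF of Example-3 (the table with Y = 2, 4, 5), since those are the values used in the calculation; the helper name `expectation` is illustrative.

```python
from fractions import Fraction as F

# Joint PMF of Example-3, keyed by (x, y)
pmf = {
    (1, 2): F(1, 12), (1, 4): F(1, 24), (1, 5): F(1, 24),
    (2, 2): F(1, 6),  (2, 4): F(1, 12), (2, 5): F(1, 8),
    (3, 2): F(1, 4),  (3, 4): F(1, 8),  (3, 5): F(1, 12),
}

def expectation(g) -> F:
    """E[g(X, Y)]: sum of g(x, y) * P(x, y) over all pairs."""
    return sum(g(x, y) * p for (x, y), p in pmf.items())

e_xy = expectation(lambda x, y: x * y)
print(e_xy, float(e_xy))
```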
IN–TEXT QUESTIONS
Answer the following MCQs
1. Let $U \in \{0, 1\}$ and $V \in \{0, 1\}$ be two independent binary variables. If $P(U = 0) = p$ and $P(V = 0) = q$, then $P(U + V \ge 1)$ is
(a) pq + (1 − p )(1 − q )
(b) pq
(c) p (1 − q )
(d) 1 − pq
2. If a variable can take certain integer values between two given points, then it is called–
(a) Continuous random variable
(b) Discrete random variable
(c) Irregular random variable
(d) Uncertain random variable
3. If $E(U) = 2$ and $E(V) = 4$ then $E(V - U) = ?$
(a) 2
(b) 6
(c) 0
(d) Insufficient data
4. Height is a discrete variable (T / F)
5. If X and Y are two events associated with the same sample space of a random experiment, then $P(X \mid Y)$ is given by
(a) $P(X \cap Y) / P(Y)$ provided $P(Y) \ne 0$
(b) $P(X \cap Y) / P(Y)$ provided $P(Y) = 0$
(c) $P(X \cap Y) / P(Y)$
(d) $P(X \cap Y) / P(X)$
6. Let X and Y be events of a sample space S of an experiment. If $P(S \mid Y) = P(Y \mid Y)$, then the value of $P(Y \mid Y)$ is
(a) 0
(b) −1
(c) 1
(d) 2
7. What are independent events? Choose the correct option:
(a) If the outcome of one event does not affect the outcome of another.
(b) If the outcome of one event affects the outcome of another.
(c) Any one of the outcomes of one event does not affect the outcome of another.
(d) Any one of the outcomes of one event does affect the outcome of the other.
10.4 CONTINUOUS RANDOM VARIABLES
The probability that the observed value of a continuous random variable X lies in a one-dimensional set A is obtained by integrating the probability density function (PDF) $f(x)$ over the set A.
Similarly, the probability that the pair $(X, Y)$ of continuous random variables falls in a two-dimensional set A is obtained by integrating the joint PDF over A.
A joint density function is a piecewise continuous function of two variables $f(x, y)$ such that, for any "reasonable" two-dimensional set A,
$$P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy$$
Definition: Let X and Y be continuous random variables. A joint density function $f(x, y)$ for these two variables is a function satisfying
(a) $f(x, y) \ge 0$
and
(b) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
Then, for a two-dimensional set A,
$$P[(X, Y) \in A] = \iint_A f(x, y)\,dy\,dx$$
Example-7
The joint PDF of (X, Y) is given by
$$f(x, y) = \begin{cases} \frac{6}{5}(x + y^2), & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Answer the following:
(a) Verify that $f(x, y)$ is a legitimate PDF.
(b) Find $P\left(0 \le X \le \frac{1}{4},\ 0 \le Y \le \frac{1}{4}\right)$
Solution: (a) Two conditions must be satisfied for $f(x, y)$ to be a legitimate PDF:
(i) $f(x, y) \ge 0$ and
(ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
The first condition is fulfilled, as $f(x, y) \ge 0$. For the verification of the second condition:
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \int_0^1 \int_0^1 \frac{6}{5}(x + y^2)\,dx\,dy$$
$$= \int_0^1 \int_0^1 \frac{6}{5} x\,dx\,dy + \int_0^1 \int_0^1 \frac{6}{5} y^2\,dx\,dy$$
$$= \int_0^1 \frac{6}{5} x\,dx + \int_0^1 \frac{6}{5} y^2\,dy = \frac{6}{10} + \frac{6}{15}$$
$= 1$. Thus, the second condition is also verified.
(b) $P\left(0 \le X \le \frac{1}{4},\ 0 \le Y \le \frac{1}{4}\right)$
$$= \int_0^{1/4} \int_0^{1/4} \frac{6}{5}(x + y^2)\,dx\,dy$$
$$= \frac{6}{5} \int_0^{1/4} \int_0^{1/4} x\,dx\,dy + \frac{6}{5} \int_0^{1/4} \int_0^{1/4} y^2\,dx\,dy$$
$$= \frac{6}{20} \left[\frac{x^2}{2}\right]_{x=0}^{x=1/4} + \frac{6}{20} \left[\frac{y^3}{3}\right]_{y=0}^{y=1/4}$$
$$= \frac{7}{640}$$
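Both parts of Example-7 can be checked with a simple numerical double integral. The sketch below uses a midpoint rule on an n-by-n grid, which is only one of several possible numerical methods; the function name `integrate` and the grid size are assumptions made for illustration.

```python
def f(x: float, y: float) -> float:
    """Joint PDF of Example-7: (6/5)(x + y^2) on the unit square, 0 elsewhere."""
    return 6 / 5 * (x + y * y) if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(x_hi: float, y_hi: float, n: int = 400) -> float:
    """Midpoint-rule approximation of the integral of f over [0, x_hi] x [0, y_hi]."""
    hx, hy = x_hi / n, y_hi / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            total += f(x, (j + 0.5) * hy)
    return total * hx * hy

print(integrate(1, 1))        # part (a): total probability, close to 1
print(integrate(0.25, 0.25))  # part (b): close to 7/640
```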
Example-8
Consider two continuous random variables X and Y with joint p.d.f.
$$f(x, y) = \frac{2}{81} x^2 y, \quad 0 < x < K,\ 0 < y < K$$
a) Find the value of K so that $f(x, y)$ is a valid p.d.f.
b) Find $P(X > 3Y)$
Solution:
a) For $f(x, y)$ to be a valid p.d.f., the total probability must equal 1. Therefore,
$$\int_0^K \int_0^K \frac{2}{81} x^2 y\,dx\,dy = \frac{K^5}{243} = 1 \Rightarrow K = 3$$
b) $$P(X > 3Y) = \int_0^3 \left(\int_0^{x/3} \frac{2}{81} x^2 y\,dy\right) dx$$
$$= \int_0^3 \frac{1}{729} x^4\,dx$$
$$= \frac{1}{15}$$
10.4.1 Marginal Probability Density Function

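The marginal PDF of a continuous random variable can be obtained in a similar manner as in the case of discrete variables.
The marginal PDF of one variable is obtained by integrating the joint PDF over the other variable, keeping the first variable fixed.
The marginal PDFs of X and Y, denoted by $f_X(x)$ and $f_Y(y)$ respectively, are given by
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{for } -\infty < x < \infty$$
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \quad \text{for } -\infty < y < \infty$$
Example 9
Find the marginal PDFs $f_X(x)$ and $f_Y(y)$ for the joint PDF $f(x, y) = \frac{6}{5}(x + y^2)$, $0 \le x \le 1$, $0 \le y \le 1$, of Example-7.
Solution
$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 \frac{6}{5}(x + y^2)\,dy = \frac{6x}{5} + \frac{2}{5}$$
$$f_X(x) = \begin{cases} \frac{6}{5}x + \frac{2}{5}, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^1 \frac{6}{5}(x + y^2)\,dx = \frac{6}{5}y^2 + \frac{3}{5}$$
$$f_Y(y) = \begin{cases} \frac{6}{5}y^2 + \frac{3}{5}, & 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$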
10.4.2 Expected value of a PDF
Let X and Y be continuous random variables with joint PDF $f(x, y)$. Let g be some function of the two variables; then
$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy$$
Example-10
The length of a thread is 1 mm, and two points are chosen uniformly and independently along the thread. Find the expected distance between these two points.
Solution
Let U and V be the two points that are chosen. The joint PDF of U and V is
$$f(U, V) = \begin{cases} 1, & 0 \le U, V \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$E[|U - V|] = \int_0^1 \int_0^1 |U - V|\,dU\,dV$$
Splitting the unit square into the regions $U > V$ and $V > U$:
$$E[|U - V|] = \iint_{U > V} (U - V)\,dU\,dV + \iint_{V > U} (V - U)\,dU\,dV$$
$$= \int_0^1 \int_0^U (U - V)\,dV\,dU + \int_0^1 \int_U^1 (V - U)\,dV\,dU = \frac{1}{6} + \frac{1}{6}$$
$$E[|U - V|] = \frac{1}{3}$$
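The value 1/3 in Example-10 can be confirmed numerically by averaging $|U - V|$ over a fine grid of the unit square. The sketch below uses a midpoint grid; the function name `expected_distance` and the grid size are illustrative assumptions.

```python
def expected_distance(n: int = 400) -> float:
    """Midpoint-grid approximation of E|U - V| for independent U, V ~ Uniform(0, 1)."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    # The joint PDF is 1 on the unit square, so the expectation is the integral of |u - v|
    return sum(abs(u - v) for u in pts for v in pts) * h * h

print(expected_distance())  # close to 1/3
```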
Example 11
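The joint PDF of X and Y is given by
$$f(x, y) = \begin{cases} \frac{3}{7}(x + y), & 0 \le x \le 1,\ 1 \le y \le 2 \\ 0, & \text{otherwise} \end{cases}$$
Find the expected value of $X / Y^2$.
Solution
$$E\left[\frac{X}{Y^2}\right] = \int_1^2 \int_0^1 \frac{3x(x + y)}{7y^2}\,dx\,dy$$
$$= \frac{3}{7} \int_1^2 \left(\frac{1}{3y^2} + \frac{1}{2y}\right) dy$$
$$= \frac{3}{7} \left(\frac{1}{6} + \frac{1}{2}\ln 2\right)$$
$$E\left[\frac{X}{Y^2}\right] = \frac{1}{14} + \frac{3}{14}\ln 2 \approx 0.22$$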
10.4.3 Conditional Distributions

The conditional PDF of X, given that Y = y, is denoted by
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$
and the conditional expected value of X given Y = y is given by
$$E[X \mid Y = y] = \int x\, f_{X|Y}(x \mid y)\,dx$$
Similarly, one can define the conditional PDF and expected value of Y given X = x by interchanging the roles of X and Y.
Properties of Conditional PDFs
(1) The conditional PDF of X, given Y = y, is a valid PDF if two conditions are satisfied:
(a) $0 \le f_{X|Y}(x \mid y)$
(b) $\int f_{X|Y}(x \mid y)\,dx = 1$
(2) In general, the conditional distribution of X given Y does not equal the conditional distribution of Y given X,
i.e. $f_{X|Y}(x \mid y) \ne f_{Y|X}(y \mid x)$
Example 12
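If the joint PDF of U and V is given by
$$f(U, V) = \begin{cases} \frac{2}{3}(U + 2V), & 0 \le U \le 1,\ 0 \le V \le 1 \\ 0, & \text{otherwise} \end{cases}$$
find the conditional mean of U given V = 1/2.
Solution:
The marginal PDF of V is $f_V(V) = \int_0^1 \frac{2}{3}(U + 2V)\,dU = \frac{1 + 4V}{3}$, so the conditional PDF of U given V is
$$f(U \mid V) = \begin{cases} \frac{2U + 4V}{1 + 4V}, & 0 \le U \le 1 \\ 0, & \text{elsewhere} \end{cases}$$
so that
$$f\left(U \,\Big|\, \tfrac{1}{2}\right) = \begin{cases} \frac{2}{3}(U + 1), & 0 \le U \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Then
$$E\left[U \,\Big|\, \tfrac{1}{2}\right] = \int_0^1 \frac{2}{3} U (U + 1)\,dU$$
$$E\left[U \,\Big|\, \tfrac{1}{2}\right] = \frac{5}{9}$$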
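The conditional mean in Example 12 can be verified numerically as a ratio of two one-dimensional integrals. The sketch below assumes the joint density is $\frac{2}{3}(U + 2V)$ on the unit square, which is the form consistent with the conditional density $\frac{2U + 4V}{1 + 4V}$ used in the solution; the function names and the midpoint rule are illustrative choices.

```python
def f(u: float, v: float) -> float:
    """Assumed joint PDF (2/3)(u + 2v) on the unit square, 0 elsewhere."""
    return 2 / 3 * (u + 2 * v) if 0 <= u <= 1 and 0 <= v <= 1 else 0.0

def conditional_mean_u(v: float, n: int = 2000) -> float:
    """E[U | V = v] = (integral of u*f(u, v) du) / (integral of f(u, v) du), by midpoint rule."""
    h = 1.0 / n
    us = [(i + 0.5) * h for i in range(n)]
    numerator = sum(u * f(u, v) for u in us) * h
    marginal_v = sum(f(u, v) for u in us) * h   # marginal density of V at v
    return numerator / marginal_v

print(conditional_mean_u(0.5))  # close to 5/9
```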
IN-TEXT QUESTIONS
8. A random variable that can assume an infinite number of values within an interval is called a

(a) Continuous random variable
(b) Discrete random variable
(c) Irregular random variable
(d) Uncertain random variable
9. For the function f ( x) = a + bx, 0  x  1 to be a valid PDF, which of the following
statement is correct:
(a) a = 0.5, b = 1
(b) a = 1, b = 4
(c) a = 1, b = −1
(d) a = 0, b = 0
10. Two random variables X and Y are distributed according to
$$f_{X,Y}(x, y) = \begin{cases} x + y, & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
The probability $P(X + Y \le 1)$ is
(a) 0.66
(b) 0.33
(c) 0.5
(d) 0.1
11. What are the two important conditions that must be satisfied for f ( x, y ) to be a
legitimate PDF.
12. When do the conditional density function get converted into the marginal density
function?
(a) Only if random variable exhibits statistical dependency.
(b) Only if random variable exhibits statistical independency
(c) Only if random variable exhibit deviation from its mean value
(d) None of the above.
13. Let U and V be jointly distributed continuous random variables with joint PDF given as
$$f_{U,V}(U, V) = \begin{cases} 6e^{-(2U + 3V)}, & U \ge 0,\ V \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
Answer the following:
(a) Are U and V independent?
(b) Verify if E[V|U > 2] = 1/3
(c) Verify if P(U > V) = 3/5
10.5 SUMMARY
A joint probability distribution function refers to the combined probability distribution of more than one random variable. These variables may be discrete or continuous. The marginal probability distribution of one variable is obtained by summing (or, for continuous variables, integrating) the joint distribution over all values of the other variable. In the case of discrete variables, $P(x, y)$ must satisfy the following conditions to be a valid joint probability mass function:
(a) $0 \le P(x, y) \le 1$
(b) $\sum_x \sum_y P(x, y) = 1$
The corresponding conditions apply in the case of continuous random variables, with integrals in place of sums. Conditional probability is the probability of one event happening when the other event has already occurred.
X and Y are called independent if the joint p.d.f. is the product of the individual p.d.f.'s, i.e., if $f(x, y) = f_X(x) \cdot f_Y(y)$ for all x, y.
10.6 GLOSSARY
Conditional Probability: a measure of the probability of an event occurring given that another event has already occurred.
Independence of Random Variables: X and Y are independent if $P_{XY}(x, y) = P_X(x) \cdot P_Y(y)$ for all x, y.
Marginal Probability Density Function: obtained by integrating the joint PDF over one variable while keeping the other fixed.
Expected Value of a PDF: $E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\,dx\,dy$
10.7 ANSWERS TO IN – TEXT QUESTIONS
1. d
2. b
3. a
4. False
5. a
6. c
7. a
8. a
9. a
10. b
11. Refer to Example-7
12. b
13. (a) Yes (b) Yes (c) Yes
10.8 SELF – ASSESSMENT QUESTIONS

1. A fair coin is tossed 4 times. Let the random variable X denote the number of heads in the first 3 tosses and let the random variable Y denote the number of heads in the last 3 tosses. Answer the following.
(a) What is the joint PMF of X and Y?
(b) What is the probability that 2 or 3 heads appear in the first three tosses and 1 or 2 heads appear in the last three tosses?
2. Let X and Y be random variables with joint PDF.
1
 −1  x, y  1
f XY ( x, y) =  4
 0 otherwise
Find
(a) P( X + Y  1)
2 2
(b) P(2 X − Y  0)
3. Let X and Y be two jointly distributed continuous random variables with joint PDF
$$f_{X,Y}(x, y) = \begin{cases} 6xy, & 0 \le x \le 1,\ 0 \le y \le \sqrt{x} \\ 0, & \text{otherwise} \end{cases}$$
(a) Find $f_X(x)$ and $f_Y(y)$
(b) Are X and Y independent?
(c) Find the conditional PDF of X given Y
(d) Find $E[X \mid Y = y]$ for $0 \le y \le 1$
4. The joint PDF of two random variables U and V is given by:
$$f(u, v) = \begin{cases} 24uv, & 0 < u < 1,\ 0 < v < 1,\ u + v < 1 \\ 0, & \text{otherwise} \end{cases}$$
Find $P\left(U + V < \frac{1}{2}\right)$.
10.9 REFERENCES
• Devore J. L. (2012). Probability and statistics for engineering and the sciences (8th
ed.; First Indian reprint 2012). Brooks/Cole Cengage Learning.
• Rice J. A. (2007). Mathematical statistics and data analysis (3rd ed.).
Thomson/Brooks/Cole.
• Johnson R. A. & Pearson Education. (2017). Miller & Freund’s probability and
statistics for engineers (Ninth edition Global). Pearson Education.
• Miller, I., Miller, M. (2017). J. Freund's Mathematical Statistics with Application, 8th
ed., Pearson
• Hogg R. V. Tanis E. A. & Zimmerman D. L. (2021). Probability and statistical
inference (10th Edition). Pearson.
• McClave, J., Benson, P. G., Sincich, T. (2017). Statistics for Business and Economics. Pearson.
• Webster, A. L. (1998). Applied statistics for business and economics: an essentials version (3rd ed.). Irwin/McGraw-Hill.
LESSON 11
CORRELATION AND COVARIANCE
STRUCTURE
11.2 Introduction
11.3 Covariance
11.4 Correlation
11.5 Types of Correlations
11.5.1 Positive and Negative Correlation
11.5.2 Linear and Non-Linear Correlation
11.5.3 Simple and Multiple Correlation
11.6 Difference between Correlation and Covariance
11.7 Methods of Calculating Correlation
11.7.1 Scatter diagram
11.7.2 Karl Pearson’s Coefficient of Correlations
11.7.3 Spearman’s Coefficient of Rank Correlation
11.8 Glossary
11.9 Summary
11.10 Answers to the In-Text Questions
11.12 References
After reading this lesson, students will be able to understand:
1. The difference between covariance and correlation
2. The method of calculating covariance
3. The types of correlation and
4. The methods of calculating correlation

11.2 INTRODUCTION
In the previous units, you came across problems dealing with a single variable, such as the marks, weight or height of students in a classroom, for which you used statistical measures such as the mean, median, mode and standard deviation. All these measures focus on understanding one variable of a data set at a time. However, in the real world, we often have to analyse not a single variable but a number of variables at the same time. In such a situation, the basic questions that come to mind are: Is there any relationship between the two or more variables? If there is a relationship, what kind of relationship is it? How can we detect the presence of such a relationship? What is its strength? The objective of this unit is to answer such questions, which we come across while dealing with two or more variables simultaneously.
11.3. COVARIANCE
Covariance is one of the statistical measures of the relationship between two variables. In other words, it shows how two variables change together. Suppose that in your classroom there are students with different heights and weights, and you want to know whether there is any relationship between the height and weight of students, i.e., whether the weight of students varies together with their height. In such a case, we can use covariance to understand the relationship between the two variables, here height and weight.
The formula for covariance is given by:
$$\text{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
where
N = number of observations
$\bar{X}$ = mean of X
X = value of an observation of X
$\bar{Y}$ = mean of Y
Y = value of an observation of Y
Example 1: Find the covariance between the height and weight of the following students.
Height (cm)   65   60   70   55   50
Weight (Kg)   73   82   50   40   55
Solution: Assuming height to be X and weight of students to be Y, we have
X      Y      $(X - \bar{X})$   $(Y - \bar{Y})$   $(X - \bar{X})(Y - \bar{Y})$
65     73     5       13      65
60     82     0       22      0
70     50     10      -10     -100
55     40     -5      -20     100
50     55     -10     -5      50
ΣX = 300   ΣY = 300                   Σ = 115
$\bar{X} = \frac{\sum X}{N} = \frac{300}{5} = 60$ and $\bar{Y} = \frac{\sum Y}{N} = \frac{300}{5} = 60$
Using $\text{Cov}(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$,
we have $\text{Cov}(X, Y) = \frac{115}{5} = 23$
Thus, the covariance between the height and weight is 23.
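The steps of the table above translate directly into a few lines of code. This is a minimal sketch of the population covariance formula used in this section (division by N); the function name `covariance` is illustrative.

```python
def covariance(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

height = [65, 60, 70, 55, 50]
weight = [73, 82, 50, 40, 55]
print(covariance(height, weight))
```

The same function can be applied to the data in the in-text questions that follow.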

IN-TEXT QUESTIONS
1. _______________ is the statistical measure of relationship between variables.
2. Find the covariance between the marks of students in English and mathematics in Grade
10.
English 60 40 41 55 67 23 19 70 75
Mathematics 50 60 75 55 87 65 70 45 33
3. Find the covariance between X and Y from the following table.

X 100 150 175 225 250
Y 700 600 500 400 800

11.4. CORRELATION
In earlier example, where we tried to find out the relationship between weight and height of
students, and we used covariance to find that there is positive covariance between the two
variables. Positive covariance suggested that the variable move in same direction i.e. when
there is increase in height of a student, weight also rise simultaneously. However, in this
example, we couldn’t find through covariance, how much the weight increases with increase
in the height.
Therefore, covariance tells us whether there is relationship between two variable or not.
However, it fails to determine the strength of such a relationship. In other words, covariance
doesn’t inform about how closely two variables are related to each other. Thus, in order to
determine the strength of relationship between two variables, we use correlation. In other
words. If we need to determine how much one variable changes with respect to another
variable, we use another measure known as correlation.
Correlation is defined as the degree of association between two variables. In simple words, it
explains how far two variables are related to each other. In fact, coefficient of correlation is
said to be a measure of covariance between two series of variables.
Correlation is an important statistical measure which helps in determining changes in one
variable vis-à-vis another variable. For example, we know law of demand, according to which,
quantity demanded is inversely related to the price of a commodity given all other things are
constant. Similarly, Keynes physiological law of consumption, which says that if there is an
increase in income, it will lead to increase in consumption but by less than the increase in the
former. However, if we need to find out how much consumption changes with increase in
income, we can again take the help of correlation coefficient to measure this relationship
between income and consumption.
However, it is important to distinguish correlation from causation. Correlation simply informs
us about the how much one variable varies with respect to changes in another variable. It
doesn’t necessarily mean causation. It means correlation doesn’t not tell anything about the
cause-and-effect relationship between two variables, it just only gives an understanding
regarding the strength of relationship between the two variables. For example, in the following
table, we have information regarding the demand and price of a commodity.
Demand 100 200 300 400 500

Price 60 50 40 30 20
In this case, there is a perfect negative correlation between demand and price. However, this
does not imply that a decrease in price causes demand to rise. The correlation only describes
the inverse relationship between the price and the quantity demanded. In order to study the
cause-and-effect relationship, we need to use higher statistical measures such as regression
analysis.
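The claim of a perfect negative correlation in the demand and price table can be checked numerically. The following is a minimal sketch in Python; the function name pearson_r is our own choice for illustration and not part of the study material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Data from the demand and price table in the text.
demand = [100, 200, 300, 400, 500]
price = [60, 50, 40, 30, 20]

r = pearson_r(demand, price)
print(r)  # -1.0, a perfect negative correlation
```

The value -1 describes only how the two series move together; the code says nothing about which variable causes the other.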
11.5. TYPES OF CORRELATION

11.5.1. Positive and Negative Correlation
When the coefficient of correlation between two variables is positive, it means that both
variables move in the same direction. In other words, when one variable increases, the other
also increases, though the rate of increase could be different.
For example: The law of supply states that there is a direct relationship between the price of
a commodity and the quantity supplied, given that other things are constant. Such a relationship
between price and quantity supplied is positive, indicating that as the price rises, the
quantity supplied by the producer also rises.
Negative correlation between two variables implies that they move in opposite directions,
i.e., when one variable increases, the other decreases. Such an inverse relationship is found
in the law of demand, which states that as the price of a commodity increases, the quantity
demanded of the commodity falls.
11.5.2. Linear and Non- linear correlation
When the relationship between the two variables is linear, it is referred to as linear
correlation. In the case of linear correlation, the amount of change in one variable tends to
bear a constant ratio to the amount of change in the other variable. As a result, when the two
variables are plotted on a graph, we get a straight line.
Fig.1: Linear Correlation (horizontal axis: Marks in English; vertical axis: Marks in History)

On the other hand, when the relationship between the two variables is non-linear, it is
referred to as non-linear correlation. In this case, the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable. Thus, when we plot the
two variables on a graph, we do not get a straight line but a curve.
Figure 2: Non-Linear Correlation (horizontal axis: No. of Hours of study; vertical axis: Marks)

11.5.3. Simple and Multiple correlations
When we try to find the relationship between only two variables, it is a case of simple
correlation. In the case of partial or multiple correlation, we are concerned with finding the
correlation among three or more variables. For example, when we try to find out the
relationship between the marks obtained by students on a test, the number of hours of study
done by the students and their IQ, we have an example of multiple correlation.
11.6. DIFFERENCE BETWEEN CORRELATION AND COVARIANCE
Now that you have studied correlation, you need to distinguish clearly between correlation and
covariance. Table 1 sets out the differences.
Table 1: Difference between correlation and covariance

Correlation | Covariance
It is the measure of the strength of the relationship between two variables. | It is a measure which shows how two random variables change with respect to each other.
The value of correlation lies between -1 and +1. | The value of covariance lies between -∞ and +∞.
It measures the direction as well as the strength of the relationship between the given two variables. | It only indicates the direction of the relationship between the given two variables.
It is free from units of measurement. | It depends on units of measurement.
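The last row of Table 1 can be illustrated with a short sketch in Python. The income and consumption figures below are hypothetical and chosen only for illustration; the same data are expressed once in rupees and once in thousands of rupees.

```python
from math import sqrt

def covariance(xs, ys):
    """Population covariance: mean of the products of deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data (not from the text), measured in rupees.
income_rupees = [10000, 20000, 30000, 40000, 50000]
consumption_rupees = [9000, 16000, 24000, 30000, 37000]

# The same data measured in thousands of rupees.
income_thousands = [v / 1000 for v in income_rupees]
consumption_thousands = [v / 1000 for v in consumption_rupees]

cov_rupees = covariance(income_rupees, consumption_rupees)
cov_thousands = covariance(income_thousands, consumption_thousands)
corr_rupees = correlation(income_rupees, consumption_rupees)
corr_thousands = correlation(income_thousands, consumption_thousands)

print(cov_rupees, cov_thousands)    # covariance changes with the unit
print(corr_rupees, corr_thousands)  # correlation does not
```

Changing the unit of both variables by a factor of 1000 changes the covariance by a factor of one million, while the correlation coefficient stays the same.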
11.7. METHODS OF CALCULATING CORRELATION
There are three methods of calculating coefficient of correlation:

1) Scatter diagram
2) Karl Pearson’s coefficient of correlation
3) Spearman’s rank correlation
11.7.1. Scatter diagram
As the name suggests, in this method we simply plot the data on a graph in the form of a
scatter plot to judge the correlation between the two variables. If the plotted points cluster
closely around a line, the correlation between the two variables is high. However, if the
plotted points are spread widely, the correlation between the two variables is low. This is one
of the simplest methods of ascertaining the relationship between two variables, as it only
requires plotting on a graph and visual inspection.

Figure 3: Scatter Plot Diagram showing positive correlation
The above graph plots the GDP at current prices and public expenditure on education by the
Government of India from 1999-2000 to 2019-2020. In this case, the points lie close to an
upward-sloping straight line from left to right; thus the correlation between GDP and public
expenditure on education is highly positive.
Figure 4: Scatter Plot Diagram showing weak correlation
The above graph plots the GDP and the expenditure on education as a percentage of GDP in India
during 1999-2000 to 2019-20. In this case, the points are widely scattered in the graph, which
indicates a weak correlation between GDP and expenditure on education as a percentage of GDP.
11.7.2. Karl Pearson’s Coefficient of Correlation
It is the mathematical method of calculating the coefficient of correlation. Karl Pearson's
coefficient of correlation is represented by r. It is the most suitable method when both
variables in a data set are normally distributed. However, extreme values can have a large
impact on this coefficient, which makes it less suitable when one or both of the variables are
not normally distributed, because such values could exaggerate or weaken the measured strength
of the association.
Assumptions of Karl Pearson’s coefficient of correlation:
1) There exists a linear relationship between the two variables.
2) The two variables are normally distributed.
3) There is a cause-and-effect relationship between the factors which affect the
distribution of the two variables.
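$$ r = \frac{\Sigma xy}{N \sigma_x \sigma_y} $$
where $x = X - \bar{X}$
$y = Y - \bar{Y}$
N = number of items
$\sigma_x$ = standard deviation of X
$\sigma_y$ = standard deviation of Y
In simpler form,
$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \, \Sigma y^2}} $$
where $x = X - \bar{X}$
$y = Y - \bar{Y}$
$x^2 = (X - \bar{X})^2$
$y^2 = (Y - \bar{Y})^2$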

Example 1: Find the Karl Pearson’s coefficient of correlation between the marks in economics
and mathematics of the following students.
Economics 70 50 55 70 85 90
Mathematics 35 45 28 33 25 14
Solution: Let marks in economics be X and marks in mathematics be Y.

X     Y     x = X − X̄    y = Y − Ȳ    x²     y²     xy
70    35    0            5            0      25     0
50    45    -20          15           400    225    -300
55    28    -15          -2           225    4      30
70    33    0            3            0      9      0
85    25    15           -5           225    25     -75
90    14    20           -16          400    256    -320
ΣX = 420    ΣY = 180    Σx² = 1250    Σy² = 544    Σxy = -665
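$$ \bar{X} = \frac{\Sigma X}{N} = \frac{420}{6} = 70 $$
$$ \bar{Y} = \frac{\Sigma Y}{N} = \frac{180}{6} = 30 $$
Using
$$ r = \frac{\Sigma xy}{\sqrt{\Sigma x^2 \times \Sigma y^2}} $$
we have
$$ r = \frac{-665}{\sqrt{1250 \times 544}} = \frac{-665}{\sqrt{6{,}80{,}000}} = \frac{-665}{824.62} = -0.8064 \approx -0.81 $$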
Therefore, the marks of students in economics and mathematics are inversely correlated to each
other as the Karl Pearson’s coefficient of correlation is -0.81.
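The arithmetic of Example 1 can be reproduced with a few lines of Python. This is only a sketch for checking the working table; the variable names are our own. It evaluates both forms of the formula given above, the one with N σx σy and the simpler one with Σx² and Σy².

```python
from math import sqrt

economics = [70, 50, 55, 70, 85, 90]    # X
mathematics = [35, 45, 28, 33, 25, 14]  # Y
n = len(economics)

mean_x = sum(economics) / n    # 70
mean_y = sum(mathematics) / n  # 30

# Deviations from the means, as in the working table.
x = [v - mean_x for v in economics]
y = [v - mean_y for v in mathematics]

sum_xy = sum(a * b for a, b in zip(x, y))  # -665
sum_x2 = sum(a * a for a in x)             # 1250
sum_y2 = sum(b * b for b in y)             # 544

# Simpler form: r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
r_simple = sum_xy / sqrt(sum_x2 * sum_y2)

# First form: r = sum(xy) / (N * sigma_x * sigma_y)
sigma_x = sqrt(sum_x2 / n)
sigma_y = sqrt(sum_y2 / n)
r_sigma = sum_xy / (n * sigma_x * sigma_y)

print(round(r_simple, 4), round(r_sigma, 4))  # -0.8064 -0.8064
```

Both forms give the same value because N σx σy equals the square root of Σx² Σy².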
Direct method for finding coefficient of correlation

In the previous method of finding the correlation coefficient, we took deviations of the items
from their means. However, we can also find the correlation without taking any deviations, by
employing the following formula:
$$ r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{N\Sigma X^2 - (\Sigma X)^2}\,\sqrt{N\Sigma Y^2 - (\Sigma Y)^2}} $$
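Example 2: Find out the coefficient of correlation from the following data set using the direct
method.
A 1 6 9 3 4 5 8 2 1
B 12 9 5 6 3 7 15 11 9
Solution: Let A be X and B be Y. Here N = 9.
X     Y     X²    Y²     XY
1     12    1     144    12
6     9     36    81     54
9     5     81    25     45
3     6     9     36     18
4     3     16    9      12
5     7     25    49     35
8     15    64    225    120
2     11    4     121    22
1     9     1     81     9
ΣX = 39    ΣY = 77    ΣX² = 237    ΣY² = 771    ΣXY = 327
$$ r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{N\Sigma X^2 - (\Sigma X)^2}\,\sqrt{N\Sigma Y^2 - (\Sigma Y)^2}} = \frac{9 \times 327 - (39 \times 77)}{\sqrt{9 \times 237 - (39)^2}\,\sqrt{9 \times 771 - (77)^2}} $$
$$ r = \frac{2943 - 3003}{\sqrt{2133 - 1521}\,\sqrt{6939 - 5929}} = \frac{-60}{\sqrt{612}\,\sqrt{1010}} $$
$$ r = \frac{-60}{24.74 \times 31.78} = \frac{-60}{786.24} = -0.076 $$
Thus there is a very weak negative correlation between A and B. Note that the value of r must
always lie between -1 and +1.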
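Since a correlation coefficient must lie between -1 and +1, it is worth re-checking the arithmetic of Example 2. The sketch below applies the direct formula with the factor N in the numerator, i.e. NΣXY − (ΣX)(ΣY), which is the standard form of the direct method; the variable names are our own.

```python
from math import sqrt

A = [1, 6, 9, 3, 4, 5, 8, 2, 1]      # X
B = [12, 9, 5, 6, 3, 7, 15, 11, 9]   # Y
n = len(A)

sum_x = sum(A)                             # 39
sum_y = sum(B)                             # 77
sum_x2 = sum(v * v for v in A)             # 237
sum_y2 = sum(v * v for v in B)             # 771
sum_xy = sum(a * b for a, b in zip(A, B))  # 327

# Direct method: no deviations from the means are needed.
numerator = n * sum_xy - sum_x * sum_y     # 9*327 - 39*77 = -60
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(round(r, 3))  # -0.076
```

The result, about -0.076, indicates almost no linear relationship between A and B.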
11.7.3. Spearman’s Coefficient of Rank Correlation
In the previous method of calculating the correlation coefficient, an important assumption was
that the variables under study are normally distributed, so as to yield appropriate results.
However, in actual circumstances, we often face situations where the variables are not normally
distributed but skewed. In such situations, we need another method which does not make such
assumptions about the distribution of the variables in question. One such method is Spearman’s
rank correlation, which requires no assumption about the distribution of the two variables.
In Spearman’s rank correlation, the variables are ranked, and the coefficient of correlation is
calculated on the basis of the ranks, not the original observations.
The formula for Spearman’s rank correlation is
$$ R = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)} $$
where R = rank correlation
$D^2$ = square of the difference between the ranks
N = number of observations
Characteristics of Spearman’s rank correlation:

1) The differences between the two sets of ranks always sum to zero, i.e., ΣD = 0.
2) Spearman’s rank correlation is non-parametric, meaning it is not based on any
assumption about the distribution of the variables.
3) The Karl Pearson coefficient of correlation between the rankings is the same as
Spearman's rank correlation.
The above method of rank correlation is useful when there are no ties between the ranks of the
observations. However, in many cases, we come across observations which are equal in size or
other characteristics. In such situations, it becomes important to give equal ranks to such
observations: each tied observation receives the average of the ranks it would otherwise
occupy. The formula for rank correlation then needs to be modified.
Thus, when equal ranks are given, we use the modified version of Spearman’s rank correlation:
$$ R = 1 - \frac{6\left[\Sigma D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \dots\right]}{N(N^2 - 1)} $$
where R = rank correlation
$D^2$ = square of the difference between the ranks
m = number of items whose ranks are the same (one correction term for each group of tied items)
N = number of observations
Example 3: Find out Spearman’s rank correlation between the ranks of students in two
sections of class XI.
Student    1 2 3 4 5 6 7
Section A  4 5 7 1 3 2 6
Section B  1 3 2 7 5 6 4

Solution:
Student    Rank in Section A (R1)    Rank in Section B (R2)    D = R1 − R2    D² = (R1 − R2)²
1          4                         1                         3              9
2          5                         3                         2              4
3          7                         2                         5              25
4          1                         7                         -6             36
5          3                         5                         -2             4
6          2                         6                         -4             16
7          6                         4                         2              4
ΣD² = 98
Using
$$ R = 1 - \frac{6\Sigma D^2}{N(N^2 - 1)} $$
where ΣD² = 98 and N = 7,
$$ R = 1 - \frac{6 \times 98}{7(7^2 - 1)} = 1 - \frac{588}{336} = 1 - 1.75 = -0.75 $$
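Example 3 can be verified with a short sketch in Python. The function name spearman_no_ties is our own, and the function assumes that no ranks are tied. Exact fractions are used so that no rounding enters the check.

```python
from fractions import Fraction

def spearman_no_ties(ranks_1, ranks_2):
    """R = 1 - 6*sum(D^2) / (N*(N^2 - 1)), valid when no ranks are tied."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - Fraction(6 * sum_d2, n * (n * n - 1))

section_a = [4, 5, 7, 1, 3, 2, 6]
section_b = [1, 3, 2, 7, 5, 6, 4]

sum_d2 = sum((a - b) ** 2 for a, b in zip(section_a, section_b))
R = spearman_no_ties(section_a, section_b)

print(sum_d2)    # 98
print(float(R))  # -0.75
```

The differences D also sum to zero here, in line with the first characteristic of rank correlation listed earlier.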
Example 4: Calculate coefficient of correlation between the rank of participants in a dance
competition.
Participants      1  2  3  4  5
Score of Judge 1  55 70 80 70 75
Score of Judge 2  70 80 90 50 60
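Solution: In the above case, we are given the scores of two judges. Since ranks are not given,
we rank the participants on the basis of the scores of each judge, giving rank 1 to the highest
score. Participants 2 and 4 both score 70 with Judge 1; they would occupy ranks 3 and 4, so each
is given the average rank 3.5.
Participant    Score of Judge 1    R1     Score of Judge 2    R2    D = R1 − R2    D² = (R1 − R2)²
1              55                  5      70                  3     2              4
2              70                  3.5    80                  2     1.5            2.25
3              80                  1      90                  1     0              0
4              70                  3.5    50                  5     -1.5           2.25
5              75                  2      60                  4     -2             4
ΣD² = 12.50
Using
$$ R = 1 - \frac{6\left[\Sigma D^2 + \frac{1}{12}(m_1^3 - m_1)\right]}{N(N^2 - 1)} $$
where m₁ = 2, as there is one group of 2 observations whose ranks are repeated, so that
$$ \frac{1}{12}(m_1^3 - m_1) = \frac{1}{12}(8 - 2) = 0.5 $$
$$ R = 1 - \frac{6(12.5 + 0.5)}{5 \times 24} = 1 - \frac{78}{120} = 1 - 0.65 = 0.35 $$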
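The tied-rank calculation of Example 4 can be sketched in Python as follows. The helper names average_ranks, tie_correction and spearman_with_ties are our own. The sketch assumes the usual textbook form of the correction, in which the term (m³ − m)/12 for each group of tied ranks is added to ΣD² before multiplying by 6.

```python
from collections import Counter

def average_ranks(scores):
    """Rank 1 goes to the highest score; tied scores share the mean of their ranks."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    return [sum(positions[s]) / len(positions[s]) for s in scores]

def tie_correction(ranks):
    """Sum of (m^3 - m)/12 over every group of m tied ranks."""
    counts = Counter(ranks)
    return sum((m ** 3 - m) / 12 for m in counts.values() if m > 1)

def spearman_with_ties(scores_1, scores_2):
    r1 = average_ranks(scores_1)
    r2 = average_ranks(scores_2)
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = tie_correction(r1) + tie_correction(r2)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))

judge_1 = [55, 70, 80, 70, 75]
judge_2 = [70, 80, 90, 50, 60]

print(average_ranks(judge_1))  # [5.0, 3.5, 1.0, 3.5, 2.0]
print(average_ranks(judge_2))  # [3.0, 2.0, 1.0, 5.0, 4.0]
R = spearman_with_ties(judge_1, judge_2)
print(round(R, 2))             # 0.35
```

With ΣD² = 12.5 and one tied pair (correction 0.5), this form gives R = 1 − 78/120 = 0.35.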
IN-TEXT QUESTIONS
4. Scatter plot is the simplest method of determining correlation. True/False
5. Karl Pearson’s coefficient is one of the methods of measuring correlation. True/False
6. Scatter plot method involves plotting variables on a graph. True/False
7. Spearman’s Rank correlation cannot be used in case of common ranks. True/False
8. Which of the following is a method of measuring correlation?
A) Spearman’s Rank method B) Standard Deviation
C) Covariance D) Mode
9. Find out the correlation between X and Y using Karl Pearson’s coefficient of
correlation.
X 10 20 30 40 50 60 70 80
Y 100 150 100 250 300 200 200 300
10. In the following table, the ranking of 10 participants by two judges in a drawing
competition is given. Calculate the rank correlation coefficient.
Judge 1 2 5 10 1 9 8 4 3 7 6
Judge 2 2 10 7 4 9 5 9 1 8 3
11. Calculate the rank correlation coefficient between the marginal utilities of two goods
received by the 10 individuals.
Individuals                 A  B  C  D  E  F  G  H  I  J
Marginal Utility of Good X  70 50 60 60 77 80 90 15 25 45
Marginal Utility of Good Y  60 90 55 40 59 65 70 85 73 50
11.8 GLOSSARY
Covariance: It is a measure of how two variables vary together.

Correlation: It refers to the degree of relationship between two or more variables.
11.9 SUMMARY
In this chapter, we dealt with two related but different statistical measures of the
relationship between variables. On the one hand, covariance informs us whether the variables
move together or not. However, it does not tell us anything about the degree of association
between the variables. On the other hand, correlation shows the direction as well as the degree
of the relationship between the variables. Correlation can be determined using the following
methods:
A) Scatter Plot method
B) Karl Pearson’s Coefficient of Correlation
C) Spearman’s Rank Correlation
11.10 ANSWERS TO IN-TEXT QUESTIONS
1. Covariance
2. -135.67
3. – 600
4. True
5. True
6. True
7. False
8. Spearman’s rank method
9. 0.73
10. 0.45
11. -0.22
11.11 SELF-ASSESSMENT QUESTIONS
Q.1. What is the difference between covariance and correlation?
Q.2. Discuss briefly the differences between the methods of calculating the coefficient of
correlation.
Q.3. Plot the following data on a scatter plot and comment about the correlation between the
two variables.
X 10 5 1 3 2 7
Y 3 4 10 9 5 2
Q.4. Using scatter plot find out whether there is any correlation between the number of hours
individuals exercise and their respective weight.
No. of Exercise Hours  1  2  3  4  5  6
Weight                 80 65 55 60 50 60

Q5. Find out the correlation between the following variables.

X 1 2 3 5 6
Z 2 5 6 1 2
LESSON 12
CHARACTERISTICS OF ESTIMATORS
STRUCTURE
12.2 Introduction
12.3 Characteristics of Estimators
12.3.1 Unbiasedness
12.3.2 Consistency
12.3.3 Efficiency
12.3.4 Sufficiency
12.4 In-Text Questions
12.5 Summary
12.6 Glossary
12.7 Answer to in-text questions
12.8 References
One of the main objectives of Statistics is to draw inferences about a population from the
analysis of a sample drawn from that population. Two important problems in statistical
inference are
(i) estimation
(ii) testing of hypothesis.
The theory of estimation was founded by Prof. R.A. Fisher in a series of fundamental papers
around 1930.
12.2 INTRODUCTION
Let us consider a random variable $X$ with p.d.f. $f(x, \theta)$. In most common applications,
though not always, the functional form of the population distribution is assumed to be known
except for the value of some unknown parameter(s) $\theta$, which may take any value in a set
$\Theta$. This is expressed by writing the p.d.f. in the form $f(x, \theta)$, $\theta \in \Theta$.
The set $\Theta$, which is the set of all possible values of $\theta$, is called the parameter
space. Such a situation gives rise not to one probability distribution but to a family of
probability distributions, which we write as $\{f(x, \theta), \theta \in \Theta\}$. For example,
if $X \sim N(\mu, \sigma^2)$, then the parameter space is
$$ \Theta = \{(\mu, \sigma^2) : -\infty < \mu < \infty;\ 0 < \sigma < \infty\} $$
In particular, for $\sigma^2 = 1$, the family of probability distributions is given by:
$$ \{N(\mu, 1);\ \mu \in \Theta\}, \text{ where } \Theta = \{\mu : -\infty < \mu < \infty\} $$
In the following discussion we shall consider a general family of distributions:
$$ \{f(x; \theta_1, \theta_2, \dots, \theta_k) : \theta_i \in \Theta,\ i = 1, 2, \dots, k\} $$
Let us consider a random sample $x_1, x_2, \dots, x_n$ of size $n$ from a population with
probability function $f(x; \theta_1, \theta_2, \dots, \theta_k)$, where
$\theta_1, \theta_2, \dots, \theta_k$ are the unknown population parameters. There will then
always be an infinite number of functions of the sample values, called statistics, which may be
proposed as estimates of one or more of the parameters.
Evidently, the best estimate would be one that falls nearest to the true value of the parameter
to be estimated. In other words, the statistic whose distribution concentrates as closely as
possible near the true value of the parameter may be regarded as the best estimate. Hence the
basic problem of estimation in the above case can be formulated as follows:
We wish to determine the functions of the sample observations
$$ T_1 = \hat{\theta}_1(x_1, x_2, \dots, x_n),\ T_2 = \hat{\theta}_2(x_1, x_2, \dots, x_n),\ \dots,\ T_k = \hat{\theta}_k(x_1, x_2, \dots, x_n) $$
such that their distribution is concentrated as closely as possible near the true value of the
parameter. The estimating functions are then referred to as estimators.
12.3 CHARACTERISTICS OF ESTIMATORS
The following are some of the criteria that should be satisfied by a good estimator.
(i) Unbiasedness
(ii) Consistency
(iii) Efficiency
(iv) Sufficiency.
We shall now briefly explain these terms one by one.
12.3.1 Unbiasedness :
An estimator $T_n = T(x_1, x_2, \dots, x_n)$ is said to be an unbiased estimator of
$\gamma(\theta)$ if
$$ E(T_n) = \gamma(\theta), \text{ for all } \theta \in \Theta $$
We have seen that in sampling from a population with mean $\mu$ and variance $\sigma^2$,
$$ E(\bar{x}) = \mu, \quad E(s^2) \neq \sigma^2 \quad \text{but} \quad E(S^2) = \sigma^2, $$
where
$$ s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \text{and} \quad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$
Hence there is a reason to prefer $S^2$ to the sample variance $s^2$.
Note: If $E(T_n) > \gamma(\theta)$, $T_n$ is said to be positively biased and if
$E(T_n) < \gamma(\theta)$, it is said to be negatively biased, the amount of bias $b(\theta)$
being given by
$$ b(\theta) = E(T_n) - \gamma(\theta), \quad \theta \in \Theta $$
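The bias of the divisor-n variance can be seen exactly in a very small case. The sketch below uses a hypothetical population {1, 2, 3} (our own choice, not from the text) and lists every ordered sample of size 2 drawn with replacement, so that the expectations are obtained by exact enumeration rather than simulation.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical population, each value equally likely
n = 2                   # sample size, drawn with replacement

mu = Fraction(sum(population), len(population))
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations of the sample values from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - mean) ** 2 for v in sample)

samples = list(product(population, repeat=n))  # all 9 equally likely samples

# Expected value of s^2 (divisor n) and of S^2 (divisor n - 1).
mean_s2 = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_S2 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, mean_s2, mean_S2)  # 2/3 1/3 2/3
```

Here E(s²) = 1/3 falls short of σ² = 2/3, so s² is negatively biased, while E(S²) equals σ² exactly.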
Example 1. Let $x_1, x_2, \dots, x_n$ be a random sample from a normal population $N(\mu, 1)$.
Show that $t = \frac{1}{n} \sum_{i=1}^{n} x_i^2$ is an unbiased estimator of $\mu^2 + 1$.
Solution. We are given: $E(x_i) = \mu$, $V(x_i) = 1$, $\forall\, i = 1, 2, \dots, n$.
Now $E(x_i^2) = V(x_i) + \{E(x_i)\}^2 = 1 + \mu^2$.
$$ \therefore E(t) = E\left(\frac{1}{n} \sum_{i=1}^{n} x_i^2\right) = \frac{1}{n} \sum_{i=1}^{n} E(x_i^2) = \frac{1}{n} \sum_{i=1}^{n} (1 + \mu^2) = 1 + \mu^2 $$
Hence $t$ is an unbiased estimator of $1 + \mu^2$.

Example 2. If $T$ is an unbiased estimator for $\theta$, show that $T^2$ is a biased estimator
for $\theta^2$.
Solution. Since $T$ is an unbiased estimator for $\theta$, we have $E(T) = \theta$.
Also $\mathrm{Var}(T) = E(T^2) - \{E(T)\}^2 = E(T^2) - \theta^2 \Rightarrow E(T^2) = \theta^2 + \mathrm{Var}(T)$,
where $\mathrm{Var}(T) > 0$. Since $E(T^2) \neq \theta^2$, $T^2$ is a biased estimator for $\theta^2$.
Example 3. Let $X$ have a Poisson distribution with parameter $\theta$. Show that the only
unbiased estimator of $\exp\{-(k + 1)\theta\}$, $k > 0$, is $T(X) = (-k)^X$, so that $T(x) > 0$
if $x$ is even and $T(x) < 0$ if $x$ is odd.
Solution.
$$ E\{T(X)\} = E\{(-k)^X\} = \sum_{x=0}^{\infty} (-k)^x \left(\frac{e^{-\theta} \theta^x}{x!}\right) = e^{-\theta} \sum_{x=0}^{\infty} \frac{(-k\theta)^x}{x!} = e^{-\theta} \cdot e^{-k\theta} = e^{-(1+k)\theta} $$
⇒ $T(X) = (-k)^X$ is an unbiased estimator for $\exp\{-(1 + k)\theta\}$, $k > 0$.
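The series in this example can be checked numerically for particular values. In the sketch below, θ = 0.8 and k = 2 are arbitrary illustrative values of our own choosing, and the infinite Poisson sum is truncated at 60 terms, which is an approximation (the omitted terms are negligibly small for these values).

```python
from math import exp, factorial

def expected_T(theta, k, terms=60):
    """Truncated sum of (-k)^x * P(X = x) for X ~ Poisson(theta)."""
    return sum(
        (-k) ** x * exp(-theta) * theta ** x / factorial(x)
        for x in range(terms)
    )

theta, k = 0.8, 2  # illustrative values, not from the text

lhs = expected_T(theta, k)     # E[(-k)^X], computed term by term
rhs = exp(-(1 + k) * theta)    # exp{-(1 + k) * theta}

print(lhs, rhs)  # the two values agree to many decimal places
```

The check also shows why this estimator is odd in practice: T(X) takes both positive and negative values, although the quantity being estimated is always positive.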
12.3.2: Consistency :
An estimator $T_n = T(x_1, x_2, \dots, x_n)$, based on a random sample of size $n$, is said to
be a consistent estimator of $\gamma(\theta)$, $\theta \in \Theta$ (the parameter space), if
$T_n$ converges to $\gamma(\theta)$ in probability, i.e., if
$T_n \xrightarrow{P} \gamma(\theta)$ as $n \to \infty$. In other words, $T_n$ is a consistent
estimator of $\gamma(\theta)$ if for every $\varepsilon > 0$, $\eta > 0$, there exists a
positive integer $m = m(\varepsilon, \eta)$ such that
$$ P\{|T_n - \gamma(\theta)| < \varepsilon\} \to 1 \text{ as } n \to \infty \ \Rightarrow\ P\{|T_n - \gamma(\theta)| < \varepsilon\} > 1 - \eta;\ \forall\, n \geq m $$
where $m$ is some very large value of $n$.

Note 1. If $X_1, X_2, \dots, X_n$ is a random sample from a population with finite mean
$E(X_i) = \mu$, then by Khinchine's weak law of large numbers (WLLN), we have
$$ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{p} E(X_i) = \mu, \text{ as } n \to \infty $$
Hence the sample mean $\bar{X}_n$ is always a consistent estimator of the population mean $\mu$.
Note 2. Obviously, consistency is a property concerning the behaviour of an estimator for
indefinitely large values of the sample size $n$, i.e., as $n \to \infty$. Nothing is said about
its behaviour for finite $n$. Moreover, if there exists a consistent estimator, say $T_n$, of
$\gamma(\theta)$, then infinitely many such estimators can be constructed, e.g.,
$$ T_n' = \left(\frac{n - a}{n - b}\right) T_n = \left[\frac{1 - (a/n)}{1 - (b/n)}\right] T_n \xrightarrow{p} \gamma(\theta), \text{ as } n \to \infty $$
and hence, for different values of $a$ and $b$, $T_n'$ is also consistent for $\gamma(\theta)$.
Invariance Property of Consistent Estimators.
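Theorem: If $T_n$ is a consistent estimator of $\gamma(\theta)$ and $\psi(\gamma(\theta))$ is a
continuous function of $\gamma(\theta)$, then $\psi(T_n)$ is a consistent estimator of
$\psi(\gamma(\theta))$.
Proof. Since $T_n$ is a consistent estimator of $\gamma(\theta)$,
$T_n \xrightarrow{p} \gamma(\theta)$ as $n \to \infty$, i.e., for every $\varepsilon > 0$,
$\eta > 0$, there exists a positive integer $m = m(\varepsilon, \eta)$ such that
$$ P\{|T_n - \gamma(\theta)| < \varepsilon\} > 1 - \eta,\ \forall\, n \geq m $$
Since $\psi(\cdot)$ is a continuous function, for every $\varepsilon_1 > 0$, however small,
there exists a positive number $\varepsilon$ such that
$|\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1$ whenever $|T_n - \gamma(\theta)| < \varepsilon$, i.e.,
$$ |T_n - \gamma(\theta)| < \varepsilon \ \Rightarrow\ |\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1 $$
For two events $A$ and $B$, if $A \Rightarrow B$, then
$$ A \subseteq B \ \Rightarrow\ P(A) \leq P(B) \text{ or } P(B) \geq P(A) $$
So
$$ P[|\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1] \geq P[|T_n - \gamma(\theta)| < \varepsilon] $$
$$ \Rightarrow P[|\psi(T_n) - \psi(\gamma(\theta))| < \varepsilon_1] > 1 - \eta;\ \forall\, n \geq m $$
Hence $\psi(T_n) \xrightarrow{p} \psi(\gamma(\theta))$ as $n \to \infty$, i.e., $\psi(T_n)$ is a
consistent estimator of $\psi(\gamma(\theta))$.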
Sufficient Conditions for Consistency

Theorem 2. Let {𝑇𝑛 } be a sequence of estimators such that for all 𝜃 ∈ Θ,
(i) 𝐸𝜃 (𝑇𝑛 ) → 𝛾(𝜃), 𝑛 → ∞ and
(ii) Var𝜃 (𝑇𝑛 ) → 0, as 𝑛 → ∞. Then 𝑇𝑛 is a consistent estimator of 𝛾(𝜃).
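Example 1. If $X_1, X_2, \dots, X_n$ are random observations on a Bernoulli variate $X$ taking
the value 1 with probability $p$ and the value 0 with probability $(1 - p)$, show that
$$ \frac{\Sigma x_i}{n}\left(1 - \frac{\Sigma x_i}{n}\right) $$
is a consistent estimator of $p(1 - p)$.
Solution. Since $X_1, X_2, \dots, X_n$ are i.i.d. Bernoulli variates with parameter $p$,
$$ T = \sum_{i=1}^{n} x_i \sim B(n, p) \ \Rightarrow\ E(T) = np \text{ and } \mathrm{Var}(T) = npq, \text{ where } q = 1 - p $$
$$ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{T}{n} \ \Rightarrow\ E(\bar{X}) = \frac{1}{n} E(T) = \frac{1}{n} \cdot np = p $$
$$ \mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{T}{n}\right) = \frac{1}{n^2} \mathrm{Var}(T) = \frac{pq}{n} \to 0 \text{ as } n \to \infty $$
Since $E(\bar{X}) = p$ and $\mathrm{Var}(\bar{X}) \to 0$ as $n \to \infty$, $\bar{X}$ is a
consistent estimator of $p$. Also
$$ \frac{\Sigma x_i}{n}\left(1 - \frac{\Sigma x_i}{n}\right) = \bar{X}(1 - \bar{X}), $$
being a polynomial in $\bar{X}$, is a continuous function of $\bar{X}$.
Since $\bar{X}$ is a consistent estimator of $p$, by the invariance property of consistent
estimators (the theorem stated above), $\bar{X}(1 - \bar{X})$ is a consistent estimator of
$p(1 - p)$.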
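The meaning of consistency in this example can be made concrete by computing, for increasing n, the exact probability that the estimator lies within ε of p(1 − p). The sketch below uses the binomial distribution of Σx_i; the values p = 0.3 and ε = 0.04 and the function name prob_within are illustrative assumptions of our own.

```python
from math import comb

def prob_within(n, p, eps):
    """Exact P(|Xbar*(1 - Xbar) - p*(1 - p)| < eps) when sum(x_i) ~ B(n, p)."""
    target = p * (1 - p)
    total = 0.0
    for t in range(n + 1):
        xbar = t / n
        if abs(xbar * (1 - xbar) - target) < eps:
            total += comb(n, t) * p ** t * (1 - p) ** (n - t)
    return total

p, eps = 0.3, 0.04  # illustrative values, not from the text
sample_sizes = (10, 100, 1000)
probs = [prob_within(n, p, eps) for n in sample_sizes]

for n, pr in zip(sample_sizes, probs):
    print(n, round(pr, 4))
```

The printed probabilities rise towards 1 as n grows, which is exactly what convergence in probability requires.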
12.3.3 Efficient Estimators:

Even if we confine ourselves to unbiased estimates, there will, in general, exist more than one
consistent estimator of a parameter. For example, in sampling from a normal population
$N(\mu, \sigma^2)$, when $\sigma^2$ is known, the sample mean $\bar{x}$ is an unbiased and
consistent estimator of $\mu$.
From symmetry it follows immediately that the sample median ($Md$) is an unbiased estimate of
$\mu$, which is the same as the population median. Also, for large $n$,
$$ V(Md) = \frac{1}{4 n f_1^2} $$
Here $f_1$ = median ordinate of the parent distribution = modal ordinate of the parent
distribution
$$ = \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\{-(x - \mu)^2 / 2\sigma^2\}\right]_{x = \mu} = \frac{1}{\sigma\sqrt{2\pi}} $$
$$ \therefore V(Md) = \frac{1}{4n} \cdot 2\pi\sigma^2 = \frac{\pi\sigma^2}{2n} $$
Since $E(Md) = \mu$ and $V(Md) \to 0$ as $n \to \infty$, the median is also an unbiased and
consistent estimator of $\mu$. Thus, there is a necessity of some further criterion which will
enable us to choose between estimators with the common property of consistency. Such a
criterion, which is based on the variances of the sampling distributions of the estimators, is
usually known as efficiency. If, of the two consistent estimators $T_1, T_2$ of a certain
parameter $\theta$, we have
$$ V(T_1) < V(T_2), \text{ for all } n, $$
then $T_1$ is more efficient than $T_2$ for all sample sizes.
We have seen above:
$$ \text{For all } n,\ V(\bar{x}) = \frac{\sigma^2}{n}, \text{ and for large } n,\ V(Md) = \frac{\pi\sigma^2}{2n} = 1.57\,\frac{\sigma^2}{n} $$
Since $V(\bar{x}) < V(Md)$, we conclude that for the normal distribution, the sample mean is a
more efficient estimator of $\mu$ than the sample median, for large samples at least.
Most Efficient Estimator: If in a class of consistent estimators for a parameter, there exists
one whose sampling variance is less than that of any other such estimator, it is called the most
efficient estimator. Whenever such an estimator exists, it provides a criterion for the
measurement of the efficiency of the other estimators.
Efficiency: If $T_1$ is the most efficient estimator with variance $V_1$ and $T_2$ is any other
estimator with variance $V_2$, then the efficiency $E$ of $T_2$ is defined as:
$$ E = \frac{V_1}{V_2} $$
Obviously, $E$ cannot exceed unity.
If $T, T_1, T_2, \dots, T_n$ are all estimators of $\gamma(\theta)$ and $\mathrm{Var}(T)$ is
minimum, then the efficiency $E_i$ of $T_i$, $(i = 1, 2, \dots, n)$ is defined as:
$$ E_i = \frac{\mathrm{Var}\, T}{\mathrm{Var}\, T_i};\ i = 1, 2, \dots, n $$
Obviously $E_i \leq 1$; $i = 1, 2, \dots, n$. For example, in normal samples, since the sample
mean $\bar{x}$ is the most efficient estimator of $\mu$, the efficiency $E$ of $Md$ for such
samples (for large $n$) is:
$$ E = \frac{V(\bar{x})}{V(Md)} = \frac{\sigma^2 / n}{\pi\sigma^2 / (2n)} = \frac{2}{\pi} = 0.637 $$
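The large-sample efficiency 2/π of the median can be illustrated by simulation. The sketch below draws repeated samples from a standard normal population and compares the variance of the sample means with the variance of the sample medians. The sample size 101, the number of replications 4000 and the seed are arbitrary choices of our own, and the result is only an approximation to the limiting value.

```python
import random
import statistics

random.seed(12345)  # fixed seed so that the run is repeatable

def simulate_variances(n=101, replications=4000, mu=0.0, sigma=1.0):
    """Return (variance of sample means, variance of sample medians)."""
    means, medians = [], []
    for _ in range(replications):
        sample = [random.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = simulate_variances()
efficiency = var_mean / var_median

# The ratio should be in the neighbourhood of 2/pi = 0.637.
print(round(var_mean, 5), round(var_median, 5), round(efficiency, 3))
```

The variance of the medians comes out clearly larger than that of the means, so the simulated efficiency of the median is well below 1.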
Example 1. A random sample $(X_1, X_2, X_3, X_4, X_5)$ of size 5 is drawn from a normal
population with unknown mean $\mu$. Consider the following estimators to estimate $\mu$:
(i) $t_1 = \dfrac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$,
(ii) $t_2 = \dfrac{X_1 + X_2}{2} + X_3$,
(iii) $t_3 = \dfrac{2X_1 + X_2 + \lambda X_3}{3}$, where $\lambda$ is such that $t_3$ is an
unbiased estimator of $\mu$.
Find $\lambda$. Are $t_1$ and $t_2$ unbiased? State, giving reasons, the estimator which is best
among $t_1$, $t_2$ and $t_3$.
Solution. We are given:
$$ E(X_i) = \mu,\ \mathrm{Var}(X_i) = \sigma^2 \text{ (say)};\ \mathrm{Cov}(X_i, X_j) = 0,\ (i \neq j = 1, 2, \dots, n) $$
(i) $E(t_1) = \frac{1}{5} \sum_{i=1}^{5} E(X_i) = \frac{1}{5} \sum_{i=1}^{5} \mu = \frac{1}{5} \cdot 5\mu = \mu$
⇒ $t_1$ is an unbiased estimator of $\mu$.
(ii) $E(t_2) = \frac{1}{2} E(X_1 + X_2) + E(X_3) = \frac{1}{2}(\mu + \mu) + \mu = 2\mu$
⇒ $t_2$ is not an unbiased estimator of $\mu$.
(iii) $E(t_3) = \mu \Rightarrow \frac{1}{3} E(2X_1 + X_2 + \lambda X_3) = \mu$
(∵ $t_3$ is an unbiased estimator of $\mu$)
∴ $2E(X_1) + E(X_2) + \lambda E(X_3) = 3\mu$ ∴ $2\mu + \mu + \lambda\mu = 3\mu \Rightarrow \lambda = 0$
$$ V(t_1) = \frac{1}{25}\{V(X_1) + V(X_2) + V(X_3) + V(X_4) + V(X_5)\} = \frac{1}{5}\sigma^2 $$
$$ V(t_2) = \frac{1}{4}\{V(X_1) + V(X_2)\} + V(X_3) = \frac{1}{2}\sigma^2 + \sigma^2 = \frac{3}{2}\sigma^2 $$
$$ V(t_3) = \frac{1}{9}\{4V(X_1) + V(X_2)\} = \frac{1}{9}(4\sigma^2 + \sigma^2) = \frac{5}{9}\sigma^2 \quad (\because \lambda = 0) $$
Since $V(t_1)$ is least, $t_1$ is the best estimator (in the sense of least variance) of $\mu$.
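For a linear estimator t = Σ wᵢXᵢ built from uncorrelated observations with common mean μ and variance σ², we have E(t) = (Σ wᵢ)μ and V(t) = (Σ wᵢ²)σ². The sketch below applies this to the three estimators of the example, using exact fractions; the helper name mean_and_variance_factors is our own.

```python
from fractions import Fraction as F

def mean_and_variance_factors(weights):
    """For t = sum(w_i * X_i) with uncorrelated X_i, E(X_i) = mu, Var(X_i) = sigma^2:
    returns (a, b) such that E(t) = a * mu and Var(t) = b * sigma^2."""
    return sum(weights), sum(w * w for w in weights)

# Weights on (X1, X2, X3, X4, X5) for each estimator.
estimators = {
    "t1": [F(1, 5)] * 5,
    "t2": [F(1, 2), F(1, 2), F(1), F(0), F(0)],
    "t3": [F(2, 3), F(1, 3), F(0), F(0), F(0)],  # lambda = 0
}

results = {name: mean_and_variance_factors(w) for name, w in estimators.items()}
for name, (mean_factor, var_factor) in results.items():
    print(name, "E =", mean_factor, "* mu;", "V =", var_factor, "* sigma^2")
```

An estimator is unbiased for μ exactly when its weights sum to 1, which holds for t1 and t3 but not for t2; among the unbiased ones, t1 has the smaller variance factor.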
Example 2. $X_1$, $X_2$ and $X_3$ is a random sample of size 3 from a population with mean value
$\mu$ and variance $\sigma^2$. $T_1, T_2, T_3$ are the estimators used to estimate the mean value
$\mu$, where
$$ T_1 = X_1 + X_2 - X_3,\quad T_2 = 2X_1 + 3X_3 - 4X_2,\quad T_3 = \frac{1}{3}(\lambda X_1 + X_2 + X_3) $$
(i) Are $T_1$ and $T_2$ unbiased estimators?
(ii) Find the value of $\lambda$ such that $T_3$ is an unbiased estimator for $\mu$.
(iii) With this value of $\lambda$, is $T_3$ a consistent estimator?
(iv) Which is the best estimator?
Solution. Since $X_1, X_2, X_3$ is a random sample from a population with mean $\mu$ and variance
$\sigma^2$, $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$ and $\mathrm{Cov}(X_i, X_j) = 0$, $(i \neq j)$.
(i) We have
$E(T_1) = E(X_1) + E(X_2) - E(X_3) = \mu$ ⇒ $T_1$ is an unbiased estimator of $\mu$.
$E(T_2) = 2E(X_1) + 3E(X_3) - 4E(X_2) = \mu$ ⇒ $T_2$ is an unbiased estimator of $\mu$.
(ii) We are given: $E(T_3) = \mu \Rightarrow \frac{1}{3}\{\lambda E(X_1) + E(X_2) + E(X_3)\} = \mu$
$$ \Rightarrow \frac{1}{3}(\lambda\mu + \mu + \mu) = \mu \Rightarrow \lambda + 2 = 3 \Rightarrow \lambda = 1 $$
(iii) With $\lambda = 1$, $T_3 = \frac{1}{3}(X_1 + X_2 + X_3) = \bar{X}$. Since the sample mean is a
consistent estimator of the population mean $\mu$, by the Weak Law of Large Numbers, $T_3$ is a
consistent estimator of $\mu$.
(iv) We have
$$ \mathrm{Var}(T_1) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) = 3\sigma^2 $$
$$ \mathrm{Var}(T_2) = 4\mathrm{Var}(X_1) + 9\mathrm{Var}(X_3) + 16\mathrm{Var}(X_2) = 29\sigma^2 $$
$$ \mathrm{Var}(T_3) = \frac{1}{9}[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)] = \frac{1}{3}\sigma^2 \quad (\because \lambda = 1) $$
Since $\mathrm{Var}(T_3)$ is minimum, $T_3$ is the best estimator of $\mu$ in the sense of minimum
variance.
Minimum Variance Unbiased (M.V.U.) Estimators:
If a statistic 𝑇 = 𝑇(𝑥1 , 𝑥2 , … , 𝑥𝑛 ), based on sample of size 𝑛 is such that
(i) 𝑇 is unbiased for 𝛾(𝜃), for all 𝜃 ∈ Θ and
(ii) It has the smallest variance among the class of all unbiased estimators of 𝛾(𝜃), then 𝑇
is called the minimum variance unbiased estimator (𝑀𝑉𝑈𝐸) of 𝛾(𝜃).
More precisely, 𝑇 is MVUE of 𝛾(𝜃) if
𝐸𝜃 (𝑇) = 𝛾(𝜃) for all 𝜃 ∈ Θ and Var𝜃 (𝑇) ≤ Var𝜃 (𝑇 ′ ) for all 𝜃 ∈ Θ
where 𝑇 ′ is any other unbiased estimator of 𝛾(𝜃).
We give below some important Theorems concerning MVU estimators.

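Theorem 1. An M.V.U. estimator is unique in the sense that if $T_1$ and $T_2$ are M.V.U.
estimators for $\gamma(\theta)$, then $T_1 = T_2$, almost surely.
Proof. We are given that
$$ E_\theta(T_1) = E_\theta(T_2) = \gamma(\theta), \text{ for all } \theta \in \Theta $$
and
$$ \mathrm{Var}_\theta(T_1) = \mathrm{Var}_\theta(T_2), \text{ for all } \theta \in \Theta $$
Consider a new estimator $T = \frac{1}{2}(T_1 + T_2)$, which is also unbiased since
$$ E(T) = \frac{1}{2}\{E(T_1) + E(T_2)\} = \gamma(\theta) $$
$$ \mathrm{Var}(T) = \mathrm{Var}\left\{\frac{1}{2}(T_1 + T_2)\right\} = \frac{1}{4}\mathrm{Var}(T_1 + T_2) \quad [\because \mathrm{Var}(CX) = C^2\,\mathrm{Var}(X)] $$
$$ = \frac{1}{4}\{\mathrm{Var}(T_1) + \mathrm{Var}(T_2) + 2\,\mathrm{Cov}(T_1, T_2)\} $$
$$ = \frac{1}{4}\left\{\mathrm{Var}(T_1) + \mathrm{Var}(T_2) + 2\rho\sqrt{\mathrm{Var}(T_1)\,\mathrm{Var}(T_2)}\right\} $$
$$ = \frac{1}{2}\mathrm{Var}(T_1)(1 + \rho), $$
where $\rho$ is Karl Pearson's coefficient of correlation between $T_1$ and $T_2$.
Since $T_1$ is the M.V.U. estimator, $\mathrm{Var}(T) \geq \mathrm{Var}(T_1)$
$$ \Rightarrow \frac{1}{2}\mathrm{Var}(T_1)(1 + \rho) \geq \mathrm{Var}(T_1) \Rightarrow \frac{1}{2}(1 + \rho) \geq 1 \Rightarrow \rho \geq 1 $$
Since $|\rho| \leq 1$, we must have $\rho = 1$, i.e., $T_1$ and $T_2$ must have a linear relation
of the form $T_1 = \alpha + \beta T_2$, where $\alpha$ and $\beta$ are constants independent of
$x_1, x_2, \dots, x_n$ but may depend on $\theta$, i.e., we may have $\alpha = \alpha(\theta)$ and
$\beta = \beta(\theta)$.
Taking expectations of both sides, we get
$$ \gamma(\theta) = \alpha + \beta\,\gamma(\theta) $$
Also we get $\mathrm{Var}(T_1) = \mathrm{Var}(\alpha + \beta T_2) = \beta^2\,\mathrm{Var}(T_2)$
$$ \Rightarrow 1 = \beta^2 \Rightarrow \beta = \pm 1 $$
But since $\rho(T_1, T_2) = +1$, the coefficient of regression of $T_1$ on $T_2$ must be positive.
$$ \therefore \beta = 1 \Rightarrow \alpha = 0 $$
so we get $T_1 = T_2$ as desired.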
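Theorem 2. Let $T_1$ and $T_2$ be unbiased estimators of $\gamma(\theta)$ with efficiencies $e_1$
and $e_2$ respectively and let $\rho = \rho_\theta$ be the correlation coefficient between them.
Then
$$ \sqrt{e_1 e_2} - \sqrt{(1 - e_1)(1 - e_2)} \leq \rho \leq \sqrt{e_1 e_2} + \sqrt{(1 - e_1)(1 - e_2)} $$
Proof. Let $T$ be the minimum variance unbiased estimator of $\gamma(\theta)$. Then we are given:
$$ E_\theta(T_1) = \gamma(\theta) = E_\theta(T_2)\ \forall\, \theta \in \Theta $$
$$ e_1 = \frac{V_\theta(T)}{V_\theta(T_1)} = \frac{V}{V_1} \text{ (say)} \Rightarrow V_1 = \frac{V}{e_1} $$
$$ e_2 = \frac{V_\theta(T)}{V_\theta(T_2)} = \frac{V}{V_2} \text{ (say)} \Rightarrow V_2 = \frac{V}{e_2} $$
Let us consider another estimator $T_3 = \lambda T_1 + \mu T_2$, which is also an unbiased
estimator of $\gamma(\theta)$, i.e.,
$$ E(T_3) = (\lambda + \mu)\gamma(\theta) = \gamma(\theta) \Rightarrow \lambda + \mu = 1 $$
$$ V_\theta(T_3) = V(\lambda T_1 + \mu T_2) = \lambda^2 V(T_1) + \mu^2 V(T_2) + 2\lambda\mu\,\mathrm{Cov}(T_1, T_2) $$
$$ = V\left(\frac{\lambda^2}{e_1} + \frac{\mu^2}{e_2} + 2 \cdot \frac{\lambda\mu\rho}{\sqrt{e_1 e_2}}\right), $$
since $\mathrm{Cov}(T_1, T_2) = \rho\sqrt{V_1 V_2} = \rho V / \sqrt{e_1 e_2}$.
But $V_\theta(T_3) \geq V$, since $V$ is the minimum variance.
$$ \therefore \frac{\lambda^2}{e_1} + \frac{\mu^2}{e_2} + \frac{2\rho\lambda\mu}{\sqrt{e_1 e_2}} \geq 1 = (\lambda + \mu)^2 $$
$$ \Rightarrow \left(\frac{1}{e_1} - 1\right)\lambda^2 + \left(\frac{1}{e_2} - 1\right)\mu^2 + 2\lambda\mu\left(\frac{\rho}{\sqrt{e_1 e_2}} - 1\right) \geq 0 $$
$$ \Rightarrow \left(\frac{1}{e_1} - 1\right)\left(\frac{\lambda}{\mu}\right)^2 + 2\left(\frac{\rho}{\sqrt{e_1 e_2}} - 1\right)\left(\frac{\lambda}{\mu}\right) + \left(\frac{1}{e_2} - 1\right) \geq 0, $$
which is a quadratic expression in $(\lambda/\mu)$.
Note that: $e_i < 1 \Rightarrow \frac{1}{e_i} > 1$ or $\left(\frac{1}{e_i} - 1\right) > 0$, $i = 1, 2$.
We know that $AX^2 + BX + C \geq 0\ \forall\, X$, $A > 0$, $C > 0$, if and only if
Discriminant $= B^2 - 4AC \leq 0$. Hence
$$ \left(\frac{\rho}{\sqrt{e_1 e_2}} - 1\right)^2 - \left(\frac{1}{e_1} - 1\right)\left(\frac{1}{e_2} - 1\right) \leq 0 \Rightarrow \left(\rho - \sqrt{e_1 e_2}\right)^2 - (1 - e_1)(1 - e_2) \leq 0 $$
$$ \therefore \rho^2 - 2\sqrt{e_1 e_2}\,\rho + (e_1 + e_2 - 1) \leq 0 $$
This implies that $\rho$ lies between the roots of the equation
$$ \rho^2 - 2\sqrt{e_1 e_2}\,\rho + (e_1 + e_2 - 1) = 0, $$
which are given by
$$ \frac{1}{2}\left\{2\sqrt{e_1 e_2} \pm 2\sqrt{e_1 e_2 - (e_1 + e_2 - 1)}\right\} = \sqrt{e_1 e_2} \pm \sqrt{(e_1 - 1)(e_2 - 1)} $$
Hence we have:
$$ \sqrt{e_1 e_2} - \sqrt{(e_1 - 1)(e_2 - 1)} \leq \rho \leq \sqrt{e_1 e_2} + \sqrt{(e_1 - 1)(e_2 - 1)} $$
$$ \Rightarrow \sqrt{e_1 e_2} - \sqrt{(1 - e_1)(1 - e_2)} \leq \rho \leq \sqrt{e_1 e_2} + \sqrt{(1 - e_1)(1 - e_2)} $$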
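The bounds of Theorem 2 are easy to evaluate for given efficiencies. The sketch below is only an illustration; the function name correlation_bounds and the numerical efficiencies are our own choices.

```python
from math import sqrt

def correlation_bounds(e1, e2):
    """Lower and upper bounds on rho from Theorem 2, for efficiencies in (0, 1]."""
    centre = sqrt(e1 * e2)
    half_width = sqrt((1 - e1) * (1 - e2))
    return centre - half_width, centre + half_width

# Two fully efficient estimators: rho is forced to equal 1.
print(correlation_bounds(1.0, 1.0))

# One M.V.U. estimator (e1 = 1) and one with efficiency e2:
# both bounds collapse to sqrt(e2).
lower, upper = correlation_bounds(1.0, 0.64)
print(lower, upper)

# Two estimators of moderate efficiency: rho can lie anywhere in an interval.
print(correlation_bounds(0.5, 0.5))
```

When e1 = 1 the interval shrinks to the single point √e2, so the correlation between an M.V.U. estimator and any other unbiased estimator equals the square root of the latter's efficiency.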
12.3.4 Sufficient Estimators:
An estimator is said to be sufficient for a parameter if it contains all the information in the
sample regarding the parameter.
If $T = t(x_1, x_2, \dots, x_n)$ is an estimator of a parameter $\theta$, based on a sample
$x_1, x_2, \dots, x_n$ of size $n$ from the population with density $f(x, \theta)$, such that the
conditional distribution of $x_1, x_2, \dots, x_n$ given $T$ is independent of $\theta$, then $T$
is a sufficient estimator for $\theta$.
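Example: Let $x_1, x_2, \dots, x_n$ be a random sample from a Bernoulli population with parameter
$p$, $0 < p < 1$, i.e.,
$$ x_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } q = (1 - p) \end{cases} $$
Then
$$ T = t(x_1, x_2, \dots, x_n) = x_1 + x_2 + \dots + x_n \text{ follows } B(n, p) $$
$$ \therefore P(T = k) = \binom{n}{k} p^k (1 - p)^{n-k};\ k = 0, 1, 2, \dots, n $$
The conditional distribution of $(x_1, x_2, \dots, x_n)$ given $T$ is:
$$ P(x_1 \cap x_2 \cap \dots \cap x_n \mid T = k) = \frac{P(x_1 \cap x_2 \cap \dots \cap x_n \cap T = k)}{P(T = k)} $$
$$ = \begin{cases} \dfrac{p^k (1 - p)^{n-k}}{\binom{n}{k} p^k (1 - p)^{n-k}} = \dfrac{1}{\binom{n}{k}}, & \text{if } \sum_{i=1}^{n} x_i = k \\ 0, & \text{if } \sum_{i=1}^{n} x_i \neq k \end{cases} $$
Since this does not depend on $p$, $T = \sum_{i=1}^{n} x_i$ is sufficient for $p$.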
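The sufficiency argument can be checked by brute force for a small sample. The sketch below lists every 0/1 sample of size n = 4 with total k = 2 and computes its conditional probability given T = k for several values of p; the values of n, k and p and the function name conditional_probs are illustrative choices of our own.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_probs(n, p, k):
    """P(x_1, ..., x_n | T = k) for every 0/1 sample whose sum is k."""
    joint = {
        s: p ** sum(s) * (1 - p) ** (n - sum(s))
        for s in product((0, 1), repeat=n)
        if sum(s) == k
    }
    prob_T = sum(joint.values())  # P(T = k)
    return {s: pr / prob_T for s, pr in joint.items()}

n, k = 4, 2
p_values = (Fraction(1, 5), Fraction(1, 2), Fraction(9, 10))

# Distinct conditional probabilities found for each value of p.
distinct = {p: set(conditional_probs(n, p, k).values()) for p in p_values}
for p, values in distinct.items():
    print(p, values)  # each set contains only 1/6 = 1/C(4, 2)
```

Whatever the value of p, every sample with Σxᵢ = 2 has conditional probability 1/6, so once T is known the sample carries no further information about p.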
FACTORIZATION THEOREM (Neyman).
The necessary and sufficient condition for a distribution to admit a sufficient statistic is
provided by the 'factorization theorem' due to Neyman.
Statement. $T = t(x)$ is sufficient for $\theta$ if and only if the joint density function $L$ (say) of the
sample values can be expressed in the form:
\[
L = g_\theta[t(x)] \cdot h(x)
\]
where (as indicated) $g_\theta[t(x)]$ depends on $\theta$ and on $x$ only through the value of $t(x)$, and $h(x)$ is
independent of $\theta$.
Remarks 1. It should be clearly understood that by 'a function independent of $\theta$' we not only
mean that it does not involve $\theta$ but also that its domain does not depend on $\theta$. For example, the
function:
\[
f(x) = \frac{1}{2a},\quad a - \theta < x < a + \theta;\quad -\infty < \theta < \infty
\]
depends on $\theta$.
2. It should be noted that the original sample 𝑋 = (𝑋1 , 𝑋2 , … , 𝑋𝑛 ), is always a sufficient
statistic.
3. The most general form of the distributions admitting a sufficient statistic is Koopman's
form, given by: $L = L(\mathbf{x}, \theta) = g(x) \cdot h(\theta) \cdot \exp\{a(\theta)\psi(x)\}$, where $h(\theta)$ and
$a(\theta)$ are functions of the parameter $\theta$ only and $g(x)$ and $\psi(x)$ are functions of the
sample observations only.
4. Invariance Property of Sufficient Estimator: If $T$ is a sufficient estimator for the
parameter $\theta$ and if $\psi(T)$ is a one-to-one function of $T$, then $\psi(T)$ is sufficient for $\psi(\theta)$.
5. Fisher-Neyman Criterion. A statistic $t_1 = t(x_1, x_2, \ldots, x_n)$ is a sufficient estimator of the
parameter $\theta$ if and only if the likelihood function (joint p.d.f. of the sample) can be
expressed as:
\[
L = \prod_{i=1}^{n} f(x_i, \theta) = g_1(t_1, \theta) \cdot k(x_1, x_2, \ldots, x_n)
\]
where $g_1(t_1, \theta)$ is the p.d.f. of the statistic $t_1$ and $k(x_1, x_2, \ldots, x_n)$ is a function of the sample
observations only, independent of $\theta$.
Note that this method requires working out the p.d.f. (p.m.f.) of the statistic $t_1 =
t(x_1, x_2, \ldots, x_n)$, which is not always easy.
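Example 1. Let $x_1, x_2, \ldots, x_n$ be a random sample from a uniform population on $[0, \theta]$. Find
a sufficient estimator for $\theta$.
Solution. We are given:
\[
f_\theta(x_i) = \begin{cases} \dfrac{1}{\theta}, & 0 \leq x_i \leq \theta \\ 0, & \text{otherwise} \end{cases}
\]
Let
\[
k(a, b) = \begin{cases} 1, & \text{if } a \leq b \\ 0, & \text{if } a > b \end{cases}
\qquad\text{then}\qquad f_\theta(x_i) = \frac{k(0, x_i)\,k(x_i, \theta)}{\theta},
\]
\[
L = \prod_{i=1}^{n} f_\theta(x_i) = \prod_{i=1}^{n}\left[\frac{k(0, x_i)\,k(x_i, \theta)}{\theta}\right]
= \frac{k\left(0, \min_{1 \leq i \leq n} x_i\right) \cdot k\left(\max_{1 \leq i \leq n} x_i, \theta\right)}{\theta^n}
= g_\theta[t(x)] \cdot h(x)
\]
where
\[
g_\theta[t(\mathbf{x})] = \frac{k\{t(\mathbf{x}), \theta\}}{\theta^n},\quad t(x) = \max_{1 \leq i \leq n} x_i \quad\text{and}\quad h(x) = k\left(0, \min_{1 \leq i \leq n} x_i\right)
\]
Hence by the Factorization theorem, $T = \max_{1 \leq i \leq n} x_i$ is a sufficient statistic for $\theta$.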
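Aliter. We have
\[
L = \prod_{i=1}^{n} f(x_i, \theta) = \frac{1}{\theta^n};\quad 0 < x_i < \theta \qquad (*)
\]
If $t = \max(x_1, x_2, \ldots, x_n) = x_{(n)}$, then the p.d.f. of $t$ is given by:
\[
g(t, \theta) = n\{F(x_{(n)})\}^{n-1} \cdot f(x_{(n)})
\]
We have
\[
F(x) = P(X \leq x) = \int_0^x f(x, \theta)\,dx = \int_0^x \frac{1}{\theta}\,dx = \frac{x}{\theta}
\]
\[
\therefore\ g(t, \theta) = n\left\{\frac{x_{(n)}}{\theta}\right\}^{n-1}\left(\frac{1}{\theta}\right) = \frac{n}{\theta^n}\left[x_{(n)}\right]^{n-1}
\]
Rewriting $(*)$, we get
\[
L = \frac{n\left[x_{(n)}\right]^{n-1}}{\theta^n} \cdot \frac{1}{n\left[x_{(n)}\right]^{n-1}} = g(t, \theta) \cdot k(x_1, x_2, \ldots, x_n)
\]
Hence by the Fisher-Neyman criterion, the statistic $t = x_{(n)}$ is a sufficient estimator for $\theta$.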
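A quick numerical illustration of what sufficiency of the sample maximum means in the uniform case: two different samples of the same size with the same largest value (and all values non-negative) have identical likelihoods for every $\theta$. This is only a sketch; the function `uniform_likelihood` and the two samples are made up for illustration.

```python
def uniform_likelihood(sample, theta):
    """Likelihood of an i.i.d. U[0, theta] sample:
    theta**(-n) when 0 <= min(sample) and max(sample) <= theta, otherwise 0."""
    n = len(sample)
    if min(sample) < 0 or max(sample) > theta:
        return 0.0
    return theta ** (-n)

a = [0.3, 1.7, 2.9]
b = [2.9, 0.1, 0.2]  # a different sample with the same maximum, 2.9

for theta in (2.0, 3.0, 5.0):
    # The two likelihoods agree at every theta; both vanish when theta < 2.9.
    print(theta, uniform_likelihood(a, theta), uniform_likelihood(b, theta))
```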
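Example 2. Let $x_1, x_2, \ldots, x_n$ be a random sample from a $N(\mu, \sigma^2)$ population. Find
sufficient estimators for $\mu$ and $\sigma^2$.
Solution. Let us write $\theta = (\mu, \sigma^2)$; $-\infty < \mu < \infty$, $0 < \sigma^2 < \infty$.
Then
\[
L = \prod_{i=1}^{n} f_\theta(x_i) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right\}
\]
\[
= \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left\{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n} x_i^2 - 2\mu\sum_{i=1}^{n} x_i + n\mu^2\right)\right\}
= g_\theta[t(x)] \cdot h(x)
\]
where
\[
g_\theta[t(x)] = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{1}{2\sigma^2}\left\{t_2(x) - 2\mu\,t_1(x) + n\mu^2\right\}\right],
\]
$t(x) = \{t_1(x), t_2(x)\} = \left(\sum x_i, \sum x_i^2\right)$ and $h(x) = 1$.
Thus $t_1(x) = \sum x_i$ is sufficient for $\mu$ and $t_2(x) = \sum x_i^2$ is sufficient for $\sigma^2$.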
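The factorization for the normal case says that the likelihood depends on the data only through $\sum x_i$ and $\sum x_i^2$. The sketch below checks this on two hand-picked samples that share both sums (12 and 62); the function name and the samples are illustrative assumptions.

```python
import math

def normal_log_likelihood(sample, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

a = [1.0, 5.0, 6.0]  # sum = 12, sum of squares = 62
b = [2.0, 3.0, 7.0]  # sum = 12, sum of squares = 62

for mu, sigma2 in [(0.0, 1.0), (4.0, 2.5), (-1.0, 10.0)]:
    # Same (sum, sum of squares) gives the same log-likelihood at every parameter value.
    print(mu, sigma2,
          normal_log_likelihood(a, mu, sigma2),
          normal_log_likelihood(b, mu, sigma2))
```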
Example 3. Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from a distribution with p.d.f.:

𝑓(𝑥, 𝜃) = 𝑒 −(𝑥−𝜃) , 𝜃 < 𝑥 < ∞; −∞ < 𝜃 < ∞
Obtain sufficient statistic for 𝜃.
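Solution. Here
\[
L = \prod_{i=1}^{n} f(x_i, \theta) = \prod_{i=1}^{n}\left\{e^{-(x_i - \theta)}\right\} = \exp\left(-\sum_{i=1}^{n} x_i\right) \times \exp(n\theta) \qquad (*)
\]
Let $Y_1, Y_2, \ldots, Y_n$ denote the order statistics of the random sample such that $Y_1 < Y_2 < \cdots < Y_n$.
The p.d.f. of the smallest observation $Y_1$ is given by:
\[
g_1(y_1, \theta) = n[1 - F(y_1)]^{n-1} f(y_1, \theta),
\]
where $F(\cdot)$ is the distribution function corresponding to the p.d.f. $f(\cdot)$.
Here $F(y_1) = 1 - e^{-(y_1 - \theta)}$, so that $g_1(y_1, \theta) = n\exp\{-n(y_1 - \theta)\}$, $\theta < y_1 < \infty$.
Thus the likelihood function $(*)$ of $X_1, X_2, \ldots, X_n$ may be expressed as
\[
L = e^{n\theta}\exp\left(-\sum_{i=1}^{n} x_i\right) = n\exp\{-n(y_1 - \theta)\}\left\{\frac{\exp\left(-\sum_{i=1}^{n} x_i\right)}{n\exp(-n y_1)}\right\}
= g_1(\min x_i, \theta)\left\{\frac{\exp\left(-\sum_{i=1}^{n} x_i\right)}{n\exp(-n y_1)}\right\}
\]
Hence by the Fisher-Neyman criterion, the first order statistic $Y_1 = \min(X_1, X_2, \ldots, X_n)$ is a
sufficient statistic for $\theta$.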
12.4 IN-TEXT QUESTIONS
Question: 1
Let $X_1, X_2, \ldots, X_N$ be identically distributed random variables with mean 2 and variance 1. Let $N$ be a
random variable that follows a Poisson distribution with mean 2 and is independent of the
$X_i$'s. Let $S_N = X_1 + X_2 + \cdots + X_N$; then $\mathrm{Var}(S_N)$ equals
A. 4
B. 10
C. 2
D. 1
Question: 2
Let $A$ and $B$ be independent random variables, each having the uniform distribution on $[0, 1]$.
Let $U = \min\{A, B\}$ and $V = \max\{A, B\}$; then $\mathrm{Cov}(U, V)$ equals
A. -1/36
B. 1/36
C. 1
D. 0
Question: 3
Let $X_1, X_2, X_3$ be a random sample from uniform $(0, \theta^2)$, $\theta > 1$. Then the maximum likelihood
estimator (MLE) of $\theta$ is
A. $X_{(1)}^2$
B. $\sqrt{X_{(3)}}$
C. $\sqrt{X_{(1)}}$
D. $\alpha X_{(1)} + (1 - \alpha) X_{(3)}$; $0 < \alpha < 1$
Question: 4
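For the discrete variate with density:
\[
f(x) = \frac{1}{8} I_{(-1)}(x) + \frac{6}{8} I_{(0)}(x) + \frac{1}{8} I_{(1)}(x).
\]
Which of the following is TRUE?
A. $E(X) = \frac{1}{2}$
B. $V(X) = \frac{1}{2}$
C. $P\{|X - \mu_x| \geq 2\sigma_x\} \leq \frac{1}{4}$
D. $P\{|X - \mu_x| \geq 2\sigma_x\} \geq \frac{1}{4}$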
Question: 5
Let $X_i, Y_i$ $(i = 1, 2)$ be an i.i.d. random sample of size 2 from a standard normal distribution. What is the
distribution of $W$, given by
\[
W = \frac{\sqrt{2}\,(X_1 + X_2)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}\ ?
\]
A. t-distribution with 1 d.f.
B. t-distribution with 2 d.f.
C. Chi-square distribution with 2 d.f.
D. Cannot be determined
Question: 6
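The moment generating function of a random variable $X$ is given by
\[
M_X(t) = \frac{1}{6} + \frac{1}{3}e^{t} + \frac{1}{3}e^{2t} + \frac{1}{6}e^{3t},\quad -\infty < t < \infty
\]
Then $P(X \leq 2)$ equals
A. $\frac{1}{3}$
B. $\frac{1}{6}$
C. $\frac{1}{2}$
D. $\frac{5}{6}$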
Question: 7
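Let $X_1, X_2, \ldots, X_n$ be a random sample from a
$U\left(\theta - \frac{1}{2}, \theta + \frac{1}{2}\right)$ distribution, where $\theta \in \mathbb{R}$. Let
$X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and
$X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. Define
\[
T_1 = \frac{1}{2}\left(X_{(1)} + X_{(n)}\right),\quad T_2 = \frac{1}{4}\left(3X_{(1)} + X_{(n)} + 1\right)
\quad\text{and}\quad T_3 = \frac{1}{2}\left(3X_{(n)} - X_{(1)} - 2\right)
\]
as estimators for $\theta$. Which of the following is/are
TRUE?
A. $T_1$ and $T_2$ are MLE for $\theta$ but $T_3$ is not MLE for $\theta$
B. $T_1$ is MLE for $\theta$ but $T_2$ and $T_3$ are not MLE for $\theta$
C. $T_1$, $T_2$ and $T_3$ are MLE for $\theta$
D. $T_1$, $T_2$ and $T_3$ are not MLE for $\theta$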

Question: 8
Let $X$ and $Y$ be random variables having joint probability density function
\[
f(x, y) = \frac{k}{(1 + x^2)(1 + y^2)};\quad -\infty < x, y < \infty
\]
where $k$ is a constant. Which of the following is/are TRUE?
A. $k = \frac{1}{\pi^2}$
B. $f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^2}$; $-\infty < x < \infty$
C. $P(X = Y) = 0$
Question : 9
Let $X_1, X_2, \ldots, X_n$ be a sequence of independently and identically distributed random variables with the
probability density function
\[
f(x) = \begin{cases} \frac{1}{2}x^2 e^{-x}, & \text{if } x > 0 \\ 0, & \text{otherwise} \end{cases}
\]
and let $S_n = X_1 + X_2 + \cdots + X_n$.
Then which of the following statements is/are TRUE?
A. $\dfrac{S_n - 3n}{\sqrt{3n}} \sim N(0, 1)$ for all $n \geq 1$
B. For all $\varepsilon > 0$, $P\left(\left|\dfrac{S_n}{n} - 3\right| > \varepsilon\right) \to 0$ as $n \to \infty$
C. $\dfrac{S_n}{n} \to 1$ with probability 1
D. Both A and B
Question : 10
Let $X$, $Y$ be i.i.d. Binomial $(n, p)$ random variables. Which of the following are true?
A. $X + Y \sim \mathrm{Bin}(2n, p)$
B. $(X, Y) \sim \mathrm{Multinomial}(2n; p, p)$
C. $\mathrm{Var}(X - Y) = E(X - Y)^2$
D. option A and C are correct.

Question: 11
Let $X$ and $Y$ be continuous random variables with the joint probability density function
\[
f(x, y) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x^2 + y^2)};\quad (x, y) \in \mathbb{R}^2
\]
Which of the following statements is/are TRUE?
A. $P(X > 0) = \frac{1}{2}$
B. $P(X > 0 \mid Y < 0) = \frac{1}{2}$
C. $P(X > 0, Y < 0) = \frac{1}{4}$
D. All of the above
Question: 12
Let $X$ and $Y$ be random variables with $E[X] = E[Y]$. Which of the following is NOT
TRUE?
A. $E\{E[X \mid Y]\} = E[Y]$
B. $V(X - Y) = E(X - Y)^2$
C. 𝐸[𝑉(𝑋 ∣ 𝑌)] + 𝑉[𝐸(𝑋 ∣ 𝑌)] = 𝑉(𝑋)
D. X and Y have same distribution
Question : 13
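Let $X_1, X_2, \ldots, X_n$ be a random sample from an $\mathrm{Exp}(\theta)$ distribution, where $\theta \in (0, \infty)$.
If $\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, then a 95% confidence interval for $\theta$ is
A. $\left(0, \dfrac{\chi^2_{2n,0.95}}{n\overline{X}}\right]$
B. $\left[\dfrac{\chi^2_{2n,0.95}}{n\overline{X}}, \infty\right)$
C. $\left(0, \dfrac{\chi^2_{2n,0.95}}{2n\overline{X}}\right]$
D. $\left[\dfrac{\chi^2_{2n,0.95}}{2n\overline{X}}, \infty\right)$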
Question: 14
Let $X_i$, $i = 1, 2, \ldots$ be independent random variables, all distributed according to the PDF $f_X(x) = 1$, $0 \leq x \leq 1$.
Define $Y_n = X_1 X_2 X_3 \ldots X_n$ for some integer $n$. Then $\mathrm{Var}(Y_n)$ is equal to
A. $\dfrac{n}{12}$
B. $\dfrac{1}{3^n} - \dfrac{1}{2^{2n}}$
C. $\dfrac{1}{12^n}$
D. $\dfrac{1}{12}$
Question : 15
Let $X_1, X_2, X_3, X_4$ be i.i.d. random variables having a continuous distribution.
Then $P(X_3 < X_2 < \max(X_1, X_4))$ equals
A. 1/2
B. 1/3
C. 1/4
D. 1/6
Question : 16
Let $X_1, X_2, \ldots, X_n$ be a random sample from a
$U\left(\theta - \frac{1}{2}, \theta + \frac{1}{2}\right)$ distribution, where $\theta \in \mathbb{R}$. Let
$X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}$ and
$X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$.
Consider the following statements:
1. $T_1 = \frac{1}{2}\left(X_{(1)} + X_{(n)}\right)$ is consistent for $\theta$
2. $T_2 = \frac{1}{4}\left(3X_{(1)} + X_{(n)} + 1\right)$ is unbiased and consistent for $\theta$
Select the correct answer using the code given below:
A. 1 only
B. 2 only
C. Both 1 and 2
D. Neither 1 nor 2
Question: 17
Let $F_n$ be a sequence of DFs defined by
\[
F_n(x) = \begin{cases} 0, & x < 0 \\ 1 - \dfrac{1}{n}, & 0 \leq x < n \\ 1, & n \leq x \end{cases}
\]
and let $\lim_{n \to \infty} F_n(x) = F(x)$.
Then which of the following is/are TRUE?
A. $F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}$
B. $E(X_n^k) \to E(X^k)$ for any $k \geq 1$
C. $F$ is the distribution function of the RV $X$ degenerate at $x = 0$
D. all of the above
Question : 18
Let $X_i$ $(i = 1, 2, \ldots, n)$ be a random sample drawn from the uniform distribution on $[\theta, \theta + 1]$.
Then which of the following statements is/are correct?
A. $\left(\overline{X} - \frac{1}{2}\right)$ is an unbiased estimate of $\theta$
B. $\dfrac{X_{(1)} + X_{(n)}}{2} - \dfrac{1}{2}$ is an unbiased estimate of $\theta$
C. $(X_{(1)}, X_{(n)})$ is jointly sufficient but not complete for $\theta$

D. all options are correct.
Question: 19
Let $\{X_n, n \geq 1\}$ be i.i.d. uniform $(-1, 2)$ random variables and let $S_n = \sum_{k=1}^{n} X_k$.
Then, as $n \to \infty$:
A. $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in probability
B. $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in distribution
C. $P(S_n \leq n) \to 1$ as $n \to \infty$
D. all options are correct

Question : 20
Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a uniform population with probability density function
\[
f(x, \theta) = 1;\quad \theta - \frac{1}{2} \leq x \leq \theta + \frac{1}{2},\quad -\infty < \theta < \infty
\]
Define $T_n = \dfrac{X_{(n)} + X_{(1)}}{2}$.
A. $T_n$ is consistent for $\theta$
B. $T_n$ is MLE for $\theta$
C. $T_n$ is unbiased and consistent for $\theta$
D. all options are correct

Question : 21
The cumulative distribution function of a random variable $X$ is given by
\[
F(x) = \begin{cases} 0, & \text{if } x < 0 \\ \dfrac{4}{9}, & \text{if } 0 \leq x < 1 \\ \dfrac{8}{9}, & \text{if } 1 \leq x < 2 \\ 1, & \text{if } x \geq 2 \end{cases}
\]
Which of the following statements is (are) TRUE?
A. The random variable $X$ takes positive probability at only two points
B. $P(1 \leq X \leq 2) = \frac{5}{9}$
C. $E(X) = \frac{21}{3}$
D. $P(0 < X < 1) = \frac{4}{9}$
Question: 22
Let $A$ and $B$ be events in a sample space $S$ such that
$P(A) = \frac{1}{2} = P(B)$ and $P(A^c \cap B^c) = \frac{1}{3}$. Which of the following is correct?
A. $P(A \cup B^c) = \frac{5}{6}$
B. $P(A \cup B^c) \leq \frac{5}{6}$
C. 𝑃(𝐴 ∩ 𝐵) ≥ min{𝑃(𝐴), 𝑃(𝐵)}
D. A and B are independent

Question: 23
Let $X$ and $Y$ be i.i.d. random variables having distribution function
\[
F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x),\quad -\infty < x < \infty
\]
Which of the following is NOT TRUE?
A. $f(x, y) = \dfrac{1}{\pi^2}\cdot\dfrac{1}{(1 + x^2)(1 + y^2)}$; $-\infty < x, y < \infty$
B. $f(x) = \dfrac{1}{\pi}\cdot\dfrac{1}{1 + x^2}$; $-\infty < x < \infty$
C. $\Phi_{X+Y}(t) = e^{-2i|t|}$

D. E(X) does not exist
Question : 24
Suppose 𝑟1.23 and 𝑟1.234 are sample multiple correlation coefficient of 𝑋1 on 𝑋2 , 𝑋3 and 𝑋1 on
𝑋2 , 𝑋3 , 𝑋4 respectively. Which of the following is possible?
A. 𝑟1.23 = −0.3, 𝑟1.234 = 0.7
B. 𝑟1.23 = −0.5, 𝑟1.234 = −0.7
C. 𝑟1.23 = 0.3, 𝑟1.234 = 0.7
D. 𝑟1.23 = 0.7, 𝑟1.234 = −0.3

Question : 25
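If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density
\[
f(x, \theta) = \begin{cases} \theta x^{\theta - 1}, & \text{if } 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}
\]
where $\theta > 0$ is an unknown parameter, what is a $100(1 - \alpha)\%$
confidence interval for $\theta$?
A. $\left[\dfrac{\chi^2_{\alpha/2}(2n)}{2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(2n)}{2\sum_{i=1}^{n}\ln X_i}\right]$
B. $\left[\dfrac{\chi^2_{\alpha/2}(n)}{-2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(n)}{-2\sum_{i=1}^{n}\ln X_i}\right]$
C. $\left[\dfrac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i}\right]$
D. $\left[\dfrac{\chi^2_{\alpha/2}(n)}{2\sum_{i=1}^{n}\ln X_i},\ \dfrac{\chi^2_{1-\alpha/2}(n)}{2\sum_{i=1}^{n}\ln X_i}\right]$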
Question : 26
Suppose that $r$ balls are drawn one at a time without replacement from a bag containing $n$ white
and $m$ black balls. Let $S_r$ be the number of black balls drawn; then $\mathrm{Var}(S_r)$
is equal to
A. $\dfrac{mnr}{(m+n)^2 (m+n+1)}(m + n - r)$
B. $\dfrac{mnr}{(m+n)^2 (m+n)}(m + n - r)$
C. $\dfrac{mnr}{(m+n)^2 (m+n-1)}(m + n - r)$
D. $\dfrac{mnr}{(m+n)^2 (m-n)}(m + n - r)$
Question: 27
Let $F_n$ be a sequence of DFs defined by
\[
F_n(x) = \begin{cases} 0, & x < 0 \\ 1 - \dfrac{1}{n}, & 0 \leq x < n \\ 1, & n \leq x \end{cases}
\]
and let $\lim_{n \to \infty} F_n(x) = F(x)$.
Then which of the following is NOT TRUE?
A. $F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}$
B. $E(X_n^k) \to E(X^k)$ for any $k \geq 1$
C. $X_n$ converges in probability to 0
D. $F$ is the distribution function of the RV $X$ degenerate at $x = 0$
Question : 28
The cumulative distribution function of a random variable $X$ is given by
\[
F(x) = \begin{cases} 0, & x < 2 \\ \dfrac{1}{10}\left(x^2 - \dfrac{7}{3}\right), & 2 \leq x < 3 \\ 1, & x \geq 3 \end{cases}
\]
Which of the following statements is (are) TRUE?
A. $F(x)$ is continuous everywhere
B. $F(x)$ increases only by jumps
C. $P(X = 2) = \frac{1}{16}$
D. $P\left(X = \frac{5}{2} \mid 2 \leq X \leq 3\right) = 0$
Question: 29
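Let $X$ and $Y$ be two independent standard normal random variables. Then the probability
density function of $Z = \dfrac{|X|}{|Y|}$ is
A. $f(z) = \begin{cases} \dfrac{\sqrt{1/2}}{\sqrt{\pi}}\,e^{-z/2} z^{-1/2}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$
B. $f(z) = \begin{cases} \dfrac{2}{\sqrt{\pi}}\,e^{-z^2/2}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$
C. $f(z) = \begin{cases} e^{-z}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$
D. $f(z) = \begin{cases} \dfrac{2}{\pi}\cdot\dfrac{1}{(1 + z^2)}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}$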

Question: 30
If the joint moment generating function of the random variables $X$ and $Y$ is
\[
M(s, t) = e^{\left(s + 3t + 2s^2 + 18t^2 + 12st\right)}
\]
Which of the following is/are correct?
A. 𝐸(𝑋) < 𝐸(𝑌)
B. Corr (𝑋, 𝑌) > 0
C. Cov (𝑋, 𝑌) = 12
D. all of the above
12.5 SUMMARY
The main points covered in this lesson are: what an estimator is; what consistency, efficiency
and sufficiency of an estimator mean; and how to obtain the best estimator.
12.6 GLOSSARY
Motivation: These problems are very useful in real life and can be applied in data science,
economics and the social sciences.
Attention: Think about how the best estimators are useful in real-world problems.
Answer 1 : B
Explanation:
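Let $X_1, X_2, \ldots$ be identically distributed random variables and let $N$ be a random variable
independent of the $X_i$'s.
Define $S_N = X_1 + X_2 + \cdots + X_N$.
Then $E(S_N) = E(X_i) \cdot E(N) = 2 \times 2 = 4$ and
$V(S_N) = E(N)\,\mathrm{Var}(X_i) + [E(X_i)]^2\,\mathrm{Var}(N) = 2 \times 1 + 4 \times 2 = 10$,
since a Poisson variable with mean 2 also has variance 2.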
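The two formulas used in this explanation can be wrapped in a small helper. This is a sketch only: the function name is made up, and it assumes (as the formula itself does) that the $X_i$ are mutually independent and independent of $N$.

```python
def random_sum_moments(mean_x, var_x, mean_n, var_n):
    """Mean and variance of S_N = X_1 + ... + X_N for i.i.d. X_i independent of N:
    E(S_N) = E(X) E(N),  Var(S_N) = E(N) Var(X) + E(X)^2 Var(N)."""
    mean_s = mean_x * mean_n
    var_s = mean_n * var_x + mean_x ** 2 * var_n
    return mean_s, var_s

# Question 1: E(X) = 2, Var(X) = 1, N ~ Poisson(2), so E(N) = Var(N) = 2.
print(random_sum_moments(2, 1, 2, 2))
```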
Answer 2 : B
Explanation:
Let $A$ and $B$ be independent random variables, each having the uniform distribution on $[0, 1]$.
Let $U = \min\{A, B\}$ and $V = \max\{A, B\}$.
Then $E(U) = 1/3$, $E(V) = 2/3$, and $UV = AB$, $U + V = A + B$.
Thus
\[
\mathrm{Cov}(U, V) = E(UV) - E(U)E(V) = E(AB) - E(U)E(V) = E(A) \cdot E(B) - E(U) \cdot E(V) = \frac{1}{4} - \frac{2}{9} = \frac{1}{36}
\]
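The value $1/36 \approx 0.0278$ can be sanity-checked by simulation. The following Monte Carlo sketch uses only the standard library; the function name, the number of draws and the seed are arbitrary illustrative choices.

```python
import random

def estimate_cov_min_max(n_draws=200_000, seed=0):
    """Monte Carlo estimate of Cov(min(A, B), max(A, B)) for independent
    A, B ~ U[0, 1], computed as mean(UV) - mean(U) * mean(V)."""
    rng = random.Random(seed)
    sum_u = sum_v = sum_uv = 0.0
    for _ in range(n_draws):
        a, b = rng.random(), rng.random()
        u, v = min(a, b), max(a, b)
        sum_u += u
        sum_v += v
        sum_uv += u * v
    return sum_uv / n_draws - (sum_u / n_draws) * (sum_v / n_draws)

# The estimate should be close to the exact value 1/36.
print(estimate_cov_min_max(), 1 / 36)
```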
Answer 3 : B
Explanation:
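\[
X_i \sim U(0, \theta^2),\quad f(x) = \frac{1}{\theta^2};\quad 0 < x_i < \theta^2
\]
\[
X_{(3)} \leq \theta^2 \Rightarrow \hat{\theta} \in \left[\sqrt{X_{(3)}}, \infty\right)
\]
\[
L(X, \theta) = \prod_{i=1}^{3} f(x_i, \theta) = \frac{1}{\theta^6}
\]
$\Rightarrow \dfrac{\partial L}{\partial \theta} < 0$; therefore the likelihood is decreasing in $\theta$ and is maximised at the smallest
admissible value, so $\hat{\theta} = \sqrt{X_{(3)}}$.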
Answer 4 : C
Explanation:
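x: -1, 0, 1
P(x): 1/8, 6/8, 1/8
\[
E(X) = -1 \times \frac{1}{8} + 0 \times \frac{6}{8} + 1 \times \frac{1}{8} = 0
\]
\[
E(X^2) = 1 \times \frac{1}{8} + 0 \times \frac{6}{8} + 1 \times \frac{1}{8} = \frac{1}{4}
\]
\[
V(X) = E(X^2) - \{E(X)\}^2 = \frac{1}{4} \Rightarrow \sigma_X = \frac{1}{2}
\]
\[
P\{|X - \mu_x| \geq 2\sigma_x\} = P\{|X| \geq 1\} = 1 - P(|X| < 1) = 1 - P(-1 < X < 1) = 1 - P(X = 0) = \frac{1}{4}
\]
$P\{|X - \mu_x| \geq 2\sigma_x\} \leq \frac{1}{4}$ [by Chebyshev's inequality]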
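The arithmetic in this explanation can be reproduced exactly with rational numbers. A minimal sketch (variable names are illustrative):

```python
from fractions import Fraction as F

# PMF of the discrete variate in Question 4.
pmf = {-1: F(1, 8), 0: F(6, 8), 1: F(1, 8)}

mean = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mean ** 2

# |X - mean| >= 2*sigma is equivalent to (X - mean)^2 >= 4*var,
# which avoids taking a square root.
tail = sum(p for x, p in pmf.items() if (x - mean) ** 2 >= 4 * var)

print(mean, var, tail)  # mean, variance, P(|X - mean| >= 2 sigma)
```

Here the tail probability equals the Chebyshev bound $1/k^2$ with $k = 2$, so the inequality holds with equality for this distribution.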
Answer 5 : B
Explanation:
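Let $X_i, Y_i$ $(i = 1, 2)$ be an i.i.d. random sample of size 2 from a standard normal
distribution. Then $(X_1 + X_2)/\sqrt{2} \sim N(0, 1)$, while $(X_2 - X_1)^2/2$ and $(Y_2 - Y_1)^2/2$ are
independent $\chi^2(1)$ variables, independent of $X_1 + X_2$, so their sum is $\chi^2(2)$. Hence
\[
W = \frac{\sqrt{2}\,(X_1 + X_2)}{\sqrt{(X_2 - X_1)^2 + (Y_2 - Y_1)^2}}
= \frac{(X_1 + X_2)/\sqrt{2}}{\sqrt{\frac{1}{2}\left[\frac{(X_2 - X_1)^2}{2} + \frac{(Y_2 - Y_1)^2}{2}\right]}} \sim t(2)
\]
Hence option (b) is correct.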

Answer 6 : D
Explanation:
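Let $X$ be a random variable with $M_X(t) = E(e^{tX}) = \sum_x e^{tx} P(X = x)$.
Then
\[
P(X = x) = \begin{cases} \dfrac{1}{6}; & x = 0 \\ \dfrac{1}{3}; & x = 1 \\ \dfrac{1}{3}; & x = 2 \\ \dfrac{1}{6}; & x = 3 \end{cases}
\]
\[
P(X \leq 2) = P(X = 0) + P(X = 1) + P(X = 2) = \frac{1}{6} + \frac{1}{3} + \frac{1}{3} = \frac{5}{6}
\]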
Answer 7 : A
Explanation:
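Let $X_1, X_2, \ldots, X_n$ be a random sample from $U\left(\theta - \frac{1}{2}, \theta + \frac{1}{2}\right)$:
\[
f(x) = 1;\quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}
\]
The likelihood is constant (free of $\theta$) on this region, so every
\[
\hat{\theta} \in \left[X_{(n)} - \frac{1}{2},\ X_{(1)} + \frac{1}{2}\right]
\]
is an MLE; in particular
\[
\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right);\quad 0 < \lambda < 1
\]
Taking $\lambda = \frac{1}{2}, \frac{1}{4}$ and $\frac{3}{4}$, we obtain the MLEs
$\frac{1}{2}\left(X_{(1)} + X_{(n)}\right)$; $\frac{1}{4}\left(3X_{(1)} + X_{(n)} + 1\right)$; $\frac{1}{4}\left(X_{(1)} + 3X_{(n)} - 1\right)$ respectively.
The first two are $T_1$ and $T_2$. On the other hand, $T_3 - \left(X_{(n)} - \frac{1}{2}\right) = \frac{1}{2}\left(X_{(n)} - X_{(1)} - 1\right) < 0$,
so $T_3$ lies outside the interval and is not an MLE.
Hence option (a) is correct.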

Answer 8 : D
Explanation:
Let $X$ and $Y$ be random variables having joint probability
density function $f(x, y) = \dfrac{k}{(1 + x^2)(1 + y^2)}$; $-\infty < x, y < \infty$.
\[
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1 \Rightarrow k = \frac{1}{\pi^2}
\]
Since $X$ and $Y$ are independent, $X \sim f(x) = \dfrac{1}{\pi}\cdot\dfrac{1}{1 + x^2}$; $-\infty < x < \infty$.
$P(X = Y) = 0$ (the line $X = Y$ has zero area in the plane, so a continuous joint
distribution assigns it zero probability).
Answer 9 : D
Explanation:
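Clearly, $X_1, X_2, \ldots, X_n$
are i.i.d. $G(3, 1)$ random variables. Then $E(X_i) = 3$ and $\mathrm{Var}(X_i) = 3$, $i = 1, 2, \ldots$
Let $S_n = X_1 + X_2 + \cdots + X_n$; then $E(S_n) = 3n$ and $\mathrm{Var}(S_n) = 3n$.
For option (a):
By the CLT, $\dfrac{S_n - 3n}{\sqrt{3n}}$ is asymptotically $N(0, 1)$, i.e. it converges in distribution to $N(0, 1)$ as $n \to \infty$.
For option (b):
\[
\lim_{n \to \infty} E\left(\frac{S_n}{n}\right) = \lim_{n \to \infty}\frac{3n}{n} = 3;\quad \lim_{n \to \infty} V\left(\frac{S_n}{n}\right) = \lim_{n \to \infty}\frac{3n}{n^2} = 0
\]
Using the sufficient condition for convergence in probability (consistency):
for all $\varepsilon > 0$, $P\left(\left|\dfrac{S_n}{n} - 3\right| > \varepsilon\right) \to 0$ as $n \to \infty$.
For option (c):
$\dfrac{S_n}{n} \to 3$ with probability 1 (by the strong law of large numbers), not 1.
For option (d):
\[
\lim_{n \to \infty} P\left(\frac{S_n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}} \geq \frac{3(n - \sqrt{n}) - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}}\right) = P(Z \geq -\sqrt{3}) = 1 - P(Z \leq -\sqrt{3}) = 1 - \Phi(-\sqrt{3}) \geq \frac{1}{2}
\]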
Answer 10 : D
Explanation:
(A) The sum of independent binomial variates is also a binomial variate if the corresponding
success probabilities are the same.
Then $X + Y \sim \mathrm{Bin}(2n, p)$.
(B) A multinomial distribution arises when each trial falls into one of several categories whose
counts add up to the number of trials; here $X + Y$ is not fixed at $2n$, so
$(X, Y)$ does not follow Multinomial $(2n; p, p)$.
(C) $\mathrm{Var}(X - Y) = E(X - Y)^2 - \{E(X - Y)\}^2 = E(X - Y)^2$, since $E(X - Y) = 0$.
(D) $\mathrm{Cov}(X + Y, X - Y) = V(X) - \mathrm{Cov}(X, Y) + \mathrm{Cov}(Y, X) - V(Y) = 0$
{since $X$ and $Y$ are independent, $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X) = 0$, and $V(X) = V(Y)$}.
Hence option D is correct.
Answer 11 : D
Explanation:
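The joint pdf of $X$ and $Y$ is
\[
f(x, y) = \frac{1}{2\pi}\,e^{-\frac{1}{2}(x^2 + y^2)} = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2} \times \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}y^2};\quad (x, y) \in \mathbb{R}^2
\]
It is easy to see that $X$ and $Y$ are i.i.d. $N(0, 1)$ random
variables, and therefore,
\[
P(X > 0) = \frac{1}{2}
\]
\[
P(X > 0, Y < 0) = P(X > 0)P(Y < 0) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}
\]
\[
P(X > 0 \mid Y < 0) = \frac{P(X > 0, Y < 0)}{P(Y < 0)} = \frac{1}{2}
\]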
Answer 12 : D
Explanation:
$E\{E[X \mid Y]\} = E[X] = E[Y]$ {given that $E[X] = E[Y]$}
$V(X - Y) = E(X - Y)^2 - \{E(X - Y)\}^2 = E(X - Y)^2$ {since $E[X - Y] = 0$}
$E[V(X \mid Y)] + V[E(X \mid Y)] = V(X)$
$X$ and $Y$ may or may not have the same distribution.
Hence option (D) is correct.

Answer 13 : C
Explanation:
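Let $X_1, X_2, \ldots, X_n$ be a random sample from an $\mathrm{Exp}(\theta)$ distribution, where $\theta \in (0, \infty)$.
Then $2\theta\sum_{i=1}^{n} X_i \sim \chi^2_{2n}$
\[
\Rightarrow P\left(0 < 2\theta\sum_{i=1}^{n} X_i \leq \chi^2_{2n,0.95}\right) = 0.95
\]
\[
0 < 2\theta\sum_{i=1}^{n} X_i \leq \chi^2_{2n,0.95} \Rightarrow \theta \in \left(0, \frac{\chi^2_{2n,0.95}}{2n\overline{x}}\right]
\]
Hence option C is correct.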
Answer 14 : B
Explanation:
Since $X_1, X_2, \ldots, X_n$ are independent, we have $E(Y_n) = E(X_1) \times \cdots \times E(X_n)$. Similarly,
$E(Y_n^2) = E(X_1^2) \times \cdots \times E(X_n^2)$. Since
$E(X_i) = 1/2$ and $E(X_i^2) = 1/3$ for $i = 1, 2, \ldots, n$,
it follows that
\[
\mathrm{Var}(Y_n) = E(Y_n^2) - [E(Y_n)]^2 = \frac{1}{3^n} - \frac{1}{2^{2n}}
\]
Hence option (B) is correct.
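The closed form for the variance of a product of independent $U(0, 1)$ variables can be tabulated exactly for small $n$. A short sketch (the function name is an illustrative assumption):

```python
from fractions import Fraction

def var_product_of_uniforms(n):
    """Var(X_1 * ... * X_n) for independent U(0, 1) variables:
    E(Y^2) - E(Y)^2 = (1/3)^n - (1/2)^(2n)."""
    return Fraction(1, 3) ** n - Fraction(1, 4) ** n

for n in (1, 2, 3):
    print(n, var_product_of_uniforms(n))
```

For $n = 1$ this reduces to the familiar variance $1/12$ of a single uniform variable.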
Answer 15 : C
Explanation:
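Note that $P(X_1 < X_2) + P(X_2 < X_1) + P(X_1 = X_2) = 1$,
since the corresponding events are disjoint and exhaust
all the possibilities. But $P(X_1 < X_2) = P(X_2 < X_1)$
by symmetry. Furthermore, $P(X_1 = X_2) = 0$
since the random variables are continuous. Therefore $P(X_1 < X_2) = \frac{1}{2}$, and likewise $P(X_3 < X_2) = \frac{1}{2}$.
The event $\{X_3 < X_2\}$ fails to satisfy $X_2 < \max(X_1, X_4)$ only when $X_2$ is the largest of the four
variables, which by symmetry has probability $\frac{1}{4}$. From the above results,
\[
P(X_3 < X_2 < \max(X_1, X_4)) = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}
\]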
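Because the variables are i.i.d. and continuous, all $4! = 24$ orderings of $(X_1, X_2, X_3, X_4)$ are equally likely, so the probability can be obtained by counting rank permutations. The sketch below does exactly that; it is an illustration of the symmetry argument, not a different method.

```python
from fractions import Fraction
from itertools import permutations

favourable = 0
total = 0
# Each permutation assigns the ranks 0..3 to (X1, X2, X3, X4); all are equally likely.
for x1, x2, x3, x4 in permutations(range(4)):
    total += 1
    if x3 < x2 < max(x1, x4):
        favourable += 1

prob = Fraction(favourable, total)
print(favourable, total, prob)
```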
Answer 16 : A
Explanation:
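\[
f(x) = 1;\quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}
\]
The likelihood is free of $\theta$ on this region, so every $\hat{\theta} \in \left[X_{(n)} - \frac{1}{2},\ X_{(1)} + \frac{1}{2}\right]$ is an MLE, in particular
\[
\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right);\quad 0 < \lambda < 1
\]
Taking $\lambda = \frac{1}{2}, \frac{1}{4}$ we get:
$T_1 = \frac{1}{2}\left(X_{(1)} + X_{(n)}\right)$ is MLE as well as consistent for $\theta$;
$T_2 = \frac{1}{4}\left(3X_{(1)} + X_{(n)} + 1\right)$
is MLE as well as consistent for $\theta$ but not unbiased, since $E(T_2) = \theta + \frac{1}{2(n + 1)}$.
Hence option (A) is correct.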
Answer 17 : A and C
Explanation:
\[
\lim_{n \to \infty} F_n(x) = F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}
\]
so A holds. Note that $F_n$ is the DF of the RV $X_n$ with PMF
\[
P(X_n = 0) = 1 - \frac{1}{n},\quad P(X_n = n) = \frac{1}{n}
\]
and $F$ is the distribution function of the RV $X$ degenerate at
$X = 0$, so C holds. We have $E(X_n^k) = n^k\left(\frac{1}{n}\right) = n^{k-1}$, where $k$
is a positive integer. Also $E(X^k) = 0$, so that
$E(X_n^k)$ does not converge to $E(X^k)$ for any $k \geq 1$; B is therefore false.
Since B fails, option (d), 'all of the above', does not hold either.
Answer 18 : D
Explanation:
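Given that
$X \sim U[\theta, \theta + 1] \Rightarrow f(x) = 1$; $\theta \leq X_i \leq \theta + 1$
For option (A):
\[
E\left(\overline{X} - \frac{1}{2}\right) = \frac{2\theta + 1}{2} - \frac{1}{2} = \theta
\]
so $\left(\overline{X} - \frac{1}{2}\right)$ is an unbiased estimate of $\theta$.
For option (B):
\[
E\left(\frac{X_{(1)} + X_{(n)}}{2} - \frac{1}{2}\right) = \frac{2\theta + 1}{2} - \frac{1}{2} = \theta
\]
so $\dfrac{X_{(1)} + X_{(n)}}{2} - \dfrac{1}{2}$ is an unbiased estimate of $\theta$.
For option (C):
The joint pdf of $X_1, X_2, \ldots, X_n$ is given by $f_\theta(X_1, X_2, \ldots, X_n) = 1 \cdot I_A$,
where $A = \{(x_1, x_2, \ldots, x_n);\ \theta \leq X_{(1)} \leq X_{(n)} \leq \theta + 1\}$, so
$(X_{(1)}, X_{(n)})$ is jointly sufficient for $\theta$.
To show that it is not complete, construct a non-zero function of $T = (X_{(1)}, X_{(n)})$ that does not involve $\theta$:
$g(T) = X_{(n)} - X_{(1)} - \dfrac{n - 1}{n + 1}$. Since $E(X_{(n)} - X_{(1)}) = \dfrac{n - 1}{n + 1}$ for every $\theta$,
\[
E[g(T)] = 0\quad \forall \theta
\]
By the definition of completeness, $T$ is therefore not a complete sufficient statistic for $\theta$.
Hence option (d) is correct.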

Answer 19 : D
Explanation:
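\[
\lim_{n \to \infty} E\left(\frac{S_n}{n}\right) = \frac{1}{2}\quad\text{and}\quad \lim_{n \to \infty} V\left(\frac{S_n}{n}\right) = \lim_{n \to \infty}\frac{3}{4n} = 0
\]
This implies $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in probability.
If $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in probability, then $\dfrac{S_n}{n} \to \dfrac{1}{2}$ in distribution.
Using the CLT (with $E(X_k) = \frac{1}{2}$ and $\mathrm{Var}(X_k) = \frac{3}{4}$):
\[
\frac{S_n - \frac{n}{2}}{\sqrt{\frac{3n}{4}}} \to N(0, 1)\ \text{in distribution as } n \to \infty
\]
\[
P(S_n \leq n) = P\left(\frac{S_n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}} \leq \frac{n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}}\right) \to 1\ \text{as } n \to \infty,
\]
because $\dfrac{n - E(S_n)}{\sqrt{\mathrm{Var}(S_n)}} = \sqrt{\dfrac{n}{3}} \to \infty$.
Hence option (d) is correct.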
Answer 20 : D
Explanation:
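Here
\[
L = L(\theta; X_1, X_2, \ldots, X_n) = \begin{cases} 1, & \theta - \frac{1}{2} \leq x_i \leq \theta + \frac{1}{2} \\ 0, & \text{otherwise} \end{cases}
\]
If $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is the ordered sample, then $\theta - \frac{1}{2} \leq x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)} \leq \theta + \frac{1}{2}$.
Thus $L$ attains its maximum if $\hat{\theta} \in \left[X_{(n)} - \frac{1}{2},\ X_{(1)} + \frac{1}{2}\right]$.
Therefore any convex linear combination of the end points is also an MLE for $\theta$:
\[
\hat{\theta} = \lambda\left(X_{(n)} - \frac{1}{2}\right) + (1 - \lambda)\left(X_{(1)} + \frac{1}{2}\right)
\]
Taking $\lambda = \frac{1}{2}$ gives $\hat{\theta} = T_n = \dfrac{X_{(n)} + X_{(1)}}{2}$, and also
\[
E\left(\frac{X_{(n)} + X_{(1)}}{2}\right) = \theta
\]
By the properties of the MLE, $T_n = \dfrac{X_{(n)} + X_{(1)}}{2}$ is consistent for $\theta$.
From the above, $T_n$ is unbiased, consistent and also an MLE for $\theta$.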
Answer 21 : B
Explanation:
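Since $F$ is continuous on $\mathbb{R} \setminus \{0, 1, 2\}$, it follows that $P(X = x) = 0\ \forall x \in \mathbb{R} \setminus \{0, 1, 2\}$.
Further, $P(X = 0) = F(0) - F(0-) = \frac{4}{9}$,
$P(X = 1) = F(1) - F(1-) = \frac{8}{9} - \frac{4}{9} = \frac{4}{9}$
and $P(X = 2) = F(2) - F(2-) = 1 - \frac{8}{9} = \frac{1}{9}$.
Therefore, we conclude that $X$ is a discrete random
variable with positive probabilities at the three points 0, 1, 2.
\[
P(1 \leq X \leq 2) = P(X = 1) + P(X = 2) = \frac{4}{9} + \frac{1}{9} = \frac{5}{9}
\]
\[
E(X) = \sum_{x=0}^{2} x\,P(X = x) = 0 \times \frac{4}{9} + 1 \times \frac{4}{9} + 2 \times \frac{1}{9} = \frac{2}{3}
\]
and $P(0 < X < 1) = 0$.
Hence option B is correct.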
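Reading a PMF off the jumps of a step-function CDF, as done in this explanation, is mechanical enough to script. A minimal sketch with exact fractions (the tuple layout and variable names are illustrative assumptions):

```python
from fractions import Fraction as F

# Each entry: (jump point x, left limit F(x-), value F(x)) for the CDF of Question 21.
jumps = [(0, F(0), F(4, 9)), (1, F(4, 9), F(8, 9)), (2, F(8, 9), F(1))]

# P(X = x) is the size of the jump of F at x.
pmf = {x: right - left for x, left, right in jumps}

mean = sum(x * p for x, p in pmf.items())
prob_1_to_2 = pmf[1] + pmf[2]

print(pmf, mean, prob_1_to_2)
```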

Answer 22 : A
Explanation:
Notice that $A \cup B^c = A \cup (A^c \cap B^c)$, a union of disjoint events.
Thus $P(A \cup B^c) = P(A) + P(A^c \cap B^c) = \frac{1}{2} + \frac{1}{3} = \frac{5}{6}$.
Answer 23 : C
Explanation:
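Let $X$ and $Y$ be i.i.d. random variables having distribution function
\[
F(x) = \frac{1}{2} + \frac{1}{\pi}\tan^{-1}(x),\quad -\infty < x < \infty
\]
\[
X \sim f(x) = \frac{1}{\pi}\cdot\frac{1}{1 + x^2};\quad -\infty < x < \infty
\]
\[
\Phi_X(t) = e^{-|t|};\quad \Phi_{X+Y}(t) = E\left(e^{it(X + Y)}\right) = \Phi_X(t)\,\Phi_Y(t) = e^{-2|t|}
\]
since $X$ and $Y$ are i.i.d. random variables. The exponent contains no factor $i$, so statement C is not true.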
Answer 24 : C
Explanation:
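A sample multiple correlation coefficient lies between 0 and 1,
\[
0 \leq r_{1.23 \ldots n} \leq 1,
\]
and it cannot decrease when a further explanatory variable is added, so $r_{1.234} \geq r_{1.23}$.
Option C is the only one that satisfies these conditions.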

Answer 25 : C
Explanation:
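We use the random variable
\[
Q = -2\theta\sum_{i=1}^{n}\ln X_i \sim \chi^2(2n)
\]
as the pivotal quantity. The $100(1 - \alpha)\%$ confidence interval
for $\theta$ can be constructed from
\[
1 - \alpha = P\left(\chi^2_{\alpha/2}(2n) \leq Q \leq \chi^2_{1-\alpha/2}(2n)\right)
= P\left[\frac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i} \leq \theta \leq \frac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i}\right]
\]
Thus, the $100(1 - \alpha)\%$ confidence interval for $\theta$ is given by
\[
\left[\frac{\chi^2_{\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i},\ \frac{\chi^2_{1-\alpha/2}(2n)}{-2\sum_{i=1}^{n}\ln X_i}\right]
\]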
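Answer 26 : C
Explanation:
Let us define
\[
X_k = \begin{cases} 1, & \text{if the } k\text{th ball drawn is black} \\ 0, & \text{if the } k\text{th ball drawn is white} \end{cases}\qquad k = 1, 2, \ldots, r
\]
Then $S_r = X_1 + X_2 + \cdots + X_r$.
Also, $P(X_k = 1) = \dfrac{m}{m + n}$ and $P(X_k = 0) = \dfrac{n}{m + n}$.
Thus $E(X_k) = \dfrac{m}{m + n}$ and $V(X_k) = \dfrac{mn}{(m + n)^2}$.
To compute $\mathrm{Cov}(X_j, X_k)$, $j \neq k$,
note that the random variable $X_j X_k = 1$ if
the $j$th and $k$th balls drawn are both black, and $= 0$ otherwise.
Thus $E(X_j X_k) = P(X_j = 1, X_k = 1) = \dfrac{m}{m + n}\cdot\dfrac{m - 1}{m + n - 1}$
and $\mathrm{Cov}(X_j, X_k) = E(X_j X_k) - E(X_j)E(X_k) = -\dfrac{mn}{(m + n)^2 (m + n - 1)}$.
Thus $E(S_r) = \sum_{k=1}^{r} E(X_k) = \dfrac{mr}{m + n}$ and
\[
V(S_r) = \sum_{k=1}^{r} V(X_k) + \sum_{j \neq k}\mathrm{Cov}(X_j, X_k)
= \frac{rmn}{(m + n)^2} - \frac{r(r - 1)mn}{(m + n)^2 (m + n - 1)}
= \frac{mnr}{(m + n)^2 (m + n - 1)}(m + n - r)
\]
Hence option C is correct.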
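As a numerical cross-check of the denominator in the variance formula, the exact variance of the number of black balls can be computed directly from the hypergeometric probabilities and compared with the standard closed form, which has $(m + n - 1)$ in the denominator. This is a sketch; the function names and the test cases are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def black_ball_variance(m, n, r):
    """Exact Var(S_r), where S_r is the number of black balls among r draws
    without replacement from a bag with m black and n white balls."""
    total = comb(m + n, r)
    pmf = {k: Fraction(comb(m, k) * comb(n, r - k), total)
           for k in range(max(0, r - n), min(m, r) + 1)}
    mean = sum(k * p for k, p in pmf.items())
    return sum(k * k * p for k, p in pmf.items()) - mean ** 2

def closed_form(m, n, r):
    """Standard hypergeometric variance: m n r (m + n - r) / ((m + n)^2 (m + n - 1))."""
    big_n = m + n
    return Fraction(m * n * r * (big_n - r), big_n ** 2 * (big_n - 1))

for m, n, r in [(3, 4, 2), (5, 2, 3), (4, 4, 4)]:
    print((m, n, r), black_ball_variance(m, n, r), closed_form(m, n, r))
```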
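Answer 27 : B
Explanation:
\[
\lim_{n \to \infty} F_n(x) = F(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases}
\]
Note that $F_n$ is the DF of the RV $X_n$ with PMF
\[
P(X_n = 0) = 1 - \frac{1}{n},\quad P(X_n = n) = \frac{1}{n}
\]
and $F$ is the distribution function of the RV $X$ degenerate at $X = 0$.
We have $E(X_n^k) = n^k\left(\frac{1}{n}\right) = n^{k-1}$,
where $k$ is a positive integer. Also $E(X^k) = 0$,
so that $E(X_n^k)$ does not converge to $E(X^k)$ for any $k \geq 1$; statement B is the one that is not true.
$X_n$ does converge in probability to 0, because for any $\varepsilon > 0$, $P(|X_n| > \varepsilon) \leq P(X_n = n) = \frac{1}{n} \to 0$.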

Answer 28 : D
Explanation:
We have $F(2) = \frac{1}{10}\left(4 - \frac{7}{3}\right) = \frac{1}{6}$ and
$F(2^-) = 0$. Since $F(2) \neq F(2^-)$, the function $F$
is not continuous at 2.
It is direct to see that $F$ is increasing in $x \in [2, 3)$
without any jump.
\[
P(X = 2) = F(2) - F(2^-) = \frac{1}{6}
\]
Since $F$ is continuous at $x = 5/2$, we have $P\left(X = \frac{5}{2}\right) = 0$, and therefore,
\[
P\left(X = \frac{5}{2} \mid 2 \leq X \leq 3\right) = \frac{P\left(X = \frac{5}{2},\ 2 \leq X \leq 3\right)}{P(2 \leq X \leq 3)} = \frac{P\left(X = \frac{5}{2}\right)}{P(2 \leq X \leq 3)} = 0
\]

Answer 29 : D
Explanation:
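We know that the ratio of two independent standard normal random variables has the Cauchy
distribution, and therefore the pdf of $U = \dfrac{X}{Y}$ is $f(u) = \dfrac{1}{\pi}\cdot\dfrac{1}{1 + u^2}$, $-\infty < u < \infty$.
Now, it is easy to verify that the pdf of $Z = \dfrac{|X|}{|Y|} = |U|$ is
\[
f(z) = \begin{cases} \dfrac{2}{\pi}\cdot\dfrac{1}{(1 + z^2)}, & \text{if } z > 0 \\ 0, & \text{otherwise} \end{cases}
\]
Answer 30 : D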
Explanation:
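\[
M(s, t) = e^{\left(s + 3t + 2s^2 + 18t^2 + 12st\right)}
\]
\[
E(Y) = \left[\frac{\partial M}{\partial t}\right]_{(0,0)} = 3;\quad E(X) = \left[\frac{\partial M}{\partial s}\right]_{(0,0)} = 1
\]
\[
E(XY) = \left[\frac{\partial^2 M}{\partial s\,\partial t}\right]_{(0,0)} = 15
\]
$E(X) < E(Y)$
$\mathrm{Cov}(X, Y) = E(XY) - E(X) \cdot E(Y) = 15 - 1 \times 3 = 12$
$\mathrm{Cov}(X, Y) > 0$; this implies $\mathrm{Corr}(X, Y) > 0$.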
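The partial derivatives of the joint MGF at the origin can be approximated with central finite differences, which gives a quick numerical check of the moments quoted in this explanation. This is only a sketch: the step size `h` is an arbitrary choice, and the results are approximations rather than exact values.

```python
import math

def mgf(s, t):
    """Joint MGF from Question 30."""
    return math.exp(s + 3 * t + 2 * s ** 2 + 18 * t ** 2 + 12 * s * t)

h = 1e-4  # finite-difference step (assumed; small enough for about 3 correct decimals)

# Central differences for the first partial derivatives at (0, 0).
e_x = (mgf(h, 0) - mgf(-h, 0)) / (2 * h)
e_y = (mgf(0, h) - mgf(0, -h)) / (2 * h)

# Central difference for the mixed second partial derivative at (0, 0).
e_xy = (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h * h)

cov = e_xy - e_x * e_y
print(round(e_x, 3), round(e_y, 3), round(e_xy, 3), round(cov, 3))
```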

12.8 REFERENCES
• S. C. Gupta and V. K. Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand
Publication, 11th Edition.
• B. L. Agarwal, Programmed Statistics, New Age International Publishers, 2nd Edition.
LESSON 13
STATISTICAL HYPOTHESIS
STRUCTURE
13.2 Introduction
13.3 Statistical Hypothesis
13.3.1 Simple and Composite Hypothesis
13.3.2 Critical Region
13.3.3 Type I and Type II Error
13.3.4 Most Powerful Test
13.3.5 Neyman-Pearson Lemma
13.5 Summary
13.6 Glossary
13.8 References
The main objective of this lesson is to discuss the testing of hypotheses and how it can be used in
real-world analysis.
13.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals
for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in situations
where there is a lack of certainty, on the basis of a sample whose size is fixed in advance, while
in Wald's sequential theory the sample size is not fixed but is regarded as a random variable.

13.3 STATISTICAL HYPOTHESIS

We will discuss the following topics in detail.
(i) Simple and Composite Hypothesis
(ii) Critical Region
(iii) Type I and Type II Error
(iv) Most Powerful Test
(v) Neyman-Pearson Lemma
13.3.1 Simple and Composite Hypothesis
A statistical hypothesis is some statement or assertion about a population or, equivalently, about
the probability distribution characterizing a population, which we want to verify on the
basis of information available from a sample. If the statistical hypothesis specifies the
population completely, then it is termed a simple statistical hypothesis; otherwise it is called
a composite statistical hypothesis.
For example, if 𝑋1 , 𝑋2 , … , 𝑋𝑛 is a random sample of size 𝑛 from a normal population with mean
𝜇 and variance 𝜎 2 , then the hypothesis 𝐻0 : 𝜇 = 𝜇0 , 𝜎 2 = 𝜎02 is a simple hypothesis, whereas
each of the following hypotheses is a composite hypothesis:
(i) 𝜇 = 𝜇0
(ii) 𝜎 2 = 𝜎02 ,
(iii) 𝜇 < 𝜇0 , 𝜎 2 = 𝜎02 ,
(iv) 𝜇 > 𝜇0 , 𝜎 2 = 𝜎02
(v) 𝜇 = 𝜇0 , 𝜎 2 < 𝜎02 ,
(vi) 𝜇 = 𝜇0 , 𝜎 2 > 𝜎02 ,
(vii) 𝜇 < 𝜇0 , 𝜎 2 > 𝜎02 .
A hypothesis which does not specify completely ' 𝑟 ' parameters of a population is termed as a
composite hypothesis with 𝑟 degrees of freedom.
Test of a Statistical Hypothesis.
A test of a statistical hypothesis is a two-action decision problem after the experimental sample
values have been obtained, the two actions being the acceptance or rejection of the hypothesis
under consideration.
Null Hypothesis.
In hypothesis testing, a statistician or decision-maker should not be motivated by prospects of
profit or loss resulting from the acceptance or rejection of the hypothesis. He should be
completely impartial and should have no brief for any party or company nor should he allow
his personal views to influence the decision. Much, therefore, depends upon how the hypothesis
is framed. For example, let us consider the 'light-bulbs' problem. Let us suppose that the bulbs
manufactured under some standard manufacturing process have an average life of 𝜇 hours and
it is proposed to test a new procedure for manufacturing light bulbs. Thus, we have two
populations of bulbs, those manufactured by standard process and those manufactured by the
new process. In this problem the following three hypotheses may be set up:
(i) New process is better than standard process.
(ii) New process is inferior to standard process.
(iii) There is no difference between the two processes.
The first two statements appear to be biased since they reflect a preferential attitude to one or
the other of the two processes. Hence the best course is to adopt the hypothesis of no difference,
as stated in (iii). This suggests that the statistician should take up the neutral or null attitude
regarding the outcome of the test. His attitude should be on the null or zero line in which the
experimental data has the due importance and complete say in the matter. This neutral or non-
committal attitude of the statistician or decision-maker before the sample observations are
taken is the keynote of the null hypothesis.
Thus in the above example of light bulbs if 𝜇0 is the mean life (in hours) of the bulbs
manufactured by the new process then the null hypothesis which is usually denoted by H0 , can
be stated as follows: 𝐻0 : 𝜇 = 𝜇0 .
As another example, let us suppose that two different concerns manufacture drugs for inducing
sleep, drug $A$ manufactured by the first concern and drug $B$ manufactured by the second concern. Each
company claims that its drug is superior to that of the other, and it is desired to test which is the
superior drug, $A$ or $B$. To formulate the statistical hypothesis, let $X$ be a random variable which
denotes the additional hours of sleep gained by an individual when drug $A$ is given and let the
random variable $Y$ denote the additional hours of sleep gained when drug $B$ is used. Let us
suppose that $X$ and $Y$ follow probability distributions with means $\mu_X$ and $\mu_Y$ respectively.
Here our null hypothesis would be that there is no difference between the effects of the two drugs.
Symbolically, $H_0 : \mu_X = \mu_Y$.
Alternative Hypothesis.
It is desirable to state what is called an alternative hypothesis in respect of every statistical
hypothesis being tested because the acceptance or rejection of null hypothesis is meaningful
only when it is being tested against a rival hypothesis which should rather be explicitly
mentioned. Alternative hypothesis is usually denoted by 𝐻1 . For example, in the example of
light bulbs, alternative hypothesis could be 𝐻1 : 𝜇 > 𝜇0 or 𝜇 < 𝜇0 or 𝜇 ≠ 𝜇0 . In the example of
drugs, the alternative hypothesis could be 𝐻1 : 𝜇𝑋 > 𝜇𝑌 or 𝜇𝑋 < 𝜇𝑌 or 𝜇𝑋 ≠ 𝜇𝑌 .
In both the cases, the first two of the alternative hypotheses give rise to what are called 'one
tailed' test and the third alternative hypothesis results in 'two tailed' tests.

Important Remarks
1. In the formulation of a testing problem and devising a 'test of hypothesis' the roles of 𝐻0
and 𝐻1 are not at all symmetric. In order to decide which one of the two hypotheses should
be taken as null hypothesis 𝐻0 and which one as alternative hypothesis 𝐻1 , the intrinsic
difference between the roles and the implications of these two terms should be clearly
understood.
2. If a particular problem cannot be stated as a test between two simple hypotheses, i.e., simple
null hypothesis against a simple alternative hypothesis, then the next best alternative is to
formulate the problem as the test of a simple null hypothesis against a composite alternative
hypothesis. In other words, one should try to structure the problem so that null hypothesis
is simple rather than composite.
3. Keeping in mind the potential losses due to wrong decisions (which may or may not be
measured in terms of money), the decision maker is somewhat conservative in holding the
null hypothesis as true unless there is a strong evidence from the experimental sample
observations that it is false. To him, the consequences of wrongly rejecting a null
hypothesis seem to be more severe than those of wrongly accepting it. In most of the cases,
the statistical hypothesis is in the form of a claim that a particular product or production process
is superior to some existing standard. The null hypothesis H0 in this case is that there is no
difference between the new product or production process and the existing standard. In
other words, null hypothesis nullifies this claim. The rejection of the null hypothesis
wrongly, which amounts to the wrongful acceptance of the claim, involves a huge amount of
out-of-pocket expense towards a substantive overhaul of the existing set-up. The resulting loss is
comparatively regarded as more serious than the opportunity loss in wrongly accepting H0
which amounts to wrongly rejecting the claim, i.e., in sticking to the less efficient existing
standard. In the light-bulbs problem discussed earlier, suppose the research division of the
concern, on the basis of the limited experimentation, claims that its brand is more effective
than that manufactured by standard process. If in fact, the brand fails to be more effective
the loss incurred by the concern due to an immediate obsolescence of the product, decline
of the concern's image, etc., will be quite serious. On the other hand, the failure to bring
out a superior brand in the market is an opportunity loss and is not a consideration to be as
serious as the other loss.
13.3.2 Critical Region
Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be the sample observations, denoted collectively by 𝑂. The set of all possible
values of 𝑂 constitutes a space, called the sample space, which is denoted by 𝑆.
Since the sample values 𝑥1 , 𝑥2 , … , 𝑥𝑛 can be taken as a point in 𝑛-dimensional space, we specify
some region of the 𝑛-dimensional space and see whether this point lies within this region or
outside this region. We divide the whole sample space 𝑆 into two disjoint parts 𝑊 and 𝑆 − 𝑊
(also written $\overline{W}$ or 𝑊′). The null hypothesis H0 is rejected if the observed sample point falls
in 𝑊; if it falls in 𝑊′ we reject 𝐻1 and accept H0. The region 𝑊 of the outcome set, in which H0
is rejected whenever the sample point falls in it, is called the critical region (or region of
rejection). The size of the critical region is 𝛼, the probability that the sample point falls in 𝑊
when H0 is true, i.e., the probability of committing a type I error (discussed below).
If the test is based on a sample of size 2, then the outcome set or the sample space is
the first quadrant in a two dimensional space and a test criterion will enable us to separate our
outcome set into two complementary subsets, W and 𝑊 ‾ . If the sample point falls in the subset
𝑊, 𝐻0 is rejected, otherwise 𝐻0 is accepted. This is shown in the adjoining diagram :
13.3.3 Type I and Type II Errors :

The decision to accept or reject the null hypothesis H0 is made on the basis of the information
supplied by the observed sample observations. The conclusion drawn on the basis of a
particular sample may not always be true in respect of the population. The four possible
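situations that arise in any test procedure are given in the following table.

True state        Decision: Reject H0        Decision: Accept H0
H0 true           Wrong (Type I error)       Correct
H0 false          Correct                    Wrong (Type II error)

From the above table it is obvious that in any testing problem we are liable to commit two types
of errors.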
Errors of Type I and Type II. The error of rejecting 𝐻0 (accepting 𝐻1 ) when 𝐻0 is true is called
Type I error and the error of accepting 𝐻0 when 𝐻0 is false (𝐻1 is true) is called Type II error.
The probabilities of type I and type II errors are denoted by 𝛼 and 𝛽 respectively.
Thus
𝛼 = Probability of type I error = Probability of rejecting 𝐻0 when 𝐻0 is true,
𝛽 = Probability of type II error = Probability of accepting 𝐻0 when 𝐻0 is false.
An ideal test would keep both types of error under control. Since the occurrence of an error
of either type is a random event, an ideal test should minimise the probabilities of both types
of error, viz., 𝛼 and 𝛽. Unfortunately, for a fixed sample size 𝑛, 𝛼 and 𝛽 are so related (like
producer's and consumer's risk in sampling inspection plans) that a reduction in one results
in an increase in the other. Consequently, the simultaneous minimisation of both errors is not
possible. Since type I error is deemed to be more serious than type II error (cf. Remark 3
above), the usual practice is to control 𝛼 at a predetermined low level and, subject to this
constraint on the probability of type I error, to choose a test which minimises 𝛽 or,
equivalently, maximises the power 1 − 𝛽. Generally, we choose 𝛼 = 0.05 or 0.01.
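The trade-off between 𝛼 and 𝛽 for a fixed sample size can be seen numerically. The sketch below is only an illustration under assumptions that are not part of the text: a normal population with known 𝜎 = 1, 𝑛 = 25, 𝐻0 : 𝜇 = 0 against 𝐻1 : 𝜇 = 0.5, and a right-tailed critical region of the form (sample mean > c). The function name error_probabilities is hypothetical.

```python
from statistics import NormalDist

def error_probabilities(c, mu0=0.0, mu1=0.5, sigma=1.0, n=25):
    """Return (alpha, beta) for the critical region {sample mean > c}.

    All default values are illustrative assumptions, not data from the text.
    """
    se = sigma / n ** 0.5                      # standard error of the sample mean
    alpha = 1 - NormalDist(mu0, se).cdf(c)     # P(reject H0 | H0 true)
    beta = NormalDist(mu1, se).cdf(c)          # P(accept H0 | H1 true)
    return alpha, beta

# Moving the cut-off c to the right lowers alpha but raises beta (n is fixed).
tradeoff = [(c, *error_probabilities(c)) for c in (0.2, 0.3, 0.4)]
for c, alpha, beta in tradeoff:
    print(f"c = {c:.1f}  alpha = {alpha:.4f}  beta = {beta:.4f}")
```

With these assumed numbers, each shift of the cut-off that reduces 𝛼 increases 𝛽, which is why 𝛼 is fixed first and 𝛽 is then minimised.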
STEPS IN SOLVING TESTING OF HYPOTHESIS PROBLEM
The major steps involved in the solution of a 'testing of hypothesis' problem may be outlined
as follows:
1) Explicit knowledge of the nature of the population distribution and the parameter(s) of
interest, i.e., the parameter(s) about which the hypotheses are set up.
2) Setting up of the null hypothesis 𝐻0 and the alternative hypothesis 𝐻1 in terms of the range
of the parameter values each one embodies.
3) The choice of a suitable statistic 𝑡 = 𝑡(𝑥1 , 𝑥2 , … . , 𝑥𝑛 ) called the test statistic, which will
best reflect upon the probability of 𝐻0 and 𝐻1 .
4) Partitioning the set of possible values of the test statistic 𝑡 into two disjoint sets 𝑊 (called
the rejection region or critical region) and 𝑊 ‾ (called the acceptance region) and framing
the following test :
(i) Reject 𝐻0 (i.e., accept 𝐻1 ) if the value of 𝑡 falls in 𝑊.
(ii) Accept 𝐻0 if the value of 𝑡 falls in $\overline{W}$.
5) After framing the above test, obtain experimental sample observations, compute the
appropriate test statistic and take action accordingly.
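The steps above can be sketched in code for one simple case. This is a minimal illustration, assuming a normal population with known 𝜎 and the one-sided problem 𝐻0 : 𝜇 = 𝜇0 against 𝐻1 : 𝜇 > 𝜇0; the function name right_tailed_z_test and the sample values are made up for the example.

```python
from statistics import NormalDist, mean

def right_tailed_z_test(sample, mu0, sigma, alpha=0.05):
    """Steps 3 to 5 for H0: mu = mu0 against H1: mu > mu0 (sigma assumed known)."""
    n = len(sample)
    # Step 3: test statistic
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    # Step 4: critical region W = {z > z_alpha}, acceptance region otherwise
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Step 5: decision based on the observed sample
    decision = "reject H0" if z > z_alpha else "accept H0"
    return z, z_alpha, decision

# Illustrative (made-up) observations
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 51.5, 50.3, 52.9]
z, z_alpha, decision = right_tailed_z_test(sample, mu0=50.0, sigma=1.5)
print(round(z, 4), round(z_alpha, 4), decision)
```

Steps 1 and 2 (the population model and the two hypotheses) are encoded here in the choice of the function itself.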
OPTIMUM TEST UNDER DIFFERENT SITUATIONS
The above discussion enables us to obtain the so-called best test under different situations. In
any testing problem the first two steps, viz., the form of the population distribution and the
parameter(s) of interest, and the framing of H0 and 𝐻1, should be obvious from the description
of the problem. The most crucial step is the choice of the 'best test', i.e., the best statistic 𝑡
and the critical region 𝑊, where by best test we mean one which, in addition to controlling 𝛼
at any desired low level, has the minimum type II error 𝛽 or maximum power 1 − 𝛽 compared
to all other tests having this 𝛼. This leads to the following definition.
13.3.4 Most Powerful Test
Most Powerful Test (MP Test). Let us consider the problem of testing a simple hypothesis:
𝐻0 : 𝜃 = 𝜃0 against a simple alternative hypothesis : 𝐻1 : 𝜃 = 𝜃1
Definition. The critical region 𝑊 is the most powerful (MP) critical region of size 𝛼 (and the
corresponding test a most powerful test of level 𝛼) for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1
if
$$P(x \in W \mid H_0) = \int_W L_0 \, dx = \alpha$$
and
$$P(x \in W \mid H_1) \ge P(x \in W_1 \mid H_1)$$
for every other critical region 𝑊1 satisfying the size condition above.
Uniformly Most Powerful Test (UMP Test).
Let us now take up the case of testing a simple null hypothesis against a composite alternative
hypothesis, e.g., of testing 𝐻0 : 𝜃 = 𝜃0 against the alternative 𝐻1 : 𝜃 ≠ 𝜃0.
In such a case, for a predetermined 𝛼, the best test for 𝐻0 is called the uniformly most
powerful test of level 𝛼.
Definition. The region 𝑊 is called the uniformly most powerful (UMP) critical region of size 𝛼
[and the corresponding test the uniformly most powerful (UMP) test of level 𝛼] for testing
𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 ≠ 𝜃0, i.e., 𝐻1 : 𝜃 = 𝜃1 ≠ 𝜃0, if
$$P(x \in W \mid H_0) = \int_W L_0 \, dx = \alpha$$
and
$$P(x \in W \mid H_1) \ge P(x \in W_1 \mid H_1) \quad \text{for all } \theta \ne \theta_0,$$
whatever the region 𝑊1 satisfying the size condition may be.
13.3.5 Neyman-Pearson Lemma
This lemma provides the most powerful test of a simple hypothesis against a simple alternative
hypothesis. The theorem, known as the Neyman-Pearson Lemma, will be proved for the density
function 𝑓(𝑥, 𝜃) of a single continuous variate and a single parameter. However, by regarding
𝑥 and 𝜃 as vectors, the proof can easily be generalised to any number of random variables
𝑥1 , 𝑥2 , … , 𝑥𝑛 and any number of parameters 𝜃1 , 𝜃2 , … , 𝜃k . The variables 𝑥1 , 𝑥2 , … , 𝑥𝑛 occurring
in this theorem are understood to represent a random sample of size 𝑛 from the population
whose density function is 𝑓(𝑥, 𝜃). The lemma is concerned with a simple hypothesis 𝐻0 : 𝜃 =
𝜃0 and a simple alternative 𝐻1 : 𝜃 = 𝜃1 .
Neyman-Pearson Lemma
Let 𝑘 > 0 be a constant and 𝑊 be a critical region of size 𝛼 such that
$$W = \left\{ x \in S : \frac{f(x, \theta_1)}{f(x, \theta_0)} > k \right\} \;\Rightarrow\; W = \left\{ x \in S : \frac{L_1}{L_0} > k \right\}$$
and
$$\overline{W} = \left\{ x \in S : \frac{L_1}{L_0} \le k \right\},$$
where 𝐿0 and 𝐿1 are the likelihood functions of the sample observations 𝑥 = (𝑥1 , 𝑥2 , … , 𝑥𝑛 )
under 𝐻0 and 𝐻1 respectively. Then 𝑊 is the most powerful critical region for testing the
hypothesis 𝐻0 : 𝜃 = 𝜃0 against the alternative 𝐻1 : 𝜃 = 𝜃1 .
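Proof. We are given
$$P(x \in W \mid H_0) = \int_W L_0 \, dx = \alpha.$$
The power of the region is
$$P(x \in W \mid H_1) = \int_W L_1 \, dx = 1 - \beta \quad \text{(say)}.$$
In order to establish the lemma, we have to prove that there exists no other critical region, of
size less than or equal to 𝛼, which is more powerful than 𝑊.
Let 𝑊1 be another critical region of size 𝛼1 ≤ 𝛼 and power 1 − 𝛽1 so that we have
$$P(x \in W_1 \mid H_0) = \int_{W_1} L_0 \, dx = \alpha_1 \quad \text{and} \quad P(x \in W_1 \mid H_1) = \int_{W_1} L_1 \, dx = 1 - \beta_1.$$
Now we have to prove that 1 − 𝛽 ≥ 1 − 𝛽1.
Let 𝑊 = 𝐴 ∪ 𝐶 and 𝑊1 = 𝐵 ∪ 𝐶, where 𝐶 = 𝑊 ∩ 𝑊1 (possibly empty) and 𝐴, 𝐵 are disjoint from 𝐶.
Since 𝛼1 ≤ 𝛼, we have
$$\int_{W_1} L_0 \, dx \le \int_W L_0 \, dx \;\Rightarrow\; \int_{B \cup C} L_0 \, dx \le \int_{A \cup C} L_0 \, dx \;\Rightarrow\; \int_B L_0 \, dx \le \int_A L_0 \, dx.$$
Since 𝐴 ⊂ 𝑊, the definition of 𝑊 gives 𝐿1 > 𝑘𝐿0 on 𝐴, so that
$$\int_A L_1 \, dx \ge k \int_A L_0 \, dx \ge k \int_B L_0 \, dx.$$
The definition of $\overline{W}$ also implies
$$\frac{L_1}{L_0} \le k \;\; \forall x \in \overline{W} \;\Rightarrow\; \int_{\overline{W}} L_1 \, dx \le k \int_{\overline{W}} L_0 \, dx.$$
This result also holds for any subset of $\overline{W}$, say $\overline{W} \cap W_1 = B$. Hence
$$\int_B L_1 \, dx \le k \int_B L_0 \, dx \le \int_A L_1 \, dx.$$
Adding $\int_C L_1 \, dx$ to both sides, we get
$$\int_{W_1} L_1 \, dx \le \int_W L_1 \, dx \;\Rightarrow\; 1 - \beta \ge 1 - \beta_1.$$
Hence the Lemma.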

Remark. Let W, defined as in the above lemma, be the most powerful critical region
of size 𝛼 for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 , and let it be independent of 𝜃1 ∈ Θ1 =
Θ − Θ0 , where Θ0 is the parameter space under 𝐻0 . Then we say that the critical region W is
the UMP critical region of size 𝛼 for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 ∈ Θ1 .
Example 1. Given the frequency function:
$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{elsewhere} \end{cases}$$
and that you are testing the null hypothesis 𝐻0 : 𝜃 = 1 against 𝐻1 : 𝜃 = 2 by means of a single
observed value of 𝑥, what would be the sizes of the type I and type II errors if you choose the
interval (i) 0.5 ≤ 𝑥, (ii) 1 ≤ 𝑥 ≤ 1.5 as the critical regions? Also obtain the power function of
the test.
Solution. Here we want to test 𝐻0 : 𝜃 = 1 against 𝐻1 : 𝜃 = 2.
(i) Here $W = \{x : x \ge 0.5\}$ and $\overline{W} = \{x : x < 0.5\}$.
$$\alpha = P\{x \in W \mid H_0\} = P\{x \ge 0.5 \mid \theta = 1\} = P\{0.5 \le x \le 1 \mid \theta = 1\} = \int_{0.5}^{1} [f(x, \theta)]_{\theta = 1} \, dx = \int_{0.5}^{1} 1 \, dx = 0.5$$
Similarly,
$$\beta = P\{x \in \overline{W} \mid H_1\} = P\{x < 0.5 \mid \theta = 2\} = \int_{0}^{0.5} [f(x, \theta)]_{\theta = 2} \, dx = \int_{0}^{0.5} \frac{1}{2} \, dx = 0.25$$
Thus the sizes of type I and type II errors are respectively 𝛼 = 0.5 and 𝛽 = 0.25,
and power function of the test = 1 − 𝛽 = 0.75.
(ii) $W = \{x : 1 \le x \le 1.5\}$
$$\alpha = P\{x \in W \mid \theta = 1\} = \int_{1}^{1.5} [f(x, \theta)]_{\theta = 1} \, dx = 0,$$
since under 𝐻0 : 𝜃 = 1, 𝑓(𝑥, 𝜃) = 0 for 1 < 𝑥 ≤ 1.5.
$$\beta = P\{x \in \overline{W} \mid \theta = 2\} = 1 - P\{x \in W \mid \theta = 2\} = 1 - \int_{1}^{1.5} [f(x, \theta)]_{\theta = 2} \, dx = 1 - \left[ \frac{x}{2} \right]_{1}^{1.5} = 0.75$$
∴ Power Function = 1 − 𝛽 = 1 − 0.75 = 0.25
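The four error sizes in Example 1 can be checked with exact fractions. The snippet below is a small sketch; the helper name uniform_prob is hypothetical and simply returns the probability that a uniform variate on [0, 𝜃] falls in a given interval.

```python
from fractions import Fraction

def uniform_prob(a, b, theta):
    """P(a <= X <= b) when X is uniform on [0, theta]."""
    lo, hi = max(a, 0), min(b, theta)
    return Fraction(max(hi - lo, 0)) / theta

half = Fraction(1, 2)

# (i) W = {x >= 0.5}; under H0 the variate cannot exceed theta = 1
alpha_i = uniform_prob(half, 1, 1)
beta_i = uniform_prob(0, half, 2)          # acceptance region {x < 0.5} under H1

# (ii) W = {1 <= x <= 1.5}
alpha_ii = uniform_prob(1, Fraction(3, 2), 1)
beta_ii = 1 - uniform_prob(1, Fraction(3, 2), 2)

print(alpha_i, beta_i, alpha_ii, beta_ii)
```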
Example 2. If 𝑥 ≥ 1 is the critical region for testing 𝐻0 : 𝜃 = 2 against the alternative 𝜃 = 1,
on the basis of a single observation from the population $f(x, \theta) = \theta \exp(-\theta x),\ 0 \le x < \infty$,
obtain the values of type I and type II errors.
Solution. Here $W = \{x : x \ge 1\}$ and $\overline{W} = \{x : x < 1\}$, and 𝐻0 : 𝜃 = 2, 𝐻1 : 𝜃 = 1.
$$\alpha = \text{Size of type I error} = P[x \in W \mid H_0] = P[x \ge 1 \mid \theta = 2] = \int_{1}^{\infty} [f(x, \theta)]_{\theta = 2} \, dx = 2 \int_{1}^{\infty} e^{-2x} \, dx = 2 \left[ \frac{e^{-2x}}{-2} \right]_{1}^{\infty} = \frac{1}{e^{2}}$$
$$\beta = \text{Size of type II error} = P[x \in \overline{W} \mid H_1] = P[x < 1 \mid \theta = 1] = \int_{0}^{1} e^{-x} \, dx = \left[ \frac{e^{-x}}{-1} \right]_{0}^{1} = 1 - e^{-1} = \frac{e - 1}{e}.$$
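A quick numerical check of Example 2 is sketched below. It uses the fact that for the density 𝜃 exp(−𝜃𝑥) the upper-tail probability is P(X ≥ x) = exp(−𝜃x); the helper name exp_tail is hypothetical.

```python
import math

def exp_tail(x, theta):
    """P(X >= x) for the density theta * exp(-theta * x), x >= 0."""
    return math.exp(-theta * x)

alpha = exp_tail(1, theta=2)       # P(x >= 1 | theta = 2) = e^(-2)
beta = 1 - exp_tail(1, theta=1)    # P(x < 1 | theta = 1) = 1 - e^(-1)

print(round(alpha, 4), round(beta, 4))
```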
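Example 3. Let 𝑝 be the probability that a coin will fall head in a single toss. In order to test
$H_0 : p = \frac{1}{2}$ against $H_1 : p = \frac{3}{4}$, the coin is tossed 5 times and 𝐻0 is rejected if more than 3 heads
are obtained. Find the probability of type I error and power of the test.
Solution. Here $H_0 : p = \frac{1}{2}$ and $H_1 : p = \frac{3}{4}$.
If the r.v. X denotes the number of heads in 𝑛 tosses of a coin then 𝑋 ∼ 𝐵(𝑛, 𝑝) so that
$$P(X = x) = {}^{n}C_x \, p^x (1 - p)^{n - x} = {}^{5}C_x \, p^x (1 - p)^{5 - x} \quad (*)$$
The critical region is given by: $W = \{x : x \ge 4\} \Rightarrow \overline{W} = \{x : x \le 3\}$.
$$\alpha = \text{Probability of type I error} = P[X \ge 4 \mid H_0] = P\left(X = 4 \mid p = \tfrac{1}{2}\right) + P\left(X = 5 \mid p = \tfrac{1}{2}\right)$$
$$= {}^{5}C_4 \left(\tfrac{1}{2}\right)^{4} \left(\tfrac{1}{2}\right)^{5 - 4} + {}^{5}C_5 \left(\tfrac{1}{2}\right)^{5} \quad [\text{From } (*)] \quad = 5 \left(\tfrac{1}{2}\right)^{5} + \left(\tfrac{1}{2}\right)^{5} = 6 \left(\tfrac{1}{2}\right)^{5} = \frac{3}{16}$$
$$\beta = \text{Probability of type II error} = P(x \in \overline{W} \mid H_1) = 1 - P(x \in W \mid H_1) = 1 - \left[ P\left(X = 4 \mid p = \tfrac{3}{4}\right) + P\left(X = 5 \mid p = \tfrac{3}{4}\right) \right]$$
$$= 1 - \left\{ {}^{5}C_4 \left(\tfrac{3}{4}\right)^{4} \left(\tfrac{1}{4}\right) + {}^{5}C_5 \left(\tfrac{3}{4}\right)^{5} \right\} = 1 - \left(\tfrac{3}{4}\right)^{4} \left\{ \tfrac{5}{4} + \tfrac{3}{4} \right\} = 1 - \frac{81}{128} = \frac{47}{128}$$
∴ Power of the test = 1 − 𝛽 = $\frac{81}{128}$.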
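The binomial tail probabilities of Example 3 can be reproduced exactly. The following is a minimal sketch; binom_tail is a hypothetical helper that sums the binomial probabilities from k up to n.

```python
from fractions import Fraction
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p), computed exactly when p is a Fraction."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

alpha = binom_tail(5, Fraction(1, 2), 4)   # reject H0 when 4 or 5 heads appear
power = binom_tail(5, Fraction(3, 4), 4)   # same region evaluated under H1
beta = 1 - power

print(alpha, beta, power)
```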
Example 4. Let 𝑋 ∼ 𝑁(𝜇, 4), 𝜇 unknown. To test 𝐻0 : 𝜇 = −1 against 𝐻1 : 𝜇 = 1, based on a
sample of size 10 from this population, we use the critical region: $x_1 + 2x_2 + \dots + 10x_{10} \ge 0$.
What is its size? What is the power of the test?
Solution. Critical region $W = \{x : x_1 + 2x_2 + \dots + 10x_{10} \ge 0\}$.
Let $U = x_1 + 2x_2 + \dots + 10x_{10}$.
Since the $x_i$ are i.i.d. 𝑁(𝜇, 4),
$$U \sim N\left[(1 + 2 + \dots + 10)\mu,\ (1^2 + 2^2 + \dots + 10^2)\sigma^2\right] = N(55\mu, 385\sigma^2) = N(55\mu, 385 \times 4) = N(55\mu, 1540) \quad (*)$$
The size 𝛼 of the critical region is:
$$\alpha = P(x \in W \mid H_0) = P(U \ge 0 \mid H_0) \quad (**)$$
Under 𝐻0 : 𝜇 = −1, 𝑈 ∼ 𝑁(−55, 1540) [from (∗)], so that
$$Z = \frac{U - E(U)}{\sigma_U} = \frac{U + 55}{\sqrt{1540}} \sim N(0, 1).$$
∴ Under 𝐻0, when 𝑈 = 0, $Z = \dfrac{55}{\sqrt{1540}} = \dfrac{55}{39.2428} = 1.4015$, and from (∗∗)
$$\alpha = P(Z \ge 1.4015) = 0.5 - P(0 \le Z \le 1.4015) = 0.5 - 0.4192 = 0.0808 \quad \text{(from Normal Probability Tables)}$$
Alternatively, 𝛼 = 1 − 𝑃(𝑍 ≤ 1.4015) = 1 − Φ(1.4015),
where Φ(⋅) is the distribution function of the standard normal variate.
Power of the test is: 1 − 𝛽 = 𝑃(𝑥 ∈ 𝑊 ∣ 𝐻1 ) = 𝑃(𝑈 ≥ 0 ∣ 𝐻1 )
Under 𝐻1 : 𝜇 = 1, 𝑈 ∼ 𝑁(55, 1540)
$$\Rightarrow Z = \frac{U - E(U)}{\sigma_U} = \frac{-55}{\sqrt{1540}} = -1.40 \quad (\text{when } U = 0)$$
∴ 1 − 𝛽 = 𝑃(𝑍 ≥ −1.40) = 𝑃(−1.4 ≤ 𝑍 ≤ 0) + 0.5
= 𝑃(0 ≤ 𝑍 ≤ 1.4) + 0.5 (By Symmetry)
= 0.4192 + 0.5 = 0.9192
Alternatively, 1 − 𝛽 = 1 − 𝑃(𝑍 ≤ −1.40) = 1 − Φ(−1.40),
where Φ(⋅) is the distribution function of the standard normal variate.
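The size and power in Example 4 can be recomputed without tables. This sketch uses the standard library class statistics.NormalDist for the normal distribution function; the variable names are illustrative.

```python
from statistics import NormalDist

weights = range(1, 11)                       # coefficients 1, 2, ..., 10
sigma2 = 4                                   # population variance
mean_coef = sum(weights)                     # 55, so E(U) = 55 * mu
var_u = sum(w * w for w in weights) * sigma2 # 385 * 4 = 1540
sd_u = var_u ** 0.5

# Size: P(U >= 0) when mu = -1; power: P(U >= 0) when mu = +1
alpha = 1 - NormalDist(-1 * mean_coef, sd_u).cdf(0)
power = 1 - NormalDist(+1 * mean_coef, sd_u).cdf(0)

print(round(alpha, 4), round(power, 4))
```

The values agree with the table-based figures to about three decimal places; the small difference comes from rounding 1.4015 to 1.40 when reading the tables.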
Example 5. Let X have a p.d.f. of the form:
$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta} e^{-x/\theta}, & 0 < x < \infty,\ \theta > 0 \\ 0, & \text{elsewhere.} \end{cases}$$
To test 𝐻0 : 𝜃 = 2 against 𝐻1 : 𝜃 = 1, use the random sample 𝑥1 , 𝑥2 of size 2 and define a
critical region: $W = \{(x_1, x_2) : 9.5 \le x_1 + x_2\}$.
Find: (i) Power of the test.
(ii) Significance level of the test.
Solution. We are given the critical region:
$$W = \{(x_1, x_2) : 9.5 \le x_1 + x_2\} = \{(x_1, x_2) : x_1 + x_2 \ge 9.5\}$$
Size of the critical region, i.e., the significance level of the test, is given by:
$$\alpha = P(x \in W \mid H_0) = P[x_1 + x_2 \ge 9.5 \mid H_0]$$
In sampling from the given exponential distribution,
$$\frac{2}{\theta} \sum_{i=1}^{n} x_i \sim \chi^2(2n) \;\Rightarrow\; U = \frac{2}{\theta}(x_1 + x_2) \sim \chi^2(4), \quad (n = 2) \quad (*)$$
$$\therefore \alpha = P\left[ \frac{2}{\theta}(x_1 + x_2) \ge \frac{2}{\theta} \times 9.5 \,\Big|\, H_0 \right] \quad [\text{From } (*)]$$
$$= P[\chi^2(4) \ge 9.5] \quad (\because \text{Under } H_0,\ \theta = 2)$$
𝛼 = 0.05 (from Probability Tables of the 𝜒²-distribution)
Power of the test is given by
$$1 - \beta = P(x \in W \mid H_1) = P(x_1 + x_2 \ge 9.5 \mid H_1) = P\left[ \frac{2}{\theta}(x_1 + x_2) \ge \frac{2}{\theta} \times 9.5 \,\Big|\, H_1 \right] = P[\chi^2(4) \ge 19] \quad (\because \text{Under } H_1,\ \theta = 1)$$
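The two chi-square tail probabilities of Example 5 can be evaluated in closed form, since for 4 degrees of freedom P(𝜒² ≥ x) = exp(−x/2)(1 + x/2). The sketch below relies on that identity; the function name chi2_tail_4df is hypothetical.

```python
import math

def chi2_tail_4df(x):
    """P(chi-square with 4 d.f. >= x) = exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)

theta0, theta1, cutoff = 2, 1, 9.5

alpha = chi2_tail_4df(2 * cutoff / theta0)   # P[chi2(4) >= 9.5] under H0
power = chi2_tail_4df(2 * cutoff / theta1)   # P[chi2(4) >= 19] under H1

print(round(alpha, 4), round(power, 6))
```

With these figures the power turns out to be far smaller than 𝛼, which is worth keeping in mind when unbiased critical regions are discussed in the next lesson.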
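Example 6. Use the Neyman-Pearson Lemma to obtain the best critical region for testing 𝜃 = 𝜃0
against 𝜃 = 𝜃1 > 𝜃0 and 𝜃 = 𝜃1 < 𝜃0 , in the case of a normal population 𝑁(𝜃, 𝜎²), where 𝜎² is
known. Hence find the power of the test.
Solution.
$$L = \prod_{i=1}^{n} f(x_i, \theta) = \left( \frac{1}{\sigma \sqrt{2\pi}} \right)^{n} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \theta)^2 \right\}$$
Using the Neyman-Pearson Lemma, the best critical region (B.C.R.) is given by (for 𝑘 > 0)
$$\frac{L_1}{L_0} = \frac{\exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \theta_1)^2 \right\}}{\exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \theta_0)^2 \right\}} \ge k$$
$$\Rightarrow \exp\left[ -\frac{1}{2\sigma^2} \left\{ \sum_{i=1}^{n} (x_i - \theta_1)^2 - \sum_{i=1}^{n} (x_i - \theta_0)^2 \right\} \right] \ge k$$
$$\Rightarrow \exp\left[ -\frac{n}{2\sigma^2} (\theta_1^2 - \theta_0^2) + \frac{1}{\sigma^2} (\theta_1 - \theta_0) \sum_{i=1}^{n} x_i \right] \ge k$$
$$\Rightarrow -\frac{n}{2\sigma^2} (\theta_1^2 - \theta_0^2) + \frac{1}{\sigma^2} (\theta_1 - \theta_0) \sum_{i=1}^{n} x_i \ge \log k$$
(since log 𝑥 is an increasing function of 𝑥)
$$\Rightarrow \bar{x}(\theta_1 - \theta_0) \ge \frac{\sigma^2}{n} \log k + \frac{\theta_1^2 - \theta_0^2}{2}$$
Case (i) If 𝜃1 > 𝜃0 , the B.C.R. is determined by the relation (right-tailed test):
$$\bar{x} > \frac{\sigma^2}{n} \cdot \frac{\log k}{\theta_1 - \theta_0} + \frac{\theta_1 + \theta_0}{2} = \lambda_1 \ \text{(say)}.$$
∴ B.C.R. is: $W = \{x : \bar{x} > \lambda_1\}$
Case (ii) If 𝜃1 < 𝜃0 , dividing by the negative quantity 𝜃1 − 𝜃0 reverses the inequality, and the
B.C.R. is given by the relation (left-tailed test):
$$\bar{x} < \frac{\sigma^2}{n} \cdot \frac{\log k}{\theta_1 - \theta_0} + \frac{\theta_1 + \theta_0}{2} = \lambda_2 \ \text{(say)}.$$
Hence B.C.R. is: $W_1 = \{x : \bar{x} < \lambda_2\}$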
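The constants 𝜆1 and 𝜆2 are so chosen as to make the probability of each of the relations
$\bar{x} > \lambda_1$ and $\bar{x} < \lambda_2$ equal to 𝛼 when the hypothesis 𝐻0 is true. The sampling distribution
of $\bar{x}$, when $H_i$ is true, is $N\left(\theta_i, \frac{\sigma^2}{n}\right)$, (𝑖 = 0, 1). Therefore, the constants 𝜆1 and 𝜆2 are
determined from the relations:
$$P[\bar{x} > \lambda_1 \mid H_0] = \alpha \quad \text{and} \quad P[\bar{x} < \lambda_2 \mid H_0] = \alpha$$
$$\therefore P(\bar{x} > \lambda_1 \mid H_0) = P\left[ Z > \frac{\lambda_1 - \theta_0}{\sigma / \sqrt{n}} \right] = \alpha; \quad Z \sim N(0, 1)$$
$$\Rightarrow \frac{\lambda_1 - \theta_0}{\sigma / \sqrt{n}} = z_\alpha \;\Rightarrow\; \lambda_1 = \theta_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$$
where $z_\alpha$ is the upper 𝛼-point of the standard normal variate given by:
$$P(Z > z_\alpha) = \alpha$$
Also $P(\bar{x} < \lambda_2 \mid H_0) = \alpha \Rightarrow P(\bar{x} \ge \lambda_2 \mid H_0) = 1 - \alpha$
$$\Rightarrow P\left( Z \ge \frac{\lambda_2 - \theta_0}{\sigma / \sqrt{n}} \right) = 1 - \alpha \;\Rightarrow\; \frac{\lambda_2 - \theta_0}{\sigma / \sqrt{n}} = z_{1 - \alpha} \;\Rightarrow\; \lambda_2 = \theta_0 + z_{1 - \alpha} \frac{\sigma}{\sqrt{n}}$$
Note. By symmetry of the normal distribution, we have $z_{1 - \alpha} = -z_\alpha$.
Power of the test. By definition, the power of the test in case (i) is:
$$1 - \beta = P[x \in W \mid H_1] = P[\bar{x} \ge \lambda_1 \mid H_1] = P\left( Z \ge \frac{\lambda_1 - \theta_1}{\sigma / \sqrt{n}} \right) \quad \left[ \because \text{Under } H_1,\ Z = \frac{\bar{x} - \theta_1}{\sigma / \sqrt{n}} \sim N(0, 1) \right]$$
$$= P\left( Z \ge \frac{\theta_0 + z_\alpha \frac{\sigma}{\sqrt{n}} - \theta_1}{\sigma / \sqrt{n}} \right) = P\left( Z \ge z_\alpha - \frac{\theta_1 - \theta_0}{\sigma / \sqrt{n}} \right) = 1 - P(Z \le \lambda_3) = 1 - \Phi(\lambda_3),$$
where $\lambda_3 = z_\alpha - \dfrac{\sqrt{n}(\theta_1 - \theta_0)}{\sigma}$ and Φ(⋅) is the distribution function of the standard normal variate.
Similarly in case (ii), (𝜃1 < 𝜃0 ), the power of the test is
$$1 - \beta = P(\bar{x} < \lambda_2 \mid H_1) = P\left( Z < \frac{\lambda_2 - \theta_1}{\sigma / \sqrt{n}} \right) = P\left( Z < \frac{\theta_0 + z_{1 - \alpha} \frac{\sigma}{\sqrt{n}} - \theta_1}{\sigma / \sqrt{n}} \right)$$
$$= P\left( Z < z_{1 - \alpha} + \frac{\theta_0 - \theta_1}{\sigma / \sqrt{n}} \right) = \Phi(\lambda_4), \quad (\because \theta_0 > \theta_1)$$
where $\lambda_4 = z_{1 - \alpha} + \dfrac{\sqrt{n}(\theta_0 - \theta_1)}{\sigma} = \dfrac{\sqrt{n}(\theta_0 - \theta_1)}{\sigma} - z_\alpha$.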
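The cut-off 𝜆1 and the power in case (i) of Example 6 can be computed directly from the formulas 𝜆1 = 𝜃0 + z_𝛼 𝜎/√n and 1 − 𝛽 = 1 − Φ(z_𝛼 − √n(𝜃1 − 𝜃0)/𝜎). The sketch below is illustrative only: the function name right_tailed_bcr and the numerical inputs (𝜃0 = 0, 𝜃1 = 1, 𝜎 = 2, n = 16) are assumptions, not values from the text.

```python
from statistics import NormalDist

def right_tailed_bcr(theta0, theta1, sigma, n, alpha=0.05):
    """Cut-off lambda_1 and power of the region {sample mean > lambda_1}, theta1 > theta0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)       # upper alpha-point of N(0, 1)
    se = sigma / n ** 0.5
    lam1 = theta0 + z_alpha * se                  # lambda_1 = theta0 + z_alpha * sigma / sqrt(n)
    lam3 = z_alpha - (theta1 - theta0) / se       # argument of Phi in the power formula
    power = 1 - std_normal.cdf(lam3)
    return lam1, power

# Hypothetical inputs chosen only for illustration
lam1, power = right_tailed_bcr(theta0=0.0, theta1=1.0, sigma=2.0, n=16)
print(round(lam1, 4), round(power, 4))
```

The left-tailed case (ii) follows in the same way with 𝜆2 = 𝜃0 − z_𝛼 𝜎/√n and power Φ(𝜆4).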

MCQ’s Problems
Question 1.
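Let $X_1, \ldots, X_n$ be a random sample of size $n\ (\ge 2)$ from a uniform distribution with
probability density function
$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \\ 0, & \text{otherwise} \end{cases}$$
where $\theta \in (0, \infty)$. If $X_{(1)} = \min\{X_1, \ldots, X_n\}$ and $X_{(n)} = \max\{X_1, \ldots, X_n\}$,
then, as $n \to \infty$, $\dfrac{1}{\theta}\left( X_{(n)} + \dfrac{\theta}{n + 1} \right)$ converges in probability to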
A. 1
B. 0
C. 2
D. 3
Question 2.
Which measure is used to determine the convexity of the distribution curve?
A. skewness
B. kurtosis
C. variance
D. standard deviation
Question 3.
Consider the simple linear regression model $y_i = \alpha + \beta x_i + \epsilon_i,\ i = 1, 2, \ldots, n$,
where the $\epsilon_i$'s are i.i.d. random variables with mean 0 and variance $\sigma^2 \in (0, \infty)$.
Suppose that we have a data set $(x_1, y_1), \ldots, (x_n, y_n)$ with n = 10,
$\sum_{i=1}^{n} x_i = 50$, $\sum_{i=1}^{n} y_i = 40$, $\sum_{i=1}^{n} x_i^2 = 500$, $\sum_{i=1}^{n} y_i^2 = 400$ and $\sum_{i=1}^{n} x_i y_i = 400$.
An unbiased estimate of $\sigma^2$ is
A. 5
B. 1/5
C. 10
D. 1/10
Question 4.
If $X_1, X_2, \ldots, X_n$ is a random sample from a population with density
$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta} e^{-x/\theta}, & \text{if } 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$$
where 𝜃 > 0 is an unknown parameter, what is a 100(1 − 𝛼)% confidence interval for 𝜃?
2∑𝑛 𝑛
𝑖=1 ln 𝑋𝑖 2∑𝑖=1 ln 𝑋𝑖
A. [ 2 , 2 ]
𝜒 𝛼 (2𝑛) 𝜒𝛼 (2𝑛)
1−
2 2
2∑𝑛
𝑖=1 𝑋𝑖 2∑𝑛
𝑖=1 𝑋𝑖
B. [𝜒2 , 2 (2𝑛) ]
𝛼 (2𝑛) 𝜒𝑎
1−
2 2
2∑𝑛 𝑋𝑖 2∑𝑛 𝑋
C. [ 𝜒2𝑖=1 , 𝑖=1 𝑖 ]
(2𝑛) 𝜒2 (2𝑛)
𝛼 𝛼
1−
2 2
2∑𝑛
𝑖=1 ln 𝑋𝑖
D. [ 2 (2𝑛) , ]
𝜒𝛼 𝜒2 𝛼 (2𝑛)
1−
2 2
Question 5.
Suppose that 𝑋 has uniform distribution on the interval [0,100]. Let 𝑌 denote the greatest
integer smaller than or equal to X. Which of the following is true?
A. $P(Y \le 25) = \frac{1}{4}$
B. $P(Y \le 25) = \frac{26}{100}$
C. $E(Y) = 50$
D. $E(Y) = \frac{101}{2}$
Question 6.
Let $x_1 = 3, x_2 = 4, x_3 = 3.5, x_4 = 2.5$ be the observed values of a random sample from the
probability density function
$$f(x \mid \theta) = \frac{1}{3} \left[ \frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x} \right], \quad x > 0,\ \theta \in \{1, 2, 3, 4\};$$
then the method of moment estimate (MME) of 𝜃 is

A. 1
B. 2
C. 3
D. 4
Question 7.
Let the random variables 𝑋 and 𝑌 have the joint probability mass function
$$P(X = x, Y = y) = e^{-2} \binom{x}{y} \left( \frac{3}{4} \right)^{y} \left( \frac{1}{4} \right)^{x - y} \frac{2^{x}}{x!}, \quad y = 0, 1, 2, \ldots, x;\ x = 0, 1, 2, \ldots$$
Then 𝑉(𝑌) is equal to
A. 1
B. 1/2
C. 2
D. 3/2
Question 8.
Let the discrete random variables 𝑋 and 𝑌 have the joint probability mass function
$$P(X = m, Y = n) = \begin{cases} \dfrac{e^{-1}}{(n - m)! \, m! \, 2^{n}}, & m = 0, 1, 2, \ldots, n;\ n = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
Then which of the following is TRUE?
A. The marginal distribution of 𝑋 is Poisson with mean 1/2
B. The random variables 𝑋 and 𝑌 are independent
C. The conditional distribution of X given Y = 5 is Bin$(6, \frac{1}{2})$
D. 𝑃(𝑌 = 𝑛) = (𝑛 + 1)𝑃(𝑌 = 𝑛 + 2) for 𝑛 = 0,1,2, …
Question 9.
Consider the trinomial distribution with the probability mass function
$$P(X = x, Y = y) = \frac{2!}{x! \, y! \, (2 - x - y)!} \left( \frac{1}{6} \right)^{x} \left( \frac{2}{6} \right)^{y} \left( \frac{3}{6} \right)^{2 - x - y},$$
$x \ge 0$, $y \ge 0$, and $0 \le x + y \le 2$. Then Corr (𝑋, 𝑌) is equal to…
(correct up to two decimal places)
A) -0.31
B) 0.31
C) 0.35
D) 0.78
Question 10.
Let $x_1 = 1.1, x_2 = 0.5, x_3 = 1.4, x_4 = 1.2$ be the observed values of a random sample of size
four from a distribution with the probability density function
$$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & \text{if } x \ge \theta \\ 0, & \text{otherwise} \end{cases} \qquad \theta \in (-\infty, \infty)$$
Then the maximum likelihood estimate of $\theta^2 + \theta + 1$ is equal to (up to two decimal places)
A) 1.75
B) 1.89
C) 1.74
D) 0.87
Question 11.
Let $U \sim F_{5,8}$ and $V \sim F_{8,5}$. If $P[U > 3.69] = 0.05$, then the value of c such that
$P[V > c] = 0.95$ equals… (rounded off to two decimal places)
A) 0.27
B) 1.27
C) 2.27
D) 2.29
Question 12.
Let P be a probability function that assigns the same weight to each of the points of the
sample space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then
which of the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent

3. E, F and G are independent
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question 13.
Let $X_1, X_2, \ldots, X_4$ and $Y_1, Y_2, \ldots, Y_5$ be two random samples of size 4 and 5 respectively,
from a standard normal population. Define the statistic
$$T = \left( \frac{5}{4} \right) \frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2};$$
then which of the following is TRUE?

A. Expectation of 𝑇 is 0.6
B. Variance of T is 8.97
C. T has F-distribution with degree of freedom 5 and 4
D. T has F-distribution with degree of freedom 4 and 5
Question 14.
Let 𝑋, 𝑌 and 𝑍 be independent random variables with respective moment generating functions
$M_X(t) = \dfrac{1}{1 - t},\ t < 1$; $M_Y(t) = e^{t^2/2} = M_Z(t),\ t \in \mathbb{R}$. Let $W = 2X + Y^2 + Z^2$; then P(W > 2)
is equal to
A. 2𝑒 −1
B. 2𝑒 −2
C. 𝑒 −1
D. 𝑒 −2
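Question 15.
Let $x_1 = 3, x_2 = 4, x_3 = 3.5, x_4 = 2.5$ be the observed values of a random sample from the
probability density function
$$f(x \mid \theta) = \frac{1}{3} \left[ \frac{1}{\theta} e^{-x/\theta} + \frac{1}{\theta^2} e^{-x/\theta^2} + e^{-x} \right], \quad x > 0,\ \theta \in (0, \infty)$$
Then the method of moment estimate (MME) of 𝜃 is
A. 1.5
B. 2.5
C. 3.5
D. 4.5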
Question 16.
Let 𝑋 and 𝑌 be random variables with joint probability mass function
$$P(X = n, Y = k) = \left( \frac{1}{2} \right)^{n + 2k + 1}; \quad n = -k, -k + 1, \ldots;\ k = 1, 2, \ldots$$
Then E(Y) equals
A. 1
B. 2
C. 3
D. 4
Question 17.
Let 𝑋 be a random variable with the cumulative distribution function
$$F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{1 + x^2}{10}, & 0 \le x < 1 \\ \dfrac{3 + x^2}{10}, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$$
3
A. 𝑃(1 < 𝑋 < 2) = 10
31
B. 𝑃(1 < 𝑋 ≤ 2) = 5
11
C. 𝑃(1 ≤ 𝑋 < 2) = 2
41
D. 𝑃(1 ≤ 𝑋 ≤ 2) = 5
Question 18.
Let the random variables 𝑋1 and 𝑋2 have joint probability density function
$$f(x_1, x_2) = \begin{cases} \dfrac{x_1 e^{-x_1 x_2}}{2}, & 1 < x_1 < 3,\ x_2 > 0 \\ 0, & \text{otherwise.} \end{cases}$$
What is the value of Var $(X_2 \mid X_1 = 2)$ (up to two decimal places)?
A) 0.27
B) 0.28
C) 0.25
D) 1.90
Question 19
Let 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0, 𝑥4 = 1, 𝑥5 = 0, 𝑥6 = 1 be the data on a random sample of size 6
from Bin (1, 𝜃) distribution, where 𝜃 ∈ (0,1). Then the uniformly minimum variance unbiased
estimate of 𝜃(1 + 𝜃) is equal to
13.5 SUMMARY
13.6 GLOSSARY
• Motivation: These problems are very useful in real life and can be applied in data
science, economics and the social sciences.
• Attention: Think about how the best estimators are useful in real-world problems.
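Answer 1: A
Explanation :
$$f(x, \theta) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta \\ 0, & \text{otherwise} \end{cases} \qquad E(X_{(n)}) = \frac{n\theta}{n + 1}$$
Let $Y = \dfrac{1}{\theta}\left( X_{(n)} + \dfrac{\theta}{n + 1} \right)$, so
$$E(Y) = E\left( \frac{X_{(n)}}{\theta} + \frac{1}{n + 1} \right) = \frac{n}{n + 1} + \frac{1}{n + 1} = 1.$$
Since $X_{(n)}$ converges in probability to 𝜃 and $\frac{1}{n + 1} \to 0$ as $n \to \infty$, 𝑌 converges in probability to 1.
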
Answer 2: B
Explanation:
Convexity (peakedness) is decided by kurtosis.
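Answer 3: C
Explanation :
$$y_i = \alpha + \beta x_i + \epsilon_i, \quad i = 1, 2, \ldots, n$$
$$\hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} = \frac{400 - 10 \times 5 \times 4}{500 - 10 \times 5^2} = \frac{4}{5}; \qquad \hat{\alpha} = \bar{y} - \bar{x} \hat{\beta} = 4 - 5 \times \frac{4}{5} = 0$$
$$\hat{\sigma}^2 = \frac{1}{n - 2} \sum_{i=1}^{n} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = \frac{1}{10 - 2} \sum_{i=1}^{n} \left( y_i - \frac{4}{5} x_i \right)^2$$
$$= \frac{1}{8} \left( \sum_{i=1}^{n} y_i^2 - 2 \times \frac{4}{5} \sum_{i=1}^{n} x_i y_i + \left( \frac{4}{5} \right)^2 \sum_{i=1}^{n} x_i^2 \right) = \frac{1}{8} \left( 400 - \frac{8}{5} \times 400 + \frac{16}{25} \times 500 \right) = 10$$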
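The arithmetic in Answer 3 can be verified from the summary sums alone. The sketch below uses exact fractions; the variable names are illustrative, and the residual sum of squares is expanded algebraically because the individual observations are not given.

```python
from fractions import Fraction

n = 10
sx, sy, sxx, syy, sxy = 50, 40, 500, 400, 400   # sums given in Question 3

xbar, ybar = Fraction(sx, n), Fraction(sy, n)
beta_hat = (sxy - n * xbar * ybar) / (sxx - n * xbar ** 2)
alpha_hat = ybar - beta_hat * xbar

# Sum of (y - a - b x)^2 expanded in terms of the given sums
sse = (syy + n * alpha_hat ** 2 + beta_hat ** 2 * sxx
       - 2 * alpha_hat * sy - 2 * beta_hat * sxy + 2 * alpha_hat * beta_hat * sx)
sigma2_hat = sse / (n - 2)                      # unbiased estimate of sigma^2

print(beta_hat, alpha_hat, sigma2_hat)
```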
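Answer 4: B
Explanation :
We use the random variable $Q = \dfrac{2}{\theta} \sum_{i=1}^{n} X_i \sim \chi^2(2n)$ as the pivotal quantity. The 100(1 − 𝛼)%
confidence interval for 𝜃 can be constructed from
$$1 - \alpha = P\left( \chi^2_{\alpha/2}(2n) \le Q \le \chi^2_{1 - \alpha/2}(2n) \right) = P\left[ \frac{2 \sum_{i=1}^{n} X_i}{\chi^2_{1 - \alpha/2}(2n)} \le \theta \le \frac{2 \sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)} \right]$$
Thus, the 100(1 − 𝛼)% confidence interval for 𝜃 is given by
$$\left[ \frac{2 \sum_{i=1}^{n} X_i}{\chi^2_{1 - \alpha/2}(2n)},\ \frac{2 \sum_{i=1}^{n} X_i}{\chi^2_{\alpha/2}(2n)} \right]$$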
Answer 5: B
Explanation :
Let 𝑌 = [X], where 𝑌 denotes the greatest integer smaller than or equal to 𝑋.
$$P(Y \le 25) = P([X] \le 25) = P(X \in (0, 26)) = \int_{0}^{26} \frac{1}{100} \, dx = \frac{26}{100}$$
Hence B is the correct option.
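Answer 6: C
Explanation :
$$\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25$$
$$E(X) = \frac{1}{3} [\theta + \theta^2 + 1] \Gamma(2) = \frac{1}{3} [\theta + \theta^2 + 1] = 3.25$$
$$\theta^2 + \theta - 8.75 = 0 \ \text{ then } \ \theta = 2.5 \text{ or } -3.5$$
Since 𝜃 ∈ {1,2,3,4} then 𝜃 = 3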
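Answer 7: D
Explanation :
The marginal pmf of 𝑌 is given by
$$P(Y = y) = \sum_{x = y}^{\infty} P(X = x, Y = y) = \sum_{x = y}^{\infty} e^{-2} \binom{x}{y} \left( \frac{3}{4} \right)^{y} \left( \frac{1}{4} \right)^{x - y} \frac{2^{x}}{x!}$$
$$= e^{-2} \left( \frac{3}{4} \right)^{y} \sum_{u = 0}^{\infty} \binom{y + u}{y} \left( \frac{1}{4} \right)^{u} \frac{2^{y + u}}{(y + u)!} \quad (\text{put } u = x - y)$$
$$= e^{-2} \left( \frac{3}{4} \right)^{y} \sum_{u = 0}^{\infty} \frac{(y + u)!}{y! \, u!} \left( \frac{1}{4} \right)^{u} \frac{2^{y + u}}{(y + u)!} = e^{-2} \left( \frac{3}{4} \right)^{y} \frac{2^{y}}{y!} \sum_{u = 0}^{\infty} \frac{1}{u!} \left( \frac{1}{2} \right)^{u}$$
$$= e^{-2} \left( \frac{3}{4} \right)^{y} \frac{2^{y}}{y!} e^{1/2} = \frac{e^{-3/2} \left( \frac{3}{2} \right)^{y}}{y!}, \quad y = 0, 1, \ldots$$
which is the pmf of a Poisson random variable with parameter 3/2, so 𝐸(𝑌) = 3/2 and
𝑉(𝑌) = 3/2.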
Answer 8: A
Explanation :
The marginal probability mass function of X is given by
$$P(X = m) = \sum_{n = m}^{\infty} P(X = m, Y = n) = \frac{e^{-1/2} \left( \frac{1}{2} \right)^{m}}{m!}, \quad m = 0, 1, 2, \ldots$$
Thus the marginal distribution of X is Poisson with mean 1/2.
The marginal probability mass function of 𝑌 is given by
$$P(Y = n) = \sum_{m = 0}^{n} P(X = m, Y = n) = \frac{e^{-1}}{n!}, \quad n = 0, 1, 2, \ldots$$
Thus the marginal distribution of 𝑌 is Poisson with mean 1.
$$P(X = m, Y = n) \ne P(X = m) P(Y = n)$$
Therefore 𝑋 and 𝑌 are not independent.
$$P(X = m \mid Y = 5) = \frac{P(X = m, Y = 5)}{P(Y = 5)} = \frac{5!}{m! \, (5 - m)!} \left( \frac{1}{2} \right)^{5}, \quad m = 0, 1, 2, \ldots, 5$$
Thus the conditional distribution of 𝑋 given 𝑌 = 5 is Bin$(5, \frac{1}{2})$, not Bin$(6, \frac{1}{2})$.
Finally, $\dfrac{P(Y = n)}{P(Y = n + 1)} = n + 1$ for 𝑛 = 0, 1, 2, …, so that 𝑃(𝑌 = 𝑛) = (𝑛 + 1)𝑃(𝑌 = 𝑛 + 1), and the relation in option D does not hold.
Answer 9: A
Explanation :
The trinomial distribution of two r.v.'s 𝑋 and 𝑌 is given by
$$f_{X,Y}(x, y) = \frac{n!}{x! \, y! \, (n - x - y)!} \, p^{x} q^{y} (1 - p - q)^{n - x - y}$$
for 𝑥, 𝑦 = 0, 1, 2, … , 𝑛 and 𝑥 + 𝑦 ≤ 𝑛, where p + q ≤ 1.
Here n = 2, p = 1/6 and q = 2/6.
$$\text{Var}(X) = np(1 - p) = 2 \times \frac{1}{6} \left( 1 - \frac{1}{6} \right) = \frac{10}{36}; \qquad \text{Var}(Y) = nq(1 - q) = 2 \times \frac{2}{6} \left( 1 - \frac{2}{6} \right) = \frac{16}{36}$$
$$\text{Cov}(X, Y) = -npq = -2 \times \frac{1}{6} \times \frac{2}{6} = -\frac{4}{36}$$
$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)} \sqrt{\text{Var}(Y)}} = -\frac{4}{4\sqrt{10}} = -0.31$$
Hence −0.31 is the correct answer.
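The moments used in Answer 9 can be confirmed by enumerating the joint pmf directly rather than quoting the trinomial formulas. This is a small sketch with illustrative names; it works with exact fractions until the final square root.

```python
from fractions import Fraction
from math import factorial, sqrt

n = 2
p, q = Fraction(1, 6), Fraction(2, 6)
r = 1 - p - q

# Joint pmf over all (x, y) with x + y <= n
pmf = {
    (x, y): Fraction(factorial(n), factorial(x) * factorial(y) * factorial(n - x - y))
            * p**x * q**y * r**(n - x - y)
    for x in range(n + 1) for y in range(n + 1 - x)
}

def expect(g):
    """Expectation of g(X, Y) under the enumerated pmf."""
    return sum(g(x, y) * pr for (x, y), pr in pmf.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex ** 2
var_y = expect(lambda x, y: y * y) - ey ** 2
cov_xy = expect(lambda x, y: x * y) - ex * ey
corr = float(cov_xy) / sqrt(float(var_x * var_y))

print(var_x, var_y, cov_xy, round(corr, 4))
```

The exact value is −1/√10, about −0.316, which matches the listed option −0.31 when truncated to two decimal places.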
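Answer 10: A
Explanation :
Let $x_1 = 1.1, x_2 = 0.5, x_3 = 1.4, x_4 = 1.2$ and
$$f(x \mid \theta) = \begin{cases} e^{\theta - x}, & \text{if } x \ge \theta \\ 0, & \text{otherwise} \end{cases} \qquad \theta \in (-\infty, \infty)$$
The likelihood $L(\theta) = \prod_{i=1}^{n} e^{\theta - x_i} = e^{n\theta - \sum x_i}$ is positive only for $\theta \in (-\infty, X_{(1)}]$.
Since $\dfrac{d}{d\theta} L(\theta) > 0\ \ \forall \theta \in (-\infty, X_{(1)}]$, the likelihood is a strictly increasing function
of 𝜃 on this interval. So $\hat{\theta} = X_{(1)} = 0.5$, and therefore by the invariance
property the MLE of $\theta^2 + \theta + 1$ is $(0.5)^2 + 0.5 + 1 = 1.75$.
Hence the MLE for $\theta^2 + \theta + 1$ is 1.75.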
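Answer 11: A
Explanation :
If $X \sim F(m, n)$ then $\dfrac{1}{X} \sim F(n, m)$.
$$P[U > 3.69] = 0.05 \Rightarrow 1 - P[U > 3.69] = 1 - 0.05 \Rightarrow P[U < 3.69] = 0.95$$
$$\Rightarrow P\left[ \frac{1}{U} > \frac{1}{3.69} \right] = 0.95$$
Since $\dfrac{1}{U} \sim F_{8,5}$, which is the distribution of 𝑉,
$$c = \frac{1}{3.69} = 0.27$$
Hence c = 0.27 is the correct answer.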
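Answer 12: A
Explanation :
Clearly, P({𝜔}) = 1/4 ∀𝜔 ∈ Ω = {1,2,3,4}. We have E = {1,2}, F = {1,3} and G = {3,4}.
Then P(E) = P(F) = P(G) = 2/4 = 1/2.
Since E ∩ F = {1}, P(E ∩ F) = 1/4 = P(E)P(F), so E and F are independent. However,
E ∩ G = ∅, so P(E ∩ G) = 0 ≠ 1/4 = P(E)P(G), and E and G are not independent; consequently
E, F and G are not mutually independent either. Hence, for the events as stated, only statement 1 is true.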
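Answer 13: D
Explanation :
$$T = \left( \frac{5}{4} \right) \frac{X_1^2 + X_2^2 + X_3^2 + X_4^2}{Y_1^2 + Y_2^2 + Y_3^2 + Y_4^2 + Y_5^2} \sim F(4, 5); \qquad E(T) = \frac{n}{n - 2} = \frac{5}{3} \quad (m = 4,\ n = 5)$$
$$\text{Var}(T) = \frac{2n^2(m + n - 2)}{m(n - 2)^2(n - 4)} = \frac{2(5)^2(7)}{4(3)^2(1)} = \frac{350}{36} = 9.72$$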
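Answer 14: A
Explanation :
Here 𝑋 is exponential with mean 1, so 2𝑋 ∼ 𝜒²(2), and 𝑌, 𝑍 are standard normal, so 𝑌² + 𝑍² ∼ 𝜒²(2).
Hence $W = 2X + Y^2 + Z^2 \sim \chi^2(4)$, with density
$$f_W(w) = \begin{cases} \dfrac{1}{4} w e^{-w/2}, & \text{if } w > 0 \\ 0, & \text{otherwise} \end{cases}$$
$$P(W > 2) = \int_{2}^{\infty} \frac{1}{4} w e^{-w/2} \, dw = 2e^{-1}$$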
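Answer 15: B
Explanation :
$$\bar{x} = \frac{1}{4}(3 + 4 + 3.5 + 2.5) = 3.25$$
$$E(X) = \frac{1}{3} [\theta + \theta^2 + 1] \Gamma(2) = \frac{1}{3} [\theta + \theta^2 + 1] = 3.25$$
$$\theta^2 + \theta - 8.75 = 0 \ \text{ then } \ \theta = 2.5 \text{ or } -3.5$$
Since 𝜃 ∈ (0, ∞) then 𝜃 = 2.5, which is option B.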
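Answer 16: B
Explanation :
$$P(Y = k) = \sum_{n = -k}^{\infty} P(X = n, Y = k) = \sum_{m = 0}^{\infty} \left( \frac{1}{2} \right)^{m + k + 1} \quad (\text{put } m = n + k)$$
$$= \frac{1}{2} \left( \frac{1}{2} \right)^{k - 1}, \quad k = 1, 2, \ldots$$
which is the pmf of the geometric distribution with parameter 1/2.
$$E(Y) = \sum_{k = 1}^{\infty} k \cdot \frac{1}{2} \left( \frac{1}{2} \right)^{k - 1} = 2$$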
Answer 17: A
Explanation :
$$P(1 < X < 2) = F(2) - F(1) - P(X = 2) = \frac{3}{10}$$
$$P(1 < X \le 2) = F(2) - F(1) = \frac{3}{5}$$
$$P(1 \le X < 2) = F(2) - F(1) - P(X = 2) + P(X = 1) = \frac{1}{2}$$
$$P(1 \le X \le 2) = F(2) - F(1) + P(X = 1) = \frac{4}{5}$$
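Answer 18: C
Explanation :
The marginal pdf of 𝑋1 is
$$g(x_1) = \int_{0}^{\infty} \frac{x_1 e^{-x_1 x_2}}{2} \, dx_2 = \frac{1}{2}, \quad 1 < x_1 < 3$$
$$h(x_2 \mid x_1) = \frac{f(x_1, x_2)}{g(x_1)} = x_1 e^{-x_1 x_2}, \quad x_2 > 0$$
so $X_2 \mid X_1 = x_1 \sim \text{Exp}(x_1)$ with mean $\dfrac{1}{x_1}$ and variance $\dfrac{1}{x_1^2}$.
Therefore Var $(X_2 \mid X_1 = 2) = \dfrac{1}{4} = 0.25$.
Hence 0.25 is the correct answer.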
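Answer 19: 0.7
Explanation :
$\dfrac{T}{n} + \dfrac{T(T - 1)}{n(n - 1)}$ is the UMVUE of 𝜃(1 + 𝜃), since
$$E\left( \frac{T}{n} + \frac{T(T - 1)}{n(n - 1)} \right) = \theta(1 + \theta), \quad \text{where } T = \sum_{i=1}^{n} X_i.$$
Therefore,
$$\frac{T}{n} + \frac{T(T - 1)}{n(n - 1)} = \frac{3}{6} + \frac{3(3 - 1)}{6(6 - 1)} = \frac{21}{30} = 0.70$$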
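The value in Answer 19 follows directly from the data of Question 19. A minimal check with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

data = [1, 0, 0, 1, 0, 1]          # sample from Question 19
n, t = len(data), sum(data)        # n = 6, T = 3

# Estimate T/n + T(T-1)/(n(n-1)) quoted in the explanation
umvue = Fraction(t, n) + Fraction(t * (t - 1), n * (n - 1))

print(umvue, float(umvue))
```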
13.8 REFERENCES
• B.L Agarwal, Programmed Statistics ,New Age International Publishers, 2nd Edition.
LESSON 14
ERROR IN HYPOTHESIS TESTING AND
POWER OF TEST
STRUCTURE
14.2 Introduction
14.3 Error in Hypothesis Testing and Power of Test
14.3.1 Type I and Type II Error
14.3.2 Unbiased Test and Unbiased Critical Region
14.3.3 UMP (Uniformly Most Powerful) Critical Region
14.3.4 Likelihood Ratio Test
14.5 Summary
14.6 Glossary
14.8 References
One of the main objectives of this lesson is to discuss the testing of hypotheses and how it
can be used in real-world analysis.
14.2 INTRODUCTION
The main problems in statistical inference can be broadly classified into two areas:
(i) The area of estimation of population parameter(s) and setting up of confidence intervals
for them, i.e., the area of point and interval estimation, and
(ii) Tests of statistical hypothesis.
In Neyman-Pearson theory, we use statistical methods to arrive at decisions in certain situations
where there is lack of certainty on the basis of a sample whose size is fixed in advance while
in Wald's sequential theory the sample size is not fixed but is regarded as a random variable.
14.3 ERROR IN HYPOTHESIS TESTING AND POWER
We will discuss the following topics in detail.
(i) Type I and Type II Error
(ii) Unbiased Test and Unbiased Critical Region
(iii) UMP (Uniformly Most Powerful) Critical Region
(iv) Likelihood Ratio Test
14.3.1 Type I and Type II Error :
• Type I error, also known as a "false positive": the error of rejecting a null hypothesis when
it is true. In other words, this is the error of accepting an alternative hypothesis (the real
hypothesis of interest) when the results can be attributed to chance. Plainly speaking, it
occurs when we observe a difference when in truth there is none. So the probability of
making a type I error in a test with rejection region R is 𝑃(𝑅 ∣ 𝐻0 is true ).
• Type II error, also known as a "false negative": the error of not rejecting a null hypothesis
when the alternative hypothesis is the true state of nature. In other words, this is the error
of failing to accept an alternative hypothesis, which typically happens when the test does
not have adequate power. Plainly speaking, it occurs when we fail to observe a difference
when in truth there is one. So the probability of making a type II error in a test with
rejection region R is 1 − 𝑃(𝑅 ∣ 𝐻𝑎 is true). The power of the test is 𝑃(𝑅 ∣ 𝐻𝑎 is true ).
Hypothesis testing is the art of deciding whether the variation between two sample distributions
can be explained by random chance alone. Before concluding that two distributions differ in
a meaningful way, we must take enough precaution to see that the differences are not just
due to random chance. At the heart of Type I error is that we do not want to accept an
unwarranted hypothesis, so we exercise a lot of care by minimizing the chance of its occurrence.
Traditionally we set the probability of Type I error at .05 or .01, meaning that when the null
hypothesis is true there is only a 5 or 1 in 100 chance of wrongly rejecting it. This is called
the 'level of significance'. Again, there is no guarantee that 5 in 100 is rare enough, so
significance levels need to be chosen carefully. For example, a factory where a six sigma
quality control system has been implemented requires that errors never add up to more than
the probability of being six standard deviations away from the mean (an incredibly rare
event). In practice a test usually reports a p-value, and the null hypothesis is rejected when
the p-value does not exceed the chosen level of significance.
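The meaning of the level of significance and of power can be illustrated by simulation. The sketch below is only an illustration under assumed settings (normal data with 𝜎 = 1, n = 20, 𝐻0 : 𝜇 = 0, a right-tailed z-test at the 5% level, and a hypothetical alternative 𝜇 = 0.5); the function name rejection_rate is made up for the example.

```python
import random
from statistics import NormalDist

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=20, alpha=0.05,
                   trials=20000, seed=1):
    """Share of simulated samples in which H0: mu = mu0 is rejected (right-tailed z-test)."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    se = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if (xbar - mu0) / se > z_alpha:
            rejections += 1
    return rejections / trials

type1_rate = rejection_rate(true_mu=0.0)   # H0 true: estimates P(R | H0 is true)
power_est = rejection_rate(true_mu=0.5)    # H1 true: estimates P(R | Ha is true)
print(type1_rate, power_est)
```

When the data are generated under 𝐻0 the rejection rate stays close to the chosen level 0.05, while under the assumed alternative it estimates the power of the test.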
14.3.2 Unbiased Test and Unbiased Critical Region
Let us consider the testing of 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 . The critical region 𝑊, and
consequently the test based on it, is said to be unbiased if the power of the test is not less
than the size of the critical region, i.e., if
Power of the test ≥ Size of the C.R.
$$\Rightarrow 1 - \beta \ge \alpha \;\Rightarrow\; P_{\theta_1}(W) \ge P_{\theta_0}(W) \;\Rightarrow\; P[x : x \in W \mid H_1] \ge P[x : x \in W \mid H_0]$$
In other words, the critical region 𝑊 is said to be unbiased if
$$P_{\theta}(W) \ge P_{\theta_0}(W), \quad \forall \theta (\ne \theta_0) \in \Theta$$
Theorem. Every most powerful (MP) or uniformly most powerful (UMP) critical region
(CR) is necessarily unbiased.
(a) If 𝑊 be an MPCR of size 𝛼 for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 , then it is
necessarily unbiased.
(b) Similarly if 𝑊 be UMPCR of size 𝛼 for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 ∈ Θ1 , then it is
also unbiased.
Proof. (a) Since 𝑊 is the MPCR of size 𝛼 for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 , by the
Neyman-Pearson Lemma we have, for some 𝑘 > 0,
𝑊 = {𝑥: 𝐿(𝑥, 𝜃1 ) ≥ 𝑘𝐿(𝑥, 𝜃0 )} = {𝑥: 𝐿1 ≥ 𝑘𝐿0 }
and 𝑊 ′ = {𝑥: 𝐿(𝑥, 𝜃1 ) < 𝑘𝐿(𝑥, 𝜃0 )} = {𝑥: 𝐿1 < 𝑘𝐿0 },
where 𝑘 is determined so that the size of the critical region is 𝛼, i.e.,
𝑃𝜃0 (𝑊) = 𝑃[𝑥 ∈ 𝑊 ∣ 𝐻0 ] = ∫𝑊 𝐿0 𝑑𝑥 = 𝛼 … (i)
To prove that 𝑊 is unbiased, we have to show that :
Power of 𝑊 ≥ 𝛼 i.e., 𝑃𝜃1 (𝑊) ≥ 𝛼
We have: 𝑃𝜃1 (𝑊) = ∫𝑊 𝐿1 𝑑𝑥 ≥ 𝑘∫𝑊 𝐿0 𝑑𝑥 = 𝑘𝛼
[∵ On 𝑊, 𝐿1 ≥ 𝑘𝐿0 and using (i)]
i.e., 𝑃𝜃1 (𝑊) ≥ 𝑘𝛼 … (ii)
Also
1 − 𝑃𝜃1 (𝑊) = 1 − 𝑃(𝐱 ∈ 𝑊 ∣ 𝐻1 ) = 𝑃(𝐱 ∈ 𝑊 ′ ∣ 𝐻1 ) = ∫𝑊′ 𝐿1 𝑑𝐱
≤ 𝑘 ∫𝑊′ 𝐿0 𝑑𝐱 = 𝑘𝑃(𝐱 ∈ 𝑊 ′ ∣ 𝐻0 ) [∵ On 𝑊 ′ , 𝐿1 < 𝑘𝐿0 ]
= 𝑘[1 − 𝑃(𝐱 ∈ 𝑊 ∣ 𝐻0 )] = 𝑘(1 − 𝛼)
[Using (i)]
i.e., 1 − 𝑃𝜃1 (𝑊) ≤ 𝑘(1 − 𝛼) … (iii)
Case (i) 𝑘 ≥ 1. If 𝑘 ≥ 1, then from (ii), we get
𝑃𝜃1 (𝑊) ≥ 𝑘𝛼 ≥ 𝛼
⇒ 𝑊 is unbiased CR.
Case (ii) 0 < 𝑘 < 1. If 0 < 𝑘 < 1, then from (iii), we get
1 − 𝑃𝜃1 (𝑊) ≤ 𝑘(1 − 𝛼) < 1 − 𝛼 ⇒ 𝑃𝜃1 (𝑊) > 𝛼
⇒ 𝑊 is unbiased CR.
Hence in either case the MPCR 𝑊 is unbiased.
(b) If 𝑊 is the UMPCR of size 𝛼 for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 ∈ Θ1 , the argument above
holds for every 𝜃1 ∈ Θ1 , so 𝑊 is also unbiased.
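Remark. If 𝑇 = 𝑡(x) is a sufficient statistic for 𝜃, then by the factorization theorem
𝐿(x, 𝜃) = 𝑔𝜃 (𝑡(x)) ⋅ ℎ(x), and the MPCR given by the Neyman-Pearson Lemma becomes, for some 𝑘 > 0,
𝑊 = {x: 𝑔𝜃1 (𝑡(x)) ⋅ ℎ(x) ≥ 𝑘 ⋅ 𝑔𝜃0 (𝑡(x)) ⋅ ℎ(x)}
= {x: 𝑔𝜃1 (𝑡(x)) ≥ 𝑘 ⋅ 𝑔𝜃0 (𝑡(x))}
Hence if 𝑇 = 𝑡(x) is a sufficient statistic for 𝜃, then the MPCR for the test may be defined in
terms of the marginal distribution of 𝑇 = 𝑡(x), rather than the joint distribution of
𝑥1 , 𝑥2 , … , 𝑥𝑛 .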
14.3.3 UMP (Uniformly Most Powerful ) Critical Region
For a random sample of size 𝑛 from 𝑁(𝜃, 𝜎²) with 𝜎 known, the region 𝑥‾ ≥ 𝜃0 + 𝜎𝑧𝛼 /√𝑛 is the
best critical region for testing 𝐻0 : 𝜃 = 𝜃0 against the hypothesis 𝜃 = 𝜃1 , provided
𝜃1 > 𝜃0 , while the region 𝑥‾ ≤ 𝜃0 − 𝜎𝑧𝛼 /√𝑛 is the best critical region for testing
𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 , provided 𝜃1 < 𝜃0 . Thus, the best critical region for testing the
simple hypothesis 𝐻0 : 𝜃 = 𝜃0 against the simple hypothesis 𝜃 = 𝜃0 + 𝑐, 𝑐 > 0 will not serve as
the best critical region for testing the simple hypothesis 𝐻0 : 𝜃 = 𝜃0 against the simple
alternative hypothesis 𝐻1 : 𝜃 = 𝜃0 − 𝑐, 𝑐 > 0.
Hence in this problem, no uniformly most powerful test exists for testing the simple hypothesis
𝐻0 : 𝜃 = 𝜃0 against the composite alternative hypothesis 𝐻1 : 𝜃 ≠ 𝜃0 .
However, for each one-sided alternative hypothesis, 𝐻1 : 𝜃 = 𝜃1 > 𝜃0 or 𝐻1 : 𝜃 = 𝜃1 < 𝜃0 , a UMP
test exists.
Remark. In particular, if we take 𝑛 = 2, then the B.C.R. for testing 𝐻0 : 𝜃 = 𝜃0 , against
𝐻1 : 𝜃 = 𝜃1 (> 𝜃0 ) is given by :
𝑊 = {x: (𝑥1 + 𝑥2 )/2 ≥ 𝜃0 + 𝜎𝑧𝛼 /√2}
= {x: 𝑥1 + 𝑥2 ≥ 2𝜃0 + √2𝜎𝑧𝛼 }
= {x: 𝑥1 + 𝑥2 ≥ 𝐶}, (say)
where 𝐶 = 2𝜃0 + √2𝜎𝑧𝛼 = 2𝜃0 + √2𝜎 × 1.645, if 𝛼 = 0.05.
Similarly, the B.C.R. for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 (< 𝜃0 ) with 𝑛 = 2 and 𝛼 =
0.05 is given by
𝑊1 = {x: (𝑥1 + 𝑥2 )/2 ≤ 𝜃0 − 𝜎𝑧𝛼 /√2}
= {x: (𝑥1 + 𝑥2 ) ≤ 2𝜃0 − √2𝜎 × 1.645}
= {x: 𝑥1 + 𝑥2 ≤ C1 }, (say),
where 𝐶1 = 2𝜃0 − √2𝜎𝑧𝛼 = 2𝜃0 − √2𝜎 × 1.645, if 𝛼 = 0.05
The B.C.R. for testing 𝐻0 : 𝜃 = 𝜃0 against the two tailed alternative 𝐻1 : 𝜃 = 𝜃1 (≠ 𝜃0 ), is
given by : 𝑊2 = {x: (𝑥1 + 𝑥2 ≥ 𝐶) ∪ (𝑥1 + 𝑥2 ≤ 𝐶1 )}
The regions are given by the shaded portions in the following figures (i), (ii) and (iii)
respectively.
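The constants 𝐶 and 𝐶1 of the remark above are easy to evaluate for given values of 𝜃0 and 𝜎. The following is a minimal sketch; the function name and the values 𝜃0 = 0, 𝜎 = 1 are hypothetical and used only for illustration.

```python
from statistics import NormalDist

def bcr_constants(theta0, sigma, alpha=0.05):
    """Cut-off points C and C1 for x1 + x2 when n = 2 (normal mean, sigma known)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)         # upper alpha point of N(0, 1)
    c_upper = 2 * theta0 + 2 ** 0.5 * sigma * z_alpha  # reject when x1 + x2 >= C
    c_lower = 2 * theta0 - 2 ** 0.5 * sigma * z_alpha  # reject when x1 + x2 <= C1
    return c_upper, c_lower

# Hypothetical illustration values: theta0 = 0, sigma = 1, alpha = 0.05.
C, C1 = bcr_constants(theta0=0.0, sigma=1.0)
print(f"C = {C:.4f}, C1 = {C1:.4f}")
```

In the (𝑥1 , 𝑥2 ) plane the two regions are the half-planes lying above the line 𝑥1 + 𝑥2 = 𝐶 and below the line 𝑥1 + 𝑥2 = 𝐶1 .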

Example 1. Show that for the normal distribution with zero mean and variance 𝜎², the best
critical region for 𝐻0 : 𝜎 = 𝜎0 against the alternative 𝐻1 : 𝜎 = 𝜎1 is of the form :
∑_{𝑖=1}^{𝑛} 𝑥𝑖² ≤ 𝑎𝛼 , for 𝜎0 > 𝜎1
and ∑_{𝑖=1}^{𝑛} 𝑥𝑖² ≥ 𝑏𝛼 , for 𝜎0 < 𝜎1 .
Show that the power of the best critical region when 𝜎0 > 𝜎1 is 𝐹((𝜎0²/𝜎1²) 𝜒²_{𝛼,𝑛}), where 𝜒²_{𝛼,𝑛}
is the lower 100𝛼 per cent point and 𝐹 is the distribution function of the 𝜒²-distribution with 𝑛
degrees of freedom.
and since it is independent of 𝜃1 , 𝑊1 is also the UMP C.R. for 𝐻0 : 𝜃 = 𝜃0 against
𝐻1 : 𝜃 = 𝜃1 (< 𝜃0 ).
However, since the two critical regions 𝑊0 and 𝑊1 are different, there exists no critical region
of size 𝛼 which is U.M.P. for 𝐻0 : 𝜃 = 𝜃0 against the two tailed alternative, 𝐻1 : 𝜃 ≠ 𝜃0 .
Power of the test. The power of the test for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 (> 𝜃0 ) is
given by
1 − 𝛽 = 𝑃[𝑥 ∈ 𝑊0 ∣ 𝐻1 ] = 𝑃(∑_{𝑖=1}^{𝑛} 𝑥𝑖 ≤ (1/(2𝜃0 )) 𝜒²_{1−𝛼,2𝑛} ∣ 𝐻1 )
= 𝑃(2𝜃1 ∑_{𝑖=1}^{𝑛} 𝑥𝑖 ≤ (𝜃1 /𝜃0 ) 𝜒²_{1−𝛼,2𝑛} ∣ 𝐻1 )
= 𝑃{𝜒²_{(2𝑛)} ≤ (𝜃1 /𝜃0 ) 𝜒²_{1−𝛼,2𝑛}},
since under 𝐻1 , 2𝜃1 ∑_{𝑖=1}^{𝑛} 𝑥𝑖 ∼ 𝜒²_{(2𝑛)}.
Similarly the power of the test for testing 𝐻0 : 𝜃 = 𝜃0 against 𝐻1 : 𝜃 = 𝜃1 (< 𝜃0 ) is given by:
1 − 𝛽 = 𝑃(𝑥 ∈ 𝑊1 ∣ 𝐻1 ) = 𝑃(∑_{𝑖=1}^{𝑛} 𝑥𝑖 ≥ (1/(2𝜃0 )) 𝜒²_{𝛼,2𝑛} ∣ 𝐻1 )
= 𝑃(2𝜃1 ∑_{𝑖=1}^{𝑛} 𝑥𝑖 ≥ (𝜃1 /𝜃0 ) 𝜒²_{𝛼,2𝑛} ∣ 𝐻1 )
= 𝑃{𝜒²_{(2𝑛)} ≥ (𝜃1 /𝜃0 ) 𝜒²_{𝛼,2𝑛}}
Remark. The graphic representation of the B.C.R. for 𝐻0 : 𝜃 = 𝜃0 against different alternatives
𝐻1 : 𝜃 = 𝜃1 (> 𝜃0 ), 𝐻1 : 𝜃 = 𝜃1 (< 𝜃0 ) and 𝐻1 : 𝜃 = 𝜃1 (≠ 𝜃0 ) for 𝑛 = 2, can be done similarly
as in Example 18.6, for the mean of normal distribution.
Example 2. For the distribution
𝑑𝐹 = 𝛽 exp{−𝛽(𝑥 − 𝛾)} 𝑑𝑥 for 𝑥 ≥ 𝛾, and 𝑑𝐹 = 0 for 𝑥 < 𝛾,
show that for a hypothesis 𝐻0 that 𝛽 = 𝛽0 , 𝛾 = 𝛾0 and an alternative 𝐻1 that 𝛽 = 𝛽1 , 𝛾 = 𝛾1 ,
the best critical region is given by
𝑥‾ ≤ (1/(𝛽1 − 𝛽0 )) {𝛾1 𝛽1 − 𝛾0 𝛽0 − (1/𝑛) log 𝑘 + log (𝛽1 /𝛽0 )}
provided that the admissible hypothesis is restricted by the condition 𝛾1 ≤ 𝛾0 , 𝛽1 ≥ 𝛽0 .
Solution. 𝑓(𝑥; 𝛽, 𝛾) = 𝛽 exp{−𝛽(𝑥 − 𝛾)} for 𝑥 ≥ 𝛾, and 0 otherwise.
∴ ∏_{𝑖=1}^{𝑛} 𝑓(𝑥𝑖 ; 𝛽, 𝛾) = 𝛽^𝑛 exp{−𝛽 ∑_{𝑖=1}^{𝑛} (𝑥𝑖 − 𝛾)} for 𝑥1 , 𝑥2 , … , 𝑥𝑛 ≥ 𝛾, and 0 otherwise.
Using Neyman-Pearson Lemma, B.C.R. for 𝑘 > 0, is given by
[𝛽1^𝑛 exp{−𝛽1 ∑_{𝑖=1}^{𝑛} (𝑥𝑖 − 𝛾1 )}] / [𝛽0^𝑛 exp{−𝛽0 ∑_{𝑖=1}^{𝑛} (𝑥𝑖 − 𝛾0 )}] ≥ 𝑘
⇒ (𝛽1 /𝛽0 )^𝑛 exp{−𝛽1 ∑_{𝑖=1}^{𝑛} (𝑥𝑖 − 𝛾1 ) + 𝛽0 ∑_{𝑖=1}^{𝑛} (𝑥𝑖 − 𝛾0 )} ≥ 𝑘
⇒ (𝛽1 /𝛽0 )^𝑛 exp[−𝛽1 𝑛(𝑥‾ − 𝛾1 ) + 𝛽0 𝑛(𝑥‾ − 𝛾0 )] ≥ 𝑘
⇒ 𝑛 log (𝛽1 /𝛽0 ) − 𝑛𝑥‾(𝛽1 − 𝛽0 ) + 𝑛𝛽1 𝛾1 − 𝑛𝛽0 𝛾0 ≥ log 𝑘
(since log 𝑥 is an increasing function of 𝑥 ).
⇒ 𝑥‾(𝛽1 − 𝛽0 ) ≤ {𝛾1 𝛽1 − 𝛾0 𝛽0 − (1/𝑛) log 𝑘 + log (𝛽1 /𝛽0 )}
∴ 𝑥‾ ≤ (1/(𝛽1 − 𝛽0 )) {𝛾1 𝛽1 − 𝛾0 𝛽0 − (1/𝑛) log 𝑘 + log (𝛽1 /𝛽0 )}, provided 𝛽1 > 𝛽0 .
Example 3. Examine whether a best critical region exists for testing the null hypothesis
𝐻0 : 𝜃 = 𝜃0 against the alternative hypothesis 𝐻1 : 𝜃 > 𝜃0 for the parameter 𝜃 of the distribution:
𝑓(𝑥, 𝜃) = (1 + 𝜃)/(𝑥 + 𝜃)², 1 ≤ 𝑥 < ∞
Solution. ∏_{𝑖=1}^{𝑛} 𝑓(𝑥𝑖 , 𝜃) = (1 + 𝜃)^𝑛 ∏_{𝑖=1}^{𝑛} 1/(𝑥𝑖 + 𝜃)²
By Neyman-Pearson Lemma, the B.C.R. for 𝑘 > 0, is given by
(1 + 𝜃1 )^𝑛 ∏_{𝑖=1}^{𝑛} 1/(𝑥𝑖 + 𝜃1 )² ≥ 𝑘(1 + 𝜃0 )^𝑛 ∏_{𝑖=1}^{𝑛} 1/(𝑥𝑖 + 𝜃0 )²
⇒ 𝑛 log (1 + 𝜃1 ) − 2 ∑_{𝑖=1}^{𝑛} log (𝑥𝑖 + 𝜃1 ) ≥ log 𝑘 + 𝑛 log (1 + 𝜃0 ) − 2 ∑_{𝑖=1}^{𝑛} log (𝑥𝑖 + 𝜃0 )
⇒ 2 ∑_{𝑖=1}^{𝑛} log ((𝑥𝑖 + 𝜃0 )/(𝑥𝑖 + 𝜃1 )) ≥ log 𝑘 + 𝑛 log ((1 + 𝜃0 )/(1 + 𝜃1 ))
Thus the test criterion is ∑_{𝑖=1}^{𝑛} log ((𝑥𝑖 + 𝜃0 )/(𝑥𝑖 + 𝜃1 )), which cannot be put in the form of a
function of the sample observations not depending on the hypothesis. Hence no B.C.R. exists in this case.
14.3.4 Likelihood Ratio Test:

Neyman-Pearson Lemma based on the magnitude of the ratio of two probability density
functions provides best test for testing simple hypothesis against simple alternative hypothesis.
The best test in any given situation depends on the nature of the population distribution and the
form of the alternative hypothesis being considered. In this section we shall discuss a general
method of test construction called the Likelihood Ratio (L.R.) Test introduced by Neyman and
Pearson for testing a hypothesis, simple or composite, against a simple or composite alternative
hypothesis. This test is related to the maximum likelihood estimates.
Before defining the test, we give below some notations and terminology.
Parameter Space. Let us consider a random variable 𝑋 with p.d.f. 𝑓(𝑥, 𝜃). In most common
applications, though not always, the functional form of the population distribution is assumed
to be known except for the value of some unknown parameter(s) 𝜃 which may take any value
on a set Θ. This is expressed by writing the p.d.f. in the form 𝑓(𝑥, 𝜃), 𝜃 ∈ Θ. The set Θ, which
is the set of all possible values of 𝜃, is called the parameter space. Such a situation gives rise
not to one probability distribution but to a family of probability distributions which we write as
{𝑓(𝑥, 𝜃): 𝜃 ∈ Θ}. For example if 𝑋 ∼ 𝑁(𝜇, 𝜎²), then the parameter space is :
Θ = {(𝜇, 𝜎 2 ): −∞ < 𝜇 < ∞, 0 < 𝜎 < ∞}
In particular, for 𝜎 2 = 1, the family of probability distributions is given by
{𝑁(𝜇, 1); 𝜇 ∈ Θ}, where Θ = {𝜇: −∞ < 𝜇 < ∞}
In the following discussion we shall consider a general family of distributions:
{𝑓(𝑥: 𝜃1 , 𝜃2 , … , 𝜃𝑘 ): 𝜃𝑖 ∈ Θ, 𝑖 = 1,2, … , 𝑘}
The null hypothesis 𝐻0 will state that the parameters belong to some subspace Θ0 of the
parameter space Θ.
Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be a random sample of size 𝑛 > 1 from a population with p.d.f. 𝑓(𝑥,
𝜃1 , 𝜃2 , … , 𝜃𝑘 ), where Θ, the parameter space is the totality of all points that (𝜃1 , 𝜃2 , …, 𝜃𝑘 ) can
assume. We want to test the null hypothesis :
𝐻0 : (𝜃1 , 𝜃2 , … , 𝜃𝑘 ) ∈ Θ0
against all alternative hypotheses of the type :
𝐻1 : (𝜃1 , 𝜃2 , … , 𝜃𝑘 ) ∈ Θ − Θ0
The likelihood function of the sample observations is given by
𝐿 = ∏_{𝑖=1}^{𝑛} 𝑓(𝑥𝑖 ; 𝜃1 , 𝜃2 , … , 𝜃𝑘 ) … (1)
According to the principle of maximum likelihood, the likelihood equation for estimating any
parameter 𝜃𝑖 is given by
∂𝐿/∂𝜃𝑖 = 0, (𝑖 = 1,2, … , 𝑘) … (2)
Using (2), we can obtain the maximum likelihood estimates for the parameters
(𝜃1 , 𝜃2 , … , 𝜃𝑘 ) as they are allowed to vary over the parameter space Θ and the subspace Θ0 .
Substituting these estimates in (1), we obtain the maximum values of the likelihood
function for variation of the parameters in Θ and Θ0 respectively. Then the criterion for the
likelihood ratio test is defined as the quotient of these two maxima and is given by
𝜆 = 𝜆(𝑥1 , 𝑥2 , … , 𝑥𝑛 ) = 𝐿(Θ̂0 )/𝐿(Θ̂) = [Sup_{𝜃∈Θ0} 𝐿(x, 𝜃)] / [Sup_{𝜃∈Θ} 𝐿(x, 𝜃)]
where 𝐿(Θ̂0 ) and 𝐿(Θ̂) are the maxima of the likelihood function with respect to the parameters
in the regions Θ0 and Θ respectively.
The quantity 𝜆 is a function of the sample observations only and does not involve the parameters.
Thus 𝜆, being a function of the random variables, is also a random variable. Obviously 𝜆 > 0.
Further Θ0 ⊂ Θ ⇒ 𝐿(Θ̂0 ) ≤ 𝐿(Θ̂) ⇒ 𝜆 ≤ 1
Hence, we get
0 < 𝜆 ≤ 1
The critical region for testing 𝐻0 (against 𝐻1 ) is an interval
0 < 𝜆 < 𝜆0
where 𝜆0 is some number (< 1) determined by the distribution of 𝜆 and the desired probability
of type I error, i.e., 𝜆0 is given by the equation :
𝑃(𝜆 < 𝜆0 ∣ 𝐻0 ) = 𝛼
For example, if 𝑔(⋅) is the p.d.f. of 𝜆 then 𝜆0 is determined from the equation :
∫_0^{𝜆0} 𝑔(𝜆 ∣ 𝐻0 ) 𝑑𝜆 = 𝛼
A test whose critical region is defined in this way is a likelihood ratio test for testing 𝐻0 .
Remark 1. The equations above define the critical region for testing the hypothesis 𝐻0 by the
likelihood ratio test. Suppose that the distribution of 𝜆 is not known but the distribution of some
function of 𝜆 is known; then this knowledge can be utilized as given in the following theorem.
Theorem. If 𝜆 is the likelihood ratio for testing a simple hypothesis 𝐻0 and if 𝑈 = 𝜙(𝜆) is a
monotonic increasing (decreasing) function of 𝜆 then the test based on 𝑈 is equivalent to the
likelihood ratio test. The critical region for the test based on 𝑈 is :
𝜙(0) < 𝑈 < 𝜙(𝜆0 ) [𝜙(𝜆0 ) < 𝑈 < 𝜙(0)]
Proof. The critical region for the likelihood ratio test is given by 0 < 𝜆 < 𝜆0 , where 𝜆0 is
determined by
∫_0^{𝜆0} 𝑔(𝜆 ∣ 𝐻0 ) 𝑑𝜆 = 𝛼 … ( ∗ )
Let 𝑈 = 𝜙(𝜆) be a monotonically increasing function of 𝜆. Then ( ∗ ) gives
𝛼 = ∫_0^{𝜆0} 𝑔(𝜆 ∣ 𝐻0 ) 𝑑𝜆 = ∫_{𝜙(0)}^{𝜙(𝜆0 )} ℎ(𝑢 ∣ 𝐻0 ) 𝑑𝑢
where ℎ(𝑢 ∣ 𝐻0 ) is the p.d.f. of 𝑈 when 𝐻0 is true. Here the critical region 0 < 𝜆 < 𝜆0
transforms to 𝜙(0) < 𝑈 < 𝜙(𝜆0 ). However if 𝑈 = 𝜙(𝜆) is a monotonic decreasing function
of 𝜆, then the inequalities are reversed and we get the critical region as 𝜙(𝜆0 ) < 𝑈 < 𝜙(0).
Remark 2. If we are testing a simple null hypothesis 𝐻0 then there is a unique distribution
determined for 𝜆. But if 𝐻0 is composite, then the distribution of 𝜆 may or may not be unique.
In such a case the distribution of 𝜆 may possibly be different for different parameter points in
Θ0 and then 𝜆0 is to be chosen such that
∫_0^{𝜆0} 𝑔(𝜆 ∣ 𝐻0 ) 𝑑𝜆 ≤ 𝛼
for all values of the parameters in Θ0 .

However, if we are dealing with large samples, a fairly satisfactory solution to this testing of
hypothesis problem exists as stated (without proof) in the following theorem.
Theorem. Let 𝑥1 , 𝑥2 , … , 𝑥𝑛 be a random sample from a population with p.d.f. 𝑓(𝑥;
𝜃1 , 𝜃2 , … , 𝜃𝑘 ) where the parameter space Θ is 𝑘-dimensional. Suppose we want to test the
composite hypothesis
𝐻0 : 𝜃1 = 𝜃1′ ; 𝜃2 = 𝜃2′ ; … , 𝜃𝑟 = 𝜃𝑟′ ; 𝑟 < 𝑘
where 𝜃1′ , 𝜃2′ , … , 𝜃𝑟′ are specified numbers. When 𝐻0 is true, −2 log_𝑒 𝜆 is asymptotically
distributed as chi-square with 𝑟 degrees of freedom, i.e., under 𝐻0 ,
−2 log 𝜆 ∼ 𝜒²_{(𝑟)}, if 𝑛 is large.
Since 0 ≤ 𝜆 ≤ 1, −2 log_𝑒 𝜆 is a decreasing function of 𝜆 and approaches infinity when 𝜆 →
0, so the critical region 0 < 𝜆 < 𝜆0 corresponds to the right hand tail of the chi-square
distribution for −2 log 𝜆. Thus at the level of significance ' 𝛼 ', the test may be given as follows :
Reject 𝐻0 if −2 log_𝑒 𝜆 > 𝜒²_{𝑟}(𝛼)
where 𝜒²_{𝑟}(𝛼) is the upper 𝛼-point of the chi-square distribution with 𝑟 d.f. given by :
𝑃[𝜒² > 𝜒²_{𝑟}(𝛼)] = 𝛼
otherwise 𝐻0 may be accepted.
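As an illustration of the rule above, consider the special case of testing 𝐻0 : 𝜇 = 𝜇0 for a sample from 𝑁(𝜇, 1), where the variance is assumed known and equal to 1. Here 𝜆 = exp{−𝑛(𝑥‾ − 𝜇0 )²/2}, so that −2 log 𝜆 = 𝑛(𝑥‾ − 𝜇0 )², and the upper 𝛼-point of chi-square with 1 d.f. equals the square of the upper 𝛼/2 point of the standard normal distribution. The sketch below uses this special case; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean

def lrt_normal_mean(sample, mu0, alpha=0.05):
    """Likelihood ratio test of H0: mu = mu0 for N(mu, 1) data (variance assumed known)."""
    n = len(sample)
    xbar = fmean(sample)
    stat = n * (xbar - mu0) ** 2                      # value of -2 log(lambda)
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # upper alpha point of chi-square(1)
    return stat, crit, stat > crit

# Hypothetical observations, used only to illustrate the rule.
stat, crit, reject = lrt_normal_mean([0.8, 1.9, 1.1, 2.4, 1.3], mu0=0.0)
print(f"-2 log lambda = {stat:.2f}, critical value = {crit:.4f}, reject H0: {reject}")
```

In this particular case the chi-square distribution of −2 log 𝜆 holds exactly for every 𝑛; in general it is only a large sample approximation.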

Properties of Likelihood Ratio Test. The likelihood ratio (L.R.) test principle is an intuitive one.
If we are testing a simple hypothesis 𝐻0 against a simple alternative hypothesis 𝐻1 then the 𝐿𝑅
principle leads to the same test as given by the Neyman-Pearson lemma. This suggests that the 𝐿𝑅
test has some desirable properties, especially large sample properties.
In the 𝐿𝑅 test, the probability of type 𝐼 error is controlled by suitably choosing the cut off point 𝜆0 .
The LR test is generally UMP if a UMP test exists at all. We state below the two asymptotic
properties of 𝐿𝑅 tests.
1. Under certain conditions, −2 log_𝑒 𝜆 has an asymptotic chi-square distribution.
2. Under certain assumptions, the 𝐿𝑅 test is consistent.
Now we shall illustrate how the likelihood ratio criterion can be used to obtain various
standard tests of significance in Statistics.

MCQ’s Problems
Question 1.
Let 𝑋1 , … , 𝑋𝑛 be a random sample of size 𝑛 (≥ 2) from a uniform distribution with
𝑓(𝑥, 𝜃) = 1/𝜃 for 0 < 𝑥 < 𝜃, and 0 otherwise,
where 𝜃 ∈ (0, ∞). Let 𝑋(1) = min{𝑋1 , … , 𝑋𝑛 } and 𝑋(𝑛) = max{𝑋1 , … , 𝑋𝑛 }.
Then, as 𝑛 → ∞, (1/𝜃)(𝑋(𝑛) + 𝜃/(𝑛 + 1)) converges in probability to
A. 1
B. 0
C. 2
D. 3
Question 2.
Which measure is used to determine the convexity of the distribution curve?
A. skewness
B. kurtosis
C. variance
D. standard deviation
Question 3.
Consider the simple linear regression model 𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖 , 𝑖 = 1,2, … , 𝑛, where the 𝜖𝑖 's are
i.i.d. random variables with mean 0 and variance 𝜎² ∈ (0, ∞). Suppose that we have a data set
(𝑥1 , 𝑦1 ), … , (𝑥𝑛 , 𝑦𝑛 ) with 𝑛 = 10, ∑_{𝑖=1}^{𝑛} 𝑥𝑖 = 50, ∑_{𝑖=1}^{𝑛} 𝑦𝑖 = 40, ∑_{𝑖=1}^{𝑛} 𝑥𝑖² = 500,
∑_{𝑖=1}^{𝑛} 𝑦𝑖² = 400 and ∑_{𝑖=1}^{𝑛} 𝑥𝑖 𝑦𝑖 = 400. An unbiased estimate of 𝜎² is :
A. 5
B. 1/5
C. 10
D. 1/10
Question 4.
If 𝑋1 , 𝑋2 , … , 𝑋𝑛 is a random sample from a population with density
𝑓(𝑥, 𝜃) = (1/𝜃) 𝑒^{−𝑥/𝜃} if 0 < 𝑥 < ∞, and 0 otherwise,
where 𝜃 > 0 is an unknown parameter, what is a 100(1 − 𝛼)% confidence interval for 𝜃?
A. [2∑_{𝑖=1}^{𝑛} ln 𝑋𝑖 / 𝜒²_{1−𝛼/2}(2𝑛), 2∑_{𝑖=1}^{𝑛} ln 𝑋𝑖 / 𝜒²_{𝛼/2}(2𝑛)]
B. [2∑_{𝑖=1}^{𝑛} 𝑋𝑖 / 𝜒²_{1−𝛼/2}(2𝑛), 2∑_{𝑖=1}^{𝑛} 𝑋𝑖 / 𝜒²_{𝛼/2}(2𝑛)]
C. [2∑_{𝑖=1}^{𝑛} 𝑋𝑖 / 𝜒²_{𝛼/2}(2𝑛), 2∑_{𝑖=1}^{𝑛} 𝑋𝑖 / 𝜒²_{1−𝛼/2}(2𝑛)]
D. [2∑_{𝑖=1}^{𝑛} ln 𝑋𝑖 / 𝜒²_{𝛼/2}(2𝑛), 2∑_{𝑖=1}^{𝑛} ln 𝑋𝑖 / 𝜒²_{1−𝛼/2}(2𝑛)]
Question 5.
Suppose that 𝑋 has uniform distribution on the interval [0,100]. Let 𝑌 denote the greatest
integer smaller than or equal to 𝑋. Which of the following is true?
A. 𝑃(𝑌 ≤ 25) = 1/4
B. 𝑃(𝑌 ≤ 25) = 26/100
C. 𝐸(𝑌) = 50
D. 𝐸(𝑌) = 101/2
Question 6.
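Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3.5, 𝑥4 = 2.5 be the observed values of a random sample from a
distribution with the probability density function
𝑓(𝑥 ∣ 𝜃) = (1/3) [(1/𝜃) 𝑒^{−𝑥/𝜃} + (1/𝜃²) 𝑒^{−𝑥/𝜃²} + 𝑒^{−𝑥}], 𝑥 > 0, where 𝜃 belongs to {1,2,3,4}.
Then the method of moment estimate (MME) of 𝜃 is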
A. 1
B. 2
C. 3
D. 4
Question 7.
Let the random variables 𝑋 and 𝑌 have the joint probability mass function
𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) = 𝑒^{−2} C(𝑥, 𝑦) (3/4)^𝑦 (1/4)^{𝑥−𝑦} 2^𝑥 /𝑥!, 𝑦 = 0,1,2, … , 𝑥; 𝑥 = 0,1,2, …
where C(𝑥, 𝑦) = 𝑥!/(𝑦! (𝑥 − 𝑦)!) is the binomial coefficient.
Then 𝑉(𝑌) is equal to
A. 1
B. 1/2
C. 2
D. 3/2
Question 8.
Let the discrete random variables 𝑋 and 𝑌 have the joint probability mass function
𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) = 𝑒^{−1} /((𝑛 − 𝑚)! 𝑚! 2^𝑛 ) for 𝑚 = 0,1,2, … , 𝑛; 𝑛 = 0,1,2, … , and 0 otherwise.
Then which of the following is TRUE?
A. The marginal distribution of 𝑋 is Poisson with mean 1/2
B. The random variables 𝑋 and 𝑌 are independent
C. The conditional distribution of 𝑋 given 𝑌 = 5 is Bin (6, 1/2)
D. 𝑃(𝑌 = 𝑛) = (𝑛 + 1)𝑃(𝑌 = 𝑛 + 2) for 𝑛 = 0,1,2, …

Question 9.
Consider the trinomial distribution with the probability mass function
𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) = [2!/(𝑥! 𝑦! (2 − 𝑥 − 𝑦)!)] (1/6)^𝑥 (2/6)^𝑦 (3/6)^{2−𝑥−𝑦},
𝑥 ≥ 0, 𝑦 ≥ 0, and 𝑥 + 𝑦 ≤ 2. Then Corr (𝑋, 𝑌) is equal to…
(correct up to two decimal places)
A) -0.31
B) 0.31
C) 0.35
D) 0.78
Question 10.
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2 be the observed values of a random sample of size
four from a distribution with the probability density function
𝑓(𝑥 ∣ 𝜃) = 𝑒^{𝜃−𝑥} if 𝑥 ≥ 𝜃, and 0 otherwise, where 𝜃 ∈ (−∞, ∞).
Then the maximum likelihood estimate of 𝜃² + 𝜃 + 1 is equal to (up to two decimal places)
A) 1.75
B) 1.89
C) 1.74
D) 0.87
Question 11.
Let 𝑈 ∼ 𝐹5,8 and 𝑉 ∼ 𝐹8,5. If 𝑃[𝑈 > 3.69] = 0.05, then the value of 𝑐 such that
𝑃[𝑉 > 𝑐] = 0.95 equals… (rounded off to two decimal places)
A) 0.27
B) 1.27
C) 2.27
D) 2.29

Question 12.
Let P be a probability function that assigns the same weight to each of the points of the
sample space Ω = {1,2,3,4}. Consider the events E = {1,2}, F = {1,3} and G = {3,4}. Then
which of the following statement(s) is (are) TRUE?
1. E and F are independent
2. E and G are independent
3. E, F and G are independent
A. 1 only
B. 2 only
C. 1 and 2 only
D. 1,2 and 3
Question 13.
Let 𝑋1 , 𝑋2 , … , 𝑋4 and 𝑌1 , 𝑌2 , … , 𝑌5 be two random samples of size 4 and 5 respectively, from a
standard normal population. Define the statistic
𝑇 = (5/4) (𝑋1² + 𝑋2² + 𝑋3² + 𝑋4²)/(𝑌1² + 𝑌2² + 𝑌3² + 𝑌4² + 𝑌5²),
then which of the following is TRUE?
A. Expectation of 𝑇 is 0.6
B. Variance of T is 8.97
C. T has F-distribution with degree of freedom 5 and 4
D. T has F-distribution with degree of freedom 4 and 5
Question 14.
Let 𝑋, 𝑌 and 𝑍 be independent random variables with respective moment generating functions
𝑀𝑋 (𝑡) = 1/(1 − 𝑡), 𝑡 < 1; 𝑀𝑌 (𝑡) = 𝑒^{𝑡²/2} = 𝑀𝑍 (𝑡), 𝑡 ∈ ℝ. Let 𝑊 = 2𝑋 + 𝑌² + 𝑍²; then 𝑃(𝑊 > 2)
is equal to
A. 2𝑒 −1
B. 2𝑒 −2
C. 𝑒 −1
D. 𝑒 −2
Question 15.
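Let 𝑥1 = 3, 𝑥2 = 4, 𝑥3 = 3.5, 𝑥4 = 2.5 be the observed values of a random sample from a
distribution with the probability density function
𝑓(𝑥 ∣ 𝜃) = (1/3) [(1/𝜃) 𝑒^{−𝑥/𝜃} + (1/𝜃²) 𝑒^{−𝑥/𝜃²} + 𝑒^{−𝑥}], 𝑥 > 0, 𝜃 ∈ (0, ∞)
Then the method of moment estimate (MME) of 𝜃 is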
A. 1.5
B. 2.5
C. 3.5
D. 4.5
Question 16.
Let 𝑋 and 𝑌 be random variables with the joint probability mass function
𝑃(𝑋 = 𝑛, 𝑌 = 𝑘) = (1/2)^{𝑛+2𝑘+1}; 𝑛 = −𝑘, −𝑘 + 1, … ; 𝑘 = 1,2, …
Then 𝐸(𝑌) equals
A. 1
B. 2
C. 3
D. 4
Question 17.
Let 𝑋 be a random variable with the cumulative distribution function
𝐹(𝑥) = 0 for 𝑥 < 0; (1 + 𝑥²)/10 for 0 ≤ 𝑥 < 1; (3 + 𝑥²)/10 for 1 ≤ 𝑥 < 2; 1 for 𝑥 ≥ 2.
Then which of the following is TRUE?
A. 𝑃(1 < 𝑋 < 2) = 3/10
B. 𝑃(1 < 𝑋 ≤ 2) = 3/5
C. 𝑃(1 ≤ 𝑋 < 2) = 1/2
D. 𝑃(1 ≤ 𝑋 ≤ 2) = 4/5
Question 18.
Let the random variables 𝑋1 and 𝑋2 have joint probability density function
𝑓(𝑥1 , 𝑥2 ) = 𝑥1 𝑒^{−𝑥1 𝑥2} /2 for 1 < 𝑥1 < 3, 𝑥2 > 0, and 0 otherwise.
What is the value of Var (𝑋2 ∣ 𝑋1 = 2) …(up to two decimal places)?
A) 0.27
B) 0.28
C) 0.25
D) 1.90
Question 19.
Let 𝑥1 = 1, 𝑥2 = 0, 𝑥3 = 0, 𝑥4 = 1, 𝑥5 = 0, 𝑥6 = 1 be the data on a random sample of size 6
from Bin (1, 𝜃) distribution, where 𝜃 ∈ (0,1). Then the uniformly minimum variance unbiased
estimate of 𝜃(1 + 𝜃) equal to
Question: 20
Let 𝑋1 , 𝑋2 , … be independent and identically distributed random variables with mean 2 and
variance 1. Let 𝑁 be a random variable that follows a Poisson distribution with mean 2 and is
independent of the 𝑋𝑖 's. Let 𝑆𝑁 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑁 ; then Var (𝑆𝑁 ) equals:
A. 4
B. 10
C. 2
D. 1
Question: 21
Let 𝐴 and 𝐵 be independent Random Variables each having the uniform distribution on [0,1].
Let 𝑈 = min{𝐴, 𝐵} and 𝑉 = max{𝐴, 𝐵}, then Cov (𝑈, 𝑉) is equals
A. -1/36
B. 1/36
C. 1
D. 0
Question 22.
Let 𝑋1 , 𝑋2 , 𝑋3 be a random sample from uniform (0, 𝜃²), 𝜃 > 1; then the maximum likelihood
estimate (mle) of 𝜃 is
A. 𝑋(1)²
B. √𝑋(3)
C. √𝑋(1)
D. 𝛼𝑋(1) + (1 − 𝛼)𝑋(3) ; 0 < 𝛼 < 1
Question 23.
For the discrete variate with density:
𝑓(𝑥) = (1/8) 𝐼(−1) (𝑥) + (6/8) 𝐼(0) (𝑥) + (1/8) 𝐼(1) (𝑥).
Which of the following is TRUE?
A. 𝐸(𝑋) = 1/2
B. 𝑉(𝑋) = 1/2
C. 𝑃{|𝑋 − 𝜇𝑥 | ≥ 2𝜎𝑥 } ≤ 1/4
D. 𝑃{|𝑋 − 𝜇𝑥 | ≥ 2𝜎𝑥 } ≥ 1/4
Question: 24
Let 𝑋𝑖 , 𝑌𝑖 ; (𝑖 = 1,2) be an i.i.d. random sample of size 2 from a standard normal distribution.
What is the distribution of 𝑊, where 𝑊 is given by
𝑊 = √2 (𝑋1 + 𝑋2 ) / √((𝑋2 − 𝑋1 )² + (𝑌2 − 𝑌1 )²)
A. t-distribution with 1 d.f
B. t-distribution with 2 d.f
C. Chi-square distribution with 2 d.f
D. Cannot be determined

Question: 25
The moment generating function of a random variable 𝑋 is given by
𝑀𝑋 (𝑡) = 1/6 + (1/3) 𝑒^{𝑡} + (1/3) 𝑒^{2𝑡} + (1/6) 𝑒^{3𝑡}, −∞ < 𝑡 < ∞; then 𝑃(𝑋 ≤ 2) equals
A. 1/3
B. 1/6
C. 1/2
D. 5/6
Question: 26
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from the 𝐔(𝜃 − 1/2, 𝜃 + 1/2) distribution, where 𝜃 ∈ ℝ. Let
𝑋(1) = min{𝑋1 , 𝑋2 , … , 𝑋𝑛 } and 𝑋(𝑛) = max{𝑋1 , 𝑋2 , … , 𝑋𝑛 }.
Define 𝑇1 = (1/2)(𝑋(1) + 𝑋(𝑛) ), 𝑇2 = (1/4)(3𝑋(1) + 𝑋(𝑛) + 1) and 𝑇3 = (1/2)(3𝑋(𝑛) − 𝑋(1) − 2) as
estimators for 𝜃; then which of the following is/are TRUE?
A. 𝑇1 and 𝑇2 are MLE for 𝜃 but 𝑇3 is not MLE for 𝜃
B. 𝑇1 is MLE for 𝜃 but 𝑇2 and 𝑇3 are not MLE for 𝜃
C. 𝑇1 , 𝑇2 and 𝑇3 are MLE for 𝜃
D. 𝑇1 , 𝑇2 and 𝑇3 are not MLE for 𝜃
Question: 27
Let 𝑋 and 𝑌 be random variables having joint probability density function
𝑓(𝑥, 𝑦) = 𝑘/((1 + 𝑥²)(1 + 𝑦²)); −∞ < 𝑥, 𝑦 < ∞
where 𝑘 is a constant; then which of the following is/are TRUE?
A. 𝑘 = 1/𝜋²
B. 𝑓(𝑥) = (1/𝜋) ⋅ 1/(1 + 𝑥²); −∞ < 𝑥 < ∞
C. 𝑃(𝑋 = 𝑌) = 0
D. All of the above
Question: 28
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a sequence of independently and identically distributed random variables
with the probability density function
𝑓(𝑥) = (1/2) 𝑥² 𝑒^{−𝑥} if 𝑥 > 0, and 0 otherwise,
and let 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 ; then which of the following statements is/are TRUE?
A. (𝑆𝑛 − 3𝑛)/√(3𝑛) ∼ 𝑁(0,1) for all 𝑛 ≥ 1
B. For all 𝜀 > 0, 𝑃(|𝑆𝑛 /𝑛 − 3| > 𝜀) → 0 as 𝑛 → ∞
C. 𝑆𝑛 /𝑛 → 1 with probability 1
D. Both A and B
Question: 29
Let 𝑋 and 𝑌 be i.i.d. Binomial (𝑛, 𝑝) random variables. Which of the following are true?
A. 𝑋 + 𝑌 ∼ Bin (2𝑛, 𝑝)
B. (X, Y) ∼ Multinomial (2n; p, p)
C. Var (X − Y) = E(X − Y)2
D. option A and C are correct.
Question: 30
Let 𝑋 and 𝑌 be continuous random variables with the joint probability density function
𝑓(𝑥, 𝑦) = (1/(2𝜋)) 𝑒^{−(𝑥² + 𝑦²)/2}; (𝑥, 𝑦) ∈ ℝ²
Which of the following statements is/are TRUE?
A. 𝑃(𝑋 > 0) = 1/2
B. 𝑃(𝑋 > 0 ∣ 𝑌 < 0) = 1/2
C. 𝑃(𝑋 > 0, 𝑌 < 0) = 1/4
D. All of the above

Question: 31
Let 𝑋 and 𝑌 be random variables with 𝐸[𝑋] = 𝐸[𝑌]; then which of the following is NOT
necessarily TRUE?
A. E{E[X ∣ Y]} = E[Y]
B. V(𝑋 − 𝑌) = 𝐸(𝑋 − 𝑌)2
C. 𝐸[𝑉(𝑋 ∣ 𝑌)] + 𝑉[𝐸(𝑋 ∣ 𝑌)] = 𝑉(𝑋)
D. X and Y have same distribution
Question: 32
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from the Exp (𝜃) distribution, where 𝜃 ∈ (0, ∞).
If 𝑋‾ = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑋𝑖 , then a 95% confidence interval for 𝜃 is
A. (0, 𝜒²_{2𝑛,0.95} /(𝑛𝑋‾)]
B. [𝜒²_{2𝑛,0.95} /(𝑛𝑋‾), ∞)
C. (0, 𝜒²_{2𝑛,0.95} /(2𝑛𝑋‾)]
D. [𝜒²_{2𝑛,0.95} /(2𝑛𝑋‾), ∞)
Question: 33
Let 𝑋𝑖 , 𝑖 = 1,2, … be independent random variables all distributed according to the PDF
𝑓𝑥 (𝑥) = 1, 0 ≤ 𝑥 ≤ 1. Define 𝑌𝑛 = 𝑋1 𝑋2 𝑋3 … 𝑋𝑛 , for some integer 𝑛. Then Var (𝑌𝑛 ) is equal to
A. 𝑛/12
B. 1/3^𝑛 − 1/2^{2𝑛}
C. 1/12^𝑛
D. 1/12
Question: 34
Let 𝑋1 , 𝑋2 , … , 𝑋4 be i.i.d random variables having continuous distribution.
Then 𝑃(𝑋3 < 𝑋2 < max(𝑋1 , 𝑋4 )) equal
A. 1/2
B. 1/3
C. 1/4
D. 1/6
Question: 35
Let 𝑋1 , 𝑋2 , … , 𝑋𝑛 be a random sample from the U(𝜃 − 1/2, 𝜃 + 1/2) distribution, where 𝜃 ∈ ℝ. Let
𝑋(1) = min{𝑋1 , 𝑋2 , … , 𝑋𝑛 } and 𝑋(𝑛) = max{𝑋1 , 𝑋2 , … , 𝑋𝑛 }.
Consider the following statements:
1. 𝑇1 = (1/2)(𝑋(1) + 𝑋(𝑛) ) is consistent for 𝜃
2. 𝑇2 = (1/4)(3𝑋(1) + 𝑋(𝑛) + 1) is unbiased and consistent for 𝜃
A. 1 only
B. 2 only
C. Both 1 and 2
D. Neither 1 nor 2
14.5 SUMMARY
14.6 GLOSSARY
• Motivation: These problems are very useful in real life and can be used in data
science, economics as well as social science.
• Attention: Think about how the best estimators are useful in real world problems.
Answer 1: A
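Explanation :
𝑓(𝑥, 𝜃) = 1/𝜃 for 0 < 𝑥 < 𝜃, and 0 otherwise.
𝐸(𝑋(𝑛) ) = 𝑛𝜃/(𝑛 + 1)
Let 𝑌 = (1/𝜃)(𝑋(𝑛) + 𝜃/(𝑛 + 1)),
so 𝐸(𝑌) = 𝐸(𝑋(𝑛) /𝜃 + 1/(𝑛 + 1)) = 𝑛/(𝑛 + 1) + 1/(𝑛 + 1) = 1, and in particular
lim_{𝑛→∞} 𝐸[(1/𝜃)(𝑋(𝑛) + 𝜃/(𝑛 + 1))] = 1.
Since 𝑋(𝑛) converges in probability to 𝜃 and 1/(𝑛 + 1) → 0, 𝑌 converges in probability to 1.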
Answer 2: B
Explanation:
Convexity (peakedness) is decided by kurtosis.
Answer 3: C
Explanation :
𝑦𝑖 = 𝛼 + 𝛽𝑥𝑖 + 𝜖𝑖 , 𝑖 = 1,2, … , 𝑛
𝛽ˆ = (∑_{𝑖=1}^{𝑛} 𝑥𝑖 𝑦𝑖 − 𝑛𝑥‾𝑦‾)/(∑_{𝑖=1}^{𝑛} 𝑥𝑖² − 𝑛𝑥‾²) = (400 − 10 × 5 × 4)/(500 − 10 × 5²) = 4/5;
𝛼ˆ = 𝑦‾ − 𝑥‾𝛽ˆ = 4 − 5 × (4/5) = 0
𝜎ˆ² = (1/(𝑛 − 2)) ∑_{𝑖=1}^{𝑛} (𝑦𝑖 − 𝛼ˆ − 𝛽ˆ𝑥𝑖 )² = (1/(10 − 2)) ∑_{𝑖=1}^{𝑛} (𝑦𝑖 − (4/5)𝑥𝑖 )²
= (1/8)(∑_{𝑖=1}^{𝑛} 𝑦𝑖² − 2 × (4/5) ∑_{𝑖=1}^{𝑛} 𝑥𝑖 𝑦𝑖 + (4/5)² ∑_{𝑖=1}^{𝑛} 𝑥𝑖²) = (1/8)(400 − (8/5) × 400 + (16/25) × 500)
= 10
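The arithmetic of this explanation can be checked with exact fractions. The sketch below uses only the summary figures given in Question 3; the variable names are illustrative, and the residual sum of squares is computed from the identity RSS = ∑𝑦𝑖² − 𝛼ˆ∑𝑦𝑖 − 𝛽ˆ∑𝑥𝑖 𝑦𝑖 , which follows from the normal equations.

```python
from fractions import Fraction

# Summary statistics given in Question 3.
n = 10
sum_x, sum_y = Fraction(50), Fraction(40)
sum_xx, sum_yy, sum_xy = Fraction(500), Fraction(400), Fraction(400)

xbar, ybar = sum_x / n, sum_y / n
beta_hat = (sum_xy - n * xbar * ybar) / (sum_xx - n * xbar ** 2)   # least squares slope
alpha_hat = ybar - beta_hat * xbar                                 # least squares intercept

# Residual sum of squares via the normal equations.
rss = sum_yy - alpha_hat * sum_y - beta_hat * sum_xy
sigma2_hat = rss / (n - 2)                                         # unbiased estimate of sigma^2

print(beta_hat, alpha_hat, sigma2_hat)
```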
Answer 4: B
Explanation :
We use the random variable 𝑄 = (2/𝜃) ∑_{𝑖=1}^{𝑛} 𝑋𝑖 ∼ 𝜒²_{(2𝑛)} as the pivotal quantity. The 100(1 − 𝛼)%
confidence interval for 𝜃 can be constructed from
1 − 𝛼 = 𝑃(𝜒²_{𝛼/2}(2𝑛) ≤ 𝑄 ≤ 𝜒²_{1−𝛼/2}(2𝑛)) = 𝑃[2∑_{𝑖=1}^{𝑛} 𝑋𝑖 /𝜒²_{1−𝛼/2}(2𝑛) ≤ 𝜃 ≤ 2∑_{𝑖=1}^{𝑛} 𝑋𝑖 /𝜒²_{𝛼/2}(2𝑛)]
Thus, the 100(1 − 𝛼)% confidence interval for 𝜃 is given by
[2∑_{𝑖=1}^{𝑛} 𝑋𝑖 /𝜒²_{1−𝛼/2}(2𝑛), 2∑_{𝑖=1}^{𝑛} 𝑋𝑖 /𝜒²_{𝛼/2}(2𝑛)]
Answer 5: B
Explanation :
Let 𝑌 = [𝑋], where 𝑌 denotes the greatest integer smaller than or equal to 𝑋.
𝑃(𝑌 ≤ 25) = 𝑃([𝑋] ≤ 25) = 𝑃(𝑋 ∈ [0,26)) = ∫_0^{26} (1/100) 𝑑𝑥 = 26/100
Hence B is the correct option.
Answer 6: C
Explanation :
𝑥‾ = (1/4)(3 + 4 + 3.5 + 2.5) = 3.25
𝐸(𝑋) = (1/3)[𝜃 + 𝜃² + 1]Γ(2) = (1/3)[𝜃 + 𝜃² + 1] = 3.25
𝜃² + 𝜃 − 8.75 = 0, then 𝜃 = 2.5 or −3.5
Since 𝜃 ∈ {1,2,3,4} then 𝜃 = 3
Answer 7: D
Explanation :
The marginal pmf of 𝑌 is given by
𝑃(𝑌 = 𝑦) = ∑_{𝑥=𝑦}^{∞} 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦) = ∑_{𝑥=𝑦}^{∞} 𝑒^{−2} C(𝑥, 𝑦) (3/4)^𝑦 (1/4)^{𝑥−𝑦} 2^𝑥 /𝑥!
= 𝑒^{−2} (3/4)^𝑦 ∑_{𝑢=0}^{∞} C(𝑦 + 𝑢, 𝑦) (1/4)^𝑢 2^{𝑦+𝑢} /(𝑦 + 𝑢)! (putting 𝑢 = 𝑥 − 𝑦)
= 𝑒^{−2} (3/4)^𝑦 ∑_{𝑢=0}^{∞} [(𝑦 + 𝑢)!/(𝑦! 𝑢!)] (1/4)^𝑢 2^{𝑦+𝑢} /(𝑦 + 𝑢)! = 𝑒^{−2} (3/4)^𝑦 (2^𝑦 /𝑦!) ∑_{𝑢=0}^{∞} (1/𝑢!) (1/2)^𝑢
= 𝑒^{−2} (3/4)^𝑦 (2^𝑦 /𝑦!) 𝑒^{1/2} = 𝑒^{−3/2} (3/2)^𝑦 /𝑦!, 𝑦 = 0,1, …
which is the pmf of a Poisson random variable with parameter 3/2, so 𝐸(𝑌) = 3/2 and
𝑉(𝑌) = 3/2.
Answer 8: A
Explanation :
The marginal probability mass function of 𝑋 is given by
𝑃(𝑋 = 𝑚) = ∑_{𝑛=𝑚}^{∞} 𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) = 𝑒^{−1/2} (1/2)^𝑚 /𝑚!, 𝑚 = 0,1,2, …
Thus the marginal distribution of 𝑋 is Poisson with mean 1/2.
The marginal probability mass function of 𝑌 is given by
𝑃(𝑌 = 𝑛) = ∑_{𝑚=0}^{𝑛} 𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) = 𝑒^{−1} /𝑛!, 𝑛 = 0,1,2, …
Thus the marginal distribution of 𝑌 is Poisson with mean 1.
𝑃(𝑋 = 𝑚, 𝑌 = 𝑛) ≠ 𝑃(𝑋 = 𝑚)𝑃(𝑌 = 𝑛)
Therefore 𝑋 and 𝑌 are not independent.
𝑃(𝑋 = 𝑚 ∣ 𝑌 = 5) = 𝑃(𝑋 = 𝑚, 𝑌 = 5)/𝑃(𝑌 = 5) = [5!/(𝑚! (5 − 𝑚)!)] (1/2)^5, 𝑚 = 0,1,2, … ,5
Thus the conditional distribution of 𝑋 given 𝑌 = 5 is Bin (5, 1/2), not Bin (6, 1/2).
Also 𝑃(𝑌 = 𝑛)/𝑃(𝑌 = 𝑛 + 1) = 𝑛 + 1 for 𝑛 = 0,1,2, … , so 𝑃(𝑌 = 𝑛) = (𝑛 + 1)𝑃(𝑌 = 𝑛 + 1).
Answer 9: 𝑨
Explanation :
The trinomial distribution of two r.v.'s 𝑋 and 𝑌 is given by
𝑓𝑋,𝑌 (𝑥, 𝑦) = [𝑛!/(𝑥! 𝑦! (𝑛 − 𝑥 − 𝑦)!)] 𝑝^𝑥 𝑞^𝑦 (1 − 𝑝 − 𝑞)^{𝑛−𝑥−𝑦}
for 𝑥, 𝑦 = 0,1,2, … , 𝑛 and 𝑥 + 𝑦 ≤ 𝑛, where 𝑝 + 𝑞 ≤ 1.
Here 𝑛 = 2, 𝑝1 = 𝑝 = 1/6 and 𝑝2 = 𝑞 = 2/6.
Var (𝑋) = 𝑛𝑝1 (1 − 𝑝1 ) = 2 × (1/6)(1 − 1/6) = 10/36; Var (𝑌) = 𝑛𝑝2 (1 − 𝑝2 ) = 2 × (2/6)(1 − 2/6) = 16/36
Cov (𝑋, 𝑌) = −𝑛𝑝1 𝑝2 = −2 × (1/6) × (2/6) = −4/36
Corr (𝑋, 𝑌) = Cov (𝑋, 𝑌)/(√Var (𝑋) √Var (𝑌)) = −4/(4√10) = −1/√10 ≈ −0.316
Truncated to two decimal places this is −0.31.
Hence −0.31 is the correct answer.
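The moments used in this explanation can also be obtained by direct enumeration of the trinomial probabilities of Question 9. The following is a minimal sketch with exact fractions; the helper name `expect` is illustrative.

```python
from fractions import Fraction
from math import factorial, sqrt

n = 2
p, q = Fraction(1, 6), Fraction(2, 6)
r = 1 - p - q                                  # probability of the third category

# Joint pmf over all (x, y) with x + y <= n.
pmf = {}
for x in range(n + 1):
    for y in range(n + 1 - x):
        coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
        pmf[(x, y)] = coef * p**x * q**y * r**(n - x - y)

def expect(g):
    """Expectation of g(X, Y) under the joint pmf."""
    return sum(g(x, y) * w for (x, y), w in pmf.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: x * x) - ex**2
var_y = expect(lambda x, y: y * y) - ey**2
cov = expect(lambda x, y: x * y) - ex * ey
corr = float(cov) / sqrt(var_x * var_y)

print(var_x, var_y, cov, round(corr, 4))
```

The enumeration gives the same variances and covariance as the formulas 𝑛𝑝(1 − 𝑝) and −𝑛𝑝1 𝑝2 , and a correlation of −1/√10.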
Answer 10: 𝐀
Explanation :
Let 𝑥1 = 1.1, 𝑥2 = 0.5, 𝑥3 = 1.4, 𝑥4 = 1.2
𝑓(𝑥 ∣ 𝜃) = 𝑒^{𝜃−𝑥} if 𝑥 ≥ 𝜃, and 0 otherwise, where 𝜃 ∈ (−∞, ∞).
The likelihood is 𝐿(𝜃) = exp (𝑛𝜃 − ∑𝑥𝑖 ) for 𝜃 ∈ (−∞, 𝑋(1) ], and 0 otherwise.
Since 𝑑𝐿/𝑑𝜃 > 0 ∀𝜃 ∈ (−∞, 𝑋(1) ], 𝐿(𝜃) is a strictly increasing function on this interval.
So 𝜃ˆ = 𝑋(1) = 0.5, and therefore by the invariance
property the MLE of 𝜃² + 𝜃 + 1 is (0.5)² + 0.5 + 1 = 1.75.
Hence the MLE for 𝜃² + 𝜃 + 1 is 1.75.
Answer 11: 𝐀
Explanation :
If 𝑋 ∼ 𝐹(𝑚, 𝑛) then 1/𝑋 ∼ 𝐹(𝑛, 𝑚). Here 𝑈 ∼ 𝐹(5,8), so 𝑉 = 1/𝑈 ∼ 𝐹(8,5).
𝑃[𝑈 > 3.69] = 0.05 ⇒ 𝑃[𝑈 < 3.69] = 1 − 0.05 = 0.95
⇒ 𝑃[1/𝑈 > 1/3.69] = 0.95 ⇒ 𝑃[𝑉 > 1/3.69] = 0.95, and so
𝑐 = 1/3.69 = 0.27
Hence 𝑐 = 0.27 is the correct answer.
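Answer 12: A
Explanation :
Clearly, P({𝜔}) = 1/4 ∀𝜔 ∈ Ω = {1,2,3,4}. We have E = {1,2}, F = {1,3} and G = {3,4}.
Then P(E) = P(F) = P(G) = 2/4 = 1/2.
Since E ∩ F = {1}, P(E ∩ F) = 1/4 = P(E)P(F), so E and F are independent. However, E ∩ G is
empty, so P(E ∩ G) = 0 ≠ 1/4 = P(E)P(G); hence E and G are not independent, and consequently
E, F and G are not independent.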

Answer 13: D
Explanation :
𝑇 = (5/4) (𝑋1² + 𝑋2² + 𝑋3² + 𝑋4²)/(𝑌1² + 𝑌2² + 𝑌3² + 𝑌4² + 𝑌5²) ∼ 𝐹(4,5); with 𝑚 = 4 and 𝑛 = 5,
𝐸(𝑇) = 𝑛/(𝑛 − 2) = 5/3
Var (𝑇) = 2𝑛²(𝑚 + 𝑛 − 2)/(𝑚(𝑛 − 2)²(𝑛 − 4)) = 2(5)²(7)/(4(3)²(1)) = 350/36 = 9.72
Answer 14: A
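Explanation :
𝑋 has the exponential distribution with mean 1, so 2𝑋 ∼ 𝜒²_{(2)}; 𝑌 and 𝑍 are 𝑁(0,1), so
𝑌² + 𝑍² ∼ 𝜒²_{(2)}. Since 𝑋, 𝑌 and 𝑍 are independent, 𝑊 = 2𝑋 + 𝑌² + 𝑍² ∼ 𝜒²_{(4)}, with
𝑓𝑊 (𝑤) = (1/4) 𝑤𝑒^{−𝑤/2} if 𝑤 > 0, and 0 otherwise.
𝑃(𝑊 > 2) = ∫_2^{∞} (1/4) 𝑤𝑒^{−𝑤/2} 𝑑𝑤 = 2𝑒^{−1}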
Answer 15: B
Explanation :
𝑥‾ = (1/4)(3 + 4 + 3.5 + 2.5) = 3.25
𝐸(𝑋) = (1/3)[𝜃 + 𝜃² + 1]Γ(2) = (1/3)[𝜃 + 𝜃² + 1] = 3.25
𝜃² + 𝜃 − 8.75 = 0, then 𝜃 = 2.5 or −3.5
Since 𝜃 ∈ (0, ∞) then 𝜃 = 2.5
Answer 16: B
Explanation :
𝑃(𝑌 = 𝑘) = ∑_{𝑛=−𝑘}^{∞} 𝑃(𝑋 = 𝑛, 𝑌 = 𝑘) = ∑_{𝑚=0}^{∞} (1/2)^{𝑚+𝑘+1} {putting 𝑚 = 𝑛 + 𝑘}
= (1/2)(1/2)^{𝑘−1}, 𝑘 = 1,2, …
which is the pmf of the geometric distribution with parameter 1/2. Hence
𝐸(𝑌) = ∑_{𝑘=1}^{∞} 𝑘 (1/2)(1/2)^{𝑘−1} = 2
2 2
Answer 17: A
Explanation :
Here 𝐹(1) = 2/5, 𝐹(2) = 1, 𝑃(𝑋 = 1) = 𝐹(1) − 𝐹(1−) = 1/5 and 𝑃(𝑋 = 2) = 𝐹(2) − 𝐹(2−) = 3/10.
𝑃(1 < 𝑋 < 2) = 𝐹(2) − 𝐹(1) − 𝑃(𝑋 = 2) = 3/10
𝑃(1 < 𝑋 ≤ 2) = 𝐹(2) − 𝐹(1) = 3/5
𝑃(1 ≤ 𝑋 < 2) = 𝐹(2) − 𝐹(1) − 𝑃(𝑋 = 2) + 𝑃(𝑋 = 1) = 1/2
𝑃(1 ≤ 𝑋 ≤ 2) = 𝐹(2) − 𝐹(1) + 𝑃(𝑋 = 1) = 4/5
Answer 18: C
Explanation :
The marginal pdf of 𝑋1 is 𝑔(𝑥1 ) = ∫_0^{∞} (𝑥1 𝑒^{−𝑥1 𝑥2} /2) 𝑑𝑥2 = 1/2, 1 < 𝑥1 < 3
ℎ(𝑥2 ∣ 𝑥1 ) = 𝑓(𝑥1 , 𝑥2 )/𝑔(𝑥1 ) = 𝑥1 𝑒^{−𝑥1 𝑥2}, 𝑥2 > 0
𝑋2 ∣ 𝑋1 = 𝑥1 ∼ Exp (𝑥1 ) with mean 1/𝑥1 and variance 1/𝑥1²
Therefore Var (𝑋2 ∣ 𝑋1 = 2) = 1/4 = 0.25
Hence 0.25 is the correct answer.
Answer 19: 0.7
Explanation :
𝑇/𝑛 + 𝑇(𝑇 − 1)/(𝑛(𝑛 − 1)) is the UMVUE of 𝜃(1 + 𝜃), where 𝑇 = ∑_{𝑖=1}^{𝑛} 𝑋𝑖 , since
𝐸(𝑇/𝑛 + 𝑇(𝑇 − 1)/(𝑛(𝑛 − 1))) = 𝜃 + 𝜃² = 𝜃(1 + 𝜃).
Here 𝑛 = 6 and 𝑇 = 3.
Therefore, 𝑇/𝑛 + 𝑇(𝑇 − 1)/(𝑛(𝑛 − 1)) = 3/6 + 3(3 − 1)/(6(6 − 1)) = 21/30 = 0.70
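The unbiasedness claimed in this explanation can be verified by computing the expectation of the estimator exactly under the Bin(𝑛, 𝜃) distribution of 𝑇. The sketch below does this for one trial value of 𝜃; the function names and the value 𝜃 = 1/3 are hypothetical and used only for illustration.

```python
from fractions import Fraction
from math import comb

def umvue(t, n):
    """Value of T/n + T(T-1)/(n(n-1)) for T = t."""
    return Fraction(t, n) + Fraction(t * (t - 1), n * (n - 1))

def expected_umvue(theta, n):
    """Exact expectation of the estimator when T ~ Bin(n, theta)."""
    return sum(comb(n, t) * theta**t * (1 - theta)**(n - t) * umvue(t, n)
               for t in range(n + 1))

estimate = umvue(3, 6)            # observed data of Question 19: T = 3, n = 6
theta = Fraction(1, 3)            # hypothetical trial value of theta
print(estimate, expected_umvue(theta, 6), theta * (1 + theta))
```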

Answer 20: B
Explanation:
Let 𝑋1 , 𝑋2 , … be independent and identically distributed random variables and let 𝑁 be a
random variable independent of them. Define 𝑆𝑁 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑁 .
Then 𝐸(𝑆𝑁 ) = 𝐸(𝑋𝑖 ) ⋅ 𝐸(𝑁) = 2 × 2 = 4
𝑉(𝑆𝑁 ) = 𝐸(𝑁)Var (𝑋𝑖 ) + [𝐸(𝑋𝑖 )]² Var (𝑁) = 2 × 1 + 4 × 2 = 10
Answer 21: B
Explanation:
If 𝐴 and 𝐵 are independent random variables each having the uniform distribution on [0,1],
and 𝑈 = min{𝐴, 𝐵} and 𝑉 = max{𝐴, 𝐵}, then
𝐸(𝑈) = 1/3, 𝐸(𝑉) = 2/3, 𝑈𝑉 = 𝐴𝐵 and 𝑈 + 𝑉 = 𝐴 + 𝐵.
Thus Cov (𝑈, 𝑉) = 𝐸(𝑈𝑉) − 𝐸(𝑈)𝐸(𝑉) = 𝐸(𝐴𝐵) − 𝐸(𝑈)𝐸(𝑉)
= 𝐸(𝐴) ⋅ 𝐸(𝐵) − 𝐸(𝑈) ⋅ 𝐸(𝑉) = 1/4 − 2/9 = 1/36
Answer 22: B
Explanation:
𝑋𝑖 ∼ 𝑈(0, 𝜃²), 𝑓(𝑥) = 1/𝜃²; 0 < 𝑥𝑖 < 𝜃²
𝑋(3) ≤ 𝜃² ⇒ 𝜃 ∈ [√𝑋(3) , ∞)
𝐿(𝑋, 𝜃) = ∏_{𝑖=1}^{3} 𝑓(𝑥𝑖 , 𝜃) = 1/𝜃⁶
⇒ ∂𝐿/∂𝜃 < 0; therefore the likelihood is a decreasing function of 𝜃 and is maximised at the
smallest admissible value, so 𝜃ˆ = √𝑋(3)
Answer 23: C
Explanation:
𝑥:          −1     0     1
𝑃(𝑋 = 𝑥):   1/8    6/8   1/8
𝐸(𝑋) = −1 × (1/8) + 0 × (6/8) + 1 × (1/8) = 0
𝐸(𝑋²) = 1 × (1/8) + 0 × (6/8) + 1 × (1/8) = 1/4
𝑉(𝑋) = 𝐸(𝑋²) − {𝐸(𝑋)}² = 1/4 ⇒ 𝜎𝑋 = 1/2
𝑃{|𝑋 − 𝜇𝑥 | ≥ 2𝜎𝑥 } = 𝑃{|𝑋| ≥ 1} = 1 − 𝑃(|𝑋| < 1)
= 1 − 𝑃(−1 < 𝑋 < 1) = 1 − 𝑃(𝑋 = 0) = 1/4
𝑃{|𝑋 − 𝜇𝑥 | ≥ 2𝜎𝑥 } ≤ 1/4 [By Chebychev’s inequality]
Answer 24: B
Explanation:
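Let 𝑋𝑖 , 𝑌𝑖 ; (𝑖 = 1,2) be an i.i.d. random sample of size 2 from a standard normal distribution.
Then (𝑋1 + 𝑋2 )/√2 ∼ 𝑁(0,1) and [(𝑋2 − 𝑋1 )² + (𝑌2 − 𝑌1 )²]/2 ∼ 𝜒²_{(2)}, and the two are independent.
Hence 𝑊 = √2 (𝑋1 + 𝑋2 )/√((𝑋2 − 𝑋1 )² + (𝑌2 − 𝑌1 )²) ∼ 𝑡(2)
Hence option (b) is correct.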
Answer 25: D
Explanation:
Let 𝑋 be a random variable with 𝑀𝑋 (𝑡) = 𝐸(𝑒^{𝑡𝑋}) = ∑ 𝑒^{𝑡𝑥} 𝑃(𝑋 = 𝑥). Comparing with the given 𝑀𝑋 (𝑡),
𝑃(𝑋 = 𝑥) = 1/6 for 𝑥 = 0; 1/3 for 𝑥 = 1; 1/3 for 𝑥 = 2; 1/6 for 𝑥 = 3.
𝑃(𝑋 ≤ 2) = 𝑃(𝑋 = 0) + 𝑃(𝑋 = 1) + 𝑃(𝑋 = 2)
= 1/6 + 1/3 + 1/3 = 5/6
Answer 26: A
Explanation:
𝑓(𝑥) = 1; 𝜃 − 1/2 < 𝑥𝑖 < 𝜃 + 1/2
𝜃ˆ ∈ [𝑋(𝑛) − 1/2, 𝑋(1) + 1/2]
Since the likelihood is constant (free from the parameter) on this interval, every point of it is an
MLE, i.e.
𝜃ˆ = 𝜆(𝑋(𝑛) − 1/2) + (1 − 𝜆)(𝑋(1) + 1/2); 0 < 𝜆 < 1
Taking 𝜆 = 1/2, 1/4 and 3/4, the MLEs of 𝜃 obtained are
(1/2)(𝑋(1) + 𝑋(𝑛) ); (1/4)(3𝑋(1) + 𝑋(𝑛) + 1); (1/4)(𝑋(1) + 3𝑋(𝑛) − 1) respectively.
𝑇1 and 𝑇2 are of this form, while 𝑇3 = (1/2)(3𝑋(𝑛) − 𝑋(1) − 2) corresponds to 𝜆 = 3/2, which lies
outside (0, 1).
Hence option (a) is correct.

Answer 27: D
Explanation:
Let 𝑋 and 𝑌 be random variables having joint probability
density function 𝑓(𝑥, 𝑦) = 𝑘/((1 + 𝑥²)(1 + 𝑦²)); −∞ < 𝑥, 𝑦 < ∞
∫_{−∞}^{∞} ∫_{−∞}^{∞} 𝑓(𝑥, 𝑦) 𝑑𝑥 𝑑𝑦 = 1 ⇒ 𝑘𝜋² = 1 ⇒ 𝑘 = 1/𝜋²
Since the joint density factorises, 𝑋 and 𝑌 are independent, and 𝑋 ∼ 𝑓(𝑥) = (1/𝜋) ⋅ 1/(1 + 𝑥²); −∞ < 𝑥 < ∞
𝑃(𝑋 = 𝑌) = 0 {The line 𝑋 = 𝑌 has zero area in the plane, so for continuous (𝑋, 𝑌) the probability
corresponding to this region is zero}
Answer 28: B
Explanation:
Clearly, 𝑋1 , 𝑋2 , … , 𝑋𝑛
are i.i.d. 𝐺(3,1) random variables. Then, 𝐸(𝑋𝑖 ) = 3 and Var (𝑋𝑖 ) = 3, 𝑖 = 1,2, …
Let 𝑆𝑛 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 ; then 𝐸(𝑆𝑛 ) = 3𝑛 and Var (𝑆𝑛 ) = 3𝑛
For option (a)
𝑆𝑛 follows the 𝐺(3𝑛, 1) distribution, so (𝑆𝑛 − 3𝑛)/√(3𝑛) is not exactly 𝑁(0,1) for any finite 𝑛;
by the CLT it is only approximately 𝑁(0,1) when 𝑛 is large. Option (a) is not true.
For option (b)
lim_{𝑛→∞} 𝐸(𝑆𝑛 /𝑛) = lim_{𝑛→∞} 3𝑛/𝑛 = 3; lim_{𝑛→∞} 𝑉(𝑆𝑛 /𝑛) = lim_{𝑛→∞} 3𝑛/𝑛² = 0
By using the convergence in probability condition
(consistency property),
for all 𝜀 > 0, 𝑃(|𝑆𝑛 /𝑛 − 3| > 𝜀) → 0 as 𝑛 → ∞. Option (b) is true.
For option (c)
𝑆𝑛 /𝑛 → 3 with probability 1 (by the strong law of large numbers), not 1. Option (c) is not true.
For option (d)
Since option (a) is not true, option (d) is not true.
Remark. By the CLT,
lim_{𝑛→∞} 𝑃((𝑆𝑛 − 𝐸(𝑆𝑛 ))/√Var (𝑆𝑛 ) ≥ (3(𝑛 − √𝑛) − 𝐸(𝑆𝑛 ))/√Var (𝑆𝑛 )) = 𝑃(𝑍 ≥ −√3)
= 1 − Φ(−√3) ≥ 1/2
Answer 29: D
Explanation:
(A) The sum of independent binomial variates is also a binomial variate if the success probabilities are the same; then $X + Y \sim \mathrm{Bin}(2n, p)$.
(B) When more than two categories are involved, the observations lead to a multinomial distribution. $(X, Y)$ does not follow Multinomial$(2n; p, p)$.
(C) $\mathrm{Var}(X - Y) = E(X - Y)^2 - \{E(X - Y)\}^2 = E(X - Y)^2$, since $E(X - Y) = 0$.
(D) $\mathrm{Cov}(X + Y, X - Y) = V(X) - \mathrm{Cov}(X, Y) + \mathrm{Cov}(Y, X) - V(Y) = 0$
{since $X$ and $Y$ are independent, $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X) = 0$, and $V(X) = V(Y)$}
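The statements in Answer 29 can be verified exactly for a small case. This sketch assumes $X$ and $Y$ are independent $\mathrm{Bin}(n, p)$ variables and picks `n = 3`, `p = 1/3` purely for illustration; `binom_pmf` and `expect` are hypothetical helper names.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    """Exact pmf of Bin(n, p) as {k: P(K = k)}."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

n, p = 3, Fraction(1, 3)  # illustrative values, not taken from the question
px = binom_pmf(n, p)

def expect(g):
    """E[g(X, Y)] for independent X, Y ~ Bin(n, p)."""
    return sum(px[x] * px[y] * g(x, y) for x in px for y in px)

# (A) distribution of X + Y by direct convolution
sum_pmf = {}
for x in px:
    for y in px:
        sum_pmf[x + y] = sum_pmf.get(x + y, 0) + px[x] * px[y]

# (C) Var(X - Y) and E(X - Y)^2
mean_diff = expect(lambda x, y: x - y)
second_moment_diff = expect(lambda x, y: (x - y) ** 2)
var_diff = second_moment_diff - mean_diff**2

# (D) Cov(X + Y, X - Y)
cov = expect(lambda x, y: (x + y) * (x - y)) - expect(lambda x, y: x + y) * mean_diff

print(sum_pmf == binom_pmf(2 * n, p), var_diff == second_moment_diff, cov)
```

Because every quantity is an exact `Fraction`, the equalities hold exactly rather than up to rounding.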
Answer 30: D
Explanation:
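The joint pdf of $X$ and $Y$ is
$$f(x, y) = \frac{1}{2\pi} e^{-\frac{1}{2}(x^2 + y^2)} = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2} \times \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}y^2}; \quad (x, y) \in \mathbb{R}^2$$
It is easy to see that $X$ and $Y$ are i.i.d. $N(0, 1)$ random variables, and therefore
$$P(X > 0) = \frac{1}{2}$$
$$P(X > 0)\, P(Y < 0) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$
$$P(X > 0 \mid Y < 0) = \frac{P(X > 0, Y < 0)}{P(Y < 0)} = \frac{1}{2}$$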
Answer 31: D
Explanation:
$E\{E[X \mid Y]\} = E[X] = E[Y]$ {given that $E[X] = E[Y]$}
$V(X - Y) = E(X - Y)^2 - \{E(X - Y)\}^2 = E(X - Y)^2$ {since $E[X - Y] = 0$}
$E[V(X \mid Y)] + V[E(X \mid Y)] = V(X)$
$X$ and $Y$ may or may not have the same distribution.
Answer 32: C
Explanation:
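Let $X_1, X_2, \ldots, X_n$ be a random sample from the $\mathrm{Exp}(\theta)$ distribution, where $\theta \in (0, \infty)$.
Then $2\theta \sum_{i=1}^{n} X_i \sim \chi^2_{2n}$
$$\Rightarrow P\left( 0 < 2\theta \sum_{i=1}^{n} X_i \le \chi^2_{2n, 0.95} \right) = 0.95$$
$$0 < 2\theta \sum_{i=1}^{n} X_i \le \chi^2_{2n, 0.95} \Rightarrow \theta \in \left( 0, \frac{\chi^2_{2n, 0.95}}{2n\bar{x}} \right]$$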
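The coverage of the interval in Answer 32 can be checked by simulation. This is a rough sketch under stated assumptions: $\mathrm{Exp}(\theta)$ is taken in the rate form (density $\theta e^{-\theta x}$), which is what makes $2\theta \sum X_i$ a $\chi^2_{2n}$ variable; `n = 5` and `theta = 2.0` are illustrative choices; and `CHI2_10_095 = 18.307` is the standard table value of the 0.95 quantile of $\chi^2$ with 10 degrees of freedom.

```python
import random

CHI2_10_095 = 18.307  # table value: 0.95 quantile of chi-square with 2n = 10 d.f.

def upper_bound(xs, quantile):
    """Upper end of the interval (0, quantile / (2 * n * xbar)] for theta."""
    return quantile / (2.0 * sum(xs))  # note 2 * n * xbar = 2 * sum(xs)

def coverage(theta, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true theta."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(trials):
        xs = [rng.expovariate(theta) for _ in range(n)]  # rate parametrization
        if theta <= upper_bound(xs, CHI2_10_095):
            covered += 1
    return covered / trials

cov_rate = coverage(theta=2.0, n=5, trials=20000)
print(cov_rate)
```

The observed coverage should be close to 0.95, up to Monte Carlo error.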
Answer 33: B
Explanation:
Since $X_1, X_2, \ldots, X_n$ are independent, we have $E(Y_n) = E(X_1) \times \cdots \times E(X_n)$. Similarly, $E(Y_n^2) = E(X_1^2) \times \cdots \times E(X_n^2)$. Since
$E(X_i) = 1/2$ and $E(X_i^2) = 1/3$ for $i = 1, 2, \ldots, n$,
it follows that
$$\mathrm{Var}(Y_n) = E(Y_n^2) - [E(Y_n)]^2 = \frac{1}{3^n} - \frac{1}{2^{2n}}$$
Hence option (B) is correct.
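The variance formula in Answer 33 can be compared with a simulation. The question itself is not reproduced here, so this sketch assumes that the $X_i$ are i.i.d. uniform on $(0, 1)$ (which gives $E(X_i) = 1/2$ and $E(X_i^2) = 1/3$) and that $Y_n$ is their product; `n = 3` and the number of trials are illustrative.

```python
import random
from fractions import Fraction

def exact_var(n):
    """Var(Y_n) = 1/3^n - 1/2^(2n) from the explanation above."""
    return Fraction(1, 3) ** n - Fraction(1, 2) ** (2 * n)

def simulated_var(n, trials, seed=0):
    """Monte Carlo variance of the product of n independent U(0, 1) draws."""
    rng = random.Random(seed)
    products = []
    for _trial in range(trials):
        y = 1.0
        for _i in range(n):
            y *= rng.random()
        products.append(y)
    mean = sum(products) / trials
    return sum((y - mean) ** 2 for y in products) / trials

n = 3
exact = exact_var(n)                       # 1/27 - 1/64 = 37/1728
approx = simulated_var(n, trials=100_000)
print(exact, float(exact), approx)
```

The simulated value should agree with the exact fraction to about two or three decimal places.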
Answer 34: C
Explanation:
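Note that $P(X_1 < X_2) + P(X_2 < X_1) + P(X_1 = X_2) = 1$, since the corresponding events are disjoint and exhaust all possibilities. But $P(X_1 < X_2) = P(X_2 < X_1)$ by symmetry. Furthermore, $P(X_1 = X_2) = 0$ since the random variables are continuous. Therefore, $P(X_1 < X_2) = \frac{1}{2}$.
By the same symmetry argument, all $4! = 24$ orderings of $X_1, X_2, X_3, X_4$ are equally likely, and 6 of them satisfy $X_3 < X_2 < \max(X_1, X_4)$. Hence
$$P(X_3 < X_2 < \max(X_1, X_4)) = \frac{6}{24} = \frac{1}{4}$$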
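The probability in Answer 34 can be obtained by counting orderings. The sketch below assumes the four variables are i.i.d. and continuous, so that every ranking of $X_1, X_2, X_3, X_4$ is equally likely; the variable names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import permutations

favourable = 0
total = 0
# Each permutation assigns distinct ranks 0..3 to (X1, X2, X3, X4).
for x1, x2, x3, x4 in permutations(range(4)):
    total += 1
    if x3 < x2 < max(x1, x4):
        favourable += 1

prob = Fraction(favourable, total)
print(favourable, total, prob)  # prints: 6 24 1/4
```

Only the relative order of the variables matters for the event, which is why ranks can stand in for the actual values.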
Answer 35: A
Explanation:
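$$f(x_i) = 1; \quad \theta - \frac{1}{2} < x_i < \theta + \frac{1}{2}$$
$$\hat{\theta} \in \left[ X_{(n)} - \frac{1}{2},\; X_{(1)} + \frac{1}{2} \right]$$
The likelihood is free of the parameter on this interval, so
$$\hat{\theta} = \lambda \left( X_{(n)} - \frac{1}{2} \right) + (1 - \lambda) \left( X_{(1)} + \frac{1}{2} \right); \quad 0 < \lambda < 1$$
Taking $\lambda = \frac{1}{2}, \frac{1}{4}$ we get
$T_1 = \frac{1}{2} \left( X_{(1)} + X_{(n)} \right)$, which is an MLE as well as consistent for $\theta$;
$T_2 = \frac{1}{4} \left( 3X_{(1)} + X_{(n)} + 1 \right)$, which is an MLE as well as consistent for $\theta$ but not unbiased…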
14.8 REFERENCES
• S. C Gupta, V.K Kapoor, Fundamentals of Mathematical Statistics, Sultan Chand
• B.L Agarwal, Programmed Statistics, New Age International Publishers, 2nd Edition.
