
QUE. 1) WHAT IS STATISTICS?

Ans. Introduction
Origin and Development of Statistics
The subject of statistics is not a new discipline; it is as old as human society itself. It has been used since the earliest organised societies, although the sphere of its utility was very restricted. In the old days, statistics was regarded as the ‘Science of Statecraft’ and was a by-product of the administrative activity of the state.
The word statistics seems to have been derived from the Latin word ‘Status’, the Italian word ‘Statista’, the German word ‘Statistik’ and the French word ‘Statistique’, each of which means a political state. In ancient times the scope of statistics was primarily limited to the collection of the following data by the government for framing military and fiscal policies: the age- and sex-wise population of the country, and the property and wealth of the country. Nowadays statistics is used in a very wide range of areas such as medicine, the environment, industry, government surveys and market research. In short, anyone who likes to solve practical problems and to work with numbers, with people and with computers uses statistics. Statistics helps us make better decisions.
Meaning of Statistics
Statistics is the art and scientific application of mathematical principles to the collection, analysis, presentation and interpretation of numerical data.
Statistics is the art and science of deciding what data are appropriate to collect, deciding how to collect them efficiently, and then using them to give information, answer questions, draw inferences and make decisions.
Statistics means:
(1) Producing trustworthy data
(2) Analyzing data to make their meaning clear
(3) Drawing practical conclusions from data
In short, statistics is used to solve problems in a wide variety of fields.
“Statistics is a tool for creating new understanding from a set of numbers.”
In statistics:
(1) Data = facts, especially numerical facts, collected together for reference or information.
(2) Information = knowledge communicated concerning some particular fact.
Definitions of Statistics
1) “Statistics may be called the science of counting.” - A.L. Bowley
2) “Statistics may rightly be called the science of averages.” - A.L. Bowley
3) “Statistics is the science of estimates and probabilities.” - Boddington
4) “Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data.” - Croxton and Cowden
5) “Statistics is the science which deals with classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena.” - Lovin
QUE. 2) WHAT IS DATA? EXPLAIN TYPES OF DATA.
Ans. Introduction
Data means facts, especially numerical facts, collected together for reference or information. In other words, data are the numerical facts that help the investigator reach a final conclusion about a statistical problem. Raw data means unorganized or unprocessed data.
Types of Data
Data can be classified into four types, of which the two main types are:
A) Primary Data
B) Secondary Data
A) Primary Data: Primary data are data collected for the first time. Data collected originally by the investigator or the organizing agency are called primary data; that is, primary data are original data collected specially for the purpose in mind. Research in which one gathers this kind of data is referred to as ‘field research’. For example: data collected through a questionnaire.
B) Secondary Data: Data which have already been collected and processed by some agency or person, and are then taken over and used by another agency for its own statistical work, are termed secondary data. In other words, secondary data are data that were originally collected for another purpose. The results obtained by performing statistical operations on primary data become secondary data for anyone who later uses them. Research in which one gathers this kind of data is referred to as ‘desk research’. For example: data from a book or from the internet.
Other Two Types of Data
C) Qualitative Data: Qualitative data are not given numerically. They are categorical measurements expressed not in terms of numbers but by means of natural-language description. For example: Favourite movie - “Fast and Furious”.
D) Quantitative Data: Quantitative data are expressed numerically. They are numerical measurements expressed in terms of numbers rather than by means of a natural-language description. For example: Weight - 45 kg.
QUE. 3) WHAT IS PRIMARY DATA? ALSO EXPLAIN METHODS OF COLLECTING PRIMARY DATA.
Ans. Primary Data: Primary data are data collected for the first time. Data collected originally by the investigator or the organizing agency are called primary data; that is, primary data are original data collected specially for the purpose in mind. Research in which one gathers this kind of data is referred to as ‘field research’. For example: data collected through a questionnaire.
Methods of collecting Primary data
1) Direct personal investigation
2) Indirect oral interviews
3) Information received through local agencies
4) Questionnaire method
5) Schedules sent through enumerators
1) Direct personal investigation: This method consists of the collection of data personally by the investigator from the sources concerned. In other words, the investigator has to go to the field personally to make enquiries and solicit information from the informants or respondents. This greatly restricts the scope of the enquiry, so the method should be used only if the investigation is local.
Merits: The first-hand information obtained by the investigator himself is bound to be more reliable and accurate. When respondents are approached personally by the investigator, the response is likely to be more encouraging. The investigator can extract proper information from the respondents by talking to them at their educational level.
Demerits: This type of investigation is restrictive in nature and is suited only to intensive studies, not to extensive enquiries. It is handicapped by lack of time, money and manpower, and is particularly time-consuming. Its main drawback is that it is highly subjective, since the results depend on the personal bias of the investigator.
2) Indirect oral interview: When ‘direct personal investigation’ is not practicable, either because of lack of time or money or because the area is large, the investigator collects primary data by ‘indirect oral interview’, i.e. by questioning third parties (witnesses) who know the facts, rather than the persons concerned. For example, suppose we want information on social evils, such as whether a person is addicted to drinking, gambling or smoking. The information on the drinking, gambling or smoking habits of an individual can best be obtained by interviewing his personal friends, relatives or neighbours who know him thoroughly well.
Merits: Compared with ‘direct personal investigation’, this method is less expensive and requires less time for conducting the enquiry. If necessary, the views and suggestions of experts on the given problem can be obtained in order to plan and conduct the enquiry more effectively and efficiently.
Demerits: Due to lack of direct supervision and personal touch, the investigator has to rely entirely on the information supplied by the enumerators. A wrong or improper choice of witnesses will give biased results, which may adversely affect the findings of the enquiry.
3) Information received through local agencies:
In this method the information is not collected formally by the investigator or the enumerators. Instead, the investigator appoints local agents in different parts of the field of enquiry. These local agents collect the data in their own ways, according to their own likings and decisions, and periodically submit their reports to the central or head office, where the data are processed for final analysis.
Merits: This method is very cheap and economical for extensive investigations. Moreover, the required information can be obtained quickly, since only rough estimates are required.
Demerits: Since the different local agents collect the data in their own fashion and style, the data may not be reliable or uniform.
4) Questionnaire method: This method consists of preparing a questionnaire (a list of questions relating to the field of enquiry, with space for the answers to be filled in by the respondents), which is mailed to the respondents with a request to reply within a specified time. This method is usually used by research workers, private individuals, non-official agencies and sometimes even by the government.
Merits: Of all the methods of collecting primary data, the questionnaire method is by far the most economical in terms of time, money and manpower. It can be used for extensive enquiries covering a very wide area.
Demerits: The most serious drawback of this method is that it can be used only for a literate population. Quite often people suppress correct information and furnish wrong replies, so the data received may not be reliable. Another limitation is that respondents are often unwilling to answer personal questions about income, property, age, personal habits, etc.
5) Schedules sent through enumerators: Before discussing this method, it is desirable to distinguish between a questionnaire and a schedule. As already explained, a questionnaire is a list of questions which are answered by the respondent himself or herself, in his or her own handwriting. A schedule, on the other hand, is a form containing the questions which is filled in by interviewers or enumerators (the field agents who put these questions) in a face-to-face situation with the respondents.
Merits: The enumerators can explain in detail the objectives and aims of the investigation to the respondents and impress upon them the need to furnish correct data. Unlike the questionnaire method, this technique can be used with advantage even if the respondents are illiterate.
Demerits: It is a fairly expensive method because the enumerators have to be paid, so it can be used only by financially sound agencies or institutions. It is also more time-consuming than the questionnaire method. The success of the method largely depends on the efficiency and skill of the enumerators; if they are unable to collect the data properly, the investigator cannot get correct data.
QUE. 4) CHARACTERISTICS OF AN IDEAL QUESTIONNAIRE
1) The questionnaire should be as short as possible.
2) The questions should be clear, brief, simple, unambiguous and precise.
3) The questions should be arranged in a natural, logical sequence.
4) Words with multiple meanings and vague words should be avoided.
5) The questions should be capable of being easily answered by the respondents. Questions that rely too much on the memory of the respondent should be avoided.
6) Questions that hurt the sentiments of respondents or are of a personal nature should never be asked.
7) Use the four main types of questions in the questionnaire:
Simple alternative questions: Such questions are answered by yes/no, right/wrong, etc.
Multiple choice questions: In such questions the possible answers are printed in the questionnaire and the respondent is supposed to tick any one of them. E.g.:
What is your occupation?
a) Business b) Private job c) Govt. job d) Any other
Specific information questions: Such questions are used to extract specific information like name, address, date of birth, etc.
Open questions: These questions are answered by the respondent in his or her own words, e.g. aims, suggestions, etc.
8) The questions relating to mathematical calculation should be avoided.
9) The questions should be directly related to the objective of investigation.
10) Necessary instructions for filling in the questionnaire should be given in simple words.
11) Enough space should be provided for answers. The questionnaire should look as attractive as possible.
12) Before the questionnaire is actually used, a test check (pilot survey) should always be done by obtaining answers from a few respondents. If necessary, the questionnaire should then be modified.
QUE. 5) SOURCES OF SECONDARY DATA
The chief sources of secondary data may be broadly classified into the following two groups:
1) Published sources
2) Unpublished sources
1) Published Sources:
There are a number of national organizations and international agencies which collect statistical data relating to business, trade, labour, prices, consumption, production, industries, agriculture, income, currency and exchange, health, population and a number of other socio-economic phenomena, and publish their findings in statistical reports on a regular basis (monthly, quarterly, annually or ad hoc).
The published sources include:
A) Official publications of the Central Govt.:
Monthly Abstract of Statistics; Statistical Pocket Book, India; Annual Survey of Industries - General Review; Statistical System of India, etc., all published by the Central Statistical Organization (C.S.O.), New Delhi.
Census data in various census reports, and Vital Statistics of India, all published by the Registrar General of India (R.G.I.).
Various statistical reports on phenomena relating to socio-economic and demographic conditions, published by the National Sample Survey Organization (N.S.S.O.).
B) Publications of semi-Govt. statistical organizations:
Statistical Department of the Reserve Bank of India, Mumbai; Economic Department of the RBI; Institute of Economic Growth, Delhi; Gokhale Institute of Politics and Economics, Pune; Indian Institute of Foreign Trade, New Delhi.
C) Publications of research institutions: Indian Council of Agricultural Research (I.C.A.R.), New Delhi; Indian Statistical Institute (I.S.I.); Indian Agricultural Statistics Research Institute (I.A.S.R.I.).
D) Publications of commercial and financial institutions: Federation of Indian Chambers of Commerce and Industry (F.I.C.C.I.); Institute of Chartered Accountants of India; trade unions; stock exchanges; bank bodies; co-operative societies.
E) Reports of various committees and commissions appointed by the Govt.: These provide information related to wages, dearness allowance, prices, national income, taxation, land, education, etc. Examples: Kothari Commission report on educational reforms; Gupta Commission report on Maruti affairs; Wanchoo Commission report on taxation.
F) Newspapers and periodicals: Eastern Economist, Economic Times, The Financial Express, Indian Journal of Economics, Commerce, Capital, Transport.
G) International publications: United Nations Organization (U.N.O.): UNO Statistical Year Book, Demographic Year Book; World Health Organization (W.H.O.); International Labour Organization (I.L.O.); International Monetary Fund (I.M.F.).
2) Unpublished data: Statistical data need not always be published. There are various sources of unpublished statistical material, such as the records maintained by private firms or business enterprises, which may not like to release their data to any outside agency; the various departments and offices of the Central and State Governments; and the research carried out by individual research scholars in universities or research institutes.
QUE. 1) WHAT IS CORRELATION?
We have studied problems relating to one variable. We know that a distribution can be studied with the help of measures of central value and measures of dispersion. But many times we come across two variables which appear to move simultaneously. E.g., if we study the heights and weights of a group of persons, we observe that a taller person generally has more weight. Thus, height and weight are found to be related variables. Similarly, we come across many pairs of variables which may be found to be related. The income and expenditure of persons, the number of vehicles and the number of accidents, and the demand and price of a commodity are very familiar examples of related variables. Often, when there are simultaneous changes in the values of two variables, there exists some cause-and-effect relationship between them. The relationships between the income and expenditure of persons, and between the demand and price of a commodity, are illustrations of relationships due to cause and effect. Sometimes there may be no direct cause-and-effect relationship between two variables, but they may be indirectly related to each other: simultaneous changes in their values may be due to some other factor. E.g., the simultaneous increase in the sales of umbrellas and rain shoes may be due to heavy rain. When the changes in the values of two variables are simultaneous and there is a cause-and-effect relationship (direct or indirect) between them, they are said to be correlated variables. Thus, correlation is a statistical tool with the help of which the relationship between two variables can be studied. It should be carefully understood that, in this sense, variables are said to be correlated only when some causal connection exists between them; a numerical measure of correlation by itself does not prove causation.
In the study of two variables, the relationship may be linear or non-linear. In this chapter we discuss only problems in which the relationship is linear. According to the statistician King, “Correlation means that between two series or groups of data there exists some causal connection.” Boddington has defined correlation in the following way: “It does not matter whether the data in one section changes in the same or the reverse direction to that in the other, so long as a movement in sympathy is apparent.” According to A.M. Tuttle, “An analysis of the co-variation of two or more variables is usually called Correlation.”
QUE. 2) TYPES OF CORRELATION
The correlation between two variables can be of the following two types:
1) Positive correlation
2) Negative correlation
1) Positive correlation: When the changes in the values of two variables are in the same direction, i.e. when the values of one variable increase the values of the other variable also increase, and when the values of one variable decrease the values of the other variable also decrease, the correlation between them is said to be positive. The correlations between the age of husband and the age of wife, and between income and expenditure, are examples of positive correlation. The following table gives the ages of husbands and their wives. It can be seen that when the age of the husband is higher, the age of the wife is also higher. Thus there is positive correlation between the age of husband and the age of wife.
2) Negative correlation: When the changes in the values of two variables are in opposite directions, i.e. when the values of one variable increase the values of the other variable decrease, and when the values of one variable decrease the values of the other variable increase, the correlation between them is said to be negative. The correlations between the price of a commodity and its demand, between expenditure and saving, and between the age of a driver and the number of accidents are examples of negative correlation. The following table gives the price of a commodity and its demand. It can be seen that when the price increases, the demand decreases. Thus there is negative correlation between price and demand.
QUE. 3) METHODS OF MEASURING CORRELATION
We shall discuss the following three methods of studying the direction and degree of relationship between two variables:
1) Scatter diagram method
2) Karl Pearson’s product moment method
3) Spearman’s method of rank correlation
1) Scatter diagram method: This is a very simple method of studying the relationship between two variables. In this method one variable is taken on the X-axis and the other on the Y-axis, and a point is plotted on graph paper for each pair of values.
Types of correlation by scatter diagram method:
Perfect positive correlation: When the changes in the values of two variables are in the same direction and in the same proportion, it is called perfect positive correlation. The points lie on an upward-sloping straight line. Here r = +1.
Partial positive correlation: When the changes in the values of two variables are in the same direction but not in the same proportion, it is called partial positive correlation. The points show an upward trend but do not lie on a straight line. Here 0 < r < +1.
Perfect negative correlation: When the changes in the values of two variables are in opposite directions and in the same proportion, it is called perfect negative correlation. The points lie on a downward-sloping straight line. Here r = -1.
Partial negative correlation: When the changes in the values of two variables are in opposite directions but not in the same proportion, it is called partial negative correlation. The points show a downward trend but do not lie on a straight line. Here -1 < r < 0.
Absence of correlation / No relation: When the changes in the values of two variables are random, i.e. the points show no upward or downward trend, it is called absence of correlation. Here r = 0.
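Each scatter-diagram category above corresponds to a value or range of r. The following is a minimal Python sketch that maps a computed r to the category names used in these notes. The function name `describe_correlation` and the tolerance `tol` (used to absorb floating-point rounding when testing for exactly +1, -1 or 0) are illustrative assumptions, not part of the original method.

```python
def describe_correlation(r: float, tol: float = 1e-9) -> str:
    """Map a correlation co-efficient r to the scatter-diagram category names used in these notes.

    `tol` is an assumed tolerance so that values such as 0.9999999999 still count as +1.
    """
    if not -1 - tol <= r <= 1 + tol:
        raise ValueError("r must lie between -1 and +1")
    if abs(r - 1) <= tol:
        return "perfect positive"
    if abs(r + 1) <= tol:
        return "perfect negative"
    if abs(r) <= tol:
        return "no correlation"
    # Any other value is a partial correlation; the sign gives the direction.
    return "partial positive" if r > 0 else "partial negative"


for value in (1.0, 0.6, 0.0, -0.35, -1.0):
    print(value, describe_correlation(value))
```

In practice an r close to, but not exactly, zero is still reported here as "partial"; deciding whether such a small r is meaningful is the job of the probable error discussed under interpretation.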
2) Karl Pearson’s product moment method:
We have seen that the scatter diagram method helps us know the direction of the relationship between the variables, but it cannot give the exact amount of the relationship. Among the different methods of finding the degree and direction of the relationship between two variables, the method given by Karl Pearson is regarded as the most accurate and is very widely used. By this method the amount of relationship between two variables can be measured numerically. The numerical measure of correlation between two variables is known as the correlation co-efficient and is denoted by ‘r’. Karl Pearson defined the correlation co-efficient in the following way:
$$r = \frac{\operatorname{Cov}(x,y)}{\sigma_x \sigma_y} = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}}$$
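Pearson's co-efficient can be computed directly from deviations about the means, $r = \sum (x-\bar{x})(y-\bar{y}) / \sqrt{\sum (x-\bar{x})^2 \sum (y-\bar{y})^2}$. The sketch below does exactly that in plain Python; the function name `pearson_r` and the five-point data set are hypothetical, chosen only for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Karl Pearson's product moment correlation co-efficient from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired series with at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]  # deviations of x from its mean
    dy = [yi - mean_y for yi in y]  # deviations of y from its mean
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of products of deviations
    sxx = sum(a * a for a in dx)              # sum of squared x-deviations
    syy = sum(b * b for b in dy)              # sum of squared y-deviations
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one series is constant")
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]                        # hypothetical data
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 4))  # 0.7746 (partial positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))           # -1.0 (perfect negative)
```

Working with deviations about the means, rather than raw sums of squares, keeps the arithmetic close to the textbook formula and avoids large intermediate totals.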
Merits and limitations of Pearson’s correlation co-efficient:
Karl Pearson’s co-efficient of correlation is regarded as the best measure of the relationship between two variables; both the degree and the direction of the relationship can be obtained from it. However, it has the following limitations: It is based on the assumption of a linear relationship between the variables. Its computation is more laborious than that of other methods. The correlation co-efficient is highly influenced by extreme pairs of observations. It is often difficult to interpret the correlation co-efficient correctly.
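The linearity limitation can be seen with a small hypothetical data set in which y is completely determined by x, but not linearly (here $y = x^2$ for symmetric values of x). This sketch re-implements the deviation form of Pearson's r so that it runs on its own; the data are illustrative only.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # y depends exactly on x, but along a parabola
print(pearson_r(x, y))  # 0.0
```

Although every y is fixed by its x, r comes out as zero because the positive and negative products of deviations cancel. So r = 0 rules out only a linear relationship.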
3) Spearman’s method of rank correlation:
Prof. Charles Edward Spearman gave a method of finding the correlation co-efficient between two variables in which ranks are used instead of values; hence it is known as the method of rank correlation. We know that qualitative phenomena cannot be expressed numerically, but it is convenient to assign them ranks. E.g., suppose there are 10 competitors in a beauty contest. It is inconvenient to give marks to these competitors; instead, they can easily be assigned ranks as first, second, etc. If two judges have ranked the same participants, we may be interested in knowing how far the two judges agree in assigning ranks. This can be measured by the co-efficient of rank correlation. The method of rank correlation can thus be used for finding the relationship between two qualitative phenomena like honesty, intelligence, poverty, etc.
Merits: This method is easier to understand and apply than Karl Pearson’s method. It is convenient when the data are qualitative, like honesty, beauty, intelligence, etc. It is useful when the dispersion in a series is large. When only ranks are given instead of values, this is the only method that can be used.
Limitations: This method does not give results as accurate as Pearson’s method. When the number of observations is large, it is tedious to assign ranks. The method cannot be used for data given in a bivariate frequency distribution.
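The notes do not state the computing formula. For two rankings of n items with no tied ranks, the standard Spearman formula is $\rho = 1 - \frac{6\sum d^2}{n(n^2-1)}$, where d is the difference between the two ranks of each item. A minimal sketch follows; the function name `spearman_rho` and the two judges' rankings are hypothetical, and tied ranks are deliberately rejected because they need a correction not covered here.

```python
from typing import Sequence


def spearman_rho(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman's rank correlation co-efficient for two rankings without ties:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in ranks."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("both judges must rank the same n >= 2 items")
    expected = list(range(1, n + 1))
    if sorted(rank_a) != expected or sorted(rank_b) != expected:
        raise ValueError("this sketch assumes ranks 1..n with no ties")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


judge_1 = [1, 2, 3, 4, 5]  # hypothetical ranks given by the first judge
judge_2 = [2, 1, 4, 3, 5]  # hypothetical ranks given by the second judge
print(spearman_rho(judge_1, judge_2))  # 0.8
```

Here the judges disagree only on neighbouring pairs, so the sum of d squared is 4 and the agreement is high (0.8); identical rankings give +1 and completely reversed rankings give -1.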
QUE. 4) INTERPRETATION OF CORRELATION CO-EFFICIENT
The correlation co-efficient expresses the degree and direction of the relationship between the variables. Having obtained the value of the correlation co-efficient, it is essential to interpret it. The sign of the correlation co-efficient gives an idea of the direction of the relationship, while its numerical value gives an idea of the closeness of the relationship. However, the interpretation of the correlation co-efficient depends largely on experience in dealing with such problems. The following general rules are useful in interpreting the value of the correlation co-efficient.
(1) Interpretation of r = +1: r = +1 shows perfect positive correlation between two variables. For such variables an increase in the value of one variable is associated with a proportional increase in the value of the other. The points on the scatter diagram lie on one straight line sloping upward.
(2) Interpretation of r = -1: r = -1 shows perfect negative correlation between two variables. For such variables an increase in the value of one variable is associated with a proportional decrease in the value of the other. The points on the scatter diagram lie on one straight line sloping downward.
(3) Interpretation of r = 0: r = 0 shows absence of a linear relationship between the variables. Such variables are said to be uncorrelated, and the points on the scatter diagram show no upward or downward trend. Uncorrelated variables may still be related in a non-linear way.
(4) If the value of r is nearer to +1 or -1, the relationship between the variables is closer; if the value of r is nearer to zero, the relationship is less close.
(5) The closeness of the relationship is not proportional to the value of r. That is, r = 0.8 does not indicate that the relationship is twice as close as when r = 0.4; it only indicates that the relationship is closer than when r = 0.4.
(6) Before interpreting the value of r, we should examine whether there exists a cause-and-effect relationship between the variables.
(7) In estimating the population correlation co-efficient from the value of the sample correlation co-efficient, the probable error of r should also be taken into consideration.
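Rule (7) refers to the probable error of r. The usual textbook formula is $\text{P.E.} = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}$, where n is the number of pairs of observations, and the limits for the population correlation co-efficient are commonly taken as $r \pm \text{P.E.}$. The sketch below assumes those conventions; the function name `probable_error` and the sample values r = 0.8, n = 16 are hypothetical.

```python
from math import sqrt


def probable_error(r: float, n: int) -> float:
    """Probable error of a sample correlation co-efficient: P.E. = 0.6745 * (1 - r^2) / sqrt(n)."""
    if not -1 <= r <= 1 or n < 1:
        raise ValueError("need -1 <= r <= 1 and n >= 1")
    return 0.6745 * (1 - r * r) / sqrt(n)


r, n = 0.8, 16  # hypothetical sample correlation and number of pairs
pe = probable_error(r, n)
print(round(pe, 4))                        # 0.0607
print(round(r - pe, 4), round(r + pe, 4))  # assumed limits for the population r: r - P.E. to r + P.E.
```

The probable error shrinks as n grows and as r approaches +1 or -1, which is why a value of r from a small sample should be read with more caution.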
WHAT IS REGRESSION?
Meaning
When there are simultaneous changes in the values of two variables and the changes in one variable are due to changes in the other, the variables are said to be correlated, and the correlation co-efficient expresses the extent of the relationship between them. But if we want to know the value of one variable when the value of the other is given, correlation analysis cannot help us. E.g., if we are given the figures for rainfall and yield of rice for the last 10 years, we can find the correlation co-efficient between them, but that cannot help us estimate the yield of the current year when we know its rainfall. To estimate the value of one variable for a given value of another, we must find some functional relationship between the variables. Regression is a statistical technique which helps us estimate the unknown value of one variable from a known value of the other variable. The word regression was first used by Sir Francis Galton at the end of the 19th century, while he was studying the relationship between the heights of fathers and the heights of sons.
In the study of regression, a mathematical model is used to represent the relationship between two variables, i.e. a mathematical equation is obtained to represent the relationship between them. If there is a cause-and-effect relationship between two variables, a change in the value of one variable will result in a corresponding change in the value of the other. The variable in which we make changes is called the causal variable, or the independent variable, and is generally denoted by x. When we change the values of the causal variable x, the other variable, in the form of effect, also changes. The variable in the form of effect is called the dependent variable and is usually denoted by y. In the relationship between income and expenditure, income is the independent variable, and expenditure is the dependent variable, denoted by y. In the study of rainfall and the yield of rice, the amount of rainfall is the independent variable, while the yield of rice is the dependent variable. In the regression model the dependent variable y is expressed as a function of the independent variable x. Thus regression is a relationship between two variables determined by an appropriate mathematical function. If this relationship is represented by a straight line, it is called linear regression. In the study of the relationship between two variables, one variable is taken as the independent variable and the other as the dependent variable, and a line of regression is obtained.
In most cases the independent and dependent variables can be easily decided. In the study of the income and expenditure of families, income is the independent variable, while expenditure is the dependent variable. Similarly, in the study of regression between rainfall and the yield of a crop, rainfall is the independent variable, while the yield of the crop is the dependent variable. Denoting the independent variable by x and the dependent variable by y, we obtain the regression line of y on x.
In some cases it is not easy to determine the independent and dependent variables. E.g., in the study of the relationship between demand and price, one can think in the following two ways: (1) demand depends upon price, or (2) price depends upon demand. Similarly, in the study of the relationship between height and weight, the independent and dependent variables cannot be categorically decided. In such cases the two variables are mutually dependent, and it becomes difficult to decide which variable should be regarded as dependent and which as independent. In such cases two regression lines are obtained.
2) REGRESSION LINES We know that the simplest method of studying the relationship between two variables is the scatter diagram. In the figure, a scatter diagram is shown for two variables. Generally the points of a scatter diagram do not all lie on one straight line, so a line around which most of the points lie may be regarded as showing the relationship between the variables. Many such lines can be drawn, and we must find the best one. The well-known mathematical principle of least squares is used to obtain it: for the regression line of y on x, the best line is the one for which the sum of the squares of the vertical deviations of the points from the line is minimum. The line obtained by the least squares principle is known as the line of best fit; it is also called the best estimating line or the regression line. Thus the regression line is the best average line obtained by the least squares principle. Taking x as the independent variable and y as the dependent variable, the line obtained by the least squares principle is called the regression line of y on x. We know that the equation of any straight line can be written in the form y = a + bx. Hence the line obtained by the least squares principle from the points of the scatter diagram can be represented as y = a + bx; more specifically, it is written as $y = a + b_{yx}x$, where $b_{yx}$ is the regression co-efficient of y on x.
In cases in which y is taken as the independent variable and x as the dependent variable, the line obtained by the least squares principle (now minimising the squared horizontal deviations) is called the regression line of x on y, and its equation is written as $x = a' + b_{xy}y$. Thus for two mutually related variables we can obtain two regression lines.
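The least-squares principle gives closed-form co-efficients: $b_{yx} = \sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$ with $a = \bar{y} - b_{yx}\bar{x}$, and symmetrically $b_{xy} = \sum (x-\bar{x})(y-\bar{y}) / \sum (y-\bar{y})^2$ with $a' = \bar{x} - b_{xy}\bar{y}$. The sketch below computes both regression lines; the function name `regression_lines` and the data are hypothetical, standing in for something like rainfall (x) and yield (y).

```python
from typing import Sequence, Tuple


def regression_lines(x: Sequence[float], y: Sequence[float]) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Least-squares regression lines.
    Returns ((a, b_yx), (a_prime, b_xy)) for y = a + b_yx*x and x = a' + b_xy*y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired series with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b_yx = sxy / sxx                  # slope of the line of y on x
    b_xy = sxy / syy                  # slope of the line of x on y
    a = mean_y - b_yx * mean_x        # both lines pass through (mean_x, mean_y)
    a_prime = mean_x - b_xy * mean_y
    return (a, b_yx), (a_prime, b_xy)


x = [1, 2, 3, 4, 5]  # hypothetical independent values
y = [2, 4, 5, 4, 5]  # hypothetical dependent values
(a, b_yx), (a_prime, b_xy) = regression_lines(x, y)
print(f"y on x: y = {a:.2f} + {b_yx:.2f}x")        # y on x: y = 2.20 + 0.60x
print(f"x on y: x = {a_prime:.2f} + {b_xy:.2f}y")  # x on y: x = -1.00 + 1.00y
print(f"estimated y at x = 6: {a + b_yx * 6:.2f}")  # estimated y at x = 6: 5.80
print(b_yx * b_xy)  # product of the two slopes equals r squared, here 0.6
```

The line of y on x is the one to use for estimating y from a known x; the line of x on y is not simply its algebraic inverse unless the correlation is perfect.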
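Difference between Correlation and Regression
Correlation:
• It gives a numerical measure of the degree of linear relationship between the variables.
• It always lies between -1 and +1.
• It is independent of change of origin and of scale.
• It can be obtained from the regression coefficients: $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, taking the common sign of the two coefficients.
Regression:
• It gives the functional relationship between the variables.
• A regression coefficient is not restricted to the range -1 to +1 and can be greater than 1 in absolute value; however, since $b_{yx} \cdot b_{xy} = r^2 \le 1$, both coefficients cannot exceed 1 together.
• It is independent of change of origin but not of scale.
• It can be obtained from the correlation coefficient: $b_{yx} = r\,S_y/S_x$ and $b_{xy} = r\,S_x/S_y$.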
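The points above can be checked numerically. This is a minimal sketch on hypothetical data (the values and helper names are invented for illustration); it recovers r from the two regression coefficients and then applies an arbitrary change of origin and scale, u = (x − 3)/2 and v = (y − 4)/0.5, to show that r is unchanged while the regression slope is not.

```python
import math


def centred_sums(xs, ys):
    """Return Σ(x − x̄)(y − ȳ), Σ(x − x̄)² and Σ(y − ȳ)²."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy, sxx, syy


def correlation(xs, ys):
    sxy, sxx, syy = centred_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def regression_coefficient(xs, ys):
    """Slope of the regression line of ys on xs."""
    sxy, sxx, _ = centred_sums(xs, ys)
    return sxy / sxx


x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
b_yx = regression_coefficient(x, y)
b_xy = regression_coefficient(y, x)
# r = ±√(b_yx · b_xy), taking the common sign of the two coefficients
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Change of origin and scale (positive scale factors 2 and 0.5)
u = [(xi - 3) / 2 for xi in x]
v = [(yi - 4) / 0.5 for yi in y]
print(r, r_from_b, correlation(u, v))        # equal up to floating-point rounding
print(b_yx, regression_coefficient(u, v))    # the slope changes with the scale
```

In this example the slope of v on u is 4 times b_yx, and multiplying it by c_y / c_x = 0.5 / 2 recovers b_yx, which is the same correction factor that appears in the regression formulas of these notes.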
What is Probability?
Introduction: Consider the following events:
1. The sun will rise in the east.
2. A flower taken from a basket full of roses is a rose.
3. An apple falling from a tree will go up.
4. A ball taken from a bag containing only white balls is a black ball.
5. A coin is tossed.
6. A card is drawn from a pack of cards.
Among the above events, the first two are certain to occur, while the third and fourth are impossible events.
Thus some events are certain to occur, whereas others cannot occur at all. In some experiments, however, the
result cannot be predicted in advance. For example, when a coin is tossed we may get either a head or a tail, and
when a card is drawn from a pack of cards we may get any one of the 52 cards. Thus we cannot predict the results
of events (5) and (6).
In practical life we come across many situations where the results are uncertain. The theory of probability is an
attempt to measure the degree of uncertainty in the results of such experiments. The theory of probability
originated in gambling; nowadays probability is used in practically all branches of study, and statistics cannot be
studied without understanding it. Mathematicians such as James Bernoulli, Pascal, De Moivre and Bayes made
important contributions to the development of the theory of probability.
Definitions of Probability: Before giving definitions of probability, we shall understand certain terms.
1. Random experiment (or trial): An experiment which can result in any one of several possible outcomes is
called a random experiment or a trial. E.g. (1) tossing a coin, (2) drawing a card from a pack of playing cards and
(3) throwing a die are random experiments.
Characteristics of a random experiment: (1) The experiment results in any one of the outcomes. (2) All possible
outcomes of the experiment can be described in advance. (3) The experiment can be repeated under the same
conditions. (4) The result of the experiment cannot be predicted correctly in advance.
2. Sample space: The set of all possible outcomes of a random experiment is called a sample space and is
denoted by S or U. Each outcome is called a sample point. The number of sample points in S may be denoted by
n(S). If the number of sample points of S is finite, it is known as a finite sample space, and if it is infinite, it is
known as an infinite sample space. E.g. if a coin is tossed, the sample space is S = {H, T}. Similarly, if two coins
are tossed, the sample space is S = {(H, H), (H, T), (T, H), (T, T)}. When 3 coins are tossed simultaneously, the
sample space is S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}. If two dice are thrown, the following sample
space is obtained:
S = { (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6) }
In this experiment the sample space consists of 36 sample points. If a coin is tossed until a head appears, we get
the sample space S = {H, TH, TTH, TTTH, TTTTH, …}. In the first four experiments the sample spaces are finite,
while in the fifth the sample space is infinite.
3. Events: An event is an outcome, or a set of outcomes, of a random experiment. E.g. (i) If a coin is tossed, head
(H) and tail (T) are two different events. (ii) Getting 1, 2, 3, 4, 5 or 6 are different events when a die is thrown. If
A is an event and S is the sample space, then A is a subset of S, i.e. A ⊆ S. Events are generally denoted by A, B,
C or A1, A2, A3, etc. If A = Φ, then A is an impossible event, and if A = S, then A is certain to occur.
4. Complementary event: The complement of an event A is the aggregate of all the sample points of the sample
space S which do not belong to A. It is denoted by A’ or Ā. E.g. if A is the event of getting an odd number when a
die is thrown, then the event of not getting an odd number, i.e. getting an even number, is the complement of A,
denoted by A’ or Ā.
5. Union of two events: The union of two events A and B is denoted by A ∪ B. It is the aggregate of all sample
points belonging to either A or B or both. E.g. if A is the event that a student plays cricket and B is the event that
a student plays hockey, then A ∪ B represents the event that a student plays either cricket or hockey or both.
6. Intersection of two events: The intersection of two events A and B is denoted by A ∩ B. It is the aggregate of
all sample points belonging to both A and B. When A and B occur simultaneously, we say that A ∩ B has
occurred. E.g. if A is the event that a student plays cricket and B is the event that a student plays hockey, then
A ∩ B represents the event that a student plays both cricket and hockey.
7. Difference event: The difference of two events A and B is the event that A happens and B does not happen. It
is denoted by A – B, and A – B = A ∩ B’.
8. Exhaustive events: If all possible outcomes of an experiment are considered, the outcomes are said to be
exhaustive. The exhaustive events are simply all the sample points in the sample space. In throwing a die, 1, 2, 3,
4, 5 and 6 are exhaustive events.
9. Mutually exclusive events: Events are said to be mutually exclusive if they cannot occur together, i.e. the
occurrence of any one of them prevents the occurrence of the others. If A and B are two mutually exclusive
events, then A ∩ B = Φ. Head and tail are mutually exclusive events when a coin is tossed.
10. Equally likely events: Events are said to be equally likely if we have no reason to believe that one event is
more likely to occur than the others. Head and tail are equally likely events in tossing a coin.
11. Favorable cases: The sample points favorable to the happening of an event A are known as the favorable
cases of A. E.g. in drawing a card from a pack of 52 cards, the number of favorable cases for getting a spade is 13.
12. Independent events: Events are said to be independent if the happening of one event does not depend upon
the happening or non-happening of the others. E.g. when a coin is tossed twice, getting a head in the first toss and
getting a head in the second toss are independent events, because the result of the second toss does not depend
upon the result of the first. Similarly, getting a 3 when a die is thrown and getting a spade from a pack of cards are
independent events.
Having defined these basic terms, we are now in a position to define probability.
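The terms above map naturally onto Python sets. This is a minimal sketch using the two-dice sample space listed above; the events A, B and C are hypothetical examples chosen for illustration.

```python
from itertools import product

# Sample space for throwing two dice: all ordered pairs (first die, second die)
S = set(product(range(1, 7), repeat=2))

# Hypothetical events, each a subset of S
A = {s for s in S if s[0] + s[1] == 7}    # the sum of the faces is 7
B = {s for s in S if s[0] == s[1]}        # a doublet
C = {s for s in S if s[0] % 2 == 1}       # the first die shows an odd number

A_complement = S - A                      # A'
union = A | C                             # A ∪ C
intersection = A & C                      # A ∩ C
difference = A - C                        # A – C

print(len(S))                             # n(S)
print(A & B == set())                     # True: A and B are mutually exclusive
print(difference == A & (S - C))          # True: A – C = A ∩ C'
```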
Mathematical (Classical or A priori) definition of probability: If an experiment can result in n exhaustive,
mutually exclusive and equally likely ways, and m of them are favorable to the happening of an event A, then the
probability of the event A is defined as the ratio of m to n. It is denoted by P(A), i.e.
$$P(A) = \frac{\text{Number of cases favorable to } A}{\text{Total number of exhaustive, mutually exclusive and equally likely cases}} = \frac{m}{n}$$
Limitations of the mathematical definition:
(1) If the total number of exhaustive cases n is not known, the probability cannot be obtained.
(2) If the exhaustive cases are infinite, the probability cannot be found.
(3) The definition can be used only when the cases are equally likely; if they are not, it cannot be used.
(4) The definition uses the term "equally likely", and events are equally likely when they have the same chance,
i.e. the same probability, of occurrence. Thus the word probability is used indirectly in defining probability. The
definition is therefore circular and cannot be regarded as a good definition.
Statistical (Empirical or A posteriori) definition of probability: If an experiment is repeated a great number of
times under essentially the same conditions, the limit of the ratio of the number of times the event happens to the
total number of trials is defined as the probability of the event. It is assumed that this limit exists and is unique,
i.e.
$$P(A) = \lim_{n \to \infty} \frac{m}{n}$$
where m is the number of times A happens in n trials.
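The two definitions can be compared on one experiment. This is a minimal sketch, assuming the event "the sum of two dice is 7" as a hypothetical example; the classical value m/n is computed exactly, and the empirical value is a relative frequency from a simulation with Python's `random` module (a fixed seed is used only to make runs repeatable).

```python
import random
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """P(A) = m / n; valid only when all outcomes are equally likely."""
    m = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(m, len(sample_space))


def empirical_probability(trial, event, n_trials, seed=0):
    """Relative frequency m / n after n_trials repetitions of the experiment."""
    rng = random.Random(seed)
    m = sum(1 for _ in range(n_trials) if event(trial(rng)))
    return m / n_trials


def throw_two_dice(rng):
    return (rng.randint(1, 6), rng.randint(1, 6))


def sum_is_seven(outcome):
    return sum(outcome) == 7


two_dice = list(product(range(1, 7), repeat=2))
print(classical_probability(two_dice, sum_is_seven))   # exact ratio m / n
for n in (100, 10_000, 100_000):
    # the relative frequency tends to settle near the classical value as n grows
    print(n, empirical_probability(throw_two_dice, sum_is_seven, n))
```

The simulation only approximates the limit in the statistical definition; no finite number of trials reaches it exactly.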

RESULTS
1. Complementary events: U = A ∪ A’, where A and A’ are mutually exclusive and P(U) = 1
P(U) = P(A) + P(A’)
P(A ∪ A’) = P(A) + P(A’)
P(A) + P(A’) = 1
P(A) = 1 – P(A’)
P(A’) = 1 – P(A)
2. P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
A = (A ∩ B’) ∪ (A ∩ B)
P(A) = P(A ∩ B’) + P(A ∩ B)
B = (A ∩ B) ∪ (A’ ∩ B)
P(B) = P(A ∩ B) + P(A’ ∩ B)
P(A’ ∩ B’) = P[(A ∪ B)’]   (De Morgan's law)
P(A ∪ B) = 1 – P[(A ∪ B)’]
P[(A ∪ B)’] = 1 – P(A ∪ B)
P(A’ ∩ B’) = 1 – P(A ∪ B)
3. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) – P(A ∩ B) – P(A ∩ C) – P(B ∩ C) + P(A ∩ B ∩ C)
4. Mutually Exclusive Events:
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)
P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
P(A) + P(B) + P(C) = 1   (if A, B and C are also exhaustive)
5. Difference Events:
P(A – B) = P(A ∩ B’)
= P(A) – P(A ∩ B)
P(B – A) = P(A’ ∩ B)
= P(B) – P(A ∩ B)
If A ∩ B = Ø:
P(A – B) = P(A)
P(B – A) = P(B)
6. Independent Events:
P(A ∩ B) = P(A) ∙ P(B)
P(A ∩ B’) = P(A) ∙ P(B’)
P(A’ ∩ B) = P(A’) ∙ P(B)
P(A’ ∩ B’) = P(A’) ∙ P(B’)
P(A/B) = P(A)
P(B/A) = P(B)
7. Conditional Probability:
P(A/B) = P(A ∩ B) / P(B), where P(B) > 0;   P(A ∩ B) = P(B) ∙ P(A/B)
P(B/A) = P(A ∩ B) / P(A), where P(A) > 0;   P(A ∩ B) = P(A) ∙ P(B/A)
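Several of these results can be verified exactly on a single throw of a die. This is a minimal sketch with hypothetical events A (an even number), B (a number greater than 3) and C (a number less than 3); the helper `P` simply applies the classical definition.

```python
from fractions import Fraction

S = frozenset(range(1, 7))          # sample space for throwing one die


def P(event):
    """Classical probability n(E) / n(S) on the equally likely faces."""
    return Fraction(len(event), len(S))


A = {2, 4, 6}   # an even number
B = {4, 5, 6}   # a number greater than 3
C = {1, 2}      # a number less than 3

# Result 1: complementary events
assert P(S - A) == 1 - P(A)
# Result 2: addition rule and De Morgan's law
assert P(A | B) == P(A) + P(B) - P(A & B)
assert P((S - A) & (S - B)) == 1 - P(A | B)
# Result 5: difference event
assert P(A - B) == P(A) - P(A & B)
# Result 6: A and C are independent, A and B are not
assert P(A & C) == P(A) * P(C)
assert P(A & B) != P(A) * P(B)
# Result 7: conditional probability and the multiplication rule
p_A_given_B = P(A & B) / P(B)
assert P(A & B) == P(B) * p_A_given_B
print(p_A_given_B)
```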
BBA SEM - III BS (BBA 305) Instructor: Dr. Sheetal Patel

CHAPTER: 2
MEASURE OF CENTRAL TENDENCY AND
MEASURE OF DISPERSION (VARIATION)

Formulas:
Measure of Central Tendency
1. MEAN (ARITHMETIC MEAN):
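• Only observations:
$$\bar{X} = \frac{\sum x_i}{n} \quad\text{or}\quad \bar{X} = A + \frac{\sum d_i}{n}, \qquad d_i = x_i - A$$

• Ungrouped frequency distribution:
$$\bar{X} = \frac{\sum f_i x_i}{n} \quad\text{or}\quad \bar{X} = A + \frac{\sum f_i d_i}{n}, \qquad d_i = x_i - A$$

• Grouped frequency distribution:
$$\bar{X} = \frac{\sum f_i x_i}{n} \quad\text{or}\quad \bar{X} = A + \frac{\sum f_i d_i}{n} \times C, \qquad d_i = \frac{x_i - A}{C}$$
where A is the assumed mean, n = Σf_i for frequency distributions, C is the class width and, for grouped data,
x_i is the mid-value of the i-th class.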
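The direct and assumed-mean forms can be compared in a short Python sketch. The observations, class mid-values and frequencies below are hypothetical, and the function names are illustrative; passing frequencies of 1 turns the frequency formulas into the formulas for raw observations.

```python
from fractions import Fraction


def mean_direct(values, freqs):
    """X̄ = Σfx / n with n = Σf (use frequencies of 1 for raw observations)."""
    n = sum(freqs)
    return Fraction(sum(f * x for x, f in zip(values, freqs)), n)


def mean_assumed(values, freqs, A, C=1):
    """X̄ = A + (Σfd / n) × C with d = (x − A) / C.

    C = 1 gives the plain assumed-mean form; C = class width gives the
    step-deviation form for grouped data.
    """
    n = sum(freqs)
    sum_fd = sum(f * Fraction(x - A, C) for x, f in zip(values, freqs))
    return A + sum_fd / n * C


# Hypothetical raw observations
obs = [12, 15, 11, 18, 14]
# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50 given by their mid-values
mids = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]

print(mean_direct(obs, [1] * len(obs)), mean_assumed(obs, [1] * len(obs), A=15))
print(mean_direct(mids, freqs), mean_assumed(mids, freqs, A=25, C=10))
```

Any choice of A gives the same mean; a value near the centre of the data simply keeps the deviations small.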

2. MEDIAN:
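• Only observations and ungrouped frequency distribution:
$$M = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ obs.}$$

• Grouped frequency distribution:
$$\text{Median class} = \left(\frac{n}{2}\right)^{\text{th}} \text{ obs.}, \qquad M = L + \frac{\frac{n}{2} - cf_i}{f_i} \times C$$
where L is the lower boundary of the median class, cf_i is the cumulative frequency of the class preceding the
median class, f_i is the frequency of the median class and C is the class width.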
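For raw observations and ungrouped frequency distributions, the ((n+1)/2)th-observation rule can be sketched as below. This is a minimal illustration with hypothetical data and function names; when n is even the position falls halfway between two observations, and the sketch averages them, which is the usual convention.

```python
def median_ungrouped(data):
    """Median as the ((n + 1) / 2)th observation of the sorted data.

    When n is even the position falls halfway between two observations,
    so those two middle observations are averaged.
    """
    xs = sorted(data)
    pos = (len(xs) + 1) / 2          # 1-based position of the median
    lower = xs[int(pos) - 1]
    if pos.is_integer():
        return lower
    return (lower + xs[int(pos)]) / 2


def median_from_frequency(values, freqs):
    """Ungrouped frequency distribution: repeat each value f times, then locate the median."""
    data = [x for x, f in zip(values, freqs) for _ in range(f)]
    return median_ungrouped(data)


print(median_ungrouped([7, 3, 9, 1, 5]))                 # odd n
print(median_ungrouped([4, 1, 3, 2]))                    # even n
print(median_from_frequency([10, 20, 30], [3, 4, 2]))
```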

3. QUARTILES [1 TO 3]:

• Only observations and ungrouped frequency distribution:
$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ obs.}$$
$$Q_3 = 3\left(\frac{n+1}{4}\right)^{\text{th}} \text{ obs.}$$


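• Grouped frequency distribution:
$$Q_1 \text{ class} = \left(\frac{n}{4}\right)^{\text{th}} \text{ obs.}, \qquad Q_1 = L + \frac{\frac{n}{4} - cf_i}{f_i} \times C$$
$$Q_3 \text{ class} = 3\left(\frac{n}{4}\right)^{\text{th}} \text{ obs.}, \qquad Q_3 = L + \frac{\frac{3n}{4} - cf_i}{f_i} \times C$$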

4. DECILES [1 TO 9]:

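• Only observations and ungrouped frequency distribution:
$$D_K = K\left(\frac{n+1}{10}\right)^{\text{th}} \text{ obs.}$$

• Grouped frequency distribution:
$$D_K \text{ class} = K\left(\frac{n}{10}\right)^{\text{th}} \text{ obs.}, \qquad D_K = L + \frac{\frac{Kn}{10} - cf_i}{f_i} \times C$$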

5. PERCENTILES [1 TO 99]:

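• Only observations and ungrouped frequency distribution:
$$P_K = K\left(\frac{n+1}{100}\right)^{\text{th}} \text{ obs.}$$

• Grouped frequency distribution:
$$P_K \text{ class} = K\left(\frac{n}{100}\right)^{\text{th}} \text{ obs.}, \qquad P_K = L + \frac{\frac{Kn}{100} - cf_i}{f_i} \times C$$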
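The grouped formulas for the median, quartiles, deciles and percentiles share one pattern: locate the class containing the (K·n/parts)th observation, then interpolate within it. The sketch below is a minimal illustration of that pattern; the function name and the frequency distribution (classes 0-10 to 40-50) are hypothetical.

```python
from fractions import Fraction


def grouped_partition_value(lower_limits, freqs, width, k, parts):
    """k-th partition value of a grouped frequency distribution.

    parts = 2 (median), 4 (quartiles), 10 (deciles) or 100 (percentiles).
    The class containing the (k·n / parts)th observation is located from the
    cumulative frequencies, then L + (k·n / parts − cf) / f × C is applied.
    """
    n = sum(freqs)
    target = Fraction(k * n, parts)
    cf = 0                                   # cumulative frequency before the current class
    for L, f in zip(lower_limits, freqs):
        if cf + f >= target:
            return L + (target - cf) / f * width
        cf += f
    raise ValueError("k must satisfy 0 < k <= parts")


lows = [0, 10, 20, 30, 40]       # hypothetical classes 0-10, ..., 40-50 (C = 10)
freqs = [5, 8, 12, 10, 5]
for name, k, parts in [("Median", 1, 2), ("Q1", 1, 4), ("Q3", 3, 4), ("D7", 7, 10), ("P90", 90, 100)]:
    print(name, grouped_partition_value(lows, freqs, 10, k, parts))
```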

6. MODE:

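• When the mode is well defined:
Modal class = the class with the highest frequency
$$Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times C$$
where L is the lower boundary of the modal class, f_1 is its frequency, f_0 and f_2 are the frequencies of the
classes preceding and following it, and C is the class width.

• When the mode is ill-defined:
$$Z = 3M - 2\bar{X}$$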
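Both mode formulas are sketched below. This is a minimal illustration on a hypothetical distribution (classes 0-10 to 40-50); for the empirical relation it uses the median (155/6) and mean (51/2) of that same distribution. Treating a missing neighbouring class as frequency 0, when the modal class is the first or last class, is an assumption of this sketch.

```python
from fractions import Fraction


def grouped_mode(lower_limits, freqs, width):
    """Z = L + (f1 − f0) / (2f1 − f0 − f2) × C for the class with the highest frequency.

    Assumption: a missing neighbouring class (modal class at either end) counts as frequency 0.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # first class with the highest frequency
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_limits[i] + Fraction(f1 - f0, 2 * f1 - f0 - f2) * width


def empirical_mode(median, mean):
    """Z = 3M − 2X̄, used when the mode is ill-defined."""
    return 3 * median - 2 * mean


lows = [0, 10, 20, 30, 40]       # hypothetical classes 0-10, ..., 40-50
freqs = [5, 8, 12, 10, 5]
print(grouped_mode(lows, freqs, 10))
print(empirical_mode(Fraction(155, 6), Fraction(51, 2)))
```

For this moderately skewed distribution the two values (80/3 and 53/2) are close, which is what the empirical relation is intended to give.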


Measure of Dispersion (Variation)

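1. RANGE:
$$R = X_H - X_L, \qquad \text{Co-efficient of Range} = \frac{X_H - X_L}{X_H + X_L} = \frac{R}{X_H + X_L}$$

2. QUARTILE DEVIATION:
$$Q_d = \frac{Q_3 - Q_1}{2}, \qquad \text{Co-efficient of } Q_d = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$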
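A minimal Python sketch of the range and quartile-deviation formulas, with their coefficients. The observations and the quartile values passed in are hypothetical, as are the function names.

```python
def range_coefficient(data):
    """R = X_H − X_L and co-efficient of range = R / (X_H + X_L)."""
    xh, xl = max(data), min(data)
    r = xh - xl
    return r, r / (xh + xl)


def quartile_deviation(q1, q3):
    """Q_d = (Q3 − Q1) / 2 and co-efficient of Q_d = (Q3 − Q1) / (Q3 + Q1)."""
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)


print(range_coefficient([12, 15, 11, 18, 14]))   # hypothetical observations
print(quartile_deviation(16.25, 35))             # hypothetical quartiles
```

The coefficients are ratios with no unit, so they can compare the spread of series measured in different units.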

3. MEAN DEVIATION:

• Only observations:
$$\delta_{\bar{X}} = \frac{\sum |x_i - \bar{x}|}{n}$$

• Ungrouped and grouped frequency distributions:
$$\delta_{\bar{X}} = \frac{\sum f_i |x_i - \bar{x}|}{n}, \qquad \text{Co-efficient of Mean Deviation} = \frac{\delta_{\bar{X}}}{\bar{X}}$$
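A minimal sketch of the mean deviation about the mean and its coefficient. The function name and data are hypothetical; leaving out the frequencies treats every observation as occurring once, which reproduces the formula for raw observations.

```python
def mean_deviation(values, freqs=None):
    """δ = Σf|x − x̄| / n and its co-efficient δ / x̄ (frequencies default to 1)."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / n
    return md, md / mean


print(mean_deviation([12, 15, 11, 18, 14]))                        # hypothetical observations
print(mean_deviation([5, 15, 25, 35, 45], [5, 8, 12, 10, 5]))      # hypothetical class mid-values
```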

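4. STANDARD DEVIATION (S OR σ):

• Only observations:
$$S = \sqrt{\frac{\sum x_i^2}{n} - \bar{X}^2} \quad\text{or}\quad S = \sqrt{\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2}$$
$$S = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2} \quad\text{or}\quad S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}$$

• Ungrouped frequency distribution:
$$S = \sqrt{\frac{\sum f_i x_i^2}{n} - \bar{X}^2} \quad\text{or}\quad S = \sqrt{\frac{\sum f_i x_i^2}{n} - \left(\frac{\sum f_i x_i}{n}\right)^2}$$
$$S = \sqrt{\frac{\sum f_i d_i^2}{n} - \left(\frac{\sum f_i d_i}{n}\right)^2} \quad\text{or}\quad S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{n}}$$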
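The different standard deviation formulas should give the same value. This is a minimal check on a hypothetical ungrouped frequency distribution; the function name is illustrative, and the assumed mean A is an arbitrary choice (the middle value), since any A gives the same S.

```python
import math


def standard_deviation_forms(values, freqs):
    """Compute S by three of the formulas above; the results agree up to rounding."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    direct = math.sqrt(sum_fx2 / n - mean ** 2)                        # √(Σfx²/n − X̄²)
    about_mean = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n)
    A = values[len(values) // 2]                                       # arbitrary assumed mean
    sum_fd = sum(f * (x - A) for x, f in zip(values, freqs))
    sum_fd2 = sum(f * (x - A) ** 2 for x, f in zip(values, freqs))
    shortcut = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)              # √(Σfd²/n − (Σfd/n)²)
    return direct, about_mean, shortcut


# Hypothetical ungrouped frequency distribution
values = [5, 15, 25, 35, 45]
freqs = [5, 8, 12, 10, 5]
print(standard_deviation_forms(values, freqs))
```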


Formulas:
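Karl Pearson’s Product Moment Method
Only Observations:
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}}$$
$$r = \frac{n\sum uv - (\sum u)(\sum v)}{\sqrt{n\sum u^2 - (\sum u)^2}\,\sqrt{n\sum v^2 - (\sum v)^2}}$$
Here u and v are the deviations of x and y from assumed means (optionally divided by the class widths); r is
unaffected by such a change of origin and scale.

Ungrouped Frequency Distribution and Grouped Frequency Distribution:
$$r = \frac{n\sum fuv - (\sum u f_u)(\sum v f_v)}{\sqrt{n\sum u^2 f_u - (\sum u f_u)^2}\,\sqrt{n\sum v^2 f_v - (\sum v f_v)^2}}$$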
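The three formulas for raw observations are algebraically equivalent, which a short computation can confirm. This is a minimal sketch on hypothetical data; the function names are illustrative, and the coded series use an arbitrary change of origin (30 and 25) and scale (10 for x only).

```python
import math


def pearson_raw(xs, ys):
    """r = [nΣxy − ΣxΣy] / [√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)]."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))


def pearson_deviation(xs, ys):
    """r = Σ(x − x̄)(y − ȳ) / [√Σ(x − x̄)² · √Σ(y − ȳ)²]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)) * math.sqrt(sum((b - my) ** 2 for b in ys))
    return num / den


x = [10, 20, 30, 40, 50]            # hypothetical data
y = [15, 25, 20, 40, 45]
u = [(xi - 30) / 10 for xi in x]    # change of origin and scale
v = [yi - 25 for yi in y]           # change of origin only

print(pearson_raw(x, y), pearson_deviation(x, y), pearson_raw(u, v))
```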

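Prof. Charles Edward Spearman’s Rank Correlation
$$r = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
When some ranks are tied:
$$r = 1 - \frac{6\left\{\sum d^2 + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \frac{m(m^2 - 1)}{12} + \cdots\right\}}{n(n^2 - 1)}$$
where d is the difference between the two ranks of an item, and one correction term m(m² − 1)/12 is added for
each group of m tied items.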
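A minimal sketch of Spearman's formula, including the tie correction, using exact fractions. The data are hypothetical, as are the function names. The sketch assumes that rank 1 goes to the largest value and that tied values share the average of the ranks they occupy; ranking in ascending order instead gives the same r, provided both series are ranked the same way.

```python
from collections import Counter
from fractions import Fraction


def average_ranks(values):
    """Rank 1 = largest value; tied values share the average of the ranks they occupy."""
    order = sorted(values, reverse=True)
    ranks = {}
    for v in set(values):
        first = order.index(v) + 1                      # first rank position taken by v
        count = order.count(v)
        ranks[v] = Fraction(2 * first + count - 1, 2)   # mean of first .. first + count − 1
    return [ranks[v] for v in values]


def spearman(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    # correction m(m² − 1)/12 for every group of m tied values, in either series
    correction = sum(Fraction(m * (m * m - 1), 12)
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


print(spearman([80, 60, 70, 90, 50], [75, 65, 60, 85, 55]))   # no ties
print(spearman([10, 20, 20, 30, 40], [12, 18, 25, 25, 30]))   # one tie in each series
```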

ASSIGNMENT – III
(1) What is Correlation?

(2) Explain the types of Correlation.

(3) Write a note on the Scatter diagram method with the importance & limitations of this
method.

(4) Write down the merits & demerits of Pearson’s product moment method & Spearman’s
rank correlation method.

(5) Differentiate between Karl Pearson method & Spearman method.

(6) Interpret the correlation coefficient.



Formulas:
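• Equation of Regression line of Y on X:
$$Y = a + b_{yx} X$$
Only Observations:
$$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$
$$b_{yx} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2}$$
$$b_{yx} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum u^2 - (\sum u)^2} \times \frac{c_y}{c_x}$$
In the third form u and v are the deviations of x and y from assumed means (change of origin only); in the
fourth form these deviations are also divided by the scale factors c_x and c_y.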


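Ungrouped Frequency Distribution and Grouped Frequency Distribution:
$$b_{yx} = \frac{n\sum fuv - (\sum u f_u)(\sum v f_v)}{n\sum u^2 f_u - (\sum u f_u)^2} \times \frac{c_y}{c_x}$$

$$b_{yx} = \frac{\operatorname{cov}(x, y)}{S_x^2} \quad\text{or}\quad b_{yx} = r\,\frac{S_y}{S_x} \quad\text{or}\quad b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n S_x^2}$$

$$a = \bar{Y} - b_{yx}\bar{X}$$

• Equation of Regression line of X on Y:
$$X = A + b_{xy} Y$$
Only Observations:
$$b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$$
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$$
$$b_{xy} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2}$$
$$b_{xy} = \frac{n\sum uv - (\sum u)(\sum v)}{n\sum v^2 - (\sum v)^2} \times \frac{c_x}{c_y}$$

Ungrouped Frequency Distribution and Grouped Frequency Distribution:
$$b_{xy} = \frac{n\sum fuv - (\sum u f_u)(\sum v f_v)}{n\sum v^2 f_v - (\sum v f_v)^2} \times \frac{c_x}{c_y}$$

$$b_{xy} = \frac{\operatorname{cov}(x, y)}{S_y^2} \quad\text{or}\quad b_{xy} = r\,\frac{S_x}{S_y} \quad\text{or}\quad b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n S_y^2}$$

$$A = \bar{X} - b_{xy}\bar{Y}$$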
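The direct and coded (step-deviation) forms of the regression coefficients can be compared in a short sketch. The data, the scale factors c_x = 10 and c_y = 5, and the helper name `slope` are hypothetical; the sketch also checks $b_{yx} = r\,S_y/S_x$ and computes both intercepts.

```python
import math


def slope(us, vs):
    """[nΣuv − ΣuΣv] / [nΣu² − (Σu)²]: slope of the regression of vs on us."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(a * b for a, b in zip(us, vs))
    suu = sum(a * a for a in us)
    return (n * suv - su * sv) / (n * suu - su ** 2)


x = [10, 20, 30, 40, 50]            # hypothetical data
y = [15, 25, 20, 40, 45]
c_x, c_y = 10, 5                    # hypothetical scale factors
u = [(xi - 30) / c_x for xi in x]   # deviations from an assumed mean, divided by c_x
v = [(yi - 25) / c_y for yi in y]   # deviations from an assumed mean, divided by c_y

b_yx = slope(x, y)
b_xy = slope(y, x)
b_yx_coded = slope(u, v) * c_y / c_x     # coded form × c_y / c_x
b_xy_coded = slope(v, u) * c_x / c_y     # coded form × c_x / c_y

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

a = mean_y - b_yx * mean_x      # intercept of Y on X
A = mean_x - b_xy * mean_y      # intercept of X on Y
print(f"Y = {a:.4f} + {b_yx:.4f} X")
print(f"X = {A:.4f} + {b_xy:.4f} Y")
print(math.isclose(b_yx, r * s_y / s_x))
```

In this example b_xy is about 1.12, so one regression coefficient exceeds 1, while the product b_yx · b_xy = r² stays below 1.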
