Research Methodology - ITM University - MBA - Sem 2

ITM UNIVERSITY ONLINE
Research Methodology
Table of Contents
1. Introduction to Research Methodology
1.1 Introduction
1.2 Definition of Research
1.3 Research Methods and Research Methodology
1.4 Objectives of Research
1.5 Motivation for Conducting Research
1.6 Criteria of Good Research
1.7 Characteristics of Research
1.8 Types of Research
1.9 Steps Involved in Research Process
1.10 Role of Research in Business
1.11 Chapter Summary
2. Research Problem Formulation and Research Design
2.1 Introduction
2.2 Definition of Research Problem
2.3 Procedure of Defining General Research Problem
2.4 Objectives of Research Design
2.5 Research Design
2.6 Contents of Research Design
2.7 Important Concepts in Research Design
2.8 Types of Research Design
2.9 Basic Principles of Experimental Design
3. Sampling Design and Sampling Techniques
3.1 Introduction
3.2 Population, Census, and Sample
3.3 Sampling Design
3.4 Sample Design Procedure
3.5 Characteristics of a Good Sample Design
3.6 Criteria for Selecting a Sampling Procedure
3.7 Types of Sampling Techniques
www.itmuniversityonline.org Page 2
4. Methods and Tools of Data Collection
4.1 Introduction
4.2 Data Types
4.3 Questionnaire Design
4.4 Requirements of a Good Questionnaire
4.5 Case Study
5. Measurement and Scaling Techniques
5.1 Introduction
5.2 Measurement and Scaling
5.3 Primary Scales of Measurement
5.4 Classification of Scaling Techniques
5.5 Comparative Scales
5.6 Categorical Scales
6. Tabulation and Analysis of Data
6.1 Introduction
6.2 Tabulation
6.3 Multiple Regression Analysis
6.4 Multiple Discriminant Analysis
6.5 Measures of Central Tendency
6.6 Measures of Dispersion
6.7 Measures of Skewness
6.8 Measures of Relationships
6.9 Association of Attributes
6.10 Time Series Analysis and Index Number
7. Hypothesis Testing
7.1 Introduction
7.2 Hypothesis
7.3 Types of Hypothesis
7.4 Terminologies Used in Hypothesis Testing
7.5 Procedure of Testing of Hypothesis
7.6 Parametric and Non-parametric Testing
7.7 Testing of Hypothesis for Mean
7.8 Testing of Hypothesis for Variance
7.9 Testing of Hypothesis for Correlation Coefficients
7.10 Limitations of Testing of Hypothesis
8. Analysis of Variance (ANOVA)
8.2 Analysis of Variance (ANOVA)
8.3 Why Analyze Variance?
8.4 Variability Measure by One-way ANOVA
8.5 ANOVA Technique
8.6 One-way ANOVA - Example
8.7 Two-way ANOVA
8.8 Analysis of Covariance (ANOCOVA)
9. Non-parametric Testing and Chi-Square Test
9.2 Non-parametric Test
9.3 Chi-square Test
9.4 Sign Test
9.5 Run Test
9.6 Spearman's Rank Correlation
9.7 Kendall's Test
9.8 Wilcoxon Matched-pairs Test
9.9 Mann-Whitney U Test
10. Research Report Writing
10.2 Meaning and Importance of Research Report
10.3 Steps in Writing Research Report
10.4 Report Format
10.5 Preliminary Parts of Research Report
10.6 Main Body of Research Report
10.7 Types of Research Report
Introduction to Research Methodology
01. Introduction to Research Methodology eBook
1.1 Introduction
The term research is increasingly used for any kind of investigation that is intended to
discover interesting or new facts. Research covers a variety of investigations relevant
to a wide range of subjects, such as leisure studies and sports, hospitality, healthcare
and nursing studies, the natural sciences, social sciences, the environment, social
anthropology, psychology, politics, business, education, and the humanities. Many
university courses include research that students must carry out independently, in the
form of projects, dissertations, and theses; the more advanced the degree, the greater
the research content.
After reading this chapter, you will be able to:
Define research
Explain research methods and research methodology
Explain the objectives of research
List motivations for conducting research
Enumerate the characteristics of research
Identify the different types of research
Describe the steps involved in the research process
Explain the role of research in business
1.2 Definition of Research
Generally, research refers to a search for knowledge. It can also be defined as a
scientific and systematic search for pertinent information on a specific topic. Research
is one way to find a good solution to a problem by investigating and analyzing
information scientifically. In a technical sense, research is an academic activity.
According to some people, research is a movement from the known to the unknown.
According to Clifford Woody, "Research comprises defining and redefining
problems, formulating hypothesis or suggested solutions; collecting, organising
and evaluating data; making deductions and reaching conclusions; and at last
carefully testing the conclusions to determine whether they fit the formulating
hypothesis."
In the words of Grinnell, "Research is a structured inquiry that utilizes
acceptable scientific methodology to solve problems and creates new
knowledge that is generally acceptable."
D. Slesinger and M. Stephenson, in the Encyclopaedia of Social Sciences, defined
research as "the manipulation of things, concepts or symbols for the purpose of
generalising to extend, correct or verify knowledge, whether that knowledge
aids in construction of theory or in the practice of an art."
In other words, research is the search for knowledge through an objective and systematic
method of finding a solution to a problem. It is a systematic approach concerned with
generalization and the formulation of theory. It includes enunciating the problem,
formulating a hypothesis, collecting the facts or data, analyzing the facts, and reaching
certain conclusions.
1.3 Research Methods and Research Methodology
Research Methods
Research methods and research methodology differ. Methods/techniques that are used
during the course of conducting the research are known as research methods. The study
of research methods offers training to apply various methods to solve the research
problem.
Research methods can be categorized into the following three groups:
Methods concerned with the collection of data. Example: questionnaire method,
interview, etc.
Statistical techniques used for establishing relationships between the variables under
study. Example: regression and correlation analysis, etc.
Methods used for evaluating the accuracy of the results. Example: testing of
hypothesis, etc.
Research methodology is a way to systematically solve the research problem. It covers
the various steps that a researcher generally adopts in studying a research problem,
along with the logic behind them. Methodology refers to the procedure, theory, or study
of methods by which knowledge is gained. Research methodology is the procedure by
which researchers go about describing, explaining, and predicting the phenomena they
study.
Research methodology provides the necessary training in choosing the scientific tools,
materials, methods, and techniques relevant to the chosen problem. It is also concerned
with questions such as:
Why is a particular research study undertaken?
How is the research problem formulated?
Why is a particular technique of data analysis used?
1.4 Objectives of Research
Research finds answers to questions through the application of scientific procedures. Its
main aim is to discover truth that is hidden and has not yet been discovered. Each
research study has its own specific purpose.
The major objectives of research are:
Exploration: To gain familiarity with a phenomenon or to achieve new insights into it.
Description: To portray accurately the characteristics of a particular individual,
situation, or group.
Diagnosis: To determine the frequency with which something occurs or with which it is
associated with something else.
Explanation: To test a hypothesis of a causal relationship between variables.
1.5 Motivation for Conducting Research
There are various factors motivating people to undertake research studies:
1. Desire to get a research degree along with its consequential benefits.
2. Desire to face the challenge of solving unsolved problems.
3. Desire to get the intellectual joy of doing some creative work.
4. Desire to serve society.
5. Desire to gain respectability.
Note:
This is not a comprehensive list of factors motivating people to undertake research
studies.
1.6 Criteria of Good Research
Good research should satisfy the following criteria:
1. The purpose of the research should be clearly defined.
2. The research procedure used should be described in sufficient detail, in order to
permit another researcher to repeat the research for further advancement,
keeping the continuity of what has already been attained.
3. The procedural design of the research should be carefully planned to yield results
that are as objective as possible.
4. The researcher should report, with complete frankness, flaws in the procedural
design and estimate their effects on the findings.
5. The analysis of data should be sufficiently adequate to reveal its significance, and
the methods of analysis used should be appropriate. The validity and reliability of
the data should be checked carefully.
6. Conclusions should be confined to those justified by the data of the research and
limited to those for which the data provide an adequate basis.
7. Greater confidence in research is warranted if the researcher is experienced, has a
good reputation in research, and is a person of integrity.
Source: 1. James Harold Fox, "Criteria of Good Research", Phi Delta Kappan, Vol. 39 (March 1958), pp.
285-86. 2. Danny N. Bellenger and Barnett A. Greenberg, "Marketing Research: A Management
Information Approach", pp. 107-108
1.7 Characteristics of Research
Several characteristics are used to judge the validity and soundness of research, and the
success of any research depends on them. Some characteristics of research are:
Reliability
Reliability is a subjective term that cannot be measured precisely, although various
techniques and instruments are used to estimate it. Reliable research yields similar
results every time it is repeated with a similar population and similar procedures.
Reliability thus refers to the repeatability of any research, research instrument, tool, or
procedure: the more often repetitions produce similar results, the more reliable the
research.
Validity
Validity refers to how well research conclusions, assumptions, or propositions
approximate the truth. The applicability of any research depends on its validity. The
validity of a research instrument can be defined as its suitability to the research
problem, that is, how accurately the instrument measures what it is intended to
measure. Defining concepts as precisely as possible keeps the research on track and
reduces errors in measurement.
Accuracy
Accuracy refers to the degree to which the research processes, instruments, and tools
are compatible with one another. It indicates whether the research tools have been
selected in the best possible manner and whether the research procedures suit the
research problem. The accuracy of research can be improved by choosing the most
suitable data collection tool.
Credibility
Credibility comes from using the best sources of information and the best procedures in
research. Relying on secondary data may allow the research to be completed in less
time, but its credibility will be at stake, because secondary data has been manipulated
by humans and is therefore less valid for use in research. When primary data is not
available, a certain percentage of secondary data can be used, but basing research
entirely on secondary data is not appropriate. The credibility of research can also be
increased by giving accurate references.
Generalizability
This refers to the applicability of research findings to a larger population. A researcher
takes a small sample from the target population to conduct the research. Because the
sample is meant to represent the population, findings obtained from the sample should
also hold for the population. If research findings can be applied to any sample from the
population, the results of the research are said to be generalizable.
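One common way to generalize a sample result to the population is to report an interval estimate rather than the sample figure alone. The sketch below assumes a hypothetical random sample of ten customers' monthly spending and computes a 95% confidence interval for the population mean, using the tabulated t critical value for 9 degrees of freedom (2.262). The interval is only meaningful if the sample is random and the population is roughly normal.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical random sample: monthly spending of 10 customers (in rupees)
sample = [1200, 950, 1100, 1300, 1050, 980, 1250, 1150, 1020, 1180]

n = len(sample)
sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(n)  # stdev() uses the n - 1 divisor

# Two-sided 95% t critical value for n - 1 = 9 degrees of freedom (from t tables)
T_CRITICAL = 2.262
lower = sample_mean - T_CRITICAL * standard_error
upper = sample_mean + T_CRITICAL * standard_error

print(f"Sample mean: {sample_mean:.1f}")
print(f"95% confidence interval for the population mean: {lower:.1f} to {upper:.1f}")
```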
Empirical
Empirical research is based on real-life experience and observation, and its results can
be tested for accuracy. Quantitative research is easier to verify scientifically than
qualitative research.
Systematic
Research cannot be conducted haphazardly; each step must follow logically from the
previous one. Researchers follow a set of procedures that have been tested over a
period of time and are thus suitable for use in research.
Controlled
Because an event under study is usually affected by many factors, some factors are held
constant as controlled factors, while others are tested for their possible effect. The
controlled factors or variables must be controlled rigorously. In the pure sciences, this
is relatively easy because experiments are conducted in a laboratory, but in the social
sciences it is difficult to control such factors because of the nature of the research.
Source: 1. http://gulnazahmad.hubpages.com/hub/research-methodology 2. James Harold Fox, "Criteria of
Good Research", Phi Delta Kappan, Vol. 39 (March 1958), pp. 285-86. 3. Danny N. Bellenger and Barnett
A. Greenberg, "Marketing Research: A Management Information Approach", pp. 107-108
1.8 Types of Research
The basic types of research are:
Descriptive vs. Analytical
Descriptive research includes surveys and fact-finding enquiries of different kinds. It
provides an accurate portrayal of the characteristics of a particular individual, situation,
or group, and is also known as statistical research. It deals with everything that can be
counted and studied and that has an impact on the lives of the people concerned.
Example: frequency of customer visits, preferences of students, or similar data. In
analytical research, by contrast, the researcher uses facts or information already
available and analyzes them to make a critical evaluation of the material.
Applied vs. Fundamental
Applied research is a scientific study aimed at solving practical problems. It is used to
find solutions to everyday problems, cure illnesses, and develop innovative techniques,
rather than to acquire knowledge for its own sake. Examples include research to increase
agricultural crop production or to improve the energy efficiency of hospitals,
machinery, etc.
Research concerned with generalizations and the formulation of theory is called
fundamental research. In other words, gathering knowledge for the sake of knowledge is
termed 'pure', 'basic', or 'fundamental' research. Research on natural phenomena or on
topics relating to pure mathematics falls in this category. Studies of human behavior
carried out with a view to making generalizations about it, and basic-science inquiries
into questions such as 'How did the universe begin?', also come under fundamental
research.
Quantitative vs. Qualitative
Quantitative research is based on scientific methods in which data relates to the
measurement of quantity or amount. It is applicable to phenomena that can be expressed
in terms of quantity; if the phenomenon is qualitative in nature, qualitative research
should be conducted. The objective of quantitative research is to develop and employ
mathematical models, theories, and/or hypotheses pertaining to the phenomenon. For
example, research related to the development of machines and tools for measurement.
Qualitative research aims to collect detailed information on human attitudes and the
reasons that govern them. Research designed to find out how people feel or what they
think about a particular subject or institution is also qualitative research.
Conceptual vs. Empirical
Conceptual research relates to abstract ideas or theories; philosophers and thinkers use
it to develop new concepts or to reinterpret existing ones. Empirical research is
data-based research that comes up with conclusions capable of being verified by
observation or experiment. It is also known as experimental research.
Experimental research, that is, an objective, systematic, and controlled investigation, is
required for predicting and controlling phenomena and for examining probability and
causality among selected variables.
Source: Pauline V. Young, Scientific Social Surveys and Research, p. 30.
Other Types of Research
The objective of exploratory research is to explore the data and to identify as many
relationships between variables as possible, without knowing their end applications. It
provides a basis for more general findings and a better understanding of the situation.
It uses survey and observation methods. For example, finding the various causes of a
decrease in the revenue of a particular car segment.
The objective of correlational (or causal) research is to discover or establish the
existence of a relationship, association, or interdependence between two or more
aspects of a situation or variables. For example, finding the impact of incentives on the
productivity of workers while keeping other factors unchanged.
Significance of Research
According to Hudson Maxim, "All progress is born of inquiry. Doubt is often
better than overconfidence, for it leads to inquiry, and inquiry leads to
invention." Progress, in other words, is made possible through research. Research
fosters scientific and inductive thinking and promotes the development of logical habits
of thinking and organization.
Research is very useful in fields such as applied economics, business, and medicine, and
its role is growing day by day. Because business has become increasingly complex,
research is used extensively to solve complicated operational problems.
In the economic system, research provides the basis for nearly all government policies. A
large part of a government's budget is based on an analysis of the needs and desires of
the people and on the availability of revenues to meet these needs.
Research is of great importance in solving many planning and operational problems of
industries and businesses. Market research, operations research, and motivational
research are especially important in making business decisions.
Market research investigates the structure and development of a market for the purpose
of formulating efficient policies for purchasing, production, and sales.
Operations research includes the application of mathematical, logical, and analytical
methods to obtain solutions to business problems of cost minimization or of profit
maximization.
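To make the idea of profit maximization concrete, the sketch below solves a small, hypothetical product-mix problem: how many units of two products to make given limited machine and labour hours. Operations researchers would normally use linear programming for this; the sketch instead checks every whole-number combination, which is only practical for tiny problems but keeps the logic transparent. All figures are assumptions for illustration.

```python
# Hypothetical product-mix problem: maximize profit from products A and B
# subject to limited machine and labour hours (all figures are assumptions).
PROFIT = {"A": 40, "B": 30}         # profit per unit
MACHINE_HOURS = {"A": 2, "B": 1}    # machine hours needed per unit
LABOUR_HOURS = {"A": 1, "B": 2}     # labour hours needed per unit
MACHINE_LIMIT, LABOUR_LIMIT = 100, 80

best = (0, 0, 0)  # (profit, units of A, units of B)
for a in range(MACHINE_LIMIT // MACHINE_HOURS["A"] + 1):
    for b in range(LABOUR_LIMIT // LABOUR_HOURS["B"] + 1):
        machine_used = MACHINE_HOURS["A"] * a + MACHINE_HOURS["B"] * b
        labour_used = LABOUR_HOURS["A"] * a + LABOUR_HOURS["B"] * b
        # Skip combinations that exceed either resource limit
        if machine_used > MACHINE_LIMIT or labour_used > LABOUR_LIMIT:
            continue
        profit = PROFIT["A"] * a + PROFIT["B"] * b
        if profit > best[0]:
            best = (profit, a, b)

print(f"Make {best[1]} units of A and {best[2]} units of B for a profit of {best[0]}")
```

With these figures the search settles on 40 units of A and 20 units of B, where both the machine and labour limits are fully used.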
The significance of research can also be explained with the following points:
To students who have to write a master's or Ph.D. thesis, research may mean
careerism or a way to attain a higher position in the social structure.
To professionals in research methodology, research may mean a source of
livelihood.
To philosophers and thinkers, research may mean the outlet for new ideas and
insights.
To literary men and women, research may mean the development of new styles
and creative work.
To analysts and intellectuals, research may mean generalizations of new theories.
Thus, research is the fountain of knowledge for the sake of knowledge and an important
source of providing guidelines for solving different business, governmental, and social
problems. It is a sort of formal training that enables one to understand the new
developments in one's field in a better way.
Source: C. R. Kothari, Research Methodology: Methods and Techniques, New Age International Publishers,
2nd Edition
Fig. 1.8a: Types of Research. Application perspective: pure research, applied research. Objective perspective: descriptive, explanatory, exploratory, and correlational research. Inquiry mode perspective: quantitative research (structured approach), qualitative research (unstructured approach).
1.9 Steps Involved in Research Process
The research process can be summarized in the following eight steps:
Fig. 1.9a: Steps Involved in Research Process (Research Problem Formulation → Literature Review → Hypothesis Formulation → Research Design (Sample Design) → Data Collection → Data Analysis and Hypothesis Testing → Generalization and Interpretation → Report Preparation)
Step 1: Research Problem Formulation
There are two types of research problems: those that relate to states of nature and
those that relate to relationships between variables. The problem is first stated in a
general way, and any doubts or ambiguities relating to it are then resolved. The
feasibility of a particular solution must be considered before the research problem is
finally formulated.
Understanding the problem thoroughly and rephrasing it in meaningful terms from an
analytical point of view are the two steps involved in formulating the research problem.
Initially, the problem can be stated in a broad, general way and then reframed into
analytical or operational terms by rephrasing it in as specific terms as possible.
Step 2: Literature Review
A literature review (or literature survey) examines research publications, books, and
other documents related to the defined problem. It helps the researcher understand the
chosen problem properly and acquire the theoretical and practical knowledge needed to
investigate it. It also helps identify the variables to be considered in the research and
assess the current status of the problem. After formulating a research problem, the
researcher should write a brief summary of it. For example, a research worker writing a
thesis for a Ph.D. degree must write a synopsis of the topic and submit it to the
Committee or Research Board for approval.
An extensive literature survey connected with the research problem is very important.
Abstracting and indexing journals, and published or unpublished bibliographies, make
this task simpler. Conference proceedings, government reports, academic journals,
books, etc. help a lot in this process. It should also be noted that one source will lead
to another. At this stage, a researcher can use a good library to make the task easier.
Step 3: Hypothesis Formulation
Hypothesis formulation follows the literature survey. Hypotheses should be stated in
clear and unambiguous terms. A hypothesis is a tentative assumption made in order to
draw and test conclusions about the population under consideration. Hypothesis
formulation is an important step because it provides the focal point for research: it
guides the analysis of data and indirectly determines the quality and type of data
required for that analysis. A hypothesis must be very specific and limited because it has
to be tested in the analysis stage.
Hypotheses are specific predictions about the nature and direction of the relationship
between two variables. In research, a working hypothesis is a tentative assumption
made in order to draw out and test its logical or empirical consequences. A hypothesis
is a result of the researcher's creativity.
Step 4: Research Design
Research design consists of the sample design and the methods for collecting,
measuring, and analyzing data. The research design must contain details of the defined
population and sample: the type of population and sample, their size and probability
distribution, and the definitions and details of the variables under study. It also
specifies the procedures and techniques for data collection, the method of sampling the
research population, and the techniques used to process and analyze the data.
Step 5: Data Collection
While selecting a method for collecting primary data, consider the nature, scope, and
objective of the inquiry, the available time, financial resources, and the desired degree
of accuracy. There are various methods of collecting primary data: observation,
personal or telephonic interviews, and questionnaires, schedules, or other data
collection forms.
Step 6: Data Analysis and Hypothesis Testing
After data collection, the data is coded, tabulated, and analyzed to draw statistical
inferences. Through coding, the data is categorized and transformed into symbols that
can be tabulated and counted.
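The sketch below illustrates coding and tabulation on a handful of hypothetical survey answers. The coding scheme (1 to 4 for increasing levels of satisfaction) is an assumption for illustration; in practice the codes are usually fixed when the questionnaire is designed.

```python
from collections import Counter

# Hypothetical raw answers to "How satisfied are you with our service?"
responses = ["Satisfied", "Neutral", "Very satisfied", "Satisfied",
             "Dissatisfied", "Satisfied", "Neutral", "Very satisfied"]

# Coding: each answer category is replaced by a numeric symbol (assumed scheme)
CODES = {"Dissatisfied": 1, "Neutral": 2, "Satisfied": 3, "Very satisfied": 4}
coded = [CODES[answer] for answer in responses]

# Tabulation: count how many respondents fall under each code
table = Counter(coded)
print("Code  Frequency")
for code in sorted(table):
    print(f"{code:>4}  {table[code]:>9}")
```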
Statistical values are then computed from the data, and the hypothesis is tested by
applying tests such as the chi-square test, ANOVA, the F-test, and many more. Based on
the testing criteria, a decision is then taken to accept or reject the hypothesis.
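As an example of this step, the sketch below runs a chi-square test of independence on a hypothetical 2x2 table of gender against product preference. It computes each expected frequency from the row and column totals, sums (observed - expected)^2 / expected over all cells, and compares the result with the tabulated critical value 3.841 for 1 degree of freedom at the 5% level of significance. Yates' continuity correction, often applied to 2x2 tables, is deliberately left out to keep the arithmetic visible.

```python
# Hypothetical 2x2 contingency table:
# rows = gender (male, female), columns = prefers the product (yes, no)
observed = [[30, 20],
            [15, 35]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square statistic: sum of (O - E)^2 / E, with E = row total * column total / N
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_5PCT_DF1 = 3.841  # tabulated chi-square value, df = 1, alpha = 0.05

# H0: gender and product preference are independent
decision = "reject H0" if chi_square > CRITICAL_5PCT_DF1 else "do not reject H0"
print(f"chi-square = {chi_square:.3f}, df = {df}: {decision}")
```

Here the statistic (about 9.09) exceeds 3.841, so the hypothesis of independence would be rejected at the 5% level for this hypothetical data.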
Step 7: Generalization and Interpretation
Generalization involves two processes: theoretical inference from the data in order to
develop concepts and theory, and empirical application of the findings to a wider
population, that is, building theory based on research outputs. Interpretation refers to
the task of drawing inferences from the collected facts after an analytical and/or
experimental study, both for reaching conclusions and for guiding further research.
Step 8: Report Preparation
Report writing is a vital step in research, in which the complete research and its findings
are compiled together. A proper and valid report increases the usefulness of the
research, and the acceptance and applicability of research depend on correct report
writing. Misleading conclusions can cause the whole research to be questioned,
whereas valid interpretations can reveal the processes and relations that underlie its
findings.
1.10 Role of Research in Business
The major role of research in business is to reduce the risk of business decisions by
providing appropriate information about customers, competitors, market trends,
employees, government regulations, etc.
Through research, an organization is able to obtain information about key business
areas, analyze it, develop strategies, and distribute business information.
Research has three major strategic roles in business decision making:
Expanding existing businesses, which includes:
o Improvements in existing products or services.
o Changes in materials and technology for manufacturing, and many more.
Exploring new business opportunities, which includes:
o Exploring new technology or products for new markets.
o Entering a new market with existing or new products.
Broadening and deepening technological capabilities, for example, making the same
product with improved or new technology.
1.11 Chapter Summary
Research can be summarized as a systematic effort to gain new knowledge.
Research methods include the various procedures and techniques used for obtaining
and analyzing data.
Research methodology is the approach toward systematically solving research
problems.
Depending on the perspective, there are various types of research:
o Application Viewpoint: Pure and Applied Research
o Objective Viewpoint: Descriptive, Exploratory, Explanatory, and
Correlational Research
o Inquiry Mode Viewpoint: Quantitative and Qualitative Research
The research process gives a detailed flow of steps to be followed for any kind of
research.
For business, the major purpose of research is to reduce the risk of
decision making.
Research Problem Formulation and Research Design
02. Research Problem Formulation and Research Design eBook
2.1 Introduction
Selecting and properly defining a research problem is the first step in the research
process. Problem identification and formulation are important stages of research, and
identifying the problem helps to define it correctly. A research problem is some
difficulty that a researcher experiences, in the context of either a theoretical or a
practical situation, and to which the researcher wants to obtain a solution.
In the words of Alison Loat, "A problem well-defined is a problem half-solved."
Without being clear about what you are going to research, it is difficult to plan how you
are going to research it. Identifying a research problem clearly enables you to define
your research strategy and data collection methods. The procedure of research problem
formulation will guide you toward accurate identification and formulation of the
research problem.
After defining a problem, you have to prepare a blueprint of how to conduct the
research, just as, in construction, architects use a blueprint to decide on the efficient
allocation of various resources. The blueprint for conducting research is the research
design. It gives a detailed logical flow of the research approach and of the methods for
conducting the research.
After reading this chapter, you will be able to:
Define research problem
Explain the procedure of defining a general research problem
Define the objectives of research problem
Define research design
Explain contents of research design
Explain important concepts in research design
Explain types of research design
Explain basic principles of experimental design
2.2 Definition of Research Problem
In the words of Zikmund and Babin, "A problem is a situation, occurs when there
is a difference between the current conditions and most preferable set of
conditions."
Also, according to C. R. Kothari, "A research problem, in general, refers to some
difficulty which a researcher experiences in the context of either a theoretical
or practical situation and wants to obtain a solution for the same."
The process of defining and developing a decision statement, and the steps involved in
translating it into more precise research terminology, including a set of research
objectives, is referred to as 'problem definition'.
A correctly defined research problem guides the literature survey, the selection of the
research strategy and research design, and the choice of data collection and analysis
methods. Ill-defined research problems may create hurdles, whereas a proper definition
keeps the researcher on track. Thus, defining a research problem properly is a
requirement for any study and a highly important step. The formulation of a problem is
more important than its solution: only after the research problem has been carefully
detailed can you work out the research design and smoothly carry out all the
subsequent steps of the research.
Factors to be considered while formulating a research problem:
An individual or an organization under consideration.
The environment or condition to which the difficulty pertains.
Specific objectives or goals to be attained.
Economic consideration: Research efforts cost money. The value of the anticipated
results must be commensurate with the effort put into the research, in terms of
benefits or returns.
Technical consideration: Adequate technical knowledge should be available to carry out
the research.
Environmental consideration: Controversial subjects should not be chosen for
research. Preliminary studies or pilot surveys should be conducted after the research
problem has been defined.
Consideration of limitations: Limitations such as time limits, resource constraints,
and policy constraints must be considered.
2.3 Procedure of Defining General Research Problem
The technique of defining the general research problem involves the following steps:
1. Defining the Problem in a General Way
In a research problem, you must address either a specific practical operational issue or some scientific discovery. Whether the motivation is a practical concern or a scientific or intellectual interest, the problem should first be stated in a broad, general way. Hence, to formulate a problem, researchers must immerse themselves thoroughly in the subject matter.
In social research, it is advisable to do some field observation and undertake a pilot survey. The researcher can then seek the help of the guide or a subject expert in accomplishing this task. The guide puts forth the problem in general terms, and it is then up to the researcher to narrow it down and phrase the problem in operational terms. A problem stated in a general way may contain ambiguities, which must be resolved through careful thinking and rethinking.
2. Understanding the Nature of the Problem
Discuss the problem to understand it better. Researchers should consider all the points that induced them to make a general statement concerning the problem. For a better understanding of the nature of the problem, they can discuss it with people who have good knowledge of the problem concerned or of similar problems.
3. Literature Survey
Review all the available literature on the research area, with the aim of adding a new dimension to that area and enhancing knowledge. Before the research problem is defined, all available literature concerning the problem at hand must be surveyed and examined. The researcher must be well conversant with the relevant theories in the field, reports, records, and other relevant literature. Studies on related problems are useful for indicating the type of difficulties that may be encountered in the current study, as well as possible analytical shortcomings. At times, such studies may also suggest useful and even new lines of approach to the present problem.
4. Developing Ideas through Discussions
Take the advice of experienced researchers to develop different aspects of the research problem. Researchers can discuss the problem with colleagues and others who have enough experience in the same area or in working on similar problems. Discussing the problem often generates useful information and new ideas. People with rich experience are in a position to enlighten the researcher on different aspects of the proposed study.
5. Rephrasing the Research Problem
Rephrase the research problem in terms that are as specific as possible, so that it becomes operationally viable and can help in developing hypotheses. Framing the research problem appropriately helps the researcher obtain good results.
Business Research Problem Definition Process
For business research, the problem definition process can be shown as illustrated in Fig. 2.3a.
(1) Understand the situation: identify the key symptoms; (2) identify the problems from the symptoms; (3) write the managerial decision statement and corresponding research objectives; (4) determine the unit of analysis; (5) determine the relevant variables; (6) write research questions and/or research hypotheses
Fig. 2.3a: Business Research Problem Definition Process
Source: Zikmund, Babin, Carr, Griffin, Business Research Methods, 8th Edition
1. Understand the situation: Identify the key symptoms
Identify the symptoms by asking 'What has changed?'
Symptoms can include a decline in sales, an increase in the cost of recruitment, etc.
2. Identify the problems from the symptoms
Relate the symptoms to their various possible reasons or causes. For example, if a firm has a problem with advertising effectiveness, the possible causes may include low brand awareness, the use of the wrong media, etc.
3. Write the managerial decision statement and corresponding research objectives
The decision statement explains how the problem can be solved.
It explains the information that is needed to help make the decision.
A research objective expresses potential research results that should aid decision making.
4. Determine the unit of analysis
The unit of analysis for a study indicates what or who should provide the data and at what level of aggregation. For example, it can be the target population from whom data needs to be collected to serve the research objectives.
In a study of home appliances, for instance, the data is gathered from married couples, so the couple, rather than the individual, is the unit of analysis.
5. Determine the relevant variables
A variable is anything that varies or changes from one instance to another.
It can exhibit differences in value or direction.
Determining the relevant variables is essential in research. The variables studied should be those that address the decision statement.
6. Write research questions and/or research hypotheses
Research questions simply restate research objectives in the form of a question.
2.4 Objectives of Research Problem
Research discovers answers to questions through the application of scientific methods. According to C. R. Kothari, research objectives can be divided into the following broad groups:
To gain familiarity with a phenomenon or to achieve new insights into it, known as exploratory or formulative research studies.
To portray accurately the characteristics of a particular individual, situation, or group, known as descriptive research studies.
To determine the frequency with which something occurs or with which it is associated with something else, known as diagnostic research studies.
To test a hypothesis of a causal relationship between variables, known as hypothesis-testing research studies.
Formulating Hypothesis
A hypothesis states the relationship between two or more variables and suggests an answer to the research question.
It predicts the relationship in terms of the expected results or outcomes of a study.
The direction of the relationship between the dependent and independent variables is also predicted.
The suggestion formulated in the hypothesis may be the solution to the problem.
Two types of hypothesis are:
Null Hypothesis (H0)
The null hypothesis states that, in the population, no relationship or no significant difference exists between the groups or variables being studied.
Alternative Hypothesis (HA)
The alternative hypothesis is the opposite of the null hypothesis. It states that there is a significant difference or relationship between the groups or variables, which can be tested.
For example, for its advertising strategy, a firm is interested in evaluating the effectiveness of promoting a product on the internet and on television.
H0: There is no significant difference between the effectiveness of promotion on the internet and on television.
HA: There is a significant difference between the two advertising media in terms of effectiveness (a non-directional alternative).
Or
HA: Promotion on the internet is more effective than promotion on television (a directional alternative).
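The following is a minimal sketch of how the H0 and HA above might be tested. It assumes, since the text does not say how effectiveness is measured, that effectiveness is the proportion of exposed customers who respond to the promotion. Under that assumption, a two-proportion z-test (a standard technique chosen here for illustration, not one prescribed by the text) gives a two-sided p-value for the non-directional HA and a one-sided p-value for the directional HA. The counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-proportion z-test: x successes out of n for each of two groups."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common response rate assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    p_two_sided = 2 * (1 - phi(abs(z)))  # HA: the two media differ in effectiveness
    p_one_sided = 1 - phi(z)  # HA: the first medium (internet) is more effective
    return z, p_two_sided, p_one_sided

# Hypothetical counts: 120 of 1,000 customers reached online responded,
# versus 90 of 1,000 customers reached through television.
z, p_two, p_one = two_proportion_z_test(120, 1000, 90, 1000)
print(f"z = {z:.2f}, two-sided p = {p_two:.3f}, one-sided p = {p_one:.3f}")
```

With these made-up counts, both p-values fall below 0.05, so H0 would be rejected at the 5% level. With real data, the conclusion depends entirely on the observed counts.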
2.5 Research Design
In the words of Claire Selltiz, "A research design is the arrangement of
conditions for collection and analysis of data in a manner that aims to combine
relevance to the research purpose with economy in procedure."
A research design is a framework or blueprint for conducting research. It sets out the procedures for obtaining the information needed to structure or solve the research problem. It is a decision matrix that addresses what, where, when, which, and how, as they pertain to a research procedure.
Research design facilitates the logical flow of research operations. It gives a concise plan of the logical relations between the research type, the data required, the data collection and analysis methods, and the research reporting method. It helps to yield maximum information with minimal expenditure of effort, time, and money.
2.6 Contents of Research Design
The contents of research design are:
Sampling Design
It deals with the method of selecting items to be observed for the given research type.
Observational Design
It describes the conditions under which the observations are to be made.
Statistical Design
It is concerned with the question of how many items are to be observed and how the information and data gathered are to be analyzed.
Operational Design
Operational design deals with the techniques by which the procedures specified in the sampling, observational, and statistical designs are carried out.
Source: Claire Selltiz and others, Research Methods in Social Sciences, 1962, p. 50.
2.7 Important Concepts in Research Design
Fig. 2.7a: Important Concepts in Research Design (Variables, Experiment, Experimental Group, and Control Group)
Variables
A variable is a concept that can take on different quantitative values. It is anything that varies or changes from one instance to another, and it can exhibit differences in value or direction. Example: Concepts like weight, height, and income, which vary from individual to individual.
Continuous Variable
It is a variable that can take any value, including decimal values, between its minimum and maximum values. Example: The temperature recorded in a city.
Discrete Variable
It is a variable that takes only separate, countable values, usually integers. Example: The number of children in a family.
Variables can also be described according to their dependency on each other.
Dependent Variable
Dependent variable is a process outcome or variable that is predicted and/or explained
by other variables.
Independent Variable
Independent variable is a variable that is expected to influence the dependent variable in some way. For example, customer loyalty may be a dependent variable that is influenced or predicted by independent variables, such as service quality, brand awareness, or customer satisfaction.
Extraneous Variable
Extraneous variables are independent variables that are not related to the purpose of the study but may affect the dependent variable. They are not under the control of the researcher.
Experiment
The process of examining the truth of a statistical hypothesis (H0) relating to some research problem is known as an experiment. The purpose of an experiment is to study causal links, that is, whether a change in an independent variable produces a change in a dependent variable.
Major Terms Used in Experiments
Treatments
The different conditions under which experimental and control groups are tested are
referred to as treatments.
Experimental (Sampling) Units
These are the predetermined plots, blocks, or other units to which the different treatments are applied. They should always be specifically defined in the research design.
Control and Experimental Groups
In a classic experiment, two groups are established and members are assigned to each group. The two groups should be as similar as possible in all aspects relevant to the research. A group that is exposed to some novel or special condition, through intervention or manipulation of the independent variable, is the experimental group. A group that is exposed to the usual condition is the control group.
Fig. 2.7b: Control and Experimental Groups (in both groups, members are assigned at random and the dependent variable is measured; the independent variable is manipulated only in the experimental group)
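The following minimal Python sketch illustrates the classic design in Fig. 2.7b. It assumes hypothetical participant IDs and scores. Members are shuffled and split at random into the two groups, and the treatment effect is estimated as the difference between the groups' mean scores on the dependent variable. The helper names `assign_at_random` and `treatment_effect` are illustrative only.

```python
import random
from statistics import mean

def assign_at_random(members, seed=None):
    """Shuffle the members and split them into control and experimental groups."""
    rng = random.Random(seed)
    shuffled = list(members)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (control, experimental)

def treatment_effect(scores, control, experimental):
    """Mean dependent-variable score of the experimental group minus that of the control group."""
    return mean(scores[m] for m in experimental) - mean(scores[m] for m in control)

# Hypothetical participants, assigned at random.
members = [f"P{i:02d}" for i in range(1, 21)]
control, experimental = assign_at_random(members, seed=7)
print("Control:", sorted(control))
print("Experimental:", sorted(experimental))

# Hypothetical post-test scores for four participants, to show the effect calculation.
scores = {"A": 10, "B": 12, "C": 15, "D": 17}
print(treatment_effect(scores, control=["A", "B"], experimental=["C", "D"]))  # 5
```

Because assignment is random, any systematic difference in the measured outcome can more reasonably be attributed to the manipulation rather than to how members were placed in the groups.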
2.8 Types of Research Design
Research designs can be categorized by research approach, as follows:
Exploratory Research Design
Exploratory research, also termed formulative research, is a valuable means of finding out:
What is happening
New insights
The questions to be asked
How to assess phenomena in a new light
To conduct exploratory research, three principal methods are used:
A search of the literature
Interviewing experts in the subject
Conducting focus group interviews
The following methods are used in exploratory research design:
Literature Survey
The literature survey method is one of the simplest and most fruitful methods of precisely formulating the research problem and developing hypotheses.
Hypotheses stated by earlier researchers are evaluated as a basis for further research.
Experience Survey
Experience survey means a survey of people who have had practical experience
with the problem to be studied. For such a survey, people who are competent and
can contribute new ideas may be carefully selected as respondents, to ensure a
representation of different types of experience. It may enable the researcher to
define the problem more concisely and help in the formulation of the research
hypothesis.
Focus Group Interview
It is an unstructured, free-flowing interview with a small group of six to ten
people. Focus groups are led by a trained moderator, who follows a flexible
format, encouraging dialogue among respondents.
Descriptive and Diagnostic Research Design
Descriptive research is concerned with describing the characteristics of a particular individual or group. Diagnostic research is concerned with determining the frequency with which something occurs or its association with something else.
The procedure for descriptive/diagnostic research is:
1. Formulating the objective of the study: What is the study about, and why is it being made?
2. Designing the methods of data collection: What techniques of gathering data will be adopted, such as observation, questionnaires, etc.?
3. Selecting the sample: How much material will be needed? From which population is the sample taken?
4. Collecting data: Where can the required data be found, and to what time period should the data relate?
5. Processing and analyzing the data.
6. Reporting the findings.
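The following sketch illustrates the difference between descriptive and diagnostic analysis using hypothetical survey responses. A frequency count describes how often each answer occurs, which is descriptive. A cross-tabulation by age group shows how the answer is associated with another characteristic, which is diagnostic. The sketch only describes the association. Deciding whether the association is significant would require a statistical test.

```python
from collections import Counter

# Hypothetical responses: (age group, prefers online shopping?)
responses = [
    ("under 30", "yes"), ("under 30", "yes"), ("under 30", "no"),
    ("30 and over", "yes"), ("30 and over", "no"), ("30 and over", "no"),
]

# Descriptive: how often each answer occurs.
frequency = Counter(answer for _, answer in responses)
print(frequency)

# Diagnostic: how the answer is associated with age group (a simple cross-tabulation).
crosstab = Counter(responses)
for group in ("under 30", "30 and over"):
    total = crosstab[(group, "yes")] + crosstab[(group, "no")]
    share_yes = crosstab[(group, "yes")] / total
    print(f"{group}: {share_yes:.0%} prefer online shopping")
```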
Hypothesis Testing Research Design
It is concerned with testing hypotheses about causal relationships between variables and helps in drawing inferences about causality. Hypothesis testing employs statistical procedures in which inferences about the target population are drawn from a study sample. Experimental design is the method used for conducting hypothesis-testing research. While conducting such research, three basic principles of experimental design are to be followed to improve the accuracy of research inferences about the dependent variables.
The principles of experimental design are:
Principle of replication
Principle of randomization
Principle of local control
2.9 Basic Principles of Experimental Design
Principle of Replication
The research design should be such that the experiment can be repeated more than once, that is, each treatment is applied to many experimental units instead of one. Repeating the experiment increases the statistical accuracy of the estimated relationship between variables.
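The following small sketch shows why replication helps. It assumes the measurements below are hypothetical values for a single treatment. The standard error of the treatment mean is s / sqrt(r), where s is the standard deviation of the replicates and r is the number of replicates. More replicates therefore give a more precise estimate of the treatment mean.

```python
from math import sqrt
from statistics import mean, stdev

def replicate_summary(values):
    """Return the mean of replicated measurements and its standard error, s / sqrt(r)."""
    r = len(values)
    return mean(values), stdev(values) / sqrt(r)

# Hypothetical measurements of one treatment: 4 replicates vs. 16 replicates
# with the same spread of values.
four = [10, 12, 14, 16]
sixteen = four * 4

for label, values in (("4 replicates", four), ("16 replicates", sixteen)):
    m, se = replicate_summary(values)
    print(f"{label}: mean = {m:.1f}, standard error = {se:.2f}")
```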
Principle of Randomization
Randomization is the random assignment of treatments to the experimental or sample units. The research design should be planned so that the variations caused by extraneous variables or factors can all be combined under the general heading of 'chance'. Thus, the experimental or sample units are to be selected, and treatments assigned to them, in a random manner.
Principle of Local Control
Randomization and replication do not remove all the extraneous sources of variation. Some experimental error still remains, and its sources are unknown. Local control refers to grouping the experimental units so that the units within a group are more homogeneous than units in different groups. The treatments are then assigned at random within each of these groups, or blocks. Dividing the sample into homogeneous parts in this way is known as blocking. It is done so that the extraneous variable is held roughly constant within each block. Under this principle, the design is planned so that the variability due to the extraneous variable can be measured and separated out, in order to reduce experimental error.
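The following is a minimal sketch of local control in the form of a randomized block design. It assumes hypothetical stores grouped into blocks by size and three hypothetical promotion treatments. Each treatment is used once in every block, and randomization is applied separately within each block. The function name `randomized_block_design` is illustrative only.

```python
import random

def randomized_block_design(blocks, treatments, seed=None):
    """Use each treatment once per block, randomizing the order separately within each block.

    blocks: mapping of block name -> experimental units in that block;
            each block must contain exactly len(treatments) units
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError(f"block {block!r} needs exactly {len(treatments)} units")
        order = list(treatments)
        rng.shuffle(order)  # randomization happens within the block only
        plan[block] = dict(zip(units, order))
    return plan

# Hypothetical example: stores grouped into homogeneous blocks by size.
blocks = {
    "large stores": ["L1", "L2", "L3"],
    "medium stores": ["M1", "M2", "M3"],
    "small stores": ["S1", "S2", "S3"],
}
treatments = ["discount", "coupon", "no promotion"]
plan = randomized_block_design(blocks, treatments, seed=1)
for block, assignment in plan.items():
    print(block, assignment)
```

Because every block receives every treatment, differences between blocks (here, store size) can be separated from differences between treatments when the results are analyzed.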
2.10 Chapter Summary
Formulation of the research question and stating the hypothesis are key
preliminary steps in the research process.
The research question or a research problem statement presents the idea that is
to be examined in the study and is the foundation of the research.
The final research question consists of a statement about the relationship of two or
more variables.
A hypothesis is a declarative statement about the relationship between two or
more variables that predicts an expected outcome.
Research design is a framework or blueprint for conducting research.
03. Sampling Design and Sampling Techniques
3.1 Introduction
If researchers want to discover the most pressing financial problems faced by people in general, ranging from low wages to rising health care and housing costs, they would have to ask everyone for their opinions. However, due to economic and time constraints, it is not possible to question every person.
Instead, small representative groups can be selected from the general population for research and analysis within the available time frame. Selecting such a group is known as sampling, and the methods by which the selection is made are known as sampling techniques.
The collection of all the observations under consideration is known as a 'population'. A complete enumeration or study of all items in the population (also called the universe) is known as a census study or census inquiry. Often, it is not possible to study each and every observation in the population due to time, money, and other constraints. In that case, a fraction of the population is studied. This is known as a sample study or sample survey, and the part of the population taken for study is the sample.
After studying this chapter, you should be able to:
Define population, census, and sample
Explain sampling design and its procedure
Explain the criteria for selecting a sampling procedure
Explain the various types of sampling techniques
3.2 Population, Census, and Sample
Population
Population is any complete group of entities that share some common set of characteristics, about which inferences are to be made. Population characteristics are known as parameters. For example, all registered voters in India or all members of the international teachers' union.
Census
A census is an investigation of all the individual elements that make up a population, that is, a total enumeration rather than a sample. In a census, the survey investigator studies the characteristics of each and every entity in the population. For example, the census of India is a big source of a variety of statistical information, including the different characteristics of the people. The census data is collected once every 10 years.
Sample
Sample is a subset or some part of a population, used to make inferences about the
whole population, as shown in Fig. 3.2a. Sampling involves the process of selecting a
number of representative study units from a defined study population. Sample
characteristics are known as statistics.
Fig. 3.2a: Population and Sample (the sample is a subset drawn from the population)
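The following short sketch illustrates the distinction between a parameter and a statistic. It uses a hypothetical population of 1,000 household incomes generated at random. The population mean is a parameter, obtainable only through a census of all 1,000 households. The mean of a random sample of 50 households is a statistic used to estimate that parameter.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the illustration is repeatable

# Hypothetical population: monthly incomes (in thousands) of 1,000 households.
population = [rng.randint(20, 120) for _ in range(1000)]

parameter = mean(population)  # population mean: a parameter, known only through a census
sample = rng.sample(population, 50)  # simple random sample of 50 households
statistic = mean(sample)  # sample mean: a statistic used to estimate the parameter

print(f"parameter = {parameter:.1f}, statistic = {statistic:.1f}")
```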
3.3 Sampling Design
A systematic plan for obtaining a sample from a given population is known as a sample
design. It refers to the technique or the procedure that the researcher should adopt in
selecting items for the sample. Three decisions have to be considered while designing a sample:
Who will be surveyed? - Sample:
Determine what type of information is needed and who is most likely to have it.
How many will be surveyed? - Sample size:
Large samples give more reliable results than small samples.
How should the sample be chosen? - Type of sampling:
Sample members may be chosen at random from the entire population, which is known as 'probability sampling'.
Sample members or units may be chosen as per the requirements of the study or the convenience of the researcher, which is known as 'non-probability sampling' or 'judgmental sampling'.
3.4 Sample Design Procedure
The sampling design procedure that should be considered while selecting a sample is:
1. Selecting Target Population
The target population (the set of objects to be studied, technically called the universe) should have the characteristics about which inferences are to be drawn. Selecting it is the first step in developing any sample design. For consumer-related surveys, the population elements most frequently used are households.
Depending on the number of items it contains, the universe can be finite or infinite. In a finite universe, the number of items is known with certainty, as with the population of a city or the number of workers in a factory. In an infinite universe, the number of items is not known, as with the number of fish in the sea, the listeners of a specific radio programme, or the viewers of a specific TV serial.
2. Select a Sampling Frame
A complete list of all cases in the population from which the sample will be drawn is known as the sampling frame. Before selecting a sample, the investigator has to decide on a sampling unit.
Sampling units may be geographical, such as a village, district, or state; social, such as a school, club, or family; or individual. The researcher will have to decide which one or more of such units to select for the study.
For example, if the research objective is concerned with the members of a sports club, the sampling frame will contain a complete list of the individuals who are members of that club.
3. Determine Whether a Probability or Non-probability Sampling Method Will Be Chosen
In probability sampling, every element in the population has a known, non-zero probability of selection. The simple random sample, in which each member of the population has an equal probability of being selected, is the best-known probability sample. In non-probability sampling, the probability of any particular member of the population being selected is unknown. The selection of sampling units in non-probability sampling is quite arbitrary, as researchers rely heavily on personal judgment.
4. Determine Sample Size
Sample size refers to the number of items or units to be selected from the population (universe) to constitute a sample. One of the main problems for the researcher is selecting the sample size. It should be neither too large nor too small, that is, it must be optimal. An optimal sample satisfies the requirements of efficiency, representativeness, reliability, and flexibility. While deciding the sample size, the researcher must consider the population size, budgetary constraints, and time constraints. The researcher must also determine the desired precision and an acceptable confidence level for the estimate.
5. Parameters of Interest
While determining the design of a sample, one must consider which specific population parameters are of interest.
For example, you may be interested in estimating the proportion of students with some characteristic in the population, or in knowing some average or other measure concerning the population.
6. Budgetary Constraint
Costs involved in the total sampling procedure have a great impact on decisions relating to the size as well as the type of sample.
7. Sampling Procedure
The researcher must decide about the technique to be used in choosing items for the sample. This is a part of the sample design itself. There are several sample designs, out of which the researcher must choose one for the study. Obviously, the researcher must select the design which, for a given sample size and cost, has a smaller sampling error.
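The text above does not give a formula for the sample-size step. One widely used approach, assumed here for illustration, is Cochran's formula for estimating a proportion. It combines the desired precision (margin of error e), the confidence level (through the standard normal value z), and an anticipated proportion p: n0 = z^2 p(1 - p) / e^2. For a finite population of size N, the finite population correction n = n0 / (1 + (n0 - 1) / N) reduces the required size, which is one way the population size enters the calculation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5, population=None):
    """Cochran's sample size for estimating a proportion (assumed method, for illustration).

    margin: desired precision, e.g. 0.05 for plus or minus 5 percentage points
    confidence: confidence level, e.g. 0.95
    p: anticipated proportion; 0.5 gives the largest (most conservative) size
    population: finite population size N, or None for a very large population
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n0)  # round up to a whole number of units

print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.05, population=2000))  # 323
```

Tightening the margin of error or raising the confidence level increases the required size. In practice, the result is then checked against the budgetary and time constraints listed above.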
3.5 Characteristics of a Good Sample Design
Some characteristics of a good sample design are:
The sample must be a good representative of the population.
Its design must result in a small sampling error.
Its design must be viable in the context of the funds available for the research study.
The sample design must be able to control systematic bias in a better way.
The results of the sample study should be applicable, in general, to the universe with a reasonable level of confidence.
Source: C. R. Kothari, Research Methodology: Methods and Techniques, New Age International Publishers,
2nd Edition
3.6 Criteria for Selecting a Sampling Procedure
There are two types of costs involved in sampling analysis: the cost of collecting the data and the cost of an incorrect inference resulting from the data. Incorrect inferences arise from two causes: systematic bias and sampling error. Systematic bias results from errors in the sampling procedures, and it cannot be eliminated or reduced by increasing the sample size. At best, the causes responsible for these errors can be detected and corrected.
A systematic bias is the result of one or more of the following factors:
1. Inappropriate sampling frame
If the sampling frame is inappropriate, that is, a biased representation of the universe, it will result in a systematic bias.
2. Defective measuring device
If the measuring device is constantly in error, it will result in systematic bias. In survey work, systematic bias can result if the questionnaire or the interviewer is biased.
3. Non-respondents
If you are unable to sample all the individuals initially included in the sample, a systematic bias may arise.
4. Indeterminacy principle
Sometimes, individuals act differently when kept under observation than they do in non-observed situations. Thus, the indeterminacy principle may also be a cause of systematic bias.
5. Natural bias in the reporting of data
The natural bias of respondents in reporting data is often the cause of a systematic bias in many inquiries. There is usually a downward bias in the income data collected by the government taxation department, whereas there is an upward bias in the income data collected by social organizations. People in general understate their incomes if asked about them for tax purposes, but they overstate them if asked for social status or affluence.
Sampling Errors
Sampling errors are the random variations in the sample estimates around the true population parameters. Sampling error decreases as the size of the sample increases, and it can be measured for a given sample design and size. The measurement of sampling error is usually called the 'precision of the sampling plan'. Precision can be improved by increasing the sample size, but a larger sample also increases the cost of collecting data. Thus, the effective way to increase precision is usually to select a better sampling design, one that has a smaller sampling error for a given sample size at a given cost.
Thus, a major criterion while selecting a sampling procedure is to ensure that the
procedure causes a relatively small sampling error and helps to control the systematic
bias in a better way.
Source: C.R. Kothari, Research Methodology: Methods and Techniques, New Age International Publishers,
2nd Edition
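A simulation can illustrate the contrast between sampling error and systematic bias. Everything in the sketch below is hypothetical: a population of 10,000 incomes, and a biased sampling frame that omits the lowest-income 30% of that population (one example of an inappropriate frame). Repeated simple random samples show that the spread of the estimates, which is the sampling error, shrinks as the sample size grows. The bias caused by the faulty frame does not shrink.

```python
import random
from statistics import fmean, stdev

rng = random.Random(2024)

# Hypothetical population of 10,000 incomes (in thousands).
population = [rng.gauss(50, 15) for _ in range(10_000)]
true_mean = fmean(population)

# Hypothetical biased frame: the lowest-income 30% of the population is missing from the list.
biased_frame = sorted(population)[3_000:]

def repeated_estimates(frame, n, trials=500):
    """Sample means from repeated simple random samples of size n drawn from a frame."""
    return [fmean(rng.sample(frame, n)) for _ in range(trials)]

results = {}
for n in (50, 1_000):
    for label, frame in (("full frame", population), ("biased frame", biased_frame)):
        estimates = repeated_estimates(frame, n)
        bias = fmean(estimates) - true_mean  # systematic error: does not shrink as n grows
        spread = stdev(estimates)  # sampling error: shrinks roughly like 1/sqrt(n)
        results[(n, label)] = (bias, spread)
        print(f"n={n:>5}  {label:<12}  bias={bias:6.2f}  spread={spread:5.2f}")
```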
3.7 Types of Sampling Techniques
Sample designs can be classified on two bases, namely, representation and the element selection technique. On the representation basis, the sample may be selected by probability sampling or non-probability sampling. Probability sampling is based on the concept of random selection, whereas non-probability sampling is 'non-random' sampling. On the element selection basis, sampling may be unrestricted or restricted. In an unrestricted sampling procedure, each sample element is selected individually from the population at large. All other forms of sampling are covered under the term 'restricted sampling'.
The major types of sampling techniques are shown in Fig. 3.7a.
Non-probability sampling techniques: convenience sampling, judgment sampling, quota sampling, snowball sampling
Probability sampling techniques: simple random sampling, systematic sampling, stratified sampling, cluster sampling
Fig. 3.7a: Types of Sampling Techniques
3.7.1 Non-probability Sampling Techniques
Non-probability sampling is also known as judgement, deliberate, or purposive sampling. In this method, the researcher selects the items deliberately, that is, the researcher purposely chooses particular units of the population to constitute a sample that will represent the whole population. Types of non-probability sampling include convenience sampling, judgment sampling, quota sampling, and snowball sampling.
In this method, the investigator may select a sample that yields results favourable to his or her point of view, in which case the entire inquiry may get vitiated. Thus, there is always a danger of bias entering into this type of sampling technique.
Convenience Sampling
In this type of sampling, the researcher selects the most conveniently available units of the population to form a sample. Hence, this method is referred to as convenience sampling. Only respondents who happen to be in the right place at the right time are selected for the sample. Convenience samples are best used for exploratory research when additional research will subsequently be conducted with a probability sample.
Judgment Sampling
Another type of non-probability sampling is judgmental (purposive) sampling, in which a researcher selects the units of the sample based on his or her judgment about certain required characteristics. This method is often used in case study research and in other situations that involve very small samples. The samples selected by this method satisfy the specific purposes of the research but will not fully represent the population. This sampling technique is used to obtain information from a very specific group of people.
Quota S a m p l i n g
Quota sampling is one of the non-probability sampling procedures, in which various
s u b g r o u p s of a population will be represented on the basis of pertinent characteristics to
the exact extent that the investigator desires. Quota sampling is a two-stage, restricted
judgmental sampling during which, in the first stage, the population is divided into
various groups and a quota must be calculated for each group, depending on relevant
and available data. In the second stage, sample elements from quota groups are
selected based on convenience or judgment sampling.
This sampling method is commonly used with interviews and surveys. For example, suppose an interviewer conducts 100 interviews in a particular city to find out the monthly expenditure of its residents. The sample is selected according to the quota assigned to each group: 10% high class, 60% middle class, 10% lower middle class, and 20% the rest, that is, 10, 60, 10, and 20 interviews respectively.
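The quota arithmetic in the example above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the group labels come from the example, while the function names `quota_counts` and `fill_quotas`, the largest-remainder rounding rule, and the idea of respondents arriving as (id, group) pairs are all assumptions made for the sketch.

```python
from collections import Counter

def quota_counts(total, shares):
    """Stage 1: turn percentage shares into whole-number quotas summing to total.

    Largest-remainder rounding (an assumption of this sketch) hands any
    leftover interviews to the groups with the biggest fractional parts.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100")
    raw = {group: total * pct / 100 for group, pct in shares.items()}
    counts = {group: int(value) for group, value in raw.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - counts[k], reverse=True)
    for group in by_remainder[:leftover]:
        counts[group] += 1
    return counts

def fill_quotas(respondents, quotas):
    """Stage 2: accept respondents in the order they are met (convenience)
    until each group's quota is full; everyone else is skipped."""
    taken = Counter()
    sample = []
    for person, group in respondents:
        if taken[group] < quotas.get(group, 0):
            sample.append(person)
            taken[group] += 1
    return sample

shares = {"high": 10, "middle": 60, "lower middle": 10, "rest": 20}
quotas = quota_counts(100, shares)   # 10, 60, 10 and 20 interviews
```

Because the second stage fills quotas by convenience, the sample matches the population's class mix, but it is still not a probability sample.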
Snowball Sampling
Snowball sampling is a non-probability sampling procedure in which an initial group of respondents is selected by probability methods, that is, at random, and additional respondents are then obtained from information provided by the initial respondents. Each subsequent group of respondents is selected based on referrals made by the previous group.
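The referral process described above can be sketched as a wave-by-wave expansion. This is a minimal illustration under stated assumptions: the `referrals` dictionary is a hypothetical stand-in for asking respondents in the field, the seeds are assumed to have been drawn at random beforehand, and `snowball_sample` is an illustrative name.

```python
def snowball_sample(referrals, seeds, target_size):
    """Grow a sample wave by wave through referrals.

    referrals:   dict mapping each person to the people they would refer
                 (hypothetical stand-in for field referrals)
    seeds:       initial respondents, assumed to have been drawn at random
    target_size: stop once this many respondents have been recruited
    """
    sample = list(dict.fromkeys(seeds))[:target_size]  # keep order, drop repeats
    seen = set(sample)
    wave = list(sample)                                 # current group of referrers
    while wave and len(sample) < target_size:
        next_wave = []
        for person in wave:
            for referred in referrals.get(person, []):
                if referred not in seen and len(sample) < target_size:
                    seen.add(referred)
                    sample.append(referred)
                    next_wave.append(referred)
        wave = next_wave                                # referrals become the next group
    return sample

referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}
recruited = snowball_sample(referrals, seeds=["A"], target_size=5)
```

Each wave recruits only people named by the previous wave, which is why the final sample depends heavily on who the seeds happen to know.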
This technique is mainly used to locate members of rare populations, and in cases where the population is very difficult to identify, for example, people who claim unemployment benefits while hiding the fact that they are employed. Systematic bias occurs frequently with such sampling, because respondents tend to refer people who are similar to themselves.
3.7.2 Probability Sampling Techniques
Probability sampling is also known as 'chance' or 'random' sampling. In this method, every item of the population has a known, non-zero chance of being included in the sample; in simple random sampling, this chance is the same for every item. For example, in the lottery method, individual units are picked from the whole group by some mechanical process. Because units are selected by chance, the results obtained from probability sampling can be assessed in terms of probability. Types of probability sampling include simple random sampling, systematic sampling, stratified sampling, and cluster sampling.
Simple Random Sampling
A simple random sample is a sample of size n drawn from a population of size N in such a way that every possible sample of size n has an equal chance of being selected. In other words, simple random sampling is a probability sampling procedure that gives each element in the population an equal chance of being included in the sample. Simple random sampling is best used when an accurate and easily accessible sampling frame that lists the entire population is available, preferably stored on a computer. For a small sample, methods such as drawing numbers or names from a fishbowl or using a spinner are appropriate. For a large sample, random number generation techniques are used to obtain the sample units.
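When the frame is stored on a computer, the random number generation mentioned above can be done with Python's standard `random` module. The sketch below is illustrative only; the frame of 500 unit labels and the function name `simple_random_sample` are invented for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame, without replacement.

    random.Random.sample picks a uniformly random selection, so every
    subset of n units is equally likely to be chosen.
    """
    if not 0 <= n <= len(frame):
        raise ValueError("n must be between 0 and the size of the frame")
    rng = random.Random(seed)      # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame of 500 listed units.
frame = [f"unit-{i}" for i in range(1, 501)]
sample = simple_random_sample(frame, 20, seed=42)
```

Fixing the seed lets another researcher reproduce exactly the same draw, which is useful when the sampling step has to be documented.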
Put another way, a simple random sample is a subset of units selected from a population in which each unit is selected entirely by chance, so that each unit has the same probability of being selected at any stage of the sampling process. Each subset of n units has the same probability of being selected as any other subset of n units. Simple random sampling is an unbiased sampling method.
Suppose N people want tickets for a movie, but there are only X tickets, where X < N. The organizers decide to distribute the tickets without any bias. Everyone is given a number in the range 0 to N - 1, and random numbers in this range are drawn, either from a table of random numbers or electronically with a computer, ignoring any numbers that have already been selected. The people holding the first X numbers drawn get the X tickets.
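The ticket example can be simulated directly. The sketch below follows the steps in the text (number everyone from 0 to N - 1, draw random numbers in that range, skip repeats, stop after X distinct numbers); the function name `allocate_tickets` and the values N = 250 and X = 40 are made up for illustration.

```python
import random

def allocate_tickets(num_people, num_tickets, seed=None):
    """Lottery from the text: people are numbered 0..N-1, random numbers are
    drawn in that range, repeats are ignored, and the first X distinct
    numbers drawn win a ticket."""
    if num_tickets > num_people:
        raise ValueError("there cannot be more tickets than people")
    rng = random.Random(seed)
    winners = []
    drawn = set()
    while len(winners) < num_tickets:
        k = rng.randrange(num_people)      # uniform integer in 0..N-1
        if k not in drawn:                 # ignore numbers already selected
            drawn.add(k)
            winners.append(k)
    return winners

winners = allocate_tickets(num_people=250, num_tickets=40, seed=7)
```

Skipping repeated numbers is what turns repeated random draws into sampling without replacement.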
Whether the population is small or large, this type of sampling is typically done 'without replacement', so that no individual in the population is selected more than once. Simple random sampling can, however, also be carried out with replacement. For a small sample from a large population, sampling without replacement is approximately the same as sampling with replacement, since the odds of selecting the same individual twice are low.
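The claim that the two schemes nearly coincide for a small sample from a large population can be checked with a short calculation. Assuming each draw is equally likely to pick any individual, the hypothetical helper `prob_any_repeat` below gives the chance that a with-replacement sample contains at least one repeat; when that chance is tiny, sampling with replacement almost always yields the same kind of sample as sampling without it.

```python
def prob_any_repeat(population_size, sample_size):
    """Chance that a with-replacement sample of sample_size draws picks
    at least one individual more than once (all units equally likely)."""
    p_all_distinct = 1.0
    for k in range(sample_size):
        # the (k+1)-th draw must miss the k individuals already drawn
        p_all_distinct *= (population_size - k) / population_size
    return 1.0 - p_all_distinct

small = prob_any_repeat(100, 10)        # about 0.37
large = prob_any_repeat(1_000_000, 10)  # about 0.000045
```

Ten draws from a population of 100 repeat someone more than a third of the time, while ten draws from a population of one million almost never do.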
Systematic Sampling
Systematic sampling is a type of probability sampling in which a starting unit is selected by a random process, and then every i-th unit on the list is selected as a subsequent sample member. The sampling interval i is obtained by dividing the population size N by the sample size n and rounding to the nearest integer. When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample.
For example, suppose there are 100,000 individuals in the population and a sample of 1,000 is required. The sampling interval i is then 100,000 / 1,000 = 100. Next, a random number between 1 and 100 is selected. If this number is 23, the sample consists of elements 23, 123, 223, 323, and so on, up to element 99,923.
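The worked example above can be reproduced in a few lines. This is a sketch, not a standard routine: `systematic_positions` and `systematic_sample` are hypothetical names, and positions are counted from 1 as in the text.

```python
import random

def systematic_positions(N, n, start):
    """1-based list positions chosen by systematic sampling from a given start."""
    i = round(N / n)                  # sampling interval, rounded to nearest integer
    return list(range(start, N + 1, i))

def systematic_sample(population, n, seed=None):
    """Pick a random start between 1 and i, then take every i-th unit."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    i = round(N / n)
    start = random.Random(seed).randint(1, i)    # randint includes both ends
    return [population[p - 1] for p in systematic_positions(N, n, start)]

positions = systematic_positions(100_000, 1_000, start=23)
# positions begins 23, 123, 223, 323 and ends at 99,923
```

Python's `round()` sends exact halves to the nearest even integer, and when N / n is not a whole number the sample can come out slightly larger or smaller than n.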
Stratified Sampling
Stratified sampling is a probability sampling procedure in which the population is divided into strata and a simple random subsample is drawn from within each stratum. The elements within a stratum are more or less alike in some characteristic, while elements in different strata should be as heterogeneous as possible.
There are two primary reasons for using stratified sampling: the sample will be more representative of the population, and it ensures that a specific number of individuals is selected from each category.
Fig. 3.7.2a: Stratified Sampling (panels: Population, Strata, Sample)
Stratified sampling is generally applied to obtain a representative sample when the population from which the sample is to be drawn does not constitute a homogeneous group. The population is divided into several sub-populations, called 'strata', each of which is more homogeneous than the total population. Items are then selected from each stratum to constitute the sample. Because variation within each stratum is much less than that of the population as a whole, precise estimates can be computed for each stratum, and combining them gives a better estimate for the population as a whole.
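Combining per-stratum estimates can be sketched as follows. This assumes a simple random sample is drawn inside each stratum and that each stratum mean is weighted by the stratum's share of the population, N_h / N, which is the usual way a stratified estimate of a population mean is formed. The function name `stratified_mean` and the strata in the example are invented.

```python
import random
from statistics import mean

def stratified_mean(strata, sizes, seed=None):
    """Estimate the population mean from a stratified sample.

    strata: dict stratum name -> values for every unit in that stratum
    sizes:  dict stratum name -> number of units to draw from it (at least 1)
    """
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    estimate = 0.0
    for name, units in strata.items():
        drawn = rng.sample(units, sizes[name])      # simple random sample within the stratum
        weight = len(units) / N                     # stratum's share of the population
        estimate += weight * mean(drawn)
    return estimate

# Two invented strata: 60 units with value 10 and 40 units with value 50.
strata = {"A": [10] * 60, "B": [50] * 40}
estimate = stratified_mean(strata, {"A": 6, "B": 4}, seed=1)
```

In this example each stratum is perfectly homogeneous, so any sample from it gives the exact stratum mean and the estimate equals the true population mean of 26. This is the extreme case of the point above that low variation within strata leads to precise estimates.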
Stratified sampling gives more detailed and reliable information. The following three questions are highly relevant in the context of stratified sampling:
a) How should the strata be formed?
b) How should items be selected from each stratum?
c) How many items should be selected from each stratum, that is, how should the sample size be allocated among the strata?
There are two methods of stratified sampling:
Proportionate Stratified Sampling
In this sampling, the size of the sample drawn from each stratum is proportionate to the relative size of that stratum in the total population.
Disproportionate Stratified Sampling
In this sampling, the size of the sample from each stratum is proportionate to the relative size of that stratum and to the standard deviation of the distribution of the characteristic of interest among all the elements in that stratum.
The rationale is that the more variable a characteristic is, the larger the sample needed to estimate it accurately. Hence, to increase sampling efficiency, strata with greater variability are sampled more heavily, which produces a smaller random sampling error for the same total sample size.
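The two allocation rules can be compared side by side. In the sketch below, the proportionate rule sets each stratum's sample size n_h in proportion to its size N_h. The disproportionate rule, as described above, sets n_h in proportion to N_h times the stratum standard deviation; this second rule is commonly called Neyman (optimum) allocation. The function name `allocate`, the stratum sizes and standard deviations, and the largest-remainder rounding are assumptions made for the sketch.

```python
def allocate(total_n, stratum_sizes, stratum_sds=None):
    """Split a total sample size across strata.

    Proportionate (stratum_sds omitted): n_h proportional to N_h.
    Disproportionate (stratum_sds given): n_h proportional to N_h * sigma_h.
    Largest-remainder rounding keeps the parts summing to total_n.
    """
    if stratum_sds is None:
        weights = dict(stratum_sizes)
    else:
        weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total_weight = sum(weights.values())
    raw = {h: total_n * w / total_weight for h, w in weights.items()}
    alloc = {h: int(r) for h, r in raw.items()}
    leftover = total_n - sum(alloc.values())
    for h in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

sizes = {"A": 6000, "B": 3000, "C": 1000}     # invented stratum sizes N_h
sds = {"A": 5, "B": 20, "C": 10}              # invented standard deviations
proportionate = allocate(500, sizes)          # {'A': 300, 'B': 150, 'C': 50}
disproportionate = allocate(500, sizes, sds)  # {'A': 150, 'B': 300, 'C': 50}
```

Stratum B, the most variable, receives twice as many units under the disproportionate rule. This matches the point above that more variable strata should be sampled more heavily. The sketch does not cap n_h at N_h, which a real allocation would need to check.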
Cluster Sampling
Cluster sampling is a probability sampling procedure in which the population of interest is divided into representative groups of individuals called 'clusters'. Ideally, the elements within each cluster should be as heterogeneous as possible, while the clusters themselves should be as similar to one another as possible, so that each cluster is a small-scale representation of the population itself. Cluster sampling is used when the researcher cannot obtain a complete list of the members of the population, because without such a list it is impossible or impractical to draw a simple random sample or a stratified sample. In cluster sampling, the population units are first grouped, and then the groups, or clusters, rather than individual elements, are selected for inclusion in the sample.
In cluster sampling, the total area is divided into a number of smaller, non-overlapping areas, which are known as clusters. After appropriate clusters have been formed, a number of them are selected at random, and all units in the selected clusters are included in the sample.
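One-stage cluster sampling, as described above, can be sketched as follows: whole clusters are drawn at random and every unit inside a chosen cluster enters the sample. The function name `cluster_sample` and the blocks and households in the example are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Select whole clusters at random and keep every unit in them.

    clusters: dict cluster id -> list of units; clusters must not overlap
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # clusters, not individuals, are drawn
    units = [unit for c in chosen for unit in clusters[c]]
    return chosen, units

# Hypothetical area: 10 blocks, each with 5 households.
blocks = {f"block-{b}": [f"household-{b}-{h}" for h in range(5)] for b in range(10)}
chosen, units = cluster_sample(blocks, 3, seed=11)
```

Only the three selected blocks need a list of their households. No complete list of the whole population is required, which is the practical advantage described in the text.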
3.8 Chapter Summary
The collection of all the observations under consideration is known as a 'population'. A complete enumeration or study of all items in the population, or universe, is known as a census study or census inquiry.
Sample is a subset or a part of the population that is used to make inferences
about the whole population. Sampling involves the process of selecting a number
of representative study units from a defined study population. Population
characteristics are known as parameters.
A definite plan for obtaining a sample from a given population is known as a sample design.
Procedures for sample design are:
o Select the target population
o Select the sampling frame
o Decide whether a probability or non-probability sampling method will be used
o Determine the sample size
Two major costs involved in a sampling analysis are:
o Cost of collecting data
o Cost of an incorrect inference resulting from the data
The major causes of incorrect inferences are:
o Systematic Bias
o Sampling Errors
Methods and Tools of Data Collection
04. Methods and Tools of Data Collection eBook
4.1 Introduction
In real-life situations, we deal with different types of data, which are simply values of qualitative or quantitative variables relating to a particular set of items.
Data collection is the process of preparing and collecting data. The main purposes of data collection are to obtain information for making decisions, to keep a record of important topics, and to pass information on to others. Above all, collected data provides information about a specific topic.
While dealing with a real-life problem, it is sometimes found that the data at hand is inadequate; it then becomes essential to collect applicable data. There are several ways of collecting appropriate data, which differ considerably in terms of the time, cost, and other resources at the researcher's disposal.
Explain the types of data: primary and secondary data
Describe the different methods used to collect data
Discuss the merits and demerits of various data collection methods
Explain how to collect data through a questionnaire
Explain the main aspects of a questionnaire
Illustrate how to design a questionnaire
Discuss the requirements of a good questionnaire
Explain the method of case study
Describe the characteristics of a case study
Discuss the assumptions of a case study
Describe the advantages and limitations of a case study
4.2 Data Types
Data collection is the step that follows defining the research problem and planning the research. There are various methods of data collection, and the method used depends on the type of data to be collected. There are two types of data: primary and secondary. Data collected afresh and for the first time is known as primary data, while secondary data is data that has already been collected by someone else and has already passed through the statistical process.
Researchers must first decide which type of data the study requires; only then can they select the method of data collection. The methods used to collect primary data differ from those used to collect secondary data. Facts and statistics collected together for reference or analysis are known as data: values of qualitative or quantitative variables belonging to a set of items. For processing, data is usually represented in a structure such as a table, a graph, or a tree. Data is typically the result of measurements and can be visualized using graphs or images.
The process of gathering and measuring information on variables of interest is known as data collection. It should be carried out systematically, so that it enables one to answer the stated research questions, evaluate outcomes, and test hypotheses.
The term data collection is used commonly in research in all fields of study, including the physical sciences, humanities, business, and social sciences. While methods vary by discipline, the importance of ensuring accurate and honest collection remains the same. Whether the data is qualitative or quantitative, accurate data collection is essential to maintaining the integrity of research.
The two types of data are explained below.
4.2.1 Primary Data
Data that is collected afresh and for the first time, and thus happens to be original in character, is known as primary data. The information is obtained directly from first-hand sources by means of observation, experimentation, or surveys. Primary data is usually unpublished, because it comes from an original survey or research study. In experimental research, primary data is collected during the course of the experiment. In descriptive research, primary data can be collected through observation or through direct communication with respondents, for example in personal interviews.
There are various methods of primary data collection. Some of the important methods are:
Observation method
Interview method
Through questionnaires
Through schedules
Other methods include:
o Warranty cards
o Distributor audits
o Pantry audits
o Consumer panels
o Using mechanical devices
o Through projective techniques
o Depth interviews
o Content analysis
Now, you will study some of these methods in detail:
Observation Method
This is the most commonly used method of primary data collection, especially in studies related to the behavioral sciences. In this method, data is collected by observing things around us, and observation becomes a scientific tool. Information is obtained through the investigator's own direct observation, without asking the respondent. For example, in a consumer behavior study, the investigator may simply look at the respondent's wrist watch instead of asking which brand it is.
In observation method, information is gathered by watching events, behavior or noting
physical characteristics in their natural setting. Observations can be overt (everyone
knows they are being observed) or covert (no one knows they are being observed and
the observer is concealed). Observations may also be obtained either directly or
indirectly.
In direct observation method, you watch behaviors, interactions or processes, as they
occur. For example, observing teachers teaching a topic from a written curriculum, in
order to determine whether they are delivering it with fidelity. In indirect observation
method, you watch the results of behaviors, interactions or processes. For example,
measuring the quantity of food wasted by students in a school cafeteria, to determine
whether introduction of a new food is acceptable or not.
Merits
The researcher is able to keep a record of the natural behavior of the group.
If the observation is done in a disinterested (detached) way, the researcher can gather information that could not easily be obtained otherwise.
The researcher can verify the truth of statements made by informants in the context of a questionnaire or a schedule.
Demerits
If the observer participates emotionally, they may lose objectivity.
The problem of observation-control is not solved.
It may narrow the researcher's range of experience.
Source: C. R. Kothari, Research Methodology-Methods and Techniques, Second Revised Edition, New Age
International Publishers, 2004.
Interview Method
In the interview method, data is collected by presenting oral-verbal stimuli and recording replies in terms of oral-verbal responses. Examples are telephone interviews and personal interviews.
In the personal interview method, a person known as the interviewer asks questions face-to-face with the participant.
Merits
More information, and in greater depth, can be obtained.
Interviewers can overcome the resistance of respondents.
There is greater flexibility under this method, and it can be applied to recording verbal answers to various questions.
Personal information can be obtained easily.
Demerits
It is a very time-consuming and expensive method, especially when a large and widely spread geographical sample is taken.
There remains the possibility of bias on the part of the interviewer, as well as of the respondent.
Certain types of respondents, such as important executives, officials, or people in high-income groups, may not be easily approachable under this method, and to that extent the data may prove inadequate.
The presence of the interviewer on the spot may over-stimulate the respondent, sometimes even to the extent that the respondent gives imaginary information just to make the interview interesting.
Through Questionnaires
A questionnaire is a set of questions typed or printed in a definite order on a form. It is sent to the persons concerned with a request to answer the questions and return the questionnaire; the respondents answer the questions on their own. This method is quite popular, particularly in big enquiries, and is adopted by private and public organizations, private individuals, research workers, and even the government.
The questionnaire is the main instrument of the survey method, so it must be constructed very carefully; if it is not properly set up, the survey is bound to fail. The general form, question sequence, and question formulation and wording are the main aspects of a questionnaire.
Merits
It is low cost, even when the universe is large and widely spread geographically.
It is free from the bias of the interviewer; answers are in the respondents' own words.
Respondents have adequate time to think carefully before answering.
Respondents who are not easily approachable can also be reached conveniently.
Large samples can be used, which makes the results more dependable and reliable.
Demerits
The rate of return of duly filled-in questionnaires is low, and the bias due to non-response is often indeterminate.
It can be used only when respondents are educated and cooperative.
Control over the questionnaires may be lost once they are sent.
There is inbuilt inflexibility, because it is difficult to amend the approach once the questionnaires have been dispatched.
It is difficult to know whether the willing respondents are truly representative.
This method is likely to be the slowest of all.
Through Schedules
This method of data collection is very similar to collecting data through questionnaires. The difference is that schedules are filled in by enumerators who are appointed for this purpose. Schedules may be handed over to respondents, and enumerators may help them record their answers against the questions. Enumerators explain the aim and objectives of the investigation, and they remove any difficulty a respondent faces in understanding the implications of a particular question or the definition or concept of a difficult term. The enumerators should be well trained, and the scope and nature of the investigation should be explained to them thoroughly so that they can do their work properly; with thorough training, they can understand the implications of the different questions in the schedule. Enumerators must have the capacity for cross-examination. They must be intelligent, hardworking, sincere, and honest, and should have patience and perseverance.
Merits
Non-response is generally very low.
With schedules, the identity of the respondent is known.
Information can be collected in good time, as the schedules are filled in by enumerators.
Direct personal contact is established with respondents.
Information can be gathered even when the respondents happen to be illiterate.
Demerits
Collecting data through schedules is relatively expensive, because a considerable amount of money has to be spent on appointing enumerators and training them. Money is also spent on preparing the schedules.
There remains the danger of interviewer bias and cheating.
It is usually difficult to send enumerators over a relatively wide area.
4.2.2 Secondary Data
Data that has already been collected by someone else and has already passed through the statistical process is known as secondary data. It is already available, in published or unpublished form. When using secondary data, the researcher has to examine the various sources from which it was obtained.
Sources of Published Data
Publications of the central and state governments.
Publications of international bodies and their subsidiary organizations, or of foreign governments.
Trade and technical journals.
Magazines, books, and newspapers.
Publications and reports of various associations connected with industry and business, banks, stock exchanges, etc.
Reports prepared by research scholars, economists, universities, etc. in different fields.
Public records and statistics, historical documents, and other sources of published information. Such data should be used very carefully.
Sources of Unpublished Data
Letters, diaries, autobiographies, and unpublished biographies.
Scholars and research workers, labor bureaus, trade associations, and other public or private individuals and organizations.
Before using secondary data, the researcher must examine the following characteristics of the data:
Reliability of Data
The reliability of data depends on factors such as:
Who collected the data?
What were the sources of data?
Were they collected by using proper methods?
At what time were they collected?
Was there any bias of the compiler?
What level of accuracy was desired? Was it achieved?
Suitability of Data
Data that is suitable for one type of enquiry may not necessarily be suitable for another. If the available data is found to be unsuitable, the researcher cannot use it. The researcher must very carefully scrutinize the definitions of the various terms and the units of collection used when the data was originally collected from the primary source.
Adequacy of Data
If the level of accuracy achieved in the data is known and is found to be inadequate, the researcher should not use that data for further study. Data will also be considered inadequate if it relates to an area that is either narrower or wider than the area of the present enquiry. Using secondary data is therefore risky, and it should be used only if it is found to be reliable, suitable, and adequate. At the same time, the researcher should not blindly reject available data from authentic sources: if such data is suitable and adequate, it is not economical to spend time and energy on field surveys to collect the same information.
4.3 Questionnaire Design
A questionnaire is a set of questions, usually addressed to a statistically significant number of subjects, used as a way of gathering information for a survey.
The Main Aspects of a Questionnaire
The General Form
The general form of a questionnaire can either be structured or unstructured.
Structured questionnaires are questionnaires in which there are concrete, definite,
and pre-determined questions. The form of the question may be either closed (the
type 'yes' or 'no') or open (inviting free response) but should be stated in advance
and not constructed during questioning.
A wide range of data in the respondent's own words cannot be obtained with structured questionnaires. In such situations, unstructured questionnaires may be used effectively. On the basis of the results of pretest operations (testing before final use) with unstructured questionnaires, one can construct a structured questionnaire for use in the main study.
Question Sequence
A proper sequence of questions considerably reduces the chances of individual questions being misunderstood. The question sequence must be clear and move smoothly, meaning that the relation of one question to another should be readily apparent to the respondent, with the questions that are easiest to answer put first.
The following types of questions should be avoided as opening questions:
Questions that put great strain on the memory or intellect of the respondent.
Questions of a personal character.
Questions related to personal wealth.
Question Formulation and Wording
Questions should be constructed so that each forms a logical part of a carefully considered tabulation plan. In general, all questions should meet the following standards:
Should be easily understood
Should be simple
Should be concrete
Should conform, as much as possible, to the respondent's way of thinking
Since words are likely to affect responses, they should be chosen carefully. Simple words that are familiar to all respondents should be preferred. Words with ambiguous meanings must be avoided. Similarly, danger words, catch-words, and words with emotional connotations should be avoided.
Sample of a Questionnaire
International Students Questionnaire
This is a research study conducted by a group of medical students. Please do NOT write your name on the questionnaire, as this study is anonymous. Do not feel obligated to answer all questions if you are uncomfortable or unable to do so. Thank you very much for taking the time to complete our questionnaire; your effort is greatly appreciated.
1. Are you male or female?
Male: 1 [ ]
Female: 2 [ ]
2. What is your current age?
17 - 19: 1 [ ]
20 - 22: 2 [ ]
23 - 25: 3 [ ]
26 - 30: 4 [ ]
30+: 5 [ ]
3. Which continent are you from?
Europe: 1 [ ]
Asia: 2 [ ]
Africa: 3 [ ]
North America: 4 [ ]
South America: 5 [ ]
Australia/New Zealand: 6 [ ]
4. What is your first language?
English: 1 [ ]
Non-English: 2 [ ]
5. Rate your ability to communicate in English when you came to Canada.
None [ ]  1 [ ]  2 [ ]  3 [ ]  4 [ ]  5 [ ]  6 [ ]  7 [ ]  8 [ ]  9 [ ]  Excellent [ ]
6. What religion do you identify yourself with?
Buddhist: 1 [ ]
Christian: 2 [ ]
Hindu: 3 [ ]
Islam: 4 [ ]
Jewish: 5 [ ]
Non-denominational: 6 [ ]
Other: 7 [ ]
7. What year of study are you currently enrolled in?
Year 1: 1 [ ]
Year 2: 2 [ ]
Year 3: 3 [ ]
Year 4: 4 [ ]
Other: 5 [ ]
8. What program are you currently enrolled in?
Bachelor of Arts: 1 [ ]
Bachelor of Business Administration: 2 [ ]
Bachelor of Science: 3 [ ]
Nursing: 4 [ ]
Human Kinetics: 5 [ ]
Source: http://people.stfx.ca/wjackson/questionnaires/2004/Gl WJ04.pdf
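The numbers printed beside each answer in the sample questionnaire are codes used to enter and tabulate the responses. A minimal sketch of that coding step, using the codes for questions 1, 2 and 4, is shown below; the names `CODEBOOK`, `code_answer`, `code_age` and `tabulate` are hypothetical. The printed age categories overlap at 30 ('26 - 30' and '30+'), so the sketch assumes that 30 falls in the 26 to 30 group. This is the kind of ambiguity a pretest should catch.

```python
from collections import Counter

# Answer codes as printed in the sample questionnaire (questions 1 and 4).
CODEBOOK = {
    "sex": {"Male": 1, "Female": 2},
    "first_language": {"English": 1, "Non-English": 2},
}

def code_answer(question, answer):
    """Numeric code for a ticked answer, or None if blank or not a listed option."""
    return CODEBOOK[question].get(answer)

def code_age(age):
    """Code for question 2. Assumes 30 belongs to '26 - 30' and '30+' means over 30."""
    for low, high, code in [(17, 19, 1), (20, 22, 2), (23, 25, 3), (26, 30, 4)]:
        if low <= age <= high:
            return code
    return 5 if age > 30 else None   # under 17 is not covered by the form

def tabulate(codes):
    """Frequency count of codes, ready for a simple table."""
    return Counter(codes)

sexes = ["Male", "Female", "Female", ""]
counts = tabulate(code_answer("sex", s) for s in sexes)
```

Keeping a `None` count for blank or invalid answers makes non-response visible in the tabulation instead of silently dropping it.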
4.4 Requirements of a Good Questionnaire
Questions should be short and simple.
Questions should proceed in a logical sequence, moving from easy to difficult questions.
Personal and intimate questions should be left to the end.
Technical terms and vague expressions capable of different interpretations should be avoided in a questionnaire.
Questions may be dichotomous (yes or no answers), multiple choice (alternative answers listed), or open-ended.
The questionnaire should include some control questions, which introduce a cross-check to see whether the information collected is correct.
Questions affecting the sentiments of respondents should be avoided.
Adequate space for answers should be provided in the questionnaire to help editing and tabulation.
The quality and color of the paper must be good, so that the questionnaire attracts the attention of respondents.
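A control question can be checked mechanically once responses are coded. As a hypothetical illustration, suppose a questionnaire first asks how much a household spends on sugar per month. Later, as a control question, it asks how many kilograms the household buys. The implied price per kilogram should then fall in a plausible range. The function name `flag_inconsistent`, the field names, and the price range are all assumptions.

```python
def flag_inconsistent(responses, min_price, max_price):
    """Cross-check a question against its control question.

    responses: list of dicts with hypothetical keys 'id', 'spend' (money spent
               per month) and 'qty_kg' (quantity bought per month, asked later
               as a control question)
    Returns the ids whose answers contradict each other.
    """
    flagged = []
    for r in responses:
        if r["qty_kg"] <= 0:
            if r["spend"] > 0:          # money spent but nothing bought
                flagged.append(r["id"])
            continue                    # nothing spent, nothing bought: consistent
        price = r["spend"] / r["qty_kg"]
        if not (min_price <= price <= max_price):
            flagged.append(r["id"])
    return flagged

responses = [
    {"id": 1, "spend": 200, "qty_kg": 5},    # implied price 40 per kg
    {"id": 2, "spend": 200, "qty_kg": 50},   # implied price 4 per kg
    {"id": 3, "spend": 0, "qty_kg": 0},
    {"id": 4, "spend": 100, "qty_kg": 0},
]
flagged = flag_inconsistent(responses, min_price=30, max_price=60)
```

Flagged responses are not necessarily false. They simply mark the cases an editor should look at before tabulation.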
4.5 Case Study
This method involves careful and complete observation of a social unit, such as a person, an institution, a family, a cultural group, or even an entire community. It is a very popular form of qualitative analysis. The case study emphasizes the full analysis of a limited number of events or conditions and their interrelations. It deals with the processes that take place and their interrelationships, and it is an intensive investigation of the particular unit under consideration. The objective of the case study method is to locate the factors that account for the behavior patterns of the given unit as an integrated totality.
Characteristics of a Case Study
For the purpose of the study, the researcher can take one single social unit or more than one such unit.
The unit selected is studied intensively.
The researcher studies the social unit completely, covering all its facets.
An effort is made to understand the mutual interrelationship of causal factors.
The researcher can study the behavior pattern of the unit concerned directly, rather than by an indirect and abstract approach.
This method results in fruitful hypotheses, along with data that may be helpful in testing them.
Assumptions of a Case Study
The assumption of uniformity in basic human nature, in spite of the fact that human behavior may vary according to situations.
The assumption of studying the natural history of the unit concerned.
The assumption of a comprehensive study of the unit concerned.
Advantages of Case Study
It enables to fully understand the behavioral pattern of the concerned unit.
A researcher can obtain a real and enlightened record of personal experiences.
It enables the researcher to trace out the natural history of the social u n i t and its
relationship with the social factors and the forces involved in its surrounding
environment.
It helps in formulating relevant hypotheses, along with data, w h i c h may be helpful
in testing them.
This method facilitates an intensive study of social units, which is generally not
possible.
It h e l p s the researcher in the task of constructing an appropriate q u e s t i o n n a i r e or
schedule for the said task, which requires thorough knowledge of the concerning
universe.
The researcher can use one or more of the several research methods u n d e r case
study method, depending on the prevalent circumstances.
It is beneficial in determining the nature of the units to be studied, along with the
nature of the universe.
Case studies constitute the perfect type of sociological material, as they represent
a real record of personal experiences, which very often escapes the attention of
the most skilled researchers using other techniques.
Case study method enhances the experience of the researcher and this, in turn,
increases their analyzing ability and skill.
This method makes the study of social changes possible. On account of the minute
study of the different facets of a social unit, the researcher can well understand
the social change, then and now.
Limitations of Case Study
Case situations are seldom comparable and the information gathered in case
studies is often not comparable as such.
Real information is often not collected, because the subjectivity of the researcher
enters into the collection of information in a case study.
No set rules are followed in the collection of the information, and only a few units are
studied. It consumes more time and requires a lot of expenditure.
Case data is often vitiated. Sampling is not possible under a case study method.
Case study method is based on several assumptions, which may not be very
realistic at times, and the usefulness of case data is always subject to doubt.
The response of the respondent is a vital limitation of the case study method.
4.6 Chapter Summary
Data are values of qualitative or quantitative variables, belonging to a set of items.
Primary data are those which are collected afresh and for the first time and hence
happen to be original in character.
The following methods are used for collecting primary data: observation method,
interview method, questionnaires, and schedules.
Secondary data are those which have already been collected by someone else and
have already been passed through the statistical process. Secondary data may
either be published or unpublished data.
Case study method involves careful and complete observation of a social unit.
Measurement and Scaling Techniques
05. Measurement and Scaling Techniques eBook
5.1 Introduction
Measurement is an essential factor in our daily life. To cook a dish, one has to mix the
required ingredients in proper measurements to get the perfect recipe. If the ingredients
are mixed without any measurement, the whole dish can be spoiled. To give
such measurement, one has to have some standards. For this purpose, different
measurement scales are used, depending on the variables or objects under study.
Some standard measurement units are the liter for the volume of water, the kilogram for the
weight of an object, the meter for the height of an object, etc. These are physical
measurements, but in the sciences and in business research, measuring the attitudes of the
respondents from whom data is collected also requires an appropriate measurement scale.
In this chapter, the measurement scales that are required for measuring attitudes will be discussed.
Define measurement and scale
Explain various scales of measurement
Discuss comparative scaling techniques
Discuss categorical scaling techniques
5.2 Measurement and Scaling
Measurement
According to Zikmund, "Measurement is the process of describing some property
of a phenomenon of interest, usually by assigning numbers in a reliable and
valid way."
According to Nunnally (1978), "Measurement means assigning numbers or other
symbols to characteristics of objects according to certain pre-specified rules."
Thus, when an object or item is measured, it is assigned some numerical value, which,
in turn, reflects the property of the object or item. It is important to know
the properties and the concept of the object or item before measuring it. Properties
are specific characteristics of an object or item that help in distinguishing one
object from another. The properties of the objects may be objective or subjective.
Objective properties are those that can be physically described, while
subjective properties can only be mentally described. Also, the concept of the object,
which gives a brief view of the object (that is, its meaning), should be known.
Scaling
Scaling is defined by Zikmund as, "A device providing a range of values that
correspond to different values in a concept being measured is called scaling."
According to S. L. Gupta and Hitesh Gupta, "Various researchers have classified
scales and scaling techniques based on scale properties, subject orientation,
number of dimensions, scale construction techniques, etc."
They have provided various scale construction techniques or approaches:
Arbitrary approach: Scales are developed on an unplanned basis.
Consensus approach: A panel of experts evaluates the chosen items for their
inclusion in the measurement instruments.
Cumulative approach: Scales are chosen based on their conforming to some
ranking of items, with ascending and descending discriminating power.
Factor approach: Scales are developed based on inter-correlations of items
indicating the common factor accounting for the relationship between items.
5.3 Primary Scales of Measurement
The primary scales of measurement are classified into four categories, namely, nominal
scale, ordinal scale, interval scale, and ratio scale.
Nominal Scale
The simplest scale in measurement is the nominal scale. It only identifies the category
into which an object falls. For example, male, female, married,
unmarried, etc.
Nominal scale is considered to be a qualitative scale because it involves only categorical
data, rather than metric data. Nominal data may be represented by numbers, like 1, 2,
3, etc., but these numbers serve only as labels and have no metric meaning. Nominal data
can also be represented by symbols, letters, or figures. For example, in a business research
study, the gender of the respondent from whom the data is collected may be labeled as 0 for
female and 1 for male.
Ordinal Scale
Ordinal scale is a scale in which data is categorized according to some common
properties, as in the nominal scale. However, such data is also arranged in either ascending or
descending order. Thus, if data can be ordered in this way, then such data is termed
ordinal data.
For example, three biscuit brands that are in very close competition, in terms of their
taste, can be ranked, with the best-tasting biscuit brand first, followed
by the other two. Some other examples of ordinal data are:
Socioeconomic status: High, low or poor
Position of student in an examination: First, second, third, etc.
Interval Scale
An interval scale is the same as the ordinal scale, with additional information about the
difference between the categories.
In this scale, the units of measurement between the numbers on the scale are all
equidistant, that is, of equal interval. A good example of an interval scale is temperature
(in °C or °F); the Celsius scale from 0 °C to 100 °C is divided into 100 equal parts,
that is, the difference between any two successive numbers is fixed. However, the zero
point of an interval scale is arbitrary (0 °C does not mean the absence of temperature),
so ratios such as "twice as hot" are not meaningful.
Ratio Scale
It is the most advanced level of measurement. The property being measured can be
obtained accurately, as a numerical value. Some examples of
ratio scale are weight, height, distance, sales, etc., which are numerical values defining
the property of the object, with zero denoting the absence of the property in the object.
Table 5.3a summarizes the different scales.
Level | Examples | Numerical Operations | Descriptive Statistics
Nominal | Employee ID number; Yes or No; Good or Bad; Religion: Hindu, Muslim, Christian, Sikh, etc. | Counting | Frequencies, Mode
Ordinal | Student class rank; Indicate your level of education: high school, high school diploma, college degree, X standard pass | Counting and ordering | Frequencies, Mode, Median, Range
Interval | Student's Grade Point Average (GPA); 100-point job performance rating provided by supervisor | Common arithmetic operations | Frequencies, Mode, Median, Range, Mean, Variance, Standard deviation
Ratio | Salesperson's sales volume; Number of stores visited on a shopping trip; Annual family income | All arithmetic operations | Frequencies, Mode, Median, Range, Mean, Variance, Standard deviation
Table 5.3a: Different Measurement Scales
Source (for the table): Zikmund, Babin, Carr, Griffin, Business Research Methods, Eighth Edition
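Table 5.3a pairs each level of measurement with the statistics that are meaningful for it. As a minimal sketch, assuming the responses are already coded as numbers (and, for ordinal data, that the codes follow the rank order), the hypothetical helper `describe` below returns only the statistics the table lists for a given level. It uses Python's standard `statistics` and `collections` modules.

```python
import statistics
from collections import Counter


def describe(values, level):
    """Return the descriptive statistics that Table 5.3a lists for a scale level.

    `values` are numeric codes or measurements. For ordinal data the codes are
    assumed to follow the rank order (e.g. 1 = lowest education level).
    """
    level = level.lower()
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError(f"unknown measurement level: {level}")

    # Nominal: the codes are only labels, so they can only be counted.
    result = {"frequencies": dict(Counter(values)),
              "mode": statistics.mode(values)}
    if level == "nominal":
        return result

    # Ordinal: order is meaningful, so the median and the range (lowest, highest) apply.
    ordered = sorted(values)
    result["range"] = (ordered[0], ordered[-1])
    if level == "ordinal":
        # median_low always returns an actual category, never an average of two codes
        result["median"] = statistics.median_low(ordered)
        return result

    # Interval and ratio: arithmetic on the values is meaningful.
    result["median"] = statistics.median(ordered)
    result["mean"] = statistics.mean(values)
    result["variance"] = statistics.variance(values)
    result["std_dev"] = statistics.stdev(values)
    return result


print(describe([0, 1, 1, 0, 1], "nominal"))  # gender coded 0 = female, 1 = male
print(describe([1, 2, 2, 3, 4], "ordinal"))  # hypothetical education-level codes
print(describe([10, 20, 30], "ratio"))       # hypothetical sales volumes
```

Note that the same codes 0 and 1 could be averaged by the computer, but the sketch deliberately refuses to report a mean for nominal data, because the numbers are only labels.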
Reliability and Validity of a Measurement Scale
Reliability of a measurement has been illustrated in simple words by Bruce Wren,
Robert Stevens, and David Louden as, "A reliable measure is one that
consistently generates the same result over repeated measures. For example, if
a scale shows that a standardized 1 lb weight actually weighs 1 lb when placed
on the scale today, tomorrow, and next Tuesday, then it appears to be a reliable
scale. If it reads a different weight, then it is unreliable, the degree of
unreliability indicated by how frequently and by how much it reads an
inaccurate weight."
Validity is the extent of accuracy of a measure. Determining the validity of a measure is
not a simple evaluation. Thus, reliability measures the consistency and validity measures the
accuracy of a measure; both are criteria for a good measure.
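The weighing-scale example can be made concrete with a small sketch. Assuming repeated readings of the standardized 1 lb weight are available, the spread of the readings (their standard deviation) reflects consistency, that is, reliability, while the gap between their average and the true 1 lb reflects systematic bias, that is, a lack of validity. The readings and the helper name `assess` are hypothetical, chosen only to illustrate the distinction; this is not a formal reliability coefficient.

```python
import statistics

TRUE_WEIGHT = 1.0  # the standardized 1 lb weight from the quotation above


def assess(readings, true_value=TRUE_WEIGHT):
    """Summarize repeated readings of an object whose true value is known.

    spread: sample standard deviation of the readings (small = consistent = reliable)
    bias:   mean reading minus the true value (close to 0 = accurate = valid)
    """
    spread = statistics.stdev(readings)
    bias = statistics.mean(readings) - true_value
    return spread, bias


# Hypothetical readings (lb) of the same weight taken on different days
scale_a = [0.80, 1.25, 0.95, 1.10]  # scattered: low reliability
scale_b = [1.20, 1.21, 1.19, 1.20]  # consistent but 0.2 lb too high: reliable, not valid
scale_c = [1.00, 1.01, 0.99, 1.00]  # consistent and on target: reliable and valid

for name, readings in [("A", scale_a), ("B", scale_b), ("C", scale_c)]:
    spread, bias = assess(readings)
    print(f"Scale {name}: spread = {spread:.3f} lb, bias = {bias:+.3f} lb")
```

Scale B shows that a small spread alone is not enough: a measure can be perfectly consistent and still consistently wrong.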
Reliability vs. Validity
According to Zikmund, Babin, Carr, and Griffin, the difference between reliability and
validity can be explained by an experiment, as shown in Fig. 5.3a: "Suppose an
expert sharpshooter fires a number of rounds with a century-old rifle and a
modern rifle. The shots from the older gun are considerably scattered, but
those from the newer gun are closely clustered. The variability of the old rifle
compared with that of the new one indicates it is less reliable. The target on
the right illustrates the concept of a systematic bias influencing validity. The
new rifle is reliable (because it has little variance), but the sharpshooter's
vision is hampered by glare. Although shots are consistent, the sharpshooter is
unable to hit the bull's-eye."
[Figure: three targets. Target A: Old Rifle, Low Reliability. Target B: New Rifle, High Reliability. Target C: New Rifle with Sunglare, Reliable but not Valid.]
Fig. 5.3a: Bull's-eye with the Shots from Different Rifles
Source: Zikmund, Babin, Carr, Griffin, Business Research Methods, 8th Edition
5.4 Classification of Scaling Techniques
Scaling techniques can be further classified into two main categories, namely,
comparative scales and categorical or non-comparative scales. They are classified
according to the kind of response the respondents under study are asked to give.
Comparative Scale
In this type of scale, the respondent is asked to compare one object with another and
furnish a response. For example, the respondent is asked to compare two cosmetic
brands, Lakme and Ponds, on their effectiveness. The response, in such a case, can be
either Lakme or Ponds, whichever the respondent finds more effective. However, in this scale,
the respondent can only choose the brand he/she finds more effective but cannot allocate a
numerical value to its effectiveness.
These types of scales are ordinal in character, and so they are also known as non-metric
scales. There is no common standard for this scale, and different respondents use different
approaches or standards.
Categorical or Non-comparative Scale
In a categorical scale, the respondent has to give his/her response on a scale in which
each object is evaluated individually, independently of the others. For
example, three brands of deodorant are rated on the basis of their fragrance on a scale
of 1 to 5, where 1 indicates the best and 5 indicates the worst fragrance, as shown in
Fig. 5.4a.
Q. Rate the fragrance of the following brands on the scale below:
(1 = Best, 2 = Better, 3 = Good, 4 = Bad, 5 = Worst)
Brand 1: 1 2 3 4 5
Brand 2: 1 2 3 4 5
Brand 3: 1 2 3 4 5
Fig. 5.4a: Non-comparative Scale
The comparative scale is further classified into two different scales: rank order and
paired comparison. Also, the categorical or non-comparative scale is classified into the
continuous rating scale and the itemized rating scale. The itemized rating scale is further
divided into four different scales:
Likert Scale
Semantic Differential Scale
Cumulative Scale
Stapel Scale
The classification of the measurement scaling techniques is shown in Fig. 5.4b.
[Figure: Scaling Techniques are divided into Comparative Scales (Rank Order, Paired Comparison) and Categorical Scales (Continuous Rating Scale; Itemized Rating Scale: Likert, Semantic Differential, Cumulative, Stapel)]
Fig. 5.4b: Classification of Measurement Scaling Techniques
5.5 Comparative Scales
The different types of comparative scales are:
Rank Order Scale
In this scale, the respondent is asked to rank several objects based on certain properties
or criteria. It is the simplest and quickest to apply. Ranking the extremes is very easy in
this method, but ranking the objects in between is difficult.
Example:
Rank the following soft drinks, on the basis of their taste, from 1 for the best-tasting one
to 5 for the worst-tasting one.
Coca-Cola
7-Up
Sprite
Pepsi
Mountain Dew
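Once several respondents have ranked the brands, their rankings have to be combined. A minimal sketch follows, assuming three hypothetical respondents and using the average rank of each brand as the summary (a common simple choice; because the data are ordinal, the median rank is an equally defensible alternative). The helper `mean_ranks` also rejects any ranking that does not use each rank from 1 to 5 exactly once.

```python
# Hypothetical rankings from three respondents (1 = best tasting, 5 = worst)
rankings = [
    {"Coca-Cola": 1, "7-Up": 3, "Sprite": 4, "Pepsi": 2, "Mountain Dew": 5},
    {"Coca-Cola": 2, "7-Up": 4, "Sprite": 3, "Pepsi": 1, "Mountain Dew": 5},
    {"Coca-Cola": 1, "7-Up": 3, "Sprite": 5, "Pepsi": 2, "Mountain Dew": 4},
]


def mean_ranks(rankings):
    """Average each brand's rank over all respondents (lower = more preferred)."""
    brands = list(rankings[0])
    for r in rankings:
        # every respondent must rank the same brands, using each rank 1..n exactly once
        if set(r) != set(brands) or sorted(r.values()) != list(range(1, len(brands) + 1)):
            raise ValueError(f"invalid ranking: {r}")
    return {b: sum(r[b] for r in rankings) / len(rankings) for b in brands}


overall = sorted(mean_ranks(rankings).items(), key=lambda item: item[1])
for brand, avg in overall:
    print(f"{brand}: {avg:.2f}")
```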
Paired Comparison
As the name implies, this scale involves the comparison of different pairs of objects.
Here, the respondent is provided with pairs of objects and he/she has to select one of
the objects from the pair, on the basis of some criteria.
If there are n objects, then there will be $n(n-1)/2$ pairs to be compared.
The data obtained in this scale is ordinal and the responses of the respondent obtained
can be transformed into a matrix form. This method was given by L. L. Thurstone
(1927).
The method can be explained with the help of an example. Suppose six apparel brands,
namely, Biba, Aurelia, W, Global Desi, Kimaya, and Fab India, are compared on the basis of
their design, and a sample of 100 female customers or respondents is taken. There will
be 15 pairs [$6(6-1)/2 = 15$]. After each respondent furnishes their response in the forms,
their preferences can be presented in a matrix form, as illustrated in Table 5.5a.
Brand | Biba | Aurelia | W | Global Desi | Kimaya | Fab India
Biba | x | 0 | 1 | 1 | 1 | 1
Aurelia | 1 | x | 1 | 1 | 1 | 1
W | 0 | 0 | x | 0 | 0 | 0
Global Desi | 0 | 0 | 1 | x | 0 | 0
Kimaya | 0 | 0 | 1 | 1 | x | 1
Fab India | 0 | 0 | 1 | 1 | 0 | x
Total | 1 | 0 | 5 | 4 | 2 | 3
Table 5.5a: Table for Paired Comparison Scale Method
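In Table 5.5a, a value of 1 indicates that the brand in that column is preferred over the
brand in that row; for example, the 1 in the Aurelia row under the Biba column means that
Biba is preferred over Aurelia. The column totals therefore show how many times each brand
was preferred; W, with a total of 5, is preferred over every other brand.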
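The bookkeeping behind the paired comparison method can be sketched in a few lines. The matrix below is copied from Table 5.5a, using the convention stated above (a 1 means the column brand is preferred over the row brand). The sketch counts the $n(n-1)/2$ pairs, checks that every pair was judged exactly once (cells (i, j) and (j, i) must add up to 1), and ranks the brands by their column totals. It treats the 0/1 entries exactly as given; how the 100 respondents' answers were combined into one matrix is not stated here.

```python
from itertools import combinations

brands = ["Biba", "Aurelia", "W", "Global Desi", "Kimaya", "Fab India"]

# Table 5.5a: matrix[row][col] == 1 means the column brand is preferred over
# the row brand; None stands for the 'x' on the diagonal.
matrix = [
    [None, 0, 1, 1, 1, 1],  # Biba
    [1, None, 1, 1, 1, 1],  # Aurelia
    [0, 0, None, 0, 0, 0],  # W
    [0, 0, 1, None, 0, 0],  # Global Desi
    [0, 0, 1, 1, None, 1],  # Kimaya
    [0, 0, 1, 1, 0, None],  # Fab India
]

n = len(brands)
pairs = list(combinations(range(n), 2))
print(len(pairs), "pairs; n(n - 1)/2 =", n * (n - 1) // 2)

# Every pair must be judged exactly once: cells (i, j) and (j, i) add up to 1.
for i, j in pairs:
    if matrix[i][j] + matrix[j][i] != 1:
        raise ValueError(f"inconsistent judgement for {brands[i]} vs {brands[j]}")

# Column totals = number of comparisons each brand won.
totals = {brands[j]: sum(matrix[i][j] for i in range(n) if i != j) for j in range(n)}
ranking = sorted(brands, key=totals.get, reverse=True)
print(totals)
print(ranking)
```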
5.6 Categorical Scales
The different types of categorical scales are:
Continuous Rating Scale
In this scale, the respondents mark (✓) the position that they consider most appropriate on a
line, a number scale, or a pictorial scale. The rating scale is represented by line diagrams,
scales with pictures, and others.
Example:
1. How would you rate the services of DTDC as a country-specific courier service?
Very good Good Quite good Neither Quite bad Bad Very bad
Fig. 5.6a: Continuous Rating Scale
2. Is the conducted test easy or difficult?
Very Easy ______________________________ Very Difficult
Or
Very Easy   7   6   5   4   3   2   1   Very Difficult
Fig. 5.6b: Continuous Rating Scale
3. Are you satisfied with your job?

[Pictorial scale numbered 0 to 5, with end anchors Very Satisfied and Very Unsatisfied]
Fig. 5.6c: Continuous Rating Scale
In the above question 3, a pictorial scale is used.
Itemized Rating Scale
Itemized rating scale is also known as a numerical rating scale. In this scale, a series of
statements is given, from which the respondent selects the statement that best matches
his/her response. Unlike the continuous scale, this scale offers a limited set of categories,
from which the respondent selects the most favorable one. The scale should preferably have
an odd number of categories, usually between five and nine.
Depending on the variables, respondents, etc., under study, the itemized rating scale has the
following divisions:
Likert Scale (Summated Scale)
Semantic Differential Scale
Cumulative Scale
Stapel Scale
Likert Scale (Summated Scale)
This scale was developed by Rensis Likert in 1932. According to Mukul Gupta and
Deepa Gupta, "Summated scale consists of a number of statements which
express either a favorable or unfavorable attitude towards the given object to
which the respondent is asked to react. The respondent indicates his
agreement or disagreement with each statement in the instrument."
This scale is a five-point scale, ranging either from 1 to 5 (with 3 as the neutral response)
or from -2 to +2 (with 0 as the neutral response).
Example:
1. Are you interested in mutual fund investments?
Very interested | Somewhat interested | Neutral | Not very interested | Not at all interested
5 | 4 | 3 | 2 | 1
Table 5.6a: Likert Scale
2. Was the workshop on 'Marketing Research' useful to you?
Strongly Disagree | Disagree | Neutral | Agree | Strongly Agree
-2 | -1 | 0 | 1 | 2
Table 5.6b: Likert Scale
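Because the Likert scale is a summated scale, a respondent's attitude score is the total of his/her responses across all the statements. Since some statements express a favorable and others an unfavorable attitude, the unfavorable ones are commonly reverse-scored before summing, so that a higher total always means a more favorable attitude. The sketch below assumes a 1 to 5 coding; the three statements and the helper name `summated_score` are hypothetical.

```python
# Hypothetical statements; True marks an unfavorably worded statement.
STATEMENTS = [
    ("Mutual funds are a good way to build wealth.", False),
    ("Mutual fund investments are too risky for me.", True),
    ("I would recommend mutual funds to a friend.", False),
]


def summated_score(responses, statements=STATEMENTS, points=5):
    """Total attitude score for one respondent on a 1..points Likert scale.

    1 = strongly disagree ... 5 = strongly agree. Unfavorable statements are
    reverse-scored so that a higher total always means a more favorable attitude.
    """
    if len(responses) != len(statements):
        raise ValueError("exactly one response per statement is required")
    total = 0
    for r, (_, unfavorable) in zip(responses, statements):
        if not 1 <= r <= points:
            raise ValueError(f"response {r} is outside 1..{points}")
        total += (points + 1 - r) if unfavorable else r
    return total


print(summated_score([5, 1, 4]))  # favorable respondent
print(summated_score([1, 5, 1]))  # unfavorable respondent
```

With three statements the total ranges from 3 (most unfavorable) to 15 (most favorable).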
Semantic Differential Scale
Charles E. Osgood, G. J. Suci, and P. H. Tannenbaum (1957) developed the scale known
as the semantic differential scale. In this scale, words rather than numbers are used to anchor
a seven-point scale: two bipolar adjectives are placed at the extreme points of the scale, and
the respondent selects the point between them that reflects his/her evaluation.
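Q. Rate the furniture of the brand 'Zuari':
Strong      : __ : __ : __ : __ : __ : __ : __ : Weak
Expensive   : __ : __ : __ : __ : __ : __ : __ : Inexpensive
Fashionable : __ : __ : __ : __ : __ : __ : __ : Unfashionable
Fig. 5.6d: Semantic Differential Scale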
Cumulative Scale
It is also known as Louis Guttman's scalogram analysis, named after the person who
developed it. It consists of a series of statements, with each of which the respondent
indicates his/her agreement or disagreement.
According to C. R. Kothari, "Scalogram analysis refers to the procedure for
determining whether a set of items forms a unidimensional scale."
Question 4 | Question 3 | Question 2 | Question 1 | Respondent's Score
✓ | ✓ | ✓ | ✓ | 4
x | ✓ | ✓ | ✓ | 3
x | x | ✓ | ✓ | 2
x | x | x | ✓ | 1
✓ = Agreement; x = Disagreement
Table 5.6c: Response Pattern in Scalogram Analysis
From Table 5.6c, you can see that the respondent is provided with two options,
agreement and disagreement, for each question asked. The questions are built up in
such a way that if the respondent agrees with question 2, then the respondent is
expected to agree with question 1 as well. Likewise, if the respondent agrees with
question 3, then he/she is expected to agree with questions 1 and 2 as well. Because
agreement accumulates in this way, a respondent's score (the number of statements agreed
with) shows exactly which statements were agreed with. Hence, it is also termed a
cumulative scale.
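Scalogram analysis checks how closely actual responses follow this cumulative pattern. One common measure is Guttman's coefficient of reproducibility, 1 minus the number of errors divided by the total number of responses; a value of about 0.90 or above is conventionally taken to indicate an acceptable cumulative scale. Error-counting rules differ between textbooks; the sketch below assumes the simple rule of counting the cells that differ from the ideal pattern for the respondent's score. The extra inconsistent respondent is hypothetical.

```python
def ideal_pattern(score, n_items):
    """Perfect cumulative pattern: agree (1) with the `score` easiest items only."""
    return [1] * score + [0] * (n_items - score)


def reproducibility(patterns):
    """Coefficient of reproducibility = 1 - errors / total responses.

    Each pattern lists responses from the easiest item (Question 1) to the
    hardest (Question 4), with 1 = agreement and 0 = disagreement. Errors are
    counted as cells that differ from the ideal pattern for the same score.
    """
    n_items = len(patterns[0])
    errors = 0
    for p in patterns:
        ideal = ideal_pattern(sum(p), n_items)
        errors += sum(a != b for a, b in zip(p, ideal))
    return 1 - errors / (len(patterns) * n_items)


# The four response patterns of the scalogram table, written Question 1 -> Question 4
table = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
print(reproducibility(table))                   # perfect cumulative scale
print(reproducibility(table + [[0, 1, 0, 1]]))  # plus one inconsistent respondent
```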
Stapel Scale
The Stapel scale was developed by Jan Stapel. It is a 10-point interval scale from
+5 to -5, with no neutral point (0). This scale is unipolar because only one
adjective is under consideration. Unlike the other scales, it has an even number of categories.
Example:
How would you rate the extent to which the tag line of a particular product matches
the product?
(+5)
(+4)
(+3)
(+2)
(+1)
Perfectly Matches
(-1)
(-2)
(-3)
(-4)
(-5)
Fig. 5.6e: Stapel Scale
This scale is usually presented vertically. Respondents select their response on the basis
of their perspective as to how accurately the phrase 'Perfectly Matches' describes the tag
line. The larger the positive number, the more accurately the phrase describes it; the
larger the negative number, the less accurately it does.
Table 5.6d summarizes the different measurement scales.
Non-comparative Scales
Scale | Basic Characteristics | Examples | Advantages | Disadvantages
Continuous Rating Scale | Place a mark on a continuous line | Reaction to TV commercials | Easy to construct | Scoring can be cumbersome unless computerized
Itemized Rating Scales:
Likert Scale | Degree of agreement on a 1 (strongly disagree) to 5 (strongly agree) scale | Measurement of attitudes | Easy to construct, administer, and understand | More time consuming
Semantic Differential Scale | Seven-point scale with bipolar labels | Brand, product, and company images | Versatile | Controversy about whether or not the data are interval
Stapel Scale | Unipolar ten-point scale, -5 to +5, without a neutral point (zero) | Measurement of attitudes and images | Easy to construct; can be administered over the telephone | Confusing and difficult to apply
Table 5.6d: Different Measurement Scales
Source: Naresh K. Malhotra, Satyabhushan Dash, Marketing Research: An Applied Orientation, Fifth
Edition, Pearson Education, 2007
Before preparing a non-comparative itemized rating scale, you should keep the following
factors in mind:
The number of categories to be used.
Whether a balanced or an unbalanced scale is to be used. A balanced scale is
one in which the numbers of favorable and unfavorable categories are equal, while an
unbalanced scale has unequal numbers of favorable and unfavorable categories.
Whether an odd or even number of categories is to be used.
Whether the scale should be a forced or an unforced rating scale. In a forced rating scale,
respondents are forced to express an opinion, as the scale does not contain a 'no opinion'
option; respondents without an opinion then tend to mark the middle of the scale. In an
unforced scale, respondents are given a 'no opinion' option, which they can choose if they
find that the options are not applicable or do not wish to disclose their response.
The nature and degree of the verbal description of the categories vary across scales, so the
descriptions should be worded in such a way that the responses stay close to the question.
The scales can be presented either horizontally or vertically. They can also be
represented by boxes, or by lines with or without numbers.
5.7 Chapter Summary
Measurement is the process of assigning numbers or scores to characteristics or
attributes of the objects or people of interest.
Primary scales of measurement are:
o Nominal Scale: Labeling of objects
o Ordinal Scale: Ranking the objects
o Interval Scale: Measuring equal differences between values (arbitrary zero)
o Ratio Scale: Measuring absolute values (true zero)
Scaling techniques are classified as follows:
o Comparative Scaling Technique: Comparing objects with one another
o Categorical Scaling Technique: Rating each object independently
Tabulation and Analysis of Data
06. Tabulation and Analysis of Data eBook
6.1 Introduction
In this chapter, tabulation and analysis of data will be discussed. Tabular representation
helps to view large data in a compact manner. It also enables us to compare different
variables at one time. Summing the values of different variables becomes easy
when they are expressed in a tabular form. Thus, tabulation of data helps to get a precise
overview of the data.
Analysis of data is the most essential part of research, which leads to the conclusions of
the research. The analysis of data should be done scientifically, by using various
statistical techniques or tools. In this chapter, some of the statistical techniques used for
analysis of data are discussed. By using an appropriate statistical tool, correct findings in
research are obtained.
Define tabulation and its parts
State the principles of tabulation
Discuss multiple regression analysis
Discuss multiple discriminant analysis
Discuss the measures of central tendency
Explain the measures of dispersion
Describe the measures of skewness
Explain the measures of relationships
Describe the association of attributes
Explain measures like time series and index numbers
6.2 Tabulation
Tabulation is the representation of data in a compact form. It is used in all kinds of
reports, articles, journal papers, etc. to summarize particular data. Tables have numerous
uses, but their importance in research is immense. Tabulation is highly recommended, as the
data is organized in a systematic manner, reflecting the information of the data used for
the table, and it also helps in the further analysis required for the interpretation of the
research findings.
As explained by Tuttle, a statistical table is "The logical listing of related quantitative data in vertical
columns and horizontal rows of numbers with sufficient explanatory and
qualifying words, phrases and statements in the form of titles, headings and
explanatory notes to make clear the full meaning, context and the origin of the
data."
Thus, the technique of organizing the given data in a tabular form is known as
tabulation. In a table, it is very important that the headings for each of the rows and
columns are properly inserted according to the data. A table always has a title that defines
it, and if the table is taken from a source, or if there is any other additional information
relating to the table, it is provided below the table. A table consists of the table title,
along with the table number, rows, columns, and footnotes, as shown in Fig. 6.2a.
Table 2: Rwandan mineral production (1995-2000)   [table number and title of the table]
Year | Gold Production (kg) | Cassiterite Production (tons) | Coltan Production (tons) | Diamond Exports (US$)   [columns]
1995 | 1 | 247 | 54 | n/a   [rows]
1996 | 1 | 330 | 97 | n/a
1997 | 10 | 327 | 224 | $720,425
1998 | 17 | 330 | 224 | $16,606
1999 | 10 | 309 | 122 | $439,347
2000 | 10 | 437 | 83 | $1,788,036
Sources: Coltan, cassiterite and gold figures derived from Rwandan Official Statistics (No. 227/01/10/MIN); diamond figures from the Diamond High Council. (All figures originally appeared in the UN Panel of Inquiry, 2001. All 2000 figures are to October.)   [source]
Fig. 6.2a: Parts of a Table
The table title should be concise, reflect the complete meaning of the representation, and
follow the table number. In Fig. 6.2a, the table is about mineral production
from 1995 to 2000, taken from Rwandan official statistics, and is appropriately
titled "Rwandan mineral production (1995-2000)".
The horizontal and vertical arrangements of data are known as the rows and columns of the
table, respectively. In Fig. 6.2a, there are four data columns (besides the Year column) and
six rows. Each data value is placed in an individual cell. If, other than the source, there is
more information to be furnished in the table, like short forms or abbreviations used in the
table, it can be provided below the table after the source (if any). All this information
together is known as footnotes.
The principles of tabulation can be stated as:
1. Every table must contain a title that is clearly understandable and concise.
2. A table should be properly numbered for future reference.
3. The column heading and row heading should be clear and short.
4. The units of measurement should always be mentioned wherever necessary in the row
or column heading. For example, as illustrated in Fig. 6.2a, the units of
measurement, like 'kg' and 'tons', are provided.
5. The footnotes should be placed beneath the table, along with any other additional
information that needs to be highlighted in the table.
6. The source should be given below the table.
7. The columns can be numbered for reference. If a table contains ten columns, then
numbering the columns helps in referring to them.
8. The data of the columns that are to be compared should be placed side by side.
9. Alignment of values in the cells should be proper. Also, the decimal points and signs
like + and - should be properly aligned.
10. Abbreviations should be used rarely, only if necessary, and ditto marks
should be avoided.
11. Representation of large data in a table can make it look messy and unclear. So, in
such cases, the data should not be clumped in one single table.
12. The row totals are placed in the extreme right column of the table, while the
column totals are placed in the last row of the table.
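Several of these principles (a numbered title, units in the column headings, right-aligned numbers, row totals in the right-most column, column totals in the last row, and the source below the table) can be applied mechanically when a table is generated from data. The sketch below is one possible way to do this; the helpers `add_totals` and `render` and the regional sales figures are hypothetical.

```python
def add_totals(row_labels, col_labels, data):
    """Attach row totals (right-most column) and column totals (last row)."""
    header = ["", *col_labels, "Total"]
    rows = [[label, *values, sum(values)] for label, values in zip(row_labels, data)]
    col_totals = [sum(column) for column in zip(*data)]
    rows.append(["Total", *col_totals, sum(col_totals)])
    return header, rows


def render(title, header, rows, source):
    """Lay the table out as text: labels left-aligned, numbers right-aligned."""
    table = [header, *rows]
    widths = [max(len(str(r[i])) for r in table) for i in range(len(header))]

    def fmt(row):
        cells = [str(row[0]).ljust(widths[0])]
        cells += [str(v).rjust(w) for v, w in zip(row[1:], widths[1:])]
        return "  ".join(cells)

    return "\n".join([title, *(fmt(r) for r in table), f"Source: {source}"])


header, rows = add_totals(
    ["North", "South", "East"],
    ["Q1 (Rs. lakh)", "Q2 (Rs. lakh)"],  # units stated in the column headings
    [[10, 12], [8, 9], [15, 11]],
)
print(render("Table 1: Quarterly sales by region (hypothetical data)",
             header, rows, "hypothetical example"))
```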
Tables, figures, and graphs help in the brief representation of the data, which helps
in the analysis of the findings. The analysis of the data is done accordingly, depending on
the data type and size of the data. It is very important to implement an appropriate
statistical technique in order to get the appropriate results. There are many statistical
techniques used in data analysis. In this chapter, some of the important statistical
techniques are discussed.
6.3 Multiple Regression Analysis
Regression analysis enables us to obtain an equation depicting the relationship between
two variables. Similarly, multiple regression analysis enables us to obtain the
relationship between more than two variables, that is, between one dependent variable and
two or more independent variables.
Consider that X1 and X2 are two independent variables and Y is a dependent variable.
The multiple regression equation can be expressed as:
$$Y = a + b_1X_1 + b_2X_2$$
Where,
X1 and X2 = Independent variables
Y = Dependent variable
a, b1, b2 = Constants (a is the intercept; b1 and b2 are the regression coefficients)
To obtain the regression equation, you need to calculate the values of the constants.
This can be done by solving the following normal equations:
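$$\sum Y = Na + b_1\sum X_1 + b_2\sum X_2$$
$$\sum X_1Y = a\sum X_1 + b_1\sum X_1^2 + b_2\sum X_1X_2$$
$$\sum X_2Y = a\sum X_2 + b_1\sum X_1X_2 + b_2\sum X_2^2$$
where N is the number of observations.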
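The three normal equations form a 3 × 3 linear system in a, b1, and b2. A minimal sketch of solving it follows: `fit_two_predictors` computes the sums that appear in the equations, and `solve3` solves the system by Gaussian elimination with partial pivoting. Both helper names are hypothetical; in practice a statistics package would be used. The check data are constructed so that Y = 1 + 2X1 + 3X2 exactly, so the fit should recover a = 1, b1 = 2, b2 = 3.

```python
def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Gaussian elimination with partial pivoting."""
    n = 3
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular (X1 and X2 may be collinear)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_two_predictors(x1, x2, y):
    """Return [a, b1, b2] by building and solving the three normal equations."""
    N = len(y)
    s_x1x2 = sum(p * q for p, q in zip(x1, x2))
    A = [
        [N, sum(x1), sum(x2)],
        [sum(x1), sum(p * p for p in x1), s_x1x2],
        [sum(x2), s_x1x2, sum(q * q for q in x2)],
    ]
    rhs = [sum(y),
           sum(p * v for p, v in zip(x1, y)),
           sum(q * v for q, v in zip(x2, y))]
    return solve3(A, rhs)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]  # constructed as 1 + 2*X1 + 3*X2
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"Y = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```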
6.4 Multiple Discriminant Analysis
Discriminant analysis is appropriate in situations where the independent variables are
quantitative and the dependent variable is categorical in nature. For example, on the
basis of an applicant's age, income, length of time at present home, etc., a credit
manager may wish to classify a person as either a good or a poor credit risk.
Discriminant analysis is suitable with a nominal dependent variable and interval
independent variables. It is the technique to analyze data when the criterion or
dependent variable is categorical and the predictor or independent variables are interval
in nature (Lachenbruch, 1975). When the dependent variable has more than two categories,
the analysis is known as multiple discriminant analysis.
The equation showing such a relationship for n independent variables is called a discriminant
function and is given as:
$$Z = w_1X_1 + w_2X_2 + \dots + w_nX_n$$
Where,
Z = Discriminant score
wi = Discriminant weight for variable Xi
Xi = ith independent variable
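To see how the weights wi and the score Z are used, consider the two-group credit-risk example with two predictors. A minimal sketch follows, using Fisher's classical two-group method: the weights are the inverse of the pooled within-group sums-of-squares-and-cross-products matrix multiplied by the difference in group means, and an applicant is classified by comparing Z with a cutting score halfway between the two group-mean scores. The applicant data and all helper names are hypothetical. With more than two groups (multiple discriminant analysis proper), more than one discriminant function is estimated, which this sketch does not cover, and statistical packages may scale the weights differently.

```python
def mean_vector(group):
    """Mean of each predictor within a group of (X1, X2) observations."""
    return [sum(p[k] for p in group) / len(group) for k in range(2)]


def score(w, x):
    """Discriminant score Z = w1*X1 + w2*X2."""
    return w[0] * x[0] + w[1] * x[1]


def fisher_discriminant(group_a, group_b):
    """Two-group Fisher discriminant with two predictors.

    Weights are w = S^-1 (mean_a - mean_b), where S is the pooled within-group
    sums-of-squares-and-cross-products matrix. The cutting score lies halfway
    between the two group-mean discriminant scores.
    """
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((group_a, ma), (group_b, mb)):
        for p in group:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-group matrix is singular")
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    # closed-form inverse of the 2x2 matrix applied to the mean difference
    w = ((s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det)
    cutoff = (score(w, ma) + score(w, mb)) / 2
    return w, cutoff


# Hypothetical applicants: (income in Rs. thousand per month, years at present home)
good = [(50, 8), (60, 10), (55, 7), (65, 12)]
poor = [(20, 1), (25, 2), (30, 1), (22, 3)]

w, cutoff = fisher_discriminant(good, poor)


def classify(x):
    """Good risk if the applicant's score lies on the 'good' side of the cutting score."""
    return "good" if score(w, x) > cutoff else "poor"


print("weights:", [round(v, 4) for v in w], "cutting score:", round(cutoff, 3))
print([classify(p) for p in good + poor])
```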
Discriminant analysis is widely used in business research. According to Gupta (2003),
the following aspects can be studied by this type of analysis:
Identification of new buyer group
Consumer behavior towards new products or brands
Brand loyalty study
Relationship between variables
Checklist of properties of new products
6.5 Measures of Central Tendency
Measures of central tendency help to obtain a representative value to study the
characteristics of the population from which the data is extracted. They are also known as
statistical averages, as an average value of the data is obtained when these measures are
computed.
The measures of central tendency are also known as measures of location as the value
obtained from such computation indicates the position/location of the value. The
measures of central tendency are:
Mean/Arithmetic mean/Average
Mode
Median
Harmonic Mean
Geometric Mean
6.5.1 Mean/ Arithmetic mean/ Average
It is the most commonly used measure of central tendency and the simplest of all. It is
obtained by dividing the sum of all the values of the observations by the total number of
observations.
Consider the values of n observations X1, X2, X3, ..., Xn; then, the arithmetic mean is
given as:
$$\bar{X} = \frac{\sum X_i}{n}$$
Where,
X̄ = Arithmetic mean
ΣXi = Sum of all the n observations
n = Total number of observations
For example, if the numbers of leaves taken by 10 employees of a bank in the last three
months are 2, 4, 6, 1, 2, 3, 1, 2, 1, and 5, then the average number of leaves is obtained by
the arithmetic mean:
$$\bar{X} = \frac{2+4+6+1+2+3+1+2+1+5}{10} = \frac{27}{10} = 2.7$$
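The same calculation as a short Python sketch; the helper name `arithmetic_mean` is illustrative, and Python's standard `statistics.mean` gives the same result.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)


leaves = [2, 4, 6, 1, 2, 3, 1, 2, 1, 5]
print(arithmetic_mean(leaves))  # 27 / 10 = 2.7
```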
While, if for these n observations the corresponding frequencies are given, that is,
f1, f2, f3, ..., fn, then the arithmetic mean is given as:
$$\bar{X} = \frac{\sum f_iX_i}{\sum f_i}$$
Where,
X̄ = Arithmetic mean
ΣfiXi = Sum of the products of the observations and their corresponding frequencies
Σfi = Total frequency
Example: Consider the frequency distribution illustrated in Table 6.5.1a.
x | 0 | 1 | 2 | 3 | Total
f | 2 | 4 | 6 | 8 | 20
Table 6.5.1a: Frequency Distribution Table
The arithmetic mean is given as:
$$\bar{X} = \frac{\sum f_iX_i}{\sum f_i}$$
For this formula, fx is calculated as illustrated in Table 6.5.1b:
x     0    1    2    3    Total
f     2    4    6    8    20
fx    0    4    12   24   40
Table 6.5.1b: Frequency Distribution Table
Therefore, $\bar{X} = \dfrac{40}{20} = 2$
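The frequency-weighted mean can be sketched the same way. For the last column, 3 x 8 = 24, so the sum of fx is 40 and the mean is 2. The helper name `weighted_mean` is illustrative only.

```python
def weighted_mean(xs, fs):
    """Arithmetic mean of a frequency distribution: sum(f*x) / sum(f)."""
    total_f = sum(fs)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(xs, fs)) / total_f

x = [0, 1, 2, 3]
f = [2, 4, 6, 8]
print(sum(fi * xi for xi, fi in zip(x, f)))  # 40
print(weighted_mean(x, f))                    # 2.0
```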
6.5.2 Median
The median is the middle value of the data when all the observations are arranged in ascending or descending order of magnitude. The procedure below applies to ungrouped data.
For example, the monthly sales (in thousands) of 9 salespersons of an insurance company are 10, 11, 13, 12, 17, 18, 9, 14, and 15. To obtain the median for this data, the observations are first arranged in ascending order: 9, 10, 11, 12, 13, 14, 15, 17, and 18. Then, the value of the observation in the $\left(\frac{n+1}{2}\right)^{\text{th}} = \left(\frac{9+1}{2}\right)^{\text{th}}$, that is, the 5th position, gives the median, which is equal to 13.
However, suppose that in the above example there are 10 salespersons and the observations are 10, 11, 13, 12, 17, 18, 9, 14, 15, and 17. As before, the observations are arranged in ascending order: 9, 10, 11, 12, 13, 14, 15, 17, 17, and 18. The median is then the mean of the observations in the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ positions, that is, the 5th and 6th positions. Thus, the median is $\frac{13 + 14}{2} = 13.5$.
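The odd/even rule can be sketched in a few lines of Python. When the observations are sorted, the value 13 must be included, which gives medians of 13 and 13.5. The function name is illustrative; the standard `statistics.median` applies the same rule.

```python
def median(values):
    """Median of ungrouped data: middle value after sorting (mean of the two middle values when n is even)."""
    if not values:
        raise ValueError("need at least one observation")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # the ((n+1)/2)th value, i.e. 0-based index n//2
    return (s[mid - 1] + s[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th values

sales_9 = [10, 11, 13, 12, 17, 18, 9, 14, 15]
sales_10 = [10, 11, 13, 12, 17, 18, 9, 14, 15, 17]
print(median(sales_9))   # 13
print(median(sales_10))  # 13.5
```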
Again, for a discrete frequency distribution:
x       f       c.f.
x1      f1      c1
x2      f2      c2
x3      f3      c3
...     ...     ...
xn      fn      cn
Total   N
Table 6.5.2a: Frequency Distribution Table
First, compute $\frac{N}{2}$, where $N = \sum f_i$. Then, find the cumulative frequency (c.f.) just greater than $\frac{N}{2}$; the value of x corresponding to this c.f. is the median. For a continuous frequency distribution, however, the median is calculated differently from a discrete one.
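The discrete-series rule described above (compute N/2, then take the x whose cumulative frequency is the first one greater than N/2) can be sketched as follows. This sketch follows the "just greater than" wording of the text. It reuses the distribution from Table 6.5.1a purely as an example, and the function name is hypothetical.

```python
from itertools import accumulate

def discrete_median(xs, fs):
    """Median of a discrete frequency distribution (xs in ascending order).

    Rule from the text: find N/2, then return the x whose cumulative
    frequency is the first one greater than N/2.
    """
    half = sum(fs) / 2
    for x, cf in zip(xs, accumulate(fs)):
        if cf > half:
            return x
    raise ValueError("empty distribution")

# Table 6.5.1a: N = 20, N/2 = 10, c.f. = 2, 6, 12, 20 -> first c.f. above 10 is 12
print(discrete_median([0, 1, 2, 3], [2, 4, 6, 8]))  # 2
```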
Consider the following distribution of employees' bonus for a company.
x          f     c.f.
0 - 10     2     2
10 - 20    4     6
20 - 30    6     12
30 - 40    8     20
Total      20
Table 6.5.2b: Distribution of Employees' Bonus
Thus, the median is given by the formula:
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i$$
Where,
L = Lower limit of the median class
c.f. = Cumulative frequency of the class preceding the median class
f = Frequency of the median class
i = Class interval of the median class
Therefore, from Table 6.5.2b, $\frac{N}{2} = \frac{20}{2} = 10$; the c.f. just greater than 10 is 12, which corresponds to the class 20 - 30.
Thus, the median class is 20 - 30, L = 20, c.f. = 6, f = 6 and i = 10.
$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i = 20 + \frac{10 - 6}{6} \times 10 = 20 + \frac{4}{6} \times 10 = 20 + 6.667 = 26.667$$
The median divides the whole data series into two equal parts. In addition, it is not affected by extreme values.
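A minimal sketch of the grouped-median formula. It assumes equal-width classes given by their lower limits, and the function and parameter names are hypothetical. Applied to Table 6.5.2b, it reproduces 26.667.

```python
from itertools import accumulate

def grouped_median(lowers, width, fs):
    """Median = L + (N/2 - c.f.) / f * i for equal-width classes.

    lowers: lower limits of the classes in ascending order
    width:  common class interval i
    fs:     class frequencies
    """
    half = sum(fs) / 2
    cumulative = list(accumulate(fs))
    for k, cf in enumerate(cumulative):
        if cf > half:  # first c.f. just greater than N/2 marks the median class
            cf_before = cumulative[k - 1] if k > 0 else 0
            return lowers[k] + (half - cf_before) / fs[k] * width
    raise ValueError("empty distribution")

# Table 6.5.2b: classes 0-10, 10-20, 20-30, 30-40
print(round(grouped_median([0, 10, 20, 30], 10, [2, 4, 6, 8]), 3))  # 26.667
```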
6.5.3 Mode
The mode is the value that occurs most frequently in the data. For ungrouped discrete data, the mode is the value of the observation that occurs the maximum number of times.
For example, suppose the ages of the employees in the marketing department are 25, 22, 35, 29, 42, 26, 36, 25, 29, 28, and 29. Here, 29 occurs the maximum number of times, that is, 3 times; therefore, the mode is 29.
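Counting frequencies makes the definition concrete. The sketch below returns every value that reaches the highest count, so data with two or more modes show up as a list with more than one element. The function name is illustrative.

```python
from collections import Counter

def modes(values):
    """All values that occur the maximum number of times (more than one => multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

ages = [25, 22, 35, 29, 42, 26, 36, 25, 29, 28, 29]
print(modes(ages))  # [29]
```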
For a grouped continuous frequency distribution, the mode is obtained by the following formula:
$$M_0 = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
Where,
L = Lower limit of the modal class
$f_1$ = Frequency of the modal class
$f_0$ = Frequency of the class preceding the modal class
$f_2$ = Frequency of the class succeeding the modal class
i = Class interval of the modal class
For example, consider the following frequency distribution.
x          f     c.f.
0 - 10     5     5
10 - 20    14    19
20 - 30    23    42
30 - 40    8     50
Total      50
Table 6.5.3a: Distribution of Employees' Bonus
Here, the highest frequency is 23; thus, the modal class is 20 - 30.
Also, L = 20, $f_1$ = 23, $f_0$ = 14, $f_2$ = 8 and i = 10.
$$M_0 = 20 + \frac{23 - 14}{2 \times 23 - 14 - 8} \times 10 = 20 + \frac{9}{46 - 22} \times 10 = 20 + \frac{9}{24} \times 10 = 23.75$$
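A minimal sketch of the grouped-mode formula, assuming equal-width classes. The names are hypothetical. If the modal class is the first or last class, the sketch treats the missing neighbour's frequency as 0, which is an assumption rather than something the text states. Ties for the highest frequency resolve to the first such class.

```python
def grouped_mode(lowers, width, fs):
    """M0 = L + (f1 - f0) / (2*f1 - f0 - f2) * i for equal-width classes."""
    k = max(range(len(fs)), key=lambda j: fs[j])   # modal class: highest frequency
    f1 = fs[k]
    f0 = fs[k - 1] if k > 0 else 0                 # preceding class (assumed 0 if none)
    f2 = fs[k + 1] if k + 1 < len(fs) else 0       # succeeding class (assumed 0 if none)
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: 2*f1 - f0 - f2 is zero")
    return lowers[k] + (f1 - f0) / denominator * width

# Table 6.5.3a
print(grouped_mode([0, 10, 20, 30], 10, [5, 14, 23, 8]))  # 23.75
```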
The mode can also be determined graphically. A single data set may have two or more modes; for such data, the results are difficult to interpret.
6.5.4 Geometric Mean
The nth root of the product of all the observations is called the geometric mean and is abbreviated as G.M. That is, if $x_1, x_2, \ldots, x_n$ is a given set of n observations, then the geometric mean is:
$$\text{G.M.} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
Taking the logarithm and then the antilogarithm of the above, the formula can be written as:
$$\text{G.M.} = \text{antilog}\left(\frac{\sum \log x_i}{n}\right)$$
For ungrouped data, suppose the scores obtained by 5 candidates in an aptitude test are 5, 6, 5.5, 8, and 5. Then,
$$\text{G.M.} = \sqrt[5]{5 \times 6 \times 5.5 \times 8 \times 5} = \sqrt[5]{6600} = 5.806$$
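The sketch below computes the G.M. both ways described above, the antilog of the mean logarithm and the direct nth root of the product, to show that they agree. The function name is illustrative. `math.prod` requires Python 3.8 or later, and the standard library also offers `statistics.geometric_mean`.

```python
import math

def geometric_mean(values):
    """G.M. = antilog(sum(log x) / n); all observations must be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive observations")
    return 10 ** (sum(math.log10(v) for v in values) / len(values))

scores = [5, 6, 5.5, 8, 5]
gm_log = geometric_mean(scores)
gm_root = math.prod(scores) ** (1 / len(scores))  # direct nth root of the product
print(round(gm_log, 3), round(gm_root, 3))  # 5.806 5.806
```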
For a discrete frequency distribution, the formula for G.M. is:
$$\text{G.M.} = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}$$
or
$$\text{G.M.} = \text{antilog}\left(\frac{\sum f_i \log x_i}{N}\right)$$
Where, $N = \sum f_i$
For a continuous frequency distribution, the mid-values ($m_i$) of the class intervals are calculated first; then the formula for G.M. is given as:
$$\text{G.M.} = \text{antilog}\left(\frac{\sum f_i \log m_i}{N}\right)$$
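A sketch of the grouped G.M. using class mid-values, with the product form computed alongside as a cross-check. The classes and frequencies of Table 6.5.2b are borrowed only as illustrative input, since the text gives no worked grouped example. The function name is hypothetical.

```python
import math

def grouped_geometric_mean(mids, fs):
    """G.M. = antilog(sum(f * log m) / N) for class mid-values m and frequencies f."""
    n_total = sum(fs)
    return 10 ** (sum(f * math.log10(m) for m, f in zip(mids, fs)) / n_total)

# Illustrative input: mid-values of the classes 0-10, 10-20, 20-30, 30-40 in Table 6.5.2b
mids = [5, 15, 25, 35]
freqs = [2, 4, 6, 8]
gm = grouped_geometric_mean(mids, freqs)

# Same result via the product form: Nth root of m1^f1 * m2^f2 * ...
gm_product = math.prod(m ** f for m, f in zip(mids, freqs)) ** (1 / sum(freqs))
print(round(gm, 3), round(gm_product, 3))
```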
6.5.5 Harmonic Mean
The harmonic mean (H.M.) is the reciprocal of the arithmetic mean of the reciprocals of the observations. That is, if $x_1, x_2, \ldots, x_n$ is a given set of n observations, then the harmonic mean is given as:
$$\text{H.M.} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x_i}}$$
The above equation is used for ungrouped discrete data.
For example, the H.M. for the observations 4, 6, and 2 is:
$$\text{H.M.} = \frac{3}{\frac{1}{4} + \frac{1}{6} + \frac{1}{2}} = \frac{3}{\frac{11}{12}} = 3 \times \frac{12}{11} = 3.273$$
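A short sketch of the ungrouped harmonic mean that reproduces 36/11, about 3.273. The function name is illustrative. The standard library's `statistics.harmonic_mean` computes the same quantity for positive data.

```python
def harmonic_mean(values):
    """H.M. = n / sum(1/x); observations must be non-zero (normally positive)."""
    if not values or any(v == 0 for v in values):
        raise ValueError("harmonic mean needs non-zero observations")
    return len(values) / sum(1 / v for v in values)

print(round(harmonic_mean([4, 6, 2]), 3))  # 3.273
```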
For a discrete frequency distribution, as in Table 6.5.2a, the harmonic mean is given as:
$$\text{H.M.} = \frac{N}{\sum \left(f_i \times \frac{1}{x_i}\right)}$$
For a continuous series,
$$\text{H.M.} = \frac{N}{\sum \left(f_i \times \frac{1}{m_i}\right)}$$
where $m_i$ is the mid-value of the ith class and $N = \sum f_i$.
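For example, consider the following distribution:
x          Mid-value (m)    f     f/m
0 - 10     5                2     0.4
10 - 20    15               4     0.267
20 - 30    25               1     0.04
30 - 40    35               3     0.086
Total                       10    0.793
Table 6.5.5a: Frequency Distribution Table
Therefore,
$$\text{H.M.} = \frac{N}{\sum \frac{f_i}{m_i}} = \frac{10}{0.793} = 12.610$$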
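A sketch of the continuous-series formula applied to Table 6.5.5a. The function name is hypothetical. Without intermediate rounding, the sum of f/m is about 0.7924, so the result is about 12.62. The table's 12.610 comes from rounding each f/m to three decimals first.

```python
def grouped_harmonic_mean(mids, fs):
    """H.M. = N / sum(f / m) using class mid-values m."""
    return sum(fs) / sum(f / m for m, f in zip(mids, fs))

# Table 6.5.5a
mids = [5, 15, 25, 35]
freqs = [2, 4, 1, 3]
print(round(grouped_harmonic_mean(mids, freqs), 2))  # 12.62
```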
6.6 Measures of Dispersion
Dispersion in statistics means the variability or spread of the observations in the data. A measure of dispersion helps to judge how well an average represents the data: if the observations are widely scattered, the average is a less reliable representative of the data; if they are closely clustered, it is more reliable.
The most popular measures of dispersion are:
Range
Mean deviation
Standard deviation
6.6.1 Range
Range is the simplest measure of dispersion. It is defined as the difference between the highest and the lowest values of the data series and is given as:
Range = Highest value - Lowest value
Since the range depends only on the two extreme values and ignores all the other observations, its application is limited.
6.6.2 Mean Deviation
Mean deviation (M.D.) is defined as the average of the absolute deviations of the values in a series from their mean, and it is given as:
$$\text{M.D.} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - \bar{x} \right|$$
In this measure, the negative sign (-) of each deviation is ignored in the calculation. For a frequency distribution, as in Table 6.5.2a, the formula is given as:
$$\text{M.D.} = \frac{1}{N} \sum_{i=1}^{n} f_i \left| x_i - \bar{x} \right|$$
Where, $\bar{x} = \frac{\sum f_i x_i}{N}$ and $N = \sum f_i$
For example, consider the following frequency table:
x       f      fx      |x - x̄|    f|x - x̄|
1       5      5       2.9         14.5
2       10     20      1.9         19
3       15     45      0.9         13.5
4       30     120     0.1         3
5       40     200     1.1         44
Total   100    390                 94
Table 6.6.2a: Table for the Calculation of M.D.
From Table 6.6.2a, $N = \sum f_i = 100$ and $\bar{x} = \frac{\sum f_i x_i}{N} = \frac{390}{100} = 3.9$
Therefore,
$$\text{M.D.} = \frac{1}{N} \sum_{i=1}^{n} f_i \left| x_i - \bar{x} \right| = \frac{1}{100} \times 94 = 0.94$$
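A minimal sketch of the mean-deviation formula for a frequency distribution, checked against Table 6.6.2a. The function name is illustrative. For ungrouped data, pass a frequency of 1 for every observation.

```python
def mean_deviation(xs, fs):
    """M.D. = (1/N) * sum(f * |x - mean|), deviations taken about the arithmetic mean."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / n_total

# Table 6.6.2a
md = mean_deviation([1, 2, 3, 4, 5], [5, 10, 15, 30, 40])
print(round(md, 2))  # 0.94
```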
6.6.3 Standard Deviation
Standard deviation is the most widely used measure of dispersion. It is defined as the positive square root of the average of the squares of the deviations of the observations from their mean and is denoted by σ. It is given as:
$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}$$
For a frequency distribution, the standard deviation is given as:
$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{n} f_i \left(x_i - \bar{x}\right)^2}$$
The value $\sigma^2$ is the variance of the data, that is, $\sigma^2 = \frac{1}{n} \sum \left(x_i - \bar{x}\right)^2$.
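For ungrouped data, the variance and standard deviation can be sketched directly from the 1/n definition above. The bank-leave data from Section 6.5.1 serve as illustrative input. Python's `statistics.pvariance` and `statistics.pstdev` use the same 1/n divisor, whereas `statistics.variance` and `statistics.stdev` divide by n - 1 (sample versions).

```python
import math

def variance(values):
    """Population variance: sigma^2 = (1/n) * sum((x - mean)^2), as defined above."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

leaves = [2, 4, 6, 1, 2, 3, 1, 2, 1, 5]
var = variance(leaves)
sd = math.sqrt(var)
print(round(var, 2), round(sd, 3))  # 2.81 1.676
```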
For example, for the given data, the standard deviation is calculated as:
Class Interval   f   Mid-value (x)   fx     x - x̄     (x - x̄)²   f(x - x̄)²
0 - 3            1   1.5             1.5    -3.375     11.391      11.391
3 - 6            5   4.5             22.5   -0.375     0.141       0.703
6 - 9            2   7.5             15.0   2.625      6.891       13.781
Total            8                   39.0                          25.875
Table 6.6.3a: Frequency Distribution
$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{39}{8} = 4.875$$
$$\sigma = \sqrt{\frac{1}{N} \sum f_i \left(x_i - \bar{x}\right)^2} = \sqrt{\frac{1}{8} \times 25.875} = 1.798$$
Thus, the standard deviation is equal to 1.798.
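A sketch of the grouped standard deviation. It assumes equal-width classes whose mid-values are the lower limit plus half the width (1.5, 4.5 and 7.5 for Table 6.6.3a), and the function name is hypothetical.

```python
import math

def grouped_sd(lowers, width, fs):
    """sigma = sqrt((1/N) * sum(f * (m - mean)^2)), using class mid-values m."""
    mids = [low + width / 2 for low in lowers]
    n_total = sum(fs)
    mean = sum(f * m for m, f in zip(mids, fs)) / n_total
    return math.sqrt(sum(f * (m - mean) ** 2 for m, f in zip(mids, fs)) / n_total)

# Table 6.6.3a: classes 0-3, 3-6, 6-9 with frequencies 1, 5, 2
print(round(grouped_sd([0, 3, 6], 3, [1, 5, 2]), 3))  # 1.798
```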
6.7 Measures of Skewness
The measure of skewness describes how asymmetric a distribution is. In a symmetric distribution, the mean, median, and mode lie at the same point, which divides the whole distribution into two equal parts. However, in an asymmetric distribution, the mean, median, and mode do not lie at the same point. Fig. 6.7a shows the shapes of the different curves:
[Fig. 6.7a: Symmetrical and Asymmetrical Distribution. Panels: Negatively skewed; Normal curve (Mean = Mode = Median); Positively skewed]
The first panel shows a negatively skewed curve, where mode > median > mean. The third panel shows a positively skewed curve, where mode < median < mean. In the second panel (normal curve), mode = median = mean.
Prof. Karl Pearson defined a measure of skewness, called Karl Pearson's coefficient of skewness, as:
$$\text{Karl Pearson's Coefficient of Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} \qquad (1)$$
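If mean >, = or < mode, then the coefficient in equation (1) is positive, zero or negative, and the distribution is positively skewed, symmetrical or negatively skewed, respectively.
However, if the data has two or more modes (so that the mode is not well defined), the coefficient is computed using the median instead:
$$\text{Karl Pearson's Coefficient of Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \qquad (2)$$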
If mean >, = or < median, then the coefficient in equation (2) is positive, zero or negative, and the distribution is positively skewed, symmetrical or negatively skewed, respectively.
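Both coefficients can be sketched for ungrouped data. This sketch assumes the population standard deviation (divisor n) defined in Section 6.6.3 and a single mode. The employee-age data from Section 6.5.3 are used only as illustrative input: there the mean (about 29.6) exceeds both the mode and the median (29), so both coefficients come out positive. The function name is hypothetical.

```python
import math
from collections import Counter

def pearson_skewness(values):
    """Return (coefficient (1), coefficient (2)) using the population s.d. (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    mode = Counter(values).most_common(1)[0][0]  # assumes a single mode; otherwise use (2)
    s = sorted(values)
    median = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return (mean - mode) / sd, 3 * (mean - median) / sd

ages = [25, 22, 35, 29, 42, 26, 36, 25, 29, 28, 29]
sk1, sk2 = pearson_skewness(ages)
print(round(sk1, 3), round(sk2, 3))
```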
6.8 Measures of Relationships
For data with many variables, different statistical measures can be used to calculate the relationships between the variables. For bivariate data, cross tabulation, Charles Spearman's rank correlation coefficient, Karl Pearson's correlation coefficient, or simple regression can be used to measure the relationship between the variables.
Cross Tabulation
In a cross table, the data is represented such that the variables can be compared with each other. The data is arranged into categories, and each category is further divided into two or more sub-categories. The row categories are then compared with the column categories, and the cells of the cross table are filled in according to the given data.
For example, Fig. 6.8a is a cross tabulation representation.
[Fig. 6.8a: Cross Tabulation Representation. (A) Cross-Tabulation of Question "Have you followed the news stories about AIG bonuses?" with columns Total Adults, Gender (Men, Women) and Age (18-29, 30-39, 40-49, 50-64, 65+), and rows Very closely, Somewhat closely, Not very closely, Not at all, Not sure. (B) Cross-Tabulation of Question "Is the bailout money going to those that created the crisis?" with the same columns and rows Yes, No, Not sure.]
Source: Zikmund, Babin, Carr, Griffin, Business Research Methods, 8th Edition.
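How a cross table is filled can be sketched by counting each (row category, column category) pair. The responses below are made-up illustrative data, not the survey figures in Fig. 6.8a, and the function name is hypothetical.

```python
from collections import Counter

def cross_tab(rows, cols):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(zip(rows, cols))
    row_cats = sorted(set(rows))
    col_cats = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_cats} for r in row_cats}

# Hypothetical survey responses (not the data in Fig. 6.8a)
gender = ["Men", "Women", "Women", "Men", "Women", "Men"]
answer = ["Yes", "Yes", "No", "Yes", "Not sure", "No"]
for category, row in cross_tab(answer, gender).items():
    print(category, row)
```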
Karl Pearson's Correlation Coefficient
It is the most popular statistical measure of the relationship between two variables. It measures only the degree of linear relationship between the variables and does not give any information about cause and effect.
Karl Pearson's correlation coefficient for two variables X and Y is given as:
$$r = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} \quad \text{or} \quad r = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum (x_i - \bar{X})^2 \sum (y_i - \bar{Y})^2}}$$
Where,
$\bar{X}$ = Mean of X variable
$\bar{Y}$ = Mean of Y variable
$\sigma_X$ = Standard deviation of X variable
$\sigma_Y$ = Standard deviation of Y variable
Cov(X, Y) = Covariance between X and Y variables
The value of r lies between -1 and 1. If r = -1, there is a perfect negative correlation between the variables; if r = 1, there is a perfect positive correlation; and if r = 0, there is no linear correlation.
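The second form of the formula translates directly into code. The advertising/sales numbers are hypothetical and were chosen to lie exactly on an increasing line, so r = 1; reversing Y gives r = -1. The function name is illustrative. The formula is undefined if either variable is constant, because the denominator is then zero.

```python
import math

def pearson_r(xs, ys):
    """r = sum((x - x_bar)(y - y_bar)) / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: advertising spend (X) and sales (Y)
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]              # exactly linear and increasing -> r = 1
print(pearson_r(spend, sales))
print(pearson_r(spend, sales[::-1]))  # reversed order -> r = -1
```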
Spearman's Correlation Coefficient
When the data is on an ordinal scale of measurement, Karl Pearson's correlation coefficient is not appropriate for determining the relationship between the variables. In such a case, Spearman's rank correlation coefficient is used: the values of each variable are assigned ranks, and the coefficient is calculated by the formula:
$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
Where,
$d_i$ = Difference between the ranks of the ith pair of observations
n = Total number of pairs of observations
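A sketch of Spearman's formula, assuming no tied values, since the simple formula above is exact only in that case. The judges' scores are hypothetical, and the names are illustrative. For the pairs below, the rank differences are -1, 1, -1, 1 and 0, so the sum of d² is 4 and r_s = 1 - 24/120 = 0.8.

```python
def ranks(values):
    """Rank 1 = smallest value; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

def spearman_rho(xs, ys):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the rank differences."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical scores given by two judges to five products
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(spearman_rho(judge_a, judge_b))  # 0.8
```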
Regression
Regression gives the linear relationship between two variables. Unlike correlation, it distinguishes a dependent variable (Y) from an independent variable (X), so it can be used to estimate Y from given values of X; however, it does not by itself establish cause and effect. In this technique, the equation representing the linear relationship between the variables is taken as:
$$Y = a + bX \qquad (1)$$
The values of a and b in equation (1) are obtained by the method of least squares, which gives the two normal equations:
$$\sum Y = na + b\sum X$$
$$\sum XY = a\sum X + b\sum X^2$$
Substituting the values of a and b so obtained into equation (1) gives the regression equation.
The regression line is graphically represented in Fig. 6.8b.
[Fig. 6.8b: Regression line]
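Solving the two normal equations simultaneously gives b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²) and a = (ΣY - bΣX) / n. The sketch below applies this to hypothetical points lying exactly on Y = 2 + 3X, so it recovers a = 2 and b = 3. The function name is illustrative.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for a and b in Y = a + bX.

    sum(Y)  = n*a + b*sum(X)
    sum(XY) = a*sum(X) + b*sum(X^2)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    denominator = n * sxx - sx ** 2
    if denominator == 0:
        raise ValueError("X values must not all be equal")
    b = (n * sxy - sx * sy) / denominator
    a = (sy - b * sx) / n
    return a, b

# Hypothetical data lying exactly on Y = 2 + 3X
a, b = fit_line([1, 2, 3, 4], [5, 8, 11, 14])
print(a, b)  # 2.0 3.0
```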
These techniques apply to only two variables. When more than two variables are to be compared, multiple correlation, partial correlation, multiple regression, etc. can be used accordingly.
6.9 Association of Attributes
To study the relationship between two attributes, association of attributes is used. For such association, Prof. Yule defined a coefficient of association, known as Yule's coefficient of association and denoted by $Q_{AB}$, where A and B are the two attributes to be compared. It is given as:
$$Q_{AB} = \frac{(AB)(ab) - (Ab)(aB)}{(AB)(ab) + (Ab)(aB)}$$
Where,
$Q_{AB}$ = Yule's coefficient of association between attributes A and B
(AB) = Frequency with which both A and B are present
(Ab) = Frequency with which A is present but B is absent
(aB) = Frequency with which A is absent but B is present
(ab) = Frequency with which both A and B are absent
These frequencies are shown in Table 6.9a, which is a 2 x 2 contingency table.
Attribute   A      a      Total
B           (AB)   (aB)   (B)
b           (Ab)   (ab)   (b)
Total       (A)    (a)    N
Table 6.9a: Frequency Table for Attributes A and B
After computing $Q_{AB}$: if $Q_{AB}$ = +1, there is a perfect positive association between the attributes; if $Q_{AB}$ = -1, there is a perfect negative association; and if $Q_{AB}$ = 0, there is no association between the attributes.
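Yule's coefficient needs only the four cell frequencies of the 2 x 2 table. The counts below are hypothetical. The second call shows that when no "A without B" cases exist, Q reaches +1.

```python
def yules_q(AB, Ab, aB, ab):
    """Q = ((AB)(ab) - (Ab)(aB)) / ((AB)(ab) + (Ab)(aB)) from a 2 x 2 table."""
    numerator = AB * ab - Ab * aB
    denominator = AB * ab + Ab * aB
    if denominator == 0:
        raise ValueError("Q is undefined when both cross-products are zero")
    return numerator / denominator

# Hypothetical 2 x 2 counts
print(yules_q(AB=30, Ab=10, aB=10, ab=30))  # (900 - 100) / (900 + 100) = 0.8
print(yules_q(AB=30, Ab=0, aB=10, ab=30))   # no "A without B" cases -> 1.0
```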

6.10 Time Series Analysis and Index Number
Time Series
Data recorded with respect to a sequence of time points are called time series data.
Example: The yearly or monthly sales of a departmental store, the monthly incentives given to the employees in the sales department of a company, etc.
Thus, time series data consist of a variable, denoted by $Y_t$, recorded at specified time points t. A time series is affected by four components:
Secular variation or Trend (T)
Cyclical variation (C)
Seasonal variation (S)
Irregular or Random variation (I)
Secular variation, or trend, is the general long-term movement that is seen when the data is observed over a long period of time. The effect of a trend is therefore fairly consistent throughout the period considered.
In cyclical variation, there is an oscillatory movement in the data. A business cycle is a good example of cyclical variation, as the cycle oscillates from prosperity to recession, then to depression and, finally, to recovery, as shown in Fig. 6.10a.
[Fig. 6.10a: Cyclical Variation in a Business Cycle. Four phases of business cycle; labels: Peak, Prosperity, Trough]

Seasonal variations are variations that recur in the data with the seasons. For example, sales of flight tickets rise during the holiday season and remain at normal levels during the rest of the year.
Lastly, irregular fluctuations are variations that occur in the data randomly. For example, if there is a natural calamity, such as a flood or an earthquake, or if there is a strike or a war, the resulting variation in the data is called irregular variation.
Index Number
An index number is a device which shows, by its variation, the change in a magnitude that is not capable of accurate measurement by itself or of direct valuation in practice (Wheldon, Business Statistics). Thus, index numbers mainly study changes in economic activity over a period of time. For example, an index number can measure the change in the prices of a commodity between two different situations (years).
Some of the most commonly used index numbers are constructed by Laspeyres' method, Paasche's method, and Fisher's ideal method. An index number is also called an economic barometer because it studies the change in different economic situations. It gives an approximate idea of the change rather than an exact measurement of it.
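This section names Laspeyres', Paasche's and Fisher's methods but does not give their formulas. As an illustration, here is a sketch of the usual textbook price-index versions, which are an assumption beyond this section. Laspeyres weights prices by base-year quantities, Paasche weights them by current-year quantities, and Fisher's ideal index is the geometric mean of the two; each is multiplied by 100. All prices and quantities are hypothetical.

```python
import math

def _value(prices, quantities):
    """Total value: sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(p0, p1, q0):
    """Base-year quantities q0 as weights: sum(p1*q0) / sum(p0*q0) * 100."""
    return _value(p1, q0) / _value(p0, q0) * 100

def paasche(p0, p1, q1):
    """Current-year quantities q1 as weights: sum(p1*q1) / sum(p0*q1) * 100."""
    return _value(p1, q1) / _value(p0, q1) * 100

def fisher(p0, p1, q0, q1):
    """Fisher's ideal index: geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

# Hypothetical prices (p) and quantities (q) of three commodities
p0, q0 = [10, 20, 5], [4, 2, 10]   # base year
p1, q1 = [12, 22, 6], [5, 3, 8]    # current year
print(round(laspeyres(p0, p1, q0), 2),
      round(paasche(p0, p1, q1), 2),
      round(fisher(p0, p1, q0, q1), 2))
```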

6.11 Chapter Summary
Tabulation is a useful way of summarizing data for comparison and computation.
Measures of central tendency represent the characteristics of data by a single value. The measures of central tendency include the mean, median, mode, harmonic mean, and geometric mean.
Measures of dispersion study the variability among the observations in a data series. The main measures of dispersion are range, mean deviation, and standard deviation.
The objective of discriminant analysis is to classify objects, on the basis of a set of independent variables, into two or more mutually exclusive and exhaustive categories.
When there are two or more independent variables, the analysis of the relationship is known as multiple correlation, and the equation describing such a relationship is known as a multiple regression equation.
If a variable $Y_t$ is studied at different points of time (t), the series so obtained is called a time series.
An index number measures the change in an economic variable between two situations.

Hypothesis Testing
07. Hypothesis Testing eBook
7.1 Introduction
Hypothesis testing is an important tool for drawing research inferences. Research is carried out to obtain new and advanced findings, and hypothesis testing enables the researcher to establish them. It helps to draw conclusions about a population on the basis of sample observations.
Hypothesis testing also helps in decision making in business and industry. For example, the manager of a garment company may want to compare the outputs of two of the company's factories in two different locations. To test this, one initially assumes that both factories have the same output. After applying statistical tools to test the hypothesis, one can conclude, on the basis of the resulting value, whether the hypothesis is true or not. Similarly, to study the average number of customers visiting a shopping centre, one can record the number of customers visiting the centre on a sample of 10 days, apply a suitable statistical technique, and draw a conclusion about the average number of customers visiting the shopping centre.
This chapter will help you to:
Define hypothesis
State the different types of hypothesis
Discuss some terminologies of hypothesis testing
Describe the procedure of testing of hypothesis
Discuss parametric and non-parametric tests for hypothesis testing
Discuss hypothesis testing for the mean of a single sample and for comparing the means of two samples
Discuss hypothesis testing for variance
Discuss hypothesis testing for simple, partial, and multiple correlation coefficients

7.2 Hypothesis
A hypothesis is a statement about population parameters. It is constructed in a manner that fully reflects the research problem. Such statements are based on certain assumptions, and the results obtained depend on these assumptions.
According to Robert B. Burns and Richard A. Burns, "A hypothesis is a hunch, an
educated guess, a proposition that is empirically testable." They also stated that,
"It possesses three essential steps:
1. The proposal of a hypothesis or tentative assumption to account for a
phenomenon or test the validity of some situation.
2. The deduction from the hypothesis that certain phenomena should be
observed in given circumstances.
3. The checking of this deduction by observation and testing."
Common kinds of hypotheses include relational hypotheses, which state a relationship between the variables under study; hypotheses about differences between groups; and hypotheses about how a particular group differs from some standard group or measure.
The characteristics of a hypothesis are:
A hypothesis should be simple, clearly understandable, and subject-specific, in order to give a consistent finding.
A hypothesis should be capable of undergoing statistical tests, so that accurate results can be obtained scientifically.
In the case of a relational hypothesis, the hypothesis should state the relationship between the variables.
A hypothesis should be limited in scope and must be specific.
A hypothesis should not conflict with known facts.
A hypothesis should be testable within a reasonable period of time; it should not be so complex that testing it requires an excessive amount of time.
If a hypothesis involves facts, these should be explained properly.

7.3 Types of Hypothesis
Hypotheses are of two types: null hypothesis and alternative hypothesis.
Null Hypothesis
The null hypothesis is a statement of no difference, that is, a hypothesis that states that
there is no difference between study variables.
According to Prof. Fisher, "A null hypothesis is the hypothesis which is tested for possible rejection under the assumption that it is true." It is denoted by $H_0$.
For example, if you want to test whether two samples from a population are similar or not, the null hypothesis can be stated as:
$H_0$: There is no difference between the two samples.
Also, if the average height of the students in a college is claimed to be 5.4 feet, where height is normally distributed with mean μ, then the null hypothesis is given as:
$H_0: \mu = 5.4$ feet
Alternative Hypothesis
An alternative hypothesis is a statement that contradicts the null hypothesis. Thus, a statement of difference is called an alternative hypothesis. It is denoted by $H_1$ or $H_A$.
For example, the alternative hypothesis for a null hypothesis of the type $H_0: \mu = 5.4$ feet is $H_1: \mu \neq 5.4$ feet, or $H_1: \mu < 5.4$ feet, or $H_1: \mu > 5.4$ feet.
Here, $H_1: \mu \neq 5.4$ feet is a two-tailed alternative hypothesis, while $H_1: \mu < 5.4$ feet and $H_1: \mu > 5.4$ feet are one-tailed alternative hypotheses (left-tailed and right-tailed, respectively).

7.4 Terminologies Used in Hypothesis Testing
Population
The population is the aggregate of all the items or individuals under study. For example, the collection of all individuals in a locality is called the population of the locality. All the computers in an office, taken collectively, are another example of a population. Usually, a population is denoted by S.
Each of the items or individuals in a population is called a population unit. Thus, if a population S has N units, then N is termed the population size.
A population can be finite or infinite, depending on its size. For example, the population of private sector employees in India is so large that it is often treated as infinite, while the population of employees at a particular company is finite.
Parameter
Parameters are the characteristics that define a population, such as the population mean, denoted by μ, and the population variance, denoted by σ².
Sample
A sample is a subset of a population. For example, if only the books on academics are selected from a population of 1000 books on fiction, academics, and short stories, they form a sample drawn from the population of 1000 books.
The size of the sample is the total number of items or individuals in the sample and is denoted by n. A sample is very useful when the population is very large and studying the whole population would involve more money, time, and effort. In such a case, it is convenient to take a sample that is representative of the given population.
Statistic
Just as a parameter describes a characteristic of a population, a statistic describes a characteristic of a sample. Statistics such as the mean and variance are represented by different symbols to distinguish them from the corresponding parameters: the sample mean is denoted by $\bar{x}$ and the sample variance by $s^2$.
Simple Hypothesis and Composite Hypothesis
A simple hypothesis is a hypothesis that completely specifies the population, while a
composite hypothesis partially specifies the population.

For example, if the weekly sales of an apparel brand for a year are normally distributed with mean μ and variance σ², then a simple hypothesis is given as:
$H_0: \mu = \mu_0$ and $\sigma^2 = \sigma_0^2$
And composite hypotheses are given as:
a) $H_0: \mu = \mu_0$ (with σ² unspecified)
b) $H_0: \mu = \mu_0, \ \sigma^2 > \sigma_0^2$
c) $H_1: \mu \neq \mu_0, \ \sigma^2 \neq \sigma_0^2$, etc.
Test Statistic
After the formulation of the hypothesis, the next step is to test it, that is, to accept or reject the null hypothesis. Since it is usually not possible to obtain observations from the whole population, a sample is selected and the hypothesis is tested on the basis of this sample: statistics of the sample are used in the computation, and a test statistic is a function of these statistics.
For example, $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ is a test statistic. Here, z is a function of the sample mean ($\bar{x}$), the assumed mean ($\mu_0$), the standard deviation ($\sigma$), and the sample size n.
Thus, the test statistic is calculated from the sample and its value is used to decide whether the null hypothesis is to be accepted or rejected. The choice of an appropriate test statistic depends on the hypothesis formulated and the population distribution.
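A minimal sketch of the z statistic above, assuming the population standard deviation σ is known. The height figures are hypothetical. With x̄ = 5.5 feet, μ0 = 5.4 feet, σ = 0.5 feet and n = 100, the statistic is 0.1 / 0.05 = 2.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)), with the population s.d. sigma known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical: H0: mu = 5.4 feet; a sample of 100 students has mean 5.5 feet; sigma = 0.5 feet
print(round(z_statistic(5.5, 5.4, 0.5, 100), 3))  # 2.0
```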
Critical Value
A critical value is the value with which the calculated test statistic is compared in order to decide whether to accept or reject the hypothesis. The critical value depends on the level of significance and on whether the test is one-sided or two-sided.
Type-I Error and Type-II Error
While testing a hypothesis, two types of errors can be committed, namely, type-I error and type-II error. A type-I error is committed if the null hypothesis is rejected when it is in fact true, and a type-II error is committed if the null hypothesis is accepted when it is in fact false.
P(type-I error) = α and P(type-II error) = β
These errors can be represented in a tabular form as shown in Table 7.4a.
Decision     H0 is true          H0 is false
Reject H0    Type-I error        Correct decision
Accept H0    Correct decision    Type-II error
Table 7.4a: Type-I and Type-II Error
Level of Significance
The level of significance is the probability of committing a type-I error that the researcher fixes before carrying out the test, that is, the risk of rejecting a true null hypothesis that the researcher is willing to accept. It is denoted by α, so it is given as:
P(type-I error) = α
If α = 0.05, then the level of significance is 5%, that is, the probability of rejecting a true $H_0$ is 5%, while the probability of accepting a true $H_0$ is 95% (100% - 5% = 95%).
p-value
The probability value is termed the p-value. It is the probability, computed assuming that the null hypothesis is true, of obtaining a value of the test statistic at least as extreme as the one actually observed. If the p-value is less than the significance level decided in advance, the result is significant and the null hypothesis is rejected. A smaller p-value indicates stronger evidence against the null hypothesis.
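For a z statistic, the p-value comes from the standard normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8+). The z value of 2.0 is hypothetical, and the function name is illustrative.

```python
from statistics import NormalDist

def z_test_p_value(z, tail="two"):
    """p-value of a z statistic under H0 (standard normal): 'two', 'left' or 'right' tail."""
    std_normal = NormalDist()  # mean 0, s.d. 1
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1 - std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'left' or 'right'")

p = z_test_p_value(2.0)          # hypothetical z from a two-sided test
print(round(p, 4), p < 0.05)     # 0.0455 True
```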
Power of the Test
The power of a hypothesis test is the probability of not committing a type-II error, that is, the probability of rejecting the null hypothesis when it is false. Thus, the power of a test is one minus the probability of accepting the null hypothesis when it is false:
Power of the test = 1 - P(type-II error) = 1 - β
Acceptance Region and Rejection Region
The sample space of all the values of the test statistic is divided into two regions: the acceptance region and the rejection region. The acceptance region is formed by the values of the test statistic for which the null hypothesis is accepted. If the calculated value of the test statistic lies in the acceptance region, the null hypothesis is accepted; whereas, if it lies in the rejection region, the null hypothesis is rejected. The rejection region is also known as the critical region. Fig. 7.4a shows the acceptance and rejection regions when the level of significance is α.

[Fig. 7.4a: Acceptance and Critical Region for Left Tailed Test. $H_1: \mu < \mu_0$; critical region of area α in the left tail, acceptance region to its right]
One-sided Test and Two-sided Test
To explain one-sided and two-sided tests, consider the following example.
Suppose the mean of a normally distributed population is μ and $\mu_0$ is a fixed value. The null and alternative hypotheses are given as: $H_0: \mu = \mu_0$ and $H_1: \mu < \mu_0$ (or $H_1: \mu > \mu_0$).
Here, the alternative hypothesis specifies a single direction, either $\mu < \mu_0$ or $\mu > \mu_0$. Therefore, the critical region is located in one tail of the probability distribution, as shown in Fig. 7.4a. This is called a one-tailed test.
In Fig. 7.4a, the critical region is located on the left side of the distribution, and the test is termed a left-tailed test; when the critical region is located on the right side of the distribution, the test is termed a right-tailed test. Fig. 7.4b shows a right-tailed test.
[Fig. 7.4b: Acceptance and Critical Region for Right Tailed Test. Acceptance region to the left; critical region of area α in the right tail]

For $H_1: \mu \neq \mu_0$, the critical region falls on both sides of the distribution, and the test is termed a two-sided test.
[Fig. 7.4d: Acceptance and Critical Region for Two-sided Test. Critical regions of area α/2 in each tail, acceptance region in the middle]
7.5 Procedure of Testing of Hypothesis
The procedure for testing a hypothesis is:
1. Formulation of the hypothesis
The first step is to formulate the null hypothesis and the alternative hypothesis in a manner that reflects the purpose of the study.
2. Decide the distribution
Once the sample is collected, the sampling distribution is obtained. Also, an
appropriate test statistic for the test is decided.
3. Select the level of significance
The next step is to select an appropriate level of significance (α). Usually, the 5% or 1% level of significance is used.
4. Computation of the value of the test statistic
The value of the test statistic is computed from the sample observations.
5. Obtain the critical value
The critical value is obtained, depending on the level of significance, according to
the test statistic.

6. Comparing the calculated value with the critical value
Compare the calculated value of the test statistic with the critical value. If the calculated value (in absolute value, for a two-tailed test) is less than the critical value, then the null hypothesis is accepted; otherwise, it is rejected. Equivalently, if the calculated probability (p-value) is equal to or smaller than α in the case of a one-tailed test (or α/2 for the probability in one tail, in the case of a two-tailed test), reject the null hypothesis; if the calculated probability is greater, accept the null hypothesis.
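Steps 4 to 6 can be sketched end to end for a two-tailed one-sample z-test with known σ. This is an illustrative assumption, since the text describes the procedure generically. The sketch computes the statistic, looks up the critical value from the standard normal distribution, and shows that the critical-value rule and the p-value rule reach the same decision. The data and function name are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 against H1: mu != mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # step 4: test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)      # step 5: about 1.96 for alpha = 0.05
    reject_by_critical = abs(z) > critical              # step 6: compare with critical value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_by_p = p_value <= alpha                      # step 6: equivalent p-value rule
    return z, critical, p_value, reject_by_critical, reject_by_p

# Hypothetical: H0: mu = 5.4 feet, sample of 100 with mean 5.5 feet, sigma = 0.5 feet
z, crit, p, rej1, rej2 = one_sample_z_test(5.5, 5.4, 0.5, 100)
print(round(z, 2), round(crit, 2), round(p, 4), rej1, rej2)  # 2.0 1.96 0.0455 True True
```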
7.6 Parametric and Non-parametric Testing
For hypothesis testing, the conclusion of accepting or rejecting a null hypothesis is based on the value of the test statistic. There are many statistical tests used for hypothesis testing. These tests are of two types: parametric and non-parametric tests.
A parametric test involves some assumptions about the population considered, such as the form of its distribution. A non-parametric test does not involve such assumptions. Thus, when information about the population is not available, a non-parametric test is appropriate.
Some parametric tests are the z-test, t-test, χ² test, F-test, etc. Some non-parametric tests are the sign test, run test, Wilcoxon matched-pairs test, Kruskal-Wallis test, etc.
Parametric tests for hypothesis testing that are based on the assumption of normal distribution are:
z-test
It is based on the assumption of normality and is applicable for testing the significance of several measures, like the mean, median, mode, coefficient of correlation, etc. It is used to compare the mean of a sample with a hypothetical population mean, and to test the significance of the difference between the means of two samples, when the samples are large or the population variance is known. It is generally used for samples larger than 30.
t-test
The statistic used for the t-test was introduced by W. S. Gosset in 1908 and is termed Student's t statistic. Like the z-test, the t-test is used to test the significance of a sample mean or of the difference between two sample means, but it applies when the variance of the population from which the sample is drawn is not known. For paired samples, the paired t-test is used to test the significance of the mean difference between the paired observations.
Chi-square (χ²) test
This test is used to compare a sample variance with a theoretical population
variance.
F-test
This test is used to compare the variances of two independent samples, that is, to test the homogeneity of variance of two normal populations. It is also used in ANOVA to compare more than two samples at a time.
Some of the non-parametric tests for hypothesis testing are:
Sign test
It is the simplest of all the non-parametric tests. In this test, each observation is replaced by a '+' sign if it lies above a hypothetical value and by a '-' sign if it lies below it. Hence, it is termed the sign test.
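Here is a minimal Python sketch of the sign test. The function `sign_test` is a hypothetical helper, not a library API. It turns the observations into signs relative to a hypothetical value and computes an exact two-sided p-value from the binomial distribution with p = 0.5. It follows the common convention of dropping observations equal to the hypothetical value. The data are made up for illustration.

```python
from math import comb

def sign_test(observations, hypothetical_value):
    """Two-sided exact sign test; returns (plus_count, minus_count, p_value)."""
    # Replace each observation by '+' (above) or '-' (below); ties are dropped.
    plus = sum(1 for x in observations if x > hypothetical_value)
    minus = sum(1 for x in observations if x < hypothetical_value)
    n = plus + minus
    # Under H0, '+' and '-' are equally likely, so the number of '+' is Binomial(n, 0.5).
    k = min(plus, minus)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * one_tail)
    return plus, minus, p_value

# Hypothetical data: does the median differ from 10?
data = [12, 15, 9, 18, 14, 11, 16, 17, 13, 19]
print(sign_test(data, 10))
# prints (9, 1, 0.021484375)
```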
Run test
The word 'run' here denotes a sequence of identical symbols that is preceded and followed by a different symbol or by no symbol at all. A run test is used to verify whether the observations in a given data set occur in a random order.
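The Python sketch below counts the runs in a sequence made of two kinds of symbol. It then converts the count into a z statistic, assuming the usual large-sample normal approximation for the number of runs. The helper name and the coin-toss sequence are hypothetical. A z value close to 0 is consistent with randomness. The approximation is unreliable for very short sequences.

```python
from math import sqrt

def runs_test(sequence):
    """Large-sample runs test for randomness; returns (number_of_runs, z)."""
    kinds = sorted(set(sequence))
    if len(kinds) != 2:
        raise ValueError("the sequence must contain exactly two kinds of symbol")
    # A new run starts wherever a symbol differs from the one before it.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    n1 = sum(1 for s in sequence if s == kinds[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # Mean and variance of the number of runs under the hypothesis of randomness.
    mean_runs = 2 * n1 * n2 / n + 1
    var_runs = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean_runs) / sqrt(var_runs)
    return runs, z

# Hypothetical sequence of coin tosses.
print(runs_test("HHTTHTHHTT"))
# prints (6, 0.0)
```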
Wilcoxon matched-pairs test
When you have data for two samples that are paired, the Wilcoxon matched-pairs test is applicable. This test is used to make inferences about the difference between two populations.
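The Wilcoxon matched-pairs statistic is obtained in four steps, sketched below with a hypothetical function and data:

1. Take the difference for each pair.
2. Drop zero differences.
3. Rank the absolute differences, giving tied values the average rank.
4. Sum the ranks of the positive differences and of the negative differences separately.

The smaller sum, T, would then be compared with a Wilcoxon table of critical values, which is not reproduced here.

```python
def wilcoxon_matched_pairs(sample_a, sample_b):
    """Return (T_plus, T_minus, T) for paired observations."""
    # Difference for each pair; pairs with zero difference are dropped.
    diffs = [b - a for a, b in zip(sample_a, sample_b) if b != a]
    # Rank the absolute differences, giving tied values the average rank.
    abs_sorted = sorted(abs(d) for d in diffs)
    rank_of = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j + 1 < len(abs_sorted) and abs_sorted[j + 1] == abs_sorted[i]:
            j += 1
        rank_of[abs_sorted[i]] = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        i = j + 1
    # Sum the ranks separately for positive and negative differences.
    t_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    t_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus)

# Hypothetical scores of six subjects before and after a training programme.
before = [10, 12, 9, 14, 11, 13]
after = [12, 15, 8, 18, 11, 16]
print(wilcoxon_matched_pairs(before, after))
# prints (14.0, 1.0, 1.0)
```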
Kruskal-Wallis test
This test is used to compare more than two populations for continuous data. In this test, the observations of all the groups under consideration are ranked together, and then one-way ANOVA is applied, with the ranks used as the values of the observations for each group.
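The ranking step of the Kruskal-Wallis test can be sketched in Python as follows. Rather than running a full ANOVA on the ranks, the sketch computes the standard H statistic from the group rank sums. It applies no correction for ties. The function names and data are hypothetical. For reasonably large groups, H is usually compared with the chi-square table value for (k - 1) degrees of freedom, where k is the number of groups.

```python
def average_ranks(values):
    """Rank values 1..N in ascending order; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """H = 12/(N(N+1)) * sum(R_j^2 / n_j) - 3(N+1), without a tie correction."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)          # rank all observations together
    n_total = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])   # R_j for this group
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical groups with clearly separated values.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```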

7.7 Testing of Hypothesis for Mean
The test statistics for testing the mean of a single sample and the difference between two means are as follows.
Testing for mean of single sample
Case 01:
When the population is infinite and normally distributed and the population variance, σ², is known. The sample size is denoted by n.
Here, the null hypothesis is $H_0: \mu = \mu_0$
Where,
$\mu_0$ = A hypothetical value
Then, for a one-sided or two-sided alternative hypothesis, the test statistic applied is given as:
$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
And z follows the standard normal distribution with mean 0 and variance 1.
Case 02:
When the population is finite (of size N) and normally distributed and the population variance, σ², is known, with the null hypothesis $H_0: \mu = \mu_0$ and a one-sided or two-sided alternative hypothesis.
Then, the test statistic z is given as:
$$z = \frac{\bar{X} - \mu_0}{(\sigma/\sqrt{n})\sqrt{(N-n)/(N-1)}}$$
Case 03:
When the population is infinite and normally distributed but the population variance (σ²) is unknown.
Since the population variance is not known, the sample standard deviation $\sigma_s$ is used as an estimate of the population standard deviation σ:
$$\sigma_s = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n-1}}$$
And the test statistic used to test the hypothesis is:
$$t = \frac{\bar{X} - \mu_0}{\sigma_s/\sqrt{n}}$$
with (n - 1) degrees of freedom (df).

In case the population is finite, the test statistic is given as:
$$t = \frac{\bar{X} - \mu_0}{(\sigma_s/\sqrt{n})\sqrt{(N-n)/(N-1)}}$$
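Cases 01 to 03 share one form: (X̄ - μ0) divided by a standard error. A single hypothetical helper can therefore compute all of them. The minimal Python sketch below takes either the known σ (giving a z statistic) or the sample standard deviation from `statistics.stdev` (divisor n - 1, giving a t statistic). It applies the finite population correction only when a population size N is passed. The data and N = 50 are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_statistic(sample_mean, mu0, sd, n, population_size=None):
    """Return (x_bar - mu0) / SE, where SE = sd / sqrt(n).

    Pass the known population sigma to obtain z (Cases 01 and 02), or the
    sample standard deviation (divisor n - 1) to obtain t with n - 1 df
    (Case 03). If population_size N is given, SE is multiplied by the finite
    population correction sqrt((N - n) / (N - 1)).
    """
    standard_error = sd / sqrt(n)
    if population_size is not None:
        N = population_size
        standard_error *= sqrt((N - n) / (N - 1))   # finite population correction
    return (sample_mean - mu0) / standard_error

# Hypothetical small sample, testing H0: mu = 10 with unknown population variance.
xs = [10, 12, 9, 11, 13, 8, 12, 11]
t_infinite = one_sample_statistic(mean(xs), 10, stdev(xs), len(xs))
t_finite = one_sample_statistic(mean(xs), 10, stdev(xs), len(xs), population_size=50)
print(f"t = {t_infinite:.3f} (infinite population), {t_finite:.3f} (N = 50), df = {len(xs) - 1}")
```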
Example 01:
A sample of 400 male students is found to have a mean height of 67.47 inches. Can it be reasonably regarded as a sample from a large population with mean height 67.39 inches and standard deviation 1.30 inches? Test whether the sample is drawn from the given population at the 5% level of significance.
Solution 01:
Taking the null hypothesis that the mean height of the population is equal to 67.39 inches, we can write:
$H_0: \mu_{H_0} = 67.39''$
$H_a: \mu_{H_0} \neq 67.39''$
The given information is $\bar{X} = 67.47''$, $\sigma_p = 1.30''$, $n = 400$. Assuming the population to be normal, we can work out the test statistic z as follows:
$$z = \frac{\bar{X} - \mu_{H_0}}{\sigma_p/\sqrt{n}} = \frac{67.47 - 67.39}{1.30/\sqrt{400}} = \frac{0.08}{0.065} = 1.231$$
As $H_a$ is two-sided in the given question, we apply a two-tailed test to determine the rejection region at the 5% level of significance. Using the normal curve area table, it is:
$$R: |z| > 1.96$$
The observed value of z is 1.231, which lies in the acceptance region because 1.231 < 1.96; thus, $H_0$ is accepted. We may conclude that the given sample (with mean height 67.47") can be regarded as having been taken from a population with mean height 67.39" and standard deviation 1.30" at the 5% level of significance.
Source: Kothari, C.R., Research Methodology: Methods and Techniques, New Age International Publishers, New Delhi, 2nd Edition.
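The arithmetic of Solution 01 can be checked with a few lines of Python. Here the critical value comes from the standard library's `NormalDist` instead of a printed normal table.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_h0 = 67.47, 67.39      # sample mean and hypothesised mean (inches)
sigma_p, n = 1.30, 400           # population standard deviation and sample size

z = (x_bar - mu_h0) / (sigma_p / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)   # two-tailed test at the 5% level
decision = "reject H0" if abs(z) > z_critical else "accept H0"
print(f"z = {z:.3f}, critical value = {z_critical:.2f}: {decision}")
# prints: z = 1.231, critical value = 1.96: accept H0
```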

Testing for difference between means
To test the null hypothesis in different cases, the test statistics are:
Case 04:
When the population variances ($\sigma_1^2$, $\sigma_2^2$) are known and the samples are large. Then, to test the hypothesis:
$H_0: \mu_1 = \mu_2$
Where,
$\mu_1, \mu_2$ = Population means of the two separate populations from which the samples are drawn.
The test statistic z is given as:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Where,
$\bar{X}_1$ and $\bar{X}_2$ = Means of the samples of size $n_1$ and $n_2$
If the population variances $\sigma_1^2$, $\sigma_2^2$ are not known, the sample variances are used in their place:
$$\sigma_{s_1}^2 = \frac{\sum (x_{1i} - \bar{X}_1)^2}{n_1 - 1}, \qquad \sigma_{s_2}^2 = \frac{\sum (x_{2i} - \bar{X}_2)^2}{n_2 - 1}$$
Case 05:
If large samples are drawn from the same population with known variance σ², then the test statistic z is given as:
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
However, in case the population variance is not known, the combined sample standard deviation ($\sigma_{12}$) is used, which is given as:
$$\sigma_{12} = \sqrt{\frac{n_1(\sigma_{s_1}^2 + d_1^2) + n_2(\sigma_{s_2}^2 + d_2^2)}{n_1 + n_2}}$$
Where,
$d_1 = \bar{X}_1 - \bar{X}_{12}$
$d_2 = \bar{X}_2 - \bar{X}_{12}$
$$\bar{X}_{12} = \frac{n_1\bar{X}_1 + n_2\bar{X}_2}{n_1 + n_2}$$

Case 06:
For small samples, if the population variances are unknown but assumed to be equal, the test statistic is given as:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum (x_{1i} - \bar{X}_1)^2 + \sum (x_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2} \times \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
with $(n_1 + n_2 - 2)$ df.
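The statistics for Cases 04 to 06 translate directly into code. The Python functions below are a hypothetical sketch, and their names do not come from any library. `pooled_t` computes the Case 06 statistic from raw small-sample data and returns it together with its degrees of freedom. The sample values in the usage line are made up.

```python
from math import sqrt

def z_known_variances(m1, m2, var1, var2, n1, n2):
    """Case 04: z = (X1bar - X2bar) / sqrt(var1/n1 + var2/n2)."""
    return (m1 - m2) / sqrt(var1 / n1 + var2 / n2)

def z_common_variance(m1, m2, var, n1, n2):
    """Case 05: both samples from one population with known variance var."""
    return (m1 - m2) / sqrt(var * (1 / n1 + 1 / n2))

def pooled_t(x1, x2):
    """Case 06: small samples, equal but unknown variances; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squared deviations, sample 1
    ss2 = sum((x - m2) ** 2 for x in x2)   # sum of squared deviations, sample 2
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    t = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical small samples.
print(pooled_t([14, 16, 15, 18, 17], [12, 13, 15, 11, 14]))
```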
Example 02:
The mean scores in a test (total marks 100) of two samples of 80 and 100 students are 61 and 55, respectively. The standard deviations of the two samples are 2 and 1, respectively. Test whether the two samples are drawn from the same population with standard deviation 1.4. Use a level of significance of 0.05.
Solution 02:
The null and the alternative hypotheses are stated as:
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \neq \mu_2$
From the given data, $\bar{X}_1 = 61$ and $\bar{X}_2 = 55$, with samples of size $n_1 = 80$ and $n_2 = 100$, and $\sigma = 1.4$.
Since both samples are large and the population standard deviation is known, the test statistic to be used is the z statistic (Case 05):
$$z = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{61 - 55}{\sqrt{(1.4)^2\left(\dfrac{1}{80} + \dfrac{1}{100}\right)}} = \frac{6}{\sqrt{1.96 \times 0.0225}} = \frac{6}{0.21} = 28.57$$
Using the normal curve area table, the critical region at the 5% level of significance is |z| > 1.96. The calculated value of z falls in the rejection region. The difference between the two sample means is therefore statistically significant at the 5% level of significance, and the null hypothesis is rejected. We may conclude that the two samples are not drawn from the same population.
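A quick Python check of Solution 02 is given below. It uses the figures stated in the problem (means 61 and 55, sample sizes 80 and 100, σ = 1.4) and the 5% two-tailed critical value 1.96.

```python
from math import sqrt

m1, m2 = 61, 55      # sample means given in the problem
n1, n2 = 80, 100     # sample sizes
sigma = 1.4          # common population standard deviation

z = (m1 - m2) / sqrt(sigma ** 2 * (1 / n1 + 1 / n2))
reject = abs(z) > 1.96   # two-tailed critical value at the 5% level
print(f"z = {z:.2f}; reject H0 at 5% level: {reject}")
# prints: z = 28.57; reject H0 at 5% level: True
```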
7.8 Testing of Hypothesis for Variance
When a hypothesis about variance is tested, there can be two cases:
Variance of a single sample
Equality of variances of two normal populations
Variance of a single sample
When a sample with variance $\sigma_s^2$ is drawn from a population with variance $\sigma_p^2$, the null hypothesis is given as: $H_0: \sigma_s^2 = \sigma_p^2$
The test statistic is:
$$\chi^2 = \frac{\sigma_s^2 (n - 1)}{\sigma_p^2}$$
which follows the chi-square distribution with (n - 1) df.
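The chi-square statistic for a single variance can be sketched in Python as below. The helper is hypothetical. The population variance of 20 is an assumed value chosen only to exercise the formula. The sample consists of the twelve values of Sample 1 used in Example 03.

```python
from statistics import variance

def chi_square_single_variance(sample, population_variance):
    """chi-square = s^2 (n - 1) / sigma_p^2, with n - 1 degrees of freedom."""
    n = len(sample)
    sample_var = variance(sample)             # sample variance, divisor n - 1
    return sample_var * (n - 1) / population_variance, n - 1

# Assumed population variance of 20 (hypothetical).
chi2, df = chi_square_single_variance([4, 6, 3, 8, 10, 11, 6, 7, 2, 12, 17, 16], 20)
print(f"chi-square = {chi2:.2f} with {df} df")
```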
Equality of variances of two normal populations
When two populations are to be compared for equality of variances, the null hypothesis is given as: $H_0: \sigma_{p_1}^2 = \sigma_{p_2}^2$
And the test statistic is given as:
$$F = \frac{\sigma_{s_1}^2}{\sigma_{s_2}^2}$$
Where (the larger sample variance is conventionally placed in the numerator):
$$\sigma_{s_1}^2 = \frac{\sum (x_{1i} - \bar{X}_1)^2}{n_1 - 1}, \qquad \sigma_{s_2}^2 = \frac{\sum (x_{2i} - \bar{X}_2)^2}{n_2 - 1}$$
If the calculated value of F is greater than the table value of F at a certain level of significance for $(n_1 - 1)$ and $(n_2 - 1)$ degrees of freedom, the F-ratio is regarded as significant.

Example 03:
Two samples are drawn from two normally distributed populations, as given below. Test whether the two populations have the same variance at the 5% level of significance.
Sample 1: 4, 6, 3, 8, 10, 11, 6, 7, 2, 12, 17, 16
Sample 2: 12, 14, 18, 19, 15, 10, 11, 16, 21, 20, 18, 9, 13, 17
Table 7.5a: Distribution of the Two Samples
Solution 03:
Here, the null hypothesis is given as: $H_0: \sigma_{p_1}^2 = \sigma_{p_2}^2$
Since the population variances are not known, we use the sample variances $\sigma_{s_1}^2$ and $\sigma_{s_2}^2$. The test statistic to be used is the F statistic, computed as follows:

Sample 1                  Sample 2
x1i    (x1i - X̄1)²       x2i    (x2i - X̄2)²
4      20.25             12     10.3298
6       6.25             14      1.4738
3      30.25             18      7.7618
8       0.25             19     14.3338
10      2.25             15      0.0458
11      6.25             10     27.1858
6       6.25             11     17.7578
7       2.25             16      0.6178
2      42.25             21     33.4778
12     12.25             20     22.9058
17     72.25             18      7.7618
16     56.25              9     38.6138
                         13      4.9018
                         17      3.1898
Totals: Σx1i = 102, Σ(x1i - X̄1)² = 257.00, Σx2i = 213, Σ(x2i - X̄2)² = 190.3571
Table 7.5b: Distribution of the Two Samples

The sample means and variances for the two samples are:
$$\bar{X}_1 = \frac{\sum x_{1i}}{n_1} = \frac{102}{12} = 8.5, \qquad \bar{X}_2 = \frac{\sum x_{2i}}{n_2} = \frac{213}{14} = 15.214$$
$$\sigma_{s_1}^2 = \frac{\sum (x_{1i} - \bar{X}_1)^2}{n_1 - 1} = \frac{257}{12 - 1} = 23.364$$
$$\sigma_{s_2}^2 = \frac{\sum (x_{2i} - \bar{X}_2)^2}{n_2 - 1} = \frac{190.357}{14 - 1} = 14.643$$
The value of the test statistic is:
$$F = \frac{\sigma_{s_1}^2}{\sigma_{s_2}^2} = \frac{23.364}{14.643} = 1.596$$
The tabulated value of the F statistic at the 5% level of significance for (11, 13) df is approximately 2.63. Since the calculated value (1.596) is less than the tabulated value, we accept the null hypothesis at the 5% level of significance. Hence, we can conclude that the two samples have been drawn from two populations with the same variance.
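The sample variances and F ratio of Solution 03 can be recomputed from the raw data with Python's `statistics.variance`, which divides by n - 1 as in the formulas above. The critical value still has to be read from an F table.

```python
from statistics import variance

sample1 = [4, 6, 3, 8, 10, 11, 6, 7, 2, 12, 17, 16]
sample2 = [12, 14, 18, 19, 15, 10, 11, 16, 21, 20, 18, 9, 13, 17]

var1 = variance(sample1)   # divisor n1 - 1
var2 = variance(sample2)   # divisor n2 - 1
F = var1 / var2            # larger variance in the numerator
df = (len(sample1) - 1, len(sample2) - 1)
print(f"var1 = {var1:.3f}, var2 = {var2:.3f}, F = {F:.3f}, df = {df}")
# prints: var1 = 23.364, var2 = 14.643, F = 1.596, df = (11, 13)
```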
7.9 Testing of Hypothesis for Correlation Coefficients
Suppose a sample of n pairs of observations (x, y) is drawn from a normal population, r is the sample correlation coefficient between X and Y, and ρ is the population correlation coefficient. The null hypothesis is: $H_0: \rho = 0$
Testing of significance of simple correlation coefficient
In the case of the simple correlation coefficient, the test statistic is:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
with (n - 2) df.
This calculated value is then compared with its tabulated value for a specific level of significance. If the calculated value is less than the tabulated value, the null hypothesis is accepted; otherwise, it is rejected.

Testing of significance of partial correlation coefficient
In the case of the partial correlation coefficient ($r_p$), the test statistic is:
$$t = \frac{r_p\sqrt{n-k}}{\sqrt{1-r_p^2}}$$
Where,
n = Number of paired observations
k = Number of variables
If the tabulated value of t for (n - k) df is greater than the calculated value, we accept, at the specified level of significance, the null hypothesis that there is no partial correlation.
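The simple and partial correlation tests use the same form of t statistic and differ only in their degrees of freedom. One hypothetical Python helper therefore covers both, and k = 2 reproduces the n - 2 df of the simple case. The r, n, and k values below are made up.

```python
from math import sqrt

def correlation_t(r, n, k=2):
    """t = r * sqrt(n - k) / sqrt(1 - r**2), with n - k degrees of freedom.

    k = 2 gives the simple correlation test (n - 2 df); for a partial
    correlation coefficient, pass the number of variables k.
    """
    return r * sqrt(n - k) / sqrt(1 - r ** 2), n - k

# Hypothetical values.
print(correlation_t(0.6, 27))          # simple correlation, 25 df
print(correlation_t(0.45, 30, k=3))    # partial correlation among 3 variables, 27 df
```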
Testing of significance of multiple correlation coefficient
If the multiple correlation coefficient is denoted by R, the test statistic applicable here is the F statistic, given as:
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$
Where,
k = Number of variables involved
If the tabulated value of F for (k - 1, n - k) df at the α% level of significance is less than the calculated value of F, the null hypothesis is rejected at the α% level of significance.
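The F test for a multiple correlation coefficient can be sketched in the same way. The helper name and the values of R, n, and k are hypothetical.

```python
def multiple_correlation_f(R, n, k):
    """F = [R^2 / (k - 1)] / [(1 - R^2) / (n - k)], with (k - 1, n - k) df."""
    r_sq = R ** 2
    f_stat = (r_sq / (k - 1)) / ((1 - r_sq) / (n - k))
    return f_stat, (k - 1, n - k)

# Hypothetical: R = 0.8 from 23 observations on k = 3 variables.
f_stat, df = multiple_correlation_f(0.8, 23, 3)
print(f"F = {f_stat:.2f} with df = {df}")
```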
7.10 Limitations of Testing of Hypothesis
Hypothesis testing is only a technique to help in decision-making.
It only indicates whether the null hypothesis should be accepted or rejected; it does not explain why the hypothesis is accepted or rejected. Thus, it fails to give the cause of the acceptance or rejection.
The result obtained from the computation of a test statistic is compared with critical values, which are based on probabilities. Hence, the conclusion is a probabilistic statement, not a certainty.
Significance tests cannot be considered exact measures of the truth of the formulated hypothesis.

Hypothesis is usually considered an essential tool in research. Its main function is to suggest new experiments and observations.
A statement made about a population parameter is known as a hypothesis.
A hypothesis stating that there is no difference between two situations, groups, outcomes, or the prevalence of a condition or phenomenon is called a null hypothesis.
A hypothesis that is the opposite of the null hypothesis is called an alternative hypothesis. It is also known as a hypothesis of difference.
Parametric test procedures assume that the data have come from a particular type of probability distribution.
Non-parametric tests are often used in place of their parametric counterparts when certain assumptions about the underlying population are questionable.

8. Analysis of Variance (ANOVA)
8.1 Introduction
As discussed in the previous chapter, the t-test is used to compare the means of two samples to see if there is a significant difference between them. However, if an experiment involves more than two sets of data, comparing every pair of samples with separate t-tests would be time consuming. In agricultural applications, for instance, you must test more than two samples to study the influence of various factors, such as variation in seed quality, the effect of fertilizers on different types of seeds, etc. In such situations, analysis of variance is applicable.
Analysis of variance, commonly known as ANOVA, is one of the main statistical techniques used to test differences between two or more means. It is called 'Analysis of Variance' rather than 'Analysis of Means' because inferences about means are made by analyzing variance. With ANOVA, you can analyze data from several independent variables simultaneously.
Analysis of variance for experiments with only one factor is called 'one-way ANOVA', and for experiments with two factors it is called 'two-way ANOVA'. In this method, the effect of a factor is tested by calculating an F-ratio. A separate F-ratio is computed for each factor in the experiment, and each is a test of that factor's main effect. The ANOVA method does not depend on the number of levels of each factor. ANOVA is available for both parametric (score) data and non-parametric (ranking/ordering) data.
State the meaning of Analysis of Variance (ANOVA)
Describe the variability measure by one-way ANOVA
Describe the method of one-way ANOVA technique
Describe the method of two-way ANOVA technique
Describe the method of Analysis of Covariance (ANOCOVA) technique

8.2 Analysis of Variance (ANOVA)
According to Prof. R. A. Fisher, "Analysis of variance (ANOVA) is the separation of variance ascribable to one group of causes from the variance ascribable to another group." By this technique, the total variation in the sample data is expressed as the sum of its non-negative components. Each of these components is a measure of the variation due to some specific independent source, factor, or cause.
Analysis of variance (ANOVA) involves investigating the effects of one treatment (independent) variable on an interval-scaled dependent variable. Suppose you need to check whether the differences in means among several groups are statistically significant. This is impractical with the z-test or t-test, so the ANOVA hypothesis-testing technique is used instead.
Using the ANOVA technique, you can decompose the total variability found within a data set into two components: random factors and systematic factors. The random factors do not have any statistical influence on the given data set, while the systematic factors do. The ANOVA test is used to determine the impact of independent variables on the dependent variable in a regression analysis.
Examples:
Suppose you have data on students' performance in non-assessed assignments as well as their final grades, and you want to find out whether performance in the assignments is related to the final grade obtained. Using ANOVA, you can break up the group according to the grade and see if performance differs across these grades.
Consider that a manager wants to find whether location has an effect on the profit of an apparel retail business. The alternatives for location are a stand-alone shop, a shop in a shopping centre, and an online delivery system. Here, location is the only independent variable (IV) and profit/loss is the dependent variable (DV). In such a case, the t-test would not be appropriate; one-way ANOVA would be the choice for analysis.

Assumptions in ANOVA
Ordinarily, the categories of the independent variable are assumed to be fixed. This type of model is known as the fixed effect model.
The error terms, or the variation within samples, are normally distributed with a constant variance and zero mean. The error is not related to any of the levels of X.
The error terms are uncorrelated. If the error terms are correlated (that is, the observations are not independent), the F-ratio can be seriously distorted.
ANOVA must have a dependent variable (DV) that is metric and one or more independent variables (IVs) that are all categorical. Measurable variables, such as height, income, and age, are called metric variables.
According to Gudmund R. Iversen, Mary Gergen, and Mary M. Gergen, "a metric
variable is not metric in the sense that the metric system but in the sense that
its values can be numerically measured."
In the ANOVA technique, factors are the categorical independent variables. A treatment is a particular combination of factor levels or categories. In the previous example of the retail business, profit/loss is the DV, location is the factor, and the three locations are its treatments.
Relationship Among Techniques
One-way ANOVA involves only one categorical variable, or a single factor, which has various levels. If more than two factors are involved, the analysis is termed n-way ANOVA.
Professor Snedecor and many others contributed to the development of ANOVA
technique. ANOVA is, essentially, a procedure for testing the difference among different
groups of data for homogeneity. According to Prof. Snedecor, "The essence of
ANOVA is that the total amount of variation in a set of data is broken down into
two types, that amount which can be attributed to chance and that amount
which can be attributed to specified causes."
There may be variation within sample items and between samples. Using ANOVA, you can split the variance for analytical purposes. It is therefore a method of analyzing the variance to which a response is subject into its various components, corresponding to various sources of variation. Using the ANOVA technique, you can determine whether varieties of seeds, fertilizers, or soils differ significantly. Similarly, differences in various types of feed prepared for a particular class of animal, or in various types of drugs manufactured for curing a specific disease, may be studied and judged to be significant or not through the application of the ANOVA technique.
8.3 Why Analyze Variance?
Consider two different experiments, each with three groups, whose distribution patterns are described below.
[Figure: distributions of the three groups in Experiment 1 and Experiment 2]
Experiment 1: The within-group variability for each group is relatively small. It is easy to see that there is a difference between the means of the three groups.
Experiment 2: The three groups have approximately the same means as in Experiment 1, but the variability within each group is much larger. It is not easy to see the difference between the means of the three groups.
Hence, to differentiate the groups in experiment 2, the variability between the groups
must be greater than the variability within the groups. If the variability within the groups is large compared to the variability between the groups, any difference between the groups is difficult to detect. Variability between the groups and variability within the groups are compared to determine whether or not the group means are significantly different.
Using the ANOVA technique, one can investigate any number of factors that influence the dependent variable. One may also investigate the differences amongst various categories within each of these factors, which may have a large number of possible values.
If only one factor is taken and the differences amongst its various categories are investigated, the technique is called one-way ANOVA. If two factors are investigated at the same time, it is called two-way ANOVA. In a two-way or higher ANOVA, any interaction between two independent variables (that is, their inter-relation) in affecting a dependent variable can also be studied for better decisions.
8.4 Variability Measure by One-way ANOVA
Differences among the population means are tested by analyzing the amount of variability within the groups relative to the amount of variation between the groups. In ANOVA, the variability is decomposed into two components, as shown in Fig. 8.4a.
[Figure: total variability in DV, decomposed into variability within the groups and variability between the groups]
Fig. 8.4a: Decomposition of Total Variability
Variability within the groups
In terms of variation within the given population, consider the value of the i-th observation in the j-th group, $Y_{ij}$ (where i and j are positive integers). It is assumed that $Y_{ij}$ differs from the mean of this population only because of random effects, that is, because of influences on $Y_{ij}$ that are unexplainable.
This within-sample variance provides an estimate of the population variance that reflects unexplained variation, that is, error in the observations.

Variability between the groups
In terms of variation (differences) between populations, it is assumed that the difference between the mean of the j-th group and the grand mean is attributable to what is called a 'specific factor', technically described as the treatment effect.
Two estimates are compared with the F-test for the given degrees of freedom and level
of significance. The F statistic in the ANOVA is:
$$F = \frac{\text{Estimate of population variance based on between-sample variance}}{\text{Estimate of population variance based on within-sample variance}}$$
In ANOVA, the F-test is used to test the null hypothesis, which is stated as:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
The larger the value of the F statistic, the greater the likelihood that the differences between the means are due to the treatment rather than to chance alone, that is, that the means are significantly different from each other.
In that case, the null hypothesis is rejected and the alternative hypothesis is accepted. The alternative hypothesis states that at least one of the population means is significantly different from the rest:
$H_1$: not all $\mu_j$ (j = 1, 2, ..., k) are equal

8.5 ANOVA Technique
The steps for the ANOVA technique are displayed in Fig. 8.5a.
Identify the dependent and independent variables
Decompose the total variation
Measure the effects
Test the significance
Interpret the results
Fig. 8.5a: The Steps for ANOVA Technique
Layout for one-way ANOVA
Consider N observations $x_{ij}$ (i = 1, 2, ..., k; j = 1, 2, ..., $n_i$) of a random variable X, grouped into k classes of sizes $n_1, n_2, \dots, n_k$, respectively, where $N = \sum_{i=1}^{k} n_i$, as shown in Table 8.5a.

Class   Observations                  Mean    Total
1       x11   x12   ...   x1n1        x̄1.     T1
2       x21   x22   ...   x2n2        x̄2.     T2
...     ...                           ...     ...
i       xi1   xi2   ...   xini        x̄i.     Ti
...     ...                           ...     ...
k       xk1   xk2   ...   xknk        x̄k.     Tk
Table 8.5a: Layout for One-way ANOVA

8.6 One-way ANOVA - Example
Example 01:
Three machines, A, B, and C, are tested to see whether their outputs (number of items produced) are equivalent. The following observations of output are made:
Machine A: 12 10 11 13 14 15
Machine B: 11 8 12 10 13
Machine C: 10 11 14 15 12 13
Table 8.6a: Output of Three Different Machines
Carry out the analysis of variance and state your conclusions.
Solution 01:
Step 1: Identifying IV and DV
For the given example,
Dependent variable Y (DV): Number of items produced
Independent variable X (IV), the treatment/factor: Type of machine, with three levels (machines A, B, and C)
Step 2: Decompose the Total Variation
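The total variation in Y, denoted by $SS_y$ (SS: Sum of Squares), is decomposed into two components, $SS_{between}$ and $SS_{within}$:
$$SS_y = SS_{between} + SS_{within}$$
Where,
Total Variability
$SS_y$ = Total of (Observed value - Grand mean)²
$$SS_y = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y})^2$$
Also, let the raw sum of squares be $\text{R.S.S.} = \sum\sum Y_{ij}^2$ and the correction factor be $\text{c.f.} = \left(\sum\sum Y_{ij}\right)^2/N$. Then:
$$SS_y = \text{Total S.S.} = \text{R.S.S.} - \text{c.f.}$$
Within Group
$SS_{within}$ = Total of (Group observed value - Group mean)²
$$SS_{within} = \sum_{j=1}^{c}\sum_{i=1}^{n_j}(Y_{ij} - \bar{Y}_j)^2$$
Between Group
$SS_{between}$ = Total of [$n_{group}$ × (Group mean - Grand mean)²]
$$SS_{between} = \sum_{j=1}^{c} n_j(\bar{Y}_j - \bar{Y})^2 = \sum_{j=1}^{c}\frac{Y_{j.}^2}{n_j} - \text{c.f.}$$
where $Y_{j.}$ is the total of the observations in group j, and
Within S.S. = Total S.S. - Between S.S.
$$\bar{Y}_j = \frac{\sum_{i=1}^{n_j} Y_{ij}}{n_j} \quad \text{(Mean for category/group } j\text{)}$$
$$\bar{Y} = \frac{\sum_{j=1}^{c}\sum_{i=1}^{n_j} Y_{ij}}{N} \quad \text{(Mean over the whole sample, or the grand mean)}$$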
Here, we set up the null hypothesis as:
H0: The various machines are homogeneous
That is,
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu$
Where, $\mu_i$ = Mean output from the i-th machine (i = 1, 2, 3)
For the given example:
Machine   Output                 Total   Sum of squares of observations
A         12 10 11 13 14 15      75      955
B         11  8 12 10 13         54      598
C         10 11 14 15 12 13      75      955
Total                            204     2508
Table 8.6b: Output of Three Different Machines

$$\text{R.S.S. (Raw Sum of Squares)} = \sum\sum Y_{ij}^2 = 2508, \qquad \text{c.f.} = \frac{\left(\sum\sum Y_{ij}\right)^2}{N} = \frac{204^2}{17} = 2448$$
$$SS_y = \text{Total S.S.} = \text{R.S.S.} - \text{c.f.} = 2508 - 2448 = 60$$
$$\text{Between S.S.} = \sum_j \frac{Y_{j.}^2}{n_j} - \text{c.f.} = \left(\frac{75^2}{6} + \frac{54^2}{5} + \frac{75^2}{6}\right) - 2448 = 10.2$$
$$\text{Within S.S.} = \text{Total S.S.} - \text{Between S.S.} = 60 - 10.2 = 49.8$$
Step 3: Measure the Effect
The strength of the effects of X (IV) on Y (DV) is measured as follows:
$$\eta^2 = \frac{SS_{between}}{SS_y} = \frac{SS_y - SS_{within}}{SS_y}$$
The value of $\eta^2$ varies between 0 and 1.
For the given example,
$$\eta^2 = \frac{SS_{between}}{SS_y} = \frac{10.2}{60} = 0.17$$
In other words, 17% of the variation in output is accounted for by the type of machine.
Step 4: Test the Significance
In one-way ANOVA, the interest lies in testing the null hypothesis that the category means are equal in the population.
Under the null hypothesis:
$H_0: \mu_1 = \mu_2 = \mu_3$
the variation between the samples and the variation within the samples are assumed to come from the same source of variation.
The null hypothesis may be tested by the F statistic, based on the ratio between the two estimates as follows:
$$F = \frac{SS_{between}/(c-1)}{SS_{within}/(N-c)} = \frac{MS_{between}}{MS_{within}}$$
$MS_{within} = \dfrac{SS_{within}}{N - c}$ = Mean square variation within the samples, where (N - c) represents the degrees of freedom within the samples.
$MS_{between} = \dfrac{SS_{between}}{c - 1}$ = Mean square variation between the samples, where (c - 1) represents the degrees of freedom between the samples.
The F-test statistic follows the F-distribution with (c - 1) and (N - c) degrees of freedom. Refer to the F-distribution table for the value of $F_{critical}$ at the chosen level of significance (0.05, 0.01, etc.). Reject the null hypothesis if $F_{calculated} > F_{critical}$.
For the given example, with (c - 1) = 2 and (N - c) = 14 degrees of freedom, $F_{critical}$ = 3.74 at the α = 0.05 level of significance.
$$F_{calculated} = \frac{SS_{between}/(c-1)}{SS_{within}/(N-c)} = \frac{10.2/2}{49.8/14} = 1.434$$
Since $F_{critical} > F_{calculated}$, the null hypothesis may be accepted.
Step 5: Interpret the Results
The independent variable does not have a significant effect on the dependent variable if the null hypothesis of equal category means is not rejected.
On the other hand, the effect of the independent variable is significant if the null hypothesis is rejected.
A comparison of the category means will indicate the nature of the effect of the independent variable.
For the given example, the null hypothesis may be accepted. There is no significant difference in the means of the groups. The type of machine does not have a significant effect on the output.
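The machine example of Section 8.6 can be reproduced with a short Python sketch. It follows the raw-sum-of-squares method used above (R.S.S. minus the correction factor). The function `one_way_anova` is a hypothetical helper, not a library API. The F ratio it returns is then compared with the tabulated F value for (2, 14) df at the 5% level.

```python
def one_way_anova(groups):
    """Return (SS_total, SS_between, SS_within, F, (df_between, df_within))."""
    all_obs = [y for g in groups for y in g]
    N, c = len(all_obs), len(groups)
    cf = sum(all_obs) ** 2 / N                      # correction factor
    rss = sum(y ** 2 for y in all_obs)              # raw sum of squares
    ss_total = rss - cf
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    ss_within = ss_total - ss_between
    ms_between = ss_between / (c - 1)
    ms_within = ss_within / (N - c)
    return ss_total, ss_between, ss_within, ms_between / ms_within, (c - 1, N - c)

machines = {
    "A": [12, 10, 11, 13, 14, 15],
    "B": [11, 8, 12, 10, 13],
    "C": [10, 11, 14, 15, 12, 13],
}
ss_t, ss_b, ss_w, F, df = one_way_anova(list(machines.values()))
eta_sq = ss_b / ss_t                                # strength of the effect
print(f"SSy = {ss_t:.1f}, SSbetween = {ss_b:.1f}, SSwithin = {ss_w:.1f}")
print(f"eta^2 = {eta_sq:.2f}, F = {F:.3f}, df = {df}")
# prints:
# SSy = 60.0, SSbetween = 10.2, SSwithin = 49.8
# eta^2 = 0.17, F = 1.434, df = (2, 14)
```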

8.7 Two-way ANOVA
While doing research, the researcher is often concerned with the effect of more than one factor simultaneously. In two-way ANOVA, the influences of two factors, with their respective categories, on the dependent variable are considered simultaneously.
For example, the quality of fabric (high, medium, and low) interacts with price levels (high, medium, and low) to influence a brand's sales. Here, the dependent variable is the brand's sales and the independent variables are quality and pricing. Each of the two factors has three levels.
Depending upon the replication of data within the levels of the factors, two-way ANOVA is classified into two types:
Two-factor ANOVA without replication, where each factor combination is observed exactly once.
Two-factor ANOVA with replication, where each factor combination is observed 'm' number of times.
Two-way ANOVA Layout: Without Replication
The data format for two-way ANOVA without replication is shown in Table 8.7a.

Levels of          Levels of Factor B (Bj)              Row
Factor A (Ai)      B1      B2      ...     Bc           Mean
A1                 Y11     Y12     ...     Y1c          Ȳ1.
A2                 Y21     Y22     ...     Y2c          Ȳ2.
...                ...     ...     ...     ...          ...
Ar                 Yr1     Yr2     ...     Yrc          Ȳr.
Column Mean        Ȳ.1     Ȳ.2     ...     Ȳ.c
Table 8.7a: Two-way ANOVA without Replication
i = 1, 2, 3, ..., r represents the different categories of factor A
j = 1, 2, 3, ..., c represents the different categories of factor B

Analysis of Variance Table for Two-way ANOVA
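The ANOVA table can be set up in the usual fashion, as shown in Table 8.7b.
$$
\begin{array}{lllll}
\text{Source of variation} & \text{Sum of squares} & \text{Degrees of freedom} & \text{Mean square} & \text{F-ratio} \\
\text{Factor A} & SS_A = c\sum_{i=1}^{r}(\bar{Y}_{i.} - \bar{Y})^2 & r - 1 & MS_A = \dfrac{SS_A}{r-1} & F = \dfrac{MS_A}{MS_E} \\
\text{Factor B} & SS_B = r\sum_{j=1}^{c}(\bar{Y}_{.j} - \bar{Y})^2 & c - 1 & MS_B = \dfrac{SS_B}{c-1} & F = \dfrac{MS_B}{MS_E} \\
\text{Error} & SS_E = \sum_{i=1}^{r}\sum_{j=1}^{c}(Y_{ij} - \bar{Y}_{i.} - \bar{Y}_{.j} + \bar{Y})^2 & (r-1)(c-1) & MS_E = \dfrac{SS_E}{(r-1)(c-1)} & \\
\text{Total} & SS_T = \sum_{i=1}^{r}\sum_{j=1}^{c}(Y_{ij} - \bar{Y})^2 & rc - 1 & & \\
\end{array}
$$
Table 8.7b: Analysis of Variance Table for Two-way ANOVA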
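For a two-way layout without replication, the sums of squares in Table 8.7b can be computed as in the Python sketch below. The function name and the sales figures are hypothetical; the rows are three quality levels and the columns three price levels. The components satisfy SS_A + SS_B + SS_E = SS_T, which serves as a useful arithmetic check.

```python
def two_way_anova_no_replication(table):
    """table[i][j] is the single observation for level i of factor A and level j of factor B."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]                             # Y-bar i.
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]  # Y-bar .j
    ss_a = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_b = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_e = sum((table[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
               for i in range(r) for j in range(c))
    ss_t = sum((table[i][j] - grand_mean) ** 2 for i in range(r) for j in range(c))
    df_a, df_b, df_e = r - 1, c - 1, (r - 1) * (c - 1)
    ms_e = ss_e / df_e
    return {
        "SS_A": ss_a, "SS_B": ss_b, "SS_E": ss_e, "SS_T": ss_t,
        "F_A": (ss_a / df_a) / ms_e, "F_B": (ss_b / df_b) / ms_e,
        "df": (df_a, df_b, df_e),
    }

# Hypothetical sales (rows: high/medium/low quality; columns: high/medium/low price).
sales = [
    [20, 18, 15],
    [17, 16, 12],
    [14, 12, 10],
]
result = two_way_anova_no_replication(sales)
for key, value in result.items():
    print(key, value if key == "df" else round(value, 3))
```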

8.8 Analysis of Covariance (ANOCOVA)
While studying differences in the mean values of the dependent variable related to the effect of the controlled independent variables, it is often necessary to take into account the influence of uncontrolled independent variables.
Using Analysis of Covariance Technique (ANOCOVA), the influence of uncontrolled
variables is usually removed by simple linear regression method and the residual sums
of squares are used to provide variance estimates, which, in turn, are used to make
tests of significance.
Consider the influence of variable X (IV) on variable Y (DV), and also the influence of an uncontrolled variable, Z, which is correlated with the variable Y.
Covariance analysis consists of:
Subtracting from each individual score ($Y_i$) the part of it ($Y'_i$) that is predictable from the uncontrolled variable ($Z_i$).
Computing the usual analysis of variance on the resulting adjusted scores ($Y - Y'$).
ANOCOVA: Assumptions
Assume that there is some sort of relationship between the dependent variable
and the uncontrolled variable.
Various treatment groups are selected at random from the population.
The groups are homogeneous in variability.
The regression is linear and is the same from group to group.
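The covariance adjustment can be sketched in Python using the standard one-way ANCOVA computation. The sums of squares of Y are reduced by the part explained by a linear regression on Z. The total term uses the overall regression, and the within-group term uses a pooled (common) within-group slope, in line with the assumption that the regression is the same from group to group. One degree of freedom is lost for the estimated slope. The function and data are hypothetical. This is only a sketch; it does not, for example, test the equal-slopes assumption.

```python
def one_way_ancova(groups):
    """groups: list of (z_values, y_values) pairs, one pair per treatment group.

    Returns the adjusted between-group and within-group sums of squares of Y
    after removing the linear regression on the uncontrolled variable Z, the
    F ratio, and its (c - 1, N - c - 1) degrees of freedom.
    """
    def sums_of_squares(pairs):
        n = len(pairs)
        mz = sum(z for z, _ in pairs) / n
        my = sum(y for _, y in pairs) / n
        szz = sum((z - mz) ** 2 for z, _ in pairs)
        syy = sum((y - my) ** 2 for _, y in pairs)
        szy = sum((z - mz) * (y - my) for z, y in pairs)
        return szz, syy, szy

    all_pairs = [pair for z, y in groups for pair in zip(z, y)]
    t_zz, t_yy, t_zy = sums_of_squares(all_pairs)          # total
    w_zz = w_yy = w_zy = 0.0
    for z, y in groups:                                     # pooled within groups
        g_zz, g_yy, g_zy = sums_of_squares(list(zip(z, y)))
        w_zz, w_yy, w_zy = w_zz + g_zz, w_yy + g_yy, w_zy + g_zy
    # Residual sums of squares of Y after regression on Z.
    adj_total = t_yy - t_zy ** 2 / t_zz
    adj_within = w_yy - w_zy ** 2 / w_zz                    # common within-group slope
    adj_between = adj_total - adj_within
    n_total, c = len(all_pairs), len(groups)
    df = (c - 1, n_total - c - 1)   # one df lost for the estimated slope
    f_ratio = (adj_between / df[0]) / (adj_within / df[1])
    return adj_between, adj_within, f_ratio, df

# Hypothetical data: (uncontrolled variable Z, dependent variable Y) for three groups.
groups = [
    ([1, 2, 3, 4], [3, 5, 6, 8]),
    ([2, 3, 4, 5], [6, 7, 9, 10]),
    ([1, 3, 5, 7], [4, 7, 9, 12]),
]
print(one_way_ancova(groups))
```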

8.9 Chapter Summary
ANOVA is a data analysis technique to determine differences between the means of more than two samples.
It considers that total variation is due to the variation in treatment and the variation that is unexplained.
In one-way ANOVA, only one factor of influence is considered for study; in two-way ANOVA, the influences of two factors are considered simultaneously.
Two estimates are compared with the F-test for the given degrees of freedom and level of significance. The F statistic in ANOVA is as given below:
$$F = \frac{\text{Estimate of population variance based on between-sample variance}}{\text{Estimate of population variance based on within-sample variance}}$$
In analysis of covariance, the effect of uncontrolled variables associated with the dependent variable is removed by the regression method.

9. Non-parametric Testing and Chi-square Test
9 . 1 Introduction
Many statistical techniques are based on assumptions about the population from which the samples are drawn, for example, the assumption that the sample is drawn from a normally distributed population. All such techniques are parametric tests. In situations where such assumptions about the population cannot be made, a parametric test is not applicable. For such data, non-parametric tests are the appropriate choice, because they make no rigid assumptions about the population and its parameters.
In this chapter, different non-parametric tests are discussed, together with the chi-square test, which can also be used as a non-parametric test and is widely used for inference in almost all fields of research.
Describe non-parametric tests
State the advantages and disadvantages of non-parametric tests over parametric tests
Discuss the chi-square test
Explain the sign test and the run test
Explain Spearman's rank correlation coefficient and Kendall's coefficient
Discuss the Wilcoxon matched-pairs test
9.2 Non-parametric Test
Definition
A non-parametric test is used for inferences that do not need any assumptions about the distribution of the variables. Thus, a non-parametric test is also known as a 'distribution-free test'. For example, if the sales of two brands of sports goods are to be compared and nothing is assumed about the distribution of the two variables (the sales of both brands), then a non-parametric test is used for the inferences.
According to Gibbons, "a statistical technique is said to be non-parametric if it
satisfies one of the following five criteria:
1. The data are count data of number of observations in each category
2. The data are nominal scale data
3. The data are ordinal scale data
4. The inference does not concern a parameter
5. The assumptions are general rather than specified."
A parametric test requires assumptions about the distribution of the variables; when those assumptions hold, its results are more reliable (more powerful) than those of a non-parametric test. Thus, parametric and non-parametric tests differ from each other. Table 9.2a shows the advantages and disadvantages of non-parametric tests over parametric tests.
Advantages
1. Non-parametric methods are readily comprehensible, very simple, easy to apply, and do not require complicated sampling theory.
2. No assumption is made about the form of the frequency function of the parent population from which the sampling is done.
3. No parametric technique will apply to data that are mere classifications (that is, which are measured on a nominal scale), while non-parametric methods exist to deal with such data.
4. Since socio-economic data are not, in general, normally distributed, non-parametric tests have found applications in psychometric, sociological, and educational statistics.
5. Non-parametric tests are available to deal with data that are given in ranks or whose seemingly numerical scores have the strength of ranks. For instance, non-parametric tests can be applied if the scores are given in grades, such as A+, A-, B, A, B+, etc.
Disadvantages
1. Non-parametric tests can be used only if the measurements are nominal or ordinal. Even in that case, if a parametric test exists, it is more powerful than the non-parametric test. In other words, if all the assumptions of a statistical model are satisfied by the data and if the measurements are of the required strength, then non-parametric tests are a waste of time and data.
2. So far, no non-parametric methods exist for testing interactions in an 'analysis of variance' model, unless special assumptions about the additivity of the model are made.
3. Non-parametric tests are designed to test statistical hypotheses only and not for estimating the parameters.
Table 9.2a: Advantages and Disadvantages of Non-parametric Test over Parametric Test
Source: Gupta, S. C., Kapoor, V. K., Fundamentals of Mathematical Statistics, Eleventh Edition, Sultan Chand & Sons, New Delhi, 2002
In this chapter, some of the non-parametric tests, such as the sign test, rank sum test, run test, Kendall's test, and chi-square test, are discussed.
Non-parametric tests are easy to calculate and provide quick results. When the data are not exact, or when there is no information about the distribution of the population from which they are taken, a parametric test cannot be used and a non-parametric test is the only option.

9.3 Chi-square Test
The chi-square test is symbolically represented as χ². The chi-square test is used in the following cases:
To test the significance of population variance or homogeneity
To test the goodness of fit
To test the significance of association between two attributes
Chi-square to Test the Significance of Population Variance
The chi-square test can be used to test the significance of a population variance, that is, whether a sample has been drawn from a normally distributed population with mean μ and a specified variance σ_p².
The null hypothesis is:
H0: σ_s² = σ_p²
Where,
σ_s² = Variance of the sample
σ_p² = Variance of the population
n = Sample size
Then the test statistic is given as:
$$\chi^2 = \frac{(n - 1)\,\sigma_s^2}{\sigma_p^2}$$
The value obtained from the above formula is compared with the critical value of chi-square, χ²_{α, n-1}, at level of significance α with (n - 1) degrees of freedom. If the calculated value is greater than the tabulated value of chi-square, then the null hypothesis is rejected at level of significance α.
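As a quick illustration, the statistic can be computed in Python. This is a sketch under assumptions: the sample values and the hypothesised variance σ_p² = 4 are hypothetical, σ_s² is taken as the sample variance with divisor n - 1 (which is what `statistics.variance` returns), and 16.919 is the tabulated χ² value for 9 degrees of freedom at the 5% level.

```python
import statistics


def chi_square_variance(sample, sigma2_p):
    """Return chi^2 = (n - 1) * s^2 / sigma_p^2 and its degrees of freedom n - 1."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance with divisor n - 1
    return (n - 1) * s2 / sigma2_p, n - 1


# Hypothetical sample of 10 measurements; hypothesised population variance = 4
sample = [12, 15, 11, 14, 13, 16, 10, 14, 13, 12]
chi2, df = chi_square_variance(sample, 4)
critical = 16.919  # tabulated chi-square, 9 d.f., 5% level (upper tail)
print(f"chi2 = {chi2:.3f} with {df} d.f.")  # chi2 = 7.500 with 9 d.f.
print("reject H0" if chi2 > critical else "accept H0")  # accept H0
```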
Chi-square is also used as a non-parametric test when nothing is assumed about the population from which the sample is drawn. As a non-parametric test, chi-square is used to test goodness of fit and to test the significance of association between two attributes.

Chi-square Goodness of Fit
This test measures the difference between the observed and the theoretical (expected) frequencies. It was developed by Karl Pearson in 1900 and is known as the chi-square goodness of fit test.
To test the null hypothesis,
H0: There is no difference between the expected and observed frequencies,
the test statistic is given by:
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \sim \chi^2_{k-1}$$
Where,
e_i = Expected frequency
o_i = Observed frequency
k = Number of categories
If a sample is arranged in k categories and the observed frequencies are given, then the expected frequencies are calculated by the formula:
e_i = n p_i, i = 1, 2, 3, ..., k
Where,
p_i = The probability that the random variable X falls in the i-th category
n = Sample size
Thus, by comparing the calculated value with the critical value χ²_{α, k-1} at a specified level of significance, H0 is accepted or rejected accordingly: H0 is rejected if the calculated value exceeds the critical value.
Example 01:
Two coins are tossed 60 times, and the number of heads obtained in each toss is shown in Table 9.3a. Test, at 5% level of significance, whether the observed frequencies agree with the given probabilities.
Number of Heads    0       1      2
Frequency          15      25     20
Probability        0.25    0.5    0.25
Table 9.3a: The Frequency Distribution of Number of Heads
Solution 01:
The expected frequencies are e_i = n p_i = 60 × 0.25 = 15, 60 × 0.5 = 30, and 60 × 0.25 = 15.
x    p_i     o_i    e_i    o_i - e_i    (o_i - e_i)²    (o_i - e_i)²/e_i
0    0.25    15     15      0            0               0
1    0.5     25     30     -5           25               25/30 = 5/6
2    0.25    20     15      5           25               25/15 = 5/3
Total                                                    15/6
Table 9.3b: The Frequency Distribution of Number of Heads
Therefore, χ² = Σ (o_i - e_i)²/e_i = 15/6 = 2.5
With k - 1 = 3 - 1 = 2 degrees of freedom, the critical value χ²_{0.05, 2} at 5% level of significance is 5.991. Since the calculated χ² < χ²_{0.05, 2}, we accept the null hypothesis at 5% level of significance and conclude that there is no significant difference between the observed and expected frequencies.
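The same calculation can be written as a short Python sketch (the function name `chi_square_gof` is illustrative, not from the text). It builds e_i = n p_i and sums (o_i - e_i)²/e_i exactly as in Table 9.3b, then compares the result with the tabulated value 5.991 quoted above.

```python
def chi_square_gof(observed, probs):
    """Pearson chi-square goodness of fit: e_i = n * p_i, with k - 1 degrees of freedom."""
    n = sum(observed)
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected, len(observed) - 1


# Example 01: frequencies of 0, 1 and 2 heads when two coins are tossed 60 times
chi2, expected, df = chi_square_gof([15, 25, 20], [0.25, 0.5, 0.25])
print(expected)            # [15.0, 30.0, 15.0]
print(round(chi2, 3), df)  # 2.5 2
print("accept H0" if chi2 < 5.991 else "reject H0")  # 5.991 = tabulated value, 2 d.f., 5%
```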
Chi-square to Test the Significance of Association Between Two Attributes
If two attributes, A and B, are divided into r and s sub-categories respectively, the observed frequencies can be arranged in an r × s contingency table, as shown in Table 9.3c.
          A1        A2        ...    Ar        Total
B1        (A1B1)    (A2B1)    ...    (ArB1)    (B1)
B2        (A1B2)    (A2B2)    ...    (ArB2)    (B2)
...       ...       ...       ...    ...       ...
Bs        (A1Bs)    (A2Bs)    ...    (ArBs)    (Bs)
Total     (A1)      (A2)      ...    (Ar)      N
Table 9.3c: r × s Contingency Table
From Table 9.3c, Σ(A_i) = Σ(B_j) = N.

Here, the null hypothesis is H0: There is no association between the two attributes (the attributes are independent).
As in the chi-square goodness of fit test, expected frequencies are required. The expected frequency for cell (A_iB_j) is:
$$e_{ij} = \frac{(A_i) \times (B_j)}{N} = \frac{\text{Total of } i\text{th column} \times \text{Total of } j\text{th row}}{\text{Grand total } (N)}$$
And the test statistic is given as:
$$\chi^2 = \sum \frac{(o_i - e_i)^2}{e_i}$$
Where,
o_i = Observed frequency
e_i = Expected frequency
Finally, the calculated value is compared with the critical value of chi-square at the chosen level of significance for (r - 1)(s - 1) degrees of freedom; if the calculated value exceeds it, the null hypothesis is rejected.
Example 02:
Table 9.3d shows the distribution of sales of two apparel brands in showrooms, shopping centers, and online shopping.
                      Sales from
Brands    Showrooms    Shopping Centers    Online Shopping    Total
A         120          100                 60                 280
B         180          60                  80                 320
Total     300          160                 140                600
Table 9.3d: 2 × 3 Contingency Table
Test, at 5% level of significance, whether the two brands are equally preferred across the three outlets.

Solution 02:
The null hypothesis is H0: There is no difference between the sales of the two brands in the given outlets.
The expected frequencies of brand A are:
Sales in showroom = (280 × 300)/600 = 140
Sales in shopping centre = (280 × 160)/600 = 74.667
Sales in online shopping = (280 × 140)/600 = 65.333
Similarly, for brand B:
Sales in showroom = (320 × 300)/600 = 160
Sales in shopping centre = (320 × 160)/600 = 85.333
Sales in online shopping = (320 × 140)/600 = 74.667
Therefore, the value of the χ² statistic is calculated as shown in Table 9.3e.
Brand    Outlet              Observed (o)    Expected (e)    (o - e)    (o - e)²    (o - e)²/e
A        Showrooms           120             140             -20        400         2.857
A        Shopping Centers    100             74.667          25.333     641.761     8.595
A        Online Shopping     60              65.333          -5.333     28.441      0.435
B        Showrooms           180             160             20         400         2.500
B        Shopping Centers    60              85.333          -25.333    641.761     7.521
B        Online Shopping     80              74.667          5.333      28.441      0.381
Total                                                                               22.289
Table 9.3e: 2 × 3 Contingency Table
Degrees of freedom = (r - 1)(s - 1) = (2 - 1)(3 - 1) = 2.
Therefore, the critical value of chi-square for 2 degrees of freedom at 5% level of significance is 5.991. Since the critical value is less than the calculated value of chi-square (22.289), we reject the null hypothesis at 5% level of significance and conclude that the sales of the two brands differ significantly across the outlets.
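The calculation in Tables 9.3d and 9.3e can be reproduced with a short Python sketch (the function name is illustrative). Expected frequencies are (row total × column total)/N; the total may differ from the table in the last decimal place because the table rounds (o - e) to three decimals before squaring.

```python
def chi_square_contingency(table):
    """Chi-square test of association for an r x s table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # expected frequency of each cell = row total x column total / grand total
    expected = [[r_tot * c_tot / n for c_tot in col_totals] for r_tot in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, expected, df


# Example 02: rows = brands A and B; columns = showrooms, shopping centres, online
chi2, expected, df = chi_square_contingency([[120, 100, 60], [180, 60, 80]])
print([[round(e, 3) for e in row] for row in expected])
# [[140.0, 74.667, 65.333], [160.0, 85.333, 74.667]]
print(f"{chi2:.3f}", df)  # 22.290 2
print("reject H0" if chi2 > 5.991 else "accept H0")  # reject H0
```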
Characteristics of Chi-square Test
The main characteristics of the chi-square test are:
Since no assumptions are made about the population, the test is not based on parameters such as the mean and standard deviation; it is based on frequencies.
It is not applicable for estimation; it is used only for testing hypotheses.
It follows the additive property: independent χ² values can be added, and their sum follows a χ² distribution whose degrees of freedom are the sum of their degrees of freedom.
It is useful for complex contingency tables.
Conditions for the Application of Chi-square Test
Some conditions where the chi-square test can be applied are:
Observations recorded and used are collected on a random basis.
A l l the items in the sample must be independent.
No group should contain fewer than 10 items. In cases where the frequencies are less than 10, regrouping is done by combining the frequencies of adjoining groups, so that the new frequencies become greater than 10.
Some statisticians take this number as 5, but most regard 10 as better.
The overall number of items must also be reasonably large. It should normally be at least 50, however small the number of groups may be.
The constraints must be linear. Constraints that involve linear equations in the cell frequencies of a contingency table (equations containing no squares or higher powers of the frequencies) are known as linear constraints.
Source: Kothari, Research Methodology, 2002, New Age International, New Delhi.
9.4 Sign Test
The sign test is the easiest and simplest of all non-parametric tests. In this test, only the direction of each observation (positive or negative, denoted by '+' and '-' signs) is considered, not its magnitude. There are two types of sign test:
One sample sign test
Two sample sign test

One Sample Sign Test
Let X1, X2, X3, ..., Xn be a sample from a population with median θ. This test is used when you want to test whether the population median equals a specified value θ0.
It is assumed that P(X < θ) = P(X > θ) = 1/2.
Then, the null hypothesis is:
H0: θ = θ0
If the value of a sample observation is greater than θ0, it is replaced by a positive sign '+'; otherwise, it is replaced by a negative sign '-'. However, if a sample observation is equal to θ0, it is ignored.
Let r be the total number of '+' signs and s the total number of '-' signs, so that
r + s = n
where n is now the number of observations that are not equal to θ0.
Thus, in order to test the hypothesis, r is considered to follow a binomial distribution with p = 1/2. The hypotheses are stated as:
H0: p = 1/2 and H1: p ≠ 1/2 (or, for a one-tailed test, p < 1/2 or p > 1/2)
However, for a large sample size, the normal approximation to the binomial distribution is used.
Example 03:
The numbers of pages printed by 12 printing machines in a printing press are: 320, 370, 430, 320, 350, 310, 390, 380, 360, 320, 400, and 320.
Using the sign test at 5% level of significance, test whether the average (median) number of pages printed is 385.
Solution 03:
The null hypothesis is H0: θ = 385.
Each value greater than 385 is replaced by a positive sign '+' and each value less than 385 by a negative sign '-', as below:
-, -, +, -, -, -, +, -, -, -, + and -.
Thus, the signs follow a binomial distribution, and the null hypothesis is H0: p = 1/2.
Here, n = 12, r = 3, s = 9, and p = 1/2. From the binomial distribution, the probability of obtaining 3 or fewer '+' signs is
P(r ≤ 3) = [C(12, 0) + C(12, 1) + C(12, 2) + C(12, 3)]/2^12 = (1 + 12 + 66 + 220)/4096 = 299/4096 ≈ 0.073,
which is greater than α = 0.05 (for a two-tailed test, the probability 2 × 0.073 = 0.146 is greater still).
Therefore, the null hypothesis is accepted at 5% level of significance, and we conclude that the data are consistent with an average of 385 pages printed.
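A minimal Python sketch of the one-sample sign test with exact binomial probabilities, using `math.comb`; the function name `sign_test` is illustrative. It reports the probability of getting as few of the rarer sign as were observed, and doubles it for a two-tailed test.

```python
from math import comb


def sign_test(data, theta0):
    """One-sample sign test of H0: median = theta0 with exact binomial probabilities."""
    plus = sum(1 for x in data if x > theta0)
    minus = sum(1 for x in data if x < theta0)  # values equal to theta0 are ignored
    n = plus + minus
    k = min(plus, minus)
    # probability of k or fewer of the rarer sign when each sign has probability 1/2
    p_one_tailed = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_two_tailed = min(1.0, 2 * p_one_tailed)
    return plus, minus, p_one_tailed, p_two_tailed


# Example 03: pages printed by 12 machines, H0: median = 385
pages = [320, 370, 430, 320, 350, 310, 390, 380, 360, 320, 400, 320]
r, s, p1, p2 = sign_test(pages, 385)
print(r, s, round(p1, 4), round(p2, 4))  # 3 9 0.073 0.146
```

Both probabilities exceed 0.05, so H0 is not rejected at the 5% level.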
Two Sample Sign Test
This test is used to determine whether two paired samples are drawn from identical populations.
In this method, each pair of values is replaced with a positive sign '+' if the value from the first sample is greater than the value from the second sample; otherwise, it is replaced by a negative sign '-'. If the values are equal, the pair is ignored. The number of '+' signs is then tested against p = 1/2 exactly as in the one sample sign test.
9.5 Run Test
The word 'run' here denotes a sequence of identical symbols that is followed or preceded by a different symbol or by no symbol at all. A run test is used to verify whether the observations in a given data set occur in random order. Thus, a run test helps you find out whether the sample is randomly selected from the population.
For example, before launching a new health drink in the market, the manager of a company wants to conduct a survey to determine which age group would be the preferable target group for the product. Customers aged < 25 years are denoted by T and those aged > 25 years by O. The manager puts up a counter where customers can taste the health drink, and their feedback is taken.
Here, the null hypothesis is:
H0: Customers aged < 25 and > 25 visit the counter in random order.
The sequence representing the type of customers coming to the counter is given as:
T T O O T T T T T T T T O O O O O O T T T T
The runs are: (1) T T, (2) O O, (3) T T T T T T T T, (4) O O O O O O, and (5) T T T T.
Thus, there are 5 runs in total (r = 5); 14 customers are aged < 25 and 8 are aged > 25, that is, n1 = 14 and n2 = 8.

For small samples, when n1 and n2 are each 20 or less, the lower (r_L) and upper (r_U) critical values at a specified level of significance can be obtained from the table for run tests. If r_L < r < r_U, the null hypothesis is accepted; if r ≤ r_L or r ≥ r_U, it is rejected.
If n1 or n2 is greater than 20, the sampling distribution of r tends to a normal distribution with mean μ_r and variance σ_r².
Where,
$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$
$$\sigma_r^2 = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$$
To test r, the following standard normal statistic is obtained:
$$Z = \frac{r - \mu_r}{\sigma_r}$$
If the calculated value of Z lies between the tabulated values -Z_{α/2} and Z_{α/2}, the null hypothesis is accepted; otherwise, it is rejected.
Example 04:
A die is thrown 20 times, and each outcome is recorded as an even number (E) or an odd number (O), as below. Test whether the occurrence of even and odd numbers is random, using α = 0.05.
E E E E E E E O O O E E O O O O E E E E
Solution 04:
In the given sequence,
n1 = Number of occurrences of even numbers = 13
n2 = Number of occurrences of odd numbers = 7
r = Number of runs = 5
Here, H0: The events are random.
The lower (r_L) and upper (r_U) critical values of r at 5% level of significance for n1 = 13 and n2 = 7 are 5 and 15, respectively. Since r = 5 is not greater than r_L = 5, it falls in the rejection region (r ≤ r_L). We therefore reject the null hypothesis and conclude that the occurrence of even and odd numbers, when the die is thrown 20 times, is not random at 5% level of significance.
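The run count and both forms of the test can be checked in Python. This sketch assumes a two-symbol sequence written as a string, and the function names are illustrative. Besides the normal-approximation Z given above (which the text reserves for samples larger than 20, so here it is only a cross-check), it computes the exact probability P(R ≤ r) from the standard combinatorial count of arrangements with a given number of runs, a formula that is not stated in the text.

```python
from math import comb, sqrt


def count_runs(seq):
    """Number of runs, i.e. maximal blocks of identical consecutive symbols."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def arrangements_with_runs(r, n1, n2):
    """Number of orderings of n1 + n2 symbols (two kinds) giving exactly r runs, r >= 2."""
    k, odd = divmod(r, 2)
    if not odd:
        return 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    return comb(n1 - 1, k) * comb(n2 - 1, k - 1) + comb(n1 - 1, k - 1) * comb(n2 - 1, k)


def run_test(seq):
    """Return n1, n2, r, the normal-approximation Z and the exact P(R <= r)."""
    first = seq[0]  # the symbol of the first observation is counted as n1
    n1 = seq.count(first)
    n2 = len(seq) - n1
    r = count_runs(seq)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (r - mean) / sqrt(var)
    total = comb(n1 + n2, n1)  # all equally likely orderings under randomness
    p_lower = sum(arrangements_with_runs(i, n1, n2) for i in range(2, r + 1)) / total
    return n1, n2, r, z, p_lower


# Example 04: E = even, O = odd, die thrown 20 times
n1, n2, r, z, p_lower = run_test("EEEEEEEOOOEEOOOOEEEE")
print(n1, n2, r)                       # 13 7 5
print(round(z, 3), round(p_lower, 4))  # -2.589 0.0095
```

For Example 04 the exact lower-tail probability P(R ≤ 5) is about 0.0095, which can be compared directly with α/2 = 0.025 for a two-tailed test.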
9.6 Spearman's Rank Correlation
When the data values are not numerically measurable but can be ranked, a rank correlation coefficient is used to measure the association between the variables. It was formulated by Charles Edward Spearman in 1906 and is therefore known as Spearman's rank correlation coefficient, denoted by ρ.
The formula for obtaining the rank correlation is:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Where,
d_i = Difference between ranks = (R1 - R2)
R1 = Ranks assigned to values of the first variable
R2 = Ranks assigned to values of the second variable
Σd_i² = Sum of the squares of the differences between ranks
n = Number of pairs of observations
Here, the null hypothesis is H0: The variables are independent, or there is no correlation between the variables.
Against the alternative hypothesis,
H1: The variables are dependent, or there is a correlation between the variables.
For sample sizes less than 30, if the absolute value of the calculated ρ is less than the tabulated critical value, the null hypothesis is accepted; otherwise, it is rejected.
For sample sizes greater than 30, the sampling distribution of ρ is assumed to be normal with mean zero and standard deviation 1/√(n - 1).
That is, the standard error is σ_ρ = 1/√(n - 1), and the test statistic is Z = ρ/σ_ρ = ρ√(n - 1).
The table for the normal curve is used for the critical value.

Note:
If two or more values are equal (tied), each of them is assigned the average of the ranks that would have been assigned to them if they were different. The formula for the statistic is then adjusted by the term (m³ - m)/12, where m denotes the number of observations involved in a tie in any of the variables under study:
$$\rho = 1 - \frac{6\left[\sum d_i^2 + \sum \frac{m^3 - m}{12}\right]}{n(n^2 - 1)}$$
Where, the summation Σ(m³ - m)/12 is taken over every group of tied ranks.
Example 05:
Table 9.6a displays the values of two variables X and Y. Test whether the variables are independent or not at 5% level of significance.
X      Y
101    120
111    125
102    123
105    121
112    122
109    126
Table 9.6a: Distribution of X and Y
Solution 05:
For the above problem, the hypotheses are:
H0: The variables X and Y are independent.
H1: The variables X and Y are dependent.
Here, Spearman's rank correlation coefficient is used, which is given as:
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
Table 9.6b is constructed to obtain the calculated value of ρ.
X      Y      R1    R2    d_i    d_i²
101    120    1     1      0     0
111    125    5     5      0     0
102    123    2     4     -2     4
105    121    3     2      1     1
112    122    6     3      3     9
109    126    4     6     -2     4
Total                            18
Table 9.6b: Distribution of X and Y
Therefore,
$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 18}{6(36 - 1)} = 1 - \frac{108}{210} = \frac{102}{210} = 0.486$$
Here, for n = 6 at 5% level of significance, the tabulated value of ρ is 0.886. Since it is greater than the calculated value, we accept the null hypothesis. Thus, the variables X and Y are independent.
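A Python sketch of the rank correlation, including the tie rule from the note above (average ranks, plus Σ(m³ - m)/12 for each group of ties). The function names are illustrative, and adding the correction for ties in both variables is an assumption based on the note's wording ("in any of the variables").

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values (0 when there are no ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum((m ** 3 - m) / 12 for m in counts.values())


def spearman_rho(x, y):
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_term(x) + tie_term(y)  # assumption: ties in both variables are counted
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Example 05
x = [101, 111, 102, 105, 112, 109]
y = [120, 125, 123, 121, 122, 126]
print(average_ranks(x))              # [1.0, 5.0, 2.0, 3.0, 6.0, 4.0]
print(round(spearman_rho(x, y), 3))  # 0.486
```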
9.7 Kendall's Test
This test is an important non-parametric test for measuring the significance of the relationship between variables. When two variables are tested for association between them, either Spearman's rank correlation coefficient or Kendall's coefficient is applied. When three or more sets of rankings are compared, Kendall's Coefficient of Concordance (W) is used; this form of Kendall's test is the one discussed here.
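Here, the hypotheses are stated as:
H0: There is no agreement (concordance) among the k sets of rankings.
H1: There is agreement (concordance) among the k sets of rankings.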
The assumptions for applying Kendall's test are:
The ranking is given independently.
The data are ordinal in nature.
The procedure for computing and interpreting Kendall's coefficient of concordance (W) is:
1. All the N objects should be ranked by all k judges in the usual fashion, and this information may be put in the form of a k by N matrix.
2. For each object, determine the sum of ranks (R_j) assigned by all the k judges (j = 1, 2, 3, ..., N).
3. Determine R̄_j (the mean of the R_j) and obtain the value of s as given:
$$s = \sum_{j=1}^{N} (R_j - \bar{R}_j)^2$$
4. Work out the value of W using the following formula:
$$W = \frac{s}{\frac{1}{12} k^2 (N^3 - N)}$$
Where,
s = Σ(R_j - R̄_j)²
k = Number of judges (sets of rankings)
N = Number of objects ranked
Source: Kothari, Research Methodology-Methods and Techniques, New Age International Publishers, New Delhi, 2002
Note:
If there are tied ranks in the data, the above formula is modified as:
$$W = \frac{s}{\frac{1}{12} k^2 (N^3 - N) - k\sum T}$$
A correction factor T is calculated for each of the k sets of ranks, and these are added together over the k sets to obtain ΣT:
$$T = \frac{\sum (t^3 - t)}{12}$$
where t is the number of observations in a group tied for a given rank, and the summation is taken over all such groups of ties.
Example 06:
The ranks given to 5 candidates by 4 judges (A, B, C, and D) in interviews conducted for the post of a CA are shown in Table 9.7a.
Candidate    A    B    C    D
1            1    1    5    1
2            3    5    3    4
3            4    4    2    2
4            2    2    4    5
5            5    3    1    3
Table 9.7a: Distribution of Ranks Given by the 4 Judges
Test at 5% level of significance whether there is significant agreement among the ranks assigned by the judges.
Solution 06:
For this, we construct Table 9.7b.
Candidate    A    B    C    D    Sum of ranks (R_j)    (R_j - R̄_j)²
1            1    1    5    1    8                     16
2            3    5    3    4    15                    9
3            4    4    2    2    12                    0
4            2    2    4    5    13                    1
5            5    3    1    3    12                    0
Total                            60                    26
Table 9.7b: Distribution of Ranks Given by the 4 Judges
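Here, R̄_j = 60/5 = 12, so s = 26, and the hypotheses are stated as:
H0: There is no agreement (concordance) among the rankings of the 4 judges.
H1: There is agreement (concordance) among the rankings of the 4 judges.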
Therefore,
$$W = \frac{s}{\frac{1}{12} k^2 (N^3 - N)} = \frac{26}{\frac{1}{12} \times 4^2 \times (5^3 - 5)} = \frac{26}{160} = 0.1625$$
The calculated value of s is 26, and the tabulated value of s for k = 4 and N = 5 at 5% level of significance (using Kendall's table) is 88.4, which is greater than the calculated value.
So, we accept the null hypothesis and conclude that there is no significant agreement among the judges' rankings at 5% level of significance.
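Kendall's W for Example 06 as a short Python sketch (no ties; the function name `kendall_w` is illustrative). Each inner list is one judge's ranking of the five candidates, read column by column from Table 9.7a, and 88.4 is the tabulated value of s quoted above.

```python
def kendall_w(rank_matrix):
    """Kendall's coefficient of concordance for a k x N rank matrix without ties."""
    k = len(rank_matrix)             # number of judges (sets of rankings)
    n_objects = len(rank_matrix[0])  # number of objects ranked, N
    totals = [sum(judge[j] for judge in rank_matrix) for j in range(n_objects)]  # R_j
    mean_total = sum(totals) / n_objects
    s = sum((t - mean_total) ** 2 for t in totals)
    w = s / (k ** 2 * (n_objects ** 3 - n_objects) / 12)
    return totals, s, w


# Example 06: one inner list per judge, ranks for candidates 1 to 5 (Table 9.7a)
judges = [
    [1, 3, 4, 2, 5],  # A
    [1, 5, 4, 2, 3],  # B
    [5, 3, 2, 4, 1],  # C
    [1, 4, 2, 5, 3],  # D
]
totals, s, w = kendall_w(judges)
print(totals, s, w)  # [8, 15, 12, 13, 12] 26.0 0.1625
print("reject H0" if s > 88.4 else "accept H0")  # 88.4 = tabulated s, k = 4, N = 5, 5%
```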
Relationship between Spearman's Correlation Coefficient and Kendall's Coefficient
W is an appropriate measure for studying the degree of association among three or more sets of ranks, but you can also determine the degree of association among k sets of rankings by averaging the Spearman's correlation coefficients (ρ) between all possible k(k - 1)/2 pairs of rankings, because W bears a linear relation to the average ρ taken over all possible pairs. The relationship between the average of ρ and Kendall's W can be put in the following form:
$$\text{Average of } \rho = \frac{kW - 1}{k - 1}$$
However, the method of finding W using the average ρ between all possible pairs is quite tedious, particularly when k is large, and as such, this method is rarely used in practice for finding W (Kothari, Research Methodology, 2002).
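The relationship can be checked on the rankings of Example 06 with a small Python sketch (there are no ties, so the basic Spearman formula applies; the helper name is illustrative). The six pairwise coefficients average to the same value as (kW - 1)/(k - 1) with W = 0.1625.

```python
from itertools import combinations


def spearman_no_ties(r1, r2):
    """Spearman's rho for two rankings without ties."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Example 06 rankings (one list per judge) and the W obtained from them
judges = [[1, 3, 4, 2, 5], [1, 5, 4, 2, 3], [5, 3, 2, 4, 1], [1, 4, 2, 5, 3]]
k, w = len(judges), 0.1625

pairs = list(combinations(judges, 2))  # k(k - 1)/2 = 6 pairs of rankings
avg_rho = sum(spearman_no_ties(a, b) for a, b in pairs) / len(pairs)
print(round(avg_rho, 4), round((k * w - 1) / (k - 1), 4))  # -0.1167 -0.1167
```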

9.8 Wilcoxon Matched-pairs Test
The Wilcoxon matched-pairs test is used for paired data. If the data for paired samples are given, such as values before and after a medical treatment, or the supply and demand of a commodity in the market, then the Wilcoxon test is the most suitable of the non-parametric tests.
If X and Y are two paired samples of small size, the difference between the values of each pair is obtained and denoted by d_i, that is, d_i = X_i - Y_i.
Then, ranks are assigned to the differences ignoring the + and - signs, and differences equal to zero are left out. The next step is to calculate the sum of all ranks with a positive sign (T+) and with a negative sign (T-), and then obtain T = min(T+, T-).
The null hypothesis is stated as:
H0: There is no difference between the two samples.
H1: There is a difference between the two samples.
Finally, the critical value of T is obtained from the table at a specified level of significance, for n equal to the number of non-zero differences. If the calculated value of T is less than or equal to the critical value, the null hypothesis is rejected; otherwise, it is accepted.
Example 07:
Using the Wilcoxon matched-pairs test, test whether the two samples in Table 9.8a are significantly different at 5% level of significance.
First Sample (X)     10   21   15   14   11   16   13   16
Second Sample (Y)     9   25   13   15   17   16   10   11
Table 9.8a: Distribution of Values of Two Samples X and Y
Solution 07:
H0: There is no difference between the two samples.
To carry out the Wilcoxon matched-pairs test, we prepare Table 9.8b.
X_i    Y_i    d_i    |d_i|    Rank
10     9       1     1        1.5
21     25     -4     4        5
15     13      2     2        3
14     15     -1     1        1.5
11     17     -6     6        7
16     16      0     0        -
13     10      3     3        4
16     11      5     5        6
Table 9.8b: Distribution of Values of Two Samples X and Y
Here, T- = 5 + 1.5 + 7 = 13.5 and T+ = 1.5 + 3 + 4 + 6 = 14.5.
Therefore, T = min(T+, T-) = 13.5.
After dropping the zero difference, n = 7. The tabulated value of T for n = 7 at 5% level of significance in a two-tailed test is 2, which is less than the calculated value of T. Thus, we accept the null hypothesis and conclude that there is no significant difference between the two samples at 5% level of significance.
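A Python sketch of the matched-pairs calculation for Example 07 (function names are illustrative). Zero differences are dropped, tied |d_i| values share an average rank, and T+ and T- are summed separately; 2 is the tabulated critical value of T for n = 7 quoted in the solution.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def wilcoxon_matched_pairs(x, y):
    """Return n (non-zero differences), T+, T- and T = min(T+, T-)."""
    d = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    ranks = average_ranks([abs(v) for v in d])   # rank |d_i|, ignoring signs
    t_plus = sum(rk for rk, v in zip(ranks, d) if v > 0)
    t_minus = sum(rk for rk, v in zip(ranks, d) if v < 0)
    return len(d), t_plus, t_minus, min(t_plus, t_minus)


# Example 07
x = [10, 21, 15, 14, 11, 16, 13, 16]
y = [9, 25, 13, 15, 17, 16, 10, 11]
n, t_plus, t_minus, t = wilcoxon_matched_pairs(x, y)
print(n, t_plus, t_minus, t)  # 7 14.5 13.5 13.5
print("reject H0" if t <= 2 else "accept H0")  # 2 = tabulated T, n = 7, 5%, two-tailed
```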
9.9 Mann-Whitney U Test
The Mann-Whitney U test is used to find out whether two given samples are drawn from identical populations. Here, the null hypothesis states that the two samples are drawn from populations having the same distribution.
Let the two samples be of sizes n1 and n2, such that N = n1 + n2.
The N observations are pooled and ranked together; if two or more values are equal, each is assigned the average of the ranks that they would have received if they were different. Then, the ranks of the two samples are summed separately and denoted by R1 and R2, respectively.

For small samples (n1, n2 ≤ 8), the test statistic is given as:
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
Then, obtain U = min(U1, U2) and compare it with the tabulated value of U for sample sizes (n1, n2). If the calculated value is less than or equal to the tabulated value, the null hypothesis is rejected; otherwise, it is accepted.
For larger samples (n1, n2 between 9 and 20), the test statistic is given as:
$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
Where U is approximately normally distributed, U ~ N(μ_U, σ_U²), with
$$\mu_U = \frac{n_1 n_2}{2} \quad \text{and} \quad \sigma_U^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$$
Therefore, the test statistic is:
$$Z_U = \frac{U - \mu_U}{\sigma_U}$$
Thus, if the alternative is that the first population is greater than the second, reject H0 when Z_U < -Z_α. If the alternative is that the first population is less than the second, reject H0 when Z_U > Z_α. If the alternative is that the two populations differ from each other, reject H0 when Z_U < -Z_{α/2} or Z_U > Z_{α/2}.
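A Python sketch of the U statistics with pooled ranking (function names are illustrative). The two samples in the usage lines are hypothetical, chosen only to show the arithmetic; the normal-approximation Z for U1 is returned as well, although the text intends it for larger samples than these.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return U1, U2, U = min(U1, U2) and the normal-approximation Z for U1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank the pooled N observations
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, u2, min(u1, u2), z


# Hypothetical samples, for illustration only
u1, u2, u, z = mann_whitney_u([12, 15, 17, 20], [10, 11, 14, 16, 18])
print(u1, u2, u)  # 6.0 14.0 6.0
```

A useful check on any hand calculation is that U1 + U2 always equals n1n2.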

Non-parametric tests are for inferences that do not require any assumptions about the population under study.
The chi-square (χ²) test is used for the following cases:
o To test the significance of population variance or homogeneity
o To test the goodness of fit
o To test the significance of association between two attributes
The sign test is a non-parametric test in which only the direction, denoted by a '+' or '-' sign, is considered, while the magnitude is not. There are two types of sign test:
o One sample sign test
o Two sample sign test
A run test is used to find out whether the sample is random or not. This test is based on runs, that is, sequences of identical symbols that are followed or preceded by a different symbol or by no symbol.
Spearman's correlation coefficient is used to determine whether two variables are independent or not.
Kendall's coefficient of concordance is the technique used to verify whether there is any association among more than two sets of rankings.
The Wilcoxon matched-pairs test is used in the case of paired data.

10. Research Report Writing
10.1 Introduction
In the previous chapters, the topics necessary for conducting research were discussed. This chapter gives you a brief idea of how to write up research as a proper research report. Right from data collection to the analysis and interpretation of the data, the whole process of research is systematically represented in a research report.
Research report writing is the final stage of research. How to write a good, effective, and detailed research report is subjective, in the sense that it varies with the perspective, experience, and research of different researchers.
Research is now widely carried out in different fields of the physical and social sciences, and proper reports must be written to convey its results. A research report contains all the evidence and valid references that support its interpretation. Hence, research report writing is the most important part of any research.
Define research report
State the importance of research report
List the steps in writing research report
State the different parts of a research report
Describe the preliminary part of a research report
Describe the main body of a research report
Discuss the different types of research report

10.2 Meaning and Importance of Research Report
Meaning
According to Zikmund, "A research report is an oral presentation and/or written statement whose purpose is to communicate research results, strategic recommendations, and/or other conclusions to management or other specific audiences."
Thus, a research report is an essential part of research, without which the research is incomplete. It not only presents the research findings but also summarizes the whole process of research. A report for a business firm is prepared to help in business decision making, as well as to identify the measures that will lead to the growth of the business.
A research proposal sets out the steps of a research work in a systematic order. Ranjit Kumar defines it as "an overall plan, scheme, structure and strategy designed to obtain answers to the research question or problems that constitute your research project, its main function is to detail the operational plan for obtaining answers to your research questions."
In fields like finance, marketing, sales, human resources, mathematics, statistics, medicine, biotechnology, social sciences, etc., research is carried out extensively, but the presentation of research reports differs according to the subject. However, to standardize these reports across fields, a set of guidelines and a format are followed, in order to obtain consistency in the reports. In this chapter, the standard format for a research report, that is, the main sections to be included in a report, is discussed.
For example, a Ph.D. thesis is a research report in which the whole process is written up to be communicated to readers after approval by the guide. Its emphasis is on unique findings in the respective field.

Importance
The importance of a research report is as follows:
It serves as the medium for communicating the research work.
Management decision making is aided by such reports.
It also helps in planning schemes and strategies for the future, based on its results.
It can be used as a reference in any related future research, for a more advanced study of the findings.
Characteristics of a Good Report
A good report is one which is clear, easily understood, and precise. To qualify as a good
report, it should satisfy the following characteristics:
It should be properly presented.
It should be attractive in its appearance.
The chapters and sections should be well organized.
It should be in simple language so that it is easily understood by its
readers/audience.
The facts mentioned in the report should be scientifically verified. Also, the data
should be checked so that there are no issues of validity and reliability.
Data should be collected in a practical way.
It should highlight the difficulties faced in data collection, not only its successes.
A report having the above characteristics is attractive in its approach and reaches a
wider audience.

10.3 Steps in Writing a Research Report
Step 1. Logical Analysis of Research Question under Study
Step 2. Preparation of the Outline
Step 3. Preparation of the Rough Draft
Step 4. Rewriting and Refining the Rough Draft
Step 5. Preparation of the Bibliography
Step 6. Final Draft Writing
Fig. 10.3a: Steps to Writing a Research Report
1. Logical Analysis of Research Question under Study
The first step in writing a research report is the analysis of the research question,
where two aspects are to be considered:
Logical: Analyze all logical associations and relations between the research
question under study and other studies, published papers, etc.
Chronological: Analyze all chronological evidence relating to the research
question under consideration.
2. Preparation of the Outline
The next step is to prepare an outline of the research work. By doing this, the
research can be framed in a systematic order and also, one can list out all the
important points to be considered during the research.

3. Preparation of the Rough Draft
After the outline of the research is prepared, the next step is to prepare a rough
draft. The rough draft covers the procedure for data collection and the limitations
faced in the collection, the statistical tools for analysis, the generalization of the
results, etc.
4. Rewriting and Refining the Rough Draft
In this step, all the shortcomings present in the rough draft of the research report are
rectified and a more refined draft is prepared.
5. Preparation of the Bibliography
Before the final step, you need to prepare the bibliography, which is the list of
books, journals, articles, etc., arranged in a particular manner. Books and pamphlets
are listed in the order: name of the author (last name first), title, place, publisher,
date of publication, and volume number. Magazines and newspapers are listed in the
order: name of the author (last name first), title of the article (in quotation marks),
name of the periodical, volume number, date of issue, and page number.
6. Final Draft Writing
Lastly, the final draft of the research is prepared, in which detailed information about
the research is given in simple language. It is a refined version of all the previous
drafts, resulting in a polished and proper research report.
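The ordering rules in step 5 are mechanical enough to sketch in code. The Python below is a minimal illustration under stated assumptions, not a prescribed tool: the class and function names are hypothetical, the comma separators and the "Vol."/"pp." labels are assumptions (the text fixes only the order of the elements, and the sample bibliography in Fig. 10.6a uses different punctuation), and sorting entries alphabetically by author surname is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Book:
    """A book or pamphlet entry (hypothetical field names)."""
    author_last: str
    author_first: str
    title: str
    place: str
    publisher: str
    year: int
    volume: Optional[str] = None


@dataclass
class PeriodicalArticle:
    """A magazine or newspaper entry (hypothetical field names)."""
    author_last: str
    author_first: str
    article_title: str
    periodical: str
    volume: str
    issue_date: str
    pages: str


def format_book(b: Book) -> str:
    # Order from the text: author (last name first), title, place,
    # publisher, date of publication, volume number.
    parts = [f"{b.author_last}, {b.author_first}", b.title, b.place,
             b.publisher, str(b.year)]
    if b.volume:  # the volume number is added only when the book has one
        parts.append(f"Vol. {b.volume}")
    return ", ".join(parts) + "."


def format_article(a: PeriodicalArticle) -> str:
    # Order from the text: author (last name first), article title in
    # quotation marks, periodical, volume, date of issue, page number(s).
    parts = [f"{a.author_last}, {a.author_first}", f'"{a.article_title}"',
             a.periodical, f"Vol. {a.volume}", a.issue_date, f"pp. {a.pages}"]
    return ", ".join(parts) + "."


def build_bibliography(entries: List[Union[Book, PeriodicalArticle]]) -> List[str]:
    # Assumed convention: sort alphabetically by author surname, then initials.
    ordered = sorted(entries, key=lambda e: (e.author_last.lower(), e.author_first.lower()))
    return [format_book(e) if isinstance(e, Book) else format_article(e)
            for e in ordered]


if __name__ == "__main__":
    book = Book("Conley", "R. J.", "The Cherokee Nation: A History",
                "Albuquerque", "University of New Mexico Press", 2005)
    # Hypothetical article, for illustration only.
    article = PeriodicalArticle("Doe", "J.", "Measuring Customer Loyalty",
                                "Journal of Retail Studies", "12", "March 2010", "45-52")
    for line in build_bibliography([article, book]):
        print(line)
```

Keeping the element order inside the two formatting functions means a change of house style (for example, colons instead of commas) touches only those functions, while the sorting step stays the same.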

10.4 Report Format
For an effective and valid research report, it is necessary to write the report in a
standard, widely accepted format. A report can be divided into three parts:
preliminary parts, main body, and appended parts. These three parts can be further
classified, as shown in Fig. 10.4a.
Research Report Parts
Preliminary Parts: Title Page; Letter of Authorization; Executive Summary (Objectives,
Results, Conclusions, Recommendations); Table of Content; List of Figures, Tables, and
Graphs
Main Body: Introduction; Methodology; Data Analysis and Results; Conclusions and
Recommendations
Appended Parts: Data Collection Forms; Detailed Calculations; General or Technical
Tables; Bibliography
Fig. 10.4a: Research Report Parts
A report should contain all the parts shown in Fig. 10.4a, in the same order, to be
accepted by its readers. Every report, irrespective of its subject, is advised to use this
standard format.
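Because the standard format fixes both which parts appear and their order, a draft's section list can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the section names are copied from Fig. 10.4a, every part is treated as required (in practice a list of figures, for instance, is needed only when there are many figures), and the function name check_report_outline is hypothetical.

```python
from typing import List

# Section order taken from Fig. 10.4a: preliminary parts, main body, appended parts.
STANDARD_ORDER: List[str] = [
    "Title Page",
    "Letter of Authorization",
    "Executive Summary",
    "Table of Content",
    "List of Figures, Tables, and Graphs",
    "Introduction",
    "Methodology",
    "Data Analysis and Results",
    "Conclusions and Recommendations",
    "Data Collection Forms",
    "Detailed Calculations",
    "General or Technical Tables",
    "Bibliography",
]


def check_report_outline(sections: List[str]) -> List[str]:
    """Return a list of problems: unknown, out-of-order, or missing sections."""
    position = {name: i for i, name in enumerate(STANDARD_ORDER)}
    problems: List[str] = []
    last = -1  # position of the furthest standard section seen so far
    for name in sections:
        if name not in position:
            problems.append(f"unknown section: {name}")
            continue
        if position[name] < last:
            # This section should have appeared before one already seen.
            problems.append(f"out of order: {name}")
        else:
            last = position[name]
    # Assumption: every standard part is required.
    for name in STANDARD_ORDER:
        if name not in sections:
            problems.append(f"missing: {name}")
    return problems


if __name__ == "__main__":
    draft = list(STANDARD_ORDER)
    draft[5], draft[6] = draft[6], draft[5]  # Methodology placed before Introduction
    print(check_report_outline(draft))
```

For the swapped draft in the usage example, the check reports the Introduction as out of order, since it appears after the Methodology section that should follow it.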
10.5 Preliminary Parts of Research Report
The preliminary parts of a research report consist of the following:
Title page
Letter of authorization
Table of content
List of figures, tables, and graphs
Title Page
The title page should state the title of the research, 'for whom' the report is prepared,
'by whom' it is prepared with the 'name of the institute/university/company', and the
'date of its release'. The title of the research report should correctly and completely
depict the purpose of the research.
Letter of Authorization
For the validity and approval of the research, a letter of authorization from the
concerned authority is required. The letter approves the work done for the research,
highlighting the details of the data and its sources. Along with this letter of
authorization, a letter of transmittal is given, which indicates the release of the report to
its readers.
EMR Research Group
August 30, 2009
Mr. Mario Lagasto
President, Leading Edge Food Group
Columbia, IA 50057
Re: Presentation of Research Identifying Customer Loyalty
Dear Mr. Lagasto:
The report outlined in the research proposal of March 15, 2009, is complete. I have
personally supervised the project, conducted the statistical analyses, and prepared this
report along with my two senior research associates, Natalia James and David Parker.
The report addresses the key decision statement: In what ways can your restaurants build
customer loyalty so that revenues increase through more frequent patronage? The key
research questions involve identifying controllable characteristics that end up relating to
greater share of wallet. As agreed upon in the proposal, the report offers no specific
recommendations for managerial action, but rather, it presents conclusions which should
enable you to make informed decisions. Thus, the conclusions conform to the
deliverables described in the proposal letter.
We successfully accomplished the research project as described in the outline. We were
able to meet our goals for interviewing groups of customers and non-customers in a
timely fashion. We are grateful for your business and look forward to working with you
as you develop strategic plans of action based on this report. Once you have taken a look
at the report, please contact me and we will schedule a formal presentation and
question and answer period for your management team.
Sincerely,
Barry J. Babin
President
EMR Research Group
Fig. 10.5a: Letter of Authorization
Source: Business Research Methods, 9th Edition, Zikmund, Babin, Carr, Griffin

Thus, the letter of authorization is a declaration given by the person who has verified the
whole study and declared it acceptable, as in Fig. 10.5a.
Table of Content
It is an essential part of any report. It lists all the topics covered, with their divisions
and subdivisions and page references. The table of content is prepared on the basis
of the final draft of the research.
Table of Contents
CHAPTER 1 RESEARCH FUNDAMENTALS AND TERMINOLOGY 14
DEFINITIONS OF BASIC RESEARCH TERMS 15
Research 15
Research Methods 15
Research Methodology 17
Scientific Methods 17
Research Process 19
Research Design 19
OBJECTIVES OF RESEARCH 20
MOTIVATION IN RESEARCH 21
SIGNIFICANCE OF RESEARCH IN GOVERNMENT, INDUSTRY, BUSINESS AND TRADE 23
SCOPE OF RESEARCH INCLUDES THE FOLLOWING AREAS 28
PRINCIPLES OF QUALITY RESEARCH WORK 29
PROBLEMS/LIMITATIONS OF RESEARCH 31
ISSUES AND TRENDS IN RESEARCH 33
SUMMARY 35
REVIEW EXERCISES 35
FURTHER READINGS 37
CHAPTER 2 IMPORTANCE OF RESEARCH IN MANAGEMENT DECISIONS 38
FUNDAMENTALS OF MANAGEMENT DECISIONS 39
Characteristics of Management Decisions 39
Elements of Decision Making 40
TYPES OF MANAGEMENT DECISIONS 44
Planning Decisions on Time Horizons 45
Static Planning Decisions 47
Dynamic Planning Decisions 47
Planning under Dynamic Conditions 48
Planning Intangible Decisions 50
Control Decisions 51
Programmed and Non-programmed Decisions 52
Routine and Strategic Decisions 51
Policy and Strategic Decisions 51
Departmental and Non-Economic Decisions 51
Fig. 10.5b: Table of Content
Source: http://ebookbrowsee.net/research-methodology-self-learning-manual-pdf-d184166142
List of Figures, Tables, and Graphs
If there are many figures, graphs, and tables in support of the research, a list of their
names with page references is given after the table of content in a research report.

These sections are included in the preliminary part of a research report, after which the
main body of the report begins, explaining the whole procedure and technique of the
research. In addition to these sections, the preliminary part also includes an executive
summary.
The Executive Summary
It is simply the summary of the whole report. It briefly explains all four parts of the
research:
Objectives: It states the important information and purpose of the research.
Results: It states the methodology and results of the research.
Conclusions: It states the interpretation of the results obtained and other
interpretations based on them.
Recommendations: It states any suggestions based on the conclusions.
A sample executive summary has been given below:
Executive Summary
Uncertainty associated with changes in carbon stock is from two additive variance
components:
o Prediction error, a measure of possible bias in the allometric equations used
to predict above-ground tree carbon.
o Sampling error, a measure of variation recognizing that only a very small
proportion of Kyoto forest is actually surveyed.
The average prediction error is estimated to be around 1%. This figure is likely to
be an underestimate, especially when estimating changes in carbon stock over
2008-2013. More biomass data are required to verify this uncertainty. While we
show the effects of varying prediction error on total uncertainty, the confidence
intervals in this report are calculated only in terms of sampling error.
The estimates of carbon stock from the 2004 Nelson and Marlborough pilot data
are 64.4 ± 12.6 t/ha (95% confidence interval). This estimate uses analytical
methods to calculate the uncertainty. Carbon stock is estimated from 104 plots for
six pools:
o Above-ground live planted trees
o Above-ground live other species (includes unplanted trees and shrubs)
o Below-ground live planted trees
o Below-ground live other species (includes unplanted trees only)
o Coarse woody debris
o Fine litter

Estimates of change in carbon stock using C_Change to predict carbon for 2008
and 2013 are 55.0 ± 10.3 t/ha. The estimate of change in carbon is for four
pools:
o Above-ground live planted trees
o Below-ground live planted trees
o Coarse woody debris
o Fine litter
Uncertainty is expected to be reduced in a nationwide survey: with 200 sites, the
confidence interval is estimated to be ±4.9 t/ha. This estimate assumes that the
two surveys, 2008 and 2013, are correlated with ρ = 0.90. There is some evidence
that the correlation could be as high as 0.97, but it may be less than this if there is
extra variation from genetic, silvicultural and climatic factors. A conservative
approach should be adopted in choosing the final number of sites in the nationwide
survey to allow for future extra variation.
Estimates of uncertainty have been derived using analytical methods and it is not
necessary to use Monte Carlo simulation.
Extra error from area definition will inevitably increase uncertainty associated with
total carbon stocks. The estimated errors apply to carbon density (t/ha).
Source: http://www.math.canterbury.ac.nz/research/ucdms2005n8.pdf
10.6 Main Body of Research Report
The main body of a research report has the following sections:
Introduction
Methodology
Data analysis and results
Conclusions and recommendations
The main body is followed by the appended part.
10.6.1 Introduction
The first section of the main body is the introduction, which introduces the research to
its readers. This section clearly states the objectives of the research and the reasons
the investigation was taken up. The main concepts involved in the research are
introduced and explained properly, so that the terms used throughout the report are
defined in this section.

Thus, the introduction helps readers comprehend the purpose and concepts of the
research. The introduction always follows the executive summary. A sample
introduction is given below:
Introduction
As a signatory to the Kyoto Protocol New Zealand has agreed to report, in a transparent
and verifiable manner, greenhouse gas emissions by sources, and removals by sinks,
associated with direct human-induced, land-use change and forestry activities. These
land-use change and forestry activities are limited to afforestation, reforestation, and
deforestation that have occurred since 1990. In order to provide the necessary data to
allow carbon stocks, and changes in carbon stock, to be estimated in accordance with
the recently-adopted Good Practice Guidance for Land-Use, Land-Use Change and
Forestry (IPCC 2003), a national forest inventory specifically designed for carbon
monitoring is being implemented. The initial focus of the inventory will be planted Kyoto-
compliant forests. These are forests that were established after 1 January 1990 on
land that did not previously contain forests. Part of the preliminary work associated
with the development of this national inventory consisted of a pilot survey, which was
conducted in the Nelson and Marlborough regions. The purpose of the pilot study was to
test the proposed field methodology and collect sufficient data to be able to produce
initial estimates of carbon stocks and stock changes.
Any large-scale survey will include some errors (Merritt et al. 2005). Good practice in
forest inventories means that uncertainty associated with the survey and estimation
should be reduced as far as practicable. Good practice also recognizes that while there
will be some uncertainty remaining, it should be identified. Uncertainty analysis is
concerned with this identification of credible limits to the accuracy of an estimate
(Cullen and Frey 1999). Moreover, the good practice guide (IPCC 2003) for the
preparation of greenhouse gas inventories stipulates that uncertainties associated with
estimates of sources and removals must be quantified.
In this report, we present estimates of the uncertainty associated with the carbon
estimates from the pilot study to demonstrate procedures for future analysis, when
carbon is assessed at a nationwide scale.
Source: http://www.math.canterbury.ac.nz/research/ucdms2005n8.pdf

As the samples above show, the executive summary is followed by the introduction of the
report. The executive summary uses bullet points to separate its different items and can
be regarded as the abstract of the report, whereas the introduction is written more
elaborately.
After explaining the research objectives, concepts, and purpose, the review of literature
is provided. It helps readers place the research in the context of other similar studies.
Each study cited should be identified by its author and year.
Methodology
Data for a research study should be collected in a scientific manner in order to obtain
valid results. This section explains the methods and techniques used to collect the data.
These methods are selected so as to minimize the biases that can enter the different
phases of data collection.
The methodology provides the following:
Research Design: This includes the study type, the source of data (primary or
secondary), details of how the data are collected, and the medium used in the
research.
Sample Design: It explains the type of sampling design and the sample size used
for data collection. The sampling type is chosen to suit the population from which
the data are to be collected.
The Fieldwork for Data Collection: This covers the whole process of field data
collection, including 'by whom', 'how' and 'where' the data will be collected.
Data Analysis and Results
After the methodology is clearly stated, the next section states what types of analysis
are performed to obtain the results of the study. This is the most important section of
the research report. The data analysis employed is explained briefly, along with the
reason it is suitable for the research. If the analysis is not appropriate for the population
under consideration, the results of the research may deviate from the actual findings.

A research study without results or findings is incomplete and not acceptable. Thus, it
is necessary to apply proper analysis and obtain the results. The interpretation of the
results indicates whether the hypothesis under consideration is supported or rejected.
Conclusions and Recommendations
This section presents the researcher's judgment based on the results obtained, along
with suggestions arising from them. That is, the researcher's view of the study is
summarized and communicated to the readers.
Appended Part
It consists of all the material or subsidiary documents related to the research. Technical
documents can also be added to the appended part. The documents/references that are
usually included in this part are:
Data Collection Forms
Detailed Calculations
General and Technical Tables
Bibliography
Data Collection Forms
All the questionnaires, schedules, or proformas used are placed in this section.
Detailed Calculations
The results often rest on calculations that need to be shown in full. Because the body of
the report gives only brief information, these calculations cannot be discussed there and
are provided in the appendix instead. Terms mentioned in the report that need further
explanation should also be discussed in the appended part.
General and Technical Tables
The statistical or measurement tables used to interpret the findings must be furnished
in the appendix.
Bibliography
It lists all the references of the research report; in some reports it is titled
'References'. Fig. 10.6a shows the bibliography of a report, listing all the books,
articles, and links used for reference.

BIBLIOGRAPHY
"Bureau of Indian Affairs". 12 July 2008. Department of Indian Affairs. 2002
<http://www.doi.gov/bia/>.
"Bureau of Indian Affairs: Quick Facts". 12 July 2008. Department of Indian Affairs.
2002 <http://www.doi.gov/bia/quick_facts.html>.
Conley, R. J. The Cherokee Nation: A History. Albuquerque: University of New
Mexico Press, 2005.
"Education Facts and History". 18 July 2008. National Indian Education Association.
2002 <http://www.niea.org/history/research.php>.
Ethridge, R. Creek Country: The Creek Indians and Their World. Chapel Hill: The
University of North Carolina Press, 2002.
Fenn, E. A., Wood, P. H., Watson, H. L., Clayton, T. H., Nathans, S., Parramore, T. C.,
et al. The Way We Lived in North Carolina. Chapel Hill, NC: The University of
North Carolina Press, 2003.
Gannon, M. FLORIDA: A Short History. Gainesville: University Press of Florida, 2003.
"Native Americans - American Indians - The First People of America". 12 July
2008. Native Americans Website. 2007 <http://www.nativeamericans.com/>.
Spencer, D. D. Seminole Indians in Old Picture Postcards. Ormond Beach:
Camelot Publishing Company, 2002.
Taylor, R. A. FLORIDA: An Illustrated History. New York: Hippocrene Books, Inc., 2005.
Fig. 10.6a: Bibliography
Source: http://img.docstoccdn.com/thumb/orig/123795011.png
10.7 Types of Research Report
A research report can be written in different types, depending on its target audience.
In terms of how the results and procedure of the research are presented, a report can
be of the following types:
Technical report
Popular report
Article
Monograph
Oral presentation

All these types of research report aim to describe the research completely; they differ
only in writing style, that is, in the way the whole procedure of the research is written.
For example, a study intended for the general public is presented in a simple but
concise way, whereas a report for an audience familiar with the technical terminology
of the subject under study can be more precise and technical.
The types of reports mentioned above are discussed below:
Technical Report
Such a report mainly concentrates on the research methodology used, the assumptions
required for the study, and the limitations of the study, with supporting evidence or
data.
The main sections in a technical report are:
1. Summary of the study: A brief overview of the research and its findings.
2. Nature of the study: It includes the objectives of the research, the formulation of
the research problem, and the hypothesis. It also includes details about the
population targeted for the study, which supplies the required data.
3. Methodology: It includes the methods and techniques used for data collection,
including the sample size, the type of sampling, and the medium used for
collecting data.
4. Data: This section describes the data and its characteristics in detail.
5. Analysis of data and interpretation of results: The data collected are analyzed with
suitable statistical tools. The interpretation of this output answers the research
question formulated and completes the whole research.
6. Conclusion: A detailed summary of the results and the suggestions drawn from
them are included in this section.
7. Bibliography: The sources consulted in the study are listed in this section.
8. Technical appendices: It includes all the documents related to the research, such
as the questionnaires/schedules and the tables/techniques used in analyzing the
data.

Popular Report
It is rightly named 'popular' because such a report is prepared in a generalized manner
for a mass audience. Because it is read by people from different backgrounds, it must be
written in very simple, understandable language. It uses more flowcharts, graphs, and
figures to explain the research simply but in detail.
The following sections are to be included while preparing a popular report:
1. Summary: This section highlights the general findings of the research and their
implementation in the practical world.
2. Recommendations for the research: The recommendations suggested on the basis
of the results are included in this section.
3. Objective of the study: The specific objectives of the research are given in this
section.
4. Methodology: The techniques and methods used for the research are included in
this section. The details are given in a way that is easily understood and avoids
technical jargon.
Article
An article is a short write-up published in a newspaper, magazine, or journal. It, too, is
meant for a mass audience. It is short, attractive, and less formal in its writing style.
It mainly includes the following sections:
1. Title: An attractive title to gain the attention of its readers.
2. Introduction: A clear overview of the research, presented in a short and simple way.
3. Main body: Two to five paragraphs describing the details of how the research
was done.
4. Conclusion: The final interpretation of the results or comments regarding the
study.

Monograph
Compared with all the reports mentioned above, a monograph is the most detailed
write-up and is written technically for a specific subject. Its main objective is to provide
insight into the topic under study and to be informative. The writer of such a report
must make sure that the topic has not been established in earlier studies, that is, it
should be unique. However, it can advance the results of a previous study.
The readership of a monograph is very limited because it is subject-specific rather than
general.
Oral Presentation
An oral presentation is also an essential form of report presentation for explaining a
study to its clients/readers. In such a presentation, the researcher can highlight the
importance, objectives, and results of the study concisely by using more graphs,
tables, and flowcharts. Since it is a face-to-face presentation, the audience can clear
their doubts by asking the researcher relevant questions. Its advantage is the active
interaction between the researcher and the audience.
