Unit 1
Conceptual research involves the investigation of thoughts and ideas, and developing new ideas or interpreting old ones based on logical reasoning. In contrast, empirical research is based on firm, verifiable data collected either by observation of facts under natural conditions or through experimentation.
Scope of Research:
The scope of the study explains the extent to which the research area will be explored in the work.
Basically, the scope of research is divided into three parts.
Environmental Level
Technological innovations
Competitive analysis
Industry trends
New market entry
New product development
Organizational Level
HRM
Finance
Production
Organizational effectiveness and success
Marketing Level
Product
Price
Place
Promotion
Sales
Limitations:
If business activities are carried out on the basis of customs and traditions, a research study becomes irrelevant.
Research activities are very expensive.
It may not be feasible for small and medium scale units.
Research Process:
Before embarking on the details of research methodology and techniques, it seems appropriate to present a brief overview of the research process. The research process consists of a series of actions or steps necessary to effectively carry out research, and the desired sequencing of these steps. The chart shown in the figure illustrates the research process. The chart indicates that the research process consists of a number of closely related activities, shown as I through VII. But such activities overlap continuously rather than following a strictly prescribed sequence.
At times, the first step determines the nature of the last step to be undertaken. If subsequent
procedures have not been taken into account in the early stages, serious difficulties may arise which
may even prevent the completion of the study. One
should remember that the various steps involved in a research process are not mutually exclusive;
nor are they separate and distinct.
They do not necessarily follow each other in any specific order and the researcher has to be
constantly anticipating at each step in the research process the requirements of the subsequent
steps. However, the following order concerning various steps provides a useful procedural guideline
regarding the research process:
formulating the research problem;
extensive literature survey;
developing the hypothesis;
preparing the research design;
determining sample design;
collecting the data;
execution of the project;
analysis of data;
hypothesis testing;
generalizations and interpretation, and
preparation of the report or presentation of the results, i.e., formal write-up of conclusions reached.
1. Formulating the research problem: There are two types of research problems, viz., those which relate to states of nature and those which relate to relationships between variables. At the very outset the researcher must single out the problem he wants to study, i.e., he must decide the general area of interest or aspect of a subject-matter that he would like to inquire into. Initially the problem may be stated in a broad, general way and then the ambiguities, if any, relating to the problem resolved. Then, the feasibility of a particular solution has to be considered before a working formulation of the problem can be set up. The formulation of a general topic into a specific research problem thus constitutes the first step in a scientific enquiry. Essentially two steps are involved in formulating the research problem, viz., understanding the problem thoroughly, and rephrasing the same into meaningful terms from an analytical point of view.
The best way of understanding the problem is to discuss it with one’s own colleagues or with those having some expertise in the matter. In an academic institution the researcher can seek help from a guide, who is usually an experienced person with several research problems in mind.
He may review two types of literature—the conceptual literature concerning the concepts and
theories, and the empirical literature consisting of studies made earlier which are similar to the one
proposed.
This task of formulating, or defining, a research problem is a step of greatest importance in the entire
research process. The problem to be investigated must be defined unambiguously for that will help
discriminating relevant data from irrelevant ones.
2. Extensive literature survey: Once the problem is formulated, a brief summary of it should be written down. It is compulsory for a research worker writing a thesis for a Ph.D. degree to write a synopsis of the topic and submit it to the necessary committee or the Research Board for approval. At this juncture the researcher should undertake an extensive literature survey connected with the problem.
For this purpose, the abstracting and indexing journals and published or unpublished bibliographies
are the first place to go to. Academic journals, conference proceedings, government reports, books
etc., must be tapped depending on the nature of the problem. In this process, it should be
remembered that one source will lead to another. The earlier studies, if any, which are similar to the study in hand should be carefully studied. A good library will be a great help to the researcher at this
stage.
3. Development of working hypotheses: After the extensive literature survey, the researcher should state in clear terms the working hypothesis or hypotheses. A working hypothesis is a tentative assumption made in order to draw out and test its logical or empirical consequences. As such the manner in
which research hypotheses are developed is particularly important since they provide the focal point
for research.
They also affect the manner in which tests must be conducted in the analysis of data and indirectly
the quality of data which is required for the analysis. In most types of research, the development of
working hypothesis plays an important role.
Hypothesis should be very specific and limited to the piece of research in hand because it has to be
tested. The role of the hypothesis is to guide the researcher by delimiting the area of research and to
keep him on the right track. It sharpens his thinking and focuses attention on the more important
facets of the problem. It also indicates the type of data required and the type of methods of data
analysis to be used.
How does one go about developing working hypotheses? The answer is by using the following
approach:
Discussions with colleagues and experts about the problem, its origin and the objectives in seeking a
solution;
Examination of data and records, if available, concerning the problem for possible trends,
peculiarities and other clues;
Review of similar studies in the area or of the studies on similar problems; and
Exploratory personal investigation, which involves original field interviews on a limited scale with interested parties and individuals, with a view to securing greater insight into the practical aspects of
the problem.
4. Preparing the research design: The research problem having been formulated in clear-cut terms, the researcher will be required to prepare a research design, i.e., he will have to state the conceptual structure within which research would be conducted. The preparation of such a design facilitates research to be as efficient as possible, yielding maximal information.
In other words, the function of research design is to provide for the collection of relevant evidence
with minimal expenditure of effort, time and money. But how all these can be achieved depends
mainly on the research purpose. Research purposes may be grouped into four categories,
Exploration,
Description,
Diagnosis, and
Experimentation.
5. Determining sample design: All the items under consideration in any field of inquiry constitute a ‘universe’ or ‘population’. A complete enumeration of all the items in the ‘population’ is known as a census inquiry. It can be presumed that in such an inquiry, when all the items are covered, no element of chance is left and the highest accuracy is obtained. But in practice this may not be true.
Even the slightest element of bias in such an inquiry will get larger and larger as the number of observations increases. Moreover, there is no way of checking the element of bias or its extent except through a resurvey or use of sample checks. Besides, this type of inquiry involves a great deal of time, money and energy. Not only this, a census inquiry is not possible in practice under many circumstances. For instance, blood testing is done only on a sample basis. Hence, quite often we select only a few items from the universe for our study purposes. The items so selected constitute what is technically called a sample.
The researcher must decide the way of selecting a sample or what is popularly known as the sample
design. In other words, a sample design is a definite plan determined before any data are actually
collected for obtaining a sample from a given population. Thus, the plan to select 12 of a city’s 200
drugstores in a certain way constitutes a sample design. Samples can be either probability samples or
non- probability samples.
With probability samples each element has a known probability of being included in the sample but
the non-probability samples do not allow the researcher to determine this probability.
a. Deliberate sampling
b. Systematic sampling
c. Stratified sampling
d. Quota sampling
e. Multi-stage sampling
f. Sequential sampling
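As a rough illustration, a few of these sample designs can be sketched with Python's standard random module. The 200-drugstore universe mirrors the example above; the split into four equal city zones for the stratified draw is a hypothetical assumption.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical universe: a city's 200 drugstores, labelled 1..200
population = list(range(1, 201))

# Simple random sampling: every item has an equal chance of selection
simple = random.sample(population, 12)

# Systematic sampling: pick every k-th item after a random start
k = len(population) // 12
start = random.randrange(k)
systematic = population[start::k][:12]

# Stratified sampling: divide the universe into strata (here, 4 hypothetical
# city zones of 50 stores each) and draw 3 stores from each stratum
strata = [population[i:i + 50] for i in range(0, 200, 50)]
stratified = [store for stratum in strata
              for store in random.sample(stratum, 3)]

print(len(simple), len(systematic), len(stratified))
```

Each design yields a sample of 12, but the stratified draw guarantees that every zone is represented, which a simple random draw does not.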
6. Collecting the data: In dealing with any real-life problem it is often found that data at hand are inadequate, and hence it becomes necessary to collect data that are appropriate. There are several ways of collecting the appropriate data, which differ considerably in context of money costs, time and other resources at the disposal of the researcher. Primary data can be collected either through experiment or through survey. If the researcher conducts an experiment, he observes some quantitative measurements, or the data, with the help of which he examines the truth contained in his hypothesis. But in the case of a survey, data can be collected by any one or more of the following ways:
By observation
Through personal interview
Through telephone interviews
By mailing of questionnaires
Through schedules
7. Execution of the project: Execution of the project is a very important step in the research process. If the execution of the project proceeds on correct lines, the data to be collected would be adequate and dependable. The researcher should see that the project is executed in a systematic manner and in time. If the survey is to be conducted by means of structured questionnaires, data can be readily machine-processed. In such a situation, questions as well as the possible answers may be coded. If the data are to be collected through interviewers, arrangements should be made for proper selection and training of the interviewers.
The training may be given with the help of instruction manuals which explain clearly the job of the interviewers at each step. Occasional field checks should be made to ensure that the interviewers are doing their assigned job sincerely and efficiently. A careful watch should be kept for unanticipated factors in order to keep the survey as realistic as possible. This, in other words, means that steps should be taken to ensure that the survey is under statistical control, so that the collected information is in accordance with the pre-defined standard of accuracy.
8. Analysis of data: After the data have been collected, the researcher turns to the task of analyzing them. The analysis of data requires a number of closely related operations such as the establishment of categories, and the application of these categories to raw data through coding, tabulation and then drawing statistical inferences. The unwieldy data
should necessarily be condensed into a few manageable groups and tables for further analysis. Thus, the researcher should classify the raw data into some purposeful and usable categories. The coding operation is usually done at this stage, through which the categories of data are transformed into symbols that may be tabulated and counted. Editing is the procedure that improves the quality of the data for coding. With coding, the stage is ready for tabulation. Tabulation is a part of the technical procedure wherein the classified data are put in the form of tables. Mechanical devices can be used at this juncture. A great deal of data, especially in large inquiries, is tabulated by computers. Computers not only save time but also make it possible to study a large number of variables affecting a problem simultaneously.
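The coding-and-tabulation sequence described above can be illustrated with a short Python sketch; the response categories and the numeric codebook are hypothetical.

```python
from collections import Counter

# Hypothetical edited survey responses, ready for coding
responses = ["satisfied", "very satisfied", "satisfied", "unsatisfied",
             "satisfied", "very satisfied", "unsatisfied", "satisfied"]

# Coding: transform each response category into a numeric symbol
codebook = {"very satisfied": 3, "satisfied": 2, "unsatisfied": 1}
coded = [codebook[r] for r in responses]

# Tabulation: count how many responses fall in each coded category,
# and express each count as a percentage of the total
table = Counter(coded)
total = len(coded)
for code in sorted(table):
    pct = 100 * table[code] / total
    print(f"code {code}: {table[code]} responses ({pct:.1f}%)")
```

The resulting table of counts and percentages is exactly the kind of condensed summary on which further statistical analysis is based.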
Analysis work after tabulation is generally based on the computation of various percentages,
coefficients, etc., by applying various well defined statistical formulae. In the process of analysis,
relationships or differences supporting or conflicting with original or new hypotheses should be
subjected to tests of significance to determine with what validity data can be said to indicate any
conclusion(s).
9. Hypothesis-testing: After analyzing the data as stated above, the researcher is in a position to test the hypotheses, if any, he had formulated earlier. Do the facts support the hypotheses, or do they happen to be contrary? This is the usual question which should be answered while testing hypotheses. Various tests, such as the chi-square test, t-test and F-test, have been developed by statisticians for the purpose. The hypotheses may be tested through the use of one or more of such tests, depending upon the nature and object of the research inquiry. Hypothesis-testing will result in either accepting the hypothesis or in rejecting it. If the researcher had no hypotheses to start with, generalizations established on the basis of data may be stated as hypotheses to be tested by subsequent research.
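As a minimal sketch of one such test, the pooled two-sample t statistic can be computed with Python's standard statistics module. The two groups of scores are hypothetical, and deciding significance would still require comparing t against a critical value for the given degrees of freedom.

```python
from statistics import mean, variance  # variance() uses the n-1 denominator

def two_sample_t(x, y):
    """Pooled two-sample t statistic for comparing two group means."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5

# Hypothetical scores from two groups of respondents
group_a = [82, 75, 90, 68, 77, 85]
group_b = [70, 65, 72, 60, 74, 69]

t = two_sample_t(group_a, group_b)
df = len(group_a) + len(group_b) - 2  # degrees of freedom
print(f"t = {t:.2f} with {df} degrees of freedom")
```

A large positive t here suggests that group A's mean genuinely exceeds group B's; the researcher would accept or reject the hypothesis by comparing t with the tabulated critical value at the chosen significance level.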
10. Generalizations and interpretation: If a hypothesis is tested and upheld several times, it may
be possible for the researcher to arrive at generalization, i.e., to build a theory. As a matter of fact, the
real value of research lies in its ability to arrive at certain generalizations. If the researcher had no
hypothesis to start with, he might seek to explain his findings on the basis of some theory. It is
known as interpretation. The process of interpretation may quite often trigger new questions, which in turn may lead to further research.
11. Preparation of the report or the thesis: Finally, the researcher has to prepare a report of what has been done. The report must be written with great care. The layout of the report should be as follows:
i. the preliminary pages;
ii. the main text; and
iii. the end matter.
Research Design:
Research design is the framework of research methods and techniques chosen by a researcher. A
research design is the arrangement of conditions for collection and analysis of data in a manner that
aims to combine relevance to the research purpose with economy in procedure. The research design
is the conceptual structure within which research is conducted. It constitutes the blueprint for the
collection, measurement and analysis of data. The design includes an outline of what the researcher
will do from writing the hypothesis and its operational implications to the final analysis of data. The
design decisions can be taken by considering the following heads:
What is the study about?
Why is the study being made?
Where will the study be carried out?
What type of data is required?
Where can the required data be found?
What periods of time will the study include?
What will be the sample design?
What techniques of data collection will be used?
How will the data be analysed?
In what style will the report be prepared?
The essential elements of the research design are:
Accurate purpose statement
Techniques to be implemented for collecting and analyzing research data
The method applied for analyzing collected details
Type of research methodology
Probable objections for research
Settings for the research study
Timeline
Measurement of analysis
Proper research design sets your study up for success. Successful research studies provide
insights that are accurate and unbiased. You’ll need to create a survey that meets all of the
main characteristics of a design.
There are four key characteristics of research design:
1. Neutrality: When you set up your study, you may have to make assumptions about the data
you expect to collect. The results projected in the research design should be free from bias and
neutral. Understand opinions about the final evaluated scores and conclusions from multiple
individuals and consider those who agree with the derived results.
2. Reliability: With regularly conducted research, the researcher involved expects similar
results every time. Your design should indicate how to form research questions to ensure the
standard of results. You’ll only be able to reach the expected results if your design is reliable.
3. Validity: There are multiple measuring tools available. However, the only correct measuring
tools are those which help a researcher in gauging results according to the objective of the
research. The questionnaire developed from this design will then be valid.
4. Generalization: The outcome of your design should apply to a population and not just a
restricted sample. A generalized design implies that your survey can be conducted on any part
of a population with similar accuracy.
The above factors affect the way respondents answer the research questions and so all the
above characteristics should be balanced in a good design. A researcher must have a clear
understanding of the various types of research design to select which model to implement for a
study. Like research itself, the design of your study can be broadly classified into quantitative
and qualitative.
Qualitative research design: Qualitative research explores relationships between collected data and observations through non-numerical evidence such as interviews, open-ended responses and field notes, rather than through mathematical calculation. Researchers rely on qualitative research design methods that conclude “why” a particular theory exists along with “what” respondents have to say about it.
Quantitative research design: Quantitative research is for cases where statistical conclusions
to collect actionable insights are essential. Numbers provide a better perspective to make
critical business decisions. Quantitative research design methods are necessary for the growth
of any organization. Insights drawn from hard numerical data and analysis prove to be highly
effective when making decisions related to the future of the business.
Types of Research Design:
1. Descriptive research design: In a descriptive design, a researcher is solely interested in
describing the situation or case under their research study. It is a theory-based design method
which is created by gathering, analyzing, and presenting collected data. This allows a
researcher to provide insights into the why and how of research. Descriptive design helps
others better understand the need for the research. If the problem statement is not clear, you
can conduct exploratory research.
2. Experimental research design: Experimental research design establishes a relationship
between the cause and effect of a situation. It is a causal design where one observes the impact
caused by the independent variable on the dependent variable. For example, one monitors the
influence of an independent variable such as a price on a dependent variable such as customer
satisfaction or brand loyalty. It is a highly practical research design method as it contributes to
solving a problem at hand. The independent variables are manipulated to monitor the change
it has on the dependent variable. It is often used in social sciences to observe human behavior
by analyzing two groups. Researchers can have participants change their actions and study
how the people around them react to gain a better understanding of social psychology.
3. Correlational research design: Correlational research is a non-experimental
research design technique that helps researchers establish a relationship between two closely
connected variables. This type of research requires two different groups. There is no
assumption while evaluating a relationship between two different variables, and statistical
analysis techniques calculate the relationship between them. A correlation coefficient
determines the correlation between two variables, whose value ranges between -1 and +1. If
the correlation coefficient is towards +1, it indicates a positive relationship between the
variables and -1 means a negative relationship between the two variables.
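The coefficient described here can be computed directly from its definition. Below is a minimal Python sketch, with hypothetical advertising-spend and sales figures standing in for the two variables:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sx = sqrt(sum((a - mx) ** 2 for a in x))               # spread of x
    sy = sqrt(sum((b - my) ** 2 for b in y))               # spread of y
    return cov / (sx * sy)

# Hypothetical data: advertising spend vs. units sold
spend = [10, 20, 30, 40, 50]
sold = [12, 25, 31, 38, 52]

r = pearson_r(spend, sold)
print(f"r = {r:.3f}")  # close to +1: strong positive relationship
```

A value near +1, as here, indicates that the two variables rise together; a value near -1 would indicate that one falls as the other rises.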
4. Diagnostic research design: In diagnostic design, the researcher is looking to evaluate the
underlying cause of a specific topic or phenomenon. This method helps one learn more about
the factors that create troublesome situations.
This design has three parts of the research:
· Inception of the issue
· Diagnosis of the issue
· Solution for the issue
5. Explanatory research design: Explanatory design uses a researcher’s ideas and thoughts
on a subject to further explore their theories. The research explains unexplored aspects of a
subject and details about what, how, and why of research questions.
Terminologies:
RESEARCH: Research is defined as a systematic and scientific process to answer questions
about facts and relationship between facts. It is an activity involved in seeking answer to
unanswered questions.
ABSTRACT: A clear, concise summary that communicates the essential information about the
study. In research journals, it is usually located at the beginning of an article.
DATA: Units of information or any statistics, facts, figures, general material, evidence, or
knowledge collected during the course of the study.
VARIABLES: Attributes or characteristics that can have more than one value, such as height or
weight. Variables are qualities or quantities, properties or characteristics of people, things, or
situations that change or vary.
INDEPENDENT VARIABLE: Variables that are purposely manipulated or changed by the researcher. It is also called the “MANIPULATED VARIABLE”.
RESEARCH VARIABLE: Refers to Qualities, Properties or Characteristics which are observed
or measured in a natural setting without manipulating & establishing cause & effect
relationship
DEMOGRAPHIC VARIABLES: The characteristics & attributes of study subjects such as age,
gender, place of living, educational status, religion, social class, marital status, occupation,
income are considered as demographic variables.
EXTRANEOUS VARIABLES: Are factors that are not the part of the study but may affect the
measurements of the study variable.
OPERATIONAL DEFINITION: Refers to the way in which the researcher defines the variables
under investigation. Operational definition is stated in such way by the investigator specifying
how the study variables will be measured in the actual research situation.
CONCEPT: Refers to a mental idea of a phenomenon. Concepts are words or terms that symbolize some aspects of reality. E.g., love, pain.
CONSTRUCT: Is a highly abstract & complex phenomenon (concept) which is denoted by a
made up or constructed term.
PROPOSITION: A Proposition is a statement or assertion of the relationship between
concepts. E.g., relationship between anxiety and performance.
CONCEPTUAL FRAMEWORK: Interrelated concepts or abstractions that are assembled
together in some rational scheme by virtue of their relevance to a common theme. It is also
referred to as theoretical framework.
ASSUMPTION: A basic principle that is assumed to be true on the basis of logic or reason, without proof or verification.
HYPOTHESIS: A statement of the predicted relationship between two or more variables in a
research study; an educated or calculated guess by the researcher.
LITERATURE REVIEW: A critical summary or research on a topic of interest, generally
prepared to put a research problem in context or to identify gaps and weaknesses in prior
studies so as to justify a new investigation.
LIMITATIONS: Restrictions in a study that may decrease the credibility and generalizability of
the research findings.
MANIPULATION: An intervention or treatment introduced by the researcher in an
experimental or quasi experimental study; the researcher manipulates the independent
variable to assess its impact on the dependent variable.
POPULATION: The entire set of individuals or objects having some common characteristic(s)
selected for a research study is referred to as population.
TARGET POPULATION: The entire population in which the researchers are interested and to
which they would like to generalize the research findings.
ACCESSIBLE POPULATION: The aggregate of cases that conform to designated inclusion or
exclusion criteria and that are accessible as subjects of the study.
RESEARCH SETTING: The study setting is the location in which the research is conducted. It
could be natural, partially controlled environment or laboratories.
SAMPLE: A part or subset of population selected to participate in the research study.
SAMPLING: The process of selecting sample from the target population to represent the entire
population.
PROBABILITY SAMPLING: The selection of subjects or sampling units from a population
using random procedure; E.g., Simple random Sampling, Stratified random Sampling.
NON PROBABILITY SAMPLING: The selection of subjects or sampling units from a population
using non random procedure. E.g., Convenient Sampling, Purposive Sampling.
RELIABILITY: The degree of consistency or accuracy with which an instrument measures the
attributes it is designed to measure.
VALIDITY: The degree to which an instrument measures what it is intended to measure.
PILOT STUDY: Study carried out at the end of the planning phase of research in order to
explore and test the research elements to make relevant modifications in research tools and
methodology.
ANALYSIS: Method of organizing, sorting, and scrutinizing data in such a way that research
question can be answered or meaningful inferences can be drawn.
RESEARCH PROJECT:
A research project is a scientific attempt to answer a research question. A research project
must include a description of a defined protocol, clearly defined goals, defined methods and outputs, and a defined start and end date.
Choice of Topic
The ability to develop a good research topic is an important skill. An instructor may assign you
a specific topic, but most often instructors require you to select your own topic of interest.
When deciding on a topic, there are a few things that you will need to do:
Brainstorm for ideas.
Choose a topic that will enable you to read and understand the literature.
Ensure that the topic is manageable, and that material is available.
Make a list of key words.
Be flexible.
Define your topic as a focused research question.
Research and read more about your topic.
Formulate a thesis statement.
Be aware that selecting a good topic may not be easy. It must be narrow and focused enough to
be interesting, yet broad enough to find adequate information. Before selecting your topic,
make sure you know what your final project should look like. Each class or instructor will
likely require a different format or style of research project.
Use the steps below to guide you through the process of selecting a research topic.
Step 1: Brainstorm for ideas
Choose a topic that interests you. Use the following questions to help generate topic ideas.
Do you have a strong opinion on a current social or political controversy?
Did you read or see a news story recently that has piqued your interest or made you angry or
anxious?
Do you have a personal issue, problem or interest that you would like to know more about?
Do you have a research paper due for a class this semester?
Is there an aspect of a class that you are interested in learning more about?
Step 2: Read General Background Information
Read a general encyclopedia article on the top two or three topics you are considering. Reading
a broad summary enables you to get an overview of the topic and see how your idea relates to
broader, narrower, and related issues. It also provides a great source for finding words
commonly used to describe the topic. These keywords may be very useful to your later
research. If you can’t find an article on your topic, try using broader terms and ask for help
from a librarian. For example, the Encyclopedia Britannica Online (or the printed version of
this encyclopedia, in Thompson Library's Reference Collection on Reference Table 1) may not
have an article on Social and Political Implications of Jackie Robinson’s Breaking of the Color Barrier in Major League Baseball, but there will be articles on baseball history and on Jackie Robinson.
Step 3: Focus on Your Topic : Keep it manageable. A topic will be very difficult to research if
it is too broad or narrow. One way to narrow a broad topic such as "the environment" is to
limit your topic. Some common ways to limit a topic are:
by geographical area
Example: What environmental issues are most important in the Southwestern United States?
by culture
Example: How does the environment fit into the Navajo world view?
by time frame: Example: What are the most prominent environmental issues of the last 10
years?
by discipline
Example: How does environmental awareness affect business practices today?
by population group
Example: What are the effects of air pollution on senior citizens?
locally confined - Topics this specific may only be covered in these (local) newspapers, if at
all.
Example: What sources of pollution affect the Genesee County water supply?
recent - If a topic is quite recent, books or journal articles may not be available, but newspaper
or magazine articles may. Also, Web sites related to the topic may or may not be available.
broadly interdisciplinary - You could be overwhelmed with superficial information.
Example: How can the environment contribute to the culture, politics and society of the
Western states?
popular - You will only find very popular articles about some topics such as sports figures and
high-profile celebrities and musicians.
If you have any difficulties or questions with focusing your topic, discuss the topic with your instructor or with a librarian.
Step 4: Make a List of Useful Keywords
Keep track of the words that are used to describe your topic.
Look for words that best describe your topic
Look for them when reading encyclopedia articles and background and general information
Find broader and narrower terms, synonyms, key concepts for key words to widen your search
capabilities
Make note of these words and use them later when searching databases and catalogs
Step 5: Be Flexible
It is common to modify your topic during the research process. You can never be sure of what
you may find. You may find too much and need to narrow your focus, or too little and need to
broaden your focus. This is a normal part of the research process. When researching, you may
not wish to change your topic, but you may decide that some other aspect of the topic is more
interesting or manageable.
Keep in mind the assigned length of the research paper, project, bibliography or other research
assignment. Be aware of the depth of coverage needed and the due date. These important
factors may help you decide how much and when you will modify your topic.
Writing Research Proposal:
Add a meaningful short title: provide a brief and meaningful title to your project.
Introduction: Background or introduction section provides a description of the basic facts and
importance of the research area.
What is your research area?
What is the motivation of research?
How important is it for the industry?
Knowledge advancement.
Problem Statement: Problem statement provides a clear and concise description of the issues
that need to be addressed.
What is the specific problem in that research area that you will address?
For example: lack of understanding of a subject, low performance, etc.
Objectives: It provides a list of goals that will be achieved through the proposed research.
What are the benefits/impact that will be generated if the research problem is answered?
Why should this research be done?
Preliminary Literature Review: Provide a summary of previous related research on the
research problem, including its strengths and weaknesses, and a justification of your research.
What is known/what has been done by others?
Why is your research still necessary?
Research Methodologies: what to do and how to solve the problem and achieve proposed
objectives. Which research methods will be used? Attach a project schedule table, if necessary.
Reference: All factual material that is not original with you must be accompanied by a
reference to its source. Follow the proper referencing guidelines as directed by the research
approval authorities.
Hypothesis: A hypothesis can be defined as a tentative prediction or explanation of the
relationship between two or more variables. Hypotheses are not meant to be haphazard
guesses, but should reflect the depth of knowledge, imagination and experience of the
investigator. In the process of formulating the hypothesis all variables relevant to the study
must be identified.
Time Frame: The researcher should include an outline of the various stages and
corresponding time frames for developing and implementing the research, including writing
up the research. For full-time study the research should be completed within three years, with
writing up completed in the fourth year of registration. For part-time study the research
should be completed within six years, with writing up completed by the eighth year.
UNIT-2
SURVEY RESEARCH DESIGN:
Survey research designs are procedures in quantitative research in which investigators
administer a survey to a sample or to the entire population of people to describe the attitudes,
opinions, behaviors, or characteristics of the population. In this procedure, survey researchers
collect quantitative, numbered data using questionnaires (e.g., mailed questionnaires) or
interviews (e.g., one-on-one interviews) and statistically analyse the data to describe trends
about responses to questions and to test research questions or hypotheses.
Types of Survey Designs:
1. Cross-Sectional Survey Designs:
In a cross-sectional survey design, the researcher collects data at one point in time.
For example, when middle school children complete a survey about teasing, they are recording
data about their present views.
This design has the advantage of measuring current attitudes or practices.
It also provides information in a short amount of time.
A cross-sectional study can examine current attitudes, beliefs, opinions, or practices. Attitudes,
beliefs, and opinions are ways in which individuals think about issues, whereas practices are
their actual behaviours.
Another cross-sectional design compares two or more educational groups in terms of
attitudes, beliefs, opinions, or practices. These group comparisons may compare students with
students, students with teachers, students with parents, or they may compare other groups
within educational and school settings.
2. Longitudinal Survey Designs:
An alternative to using a cross-sectional design is to collect data over time using a longitudinal
survey design.
A longitudinal survey design involves the survey procedure of collecting data about trends
with the same population, changes in a cohort group or subpopulation, or changes in a panel
group of the same individuals over time.
Thus, in longitudinal designs, the participants may be different or the same people.
An example of a longitudinal design would be a follow-up with graduates from a program or
school to learn their views about their educational experiences.
SAMPLING:
Sampling definition:
Sampling is a technique of selecting individual members or a subset of the population to make
statistical inferences from them and estimate characteristics of the whole population.
Types of sampling: sampling methods:
Sampling in market research is of two types – probability sampling and non-probability
sampling. Let’s take a closer look at these two methods of sampling.
1. Probability sampling: Probability sampling is a sampling technique where a researcher
sets a selection of a few criteria and chooses members of a population randomly. All the
members have an equal opportunity to be a part of the sample with this selection parameter.
There are four main types of probability sampling:
a. Simple random sampling:
In a simple random sample, every member of the population has an equal chance of being
selected. Your sampling frame should include the whole population.
To conduct this type of sampling, you can use tools like random number generators or other
techniques that are based entirely on chance.
Example
You want to select a simple random sample of 100 employees of Company X. You assign a
number to every employee in the company database from 1 to 1000 and use a random number
generator to select 100 numbers.
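The example above can be sketched in Python (the employee numbers are hypothetical; `random.sample` draws without replacement, so no employee is picked twice):

```python
import random

# Hypothetical company database: employees numbered 1 to 1000.
employee_ids = list(range(1, 1001))

# Draw a simple random sample of 100 employees; every employee
# has the same chance of being selected.
sample = random.sample(employee_ids, k=100)

print(len(sample))  # 100 distinct employee numbers
```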
b. Systematic sampling:
Systematic sampling is like simple random sampling, but it is usually slightly easier to conduct.
Every member of the population is listed with a number, but instead of randomly generating
numbers, individuals are chosen at regular intervals.
Example
All employees of the company are listed in alphabetical order. From the first 10 numbers, you
randomly select a starting point: number 6. From number 6 onwards, every 10th person on the
list is selected (6, 16, 26, 36, and so on), and you end up with a sample of 100 people.
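The same procedure can be sketched as follows (a hypothetical list of 1,000 employees; only the starting point is chosen at random, the rest follow at fixed intervals):

```python
import random

# Employees listed in alphabetical order, numbered 1 to 1000.
population = list(range(1, 1001))
interval = 10  # population size divided by desired sample size

# Randomly pick a starting point within the first interval,
# then take every 10th person from there.
start = random.randint(1, interval)
sample = population[start - 1::interval]

print(len(sample))  # 100 people, e.g. 6, 16, 26, ... if start is 6
```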
c. Stratified sampling:
Stratified sampling involves dividing the population into subpopulations that may differ in
important ways. It allows you to draw more precise conclusions by ensuring that every subgroup
is properly represented in the sample.
To use this sampling method, you divide the population into subgroups (called strata) based
on the relevant characteristic (e.g. gender, age range, income bracket, job role).
Based on the overall proportions of the population, you calculate how many people should be
sampled from each subgroup. Then you use random or systematic sampling to select a sample
from each subgroup.
Example
The company has 800 female employees and 200 male employees. You want to ensure that the
sample reflects the gender balance of the company, so you sort the population into two strata
based on gender. Then you use random sampling on each group, selecting 80 women and 20
men, which gives you a representative sample of 100 people.
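The 80/20 split above can be sketched as a short Python example (the staff lists are hypothetical; each stratum is sampled in proportion to its share of the population):

```python
import random

# Hypothetical staff lists: 800 female and 200 male employees.
women = [f"W{i}" for i in range(800)]
men = [f"M{i}" for i in range(200)]

sample_size = 100
total = len(women) + len(men)

# Allocate the sample proportionally, then sample each stratum at random.
women_sample = random.sample(women, k=sample_size * len(women) // total)  # 80
men_sample = random.sample(men, k=sample_size * len(men) // total)        # 20

sample = women_sample + men_sample
print(len(sample))  # 100 people mirroring the 80/20 gender balance
```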
d. Cluster sampling:
Cluster sampling also involves dividing the population into subgroups, but each subgroup
should have similar characteristics to the whole sample. Instead of sampling individuals from
each subgroup, you randomly select entire subgroups.
If it is practically possible, you might include every individual from each sampled cluster. If the
clusters themselves are large, you can also sample individuals from within each cluster using
one of the techniques above.
This method is good for dealing with large and dispersed populations, but there is more risk of
error in the sample, as there could be substantial differences between clusters. It’s difficult to
guarantee that the sampled clusters are representative of the whole population.
Example
The company has offices in 10 cities across the country (all with roughly the same number of
employees in similar roles). You don’t have the capacity to travel to every office to collect your
data, so you use random sampling to select 3 offices – these are your clusters.
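The office example can be sketched as below (hypothetical offices of 100 employees each; whole clusters are selected first, then every individual within them is included):

```python
import random

# Hypothetical offices (clusters) in 10 cities, 100 employees each.
offices = {f"city_{i}": [f"city_{i}_emp_{j}" for j in range(100)]
           for i in range(10)}

# Randomly select 3 entire offices rather than sampling individuals.
chosen_cities = random.sample(list(offices), k=3)

# Include every individual from each sampled cluster.
sample = [emp for city in chosen_cities for emp in offices[city]]
print(len(sample))  # 300 employees from the 3 selected offices
```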
Uses of probability sampling:
There are multiple uses of probability sampling:
Reduce Sample Bias: Using the probability sampling method, the bias in the sample derived
from a population is negligible to non-existent. Because members are chosen at random, the
sample does not simply reflect the researcher's own judgment. Probability sampling leads to
higher quality data collection as the sample appropriately represents the population.
Diverse Population: When the population is vast and diverse, it is essential to have adequate
representation so that the data is not skewed towards one demographic. For example, if
Square would like to understand the people that could use their point-of-sale devices, a
survey conducted from a sample of people across the US from different industries and socio-
economic backgrounds helps.
Create an Accurate Sample: Probability sampling helps the researchers plan and create an
accurate sample. This helps to obtain well-defined data.
2. Non-probability sampling: In non-probability sampling, the researcher chooses members
for research based on convenience or judgment rather than at random. This sampling method
does not have a fixed or predefined selection process, which makes it difficult for all elements
of a population to have equal opportunities to be included in a sample.
a. Convenience sampling:
A convenience sample simply includes the individuals who happen to be most accessible to the
researcher.
This is an easy and inexpensive way to gather initial data, but there is no way to tell if the
sample is representative of the population, so it can’t produce generalizable results.
Example
You are researching opinions about student support services in your university, so after each
of your classes, you ask your fellow students to complete a survey on the topic. This is a
convenient way to gather data, but as you only surveyed students taking the same classes as
you at the same level, the sample is not representative of all the students at your university.
b. Voluntary response sampling: Like a convenience sample, a voluntary response sample is
mainly based on ease of access. Instead of the researcher choosing participants and directly
contacting them, people volunteer themselves (e.g. by responding to a public online survey).
Voluntary response samples are always at least somewhat biased, as some people will
inherently be more likely to volunteer than others.
Example
You send out the survey to all students at your university and a lot of students decide to
complete it. This can certainly give you some insight into the topic, but the people who
responded are more likely to be those who have strong opinions about the student support
services, so you can’t be sure that their opinions are representative of all students.
c. Purposive sampling:
This type of sampling, also known as judgment sampling, involves the researcher using their
expertise to select a sample that is most useful to the purposes of the research.
It is often used in qualitative research, where the researcher wants to gain detailed knowledge
about a specific phenomenon rather than make statistical inferences, or where the population
is very small and specific.
An effective purposive sample must have clear criteria and rationale for inclusion.
Example
You want to know more about the opinions and experiences of disabled students at your
university, so you purposefully select a number of students with different support needs in
order to gather a varied range of data on their experiences with student services.
d. Snowball sampling:
If the population is hard to access, snowball sampling can be used to recruit participants via
other participants. The number of people you have access to “snowballs” as you get in contact
with more people.
Example
You are researching experiences of homelessness in your city. Since there is no list of all
homeless people in the city, probability sampling isn’t possible. You meet one person who
agrees to participate in the research, and she puts you in contact with other homeless people
that she knows in the area.
Uses of non-probability sampling
Non-probability sampling is used for the following:
Create a hypothesis: Researchers use the non-probability sampling method to create an
assumption when limited to no prior information is available. This method helps with the
immediate return of data and builds a base for further research.
Exploratory research: Researchers use this sampling technique widely when conducting
qualitative research, pilot studies, or exploratory research.
Budget and time constraints: The non-probability method is used when there are budget and
time constraints, and some preliminary data must be collected. Since the survey design is not
rigid, it is easier to pick respondents and have them take the survey or questionnaire.
QUALITATIVE DATA:
Definition:
Qualitative data is defined as data that approximates and characterizes a subject rather than measuring it numerically.
Qualitative data can be observed and recorded. This data type is non-numerical in nature. This
type of data is collected through methods of observations, one-to-one interviews, conducting
focus groups, and similar methods. Qualitative data in statistics is also known as categorical
data – data that can be arranged categorically based on the attributes and properties of a thing
or a phenomenon.
Examples: The cake is orange, blue, and black in colour (qualitative).
Females have brown, black, blonde, and red hair (qualitative).
Importance of Qualitative Data:
Qualitative data is important in determining the frequency of traits or characteristics. It allows
the statistician or the researchers to form parameters through which larger data sets can be
observed.
Qualitative data provides the means by which observers can quantify the world around them.
For a market researcher, collecting qualitative data helps in answering questions like, who
their customers are, what issues or problems they are facing, and where do they need to focus
their attention, so problems or issues are resolved.
Qualitative data is about the emotions or perceptions of people, what they feel. In qualitative
data, these perceptions and emotions are documented. It helps the market researchers
understand the language their consumers speak and deal with the problem effectively and
efficiently.
Types of Qualitative Data:
1. One-to-One Interviews: It is one of the most used data collection instruments for
qualitative research, mainly because of its personal approach. The interviewer or the
researcher collects data directly from the interviewee on a one-to-one basis. The interview
may be informal and unstructured – conversational. Mostly the open-ended questions are
asked spontaneously, with the interviewer letting the flow of the interview dictate the
questions to be asked.
2. Focus groups: This is done in a group discussion setting. The group is limited to 6-10
people, and a moderator is assigned to moderate the ongoing discussion. Depending on the
data that is sought, the members of a group may have something in common. For example, a
researcher conducting a study on track runners will choose athletes who are track runners or
were track runners and have enough knowledge of the subject matter.
3. Record keeping: This method makes use of the already existing reliable documents and
similar sources of information as the data source. This data can be used in the new research. It
is like going to a library. There, one can go over books and other reference material to collect
relevant data that can be used in the research.
4. Process of observation: In this qualitative data collection method, the researcher immerses
himself/ herself in the setting where his respondents are and keeps a keen eye on the
participants and takes down notes. This is known as the process of observation. Besides taking
notes, other documentation methods, such as video and audio recording, photography, and
similar methods, can be used.
5. Longitudinal studies: This data collection method is performed on the same data source
repeatedly over an extended period. It is an observational research method that goes on for a
few years and, in some cases, can go on for even decades. This data collection method aims to
find correlations through an empirical study of subjects with common traits.
6. Case studies: In this method, data is gathered by an in-depth analysis of case studies. The
versatility of this method is demonstrated in how this method can be used to analyse both
simple and complex subjects. The strength of this method is how judiciously it uses a
combination of one or more qualitative data collection methods to draw inferences.
5 Steps to Qualitative Data Analysis:
Whether you are looking to analyse qualitative data collected through a one-to-one interview
or qualitative data from a survey, these simple steps will ensure a robust data analysis.
Step 1: Arrange your Data:
Once you have collected all the data, it is largely unstructured and sometimes makes no sense
when looked at briefly. Therefore, as a researcher, you first need to transcribe the data
collected. The first step in analysing your data is arranging it systematically. Arranging
data means converting all the data into a text format. You can either export the data into a
spreadsheet or manually type in the data or choose from any of the computer-assisted
qualitative data analysis tools.
Step 2: Organize all your Data:
After transforming and arranging your data, the immediate next step is to organize your data.
You most likely have a large amount of information that still needs to be arranged in an
orderly manner. One of the best ways to organize the data is by going back to
your research objectives and then organizing the data based on the questions asked. Arrange
your research objective in a table, so it appears visually clear. At all costs, avoid temptations of
working with unorganized data. You will end up wasting time, and there will be no conclusive
results obtained.
Step 3: Set a Code to the Data Collected:
Setting up proper codes for the collected data takes you a step ahead. Coding is one of the best
ways to compress a tremendous amount of information collected. The coding of qualitative
data simply means categorizing and assigning properties and patterns to the collected data.
Coding is an important step in qualitative data analysis, as you can derive theories from
relevant research findings. After assigning codes to your data, you can then begin to build on
the patterns to gain in-depth insight into the data that will help make informed decisions.
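As a minimal illustration of coding (the responses and code labels below are hypothetical), tallying the codes assigned to interview fragments compresses raw text into countable patterns:

```python
from collections import Counter

# Hypothetical interview fragments, each already assigned a code
# by the researcher.
coded_responses = [
    ("The app keeps crashing on checkout", "reliability"),
    ("I love how fast support replied", "support"),
    ("Checkout froze twice this week", "reliability"),
    ("Support solved my issue in minutes", "support"),
    ("The menu layout confuses me", "usability"),
]

# Counting the codes reveals which themes dominate the data.
code_counts = Counter(code for _, code in coded_responses)
print(code_counts.most_common())
# [('reliability', 2), ('support', 2), ('usability', 1)]
```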
Step 4: Validate your Data:
Validating data is one of the crucial steps of qualitative data analysis for successful research.
Since data is quintessential for research, it is imperative to ensure that the data is not flawed.
Please note that data validation is not just one step in qualitative data analysis; this is a
recurring step that needs to be followed throughout the research process. There are two sides
to validating data:
Accuracy of your research design or methods.
Reliability, which is the extent to which the methods produce accurate data consistently.
Step 5: Concluding the Analysis Process:
It is important to finally conclude your data, which means systematically presenting your data,
a report that can be readily used. The report should state the method that you, as a researcher,
used to conduct the research studies, the positives, and negatives and study limitations. In the
report, you should also state the suggestions/inferences of your findings and any related area
for future research.
Advantages of Qualitative Data:
1. It helps in-depth analysis: Qualitative data collected provide the researchers with a
detailed analysis of subject matters. While collecting qualitative data, the researchers tend to
probe the participants and can gather ample information by asking the right kind of questions.
From a series of questions and answers, the data that is collected is used to conclude.
2. Understand what customers think: Qualitative data helps the market researchers to
understand the mindset of their customers. The use of qualitative data gives businesses an
insight into why a customer purchased a product. Understanding customer language helps
market research infer the data collected more systematically.
3. Rich data: Collected data can be used to conduct research in the future as well. Since the
questions asked to collect qualitative data are open-ended questions, respondents are free to
express their opinions, leading to more information.
Disadvantages of Qualitative Data:
1. Time-consuming: As collecting qualitative data is more time-consuming, fewer people are
studied in comparison to collecting quantitative data. Unless time and budget allow, a smaller
sample size is included.
2. Not easy to generalize: Since fewer people are studied, it is difficult to generalize the
results to the whole population.
3. Dependent on the researcher's skills: This type of data is collected through one-to-one
interviews, observations, focus groups, etc. It relies on the researcher's skills and experience
to collect information from the sample.
QUANTITATIVE DATA:
Definition:
Quantitative data is defined as the value of data in the form of counts or numbers where each
dataset has a unique numerical value associated with it. This data is any quantifiable
information that can be used for mathematical calculations and statistical analysis, such that
real-life decisions can be made based on these mathematical derivations. Quantitative data is
used to answer questions such as “How many?”, “How often?”, “How much?”.
Types of Quantitative Data with Examples:
The most common types of quantitative data are as below:
Counter: Count equated with entities. For example, the number of people who download
application from the App Store.
Measurement of physical objects: Calculating measurement of any physical thing. For
example, the HR executive carefully measures the size of each cubicle assigned to the newly
joined employees.
Sensory calculation: Mechanism to naturally “sense” the measured parameters to create a
constant source of information. For example, a digital camera converts electromagnetic
information to a string of numerical data.
Projection of data: Future projection of data can be done using algorithms and other
mathematical analysis tools. For example, a marketer will predict an increase in the sales after
launching a new product with thorough analysis.
Quantification of qualitative entities: Identify numbers to qualitative information. For
example, asking respondents of an online survey to share the likelihood of recommendation on
a scale of 0-10.
Quantitative Data Collection Methods:
As quantitative data is in the form of numbers, mathematical and statistical analysis of these
numbers can lead to establishing some conclusive results.
There are two main Quantitative Data Collection Methods:
Surveys: Traditionally, surveys were conducted using paper-based methods and have
gradually evolved into online mediums. Closed-ended questions form a major part of these
surveys as they are more effective in collecting quantitative data. The survey maker includes
answer options which they think are the most appropriate for a question. Surveys are integral
in collecting feedback from an audience which is larger than the conventional size. A critical
factor about surveys is that the responses collected should be such that they can be
generalized to the entire population without significant discrepancies. Based on the time
involved in completing surveys, they are classified into the following –
Longitudinal Studies: A type of observational research in which the market researcher
conducts surveys from a specific time period to another, i.e., over a considerable course of
time, is called a longitudinal survey. This survey is often implemented for trend analysis or
studies where the primary objective is to collect and analyse a pattern in data.
Cross-sectional Studies: A type of observational research in which the market researcher
conducts surveys at one point in time across the target sample is known as a cross-sectional survey.
This survey type implements a questionnaire to understand a specific subject from the sample
at a definite time period.
Quantitative Data Analysis Methods:
Data collection forms a major part of the research process. This data however has to be
analysed to make sense of. There are multiple methods of analysing quantitative data collected
in surveys. They are:
Cross-tabulation: Cross-tabulation is one of the most widely used quantitative data analysis
methods. It is a preferred method since it uses a basic tabular form to draw inferences
between different datasets in the research study. It contains data that is mutually exclusive or
has some connection with each other.
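A cross-tabulation can be sketched in plain Python (the survey responses pairing gender with preferred shopping channel are hypothetical):

```python
from collections import Counter

# Hypothetical responses: (gender, preferred channel) pairs.
responses = [
    ("female", "online"), ("female", "store"), ("male", "online"),
    ("female", "online"), ("male", "store"), ("male", "online"),
]

# A cross-tabulation counts how often each pair of categories occurs.
table = Counter(responses)

for gender in ("female", "male"):
    row = {channel: table[(gender, channel)] for channel in ("online", "store")}
    print(gender, row)
# female {'online': 2, 'store': 1}
# male {'online': 2, 'store': 1}
```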
Trend analysis: Trend analysis is a statistical analysis method that provides the ability to look
at quantitative data that has been collected over a long period of time. This data analysis
method helps collect feedback about data changes over time and aims to understand the
change in variables, assuming other variables remain unchanged.
MaxDiff analysis: The MaxDiff analysis is a quantitative data analysis method that is used to
gauge customer preferences for a purchase and what parameters rank higher than the others
in this process. In a simplistic form, this method is also called the “best-worst” method. This
method is very similar to conjoint analysis but is much easier to implement and can be
interchangeably used.
Conjoint analysis: Like in the above method, conjoint analysis is a similar quantitative data
analysis method that analyzes parameters behind a purchasing decision. This method
possesses the ability to collect and analyze advanced metrics which provide an in-depth
insight into purchasing decisions as well as the parameters that rank the most important.
TURF analysis: TURF analysis or Total Unduplicated Reach and Frequency Analysis, is a
quantitative data analysis methodology that assesses the total market reach of a product or
service or a mix of both. This method is used by organizations to understand the frequency and
the avenues at which their messaging reaches customers and prospective customers which
helps them tweak their go-to-market strategies.
Gap analysis: Gap analysis uses a side-by-side matrix to depict quantitative data that helps
measure the difference between expected performance and actual performance. This data
analysis helps measure gaps in performance and the things that are required to be done to
bridge this gap.
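As a minimal sketch (the performance scores are hypothetical), the gap is simply expected minus actual performance for each unit being compared:

```python
# Hypothetical expected vs. actual performance scores per department.
expected = {"sales": 90, "support": 85, "production": 80}
actual = {"sales": 75, "support": 82, "production": 81}

# Positive gaps show where performance falls short of expectations.
gaps = {dept: expected[dept] - actual[dept] for dept in expected}
print(gaps)  # {'sales': 15, 'support': 3, 'production': -1}
```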
SWOT analysis: SWOT analysis is a quantitative data analysis method that assigns numerical
values to indicate the strengths, weaknesses, opportunities and threats of an organization or
product or service which in turn provides a holistic picture about competition. This method
helps to create effective business strategies.
Text analysis: Text analysis is an advanced statistical method where intelligent tools make
sense of and quantify or fashion qualitative and open-ended data into easily understandable
data. This method is used when the raw survey data is unstructured but has to be brought into
a structure that makes sense.
Steps to conduct Quantitative Data Analysis:
For quantitative data, raw information has to be presented in a meaningful manner using
analysis methods. Quantitative data should be analyzed in order to find evidential data that
would help in the research process.
Relate measurement scales with variables: Associate measurement scales such as Nominal,
Ordinal, Interval and Ratio with the variables. This step is important to arrange the data in
proper order. Data can be entered into an excel sheet to organize it in a specific format.
Connect descriptive statistics with data: Link descriptive statistics to encapsulate available
data. It can be difficult to establish a pattern in the raw data. Some widely used descriptive
statistics are:
Mean- An average of values for a specific variable
Median- A midpoint of the value scale for a variable
Mode- For a variable, the most common value
Frequency- Number of times a particular value is observed in the scale
Minimum and Maximum Values- Lowest and highest values for a scale
Percentages- Format to express scores and set of values for variables
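The descriptive statistics listed above can be computed with Python's standard `statistics` module (the scores below are hypothetical):

```python
import statistics

# Hypothetical scores for a single variable.
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

print("Mean:", statistics.mean(scores))      # average of the values
print("Median:", statistics.median(scores))  # midpoint of the value scale
print("Mode:", statistics.mode(scores))      # most common value
print("Frequency of 8:", scores.count(8))    # times a value is observed
print("Min/Max:", min(scores), max(scores))  # lowest and highest values
print("Percent >= 5:", round(100 * sum(s >= 5 for s in scores) / len(scores), 1))
```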
Decide a measurement scale: It is important to decide the measurement scale to conclude a
descriptive statistic for the variable. For instance, a nominal variable score will never have a
mean or median and so the descriptive statistics will correspondingly vary. Descriptive
statistics suffice in situations where the results are not to be generalized to the population.
Select appropriate tables to represent data and analyze collected data: After deciding on
a suitable measurement scale, researchers can use a tabular format to represent data. This
data can be analyzed using various techniques such as Cross-tabulation or TURF.
Advantages of Quantitative Data:
Some of the advantages of quantitative data are:
Conduct in-depth research: Since quantitative data can be statistically analyzed, it is highly
likely that the research will be detailed.
Minimum bias: There are instances in research, where personal bias is involved which leads
to incorrect results. Due to the numerical nature of quantitative data, the personal bias is
reduced to a great extent.
Accurate results: As the results obtained are objective in nature, they are extremely accurate.
Disadvantages of Quantitative Data:
Some of the disadvantages of quantitative data are:
Restricted information: Because quantitative data is not descriptive, it becomes difficult for
researchers to make decisions based solely on the collected information.
Depends on question types: Bias in results is dependent on the question types included to
collect quantitative data. The researcher’s knowledge of questions and the objective of
research are exceedingly important while collecting quantitative data.
SCALING TECHNIQUES:
Definition: Scaling technique is a method of placing respondents along a continuum of gradual
change, using pre-assigned values, symbols or numbers based on the features of a particular
object, as per defined rules. All scaling techniques are based on four pillars, i.e., order,
description, distance and origin.
Types of Scaling Techniques:
The researchers have identified many scaling techniques; today, we will discuss some of the
most common scales used by business organizations, researchers, economists, experts, etc.
These techniques can be classified as primary scaling techniques and other scaling techniques.
Let us now study each of these methods in-depth below:
Primary Scaling Techniques
The major four scales used in statistics for market research consist of the following:
a. Nominal Scale:
Nominal scales are adopted for non-quantitative (containing no numerical implication)
labeling variables which are unique and different from one another.
Types of Nominal Scales:
Dichotomous: A nominal scale that has only two labels is called ‘dichotomous’; for example,
Yes/No.
Nominal with Order: A nominal scale whose labels are arranged in ascending or descending
order is termed 'nominal with order'; for example, Excellent, Good, Average, Poor, Worst.
Nominal without Order: A nominal scale whose labels have no sequence is called 'nominal
without order'; for example, Black, White.
b. Ordinal Scale:
The ordinal scale functions on the concept of the relative position of the objects or labels based
on the individual’s choice or preference.
For example, at Amazon.in every product has a customer review section where buyers
rate the listed product according to their buying experience, product features, quality, usage,
etc.
The ratings so provided are as follows:
5 Star – Excellent
4 Star – Good
3 Star – Average
2 Star – Poor
1 Star – Worst
c. Interval Scale:
An interval scale, also called a cardinal scale, uses numerical labels with the same
difference between consecutive measurement units. With the help of this scaling technique,
researchers can obtain a better comparison between the objects.
For example, an automobile company conducted a survey to find out the number of vehicles
owned by people living in a particular area who could be its prospective customers in future.
It adopted the interval scaling technique for the purpose and provided the units 1, 2, 3, 4, 5
and 6 to select from.
In the scale mentioned above, every unit has the same difference, i.e., 1, whether it is between
2 and 3 or between 4 and 5.
d. Ratio Scale:
One of the most superior measurement techniques is the ratio scale. Similar to an interval
scale, a ratio scale is an abstract number system. It allows measurement at proper intervals,
order, categorization and distance, with an added property of originating from a fixed zero
point. Here, the comparison can be made in terms of the acquired ratio.
For example, a health product manufacturing company conducted a survey to identify the level
of obesity in a particular locality. It released the following survey questionnaire:
Select the category to which your weight belongs:
Less than 40 kilograms
40-59 Kilograms
60-79 Kilograms
80-99 Kilograms
100-119 Kilograms
120 Kilograms and more
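As a sketch, assigning a respondent's weight to the survey categories above can be automated with simple binning. The `weight_category` helper and the bin edges below are illustrative, mirroring the questionnaire:

```python
import bisect

# Upper edges of the categories; weights at or above an edge fall
# into the next category (e.g. 60 kg belongs to "60-79 kg").
edges  = [40, 60, 80, 100, 120]
labels = ["Less than 40 kg", "40-59 kg", "60-79 kg",
          "80-99 kg", "100-119 kg", "120 kg and more"]

def weight_category(kg):
    # bisect_right finds how many edges are <= kg, i.e. the label index
    return labels[bisect.bisect_right(edges, kg)]

print(weight_category(72))   # 60-79 kg
print(weight_category(120))  # 120 kg and more
```

Because a ratio scale has a fixed zero point, comparisons such as "80 kg is twice 40 kg" are also meaningful on the underlying values, not just on the categories.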
Other Scaling Techniques:
Scaling of objects can be used for a comparative study between two or more objects
(products, services, brands, events, etc.), or can be carried out individually to understand
the consumer's behaviour and response towards a particular object.
Following are the two categories under which other scaling techniques are placed based on
their comparability:
Comparative Scales:
For comparing two or more variables, a comparative scale is used by the respondents.
Following are the different types of comparative scaling techniques:
1. Paired Comparison:
A paired comparison symbolizes two variables from which the respondent needs to select one.
This technique is mainly used at the time of product testing, to facilitate the consumers with a
comparative analysis of the two major products in the market.
2. Rank Order:
In rank order scaling the respondent needs to rank or arrange the given objects according to
his or her preference.
3. Constant Sum:
It is a scaling technique in which a constant sum of units such as dollars, points, chits or
chips is allocated by the respondents across the features and attributes of a particular
product or service, according to their importance.
4. Q-Sort Scaling:
Q-sort scaling is a technique used for sorting the most appropriate objects out of a large
number of given variables. It emphasizes ranking the given objects in descending order to
form similar piles based on specific attributes. It is suitable when the number of objects
is not less than 60 and not more than 140, with 60 to 90 being most appropriate.
Non-Comparative Scales:
A non-comparative scale is used to analyse the performance of an individual product or object
on different parameters. Following are some of its most common types:
1. Continuous Rating Scales:
It is a graphical rating scale where the respondents are free to place the object at a position of
their choice. It is done by selecting and marking a point along the vertical or horizontal line
which ranges between two extreme criteria.
2. Itemized Rating Scale:
The itemized scale is another essential technique under the non-comparative scales. It
requires respondents to choose one particular category among the various given categories,
each of which is briefly defined by the researchers to facilitate the selection.
The three most commonly used itemized rating scales are as follows:
Likert Scale: In the Likert scale, the researcher provides some statements and asks the
respondents to mark their level of agreement or disagreement with these statements by
selecting one of the five given alternatives.
For example, A shoes manufacturing company adopted the Likert scale technique for its new
sports shoe range named Z sports shoes. The purpose is to know the agreement or
disagreement of the respondents.
For this, the researcher asked the respondents to circle a number representing the most
suitable answer according to them, in the following representation:
1 – Strongly Disagree
2 – Disagree
3 – Neither Agree nor Disagree
4 – Agree
5 – Strongly Agree
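A small sketch of how such Likert responses might be tallied and averaged once collected; the response data below is made up for illustration:

```python
from collections import Counter

# Hypothetical responses to the Z sports shoes statement,
# coded 1 (Strongly Disagree) through 5 (Strongly Agree).
responses = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]

tally = Counter(responses)                    # frequency of each option
mean_score = sum(responses) / len(responses)  # overall agreement level

for option in sorted(tally):
    print(f"Option {option}: {tally[option]} respondent(s)")
print(f"Mean Likert score: {mean_score:.1f}")  # Mean Likert score: 3.9
```

Note that averaging Likert codes treats the scale as if it had interval properties, which is a common but debated simplification; reporting the frequency tally alone stays strictly ordinal.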
Semantic Differential Scale: A bipolar seven-point non-comparative rating scale in which
the respondent can mark any of the seven points for each given attribute of the object, as
per personal choice, thus depicting the respondent's attitude or perception towards the
object.
RESEARCH METHODS:
Research methods are specific procedures for collecting and analysing data. They are the
strategies, processes or techniques utilized in the collection of data or evidence for
analysis, in order to uncover new information or create a better understanding of a topic.
There are different types of research methods which use different tools for data collection.
1. Interview Research Method:
An interview is generally a qualitative research technique which involves asking open-ended
questions to converse with respondents and elicit data about a subject.
The interviewer in most cases is the subject matter expert who intends to understand
respondent opinions in a well-planned and executed series of questions and answers.
Types of Interviews:
(a) Structured Interview:
A structured interview is a quantitative research method where the interviewer uses a set of
prepared closed-ended questions in the form of an interview schedule, which he/she reads out
exactly as worded.
Interview schedules have a standardized format, which means the same questions are asked of
each interviewee in the same order.
The interviewer will not deviate from the interview schedule (except to clarify the meaning of
the question) or probe beyond the answers received.
A structured interview is also known as a formal interview (like a job interview).
Strengths
Structured interviews are easy to replicate as a fixed set of closed questions are used, which
are easy to quantify – this means it is easy to test for reliability.
Structured interviews are fairly quick to conduct which means that many interviews can take
place within a short amount of time. This means a large sample can be obtained resulting in the
findings being representative and having the ability to be generalized to a large population.
Limitations
Structured interviews are not flexible. This means new questions cannot be asked impromptu
(i.e. during the interview), as an interview schedule must be followed.
The answers from structured interviews lack detail as only closed questions are asked which
generates quantitative data. This means a researcher won't know why a person behaves in a
certain way.
(b) Unstructured Interview:
Unstructured interviews do not use any set questions, instead, the interviewer asks open-
ended questions based on a specific research topic and will try to let the interview flow like a
natural conversation.
The interviewer modifies his or her questions to suit the candidate's specific experiences.
Unstructured interviews are sometimes referred to as 'discovery interviews' and are more like
a 'guided conversation' than a strict structured interview.
They are sometimes called informal interviews.
Strengths
Unstructured interviews are more flexible as questions can be adapted and changed depending
on the respondents’ answers. The interview can deviate from the interview schedule.
Unstructured interviews generate qualitative data through the use of open questions. This
allows the respondent to talk in some depth, choosing their own words. This helps the
researcher develop a real sense of a person’s understanding of a situation.
They also have increased validity because it gives the interviewer the opportunity to probe for
a deeper understanding, ask for clarification & allow the interviewee to steer the direction of
the interview etc.
Limitations
It can be time-consuming to conduct an unstructured interview and analyze the qualitative
data (using methods such as thematic analysis).
Employing and training interviewers is expensive, and not as cheap as collecting data via
questionnaires. For example, certain skills may be needed by the interviewer. These include
the ability to establish rapport and knowing when to probe.
(c) Focus Group Interview:
Focus group interview is a qualitative approach where a group of respondents are interviewed
together, used to gain an in‐depth understanding of social issues.
The method aims to obtain data from a purposely selected group of individuals rather than
from a statistically representative sample of a broader population.
The role of the interview moderator is to make sure the group interacts with each other and
does not drift off-topic.
Strengths
Group interviews generate qualitative narrative data through the use of open questions. This
allows the respondents to talk in some depth, choosing their own words. This helps the
researcher develop a real sense of a person’s understanding of a situation. Qualitative data also
includes observational data, such as body language and facial expressions.
They also have increased validity because some participants may feel more comfortable being
with others as they are used to talking in groups in real life (i.e. it's more natural).
Limitations
The researcher must ensure that they keep all the interviewees' details confidential and
respect their privacy. This is difficult when using a group interview. For example, the
researcher cannot guarantee that the other people in the group will keep information private.
Group interviews are less reliable as they use open questions and may deviate from the
interview schedule making them difficult to repeat.
Group interviews may sometimes lack validity as participants may lie to impress the other
group members. They may conform to peer pressure and give false answers.
Design of Interviews:
First, you must choose whether to use a structured or non-structured interview.
Next, you must consider who will be the interviewer, and this will depend on what type of
person is being interviewed. There are a number of variables to consider:
Gender and age: This can have a big effect on respondent's answers, particularly on personal
issues.
Personal characteristics: Some people are easier to get on with than others. Also, the accent
and appearance (e.g. clothing) of the interviewer can have an effect on the rapport between the
interviewer and interviewee.
Also, the language the interviewer uses should be appropriate to the vocabulary of the group
of people being studied. For example, the researcher must adjust the language of the
questions to match the social background of respondents (age, educational level, social
class, ethnicity, etc.).
The interviewer must ensure that they take special care when interviewing vulnerable groups,
such as children. For example, children have a limited attention span and for this reason,
lengthy interviews should be avoided.
Ethnicity: Interviewers may have difficulty interviewing people from a different ethnic group.
2. Observational Method:
Observation (watching what people do) would seem to be an obvious method of carrying out
research in psychology. However, there are different types of observational methods and
distinctions need to be made between:
(a) Controlled Observations
(b) Naturalistic Observations
(c) Participant Observations
(a) Controlled Observation:
Controlled observations (usually a structured observation) are likely to be carried out in a
psychology laboratory.
The researcher decides where the observation will take place, at what time, with which
participants, in what circumstances and uses a standardized procedure.
Participants are randomly allocated to each independent variable group.
Rather than writing a detailed description of all behavior observed, it is often easier to code
behavior according to a previously agreed scale using a behavior schedule (i.e. conducting a
structured observation).
Coding might involve numbers or letters to describe a characteristic, or use of a scale to
measure behavior intensity.
Strengths
Controlled observations can be easily replicated by other researchers by using the same
observation schedule. This means it is easy to test for reliability.
The data obtained from structured observations is easier and quicker to analyse.
Controlled observations are quick to conduct which means that many observations can take
place within a short amount of time.
Limitations
Controlled observations can lack validity due to the Hawthorne effect/demand characteristics.
When participants know they are being watched they may act differently.
(b) Naturalistic Observation:
Naturalistic observation is a research method commonly used by psychologists and other
social scientists.
This technique involves studying the spontaneous behavior of participants in natural
surroundings.
The researcher simply records what they see in whatever way they can.
In unstructured observations, the researcher records all relevant behavior without a system.
Strengths
Naturalistic observation is often used to generate new ideas. Because it gives the researcher
the opportunity to study the total situation it often suggests avenues of inquiry not thought of
before.
Limitations
These observations are often conducted on a micro (small) scale and may lack a representative
sample (biased in relation to age, gender, social class or ethnicity). This may result in the
findings lacking the ability to be generalized to wider society.
Natural observations are less reliable as other variables cannot be controlled. This makes it
difficult for another researcher to repeat the study in exactly the same way.
(c) Participant Observation:
Participant observation is a variant of the above (natural observations) but here the
researcher joins in and becomes part of the group they are studying to get a deeper insight into
their lives.
If it were research on animals, we would now not only be studying them in their natural
habitat but be living alongside them as well!
Participant observations can be either covert or overt.
Covert is where the study is carried out 'undercover'. The researcher's real identity and
purpose are kept concealed from the group being studied. The researcher takes a false identity
and role, usually posing as a genuine member of the group.
On the other hand, overt is where the researcher reveals his or her true identity and purpose
to the group and asks permission to observe.
Limitations
It can be difficult to get time / privacy for recording. For example, with covert observations
researchers can’t take notes openly as this would blow their cover. This means they have to
wait until they are alone and rely on their memory. This is a problem as they may forget details
and are unlikely to remember direct quotations.
If the researcher becomes too involved, they may lose objectivity and become biased. There is
always the danger that we will "see" what we expect (or want) to see. This is a problem
because the researcher could selectively report information instead of noting everything
observed, thus reducing the validity of the data.
3. Questionnaire method:
A questionnaire is an instrument for research which consists of a list of questions, along
with the choice of answers, printed or typed in a sequence on a form, used for acquiring
specific information from the respondents.
In general, questionnaires are delivered to the persons concerned either by post or mail,
requesting them to answer the questions and return it.
Informants are expected to read and understand the questions and reply in the space provided
in the questionnaire itself.
The questionnaire is prepared in such a way that it translates the required information into a
series of questions, that informants can and will answer.
Characteristics of a Good Questionnaire:
The following are characteristics of good questionnaires:
It should consist of a well-written list of questions.
The questionnaire should deal with an important or significant topic to create interest among
respondents.
It should seek only that data which cannot be obtained from other sources.
It should be as short as possible but should be comprehensive.
It should be attractive.
Directions should be clear and complete.
It should be represented in good psychological order proceeding from general to more specific
responses.
Double negatives in questions should be avoided.
Putting two questions in one also should be avoided. Every question should seek to obtain
only one specific piece of information.
It should be designed to collect information which can be used subsequently as data for
analysis.
Format of Questions in Questionnaires:
The questions asked can take two forms:
Restricted questions, also called closed-ended, ask the respondent to make choices: yes or
no, check items on a list, or select from multiple-choice answers. Restricted questions are
easy to tabulate and compile.
Unrestricted questions are open-ended and allow respondents to share feelings and opinions
that are important to them about the matter at hand.
Unrestricted questions are not easy to tabulate and compile, but they allow respondents to
reveal the depth of their emotions.
If the objective is to compile data from all respondents, then sticking with restricted questions
that are easily quantified is better.
If degrees of emotions or depth of sentiment is to be studied, then develop a scale to quantify
those feelings.
Uses of Questionnaires:
Questionnaires are a common and inexpensive research tool used by private companies,
government departments, individuals, groups, NGOs, etc. to get feedback, conduct research
and collect data from consumers, customers or the general public, depending on the need.
Questionnaires are the most important part of primary surveys.
Advantages of Questionnaire:
One of the greatest benefits of questionnaires lies in their uniformity — all respondents see the
same questions.
It is an inexpensive method, regardless of the size of the universe.
Free from the bias of the interviewer, as the respondents answer the questions in their own
words.
Respondents have enough time to think and answer.
Due to its large coverage, respondents living in distant areas can also be reached conveniently.
Limitations of Questionnaire:
The risk of collection of inaccurate and incomplete information is high in the questionnaire, as
it might happen that people may not be able to understand the question correctly.
The rate of non-response is high.
Action Research:
Action research can be defined as “an approach in which the action researcher and a client
collaborate in the diagnosis of the problem and in the development of a solution based on the
diagnosis”.
In other words, one of the main characteristic traits of action research is collaboration
between the researcher and members of the organisation in order to solve organizational
problems. Action research assumes the social world to be constantly changing, with both the
researcher and the research being part of that change.
Generally, action research can be divided into three categories: positivist, interpretive and
critical.
Positivist approach to action research, also known as ‘classical action research’, perceives
research as a social experiment. Accordingly, action research is accepted as a method to test
hypotheses in a real-world environment.
Interpretive action research, also known as ‘contemporary action research’, perceives
business reality as socially constructed and focuses on specifications of local and
organizational factors when conducting the action research.
Critical action research is a specific type of action research that adopts critical approach
towards business processes and aims for improvements.
The following features of action research need to be considered when considering its
suitability for any given study:
It is applied in order to improve specific practices. Action research is based on action,
evaluation and critical analysis of practices based on collected data in order to introduce
improvements in relevant practices.
This type of research is facilitated by the participation and collaboration of a number of
individuals with a common purpose.
Such research focuses on specific situations and their context.
Advantages of Action Research
High level of practical relevance of the business research.
Can be used with quantitative, as well as qualitative data.
Possibility to gain in-depth knowledge about the problem.
Disadvantages of Action Research
Difficulty in distinguishing between action and research, and in ensuring the application of
both.
Delays in completion of action research, due to a wide range of reasons, are not rare
occurrences.
Lack of repeatability.
Documentary Research:
Documentary research is defined as the research conducted using official documents or
personal documents as the source of information.
Documents can include anything from the following:
Newspapers
Stamps
Diaries
Maps
Handbills
Directories
Paintings
Government statistical publications
Gramophone records
Photographs
Computer files
Tapes
The above may not fit the traditional bill of a “document” but since they contain information,
they can be used towards documentary research.
Social scientists often conduct documentary research. It is mainly conducted to assess various
documents in the interest of social or historical value.
Sometimes, researchers also conduct documentary research to study various documents
surrounding events or individuals.
Documentary research is similar to content analysis, which involves studying existing
information recorded in media, texts, and physical items.
Here, data collection from people is not required to conduct research. Hence, this is a prime
example of secondary research.
Advantages of documentary research method
Here are the advantages of the documentary research method:
Data readily available: Data is readily available in various sources. You only need to know
where to look and how to use it. The data is available in different forms, and harnessing it is
the real challenge.
Inexpensive and economical: The data for research is already collected and published in
either print or other forms. The researcher does not need to spend money and time like they
do to collect market research insights and gather data. They need to search for and compile the
available data from different sources.
Saves time: Conducting market research is time-consuming. Responses will not come in
quickly as expected, and gathering global responses will take a huge amount of time. If you
have all the reference documents available (or you know where to find them), research is
relatively quick.
Non-bias: Primary data collection tends to be biased. This bias depends on many factors,
such as the age of the respondents, the time they take the survey, their mentality while
taking the survey, their gender and their feelings towards certain ideas, to name a few.
The list of potential survey biases goes on.
Researcher not necessary during data collection: The researcher doesn’t need to be
present during data collection. It is practically impossible for the researcher to be present at
every point of the data source, especially thinking about the various data sources.
Useful for hypothesis: Use historical data to draw inferences of the current or future events.
Conclusions can be drawn from the experience of past events and data available for them.
Disadvantages of documentary research method
Here are the disadvantages of the documentary research method:
Limited data: Data is not always available, especially when you need to cross-verify a theory
or strengthen your argument based on different forms of data.
Inaccuracies: As the data is historical and published, there is almost no way of ascertaining if
the data is accurate or not.
Incomplete documents: Often, documents can be incomplete, and there is no way of knowing
if there are additional documents to refer to on the subject.
Data out of context: The data that the researcher refers to may be out of context and may not
be in line with the concept the researcher is trying to study. This is because the research
goal was not considered when the original data was created. Often, researchers have to make
do with the data available at hand.
UNIT-3
RESEARCH:
Research is a systematized effort to gain new knowledge.
What is data analysis in research?
Research data analysis is a process used by researchers to reduce data to a story and
interpret it to derive insights. The data analysis process helps reduce a large amount of
data into smaller fragments that make sense.
Types of Research Data:
Data may be grouped into four main types based on methods for collection:
Observational data
Experimental data
Simulation data
Derived/Compiled data
1) Observational Data:
Observational data are captured through observation of a behavior or activity.
It is collected using methods such as human observation, open-ended surveys, or the use of an
instrument or sensor to monitor and record information.
Because observational data are captured in real time, it would be very difficult or impossible to
recreate if lost.
2) Experimental Data:
Experimental data are collected through active intervention by the researcher to produce and
measure change.
It allows the researcher to determine a causal relationship and is typically projectable to a
larger population.
3) Simulation Data:
Simulation data are generated by imitating the operation of a real-world process or system
over time using computer test models.
This method is used to try to determine what would, or could, happen under certain
conditions.
4) Derived/Compiled Data:
It involves using existing data points, often from different data sources, to create new data
through some sort of transformation.
For example, combining area and population data from the twin cities to create population
density data.
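The population-density example can be sketched as follows; the city names and figures below are invented for illustration:

```python
# Deriving a new data point (population density) from two existing
# data points (population and area) for each city.
cities = {
    "City A": {"population": 425_000, "area_km2": 150.0},
    "City B": {"population": 310_000, "area_km2": 125.0},
}

for name, d in cities.items():
    density = d["population"] / d["area_km2"]   # derived/compiled data
    print(f"{name}: {density:.1f} people per sq. km")
```

The derived figures exist in neither original source; they are created by the transformation, which is what distinguishes derived/compiled data from the other three types.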
Types of Primary data in research:
(i) Qualitative data:
When the data presented has words and descriptions, then we call it qualitative data.
Ex: anything describing taste, experience or an opinion is considered qualitative data.
(ii) Quantitative Data:
Any data expressed in numbers or numerical figures is called quantitative data.
Ex: questions such as age, rank, cost, length, weight, score, etc.
This data can be presented in graphical format, charts etc.
A frequency table helps us make better sense of the data given. When the data set is too big,
we use tally marks for counting, which makes the task more organized and easy.
There are different types of frequency distributions.
Grouped frequency distribution
Ungrouped frequency distribution
Cumulative frequency distribution
Relative frequency distribution
Relative cumulative frequency distribution
To get a frequency distribution, we need to divide data into different classes of appropriate
size while indicating the number of observations in each class. Through frequency distribution,
it becomes a lot easier to summarize the data. That's why it's also defined as a process of
presenting the data in a summarized form. It's also called Frequency Table.
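As a sketch, both an ungrouped and a grouped frequency distribution can be built from raw data; the scores below are illustrative:

```python
from collections import Counter

# A small, made-up set of test scores.
scores = [12, 15, 15, 18, 21, 22, 22, 22, 27, 31, 34, 34]

# Ungrouped: frequency of each distinct value.
ungrouped = Counter(scores)

# Grouped: class intervals of width 10 (10-19, 20-29, 30-39),
# keyed by the lower class limit.
grouped = Counter((s // 10) * 10 for s in scores)

print(ungrouped[22])                  # 3 (the value 22 occurs three times)
print(dict(sorted(grouped.items()))) # {10: 4, 20: 5, 30: 3}
```

Dividing by the class width and flooring is one simple way to assign observations to classes; the choice of class size is up to the researcher, as the text notes.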
Uses of Frequency Distribution
It is quite useful for data analysis.
It assists in estimating the frequencies of the population on the basis of the sample.
It facilitates the computation of different statistical measures.
The factor is indicated by ‘C’. The deviation, when reduced by this factor, is known as a step-
deviation. The formula is as follows:
Mean = A + (∑fd’/∑f) ×C
C = The common factor using which deviations are converted to step-deviations
Note: In this method step-deviation denoted by d’ is used and not d.
d’=(X-A)/C
Here, X = The value of the item, A = Assumed value of mean and
C = Common factor chosen
The steps to calculate the mean are discussed below:
Step—1:
Assume any one mid-point of the distribution as mean. But the best plan is to take mid-point of
an interval near the centre which has the largest frequency.
Step—2:
Find out the d column; d is the deviation between the score and the assumed mean.
Here d is found using the following formula:
d = X − A
Step—3:
Find out fd column. It is found out by multiplying f column by d column.
Step—4:
Find out ∑fd. Add all the positive values and negative values separately. Then find out the
algebraic sum which is ∑fd.
Step—5:
Find out the mean by using formula.
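The steps above can be sketched as follows. The class mid-points, frequencies, assumed mean A and common factor C are illustrative, and the class width is assumed to divide the deviations evenly so that the step-deviations are whole numbers:

```python
# Step-deviation method for the mean of a grouped frequency distribution.
mids  = [5, 15, 25, 35, 45]   # X: class mid-points
freqs = [4, 6, 10, 7, 3]      # f: class frequencies

A = 25   # assumed mean (mid-point of the central class)
C = 10   # common factor (class width)

# Step 2-3: d' = (X - A) / C for each mid-point
d_prime = [(x - A) // C for x in mids]

# Step 4: find fd' and its algebraic sum
sum_fd = sum(f * d for f, d in zip(freqs, d_prime))

# Step 5: Mean = A + (sum(fd') / sum(f)) * C
mean = A + (sum_fd / sum(freqs)) * C
print(round(mean, 2))  # 24.67
```

Working with small step-deviations instead of the raw mid-points is exactly the labour-saving point of the method: the arithmetic involves −2 to 2 rather than 5 to 45.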
Uses of Mean:
There are certain general rules for using mean. Some of these uses are as following:
The mean is more stable than the median and mode, so when the measure of central tendency
with the greatest stability is wanted, the mean is used.
Mean is used to calculate other statistics like S.D., coefficient of correlation, ANOVA, ANCOVA
etc.
Merits of Mean:
Mean is rigidly defined so that there is no question of misunderstanding about its meaning and
nature.
It is the most popular central tendency as it is easy to understand.
It is easy to calculate.
It is least affected by sampling fluctuations, so the result is reliable.
Demerits of Mean:
Mean is affected by extreme scores.
Sometimes mean is a value which is not present in the series.
Sometimes it gives absurd values. For example, there are 41, 44 and 42 students in classes
VIII, IX and X of a school, so the average number of students per class is 42.33, which is
never possible.
2. MEDIAN:
Median, in statistics, is the middle value of the given list of data, when arranged in an order.
The arrangement of data or observations can be done either in ascending order or descending
order.
Example: The median of 2,3,4 is 3.
The median of a set of data is the middlemost number or center value in the set. The median is
also the number that is halfway into the set.
To find the median, the data should first be arranged in order from least to greatest or
greatest to least. The median is the number that separates the higher half of a data sample,
a population or a probability distribution from the lower half. The median is different for
different types of distribution.
For example, the median of 3, 3, 5, 9, 11 is 5. If there is an even number of observations, then
there is no single middle value; the median is then usually defined to be the mean of the two
middle values: so the median of 3, 5, 7, 9 is (5+7)/2 = 6.
Median Formula
The formula to calculate the median of the finite number of data set is given here. Median
formula is different for even and odd numbers of observations. Therefore, it is necessary to
recognize first if we have odd number of values or even number of values in a given data set.
The formula to calculate the median of the data set is given as follow.
Odd Number of Observations
If the total number of observations given is odd, then the formula to calculate the median is:
Median = [(n+1)/2]th term
where n is the number of observations
Even Number of Observations
If the total number of observations is even, then the median formula is:
Median = [(n/2)th term + {(n/2)+1}th term]/2
where n is the number of observations
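The two positional formulas above can be sketched directly in code; `median` here is a hypothetical helper written for illustration (Python's `statistics.median` does the same job):

```python
def median(values):
    """Median via the positional formulas: the (n+1)/2-th term for odd n,
    the average of the (n/2)-th and (n/2 + 1)-th terms for even n."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]            # (n+1)/2-th term, 1-based
    return (data[n // 2 - 1] + data[n // 2]) / 2  # mean of the two middle terms

print(median([3, 3, 5, 9, 11]))   # 5
print(median([3, 5, 7, 9]))       # 6.0
```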
How to Calculate the Median?
To find the median, place all the numbers in the ascending order and find the middle.
Example 1:
Find the Median of 14, 63 and 55
Solution:
Put them in ascending order: 14, 55, 63
The middle number is 55, so the median is 55.
Example 2:
Find the median of the following:
4, 17, 77, 25, 22, 23, 92, 82, 40, 24, 14, 12, 67, 23, 29
Solution:
When we put those numbers in order we have:
4, 12, 14, 17, 22, 23, 23, 24, 25, 29, 40, 67, 77, 82, 92
There are fifteen numbers, so the middle one is the eighth number:
The median value of this set of numbers is 24.
Example 3:
Rahul’s family drove through 7 states on summer vacation. The price of gasoline differs from
state to state. Calculate the median gasoline cost.
1.79, 1.61, 2.09, 1.84, 1.96, 2.11, 1.75
Solution:
By organizing the data from smallest to greatest, we get:
1.61, 1.75, 1.79, 1.84, 1.96, 2.09, 2.11
Hence, the median gasoline cost is 1.84. Three states have higher gasoline costs and three have lower ones.
Merits of Median:
It is simple to understand and easy to calculate.
It is not affected by the extreme items in the series.
It can be determined graphically.
For open-ended classes, median can be calculated.
Demerits of Median:
It does not consider all observations because it is a positional average.
The value of the median is more affected by sampling fluctuations.
It is not capable of further algebraic treatment.
Like mean, combined median cannot be calculated.
It cannot be computed precisely when it lies between two items.
3. MODE:
The mode is the value that appears most frequently in a data set.
A set of data may have one mode, more than one mode, or no mode at all.
In statistics, the mode is the most commonly observed value in a set of data.
For the normal distribution, the mode is also the same value as the mean and median.
Examples of the Mode:
For example, in the following list of numbers, 16 is the mode since it appears more times in the
set than any other number:
3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48
A set of numbers can have more than one mode (this is known as bimodal if there are two
modes) if there are multiple numbers that occur with equal frequency, and more times than the
others in the set.
3, 3, 3, 9, 16, 16, 16, 27, 37, 48
In the above example, both the number 3 and the number 16 are modes as they each occur
three times and no other number occurs more often.
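Both mode examples above can be reproduced with the standard library's `statistics.multimode`, which returns every value tied for the highest frequency:

```python
from statistics import multimode

# One mode: 16 appears three times, more than any other value.
print(multimode([3, 3, 6, 9, 16, 16, 16, 27, 27, 37, 48]))  # [16]

# Bimodal: 3 and 16 each appear three times.
print(multimode([3, 3, 3, 9, 16, 16, 16, 27, 37, 48]))      # [3, 16]
```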
Advantages:
The mode is easy to understand and calculate.
The mode is not affected by extreme values.
The mode is easy to identify in a data set and in a discrete frequency distribution.
The mode can be located graphically.
Disadvantages:
The mode is not defined when there are no repeats in a data set.
The mode is not based on all values.
Sometimes data have one mode, more than one mode, or no mode at all.
Negative Correlation: When the values of one variable increase as the values of the other
variable decrease. In that case, the correlation coefficient would be negative.
Zero Correlation or No Correlation: When there is no specific relation between the two
variables, the correlation coefficient is zero.
Regression Analysis:
Regression analysis is a set of statistical methods used for the estimation of relationships
between a dependent variable and one or more independent variables. It can be utilized to
assess the strength of the relationship between variables and for modeling the future
relationship between them.
Regression Analysis – Simple linear regression:
Simple linear regression is a model that assesses the relationship between a dependent
variable and an independent variable. The simple linear model is expressed using the following
equation:
Y = a + bX + ϵ
Where:
Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
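The intercept a and slope b of the simple linear model can be estimated by ordinary least squares; a minimal stdlib-only sketch follows, where the x/y data are made up purely for illustration:

```python
# Least-squares fit of Y = a + bX:
#   b = sum((xi - mean_x)(yi - mean_y)) / sum((xi - mean_x)^2),  a = mean_y - b * mean_x
def fit_line(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [2.1, 4.0, 6.2, 7.9, 10.1])
print(a, b)   # intercept ≈ 0.09, slope ≈ 1.99
```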
EXAMPLE: A soft drink maker claims that a majority of adults prefer its leading beverage over
that of its main competitor. To test this claim, 500 randomly selected people were given the
two beverages in random order to taste. Among them, 270 preferred the soft drink maker’s
brand, 211 preferred the competitor’s brand, and 19 could not make up their minds.
Determine whether there is sufficient evidence, at the 5% level of significance, to support the
soft drink maker’s claim against the default that the population is evenly split in its preference.
Solution:
We will use the critical value approach to perform the test. The same test can also be performed
using the p-value approach.
We must check that the sample is sufficiently large to validly perform the test.
Since p̂ = 270/500 = 0.54,
√(p̂(1 − p̂)/n) = √((0.54)(0.46)/500) ≈ 0.02
hence
[p̂ − 3√(p̂(1 − p̂)/n), p̂ + 3√(p̂(1 − p̂)/n)] = [0.54 − (3)(0.02), 0.54 + (3)(0.02)] = [0.48, 0.60] ⊂ [0, 1]
so the sample is sufficiently large.
Step 1. The relevant test is
H0: p = 0.50 vs. Ha: p > 0.50 @ α = 0.05
where p denotes the proportion of all adults who prefer the company’s beverage over that of
its competitor’s beverage.
Step 2. The test statistic is
Z = (p̂ − p0)/√(p0 q0/n)
and has the standard normal distribution.
Step 3. The value of the test statistic is
Z = (p̂ − p0)/√(p0 q0/n) = (0.54 − 0.50)/√((0.50)(0.50)/500) ≈ 1.789
Step 4. Since the symbol in Ha is “>”, this is a right-tailed test, so there is a single critical
value, zα = z0.05. Reading from the last line in Figure 12.3 "Critical Values of z", its value
is 1.645. The rejection region is [1.645, ∞).
Step 5. As shown in Figure 8.15 "Rejection Region and Test Statistic", the test statistic falls
in the rejection region. The decision is to reject H0. In the context of the problem our
conclusion is:
The data provide sufficient evidence, at the 5% level of significance, to conclude that a majority
of adults prefer the company’s beverage to that of its competitor.
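The five steps above can be checked numerically; a minimal stdlib-only sketch using the counts from the example:

```python
from math import sqrt

# One-proportion z-test: H0: p = 0.50 vs Ha: p > 0.50 at alpha = 0.05.
n, x, p0 = 500, 270, 0.50
p_hat = x / n                               # 0.54
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # ≈ 1.789
print(round(z, 3), z > 1.645)               # falls in the rejection region, so reject H0
```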
From the normality plots, we conclude that both populations may come from normal
distributions.
Assumption 3: Do the populations have equal variance? Yes, since s1 and s2 are not that
different. How do we conclude this? By using a rule of thumb: the ratio of the two sample
standard deviations should be between 0.5 and 2. (They are not that different,
as s1/s2 = 0.683/0.750 = 0.91 is quite close to 1. We will discuss this in
more detail and quantify what "close" means later in this lesson.)
We can thus proceed with the pooled t-test.
Let μ1 denote the mean for the new machine and μ2 denote the mean for the old machine.
Step 1.
H0: μ1 − μ2 = 0,
Ha: μ1 − μ2 < 0
Step 2. Significance level:
α = 0.05.
Step 3. Compute the t-statistic:
sp = √((9·(0.683)² + 9·(0.750)²)/(10 + 10 − 2)) = 0.717
t* = ((x̄1 − x̄2) − 0)/(sp·√(1/n1 + 1/n2)) = (42.14 − 43.23)/(0.717·√(1/10 + 1/10)) = −3.40
Step 4. Critical value:
Left-tailed test
Critical value = −tα = −t0.05
Degrees of freedom = 10 + 10 − 2 = 18
−t0.05 = −1.734
Rejection region: t* < −1.734
Step 5. Check to see if the value of the test statistic falls in the rejection region and decide
whether to reject Ho.
t* = −3.40 < −1.734
Reject H0 at α = 0.05
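The pooled-t computation above can be reproduced with a short stdlib-only sketch, using the sample values from the example:

```python
from math import sqrt

# Pooled two-sample t statistic for the machine example.
n1 = n2 = 10
x1, s1 = 42.14, 0.683   # new machine: mean, sample SD
x2, s2 = 43.23, 0.750   # old machine: mean, sample SD
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
t = (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))
print(round(sp, 3), round(t, 2))   # sp ≈ 0.717, t ≈ -3.4, below -1.734: reject H0
```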
We can perform the separate variances test using the following test statistic:
t* = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2)
with
df = (n1 − 1)(n2 − 1)/((n2 − 1)C² + (1 − C)²(n1 − 1))
(round down to nearest integer)
where
C = (s1²/n1)/(s1²/n1 + s2²/n2)
NOTE: This calculation for the exact degrees of freedom is cumbersome and is typically done
by software. An alternate, conservative option to the exact degrees of freedom
calculation is to choose the smaller of n1 − 1 and n2 − 1.
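The separate-variances degrees of freedom formula above can be sketched as follows; `welch_df` is a hypothetical helper name, and the inputs are the sample SDs and sizes from the machine example:

```python
def welch_df(s1, n1, s2, n2):
    """Approximate df for the separate-variances (Welch) t-test,
    rounded down to the nearest integer as the text prescribes."""
    c = (s1**2 / n1) / (s1**2 / n1 + s2**2 / n2)
    df = ((n1 - 1) * (n2 - 1)) / ((n2 - 1) * c**2 + (1 - c)**2 * (n1 - 1))
    return int(df)   # round down

print(welch_df(0.683, 10, 0.750, 10))   # 17
```

Note how close this is to the conservative alternative min(n1 − 1, n2 − 1) = 9 only when the variances differ sharply; here the exact value 17 is much less conservative.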
Test of Hypothesis for a Binomial Proportion
Example 1: Suppose you have a die and suspect that it is biased towards the number three, and
so run an experiment in which you throw the die 10 times and count that the number
three comes up 4 times. Determine whether the die is biased.
Define x = the number of times the number three occurs in 10 trials. This random variable has
the binomial distribution where π is the population parameter corresponding to the
probability of success on any trial. We use the following null and alternative hypotheses:
H0: π ≤ 1/6; i.e. the die is not biased towards the number 3
H1: π > 1/6
Setting α = .05, we have
P(x ≥ 4) = 1–BINOM.DIST(3, 10, 1/6, TRUE) = 0.069728 > 0.05 = α.
and so we cannot reject the null hypothesis that the die is not biased towards the number 3
with 95% confidence.
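The Excel tail probability 1 − BINOM.DIST(3, 10, 1/6, TRUE) above can be recomputed exactly with a short stdlib-only sketch:

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_value = binom_tail(4, 10, 1 / 6)
print(round(p_value, 6))   # ≈ 0.069728 > 0.05, so H0 is not rejected
```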
Example 2: We suspect that a coin is biased towards heads. When we toss the coin 9 times,
how many heads need to come up before we are confident that the coin is biased towards
heads?
We use the following null and alternative hypotheses:
H0: π ≤ .5
H1: π > .5
Using a confidence level of 95% (i.e. α = .05), we calculate
BINOM.INV(n, p, 1–α) = BINOM.INV(9, .5, .95) = 7
which means that if 8 or more heads come up then we are 95% confident that the coin is
biased towards heads, and so can reject the null hypothesis.
We confirm this conclusion by noting that P(x ≥ 8) = 1–BINOM.DIST(7, 9, .5, TRUE) = 0.01953
< 0.05 = α, while P(x ≥ 7) = 1–BINOM.DIST(6, 9, .5, TRUE) = .08984 > .05.
Example 3: Historically a factory has been able to produce a very specialized nano-technology
component with 35% reliability, i.e. 35% of the components passed its quality assurance
requirements. They have now changed their manufacturing process and hope that this has
improved the reliability. To test this, they took a sample of 24 components produced using the
new process and found that 13 components passed the quality assurance test. Does this show a
significant improvement over the old process?
We use a one-tailed test with null and alternative hypotheses:
H0: p ≤ .35
H1: p > .35
p-value = 1–BINOM.DIST(12, 24, .35, TRUE) = .04225 < .05 = α
and so conclude with 95% confidence that the new process shows a significant improvement.
Test of Hypothesis for the Difference Between Binomial Proportions
Given a set of N1 observations in a variable X1 and a set of N2 observations in a variable X2, we
can compute a normal approximation test that the two proportions are equal (or alternatively,
that the difference of the two proportions is equal to 0). In the following, let p1 and p2 be the
population proportion of successes for samples one and two, respectively.
The hypothesis test that the two binomial proportions are equal is
H0: p1 = p2
Ha: p1 ≠ p2
Test Statistic:
Z = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n1 + 1/n2))
where p̂ is the proportion of successes for the combined sample:
p̂ = (n1 p̂1 + n2 p̂2)/(n1 + n2) = (X1 + X2)/(n1 + n2)
Significance Level: α
Critical Region:
For a two-tailed test: Z > Φ⁻¹(1 − α/2) or Z < Φ⁻¹(α/2)
For a lower-tailed test: Z < Φ⁻¹(α)
For an upper-tailed test: Z > Φ⁻¹(1 − α)
Conclusion: Reject the null hypothesis if Z is in the critical region
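The pooled two-proportion statistic above can be sketched as follows; the counts are made up purely for illustration:

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """Normal-approximation z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)   # pooled proportion (X1 + X2)/(n1 + n2)
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

z = two_prop_z(45, 100, 30, 100)
print(round(z, 3))   # compare with ±1.96 for a two-tailed test at alpha = 0.05
```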
Inference from Small Samples: Student's t Distribution
The fact that, as the number of observations increases, the distribution of values will often tend
towards a normal frequency distribution allows statisticians to make important inferences
about statistical properties. However, in practice, due to experimental costs or the availability
of subject populations, scientists often are constrained to use small sample sizes.
The t distribution is very useful in these cases.
The rationale for the t distribution:
For small samples, where the sample estimates are less representative of the
population, t provides an alternative to the normal curve that is more conservative.
The t distribution is sometimes named Student’s t distribution after the pseudonym of the
statistician William Sealy Gosset, who developed it while working at the Guinness Brewery.
The t distribution describes the frequency distribution of small numbers of samples
chosen from a normal distribution. Like the normal distribution, the function that defines its
probability density is complex and understanding it is not required to make use of
the t distribution.
Defining the t statistic
The t distribution looks a lot like the normal distribution, but its tails do not approach zero as
quickly.
Small-sample studies use the Student t statistic, while large-sample studies use the standard
normal z-score statistic.
If we let x̄1 and x̄2 be the sample means, s1 and s2 the sample standard deviations, n1 and n2
the sample sizes, and sp the combined (pooled) standard deviation for both data sets, then
sp = √(((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2))
Also
t = ((x̄1 − x̄2) − (μ1 − μ2))/(sp·√(1/n1 + 1/n2))
The confidence interval is
(x̄1 − x̄2) ± t(α/2)·sp·√(1/n1 + 1/n2)
2. Know how to use appropriate statistics to test if two sample means are equal or if their
difference = 0 (small sample size).
3 Types of tests in comparing two sample means:
When comparing the sample means, there are 3 questions to consider:
Question 1: Is μ1 = μ2? Ha: μ1 ≠ μ2 (Two-tailed test)
Question 2: Is μ1 ≤ μ2? Ha: μ1 > μ2 (Right-tailed test)
Question 3: Is μ1 ≥ μ2? Ha: μ1 < μ2 (Left-tailed test)
Example 4 is an example of the pooled t-test.
Question 1: Is μ1 = μ2? Ha: μ1 ≠ μ2 (Two-tailed test)
Problem 1. Two types of cars are compared for acceleration rate. The test runs are recorded
for each car and the results for the mean elapsed time recorded below:
             Sample mean   Sample standard deviation   Sample size
Car A (x1)       8.5               1.8                     20
Car B (x2)       7.2               2.1                     30
Construct a 98% CI for the difference in the mean elapsed time for the two types of cars. Using
this CI, determine whether there is a difference in the mean elapsed times.
Given the difference x̄1 − x̄2, at least one of the sample sizes is < 30 (small, so we must use the
Student's t distribution or t-statistic).
Step 1 - Hypothesis: The claim is that μ1 − μ2 = 0, the null hypothesis.
The alternate hypothesis is that μ1 − μ2 ≠ 0.
H0: μ1 − μ2 = 0
Ha: μ1 − μ2 ≠ 0
Step 2. Select level of significance: This is given as α = 0.02 (2% = 100% − 98%)
UNIT-5
POINT ESTIMATION & NON-PARAMETER TEST
The central limit theorem (CLT) is one of the most important results in probability theory. It
states that, under certain conditions, the sum of a large number of random variables is
approximately normal. Here, we state a version of the CLT that applies to i.i.d. random
variables. Suppose that X1, X2, ..., Xn are i.i.d. random variables with expected values E[Xi] = μ < ∞ and
variance Var(Xi) = σ² < ∞. Then, as we saw above, the sample mean X̄ = (X1 + X2 + ... + Xn)/n has mean
E[X̄] = μ and variance Var(X̄) = σ²/n. Thus, the normalized random variable Z = (X̄ − μ)/(σ/√n) is
approximately standard normal for large n.
Sampling Distribution of Sample Mean and Proportion:
When a sample statistic is a single value that estimates a population parameter, we refer to the
statistic as a point estimate.
Before we begin, we will introduce a brief explanation of notation and some new terms that we
will use this lesson and in future lessons.
Notation:
Sample mean: the book uses y-bar (ȳ); most other sources use x-bar (x̄)
Population mean: standard notation is the Greek letter μ
Sample proportion: the book uses π-hat (π̂); other sources use p-hat (p̂)
Population proportion: the book uses π; other sources use p
[NOTE: Remember that π here is NOT to be interpreted as the numeric constant
3.14; it is simply a symbol.]
Terms
Standard error – the standard deviation of a sample statistic
Standard deviation – the spread of the individual observations in a sample
Parameters, e.g., mean and SD, are summary measures of population, e.g., μ and σ. These are
fixed.
Statistics, e.g., sample mean and sample SD, are summary measures of a sample, e.g., ¯x and s.
These vary. Think about taking a sample and the sample is not always the same therefore the
statistics change. This is the motivation behind this lesson - due to this sampling variation the
sample statistics themselves have a distribution that can be described by some measure of
central tendency and spread.
Sampling error is the error resulting from using a sample characteristic to estimate a
population characteristic.
Sample size and sampling error: As the dot plots above show, the possible sample means
cluster more closely around the population mean as the sample size increases. Thus, the possible
sampling error decreases as sample size increases.
The mean of the sample mean is the population mean. That is: μȳ = μ
When sampling with replacement, the standard deviation of the sample mean, called the
standard error, equals the population standard deviation divided by the square root of the
sample size. That is: σȳ = σ/√n.
Sampling Distribution of the Mean When the Population is Normal:
Key Fact: If the population is normally distributed with mean μ and standard deviation σ,
then the sampling distribution of the sample mean is also normally distributed, no matter what
the sample size is. When the sampling is done with replacement, or if the population size is
large compared to the sample size, it follows from the above two formulas that ȳ has mean
μ and standard error σ/√n.
SPECIAL NOTE: In the rest of this course, we only deal with the case where the sampling is done
with replacement or the population size is much larger than the sample size.
Application of the Sample Mean Distribution: When we know the sample mean is normal or
approximately normal, and we know the population mean, μ, and population standard
deviation, σ, then we can calculate a z-score for the sample mean and determine probabilities
for it, where:
Z = (ȳ − μ)/(σ/√n)
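The z-score formula above can be sketched with the standard library's `statistics.NormalDist`; the population values and sample mean here are made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

# z-score for a sample mean, with illustrative numbers.
mu, sigma, n = 100, 15, 36
y_bar = 105
z = (y_bar - mu) / (sigma / sqrt(n))          # (105 - 100) / 2.5 = 2.0
print(z, round(1 - NormalDist().cdf(z), 4))   # P(sample mean > 105) ≈ 0.0228
```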
Large Sample Estimation:
The Central Limit Theorem says that, for large samples (samples of size n ≥ 30), when viewed
as a random variable the sample mean X̄ is normally distributed with mean μX̄ = μ and
standard deviation σX̄ = σ/√n. The Empirical Rule says that we must go about two standard
deviations from the mean to capture 95% of the values of X̄ generated by sample after
sample. A more precise distance, based on the normality of X̄, is 1.960 standard deviations,
which is E = 1.960σ/√n.
For 100(1 − α)% confidence, the area in each tail is α/2.
Figure 7.4
We'll start the lesson with some formal definitions. In doing so, recall that we denote the
n random variables arising from a random sample as subscripted uppercase letters:
X1, X2, ..., Xn
The corresponding observed values of a specific random sample are then denoted as
subscripted lowercase letters:
x1, x2, ..., xn
Definition: The range of possible values of the parameter θ is called the parameter space Ω
(the Greek letter "omega").
For example, if μ denotes the mean grade point average of all college students, then the
parameter space (assuming a 4-point grading scale) is:
Ω = {μ: 0 ≤ μ ≤ 4}
And, if p denotes the proportion of students who smoke cigarettes, then the parameter space
is:
Ω = {p: 0 ≤ p ≤ 1}
Definition. The function of X1, X2, ..., Xn, that is, the statistic u (X1, X2, ..., Xn), used to
estimate θ is called a point estimator of θ.
Definition. The function u (x1, x2, ..., xn) computed from a set of data is an observed
point estimate of θ.
In simple terms, any statistic can be a point estimate. A statistic is an estimator of some
parameter in a population. For example:
• The sample standard deviation (s) is a point estimate of the population standard deviation
(σ).
• The sample mean (x̄) is a point estimate of the population mean (μ).
• The sample variance (s²) is a point estimate of the population variance (σ²).
In more formal terms, the estimate occurs as a result of point estimation applied to a set of
sample data. Points are single values, in comparison to interval estimates, which are a range of
values. For example, a confidence interval is one example of an interval estimate.
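The three point estimates listed above can be computed directly with the standard library; the sample data are made up for illustration:

```python
from statistics import mean, stdev, variance

# Sample statistics as point estimates of population parameters.
sample = [22, 25, 19, 24, 23, 21]
print(mean(sample))       # point estimate of mu
print(stdev(sample))      # point estimate of sigma (n - 1 divisor)
print(variance(sample))   # point estimate of sigma squared
```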
Finding the Estimates
Four of the most common ways to find an estimate:
• The Method of Moments: based on the law of large numbers; uses relatively simple
equations to find point estimates. It is often not very accurate and has a tendency to be biased.
• Maximum Likelihood: uses a model (for example, the normal distribution) and the
values in the model to maximize a likelihood function. This results in the most likely parameter
for the inputs selected.
• Bayes Estimators: minimize the average risk (an expectation over random variables).
• Best Unbiased Estimators: several unbiased estimators can be used to approximate a
parameter. Which one is “best” depends on what parameter you are trying to find. For
example, with variance, the estimator with the smallest variance is “best”.
Interval Estimation:
Interval estimation, in statistics, is the evaluation of a parameter of a population, for example
the mean (average), by computing an interval, or range of values, within which the
parameter is most likely to be located. Intervals are commonly chosen such that the parameter
falls within them with a 95 or 99 percent probability, called the confidence coefficient. Hence, the
intervals are called confidence intervals; the end points of such an interval are called the upper
and lower confidence limits.
The interval containing a population parameter is established by calculating that statistic from
values measured on a random sample taken from the population and by applying the
knowledge (derived from probability theory) of the fidelity with which the properties of a
sample represent those of the entire population.
The probability tells what percentage of the time the assignment of the interval will be correct
but not what the chances are that it is true for any given sample. Of the intervals computed
from many samples, a certain percentage will contain the true value of the parameter being
sought.
Here we consider the joint estimation of a multivariate set of population means. That is, we
have observed a set of p X-variables and may wish to estimate the population mean for each
variable. In some instances, we may also want to estimate one or more linear combinations of
population means. Our basic tool for estimating the unknown value of a population parameter
is a confidence interval, an interval of values that is likely to include the unknown value of the
parameter.
The general format of a confidence interval estimate of a population mean is
Sample mean ± Multiplier × Standard error of mean
In this formula, x̄j is the sample mean, sj is the sample standard deviation and n is the sample
size. The multiplier value is a function of the confidence level, the sample size, and the strategy
used for dealing with the multiple inference issue.
Confidence Interval of Population Means:
Estimating the mean:
Estimating the mean of a normally distributed population entails drawing a sample of size n
and computing the sample mean x̄, which is used as a point estimate of μ.
It is more meaningful to estimate μ by an interval that communicates information regarding the
probable magnitude of μ.
Sampling distributions and estimation:
Interval estimates are based on sampling distributions. When the sample mean is being used
as an estimator of a population mean, and the population is normally distributed, the sample
mean will be normally distributed with mean equal to the population mean and variance σ²/n.
Example
Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme
in a certain human population, takes a sample of 10 individuals, determines the level of the
enzyme in each, and computes a sample mean of x̄ = 22. Suppose further it is known that the
variable of interest is approximately normally distributed with a variance of 45. We wish to
estimate μ.
Solution
An approximate 95% confidence interval for μ is given by:
x̄ ± 1.96·√(σ²/n) = 22 ± 1.96·√(45/10) = 22 ± 4.16, i.e. (17.84, 26.16)
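The enzyme example can be computed numerically with a minimal stdlib-only sketch, assuming the usual 95% reliability coefficient z = 1.96:

```python
from math import sqrt

# Known-variance CI for the enzyme example: x-bar = 22, sigma^2 = 45, n = 10.
x_bar, sigma2, n, z = 22, 45, 10, 1.96
e = z * sqrt(sigma2 / n)                            # margin of error ≈ 4.16
print(round(x_bar - e, 2), round(x_bar + e, 2))     # ≈ 17.84 26.16
```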
Components of an interval estimate:
This is the general form for an interval estimate.
estimator ± (reliability coefficient) (standard error)
The general form for an interval estimate consists of three components. These are known as
the estimator, the reliability coefficient, and the standard error.
Estimator: The interval estimate of μ is centred on the point estimate of μ. As noted in the
table above, x̄ is an unbiased point estimator for μ.
Reliability coefficient: Approximately 95% of the values of the standard normal curve lie
within 2 standard deviations of the mean. The z score in this case is called the reliability
coefficient. We use a value of z that will give the correct interval size. The proper z score
depends on the value of α being used. Generally, the three values of α most commonly used
are .01, .05 and .10. Their corresponding z scores are 2.575, 1.96 and 1.645, respectively.
In repeated sampling, intervals of the form x̄ ± z·(σ/√n) will, in the long run, include the
population mean μ. The quantity 1 − α is called the confidence coefficient or confidence level,
and we are 100(1 − α) percent confident that the single computed interval, x̄ ± z·(σ/√n),
contains the population mean μ.
The t distribution:
In most real-life situations, the variance of the population is unknown, so the z statistic cannot
be used; the t statistic is used instead. Properties of the t distribution:
1. It has a mean of 0.
2. It is symmetric about the mean.
3. Its variance is greater than 1, and equals df/(df − 2) for df > 2.
4. The range is −∞ to +∞.
5. t is really a family of distributions, because the divisors are different.
6. Compared with the normal distribution, t is less peaked and has higher tails.
7. The t distribution approaches the normal distribution as n − 1 approaches infinity.
Confidence interval for a mean using t
When sampling is from a normal distribution whose standard deviation, σ, is unknown, the
100(1 − α) percent confidence interval for the population mean, μ, is given by:
x̄ ± t(1 − α/2) · s/√n
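A t-based interval can be sketched with the standard library; the sample data are made up for illustration, and the critical value t(0.975, df = 9) = 2.262 is hard-coded from standard t tables since the stdlib has no t quantile function:

```python
from math import sqrt
from statistics import mean, stdev

# 95% t-interval for a mean with sigma unknown; illustrative data only.
sample = [19.8, 22.1, 21.5, 20.9, 23.0, 21.2, 20.4, 22.6, 21.8, 20.7]
n = len(sample)
x_bar, s = mean(sample), stdev(sample)
t_crit = 2.262                     # 97.5th percentile of t with n - 1 = 9 df
e = t_crit * s / sqrt(n)
print(round(x_bar - e, 2), round(x_bar + e, 2))
```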