
Research Design

MEANING OF RESEARCH DESIGN


• The task that follows defining the research problem is the preparation of the
design of the research project, known as the research design

• The conceptual structure within which research is conducted

• The blueprint for the collection, measurement and analysis of
data (a framework for conducting the research project)
• It includes an outline of what the researcher will do from
writing the hypothesis and its operational implications to the
final analysis of data
Research design focuses on:
• (i) What is the study about?
• (ii) Why is the study being made?
• (iii) Where will the study be carried out?
• (iv) What type of data is required?
• (v) Where can the required data be found?
• (vi) What periods of time will the study include?
• (vii) What techniques of data collection will be used?
• (viii) How will the data be analyzed?
Research design cont…
• It is a plan that specifies the sources and types of information
relevant to the research problem

• It is a strategy specifying which approach will be used for
gathering and analyzing the data

• It also includes the time and cost budgets since most studies
are done under these two constraints.
Research design cont…
Research design must, at least, contain

• a clear statement of the research problem

• procedures and techniques to be used for gathering
information

• the population to be studied

• methods to be used in processing and analyzing data.


DIFFERENT RESEARCH DESIGNS

• Exploratory research
• Descriptive research
• Hypothesis-testing research studies
(Experimental Research Design)
• Case Study
Exploratory research
• Termed as formulative research studies

• Formulating a problem for more precise investigation
or developing the working hypotheses from an
operational point of view

• The major emphasis in such studies is on the
discovery of ideas and insights
Three methods in exploratory studies
• The survey of concerning literature
• Experience survey
• Analysis of ‘insight-stimulating’ examples
- where there is little experience to serve as a guide
- Intensive study
- unstructured interviewing may take place
Descriptive research
• Survey design
• Rigid design (design must make enough provision for
protection against bias and must maximize reliability)
• Describing the characteristics of a particular
individual, or of a group
• The researcher must be able to define clearly
what he wants to measure and must find adequate
methods for measuring it, along with a clear-cut
definition of the ‘population’ he wants to study.
Descriptive design must be rigid and not
flexible and must focus attention on the following:

• Formulating the objective of the study (what the study is about
and why is it being made?)
• Designing the methods of data collection (what techniques of
gathering data will be adopted?)
• Selecting the sample (how much material will be needed?)
• Collecting the data (where can the required data be found and
with what time period should the data be related?)
• Processing and analyzing the data
• Reporting the findings.
Hypothesis-testing research studies (Experimental Research Design)

• Generally known as experimental studies

• The researcher tests the hypotheses of causal
relationships between variables

• Permit drawing inferences about causality


Experimental Research

• Experimental Research
– Experimental research differs from the other
research approaches noted above through its
greater control over the objects of its study.
– The researcher strives to isolate and control
every relevant condition which determines the
events investigated, so as to observe the effects
when the conditions are manipulated.

07/20/2022 12
Experimental Research

• Experimental Research
◦ When the researcher has established that the study
is amenable to experimental methods, a prediction
(technically called a hypothesis) of the likely
cause-and-effect patterns of the phenomenon has to
be made.
◦ This allows decisions to be made as to what variables
are to be tested and how they are to be controlled and
measured.

Experimental Research

• Types of Experimental Research

– ‘Pre-Experimental’:
• No control or comparison group to compare
• A group is given a pre-test (E.g., Supervisory
behavior), is exposed to a treatment (E.g., Training),
and is then administered a post-test (supervisory
behavior) to measure the effects of the treatment.
• The effects of the treatment are measured by the
difference between the pre-test and the post-test.
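
The pre-test/post-test comparison described above can be sketched as follows. This is a minimal illustration only: the scores are hypothetical and the names `pre_test`, `post_test` and `mean_gain` are not from the slides.

```python
from statistics import mean

# Hypothetical supervisory-behavior scores for the same six trainees,
# measured before and after the training (the treatment).
pre_test = [52, 60, 47, 55, 63, 58]
post_test = [58, 66, 50, 61, 70, 59]

# In a one-group design the treatment effect is estimated by the
# difference between each post-test and pre-test score.
gains = [post - pre for pre, post in zip(pre_test, post_test)]
mean_gain = mean(gains)

print(f"individual gains: {gains}")
print(f"mean gain: {mean_gain:.2f}")
```

Because there is no control group, this gain cannot be separated from other influences that occurred between the two measurements.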

Experimental Research

• Types of Experimental Research


– True Experimental Design
• This classical experimental design has an
experimental/treatment group and a control group both
measured at pre-test and post-test on the dependent variable.
• There is random allocation of cases to experimental and
control groups.
• The only difference between the two groups is that one
received the treatment and the other did not.
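
A rough sketch of the classical design, assuming 20 hypothetical cases and made-up scores: cases are allocated at random to the two groups, and the effect is estimated as the gain of the treatment group minus the gain of the control group. The function name `treatment_effect` is illustrative, not part of the slides.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the allocation is reproducible

# Random allocation: shuffle the case IDs, then split them in half.
cases = list(range(1, 21))
rng.shuffle(cases)
treatment_ids, control_ids = set(cases[:10]), set(cases[10:])

def treatment_effect(pre, post, treated):
    """Mean gain of the treated cases minus mean gain of the controls."""
    gain_treated = mean(post[i] - pre[i] for i in pre if i in treated)
    gain_control = mean(post[i] - pre[i] for i in pre if i not in treated)
    return gain_treated - gain_control

# Hypothetical scores: everyone starts at 50, treated cases gain 5 points,
# control cases gain 1 point between pre-test and post-test.
pre = {i: 50 for i in cases}
post = {i: pre[i] + (5 if i in treatment_ids else 1) for i in cases}

effect = treatment_effect(pre, post, treatment_ids)
print(f"estimated treatment effect: {effect}")
```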

Survey Design

• Survey Design

◦ This is a research design in which a group of people or
items is studied by collecting and analyzing sample
data or data from the entire population. If the survey
involves study of a sample from the population, it is
referred to as a Sample Survey. If it involves the study
of the entire population, it is referred to as a Census
Survey.

Survey Design

• Types Of Survey
– A cross-sectional survey collects data at one time. The researcher
can generalize findings from such one-shot studies to the sampled
population only at the time of the survey.
– A longitudinal survey takes place over time with two or more
data collections and has the benefit of measuring change over
time.
• Trend survey
• Cohort survey
• Panel survey

Case Study Design
• Case Study Research Design:
– This involves intensive study geared towards a
thorough understanding of a given social unit or
business activities. It is worth noting that case
studies are of limited generalizability.
– Only very few units are involved in case studies
and as such, the findings cannot be generalized to
the population.
– much emphasis is on obtaining a complete
description and understanding of factors in each
case, regardless of the number involved.
Case Study Design

• Strength
– One of the greatest strengths of the case study design is
its adaptability to different types of research question
and to different research settings.
– The use of multiple sources of evidence allows
triangulation of findings
– Offer the benefit of studying phenomena in detail and in
context, particularly in situations where there are many
more variables of interest than there are observations.

Case Study Design

• Weakness
– Selection bias whereby the choice of cases biases
the findings of the research
– Concern raised in generalizability, particularly of
single case studies
– The requirement it places on the researcher in terms
of dealing with the complexity of field research if
multiple data collection methods are used

Design Strategies

• Quantitative Research
– is predominantly used as a synonym for
any data collection technique (such as a
questionnaire) or data analysis procedure
(such as graphs or statistics) that generates
or uses numerical data.

Design Strategies

• Qualitative Research
– is used predominantly as a synonym for
any data collection technique (such as
an interview) or data analysis procedure
(such as categorizing data) that
generates or uses non-numerical data.

Design Strategies

• Mixed Methods Approach: the general
term for when both quantitative and
qualitative data collection techniques and
analysis procedures are used in a research
design, either at the same time (parallel) or
one after the other (sequential), but without
combining them

Design Strategies

• Why Mixed Methods Approach?


– Triangulation (to corroborate research finding)
– Facilitation (as an aid)
– Complementarity (to fill gaps)
– Generality (to contextualize main study)
– Aid interpretation (to help explain relationships and
aspects)
– Study different aspects
– Solving a puzzle (method reveals unexplainable)

Design Strategies

• Mono or Multiple Method

– Mono Method: using a single data collection
technique and corresponding analysis
procedures; or
– Multiple Methods: using more than one data
collection technique and analysis procedure
to answer your research question.

Measurement
• Operationalization
– Dimensions
– Indicators
• Level of Measurements
– Nominal
– Ordinal
– Interval
– Ratio

Measurement of Quality
• Criteria of Measurement Quality
– Reliability (consistency of
measurement),
– Accuracy (getting the right answer on
average), and
– Validity (am I really tracking what I
want to track?).

Measurement of Quality

• Validity: the extent to which
researchers measure what they planned to measure.
– Criterion-Related Validity: is the degree to
which a measure correlates with some other
measure accepted as an accurate indicator of the
concept. Example: voting preference (measured
prior to the election) is correlated with actual
voting behavior.
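
As an illustration of the voting example, criterion-related validity can be checked by correlating the measure with the criterion. The sketch below uses the Pearson correlation coefficient on hypothetical data; the slides do not prescribe a particular coefficient, and the names `stated_preference` and `voted_for_party` are assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for eight voters.
# Measure: stated preference for a party before the election (0-10 scale).
stated_preference = [9, 8, 2, 7, 1, 3, 10, 4]
# Criterion: whether the voter actually voted for that party (1 = yes).
voted_for_party = [1, 1, 0, 1, 0, 0, 1, 0]

r = pearson_r(stated_preference, voted_for_party)
print(f"criterion correlation r = {r:.3f}")
```

A correlation close to 1 would suggest that the pre-election measure tracks the accepted criterion well.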

Measurement of Quality
– Face Validity: involves researchers simply asking
themselves whether their measures seem like logical,
common-sense ways to measure concepts. Example:
a measure of family income is valid on its face if it
counts the income of both the husband and the wife
when both are earning.
– Content Validity: is similar to face validity but uses
stricter standards. For a measure to have content
validity, it must capture all dimensions or features of
the concept as it is defined. For example, a general job
satisfaction measure should include pay satisfaction,
job security satisfaction, satisfaction with promotion
opportunities, and so on.
Measurement of Quality
◦ Construct Validity: of a measure refers to one of two
validity assessment strategies:
◦ First, it can refer to whether the variable, when assessed
with this measure, behaves as it should. For example, if the
theory (and/or past research) says it should be related
positively to another variable Y, then that relationship
should be found when the measure is used.
◦ The second use of construct validity refers to the degree to
which multiple indicators of the concept are related to the
underlying construct and not to some other construct. For
example, if a researcher has five indicators of cultural
capital and four indicators of social capital, a factor
analysis should produce two weakly correlated factors, one
for each set of indicators.
Measurement of Quality

• Reliability
– Test-retest Method–make the same measurement more
than once – should expect same response both times
– Inter-Rater Reliability– compare measurements from
different raters; verify initial measurements
– Split-Half Method – make more than one measure of
any concept; see whether the measures give
consistent results

• Reading assignment on scaling design, i.e.
comparative scaling techniques and
non-comparative scaling techniques
Sampling Design

Sampling Design
• A sample is a part of the target population,
carefully selected to represent the population.
• Generalization of the research findings depends
upon the sampling procedure followed.
• In physical science sampling is not a problem:
any fragment or piece of the phenomenon
is a true representative,
• therefore generalization based on a sample is
valid.
• But in behavioral science obtaining a
representative sample is a crucial problem.
Census and sample survey
• All items in a field of inquiry constitute a
“universe” or “population”.
Census: a complete enumeration of all
items in the “population”.
• No element is left out and high accuracy is
obtained.
• But in practice this may not be true.
• The government is usually the only institution that
can carry out a complete census.
• But if the population is small, there is no need
to take a sample.
Census and sample survey
Sampling:
Population is the entire mass of observations,
the parent group from which a
sample is to be formed.
Sample observations provide only an
estimate of the population characteristics.
Function of Population and Sampling
• Research work is guided by inductive thinking.
• The researcher proceeds from specificity to generality.
• The sample observations are the specific situation, which
is applied to the population, the general situation.
• Sampling is fundamental to all statistical
techniques and statistical analysis.
• A measure computed from a sample is a statistic (mean, SD, etc.)
• A measure of the population is a parameter.
• The accuracy of a parameter estimate depends on the
representativeness of the sample.
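
The distinction between a parameter and a statistic can be illustrated with a small simulation. Everything below is hypothetical: the population of incomes is simulated, and in real research the parameter would normally be unknown.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: 10,000 simulated monthly incomes.
population = [rng.gauss(50_000, 8_000) for _ in range(10_000)]
parameter = mean(population)      # population mean (a parameter)

# A random sample of 200 units drawn without replacement.
sample = rng.sample(population, 200)
statistic = mean(sample)          # sample mean (a statistic)

print(f"parameter (population mean): {parameter:,.0f}")
print(f"statistic (sample mean):     {statistic:,.0f}")
```

The statistic is only an estimate of the parameter; how close it comes depends on how representative the sample is.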
Sample Design
• A sample design is a specific plan for obtaining a
sample. It refers to the technique or procedure the
researcher would adopt in selecting items for the
sample.
• A sample design may also lay down the number of
items to be included in the sample, i.e. the size of the
sample.
• The sample design is determined before data are collected.
• Some sample designs are relatively more precise and easier
to apply than others
Selecting a Sample

Sample: a subset of a larger population.
Why Sample
– It saves time
– It reduces cost
– More reliable results can be obtained
– It provides more detailed information
– Sometimes it is the only method to depend upon
– Administrative convenience
– More scientific
Method of sampling
• Probability sampling
• Non-probability sampling
Sampling
• Who is to be sampled?
• How large a sample?
• How will sample units be selected?
– Probability Samples – every member of the
population has a known, nonzero probability of
being selected
– Non-probability Samples
Characteristics of a good sample design
• It must result in a truly representative sample
• It must result in a small
sampling error
• It must be viable in the context of cost
• The results of the sample study can be applied to the
universe with a reasonable level of confidence.
Steps in sample design
• Type of universe: whether it is finite or infinite must
be clearly defined
• Sampling unit: the sampling unit must be decided
before selecting a sample (e.g. state, village, family, etc.)
• Source list or sampling frame: the list from which the sample is
drawn; it contains the names of all items of the
universe
• Size of sample: the number of items to be selected from
the universe to constitute a sample. The size of a
sample should be neither excessively large nor too
small.
Criteria for selecting a sample procedure
• Guard against the causes of incorrect inferences
from the data
• Systematic bias results from errors in the sampling
procedures and cannot be reduced or eliminated by
increasing the sample size
• Systematic bias is the result of one or more of the
following:
– Inappropriate sampling frame (a biased representation of the universe)
– Defective measuring device
– Non-respondents
– Natural bias in the reporting of data
Bases of sampling design
• Sampling design is based on two factors:
• The representation basis: the sample may be
a probability sample (based on the concept of
random selection) or a non-probability
sample (non-random, arbitrary selection).
• The element selection technique basis:
the sample may be either unrestricted or
restricted
Probability sampling

• Probability sampling is also known as “random
sampling” or “chance sampling”.
• A probability sampling scheme is one in which every unit in
the population has a chance (greater than zero) of being
selected in the sample, and this probability can be accurately
determined.
• When every element in the population has the same
probability of selection, this is known as an 'equal
probability of selection' (EPS) design. Such designs are also
referred to as 'self-weighting' because all sampled units are
given the same weight.
• Sampling error decreases as the sample size increases: the
standard error of a sample mean is inversely proportional to
the square root of the sample size.
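
To make the relation between sampling error and sample size concrete, the sketch below uses the standard error of a sample mean, commonly written as s/√n. This formula is a standard result added here for illustration, and the standard deviation of 10 is a hypothetical value.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a sample mean for standard deviation sd and sample size n."""
    return sd / sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(f"n = {n:3d}  standard error = {standard_error(10.0, n):.2f}")
```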
Types of probability sampling
• Simple Random Sampling
• Systematic Sampling
• Stratified Sampling
• Cluster Sampling
• Multistage Sampling
Simple Random Sampling
• Each element of the population has an equal and
independent chance of being included in the sample.
• Applicable when the population is small, homogeneous and
readily available
• All subsets of the frame of a given size are given an equal
probability. Each element of the frame thus has an equal
probability of selection.
• It provides for the greatest number of possible samples. This
is done by assigning a number to each unit in the
sampling frame.
• A table of random numbers or a lottery system is used to
determine which units are to be selected.
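
A minimal sketch of simple random sampling, with Python's pseudo-random generator standing in for the table of random numbers or lottery. The frame of 500 numbered units and the sample size of 50 are hypothetical.

```python
import random

rng = random.Random(2022)  # fixed seed so the draw can be repeated

# Sampling frame: every unit is assigned a number from 1 to 500.
sampling_frame = list(range(1, 501))

# Draw 50 units without replacement; each unit has the same chance.
sample = rng.sample(sampling_frame, 50)

print(sorted(sample))
```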
Simple Random Sampling

• Advantage:
– It requires minimum knowledge of the population
– It is free from subjectivity
– It provides appropriate data for our purpose
– Least cost and time
– Better than judgmental sampling
• Disadvantage:
– Representativeness of the sample cannot be ensured
– It does not make use of existing, up-to-date knowledge
about the population
– The accuracy of inference depends on the size of the sample
Systematic Sampling
• It selects every kth element of the
population, where k = N/n (population size divided by sample size)
• It is an improvement over simple random
sampling
• It requires complete information about the
population
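
The kth-element rule can be sketched as below. The random starting point within the first interval is a common convention assumed here, not something stated on the slide, and `systematic_sample` is an illustrative name.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every kth unit of the frame, where k = N // n."""
    k = len(frame) // n          # sampling interval k = N/n
    start = rng.randrange(k)     # assumed: random start within the first interval
    return frame[start::k][:n]

# Hypothetical frame of N = 500 numbered units, sample size n = 50, so k = 10.
frame = list(range(1, 501))
selected = systematic_sample(frame, 50, random.Random(1))

print(selected[:5], "...")
```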
Systematic Sampling
• Advantage:
– More efficient and simple way to select a sample
– Sample may be representative and comprehensive
– Observations from the sample may be used for drawing
conclusions
• Disadvantage:
– It is not free from error or subjectivity
– Knowledge of the population is essential
– It cannot ensure representativeness
Stratified Sampling

– Where the population embraces a number of distinct
categories, the frame can be organized into
separate "strata." Each stratum is then sampled as
an independent sub-population, out of which
individual elements can be randomly selected.
– Every unit in a stratum has same chance of being
selected.
– Using same sampling fraction for all strata ensures
proportionate representation in the sample.
– Adequate representation of minority subgroups of
interest can be ensured by stratification & varying
sampling fraction between strata as required.
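
A sketch of proportionate stratified sampling under assumed data: two hypothetical strata ("urban" and "rural") sampled with the same 10% sampling fraction. Passing a different fraction per stratum would give the disproportionate variant.

```python
import random

def proportionate_stratified_sample(strata, fraction, rng):
    """Draw the same sampling fraction at random from every stratum."""
    sample = {}
    for name, units in strata.items():
        n_stratum = round(len(units) * fraction)
        sample[name] = rng.sample(units, n_stratum)
    return sample

# Hypothetical frame organized into two strata.
strata = {
    "urban": [f"U{i}" for i in range(600)],
    "rural": [f"R{i}" for i in range(400)],
}

picked = proportionate_stratified_sample(strata, 0.10, random.Random(3))
for name, units in picked.items():
    print(name, len(units))
```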
Reason for Stratified Sampling

• To increase a sample’s statistical efficiency
• To provide adequate data for analyzing the
various sub-populations
• To enable different research methods and
procedures to be used in different strata.
• Stratified sampling is usually more efficient
statistically than simple random sampling
• Types of stratified sampling:
– Disproportionate stratified sampling
– Proportionate stratified sampling
Stratified Sampling

• Advantage:
• It is a good representative of the population
• It is an improvement over the earlier methods
• It is an objective method of sampling
• Observations can be used for inferential purposes
• Disadvantage:
• It is difficult to decide the relevant criteria for
stratifying
• Only one criterion can be used for stratifying
• It is costly and time consuming
• There is a risk in generalization. Knowledge of the
population is needed.
Cluster Sampling

– Population can be divided into a number of relatively small
subdivisions which are themselves clusters of still smaller
units.
– Cluster sampling is an example of 'two-stage sampling' .
– First stage a sample of areas is chosen;
– Second stage a sample of respondents within those areas is
selected.
– Population divided into clusters of homogeneous units,
usually based on geographical contiguity.
– Sampling units are groups rather than individuals.
– A sample of such clusters is then selected.
– All units from the selected clusters are studied.
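
A sketch of the one-step form described in the last three points, where a sample of clusters is selected and all units in the chosen clusters are studied. The 12 villages of 20 houses each are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and return every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: 12 villages (clusters) of 20 houses each.
clusters = {
    f"village_{v}": [f"village_{v}_house_{h}" for h in range(20)]
    for v in range(12)
}

chosen, units = cluster_sample(clusters, 3, random.Random(5))
print(chosen, len(units))
```

Only a list of clusters is needed to draw the sample, which is why the method is useful when no list of individual units exists.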
Cluster Sampling

• Advantage:
• It may be a good representative of the population
• It is an easy method
• It is economical
• Useful when we do not have list of population
• Disadvantage:
• It is not free from error
• It is not comprehensive
• Areas may be dissimilar
Difference Between Strata and Clusters
– Although strata and clusters are both non-
overlapping subsets of the population, they differ in
some ways.
– All strata are represented in the sample; but only a
subset of clusters are in the sample.
– With stratified sampling, the best survey results
occur when elements within strata are internally
homogeneous. However, with cluster sampling, the
best results occur when elements within clusters are
internally heterogeneous
Multistage Sampling

– Complex form of cluster sampling in which two
or more levels of units are embedded one in the
other.
– First stage: a random sample of zones is chosen in all
states.
– Second stage: a random sample of woredas/towns/villages
is chosen within those zones.
– Third stage: the units are houses.
– All ultimate units (houses, for instance) selected
at the last step are surveyed.
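
The stages can be sketched with a hypothetical nested frame of zones, woredas and houses. The sizes (4 zones, 5 woredas per zone, 30 houses per woreda) and the number drawn at each stage are assumptions for illustration.

```python
import random

def multistage_sample(zones, n_zones, n_woredas, n_houses, rng):
    """Sample zones, then woredas within each zone, then houses within each woreda."""
    selected = []
    for zone in rng.sample(sorted(zones), n_zones):                # first stage
        woredas = zones[zone]
        for woreda in rng.sample(sorted(woredas), n_woredas):      # second stage
            for house in rng.sample(woredas[woreda], n_houses):    # third stage
                selected.append((zone, woreda, house))
    return selected

# Hypothetical nested frame: zone -> woreda -> list of houses.
zones = {
    f"zone_{z}": {
        f"woreda_{z}_{w}": [f"house_{z}_{w}_{h}" for h in range(30)]
        for w in range(5)
    }
    for z in range(4)
}

households = multistage_sample(zones, 2, 2, 5, random.Random(11))
print(len(households), households[0])
```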
Types of Non-Probability Sampling
• Convenience sampling
• Purposive sampling:
– Judgmental sampling
– Quota sampling
• Snowball sampling
Convenience sampling
• It is unrestricted non-probability sampling.
• Researchers have the freedom to choose whomever
they find, e.g. information from peers or friends
• It is the least reliable but the cheapest
method.
• There is no control to ensure precision.
Judgmental sampling
• Occurs when the researcher selects sample
members to conform to some criterion
• In the early stages of an exploratory study,
a judgmental sample is appropriate
• It is also good when one wishes to select a
biased group for screening purposes
Judgmental sampling
• Advantage:
• In this technique of sampling knowledge of
the investigator can be best used
• It is economical
• Disadvantage:
• This technique is subjective
• It is not free from error
• It includes uncontrollable variables
Quota sampling
• The population is classified into several categories
on the basis of judgment, assumption or
previous knowledge, and the proportion of the
population falling into each category is decided.
• It aims at making the best use of stratification
without incurring the high costs involved in
probability methods
• It combines elements of both judgmental and
probability sampling
• It is very arbitrary
Quota sampling
• Advantage:
• It is an improvement over Judgmental sampling
• It is an easy sampling technique
• It is most frequently used in social survey
• Disadvantage:
• It is not a representative sample
• It is not free from error
Snowball sampling

• In the initial stage individuals may or may not be selected
through probability methods.
• The group is then used to locate others who possess
similar characteristics and who, in turn, identify
others.
Scales of Measurement
– Nominal Scale - groups or classes
Example: gender
– Ordinal Scale - order matters
Example: ranks (top ten videos)
– Interval Scale - difference or distance matters; has an
arbitrary zero value.
Example: temperatures (°F, °C)
– Ratio Scale - ratio matters; has a natural zero
value.
Example: salaries
UNIT 3
Methods of Data Collection
• Primary Data: those which are
collected afresh and for the first time, and thus happen to be
original in character.

• Secondary Data: those which have already been
collected by someone else and which have already been
passed through the statistical process
Collection of Primary Data
• Primary data is collected through:
– Observation
– Interview
– Telephone interview
– Questionnaire
– Schedule
Data Collection through Observation

• Observation methods can be structured or unstructured
1. Participant observation: the observer makes himself a
member of the group observed
2. Non-participant observation
3. Disguised observation: the researcher's presence may be
unknown to the people he is observing
4. Controlled observation: takes place according to definite
pre-arranged plans, involving experimental procedure
5. Uncontrolled observation: takes place in the natural setting
Data Collection through Interview

• Personal interview
• Telephone interview
Data Collection through Interview

• Personal interviews: a person known as the interviewer asks
questions, generally in face-to-face contact with the other
party
Interview can be:
• Structured interview
• Unstructured interview
• Semi-structured interview
• Clinical interview: concerned with the broad underlying
feelings/motivations or life experience of the individual
• Non-directive interview: simply encourages respondents to
talk about the given topic
Data Collection through Interview

• Pre-requisites of interviewing:
• Interviewers must be carefully selected, trained and
briefed.
• They should be honest, sincere and hardworking
• Every effort should be made to create a friendly
atmosphere of trust and confidence
• Must ask questions properly
• Should not show surprise or disapproval
Interview method
• Advantage:
• It is possible to get complete response
• It is more personal than questionnaire
• Interviewer has much control over the flow and sequence of
questions
• It is possible to make the survey responsive to earlier
results
• Disadvantage:
• Information obtained is difficult to analyze
• It cannot be quantified
• It needs trained interviewers
Telephone Interviews
• In telephone interviews, respondents are contacted
by telephone in order to collect data for surveys
• Telephone interviewing has been used for decades
and, in some ways, has advantages over other
methods of undertaking surveys
• With improvements in the IT field, computers can be
used to assist in telephone interviewing, and answers
given by respondents can be entered by interviewers
directly into the computer, saving effort, time and
cost
Questionnaire design
What is a questionnaire?

• A research tool for data collection
• Its function is measurement
• The term ‘questionnaire’ used in different ways:
– often refers to self-administered and postal
questionnaires (mail surveys)
– some authors also use the term to describe
interview schedules (telephone or face-to-face)
Why use a questionnaire?
• Target large numbers of people
• Use to describe, compare or explain
• Can cover activities and behaviour,
knowledge, attitudes, preferences
• Specific objectives, standardised and highly
structured questions
• Used to collect quantitative data – information
that can be counted or measured
Techniques for minimising non-response
• Good design
– Thoughtful layout, easy to follow, simple
questions, appearance, length, degree of interest
and importance, thank people for taking part
• Pre-notification
• Explanation of selection
• Sponsorship, e.g. letter of introduction /
recommendation
• Cover letter
Techniques for minimizing non-response
• Incentives
– Small future incentives, e.g. prize draw
– Understanding why their input is important
• Reminders
• Confidentiality
• Anonymity
• Pre-paid return envelopes
Question wording – things to avoid
• Abbreviations
• Alternative meanings (tea, cool, dinner)
• Ambiguity and vague wording (fairly, generally,
you – the respondent, household, family?)
• Double-barrelled questions – ‘do you speak English or
French?’
• Double negatives
• Inappropriate categories
Question wording – things to avoid
• Leading questions
• Memory issues
• Social desirability
• Question complexity
Question wording – other things to think about
• Missing categories – include ‘other’, ‘don’t
know’ and ‘not applicable’
• Sensitive questions
• Simple language – not technical or slang
• Question ordering
• Open or closed questions?
– Closed question – choice of alternative replies
– Open question – written text (or spoken answers)
Open and closed questions
(from Oppenheim, 1992)
OPEN
• Strengths: freedom and spontaneity of answer; opportunity to probe;
useful for testing hypotheses about ideas or awareness
• Limitations: time-consuming; coding more problematic; more effort
from respondents
CLOSED
• Strengths: requires little time; no extended writing; low costs;
easy to process; makes group comparisons easy; useful for testing
specific hypotheses
• Limitations: loss of spontaneous responses; bias in answer
categories; sometimes too crude; may irritate respondents
Data Collection through Schedule
• It is very much like the collection of data through a
questionnaire
• The difference is that schedules (proformas containing a
set of questions) are filled in by enumerators
who are specially appointed for the purpose
• This method requires the selection of enumerators for filling
up schedules or assisting respondents to fill up
schedules
• Such enumerators have to be carefully selected and
trained
• Enumerators should be intelligent and must possess
the capability of cross-examination in order to find out
the truth
• They should be honest, sincere, hardworking and
patient
Types of questions
• Unstructured questions
• Structured questionnaire:
-Open ended questions
-Closed-ended questions
• Dichotomous or two –choice questions
• Multiple- choice questions
• Declarative questions-as choice but list of
statements
Essentials of good questionnaire

• It should be comparatively short and simple
• The size of the questionnaire should be kept to the
minimum
• Questions should proceed in a logical sequence, moving
from easy to difficult
• Personal and intimate questions should be left out
• Technical terms and vague expressions should be
avoided
• Danger words, catch words or words with emotional
connotations should be avoided
• Caution must also be exercised in the use of phrases.
Questionnaire VS. Schedule

Difference between Questionnaire and Schedule:

Both methods are being increasingly used in survey
research. But from a technical point of view there are
differences between them.
1. Questionnaires are sent through the mail, but schedules are
generally filled out by the research worker
2. A questionnaire is relatively economical
3. With a questionnaire it is not clear who answers the questions
4. Questionnaires may not be returned quickly
5. A questionnaire can be used only when respondents are educated
and cooperative
6. No personal contact is established with a questionnaire
7. There is a risk of incomplete and incorrect responses in
questionnaires
Secondary Data
• Primary Vs Secondary
• What is secondary data?
• Types? (Saunders 2000:190)
• Documentary
– Written documents
• Survey based secondary
– Census
– Continuous survey
– Ad hoc surveys

Collection of Secondary Data
• Sources of secondary data can be:
• Various publications (governmental and
international)
• Journals, books, magazines, newspapers,
reports, public records, statistics
Characteristics of Secondary data
• Secondary data should possess the following
characteristics:
• Reliability of the data: who collected it, what were
the sources, what was the method, when was it
collected, was there any bias, what was the level of
accuracy?
• Suitability of data
• Adequacy of data
Issues in documentary research
• Authenticity
– You can’t always believe what you read
• Check – does it make sense / different versions / consistency / transcribed
by many/ circulated via those with interest/ reliable source
• Credibility
– Is it free from error / distortion
• Representativeness
• Constitute a representative sample of the universe of
documents as they originally existed
– Partial account of one person

UNIT 4
Method of Data Analysis
• Descriptive Statistics
• Parametric & non- parametric
• Univariate
• Bivariate
• Multivariate statistical techniques such as:
Multiple regressions
Discriminant analysis
Factor analysis
Cluster analysis
Descriptive Statistics
Descriptive statistics are numerical and graphical methods
used to summarize data and bring forth the underlying
information.

The numerical methods include measures of central
tendency and measures of variability

Summary Measures:

• Measures of Central Tendency
– Mean
– Median
– Mode
• Measures of Variability
– Range
– Variance
– Standard Deviation
• Other summary measures:
– Skewness
– Kurtosis
Descriptive statistics :Example
A marketing company has a sales staff of
20 sales executives. The data regarding
their age and total sales achieved in their
territories in a particular month are given .
We wish to calculate some basic
descriptive statistics for this data.
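
The slide's data table is not reproduced here, so the sketch below uses a short list of hypothetical ages to show how the summary measures could be computed. The skewness and kurtosis use the simple moment-based definitions, which is one of several conventions.

```python
from statistics import mean, median, mode, variance, stdev

# Hypothetical ages of eight sales executives (not the slide's data).
ages = [25, 28, 28, 31, 35, 40, 28, 33]

# Measures of central tendency
avg, mid, most_common = mean(ages), median(ages), mode(ages)

# Measures of variability (sample variance and standard deviation)
value_range = max(ages) - min(ages)
var, sd = variance(ages), stdev(ages)

# Moment-based skewness and excess kurtosis
n = len(ages)
m2 = sum((x - avg) ** 2 for x in ages) / n
m3 = sum((x - avg) ** 3 for x in ages) / n
m4 = sum((x - avg) ** 4 for x in ages) / n
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"mean={avg}, median={mid}, mode={most_common}")
print(f"range={value_range}, variance={var:.2f}, sd={sd:.2f}")
print(f"skewness={skewness:.2f}, excess kurtosis={excess_kurtosis:.2f}")
```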
Parametric & non-parametric
Hypothesis Tests

• Parametric Tests (Metric Tests)
– One Sample: t test, Z test
– Two or More Samples
  – Independent Samples: Two-Group t test, Z test
  – Paired Samples: Paired t test
• Non-parametric Tests (Non-metric Tests)
– One Sample: Chi-Square, K-S, Runs, Binomial
– Two or More Samples
  – Independent Samples: Chi-Square, Median, K-S
  – Paired Samples: Sign
Comparing Means:
One or Two Samples t-Tests
- In many real-life problems, the population mean (or a hypothesized value for it) is known, but the population standard deviation is not and has to be estimated from the sample. In such cases, t-tests should be used.
- Besides, the t-test does not require a big sample size.
Types of t-tests

There are three different types of t-tests:
• One sample t-test
• Independent samples t-test
• Dependent (paired) t-test

One Sample t-test
One sample t-test is used to compare the mean of a single sample with the population mean.

EXAMPLE: A business school in its advertisements claims that the average salary of its graduates in a particular lean year is at par with the average salaries offered at the top five business schools. A sample of 30 graduates, from the business school whose claim was to be verified, was taken at random.
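
The test statistic for such a comparison can be sketched as below. The salary figures and the claimed mean are hypothetical (8 graduates instead of 30, to keep the arithmetic easy to follow), and `one_sample_t` is an illustrative name. The resulting t value would then be compared with a t-table value for the given degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Standard error uses the sample standard deviation, since the
    # population standard deviation is unknown.
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (sample_mean - population_mean) / std_error, n - 1

# Hypothetical salaries (in thousands) of 8 sampled graduates.
salaries = [48, 52, 50, 47, 53, 49, 51, 50]
claimed_mean = 52  # hypothetical average salary at the top schools

t_stat, df = one_sample_t(salaries, claimed_mean)
print(f"t = {t_stat:.3f}, df = {df}")
```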

Independent samples t-test
In many real-life situations, we cannot determine the exact value of the population mean. We are only interested in comparing two populations using a random sample from each. Such experiments, where we are interested in detecting differences between the means of two independent groups, are analyzed with the independent samples t-test.

EXAMPLE: An economist wants to compare the per capita income of two different regions.

Example: Independent t-test
A study was conducted to compare the efficiency of
the workers of two mines, one with private
ownership and the other with government
ownership. The researcher was of the view that
there is no significant difference in their efficiency
levels. 20 workers from the private sector mine and
24 from the government owned mine were selected
and their average output per shift was recorded. In
this problem we want to assess whether the
efficiency of the workers of the two mines is the
same

The null hypothesis in this case would be that there is no difference in the efficiency of the workers of the two mines.

H0: Average output of the workers from mine 1 equals that of the workers from mine 2

Ha ??

The variables are labeled as miner, mine, and output respectively for entering into the SPSS program.
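
Outside SPSS, the same comparison can be sketched by hand. The version below is a minimal pooled-variance t statistic, which assumes the two groups have equal variances. The output figures are hypothetical (5 workers per mine rather than 20 and 24), and `independent_t` is an illustrative name.

```python
import math
import statistics

def independent_t(group1, group2):
    """Return (t statistic, degrees of freedom), assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (statistics.mean(group1) - statistics.mean(group2)) / std_error
    return t_stat, n1 + n2 - 2

# Hypothetical average output per shift for workers at each mine.
private_mine = [10, 12, 11, 13, 14]
government_mine = [9, 11, 10, 12, 8]

t_stat, df = independent_t(private_mine, government_mine)
print(f"t = {t_stat:.3f}, df = {df}")
```

If the group variances differ markedly, a test that does not pool the variances would be the safer choice.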

Dependent (paired sample) t-test

When observations are made on the same sample at two different times, the means are compared using the dependent or paired sample t-test.

EXAMPLE: A teacher wants to know if interactive teaching helps students learn more as compared to one-way lecturing.
Example: paired t-test
• A corporate training institution claimed that its
training program can greatly enhance the efficiency of
call center employees. A big call center sent some of
its employees for the training program. The efficiency
was measured by the number of deals closed by each
employee in a one-month period. Data was collected
for a one-month period before sending the employees
for the training program. After the training program,
data was again collected on the same employees for a
one-month period.
The company wants to know……………….

• H0: The average output of the employees is the same before and after going through the training program.
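
A minimal sketch of the paired t statistic for this kind of before/after design. It works on the per-employee differences, so each employee acts as their own control. The deal counts are hypothetical and `paired_t` is an illustrative name.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired observations."""
    # One difference per employee: after minus before.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical deals closed per month by the same 5 employees.
deals_before = [20, 22, 19, 24, 21]
deals_after = [23, 24, 20, 27, 26]

t_stat, df = paired_t(deals_before, deals_after)
print(f"t = {t_stat:.3f}, df = {df}")
```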

Comparing Means: Analysis of Variance

• Analysis of Variance (ANOVA) is used to compare the means of more than two populations.
• The null hypothesis, typically, is that all means are equal.
• Analysis of variance must have a dependent variable that is metric (measured using an interval or ratio scale).
• There must also be one or more independent variables that are all categorical (non-metric). Categorical independent variables are also called factors.
Types of ANOVA
1. One-way ANOVA : involves only one categorical variable, or a
single factor. In one-way analysis of variance, a treatment is the
same as a factor level.
2. If two or more factors are involved, the analysis is termed n-way
analysis of variance.
3. If the set of independent variables consists of both categorical
and metric variables, the technique is called analysis of
covariance (ANCOVA). In this case, the categorical independent
variables are still referred to as factors, whereas the metric
independent variables are referred to as covariates.

4. Multivariate analysis of variance (MANOVA) is similar
to analysis of variance (ANOVA), except that instead of
one metric dependent variable, we have two or more.

• In ANOVA, the null hypothesis is that the means of the dependent variable are equal across the groups.

• In MANOVA, the null hypothesis is that the vectors of means on multiple dependent variables are equal across groups.

• Multivariate analysis of variance is appropriate when there are two or more dependent variables that are correlated.

Relationships amongst ANOVA, ANCOVA, & Regression

Metric dependent variable, with one or more independent variables:
• Categorical (factorial) independent variables: Analysis of Variance
– One factor: One-Way Analysis of Variance
– More than one factor: N-Way Analysis of Variance
• Categorical and interval independent variables: Analysis of Covariance
• Interval independent variables: Regression
Example: one-way ANOVA

An oil company has introduced a new brand of gasoline in its outlets in three major metro cities. However, they are not sure how the new brand is selling at the three places since there is a lot of difference in the driving habits of people in the three metros. The company selected 10 outlets in each city and tabulated the data on average daily sales at each of the selected outlets.

H0: The average sale of the new brand of gasoline is the same in all the metro cities

In the ANOVA output:
• Between Groups gives the variability due to the place of sale (between-groups variability). This is the variation in sales volume (Y) related to the variation in the means of the categories of metro (X).
• Within Groups gives the variability due to random error (SSerror). This is the variation in Y due to the variation within each of the categories of X.
• The third row gives the total variability.

Result: There is a significant difference in the sales volume of the new brand of gasoline in the three metros, F (2, 27) = 35.52, p < 0.001
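
The F statistic comes from splitting the total variability into its between-groups and within-groups parts. A minimal sketch follows, using hypothetical sales for 3 outlets per city (so the degrees of freedom are (2, 6), not the (2, 27) obtained with 10 outlets per city); `one_way_anova` is an illustrative name.

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, df between, df within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of observations around their group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical average daily sales at 3 outlets in each of three metros.
metro_a = [10, 12, 14]
metro_b = [15, 17, 19]
metro_c = [20, 22, 24]

f_stat, df_b, df_w = one_way_anova([metro_a, metro_b, metro_c])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```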

Interpreting the result
• If the null hypothesis of equal category means is not rejected, then the independent variable does not have a significant effect on the dependent variable.

• On the other hand, if the null hypothesis is rejected, then the effect of the independent variable is significant.

• A comparison of the category mean values will indicate the nature of the effect of the independent variable.

The F-statistic in ANOVA only tells us that the dependent variable varies for different levels of the factor(s). In case of more than two levels, the F-statistic does not tell us the exact way in which the dependent variable differs by the levels of the factor(s).

Two-Way ANOVA
In two-way analysis, we have two independent variables or factors and we are interested in knowing their effect on the same dependent variable.

Two-way ANOVA: Example
An MBA aspirant was interested in knowing the impact of educational background (arts/commerce and science/engineering) on the final placement salaries. He is also aware that previous work experience has an impact on the salaries. Therefore, he chooses educational background and work experience as two independent variables.

H0: The educational background and
previous work experience have no
bearing on the placement salaries of
MBA students.

Correlation Analysis
• Correlation is a measure of the relationship between two variables.
• Use it when the two variables between which correlation is to be established are equal-interval or ratio scaled in their level of measurement, such as age, weight, etc.
• The correlation coefficient gives a mathematical value for measuring the strength of the linear relationship between two variables. It can take values from –1 to 1.
Correlation cont…

The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables.

The population correlation, denoted by ρ, can take on any value from -1 to 1.

ρ = -1 indicates a perfect negative linear relationship
-1 < ρ < 0 indicates a negative linear relationship
ρ = 0 indicates no linear relationship
0 < ρ < 1 indicates a positive linear relationship
ρ = 1 indicates a perfect positive linear relationship

The absolute value of ρ indicates the strength or exactness of the relationship.
Illustrations of Correlation

[Figure: six scatter plots of Y against X, illustrating ρ = -1, ρ = 0, ρ = 1, ρ = -.8, ρ = 0, and ρ = .8]

The product moment correlation (r) summarizes the strength of association between two metric (interval or ratio scaled) variables, say X and Y.

As it was originally proposed by Karl Pearson, it is also known as the Pearson correlation coefficient.

It is also referred to as simple correlation, bivariate correlation, or merely the correlation coefficient.

Bivariate/Simple Correlation

• Bivariate correlation tests the strength of the relationship between two variables without giving any consideration to the interference some other variable might cause to the relationship between the two variables being tested.

• Example: the correlation between the academic performance and attendance of a student.
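
A minimal sketch of the Pearson (product moment) correlation coefficient for an attendance/performance example of this kind. The attendance percentages and marks are hypothetical, and `pearson_r` is an illustrative name.

```python
import math

def pearson_r(x, y):
    """Return the Pearson product moment correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical attendance (%) and marks for 5 students.
attendance = [60, 70, 80, 90, 100]
marks = [50, 55, 65, 70, 85]

r = pearson_r(attendance, marks)
print(f"r = {r:.3f}")
```

A value this close to 1 would indicate a strong positive linear relationship in the sample; it says nothing about whether some third variable is driving both.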

Partial Correlation
A partial correlation coefficient measures the
association between two variables after controlling
for, or adjusting for, the effects of one or more
additional variables.

Partial correlations have an order associated with them. The order indicates how many variables are being adjusted or controlled.
The simple correlation coefficient, r, is of zero order, as it does not control for any additional variables while measuring the association between two variables.
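
A first-order partial correlation (one control variable, Z) can be computed from the three zero-order correlations. The sketch below uses the standard first-order formula; the input correlations are hypothetical and `partial_correlation` is an illustrative name.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y, controlling for Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations among X, Y, and the control variable Z.
r_xy_given_z = partial_correlation(r_xy=0.6, r_xz=0.5, r_yz=0.4)
print(f"partial r = {r_xy_given_z:.3f}")
```

When Z is uncorrelated with both X and Y, the partial correlation reduces to the simple correlation.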
Regression Analysis
• Regression analysis is used to assess the relationship between one dependent variable (DV) and one or more independent variables (IVs)
• Examines associative relationships between a metric dependent variable and one or more independent variables
• Is a dependence technique
• Determines whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists
• Determines how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship
• Bivariate/Simple Regression involves a single metric dependent variable and a single metric independent variable

• Multiple Regression involves a single metric dependent variable and two or more metric independent variables

• Coefficient of determination, r². The strength of association is measured by the coefficient of determination, r². It varies between 0 and 1 and signifies the proportion of the total variation in Y that is accounted for by the variation in X

• r² is used to find out how well the IVs are able to predict the DV

• Adjusted r² gives more accurate information about the fit of the model.
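
Adjusted r² penalizes r² for the number of IVs in the model. A minimal sketch using the usual adjustment formula; the r² value, sample size, and number of IVs below are hypothetical, and `adjusted_r_squared` is an illustrative name.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust r-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical model: r-squared of 0.60 from 50 observations and 2 IVs.
adj = adjusted_r_squared(0.60, n_obs=50, n_predictors=2)
print(f"adjusted r-squared = {adj:.4f}")
```

The adjusted value is never larger than r², and the gap widens as more IVs are added relative to the sample size.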
Multicollinearity
• Refers to a situation when two or more IVs are highly correlated with each other

• Causes an inflation in the standard error of regression coefficients, resulting in a reduction of their significance
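
One commonly used way to quantify this inflation is the variance inflation factor (VIF), which is not named in these slides and is introduced here as an assumption. The sketch below covers only the simplest case of a model with exactly two IVs, where the VIF depends solely on the correlation between them; the function name is illustrative.

```python
import math

def variance_inflation_factor(r_between_ivs):
    """VIF for a model with exactly two IVs, given their correlation."""
    return 1 / (1 - r_between_ivs ** 2)

# Hypothetical correlation of 0.9 between the two IVs.
vif = variance_inflation_factor(0.9)
# The standard error of each coefficient is inflated by sqrt(VIF).
se_inflation = math.sqrt(vif)
print(f"VIF = {vif:.2f}, standard error inflated {se_inflation:.2f} times")
```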

Example :
• A researcher wants to test some hypotheses
regarding the relationship between size and age
of a firm and its performance in a particular
industry. Size was measured by the number of
employees (in 100s) working in the firm, age was
the number of years for which the firm has been
operating, and performance was measured by
return on equity. A sample of 50 firms was
selected at random.

Cont..
H1: Performance of a firm is positively related to its size

H2: Performance of a firm is positively related to its age

Regression cont…
• Regression is the determination of a statistical relationship between two or more variables.
• The independent variable is treated as the cause of the behavior of another variable (the dependent variable).
• Regression can only be interpreted in terms of what exists physically (i.e., there must be a physical way in which the independent variable can affect the dependent variable).

The Basic Regression Model
Simple Linear Regression
Y = a + bX
Y = Dependent variable
a = intercept
b = slope/regression coefficient (change in Y with a one unit change in X)
X = predictor value

Multiple Linear Regression


Y = a + b1X1 + b2X2 +b3X3 + …
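
For the simple linear model Y = a + bX, the intercept and slope can be estimated by ordinary least squares. A minimal sketch with hypothetical data; `fit_line` is an illustrative name.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: cross-products of deviations divided by squared deviations of X.
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Hypothetical predictor and response values.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(f"Y = {a:.1f} + {b:.1f}X")
```

Here b is the estimated change in Y for a one-unit change in X, as defined above.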

The Regression Line

[Figure: the line Y = a + bX plotted against X, with intercept a and slope b]
How Good is the Regression?
The coefficient of determination, r², is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

r² = percentage of total variation explained by the regression.
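
A minimal sketch of how r² can be computed for a fitted line, as one minus the share of total variation left unexplained. The data and the coefficients a = 2.2 and b = 0.6 are hypothetical, and `r_squared` is an illustrative name.

```python
def r_squared(x, y, a, b):
    """Proportion of total variation in y explained by the line a + b*x."""
    mean_y = sum(y) / len(y)
    # Unexplained variation: squared residuals around the fitted line.
    ss_error = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation: squared deviations of y around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_error / ss_total

# Hypothetical data and a least-squares line fitted to it.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r2 = r_squared(x, y, a=2.2, b=0.6)
print(f"r-squared = {r2:.2f}")
```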

Multivariate Statistical Analysis

1. Discriminant analysis
2. Factor analysis
3. Cluster analysis

Unit 5: Report writing
• Reading Assignment on :
Report writing and presentation of results
Importance of report writing,
types of research report, report structure,
guidelines for effective documentation
