Research Methodology
Researchers organize their research by formulating and defining a research problem. This helps them focus
the research process so that they can draw conclusions reflecting the real world in the best possible way.
Hypothesis
In research, a hypothesis is a suggested explanation of a phenomenon.
A null hypothesis is a hypothesis which a researcher tries to disprove. Normally, the null hypothesis represents the
current view/explanation of an aspect of the world that the researcher wants to challenge.
Research methodology involves the researcher providing an alternative hypothesis, a research hypothesis, as an
alternate way to explain the phenomenon.
The researcher tests the hypothesis in an attempt to disprove the null hypothesis, not out of attachment to the research
hypothesis, but because rejecting the null would mean coming closer to an answer to a specific problem. The research
hypothesis is often based on observations that raise suspicion that the null hypothesis is not always correct.
A variable is something that changes; it changes according to different factors. Some variables change easily, like
a stock-exchange value, while other variables are almost constant, like someone's name. Researchers often
seek to measure variables.
The variable can be a number, a name, or anything where the value can change.
In research, you typically define variables according to what you're measuring. The independent variable is the
variable that the researcher manipulates or selects (the cause), while the dependent variable is the effect (or
assumed effect), dependent on the independent variable. These variables are often stated in experimental
research, in a hypothesis, e.g. "what is the effect of personality on helping behavior?"
In explorative research methodology, e.g. in some qualitative research, the independent and the dependent
variables might not be identified beforehand. They might not be stated because the researcher does not have a
clear idea yet on what is really going on.
Confounding variables are variables with a significant effect on the dependent variable that the researcher failed
to control or eliminate - sometimes because the researcher is not aware of the effect of the confounding variable.
The key is to identify possible confounding variables and somehow try to eliminate or control them.
Choosing the Research Method
The selection of the research method is crucial for what conclusions you can make about a phenomenon. It affects
what you can say about the cause and factors influencing the phenomenon.
It is also important to choose a research method which is within the limits of what the researcher can do. Time,
money, feasibility, ethics and availability to measure the phenomenon correctly are examples of issues
constraining the research.
Choosing the Measurement
Choosing the scientific measurement is also crucial for reaching a correct conclusion. Some measurements
might not reflect the real world, because they do not measure the phenomenon as they should.
Significance Test
To test a hypothesis, quantitative research uses significance tests to assess which hypothesis the data support.
The significance test can show whether the null hypothesis is more likely correct than the research hypothesis.
Research methodology in a number of areas like social sciences depends heavily on significance tests.
A significance test may even drive the research process in a whole new direction, based on the findings.
The t-test (also called the Student's T-Test) is one of many statistical significance tests, which compares two
supposedly equal sets of data to see if they really are alike or not. The t-test helps the researcher conclude
whether a hypothesis is supported or not.
Drawing Conclusions
Drawing a conclusion is based on several factors of the research process, not just because the researcher got the
expected result. It has to be based on the validity and reliability of the measurement; how good the measurement
was to reflect the real world and what more could have affected the results.
The observations are often referred to as 'empirical evidence' and the logic/thinking leads to the conclusions.
Anyone should be able to check the observation and logic, to see if they also reach the same conclusions.
Errors of the observations may stem from measurement-problems, misinterpretations, unlikely random events etc.
A common error is to think that correlation implies a causal relationship. This is not necessarily true.
Errors in Research
Logically, there are two types of errors when drawing conclusions in research:
A Type 1 error is accepting the research hypothesis when the null hypothesis is in fact correct (a false positive).
A Type 2 error is rejecting the research hypothesis even though the null hypothesis is wrong (a false negative).
POPULATION - A research population is generally a large collection of individuals or objects that is the main focus
of a scientific query. A research population is also known as a well-defined collection of individuals or objects
known to have similar characteristics. All individuals or objects within a certain population usually have a common,
binding characteristic or trait.
Target population refers to the ENTIRE group of individuals or objects to which researchers are interested
in generalizing the conclusions. The target population usually has varying characteristics and it is also known as the
theoretical population.
SAMPLE is simply a subset of the population. The concept of sample arises from the inability of the researchers to
test all the individuals in a given population. The sample must be representative of the population from which it
was drawn and it must have good size to warrant statistical analysis.
The main function of the sample is to allow the researchers to conduct the study on individuals from the population
so that the results of their study can be used to derive conclusions that will apply to the entire population. It is
much like a give-and-take process. The population “gives” the sample, and then it “takes” conclusions from the
results obtained from the sample.
Sampling Methods
Probability Sampling refers to sampling when the chance of any given individual being selected is known and these
individuals are sampled independently of each other. This is also known as random sampling. A researcher can
simply use a random number generator to choose participants (known as simple random sampling), or
every nth individual (known as systematic sampling) can be included. Researchers also may break their target
population into strata, and then apply these techniques within each strata to ensure that they are getting enough
participants from each strata to be able to draw conclusions. For example, if there are several ethnic communities
in one geographical area that a researcher wishes to study, that researcher might aim to have 30 participants from
each group, selected randomly from within the groups, in order to have a good representation of all the relevant
groups.
Random Sampling
Sampling techniques can be divided into two broad categories:
random (or probability) sampling and
non-random (or non-probability) sampling.
Random sampling includes:
simple random sampling,
stratified sampling,
systematic sampling,
two-stage sampling,
multi-stage sampling and
cluster sampling.
Notation: N is the population size and n the sample size (e.g. N = 1000, n = 100); for stratified and multi-stage
designs, N = N1 + N2 + N3 ... and n = n1 + n2 + n3 ...
A simple random sample (SRS) is the most basic probabilistic option used for creating a sample from a
population. Each SRS is made of individuals drawn from a larger population (represented by the variable N),
completely at random. As a result, said individuals have an equal chance of being selected throughout the
sampling process. The benefit of SRS is that the investigator is likely to obtain a sample which is
representative of the population, which supports statistically valid conclusions.
Example
An investigator wishes to draw multiple samples consisting of 5 people each from a village of 100. (Here, the
variable n is used to represent the size of the sample; thus village size N=100 and sample size n=5). By randomizing
the selection procedure, any member of this village has an equal chance of being selected as part of this first
sample, and an equal chance of being selected for the next sample of the same size (and so on).
To create a systematic random sample, there are seven steps: (a) defining the population; (b) choosing your sample
size; (c) listing the population; (d) assigning numbers to cases; (e) calculating the sampling fraction; (f) selecting the
first unit; and (g) selecting your sample.
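As a minimal sketch, simple random sampling can be done with Python's random module; the village of N = 100 and sample size n = 5 follow the example above, and the function name is illustrative:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units from the population so every unit is equally likely (SRS)."""
    rng = random.Random(seed)          # seeded only for reproducibility
    return rng.sample(population, n)

# Village of N = 100 people, numbered 1..100; sample size n = 5
village = list(range(1, 101))
sample = simple_random_sample(village, 5, seed=42)
print(sample)  # 5 distinct villagers, each combination equally probable
```

Repeated calls with different seeds yield further samples of the same size, each drawn with equal probability, as the example describes.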
Stratified random sampling is a probabilistic sampling option. The first step in stratified random sampling is
to split the population into strata, i.e. sections or segments. The strata are chosen to divide a population into
important categories relevant to the research interest.
For example, if interested in school achievement we may want to first split schools into rural, urban, and suburban
as school achievement on average may be quite distinct between these regions. The second step is to take a simple
random sample within each stratum. This way a randomized probabilistic sample is selected within each stratum.
Each strata should be mutually exclusive (i.e. every element in the population can be assigned to only one
stratum), and no population element can be excluded in the construction of strata.
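A stratified random sample can be sketched as grouping units into strata and drawing an SRS within each stratum; the school/region data below are made up for illustration:

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take an SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)   # each unit lands in exactly one stratum
    return {name: rng.sample(units, min(n_per_stratum, len(units)))
            for name, units in strata.items()}

# Hypothetical schools tagged with a region, the category relevant to the research interest
schools = [("school_%d" % i, ["rural", "urban", "suburban"][i % 3]) for i in range(30)]
sample = stratified_sample(schools, stratum_of=lambda s: s[1], n_per_stratum=3, seed=1)
```

Because the strata are mutually exclusive and exhaustive, every school is eligible in exactly one stratum, matching the requirement stated above.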
Systematic sampling is a random sampling technique which is frequently chosen by researchers for its
simplicity and its periodic quality.
In systematic random sampling, the researcher first randomly picks the first item or subject from the population.
Then, the researcher selects every nth subject from the list.
The procedure involved in systematic random sampling is very easy and can be done manually. The results are
representative of the population unless certain characteristics of the population are repeated for every nth
individual, which is highly unlikely.
Steps in selecting a systematic random sample:
Calculate the sampling interval (the number of households in the population divided by the number of
households needed for the sample)
Select a random start between 1 and the sampling interval
Repeatedly add the sampling interval to select subsequent households
The process of obtaining the systematic sample is much like an arithmetic progression.
1. Starting number: The researcher selects an integer that must be less than the total number of individuals
in the population. This integer will correspond to the first subject.
2. Interval: The researcher picks another integer which will serve as the constant difference between any
two consecutive numbers in the progression.
The integer is typically selected so that the researcher obtains the correct sample size
For example, the researcher has a population total of 100 individuals and need 12 subjects. He first picks his
starting number, 5.
Then the researcher picks his interval, 8. The members of his sample will be individuals 5, 13, 21, 29, 37, 45, 53, 61,
69, 77, 85, 93.
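The arithmetic progression above (start 5, interval 8, 12 subjects) is easy to generate in code:

```python
def systematic_sample(start, interval, n_subjects):
    """Members form an arithmetic progression: start, start+interval, start+2*interval, ..."""
    return [start + i * interval for i in range(n_subjects)]

members = systematic_sample(start=5, interval=8, n_subjects=12)
print(members)  # [5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93]
```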
Sampling Gap
Sampling interval K = N/n (the gap from one selected unit to the next; when the end of the list is reached, counting
continues from the beginning of the list).
Suppose-
N = 20, n = 5; then K = 20/5 = 4
(Population = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
Ex- first select 9, then 13, 17, 1, 5...
Sample size - small (n < 30), large (n >= 30)
Probability P = FC/EC, where P = probability, FC = number of favorable cases, EC = exhaustive number of cases
Statistical Tests –
A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intent
is to determine whether there is enough evidence to "reject" a conjecture or hypothesis about the process. The
conjecture is called the null hypothesis. Not rejecting may be a good result if we want to continue to act as if we
"believe" the null hypothesis is true. Or it may be a disappointing result, possibly indicating we may not yet have
enough data to "prove" something by rejecting the null hypothesis.
There are many different statistical significance, or hypothesis, tests. They all follow the same basic principle. The
appropriate test for a given situation depends on the nature of the data being analyzed. This chapter explains p-
values, gives a detailed description of significance testing and discusses the relationship between confidence
intervals and significance tests.
Types of statistical tests: There are a wide range of statistical tests. The decision of which statistical test to use
depends on the research design, the distribution of the data, and the type of variable. In general, if the data is
normally distributed you will choose from parametric tests. If the data is non-normal you choose from the set of
non-parametric tests. Below is a table listing just a few common statistical tests and their use.
Type of Test: Use:
Chi-square: Tests for the strength of the association between two categorical variables
Paired t-test: Tests for a difference between two related variables
Independent t-test: Tests for a difference between two independent groups
F test: Compares two samples at a time (ratio of variances)
ANOVA: Tests the difference between group means after any other variance in the outcome variable is
accounted for; more than two samples at a time; can be one-way or two-way
Z-test: Tests a sample mean against a known population mean when the population standard deviation is known
A hypothesis is a speculation or theory, based on limited evidence, that lends itself to further testing and
experimentation. With further testing, a hypothesis can be supported or rejected. Let's look at an example.
Little Susie speculates, or hypothesizes, that the flowers she waters with club soda will grow faster than flowers
she waters with plain water. She waters each plant daily for a month (experiment) and finds support for her hypothesis.
A null hypothesis (H0) is a hypothesis that says there is no statistical significance between the two variables in
the hypothesis. It is the hypothesis that the researcher is trying to disprove. In the example, Susie's null hypothesis
would be something like this: There is no statistically significant relationship between the type of water I feed the
flowers and growth of the flowers. A researcher is challenged by the null hypothesis and usually wants to disprove
it, to demonstrate that there is a statistically-significant relationship between the two variables in the hypothesis.
A statistical significance test measures the strength of evidence which the data sample supplies for or against some
proposition of interest.
This proposition is known as a 'null hypothesis', since it usually relates to there being 'no difference' between
groups or 'no effect' of a treatment.
An alternative hypothesis (H1) simply is the inverse, or opposite, of the null hypothesis. So, if we continue
with the above example, the alternative hypothesis would be that there IS indeed a statistically-significant
relationship between what type of water the flower plant is fed and growth. More specifically, here would be the
null and alternative hypotheses for Susie's study:
Null: If one plant is fed club soda for one month and another plant is fed plain water, there will be no difference in
growth between the two plants.
Alternative: If one plant is fed club soda for one month and another plant is fed plain water, the plant that is fed
club soda will grow better than the plant that is fed plain water.
7 Step Process of Statistical Hypothesis Testing
Step 1: State the Null Hypothesis.
Step 2: State the Alternative Hypothesis.
Step 3: Find the degrees of freedom (the number of independent observations in a series).
When data are arranged in individual form: df = n - 1
When data are arranged in tabular form: df = (m - 1)(n - 1), where m = number of rows and n = number of columns
Step 4: Choose the confidence level.
CL = 95% means a 5% error level; CL = 99% means a 1% error level.
Step 5: Find the calculated value, using the formula, by preparing the chi-square table.
Step 6: Find the tabulated value (from the table), depending on the degrees of freedom and the confidence level.
Step 7: Make the decision:
If calculated value < tabulated value, the null hypothesis is accepted.
If calculated value > tabulated value, the null hypothesis is rejected.
Chi-square test: a statistical method assessing the goodness of fit between a set of observed values and
those expected theoretically. Chi-square is a statistical test commonly used to compare observed data with the data
we would expect to obtain according to a specific hypothesis.
For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual
observed number was 8 males, then you might want to know about the "goodness of fit" between the observed
and expected. Were the deviations (differences between observed and expected) the result of chance, or were
they due to other factors? How much deviation can occur before you, the investigator, must conclude that
something other than chance is at work, causing the observed to differ from the expected?
The chi-square test is always testing what scientists call the null hypothesis, which states that there is no
significant difference between the expected and observed result.
Chi-square: X² = Σ (O - E)² / E
where O = observed value/frequency
E = expected value/frequency
That is, chi-square is the sum of the squared differences between observed (O) and expected (E) data (the
deviation, d), divided by the expected data, in all possible categories.
Individual form of data: the expected value is taken as the mean of the series, E = Σfx / N, and chi-square is then
computed from the observed and expected values as above. For example:
X: 10 12 15
f:  5  3  4
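As a sketch, the chi-square statistic can be computed directly from its definition, here using the Mendel-style example above (observed 8 males and 12 females against an expected 10 and 10):

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed 8 males and 12 females out of 20 offspring; expected 10 and 10
x2 = chi_square([8, 12], [10, 10])
print(x2)  # 0.8
```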
Non-sampling error:
Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a
sample.
Non-sampling errors have the potential to cause bias in polls, surveys or samples.
There are many different types of non-sampling errors and the names used to describe them are not consistent.
Examples of non-sampling errors are generally more useful than using names to describe them.
QUESTION
A survey was conducted in BBSR regarding family life and education. Test at 5% SL whether there is any
significant difference between family life and education, from the following information.
             Happy family   Unhappy family   Total
Educated          65              35          100
Uneducated        72              60          132
Total            137              95          232
Step 1: Null Hypothesis. - There is NO significant difference between family life and education
Step 2: Alternative Hypothesis. There IS significant difference between family life and education
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df tabular = ( 2-1) (2-1) = 1
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
E65 = (100X137)/232 = 59
E35 = (100X95)/232 = 41
E72 = (132X137)/232 = 78
E60 = (95X132)/232 = 54
O (observed)   E (expected)   O-E   (O-E)²   (O-E)²/E
65             59               6     36       0.61
35             41              -6     36       0.88
72             78              -6     36       0.46
60             54               6     36       0.67
                                     TOTAL     2.62
X2 cal = 2.62
Step 6: Tabulated value – (from table) at SL 5% and df = 1 is 3.84.
Step 7: Making decisions
If Cal Value < Tab value = Null hypothesis is Accepted
2.62 < 3.84
There is NO significant difference between family life and education
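The contingency-table computation can be checked with a short script. Note that keeping the exact expected frequencies (59.05, 40.95, 77.95, 54.05) instead of rounding them to whole numbers gives a chi-square of about 2.57 rather than 2.62; both are below 3.84, so the conclusion is unchanged:

```python
# Rows: Educated, Uneducated; columns: Happy family, Unhappy family
observed = [[65, 35],
            [72, 60]]

row_totals = [sum(row) for row in observed]            # [100, 132]
col_totals = [sum(col) for col in zip(*observed)]      # [137, 95]
grand = sum(row_totals)                                # 232

# Expected frequency of each cell = (row total x column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(2) for j in range(2))
print(round(x2, 2))  # 2.57 with exact expected values (2.62 when E is rounded)
```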
QUESTION
In an experiment on the immunization of goats against anthrax, the following results were obtained. Draw your
inference on the efficacy of the vaccine.
                 Died of Anthrax   Survived   Total
Inoculated             2              10        12
Not inoculated         6               6        12
Total                  8              16        24
Step 1: Null Hypothesis. - There is NO significant relationship between inoculation and survival
Step 2: Alternative Hypothesis. There IS significant relationship between inoculation and survival
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df tabular = ( 2-1) (2-1) = 1
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
E2 = (12X8)/24 = 4
E10 = (12X16)/24 = 8
E6 = (8X12)/24 = 4
E6 = (16X12)/24 = 8
O (observed)   E (expected)   O-E   (O-E)²   (O-E)²/E
2              4               -2     4        1
10             8                2     4        0.5
6              4                2     4        1
6              8               -2     4        0.5
                                     TOTAL     3
X2 cal = 3
Step 6: Tabulated value – (from table) at SL 5% and df = 1 is 3.84.
Step 7: Making decisions
If Cal Value < Tab value = Null hypothesis is Accepted
3 < 3.84
There is NO significant relationship between inoculation and survival
QUESTION
Four major automobile manufacturers in India are Maruti, Hyundai, Tata and Ford. These companies manufacture cars
in 3 segments: small, medium and large. Suppose a hypothetical study is conducted to analyze whether there is any
relationship between the manufacturer and the type of car preferred by consumers, from the following data-

Segment    Maruti   Hyundai   Tata   Ford   Total
Small        80       60       75     10     225
Medium       15       10       15     30      70
Large         5       30       10     60     105
Total       100      100      100    100     400

Step 1: Null Hypothesis. - There is NO significant relationship between the manufacturer and the type of car
preferred by consumers
Step 2: Alternative Hypothesis. There IS a significant relationship between the manufacturer and the type of car
preferred by consumers
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df tabular = ( 4-1) (3-1) = 6
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
The row totals are 225, 70 and 105; each column total is 100; N = 400.
E80 = (225X100)/400 = 56.25
E60 = (225X100)/400 = 56.25
E75 = (225X100)/400 = 56.25
E10 = (225X100)/400 = 56.25
E15 = (70X100)/400 = 17.50
E10 = (70X100)/400 = 17.50
E15 = (70X100)/400 = 17.50
E30 = (70X100)/400 = 17.50
E5 = (105X100)/400 = 26.25
E30 = (105X100)/400 = 26.25
E10 = (105X100)/400 = 26.25
E60 = (105X100)/400 = 26.25
O     E       O-E      (O-E)²    (O-E)²/E
80    56.25   23.75    564.06    10.03
60    56.25    3.75     14.06     0.25
75    56.25   18.75    351.56     6.25
10    56.25  -46.25   2139.06    38.03
15    17.50   -2.50      6.25     0.36
10    17.50   -7.50     56.25     3.21
15    17.50   -2.50      6.25     0.36
30    17.50   12.50    156.25     8.93
5     26.25  -21.25    451.56    17.20
30    26.25    3.75     14.06     0.54
10    26.25  -16.25    264.06    10.06
60    26.25   33.75   1139.06    43.39
Total                            138.61
X2 cal = 138.61
Step 6: Tabulated value - (from table) at SL 5% and df = 6 is 12.592
Step 7: Making decisions
Since the calculated value is greater than the tabulated value, the null hypothesis is rejected.
There IS a significant relationship between the manufacturer and the type of car preferred by consumers.
QUESTION
The following table gives the no of car accidents that occur during various days of the week. Test the
hypothesis that at 5%SL the accidents are uniformly distributed over the week-
            O    E    O-E   (O-E)²   (O-E)²/E
Sunday     14   12     2     4.00     0.33
Monday     15   12     3     9.00     0.75
Tuesday     8   12    -4    16.00     1.33
Wednesday  12   12     0     0.00     0.00
Thursday   11   12    -1     1.00     0.08
Friday      9   12    -3     9.00     0.75
Saturday   14   12     2     4.00     0.33
E = 83/7 = 11.86 ≈ 12               TOTAL  3.58
X2 cal = 3.58
Step 1: Null Hypothesis. - The accidents are uniformly distributed over the week
Step 2: Alternative Hypothesis. The accidents are NOT uniformly distributed over the week
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df individual = (n-1) = 7-1= 6
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
X2 cal = 3.58
Step 6: Tabulated value – (from table) at SL 5% and df = 6 is 12.592
Step 7: Making decisions
If Cal Value < Tab value = Null hypothesis is Accepted
3.58 < 12.592
The accidents are uniformly distributed over the week (there is no significant difference between days).
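The goodness-of-fit calculation can be sketched in code. Using the exact expected value 83/7 ≈ 11.86 rather than the rounded 12 gives a chi-square of about 3.61 instead of 3.58; either way it is well below the table value 12.592 at df = 6:

```python
accidents = {"Sun": 14, "Mon": 15, "Tue": 8, "Wed": 12, "Thu": 11, "Fri": 9, "Sat": 14}

total = sum(accidents.values())        # 83
expected = total / len(accidents)      # 83/7 = 11.86 (the text rounds this to 12)

x2 = sum((o - expected) ** 2 / expected for o in accidents.values())
print(round(x2, 2))  # 3.61
```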
The t-test
The t-test assesses whether the means of two groups are statistically different from each other. This analysis is
appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis
for the posttest-only two-group randomized experimental design.
The t-test is probably the most commonly used Statistical Data Analysis procedure for hypothesis testing. Actually,
there are several kinds of t-tests, but the most common is the "two-sample t-test" also known as the "Student's t-
test" or the "independent samples t-test".
One sample t-test
Comparison of sample mean with population mean when standard deviation of the population is estimated from
the sample.
Calculation of the test statistic requires four components:
1. The average of the sample (observed average), the sample mean Xbar
2. The population average or other known value (expected average), the population mean M
3. The standard deviation (SD) of the sample, S (so that S/√n is the SD of the sample average)
4. The sample size, n
The test statistic is:
|t| = |Xbar - M| / (S / √n)
QUESTION
An electric bulb manufacturing company claims that the life span of electric bulb is 40 months. Sample
test of 6 bulbs show whose life span are 35m,42m,20m,22m,38m,45m- test the hypothesis that at 5% SL
company’s claim is valid.
X     Xbar   X - Xbar   (X - Xbar)²
35    34        1           1
42    34        8          64
20    34      -14         196
22    34      -12         144
38    34        4          16
45    34       11         121
Xbar = 202/6 = 33.66 ≈ 34   Total: 542

SD = sqrt(542/5) = 10.41
|t| = |34 - 40| / (10.41/√6) = 6/4.25 = 1.41
t tab at 5% SL and df = 5 is 2.571. Since 1.41 < 2.571, the null hypothesis is accepted: the company's claim is valid.
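A quick check of this one-sample t-test in code, keeping the exact sample mean (33.67) instead of the rounded 34, gives t ≈ 1.49; this is still below the 5% critical value of 2.571 for df = 5, so the company's claim stands either way:

```python
import math
import statistics

lifespans = [35, 42, 20, 22, 38, 45]   # months
claimed_mean = 40                      # the company's claim (M)

xbar = statistics.mean(lifespans)      # 33.67 (the text rounds this to 34)
s = statistics.stdev(lifespans)        # sample SD with divisor n-1, about 10.41
n = len(lifespans)

t = abs(xbar - claimed_mean) / (s / math.sqrt(n))
print(round(t, 2))  # 1.49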
Two-Sample T-Test
We often want to know whether the means of two populations on some outcome differ. For example, there are
many questions in which we want to compare two categories of some categorical variable (e.g., compare males
and females) or two populations receiving different treatments in context of an experiment. The two-sample t-test
is a hypothesis test for answering questions about the mean where the data are collected from two random
samples of independent observations, each from an underlying normal distribution:
The steps of conducting a two-sample t-test are quite similar to those of the one-sample test. And
for the sake of consistency, we will focus on another example dealing with birthweight and
prenatal care. In this example, rather than comparing the birth weight of a group of infants to
some national average, we will examine a program's effect by comparing the birth weights of
babies born to women who participated in an intervention with the birth weights of a group that
did not.
                               X          Y
Sample size                    n1         n2
Sample mean                    Xbar       Ybar
Combined SD                    S
Degrees of freedom             n1 - 1     n2 - 1
Combined degrees of freedom:   n1 + n2 - 2

|t cal| = |Xbar - Ybar| / (S × √(1/n1 + 1/n2))

For example, with Xbar = 4, Ybar = 5, S = 5.7, n1 = 7 and n2 = 9:
|t cal| = |4 - 5| / (5.7 × √(1/7 + 1/9)) = 1 / (5.7 × 0.504) = 0.35
t tab at 5% SL and 14 df is 2.145. Since 0.35 < 2.145, the null hypothesis is accepted.
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most
often used when comparing statistical models that have been fitted to a data set, in order to identify the model
that best fits the population from which the data were sampled.
Since F is formed by chi-square, many of the chi-square properties carry over to the F distribution.
The F-values are all non-negative
The distribution is non-symmetric
There are two independent degrees of freedom, one for the numerator, and one for the denominator.
There are many different F distributions, one for each pair of degrees of freedom.
If the null hypothesis is true, then the F test-statistic given above can be simplified (dramatically): the ratio of
sample variances is the test statistic used. If this ratio is too far from 1, we reject the null
hypothesis that the variances are equal.
There are several different F-tables. Each one has a different level of significance. So, find the correct level of
significance first, and then look up the numerator degrees of freedom and the denominator degrees of
freedom to find the critical value.
                    X                               Y
Mean                Xbar                            Ybar
Sample size         n1                              n2
df                  n1 - 1                          n2 - 1
Sample variance     Sx²                             Sy²

Sx² = (1/(n1 - 1)) Σ(X - Xbar)²     Sy² = (1/(n2 - 1)) Σ(Y - Ybar)²
F = larger sample variance / smaller sample variance
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group
means and their associated procedures (such as "variation" among and between groups), developed by statistician
and evolutionary biologist Ronald Fisher.
One-way ANOVA – single parameter
Two-way ANOVA – two parameter
Steps of ANOVA- (ONE PARAMETER)
1. Set up Null hypothesis
2. Set up Alt Hypothesis
3. Find GRAND TOTAL = G
4. Find the Correction Factor CF = G²/N, where N = total number of observations
5. Find Raw Sum of Squares ( RSS)
6. Find Total Sum of Squares ( TSS) = RSS - CF
7. Find Between Sample Sum of Squares ( BSS )
PROBLEM (yields of seeds A, B, C on plots 1-4)      Raw SS (each value squared)
Plots    A    B    C                                Plots     A      B      C
1        2    7    7                                1         4     49     49
2        3    4    6                                2         9     16     36
3        5    2    4                                3        25      4     16
4        8    9    8                                4        64     81     64
Total   18   22   25   | G = 65                     Total   102    150    165   | RSS = 417
Steps of ANOVA-
1. Set up Null hypothesis – there is NO significant difference between plots and seeds
2. Set up Alt Hypothesis - there is significant difference between plots and seeds
3. Find GRAND TOTAL G = 65
4. Find the Correction Factor CF = G²/N = 65²/12 = 352.08
5. Find Raw Sum of Squares (RSS) = 417
6. Find Total Sum of Squares (TSS) = RSS - CF = 417 - 352.08 = 64.92
7. Find Between Sample Sum of Squares (BSS) = (18² + 22² + 25²)/4 - CF = 358.25 - 352.08 = 6.17
8. Find Within Sample Sum of Squares (WSS) = TSS - BSS = 64.92 - 6.17 = 58.75
9. Prepare the ANOVA table-
Source   Sum of Squares (SS)   Degrees of freedom (df)      Mean Square (MSS = SS/df)   F cal
BSS      6.17                  3 - 1 = 2                    6.17/2 = 3.085              6.527/3.085 = 2.11
WSS      58.75                 12 - 3 = 9                   58.75/9 = 6.527             at (9, 2) df
TSS      64.92                 N - 1 = 11 (sample size 12)
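The one-way ANOVA computations above can be verified with a short script (pure Python, no external libraries):

```python
# Yields of seeds A, B and C on four plots (each column is one sample)
samples = {"A": [2, 3, 5, 8], "B": [7, 4, 2, 9], "C": [7, 6, 4, 8]}

values = [x for col in samples.values() for x in col]
G = sum(values)                                    # grand total 65
N = len(values)                                    # 12 observations
CF = G ** 2 / N                                    # correction factor 352.08
RSS = sum(x ** 2 for x in values)                  # raw sum of squares 417
TSS = RSS - CF                                     # total SS 64.92
BSS = sum(sum(col) ** 2 / len(col) for col in samples.values()) - CF   # between 6.17
WSS = TSS - BSS                                    # within 58.75

k = len(samples)
ms_between = BSS / (k - 1)                         # 6.17/2 = 3.085
ms_within = WSS / (N - k)                          # 58.75/9 = 6.527
f_ratio = ms_within / ms_between                   # larger/smaller, about 2.12
print(round(f_ratio, 2))
```

The small differences from the hand-computed 2.11 come from rounding the intermediate mean squares by hand.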
PROBLEM:
The following table gives the number of units of production per day turned out by 4 different types of machines:
To test the hypothesis that the mean production is the same for the 4 machines, and
to test the hypothesis that the employees do not differ with respect to mean productivity, using ANOVA at 5% SL.
Machines M1 M2 M3 M4
Employees
1 40 36 45 30
2 38 42 50 41
3 36 30 48 35
4 46 47 52 44
1. Set up the null hypothesis: the mean production is the same for the 4 machines.
2. Set up the alternative hypothesis: the mean production is NOT the same for the 4 machines.
3. Find the grand total G = 660
4. Find the correction factor CF = G²/N = 660²/16 = 27225
5. Find the raw sum of squares RSS = 27900
Row SS
Employees    M1    M2    M3    M4    Row sum    (Row sum)²    (Row sum)²/4
1            40    36    45    30    151        22801         5700.25
2            38    42    50    41    171        29241         7310.25
3            36    30    48    35    149        22201         5550.25
4            46    47    52    44    189        35721         8930.25
                                                Total:        27491
Row SS = 27491 - 27225 = 266
Col SS
Employees       M1       M2        M3        M4
1               40       36        45        30
2               38       42        50        41
3               36       30        48        35
4               46       47        52        44
Col sum         160      155       195       150
(Col sum)²      25600    24025     38025     22500
(Col sum)²/4    6400     6006.25   9506.25   5625     Total: 27537.5
Col SS = 27537.5 - 27225 = 312.5
9. Find the error sum of squares (ESS)
= TSS - (Row SS + Col SS) = 675 - (266 + 312.5) = 675 - 578.5 = 96.5
10. Prepare the ANOVA table:

Source    Sum of Squares (SS)    df                 MSS = SS/df             Fcal
Row SS    266                    r - 1 = 3          266/3 = 88.67 = x       F = x/z = 88.67/10.72 = 8.27
Col SS    312.5                  c - 1 = 3          312.5/3 = 104.17 = y    F = y/z = 104.17/10.72 = 9.72
ESS       96.5                   (r-1)(c-1) = 9     96.5/9 = 10.72 = z
TSS       675                    N - 1 = 15 (sample size is 16)
11. Find the table value at 5% SL and df (3, 9): F = 3.86
12. Compare: for the rows (employees), 8.27 > 3.86, so the null hypothesis is rejected;
for the columns (machines), 9.72 > 3.86, so the null hypothesis is rejected.
13. Decision making: the mean production is not the same for the 4 machines, and the employees differ with respect to mean productivity.
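The two-way ANOVA computation above can be verified with a short Python sketch over the same 4 x 4 table:

```python
# Two-way ANOVA: rows = employees, columns = machines M1..M4
data = [
    [40, 36, 45, 30],
    [38, 42, 50, 41],
    [36, 30, 48, 35],
    [46, 47, 52, 44],
]
r = len(data)        # 4 rows (employees)
c = len(data[0])     # 4 columns (machines)
N = r * c

G = sum(sum(row) for row in data)                  # grand total = 660
CF = G ** 2 / N                                    # correction factor = 27225
RSS = sum(v ** 2 for row in data for v in row)     # raw SS = 27900
TSS = RSS - CF                                     # total SS = 675

row_ss = sum(sum(row) ** 2 / c for row in data) - CF                         # 266
col_ss = sum(sum(row[j] for row in data) ** 2 / r for j in range(c)) - CF    # 312.5
ess = TSS - row_ss - col_ss                        # error SS = 96.5

ms_rows = row_ss / (r - 1)
ms_cols = col_ss / (c - 1)
ms_err = ess / ((r - 1) * (c - 1))
f_rows, f_cols = ms_rows / ms_err, ms_cols / ms_err
print(round(f_rows, 2), round(f_cols, 2))   # both exceed F(3, 9) = 3.86 at 5% SL
```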
Thus ±3σ of the NPC include different numbers of cases. Between ±1σ lie the middle two-thirds of cases, or 68.26%;
between ±2σ lie 95.44% of cases; between ±3σ lie 99.73% of cases; and beyond ±3σ only 0.27% of cases fall.
9. The mean of the NPC is µ and the standard deviation is σ:
As the mean of the NPC represents the population mean, it is denoted by µ (mu). The standard deviation
of the curve is denoted by the Greek letter σ.
10. In the Normal Probability Curve the standard deviation is about 50% larger than Q:
In the NPC, Q (the quartile deviation) is generally called the probable error, or PE. Since Q = 0.6745σ, it follows that σ ≈ 1.5Q.
Applications of Normal Probability Curve:
Some of the most important applications of normal probability curve are as follows:
The principles of Normal Probability Curve are applied in the behavioral sciences in many different areas.
1. NPC is used to determine the percentage of cases in a normal distribution within given limits:
The Normal Probability Curve helps us to determine:
i. What percent of cases fall between two scores of a distribution?
ii. What percent of scores lie above a particular score of a distribution?
iii. What percent of scores lie below a particular score of a distribution?
General normal distribution: X ~ N(µ, σ²), where N denotes the normal distribution, µ = mean of the distribution,
and σ² = variance.
Standard normal distribution: Z ~ N(0, 1), i.e. µ = 0 and σ² = 1.
Every general normal distribution is converted into the standard normal variate (Z):
Z = (X - µ)/σ, where X is the observed value.
The total area of the NPC is divided into six standard deviation units. From the center it is divided into three positive
standard deviation units and three negative standard deviation units.
Thus ±3σ of the NPC include different numbers of cases:
Between ±1σ lie the middle two-thirds of cases, or 68.26% ≈ 68%;
between ±2σ lie 95.44% ≈ 95% of cases;
between ±3σ lie 99.73% ≈ 99.7% of cases; and
beyond ±3σ only 0.27% of cases fall.
When X = µ:      Z = (µ - µ)/σ = 0
When X = µ + σ:  Z = (µ + σ - µ)/σ = σ/σ = 1
When X = µ + 2σ: Z = 2σ/σ = 2      (the area for a given Z value is found from the table)
When X = µ + 3σ: Z = 3σ/σ = 3
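These NPC areas can be checked with Python's standard library (statistics.NormalDist); at full precision the figures come out to 68.27%, 95.45%, and 99.73%:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)   # area between -1σ and +1σ
within_2sd = z.cdf(2) - z.cdf(-2)   # area between -2σ and +2σ
within_3sd = z.cdf(3) - z.cdf(-3)   # area between -3σ and +3σ
beyond_3sd = 1 - within_3sd         # tails beyond ±3σ

print(round(within_1sd * 100, 2))   # 68.27
print(round(within_2sd * 100, 2))   # 95.45
print(round(within_3sd * 100, 2))   # 99.73
print(round(beyond_3sd * 100, 2))   # 0.27
```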
QUESTION-
The mean height of soldiers in a certain regiment is 68.6" with a standard deviation of 5.2", and height follows the
normal distribution. Out of 1000 soldiers, find how many have a height greater than 6 feet.
Answer-
µ = 68.6, σ = 5.2
Z = (X - µ)/σ = (X - 68.6)/5.2; X = 6' = 72", so Z = (72 - 68.6)/5.2 = 0.65
From the table, the area between Z = 0 and Z = 0.65 is 0.2422, so P(Z > 0.65) = 0.5 - 0.2422 = 0.2578.
Hence about 0.2578 x 1000 ≈ 258 soldiers are taller than 6 feet.
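A quick sketch completing the computation (using the exact Z of 0.6538 gives about 257 soldiers; rounding Z to 0.65 as in the table gives about 258):

```python
from statistics import NormalDist

heights = NormalDist(mu=68.6, sigma=5.2)   # heights in inches
z = (72 - 68.6) / 5.2                      # 6 ft = 72 inches

p_taller = 1 - heights.cdf(72)             # area to the right of 72"
soldiers = 1000 * p_taller                 # expected count out of 1000
print(round(z, 2), round(soldiers))        # 0.65 257
```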
U test –
QUESTION-
The values of one sample are 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78.
The values of the second sample are 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72.
Using the U test, test the hypothesis that they came from populations with the same mean, at 10% SL.
The table value of Z for an area of 0.45 is 1.64 (0.45 on each side of the mean leaves 0.05 in each tail).
Step 6: Find the mean = (n1 x n2)/2, where n1 is the first sample size and n2 is the second sample size:
mean (µ) = (12 x 12)/2 = 72
Step 7: Find the standard deviation:
SD = sqrt[n1.n2(n1 + n2 + 1)/12] = sqrt[(144 x 25)/12] = sqrt(300) = 17.32
Step 8: Find the calculated U value:
Ucal = n1.n2 + n1(n1 + 1)/2 - ΣR1 = 144 + (12 x 13)/2 - 167.5 = 54.5
or, equivalently, Ucal = n1.n2 + n2(n2 + 1)/2 - ΣR2
Step 9: Find the limits of the acceptance region:
Upper limit = µ + Z.SD = 72 + 1.64 x 17.32 = 100.40
Lower limit = µ - Z.SD = 72 - 1.64 x 17.32 = 43.59
Step 10: If Ucal lies between the lower and upper limits, i.e. in the acceptance region, the null hypothesis is
accepted. Here Ucal = 54.5 lies between 43.59 and 100.40, so the null hypothesis is accepted.
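The U-test steps above can be reproduced in Python; the midrank helper handles the tied values (44, 53, 72, 73) the way the rank sum ΣR1 = 167.5 assumes:

```python
from math import sqrt

s1 = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
s2 = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]

# Rank the pooled values; tied values share the average of their ranks.
pooled = sorted(s1 + s2)

def rank(v):
    first = pooled.index(v) + 1          # 1-based rank of the first occurrence
    count = pooled.count(v)
    return first + (count - 1) / 2       # midrank for ties

r1 = sum(rank(v) for v in s1)            # rank sum of sample 1 = 167.5
n1, n2 = len(s1), len(s2)

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1     # U = 54.5
mean = n1 * n2 / 2                       # 72
sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # sqrt(300) ≈ 17.32
lower, upper = mean - 1.64 * sd, mean + 1.64 * sd
print(u, lower <= u <= upper)            # 54.5 True -> accept the null hypothesis
```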
QUESTION-
An animal trainer in a circus is training 20 lions to perform a special trick. The lions have been divided into 2
groups. Group A gets positive reinforcement of food and favourable comments during the learning session, whereas
Group B does not. The following table indicates the number of days needed by each lion to learn the trick.
Group A: 78, 95, 82, 69, 111, 65, 73, 84, 92, 110
Group B: 121, 132, 101, 79, 94, 88, 102, 93, 98, 127
Using 5% SL, test the null hypothesis that the mean time for both groups is the same, using the U test. Table
value of Z = 1.96 (Z value for an area of 0.475).
Step 8: Find the calculated U values:
Ucal = n1.n2 + n1(n1 + 1)/2 - ΣR1 = 100 + (10 x 11)/2 - 77 = 78
or Ucal = n1.n2 + n2(n2 + 1)/2 - ΣR2 = 100 + (10 x 11)/2 - 133 = 22
Step 9: Find the limits of the acceptance region, with µ = (10 x 10)/2 = 50 and SD = sqrt[(100 x 21)/12] = 13.23:
Upper limit = µ + Z.SD = 50 + 1.96 x 13.23 = 75.93
Lower limit = µ - Z.SD = 50 - 1.96 x 13.23 = 24.07
Step 10: If the calculated U values do not lie between the lower and upper limits, i.e. within the acceptance
region, the null hypothesis is rejected.
Limits: 24.07 and 75.93
Ucal(R1) = 78 and Ucal(R2) = 22; both fall outside the acceptance zone,
so the null hypothesis is rejected.
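A sketch checking the lion-training numbers (this data set has no ties, so plain ranks suffice):

```python
from math import sqrt

a = [78, 95, 82, 69, 111, 65, 73, 84, 92, 110]
b = [121, 132, 101, 79, 94, 88, 102, 93, 98, 127]

# All 20 values are distinct, so each value's rank is its position in the sorted pool
pooled = sorted(a + b)
r1 = sum(pooled.index(v) + 1 for v in a)   # rank sum of Group A = 77
r2 = sum(pooled.index(v) + 1 for v in b)   # rank sum of Group B = 133

n1, n2 = len(a), len(b)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1      # 78
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2      # 22

mean = n1 * n2 / 2                         # 50
sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # sqrt(175) ≈ 13.23
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd

# Both U values fall outside [lower, upper] -> reject the null hypothesis
print(u1, u2, lower <= u1 <= upper, lower <= u2 <= upper)
```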
QUESTION-
Use H test at 5% SL to test the Null Hypo that a professional bowler performs equally well with the 4 bowling balls
given the following results-
Ball A – 271, 282, 257, 248, 262
Ball B – 252, 275, 302, 268, 276
Ball C – 260, 255, 239, 246, 266
Ball D- 279, 242, 297, 270, 258
Answer-
Bowling result    Rank    Ball
239               1       C
242               2       D
246               3       C
248               4       A
252               5       B
255               6       C
257               7       A
258               8       D
260               9       C
262               10      A
266               11      C
268               12      B
270               13      D
271               14      A
275               15      B
276               16      B
279               17      D
282               18      A
297               19      D
302               20      B
Rank sums: RA = 53, RB = 68, RC = 30, RD = 59
Step 6: Find the mean = (n1 x n2 x n3 x n4)/4, where n1 is the first sample size, n2 the second, and so on:
= (5 x 5 x 5 x 5)/4 = 156.25 = mean
Step 7: Find the standard deviation:
SD = sqrt[n1.n2.n3.n4(n1 + n2 + n3 + n4 + 3)/12] = sqrt[(625 x 23)/12] = sqrt(1197.92) = 34.61
Step 8: Find the calculated H value, where RA, RB, RC, RD are the rank sums and n = 20:
H = 12/[n(n + 1)] x [RA²/nA + RB²/nB + RC²/nC + RD²/nD] - 3(n + 1)
= 12/(20 x 21) x [53²/5 + 68²/5 + 30²/5 + 59²/5] - 3 x 21
= 12/420 x [561.8 + 924.8 + 180 + 696.2] - 63
= 4.51
Since Hcal = 4.51 is below the chi-square table value of 7.815 at (4 - 1) = 3 df and 5% SL, the null hypothesis is
accepted: the bowler performs equally well with the four balls.
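The H statistic can be recomputed in Python from the same rank sums:

```python
# Kruskal-Wallis H for the four bowling balls (all 20 results are distinct)
balls = {
    "A": [271, 282, 257, 248, 262],
    "B": [252, 275, 302, 268, 276],
    "C": [260, 255, 239, 246, 266],
    "D": [279, 242, 297, 270, 258],
}

pooled = sorted(v for vals in balls.values() for v in vals)
# Rank = 1-based position in the sorted pool (no ties in this data)
rank_sums = {name: sum(pooled.index(v) + 1 for v in vals)
             for name, vals in balls.items()}

n = len(pooled)                                       # 20
h = 12 / (n * (n + 1)) * sum(
    rs ** 2 / len(balls[name]) for name, rs in rank_sums.items()
) - 3 * (n + 1)
print(rank_sums, round(h, 2))   # {'A': 53, 'B': 68, 'C': 30, 'D': 59} 4.51
```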
A Z-test is a hypothesis test based on the Z-statistic, which follows the standard normal distribution under the
null hypothesis. The simplest Z-test is the 1-sample Z-test, which tests the mean of a normally distributed
population with known variance.
It is a statistical test used to determine whether two population means are different when the variances are known
and the sample size is large. The test statistic is assumed to have a normal distribution, and nuisance parameters
such as the standard deviation should be known in order for an accurate Z-test to be performed.
Used where n ≥ 30; follows the normal distribution.
Z at 95% confidence = 1.96 (5% SL)
Z at 99% confidence = 2.58 (1% SL)
        actual value - hypothetical value
Z = ---------------------------------------------
                 Standard Error
Standard Error means the standard deviation of the sampling distribution.
For the binomial distribution:
Mean = np, where n = number of trials and p = probability of success
Variance = npq
SD = sqrt(npq)
p = probability of success
q = probability of failure (q = 1 - p)
Mean > variance always, since q < 1.
A coin is tossed 500 times and turns up heads 280 times. Test at 5% SL the hypothesis that the coin is unbiased.
       280 - 250            280 - 250
Z = --------------- = -------------------------- = 30/11.18 = 2.68 (Zcal)
      sqrt(npq)        sqrt(500 x 1/2 x 1/2)
Since Zcal = 2.68 > 1.96, the null hypothesis is rejected at 5% SL: the coin is biased.
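The coin-toss Z-test can be worked in a few lines of Python:

```python
from math import sqrt

n, heads = 500, 280
p = q = 0.5                 # null hypothesis: the coin is unbiased

mean = n * p                # expected heads = 250
sd = sqrt(n * p * q)        # sqrt(125) ≈ 11.18

z = (heads - mean) / sd     # 30 / 11.18 ≈ 2.68
print(round(z, 2), abs(z) > 1.96)   # 2.68 True -> reject the null at 5% SL
```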
Research in common parlance refers to a search for knowledge. One can also define research as a
scientific and systematic search for pertinent information on a specific topic. In fact, research is an art of
scientific investigation.
Research is an academic activity and as such the term should be used in a technical sense. According to
Clifford Woody research comprises defining and redefining problems, formulating hypothesis or
suggested solutions; collecting, organising and evaluating data; making deductions and reaching
conclusions; and at last carefully testing the conclusions to determine whether they fit the formulating
hypothesis.
Research is, thus, an original contribution to the existing stock of knowledge making for its
advancement. It is the pursuit of truth with the help of study, observation, comparison and experiment.
In short, the search for knowledge through the objective and systematic method of finding a solution to a
problem is research.
OBJECTIVES OF RESEARCH The purpose of research is to discover answers to questions through the
application of scientific procedures. The main aim of research is to find out the truth which is hidden and
which has not been discovered as yet. Though each research study has its own specific purpose, we may
think of research objectives as falling into a number of following broad groupings:
1. To gain familiarity with a phenomenon or to achieve new insights into it (studies with this object in
view are termed as exploratory or formulative research studies);
2. To portray accurately the characteristics of a particular individual, situation or a group (studies with
this object in view are known as descriptive research studies);
3. To determine the frequency with which something occurs or with which it is associated with
something else (studies with this object in view are known as diagnostic research studies);
4. To test a hypothesis of a causal relationship between variables (such studies are known as hypothesis-
testing research studies).
TYPES OF RESEARCH
The basic types of research are as follows:
(i) Descriptive vs. Analytical: Descriptive research includes surveys and fact-finding enquiries of
different kinds. The major purpose of descriptive research is description of the state of affairs as it
exists at present. In social science and business research we quite often use the term Ex post facto
research for descriptive research studies. The main characteristic of this method is that the researcher
has no control over the variables; he can only report what has happened or what is happening. In
analytical research, on the other hand, the researcher has to use facts or information already available,
and analyze these to make a critical evaluation of the material.
(ii) Applied vs. Fundamental: Research can either be applied (or action) research or fundamental (or
basic or pure) research. Applied research aims at finding a solution for an immediate problem facing a
society or an industrial/business organization, whereas fundamental research is mainly concerned with
generalizations and with the formulation of a theory. “Gathering knowledge for knowledge’s sake is
termed ‘pure’ or ‘basic’ research.” Research to identify social, economic or political trends that may
affect a particular institution is an example of applied research. Thus, the central aim of applied research
is to discover a solution for some pressing practical problem, whereas basic research is directed towards
finding information that has a broad base of applications and thus, adds to the already existing
organized body of scientific knowledge.
(iii) Quantitative vs. Qualitative: Quantitative research is based on the measurement of quantity or
amount. It is applicable to phenomena that can be expressed in terms of quantity. Qualitative research,
on the other hand, is concerned with qualitative phenomenon, i.e., phenomena relating to or involving
quality or kind. For instance, when we are interested in investigating the reasons for human behavior
(i.e., why people think or do certain things), we quite often talk of ‘Motivation Research’, an important
type of qualitative research. Qualitative research is especially important in the behavioral sciences
where the aim is to discover the underlying motives of human behavior. Through such research we can
analyse the various factors which motivate people to behave in a particular manner or which make
people like or dislike a particular thing.
(iv) Conceptual vs. Empirical: Conceptual research is that related to some abstract idea(s) or theory. It is
generally used by philosophers and thinkers to develop new concepts or to reinterpret existing ones. On
the other hand, empirical research relies on experience or observation alone, often without due regard
for system and theory. It is data-based research, coming up with conclusions which are capable of being
verified by observation or experiment.
(v) Some Other Types of Research: All other types of research are variations of one or more of the
above stated approaches, based on either the purpose of research, or the time required to accomplish
research, on the environment in which research is done, or on the basis of some other similar factor.
From the point of view of time, we can think of research either as one-time research or longitudinal
research. In the former case the research is confined to a single time-period, whereas in the latter case
the research is carried on over several time-periods. Research can be field-setting research or laboratory
research or simulation research, depending upon the environment in which it is to be carried out.
Research can as well be understood as clinical or diagnostic research.
Research methodology is a way to systematically solve the research problem. It may be
understood as a science of studying how research is done scientifically. In it we study the various steps
that are generally adopted by a researcher in studying his research problem along with the logic behind
them. It is necessary for the researcher to know not only the research methods/techniques but also the
methodology. Researchers not only need to know how to develop certain indices or tests, how to
calculate the mean, the mode, the median or the standard deviation or chi-square, how to apply
particular research techniques, but they also need to know which of these methods or techniques, are
relevant and which are not, and what would they mean and indicate and why. Researchers also need to
understand the assumptions underlying various techniques and they need to know the criteria by which
they can decide that certain techniques and procedures will be applicable to certain problems and
others will not. All this means that it is necessary for the researcher to design his methodology for his
problem as the same may differ from problem to problem.
Research Process Before embarking on the details of research methodology and techniques, it
seems appropriate to present a brief overview of the research process. Research process consists of
series of actions or steps necessary to effectively carry out research and the desired sequencing of these
steps.
The research process consists of a number of closely related activities, as shown through I to VII. But
such activities overlap continuously rather than following a strictly prescribed sequence. At times, the
first step determines the nature of the last step to be undertaken. If subsequent procedures have not
been taken into account in the early stages, serious difficulties may arise which may even prevent the
completion of the study. One should remember that the various steps involved in a research process are
not mutually exclusive; nor are they separate and distinct. They do not necessarily follow each other in
any specific order and the researcher has to be constantly anticipating at each step in the research
process the requirements of the subsequent steps. However, the following order concerning various
steps provides a useful procedural guideline regarding the research process:
(1) Formulating the research problem;
(2) Extensive literature survey;
(3) Developing the hypothesis;
(4) Preparing the research design;
(5) Determining sample design;
(6) Collecting the data;
(7) Execution of the project;
(8) Analysis of data;
(9) Hypothesis testing;
(10) Generalizations and interpretation, and
(11) Preparation of the report or presentation of the results, i.e., formal write-up of conclusions reached
Guidelines-
Questions should be brief
Objective in nature
Should be of multiple choice
No mathematical calculations
Avoid sensitive questions
Dual meaning questions be avoided
Should look attractive
No ambiguity
Should be pretested.
Types of questions-
Closed-Ended Questions
Closed-ended questions limit the answers of the respondents to response options provided on the
questionnaire.
Advantages: time-efficient; responses are easy to code and interpret; ideal for quantitative type
of research
Disadvantages: respondents are required to choose a response that does not exactly reflect
their answer; the researcher cannot further explore the meaning of the responses
Some examples of closed-ended questions are:
a. Dichotomous or two-point questions (e.g. Yes or No, Unsatisfied or Satisfied)
b. Multiple choice questions (e.g. A, B, C or D)
c. Scaled questions that are making use of rating scales such as the Likert scale (i.e. a type of five-
point scale), three-point scales, semantic differential scales, and seven-point scales
Open-Ended Questions
In open-ended questions, there are no predefined options or categories included.
The participants should supply their own answers.
Advantages: participants can respond to the questions exactly as they would like to answer
them; the researcher can investigate the meaning of the responses; ideal for qualitative type of
research
Disadvantages: time-consuming; responses are difficult to code and interpret
Contingency Questions
Questions that need to be answered only when the respondent provides a particular response
to a question prior to them are called contingency questions. Asking these questions effectively
avoids asking people questions that are not applicable to them. For example:
Have you ever smoked a cigarette?
___Yes ___ No
If YES, how many times have you smoked cigarette?
__ Once
__2-5 times
__ 6-10 times
__more than 10 times
The second question above is what we refer to as a contingency question following up a closed-
ended question.
Factual question (background of the respondents)
Opinion based/Attitude based questions- on 5 point scale/7 point scale/14 point scale
Research Design-
The arrangement of conditions for collection and analysis of data in a manner that aims to combine
relevance to the research purpose with economy in procedure.
1) Sampling design
2) Observational design
3) Statistical design
4) Operational design
Important features of Research Design-
Plan- specifies the sources and types of information
Strategy – which approach will be used for gathering and analysing the data
Time and costs budgets- most studies are done under these two constraints
Research Design must contain-
Research problem
Procedure and techniques to be used
Population to be studied
Methods to be used in the processing and analyzing
Scaling techniques-
Ordinal
With ordinal scales, it is the order of the values that is important and significant, but the differences
between them are not really known. In each case, we know that a #4 is better than a #3 or a #2, but we
don’t know, and cannot quantify, how much better it is. For example, is the difference between “OK” and
“Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say. Ordinal scales
are mainly based on ranking.
Interval
Interval scales are numeric scales in which we know not only the order, but also the exact differences
between the values. The classic example of an interval scale is Celsius temperature because the
difference between each value is the same. For example, the difference between 60 and 50 degrees is a
measurable 10 degrees, as is the difference between 80 and 70 degrees. Time is another good example
of an interval scale in which the increments are known, consistent, and measurable.
Interval scales are nice because the realm of statistical analysis on these data sets opens up. For
example, central tendency can be measured by mode, median, or mean; standard deviation can also be
calculated.
Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval” itself
means “space in between,” which is the important thing to remember–interval scales not only tell us
about order, but also about the value between each item. Ex- Max temp and Min temp.
Ratio
Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about
the order, they tell us the exact value between units, AND they also have an absolute zero–which allows
for a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating
myself, everything above about interval data applies to ratio scales, and in addition ratio scales have a
clear definition of zero. Good examples of ratio variables include height and weight.
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be
meaningfully added, subtracted, multiplied, divided (ratios). Central tendency can be measured by
mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation
can also be calculated from ratio scales. Ex- 1 : 2.
LIKERT SCALES
It is a summated scale with a detailed item analysis in its construction process. It is the most commonly
used scaling method because of its relatively simple construction procedure.
Likert Scales are characterized by the use of items which ask the respondent to check one point on a five
to eleven point scale, e.g. (1) strongly agree (2) agree (3) undecided (4) disagree and (5) strongly
disagree
For instance, they could rate each item on a 1-to-5 response scale where: 1 = strongly disagree, 2 =
disagree, 3 = undecided, 4 = agree, 5 = strongly agree.
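A minimal sketch of scoring such a summated scale in Python; the item responses below are hypothetical and only illustrate how the 1-to-5 codes are summed:

```python
# Map each Likert response category to its 1-to-5 score
scores = {"strongly disagree": 1, "disagree": 2, "undecided": 3,
          "agree": 4, "strongly agree": 5}

# One respondent's answers to a hypothetical three-item attitude scale
answers = ["agree", "strongly agree", "undecided"]

total = sum(scores[a] for a in answers)   # summated score: 4 + 5 + 3 = 12
mean = total / len(answers)               # average item score = 4.0
print(total, mean)                        # 12 4.0
```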
Multivariate Data Analysis refers to any statistical technique used to analyze data that arises from
more than one variable. This essentially models reality where each situation, product, or decision involves
more than a single variable. The information age has resulted in masses of data in every field. Despite the
quantum of data available, the ability to obtain a clear picture of what is going on and make intelligent
decisions is a challenge. When available information is stored in database tables containing rows and
columns, Multivariate Analysis can be used to process the information in a meaningful fashion.
Multivariate analysis methods typically used for:
Consumer and market research
Quality control and quality assurance across a range of industries such as food and beverage,
paint, pharmaceuticals, chemicals, energy, telecommunications, etc.