Research Methodology
Researchers organize their research by formulating and defining a research problem. This helps them focus
the research process so that they can draw conclusions reflecting the real world in the best possible way.
Hypothesis
In research, a hypothesis is a suggested explanation of a phenomenon.
A null hypothesis is a hypothesis which a researcher tries to disprove. Normally, the null hypothesis represents the
current view/explanation of an aspect of the world that the researcher wants to challenge.
Research methodology involves the researcher providing an alternative hypothesis, a research hypothesis, as an
alternate way to explain the phenomenon.
The researcher tests the hypothesis in an attempt to disprove the null hypothesis, not out of attachment to the research
hypothesis, but because rejecting the null would mean coming closer to an answer to a specific problem. The research
hypothesis is often based on observations that raise suspicion that the null hypothesis is not always correct.
A variable is something that changes; it changes according to different factors. Some variables change easily, like
a stock-exchange value, while other variables are almost constant, like someone's name. Researchers often
seek to measure variables.
The variable can be a number, a name, or anything where the value can change.
In research, you typically define variables according to what you're measuring. The independent variable is the
variable that the researcher manipulates or selects (the cause), while the dependent variable is the effect (or
assumed effect), dependent on the independent variable. These variables are often stated in experimental
research, in a hypothesis, e.g. "what is the effect of personality on helping behavior?"
In explorative research methodology, e.g. in some qualitative research, the independent and the dependent
variables might not be identified beforehand. They might not be stated because the researcher does not have a
clear idea yet on what is really going on.
Confounding variables are variables with a significant effect on the dependent variable that the researcher failed
to control or eliminate - sometimes because the researcher is not aware of the effect of the confounding variable.
The key is to identify possible confounding variables and somehow try to eliminate or control them.
Choosing the Research Method
The selection of the research method is crucial for what conclusions you can make about a phenomenon. It affects
what you can say about the cause and factors influencing the phenomenon.
It is also important to choose a research method which is within the limits of what the researcher can do. Time,
money, feasibility, ethics and availability to measure the phenomenon correctly are examples of issues
constraining the research.
Choosing the Measurement
Choosing the scientific measurement is also crucial for reaching a correct conclusion. Some measurements
might not reflect the real world, because they do not measure the phenomenon as they should.
Significance Test
To test a hypothesis, quantitative research uses significance tests to assess which hypothesis the data support.
The significance test can show whether the null hypothesis is more likely correct than the research hypothesis.
Research methodology in a number of areas like social sciences depends heavily on significance tests.
A significance test may even drive the research process in a whole new direction, based on the findings.
The t-test (also called the Student's T-Test) is one of many statistical significance tests, which compares two
supposedly equal sets of data to see if they really are alike or not. The t-test helps the researcher conclude
whether a hypothesis is supported or not.
Drawing Conclusions
Drawing a conclusion is based on several factors of the research process, not just because the researcher got the
expected result. It has to be based on the validity and reliability of the measurement; how good the measurement
was to reflect the real world and what more could have affected the results.
The observations are often referred to as 'empirical evidence' and the logic/thinking leads to the conclusions.
Anyone should be able to check the observation and logic, to see if they also reach the same conclusions.
Errors of the observations may stem from measurement-problems, misinterpretations, unlikely random events etc.
A common error is to think that correlation implies a causal relationship. This is not necessarily true.
Errors in Research
Logically, there are two types of errors when drawing conclusions in research:
A Type 1 error is accepting the research hypothesis when the null hypothesis is in fact correct (a false positive).
A Type 2 error is rejecting the research hypothesis even though the null hypothesis is wrong (a false negative).
POPULATION - A research population is generally a large collection of individuals or objects that is the main focus
of a scientific query. A research population is also known as a well-defined collection of individuals or objects
known to have similar characteristics. All individuals or objects within a certain population usually have a common,
binding characteristic or trait.
Target population refers to the ENTIRE group of individuals or objects to which researchers are interested
in generalizing the conclusions. The target population usually has varying characteristics and it is also known as the
theoretical population.
SAMPLE is simply a subset of the population. The concept of sample arises from the inability of the researchers to
test all the individuals in a given population. The sample must be representative of the population from which it
was drawn and it must have good size to warrant statistical analysis.
The main function of the sample is to allow the researchers to conduct the study on individuals from the population
so that the results of their study can be used to derive conclusions that will apply to the entire population. It is
much like a give-and-take process. The population “gives” the sample, and then it “takes” conclusions from the
results obtained from the sample.
Sampling Methods
Probability Sampling refers to sampling when the chance of any given individual being selected is known and these
individuals are sampled independently of each other. This is also known as random sampling. A researcher can
simply use a random number generator to choose participants (known as simple random sampling), or
every nth individual (known as systematic sampling) can be included. Researchers also may break their target
population into strata, and then apply these techniques within each strata to ensure that they are getting enough
participants from each strata to be able to draw conclusions. For example, if there are several ethnic communities
in one geographical area that a researcher wishes to study, that researcher might aim to have 30 participants from
each group, selected randomly from within the groups, in order to have a good representation of all the relevant
groups.
Random Sampling
Sampling techniques can be divided into two broad categories:
random (or probability) sampling and
non-random (or non-probability) sampling.
Random sampling includes:
simple random sampling,
stratified sampling,
systematic sampling,
two-stage sampling,
multi-stage sampling and
cluster sampling.
Notation: N is the population size and n the sample size (e.g. N = 1000, n = 100); for stratified and multi-stage
designs, N = N1 + N2 + N3 ... and n = n1 + n2 + n3 ...
A simple random sample (SRS) is the most basic probabilistic option used for creating a sample from a
population. Each SRS is made of individuals drawn from a larger population (represented by the variable N),
completely at random. As a result, said individuals have an equal chance of being selected throughout the
sampling process. The benefit of SRS is that the investigator is likely to obtain a sample which is
representative of the population, which supports statistically valid conclusions.
Example
An investigator wishes to draw multiple samples consisting of 5 people each from a village of 100. (Here, the
variable n is used to represent the size of the sample; thus village size N=100 and sample size n=5). By randomizing
the selection procedure, any member of this village has an equal chance of being selected as part of this first
sample, and an equal chance of being selected for the next sample of the same size (and so on).
To create a systematic random sample, there are seven steps: (a) defining the population; (b) choosing your sample
size; (c) listing the population; (d) assigning numbers to cases; (e) calculating the sampling fraction; (f) selecting the
first unit; and (g) selecting your sample.
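As a minimal sketch, simple random sampling can be done with Python's random module; the village of N = 100 and sample size n = 5 follow the example above, and the function name is illustrative:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units from the population so every unit is equally likely (SRS)."""
    rng = random.Random(seed)          # seeded only for reproducibility
    return rng.sample(population, n)

# Village of N = 100 people, numbered 1..100; sample size n = 5
village = list(range(1, 101))
sample = simple_random_sample(village, 5, seed=42)
print(sample)  # 5 distinct villagers, each combination equally probable
```

Repeated calls with different seeds yield further samples of the same size, each drawn with equal probability, as the example describes.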
Stratified random sampling is a probabilistic sampling option. The first step in stratified random sampling is
to split the population into strata, i.e. sections or segments. The strata are chosen to divide a population into
important categories relevant to the research interest.
For example, if interested in school achievement we may want to first split schools into rural, urban, and suburban
as school achievement on average may be quite distinct between these regions. The second step is to take a simple
random sample within each stratum. This way a randomized probabilistic sample is selected within each stratum.
Each strata should be mutually exclusive (i.e. every element in the population can be assigned to only one
stratum), and no population element can be excluded in the construction of strata.
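A stratified random sample can be sketched as grouping units into strata and drawing an SRS within each stratum; the school/region data below are made up for illustration:

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into strata, then take an SRS within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)   # each unit lands in exactly one stratum
    return {name: rng.sample(units, min(n_per_stratum, len(units)))
            for name, units in strata.items()}

# Hypothetical schools tagged with a region, the category relevant to the research interest
schools = [("school_%d" % i, ["rural", "urban", "suburban"][i % 3]) for i in range(30)]
sample = stratified_sample(schools, stratum_of=lambda s: s[1], n_per_stratum=3, seed=1)
```

Because the strata are mutually exclusive and exhaustive, every school is eligible in exactly one stratum, matching the requirement stated above.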
Systematic sampling is a random sampling technique which is frequently chosen by researchers for its
simplicity and its periodic quality.
In systematic random sampling, the researcher first randomly picks the first item or subject from the population.
Then, the researcher selects every nth subject from the list.
The procedure involved in systematic random sampling is very easy and can be done manually. The results are
representative of the population unless certain characteristics of the population are repeated for every nth
individual, which is highly unlikely.
Steps in selecting a systematic random sample:
Calculate the sampling interval (the number of households in the population divided by the number of
households needed for the sample)
Select a random start between 1 and the sampling interval
Repeatedly add the sampling interval to select subsequent households
The process of obtaining the systematic sample is much like an arithmetic progression.
1. Starting number: The researcher selects an integer that must be less than the total number of individuals
in the population. This integer will correspond to the first subject.
2. Interval: The researcher picks another integer which will serve as the constant difference between any
two consecutive numbers in the progression.
The integer is typically selected so that the researcher obtains the correct sample size
For example, the researcher has a population total of 100 individuals and need 12 subjects. He first picks his
starting number, 5.
Then the researcher picks his interval, 8. The members of his sample will be individuals 5, 13, 21, 29, 37, 45, 53, 61,
69, 77, 85, 93.
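The arithmetic progression above (start 5, interval 8, 12 subjects) is easy to generate in code:

```python
def systematic_sample(start, interval, n_subjects):
    """Members form an arithmetic progression: start, start+interval, start+2*interval, ..."""
    return [start + i * interval for i in range(n_subjects)]

members = systematic_sample(start=5, interval=8, n_subjects=12)
print(members)  # [5, 13, 21, 29, 37, 45, 53, 61, 69, 77, 85, 93]
```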
Sampling Gap
Sampling interval K = N/n (the gap from one selected unit to the next; when the end of the list is reached, counting
continues from the beginning of the list).
Suppose-
N = 20, n = 5; then K = 20/5 = 4
(Population = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
Ex- first select 9, then 13, 17, 1, 5...
Sample size - small (n < 30), large (n >= 30)
Probability P = FC/EC, where P = probability, FC = number of favorable cases, EC = exhaustive number of cases
Statistical Tests –
A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intent
is to determine whether there is enough evidence to "reject" a conjecture or hypothesis about the process. The
conjecture is called the null hypothesis. Not rejecting may be a good result if we want to continue to act as if we
"believe" the null hypothesis is true. Or it may be a disappointing result, possibly indicating we may not yet have
enough data to "prove" something by rejecting the null hypothesis.
There are many different statistical significance, or hypothesis, tests. They all follow the same basic principle. The
appropriate test for a given situation depends on the nature of the data being analyzed. This chapter explains p-
values, gives a detailed description of significance testing and discusses the relationship between confidence
intervals and significance tests.
Types of statistical tests: There are a wide range of statistical tests. The decision of which statistical test to use
depends on the research design, the distribution of the data, and the type of variable. In general, if the data is
normally distributed you will choose from parametric tests. If the data is non-normal you choose from the set of
non-parametric tests. Below is a table listing just a few common statistical tests and their use.
Type of Test: Use:
Chi-square: Tests for the strength of the association between two categorical variables
Paired t-test: Tests for a difference between two related variables
Independent t-test: Tests for a difference between two independent groups
F test: Compares two samples at a time (ratio of variances)
ANOVA: Tests the difference between group means after any other variance in the outcome variable is
accounted for; more than two samples at a time; can be one-way or two-way
Z-test: Tests a sample mean against a known population mean when the population standard deviation is known
A hypothesis is a speculation or theory, based on limited evidence, that lends itself to further testing and
experimentation. With further testing, a hypothesis can be supported or rejected. Let's look at an example.
Little Susie speculates, or hypothesizes, that the flowers she waters with club soda will grow faster than flowers
she waters with plain water. She waters each plant daily for a month (experiment) and finds support for her hypothesis.
A null hypothesis (H0) is a hypothesis that says there is no statistical significance between the two variables in
the hypothesis. It is the hypothesis that the researcher is trying to disprove. In the example, Susie's null hypothesis
would be something like this: There is no statistically significant relationship between the type of water I feed the
flowers and growth of the flowers. A researcher is challenged by the null hypothesis and usually wants to disprove
it, to demonstrate that there is a statistically-significant relationship between the two variables in the hypothesis.
A statistical significance test measures the strength of evidence which the data sample supplies for or against some
proposition of interest.
This proposition is known as a 'null hypothesis', since it usually relates to there being 'no difference' between
groups or 'no effect' of a treatment.
An alternative hypothesis (H1) simply is the inverse, or opposite, of the null hypothesis. So, if we continue
with the above example, the alternative hypothesis would be that there IS indeed a statistically-significant
relationship between what type of water the flower plant is fed and growth. More specifically, here would be the
null and alternative hypotheses for Susie's study:
Null: If one plant is fed club soda for one month and another plant is fed plain water, there will be no difference in
growth between the two plants.
Alternative: If one plant is fed club soda for one month and another plant is fed plain water, the plant that is fed
club soda will grow better than the plant that is fed plain water.
7 Step Process of Statistical Hypothesis Testing
Step 1: State the Null Hypothesis.
Step 2: State the Alternative Hypothesis.
Step 3: Find the degrees of freedom (the number of independent observations in a series).
When data are arranged in individual form: df = n - 1
When data are arranged in tabular form: df = (m - 1)(n - 1), where m = number of rows and n = number of columns
Step 4: Choose the confidence level.
CL = 95% means a 5% error level; CL = 99% means a 1% error level.
Step 5: Find the calculated value, using the formula, by preparing the chi-square table.
Step 6: Find the tabulated value (from the table), depending on the degrees of freedom and the confidence level.
Step 7: Make the decision:
If calculated value < tabulated value, the null hypothesis is accepted.
If calculated value > tabulated value, the null hypothesis is rejected.
Chi-square test: a statistical method assessing the goodness of fit between a set of observed values and
those expected theoretically. Chi-square is a statistical test commonly used to compare observed data with the data
we would expect to obtain according to a specific hypothesis.
For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual
observed number was 8 males, then you might want to know about the "goodness of fit" between the observed
and expected. Were the deviations (differences between observed and expected) the result of chance, or were
they due to other factors? How much deviation can occur before you, the investigator, must conclude that
something other than chance is at work, causing the observed to differ from the expected?
The chi-square test is always testing what scientists call the null hypothesis, which states that there is no
significant difference between the expected and observed result.
Chi-square: X² = Σ (O - E)² / E
where O = observed value/frequency
E = expected value/frequency
That is, chi-square is the sum of the squared differences between observed (O) and expected (E) data (the
deviation, d), divided by the expected data, in all possible categories.
Individual form of data: the expected value is taken as the mean of the series, E = Σfx / N, and chi-square is then
computed from the observed and expected values as above. For example:
X: 10 12 15
f:  5  3  4
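As a sketch, the chi-square statistic can be computed directly from its definition, here using the Mendel-style example above (observed 8 males and 12 females against an expected 10 and 10):

```python
def chi_square(observed, expected):
    """Chi-square statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed 8 males and 12 females out of 20 offspring; expected 10 and 10
x2 = chi_square([8, 12], [10, 10])
print(x2)  # 0.8
```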
Non-sampling error:
Non-sampling error is the error that arises in a data collection process as a result of factors other than taking a
sample.
Non-sampling errors have the potential to cause bias in polls, surveys or samples.
There are many different types of non-sampling errors and the names used to describe them are not consistent.
Examples of non-sampling errors are generally more useful than using names to describe them.
QUESTION
A survey was conducted in BBSR regarding family life and education. Test at 5% SL whether there is any
significant difference between family life and education, from the following information.
             Happy family   Unhappy family   Total
Educated          65              35          100
Uneducated        72              60          132
Total            137              95          232
Step 1: Null Hypothesis. - There is NO significant difference between family life and education
Step 2: Alternative Hypothesis. There IS significant difference between family life and education
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df tabular = ( 2-1) (2-1) = 1
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
E65 = (100X137)/232 = 59
E35 = (100X95)/232 = 41
E72 = (132X137)/232 = 78
E60 = (95X132)/232 = 54
O (observed)   E (expected)   O-E   (O-E)²   (O-E)²/E
65             59               6     36       0.61
35             41              -6     36       0.88
72             78              -6     36       0.46
60             54               6     36       0.67
                                     TOTAL     2.62
X2 cal = 2.62
Step 6: Tabulated value – (from table) at SL 5% and df = 1 is 3.84.
Step 7: Making decisions
If Cal Value < Tab value = Null hypothesis is Accepted
2.62 < 3.84
There is NO significant difference between family life and education
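The contingency-table computation can be checked with a short script. Note that keeping the exact expected frequencies (59.05, 40.95, 77.95, 54.05) instead of rounding them to whole numbers gives a chi-square of about 2.57 rather than 2.62; both are below 3.84, so the conclusion is unchanged:

```python
# Rows: Educated, Uneducated; columns: Happy family, Unhappy family
observed = [[65, 35],
            [72, 60]]

row_totals = [sum(row) for row in observed]            # [100, 132]
col_totals = [sum(col) for col in zip(*observed)]      # [137, 95]
grand = sum(row_totals)                                # 232

# Expected frequency of each cell = (row total x column total) / grand total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

x2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
         for i in range(2) for j in range(2))
print(round(x2, 2))  # 2.57 with exact expected values (2.62 when E is rounded)
```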
QUESTION
In an experiment on the immunization of goats against anthrax, the following results were obtained. Draw your
inference on the efficacy of the vaccine.
                 Died of Anthrax   Survived   Total
Inoculated             2              10        12
Not inoculated         6               6        12
Total                  8              16        24
Step 1: Null Hypothesis. - There is NO significant relationship between inoculation and survival
Step 2: Alternative Hypothesis. There IS significant relationship between inoculation and survival
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df tabular = ( 2-1) (2-1) = 1
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
E2 = (12X8)/24 = 4
E10 = (12X16)/24 = 8
E6 = (8X12)/24 = 4
E6 = (16X12)/24 = 8
O (observed)   E (expected)   O-E   (O-E)²   (O-E)²/E
2              4               -2     4        1
10             8                2     4        0.5
6              4                2     4        1
6              8               -2     4        0.5
                                     TOTAL     3
X2 cal = 3
Step 6: Tabulated value – (from table) at SL 5% and df = 1 is 3.84.
Step 7: Making decisions
If Cal Value < Tab value = Null hypothesis is Accepted
3 < 3.84
There is NO significant relationship between inoculation and survival
QUESTION
Four major automobile manufacturers in India are Maruti, Hyundai, Tata and Ford. These companies manufacture cars
in 3 segments: small, medium and large. Suppose a hypothetical study is conducted to analyze whether there is any
relationship between the manufacturer and the type of car preferred by consumers, from the following data-

Segment    Maruti   Hyundai   Tata   Ford   Total
Small        80       60       75     10     225
Medium       15       10       15     30      70
Large         5       30       10     60     105
Total       100      100      100    100     400

Step 1: Null Hypothesis. - There is NO significant relationship between the manufacturer and the type of car
preferred by consumers
Step 2: Alternative Hypothesis. There IS a significant relationship between the manufacturer and the type of car
preferred by consumers
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df tabular = ( 4-1) (3-1) = 6
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
The row totals are 225, 70 and 105; each column total is 100; N = 400.
E80 = (225X100)/400 = 56.25
E60 = (225X100)/400 = 56.25
E75 = (225X100)/400 = 56.25
E10 = (225X100)/400 = 56.25
E15 = (70X100)/400 = 17.50
E10 = (70X100)/400 = 17.50
E15 = (70X100)/400 = 17.50
E30 = (70X100)/400 = 17.50
E5 = (105X100)/400 = 26.25
E30 = (105X100)/400 = 26.25
E10 = (105X100)/400 = 26.25
E60 = (105X100)/400 = 26.25
O     E       O-E      (O-E)²    (O-E)²/E
80    56.25   23.75    564.06    10.03
60    56.25    3.75     14.06     0.25
75    56.25   18.75    351.56     6.25
10    56.25  -46.25   2139.06    38.03
15    17.50   -2.50      6.25     0.36
10    17.50   -7.50     56.25     3.21
15    17.50   -2.50      6.25     0.36
30    17.50   12.50    156.25     8.93
5     26.25  -21.25    451.56    17.20
30    26.25    3.75     14.06     0.54
10    26.25  -16.25    264.06    10.06
60    26.25   33.75   1139.06    43.39
Total                            138.61
X2 cal = 138.61
Step 6: Tabulated value - (from table) at SL 5% and df = 6 is 12.592
Step 7: Making decisions
Since the calculated value is greater than the tabulated value, the null hypothesis is rejected.
There IS a significant relationship between the manufacturer and the type of car preferred by consumers.
QUESTION
The following table gives the no of car accidents that occur during various days of the week. Test the
hypothesis that at 5%SL the accidents are uniformly distributed over the week-
            O    E    O-E   (O-E)²   (O-E)²/E
Sunday     14   12     2     4.00     0.33
Monday     15   12     3     9.00     0.75
Tuesday     8   12    -4    16.00     1.33
Wednesday  12   12     0     0.00     0.00
Thursday   11   12    -1     1.00     0.08
Friday      9   12    -3     9.00     0.75
Saturday   14   12     2     4.00     0.33
E = 83/7 = 11.86 ≈ 12               TOTAL  3.58
X2 cal = 3.58
Step 1: Null Hypothesis. - The accidents are uniformly distributed over the week
Step 2: Alternative Hypothesis. The accidents are NOT uniformly distributed over the week
Step 3: degrees of freedom ( no. of independent observation in a series)
Individual form – df= (n-1) & Tabular form – df= (m-1) (n-1) ] where m= no of row, n= no of column
df individual = (n-1) = 7-1= 6
Step 4: choose Confidence level-
SL 5% or CL = 95%
Step 5: Calculated value using formulae by preparing Chi-square table
X2 cal = 3.58
Step 6: Tabulated value – (from table) at SL 5% and df = 6 is 12.592
Step 7: Making decisions
If Cal Value < Tab value = Null hypothesis is Accepted
3.58 < 12.592
The accidents are uniformly distributed over the week (there is no significant difference between days).
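The goodness-of-fit calculation can be sketched in code. Using the exact expected value 83/7 ≈ 11.86 rather than the rounded 12 gives a chi-square of about 3.61 instead of 3.58; either way it is well below the table value 12.592 at df = 6:

```python
accidents = {"Sun": 14, "Mon": 15, "Tue": 8, "Wed": 12, "Thu": 11, "Fri": 9, "Sat": 14}

total = sum(accidents.values())        # 83
expected = total / len(accidents)      # 83/7 = 11.86 (the text rounds this to 12)

x2 = sum((o - expected) ** 2 / expected for o in accidents.values())
print(round(x2, 2))  # 3.61
```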
The t-test
The t-test assesses whether the means of two groups are statistically different from each other. This analysis is
appropriate whenever you want to compare the means of two groups, and especially appropriate as the analysis
for the posttest-only two-group randomized experimental design.
The t-test is probably the most commonly used Statistical Data Analysis procedure for hypothesis testing. Actually,
there are several kinds of t-tests, but the most common is the "two-sample t-test" also known as the "Student's t-
test" or the "independent samples t-test".
One sample t-test
Comparison of sample mean with population mean when standard deviation of the population is estimated from
the sample.
Calculation of the test statistic requires four components:
1. The average of the sample (observed average), the sample mean Xbar
2. The population average or other known value (expected average), the population mean M
3. The standard deviation (SD) of the sample, S (so that S/√n is the SD of the sample average)
4. The sample size, n
The test statistic is:
|t| = |Xbar - M| / (S / √n)
QUESTION
An electric bulb manufacturing company claims that the life span of electric bulb is 40 months. Sample
test of 6 bulbs show whose life span are 35m,42m,20m,22m,38m,45m- test the hypothesis that at 5% SL
company’s claim is valid.
X     Xbar   X - Xbar   (X - Xbar)²
35    34        1           1
42    34        8          64
20    34      -14         196
22    34      -12         144
38    34        4          16
45    34       11         121
Xbar = 202/6 = 33.66 ≈ 34   Total: 542

SD = sqrt(542/5) = 10.41
|t| = |34 - 40| / (10.41/√6) = 6/4.25 = 1.41
t tab at 5% SL and df = 5 is 2.571. Since 1.41 < 2.571, the null hypothesis is accepted: the company's claim is valid.
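A quick check of this one-sample t-test in code, keeping the exact sample mean (33.67) instead of the rounded 34, gives t ≈ 1.49; this is still below the 5% critical value of 2.571 for df = 5, so the company's claim stands either way:

```python
import math
import statistics

lifespans = [35, 42, 20, 22, 38, 45]   # months
claimed_mean = 40                      # the company's claim (M)

xbar = statistics.mean(lifespans)      # 33.67 (the text rounds this to 34)
s = statistics.stdev(lifespans)        # sample SD with divisor n-1, about 10.41
n = len(lifespans)

t = abs(xbar - claimed_mean) / (s / math.sqrt(n))
print(round(t, 2))  # 1.49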
Two-Sample T-Test
We often want to know whether the means of two populations on some outcome differ. For example, there are
many questions in which we want to compare two categories of some categorical variable (e.g., compare males
and females) or two populations receiving different treatments in context of an experiment. The two-sample t-test
is a hypothesis test for answering questions about the mean where the data are collected from two random
samples of independent observations, each from an underlying normal distribution:
The steps of conducting a two-sample t-test are quite similar to those of the one-sample test. And
for the sake of consistency, we will focus on another example dealing with birthweight and
prenatal care. In this example, rather than comparing the birth weight of a group of infants to
some national average, we will examine a program's effect by comparing the birth weights of
babies born to women who participated in an intervention with the birth weights of a group that
did not.
                               X          Y
Sample size                    n1         n2
Sample mean                    Xbar       Ybar
Combined SD                    S
Degrees of freedom             n1 - 1     n2 - 1
Combined degrees of freedom:   n1 + n2 - 2

|t cal| = |Xbar - Ybar| / (S × √(1/n1 + 1/n2))

For example, with Xbar = 4, Ybar = 5, S = 5.7, n1 = 7 and n2 = 9:
|t cal| = |4 - 5| / (5.7 × √(1/7 + 1/9)) = 1 / (5.7 × 0.504) = 0.35
t tab at 5% SL and 14 df is 2.145. Since 0.35 < 2.145, the null hypothesis is accepted.
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most
often used when comparing statistical models that have been fitted to a data set, in order to identify the model
that best fits the population from which the data were sampled.
Since F is formed by chi-square, many of the chi-square properties carry over to the F distribution.
The F-values are all non-negative
The distribution is non-symmetric
There are two independent degrees of freedom, one for the numerator, and one for the denominator.
There are many different F distributions, one for each pair of degrees of freedom.
If the null hypothesis is true, then the F test-statistic given above can be simplified (dramatically): the ratio of
sample variances is the test statistic used. If this ratio is too far from 1, we reject the null
hypothesis that the variances are equal.
There are several different F-tables. Each one has a different level of significance. So, find the correct level of
significance first, and then look up the numerator degrees of freedom and the denominator degrees of
freedom to find the critical value.
                    X                               Y
Mean                Xbar                            Ybar
Sample size         n1                              n2
df                  n1 - 1                          n2 - 1
Sample variance     Sx²                             Sy²

Sx² = (1/(n1 - 1)) Σ(X - Xbar)²     Sy² = (1/(n2 - 1)) Σ(Y - Ybar)²
F = larger sample variance / smaller sample variance
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group
means and their associated procedures (such as "variation" among and between groups), developed by statistician
and evolutionary biologist Ronald Fisher.
One-way ANOVA – single parameter
Two-way ANOVA – two parameter
Steps of ANOVA- (ONE PARAMETER)
1. Set up Null hypothesis
2. Set up Alt Hypothesis
3. Find GRAND TOTAL = G
4. Find the Correction Factor CF = G²/N, where N = total number of observations
5. Find Raw Sum of Squares ( RSS)
6. Find Total Sum of Squares ( TSS) = RSS - CF
7. Find Between Sample Sum of Squares ( BSS )
PROBLEM (yields of seeds A, B, C on plots 1-4)      Raw SS (each value squared)
Plots    A    B    C                                Plots     A      B      C
1        2    7    7                                1         4     49     49
2        3    4    6                                2         9     16     36
3        5    2    4                                3        25      4     16
4        8    9    8                                4        64     81     64
Total   18   22   25   | G = 65                     Total   102    150    165   | RSS = 417
Steps of ANOVA-
1. Set up Null hypothesis – there is NO significant difference between plots and seeds
2. Set up Alt Hypothesis - there is significant difference between plots and seeds
3. Find GRAND TOTAL G = 65
4. Find the Correction Factor CF = G²/N = 65²/12 = 352.08
5. Find Raw Sum of Squares (RSS) = 417
6. Find Total Sum of Squares (TSS) = RSS - CF = 417 - 352.08 = 64.92
7. Find Between Sample Sum of Squares (BSS) = (18² + 22² + 25²)/4 - CF = 358.25 - 352.08 = 6.17
8. Find Within Sample Sum of Squares (WSS) = TSS - BSS = 64.92 - 6.17 = 58.75
9. Prepare the ANOVA table-
Source   Sum of Squares (SS)   Degrees of freedom (df)      Mean Square (MSS = SS/df)   F cal
BSS      6.17                  3 - 1 = 2                    6.17/2 = 3.085              6.527/3.085 = 2.11
WSS      58.75                 12 - 3 = 9                   58.75/9 = 6.527             at (9, 2) df
TSS      64.92                 N - 1 = 11 (sample size 12)
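The one-way ANOVA computations above can be verified with a short script (pure Python, no external libraries):

```python
# Yields of seeds A, B and C on four plots (each column is one sample)
samples = {"A": [2, 3, 5, 8], "B": [7, 4, 2, 9], "C": [7, 6, 4, 8]}

values = [x for col in samples.values() for x in col]
G = sum(values)                                    # grand total 65
N = len(values)                                    # 12 observations
CF = G ** 2 / N                                    # correction factor 352.08
RSS = sum(x ** 2 for x in values)                  # raw sum of squares 417
TSS = RSS - CF                                     # total SS 64.92
BSS = sum(sum(col) ** 2 / len(col) for col in samples.values()) - CF   # between 6.17
WSS = TSS - BSS                                    # within 58.75

k = len(samples)
ms_between = BSS / (k - 1)                         # 6.17/2 = 3.085
ms_within = WSS / (N - k)                          # 58.75/9 = 6.527
f_ratio = ms_within / ms_between                   # larger/smaller, about 2.12
print(round(f_ratio, 2))
```

The small differences from the hand-computed 2.11 come from rounding the intermediate mean squares by hand.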
PROBLEM:
The following table gives the number of units of production per day turned out by 4 different types of machines:
To test the hypothesis that the mean production is the same for the 4 machines, and
to test the hypothesis that the employees do not differ with respect to mean productivity, using ANOVA at 5% SL.
Machines M1 M2 M3 M4
Employees
1 40 36 45 30
2 38 42 50 41
3 36 30 48 35
4 46 47 52 44
1. Set up the null hypothesis: the mean production is the same for the 4 machines.
2. Set up the alternative hypothesis: the mean production is NOT the same for the 4 machines.
3. Find the grand total G = 660
4. Find the correction factor CF = G²/N = 660²/16 = 27225
5. Find the raw sum of squares RSS = 27900
Row SS
Employees    M1    M2    M3    M4    Row sum    (Row sum)²    (Row sum)²/4
1            40    36    45    30    151        22801         5700.25
2            38    42    50    41    171        29241         7310.25
3            36    30    48    35    149        22201         5550.25
4            46    47    52    44    189        35721         8930.25
                                                Total:        27491
Row SS = 27491 - 27225 = 266
Col SS
Employees       M1       M2        M3        M4
1               40       36        45        30
2               38       42        50        41
3               36       30        48        35
4               46       47        52        44
Col sum         160      155       195       150
(Col sum)²      25600    24025     38025     22500
(Col sum)²/4    6400     6006.25   9506.25   5625     Total: 27537.5
Col SS = 27537.5 - 27225 = 312.5
9. Find the error sum of squares (ESS)
= TSS - (Row SS + Col SS) = 675 - (266 + 312.5) = 675 - 578.5 = 96.5
10. Prepare the ANOVA table:

Source    Sum of Squares (SS)    df                 MSS = SS/df             Fcal
Row SS    266                    r - 1 = 3          266/3 = 88.67 = x       F = x/z = 88.67/10.72 = 8.27
Col SS    312.5                  c - 1 = 3          312.5/3 = 104.17 = y    F = y/z = 104.17/10.72 = 9.72
ESS       96.5                   (r-1)(c-1) = 9     96.5/9 = 10.72 = z
TSS       675                    N - 1 = 15 (sample size is 16)
11. Find the table value at 5% SL and df (3, 9): F = 3.86
12. Compare: for the rows (employees), 8.27 > 3.86, so the null hypothesis is rejected;
for the columns (machines), 9.72 > 3.86, so the null hypothesis is rejected.
13. Decision making: the mean production is not the same for the 4 machines, and the employees differ with respect to mean productivity.
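The two-way ANOVA computation above can be verified with a short Python sketch over the same 4 x 4 table:

```python
# Two-way ANOVA: rows = employees, columns = machines M1..M4
data = [
    [40, 36, 45, 30],
    [38, 42, 50, 41],
    [36, 30, 48, 35],
    [46, 47, 52, 44],
]
r = len(data)        # 4 rows (employees)
c = len(data[0])     # 4 columns (machines)
N = r * c

G = sum(sum(row) for row in data)                  # grand total = 660
CF = G ** 2 / N                                    # correction factor = 27225
RSS = sum(v ** 2 for row in data for v in row)     # raw SS = 27900
TSS = RSS - CF                                     # total SS = 675

row_ss = sum(sum(row) ** 2 / c for row in data) - CF                         # 266
col_ss = sum(sum(row[j] for row in data) ** 2 / r for j in range(c)) - CF    # 312.5
ess = TSS - row_ss - col_ss                        # error SS = 96.5

ms_rows = row_ss / (r - 1)
ms_cols = col_ss / (c - 1)
ms_err = ess / ((r - 1) * (c - 1))
f_rows, f_cols = ms_rows / ms_err, ms_cols / ms_err
print(round(f_rows, 2), round(f_cols, 2))   # both exceed F(3, 9) = 3.86 at 5% SL
```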
Thus ±3σ of the NPC include different numbers of cases. Between ±1σ lie the middle two-thirds of cases, or 68.26%;
between ±2σ lie 95.44% of cases; between ±3σ lie 99.73% of cases; and beyond ±3σ only 0.27% of cases fall.
9. The mean of the NPC is µ and the standard deviation is σ:
As the mean of the NPC represents the population mean, it is denoted by µ (mu). The standard deviation
of the curve is denoted by the Greek letter σ.
10. In the Normal Probability Curve the standard deviation is about 50% larger than Q:
In the NPC, Q (the quartile deviation) is generally called the probable error, or PE. Since Q = 0.6745σ, it follows that σ ≈ 1.5Q.
Applications of Normal Probability Curve:
Some of the most important applications of normal probability curve are as follows:
The principles of Normal Probability Curve are applied in the behavioral sciences in many different areas.
1. NPC is used to determine the percentage of cases in a normal distribution within given limits:
The Normal Probability Curve helps us to determine:
i. What percent of cases fall between two scores of a distribution?
ii. What percent of scores lie above a particular score of a distribution?
iii. What percent of scores lie below a particular score of a distribution?
General normal distribution: X ~ N(µ, σ²), where N denotes the normal distribution, µ = mean of the distribution,
and σ² = variance.
Standard normal distribution: Z ~ N(0, 1), i.e. µ = 0 and σ² = 1.
Every general normal distribution is converted into the standard normal variate (Z):
Z = (X - µ)/σ, where X is the observed value.
The total area of the NPC is divided into six standard deviation units. From the center it is divided into three positive
standard deviation units and three negative standard deviation units.
Thus ±3σ of the NPC include different numbers of cases:
Between ±1σ lie the middle two-thirds of cases, or 68.26% ≈ 68%;
between ±2σ lie 95.44% ≈ 95% of cases;
between ±3σ lie 99.73% ≈ 99.7% of cases; and
beyond ±3σ only 0.27% of cases fall.
When X = µ:      Z = (µ - µ)/σ = 0
When X = µ + σ:  Z = (µ + σ - µ)/σ = σ/σ = 1
When X = µ + 2σ: Z = 2σ/σ = 2      (the area for a given Z value is found from the table)
When X = µ + 3σ: Z = 3σ/σ = 3
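These NPC areas can be checked with Python's standard library (statistics.NormalDist); at full precision the figures come out to 68.27%, 95.45%, and 99.73%:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

within_1sd = z.cdf(1) - z.cdf(-1)   # area between -1σ and +1σ
within_2sd = z.cdf(2) - z.cdf(-2)   # area between -2σ and +2σ
within_3sd = z.cdf(3) - z.cdf(-3)   # area between -3σ and +3σ
beyond_3sd = 1 - within_3sd         # tails beyond ±3σ

print(round(within_1sd * 100, 2))   # 68.27
print(round(within_2sd * 100, 2))   # 95.45
print(round(within_3sd * 100, 2))   # 99.73
print(round(beyond_3sd * 100, 2))   # 0.27
```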
QUESTION-
The mean height of soldiers in a certain regiment is 68.6" with a standard deviation of 5.2", and height follows the
normal distribution. Out of 1000 soldiers, find how many have a height greater than 6 feet.
Answer-
µ = 68.6, σ = 5.2
Z = (X - µ)/σ = (X - 68.6)/5.2; X = 6' = 72", so Z = (72 - 68.6)/5.2 = 0.65
From the table, the area between Z = 0 and Z = 0.65 is 0.2422, so P(Z > 0.65) = 0.5 - 0.2422 = 0.2578.
Hence about 0.2578 x 1000 ≈ 258 soldiers are taller than 6 feet.
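A quick sketch completing the computation (using the exact Z of 0.6538 gives about 257 soldiers; rounding Z to 0.65 as in the table gives about 258):

```python
from statistics import NormalDist

heights = NormalDist(mu=68.6, sigma=5.2)   # heights in inches
z = (72 - 68.6) / 5.2                      # 6 ft = 72 inches

p_taller = 1 - heights.cdf(72)             # area to the right of 72"
soldiers = 1000 * p_taller                 # expected count out of 1000
print(round(z, 2), round(soldiers))        # 0.65 257
```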
U test –
QUESTION-
The values of one sample are 53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78.
The values of the second sample are 44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72.
Using the U test, test the hypothesis that they came from populations with the same mean, at 10% SL.
The table value of Z for an area of 0.45 is 1.64 (0.45 on each side of the mean leaves 0.05 in each tail).
Step 6: Find the mean = (n1 x n2)/2, where n1 is the first sample size and n2 is the second sample size:
mean (µ) = (12 x 12)/2 = 72
Step 7: Find the standard deviation:
SD = sqrt[n1.n2(n1 + n2 + 1)/12] = sqrt[(144 x 25)/12] = sqrt(300) = 17.32
Step 8: Find the calculated U value:
Ucal = n1.n2 + n1(n1 + 1)/2 - ΣR1 = 144 + (12 x 13)/2 - 167.5 = 54.5
or, equivalently, Ucal = n1.n2 + n2(n2 + 1)/2 - ΣR2
Step 9: Find the limits of the acceptance region:
Upper limit = µ + Z.SD = 72 + 1.64 x 17.32 = 100.40
Lower limit = µ - Z.SD = 72 - 1.64 x 17.32 = 43.59
Step 10: If Ucal lies between the lower and upper limits, i.e. in the acceptance region, the null hypothesis is
accepted. Here Ucal = 54.5 lies between 43.59 and 100.40, so the null hypothesis is accepted.
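The U-test steps above can be reproduced in Python; the midrank helper handles the tied values (44, 53, 72, 73) the way the rank sum ΣR1 = 167.5 assumes:

```python
from math import sqrt

s1 = [53, 38, 69, 57, 46, 39, 73, 48, 73, 74, 60, 78]
s2 = [44, 40, 61, 52, 32, 44, 70, 41, 67, 72, 53, 72]

# Rank the pooled values; tied values share the average of their ranks.
pooled = sorted(s1 + s2)

def rank(v):
    first = pooled.index(v) + 1          # 1-based rank of the first occurrence
    count = pooled.count(v)
    return first + (count - 1) / 2       # midrank for ties

r1 = sum(rank(v) for v in s1)            # rank sum of sample 1 = 167.5
n1, n2 = len(s1), len(s2)

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1     # U = 54.5
mean = n1 * n2 / 2                       # 72
sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # sqrt(300) ≈ 17.32
lower, upper = mean - 1.64 * sd, mean + 1.64 * sd
print(u, lower <= u <= upper)            # 54.5 True -> accept the null hypothesis
```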
QUESTION-
An animal trainer in a circus is training 20 lions to perform a special trick. The lions have been divided into 2
groups. Group A gets positive reinforcement of food and favourable comments during the learning session, whereas
Group B does not. The following table indicates the number of days needed by each lion to learn the trick.
Group A: 78, 95, 82, 69, 111, 65, 73, 84, 92, 110
Group B: 121, 132, 101, 79, 94, 88, 102, 93, 98, 127
Using 5% SL, test the null hypothesis that the mean time for both groups is the same, using the U test. Table
value of Z = 1.96 (Z value for an area of 0.475).
Step 8: Find the calculated U values:
Ucal = n1.n2 + n1(n1 + 1)/2 - ΣR1 = 100 + (10 x 11)/2 - 77 = 78
or Ucal = n1.n2 + n2(n2 + 1)/2 - ΣR2 = 100 + (10 x 11)/2 - 133 = 22
Step 9: Find the limits of the acceptance region, with µ = (10 x 10)/2 = 50 and SD = sqrt[(100 x 21)/12] = 13.23:
Upper limit = µ + Z.SD = 50 + 1.96 x 13.23 = 75.93
Lower limit = µ - Z.SD = 50 - 1.96 x 13.23 = 24.07
Step 10: If the calculated U values do not lie between the lower and upper limits, i.e. within the acceptance
region, the null hypothesis is rejected.
Limits: 24.07 and 75.93
Ucal(R1) = 78 and Ucal(R2) = 22; both fall outside the acceptance zone,
so the null hypothesis is rejected.
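A sketch checking the lion-training numbers (this data set has no ties, so plain ranks suffice):

```python
from math import sqrt

a = [78, 95, 82, 69, 111, 65, 73, 84, 92, 110]
b = [121, 132, 101, 79, 94, 88, 102, 93, 98, 127]

# All 20 values are distinct, so each value's rank is its position in the sorted pool
pooled = sorted(a + b)
r1 = sum(pooled.index(v) + 1 for v in a)   # rank sum of Group A = 77
r2 = sum(pooled.index(v) + 1 for v in b)   # rank sum of Group B = 133

n1, n2 = len(a), len(b)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1      # 78
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2      # 22

mean = n1 * n2 / 2                         # 50
sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # sqrt(175) ≈ 13.23
lower, upper = mean - 1.96 * sd, mean + 1.96 * sd

# Both U values fall outside [lower, upper] -> reject the null hypothesis
print(u1, u2, lower <= u1 <= upper, lower <= u2 <= upper)
```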
QUESTION-
Use H test at 5% SL to test the Null Hypo that a professional bowler performs equally well with the 4 bowling balls
given the following results-
Ball A – 271, 282, 257, 248, 262
Ball B – 252, 275, 302, 268, 276
Ball C – 260, 255, 239, 246, 266
Ball D- 279, 242, 297, 270, 258
Answer-
Bowling result    Rank    Ball
239               1       C
242               2       D
246               3       C
248               4       A
252               5       B
255               6       C
257               7       A
258               8       D
260               9       C
262               10      A
266               11      C
268               12      B
270               13      D
271               14      A
275               15      B
276               16      B
279               17      D
282               18      A
297               19      D
302               20      B
Rank sums: RA = 53, RB = 68, RC = 30, RD = 59
Step 6: Find the mean = (n1 x n2 x n3 x n4)/4, where n1 is the first sample size, n2 the second, and so on:
= (5 x 5 x 5 x 5)/4 = 156.25 = mean
Step 7: Find the standard deviation:
SD = sqrt[n1.n2.n3.n4(n1 + n2 + n3 + n4 + 3)/12] = sqrt[(625 x 23)/12] = sqrt(1197.92) = 34.61
Step 8: Find the calculated H value, where RA, RB, RC, RD are the rank sums and n = 20:
H = 12/[n(n + 1)] x [RA²/nA + RB²/nB + RC²/nC + RD²/nD] - 3(n + 1)
= 12/(20 x 21) x [53²/5 + 68²/5 + 30²/5 + 59²/5] - 3 x 21
= 12/420 x [561.8 + 924.8 + 180 + 696.2] - 63
= 4.51
Since Hcal = 4.51 is below the chi-square table value of 7.815 at (4 - 1) = 3 df and 5% SL, the null hypothesis is
accepted: the bowler performs equally well with the four balls.
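The H statistic can be recomputed in Python from the same rank sums:

```python
# Kruskal-Wallis H for the four bowling balls (all 20 results are distinct)
balls = {
    "A": [271, 282, 257, 248, 262],
    "B": [252, 275, 302, 268, 276],
    "C": [260, 255, 239, 246, 266],
    "D": [279, 242, 297, 270, 258],
}

pooled = sorted(v for vals in balls.values() for v in vals)
# Rank = 1-based position in the sorted pool (no ties in this data)
rank_sums = {name: sum(pooled.index(v) + 1 for v in vals)
             for name, vals in balls.items()}

n = len(pooled)                                       # 20
h = 12 / (n * (n + 1)) * sum(
    rs ** 2 / len(balls[name]) for name, rs in rank_sums.items()
) - 3 * (n + 1)
print(rank_sums, round(h, 2))   # {'A': 53, 'B': 68, 'C': 30, 'D': 59} 4.51
```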
A Z-test is a hypothesis test based on the Z-statistic, which follows the standard normal distribution under the
null hypothesis. The simplest Z-test is the 1-sample Z-test, which tests the mean of a normally distributed
population with known variance.
It is a statistical test used to determine whether two population means are different when the variances are known
and the sample size is large. The test statistic is assumed to have a normal distribution, and nuisance parameters
such as the standard deviation should be known in order for an accurate Z-test to be performed.
Used where n ≥ 30; follows the normal distribution.
Z at 95% confidence = 1.96 (5% SL)
Z at 99% confidence = 2.58 (1% SL)
        actual value - hypothetical value
Z = ---------------------------------------------
                 Standard Error
Standard Error means the standard deviation of the sampling distribution.
For the binomial distribution:
Mean = np, where n = number of trials and p = probability of success
Variance = npq
SD = sqrt(npq)
p = probability of success
q = probability of failure (q = 1 - p)
Mean > variance always, since q < 1.
A coin is tossed 500 times and turns up heads 280 times. Test at 5% SL the hypothesis that the coin is unbiased.
       280 - 250            280 - 250
Z = --------------- = -------------------------- = 30/11.18 = 2.68 (Zcal)
      sqrt(npq)        sqrt(500 x 1/2 x 1/2)
Since Zcal = 2.68 > 1.96, the null hypothesis is rejected at 5% SL: the coin is biased.
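The coin-toss Z-test can be worked in a few lines of Python:

```python
from math import sqrt

n, heads = 500, 280
p = q = 0.5                 # null hypothesis: the coin is unbiased

mean = n * p                # expected heads = 250
sd = sqrt(n * p * q)        # sqrt(125) ≈ 11.18

z = (heads - mean) / sd     # 30 / 11.18 ≈ 2.68
print(round(z, 2), abs(z) > 1.96)   # 2.68 True -> reject the null at 5% SL
```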
Research in common parlance refers to a search for knowledge. One can also define research as a
scientific and systematic search for pertinent information on a specific topic. In fact, research is an art of
scientific investigation.
Research is an academic activity and as such the term should be used in a technical sense. According to
Clifford Woody research comprises defining and redefining problems, formulating hypothesis or
suggested solutions; collecting, organising and evaluating data; making deductions and reaching
conclusions; and at last carefully testing the conclusions to determine whether they fit the formulating
hypothesis.
Research is, thus, an original contribution to the existing stock of knowledge making for its
advancement. It is the pursuit of truth with the help of study, observation, comparison and experiment.
In short, the search for knowledge through the objective and systematic method of finding a solution to a
problem is research.
OBJECTIVES OF RESEARCH The purpose of research is to discover answers to questions through the
application of scientific procedures. The main aim of research is to find out the truth which is hidden and
which has not been discovered as yet. Though each research study has its own specific purpose, we may
think of research objectives as falling into a number of following broad groupings:
1. To gain familiarity with a phenomenon or to achieve new insights into it (studies with this object in
view are termed as exploratory or formulative research studies);
2. To portray accurately the characteristics of a particular individual, situation or a group (studies with
this object in view are known as descriptive research studies);
3. To determine the frequency with which something occurs or with which it is associated with
something else (studies with this object in view are known as diagnostic research studies);
4. To test a hypothesis of a causal relationship between variables (such studies are known as hypothesis-
testing research studies).
TYPES OF RESEARCH
The basic types of research are as follows:
(i) Descriptive vs. Analytical: Descriptive research includes surveys and fact-finding enquiries of
different kinds. The major purpose of descriptive research is description of the state of affairs as it
exists at present. In social science and business research we quite often use the term Ex post facto
research for descriptive research studies. The main characteristic of this method is that the researcher
has no control over the variables; he can only report what has happened or what is happening. In
analytical research, on the other hand, the researcher has to use facts or information already available,
and analyze these to make a critical evaluation of the material.
(ii) Applied vs. Fundamental: Research can either be applied (or action) research or fundamental (or
basic or pure) research. Applied research aims at finding a solution for an immediate problem facing a
society or an industrial/business organization, whereas fundamental research is mainly concerned with
generalizations and with the formulation of a theory. “Gathering knowledge for knowledge’s sake is
termed ‘pure’ or ‘basic’ research.” Research to identify social, economic or political trends that may
affect a particular institution is an example of applied research. Thus, the central aim of applied research
is to discover a solution for some pressing practical problem, whereas basic research is directed towards
finding information that has a broad base of applications and thus, adds to the already existing
organized body of scientific knowledge.
(iii) Quantitative vs. Qualitative: Quantitative research is based on the measurement of quantity or
amount. It is applicable to phenomena that can be expressed in terms of quantity. Qualitative research,
on the other hand, is concerned with qualitative phenomenon, i.e., phenomena relating to or involving
quality or kind. For instance, when we are interested in investigating the reasons for human behavior
(i.e., why people think or do certain things), we quite often talk of ‘Motivation Research’, an important
type of qualitative research. Qualitative research is especially important in the behavioral sciences
where the aim is to discover the underlying motives of human behavior. Through such research we can
analyse the various factors which motivate people to behave in a particular manner or which make
people like or dislike a particular thing.
(iv) Conceptual vs. Empirical: Conceptual research is that related to some abstract idea(s) or theory. It is
generally used by philosophers and thinkers to develop new concepts or to reinterpret existing ones. On
the other hand, empirical research relies on experience or observation alone, often without due regard
for system and theory. It is data-based research, coming up with conclusions which are capable of being
verified by observation or experiment.
(v) Some Other Types of Research: All other types of research are variations of one or more of the
above stated approaches, based on either the purpose of research, or the time required to accomplish
research, on the environment in which research is done, or on the basis of some other similar factor.
From the point of view of time, we can think of research either as one-time research or longitudinal
research. In the former case the research is confined to a single time-period, whereas in the latter case
the research is carried on over several time-periods. Research can be field-setting research or laboratory
research or simulation research, depending upon the environment in which it is to be carried out.
Research can as well be understood as clinical or diagnostic research.
Research methodology is a way to systematically solve the research problem. It may be
understood as a science of studying how research is done scientifically. In it we study the various steps
that are generally adopted by a researcher in studying his research problem along with the logic behind
them. It is necessary for the researcher to know not only the research methods/techniques but also the
methodology. Researchers not only need to know how to develop certain indices or tests, how to
calculate the mean, the mode, the median or the standard deviation or chi-square, how to apply
particular research techniques, but they also need to know which of these methods or techniques, are
relevant and which are not, and what would they mean and indicate and why. Researchers also need to
understand the assumptions underlying various techniques and they need to know the criteria by which
they can decide that certain techniques and procedures will be applicable to certain problems and
others will not. All this means that it is necessary for the researcher to design his methodology for his
problem as the same may differ from problem to problem.
Research Process Before embarking on the details of research methodology and techniques, it
seems appropriate to present a brief overview of the research process. Research process consists of
series of actions or steps necessary to effectively carry out research and the desired sequencing of these
steps.
The research process consists of a number of closely related activities, as shown through I to VII. But
such activities overlap continuously rather than following a strictly prescribed sequence. At times, the
first step determines the nature of the last step to be undertaken. If subsequent procedures have not
been taken into account in the early stages, serious difficulties may arise which may even prevent the
completion of the study. One should remember that the various steps involved in a research process are
not mutually exclusive; nor are they separate and distinct. They do not necessarily follow each other in
any specific order and the researcher has to be constantly anticipating at each step in the research
process the requirements of the subsequent steps. However, the following order concerning various
steps provides a useful procedural guideline regarding the research process:
(1) Formulating the research problem;
(2) Extensive literature survey;
(3) Developing the hypothesis;
(4) Preparing the research design;
(5) Determining sample design;
(6) Collecting the data;
(7) Execution of the project;
(8) Analysis of data;
(9) Hypothesis testing;
(10) Generalizations and interpretation, and
(11) Preparation of the report or presentation of the results, i.e., formal write-up of conclusions reached
Guidelines-
Questions should be brief
Objective in nature
Should be of multiple choice
No mathematical calculations
Avoid sensitive questions
Dual meaning questions be avoided
Should look attractive
No ambiguity
Should be pretested.
Types of questions-
Closed-Ended Questions
Closed-ended questions limit the answers of the respondents to response options provided on the
questionnaire.
Advantages: time-efficient; responses are easy to code and interpret; ideal for quantitative type
of research
Disadvantages: respondents are required to choose a response that does not exactly reflect
their answer; the researcher cannot further explore the meaning of the responses
Some examples of closed-ended questions are:
a. Dichotomous or two-point questions (e.g. Yes or No, Unsatisfied or Satisfied)
b. Multiple choice questions (e.g. A, B, C or D)
c. Scaled questions that are making use of rating scales such as the Likert scale (i.e. a type of five-
point scale), three-point scales, semantic differential scales, and seven-point scales
Open-Ended Questions
In open-ended questions, there are no predefined options or categories included.
The participants should supply their own answers.
Advantages: participants can respond to the questions exactly as they would like to answer
them; the researcher can investigate the meaning of the responses; ideal for qualitative type of
research
Disadvantages: time-consuming; responses are difficult to code and interpret
Contingency Questions
Questions that need to be answered only when the respondent provides a particular response
to a question prior to them are called contingency questions. Asking these questions effectively
avoids asking people questions that are not applicable to them. For example:
Have you ever smoked a cigarette?
___Yes ___ No
If YES, how many times have you smoked cigarette?
__ Once
__2-5 times
__ 6-10 times
__more than 10 times
The second question above is what we refer to as a contingency question following up a closed-
ended question.
Factual question (background of the respondents)
Opinion based/Attitude based questions- on 5 point scale/7 point scale/14 point scale
Research Design-
The arrangement of conditions for collection and analysis of data in a manner that aims to combine
relevance to the research purpose with economy in procedure.
1) Sampling design
2) Observational design
3) Statistical design
4) Operational design
Important features of Research Design-
Plan- specifies the sources and types of information
Strategy – which approach will be used for gathering and analysing the data
Time and costs budgets- most studies are done under these two constraints
Research Design must contain-
Research problem
Procedure and techniques to be used
Population to be studied
Methods to be used in the processing and analyzing
Scaling techniques-
Ordinal
With ordinal scales, it is the order of the values that is important and significant, but the differences
between them are not really known. In each case, we know that a #4 is better than a #3 or a #2, but we
don’t know, and cannot quantify, how much better it is. For example, is the difference between “OK” and
“Unhappy” the same as the difference between “Very Happy” and “Happy”? We can’t say. Ordinal scales
are mainly based on ranking.
Interval
Interval scales are numeric scales in which we know not only the order, but also the exact differences
between the values. The classic example of an interval scale is Celsius temperature because the
difference between each value is the same. For example, the difference between 60 and 50 degrees is a
measurable 10 degrees, as is the difference between 80 and 70 degrees. Time is another good example
of an interval scale in which the increments are known, consistent, and measurable.
Interval scales are nice because the realm of statistical analysis on these data sets opens up. For
example, central tendency can be measured by mode, median, or mean; standard deviation can also be
calculated.
Like the others, you can remember the key points of an “interval scale” pretty easily. “Interval” itself
means “space in between,” which is the important thing to remember–interval scales not only tell us
about order, but also about the value between each item. Ex- Max temp and Min temp.
Ratio
Ratio scales are the ultimate nirvana when it comes to measurement scales because they tell us about
the order, they tell us the exact value between units, AND they also have an absolute zero–which allows
for a wide range of both descriptive and inferential statistics to be applied. At the risk of repeating
myself, everything above about interval data applies to ratio scales, and in addition ratio scales have a
clear definition of zero. Good examples of ratio variables include height and weight.
Ratio scales provide a wealth of possibilities when it comes to statistical analysis. These variables can be
meaningfully added, subtracted, multiplied, divided (ratios). Central tendency can be measured by
mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation
can also be calculated from ratio scales. Ex- 1 : 2.
LIKERT SCALES
It is a summated scale with a detailed item analysis in its construction process. It is the most commonly
used scaling method because of its relatively simple construction procedure.
Likert Scales are characterized by the use of items which ask the respondent to check one point on a five
to eleven point scale, e.g. (1) strongly agree (2) agree (3) undecided (4) disagree and (5) strongly
disagree
For instance, they could rate each item on a 1-to-5 response scale where: 1 = strongly disagree, 2 =
disagree, 3 = undecided, 4 = agree, 5 = strongly agree.
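A minimal sketch of scoring such a summated scale in Python; the item responses below are hypothetical and only illustrate how the 1-to-5 codes are summed:

```python
# Map each Likert response category to its 1-to-5 score
scores = {"strongly disagree": 1, "disagree": 2, "undecided": 3,
          "agree": 4, "strongly agree": 5}

# One respondent's answers to a hypothetical three-item attitude scale
answers = ["agree", "strongly agree", "undecided"]

total = sum(scores[a] for a in answers)   # summated score: 4 + 5 + 3 = 12
mean = total / len(answers)               # average item score = 4.0
print(total, mean)                        # 12 4.0
```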
Multivariate Data Analysis refers to any statistical technique used to analyze data that arises from
more than one variable. This essentially models reality where each situation, product, or decision involves
more than a single variable. The information age has resulted in masses of data in every field. Despite the
quantum of data available, the ability to obtain a clear picture of what is going on and make intelligent
decisions is a challenge. When available information is stored in database tables containing rows and
columns, Multivariate Analysis can be used to process the information in a meaningful fashion.
Multivariate analysis methods typically used for:
Consumer and market research
Quality control and quality assurance across a range of industries such as food and beverage,
paint, pharmaceuticals, chemicals, energy, telecommunications, etc.