
JOMO KENYATTA UNIVERSITY

OF

AGRICULTURE AND TECHNOLOGY

ABA 2401 – INTRODUCTION TO RESEARCH METHODS

ORIGINS AND DEFINITION OF RESEARCH

The etymology, or linguistic origin, of the word research lies in the Middle French "recherche",
which means "to go about seeking"; that term in turn derives from the Old French "recerchier", a
compound of "re-" + "cerchier" (or "sercher"), meaning 'search'. Research is defined in several
ways depending on the field of study and the purpose of the research. Some definitions include:

(1) Research, broadly speaking, is any gathering of data, information and facts for the advancement of
knowledge. In this case, then, activities such as watching the news, reading magazines or newspapers,
chatting with neighbors and friends and so forth can be regarded as research1.

(2) Research is creative work undertaken on a systematic basis in order to increase the stock of
knowledge including knowledge of humans, culture and society, and the use of this stock of knowledge
to devise new applications (Creswell, 2008)2.

(3) Research is a process of steps used to collect and analyze information to increase our understanding
of a topic or issue. It consists of three steps: Pose a question, collect data to answer the question, and
present an answer to the question.

(4) The Merriam-Webster Online Dictionary defines research in more detail as "a studious inquiry or
examination; especially investigation or experimentation aimed at the discovery and interpretation of
facts, revision of accepted theories or laws in the light of new facts, or practical application of such
new or revised theories or laws".

PURPOSES OF RESEARCH

Purposes of research differ among various fields as well as according to the types of research carried
out within established disciplinary boundaries. There are generally two types of research: basic or pure
research and applied research.

Basic research explains the world around us and tries to understand how the universe operates. It is
about finding out what is already there without any greater purpose of research than the explanation
itself. It is a direct descendant of philosophy, where philosophers and scientists try to understand the
underlying principles of existence. Whilst offering no direct benefits, basic research often has indirect
benefits, which can contribute greatly to the advancement of humanity. For example, pure research into
the structure of the atom has led to structural glass, x-rays, nuclear power and silicon chips. The
primary purposes of basic research are documentation, discovery, interpretation or development of
methods and systems for the advancement of human knowledge.

Applied research seeks answers to specific questions that help humanity, for example medical
research, urban studies or environmental studies. Such research generally takes a specific question and
tries to find a definitive and comprehensive answer. The purpose of applied research is about testing
theories, often generated by pure science, and applying them to real life situations. Applied scientific
1 Shuttleworth, M. (2008). "Definition of Research". Available at https://explorable.com/definition-of-research (accessed on 2nd May, 2020).
2 Creswell, J. W. (2008). Educational Research: Planning, conducting, and evaluating quantitative and qualitative research (3rd ed.). Upper Saddle River: Pearson.

research can be about finding out the answer to a specific problem, such as 'Is global warming
avoidable?' or 'Does a new type of medicine really help the patients?'

From these general descriptions we can deduce that research is used to:

(1) establish or confirm facts,

(2) reaffirm the results of previous research work,

(3) solve new or existing problems,

(4) support theorems, or develop new theories,

(5) expand on past work in the field,

(6) test the validity of instruments, procedures, or experiments, and/or,

(7) replicate elements of prior projects or prior research.

CLASSIFICATIONS OF RESEARCH

Several categories of research exist in diverse fields of study today:

(1) Scientific research is a way to produce knowledge that is based on objective truth and attempts
to be universal. In other words, science is a method, a procedure for producing knowledge, i.e. discovering
universalities/principles, laws, and theories through the process of observation and re-observation.
Observation here implies that scientists use "sensory experiences" for the study of the phenomena.
They use their five senses, which are possessed by every normal human being. They not only observe a
phenomenon but also repeat the observation several times in order to verify their observations. The
researchers do so because they want to be accurate and definite about their findings. This research
provides theories for the explanation of nature and the properties of the world. It makes practical
applications possible.

(2) Research in the humanities involves different methods, of which the most salient are
hermeneutics and semiotics3. Humanities scholars usually do not search for the ultimate correct answer
to a question, but instead explore the issues and details that surround it. Context is always important,
and context can be social, historical, political, cultural, or ethnic. An example of research in the
humanities is historical research, which is embodied in historical method. Historians use primary
sources and other evidence to systematically investigate a topic, and then to write histories in the form
of accounts of the past.

(3) Artistic research, also seen as 'practice-based research', can take form when creative works are
considered both the research and the object of research itself. It is the body of thought which offers an
alternative to purely scientific methods in research in its search for knowledge and truth.

Research studies are also divided into exploratory, descriptive/constructive and experimental studies.

3 Hermeneutics refers to the theory and methodology of interpretation, especially the interpretation of philosophical texts,
biblical texts and wisdom literature. The terms "hermeneutics" and "exegesis" are sometimes used interchangeably.
Hermeneutics is a wider discipline that includes written, verbal, and non-verbal communication. Exegesis focuses primarily upon
texts. Modern hermeneutics includes both verbal and non-verbal communication as well as semiotics, presuppositions, and pre-
understandings. Hermeneutics has been broadly applied in the humanities, especially in law, history and theology. Semiotics is
the study of the construction of meaning or meaning making, the study of sign processes and meaningful communication. It
includes the study of signs and sign processes including, indication, designation, likeness, metaphor, analogy, symbolism,
signification, and communication. Semiotics is closely related to linguistics, which, for its part, studies the structure and meaning
of language more specifically. The semiotic tradition explores the study of signs and symbols as a significant part of
communications. As different from linguistics, however, semiotics also studies non-linguistic sign systems.

The goal of the research process is to produce new knowledge or deepen understanding of a topic or
issue. This process takes three main forms (although, as previously discussed, the boundaries between
them may be obscure):

 Exploratory research studies, which help to identify and define a problem or question.
 Descriptive or constructive research studies, which test theories and propose solutions to a
problem or question.
 Empirical research studies, which test the feasibility of a solution using empirical evidence.

EXPLORATORY STUDIES

Exploratory studies are a researcher’s tool for understanding an issue thoroughly before attempting to
quantify mass responses into statistically inferable data. Consider that when a researcher asks a
closed-ended question (essentially a multiple choice question) the list of answer options should be
exhaustive so as to take account of any possible answer(s) a respondent may have. Forcing respondents
to pick between options the researcher comes up with offhand is one of the leading causes of surrogate
information bias, a form of research bias. Adding an “Other, please specify: (…)” option may help pick
up answers outside those the researcher gives; however, those outlier answers probably won’t be
statistically useful and therefore defeat the purpose of using closed-ended questions. Furthermore,
without exploratory research to guide the survey design and question-building process, the research
goals may be wrong and may lead to the collection of entirely false, biased or inadequate information.

Say a researcher is creating a restaurant feedback survey with the end goal of identifying and
improving upon a given restaurant’s weaknesses. They may decide to have respondents rate their level
of happiness with the restaurant’s customer service, menu selection, and food quality. Though this list
of parameters may seem extensive, it is entirely possible for a significant portion of respondents to be
dissatisfied with other aspects, such as the restaurant’s atmosphere or location. Without any
preliminary exploratory research to identify these as issues, however, the survey would miss them
completely.

Exploratory studies, therefore, provide information that helps identify the main issues that should be
addressed in subsequent surveys, and they significantly reduce a research project’s level of bias. There
are different ways people can use exploratory research in their projects. These are covered in the
succeeding section:

Implementing exploratory research

Exploratory studies may be carried out in four ways:

(i) Focus groups: A focus group most commonly brings together 8 to 12 people fitting the description
of the target sample group, who are asked specific questions on the issues and subjects being
researched. Sometimes focus groups will also host interactive exercises during the session and request
feedback on the material presented. This depends on what is being researched, such as a food sampling
for a fast food chain or a presentation of potential advertisements for an anti-smoking campaign. Focus
groups are one of the most common uses of exploratory research, providing researchers with a solid
picture of where people stand on an issue. The open and natural discussion format of focus groups
allows for a wider variety of perspectives in a relatively short period of time, say, a day or two.

(ii) Secondary research: It is almost impossible to come up with a research topic that hasn’t been
researched before. Beyond this, when it comes to designing a survey and research plan, it is usually
best not to reinvent the wheel. All research strategies can benefit from reviewing similar studies
already conducted and learning from their results. It would be appropriate, for instance, to consider an
organization’s previous research as suggestive of the direction in which one should design the present
research goals when working for that organization. For example, if running a second annual customer
feedback survey, one ought to peruse the questions that provided the most useful information and reuse
them in the upcoming survey. External secondary research can also help perfect a research design.
Beyond reviewing other organizations’ research projects, social media platforms such as blogs and
online forums can give a sense of the prominent issues, opinions and behavior that go along with a
given research subject.

(iii) Expert surveys: Expert surveys allow researchers to gain information from specialists in a field
that the researchers are less qualified or knowledgeable in. For example, if tasked with surveying the
public’s stance and awareness on environmental issues, one could create a preliminary expert survey
for a selected group of environmental authorities. It would ask broad open-ended questions that are
designed to receive large amounts of content, providing the freedom for the experts to demonstrate
their knowledge. With their input, it becomes possible to create a survey covering all aspects of the
issues under study.

(iv) Open-ended questions: All open-ended questions in a survey are exploratory in nature. The mere
fact that respondents are allowed to provide whatever feedback they please gives a researcher the
opportunity to gain insights on topics they haven’t encountered or tackled. The responses to a few
open-ended questions in a survey with hundreds or thousands of respondents can be difficult and
time-consuming to sort through, but they can indicate important trends and opinions for further
research. For example, let’s say we own a news website and asked our visitors the open-ended
question, ‘What would you like to see improved most on our website?’ After analyzing the responses,
we identify the top three discussed areas: (1) Navigation, (2) Quality of Information, (3) Visual
Displays. We can then use these three topics as our main focus or research objectives for a new survey
that will look to statistically quantify people’s issues with the website using closed-ended questions.
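The last step in that example, tallying coded open-ended responses to find the top discussed areas, can be sketched with Python's standard library. The coded response labels below are hypothetical, standing in for themes a researcher would assign while reading the answers:

```python
from collections import Counter

# Hypothetical theme labels assigned while reading open-ended answers to
# "What would you like to see improved most on our website?"
coded_responses = [
    "navigation", "visual displays", "navigation",
    "quality of information", "navigation",
    "quality of information", "visual displays",
]

# Count how often each theme was mentioned and take the top three;
# these become the research objectives for the follow-up survey.
top_three = Counter(coded_responses).most_common(3)
for theme, count in top_three:
    print(f"{theme}: {count} mentions")
```

In practice the coding step itself (reading each free-text answer and assigning it a theme) is the time-consuming part; the tally is trivial once the responses are coded.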

DESCRIPTIVE/CONSTRUCTIVE STUDIES

A descriptive study is one in which information is collected without changing the environment (i.e.,
nothing is manipulated). Sometimes these are referred to as "correlational" or "observational"
studies. The Office of Human Research Protections (OHRP)4 defines a descriptive study as "(…) any
study that is not truly experimental.” In medical and humanities research, a descriptive study can
provide information about the naturally occurring health status, behavior, attitudes or other
characteristics of a particular group. Descriptive studies are also conducted to demonstrate associations
or relationships between things in the world around you. Descriptive studies can involve a one-time
interaction with groups of people (cross-sectional study) or a study might follow individuals over time
(longitudinal study). Descriptive studies, in which the researcher interacts with the participant, may
involve surveys5 or interviews to collect the necessary information. Descriptive studies in which the
researcher does not interact with the participant include observational studies of people in an
environment and studies involving data collection using existing records (e.g., medical record review).
Descriptive studies are usually the best methods for collecting information that will demonstrate
relationships and describe the world as it exists. These types of studies are often done before an
experiment to know what specific things to manipulate and include in an experiment. Bickman and
Rog (1998)6 suggest that descriptive studies can answer questions such as “what is” or “what was.”

Descriptive studies are used to describe the characteristics of a populace or phenomenon being studied.
They do not answer questions about how/when/why the characteristics occurred. Rather, they address
the "what" question (what are the characteristics of the population or situation being studied?). The
characteristics used to describe the situation or population usually follow some kind of categorical
scheme, also known as descriptive categories. For example, the periodic table categorizes the elements.
Scientists used knowledge about the nature of electrons, protons and neutrons to devise this categorical
scheme. We now take the periodic table for granted, yet it took descriptive research to devise it.
Descriptive research generally precedes explanatory research. For example, over time the periodic
table’s description of the elements allowed scientists to explain chemical reactions and make sound

4 This is an office within the United States Department of Health and Human Services (DHHS) that deals with oversight of
clinical research conducted by the Department, mostly through the National Institutes of Health (NIH). It provides leadership in
the protection of the rights, welfare, and wellbeing of human subjects involved in research conducted or supported by the DHHS.
It provides clarification and guidance, develops educational programs and materials, maintains regulatory oversight, and provides
advice on ethical and regulatory issues in biomedical and behavioral research.

5 Surveys are a method of collecting information by asking questions. Sometimes surveys or interviews are done face-to-face
with people at home, in school, or at work. Other times questions are sent in the mail for people to answer and mail back.
Increasingly, surveys are conducted by telephone.
6 Bickman, Leonard, and Debra J. Rog, eds. 1998. Handbook of Applied Social Research Methods. Thousand Oaks, CA: Sage Publications.

prediction when elements were combined. Descriptive research, however, cannot describe what caused
a situation. Thus, descriptive research cannot be used as the basis of a causal relationship, where one
variable affects another. In other words, descriptive research can be said to have a low requirement for
internal validity. The description is used for frequencies, averages and other statistical calculations.
Often the best approach, prior to writing up descriptive research, is to conduct a survey investigation.
Qualitative research often has the aim of description, and researchers may follow up with examinations
of why the observations exist and what the implications of the findings are.
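The frequencies-and-averages summary that descriptive work produces can be sketched with Python's standard library. The satisfaction ratings below are hypothetical, standing in for data a descriptive survey might collect:

```python
from collections import Counter
from statistics import mean, median

# Hypothetical 1-to-5 satisfaction ratings from a descriptive survey.
ratings = [4, 5, 3, 4, 2, 5, 4, 3, 4, 5]

# Frequencies: how often each rating was given.
frequencies = dict(sorted(Counter(ratings).items()))

# Averages: central-tendency summaries; note that they describe the
# sample as it exists but say nothing about what caused the ratings.
print("frequencies:", frequencies)   # {2: 1, 3: 2, 4: 4, 5: 3}
print("mean:", mean(ratings))        # 3.9
print("median:", median(ratings))    # 4.0
```

The same point made in the text applies to the output: these numbers describe "what is", not "why".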

EXPLANATORY/EXPERIMENTAL/EMPIRICAL/ANALYTICAL STUDIES

Explanatory studies or empirical studies are an attempt to connect ideas in order to understand cause
and effect so as to make sense of phenomena. They are also known as experimental and analytical
studies. An experimental study is a study in which a treatment, procedure, or program is intentionally
introduced and a result or outcome is observed. The American Heritage Dictionary of the English
Language defines an experiment as “a test under controlled conditions that is made to demonstrate a
known truth, to examine the validity of a hypothesis, or to determine the efficacy of something
previously untried.” True experiments have four elements: manipulation, control, random
assignment, and random selection. The most important of these elements are manipulation and
control.

(i) Manipulation is a controlled change introduced by the researcher, such as an alteration of the
environment, a program or a treatment. It means that the researcher purposefully changes something in
the research environment.

(ii) Control is an important element of a true experiment that prevents outside factors from influencing
the results of the study. It is used to ensure only the manipulated factors influence the study outcome.
When something is manipulated and controlled and then the outcome happens, it makes us more
confident that the manipulation “caused” the outcome. In addition, experiments involve highly
controlled and systematic procedures in an effort to minimize error and bias. This minimization of
error and bias increases our confidence that the researcher’s manipulation “caused” the outcome. Bias
refers to any event that happens during the course of a study that is not part of the research protocol7
and which alters the results. Error occurs when anything happens that interferes with making a
confident conclusion to the study.

(iii) Random assignment means that if there are groups or treatments in the experiment, participants are
assigned to these groups or treatments randomly (as with the flip of a coin). This means that no matter
who the participant is, he/she has an equal chance of getting into any of the groups or treatments in an
experiment. This process helps to ensure that the groups or treatments are similar at the beginning of
the study so that there is more confidence that the manipulation (group or treatment) “caused” the
outcome.

(iv) Random selection is a form of sampling where a representative group of research participants is
selected from a larger group by chance. This ensures that the results of the chosen sample are not
attributable to the researchers’ own personal inclinations, prejudices or subjectivities.
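Assuming a simple two-group design, random selection followed by random assignment can be sketched with Python's standard library. The population and group labels are hypothetical:

```python
import random

random.seed(42)  # fixed seed only so the sketch is reproducible

# The larger group from which participants are to be drawn.
population = [f"person_{i}" for i in range(100)]

# Random selection: a representative sample is drawn by chance, so the
# choice does not reflect the researcher's own inclinations.
sample = random.sample(population, 10)

# Random assignment: shuffle the sample and split it, giving every
# participant an equal chance of entering either condition.
random.shuffle(sample)
treatment, control = sample[:5], sample[5:]

print("treatment group:", treatment)
print("control group:", control)
```

Because both steps rely only on chance, the groups should be similar at the start of the study, which is what lets the researcher attribute any difference in outcomes to the manipulation.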

There are two major types of empirical research design: qualitative research and quantitative research.
Researchers choose qualitative or quantitative methods according to the nature of the research topic
they want to investigate and the research questions they aim to answer:

Qualitative research
This is the understanding of human behavior and the reasons that govern such behavior. It
involves asking a broad question and collecting data in the form of words, images, video, and
so forth, that is analyzed for themes. This type of research aims to investigate a question
without attempting to quantifiably measure variables or look to potential relationships
between variables. It is viewed as more restrictive in testing hypotheses because it can be
expensive and time-consuming, and typically limited to a single set of research subjects.
7 Research protocol here refers to the research plan developed by the researcher that should be followed when carrying out the
study.

Qualitative research is often used as a method of exploratory research as a basis for later
quantitative research hypotheses. Qualitative research is linked with the philosophical and
theoretical stance of social constructionism8.

Quantitative research
This is the systematic empirical investigation of physically measurable properties and
phenomena and their relationships. It entails asking a narrow question and collecting relevant
numerical data to analyze utilizing suitable statistical methods. The quantitative research
designs are experimental, correlational, and survey (or descriptive). Statistics derived from
quantitative research can be used to establish the existence of associative or causal
relationships between variables. Quantitative research is linked with the philosophical and
theoretical stance of positivism9.

The quantitative data collection methods rely on random sampling and structured data collection
instruments that fit diverse experiences into predetermined response categories. These methods
produce results that are easy to summarize, compare, and generalize. Quantitative research is
concerned with testing hypotheses derived from theory and/or being able to estimate the size of a
phenomenon of interest. Depending on the research question, participants may be randomly assigned to
different treatments (this is the only way that a quantitative study can be considered a true experiment).
If this is not feasible, the researcher may collect data on participant and situational characteristics in
order to statistically control for their influence on the dependent, or outcome, variable. If the intent is to
generalize from the research participants to a larger population, the researcher will employ probability
sampling to select participants. In either qualitative or quantitative research, the researcher(s) may
collect primary or secondary data. Primary data is data collected specifically for the research, such as
through interviews or questionnaires. Secondary data is data that already exists, such as census data,
which can be re-used for the research. It is good ethical research practice to use secondary data
wherever possible.

Mixed-method research, i.e. research that includes qualitative and quantitative elements, using both
primary and secondary data, is becoming more common because research studies are increasingly
trans-disciplinary. That is to say they cover fields across the sciences, humanities, history and the arts.

Big data10 has had a major impact on research methods: researchers now spend less effort on data
collection, and the methods for analyzing the huge amounts of readily available data have also changed.

Non-empirical research refers to an approach that is grounded in theory as opposed to using
observation and experimentation to achieve the outcome. As such, non-empirical research seeks
solutions to problems using existing knowledge as its source. This, however, does not mean that new
ideas and innovations cannot be found within the pool of existing and established knowledge. Non-
empirical research is not an absolute alternative to empirical research, because the two may be used
together to strengthen a research approach. Neither one is less effective than the other, since each has
its particular purpose in life and in science. A simple example of a non-empirical task could be the

8 Social constructionism or the social construction of reality (also social concept) is a theory in sociology that examines the
development of jointly constructed understandings of the world that form the basis for shared premises or assumptions about
reality. The theory centres on the notions that human beings rationalize their experience by creating models of the social world
and share and reify these models through language. A major focus of social constructionism is to uncover the ways in which
individuals and groups participate in the construction of their perceived social reality. It involves looking at the ways social
phenomena are created, institutionalized, known, and made into tradition by humans.

9 Positivism is a philosophical system that holds that every rationally justifiable assertion can be scientifically verified or is
capable of logical or mathematical proof, and that, therefore, rejects metaphysics and theism. It holds that positive knowledge is
based on natural phenomena and their properties and relations. Thus, information derived from sensory experience, interpreted
through reason and logic, forms the exclusive source of all authoritative knowledge. Positivism also holds that society, like the
physical world, operates according to general laws. Introspective and intuitive knowledge is rejected, as is metaphysics and
theology. Although the positivist approach has been a recurrent theme in the history of western thought its modern version was
formulated by the philosopher Auguste Comte in the early 19th century. Comte argued that, much as the physical world operates
according to gravity and other absolute laws, so does society, and further developed positivism into a Religion of Humanity.
10 In computing terms, Big Data refers to extremely large data sets that may be analyzed computationally to reveal patterns,
trends, and associations, especially relating to human behaviour and interactions.

prototyping of a new drug using a differentiated application of existing knowledge; similarly, it could
be the development of a business process in the form of a flow chart and texts where all the ingredients
are from established knowledge. Empirical research on the other hand seeks to create new knowledge
through observations and experiments in which established knowledge could either be contested or
supplemented.

SUMMARY OF RESEARCH CLASSIFICATION

Generally, research begins by exploring a new phenomenon with the help of an exploratory study.
Thereafter, a descriptive study is conducted to increase the researcher’s knowledge of that
phenomenon. Lastly, the researcher needs to explain the phenomenon. An explanatory study is then an
attempt to connect ideas to understand cause and effect, which helps researchers to explain what is
going on.

SOURCES OF KNOWLEDGE

Research is only one of several ways of "knowing." Different fields of knowledge have different ways
of knowing. For instance, knowledge in the social sciences is generated differently from knowledge in
history or mathematics. The branch of philosophy that deals with knowledge is called epistemology.
Epistemologists generally recognize at least four different sources of knowledge, based on perception,
memory, consciousness, and reason:

(1) Intuitive knowledge takes forms such as belief, faith, intuition, or personal opinion. It is based on
feelings rather than hard, cold "facts."

(2) Authoritative knowledge is based on information received from people, books, or a supreme
being. Its strength depends on the legitimacy of these sources.

(3) Logical knowledge is arrived at by reasoning from a point, say, "point A" (which is generally
accepted) to another point, say, "point B" (which is the new knowledge).

(4) Empirical knowledge is based on demonstrable, objective facts (which are determined through
observation and/or experimentation).

Research often makes use of all four of these ways of knowing: intuitive (when coming up with an
initial idea for research), authoritative (when reviewing professional or reputable literature), logical
(when reasoning from findings to conclusions), and empirical (when engaging in procedures that lead
to precise findings).

Finding information is key in research of all types. There are primary and secondary sources of
knowledge or information. For instance in the academic study of history, a primary source (also called
original source or evidence) is an artifact, a document, diary, manuscript, autobiography, a recording,
or other source of information that was created at the time under study. It serves as an original source
of information about the topic. Similar definitions are used in library science, and other areas of
scholarship, although different fields have somewhat different definitions. In journalism, a primary
source can be a person with direct knowledge of a situation, or a document written by such a person.

Primary sources are distinguished from secondary sources, which cite, comment on, or build upon
primary sources. Generally, accounts written after the fact with the benefit (and possible distortions) of
hindsight are secondary. A secondary source may also be a primary source depending on how it is
used. For example, a memoir would be considered a primary source in research concerning its author
or about his or her friends characterized within it, but the same memoir would be a secondary source if
it were used to examine the culture in which its author lived. "Primary" and "secondary" should be
understood as relative terms, with sources categorized according to specific historical contexts and
what is being studied.

RESEARCH PROCESS

Research is often conducted using the hourglass model structure of research. The hourglass model
starts with a broad spectrum for research, focusing in on the required information through the method
of the project (like the neck of the hourglass), then expands the research in the form of discussion and
results. The major steps in conducting research are:

 Identification of research problem
 Literature review
 Specification of the purpose or objectives of research
 Determination of specific research questions
 Formulation of a conceptual framework11, usually a set of hypotheses
 Choice of a methodology (for data collection)
 Data collection
 Verification of data
 Analysis and interpretation of the data
 Report preparation and evaluation of research
 Communication of the research findings and, possibly, recommendations

The steps generally represent the overall process; however, they should be viewed as an ever-changing
iterative process rather than a fixed set of steps. Most research begins with a general statement of the
problem, or rather, the purpose for engaging in the study. The literature review identifies flaws or
holes in previous research–essentially a series of discrepancies which is presented as justification for
the study. Often, a literature review is conducted in a given subject area before a research question is
identified. A gap in the current literature, as identified by a researcher, then engenders a research
question. The research question may be parallel to the hypothesis. The hypothesis is the supposition to
be tested. The researcher(s) collects data to test the hypothesis. The researcher(s) then analyzes and
interprets the data via a variety of statistical methods, engaging in what is known as empirical
research. The results of the data analysis in confirming or failing to reject the null hypothesis are then
reported and evaluated. At the end, the researcher may discuss avenues for further research. However,
some researchers advocate a flipped approach: starting by articulating the findings and a discussion of
them, then moving "up" to identify the research problem that emerges from the findings, with the
literature review introducing those findings. The flipped approach is justified by the transactional
nature of the research endeavor, in which the research inquiry, research questions, research method,
relevant research literature, and so on are not fully known until the findings have fully emerged and
been interpreted. The transactional nature of research is captured in two paradoxes: (1) Rudolph
Rummel12 says, "... no researcher should accept any one or two tests as definitive. It is only when a
range of tests are consistent over many kinds of data, researchers, and methods can one have
confidence in the results." (2) Plato in Meno13 describes an inherent difficulty, if not a paradox, of
doing research that can be paraphrased as follows: "If you know what you're searching for, why do
you search for it [i.e. you have already found it]? But, then again, if you don't know what you're
searching for, what are you searching for [i.e. you have no idea why you are searching]?"

11
A conceptual framework is a theoretical structure of assumptions, principles, and rules that holds together the ideas
comprising a broad concept. It primarily originates from the researcher and is used to clarify concepts and relationships between
concepts, provide a context for interpreting study findings, explain observations, outline possible courses of action including the
development of theory that is useful to practice or to present a preferred approach to an idea. A theoretical framework–on the
other hand–is a map for the research; a structure that supports the theory of a research study. It introduces and describes the
theory that explains why the research problem exists. It draws on previous literature to define a study's
core theory and concepts. Researchers read the existing literature on their topic of study, highlight different
definitions of the same terms and the varying methodologies used to answer key questions, then develop a consistent
definition for each concept and identify the theories upon which their study seeks to build. The framework also refutes theories that
oppose assumptions within the study. Critical analyses of the methodologies within the existing literature develop the
methodology for a new study. In this way researchers use the theoretical framework to craft a logical argument for a need for
their research.

12
Rummel, R. J. (1996). The miracle that is freedom: The solution to war, violence, genocide, and poverty. Martin Monograph
Series No. 1. Moscow, Idaho: Martin Institute for Peace Studies and Conflict Resolution, University of Idaho. Available at
http://www.hawaii.edu/powerkills/MTF.CHAP9.HTM

13
Plato, & Bluck, R. S. (1962). Meno. Cambridge, UK: University Press.

BASIC ELEMENTS OF RESEARCH

An understanding of the basic elements of research is essential for good research practices. Among the
most important elements to be considered are concepts, variables, associations, sampling, random
selection, random assignment, and blinding.

Concepts

Things we observe are observable realities, which may be physical or abstract. To identify a reality,
we give it a name; by using the name we communicate with others, and over time it becomes part of
our language. A concept is a generalized idea about a class of objects,
attributes, occurrences, or processes that has been given a name. In other words a concept is an idea
expressed as a symbol or in words. Natural science concepts are often expressed in symbolic forms.
Most social science concepts are expressed as words. Words, after all, are symbols too; they are
symbols we learn with language. Height is a concept with which we are familiar. In a sense, a language
is merely an agreement to represent ideas by sound or written characters that people learn at some point
in their lives. Learning concepts and theory is like learning language.

Concepts are an abstraction of reality


Concepts are everywhere, and are used all the time. Height is a simple concept from everyday
experience. What does it mean? It is easy to use the concept of height, but describing the concept itself
is difficult. It represents an abstract idea about physical reality, or an abstraction of reality. Height is a
characteristic of physical objects, the distance from top to bottom.  All people, buildings, trees,
mountains, books and so forth have height. The word height refers to an abstract idea. We associate its
sound and its written form with that idea. There is nothing inherent in the sounds that make up the
word and the idea it represents. The connection is arbitrary, but it is still useful. People can express the
abstract idea to one another using these symbols. In other words concepts are the abstractions of
physical reality into non-physical expressions and verbal symbols like table, leadership, productivity,
and morale are all labels given to some phenomenon (reality). A concept stands for the phenomenon,
not the phenomenon itself; hence it may be called an abstraction of empirical reality.

The degree of abstraction of concepts: Concepts vary in their level of abstraction. They are on a
continuum from the most concrete to the most abstract. Very concrete ones refer to straightforward
physical objects or familiar experiences (e.g. height, school, age, family income, or housing). More
abstract concepts refer to ideas that have a diffuse, indirect expression (e.g. family dissolution, paradox,
racism, political power). The organization of concepts in sequence from the most concrete and
individual to the most general indicates the degree of abstraction. Moving up the ladder of abstraction,
the basic concept becomes more abstract, wider in scope, and less amenable to measurement. The
scientific researcher operates at two levels: the theoretical level of concepts (and propositions) and the
empirical level of variables. At the empirical level we experience reality; that is, we observe objects or events.

Sources of concepts: Everyday culture is filled with concepts, but many of them have vague and
unclear definitions. Likewise, the values and experiences of people in a culture may limit everyday
concepts. Nevertheless, we borrow concepts from everyday culture; though these are refined by
definition and clarification of parameters of measurement or description. We create concepts from
personal experiences, creative thought, or observation. Classical theorists originated many concepts
like family system, gender role, socialization, self-worth, frustration, and displaced aggression. We also
borrow concepts from sister disciplines.

Importance of concepts: Scientific, historical, humanities, and artistic concepts form specialized
intra-disciplinary languages, or jargon. Specialists use jargon as a shorthand way to communicate with one
another. Most fields have their own jargon. Architects, physicians, lawyers, engineers, accountants,
plumbers, and auto mechanics all have specialized languages. They use their jargon to refer to the ideas
and objects with which they work. Special problems grow out of the need for concept precision and
inventiveness. Vague meanings attached to a concept create problems of measurement. Therefore, it is
not only the construction of concepts that is necessary: these constructions should be precise, and
researchers within the field should have some agreement on their meaning(s). Identification of concepts
is necessary because we use concepts in hypothesis formulation. Here too one of the characteristics of a
good hypothesis is that it should be conceptually clear. The success of research hinges on (1) how
clearly we conceptualize and (2) how well others understand the concept we use. For example, we
might ask respondents for an estimate of their family income. This may seem to be a simple,
unambiguous concept, but we may receive varying and confusing answers unless we restrict or narrow
the concept by specifying parameters such as (i) the time period in which the income is made–is it
weekly, monthly, or annually? (ii) whether it is gross or net income–is it income made before or after
income taxes? (iii) whose income–is it the income of the head of the family solely or the income of all
family members? (iv) the source of income–does it comprise salary and wage income(s) only or does it
include income from dividends, interest, and capital gains? (v) type of income–is it in cash or also in
kind, such as free rent, employee discounts, or food stamps?
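The narrowing of a concept such as family income can be made concrete by encoding each parameter explicitly. The sketch below is purely illustrative: the field names and category labels are assumptions invented for this example, not part of any standard survey instrument.

```python
from dataclasses import dataclass

# Hypothetical specification of the "family income" concept. Each field
# pins down one of the parameters (i)-(v) that would otherwise be ambiguous.
@dataclass(frozen=True)
class IncomeDefinition:
    period: str        # (i)   "weekly", "monthly", or "annual"
    basis: str         # (ii)  "gross" (before tax) or "net" (after tax)
    whose: str         # (iii) "head_of_family" or "all_members"
    sources: tuple     # (iv)  e.g. ("salary", "dividends", "interest")
    form: str          # (v)   "cash_only" or "cash_and_kind"

# Two researchers using different definitions are measuring different things,
# even though both call their variable "family income".
defn_a = IncomeDefinition("annual", "gross", "all_members",
                          ("salary", "dividends", "interest"), "cash_only")
defn_b = IncomeDefinition("monthly", "net", "head_of_family",
                          ("salary",), "cash_only")
print(defn_a == defn_b)  # False: the concept has not been held constant
```

Making every parameter an explicit field forces the researcher to state, and the reader to see, exactly which version of the concept was measured.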

Definitions

Confusion about the meaning of concepts can destroy a research study's value without the researcher or
client even knowing it. If words have different meanings to the parties involved, then they are not
communicating on the same wavelength. Definitions are one way to reduce the danger of
miscommunication.

Theoretical Definitions: Researchers must struggle with two types of definitions. In the more familiar
dictionary approach, a concept is defined with explanations and synonyms. For example, a customer is defined as
a patron; a patron, in turn, is defined as a customer or client of an establishment; a client is defined as
one who employs the services of a professional or, loosely, a patron of any shop. These circular
definitions may be adequate for general communication but not for research. Dictionary definitions are
also called conceptual or theoretical or nominal definitions. Conceptual definition is a definition in
abstract, theoretical terms. It refers to other ideas or constructs. There is no magical way to turn a
construct into precise conceptual definition. It involves thinking carefully, observing directly,
consulting with others, reading what others have said, and trying possible definitions. A single
construct can have several definitions, and people may disagree over definitions. Conceptual
definitions are linked to theoretical frameworks and to value positions. For example, a conflict theorist
may define social class as the power and property a group of people in a society has or lacks. A
structural functionalist defines it in terms of individuals who share a social status, life-style, or
subjective justification. Although people disagree over definitions, the researcher should always state
explicitly which definition he or she is using. Some constructs are highly abstract and complex. They
contain lower level concepts within them (e.g. powerlessness), which can be made even more specific
(e.g. a feeling of having little power over where one lives). Other concepts are concrete and simple (e.g.
age). When developing definitions, a researcher needs to be aware of how complex and abstract a
construct is. For example, a concrete construct such as age is easier to define (e.g. number of years that
have passed since birth) than is a complex, abstract concept such as morale.

Operational Definitions: In research we must measure concepts and variables (theoretical constructs).
This requires rigorous definitions. A concept must be made operational in order to be measured. An
operational definition gives meanings to a concept by specifying the activities or operations necessary
to measure it. An operational definition specifies what must be done to measure the concept under
investigation.  It is like a manual of instruction or a recipe: do this-and-that in such-and-such a manner.
Operational definitions are also called working definitions stated in terms of specific testing or
measurement criteria. For this to be possible, concepts must have empirical referents: we must be able
to count, dimension, quantify, or in some other way gather measurable information through our senses.
Whether the object to be defined is physical–such as a machine tool–or highly abstract–such as
achievement or motivation–the definition must specify characteristics and how they are to be observed,
measured and recorded. The specifications (for example SI units or metrics) and procedures (tools or
instruments such as meters) of observation, measurement and documentation must be clear so that any
competent person using them would observe, classify or measure the objects the same way. So
operational definitions specify concrete indicators that can be observed and measured.
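An operational definition can be thought of as a measurement procedure spelled out step by step, much like the recipe described above. As a hedged illustration (the concept, the units, and the rounding rule are all invented for this example), "hunger" might be operationalized as the number of hours elapsed since the last meal:

```python
from datetime import datetime

def hours_since_last_meal(last_meal: datetime, now: datetime) -> float:
    """Illustrative operational definition: 'hunger' is measured as the
    number of hours elapsed since the participant's last recorded meal,
    rounded to one decimal place."""
    elapsed = now - last_meal
    return round(elapsed.total_seconds() / 3600, 1)

now = datetime(2024, 1, 1, 18, 0)
print(hours_since_last_meal(datetime(2024, 1, 1, 6, 0), now))  # 12.0
```

The point is not the arithmetic but the explicitness: any competent person applying this procedure to the same records would produce the same number, which is what an operational definition requires.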

Research uses both theoretical and operational definitions

In research, it is important to communicate phenomena in systematic ways that facilitate their
understanding and clarify their observation. So, in the research process, researchers take note of or look
at an observable phenomenon. They then construct a label or several label(s) for it so as to enable its
conceptualization. They then try to define it theoretically, a process which then gives a lead to the
development of criteria for its operationalization or measurement. Finally they gather data that is
relevant for its description, assessment and interpretation in order to gain or add to knowledge about
the phenomenon.

Variables

The purpose of all research is to describe and explain variance in the world. Variance is simply the
difference; that is, variation that occurs naturally in the world or change that we create as a result of a
manipulation. Variables are names that are given to the variance we wish to explain: A variable (or
theoretical construct) is defined as anything that has a quantity or quality that varies. A variable is
either a result of some force or is itself the force that causes a change in another variable. In
experiments, these are called dependent and independent variables respectively. When a researcher
gives an active drug to one group of people and a placebo, or inactive drug, to another group of people,
the independent variable is the drug treatment. Each person's response to the active drug or placebo is
called the dependent variable. This could be many things depending upon what the drug is for: is it for
high blood pressure or muscle pain? Therefore, in experiments, a researcher manipulates an independent
variable to determine if it causes a change in the dependent variable. In a descriptive study, variables
are not manipulated. They are observed as they naturally occur and then associations between variables
are studied. In a way, all the variables in descriptive studies are dependent variables because they are
studied in relation to all the other variables that exist in the setting where the research is taking place.
However, in descriptive studies, variables are not discussed using the terms "independent" or
"dependent." Instead, the names of the variables are used when discussing the study.  For example,
there is more diabetes in people of Native American heritage than people who come from Eastern
Europe.  In a descriptive study, the researcher would examine how diabetes (a variable) is related to a
person's genetic heritage (another variable). Variables are important to understand because they are the
basic units of the information studied and interpreted in research studies. Researchers carefully analyze
and interpret the value(s) of each variable to make sense of how things relate to each other in a
descriptive study or what has happened in an experiment.

Researchers often perform experiments. Take an example of an examination of four people's ability to
throw a ball when they haven't eaten for a specific period of time: 6, 12, 18 and 24 hours. In this
experiment, the researcher will manipulate a variable to see what happens to another variable. These
are what are defined as dependent and independent variables. The dependent variable is the variable a
researcher is interested in. The changes to the dependent variable are what the researcher measures
with all their techniques. In our example, the dependent variable is the person's ability to throw a ball.
We are trying to measure the change in ball throwing as influenced by hunger. An independent
variable is a variable believed to affect the dependent variable. This is the variable that the researcher
manipulates to see if it makes the dependent variable change. In the example of hungry people
throwing a ball, our independent variable is how long it's been since they've eaten. To reiterate, the
independent variable is the thing over which the researcher has control and is manipulating. In this
experiment, the researcher is controlling the food intake of the participant. The dependent variable is
believed to be dependent on the independent variable.
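The hunger-and-ball-throwing experiment above can be sketched as data: the independent variable takes the values the researcher fixes, and the dependent variable holds what is measured at each level. The throw distances below are invented purely for illustration.

```python
# Independent variable: hours of fasting, set by the researcher.
hours_fasted = [6, 12, 18, 24]

# Dependent variable: measured throw distance in metres (made-up values).
throw_distance = {6: 31.2, 12: 28.7, 18: 25.1, 24: 22.4}

# The researcher manipulates the independent variable...
for h in hours_fasted:
    # ...and records the dependent variable at each level.
    print(f"{h:2d} h fasted -> threw {throw_distance[h]} m")
```

Laying the experiment out this way makes the asymmetry visible: the left-hand values are chosen, the right-hand values are observed.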

Associations

The term association means that two or more things are related or connected to one another like height
and weight, cholesterol level and heart failure or exercise and weight loss. Associations can be positive
or negative. Positive associations suggest that when one variable is increased, the value of another
variable increases (e.g., as height increases, so does weight; as cholesterol level increases, so does the
risk of heart failure). Negative associations mean that when a variable is increased, the value of another
variable decreases (e.g., exercise is introduced (or increased) and weight decreases). Associations can
be found in experimental or descriptive studies. Finding significant associations, either during
descriptive or experimental studies, may lead to the development of programs or treatments to remedy
a particular problem.

Sampling

Sampling is the process of choosing participants for a research study. Sampling involves choosing a
small group of participants that will represent a larger group. Sampling is used because it is difficult or
impractical to include all members of a group (e.g., all Hispanic women in the United States; all male
college athletes). However, research projects are designed to ensure that enough participants are
recruited to generate useful information that can be generalized to or representative of the group
represented.

Random selection

Random selection is a form of sampling where a representative group of research participants is
selected from a larger group by chance. This can be done by identifying all of the possible candidates
for study participation (e.g., people attending the County fair on a Tuesday) and randomly choosing a
subset to participate (e.g., selecting every 10th person who comes through the gate). This allows for
each person to have an equal chance of participating in the study. Allowing each person in the group an
equal chance to participate increases the chance that the smaller group possesses characteristics similar
to the larger group. This produces findings that are more likely to be representative of and applicable to
the larger group.  Therefore, it is extremely important to adhere to this procedure if it is included in the
research design. Ignoring or altering random selection procedures compromises the research design and
subsequent results. For example, friends or relatives may be easier or more convenient to recruit into a
research study, but selecting these individuals would not reflect a random selection of all of the
possible participants. Similarly, it would be wrong to select only individuals who may potentially
benefit from study participation rather than randomly selecting from the entire group of individuals
being studied. Ignoring random selection procedures when they are called for in the research design
reduces the quality of the information collected and decreases the usefulness of the study findings.
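Random selection can be sketched in a few lines: enumerate the whole candidate group, then let chance pick the subset. random.sample gives every candidate the same probability of inclusion; the attendee identifiers and group size are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: everyone attending the fair on a Tuesday.
candidates = [f"attendee_{i:03d}" for i in range(1, 501)]

# Each of the 500 attendees has an equal chance of entering the sample.
sample = random.sample(candidates, k=50)

print(len(sample))                      # 50
print(len(set(sample)) == len(sample))  # True: sampled without replacement
```

Recruiting friends or relatives instead would amount to replacing the random.sample call with a hand-picked list, which is exactly the compromise of the design the text warns against.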

Random assignment

Random assignment is a procedure used in experiments to create multiple study groups that include
participants with similar characteristics so that the groups are equivalent at the beginning of the study.
The procedure involves assigning individuals to an experimental treatment or program at random, or by
chance (like the flip of a coin). This means that each individual has an equal chance of being assigned
to either group. Usually in studies that involve random assignment, participants will receive a new
treatment or program, will receive nothing at all or will receive an existing treatment. When using
random assignment, neither the researcher nor the participant can choose the group to which the
participant is assigned. The benefit of using random assignment is that it “evens the playing field.” This
means that the groups will differ only in the program or treatment to which they are assigned. If both
groups are equivalent except for the program or treatment that they receive, then any change that is
observed after comparing information collected about individuals at the beginning of the study and
again at the end of the study can be attributed to the program or treatment. This way, the researcher has
more confidence that any changes that might have occurred are due to the treatment under study and
not to the characteristics of the group. A potential problem with random assignment is the temptation to
ignore the random assignment procedures. For example, it may be tempting to assign an overweight
participant to the treatment group that includes participation in a weight-loss program. Ignoring random
assignment procedures in this study limits the ability to determine whether or not the weight loss
program is effective because the groups will not be randomized. Research staff must follow random
assignment protocol, if that is part of the study design, to maintain the integrity of the research. Failure
to follow procedures used for random assignment prevents the study outcomes from being meaningful
and applicable to the groups represented.
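Random assignment amounts to a coin flip per participant, or equivalently a shuffle followed by a split. The sketch below shuffles a hypothetical participant list and halves it, so group membership is decided by chance rather than by the researcher or the participant.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical people

# Shuffle, then split: each person is equally likely to land in either group.
shuffled = participants[:]
random.shuffle(shuffled)
treatment = sorted(shuffled[:10])
control = sorted(shuffled[10:])

print(len(treatment), len(control))             # 10 10
print(set(treatment).isdisjoint(set(control)))  # True: no overlap
```

Hand-assigning the overweight participant to the weight-loss group would mean moving a name between the two lists after the shuffle, which is precisely the violation of protocol described above.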

Blinding

Blinding is a technique used to decrease bias on the part of the researcher or the participant. In some
studies, the participant is not told to which group they have been assigned. This is called single
blinding. There is another level of blinding, called double blinding, where neither the researcher nor the
participant knows which group the participant is in until this information is revealed at the end of the
study. Blinding can reduce the temptation to ignore random assignment procedures and can reduce any
expectations about the potential effectiveness of the treatment or program since group assignment
remains unknown by the participant, the researcher or both the participant and researcher. The results
are more likely to provide information about the true effect of the treatment or program being tested
when blinding is used.
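One common way to implement double blinding is to replace group labels with opaque codes whose key is held by a third party, so that neither researcher nor participant can read an assignment until the key is unsealed. The sketch below is a simplified illustration with invented identifiers, not a clinical-trial protocol.

```python
import random

random.seed(3)

participants = ["P01", "P02", "P03", "P04"]
groups = ["drug", "placebo"] * 2
random.shuffle(groups)

# The code key maps opaque kit codes to true groups. In a real trial it is
# held by an independent party and unsealed only at the end of the study.
code_key = {f"KIT-{i:03d}": g for i, g in enumerate(groups, start=1)}

# What researcher and participant see during the study: codes only.
assignment = dict(zip(participants, code_key))
print(assignment)  # e.g. {'P01': 'KIT-001', ...} -- no group names visible

# Unblinding at study end: look up the sealed key.
unblinded = {p: code_key[kit] for p, kit in assignment.items()}
print(sorted(unblinded.values()))
```

Because group names never appear in the working data, neither party can act on expectations about the treatment, which is the bias-reduction the text describes.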

Design Research Methods
Background

Social, political and economic developments of the late 19th and first half of the 20th century put into
motion modern benefits and constraints for living and working. Industrial and technological
breakthroughs associated with this period created social and economic complexities for people and
their environment. Disciplines such as architecture, urban planning, engineering and product
development began to tackle new types of problem-solving that went beyond traditional artifact
making. More informed and methodical approaches to designing were required.

From 1958 to 1963 Horst Rittel was a pioneer in articulating the relationship between science and
design, specifically the limitations of design processes based on the 19th century rational view of
science, in his courses at Ulm School of Design in Germany (Hochschule für Gestaltung - HfG Ulm:
1953–1968). Rittel proposed principles for dealing with these limitations through his seminal HfG
design methods courses: cybernetics14, operational analysis15 and communication theory16. In 1963 he
was recruited to Berkeley to teach design methods courses and helped found the Design Methods
Group (DMG) and the DMG Journal.

Design methods in England originally drew from a 1962 conference called "The Conference on
Systematic and Intuitive Methods in Engineering, Industrial Design, Architecture and
Communications." This event was organized by John Chris Jones, and Peter Slann who, with
conference invitees, were driven by concerns about how their modern industrialized world was being
manifested. Conference participants countered the craftsman model of design which was rooted in
turning raw materials through tried and true craft-based knowledge into finished products. They
believed that a single craft-based designer producing design solutions was not compatible with
addressing the evolving complexity of post-industrial societies. They stressed that designers needed to
work in cross-disciplinary teams where each participant brings his/her specific body of skills, language
and experiences to defining and solving problems in whatever context. The key benefit was to find a
method that suits a particular design situation.

Emergence of design research and design studies

In the late 1950s and early 1960s, graduates of the Ulm School of Design in Germany (Hochschule für
Gestaltung - HfG Ulm: 1953–1968) began to spread Horst Rittel's approach of design methodology
across Europe and the United States in context of their professional work and teaching what became
known as the 'Ulm Model'. Practitioners and scholars began to define an area of research that focused
on design. Three "camps" seemed to emerge to integrate the initial work in Design Methods:

 Behaviorism interpreted Design Methods as a way to describe human behaviour in
relation to the built environment. Its clinical approach tended to rely on human
behavior processes (taxonomic activities).

14
Cybernetics is a transdisciplinary approach for exploring regulatory systems, their structures, constraints, and possibilities. In
the 21st century, the term is often used in a rather loose way to imply control of any system using technology.

15
Operational analysis is a discipline that deals with the application of advanced mathematical methods to help make better
decisions. Employing techniques from mathematical sciences, such as mathematical modeling, statistical analysis, and
mathematical optimization, operations research arrives at optimal or near-optimal solutions to complex decision-making
problems.

16
Communication theory is a field of information theory and mathematics that studies the technical process of information and
the process of human communication. It deals with the development of a model of communication that assists in developing a
mathematical theory of communication. It contributed to the emergence of computer science and information technology and led
to very useful work on redundancy in language. In making 'information' 'measurable' it gave birth to the mathematical study of
'information theory.’

 Reductivism broke Design Methods down into small constituent parts. This
scientific approach tended to rely on rationalism and objectified processes such as
epistemological activities.
 Phenomenology approached design methods from an experiential standpoint (human
experience and perception).

The Design Research Society was founded in 1967 with many participants from the Conference on
Design Methods in 1962. The purpose of the Society is to promote "the study of and research into the
process of designing in all its many fields" and is an interdisciplinary group with many professions
represented, but all bound by the conviction of the benefits of design research. The Environmental
Design and Research Association is one of the best-known entities that strive to integrate designers and
social science professionals for better-built environments. EDRA was founded by Henry Sanoff in
1969. Both John Chris Jones and Christopher Alexander interacted with EDRA and other camps; both
seemed at a certain point to reject their interpretations. Jones and Alexander also questioned their
original thesis about design methods.

An interesting shift that affected design methods and design studies was the 1968 lecture from Herbert
A. Simon, the Nobel laureate (economics; 1978), who presented "The Sciences of the Artificial." He
proposed using scientific methods to explore the world of man-made things (hence artificial). He
discussed the role of analysis (observation) and synthesis (making) as a process of creating man-made
responses to the world with which people interact. Important to Simon's contribution were his notions of
"bounded rationality" and "satisficing." Simon's concept had a profound impact on the discourse in
both design methods, and the newly emerging design studies communities in two ways. It provided an
entry point for overlaying scientific ideas on design, and it also created an internal debate over whether
design could or should be expressed and practiced as a type of science, with a reduced emphasis on
intuition.

Nigel Cross, a British academic and design researcher, has been prolific in articulating the issues of
design methods and design research. The ongoing debate over what design research and design
science are was, and continues to be, articulated by Cross. His thesis is that design is not a
science, but is an area that is searching for "intellectual independence." He views the original design
methods discussions of the 1960s as a way to integrate objective and rational methods in practicing
design. Scientific method was borrowed as one framework, and the term "design science" was coined
in 1966 at the Second Conference on the Design Method focusing on a systematic approach to
practicing design. Cross defined the "science of design" as a way to create a body of work to improve
the understanding of design methods—and more importantly that design methods does not need to be a
binary choice between science and art. The following is what design research is concerned with:

 The physical embodiment of man-made things, how these things perform their jobs,
and how their users perceive and employ them.
 Construction as a human activity, how designers work, how they think, and how they
carry out design activity, and how non-designers participate in the process.
 What is achieved at the end of a purposeful design activity, how an artificial thing
appears, and what it means.
 Embodiment of configurations, forms and geometries.
 Systematic search and acquisition of knowledge related to design and design activity.

Some design research methods and areas of study include: Action Research, Design theory, Reflective
practice, Contextual Inquiry, User-Centred Design and Environmental Design.

Action research is either research initiated to solve an immediate problem or a reflective process of
progressive problem solving led by individuals working with others in teams or as part of a
"community of practice" to improve the way they address issues and solve problems. There are two
types of action research: participatory action research and practical action research. An action research
strategy's purpose is to solve a particular problem and to produce guidelines for best practice. Action
research involves actively participating in a change situation, often via an existing organization, whilst
simultaneously conducting research. Action research can also be undertaken by larger organizations or
institutions, assisted or guided by professional researchers, with the aim of improving their strategies,
practices and knowledge of the environments within which they practice. As designers and
stakeholders, researchers work with others to propose a new course of action to help their community
improve its work practices.

Design theory covers the methods, strategies, research and analysis of the term design. Design theory
underpins the concept of, and reflection upon, creative work. Design theory, as well as design, is
influenced by the particular context in which it operates. Unlike other sciences, which may
consider their subjects experimentally or empirically, design is about changing its environment and
thus also influences theory about design. The statements of design theory are
therefore not universal, but always in relation to a situation, a context, or a time.

Reflective practice is the ability to reflect on an action so as to engage in a process of continuous
learning. It involves paying critical attention to the practical values and theories that inform everyday
actions, by examining practice reflectively and reflexively. This leads to developmental insight. A key
rationale for reflective practice is that experience alone does not necessarily lead to learning; deliberate
reflection on experience is essential. Reflective practice can be an important tool in practice-based
professional learning settings where people learn from their own professional experiences, rather than
from formal learning or knowledge transfer. It may be the most important source of personal
professional development and improvement. It is also an important way to bring together theory and
practice; through reflection a person is able to see and label forms of thought and theory within the
context of his or her work. A person who reflects throughout his or her practice is not just looking back
on past actions and events, but is taking a conscious look at emotions, experiences, actions, and
responses, and using that information to add to his or her existing knowledge base and reach a higher
level of understanding.

Contextual inquiry is a user-centered design research method, part of the Contextual Design
methodology. A contextual inquiry interview is usually structured as an approximately two-hour, one-
on-one interaction in which the researcher watches the user do their normal activities and discusses
what they see with the user. A contextual inquiry may gather data from as few as 4 users (for a single,
small task) to 30 or more. Following a contextual inquiry field interview, the method defines
interpretation sessions as a way to analyze the data. In an interpretation session, 3-8 team members
gather to hear the researcher re-tell the story of the interview in order. As the interview is re-told, the
team adds individual insights and facts as notes. They also may capture representations of the user’s
activities as work models. The notes may be organized using an affinity diagram. Many teams use the
contextual data to generate in-depth personas. Contextual inquiries may be conducted to understand the
needs of a market and to scope the opportunities. They may be conducted to understand the work of
specific roles or tasks, to learn the responsibilities and structure of the role. Or they may be narrowly
focused on specific tasks, to learn the details necessary to support that task.

User-centered design or user-driven development is a framework of processes (not restricted to
interfaces or technologies) in which the needs, wants, and limitations of end users of a product, service
or process are given extensive attention at each stage of the design process. User-centered design can
be characterized as a multi-stage problem-solving process that requires designers not only to analyze
and foresee how users are likely to use a product, but also to test the validity of their assumptions about
user behavior in real-world tests with actual users at each stage of the process, from requirements and
concepts through pre-production, mid-production and post-production models, creating a circle of
proof back to the original requirements, confirming or modifying them. Such testing is necessary as it is
often very difficult for the designers of a product to understand intuitively what a first-time user of their
design experiences, and what each user's learning curve may look like. The chief difference from other
product design philosophies is that user-centered design tries to optimize the product around how users
can, want, or need to use the product, rather than forcing the users to change their behavior to
accommodate the product.

Environmental design is the process of addressing surrounding environmental parameters when
devising plans, programs, policies, buildings, or products. Environmental design may refer to the
applied arts and sciences dealing with creating the human-designed environment. These fields include
architecture, geography, urban planning, landscape architecture, and interior design. Environmental
design can also encompass interdisciplinary areas such as historical preservation and lighting design. In
terms of a larger scope, environmental design has implications for the industrial design of products:
innovative automobiles, wind-electricity generators, solar-electric equipment, and other kinds of
equipment could serve as examples. Currently, the term has expanded to apply to ecological and
sustainability issues.

Other fields related to design research include design management research, professional design
practice research, human-computer interface research, and software development research.

Current state of design methods


Design methods today are multifarious and versatile. This is because design methodology is essentially
a conversation about everything that could be made to happen in design. Design practitioners and
researchers have argued that the language of this conversation must bridge the logical gap between past
and future, but in doing so it should not limit the variety of possible futures that are discussed nor
should it force the choice of a future that is not free. The focus of enhancements to design methods,
therefore, has been on developing a series of relevant, sound, humanistic problem-solving procedures
and techniques to reduce avoidable errors and oversights that can adversely affect design solutions. The
key benefit is to find a method that suits a particular design situation.

The benefits of the original work in design research have been abstracted many times over, and in
today's design environment several of its main ideas have been integrated into contemporary design
methods:

 Emphasis on the user of design.
 Emphasis on the environmental impacts of design and design-related activities.
 Use of basic research methods to validate convictions with fact.
 Use of brainstorming and other related means to break mental patterns and precedent.
 Increased collaborative nature of design with other disciplines; transdisciplinarity.

A large challenge for design as a discipline, for its use of methods and for any endeavor to create
shared values, is its inherently synthetic nature as an area of study and action. This allows design to be
extremely malleable, borrowing ideas and concepts from a wide variety of professions to suit the ends
of individual practitioners. It also makes design vulnerable, since this very malleability keeps design
from cohering into an extensible, shared body of knowledge.

In 1983, Donald Schon of the Massachusetts Institute of Technology published The Reflective
Practitioner. He saw traditional professions with stable knowledge bases, such as law and medicine,
becoming unstable due to outdated notions of 'technical rationality' as the grounding of professional
knowledge. Practitioners, he observed, 'think on their feet' in practice rather than simply applying a
standard set of frameworks and techniques. Schon foresaw the increasing instability of traditional
knowledge bases and proposed reflective practice as a response. This is in line with the original
founders of design methods, who wanted to break with an unimaginative and static technical society
and unify exploration, collaboration and intuition.

Design research and the development of research methods have influenced design practice and design
education. These advances have benefited the design community by helping to create connections that
would never have happened if traditional professions had remained stable, since such stability did not
necessarily allow collaboration, owing to gatekeeping (or protection of turf) around areas of knowledge
and expertise. Design has by nature been an interloper activity, with individuals crossing disciplines to
question and innovate. The challenge is to transform individual experiences, frameworks and
perspectives into a shared, understandable and, most importantly, transmittable area of knowledge.
This may prove difficult because:

 Domain knowledge is a mixture of vocation (discipline) and avocation (interest),
creating hybrid definitions that degrade shared knowledge.
 Intellectual capital of design and wider scholarly pluralism have diluted focus and
shared language, which has led to ungovernable laissez-faire values.
 Individual explorations of design discourse focus too much on individual narratives,
leading to personal points of view rather than a critical mass of shared values.

In the end, design methods is a widely used term. Though open to interpretation, it denotes a shared
belief in exploratory yet rigorous methods for solving problems through design, an act which is part
and parcel of what designers aim to accomplish in today's complex world.

STATISTICS IN RESEARCH
Statistics is the science of collecting, analyzing and making inferences from data. Statistics is a
particularly useful branch of mathematics that is not only studied theoretically by advanced
mathematicians but one that is used by researchers in many fields to organize, analyze, and summarize
data.  Statistical methods and analyses are often used to communicate research findings and to support
hypotheses and give credibility to research methodology and conclusions.  It is important for
researchers and also consumers of research to understand statistics so that they can be
informed, evaluate the credibility and usefulness of information, and make appropriate
decisions.   Statistical analysis is used to determine if a scientific experiment produced a result that
supported a hypothesis or not.

Some purposes of statistics are to help us understand and describe phenomena in our world and to help
us draw reliable conclusions about those phenomena.  Questions answered by statistics may include:
What are some conclusions you might draw from your study?  What hypotheses can be made?  Who
do you think would be interested in information like this? What are the implications for future
research?  What information supports those implications?

Limitations of Statistics
It is easy to misinterpret statistics and present deceptive analysis. One outlier in an experiment, for
example, can skew results away from the true mean. In addition, bias can be introduced into
public surveys by asking questions in an inappropriate manner. So-called "push polls" can mislead the
media and the public to believe something that is not supported by fair polls. Additionally, scientists
are often accused of misinterpreting data. For example, a large sample is likely to contain one or more
anomalous results: scientists who do not state a clear hypothesis before running tests and conducting
statistical analysis may conclude that one such small anomaly is evidence for a particular hypothesis
even though the result is merely noise in the data. The important limitations of statistics are the
following:

1. Qualitative information is often considered less useful: Statistical methods do not study the nature
of phenomena which cannot be expressed in quantitative terms. Phenomena such as health, wealth,
intelligence, moods and perception require conversion of qualitative data into quantitative
data. Scientific experiments undertaken to measure complex human and social characteristics require
conversion of concepts and constructs into quantitative data. Today statistical data are ubiquitous and
widely applied in the analysis of complex information.

2. It does not deal with individual items: It is clear that statistics means aggregates of facts that
are evaluated or analyzed in relation to each other. Statistics deals only with aggregates of facts or
items and does not recognize individual items. Thus, individual items such as the death of 6 persons in
an accident, or 85% of the results of a class of a school in a particular year, will not amount to
statistical data unless they are placed in a group of similar items.

3. It does not depict the entirety of certain phenomena: Phenomena occur due to many causes.
Cases exist where these causes cannot be expressed in terms of statistical data. So it may not be
possible to reach useful or correct conclusions regarding the phenomena by using statistical methods.
Development of individuals comprising a community, for example, depends upon many social factors
such as parents’ economic means, education, culture, region, social organization and so forth. But some
of these factors cannot be represented in concrete terms or through statistical data: they may be
metaphysical, visceral or otherwise qualitative by nature. So, by using statistical means, researchers
may inappropriately reduce those factors and analyze them quantitatively rather than qualitatively. In
such cases the results or conclusions may not be correct, because many non-quantifiable aspects of the
information are ignored.

4. Statistics are liable to being misconstrued or misused: A shortcoming of statistical methods is
that “they do not bear on their face the label of their quality.” The only proof of quality is the checking
of the data and procedures to see how they reflect in the conclusions of a study. It is possible for
inexperienced persons to collect statistical data or for researchers to be dishonest or inherently biased.
Unscrupulous persons may easily misuse statistics. So data must be handled and used with great care.
Otherwise results may prove to be false.

5. Laws are not exact: Two fundamental laws affect statistics and statistical methods including the
decisions made from them:

(i) the law of inertia of large numbers17, and (ii) the law of statistical regularity18.

These laws remind us that statistics deal only in probabilities, so statistical results and findings cannot
be considered as certain as scientific laws. It is only on the basis of probability or interpolation, for
instance, that we can estimate the production of maize, rice or wheat in Kenya in 2023; we cannot
claim that such estimates will hold with 100% certainty in reality. Approximations are not absolute
truth.
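The law of large numbers itself can be illustrated with a small simulation (a sketch using simulated dice rolls, not real data): the sample mean approaches the theoretical mean of 3.5 as the sample grows, yet any finite estimate remains only an approximation.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Larger samples drift towards the theoretical mean of 3.5,
# but no finite sample guarantees it exactly.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 3))
```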

6. Results are true only on average: As discussed in (5) above, results are usually interpolated, using
methods such as time series, regression or probability, and they are not the absolute truth. For example,
if the averages of two sets of students are statistically similar, it does not mean that all students in set A
got exactly the same scores as those in set B. There may be high variation between the two sets of
students' scores. Statistics largely deals with averages, and these averages may be made up of
individual items radically different from each other.
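This limitation can be illustrated with hypothetical exam scores: two sets with identical averages but radically different spreads.

```python
set_a = [60, 61, 59, 60, 60]   # hypothetical scores, tightly clustered
set_b = [20, 100, 40, 80, 60]  # hypothetical scores, widely spread

def mean(xs):
    return sum(xs) / len(xs)

def std_dev(xs):
    """Population standard deviation."""
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Both sets average 60, yet the individual items differ radically.
print(mean(set_a), mean(set_b))                            # 60.0 60.0
print(round(std_dev(set_a), 2), round(std_dev(set_b), 2))  # 0.63 28.28
```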

7. Variant statistical methods exist with which to study problems: In research it is possible to use
various methods to assess a single variable or a set of variables. In statistics, variation, for example,
can be found and expressed in several different ways, such as quartile deviation, mean deviation or
standard deviation, and the results vary in each case. It must not be assumed that statistical analysis is
the only method that is useful in research, nor should statistical methods be considered the best
approach for tackling all research problems.
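For example, three common dispersion measures applied to the same (hypothetical) data give three different answers. The helper functions below are illustrative sketches, using simple positional quartiles:

```python
data = sorted([2, 5, 6, 8, 10, 13, 19])  # hypothetical observations

def mean(xs):
    return sum(xs) / len(xs)

def mean_deviation(xs):
    """Average absolute deviation from the mean."""
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def std_dev(xs):
    """Population standard deviation."""
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def quartile_deviation(xs):
    """Half the interquartile range, using simple positional quartiles
    for a sorted, odd-length sample (an illustrative simplification)."""
    n = len(xs)
    return (xs[(3 * n) // 4] - xs[n // 4]) / 2

# Three different answers for the "spread" of the same data.
print(quartile_deviation(data))        # 4.0
print(round(mean_deviation(data), 2))  # 4.29
print(round(std_dev(data), 2))         # 5.24
```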

8. Statistical results are not always beyond doubt: Statistics deal only with measurable aspects of
things and, therefore, may seldom give holistic solutions to real-life problems. They provide a basis for
judgment but do not offer, per se, conclusive solutions to complicated matters or complex decisions.
Although various useful laws and formulae may be applied in statistics, the results achieved are not
final and conclusive. Because they are unable to give complete solutions to research problems, the
results must be put in proper perspective and used prudently.

9. Statistics cannot be applied to heterogeneous data: Statistics require the expression of data in the
same standardized terms so that they can be analyzed by singular methods.
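A minimal sketch of this point, using hypothetical crop weights recorded in mixed units: the figures must be converted to one standard unit before any aggregate statistic is meaningful.

```python
# Hypothetical crop weights recorded in mixed units.
raw = [("maize", 2.0, "kg"), ("rice", 500, "g"), ("wheat", 1.5, "kg")]

TO_KG = {"kg": 1.0, "g": 0.001}  # conversion factors to one standard unit

# Convert everything to kilograms before aggregating.
weights_kg = [qty * TO_KG[unit] for _, qty, unit in raw]
print(round(sum(weights_kg) / len(weights_kg), 3))  # 1.333
```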

10. Statistical methods require expert analysis: Only persons who have expert knowledge of
statistics can handle statistical data efficiently and correctly. This means that decisions made on the
basis of statistics are not democratic or representative of the views of all members of a society,
regardless of their level of education.

11. Statistical methods are prone to error: Errors are possible and frequently occur in statistical
analysis. Inferential statistics in particular involves certain errors, and in some cases we may not even
know whether an error has been committed.

17. The law of inertia of large numbers, or the law of large numbers, is the theorem in statistics that, as the
number of identically distributed, randomly generated variables increases, their sample mean (average)
approaches their theoretical mean.
18. Statistical regularity is a notion in statistics and probability theory that random events exhibit regularity
when repeated enough times, or that enough sufficiently similar random events exhibit regularity. It is an
umbrella term that covers the law of large numbers, all central limit theorems and ergodic theorems.

Levels of Measurement
The level of measurement refers to the relationship among the values that are assigned to the attributes
for a variable. What does that mean? Begin with the idea of the variable, in this example "party
affiliation." That variable has a number of attributes. Let's assume that in this particular election
context the only relevant attributes are "republican", "democrat", and "independent". For purposes of
analyzing the results of this variable, we arbitrarily assign the values 1, 2 and 3 to the three attributes.
The level of measurement describes the relationship among these three values. In this case, we simply
are using the numbers as shorter placeholders for the lengthier text terms. We don't assume that higher
values mean "more" of something and lower numbers signify "less". We don't assume the value of 2
means that democrats are twice something that republicans are. We don't assume that republicans are
in first place or have the highest priority just because they have the value of 1. In this case, we only use
the values as a shorter name for the attribute. Here, we would describe the level of measurement as
"nominal".

Why are Levels of Measurement Important?


First, knowing the level of measurement helps you decide how to interpret the data from that variable.
When you know that a measure is nominal (like the one just described), then you know that the
numerical values are just short codes for the longer names. Second, knowing the level of measurement
helps you decide what statistical analysis is appropriate on the values that were assigned. If a measure
is nominal, then you know that you would never average the data values or do a t-test on the data.

There are typically four levels of measurement that are defined:

 Nominal
 Ordinal
 Interval
 Ratio

In nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the
cases is implied. For example, jersey numbers in basketball are measures at the nominal level. A player
with number 30 is not more of anything than a player with number 15, and is certainly not twice
whatever number 15 is.

In ordinal measurement the attributes can be rank-ordered, but the distances between attributes do not
have any meaning. For example, on a survey you might code Educational Attainment as 0=less than
high school; 1=some high school; 2=high school degree; 3=some college; 4=college degree; 5=post
college. In this measure, higher numbers mean more education. But is the distance from 0 to 1 the same
as from 3 to 4? Of course not. The interval between values is not interpretable in an ordinal measure.

In interval measurement the distance between attributes does have meaning. For example, when we
measure temperature (in Fahrenheit), the distance from 30-40 is the same as the distance from 70-80.
The interval between values is interpretable. Because of this, it makes sense to compute an average of
an interval variable, where it doesn't make sense to do so for ordinal scales. But note that in interval
measurement ratios don't make any sense - 80 degrees is not twice as hot as 40 degrees (although the
attribute value is twice as large).

Finally, in ratio measurement there is always an absolute zero that is meaningful. This means that you
can construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied
social research most "count" variables are ratio, for example, the number of clients in past six months.
Why? Because you can have zero clients and because it is meaningful to say that "...we had twice as
many clients in the past six months as we did in the previous six months."

It's important to recognize that there is a hierarchy implied in the level of measurement idea. At lower
levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less
sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below
it and adds something new. In general, it is desirable to have a higher level of measurement (e.g.,
interval or ratio) rather than a lower one (nominal or ordinal).
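The four levels can be illustrated with a short sketch. The data below are hypothetical, and the point is only which summary operations each level supports:

```python
from statistics import mean, median, mode

party = [1, 2, 2, 3, 1, 2]      # nominal: 1=republican, 2=democrat, 3=independent
education = [0, 2, 3, 3, 4, 5]  # ordinal: 0=less than high school ... 5=post college
temp_f = [30, 40, 70, 80]       # interval: differences meaningful, ratios not
clients = [0, 4, 8]             # ratio: true zero, so ratios are meaningful

print(mode(party))        # 2 -- the mode is the only sensible "average" for nominal codes
print(median(education))  # 3.0 -- ordinal data can be ranked, so a median works
print(mean(temp_f))       # interval data supports a mean...
print(temp_f[3] / temp_f[0])    # ...but this ratio (80/30) is NOT meaningful
print(clients[2] / clients[1])  # 2.0 -- "twice as many clients" IS meaningful
```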

Scaling
Scaling is the branch of measurement that involves the construction of an instrument that associates
qualitative constructs with quantitative metric units. Scaling evolved out of efforts in psychology and
education to measure "immeasurable" constructs like authoritarianism and self-esteem. In many ways,
scaling remains one of the most arcane and misunderstood aspects of social research measurement.
And, it attempts to do one of the most difficult of research tasks -- measure abstract concepts.

Most people don't even understand what scaling is. The basic idea of scaling is described in General
Issues in Scaling, including the important distinction between a scale and a response format. Scales are
generally divided into two broad categories: uni-dimensional and multidimensional. The uni-
dimensional scaling methods were developed in the first half of the twentieth century and are generally
named after their inventor. We'll look at three types of uni-dimensional scaling methods here:

 Thurstone or Equal-Appearing Interval Scaling
 Likert or "Summative" Scaling
 Guttman or "Cumulative" Scaling

In the late 1950s and early 1960s, measurement theorists developed more advanced techniques for
creating multidimensional scales. Although these techniques are not considered here, you may want to
look at the method of concept mapping that relies on that approach to see the power of these
multivariate methods.

General Issues in Scaling
S.S. Stevens came up with perhaps the simplest and most straightforward definition of scaling. He said:

“Scaling is the assignment of objects to numbers according to a rule.”

But what does that mean? In most scaling, the objects are text statements, usually statements of
attitude or belief. The figure shows an example: three statements describing attitudes towards
immigration. To scale these statements, we have to assign numbers to them. Usually, we would like the
result to be on at least an interval scale, as indicated by the ruler in the figure. And what does
"according to a rule" mean? If you look at the statements, you can see that as you read down, the
attitude towards immigration becomes more restrictive -- if a person agrees with a statement on the
list, it's likely that they will also agree with all of the statements higher on the list. In this case, the
"rule" is a cumulative one. So what is scaling? It's how we get numbers that can be meaningfully
assigned to objects -- it's a set of procedures. We'll present several different approaches below.
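The cumulative rule just described can be sketched in code. The agree/disagree responses below are hypothetical, and `fits_cumulative_rule` is an illustrative helper, not part of any scaling library:

```python
def fits_cumulative_rule(responses):
    """responses[i] is 1 (agree) or 0 (disagree) for statement i, with
    statements ordered from least to most restrictive (top of list first).
    Under a cumulative rule, agreeing with any statement implies agreeing
    with every statement above it on the list."""
    agreed = [i for i, r in enumerate(responses) if r == 1]
    if not agreed:
        return True  # agreeing with nothing trivially fits the rule
    return all(r == 1 for r in responses[: max(agreed) + 1])

print(fits_cumulative_rule([1, 1, 0]))  # True: agreement stops partway down the list
print(fits_cumulative_rule([1, 0, 1]))  # False: agrees lower down but skips a statement above
```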

But first, it is important to clear up one pet peeve: people often confuse the idea of a scale and a
response scale. A response scale is the way you collect responses from people on an instrument. You
might use a dichotomous response scale like Agree/Disagree, True/False, or Yes/No. Or, you might use
an interval response scale like a 1-to-5 or 1-to-7 rating. But if all you are doing is attaching a response
scale to an object or statement, you can't call that scaling. As you will see, scaling involves procedures
that you do independent of the respondent so that you can come up with a numerical value for the
object. In true scaling research, you use a scaling procedure to develop your instrument (scale) and you
also use a response scale to collect the responses from participants. But just assigning a 1-to-5 response
scale to an item is not scaling! The differences are illustrated in the table below.

Scale                                      Response Scale
results from a process                     is used to collect the response for an item
each item on scale has a scale value       item not associated with a scale value
refers to a set of items                   used for a single item

Purposes of Scaling

Why do we do scaling? Why not just create text statements or questions and use response formats to
collect the answers? First, sometimes we do scaling to test a hypothesis. We might want to know
whether the construct or concept is a single dimensional or multidimensional one (more about
dimensionality later). Sometimes, we do scaling as part of exploratory research. We want to know what
dimensions underlie a set of ratings. For instance, if you create a set of questions, you can use scaling
to determine how well they "hang together" and whether they measure one concept or multiple
concepts. But probably the most common reason for doing scaling is for scoring purposes. When a
participant gives their responses to a set of items, we often would like to assign a single number that
represents that person's overall attitude or belief. For the figure above, we would like to be able to
give a single number that describes a person's attitude towards immigration, for example.
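Scoring can be sketched minimally. The ratings below are hypothetical; the single number is produced here by simple summation, as in the Likert or "summative" approach mentioned above:

```python
responses = [4, 5, 3, 4]  # hypothetical 1-to-5 ratings on four attitude items
score = sum(responses)    # one number standing for the person's overall attitude
print(score)              # 16
```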

Dimensionality
A scale can have any number of dimensions in it. Most scales that we develop have only a few
dimensions. What's a dimension? Think of a dimension as a number line. If we want to measure a
construct, we have to decide whether the construct can be measured well with one number line or
whether it may need more. For instance, height is a concept that is uni-dimensional or one-dimensional.
We can measure the concept of height very well with only a single number line (e.g., a ruler). Weight is
also uni-dimensional -- we can measure it with a scale. Thirst might also be considered a uni-
dimensional concept -- you are either more or less thirsty at any given time. It's easy to see that height
and weight are uni-dimensional. But what about a concept like self-esteem? If you think you can
measure a person's self-esteem well with a single ruler that goes from low to high, then you probably
have a uni-dimensional construct.

What would a two-dimensional concept be? Many models of intelligence or achievement postulate
two major dimensions -- mathematical and verbal ability. In this type of two-dimensional model, a
person can be said to possess two types of achievement. Some people will be high in verbal skills and
lower in math. For others, it will be the reverse. But, if a concept is truly two-dimensional, it is not
possible to depict a person's level on it using only a single number line. In other words, in order to
describe achievement you would need to locate a person as a point in two-dimensional (x,y) space.

OK, let's push this one step further: how about a three-dimensional concept? Psychologists who study
the idea of meaning theorized that the meaning of a term could be well described in three dimensions.
Put in other terms, any object can be distinguished or differentiated from others along three
dimensions. They labeled these three dimensions activity, evaluation, and potency, and they called this
general theory of meaning the semantic differential. Their theory essentially states that you can rate
any object along those three dimensions. For instance, think of the idea of "ballet." If you like the
ballet, you would probably rate it high on activity, favorable on evaluation, and powerful on potency.
On the other hand, think about the concept of a "book" like a novel. You might rate it low on activity
(it's passive), favorable on evaluation (assuming you like it), and about average on potency. Now,
think of the idea of "going to the dentist." Most people would rate it low on activity (it's a passive
activity), unfavorable on evaluation, and powerless on potency (there are few routine activities that
make you feel as powerless!). The theorists who came up with the idea of the semantic differential
thought that the meaning of any concept could be described well by rating the concept on these three
dimensions. In other words, in order to describe the meaning of an object you have to locate it as a dot
somewhere within the cube (three-dimensional space).

Uni-dimensional or Multi-dimensional?
What are the advantages of using a uni-dimensional model? Uni-dimensional concepts are generally
easier to understand. You have either more or less of it, and that's all. You're either taller or shorter,
heavier or lighter. It's also important to understand what a uni-dimensional scale is as a foundation for
comprehending the more complex multidimensional concepts. But the best reason to use uni-
dimensional scaling is that you believe the concept you are measuring really is uni-dimensional in
reality. As you've seen, many familiar concepts (height, weight, temperature) are actually uni-
dimensional. But, if the concept you are studying is in fact multidimensional in nature, a uni-
dimensional scale or number line won't describe it well. If you try to measure academic achievement on
a single dimension, you would place every person on a single line ranging from low to high achievers.
But how do you score someone who is a high math achiever and terrible verbally, or vice versa? A uni-
dimensional scale can't capture that type of achievement.

Major Uni-dimensional Scale Types

There are three major types of uni-dimensional scaling methods. They are similar in that they each
measure the concept of interest on a number line. But they differ considerably in how they arrive at
scale values for different items. The three methods are Thurstone or Equal-Appearing Interval Scaling,
Likert or "Summative" Scaling, and Guttman or "Cumulative" Scaling.

Thurstone Scaling
Thurstone was one of the first and most productive scaling theorists. He actually invented three
different methods for developing a uni-dimensional scale: the method of equal-appearing intervals;
the method of successive intervals; and, the method of paired comparisons. The three methods
differed in how the scale values for items were constructed, but in all three cases, the resulting scale
was rated the same way by respondents. To illustrate Thurstone's approach, I'll show you the easiest
method of the three to implement, the method of equal-appearing intervals.

The Method of Equal-Appearing Intervals

Developing the Focus. The Method of Equal-Appearing Intervals starts like almost every other scaling
method -- with a large set of statements. Oops! I did it again! You can't start with the set of statements
-- you have to first define the focus for the scale you're trying to develop. Let this be a warning to all of
you: methodologists like me often start our descriptions with the first objective methodological step (in
this case, developing a set of statements) and forget to mention critical foundational issues like the
development of the focus for a project. So, let's try this again...

The Method of Equal-Appearing Intervals starts like almost every other scaling method -- with the
development of the focus for the scaling project. Because this is a unidimensional scaling method, we
assume that the concept you are trying to scale is reasonably thought of as one-dimensional. The
description of this concept should be as clear as possible so that the person(s) who are going to create
the statements have a clear idea of what you are trying to measure. I like to state the focus for a scaling
project in the form of a command -- the command you will give to the people who will create the
statements. For instance, you might start with the focus command:

Generate statements that describe specific attitudes that people might have towards persons with
AIDS.

You want to be sure that everyone who is generating statements has some idea of what you are after in
this focus command. You especially want to be sure that technical language and acronyms are spelled
out and understood (e.g., what is AIDS?).

Generating Potential Scale Items. Now, you're ready to create statements. You want a large set of
candidate statements (e.g., 80 -- 100) because you are going to select your final scale items from this
pool. You also want to be sure that all of the statements are worded similarly -- that they don't differ in
grammar or structure. For instance, you might want them each to be worded as a statement which you
could agree or disagree with. You don't want some of them to be statements while others are questions.

For our example focus on developing an AIDS attitude scale, we might generate statements like the
following (these statements came from a class exercise done in a 1997 undergrad class):

- people get AIDS by engaging in immoral behavior
- you can get AIDS from toilet seats
- AIDS is the wrath of God
- anybody with AIDS is either gay or a junkie
- AIDS is an epidemic that affects us all
- people with AIDS are bad
- people with AIDS are real people
- AIDS is a cure, not a disease
- you can get AIDS from heterosexual sex
- people with AIDS are like my parents
- you can get AIDS from public toilets
- women don't get AIDS
- I treat everyone the same, regardless of whether or not they have AIDS
- AIDS costs the public too much
- AIDS is something the other guy gets
- living with AIDS is impossible
- children cannot catch AIDS
- AIDS is a death sentence
- because AIDS is preventable, we should focus our resources on prevention instead of curing
- People who contract AIDS deserve it
- AIDS doesn't have a preference, anyone can get it.
- AIDS is the worst thing that could happen to you.
- AIDS is good because it will help control the population.
- If you have AIDS, you can still live a normal life.
- People with AIDS do not need or deserve our help
- By the time I would get sick from AIDS, there will be a cure
- AIDS will never happen to me
- you can't get AIDS from oral sex
- AIDS is spread the same way colds are
- AIDS does not discriminate
- You can get AIDS from kissing
- AIDS is spread through the air
- Condoms will always prevent the spread of AIDS
- People with AIDS deserve what they got
- If you get AIDS you will die within a year
- Bad people get AIDS and since I am a good person I will never get AIDS
- I don't care if I get AIDS because researchers will soon find a cure for it.
- AIDS distracts from other diseases that deserve our attention more
- bringing AIDS into my family would be the worst thing I could do
- very few people have AIDS, so it's unlikely that I'll ever come into contact with a sufferer
- if my brother caught AIDS I'd never talk to him again
- People with AIDS deserve our understanding, but not necessarily special treatment
- AIDS is an omnipresent, ruthless killer that lurks around dark alleys, silently waiting for naive victims to wander past so that it might pounce.
- I can't get AIDS if I'm in a monogamous relationship
- the nation's blood supply is safe
- universal precautions are infallible
- people with AIDS should be quarantined to protect the rest of society
- because I don't live in a big city, the threat of AIDS is very small
- I know enough about the spread of the disease that I would have no problem working in a health care setting with patients with AIDS
- the AIDS virus will not ever affect me
- Everyone affected with AIDS deserves it due to their lifestyle
- Someone with AIDS could be just like me
- People infected with AIDS did not have safe sex
- Aids affects us all.
- People with AIDS should be treated just like everybody else.
- AIDS is a disease that anyone can get if they are not careful.
- It's easy to get AIDS.
- The likelihood of contracting AIDS is very low.
- The AIDS quilt is an emotional reminder to remember those who did not deserve to die painfully or in vain
- The number of individuals with AIDS in Hollywood is higher than the general public thinks
- It is not the AIDS virus that kills people, it is complications from other illnesses (because the immune system isn't functioning) that cause death
- AIDS is becoming more a problem for heterosexual women and their offspring than IV drug users or homosexuals
- A cure for AIDS is on the horizon
- Mandatory HIV testing should be established for all pregnant women

Rating the Scale Items. OK, so now you have a set of statements. The next step is to have your
participants (i.e., judges) rate each statement on a 1-to-11 scale in terms of how much each statement
indicates a favorable attitude towards people with AIDS. Pay close attention here! You DON'T want
the participants to tell you what their attitudes towards AIDS are, or whether they would agree with the
statements. You want them to rate the "favorableness" of each statement in terms of an attitude towards
AIDS, where 1 = "extremely unfavorable attitude towards people with AIDS" and 11 = "extremely
favorable attitude towards people with AIDS." (Note that I could just as easily have had the judges rate
how much each statement represents a negative attitude towards AIDS. If I had, the scale I developed
would have higher scale values for people with more negative attitudes.)

Computing Scale Score Values for Each Item. The next step is to analyze the rating data. For each
statement, you need to compute the Median and the Interquartile Range. The median is the value above
and below which 50% of the ratings fall. The first quartile (Q1) is the value below which 25% of the
cases fall and above which 75% of the cases fall -- in other words, the 25th percentile. The median is
the 50th percentile. The third quartile, Q3, is the 75th percentile. The Interquartile Range is the
difference between third and first quartile, or Q3 - Q1. The figure above shows a histogram for a single
item and indicates the median and Interquartile Range. You can compute these values easily with any
introductory statistics program or with most spreadsheet programs. To facilitate the final selection of
items for your scale, you might want to sort the table of medians and Interquartile Range in ascending
order by Median and, within that, in descending order by Interquartile Range. For the items in this
example, we got a table like the following:

Statement Number Median Q1 Q3 Interquartile Range
23 1 1 2.5 1.5
8 1 1 2 1
12 1 1 2 1
34 1 1 2 1
39 1 1 2 1
54 1 1 2 1
56 1 1 2 1
57 1 1 2 1
18 1 1 1 0
25 1 1 1 0
51 1 1 1 0
27 2 1 5 4
45 2 1 4 3
16 2 1 3.5 2.5
42 2 1 3.5 2.5
24 2 1 3 2
44 2 2 4 2
36 2 1 2.5 1.5
43 2 1 2.5 1.5
33 3 1 5 4
48 3 1 5 4
20 3 1.5 5 3.5
28 3 1.5 5 3.5
31 3 1.5 5 3.5
19 3 1 4 3
22 3 1 4 3
37 3 1 4 3
41 3 2 5 3
6 3 1.5 4 2.5
21 3 1.5 4 2.5
32 3 2 4.5 2.5
9 3 2 3.5 1.5
1 4 3 7 4
26 4 1 5 4
47 4 1 5 4
30 4 1.5 5 3.5
13 4 2 5 3
11 4 2 4.5 2.5
15 4 3 5 2
40 5 4.5 8 3.5
2 5 4 6.5 2.5
14 5 4 6 2
17 5.5 4 8 4
49 6 5 9.75 4.75
50 8 5.5 11 5.5
35 8 6.25 10 3.75
29 9 5.5 11 5.5
38 9 5.5 10.5 5
3 9 6 10 4
55 9 7 11 4
10 10 6 10.5 4.5
7 10 7.5 11 3.5
46 10 8 11 3
5 10 8.5 11 2.5
53 11 9.5 11 1.5
4 11 10 11 1
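You can reproduce these statistics with a few lines of code. Here is a minimal sketch using Python's standard library; the judge ratings below are hypothetical, and the function and variable names are mine, not part of Thurstone's method:

```python
from statistics import quantiles

def item_stats(ratings):
    """Median, quartiles, and interquartile range for one item's 1-to-11 judge ratings."""
    q1, med, q3 = quantiles(ratings, n=4)  # the three quartile cut points
    return {"median": med, "q1": q1, "q3": q3, "iqr": q3 - q1}

# Hypothetical ratings from nine judges for two candidate statements
judge_ratings = {
    8:  [1, 1, 1, 1, 2, 2, 2, 3, 1],
    27: [1, 1, 2, 2, 2, 4, 5, 6, 2],
}

# Sort ascending by median and, within that, descending by interquartile range,
# to ease the final selection of items
stats = {item: item_stats(r) for item, r in judge_ratings.items()}
for item in sorted(stats, key=lambda i: (stats[i]["median"], -stats[i]["iqr"])):
    s = stats[item]
    print(item, s["median"], s["iqr"])
```

Note that different quartile conventions (and different statistics packages) can give slightly different Q1 and Q3 values for small samples.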

Selecting the Final Scale Items. Now, you have to select the final statements for your scale. You
should select statements that are at equal intervals across the range of medians. In our example, we
might select one statement for each of the eleven median values. Within each value, you should try to
select the statement that has the smallest Interquartile Range. This is the statement with the least
amount of variability across judges. You don't want the statistical analysis to be the only deciding
factor here. Look over the candidate statements at each level and select the statement that makes the
most sense. If you find that the best statistical choice is a confusing statement, select the next best
choice.

When we went through our statements, we came up with the following set of items for our scale:

- People with AIDS are like my parents (6)
- Because AIDS is preventable, we should focus our resources on prevention instead of curing (5)
- People with AIDS deserve what they got. (1)
- Aids affects us all (10)
- People with AIDS should be treated just like everybody else. (11)
- AIDS will never happen to me. (3)
- It's easy to get AIDS (5)
- AIDS doesn't have a preference, anyone can get it (9)
- AIDS is a disease that anyone can get if they are not careful (9)
- If you have AIDS, you can still lead a normal life (8)
- AIDS is good because it helps control the population. (2)
- I can't get AIDS if I'm in a monogamous relationship. (4)

The value in parentheses after each statement is its scale value. Items with higher scale values should,
in general, indicate a more favorable attitude towards people with AIDS. Notice that we have randomly
scrambled the order of the statements with respect to scale values. Also, notice that we do not have an
item with scale value of 7 and that we have two with values of 5 and of 9 (one of these pairs will
average out to a 7).

Administering the Scale. You now have a scale -- a yardstick you can use for measuring attitudes
towards people with AIDS. You can give it to a participant and ask them to agree or disagree with each
statement. To get that person's total scale score, you average the scale scores of all the items that person
agreed with. For instance, let's say a respondent completed the scale as follows:

People with AIDS are like my parents.  Agree  Disagree
Because AIDS is preventable, we should focus our resources on prevention instead of curing.  Agree  Disagree
People with AIDS deserve what they got.  Agree  Disagree
Aids affects us all.  Agree  Disagree
People with AIDS should be treated just like everybody else.  Agree  Disagree
AIDS will never happen to me.  Agree  Disagree
It's easy to get AIDS.  Agree  Disagree
AIDS doesn't have a preference, anyone can get it.  Agree  Disagree
AIDS is a disease that anyone can get if they are not careful.  Agree  Disagree
If you have AIDS, you can still lead a normal life.  Agree  Disagree
AIDS is good because it helps control the population.  Agree  Disagree
I can't get AIDS if I'm in a monogamous relationship.  Agree  Disagree

(In the original, check marks next to Agree or Disagree showed this respondent's answer to each item.)

If you're following along with the example, you should see that the respondent checked eight items as
Agree. When we take the average scale values for these eight items, we get a final value for this
respondent of 7.75. This is where this particular respondent would fall on our "yardstick" that measures
attitudes towards persons with AIDS. Now, let's look at the responses for another individual:

(The second respondent completed the same twelve-item Agree/Disagree form; the original showed their check marks.)

In this example, the respondent only checked four items, all of which are on the negative end of the
scale. When we average the scale items for the statements with which the respondent agreed we get an
average score of 2.5, considerably lower or more negative in attitude than the first respondent.
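The scoring rule is simple enough to sketch in a few lines. The scale values below are the ones from our example scale; the item keys and the agreed-item set are my own shorthand:

```python
# Scale values from the example AIDS attitude scale (item -> scale value)
scale_values = {
    "parents": 6, "prevention": 5, "deserve": 1, "affects_all": 10,
    "treated_same": 11, "never_me": 3, "easy_to_get": 5, "no_preference": 9,
    "anyone_careless": 9, "normal_life": 8, "population": 2, "monogamous": 4,
}

def thurstone_score(agreed_items):
    """Respondent's score = mean scale value of the items they agreed with."""
    values = [scale_values[item] for item in agreed_items]
    return sum(values) / len(values)

# The second respondent agreed with the four most negative items
print(thurstone_score(["deserve", "population", "never_me", "monogamous"]))  # 2.5
```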

The Other Thurstone Methods

The other Thurstone scaling methods are similar to the Method of Equal-Appearing Intervals. All of
them begin by focusing on a concept that is assumed to be unidimensional and involve generating a
large set of potential scale items. All of them result in a scale consisting of relatively few items which
the respondent rates on Agree/Disagree basis. The major differences are in how the data from the
judges is collected. For instance, the method of paired comparisons requires each judge to make a
judgement about each pair of statements. With lots of statements, this can become very time consuming
indeed. With 57 statements in the original set, there are 1,596 unique pairs of statements that would
have to be compared! Clearly, the paired comparison method would be too time consuming when there
are lots of statements initially.
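The pair count quoted above is just the number of unordered pairs, n(n-1)/2, which you can verify quickly:

```python
from math import comb

def n_pairs(n_statements):
    # Number of unique unordered pairs a judge would have to compare
    return comb(n_statements, 2)

print(n_pairs(57))  # 1596
```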

Likert Scaling
Like Thurstone or Guttman Scaling, Likert Scaling is a unidimensional scaling method. Here, I'll
explain the basic steps in developing a Likert or "Summative" scale.

Defining the Focus. As in all scaling methods, the first step is to define what it is you are trying to
measure. Because this is a uni-dimensional scaling method, it is assumed that the concept you want to
measure is one-dimensional in nature. You might operationalize the definition as an instruction to the
people who are going to create or generate the initial set of candidate items for your scale.

Generating the Items. Next, you have to create the set of potential scale items. These should be items
that can be rated on a 1-to-5 or 1-to-7 Disagree-Agree response scale. Sometimes you can create the
items by yourself based on your intimate understanding of the subject matter. But, more often than not,
it's helpful to engage a number of people in the item creation step. For instance, you might use some
form of brainstorming to create the items. It's desirable to have as large a set of potential items as
possible at this stage; about 80-100 would be best.

Rating the Items. The next step is to have a group of judges rate the items. Usually you would use a
1-to-5 rating scale where:

1 = strongly unfavorable to the concept
2 = somewhat unfavorable to the concept
3 = undecided
4 = somewhat favorable to the concept
5 = strongly favorable to the concept

Notice that, as in other scaling methods, the judges are not telling you what they believe -- they are
judging how favorable each item is with respect to the construct of interest.

Selecting the Items. The next step is to compute the intercorrelations between all pairs of items, based
on the ratings of the judges. In making judgments about which items to retain for the final scale there
are several analyses you can do:

- Throw out any items that have a low correlation with the total (summed) score across all items.

In most statistics packages it is relatively easy to compute this type of Item-Total correlation.
First, you create a new variable which is the sum of all of the individual items for each
respondent. Then, you include this variable in the correlation matrix computation (if you
include it as the last variable in the list, the resulting Item-Total correlations will all be in the
last line of the correlation matrix and will be easy to spot). How low should the correlation be
for you to throw out the item? There is no fixed rule here -- you might eliminate all items with
a correlation with the total score less than .6, for example.

The item-total correlation is a correlation between the question score (e.g., 0 or 1 for multiple choice)
and the overall assessment score (e.g., 67%). It is expected that participants who get a question correct
should, in general, have higher overall assessment scores than participants who get it wrong. Similarly,
with essay-type question scoring, where a question could be scored between 0 and 5, participants who
did a really good job on the essay (got a 4 or 5) should have higher overall assessment scores (maybe
85-90%). This relationship is shown in an example graph below.

This relationship is called 'discrimination', referring to how well a question differentiates between
participants who know the material and those who do not. Participants who mastered the material
should get high scores on questions and high overall assessment scores; participants who did not
should get low scores on both. This is the relationship that an item-total correlation captures, and it
helps evaluate the performance of questions. We want lots of highly discriminating questions on our
tests because they are the most fine-tuned measurements of what participants know and can do. When
looking at an item-total correlation, negative values are a major red flag: it is unexpected for
participants who get low scores on a question to get high scores on the assessment. This could indicate
a mis-keyed question, or that the question was highly ambiguous and confusing to participants. Values
for an item-total correlation (point-biserial) between 0 and 0.19 may indicate that the question is not
discriminating well, values between 0.2 and 0.39 indicate good discrimination, and values 0.4 and
above indicate very good discrimination.

- For each item, get the average rating for the top quarter of judges and the bottom quarter.
Then, do a t-test of the differences between the mean value for the item for the top and bottom
quarter judges.

Higher t-values mean that there is a greater difference between the highest and lowest judges.
In more practical terms, items with higher t-values are better discriminators, so you want to
keep these items. In the end, you will have to use your judgment about which items are most
sensibly retained. You want a relatively small number of items on your final scale (e.g., 10-
15) and you want them to have high Item-Total correlations and high discrimination (e.g.,
high t-values).
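A minimal sketch of that t-test follows. The text does not say how the quarters are formed, so I assume judges are ranked by their total score; the data are invented:

```python
from statistics import mean, variance

def discrimination_t(item_ratings, totals):
    """Welch-style t-statistic comparing an item's ratings for the top and
    bottom quarter of judges, ranking judges by total score (an assumption).
    Needs at least two judges per quarter."""
    order = sorted(range(len(totals)), key=lambda j: totals[j])
    q = len(totals) // 4
    bottom = [item_ratings[j] for j in order[:q]]   # lowest-scoring judges
    top = [item_ratings[j] for j in order[-q:]]     # highest-scoring judges
    se2 = variance(top) / len(top) + variance(bottom) / len(bottom)
    return (mean(top) - mean(bottom)) / se2 ** 0.5

# Hypothetical: eight judges' ratings of one item, and their total scores
item = [5, 5, 4, 4, 3, 2, 2, 1]
totals = [40, 38, 35, 33, 30, 25, 22, 20]
print(discrimination_t(item, totals))  # 7.0
```

A statistics package's independent-samples t-test (e.g., with unequal variances) would give the same kind of result.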

Administering the Scale. You're now ready to use your Likert scale. Each respondent is asked to rate
each item on some response scale. For instance, they could rate each item on a 1-to-5 response scale
where:

1 = strongly disagree
2 = disagree
3 = undecided
4 = agree
5 = strongly agree

There are a variety of possible response scales (1-to-7, 1-to-9, 0-to-4). All of these odd-numbered
scales have a middle value that is often labeled Neutral or Undecided. It is also possible to use a
forced-choice response scale with an even number of responses and no middle neutral or undecided
choice. In this situation, the respondent is forced to decide whether they lean more towards the agree or
disagree end of the scale for each item.

The final score for the respondent on the scale is the sum of their ratings for all of the items (this is why
this is sometimes called a "summated" scale). On some scales, you will have items that are reversed in
meaning from the overall direction of the scale. These are called reversal items. You will need to
reverse the response value for each of these items before summing for the total. That is, if the
respondent gave a 1, you make it a 5; if they gave a 2 you make it a 4; 3 = 3; 4 = 2; and, 5 = 1.
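The reversal-and-summing rule can be sketched directly (the items and ratings below are hypothetical):

```python
def likert_total(ratings, reversal_items, scale_max=5):
    """Sum a respondent's 1-to-scale_max ratings, flipping reversal items
    (on a 1-to-5 scale: 1 becomes 5, 2 becomes 4, and so on)."""
    total = 0
    for item, rating in ratings.items():
        if item in reversal_items:
            rating = scale_max + 1 - rating
        total += rating
    return total

# Hypothetical three-item scale where item "b" is reversed in meaning
print(likert_total({"a": 4, "b": 2, "c": 5}, reversal_items={"b"}))  # 4 + 4 + 5 = 13
```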

Example: The Employment Self Esteem Scale

Here's an example of a ten-item Likert Scale that attempts to estimate the level of self esteem a person
has on the job. Notice that this instrument has no center or neutral point -- the respondent has to declare
whether he/she is in agreement or disagreement with the item.

INSTRUCTIONS: Please rate how strongly you agree or disagree with each of the following
statements by placing a check mark in the appropriate box.

(Response options for each item: Strongly Disagree / Somewhat Disagree / Somewhat Agree /
Strongly Agree)

1. I feel good about my work on the job.
2. On the whole, I get along well with others at work.
3. I am proud of my ability to cope with difficulties at work.
4. When I feel uncomfortable at work, I know how to handle it.
5. I can tell that other people at work are glad to have me there.
6. I know I'll be able to cope with work for as long as I want.
7. I am proud of my relationship with my supervisor at work.
8. I am confident that I can handle my job without constant assistance.
9. I feel like I make a useful contribution at work.
10. I can tell that my co-workers respect me.

Guttman Scaling
Guttman scaling is also sometimes known as cumulative scaling or scalogram analysis. The purpose
of Guttman scaling is to establish a one-dimensional continuum for a concept you wish to measure.
What does that mean? Essentially, we would like a set of items or statements so that a respondent who
agrees with any specific question in the list will also agree with all previous questions. Put more
formally, we would like to be able to predict item responses perfectly knowing only the total score for
the respondent. For example, imagine a ten-item cumulative scale. If the respondent scores a four, it
should mean that he/she agreed with the first four statements. If the respondent scores an eight, it
should mean they agreed with the first eight. The object is to find a set of items that perfectly matches
this pattern. In practice, we would seldom expect to find this cumulative pattern perfectly. So, we use
scalogram analysis to examine how closely a set of items corresponds with this idea of cumulativeness.
Here, I'll explain how we develop a Guttman scale.

Define the Focus. As in all of the scaling methods, we begin by defining the focus for our scale. Let's
imagine that you wish to develop a cumulative scale that measures U.S. citizen attitudes towards
immigration. You would want to be sure to specify in your definition whether you are talking about
any type of immigration (legal and illegal) from anywhere (Europe, Asia, Latin and South America,
Africa).

Develop the Items. Next, as in all scaling methods, you would develop a large set of items that reflect
the concept. You might do this yourself or you might engage a knowledgeable group to help. Let's say
you came up with the following statements:

- I would permit a child of mine to marry an immigrant.
- I believe that this country should allow more immigrants in.
- I would be comfortable if a new immigrant moved next door to me.
- I would be comfortable with new immigrants moving into my community.
- It would be fine with me if new immigrants moved onto my block.
- I would be comfortable if my child dated a new immigrant.

Of course, we would want to come up with many more statements (about 80-100 would be desirable).

Rate the Items. Next, we would want to have a group of judges rate the statements or items in terms of
how favorable they are to the concept of immigration. They would give a Yes if the item was favorable
toward immigration and a No if it is not. Notice that we are not asking the judges whether they
personally agree with the statement. Instead, we're asking them to make a judgment about how the
statement is related to the construct of interest.

Develop the Cumulative Scale. The key to Guttman scaling is in the analysis. We construct a matrix or
table that shows the responses of all the respondents on all of the items. We then sort this matrix so
that respondents who agree with more statements are listed at the top and those agreeing with fewer
are at the bottom. For respondents with the same number of agreements, we sort the statements from
left to right from those most agreed with to those least agreed with. We might get a table something
like the figure. Notice that the scale is very nearly cumulative when you read from left to right across
the columns (items). Specifically, if someone agreed with Item 7, they always agreed with Item 2.
And, if someone agreed with Item 5, they always agreed with Items 7 and 2. The matrix shows that the
cumulativeness of the scale is not perfect, however. While in general a person agreeing with Item 3
tended to also agree with 5, 7 and 2, there are several exceptions to that rule.

While we can examine the matrix if there are only a few items in it, if there are lots of items, we need
to use a data analysis called scalogram analysis to determine the subsets of items from our pool that
best approximate the cumulative property. Then, we review these items and select our final scale
elements. There are several statistical techniques for examining the table to find a cumulative scale.
Because there is seldom a perfectly cumulative scale we usually have to test how good it is. These
statistics also estimate a scale score value for each item. This scale score is used in the final calculation
of a respondent's score.
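The sorting step, and a simple check of how cumulative a set of items is, can be sketched as follows. This is not a full scalogram analysis, just the core idea: predict each response from the respondent's total score alone and count the prediction errors. The response matrix is hypothetical:

```python
def sort_matrix(matrix):
    """Sort respondents (rows) by number of agreements, descending, and
    items (columns) by how many respondents agreed with them, descending."""
    item_totals = [sum(col) for col in zip(*matrix)]
    order = sorted(range(len(item_totals)), key=lambda i: -item_totals[i])
    rows = sorted(matrix, key=lambda r: -sum(r))
    return [[row[i] for i in order] for row in rows]

def reproducibility(matrix):
    """Fraction of responses predicted correctly from each respondent's
    total score alone, assuming a perfectly cumulative scale."""
    errors = 0
    cells = 0
    for row in sort_matrix(matrix):
        score = sum(row)
        for i, response in enumerate(row):
            predicted = 1 if i < score else 0  # agree with the first `score` items
            errors += (response != predicted)
            cells += 1
    return 1 - errors / cells

# Hypothetical 1/0 (agree/disagree) responses: four respondents, four items
m = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 0], [1, 0, 0, 0]]
print(reproducibility(m))  # 1.0 -- this small set happens to be perfectly cumulative
```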

Administering the Scale. Once you've selected the final scale items, it's relatively simple to administer
the scale. You simply present the items and ask the respondent to check items with which they agree.
For our hypothetical immigration scale, the items might be listed in cumulative order as:

- I believe that this country should allow more immigrants in.
- I would be comfortable with new immigrants moving into my community.
- It would be fine with me if new immigrants moved onto my block.
- I would be comfortable if a new immigrant moved next door to me.
- I would be comfortable if my child dated a new immigrant.
- I would permit a child of mine to marry an immigrant.

Of course, when we give the items to the respondent, we would probably want to mix up the order.
Our final scale might look like:

INSTRUCTIONS: Place a check next to each statement you agree with.

_____ I would permit a child of mine to marry an immigrant.

_____ I believe that this country should allow more immigrants in.

_____ I would be comfortable if a new immigrant moved next door to me.

_____ I would be comfortable with new immigrants moving into my community.

_____ It would be fine with me if new immigrants moved onto my block.

_____ I would be comfortable if my child dated a new immigrant.

Each scale item has a scale value associated with it (obtained from the scalogram analysis). To
compute a respondent's scale score we simply sum the scale values of every item they agree with. In
our example, their final value should be an indication of their attitude towards immigration.
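The summing rule can be sketched in a few lines. The scale values below are hypothetical stand-ins for what a scalogram analysis would produce, and the item keys are my own shorthand:

```python
# Hypothetical scale values from a scalogram analysis (item -> scale value)
scale_values = {
    "allow_more_in": 1, "into_community": 2, "onto_block": 3,
    "next_door": 4, "child_dated": 5, "child_married": 6,
}

def guttman_score(agreed_items):
    """Respondent's score = sum of the scale values of the items agreed with."""
    return sum(scale_values[item] for item in agreed_items)

# A respondent agreeing with the three mildest statements
print(guttman_score(["allow_more_in", "into_community", "onto_block"]))  # 1 + 2 + 3 = 6
```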

Types of Variable
It is important to understand variables before delving further into statistics. All experiments examine
some kind of variable(s). A variable is not only something that we measure, but also something that we
can manipulate and something we can control for. To understand the characteristics of variables and
how we use them in research, this topic is divided into three main sections. First, we illustrate the role
of dependent and independent variables. Second, we discuss the difference between experimental and
non-experimental research. Finally, we explain how variables can be characterised as either
discrete/categorical or continuous.

Dependent and Independent Variables

An independent variable, sometimes called an experimental or predictor variable, is a variable that is
being manipulated in an experiment in order to observe the effect on a dependent variable, sometimes
called an outcome variable.

Imagine that a tutor asks 100 students to complete a maths test. The tutor wants to know why some
students perform better than others. Whilst the tutor does not know the answer to this, she thinks that it
might be because of two reasons: (1) some students spend more time revising for their test; and (2)
some students are naturally more intelligent than others. As such, the tutor decides to investigate the
effect of revision time and intelligence on the test performance of the 100 students. The dependent and
independent variables for the study are:

Dependent Variable: Test Mark (measured from 0 to 100)

Independent Variables: Revision time (measured in hours); Intelligence (measured using IQ score)

The dependent variable is simply that: a variable that is dependent on the independent variable(s). For
example, in our case the test mark that a student achieves is dependent on revision time and
intelligence. Whilst revision time and intelligence (the independent variables) may (or may not) cause
a change in the test mark (the dependent variable), the reverse is implausible; in other words, whilst
more hours of revision and a higher IQ score may (or may not) lead to a change in a student's test
mark, a change in a student's test mark has no bearing on whether a student revises more or is more
intelligent (this simply doesn't make sense).

Therefore, the aim of the tutor's investigation is to examine whether these independent variables -
revision time and IQ - result in a change in the dependent variable, the students' test scores. However, it
is also worth noting that whilst this is the main aim of the experiment, the tutor may also be interested
to know if the independent variables - revision time and IQ - are also connected in some way.

In the section on experimental and non-experimental research that follows, we find out a little more
about the nature of independent and dependent variables.

Experimental and Non-Experimental Research

- Experimental research: In experimental research, the aim is to manipulate an independent
variable(s) and then examine the effect that this change has on a dependent variable(s). Since
it is possible to manipulate the independent variable(s), experimental research has the
advantage of enabling a researcher to identify a cause and effect between variables. For
example, take our example of 100 students completing a maths exam where the dependent
variable was the exam mark (measured from 0 to 100), and the independent variables were
revision time (measured in hours) and intelligence (measured using IQ score). Here, it would
be possible to use an experimental design and manipulate the revision time of the students.
The tutor could divide the students into two groups, each made up of 50 students. In "group
one", the tutor could ask the students not to do any revision. Alternately, "group two" could be
asked to do 20 hours of revision in the two weeks prior to the test. The tutor could then
compare the marks that the students achieved.
- Non-experimental research: In non-experimental research, the researcher does not
manipulate the independent variable(s). This is not to say that it is impossible to do so, but it
will either be impractical or unethical to do so. For example, a researcher may be interested in
the effect of illegal, recreational drug use (the independent variable(s)) on certain types of
behaviour (the dependent variable(s)). However, whilst possible, it would be unethical to ask
individuals to take illegal drugs in order to study what effect this had on certain behaviours.
As such, a researcher could ask both drug and non-drug users to complete a questionnaire that
had been constructed to indicate the extent to which they exhibited certain behaviours. Whilst
it is not possible to identify the cause and effect between the variables, we can still examine
the association or relationship between them. In addition to understanding the difference
between dependent and independent variables, and experimental and non-experimental
research, it is also important to understand the different characteristics amongst variables. This
is discussed next.
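The tutor's experimental design described earlier - splitting the students into a no-revision group and a 20-hour revision group, then comparing marks - can be sketched as a simple calculation. The marks below are hypothetical, invented purely for illustration (and only eight per group, to keep the example short):

```python
from statistics import mean

# Hypothetical exam marks (0-100) for the two revision groups
no_revision = [48, 52, 55, 61, 45, 58, 50, 53]    # "group one": no revision
with_revision = [62, 70, 68, 75, 59, 72, 66, 71]  # "group two": 20 hours of revision

# Compare the average mark achieved by each group
diff = mean(with_revision) - mean(no_revision)
print(f"Mean (no revision):   {mean(no_revision):.1f}")
print(f"Mean (with revision): {mean(with_revision):.1f}")
print(f"Difference:           {diff:.1f}")
```

In a real analysis, the tutor would follow this comparison of means with an inferential test to judge whether the observed difference could plausibly have arisen by chance.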

Discrete/Categorical and Continuous Variables

Categorical variables are also known as discrete or qualitative variables. Categorical variables can be
further categorized as either nominal, ordinal or dichotomous.

 Nominal variables are variables that have two or more categories, but which do not have an
intrinsic order. For example, a real estate agent could classify their types of property into
distinct categories such as houses, condos, co-ops or bungalows. So "type of property" is a
nominal variable with 4 categories called houses, condos, co-ops and bungalows. Of note, the
different categories of a nominal variable can also be referred to as groups or levels of the
nominal variable. Another example of a nominal variable would be classifying where people
live in the USA by state. In this case there will be many more levels of the nominal variable
(50 in fact).
 Dichotomous variables are nominal variables which have only two categories or levels. For
example, if we were looking at gender, we would most probably categorize somebody as
either "male" or "female". This is an example of a dichotomous variable (and also a nominal
variable). Another example might be if we asked a person if they owned a mobile phone.
Here, we may categorise mobile phone ownership as either "Yes" or "No". In the real estate
agent example, if type of property had been classified as either residential or commercial then
"type of property" would be a dichotomous variable.
 Ordinal variables are variables that have two or more categories just like nominal variables
only the categories can also be ordered or ranked. So if you asked someone if they liked the
policies of the Democratic Party and they could answer either "Not very much", "They are
OK" or "Yes, a lot" then you have an ordinal variable. Why? Because you have 3 categories,
namely "Not very much", "They are OK" and "Yes, a lot" and you can rank them from the
most positive (Yes, a lot), to the middle response (They are OK), to the least positive (Not
very much). However, whilst we can rank the levels, we cannot place a "value" to them; we
cannot say that "They are OK" is twice as positive as "Not very much" for example.

Continuous variables are also known as quantitative variables. Continuous variables can be further
categorized as either interval or ratio variables.

 Interval variables are variables for which their central characteristic is that they can be
measured along a continuum and they have a numerical value (for example, temperature
measured in degrees Celsius or Fahrenheit). So the difference between 20°C and 30°C is the
same as between 30°C and 40°C. However, temperature measured in degrees Celsius or Fahrenheit is not
a ratio variable.
 Ratio variables are interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature measured in degrees
Celsius or Fahrenheit is not a ratio variable because 0°C does not mean there is no temperature.
However, temperature measured in Kelvin is a ratio variable, as 0 Kelvin (often called absolute
zero) indicates that there is no temperature whatsoever. Other examples of ratio variables
include height, mass, distance and many more. The name "ratio" reflects the fact that you can
use the ratio of measurements. So, for example, a distance of ten metres is twice the distance
of 5 metres.

Ambiguities in classifying a type of variable

In some cases, the measurement scale for data is ordinal, but the variable is treated as continuous. For
example, a Likert scale that contains five values - strongly agree, agree, neither agree nor disagree,
disagree, and strongly disagree - is ordinal. However, where a Likert scale contains seven or more
values - strongly agree, moderately agree, agree, neither agree nor disagree, disagree, moderately
disagree, and strongly disagree - the underlying scale is sometimes treated as continuous (although
whether you should do this is a matter of considerable dispute).

It is worth noting that how we categorise variables is, to some extent, a choice. Whilst we categorised
gender as a dichotomous variable (male or female), social scientists may disagree with this, arguing
that gender is a more complex variable involving more than two distinctions, including categories such
as genderqueer, intersex and transgender. At the same time, some researchers would argue that a Likert
scale, even with seven values, should never be treated as a continuous variable.

Types of Statistics: Descriptive and Inferential Statistics


When analysing data, such as the marks achieved by 100 students for a piece of coursework, it is
possible to use both descriptive and inferential statistics in your analysis of their marks. Typically, in
most research conducted on groups of people, you will use both descriptive and inferential statistics to
analyse your results and draw conclusions. So what are descriptive and inferential statistics? And what
are their differences?

Descriptive Statistics
Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize
data in a meaningful way such that, for example, patterns might emerge from the data. Descriptive
statistics do not, however, allow us to make conclusions beyond the data we have analysed or reach
conclusions regarding any hypotheses we might have made. They are simply a way to describe our
data.

Descriptive statistics are very important because if we simply presented our raw data it would be hard
to visualize what the data was showing, especially if there was a lot of it. Descriptive statistics therefore
enable us to present the data in a more meaningful way, which allows simpler interpretation of the
data. For example, if we had the results of 100 pieces of students' coursework, we may be interested in
the overall performance of those students. We would also be interested in the distribution or spread of
the marks. Descriptive statistics allow us to do this. How to properly describe data through statistics
and graphs is an important topic covered in subsequent parts of this section. Typically, there are two
general types of statistic that are used to describe data:

 Measures of central tendency: these are ways of describing the central position of a
frequency distribution for a group of data. In this case, the frequency distribution is simply the
distribution and pattern of marks scored by the 100 students from the lowest to the highest.
We can describe this central position using a number of statistics, including the mode, median,
and mean.
 Measures of spread or dispersion: these are ways of summarizing a group of data by
describing how the scores are distributed. For example, the mean score of our 100 students
may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their
scores will be spread out: some will be lower and others higher. Measures of spread/dispersion
help us to summarize how these scores are distributed. To describe this spread, a number of
statistics are available to us, including the range, quartiles, absolute deviation, variance and
standard deviation.

When we use descriptive statistics it is useful to summarize our group of data using a combination of
tabulated description (i.e., tables), graphical description (i.e., graphs and charts) and statistical
commentary (i.e., a discussion of the results).
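Both families of descriptive statistics can be computed directly with Python's standard statistics module. The marks below are a small hypothetical set used purely for illustration:

```python
from statistics import mean, median, stdev

marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]  # hypothetical marks

# Measures of central tendency
print("mean:  ", mean(marks))              # 59
print("median:", median(marks))            # 56

# Measures of spread/dispersion
print("range: ", max(marks) - min(marks))  # 92 - 14 = 78
print("stdev: ", round(stdev(marks), 2))   # sample standard deviation, about 23.78
```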

Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the
central position within that set of data. As such, measures of central tendency are sometimes called
measures of central location. They are also classed as summary statistics. The mean (often called the
average) is most likely the measure of central tendency that you are most familiar with, but there are
others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions,
some measures of central tendency become more appropriate to use than others. In the following
sections, we will look at the mean, mode and median, and learn how to calculate them and under what
conditions they are most appropriate to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well-known measure of central tendency. It can be used
with both discrete and continuous data, although its use is most often with continuous data. The mean
is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if
we have n values in a data set and they have values x1, x2, ..., xn, the sample mean, usually denoted by
x̄ (pronounced "x bar"), is:

x̄ = (x1 + x2 + ... + xn) / n

This formula is usually written in a slightly different manner using the Greek capital letter Σ,
pronounced "sigma", which means "sum of...":

x̄ = (Σx) / n

You may have noticed that the above formula refers to the sample mean. So, why have we called it a
sample mean? This is because, in statistics, samples and populations have very different meanings and
these differences are very important, even if, in the case of the mean, they are calculated in the same
way. To acknowledge that we are calculating the population mean and not the sample mean, we use the
Greek lower case letter "mu", denoted as µ, and the population size N:

µ = (Σx) / N

The mean is essentially a model of your data set: a single value chosen to summarize the whole set. You
will notice, however, that the mean is often not one of the actual values that you have observed in your
data set. However, one of its important properties is that it minimizes error in the prediction of any one
value in your data set. That is, it is the value that produces the lowest amount of error from all other
values in the data set. An important property of the mean is that it includes every value in your data set
as part of the calculation. In addition, the mean is the only measure of central tendency where the sum
of the deviations of each value from the mean is always zero.
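This last property - that the deviations from the mean always sum to zero - is easy to verify directly. The data here is an arbitrary illustration:

```python
from statistics import mean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary illustrative values
x_bar = mean(data)               # sample mean = 5

# Deviation of each value from the mean
deviations = [x - x_bar for x in data]

# The deviations always sum to zero
print(sum(deviations))  # 0
```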

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These
are values that are unusual compared to the rest of the data set by being especially small or large in
numerical value. For example, consider the wages of staff at a factory below:

Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean
value might not be the best way to accurately reflect the typical salary of a worker, as most workers
have salaries in the $12k to 18k range. The mean salary is skewed by the two large salaries. Therefore,
in this situation, we would like to have a better measure of central tendency. As we will find out later,
taking the median would be a better measure of central tendency in this situation.
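Using the salary figures from the table above (in $k), the pull of the two outliers on the mean, and the median's resistance to them, can be checked directly:

```python
from statistics import mean, median

# Salaries in $k for the ten staff above
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

print("mean:  ", mean(salaries))    # 30.7 - pulled upwards by the two outliers
print("median:", median(salaries))  # 15.5 - closer to the typical salary
```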

Another case in which to prefer the median over the mean (or mode) is when the data is skewed (i.e.,
the frequency distribution for the data is skewed). If we consider the normal distribution - as this is the
most frequently assessed distribution in statistics - when the data is perfectly normal, the mean, median
and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data
becomes skewed the mean loses its ability to provide the best central location for the data because the
skewed data is dragging it away from the typical value. However, the median best retains this position
and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed
distribution section later in this guide.

Median

The median is the middle score for a set of data that has been arranged in order of magnitude. The
median is less affected by outliers and skewed data. In order to calculate the median, suppose we have
the data below:

65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark - in this case, 56. It is the middle mark
because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number
of scores, but what happens when you have an even number of scores? What if you had only 10 scores?
Well, you simply have to take the middle two scores and average the result. So, if we look at the
example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Only now we have to take the 5th and 6th scores in our data set and average them to get a median of
55.5.
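Python's statistics.median handles both cases above - the single middle value for an odd number of scores, and the average of the two middle values for an even number - automatically:

```python
from statistics import median

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

# With 11 scores the median is the single middle value;
# with 10 scores it is the average of the 5th and 6th sorted values.
print(median(odd_scores))   # 56
print(median(even_scores))  # 55.5
```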

Mode

The mode is the most frequent score in our data set. On a histogram or bar chart, it is represented by
the highest bar. You can, therefore, sometimes consider the mode as being the most popular
option. An example of a mode is presented below:

Normally, the mode is used for categorical data where we wish to know which is the most common
category, as illustrated below:

We can see above that the most common form of transport, in this particular data set, is the bus.
However, one limitation of the mode is that it is not necessarily unique, which causes difficulty when
we have two or more values that share the highest frequency, such as below:

We are now stuck as to which mode best describes the central tendency of the data. This is particularly
problematic when we have continuous data because we are unlikely to have any one value that is more
frequent than another. For example, consider measuring the weights of 30 people (to the nearest 0.1
kg). How likely is it that we would find two or more people with exactly the same weight (e.g., 67.4
kg)? The answer is: probably very unlikely. Many people might be close but, with such a small sample
(30 people) and a large range of possible weights, you are unlikely to find two people with exactly the
same weight, that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous
data.

Another problem with the mode is that it will not provide us with a very good measure of central
tendency when the most common mark is far away from the rest of the data in the data set, as depicted
in the diagram below:

In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not
representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the
mode to describe the central tendency of this data set would be misleading.
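The non-uniqueness problem described above is the reason Python's statistics.multimode (Python 3.8+) returns a list rather than a single value. The data below is hypothetical:

```python
from statistics import multimode

# Categorical data with a single, unique mode
transport = ["bus", "car", "bus", "walk", "bus", "car", "train"]
print(multimode(transport))  # ['bus']

# With two equally frequent values, the mode is no longer unique
tied_marks = [55, 55, 56, 56, 65, 87]
print(multimode(tied_marks))  # [55, 56]
```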

Skewed Distributions and the Mean and Median

We often test whether our data is normally distributed because this is a common assumption underlying
many statistical tests. An example of a normally distributed set of data is presented below:

When you have a normally distributed sample you can legitimately use either the mean or the median
as your measure of central tendency. In fact, in any symmetrical distribution the mean, median and mode
are equal. However, in this situation, the mean is widely preferred as the best measure of central
tendency because it is the measure that includes all the values in the data set for its calculation, and any
change in any of the scores will affect the value of the mean. This is not the case with the median or
mode.

However, sometimes our data is skewed, for example, as with the right-skewed data set below:

We find, in this case, that the mean is being dragged in the direction of the skew. In these situations,
the median is generally considered to be the best representative of the central location of the data. The
more skewed the distribution, the greater the difference between the median and mean, and the greater
emphasis should be placed on using the median as opposed to the mean. A classic example of the
above right-skewed distribution is income (salary), where higher-earners provide a false representation
of the typical income if expressed as a mean and not a median.

If tests of normality show that the data is non-normal, it is customary to use the median instead of the
mean. However, this is more a rule of thumb than a strict
guideline. Sometimes, researchers wish to report the mean of a skewed distribution if the median and
mean are not appreciably different (a subjective assessment), and if it allows easier comparisons to
previous research to be made.

Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is with
respect to the different types of variable.

Type of Variable                 Best measure of central tendency

Nominal                          Mode
Ordinal                          Median
Interval/Ratio (not skewed)      Mean
Interval/Ratio (skewed)          Median

Measures of Dispersion

Standard Deviation

The standard deviation is a measure of the spread of scores within a set of data. Usually, we are
interested in the standard deviation of a population. However, as we are often presented with data from
a sample only, we can estimate the population standard deviation from a sample standard deviation.
These two standard deviations - sample and population standard deviations - are calculated differently.
In statistics, we are usually presented with having to calculate sample standard deviations, and so this is
what this section will focus on, although the formula for a population standard deviation will also be
shown.

When to use the sample or population standard deviation

We are normally interested in knowing the population standard deviation because our population
contains all the values we are interested in. Therefore, you would normally calculate the population
standard deviation if: (1) you have the entire population or (2) you have a sample of a larger
population, but you are only interested in this sample and do not wish to generalize your findings to the
population. However, in statistics, we are usually presented with a sample from which we wish to
estimate (generalize to) a population, and the standard deviation is no exception to this. Therefore, if all
you have is a sample, but you wish to make a statement about the population standard deviation from
which the sample is drawn, you need to use the sample standard deviation. Confusion can often arise as
to which standard deviation to use due to the name "sample" standard deviation incorrectly being
interpreted as meaning the standard deviation of the sample itself and not the estimate of the population
standard deviation based on the sample.

What type of data should you use when you calculate a standard deviation?

The standard deviation is used in conjunction with the mean to summarize continuous data, not
categorical data. In addition, the standard deviation, like the mean, is normally only appropriate when
the continuous data is not significantly skewed or has outliers.

Examples of when to use the sample or population standard deviation

Q. A teacher sets an exam for their pupils. The teacher wants to summarize the results the pupils
attained as a mean and standard deviation. Which standard deviation should be used?

A. Population standard deviation. Why? Because the teacher is only interested in this class of pupils'
scores and nobody else.

Q. A researcher has recruited males aged 45 to 65 years old for an exercise training study to investigate
risk markers for heart disease (e.g., cholesterol). Which standard deviation would most likely be used?

A. Sample standard deviation. Although not explicitly stated, a researcher investigating health related
issues will not be concerned with just the participants of their study; they will want to show how their
sample results can be generalized to the whole population (in this case, males aged 45 to 65 years old).
Hence, the use of the sample standard deviation.

Q. One of the questions on a national census asks for respondents' age. Which standard
deviation would be used to describe the variation in all ages received from the census?

A. Population standard deviation. A national census is used to find out information about the
nation's citizens. By definition, it includes the whole population. Therefore, a population standard
deviation would be used.

What are the formulas for the standard deviation?

The sample standard deviation formula is:

s = √[ Σ(xi − x̄)² / (n − 1) ]

where,

s = sample standard deviation
Σ = sum of...
x̄ = sample mean
n = number of scores in the sample.

The population standard deviation formula is:

σ = √[ Σ(xi − µ)² / N ]

where,

σ = population standard deviation
Σ = sum of...
µ = population mean
N = number of scores in the population.
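Both formulas are implemented in Python's statistics module as stdev (sample form, dividing by n − 1) and pstdev (population form, dividing by N). With a small illustrative data set:

```python
from statistics import stdev, pstdev

scores = [4, 8, 6, 5, 3, 7, 9, 6]  # illustrative values, mean = 6

# Sample standard deviation (divides the sum of squared deviations by n - 1)
s = stdev(scores)
# Population standard deviation (divides by N)
sigma = pstdev(scores)

print(round(s, 3))      # 2.0
print(round(sigma, 3))  # 1.871
```

Because it divides by the smaller quantity n − 1, the sample standard deviation is always slightly larger than the population standard deviation computed from the same numbers.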

Inferential Statistics
We have seen that descriptive statistics provide information about our immediate group of data. For
example, we could calculate the mean and standard deviation of the exam marks for the 100 students
and this could provide valuable information about this group of 100 students. Any group of data like
this, which includes all the data you are interested in, is called a population. A population can be small
or large, as long as it includes all the data you are interested in. For example, if you were only
interested in the exam marks of 100 students, the 100 students would represent your population.
Descriptive statistics are applied to populations, and the properties of populations, like the mean or
standard deviation, are called parameters as they represent the whole population (i.e., everybody you
are interested in).

Often, however, you do not have access to the whole population you are interested in investigating, but
only a limited amount of data instead. For example, you might be interested in the exam marks of all
students in Kenya. It may not be feasible to access the exam marks of all students in the whole of
Kenya, so you have to measure a smaller sample of students (e.g., 100 students), which is used to
represent the larger population of all Kenyan students. Properties of samples, such as the mean or
standard deviation, are not called parameters, but statistics. Inferential statistics are techniques that
allow us to use these samples to make generalizations about the populations from which the samples
were drawn. It is, therefore, important that the sample accurately represents the population. The process
of achieving this is called sampling. Inferential statistics arise out of the fact that sampling naturally
incurs sampling error and thus a sample is not expected to perfectly represent the population. The
methods of inferential statistics are (1) the estimation of parameter(s) and (2) testing of statistical
hypotheses.
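The sample-to-population idea can be illustrated with a small simulation. Here we invent a population of 10,000 exam marks, draw a random sample of 100, and compare the sample statistic with the population parameter; all numbers are synthetic, and the seed simply makes the run reproducible:

```python
import random
from statistics import mean

random.seed(42)  # reproducible illustration

# Hypothetical population: exam marks for 10,000 students
population = [random.gauss(65, 10) for _ in range(10_000)]

# We can rarely measure everyone, so we draw a random sample...
sample = random.sample(population, 100)

# ...and use the sample statistic to estimate the population parameter.
# The two will differ slightly: this gap is the sampling error.
print("population mean (parameter):", round(mean(population), 1))
print("sample mean (statistic):    ", round(mean(sample), 1))
```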

Hypothesis Testing

When you conduct a piece of quantitative research, you are inevitably attempting to answer a research
question or hypothesis that you have set. One method of evaluating this research question is via a
process called hypothesis testing, which is sometimes also referred to as significance testing. Since
there are many facets to hypothesis testing, we start with the example we refer to throughout this guide.

An example of a lecturer's dilemma

Two statistics lecturers, Sarah and Mike, think that they use the best method to teach their students.
Each lecturer has 50 statistics students who are studying a graduate degree in management. In Sarah's
class, students have to attend one lecture and one seminar class every week, whilst in Mike's class
students only have to attend one lecture. Sarah thinks that seminars, in addition to lectures, are an
important teaching method in statistics, whilst Mike believes that lectures are sufficient by themselves
and thinks that students are better off solving problems by themselves in their own time. This is the
first year that Sarah has given seminars, but since they take up a lot of her time, she wants to make sure
that she is not wasting her time and that seminars improve her students' performance.

The research hypothesis

The first step in hypothesis testing is to set a research hypothesis. In Sarah and Mike's study, the aim is
to examine the effect that two different teaching methods – providing both lectures and seminar classes
(Sarah), and providing lectures by themselves (Mike) – had on the performance of Sarah's 50 students
and Mike's 50 students. More specifically, they want to determine whether performance is different
between the two different teaching methods. Whilst Mike is skeptical about the effectiveness of
seminars, Sarah clearly believes that giving seminars in addition to lectures helps her students do better
than those in Mike's class. This leads to the following research hypothesis:

Research Hypothesis: When students attend seminar classes, in addition to
lectures, their performance increases.

Before moving onto the second step of the hypothesis testing process, we need to take you on a brief
detour to explain why you need to run hypothesis testing at all.

Sample to population

If you have measured individuals (or any other type of "object") in a study and want to understand
differences (or any other type of effect), you can simply summarize the data you have collected. For
example, if Sarah and Mike wanted to know which teaching method was the best, they could simply
compare the performance achieved by the two groups of students – the group of students that took
lectures and seminar classes, and the group of students that took lectures by themselves – and conclude
that the best method was the teaching method which resulted in the highest performance. However, this
is generally of only limited appeal because the conclusions could only apply to students in this study.
However, if those students were representative of all statistics students on a graduate management
degree, the study would have wider appeal.

In statistics terminology, the students in the study are the sample and the larger group they represent
(i.e., all statistics students on a graduate management degree) is called the population. Given that the
sample of statistics students in the study are representative of a larger population of statistics students,
you can use hypothesis testing to understand whether any differences or effects discovered in the study
exist in the population. In layman's terms, hypothesis testing is used to establish whether a research
hypothesis extends beyond those individuals examined in a single study.

Another example could be taking a sample of 200 breast cancer sufferers in order to test a new drug
that is designed to eradicate this type of cancer. As much as you are interested in helping these specific
200 cancer sufferers, your real goal is to establish that the drug works in the population (i.e., all breast
cancer sufferers).

As such, by taking a hypothesis testing approach, Sarah and Mike want to generalize their results to a
population rather than just the students in their sample. However, in order to use hypothesis testing,
you need to re-state your research hypothesis as a null and alternative hypothesis. Before you can do
this, it is best to consider the process/structure involved in hypothesis testing and what you are
measuring.

The structure of hypothesis testing

Whilst all pieces of quantitative research have some dilemma, issue or problem that they are trying to
investigate, the focus in hypothesis testing is to find ways to structure these in such a way that we can
test them effectively. Typically, it is important to:

1. Define the research hypothesis for the study.
2. Explain how you are going to operationalize (that is, measure or operationally define) what you
   are studying and set out the variables to be studied.
3. Set out the null and alternative hypothesis (or more than one hypothesis; in other words, a
   number of hypotheses).
4. Set the significance level.
5. Make a one- or two-tailed prediction.
6. Determine whether the distribution that you are studying is normal (this has implications for the
   types of statistical tests that you can run on your data).
7. Select an appropriate statistical test based on the variables you have defined and whether the
   distribution is normal or not.
8. Run the statistical tests on your data and interpret the output.
9. Reject or fail to reject the null hypothesis.

Whilst there are some variations to this structure, it is adopted by most thorough quantitative research
studies. We focus on the first five steps in the process, as well as the decision to either reject or fail to
reject the null hypothesis.
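Steps 6 to 9 are usually carried out with statistical software, but the core calculation in step 8 can be sketched by hand. Below, an independent-samples t statistic (Welch's form) is computed for hypothetical exam marks from Sarah's and Mike's classes; the marks, the class sizes shown and the choice of a t-test are all illustrative assumptions, not details taken from the study description:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical exam marks (eight per class, purely illustrative)
sarah = [72, 68, 75, 70, 66, 74, 71, 69]  # lectures + seminars
mike = [65, 63, 70, 61, 64, 67, 62, 66]   # lectures only

# Welch's t statistic: difference in means divided by its standard error
n1, n2 = len(sarah), len(mike)
se = sqrt(stdev(sarah) ** 2 / n1 + stdev(mike) ** 2 / n2)
t = (mean(sarah) - mean(mike)) / se

print(round(t, 2))  # 3.96
```

The resulting t of about 3.96 would then be compared against a critical value (roughly 2.1 at the 5% significance level for these degrees of freedom) to decide whether to reject the null hypothesis; in practice, a library routine such as scipy.stats.ttest_ind would report the p-value directly.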

Operationally defining (measuring) the study

So far, we have simply referred to the outcome of the teaching methods as the "performance" of the
students, but what do we mean by "performance"? "Performance" could mean how students score in a
piece of coursework, how many times they can answer questions in class, what marks they get in their
exams, and so on. There are three major reasons why we should be clear about how we operationalize
(i.e., measure) what we are studying. First, we simply need to be clear so that people reading our work
are in no doubt about what we are studying. This makes it easier for them to repeat the study in future
to see if they also get the same (or similar) results; something called replication. Second, one of
the criteria by which quantitative research is assessed, perhaps by an examiner if you are a student, is
how you define what you are measuring (in this case, "performance") and how you choose to measure
it. Third, it will determine which statistical test you need to use because the choice of statistical test is
largely based on how your variables were measured (e.g., whether the variable, "performance", was
measured on a "continuous" scale of 1-100 marks; an "ordinal" scale with groups of marks, such as 0-
20, 21-40, 41-60, 61-80 and 81-100; or some other scale).

It is worth noting that these choices will sometimes be personal choices (i.e., they are subjective) and at
other times they will be guided by some other/external information. For example, if you were to
measure intelligence, there may be a number of characteristics that you could use, such as IQ,
emotional intelligence, and so forth. What you choose here will likely be a personal choice because all
these variables are proxies for intelligence; that is, they are variables used to infer an individual's
intelligence, but not everyone would agree that IQ alone is an accurate measure of intelligence. In
contrast, if you were measuring company performance, you would find a number of established metrics
in the academic and practitioner literature that would determine what you should test, such as "Return
on Assets", etc. Therefore, to know what you should measure, it is always worth looking at the
literature first to see what other studies have done, whether you use the same measures or not. It is then
a matter of making an educated decision whether the variables you choose to examine are accurate
proxies for what you are trying to study, as well as discussing the potential limitations of these proxies.

In the case of measuring a statistics student's performance there are a number of proxies that could be
used, such as class participation, coursework marks and exam marks, since these are all good measures
of performance. However, in this case, we choose exam marks as our measure of performance for two
reasons: First, since Sarah is a statistics tutor, we feel that her main job is to help her students get the
best grade possible, as this will affect their overall grades in their graduate management degree. Second,
the assessment for the statistics course is a single two-hour exam. Since there is no coursework and
class participation is not assessed in this course, exam marks seem to be the most appropriate proxy for
performance. However, it is worth noting that if the assessment for the statistics course had been not
only a two-hour exam but also a piece of coursework, we would probably have chosen to measure both
exam marks and coursework marks as proxies of performance.

Variables

The next step is to define the variables that we are using in our study. Since the study aims to examine
the effect that two different teaching methods–providing lectures and seminar classes (Sarah) and
providing lectures by themselves (Mike)–had on the performance of Sarah's 50 students and Mike's 50
students, the variables being measured are:

Dependent variable: Exam marks


Independent variable: Teaching method ("seminar + lecture" vs. "lecture only")

Because this is a very straightforward example, we have only one dependent variable and one
independent variable, although studies can examine any number of dependent and independent
variables. Now that we know what our variables are, we can look at how to set out the null and
alternative hypotheses.
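
As a sketch, the data for this design could be laid out as one row per student, pairing the independent variable with the dependent variable. The marks below are simulated for illustration, not real data from Sarah's or Mike's classes:

```python
import random

random.seed(1)  # make the simulated data reproducible

# Hypothetical exam marks for Sarah's 50 students (seminar + lecture)
# and Mike's 50 students (lecture only).
seminar_lecture = [round(random.gauss(68, 10)) for _ in range(50)]
lecture_only = [round(random.gauss(62, 10)) for _ in range(50)]

# One row per student: (independent variable, dependent variable).
data = ([("seminar + lecture", mark) for mark in seminar_lecture] +
        [("lecture only", mark) for mark in lecture_only])

print(len(data))  # 100 students in total
```

Arranging the data this way makes the roles explicit: the teaching-method label is what we manipulate or group by, and the exam mark is what we measure.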

The null and alternative hypothesis

In order to undertake hypothesis testing you need to express your research hypothesis as a null and
alternative hypothesis. The null hypothesis and alternative hypothesis are statements regarding the
differences or effects that occur in the population. You will use your sample to test which statement
(i.e., the null hypothesis or alternative hypothesis) is most likely (although technically, you test the
evidence against the null hypothesis). So, with respect to our teaching example, the null and alternative
hypothesis will reflect statements about all statistics students on graduate management courses.

The null hypothesis is essentially the "devil's advocate" position. That is, it assumes that whatever you
are trying to prove did not happen (hint: it usually states that something equals zero). For example, the
two different teaching methods did not result in different exam performances (i.e., zero difference).
Another example might be that there is no relationship between anxiety and athletic performance (i.e.,
the slope is zero). The alternative hypothesis states the opposite and is usually the hypothesis you are
trying to prove (e.g., the two different teaching methods did result in different exam performances).
Initially, you can state these hypotheses in more general terms (e.g., using terms like "effect",
"relationship", etc.), as shown below for the teaching methods example:

Null Hypothesis (H0): Undertaking seminar classes has no effect on students' performance.

Alternative Hypothesis (HA): Undertaking seminar classes has a positive effect on students' performance.

How you want to "summarize" the exam performances will determine how you write a more specific
null and alternative hypothesis. For example, you could compare the mean exam performance of each
group (i.e., the "seminar" group and the "lectures-only" group). This is what we will demonstrate here,
but other options include comparing the distributions or the medians, amongst other things. As such, we
can state:

Null Hypothesis (H0): The mean exam mark for the "seminar" and "lecture-only" teaching methods is the same in the population.

Alternative Hypothesis (HA): The mean exam mark for the "seminar" and "lecture-only" teaching methods is not the same in the population.

Now that you have identified the null and alternative hypotheses, you need to find evidence and
develop a strategy for declaring your "support" for either the null or alternative hypothesis. We can do
this using some statistical theory and some arbitrary cut-off points. Both these issues are dealt with
next.

Significance levels
The level of statistical significance is often expressed as the so-called p-value. Depending on the
statistical test you have chosen, you will calculate a probability (i.e., the p-value) of observing your
sample results (or more extreme) given that the null hypothesis is true. Another way of phrasing this
is to consider the probability that a difference in a mean score (or other statistic) could have arisen
based on the assumption that there really is no difference. Let us consider this statement with respect to
our example where we are interested in the difference in mean exam performance between two
different teaching methods. If there really is no difference between the two teaching methods in the
population (i.e., given that the null hypothesis is true), how likely would it be to see a difference in the
mean exam performance between the two teaching methods as large as (or larger than) that which has
been observed in your sample?
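
One way to see what this probability means is a small simulation. The permutation sketch below (with invented marks for two small groups) shuffles the group labels, which is exactly what "the null hypothesis is true" amounts to, and counts how often a mean difference at least as large as the observed one arises purely by chance:

```python
import random

random.seed(42)

# Invented exam marks for two small groups of students.
seminar = [72, 68, 75, 70, 66, 74, 71, 69]
lecture = [64, 60, 67, 63, 65, 61, 66, 62]

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(seminar, lecture)  # difference seen in the "sample"

# Under the null hypothesis the group labels are interchangeable, so we
# shuffle them and record how often a difference this large appears.
pooled = seminar + lecture
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    if mean_diff(pooled[:8], pooled[8:]) >= observed:
        extreme += 1

p_value = extreme / trials  # proportion of shuffles at least as extreme
print(p_value < 0.05)  # a difference this large is rare under the null
```

The proportion printed as `p_value` is precisely the probability described above: the chance of a result as extreme as (or more extreme than) the observed one, given that the null hypothesis is true.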

So, you might get a p-value such as 0.03 (i.e., p = .03). This means that there is a 3% chance of finding
a difference as large as (or larger than) the one in your study given that the null hypothesis is true.
However, you want to know whether this is "statistically significant". Typically, if there is a 5% or
less chance (5 times in 100 or less) of seeing a difference in the mean exam performance between the
two teaching methods (or whatever statistic you are using) as large as that observed, given that the null
hypothesis is true, you would reject the null hypothesis and accept the alternative hypothesis.
Alternately, if the chance is greater than 5% (more than 5 times in 100), you would fail to reject the
null hypothesis and would not accept the alternative hypothesis. As such, in this example where p
= .03, we reject the null hypothesis and accept the alternative hypothesis: a difference this large would
arise only 3% of the time if the null hypothesis were true, which is too rare for us to put it down to
chance, so we conclude that the teaching method had an effect on exam performance.

Whilst there is relatively little justification why a significance level of 0.05 is used rather than 0.01 or
0.10, for example, it is widely used in academic research. However, if you want to be particularly
confident in your results, you can set a more stringent level of 0.01 (a 1% chance or less; 1 in 100
chance or less).

One- and two-tailed predictions

When considering whether we reject the null hypothesis and accept the alternative hypothesis, we need
to consider the direction of the alternative hypothesis statement. For example, the alternative
hypothesis that was stated earlier is:

Alternative Hypothesis (HA): Undertaking seminar classes has a positive effect on students' performance.

The alternative hypothesis tells us two things. First, what predictions did we make about the effect of
the independent variable(s) on the dependent variable(s)? Second, what was the predicted direction of
this effect? Let's use our example to highlight these two points.

Sarah predicted that her teaching method (independent variable: teaching method), whereby she
required her students to attend not only lectures but also seminars, would have a positive effect on (that
is, increase) students' performance (dependent variable: exam marks). If an alternative hypothesis has a
direction (and this is how you want to test it), the hypothesis is one-tailed; that is, it predicts the
direction of the effect. If the alternative hypothesis had stated that the effect was expected to be
negative, this would also be a one-tailed hypothesis.

Alternatively, a two-tailed prediction means that we do not make a choice over the direction that the
effect of the experiment takes. Rather, it simply implies that the effect could be negative or positive. If
Sarah had made a two-tailed prediction, the alternative hypothesis might have been:

Alternative Hypothesis (HA): Undertaking seminar classes has an effect on students' performance.

In other words, we simply take out the word "positive", which implies the direction of our effect. In our
example, making a two-tailed prediction may seem strange. After all, it would be logical to expect that
"extra" tuition (going to seminar classes as well as lectures) would either have a positive effect on
students' performance or no effect at all, but certainly not a negative effect. However, this is just our
opinion (and hope) and certainly does not mean that we will get the effect we expect. Generally
speaking, making a one-tailed prediction (i.e., and testing for it this way) is frowned upon, as it usually
reflects the hope of a researcher rather than any certainty that it will happen. A notable exception to this
rule is when there is only one possible direction in which a change could occur. This can happen, for
example, when biological activity/presence is measured. That is, a protein might be "dormant" and the
stimulus you are using can only possibly "wake it up" (i.e., it cannot possibly reduce the activity of a
"dormant" protein). In addition, for some statistical tests, one-tailed tests are not possible.

Rejecting or failing to reject the null hypothesis

Let's return finally to the question of whether we reject or fail to reject the null hypothesis.

If our statistical analysis shows that the p-value is below the cut-off value (significance level) we have
set (e.g., either 0.05 or 0.01), we reject the null hypothesis and accept the alternative hypothesis.
Alternatively, if the p-value is above the cut-off value, we fail to reject the null hypothesis and cannot
accept the alternative hypothesis. You should note that you can never accept the null hypothesis: you
can only fail to reject it, because your sample can provide evidence against it but cannot prove it true.
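
This decision rule can be sketched in a few lines; the p-values passed in below are purely illustrative:

```python
def decide(p_value, alpha=0.05):
    # Reject the null hypothesis only when the p-value falls at or below
    # the chosen significance level; otherwise fail to reject it.
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))              # reject the null hypothesis
print(decide(0.03, alpha=0.01))  # fail to reject the null hypothesis
print(decide(0.20))              # fail to reject the null hypothesis
```

Note how the same p-value of 0.03 leads to different decisions under the 0.05 and 0.01 cut-offs, which is why the significance level must be fixed before the analysis is run.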
