
Introduction to Research and Psychological Testing

What is Research?
A systematic, empirical, critical investigation that is structured to answer questions about
the behaviour and experience of individuals is called research. Research can have educational,
occupational or clinical applications. The researcher (who more often than not is really a small
group of researchers) formulates a research question, conducts a study designed to answer the
question, analyzes the resulting data, draws conclusions about the answer to the question, and
publishes the results so that they become part of the research literature. Because the research
literature is one of the primary sources of new research questions, this process can be thought of
as a cycle. New research leads to new questions, which lead to new research, and so on. It can be
better understood by the following figure:
Figure 1
Model of Scientific Research in Psychology

This figure also indicates that research questions can originate outside of this cycle either
with informal observations or with practical problems that need to be solved. But even in these
cases, the researcher would start by checking the research literature to see if the question had
already been answered and to refine it based on what previous research had already found.
Therefore, to sum up, research is all about answering questions.
Goals in Research
All research has the common goal of learning how things work. The goals specifically
aimed at uncovering the mysteries of human and animal behaviour are description, explanation,
prediction, control and application.
Description
The first step in understanding anything is to describe it. Description involves observing a
behaviour and noting everything about it: what is happening, where it happens, to whom it
happens and under what circumstances it seems to happen.
Explanation
Based on one’s observation, a researcher might try to understand or find an explanation.
Finding explanations for behaviour is a very important step in the process of forming theories of
behaviour. A theory is a general explanation of a set of observations or facts. The goal of
description provides the observations, and the goal of explanation helps to build the theory.
Prediction
Determining what will happen in the future is a prediction. If through research, a
researcher gains knowledge, he can use it to make future predictions to control or modify the
behaviour under study.
Control
The focus of control, or the modification of some behaviour is to change a behaviour
from an undesirable one to a desirable one.
Application
The use of acquired knowledge for the betterment of human society is one of the final goals
of research. If the results obtained by research can be applied in real life, it fulfils the application
goal of research.
Steps in Quantitative Research
Step 1: Identify a Question of Interest
The first step of scientific enquiry is to identify the question of interest. From personal
experiences, news events, scientific articles and books, and other sources, researchers observe
something that piques their interest and they ask a question about it.
Step 2: Gather Information and Form Hypothesis
Next, scientists examine whether any studies, theories, and other information already
exists that might help answer their question, and then they form a hypothesis. Hypothesis is a
specific prediction about some phenomenon.
Step 3: Test Hypothesis by Conducting Research
The third step is to test the hypothesis by conducting research. Research conducted can
be done through various methods. The data is collected and the hypothesis is tested by drawing
out results from the acquired data.
Step 4: Analyse data, Draw Tentative Conclusions, and Report Findings
At the fourth step, researchers analyse the information (called data) they collect, draw
tentative conclusions, and report their findings to the scientific community. If expert reviewers
favourably judge the quality and importance of the research, the article gets published. It allows
fellow scientists to learn about new findings, to evaluate the research, and to challenge or expand
on it.
Step 5: Build a Body of Knowledge
At the fifth step, scientists build a body of knowledge about the topic in question. They
ask further questions, formulate new hypotheses, and test those hypotheses by conducting more
research. As additional evidence comes in, scientists may attempt to build theories. A theory is a
set of formal statements that explains how and why certain events are related to one another.
Theories are broader than hypotheses. Scientists use the theory to formulate new hypotheses,
which are then tested by conducting still more research. In this manner, the scientific process
becomes self correcting. If research consistently supports the hypotheses derived from the theory,
confidence in the theory becomes stronger. If the predictions made by the theory are not
supported, then it will need to be modified or, ultimately, discarded.

Types of Research
On the basis of Use and Audience of Research
Basic Research. Research designed to advance fundamental knowledge about how the
world works and build/test theoretical explanations by focusing on the “why” question. The
scientific community is its primary audience. Basic research lacks practical applications in the
short term, but it builds a foundation for knowledge and broad understanding that has an impact
on many issues, policy areas, or areas of study. Basic research is also the main source of the
tools—methods, theories, and ideas—that all researchers use. Almost all of the major
breakthroughs and significant advances in knowledge originated in basic research. It lays a
foundation for core understandings and may have implications for issues that do not even exist
when a study is conducted.
Applied Research. Research designed to offer practical solutions to a concrete problem
or address the immediate and specific needs of clinicians or practitioners. Only rarely in applied
research do we try to build, test, or make connections to theory. Most applied research studies are
short term and small scale. They offer practical results that we can use within a year or less. For
example, the student government of University X wants to reduce alcohol abuse. It wants,
therefore, to find out whether the number of University X students arrested for driving while
intoxicated would decline if the student government were to sponsor alcohol-free parties next
year. An applied research study would be most applicable for this situation. Businesses,
government offices, health care facilities, social service agencies, political organisations, and
educational institutions conduct applied studies and make decisions based on findings.
On the basis of Time
Cross-sectional. Any research that examines information on many cases at one point in
time. Cross-sectional research can be exploratory, descriptive, or explanatory, but it is most
consistent with a descriptive approach. It is usually the simplest and least costly alternative but
rarely captures social processes or change. For example, scientists in healthcare may use
cross-sectional research to understand how children ages 2-12 years across India are prone to
calcium deficiency.
Longitudinal Research. Any research that examines information from many units or
cases across more than one point in time. We can use longitudinal studies for exploratory,
descriptive, and explanatory purposes. Usually more complicated and costly to conduct than
cross sectional research, longitudinal studies are more powerful. For instance, consider a study
conducted to understand the similarities or differences between identical twins who are brought
up together versus identical twins who are not. The study observes several variables, but the
constant is that all the participants have identical twins. In this case, researchers would want to
observe these participants from childhood to adulthood, to understand how growing up in
different environments influences traits, habits, and personality. Over many years, researchers
can see both sets of twins as they experience life without intervention. Because the participants
share the same genes, it is assumed that any differences are due to environmental factors, but
only an attentive study can conclude those assumptions.
On the basis of Purpose of Research
Descriptive Research. Research in which the primary purpose is to “paint a picture”
using words or numbers and to present a profile, a classification of types, or an outline of steps to
answer questions such as who, when, where, and how. Descriptive research presents a picture of
the specific details of a situation, social setting, or relationship. Much of the social research
found in scholarly journals or used for making policy decisions is descriptive. A descriptive
research study starts with a well-defined issue or question and tries to describe it accurately. The
study’s outcome is a detailed picture of the issue or answer to the research question. For
example, the focused issue might be the relationship between parents who are heavy alcohol
drinkers and child abuse. Results could show that 25 percent of heavy-drinking parents had
physically or sexually abused their children compared to 5 percent of parents who never drink or
drink very little.
Exploratory Research. Research whose primary purpose is to examine a little
understood issue or phenomenon and to develop preliminary ideas about it and move toward
refined research questions. We use exploratory research when the subject is very new, we know
little or nothing about it, and no one has yet explored it. Our goal with it is to formulate more
precise questions that we can address in future research. As a first stage of inquiry, we want to
know enough after the exploratory study so we can design and execute a second, more
systematic and extensive study. Researchers who conduct exploratory research must be creative,
open minded, and flexible; adopt an investigative stance; and explore all sources of information.
For example, an expectation might be that the impact of immigration to a new nation would be
more negative on younger children than on older ones. Instead, the unexpected finding was that
children of a specific age group (between ages six and eleven) who immigrate are more
vulnerable to its disruption than either older or younger children.
Explanatory Research. Research whose primary purpose is to explain why events occur
and to build, elaborate, extend, or test theory. When encountering an issue that is known and
with a description of it, we might wonder why things are the way they are. Addressing the “why”
is the purpose of explanatory research. It builds on exploratory and descriptive research and goes
on to identify the reason something occurs. Going beyond providing a picture of the issue, an
explanatory study looks for causes and reasons. For example, a descriptive study would
document the numbers of heavy-drinking parents who abuse their children whereas an
explanatory study would be interested in learning why these parents abuse their children. We
focus on exactly what is it about heavy drinking that contributes to child abuse.
On the basis of Data Collection Technique
Qualitative Research. Qualitative research is expressed in words. It is used to
understand concepts, thoughts or experiences. This type of research enables you to gather
in-depth insights on topics that are not well understood. Common qualitative methods include
interviews with open-ended questions, observations described in words, and literature reviews
that explore concepts and theories.
Quantitative Research. Quantitative research is expressed in numbers and graphs. It is
used to test or confirm theories and assumptions. This type of research can be used to establish
generalizable facts about a topic. Common quantitative methods include experiments,
observations recorded as numbers, and surveys with closed-ended questions.

Table 1
Qualitative v/s Quantitative Research

Approach: Qualitative research emphasises an interpretivist approach, stressing soft data (words, impressions, symbols); quantitative research emphasises a positivist approach, stressing hard data, often in the form of numbers.
Research path: Qualitative research follows a non-linear path that permits and obligates the researcher to move in cyclical, back-and-forth, non-successive sequences; quantitative research follows a clearly set linear path with successive procedures in a logical sequence.
Cases: Qualitative research usually involves a small group of respondents; quantitative research usually involves a large number of cases representing a population of interest.
Research questions: Qualitative research may start out with a vague or poorly defined research question, which may evolve as the study progresses and new insights are gained and incorporated; in quantitative research, questions are finalised before the study and are used in developing steps and guiding the study.
Methods: Qualitative research uses semi-structured methods such as in-depth interviews, focus groups and participant observation; quantitative research uses highly structured methods such as questionnaires, surveys and structured observation.
Reasoning: Qualitative research is primarily inductive, used to formulate theory or hypotheses; quantitative research is primarily deductive, used to test pre-specified concepts, constructs, and hypotheses that make up a theory.
Objectivity: Qualitative research allows subjectivity, so researchers can employ personal insights, feelings, and human perspectives to understand social life; quantitative research stresses objectivity.
Analysis: In qualitative research, analysis proceeds by summarising, categorising and interpreting data to provide valuable insights that are open to exploration; in quantitative research, analysis proceeds by using statistics, tables, or charts and discussing how the results relate to hypotheses.

Data Collection Techniques: Qualitative Data


Interviews
Qualitative interviewing is a common qualitative data collection method that
characteristically involves questions and probes by the interviewer designed to encourage the
interviewee to talk freely and extensively about the topic(s) defined by the researcher. Interviews
are often described as varying between the structured and the unstructured. Most of us have, at
some stage, taken part in a market research interview in the street or over the telephone. Such
interviews typify structured interviews. The questions asked are often simply read from a list and
the interviewee chooses from another list of possible answers for each question. There is little
opportunity for the interviewer to depart from the prepared ‘script’. In other words, as much as
possible is planned and predetermined. Academic quantitative researchers use variations on the
theme of structured interviewing in their research. If structured interviewing meets the needs of
one’s research, then data can be collected fairly economically in terms of both time and financial
costs. In contrast, qualitative interviews are time consuming for everyone involved and are more
complex in terms of planning and recruiting suitable participants than structured interviews.
Often qualitative interviews are referred to as semi-structured. In theory, there is also the
unstructured interview, which lacks any pre-planned structure.
The whole point of the qualitative interview is that it generally generates extensive and
rich data from participants in the study. Such reasons for using qualitative interviewing touch on
the ethos of qualitative research just as much as structured interviewing reflects the quantitative
ethos. Unlike everyday conversation, the qualitative interview is built on the principle that the
interviewee does most of the talking – the researcher merely steers and guides the interviewee,
probes for more information and interjects in other ways when necessary. It is not generally
expected that the interviewer will answer questions – that is the role of the interviewee. Equally,
the interviewee does not ask the interviewer personal questions of the sort that the interviewer is
free to pose. That is not in the ‘rules’ of the interview. The interviewee can be asked to talk at
some length about matters that are difficult for them – perhaps because they have not thought
about the issue, perhaps because the topic of the interview is embarrassing, and so forth.
The interviewer has to conduct the business of the interview while at the same time
absorbing a great deal of information that bombards them during the interview. This information
has to be absorbed and retained so that probes using this new information can be inserted
wherever necessary.
Observations
A great deal of behavioural research involves the direct observation of human or
nonhuman behaviour. Behavioural researchers have been known to observe and record
behaviours as diverse as eating, arguing, bar pressing, blushing, smiling, helping, food salting,
hand clapping, eye blinking, mating, yawning, conversing, and even urinating. However,
researchers who use observational methods must make some decisions about how they will
observe and record participants' behaviour in a particular study: (1) Will the observation occur in
a natural or contrived setting? And (2) Will the participants know they are being observed?
Naturalistic Observation vs Contrived Observation. Naturalistic observation involves
the observation of ongoing behaviour as it occurs naturally with no intrusion or intervention by
the researcher. In naturalistic studies, the participants are observed as they engage in ordinary
activities in settings that have not been arranged specifically for research purposes. For example,
researchers have used naturalistic observation to study behaviour during riots and other mob
events, littering, nonverbal behaviour, and parent-child interactions on the playground.
Researchers who are interested in the behaviour of nonhuman animals in their natural
habitats-ethologists and comparative psychologists-also use naturalistic observation methods.
Participant observation is one special type of naturalistic observation. In participant observation,
the researcher engages in the same activities as the people he or she is observing.
Contrived observation involves the observation of behaviour in settings that are arranged
specifically for observing and recording behaviour. Often such studies are conducted in
laboratory settings in which participants know they are being observed, although the observers
are usually concealed, such as behind a one-way mirror. For example, to study parent-child
relationships, researchers often observe parents interacting with their children in laboratory
settings. In other cases, researchers use contrived observation in the "real world." In these
studies, researchers set up situations outside of the laboratory to observe people's reactions.
Disguised vs Nondisguised Observation. The second decision a researcher must make
when using observational methods is whether to let participants know they are being observed.
Sometimes the individuals who are being studied know that the researcher is observing their
behaviour (undisguised observation). As you might guess, the problem with undisguised
observation is that people often do not respond naturally when they know they are being
scrutinised. When they react to the researcher's observation, their behaviours are affected.
Researchers refer to this phenomenon as reactivity.
When they are concerned about reactivity, researchers may conceal the fact that they are
observing and recording participants' behaviour (disguised observation). In some instances,
researchers compromise by letting participants know they are being observed while withholding
information regarding precisely what aspects of the participants' behaviour are being recorded.
This partial concealment strategy (Weick, 1968) lowers, but does not eliminate, the problem of
reactivity while avoiding ethical questions involving invasion of privacy and informed consent.
Focus Groups
Focus Group is a type of in-depth interview accomplished in a group, whose meetings
present characteristics defined with respect to the proposal, size, composition, and interview
procedures. The focus or object of analysis is the interaction inside the group. The participants
influence each other through their answers to the ideas and contributions during the discussion.
The moderator stimulates discussion with comments or subjects. The fundamental data produced
by this technique are the transcripts of the group discussions and the moderator's reflections and
annotations. The general characteristics of the Focus Group are people's involvement, a series of
meetings, the homogeneity of participants with respect to research interests, the generation of
qualitative data, and discussion focused on a topic, which is determined by the purpose of the
research.
A focus group discussion is a form of group interviewing in which a small group –
usually 10 to 12 people – is led by a moderator (interviewer) in a loosely structured discussion of
various topics of interest. The course of the discussion is usually planned in advance and most
moderators rely on an outline, or moderator’s guide, to ensure that all topics of interest are
covered. A focus group discussion (FGD) is a good way to gather together people from similar
backgrounds or experiences to discuss a specific topic of interest.
Ethnography
Ethnographic research is perhaps the most commonly applied type of qualitative
research method in psychology and medicine. In ethnographic studies, the researcher immerses
himself in the environment of participants to understand the cultures, challenges, motivations,
and topics that arise between them by investigating the environment directly. This type of
research method can last for a few days to a few years because it involves in-depth monitoring
and data collection based on these foundations.
Ethnographic study can be applied anywhere, including familiar places. Ethnographic
study can be carried out in many types of societies including formal and informal organisations,
such as workplaces, urban communities, clubs, shopping centres, and social media. The main
role that ethnographers play, however, remains almost unchanged: to observe and analyse how
people interact with each other, and with their environment, in order to understand their culture.
Of the emic and etic viewpoints, ethnographers strive to gain the emic (insider's) perspective, or the
viewpoint of members of a specific culture themselves. This means that they are trying to look at
the culture under study from within, through the meanings of individuals belonging to this
culture.
This research provides an in-depth look at the participants' opinions and behaviours, as
well as the situations they face during their day. The researcher provides an understanding of how
these participants see the world around them and how they interact with everything around them,
using direct observation tools, diary study, video recordings, photography, and impact analysis,
such as devices that a person uses throughout the day. Notes can be made anywhere from the
participant's workplace or home, or while out with family and friends.
Schwartzman (1993) stated that ethnographic study takes the image of a cultural lens or
cultural perspective to study the lives of people within their societies, as the roots of ethnography
lie in anthropological studies that focused on studying the social and cultural aspects of small
societies. Ethnographic researchers live among the study participants in order to understand the
culture in which they participate. Thus, classical anthropologists were strangers in the fields of
their field studies, and that is why, in an ethnographic study, it often takes years for researchers to
integrate into the culture of the society they were studying. In order to do this, they had to learn
the language needed to socialise with the population and understand their daily habits, rituals,
customs, and actions.
Case Study
The case study is not itself a research method. Instead, it constitutes an approach to the
study of singular entities, which may involve the use of a wide range of diverse methods of data
collection and analysis. The case study is, therefore, not characterised by the methods used to
collect and analyse data, but rather by its focus upon a particular unit of analysis: the case. A
case can be an organisation, a city, a group of people, a community, a patient, a school, an
intervention, even a nation state or an empire. It can be a situation, an incident or an experience.
Bromley (1986: 8) describes cases as ‘natural occurrences with definable boundaries’.
The case study involves an in-depth, intensive and sharply focused exploration of such an
occurrence. Case studies can make use of both qualitative and quantitative research methods.
However, despite such diversity, it is possible to identify a number of defining features of case
study research. These include:
An Idiographic Perspective. Here, researchers are concerned with the particular rather
than the general. The aim is to understand an individual case, in its particularity. This can be
contrasted with a nomothetic approach, which aims to identify general laws of human behaviour
by averaging out individual variation (for a more detailed discussion of idiography, see Smith et
al. 1995).
Attention to Contextual Data. Case study research takes a holistic approach, in that it
considers the case within its context. This means that the researcher pays attention to the ways in
which the various dimensions of the case relate to or interact with its environment. Thus, while
particular cases need to be identified as the focus of the study, they cannot be considered in
isolation (for a discussion of the role of the ‘ecological context’ in psychological case studies,
see Bromley 1986: 25).
Triangulation. Case studies integrate information from diverse sources to gain an
in-depth understanding of the phenomenon under investigation. This may involve the use of a
range of data collection and analysis techniques within the framework of one case study.
Triangulation enriches case study research because it allows the researcher to approach the case
from a number of different perspectives. This in turn facilitates an appreciation of the various
dimensions of the case as well as its embeddedness within its various (social, physical, symbolic,
psychological) contexts.
A Temporal Element. Case studies involve the investigation of occurrences over a
period of time. According to Yin (1994: 16), ‘Establishing the how and why of a complex human
situation is a classic example of the use of case studies.’ Case studies are concerned with
processes that take place over time. This means that a focus on change and development is an
important feature of case studies.
A Concern with Theory. Case studies facilitate theory generation. The detailed
exploration of a particular case can generate insights into social or psychological processes,
which in turn can give rise to theoretical formulations and hypotheses. Freud’s psychoanalytic
case studies constitute a clear example of the relationship between case studies and theory
development. Hamel (1993: 29) goes as far as to claim that ‘All theories are initially based on a
particular case or object’. In addition, case studies can also be used to test existing theories or to
clarify or extend such theories, for example, by looking at deviant or extreme cases.
Data Collection Techniques: Quantitative Data
Experiments
Experiments are widely used in psychology as one of the most primary methods of
inquiry. An experiment refers to an investigation in which the validity of a hypothesis is tested in
a scientific manner. All experiments require at least these two special features, the independent
and dependent variables. The dependent variable is the response measure of an experiment that is
dependent on the subject. The independent variable is a manipulation of the environment
controlled by the experimenter. The key features of an experiment are controlled methods and
the random allocation of participants into controlled and experimental groups. Experiments are
used to study the impact of one variable on another variable and to establish a cause and effect
relationship. When speaking of experiments, most people assume that these are confined to the
laboratory. Although there is a category known as the laboratory experiment in which the study
is conducted in a very controlled environment, there are other experiments as well. These are
known as natural experiments in which the variables are merely observed rather than controlled.
Some key terms used in an experiment are as follows:
Independent Variable. Variables that are manipulated to study its effect on the
dependent variable.
Dependent Variable. Variables that get modified or changed due to a change in
independent variables are called dependent variables.
Experimental Group. An experimental group (sometimes called a treatment group) is a
group that receives a treatment in an experiment. The group is made up of test subjects (humans,
animals etc.) and the “treatment” is the variable you are studying.
Control Group. The group that does not receive the treatment is called the control group.
Hypothesis. A hypothesis is a precise, testable statement of what the researcher(s) predict
will be the outcome of the study. This usually involves proposing a possible relationship between
two variables: the independent variable (what the researcher changes) and the dependent variable
(what the researcher measures).
For example, in the classic research experiment by Ivan Pavlov, called Classical
Conditioning, the independent variable was the ringing of the bell and the dependent variable
was the salivation of the dog. The salivation (amount and frequency) depended completely on the
ringing of the bell, thereby establishing a cause and effect relationship, as demonstrated by the
experiment.
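As a rough illustration of this logic (the group scores below are hypothetical, and the analysis shown is only one common way to compare two groups, not a prescribed procedure), a minimal Python sketch might compare the dependent-variable means of an experimental and a control group:

from statistics import mean, stdev

# Hypothetical dependent-variable scores for the two groups.
experimental_group = [14, 17, 15, 19, 16, 18]   # received the treatment
control_group      = [12, 13, 11, 15, 12, 14]   # did not receive the treatment

# Difference between the group means on the dependent variable.
mean_difference = mean(experimental_group) - mean(control_group)

# Welch's t statistic for two independent groups (one common way to test the difference).
n1, n2 = len(experimental_group), len(control_group)
standard_error = (stdev(experimental_group) ** 2 / n1 + stdev(control_group) ** 2 / n2) ** 0.5
t_statistic = mean_difference / standard_error

print(f"mean difference = {mean_difference:.2f}, t = {t_statistic:.2f}")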
Advantages of Experiments. The main advantage of experiments is better control of
extraneous variation. In the ideal experiment, no factors (variables) except the one being studied
are permitted to influence the outcome; in the jargon of experimental psychology, we say that
these other factors are controlled. If, as in the ideal experiment, all factors but one (that under
investigation) are held constant, we can logically conclude that any differences in outcome must
be caused by manipulation of that one independent variable. As the levels of the independent
variable are changed, the resulting differences in the dependent variable can occur only because
the independent variable has changed.
Another advantage of experiments is the economy. Using the technique of naturalistic
observation requires that the scientist wait patiently until the conditions of interest occur. The
experimenter controls the situation by creating the conditions of interest, thus obtaining data
quickly and efficiently. Experiments also have the advantage that the research provides
conclusions that are specific and it allows the cause and effect relationship to be determined.
Disadvantages of Experiments. Although there are many advantages of conducting
experimental research, it is highly subject to the possibility of human error and sometimes
creates ethical or practical problems with variable control. Another shortcoming is that lab
settings can inhibit natural behaviour of an individual.
Psychological Tests
Definition. A test is a standardised procedure for sampling behaviour and describing it
with categories or scores. According to this definition, all of the following could be tests: a
checklist for rating the social skills of a youth with mental retardation; an untimed measure of
mastery in adding pairs of three-digit numbers; microcomputer appraisals of reaction time; and
even situational tests such as observing an individual working on a group task with two “helpers”
who are obstructive and uncooperative. It also includes the traditionally known personality
questionnaires and the intelligence tests. In sum, tests are enormously varied in their formats and
applications. Nonetheless, most tests possess these defining features:
Standardised Procedure. Standardised procedure is an essential feature of any
psychological test. A test is considered to be standardised if the procedures for administering it
are uniform from one examiner and setting to another. Standardisation, therefore, rests largely on
the directions for administration found in the instructional manual that typically accompanies a
test. In order to guarantee uniform administration procedures, the test developer must provide
comparable stimulus materials to all testers, specify with considerable precision the oral
instructions for each item or subtest, and advise the examiner how to handle a wide range of
queries from the examinee.
Behaviour Sample. A psychological test is also a limited sample of behaviour. Neither
the subject nor the examiner has sufficient time for truly comprehensive testing, even when the
test is targeted to a well-defined and finite behaviour domain. Thus, practical constraints dictate
that a test is only a sample of behaviour. Yet, the sample of behaviour is of interest only insofar
as it permits the examiner to make inferences about the total domain of relevant behaviours. For
example, the purpose of a vocabulary test is to determine the examinee’s entire word stock by
requesting definitions of a very small but carefully selected sample of words. Whether the
subject can define the particular 35 words from a vocabulary subtest (e.g., on the Wechsler Adult
Intelligence Scale-IV, or the WAIS-IV) is of little direct consequence. But the indirect meaning
of such results is of great import because it signals the examinee’s general knowledge of
vocabulary.
Scores or Categories. A psychological test must also permit the derivation of scores or
categories. Every test furnishes one or more scores or provides evidence that a person belongs to
one category and not another. In short, psychological testing sums up performance in numbers or
classifications.
Norms or Standards. Psychological tests must also possess norms or standards. An
examinee’s test score is usually interpreted by comparing it with the scores obtained by others on
the same test. For this purpose, test developers typically provide norms—a summary of test
results for a large and representative group of subjects (Petersen, Kolen, & Hoover, 1989).
Norms not only establish an average performance but also serve to indicate the frequency with
which different high and low scores are obtained. Thus, norms allow the tester to determine the
degree to which a score deviates from expectations. Such information can be very important in
predicting the non test behaviour of the examinee.
Prediction of Non-test Behaviour. The ultimate purpose of a test is to predict additional
behaviours, other than those directly sampled by the test. Thus, the tester may have more interest
in the non test behaviours predicted by the test than in the test responses per se. The essential
characteristic of a good test is that it permits the examiner to predict other behaviours— not that
it mirrors the to-be-predicted behaviours. If answering “true” to the question “I drink a lot of
water” happens to help predict depression, then this seemingly unrelated question is a useful
index of depression.
Characteristics of Good Psychological Tests
Psychological testing is a dynamic process influenced by many factors. Examiners strive
to ensure that test results accurately reflect the traits or capacities being assessed. Therefore,
there are certain characteristics that a test needs to possess to measure what it is designed to
measure, accurately.
Reliability
Reliability refers to the attribute of consistency in measurement. The concept of
reliability is best viewed as a continuum ranging from minimal consistency of measurement
(e.g., simple reaction time) to near-perfect repeatability of results (e.g., weight). Psychometricians
have devised several statistical methods for estimating the degree of reliability of measurements.
Reliability can be understood with respect to three aspects:
Temporal Stability. It is the degree to which test scores remain stable over time; whether
the test produces similar results on repeated administrations.
Equivalence. It is the degree to which test scores remain stable across different forms of
the test or across different testers scoring the same test.
Homogeneity. It is the extent to which the items across a test legitimately team together
to measure a single characteristic.
Methods of Assessing Reliability
Test-Retest. The most straightforward method for determining the reliability of test
scores is to administer the identical test twice to the same group of heterogeneous and
representative subjects. If the test is perfectly reliable, each person’s second score will be
completely predictable from his or her first score.
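As a minimal sketch, assuming hypothetical scores from two administrations of the same test to the same examinees, test-retest reliability can be estimated as the Pearson correlation between the two sets of scores (Python):

from statistics import mean

def pearson_r(x, y):
    # Pearson correlation coefficient between two equal-length lists of scores.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical scores for the same seven examinees tested twice.
first_administration  = [12, 15, 9, 20, 17, 11, 14]
second_administration = [13, 14, 10, 19, 18, 10, 15]

print(round(pearson_r(first_administration, second_administration), 3))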
Alternate Forms. In some cases, test developers produce two forms of the same test.
These alternate forms are independently constructed to meet the same specifications, often on an
item-by-item basis. Thus, alternate forms of a test incorporate similar content and cover the same
range and level of difficulty in items.
Split-Half Reliability. If scores on two half tests from a single test administration show
a strong correlation, then scores on two whole tests from two separate test administrations (the
traditional approach to evaluating reliability) also should reveal a strong correlation. We obtain an
estimate of split-half reliability by correlating the pairs of scores obtained from equivalent halves
of a test administered only once to a representative sample of examinees. The Spearman-Brown
Formula is one of the ways to test the reliability of test through Split-Half Method:
r_SB = 2r_hh / (1 + r_hh),
where r_hh is the correlation between the scores on the two half-tests and r_SB is the Spearman-Brown estimate of full-test reliability.
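The following Python sketch, using hypothetical item responses, illustrates the computation: items are split into odd and even halves, the half-test scores are correlated, and the Spearman-Brown formula steps the half-test correlation up to an estimate for the full-length test.

from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5

# Hypothetical item responses: rows are examinees, columns are items scored 0 or 1.
item_scores = [
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 0, 0, 1],
]

odd_half  = [sum(row[0::2]) for row in item_scores]   # scores on items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in item_scores]   # scores on items 2, 4, 6, 8

r_hh = pearson_r(odd_half, even_half)   # correlation between the two half-tests
r_sb = (2 * r_hh) / (1 + r_hh)          # Spearman-Brown corrected full-test estimate
print(round(r_hh, 3), round(r_sb, 3))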

Internal Consistency Methods. The principal internal consistency approaches are Cronbach's
coefficient alpha and the Kuder-Richardson estimate. Notice that the split-half method gives us an
estimate of reliability for an instrument
half as long as the full test. Although there are some exceptions, a shorter test generally is less
reliable than a longer test. This is especially true if, in comparison to the shorter test, the longer
test embodies equivalent content and similar item difficulty. Thus, the Pearson r between two
halves of a test will usually underestimate the reliability of the full instrument. We need a method
for deriving the reliability of the whole test based on the half-test correlation coefficient. As
proposed by Cronbach (1951) and subsequently elaborated by others (Novick & Lewis, 1967;
Kaiser & Michael, 1975), coefficient alpha may be thought of as the mean of all possible
split-half coefficients, corrected by the Spearman-Brown formula. The formula for coefficient
alpha is:

r_α = (N / (N - 1)) × (1 - Σσ_i² / σ²),

where N is the number of items, σ_i² is the variance of scores on item i, and σ² is the variance of total test scores.
Coefficient alpha is an index of the internal consistency of the items, that is, their
tendency to correlate positively with one another. Insofar as a test or scale with high internal
consistency will also tend to show stability of scores in a test–retest approach, coefficient alpha
is therefore a useful estimate of reliability.
Cronbach (1951) has shown that coefficient alpha is the general application of a more
specific formula developed earlier by Kuder and Richardson (1937). Their formula is generally
referred to as Kuder-Richardson formula 20 or, simply, KR-20.
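A minimal sketch of the coefficient alpha computation, using a small hypothetical examinee-by-item score matrix (with dichotomous 0/1 items, the same value corresponds to KR-20):

def variance(values):
    # Population variance of a list of numbers.
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(score_matrix):
    # score_matrix: one row per examinee, one column per item.
    k = len(score_matrix[0])                                              # number of items
    item_variances = [variance([row[i] for row in score_matrix]) for i in range(k)]
    total_variance = variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 examinees answering 4 items on a 1-5 scale.
scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [1, 2, 2, 1],
    [4, 3, 4, 4],
]
print(round(cronbach_alpha(scores), 3))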

Interscorer. Some tests leave a great deal of judgement to the examiner in the
assignment of scores like projective techniques and tests on moral development and creativity. In
this method a sample of tests is independently scored by two or more examiners and scores for
pairs of examiners are then correlated.

Validity
The Standards for Educational and Psychological Testing define validity as “the degree to
which evidence and theory support the intended interpretation of test scores for the proposed
purpose” (AERA, APA, NCME, 1999, p.11). A test is valid to the extent that inferences made
from it are appropriate, meaningful, and useful. In other words, validity describes how
adequately a test measures the attribute it is designed to measure.
Traditionally, the different ways of accumulating validity evidence have been grouped
into three categories:
Content validity. Content validity is determined by the degree to which the questions,
tasks, or items on a test are representative of the universe of behaviour the test was designed to
sample. The items of a test can be visualised as a sample drawn from a larger population of
potential items that define what the researcher really wishes to measure. If the sample (specific
items on the test) is representative of the population (all possible items), then the test possesses
content validity. For example, is there an appropriate representation of questions from each topic
area on the assessment that reflects the curriculum that is being taught?
Criterion-related validity. Criterion-related validity is demonstrated when a test is
shown to be effective in estimating an examinee’s performance on some outcome measure. In
this context, the variable of primary interest is the outcome measure, called a criterion. The test
score is useful only insofar as it provides a basis for accurate prediction of the criterion. For
example, a college entrance exam that is reasonably accurate in predicting the subsequent grade
point average of examinees would possess criterion-related validity.
Construct validity. A construct is a theoretical, intangible quality or trait in which
individuals differ (Messick, 1995). Examples of constructs include leadership ability,
overcontrolled hostility, depression, and intelligence. A test designed to measure a construct
must estimate the existence of an inferred, underlying characteristic (e.g., leadership ability)
based on a limited sample of behaviour. Construct validity refers to the appropriateness of these
inferences about the underlying construct.
Norms
Norms indicate an examinee’s standing on the test relative to the performance of other persons of
the same age, grade, sex and so on. It is important to note that several different groups may be
used in providing normative information for interpreting test scores. There are national, local and
subgroup norms. This essentially means that no single population can be regarded as the norm
group, and that a wide variety of norm-based interpretations could be made for a given raw score,
depending on which normative group is chosen.
Provided that they are of sufficient size and fairly representative of their categories,
subgroups can be formed in terms of sex, occupation, ethnicity, socio-economic level, education
level or any other variables that may have a significant impact on test scores or yield
comparisons of interest. The types of norms can be as follows.
National Norms. National norms are derived from a normative sample that was
nationally representative of the population at the time the norming study was conducted. Norms
for group ability tests and large achievement test batteries used in school settings are usually
national in scope.
Local Norms. Typically developed by test users themselves, local norms provide
normative information with respect to the performance of a more narrowly defined population on
some test such as the employees of a particular company or the students of a certain university.
Subgroup Norms. When large samples are gathered to represent a broadly defined
population, norms can be reported in the aggregate or can be separated into subgroup norms.
Provided they are of sufficient size and fairly representative of their categories, subgroups
can be formed in terms of sex, occupation, ethnicity, socio-economic level, education level or any
other variable that may have a significant impact on test scores or yield comparisons of interest.
Age Norms. Age equivalent scores, also known as age norms, depict the level of test
performance for each separate age group in the normative sample. The purpose of age norms is
to facilitate same-age comparisons. With age norms, the performance of an examinee is interpreted
in relation to standardisation subjects of the same age. Age norms can be developed for any
characteristics that systematically change with age such as vocabulary, mathematical ability,
moral reasoning etc.
Grade Norms. Grade equivalent scores, also known as grade norms, are conceptually
similar to age norms. A grade norm depicts the level of test performance for each separate grade
in the normative sample. Grade norms are rarely used with ability tests. However, these norms are
especially useful in school settings when reporting the achievement levels of schoolchildren.
Within-group Norms. Within-group norms can be described as a test-scoring method. It
is the most common normative strategy for testing. This type of scoring is very common in
psychological and intelligence measures. A test is given to a group of individuals and their
results are used to create a normal distribution. This distribution of scores is used as a normative
group in which to compare and score people who take the test. The within-group norms can be,
percentile norms, standard scores, stanines and stens.
Percentile norms express the percentage of cases in the standardisation sample who
scored below a specific raw score. For example, if 94 percent of the sample fell below a raw
score of 25, we can say that a raw score of 25 corresponds to a percentile rank of 94. This will be
denoted as P94 = 25. The 50th percentile (Q2) corresponds to the median and the 25th and the
75th percentile are known as the first and third quartile points (Q1 and Q3). Percentiles make no
assumption with regard to the characteristics of the total distribution. Thus they can be
interpreted easily when the distribution of test scores is non-normal.
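A short Python sketch of the percentile-rank logic described above, using a hypothetical normative sample (the scores and the raw score of 25 are illustrative assumptions, not norms from any real test):

# Hypothetical normative sample of raw scores.
norm_sample = [11, 14, 15, 17, 18, 18, 19, 20, 21, 22, 23, 23, 24, 25, 27]

def percentile_rank(raw_score, sample):
    # Percentage of cases in the normative sample that scored below the raw score.
    below = sum(1 for s in sample if s < raw_score)
    return 100 * below / len(sample)

print(percentile_rank(25, norm_sample))   # percentile rank of a raw score of 25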
The standard score indicates the position of raw scores, relative to the mean of the
distribution, in standard deviation units. Unlike percentile ranks, standard scores represent
measurement on an interval scale. Standard scores are obtained by linear transformation of the
data. The distribution of standard scores has exactly the same shape as the distribution of raw
scores. Therefore, the relative magnitude of differences between successive values correspond
exactly to that between the raw scores. One of the most familiar standard scores is the z score.
The z score has a mean of 0 and a standard deviation of 1. The z score is extremely useful
because it indicates each person’s standing as compared to the group mean. Also, when the
distribution of raw scores is reasonably normal, it can be directly converted into a percentile. T
score is a variant of z score, suggested by McCall (1922). It is exactly the same as the z score
except that the mean is 50 rather than 0 and the standard deviation is 10 rather than 1.
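A brief Python sketch, using hypothetical raw scores, of the linear transformations that produce z scores and T scores:

from statistics import mean, pstdev

# Hypothetical raw scores from a standardisation sample.
raw_scores = [42, 55, 61, 47, 58, 50, 66, 39]

m, sd = mean(raw_scores), pstdev(raw_scores)     # group mean and standard deviation
z_scores = [(x - m) / sd for x in raw_scores]    # mean 0, standard deviation 1
t_scores = [50 + 10 * z for z in z_scores]       # mean 50, standard deviation 10

for raw, z, t in zip(raw_scores, z_scores, t_scores):
    print(f"raw={raw}  z={z:+.2f}  T={t:.1f}")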
Stanines were originally devised by the U.S. Air Force during World War II. The
“standard nine,” or stanine scale divides the distribution of scores into nine groups, and
transforms all the scores into single-digit numbers from 1 to 9. The mean of the stanine scale is 5
and its standard deviation is approximately 2. Except the ranks of stanine 1 (lowest) and 9
(highest), each unit is equal to one half of a standard deviation.
The Sten (standard ten) is a standard score system that is conceptually similar to the
stanine scale. Stens divide the score scale into ten units. Each unit has a band width of half a
standard deviation except the highest unit (Sten 10) which extends from 2 standard deviations
above the mean, and the lowest unit (Sten 1) which extends from 2 standard deviations below the
mean. The mean of the sten scale is 5.5 and its standard deviation is approximately 2.
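A small sketch of how z scores could be mapped onto the stanine and sten scales described above; the band boundaries follow the half-standard-deviation units just given, and the specific cut logic is an illustrative assumption rather than a prescribed conversion table:

import math

def to_stanine(z):
    # Stanine 5 spans z = -0.25 to +0.25; each band is half a standard deviation wide;
    # stanines 1 and 9 are open-ended.
    return max(1, min(9, math.floor((z + 0.25) / 0.5) + 5))

def to_sten(z):
    # Sten bands are half a standard deviation wide around a mean of 5.5;
    # stens 1 and 10 are open-ended.
    return max(1, min(10, math.floor(z / 0.5) + 6))

for z in [-2.3, -1.1, -0.2, 0.0, 0.4, 1.3, 2.6]:   # illustrative z scores
    print(f"z={z:+.1f}  stanine={to_stanine(z)}  sten={to_sten(z)}")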
Types of Tests
On the Basis of Number of People
Group Tests. A Group Test consists of tests that can be administered to a large group of
people at one time. Group tests were designed as mass testing instruments; they not only permit
the simultaneous examination of large groups but they also use simplified instruction and
administration procedures, thereby requiring a minimum of training on the part of the examiner.
Most testing today is administered in group form because of the many benefits that are
associated with these tests. Considering the many standardised tests that are administered each
year, it is understandable that many of these are group tests. Examples of group tests include
statewide testing of K-12 students, placement examinations into college, and placement
examinations into graduate coursework. There are many advantages to group tests over
individual tests. Group tests are much more time-efficient in many aspects. For example, group
tests are administered to many people at once; to test each person individually is unrealistic.
Group tests also are much easier to score because they are predominantly multiple choice. More
time is taken to score short answer or essay-based questions, but these are still much quicker than
scoring each person's individual answers. Scoring is also more objective and more reliable since
subjectivity of the grader is not as prevalent. These tests are also more cost-efficient since they
don't require expensive materials or extensive training of administrators.
Individual Tests. An individual test is a test that can be administered to only one person
at a time. The examiner gives instructions and records the examinee’s responses using a
standardised approach outlined in the test manual. The examiner then assesses and scores the
responses. This scoring procedure usually involves considerable skill. For example: Stanford
Binet Intelligence Test, Wechsler Scales. The advantage of individual tests is that they are often
more comprehensive, valid, and have better psychometric properties than group tests. They are
helpful in determining a person’s unique attributes and allow individualised interpretation of test
results. Individual tests, however, are more expensive due to the necessity for a qualified
administrator.
On the Basis of Degree of Difficulty
Speed Tests. A type of test used to calculate the number of problems or tasks the
participant can solve or perform in a predesignated block of time. The participant is often, but
not always, made aware of the time limit. Speed tests are designed to assess how quickly a test
taker is able to complete the items within a set time period. The primary objective of speed tests
is to measure the person's ability to process information quickly and accurately while under
time pressure. Speed tests contain more items than the vast majority of applicants will be able to answer
in the time allotted, and the items are usually not high in difficulty. Scoring is based on how
many questions are answered by the applicant within the time limit. Often these tests are used by
human resource professionals and I/O Psychologists during the hiring process. Example of a
speed test is the Clinical Speed and Accuracy Test.
Power Tests. A type of test intended to calculate the participant’s level of mastery of a
particular topic under conditions of little or no time pressure. The test is designed so that items
become progressively more difficult. Thus, power tests are designed to gauge the knowledge of
the test-taker. A score on the power test depends entirely upon the numbers of items answered
and answered correctly. Raven’s Progressive Matrices (Raven & Court, 1998) is an example of
power test.
On the Basis of Culture Fairness
Culture Fair Tests. Culture-fair (or culture-free) tests are those that are relatively free of
specific cultural influences of the test designer and administrator. Items are designed to measure
innate abilities not affected by culture. Examples: maze tests and block design tests.
Culturally Loaded Tests. These types of tests are designed for a specific population and
show biased results for a specific group, culture, and population due to cultural influence. A
particular population influenced by cultural elements displays either low or high scores relative to
the test norms.
On the Basis of Attribute and Purpose
Intelligence Tests. Intelligence refers to the global mental capacities of an individual,
and tests of intelligence essentially measure rational and abstract thinking of an individual. They
are designed to measure the global mental capacities of an individual in terms of verbal
comprehension, perceptual organisation, reasoning etc. The purpose is usually to determine the
subject’s suitability for some occupation or scholastic work. Example of the most commonly
used Intelligence test is Wechsler Adult Intelligence Scale (WAIS).
Achievement Tests. Achievement refers to a person’s past learning, and achievement
tests are designed to measure a person’s past learning or accomplishment in a task. The Stanford
Achievement Test by Gardner and Madden (1969) is an example of Achievement test. The
distinction between aptitude and achievement tests is more a matter of use than content (Gregory
1994). In fact, any test can be an aptitude test to the extent it helps in predicting future
performance. Likewise, any test can be an achievement test to the extent it measures past
learning and measures a person's degree of success, or accomplishment in a subject or task.
Aptitude Tests. Aptitude refers to an individual’s potential to learn a specified task under
provision of training. Aptitude tests are designed to measure the subject’s capability of learning
specific tasks or acquiring specific skills. SAT (Scholastic Aptitude Test), Seashore Measure of
Musical Talent, Guilford and Zimmerman Aptitude Survey, General Aptitude Test Battery etc are
some examples of aptitude tests.
Personality Tests. These tests are designed to measure a person’s individuality in terms
of his unique traits and behaviour. These tests help in predicting an individual’s future behaviour.
They come in several varieties like checklists, inventories and subject evaluation techniques,
inkblot and sentence completion tests. Personality tests can broadly be classified further into two
categories –structured personality tests and unstructured personality tests.
Structured Personality Tests are based on the premise that there are common dimensions
across all personalities which can be measured with the help of a psychological test in an
objective manner. In such tests, responses are already defined and the testee has only to choose
one of the options in the form of his responses. Tests coming in this category are 16PF, MMPI,
Maudsley Personality Inventory (MPI), and so on.
Unstructured Personality Tests, on the other hand, assume idiosyncratic, individual-
specific needs, which are discovered and measured by analysing the responses given by the
testee on the presentation of ambiguous stimuli. These tests are based on the rationale that a
test-taker reacts to a vague or an ambiguous stimulus by projecting own feelings, thoughts,
experiences and memories. The responses given by the client indicate different facets of the
personality dimensions. Examples of unstructured personality tests are projective tests like
Thematic Apperception Test (TAT), Rorschach Inkblot Test etc.
Interest Inventories/Tests. Measure an individual's preference for certain activities or
topics and thereby help determine occupational choice. Examples of Interest inventories include
the Strong Interest Inventory, the Campbell Interest and Skill Survey, and the Myers Briggs Type
Indicator (MBTI).
Creative Tests. Creativity refers to a person’s ability to think of new ideas and creativity
tests are designed to measure a person’s ability to produce new and original ideas, and the
capacity to find unexpected solutions to vaguely defined problems. Examples of creativity tests
are the Torrance Test of Creative Thinking by E. Paul Torrance (1966) and the Creativity Self
Report by Feldhusen (1965).
Neuropsychological Tests. Measure cognitive, sensory, perceptual, and motor
performance to determine the extent, locus, and behavioural consequences of brain damage.
Behavioural Procedures. Objectively describe and count the frequency of a behaviour,
identifying the antecedents and consequences of the behaviour. Some behavioural assessments
include Vineland Adaptive Behaviour Scales, Conners Parent and Teacher Rating Scales, and
Behaviour Assessment System for Children (BASC), among others.

Applications of Testing
By far the most common use of psychological tests is to make decisions about persons.
For example, educational institutions frequently use tests to determine placement levels for
students, and universities ascertain who should be admitted, in part, on the basis of test scores.
State, federal, and local civil service systems also rely heavily on tests for purposes of personnel
selection. But simple decision making or hiring is not the only function of psychological testing.
It is convenient to distinguish five uses of tests:
Classification. The term classification encompasses a variety of procedures that share a
common purpose: assigning a person to one category rather than another. Thus, classification can
have important effects such as granting or restricting access to a specific college or determining
whether a person is hired for a particular job. There are many variant forms of classification,
each emphasising a particular purpose in assigning persons to categories. We will distinguish
placement, screening, certification, and selection.
Diagnosis and Treatment Planning. Diagnosis consists of two intertwined tasks:
determining the nature and source of a person’s abnormal behaviour, and classifying the
behaviour pattern within an accepted diagnostic system. Diagnosis is usually a precursor to
remediation or treatment of personal distress or impaired performance. Psychological tests often
play an important role in diagnosis and treatment planning. For example, intelligence tests are
absolutely essential in the diagnosis of intellectual disability (formerly termed mental retardation). A proper diagnosis conveys information about strengths, weaknesses, etiology, and the best choices for remediation or treatment.
Self-knowledge. In some cases, the feedback a person receives from psychological tests
can change a career path or otherwise alter a person’s life course. Of course, not every instance
of psychological testing provides self-knowledge.
Program Evaluation. Another use for psychological tests is the systematic evaluation of
educational and social programs. We focus here on the use of tests in the evaluation of social
programs. Social programs are designed to provide services that improve social conditions and
community life.
Research. Tests also play a major role in both the applied and theoretical branches of
behavioural research. As an example of testing in applied research, consider the problem faced
by neuropsychologists who wish to investigate the hypothesis that low-level lead absorption
causes behavioural deficits in children. The only feasible way to explore this supposition is by
testing normal and lead-burdened children with a battery of psychological tests.
Limitations of Testing
Uncertainty of Measurement. Because psychological tests attempt to measure attributes that are not directly observable, there is always a gap between what a test is attempting to measure and what it actually measures. Since tests often rely on indirect measures, such as an individual responding to hypothetical situations, the decisions people make in testing situations are not always the same as those they would make in real-life situations.
Changing Circumstances. Because of changes in psychological theories and advancements in technology, psychological tests remain relevant only for a limited time. Social or cultural changes can make test items obsolete, or new psychological theories may replace the theories on which the tests were founded. To remain valid and reliable, psychological tests must be updated often.
Cultural Bias. Psychological tests often use the dominant middle-class culture as the standard. This limits their validity for individuals from a different economic or cultural background who may not have had the experiences that the test assumes to be standard. It is nearly impossible to create test questions that account for the different experiences of individuals, so test administrators must interpret results with caution.
Language Bias. Most psychological tests are standardised in English and test results are
often not accurate for people who speak another language. Even when tests are translated into
native languages, problems occur with words that have multiple meanings and idioms specific to
one language or culture.
Inappropriate Standardisation Samples. Tests are often standardised on specific normative groups. Minorities are frequently under-represented in norming samples, so the resulting norms may not allow accurate interpretation of scores for members of those groups.
Examiners' Bias. Examiners who speak only standard English may intimidate examinees from minority or other language backgrounds and may communicate with them inaccurately, spuriously lowering their test scores. The sex, experience, or race of the examiner may also affect test scores.
Inequitable Social Consequences. According to some authors, the unequal results of
standardised tests produce inequitable social consequences. Low test scores relegate minority
group members, already at an educational and vocational disadvantage, to educational tracks that
lead to mediocrity and low achievement. They may also be denied employment or be subjected
to other forms of discrimination.
Stereotype Threat. Labelling or stereotyping is another example of the social consequences of psychological testing. Stereotype threat is the threat of confirming, as a characteristic of oneself, a negative stereotype about one's group. For example, based on published data and media coverage about race and IQ scores, African Americans have been stereotyped as possessing less intellectual ability than others. As a consequence, whenever they encounter tests of intelligence or academic achievement, individuals from this group may perceive a risk that they will confirm the stereotype.
Ethics in Research and Psychological Testing
Ethics refers to the correct rules of conduct necessary when carrying out research. We have a moral responsibility to protect research participants from harm. However important the issues under investigation, psychologists need to remember that they have a duty to respect the rights and dignity of research participants. This means that they must abide by certain moral principles and rules of conduct. The purpose of these codes of conduct is to protect research participants, the reputation of psychology, and psychologists themselves.
Rapport Formation
Before testing or data collection begins, the examiner should establish rapport, that is, a comfortable and trusting relationship with the participant, so that the testee feels at ease and can respond honestly and to the best of his or her ability.
Voluntary Participation
All ethical research must be conducted using willing participants. Study volunteers should not
feel coerced, threatened or bribed into participation. This becomes especially important for researchers
working at universities or prisons, where students and inmates are often encouraged to participate in
experiments.
Informed Consent
Whenever possible, investigators should obtain the consent of participants. In practice, this
means it is not sufficient to simply get potential participants to say “Yes”. They also need to know
what it is that they are agreeing to. In other words, the psychologist should, so far as is practicable,
explain what is involved in advance and obtain the informed consent of participants.
Debriefing
After the research is over, participants should be able to discuss the procedure and the findings with the psychologist. They must be given a general idea of what the researcher was investigating and why, and their part in the research should be explained. Participants must be told if
they have been deceived and given reasons why. They must be asked if they have any questions and
those questions should be answered honestly and as fully as possible.
Sharing the Results of the Study
After research results are published, psychologists do not withhold the data on which their
conclusions are based from other competent professionals who seek to verify the substantive claims
through reanalysis and who intend to use such data only for that purpose, provided that the
confidentiality of the participants can be protected and unless legal rights concerning proprietary data
preclude their release.
Confidentiality of Data
Participants and the data gained from them must be kept anonymous unless they give their full
consent. No names must be used in a lab report.
References
American Psychological Association. (n.d.). Speed test. In APA dictionary of psychology. https://dictionary.apa.org/speed-test
Arnout, B. A. (2020, November 1). Ethnographic research method for psychological and medical studies in light of COVID-19 pandemic outbreak: Theoretical approach. Wiley Online Library. https://onlinelibrary.wiley.com/doi/full/10.1002/pa.2404
Chiang, I. A. (2015). Scientific research in psychology. In Research methods in psychology (2nd Canadian ed.). Pressbooks. https://opentextbc.ca/researchmethods/chapter/scientific-research-in-psychology/
Ciccarelli, S., & White, N. (2017). Psychology: An exploration (4th ed.). Pearson.
Cramer, D. D. H. (2022). Introduction to statistics in psychology (3rd ed.). Prentice Hall.
Fleetwood, D. (2021). What is a longitudinal study? Definition and explanation. QuestionPro. https://www.questionpro.com/blog/longitudinal-study/
Gregory, R. J. (2022). Psychological testing: History, principles, and applications (6th ed., international edition) [E-book].
Holt, N., Bremner, A., Sutherland, E., Vliek, M., Passer, M., & Smith, R. (2019). Psychology: The science of mind and behaviour (4th ed.). McGraw-Hill Education.
Kantowitz, B. H., Roediger, H. L., & Elmes, D. G. (2014). Experimental psychology (10th ed.). Cengage Learning.
Mishra, L. (2016, June). Focus group discussion in qualitative research. TechnoLearn, 6(1), 1-5.
Neuman, W. L. (2009). Social research methods: Qualitative and quantitative approaches (7th ed.). Pearson.
Patino, E. (2021). Types of behavior assessments. Understood. https://www.understood.org/articles/en/types-of-behavior-assessments
Power test. (n.d.). In Psychology glossary. AlleyDog.com. https://www.alleydog.com/glossary/definition.php?term=Power+Tests
Speed test. (n.d.). In Psychology glossary. AlleyDog.com. https://www.alleydog.com/glossary/definition.php?term=Speed+Tests
Streefkerk, R. (2022, February 7). Qualitative vs. quantitative research. Scribbr. https://www.scribbr.com/methodology/qualitative-quantitative-research
Wikipedia contributors. (2021, December 18). Psychological research. Wikipedia. https://en.wikipedia.org/wiki/Psychological_research