Psych Assessment
Basic Concepts
A test is one of the many tools used in the field of psychology. It can be either a device or a technique
that allows behavior to be quantified or predicted. The term testing, meanwhile, refers to the process
of using a test, such as its administration. Putting a modifier before the word test changes its
meaning. The term, though, has become obsolete in practice, and practitioners
prefer to use the term assessment.
When we use the term psychological testing, we refer to the systematic procedure of gathering
samples of behavior related to cognitive or affective functioning (Urbina, 2004). The data collected
from this process are used as a basis for establishing the standards of the test or tool.
Psychological testing pertains to the use of tests to evaluate an individual. Psychological assessment,
on the other hand, integrates the data collected from various assessment tools to arrive at an
evaluation. It was mentioned earlier that a test helps quantify behavior, and quantification involves
measurement: numbers are assigned to objects, in this case behavior and personality traits,
according to a certain set of rules (Christensen, 1991).
Psychological testing and assessment allow behavior to be scaled, something once thought
impossible, since these processes utilize tests that give us an idea of, say, a person's IQ or level of
aggression. This makes the data objective and leads to a less biased judgment of an individual, that
is, an evaluation. Evaluation is more extensive than measurement, as inferences are drawn from
various assessment procedures.
For a better understanding of the aforementioned concepts, please refer to the Table below (adapted
from Cohen & Swerdlik, 2010).
In the 1800s, the field of psychology was blossoming, but some concepts were still vague. It was in 1838
that Jean Esquirol differentiated mental illness from mental retardation, but a test to measure the
latter came only in 1905, when Alfred Binet and his partner, Theodore Simon, developed the first
intelligence test on commission from the French government. At that time, theories, empirical
studies, and measurements were still being developed.
In 1904, Charles Spearman postulated that intelligence is made up of a single g factor (a general
factor) and a number of s factors (specific factors). In the same year, Karl Pearson contributed
correlation measures, which are an immense help in testing.
In 1916, the Binet-Simon test came to the United States and was renamed the Stanford-Binet
Intelligence Test after Lewis Terman and his team from Stanford University revised and reformed it.
The following year, the Army Tests were developed under the leadership of Robert Yerkes. He and his
team made the Army Alpha, a verbal test for native English-speaking recruits, and the Army Beta, a
non-verbal test for immigrant recruits, during the First World War. Building on this selection process,
Robert Woodworth developed the Personal Data Sheet in 1918, drawing on soldiers' answers to
questionnaires (Cohen & Swerdlik, 2017).
Later on, he made the Woodworth Psychoneurotic Inventory for civilian test takers, considered by
some to be the first personality test (Santos & Pastor, 2009). The Rorschach Inkblot Test came in
1920, developed by Hermann Rorschach, a Swiss psychiatrist. It was not complete, however, as it
lacked a clear scoring system; Rorschach passed away before finishing it. This did not become a
hindrance, as the inkblot test remains one of the most popular projective tests and is widely
researched. Aside from this, Henry Murray and Christiana Morgan developed another projective
technique, the Thematic Apperception Test.
As psychological tests gained popularity, many tests were published, and by 1921 the Psychological
Corporation, the first test publisher, had been founded by Cattell, Thorndike, and Woodworth
(Santos & Pastor, 2009).
Some five years later, the SAT, or Scholastic Aptitude Test, was made and published by the College
Entrance Examination Board, marking the development of tests exclusive to the educational setting.
One year later, the Vocational Interest Blank was published in its first edition. Over the years,
development in the field of testing continued. Theories and ideas were researched and written, and
these supplemented test development. A few notable developments were: the publication of Lauretta
Bender's Bender Visual Motor Gestalt Test in 1938, which can detect organic problems in a person;
the Wechsler-Bellevue Intelligence Scale by David Wechsler in 1939, later revised into the Wechsler
Adult Intelligence Scale and the Wechsler Intelligence Scale for Children in 1949; the development
and publication of the Minnesota Multiphasic Personality Inventory in 1942; and the introduction of
the coefficient alpha to measure the internal consistency, or reliability, of tests and other
assessment methods.
The history of tests and testing is a story of discovery and improvement. With the existing ideas and
theories of earlier theorists and developers, new tests and statistical measures are developed to
improve the quality of tests that society will be using.
Assessment Process
As a general practice, the process of assessment starts with a referral for assessment from a source
such as a teacher, guidance counselor, social worker, judge, or human resource recruiter. Referral
questions guide the assessor on what needs to be checked in the assessee. Some examples of referral
questions are the following: the mental age of the child, the capability of an employee to handle a
managerial position, grounds for annulment cases, and "Does this person have a substance use disorder?"
The assessor will conduct a formal assessment in order to clarify or rule out the reason for the
referral. Hence, he/she will choose assessment tools suitable for the assessee's situation. In sensitive
cases, such as a referral for an annulment, the tool selection process may be informed by research
done in preparation for the assessment. After appropriate instruments or procedures are selected,
the formal assessment begins. Afterward, the assessor writes a report of the findings designed to
answer the referral question. Further feedback sessions with the assessee and/or interested third
parties (such as the assessee's parents and the referring professional) may also be scheduled.
1. Theoretical Orientation
As cited by Groth-Marnatt in 2010, Haynes, Richard, & Kubany (1995) emphasized that clinicians
should study the construct that a test is supposed to measure and how the test approaches that
construct. Usually, this information is easily found in the test manual. Careful examination of the
individual items will help the clinician understand and obtain meaningful information about the
construct being measured. An example of a psychological test with a strong theoretical orientation is
the Revised NEO PI-R, based on the Five-Factor Theory of McCrae and Costa.
2. Practical Considerations
Before using a test, a number of practical considerations about the context and manner of use should
be examined. First is the appropriateness of the test to the examinee's educational level (e.g., reading
skills). Imagine administering an IQ test that requires at least a high school reading level to a group
of examinees who cannot read (as can happen in prison settings)! The examinee must be able to
read, comprehend, and respond appropriately to the test; otherwise, the results will be useless.
Second, the length of the test should be considered, too, as it may cause boredom, fatigue, and
frustration on the part of the examinees, which in turn will affect the quality of data gathered.
Administering short forms of the test may reduce these problems, provided these forms have been
properly developed and are treated with appropriate caution (Groth-Marnatt, 2010).
Lastly, a clinician should be honest about how knowledgeable and competent he/she is in
administering and interpreting the instrument. If further training is necessary, a plan must be made
to obtain it.
These tools are developed in a manner that lets them measure what they are supposed to measure
(validity) and give consistent results (reliability).
Tests are used in various settings, and these tools have proved very useful in many ways:
Clinical Setting
It is in the clinical setting where tests, psychological tests in particular, are most often encountered.
This is not surprising, as tests and other assessment procedures, like the clinical diagnostic interview
and behavioral assessment, are utilized to help clinicians (psychologists and psychometricians) come
up with a proper diagnosis. These detect intellectual disabilities as well as emotional and behavioral
instability, which are important facets of one's personality. Aside from diagnosis, these tools enable
the identification of suitable interventions for the client, like counseling, psychotherapy, or behavior
therapy.
Industrial Setting
In an industrial setting, these assessment devices pave the way for an easier selection process. Tests
aid HR practitioners in finding the right person for the right job, and they also allow promotion and
training of employees to be done efficiently and objectively. As tests are a means to measure the
skills and capabilities of employees, the system of evaluating performance becomes more objective
and less biased. The results from these tests can help in the development and planning of a good
training program and in checking whether such programs are effective.
Test reviewers
These are people in the same or related fields who evaluate developed tests based on the tools'
theoretical, empirical, and psychometric merits.
Test Users
Their role is to select a test that is appropriate for the purpose of testing. This can be a challenge,
since thousands of tests are published annually. These people can also be the test administrators,
scorers, and interpreters, depending on their qualifications and training.
As test administrators, they are required to know the process of giving the test, whether to groups or
individuals. As test scorers, they have to obtain the raw scores from the tests and transform these
into interpretable scores through objective processes and evaluative judgments. The scores then have
to be interpreted in a manner understandable to other professionals and disciplines. The
interpretation also needs to be clear and informative, based on the test results, so it can be utilized
for decision-making.
Test takers
They are the ones for whom the tests are made, in order to measure a specific facet of their personality.
2. Tests
As mentioned earlier, these are tools devised to measure particular variables. The types also vary
depending on where they are used. Tests in education are sometimes referred to as psychoeducational
tests; they measure not classroom achievement but individual skills, with aptitude and intelligence
tests determining the intellectual functioning of a person. Human resources likewise utilizes various
tests, both those that measure skills and those that measure personality. In the clinical setting, we
see more focused tests: structured and projective personality tests. Both measure personality traits,
which aid in diagnosis and clinical evaluation.
3. Behavioral Observation
This is another assessment tool, one that can substitute when other tools cannot be utilized.
It uses naturalistic observation, interviews, and rating scales to better understand how and why a
person behaves as he/she does. It is very useful in an industrial setting when there is a need to choose
an employee with certain abilities required to perform a job (Cohen & Swerdlik, 2017).
4. Other Tools
Aside from the aforementioned tools, there are several more that can be utilized in assessment
procedures; among them is case history data. Case studies were prevalent during the development of
psychology, and they were often the only source of information for the practitioner. In the modern
period, however, case history data are utilized together with other tools to understand the factors
that contributed to the person's past as well as present functioning.
Principle A: Competence
In psychological testing and assessment, we have two practitioners, the psychometrician and
the psychologist. They are expected to fulfill their roles as professionals and do their jobs within their
training and education, knowing their professional limitations and boundaries.
Principle B: Integrity
Practitioners strive to make objective assessments and evaluations based on the data that they
have gathered and analyzed. Unbiased evaluation reports are the end goal; thus, practitioners utilize
tools with high reliability and validity, as these allow them to achieve it.
Principle C: Professional and Scientific Responsibility
In the practice of psychology, it is not surprising to collaborate with professionals in other fields,
among them doctors, lawyers, teachers, and social workers. This is because professionals know the
limits of their job responsibilities, and in order to help the client in their care, they must work with
others in the client's best interest. Aside from this, they should know their tools well so as to be
confident of the tools' reliability and validity for the population with which they will be used.
The APA ethics code was amended in 2010 and 2016 in order to address changes and
development in practice as well as in society.
II: Nature and Uses of Psychological Test
Overview
In the first module, we learned about the basic concepts and principles of psychological tests and
assessment. In the field of psychological assessment, a test is one of the many tools used in
psychology, since it is a device or method that allows behavior to be quantified or predicted.
Module 1 also discussed the major differences between testing and assessment. When we say testing,
the main objective is to obtain a numerical estimate of an ability or attribute (e.g., kindness,
industriousness). Assessment, in contrast, is typically performed to answer a referral question (Does
the patient have a substance use disorder?), solve a problem (e.g., an intervention for speech delay),
or arrive at a decision through the use of evaluation tools (e.g., insanity pleas in court).
Likewise, we can now recount the brief history of testing and assessment, and history shows that
psychological tests have evolved in a complicated environment in which hostile and friendly forces
have produced a balance characterized by innovation and a continuous quest for better methods
(Kaplan & Saccuzzo, 2013).
Now that we know the basic concepts and principles of psychological testing and assessment, let's
study the nature and uses of psychological tests.
1. Standard Procedure
An important characteristic and requirement of any psychological test is a uniform administration
procedure.
2. Behavior Sample
Apruebo (2010) explained that a psychological test is a limited sample of behavior. This means that,
in a short period of time, a psychological test enables a clinician to gather data about a person's
behavior. This sample of behavior allows the clinician to make inferences and interpretations about
the total domain of relevant behavior. For example, using an intelligence test such as the Purdue
Non-Language Test helps the clinician determine the intellectual functioning of his/her client.
3. Scores/Categories
Another important defining feature of a test is that it has scores or categories. A psychological test
should provide one or more scores, or indicate that a person belongs to one category and not
another. In simpler terms, a psychological test expresses performance in numbers or classifications.
4. Norms or Standards.
Another essential feature of a psychological test is possessing norms or standards.
Kaplan & Saccuzzo (2016) defined norms as the performances by defined groups on particular tests.
This means that norms consist of a summary of test results from a large and representative group of
subjects; a test score is interpreted by comparing it with the scores obtained by others on the same test.
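This idea of interpreting a score by comparing it with a norm group can be sketched as a percentile rank. A minimal illustration in Python; the scores and norm group below are entirely hypothetical:

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of the norm group scoring at or below the raw score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical norm group of ten test takers on the same test.
norm_group = [22, 25, 27, 30, 31, 33, 35, 38, 40, 44]

print(percentile_rank(33, norm_group))  # 60.0: scored as high as or higher than 60% of the group
```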
Standardized
A good test is standardized when it follows a uniform process of administration, scoring, and
interpretation, as well as the norms of the test. Standardization ensures that the same procedure is
given to everyone who takes the test, thus lessening the bias and mistakes that can affect the results.
This process covers how the test will be done: what materials will be utilized in the testing process,
the time limit, the directions and instructions given for the test, and other details that directly affect
the testing process (Santos & Pastor, 2009). All of these factors should be exactly the same for every
test taker.
Another aspect of a standardized test is its norms. Norms help the test user make sense of the results
of psychological tests, since these tests cannot be simply interpreted as pass or fail. Interpreting a
psychological test is different: the test user has to compare an individual's results with the scores of
other test takers on the same test. By standardizing the norms of a test, the results can be easily
understood in terms of how an individual differs on the measure compared to other people.
Reliability
A good measuring tool consistently measures a factor on different occasions. A good test is said
to be reliable when a person shows consistent scores on the same test taken at separate periods of
time. Suppose that you wanted to know how good a student's math skills are, so you gave him a
20-item test covering different areas of mathematics. You gave that test on three trials: in the first
trial, the student scored 15; on the second, seven; and on the last, 10. Given these scores, did the test
show reliability? No: the test was not reliable, seeing that the scores deviate in a random fashion. A
test needs to measure the facet it is meant to measure accurately in order to be useful, and one sign
of its reliability is that it yields consistent measurements at different times.
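Consistency of this kind is usually quantified as a correlation between scores from two administrations of the same test; a value near 1 indicates test-retest reliability. A minimal sketch in Python, with purely hypothetical scores for five students tested twice:

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores of five students on the same 20-item math test,
# administered twice, one week apart.
first = [15, 12, 18, 9, 14]
second = [14, 13, 17, 10, 15]

print(round(pearson_r(first, second), 2))  # 0.97: close to 1, so scores are consistent
```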
Validity
Reliability is not enough to say that a test is good; a test needs to be valid as well. This property
of psychometric soundness measures how well a tool assesses the factor it claims to measure. Let's
say that you are holding a test that you are told will measure knowledge of history. Knowing this, you
will expect questions or items relating to events that happened in the past. However, upon perusal,
you see items related to grammar rules and verb tenses. Do these items make the test valid?
For a test to be considered valid, it has to measure what it is intended to measure. Who, then,
deems a test valid? In the earlier example of the history test, the test has to be scrutinized by experts
in the field of history, since they are knowledgeable about the events that took place in the past.
These experts are also the ones who can assess whether the test covers enough areas to adequately
measure historical knowledge (Cohen & Swerdlik, 2017).
Categories of Test
Apruebo (2010) maintained that a psychological test is typically categorized as either a
psychometric test or a projective test.
He defined the psychometric test as structured, voluntary, objective, and specifically designed
to measure intelligence, aptitude, and personality traits. The projective test, meanwhile, is the exact
opposite: an unstructured and subjective way of measuring covert (non-observable) or unconscious
characteristics of behavior, in which the results are discussed in a qualitative/reflective manner.
As cited by Apruebo in 2010, Campbell discussed the categories of tests in terms of the following
three (3) dimensions:
Adapted from Psychological Testing Volume 1 by Dr. Roxel A. Apruebo, RGC Copyright ©2010 Central Book
Supply Inc.
Testing Levels
It should be common practice that only qualified users/examiners handle the administration and
safekeeping of all test materials (including manuals, answer keys, reusable booklets, etc.). In the
hands of incompetent and unauthorized persons, tests will either be useless or may break a person's
life. Just imagine a court scenario in which a rapist receives a scot-free verdict due to an incompetent
psychological assessment report. As mentioned earlier, in assessment, every word has the power to
either make or break a person's life. Hence, the following are the guidelines for administering tests
according to test level, as elaborated by Apruebo in 2010:
Level A
The qualifications of the psychologist should include the following: undergraduate courses in
testing or psychometrics and adequate training in test administration. He/she can administer
paper-and-pencil tests: IQ, achievement, and aptitude tests, etc.
Level B
For this level, the psychologist should at least have completed an advanced course in testing
(graduate level) at a university, or its equivalent in training under the guidance and supervision of a
qualified clinical psychologist/psychological consultant. Under this level, the tests that can be
administered are the following: those under Level A, plus paper-and-pencil tests of personality such
as Sentence Completion Tests, the Personality Assessment Inventory, the 16 Personality Factor, and
the Wechsler Scales.
Level C
To administer and interpret Level C tests, psychologists should have an M.A. or Ph.D. and/or
equivalent experience in training and psychodiagnostics. The tests that can be administered and
evaluated at this level are the following: Level A and Level B tests, plus projective techniques such as
the Thematic Apperception Test, the Children's Apperception Test, and the Rorschach
Psychodiagnostic Test, to name a few.
2. Informed Consent
Informed consent refers to the mutual agreement between a professional and a particular
person and/or his/her legal representative. Under this agreement, permission is given to administer
psychological tests to the person and to obtain other information from that person for
evaluative/diagnostic purposes.
General Considerations
The previous module discussed the characteristics a good test should have. For a test to be
helpful to a clinician, it should measure what it intends to measure as accurately as possible. This
brings us to an important question: what factors do we need to consider before using a test to assess
someone psychologically?
Reliability
For a test to be considered suitable, it should, first, be reliable. The reliability of a test refers to the
“accuracy, precision, or consistency of a score obtained through the test” (Apruebo, 2010). Likewise,
Souza et al. (2017) mentioned that it should yield “a consistent result in time and space, or from
different observers, presenting aspects on coherence, stability, equivalence, and homogeneity.” This
means that across different times, different situations, and different test takers, a reliable test will
reproduce a stable score that measures a skill, knowledge, or domain consistently. In other words,
reliability addresses the degree to which a person's obtained score stays the same even if the person
retakes the same test on different occasions (Groth-Marnatt, 2010).
As an illustration, a K-Pop enthusiast decided to take an aptitude exam for the Korean language
without any preparation, relying on the phrases she had learned from the Korean series she binge-
watched. As a result, she utterly failed the exam. Frustrated with her score and with a strong desire
to learn the language, she enrolled in a review center with plans to retake the exam. After three
months, she retook the exam and scored higher than before. The Korean aptitude test is said to be a
good test because it consistently measures her aptitude based on her understanding of the subject. If
the test were not reliable, retaking it would only yield an increase or decrease in her score based
purely on chance.
However, Kaplan (2009) explained that errors of measurement cannot be avoided, as
discrepancies between true ability and measured ability are inevitable. Humans are bound to make
mistakes, and our goal is to lessen the error, to “keep testing errors within reasonably accepted
limits” (Groth-Marnatt, 2010). In other words, the error of measurement is an estimate of the range
of random changes that can be expected in a person's score.
In psychological assessment, error implies inaccuracy of measurement. Again, tests that are
"relatively free of error" (Kaplan, 2009) are considered reliable. How, then, do we know that a test is
“reliable”?
This is where reliability analysis enters: it examines whether the test provides a consistent
measure.
For example, in answering a personality inventory, you might encounter a statement such as "I would
rather read books than go out and party with people." Typical choices on such a test are Strongly
Agree, Agree, Neutral, Disagree, and Strongly Disagree. There is no right or wrong answer; you are
simply indicating where you stand on the range from agreeing to disagreeing with the statement.
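A common reliability analysis for an inventory like this is the coefficient alpha mentioned in the history section, which checks whether the items hang together (internal consistency). A minimal sketch with hypothetical 1-to-5 Likert scores, using population variance throughout:

```python
def cronbach_alpha(item_scores):
    """Coefficient alpha for a list of items, each a list of respondents' scores."""
    k = len(item_scores)          # number of items
    n = len(item_scores[0])       # number of respondents

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical 1-5 Likert scores: 3 items answered by 4 respondents.
items = [
    [4, 2, 5, 3],
    [5, 1, 4, 3],
    [4, 2, 5, 2],
]
print(round(cronbach_alpha(items), 2))  # 0.93: a value near 1 indicates high internal consistency
```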
Validity
In psychological assessment, it is important to use a test that measures what it intends to measure.
Just imagine taking your mid-term exam in Theories of Personality only to be answering trivia
questions such as "At what age did Sigmund Freud die?" or "Who coined the term 'schizophrenia'?"
"That is so unfair; the test is INVALID; it does not measure my knowledge of personality theories,"
you exclaim. A test that is valid for identifying personality traits should measure what it is intended
to measure and should also produce information useful to clinicians. Validity is the degree to which a
certain inference from a test is appropriate or meaningful. In layman's terms, a valid test measures
what it intends to measure.
Groth-Marnatt (2010) explained that even though an instrument/test can be reliable without being
valid, achieving a certain level of reliability is an important requirement for validity. Souza et al.
(2017) emphasized that a test that is not reliable cannot be valid; a reliable test, however, can
sometimes be invalid. Hence, high reliability does not guarantee a test's validity.
As cited by Apruebo in 2010, Nunnally & Bernstein said that validity has three (3) major meanings:
a. Construct Validity: measuring psychological domains.
b. Predictive Validity: establishing a statistical relationship with a particular criterion.
c. Content Validity: sampling from a pool of required content.
Types of Validity Methods
According to Translation Validity (Apruebo, 2010)
Face Validity
One night, while browsing the internet, you become bored and decide to try an English proficiency
test online. Some of the questions go like this: “A is for Apple, C is for ___?”, “How much wood would
the woodchuck chuck?”, and “Nan has 5 siblings: Bab, Beb, Bib, and _____.”
After item number 3, you decide to stop answering the test, feeling you have been duped; it is a waste
of time, since it clearly does not look like an English proficiency test. That is a classic example of
what face validity is all about.
Face validity refers to the appearance of the test; it pertains to the perceived purpose of the test.
In other words, “Does your test look like a test?”
For example, if you think you are answering an intelligence test because the items are abstract, then
we can say that the test has face validity.
Groth-Marnatt (2010) implied that face validity is not really a type of validity at all, as it does not
offer evidence to support conclusions drawn from test scores. Bear in mind, however, that it is
essential for a test to “look like” a valid test, as appearances can help motivate test takers, who can
see that the test is relevant.
Content validity
Say that, for example, you have an upcoming test in General Psychology. You have rigorously studied
your notes and book for that examination and know almost everything, only to find that the professor
has come up with trivial items that do not represent the content of the course. I know how hard that
moment is, which is why it is important for a test to have content validity.
Content validity refers to the degree to which the items of a test are a representative sample of a
universe of content (i.e., covering all the possible content areas of a construct). That is, it shows
whether the test includes comprehensive coverage of the construct, whether the test has been
adequately constructed, and whether the item contents and the domain they represent were
examined by experts.
An example of tests with high content validity is the board licensure examinations.
Predictive Validity
This type of validity measures how well a test's prediction agrees with subsequent and/or future
outcomes. A classic example is in the United States, where the SAT Critical Reading Test serves as
predictive validity evidence for college admissions tests, indicating how accurately the test forecasts
how well applicants will perform in college.
Concurrent validity
Say that you, as a newly hired Human Resource Specialist, are assigned to hire a chef for a Korean
eat-all-you-can buffet. You have already screened your applicants down to the three (3) with the
most impressive job experience. Since they appear to have the same qualifications, what will be your
tool for hiring the best chef among the three?
One way is to test potential employees on a sample of behaviors that represent the tasks to be
required of them. For example, Campion (as cited in Kaplan & Saccuzzo, 2015) found that the most
effective way to select maintenance mechanics was to obtain samples of their mechanical work.
Similarly, the best way to hire the chef is to require the applicants to create their best version of
Korean samgyupsal; the best way to showcase their skill is, of course, to cook!
The scenario above is a good instance of the use of concurrent validity. In short, concurrent validity
is the correlation between the test and a criterion when both are measured at the same point in time.
Convergent Validity
Convergent validity is demonstrated by significant and strong correlations between different
measures of the same construct.
For example, you decide to compare your newly constructed depression questionnaire, the Light
Scales, with Aaron Beck's Depression Inventory to see if there is a high correlation between the two
tests.
If the data you obtain show a high correlation, it suggests that the Light Scales indeed measure
depression.
Discriminant Validity
This refers to the extent to which a measure diverges from operationalizations of other constructs.
This means that when you test for discriminant validity, you should find a low correlation with tests
that measure constructs different from yours.
For example, just for the sake of discussion, a test entitled the Resilience Scale should not correlate
highly with Aaron Beck's Depression Inventory; if it did, it would mean that the Resilience Scale
measures the wrong construct, which is depression.
Validity Coefficient
The relationship between a test and a criterion is usually expressed as a correlation called a validity
coefficient. This coefficient tells the extent to which the test is valid for making statements about the
criterion.
Norms
This pertains to the performance of a particular reference group to which an examinee's score can be compared; in other words, a norm is the normal or average performance.
It can be expressed as the number of correct items, the time required to finish a task, the number
of errors committed, etc.
Apruebo (2010) strongly argued that raw scores are meaningless until they are evaluated against appropriate interpretative standard data or statistical techniques.
In short, a norm is a set of scores from a group of individuals to which the raw score from a psychological test is compared.
Usage of Norms
Psychological test manuals provide tables of norms to facilitate comparing both individuals and groups. Several methods for deriving more meaningful "standard scores" from "raw scores" have been widely adopted because they reveal the relative status of individuals within the group.
1. Measures of Central Tendency
1.1 Mean
Commonly known as the arithmetic average, the mean is computed by adding all the scores in the distribution and dividing by the number of scores.
1.2 Median
The median is the score that divides a distribution exactly in half: exactly 50% of the individuals in the distribution have scores at or below the median. The median is equivalent to the 50th percentile.
1.3 Mode
In a frequency distribution, the mode is the score or category that has the greatest frequency.
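The three measures of central tendency above can be computed directly with Python's standard library. The score set below is invented for illustration:

```python
# Minimal sketch: mean, median, and mode on a small, invented set of scores.
from statistics import mean, median, mode

scores = [70, 85, 85, 90, 60, 85, 75]

print(mean(scores))    # arithmetic average: sum of scores / number of scores
print(median(scores))  # middle score once the distribution is sorted
print(mode(scores))    # the score with the greatest frequency
```

Note that the mean is pulled by every score, while the median depends only on rank order and the mode only on frequency, which is why the three can disagree in skewed distributions.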
2. Frequency Distribution
A frequency distribution is an organized tabulation of the number of individuals located in each
category of the scale of measurement. It takes a disorganized set of scores and places them in order from
highest to lowest, grouping together all individuals who have the same score.
Personality Trait: Anxiety (ANX)     f      %
59T or less                         54     51.92
60T to 69T                          41     39.42
70T to 81T                           9      8.65
Total                              104    100.00
The table above is an example of a frequency distribution. It indicates that the majority of respondents' scores fall in the bracket of 59T or less; 54 people obtained scores in that range.
Adapted from Statistics for the Behavioral Sciences by Gravetter, Frederick J. & Wallnau, Larry B. Copyright
©2012 Wadsworth/Cengage Learning
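A frequency distribution with percentages, like the anxiety table above, can be tallied in a few lines. The category labels and counts below mirror that table but are reconstructed for illustration:

```python
# Sketch: tallying a frequency distribution and converting counts to percentages.
from collections import Counter

# One category label per respondent (counts taken from the table above)
responses = (["59T or less"] * 54) + (["60T to 69T"] * 41) + (["70T to 81T"] * 9)

freq = Counter(responses)
total = sum(freq.values())
for category, f in freq.items():
    print(f"{category}: f={f}, %={100 * f / total:.2f}")
```

The output reproduces the f and % columns of the table, e.g. 54 of 104 respondents (51.92%) in the lowest bracket.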
3. Shape of the Distribution
In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is a mirror image of the other. In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end.
The section where the scores taper off toward one end of the distribution is called the tail of the
distribution.
For example, in a very difficult exam, most scores tend to be low, with only a few individuals earning high
scores. This will produce a positively skewed distribution.
On the other hand, a very easy exam is inclined to produce a negatively skewed distribution, with most
of the students earning high scores and only a few low values.
4. Percentile Rank
The percentile rank of a particular score is defined as the percentage of individuals in the distribution with scores at or below that value. When a score is identified by its percentile rank, the score is called a percentile. A percentile describes your exact position within the distribution.
How to interpret percentile ranks:
0-5 %ile     Compartment 1 = Fail
6-10 %ile    Compartment 2 = Low Average
11-50 %ile   Compartment 3 = Below Average
51-85 %ile   Compartment 4 = Average
86-95 %ile   Compartment 5 = High Average
96-99 %ile   Compartment 6 = Excellent
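The definition above ("percentage of scores at or below the particular value") translates directly into code. The function name and score set here are hypothetical, chosen only to illustrate the computation:

```python
# Sketch of the percentile-rank definition: the percentage of individuals
# in the distribution with scores at or below a given value.
def percentile_rank(scores, value):
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 62, 68, 70, 75, 80, 85, 90, 95]
print(percentile_rank(scores, 75))  # 6 of 10 scores are at or below 75 -> 60.0
```

A score of 75 in this invented distribution therefore sits at the 60th percentile, which the interpretation table above would place in the Average compartment.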
5. Stanine System
a. Raw scores are transformed into nine groups ("standard nine").
b. One (1) is the lowest and nine (9) the highest.
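One common way to assign stanines is by percentile bands; the conventional band widths are 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the distribution. The sketch below assumes those standard bands; the function name is hypothetical:

```python
# Illustrative sketch: converting a percentile rank to a stanine using the
# conventional percentage bands (4, 7, 12, 17, 20, 17, 12, 7, 4).
def stanine(percentile):
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative upper bounds of bands 1-8
    for group, cut in enumerate(cutoffs, start=1):
        if percentile <= cut:
            return group
    return 9  # top 4% of the distribution

print(stanine(2))   # bottom 4% -> stanine 1
print(stanine(50))  # middle of the distribution -> stanine 5
print(stanine(98))  # top 4% -> stanine 9
```

The middle band (stanine 5) is the widest, covering the central 20% of scores, so most examinees cluster around it.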