Psychological Assessment
These tests are categorized according to the manner of administration, purpose, and nature.
Administration - Individual; Group
Item Format - Objective; Projective
Response Format - Verbal; Performance
Domain Measured - Cognitive; Affective

Types of Test
STANDARDIZED TESTS - have prescribed directions for administration, scoring, and interpretation. Examples: MBTI, MMPI, SB-5, WAIS
VERBAL TESTS - instruments that use words to measure a particular domain.
NONVERBAL TESTS - instruments that do not use words; instead, they use geometric drawings or patterns.
COGNITIVE TESTS - measure thinking skills.
AFFECTIVE TESTS - measure personality, interests, values, etc.

TESTING OF HUMAN ABILITY
PSYCHOLOGICAL ASSESSMENT - the gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished through the use of tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatuses and measurement procedures.

DIFFERENT APPROACHES TO ASSESSMENT
COLLABORATIVE PSYCHOLOGICAL ASSESSMENT - the assessor and the assessee work as partners from initial contact through final feedback.
THERAPEUTIC PSYCHOLOGICAL ASSESSMENT - therapeutic self-discovery and new understandings are encouraged throughout the assessment process.
DYNAMIC ASSESSMENT - an interactive approach to psychological assessment that usually follows a model of evaluation > intervention > evaluation; interactive, changing, and varying in nature.

Approaches to the Development of Personality Assessment
Empirical-Criterion Keying - a method for developing personality inventories in which the items (presumed to measure one or more traits) are created and then administered to a criterion group of people known to possess a certain characteristic (e.g., antisocial behavior, significant anxiety, exaggerated concern about physical health) and to a control group of people without the characteristic. Only those items that demonstrate an ability to distinguish between the two groups are chosen for inclusion in the final inventory.
Factor Analysis - a method of finding the minimum number of dimensions or factors for explaining the largest number of variables.
EXPLORATORY FACTOR ANALYSIS - entails estimating, or extracting, factors; deciding how many factors to retain; and rotating factors to an interpretable orientation.
CONFIRMATORY FACTOR ANALYSIS - tests the degree to which a hypothetical model, which includes factors, fits the actual data.
Personality Theories - the result of hypotheses, experiments, case studies, and clinical research led by scientists in the field of psychology and human behavior.
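The empirical-criterion keying procedure above reduces to a simple filter: keep only the items whose endorsement rates separate the criterion group from the control group. A minimal sketch, where the endorsement-gap threshold and the toy data are assumptions for illustration:

```python
def keep_items(criterion, control, min_gap=0.25):
    """Empirical-criterion keying sketch: retain items whose
    endorsement rate differs between a criterion group and a
    control group by at least min_gap (threshold is illustrative).

    Each group is a list of response vectors (1 = endorsed)."""
    def rate(group, item):
        return sum(person[item] for person in group) / len(group)

    n_items = len(criterion[0])
    return [i for i in range(n_items)
            if abs(rate(criterion, i) - rate(control, i)) >= min_gap]

# Hypothetical data: 4 people per group, 3 candidate items.
criterion_group = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]]
control_group   = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
print(keep_items(criterion_group, control_group))  # [0] -> only item 0 separates the groups
```

Items endorsed equally often by both groups (items 1 and 2 here) carry no diagnostic information and are dropped, regardless of how plausible their wording looks.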
Reasons for using the test
- Consider the problem, the adequacy of the tests you will use, and the specific applicability of that test to an individual's unique situation.
- Competence in merely administering and scoring tests is insufficient to conduct effective assessment.

Acquiring Knowledge Relating to the Client
- Contextualize the problem
- Data Collection
CHARLES SPEARMAN - expanded the use of correlational methods pioneered by Galton and Karl Pearson, and provided the conceptual foundation for factor analysis, a technique for reducing a large number of variables to a smaller set of factors that would become central to the advancement of testing and trait theory; devised a theory of intelligence that emphasized a general intelligence factor (g) present in all intellectual activities.
- considered by some as the founder of Psychometrics.

KARL PEARSON - famous student of Galton; continued Galton's early work with statistical regression.
- invented the formula for the coefficient of correlation, Pearson's r.

JAMES MCKEEN CATTELL - first person to use the term "mental test"; wrote a dissertation on reaction time based upon Galton's work.
- tried to link various measures of simple discriminative, perceptive, and associative power to independent estimates of intellectual level, such as school grades.

EARLY EXPERIMENTAL PSYCHOLOGISTS
- In the early 19th century, scientists were generally interested in identifying common aspects of behavior rather than individual differences.
- Differences between individuals were considered a source of error, which rendered human measurement inexact.

GUSTAV THEODOR FECHNER - involved in the mathematical measurement of the sensory thresholds of experience.
- founder of Psychophysics, and one of the founders of Experimental Psychology.
- the Weber-Fechner Law was the first to relate sensation and stimulus. It states that the strength of a sensation grows as the logarithm of the stimulus intensity.

GUY MONTROSE WHIPPLE - was influenced by Fechner and was a student of Titchener; pioneered human ability testing.
- conducted a seminar that changed the field of psychological testing (Carnegie Institute, 1918).
- because of his criticisms, the APA issued its first standards for professional psychological testing.
- construction of the Carnegie Interest Inventory – Strong Vocational Interest Blank.

LOUIS LEON THURSTONE - a large contributor to factor analysis; attended Whipple's seminars. His approach to measurement was called the Law of Comparative Judgment, a mathematical representation of a discriminal process: any process in which a comparison is made between pairs of a collection of entities with respect to magnitudes of an attribute, trait, attitude, and so on.
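Pearson's coefficient of correlation, credited above, can be computed straight from its definition; a minimal sketch with made-up data:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Perfectly linear data correlates at r = 1; a reversed ranking at r = -1.
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))  # ~1.0
print(pearson_r([1, 2, 3], [3, 2, 1]))            # ~-1.0
```

The same statistic reappears later in these notes as the validity coefficient, where the two variables are test scores and criterion performance.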
WOODWORTH PERSONAL DATA SHEET - the first objective personality test, meant to assist in psychiatric interviews; developed during World War I.
- designed to screen out soldiers unfit for duty.
- mistakenly assumed that a subject's responses could be taken at face value.

PERSONALITY TESTING: Slow Rise – Projective Techniques
HERMANN RORSCHACH (Rorschach Inkblot Test) - pioneered projective assessment with his inkblot test; initially met with great suspicion; the first serious study was made in 1932.
- symmetric colored and black-and-white inkblots; introduced to the US by David Levy in 1921.
THEMATIC APPERCEPTION TEST (TAT) - developed in 1935 and composed of ambiguous pictures that were considerably more structured than the Rorschach.
- Subjects are shown pictures and asked to tell a story including: what has led up to the event shown; what is happening at the moment; what the characters are feeling and thinking; and what the outcome of the story was.
MINNESOTA MULTIPHASIC PERSONALITY INVENTORY (MMPI, 1943) - tests like the Woodworth made too many assumptions; the meaning of a test response could only be determined by empirical research.
- the MMPI-2 and MMPI-A are the most widely used.
RAYMOND B. CATTELL (16 PF) - the test was based on factor analysis – a method for finding the minimum number of dimensions or factors for explaining the largest number of variables.

1900s - Everything necessary for the rise of the first truly modern and successful psychological test was in place.
1904 - Alfred Binet was appointed to devise a method of evaluating children who could not profit from regular classes and would require special education.
1905 - Binet and Theodore Simon published the first useful instrument to measure general cognitive ability, or global intelligence.
1908 - Binet revised, expanded, and refined his first scale.
1911 - The birth of the IQ: William Stern proposed the computation of IQ based on the Binet-Simon scale (IQ = Mental Age / Chronological Age x 100).
1916 - Lewis Terman translated the Binet-Simon scales into English and published them as the Stanford-Binet Intelligence Scale.
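Stern's ratio-IQ formula from the 1911 entry, expressed directly (the ages are made-up examples):

```python
def ratio_iq(mental_age, chronological_age):
    """Stern (1911): IQ = Mental Age / Chronological Age x 100."""
    return mental_age / chronological_age * 100

# An 8-year-old performing at the level of a typical 10-year-old:
print(ratio_iq(10, 8))  # 125.0 (above age level)
print(ratio_iq(6, 8))   # 75.0  (below age level)
print(ratio_iq(8, 8))   # 100.0 (exactly at age level)
```

A score of 100 always means performance exactly matching chronological age, which is what made the ratio a convenient single index.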
1917 (World War I) - Robert Yerkes, APA President, developed a group intelligence test for the US Army; pioneered the first group testing – the Army Alpha and Army Beta.
1918 - Arthur Otis devised multiple-choice items that could be scored objectively and rapidly; published the Group Intelligence Scale, which served as the model for the Army Alpha.
1919 - E.L. Thorndike produced an intelligence test for high school graduates.

USES OF PSYCHOLOGICAL TESTS
1. Measure differences between individuals or between reactions of the same individual under different circumstances
2. Detection of intellectual difficulties, severe emotional problems, and behavioral disorders
3. Classification of students according to type of instruction, slow and fast learners, educational and occupational counseling, selection of applicants for professional schools
4. Individual counseling – educational and vocational plans, emotional well-being, effective interpersonal relations, enhanced understanding and personal development, aid in decision-making
5. Basic research – nature and extent of individual differences, psychological traits, group differences, identification of biological and cultural factors
6. Investigating problems such as developmental changes across the lifespan, effectiveness of educational interventions, psychotherapy outcomes, community program impact assessment, and the influence of environment on performance
7. Measures of broad aptitudes to specific skills

Features of a psychological test:
- Sample of behavior
- Objective and standardized measure of behavior
- Diagnostic or predictive value depends on how much the test is an indicator of a relatively broad and significant area of behavior
- Tests alone are not enough – it has to be empirically demonstrated that test performance is related to the skill set for which the person is tested
- Tests need not closely resemble the behavior they are trying to predict
- Prediction – assumes that the individual's performance on the test generalizes to other situations
- Capacity – can tests measure "potential"? Only in the sense that present behavior can be used as an indicator of future behavior. No psychological test can do more than measure behavior.

Standardization
- Uniformity of procedure when administering and scoring a test
- Testing conditions must be the same for all
- Establishing norms (the normal or average performance of others who took the same test under the same conditions)
- Raw scores are meaningless unless evaluated against suitable interpretative data
- Standardization sample – indicates average performance and the frequency of deviating by varying degrees from the average; indicates position with reference to all others who took the test; in personality tests, indicates scores typically obtained by average persons

Objective measurement of difficulty
- Objective – scores remain the same regardless of examiner characteristics
- Difficulty – items passed by the greatest number of people are the easiest
MEASUREMENT - the act of assigning numbers or symbols to characteristics of things (people, events, etc.) according to rules.
SCALE - a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.

CATEGORIES OF SCALES
DISCRETE - values that are distinct and separate; they can be counted.
CONTINUOUS - exists when it is theoretically possible to divide any of the values of the scale; the values may take on any value within a finite or infinite interval.

ERROR - refers to the collective influence of all the factors on a test score or measurement beyond those specifically measured by the test or measurement; it is very much an element of all measurement, and it is an element for which any theory of measurement must surely account.

Scales of Measurement

Properties of Scales:
1. Magnitude – "moreness"; suggests that one thing is more than another
2. Equal interval – the difference between two points at any place on the scale has the same meaning as the difference between two other points elsewhere on the scale
3. Absolute zero – zero indicates the absence of the variable being measured

Scale of measurement | Magnitude | Equal interval | Absolute zero
Nominal              | No        | No             | No
Ordinal              | Yes       | No             | No
Interval             | Yes       | Yes            | No
Ratio                | Yes       | Yes            | Yes

Nominal - known as the simplest form of measurement.
- Involves classification or categorization based on one or more distinguishing characteristics, where all things measured must be placed into mutually exclusive and exhaustive categories.
• Examples: DSM-5 diagnoses, gender of patients, colors
Ordinal - permits classification and, in addition, rank ordering on some characteristic.
- It implies nothing about how much greater one ranking is than another, and the numbers do not indicate units of measurement.
• Examples: fastest reader, size of waistline, job positions
Interval - permits both categorization and rank ordering; in addition, it contains equal intervals between numbers, so each unit on the scale is exactly equal to any other unit on the scale.
- No absolute zero point; however, it is possible to average a set of measurements and obtain a meaningful result.
- For example, the difference between IQs of 80 and 100 is thought to be similar to that between IQs of 100 and 120. If an individual achieved an IQ of 0, it would not be an indication of zero intelligence or the total absence of it.
• Examples: temperature, time, IQ scales, psychological scales
Ratio - contains all the properties of nominal, ordinal, and interval scales, and it has a true zero point; negative values are not possible.
- A score of zero means the complete absence of the attribute being measured.
• Examples: exam scores, neurological exams (e.g., hand grip), heart rate
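The point that interval scales lack a true zero, and therefore lack meaningful ratios, can be demonstrated with temperature (listed above as an interval-scale example); the specific readings are made up:

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit; both are interval scales
    whose zero points are arbitrary (and different)."""
    return celsius * 9 / 5 + 32

# "Twice as hot" does not survive a change of units, because the
# zero point is arbitrary -> ratios are not meaningful:
print(20 / 10)                  # 2.0 in Celsius
print(c_to_f(20) / c_to_f(10))  # 1.36 in Fahrenheit

# Differences ARE meaningful: a 10-degree-C gap always maps to 18 F.
print(c_to_f(20) - c_to_f(10))  # 18.0
print(c_to_f(30) - c_to_f(20))  # 18.0
```

The same reasoning underlies the IQ example above: averaging IQs is defensible, but calling an IQ of 120 "twice" an IQ of 60 is not.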
DESCRIPTIVE STATISTICS - used to say something only about the set of information that has been collected.

- DISTRIBUTION - a set of test scores arrayed for recording or study.
- RAW SCORE - a straightforward, unmodified accounting of performance that is usually numerical; may reflect a simple tally, such as the number of items responded to correctly on an achievement test.
- FREQUENCY DISTRIBUTION - all scores are listed alongside the number of times each score occurred; scores might be listed in tabular or graphical form.
- MEASURES OF CENTRAL TENDENCY - indicate the average or midmost score between the extreme scores in a distribution.
  MEAN - the most common measure of central tendency. It takes into account the numerical value of every score; the "average of scores".
  MEDIAN - the middlemost score in the distribution; determined by arranging the scores in either ascending or descending order.
  MODE - the most frequently occurring score in a distribution of scores.

MEASURES OF VARIABILITY
Variability - an indication of how scores in a distribution are scattered or dispersed.
- RANGE - the simplest measure of variability; the difference between the highest and the lowest score.
  INTERQUARTILE RANGE - a measure of variability equal to the difference between Q3 and Q1.
  SEMI-INTERQUARTILE RANGE - equal to the interquartile range divided by two.
- AVERAGE DEVIATION - another tool that can be used to describe the amount of variability in a distribution; rarely used, perhaps because the deletion of algebraic signs renders it a useless measure for purposes of any further operations.
- STANDARD DEVIATION (SD) - a measure of variability equal to the square root of the average squared deviations about the mean; the square root of the variance. A low SD indicates that the values are close to the mean, while a high SD indicates that the values are dispersed over a wider range.
- SKEWNESS - refers to the absence of symmetry; an indication of how the measurements in a distribution are distributed.
  POSITIVELY SKEWED - a distribution in which most values are clustered around the left tail while the right tail is longer; the outliers of the distribution curve are further out towards the right and closer to the mean on the left.
  NEGATIVELY SKEWED - a distribution in which more values are concentrated on the right side (tail) while the left tail is longer; the outliers of the distribution curve are further out towards the left and closer to the mean on the right.
- KURTOSIS - The steepness of the distribution in its center;
describes how heavy or light the tails are.
PLATYKURTIC - relatively flat, gently curved
MESOKURTIC - moderately curved, somewhere in the
middle
LEPTOKURTIC - relatively peaked
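The central-tendency and variability measures above map directly onto Python's standard statistics module; the score list is invented for illustration:

```python
import math
import statistics as st

scores = [82, 85, 85, 88, 90, 91, 95, 98, 100, 100, 100]

print(st.mean(scores))            # ~92.18, the "average of scores"
print(st.median(scores))          # 91, the middlemost score
print(st.mode(scores))            # 100, the most frequent score
print(max(scores) - min(scores))  # 18, the range

# The SD is the square root of the average squared deviation about
# the mean, i.e. the square root of the variance:
print(math.isclose(st.pstdev(scores), math.sqrt(st.pvariance(scores))))  # True
```

Note how the mean is pulled above the median by the cluster of high scores, a small taste of the skewness idea described above.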
NORMAL CURVE - a bell-shaped, smooth, mathematically defined curve that is highest at its center.
- It is perfectly symmetrical, with no skewness.
- The majority of test takers bulk at the middle of the distribution; very few test takers are at the extremes.
- Mean = Median = Mode
- Q1 and Q3 are at equal distances from Q2 (the median).
- 50% of the scores occur above the mean and 50% of the scores occur below the mean.
- Approximately 34% of all scores occur between the mean and 1 SD above the mean.
- Approximately 34% of all scores occur between the mean and 1 SD below the mean.
- Approximately 68% of all scores occur between the mean and +/- 1 SD.
- Approximately 95% of all scores occur between the mean and +/- 2 SD.

STANDARD SCORES - raw scores that have been converted from one scale to another, where the latter scale has some arbitrarily set mean and standard deviation; provide a context for comparing scores on two different tests by converting scores from the two tests into z-scores.

TYPES OF STANDARD SCORES
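Converting raw scores to z-scores, then re-expressing them on an arbitrarily set mean and SD, can be sketched as follows (T-scores with mean 50 and SD 10 are a common convention; the raw scores are made up):

```python
import statistics as st

def z_scores(raw):
    """Convert raw scores to z-scores (mean 0, SD 1)."""
    mu, sd = st.mean(raw), st.pstdev(raw)
    return [(x - mu) / sd for x in raw]

def to_scale(z, new_mean, new_sd):
    """Re-express a z-score on a scale with an arbitrarily set mean
    and SD, e.g. T-scores (50, 10) or deviation IQs (100, 15)."""
    return new_mean + z * new_sd

raw = [10, 12, 14, 16, 18]
zs = z_scores(raw)
print([round(z, 2) for z in zs])                    # [-1.41, -0.71, 0.0, 0.71, 1.41]
print([round(to_scale(z, 50, 10), 1) for z in zs])  # the same scores as T-scores
```

Because both tests end up on the same z-scale, a score from one test can be compared directly with a score from another, which is exactly the point of standard scores.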
Cumulative scoring - the assumption that the higher the test taker's score, the more of the targeted ability or trait the test taker is presumed to have.

ASSUMPTION 3 - Test-Related Behavior Predicts Non-Test-Related Behavior
- Provides some indication of the examinee's behavior outside the testing procedure.
- The obtained sample of behavior is typically used to make predictions about future behavior.
- In some cases, testing and assessment methods are used to postdict – to aid in understanding behavior that has already taken place.

ASSUMPTION 4 - Tests and Other Measurement Techniques Have Strengths and Weaknesses
- Competent test users must understand how a test was developed, the circumstances under which it is appropriate to administer the test, how to administer the test and to whom, how the test results should be interpreted, and how those limitations might be compensated for by data from other sources.

ASSUMPTION 5 - Various Sources of Error Are Part of the Assessment Process
- ERROR VARIANCE - the component of a test score attributable to sources other than the trait or ability measured.
- Assessors and the people assessed can be sources of error.

ASSUMPTION 6 - Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
- Several controversies over the fair use of tests have arisen in the history of the profession.
- One source of fairness-related problems is the test user who attempts to use a particular test on people whose backgrounds differ from those of the people for whom the test was made and intended.
- Other testing controversies, relating to employment, school admissions, and other opportunities, have also arisen.
- However, all of these are still considered tools – ones which we can use properly or not.

ASSUMPTION 7 - Testing and Assessment Benefit Society
Without tests, there would be:
- Subjective personnel hiring processes
- Children with special needs assigned to classes by the gut feel of teachers and school administrators
- Great unmet needs to diagnose educational difficulties
- No instruments to diagnose neuropsychological impairments
- No practical way for the military to screen thousands of recruits
Typical vs Maximum Performance

Typical Performance Tests:
- How well the person typically does
- No right or wrong answers
- Personality inventories, attitude scales, and opinion questionnaires

Maximum/Maximal Performance Tests:
- How well the person can do
- Have a correct answer
- Achievement and aptitude tests, speed tests, and power tests

Test development - an umbrella term for all that goes into the process of creating a test. The process of developing a test occurs in five stages:

1. Test conceptualization – the stage wherein the following are determined: construct, goal, user, taker, administration, format, response, benefits, cost, and interpretation.
- Determination of whether the test will be norm-referenced or criterion-referenced.
  Norm-referenced tests compare individual performance with the performance of a group.
  Criterion-referenced assessments measure how well a student has mastered a specific learning goal (or objective).
- Pilot work, pilot study, and pilot research refer, in general, to the preliminary research surrounding the creation of a prototype of the test. Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument.

2. Test construction – writing the test items as well as formatting items, scoring rules, and otherwise designing and building the test.
- Scaling – the process of setting rules for assigning numbers in measurement; designing a measuring device for the trait/ability being measured, manifested through its item format:
  Dichotomous - offers two alternatives for each item. Usually a point is given for the selection of one of the alternatives. The most common example of this format is the true-false examination.
  Polytomous - (sometimes called polychotomous) resembles the dichotomous format except that each item has more than two alternatives. An example is the multiple-choice format, with distractors as the incorrect choices.
  Likert - so called because it was used as part of Likert's (1932) method of attitude scale construction. A scale using the Likert format consists of items such as "I am afraid of heights". Instead of asking for a yes-no reply, five alternatives are offered: strongly disagree, disagree, neutral, agree, and strongly agree. In some applications, six options are used to avoid allowing the respondent to be neutral: strongly disagree, moderately disagree, mildly disagree, mildly agree, moderately agree, and strongly agree.
  Category - a technique similar to the Likert format but using an even greater number of choices, e.g., a scale from 1 to 10, with 1 as the lowest and 10 as the highest.
  Visual analogue scale - popular for measuring self-rated health. The respondent is given a 100-millimeter line and asked to place a mark between two well-defined endpoints.
- Item pool – usually two times the intended number of items in the final form (three times is advised for inexperienced test developers).
  - The reservoir from which items will or will not be drawn for the final version of the test.
  - The final test items should cover all domains of the test.
- Determination of the scoring model:
  Cumulative - assumes that the more the test taker responds in a particular fashion, the more the test taker exhibits the attribute being measured; probably the most common method for determining an individual's final test score.
  Categorical - test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way. This approach is used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis.
  Ipsative - comparing a test taker's score on one construct within a test to another construct within that same test.
- Creation of the "final form" of the test.

3. Test tryout – administration of the test to a representative sample of test takers under conditions that simulate the conditions under which the "final version" of the test will be administered.
- Issues may include: determination of the target population and the number of samples for the test tryout (number of items x 10), and the test tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered.

4. Item analysis – entails procedures, usually statistical, designed to explore how individual test items work as compared to other items in the test and in the context of the whole test.
- Determination of the following:
  Reliability - consistency of the scores obtained when retested with the same test or with an equivalent form of the test.
  Validity - the degree to which the test measures what it is supposed to measure.
  - Requires independent, external criteria against which the test is evaluated.
  - Validity coefficient – determines how closely the criterion performance can be predicted from the test score.
    Low VC – low correspondence between test performance and criterion
    High VC – high correspondence between test performance and criterion
  - Broader tests must be validated against accumulated data based on different investigations.
  - Validity is first established on a representative sample of test takers before the test is ready for use. It tells us what the test is measuring, and the extent to which we know what the test measures.
  Item Difficulty - calculated as the proportion of the total number of test takers who answered the item correctly.
  Item Discrimination - a measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly. The higher the value of d, the greater the number of high scorers answering the item correctly.
  Differential item functioning (DIF) - a phenomenon wherein an item functions differently in one group of test takers as compared to another group of test takers known to have the same (or similar) level of the underlying trait.

5. Test revision – balancing the weaknesses and strengths of the test or an item.
- After all necessary items have been revised based on the analysis of reliability, validity, item difficulty, and item discrimination, the test will be tried out again to recalibrate its psychometric properties.

WHAT IS A GOOD TEST?

Psychometric Soundness – also referred to as "psychometric adequacy"; refers to whether a test demonstrates sufficient levels of reliability and validity for ethical use with clients.
Reliability - consistency in measurement.
- The precision with which the test measures and the extent to which error is present in measurement; "free from errors".
- A perfectly reliable measuring tool consistently measures in the same way.
Validity - when a test measures what it purports to measure.
- An intelligence test is a valid test if it measures intelligence; the same goes for personality tests and other psychological tests.
- Questions about a test's validity may focus on the items that collectively make up the test.
- A test may be reliable but not valid, but it cannot be valid without being reliable.

Norms - test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores.
- Obtained by administering the test to a sample of people and obtaining the distribution of scores for that group.
  Normative Sample - the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of an individual test taker.
  Norming - the process of deriving norms; may be modified to describe a particular type of norm derivation.
  Standardization - the process of administering a test to a representative sample of test takers for the purpose of establishing norms.
- A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.

TYPES OF STANDARD ERROR

STANDARD ERROR OF MEASUREMENT (SEM) - a statistic used to estimate the extent to which an observed score deviates from a true score.
STANDARD ERROR OF ESTIMATE (SEE) - in regression, an estimate of the degree of error involved in predicting the value of one variable from another.
STANDARD ERROR OF THE MEAN (SEM) - a measure of sampling error.
STANDARD ERROR OF THE DIFFERENCE (SED) - a statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant.
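The item-difficulty (p) and item-discrimination (d) statistics defined under stage 4 can be sketched as follows; the response vectors are hypothetical (1 = correct, 0 = incorrect):

```python
def item_difficulty(responses):
    """p = proportion of test takers answering the item correctly."""
    return sum(responses) / len(responses)

def item_discrimination(high_scorers, low_scorers):
    """d = p(high group) - p(low group) for a single item."""
    return item_difficulty(high_scorers) - item_difficulty(low_scorers)

# One item, answered by 8 high scorers and 8 low scorers on the test:
high = [1, 1, 1, 1, 1, 1, 0, 1]  # 7/8 correct
low  = [1, 0, 0, 1, 0, 0, 1, 0]  # 3/8 correct
print(item_difficulty(high + low))     # 0.625
print(item_discrimination(high, low))  # 0.5 -> the item separates the groups
```

An item with d near 0 (answered correctly equally often by high and low scorers) adds little to the test, no matter how well-written it looks.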
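The SEM defined above is conventionally computed from the test's SD and its reliability coefficient as SD x sqrt(1 - r), a standard classical-test-theory formula that is not spelled out in these notes; the numbers below are illustrative:

```python
import math

def sem(sd, reliability):
    """Classical standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# A test scaled to SD = 15 with reliability r = .91:
error = sem(15, 0.91)
print(round(error, 1))  # 4.5
# An observed score of 100 would then carry a roughly 68% band of
# 100 +/- 4.5 around the estimated true score; a perfectly reliable
# test (r = 1.0) would have SEM = 0.
```

The formula makes the reliability-error link concrete: as the reliability coefficient rises toward 1, the SEM shrinks toward 0.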