
PSYCHOLOGICAL ASSESSMENT

PSYCHOLOGICAL TESTS - objective and standardized measures of a sample of human behavior (Anastasi & Urbina, 1997).

 These are instruments with three defining characteristics:
 It is a sample of human behavior.
 The sample is obtained under standardized conditions.
 There are established rules for scoring or for obtaining quantitative information from the behavior sample.

PSYCHOLOGICAL MEASUREMENT - the process of assigning numbers (i.e., test scores) to persons in such a way that some attributes of the persons are reflected in the properties of the numbers.

GENERAL TYPES OF PSYCHOLOGICAL TESTS

 These tests are categorized according to the manner of administration, purpose, and nature.
 Administration - Individual; Group
 Item Format - Objective; Projective
 Response Format - Verbal; Performance
 Domain Measured - Cognitive; Affective

Types of Test

STANDARDIZED TESTS - have prescribed directions for administration, scoring, and interpretation. Examples: MBTI, MMPI, SB-5, WAIS

NON-STANDARDIZED TESTS (Informal Tests) - exemplified by teacher-made tests used for either formative or summative evaluation of student performance. Examples: prelim exams, quizzes

NORM-REFERENCED TESTS - instruments whose score interpretation is based on the performance of a particular group.

CRITERION-REFERENCED TESTS - measures whose criteria for passing or failing have been decided beforehand.

INDIVIDUAL TESTS - instruments that are administered one-on-one, face-to-face.

GROUP TESTS - can be administered to a group, usually in a paper-and-pencil format; can also be administered individually.

SPEED TESTS - administered under a prescribed time limit, usually too short for an individual to finish answering the entire test. The level of difficulty is the same for all items.

POWER TESTS - measure competencies and abilities. The prescribed time limit is usually enough for one to accomplish the entire test.

VERBAL TESTS - instruments that use words to measure a particular domain.

NONVERBAL TESTS - instruments that do not use words; instead they use geometrical drawings or patterns.

COGNITIVE TESTS - measure thinking skills.

AFFECTIVE TESTS - measure personality, interests, values, etc.
PSYCHOLOGICAL TESTING - the process of measuring psychology-related variables by means of devices or procedures designed to obtain a sample of behavior.

PSYCHOLOGICAL ASSESSMENT - the gathering and integration of psychology-related data for the purpose of making a psychological evaluation, accomplished through the use of tools such as tests, interviews, case studies, behavioral observation, and specially designed apparatuses and measurement procedures.

DIFFERENT APPROACHES TO ASSESSMENT

 COLLABORATIVE PSYCHOLOGICAL ASSESSMENT - the assessor and the assessee work as partners from initial contact through final feedback.
 THERAPEUTIC PSYCHOLOGICAL ASSESSMENT - therapeutic self-discovery and new understandings are encouraged throughout the assessment process.
 DYNAMIC ASSESSMENT - an interactive approach to psychological assessment that usually follows the model evaluation > intervention > evaluation; interactive, changing, and varying in nature.

TESTING OF HUMAN ABILITY

TESTS FOR SPECIAL POPULATIONS - developed for use with persons who cannot be properly or adequately examined with traditional instruments, such as the individual scales; rely on performance or nonverbal tasks.

PERSONALITY TESTS - used for the measurement of emotional, motivational, interpersonal, and attitudinal characteristics.

PROJECTIVE TECHNIQUES - relatively unstructured tasks that permit an almost unlimited variety of possible responses; disguised-procedure tasks.

Approaches to the Development of Personality Assessment

 Empirical-Criterion Keying - a method for developing personality inventories in which the items (presumed to measure one or more traits) are created and then administered to a criterion group of people known to possess a certain characteristic (e.g., antisocial behavior, significant anxiety, exaggerated concern about physical health) and to a control group of people without the characteristic. Only those items that demonstrate an ability to distinguish between the two groups are chosen for inclusion in the final inventory.
 Factor Analysis - a method of finding the minimum number of dimensions or factors for explaining the largest number of variables.
 EXPLORATORY FACTOR ANALYSIS - entails estimating, or extracting, factors; deciding how many factors to retain; and rotating factors to an interpretable orientation.
 CONFIRMATORY FACTOR ANALYSIS - assesses the degree to which a hypothetical model, which includes factors, fits the actual data.
 Personality Theories - the result of hypotheses, experiments, case studies, and clinical research led by scientists in the field of psychology and human behavior.
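As a concrete illustration of empirical-criterion keying, the sketch below keeps only items whose endorsement rates differ between a criterion group and a control group. The yes/no responses and the cutoff value are entirely made up for illustration; real inventories use much larger samples and formal statistical criteria.

```python
# Illustrative sketch of empirical-criterion keying (hypothetical data).
# An item is retained only if it discriminates between a criterion group
# (known to have the characteristic) and a control group (known not to).

def endorsement_rate(responses, item):
    """Proportion of people in a group who endorsed (True) a given item."""
    return sum(person[item] for person in responses) / len(responses)

def select_items(criterion_group, control_group, n_items, min_difference=0.30):
    """Keep items whose endorsement rates differ between the groups by at
    least min_difference (an arbitrary cutoff chosen for this sketch)."""
    keyed = []
    for item in range(n_items):
        diff = abs(endorsement_rate(criterion_group, item)
                   - endorsement_rate(control_group, item))
        if diff >= min_difference:
            keyed.append(item)
    return keyed

# Hypothetical yes/no responses to 4 items (True = endorsed).
criterion = [[True, True, False, True],
             [True, True, True, True],
             [True, False, False, True]]
control   = [[False, True, False, False],
             [True, True, True, False],
             [False, True, False, False]]

print(select_items(criterion, control, n_items=4))  # [0, 1, 3]
```

Item 2 is dropped because both groups endorse it at the same rate; it tells us nothing about the characteristic, regardless of what the item appears to measure.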
Phases of Psychological Assessment

Evaluating the referral question > Acquiring knowledge relating to the content > Data collection > Interpreting the data

Evaluating the Referral Question

- The "backbone" of psychological report writing.
- Uncover hidden agendas, unspoken expectations, and complex interpersonal relationships, as well as explain the specific limitations of psychological tests.
 Clarifying the assets and limitations of psychological tests helps clarify the requests received.
- Contact the referral source at different stages in the assessment process.

Acquiring Knowledge Relating to the Client

- Reasons for using the test: consider the problem, the adequacy of the tests you will use, and the specific applicability of each test to an individual's unique situation.
- Competence in merely administering and scoring tests is insufficient to conduct effective assessment.
 Contextualize the problem.

Data Collection

- Data may come from a wide variety of sources, the most frequent of which are test scores, personal history, behavioral observations, and interview data.
- Obtain data from school records, previous psychological observations, medical records, or police reports, or discuss the client with parents or teachers.

Interpreting the Data

- Description of the client's present level of functioning
- Considerations relating to etiology – the cause(s) of the condition
- Prognosis – the likely outcome or course of a condition; the chance of recovery or recurrence
- Treatment recommendations
- Clinicians should also pay careful attention to research on the implications of incremental validity and continually be aware of the limitations and possible inaccuracies involved in clinical judgment.
From data to prediction, interpretation proceeds in sequence: Initial data collection > Development of inferences > Rejection/modification/acceptance of inferences > Development and integration of hypotheses > Dynamic model of the person > Situational variables > Prediction of behavior.

THE TEST - defined simply as a measuring device or procedure.

PSYCHOLOGICAL TEST - refers to a device or procedure designed to measure variables related to psychology.

TOOLS OF PSYCHOLOGICAL ASSESSMENT

 INTERVIEW - a method of gathering information through direct communication involving reciprocal exchange.
- Interviews differ in purpose, length, and nature.
- Uses: diagnosis, treatment, selection, decisions.
- Panel interview – involves multiple interviewers.
 Advantage: minimizes the idiosyncratic biases of a lone interviewer.
 Disadvantage: costly; the use of multiple interviewers may not be justified.
 PORTFOLIO - contains a sample of one's abilities and accomplishments which can be used for evaluation.
- Case History Data - the records, transcripts, and other accounts in written, pictorial, or other form that preserve archival information, official and informal accounts, and other data and items relevant to the assessee.
- Case Study or Case History - a report or illustrative account concerning a person or an event, compiled on the basis of case history data.
 BEHAVIORAL OBSERVATION - monitoring the actions of others or oneself by visual or electronic means while recording quantitative or qualitative information regarding those actions; aids the development of therapeutic interventions and is extremely useful in institutional settings such as schools, hospitals, prisons, and group homes.
- Naturalistic Observation - observing behaviors in the natural setting in which they would typically be expected to occur.
 ROLE-PLAY TESTS - acting an improvised or partially improvised part in a simulated situation; assessees are directed to act as if they were in a particular situation.
- Evaluation covers expressed thoughts, behaviors, abilities, and other related variables.
- Role play can be used as both a tool of assessment and a measure.
 SIMULATION – the realistic imitation of a real-world process, which may involve the use of computer programs and/or modelled job equipment. In contrast to role-play tests, assessees are told to act as themselves during simulations.
 COMPUTERS AS TOOLS – Computer-assisted psychological assessment entails the use of computers in:
 presenting and administering test items and instructions
 quick and efficient scoring with transformation to standard scores
 generating basic test result interpretations (when programmed to do so)
- Services using computers may include local processing, processing from a terminal to a mainframe computer, and processing through a remote central location.
- Some other computer-related characteristics:
 computers can store large amounts of data, such as item banks or normative data
 computers allow for control over other devices such as optical scanners, printers, and video disc presentations
 computers allow for Computer Adaptive Testing (CAT) – the computer's ability to tailor the test to the individual according to his or her test-taking ability or test-taking pattern
- Interpretive Reports - distinguished by the inclusion of numerical or narrative interpretive statements in the report.
- Consultative Reports - written in language appropriate for communication between assessment professionals; may provide expert opinion concerning data analysis.
- Integrative Reports - employ previously collected data in the test report.

PAPER AND PENCIL VS COMPUTER FORMAT

 Method of presentation – Which one has clearer pictures or more readable items?
 Requirements of the task or test-taking strategies – Can you go back and review earlier items? Can you answer later items first? Are there time limits per question?
 Method of responding – Will test takers be required to check items, shade, or touch the screen? Are blank responses acceptable?
 Method of interpretation – Are norms for a paper-and-pencil test applicable to computerized versions?

PARTICIPANTS IN THE TESTING PROCESS AND THEIR ROLES

 Test authors and developers - conceive, prepare, and develop tests; also find a way to disseminate their tests.
 Test publishers - publish, market, and sell tests, thus controlling their distribution.
 Test reviewers – prepare evaluative critiques of tests based on technical and practical merits.
 Test users - select or decide which specific test(s) will be used for some purpose; may also act as examiners or scorers.
 Test sponsors - institutional boards or agencies that contract test developers or publishers for various testing services.
 Test administrators or examiners - administer the test either to one individual at a time or to groups.
 Test takers - take the test by choice or necessity.
 Test scorers - tally the raw scores and transform them into test scores through objective or mechanical scoring or through the application of evaluative judgment.
 Test score interpreters - interpret test results to consumers such as individual test takers or their relatives, other professionals, or organizations of various kinds.
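The Computer Adaptive Testing (CAT) idea described under Computers as Tools can be sketched as a simple up/down rule: present a harder item after a correct answer and an easier one after a miss. Real CAT systems rely on item response theory; the step rule, starting level, and difficulty range below are illustrative assumptions only.

```python
# Minimal sketch of adaptive item selection (NOT a real CAT algorithm).
# Difficulty steps up after a correct answer and down after an incorrect one.

def adaptive_difficulties(answers, start=5, lowest=1, highest=10):
    """Return the difficulty level presented at each step, given a list of
    booleans (True = answered correctly)."""
    level = start
    presented = []
    for correct in answers:
        presented.append(level)
        if correct:
            level = min(highest, level + 1)   # step up after a success
        else:
            level = max(lowest, level - 1)    # step down after a failure
    return presented

# Hypothetical test taker: right, right, wrong, right
print(adaptive_difficulties([True, True, False, True]))  # [5, 6, 7, 6]
```

The test converges toward the level where the taker answers about half the items correctly, which is why adaptive tests can match a fixed-length test's precision with fewer items.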
Test takers may also vary in variables including:

 amount of test anxiety experienced and how significantly it may affect test results
 capacity and willingness to cooperate with the examiner
 physical or emotional disturbances experienced during testing
 extent to which they are predisposed to agreeing or disagreeing when presented with stimulus statements
 extent of prior coaching received
 extent to which they are lucky or "can beat the odds" on a test

ASSESSMENT FOR PEOPLE WITH DISABILITIES

 Assessing people with disabilities requires accommodation – an adaptation of a test or procedure, or the substitution of one test for another, to make the assessment more suitable for the person with special needs.
 Accommodation of special needs allows for alternate assessments – evaluative or diagnostic procedures that vary from the standardized way of measurement.
 Forms of accommodation:
1. The form of the test as presented to the test taker
2. The way responses to the test are obtained
3. Modification of the physical environment
4. Modification of the interpersonal environment

SETTINGS WHERE ASSESSMENTS ARE CONDUCTED

 EDUCATIONAL SETTINGS - help to identify children who may have special needs; diagnostic tests and/or achievement tests.
 CLINICAL SETTINGS - for screening and/or diagnosing behavioral problems; may involve intelligence, personality, neuropsychological, or other specialized instruments, depending on the presenting or suspected problem area.
 COUNSELING SETTINGS - aim to improve the assessee's adjustment, productivity, or some related variable; may involve personality, interest, attitude, and values tests.
 GERIATRIC SETTINGS - quality-of-life assessment, measuring variables related to perceived stress, loneliness, sources of satisfaction, personal values, quality of living conditions, and quality of friendships and social support.
 BUSINESS & MILITARY SETTINGS - decision making about the careers of personnel.
 GOVERNMENTAL & ORGANIZATIONAL CREDENTIALING - licensing or certifying exams.
 ACADEMIC RESEARCH SETTINGS - sound knowledge of measurement principles and assessment tools is required prior to research publication.

PROTOCOL - typically refers to the form, sheet, or booklet on which a test taker's responses are entered; may also refer to a description of a set of test- or assessment-related procedures.

RAPPORT - the working relationship between the examiner and the examinee.

ACCOMMODATION - the adaptation of a test, procedure, or situation, or the substitution of one test for another, to make the assessment more suitable for an assessee with an exceptional need.
TEST USER QUALIFICATION LEVELS

SOURCES OF INFORMATION ABOUT TESTS

 TEST CATALOGUES - usually contain only a brief description of the test and seldom contain detailed technical information.
 TEST MANUALS - detailed information concerning the development of a particular test and technical information relating to it.
 REFERENCE VOLUMES - periodically updated volumes that provide detailed information for each test listed; e.g., the Mental Measurements Yearbook.
 JOURNAL ARTICLES - contain reviews of a test, updates or independent studies of its psychometric soundness, or examples of how the instrument was used in research or applied contexts.
 ONLINE DATABASES - maintained by the APA; PsycINFO, ClinPSYC, PsycARTICLES, etc.

BRIEF HISTORY OF PSYCHOLOGICAL TESTING

 20th-century France - the roots of contemporary psychological testing and assessment.
 1905 - Alfred Binet and a colleague published a test to help place Paris schoolchildren in classes.
 1917, World War I - the military needed a way to screen large numbers of recruits quickly for intellectual and emotional problems.
 World War II - the military depended even more on psychological tests to screen recruits for service.
 Post-war - more and more tests purporting to measure an ever-widening array of psychological variables were developed and used.

PROMINENT FIGURES IN THE HISTORY OF PSYCHOMETRICS

 INDIVIDUAL DIFFERENCES – "In spite of our similarities, no two humans are exactly the same."

CHARLES DARWIN - believed that some individual differences are more adaptive than others; individual differences, over time, lead to more complex, intelligent organisms.

FRANCIS GALTON - cousin of Charles Darwin; an applied Darwinist who claimed that some people possessed characteristics that made them more fit than others; wrote Hereditary Genius (1869).
- set up an anthropometric laboratory at the International Exposition of 1884; noted that persons with mental retardation tend to have diminished ability to discriminate among heat, cold, and pain.
CHARLES SPEARMAN - had been trying to prove Galton's hypothesis concerning the link between intelligence and visual acuity.
- expanded the use of correlational methods pioneered by Galton and Karl Pearson, and provided the conceptual foundation for factor analysis, a technique for reducing a large number of variables to a smaller set of factors, which would become central to the advancement of testing and trait theory; devised a theory of intelligence that emphasized a general intelligence factor (g) present in all intellectual activities.
- considered by some as the founder of Psychometrics.

KARL PEARSON - famous student of Galton; continued Galton's early work with statistical regression.
- invented the formula for the coefficient of correlation: Pearson's r.

JAMES MCKEEN CATTELL - first person to use the term mental test; wrote a dissertation on reaction time based upon Galton's work.
- tried to link various measures of simple discriminative, perceptive, and associative power to independent estimates of intellectual level, such as school grades.

 EARLY EXPERIMENTAL PSYCHOLOGISTS
- In the early 19th century, scientists were generally interested in identifying common aspects, rather than individual differences.
- Differences between individuals were considered a source of error, which rendered human measurement inexact.

JOHANN FRIEDRICH HERBART - proposed mathematical models of the mind; the founder of Pedagogy as an academic discipline.

ERNST HEINRICH WEBER - proposed the concepts of sensory thresholds and the Just Noticeable Difference (JND).

GUSTAV THEODOR FECHNER - worked on the mathematical sensory thresholds of experience.
- founder of Psychophysics, and one of the founders of Experimental Psychology.
- The Weber-Fechner Law was the first to relate sensation and stimulus: it states that the strength of a sensation grows as the logarithm of the stimulus intensity.

GUY MONTROSE WHIPPLE - influenced by Fechner and a student of Titchener; pioneered human ability testing.
- conducted a seminar that changed the field of psychological testing (Carnegie Institute, 1918).
- because of his criticisms, the APA issued its first standards for professional psychological testing.
- construction of the Carnegie Interest Inventory – Strong Vocational Interest Blank.

LOUIS LEON THURSTONE - a large contributor to factor analysis who attended Whipple's seminars; his approach to measurement was called the Law of Comparative Judgment, a mathematical representation of a discriminal process, in which a comparison is made between pairs of a collection of entities with respect to magnitudes of an attribute, trait, attitude, and so on.
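Pearson's r, credited above to Karl Pearson, can be computed directly from deviations about the two variables' means. A minimal pure-Python sketch with made-up numbers:

```python
# Pearson's coefficient of correlation: the sum of products of deviations
# about each mean, divided by the product of the deviation magnitudes.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear relationship yields r = 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

r ranges from -1 (perfect inverse relationship) through 0 (no linear relationship) to +1 (perfect direct relationship).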
 INTEREST IN MENTAL DEFICIENCY

JEAN ETIENNE ESQUIROL - a French physician and the favorite student of Philippe Pinel, the founder of Psychiatry.
- was responsible for the manuscript on mental retardation which differentiated between insanity and mental retardation.

EDOUARD SEGUIN - a French physician who pioneered the training of mentally retarded persons; rejected the notion of incurable mental retardation (MR).
- 1837: opened the first school devoted to teaching children with MR.
- 1866: conducted experiments with the physiological training of MR, which involved sense/muscle training still used today and led to nonverbal tests of intelligence (Seguin Form Board Test).

EMIL KRAEPELIN - devised a series of examinations for evaluating emotionally impaired individuals.

 INTELLIGENCE TESTING

ALFRED BINET - appointed by the French government to develop a test that would place in special classes those Paris schoolchildren who failed to respond to normal schooling.
- devised the first intelligence test: the Binet-Simon scale of 1905. The scale had a standardized administration and used a standardization sample.

LEWIS MADISON TERMAN - translated the Binet-Simon Scales into English for use in the US; in 1916 the translation was published as the Stanford-Binet Intelligence Scale.
- The SB scale became more psychometrically sound, and the term IQ was introduced.
- IQ = Mental Age / Chronological Age x 100

ROBERT YERKES - president of the APA, commissioned by the US Army to develop structured tests of human abilities.
- WWI raised the need for large-scale, group-administered ability tests for the army.
- Army Alpha – verbal; administered to literate soldiers.
- Army Beta – nonverbal; administered to illiterate soldiers.

DAVID WECHSLER - the subscales of his tests were adopted from the army scales; produced several scores of intellectual ability rather than Binet's single score; his work evolved into the Wechsler series of intelligence tests (WAIS, WISC, etc.).

PERSONALITY TESTING - these tests were intended to measure personality traits.

TRAITS - relatively enduring dispositions (tendencies to act, think, or feel in a certain manner in any given circumstance).
- 1920s - The rise of personality testing
- 1930s - The fall of personality testing
- 1940s - The slow rise of personality testing
PERSONALITY TESTING: First Structured Test

WOODWORTH PERSONAL DATA SHEET - the first objective personality test, meant to assist in psychiatric interviews; developed during WWI.
- designed to screen out soldiers unfit for duty.
- mistakenly assumed that a subject's responses could be taken at face value.

PERSONALITY TESTING: Second Structured Test

MINNESOTA MULTIPHASIC PERSONALITY INVENTORY (MMPI, 1943) - tests like the Woodworth made too many assumptions; the meaning of a test response could only be determined by empirical research.
- The MMPI-2 and MMPI-A are the most widely used.

RAYMOND B. CATTELL (16 PF) - the test was based on factor analysis – a method for finding the minimum number of dimensions or factors for explaining the largest number of variables.
- J. P. Guilford was the first to apply the factor-analytic approach to test construction.

PERSONALITY TESTING: Slow Rise – Projective Techniques

HERMAN RORSCHACH (Rorschach Inkblot Test) - pioneered projective assessment using his inkblot test; it was initially met with great suspicion, and the first serious study was made in 1932.
- symmetric colored and black-and-white inkblots, first published in 1921; introduced to the US by David Levy.

THEMATIC APPERCEPTION TEST (TAT) - developed in 1935; composed of ambiguous pictures that are considerably more structured than the Rorschach.
- Subjects are shown pictures and asked to tell a story including:
 What has led up to the event shown;
 What is happening at the moment;
 What the characters are feeling and thinking; and
 What the outcome of the story was.

THE RISE OF MODERN PSYCHOLOGICAL TESTING

1900s - Everything necessary for the rise of the first truly modern and successful psychological test was in place.

1904 - Alfred Binet was appointed to devise a method of evaluating children who could not profit from regular classes and would require special education.

1905 - Binet and Theodore Simon published the first useful instrument to measure general cognitive abilities or global intelligence.

1908 - Binet revised, expanded, and refined his first scale.

1911 - The birth of the IQ; William Stern proposed the computation of IQ based on the Binet-Simon scale (IQ = Mental Age / Chronological Age x 100).

1916 - Lewis Terman translated the Binet-Simon scales into English and published them as the Stanford-Binet Intelligence Scale.
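Stern's ratio IQ formula from the timeline above (IQ = Mental Age / Chronological Age x 100) is simple enough to compute directly; the ages below are hypothetical:

```python
# Stern's ratio IQ: mental age divided by chronological age, times 100.

def ratio_iq(mental_age, chronological_age):
    return mental_age / chronological_age * 100

# A 10-year-old performing at the level of a typical 12-year-old:
print(ratio_iq(12, 10))  # 120.0
```

A child performing exactly at age level scores 100, which is why 100 remains the conventional mean of modern (deviation-based) IQ scales.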
1917, World War I - Robert Yerkes, APA President, developed a group test of intelligence for the US Army; pioneered the first group testing - Army Alpha and Army Beta.

1918 - Arthur Otis devised multiple-choice items that could be scored objectively and rapidly; published the Group Intelligence Scale, which served as the model for the Army Alpha.

1919 - E. L. Thorndike produced an intelligence test for high school graduates.

USES OF PSYCHOLOGICAL TESTS

1. Measure differences between individuals or between reactions of the same individual under different circumstances
2. Detection of intellectual difficulties, severe emotional problems, and behavioral disorders
3. Classification of students according to type of instruction, slow and fast learners, educational and occupational counseling, selection of applicants for professional schools
4. Individual counseling – educational and vocational plans, emotional well-being, effective interpersonal relations, enhanced understanding and personal development, aid in decision-making
5. Basic research – nature and extent of individual differences, psychological traits, group differences, identification of biological and cultural factors
6. Investigating problems such as developmental changes across the lifespan, effectiveness of educational interventions, psychotherapy outcomes, community program impact assessment, and the influence of environment on performance
7. Measures ranging from broad aptitudes to specific skills

Features of a psychological test:

 Sample of behavior
- An objective and standardized measure of behavior.
- Diagnostic or predictive value depends on how much the sample is an indicator of relatively broad and significant areas of behavior.
- Tests alone are not enough – it has to be empirically demonstrated that test performance is related to the skill set for which the person is tested.
- Tests need not closely resemble the behavior they are trying to predict.
- Prediction assumes that the individual's performance on the test generalizes to other situations.
- Capacity – can tests measure "potential"? Only in the sense that present behavior can be used as an indicator of future behavior. No psychological test can do more than measure behavior.
 Standardization
- Uniformity of procedure when administering and scoring a test.
- Testing conditions must be the same for all.
- Establishing norms (the normal or average performance of others who took the same test under the same conditions).
- Raw scores are meaningless unless evaluated against suitable interpretative data.
- Standardization sample – indicates average performance and the frequency of deviating by varying degrees from the average; indicates position with reference to all others who took the test; in personality tests, indicates scores typically obtained by average persons.
 Objective measurement of difficulty
- Objective – scores remain the same regardless of examiner characteristics.
- Difficulty – items passed by the greatest number of people are the easiest.
MEASUREMENT - the act of assigning numbers or symbols to characteristics of things (people, events, etc.) according to rules.

SCALE - a set of numbers (or other symbols) whose properties model empirical properties of the objects to which the numbers are assigned.

CATEGORIES OF SCALES

 DISCRETE - values that are distinct and separate; they can be counted.
 CONTINUOUS - exists when it is theoretically possible to divide any of the values of the scale; values may fall anywhere within a finite or infinite interval.

ERROR - refers to the collective influence of all factors on a test score or measurement beyond those specifically measured by the test or measurement; it is very much an element of all measurement, and it is an element for which any theory of measurement must account.

Scales of Measurement

Properties of Scales:
1. Magnitude – "moreness"; one value represents more of the attribute than another
2. Equal intervals – the difference between two points at any place on the scale has the same meaning as the difference between two other points elsewhere on the scale
3. Absolute zero – zero indicates the complete absence of the variable being measured

Scale      Magnitude   Equal Intervals   Absolute Zero
Nominal    No          No                No
Ordinal    Yes         No                No
Interval   Yes         Yes               No
Ratio      Yes         Yes               Yes

 Nominal - the simplest form of measurement.
- Involves classification or categorization based on one or more distinguishing characteristics, where all things measured must be placed into mutually exclusive and exhaustive categories.
- Examples: DSM-5 diagnoses, gender of patients, colors
 Ordinal - permits classification and, in addition, rank ordering on some characteristic.
- It implies nothing about how much greater one ranking is than another, and the numbers do not indicate units of measurement.
- Examples: fastest reader, size of waistline, job positions
 Interval - permits both categorization and rank ordering and, in addition, has equal intervals between numbers, so each unit on the scale is exactly equal to any other unit on the scale.
- No absolute zero point; however, it is possible to average a set of measurements and obtain a meaningful result.
- For example, the difference between IQs of 80 and 100 is thought to be similar to that between IQs of 100 and 120. An IQ of 0 would not indicate zero intelligence or the total absence of it.
- Examples: temperature, clock time, IQ scales, psychological scales
 Ratio - contains all the properties of nominal, ordinal, and interval scales, and has a true zero point; negative values are not possible.
- A score of zero means the complete absence of the attribute being measured.
- Examples: exam scores, neurological exams (e.g., hand grip), heart rate
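The interval-vs-ratio distinction above can be demonstrated numerically: because an interval scale such as temperature lacks a true zero, ratios of scale values are not preserved under a change of units, while differences are. A short sketch:

```python
# Why ratios are meaningless on an interval scale: temperature has no true
# zero, so "40 degrees is twice as hot as 20 degrees" does not survive a
# change of units, while differences (equal intervals) do.

def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# The ratio 40/20 = 2.0 in Fahrenheit...
print(40 / 20)                   # 2.0
# ...does not hold after converting the same temperatures to Celsius.
print(f_to_c(40) / f_to_c(20))   # about -0.67, not 2.0
# Differences, however, are preserved up to the unit-conversion factor:
print(f_to_c(40) - f_to_c(20))   # about 11.11 (= 20 F degrees x 5/9)
```

On a true ratio scale, such as heart rate, 120 beats per minute really is twice 60 regardless of whether you count per minute or per second.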
DESCRIPTIVE STATISTICS - used to say something about a set of information that has been collected, and nothing more.

- DISTRIBUTION - a set of test scores arrayed for recording or study.
- RAW SCORE - a straightforward, unmodified accounting of performance that is usually numerical; may reflect a simple tally, such as the number of items responded to correctly on an achievement test.
- FREQUENCY DISTRIBUTION - all scores are listed alongside the number of times each score occurred; scores may be listed in tabular or graphical form.
- MEASURES OF CENTRAL TENDENCY - indicate the average or midmost score between the extreme scores in a distribution.
 MEAN - the most common measure of central tendency; takes into account the numerical value of every score; the "average of scores."
 MEDIAN - the middlemost score in the distribution; determined by arranging the scores in either ascending or descending order.
 MODE - the most frequently occurring score in a distribution of scores.

MEASURES OF VARIABILITY

Variability - an indication of how scores in a distribution are scattered or dispersed.

- RANGE - the simplest measure of variability; the difference between the highest and the lowest score.
 INTERQUARTILE RANGE - a measure of variability equal to the difference between Q3 and Q1.
 SEMI-INTERQUARTILE RANGE - equal to the interquartile range divided by two.
- AVERAGE DEVIATION - another tool that can be used to describe the amount of variability in a distribution; rarely used, perhaps because the deletion of algebraic signs renders it useless for further operations.
- STANDARD DEVIATION (SD) - a measure of variability equal to the square root of the average squared deviation about the mean; the square root of the variance. A low SD indicates that values are close to the mean, while a high SD indicates that values are dispersed over a wider range.
- SKEWNESS - refers to the absence of symmetry; an indication of how measurements in a distribution are distributed.
 POSITIVELY SKEWED - most values are clustered around the left tail of the distribution while the right tail is longer; the outliers extend further out toward the right, and scores cluster closer to the mean on the left.
 NEGATIVELY SKEWED - more values are concentrated on the right side (tail) of the distribution while the left tail is longer; the outliers extend further out toward the left, and scores cluster closer to the mean on the right.
- KURTOSIS - the steepness of the distribution at its center; describes how heavy or light the tails are.
 PLATYKURTIC - relatively flat, gently curved
 MESOKURTIC - moderately curved, somewhere in the middle
 LEPTOKURTIC - relatively peaked

 50% of the scores occur above the mean and 50% of the scores
occur below the mean.
 Approximately 34% of all scores occur between the mean and
1 SD above the mean.
 Approximately 34% of all scores occur between the mean and
1 SD below the mean.
 Approximately 68% of all scores occur between the mean and
+/- 1 SD.
 Approximately 95% of all scores occur between the mean and
NORMAL CURVE- bell-shaped, smooth, mathematically defined +/- 2 SD.
curve that is highest at its center.
STAN DARD SCORES - raw scores that have been converted
- It is perfectly symmetrical with no skewness. from one scale to another scale, where the latter scale has some
- Majority of the test takers are bulked at the middle of the arbitrarily set mean and standard deviation; provides a context of
distribution; very few test takers are at the extremes. comparing scores on two different tests by converting scores from
- Mean = Median = Mode the two tests into z-score.
- Q1 and Q3 have equal distances to the Q2 (median).
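The descriptive measures above can be computed directly; a minimal Python sketch using the standard library, with an invented list of test scores (illustrative data only):

```python
import statistics

# Hypothetical raw scores on an achievement test (invented data)
scores = [10, 12, 12, 13, 14, 14, 14, 15, 16, 20]

mean = statistics.mean(scores)      # takes the value of every score into account
median = statistics.median(scores)  # middlemost score after sorting
mode = statistics.mode(scores)      # most frequently occurring score
sd = statistics.pstdev(scores)      # SD: square root of the average squared deviation

# A z score expresses each raw score in SD units above or below the mean
z = [(x - mean) / sd for x in scores]

print(mean, median, mode, round(sd, 2))
```

In a perfectly normal distribution the mean, median, and mode computed this way would coincide, matching the normal-curve properties listed above.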
TYPES OF STANDARD SCORES
 z scores - known as the golden scores.
- result from the conversion of a raw score into a number indicating how many SD units the raw score is below or above the mean of the distribution.
- mean = 0; SD = 1
- zero plus or minus one scale (0 +/- 1)
- scores can be positive or negative.
 t scores - fifty plus or minus ten scale (50 +/- 10)
- mean = 50; SD = 10
- devised by W.A. McCall (1922, 1939) and named the T score in honor of his professor E.L. Thorndike
- composed of a scale that ranges from 5 SD below the mean to 5 SD above the mean
- none of the scores are negative
 Stanine - takes the whole numbers from 1 to 9, without decimals, which represent ranges of performance that are half an SD in width.
- mean = 5; SD = 2
- used by the US Air Force Assessment
 Deviation IQ - used for interpreting IQ scores
- mean = 100; SD = 15
 STEN - standard ten
- mean = 5.5; SD = 2
 Graduate Record Exam (GRE) or Scholastic Aptitude Test (SAT) - used for admission to graduate school and college.
- mean = 500; SD = 100

CORRELATIONAL STATISTICS - statistical tools for testing the relationships or associations between variables.

- The statistical tool of choice when the relationship between variables is linear and when the variables being correlated are continuous.

COVARIANCE - how much two scores vary together.

CORRELATION COEFFICIENT - a mathematical index that describes the direction and magnitude of a relationship; always ranges from -1.00 to +1.00 only.

PEARSON PRODUCT MOMENT CORRELATION - determines the degree of variation in one variable that can be estimated from knowledge about variation in the other variable.

- correlates two variables in interval or ratio scale format.
- devised by Karl Pearson

TETRACHORIC CORRELATION - correlates two dichotomous data; both should be artificial dichotomies; example, passing or failing a test and being highly anxious or not.
TYPES OF CORRELATIONS

 SPEARMAN RHO CORRELATION - a method of correlation for finding the association between two sets of ranks; thus, the two variables must be on an ordinal scale.
- frequently used when the sample size is small (fewer than 30 pairs of measurements).
- also called rank-order correlation coefficient or rank-difference correlation.
- devised by Charles Spearman
 BISERIAL CORRELATION - expresses the relationship between a continuous variable and an artificial dichotomous variable.
 For example, the relationship between passing or failing the bar exam (artificial dichotomous variable) and general weighted average (GWA) in law school (continuous variable).
 POINT BISERIAL CORRELATION - correlates one continuous and one true dichotomous variable.
 For example, score in the test (continuous or interval) and correctness in an item within the test (true dichotomous).

KENDALL COEFFICIENT OF CONCORDANCE - a measure that uses ranks to assess agreement between 3 or more raters.

CORRELATIONAL LINK, ASSOCIATION, RELATIONSHIP

Correlation | Variable 1 | Variable 2
Pearson Product Moment Correlation | Continuous | Continuous
Spearman Rho/Rank Correlation | Rank | Rank
Point Biserial Correlation | True Dichotomy | Continuous
Biserial Correlation | Artificial Dichotomy | Continuous
Phi-coefficient | True Dichotomy | True/Artificial Dichotomy
Tetrachoric Correlation | Artificial Dichotomy | Artificial Dichotomy
Kendall Coefficient of Concordance | Agreement among 3 or more raters
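The Pearson and point biserial coefficients above can be illustrated with a short sketch; the data below are invented, and the point biserial is computed here as a Pearson r with the dichotomous variable coded 0/1 (a standard equivalence):

```python
from math import sqrt

def pearson_r(x, y):
    # Pearson product moment correlation between two score lists
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))  # unscaled covariance
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Continuous vs continuous (hypothetical study hours vs test score)
hours = [1, 2, 3, 4, 5]
score = [10, 12, 15, 19, 24]
print(round(pearson_r(hours, score), 3))   # prints 0.986

# Point biserial: item correct (true dichotomy, 0/1) vs total test score
item = [0, 0, 1, 1, 1]
total = [8, 11, 14, 16, 21]
print(round(pearson_r(item, total), 3))    # prints 0.83
```

Both results fall inside the -1.00 to +1.00 range that every correlation coefficient must respect.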
TRUE DICHOTOMY - there are only two possible categories, which are formed naturally; example: Gender (M/F)

ARTIFICIAL DICHOTOMY - reflects an underlying continuous scale forced into a dichotomy; there are other possibilities in a certain category; example: Exam score (Pass or Fail)

PHI-COEFFICIENT - correlates two dichotomous data; at least one should be a true dichotomy; example, gender of the population who passed or failed the 2018 Physician Licensure Exam.

ISSUES IN THE USE OF CORRELATION

RESIDUAL - difference between the predicted and the observed values.

STANDARD ERROR OF ESTIMATE - standard deviation of the residuals; a measure of accuracy of prediction.

SHRINKAGE - the amount of decrease observed when a regression equation is created for one population and then applied to another.

COEFFICIENT OF DETERMINATION (R2) - tells the proportion of the total variation in scores on Y that we know as a function of information about X. It also suggests the percentage shared by two variables; the effect of one variable on the other.

COEFFICIENT OF ALIENATION - measures the non-association between two variables.

RESTRICTED RANGE - significant relationships are difficult to find if the variability is restricted.

Essential Facts about Correlation

 The degree of relationship between two variables is indicated by the number in the coefficient, whereas the direction of the relationship is indicated by the sign.
 Correlation, even if high, does not imply causation.
 High correlations allow us to make predictions.

REGRESSION - the analysis of relationships among variables for the purpose of understanding how one variable may predict another through the use of linear regression.

 Predictor (X) – serves as the IV; causes changes to the other variable.
 Predicted (Y) – serves as the DV; the result of the change as the value of the predictor changes.
 INTERCEPT (a) - the point at which the regression line crosses the Y axis
 REGRESSION COEFFICIENT (b) - the slope of the regression line
 Represented by the formula: Y = a + bX
 REGRESSION LINE - the best-fitting straight line through a set of points in a scatter plot
 STANDARD ERROR OF ESTIMATE - measures the accuracy of prediction

MULTIPLE REGRESSION ANALYSIS - a type of multivariate (three or more variables) analysis which finds the linear combination of the variables that provides the best prediction.

 A statistical technique for predicting one variable from a series of predictors.
 Examines the intercorrelations among all the variables involved.
 Applicable only when all data are continuous.

STANDARDIZED REGRESSION COEFFICIENTS - also known as beta weights; tell how much each variable from a given list of variables predicts a single variable.

FACTOR ANALYSIS - used to study the interrelationships among a set of variables without reference to a criterion.

- Factors – the variables; also called principal components.
- Factor Loading – the correlation between the original items and the factors.

META-ANALYSIS - a family of techniques used to statistically combine information across studies to produce single estimates of the data under study.

ADVANTAGES:

 Can be replicated
 Conclusions tend to be more reliable and precise than conclusions from single studies
 More focus on effect size than statistical significance alone
 Promotes evidence-based practice – a professional practice that is based on clinical research findings.

Effect Size – the estimate of the strength of a relationship or size of differences; typically expressed as a correlation coefficient.
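The regression ideas above (Y = a + bX, residuals, standard error of estimate, R²) can be sketched numerically; a minimal Python example with invented paired data, fitting the line by ordinary least squares:

```python
from math import sqrt

# Hypothetical paired data: predictor X and predicted Y (invented)
X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n

# Slope b and intercept a of the best-fitting line Y = a + bX
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a = my - b * mx

predicted = [a + b * x for x in X]
residuals = [y - p for y, p in zip(Y, predicted)]  # observed minus predicted

# Standard error of estimate: spread of the residuals (n - 2 for a sample estimate)
see = sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Coefficient of determination: proportion of variation in Y accounted for by X
ss_tot = sum((y - my) ** 2 for y in Y)
r_squared = 1 - sum(r ** 2 for r in residuals) / ss_tot

print(round(b, 2), round(a, 2), round(see, 2), round(r_squared, 2))
```

With these numbers the fit is Y ≈ 2.2 + 0.6X; 1 − R² here corresponds to the coefficient of alienation idea above (the non-associated share of the variance).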
PARAMETRIC VS NON-PARAMETRIC TESTS
PARAMETRIC - assumptions are made about the population

 homogeneous data; normally distributed samples
 has a mean and SD
 randomly selected samples

NON-PARAMETRIC - assumptions are made about the samples only

 heterogeneous data; skewed distribution
 ordinal and categorical data
 highly purposive sampling

Test | Number of times DV is measured | Number of groups | Scale of measurement of DV
t-test independent means | 1 | 2 | Interval/Ratio
ANOVA 1-way | 1 | >2 | Interval/Ratio
t-test dependent means | 2 | 1 | Interval/Ratio
ANOVA Repeated Measures | >2 | 1 | Interval/Ratio
Mann-Whitney U Test | 1 | 2 | Ordinal
Kruskal-Wallis H Test | 1 | >2 | Ordinal
Wilcoxon Signed Rank Test | 2 | 1 | Ordinal
Friedman Test | >2 | 1 | Ordinal

ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING AND MEASUREMENT

ASSUMPTION 1 - Psychological Traits and States Exist

TRAITS - any distinguishable, relatively enduring way in which one individual varies from another

 Stable characteristics a person has
 Can be seen in most situations in a person's life (long-lasting)
 A collection of psychological traits forms a psychological type
 Traits exist as psychological constructs – an informed, scientific concept developed or constructed to describe or explain behavior. We infer the existence of these constructs through overt behavior.
 Traits are relatively enduring – not expected to be manifested in behavior 100% of the time.
 In psychological measurement, traits refer to a way in which one individual varies from another. Context is also very important when selecting appropriate trait terms for observed behavior.

STATES – a relatively less enduring, temporary change in one's personality

 A state is affected more by the power of a situation.
 Brief and can be controlled by manipulating the situation

ASSUMPTION 2 - Psychological Traits and States Can Be Quantified and Measured

 Traits and states shall be clearly defined to be measured accurately.
 Test developers and researchers, like other people in general, have many different ways of looking at and defining the same phenomenon (operational definition)
 The possible items that can be written to gauge the strength of that trait in test takers should also be considered (developing appropriate test items)
 Test scores are presumed to represent the strength of the targeted ability, trait, or state.

Cumulative scoring - the assumption that the higher the test taker's score is, the higher the test taker is presumed to be on the targeted ability or trait

ASSUMPTION 3 - Test-Related Behavior Predicts Non-Test-Related Behavior

 Provides some indication of the examinee's behavior outside the testing procedure.
 The obtained sample of behavior is typically used to make predictions about future behavior.
 In some cases, testing and assessment methods are used to postdict – to aid in understanding behavior that has already taken place.

ASSUMPTION 4 - Tests and Other Measurement Techniques Have Strengths and Weaknesses

 Competent test users must understand how a test was developed, the circumstances under which it is appropriate to administer the test, how to administer the test and to whom, how the test results should be interpreted, and how those limitations might be compensated for by data from other sources.

ASSUMPTION 5 - Various Sources of Error Are Part of the Assessment Process

 The typical use of the word "errors" refers to mistakes, miscalculations, and the like on a test.
 To the contrary, error traditionally refers to something that is more than expected; it is actually a component of the measurement process.

ERROR VARIANCE - the component of a test score attributable to sources other than the trait or ability measured

 Assessors and the people assessed can be sources of error.

ASSUMPTION 6 - Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner

 Several controversies over the fair use of tests have arisen in the history of the profession.
 One source of fairness-related problems is the test user who attempts to use a particular test on people whose background differs from that of the people for whom the test was made and intended.
 Other testing controversies in relation to employment, who gets admission to a school, and other opportunities have also arisen.
 However, all of these are still considered tools – ones which we can use properly or not.

ASSUMPTION 7 - Testing and Assessment Benefit Society

Without tests, there would be…

 Subjective personnel hiring processes
 Children with special needs assigned to certain classes by the gut feel of teachers and school administrators
 Great need to diagnose educational difficulties
 No instruments to diagnose neuropsychological impairments
 No practical way for the military to screen thousands of recruits
Typical vs Maximum Performance

Typical Performance Test | Maximum/Maximal Performance Test
How well the person typically does | How well the person can do
No right or wrong answers | Has a correct answer
Personality inventories, attitude scales, and opinion questionnaires | Achievement and aptitude tests, speed tests, and power tests

Test development - an umbrella term for all that goes into the process of creating a test.
The process of developing a test occurs in five stages:

1. Test conceptualization – stage wherein the following are determined: construct, goal, user, taker, administration, format, response, benefits, cost, and interpretation.
- determination of whether the test will be norm-referenced or criterion-referenced.
 Norm-referenced tests compare individual performance with the performance of a group.
 Criterion-referenced assessments measure how well a student has mastered a specific learning goal (or objective).
 Pilot work, pilot study, and pilot research refer, in general, to the preliminary research surrounding the creation of a prototype of the test. Test items may be pilot studied (or piloted) to evaluate whether they should be included in the final form of the instrument.

2. Test construction – writing the test items as well as formatting items, scoring rules, and otherwise designing and building the test.
- Scaling – the process of setting rules for assigning numbers in measurement.
 Designing a measuring device for the trait/ability being measured
 Manifested through its item format
 Dichotomous - offers two alternatives for each item. Usually a point is given for the selection of one of the alternatives. The most common example of this format is the true-false examination.
 Polytomous - (sometimes called polychotomous) resembles the dichotomous format except that each item has more than two alternatives. An example is the multiple-choice format, with distractors as the incorrect choices.
 Likert - so called because it was used as part of Likert's (1932) method of attitude scale construction. A scale using the Likert format consists of items such as "I am afraid of heights". Instead of asking for a yes-no reply, five alternatives are offered: strongly disagree, disagree, neutral, agree, and strongly agree. In some applications, six options are used to avoid allowing the respondent to be neutral. The six responses might be strongly disagree, moderately disagree, mildly disagree, mildly agree, moderately agree, and strongly agree.
 Category - a technique that is similar to the Likert format but that uses an even greater number of choices, e.g., a scale from 1 to 10, with 1 as the lowest and 10 as the highest.
 Visual analogue scale - popular for measuring self-rated health. Using this method, the respondent is given a 100-millimeter line and asked to place a mark between two well-defined endpoints.
 Item pool – usually 2 times the intended final-form number of items (3 times is advised for inexperienced test developers).
- The reservoir from which items will or will not be drawn for the final version of the test.
- The final test items should cover all domains of the test.
 Determination of scoring model
 Cumulative - assumes that the more the test taker responds in a particular fashion, the more the test taker exhibits the attribute being measured; probably the most common method for determining an individual's final test score.
 Categorical - test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way. This approach is used by some diagnostic systems wherein individuals must exhibit a certain number of symptoms to qualify for a specific diagnosis.
 Ipsative - comparing a test taker's score on one construct within a test to another construct within that same test.
 Creation of the "final form" of the test

3. Test tryout – administration of the test to a representative sample of test takers under conditions that simulate the conditions under which the "final version" of the test will be administered.
- Issues may include: determination of the target population and the number of samples for test tryout (# of items x 10); and the test tryout should be executed under conditions as identical as possible to the conditions under which the standardized test will be administered.

4. Item analysis – entails procedures, usually statistical, designed to explore how individual test items work as compared to other items in the test and in the context of the whole test.
- Determination of the following:
 Reliability - consistency of scores obtained when retested with the same test or with an equivalent form of the test
 Validity - the degree to which the test measures what it's supposed to measure
- Requires independent, external criteria against which the test is evaluated
- Validity coefficient – determines how closely the criterion performance can be predicted from the test score
 Low VC – low correspondence between test performance and criterion
 High VC – high correspondence between test performance and criterion
- Broader tests must be validated against accumulated data based on different investigations
- Validity is first established on a representative sample of test takers before the test is ready for use
 Tells us what the test is measuring
 Tells us the extent to which we know what the test measures
 Item Difficulty - calculated as the proportion of the total number of test takers who answered the item correctly.
 Item Discrimination - a measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly. The higher the value of d, the greater the number of high scorers answering the item correctly.
 Differential item functioning (DIF) - a phenomenon wherein an item functions differently in one group of test takers as compared to another group of test takers known to have the same (or similar) level of the underlying trait.

5. Test revision – balancing the weaknesses and strengths of the test or an item.
- After all necessary items have been revised based on the analysis of reliability, validity, item difficulty, and item discrimination, the test will be 'tried out' again to recalibrate the psychometric properties.

WHAT IS A GOOD TEST?

 Psychometric Soundness – also referred to as "psychometric adequacy"; refers to whether the tests demonstrate sufficient levels of reliability and validity for ethical use with clients.
 Reliability - consistency in measurement
- the precision with which the test measures and the extent to which error is present in measurement; "free from errors"
- a perfectly reliable measuring tool consistently measures in the same way
 Validity - when a test measures what it purports to measure
- An intelligence test is a valid test because it measures intelligence; the same goes for personality tests and other psychological tests
- Questions on a test's validity may focus on the items that collectively make up the test.
- A test may be reliable but not valid, but it cannot be valid without being reliable.
 Norms - test performance data of a particular group of test takers that are designed for use as a reference when evaluating or interpreting individual test scores
- Obtained by administering the test to a sample of people and obtaining the distribution of scores for that group
 Normative Sample - the group of people whose performance on a particular test is analyzed for reference in evaluating the performance of an individual test taker
 Norming - the process of deriving norms
- May be modified to describe a particular type of norm derivation
 Standardization - the process of administering a test to a representative sample of test takers for the purpose of establishing norms
- A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data

TYPES OF STANDARD ERROR

STANDARD ERROR OF MEASUREMENT (SEM) - a statistic used to estimate the extent to which an observed score deviates from a true score

STANDARD ERROR OF ESTIMATE (SEE) - in regression, an estimate of the degree of error involved in predicting the value of one variable from another

STANDARD ERROR OF THE MEAN (SEM) - a measure of sampling error

STANDARD ERROR OF THE DIFFERENCE (SED) - a statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant
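The notes above define these standard errors verbally; a minimal sketch using the conventional textbook formulas (an assumption on my part, since the formulas are not spelled out here): SEM = SD·√(1 − r), where r is the test's reliability coefficient, and SED = √(SEM₁² + SEM₂²). The scale values are invented:

```python
from math import sqrt

def sem(sd, reliability):
    # Standard error of measurement: expected spread of observed
    # scores around a test taker's true score
    return sd * sqrt(1 - reliability)

def sed(sem1, sem2):
    # Standard error of the difference between two scores
    return sqrt(sem1 ** 2 + sem2 ** 2)

# Hypothetical IQ-style scale: SD = 15, reliability coefficient = .91
s = sem(15, 0.91)
print(round(s, 2))          # prints 4.5

# Difference between two scores measured with the same instrument
print(round(sed(s, s), 2))  # prints 6.36
```

On this hypothetical scale, two scores would need to differ by well over 6 points before the difference starts to look statistically meaningful rather than measurement noise.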
SAMPLING TO DEVELOP NORMS

SAMPLE - the representative of the whole population; could be as small as one person, though samples that approach the size of the population reduce the possible sources of error due to insufficient sample size

SAMPLING - the process of selecting the portion of the universe deemed to be representative of the whole population
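The point that larger samples reduce error due to insufficient sample size can be demonstrated with a quick simulation; the population below is entirely invented (normally distributed scores with mean 100, SD 15):

```python
import random
import statistics

random.seed(7)

# Hypothetical population of 10,000 test scores (invented, normal-ish)
population = [random.gauss(100, 15) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# As the sample grows, the sample mean tends to land
# closer to the population mean (less sampling error)
for n in (10, 100, 1000):
    sample = random.sample(population, n)
    err = abs(statistics.mean(sample) - pop_mean)
    print(n, round(err, 2))
```

Re-running with different seeds will change the individual errors, but the downward trend with larger n is the standard-error-of-the-mean idea from the previous section in action.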