Madeline McCormick

SPED 5140

Assessment Reviews

Woodcock-Johnson IV (WJ IV)

 Purpose and Suggested uses/Time requirements for administration


o Overview/Purpose: The WJ IV is designed to measure intellectual abilities,
academic achievement, and oral language abilities. It consists of the following
three batteries: Tests of Cognitive Abilities (WJ IV COG), Tests of Oral Language
(WJ IV OL), and Tests of Achievement (WJ IV ACH). The WJ IV ACH consists
of 20 tests that evaluate strengths and weaknesses in reading, written language,
mathematics, and academic knowledge. These 20 tests are categorized as standard
battery tests or extended battery tests.
o Suggested Use: The WJ IV can be given to children at age 2 through adulthood.
Figure 3-2 on page 27 of the Examiner’s Manual gives suggested starting points
based on age.
o Time Requirements:
 Tests 1-6 take approximately 40 minutes to administer
 Tests 1-5 require about 5-10 minutes each
 Test 6 requires about 15-20 minutes
 Tests 7-12 take approximately 10-15 minutes each
 The time to administer the overall WJ IV differs depending on the
examinee
o Extra time can be given on some sections if requested by the
examinee and noted in the directions

 Construct (what the instrument measures)


o The WJ IV measures four domains (reading, written language, mathematics, and
academic knowledge) through the 20 tests described in the subtest section below.
o Within those four domains it tests the following skills:
 Comprehension-Knowledge
 Long-Term Retrieval
 Visual-Spatial Thinking
 Auditory Processing
 Fluid Reasoning
 Processing Speed
 Short-Term Memory
 Quantitative Knowledge
 Reading-Writing Ability

 Reliability and Validity


o Reliability: This is how consistently a test measures an ability. The WJ IV
Technical Manual contains the reliability studies and analyses that were
performed to verify that the test can reliably measure cognitive ability, oral
language ability, and achievement. The following factors were considered: error
of measurement, reliability coefficients, test reliabilities, cluster reliabilities, and
alternate-forms equivalence. The extensive research suggests the WJ IV is
reliable.
o Validity: This is whether the test measures what it purports to measure. The
Technical Manual provides evidence of validity consistent with the Standards for
Educational and Psychological Testing. It provides substantive, internal, and
external validity evidence. In addition, it acknowledges the benefits of the Rasch
model. The extensive research suggests the WJ IV is valid.

 Procedures for Standardization (be sure to note whether or not


Differential Item Functioning analyses were performed)
o The national standardization consisted of 7,416 individuals ages 2-90+ years old
during a 25-month period. The demographics of the sample closely resemble that
of the general population of the United States.
o Differential Item Functioning (DIF) is item bias that occurs when examinees
from different racial or gender groups score differently on the same item even
though they are at the same ability level. The WJ IV used the Rasch iterative-logit
method to detect DIF. Items were analyzed for race, gender, and ethnicity.

 Description of each subscale


o Test 1: Letter-Word Identification
 This test measures word identification skills. First, the examinee must
identify letters. Then, the examinee must correctly read aloud words from
a list. The examinee is only expected to read the words aloud, not define
them; the words increase in difficulty.
o Test 2: Applied Problems
 This test requires the examinee to analyze and solve math problems. The
examinee listens to the problem, decides which operation to use, and
completes simple calculations.
o Test 3: Spelling
 This test requires the examinee to write words that are given orally. First,
the ability to trace and draw lines is assessed. Then, the examinee must
produce lowercase and uppercase letters. The remaining items measure the
examinee’s ability to spell words correctly. The words are used within the
context of a sentence.
o Test 4: Passage Comprehension
 This test measures the ability to use syntactic and semantic cues to identify
a missing word that makes sense in the context of the passage. Initially,
there are only pictures and then more text is introduced and eventually
there are no pictures and longer passages with complex syntax.
o Test 5: Calculation
 This test measures the ability to perform mathematical computations.
First, the examinee is asked to write single numbers. Then, the examinee
uses addition, subtraction, multiplication, and division to solve equations.
o Test 6: Writing Samples
 This test measures the ability to write sentences in response to several
different demands. Spelling and punctuation are not penalized. The
sentences are evaluated for quality of expression.
o Test 7: Word Attack
 This test measures the ability to use phonic and structural analysis to
pronounce unfamiliar printed words. First, the examinee must produce the
sounds for single letters. Then, the examinee must read aloud nonsense
words that increase in difficulty.
o Test 8: Oral Reading
 This test measures story reading accuracy and prosody. The examinee
must read sentences aloud that increase in difficulty. Accuracy and
fluency are scored.
o Test 9: Sentence Reading Fluency
 This test measures reading rate: the examinee silently reads simple
sentences and circles yes or no in the Response Booklet. The examinee
has 3 minutes to complete as many sentences as possible.
o Test 10: Math Facts Fluency
 This test measures the speed of solving a mixture of simple addition,
subtraction, and multiplication facts. The examinee has 3 minutes to solve
as many math facts as possible.
o Test 11: Sentence Writing Fluency
 This test measures the ability to quickly formulate and write simple
sentences. The sentences are supposed to relate to the picture and use all
three words given.
o Test 12: Reading Retell
 This test measures reading comprehension. The examinee silently reads a
short story and retells as much information as he or she can remember.
There is a 5-minute time limit on this test.
o Test 13: Number Matrices
 This test measures quantitative reasoning. The examinee must identify the
missing number in a matrix.
o Test 14: Editing
 This test measures the ability to identify and correct errors in a written
passage. Possible errors to identify include incorrect punctuation,
capitalization, spelling, and word usage.
o Test 15: Word Reading Fluency
 This test measures vocabulary knowledge and semantic fluency. The
examinee is given 3 minutes to pick the two words that go together in each
row. Two words in each row are semantically related (for example,
synonyms or antonyms).

o Test 16: Spelling of Sounds
 This test measures spelling ability. Initially the examinee is asked to write
single letters with a single sound. Then, the examinee must listen to a
recording of nonsense words and spell them.
o Test 17: Reading Vocabulary
 This test includes two subtests: Synonyms and Antonyms. The first
subtest asks the examinee to read a word and give a synonym; the second
asks the examinee to read a word and give an antonym.
o Test 18: Science
 This test measures the examinee’s knowledge in the various sciences
(anatomy, biology, chemistry, geology, medicine, and physics). Initially
the examinee gives a pointing response but eventually must respond
orally.
o Test 19: Social Studies
 This test measures the examinee’s knowledge of history, economics,
geography, government, and psychology. Initially the examinee gives a
pointing response but eventually must respond orally.
o Test 20: Humanities
 This test measures the examinee’s knowledge of art, music, and literature.
Initially the examinee gives a pointing response but eventually must
respond orally.

 Administration procedures/specific training required


o Administration Procedures: The first page of each of the subtests gives an
Administration Overview that explains what to do. It also contains information on
scoring, basal, ceiling, and suggested starting points. This information is concise,
helpful, and easy to follow. The examiner must follow the guidelines and script
exactly. Professional judgment should be used regarding testing fatigue.
o Training: The examiner should receive formal training and complete the
Examiner Training Workbook before administering the test. The examiner should
have general knowledge of standardized administering, scoring, and testing
procedures. Graduate level training in assessment is recommended.

 Scoring procedures/types of scores obtained (grade, percentile rank)


o Raw Score: Each correct response receives 1 raw score point (except on Tests 6,
8, and 12, where items can earn more than 1 point)
o W Score: The raw scores are converted using the online scoring program into W
scores, which are adaptations of the Rasch ability scale.
o Grade Equivalent (GE): The grade equivalent reflects performance relative to the
grade level in the norming sample that has the same average score as the
examinee
o Age Equivalent (AE): The age equivalent reflects performance relative to the age
level in the norming sample that has the same average score as the
examinee
o W Difference Score: This is the difference between the examinee’s test score and
the average test score from the norming sample (same age or grade)
o Relative Proficiency Index (RPI): This allows predictions to be made about
quality of performance on similar tasks to the ones tested.
o Instructional Zone: The instructional zone is a range along a developmental scale
that targets the examinee’s current level of functioning from their independent
level to frustration level (easy to difficult).
o CALP Levels: This is an optional part of the report that describes the examinee’s
cognitive-academic language proficiency (CALP) which helps indicate their
language proficiency in English.
o Percentile Rank: This describes performance on a scale of 1-99 in comparison to a
portion of the norming sample at a specific age or grade level. The percentile rank
indicates the percentage that scored the same or lower than the examinee.
o Standard Score: This is based on a mean of 100 and a standard deviation of 15.
o Standard Error of Measurement (SEM): The quantity of error that is inherent in a
score is the SEM and it gives professionals an idea of how much confidence to
have in the score.
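The relationships among the standard score, percentile rank, and SEM described above can be illustrated with a short Python sketch. This is purely illustrative: actual WJ IV scores come from the norming tables in the online scoring program, the conversion below simply assumes a normal distribution, and the helper names (`percentile_rank`, `confidence_band`) are hypothetical, not part of any WJ IV software.

```python
from statistics import NormalDist

MEAN, SD = 100, 15  # the standard-score scale described above

def percentile_rank(standard_score: float) -> float:
    """Approximate percentage of the norming population scoring at or
    below this score, assuming a normal distribution (real percentile
    ranks come from the instrument's norm tables)."""
    return NormalDist(MEAN, SD).cdf(standard_score) * 100

def confidence_band(standard_score: float, sem: float, z: float = 1.96):
    """95% confidence interval implied by the SEM: the range in which
    the examinee's true score likely falls."""
    return (standard_score - z * sem, standard_score + z * sem)

# A standard score of 100 sits at about the 50th percentile; one SD
# above the mean (115) sits at about the 84th.
print(round(percentile_rank(100)))   # 50
print(round(percentile_rank(115)))   # 84
print(confidence_band(100, sem=3))   # roughly (94.1, 105.9)
```

For example, a standard score of 100 with a hypothetical SEM of 3 implies a 95% confidence band of roughly 94 to 106, which is the kind of interpretation the SEM supports.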

 Cautions and considerations for language/cultural diversity


o It is very important to have an examiner who is knowledgeable about several
issues relevant to bilingualism. The WJ IV Tests of Achievement Examiner’s
Manual states that “the examiner must be familiar with the second language
acquisition process, native language attrition, language shift in dominance, cross
linguistic transfer of learning, and the impact of special language programming
and socioeconomic factors on language learning” (p. 41).
o If the examiner is not familiar with second language acquisition, then they must
consult a specialist before, during, and after testing.
o Make sure the examinee understands the task required.
o The examiner must know how many years the examinee has been exposed to
English and the examinee’s abilities in his or her first language.

 Summary of strengths and weaknesses of instrument


o Strengths: Extensive research was conducted to establish that the WJ IV is
reliable and valid, which should give the examiner confidence when using it. In
addition, the multitude of tests and skills assessed can provide a plethora of
information about a student.
o Weaknesses: This is a lengthy test. I know the Special Education Teacher at our
school has had difficulty fitting the administration of this test into her busy
schedule. In addition, I’ve seen a first grader struggle with the length of the test
(so there is a concern about testing fatigue).

Gray Oral Reading Tests Fifth Edition (GORT-5)


 Purpose and Suggested uses/time requirements for administration
o Purpose: One of the purposes of the GORT-5 is to identify if a student’s oral
reading abilities are significantly behind his or her grade level peers. Another
purpose is to identify oral reading strengths and weaknesses and monitor progress
in special intervention programs. Additionally, the GORT-5 can be used in
research to study reading.
o Suggested Use: The GORT-5 is used with students between the ages of 6 and 23
to test oral reading rate, accuracy, fluency, and comprehension.
o Time Requirements: The GORT-5 is administered in 1-2 sessions of 15-45 minutes.

 Construct (what the instrument measures)


o The GORT-5 measures oral reading fluency and comprehension. This test has
equivalent Forms A and B that each contain 16 developmentally sequenced
reading passages. After each passage there are five comprehension questions.
o This is how the reading fluency and comprehension are measured:
 Fluency is the combination of the Rate and Accuracy scores:
 Rate is how many seconds it takes the student to read a story aloud.
 Accuracy is the number of words the student correctly pronounces
when reading a story aloud.
 Comprehension is how many of the five comprehension questions the
student correctly answers.
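The scoring arithmetic above can be sketched in a few lines of Python. This assumes the Rate and Accuracy scores have already been obtained from the manual's conversion table (not reproduced here), and the function names are illustrative, not part of the GORT-5 materials:

```python
def fluency_score(rate_score: int, accuracy_score: int) -> int:
    """Fluency is simply the sum of the Rate and Accuracy scores,
    each of which the examiner reads off the manual's conversion table."""
    return rate_score + accuracy_score

def comprehension_score(answers: list[bool]) -> int:
    """Each of the five comprehension questions is scored 1 (correct)
    or 0 (incorrect); the total is the Comprehension score."""
    return sum(answers)

# One hypothetical story: Rate 4, Accuracy 5, three of five questions correct.
print(fluency_score(4, 5))                                     # 9
print(comprehension_score([True, True, False, True, False]))   # 3
```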

 Reliability and Validity


o Reliability: This is how consistently a test measures an ability. To test the
reliability of the GORT, the following five types of correlation coefficients were
calculated: coefficient alpha, alternate forms (immediate administration), test-
retest, alternate forms (delayed administration), and interscorer reliability. These
correlation coefficients were used to measure the following three sources of error
variance: content, time, and scorer. In order for the GORT-5 to be reliable, the
reliability coefficients must be around .80 or above (higher coefficients are more
desirable). The tables in the Examiner’s Manual show the GORT-5 consistently
scores at a high degree of reliability in all the areas studied. This suggests the
GORT-5 is reliable with little test error.
o Validity: This is whether the test measures what it purports to measure.
 Content-Description Validity: There is an explanation of the rationale for
all of the components of the GORT in the Examiner’s Manual.
 Criterion-Prediction Validity: The scaled scores were checked against the
index scores of five criterion measures of reading ability to check if they
were strongly related.
 Construct-Identification Validity: A three-step process was used to
determine this type of validity:
 Identify constructs thought to be responsible for test performance
 Develop hypotheses based on the identified constructs
 Verify the hypotheses
 All of the results suggest the GORT-5 is valid.

 Procedures for Standardization (be sure to note whether or not


Differential Item Functioning analyses were performed)
o The norm sample was 2,556 students in 33 states between the ages of 6-23
 Characteristics considered for norming sample:
 Geographic Region
 Gender
 Race
 Hispanic Status
 Parents’ Education Attainment
 Household Income
 The percentages of the normative sample were compared to the most recent
Census report, demonstrating that the sample is representative
o Differential Item Functioning (DIF) is item bias that occurs when examinees from
different racial or gender groups score differently on the same item even though
they are at the same ability level. The GORT-5 used the logistic regression
procedure by Swaminathan and Rogers (1990) to detect DIF. This method
compares two different models.

 Description of each subscale


o The GORT-5 measures oral reading fluency and comprehension. Forms A and B
each contain 16 developmentally sequenced reading passages with five
comprehension questions.
o The Examiner Record Booklet has 8 Sections:
 Section 1 Identifying Information
 Provide the examinee’s name, gender, grade, and school
 Section 2 Record of GORT-5 Scores
 The rate, accuracy, fluency, and comprehension score from each
story are listed
 Section 3 Performance Summary
 The raw scores from section 2 are converted into age equivalent,
grade equivalent, percentile rank, and scaled score using the tables
in the Appendix
 Section 4 Descriptive Terms Corresponding to Scaled and Index Scores
 This shows the description terms (from very poor to very superior)
associated with the different ranges of scaled scores
 Section 5 Worksheet for Recording GORT-5 Miscues (Optional)
 Space is provided to do a miscue analysis of the specific types of
errors made while reading aloud. Categorized miscues include:
 Meaning similarity
 Function similarity
 Graphic/Phonemic similarity
 Multiple Sources
 Self-Correction
 Section 6 Summary of Other Reading Behaviors (Optional)
 Space is provided to note the following types of deviations from
print analysis:
 Substitutions
 Omissions
 Mispronunciations
 Additions
 Reversals
 Hesitations
 Space is provided to note the following observations:
 Posture
 Word-by-word reading
 Poor enunciation
 Disregard of punctuation
 Head movement
 Finger pointing
 Loss of place
 Nervousness
 Poor attitude
 Other
 Section 7 Prosody (Optional)
 Space is provided to rank the following categories:
 Expression
 Volume
 Phrasing
 Smoothness
 Pacing
 Section 8 Record of Performance
 This consists of the stories, comprehension questions, and
conversion table to mark how the student is performing
 There are also administration and scoring instructions
 Reminders about entry points, basals, and ceilings

 Administration procedures/specific training required


o Administration sequence for every story in the GORT-5:
 The student reads a story aloud and the examiner times the reading and
marks the errors in the Examiner Record Booklet
 After the student finishes reading, the examiner writes the time in seconds
in the Time section
 The examiner adds up the reading errors and writes the total in the
Deviations from Print section.
 The story is removed and five comprehension questions are asked. Each
response receives a score of 1 if correct or 0 if incorrect.
 The examiner adds up the correct comprehension responses and writes
the total in the Comprehension Score box.
 There is a table in the Examiner Record Booklet titled Converting the Time
and Deviations from Print to Rate and Accuracy which should be used to
convert the time into a Rate Score.
 The same table should be used to convert the Deviations from Print into an
Accuracy Score
 The Rate and Accuracy Scores are added together to get the Fluency Score
(this determines basals and ceilings)
o Training: The examiner should receive formal training to understand testing statistics,
administration procedures, scoring, and reading evaluation. It is also
recommended to have a supervised practice session.
 Teachers, school psychologists, and diagnosticians are considered
qualified examiners.

 Scoring procedures/types of scores obtained (grade, percentile rank)


o Rate, Accuracy, Fluency, and Comprehension (which are described in the
Construct and Administration sections of this report) are reported as age and grade
equivalents, percentile ranks, and scaled scores.
 Age and Grade Equivalents: These are made from the average scores of all
the examinees in the norming sample. There are Age and Grade
Equivalents for each point on a subtest.
 Percentile Ranks: These ranks are on a scale of 0-99 and show the
percentage of the norming sample that is at or below any particular
percentile.
 Scaled Scores: Norms for Rate, Accuracy, Fluency, and Comprehension
are reported as scaled scores that have a mean of 10 and a standard
deviation of 3.
o The Oral Reading Index (ORI) is given as a standard score and percentile rank.
 ORI is a composite score made by combining the sum of the Fluency
(Rate and Accuracy) and Comprehension scaled scores to get a mean of
100 and a standard deviation of 15. The ORI is the most reliable score on
the GORT-5.
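The idea of rescaling the summed scaled scores onto a mean-100, SD-15 scale can be sketched as follows. This is illustrative only: the real ORI is looked up in the GORT-5 norm tables, and the simple linear rescaling below (which assumes, unrealistically, that the summed scaled scores have mean 20 and SD 6) is an assumption for demonstration, not the actual conversion.

```python
def oral_reading_index(fluency_ss: int, comprehension_ss: int) -> float:
    """Hypothetical sketch of a composite: rescale the sum of two
    scaled scores (each mean 10, SD 3) onto a mean-100, SD-15 scale.
    The actual ORI comes from the GORT-5 norm tables, not a formula."""
    total = fluency_ss + comprehension_ss   # mean 20 under our assumption
    z = (total - 20) / 6                    # standardize the sum (assumed SD 6)
    return 100 + 15 * z                     # map onto mean 100, SD 15

print(oral_reading_index(10, 10))  # 100.0 (exactly average scaled scores)
print(oral_reading_index(13, 13))  # 115.0 (one SD above, under this assumption)
```

The point of the sketch is simply that a composite like the ORI aggregates the component scaled scores and re-expresses them on the familiar standard-score metric, which is why it is the most reliable score on the test.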

 Cautions and considerations for language/cultural diversity


o Considerations:
 It is very important to have an examiner who is knowledgeable about
several issues relevant to bilingualism.
 If the examiner is not familiar with second language acquisition, then they
must consult a specialist before, during, and after testing.
 Make sure the examinee understands the task required.
o Cautions:
 When interpreting the results, the factors associated with being a
multilingual learner must be considered
 Even though the GORT-5 is thoroughly standardized, caution should be used
when interpreting the results and making diagnostic and instructional
decisions.
 The GORT-5 is only one piece of evidence; a broader body of evidence should be compiled.

 Summary of strengths and weaknesses of instrument


o Strengths:
 The test seems straightforward, easy to follow, and quick.
 The research demonstrates that the test has strong validity and reliability.
 The miscue analysis section could help pinpoint specific areas to target
with instruction and intervention.
 This test can also be used with a large age span.
o Weaknesses:
 I could not find very much information about considerations for
multilingual learners in the Examiner’s Manual.
 My concern is that since there was not much information about it
in the manual, it is left to the person interpreting the results to
consider where the student is in the second language acquisition
process.
 Second language acquisition especially must be considered when
interpreting the comprehension results of a multilingual learner.
 Depending on why you are using this assessment, the fact that this test is
likely more useful for instruction and intervention than diagnosing a
disability could be considered a weakness.
