Testing Protocol

Standardized Assessment Worksheet
Which Standardized Assessment did you choose?
- I chose the Bayley Scales of Infant and Toddler Development, Fourth Edition (Bayley-4).
What age range is this assessment appropriate for?
- Birth to 42 months old.
What languages is it available in (if stated)?
- The manual is currently available only in English.
What does it mean to be a criterion-referenced test (textbook has answer)?
- A criterion-referenced test is interpreted according to fixed standards that represent an
acceptable level of performance on a defined list of skills. It is usually detailed and covers
specific skills or developmental milestones. It does not have a score distribution; rather, the
child passes or fails each item. The items are chosen for their functional and developmental
importance, so that they provide the information needed to develop therapy objectives for that
specific individual.
What does the manual say about how the criteria for scoring were developed for this test?
- The technical manual states that the primary purpose of the Bayley Scales of Infant and
Toddler Development, Fourth Edition (BSID-IV) is to identify children with developmental
delays and to provide information for intervention planning. The manual discusses the history
of the BSID-IV and the uneven course of development for each individual over the first two
years of life. It is a power test, which allows flexibility in administration while still
yielding useful data. The assessment has been revised many times, resulting in the BSID-IV.
It comprises Cognitive, Language, Motor, Social-Emotional, and Adaptive Behavior scales, with
subtests that measure aspects of each scale. The assessment incorporates administration of
structured items, direct observation of behaviors, and active participation of parents or
caregivers in the evaluation process.
- The administration manual states that the Bayley provides three types of scores: scaled
scores, standard scores, and percentile ranks. Age equivalents can also be derived from raw
scores to represent the average age in months associated with a given raw score. The
assessment gives both quantitative and qualitative information to chart a child’s progression.
Lastly, the assessment has guidelines for time duration, subtest order, item order, and test
materials that must be followed during testing to meet standardization requirements.
Where are the response criteria for scoring explained for the administrator? (in test booklet,
section of manual, supplemental booklet, easel)?
- The response criteria for scoring are explained in the administration manual on pages 27-38
and pages 269-282.
There are times when we might need to modify or adapt testing protocols to allow a child to
complete the test items. These modifications are noted by the test administrator in the protocol
sheets and reported in the evaluation report. A good test administrator understands what can be
done without violating the criterion for testing. What does the manual say about allowable
modifications?
- The manual states that the test does not need to be given in a specific order. This
allows for flexibility and for child participation at different stages. As long as the required
tasks are administered and each is scored as completed or not, administration can be modified.
o Items can have simplified directions.
o Directions are less scripted.
o Many items are scored via observation.
What is construct validity?
- Construct validity establishes the ability of an instrument to measure the dimensions and
theoretical foundations of an abstract construct. Criterion-related validity, by comparison,
establishes the correspondence between a target measure and a reference or “gold” standard
measure of the same construct. In other words, it describes how well a test's results agree
with other measures of related constructs or activities.
Does this assessment have construct validity? In the case of the Bayley, refer to the development of
the Motor and Cognitive Scales only.
- Yes, this assessment has construct validity.
Does the manual mention the assessment’s ability to discriminate between younger and
older children? If it does, it will be found in the section on construct validity.
- Page 61 of the technical manual discusses developmental age equivalents, and the
administration manual uses chronological age to differentiate between younger and older
children and to determine which items to use for certain measurements.
Does the manual mention the assessment’s ability to distinguish between typically
developing children and those with specific diagnoses? If it does, it will be found in the
section on construct validity.
- On pages 65 and 81-85, the technical manual discusses typically developing children
compared with children who have certain diagnoses, including the exclusion and inclusion
criteria for specific diagnoses. In the administration manual, there are tests that are
appropriate for some diagnoses but may not be appropriate for others.
What is content validity?
- Content validity establishes that the multiple items that make up a questionnaire, inventory,
or scale adequately sample the universe of content that reflects the construct being
measured.
Does this assessment have content validity? In the case of the Bayley, refer to the development of
the Motor and Cognitive Scales only. Was this assessment compared to other assessments?
- This assessment has content validity. The manual compares the Bayley-4 to the Vineland-3,
WPPSI-IV, PDMS-3, and PLS-4.
What does it mean to be norm-referenced (textbook has answer)?
- Norm-referenced tests are developed by giving the test to a large number of children,
usually several hundred or more. This group is the normative sample. A child's score is
interpreted by comparison with the normative sample, which shows how the child performs in
relation to the average performance of that sample.
On what population(s) was this assessment normed? Include numbers of children in each category
if information is provided.
- Children of Asian, African American, Hispanic, Caucasian, and Other racial/ethnic
backgrounds, across genders, were included.
- Pediatric disorders
o ASD (31)
o Developmental delays (57)
o Language delays (25)
o Language impairment (25)
o Motor impairment (40)
o Prenatal drug/alcohol abuse (44)
o Moderate/late premature (70)
o Very/extremely premature (66)
- Mixed populations
o Pediatric Normative sample (1,700)
o Cognitive, language, Motor (1,700)
o Social (320)
o Adaptive behavior (750)
- Disabled individuals
o Down syndrome (54)
What geographic areas were used?
- Midwest, Northeast, South, West
What socio-economic groups were studied?
- Socioeconomic status was not reported in the manual, but education levels were. These
included 0-12 years with no diploma, high school diploma or equivalent, some college or an
associate degree, and a bachelor’s degree or more.
What is test-retest reliability (textbook)
- Test-retest reliability is a measure of the stability of a test over time. It is obtained by
giving the test to the same individual on two different occasions.
What does the manual say about the test-retest reliability of this assessment? If they give a
number, report it.
- The manual states that the test-retest information is split into two parts. The first section
describes the studies conducted to examine the relation between the Bayley-4 and other
measures. The second section describes the special group studies.
- The pediatric normative sample has acceptable test-retest reliability for both the scales and
the subscales of the assessment.
What is inter-rater reliability (textbook)?
- Interrater reliability refers to the ability of two independent raters to obtain the same scores
when scoring the same child concurrently.
What does the manual say about inter-rater reliability for this assessment? If they give a number,
report it.
- This assessment has adequate to excellent interrater reliability. There is a pre-post test for
the child to get basal and progression data. The sample size given is n=47, with interrater
reliability ranging from .60 to .80 depending on the subscale.
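A reliability coefficient such as the .60 to .80 values above summarizes how closely two sets of scores agree. The sketch below is only an illustration of that idea: it assumes a Pearson correlation (the type of coefficient used in the manual is not stated here), and the function name and the two raters' scores are hypothetical, not data from the Bayley-4.

```python
from math import sqrt


def pearson_r(scores_a, scores_b):
    """Pearson correlation between two equal-length lists of scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    mean_a = sum(scores_a) / len(scores_a)
    mean_b = sum(scores_b) / len(scores_b)
    # Sum of cross-products of deviations from each rater's mean
    cross = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    # Sums of squared deviations for each rater
    ss_a = sum((a - mean_a) ** 2 for a in scores_a)
    ss_b = sum((b - mean_b) ** 2 for b in scores_b)
    return cross / sqrt(ss_a * ss_b)


# Hypothetical scaled scores given by two raters to the same six children
rater_a = [8, 10, 12, 9, 11, 7]
rater_b = [9, 10, 11, 9, 12, 7]

r = pearson_r(rater_a, rater_b)
print(f"Agreement between raters (Pearson r): {r:.2f}")
```

The same calculation could be applied to test-retest data by replacing the two raters' scores with the same children's scores from two testing occasions.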
When reporting test results, what type of scores are produced?
Z score?
- Standard scores with a mean of 0 and an SD of 1. Scores from -1 to 1 fall within one SD of the mean.
T score?
- Standard scores with a mean of 50 and an SD of 10. Anything lower than 50 indicates a score below the mean.
Developmental index?
- Growth scale values over time, with a mean of 100 and an SD of 15 or 16.
Percentile score?
- Percentile ranks are usually given as a number with a percent. A percentile rank of 60 (the
60th percentile) indicates that 60% of the people in the test sample received a score below
the child's raw score.
Age equivalence?
- Age-equivalent scores represent the average age in months at which a given raw score falls at
the 50th percentile. An age equivalent represents only the score that a child of that age who is
performing at the 50th percentile would receive.
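The score types above are all re-expressions of the same distance from the mean. A minimal sketch of the arithmetic follows, assuming scores are normally distributed and using an SD of 15 for the developmental index. The function names and the norm-group mean and SD are hypothetical, and an actual Bayley report would take its scores from the tables in the manual rather than from these formulas.

```python
from statistics import NormalDist


def raw_to_z(raw, norm_mean, norm_sd):
    """Z score: distance from the norm-group mean in SD units."""
    return (raw - norm_mean) / norm_sd


def z_to_t(z):
    """T score: mean of 50, SD of 10."""
    return 50 + 10 * z


def z_to_index(z, mean=100, sd=15):
    """Developmental index / standard score: mean of 100, SD of 15 (assumed)."""
    return mean + sd * z


def z_to_percentile(z):
    """Percentile rank under a normal distribution (an assumption)."""
    return 100 * NormalDist().cdf(z)


# Hypothetical norm group: mean raw score 40, SD 8; child's raw score 32
z = raw_to_z(32, norm_mean=40, norm_sd=8)
print("z score:", z)
print("T score:", z_to_t(z))
print("index score:", z_to_index(z))
print("percentile rank:", round(z_to_percentile(z), 1))
```

In this sketch, a child one SD below the mean has a z score of -1, a T score of 40, and an index score of 85, and falls at roughly the 16th percentile.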
Each of these ways of reporting describes an individual’s test results as a distance from the mean.
What terms are suggested to use for interpreting the scores for families?
- Some terms that could be used to interpret scores for families are population and constructs
or concepts, explained in a culturally respectful way. Depending on the assessment, there may
not be a specific number to report.
