
Update on the Revisions to the Standards for Educational and Psychological Testing: Overview

2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Michael Kolen
University of Iowa

Joint Committee Members

- Lauress Wise, Co-Chair, HumRRO
- Barbara Plake, Co-Chair, University of Nebraska
- Linda Cook, ETS
- Fritz Drasgow, University of Illinois
- Brian Gong, NCIEA
- Laura Hamilton, RAND Corporation
- Jo-Ida Hansen, University of Minnesota
- Joan Herman, UCLA
May 1, 2010
Update on Revisions to the Test Standards

Joint Committee Members

- Michael Kane, ETS
- Michael Kolen, University of Iowa
- Antonio Puente, UNC-Wilmington
- Paul Sackett, University of Minnesota
- Nancy Tippins, Valtera Corporation
- Walter (Denny) Way, Pearson
- Frank Worrell, University of California, Berkeley

Scope of the Revision

- Based on comments each organization received from the invitation to comment
- Summarized by the Management Committee in consultation with the Co-Chairs

Management Committee:
- Wayne Camara, Chair, APA
- Suzanne Lane, AERA
- David Frisbie, NCME

Five Identified Areas for the Revisions

- Access/Fairness
- Accountability
- Technology
- Workplace
- Format issues


Theme Teams

- Working teams
- Cross-team collaborations
- Chapter leaders
- Focus on bringing content related to the themes into chapters in coherent and meaningful ways


Presentation: Five Identified Areas & Discussant

- Fairness: Joan Herman
- Accountability: Laura Hamilton
- Technology: Denny Way
- Workplace: Laurie Wise
- Format and Publication Options: Barbara Plake
- Discussant: Steve Ferrara, NCME Liaison to the Joint Committee

Timeline

- First meeting: January 2009
- Three-year process for completing the text of the revision
- Release of draft revision following the December 2010 Joint Committee meeting
- Open comment and organization reviews
- Projected publication: Summer 2012

Revision of the Standards for Educational and Psychological Testing: Fairness


2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Joan Herman CRESST/UCLA

Overview

- 1999 Approach to Fairness
- Committee Charge
- Revision Response


1999 Approach

Standards related to fairness appear throughout many chapters, with concentrated attention in:
- Chapter 7: Fairness in Testing and Test Use
- Chapter 8: Rights and Responsibilities of Test Takers
- Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
- Chapter 10: Testing Individuals with Disabilities


Committee Charge

Five elements of the charge focused on accommodations/modifications:
- Impact/differentiation of accommodations and modifications
- Appropriate selection/use for ELL and EWD
- Attention to other groups (e.g., pre-K, older populations)
- Flagging
- Comparability/validity

One element focused on the adequacy and comparability of translations. One element focused on Universal Design.

Revision Response

- Fairness is fundamental to test validity: include it as a foundation chapter
- Fairness and access are inseparable
- The same principles of fairness and access apply to all individuals, regardless of specific subgroup
- From three chapters to a single chapter describing core principles and standards
  - Examples drawn from ELs, EWD, and other groups (young children, aging adults, etc.)
  - Comments point to applications for specific groups
  - Special standards retained where appropriate (e.g., test translations)

Overview to Fairness Chapter

- Section I: General Views of Fairness
- Section II: Threats to the Fair and Valid Interpretations of Test Scores
- Section III: Minimizing Construct-Irrelevant Components Through the Use of Test Design and Testing Adaptations
- Section IV: The Standards


Four Clusters of Standards


1. Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
2. Conduct studies to examine the validity of test score inferences for the intended examinee population.
3. Provide appropriate accommodations to remove barriers to the accessibility of the construct measured by the assessment and to the valid interpretation of the assessment scores.
4. Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.


Revision of the Standards for Educational and Psychological Testing: Accountability


2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Laura Hamilton RAND Corporation

Overview

Use of tests for accountability has expanded:
- Most notably in education, but also in other areas such as behavioral health
- Facilitated by the increasing availability of data and analysis tools
- Recent and impending federal and state initiatives will likely lead to further expansion

Under NCLB, or new pay-for-performance programs, tests often have consequences for individuals other than the examinees.

Use of test scores in policy and program evaluations continues to be widespread:
- Reinforced by groups that fund and evaluate research (e.g., IES, What Works Clearinghouse)

Organization of Accountability Material

- The chapter on policy uses of tests focuses on the use of aggregate scores for accountability and policy
- The chapter on educational testing addresses student-level accountability (e.g., promotional gates, high school exit exams) and interim assessment
- Validity, reliability, and fairness standards in earlier chapters apply to accountability testing as well



Some Key Accountability Issues Included in Our Charge


1. Calculation of accountability indices using composite scores at the level of the institution or the individual
   - Institutional level (e.g., conjunctive and disjunctive rules for combining scores)
   - Individual level (e.g., teacher value-added modeling)
2. Issues related to validity, reliability, and reporting of individual and aggregate scores
3. Test preparation
4. Interim assessments


1. Accountability Indices

- Most test-based accountability systems require calculation of indices using a complex set of rules
- Advances in data systems and statistical methodology have led to more sophisticated indices intended to support causal inferences (e.g., teacher and principal value-added measures)
- Consequences attached to these measures are growing increasingly significant


2. Validity, Reliability, and Reporting Requirements



- Accountability indices should be subjected to validation related to their intended purposes
- Error estimates should be incorporated into score reports, including those that provide subscores and diagnostic guidance for individuals or groups
- Reports should provide clear, detailed information on the rules used to create aggregate scores or indices


2. Validity, Reliability, and Reporting Requirements, cont.

Guidance should be provided for the interpretation of scores from subgroups:
- Describe exclusion rules, accommodations, and modifications
- Address error stemming from small subgroups
- Explain the contribution of subgroup performance to the accountability index

Teachers and other users should be given assistance to ensure appropriate interpretation and use of information from tests.


3. Test Preparation


- High-stakes testing raises concerns about inappropriate test preparation
- Users should take steps to reduce the likelihood of test preparation that undermines validity:
  - Help administrators and teachers understand what kinds of preparation are appropriate and desirable
  - Design tests and testing systems to limit the likelihood of harmful test preparation
- Consequences of accountability policies should be monitored


4. Addressing Interim Assessments

Interim assessments are common but take many different forms:
- Some are produced by commercial publishers; others are homegrown
- They vary in the extent to which they provide formative feedback vs. benchmarking to end-of-year tests
- Need to determine which of these tests should be subject to the Standards

Requirements for validity and reliability depend in part on how scores are used:
- If used for high-stakes decisions such as placement, evidence of validity for that purpose should be provided
- Systems that provide instructional guidance should include a rationale and evidence to support it

Revision of the Standards for Educational and Psychological Testing: Technology


2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Denny Way Pearson

Overview
- Technological advances are changing the way tests are delivered, scored, and interpreted, and in some cases the nature of the tests themselves
- The Joint Committee has been charged with considering how technological advances should affect revisions to the Standards
- As with the other themes, comments on the standards related to technology were compiled by the Management Committee and summarized in their charge to the Joint Committee

Key Technology Issues Included in our Charge

- Reliability and validity of innovative item formats
- Validity issues associated with the use of:
  - Automated scoring algorithms
  - Automated score reports and interpretations
- Security issues for tests delivered over the internet
- Issues with web-accessible data, including data warehousing

Reliability & Validity of Innovative Item Formats



- What special issues exist for innovative items with respect to access and the elimination of bias against particular groups? How might the standards reflect these issues?
- What steps should the standards suggest with regard to the usability of innovative items?
- What issues will emerge over the next five years related to innovative items and test formats that need to be addressed by the standards?


Automated Scoring Algorithms


- What level of documentation and disclosure is appropriate and tolerable for automated scoring developers and vendors?
- What sorts of evidence seem most important for demonstrating the validity and reliability of automated scoring systems?
- What issues will emerge over the next five years related to automated scoring systems that need to be addressed by the standards?


Expert Panel Input

- To address issues related to innovative item formats and automated scoring algorithms, we convened a panel of experts from the field and solicited their advice
- Invited members made presentations on these topics and discussed the associated issues with the Joint Committee

Highlights of Technology Panel Input

- Test development and simulations
- Security and fairness
- Timed tasks and processing speed
- Innovative clinical assessments and faking (effort assessment)
- Rationale/validity argument
- Usability studies/field testing


Highlights of Technology Panel Input

Disclosure of automated scoring algorithms:
- Differing viewpoints: disclose everything in great detail (using patents to protect proprietary IP) vs. provide sufficient documentation for other experts to confirm the validity of the process
- Possible compromise: expert review under conditions of nondisclosure

Quality assurance: the importance of independent calibrations


Automated Score Reports and Interpretation

- Use of the computer for score interpretation
- Actionable reports (e.g., routing students and teachers to instructional materials and lesson plans based on test results)
  - Documentation of rationale
  - Supporting validity evidence


Revision of the Standards for Educational and Psychological Testing: Workplace Testing
2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Laurie Wise
Human Resources Research Organization
(HumRRO)

Overview

- Standards for testing in the workplace are currently covered in Chapter 14 (one of the testing application chapters). Workplace testing includes employment testing as well as licensure, certification, and promotion testing.
- Comments on standards related to workplace testing were received by the Management Committee and summarized in their charge to the Joint Committee.
- Comments suggested areas for extending or clarifying testing standards but did not suggest major revisions to existing standards.


Key Workplace Testing Issues Included in Our Charge


1. Validity and reliability requirements for certification and licensure tests.
2. Issues when tests are administered only to small populations of job incumbents.
3. Requirements for tests for new, innovative job positions that do not have incumbents or a job history to provide validity evidence.
4. Assuring access to licensure and certification tests for examinees with disabilities that may limit participation in regular testing sessions.
5. Differential requirements for certification and licensure versus employment tests.


1. Validity and Reliability Requirements for Certification

Some specific issues:
- Documenting and communicating the validity and reliability of pass-fail decisions in addition to the underlying scores
- How cut-offs are determined
- How validity and reliability information is communicated to relevant stakeholders

A key change is the need to focus on pass-fail decisions.

2. Issues with Small Examinee Populations

Including:
- Alternatives to statistical tools for item screening
- Alternatives to empirical validity evidence
- Maintaining comparability of scores from different test forms
- Assuring fairness
- Assuring technical accuracy

The key concern is the appropriate use of expert judgment.


3. Requirements for New Jobs

Issues include:
- Identifying test content
- Establishing passing scores
- Assessing reliability
- Demonstrating validity

Key here is also the appropriate use of expert judgment.


4. Assuring Access to Certification and Licensure Testing

See also the separate presentation on fairness. Issues include:
- Determining appropriate versus inappropriate accommodations
- Relating testing accommodations to accommodations available in the workplace


5. Certification and Licensure versus Employment Testing

Currently, two sections in the same chapter. Examples of relevant issues:
- Differences in how test content is identified
- Differences in validation strategies
- Differences in test score use
- Who oversees testing

The goal is to increase coherence in the approach to these two related uses of tests.


Revision of the Standards for Educational and Psychological Testing: Format and Publication
2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Barbara Plake University of Nebraska-Lincoln

Format Issues

- Organization of chapters
- Consideration of ways to identify priority standards
- More parallelism between chapters
- Tone
- Complexity
- Technical language


Organization of Chapters

1999 Testing Standards: three sections
- Foundation: Validity, Reliability, Test Development, Scaling & Equating, Administration & Scoring, Documentation
- Fairness: Fairness, Test Takers' Rights and Responsibilities, Disabilities, Linguistic Minorities
- Applications: Test Users, Psychological, Educational, Workplace, Policy


Revised Test Standards Possible Chapter Organization


- Section 1: Validity, Reliability, Fairness
- Section 2: Test Design and Development, Scaling & Equating, Test Administration & Scoring, Documentation, Test Takers, Test Users
- Section 3: Psychological, Educational, Workplace, Policy and Accountability

Possible Ways to Identify Priority Standards

- Clustering of standards into thematic topics
- Over-arching standards/guiding principles
- Application chapters
- Connection of standards to previous standards


More Parallelism Across Chapters

- Cross-team collaborations
- Content editor with psychometric expertise
- Structural continuity


Publication Options

- Management Committee responsibility
- Goal is electronic access
- Pursuing options for Kindle, etc.
- Concerns about retaining integrity and financial support for future revision efforts

