
Update on the Revisions to the Standards for Educational and Psychological Testing: Overview

2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Michael Kolen
University of Iowa

Joint Committee Members

- Lauress Wise, Co-Chair, HumRRO
- Barbara Plake, Co-Chair, University of Nebraska
- Linda Cook, ETS
- Fritz Drasgow, University of Illinois
- Brian Gong, NCIEA
- Laura Hamilton, RAND Corporation
- Jo-Ida Hansen, University of Minnesota
- Joan Herman, UCLA
May 1, 2010
Update on Revisions to the Test Standards

Joint Committee Members

- Michael Kane, ETS
- Michael Kolen, University of Iowa
- Antonio Puente, UNC-Wilmington
- Paul Sackett, University of Minnesota
- Nancy Tippins, Valtera Corporation
- Walter (Denny) Way, Pearson
- Frank Worrell, University of California, Berkeley

Scope of the Revision

- Based on comments each organization received from the invitation to comment
- Summarized by the Management Committee in consultation with the Co-Chairs

Management Committee:
- Wayne Camara, Chair, APA
- Suzanne Lane, AERA
- David Frisbie, NCME

Five Identified Areas for the Revisions

- Access/Fairness
- Accountability
- Technology
- Workplace
- Format issues


Theme Teams

- Working teams
- Cross-team collaborations
- Chapter leaders
- Focus on bringing content related to the themes into chapters in coherent and meaningful ways


Presentation: Five Identified Areas & Discussant

- Fairness: Joan Herman
- Accountability: Laura Hamilton
- Technology: Denny Way
- Workplace: Laurie Wise
- Format and Publication Options: Barbara Plake
- Discussant: Steve Ferrara, NCME Liaison to the Joint Committee

Timeline

- First meeting: January 2009
- Three-year process for completing the text of the revision
- Release of draft revision following the December 2010 Joint Committee meeting
- Open comment and organization reviews
- Projected publication: Summer 2012

Revision of the Standards for Educational and Psychological Testing: Fairness


2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Joan Herman CRESST/UCLA

Overview

- 1999 Approach to Fairness
- Committee Charge
- Revision Response


1999 Approach

Standards related to fairness appear throughout many chapters, with concentrated attention in:
- Chapter 7: Fairness in Testing and Test Use
- Chapter 8: Rights and Responsibilities of Test Takers
- Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
- Chapter 10: Testing Individuals with Disabilities


Committee Charge

Five elements of the charge focused on accommodations/modifications:
- Impact/differentiation of accommodations and modifications
- Appropriate selection/use for ELL and EWD
- Attention to other groups (e.g., pre-K, older populations)
- Flagging
- Comparability/validity

One element focused on the adequacy and comparability of translations. One element focused on Universal Design.

Revision Response

- Fairness is fundamental to test validity: include it as a foundation chapter
- Fairness and access are inseparable
- The same principles of fairness and access apply to all individuals, regardless of specific subgroup
- From three chapters to a single chapter describing core principles and standards
  - Examples drawn from ELs, EWD, and other groups (young children, aging adults, etc.)
  - Comments point to applications for specific groups
  - Special standards retained where appropriate (e.g., test translations)

Overview to Fairness Chapter

- Section I: General Views of Fairness
- Section II: Threats to the Fair and Valid Interpretations of Test Scores
- Section III: Minimizing Construct-Irrelevant Components Through the Use of Test Design and Testing Adaptations
- Section IV: The Standards


Four Clusters of Standards


1. Use test design, development, administration, and scoring procedures that minimize barriers to valid test interpretations for all individuals.
2. Conduct studies to examine the validity of test score inferences for the intended examinee population.
3. Provide appropriate accommodations to remove barriers to the accessibility of the construct measured by the assessment and to the valid interpretation of the assessment scores.
4. Guard against inappropriate interpretations, use, and/or unintended consequences of test results for individuals or subgroups.


Revision of the Standards for Educational and Psychological Testing: Accountability


2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Laura Hamilton RAND Corporation

Overview

Use of tests for accountability has expanded:
- Most notably in education, but also in other areas such as behavioral health
- Facilitated by the increasing availability of data and analysis tools
- Recent and impending federal and state initiatives will likely lead to further expansion

Under NCLB, or new pay-for-performance programs, tests often have consequences for individuals other than the examinees.

Use of test scores in policy and program evaluations continues to be widespread:
- Reinforced by groups that fund and evaluate research (e.g., IES, What Works Clearinghouse)

Organization of Accountability Material

- The chapter on policy uses of tests focuses on the use of aggregate scores for accountability and policy
- The chapter on educational testing addresses student-level accountability (e.g., promotional gates, high school exit exams) and interim assessment
- Validity, reliability, and fairness standards in earlier chapters apply to accountability testing as well



Some Key Accountability Issues Included in Our Charge


1. Calculation of accountability indices using composite scores at the level of the institution or the individual
   - Institutional level (e.g., conjunctive and disjunctive rules for combining scores)
   - Individual level (e.g., teacher value-added modeling)
2. Issues related to validity, reliability, and reporting of individual and aggregate scores
3. Test preparation
4. Interim assessments


1. Accountability Indices

- Most test-based accountability systems require calculation of indices using a complex set of rules
- Advances in data systems and statistical methodology have led to more sophisticated indices intended to support causal inferences (e.g., teacher and principal value-added measures)
- Consequences attached to these measures are growing increasingly significant


2. Validity, Reliability, and Reporting Requirements



- Accountability indices should be subjected to validation related to their intended purposes
- Error estimates should be incorporated into score reports, including those that provide subscores and diagnostic guidance for individuals or groups
- Reports should provide clear, detailed information on the rules used to create aggregate scores or indices


2. Validity, Reliability, and Reporting Requirements, cont.

Guidance should be provided for the interpretation of scores from subgroups:
- Describe exclusion rules, accommodations, and modifications
- Address error stemming from small subgroups
- Explain the contribution of subgroup performance to the accountability index

Teachers and other users should be given assistance to ensure appropriate interpretation and use of information from tests.


3. Test Preparation


- High-stakes testing raises concerns about inappropriate test preparation
- Users should take steps to reduce the likelihood of test preparation that undermines validity:
  - Help administrators and teachers understand what kinds of preparation are appropriate and desirable
  - Design tests and testing systems to limit the likelihood of harmful test preparation
- Consequences of accountability policies should be monitored


4. Addressing Interim Assessments

Interim assessments are common but take many different forms:
- Some are produced by commercial publishers; others are homegrown
- They vary in the extent to which they provide formative feedback vs. benchmarking to end-of-year tests
- Need to determine which of these tests should be subject to the Standards

Requirements for validity and reliability depend in part on how scores are used:
- If used for high-stakes decisions such as placement, evidence of validity for that purpose should be provided
- Systems that provide instructional guidance should include a rationale and evidence to support it

Revision of the Standards for Educational and Psychological Testing: Technology


2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Denny Way Pearson

Overview
- Technological advances are changing the way tests are delivered, scored, and interpreted, and in some cases the nature of the tests themselves
- The Joint Committee has been charged with considering how technological advances should affect revisions to the Standards
- As with the other themes, comments on the standards related to technology were compiled by the Management Committee and summarized in their charge to the Joint Committee

Key Technology Issues Included in our Charge

- Reliability and validity of innovative item formats
- Validity issues associated with the use of:
  - Automated scoring algorithms
  - Automated score reports and interpretations
- Security issues for tests delivered over the internet
- Issues with web-accessible data, including data warehousing

Reliability & Validity of Innovative Item Formats



- What special issues exist for innovative items with respect to access and the elimination of bias against particular groups? How might the standards reflect these issues?
- What steps should the standards suggest with regard to the usability of innovative items?
- What issues will emerge over the next five years related to innovative items and test formats that need to be addressed by the standards?


Automated Scoring Algorithms


- What level of documentation and disclosure is appropriate and tolerable for automated scoring developers and vendors?
- What sorts of evidence seem most important for demonstrating the validity and reliability of automated scoring systems?
- What issues will emerge over the next five years related to automated scoring systems that need to be addressed by the standards?


Expert Panel Input

- To address issues related to innovative item formats and automated scoring algorithms, we convened a panel of experts from the field and solicited their advice
- Invited members made presentations on these topics and discussed the associated issues with the Joint Committee

Highlights of Technology Panel Input

- Test development and simulations
- Security and fairness
- Timed tasks and processing speed
- Innovative clinical assessments and faking (effort assessment)
- Rationale/validity argument
- Usability studies/field testing


Highlights of Technology Panel Input

Disclosure of automated scoring algorithms:
- Differing viewpoints: disclose everything in great detail (using patents to protect proprietary IP) vs. provide sufficient documentation for other experts to confirm the validity of the process
- Possible compromise: expert review under conditions of nondisclosure

Quality assurance: the importance of independent calibrations


Automated Score Reports and Interpretation

- Use of the computer for score interpretation
- Actionable reports (e.g., routing students and teachers to instructional materials and lesson plans based on test results)
  - Documentation of rationale
  - Supporting validity evidence


Revision of the Standards for Educational and Psychological Testing: Workplace Testing
2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Laurie Wise
Human Resources Research Organization
(HumRRO)

Overview

- Standards for testing in the workplace are currently covered in Chapter 14 (one of the testing application chapters). Workplace testing includes employment testing as well as licensure, certification, and promotion testing.
- Comments on standards related to workplace testing were received by the Management Committee and summarized in their charge to the Joint Committee.
- Comments suggested areas for extending or clarifying testing standards but did not suggest major revisions to existing standards.


Key Workplace Testing Issues Included in Our Charge


1. Validity and reliability requirements for certification and licensure tests.
2. Issues when tests are administered only to small populations of job incumbents.
3. Requirements for tests for new, innovative job positions that do not have incumbents or a job history to provide validity evidence.
4. Assuring access to licensure and certification tests for examinees with disabilities that may limit participation in regular testing sessions.
5. Differential requirements for certification and licensure versus employment tests.


1. Validity and Reliability Requirements for Certification

Some specific issues:
- Documenting and communicating the validity and reliability of pass-fail decisions in addition to the underlying scores
- How cut-offs are determined
- How validity and reliability information is communicated to relevant stakeholders

A key change is the need to focus on pass-fail decisions.

2. Issues with Small Examinee Populations

Including:
- Alternatives to statistical tools for item screening
- Alternatives to empirical validity evidence
- Maintaining comparability of scores from different test forms
- Assuring fairness
- Assuring technical accuracy

The key concern is the appropriate use of expert judgment.


3. Requirements for New Jobs

Issues include:
- Identifying test content
- Establishing passing scores
- Assessing reliability
- Demonstrating validity

Key here is also the appropriate use of expert judgment.


4. Assuring Access to Certification and Licensure Testing

See also the separate presentation on fairness. Issues include:
- Determining appropriate versus inappropriate accommodations
- Relating testing accommodations to accommodations available in the workplace


5. Certification and Licensure versus Employment Testing

Currently, two sections in the same chapter. Examples of relevant issues:
- Differences in how test content is identified
- Differences in validation strategies
- Differences in test score use
- Who oversees testing

The goal is to increase coherence in the approach to these two related uses of tests.


Revision of the Standards for Educational and Psychological Testing: Format and Publication
2010 Annual Meeting of the NCME, Denver, Colorado, May 1, 2010, 4:05–6:05 p.m.

Barbara Plake University of Nebraska-Lincoln

Format Issues

- Organization of chapters
- Consideration of ways to identify priority standards
- More parallelism between chapters
- Tone
- Complexity
- Technical language


Organization of Chapters

1999 Testing Standards: three sections
- Foundation: Validity, Reliability, Test Development, Scaling & Equating, Administration & Scoring, Documentation
- Fairness: Fairness, Test Takers' Rights and Responsibilities, Disabilities, Linguistic Minorities
- Applications: Test Users, Psychological, Educational, Workplace, Policy


Revised Test Standards Possible Chapter Organization


- Section 1: Validity, Reliability, Fairness
- Section 2: Test Design and Development, Scaling & Equating, Test Administration & Scoring, Documentation, Test Takers, Test Users
- Section 3: Psychological, Educational, Workplace, Policy and Accountability

Possible Ways to Identify Priority Standards

- Clustering of standards into thematic topics
- Over-arching standards/guiding principles
- Application chapters
- Connection of standards to previous standards


More Parallelism Across Chapters

- Cross-team collaborations
- Content editor with psychometric expertise
- Structural continuity


Publication Options

- Management Committee responsibility
- Goal is electronic access
- Pursuing options for Kindle, etc.
- Concerns about retaining integrity and financial support for future revision efforts

