The Third Group of the Principles of Assessment
Written by:
ALIYA IZET BEGOVIC YAHYA
EVI MALA WIJAYANTI
NAFRIANTI
RAIHANA PERMATA SARI
A. Summary
Assessment is an ongoing process that encompasses a much wider domain than a test
(Brown, 2004:4). This means that all student performance, whether written or spoken, is
assessed; the teacher relies not only on test scores but also on what happens throughout
the classroom.
Regarding assessment, Brown explores the principles of assessment, which are divided
into five types: practicality, reliability, validity, authenticity, and washback. These five
principles should be applied in assessment.
First, practicality relates to factors such as cost, time, administration, and
scoring/evaluation (p. 19). It refers to the relationship between the resources required in
the design, development, and use of the test and the resources actually available for
assessment.
Second, reliability refers to the extent to which a test produces consistent scores across
different administrations to similar groups of test takers. Reliability is divided into four types:
student-related reliability, rater reliability, test administration reliability, and test reliability.
The first type, student-related reliability, refers to psychological and physical factors,
including illness, fatigue, or simply having a bad day, which can distort a test taker's true
score and yield an "observed score" instead; in other words, students' performances during
the test do not fully reflect their ability. The second type, rater reliability, is divided into two
kinds that can affect the assessment: inter-rater reliability and intra-rater reliability, which
concern consistency of scoring between different raters and within a single rater,
respectively. The third type, test administration reliability, refers to the conditions under
which the test is administered, such as noise, the amount of light, variations in temperature,
and so on. The last type, test reliability, means that the test itself should fit its time
constraints, being neither too long nor too short, and should be clearly written.
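To make rater reliability concrete, the sketch below (our own illustration, not from Brown; all scores are invented) checks a simple form of inter-rater reliability, the proportion of performances to which two raters assign the same band score:

```python
# Hypothetical data: two raters score the same eight speaking performances
# on a 1-5 band scale.
rater_a = [4, 3, 5, 2, 4, 3, 5, 4]
rater_b = [4, 3, 4, 2, 4, 3, 5, 5]

# High exact agreement suggests good inter-rater reliability; frequent
# disagreement signals a scoring-consistency problem worth investigating.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
exact_agreement = agreements / len(rater_a)
print(exact_agreement)  # 6 of the 8 scores match, i.e. 0.75
```

In practice, testers often also report chance-corrected statistics such as Cohen's kappa, but simple exact agreement already makes the idea of consistency between raters tangible.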
Third, validity is the extent to which inferences made from assessment results are
appropriate, meaningful, and useful in terms of the purpose of the assessment. To establish
validity, we have to consider five types: content validity, criterion validity, construct
validity, consequential validity, and face validity. The first type, content validity, concerns
the relationship between what the test actually measures and the conclusions to be drawn
from it; for example, in assessing listening, the teacher can use a multiple-choice test. The
second type, criterion validity, refers to the extent to which performance on a test is related
to a criterion, that is, an independent indicator of the ability being tested; for example, a
communicative classroom test gains criterion validity if its scores correspond with other
communicative measures of the same grammar points. Criterion validity falls into two
categories: concurrent validity and predictive validity. The former means that test scores
are supported by other concurrent performance, whereas predictive validity refers to a
prediction of a test taker's likelihood of future success. The third type, construct validity,
is the extent to which a test actually taps into the theoretical construct (theory) as it has
been defined; for example, proficiency and communicative competence are linguistic
constructs. The fourth type, consequential validity, concerns the positive or negative
consequences of a particular test. The last type, face validity, is the extent to which
"students view the assessment as fair, relevant, and useful for improving learning"
(Gronlund, 1998:210). Because face validity rests entirely on the test taker's point of view,
it is more subjective than the other types of validity.
Fourth, authenticity refers to the degree of correspondence between the characteristics of
a given language test task and the features of a target language task; this definition in turn
suggests an agenda for identifying those target language tasks and transforming them into
valid test items (Bachman and Palmer, 1996:23). In short, a task is authentic if it is likely
to be enacted in the real world. Accordingly, authenticity may be present in a test in the
following ways: the language is as natural as possible; items are contextualized; topics are
meaningful for the learner; some thematic organization, such as a story line or episode,
links the items; and tasks closely represent real-world tasks.
Fifth, washback refers to the effects tests have on instruction in terms of how students
prepare for them. It can be seen as a facet of consequential validity, since it covers the
effects of an assessment on teaching and learning prior to the assessment itself, such as
test preparation. To enhance washback, the teacher should comment generously and
specifically on test performance, that is, give feedback, which is far more useful than a
single letter grade or numerical score.
Several other books also discuss the principles of assessment:
Anderson, L. W. 2003. Classroom assessment. London: LEA Publishers.
Earl, L. M. & Katz, M. S. 2006. Rethinking classroom assessment with purpose in mind:
Assessment for learning, assessment as learning, assessment of learning. Manitoba
Education.
Russell, M. K. 2012. Classroom assessment: Concepts and applications. McGraw-Hill.
Stufflebeam, D. L. & Coryn, C. L. S. 2014. Evaluation theory, models, and applications.
Jossey-Bass.
For more detail on how these authors treat the principles, a summary of each book follows.
1. Russell (2012)
Generally, this book discusses classroom assessment with a focus on concepts and
applications. Assessment is a daily, ongoing, integral part of teaching and learning.
Classroom Assessment: Concepts and Applications explores how assessment is a key
component of all aspects of the instructional process, including organizing and creating a
classroom culture, planning lessons, delivering instruction, and examining how students
have grown as a result of instruction. The text also introduces preservice teachers to new
tools and approaches to classroom assessment that result from the infusion of computer-based
technologies in schools. It is among the most teacher-friendly assessment textbooks
available, one that can inform a teacher's assessment practices for years to come.
Assessment itself is discussed most completely in Chapter 2, where the book defines the
key terms as follows.
1. Assessment: a process of collecting, synthesizing, and interpreting information in order
to make a decision. Depending on the decision being made and the information a teacher
needs in order to inform that decision, testing, measurement, and evaluation often
contribute to the process of assessment.
2. Testing: a formal, systematic procedure used to gather information about students'
achievement or other cognitive skills.
3. Measurement: a process of quantifying, or assigning a number to, a performance or
trait, for example when a teacher scores a quiz or test.
4. Evaluation: a product of assessment that produces a decision about the value or worth
of a performance or activity, based on information that has been collected, synthesized,
and reflected on.
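The distinction between measurement and evaluation in these definitions can be sketched in a few lines of code; the answer key, student responses, and passing threshold below are all invented for illustration:

```python
# Measurement: quantify a performance by scoring responses against a key.
# Evaluation: turn that number, plus a standard, into a decision about worth.
ANSWER_KEY = ["b", "c", "a", "d", "b"]

def measure(responses):
    """Assign a number to the performance: the count of correct answers."""
    return sum(r == k for r, k in zip(responses, ANSWER_KEY))

def evaluate(score, passing_score=3):
    """Make a decision about the performance based on the score."""
    return "pass" if score >= passing_score else "needs support"

score = measure(["b", "c", "a", "a", "b"])  # 4 of 5 items correct
print(score, evaluate(score))  # -> 4 pass
```

The same score could feed different decisions (grading, grouping, reteaching), which is why assessment is defined more broadly than measurement alone.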
a. STANDARDIZED ASSESSMENT
Administered, scored, and interpreted in the same way for all students, regardless of where
and when they are assessed. The main reason for standardizing assessment procedures is to ensure
that the testing conditions and scoring procedures have a similar effect on the performance of
students in different schools and states.
b. NON-STANDARDIZED ASSESSMENT
Constructed for use in a single classroom with a single group of students. It majorly focused
in the single classroom. It's important to know that standardized tests are not necessarily better
than non-standardized ones. Standardization is only important when information from an
assessment instrument is to be used for the same purpose across many different classroom and
location.
c. RECORD KEEPING
High-quality record-keeping is critical for ensuring quality in classroom assessment. The
records that teachers and students keep are the evidence that support the decisions that are made
about students’ learning. The records should include detailed and descriptive information about
the nature of the expected learning as well as evidence of students’ learning and should be collected
from a range of assessments.
3. Anderson (2003)
This part summarizes chapters one and two of Anderson's book (2003), which relate to
the principles of classroom assessment. Chapter one introduces classroom assessment,
while in chapter two Anderson discusses the why, what, and when of assessment.
1. Chapter One: Introduction to Classroom Assessment
In this chapter, Anderson (2003) emphasizes that one of the keys to being a good teacher
lies in the decisions the teacher makes. This supports Shavelson's (1973, p. 18) argument
that any teaching act is the result of a decision, whether conscious or unconscious, that
the teacher makes after complex cognitive processing of available information. This
reasoning leads to the hypothesis that the basic teaching skill is decision making.
In line with teachers' decision making, another concern is how we understand those
decisions. Teacher decision making is closely related to how the teacher places students
in certain categories. In the process of decision making, the teacher needs sources of
information to consider, such as the following:
- Health information;
- Transcripts of courses taken and grades earned in those courses;
- Written comments made by teachers;
- Standardized test scores;
- Disciplinary referrals;
- Correspondence between home and school;
- Participation in extracurricular activities;
- Portions of divorce decrees pertaining to custody and visitation rights;
- Arrest records.
The Quality of Information
a. Validity
In general terms, validity is the extent to which the information obtained from an
assessment instrument (e.g., test) or method (e.g., observation) enables you to
accomplish the purpose for which the information was collected. In terms of
classroom assessment, the purpose is to inform a decision. For example, a teacher wants
to decide on the grade to be assigned to a student, or a teacher wants to know what he or
she should do to get a student to work harder.
b. Reliability
Reliability is the consistency of the information obtained from one or more
assessments. Some writers equate reliability with dependability, which conjures up
a common-sense meaning of the term. A reliable person is a dependable one—a
person who can be counted on in a variety of situations and at various times.
Similarly, reliable information is information that is consistent across tasks,
settings, times, and/or assessors.
c. Objectivity
In the field of tests and measurement, objectivity means that the scores assigned by
different people to students’ responses to items included on a quiz, test, homework
assignment, and so on are identical or, at the very least, highly similar. If a student is given a multiple-
choice test that has an accompanying answer key, then anyone using the answer
key to score the tests should arrive at the same score. Hence, multiple-choice tests
(along with true-false tests, matching tests, and most short answer tests) are referred
to as objective tests. Once again, as in the case of test validity, this is a bit of a
misnomer. It is the scores on the tests, not the tests per se, that are objective.
4. Stufflebeam & Coryn (2014)
This book presents some general criteria for evaluating program evaluation theories,
organized by category:
a. Professionalizing Program Evaluation
Is the theory useful for. . .
• Generating and testing standards for program evaluations?
• Clarifying roles and goals of program evaluation?
• Developing needed tools and strategies for conducting evaluations?
• Providing structure for program evaluation curricula?
b. Research
Is the theory useful for. . .
• Generating and testing predictions or propositions concerning evaluative actions and
consequences?
• Application to specific classes of program evaluation (the criterion of particularity) or a wide
range of program evaluations (the criterion of generalizability)?
• Generating new ideas about evaluation (the criterion of heuristic power)?
• Drawing out lessons from evaluation practice to generate better theory?
c. Planning Evaluations
Is the theory useful for. . .
• Giving evaluators a structure for conceptualizing evaluation problems and approaches?
• Determining and stating comprehensive, clear assumptions for particular evaluations?
• Determining boundaries and taking account of context in particular program evaluations?
• Providing reliable, valid, actionable direction for ethically and systematically conducting
effective program evaluations?
d. Staffing Evaluations
Is the theory useful for. . .
• Clarifying roles and responsibilities of evaluators?
• Determining the competencies and other characteristics evaluators need to conduct sound,
effective evaluations?
• Determining the areas of needed cooperation and support from evaluation clients and
stakeholders?
e. Guiding Evaluations
Is the theory useful for. . .
• Conducting evaluations that are parsimonious, efficient, resilient, robust, and effective?
• Promoting evaluations that clients and others can and do use?
• Promoting integrity, honesty, and respect for people involved in program evaluations?
• Responsibly serving the general and public welfare?
The program evaluation field could benefit from employing the methodology of grounded
theory as one theory-development tool. In applying this approach, theorists would generate
theories grounded in systematic, rigorous documentation and analysis of actual program
evaluations and their particular circumstances. In addition, researchers should
systematically use metaevaluation reports to examine why different evaluation approaches
succeeded or failed.
Sound theories of evaluation are needed to advance effective evaluation practices. The
history of formal program evaluation includes theoretical approaches that have proved useful,
limited, or in some cases counterproductive (for instance, objectives-based evaluation and
randomized controlled experiments). The definition of an evaluation theory is more demanding
than that of an evaluation model (an evaluation theorist’s idealized conceptualization for
conducting program evaluations). An evaluation theory is defined as a coherent set of conceptual,
hypothetical, pragmatic, and ethical principles forming a general framework to guide the study and
practice of program evaluation. Beyond meeting these requirements, an evaluation theory should
meet the following criteria: utility in efficiently generating verifiable predictions or propositions
concerning evaluative acts and consequences; provision of reliable, valid, actionable direction for
ethically conducting effective program evaluations; and contribution to an evaluation’s clarity,
comprehensiveness, parsimony, resilience, robustness, generalizability, and heuristic power.
Despite these demanding requirements of sound evaluation theories, theory development must be
respected as a creative process that defies prescriptions for how to develop a sound theory.
D. Discussion
Each author has his or her own way of explaining the principles. The summary below
shows how each author's treatment differs from the principles of assessment in Brown's
book:
Anderson (2003):
1. Anderson uses the term "quality of information" in referring to the principles.
2. His term "objectivity" overlaps with Brown's definition of reliability.
3. The book mainly discusses issues in classroom assessment, so it is more practical
than Brown's.

Earl & Katz (2006):
1. The same term "quality" used by Anderson appears here as well.
2. Earl and Katz add two principles: reference points and record-keeping.

Russell (2012):
1. The book emphasizes not only classroom assessment but also the action that follows
it: decision making.
2. The book addresses the important questions of why, what, when, and how classroom
assessment needs to be conducted.
3. The book discusses issues in classroom assessment, one of which is the ethics of
assessment.

Stufflebeam & Coryn (2014):
1. Overall coherence.
2. Tested hypotheses concerning how evaluation procedures produce desired outcomes.
3. Ethical requirements.
4. A general framework for guiding program evaluation practice and conducting research
on program evaluation.
E. Conclusion
In conclusion, five of books are well written and well conceptualized. The way to explain
the principles can be understood by beginning-level and advanced teachers. Overall, the concept
of the principles of assessment are same, how to evaluate in appropriate way. The Brown’s book
is given the theoretical basic in the principles of assessment rather than others book. On the other
hand, the rest of books can be useful to enrich knowledge about the principles of assessment from
various perspectives that can help develop the assessment appropriately. Each book has strengths
and weaknesses. However, they can complete the gap among them. It is recommended to read all
of those books.
References