Methodology 3 Exam Notes
By far the most complex and important principle of an effective test – validity, “the
extent to which inferences made from assessment results are appropriate,
meaningful and useful in terms of the purpose of the assessment” (Gronlund).
Several different kinds of evidence may be invoked in order to confirm a test’s
validity. It may be appropriate to examine the extent to which a test calls for
performance that matches that of the course or unit of study being tested. We may
be concerned with how well a test determines whether or not students have reached an
established set of goals or level of competence. Statistical correlation with other
related but independent measures is another widely accepted form of evidence. Other
concerns about a test’s validity may focus on the consequences of a test, or even on the
test taker’s perception of validity.
Content validity (content-related evidence) – whether a test actually samples the subject
matter about which conclusions are to be drawn, and whether it requires the test taker to
perform the behavior that is being measured.
Criterion-related validity – the extent to which the criterion of the test has been
reached. In such tests, specified classroom objectives are measured, and implied
predetermined levels of performance are expected to be reached.
Construct validity – asks whether the test actually taps into the theoretical construct as it
has been defined.
Consequential validity – Consequences of a test such as accuracy in measuring intended
criteria, the test’s impact on the preparation of the test takers, its effects on the learner.
Face validity – the degree to which the test looks right and appears to measure the
knowledge or abilities it claims to measure, based on the subjective judgment of the
examinees who take it and the administrative personnel who decide on its use.
A reliable test is consistent and dependable and yields the same results if given to the
same or matched students on two different occasions. The reliability of a test may
best be considered by checking a number of factors which contribute to the
unreliability of a test: fluctuations in scoring, in the student (temporary illness,
fatigue, a bad day or anxiety), in test administration, and in the test itself.
Test specifications are most often a simple outline of the test. They can consist of a
broad outline of the test, what skills will be tested and what the test items will look like.
4. Cloze task?
These tasks were developed on the assumption that, in written language, a sentence
with a word left out should have enough context that a reader should be able to close
that gap with a calculated guess, using linguistic expectancies (formal schemata),
background expectancies (content schemata) and strategic competence.
Cloze tests are usually a minimum of two paragraphs in length in order to account
for discourse expectancies. Specifications for scoring and choosing deletions need to be
clearly defined. Typically, every seventh word is deleted (fixed-ratio deletion), but
many cloze test designers use a rational deletion procedure of choosing deletions
according to the grammatical or discourse functions of the words. Traditionally,
cloze passages have between 30 and 50 blanks to fill. The test may likewise come in a
multiple-choice format, making grading even easier.
There are two scoring methods for the test – the exact word method and the
appropriate word method.
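To make the procedure concrete, here is a small illustrative sketch (my own, not from the notes; the function names and lead-in length are arbitrary choices) of fixed-ratio deletion plus the two scoring methods:

```python
def make_cloze(text, ratio=7, lead_in=1):
    """Fixed-ratio deletion: blank every `ratio`-th word, leaving the
    first `lead_in` words intact as opening context."""
    words = text.split()
    answers = []
    for i in range(lead_in + ratio - 1, len(words), ratio):
        answers.append(words[i])
        words[i] = "____"
    return " ".join(words), answers

def score_exact(responses, answers):
    """Exact-word method: only the originally deleted word earns a point."""
    return sum(r.strip().lower() == a.lower() for r, a in zip(responses, answers))

def score_appropriate(responses, acceptable):
    """Appropriate-word method: any contextually acceptable word counts."""
    return sum(r.strip().lower() in {a.lower() for a in alts}
               for r, alts in zip(responses, acceptable))
```

Note that the appropriate-word method needs a pre-agreed list of acceptable answers for each blank, which is what makes it slower to prepare and score than the exact-word method.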
If your aim is to test global competence in a language, then you are testing
proficiency. These tests have traditionally consisted of standardized multiple-choice
items on grammar, vocabulary, reading comprehension, and aural comprehension.
Proficiency tests are almost always summative and norm referenced, and provide
results in the form of a single score, which is a sufficient result for the gate-keeping
role they play of accepting or denying someone passage into the next stage of a journey.
(Example: TOEFL test)
7. Three major genres of writing (according to Brown) with at least three examples for
each!
Academic writing:
Papers, essays, short-answer test responses, technical reports, dissertations, theses
Job-related writing:
Messages, letters/emails, memos, reports, advertisements, manuals
Personal writing:
Emails, letters, greeting cards, messages, notes, forms, fiction, diaries, questionnaires,
medical reports etc.
‘Alternative assessment’ is usually taken to mean assessment procedures which are less
formal than traditional testing, which are gathered over a period of time rather than
being taken at one point in time, which are usually formative rather than summative in
function, are often low-stakes in terms of consequences, and are claimed to have
beneficial washback effects (2001: 228).
If a test is valid, teachers can make accurate judgments about the competence of the
learners they are working with.
Macroskills:
Recognize the communicative functions of utterances, according to situations,
participants, goals.
Infer situations, participants, goals using real-world knowledge.
From events, ideas, and so on, described, deduce causes, and effects and such relations
as main idea, supporting idea, new information, given information, generalization and
exemplification.
Distinguish between literal and implied meanings.
Use facial, kinesic, body language, and other nonverbal cues to decipher meaning.
Develop and use a battery of listening strategies, such as detecting key words, guessing
the meaning of words from context, appealing for help, and signaling comprehension or
lack thereof.
Microskills:
Discriminate among the distinctive sounds of English.
Retain chunks of language of different lengths in short term memory.
Recognize English stress patterns, words in stressed and unstressed positions, rhythmic
structure, intonation contours, and their role in signaling information.
Distinguish word boundaries, recognize a core of words, and interpret word order
patterns and their significance.
Process speech at different rates of delivery.
Process speech containing pauses, errors, corrections, and other performance variables.
Recognize grammatical word classes, systems, patterns and elliptical forms.
Detect sentence constituents and distinguish between major and minor constituents.
Recognize cohesive devices in spoken discourse.
Recognize that a particular meaning may be expressed in different grammatical forms.
A practical test
Is not excessively expensive
Stays within appropriate time constraints
Is relatively easy to administer
Has a scoring procedure that is specific and time efficient
Each point on a holistic scale is given a systematic set of descriptors, and the reader-
evaluator matches an overall impression with the descriptors to arrive at a score.
Descriptors usually follow a prescribed pattern. For example, the first descriptor across
all score categories may address the quality of task achievement, the second may
deal with organization, the third with grammatical or rhetorical features, and
so on. Scoring, however, is truly holistic in that those subsets are not quantitatively
added up to yield a score. Advantages include: fast evaluation, relatively high inter-
rater reliability, the fact that scores represent “standards” that are easily interpreted by
lay persons, the fact that scores tend to emphasize the writer’s strengths, and applicability
to writing across many disciplines. For classroom instructional purposes, however, holistic
scoring provides very little. In most classroom settings where a teacher wishes to adapt a
curriculum to the needs of a particular group of students, much more differentiated
information across subskills is desirable than holistic scoring provides.
Primary trait scoring focuses on “how well students can write within a narrowly defined
range of discourse”. This type of scoring emphasizes the task at hand and assigns a
score based on the effectiveness of the text’s achieving that one goal. In summary, a
primary trait score would assess:
The accuracy of the account of the original (summary)
The expression of the writer’s opinion (response to an article)
A four-point scale (e.g., from zero to three) is suggested for rating the primary
trait of the text. It goes without saying that organization, fluency, syntactic variety,
supporting details, and other features will implicitly be evaluated in the process of
offering a primary trait score.
Extensive reading applies to texts of more than a page, up to and including essays,
professional articles, technical reports, short stories and books. The purposes of
assessment are usually to tap into a learner’s global understanding of a text as
opposed to asking test-takers to “zoom in” on small details. Top-down processing is
assumed for more extensive tasks.
17. What skills do learners have to master in order to become efficient readers?
First, they need to be able to master fundamental bottom-up strategies for processing
separate letters, words and phrases, as well as top-down conceptually driven
strategies for comprehension.
Second, as a part of that top-down approach, second language learners must develop
appropriate content and formal schemata – background information and cultural
experience – to carry out those interpretations effectively.
Formative and summative assessment are distinguished by the function the assessment
serves. Most of our classroom assessment is formative assessment – evaluating students
in the process of “forming” their competencies and skills with the goal of helping them
to continue their growth process. The key to such formation is the delivery (by the
teacher) and internalization (by the student) of appropriate feedback on
performance, with an eye on the future continuation (formation) of learning. Virtually
all kinds of informal assessment are formative. Summative assessment, by contrast,
aims to measure, or summarize, what a student has grasped, and typically occurs at the
end of a course or unit; final exams and proficiency tests are examples.
Formal assessments are the systematic, data-based tests that measure what and how well the students
have learned. Formal assessments determine the students’ proficiency or mastery of the content, and
can be used for comparisons against certain standards.
Examples:
standardized tests
criterion-referenced tests
norm-referenced tests
achievement tests
aptitude tests
Informal assessments are those spontaneous forms of assessment that can easily be incorporated in the
day-to-day classroom activities and that measure the students’ performance and progress. Informal
assessments are content and performance driven.
Examples:
checklist
observation
portfolio
rating scale
time sampling
event sampling
anecdotal record
Test taker's temporary psychological or physical state. Test performance can be influenced by a
person's psychological or physical state at the time of testing. For example, differing levels of
anxiety, fatigue, or motivation may affect the applicant's test results.
Environmental factors. Differences in the testing environment, such as room temperature,
lighting, noise, or even the test administrator, can influence an individual's test performance.
Test form. Many tests have more than one version or form. Items differ on each form, but each
form is supposed to measure the same thing. Different forms of a test are known as parallel
forms or alternate forms. These forms are designed to have similar measurement
characteristics, but they contain different items. Because the forms are not exactly the same, a
test taker might do better on one form than on another.
Multiple raters. In certain tests, scoring is determined by a rater's judgments of the test taker's
performance or responses. Differences in training, experience, and frame of reference among
raters can produce different test scores for the test taker.
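A crude way to check this multiple-raters problem is to compute how often two raters agree within a given number of points. A hypothetical sketch (not from the notes):

```python
def rater_agreement(rater_1, rater_2, tolerance=0):
    """Proportion of scripts on which two raters' scores differ by at most
    `tolerance` points – a rough check on inter-rater consistency."""
    matches = sum(abs(a - b) <= tolerance for a, b in zip(rater_1, rater_2))
    return matches / len(rater_1)
```

With tolerance 0 this is exact agreement; testing programs often also report adjacent agreement (tolerance 1), which is more forgiving of small judgment differences.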
Validity is the most important issue in selecting a test. Validity refers to what characteristic the test
measures and how well the test measures that characteristic.
Validity tells you if the characteristic being measured by a test is related to job qualifications and
requirements.
Validity gives meaning to the test scores. Validity evidence indicates that there is linkage
between test performance and job performance. It can tell you what you may conclude or
predict about someone from his or her score on the test. If a test has been demonstrated to be
a valid predictor of performance on a specific job, you can conclude that persons scoring high on
the test are more likely to perform well on the job than persons who score low on the test, all
else being equal.
Validity also describes the degree to which you can make specific conclusions or predictions
about people based on their test scores. In other words, it indicates the usefulness of the test.
An achievement test measures one's performance on specific knowledge or skills learnt in a specific
course. In other words, an achievement test measures how much the person has learnt over a specific
period (e.g., by the end of a chapter or the end of a course) and reveals the student's strengths and
weak points. Achievement tests have educational and diagnostic purposes that can help the
teacher and the learner investigate which area(s) the learner needs to improve. Proficiency tests,
however, measure how much a person is able to use the knowledge or skills s/he has learnt in
real-life situation(s), and they evaluate the learner's level of knowledge or skills in a specific area;
IELTS and TOEFL are examples of language proficiency tests.
A diagnostic test is a test that helps the teacher and learners identify problems that they have with the
language. At the start of the course, the teacher gives the learners a diagnostic test to see what areas of
language need to be in the syllabus.
Test specifications are a detailed summary of the test components and their purpose. They can serve as
an outline for the structure of the test and its general objectives, and they can also be made more
detailed depending on how much info is added.
Assessing skills:
Speaking: word repetition, sentence/dialogue completion, oral questionnaires, picture-cued tasks,
translations, question and answer, paraphrasing.
Reading: multiple choice, picture-cued tasks, reading aloud, matching, editing, gap-filling, cloze tasks,
scanning, ordering.
Writing: copying, listening cloze, picture-cued tasks, multiple choice, matching, dictation, grammatical
transformation, ordering, paraphrasing.
Listening: question and answer, listening cloze, information transfer, sentence repetition, dictations,
note-taking, editing, retelling.
Extensive reading, free reading, book flood, or reading for pleasure is a way of language learning,
including foreign language learning, through large amounts of reading. As well as facilitating acquisition
of vocabulary, it is believed to increase motivation through positive affective benefits.
Holistic scoring provides an examinee with a single score regarding the quality of examinee work (i.e.,
performance) as a whole. Most commonly, holistic scoring is used to assess writing samples, though it
may be employed to assess any performance task, for example, acting, debate, dance, or athletics.
When scoring an essay holistically, the rater neither marks errors on the paper nor writes
constructive comments in the margins. Instead, the rater considers the quality of the entire paper
and then assigns one holistic score. The SAT, ACT, and Advanced Placement tests all utilize a 6-point
holistic scoring rubric to assess their respective writing sections.
Analytic scoring is a method of evaluating student work that requires assigning a separate score for
each dimension of a task. Often used with performance assessment tasks, analytic scoring rubrics
specify the key dimensions of a task and define student performance relative to a set of criteria across
performance levels for each dimension. For example, analytic rubrics used to evaluate student essay
writing often include the following dimensions: development of ideas, organization, language use,
vocabulary, grammar, spelling, and mechanics.
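As an illustration of how the separate dimension ratings combine, here is a sketch with hypothetical weights and dimension names (these are not a standard rubric – real analytic rubrics define their own dimensions, weights and scales):

```python
# Hypothetical dimension weights; a real rubric would define its own.
WEIGHTS = {"content": 0.30, "organization": 0.20, "vocabulary": 0.20,
           "language_use": 0.25, "mechanics": 0.05}

def analytic_score(ratings, weights=WEIGHTS, scale=5):
    """Combine separate per-dimension ratings (each 0..scale) into one
    weighted percentage. Unlike holistic scoring, every dimension is
    rated on its own before the scores are combined."""
    if set(ratings) != set(weights):
        raise ValueError("every rubric dimension needs exactly one rating")
    return round(100 * sum(weights[d] * ratings[d] / scale for d in weights), 1)
```

Because each dimension keeps its own score before combination, the teacher can report the per-dimension profile to students – exactly the differentiated subskill information that holistic scoring cannot provide.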
What makes speaking difficult?
- Clustering, redundancy, reduced forms, performance variables, colloquial language, rate of delivery,
stress, rhythm and intonation and interaction.
A cloze test (also cloze deletion test or occlusion test) is an exercise, test, or assessment consisting of a
portion of language with certain items, words, or signs removed (cloze text), where the participant is
asked to replace the missing language item. Cloze tests require the ability to understand context
and vocabulary in order to identify the correct language or part of speech that belongs in the deleted
passages. This exercise is commonly administered for the assessment of native and second language
learning and instruction.
C-test is a task where the second half of every other word is obliterated and the test taker must restore
each word.
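A minimal sketch of that C-test procedure (my own illustration; real C-tests typically also leave the first and last sentences fully intact):

```python
def make_c_test(sentence, keep_first=1):
    """Blank the second half of every other word. The first `keep_first`
    words are left whole for context; thereafter every second word keeps
    only its first half (rounded up) and the rest becomes underscores."""
    words = sentence.split()
    gapped, answers = [], []
    for i, word in enumerate(words):
        if i >= keep_first and (i - keep_first) % 2 == 0 and len(word) > 1:
            half = (len(word) + 1) // 2
            gapped.append(word[:half] + "_" * (len(word) - half))
            answers.append(word)
        else:
            gapped.append(word)
    return " ".join(gapped), answers
```

Because half of each mutilated word remains visible, scoring is usually by the exact-word method: only the original word restores the text.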
What skills do learners have to master in order to become efficient readers?
Bottom-up and top-down processing, plus appropriate schemata: learn the basics, then connect
them to whole contexts.