
Designing Test Task (ADUBAL)

(1st slide)
Some grammar test takers may wonder why their performance is low. They tend to question
their grammatical ability, but one possible reason for low grammar test scores is a problem
with the characteristics of the test itself. What are these characteristics? The
instructions, the time allotment, the input, the scoring method, and so on.

(2nd)
The characteristics of the test are not the only thing that developers should look after. They should also
consider the characteristics of the test takers, that is, the individuals who will take the grammar test. These
characteristics are their grammatical knowledge, personal attributes, topical knowledge, and
affective schemata. For example, some test takers perform better on MCQs than on oral tasks, and some
perform better in essay writing than in making a graph. This implies that each task has its own set of
characteristics, called test-task characteristics.

(3rd)
According to the authors presented, grammar test scores may also vary with test takers' personal
attributes, such as age, gender, and language background, as well as with their strategy use, motivation, and
level of anxiety.

To conclude, in designing test tasks developers should understand the characteristics of the test itself and of the
test takers. If we understand both the nature of grammatical ability and the nature of the test tasks we
use, we will be able to account for the effect of the method on how we interpret scores on grammar
tests. In other words, we can choose an appropriate test method to use.

Grammatical Ability (AMIGO)


(Intro) Greetings or Introduce yourself

(Slide 1) Definition of Grammatical Ability (just a recap, since Group 2 will discuss it)
Let us recap the definition of grammatical ability (read slide). In other words, we are able to put
our knowledge of grammar into practice in a language-use situation.

(Slide 2) Graphic Organizer


The previous reporter emphasized the importance of understanding the test tasks that we use to elicit
test performance, which reflects our grammatical ability, because there is considerable research evidence
that the test method used to elicit test performance significantly affects test-takers'
scores. (Graphic organizer).

Slide (3) Example


For example, Bachman and Palmer (1982) examined whether the variation in scores obtained from
speaking and reading tests was due to the examinees’ speaking and reading abilities or to the methods
used to elicit speaking and reading performance. Unsurprisingly, they found considerable evidence of a
test method effect.
According to Onaiba and Jannat (2019), test method effect is “a term used when the method used for
test language ability affects students’ scores.” In the example, Bachman and Palmer found that the
test methods used, self-rating and oral interview, affected the students’ scores, most
likely because of the features of the tests. In other words, the self-rating seemed to be more
a measure of the method than of the test-takers’ ability, and the same was true of the oral interview.

Instead of measuring the test-takers’ abilities in reading and speaking, or their performance,
the self-rating measured the method itself, and the same was true of the
oral interview. Going back to the graphic organizer, the test method that we use will significantly affect
test-takers’ scores.
That is why, as stated in Purpura’s book, the topic of designing test tasks to measure L2 grammatical
ability is devoted to the notion of “task” and to the ways in which grammar tasks can be specified in tests
to elicit the type of performance we are mainly interested in measuring.

How does test development begin? (BARNES, CAGUTOM)


1st slide- Read first what is in the ppt then continue with this. “Within these situations, the tasks or
activities requiring language to achieve a communicative goal are called the target language use tasks.
A TLU task is one of many language use tasks that test-takers might encounter in the target language
use domain. It is to this domain that language testers would like to make inferences about language
ability, or more specifically, about grammatical ability.”
2nd slide is a situational example - To illustrate, suppose we wanted to know if a student with two years
of English language instruction at a Saudi Air Force Academy has some requisite level of English
language proficiency to begin flight school in English. To obtain this information, we give the student a
test in which the tasks are based on helicopter flight-training sessions. In other words, one task
requires the student to demonstrate his knowledge of the flight controls by answering cause–effect
questions (e.g., What’ll happen if you mistakenly move the cyclic to the left?). The language being
tested involves the future tense in conditional sentences used to express cause–effect relationships
related to flight control. Based on the results of all the grammar tasks and our interpretation of the
scores, we decide whether the student has some criterion level of English language proficiency
for flight school.
In this example, the TLU situation is language instruction at flight school, and the assessment purpose
is to measure the student’s ability to use grammar as a resource for communication in this setting.
Among the competencies to be measured, we include the student’s ability to use conditional sentences
to express cause–effect relationships. One TLU task requires the student to respond to questions
about the flight controls. This is one of many TLU tasks that could have been selected from the TLU
domain. The decision to permit or deny the student admission into the program is a high-stakes
decision given the potential seriousness of its consequences. In other words, to permit him to begin
flight training with an insufficient command of English could be dangerous, and to delay him from
beginning training could cost time and money.

*EXPLANATION FOR THE TABLE PART* (the explanation is color-coded)


What do you mean by Tasks? (CAMATO)
Greetings!

1st Slide (READ)

Explanation:
So, basically, the word task refers to any activity that requires our learners to accomplish or do
something. A task, then, is any activity, for example short answers or role-plays, as long as it involves a
linguistic or non-linguistic (e.g., circling the answer) response to input. A task is a complex activity that aims to
accomplish a number of pedagogical goals. For example, a task that is carefully developed by an ESL
teacher may target a particular language structure, the acquisition of which is necessary in order to
complete a task. Alternatively, a task can be developed to prepare L2 learners for successful
communication outside the ESL classroom. Because one task can potentially have different goals, it is
difficult to find a single uniform definition of a task.

2nd Slide

Task Naturalness (DAGATAN)

1st Slide(Intro)

Loschky and Bley-Vroman (1993), in their work on structure trapping, distinguish three types of
structure-task relationships:

2nd Slide

1. Task naturalness
2. Task utility
3. Task essentialness

3rd Slide

TASK NATURALNESS
- a condition where 'a grammatical construction may arise naturally during the performance of a
particular task, but the task can often be performed perfectly well, even quite easily, without it'.

In task-naturalness, a grammatical construction may arise naturally during the performance of a
particular task, but the task can often be performed perfectly well, even quite easily, without it. In the
case of task-utility, it is possible to complete a task without the structure, but with use of the structure
the task becomes easier.

The characteristics of a task are often such that a particular structure is likely to arise naturally.
Perhaps the successful completion of the task does not absolutely require the accurate use of the
structure and can even be completed quite efficiently without the structure. Nevertheless, the task
lends itself, in some natural way, to the frequent use of the structure. We say in these cases that the
structure is "natural" to the particular task.

4th Slide

TASK NATURALNESS
Example:

> the butler could have done it or,


>the maid might have killed her
———————————————————————-
>Maybe the butler did it or;
>I suspect the maid killed her

For example, in a task designed to elicit past modals in the context of a murder mystery, we expect
forms like:

> the butler could have done it or,


>the maid might have killed her

We can also get forms like:


>Maybe the butler did it or;
>I suspect the maid killed her

As you can see, the successful completion of the task does not absolutely require the accurate use of
the structure; the task can even be completed quite efficiently without the structure itself.

5th Slide

Thank you for listening!

Task Utility (DAL)

1st Slide (Title)

Greetings & Intro… (improvise here as you see fit)

(Next) This is the second type of structure-task relationship, Task Utility. In task-naturalness, the target structure
may not be vital for task completion but is expected to arise naturally and frequently. In the
case of Task Utility, by contrast… (go straight to the next slide)

2nd Slide (Definition)


(Read the definition.) In short, in task utility, it is possible to complete a task without the structure, but
with the structure, the task becomes easier. To elaborate, in task utility, although the target feature is not
necessary for task completion, it is very "useful" in that using it makes the task easier
to perform.

3rd Slide
For example… Let's say Angelou (this is just a sample; use whatever name you like) states
that... (then read the example.) Here, Angelou is comparing the two cities; notice that the
word very is repeated three times. If we analyze it, the word very is not strictly necessary, although
it does complete the sentence. However, if Angelou had used comparatives or
superlatives, the message could have been communicated much more easily. Let's try to rewrite
the example with a superlative. (Shiraz is a beautiful city, but Esfahan is the most
beautiful city in Iran.) As you can see, the message is now easier to convey than in the previous version.

Task Essentialness (Igloria)


Essential Grammatical Knowledge: The Task-essentialness of a Structure

Loschky and Bley-Vroman (1993), in their work on structure trapping, distinguish three types of
structure-task relationships:

1. Task-naturalness: the structure arises naturally during the task, and its use is unforced.
2. Task-utility: a particular structure helps learners complete the task; the task can be done with
other structures instead, but that is more difficult.
3. Task-essentialness: it is essential to attend to the structure in order to complete the task
successfully, which requires more control over and adjustment of the task. Task-essentialness forces
learners' attention to certain structures and can improve their interlanguage hypotheses.

So far, we have concentrated on tasks in which a particular grammatical point is natural or useful to a
task. However, if some tasks are constructed carefully, it is essential to attend to the relevant structure
to perform the task successfully; it is impossible to succeed unless the grammatical knowledge is
attended to. We will call this type of relationship task-essentialness. In task-utility, by contrast, the
structure is useful to the task because the task can be completed
more efficiently and with a greater likelihood of success if the structure is used correctly.

The assertion that a particular structure is in fact useful for a particular task is, of course, an empirical
claim. And like all classroom proposals, it must be accountable to empirical testing. The most
straightforward way to investigate the task utility of a given structure is to compare the performance of
the task with and without the use of the structure. This can be done with learners: for example, one
group is given instruction in the structure (and they are demonstrated to have some level of mastery);
while another group is not. Both groups do the task. The performance of the two groups is compared,
using some measures of task success, such as accuracy of task outcome or speed of task completion.
Multiple measures would be valuable and could lead to modifications of the task to increase utility. For
example, suppose it is discovered that learners who know locative prepositions aren't much more
accurate in spotting differences in a 'Spot the difference' task than those who don't, but that they
complete the task much faster.
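
The group comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores, the ten-difference picture, and the names `instructed`, `uninstructed`, and `summarize` are all hypothetical and do not come from Loschky and Bley-Vroman.

```python
from statistics import mean

# Hypothetical data: each tuple is (differences found out of 10, seconds to finish).
instructed = [(8, 95), (7, 110), (9, 88), (8, 102)]      # taught locative prepositions
uninstructed = [(8, 150), (7, 171), (8, 139), (7, 160)]  # not taught

def summarize(group):
    """Return (mean accuracy as a proportion, mean completion time in seconds)."""
    accuracy = mean(found / 10 for found, _ in group)
    seconds = mean(secs for _, secs in group)
    return accuracy, seconds

acc_i, time_i = summarize(instructed)
acc_u, time_u = summarize(uninstructed)

# Two measures of task success: accuracy of outcome and speed of completion.
accuracy_gain = acc_i - acc_u
time_saved = time_u - time_i
print(f"accuracy gain: {accuracy_gain:.2f}, time saved: {time_saved:.1f} s")
```

With made-up numbers like these, the accuracy gain is small while the time saved is large, which is the pattern the 'Spot the difference' example describes. A real study would also need a significance test and many more learners.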

The teacher might then decide to add a time constraint to the task in order to enhance the utility of
the structure.

As an illustration of task-essentialness, consider a comprehension task in which learners must determine
whether a given picture goes with a given sentence. For a given item, the action (painting in this
example) is always constant, as are the characters. Attention is focused only on who is doing the
action to whom: in effect, whether the reflexive refers to Mr Fat or to Mr Thin. Among the items,
sentence structures are systematically varied: sometimes the antecedent is the main clause subject,
sometimes the object. Sometimes an infinitive clause is used, and sometimes a finite clause. Precisely
those factors are varied that are relevant to determining the reference of pronouns and reflexives:
these factors are thus the essence of the task, and the task cannot be performed without employing
the relevant principles.

No doubt, such tasks - in which a structure is essential- are sometimes difficult to create; certainly, they
will always be harder to create than tasks in which the structure is merely natural or useful. Because
essentialness is a much more stringent requirement than utility, achieving it requires correspondingly
more control over the discourse. Thus, the goal in production tasks is likely to be limited to
task-utility or task-naturalness, while in comprehension tasks, task-essentialness can more easily be
achieved.

Conclusion
I think that being able to speak and understand a second language is not just a matter of knowing a long list of
vocabulary or grammatical structures. High school students know these, yet they are not able to
communicate in English. For producing a second language, let's say English in our case, the most
important thing is to know the grammatical rules in a meaningful way. This means that material designers
should design books in which students are guided to use structures in a meaningful way, as
Long (1991) puts it. If the students' attention is directed only to meaning, it would be useful, but only for a
short period of time, because the structures of the language wouldn’t be internalized in
long-term memory for future use. So some degree of attention should be paid to forms.

Comparing the structure-based proposals, I think the students should be involved in tasks that don’t
make them feel pressured to produce the forms. They should use the forms in an unforced
manner, as Loschky and Bley-Vroman (1993) describe under task-naturalness. To teach the students
specific structures, the teacher can draw on task-utility, which is also described by Loschky
and Bley-Vroman (1993). As Ellis, Basturkmen, and Loewen (2001) found, learners
who engaged in communicative focus-on-form activities improved their grammatical accuracy and their
use of new forms. I think focus-on-form activities can lead students to produce more accurate
structures, and some research has shown the effectiveness of focus on form.
As for the limitations of focus on form, I think that because the process of second language learning is very
complicated, the approach has its own limitations. Still, overcoming or lessening them is not impossible, and they
are no reason to ignore focus on form in instruction for second language learners. In sum, form-focused classes with a
traditional curriculum cannot help students acquire language. Moreover, such classes will discourage them
from learning a second language; therefore, students should experience a rich diet of comprehensible
input to acquire language.

Selected Response Task Types (LAMATA)


*SLIDE-EXPLANATION
*Diff highlight = Diff slide
Selected-response tasks present input in the form of an item, and test-takers are expected to select
the response.

-In selected-response tasks, there is no need for the student to construct a response because
the possible responses are already provided; all they have to do is pick the correct answer from among them.
In terms of the response, selected-response tasks are intended to measure recognition or recall of
grammatical form and/or meaning.
Selected-response tasks help students recall or recognize because the information they are
supposed to recall or recognize is present among the attractive alternatives. Let’s say that, in a
multiple-choice item, students must pick the correct phrase to complete a sentence. If students can recall
grammatical rules, such as the correct use of tense, pronouns, or word form, then they can recognize
the correct answer among the options.

Selected-response tasks can vary in terms of reactivity, scope, and directness.


Reactivity is ‘the extent to which the input or the response affects subsequent input and responses’.
There are three types of reactivity. The first is reciprocal, which involves both interaction and feedback
between two or more examinees. The second is adaptive, where there is no feedback, but there is
interaction in the sense that the responses influence subsequent language use. The last one is
non-reciprocal, where there is no reactivity since no interaction or feedback is required to complete the task.
The scope of the input can also vary. In answering selected response tasks, students must process
either broad or narrow input.
Lastly, the directness of a selected response task can also vary. If the response is based on
information in the input, the relationship between the input and response is direct. But, if the response
needs other kinds of topical or pragmatic information, the relationship between the input and response
is indirect.

MULTIPLE CHOICE
*read slide*
MULTIPLE CHOICE ERROR IDENTIFICATION
*read slide*

MATCHING
*read slide*

DISCRIMINATION TASK
*read description*
This is a discrimination task in which the input is both non-language and language. The images are the
choices, while the utterance is the input. The utterance is “Se la entregó a ella,” or “He delivered it to
her.” The two options contrast completely with one another. In picture 1, the woman receives the
package. In picture 2, the man is the one who receives the package. With that in mind, the
test-takers must select the image that best represents the utterance.

NOTICING TASK
*read description*

GRAMMATICALITY JUDGEMENT TASK


*read description*
Grammaticality-judgment tasks are almost exclusively used in SLA research because it is questionable
whether they tap into grammatical knowledge or are simply a measure of the
students’ metagrammatical knowledge.

Limited Production Task Types (MANGANA)


Slide 1 Intro:
*We are now down to the second task type, which is the limited-production task type.
It is a type of task where the student is expected to produce a limited amount of language.

✓Slide 2 : Read what is in the slide first


– Limited-production tasks present input in the form of an item with language and/or non-language
information that can vary in length or topic.
E: As opposed to selected-response tasks, limited-production tasks elicit a response with a limited
amount of language production. This response can range in length from a single word to a full
sentence. All task characteristics in limited-production tasks can vary, with two exceptions: the
type of input (always an 'item') and the type of expected response (always 'limited-production').

✓Slide 3: Read what is in the slide first


– Limited-production tasks are intended to assess one or more areas of grammatical knowledge
depending on the construct definition.
E: Simply put, the test-taker’s response can vary from a one-word answer to a full-sentence answer, depending
on the area of grammatical ability defined in the construct, but it always represents only a limited amount of language
production.

✓Slide 4
INTRO : Let us now move on to the types of limited production test tasks and their examples.
– In a gap-filling task, test-takers supply the proper response to complete the missing information or
gaps in a sentence, dialogue, or passage based on the context. This means that you are able to
assess someone's language and grammar skills by looking at the responses they supply.

✓ Slide 5
– In a short-answer task, by contrast, test-takers read a passage before responding to a question or
questions. To answer the question, test-takers must employ the appropriate grammatical form
and meaning. This exercise examines a person's reading comprehension as well as their grammatical
knowledge.

✓Slide 6
– Lastly, in the dialogue-completion task, test-takers are required to complete a short conversation or
dialogue in which some exchanges are left blank. The responses provided by test-takers must be
correct in both form and meaning.

Extended Production Task Types (MENDIOLA)


Intro: Now that we are done discussing the selected-response and limited-production task types, we will
proceed to the last task type, which is the extended-production task.
1st slide:
Extended-production tasks attempt to elicit a large amount of data, the quality and quantity of which can vary substantially for
each test-taker. Some of these activities are considered to test implicit grammatical knowledge due to
their real-time nature. Also, the input in these tasks is presented as a prompt rather than as an item.
Extended Production tasks assess the examinee's ability to employ grammatical forms to convey
meaning in instances of language use, which can include speaking and writing.
Extended-production tasks are scored with the rating-scale method which is a scale on which a
learner's performance is rated, producing a quantifiable result.

2nd slide: These are the categories under extended-production tasks
· Extended-production tasks with written responses
o They can be as short as a few phrases or as long as a complete essay or even a full-length
research report, and they focus on thoughts or ideas that can be expressed as a written response.
o These range from essays and summaries to research papers.
· Extended-production tasks with spoken responses
o These involve the test-takers' speaking abilities to perform or to provide information.
o These can be dialogues, interviews, role-plays, stories, etc.
· Portfolios
o Portfolios encourage academic discussion about student learning, curriculum, pedagogy, and student support
services. They also encourage students to reflect on their own learning, so students may realize
what they have and have not learned.

What makes Grammar Test useful? (NUCUM)

(First 2 Bullets)

It basically means that score-based inferences aim to gather data on how effectively a student knows
or uses grammar to communicate meaning in a setting where the target language is used. The
responses to the test items can then be used to assign scores and draw conclusions about the
student's underlying grammatical abilities.

(3rd Bullet)

Given that it is impossible to directly examine a person's grammatical skill, we must infer it from
responses to questions or examples of real performance. Formative evaluation in grammar
assessment provides advice during a period of teaching or learning on how test-takers can enhance
their understanding of grammar or their ability to apply language in communicative contexts. It also
informs teachers about how they might adapt future instruction or fine-tune the curriculum. Summative
evaluation, by contrast, tests the overall performance of the student at the end of the
discussion or course.

(4th Bullet)

This part is already self-explanatory. It simply poses a thought-provoking question that
will be answered in the next bullet.

(5th Bullet)

Bachman and Palmer (1996) proposed a framework of test usefulness. According to them, a test is
'useful' for any given testing context if it exhibits a balance of the following six complementary qualities:
reliability, construct validity, authenticity, interactiveness, impact, and practicality.

The Quality of Reliability (PAICAN)

Intro: The previous reporter discussed “What makes a grammar test useful?” The characteristics of
a good language test or a good grammar test include reliability, validity, authenticity, and fairness, but the
focus of this topic is predominantly on ‘reliability in grammar testing’.

Reliability is concerned with how we measure.


(Read 1st slide)
In fact, an unreliable test is worth nothing. To understand the concept, it helps to be specific:
the reliability of a test is concerned with the consistency of scoring and the accuracy of the
administration procedures of the test.
(Read 2nd slide)

In other words, reliability can be described as the extent to which a test measures what it purports to
measure consistently and accurately.
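
To make "consistency of scoring" concrete, here is a minimal Python sketch of two simple checks on whether two raters score the same responses alike. The notes do not name a particular reliability statistic, so the choice of Pearson correlation and exact agreement is an assumption, and the scores and the names `rater_a`, `rater_b`, `pearson`, and `exact_agreement` are hypothetical.

```python
from math import sqrt

# Hypothetical scores (0-10) given by two raters to the same six test-takers.
rater_a = [7, 5, 9, 6, 8, 4]
rater_b = [7, 6, 9, 5, 8, 4]

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def exact_agreement(x, y):
    """Proportion of test-takers who received identical scores from both raters."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

r = pearson(rater_a, rater_b)
agreement = exact_agreement(rater_a, rater_b)
print(f"inter-rater correlation: {r:.2f}, exact agreement: {agreement:.2f}")
```

A correlation near 1 and a high agreement rate would suggest that scoring is consistent across raters. Low values would suggest that scores depend on who did the rating rather than on the test-taker's ability.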

The Quality of Construct Validity (PONGASI)

First slide: Construct Validity is the extent to which your test or measure accurately assesses what it's
supposed to. In research, it's important to operationalize constructs into concrete and measurable
characteristics based on your idea of the construct and its dimensions.

According to Bachman and Palmer, construct validity concerns the extent to which a specific test score can be interpreted as an
indicator of the ability(ies) or construct(s) we aim to measure, as well as the sphere of generalization to
which our score interpretations extend.

When constructing or evaluating a measure, construct validity ensures that you are measuring the
construct of interest. Without construct validity, you risk measuring unrelated or separate constructs
and losing precision.

The Quality of Authenticity (RAMIREZ)

Intro (Read 1st ppt slide)

To further explain Bachman and Palmer's point, when the term "authenticity" is used in the context of
assessment, it refers to how similar, relevant, and related the task type is to the TLU task
characteristics, or the objectives to evaluate a particular set of skills.

(Read 2nd slide)

When completing a selected-response activity, the student being evaluated just needs to choose one
response from the available options. There's a strong chance that the student will choose
to guess rather than work out the right response. Because it is then unclear what the test is
actually measuring, language teachers believe MCQ tests to be inauthentic.

(Read the 3rd slide)

So how can we assess authentically?

Finding out whether student knowledge can be used outside of the classroom is one of the objectives
of authentic assessment. For example, a physics examination ought to include doing physics by
carrying out experiments and resolving issues in the same manner as a real-world physicist.

As another example of authentic assessment, a child who has been studying various sorts of rocks can be asked to
compile a collection of rocks. In both of these situations, the child must apply
what they have learnt to complete the evaluation.

(Read the last slide)

In the context of language acquisition, we can assess a student's grammatical ability by allowing them
to compose an essay or by asking them to fill in a blank with the appropriate grammatical form. We
must make every effort to ensure that the statements come across as natural and understandable.
Thank you.

The Quality of Interactiveness (SABLAS)


(1st slide)
A fourth quality of test usefulness, interactiveness, was outlined by Bachman and Palmer (1996). It refers to how we
construct a test task that can measure test takers’ language ability, or the aspects of grammatical or
language knowledge that the facilitator wants to assess.
(2nd slide)
A task can be interactive if it engages the test takers' topical knowledge and positive affective
schemata. Topical knowledge concerns what students already know about the subject to be learned.
(Learners are not empty vessels that we need to fill with
knowledge; they always bring their own topical knowledge, and the facilitator's job is to
bring out and add to the topical knowledge they already possess.)
(to sum it up)
In other words, the quality of interactiveness concerns the test task given to the test-takers to
measure their grammatical ability. Test task characteristics vary, so it is a must to choose the task
best suited to the context, so that the test takers bring out what they have really learned and
can monitor their knowledge of grammar. Choosing the right test task
gives us the data that we want to see and measure, and it avoids the risk of
masking the very constructs we are trying to measure.

The Quality of Impact (VIDAMO)


1st slide
Testing plays an important role in society. Tests serve as gatekeeping devices or doors to opportunity.
They can be used to punish and to praise. It is, therefore, important to recognize
that tests reflect and represent the social, cultural and political values of any given society, and in the
evaluation of test usefulness, we must take into consideration the possible consequences that may
ensue from the decision to use test results for decision-making.
2nd slide
Bachman and Palmer (1996) refer to the degree to which testing and test score decisions influence all
aspects of society and the individuals within that society as test impact. Therefore, impact
refers to the link between the inferences we make from scores and the decisions we make based on
these interpretations.
In terms of impact, most educators would agree that tests should promote positive test-taker
experiences leading to positive attitudes (e.g., a feeling of accomplishment) and actions (e.g., studying
hard).

Tests are also said to be ‘useful’ when they promote positive attitudes on the part of test-takers and
other test constituents to be more engaged in the testing–learning process.
3rd slide
Test practicality is not a quality of a test itself, but is a function of the extent to which we are able to
balance the costs associated with designing, developing, administering, and scoring a test in light of
the available resources (Bachman, personal communication, 2002).

For example, we may want to include limited- and extended-production tasks in a grammar test to
measure students’ explicit as well as their implicit knowledge of grammar, so in the test design stage
we need to decide how important this is in relation to the other qualities of the test.

4th slide

If we decide, for example, that reliability is very important, we need to consider the costs (time and
people) of scoring both the limited- and the extended-production tasks. If the scoring costs, however,
outweigh the available resources, we must then reconsider the goals of the test and our priorities and,
if needed, reallocate the resources by changing our design.
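
The cost check described here can be worked through as simple arithmetic. The Python sketch below is only an illustration: every figure (number of test-takers, minutes per item, available rater hours) and every name, including `scoring_hours`, is a hypothetical planning assumption and not a value from Purpura or from Bachman and Palmer.

```python
# All figures below are hypothetical planning numbers, not values from the notes.
test_takers = 120
raters_per_response = 2          # double rating, chosen to support reliability
limited_items = 20
minutes_per_limited_item = 0.5   # scoring time per limited-production item
extended_tasks = 2
minutes_per_extended_task = 6    # scoring time per extended-production task
available_rater_hours = 50

def scoring_hours(n_takers, n_raters, n_units, minutes_per_unit):
    """Total rater hours needed to score one task type."""
    return n_takers * n_raters * n_units * minutes_per_unit / 60

limited_hours = scoring_hours(
    test_takers, raters_per_response, limited_items, minutes_per_limited_item
)
extended_hours = scoring_hours(
    test_takers, raters_per_response, extended_tasks, minutes_per_extended_task
)
total_hours = limited_hours + extended_hours
within_budget = total_hours <= available_rater_hours

print(f"limited: {limited_hours} h, extended: {extended_hours} h, total: {total_hours} h")
print("within budget" if within_budget else "over budget: reconsider the design")
```

With these made-up numbers the scoring cost is higher than the available rater hours, so the design would have to change, for example by using fewer extended-production tasks or single rating for some items.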
