Standards-Referenced Assessment for Vocational Education
All content following this page was uploaded by Patrick Griffin on 05 June 2014.
This study examined a model of assessment that could be applied nationally for Year Twelve Vocational Education and Training (VET) subjects and which could yield both a differentiating score and recognition of competence. More than fifty colleges across all states and territories of Australia field-tested the approach over one school year. Results showed that a standards-referenced model could be used, that the approach was compatible with the diverse range of senior secondary assessment systems in use throughout Australia, and that there were considerable cost benefits in adopting the logic of item response modelling to develop rubrics for scoring performances on units of competence from National Training Packages. A change in the logic of competency assessment was proposed: the performance indicators were rated not with a dichotomy but with a series of quality-ordered criteria indicating how well students performed specified tasks in the workplace or its simulation. The study validated the method of assessment development, demonstrated the method's consistency, and showed how the method could address the issue of consistency across states. The study also proposed a set of principles for a joint assessment of both quality and competence.
Assumptions
This does not mean that it is necessary to jettison the idea of competence as a
dichotomy or the practice of confirming competence on the basis of performance
on discrete tasks as long as the quality of the performance can be taken into
account. There is, however, a need for the tasks to be a coherent set ordered in
terms of skill demand if they are to define a continuum of competence. It also
requires a set of assumptions to be explicated. These assumptions also apply to the
dichotomy of competence but are almost never considered (Griffin, 1997).
The study
The study addressed the issue of differentiation among students on the basis of
relative quality of performance and was conducted in four industries using a total
of fifty-six competency units: seventeen units in Metal and Engineering, fifteen
units in Information Technology, twelve units in Business Administration and four-
teen competency units in Hospitality. Sixty schools nationally participated in the
project on the advice of each of the state vocational education jurisdictions. There
were nine schools in the Australian Capital Territory, twelve in New South Wales,
five in Queensland, six in South Australia, thirteen in Victoria, five in Tasmania and
ten in Western Australia. The schools focused on VET subjects and units of competency distributed over the four industries. A total of 3,672 students were assessed,
493 in Metals, 1,215 in Hospitality, 884 in Business and 1,080 in Information
Technology. The students were assessed against each of a series of units of com-
petence in their relevant VET study area. The assessments were conducted by
teachers who were provided with a rating sheet on which they recorded their
observations of the relative quality of the student performance in the tasks associ-
ated with each of the units of competency. This was then developed into a score
for the subject. The derivation of the score is described below.
[Figure 1. The assessment framework hierarchy: the Framework comprises Domains (Domain 1, 2, 3 …); each Domain comprises Capabilities (1.1, 1.2, 1.3 … 1.n); each Capability comprises Indicators (1.1.1, 1.1.2, 1.1.3 … 1.1.n); and each Indicator comprises Criteria (1.1.1.1, 1.1.1.2, 1.1.1.3 …).]
how well each task was completed, the issue of ‘how well’ could be addressed.
The descriptions of ‘how well’ were called quality criteria. Some performance
indicators might have two identifiable quality criteria, some three, some four, but
in all cases a set of procedures for defining the criteria would have to be followed.
When a set of quality criteria was combined with a performance indicator as a
rating scale, the composite was called a rubric.
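In code, the idea of a rubric as a performance indicator paired with ordered quality criteria might be sketched as follows. This is a minimal illustration under stated assumptions; the class, its methods and the example wording are hypothetical, not part of the study's instruments:

```python
# Illustrative sketch only: a rubric pairs a performance indicator with
# quality criteria ordered from lowest (0) to highest quality.
from dataclasses import dataclass


@dataclass
class Rubric:
    indicator: str        # the performance indicator being rated
    criteria: list[str]   # quality criteria, ordered from lowest to highest

    def score(self, observed_level: int) -> int:
        """Return the rating: the index of the observed quality criterion."""
        if not 0 <= observed_level < len(self.criteria):
            raise ValueError("level outside the rubric's range")
        return observed_level


# A rubric may have two, three or four quality criteria, so the maximum
# rating differs from rubric to rubric (wording paraphrased for brevity).
teamwork = Rubric(
    indicator="Recognise and accommodate cultural differences within the team",
    criteria=[
        "Establishes rapport with others from diverse backgrounds",
        "Applies a range of communication strategies with diverse team members",
        "Uses the team's diverse composition to achieve work group goals",
    ],
)
print(teamwork.score(2))  # -> 2
```

The key design point is that the rating is simply the position of the observed criterion in the ordered list, so rubrics of different lengths coexist naturally.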
The panellists were shown how to write rubrics according to the rules that
quality criteria should: reflect levels of [workplace] performance quality not pro-
cedural steps; avoid counts of things right or wrong; avoid all comparative terms;
discriminate among ordered quality performances; enable assessees to verify their
performance assessment; focus on a series of single performances within the same
domain; and yield reliable and consistent judgments (Griffin, 1997).
The number of quality criteria for each rubric had to be defined by the
panellists, who could each call on expertise outside the sphere of their own
experience in order to avoid the restrictions of self-referencing.
Score development
The model used for the study consisted of several steps. The first step was defining
the framework or the broad area of study, as shown on the left in Figure 1. In this
case it represented the training package itself and represented the overall VET study
area within which the assessment was conducted.
The second level of the model described the key performance areas, or skill
sets required within the framework. In the training package these were called the
‘units of competence’.
[Figure: The model applied to the Hospitality training package. Area 1, Work with colleagues and customers, contains Capability 1.1, Work in a team. Its Indicator 1.1.1, Recognise and accommodate cultural differences within the team, has three quality criteria: Criterion 1.1.1.1, Establish rapport when working with others from a range of social, cultural and ethnic backgrounds; Criterion 1.1.1.2, Apply a range of communication strategies when dealing with team members from diverse backgrounds; and Criterion 1.1.1.3, Use the diverse background composition of the team to assist colleagues achieve work group goals. Indicators 1.1.2 and 1.1.3, with their criteria, and Areas 2 and 3 follow the same pattern.]
and quality criteria would score highest, and the relative importance of a unit could be enhanced simply by defining a larger number of quality criteria. This approach, in addition to being flawed, would generally lead to an equally incorrect practice of insisting that all criteria had the same number of levels or score points, and it would require an arbitrary weighting process to establish the relative importance of the rubrics: if there were more levels of quality, the importance or influence of the criterion would be increased. Applying the logic of item response modelling was an important procedure in making this assessment approach accessible. It also provided a way of defining the underlying continuum in terms of a set of ordered levels of performance quality.
[Rubrics for the Hospitality units of competency, with score points in parentheses:]

Works as a member of a team: (0) Demonstrates limited ability to work with colleagues and customers; (1) Communicates and interacts with others in a positive and supportive manner; (2) Deals with difficult situations in a positive and sensitive manner; (3) Displays cultural sensitivity and high quality service; (4) Anticipates, monitors and resolves difficult situations when dealing with others.

Work in a socially diverse environment: (0) Requires support to work in a socially diverse environment; (1) Uses various communication strategies when dealing with diverse groups; (2) Displays cultural awareness and sensitivity; (3) Avoids, and when required, resolves cultural misunderstandings.

Follow occupational health, safety and security procedures: (0) Requires support to follow occupational health, safety and security procedures; (1) Identifies and reports OHS issues requiring attention; (2) Follows correct health, safety and security procedures and can deal with emergency situations; (3) Contributes to the management of workplace health, safety and security.

Follow hygiene procedures: (0) Demonstrates ability to follow basic hygiene procedures; (1) Follows workplace standards for handling and storage of foods; (2) Applies corrective action to minimise or remove hygiene risks; (3) Understands hygiene regulation and its impact on the industry.

Develop and update hospitality industry knowledge: (0) Demonstrates limited ability to develop and update hospitality industry knowledge; (1) Accesses specific information on the relevant sector of work when required; (2) Maintains knowledge of the industry, including legal, ethical, and current issues of local concern; (3) Demonstrates awareness and understanding of a range of industry-related issues, including current and emerging issues.

Organise and prepare food: (0) Demonstrates ability to assemble equipment and prepare some ingredients for menu items; (1) Follows correct equipment and safety procedures and assembles ingredients with consideration to quality, hygiene, suitability, consistency and wastage; (2) Prepares a range of food quickly and accurately and displays a range of knife handling, cutting and shaping techniques; (3) Uses logical and time-efficient workflow in preparation of food; (4) Improvises ingredients when required, applies cutting and shaping techniques appropriate to the style of cuisine, and cleans and prepares seafood with consideration to hygiene and OHS.

Present food: (0) Demonstrates ability to select appropriate garnishes/sauces when presenting food; (1) Portions and presents food according to standard recipes and instructions, OHS and hygiene regulations and food presentation; (2) Presents food using classical and innovative styles with consideration to colour, contrast and temperature.

Process financial transactions: (0) Requires support to process financial transactions; (1) Receives cash payments, issues correct change and records transactions in a timely manner; (2) Follows enterprise procedures when processing automated documents and maintains accurate records; (3) Conducts timely transactions, counts cash, calculates non-cash receipts and cash payments, and removes and records takings from the register/terminal; (4) Processes a range of non-cash transactions in accordance with enterprise and financial institution procedures.

Promote products and services to customers: (0) Requires support to promote products and services to customers; (1) Supplies accurate and readily available information to customers on products and services; (2) Actively researches and maintains product/service knowledge; (3) Evaluates products, services and promotional initiatives and successfully employs upselling and cross-selling techniques; (4) Applies conflict resolution strategies.

Prepare and serve non-alcoholic drinks: (0) Demonstrates limited ability to prepare and serve non-alcoholic drinks; (1) Prepares and serves a range of beverages in a logical, efficient and presentable manner; (2) Customises drinks to meet specific requests and maintains quality control during busy periods; (3) Prepares ingredients for an extensive range of hot and cold beverages and maintains and monitors equipment usage and functionality.

Receive and store kitchen supplies: (0) Follows standard procedures for inspecting, storing and recording incoming stock; (1) Prioritises storage requirements of various foods and maintains stock with consideration to usage, safety and hygiene; (2) Identifies, records and reports incoming stock variations and discrepancies and disposes of damaged or expired stock in accordance with OHS and industry regulations; (3) Manages stock to ensure timely use and replacement of goods.
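The item response logic referred to above can be illustrated with a partial credit model, in which each rubric contributes according to the locations of its estimated step thresholds rather than its raw number of score points. The sketch below is illustrative only: the threshold values are invented, and this is the standard Rasch partial credit formulation, not the study's actual calibration.

```python
import math


def pcm_probs(theta, thresholds):
    """Partial credit model: probability of each score category 0..m for a
    person at ability `theta` (logits), given m step thresholds (logits)."""
    # Cumulative sums of (theta - threshold_k); category 0 has cumulative sum 0.
    cums = [0.0]
    for t in thresholds:
        cums.append(cums[-1] + (theta - t))
    denom = sum(math.exp(c) for c in cums)
    return [math.exp(c) / denom for c in cums]


def expected_score(theta, thresholds):
    """Expected rating on one rubric for a person at ability theta."""
    return sum(k * p for k, p in enumerate(pcm_probs(theta, thresholds)))


# Hypothetical thresholds for a four-level rubric (score points 0-3):
thresholds = [-1.0, 0.2, 1.4]
for theta in (-2, 0, 2):
    print(round(expected_score(theta, thresholds), 2))
```

Because thresholds are estimated on a common logit scale, a rubric with four levels does not automatically outweigh one with three; its influence depends on where its thresholds sit on the continuum, which is the point made in the surrounding text.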
[Table 2 is not fully reproduced here; it reported, for each unit of competency, the number of performance indicators (Ni), reliability estimates, maximum scores and the panel-defined cut scores for the competence levels.]
and it helped to illustrate that the procedure could be carried out and that statis-
tical moderation of the school-based assessment is possible with item response
modelling analysis.
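Statistical moderation in its simplest linear form can be sketched as follows. This is a generic illustration of the idea, not the item-response-based procedure the study actually used; the function name and reference values are hypothetical:

```python
def moderate(school_scores, ref_mean, ref_sd):
    """Linearly rescale one school's raw scores so their mean and spread
    match reference values (e.g. from a common external calibration)."""
    n = len(school_scores)
    mean = sum(school_scores) / n
    sd = (sum((x - mean) ** 2 for x in school_scores) / n) ** 0.5
    return [ref_mean + (x - mean) * ref_sd / sd for x in school_scores]


# Hypothetical raw school-based scores, moderated to a common scale:
raw = [20, 25, 30, 35, 40]
adjusted = moderate(raw, ref_mean=50.0, ref_sd=10.0)
```

Linear moderation preserves each student's rank within the school while placing all schools on a comparable scale; item response calibration achieves comparability more directly, at the item level.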
From Table 2 it can be seen that there were variable numbers of performance
indicators for each unit. Ni, the number of performance indicators within a com-
petency unit, ranged from twenty-six indicators (unit THHCOR01B) to six (unit
THHGHS01B). Measures of reliability were obtained using the scores assigned to
indicators within units and a score from which levels of competence could be identified. Reliability estimates ranged from 0.83 to 0.95. Maximum scores for each of
the units are also shown and these were obtained by summing the maximum
quality criteria ratings over all indicators within a unit. Panellists defined the cut
scores for the change of competence level.
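The mapping from a summed rubric score to a competence level via panel-defined cut scores can be sketched as follows. The cut scores and maximum below are invented for illustration, not taken from the study's tables:

```python
import bisect


def competence_level(raw_score, cut_scores):
    """Map a summed rubric score to a standards-referenced level.

    `cut_scores` are the panel-defined minimum scores for levels 1, 2, ...;
    scores below the first cut fall in level 0."""
    return bisect.bisect_right(cut_scores, raw_score)


# Hypothetical cuts for a unit with a maximum score of 44:
cuts = [9, 22, 33, 40]
assert competence_level(4, cuts) == 0    # below the first cut
assert competence_level(22, cuts) == 2   # reaching a cut attains that level
assert competence_level(44, cuts) == 4   # maximum score, highest level
```

Using `bisect_right` means a score equal to a cut attains the corresponding level, which matches the usual reading of a cut score as a minimum requirement.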
Consistency
Classical measures of consistency include the index of reliability known as Cronbach's alpha. This index ranges in value from 0.0 to 1.0; zero indicates that there is no consistency and that the assessment is entirely random. More recently, item response theories have added indices of 'separation' reliability, and these provide some interesting interpretations. An item separation reliability index is closely related to the Cronbach index but indicates the extent to which the items or, in this case, the rubrics or quality criteria are separated along the underpinning construct or continuum. A value of zero indicates that the rubrics are all clustered at the same point and each indicates the same level of performance; adding more items or rubrics at the same level does not add value to the assessment. As the items are increasingly separated, the value of the index rises to 1.0, at which point the rubrics or quality criteria are spread along the full length of the continuum.
[Figure 4. Expected score (0-3) plotted against person location (-3 to 4 logits).]
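The Cronbach index discussed in this section can be computed directly from a persons-by-rubrics matrix of ratings. The sketch below uses invented ratings for five students on four rubrics, purely to make the formula concrete:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items matrix of ratings."""
    n_items = len(scores[0])

    def var(xs):
        # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[i] for row in scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings: five students scored on four rubrics
ratings = [
    [0, 1, 1, 0],
    [1, 1, 2, 1],
    [2, 2, 2, 2],
    [3, 2, 3, 3],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(ratings), 2))  # -> 0.97
```

Note that alpha summarises internal consistency only; it says nothing about where the rubrics sit on the continuum, which is what the separation index described above adds.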
how many units are used. Figure 4 relates student ability or competence to the
standards-referenced band assigned for each unit. The close alignment of the
lines shows that there is little difference in which or how many units are used to
assess overall competence.
Figure 5, however, shows that the assessment can differ considerably depend-
ing on the location of the assessment. In the Australian Capital Territory, for
example, a student had to demonstrate a high level of performance quality in order
to be assessed as competent (represented by the horizontal line at a value of 1.0).
In most other states, a much lower level of performance quality was required in
order for the student to be assessed as competent. In this study, the difference due
to location was controlled through the application of item response modelling
analysis but in most assessments, where the weighted differentiating score is not
used and a decision of competence is made, this difference is uncontrolled and the
consistency of decision-making is variable. This underpins the point that there is
no fixed level of competence. Not only is the level of performance expected in the
workplace variable, but the competence decision varies according to the demands
of the curriculum in the school system. The important thing is whether the
student can meet the expectations of the workplace or the school system. This
analysis has exposed the variability in expectations demanded for attributing
competence.
Figures 4 and 5 show that there were different views of competence across
systems and action needs to be taken to address this lack of consistency. The lack of
consistency was not an artefact of the current procedure demonstrated in this
article; it was a hidden aspect of competency assessment, which relied on judgment
[Figure 5. Expected score (0-3) plotted against person location (-3 to 4 logits), by state and territory.]
in context. We expect the effect would be exacerbated across workplaces and across
assessors. The lack of consistency is not a weakness of the proposed assessment of
competence and quality; on the contrary, the identification of this inconsistency is
a strength of this analysis.
Previous investigations of consistency of competency assessment have not
focused on outcomes or on the performances of the assessees (Victorian Education
Training and Assessment Services, 2000). They tend to have examined the proce-
dures and materials. Even with constant process and materials, differences exist in
the interpretation of competence. If a consistent national process were to be used
it would be possible for national standards to be set and monitored in competency
assessment and to control effects of location, localised curriculum effects and
judgment inconsistency. Hence, the method reported in this article not only pro-
vides an opportunity for scored assessment, it adds the notion of quality to com-
petence and allows monitoring of standards and the identification of any bias in
competency assessment.
Discussion
In this study the capacity of the student, or the level of performance, has been controlled for the effect of location. Without the weighted differentiating score and
item response modelling calibration, this effect would have to be controlled
through moderation. Consistency of competence assessment is an issue that still
needs to be resolved. The methods displayed in this article have shown a possible
approach to resolving or at least identifying such an issue.
References
Bateman, A. (2003). A validation of multi source assessment of higher order competency assessment.
Unpublished masters thesis, Faculty of Education, The University of Melbourne,
Australia.
Connally, J. (2004). A multi source assessment of higher order competencies. Unpublished
doctoral thesis, Faculty of Education, The University of Melbourne, Australia.
Curwood, M. (2006). A case study of the implementation of competency based assessment and
training in Australian industry. Unpublished doctoral thesis, Faculty of Education, The
University of Melbourne, Australia.
Glaser, R. (1981). The future of testing: A research agenda for cognitive psychology and
psychometrics. American Psychologist, 36(9), 923–936.
Greaney, V., Khandker, S.R., & Alam, K. (1999). Bangladesh: Assessing basic skills. Dhaka:
University Press.
Griffin, P. (1995). Competency assessment: Avoiding the pitfalls of the past. Australian and
New Zealand Journal of Vocational Education, 3(2), 33–59.
Griffin, P. (1997, September 18). Developing assessment in schools and workplace. Paper pre-
sented at the Inaugural Professorial Lecture, Dean’s Lecture Series, Faculty of
Education, The University of Melbourne.
Griffin, P., & Gillis, S. (2001, May 4). Competence and quality: Can we assess both? Paper pre-
sented at the Upgrading Assessment: A National Conference on Graded Assessment,
Kangan Batman Institute of TAFE, Melbourne.
Griffin, P., & Gillis, S. (2002). Scored assessment for Year 12. (Report of the pilot study).
Melbourne: The University of Melbourne, Assessment Research Centre.
Griffin, P., Gillis, S., & Calvitto, L. (2004). Connecting competence and quality: Scored assess-
ment in Year 12 VET. Melbourne: The University of Melbourne, Assessment
Research Centre.
Griffin, P., Gillis, S., Keating, J., & Fennessy, D. (2001). Assessment and reporting of VET
courses within senior secondary certificates. In Creating expanded opportunity for youth:
Greater recognition for VET courses in industry and university. Sydney: New South Wales
Department of Education and Training.
Masters, G.N. (2002). Fair and meaningful measures? A review of examination procedures in the
NSW Higher School Certificate. Melbourne: Australian Council for Educational
Research.
McCurry, D. (2003). But will it work in theory? Theory, empiricism, pragmatics and the key
competencies: The place of theory and research in the development of a notion of work-related skills
and the whole school assessment of generic skills. Melbourne: Australian Council for
Educational Research.
McGaw, B. (1997). Shaping their future: Recommendations for reform of the Higher School
Certificate. Sydney: NSW Department of Training and Education Coordination.
Messick, S. (1992). The interplay of evidence and consequences in the validation of performance
assessments: Research report. Paper presented to the Annual Meeting of the National
Council on Measurement in Education, San Francisco, CA, USA.
National Assessors and Workplace Trainers Body. (1998). Training package for assessment and
workplace training. Melbourne: Australian National Training Authority.
Authors
Professor Patrick Griffin is Director of the Assessment Research Centre, Chair of Edu-
cation (Assessment), Deputy Dean, and Associate Dean of Innovation and Development at
the University of Melbourne.
Email p.griffin@unimelb.edu.au
Dr Shelley Gillis is a lecturer and research fellow at the Assessment Research Centre.
Leanne Calvitto was a research associate at the Assessment Research Centre at the time of
this study.