
Notes on Assessment/Test Construction

Table of Specifications (TOS) – sometimes called a test blueprint, is a tool teachers use to design a test. It is a table that maps out
the test objectives, the contents or topics covered by the test, the levels of cognitive behavior to be measured, and the
distribution, number, placement, and weights of test items, so that outcomes, assessment, and instruction are aligned.

Generally, the TOS is prepared before a test is created. Teachers need to create a TOS for every test that they intend
to develop. The TOS is important because it does the following:
 ensures that the instructional objectives and what the test captures match
 ensures that the test developer will not overlook details that are considered essential to a good test
 makes developing a test easier and more efficient
 ensures that the test will sample all important content areas and processes
 helps in planning and organizing the test
 offers an opportunity for teachers and students to clarify achievement expectations

Steps in developing a Table of Specifications

1. determine the objectives of the test (instructional objectives or the intended learning outcomes)
2. determine the coverage of the test (contents of the test)
3. calculate the weight of each topic (time spent for each learning objective during instruction)
4. determine the number of items for the whole test.
5. determine the number of items per topic or learning objective (see the sketch after this list)
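
To make steps 3 to 5 concrete, here is a minimal sketch of how item counts can be allocated in proportion to instructional time. The topics, hours, and the 50-item total are hypothetical figures chosen only for illustration.

```python
# Hypothetical example: allocate a 50-item test across topics in
# proportion to the number of class hours spent on each topic.

hours_per_topic = {      # assumed instructional hours, for illustration only
    "Fractions": 6,
    "Decimals": 4,
    "Percentages": 2,
}
total_items = 50

total_hours = sum(hours_per_topic.values())
for topic, hours in hours_per_topic.items():
    weight = hours / total_hours             # step 3: weight of each topic
    items = round(weight * total_items)      # step 5: items allocated to the topic
    print(f"{topic}: weight = {weight:.0%}, items = {items}")

# Expected output (rounding may need a small manual adjustment so the
# allocations still sum to the intended total):
# Fractions: weight = 50%, items = 25
# Decimals: weight = 33%, items = 17
# Percentages: weight = 17%, items = 8
```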

Different Formats of a TOS

1. One-Way TOS – maps out the content or topic, test objectives, number of hours spent per learning objective, and the format,
number, and placement of items.
2. Two-Way TOS – contains everything in the one-way format plus the levels of cognitive behavior targeted per test content.

 Deciding on what test format to use generally depends on the learning objectives or desired learning outcomes
of the subject/unit/lesson.
 The level of thinking to be assessed is also an important factor to consider when designing a test.
 Test items should be meaningful and realistic to the learners, and relevant or related to their everyday experiences.

Traditional Tests are categorized as:

1. selected-response type – learners select the correct response from the given options.
 multiple choice
 true or false or alternative response test
 matching type test
2. Constructed-response test – requires learners to supply answers to a given question or
problem
 short answer test – consists of open-ended questions or incomplete sentences that require learners to
create an answer for each item (completion, identification, enumeration)
 essay test – consists of problems/questions that require learners to compose or construct written
responses.
 problem solving test – consists of problems/questions that require learners to solve problems in
quantitative or non-quantitative settings using knowledge and skills in math

Test Reliability – the consistency of responses to a measure, examined under three conditions:

1. when retested on the same person
- consistent responses are expected when the test is given again to the same participants
2. when retested on the same measure
- responses are consistent when the same test, its equivalent form, or another test that
measures the same characteristic is administered at a different time
3. similarity of responses across items that measure the same characteristic
- the person responds in the same way, or consistently, across items that
measure the same characteristic

There are different factors that affect the reliability of a measure. The reliability of a
measure can be high or low depending on the following factors:

1. The number of items in a test. The more items a test has, the higher its reliability tends to
be, because a larger pool of items increases the probability of obtaining consistent scores.
2. Individual differences of test participants. Every participant possesses characteristics that
affect test performance, such as fatigue, concentration, innate ability,
perseverance, and motivation. These individual factors change over time and affect the
consistency of answers in a test.
3. External environment. The external environment may include room temperature, noise
level, depth of instruction, exposure to materials, and quality of instruction, all of which
can cause changes in the responses of examinees.

What are the different ways to establish test reliability?

The specific kind of reliability will depend on:

1. the variable you are measuring,
2. the type of test, and
3. the number of versions of the test.

For each method of testing reliability: how the reliability is established, and what statistic is used.

1. Test-retest
How: The test is administered at one time to a group of examinees, then administered again at another time to the same group of examinees. The time interval should not be more than 6 months.
Statistics: Correlate the test scores from the first and the second administration. You may use the Pearson product-moment correlation, or Pearson r.

2. Parallel Forms
How: There are two versions of the test, and the items need to measure exactly the same skill. Each version is called a Form. Administer one form at one time and the other form at another time to the same group of examinees. Ex. entrance test.
Statistics: Pearson r.

3. Split-half
How: Administer a test to a group of examinees. The test items are split into halves, usually using the odd-even technique. Used when the test has a large number of items.
Statistics: Correlate the two sets of scores using Pearson r, then apply the Spearman-Brown coefficient. The correlation coefficients obtained using the two statistics must be significant and positive to mean that the test has internal consistency reliability.

4. Test of internal consistency using Kuder-Richardson and Cronbach's Alpha method
How: This procedure involves determining if the scores for each item are consistently answered by the examinees.
Statistics: Cronbach's alpha or the Kuder-Richardson formula. A Cronbach's alpha value of 0.60 and above indicates that the test items have internal consistency.

5. Inter-rater reliability
How: Used to determine the consistency of multiple raters when using rating scales or rubrics to judge performance. The reliability referred to here is the similarity or consistency of ratings provided by more than one rater or judge when using an assessment tool.
Statistics: Kendall's tau coefficient of concordance, to determine if the ratings provided by multiple raters agree with each other.
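
As a companion to the table above, the sketch below shows how the statistics it names could be computed for a small, invented score matrix. The examinees, item scores, and retest values are hypothetical; the formulas (Pearson r, the Spearman-Brown correction, and Cronbach's alpha) are the standard ones.

```python
import numpy as np

# Hypothetical item-score matrix: 5 examinees x 4 items (invented data).
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
], dtype=float)

# --- Test-retest / parallel forms: Pearson r between two administrations ---
first_admin = scores.sum(axis=1)                         # total scores, first administration
second_admin = first_admin + np.array([0, 1, -1, 0, 1])  # assumed retest scores
pearson_r = np.corrcoef(first_admin, second_admin)[0, 1]

# --- Split-half: correlate odd vs. even items, then Spearman-Brown correction ---
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_halves = np.corrcoef(odd_half, even_half)[0, 1]
spearman_brown = (2 * r_halves) / (1 + r_halves)         # full-test reliability estimate

# --- Internal consistency: Cronbach's alpha ---
k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)
total_variance = scores.sum(axis=1).var(ddof=1)
cronbach_alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

print(f"Pearson r (test-retest):     {pearson_r:.2f}")
print(f"Spearman-Brown (split-half): {spearman_brown:.2f}")
print(f"Cronbach's alpha:            {cronbach_alpha:.2f}")
```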

The very basis of statistical analysis to determine reliability is the use of linear regression.

Test Validity

A measure is valid when it measures what it intends to measure. For example, if a quarterly exam is valid, then its contents
should directly measure the objectives of the curriculum; if an entrance exam is valid, it should predict
students' grades after the semester.

What are the different ways to establish test validity?

For each type of validity: its definition and the procedure used to establish it.

1. Content validity
Definition: when the test items represent the domain being measured.
Procedure: The test items are compared with the objectives of the program. The items need to directly measure the objectives (achievement).

2. Face validity
Definition: when the test is presented well, free from errors, and administered well.
Procedure: The test items and layout are reviewed and tried out on a small group of respondents.

3. Predictive validity
Definition: a measure should predict a future criterion, e.g., an entrance exam.
Procedure: A correlation is obtained where the x-variable is used as the predictor and the y-variable as the criterion.

4. Construct validity
Definition: the components or factors of the test should contain items that measure the same characteristic for each examinee.
Procedure: The scores on the measures (factors) should be correlated.

5. Convergent validity
Definition: when the components or factors of a test are hypothesized to have a positive correlation.
Procedure: Correlation is done for the factors of the test.

6. Divergent validity
Definition: when the components or factors of a test are hypothesized to have a negative correlation, e.g., scores for extrinsic and intrinsic motivation.
Procedure: Correlation is done for the factors of the test.
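
To illustrate the predictive validity procedure in the table above, the short sketch below correlates a predictor (x) with a later criterion (y), as in the entrance-exam example. The scores and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical data: entrance exam scores (predictor, x) and
# end-of-semester grades (criterion, y) for the same six students.
entrance_exam = np.array([78, 85, 92, 65, 70, 88], dtype=float)   # x-variable (predictor)
semester_grade = np.array([80, 84, 90, 70, 75, 86], dtype=float)  # y-variable (criterion)

# Predictive validity: correlate the predictor with the future criterion.
r = np.corrcoef(entrance_exam, semester_grade)[0, 1]
print(f"Validity coefficient (Pearson r): {r:.2f}")  # a high positive r supports predictive validity
```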
