Types of Norm

Age norms- average scores of testtakers at different ages.
Grade norms- average scores of students at different grade levels.
Percentile norms- a percentile is the score below which a given percentage of students fall.
Ex: a score at the 75th percentile means 75% of students scored below it.
National norms- derived from a sample representative of the whole national population
Ex: Luzon, Visayas, Mindanao
Subgroup norms- norms segmented by any criterion used to select the sample (ex: a profile selected according to age).
Local population norms- norms derived from the local population taking the test
(example: compare national test results with local results, such as students taking an ACHIEVEMENT TEST IN BAUAN).
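A percentile norm can be illustrated with a short calculation. This is only a sketch: the function name `percentile_rank` and the eight scores in `norm_group` are invented for illustration, and it uses the "percentage scoring below" definition (other definitions also count half of the tied scores).

```python
def percentile_rank(score, norm_scores):
    """Percentage of scores in the norm group that fall below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

# Hypothetical norm group of eight raw scores (invented for illustration).
norm_group = [10, 12, 15, 18, 20, 22, 25, 30]

# Six of the eight scores are below 25, so 75% scored below.
print(percentile_rank(25, norm_group))  # 75.0
```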
• Standardization or Test standardization - The process of administering a test to a representative
sample of testtakers for the purpose of establishing norms.
• Developing norms for a standardized test.
Types of Norms
• Age norms
• Grade norms
• National norms
• National anchor norms
• Local norms
• Subgroup norms
• Percentile norms
Fixed Reference Group Scoring Systems
- The distribution of scores obtained on the test from one group of test takers, called the fixed reference
group, is used as the basis for calculating test scores for future administrations of the test.
Example: Scholastic Aptitude Test (SAT)

Norm-Referenced versus Criterion-Referenced Evaluation
• Norm-referenced tests make comparisons between individuals.
• Criterion-referenced tests measure a test taker's performance compared to a specific set of standards
or criteria.
- Suppose you received a score of 90% on a Math exam in school. This could be interpreted in both
ways. If the cut score was 80%, you clearly passed. If the average score was 75%, then you performed
toward the top of the class.
NORM-REFERENCED- comparing your 90 to the class average of 75.
CRITERION-REFERENCED- comparing your 90 to the cut score of 80.
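The two readings of the same 90% score can be sketched in code. The function names here are hypothetical; the numbers (score 90, cut score 80, class average 75) come from the Math exam example. The criterion-referenced reading compares the score to the cut score, while the norm-referenced reading compares it to how the group performed.

```python
def criterion_referenced(score, cut_score):
    """Compare the score to a fixed standard (the cut score)."""
    return "pass" if score >= cut_score else "fail"

def norm_referenced(score, group_mean):
    """Compare the score to the group: points above (+) or below (-) the mean."""
    return score - group_mean

score, cut_score, class_average = 90, 80, 75

print(criterion_referenced(score, cut_score))  # pass
print(norm_referenced(score, class_average))   # 15
```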
Correlation is an expression of the degree and direction of correspondence between two things.
• Pearson r - ranges from -1 (perfect negative) through 0 (no relationship) to +1 (perfect positive)
• Spearman rho
• Regression
Meta-Analysis
- a family of techniques used to statistically combine information across studies to produce single
estimates of the statistics being studied.
Positive correlation - both variables move in the same direction (⬆️⬆️ or ⬇️⬇️)
Negative correlation - the variables move in opposite directions (⬆️⬇️)
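A minimal sketch of the Pearson r calculation, showing one positive and one negative relationship. The data (`hours`, `scores`, `errors`) are invented for illustration, and `pearson_r` is a hand-written helper rather than a library call.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_variation / (spread_x * spread_y)

# Hypothetical data: study hours, test scores, and number of errors.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]   # rises as hours rise: positive correlation
errors = [10, 8, 6, 4, 2]   # falls as hours rise: negative correlation

print(round(pearson_r(hours, scores), 2))
print(round(pearson_r(hours, errors), 2))
```

With perfectly linear data like this, the first value comes out at +1 and the second at -1; real test data fall somewhere in between.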
Utility
- the usefulness or practical value of testing to improve efficiency.

Factors that affect utility
• Psychometric soundness - the reliability and validity of a test
• Costs - economic and noneconomic
• Benefits - should justify the costs of administering, scoring, and interpreting the test
Utility Analysis
- a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a
decision about the usefulness and/or practical value of a tool of assessment
UTILITY-RELATED QUESTIONS
A utility analysis may be undertaken to decide whether:
- one test is preferable to another test for use for a specific purpose
- one tool of assessment (such as a test) is preferable to another tool of assessment (such as behavioral
observation) for a specific purpose
- the addition of one or more tests (or other tools of assessment) to one or more tests (or other tools of
assessment) that are already in use is preferable for a specific purpose
- no testing or assessment is preferable to any testing or assessment
Some Practical Considerations
- The pool of job applicants
- The complexity of the job
- The cut score in use
Cut Score
- Reference point derived as a result of a judgment and used to divide a set of data into two or more
classifications, with some action to be taken or some inference to be made on the basis of these
classifications.
Types of Cut Score
• Relative cut score
• Norm-referenced cut score (another name for a relative cut score)
• Fixed cut score
• Multiple cut scores
Methods for Setting Cut Scores
• The Angoff Method
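• The Known Groups Method- compares test data from groups known to possess, and not to possess, the trait
or ability of interest (ex: a characteristic relevant to the job)
• IRT-Based Method- the cut score is tied to item difficulty: to pass, the testtaker must correctly answer
items at or above a set difficulty level

Test Development
The process of developing a test occurs in five stages
• test conceptualization - the idea for the test is conceived
• test construction
• test tryout- the draft test is administered to a sample and the items are scored to see how well it works
• item analysis- deciding which items are acceptable and which are not
• test revision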
Test Construction
• Scaling Methods - scaling is the process of assigning numbers in measurement.
- Rating Scale
• Summative Scale
• Likert Scale
- Method of Paired Comparisons
- Comparative Scaling
- Categorical Scaling
- Guttman Scale
Comparative scaling
For example: Rate Burger King in comparison to McDonald's
• Excellent
• Very good
• Good
• Both are the same
• Poor
• Very poor
A comparative rating scale allows the researcher to interpret the resulting data in relation to another
company or product.
Categorical scaling
• In our running MDBS-R example, testtakers might be given 30 index cards on each of which is printed
one of the 30 items. Testtakers would be asked to sort the cards into three piles: those behaviors that
are never justified, those that are sometimes justified, and those that are always justified.
• MDBS-R - Morally Debatable Behaviors Scale-Revised
Guttman scale
• Do you AGREE or DISAGREE with each of the following:
a. All people should have the right to decide whether they wish to end their lives.
b. People who are terminally ill and in pain should have the option to have a doctor assist them in
ending their lives.
c. People should have the option to sign away the use of artificial life-support equipment before they
become seriously ill.
d. People have the right to a comfortable life.
Writing Items
- Selected-response format
• Multiple choice
• Matching
• True/False
- Constructed-response format
• Completion item
• Short answer
• Essay
Item A
Stem- A psychological test, an interview, and a case study are
CORRECT ALTERNATIVE
a. psychological assessment tools
DISTRACTORS
b. standardized behavioral samples
c. reliable assessment instruments
d. theory-linked measures
Item B
A good multiple-choice item in an achievement test
a. has one correct alternative
b. has grammatically parallel alternatives
c. has alternatives of similar length
d. has alternatives that fit grammatically with the stem
e. includes as much of the item as possible in the stem to avoid unnecessary repetition
f. avoids ridiculous distractors
g. is not excessively long
h. all of the above
i. none of the above
• Writing items for computer administration
- Item bank - a relatively large and easily accessible collection of test questions
- Item branching - the ability of the computer to tailor the content and order of presentation of test items
on the basis of responses to previous items.
Scoring Items
• Cumulative scoring- the higher the score on the test, the higher the test taker is on the ability, trait, or
other characteristic that the test purports to measure
• Class or category scoring- test taker responses earn credit toward placement in a particular class or
category with other test takers whose pattern of responses is presumably similar in some way
• Ipsative scoring- comparing a test taker's score on one scale within a test to another scale within that
same test.
Item Analysis
Other considerations
• Guessing
• Item fairness
• Speed tests
Item Difficulty Index
• It is a measure of the proportion of examinees who answered the item correctly; for this reason, it is
frequently called the p-value. Because it is the proportion of examinees who got the item right, the p-value
might more properly be called the item easiness index rather than the item difficulty index.
• It can range between 0.0 and 1.0, with a higher value indicating that a greater proportion of
examinees responded to the item correctly, and that it was thus an easier item.
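The p-value is simply the proportion of correct answers, which a few lines of code can show. The response lists below are made up for illustration (1 = correct, 0 = incorrect), and `item_difficulty` is a hypothetical helper name.

```python
def item_difficulty(responses):
    """Proportion of examinees who answered the item correctly (the p-value)."""
    return sum(responses) / len(responses)

# Hypothetical responses from ten examinees: 1 = correct, 0 = incorrect.
easy_item = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]   # 8 of 10 correct
hard_item = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]   # 2 of 10 correct

print(item_difficulty(easy_item))  # 0.8
print(item_difficulty(hard_item))  # 0.2
```

The item with the higher p-value is the easier one, which is why "item easiness" is arguably the better label.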
Item Reliability Index
• Provides an indication of the internal consistency of a test
• The higher this index, the greater the test's internal consistency.
• This index is equal to the product of the item-score standard deviation (s) and the correlation (r)
between the item score and the total test score.
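The product s × r can be worked through on a small example. This is a sketch under stated assumptions: the five item scores and total scores are invented, the population standard deviation (`statistics.pstdev`) is assumed for s, and the total score is assumed to include the item itself; a textbook may use slightly different conventions.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_variation / (spread_x * spread_y)

def item_reliability_index(item_scores, total_scores):
    """Item-score standard deviation (s) times the item-total correlation (r)."""
    s = statistics.pstdev(item_scores)   # assumption: population SD
    r = pearson_r(item_scores, total_scores)
    return s * r

# Hypothetical data for five examinees.
item_scores = [1, 1, 0, 1, 0]    # 1 = correct, 0 = incorrect on this item
total_scores = [9, 8, 4, 7, 3]   # total test score for the same examinees

print(round(item_reliability_index(item_scores, total_scores), 3))
```

In this made-up data, examinees who got the item right also have the higher totals, so r is high and the index is well above zero.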
Item Validity Index
• Provides an indication of the degree to which a test is measuring what it purports to measure
• The higher the item-validity index, the greater the test's criterion-related validity.
Item Discrimination Index
• A measure of how well an item is able to distinguish between examinees who are knowledgeable and
those who are not, that is, between masters and non-masters.
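One common way to compute a discrimination index, not stated in these notes and offered here as an assumption, is d = (U - L) / n: the number of correct answers in the upper-scoring group minus the number in the lower-scoring group, divided by the size of each group. The sketch below uses the top and bottom 27% of total scores, a conventional but adjustable choice; all data are invented.

```python
def discrimination_index(item_responses, total_scores, fraction=0.27):
    """d = (U - L) / n for the top and bottom `fraction` of examinees by total score."""
    n = max(1, round(len(total_scores) * fraction))   # size of each group
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower_group = order[:n]    # lowest total scores
    upper_group = order[-n:]   # highest total scores
    u = sum(item_responses[i] for i in upper_group)   # correct in upper group
    l = sum(item_responses[i] for i in lower_group)   # correct in lower group
    return (u - l) / n

# Hypothetical data for ten examinees.
total_scores = [95, 90, 88, 75, 70, 65, 60, 45, 40, 35]
item_responses = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]   # 1 = correct on this item

print(round(discrimination_index(item_responses, total_scores), 2))  # 0.67
```

A value near +1 means mostly high scorers got the item right; a value near 0 or below suggests the item does not separate masters from non-masters.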
Potential Areas of Exploration by Means of Qualitative Item Analysis
- Topics of possible interest to test users.
- The questions may be raised either orally or in writing shortly after a test administration.
- Depending on the objectives of the test user, the questions could be placed in other formats, such as
true/false or multiple choice.
TEST REVISION
• For new test development
• For an existing test
Cross-validation - the revalidation of a test on a sample of testtakers other than those on whom test
performance was originally found to be a valid predictor of some criterion
Co-validation - a test validation process conducted on two or more tests using the same sample of
testtakers
