Chap008
Chapter 08
Test Development
8-1
Copyright © 2018 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of
McGraw-Hill Education.
Chapter 08 - Test Development
4. Asexuality
A. is a sexual orientation.
B. is not a sexual orientation.
C. is considered by some to be a sexual orientation and by others not.
D. was de-listed as a sexual orientation in DSM-5.
5. An online community of asexual individuals that has become a source of recruitment
of subjects for asexuality research is called the
A. Asexual Visibility and Education Network.
B. Friends of Asexuality.
C. League of Asexual and Non-Sexual Individuals.
D. American Society of Affiliated Individuals for Asexuality.
6. A disadvantage of recruiting asexual research subjects from a single online community is
that
A. the persons belonging to the online community may constitute a unique group within the
asexual population.
B. the persons belonging to the online community have already acknowledged their asexuality
as an identity.
C. asexual individuals who do not belong to the community will be systematically omitted.
D. All of these.
7. In response to the need for an instrument to help identify individuals who have experienced
a lifelong lack of sexual attraction, but who have never heard the term "asexual," Yule et al.
(2015) developed a test called the
A. Asexuality Evaluation Schedule.
B. Asexuality Identification Scale.
C. Asexual Research Subject Selector.
D. None of these
9. The test of asexuality developed by Yule et al. (2015) contains ___ items.
A. 12
B. 18
C. 36
D. 48
10. Brotto and Yule reported that their measure of asexuality was developed in four
stages. Which best characterizes Stage 1?
A. literature search for definitions of asexuality
B. development of open-ended questions
C. literature search for correlates of asexuality
D. writing and submission of a research grant request
11. Brotto and Yule reported that their measure of asexuality was developed in four
stages. Which best characterizes what they did during Stages 2 and 3?
A. analysis of variance
B. regression analysis
C. factor analysis
D. meta-analysis
12. In the course of developing their asexuality measure, Brotto and Yule were able to identify
about ____% of self-identified asexual individuals.
A. 88
B. 93
C. 94
D. 97
13. In order to determine whether their new measure of asexuality was useful over and above
already-available measures of sexual orientation, Brotto and Yule compared it to a previously
established measure of sexual orientation called the
A. Sexual Desire Inventory.
B. Solitary Desire subscale of the Sexual Desire Inventory.
C. Abernathy Measure of Sexual Orientation.
D. Klein Scale.
14. Brotto and Yule established the discriminant validity of their measure of asexuality by
comparing scores on it with scores on
A. the Childhood Trauma Questionnaire.
B. the Short-Form Inventory of Interpersonal Problems-Circumplex scales.
C. the Big-Five Inventory.
D. All of these
15. According to Brotto and Yule, their new measure of asexuality performed satisfactorily
on
A. a measure of incremental validity.
B. a measure of convergent validity.
C. a measure of discriminant validity.
D. All of these
16. Brotto and Yule expressed their belief that their new measure of asexuality
A. does not depend on one's self-identification as asexual.
B. is not capable of identifying the individual who exhibits characteristics of a lifelong lack of
sexual attraction in the absence of personal distress.
C. should be used with caution as a tool of recruitment with members of the asexuality
population.
D. All of these
17. An analysis of a test's items may take many forms. Thinking of the descriptions cited in
your text, which is NOT one of those forms?
A. item validity analysis
B. item discrimination analysis
C. item tryout analysis
D. item reliability analysis
18. As illustrated in the sample item-characteristic curve published in your textbook, the
vertical axis on the graph lists the
A. values of the score on the test ranging from 0 to 100.
B. values of the characteristic of the items on a scale of 1 to 10.
C. heteroscedasticity of the item curve in values ranging from 0 to infinity.
D. probability of correct response in values ranging from 0 to 1.
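An item-characteristic curve of the kind referred to in item 18 can be sketched in a few lines of Python. The sketch below assumes a two-parameter logistic model, which is one common IRT form and not necessarily the one plotted in the textbook. The function name `icc_2pl` and the parameter names `a` (discrimination) and `b` (difficulty) are illustrative choices.

```python
import math


def icc_2pl(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of a correct response under an assumed 2PL model.

    theta: testtaker's level on the latent trait (horizontal axis).
    a: item discrimination (slope of the curve at its midpoint).
    b: item difficulty (trait level at which the probability is 0.5).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Evaluate the curve at a few moderate trait levels.
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, round(icc_2pl(theta), 3))
```

For moderate trait levels the returned value always lies strictly between 0 and 1, which is why the vertical axis of such a curve is read as a probability.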
20. Item banks
A. were once a profit center for the Wells Fargo Company.
B. originated as a result of investments made by Morgan-Stanley.
C. originated as a result of investments made by Morgan Freeman.
D. None of these
28. Guttman scales
A. are typically used with nominal categories.
B. typically are constructed so that agreement with one statement may predict agreement with
another statement.
C. typically are constructed so that agreement with one statement should not be correlated
with agreement with any other statement.
D. were originally developed by a Peace Corps task force.
30. Test items that contain alternatives with five points ranging from "strongly agree" to
"strongly disagree" are characterized as using this approach to scaling:
A. Guttman scaling.
B. Likert scaling.
C. Nielson scaling.
D. Opinion scaling.
31. Ideally, the first draft of a test should include at least how many items as compared with
the final version of the test?
A. about twice the number of the final version
B. about half the number of the final version
C. about three times the number of the final version
D. roughly the same number as the final version
38. According to your textbook, the minimum sample for a test tryout is
A. one-half of the number of testtakers in the standardization sample.
B. 25 testtakers.
C. 50 testtakers.
D. 500 testtakers.
39. An ADVANTAGE of applying item response theory (IRT) in test development is that
A. the principles underlying IRT make its application easy and appealing.
B. sample sizes used to test the utility of test items can be relatively small.
C. assumptions underlying IRT usage are weak.
D. item statistics are independent of the samples administered the test.
40. If 100 people take a test and 20 of those testtakers answer a particular item correctly, then
the p value of the item is
A. .25.
B. .20.
C. .40.
D. .04.
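The item-difficulty index in item 40 is the proportion of testtakers who answer an item correctly. A minimal Python sketch of that arithmetic follows. The function name `item_difficulty` is hypothetical and chosen only for illustration.

```python
def item_difficulty(num_correct: int, num_testtakers: int) -> float:
    """Return the item-difficulty index p: the proportion answering correctly."""
    if num_testtakers <= 0:
        raise ValueError("num_testtakers must be positive")
    if not 0 <= num_correct <= num_testtakers:
        raise ValueError("num_correct must be between 0 and num_testtakers")
    return num_correct / num_testtakers


# 20 of 100 testtakers answered the item correctly.
print(item_difficulty(20, 100))  # prints 0.2
```

Note that a larger p means an easier item, since more testtakers answered it correctly.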
41. Which statement best describes the relationship between item difficulty and a "good"
item?
A. The difficulty level is not a factor in determining a "good" item.
B. An item with a high difficulty level is likely to be "good."
C. An item with a mid-range difficulty level is likely to be "good."
D. An item with a low difficulty level is likely to be "good."
46. In item analysis, the term item endorsement refers to the percent of testtakers who
A. responded correctly to a particular item.
B. indicate that they agree with a particular item.
C. passed the item on a pass/fail test of ability.
D. consented to answer an optional item.
54. As a distribution of scores gets flatter, what happens to the optimal boundary line for
determining higher- and lower-scoring groups for item-discrimination indices?
A. the optimal boundary line gets smaller
B. the optimal boundary line gets larger
C. the optimal boundary line does not change
D. the optimal boundary line ceases to be optimal
55. The greater the magnitude of the item-discrimination index, the more testtakers in the
higher-scoring group answered the item correctly, as compared to testtakers
A. who served as the non-test-taking control group.
B. in the lower-scoring group.
C. who participated in the test standardization.
D. None of these
58. What is the value of the item-discrimination index for an item that all the students in the
higher-scoring group answered correctly but that no one in the lower-scoring group answered
correctly?
A. -1
B. +1
C. .50
D. .25
59. What is the value of the item-discrimination index for an item answered correctly by an
equal number of students in the higher- and lower-scoring groups?
A. -1
B. +1
C. .50
D. 0
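The two boundary cases in items 58 and 59 can be checked with a small Python sketch of the item-discrimination index, taken here as d = (U - L) / n, where U and L are the numbers of testtakers in the higher- and lower-scoring groups who answered correctly and n is the size of each group. The function name `item_discrimination` and the group size of 10 are assumptions made for illustration.

```python
def item_discrimination(upper_correct: int, lower_correct: int, group_size: int) -> float:
    """Return d = (U - L) / n for equal-sized upper and lower scoring groups."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    for count in (upper_correct, lower_correct):
        if not 0 <= count <= group_size:
            raise ValueError("correct counts must be between 0 and group_size")
    return (upper_correct - lower_correct) / group_size


# Everyone in the higher-scoring group correct, no one in the lower-scoring group.
print(item_discrimination(10, 0, 10))  # prints 1.0

# Equal numbers correct in both groups.
print(item_discrimination(6, 6, 10))   # prints 0.0
```

A negative value would indicate that more lower-scoring than higher-scoring testtakers answered the item correctly, which usually flags the item for review.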
66. In general, what can be said about an item analysis of a speeded test?
A. Results are often misleading and difficult to interpret.
B. Item-difficulty levels are higher toward the end of the test.
C. Item-discrimination levels are higher for later items.
D. All of these
68. Ability tests are typically standardized on a sample that is representative of the general
population and selected on the basis of variables such as
A. age.
B. gender.
C. geographic region.
D. All of these
70. Which of the following conditions may lead to the decision to revise a psychological or
educational test?
A. item content, including the vocabulary used in instructions and pictures, has become dated
B. test norms no longer represent the population for which the test is designed
C. reliability and validity of a test can be improved by a revision
D. All of these
71. As part of the test development process, a test revision may entail
A. re-wording, deletion, or development of new items.
B. development of a new edition of a test.
C. the reprinting of a test.
D. Both re-wording, deletion, or development of new items and development of a new edition
of a test.
73. Co-validation is:
A. highly recommended and encouraged by test professionals.
B. also referred to as co-norming.
C. a strategy that can save time and money for the test publisher.
D. Both also referred to as co-norming and a strategy that can save time and money for the
test publisher.
74. During the norming of a new intelligence test, a test publisher administers to all of the
testtakers not only the new intelligence test, but a vision test using an eye chart. The publisher
has engaged in
A. test conceptualization.
B. cross-validation.
C. shared validation.
D. None of these
76. The term used to describe the decrease in item validities that typically occurs during
cross-validation is
A. validity detriment.
B. validity decrement.
C. validity shrinkage.
D. cross-validation devaluation.
77. A test manual for a commercially prepared test should ideally include
A. a description of the test development procedures used.
B. test-retest reliability data.
C. internal-consistency reliability data.
D. All of these
78. A student raises concern that a professor has given different grades to two essay answers
that are very similar. From a psychometric perspective, the student is expressing concerns
about
A. criterion-related validity.
B. rater error.
C. test-retest reliability.
D. parallel forms reliability.
79. A student complains that a midterm examination did not include items from a particular
in-class lecture. From a psychometric perspective, the student is expressing concern about
the midterm's
A. test-retest reliability.
B. internal consistency reliability.
C. content validity.
D. cross-validation.
80. A student makes the following complaint after taking an exam: "I spent all night studying
Chapter 7 and there wasn't even one test question from that chapter!" From a psychometric
perspective, this student is concerned about the exam's
A. error variance.
B. test-retest reliability.
C. rater error.
D. None of these
81. A professor who asks a colleague to re-grade a set of essay questions is most likely trying
to address or prevent concerns about:
A. rater error.
B. validity shrinkage.
C. criterion-related validity.
D. test-retest reliability.
82. Most classroom tests developed by instructors for use in their own classroom are
A. subjected to formal procedures of psychometric evaluation.
B. only evaluated formally for content validity.
C. evaluated informally for their psychometric properties.
D. used without modification, year after year, until retirement or death.
84. Which scaling method entails a process by which measures of item difficulty are obtained
from samples of testtakers who vary in ability?
A. difficulty scaling
B. absolute scaling
C. content scaling
D. sample-contingent scaling
86. Which is NOT a typical question that is raised and answered during the test
conceptualization stage of test development?
A. What is the objective of the test?
B. Is there a need for the test?
C. How valid are the items on the test?
D. What types of responses will be required of the testtaker?
90. When writing items for a test, a test developer would be well advised to incorporate
A. knowledge acquired from Cohen & Swerdlik (2017).
B. knowledge from information supplied in scholarly journals.
C. interviews with experts.
D. All of these
93. An advantage of using a true-false item format over a multiple-choice item format in a
teacher-made test designed for classroom use is
A. true-false items are applicable to a wider range of subject areas.
B. true-false items are easier to write.
C. true-false items reduce the odds of a correct answer as the result of guessing.
D. true-false items will never become dated.
97. A decision is made to use only a few subjects per item during the test tryout phase of a
test's construction. This decision is MOST LIKELY to lead to
A. "phantom factors" during test construction.
B. "phantom factors" during the test administration.
C. "phantom factors" during factor analysis.
D. "phantom deposits" in the test author's royalty account.
98. A DISADVANTAGE of applying classical test theory (CTT) in test development is that
A. the number of testtakers in the sample must be very large.
B. all CTT-based statistics are sample-dependent.
C. assumptions underlying CTT use are weak.
D. All of these
107. An analysis of item alternatives for a multiple-choice test can yield information about
A. the effectiveness of distracter choices.
B. which items are in need of revision.
C. testtaker response patterns.
D. All of these
111. On a particular test, men and women tend to have the same total score. Men and women
do, however, tend to exhibit different response patterns to specific items. A reasonable
conclusion is that the test is:
A. unreliable.
B. invalid.
C. biased.
D. patently unfair.
113. As the result of a sensitivity review, items containing __________ may be eliminated
from a test.
A. offensive language
B. stereotypes
C. unfair reference to situations
D. All of these
115. As part of the process of test development, the term test revision BEST refers to the
A. rewording, deletion, or development of new items.
B. development of a completely new test.
C. reprinting of a test after a previous edition has sold out.
D. Both rewording, deletion, or development of new items and development of a completely
new test.
119. A test developer designs a test for the sole purpose of identifying the most highly skilled
individuals among those tested. During the test revision stage of test development, the test
developer will be particularly interested in
A. item bias.
B. item discrimination.
C. item reliability.
D. item validity.
120. In creating a test designed to measure personality constructs, the test developer's first
step would BEST be to
A. determine which items would lead to socially desirable responses.
B. create a large pool of potential items.
C. define the construct or constructs being measured.
D. select a representative sample of testtakers for test tryout.
1. strongly agree.
2. agree.
3. unsure.
4. disagree.
5. strongly disagree.
123. If 50 students were administered a classroom test, how many would be included in each
group for the purpose of calculating d, the item-discrimination index?
A. 25
B. 10
C. 13
D. 17
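The group size asked about in item 123 depends on which boundary is used to define the higher- and lower-scoring groups. The Python sketch below assumes the commonly cited 27% boundary for roughly normal score distributions. The function name `extreme_group_size` is hypothetical, and the fraction is a parameter because flatter distributions are usually paired with a larger boundary. The result is left unrounded, since how to round a fractional group size is a judgment call.

```python
def extreme_group_size(num_testtakers: int, fraction: float = 0.27) -> float:
    """Return the unrounded size of each extreme group for computing d.

    fraction: share of testtakers placed in each of the upper and lower
    groups (assumed here to default to 0.27).
    """
    if num_testtakers <= 0:
        raise ValueError("num_testtakers must be positive")
    if not 0.0 < fraction <= 0.5:
        raise ValueError("fraction must be greater than 0 and at most 0.5")
    return num_testtakers * fraction


# Unrounded size of each group for a class of 50 under the assumed 27% boundary.
print(extreme_group_size(50))
```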
124. The Rokeach values measure involves presenting the subject with index cards, on each
of which a single value is listed. Testtakers are asked to place the cards in order of their own
concern about each of the values. This procedure BEST exemplifies
A. multidimensional scaling.
B. Likert scaling.
C. comparative scaling.
D. Murray scaling.
125. When analyzing a particular item's discriminative abilities for an ability test, the test
developer typically compares the responses to the item to
A. the highest and lowest scorers on the test.
B. the highest and middle scorers on the test.
C. the performance on the test of minority groups to rule out any possible bias.
D. testtakers from predefined age groups to rule out any possible age discrimination.
129. You are interested in developing a test for social adjustment in a college fraternity or
sorority. You begin by interviewing persons who had graduated from college after having
been a member of a fraternity or sorority for at least 2 years. Which stage of the test
development process BEST describes the stage that you are in?
A. the test-tryout stage
B. the pilot work stage
C. the test construction stage
D. None of these
130. These tests are often used for the purpose of licensing persons in professions. The tests
referred to here are
A. pilot tests.
B. norm-referenced tests.
C. criterion-referenced tests.
D. Guttman scales.
131. Which of the following terms BEST describes the preliminary research surrounding the
creation of a prototype of a test?
A. pilot work.
B. pilot study.
C. pilot research.
D. All of these
132. In his article entitled "A Method of Scaling Psychological and Educational Tests," L. L.
Thurstone introduced absolute scaling which was a
A. procedure for obtaining a measure of item validity.
B. procedure for obtaining a measure of item difficulty.
C. procedure for deriving equal-appearing intervals.
D. procedure for divining item reliability.
133. As with the use of other rating scales, the use of Likert scales typically yields _______-
level data.
A. nominal
B. ordinal
C. interval
D. ratio
136. In contrast to scaling methods that employ indirect estimation, scaling methods that
employ direct estimation do not require:
A. writing two sets of items for parallel forms.
B. the use of the method of equal-appearing intervals.
C. transforming testtaker responses into some other scale.
D. indirect methods to interpret testtaker responses.
138. As described in the text, all of the following are elements of a matching item EXCEPT:
A. a column listing propositions.
B. a column listing responses.
C. a column listing premises.
D. a place to insert the correct number or letter choice.
139. The two columns of a matching item may contain different numbers of items because this
makes
A. the odds of cheating successfully on this type of item significantly less.
B. it more difficult to achieve a perfect score by guessing.
C. the role of chance a much greater factor than it would be otherwise.
D. it possible for testtakers to decline to respond to certain items.
141. A strategy for cheating on an examination entails one testtaker memorizing items and
later recalling and reciting them for the benefit of a future testtaker. This cheating strategy
may be countered by
A. a computer-tailored test administration to each testtaker.
B. a computer-randomized presentation of test items.
C. Both a computer-tailored test administration to each testtaker and a computer-randomized
presentation of test items.
D. None of these
142. On a true/false inventory, a respondent selects true for an item that reads, "I summer in
Tehran." The individual scoring the test would BEST interpret this response as indicative of
the fact that this respondent
A. is extremely eccentric with respect to choice of time shares.
B. requires more sensation-seeking than Cape Cod has to offer.
C. is responding randomly to test items.
D. None of these
143. Jana takes a personality test administered by the "True Compatibility Dating Service."
According to the personalized, computerized personality profile that results, Jana learns that
her need for exhibitionism is much greater than her need for stability. Since the test analyzes
data only with regard to Jana, and no other client of the dating service, it may be assumed that
the test was scored using
A. a diagnostic model.
B. a cumulative model.
C. an ipsative model of scoring.
D. truly compatible models.
144. A math test developer is interested in deriving an index of the difficulty of the average
item for his math test. As his consultant on test development, you advise him that this index
could be obtained by:
A. identifying the item deemed to be average in difficulty and then deriving an item-difficulty
index for that item.
B. summing the item-difficulty indices for all test items and then dividing by the total number
of items on the test.
C. dividing the total number of items on the test by the average item-difficulty index.
D. raising that very same question to a more knowledgeable test development consultant.
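An index of the difficulty of the average item, as discussed in item 144, is the sum of the item-difficulty indices divided by the number of items. A minimal Python sketch follows. The function name `mean_item_difficulty` and the three sample p values are made up for illustration.

```python
def mean_item_difficulty(p_values: list[float]) -> float:
    """Return the average item-difficulty index across all items on a test."""
    if not p_values:
        raise ValueError("p_values must contain at least one item")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("each p value must lie between 0 and 1")
    return sum(p_values) / len(p_values)


# Hypothetical item-difficulty indices for a three-item math test.
print(mean_item_difficulty([0.2, 0.5, 0.8]))
```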
145. A test developer of multiple-choice ability tests reviews data from a recent test
administration. She discovers that testtakers who scored very high on the test as a whole all
responded to item 13 with the same incorrect choice. Accordingly, the test developer
A. assumes that members of the high-scoring group are making some sort of unintended
interpretation of item 13.
B. plans to interview members of the high-scoring group to understand the basis for their
choice.
C. Both assumes that members of the high-scoring group are making some sort of unintended
interpretation of item 13 and plans to interview members of the high-scoring group to
understand the basis for their choice.
D. should remove item 13 from the test and place in its stead a note that reads: "Go to Item
14."
149. The reason latent-trait theory is so-named has to do with the presumption that
A. latent traits exist in males and females to the same degree.
B. whatever the test is measuring is multidimensional in nature.
C. the variable being measured is never directly measurable itself.
D. None of these
150. Test scores measuring latent traits can, in theory at least, take on values ranging from
A. 0 to infinity.
B. negative infinity to positive infinity.
C. 0 to one million.
D. negative one million to positive one million.
151. On the item characteristic curves for a test of ability, a large number of items biased in
favor of male testtakers is found to coexist with the exact same number of items biased in
favor of female testtakers. Based on these findings, it would be reasonable for the test
developer to claim that the test
A. measures the same ability in the two groups.
B. is a fair test as any observed bias balances out.
C. demonstrates gender equality for the ability measured.
D. None of these
153. Possible applications of IRT were discussed in your textbook. Which of the following is
NOT one of those possible applications?
A. determining measurement equivalence across testtaker populations
B. identifying a common metric among several tests measuring the same construct
C. evaluating existing tests for the purpose of mapping test revisions
D. developing item banks
154. To increase the precision of a test, test developers may have to
A. increase the number of items.
B. increase the number of response options.
C. Both increase the number of items and increase the number of response options.
D. None of these
155. When a test is translated from one language in one culture to another language in another
culture, ______ can help ensure that the original test and the translated test are reasonably
equivalent and tapping the same construct.
A. a translator
B. IRT
C. bilingual people who are experts on the two cultures
D. All of these
156. A test item functions differently in one group of testtakers as compared to another group
of testtakers known to have the same level of an underlying trait. This phenomenon is known
as:
A. dysfunctional item syndrome.
B. DIF.
C. DIF item difference.
D. DIF item incongruity.
159. When testing is conducted by means of a computer within a CAT context, it means that
A. a testtaker's response to one item may automatically trigger what item will be presented
next.
B. testing may be terminated based on some pre-set number of consecutive item failures.
C. testing may be terminated based on some pre-set, maximum number of items being
administered.
D. All of these
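The CAT behaviors listed in the options above (item selection driven by the previous response, and termination rules) can be sketched as a toy loop. This is an illustrative sketch under stated assumptions, not an operational CAT algorithm: the fixed-step ability update, the item-bank layout (item id mapped to a difficulty value), and all names are hypothetical.

```python
def run_cat(bank, answer_fn, max_items=10, max_consecutive_failures=3, step=0.5):
    """Toy computerized adaptive test.

    bank      -- dict mapping item id -> difficulty (assumed layout)
    answer_fn -- callable(item_id, difficulty) -> True if the response is correct
    """
    theta = 0.0                    # running ability estimate (crude, assumed update rule)
    remaining = dict(bank)
    administered = []
    failures_in_a_row = 0

    # Stop when the pre-set maximum number of items has been administered.
    while remaining and len(administered) < max_items:
        # The previous response moved theta, which determines the next item.
        item_id = min(remaining, key=lambda k: abs(remaining[k] - theta))
        difficulty = remaining.pop(item_id)
        correct = answer_fn(item_id, difficulty)
        administered.append((item_id, correct))

        if correct:
            theta += step
            failures_in_a_row = 0
        else:
            theta -= step
            failures_in_a_row += 1
            # Stop after a pre-set number of consecutive item failures.
            if failures_in_a_row >= max_consecutive_failures:
                break

    return theta, administered


# Hypothetical bank and a testtaker who passes items with difficulty <= 0.5.
bank = {"a": -1.0, "b": 0.0, "c": 1.0, "d": 2.0}
theta, administered = run_cat(bank, lambda item_id, difficulty: difficulty <= 0.5)
```

Operational CAT systems typically estimate ability with an IRT model rather than a fixed step; the step update here only keeps the sketch short.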
160. As mentioned in the text, CAT is available on a wide array of platforms including
A. the Internet.
B. Xbox.
C. PlayStation.
D. All of these
A. as theta increases, the probability of a response scored correct increases.
B. as theta decreases, the probability of a response scored correct increases.
C. as theta increases, the probability of a response scored correct decreases.
D. None of these
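The options above concern how the probability of a response scored correct changes with theta. One way to explore that relationship is with an item characteristic curve. The sketch below assumes a two-parameter logistic model with a positive discrimination parameter; the model choice and the parameter names a (discrimination) and b (difficulty) are assumptions for illustration and are not taken from the item itself.

```python
import math


def p_correct(theta, a=1.0, b=0.0):
    """Two-parameter logistic item characteristic curve.

    theta -- testtaker's level on the latent trait
    a     -- item discrimination (assumed positive here)
    b     -- item difficulty (the theta at which the probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Probabilities at a few trait levels for one hypothetical item.
curve = [p_correct(theta, a=1.2, b=0.0) for theta in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Evaluating the function across a range of theta values shows the direction and steepness of the curve for any chosen a and b.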
162. The inspiration to create a new test may come from many varied sources. Thinking of the
illustrative descriptions of inspiration cited in your text, which of the following is NOT a
possible source of inspiration for the creation of a new test?
A. an emerging social phenomenon suggests the need for a psychological test
B. legislation has been passed ordering the creation of a new psychological test
C. a review of the literature suggests a need for a new psychological test
D. a test developer thinks "there is a need for this sort of test"
163. One of the questions that the developer of a new test must answer is, "How will the test
be administered?" The answer to this question may be
A. the test will be individually administered.
B. the test will be group administered.
C. the test will be individually or group administered.
D. None of these
164. One of the questions that the developer of a new test must answer is, "Should more than
one form of the test be developed?" In answering this question, a primary consideration is
A. development costs.
B. test content.
C. test reliability.
D. item discrimination.
168. A close friend, who is now a beauty school dropout, is heard to complain: "I spent all
night studying 'Shampoo' for the final examination and there was not a single question on that
subject!" As a budding expert in testing and assessment you hear that complaint as:
A. "I have a problem with that test's content validity!"
B. "There was excessive error variance in the test administration procedures!"
C. "The instructor should have paid more attention to the test's construct validity!"
D. "Now I am going to have to reconsider a career as a tanning technician!"
169. If all raw scores on a test are to be converted to scores that range only from 1 to 9, the
resulting scale is referred to as
A. a unidimensional scale.
B. a stanine scale.
C. a multidimensional scale.
D. None of these.
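A conversion of raw scores to a 1-to-9 standard-score scale can be sketched as follows. This is a minimal sketch assuming a linear transformation to a scale with a mean of 5 and a standard deviation of about 2, which approximates stanines when raw scores are roughly normally distributed; the function name is hypothetical, and a percentile-band conversion would be another way to do it.

```python
import math
from statistics import mean, pstdev


def to_stanines(raw_scores):
    """Convert raw scores to whole numbers from 1 to 9 (mean 5, SD about 2)."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)
    if sd == 0:
        # No variability: every score sits at the middle of the scale.
        return [5] * len(raw_scores)

    result = []
    for x in raw_scores:
        z = (x - m) / sd
        # Rescale, round half up, then clamp to the 1..9 range.
        s = math.floor(2 * z + 5 + 0.5)
        result.append(min(9, max(1, s)))
    return result


# Hypothetical raw scores.
stanines = to_stanines([10, 20, 30, 40, 50])  # → [2, 4, 5, 6, 8]
```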
172. Test item writers must keep many considerations in mind. Which of the following is
NOT typically one of those considerations?
A. Will the test be administered by an instructor or a teaching assistant?
B. Which item format or formats should be employed?
C. How many items should be written in total?
D. What range of content should the items cover?
173. A test developer is designing a standardized test using a multiple-choice format. The
final form of the test will contain 50 items. It would be advisable for the first draft of this test
to contain, at least, how many items?
A. 50
B. 100
C. 150
D. 25
174. A test item written in a multiple-choice format has three elements. Which of the
following is NOT one of those elements?
A. foil
B. stem
C. leaf
D. correct option
"I am going to ace this course in psychological testing and assessment." Circle TRUE or
FALSE according to your own belief.
177. A test developer has created a pool of 30 items and is ready for a test tryout. At a
minimum, how many subjects should the test be administered to?
A. 60
B. 120
C. 150
D. 180
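Tryout sample size is often planned as a number of subjects per item. The sketch below assumes the commonly cited rule of thumb of no fewer than 5, and preferably as many as 10, subjects per item; the function name and default values are illustrative assumptions.

```python
def tryout_sample_range(n_items, min_per_item=5, preferred_per_item=10):
    """Return (minimum, preferred) tryout sample sizes for an item pool.

    The per-item ratios are rule-of-thumb assumptions, not fixed requirements.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_items * min_per_item, n_items * preferred_per_item


minimum, preferred = tryout_sample_range(30)  # → (150, 300)
```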
178. Test developers have at their disposal a number of statistical tools that may be applied
when selecting items for use on a test. In Chapter 8's Meet an Assessment Professional, Dr.
Scott Birkeland made reference to two such techniques. One was a measure of item
discrimination, and the other was a measure of item
A. reliability.
B. utility.
C. difficulty.
D. variance.
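Two item statistics often used when selecting items can be computed from tryout data as sketched below: the proportion of testtakers answering an item correctly, and an upper-group versus lower-group discrimination index. This is a minimal sketch; the function names, the 0/1 scoring layout, and the 27% group size are assumptions for illustration and are not drawn from Dr. Birkeland's remarks.

```python
def item_difficulty(item_correct):
    """Proportion of testtakers who answered the item correctly (often written p)."""
    return sum(item_correct) / len(item_correct)


def item_discrimination(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index (often written d).

    item_correct -- list of 1 (correct) or 0 (incorrect) for one item
    total_scores -- total test scores, in the same testtaker order
    fraction     -- share of testtakers in each extreme group (assumed 27%)
    """
    n = max(1, round(len(total_scores) * fraction))
    # Testtaker indices ordered from lowest to highest total score.
    ranked = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    lower, upper = ranked[:n], ranked[-n:]
    upper_correct = sum(item_correct[i] for i in upper)
    lower_correct = sum(item_correct[i] for i in lower)
    return (upper_correct - lower_correct) / n


# Hypothetical tryout data for one item and ten testtakers.
item_correct = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
total_scores = [50, 48, 45, 20, 18, 40, 22, 15, 44, 25]
p = item_difficulty(item_correct)                    # → 0.5
d = item_discrimination(item_correct, total_scores)  # → 1.0
```

With this index, values near +1 mean the item separates high and low scorers well, values near 0 mean it does not, and negative values mean low scorers outperformed high scorers on the item.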