Assessment of University Students' Critical Thinking: Next Generation Performance Assessment
Julián P. Mariño
Universidad de los Andes, Facultad de Ciencias, Colombia
INTRODUCTION
the universe of situations in which the skills were enacted. Taken together, the
job analysis represented both the universe of tasks performed by patrolmen
and the universe of situations in which they were performed. Situations
sampled from the job universe were built as high fidelity simulations of the
tasks patrolmen actually carry out (for additional examples of performance
tasks, see Seminara, Shavelson, & Parsons, 1967; particularly for pre-college
education, see Lane & Stone, 2006; Shavelson, Baxter, & Pine, 1991; NAEP,
2009, p. 107; see also Fu, Raizen, & Shavelson, 2009).
Performance assessment typically comprises a combination of selected-
response (“respondent”) and constructed-response (“operant”) items and tasks.
To qualify as a performance assessment, however, there has to be at least one
concrete high-fidelity simulation of a criterion situation (and preferably more).
The task(s) should be sampled from a specified universe of tasks comprising
the criterion situation. Think of all the possible situations in which you might
want to observe a novice automobile driver navigate; better yet, go out and
observe him or her. Then sample the situations. From the driver's performance
on the sample of tasks, infer his or her performance in the universe of driving
situations (with some degree of error).
We now turn to a description of the iPAL performance assessment framework.
Our focus is on performance tasks and not selected-response tasks because
there is plenty of textbook material on the latter (e.g., Secolsky & Denison,
2018). More specifically, we focus on "test domain" analysis and modeling,
task and response sampling, scoring, and other matters (e.g., implementation)
(for more details, see Shavelson et al., 2018).
The overarching constructs underlying iPAL are students' and, more generally,
citizens' capacity, when confronted with complex everyday life situations, to,
for example, think critically about the environment, or to take the perspective
of others in a business situation and communicate their ideas, beliefs, analyses,
and decisions precisely. Typically, a real-world event or "problem to be solved"
is presented in brief story format, accompanied by information of varying
reliability and relevance to the event or problem (see the example in the next
section). The story might require reasoning about a claim that admitting
migrants into a country raises the crime rate, with students combining several
data sources and doing some basic calculation to generate reliable and useful
information to address the claim. In another case, the problem might tap
thinking critically about the message underlying an art exhibition that portrays,
in pictures, sculptures, and literature, the tension between engineering's
contribution to progress and its negative environmental impact. At other times,
critical thinking may be called for in:
a. contexts in which thought processes are needed for solving problems and
making decisions in everyday life, and
b. contexts in which mental processes that must be developed by formal
instruction can be applied, including processes such as comparing, evaluating,
and justifying.
Task Universe
The universe of tasks demanding generic critical thinking comprises the
myriad complex situations of everyday life. iPAL samples such situations for
inclusion in performance tasks and more traditional items (e.g., multiple-choice).
A prime source of situations may be found in mass media (e.g., politics, the
environment, business, and science). The Airplane task developed for the CLA
(Shavelson, 2010), for example, was inspired by the report of an aircraft crash
at the Van Nuys Airport in Southern California.
Performance tasks are complex, often without a clear path toward a solution,
decision, or action; rather, there are tradeoffs. They admit more than one
feasible solution; when incorporated into an assessment, they have better and
worse solutions, decisions, actions, etc. The tasks are compelling in the sense
that they represent current everyday challenges that test-takers face or might be
expected to face as college graduates and, more generally, as citizens.
Once a construct such as critical thinking is chosen and a domain of life activ-
ities is selected, a search for possible assessment stories and specific tasks ensues
(an internet search is invaluable). Once chosen, the following is carried out:
1. An assessment story is built. The story is short and motivates the
performance assessment activities.
2. Assessment tasks are developed to include certain elements that invite
test-takers to think critically: trustworthiness, relevance, and proneness to
judgmental heuristics/bias.
3. A response is requested that involves bringing evidence to bear from the
information given on the problem, activity or situation in order to justify a
decision, recommendation, course of action, etc.
Information-Source Sampling
Materials such as newspaper articles, YouTube videos, and government reports
are sampled from real-world domains and constructed to vary the information
in the event. The information provided may be manipulated as to its:
Response Considerations
The result of critical thinking is typically a problem solution, a decision, a rec-
ommended course of action, a judgment or direct action. In all cases, two ele-
ments are required:
1. The problem solution (etc.) must be justified with the information avail-
able in the assessment. That is, a strong response would:
use trustworthy information and avoid less-than-trustworthy information,
use relevant information and avoid peripheral information, and
avoid judgmental and decision-making "traps" and biases.
To measure critical thinking and evaluate the measurement, we used a new
iPAL-developed computer-based assessment, Wind Turbines (WT). Evidence of
its reliability and validity was garnered from (1) student performance scores,
(2) a (semi-)structured cognitive interview based on the cognitive validity of
score interpretation, and (3) a standardized questionnaire with selected-response
questions measuring potential personal factors and contextual influence factors
in test performance.
Task Construct
The iPAL task "Wind Turbine" (WT) presented here is designed to measure
critical thinking in higher education students and graduates across domains of
study in four central facets:
TABLE 1
Descriptive Statistics and Reliability of Average Scores on the 6-Point Likert Scale in the Performance Task (N = 30)

Variable (total possible score) | Mean | Standard Deviation | Minimum | Maximum | Internal Consistency (alpha)
Overall (138) | 3.55 | 0.76 | 1.39 | 4.65 | 0.95
Rubric 1: Recognizing and evaluating the relevance (24) | 4.05 | 0.70 | 2.13 | 4.88 | 0.64
Rubric 2: Evaluating and decision making (54) | 3.52 | 0.83 | 1.23 | 5.06 | 0.90
Rubric 3: Recognizing and evaluating consequences (24) | 2.68 | 0.84 | 1.00 | 4.63 | 0.82
Rubric 4: Writing effectiveness (36) | 3.84 | 0.94 | 1.33 | 5.25 | 0.91
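The internal-consistency (alpha) values reported in Table 1 follow the standard Cronbach's alpha formula. A minimal sketch of that computation, using hypothetical 6-point ratings rather than the study's raw data:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    scores: list of rows, one per respondent, each a list of item scores.
    """
    k = len(scores[0])                      # number of items (dimensions)
    totals = [sum(row) for row in scores]   # each respondent's total score
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    return k / (k - 1) * (1 - sum(item_vars) / pvariance(totals))

# Hypothetical 6-point ratings for 4 respondents on 3 dimensions
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 6],
    [3, 3, 3],
]
print(round(cronbach_alpha(ratings), 2))  # prints 0.95
```

With ratings that rank respondents consistently across dimensions, alpha approaches 1; the rubric-level values in Table 1 would be obtained by restricting the matrix to each rubric's dimensions.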
Story. The test-taker takes the role of a student representative on the City
of Elmsen's council, along with six other members with varying backgrounds
and interests. The mayor has asked the student to evaluate Ventusa's offer to
build a park with six wind turbines. The wind park is in principle licensable,
and its use for the production of wind energy is generally desirable. Ventusa
prefers the Elmsen location because of its high wind-power density but, if
Elmsen declines, would consider building in the neighboring municipality
of Murn.
Task. As a member of the municipal council, the test-taker has been asked
by the mayor to prepare a statement for the next meeting, which will: (1)
compile the main arguments for and against the adoption of the Ventusa
proposal; (2) formulate a reasoned decision recommendation, supported with
evidence from the documents; and (3) suggest one or two pieces of additional
information that would increase confidence in the recommendation.
Sample
A total of 30 students from a German university participated in the pilot study.
Twenty-five participants were enrolled in a bachelor's degree program; five
were enrolled in a master's degree program. They ranged in age from 19 to 37
years, with an average age of 24 years (SD = 4.03 years). Since prior education
provides an indicator of general cognitive skills (e.g., Kim & Lalancette, 2013;
Schaap, Schmidt, & Verkoeijen, 2011), we characterized test-takers by their
prior education or vocational training: 6 participants had achieved a higher
education entrance qualification at a specialized secondary school with a focus
on economics, and 11 had already completed commercial vocational training.
Finally, 2 participants mentioned political commitment in their community, and
3 mentioned other social commitments, which could constitute relevant
experience in the context of solving problems such as those found in WT
(Table A2 in the appendix shows the descriptive statistics of the sample).
Scoring
Analytic dimensional scoring rubrics were developed based on the construct
definition of critical thinking and the classification of the different information
sources and arguments according to relevance, trustworthiness, and judgmental
heuristics/bias. The rubrics flexibly took into account test-takers' specific use
of information varying in trustworthiness and relevance, as well as their
reflection on, and use or avoidance of, heuristics that can facilitate or lead to
errors in judgment and decision making. The rubrics code the use of such
information in evaluating test-takers' justifications for their recommendations
for action, their evaluation of alternative courses of action, and their
identification of additional information needed. Trained raters score
performance along a set of 23 correlated dimensions that are clustered into
four rubrics as follows:
Cognitive Interview
Following the completion of WT, an 80-minute semi-structured cognitive
interview was conducted and taped for later transcription to: (a) examine how
participants handled the iPAL task; (b) examine students' self-reported
thought, decision-making, and response processes when performing the task;
and (c) assess how the provided information was considered, particularly in
terms of its relevance, trustworthiness, and judgmental error/bias. In the
interviews, we also gathered data on how the quantity and quality of each of
the different documents were perceived while completing the task, and
examined the participants' ability to judge information source quality in terms
of its validity and relevance. For every single piece of information given (see
Table A1 in the appendix), participants were asked whether they rated this
information as (ir)relevant or (in)valid.
The second part of the interview focused on individual factors that influence
the decision-making and response process, for example, personal beliefs,
relevant background knowledge on the story topic, and the extent to which
test-takers' knowledge or experience contradicted the sources provided; for
example, whether a general rejection of wind turbine construction for
ecological reasons resulted in the selective inclusion of the given information.
Particular attention was paid to whether students used their prior knowledge of
a particular domain (e.g., economics) to solve the tasks. Finally, participants
were given the opportunity to give personal feedback on the task, reflecting on
how personal skills might have influenced their processing of WT.
Questionnaire
After the cognitive interview, an additional short standardized questionnaire
of approximately 10 minutes was administered. The questionnaire focused on
participants': (1) ability to judge the quality of the information/sources, (2) use
of different decision-making heuristics, (3) general media use (queried in
detail), and (4) when and how they reached a decision. The instrument was
also used to describe decision-making.
Validity
We had a variety of evidence against which to examine interpretations of the
performance assessment scores including cognitive interviews and correlation
analyses of survey items.
Cognitive Interview: The 30 respondents were asked when they had come to
their decision on whether to support the wind turbines. Fourteen of the 30
respondents came to their decisions after reviewing the documents in the
library: eleven considered "for" and "against" arguments on the basis of the
presented facts and information, two decided as they wrote their responses, and
one decided after finishing writing. The qualitative analysis of the cognitive
interviews indicates that these respondents weighed the presented information
and documents as for and against arguments, exhibiting what we would define
as highly critical thinking. These 14 participants performed better (on average
over 3.7 points on a 6-point scale) than students who, according to the
cognitive interviews, thought less critically (on average < 3.5 points). One
respondent reported having based the decision on prior knowledge and
performed significantly worse than the 14 participants (2.46 points). Seven
respondents reported having decided directly after reading the task for the first
time (with or without the accompanying documents; average score of 3.42)
(see Table A3 in the appendix).
The rank-order correlations between the time when the decision was made and
performance scores were 0.25 for the total score and 0.06, 0.26, 0.30, and 0.22
for the four rubric scores. These correlations reflect the differences just
described among the groups of test-takers' mean scores.
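A rank-order (Spearman) correlation of the kind reported here is Pearson's correlation applied to ranks, with tied values assigned their average rank. A small self-contained sketch; the data below are illustrative, not the study's raw scores:

```python
def spearman(x, y):
    """Spearman rank-order correlation with average ranks for ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            # extend j over a run of tied values
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg_rank = (i + j) / 2 + 1     # average rank for the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg_rank
            i = j + 1
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative pairing of ordinal "when decided" categories with total scores
decision_time = [1, 2, 3, 3, 4]
total_score = [2.5, 3.4, 3.4, 3.7, 3.9]
print(round(spearman(decision_time, total_score), 2))  # prints 0.92
```

Because the decision-time categories are ordinal (Table A3), the rank-order coefficient is the appropriate choice over Pearson's correlation on the raw category codes.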
Finally, we found that test-takers (with the one exception noted) did not
make use of their prior knowledge when solving the tasks. Furthermore, there
were no statistically significant correlations between prior knowledge, study
domain, and WT performance.
Questionnaire Analyses: We expected measures of achievement (high
school grade-point average [GPA], mathematics grade, and German grade) to be
positively correlated with WT total scores; we found correlations of 0.20, 0.13,
and 0.15, respectively. We also expected intelligence scores to correlate
positively with total WT scores; the observed correlation was 0.32. These
findings support the interpretation that WT scores are influenced, as expected,
by indicators of cognitive ability, but the correlations are low, suggesting, as
assumed, that general cognitive factors are not the only influence on critical
thinking.
In addition, we examined the performance assessment scores with respect to
gender, degree (bachelor's, master's), migration background, type of high
school attended (academic/vocational), vocational training, study abroad, and
socio-political engagement (see Table A4 in the appendix). With a few
exceptions, none of these variables significantly influenced performance
assessment scores. However, we did find that socio-political commitment
correlated significantly with performance (0.41), which suggests that skills
acquired during experiential activities, such as being an active member of a
political party, influence critical thinking as well.
By critically examining the data and the first findings of the pilot study, we saw
what worked well with the newly developed performance assessment, WT, and
what did not work so well. While we could produce reliable scores on WT, and
total scores correlated positively with measures of general cognitive ability, the
think-aloud data from the cognitive interviews suggested that about half of the
test-takers did not actually carry out the performance assessment as intended.
Rather, they took the story at face value and wrote a recommendation paying
scant attention to the documents. While time pressure did not appear to be an
issue for the test-takers, this may be because they did not review the documents
carefully, thinking too fast rather than slow. One implication of this finding is
that fewer documents and a sharper focus in the storyline are needed in
revisions.
More generally, according to evidence-centered design (ECD; Mislevy &
Haertel, 2007), the domain analysis and modeling should be critically
reexamined. The test design model is based on the criterion sampling approach
(Shavelson et al., 1974). While analyses and descriptions of the domain in
question, for example, based on job descriptions (e.g., patrolman), can be
implemented relatively well for determining job-specific skills, a precise
description is much more complicated for generic skills such as critical
thinking. The challenges became particularly evident in the construct definition
and test framework for WT. They also became apparent as we developed the
scoring scheme, for example, in precisely and distinctly dividing and weighing
the individual (sub)dimensions and categories.
The reliability analyses presented here support the theoretically developed
scoring scheme, with the four main categories as facets of the overall construct
of critical thinking. However, latent factor analyses and cluster models, as well
as multiple-category IRT models, with a larger sample, are recommended for
further studies in order to gain differentiated information about the weighting
of single tasks and the test's internal structure.
In sharpening the construct, it might be helpful to consider whether
approaches developed in the theory of moral judgment are transferable to the
idea of critical thinking in general. In both fields it appears necessary to
balance conflicting interests (in a wider sense: aspects or arguments). Roughly
speaking, the (moral) quality of solutions has been measured on two
dimensions: (1) which principle or value (e.g., [economic] advantage, [social]
acceptance, [environmental] sustainability) a position statement is based on,
and (2) how narrow or broad the participant's perspective on the consequences
of a decision is (reaching from ego and alter up to the actual and future world
population). There are sophisticated models of moral reasoning (e.g., the
neo-Kohlbergian moral judgment theory developed by Minnameier, 2001) that
should be examined to determine if, and to what extent, they may add clarity
in sharpening the construct of critical thinking.
The interrater reliability and generalizability analyses indicate that the
assessment design sources of measurement error are far more complicated than
is reflected in coefficient alpha. Sorting out this complexity in, for instance, a
design with items nested in subareas would provide a more accurate picture of
measurement error. Moreover, regressing WT total and subscores on the nature
of the information (trustworthiness, relevance, judgment/bias) used by students
would bear concretely on inferences about their critical thinking. Finally, the
WT test instructions should be critically examined and possibly stated more
precisely.
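For illustration, a one-facet (persons × items) generalizability analysis of the kind described by Shavelson and Webb (1991) can be sketched with standard ANOVA estimators; the score matrix below is hypothetical, and the nested items-within-subareas design mentioned above would add a further facet:

```python
def g_study(scores):
    """One-facet (persons x items) G-study via standard ANOVA estimators.

    scores: list of rows, one per person, each a list of item scores.
    Returns (person variance, residual variance, generalizability
    coefficient for the mean over the n_i items actually used).
    """
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_tot = sum((scores[p][j] - grand) ** 2
                 for p in range(n_p) for j in range(n_i))
    ss_res = ss_tot - ss_p - ss_i       # person x item interaction + error

    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # truncate negative estimates
    rho2 = var_p / (var_p + var_res / n_i)    # E(rho^2) for an n_i-item mean
    return var_p, var_res, rho2

# Hypothetical 6-point ratings: 4 persons x 3 items
scores = [[4, 5, 4], [2, 3, 2], [5, 5, 6], [3, 3, 3]]
var_p, var_res, rho2 = g_study(scores)
print(round(rho2, 2))  # prints 0.95
```

For this fully crossed design the generalizability coefficient coincides with coefficient alpha; the point of the G-study framing is that adding facets (raters, subareas) decomposes the error that alpha lumps together.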
The role of previous knowledge and beliefs has to be accounted for by
varying the domain knowledge underlying the tasks sampled in the assessment.
Remarkably, it is the test-takers who integrated their beliefs (not their
[previous] knowledge) into solving the case who achieved the highest
performance test scores. This finding raises further questions regarding the
relationship between knowledge, beliefs, and critical thinking processes, which
requires further examination.
All in all, the pilot test provided a solid basis for continued work on the
more precise determination of the threshold and quality criteria for critical
thinking and on appropriate indicators for the representation of the underlying
mental processes. In other words, the crucial question arises as to what extent
critical thinking is being assessed when the student is working on the
performance task, compared to, for example, simple integrative information
processing, testwiseness, and/or general abilities irrelevant to the construct.
Further analysis should examine whether it is "only" test motivation that leads
to a particularly intensive, differentiated, and detailed solving of the task, or
whether test-takers have specific cross-situational critical attitudes that lead
them to think critically in terms of the construct definition.
In future analyses, the focus of the cognitive interviews will be on the
cognitive processes involved in real-time task solving, to determine the extent
to which test-takers' cognitive processes reflect the critical thinking the tasks
are intended to stimulate. Moreover, cognitive analyses should examine
whether respondents applied stereotype-based fast thinking on the basis of
simple heuristics when solving the performance task. Within the scope of an
additional experimental laboratory study, we need to examine how much time
participants need to complete the task and thereby show their critical thinking
skills. Here, eye-tracking as well as analyses of keystroke logfiles could
provide additional relevant information.
ACKNOWLEDGEMENTS
We would like to thank Marie-Theres Nagel, Dimitri Molerov and our student
assistants for their work in carrying out the study. We would like to thank the
two anonymous reviewers and the editors, Maria Elena Oliveri and Robert
Mislevy, who provided constructive feedback and helpful guidance in the
revision of this paper.
REFERENCES
Secolsky, C., & Denison, B. D. (Eds.). (2018). Handbook on measurement, assessment and evalu-
ation in higher education (2nd ed.). Oxford, UK: Routledge.
Seminara, J. L., Shavelson, R. J., & Parsons, S. O. (1967). Effect of reduced pressure on human
performance. Human Factors, 9(5), 409–418. doi:10.1177/001872086700900503
Shavelson, R. J. (2010). Measuring college learning responsibly: Accountability in a new era.
Stanford, CA: Stanford University Press.
Shavelson, R. J. (2013). On an approach to testing and modeling competence. Educational
Psychologist, 48(2), 73–86. doi:10.1080/00461520.2013.779483
Shavelson, R. J., Baxter, G. P., & Pine, J. (1991). Performance assessment in science. Applied
Measurement in Education, 4(4), 347–362.
Shavelson, R. J., Beckum, L., & Brown, B. (1974). A criterion-sampling approach to selecting
patrolmen. Police Chief, 41(9), 55–61.
Shavelson, R. J., Davey, T., Holland, P. W., Webb, N. M., & Wise, L. L. (2015). Psychometric
considerations for the next generation of performance assessment. Princeton, NJ: Educational
Testing Service.
Shavelson, R. J., & Webb, N. M. (1981). Generalizability theory: 1973–1980. British Journal of
Mathematical and Statistical Psychology, 34(2), 133–166. doi:10.1111/j.2044-8317.1981.tb00625.x
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA:
Sage.
Shavelson, R. J., Zlatkin-Troitschanskaia, O., & Mariño, J. P. (2018). International performance
assessment of learning in higher education (iPAL) – Research and development. Wiesbaden,
Germany: Springer.
Stanovich, K. E. (2009). What intelligence tests miss: The psychology of rational thought (1st ed.).
New Haven, CT: Yale University Press.
Stanovich, K. E. (2016). The rationality quotient: Toward a test of rational thinking (1st ed.).
Cambridge, MA: MIT Press.
Tremblay, K., Lalancette, D., & Roseveare, D. (2012). Assessment of Higher Education Learning
Outcomes. Design and Implementation (Feasibility Study Report Vol. 1). Retrieved from http://
www.oecd.org/education/skills-beyond-school/AHELOFSReportVolume1.pdf
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science,
185(4157), 1124–1131. doi:10.1126/science.185.4157.1124
Wheeler, P., & Haertel, G. D. (1993). Resource handbook on performance assessment and meas-
urement: A tool for students, practitioners, and policymakers. Berkeley, CA: Owl Press.
Zlatkin-Troitschanskaia, O., Shavelson, R. J., & Pant, H. A. (2018). Assessment of learning out-
comes in higher education: International comparisons and perspectives. In C. Secolsky & D. B.
Denison (Eds.), Handbook on measurement, assessment and evaluation in higher education
(2nd ed., pp. 686–698). Oxford, UK: Routledge.
TABLE A1
Overview of the PAL Task "Wind Turbine," Including Scoring (List of Appendices)

No. | Section | Content | Presentation | Source | Scoring
[6] | Notes on the members of the local council | Name, profession, background information on each of the seven members of the local council | Table | in-house construction | relevant +/valid +
[7] | Municipality Elmsen | Demographic structure – population | Number | in-house construction | relevant +/valid +
[8] | Municipality Elmsen | Demographic structure – age structure | Pie chart | in-house construction | relevant +/valid +
[9] | Municipality Elmsen | Demographic structure – employment | Bar chart | in-house construction | relevant +/valid +
[10] | Municipality Elmsen | Demographic structure – commuters | Number | in-house construction | relevant +/valid +
[11] | Municipality Elmsen | Demographic structure – last general election | Bar chart | in-house construction | relevant +/valid +
[12] | Municipality Elmsen | Location | Text | in-house construction | relevant +/valid +
[13] | Municipality Elmsen | History | Text | in-house construction | relevant +/valid +
[14] | Municipality Elmsen | Politics/Law | Text | in-house construction | relevant +/valid +
[15] | Municipality Elmsen | Economy | Text | in-house construction | relevant +/valid +
[16] | Municipality Elmsen | Land utilization in the three municipalities | Figure | in-house construction | relevant +/valid +
[17] | Structure of a wind turbine | Description of each part of a wind turbine | Text as list and figure | Energienpoint.de | relevant -/valid +
[18] | Effects of wind turbines: Emission/Immission | Brief section on bird mortality | Text | Wikipedia "Windkraftanlage" | relevant +/valid -
[19] | Effects of wind turbines: Emission/Immission | Extensive text on wind turbines | Text with figures and graphs | Hyperlink to website Wikipedia "Windkraftanlage" | —
[20] | Effects of wind turbines: Emission/Immission | Brief section on bat mortality | Text | Wikipedia "Windkraftanlage" | relevant +/valid -
[21] | Effects of wind turbines: Emission/Immission | Bats and wind turbines | Text with figures | NaBu | relevant +/valid -
[22] | Effects of wind turbines: Emission/Immission | Brief section on sound emissions | Text | Sound (excerpt from: Obermaier, H., 2011) | relevant +/valid +
[23] | Effects of wind turbines: Emission/Immission | Sound immissions of wind turbines | Text with figures and graphs | Fachagentur Windenergie | relevant -/valid +
[24] | Effects of wind turbines: Emission/Immission | Sound immissions of wind turbines in the surrounding area | Figure | DEWI GmbH | relevant -/valid +
[25] | Effects of wind turbines: Emission/Immission | Infrasound (detrimental to one's health?) | Headline to the subsequent source | Newspaper Die Welt | relevant +/valid -
[26] | Effects of wind turbines: Emission/Immission | Reference newspaper Die Welt | Hyperlink to website | Newspaper Die Welt | relevant +/valid -
[27] | Effects of wind turbines: Emission/Immission | Bird strike | Headline to following video | ARD Mediathek | relevant +/valid +
[28] | Effects of wind turbines: Emission/Immission | Reference video (6 min.) | Hyperlink to Mediathek | ARD Mediathek | relevant +/valid +
[29] | Effects of wind turbines: Emission/Immission | Video (3 min.) | Headline to following video | ARD Mediathek | relevant +/valid -
[30] | Effects of wind turbines: Emission/Immission | Reference video (3 min.) | Hyperlink to Mediathek | ARD Mediathek | relevant +/valid -
TABLE A2
Sample Description (N = 30)

Variable | n | %
Gender
  Women | 21 | 70.0
  Men | 9 | 30.0
Degree
  Bachelor | 25 | 83.3
  Master | 5 | 16.7
Country of birth
  Germany | 27 | 90.0
  Other | 1 | 3.3
Parents' migration background
  Yes | 3 | 10.0
  No | 27 | 90.0
Highest school-leaving qualification in Germany
  Yes | 29 | 96.7
  No | — | —
Completed vocational training
  Yes | 11 | 36.7
  No | 18 | 60.0
Completed internship
  Yes | 25 | 83.3
  Not specified | 5 | 16.7
Stay abroad
  Yes | 14 | 46.7
  No | 15 | 50.0
Social engagement
  Yes | 13 | 43.3
  No | 16 | 53.3

Variable | n | M | SD
Age | 29 | 23.93 | 4.03
Semester | 30 | 4.77 | 2.46
University entry qualification grade | 28 | 2.07 | 0.60
TABLE A3
Time of Decision

Decision was made… | Frequency | Average test score
immediately after reading the scenario description and the task for the first time (before reading the accompanying documents) | 3 | 3.40
after reading the scenario description and task several times | 2 | 3.50
immediately after reading the entire task, including the accompanying documents, for the first time | 7 | 3.42
after reading the entire task, including the accompanying documents, several times | 1 | 1.98
after considering "for" and "against" arguments that were based on prior knowledge | 1 | 2.46
after considering "for" and "against" arguments that were based on what you believe to be true | 2 | 4.32
after considering "for" and "against" arguments based on the presented facts and information | 11 | 3.72
while writing the statement | 2 | 3.78
after completion of the statement | 1 | 3.85
TABLE A4
Means of PT Performance of Different Groups

Variable | M | SD
Gender
  Women | 3.54 | 0.82
  Men | 3.57 | 0.63
Degree
  Bachelor | 3.53 | 0.73
  Master | 3.67 | 0.95
Migration background
  No | 3.63 | 0.75
  Yes | 2.83 | 0.32
Completed vocational training
  No | 3.52 | 0.87
  Yes, commercial | 3.72 | 0.52
  Yes, non-commercial | 3.28 | —
Stay abroad
  Yes | 3.72 | 0.70
  No | 3.45 | 0.78
Social engagement
  No | 3.26 | 0.81
  Yes, political | 4.25 | 0.20
  Yes, faith-related | 3.85 | 0.37
  Yes, athletic | 3.74 | —
  Yes, other | 3.90 | 0.58
TABLE A5
Dimension 1 of the Rating Scheme with Individual Categories and Behavior Anchors
Recognizing and evaluating the relevance, reliability, and validity of given information

Amount of used/considered information and sources
1 = Did not recognizably use given information
2 = Used only little of the given information (1 source)
3 = Used some of the given information (2 sources)
4 = Used 3 sources of the given information
5 = Considered 4 or more sources of the given information
6 = Considered 4 or more sources of the given information and included further information

Accurately judges quality of evidence, avoiding or qualifying unreliable, erroneous, and uncertain sources and information
1 = Only used unreliable information
2 = Used some of the reliable given information (1 source) and mostly unreliable information
3 = Used 2 sources of the relevant reliable information and some unreliable information
4 = Used 3 sources of the reliable given information
5 = Used 4 or more sources of the given reliable information
6 = Used 4 or more sources of the reliable information and defined unreliable information

Accurately judges quality of evidence, avoiding invalid and irrelevant information
1 = Only used irrelevant information
2 = Used 1 of the relevant given information and mostly irrelevant information
3 = Used 2 of the given relevant information and some irrelevant information
4 = Used most of the given relevant information
5 = Used the relevant information correctly
6 = Used only relevant information and defined irrelevant information

Acknowledges uncertainty and justified and reasonable need for further information
1 = Does not acknowledge uncertainty / need for further information
2 = Acknowledges some uncertainty, no mention of need for information
3 = Mentions and specifies uncertainty, no mention of need for information
4 = Mentions and specifies uncertainty, mentions need for more information
5 = Mentions and specifies uncertainty, precisely describes need for more information
6 = Mentions and specifies uncertainty, defines and justifies needed information