Mettl Test For Critical Thinking
_____________________________________________________________________________________
Revised in 2022
This Manual may not, in whole or in part, be copied, photocopied, reproduced, translated, or converted to any electronic or machine-readable form without
prior written consent of Mercer Mettl.
Executive Summary
The purpose of this technical manual is to describe the process of standardization and validation of the Mettl
Test for Critical Thinking (MTCT). This test requires applications of analytical reasoning in a verbal context.
Critical thinking is an extremely important ability for employees in today’s organizations. With the rise of a
VUCA world, automation and big data, the demand for a workforce with strong critical thinking is growingi. The
ability to connect, interpret and analyse information in a world full of ambiguity and change requires a high
level of critical thinking. According to McKinsey, the rise of automation will reduce the need for the basic
cognitive skills required in data input and processing. On the other hand, the demand for higher cognitive
skills such as creativity, critical thinking, decision making and complex information processing will grow
through 2030ii. In our experience, critical thinking tests are one of the most
important tests in employment testing. The previous version of this test was extensively used in hiring and
developmental interventions for mid to senior level executives. This test is also used in hiring for critical roles
at all job levels, individual contributor to mid and senior management roles across all industries. The previous
version of this test was effective in measuring competencies like strategic thinking, problem solving and
decision making.
Mettl’s Test for Critical Thinking measures the following abilities of test takers:
• Ability to collect information from relevant sources.
• Ability to critically analyse information coming from diverse sources.
• Ability to interpret data rationally and draw valid conclusions.
• Ability to render accurate judgements based on evidence and the logical relationships between propositions.
• Ability to recognize problems and solve them efficiently.
• Ability to reflect and take logical, conclusive decisions.
Critical thinking has been defined by multiple researchers in different ways (Black, 2005iv; Moseley et al., 2005;
Sternberg, Roediger & Halpern, 2007). In general, it is the ability of an individual to achieve a desired outcome
by thinking rationally and logically. According to Halpern (2003), “critical thinking is purposeful, reasoned,
and goal directed. It is the kind of thinking involved in solving problems, formulating inferences, calculating
likelihoods, and making decisions, when the thinker is using skills that are thoughtful and effective for the
particular context”. According to Mayer and Goodchild (1990)v, critical thinking is “an active and systematic
attempt to understand and evaluate arguments”. On the other hand, Beyer (1984)vi defined critical thinking as
a combination of discrete skills, including “(a) an ability to discriminate between provable facts and value
claims; (b) determining the reliability of a source; (c) distinguishing relevant from irrelevant information,
claims, or reasons; (d) detecting bias; (e) identifying unstated assumptions; (f) identifying ambiguous or
equivocal claims or arguments; (g) recognizing logical inconsistencies or fallacies in a line of reasoning;
(h) distinguishing between warranted or unwarranted claims; and (i) determining the strength of an argument”.
According to Ghasemi Far (2004)vii, critical thinking involves the identification of problems, the estimation of
the relationships between the different components of a problem, inference, the combination of elements to
create a new thought pattern, and appropriate interpretations or conclusions. Halpern (1998)viii believes
critical thinking includes skills such as verbal reasoning, argument analysis, hypothesis testing, reasoning
about likelihood and uncertainty, and decision-making and problem solving. Watson and Glaser (2008)ix
proposed a five-factor model measuring five components of critical thinking: recognition of assumptions,
inference, evaluation of arguments, interpretation and deduction. It is also important to note that intelligence
and critical thinking are two separate constructs: they are related, but not identical. In sum, the major
components of critical thinking are judgement, reasoning, metacognition and reflective thinking.
Based on a thorough review of the critical thinking literature, we see the MTCT as a multi-faceted measure of
critical thinking comprising three elements: recognizing assumptions, evaluating arguments, and drawing
conclusions. Recognizing assumptions is the ability to identify assumptions or suppositions made implicitly
while arriving at conclusions. Evaluating arguments is the ability to discriminate between weak and strong,
and relevant and irrelevant, arguments related to a specific matter. Lastly, drawing conclusions is the ability
to derive valid conclusions from the available evidence. In sum, as per our definition, critical thinking is a
rational way of thinking with clarity and precision. It includes questioning assumptions, making evaluations
that are impartial and precise, and identifying the relevant information when reaching conclusions.
Literature Review
The fast pace of change in today’s world has made the ability to think critically one of the most significant and
relevant skills for employees (Halpern, 2002)xiv. The World Economic Forum’s (WEF) The Future of Jobs
report (2016)xv identifies critical thinking and complex problem solving as the most sought-after skills over
the next few years. The report suggests that using logic and reasoning to recognise the strengths and
weaknesses of alternative solutions, making tough decisions and developing different approaches to solve
problems are key skills required across multiple job families, especially business and financial operations,
architecture and engineering, management, and computer and mathematical jobs. In summary, it is difficult
to imagine any area or job where the ability to think critically is not needed. Most jobs of the present and the
future will require employees to make decisions, analyse arguments and solve problems every day.
Halpern (2006)xvi suggested that critical thinking is purposeful, reasoned and directed toward solving
problems, calculating probabilities and making decisions. Critical thinking also facilitates the reasoning that
helps one decide which factors to consider when making decisions in a variety of settings (Halpern, 1998)xvii.
The majority of the empirical literature on critical thinking suggests a positive relationship between critical
thinking and academic performance (Ernst and Monroe, 2004xviii; Gadzella, Stephens and Stacks, 2004xix; Kuhn, 1999xx;
Lipman, 2003xxi; Zoller et al., 2000xxii). In a study conducted by Saremi and Bahdori (2015)xxiii, critical
thinking was shown to be positively correlated with both creativity and emotional intelligence.
Ennis (1993)xxiv suggested that higher critical thinking skills result in a higher capacity to assess a problem
effectively, while Glevey (2006)xxv reported that individuals high in critical thinking usually come up with
better problem-solving strategies. A study conducted by Khalili (2004)xxvi found a positive correlation
between students’ critical thinking test scores and their GPA, as well as their scores in math and verbal
courses. In another study, Watson and Glaser (2009) found a positive relationship between critical thinking
scores and supervisory ratings of overall job performance and several dimensions of workplace performance,
including technical knowledge, judgement and problem solving. Spector, Schneider, Vance and Hezlett
(2000)xxvii reported that critical thinking and problem-solving skills are positively correlated, and Kudish and
Hoffman (2002)xxviii reported a link between the critical thinking capability and the judgement and analysis
ability of retail professionals.
Item Banking
The MTCT is developed using an item banking approach to generate multiple equivalent forms to support item
randomization. The term ‘item bank’ is used to describe a group of items which are organized, classified and
catalogued systematically. According to the research conducted by Nakamura (2000)xxix Item Response
Theory (IRT) facilitates item banking standardization by calibrating and positioning all items in the test bank
on the same latent continuum by means of a common metric. This method can be further used to add
additional items to the test bank to increase the strength of the item bank. IRT also allows construction of
equivalent and multiple tests as per the test composition plan.
Our item bank is developed in line with our test composition plan, which is based on two parameters:
representation of all types of item content, and inclusion of easy, medium and difficult itemsxxx. In our critical
thinking test, all three components of critical thinking (recognizing assumptions, evaluating arguments, and
drawing conclusions) are represented in the item bank. The test’s composition is defined by a specified
percentage of items from the various content domains/rules as well as equal numbers of easy, medium and
difficult items. It is used to develop a uniform content outline, which is crucial to confirm the construct validity
of the test. An item bank contains more questions than are needed for any single candidate. This enables
random generation of items within certain parameters, ensuring each test is no more or less difficult than the last.
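As an illustration, form assembly from such a bank can be sketched as follows. This is a minimal sketch, not Mettl's actual implementation: the item fields, cell sizes and bank contents are hypothetical.

```python
import itertools
import random

DOMAINS = ("assumptions", "arguments", "conclusions")
LEVELS = ("easy", "medium", "hard")

# Hypothetical item bank: 45 items spread evenly over the
# 3 content domains x 3 difficulty levels (5 items per cell).
bank = [{"id": i, "domain": d, "level": lv}
        for i, (d, lv) in enumerate(
            itertools.islice(itertools.cycle(itertools.product(DOMAINS, LEVELS)), 45))]

def draw_form(bank, per_cell=2, seed=None):
    """Randomly draw a test form with `per_cell` items from every
    domain x difficulty cell, so every generated form follows the
    same composition plan."""
    rng = random.Random(seed)
    form = []
    for domain in DOMAINS:
        for level in LEVELS:
            cell = [it for it in bank
                    if it["domain"] == domain and it["level"] == level]
            form.extend(rng.sample(cell, per_cell))
    return form
```

Two calls with different seeds yield different item sets, but every form has an identical content and difficulty composition, which is what keeps randomly generated forms equivalent in difficulty.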
Although item characteristics can be estimated with the help of both Classical Test Theory (CTT) and IRT
models, the psychometric literature indicates the IRT method is more suitable for an item banked test
(Embretson & Reise, 2013xxxi; Van der Linden, 2018xxxii). The classical items and test statistics based on the
CTT model vary depending on sample characteristics whereas an IRT model provides ‘sample free’ indices of
item and test statistics. Therefore, we use item response theory to standardize our item banks.
• New items can be added to the bank at any time, and over-exposed items can be retired when they reach a specified exposure level.
• Only fair and non-discriminatory items are included in the item bank, which reduces adverse impact for different groups and produces fair assessments for all candidates.
Item Writing
The MTCT consists of a piece of text outlining the premise of the question, followed by four statements on
either assumptions, arguments or conclusions. An individual must identify which of the statements following
the premise are assumptions made, which of the arguments are strong or weak or which conclusions are valid.
The following questions are asked at the end of each premise/statement:
• Which of the following conclusions can be drawn from the given information?
• Which of the following, if true, would most strengthen/weaken the above conclusion?
• Which of the following assumptions is the above conclusion based on?
The development of this test was done in four broad stages: item creation, item review, pilot testing and
standardization. In the first stage, a large pool of 127 items was developed by subject matter experts and
psychometricians. Detailed in-depth interviews were conducted with SMEs to explore which items, and what
item reasoning, should be used, and a set of general item-writing rules was followed when designing the items.
Item Review
Item reviews were conducted by our in-house psychometricians, who have over 10 years of research
experience. Both items and answer keys were reviewed in depth, and the difficulty level and item logic of each
item were scrutinized. The items were also analysed for cultural neutrality so that no ethnic or cultural group
would be advantaged or disadvantaged by culturally specific content. All items that did not meet these strict
standards were removed. After multiple rounds of review, a pool of 72 of the original 127 items was finalized
for the next step.
Pilot Testing
Procedure: In the first stage we conducted a pilot study in which individual item parameters were estimated
using a Rasch model. The objective of the pilot study was to ascertain basic item properties, especially the item
difficulty, of all 72 items. The 72 items were divided into three equivalent sets and data was collected through
online administration of all three sets. All items were mandatory, and participants were not allowed to skip an
item without responding. Only respondents with at least a 90% completion rate were included in the final data
set. This resulted in 170, 202 and 228 responses for the three sets respectively.
Sample Details: In the first stage, data was collected from 600 respondents: 45.2% of respondents were female,
47.3% were male, and 0.7% chose ‘other’ as their gender. English was the native language of 43% of
respondents, and the mean age of the sample was 32.4 years. A detailed description of the sample is reported
in Appendix 1.
Analysis: A Rasch model was used to ascertain item properties at stage 1 because of the smaller sample size;
this model provides stable estimates with fewer than 30 responses per item. The Rasch model is the
one-parameter model of Item Response Theory, which estimates the probability of a correct response to a
given test item from two variables: the difficulty of the item and the ability of the candidate. The primary
function of this model is to provide information on item difficulty, which helps to organize the test items
according to difficulty level, the spread of item difficulty, and test length, ultimately increasing measurement
accuracy and test validity. Based on the findings of the Rasch model, items exhibiting extreme b parameters
(values substantially less than -3 or greater than +3) were rejected at this stage; 23 items from the initial pool
of 72 were removed.
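The Rasch screening step can be illustrated as follows. This is a minimal sketch with made-up difficulty estimates, not Mettl's calibration code (which would estimate b from the response data itself):

```python
import math

def rasch_p(theta, b):
    """Rasch (1PL) model: probability that a candidate of ability
    `theta` answers an item of difficulty `b` correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def screen_items(b_estimates, lo=-3.0, hi=3.0):
    """Keep only items whose estimated difficulty falls inside
    [lo, hi]; extreme b values are flagged for removal."""
    return {item: b for item, b in b_estimates.items() if lo <= b <= hi}

# Hypothetical pilot estimates: two items are extreme and get dropped.
pilot_b = {"Q1": -0.4, "Q2": 1.2, "Q3": -3.6, "Q4": 0.0, "Q5": 4.1}
kept = screen_items(pilot_b)
```

A candidate whose ability equals an item's difficulty has a 50% chance of answering correctly, which is why b and theta are usefully placed on the same latent scale.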
Standardization
Procedure: A total of 49 items survived the pilot stage. These were arranged by difficulty parameter and then
divided into three sets of 21 items each for the final stage of data collection; 11 items were repeated across the
sets to create three equivalent test forms. The objective of the second stage of data collection was to
standardize the item bank and ascertain the essential psychometric properties (reliability and validity) of the
test. All items were mandatory at this stage, and participants were not allowed to skip an item without
responding. Only respondents with at least a 90% completion rate were included in the final data set. This
resulted in 514, 384 and 372 responses for the three sets respectively.
Sample: In the second stage, data was collected from 1270 respondents: 46.9% of respondents were male,
45.4% were female, and 2.7% identified their gender as ‘other’. English was the native language of 45.5% of
respondents, and the mean age of the sample was 31.3 years. A detailed description of the sample is reported
in Appendix 2.
Analysis: In the second stage of analysis we used a two-parameter model, which posits that the probability of
a correct response is a function of both item difficulty and the respondent’s proficiency. The two-parameter
IRT model provides meaningful estimates of both item difficulty and item discrimination. For the finalization
of items in the item bank, the following procedure was followed:
• Items displaying a b parameter (item difficulty) less than -3 or greater than +3 were removed from the data set.
• Items displaying an a parameter (item discrimination) less than .2 were also removed at this stage.
Three of the 49 items were removed, meaning the final bank consisted of 46 items with a balanced spread of
easy, medium and difficult items.
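The 2PL retention rules above can be sketched like this (the parameter values are hypothetical; the thresholds are as stated in the text):

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic model: probability of a correct response
    given ability `theta`, discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def finalize_bank(items, b_lo=-3.0, b_hi=3.0, min_a=0.2):
    """Drop items with extreme difficulty (b outside [b_lo, b_hi])
    or weak discrimination (a below min_a)."""
    return [it for it in items
            if b_lo <= it["b"] <= b_hi and it["a"] >= min_a]

# Made-up calibrated items: the last two fail the retention rules.
calibrated = [
    {"id": 1, "a": 1.1, "b": -0.5},
    {"id": 2, "a": 0.7, "b": 1.8},
    {"id": 3, "a": 0.1, "b": 0.3},   # discrimination too low
    {"id": 4, "a": 0.9, "b": 3.4},   # difficulty too extreme
]
final = finalize_bank(calibrated)
```

The a parameter controls how sharply an item separates candidates just below from just above its difficulty level, which is why near-flat items (a < .2) contribute little measurement information and are removed.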
Validity
Validity is the most fundamental property of any psychological test. It involves accumulating relevant scientific
evidence for the interpretation of test scores. The APA Standardsxxxiii identify four major sources of evidence
to consider when evaluating the validity of a test: evidence based on test content, evidence based on response
processes, evidence based on internal structure, and evidence based on relationships with other variables,
especially criterion variables. To ascertain the validity of the MTCT we collected evidence based on internal
structure (construct validity) and evidence based on relationships with criterion variables (criterion-related
validity).
Construct Validity
The purpose of construct validation is to ascertain whether the test measures the proposed construct or
something else. The most common methods of ascertaining the construct validity of an assessment are
exploratory and confirmatory factor analysis. We used the CFA method because our objective was to test a
predefined unidimensional measurement model; one of the most important assumptions of using an IRT
model as a measurement system is that the item bank comprises unidimensional items. Confirmatory factor
analysis was therefore used to establish construct validity evidence. The CFA results confirmed the
unidimensional factor structure with satisfactory fit statistics; the fit indices were within the normal range
(IFI = .97; RMSEA = .02; CFI = .978; TLI = .973).
Criterion Validity
Criterion-related validity evidence indicates the extent to which assessment outcomes are predictive of
employee performance in a specified job or role. In order to establish the criterion-related validity, there are
two major methods used:
1. Concurrent Validity: In this method, data on the criterion measures are obtained at the same time
as the psychometric test scores. This indicates the extent to which the psychometric test scores
accurately estimate an individual’s present job performance.
2. Predictive Validity: In this method, data on criterion measures are obtained after the test. This
indicates the extent to which the psychometric test scores accurately predict a candidate’s future
performance. Tests are administered to candidates when they apply for the job and their performance
is reviewed after six months or a year; their scores on the two measures are then correlated to
estimate the criterion validity of the psychometric test.
In order to ascertain the MTCT’s validity, concurrent criterion-related validity evidence was gathered where
performance data and the MTCT scores were both collected at the same time. Then the relationship between
these two variables was tested and significant relationships were found. It is important to note here that in
criterion related validity analysis, the precision and relevance of criterion data/employee performance data
is extremely vital. Error in measurement of the criterion is a threat to accurate assessment of the test’s validity.
Error in criterion measurement may attenuate the relationship between test score and criterion variables, and
thus lead to an erroneous criterion-related validity estimate. The basic criteria for an appropriate,
high-quality criterion measure are as follows. The criterion should:
• Have a clear and objective definition and calculation of performance levels.
• Be aligned with the key demands of the role.
• Have crucial implications for business outcomes.
• Produce reasonable variance to effectively separate different performance levels.
Study Procedure: In the present study, MTCT scores were used as the predictor variable and respondents’
competency scores, based on line managers’ ratings, were used as the criterion variable. Data was collected
from a multinational company specializing in HR consulting. A sample of 150 employees from this
organization was invited to participate in the study, and the purpose of the assessments was explained to
them in detail. After collecting the employees’ responses to the MTCT, a detailed competency-based
performance rating form was completed by their respective line managers. In this form all competencies were
defined, and raters were asked to rate each competency on a 10-point scale (1 = low and 10 = high). The
Pearson product-moment correlation method was used to test the relationship between MTCT scores and
competency ratings.
Sample: A total of 111 employees participated in the study and completed the MTCT. We received managerial
ratings on competencies for only 87 of these respondents. The mean age of the sample was 35.4 years, 51% of
respondents were male and 49% were female. 74% of the respondents worked as Analysts and Consultants
and the remaining 26% were Leaders and Product Owners.
Analysis: The Pearson product-moment correlation method was used to test the relationship between MTCT
scores and line manager competency ratings. Results indicate significant positive correlations between the
MTCT score and competency ratings: MTCT score is positively correlated with critical thinking (r = .368,
p < .01), analytical ability (r = .335, p < .01), fluid intelligence (r = .301, p < .01), innovation (r = .215, p < .05)
and verbal comprehension (r = .244, p < .05). These correlation coefficients are not corrected for attenuation
or range restriction. MTCT score is also positively correlated with learning orientation, employability and
ability with numbers (refer to Appendix 3, table 1).
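For reference, the Pearson product-moment correlation used in these analyses can be computed as follows. This is a self-contained sketch; the score vectors in the test below are illustrative, not the study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length
    lists of scores (e.g. MTCT scores and competency ratings)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

The coefficient ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship).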
Adverse Impact
The Uniform Guidelines on Employee Selection Procedures (UGESP, 1978)xxxiv define unfairness as occurring
when a selection procedure works to “the disadvantage of members of a race, sex or ethnic group” (see section
1607.16). UGESP recommends the four-fifths rule for examining potential Adverse Impact, stating that a
“selection rate for any race, sex or ethnic group which is less than four-fifths (4/5) (or 80%) of the rate for the
group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of
adverse impact” (1978, see section 1607.4 D). Courts have also applied this rule to cases involving age
discrimination. The Age Discrimination in Employment Act (ADEA) of 1967 prohibits discrimination in
selection contexts against individuals 40 years of age or older. In addition, the UK’s Equality Act (2010) legally
protects people from discrimination in the workplace and in wider society. Researchers have proposed
alternative methods for examining Adverse Impact (e.g., moderated multiple regression, the one-person rule,
and the N of 1 rule), although none has been as widely adopted as the four-fifths rule. Additionally, a statistical
significance test for mean group differences on individual assessment scales is often considered informative.
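The four-fifths rule itself is straightforward to compute. A minimal sketch (the selection rates and group names below are illustrative):

```python
def impact_ratios(selection_rates):
    """Impact ratio of each group's selection rate relative to the
    highest-rate group, per the four-fifths rule."""
    top = max(selection_rates.values())
    return {g: rate / top for g, rate in selection_rates.items()}

def adverse_impact_flags(selection_rates, threshold=0.8):
    """Flag groups whose impact ratio falls below four-fifths (80%)."""
    return {g: ratio < threshold
            for g, ratio in impact_ratios(selection_rates).items()}

# Example: group B's selection rate is half of group A's, so it is flagged.
rates = {"group A": 0.50, "group B": 0.25, "group C": 0.45}
flags = adverse_impact_flags(rates)
```

Here group C's impact ratio is 0.45/0.50 = 0.9, above the 0.8 threshold, so only group B shows potential adverse impact.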
In the present study, group differences in MTCT scores based on age, gender, and ethnicity were examined and
are reported in tables 1–3 (refer to Appendix 4). Table 1 presents the comparison of mean MTCT scores by
gender; results suggest that there is no significant difference in mean score between male and female
respondents. Table 2 reports mean scores for two age groups: those 40 years of age or older and those under
40. These differences are not statistically significant. We also examined the mean difference in MTCT scores
between White respondents (reference group) and others (focal group); this difference was statistically
significant, and based on traditional ranges for interpreting effect sizes (Cohen’s d; Cohen, 1988) it is of
medium size. In conclusion, while some differences in mean scores were observed, the effect sizes were within
an acceptable range.
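Effect sizes of this kind are typically computed as Cohen's d with a pooled standard deviation; a minimal sketch (illustrative data in the test, not the study samples):

```python
from math import sqrt

def cohens_d(x, y):
    """Cohen's d for the difference between two group means,
    standardized by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)   # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    pooled_sd = sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd
```

By Cohen's (1988) conventions, |d| around 0.2 is interpreted as small, 0.5 as medium and 0.8 as large.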
Additionally, to test the impact of English language skills on MTCT scores, we examined the mean difference
in MTCT scores between native English speakers and non-native English speakers. This difference was
statistically significant, although an examination of the effect size indicates that it is of medium magnitude.
This suggests that MTCT scores are only moderately related to native language, supporting its use as a global
tool across language groups (refer to Appendix 4, table 4).
Test Administration
MTCT is an online test administered through an internet-based testing system designed by Mettl for the
administration, scoring, and reporting of occupational tests. Test takers are sent a test link to complete the
test and candidate/test taker data is instantly captured for processing through the online system. Test scores
and interpretive reports are instantly generated. Tests can also be administered remotely; importantly, all
candidate data, question banks, reports and benchmarks are stored in a well-encrypted, highly regarded cloud
service. To prevent cheating and other malpractice, Mettl’s platform also offers AI-powered anti-cheating
features, including live monitoring, candidate authentication checks, and secure browsing.
Scoring
Responses to the MTCT are scored according to the number of correct answers a respondent chooses. Each
item has 4 answer options, only one of which is correct. Each item answered correctly is awarded 1 mark;
items answered incorrectly or not attempted are given 0 (zero) marks. An individual’s overall raw score is the
total number of items answered correctly. Raw scores are then converted into Sten scores, which places them
on a 10-point scale, using the formula below:
Sten = (z-score × 2) + 5.5, where z = (X − M)/SD
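The raw-to-Sten conversion can be expressed directly. In the sketch below, the rounding and clipping of extreme values to the 1–10 band is a common reporting convention assumed for illustration, not something the formula above states:

```python
def sten(raw, mean, sd):
    """Continuous Sten score on the 10-point scale: z * 2 + 5.5."""
    z = (raw - mean) / sd
    return z * 2 + 5.5

def sten_band(raw, mean, sd):
    """Sten rounded and clipped to the conventional 1-10 band
    (an assumed reporting convention, not specified in the manual)."""
    return max(1, min(10, round(sten(raw, mean, sd))))
```

For example, a raw score exactly at the norm mean maps to a Sten of 5.5, and a score one standard deviation above the mean maps to 7.5.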
Test Composition
Each test taker is asked to complete 21 items in 30 minutes. Sample items and a sample report are available
in Appendix 5.
Interpretation
The MTCT measures the critical thinking ability of test takers working in a variety of individual contributor
and managerial roles, and is suitable for use in both recruitment and development settings. Critical thinking
is defined as the ability to identify and analyse problems, and to seek and evaluate relevant information in
order to reach appropriate conclusions. A high score on the MTCT indicates that the test taker has a strong
ability to think rationally, solve problems and make effective decisions.
Appendices
Appendix 1: Demographic details of Pilot study (N = 600)
Table 1: Gender
Gender Frequency Percent
Female 271 45.2
Male 284 47.3
Other 4 0.7
Prefer not to say 41 6.8
Table 2: Age
Age Frequency Percent
20 - 30 years 323 53.8
31-40 years 142 23.7
41-50 years 91 15.2
51-60 years 44 7.3
Sample Report:
[Norm-group demographic tables appear here, reporting the percentage of each norm sample by gender, region (e.g. India, Portugal, Malaysia, Vietnam, Romania) and industry (e.g. telecommunications, accounting, insurance, retail, banking, automotive, internet, chemicals, farming, pharmaceuticals, oil & energy).]
The norms for Critical Thinking – Evaluating Arguments for the LATAM region (Portuguese) have been
developed based on 1244 respondents. The average time taken to complete the Critical Thinking – Evaluating
Arguments assessment is 8.86 minutes. The demographic details of gender, age group and industry are
provided in the following tables.
The norms for Critical Thinking – Drawing Conclusions for the LATAM region (Spanish) have been developed
based on 6725 respondents. These norms are based on responses from candidates who attempted 7 randomly
administered questions (2 easy, 3 medium and 2 difficult). The average time taken to complete the Critical
Thinking – Drawing Conclusions assessment is 9.39 minutes. The demographic details of gender, age group
and industry are provided in the following tables.
Mercer Mettl Test for Critical Thinking – Technical Manual
The norms for Critical Thinking – Recognizing Assumptions for the LATAM region (Spanish) have been
developed based on 8085 respondents. The average time taken to complete the Critical Thinking – Recognizing
Assumptions assessment is 8.435 minutes. The demographic details of gender, age group and industry are
provided in the following tables.
References
i Desjardins, J. (2018). 10 skills you'll need to survive the rise of automation. Retrieved September 26, 2019
from https://www.weforum.org/agenda/2018/07/the-skills-needed-to-survive-the-robot-invasion-of-the-
workplace.
ii Bughin, J., Hazan, E., Lund, S., Dahlström, P., Wiesinger, A., & Subramaniam, A. (2018). Skill shift: Automation
and the future of the workforce. McKinsey Global Institute.
iv Black, S. (2005). Teaching students to think critically. The Education Digest, 70(6), 42–47.
v Mayer, R., & Goodchild, F. (1990). The critical thinker. New York: Wm. C. Brown.
vi Beyer, B. K. (1984). Improving thinking skills-defining the problem. Phi Delta Kappan, 65, 486–490.
vii Saremi, H., & Bahdori, S. (2015). The relationship between critical thinking with emotional intelligence and
creativity among elementary school principals in Bojnord city, Iran. International Journal of Life Sciences, 9(6),
33-40.
viii Halpern, D. F. (1998). Teaching critical thinking across domains: Dispositions, skills, structure training, and
metacognitive monitoring. American Psychologist, 53(4), 449–455.
ix Watson, G., & Glaser, E. M. (2008). Watson-Glaser Critical Thinking Appraisal manual. Pearson.
x Watson, G., & Glaser, E. M. (1980). Manual for the Watson-Glaser critical thinking appraisal.
Schuhfried GmbH.
xiv Halpern, D. F. (2013). Thought and knowledge: An introduction to critical thinking. Psychology Press.
xv World Economic Forum. (2016). The Future of Jobs: Employment, Skills and Workforce Strategy for the
Fourth Industrial Revolution. Geneva: World Economic Forum.
xviii Ernst, J., & Monroe, M. (2004). The effects of environment-based education on students’ critical thinking
skills and disposition toward critical thinking. Environmental Education Research, 10(4), 507–522.
xix Gadzella, B. M., Stephens, R., & Stacks, J (2004). Assessment of critical thinking scores in relation with
psychology and GPA for education majors. Paper presented at the Texas A & M University Assessment
Conference, College Station, TX.
xx Kuhn, D. (1999). A developmental model of critical thinking. Educational Researcher, 28(2), 16-25
xxi Lipman, M. (2003). Thinking in education (2nd ed.). Cambridge, MA: Cambridge University Press
xxii Zoller, U., Ben-Chaim, D., Ron, S., Pentimalli, R., & Borsese, A. (2000). The disposition toward critical thinking
of high school and university science students: An inter-intra Israeli-Italian study. International Journal of
Science Education, 22(6), 571-582.
xxiii Saremi, H., & Bahdori, S. (2015). The relationship between critical thinking with emotional intelligence and
creativity among elementary school principals in Bojnord city, Iran. International Journal of Life Sciences, 9(6),
33-40.
xxiv Ennis, R. (1993). Critical thinking assessment. Theory into Practice, 32, 179-186
xxv Glevey, K.E. (2006). Promoting thinking skills in education. London Review of Education, 4 (3), 291-302.
xxvi Khalili, H. (2004). Critical thinking skills of nursing students in Semnan University of Medical Sciences.
Iranian Journal of Medical Education, 4(2), 23-31.
xxvii Spector, P. A., Schneider, J. R., Vance, C. A., & Hezlett, S. A. (2000). The relation of cognitive ability and
personality traits to assessment center performance. Journal of Applied Social Psychology, 30(7), 1474– 1491.
xxviii Kudish, J. D., & Hoffman, B. J. (2002, October). Examining the relationship between assessment center final
dimension ratings and external measures of cognitive ability and personality. Paper presented at the 30th
International Congress on Assessment Center Methods, Pittsburgh, PA.
xxix Nakamura, Y. (2001). Rasch Measurement and Item Banking: Theory and Practice.
xxx Bergstrom, B. A., & Lunz, M. E. (1999). CAT for certification and licensure. Innovations in computerized
assessment, 67-91.
xxxi Embretson, S. E., & Reise, S. P. (2013). Item response theory. Psychology Press.
xxxii Van der Linden, W. J. (2018). Handbook of item response theory, three volume set. Chapman and Hall/CRC.
xxxiii American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1985). Standards for educational and psychological testing. Washington, DC:
American Educational Research Association.
xxxivUniform Guidelines on Employee Selection Procedures (EEOC, 1978), Retrieved September 11, 2019, from
https://www.eeoc.gov/policy/docs/factemployment_procedures.html.