Mettl Test For Critical Thinking



METTL TEST FOR CRITICAL THINKING
- Technical Manual

_____________________________________________________________________________________

Copyright © 2019 Mercer Mettl. All rights reserved.

Revised in 2022

This Manual may not, in whole or in part, be copied, photocopied, reproduced, translated, or converted to any electronic or machine-readable form without
prior written consent of Mercer Mettl.

Table of Contents

Executive Summary
Theoretical Foundations of the MTCT
    Definition of Critical Thinking
    Literature Review
Test Development & Standardization - MTCT
    Item Banking
    Item Bank Development
    Item Bank Calibration
Psychometric Properties of MTCT
    Internal Consistency Reliability
    Validity
    Group Differences: Adverse Impact Analysis
Administration, Scoring and Interpretation
Summary Remarks and Recommendations for Use
Appendices
    Appendix 1: Demographic Details of the Pilot Study (N = 600)
    Appendix 2: Demographic Details of the Standardization Study (N = 1270)
    Appendix 3: Criterion Validity Results
    Appendix 4: Adverse Impact Analysis
    Appendix 5: Sample Item and Sample Report
    Appendix 6: Demographic Details of the Norming Sample - Global (2021)
    Appendix 7: Demographic Details of the Norming Sample - Turkish (2021)
    Appendix 8: Demographic Details of the Norming Sample - Portuguese (2022)
    Appendix 9: Demographic Details of the Norming Sample - Spanish (2022)
References

Executive Summary

The purpose of this technical manual is to describe the process of standardization and validation of the Mettl Test for Critical Thinking (MTCT). The test requires the application of analytical reasoning in a verbal context. Critical thinking is an extremely important ability for employees in today’s organizations. With the rise of a VUCA world, automation and big data, the demand for a workforce with strong critical thinking skills is growingi. The ability to connect, interpret and analyse information in a world full of ambiguity and change requires a high level of critical thinking. According to McKinsey, the need for the basic cognitive skills used in data input and processing will decrease as automation rises, while demand for higher cognitive skills such as creativity, critical thinking, decision making and complex information processing will grow through 2030ii. In our experience, critical thinking tests are among the most important tests in employment testing. The previous version of this test was used extensively in hiring and developmental interventions for mid- to senior-level executives, and in hiring for critical roles at all job levels, from individual contributor to mid and senior management, across all industries. It was effective in measuring competencies such as strategic thinking, problem solving and decision making.

The Mettl Test for Critical Thinking helps measure the following abilities of test takers:
 Ability to collect information from relevant sources.
 Ability to critically analyse the information coming from diverse sources.
 Ability to interpret data rationally and draw valid conclusions.
 Ability to render accurate judgements based on evidence and the logical relationship between
propositions.
 Ability to recognize and solve problems efficiently.
 Ability to reflect and take logical and conclusive decisions.

The following goals guided the development of the MTCT:


 The test must be relevant and measure the critical thinking ability of the test takers.
 The test must be credible and psychometrically rigorous.


 The test should be easy to administer and simple to interpret.
 The test should not be too long: it should take no more than 20-30 minutes to administer.
 The test must be free from cultural bias and from adverse impact on any specific demographic group.
 The test should be developed as per the guidelines prescribed by the Standards for Educational and
Psychological Testing, developed jointly by the American Educational Research Association, American
Psychological Association, and National Council on Measurement in Education (1999), the EFPA test
review model, the Uniform Guidelines on Employee Selection Procedures (EEOC, 1978), and the Society for
Industrial and Organizational Psychology's Principles for the Validation and Use of Personnel Selection
Procedures (SIOP, 2003).

Theoretical Foundations of the MTCT


Definition of Critical Thinking
Critical thinking is widely considered one of the key skills required in the workforce of the futureiii. Higher
critical thinking ability helps an individual solve complex problems with suitable logic or reasoning and
take appropriate decisions. Future workforces will need to do more higher-level decision making, which will
require stronger critical thinking skills. These skills help individuals make clear and rational judgements by
adeptly conceptualizing, applying, analysing, synthesizing, or evaluating information.

Critical thinking has been defined by multiple researchers in different ways (Black, 2005iv; Moseley et al.,
2005; Sternberg, Roediger & Halpern, 2007). In general, it is defined as the ability of an individual to achieve
a desired outcome by thinking rationally and logically. According to Halpern (2003), “critical thinking is
purposeful, reasoned, and goal directed. It is the kind of thinking involved in solving problems, formulating
inferences, calculating likelihoods, and making decisions, when the thinker is using skills that are thoughtful
and effective for the particular context”. According to Mayer and Goodchild (1990)v, critical thinking is “an
active and systematic attempt to understand and evaluate arguments”. Beyer (1984)vi, on the other hand,
defined critical thinking as a combination of discrete skills, which include “(a) an ability to discriminate
between provable facts and value claims; (b) determining the reliability of a source; (c) distinguishing
relevant from irrelevant information, claims, or reasons; (d) detecting bias; (e) identifying unstated
assumptions; (f) identifying ambiguous or equivocal claims or arguments; (g) recognizing logical
inconsistencies or fallacies in a line of reasoning; (h) distinguishing between warranted or unwarranted
claims; and (i) determining the strength of an argument”.

According to Ghasemi Far (2004)vii, critical thinking involves the identification of problems, the estimation
of the relationships between the different components of a problem, inference, the combination of elements to
create a new thought pattern, and appropriate interpretations or conclusions. Halpern (1998)viii describes
critical thinking in terms of five skill areas: verbal reasoning, argument analysis, hypothesis testing,
likelihood and uncertainty, and decision making and problem solving. Watson and Glaser (2008)ix proposed a
five-factor model measuring five components of critical thinking: recognition of assumptions, inferences,
evaluation of arguments, interpretation, and deduction. It is important to note that intelligence and critical
thinking are separate constructs: they are related, but not identical. In sum, the major components of critical
thinking are judgement, reasoning, metacognition and reflective thinking.

Based on a thorough review of the critical thinking literature, we see the MTCT as a multi-faceted measure of
critical thinking covering three elements: recognizing assumptions, evaluating arguments, and drawing
conclusions. Recognizing assumptions is the ability to identify assumptions or suppositions made implicitly
while arriving at conclusions. Evaluating arguments is the ability to discriminate between weak and strong,
relevant and irrelevant arguments related to some specific matter. Lastly, drawing conclusions is the ability
to derive valid conclusions based on available evidence. In sum, as per our definition, critical thinking is a
rational way of thinking with clarity and precision. It includes questioning assumptions, making evaluations
that are impartial and precise, and identifying the relevant information when reaching conclusions.

Table 1: Summary of Critical Thinking Models/Instruments

 Watson-Glaser Critical Thinking Appraisal inventory (1980)x: inference, recognition of assumptions,
deduction, interpretation, and evaluation of arguments.
 Cornell Critical Thinking Test / Ennis-Weir Critical Thinking Test (1985)xi: induction, credibility,
prediction, semantics, deduction, definitions and assumption identification.
 California Critical Thinking Skills Test (CCTST; Facione & Facione, 1994)xii: analysis, evaluation,
inference, deductive reasoning, and inductive reasoning.
 Halpern Critical Thinking Assessment (HCTA; Halpern, 2012)xiii: (a) verbal reasoning, (b) argument
analysis, (c) thinking as hypothesis testing, (d) likelihood and uncertainty, (e) decision making and
problem solving.

Literature Review
The fast pace of change in our world has made the ability to think critically one of the most significant and
relevant skills for employees (Halpern, 2002)xiv. The World Economic Forum's (WEF) The Future of Jobs
report (2016)xv identifies critical thinking and complex problem solving as the most sought-after skills over
the next few years. Using logic and reasoning to recognise the strengths and weaknesses of alternative
solutions, making tough decisions and developing different approaches to solve problems are key skills
required in multiple job families, especially business and financial operations, architecture and engineering,
management, and computer and mathematical jobs. In summary, it is difficult to imagine any area or job where
the ability to think critically is not needed. Most jobs, present and future, will require employees to make
decisions, analyse arguments and solve problems every day.

Halpern (2006)xvi suggested that critical thinking is purposeful, reasoned and directed at solving problems,
calculating probabilities and making decisions. Critical thinking also facilitates the reasoning that helps
decide which factors to consider when taking decisions in a variety of settings (Halpern, 1998)xvii. The
majority of the empirical literature on critical thinking suggests a positive relationship between critical
thinking and academic performance (Ernst and Monroe, 2004xviii; Gadzella, Stephens and Stacks, 2004xix;
Kuhn, 1999xx; Lipman, 2003xxi; Zoller et al., 2000xxii). A study by Saremi & Bahdori (2015)xxiii showed that
critical thinking and creativity are positively correlated and that critical thinking is also significantly
correlated with emotional intelligence.

Ennis (1993)xxiv suggested that higher critical thinking skills result in a higher capacity to assess a problem
effectively, and Glevey (2006)xxv reported that individuals high on critical thinking usually come up with
better problem-solving strategies. A study by Khalili (2004)xxvi found a positive correlation between students'
critical thinking test scores and their GPA, as well as their scores in math and verbal courses. In another
study, Watson and Glaser (2009) found a positive relationship between critical thinking scores and
supervisory ratings of overall job performance and several dimensions of workplace performance, including
technical knowledge, judgement and problem solving. Spector, Schneider, Vance and Hezlett (2000)xxvii
suggested that critical thinking and problem-solving skills are positively correlated, and Kudish and Hoffman
(2002)xxviii reported a link between the critical thinking capability and the judgement and analysis ability of
retail professionals.

Test Development & Standardization- MTCT


The development and standardization study of the MTCT was conducted between April 2019 and September 2019.

Item Banking
The MTCT was developed using an item banking approach to generate multiple equivalent forms and support item
randomization. The term ‘item bank’ describes a group of items which are organized, classified and catalogued
systematically. According to research conducted by Nakamura (2000)xxix, Item Response Theory (IRT)
facilitates item bank standardization by calibrating and positioning all items in the bank on the same latent
continuum by means of a common metric. The same method can then be used to add further items to the bank and
so increase its strength. IRT also allows the construction of multiple equivalent tests as per the test
composition plan.

Our item bank was developed in line with our test composition plan, which is based on two parameters:
representation of all types of item content, and inclusion of easy, medium and difficult itemsxxx. All three
components of critical thinking (recognizing assumptions, evaluating arguments, and drawing conclusions) are
represented in the item bank. The test's composition is defined by a specified percentage of items from the
various content domains/rules as well as equal numbers of easy, medium and difficult items. This yields a
uniform content outline, which is crucial to confirming the construct validity of the test. An item bank
contains more questions than are needed for any one candidate, which enables random generation of items within
certain parameters and ensures each test is no more or less difficult than the last. Although item
characteristics can be estimated with both Classical Test Theory (CTT) and IRT models, the psychometric
literature indicates that the IRT method is more suitable for an item-banked test (Embretson & Reise, 2013xxxi;
Van der Linden, 2018xxxii). Classical item and test statistics based on the CTT model vary depending on sample
characteristics, whereas an IRT model provides ‘sample-free’ indices of item and test statistics. Therefore, we
use item response theory to standardize our item banks.

The advantages of using item bank methodology are as follows:


 All items in the bank are calibrated/validated in terms of psychometric properties with the help of
item response theory.
 Item banking also enables us to generate equivalent but different tests which can be randomly
assigned to test respondents.
 Item banks randomise questions which help to prevent cheating or piracy of items.
 New items can be added to the bank continuously, and overexposed items can be retired once they reach a
specific exposure level.
 Only fair and non-discriminatory items are included in the item bank, which reduces adverse impact for
different groups and produces fair assessments for all candidates.
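As an illustrative sketch of the composition plan described above (a fixed blueprint crossing content components with difficulty levels, with random draws within each cell), the following Python function assembles an equivalent form from a bank. The dictionary keys and the per-cell count are assumptions for illustration only, not Mettl's actual implementation.

```python
import random

def assemble_form(bank, n_per_cell, seed=None):
    """Draw a balanced test form from an item bank.

    `bank` is a list of dicts with hypothetical keys 'id', 'component'
    (e.g. 'assumptions', 'arguments', 'conclusions') and 'difficulty'
    ('easy', 'medium', 'hard'). n_per_cell items are drawn at random
    from every component x difficulty cell, so every generated form
    follows the same content blueprint.
    """
    rng = random.Random(seed)
    components = sorted({item["component"] for item in bank})
    levels = sorted({item["difficulty"] for item in bank})
    form = []
    for comp in components:
        for level in levels:
            cell = [it for it in bank
                    if it["component"] == comp and it["difficulty"] == level]
            if len(cell) < n_per_cell:
                raise ValueError(f"bank too small for cell ({comp}, {level})")
            form.extend(rng.sample(cell, n_per_cell))
    return form
```

Because every form draws the same number of items from every cell, any two forms share the same blueprint even though the specific items differ, which is the sense in which randomly generated forms stay "equivalent".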

Item Bank Development


The development of items typically goes through the following stages:
1. Item construction
2. Item review by a panel of experts
3. Pilot testing of items
4. Review of item properties based on pilot data
5. Test administration on a representative sample
6. Analysis of item and test properties
7. Item finalization and development of an item bank

Item Writing
The MTCT consists of a piece of text outlining the premise of the question, followed by four statements on
assumptions, arguments or conclusions. The test taker must identify which of the statements following the
premise are assumptions made, which arguments are strong or weak, or which conclusions are valid.
The following questions are asked at the end of each premise/statement:
 Which of the following conclusions can be drawn from the given information?
 Which of the following, if true, would most strengthen/weaken the above conclusion?
 Which of the following assumptions is the above conclusion based on?
The development of the test proceeded in four broad stages: item creation, item review, pilot testing and
standardization. In the first stage, a large pool of 127 items was developed by subject matter experts and
psychometricians. Detailed in-depth interviews were conducted with SMEs to explore which items and item
reasoning should be used. The following general rules were followed when designing the items:

 Items should not be based on any sensitive issues.
 Simple language should be preferred over jargon.


 No specific knowledge about the issue should be required to answer the items correctly.
 Items/statements should not include any culture specific elements/issues.
 There should be a balanced mix of items at all difficulty levels: easy, medium and difficult.
 There should be a balanced mix of items across all three aspects of critical thinking: recognizing
assumptions, evaluating arguments and drawing conclusions.

Item Review
Item reviews were conducted by our in-house psychometricians, who have over 10 years of research experience.
Items and answer keys were both reviewed in depth, and the difficulty level and logic of each item were
scrutinized. The items were also analysed for cultural neutrality, so that no ethnic or cultural group would
be advantaged or disadvantaged by culturally specific content. All items that did not meet these strict
standards were removed: after multiple rounds of review, 72 of the 127 original items were retained for the
next step.

Item Bank Calibration


Stage 1: Item trial for item difficulty estimation

Procedure: In the first stage we conducted a pilot study in which individual item parameters were estimated
using a Rasch model. The objective of the pilot study was to ascertain basic item properties, especially item
difficulty, for all 72 items. The 72 items were divided into three equivalent sets, and data were collected
through online administration of all three sets. All items were mandatory, and participants were not allowed
to skip an item without responding. Only respondents with at least a 90% completion rate were included in the
sample; those below this threshold were excluded from the final data set. This resulted in 170, 202 and 228
responses for the three sets respectively.

Sample Details: In the first stage, data were collected from 600 respondents: 45.2% were male, 47% were
female, and 0.7% chose ‘other’ as their gender.
English was the native language of 43% of respondents, and the mean age of the sample was 32.4 years. A
detailed description of the sample is reported in Appendix 1.

Analysis: A Rasch model was used to ascertain item properties at stage 1 because of the smaller sample size;
this model provides stable estimates with fewer than 30 responses per item. The Rasch model is the
one-parameter model of Item Response Theory, which estimates the probability of a correct response to a given
test item from two variables: the difficulty of the item and the ability of the candidate. The primary
function of the model here is to provide information on item difficulty, which helps to organize the test
items according to difficulty level, the spread of item difficulty and test length, ultimately increasing
measurement accuracy and test validity. Based on the Rasch results, items exhibiting extreme b parameters
(values substantially less than -3 or greater than +3) were rejected; 23 items from the initial pool of 72
were removed at this stage.
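The Rasch response model and the difficulty screening rule described above can be sketched in a few lines of Python. This is an illustration of the model's form, not the estimation software actually used for the study; in practice the b parameters come from a dedicated IRT fitting routine.

```python
import math

def rasch_p(theta, b):
    """Rasch (1PL) model: the probability of a correct response depends
    only on the gap between person ability theta and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def screen_by_difficulty(b_params, lo=-3.0, hi=3.0):
    """Apply the screening rule described above: keep only items whose
    estimated difficulty parameter b lies within [lo, hi]."""
    return {item: b for item, b in b_params.items() if lo <= b <= hi}
```

When ability equals difficulty (theta = b), the model gives a 50% chance of a correct response; higher-ability candidates have higher probabilities on the same item.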

Stage 2: Item bank calibration and estimation of psychometric properties of test

Procedure: A total of 49 items survived the pilot stage. These were ordered by difficulty parameter and
divided into three sets of 21 items each for the final stage of data collection, with 11 items repeated across
the sets to create three equivalent tests of 21 items each. The objective of the second stage was to
standardize the item bank and ascertain the essential psychometric properties (reliability and validity) of
the test. All items were again mandatory, and participants were not allowed to skip an item without
responding. Only respondents with at least a 90% completion rate were included in the sample; the rest were
excluded from the final data set. This resulted in 514, 384 and 372 responses for the three sets respectively.

Sample: In the second stage, data were collected from 1270 respondents: 46.9% were male, 45.4% were female,
and 2.7% identified their gender as ‘other’. English was the native language of 45.5% of respondents, and the
mean age of the sample was 31.3 years. A detailed description of the sample is reported in Appendix 2.

Figure 1: Sample Item Characteristic Curve

Analysis: In the second stage of analysis we used a two-parameter (2PL) model, in which the probability of a
correct response is a function of both item difficulty and the respondent's proficiency. The two-parameter IRT
model provides meaningful estimates of item difficulty and item discrimination. Items were finalized for the
item bank using the following procedure:

 Items with a b parameter (item difficulty) less than -3 or greater than +3 were removed from the data set.
 Items with an a parameter (item discrimination) less than .2 were also removed at this stage.

Three of the 49 items were removed, so the final bank consisted of 46 items with a balanced spread of easy,
medium and difficult items.
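As a hedged sketch (again, not the actual estimation code), the 2PL response model and the two retention rules above can be expressed as follows; the item dictionary layout is an assumption for illustration.

```python
import math

def two_pl_p(theta, a, b):
    """2PL model: the discrimination parameter a scales how sharply the
    probability of a correct response rises around the difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def finalize_bank(items, b_lo=-3.0, b_hi=3.0, a_min=0.2):
    """Apply the retention rules above: keep items whose difficulty b
    lies within [b_lo, b_hi] and whose discrimination a is at least a_min.
    `items` is a list of dicts with keys 'id', 'a' and 'b'."""
    return [it for it in items if b_lo <= it["b"] <= b_hi and it["a"] >= a_min]
```

Compared with the Rasch model, the extra a parameter lets a steeply discriminating item separate candidates near its difficulty much more sharply than a flat one.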

Psychometric Properties of MTCT


Internal Consistency Reliability
A commonly used indicator of internal consistency reliability is Cronbach's alpha, an index obtained by
examining the homogeneity of the items within an assessment; its value ranges from 0 to 1. As per the APA
Standards, there are three broad categories of reliability coefficients: alternate-form coefficients,
test-retest coefficients and internal-consistency coefficients. In the present study we computed Cronbach's
alpha coefficients, which are based on the relationships between scores on the individual items within the
MTCT, using data accrued from a single test administration. As per the APA Standards, “A higher degree of
reliability is required for score uses that have more significant consequences for test takers”. The EFPA BOA
test review model also provides guidance on Cronbach's alpha values; under some conditions a reliability of
0.70 is considered good. For the three sets of critical thinking tests generated, the median reliability
(internal consistency) was 0.72 and the interquartile range was 0.69 to 0.75. The range of the SEM across the
three sets was only 0.1.
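For readers who want the computation behind the reported coefficients, Cronbach's alpha can be computed from a respondents-by-items score matrix as follows. This is a plain-Python sketch of the standard formula, not the software used for the MTCT analyses.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix
    (list of lists): alpha = k/(k-1) * (1 - sum of item variances /
    variance of total scores), using sample variances."""
    k = len(scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

When items covary strongly the total-score variance dominates the summed item variances and alpha approaches 1; when items are unrelated it falls toward 0.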

Validity
Validity is the most fundamental property of any psychological test; it involves accumulating relevant
scientific evidence for test score interpretation. The APA Standardsxxxiii identify four major sources of
evidence to consider when evaluating the validity of a test: evidence based on test content, evidence based on
response processes, evidence based on internal structure, and evidence based on relationships with other
variables, especially criterion variables. To ascertain the validity of the MTCT we collected evidence based
on internal structure (construct validity) and evidence based on relationships with criterion variables
(criterion-related validity).

Construct Validity
The purpose of construct validation is to ascertain whether the test measures the proposed construct or
something else. The most common methods of ascertaining the construct validity of an assessment are
exploratory and confirmatory factor analysis. We used the CFA method because our objective was to test a
predefined unidimensional measurement model: one of the most important assumptions of using an IRT model as a
measurement system is that the item bank comprises unidimensional items. Confirmatory factor analysis was
therefore used to establish construct validity evidence. The CFA results confirmed the unidimensional factor
structure, with satisfactory fit statistics (IFI = .97; RMSEA = .02; CFI = .978; TLI = .973).

Criterion Validity
Criterion-related validity evidence indicates the extent to which assessment outcomes are predictive of
employee performance in a specified job or role. Two major methods are used to establish criterion-related
validity:

1. Concurrent Validity: In this method, data on the criterion measures are obtained at the same time as
the psychometric test scores. This indicates the extent to which the test scores accurately estimate
an individual's present job performance.
2. Predictive Validity: In this method, data on the criterion measures are obtained after the test. This
indicates the extent to which the test scores accurately predict a candidate's future performance.
Tests are administered to candidates when they apply for the job, their performance is reviewed after
six months or a year, and the scores on the two measures are then correlated to estimate the criterion
validity of the psychometric test.

To ascertain the MTCT's validity, concurrent criterion-related validity evidence was gathered: performance
data and MTCT scores were collected at the same time, the relationship between the two variables was tested,
and significant relationships were found. It is important to note that in criterion-related validity analysis
the precision and relevance of the criterion (employee performance) data are vital. Error in measurement of
the criterion is a threat to accurate assessment of the test's validity: it may attenuate the relationship
between test scores and criterion variables and thus lead to an erroneous criterion-related validity estimate.
The basic criteria for an appropriate, high-quality criterion measure are as follows. Researchers should
ensure the criterion:
• has a clear and objective definition and calculation of performance levels;
• aligns with the key demands of the role;
• has crucial implications for business outcomes;
• produces reasonable variance to effectively separate various performance levels.

Study Procedure: In the present study, MTCT scores were used as the predictor variable and respondents'
competency scores, based on line-manager ratings, were used as the criterion variable. Data were collected
from a multinational company specializing in HR consulting. A sample of 150 employees from this organization
was invited to participate in the study, and the purpose of the assessments was explained to them in detail.
After employees' responses to the MTCT were collected, a detailed competency-based performance rating form was
completed by their respective line managers. In this form all competencies were defined, and managers were
asked to rate each competency on a 10-point scale (1 = low, 10 = high). The Pearson product-moment correlation
was used to test the relationship between MTCT scores and competency ratings.

Sample: A total of 111 employees participated in the study and completed the MTCT. We received managerial
ratings on competencies for only 87 of these respondents. The mean age of the sample was 35.4 years, 51% of
respondents were male and 49% were female. 74% of the respondents worked as Analysts and Consultants
and the remaining 26% were Leaders and Product Owners.

Analysis: The Pearson product-moment correlation was used to test the relationship between MTCT scores and
line-manager competency ratings. Results indicate significant positive correlations between the MTCT score and
competency ratings: the MTCT score is positively correlated with critical thinking (r = .368, p < .01),
analytical ability (r = .335, p < .01), fluid intelligence (r = .301, p < .01), innovation (r = .215, p < .05)
and verbal comprehension (r = .244, p < .05). These correlation coefficients are not corrected for attenuation
or range restriction. The MTCT score is also positively correlated with learning orientation, employability
and ability with numbers (refer to Appendix 3, Table 1).
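The Pearson product-moment correlation used throughout this analysis can be sketched directly from its definition; this is a minimal illustration, not the statistical package used in the study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length
    score lists: covariance of x and y divided by the product of
    their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 indicates that higher test scores go with higher ratings, near -1 the reverse, and near 0 no linear relationship; significance of r additionally depends on the sample size.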

Group Differences: Adverse Impact Analysis


Definition of Adverse Impact (UGESP, 1978)
The Uniform Guidelines on Employee Selection Procedures (UGESP, 1978xxxiv) defines Adverse Impact as “a
substantially different rate of selection in hiring, promotion, or other employment decisions which works to
Page 16

the disadvantage of members of a race, sex or ethnic group” (see section 1607.16). UGESP recommends the
four-fifths rule for examining the potential of Adverse Impact, stating that the “selection rate for any race, sex
or ethnic group which is less than four-fifths (4/5) (or 80%) of the rate for the group with the highest rate
will generally be regarded by the Federal enforcement agencies as evidence of adverse impact.” (1978, see
section 1607.4 D). Courts have also applied this rule to cases involving age discrimination. The Age Discrimination in Employment Act (ADEA) of 1967 prohibits discrimination in selection contexts against
individuals 40 years of age or older. In addition, the UK’s Equality Act (2010) legally protects people from
discrimination in the workplace and in wider society. Researchers have proposed alternative methods for
examining Adverse Impact (e.g., moderated multiple regression, one-person rule, and the N of 1 rule),
although none have been as widely adopted as the four-fifths rule. Additionally, a statistical significance test
for mean group differences on individual assessment scales is often considered informative.
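The four-fifths rule described above reduces to a simple ratio of selection rates. The sketch below uses illustrative numbers, not figures from this study:

```python
def adverse_impact_ratio(focal_selected, focal_applicants,
                         ref_selected, ref_applicants):
    """Ratio of the focal group's selection rate to the reference
    (highest-rate) group's selection rate. Values below 0.8 are treated
    as evidence of potential adverse impact under UGESP's 4/5ths rule."""
    focal_rate = focal_selected / focal_applicants
    ref_rate = ref_selected / ref_applicants
    return focal_rate / ref_rate

# Illustrative: 30 of 100 focal-group applicants selected vs. 50 of 100
# in the reference group.
ratio = adverse_impact_ratio(30, 100, 50, 100)
print(ratio, "potential adverse impact" if ratio < 0.8 else "passes 4/5ths rule")
```

Here the ratio is 0.6, below the 0.8 threshold, so the selection procedure would be flagged for further review.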

In the present study, group differences on the MTCT based on age, gender, and ethnicity were examined and are reported in Tables 1-3 (refer to Appendix 4). Table 1 presents the comparison of mean MTCT scores by gender. Results indicate that there is no significant difference in mean score between male and female respondents. Table 2 presents mean scores for two age groups: those under 40 years of age and those 40 and older. Results indicate these differences are not statistically significant. We also examined the mean difference in MTCT scores between two groups: White (reference group) and others (focal group). This difference was statistically significant and, based on traditional ranges for interpreting effect sizes (Cohen's d; Cohen, 1988), medium in magnitude. In conclusion, while some differences in mean scores were observed, the effect sizes were small to medium and within commonly accepted ranges.
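The effect size for the ethnicity comparison can be recomputed from the summary statistics reported in Appendix 4, Table 3 alone. The sketch below uses the standard pooled-standard-deviation form of Cohen's d; it yields roughly 0.48, close to the reported 0.47, with the small gap plausibly due to rounding or a slightly different pooling formula.

```python
from math import sqrt

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)

# Summary statistics from Appendix 4, Table 3 (White vs. Others).
d = cohens_d(11.17, 3.75, 381, 9.46, 3.41, 534)
print(round(d, 2))  # ≈ 0.48, close to the reported 0.47
```

By Cohen's (1988) conventions, d around 0.2 is small, 0.5 medium, and 0.8 large, which is why the manual labels this difference medium.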

Additionally, to test the impact of English language skills on MTCT scores, we examined the mean difference in MTCT scores between native English speakers and non-native English speakers. Results indicate this difference is statistically significant, although an examination of effect sizes indicates it is medium in magnitude. This suggests that English language proficiency has only a moderate influence on MTCT scores, supporting its use as a global, culture-agnostic tool (refer to Appendix 4, Table 4).

Administration, Scoring and Interpretation

Test Administration
The MTCT is an online test administered through an internet-based testing system designed by Mettl for the administration, scoring, and reporting of occupational tests. Test takers are sent a link to complete the test, and their data is instantly captured for processing through the online system. Test scores and interpretive reports are generated immediately. Tests can also be administered remotely; all candidate data, question banks, reports, and benchmarks are stored in an encrypted, secure cloud service. To prevent cheating and other forms of malpractice, Mettl's platform also offers AI-powered anti-cheating features, including live monitoring, candidate authentication, and secure browsing.

Scoring
Responses to the MTCT are scored based on the number of correct answers a respondent chooses. Each item has 4 answer options, of which only one is correct. Each item answered correctly is awarded 1 mark; items answered incorrectly or not attempted receive 0 (zero). An individual's overall raw score is the total number of items answered correctly. Raw scores are then converted to Sten scores, which lie on a 10-point scale, using the formula below.

Sten = (Z-score × 2) + 5.5
or equivalently
Sten = [(X − M) / SD] × 2 + 5.5
where X is the raw score, and M and SD are the mean and standard deviation of the norm group.
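The raw-to-Sten conversion above can be sketched as follows. Note one assumption not stated in the manual: Sten scores are conventionally bounded to the 1-10 range, which the linear formula alone does not guarantee, so the sketch clamps out-of-range values.

```python
def sten(raw, mean, sd):
    """Convert a raw score to a Sten score: (z * 2) + 5.5.
    Clamping to [1, 10] is an assumption; the manual does not state
    how out-of-range values are handled."""
    z = (raw - mean) / sd
    return min(10.0, max(1.0, z * 2 + 5.5))

# Example: raw score of 14 against a norm group with M = 10.2, SD = 3.7.
print(sten(14, 10.2, 3.7))  # ≈ 7.55
```

A raw score exactly at the norm-group mean maps to a Sten of 5.5, the midpoint of the scale.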
Test Composition
Each test taker is asked to complete 21 items in 30 minutes. Sample items and a sample report are available
in Appendix 5.

Interpretation
The MTCT measures the critical thinking ability of test takers working in a variety of individual contributor or managerial roles, and is suitable for use in both recruitment and development settings. Critical thinking is defined as the ability to identify and analyse problems and to seek and evaluate relevant information in order to reach appropriate conclusions. A high score on the MTCT indicates that the test taker has a strong ability to think rationally, solve problems, and make effective decisions.

Summary Remarks and Recommendations for Use


Mettl recommends that the MTCT be used with the following caveats and tips in mind:
 Use with other tests: The MTCT, like any other hiring tool, is best used as part of a systematic selection
process, along with other scientifically developed and job-relevant predictors of future success.
Ideally, the MTCT should be administered to job applicants who possess the minimum requirements
for the job. The assessment results can serve as an important part of the hiring decision – but not the
only one.
 Aggregate results: The MTCT, when used as recommended with large numbers of job applicants, will yield a better-quality workforce over time. However, as with any assessment of human abilities, it is not infallible and should be used in conjunction with other information and followed up with behavioural tools such as structured interviews and competency assessments.
 Simple to complex: If the primary focus is to screen out candidates unlikely to succeed, hiring managers should first eliminate those "not recommended for hire" from the pool. Among those remaining, candidates "recommended for hire" should be prioritized, followed by those "cautiously recommended for hire".

Appendices
Appendix 1: Demographic details of Pilot study (N = 600)

Table 1: Gender
Gender Frequency Percent
Female 271 45.2
Male 284 47.3
Other 4 0.7
Prefer not to say 41 6.8

Table 2: Age
Age Frequency Percent
20 - 30 years 323 53.8
31-40 years 142 23.7
41-50 years 91 15.2
51-60 years 44 7.3

Table 3: Years of work experience


Years of work experience Frequency Percent

0-5 years 367 61.2


6-10 years 78 13
11-15 years 42 7
16-20 years 53 8.8
20+ years 60 10

Table 4: Educational Qualification


Educational Qualification Frequency Percent
Non-Graduate 93 15.5
Bachelors 306 51
Doctorate 19 3.2
Masters 182 30.3

Table 5: Employment Status

Employment Status Frequency Percent


Seeking Employment 165 27.5
Student 127 21.2
Working 308 51.3

Table 6: Job Level

Job Level Frequency Percent


Level 1: Executive Officers: Senior-most Leaders (CEO +
One Level Below) 77 12.8
Level 2: Senior Managers/Directors: Senior Management
(Three Levels Below CEO). 59 9.8
Level 3: Managers /Supervisors: Middle management to
first-level managers (Five Levels Below CEO) 166 27.7
Level 4: Entry Level: Non-management/ individual
contributor (including entry level) 166 27.7
Not Applicable 132 22.0

Table 7: Industry Details

Industry Frequency Percent


Consulting 82 13.7
Education 59 9.8
Financial services, Banking, Insurance 76 12.7
Government, Public service, Defense 71 11.8
Health Care 28 4.7
Human Resources 7 1.2
Information Technology & Telecommunications 55 9.2
Manufacturing & Production 27 4.5
Not Applicable 72 12
Professional services 24 4
Publishing, Printing 2 0.3
Trading 11 1.8
Others 86 14.3

Table 8: Nature of Occupation

Nature of Occupation Frequency Percent


Architecture and Engineering 61 10.2
Arts, Design, Entertainment, Sports, and Media 31 5.2
Building and Grounds Cleaning and Maintenance 15 2.5
Business and Financial Operations 102 17
Community and Social Service 9 1.5
Computer and Mathematical 21 3.5
Construction and Extraction 4 0.7
Education, Training, and Library 13 2.2
Farming, Fishing, and Forestry 2 0.3
Food Preparation and Serving Related 5 0.8
Healthcare Practitioners and Technical 9 1.5
Healthcare Support 16 2.7
Installation, Maintenance, and Repair 5 0.8
Legal 23 3.8
Life, Physical, and Social Science 4 0.7
Management 55 9.2
Military Specific 2 0.3
Not Applicable 71 11.8
Office and Administrative Support 27 4.5
Personal Care and Service 3 0.5
Production 9 1.5
Protective Service 3 0.5
Sales and Related 34 5.7
Transportation and Material Moving 10 1.7
Others 66 11

Table 9: Nationality

Nationality Frequency Percent


Africa 128 21.3
Asia 187 31.2
Australia & NZ 23 3.8
Europe 65 10.8
LATAM 3 0.5
NA 25 4.2
UK 130 21.7
USA & Canada 39 6.5

Table 10: Ethnicity


Ethnicity Frequency Percent
Asian 193 32.2
Black 145 24.2
Chinese 14 2.3
White 173 28.8
Others 75 12.5

Appendix 2: Demographic Details of Standardization Study (N = 1270)


Table 1: Gender

Gender Frequency Percent


Male 596 46.9
Female 577 45.4
Others 34 2.7
Prefer not to say 63 5

Table 2: Age

Age Frequency Percent


20 - 30 years 738 58.1
31-40 years 319 25.1
41-50 years 127 10
51-60 years 86 6.8
Table 3: Work Experience

Work Experience Frequency Percent


0-5 years 694 54.6
6-10 years 192 15.1
11-15 years 173 13.6
16-20 years 74 5.8
20+ years 137 10.8

Table 4: Educational Qualifications


Educational Qualifications Frequency Percent
Non-Graduate 195 15.4
Bachelors 674 53.1
Masters 363 28.6
Doctorate 38 3

Table 5: Employment Status

Employment Status Frequency Percent


Seeking Employment 333 26.2
Student 316 24.9
Working 621 48.9

Table 6: Job Level

Job Level Frequency Percent


Level 1: Executive Officers: Senior-most Leaders (CEO + 157 12.4
One Level Below).
Level 2: Senior Managers/Directors: Senior Management 119 9.4
(Three Levels Below CEO).
Level 3: Managers/Supervisors: Middle management to 259 20.4
first-level managers (Five Levels Below CEO).
Level 4: Entry Level: Non-management/ individual 401 31.6
contributor (including entry level).
Not Applicable 334 26.3

Table 7: Industry

Industry Frequency Percent


Consulting 257 20.2
Education 121 9.5
Financial services, Banking, Insurance 183 14.4
Government, Public service, Defence 89 7
Health Care 55 4.3
Human Resources 18 1.4
Information Technology & Telecommunications 87 6.9
Manufacturing & Production 61 4.8
Not Applicable 168 13.2
Others 169 13.3
Professional services 49 3.9
Publishing, Printing 5 0.4
Trading 8 0.6

Table 8: Nature of Occupation

Nature of Occupation Frequency Percent


Architecture and Engineering 130 10.2
Arts, Design, Entertainment, Sports, and Media 57 4.5
Building and Grounds Cleaning and Maintenance 19 1.5
Business and Financial Operations 285 22.4
Community and Social Service 14 1.1
Computer and Mathematical 53 4.2
Construction and Extraction 8 0.6
Education, Training, and Library 49 3.9
Farming, Fishing, and Forestry 8 0.6
Food Preparation and Serving Related 15 1.2
Healthcare Practitioners and Technical 21 1.7
Healthcare Support 20 1.6
Installation, Maintenance, and Repair 8 0.6
Legal 62 4.9
Life, Physical, and Social Science 16 1.3
Management 54 4.3
Military Specific 16 1.3
Not Applicable 176 13.9
Office and Administrative Support 30 2.4
Others 141 11.1
Personal Care and Service 7 0.6
Production 19 1.5
Protective Service 4 0.3
Sales and Related 33 2.6
Transportation and Material Moving 25 2

Table 9: Nationality

Nationality Frequency Percent


Africa 173 13.6
Asia 412 32.4
Australia & NZ 97 7.6
Europe 125 9.8
LATAM 12 0.9
NA 46 3.62
UK 303 23.8
USA & Canada 102 8

Table 10: Ethnicity

Ethnicity Frequency Percent


White 381 30
Black 145 11.4
Asian 359 28.3
Chinese 30 2.4
Prefer not to say 355 28

Appendix 3: Criterion Validity Results

Table 1: Correlation Analysis (N = 87)

Competencies Correlations (N = 87)


Verbal Comprehension .244*
High Potential 0.194
Learning Orientation 0.178
Employability 0.172
Critical Thinking .368**
Innovation .215*
Ability with Numbers .291**
Analytical Ability .335**
Fluid Intelligence .301**

Appendix 4: Adverse Impact Analysis

Table 1: Mean differences – Gender

Gender N Mean SD t value p value


Male 596 10.26 3.84 0.85 0.39
Female 577 10.07 3.65

Table 2: Mean differences – Age group

Age group N Mean SD t value p value


Less than 40 years 1057 10.27 3.79 0.592 .55
More than 40 years 213 10.44 3.54

Table 3: Mean differences – Ethnicity

Ethnicity N Mean SD t value Effect size


White 381 11.17 3.75 7.14** 0.47
Others 534 9.46 3.41

Table 4: Mean differences – Native Language

Native Language N Mean SD t value Effect size


English 579 11.27 3.7 8.7** 0.48
Others 691 9.49 3.59

Appendix 5: Sample Item and Sample Report


Sample Item:

Sample Report:

Appendix 6: Demographic details for the norming sample-Global (2021)

Section 1: Drawing Conclusions


The norms for Critical Thinking-Drawing Conclusions for the Global region have been established on a representative sample of 5878 respondents. These norms are based on responses from candidates who attempted 7 randomly selected questions (2 easy, 3 medium, and 2 difficult). The average time taken to complete the Critical Thinking-Drawing Conclusions assessment is 8.95 minutes. A detailed breakdown of gender, age, industry, and region is presented in the following tables.

Table 1: Distribution by Gender


Gender % of Sample

Female 14.21

Male 37.12

Did not Specify 48.67

Table 2: Distribution by Age


Age group % of Sample

Upto 20 Years 0.58

20-30 Years 13.30

30-40 Years 22.71

40-50 Years 13.15

50-60 Years 3.13

Above 60 Years 0.34

Not Specified 46.78



Table 3: Distribution by Country


Region % of Sample

India 28.63

Sri Lanka 2.23

Portugal 1.97

Malaysia 1.94

United States 1.84

Vietnam 1.75

Romania 1.46

United Kingdom 1.28

Saudi Arabia 1.12

Others 8.56

Not Specified 49.22



Table 4: Distribution by Industry


Industry % of Sample

Information Technology and Services 16.16

Telecommunications 15.91

Education Management 13.88

Management Consulting 9.05

Accounting 2.18

Professional Training & Coaching 2.16

Insurance 2.16

Financial Services 2.14

Retail 1.60

Banking 1.60

Computer Software 1.60

Automotive 1.57

Higher Education 1.55

Internet 1.48

Market Research 1.40

Chemicals 1.05

Food & Beverages 1.02

Human Resources 1.00

Others 14.00

Not Specified 8.49



Section 2: Evaluating Arguments


The norms for Critical Thinking-Evaluating Arguments for the Global region have been established on a representative sample of 10350 respondents. The average time taken to complete the Critical Thinking-Evaluating Arguments assessment is 8.63 minutes. A detailed breakdown of gender, age, country, and industry is provided in the following tables.

Table 5: Distribution by Gender


Gender % of Sample

Female 13.52

Male 36.52

Did not Specify 50.18

Table 6: Distribution by Age


Age Group % of Sample

Upto 20 Years 0.07

20-30 Years 15.18

30-40 Years 22.39

40-50 Years 13.38

50-60 Years 3.45

Above 60 Years 0.42

Not Specified 45.12

Table 7: Distribution by Country



Region % of Sample

India 28.03

Sri Lanka 2.05

Portugal 1.60

Malaysia 1.93

United States 1.31

Vietnam 1.61

Romania 1.81

United Kingdom 1.21

Saudi Arabia 1.31

Others 8.04

Not Specified 51.09



Table 8: Distribution by Industry


Industry % of Sample

Information Technology and Services 14.93

Telecommunications 17.28

Education Management 12.87

Management Consulting 8.75

Accounting 2.73

Professional Training & Coaching 2.14

Insurance 1.89

Financial Services 2.20

Retail 1.40

Banking 1.83

Computer Software 1.79

Automotive 1.84

Higher Education 1.22

Internet 1.69

Market Research 1.16

Chemicals 1.24

Human Resources 1.61

Others 15.41

Not Specified 8.02



Section 3: Recognizing Assumptions


The norms for Critical Thinking-Recognizing Assumptions for the Global region have been established on a sample of 12581 respondents. The average time taken to complete the assessment is 8.29 minutes. A detailed breakdown of gender, age, country, and industry is provided in the following tables.

Table 9: Distribution by Gender


Gender % of Sample

Female 14.74

Male 36.99

Did not Specify 48.26

Table 10: Distribution by Age


Age Group % of Sample

Upto 20 Years 0.22

20-30 Years 11.98

30-40 Years 23.04

40-50 Years 13.85

50-60 Years 3.61

Above 60 Years 0.43

Not Specified 46.87



Table 11: Distribution by Country


Region % of Sample

India 27.74

Sri Lanka 2.29

Portugal 1.51

Malaysia 1.80

United States 1.38

Vietnam 1.55

Romania 1.71

United Kingdom 1.14

Saudi Arabia 1.28

Others 9.89

Not Specified 49.71



Table 12: Distribution by Industry


Industry % of Sample

Information Technology and Services 14.37

Telecommunications 16.38

Education Management 11.47

Management Consulting 11.74

Accounting 2.68

Professional Training & Coaching 2.06

Insurance 1.69

Financial Services 2.11

Retail 1.49

Banking 2.03

Computer Software 1.70

Automotive 1.69

Higher Education 1.22

Internet 1.59

Market Research 1.05

Chemicals 1.26

Human Resources 1.61

Others 15.44

Not Specified 8.39

CT Global Norms Oct-21



Appendix 7: Demographic details of the norming sample-Turkish (2021)

Section 1: Drawing Conclusions


The norms for Critical Thinking-Drawing Conclusions for the Turkish region have been developed based on a representative sample of 3404 respondents. These norms are based on responses from candidates who attempted 7 randomly selected questions (2 easy, 3 medium, and 2 difficult). The average time taken to complete this assessment is 9.65 minutes. A detailed demographic description of the respondents' gender and area of work is presented in the following tables.

Table 1: Distribution of Gender


Gender % of Sample
Female 18.21
Male 35.07
Did not Specify 46.72

Table 2: Distribution of Industry


Industry % of Sample
Finance 33.34
Technology 61.43
Others 5.22

Section 2: Evaluating Arguments


The norms for Critical Thinking-Evaluating Arguments for the Turkish region have been developed based on a representative sample of 3960 respondents. The average time taken to complete this assessment is 9.14 minutes. A detailed demographic description of the respondents' gender and area of work is presented in the following tables.
Table 3: Distribution of Gender
Gender % of Sample
Female 35.63
Male 35.88
Did not Specify 28.49

Table 4: Distribution of Industry


Industry % of Sample
Finance 33.81
Technology 61.33
Others 4.85

Section 3: Recognizing Assumptions


The norms for Critical Thinking-Recognizing Assumptions for the Turkish region have been developed based on a representative sample of 3538 respondents. The average time taken to complete this assessment is 7.83 minutes. A detailed demographic description of the respondents' gender and area of work is presented in the following tables.
Table 5: Distribution of Gender
Gender % of Sample
Female 17.44
Male 35.30
Did not Specify 47.26

Table 6: Distribution of Industry


Industry % of Sample
Finance 34.68
Technology 60.62
Others 4.7

CT Turkish Norms Sept-21



Appendix 8: Demographic details of the norming sample-Portuguese (2022)

Section 1: Drawing Conclusions
The norms for Critical Thinking-Drawing Conclusions for the LATAM region (Portuguese) have been developed based on 979 respondents. These norms are based on responses from candidates who attempted 7 randomly selected questions (2 easy, 3 medium, and 2 difficult). The average time taken to complete the Critical Thinking-Drawing Conclusions assessment is 9.70 minutes. The demographic details of gender, age group, and industry are provided in the following tables.

Table 1: Distribution of Gender


Gender % of Sample
Females 14.91
Males 31.15
Not Specified 53.93

Table 2: Distribution of Age


Age group % of Sample
20-29 Years 5.41
30-39 Years 25.23
40-49 Years 24.72
50-59 Years 6.03
Above 60 Years 0.92
Not Specified 37.69

Table 3: Distribution of Industry


Industry % of Sample
Information Services 46.48

Financial Services 14.61

Higher Education 11.24

Farming 6.84

Telecommunications 9.40
Pharmaceuticals 3.68
Oil & Energy 1.53

Management Consulting 1.53

Not Specified 4.70

Section 2: Evaluating Arguments

The norms for Critical Thinking-Evaluating Arguments for the LATAM region (Portuguese) have been developed based on 1244 respondents. The average time taken to complete the Critical Thinking-Evaluating Arguments assessment is 8.86 minutes. The demographic details of gender, age group, and industry are provided in the following tables.

Table 4: Distribution of Gender


Gender % of Sample
Females 12.14
Males 25.88
Not Specified 61.98

Table 5: Distribution of Age


Age Group % of sample
20-29 Years 5.39
30-39 Years 23.63
40-49 Years 24.47
50-59 Years 6.59
Above 60 Years 1.05
Not Specified 39.07

Table 6: Distribution of Industry


Industry % of Sample
Information Services 43.09
Financial Services 16.08

Higher Education 12.54

Farming 7.23

Telecommunications 6.83
Pharmaceuticals 3.70

Oil & Energy 2.57


Management Consulting 1.85

Not Specified 6.11

Section 3: Recognizing Assumptions


The norms for Critical Thinking-Recognizing Assumptions for the LATAM region (Portuguese) have been developed based on 1331 respondents. The average time taken to complete the Critical Thinking-Recognizing Assumptions assessment is 8.15 minutes. The demographic details of gender, age group, and industry are provided in the following tables.

Table 7: Distribution of Gender


Gender % of Sample
Females 11.03
Males 23.07
Not Specified 65.89

Table 8: Distribution of Age


Age group % of Sample
20-29 Years 5.18
30-39 Years 23.14
40-49 Years 20.96
50-59 Years 6.39
Above 60 Years 1.35
Not Specified 42.98

Table 9: Distribution of Industry


Industry % of Sample

Information Services 43.13

Financial Services 17.13

Higher Education 13.22

Farming 7.51

Telecommunications 4.21

Pharmaceuticals 4.13

Oil & Energy 2.85

Management Consulting 2.03

Not Specified 5.79


Mercer Mettl Test for Critical Thinking – Technical Manual

Appendix 9: Demographic details for the norming sample-Spanish (2022)

Section 1: Drawing Conclusions

The norms for Critical Thinking-Drawing Conclusions for the LATAM region (Spanish) have been developed based on 6725 respondents. These norms are based on responses from candidates who attempted 7 randomly selected questions (2 easy, 3 medium, and 2 difficult). The average time taken to complete the Critical Thinking-Drawing Conclusions assessment is 9.39 minutes. The demographic details of gender, age group, and industry are provided in the following tables.

Table 1: Distribution of Gender


Gender % of Sample
Females 32.31
Males 44.54
Not Specified 23.15

Table 2: Distribution of Age


Age Group % of Sample
Below 20 Years 0.04
20-29 Years 16.59
30-39 Years 27.87
40-49 Years 13.99
50-59 Years 3.55
Above 60 Years 0.31
Not Specified 37.64


Table 3: Distribution of Industry


Industry % of Sample

Food & Beverages 22.62

Management Consulting 22.57

Financial Services 10.87

Information Technology and Services 9.62

Health, Wellness and Fitness 4.92

Information Services 4.89

Telecommunications 2.88

Machinery 1.84

Human Resources 1.47

Banking 1.24

Insurance 1.04

Investment Banking 1.10

Others 4.54

Not Specified 10.39

Section 2: Evaluating Arguments


The norms for Critical Thinking-Evaluating Arguments for the LATAM region (Spanish) have been developed based on 6796 respondents. The average time taken to complete the Critical Thinking-Evaluating Arguments assessment is 9.098 minutes. The demographic details of gender, age group, and industry are provided in the following tables.

Table 4: Distribution of Gender


Gender % of Sample
Females 33.06
Males 43.44
Not Specified 23.50


Table 5: Distribution of Age


Age Group % of Sample
Below 20 Years 0.03
20-29 Years 16.23
30-39 Years 28.59
40-49 Years 14.83
50-59 Years 3.96
Above 60 Years 0.40
Not Specified 35.96

Table 6: Distribution of Industry


Industry % of Sample

Food & Beverages 21.04

Management Consulting 21.81

Financial Services 11.42

Information Technology and Services 9.77

Health, Wellness and Fitness 5.94

Information Services 5.78

Telecommunications 3.47

Machinery 1.71

Human Resources 1.46

Banking 1.21

Others 6.78

Not Specified 9.61


Section 3: Recognizing Assumptions

The norms for Critical Thinking-Recognizing Assumptions for the LATAM region (Spanish) have been developed based on 8085 respondents. The average time taken to complete the Critical Thinking-Recognizing Assumptions assessment is 8.435 minutes. The demographic details of gender, age group, and industry are provided in the following tables.

Table 7: Distribution of Gender


Gender % of Sample
Females 32.53
Males 44.04
Not Specified 23.43

Table 8: Distribution of Age


Age group % of Sample
Below 20 Years 0.02
20-29 Years 16.69
30-39 Years 27.95
40-49 Years 14.01
50-59 Years 3.77
Above 60 Years 0.36
Not Specified 37.19


Table 9: Distribution of Industry


Industry % of Sample

Food & Beverages 23.73

Management Consulting 23.54

Financial Services 10.44

Information Technology and Services 9.47

Health, Wellness and Fitness 5

Information Services 4.01

Telecommunications 3.91

Machinery 1.98

Human Resources 1.23

Banking 1.14

Insurance 1.10

Investment Banking 1.22

Others 4.06

Not Specified 11.09


References

i Desjardins, J. (2018). 10 skills you'll need to survive the rise of automation. Retrieved September 26, 2019
from https://www.weforum.org/agenda/2018/07/the-skills-needed-to-survive-the-robot-invasion-of-the-
workplace.
ii Bughin, J., Hazan, E., Lund, S., Dahlström, P., Wiesinger, A., & Subramaniam, A. (2018). Skill shift: Automation and the future of the workforce. Retrieved September 26, 2019 from https://www.mckinsey.com/featured-insights/future-of-work/skill-shift-automation-and-the-future-of-the-workforce
iii Marr, B. (2019). The 10 Vital Skills You Will Need for The Future of Work. Retrieved September 26, 2019 from

https://www.forbes.com/sites/bernardmarr/2019/04/29/the-10-vital-skills-you-will-need-for-the-future-
of-work/#4c22043f3f5b.
iv Black, S. (2005). Teaching students to think critically. The Education Digest, 70(6), 42–47.
v Mayer, R., & Goodchild, F. (1990). The critical thinker. New York: Wm. C. Brown.
vi Beyer, B. K. (1984). Improving thinking skills-defining the problem. Phi Delta Kappan, 65, 486–490.
vii Saremi, H., & Bahdori, S. (2015). The relationship between critical thinking with emotional intelligence and

creativity among elementary school principals in Bojnord city, Iran. International Journal of Life Sciences, 9(6),
33-40.
viii Halpern, D. F. (1998). Teaching critical thinking across domains: dispositions, skills, structure training, and

metacognitive monitoring. American Psychologist, 53(4), 449–455.


ix Watson, G., & Glaser, E. M. (2008). Watson-Glaser critical thinking appraisal: Short form manual. NY: Pearson.
x Watson, G., & Glaser, E. M. (1980). Manual for the Watson Glaser critical thinking appraisal.

Cleveland, OH: Psychological Corporation.


xi Ennis, R. H., Millman, J., & Tomko, T. N. (1985). Cornell Critical Thinking Tests Level X & Level Z:

Manual. Boise, ID: Midwest Publications


xii Facione, N. C., & Facione, P. A. (1994). The "California Critical Thinking Skills Test" and the National League for Nursing Accreditation Requirement in Critical Thinking.


xiii Halpern, D. F. (2012). Halpern Critical Thinking Assessment: Test Manual. Mödling, Austria:

Schuhfried GmbH.
xiv Halpern, D. F. (2013). Thought and knowledge: An introduction to critical thinking. Psychology Press.
xv The Future of Jobs Employment, Skills and Workforce Strategy for the Fourth Industrial Revolution. Global

Challenge Insight Report (2016). Retrieved October 10, 2019 from


http://www3.weforum.org/docs/WEF_Future_of_Jobs.pdf.
xvi Halpern, D. F. (2006). Is intelligence critical thinking? Why we need a new definition of intelligence. Extending intelligence: Enhancement and new constructs, 293–310.


xvii Halpern, D. F. (1998). Teaching critical thinking for transfer across domains: Disposition, skills, structure

training, and metacognitive monitoring. American psychologist, 53(4), 449.


xviii Ernst, J., & Monroe, M. (2004). The effects of environment‐based education on students' critical thinking

skills and disposition toward critical thinking. Environmental Education Research, 10(4), 507-522.
xix Gadzella, B. M., Stephens, R., & Stacks, J (2004). Assessment of critical thinking scores in relation with

psychology and GPA for education majors. Paper presented at the Texas A & M University Assessment
Conference, College Station, TX.
xx Kuhn, D. (1999). A developmental model of critical thinking. Educational Researcher, 28(2), 16-25
xxi Lipman, M. (2003). Thinking in education (2nd ed.). Cambridge, MA: Cambridge University Press
xxii Zoller, U., Ben-Chaim, D., Ron, S., Pentimalli, R., & Borsese, A. (2000). The disposition toward critical thinking

of high school and university science students: An inter-intra Israeli-Italian study. International Journal of
Science Education, 22(6), 571-582.
xxiii Saremi, H., & Bahdori, S. (2015). The relationship between critical thinking with emotional intelligence and

creativity among elementary school principals in Bojnord city, Iran. International Journal of Life Sciences, 9(6),
33-40.
xxiv Ennis, R. (1993). Critical thinking assessment. Theory into Practice, 32, 179-186
xxv Glevey, K.E. (2006). Promoting thinking skills in education. London Review of Education, 4 (3), 291-302.


xxvi Khalili, H. (2004). Critical thinking skills of nursing students in Semnan University of Medical Sciences.
Iranian Journal of Medical Education, 4(2), 23-31.
xxvii Spector, P. A., Schneider, J. R., Vance, C. A., & Hezlett, S. A. (2000). The relation of cognitive ability and

personality traits to assessment center performance. Journal of Applied Social Psychology, 30(7), 1474– 1491.
xxviii Kudish, J. D., & Hoffman, B. J. (2002, October). Examining the relationship between assessment center final

dimension ratings and external measures of cognitive ability and personality. Paper presented at the 30th
International Congress on Assessment Center Methods, Pittsburgh, PA.
xxix Nakamura, Y. (2001). Rasch Measurement and Item Banking: Theory and Practice.

xxx Bergstrom, B. A., & Lunz, M. E. (1999). CAT for certification and licensure. Innovations in computerized

assessment, 67-91.
xxxi Embretson, S. E., & Reise, S. P. (2013). Item response theory. Psychology Press.

xxxii Van der Linden, W. J. (2018). Handbook of item response theory, three volume set. Chapman and Hall/CRC.

xxxiii American Educational Research Association, American Psychological Association, Joint Committee on

Standards for Educational, Psychological Testing (US), & National Council on Measurement in Education.
(1985). Standards for educational and psychological testing. American Educational Research Association.
xxxiv Uniform Guidelines on Employee Selection Procedures (EEOC, 1978). Retrieved September 11, 2019, from https://www.eeoc.gov/policy/docs/factemployment_procedures.html.

