Framework for Information Literacy for Higher Education
Contents
Introduction
Frames
These six frames are presented alphabetically and do not suggest a particular sequence in which
they must be learned.
Suggestions on How to Use the Framework for Information Literacy for Higher Education
Introduction for Faculty and Administrators
For Faculty: How to Use the Framework
For Administrators: How to Support the Framework
The Framework offered here is called a framework intentionally because it is based on a cluster of
interconnected core concepts, with flexible options for implementation, rather than on a set of standards
or learning outcomes, or any prescriptive enumeration of skills. At the heart of this Framework are
conceptual understandings that organize many other concepts and ideas about information, research, and
scholarship into a coherent whole. These conceptual understandings are informed by the work of Wiggins
and McTighe,2 which focuses on essential concepts and questions in developing curricula, and also by
threshold concepts,3 which are those ideas in any discipline that are passageways or portals to enlarged
understanding or ways of thinking and practicing within that discipline. This Framework draws upon an
ongoing Delphi Study that has identified several threshold concepts in information literacy,4 but the
Framework has been molded using fresh ideas and emphases for the threshold concepts. Two added
elements illustrate important learning goals related to those concepts: knowledge practices,5 which are
demonstrations of ways in which learners can increase their understanding of these information literacy
concepts, and dispositions,6 which describe ways in which to address the affective, attitudinal, or valuing
dimension of learning. The Framework is organized into six frames, each consisting of a concept central
to information literacy, a set of knowledge practices, and a set of dispositions. The six concepts that
anchor the frames are presented alphabetically:
• Authority Is Constructed and Contextual
• Information Creation as a Process
• Information Has Value
• Research as Inquiry
• Scholarship as Conversation
• Searching as Strategic Exploration
Neither the knowledge practices nor the dispositions that support each concept are intended to prescribe
what local institutions should do in using the Framework; each library and its partners on campus will
need to deploy these frames to best fit their own situation, including designing learning outcomes. For the
same reason, these lists should not be considered exhaustive.
In addition, this Framework draws significantly upon the concept of metaliteracy,7 which offers a
renewed vision of information literacy as an overarching set of abilities in which students are consumers
and creators of information who can participate successfully in collaborative spaces.8 Metaliteracy
demands behavioral, affective, cognitive, and metacognitive engagement with the information ecosystem.
Because this Framework envisions information literacy as extending the arc of learning throughout
students’ academic careers and as converging with other academic and social learning goals, an expanded
definition of information literacy is offered here to emphasize dynamism, flexibility, individual growth,
and community learning:
Information literacy is the set of integrated abilities encompassing the reflective discovery of
information, the understanding of how information is produced and valued, and the use of
information in creating new knowledge and participating ethically in communities of learning.
The Framework opens the way for librarians, faculty, and other institutional partners to redesign
instruction sessions, assignments, courses, and even curricula; to connect information literacy with
student success initiatives; to collaborate on pedagogical research and involve students themselves in that
research; and to create wider conversations about student learning, the scholarship of teaching and
learning, and the assessment of learning on local campuses and beyond.
Notes
1. Association of College & Research Libraries, Information Literacy Competency Standards for Higher Education
(Chicago, 2000).
2. Grant Wiggins and Jay McTighe. Understanding by Design. (Alexandria, VA: Association for Supervision and
Curriculum Development, 2004).
3. Threshold concepts are core or foundational concepts that, once grasped by the learner, create new perspectives and
ways of understanding a discipline or challenging knowledge domain. Such concepts produce transformation within the
learner; without them, the learner does not acquire expertise in that field of knowledge. Threshold concepts can be
thought of as portals through which the learner must pass in order to develop new perspectives and wider
understanding. Jan H. F. Meyer, Ray Land, and Caroline Baillie. “Editors’ Preface.” In Threshold Concepts and
Transformational Learning, edited by Jan H. F. Meyer, Ray Land, and Caroline Baillie, ix–xlii. (Rotterdam,
Netherlands: Sense Publishers, 2010).
4. For information on this unpublished, in-progress Delphi Study on threshold concepts and information literacy,
conducted by Lori Townsend, Amy Hofer, Silvia Lu, and Korey Brunetti, see http://www.ilthresholdconcepts.com/.
Lori Townsend, Korey Brunetti, and Amy R. Hofer. “Threshold Concepts and Information Literacy.” portal: Libraries
and the Academy 11, no. 3 (2011): 853–69.
5. Knowledge practices are the proficiencies or abilities that learners develop as a result of their comprehending a
threshold concept.
6. Generally, a disposition is a tendency to act or think in a particular way. More specifically, a disposition is a cluster of
preferences, attitudes, and intentions, as well as a set of capabilities that allow the preferences to become realized in a
particular way. Gavriel Salomon. “To Be or Not to Be (Mindful).” Paper presented at the American Educational
Research Association Meetings, New Orleans, LA, 1994.
7. Metaliteracy expands the scope of traditional information skills (determine, access, locate, understand, produce, and
use information) to include the collaborative production and sharing of information in participatory digital
environments (collaborate, produce, and share). This approach requires an ongoing adaptation to emerging technologies
and an understanding of the critical thinking and reflection required to engage in these spaces as producers,
collaborators, and distributors. Thomas P. Mackey and Trudi E. Jacobson. Metaliteracy: Reinventing Information
Literacy to Empower Learners. (Chicago: Neal-Schuman, 2014).
8. Thomas P. Mackey and Trudi E. Jacobson. “Reframing Information Literacy as a Metaliteracy.” College and Research
Libraries 72, no. 1 (2011): 62–78.
9. Metacognition is an awareness and understanding of one’s own thought processes. It focuses on how people learn and
process information, taking into consideration people’s awareness of how they learn. (Jennifer A. Livingston.
“Metacognition: An Overview.” Online paper, State University of New York at Buffalo, Graduate School of Education,
1997. http://gse.buffalo.edu/fas/shuell/cep564/metacog.htm.)
Authority Is Constructed and Contextual
Experts understand that authority is a type of influence recognized or exerted within a community. Experts
view authority with an attitude of informed skepticism and an openness to new perspectives, additional voices,
and changes in schools of thought. Experts understand the need to determine the validity of the information
created by different authorities and to acknowledge biases that privilege some sources of authority over others,
especially in terms of others’ worldviews, gender, sexual orientation, and cultural orientations. An
understanding of this concept enables novice learners to critically examine all evidence—be it a short blog post
or a peer-reviewed conference proceeding—and to ask relevant questions about origins, context, and suitability
for the current information need. Thus, novice learners come to respect the expertise that authority represents
while remaining skeptical of the systems that have elevated that authority and the information created by it.
Experts know how to seek authoritative voices but also recognize that unlikely voices can be authoritative,
depending on need. Novice learners may need to rely on basic indicators of authority, such as type of
publication or author credentials, where experts recognize schools of thought or discipline-specific paradigms.
Knowledge Practices
Learners who are developing their information literate abilities
• define different types of authority, such as subject expertise (e.g., scholarship), societal position
(e.g., public office or title), or special experience (e.g., participating in a historic event);
• use research tools and indicators of authority to determine the credibility of sources,
understanding the elements that might temper this credibility;
• understand that many disciplines have acknowledged authorities in the sense of well-known
scholars and publications that are widely considered “standard,” and yet, even in those situations,
some scholars would challenge the authority of those sources;
• recognize that authoritative content may be packaged formally or informally and may include
sources of all media types;
• acknowledge they are developing their own authoritative voices in a particular area and recognize
the responsibilities this entails, including seeking accuracy and reliability, respecting intellectual
property, and participating in communities of practice;
• understand the increasingly social nature of the information ecosystem where authorities actively
connect with one another and sources develop over time.
Dispositions
Learners who are developing their information literate abilities
• develop and maintain an open mind when encountering varied and sometimes conflicting perspectives;
• motivate themselves to find authoritative sources, recognizing that authority may be conferred or
manifested in unexpected ways;
• develop awareness of the importance of assessing content with a skeptical stance and with a self-
awareness of their own biases and worldview;
• question traditional notions of granting authority and recognize the value of diverse ideas and
worldviews;
• are conscious that maintaining these attitudes and actions requires frequent self-evaluation.
Information Creation as a Process
The information creation process could result in a range of information formats and modes of delivery, so
experts look beyond format when selecting resources to use. The unique capabilities and constraints of
each creation process as well as the specific information need determine how the product is used. Experts
recognize that information creations are valued differently in different contexts, such as academia or the
workplace. Elements that affect or reflect on the creation, such as a pre- or post-publication editing or
reviewing process, may be indicators of quality. The dynamic nature of information creation and
dissemination requires ongoing attention to understand evolving creation processes. Recognizing the
nature of information creation, experts look to the underlying processes of creation as well as the final
product to critically evaluate the usefulness of the information. Novice learners begin to recognize the
significance of the creation process, leading them to increasingly sophisticated choices when matching
information products with their information needs.
Knowledge Practices
Learners who are developing their information literate abilities
• articulate the capabilities and constraints of information developed through various creation
processes;
• assess the fit between an information product’s creation process and a particular information
need;
• articulate the traditional and emerging processes of information creation and dissemination in a
particular discipline;
• recognize that information may be perceived differently based on the format in which it is
packaged;
• recognize the implications of information formats that contain static or dynamic information;
• monitor the value that is placed upon different types of information products in varying contexts;
• transfer knowledge of capabilities and constraints to new types of information products;
• develop, in their own creation processes, an understanding that their choices impact the purposes
for which the information product will be used and the message it conveys.
Dispositions
Learners who are developing their information literate abilities
• are inclined to seek out characteristics of information products that indicate the underlying
creation process;
• value the process of matching an information need with an appropriate product;
• accept that the creation of information may begin through communicating in a range of
formats or modes;
• accept the ambiguity surrounding the potential value of information creation expressed in
emerging formats or modes;
• resist the tendency to equate format with the underlying creation process;
• understand that different methods of information dissemination with different purposes are
available for their use.
Information Has Value
The value of information is manifested in various contexts, including publishing practices, access to
information, the commodification of personal information, and intellectual property laws. The novice
learner may struggle to understand the diverse values of information in an environment where “free”
information and related services are plentiful and the concept of intellectual property is first encountered
through rules of citation or warnings about plagiarism and copyright law. As creators and users of
information, experts understand their rights and responsibilities when participating in a community of
scholarship. Experts understand that value may be wielded by powerful interests in ways that marginalize
certain voices. However, value may also be leveraged by individuals and organizations to effect change
and for civic, economic, social, or personal gains. Experts also understand that the individual is
responsible for making deliberate and informed choices about when to comply with and when to contest
current legal and socioeconomic practices concerning the value of information.
Knowledge Practices
Learners who are developing their information literate abilities
• give credit to the original ideas of others through proper attribution and citation;
• understand that intellectual property is a legal and social construct that varies by culture;
• articulate the purpose and distinguishing characteristics of copyright, fair use, open access, and
the public domain;
• understand how and why some individuals or groups of individuals may be underrepresented or
systematically marginalized within the systems that produce and disseminate information;
• recognize issues of access or lack of access to information sources;
• decide where and how their information is published;
• understand how the commodification of their personal information and online interactions affects
the information they receive and the information they produce or disseminate online;
• make informed choices regarding their online actions in full awareness of issues related to
privacy and the commodification of personal information.
Dispositions
Learners who are developing their information literate abilities
Research as Inquiry
Experts see inquiry as a process that focuses on problems or questions in a discipline or between
disciplines that are open or unresolved. Experts recognize the collaborative effort within a discipline to
extend the knowledge in that field. Many times, this process includes points of disagreement where debate
and dialogue work to deepen the conversations around knowledge. This process of inquiry extends
beyond the academic world to the community at large, and the process of inquiry may focus upon
personal, professional, or societal needs. The spectrum of inquiry ranges from asking simple questions
that depend upon basic recapitulation of knowledge to increasingly sophisticated abilities to refine
research questions, use more advanced research methods, and explore more diverse disciplinary
perspectives. Novice learners acquire strategic perspectives on inquiry and a greater repertoire of
investigative methods.
Knowledge Practices
Learners who are developing their information literate abilities
Dispositions
Learners who are developing their information literate abilities
Scholarship as Conversation
Research in scholarly and professional fields is a discursive practice in which ideas are formulated, debated,
and weighed against one another over extended periods of time. Instead of seeking discrete answers to
complex problems, experts understand that a given issue may be characterized by several competing
perspectives as part of an ongoing conversation in which information users and creators come together and
negotiate meaning. Experts understand that, while some topics have established answers through this process,
a query may not have a single uncontested answer. Experts are therefore inclined to seek out many
perspectives, not merely the ones with which they are familiar. These perspectives might be in their own
discipline or profession or may be in other fields. While novice learners and experts at all levels can take part
in the conversation, established power and authority structures may influence their ability to participate and
can privilege certain voices and information. Developing familiarity with the sources of evidence, methods,
and modes of discourse in the field assists novice learners to enter the conversation. New forms of scholarly
and research conversations provide more avenues in which a wide variety of individuals may have a voice in
the conversation. Providing attribution to relevant previous research is also an obligation of participation in the
conversation. It enables the conversation to move forward and strengthens one’s voice in the conversation.
Knowledge Practices
Learners who are developing their information literate abilities
Dispositions
Learners who are developing their information literate abilities
• recognize they are often entering into an ongoing scholarly conversation and not a finished
conversation;
• seek out conversations taking place in their research area;
• see themselves as contributors to scholarship rather than only consumers of it;
• recognize that scholarly conversations take place in various venues;
• suspend judgment on the value of a particular piece of scholarship until the larger context for the
scholarly conversation is better understood;
• understand the responsibility that comes with entering the conversation through participatory channels;
• value user-generated content and evaluate contributions made by others;
• recognize that systems privilege authorities and that lacking fluency in the language and
process of a discipline limits their ability to participate and engage.
Searching as Strategic Exploration
The act of searching often begins with a question that directs the act of finding needed information.
Encompassing inquiry, discovery, and serendipity, searching identifies both possible relevant sources
and the means to access those sources. Experts realize that information searching is a contextualized,
complex experience that affects, and is affected by, the cognitive, affective, and social dimensions of the
searcher. Novice learners may search a limited set of resources, while experts may search more broadly
and deeply to determine the most appropriate information within the project scope. Likewise, novice
learners tend to use few search strategies, while experts select from various search strategies, depending
on the sources, scope, and context of the information need.
Knowledge Practices
Learners who are developing their information literate abilities
• determine the initial scope of the task required to meet their information needs;
• identify interested parties, such as scholars, organizations, governments, and industries, who
might produce information about a topic and then determine how to access that information;
• utilize divergent (e.g., brainstorming) and convergent (e.g., selecting the best source) thinking
when searching;
• match information needs and search strategies to appropriate search tools;
• design and refine needs and search strategies as necessary, based on search results;
• understand how information systems (i.e., collections of recorded information) are organized in
order to access relevant information;
• use different types of searching language (e.g., controlled vocabulary, keywords, natural
language) appropriately;
• manage searching processes and results effectively.
Dispositions
Learners who are developing their information literate abilities
The Framework has been conceived as a set of living documents on which the profession will build. The
key product is a set of frames, or lenses, through which to view information literacy, each of which
includes a concept central to information literacy, knowledge practices, and dispositions. The Association
of College & Research Libraries (ACRL) encourages the library community to discuss the
new Framework widely and to develop resources such as curriculum guides, concept maps, and
assessment instruments to supplement the core set of materials in the frames.
As a first step, ACRL encourages librarians to read through the entire Framework and discuss the
implications of this new approach for the information literacy program at their institution. Possibilities
include convening a discussion among librarians at an institution or joining an online discussion of
librarians. In addition, as librarians become familiar with the frames, they should consider discussing
them with professionals in the institution’s center for teaching and learning, office of undergraduate education, or
similar departments to see whether some synergies exist between this approach and other institutional
curricular initiatives.
The frames can guide the redesign of information literacy programs for general education courses, for
upper level courses in students’ major departments, and for graduate student education. The frames are
intended to demonstrate the contrast in thinking between novice learner and expert in a specific area;
movement may take place over the course of a student’s academic career. Mapping out how
specific concepts will be integrated into specific curriculum levels is one of the challenges of
implementing the Framework. ACRL encourages librarians to work with faculty, departmental or college
curriculum committees, instructional designers, staff from centers for teaching and learning, and others to
design information literacy programs in a holistic way.
ACRL realizes that many information literacy librarians currently meet with students via one-shot classes,
especially in introductory level classes. Over the course of a student’s academic program, one-shot
sessions that address a particular need at a particular time, systematically integrated into the curriculum,
can play a significant role in an information literacy program. It is important for librarians and teaching
faculty to understand that the Framework is not designed to be implemented in a single information
literacy session in a student’s academic career; it is intended to be developmentally and systematically
integrated into the student’s academic program at a variety of levels. This may take considerable time to
implement fully in many institutions.
This Framework sets forth these information literacy concepts and describes how librarians as
information professionals can facilitate the development of information literacy by postsecondary
students.
Creating a Framework
ACRL has played a leading role in promoting information literacy in higher education for decades.
The Information Literacy Competency Standards for Higher Education (Standards), first published in
2000, enabled colleges and universities to position information literacy as an essential learning outcome
in the curriculum and promoted linkages with general education programs, service learning, problem-
based learning, and other pedagogies focused on deeper learning. Regional accrediting bodies, the
American Association of Colleges and Universities (AAC&U), and various discipline-specific
organizations employed and adapted the Standards.
It is time for a fresh look at information literacy, especially in light of changes in higher education,
coupled with increasingly complex information ecosystems. To that end, an ACRL Task Force developed
the Framework. The Framework seeks to address the great potential for information literacy as a deeper,
more integrated learning agenda, addressing academic and technical courses, undergraduate research,
community-based learning, and co-curricular learning experiences of entering freshmen through
graduation. The Framework focuses attention on the vital role of collaboration and its potential for
increasing student understanding of the processes of knowledge creation and scholarship.
The Framework also emphasizes student participation and creativity, highlighting the importance of these
contributions.
The Framework is developed around a set of “frames,” which are those critical gateway or portal
concepts through which students must pass to develop genuine expertise within a discipline, profession,
or knowledge domain. Each frame includes a knowledge practices section used to demonstrate how the
mastery of the concept leads to application in new situations and knowledge generation. Each frame also
includes a set of dispositions that address the affective areas of learning.
• Investigate threshold concepts in your discipline and gain an understanding of the approach used
in the Framework as it applies to the discipline you know.
— What are the specialized information skills in your discipline that students should
develop, such as using primary sources (history) or accessing and managing large data sets
(science)?
• Look for workshops at your campus teaching and learning center on the flipped classroom and
consider how such practices could be incorporated into your courses.
— What information and research assignments can students do outside of class to arrive
prepared to apply concepts and conduct collaborative projects?
• Partner with your IT department and librarians to develop new kinds of multimedia assignments
for courses.
— What kinds of workshops and other services should be available for students involved in
multimedia design and production?
— In your program, how do students interact with, evaluate, produce, and share information
in various formats and modes?
• Consider the knowledge practices and dispositions in each information literacy frame for possible
integration into your own courses and academic program.
— How might you and a librarian design learning experiences and assignments that will
encourage students to assess their own attitudes, strengths/weaknesses, and knowledge gaps
related to information?
• Host or encourage a series of campus conversations about how the institution can incorporate
the Framework into student learning outcomes and supporting curriculum;
• Provide the resources to enhance faculty expertise and opportunities for understanding and
incorporating the Framework into the curriculum;
• Encourage committees working on planning documents related to teaching and learning (at the
department, program, and institutional levels) to include concepts from the Framework in their
work;
• Provide resources to support a meaningful assessment of information literacy of students at
various levels at your institution;
• Promote partnerships between faculty, librarians, instructional designers, and others to develop
meaningful ways for students to become content creators, especially in their disciplines.
The Task Force was charged to update the Information Literacy Competency Standards for Higher Education so they reflect the
current thinking on such things as the creation and dissemination of knowledge, the changing
global higher education and learning environment, the shift from information literacy to
information fluency, and the expanding definition of information literacy to include
multiple literacies, for example, transliteracy, media literacy, digital literacy, etc.
The Task Force released the first version of the Framework in two parts in February and April of 2014
and received comments via two online hearings and a feedback form available online for four weeks. The
committee then revised the document, released the second draft on June 17, 2014, and sought extensive
feedback through a feedback form, two online hearings, an in-person hearing, and analysis of social
media and topical blog posts.
On a regular basis, the Task Force used all of ACRL’s and the American Library Association’s (ALA)
communication channels to reach individual members and ALA and ACRL units (committees, sections,
round tables, ethnic caucuses, chapters, and divisions) with updates. The Task Force’s liaison
at ACRL maintained a private e-mail distribution list of over 1,300 individuals who attended a fall, spring,
or summer online forum; provided comments to the February, April, June, or November drafts; or
were otherwise identified as having strong interest and expertise. This included members of the Task
Force that drafted the Standards; leading library and information science (LIS) researchers and national
project directors; and members of the Information Literacy Rubric Development Team for the Association of
American Colleges & Universities’ Valid Assessment of Learning in Undergraduate Education (VALUE)
initiative. Via all these channels, the Task Force regularly shared updates, invited discussion at virtual and
in-person forums and hearings, and encouraged comments on public drafts of the proposed Framework.
ACRL recognized early on that the effect of any changes to the Standards would be significant both
within the library profession and in higher education more broadly. In addition to general announcements,
the Task Force contacted nearly 60 researchers who cited the Standards in publications
outside the LIS literature and more than 70 deans, associate deans, directors, or chairs of LIS schools, and invited
specific staff leaders (and press or communications contacts) at more than 70 other higher education
associations, accrediting agencies, and library associations and consortia to encourage their members to
read and comment on the draft.
The Task Force systematically reviewed feedback from the first and second drafts of the Framework,
including comments, criticism, and praise provided through formal and informal channels. The three
official online feedback forms had 562 responses; numerous direct e-mails were sent to members of the
Task Force. The group was proactive in tracking feedback on social media, namely blog posts and
Twitter. While the data harvested from social media are not exhaustive, the Task Force made its best
efforts to include all known Twitter conversations, blog posts, and blog commentary. In total, there were
several hundred feedback documents, totaling over a thousand pages, under review. The content of these
documents was analyzed by members of the Task Force and coded using HyperRESEARCH, a qualitative
data analysis software package. During the drafting and vetting process, the Task Force provided more detail on
the feedback analysis in an online FAQ document.
In December 2014, the Task Force made final changes. Two other ACRL groups reviewed and provided
feedback on the final drafts: the ACRL Information Literacy Standards Committee and
the ACRL Standards Committee. The latter group submitted the final document and recommendations to
the ACRL Board for its review at the 2015 ALA Midwinter Meeting in Chicago.
Note: Filed by the ACRL Board February 2, 2015; Adopted by the ACRL Board January 11, 2016.
ABSTRACT

Keywords: Big data; Data science; Data quality; Decision quality; Regulation

The vague but vogue notion of 'big data' is enjoying a prolonged honeymoon. Well-funded, ambitious projects are reaching fruition, and inferences are being drawn from inadequate data processed by inadequately understood and often inappropriate data analytic techniques. As decisions are made and actions taken on the basis of those inferences, harm will arise to external stakeholders, and, over time, to internal stakeholders as well. A set of Guidelines is presented, whose purpose is to intercept ill-advised uses of data and analytical tools, prevent harm to important values, and assist organisations to extract the achievable benefits from data, rather than dreaming dangerous dreams.

© 2017 Roger Clarke. Published by Elsevier Ltd. All rights reserved.
1. Introduction

Previous enthusiasms for management science, decision support systems, data warehousing and data mining have been rejuvenated. Fervour for big data, big data analytics and data science has been kindled, and is being sustained, by high-pressure technology salesmen. Like all such fads, there is a kernel of truth, but also a large penumbra of misunderstanding and misrepresentation, and hence considerable risk of disappointment, and worse.

A few documents have been published that purport to provide some advice on how to avoid harm arising from the practice of these techniques. Within the specialist big data analytics literature, the large majority of articles focus on techniques and applications, with impacts and implications relegated to a few comments at the end of the paper rather than even being embedded within the analysis, let alone a driving factor in the design. But see Agrawal et al. (2011), Saha and Srivastava (2014), Jagadish et al. (2014), Cai and Zhu (2015) and Haryadi et al. (2016), and particularly Merino et al. (2016).

Outside academe, most publications that offer advice appear to be motivated not by the avoidance of harm to affected values, but rather the protection of the interests of organisations conducting analyses and using the results. Examples of such documents in the public sector include DoFD (2015), subsequently withdrawn, and UKCO (2016). Nothing resembling guidelines appears to have been published to date by the relevant US agencies, but see NIST (2015) and GAO (2016).

Some professional codes and statements are relevant, such as UNSD (1985), DSA (2016), ASA (2016) and ACM (2017). Examples also exist in the academic research arena, e.g. Rivers and Lewis (2014), Müller et al. (2016) and Zook et al. (2017). However, reflecting the dependence of the data professions on the freedom to ply their trade, such documents are oriented towards facilitation, with the protection of stakeholders commonly treated as a constraint rather than as an objective.
* Corresponding author. Xamax Consultancy Pty Ltd, 78 Sidaway St, Chapman ACT 2611 Canberra, Australia. E-mail address: Roger.Clarke@xamax.com.au (R. Clarke).
https://doi.org/10.1016/j.clsr.2017.11.002
468 computer law & security review 34 (2018) 467–476
Documents have begun to emerge from government agencies that perform regulatory rather than stimulatory functions. See, for example, a preliminary statement issued by Data Protection Commissioners (WP29, 2014), a consultation draft from the Australian Privacy Commissioner (OAIC, 2016), and a document issued by the Council of Europe Convention 108 group (CoE 2017). These are, however, unambitious and diffuse, reflecting the narrow statutory limitations of such organisations to the protection of personal data. For a more substantial discussion paper, see ICO (2017).

It is vital that guidance be provided for at least those practitioners who are concerned about the implications of their work. In addition, a reference-point is needed as a basis for evaluating the adequacy of organisational practices, of the codes and statements of industry and professional bodies, of recommendations published by regulatory agencies, and of the provisions of laws and statutory codes. This paper's purpose is to offer such a reference-point, expressed as guidelines for practitioners who are seeking to act responsibly in their application of analytics to big data collections.

This paper draws heavily on previous research reported in Wigan and Clarke (2013), Clarke (2016a, 2016b), Raab and Clarke (2016) and Clarke (2017b). It also reflects literature critical of various aspects of the big data movement, notably Bollier (2010), Boyd and Crawford (2011), Lazer et al. (2014), Metcalf and Crawford (2016), King and Forder (2016) and Mittelstadt et al. (2016). It first provides a brief overview of the field, sufficient to provide background for the remainder of the paper. It then presents a set of Guidelines whose intentions are to filter out inappropriate applications of data analytics, and provide a basis for recourse by aggrieved parties against organisations whose malbehaviour or misbehaviour results in harm. An outline is provided of various possible applications of the Guidelines.

2. Background

The 'big data' movement is largely a marketing phenomenon. Much of the academic literature has been cavalier in its adoption and reticulation of vague assertions by salespeople. As a result, definitions of sufficient clarity to assist in analysis are in short supply. This author adopts the approach of treating as 'big data' any collection that is sufficiently large that someone is interested in applying sophisticated analytical techniques to it. However, it is important to distinguish among several categories:

• a single large data collection; and
• a consolidation of two or more data collections, which may be achieved through:
  • merger into a single physical data collection; or
  • interlinkage into a single virtual data collection

The term 'big data analytics' is distinguishable from its predecessor 'data mining' primarily on the basis of the decade in which it is used. It is subject to marketing hype to almost the same extent as 'big data'. So all-inclusive are its usages that a reasonable working definition is:

Big data analytics encompasses all processes applied to big data that may enable inferences to be drawn from it.

The term 'data scientist' emerged two decades ago as an upbeat alternative to 'statistician' (Press, 2013). Its focus is on analytic techniques, whereas the more recent big data movement commenced with its focus on data. The term 'data science' has been increasingly co-opted by the computer science discipline and business communities in order to provide greater respectability to big data practices. Although computer science has developed some additional techniques, a primary focus has been the scalability of computational processes to cope with large volumes of disparate data. It may be that the re-capture of the field by the statistics discipline will bring with it a recovery of high standards of professionalism and responsibility, which, this paper argues, are sorely needed. In this paper, however, the still-current term 'big data analytics' is used.

Where data is not in a suitable form for application of any particular data analytic technique, modifications may be made to it in an attempt to address the data's deficiencies. This was for many years referred to as 'data scrubbing', but it has become more popular among proponents of data analytics to use the misleading terms 'data cleaning' and 'data cleansing' (e.g. Rahm and Do, 2000, Müller and Freytag, 2003). These terms imply that the scrubbing process reliably achieves its aim of delivering a high-quality data collection. Whether that is actually so is highly contestable, and is seldom demonstrated through testing against the real world that the modified data purports to represent. There are many challenging aspects of data quality. What should be done where data-items that are important to the analysis are empty ('null')? And what should be done where they contain values that are invalid according to the item's definition, or have been the subject of varying definitions over the period during which the data-set has been collected? Another term that has come into currency is 'data wrangling' (Kandel et al., 2011). Although the term is honest and descriptive, and the authors adopt a systematic approach to the major challenge of missing data, their processes for 'correcting erroneous values' are merely computationally-based 'transforms', neither sourced from nor checked against the real world. The implication that data is 'clean' or 'cleansed' is commonly an over-claim, and hence such terms should be avoided in favour of the frank and usefully descriptive term 'data scrubbing'.

Where data is consolidated from two or more data collections, some mechanism is needed to determine which records in each collection are appropriately merged or linked. In some circumstances there may be a common data-item in each collection that enables associations between records to be reliably postulated. In many cases, a combination of data-items (e.g., in the case of people, the set of first and last name, date-of-birth and postcode) may be regarded as representing the equivalent of a common identifier. This process has long been referred to as computer or data matching (Clarke, 1994). Other approaches can be adopted, but generally with even higher incidences of false-positives (matches that are made but that are incorrect) and false-negatives (matches that could have been made but were not). A further issue is the extent to which a consolidated collection should contain all entries or only those for which a match has (or has not) been found. This decision may have a significant
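The composite-key matching process described above, treating the combination of first and last name, date-of-birth and postcode as the equivalent of a common identifier, can be sketched in a few lines. This is an illustrative sketch only: the field names, the normalisation rules and the sample data are assumptions made for this example, not part of the paper.

```python
# Illustrative sketch of 'computer matching' (Clarke, 1994) across two
# data collections, using a composite quasi-identifier. Field names and
# normalisation rules are assumptions made for this example.

def match_key(record):
    """Build a quasi-identifier from first/last name, date-of-birth, postcode."""
    return (
        record["first_name"].strip().lower(),
        record["last_name"].strip().lower(),
        record["dob"],               # assumed format 'YYYY-MM-DD'
        record["postcode"].strip(),
    )

def merge_collections(a, b):
    """Link records sharing a composite key; unmatched records are reported
    separately so the analyst can decide whether to retain them."""
    index = {}
    for rec in b:
        index.setdefault(match_key(rec), []).append(rec)
    matched, unmatched = [], []
    for rec in a:
        hits = index.get(match_key(rec), [])
        if hits:
            matched.extend((rec, hit) for hit in hits)  # possible false-positives
        else:
            unmatched.append(rec)                        # possible false-negatives
    return matched, unmatched

a = [{"first_name": "Ann", "last_name": "Lee", "dob": "1990-01-02", "postcode": "2611"}]
b = [{"first_name": "ann", "last_name": "Lee ", "dob": "1990-01-02", "postcode": "2611"}]
matched, unmatched = merge_collections(a, b)
```

Note that even a composite key of this kind exhibits the false-positives and false-negatives discussed above: two people can share all four values, and a single transcription error in any of them defeats the match.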
within a strong record of failure, and considerable dispute (e.g. Dreyfus, 1992, Katz, 2012). Successive waves of enthusiasts keep emerging, to frame much the same challenges somewhat differently, and win more grant money from parallel new waves of funding decision-makers. Meanwhile, the water has been muddied by breathless, speculative extensions of AI notions into the realms of metaphysics. In particular, an aside by von Neumann about a 'singularity' has been elevated to spirituality (Moravec, 2000; Kurzweil, 2005), and longstanding sci-fi notions of 'super-intelligence' have been re-presented as philosophy (Bostrom, 2014).

Multiple threads of AI are woven into big data mythology. Various words with a similarly impressive sound to 'intelligent' have been used as marketing banners, such as 'expert', 'neural', 'connectionist', 'learning' and 'predictive'. Definitions are left vague, with each new proposal applying Arthur C. Clarke's Third Law, and striving to be 'indistinguishable from magic' and hence to gain the mantle of 'advanced technology'. Within the research community, expressions of scepticism are in short supply, but Lipton (2015) encapsulates the problem by referring to "an unrealistic expectation that modern feed-forward neural networks exhibit human-like cognition".

One cluster of techniques is marketed as 'machine learning'. A commonly-adopted approach ('supervised learning') involves some kind of (usually quite simple) data structure being provided to a piece of generic software, often one that has an embedded optimisation function. A 'training set' of data is fed in. The process of creating this artefact is claimed to constitute 'learning'. Aspects of the "substantial amount of 'black art'" involved are discussed in Domingos (2012).

Even where some kind of objective is inherent in the data structure and/or the generic software, application of the metaphor of 'learning' is something of a stretch for what is a sub-human and in many cases a non-rational process (Burrell, 2016). A thread of work that hopes to overcome some of the weaknesses expands the approach from a single level to a multi-layered model. Inevitably, this too has been given marketing gloss by referring to it as 'deep learning'. Even some enthusiasts are appalled by the hyperbole: "machine learning algorithms [are] not silver bullets, . . . not magic pills, . . . not tools in a toolbox – they are method{ologie}s backed by rational thought processes with assumptions regarding the datasets they are applied to" (Rosebrock, 2014).

A field called 'predictive analytics' over-claims in a different way. Rather than merely extrapolating from a data-series, it involves the extraction of patterns and then extrapolation of the patterns rather than the data; so the claim of 'prediction' is bold. Even some enthusiasts have warned that predictive analytics can have "'unintended side effects' – [things] you didn't really count on when you decided to build models and put them out there in the wild" (Perlich, quoted in Swoyer (2017)).

There is little doubt that there are specific applications to which each particular approach is well-suited, and also little doubt that each is neither a general approach nor deserving of the pretentious title used to market it. As a tweeted aphorism has it: "Most firms that think they want advanced AI/ML really just need linear regression on cleaned-up data" (Hanson, 2016).

The majority of big data analytics activity is performed behind closed doors. One common justification for this is commercial competitiveness, but other factors are commonly at work, in both private and public sector contexts. As a result of the widespread lack of transparency, it is far from clear that practices take into account the many challenges that are identified in this section.

Transparency is in any case much more challenging in the contemporary context than it was in the past. During the early decades of software development, until c.1990, the rationale underlying any particular inference was apparent from the independently-specified algorithm or procedure implemented in the software. Subsequently, so-called expert systems adopted an approach whereby the problem-domain is described, but the problem and solution, and hence the rationale for an inference, are much more difficult to access. Recently, purely empirical techniques such as neural nets and the various approaches to machine learning have attracted a lot of attention. These do not even embody a description of a problem domain. They merely comprise a quantitative summary of some set of instances (Clarke, 1991). In such circumstances, no humanly-understandable rationale for an inference exists, and in many cases none can be created. As a result, transparency is non-existent, and accountability is impossible (Burrell, 2016; Knight, 2017). To cater for such problems, Broeders et al. (2017), writing in the context of national security applications, called for the imposition of a legal duty of care and requirements for external reviews, and the banning of automated decision-making.

This brief review has identified a substantial set of risk factors. Critique is important, but critique is by its nature negative in tone. It is incumbent on critics to also offer positive and sufficiently concrete contributions towards resolution of the problems that they perceive. The primary purpose of this paper is to present a set of Guidelines whose application would address the problems and establish a reliable professional basis for the practice of data analytics.

3. The Guidelines

The Guidelines presented here avoid the word 'big', and refer simply to 'data' and 'data analytics'. These are straightforward and generic terms whose use conveys the prescriptions' broad applicability. The Guidelines are of particular relevance to personal data, because data analytics harbours very substantial threats when applied to data about individuals. The Guidelines are expressed quite generally, however, because inferences drawn from any form of data may have negative implications for individuals, groups, communities, societies, polities, economies or the environment. The purpose of the Guidelines is to assist in the avoidance of harm to all values of all stakeholders. In addition to external stakeholders, shareholders and employees stand to lose where material harm to a company's value arises from poorly-conducted data analytics, including not only financial loss and compliance breaches but also reputational damage.

The Guidelines are presented in Table 2, divided into four segments. Three of the segments correspond to the
DO’s
1.1 Governance
Ensure that a comprehensive governance framework is in place prior to, during, and for the relevant period after data acquisition,
analysis and use activities, that it is commensurate with the activities’ potential impacts, and that it encompasses:
a. risk assessment and risk management from the perspectives of all affected parties
b. express assignments of accountability, at an appropriate level of granularity
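Guideline 1.1(b), express assignments of accountability at an appropriate level of granularity, can be supported by a simple record structure that makes unassigned activities visible. The sketch below is hypothetical and not part of the Guidelines; the role names and activity labels are invented for illustration.

```python
# Hypothetical sketch of Guideline 1.1(b): express assignments of
# accountability at an appropriate level of granularity. The structure,
# role names and activity labels are invented for illustration.

from dataclasses import dataclass

@dataclass(frozen=True)
class AccountabilityAssignment:
    activity: str        # e.g. 'data acquisition', 'data analysis'
    accountable: str     # a named role, not merely 'the organisation'

def unassigned_activities(activities, assignments):
    """Return the activities with no express assignment of accountability,
    i.e. gaps that a governance review should flag."""
    covered = {a.activity for a in assignments}
    return [act for act in activities if act not in covered]

activities = ["data acquisition", "data analysis", "use of inferences"]
assignments = [
    AccountabilityAssignment("data acquisition", "Data Custodian"),
    AccountabilityAssignment("data analysis", "Lead Analyst"),
]
gaps = unassigned_activities(activities, assignments)
```

In this invented example, the review would report that no one is accountable for the use of inferences, which is precisely the kind of gap the governance framework is meant to surface before harm arises.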
1.2 Expertise
Ensure that all individuals participating in the activities have education, training, and experience in relation to the real-world systems
about which inferences are to be drawn, appropriate to the roles that they play
1.3 Compliance
Ensure that all activities are compliant with all relevant laws and established public policy positions within relevant jurisdictions, and
with public standards of behaviour
2. Data Acquisition
DO’s
DON’Ts
3. Data Analysis
DO’s
3.1 Expertise
Ensure that all staff and contractors involved in the analysis have:
a. appropriate professional qualifications
b. training in the specific tools and processes
c. sufficient familiarity with the real-world system to which the data relates and with the manner in which the data purports to
represent that real-world system
d. accountability for their analyses
Table 2 – (continued)
3.3 The Nature of the Data Processed by the Tools
Understand the assumptions that data analytic tools make about the data that they process, and the extent to which the data to be
processed is consistent with those assumptions. Important areas in which assumptions may exist include:
a. the presence of values in relevant data-items
b. the presence of only specific, pre-defined values in relevant data-items
c. the scales against which relevant data-items have been measured
d. the precision with which relevant data-items have been expressed
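Several of the assumption areas listed in item 3.3 can be checked mechanically before an analytic tool is applied. The sketch below is a hypothetical illustration of such checks, not part of the Guidelines; the item definitions and the per-item specification format are invented for this example.

```python
# Hypothetical pre-analysis checks for the assumption areas in item 3.3:
# (a) presence of values, (b) only pre-defined values, (c) measurement scale,
# (d) precision. The item definitions below are invented for illustration.

ITEM_DEFS = {
    "age":    {"required": True,  "scale": "ratio",   "allowed": None,
               "max_decimals": 0},
    "gender": {"required": False, "scale": "nominal", "allowed": {"F", "M", "X"},
               "max_decimals": None},
}

def check_record(record):
    """Return a list of violations of the declared data assumptions."""
    problems = []
    for item, spec in ITEM_DEFS.items():
        value = record.get(item)
        if value is None:
            if spec["required"]:
                problems.append(f"{item}: required value is null")       # (a)
            continue
        if spec["allowed"] is not None and value not in spec["allowed"]:
            problems.append(f"{item}: value {value!r} not pre-defined")  # (b)
        if spec["scale"] == "ratio" and not isinstance(value, (int, float)):
            problems.append(f"{item}: not measured on a ratio scale")    # (c)
        if spec["max_decimals"] == 0 and isinstance(value, float) \
                and value != int(value):
            problems.append(f"{item}: more precision than declared")     # (d)
    return problems

problems = check_record({"gender": "Q"})  # null 'age', undefined 'gender' code
```

Checks of this kind can only verify consistency with the declared assumptions; whether the declarations themselves match the real-world system the data purports to represent remains a matter for the expertise required by item 3.1.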
DON’Ts
DO’s
4.2 Evaluation
Where decisions based on inferences from data analytics may have material negative impacts, evaluate the advantages and
disadvantages of proceeding, by conducting cost-benefit analysis and risk assessment from an organisational perspective, and
impact assessments from the perspectives of other internal and external stakeholders
4.4 Safeguards
Design, implement and maintain safeguards and mitigation measures, together with controls that ensure the safeguards and
mitigation measures are functioning as intended, commensurate with the potential impacts of the inferences drawn
4.5 Proportionality
Where specific decisions based on inferences from data analytics may have material negative impacts on individuals, consider the
reasonableness of the decisions prior to committing to them
4.6 Contestability
Where actions are taken based on inferences drawn from data analytics, ensure that the rationale for the decisions is transparent
to people affected by them, and that mechanisms exist whereby stakeholders can access information about, and if appropriate
complain about and dispute interpretations, inferences, decisions and actions
DON’Ts
successive processes involved – acquisition of the data, analysis of the data in order to draw inferences, and use of the inferences. The first segment specifies generic requirements that apply across all of the phases.

Each Guideline is expressed in imperative mode, some in the positive and others in the negative. However, they are not statements of law, nor are they limited to matters that are subject to legal obligations. They are declarations of what is needed in order to manage the risks arising from data quality issues, data meaning uncertainties, incompatibilities in data meaning among similar data-items sourced from different data-collections, misinterpretations of meaning, mistakes introduced by data scrubbing, approaches taken to missing data that may solve some problems but at the cost of creating or exacerbating others, erroneous matches, unjustified assumptions about the scale against which data has been measured, inappropriate applications of analytical tools, lack of review, and confusions among correlation, causality, predictive power and normative force.

The organisations and individuals to whom each Guideline is addressed will vary depending on the context. In some circumstances, a single organisation, a single small team within an organisation, or even a single individual, might perform all of the activities involved. On the other hand, multiple teams within one organisation, or across multiple organisations, may perform several of the activities.

The Guidelines are intended to be comprehensive. As a result, in any particular context, some of them will be redundant, and some would be more usefully expressed somewhat differently. In particular, some of the statements are primarily relevant to data that refers to an individual human being. Such statements may be irrelevant, or may benefit from re-phrasing, where the data relates to inanimate parts of the physical world (e.g. meteorological, geophysical, vehicular traffic or electronic traffic data), or to aggregate economic or social phenomena. In such circumstances, careful sub-setting and adaptation of the Guidelines is appropriate.

4. Ways to apply the Guidelines

These Guidelines, in their current or some adapted form, can be adopted by any organisation. Staff and contractors can be required to demonstrate that their projects are compliant, or, to the extent that they are not, to explain why not. In practice, adoption may be driven by staff and contractors, because many practitioners are concerned about the implications of their work, and would welcome the availability of an instrument that enables them to raise issues in the context of project risk management.

Organisational self-regulation of this kind has the capacity to deliver value for the organisation and for shareholders, but it has only a mediocre track-record in benefiting stakeholders outside the organisation. A stronger institutional framework is needed if preventable harm arising from inappropriate data, analysis and use is to be avoided.

Industry associations can adopt or adapt the Guidelines, as can government agencies that perform oversight functions. Industry regulation through a Code of Practice may achieve some positive outcomes for organisations in terms of the quality of work performed, and particularly by providing a means of defending against and deflecting negative media reports, public concerns about organisational actions, and acts by any regulator that may have relevant powers. In practice, however, such Codes are applied by only a proportion of the relevant organisations, are seldom taken very seriously (such as by embedding them within corporate policies, procedures, training programs and practices), are unenforceable, and generally offer very limited benefits to external stakeholders. Nonetheless, some modest improvements would be likely to accrue from adoption, perhaps at the level of symbolism, but more likely as a means of making it more difficult for data analytics issues to be ignored.

Individual organisations can take positive steps beyond such, largely nominal, industry sector arrangements. They can embed consideration of the factors identified in these Guidelines into their existing business case, cost/benefit and/or risk assessment and management processes. In order to fulfil their corporate social responsibility commitments, they can also evaluate proposed uses of data analytics from the perspectives of external stakeholders. A very narrow and inadequate approach to this merely checks legal compliance, as occurs with the pseudo-PIA processes conventional in the public sector throughout much of North America (Clarke, 2011 s.4), and in the new European 'Data Protection Impact Assessment' (DPIA) mechanism (Clarke, 2017a). Much more appropriately, a comprehensive Privacy Impact Assessment can be performed (Clarke, 2009; Wright and de Hert, 2012). In some circumstances, a much broader social impact assessment is warranted (Raab and Wright, 2012; Wright and Friedewald, 2013). Raab and Wright (2012, pp. 379–381) calls for extension of the scope of PIAs firstly to a wide range of impacts on the individual's "relationships, positions and freedoms", then to "impacts on groups and categories", and finally to "impacts on society and the political system".

A further step that individual organisations can take is to enter into formal undertakings to comply with a Code, combined with submission to the decisions of a complaints body, ombudsman or tribunal that is accessible by any aggrieved party, that has the resources to conduct investigations, that has enforcement powers, and that uses them. Unfortunately, such arrangements are uncommon, and it is not obvious that suitable frameworks exist within which an enforceable Code along the lines of these Guidelines could be implemented.

Another possibility is for a formal and sufficiently precise Standard to be established, and for this to be accepted by courts as the measuring-stick against which the behaviour of organisations that conduct data analytics is to be measured. A loose mechanism of this kind is declaration by an organisation that it is compliant with a particular published Standard. In principle, this would appear to create a basis for court action by aggrieved parties. In practice, however, it appears that such mechanisms are seldom effective in protecting either internal or external stakeholders.

As discussed earlier, some documents exist that at least purport to provide independent guidance in relation to data analytics activities. These Guidelines can be used as a yardstick against which such documents can be measured. The UK Cabinet Office's 'Data Science Ethical Framework' (UKCO,
2016) was assessed against an at-that-time-unformalised version of these Guidelines, and found to be seriously wanting (Raab and Clarke, 2016). For different reasons, and in different ways, the Council of Europe document (CoE 2017) falls a very long way short of what is needed by professionals and the public alike as a basis for responsible use of data analytics. The US Government Accountability Office has identified the existence of "possible validity problems in the data and models used in [data analytics and innovation efforts – DAI]" (GAO, 2016, p. 38), but has done nothing about them. An indication of the document's dismissiveness of the issues is this quotation: "In automated decision making [using machine learning], monitoring and assessment of data quality and outcomes are needed to gain and maintain trust in DAI processes" (p. 13, fn. 8). Not only does the statement appear in a mere footnote, but the concern is solely about 'trust' and not at all about the appropriateness of the inferences drawn, the actions taken as a result of them, or the resource efficiency and equitability of those actions. The current set of documents from the US National Institute of Standards and Technology (NIST, 2015) is also remarkably devoid of discussion about data quality and process quality, and offers no process guidance along the lines of the Guidelines proposed in this paper.

Another avenue whereby progress can be achieved is through adoption by the authors of text-books. At present, leading texts commonly have a brief, excusatory segment, usually in the first or last chapter. Curriculum proposals commonly suffer the same defect, e.g. Gupta et al. (2015), Schoenherr and Speier-Pero (2015). Course-designers appear to generally follow the same pattern, and schedule a discussion or a question in an assignment, which represents a sop to the consciences of all concerned, but does almost nothing about addressing the problems, and nothing about embedding solutions to those problems within the analytics process. It is essential that the specifics of the Guidelines in Table 2 be embedded in the structure of text-books and courses, and that students learn to consider each issue at the point in the acquisition/analysis/use cycle at which each challenge needs to be addressed.

None of these approaches is a satisfactory substitute for legislation that places formal obligations on organisations that apply data analytics, and that provides aggrieved parties with the capacity to sue organisations where they materially breach requirements and there are material negative impacts. Such a scheme may be imposed by an activist legislature, or a regulatory framework may be legislated and the Code negotiated with the relevant parties prior to promulgation by a delegated agency. It is feasible for organisations themselves to submit to a parliament that a co-regulatory scheme of such a kind should be enacted, for example where scandals arise from inappropriate use of data analytics by some organisations,

applications of the ideas exist, nor that all data collections are of such low quality that no useful inferences can be drawn from them, nor that all mergers of data from multiple sources are necessarily logically invalid or necessarily deliver fatally flawed consolidated data-sets, nor that all data scrubbing fails to clean data, nor that all data analytics techniques make assumptions about data that can under no circumstances be justified, nor that all inferences drawn must be wrong. Expressed in the positive, some big data has potential value, and some applications of data analytics techniques are capable of realising that potential.

What this paper has done is to identify a very large fleet of challenges that have to be addressed by each and every specific proposal for the expropriation of data, the re-purposing of data, the merger of data, the scrubbing of data, the application of data analytics to it, and the use of inferences drawn from the process in order to make, or even guide, let alone explain, decisions and action that affect the real world. Further, it is far from clear that measures are being adopted to meet these challenges.

Ill-advised applications of data analytics are preventable by applying the Guidelines proposed in this paper. As the 'big data' mantra continues to cause organisations to have inflated expectations of what data analytics can deliver, both shareholders and external stakeholders need constructive action to be taken in order to get data analytics practices under control, and avoid erroneous business decisions, loss of shareholder value, inappropriate policy outcomes, and unjustified harm to individual, social, economic and environmental values. The Guidelines proposed in this paper therefore provide a basis for the design of organisational and regulatory processes whereby positive benefits can be gained from data analytics, but undue harm avoided.

Acknowledgement

The author received valuable feedback from Prof. Louis de Koker of La Trobe University, Melbourne, David Vaile and Dr. Lyria Bennett Moses of UNSW, Sydney, Dr. Kerry Taylor of the ANU, Canberra, Dr. Kasia Bail of the University of Canberra, Prof. Charles Raab of Edinburgh University, and an anonymous reviewer. Evaluative comments are those of the author alone.

REFERENCES

ACM. Statement on algorithmic transparency and accountability. Association for Computing Machinery; 2017. Available from:
which have a significant negative impact on the reputation of https://www.acm.org/binaries/content/assets/public-policy/
an industry sector as a whole. 2017_usacm_statement_algorithms.pdf. [Accessed November
24, 2017].
Agrawal D, Philip Bernstein P, Bertino E, Davidson S, Dayal U,
Franklin M, Gehrke J, et al. Challenges and opportunities with
5. Conclusions big data 2011-1. Cyber Center Technical Reports, Paper 1; 2011.
Available from: http://docs.lib.purdue.edu/cctech/1. [Accessed
November 24, 2017].
This paper has not argued that big data and big data analyt- Anderson C. The end of theory: the data deluge makes the
ics are inherently evil. It has also not argued that no valid scientific method obsolete. Wired Magazine 16:07; 2008.
computer law & security review 34 (2018) 467–476
ASA. Ethical guidelines for statistical practice. American Statistical Association; 2016. Available from: http://www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for-Statistical-Practice.aspx. [Accessed November 24, 2017].

Bollier D. The promise and peril of big data. The Aspen Institute; 2010. Available from: https://www.emc.co.tt/collateral/analyst-reports/10334-ar-promise-peril-of-big-data.pdf. [Accessed November 24, 2017].

Bostrom N. Superintelligence: paths, dangers, strategies. Oxford University Press; 2014.

Boyd D, Crawford K. Six provocations for big data. Proc. Symposium on the Dynamics of the Internet and Society; 2011. Available from: http://ssrn.com/abstract=1926431. [Accessed November 24, 2017].

Broeders D, Schrijvers E, van der Sloot B, van Brakel R, de Hoog J, Ballina EH. Big data and security policies: towards a framework for regulating the phases of analytics and use of big data. Comput Law Secur Rev 2017;33:309–23.

Burrell J. How the machine 'thinks': understanding opacity in machine learning algorithms. Big Data Soc 2016;3(1):1–12.

Cai L, Zhu Y. The challenges of data quality and data quality assessment in the big data era. Data Sci J 2015;14(2):1–10. Available from: https://datascience.codata.org/articles/10.5334/dsj-2015-002/.

Cao L. Data science: a comprehensive overview. ACM Computing Surveys; 2017. Available from: http://dl.acm.org/ft_gateway.cfm?id=3076253&type=pdf. [Accessed November 24, 2017].

Clarke R. A contingency approach to the software generations. Database 1991;22(3):23–34 (Summer 1991). PrePrint available from: http://www.rogerclarke.com/SOS/SwareGenns.html.

Clarke R. Dataveillance by governments: the technique of computer matching. Inf Tech People 1994;7(2):46–85. PrePrint available from: http://www.rogerclarke.com/DV/MatchIntro.html.

Clarke R. Privacy impact assessment: its origins and development. Comput Law Secur Rev 2009;25(2):123–35. PrePrint available from: http://www.rogerclarke.com/DV/PIAHist-08.html.

Clarke R. An evaluation of privacy impact assessment guidance documents. Int Data Priv Law 2011;1(2):111–20. PrePrint available from: http://www.rogerclarke.com/DV/PIAG-Eval.html.

Clarke R. Big data, big risks. Inf Syst J 2016a;26(1):77–90. PrePrint available from: http://www.rogerclarke.com/EC/BDBR.html.

Clarke R. Quality assurance for security applications of big data. Proc. European Intelligence and Security Informatics Conference (EISIC), Uppsala, 17–19 August 2016; 2016b. PrePrint available from: http://www.rogerclarke.com/EC/BDQAS.html. [Accessed November 24, 2017].

Clarke R. The distinction between a PIA and a Data Protection Impact Assessment (DPIA) under the EU GDPR. Working Paper, Xamax Consultancy Pty Ltd; 2017a. Available from: http://www.rogerclarke.com/DV/PIAvsDPIA.html. [Accessed November 24, 2017].

Clarke R. Big data prophylactics. In: Lehmann A, Whitehouse D, Fischer-Hübner S, Fritsch L, Raab C, editors. Privacy and identity management. Facing up to next steps. Springer; 2017b. p. 3–14 [chapter 1]. PrePrint available from: http://www.rogerclarke.com/DV/BDP.html.

CoE. Guidelines on the protection of individuals with regard to the processing of personal data in a world of big data. Convention 108 Committee, Council of Europe; 2017. Available from: https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=09000016806ebe7a. [Accessed November 24, 2017].

DoFD. Better practice guide for big data. Australian Dept of Finance & Deregulation, v.2; 2015. Available from: http://www.finance.gov.au/sites/default/files/APS-Better-Practice-Guide-for-Big-Data.pdf. [Accessed November 24, 2017].

Domingos P. A few useful things to know about machine learning. Commun ACM 2012;55(10):78–87.

Dreyfus H. What computers still can't do. MIT Press; 1992.

DSA. Data science code of professional conduct. Data Science Association, undated but apparently of 2016; 2016. Available from: http://www.datascienceassn.org/sites/default/files/datasciencecodeofprofessionalconduct.pdf. [Accessed November 24, 2017].

GAO. Emerging opportunities and challenges: data and analytics innovation. Government Accountability Office, Washington DC; 2016. Available from: http://www.gao.gov/assets/680/679903.pdf. [Accessed November 24, 2017].

Gupta B, Goul M, Dinter B. Business intelligence and big data in higher education: status of a multi-year model curriculum development effort for business school undergraduates, MS graduates, and MBAs. Commun Assoc Inf Syst 2015;36(23). Available from: https://www.researchgate.net/profile/Babita_Gupta4/publication/274709810_Communications_of_the_Association_for_Information_Systems/links/557ecd4b08aeea18b7795225.pdf.

Hanson R. This AI boom will also bust. Overcoming Bias Blog; 2016. Available from: http://www.overcomingbias.com/2016/12/this-ai-boom-will-also-bust.html. [Accessed November 24, 2017].

Haryadi AF, Hulstijn J, Wahyudi A, van der Voort H, Janssen M. Antecedents of big data quality: an empirical examination in financial service organizations. Proc. IEEE Int'l Conf. on Big Data; 2016. pp. 116–21. Available from: https://pure.tudelft.nl/portal/files/13607440/Antecedents_of_Big_Data_Quality_IEEE2017_author_version.pdf. [Accessed November 24, 2017].

Hazen BT, Boone CA, Ezell JD, Jones-Farmer LA. Data quality for data science, predictive analytics, and big data in supply chain management: an introduction to the problem and suggestions for research and applications. Int J Prod Econ 2014;154:72–80. Available from: https://www.researchgate.net/profile/Benjamin_Hazen/publication/261562559_Data_Quality_for_Data_Science_Predictive_Analytics_and_Big_Data_in_Supply_Chain_Management_An_Introduction_to_the_Problem_and_Suggestions_for_Research_and_Applications/links/0deec534b4af9ed874000000.

Huh YU, Keller FR, Redman TC, Watkins AR. Data quality. Inf Softw Tech 1990;32(8):559–65.

ICO. Big data, artificial intelligence, machine learning and data protection. UK Information Commissioner's Office, Discussion Paper v.2.2; 2017. Available from: https://ico.org.uk/for-organisations/guide-to-data-protection/big-data/. [Accessed November 24, 2017].

Jagadish HV. Big data and science: myths and reality. Big Data Res 2015;2(2):49–52.

Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel JM, Ramakrishnan R, et al. Big data and its technical challenges. Commun ACM 2014;57(7):86–94.

Kandel S, Heer J, Plaisant C, Kennedy J, van Ham F, Henry-Riche N, et al. Research directions for data wrangling: visualizations and transformations for usable and credible data. Inf Vis 2011;10(4):271–88. Available from: https://idl.cs.washington.edu/files/2011-DataWrangling-IVJ.pdf. [Accessed November 24, 2017].

Katz Y. Noam Chomsky on where artificial intelligence went wrong: an extended conversation with the legendary linguist. The Atlantic; 2012. Available from: https://www.theatlantic.com/technology/archive/2012/11/noam-chomsky-on-where-artificial-intelligence-went-wrong/261637/. [Accessed November 24, 2017].

King NJ, Forder J. Data analytics and consumer profiling: finding appropriate privacy principles for discovered data. Comput Law Secur Rev 2016;32:696–714.

Knight W. The dark secret at the heart of AI. MIT Technology Review, 11 April 2017. Available from: https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/. [Accessed November 24, 2017].

Kurzweil R. The singularity is near: when humans transcend biology. Viking; 2005.

Lazer D, Kennedy R, King G, Vespignani A. The parable of Google flu: traps in big data analysis. Science 2014;343(6176):1203–5. Available from: https://dash.harvard.edu/bitstream/handle/1/12016836/The%20Parable%20of%20Google%20Flu%20%28WP-Final%29.pdf.

Lipton ZC. (Deep Learning's Deep Flaws)'s Deep Flaws. KDnuggets; 2015. Available from: http://www.kdnuggets.com/2015/01/deep-learning-flaws-universal-machine-learning.html. [Accessed November 24, 2017].

Mayer-Schonberger V, Cukier K. Big data: a revolution that will transform how we live, work and think. John Murray; 2013.

McFarland DA, McFarland HR. Big data and the danger of being precisely inaccurate. Big Data Soc 2015;2(2):1–4.

Merino J, Caballero I, Bibiano R, Serrano M, Piattini M. A data quality in use model for big data. Fut Gen Comput Syst 2016;63:123–30.

Metcalf J, Crawford K. Where are human subjects in big data research? The emerging ethics divide. Big Data Soc 2016;3(1):1–14.

Mittelstadt BD, Allo P, Taddeo M, Wachter S, Floridi L. The ethics of algorithms: mapping the debate. Big Data Soc 2016;3(2):1–21.

Moravec H. Robot: mere machine to transcendent mind. Oxford University Press; 2000.

Müller H, Freytag J-C. Problems, methods and challenges in comprehensive data cleansing. Technical Report HUB-IB-164, Humboldt-Universität zu Berlin, Institut für Informatik; 2003. Available from: http://www.informatik.uni-jena.de/dbis/lehre/ss2005/sem_dwh/lit/MuFr03.pdf. [Accessed November 24, 2017].

Müller O, Junglas I, vom Brocke J, Debortoli S. Utilizing big data analytics for information systems research: challenges, promises and guidelines. Eur J Inf Syst 2016;25(4):289–302. Available from: https://www.researchgate.net/profile/Oliver_Mueller5/publication/290973859_Utilizing_Big_Data_Analytics_for_Information_Systems_Research_Challenges_Promises_and_Guidelines/links/56ec168f08aee4707a384fff/Utilizing-Big-Data-Analytics-for-Information-Systems-Research-Challenges-Promises-and-Guidelines.pdf.

NIST. NIST big data interoperability framework. Special Publication 1500-1, v.1, National Institute of Standards and Technology; 2015. Available from: https://bigdatawg.nist.gov/V1_output_docs.php. [Accessed November 24, 2017].

OAIC. Consultation draft: guide to big data and the Australian Privacy Principles. Office of the Australian Information Commissioner; 2016. Available from: https://www.oaic.gov.au/engage-with-us/consultations/guide-to-big-data-and-the-australian-privacy-principles/consultation-draft-guide-to-big-data-and-the-australian-privacy-principles. [Accessed November 24, 2017].

Piprani B, Ernst D. A model for data quality assessment. Proc. OTM Workshops (5333); 2008. pp. 750–9.

Press G. A very short history of data science. Forbes; 2013. Available from: https://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#375c75e355cf. [Accessed November 24, 2017].

Raab C, Clarke R. Inadequacies in the UK's data science ethical framework. Euro Data Protect L 2016;2(4):555–60. PrePrint available from: http://www.rogerclarke.com/DV/DSEFR.html.

Raab CD, Wright D, de Hert P, editors. Surveillance: extending the limits of privacy impact assessment; 2012. p. 363–83 [Ch. 17].

Rahm E, Do HH. Data cleaning: problems and current approaches. IEEE Data Eng Bull 2000;23. Available from: http://dc-pubs.dbs.uni-leipzig.de/files/Rahm2000DataCleaningProblemsand.pdf.

Rivers CM, Lewis BL. Ethical research standards in a world of big data. F1000Res 2014;3:38. Available from: https://f1000research.com/articles/3-38.

Rosebrock A. Get off the deep learning bandwagon and get some perspective. PyImageSearch; 2014. Available from: https://www.pyimagesearch.com/2014/06/09/get-deep-learning-bandwagon-get-perspective/. [Accessed November 24, 2017].

Saha B, Srivastava D. Data quality: the other face of big data. Proc. Data Engineering (ICDE); 2014. pp. 1294–7. Available from: https://people.cs.umass.edu/~barna/paper/ICDE-Tutorial-DQ.pdf. [Accessed November 24, 2017].

Schoenherr T, Speier-Pero C. Data science, predictive analytics, and big data in supply chain management: current state and future potential. J Bus Logist 2015;36(1):120–32. Available from: http://www.logisticsexpert.org/top_articles/2016/2016%20-%20Research%20-%20JBL%20-%20Data%20Science,%20Predictive%20Analytics,%20and%20Big%20Data%20in%20Supply%20Chain%20Managementl.pdf.

Shanks G, Darke P. Understanding data quality in a data warehouse. Aust Comput J 1998;30:122–8.

Swoyer S. The shortcomings of predictive analytics. TDWI; 2017. Available from: https://tdwi.org/articles/2017/03/08/shortcomings-of-predictive-analytics.aspx. [Accessed November 24, 2017].

UKCO. Data science ethical framework. U.K. Cabinet Office, v.1.0; 2016. Available from: https://www.gov.uk/government/publications/data-science-ethical-framework. [Accessed November 24, 2017].

UNSD. Declaration of professional ethics. United Nations Statistical Division; 1985. Available from: http://unstats.un.org/unsd/dnss/docViewer.aspx?docID=93#start. [Accessed November 24, 2017].

Wang RY, Strong DM. Beyond accuracy: what data quality means to data consumers. J Manag Inf Syst 1996;12(4):5–33 (Spring 1996).

Wigan MR, Clarke R. Big data's big unintended consequences. IEEE Comput 2013;46(6):46–53. PrePrint available from: http://www.rogerclarke.com/DV/BigData-1303.html.

WP29. Statement of the WP29 on the impact of the development of big data on the protection of individuals with regard to the processing of their personal data in the EU. Article 29 Working Party, European Union; 2014. Available from: http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp221_en.pdf. [Accessed November 24, 2017].

Wright D, de Hert P, editors. Privacy impact assessments. Springer; 2012.

Wright D, Friedewald M. Integrating privacy and ethical impact assessments. Sci Public Policy 2013;40(6):755–66. Available from: http://spp.oxfordjournals.org/content/40/6/755.full.

Zook M, Barocas S, boyd d, Crawford K, Keller E, Gangadharan SP, Goodman A, et al. Ten simple rules for responsible big data research. PLoS Comput Biol 2017;13(3). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5373508/. [Accessed November 24, 2017].
Roger Clarke's 'Responsible AI' 3/10/18, 16:39
Roger Clarke **
Abstract
Organisations across the private and public sectors are looking to use artificial intelligence (AI) techniques not only to draw inferences, but also to make decisions and take
action, and even to do so autonomously. This is despite the absence of any means of programming values into technologies and artefacts, and the obscurity of the rationale
underlying inferencing using contemporary forms of AI.
To what extent is AI really suitable for real-world applications? Can corporate executives satisfy their board-members that the business is being managed appropriately if
AI is inscrutable? Beyond operational management, there are compliance risks to manage, and threats to important relationships with customers, staff, suppliers and the
public. Ill-advised uses of AI need to be identified in advance and nipped in the bud, to avoid harm to important values, both corporate and social. Organisations need to
extract the achievable benefits from advanced technologies rather than dreaming dangerous dreams.
This working paper first considers several approaches to addressing the gap between the current round of AI marketing hype and the hard-headed worlds of business and
government. It is first proposed that AI needs to be re-conceived as 'complementary intelligence', and that the robotics notion of 'machines that think' needs to give way to
the idea of 'intellectics', with the focus on 'computers that do'.
A review of 'ethical analysis' of IT's impacts extracts little of value. A consideration of regulatory processes proves to be of more use, but to still deliver remarkably little
concrete guidance. It is concluded that the most effective approach for organisations to take is to apply adapted forms of the established techniques of risk assessment and
risk management. Critically, stakeholder analysis needs to be performed, and risk assessment undertaken, from those perspectives as well as from that of the organisation
itself. This Working Paper's final contribution is to complement that customised form of established approaches to risk by the presentation of a derivative set of Principles
for Responsible AI, with indications provided of how those Principles can be operationalised for particular forms of complementary intelligence and intellectics.
Contents
1. Introduction
2. Rethinking AI
2.1 'AI' cf. 'Complementary Intelligence'
2.2 Autonomy
2.3 Technology, Artefacts, Systems and Applications
3. Contemporary AI
3.1 Robotics
3.2 Cyborgisation
3.3 'AI / ML'
3.4 Intellectics
4. Ethics
5. Regulation
6. A Practical Approach
6.1 Corporate Risk Assessment
6.2 Stakeholder Risk Assessment
6.3 Comprehensive Risk Management
7. Towards Operational Principles
8. Conclusions
References
Supporting Materials:
Ethical Principles and Information Technology
Principles for AI: A SourceBook
Appendix 1: 50 Principles for Responsible AI Technologies, Artefacts, Systems and Applications
Appendix 2: Omitted Elements
1. Introduction
The term Artificial Intelligence (AI) was coined in 1955, in the proposal for the 1956 Dartmouth Summer Research Project on Artificial Intelligence (McCarthy et al. 1955). The proposal
was based on "the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to
simulate it". Histories of AI (e.g. Russell & Norvig 2009, pp. 16-28) identify multiple strands, but also multiple re-visits to much the same territory, and a considerable [...]
http://rogerclarke.com/EC/GAIF.html Page 1 of 18
The over-enthusiasm that characterises the promotion of AI has deep roots. Simon (1960) averred that "Within the very near future - much less than twenty-five years - we
shall have the technical capability of substituting machines for any and all human functions in organisations. ... Duplicating the problem-solving and information-handling
capabilities of the brain is not far off; it would be surprising if it were not accomplished within the next decade". Over 35 years later, with his predictions abundantly
demonstrated as being fanciful, Simon nonetheless maintained his position, e.g. "the hypothesis is that a physical symbol system [of a particular kind] has the necessary
and sufficient means for general intelligent action" (Simon 1996, p. 23 - but expressed in similar terms from the late 1950s, in 1969, and through the 1970s), and "Human
beings, viewed as behaving systems, are quite simple" (p. 53). Simon acknowledged "the ambiguity and conflict of goals in societal planning" (p. 140), but his subsequent
analysis of complexity (pp. 169-216) considered only a very limited sub-set of the relevant dimensions. Much the same dubious assertions can be found in, for example,
Kurzweil (2005): "by the end of the 2020s" computers will have "intelligence indistinguishable to biological humans" (p.25), and in self-promotional documents of the
current decade.
AI has offered a long litany of promises, many of which have been repeated multiple times, on a cyclical basis. Each time, proponents have spoken and written excitedly
about prospective technologies, using descriptions that not merely verged into the mystical, but often crossed the border into the realms of magic and alchemy. Given the
habituated exaggerations that proponents indulge in, it is unsurprising that the field has exhibited cyclical 'boom and bust' patterns, with research funding being sometimes
very easy to obtain, and sometimes very difficult, depending on whether the focus at the time is on the hyperbole or on the very low delivery-rate against promises.
Part of AI's image-problem is that most of the successes deriving from what began as AI research have shed the name, and become associated with other terms. For
example, pattern recognition, variously within text, speech and two-dimensional imagery, has made a great deal of progress, and achieved application in multiple fields, as
diverse as dictation, vehicle number-plate recognition and object and facial recognition. Expert systems approaches, particularly based on rule-sets, have also achieved a
degree of success. Game-playing, particularly of chess and go, has surpassed human-expert levels, and provided entertainment value and spin-offs, but seems not to have
provided the breakthroughs towards posthumanism that its proponents appeared to be claiming for it.
This Working Paper concerns itself with the question of how organisations can identify AI technologies that have practical value, and apply them in ways that achieve
benefits, without incurring disproportionate disbenefits or giving rise to unjustified risks. A key feature of AI successes to date appears to be that, even where the
technology or its application is complex, it is understandable by people with appropriate technical background, i.e. it is not magic and is not presented as magic, and its
applications are auditable. AI technologies that have been effective have been able to be empirically tested in real-world contexts, but under sufficiently controlled
conditions that the risks have been able to be managed.
The scope addressed in this Working Paper is very broad, in terms of both technologies and applications, but it does not encompass design and use for warfare or armed
conflict. It does, however, include applications to civil law enforcement and domestic national security, i.e. safeguards for the public, for infrastructure, and for public
figures.
This working paper commences by considering interpretations of the AI field that may contribute to overcoming its problems and assist in analysing the opportunities and
threats that it embodies. Brief scans are undertaken of current technologies that are within the field of view. There are several possible sources of guidance in relation to the
responsible use of AI. The paper first considers ethics, and then regulatory regimes. It proposes, however, that the most useful approach is through risk assessment and
management processes, but expanding the perspectives from solely that of the organisation itself to also embrace those of stakeholders. The final section draws on the
available sources in order to propose a set of principles for the responsible application of AI that are specific enough to guide organisations' business processes.
2. Rethinking AI
A major contributor to AI's problems has been the diverse and often conflicting conceptions of what it is, and what it is trying to achieve. The first necessary step is to
disentangle the key ideas, and adopt an interpretation that can assist user organisations to appreciate the nature of the technology, and then analyse its potential
contributions and downsides.
2.1 'AI' cf. 'Complementary Intelligence'
The general sense in which the term 'intelligence' is used by the AI community is that an entity exhibits intelligence if it has perception and cognition of (relevant aspects
of) its environment, has goals, and formulates actions towards the achievement of those goals (Albus 1991, Russell & Norvig 2003, McCarthy 2007). Some AI proponents
strive to replicate in artefacts the processes whereby human entities exhibit intelligence, whereas others define AI in terms of the artefact's performance rather than the
means whereby the performance arises.
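The working definition above (perception and cognition of the environment, goals, and actions formulated towards those goals) can be expressed as a minimal sketch. This is not drawn from any cited source; the class and method names (`Thermostat`, `perceive`, `decide`) are purely illustrative:

```python
# Minimal sketch of the AI community's working definition of intelligence:
# an entity perceives (relevant aspects of) its environment, holds goals,
# and formulates actions towards the achievement of those goals.
# All names and values here are illustrative assumptions.

class Thermostat:
    """A trivially 'intelligent' artefact under the definition above."""

    def __init__(self, goal_temp: float):
        self.goal_temp = goal_temp          # the entity's goal

    def perceive(self, environment: dict) -> float:
        # perception of a relevant aspect of the environment
        return environment["temperature"]

    def decide(self, sensed_temp: float) -> str:
        # cognition: compare the perception with the goal, choose an action
        if sensed_temp < self.goal_temp - 0.5:
            return "heat_on"
        if sensed_temp > self.goal_temp + 0.5:
            return "heat_off"
        return "no_action"

agent = Thermostat(goal_temp=20.0)
action = agent.decide(agent.perceive({"temperature": 17.0}))
print(action)  # heat_on
```

Note that the sketch is performance-oriented in the second sense described above: it says nothing about whether the artefact's internal process resembles human cognition.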
The term 'artificial' has always been problematic. The originators of the term used it to mean 'synthetic', in the sense of being human-made but equivalent to human. It is far
from clear that there was a need for yet more human intelligence in 1955, when there were 2.8 billion people, let alone now, when there are over 7 billion of us, many
under-employed and likely to remain so.
Some proponents have shifted away from human-equivalence, and posited that AI is synthetic, but in some way 'superior-to-human'. This raises the question as to how
superiority is to be measured. For example, is playing a game better than human experts necessarily a useful measure? There is also a conundrum embedded in this
approach: if human intelligence is inferior, how can it reliably define what 'superior-to-human' means?
An alternative approach may better describe what humankind needs. An idea that is traceable at least to Wyndham (1932) is that " ... man and machine are natural
complements: They assist one another". I argued in Clarke (1989) that there was a need to "deflect the focus ... toward the concepts of 'complementary intelligence' and
'silicon workmates' ... to complement human strengths and weaknesses, rather than to compete with them". Again, in Clarke (1993), reprised in Clarke (2014b), I reasoned
that: "Because robot and human capabilities differ, for the foreseeable future at least, each will have specific comparative advantages. Information technologists must
delineate the relationship between robots and people by applying the concept of decision structuredness to blend computer-based and human elements advantageously".
Adopting this approach, AI needs to be re-conceived such that its purpose is to extend human capabilities. Rather than 'artificial' intelligence, the design objective needs to
be 'complementary' intelligence, the essence of which is:
An important category of 'complementary intelligence' is the use of negative-feedback mechanisms to achieve automated equilibration within human-made systems. A
longstanding example is the maintenance of ship trim and stability by means of hull shape and careful weight distribution, including ballast. A more commonly celebrated
instance is Watt's fly-ball governor, which regulates the speed of a steam engine. Of more recent origin are schemes to achieve real-time control over the orientation of craft
floating in fluids, and maintenance of their location or path. There are successful applications to deep-water oil-rigs, underwater craft, and aircraft both with and without
pilots on board. The notion is also exemplified by the distinction between decision support systems (DSS), which are designed to assist humans make decisions, and
decision systems (DS), whose purpose is to make the decisions without human involvement.
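The negative-feedback idea can be sketched in a few lines. The function below is an illustrative proportional controller in the spirit of the governor and trim examples above; it is not a model of any specific system, and the names and values are assumptions:

```python
# Hedged sketch: negative feedback driving a system towards equilibrium.
# Each step, the correction opposes the deviation from the set-point,
# so the deviation shrinks and the system equilibrates.

def equilibrate(state: float, setpoint: float,
                gain: float = 0.5, steps: int = 50) -> float:
    for _ in range(steps):
        error = setpoint - state      # deviation from the desired condition
        state += gain * error         # correction opposes the deviation
    return state

print(round(equilibrate(state=10.0, setpoint=2.0), 3))  # 2.0
```

With a gain between 0 and 1, the deviation halves (here) on every step, which is the automated-equilibration behaviour the text describes; a decision system (DS) embeds such a loop without human involvement, whereas a decision support system (DSS) would surface the `error` term to a human instead.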
Computer-based systems have a clear advantage over humans in contexts in which significant computation is involved, reliability and accuracy are important, and speed of
inferencing, decision-making and/or action-taking, are important. This advantage is, however, limited to circumstances in which either a structured process exists or
heuristics or purely empirical techniques have been well-demonstrated to be effective.
Further advantages may arise in relation to cost, the delegation to devices of boringly mundane tasks, and the performance by artefacts of tasks that are inherently
dangerous, or that need to be performed in environments that are inherently dangerous to humans and/or are beyond their physical capabilities (e.g. environments that
feature high pressure such as deep water, low pressure such as space, or high radiation levels both in space and close to nuclear materials). Even where such superiority can
be demonstrated, however, the need exists to focus discussion about AI on complementary intelligence, on technologies that augment human capabilities, and on systems
that feature collaboration between humans and artefacts.
I contend that the use of the complementary intelligence notion can assist organisations in their efforts to distinguish uses of AI that have prospects for adoption, the
generation of net benefits, the management of disbenefits, and the achievement of public acceptability.
2.2 Autonomy
The concept of 'automation' is concerned with the performance of a predetermined procedure, or responses in predetermined ways to alternative stimuli. It is observable in
humans, e.g. under hypnosis, and is designed into many kinds of artefacts.
The rather different notion of 'autonomy' means, in humans, the capacity for independent decision and action. Further, in some contexts, it also encompasses a claim to the
right to exercise that capacity. It is associated with the notions of consciousness, sentience, self-awareness, free will and self-determination. Autonomy in artefacts, on the
other hand, lies much closer to the notion of automation. It may merely refer to a substantial repertoire of pre-programmed stimulus-response relationships. Alternatively, it
may refer to some degree of adaptability to context, as might arise if some form of machine-learning were included, such that the specification of the stimulus-response
relationships changes over time, depending on the cases handled in the intervening period. Another approach might be to define artefact autonomy in terms of the extent to
which a human, or some other artefact, does, or even can, intervene in the artefact's behaviour.
In humans, autonomy is best approached as a layered phenomenon. Each of us performs many actions in a subliminal manner. For example, our eye and ear receptors
function without us ever being particularly aware of them, and several layers of our neural systems handle the signals in order to offer us cognition, that is to say awareness
and understanding, of the world around us.
A layered approach is applicable to artefacts as well. Aircraft generally, including drones, may have layers of behaviour that occur autonomously, without pilot action or
even awareness. Maintenance of the aircraft's 'attitude' (orientation to the vertical and horizontal), and angle to the wind-direction, may, from the pilot's viewpoint, simply
happen. At a higher level of delegation, the aircraft may adjust its flight controls in order to maintain a predetermined flight-path, and in the case of rotorcraft, to
maintain the vehicle's location relative to the earth's surface. A higher-order autonomous function is inflight manoeuvring to avoid collisions. At a yet higher level, some
aircraft can perform take-off and/or landing autonomously. To date, high-order activities that are seldom if ever autonomous include decisions about when to take off and
land, the mission objective, and 'target acquisition' (where to land, where to deliver a payload, which location to direct the payload towards).
At the lower levels, the rapidity with which analysis, decision and action need to be taken may preclude conscious human involvement. At the higher levels, however, a
pilot may be able to request advice, to accept or reject advice, to authorise an action recommended by an artefact, to override or countermand a default action, or to resume
full control. From the perspective of the drone, its functions may be to perform until its autonomous function is revoked, to perform except where a particular action is
over-ridden, to recommend, to advise, or to do nothing.
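The spectrum of functions just listed, from the drone's perspective, can be sketched as an ordered scale. The class, level names and helper below are illustrative assumptions, not an established taxonomy:

```python
from enum import IntEnum

class Delegation(IntEnum):
    """Illustrative levels of delegation to an artefact, lowest to highest."""
    DO_NOTHING = 0                 # artefact takes no action
    ADVISE = 1                     # offers advice when asked
    RECOMMEND = 2                  # recommends an action, awaiting authorisation
    PERFORM_UNLESS_OVERRIDDEN = 3  # acts unless a human countermands
    PERFORM_UNTIL_REVOKED = 4      # acts until the autonomous function is revoked

def human_decides_each_action(level: Delegation) -> bool:
    """At the lower three levels, a human decision precedes every action."""
    return level <= Delegation.RECOMMEND

print(human_decides_each_action(Delegation.RECOMMEND))             # True
print(human_decides_each_action(Delegation.PERFORM_UNTIL_REVOKED)) # False
```

The ordering makes explicit the point made above: at the top two levels the human is no longer a gate in front of each action, but at most an overseer who can intervene.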
IEEE, even though it is one of the most relevant professional associations in the field, made no meaningful attempt to address these issues for decades. It is currently
endeavouring to do so. It commenced with a discussion paper (IEEE 2017) which avoids the term AI, and instead uses the term 'Autonomous and Intelligent Systems
(A/IS)'. This highlights the need to address both intelligence and autonomy in an integrated manner.
IEEE's discussion paper (IEEE 2017) recognises that the end-result of successive rounds of R&D is complex systems that are applied in real-world contexts. In order to
deliver such systems, however, technology has to be conceived, proven, and embedded in artefacts. It is therefore valuable to distinguish between technology, artefacts that
embody the technology, systems that incorporate the artefacts, and applications of those systems. Appropriate responsibilities can then be assigned to researchers, to
inventors, to innovators, to purveyors, and to users. Table 1 identifies phases, the output from each phase, and the categories of entity that bear legal and moral
responsibility for disbenefits arising from AI.
This section has proposed several measures whereby the fog induced by the AI notion can be lifted, and a framework developed for managing AI-based activities. The
focus needs to be on complementary intelligence and autonomy, as features of technology, artefacts, systems and applications that support collaboration among all system
elements.
3. Contemporary AI
AI's scope is broad, and contested. This section identifies areas that have current relevance. Their relevance derives in part from claims of achievement of progress and
benefits, and in part from media coverage resulting in awareness among both organisations' staff and the general public. In addition to achieving some level of adoption,
each faces, at least to some degree, technical challenges, public scepticism and resistance. Achievement of the benefits that are potentially extractable from these
technologies is also threatened by over-claiming, over-reach, and resulting loss of public confidence. This section considers three forms of AI, and then suggests an
alternative conceptualisation intended to assist in understanding and addressing the technical, acceptance and adoption challenges.
3.1 Robotics
http://rogerclarke.com/EC/GAIF.html Page 3 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16:39
Robotics originally emerged in the form of machines enhanced with computational capacity. The necessary elements are sensors to acquire data from the robot's
environment, computing hardware and software to enable inferences to be drawn and decisions made, and actuators in order to give effect to those decisions by acting on
the robot's environment. Robotics has enjoyed its major areas of success in controlled environments such as the factory floor and the warehouse. Less obviously 'robotic'
systems include low-level control over the attitude, position and course of craft on or in water and in the air.
The last few years have seen a great deal of coverage of self-driving vehicles, variously on rails and otherwise, in controlled environments such as mines and quarries and
dedicated bus routes, and recently in more open environments. In addition, robotics has taken flight, in the form of drones (Clarke 2014a).
Many claims have been made recently about 'the Internet of Things' (IoT) and about systems comprising many small artefacts, such as 'smart houses' and 'smart cities'. For
a consolidation and rationalisation of multiple such ideas into the notion of an 'eObject', see Manwaring & Clarke (2015). Many of the initiatives in this area are robotic in
nature, in that they encompass all of sensors, computing and actuators.
3.2 Cyborgisation
The term cyborgisation refers to the process of enhancing individual humans by technological means, such that a cyborg is a hybrid of a human and one or more artefacts
(Clarke 2005, Warwick 2014). Many forms of cyborg fall outside the field of AI, such as spectacles, implanted lenses, stents, inert hip-replacements and SCUBA gear.
However, a proportion of the artefacts that are used to enhance humans include sensors, computational or programmatic 'intelligence', and one or more actuators. Examples
include heart pacemakers (since 1958), cochlear implants (since the 1960s, and commercially since 1978), and some replacement legs for above-knee amputees, in that the
artificial knee contains software to sustain balance within the joint.
Many such artefacts replace lost functionality, and are referred to as prosthetics. Others, which can be usefully referred to as orthotics, provide augmented or additional
functionality (Clarke 2011). An example of an orthotic is augmented reality for firefighters, displaying building plans and providing object-recognition in their visual field.
It was argued in Clarke (2014b) that use by drone pilots of instrument-based remote control, and particularly of first-person view (FPV) headsets, represents a form of
orthotic cyborgisation.
Artefacts of these kinds are not commonly included in catalogues of AI technology. On the other hand, they have a great deal in common with it, and with the notion of
complementary intelligence, and research in the field is emergent (Zhaohui et al. 2016). Cyborgisation has accordingly been defined as being within-scope of the present
analysis.
3.3 AI/ML
AI research has delivered a further technique, which accords primacy to the data rather than the model, and has the effect of obscuring the model to such an extent that no
humanly-understandable rationale exists for the inferences that are drawn. The relevant branch of AI is 'machine learning' (ML), and the most common technique in use is
'artificial neural networks'. The approach dates to the 1950s, but limited progress was made until sufficiently powerful processors were readily available, from the late
1980s.
Neural nets involve a set of nodes (each of which is analogous to the biological concept of a neuron), with connections or arcs among them, referred to as 'edges'. Each
connection has a 'weight' associated with it. Each node performs some computation based on incoming data and may as a result adapt its internal state, including the
weighting on each connection, and may pass output to one or more other nodes. A neural net has to be 'trained'. This is done by selecting a training method (or 'learning
algorithm') and feeding a 'training-set' of data to the network in order to load up a set of weightings on the connections between nodes.
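The training process described above can be sketched in a few lines. The tiny two-layer network, the XOR training-set and the learning rate are invented for illustration, and gradient descent on squared error stands in for the unspecified 'learning algorithm':

```python
import numpy as np

rng = np.random.default_rng(0)

# Training-set: a list of input-variable values and of output-variable values (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Weightings on the connections between the layers of nodes.
W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    hidden = sigmoid(inputs @ W1)
    return hidden, sigmoid(hidden @ W2)

_, out0 = forward(X)
initial_error = float(np.mean((out0 - y) ** 2))

# 'Training': the learning algorithm repeatedly adjusts the weightings
# in response to the training-set, by gradient descent on squared error.
for _ in range(20000):
    hidden, out = forward(X)
    grad_out = (out - y) * out * (1 - out)
    grad_hidden = (grad_out @ W2.T) * hidden * (1 - hidden)
    W2 -= 0.5 * hidden.T @ grad_out
    W1 -= 0.5 * X.T @ grad_hidden

_, out = forward(X)
final_error = float(np.mean((out - y) ** 2))
print(initial_error, final_error)
```

Note that the 'model' here is nothing more than the two weight matrices: as the following paragraphs discuss, no humanly-understandable representation of the problem domain exists anywhere in it.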
Unlike previous techniques for developing software, neural networking approaches do not begin with active and careful modelling of a real-world problem-solution,
problem or even problem-domain. Rather than comprising a set of entities and relationships that mirrors the key elements and processes of a real-world system, a neural
network model is simply a list of input variables and a list of output variables (and, in the case of 'deep' networks, intermediary variables). If a model exists, in the sense of
a representation of the real world, it is implicit rather than express. The weightings imputed for each connection reflect the characteristics firstly of the training-set that was
fed in, and secondly of the particular learning algorithm that was imposed on the training-set.
Although algorithms are used in the imputation of weightings on the connections within a neural net, the resulting software is not algorithmic, but rather empirical. This
has led some authors to justify a-theoretical mechanisms in general, and to glorify correlation and deprecate the search for causal relationships and systemic analysis
generally (Anderson 2008, Mayer-Schonberger & Cukier 2013).
AI/ML may well have the capacity to discover gems of otherwise-hidden information. However, the inferences drawn inevitably reflect any errors and biasses inherent in
the implicit model, in the selection of real-world phenomena for which data was created, in the selection of training-set, and in the learning algorithms used to develop the
software that delivers the inferences. Means are necessary to assess the quality of the implicit model, of the data-set, of the data-item values, of the training-set and of the
learning algorithm, and the compatibility among them, and to validate the inferences both logically and empirically. Unless and until those means are found, and are
routinely applied, AI/ML and neural nets must be regarded as unproven techniques that harbour considerable dangers to the interests of organisations and their
stakeholders.
3.4 Intellectics
Robotics began with an emphasis on machines being enhanced with computational elements and software. However, the emphasis has been shifting. I contend that the
conception now needs to be inverted, and the field regarded as computers enhanced with sensors and actuators, enabling computational processes to sense the world and act
directly on it. Rather than 'machines that think', the focus needs to be on 'computers that do'. The term 'intellectics' is a useful means of encapsulating that switch in
emphasis.
The term has been previously used in a related manner by Wolfgang Bibel, originally in German (1980, 1989). Bibel was referring to the combination of Artificial
Intelligence, Cognitive Science and associated disciplines, using the notion of the human intellect as the integrating element. Bibel's sense of the term has gained limited
currency, with only a few mentions in the literature and only a few authors citing the relevant papers. The sense in which I use the term here is rather different:
In the new context of intellectics, artefacts go beyond merely drawing inferences from data, in that they generate a strong impulse for an action to be taken in
the real world
I suggest the following criteria for assessing whether an artefact should be classified as falling within the field of intellectics:
At a higher level, an artefact makes a decision, which will result in action unless over-ridden or countermanded by a human
At the highest level, an artefact makes a decision, and takes action in the real world to give effect to that decision, without providing an opportunity for a
human to prevent the action being taken
The effect of implementing intellectics is to at least reduce the moderating effect of humans in the decision-loop, and even to remove that effect entirely. The emergence of
intellectics is accordingly bringing into much stronger focus the legitimacy of the inferencing techniques used, and of the inferences that they are leading to. Among the
major challenges involved are the difficulty and expense of establishing reliable software (in particular the size of the training-set required), the low quality of a large
proportion of the data on which inferencing depends, the significance of and the approach adopted to empty cells within the data-set, and the applicability of the data-
analytic technique to the data to which it is applied (Clarke 2016a, 2016c).
The earlier generations of computer-performed inferencing enabled the expression of humanly-understandable explanations. During the procedural programming era, a set
of conditions resulted in an output, and the logic of the solution was expressed in both the software specification and the source-code. In logic-based programming,
'consequents' could be traced back to 'antecedents', and in rule-based systems, which rules 'fired' in order to deliver the output could be documented (Clarke 1991).
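The traceability attributed here to rule-based systems can be illustrated with a minimal forward-chaining sketch that records which rules 'fired'; the rules, facts and names are invented for illustration:

```python
# Invented working memory of established facts.
facts = {"income_verified", "low_debt"}

# Each rule: (name, antecedents, consequent).
rules = [
    ("R1", {"income_verified", "low_debt"}, "credit_worthy"),
    ("R2", {"credit_worthy"}, "approve_loan"),
    ("R3", {"defaulted_before"}, "reject_loan"),
]

fired = []  # the record from which a humanly-understandable explanation is built

# Forward chaining: keep applying rules whose antecedents are satisfied
# until no new consequent can be added.
changed = True
while changed:
    changed = False
    for name, antecedents, consequent in rules:
        if antecedents <= facts and consequent not in facts:
            facts.add(consequent)
            fired.append(name)
            changed = True

print(fired)                     # ['R1', 'R2']
print("approve_loan" in facts)   # True
```

The `fired` list is exactly the documentation of "which rules 'fired' in order to deliver the output" that the text describes, and it is this trace that AI/ML systems, as discussed next, cannot provide.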
That situation changes substantially with AI/ML and its primary technique, neural nets. The model is at best implicit and may be only very distantly related to the real-
world it is assumed to represent, the approach is empirical, it depends on a training-set, and it is not capable of generating a humanly-understandable explanation for an
inference that has been drawn. The application of such inferences to decision-making, and to the performance of actions in and on the real world, raises serious questions
about transparency (Burrell 2016, Knight 2017). A result of the loss of decision transparency is the undermining of organisations' accountability for their decisions and
actions. In the absence of transparency, such principles are under threat as evaluation, fairness, proportionality, evidence-based decision-making, and the capacity to
challenge decisions (APF 2013).
Applications of a variety of data analytics techniques are already giving rise to public disquiet, even in the case of techniques that are (at least in principle) capable of
generating explanations of decision rationale. The most publicly-visible of these are systems for people-scoring, most prominently in financial credit. There are also
applications in 'social credit' - although in this case to date only in the PRC (Chen & Cheung 2017). Similar techniques are also applied in social welfare contexts,
sometimes with seriously problematical outcomes (e.g. Clarke 2018a). Concerns are naturally heightened where inferencing technologies are applied to prediction -
particularly where the technique's effectiveness is assumed rather than carefully tested, published, and subject to challenge. Such approaches result in something
approaching pre-destination, through the allocation of individual people to categories and the attribution of future behaviour, in some circumstances even behaviour of a
criminal nature.
There is increasing public pressure for explanations to be provided for decisions that are adverse to the interests of individuals and of small business, especially in the
context of inscrutable inferencing techniques such as neural networking. The responsibility of decision-makers to provide explanations is implied by the principles of
natural justice and procedural fairness. In the EU, since mid-2018, as a consequence of Articles 13.2(f), 14.2(g) and 15.1(h) of the General Data Protection Regulation
(GDPR 2018), access must be provided to "meaningful information about the logic involved", "at least in" the case of automated decisions (Selbst & Powles 2017). On the
other hand, "the [European Court of Justice] has ... made clear that data protection law is not intended to ensure the accuracy of decisions and decision-making processes
involving personal data, or to make these processes fully transparent [and] a new data protection right, the 'right to reasonable inferences', is needed" (Wachter &
Mittelstadt 2019).
Re-conception of the field as Intellectics enables focus to be brought to bear on key issues confronting organisations that apply the outcomes of AI research. Intellectics
represents a major power-shift towards large organisations and away from individuals. Substantial pushback from the public needs to be anticipated, and new regulatory
obligations may be imposed on organisations. The following sections canvass the scope for these concerns to be addressed firstly by ethics, and secondly through
regulatory arrangements.
4. Ethics
Both the dated notion of AI and the alternative conceptualisations of complementary intelligence and intellectics harbour potentials for harm. So it is important for
organisations to carefully consider what factors constrain their freedom of choice and actions. The following section examines the regulatory landscape. This section first
considers the extent to which ethics affects organisational applications of technology.
Ethics is a branch of philosophy concerned with concepts of right and wrong conduct. Fieser (1995) and Pagallo (2016) distinguish 'meta-ethics', which is concerned with
the language, origins, justifications and sources of ethics, from 'normative ethics', which formulates generic norms or standards, and 'applied ethics', which endeavours to
operationalise norms in particular contexts. In a recent paper, Floridi (2018) has referred to 'hard ethics' - that which "may contribute to making or shaping the law" - and
'soft ethics' - which are discussed after the fact.
From the viewpoint of instrumentalists in business and government, the field of ethics evidences several substantial deficiencies. The first is that there is no authority, or at
least no uncontestable authority, for any particular formulation of norms, and hence every proposition is subject to debate. Further, as a form of philosophical endeavour,
ethics embodies every complexity and contradiction that smart people can dream up. Moreover, few formulations by philosophers ever reach even close to operational
guidance, and hence the sources enable prevarication and provide endless excuses for inaction. The inevitable result is that ethical discussions seldom have much influence
on real-world behaviour. Ethics is an intellectually stimulating topic for the dinner-table, and graces ex post facto reviews of disasters. However, the notion of 'ethics by
design' is even more empty than the 'privacy by design' meme. To an instrumentalist - who wants to get things done - ethics diversions are worse than a time-waster; they're
a barrier to progress.
The occasional fashion of 'business ethics' naturally inherits the vagueness of ethics generally, and provides little or no concrete guidance to organisations in any of the
many areas in which ethical issues are thought to arise. Far less does 'business ethics' assist in relation to complex and opaque digital technologies. Clarke (2018b)
consolidates a collection of attempts to formulate general ethical principles that may have applicability in technology-rich contexts - including bio-medicine, surveillance
and information technology. Remarkably, none of them contain any explicit reference to identifying relevant stakeholders. However, a number of norms are frequently-
encountered in these sets. These include demonstrated effectiveness and benefits, justification of disbenefits, mitigation of disbenefits, proportionality of negative impacts,
supervision (including safeguards, controls and audit), and recourse (including complaints and appeals channels, redress, sanctions, and enforcement powers and
resources).
The related notion of Corporate Social Responsibility (CSR), sometimes extended to include an Environmental aspect, can be argued to have an ethical base. In practice,
its primary focus is usually on the extraction of public relations gains from organisations' required investments in regulatory compliance. CSR can, however, extend
beyond the direct interests of the organisation to include philanthropic contributions to individuals, community, society or the environment.
When evaluating the potential impact of ethics and CSR, it is important to appreciate the constraints on company directors. They are required by law to act in the best
interests of each company of which they are a director. Attention to broad ethical questions is generally extraneous to, and even in conflict with, that requirement, except
where a business case indicates sufficient benefits to the organisation from taking a socially or environmentally responsible approach. The primary ways in which benefits
can accrue are through compliance with regulatory requirements, and enhanced relationships with important stakeholders. Most commonly, these stakeholders will be
customers, suppliers and employees, but the scope might extend to communities and economies on which the company has a degree of dependence.
Given the limited framework provided by ethics, the question arises as to the extent to which organisations are subject to legal and social mechanisms that prevent or
constrain their freedom to create technologies, and to embody them in artefacts, systems and applications.
5. Regulation
Roughly every decade since 1956, AI's proponents seem to have argued that its arrival is imminent. Despite that, it appears that few regulatory
requirements have been designed or modified specifically with AI in mind. One reason for this is that parliaments seldom act in advance of new technologies being
deployed.
A 'precautionary principle' has been enunciated, whose strong form exists in some jurisdictions' environmental laws, along the lines of 'When human activities may lead to
morally unacceptable harm that is scientifically plausible but uncertain, actions shall be taken to avoid or diminish that potential harm' (TvH 2006). More generally,
however, the 'principle' is merely an ethical norm to the effect that 'If an action or policy is suspected of causing harm, and scientific consensus that it is not harmful is
lacking, then the burden of proof arguably falls on those taking the action'. Where AI appears likely to be impactful on the scale that its proponents suggest, surely the
precautionary principle applies, at the very least in its weak form. On the other hand, the considerable impacts of such AI technologies as automated number-plate
recognition (ANPR), 'facial recognition' and drones have not been the subject even of effective after-the-fact regulatory adaptation or innovation, let alone of proactive
protective measures.
A large body of theory exists relating to regulatory mechanisms (Braithwaite & Drahos 2000). Regulation takes many forms, including intrinsic and natural controls, self-
control, several levels of 'soft' community controls, various kinds of 'formal' or 'hard' regulatory schemes, and regulation by infrastructure or 'code'. An overview of these
categories is in Clarke & Bennett Moses (2014), and a relevant analysis is in Clarke (2014c). This section identifies a range of sources that may offer organisations some
guidance, or at least insights into what society expects and into the obligations to which organisations might be subject.
Economic factors tend to constrain adoption, commonly because of the expense involved and inadequate volume or profit-margin. This is particularly likely to be
determinative where the technology is, or is perceived to be, insufficiently effective in delivering on its promise. In some circumstances, the realisation of the potential
benefits of a technology may be dependent on infrastructure that is unavailable or inadequate. (For example, computing could have exploded in the third quarter of the 19th
century, rather than 100 years later, had metallurgy of the day been able to support Babbage's 'difference' and 'analytical' engines). Another form of control is the opposition
of players with sufficient institutional or market power. This includes the use of formal media and social media to stir up public opprobrium.
It is far from clear that any of the currently-promoted forms of AI are subject to adequate intrinsic and natural controls. The following sub-sections accordingly consider
each of the various forms of regulatory intervention, beginning at the apex of the regulatory pyramid with 'hard law'.
In HTR (2017), South Korea is identified as having enacted the first national law relating to robotics generally: the Intelligent Robots Development Distribution Promotion
Act of 2008. It is almost entirely facilitative and stimulative, and barely even aspirational in relation to regulation of robotics. There is mention of a 'Charter', "including the
provisions prescribed by Presidential Decrees, such as ethics by which the developers, manufacturers, and users of intelligent robots shall abide" - but no such Charter
appears to exist. A mock-up is at Akiko (2012). HTR (2018c) offers a generic regulatory specification in relation to research and technology generally, including robotics
and AI.
In relation to autonomous motor vehicles, a number of jurisdictions have enacted laws. See Palmerini et al. (2014, pp.36-73), Holder et al. (2016), DMV-CA (2018),
Vellinga (2017), which reviews laws in the USA at federal level, California, United Kingdom, and the Netherlands, and Maschmedt & Searle (2018), which reviews such
laws in three States of Australia. Such initiatives have generally had a strong focus on economic motivations, the stimulation and facilitation of innovation, exemptions
from some existing regulation, and limited new regulation or even guidance. One approach to regulation is to leverage off natural processes. For example, Schellekens
(2015) argued that a requirement of obligatory insurance was a sufficient means for regulating liability for harm arising from self-driving cars. In the air, legislatures and
regulators have moved very slowly in relation to the regulation of drones (Clarke & Bennett Moses 2014, Clarke 2016b).
Automated decision-making about people has been subject to French data protection law for many years. In mid-2018 this became a feature of European law generally,
through the General Data Protection Regulation (GDPR) Art. 22, although doubts have been expressed about its effectiveness (Wachter et al. 2017).
On the one hand, it might be that AI-based technologies are less disruptive than they are claimed to be, and that laws need little adjustment. On the other, a mythology of
'technology neutrality' pervades law-making. Desirable as it might be for laws to encompass both existing and future artefacts and processes, genuinely disruptive
technologies have features that render existing laws ambiguous and ineffective.
Applications of AI will generally be subject to the various forms of commercial law, particularly contractual obligations including express and implied terms, consumer
rights laws, and copyright and patent laws. In some contexts (such as robotics, cyborg artefacts, and AI software embedded in devices), product liability laws may apply.
Other laws that assign risk to innovators may also apply, such as the tort of negligence, as may laws of general applicability such as human rights law, anti-discrimination
law and data protection law. The obligations that the corporations law assigns to company directors are also relevant. Further sources of regulatory impact are likely to be
the laws relating to the various industry sectors within which AI is applied, such as road transport law, workplace and employment law, and health law.
Particularly in common law jurisdictions, there is likely to be a great deal of uncertainty about the way in which laws will be applied by tribunals and courts if any
particular dispute reaches them. This acts to some extent as a deterrent against innovation, and can considerably increase the costs incurred by proponents, and delay
deployment. From the viewpoint of people who perceive themselves to be negatively affected by the innovation, on the other hand, channels for combatting those threats
Unfortunately, few instances of effective co-regulation exist, because such processes typically exclude less powerful stakeholders. In any case, there are few signs of
parliaments being aware of the opportunity, and of its applicability to Intellectics. In Australia, for example, Enforceable Codes exist that are administered by the
Australian Communications and Media Authority (ACMA) in respect of TV and radio broadcasting, and telecommunications, and by the Australian Prudential Regulation
Authority (APRA) in respect of banking services. These arrangements succeed both in facilitating business and government activities and in offering a veneer of
regulation; but they fail to exercise control over behaviour that the public regards as inappropriate, and hence they have little public credibility.
It is common for parliaments to designate a specialist government agency or parliamentary appointee either to exercise loose oversight over a contested set of activities, or
to exercise powers and resources in order to enforce laws or Codes. An important function of either kind of organisation is to provide guidance to both the regulatees and
the parties that the scheme is intended to protect. In very few instances, however, does it appear that AI lies within the scope of an existing agency or appointee. Some
exceptions may exist, for example in relation to the public safety aspects of drones and self-driving motor vehicles.
As a result, in most jurisdictions, limited guidance appears to exist. For example, six decades after the AI era was launched, the EU has gone no further than a preliminary
statement (EC 2018) and a discussion document issued by the Data Protection Supervisor (EDPS 2016). Similarly, the UK Data Protection Commissioner has only reached
the stage of issuing a discussion paper (ICO 2017). The current US Administration's policy is entirely stimulative in nature, and mentions regulation solely as a barrier to
economic objectives (WH 2018).
It could also be argued that, if norms are promulgated by the more responsible corporations in an industry sector, then misbehaviour by the industry's 'cowboys' would be
highlighted. In practice, however, the effect of Industry Codes on corporate behaviour is seldom significant. Few such Codes are sufficiently stringent to protect the
interests of other parties, and the absence of enforcement undermines the endeavour. The more marginal kinds of suppliers ignore them, and responsible corporations feel
the pinch of competition and reduce their commitment to them. As a result, such Codes act as camouflage, obscuring the absence of safeguards and thereby holding off
actual regulatory measures. In the AI field, examples of industry coalitions eagerly pre-countering the threat of regulation include FLI (2017), ITIC (2017), and PoAI
(2018).
A more valuable role is played by industry standards. HTR (2017) lists industry standards issued by the International Standards Organisation (ISO) in the AI arena. A
considerable proportion of industry standards focus on inter-operability, and on business processes intended to achieve quality assurance. Public safety is also an area of
strength, particularly in the field commonly referred to as 'safety-critical systems' (e.g. Martins & Gorschek 2016). Hence some of the physical threats embodied in AI-
based systems are able to be avoided, mitigated and managed through the development and application of industry standards; but threats to economic and social interests
are seldom addressed.
A role can also be played by professional associations, because these generally balance public needs against self-interest somewhat better than industry associations. Their
impact is, however, far less pronounced than that of industry associations. Moreover, the initiatives to date of the two largest bodies are underwhelming, with ACM (2017)
using weak forms such as "should" and "are encouraged to", and IEEE (2017) offering lengthy prose but unduly vague and qualified principles. Neither has to date
provided the guidance needed by professionals, managers and executives.
It was noted above that Directors of corporations are required by law to pursue the interests of the corporation ahead of all other interests. It is therefore unsurprising, and
even to be expected, that organisational self-regulation is almost always ineffectual from the viewpoint of the supposed beneficiaries, and often not even effective at
protecting the organisation itself from bad publicity. Recent offerings by major corporations include IBM (Rayome 2017), Google (Pichai 2018) and MS (2018). For an
indication of the scepticism with which such documents are met, see Newcomer (2018).
A range of principles, in most cases fairly vague, has been proposed by a diverse array of organisations. Examples include the European Greens Alliance (GEFA 2016), a
British Standard BS 8611 (BS 2016), the UNI Global Union (UGU 2017), the Japanese government (Hirano 2017), a House of Lords Committee (HOL 2018), as
interpreted by a World Economic Forum document (Smith 2018), and the French Parliament (Villani 2018).
Although there are commonalities among these formulations, there is also a lot of diversity, and few of them offer usable advice on how to ensure that Intellectics is
applied in a responsible manner. The next section draws on the sources identified above, in order to offer practical advice. It places the ideas within a conventional
framework, but extends that framework in order to address the needs of all stakeholders rather than just the corporation itself.
A relevant form that 'West Coast Code' could take is the embedding in robots of something resembling 'laws of robotics'. The notion dates to an Asimov short story,
'Runaround', first published in 1942; but many commentators on robotics cling to it. For example, Devlin (2016) quotes a professor of robotics as perceiving that the
British Standards Institute's guidance on ethical design of robots (BS 2016) represents "the first step towards embedding ethical values into robotics and AI". On the other
hand, a study of Asimov's robot fiction showed that he had comprehensively demonstrated the futility of the idea (Clarke 1993). A recent expression of the reason why the
approach is doomed is that "You cannot construct an algorithm that will reliably decide whether or not any algorithm is ethical" (Castell 2018, p.743).
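Castell's claim is, in effect, an instance of Rice's theorem: no total algorithm can decide a non-trivial semantic property (such as "behaves ethically") of arbitrary programs. The standard reduction can be sketched as follows; `ethics_decider` and `make_probe` are hypothetical names introduced purely for illustration, and the point of the sketch is that no total `ethics_decider` can actually exist.

```python
def make_probe(prog, arg, unethical_action):
    """Build a program that acts unethically iff prog(arg) halts."""
    def probe(_ignored):
        prog(arg)            # runs forever whenever prog(arg) does not halt
        unethical_action()   # reached only if prog(arg) halted
    return probe

# If a total ethics_decider(p) existed, then for any (prog, arg):
#     prog(arg) halts  <=>  ethics_decider(make_probe(prog, arg, harm)) is False
# so the decider would solve the halting problem -- a contradiction.
```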
http://rogerclarke.com/EC/GAIF.html Page 7 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16:39
6. A Practical Approach
Ethical analyses offer little assistance, and regulatory frameworks are lacking. It might seem attractive to business enterprises to face few legal obligations and hence to be
subject to limited compliance risk exposure. On the other hand, the absence of regulation heightens many other business risks. At least some competitors inevitably exhibit
'cowboy' behaviour, and there are always individuals and groups within each organisation who can be tempted by the promise that AI appears to offer. As a result, there are
substantial direct and indirect threats to the organisation's reputation. It is therefore in each organisation's own self-interest for a modicum of regulation to exist, in order to
provide a protective shield against media exposés and public backlash.
This section offers guidance to organisations. It assumes that organisations evaluating AI apply conventional environmental scanning and marketing techniques in order to
identify opportunities, and a conventional business case approach to estimating the strategic, market-share, revenue, cost and profit benefits that the opportunities appear to
offer them. The focus here is on how the downsides can be identified, evaluated and managed.
Familiar, practical approaches to assessing and managing risks are applicable. However, I contend that the conventional framework must be extended to include an
important element that is commonly lacking in business approaches to risk. That missing ingredient is stakeholder analysis. Risk assessment and management needs to be
performed not only from the business perspective, but also from the perspectives of other stakeholders.
There are many sources of guidance in relation to risk assessment and management. The techniques are well-developed in the context of security of IT assets and digital
data, although the language and the approaches vary considerably among the many sources (most usefully: Firesmith 2004, ISO 2005, ISO 2008, NIST 2012, ENISA 2016,
ISM 2017). For the present purpose, a model is adopted that is summarised in Appendix 1 of Clarke (2015). See Figure 1.
Existing corporate practice approaches this model from the perspective of the organisation itself. This gives rise to conventional risk assessment and risk management
processes outlined in Table 2. Relevant assets are identified, and an analysis undertaken of the various forms of harm that could arise to those assets as a result of threats
impinging on, or actively exploiting, vulnerabilities, and giving rise to incidents. Existing safeguards are taken into account, in order to guide the development of a strategy
and plan to refine and extend the safeguards and thereby provide a degree of protection that is judged to suitably balance modest actual costs against much higher
contingent costs.
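The conventional process just described can be sketched as a simple risk register, with each entry relating an asset to a threat, a vulnerability, the resulting harm, and the existing safeguards. All field names and figures below are illustrative assumptions, not drawn from Clarke (2015) or the cited standards.

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    asset: str
    threat: str
    vulnerability: str
    harm: str
    existing_safeguards: list = field(default_factory=list)
    likelihood: float = 0.0   # estimated probability of an incident
    impact: float = 0.0       # contingent cost if the incident occurs

    def exposure(self) -> float:
        """Common heuristic: expected contingent cost of the incident."""
        return self.likelihood * self.impact

register = [
    Risk(asset="customer data", threat="model-inversion attack",
         vulnerability="over-exposed ML API", harm="privacy breach",
         existing_safeguards=["rate limiting"],
         likelihood=0.1, impact=500_000),
    Risk(asset="plant equipment", threat="actuator fault",
         vulnerability="no interlock", harm="physical damage",
         likelihood=0.02, impact=2_000_000),
]

# Direct safeguard investment towards the largest expected exposures
register.sort(key=Risk.exposure, reverse=True)
```

Sorting by expected exposure is one simple way of operationalising the balance between modest actual costs and much higher contingent costs.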
(2) Identify the relevant Stakeholders, Assets, Values and categories of Harm
(3) Select a Design (or adapt / refine the alternatives to achieve an acceptable Design)
(2) Implement
However, the categories of stakeholders are broader than this, comprising not only "participants in the information systems development process" but also "any other
individuals, groups or organizations whose actions can influence or be influenced by the development and use of the system whether directly or indirectly" (Pouloudi &
Whitley 1997, p.3). The term 'usees' is a usefully descriptive term for these once-removed stakeholders (Clarke 1992, Fischer-Hübner & Lindskog 2001, Baumer 2015).
My first proposition for extension beyond conventional corporate risk assessment is that the responsible application of AI is only possible if stakeholder analysis is
undertaken in order to identify the categories of entities that are or may be affected by the particular project (Clarkson 1995). There is a natural tendency to focus on those
entities that have sufficient market or institutional power to significantly affect the success of the project. On the other hand, in a world of social media and rapid and deep
mood-swings, it is advisable to not overlook the nominally less powerful stakeholders. Where large numbers of individuals are involved (typically, employees, consumers
and the general public), it will generally be practical to use representative and advocacy organisations as intermediaries, to speak on behalf of the categories or segments of
individuals.
My second proposition is that the responsible application of AI depends on risk assessment processes being conducted from the perspectives of the various stakeholders, to
complement that undertaken from the perspective of the corporation. Conceivably, such assessments could be conducted by the stakeholders independently, and fed into
the organisation. In practice, the asymmetry of information, resources and power is such that the outputs from independent, and therefore uncoordinated, activities are
unlikely to gain acceptance. The responsibility lies with the sponsor of an initiative to drive the studies, engage effectively with the other parties, and reflect their input in
the project design criteria and features.
The risk assessment process outlined in Table 2 above is generally applicable. However, my third proposition is that risk assessment processes that reflect the interests of
stakeholders need to be broader than those commonly undertaken within organisations. Relevant techniques include privacy impact assessment (Clarke 2009, Wright & De
Hert 2012), social impact assessment (Becker & Vanclay 2003), and technology assessment (OTA 1977). For an example of impact assessment applied to the specific
category of person-carrier robots, see Villaronga & Roig (2017). The most practical approach may be, however, to adapt the organisation's existing process in order to
encompass whichever aspects of such broader techniques are relevant to the stakeholders whose needs are being addressed.
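The three propositions above amount to running the same assessment once per stakeholder perspective rather than only for the corporation. A minimal sketch of that structure follows; the stakeholder names and harm entries are illustrative assumptions, not taken from the paper.

```python
STAKEHOLDERS = ["corporation", "employees", "consumers", "usees"]

def assess(stakeholder, system):
    """Risks visible from a single stakeholder's perspective."""
    return [{"stakeholder": stakeholder, "harm": h["name"]}
            for h in system["potential_harms"]
            if stakeholder in h["affects"]]

system = {"potential_harms": [
    {"name": "reputational damage", "affects": ["corporation"]},
    {"name": "unsafe working conditions", "affects": ["employees"]},
    {"name": "wrongful denial of service", "affects": ["consumers", "usees"]},
]}

# The combined register covers the corporate view and every other
# stakeholder's view, as the second proposition requires
combined = [r for s in STAKEHOLDERS for r in assess(s, system)]
```

Note that the 'usees' category appears as a first-class perspective, so once-removed stakeholders are not silently dropped from the register.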
The results of the two or more risk assessment processes outlined above deliver the information that the organisation needs. They enable the development of a strategy and
plan whereby existing safeguards can be adapted or replaced, and new safeguards conceived and implemented. ISO standard 27005 (2008, pp.20-24) discusses four options
for what it refers to as 'risk treatment': risk modification, risk retention, risk avoidance and risk sharing. A framework is presented in Table 3 that in my experience is more
understandable by practitioners and more readily usable as a basis for identifying possible safeguards.
Proactive Strategies
Avoidance
e.g. non-use of a risk-prone technology or procedure
Deterrence
e.g. signs, threats of dismissal, publicity for prosecutions, substantial fines, gaol-time
Prevention
e.g. surge protectors and backup power sources; quality equipment, media and software; physical and logical access control; staff training, assigned responsibilities
and measures to sustain morale; staff termination procedures
Redundancy
e.g. duplicated equipment and communication paths; multiple, parallel evaluations with cross-checking of results
Reactive Strategies
Detection
e.g. fire and smoke detectors, logging, log-analysis, exception reporting
Reduction / Mitigation
e.g. fire-suppression technologies, fire-warden training, suspension of processing when unexpected harm arises, pre-arranged contingent measures to compensate for
harm
Recovery
e.g. investment in resources, procedures/documentation, staff training, and duplication including 'hot-sites' and 'warm-sites'
Insurance
e.g. mutual arrangements with other organisations, maintenance contracts with suppliers, escrow of third party software, inspection of escrow deposits, policies with
insurance companies
Non-Reactive Strategies
Tolerance / Self-Insurance
where assessment of the contingent costs concludes that they are bearable
Graceful Degradation
e.g. a pre-funded compensation fund, combined with suspension or cancellation of processing when unexpected harm arises
Graceless Degradation
e.g. siting a nuclear energy company's headquarters adjacent to the power plant, on the grounds that, if it goes, then the organisation and its employees should go
with it
___________
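Table 3's three strategy groups can be encoded as a simple mapping and used to check whether a proposed safeguard plan leaves any group unaddressed. The group and strategy names follow the table; the coverage-gap check and the example plan are illustrative assumptions.

```python
TAXONOMY = {
    "proactive": {"avoidance", "deterrence", "prevention", "redundancy"},
    "reactive": {"detection", "reduction/mitigation", "recovery", "insurance"},
    "non-reactive": {"tolerance/self-insurance", "graceful degradation",
                     "graceless degradation"},
}

def uncovered_groups(plan):
    """Strategy groups for which the plan proposes no safeguard.
    `plan` maps a safeguard name to the strategy it implements."""
    used = set(plan.values())
    return {group for group, strategies in TAXONOMY.items()
            if not (strategies & used)}

plan = {"access control": "prevention",
        "log analysis": "detection",
        "backup power": "prevention"}
# This plan names no tolerance or degradation stance:
# uncovered_groups(plan) -> {"non-reactive"}
```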
Existing techniques are strongly oriented towards protection against risks as perceived by the organisation. Risks to other stakeholders are commonly treated as, at best, a
second-order consideration, and at worst as if they were out-of-scope. All risk management work involves the exercise of a considerable amount of imagination. That
characteristic needs to be underlined even more strongly in the case of the comprehensive, multi-stakeholder approach that I am contending is necessary in the case of AI-
based systems.
This section has suggested customisation of existing, generic techniques in order to address the context of AI-based systems. The following section presents more specific
proposals.
7. Principles for Responsible AI
The Principles in part emerge from the analysis presented in this Working Paper, and in part represent a consolidation of ideas from a suite of previously-published sets of
principles. The suite was assembled by surveying academic, professional and policy literatures. Diversity of perspective was actively sought. The sources include
corporations and industry associations (5), governmental organisations (6), academics (4), professional associations (2), joint associations (2), and non-government
organisations (5). Only sets that were available in the English language were used. This resulted in a strong bias within the suite towards documents that originated in
countries whose primary language is, or languages include, English. Of the individual documents, 8 are formulations of 'ethical principles and IT'. Extracts and citations are
provided at Clarke (2018c). The other 16 claim to provide principles or guidance specifically in relation to AI. Extracts and citations are at Clarke (2018d).
In s.2.3 and Table 1 above, distinctions were drawn among the phases of the supply-chain, which in turn produce AI technology, AI-based artefacts, AI-based systems,
deployments of them, and applications of them. In each case, the relevant category of entity was identified that bears responsibility for negative impacts arising from AI. In
only a few of the 24 documents in the suite were such distinctions evident, however, and in most cases it has to be inferred which part of the supply-chain the
document is intended to address. The European Parliament (CLA-EP 2016) refers to "design, implementation, dissemination and use", IEEE (2017) to "Manufacturers /
operators / owners", GEFA (2016) to "manufacturers, programmers or operators", FLI (2017) to researchers, designers, developers and builders, and ACM (2017) to
"Owners, designers, builders, users, and other stakeholders". Remarkably, however, in all of these cases the distinctions were only made within a single Principle rather
than being applied to the set as a whole.
Some commonalities exist across the source documents. Overall, however, most of the source documents were remarkably sparse, and there was far less consensus than
might have been expected 60 years after AI was first heralded. For example, only 1 document encompassed cyborgisation (GEFA 2016); only 2 documents referred to the
precautionary principle (CLA-EP 2016, GEFA 2016), and only 5 stipulated the conduct of impact assessments. One striking statistic is that only 3 of the c. 50 Principles
were detectable in at least half of the documents in the set:
Each source naturally reflects the express, implicit and subliminal purposes of the drafters and the organisations on whose behalf they were composed. In some cases, for
example, the set primarily addresses just one form of AI, such as robotics or machine-learning. Documents prepared by corporations, industry associations, and even
professional associations and joint associations tended to adopt the perspective of producer roles, with the interests of other stakeholders often relegated to a secondary
consideration. For example, the joint-association Future Life Institute perceives the need for "constructive and healthy exchange between AI researchers and policy-
makers", but not for any participation by stakeholders (FLI 2017 at 3). As a result, transparency is constrained to a small sub-set of circumstances (at 6), 'responsibility' of
'designers and builders' is limited to those roles being mere 'stakeholders in moral implications' (at 9), alignment with human values is seen as being necessary only in
respect of "highly autonomous AI systems" (at 10), and "strict safety and control measuers" are limited to a small sub-set of AI systems (at 22). ITIC (2017) considers that
many responsibilities lie elsewhere, and assigns responsibilities to its members only in respect of safety, controllability and data quality. ACM (2017) is expressed in weak
language (should be aware of, should encourage, are encouraged) and regards decision opaqueness as being acceptable, while IEEE (2017) suggests a range of important
tasks for other parties (standards-setters, regulators, legislatures, courts), and phrases other suggestions in the passive voice, with the result that few obligations are clearly
identified as falling on engineering professionals and the organisations that employ them. The House of Lords report might have been expected to adopt a societal or multi-
stakeholder approach, yet, as favourably reported in Smith (2018), it appears to have adopted the perspective of the AI industry.
The process of developing the set commenced with themes that derived from the analysis reported on in the earlier sections of this Working Paper. The previously-
published sets of principles were then inspected. Detailed propositions within each set were extracted, and allocated to themes, maintaining back-references to the sources.
Where items threw doubt on the structure or formulation of the general themes, the schema was adapted in order to sustain coherence and limit the extent to which
duplications arise.
The Principles have been expressed in imperative mode, i.e. in the form of instructions, in order to convey that they require action, rather than being merely desirable
characteristics, or factors to be considered, or issues to be debated. The full set of Principles, comprising about 50 elements, is in Appendix 1. In order to make them more
digestible, Table 4 presents the 10 over-arching themes.
Some of the items that appear in source documents appear incapable of being operationalised. For example, 'human dignity', 'fairness' and 'justice' are vague abstractions
that need to be unpacked into more specific concepts. In addition, some items fall outside the scope of the present work. The items that have been excluded from the set in
Table 4 are listed in Appendix 2.
Each of the Principles requires somewhat different application in each phase of the AI supply-chain. An important example of this is the manner in which Principle 7 -
Deliver Transparency and Auditability - is intended to be interpreted. In the Research and Invention phases of the technological life-cycle, compliance with Principle 7
requires understanding by inventors and innovators of the AI technology, and explicability to developers and users of AI-based artefacts and systems. During the
Innovation and Dissemination phases, the need is for understandability and manageability by developers and users of AI-based systems and applications, and explicability
to affected stakeholders. In the Application phase, the emphasis shifts to understandability by affected stakeholders of inferences, decisions and actions arising from at least
the AI elements within AI-based systems and applications.
The status of the proposed principles is important to appreciate. They are not expressions of law - although in some jurisdictions, and in some circumstances, some may be
legal requirements. They are expressions of moral obligations; but no authority exists that can impose such obligations. In addition, all are contestable, and in different
circumstances any of them may be in conflict with other legal or moral obligations, and with various interests of various stakeholders. They represent guidance to
organisations involved in AI as to the expectations of courts, regulatory agencies, oversight agencies, competitors and stakeholders. They are intended to be taken into
account as organisations undertake risk assessment and risk management, as outlined in s.6 above.
AI offers prospects of considerable benefits and disbenefits. All entities involved in applying AI bear legal and moral responsibility to demonstrate the benefits, to be
proactive in relation to disbenefits, and to involve stakeholders in the process.
2. Complement Humans
Considerable public disquiet already exists in relation to displacement of human workers by AI, and the replacement of human decision-making with inhumane machine
decision-making.
Considerable public disquiet already exists in relation to the prospect of humans ceding power to machines.
All entities involved in applying AI bear legal and moral responsibility to provide safeguards for all human stakeholders who are at risk, whether as users of AI-based
artefacts and systems or usees who are affected by them.
AI is capable of having substantial negative impacts on a wide range of civil and political rights.
All entities involved in applying AI have legal and moral responsibilities in relation to the quality of business processes and products.
All entities involved in applying AI have legal and moral obligations in relation to due process and procedural fairness. These obligations can only be fulfilled if the entity
ensures that humanly-understandable explanations are available for all AI-based inferences, decisions and actions.
AI-based systems and associated data must be subject to safeguards commensurate with the significance of their benefits, sensitivity and potential to cause harm to
stakeholders.
All entities involved in applying AI have legal and moral obligations in relation to due process and procedural fairness. These obligations can only be fulfilled if the entity
is discoverable and addresses problems as they arise.
All entities involved in applying AI have legal and moral obligations in relation to due process and procedural fairness. These obligations can only be fulfilled if the entity
implements internal problem-handling processes, and respects and complies with external problem-handling processes.
___________
The Principles in Table 4 are intentionally framed and phrased in an abstract manner, in an endeavour to achieve applicability to at least the currently mainstream forms of
AI discussed earlier - robotics, particularly remote-controlled and self-driving vehicles; cyborgs who incorporate computational capabilities; and AI/ML/neural-networking
applications. More broadly, the intention is that they be applicable to what I proposed above as the appropriate conceptualisation of the field - Intellectics.
These Principles are capable of being further articulated into much more specific guidance in respect of each particular category of AI. For example, in a companion
project, I have proposed 'Guidelines for Responsible Data Analytics' (Clarke 2018b). These provide more detailed guidance for the conduct of all forms of data analytics
projects, including those that apply AI/ML/neural-networking approaches. Areas addressed by the Data Analytics guidelines include governance, expertise and compliance
considerations, multiple aspects of data acquisition and data quality, the suitability of both the data and the analytical techniques applied to it, and factors involved in the
use of inferences drawn from the analysis.
8. Conclusions
This paper has proposed that the unserviceable notion of AI should be replaced by the notion of 'complementary intelligence', and that the notion of robotics ('machines
that think') is now much less useful than that of 'intellectics' ('computers that do').
The techniques and technologies that emerge from research laboratories offer potential but harbour considerable threats to organisations, and to those organisations'
stakeholders. Sources of guidance have been sought, whereby organisations in both the private and public sectors can evaluate the appropriateness of various such
technologies to their own operations. Neither ethical analysis nor regulatory schemes deliver what organisations need. The paper concludes that adapted forms of risk
assessment and risk management processes can fill the void, and that principles specific to AI can be formulated.
The propositions in this paper need to be workshopped with colleagues in the academic and consultancy worlds. The abstract Principles need to be articulated into more
specific expressions that are directly relevant to particular categories of technology, artefacts, systems and applications. The resulting guidance then needs to be exposed to
relevant professional executives and managers, reviewed by internal auditors, government relations executives and corporate counsel, and pilot-tested in realistic settings.
Appendix 1
The following Principles are intended to be applied by the entities responsible for all phases of AI research, invention, innovation, dissemination and application. The
cross-references are to the 'Ethical Principles and IT' sources (Clarke 2018c - E) and 'Principles for AI' sources (Clarke 2018d - P).
1.1 Conceive and design only after ensuring adequate understanding of purposes and contexts
(E4.3, P5.3, P6.21, P7.1, P15.7)
1.5 Publish sufficient information to stakeholders to enable them to conduct impact assessment
(E7.3, P3.7, P4.1, P8.3, P8.4, P8.7)
1.8 Consider alternative, less harmful ways of achieving the same objectives
(E3.22)
2. Complement Humans
2.2 Avoid design for replacement of people by independent devices, except in circumstances in which artefacts are demonstrably more capable than people, and even then
ensuring that the result is complementary to human capabilities
(P5.1)
3.2 In particular, ensure control over autonomous behaviour of AI-based artefacts and systems
(E8.1, P8.4, P10.2, P11.4)
3.4 Ensure human review of inferences and decisions prior to acting on them
(E3.11)
3.5 Respect people's expectations in relation to personal data protections (E5.6), including:
* awareness of data-usage (E3.6)
* consent (E3.7, E3.28, E5.3, P3.11, P4.6)
* data minimisation (E3.9)
* public visibility and consultation (E3.10, E7.2), and
* relationship of data-usage to the data's original purpose (E3.27)
3.7 Avoid services being conditional on the acceptance of AI-based artefacts and systems
(P4.5)
4.2 Ensure people's psychological safety (E3.1, E6.9, E6.13), by avoiding negative effects on any individual's mental health, inclusion in society, worth, standing in
comparison with other people, or emotional state (E5.4, E6.3)
4.6 Avoid the manipulation of vulnerable people (E4.4, P4.5, P4.9), including taking advantage of individuals' tendency to addiction, e.g. to gambling (E6.3)
5.3 Avoid unfair discrimination and bias, not only where it is legally proscribed but also where it is publicly unacceptable
(ICCPR Arts. 2.1, 3, 26 and 27, E3.16, P3.4, P4.5, P11.5, P15.2, P16.1)
5.6 Avoid interference with the rights of freedom of information, opinion and expression
(ICCPR 19, P4.6)
5.9 Avoid interference with the rights to participation in public affairs and access to public service
(ICCPR 25, P6.13)
6.3 Ensure security safeguards against inappropriate modification to and deletion of sensitive data
(E3.15)
6.9 Impose controls in order to ensure that safeguards are operative and effective
(E7.7, P10.2)
7.1 Ensure that the fact that the process is AI-based is transparent to all stakeholders
(E4.4, P4.8)
7.2 Ensure that the means whereby inferences are drawn, decisions made and actions are taken are logged and can be reconstructed
(E6.6, P2.4, P4.8, P5.2, P6.7, P7.4, P7.6, P8.2, P9.2, P11.1, P11.2, P13.2, P16.2, P16.3)
7.3 Ensure people are aware of inferences and how they were reached
(E3.12, P2.4)
8.1 Provide and sustain appropriate security safeguards against compromise of intended functions arising from both passive threats and active attacks
(E4.3, E6.11, P1.4, P1.5, P4.9, P6.6, P8.4, P9.5)
8.2 Provide and sustain appropriate security safeguards against inappropriate access to sensitive data arising from both passive threats and active attacks
(E3.15, E6.5, E6.10, P3.11, P9.6)
8.3 Conduct audits of justification, proportionality, transparency, mitigation measures and controls
(E7.8, E8.4)
8.4 Ensure resilience, in the sense of prompt and effective recovery from incidents
9.1 Ensure that the responsible entity is apparent or can be readily discovered by any party
(E4.5, E6.4, P2.3, P3.8, P4.7, P8.5, P12.3)
9.2 Ensure that effective remedies exist, in the form of complaints processes, appeals processes, and redress where harmful errors have occurred
(ICCPR 2.3, E3.13, E3.14, E7.7, P3.11, P4.7, P7.2, P8.7, P9.9, P10.5, P11.9, P16.3)
10.1 Ensure that complaints, appeals and redress processes operate effectively
(ICCPR 2.3, E7.7)
10.2 Comply with external complaints, appeals and redress processes and outcomes (ICCPR 14), including, in particular, provision of timely, accurate and complete
information relevant to cases
References
ACM (2017) 'Statement on Algorithmic Transparency and Accountability' Association for Computing Machinery, January 2017, at
https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf
Akiko (2012) 'South Korean Robot Ethics Charter 2012' Akiko's Blog, 2012, at https://akikok012um1.wordpress.com/south-korean-robot-ethics-charter-2012/
Albus J. S. (1991) 'Outline for a theory of intelligence' IEEE Trans. Systems, Man and Cybernetics 21, 3 (1991) 473-509, at http://citeseerx.ist.psu.edu/viewdoc/download?
doi=10.1.1.410.9719&rep=rep1&type=pdf
Anderson C. (2008) 'The End of Theory: The Data Deluge Makes the Scientific Method Obsolete' Wired Magazine 16:07, 23 June 2008, at
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory
APF (2013) 'Meta-Principles for Privacy Protection' Australian Privacy Foundation, March 2013, at https://privacy.org.au/policies/meta-principles/
Baumer E.P.S. (2015) 'Usees' Proc. 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI'15), April 2015
Becker H. & Vanclay F. (2003) 'The International Handbook of Social Impact Assessment' Cheltenham: Edward Elgar, 2003
Bennett Moses L. (2011) 'Agents of Change: How the Law Copes with Technological Change' Griffith Law Review 20, 4 (2011) 764-794, at
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2000428
Bibel W. (1980) ''Intellektik' statt 'KI' -- Ein ernstgemeinter Vorschlag' Rundbrief der Fachgruppe Künstliche Intelligenz in der Gesellschaft für Informatik, 22, 15-16
December 1980
Bibel W. (1989) 'The Technological Change of Reality: Opportunities and Dangers' AI & Society 3, 2 (April 1989) 117-132
Braithwaite J. & Drahos P. (2000) 'Global Business Regulation' Cambridge University Press, 2000
BS (2016) 'Robots and robotic devices - Guide to the ethical design and application of robots and robotic systems' BS 8611, British Standards Institute, April 2016
Burrell J. (2016) 'How the machine 'thinks': Understanding opacity in machine learning algorithms' Big Data & Society 3, 1 (January-June 2016) 1-12
Calo R. (2017) 'Artificial Intelligence Policy: A Primer and Roadmap' UC Davis L. Rev. 51 (2017) 399-404
Castell S. (2018) 'The future decisions of RoboJudge HHJ Arthur Ian Blockchain: Dread, delight or derision?' Computer Law & Security Review 34, 4 (Jul-Aug 2018)
739-753
Chen Y. & Cheung A.S.Y. (2017) 'The Transparent Self Under Big Data Profiling: Privacy and Chinese Legislation on the Social Credit System' The Journal of
Comparative Law 12, 2 (June 2017) 356-378, at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2992537
CLA-EP (2016) 'Recommendations on Civil Law Rules on Robotics' Committee on Legal Affairs of the European Parliament, 31 May 2016, at
http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//NONSGML%2BCOMPARL%2BPE-582.443%2B01%2BDOC%2BPDF%2BV0//EN
Clarke R. (1989) 'Knowledge-Based Expert Systems: Risk Factors and Potentially Profitable Application Area', Xamax Consultancy Pty Ltd, January 1989, at
http://www.rogerclarke.com/SOS/KBTE.html
Clarke R. (1991) 'A Contingency Approach to the Application Software Generations' Database 22, 3 (Summer 1991) 23-34, PrePrint at
http://www.rogerclarke.com/SOS/SwareGenns.html
Clarke R. (1992) 'Extra-Organisational Systems: A Challenge to the Software Engineering Paradigm' Proc. IFIP World Congress, Madrid, September 1992, at
http://www.rogerclarke.com/SOS/PaperExtraOrgSys.html
Clarke R. (1993) 'Asimov's Laws of Robotics: Implications for Information Technology' in two parts, in IEEE Computer 26, 12 (December 1993) 53-61, and 27, 1 (January
1994) 57-66, at http://www.rogerclarke.com/SOS/Asimov.html
Clarke R. (2005) 'Human-Artefact Hybridisation: Forms and Consequences' Proc. Ars Electronica 2005 Symposium on Hybrid - Living in Paradox, Linz, Austria, 2-3
September 2005, PrePrint at http://www.rogerclarke.com/SOS/HAH0505.html
Clarke R. (2009) 'Privacy Impact Assessment: Its Origins and Development' Computer Law & Security Review 25, 2 (April 2009) 123-135, PrePrint at
http://www.rogerclarke.com/DV/PIAHist-08.html
Clarke R. (2011) 'Cyborg Rights' IEEE Technology and Society 30, 3 (Fall 2011) 49-57, at http://www.rogerclarke.com/SOS/CyRts-1102.html
Clarke R. (2014a) 'Understanding the Drone Epidemic' Computer Law & Security Review 30, 3 (June 2014) 230-246, PrePrint at
http://www.rogerclarke.com/SOS/Drones-E.html
Clarke R. (2014b) 'What Drones Inherit from Their Ancestors' Computer Law & Security Review 30, 3 (June 2014) 247-262, PrePrint at
http://www.rogerclarke.com/SOS/Drones-I.html
Clarke R. (2014c) 'The Regulation of the Impact of Civilian Drones on Behavioural Privacy' Computer Law & Security Review 30, 3 (June 2014) 286-305, PrePrint at
http://www.rogerclarke.com/SOS/Drones-BP.html
Clarke R. (2015) 'The Prospects of Easier Security for SMEs and Consumers' Computer Law & Security Review 31, 4 (August 2015) 538-552, PrePrint at
http://www.rogerclarke.com/EC/SSACS.html
Clarke R. (2016a) 'Big Data, Big Risks' Information Systems Journal 26, 1 (January 2016) 77-90, PrePrint at http://www.rogerclarke.com/EC/BDBR.html
Clarke R. (2016b) 'Appropriate Regulatory Responses to the Drone Epidemic' Computer Law & Security Review 32, 1 (Jan-Feb 2016) 152-155, PrePrint at
http://www.rogerclarke.com/SOS/Drones-PAR.html
Clarke R. (2016c) 'Quality Assurance for Security Applications of Big Data' Proc. EISIC'16, Uppsala, 17-19 August 2016, PrePrint at
http://www.rogerclarke.com/EC/BDQAS.html
Clarke R. (2018a) 'Centrelink's Big Data 'Robo-Debt' Fiasco of 2016-17' Xamax Consultancy Pty Ltd, January 2018, at http://www.rogerclarke.com/DV/CRD17.html
Clarke R. (2018b) 'Guidelines for the Responsible Application of Data Analytics' Computer Law & Security Review 34, 3 (May-Jun 2018) 467- 476, PrePrint at
http://www.rogerclarke.com/EC/GDA.html
Clarke R. (2018c) 'Ethical Principles and Information Technology' Xamax Consultancy Pty Ltd, rev. September 2018, at http://www.rogerclarke.com/EC/GAIE.html
Clarke R. (2018d) 'Principles for AI: A 2017-18 SourceBook' Xamax Consultancy Pty Ltd, rev. September 2018, at http://www.rogerclarke.com/EC/GAI.html
Clarke R. & Bennett Moses L. (2014) 'The Regulation of Civilian Drones' Impacts on Public Safety' Computer Law & Security Review 30, 3 (June 2014) 263-285,
PrePrint at http://www.rogerclarke.com/SOS/Drones-PS.html
Clarkson M.B.E. (1995) 'A Stakeholder Framework for Analyzing and Evaluating Corporate Social Performance' The Academy of Management Review 20, 1 (Jan.1995)
92-117, at
https://www.researchgate.net/profile/Mei_Peng_Low/post/Whats_corporate_social_performance_related_to_CSR/attachment/59d6567879197b80779ad3f2/AS%3A530408064417
Devlin H. (2016). 'Do no harm, don't discriminate: official guidance issued on robot ethics' The Guardian, 18 Sep 2016, at
https://www.theguardian.com/technology/2016/sep/18/official-guidance-robot-ethics-british-standards-institute
DMV-CA (2018) 'Autonomous Vehicles in California' Californian Department of Motor Vehicles, February 2018, at
https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/bkgd
EC (2018) 'Statement on Artificial Intelligence, Robotics and 'Autonomous' Systems' European Group on Ethics in Science and New Technologies, European Commission,
March 2018, at http://ec.europa.eu/research/ege/pdf/ege_ai_statement_2018.pdf
EDPS (2016) 'Artificial Intelligence, Robotics, Privacy and Data Protection' European Data Protection Supervisor, October 2016, at
https://edps.europa.eu/sites/edp/files/publication/16-10-19_marrakesh_ai_paper_en.pdf
ENISA (2016) 'Risk Management: Implementation principles and Inventories for Risk Management/Risk Assessment methods and tools' European Union Agency for Network and Information Security
Firesmith D. (2004) 'Specifying Reusable Security Requirements' Journal of Object Technology 3, 1 (Jan-Feb 2004) 61-75, at
http://www.jot.fm/issues/issue_2004_01/column6
Fischer-Hübner S. & Lindskog H. (2001) 'Teaching Privacy-Enhancing Technologies' Proc. IFIP WG 11.8 2nd World Conference on Information Security Education,
Perth, 2001, at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3950&rep=rep1&type=pdf
FLI (2017) 'Asilomar AI Principles' Future of Life Institute, January 2017, at https://futureoflife.org/ai-principles/?cn-reloaded=1
Floridi L. (2018) 'Soft Ethics: Its Application to the General Data Protection Regulation and Its Dual Advantage' Philosophy & Technology 31, 2 (June 2018) 163-167, at
https://link.springer.com/article/10.1007/s13347-018-0315-5
Freeman R.E. & Reed D.L. (1983) 'Stockholders and Stakeholders: A New Perspective on Corporate Governance' California Management Review 25, 3 (1983) 88-106, at
https://www.researchgate.net/profile/R_Freeman/publication/238325277_Stockholders_and_Stakeholders_A_New_Perspective_on_Corporate_Governance/links/5893a4b2a6fdcc4
and-Stakeholders-A-New-Perspective-on-Corporate-Governance.pdf
GDPR (2018) 'General Data Protection Regulation' Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural
Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, at http://www.privacy-regulation.eu/en/index.htm
GEFA (2016) 'Position on Robotics and AI' The Greens / European Free Alliance Digital Working Group, November 2016, at https://juliareda.eu/wp-
content/uploads/2017/02/Green-Digital-Working-Group-Position-on-Robotics-and-Artificial-Intelligence-2016-11-22.pdf
HOL (2018) 'AI in the UK: ready, willing and able?' Select Committee on Artificial Intelligence, House of Lords, April 2018, at
https://publications.parliament.uk/pa/ld201719/ldselect/ldai/100/100.pdf
Holder C., Khurana V., Harrison F. & Jacobs L. (2016) 'Robotics and law: Key legal and regulatory implications of the robotics age (Part I of II)' Computer Law &
Security Review 32, 3 (May-Jun 2016) 383-402
HTR (2017) 'Robots: no regulatory race against the machine yet' The Regulatory Institute, April 2017, at http://www.howtoregulate.org/robots-regulators-active/#more-230
HTR (2018a) 'Report on Artificial Intelligence: Part I - the existing regulatory landscape' The Regulatory Institute, May 2018, at
http://www.howtoregulate.org/artificial_intelligence/
HTR (2018b) 'Report on Artificial Intelligence: Part II - outline of future regulation of AI' The Regulatory Institute, June 2018, at
http://www.howtoregulate.org/aipart2/#more-327
HTR (2018c) 'Research and Technology Risks: Part IV - A Prototype Regulation' The Regulatory Institute, March 2018, at http://www.howtoregulate.org/prototype-
regulation-research-technology/#more-298
ICO (2017) 'Big data, artificial intelligence, machine learning and data protection' UK Information Commissioner's Office, Discussion Paper v.2.2, September 2017, at
https://ico.org.uk/for-organisations/guide-to-data-protection/big-data/
IEEE (2017) 'Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems (A/IS)' IEEE, Version 2, December 2017, at
http://standards.ieee.org/develop/indconn/ec/autonomous_systems.html
ISM (2017) 'Information Security Manual' Australian Signals Directorate, November 2017, at https://acsc.gov.au/infosec/ism/index.htm
ISO (2005) 'Information Technology - Code of practice for information security management' International Organization for Standardization, ISO/IEC 27002:2005
ISO (2008) 'Information Technology - Security Techniques - Information Security Risk Management' ISO/IEC 27005:2008
ITIC (2017) 'AI Policy Principles' Information Technology Industry Council, undated but apparently of October 2017, at https://www.itic.org/resources/AI-Policy-
Principles-FullReport2.pdf
Knight W. (2017) 'The Dark Secret at the Heart of AI' 11 April 2017, MIT Technology Review https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-
of-ai/
Leenes R. & Lucivero F. (2014) 'Laws on Robots, Laws by Robots, Laws in Robots: Regulating Robot Behaviour by Design' Law, Innovation and Technology 6, 2 (2014)
193-220
Lessig L. (1999) 'Code and Other Laws of Cyberspace' Basic Books, 1999
McCarthy J. (2007) 'What is artificial intelligence?' Department of Computer Science, Stanford University, November 2007, at http://www-
formal.stanford.edu/jmc/whatisai/node1.html
McCarthy J., Minsky M.L., Rochester N. & Shannon C.E. (1955) 'A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence' Reprinted in AI
Magazine 27, 4 (2006), at https://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1904/1802
Manwaring K. & Clarke R. (2015) 'Surfing the third wave of computing: a framework for research into eObjects' Computer Law & Security Review 31,5 (October 2015)
586-603, PrePrint at http://www.rogerclarke.com/II/SSRN-id2613198.pdf
Martins L.E.G. & Gorschek T. (2016) 'Requirements engineering for safety-critical systems: A systematic literature review' Information and Software Technology Journal
75 (2016) 71-89
Maschmedt A. & Searle R. (2018) 'Driverless vehicle trial legislation - state-by-state' King & Wood Malleson, February 2018, at
https://www.kwm.com/en/au/knowledge/insights/driverless-vehicle-trial-legislation-nsw-vic-sa-20180227
Mayer-Schonberger V. & Cukier K. (2013) 'Big Data: A Revolution That Will Transform How We Live, Work and Think' John Murray, 2013
Newcomer E. (2018). 'What Google's AI Principles Left Out: We're in a golden age for hollow corporate statements sold as high-minded ethical treatises' Bloomberg, 8
June 2018, at https://www.bloomberg.com/news/articles/2018-06-08/what-google-s-ai-principles-left-out
NIST (2012) 'Guide for Conducting Risk Assessments' National Institute of Standards and Technology, Special Publication SP 800-30 Rev. 1, September 2012, at
http://csrc.nist.gov/publications/nistpubs/800-30-rev1/sp800_30_r1.pdf
OTA (1977) 'Technology Assessment in Business and Government' Office of Technology Assessment, NTIS order #PB-273164', January 1977, at
http://www.princeton.edu/~ota/disk3/1977/7711_n.html
Pagallo U. (2016). 'Even Angels Need the Rules: AI, Roboethics, and the Law' Proc. ECAI 2016
Palmerini E. et al. (2014). 'Guidelines on Regulating Robotics Delivery' EU Robolaw Project, September 2014, at
http://www.robolaw.eu/RoboLaw_files/documents/robolaw_d6.2_guidelinesregulatingrobotics_20140922.pdf
Pichai S. (2018) 'AI at Google: our principles' Google Blog, 7 Jun 2018, at https://www.blog.google/technology/ai/ai-principles/
PoAI (2018) 'Our Work (Thematic Pillars)' Partnership on AI, April 2018, at https://www.partnershiponai.org/about/#pillar-1
Pouloudi A. & Whitley E.A. (1997) 'Stakeholder Identification in Inter-Organizational Systems: Gaining Insights for Drug Use Management Systems' European Journal of
Information Systems 6, 1 (1997) 1-14, at
http://eprints.lse.ac.uk/27187/1/__lse.ac.uk_storage_LIBRARY_Secondary_libfile_shared_repository_Content_Whitley_Stakeholder%20identification_Whitley_Stakeholder%20i
Rayome A.D. (2017) 'Guiding principles for ethical AI, from IBM CEO Ginni Rometty' TechRepublic, 17 January 2017, at
https://www.techrepublic.com/article/3-guiding-principles-for-ethical-ai-from-ibm-ceo-ginni-rometty/
Russell S.J. & Norvig P. (2009) 'Artificial Intelligence: A Modern Approach' Prentice Hall, 3rd edition, 2009
Schellekens M. (2015) 'Self-driving cars and the chilling effect of liability law' Computer Law & Security Review 31, 4 (Jul-Aug 2015) 506-517
Scherer M.U. (2016) 'Regulating Artificial Intelligence Systems: Risks, Challenges, Competencies, and Strategies' Harvard Journal of Law & Technology 29, 2 (Spring
2016) 353-400, at http://euro.ecom.cmu.edu/program/law/08-732/AI/Scherer.pdf
Selbst A.D. & Powles J. (2017) 'Meaningful information and the right to explanation' International Data Privacy Law 7, 4 (November 2017) 233-242, at
https://academic.oup.com/idpl/article/7/4/233/4762325
Smith R. (2018). '5 core principles to keep AI ethical'. World Economic Forum, 19 Apr 2018, at https://www.weforum.org/agenda/2018/04/keep-calm-and-make-ai-ethical/
TvH (2006) 'Telstra Corporation Limited v Hornsby Shire Council' NSWLEC 133 (24 March 2006), esp. paras. 113-183, at
http://www.austlii.edu.au/au/cases/nsw/NSWLEC/2006/133.htm
UGU (2017) 'Top 10 Principles for Ethical AI' UNI Global Union, December 2017, at http://www.thefutureworldofwork.org/media/35420/uni_ethical_ai.pdf
Vellinga N.E. (2017) 'From the testing to the deployment of self-driving cars: Legal challenges to policymakers on the road ahead' Computer Law & Security Review 33, 6
(Nov-Dec 2017) 847-863
Villani C. (2017) 'For a Meaningful Artificial Intelligence: Towards a French and European Strategy' Part 5 - What are the Ethics of AI?, Mission for the French Prime
Minister, March 2018, pp. 113-130, at https://www.aiforhumanity.fr/pdfs/MissionVillani_Report_ENG-VF.pdf
Villaronga E.F. & Roig A. (2017) 'European regulatory framework for person carrier robots' Computer Law & Security Review 33, 4 (Jul-Aug 2017) 502-520
Wachter S. & Mittelstadt B. (2019) 'A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI' Forthcoming, Colum. Bus. L. Rev.
(2019), at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3248829
Wachter S., Mittelstadt B. & Floridi L. (2017) 'Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation'
International Data Privacy Law 7, 2 (May 2017) 76-99, at https://academic.oup.com/idpl/article/7/2/76/3860948
WH (2018) 'Summary of the 2018 White House Summit on Artificial Intelligence for American Industry' Office of Science and Technology Policy, White House, May
2018, at https://www.whitehouse.gov/wp-content/uploads/2018/05/Summary-Report-of-White-House-AI-Summit.pdf
Wright D. & De Hert P. (eds) (2012) 'Privacy Impact Assessments' Springer, 2012
Wyndham J. (1932) 'The Lost Machine' (originally published in 1932), reprinted in A. Wells (Ed.) 'The Best of John Wyndham' Sphere Books, London, 1973, pp. 13-36,
and in Asimov I., Warrick P.S. & Greenberg M.H. (Eds.) 'Machines That Think' Holt, Rinehart and Winston, 1983, pp. 29-49
Zhaohui W. et al. (2016) 'Cyborg Intelligence: Recent Progress and Future Directions' IEEE Intelligent Systems 31, 6 (Nov-Dec 2016) 44-50
Acknowledgements
This paper has benefited from feedback from multiple colleagues, and particularly Peter Leonard of Data Synergies and Prof. Graham Greenleaf and Kayleen Manwaring
of UNSW. I first applied the term 'intellectics' during a presentation to launch a Special Issue of the UNSW Law Journal in Sydney in November 2017.
Author Affiliations
Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in Cyberspace Law & Policy at the University of N.S.W., and a Visiting
Professor in the Research School of Computer Science at the Australian National University. He has also spent many years on the Board of the Australian Privacy
Foundation, and is Company Secretary of the Internet Society of Australia.
Access
The content and infrastructure for these community service pages are provided by Roger Clarke through his consultancy company, Xamax.
Xamax Consultancy Pty Ltd
ACN: 002 360 456
78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 6916
From the site's beginnings in August 1994 until February 2009, the infrastructure was provided by the Australian National University. During that time, the site accumulated close to 30 million hits. It passed 50 million in early 2015.
Created: 11 July 2018 - Last Amended: 3 October 2018 by Roger Clarke - Site Last Verified: 15 February 2009
This document is at www.rogerclarke.com/EC/GAIF.html
Responsible AI Technologies, Artefacts, Systems and Applications
50 Principles
© Xamax Consultancy Pty Ltd, 2018
This document reproduces Appendix 1 of Clarke (2018a)
The following Principles are intended to be applied by the entities responsible for all phases of AI
research, invention, innovation, dissemination and application. The cross-references are to the
sources on 'Ethical Analysis and IT' (Clarke 2018b – E) and on 'Principles for AI' (Clarke
2018c – P).
2. Complement Humans
2.1 Design as an aid, for augmentation, collaboration and inter-operability
(P4.5, P5.1, P9.1, P9.8, P14.2, P14.4)
2.2 Avoid design for replacement of people by independent devices, except in circumstances in
which artefacts are demonstrably more capable than people, and even then ensuring that the
result is complementary to human capabilities (P5.1)
4. Ensure Human Safety and Wellbeing
4.1 Ensure people's physical health and safety ('nonmaleficence')
(E2.2, E3.1, E4.1, E4.3, E5.4, E6.2, E6.9, E6.13, E6.14, E6.18, P1.2, P1.3, P2.1, P3.2, P3.6, P3.9,
P3.12, P4.3, P4.9, P6.6, P8.4, P9.4, P10.2, P11.4, P13.5, P14.1, P15.3)
4.2 Ensure people's psychological safety (E3.1, E6.9, E6.13), by avoiding negative effects on any
individual's mental health, inclusion in society, worth, standing in comparison with other
people, or emotional state (E5.4, E6.3)
4.3 Ensure people's wellbeing ('beneficence')
(E2.3, E3.20, E5.5, P3.1, P3.4, P6.1, P6.14, P6.15, P8.6, P11.6, P12.2, P13.1, P15.1, P16.4)
4.4 Mitigate negative consequences (E3.24, E7.6, E6.21, E10.4)
4.5 Avoid violation of trust (E3.3)
4.6 Avoid the manipulation of vulnerable people (E4.4, P4.5, P4.9), including taking advantage of
individuals' tendency to addiction, e.g. to gambling (E6.3)
7. Deliver Transparency and Auditability
7.1 Ensure that the fact that the process is AI-based is transparent to all stakeholders
(E4.4, P4.8)
7.2 Ensure that the means whereby inferences are drawn, decisions made and actions are taken
are logged and can be reconstructed
(E6.6, P2.4, P4.8, P5.2, P6.7, P7.4, P7.6, P8.2, P9.2, P11.1, P11.2, P13.2, P16.2, P16.3)
7.3 Ensure people are aware of inferences and how they were reached (E3.12, P2.4)
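Principle 7.2's requirement that the means whereby inferences are drawn be logged and reconstructable can be made concrete with a minimal sketch. Everything below (the `DecisionAuditLog` class, the record fields, the model name) is hypothetical and not drawn from Clarke's paper; it only illustrates one way such an audit trail might be structured so that a decision can later be replayed for an audit.

```python
import datetime
from typing import Any, Dict, List

class DecisionAuditLog:
    """Hypothetical audit trail for AI-based inferences (cf. Principles 7.1-7.3)."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []

    def record(self, model_id: str, inputs: Dict[str, Any],
               inference: Any, rationale: str) -> Dict[str, Any]:
        """Append one audit record capturing how an inference was drawn."""
        entry = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "process_type": "AI-based",   # the AI-based nature is explicit (7.1)
            "model_id": model_id,         # which artefact drew the inference
            "inputs": inputs,             # the data the inference was drawn from
            "inference": inference,       # the inference or decision itself
            "rationale": rationale,       # human-readable explanation (7.3)
        }
        self._records.append(entry)
        return entry

    def reconstruct(self, model_id: str) -> List[Dict[str, Any]]:
        """Return all records for a given model, e.g. for a later audit (7.2)."""
        return [r for r in self._records if r["model_id"] == model_id]

# Example: log a made-up credit-scoring inference, then reconstruct it.
log = DecisionAuditLog()
log.record(
    model_id="risk-model-v2",
    inputs={"income": 52000, "defaults": 0},
    inference="approve",
    rationale="score 0.91 above approval threshold 0.75",
)
trail = log.reconstruct("risk-model-v2")
```

In practice such records would be written to append-only storage rather than an in-memory list, but the essential point is the same: each decision carries enough context (inputs, model identity, rationale, timestamp) to be reconstructed after the fact.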
References
Clarke R. (2018a) 'Guidelines for the Responsible Business Use of AI – Foundational Working
Paper' Xamax Consultancy Pty Ltd, October 2018, at http://www.rogerclarke.com/EC/GAIF.html
Clarke R. (2018b) 'Ethical Analysis and Information Technology' Xamax Consultancy Pty Ltd,
October 2018, at http://www.rogerclarke.com/EC/GAIE.html
Clarke R. (2018c) 'Principles for AI: A SourceBook' Xamax Consultancy Pty Ltd, October 2018, at
http://www.rogerclarke.com/EC/GAIP.html
Journal of Change Management
To cite this article: Carol Flinchbaugh, Catherine Schwoerer & Douglas R. May (2017) Helping
Yourself to Help Others: How Cognitive Change Strategies Improve Employee Reconciliation with
Service Clients and Positive Work Outcomes, Journal of Change Management, 17:3, 249-267, DOI:
10.1080/14697017.2016.1231700
ABSTRACT
This qualitative study examined the paradox of difficult, yet meaningful, helping as part of employees' jobs in a social services organization. Incorporating an emergent design using employee interviews, the study identified how employees alter their understanding of workplace challenges, such as emotional distress and unsafe client behaviours, in order to find new meaning in the other-oriented value of their work. The resulting framework of employees' experiences through challenging, yet meaningful, helping extends the research in customer service by proposing the reconciliation process, achieved through cognitive change strategies (i.e. visualization techniques, cognitive reframing and mindfulness of experience), serves as a conceptual bridge that helps the management of this apparent paradox. We first describe the workplace challenges and then outline the distinct cognitive change strategies that engendered the reconciliation process. Implications for practice and future researchers are then discussed.
KEYWORDS
Reconciliation; cognitive change; meaningfulness; paradox; positive organizational scholarship
What are we doing in this world, and why are we here, if not to help our neighbors? (Founder
of the study's agency)
Today’s service employees endure numerous challenges in their attempt to help custo-
mers. Employees routinely encounter ambiguity in customer expectations (Johlke & Iyer,
2013), customer’s verbal criticisms (Goussinsky, 2011), and an overall malaise in customer
appreciation (Fisk & Neville, 2011). In essence, encountering and responding effectively to
service challenges, such as patiently listening and calmly responding to irate customers, is
merely a part of the service work. In contrast to single point-in-time employee–customer
interactions found in service work in other settings (e.g. call centres, retail), employees in
social service mental health jobs also face the further risk of on-going physical dangers and
verbal aggression due to unsafe and violent client behaviours (Mizzoni & Kirsh, 2007).
Incorporating a social service context in this study, the agency’s adolescent clients,
often victims of prior abuse and neglect, typically dismiss any need for help and feel
forced to participate in the residential-based treatment. As such, the employees rarely
experience the beneficial results of their service efforts since client improvements often
occur well after the timeframe of the employees’ direct care (Tarren-Sweeney, 2008). How
employees navigate the abundant challenges without experiencing many service suc-
cesses in this type of service work is unclear.
Articulating how employees reach service success in arduous settings is perhaps even
more important given the increasing number of service jobs (Maneotis, Grandey, &
Krauss, 2014). The more recent service literature has begun to outline the value of contex-
tual (Anderson, 2006; Duke, Goodman, Treadway, & Breland, 2009) and dispositional (Gous-
sinsky, 2011; Maneotis et al., 2014) antecedents as influences on positive employee
responses to service challenges. This focus moves beyond the traditional scholarly research
that examined the deleterious consequences of such routine service demands on service
employee attitudes and behaviours (Anderson, 2006; Koys, 2001). In examining antece-
dents, for instance, Judge, Woolf, and Hurst (2009) identify the value of service workers’
extraversion to quality service provision. Other scholars extend earlier cognitive reappraisal
research (Folkman, Lazarus, Dunkel-Schetter, DeLongis, & Gruen, 1986) to consider how
employees regulate their emotional response to customer incivilities in both retail (i.e.
focusing on a task; Wallace, Edwards, Shull, & Finch, 2009) and call centre (i.e. job autonomy,
Grandey, Dickter, & Sin, 2004; adhering to company policy, Goldberg & Grandey, 2007) set-
tings. Such consideration of the antecedents to improved service performance provides a
valuable first step to understand factors contributing to the quality of customer service. Yet,
this research often relies on cross-sectional findings, thus failing to incorporate the ongoing
feedback loops required for successful employee navigation in ongoing service encounters
(Goodman & Rousseau, 2004). For instance, how does an employee continually respond to
daily client crises? Or face repeated verbal incivilities from a volatile youth without witnes-
sing youth improvements? Indeed, notably absent from these studies is an understanding
of the longer term processes and strategies employees engage in to successfully journey
through the paradox of highly challenging, yet highly rewarding service jobs.
To address this limitation, researchers have more recently conceptualized process-
oriented approaches that consider the reciprocal steps involved in the employee–customer
service exchange. For example, Atkins and Parker (2011) conceptualized the mutual feedback
loops between employees’ compassionate acting and behavioural responses. In this process-
oriented feedback loop employees first are mindful of the situation and then respond with
compassionate actions. Similarly, Grandey and Gabriel (2015) acknowledged the need to con-
sider cyclical employee responses in the dimensions of emotional labour. They call for an
improved understanding of what allows one employee to align their behavioural responses
with their feelings about the customer (e.g. deep acting), while another employee maintains
misalignment between service behaviours and feelings (e.g. surface acting) (Grandey &
Gabriel, 2015). Responding to these calls, qualitative studies have begun to explore employ-
ees’ process of coping with care-giving roles in social service organizations (Cricco-Lizza,
2014). However, most process-oriented studies to date focus on care-giver difficulties in
family situations, not employees in social service organizations (Glavin & Peters, 2015; Robin-
son, Weiss, Lunsky, & Ouellette-Kuntz, 2015). Thus, to answer these recent calls for a more
process-oriented approach to employee service, the purpose of this research is to identify
how employees navigate personal hardships to provide service to others.
In order to understand how employees are able to maintain employment in spite of
arduous workplace challenges, the initial research focus used a grounded theory approach
to examine the potential link between employees’ resiliency and their tenure in the
organization. Resilient employees demonstrate the capability to bounce back after hardships
(Masten, 2001) and potentially even develop new skills as a result of the challenges they
experience (Fredrickson, 2003). Indeed, Jackson, Firtko, and Edenborough (2007) found
that resilient employees could persevere and adapt to service challenges. Extending these
findings and using the foundation of positive organizational scholarship work to fully
explore these issues (e.g. Cameron, Dutton, & Quinn, 2003; Lilius, Worline, Dutton, Kanov,
& Maitlis, 2011), we initially sought to better understand how resilience increased employee
longevity in challenging service roles. Importantly, the emergent nature of the research
moved well beyond the original focus on resilience when data revealed employees use cog-
nitive change strategies, including visualization techniques, cognitive reframing, and mind-
fulness of experience to generate positive service behaviours through the process of
reconciliation. Reconciliation is depicted in the extant literature as a mutual effort by all
parties to restore a damaged relationship (Aquino, Tripp, & Bies, 2006). We demonstrate
more specifically how reconciliation exists as both an inter- and intrapersonal focus depicting
how employees make amends and have a renewed perspective to service difficulties. Sub-
sequent to these reconciliatory efforts, employees report beneficial experiences despite
adverse work conditions, such as positive, meaningful experiences in other-oriented
helping. The contribution of this research is its demonstration of how employees develop
and sustain personal capabilities in the face of difficulties and hardships in providing care-
giving services. Instead of assuming employee capabilities in the service to others are con-
stant, our framework suggests that employees have to renew their capabilities through
reconciliatory cognitive changes in order to sustain their energy for service to others.
To investigate employees’ use of cognitive change strategies and reconciliation in the
successful guidance through service challenges, we conducted an in-depth qualitative
case study of a mental health social service agency in a major Midwestern city in the
U.S. We incorporate interview data from the workplace experiences of 22 long-term
employees across agency levels at two distinct time points.
Methods
Description of the organization
The setting was a child welfare agency founded in 1843 by a religious order to help single
women who settled in the city during westward expansion. In 1998, the religious order
transferred ownership to a non-profit organization. Today, the organization’s mission is
to provide residential treatment services, foster care, and in-home therapeutic family ser-
vices to youth in ‘greatest need’. The agency currently employs 238 employees and serves
over 600 children and families annually.
A visit to the organization reveals how employees face opportunities for meaningful
contributions and distressing challenges alike. Once inside you encounter locked doors,
unbreakable glass, and may hear the voice of a severely agitated girl screaming and
cursing. Walk a little closer to the troubling voice and you may come across an isolated
and secure ‘timeout room’. The child may be located here involuntarily because she is
deemed unsafe to herself and others due to her violent behaviour. She remains here
until she can safely rejoin the group. While this incident may seem unsettling to an
outside observer, this type of volatility and violence is not unexpected from the youth
served at this agency. The youth are often emotionally harmed from past abuses by family
and acquaintances and typically enter the facility following psychiatric hospitalization.
During their placement at the agency, the youth routinely engage in volatile and
violent verbal and physical aggression, endangering both the safety of other clients and
employees. These children are in need of the guidance, care, and support provided by
the agency’s employees. This is the context for our examination of the process of how
employees navigate their potentially dangerous service roles.
Philosophical positioning
The research team employed an interpretive approach relying on naturalistic methods. We
elected to conduct our research in a natural setting in order to garner a rich understanding
of the employee experiences through persistent workplace challenges (Lincoln & Guba,
1985). Below, we describe the study’s naturalistic method, including the researchers’
role, emergent design, purposeful sample, and study data and analysis.
The second round of interviews occurred four months later in order to get a deeper understanding of themes that
emerged from the initial interviews. Steps were also taken to minimize bias through
formal and informal member checks with employees regarding validity of information.
Sampling techniques
In the first interviews, the researchers, in conjunction with the CEO, used purposive
sampling to identify a diverse cross-section of employees from all programmes and organ-
izational levels in the agency. This non-random identification of potential participants
ensured that the study sample included representatives from all programmes and job
roles (Robinson, 2014). All interviewees had a minimum of 5 years of agency employment
(average of 12.5 years). We conducted two rounds of semi-structured interviews with 22
employees (see Table 1). In the second round of interviews, we used iterative sampling
to follow up on new insights from employees' initial interview comments. The interviewee
diversity in both job roles and demographic characteristics facilitated a global understand-
ing of employee experiences. Nine participants held direct care positions (i.e. more than
95% of the shift is spent directly with youth). Nineteen participants were female, consistent
with the social service sector (Schilling, Morrish, & Liu, 2008). Four participants were
African-American and 20 were Caucasian, which is representative of the agency’s overall
employee workforce. The response rate for interview participation was 86% (two employees
did not respond to the interview request and two were on leave).
Data collection
The research team collected data using multiple methods and sources:
Open-ended interviews
The majority of data came from the semi-structured interviews. The initial interview ques-
tions were based on existing resilience research (e.g. Block & Kremen, 1996) to better
understand how employees withstood ongoing challenges. The use of semi-structured
interviews provided structure to attenuate biased questioning, but also gave the inter-
viewer flexibility to ask follow-up questions for further clarification (Pratt & Rosa, 2003).
The interviews lasted from 1 to 2 hours with most lasting around 75 minutes. Sixteen ques-
tions were asked during the initial interviews. Following an emergent process (Pratt &
Rosa, 2003), questions for the second round of interviews were based on themes from
the initial interviews, such as assessing employees’ involvement in cognitive change strat-
egies and reconciliatory processes. To minimize bias, the researchers guarded against
possible priming effects in the question design and alternated the question order between
employees’ positive and negative experiences. In order to create a comfortable environ-
ment and encourage accurate and honest responses, all interviews were conducted in
enclosed offices without obtrusive recording devices. Notes were taken using a pen and
paper method and transcribed within 24–36 hours to maintain accuracy.
Informal conversations
Informal conversations that occurred between the first author and employees during the
research period served to strengthen the understanding of the setting and assess notable
changes that may have occurred since the researcher’s employment. These conversations
were characterized by a personal, familiar tone as the parties reminisced about past work-
place experiences. On two occasions, the researcher was approached by interviewees who
wanted to further expand upon their earlier interview responses. The content of the con-
versations was not recorded; however, the information did serve to inform the researcher’s
observations of the setting.
Data analysis
The research team employed inductive data analysis to further inform the lead research-
er’s tacit knowledge of the workplace. The first author initially reviewed and categorized
the data based on broad, theoretically based common themes. Fifty themes were ident-
ified after the initial interviews (see Table 2). Next, the author inductively sorted and col-
lapsed the themes into well-defined categories (Glaser & Strauss, 1967; Pratt & Rosa, 2003).
To foster a reflexive design and reduce potential bias from the researchers’ knowledge of
past research, the research team reviewed and discussed divergent meanings of the inter-
view data and ensuing themes (Lincoln & Guba, 1985; Wodak, 2004). As a result of the mul-
tiple perspectives of the interview data, the researchers focused on the underlying process
of employees successfully navigating through job challenges. To confirm new insights
from our initial data interpretation (Lincoln & Guba, 1985; Wodak, 2004), the researchers
incorporated an iterative approach: a second round of interviews with 14 participants
using additional questions. The new questions sought to
explore the sequence of the cognitive and emotional processes surrounding the intervie-
wees’ experience with workplace challenges.
Following the second set of interviews, the researchers revised and amended the the-
matic categories. Final themes emerged and suggested a framework that employees
espouse to help themselves provide client services in this difficult setting. Specifically,
the authors identified how employees use distinct cognitive change techniques through-
out the day to promote personal reconciliation and positive outcomes despite job-related
challenges (see Figure 1).
JOURNAL OF CHANGE MANAGEMENT 255
Results
Identifying positive experiences through hardships
The data revealed grave hardships in the employees’ daily job experiences. Regardless of
whether they worked directly with the youth or served in support roles, employees were
aware of the extremely difficult, even threatening, yet ultimately rewarding nature of their
work. Employees discussed the widespread conflict and ambiguity inherent in this
paradox. On the one hand, employees described the tenuous nature of the work and
the recognition that at any moment a client emergency – often a traumatic crisis – may
occur. Conversely, employees voiced their admiration for their colleagues and acknowl-
edged the valuable contributions they make towards bettering others’ lives. A clear
picture of the difficult, yet rewarding other-oriented experiences surfaced throughout
the interviews.
Employees acknowledged the difficult environment in every interview and attested to the
frequency of grim client behaviour. Employees talked about times they had been sworn at,
hit, or spat on by the youth. Employees also described situations when they needed to
physically restrain a violent child because the client was considered ‘a risk to themselves
or others’.1 One employee recollected a situation where she had to intervene alone with an
aggressive client. Her memory epitomized the shocking challenges that often arose:
We had a kid here who was a very tough kid … She was in a fight with another girl and they
broke a window. She needed to be physically restrained. My co-workers had to go deal with
the other girls, so I was left alone with this girl. As I was trying to restrain her, I got cut with the
broken glass and got bit by the girl – in a very private area. I was emotional about this situation
because I was left alone in a volatile situation.
Similarly, another employee shared a ‘very haunting experience’ where a client tried to
hang herself in the dorm bathroom. The employee recalled needing to ‘hold her up’
until other staff could come with assistance. Certainly, these examples demonstrate
how the personal trauma arising from such events is experienced by both the employee
and client. Unfortunately, these traumatic moments are typical of employee experiences
as every employee reported such negative experiences; they served as a reminder of
the organization’s mission to ‘serve those in greatest need’.
Yet the challenging and exhausting nature of the work was not the prominent emphasis in
the interviews. In contrast, employees’ discussions shifted to how their jobs presented oppor-
tunities for other-focused positive experiences. Simply put, employees did not merely focus
on the workplace challenges; they ultimately emphasized the value of other-focused helping.
Throughout the interviews, employees’ expression of positive job experiences (n = 118) far
outnumbered any difficulties (n = 68), and are reflected in comments such as:
Indeed, several employees reported that their job’s positive experiences outweighed
any personal sacrifices. The information technology manager, in spite of limited youth
interaction, identified the other-oriented importance of the work:
If I can help others do their jobs, then I have made a contribution … I’m not just drawing a
paycheck; I could make more than double what I make here at another job. By proxy, when
I help others I am part of the mission.
Reconciliation as a bridge
We learned from the interview data how reconciliation allowed employees to make an
evaluative judgement of the paradox of engaging in difficult work while recognizing its
future positive potential. More simply, reconciliation helped an individual move beyond
the crisis moments to recognize positive outcomes in the work. Moreover, employees
described the methods they used to shift their cognitive understanding of the situation.
Thus, we find a recursive process between personal reconciliation and cognitive change.
Before we discuss the nature of reconciliation in this study, it is important to note that
reconciliation is not new to the management literature or this social service agency.
Reconciliation describes a victim’s extension of goodwill towards the offending party in
order to restore the relationship (Aquino et al., 2006). The value of reconciliation surfaces
when two or more individuals find resolution after a conflict. In fact, relationship restor-
ation, and not merely conflict resolution, epitomizes the enduring nature of reconciliation
(Palanski, 2012).
Reconciliation also has historical significance at the study’s agency. Namely, the
agency’s religious founders championed its importance by considering reconciliation as
one of their original ‘core values’. To this day, employees are actively encouraged to recon-
cile and resolve inequities that the youth have previously faced. Injustice perceptions or
conflict between individuals (i.e. colleagues, supervisors, customers) is common in any
workplace and this agency is no different. Employees noted the existence of conflict
between and among coworkers and clients. In fact, the youth are often referred to this
organization due to their excessive conflict and aggression with parents, foster parents,
or school personnel. Their conflict-ridden and often dangerous behaviours commonly con-
tinue after arriving at the agency. As such, it is no surprise that challenging youth beha-
viours lead to interpersonal conflict with employees. As a direct result of this pervasive
pattern of conflict, employees describe being emotionally and physically taxed at the end
of the day, crying over challenges with youth, and questioning whether they could have
intervened differently in conflict situations. Yet, despite the grim realities of their work
experiences, reconciliation provides a solution for employees.
Reconciliation emerged as employees expressed ‘other-focused’ concern. Employees
maintained their other-focused concern even when the difficulties were initiated by the
offending party, primarily the youth. Time and again, employees mentioned experiencing
a personally traumatic event, and then shared how they were capable of transcending the
crises and expressing concern for the other individual. One employee directly articulated
the importance of reconciliation:
I have had conflict with others at all levels … I realize that I just need to ‘shake-off’ some con-
flicts and I don’t need to turn every possibility of disagreement into a conflict … But this is
where the value of reconciliation is important – the idea is to restore relationships. It makes
sense with the kids when we (staff) can model it ourselves.
Thus, the employee realized the need to actively live the values of the agency in her inter-
actions with the clients.
Visualization techniques
The employees described their use of routine and clearly defined visualization patterns as
they mentally prepared for workday challenges. They also reported their use of the visu-
alization techniques at explicit times and locations. For example, one employee acknowl-
edged her intentional incorporation of the daily visualizations. She stated:
There are uncertainties in the hand that you are dealt each day. You just have to handle the
hand that is dealt and try to make the best out of it. If I come in with the attitude that things
are going to be good – then it can be. But, I do need to be ‘re-tooled’ with the right mind and
spirit every day. Each day I prepare my mindset. I have a mini pep rally for myself. I listen to
gospel music, talk to someone who makes me laugh and say a prayer. When I first started
working here I would listen to NPR, the news, and talk radio on my drive in. After some
time, I changed it to my pep rally and I only listen to gospel music. There is no more news,
weather or things like that. The gospel music helps me get in the right mindset to be prepared
for work with the right attitude.
The employees’ intentional use of personal visualizations occurred at varied but consist-
ent time points in the workday. A direct care employee acknowledged that she took
specific steps to deal with the negative work emotions and has developed a ‘pattern of
leaving here. Where I talk with a friend on my drive home and debrief any difficult
things. Then we move on and talk about other things’. On more difficult days, this
employee went home and took a shower to ‘wash off the day’s difficulties’. Another super-
visor reported using daily techniques where she visualizes her experiences at the begin-
ning and end of her workday. As she drives down the driveway on her way home, she
‘visualizes a compartment at the end of the hill where I leave my difficult emotions
from the day’. She then ‘picks it up from the compartment when I return’.
The employees’ description of their visualization techniques was similar to the formal
practice of autogenic training. Users of autogenic training maintain daily practice sessions
to induce a relaxation state and train the subconscious mind to develop intentional mental
associations (Stetter & Kupper, 2002). In this study, employees reported engaging in visu-
alization techniques as they prepared for the workday by listening to select music genres
or ending the workday by explicitly compartmentalizing work thoughts as they leave the
facility. Furthermore, similar to the known health benefits of autogenic training (Stetter &
Kupper, 2002), the employees also described the benefits gained from the visualization
techniques, such as an elevated mood, balance in their work and home life, and improved
mental clarity.
Interestingly, no employees claimed learning about the visualization techniques in
training or through another’s advice. In fact, no employees even used the terms ‘visualiza-
tion techniques’ or ‘autogenic training’ to describe the techniques. They simply described
patterns of behaviours that helped them deal with the workday difficulties; they visualized
positive scenarios to facilitate coping with daily challenges. Through their personal
resources they envisioned a method to help themselves cope with the tenuous uncertain-
ties of their jobs.
Cognitive reframing
Further, employees conveyed how they cognitively altered their perceptions of their work-
place experiences. Employees voiced a capability to reframe a given situation in order to
reflect an alternative perspective. Frequently, their reframing took the form of
circumventing a difficult situation through a focus on the future positives. This is particu-
larly the case when employees altered their perspective of the youth’s current violent
behaviours by recalling the past abuse the child endured and envisioning future
success for the child. They said:
• ‘I can’t make sense of the kid’s trauma. I need to move on and realize that it doesn’t
always have to be this way. I trust the mission and what we are doing. This will
change their lives’.
• ‘I really feel that people are doing the best they can with what they have. Everyone is
wounded in some way – not everyone knows or sees this’.
• ‘I like the idea of being the person that plants the seed (of future change) … I can
believe that they will benefit from what I have done. I have learned to celebrate the
small growth and progress in the kids’.
When the interview focus moved from the clients, the employees emphasized how they
also used cognitive reframing to make a deliberate cognitive shift to positive emotions,
memories, or situations. Employees expressed how they actively sought out a co-worker
who would make them laugh or who had an appreciative mindset, or recalled a positive
memory such as how their team was full of laughs during the last team meeting. Impor-
tantly, almost all of the interviewees (20 out of 22) expressed a pattern of reframing chal-
lenges in a positive light.
We recognize that employee engagement in cognitive reframing is not unique to this
study. Cognitive reframing is a psychological technique that allows one to perceive a past
event from another perspective and from this new vantage point feel more comfortable
with the situation (Erickson, Rossi, & Rossi, 1976). In workplace contexts, reframing has
been examined to model how employees normalize their ‘dirty jobs’ (Ashforth, Kreiner,
Clark, & Fugate, 2007). Cognitive reframing is also similar to rational-emotive behavioural
therapy (REBT), a therapeutic technique in which individuals rationally consider how
emotions influence their self-talk and behaviours (Ellis, 2003). REBT has been used in work-
place settings as a practical approach to improving employee performance (Criddle, 2007).
Yet, in this study, employee use of reframing extended beyond normalization or rational-
ization of their job. In contrast, the cognitive shift enabled employees to illustrate their
meaningful, other-oriented contributions rather than focus on the ‘dirty’, undesirable
characteristics of their roles.
Mindfulness of experience
The employees also reflected on past experiences where they successfully navigated
through difficulties in order to positively reconcile the challenges. Mindfulness of, or
awareness of, past workplace success essentially eases employees’ apprehension about
existing uncertainties. Through fostering a mindfulness of their past experiences, employ-
ees garnered additional information about the causes and consequences of past work-
place events. Akin to situational attribution conceptualized in attribution theory (Lord &
Smith, 1983; Weiner, 1985), we found that an employee’s mindfulness of experience assisted
in his/her explanation of the roots of a behaviour. In this setting, recalling past situations of
successful service was especially informative as employees reflected on past events to
ease the potential for negative future outcomes. For example, an employee who had
worked at the agency for 19 years discussed her perspective on job difficulties: ‘I have been here
so long and have seen so many positive experiences. I have an instinctual sense that things
will get better’. Her reflection on successful resolutions to past challenges enables her to
reconcile present workplace challenges. Importantly, the focused mindfulness of past suc-
cesses does not rely on employee tenure-level at the organization. In fact, the interviewee
with the shortest agency tenure (five years) expressed a similar mindfulness of experience:
‘I will work through the hard times. I have recognized that the good and bad times cycle.
Seeing the kid’s success stories gives me hope for the future’. Even the employee who
experienced getting cut and bitten was able to reflect on the unsavoury experience and
articulate positive remarks:
… It was an emotional experience, but two years ago, that same client called here and apol-
ogized for the day that she cut me. She told me that she is now donating some of her belong-
ings to a friend in need … I recognized that joy came through this initial hostility. The girl
asked her if I would forgive her and I said ‘absolutely’. There is joy out of the tragedy.
Discussion
The dynamic process underlying the paradox
Our research highlights how employee reconciliation, in the form of both interpersonal
and intrapersonal reconciliation, provides the conceptual bridge in the paradox
between difficult and meaningful other-focused service work. The presence of paradoxical
context is common in the management literature (Davis, Maranville, & Obloj, 1997);
however, many studies fail to identify the underlying factors that contribute to such a
paradox (Lewis, 2000). In this study, we outline a process-oriented framework to demon-
strate how employee reconciliation achieved through the use of cognitive change strat-
egies bridges the underlying tensions in the paradox between difficult, yet meaningful
service work. Reconciliation as the conceptual bridge allows employees to find meaning
in their jobs despite the challenging environment. Through such reconciliation, employees
discover that the positive helping experiences in their jobs outweigh any personal injus-
tice or threat faced at work.
interpersonal reconciliation exists and warrants the need for employees’ intrapersonal
reconciliation of the offense.
Second, the authors believe it is important to acknowledge that while our study focused
on how employees used cognitive change strategies to achieve reconciliation and
positive workplace outcomes, the employees also acknowledged the agency’s contextual
shortcomings. For instance, employees shared their concerns about limited financial
resources and the agency’s pending leadership change. Certainly these conditions were
seen as problematic; nevertheless, it was clear to the researchers that the employees
were nonetheless capable of maintaining a positive outlook and finding worth and
meaning in their jobs. Future researchers should examine alternative factors that may
facilitate employees’ positive experiences in similar settings. It may be that the organiz-
ational culture influences employees’ pro-social motivation. Employees may make connec-
tions between the agency’s past religious affiliation and jobs that increase the salience of
their own values and meaningfulness experienced (Dik & Duffy, 2009). Likewise, future
researchers should examine the influence of other contextual influences on employees’
meaningfulness, such as supervisor and coworker influences (May et al., 2004) or tenure
at the organization.
Future researchers should also investigate whether focused employee training on the
use of cognitive change techniques can enhance employee performance through devel-
opment of psychological resources supportive of intrapersonal reconciliation. Akin to the
known physical and psychological benefits of stress management techniques (Van der
Klink, Blonk, Schene, & Van Dijk, 2001), our findings appear to indicate that training
employees on the benefits of cognitive change might assist individuals in moving
their mindset towards a future orientation of expected positive job-related outcomes
for both themselves and their clients. This future investigation will require a combination
of qualitative longitudinal work and quantitative analysis of employee reports and
records.
Conclusion
In this paper, we examined the individual level processes of cognitive change strategies
and reconciliation that influence employees’ positive workplace outcomes. The findings
go beyond identification of a job-related paradox to interpret how reconciliation and
the change strategies serve as the underlying process in managing the tensions in this
paradox. In fact, an employee explicitly referenced such a successful ‘transformation’
from tension to reward in the following job-related experience:
My most memorable experience was working with a girl who was extremely volatile in her
actions and emotions. I would get a knot in my stomach anytime I had to work with her.
Our conversation would start great and then would turn on a dime – often leading to difficult
behaviors. But, she eventually did really well in her foster home and we were celebrating her
move to an independent living home. I wrote something up to share at this ceremony and
I compared my time working with her like a roller coaster. There are moments that scare
you to death, and moments that give you a thrill, but at the end of the ride you want to
get back on and ride again. That was my experience with this girl. My time with her was
transforming to me.
It is our hope that the study’s findings and emergent, process-focused framework can be
used by researchers and practitioners alike in order to extend our understanding of how
employees successfully navigate workplace challenges.
Note
1. Safe crisis management, a therapeutic physical restraint, is used to maintain client safety
during violent outbursts.
Disclosure statement
No potential conflict of interest was reported by the authors.
Notes on contributors
Carol Flinchbaugh is an Assistant Professor of management at New Mexico State University. Her
research interests seek to understand how organizational policies and procedures influence
employee level behaviour in areas such as employee well-being, stress, and positive psychology.
Catherine Schwoerer is an Associate Professor of management at the University of Kansas. Her work
considers employees’ self-efficacy, well-being, and career management.
Douglas R. May is a Professor and Director of the International Center for Ethics in Business at the
University of Kansas. He examines employees’ moral efficacy and ethical decision-making as well as
employees’ pursuit of engagement, thriving and meaningful work.
References
Allen, M. S., & McCarthy, P. J. (2015). Be happy in your work: The role of positive psychology in
working with change and performance. Journal of Change Management. doi:10.1080/14697017.
2015.1128471
Anderson, J. (2006). Managing employees in the service sector: A literature review and conceptual
development. Journal of Business and Psychology, 20, 501–523.
Aquino, K., Tripp, T., & Bies, R. (2006). Getting even or moving on? Power, procedural justice, and
types of offense as predictors of revenge, forgiveness, reconciliation, and avoidance in organiz-
ations. Journal of Applied Psychology, 91, 653–668. doi:10.1037/0021-9010.91.3.653
Ashforth, B., & Kreiner, G. (1999). ‘How can you do it?’: Dirty work and the challenge of constructing a
positive identity. The Academy of Management Review, 24, 413–434. doi:10.5465/AMR.1999.2202129
Ashforth, B., Kreiner, G., Clark, M., & Fugate, M. (2007). Normalizing dirty work: Managerial tactics for
countering occupational taint. Academy of Management Journal, 50, 149–174. doi:10.5465/AMJ.
2007.24162092
Atkins, P., & Parker, S. (2011). Understanding individual compassion in organizations: The role of
appraisals and psychological flexibility. Academy of Management Review, amr-10. doi:10.5465/
amr.10.0490
Beach, M. C., Roter, D., Korthuis, P. T., Epstein, R. M., Sharp, V., Ratanawongsa, N., … , Saha, S. (2013). A
multicenter study of physician mindfulness and health care quality. The Annals of Family Medicine,
11(5), 421–428.
Block, J., & Kremen, A. M. (1996). IQ and ego-resiliency: Conceptual and empirical connections and
separateness. Journal of Personality and Social Psychology, 70(2), 349–361.
Brown, K. W., Ryan, R. M., & Creswell, J. D. (2007). Mindfulness: Theoretical foundations and evidence
for its salutary effects. Psychological Inquiry, 18, 211–237.
Burke, R., & Greenglass, E. (1989). Psychological burnout among men and women in teaching: An
examination of the Cherniss Model. Human Relations, 42, 261–273. doi:10.1177/
001872678904200304
Cameron, K., Dutton, J., & Quinn, R. (Eds). (2003). Positive organizational scholarship: Foundations of a
new discipline. San Francisco, CA: Berrett-Koehler Publishers, Inc.
Creswell, J. W. (2013). Research design: Qualitative, quantitative, and mixed methods approaches.
Thousand Oaks, CA: Sage.
Cricco-Lizza, R. (2014). The need to nurse the nurse: Emotional labor in neonatal intensive care.
Qualitative Health Research, 24(5), 615–628.
Criddle, W. D. (2007). Adapting REBT to the world of business. Journal of Rational-Emotive & Cognitive-
Behavior Therapy, 25(2), 87–106.
Davis, A., Maranville, S., & Obloj, K. (1997). The paradoxical process of organizational transformation:
Propositions and a case study. Research in Organizational Change and Development, 10, 275–314.
Desbordes, G., Gard, T., Hoge, E. A., Hölzel, B. K., Kerr, C., Lazar, S. W., … , Vago, D. R. (2015). Moving
beyond mindfulness: Defining equanimity as an outcome measure in meditation and contempla-
tive research. Mindfulness, 6(2), 356–372.
Dik, B., & Duffy, R. (2009). Calling and vocation at work: Definitions and prospects for research and
practice. The Counseling Psychologist, 37, 424–450. doi:10.1177/0011000008316430
Duke, A., Goodman, J., Treadway, D., & Breland, J. (2009). Perceived organizational support as a mod-
erator of emotional labor/outcomes relationships. Journal of Applied Social Psychology, 39, 1013–
1034. doi:10.1111/j.1559-1816.2009.00470
Ellis, A. (2003). Reasons why rational emotive behavior therapy is relatively neglected in the pro-
fessional and scientific literature. Journal of Rational-Emotive and Cognitive-Behavior Therapy, 21
(3–4), 245–252.
Erickson, M., Rossi, E., & Rossi, S. (1976). Hypnotic realities: The induction of clinical hypnosis and forms
of indirect suggestion. New York: Irvington.
Fisk, G., & Neville, L. (2011). Effects of customer entitlement on service workers’ physical and psycho-
logical well-being: A study of waitstaff employees. Journal of Occupational Health Psychology, 16,
391–405. doi:10.1037/a0023802
Folkman, S., Lazarus, R. S., Dunkel-Schetter, C., DeLongis, A., & Gruen, R. J. (1986). Dynamics of a stress-
ful encounter: Cognitive appraisal, coping, and encounter outcomes. Journal of Personality and
Social Psychology, 50(5), 992–1003.
Fredrickson, B. L. (2003). The value of positive emotions. American Scientist, 91, 330–335.
Fredrickson, B. L. (2004). The broaden-and-build theory of positive emotions. Philosophical
Transactions of the Royal Society B: Biological Sciences, 359, 1367–1377.
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research.
Chicago, IL: Aldine.
Glavin, P., & Peters, A. (2015). The costs of caring: Caregiver strain and work–family conflict among
Canadian workers. Journal of Family and Economic Issues, 36(1), 5–20.
Goldberg, L. S., & Grandey, A. A. (2007). Display rules versus display autonomy: Emotion regulation,
emotional exhaustion, and task performance in a call center simulation. Journal of Occupational
Health Psychology, 12(3), 301–318.
Goodman, P., & Rousseau, D. (2004). Organizational change that produces results: The linkage
approach. The Academy of Management Executive, 18, 7–19. doi:10.5465/AME.2004.14776160
Goussinsky, R. (2011). Does customer aggression more strongly affect happy employees? The mod-
erating role of positive affectivity and extraversion. Motivation and Emotion, 35, 220–234. doi:10.
1007/s11031-001-9215
Grandey, A. A., Dickter, D. N., & Sin, H. P. (2004). The customer is not always right: Customer aggression
and emotion regulation of service employees. Journal of Organizational Behavior, 25(3), 397–418.
Grandey, A., & Gabriel, A. (2015). Emotional labor at a crossroads: Where do we go from here? Annual
Review of Organizational Psychology and Organizational Behavior, 2(1), 323–349. doi:10.1146/
annurev-orgpsych-032414-111400
Jackson, D., Firtko, A., & Edenborough, M. (2007). Personal resilience as a strategy for surviving and thriv-
ing in the face of workplace adversity: A literature review. Journal of Advanced Nursing, 60(1), 1–9.
Johlke, M., & Iyer, R. (2013). A model of retail job characteristics, employee role ambiguity, external
customer mind-set, and sales performance. Journal of Retailing and Consumer Services, 20, 58–67.
Judge, T. A., Woolf, E. F., & Hurst, C. (2009). Is emotional labor more difficult for some than for others?
A multilevel, experience-sampling study. Personnel Psychology, 62(1), 57–88.
Koys, D. (2001). The effects of employee satisfaction, organizational citizenship behavior, and turn-
over on organizational effectiveness: A unit-level, longitudinal study. Personnel Psychology, 54,
101–114. doi:10.1111/j.1744-6570.2001.tb00087
Lewis, M. (2000). Exploring paradox: Toward a more comprehensive guide. The Academy of
Management Review, 25, 760–776. doi:10.5465/AMR.2000.3707712
Lilius, J., Worline, M., Dutton, J., Kanov, J., & Maitlis, S. (2011). Understanding compassion capability.
Human Relations, 64, 873–899. doi:10.1177/0018726710396250
Lincoln, Y., & Guba, E. (1985). Naturalistic inquiry. London: Sage.
Lord, R. G., & Smith, J. E. (1983). Theoretical, information processing, and situational factors affecting
attribution theory models of organizational behavior. Academy of Management Review, 8(1), 50–60.
Maneotis, S., Grandey, A., & Krauss, A. (2014). Understanding the ‘why’ as well as the ‘how’: Service
performance is a function of prosocial motives and emotional labor. Human Performance, 27,
80–97. doi:10.1080/08959285.2013.854366
Masten, A. S. (2001). Ordinary magic: Resilience process in development. American Psychologist, 56,
227–238.
May, D., Gilson, R., & Harter, L. (2004). The psychological conditions of meaningfulness, safety and
availability and the engagement of the human spirit at work. Journal of Occupational and
Organizational Psychology, 77, 11–37. doi:10.1348/096317904322915892
Mizzoni, C., & Kirsh, B. (2007). Employer perspectives on supervising individuals with mental health
problems. Canadian Journal of Community Mental Health, 25, 193–206.
Palanski, M. (2012). Forgiveness and reconciliation in the workplace: A multi-level perspective and
research agenda. Journal of Business Ethics, 109, 275–287. doi:10.1007/s10551.011.1125.1
Poitras, J., & Le Tareau, A. (2009). Quantifying the quality of mediation agreements. Negotiation and
Conflict Management Research, 2(4), 363–380.
Pratt, M., & Rosa, J. (2003). Transforming work–family conflict into commitment in network marketing
organizations. The Academy of Management Journal, 46, 395–418.
Reb, J., Narayanan, J., & Ho, Z. W. (2015). Mindfulness at work: Antecedents and consequences of
employee awareness and absent-mindedness. Mindfulness, 6(1), 111–122.
Robinson, O. C. (2014). Sampling in interview-based qualitative research: A theoretical and practical
guide. Qualitative Research in Psychology, 11(1), 25–41.
Robinson, S., Weiss, J. A., Lunsky, Y., & Ouellette-Kuntz, H. (2015). Informal support and burden among
parents of adults with intellectual and/or developmental disabilities. Journal of Applied Research in
Intellectual Disabilities, 29(4), 356–365.
Schilling, R., Morrish, J., & Liu, G. (2008). Demographic trends in social work over a quarter-century in
an increasingly female profession. Social Work, 53, 103–114. doi:10.1093/sw/53.2.103
Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects.
Education for Information, 22(2), 63–75.
Shnabel, N., & Nadler, A. (2008). A needs-based model of reconciliation: Satisfying the differential
emotional needs of victim and perpetrator as a key to promoting reconciliation. Journal of
Personality and Social Psychology, 94, 116–132.
Stetter, F., & Kupper, S. (2002). Autogenic training: A meta-analysis of clinical outcome studies.
Applied Psychophysiology and Biofeedback, 27, 45–98. doi:10.1023.A.1014576505223
Sutcliffe, K. M., Vogus, T. J., & Dane, E. (2016). Mindfulness in organizations: A cross-level review.
Annual Review of Organizational Psychology and Organizational Behavior, 3, 55–81.
Tarren-Sweeney, M. (2008). Retrospective and concurrent predictors of the mental health of children
in care. Children and Youth Services Review, 30(1), 1–25.
Van der Klink, J. J., Blonk, R. W., Schene, A. H., & Van Dijk, F. J. (2001). The benefits of interventions for
work-related stress. American Journal of Public Health, 91(2), 270–276.
Wallace, J. C., Edwards, B. D., Shull, A., & Finch, D. M. (2009). Examining the consequences in the ten-
dency to suppress and reappraise emotions on task-related job performance. Human Performance,
22(1), 23–43.
Weiner, B. (1985). ‘Spontaneous’ causal thinking. Psychological Bulletin, 97, 74–84. doi:10.1037/0033-
2909.97.1.74
Wodak, R. (2004). Critical discourse analysis. In C. Seale, J. F. Gubrium, & D. Silverman (Eds.), Qualitative
research practice (pp. 185–204). Thousand Oaks, CA: Sage.
The Publication Manual of the American Psychological Association (American Psychological Association, 2001, 2010) calls for the reporting of effect sizes and their confidence intervals. Estimates of effect size are useful for determining the practical or theoretical importance of an effect, the relative contributions of factors, and the power of an analysis. We surveyed articles published in 2009 and 2010 in the Journal of Experimental Psychology: General, noting the statistical analyses reported and the associated reporting of effect size estimates. Effect sizes were reported for fewer than half of the analyses; no article reported a confidence interval for an effect size. The most often reported analysis was analysis of variance, and almost half of these reports were not accompanied by effect sizes. Partial η² was the most commonly reported effect size estimate for analysis of variance. For t tests, two thirds of the articles did not report an associated effect size estimate; Cohen's d was the most often reported. We provide a straightforward guide to understanding, selecting, calculating, and interpreting effect sizes for many types of data and to methods for calculating effect size confidence intervals and power analysis.

Keywords: effect size, eta squared, confidence intervals, statistical reporting, statistical interpretation
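The two statistics the abstract singles out can be recovered from commonly reported quantities. The sketch below uses standard textbook identities rather than formulas from this article, and the numeric inputs are hypothetical values chosen only for illustration:

```python
import math

# Hedged sketch using standard textbook identities (not from this article).
# All numeric inputs below are hypothetical illustration values.

def cohens_d(m1: float, m2: float, sd1: float, sd2: float) -> float:
    """Standardized mean difference, pooling the two SDs by root mean square."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from a reported F ratio:
    F * df_effect / (F * df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)

print(round(cohens_d(12.0, 10.0, 4.0, 4.0), 2))   # hypothetical group summaries
print(round(partial_eta_squared(6.0, 2, 54), 3))  # hypothetical ANOVA effect
```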
Experimental psychologists are accomplished at designing and analyzing factorial experiments and at reporting inferential statistics that identify significant effects. In addition to statistical significance, most research reports describe the direction of an effect, but it is also instructive to consider its size. Estimates of effect size are useful for determining the practical or theoretical importance of an effect, the relative contribution of different factors or the same factor in different circumstances, and the power of an analysis. This article reports the use of effect size estimates in the 2009 and 2010 volumes of the Journal of Experimental Psychology: General (JEP: General), comments briefly on their use, and offers practical advice on choosing, calculating, and reporting effect size estimates and their confidence intervals (CIs).

Effect size estimates have a long and somewhat interesting history (for details, see Huberty, 2002), but the current attention to them stems from Cohen's work (e.g., Cohen, 1962, 1988, 1994) championing the reporting of effect sizes. In response to Cohen (1994) the American Psychological Association (APA) Board of Scientific Affairs set up a task force that proposed guidelines for statistical methods for psychology journals (Wilkinson & the APA Task Force on Statistical Inference, 1999). These guidelines were subsequently incorporated into the revised fifth edition of the Publication Manual of the American Psychological Association (APA, 2001; hereinafter APA Publication Manual) and were again included in the sixth edition (APA, 2010). Regarding effect sizes, the sixth edition states,

    For the reader to appreciate the magnitude or importance of a study's findings, it is almost always necessary to include some measure of effect size in the Results section. Whenever possible, provide a confidence interval for each effect size reported to indicate the precision of estimation of the effect size. (APA, 2010, p. 34)

Effect sizes allow researchers to move away from the simple identification of statistical significance and toward a more generally interpretable, quantitative description of the size of an effect. They provide a description of the size of observed effects that is independent of the possibly misleading influences of sample size. Studies with different sample sizes but the same basic descriptive characteristics (e.g., distributions, means, standard deviations, CIs) will differ in their statistical significance values but not in their effect size estimates. Effect sizes describe the observed effects; effects that are large but nonsignificant may suggest further research with greater power, whereas effects that are trivially small but nevertheless significant because of large sample sizes can warn researchers against possibly overvaluing the observed effect.¹ Effect sizes can also allow the comparison of effects in a single study and across studies in either formal or informal meta-analyses. When planning new research, previously observed effect sizes can be used to calculate power and thereby estimate appropriate sample sizes. Cohen (1988), Keppel and Wickens (2004), and most statistical textbooks provide guidance on calculating power; a very brief, elementary guide appears in the Appendix along with mention of planning sample sizes based on accuracy in parameter estimation (i.e., planning the size of the CIs; Cumming, 2012; Kelley & Rausch, 2006; Maxwell, Kelley, & Rausch, 2008).

This article was published Online First August 8, 2011.

Catherine O. Fritz, Educational Research Department, Lancaster University, Lancaster, United Kingdom; Peter E. Morris, Department of Psychology, Lancaster University, Lancaster, United Kingdom; Jennifer J. Richler, Department of Psychology, Vanderbilt University.

We thank Thomas D. Wickens and Geoffrey Cumming for their very helpful advice on an earlier version of this article.

Correspondence concerning this article should be addressed to Catherine O. Fritz, Educational Research Department, Lancaster University, Lancaster LA1 4YD, United Kingdom. E-mail: c.fritz@lancaster.ac.uk

¹ It is rarely the case that experimental studies have the problem of too many cases making trivial effects statistically significant, but some large-scale surveys and other studies with very large sample sizes can have this problem. For example, a correlation of .1, accounting for only 1% of the variability, is statistically significant with a sample size of 272 (one tailed).

EFFECT SIZE ESTIMATES
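The claim in the footnote above (that r = .1 is statistically significant, one tailed, at n = 272) can be checked with the standard t transformation of a correlation. This sketch is an added illustration, not part of the article, and it approximates the one-tailed .05 critical value with the normal-based cutoff of 1.645:

```python
import math

# Sketch verifying the footnote's claim that r = .1 reaches one-tailed
# significance at n = 272. Uses the standard t transformation of r and a
# normal approximation (1.645) to the .05 one-tailed critical value; this
# example is an addition for illustration, not part of the article.
def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_from_r(0.1, 272)
print(round(t, 3), t > 1.645)   # just clears the one-tailed cutoff
```

With a somewhat smaller sample (say n = 250) the same r = .1 would fall short of the cutoff, which is the footnote's point: only very large samples make such a small correlation significant.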
A brief note on the terminology used in this article may be helpful. Effect sizes calculated to describe the data in a sample, like any other descriptive statistic, also potentially estimate the corresponding population parameter. Throughout this article, we refer to the calculated effect size, which describes the sample and estimates the population, as an effect size estimate. It is important to remember that the estimates both describe the sample and estimate the population and that some statistics, as we describe later, provide better estimation of the population parameters than do others.

The most basic and obvious estimate of effect size when considering whether two data sets differ is the difference between the means; most articles report means, and the difference is easily calculated. Some researchers argue that differences between the means are generally sufficient and superior to other ways of quantifying effect size (e.g., Baguley, 2009; Wilkinson & the APA Task Force on Statistical Inference, 1999). The raw difference between the means can provide a useful estimate of effect size when the measures involved are meaningful ones, such as IQ or reading age, assessed by a standard scale that is widely used. Discussion of the effect would naturally focus on the raw difference, and it would be easy to compare the results with other research using the same measure.

However, comparing means without considering the distributions from which the means were calculated can be seriously misleading. If two studies (A and B) each have two conditions with means of 100 and 108, it would be very misleading to conclude that the effects in the two studies are the same. If the standard deviations for the conditions in Study A were both two and in Study B were both 80, then it is clear that the distributions for Study A would have virtually no overlap, whereas those for Study B would overlap substantially. Using Cohen's U1, which we describe later, we find that only 2% of the distributions for Study A would overlap, given the standardized difference between the means (d) of 4, but 92% of the distributions would overlap in Study B because the standardized difference between these means is d = 0.1. Significance tests make the difference between the two studies quite clear. For the study with standard deviations of two, a t test would find a two-tailed significant difference (p < .05) with three participants per group, but 770 participants per group would be needed to obtain a significant difference for the study with standard deviations of 80. The consequence of the difference in the size of the distributions is also obvious when considering the CIs: With 50 samples in each study, Study A's CI = ±0.6, whereas Study B's CI = ±22.2. These examples illustrate how comparisons between means without considering the variability of the data can conceal important properties of the effect. To address this problem, standardized effect size calculations have been developed that consider variability as well as the differences between means. Effect size calculations are addressed by a growing number of specialized texts, including Cumming (2012), Ellis (2010), Grissom and Kim (2005, 2011), and Rosenthal, Rosnow, and Rubin (2000) as well as many general statistical texts.

When examining the difference between two conditions, effect sizes based on standardized differences between the means are commonly recommended. These include Cohen's d, Hedges's g, and Glass's d and Δ. When independent variables have more than two levels or are continuous, effect size estimates usually describe the proportion of variability accounted for by each independent variable; they include eta squared (η², sometimes called R²), partial eta squared (ηp²), generalized eta squared (ηG²), associated omega squared measures (ω², ωp², ωG²), and common correlational measures, such as r², R², and Radj². In addition, there are other less frequently encountered statistics, such as epsilon squared (ε²; Ezekiel, 1930) and various statistics devised by Cohen (1988), including q, f, and f². Finally, there are the effect size estimates relevant to categorical data, such as phi (φ), Cramér's V (or φc), Goodman–Kruskal's lambda, and Cohen's w (Cohen, 1988). The plethora of possible effect size estimates may create confusion and contribute to the lack of engagement with reporting and interpreting effect sizes. Many of these statistics are conceptually, and even algebraically, quite similar but have been developed as improvements or to serve different types of data and different purposes. The emergence of a consensus to use a few selected estimates would probably be a useful simplification, as long as the choice was driven by the genuine usefulness of those estimates and not merely by their easy availability.

One important distinction to make among effect sizes is that some statistics, such as η² and R², describe the samples observed but may overestimate the population parameters, whereas others, such as ω² and adjusted R², attempt to estimate the variability in the sampled population and, thus, in replications of the experiment. These latter statistics are often recommended by statistical textbooks because they relate to the population and are less vulnerable to inflation from chance factors. However, researchers very rarely report these population estimates, perhaps because they tend to be smaller than the sample statistics.

Although the APA Publication Manual has strongly advocated the reporting of effect sizes for 10 years and many psychology editors have done so for longer than that (e.g., Campbell, 1982; Levant, 1992; Murphy, 1997), a glance through many journals suggests that such reporting is inconsistent. Morris and Fritz (2011) surveyed cognitive articles published in 2009; they found that only two in five of these articles intentionally reported effect sizes. Isabel Gauthier, as the incoming editor of JEP: General, asked us to conduct a similar survey of recent volumes of this journal and to review the methods of calculating effect size estimates.

Method

We reviewed articles published in the 2009 and 2010 volumes of JEP: General, noting the statistical analyses, descriptive statistics, and effect size estimates reported in each.

Results

Table 1 provides frequencies of the most commonly used statistical analyses for each year; corresponding percentages are illustrated in Figure 1. Note that data are reported for each article, not for each experiment, but the analyses were similar across experiments in most articles. Analysis of variance (ANOVA) was reported in most articles, 83% overall, followed by t tests, 66% overall; these were often used together to locate the source of effects in factorial designs. Overall, 58% of the articles reported at least one measure of effect size (73% for 2009, 45% for 2010).
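The Study A and Study B contrast described earlier reduces to a one-line computation. This sketch assumes the common standard deviation stated for each study's conditions; the added d-to-r conversion uses the standard two-equal-groups formula r = d / sqrt(d² + 4) and is an aside, not part of the article's example:

```python
import math

# Sketch of the Study A/B contrast: identical mean differences (100 vs. 108)
# but very different spreads yield very different standardized effects.
# Assumes the common SD stated for each study's conditions.
def cohens_d(mean_a: float, mean_b: float, common_sd: float) -> float:
    return abs(mean_b - mean_a) / common_sd

def r_from_d(d: float) -> float:
    """Two-equal-groups conversion r = d / sqrt(d^2 + 4); an added aside."""
    return d / math.sqrt(d ** 2 + 4)

d_a = cohens_d(100, 108, 2)     # Study A: SDs of 2  -> d = 4.0
d_b = cohens_d(100, 108, 80)    # Study B: SDs of 80 -> d = 0.1
print(d_a, d_b)
print(round(r_from_d(d_a), 3), round(r_from_d(d_b), 2))
```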
FRITZ, MORRIS, AND RICHLER

Table 1
Numbers of Articles Reporting Commonly Used Statistical Analyses

Year      Articles   ANOVA   t test   Correlation   Regression
2009      33         27      23       14            8
2010      38         32      24       13            9
Overall   71         59      47       27            17

Note. Corresponding percentages are shown in Figure 1. ANOVA = analysis of variance.

Different analyses are often associated with different estimates of effect size; therefore, the use of specific effect size estimates is reported within the context of the analysis conducted. Table 2 shows the effect size estimates used in conjunction with ANOVA-type analyses, including analysis of covariance. Overall, slightly more than half of the articles reported a measure of effect size associated with ANOVA at least once; ηp² was by far the most frequently used, almost certainly because it is provided by SPSS. Effect size estimates were rarely reported for further ANOVA-related analyses, such as simple effects and post hoc and planned comparisons. Table 3 summarizes the frequency of further ANOVA-related analyses and the inclusion of effect size estimates. Most articles did not report all components of the ANOVA; only 10 of the 59 articles (five from each year) reported the mean square error (MSE) terms, and only 20 (12 for 2009 and 8 for 2010) reported F ratios for all effects. When reporting ANOVA, articles usually included descriptive statistics for the data, either in terms of individual cells or marginals. At least some means associated with ANOVA were reported in 93% of the articles (93% for 2009 and 94% for 2010); some measure of variability was less often reported, appearing in only 80% of the articles (81% for 2009 and 78% for 2010).

When reporting t tests, roughly one quarter of articles included a measure of effect size; Cohen's d was the most often used effect size estimate. See Table 4 for numbers and percentages of effect size estimates and descriptive statistics reported in association with t tests. Descriptive statistics were less often provided than for ANOVA; almost one quarter did not report a measure of central tendency, and almost half failed to report the variability of the data.

Neither intentional reporting of effect size estimates nor descriptive statistics tended to accompany reports of correlations. Refer to Table 5 for numbers and percentages of articles intentionally reporting effect size estimates and descriptive statistics associated with correlation analyses. Fewer than 10% of the articles reporting correlations provided r² or any other associated effect size estimate beyond the correlation, and fewer than one quarter reported descriptive statistics for the data. Although r is a useful estimate of effect size, there is a difference between reporting it as a correlation and treating it as an estimate of effect size; none of these articles appeared to present it as an effect size estimate.

Various types of regression were also reported in 17 articles. Most of these were very selective in terms of the statistics reported from the analysis and in terms of descriptive statistics; there were almost no reports of effect size. Table 6 shows numbers and percentages of statistics reported in association with regression analyses. Most articles did not report the F ratio or significance value for the test, although most reported some statistics associated with the predictors, such as the t tests, the regression weights, the partial correlations, or the odds ratios. We counted all reports of R² as estimates of effect size, although most were not explicitly presented as such.

A few nonparametric and frequency-based tests were also reported; only one of these included a measure of effect size. These reports also tended to neglect statistical summaries of the data.

Figure 1. Percentage of published articles including each type of analysis. Other types of analyses were also observed with lower frequencies, including χ² (18% for 2009 and 16% for 2010), nonparametric difference tests (3% for 2009 and 11% for 2010), and Cronbach's alpha (6% for 2009 and 11% for 2010); these were not accompanied by measures of effect size and are not discussed here. Corresponding frequencies are shown in Table 1. ANOVA = analysis of variance.

Discussion

Our initial concern over the reporting of effect sizes was justified by our analysis. Across the 2 years studied, 42% of articles reported no measure of effect size. Most of the articles counted as including effect sizes reported them for only some of the analyzed effects. Even where articles reported ηp² for ANOVA analyses, they often omitted effect size estimates for nonsignificant effects and other comparisons. Fewer than a third of the articles reporting t tests included associated effect size estimates to aid in interpreting the results. On the positive side, reported effect sizes in the JEP: General articles were clear with respect to which effect size statistic was used. Our recent survey of cognitive articles (Morris & Fritz, 2011) found articles in which effect sizes were wrongly identified, and we have occasionally encountered articles that report effect size figures without identifying which statistic was used. As Vacha-Haase and Thompson (2004) observed, it is essential to correctly identify which statistic is used: Reporting that an effect size is .5 means something very different depending on whether the statistic used is d, r, r², η², ω², or others.

We observed almost no interpretation of the effect sizes that were reported, despite APA's direction to address the "theoretical, clinical, or practical significance of the outcomes" (APA, 2010, p. 36). Clearly effect sizes are important in a clinical and practical sense. Are they less relevant in a theoretical sense? If theories are solely concerned with the statistical significance of effects and not with their size, then perhaps there is no useful role for effect size consideration in interpretation, but surely good theories are concerned with substantive significance rather than merely statistical significance. A theory that only predicts a difference (or relationship) but is not concerned with the size of that effect will be one that is quite difficult to falsify and perhaps even more difficult to apply.

Table 2
Number (and Percentage) of Articles Reporting Effect Size Estimates Associated With ANOVA

Note. Articles were included in the counts if the statistic was reported at least once. Percentages across measures sum to more than 100% because some articles included more than one measure of effect size. ANOVA = analysis of variance; ES = effect size.

It appears that effect sizes may be reported to meet the minimum letter of the law, with little regard for the spirit of the law. The preponderance of ηp² in these analyses and the sparsity of discussion of reported effect sizes is consistent with a scenario wherein people obtain ηp² values from their statistical software and report it as required, but they give scant consideration to the implications of the values obtained. Little appears to have changed in the 60 years since Yates (1951) observed that an emphasis on statistical significance testing had two main negative consequences: Statisticians develop significance tests for

    problems . . . of little or no practical importance [and] scientific research workers . . . pay undue attention to the results of the tests of significance they perform on their data, particularly data derived from experiments, and too little to the estimates of the magnitude of the effects they are investigating. (p. 32)

Many researchers may be cautious about engaging too deeply with the effect size values that they calculate because, in contrast to the use of inferential statistics, they have far less experience in using the effect size estimates as an aspect of evaluating results and providing guidance for future research. In part, this situation may arise from the tendency to report ηp², which has limited usefulness. The ηp² statistic may be useful for cross-study comparisons with identical designs, but where designs differ, ηG² is needed. Within a factorial study, ηp² cannot properly be used to compare effects; η² is needed. We describe each of these measures and discuss the possible uses and interpretations of various effect sizes in a later section of the article.

The fifth and sixth editions of the APA Publication Manual recommend the inclusion of CIs for effect size estimates, but none were included in any of the JEP: General articles that we examined, and these CIs were only reported in 1 of the 386 cognitive articles that we surveyed (Morris & Fritz, 2011). Effect sizes, like means, are point estimates. They describe the sample and provide an estimate of the population parameter. For an estimate to be useful, it is important to provide some idea of how precise that estimate might be—the expected range within which the population parameter falls with some specified probability. When means are reported, it is widely accepted good practice to report some measure of variability—either the standard error, the standard deviation (from which the standard error is easily calculated), or a CI. These variability statistics provide a guide to the probable values for the population parameter. The effect size estimate also requires some accompanying description of its likely variability; that variability statistic is the associated CI.

The lack of these CIs in research reports is, however, understandable. Textbooks that describe effect size statistics typically do not provide associated guidance for calculating the CIs. Commonly used statistical software packages also fail to provide them. Furthermore, most measures of effect size are noncentrally distributed (see, e.g., Ellis, 2010, pp. 19–21; Grissom & Kim, 2005, p. 64), a somewhat nonintuitive concept that makes them more difficult to understand and to calculate. The CIs above and below the effect size estimate are not equal in size and have to be estimated by special software carrying out iterative procedures (e.g., Cumming, 2012; Cumming & Finch, 2001; Smithson, 2003; Steiger, 2004). These unusual characteristics of CIs for effect size estimates, combined with researchers' lack of familiarity with them, may help to explain why these CIs are not reported. In a later section on CIs, we suggest sources for relevant software and offer formulas to approximate CIs for Cohen's d and R².

Analyses were sometimes reported without the relevant descriptive statistics, making it more difficult for the reader to understand and evaluate the results. The most basic sort of effect size estimate when evaluating differences is the difference between means, but almost one quarter of t test reports were not accompanied by means, and almost half lacked reports of variability measures. When evaluating correlations, it is also necessary to consider the

Table 3
Articles Reporting Additional Analyses Associated With ANOVA and Effect Size Reporting for Those Analyses

Note. Articles were included in the counts if the analysis or statistic appeared at least once. ANOVA = analysis of variance.
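The sample-versus-population distinction drawn above (η² describing the sample, ω² shrinking toward the population value) can be made concrete with textbook one-way ANOVA formulas. The sums of squares below are hypothetical illustration values, not data from this survey:

```python
# Sketch of the sample-vs.-population contrast using textbook one-way ANOVA
# formulas; the sums of squares below are hypothetical illustration values
# (3 groups of 10: df_effect = 2, df_error = 27).
def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Proportion of total variability attributed to the effect (sample)."""
    return ss_effect / ss_total

def omega_squared(ss_effect: float, ss_error: float,
                  df_effect: int, df_error: int) -> float:
    """Population-oriented estimate; shrinks eta squared toward zero."""
    ms_error = ss_error / df_error
    ss_total = ss_effect + ss_error   # one-way design: only two components
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

eta = eta_squared(40.0, 220.0)
omega = omega_squared(40.0, 180.0, 2, 27)
print(round(eta, 3), round(omega, 3))   # omega comes out smaller, as the text notes
```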
Table 4
Number (and Percentage) of Articles Reporting Effect Size Estimates and Descriptive Statistics Associated With t Tests

          Articles       Effect size estimates     Descriptive statistics
Year      with t test    d          ηp²            M          Variability
2009      23             9 (39)     0              18 (78)    16 (70)ᵃ
2010      24             3 (13)     2 (8)          18 (75)    10 (42)ᵇ
Overall   47             12 (26)    2 (4)          36 (77)    26 (55)

Note. Articles were included if the statistic was reported at least once.
ᵃ Nine articles reported standard deviation, six reported standard error of the mean, and one reported 95% confidence interval. ᵇ Nine articles reported standard deviation, and five reported standard error of the mean.

Table 6
Number (and Percentage) of Articles Reporting Effect Size Estimates and Descriptive Statistics Associated With Regression Analyses

          Articles with                            Descriptive statistics
Year      regression     R²ᵃ        F              M          Variability
2009      8              4 (7)      1 (13)         6 (75)     6 (75)
2010      9              2 (8)      3 (33)         3 (33)     2 (22)
Overall   17             2 (7)      4 (24)         9 (53)     8 (47)

Note. Articles were included if the statistic was reported at least once.
ᵃ One of the 2009 articles reported adjusted R² as well as R²; no other articles reported adjusted R².

Readers may prefer to consult specialized statistics books addressing effect sizes (e.g., Cumming, 2012; Ellis, 2010; Grissom & Kim, 2005; Rosenthal et al., 2000).

This article addresses several effect sizes: those specific to comparing two conditions (Cohen's d, Hedges's g, Glass's d or Δ, and point biserial correlation r), those describing the proportion of variability explained (η², ηp², ηG², R², the ω² family, adjusted R², and ε²), and effect sizes for nonnormal data (z associated with the Mann–Whitney and Wilcoxon tests, and φ, Cramér's V, and Goodman–Kruskal's lambda for categorical data).

Effect Sizes Specific to Comparing Two Conditions

The most common approach to calculating effect size when comparing two conditions is to describe the standardized difference between the means, that is, the difference between the means of the two conditions in terms of standard (z) scores. There are varieties of this approach, discussed later, based on the way the standard deviation is calculated. In all cases, the sign of the effect size statistic is a function of the order assigned to the two conditions; where the conditions are not inherently ordered, a positive effect size should be reported. Online calculators for the standardized difference statistics are available (e.g., Becker, 2000; Ellis, 2009).

Cohen's d and Hedges's g. Cohen (1962, 1988) introduced a measure similar to a z score in which one of the means from the two distributions is subtracted from the other and the result is divided by the population standard deviation (σ) for the variables:

    d = (M_A − M_B) / σ,

where M_A and M_B are the two means and σ refers to the standard deviation for the population. Hedges (1982) proposed a small modification for his statistic g in which the population standard deviation (σ, calculated with a denominator of n, the number of cases) is replaced by the pooled sample standard deviation (s, calculated with a denominator of n − 1):

    g = (M_A − M_B) / s.

The standard deviations made available by common statistical packages are for the sample(s), so the more convenient statistic for researchers to calculate is g rather than d. However, as we observed in our review, it is rare for authors to report Hedges's g, even though it may be what they have actually calculated. It appears to be the case that d may be often used as a generic term

The correction is very small when the sample size is large (only 3% for df = 25) but is more substantial with a smaller sample size (8% for df = 10). This value is not the same as the original Hedges's g (1982), described earlier, although g might be used to refer to either; d_unb is a less ambiguous symbol, but in either case the formula should be provided for clarity.

A discussion of the rather confusing history of the chosen symbols for these statistics can be found in Ellis (2010). For most reasonably sized samples, the difference between Cohen's d, calculated using n, and Hedges's g, calculated using n − 1 degrees of freedom (df), will be very small. Especially when sample sizes are small, it is helpful for authors to clearly specify how the reported effect size estimates were calculated, regardless of what symbol is used, so that the reader can interpret them correctly and they might be useful for subsequent meta-analyses.

There is virtually always some difference between the standard deviations of the two distributions. When the standard deviations (s_A and s_B) and the sample sizes of the two distributions (A and B) are very similar, it may be sufficiently accurate when estimating the combined standard deviation (s_AB) to take the average of the two standard deviations:

    s_AB = (s_A + s_B) / 2.

When the standard deviations differ but the sample sizes for each group are very similar, then averaging the square of the standard deviations and taking the square root of the result is more accurate (Cohen, 1988, pp. 43–44; Keppel & Wickens, 2004, p. 160):

    s_AB = √[(s_A² + s_B²) / 2].

However, where the sample size and/or the standard deviation of the two distributions differ markedly it is usually recommended (e.g., Keppel & Wickens, 2004) that the sums of squares and the degrees of freedom for the two variables should be combined with the following formula (Keppel & Wickens, p. 160):

    s_AB = √[(SS_A + SS_B) / (df_A + df_B)].

That is, the sum of squares for the two variables A and B should be added together, as should the degrees of freedom for the variables. Then, the sum of the sums of squares is divided by the sum of the degrees of freedom, and the square root of the result taken. When not provided by the statistical package, the sum of squares for a variable can be easily calculated from the standard deviation as
for this type of effect size. For example, Borenstein, Hedges, SS ⫽ df ⫻ s2
Higgins, and Rothstein (2009) referred to the g statistic defined
or from the standard error of the mean (SE) as
above as d as does Comprehensive Meta-Analysis software that is
widely used for meta-analysis. These sources use g to refer to an SS ⫽ df ⫻ SE2 ⫻ N.
unbiased calculation, sometimes called dunbiased or dunb, that is
particularly useful for small sample sizes, where d tends to over- If pairs of conditions are being compared from among several
estimate the population effect size. The formula to adjust d, from that have been evaluated by an ANOVA, rather than working out
Borenstein et al. (p. 27), is the standard deviation for each comparison, it is acceptable to
8 FRITZ, MORRIS, AND RICHLER
replace the combined standard deviation for the multiple compar- Table 7
isons by the square root of the MSE (Grissom & Kim, 2005): Associated Values of Cohen’s d, r, r2 (or 2), PS, and U1
sAB ⬇ 冑MSE d r r2 or 2 PS U1
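The standardized-difference calculation above can be sketched in a few lines of Python (a sketch with our own illustrative function names; the pooling follows the sums-of-squares formula quoted from Keppel and Wickens):

```python
import math

def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Combined SD via sums of squares and degrees of freedom:
    s_AB = sqrt((SS_A + SS_B) / (df_A + df_B)), where SS = df * s**2."""
    df_a, df_b = n_a - 1, n_b - 1
    ss_a, ss_b = df_a * sd_a ** 2, df_b * sd_b ** 2  # SS = df * s^2
    return math.sqrt((ss_a + ss_b) / (df_a + df_b))

def standardized_difference(mean_a, mean_b, sd_a, n_a, sd_b, n_b):
    """Hedges's g as defined above: (M_A - M_B) / pooled sample SD."""
    return (mean_a - mean_b) / pooled_sd(sd_a, n_a, sd_b, n_b)
```

With equal group sizes and equal SDs this reduces to (M_A − M_B)/s; for example, means of 11 and 10 with a common SD of 2 give g = 0.5.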
In an attempt to help with the interpretation of d, Cohen (1988) suggested that d values of .8, .5, and .2 represented large, medium, and small effect sizes, respectively, perhaps more meaningfully described as obvious, subtle, and merely statistical. He recognized that what would be a large, medium, or small effect size would, in practice, depend on the particular area of study, and he recommended these values for use only when no better basis for estimating the effect size index was available. These designations clearly do not reflect practical importance or substantive significance, as those are judgments based on a more comprehensive consideration of the research.

Glass's d or Δ. An alternative to both Cohen's d and Hedges's g involves using the standard deviation for a control group rather than a standard deviation based on combining the groups. This approach is appropriate if the experimental manipulations are thought to have distorted the distribution in some way. This measure was proposed by Glass (1976) and is known as Glass's d or Δ.

Point biserial correlation, r. There are alternatives to using the standardized difference statistics as described earlier. Some (e.g., Rosenthal, Rosnow, & Rubin, 2000) have preferred the point biserial correlation coefficient, r, on the grounds that psychologists are already familiar with it. Furthermore, r² is equivalent to η² and other effect size estimates that describe the proportion of variability associated with an effect, described later. For two groups, the point biserial correlation, r, is calculated by coding group membership with numbers, for example, 1 and 2. The correlation between these codes and the scores for the two conditions gives the value of point biserial r. It is also easy to calculate r if an independent samples t test has already been carried out because

r = √(t² / (t² + df)).

Table 7
Associated Values of Cohen's d, r, r² (or η²), PS, and U1

d      r    r² or η²    PS    U1
0.0   .00    .000       50     0
0.1   .05    .002       53     8
0.2   .10    .010       56    15
0.3   .15    .022       58    21
0.4   .20    .038       61    27
0.5   .24    .059       64    33
0.6   .29    .083       66    38
0.7   .33    .11        69    43
0.8   .37    .14        71    47
0.9   .41    .17        74    52
1.0   .45    .20        76    55
1.1   .48    .23        78    59
1.2   .51    .27        80    62
1.3   .55    .30        82    65
1.4   .57    .33        84    68
1.5   .60    .36        86    71
1.6   .63    .39        87    73
1.7   .65    .42        89    75
1.8   .67    .45        90    77
1.9   .69    .47        91    79
2.0   .71    .50        92    81
2.2   .74    .55        94    84
2.4   .77    .59        96    87
2.6   .79    .63        97    89
2.8   .81    .66        98    91
3.0   .83    .69        98    93
3.2   .85    .72        99    94
3.4   .86    .74        99    95
3.6   .87    .76        99    96
3.8   .89    .78       100    97
4.0   .89    .80       100    98

Note. PS = probability of superiority. PS is the percentage of occasions when a randomly sampled member of the distribution with the higher mean will have a higher score than a randomly sampled member of the other distribution. U1 = the percentage of nonoverlap between the two distributions. Data are from Grissom (1994) and Cohen (1988); they assume similar sample sizes.
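The t-to-r conversion above is easy to script; a minimal sketch (helper name is ours, for illustration):

```python
import math

def r_from_t(t, df):
    """Point biserial r from an independent-samples t test:
    r = sqrt(t**2 / (t**2 + df))."""
    return math.sqrt(t * t / (t * t + df))
```

For example, t = 3 with df = 16 gives r = √(9/25) = .60.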
Table 8
Formulas for Deriving Effect Size Estimates Directly and Indirectly

From this statistic    To d                   To r                        To η²
Direct formula         d = (M_A − M_B)/s_AB   r = √(SS_effect/SS_total)   η² = SS_factor/SS_total
d                      —                      r = d/√(d² + 4)             η² = d²/(d² + 4)
Point biserial r       d = 2r/√(1 − r²)       —                           η² = r²

Note. When group sizes differ considerably (when one group has fewer than one third of the total N), then r is smaller than the above calculation. For more information about the translation between statistics with very uneven sample sizes, see McGrath and Meyer (2006).
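The Table 8 conversions between d, r, and η² can be sketched directly (a sketch assuming roughly equal group sizes, per the table's note; function names are ours):

```python
import math

def r_from_d(d):
    # r = d / sqrt(d**2 + 4)
    return d / math.sqrt(d * d + 4)

def d_from_r(r):
    # d = 2r / sqrt(1 - r**2)
    return 2 * r / math.sqrt(1 - r * r)

def eta_squared_from_d(d):
    # eta^2 = d**2 / (d**2 + 4)
    return d * d / (d * d + 4)
```

The two difference-based formulas are inverses of each other, and eta_squared_from_d(2.0) returns .50, matching the d = 2.0 row of Table 7.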
Table 9
Example Between-Groups Analysis of Variance Summary Table With Calculations of η², η²p, ω², and ω²p
The denominator term (SStotal) sums the degrees of freedom for the error term with the products of each F ratio and its corresponding degrees of freedom.

The η² statistic is a useful measure of the contribution of an effect—of a factor or an interaction—to the observed dependent variable. So, for example, examining the values of η² in Table 9 reveals that Factor B accounts for twice as large a proportion of the total variability as Factor A, but also that the three-way interaction of Factors A, B, and C contributes as much variability as does Factor B.

For comparing the size of effects within a study, η² is useful, but there are risks in comparing η² values across studies with different designs. These risks derive from the differences in total variability that arise from manipulating additional variables, thereby adding variability, or from controlling variables, thereby reducing variability. If the effect of Factor A is the same across two studies (i.e., SSA remains constant), a study that manipulates Factor A alone will have a greater value for η² than one that manipulates Factor A and introduces an additional manipulated Factor B. This difference is because in the latter case, the total variability is increased by the variability introduced with Factor B. Conversely, controlling variables so that they do not contribute their variability to the overall ANOVA will, obviously, reduce the SStotal. If the controlled variables do not interact with an effect, so that the SSeffect is unchanged, then the η² for that effect will be larger than if the variables had not been controlled.

Unmatched total variability is an issue for cross-study comparisons involving most measures of effect size. Cohen's d, for example, depends on the standard deviations of the variables and they, in turn, depend on the extent to which other factors have been controlled.

Psychologists calculating η² for their own data for the first time are often disappointed by the size of the effect that they are studying. A manipulation with an η² of .04 accounts for only 4% of the total variability in the dependent variable—an amount that may seem trivial, especially when compared to r² values commonly seen in correlational research. It may be easier to deal with small η² values in terms of Cohen's (1988, pp. 283–287) description of large (.14), medium (.06), and small (.01) effects, but obviously it is the practical or theoretical importance of the effect that determines what size qualifies the outcome as substantively significant. In most experimental research, observed effect sizes are likely to be small; many factors influence behavior in almost any area, and few of these will be examined in the analysis. It would be an exceptional situation to research a behavior that was determined by only one or two causal factors. Each factor makes its own contribution to the total variability under consideration. If several factors vary together, they may jointly account for a substantial proportion of the variability, but any individual factor might contribute only a relatively small part of the whole. Alternative calculations, described later, produce variants of η² that eliminate some of the other variability from consideration.

Generalized eta squared (η²G). Scientific research is a cumulative activity; it is necessary to compare and combine the results of research across studies. Unfortunately, neither η² nor η²p is well suited for making comparisons across studies with different designs. η²G provides a way to compare effect size across studies; it was introduced by Olejnik and Algina (2003), and Bakeman (2005) extended the description of its use for repeated measures designs. Like R² and η², η²G gives an estimate of the proportion of variability within a study that is associated with a variable but without the distorting effects of variables introduced in some studies but not others. For Olejnik and Algina, the distinction between manipulated factors and individual differences factors is key. To illustrate the distinction, a study that tested children in two different types of experimental rooms would have room type as a manipulated variable. However, if the children from a class were classified into groups by their ages and by their personalities, these would be individual differences factors. The central idea when calculating η²G is that the sums of squares for manipulated variables are not included in the denominator of the calculation, except under two conditions. Those conditions are (a) when calculating η²G for the manipulated variable itself and (b) when calculating η²G for an interaction between that manipulated variable and either an individual differences factor or a subject factor in a repeated measures design (i.e., the between-subjects error term).

We can demonstrate the calculation of η²G using the Table 9 example. Suppose that, continuing our developmental example, Factor A is the room type in which the children are tested, Factor B is age group (younger or older children from the class), with two levels, and Factor C is a two-level classification of the children, such as introvert or extravert. Factor A is a manipulated factor but Factors B and C are individual differences factors. To calculate η²G for Factor B (Age, an individual differences factor), use the formula for η² but remove from the total sums of squares in the denominator the sums of squares associated with Factor A, because it is a manipulated factor—one that adds variability to the design. Thus, although η² for Factor B is

η² = SS_B / SS_Total = 200/1280 = .16.
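The η² just computed, and the generalized variant described above, can be sketched as follows. This is a simplified sketch of our own: it takes the summed SS of the manipulated factors to drop from the denominator, which covers the Factor B case above but not the two exception conditions listed by Olejnik and Algina:

```python
def eta_squared(ss_effect, ss_total):
    """eta^2 = SS_effect / SS_total."""
    return ss_effect / ss_total

def generalized_eta_squared(ss_effect, ss_total, ss_manipulated=0.0):
    """Simplified eta^2_G for an individual-differences factor:
    exclude the manipulated factors' SS from the denominator."""
    return ss_effect / (ss_total - ss_manipulated)
```

Here eta_squared(200, 1280) reproduces the .16 (more precisely, .15625) calculated above for Factor B.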
EFFECT SIZE ESTIMATES 11
The basic formula for estimating ω² in a one-way ANOVA or in a factorial design is

ω² or ω²p for A = (SS_effect − (a − 1) × MS_error) / (SS_total + MS_error),

where a is the number of levels of the factor (Hays, 1973, p. 513). This same value can be calculated directly from the F statistic (Keppel & Wickens, 2004, p. 233):

ω² or ω²p for A = ((a − 1) × (F_A − 1)) / ((a − 1) × (F_A − 1) + N),

where a is the number of levels of Factor A, and N is the total number of subjects.

One can calculate ω² in a similar way for multifactor between-subject designs. The numerator remains the same, but the denominator includes the product of the degrees of freedom and the F ratio reduced by 1 for each of the effects (factors and interactions) in the analysis. So, for a multifactor design,

ω² = ((a − 1) × (F_effect − 1)) / (Σ[df_effect × (F_effect − 1)] + N),

summing across all of the effects (Keppel & Wickens, 2004, p. 481).

We have used the formulas to calculate ω² and ω²p for each of the factors in Table 9. For these particular imaginary data, η² and ω² are similar, and so are η²p and ω²p. This near identity is because the example has a reasonable sample size, with just two levels for each factor, and the effect itself is large. The size of the distortion for sample rather than population effect size calculations (i.e., η² rather than ω²) depends on the number of participants tested, the number of levels of the factors, and the size of the effect. More participants, fewer levels, and larger effects lead to less difference between η² and ω². With reasonably sized samples, limited numbers of factor levels, and larger effects, the overestimation of η² may often be acceptable. This is fortunate, because there are problems in estimating ω² for repeated measures designs; for these, only a range, not the actual value, can be calculated (Keppel & Wickens, 2004, p. 427). Instead, η² has to be reported, but the inflation of the estimate has to be recognized. Advice on calculating ω²G can be found in Olejnik and Algina (2003).

R²adj and ε². For the R² calculated by multiple regression, there has long been the Wherry (1931) formula for calculating adjusted or shrunken R² (R²adj), with the aim of predicting, like ω², the R² to be expected if the study were to be repeated with a sample from the same population:

R²adj = 1 − (1 − R²) × (N − 1)/(N − k − 1),

where N is the sample size, and k is the number of independent variables in the analysis. Many statistical software packages calculate R²adj.

A similar approach is taken to calculating an effect size known as ε² (Ezekiel, 1930), which is an alternative to ω². However, ε² is rarely reported, and we do not discuss it further here. Details of its calculation can be found in Richardson (1996).

Effect Sizes for Nonparametric Data

Effect size estimates for Mann–Whitney and Wilcoxon nonparametric tests. Most of the effect size estimates we have described here assume that the data have a normal distribution. However, some data do not meet the requirements of parametric tests, for example, data on an ordinal but not interval scale. For such data, researchers usually turn to nonparametric statistical tests, such as the Mann–Whitney and the Wilcoxon tests. The significance of these tests is usually evaluated through the approximation of the distributions of the test statistics to the z distribution when sample sizes are not too small, and statistical packages, such as SPSS, that run these tests report the appropriate z value in addition to the values for U or T; z can also be calculated by hand (e.g., Siegel & Castellan, 1988). The z value can be used to calculate an effect size, such as the r proposed by Cohen (1988); Cohen's guidelines for r are that a large effect is .5, a medium effect is .3, and a small effect is .1 (Coolican, 2009, p. 395). It is easy to calculate r, r², or η² from these z values because

r = z / √N,

and

r² or η² = z² / N.

These effect size estimates remain independent of sample size despite the presence of N in the formulas. This is because z is sensitive to sample size; dividing by a function of N removes the effect of sample size from the resultant effect size estimate.

Effect sizes for categorical data. Categorical data are often tested with the chi-square statistic (χ²) but, like ANOVA and t tests, the significance of a χ² test depends on the sample size as well as the strength of the association. There are various measures of association for contingency tables; we describe three that may be used for unordered categories. These can be easily calculated using SPSS by choosing Analyse, Descriptive Statistics, Crosstabs, Statistics and choosing the appropriate statistic.

Where the data being analyzed are in a 2 × 2 contingency table, the φ correlation coefficient can be used. One can calculate φ from χ² for the data using the formula

φ = √(χ² / N),

where N is the total sample size. If, for example, the obtained value of χ² was 10 with a sample size of 40, then

φ = √(10/40) = √0.25 = .50.

Cramér (1946) extended the φ statistic to larger contingency tables than the 2 × 2 of the φ correlation. This statistic, known as Cramér's V or φc, modifies the formula for φ to be

φc = √(χ² / (N × (k − 1))),

where N is the total sample size, and k is the number of rows or columns in the table, whichever is the smaller. Be aware that, unlike Pearson's r, the square of φ, or of Cramér's V, is not a valid description of the proportion of variability accounted for (Siegel & Castellan, 1988, p. 231).
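The F-based ω² formula and the two chi-square-based measures above can be sketched as follows (function names are ours, for illustration):

```python
import math

def omega_squared_from_F(a, F, N):
    """One-way omega^2 from F: (a-1)(F-1) / ((a-1)(F-1) + N)."""
    num = (a - 1) * (F - 1)
    return num / (num + N)

def phi(chi2, n):
    """phi for a 2x2 table: sqrt(chi^2 / N)."""
    return math.sqrt(chi2 / n)

def cramers_v(chi2, n, k):
    """Cramer's V; k is the smaller of the numbers of rows and columns."""
    return math.sqrt(chi2 / (n * (k - 1)))
```

Here phi(10, 40) returns .50, matching the worked example, and for a 2 × 2 table (k = 2) Cramér's V reduces to φ.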
When the rows and columns of a contingency table represent a predictor and a predicted variable, Goodman–Kruskal's lambda (L) describes how much the prediction is improved by knowing the category for the predictor, a potentially useful description of the size of the effect (Ellis, 2010; Siegel & Castellan, 1988). One may calculate lambda from any size contingency table; two values can be calculated: how well the row variable improves the predictability of the column variable, and vice versa. Usually only one direction is meaningful. To calculate Lrow for predicting column membership from row membership, sum the highest frequency in each of the columns, subtract the largest row total, and divide by the total number of observations not in that largest row. The formula is

Lrow = (Σ_{j=1..k} n_Mj − max(R_i)) / (N − max(R_i)),

where k is the number of columns, n_Mj is the highest frequency in the jth column, max(R_i) is the largest row total, and N is the total number of observations (Siegel & Castellan, 1988, p. 299). So, for example, to determine how much attending a seminar improved the ability to predict an adequate answer on the relevant exam question, the contingency table might appear as in Table 10. One would calculate lambda as

Lrow = ((75 + 30) − 90) / (144 − 90) = 15/54 = .28.

Table 11
Altered Example Contingency Table With L = 0

Seminar attendance    Adequate answer    Poor answer    Total
Attended                     75               15           90
Not attended                 45                9           54
Total                       120               24          144

www.thenewstatistics.com. Bird (2002) described methods for calculating effect sizes for ANOVA, and Smithson (2003) provided instructions and downloadable scripts for SPSS, SAS, SPlus, and R for calculating CIs for effect sizes associated with t tests, ANOVA, regression, and χ² analyses at http://dl.dropbox.com/u/1857674/CIstuff/CI.html. This webpage also provides links to other websites that may be helpful. These calculators include consideration of the noncentral nature of the distribution. Further details on calculating noncentral effect size CIs were given by Steiger (2004). However, it may not always be possible or necessary to adjust for noncentrality: Bird (2002, p. 204) observed that where the effect is not too large (e.g., d ≤ 2) and there are sufficient degrees of freedom in the error term (more than 30), the adjustment makes little difference.

CIs for d can be estimated with the procedure from Grissom and Kim (2005, pp. 59–60); this estimate does not adjust for noncentrality but is useful for normally distributed data, reasonable sample sizes (at least 10 per group), and values of d that are not very large. The calculation is based on Hedges and Olkin's (1985) formula for calculating the variance (s_d²) for the theoretical sampling distribution of d:

s_d² = (n_A + n_B)/(n_A × n_B) + d²/(2 × (n_A + n_B)).
useful graphical aspect of meta-analysis (for notes about their origin, see Lewis & Clarke, 2001).

Cohen et al. (2003, p. 88) described a method for estimating CIs for R², provided that the sample size is greater than 60. The standard error of R² is calculated as

SE_R² = √((4 × R² × (1 − R²)² × (n − k − 1)²) / ((n² − 1) × (n + 3))),

where n is the number of cases and k is the number of independent variables. The bounds of a 67% CI can be estimated as R² ± SE_R²; factors of 1.3, 2, or 2.6 can be applied to the standard error to provide estimates of 80%, 95%, or 99% CIs, respectively. This estimate does not adjust for noncentrality, but with larger samples, the expected error is small.

Translating Between Effect Sizes

We have described many ways of estimating effect sizes. Perhaps one of the reasons why effect sizes are underreported and infrequently discussed is that effect sizes may be reported using one statistic in one study and a different statistic in another study, making it difficult to compare the effect sizes. Many of the effect size estimates can be converted to other estimates. In Table 8, we have provided formulas for translation between d, r, and η².

Interpreting Effect Sizes

The object of reporting effect sizes is to better enable the reader to interpret the importance of the findings. All other things being equal, the larger an effect size, the bigger the impact the experimental variable is having and the more important the discovery of its contribution is.

In Table 7, we offer not only corresponding values for d, r, r², and η² but also two statistics—probability of superiority (PS) and the percentage of nonoverlap of the distributions (U1)—that help to clarify the relationships between the distributions of the conditions being compared. The values of these statistics help the readers of reports to imagine the relationships between the two distributions from which the effect size was calculated. We suggest that one of these statistics be given along with the effect size estimate for the more important results reported in an article.

PS gives the percentage of occasions when a randomly sampled member of the distribution with the higher mean will have a higher score than a randomly sampled member of the other distribution. The values in Table 7 were abstracted from Grissom (1994). PS is also known as the common language effect size (McGraw & Wong, 1992). Consider, as an example, a medium size effect of d = 0.5 as defined by Cohen (1988). The PS for a d of 0.5 is 64%. That is, if you sampled items randomly, one from each distribution, the one from the condition with the higher mean would be bigger than that from the other condition for 64% of the pairs. A real-world example is given by McGraw and Wong (1992): The d for the difference in height between men and women is 2.0, for which the PS is 92%. That implies that if you compared randomly chosen men and women, the man would be taller than the woman for 92% of the comparisons. Finally, selecting an example from the JEP: General articles that we reviewed earlier, Elliot et al. (2010, Experiment 2) found that women rated men seen in pictures with a red background as more attractive than men seen against a white background, d = 1.31. Consulting Table 7 gives a PS of 82% for this value of d. That is, if pairs of pictures, one with a red and one with a white background, were selected at random, the picture with the red background would be reported as more attractive on 82% of comparisons. This use of the PS statistic helps to demonstrate the size of the effect in a more concrete and meaningful way than the standardized difference. This concept has been elaborated and extended by Vargha and Delaney (2000) to include all types of ordinal and interval data.

Table 7 also reports U1, which was devised by Cohen (1988). U1 describes the degree of nonoverlap between the two population distributions for various values of the effect sizes. For example, when d = 0, the populations for the two distributions are perfectly superimposed on each other, and the value of U1 is zero; when d = 0.5, U1 = 33%, and one third of the areas in the distributions do not overlap. U1 = 81% for the difference between the height of men and women with d = 2.0 (McGraw & Wong, 1992); that is, 81% of the distributions for men and women do not overlap. For Elliot et al.'s (2010, Experiment 2) data on the attractiveness of men seen with red or white backgrounds, the U1 percentage nonoverlap of the distributions for the value of d = 1.31 is 65%. As for PS, the U1 statistic helps the reader to visualize the size of the effect being reported.

The substantive significance, or importance, of an effect depends in part on what is being studied. Rosnow and Rosenthal (1989), for example, illustrated how a very small effect relating to life-threatening situations, such as the reduction of heart attacks, is important in the context of saving lives on a worldwide basis (see Table 12 and Ellis, 2010). When the data are the correlation of two binary variables—such as having or not having a heart attack when in a treatment or a control condition—Rosnow and Rosenthal recommended the use of what they called the binomial effect size display to represent the relationship. The use of the binomial effect size display is illustrated in their example: Table 12 shows the frequency of heart attacks in a large study of doctors who took either aspirin or a placebo for the effect size r = .034. The success rate for the treatment is .50 + r/2, and for the control group it is .50 − r/2. For the example in Table 12, these values are .50 + .017 and .50 − .017. The table cells are then made up to complete 100% for the columns and rows. The treatment effect is calculated by subtracting the rate of heart attacks under treatment (e.g., aspirin) from the rate under the control (e.g., the placebo). For our example, that is, 51.7 − 48.3 = 3.4%, or r = .034; thus, 34 people in 1,000 would be spared heart attacks if they regularly took the appropriate dose of aspirin.

Table 12
Binomial Effect Size Display for the Effect of Aspirin on Heart Attack Risk (r = .034)

Treatment    Heart attack    No heart attack    Total
Aspirin          48.3             51.7           100
Placebo          51.7             48.3           100
Total           100               100            200

Note. Values are percentages. Adapted from "Statistical Procedures and the Justification of Knowledge in Psychological Science," by R. L. Rosnow & R. Rosenthal, 1989, American Psychologist, 44, p. 1279. Copyright 1989 by the American Psychological Association.

It should be noted that although the simplicity of calculation and clarity of presentation of the binomial effect size display is attractive, Hsu (2004) has shown that it can overestimate success rate differences unless various conditions are met.
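Under the normality and equal-variance assumptions behind Table 7, PS can also be computed directly from d as Φ(d/√2), where Φ is the standard normal CDF; this reproduces the tabled values. The binomial effect size display cells follow from .50 ± r/2. A sketch of both (function names are ours):

```python
from math import erf, sqrt

def probability_of_superiority(d):
    """PS for two equal-variance normal distributions separated by d:
    PS = Phi(d / sqrt(2)), with Phi the standard normal CDF."""
    return 0.5 * (1 + erf((d / sqrt(2)) / sqrt(2)))

def besd(r):
    """Binomial effect size display success rates, as percentages:
    (treatment, control) = (50 + 100*r/2, 50 - 100*r/2)."""
    return 50 + 100 * r / 2, 50 - 100 * r / 2
```

Here probability_of_superiority(0.5) is about .64 and probability_of_superiority(2.0) about .92, matching Table 7, and besd(.034) gives the 51.7/48.3 split shown in Table 12.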
Considerations When Reporting and Using Effect Sizes

Effect size estimates are important and useful descriptive statistics. Like all good descriptive statistics, they reflect the properties of the data and the conditions under which the data were collected. Just as means are valuable estimates of central tendency that can, nevertheless, be misleading if the distribution is skewed—for example, when studying income or life expectancy—so effect sizes must be considered within the context of the design and procedure, also considering the properties of the distributions. If the measures used are unreliable or if their range has been restricted, then the value of the effect size estimate will be different from, and probably smaller than, one that comes from very reliable measures or data that cover the full range. The allocation of observed variability to identified effects or to error will also influence estimates of effect size. Imagine studies of a factor that has a similar effect on people from various economic classes. One study samples only middle-class people; the error variability in this case would be smaller than the error variability in another, similar study that samples more widely. Because the error variability is smaller in the first case, the size of the effect is likely to appear larger. Yet another study might account for variability associated with socioeconomic group by including income as a factor or covariate in the analysis, thereby reducing the error variability and increasing the apparent effect size. In general, if variables are controlled in a study and therefore do not contribute to error variability, the estimated effect size is likely to be larger than effect sizes for studies in which variables have not been controlled or have been counterbalanced across the conditions (without including the counterbalancing factor in the analysis). It is possible to correct some effect size estimates for some of these distorting factors by using statistics such as η²G and ω²G (Baguley, 2009; Grissom & Kim, 2005; Olejnik & Algina, 2003), but in all cases, interpretation and comparison of effect sizes requires careful consideration of the sources of variability.

The key point is that all estimates of effect size should be evaluated in the context of the research. It is not sensible to say of some phenomenon that its effect size is X without qualifying under what conditions it has been found to be X. Nevertheless, estimates of effect size provide both an invitation to further, meaningful interpretation and a useful metric for considering multiple, varied studies together. Complete effect size information, including the CIs of the effect size estimates, is helpful to subsequent meta-analyses, and these meta-analyses make an excellent contribution to furthering the understanding of psychological phenomena. Just as psychology researchers have become sophisticated in dealing with the complexities of inferential statistics, the regular consideration of effect sizes can lead to these statistics being demystified and becoming valuable tools.

In our surveys of the reporting of effect sizes, we have not encountered any occasion when more than one effect size was reported for any particular effect. This selectivity may result from

Nevertheless, we suggest that in some cases, in addition to reporting PS and/or U1 to clarify the interpretation of an effect size, it is often worthwhile to report more than one measure of effect size to better interpret the results. It would, for example, be appropriate to report η²p to indicate the proportion of variability associated with a factor when all others are controlled, but also to report η²G to give an idea of the contribution that the factor makes to the overall performance when other nonmanipulated variables are allowed to vary. Both of these values would be useful in evaluating the effect. To provide another example, Cohen's d is useful for conceptualizing and comparing the size of a difference independently of the specific measure used; it enables comparisons between studies concerned with the same factor but using different dependent measures. However, interpretation of the results and of comparisons could be enriched by also considering r or r² as measures of the relative impact that the factor has on the outcome, as is sometimes done in regression analyses where both the value of the standardized regression coefficient and the proportion of variance accounted for are discussed. The APA Publication Manual (APA, 2010, p. 34) specifically suggests that it will often be useful to report and discuss effect size estimates in terms of the original units, as well as the standardized approaches. The effect size expressed in original units is often clear and easy to discuss in the context of a single study, whereas the standardized units approach facilitates comparisons between studies and meta-analyses. It is also useful, when disentangling the effect of a factor with more than two levels, to provide an effect size estimate for the full effect of the factor and for each of the pairwise comparisons or other linear contrasts (see Keppel & Wickens, 2004). Similarly, analysis of simple main effects associated with an interaction should include effect size estimates both for the interaction and for the simple main effects.

Good practice with respect to effect size reporting appears to be on the increase but does not seem to have been fully adopted by most authors. Roughly half of the ANOVA reports included a measure of effect size, although few included effect size estimates for further analyses related to the ANOVA. In a few articles, authors were thorough in reporting η²p for the main effect in an ANOVA and reporting Cohen's d or η² for simple effects or planned or post hoc comparisons. As with all analyses, it is important to think carefully about which type of effect size is most useful for each comparison (e.g., η² or η²p). Keppel and Wickens (2004) and Rosenthal et al. (2000) provided helpful advice on using contrasts and comparisons in ANOVA designs.

We began this research with an interest in the use of effect sizes as a way of quantitatively describing effects—as a supplement to the descriptions of the data and the results of statistical tests for those effects. We found that authors have begun to include reports of effect size estimates, with substantial encouragement from the APA and related professional organizations as well as from journal editors. Nevertheless, although slightly more than half of the articles report some effect size estimate, the majority of individual effects that are tested and reported are still not accompanied by descriptions of effect size. We also found that descriptions of data were often lacking. For a reader to engage with, think through, and fully consider the implications of the results of a study, descriptions of data and of the size of observed effects—both significant
efforts toward conciseness in reporting, or it may reflect a strategy and nonsignificant—are needed. It is not enough to simply identify
of doing the minimum required to placate reviewers and editors. that some effects were significant and others were not. There have
16 FRITZ, MORRIS, AND RICHLER
been calls from some quarters to shift the emphasis away from tation provided in the report, and where effects or Ns are
inferential testing and toward a more descriptive and thoughtful small, indicate the possible inflation of 2 by also report-
approach to interpreting results (e.g., Cohen, 1994; Loftus, 1996). ing 2, G2
, and/or 2p.
Although we are sympathetic to many of those concerns, we value
an approach that includes complete reporting of statistical tests 3. For complex analyses, such as factorial ANOVA or mul-
combined with descriptions of both the data and the effects. The tiple regression, report all effects. Report the results for
value of a piece of research goes beyond its significant effects. The each effect, including F, df, and MSE, so that the reader
richness of the story and the argument presented by the research is can calculate effect sizes other than those reported.
essential to the development of greater understanding (e.g., Abel-
4. Take steps to aid the reader to understand and interpret
son, 1995), but the patterns in the data and in the effects must be
the size of the more important effects. Use statistics such
reported in order for the reader to engage with the author in
as the PS and Cohen’s U1 or Goodman–Kruskal’s lambda
comprehending and evaluating the results of the research. At the
to help the reader conceptualize the size of the effect.
moment, for most authors, considering effect sizes seems to be the
last stage of their examination of their data. We believe that it 5. Always discuss the practical, clinical, or theoretical im-
should become one of the first stages. A clear grasp of the size of plications of the more important of the effect sizes ob-
the effects observed is at least as important as significance testing tained.
or the calculation of CIs.
When reporting requirements change, it is usually necessary for References
people to learn, perhaps to teach themselves, about the new sys-
tems. It is not always easy to do so. Because we teach statistics as Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ:
well as conduct research, we have been driven to explore the types Erlbaum.
American Psychological Association. (2001). Publication manual of the
of effect sizes and the usefulness of each. Our review of the
American Psychological Association (5th ed.). Washington, DC: Author.
reporting of effect sizes suggests that many authors have sought American Psychological Association. (2010). Publication manual of the
the minimum engagement with effect sizes that is possible while American Psychological Association (6th ed.). Washington, DC: Author.
still being published. This approach is suggested by the frequent Aron, A., Aron, E. N., & Coups, E. (2009). Statistics for psychology (5th
choice of effect size measures that are easily available (e.g., 2p) ed.). Upper Saddle River, NJ: Pearson.
but less than optimally useful and usually not those recommended Baguley, T. (2009). Standardized or simple effect size: What should be
by the authors of statistical textbooks (e.g., 2). Statistical texts reported? British Journal of Psychology, 100, 603– 617. doi:10.1348/
likely to be accessed by researchers are often selective in their 000712608X377117
advice about effect sizes. There are excellent discussions of the Bakeman, R. (2005). Recommended effect size statistics for repeated
measures designs. Behavior Research Methods, 37, 379 –384. doi:
complexities of effect size available in specialist journals, but they
10.3758/BF03192707
tend to be presented in the often dense language of statistical
Becker, L. A. (2000). Effect size calculators. Retrieved from http://
formulas that are understandably avoided by all but the most www.uccs.edu/⬃faculty/lbecker/
competent or desperate researchers. We hope that this article Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of
provides a shortcut in the process of accumulating the necessary variance. Educational and Psychological Measurement, 62, 197–226.
expertise to report and use effect sizes more effectively and helps doi:10.1177/0013164402062002001
people to appreciate the value of incorporating good descriptions Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009).
of data and effect sizes in their reports. Introduction to meta-analysis. Chichester, UK: Wiley. doi:10.1002/
We end with a minimum set of recommendations that are 9780470743386
designed for the novice effect size user (we include ourselves in Campbell, J. P. (1982). Editorial: Some remarks from the outgoing editor.
Journal of Applied Psychology, 67, 691–700. doi:10.1037/h0077946
this category) and are not intended to constrain the fuller use of
Cohen, J. (1962). The statistical power of abnormal–social psychological
alternative techniques. We suggest the following: research: A review. Journal of Abnormal and Social Psychology, 65,
145–153. doi:10.1037/h0045186
1. Always describe the data: (a) report means or other Cohen, J. (1988). Statistical power analysis for the behavioral sciences
appropriate measures of central tendency to accompany (2nd ed.). Hillsdale, NJ: Erlbaum.
every reported analysis, and (b) report at least one asso- Cohen, J. (1994). The earth is round (p ⬍ .05). American Psychologist, 49,
ciated measure of variability for each mean and the MSE 997–1003. doi:10.1037/0003-066X.49.12.997
for ANOVA analyses. Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences (3rd ed.).
2. Also describe the effects: (a) report an effect size esti- Mahwah: NJ: Erlbaum.
mate for each reported analysis, (b) for the most impor- Coolican, H. (2009). Research methods and statistics in psychology. Lon-
tant effects, report complete effect size information, in- don, United Kingdom: Hodder.
cluding the CIs of the effect size estimates for possible Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ:
Princeton University Press.
use in subsequent meta-analyses, (c) for the difference
Cumming, G. (2012). Understanding the new statistics: Effect sizes, con-
between two sets of data, as a default, use Cohen’s d (or fidence intervals, and meta-analysis. New York, NY: Routledge.
Hedges’s g) as the effect size estimate and, for small Cumming, G., & Finch, S. (2001). A primer on the understanding, use and
sample sizes, also report dunbiased, and (d) for factorial calculation of confidence intervals based on central and noncentral
analyses, with due thought and consideration, select and distributions. Educational and Psychological Measurement, 61, 532–
report 2, G2
, and/or 2p as appropriate for the interpre- 574.
Elliot, A. J., Niesta Kayser, D., Greitemeyer, T., Lichtenfeld, S., Gramzow, R. H., Maier, M., & Liu, H. (2010). Red, rank, and romance in women viewing men. Journal of Experimental Psychology: General, 139, 399–417. doi:10.1037/a0019689
Ellis, P. D. (2009). Effect size calculators. Retrieved from http://myweb.polyu.edu.hk/~mspaul/calculator/calculator.html
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, meta-analysis, and the interpretation of research results. Cambridge, United Kingdom: Cambridge University Press.
Ezekiel, M. (1930). Methods of correlational analysis. New York, NY: Wiley.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3–8.
Grissom, R. J. (1994). Probability of the superior outcome of one treatment over another. Journal of Applied Psychology, 79, 314–316. doi:10.1037/0021-9010.79.2.314
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad practical approach. New York, NY: Psychology Press.
Grissom, R. J., & Kim, J. J. (2011). Effect sizes for research: A broad practical approach (2nd ed.). New York, NY: Psychology Press.
Hays, W. L. (1973). Statistics for the social sciences (2nd ed.). New York, NY: Holt, Rinehart, & Winston.
Hedges, L. V. (1982). Estimation of effect size from a series of independent experiments. Psychological Bulletin, 92, 490–499. doi:10.1037/0033-2909.92.2.490
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. American Statistician, 55, 19–24. doi:10.1198/000313001300339897
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Grove, CA: Duxbury.
Hsu, L. M. (2004). Biases of success rate differences shown in binomial effect size displays. Psychological Methods, 9, 183–197. doi:10.1037/1082-989X.9.2.183
Huberty, C. J. (2002). A history of effect size indices. Educational and Psychological Measurement, 62, 227–240. doi:10.1177/0013164402062002002
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the standardized mean difference: Accuracy in parameter estimation via narrow confidence intervals. Psychological Methods, 11, 363–385. doi:10.1037/1082-989X.11.4.363
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's handbook (4th ed.). Upper Saddle River, NJ: Pearson.
Levant, R. F. (1992). Editorial. Journal of Family Psychology, 6, 3–9. doi:10.1037/0893-3200.6.1.5
Lewis, S., & Clarke, M. (2001). Forest plots: Trying to see the wood and the trees. British Medical Journal, 322, 1479–1480. doi:10.1136/bmj.322.7300.1479
Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161–171. doi:10.1111/1467-8721.ep11512376
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537–563. doi:10.1146/annurev.psych.59.103006.093735
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The case of r and d. Psychological Methods, 11, 386–401. doi:10.1037/1082-989X.11.4.386
McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111, 361–365. doi:10.1037/0033-2909.111.2.361
Morris, P. E., & Fritz, C. O. (2011). The reporting of effect sizes in cognitive publications. Manuscript submitted for publication.
Murphy, K. R. (1997). Editorial. Journal of Applied Psychology, 82, 3–5. doi:10.1037/h0092448
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: Measures of effect size for some common research designs. Psychological Methods, 8, 434–447. doi:10.1037/1082-989X.8.4.434
Richardson, J. T. E. (1996). Measures of effect size. Behavior Research Methods, Instruments & Computers, 28, 12–22. doi:10.3758/BF03203631
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge, UK: Cambridge University Press.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist, 44, 1276–1284. doi:10.1037/0003-066X.44.10.1276
Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York, NY: McGraw-Hill.
Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164–182. doi:10.1037/1082-989X.9.2.164
Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Pearson.
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481. doi:10.1037/0022-0167.51.4.473
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25, 101–132.
Wherry, R. J. (1931). A new formula for predicting the shrinkage of the coefficient of multiple correlation. Annals of Mathematical Statistics, 2, 440–457. doi:10.1214/aoms/1177732951
Wilkinson, L., & the APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604. doi:10.1037/0003-066X.54.8.594
Yates, F. (1951). The influence of statistical methods for research workers on the development of the science of statistics. Journal of the American Statistical Association, 46, 19–34. doi:10.2307/2280090
(Appendix follows)
Appendix
Table A1
Number of Participants per Group Required for t Tests to Achieve Selected Levels of Power, Based on the Anticipated Size of the Effect

         Effect size, one-tailed test     Effect size, two-tailed test
Power    Small     Medium    Large        Small     Medium    Large
         (d = .2)  (d = .5)  (d = .8)     (d = .2)  (d = .5)  (d = .8)
.25         48        8         4            84        14        6
.50        136       22         9           193        32       13
.60        181       30        12           246        40       16
.67        216       35        14           287        47       19
.70        236       38        15           310        50       20
.75        270       44        18           348        57       23
.80        310       50        20           393        64       26
.85        360       58        23           450        73       29
.90        429       69        27           526        85       34
.95        542       87        35           651       105       42

Note. Where power is .8, there is a 20% chance of failing to detect an effect. Adapted from Statistical Power Analysis for the Behavioral Sciences (2nd ed., pp. 54–55), by J. Cohen, 1988, Hillsdale, NJ: Erlbaum. Copyright 1988 by Taylor & Francis.

Table A2 is also adapted from Cohen (1988); it lists power levels for small, medium, and large effect sizes given some number of groups and participants per group. These values apply to ANOVAs and two-tailed t tests.

Power may not be the sole consideration when estimating the number of participants required. Sample means and variability provide estimates of the population parameters; the accuracy or precision of those estimates is a function of the sample size. It may be as useful or more useful in some cases to estimate the sample size required for a desired degree of accuracy in parameter estimation based on defining the maximum acceptable confidence interval width. Maxwell, Kelley, and Rausch (2008) provided an excellent discussion of power and accuracy in parameter estimation; practical guidance is available there and in other articles (e.g., Kelley & Rausch, 2006) and texts (Cumming, 2012).

Received April 1, 2011
Revision received May 15, 2011
Accepted May 15, 2011
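The tabled sample sizes can be approximated in code. The sketch below uses the common normal-approximation formula n = 2((z_alpha + z_power)/d)² per group; the function name is ours, not from the appendix, and the approximation runs slightly below Cohen's noncentral-t values for medium and large effects.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05, two_tailed=True):
    """Approximate participants per group for an independent-groups t test.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2.
    Tracks the noncentral-t values in Table A1 closely for small d and
    slightly underestimates them as d grows.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Small effect, power .80: 393 (two-tailed) and 310 (one-tailed),
# matching the corresponding Table A1 entries.
print(n_per_group(0.2))                    # 393
print(n_per_group(0.2, two_tailed=False))  # 310
# Medium effect, two-tailed: 63 here versus 64 in the table,
# because the approximation ignores the noncentral-t correction.
print(n_per_group(0.5))                    # 63
```

As the output comments note, the approximation reproduces the small-effect entries exactly and drifts by about one participant for medium effects, which is usually negligible next to the uncertainty in the anticipated d itself.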
Correction to Fritz et al. (2011)
The article “Effect Size Estimates: Current Use, Calculations, and Interpretation,” by Catherine O.
Fritz, Peter E. Morris, and Jennifer J. Richler (Journal of Experimental Psychology: General,
Advance online publication. August 8, 2011. doi:10.1037/a0024338) contained a production-related
error. The sixth equation under “Effect Sizes Specific to Comparing Two Conditions” should have
had a plus sign rather than a minus sign in the denominator. All versions of this article have been
corrected.
DOI: 10.1037/a0026092
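The measures named in the recommendations above (Cohen's d with a small-sample correction, and the eta/omega-squared family for ANOVA effects) can all be computed from summary statistics. The following is a minimal sketch; the function names are ours, and the correction uses the standard 1 − 3/(4df − 1) approximation rather than the exact gamma-function form.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def d_unbiased(d, nx, ny):
    """Small-sample corrected d, via the approximate factor 1 - 3/(4*df - 1)."""
    df = nx + ny - 2
    return d * (1 - 3 / (4 * df - 1))

def eta_squared(ss_effect, ss_total):
    return ss_effect / ss_total

def partial_eta_squared(ss_effect, ss_error):
    return ss_effect / (ss_effect + ss_error)

def omega_squared(ss_effect, df_effect, ss_error, df_error, ss_total):
    """Less biased population estimate for a between-subjects effect."""
    ms_error = ss_error / df_error
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)

x, y = [1, 2, 3, 4, 5], [2, 3, 4, 5, 6]
print(round(cohens_d(x, y), 3))  # -0.632
# In a one-way design eta-squared and partial eta-squared coincide;
# once other factors enter the model, the partial value is the larger,
# which is the inflation the recommendations warn about.
print(eta_squared(100, 1000), partial_eta_squared(100, 400))  # 0.1 0.2
```

The example sums of squares are hypothetical; in practice they come from the ANOVA table, which is one reason the recommendations ask authors to report F, df, and MSE for every effect.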
Diagnostic efficiency statistics include sensitivity, specificity, and positive and negative predictive power. In reviewing the literature on the performance of self-report questionnaires to screen for depression, we found errors in several published articles in which these statistics were computed. To determine the extent of this problem we examined all studies of the diagnostic performance of self-report scales published between 1980 and 1991 in the Journal of Consulting and Clinical Psychology and Psychological Assessment: A Journal of Consulting and Clinical Psychology. We found 26 relevant studies: 9 had an error in the calculation of diagnostic efficiency statistics and 3 made calculations based on unconventional definitions of the terms. Moreover, no study reported all 4 diagnostic statistics together with the total and chance-corrected level of agreement between the scale and the diagnostic gold standard. Recommendations for standardized reporting are suggested, and the implications of these findings are discussed.
As part of a review of the literature on the use of self-report questionnaires to screen for and diagnose depression, we were surprised to discover several mistakes in the reporting of diagnostic efficiency statistics in three studies published in the Journal of Consulting and Clinical Psychology and Psychological Assessment: A Journal of Consulting and Clinical Psychology (Gallagher, Breckenridge, Steinmetz, & Thompson, 1983; M. Hesselbrock, V. Hesselbrock, Tennen, Meyer, & Workman, 1983; Nelson & Cicchetti, 1991). To determine the extent of this problem, we examined all studies of the diagnostic performance of self-report scales published in these two journals between 1980 and 1991 for the accuracy and completeness of data presentation. Before describing our findings, we present a brief overview of how these terms are defined and calculated (for a more detailed discussion, see Baldessarini, Finkelstein, & Arana, 1983; R. Fletcher, S. Fletcher, & Wagner, 1988; Griner, Mayewski, Mushlin, & Greenland, 1981; Mausner & Kramer, 1985; Sackett, 1992).

Putative diagnostic tests of psychiatric syndromes are typically judged by their association to a "gold standard," traditionally the clinical, or structured clinical, interview. Test performance is quantified in terms of its sensitivity, specificity, positive and negative predictive power, and the absolute and chance-corrected level of agreement with the standard. These may be easily computed with certain raw data. Studies of test performance are typically presented as in Figure 1.

Sensitivity refers to a test's ability to identify correctly individuals with the illness, whereas specificity refers to the test's ability to identify non-ill persons. Sensitivity, also called the true positive rate, is the percentage of ill persons who are identified by the test as ill [a/(a + c)]. Specificity, the true negative rate, is the percentage of non-ill persons correctly identified by the test as non-ill [d/(b + d)].

Sensitivity and specificity provide useful psychometric information about a test; however, the clinically more meaningful conditional properties are positive and negative predictive values. These values indicate the probability that an individual is ill or non-ill given that the test identifies him or her as ill or non-ill. Accordingly, positive predictive value is the percentage of individuals classified by the test as ill who truly are ill [a/(a + b)], whereas negative predictive value is the percentage of individuals classified as non-ill by the test who truly are non-ill [d/(c + d)].

Kappa represents the level of agreement between the test in question and a gold standard beyond that accounted for by chance alone. There are a variety of statistics used to correct for chance agreement, kappa being most widely used. Finally, the overall correct classification rate, also known as the "hit rate" or "overall level of agreement," refers to the proportion of ill and non-ill patients correctly classified by the test [(a + d)/N].

There is marked variability across studies with regard to which statistics are reported. Each of the six statistics described before yields a different perspective of a test's performance. We believe that these statistics, together, provide a broad profile of the performance of diagnostic tests and should be included in routine reporting. However, more or less emphasis may be placed on one statistic over another, in accord with the nature of the test, the population tested, and the hypothesis of investigation.

Julie B. Kessel and Mark Zimmerman, Medical College of Pennsylvania at Eastern Pennsylvania Psychiatric Institute.
We thank James Herbert for his comments on an earlier draft of this article.
Correspondence concerning this article should be addressed to Mark Zimmerman, Medical College of Pennsylvania at Eastern Pennsylvania Psychiatric Institute, 3200 Henry Avenue, Philadelphia, Pennsylvania 19129.
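All six statistics defined above follow directly from the four cells of the 2 × 2 table. A minimal sketch (the function name and example counts are illustrative, not from the article):

```python
def diagnostic_stats(a, b, c, d):
    """Diagnostic efficiency statistics from a 2 x 2 table.

    a = ill, test positive; b = non-ill, test positive;
    c = ill, test negative; d = non-ill, test negative.
    """
    n = a + b + c + d
    overall = (a + d) / n  # overall correct classification ("hit rate")
    # agreement expected by chance, from the marginal totals
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppp": a / (a + b),   # positive predictive power
        "npp": d / (c + d),   # negative predictive power
        "overall": overall,
        "kappa": (overall - p_chance) / (1 - p_chance),
    }

# Hypothetical screen: 40 true positives, 10 false positives,
# 10 false negatives, 40 true negatives. All four conditional
# statistics come out at .80, while kappa is only .60, illustrating
# why the chance-corrected statistic is reported separately.
print(diagnostic_stats(40, 10, 10, 40))
```

Reporting the four cell counts alongside these derived values, as the authors recommend later, lets any reader rerun exactly this computation.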
396 JULIE B. KESSEL AND MARK ZIMMERMAN
Figure 1. Presentation of test performance: a 2 × 2 table of test result (positive or negative) by diagnosis (present or absent), with cells a (ill, test positive), b (non-ill, test positive), c (ill, test negative), and d (non-ill, test negative), together with the important definitions Sensitivity = a/(a + c) and Specificity = d/(b + d).

are presented in Table 1. Most investigators presented either 3 (n = 10) or 4 (n = 6) values. The overall level of agreement was calculated in 16 studies; 18 reported both sensitivity and specificity; 6 calculated the positive and negative predictive power; and kappa was calculated in 6. The specific problems in the 9 studies with errors and the 3 studies with unconventional definitions of terms are briefly summarized in the following.

Gallagher et al. (1983) examined the ability of the Beck Depression Inventory (BDI) to identify major depression in 102 elderly outpatients. Using a cutoff score of 11, they reported that the false negative rate was 8.8%. From the raw data presented in Table 1 of their article, we calculated the false negative rate to be 1.8% (1/57).

Goldston, O'Hara, and Schartz (1990) reported that the specificity of the Inventory to Diagnose Depression (IDD) was
Table 1
Data Presentation and Errors in Reporting of Diagnostic Efficiency Statistics

No errors: Clopton, Weiner, & Davis (1980); Hakstian & McLean (1989); Hodges (1990); Keane, Caddell, & K. Taylor (1988); Klein, Dickstein, E. Taylor, & Harding (1989); Kobak, Reynolds, Rosenfeld, & Greist (1990); McFall, D. Smith, Mackay, & Tarver (1990); Oliver & Simmons (1984); Rapp, Walsh, Parisi, & Wallace (1988); M. Smith & Thelen (1984); Thelen & Farmer (1991); Wolfson & Erbaugh (1984); Zimmerman & Coryell (1987).

Insufficient data: Goldberg, Shaw, & Segal (1987); Lewinsohn & Teri (1982).

Unconventional definitions: Bryer, Marlines, & Dignan (1990); M. Hesselbrock, V. Hesselbrock, Tennen, Meyer, & Workman (1983); Parmalee, Powell, & Katz (1989).

Miscalculations: Gallagher, Breckenridge, Steinmetz, & Thompson (1983); Goldston, O'Hara, & Schartz (1990); M. Hesselbrock et al. (1983); Lewis, Turtletaub, Pohl, & Rainey (1990); Nelson & Cicchetti (1991); Post & Lobitz (1980); Stukenberg, Dura, & Kiecolt-Glaser (1990); Trull (1991); Turner, Beidel, Dancu, & Stanley (1989).

Note. For each study the original table marks whether sensitivity, specificity, PPP, NPP, kappa, and the overall correct classification rate were reported. PPP = positive predictive power; NPP = negative predictive power.
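The checks reported in the surrounding text regenerate each study's 2 × 2 table from its published summary values and then recompute the derived statistics. That procedure can be sketched as follows; the numbers are hypothetical, and rounding to whole persons reflects the fact that cell counts must be integers.

```python
def table_from_summary(n, prevalence, sensitivity, specificity):
    """Rebuild 2 x 2 cell counts (a, b, c, d) from reported summary values."""
    ill = round(n * prevalence)
    non_ill = n - ill
    a = round(ill * sensitivity)        # true positives
    c = ill - a                         # false negatives
    d = round(non_ill * specificity)    # true negatives
    b = non_ill - d                     # false positives
    return a, b, c, d

def overall_agreement(a, b, c, d):
    """Overall correct classification rate, (a + d) / N."""
    return (a + d) / (a + b + c + d)

# Hypothetical report: N = 200, prevalence .30, sensitivity .90, specificity .80.
a, b, c, d = table_from_summary(200, 0.30, 0.90, 0.80)
print(a, b, c, d)                     # 54 28 6 112
print(overall_agreement(a, b, c, d))  # 0.83
```

A reviewer armed with nothing more than this handful of lines could have caught each of the miscalculations described next, which is the practical case for publishing the full 2 × 2 table.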
zich MMPI Depression scale to 162 psychiatric inpatients. They reported prevalence rates, overall level of agreement, false negative and false positive rates for each scale. We used these data to generate a 2 × 2 table for both tests. The overall correct classification rate for the MMPI (D) scale was given as 65%; by our calculation, it was 82% (133/162). The overall correct classification rate for the Mezzich scale was reported to be 71%; we computed it to be 85% (138/162).

Stukenberg, Dura, and Kiecolt-Glaser (1990) gave the BDI to 177 elderly community dwellers. They presented the sensitivity, specificity, and overall correct classification rate at different BDI cutoffs. We generated the 2 × 2 table for the BDI at a cutoff
score of 5 (actual n was 163) based on the prevalence rates, sen- review system. The calculation of diagnostic efficiency statistics
sitivity, and specificity values given by the authors. We calcu- is relatively easy to double-check. This is not true of most other
lated the overall level of agreement as 81% (132/163), not 74% statistics. It is possible that reports based on more complex sta-
as reported by the authors. tistical constructs may be flawed to an even greater degree. We
Trull (1991) administered the MMPI Borderline Personality do not have any easily implemented suggestions to deal with this
Disorder (BPD) subscale to a sample of 395 psychiatric inpa- potential problem. However, the issue of mistakes in scientific
tients. We generated a 2 X 2 table from the data given in Table communication warrants further discussion and, perhaps, in-
4 of his report (prevalence rates, sensitivity, and specificity). By vestigation.
our calculation, the negative predictive power was 81% (197/
243), not 91% as reported by Trull. 1
In the present context, in which tests are used to distinguish ill from
Finally, Turner, Beidel, Dancu, and Stanley (1989) assessed non-ill individuals, false positives are persons with the desirable out-
the performance of the Social Phobia and Anxiety Inventory come (non-ill) who are incorrectly predicted to have the undesirable
(SPAI) in groups of socially anxious college students, nonso- outcome (ill). In a different context, in which the goal is to predict posi-
cially anxious college students, and outpatient social phobics. tive outcomes such as job success, then false positives may refer to indi-
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
We generated the 2 X 2 table for the SPAI scale at a cutoff of 60 viduals with the undesirable outcome who are incorrectly predicted to
This document is copyrighted by the American Psychological Association or one of its allied publishers.
based on the authors' report that the overall correct classifica- have the desirable outcome. Although there is some variability in the
definition of these terms, in the medical literature these terms are de-
tion rate for 84 subjects, 16 of whom were correctly identified
fined as in Figure 1.
as social phobics, was 67.9%. We calculated the false negative
rate as 23.8% (5/21); the authors reported this value to be 14.3%
in Table 3 of their article. References
Three reports included calculations based on statistical terms
not defined as in Table 1 (Bryer, Marlines, & Dignan, 1990; American Psychiatric Association. (1980). Diagnostic and statistical
M. Hesselbrock et al, 1983; Parmalee, Powell, & Katz, 1989). manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (1987). Diagnostic and statistical
Parmalee et al. defined false positive rate as b/N and Bryer et al.
manual of mental disorders (3rd ed., rev.) Washington, DC: Author.
as b/(a + b).' It is conventionally defined as b/(b + d). Similarly, Baldessarini, R., Finkelstein, S., & Arana, G. (1983). The predictive
Parmalee et al. defined false negative rate as c/N, rather than c/ power of diagnostic tests and the effect of prevalence of illness. Ar-
(a + c). Finally, M. Hesselbrock et al.'s formula for specificity, chives of General Psychiatry, 40, 569-573.
b/(b + d), is actually the formula for the false positive rate. Bryer, J., Marlines, K., & Dignan, M. (1990). Millon Clinical Multiaxial
Inventory Alcohol Abuse and Drug Abuse scales and the identifica-
tion of substance abuse patients. Psychological Assessment: A Journal
Discussion
of Consulting and Clinical Psychology, 2, 438-441.
We were quite surprised to find such a high rate of errors and Clopton, J., Weiner, R., & Davis, H. (1980). Use of the MMPI in identi-
we do not have a readily apparent explanation. Several articles fication of alcoholic psychiatric patients. Journal of Consulting and
Clinical Psychology, 48, 416-417.
describing the definition and calculation of these statistics have been published in widely circulated scientific journals. This information is also available in most epidemiologic textbooks.

We recommend that a standardized reporting format be used in future articles on a test's diagnostic performance. Specifically, we suggest that the 2 × 2 table be presented as it is outlined in Figure 1, complete with the cell sizes. This will easily allow readers and reviewers to double-check calculations and to compute statistics not computed by authors. Moreover, as previously stated, we believe that sensitivity, specificity, positive and negative predictive power, overall correct classification, and kappa are important measures of test performance and clinical utility, and we suggest that they be routinely reported. However, the particular content and focus of the study should determine the emphasis placed upon one statistic over another. If certain statistics are not computed, or alternative ones calculated, the authors should state the reasons and clearly define the terms and computations.

Certainly, the problem of computational inaccuracy occurs at the level of the author's report, but peer reviewers also share responsibility. Admittedly, it was sometimes time consuming to generate the raw data of the 2 × 2 table, and in 2 studies it was not possible to do so. Nevertheless, the accuracy of calculations and the reporting of diagnostic efficiency statistics should not go unchecked.
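As an illustration (ours, not the article's), the statistics named in the text can all be recomputed from the four cells of the 2 × 2 table. Function and variable names below are assumptions; tp/fp/fn/tn denote true/false positives/negatives.

```python
# Sketch (not from the original article): the diagnostic efficiency
# statistics named in the text, computed from the four cells of the
# 2 x 2 table. tp/fp/fn/tn = true/false positives/negatives.

def diagnostic_stats(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # overall correct classification
    # chance agreement for Cohen's kappa, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive predictive power": tp / (tp + fp),
        "negative predictive power": tn / (tn + fn),
        "overall correct classification": observed,
        "kappa": (observed - expected) / (1 - expected),
    }

stats = diagnostic_stats(tp=40, fp=10, fn=5, tn=45)
print(round(stats["kappa"], 3))   # 0.7
```

Publishing the four cell sizes, as the authors recommend, is precisely what makes this kind of recomputation possible for readers and reviewers.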
Our findings raise some disturbing questions about the peer

ERRORS IN REPORTING 399

Received September 9, 1992
Revision received January 20, 1993
Accepted January 20, 1993
Gerardo Matturro
(Universidad ORT Uruguay, Montevideo, Uruguay
matturro@uni.ort.edu.uy)
Florencia Raschetti
(Universidad ORT Uruguay, Montevideo, Uruguay
florencia.raschetti@gmail.com)
Carina Fontán
(Universidad ORT Uruguay, Montevideo, Uruguay
cfontan@gmail.com)
Abstract: To participate in software development projects, team members may need to perform
different roles and be skilled in diverse methodologies, tools and techniques. However, other
skills, usually known as "soft skills", are also necessary. We report the results of a systematic
mapping study to identify existing research on soft skills in software engineering and to
determine what soft skills are considered relevant to the practice of software engineering. After
applying an explicit mapping protocol, 44 papers were finally selected, and 30 main categories
of soft skills were identified. At least half of the studies selected mention five skills:
communication, teamwork, analytical, organizational, and interpersonal skills. We also
identified the data collection methods commonly used for research on this topic: job
advertisements and surveys were the main ones. The results of this work are of interest to
researchers in human aspects of software engineering, to those responsible for Human Resource
in software development companies, and to curriculum designers in careers related to software
engineering and development.
1 Introduction
Software development is a highly technical activity that requires people to have
knowledge and experience in diverse software processes, methodologies, tools and
techniques, but also to perform various functions in software projects. When software
companies assemble project teams or hire new professionals, they often tend to
emphasize the knowledge and technical skills of potential candidates. However, the
human dimension may be as critical as technical capacity [Acuña, 06]. When people
work together on a software project, other skills are necessary to implement activities
such as communicating and interacting with other team members and stakeholders in
the project, managing time, presenting progress of the project, negotiating with the
customer, solving problems and making decisions, among others.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 17
According to Capretz, software professionals should delve into these nontechnical
issues and recognize that the people involved in the software development process are
as important as the processes and the technology itself [Capretz, 14]. The reason for
addressing these human factors is mainly the recognition that software engineers
could benefit from greater awareness of themselves and of others' perspectives to develop their
soft skills, which in turn can positively influence their work [Ahmed, 13]. Even
though soft skills often play a critical role in career advancement, many professionals,
especially engineers and other highly technical people, pay little attention to this fact
[Chou, 13]. Soft skills are also mentioned in the literature as "non-technical skills",
"people skills", “transferable skills”, "social skills", or "generic competencies".
The purpose of this paper is to report the results of a systematic mapping of
literature regarding soft skills in software engineering. A systematic mapping or
mapping study is a form of secondary study intended to identify and classify the set of
publications on a topic. Its value lies partly in identifying areas where there is scope
for a fuller review (the group of related studies), and also to find out in which areas
there may be a need for more primary studies [Kitchenham, 16].
As mentioned by Batteson et al. [Batteson, 16], two research methods have
prevailed within the literature regarding the study of soft skills. The first approach
seeks to identify discrete skills considered soft skills. This method typically involves
eliciting lists of soft skills from relevant stakeholders in a given domain, through
surveys and interviews. The other research approach commonly seen starts with an
existing list of soft skills and tests them in relation to some capacity, such as
determining those skills which are most likely to predict performance, or testing the
agreement of importance of different skills across different participant groups.
In this work we took the first approach: our objective is to identify discrete skills
that are considered soft skills in the domain of software engineering, starting from
studies included in the literature mapping. Besides, we also want to know how often
the identified skills are mentioned in the reviewed literature and what research
methods are usually used in researching this topic.
We postulate that, if software engineers are to develop soft skills and relate them
to software projects roles and development activities, soft skills need to be clearly
articulated and defined. Using the list of soft skills gathered from the studies included
in our mapping, the second research approach mentioned above may be used by
ourselves and other researchers and stakeholders to continue the study of this topic.
Thus, we consider the results of this work particularly relevant to: a) researchers
interested in the human aspects of software engineering, b) managers responsible for
Human Resource in software development companies and team leaders of software
development projects, c) curriculum designers of study programs related to software
development and information technology, d) students and professionals.
The rest of this paper is organized as follows. In section 2 we provide background
information about what is meant by the term "soft skills" and its relevance to software
engineering professional practice and education. In section 3 we describe the research
method followed to perform this mapping study. In Section 4 we report the results of
the analysis of the selected studies included in the mapping and answer our research
questions. A discussion of those results is presented in Section 5 and threats to
validity are presented in Section 6. Finally, conclusions and further work are
presented in Section 7.
2 Background
In this section we present a brief review of general literature on the subject of soft
skills. We highlight the lack of a single definition, review three typical approaches to
conceptualize the term and identify common components, in order to develop a
working definition for this study. In addition, we argue for the relevance of soft skills
in software engineering practice and education, and we describe the related work.
For this study, we will refer to "soft skills" as the combination of the abilities,
attitudes, habits, and personality traits that allow people to perform better in the
workplace, complementing the technical skills required to do their jobs and
influencing the way they behave and interact with others.
One final aspect we want to comment on, which has been pointed out by some authors, is the apparent difficulty of assessing or measuring soft skills [Bhatnaga, 12], [Tulgan, 15]. According to Thomas, the most common, and highly effective, method for assessing non-technical skills is a behavioral marker system [Thomas, 18].
Such a system is defined as a framework that sets out observable, non-technical
behaviors that contribute to superior or sub-standard performance within a work
environment [Klampfer, 01]. Behavioral markers can be used in any domain where
behaviors relating to job performance can be observed. In the domain of software
engineering, one such system has been proposed by Lacher and colleagues to
measure a set of non-technical skills of software professionals [Lacher, 15].
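A behavioral marker system in the sense of [Klampfer, 01] can be sketched in a few lines. The following is purely illustrative and is NOT the instrument of [Lacher, 15]: the skill name, the observable behaviors, and the three-point scale are all our assumptions.

```python
# Hypothetical sketch of a behavioral marker system: a named non-technical
# skill, a set of observable behaviors that define it, and a simple rating
# scale from sub-standard to superior performance. All names are invented.

from dataclasses import dataclass, field

RATING_SCALE = {1: "sub-standard", 2: "adequate", 3: "superior"}

@dataclass
class Marker:
    skill: str        # e.g. "communication"
    behaviors: list   # observable behaviors that define the skill
    ratings: dict = field(default_factory=dict)  # behavior -> score 1..3

    def rate(self, behavior: str, score: int) -> None:
        # only defined behaviors can be rated, and only on the scale
        assert behavior in self.behaviors and score in RATING_SCALE
        self.ratings[behavior] = score

    def summary(self) -> str:
        mean = sum(self.ratings.values()) / len(self.ratings)
        return f"{self.skill}: mean rating {mean:.1f}"

m = Marker("communication", ["shares progress with the team",
                             "asks for clarification when requirements are unclear"])
m.rate("shares progress with the team", 3)
m.rate("asks for clarification when requirements are unclear", 2)
print(m.summary())   # communication: mean rating 2.5
```

The essential property, as the definition above states, is that the markers are observable behaviors rather than inferred traits, so they can be rated in any domain where job performance can be watched.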
3 Research Method
Based on the guidelines provided in [Kitchenham, 16], the steps taken for this
mapping study were: 1) Definition of research questions, 2) Search of the relevant
literature, 3) Selection of relevant studies, 4) Data extraction, and 5) Data aggregation and
synthesis.
• RQ1: What are the soft skills considered relevant to the practice of software
engineering?
• RQ2: What are the data sources or research methods used to identify those
soft skills?
Finally, due to the vagueness of the concept of soft skill and the lack of a unified
definition (as explained in Section 2), the third research question was:
The criteria for study inclusion and exclusion defined for this study were:
• Inclusion: a) journal articles and conference proceedings records without
considering specific publication dates, b) articles presenting results of
empirical studies specifically related to software engineering. In order to be
included in the mapping, a study must meet both criteria.
• Exclusion: a) articles published in journals or conference proceedings not
refereed, b) articles that referred to studies about soft skills in ICT
generically, technical support, hardware or software installation or
maintenance, or technology infrastructure, c) articles in which the data
sources or the data collection procedures were not specified, d) items based
on expert or author opinion (position papers), summaries of articles, book
prefaces, journal editorials, readers’ letters, summaries of workshops,
tutorials, and poster sessions. Any paper that met at least one of these criteria
was excluded from the map.
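The screening rule implied by these criteria can be sketched as a predicate: a candidate record is kept only if it satisfies every inclusion criterion and no exclusion criterion. The record fields and predicates below are simplified illustrations, not the authors' actual screening form.

```python
# Sketch of the selection step: keep a record only if it meets all
# inclusion criteria and none of the exclusion criteria. Field names
# are illustrative assumptions.

def include(record: dict) -> bool:
    inclusion = [
        # a) journal articles and conference papers, any publication date
        record["venue_type"] in {"journal", "conference"},
        # b) empirical studies specifically about software engineering
        record["empirical"] and record["domain"] == "software engineering",
    ]
    exclusion = [
        record["refereed"] is False,                     # a) non-refereed outlets
        not record["data_sources_specified"],            # c) methods not specified
        record["kind"] in {"position paper", "summary",  # d) opinion pieces,
                           "editorial", "letter",        #    editorials, posters...
                           "tutorial", "poster"},
    ]
    return all(inclusion) and not any(exclusion)

candidate = {"venue_type": "journal", "empirical": True,
             "domain": "software engineering", "refereed": True,
             "data_sources_specified": True, "kind": "full paper"}
print(include(candidate))   # True
```

Note that exclusion criterion b) (generic ICT, support, or infrastructure studies) is folded into the domain check of inclusion criterion b) in this simplified sketch.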
Data extraction from each paper was done independently by two of the authors
and then compared to verify that no datum was missing, and that all soft skills
mentioned had been recorded in the data extraction form.
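The cross-check described above amounts to comparing the two reviewers' extractions and inspecting their symmetric difference; a minimal sketch, with skill names invented for illustration:

```python
# Sketch of independent double extraction: skills recorded by one
# reviewer but not the other are the disagreements to be resolved.

def extraction_disagreements(reviewer_a: set, reviewer_b: set) -> set:
    return reviewer_a ^ reviewer_b  # symmetric difference

a = {"communication", "teamwork", "analytical"}
b = {"communication", "teamwork", "leadership"}
print(sorted(extraction_disagreements(a, b)))   # ['analytical', 'leadership']
```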
Data extraction was performed manually. The decision not to use software tools for automatic extraction was based on the results of a study by Marshall and Brereton, which concludes that most of the tools they identified were in the initial stages of development and use, and that there was therefore very little empirical evidence of their effectiveness [Marshall, 13].
4 Results
In this section we present the outcomes of the search process, the list of papers
selected after applying the inclusion/exclusion criteria, and the answers to the
research questions.
The figures in the “Papers found” column consider all the papers retrieved, before
applying the inclusion/exclusion criteria.
4.2 Studies selected for the review
From the extensive lists of studies obtained through the searches, a selection of the
relevant ones was done by reading and analysing the title, abstract and keywords of
each article, discarding the ones that were clearly unrelated to the research subject and
those that were duplicated. Then, the studies selected in the previous step were
carefully reviewed by reading the Introduction, Methodology section, Results section,
and Conclusion and applying the inclusion and exclusion criteria presented in Section
3.3. If reading the above items was not enough to decide whether to include it or not,
the study was read in its entirety.
After discarding those items that did not meet the inclusion/exclusion criteria and
those that, although containing some of the search keywords, were not directly related
to the focus of the investigation, the final set was reduced to 44 papers. Of them, 43
are written in English and one in Portuguese.
References to the selected papers, classified by year of publication, are shown in
Table 3. Full bibliographic data of these studies is given in Table 4.
Interest in studying the subject of soft skills in software engineering appears
to have increased since 2009, as 34 studies were reported between that year and 2017.
After grouping the soft skills, we counted how many times the skills in each
category are mentioned in the selected papers, as shown in Table 5. Column “%”
indicates the percentage of the selected papers that mention a soft skill included in the
respective category.
Table 5: Main categories of soft skills and number of times they appear mentioned in
the selected papers
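The counting behind Table 5 can be sketched as follows. The mapping from papers to skill categories is invented for illustration, and because one paper can mention several categories, the percentages need not sum to 100.

```python
# Sketch of the Table 5 tally: for each category, how many selected
# papers mention it, and that count as a percentage of all papers.
# The papers-to-categories mapping below is invented.

from collections import Counter

papers = {
    "p1": {"communication", "teamwork"},
    "p2": {"communication", "analytical"},
    "p3": {"teamwork", "interpersonal"},
    "p4": {"communication"},
}

mentions = Counter(skill for skills in papers.values() for skill in skills)
for skill, count in mentions.most_common():
    print(f"{skill}: {count} papers ({100 * count / len(papers):.0f}%)")
# communication is mentioned in 3 of the 4 papers (75%)
```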
We also found many other soft skills that appear mentioned just one or two times
in the selected studies. What follows is a partial list of these other soft skills:
administration skills, appearance, ability to understand diversity, ability to
visualize/conceptualize, ability to apply knowledge, ability to multitask, ability to
give and receive constructive criticism, being persistent, business skills, coaching,
conducting investigations, cooperates with people with different personalities, race or
gender, courage, entrepreneurship, credibility, interviewing skills, role playing skills,
moderation, efficiency, sales, diplomacy, professionalism, follow directions, setting
Several papers report the use of more than one of these methods to collect data;
that is why column "%" does not sum to 100. For example, in [p37], the authors report the use of literature reviews and of focus groups consisting of employers, software engineering and computer science industry professionals, and instructors.
Another example is [p3], which reports the use of site interviews, focus groups, and a
web-based survey as data collection methods.
5 Discussion
The primary purpose of the present mapping study was to identify existing research
on soft skills in software engineering and to determine what soft skills are considered relevant to the practice of software engineering. After applying an explicit mapping
protocol, 44 papers were finally selected for further analysis and the lists of the soft
skills mentioned in them were extracted and grouped in 30 categories, as shown in
Table 5.
To create those categories, we followed the procedure described in sub-section 3.5, and thus it is debatable whether they represent distinct and independent soft skills. In fact,
we recognize that some overlap may exist between some categories.
For example, “presentation skills” (the skills needed to deliver an effective
presentation to a variety of audiences) require "organizational skills" to prepare and
organize what to deliver in the presentation, “interpersonal skills” to create empathy
with the audience, “decision-making skills” to decide what material to include in the
presentation, and “communication skills” to adequately transmit what is intended to
present. Similarly, conflict management requires negotiation and problem-solving
skills as well as oral communication and listening skills.
In aggregating the skills into those distinct categories, we wanted to outline a list
of discrete soft skills in order to advance in the study of their characteristics and their
relationships to the practice of software engineering. Other disciplines besides
software engineering, such as psychology, sociology, and human resource
management, can contribute to better define or conceptualize the set of soft skills and
shed light on their relevance and influence in software engineering practice.
An analysis of the data collection and research methods used in the selected
studies indicates that they rely mostly on job advertisements, followed by surveys of
professionals and practitioners, as shown in Table 6. Job ads, published in newspapers
or in job portals, are one of the preferred ways software companies use to advertise job positions when they need to recruit new talent. These ads should reflect what the industry demands for job positions in software engineering.
The data in Table 6 reveal that the primary studies included in the mapping have
mainly taken the first research approach described in the Introduction, that is, the
elicitation of lists of discrete skills considered soft skills from relevant stakeholders
[Batteson, 16].
On the other hand, Clarkson mentions that the tasks that technical people carry
out fall into three rough groupings: those done primarily as individuals, those done
primarily with other people and those done primarily as leaders of a team. Each
grouping gives rise to a different type of interaction and thus a different set of soft
skills [Clarkson, 01].
In our opinion, here is where our mapping study shows a gap in existing research
that establishes the need for more primary studies to move from collecting lists of soft
skills to studying what different sets of soft skills are required for each grouping of
tasks during software projects, and their incidence in the general practice of software
engineering.
Thus, we consider the results of this mapping study to be of value for graduate
students and researchers interested in the human aspects of software engineering. Our
results can be taken as a starting point to frame an investigation on the subject within
its general context, such as a prior analysis of the state of the art (a fundamental part
of the process of preparing any academic work), and to plan and develop new studies
to further determine the impact that those skills have on main drivers of software
project success, such as teamwork, interpersonal communication, decision making
and problem solving.
In the Introduction we argued that the results obtained are also relevant to Human
Resource managers in software development companies and team leaders of software
development projects, and to curriculum designers in careers related to software
development and information technology.
As stated above, the main data collection method used in the selected studies was
job advertisements published in newspapers and in job portals. What those ads reflect is what software companies ask of new hires, but there is no indication of whether or
how those skills are assessed in hiring decisions or evaluated later on, while people
are working for the company.
If we assume that companies do really assess those skills at hiring time or they
periodically do so later, then those responsible for recruitment and selection can use
the results of this study to determine what kind of soft skills are demanded by peer
organizations, and take them into consideration along with other organizational
aspects such as organizational culture, characteristics of the software development
projects they usually run, and the values and skills of other members of the
organization and their projects teams.
On the other hand, it seems reasonable to think that any software project manager
or team leader will prefer team members that are able to work harmoniously in the
team, make decisions and solve problems, negotiate and manage conflicts successfully,
communicate well and establish good interpersonal relationships.
Thus, even if we do not make the above assumption, the results of this study are
still useful for Human Resource managers and software projects team leaders to be
aware of what non-technical skills are usually demanded by the industry and to
identify which of them may be suitable for their environment. As argued by Capretz,
Ahmed and da Silva, it is impossible to exclude human factors from software
engineering experience during software development because software is developed
by people and for people [Capretz, 17].
Regarding the usefulness of the results for curriculum designers of study
programs related to software development and information systems, Capretz and
Ahmed affirm that at present very few programs in software engineering touch on the
topics of teamwork and the evaluation of soft skills [Capretz, 18]. According to these
authors, it is even difficult to find a university that has a full course on the human
aspects of software engineering, and they consider it unfortunate that soft skills topics are far from being part of conventional software engineering education.
Here, we can raise a question: if soft skills are a concept so difficult to define,
how can they be taught?
From the discussion given in Section 2, although it is hard to have a single,
unified definition of the concept of soft skills itself, there are approximations to the
notion of soft skills, and there is also agreement on several specific soft skills. In this
sense, for Clarkson, soft skills are like any other skills. He considers that we can teach
the techniques, but individuals must learn the skill by themselves; they must develop
familiarity and ease with the techniques, and they must adapt their own behavior to
give appropriate responses to new situations [Clarkson, 01]. Under this consideration,
one of the challenges is to choose the appropriate teaching or training method that
gives an individual the opportunity to develop the new skills, within a context that is
close enough to the job he will perform in the labor market. Examples of novel
approaches to teaching soft skills are given in [Dell’Aquila, 17] where the authors
present and discuss several concrete experiences of educational games and training
tools applied to a variety of soft skills, such as negotiation, decision-making,
leadership and problem solving.
In Rao's opinion, there must be an effective coordination between academia,
students, industry and principals of educational institutions to improve this type of
skills among students [Rao, 14], because these skills, in his opinion, improve the
employability of professionals. Therefore, knowing what the most demanded soft
skills in the practice of software engineering are, is of interest for undergraduate
students who will enter the labor market, and also for graduate professionals seeking
to advance their professional careers. In this sense, Richter and Dumke affirm that
about 80% of people who fail at work do not fail because of lacking technical skills,
but rather because of their inability to relate or communicate well with other people in
a team [Richter, 15]. Communication skills and teamwork are the two most often
mentioned soft skills, as shown in Table 5.
One final aspect we want to discuss is the finding that only 4 papers present
definitions of specific soft skills (Table 7), some of which are difficult to interpret (for example, interpersonal skills as in [p28] or critical thinking
as in [p20]). To advance in the study of soft skills it will be necessary to better
characterize them and use those characterizations in new research to test their value
and incidence in the practice of software engineering.
6 Threats to validity
Several threats to validity have been identified for this systematic review.
First, the keywords used in the search strings as alternative names to "soft skills"
(sub-section 3.2) may not be all the possible options. In our case, we used the
different names found in the literature when writing the background section.
Second, we only accessed the set of databases that were available to us
(subsection 3.2). There are other bibliographic databases and therefore we could have
missed some important studies about the subject.
Third, because of the lack of a formal and unified definition of what a "soft skill"
is, as described in Section 2, it is arguable whether some of the skills reported as
"soft" are actually soft skills. In this case, we counted as "soft skills" all the skills
mentioned as such in the selected studies.
Finally, the soft skills grouped in the categories presented in section 4 are the
most mentioned in the papers included in the mapping. As the main data sources used
in those papers were job ads, surveys and interviews, it is unclear how strong the
correlation is between being mentioned in those sources and being of value to the
practice of software engineering, a topic that deserves more research.
References
[Aamodt, 16] Aamodt, M.: Industrial/Organizational Psychology: An Applied Approach,
Boston, Cengage Learning, 2016.
[Acuña, 06] Acuña, S., Juristo, N. Moreno, A. M.: Emphasizing human capabilities in software
development, IEEE Software., Vol. 23, No. 2, pp. 94–101, 2006.
[Ahmed, 13] Ahmed, F., Capretz, L. F., Bouktif, S., Campbell, P.: Soft Skills and Software
Development: A Reflection from Software Industry, International. Journal of Information
Processing and Management, Vol. 4, No. 3, pp. 171–191, May 2013.
[Bancino, 07] Bancino, R., Zevalkink, C.: Soft skills: The New Curriculum for Hard-Core
Technical Professionals, Techniques: Connecting Education and Careers, Vol. 82, No. 5, pp.
20-22, 2007.
[Batteson, 16] Matteson, M. L., Anderson, L., Boyden, C.: "Soft Skills": A Phrase in Search of
Meaning. Portal: Libraries and the Academy Vol. 16, No. 1, pp. 71-88, 2016.
[Bhatnaga, 12] Bhatnaga, N.: Effective communication and soft skills. New Delhi, Dorling
Kindersley, 2012.
[Beecham, 08] Beecham, S., Baddoo, N., Hall, T., Robinson, H., Sharp, H.: Motivation in
Software Engineering: A systematic literature review, Information and Software Technology,
50(9), 2008, pp. 860-878.
[Capretz, 14] Capretz, L. F.: Bringing the Human Factor to Software Engineering, IEEE
Software, 31(2), pp. 102–104 (2014).
[Capretz, 17] Capretz, L. F., Ahmed, F., da Silva, F.: Soft sides of software, Information and
Software Technology, 92, pp. 92-94, 2017.
[Capretz, 18] Capretz L. F., and Ahmed, F.: A Call to Promote Soft Skills in Software
Engineering, Psychol Cogn Sci Open J., 4(1), 2018.
[Chou, 13] Chou, W.: Fast-tracking your career. Soft skills for engineering and IT
professionals, Wiley, Hoboken: NJ, 2013.
[Clarkson, 01] Clarkson, M.: Developing IT staff. A practical approach. London, Springer-
Verlag, 2001.
[Dell’Aquila, 17] Dell’Aquila, E., Marocco, D., Ponticorvo, M., di Ferdinando, A., Schembri,
M., Miglino, O.: Educational Games for Soft-Skills Training in Digital Environments,
Switzerland, Springer, 2017.
[Finch, 13] Finch, D., Hamilton, L., Baldwin, R., Zehner, M.: An exploratory study of factors
affecting undergraduate employability, Education+Training, Vol. 55, No.7, pp. 681-704, 2013.
[Goldberg, 14] Goldberg, D. M., Rosenfeld, M.: People-Centric Skills. Interpersonal and
communication skills for auditors and business professionals. Hoboken: NJ, Wiley, 2014.
[Hillage, 98] Hillage, J., Pollard, E.: Developing a framework for policy analysis, No. 85,
Institute for Employment Studies, UK Department of Education and Employment, 1998.
[IEEE, 90] IEEE Standard Glossary of Software Engineering Terminology, IEEE std 610.12-
1990, 1990.
[Iriarte, 17] Iriarte, C., Bayona, S.: Soft skills in IT Project success: A systematic literature
review, 6th International Conference on Software Process Improvement, Zacatecas, Oct. 18-20,
2017, pp. 147-160.
[Kamin, 13] Kamin, M.: Soft skills revolution. A guide for connecting with compassion for
trainers, teams, and leaders, Pfeiffer, (2013).
40 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...
[Kitchenham, 16] Kitchenham, B. A., Budgen, D., Brereton, P.: Evidence-Based Software
Engineering and Systematic Reviews, CRC Press, Boca Raton, 2016.
[Klampfer, 01] Klampfer, B., Helmreich, R., Hausler, B., Sexton, B., Fletcher, G., Field, P.,
Staender, S., Lauche, L., Dieckmann, P., Amacher, A.: Enhancing performance in high risk
environments: Recommendations for the use of behavioral markers, Behavioral Markers
Workshop, Swissair Training Centre, Zurich, 2001.
[Klaus, 08] Klaus, P.: The hard truth about soft skills. Workplace lessons smart people wish
they’d learned sooner. Harper Collins, 2008.
[Lacher, 15] Lacher, L., Walia, G., Fagerholm, F., Pagels, M., Nygard, K., Münch, J.: A
Behavior Marker tool for measurement of the Non-Technical Skills of Software Professionals:
An Empirical Investigation, 27th International Conference on Software Engineering and
Knowledge Engineering (SEKE 2015), Pittsburgh, 2015.
[Marshall, 13] Marshall, C., Brereton, P.: Tools to Support Systematic Literature Reviews in
Software Engineering: A Mapping Study, in 2013 ACM / IEEE International Symposium on
Empirical Software Engineering and Measurement, 2013, pp. 296–299.
[Naiem, 15] Naiem, S., Abdellatif, M.: Evaluation of Computer Science and Software
Engineering Undergraduate’s Soft Skills in Egypt from Student’s Perspective, Computer and
Information Science, 8(1), 2015.
[Prince, 13] Prince, E. S.: The advantage. The 7 soft skills you need to stay one step ahead.
Financial Times Press, 2013.
[Radermacher, 13] Radermacher, A., Walia, G.: Gaps Between Industry Expectations and the
Abilities of Graduates, Proceeding of the 44th ACM Technical Symposium on Computer
Science Education (SIGCSE 2013), Denver, March 6–9, pp. 525-530, 2013.
[Ramesh, 10] Ramesh, G., Ramesh, M.: The ACE of soft skills. Attitude, communication and
etiquette for success. New Delhi, Dorling Kindersley, 2010.
[Rao, 10] Rao, M. S.: Soft skills enhancing employability. New Delhi, I. K. International
Publishing House, 2010.
[Rao, 14] Rao, M. S.: Enhancing employability in engineering and management students
through soft skills, Industrial and Commercial Training, 46 (1), pp. 42-48, 2014.
[Richter, 15] Richter, K., Dumke, R.: Modeling, Evaluation, and Predicting IT Human
Resources, CRC Press, Boca Raton, 2015.
[Sedelmaier, 15] Sedelmaier, Y., Landes, D.: SWEBOS – The Software Engineering Body of
Skills, International Journal of Engineering Pedagogy, 5(1), pp. 20-26, 2015.
[Starkweather, 11] Starkweather, J. A., Stevenson, H. S.: IT hiring criteria vs. valued IT
competencies, Managing IT Human Resources: Considerations for Organizations and
Personnel, IGI Global, Hershey, 2011.
[Thomas, 18] Thomas, M.: Training and Assessing Non-Technical Skills: A Practical Guide,
Boca Raton, CRC Press, 2018.
[Tulgan, 15] Tulgan, B.: Bridging the soft skills gap. How to teach the missing basics to
today’s young talent. Hoboken: NJ, Jossey-Bass, 2015.
[Verma, 09] Verma, S.: Soft skills for the BPO sector. New Delhi, Dorling Kinderley, 2009.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 41
[Zurita, 16] Zurita, G., Baloian, N., Pino, J., Boghosian, M.: Introducing a Collaborative Tool
Supporting a Learning Activity Involving Creativity with Rotation of Group Members, Journal
of Universal Computer Science, 22 (10), pp. 1360-1379, 2016.
James DA Parker
Trent University, Canada
Donald H Saklofske
University of Western Ontario, Canada
Kateryna V Keefer
Trent University, Canada
Abstract
Much of the work on predicting academic success in postsecondary education has
focused on the impact of various cognitive abilities, although in recent years there has
been increased attention to the role played by emotional and social competency (also
called emotional intelligence (EI)). Previous work on the link between EI and giftedness is
reviewed, particularly factors connected to the successful transition to postsecondary
education. Data are presented from a sample of 171 exceptionally high-achieving
secondary students (high school grade-point average of 90% or better) who completed a
measure of trait EI at the start of postsecondary studies and who had their academic
progress tracked over the next 6 years. High-achieving secondary students who
completed an undergraduate degree scored significantly higher on a number of EI
dimensions compared to the secondary students who dropped out. Results are discussed
in the context of the importance of EI in the successful transition from secondary to
postsecondary education.
Keywords
giftedness, emotional intelligence, post secondary, achievement
Corresponding author:
James DA Parker, Department of Psychology, Trent University, Peterborough, Ontario K9 J 7B8, Canada.
Email: jparker@trentu.ca
184 Gifted Education International 33(2)
The transition to adulthood is a critical life period that has far-reaching personal, social,
and economic implications (Arnett, 2000). Important markers of the transition to
adulthood include completing an education, becoming financially self-sufficient, living
independently, establishing one’s own family, and attaining psychosocial stability
(subjective feelings of self-esteem, belongingness, and life satisfaction). For individuals
who pursue postsecondary education, the transition from high school to university or
college is almost universally perceived as a highly stressful experience (Perry et al.,
2001), with stress levels typically rated higher in first year compared to subsequent years
(Ross et al., 1999). As one consequence, the highest dropout from university and college
typically takes place during the first year of study (Pancer et al., 2000). The reasons
reported by students for withdrawing or transferring from a specific university or college
are consistently linked to the stress of making the transition from high school that
includes settling on the ‘‘right’’ academic program, financial concerns, health problems,
and personal issues (Pancer et al., 2000). The latter reason (personal issues) is often the
most common, involving issues like problems making new friends, difficulties being
away from existing friends and family, and developing appropriate work habits for the
new learning environment (Parker et al., 2004, 2006).
While less is known about the successful transition to university for gifted students
(Rinn and Plucker, 2004), there are important empirical hints that the key issues are quite
similar to other groups of students. While intellectually gifted students do not comprise a
homogeneous group and may have experienced different types of academic programs
and accommodations prior to postsecondary entry (see Schwean et al., 2006), researchers
have reported that factors like the need to make new friends, adjusting to changes with
existing relationships, as well as adjusting to a qualitatively different learning
environment are also key obstacles for gifted students making the successful transition to
university and college (Gómez-Arízaga and Conejeros-Solar, 2013; Hammond et al.,
2007; Muratori et al., 2003). This prior research on gifted students can only be
suggestive, however, since it has a number of methodological limitations. Previous research has
typically assessed academic success over quite narrow timelines (e.g. single terms), or
compromised the interpretability of results by combining into a common data set full-
and part-time students, young adults and mature students, and students at different stages
of the transition process (e.g. first-year students vs. students about to graduate from
university). The aim of the present study was to explore the predictors of successful
transition to postsecondary education in a relatively homogeneous sample of
exceptionally high-achieving secondary students.
postsecondary students from the United States (Parker et al., 2005) and the United
Kingdom (Qualter et al., 2009).
To help explain this pattern of results for academic achievement, it is worth noting
that EI has been linked to a number of positive indicators in postsecondary settings,
including fewer physical fatigue symptoms (Brown and Schutte, 2006; Thompson et al.,
2007), better overall adjustment and life satisfaction (Extremera and Fernández-Berrocal,
2006; Saklofske et al., 2003), and less social anxiety and loneliness (Summerfeldt
et al., 2006). Overall, it would appear that students who have higher EI experience
more positive social support and tend to use more positive and adaptive coping strategies
(Austin et al., 2010; Saklofske et al., 2007).
Giftedness and EI
Past research on the relationship between EI and giftedness has produced a very mixed
set of findings. Some researchers have reported that levels of EI-related competencies
are quite similar in samples of gifted individuals and their typically developed peers
(Morawska and Sanders, 2008; Schwean et al., 2006), while others have reported gifted
students to be more vulnerable than other groups of students to social and emotional
problems like depression, shyness, and poor peer relationships (e.g. Plucker and Levy,
2001; Silverman, 1993; Wellisch and Brown, 2012). To make generalizations about the
link between EI and giftedness even more difficult, there is yet a third line of research
that suggests gifted students might actually be better prepared to cope with emotional
and social problems than their typically developed peers (Eklund et al., 2015; Neihart
et al., 2002).
There are certainly multiple reasons for the inconsistent findings in research on the
association between EI and giftedness. A key factor is likely the different operational
definitions that have been used to identify gifted and typically developed individuals,
with some studies placing emphasis on academic achievement versus cognitive abilities
(Martin et al., 2009). Zeidner et al. (2005) have also suggested that the methods used to
assess EI may play a key role in the inconsistent findings on the association between EI
and giftedness. In their work with high school students, Zeidner et al. (2005) found that
gifted students scored significantly higher than typically developed students only when
ability measures of EI were used.
The differential pattern of results reported by Zeidner et al. (2005) also hints at a
broader explanation for the lack of consistency in the EI and giftedness literature. When
giftedness is defined via extreme scores on traditional intelligence measures, the issue of
EI differences becomes quite moot. Theoretically, proponents of both ability and trait
models propose that EI is not related to conventional intelligence (Austin et al., 2008);
researchers developing tools connected to both ability and trait models (e.g. Bar-On,
1997; Mayer et al., 1999, 2002) have gone to great lengths to demonstrate that their
measures correlate very weakly with cognitive measures. However, it is worth noting
that the Mayer-Salovey-Caruso Emotional Intelligence Test (the ability measure used in
Zeidner et al., 2005) has been found to correlate moderately with measures of cognitive
intelligence (unlike trait EI measures; see Webb et al., 2014), which may account for its
positive correlation with giftedness.
Parker et al. 187
Present study
While EI-related abilities may ultimately not be different in gifted populations, an
interesting empirical question is whether the impact of EI in students making the
transition to university can be generalized to gifted or high-achieving secondary students. To
date, this issue has not been systematically examined (Gómez-Arízaga and
Conejeros-Solar, 2013). The present study used a sample of 171 exceptionally high-achieving
secondary students (GPA of 90% or better in high school) who completed a measure
of trait EI at the start of their postsecondary studies. Following a procedure used by
Keefer et al. (2012), students’ academic progress was subsequently tracked over the next
6 years to see if the EI variables could be used to distinguish between high-achieving
secondary students who withdrew from the university and those students who completed
an undergraduate degree.
Method
Participants
The sample consisted of 171 undergraduate students (26 men, 145 women) enrolled at a
medium-sized university in Central Ontario, Canada. In terms of ethnicity, the majority of
the participants (94.1%) identified themselves as White/Caucasian, 1.8% as Asian, and
the remaining 4.1% represented a mix of other cultural backgrounds. Participants were
on average 18.9 years of age (SD = 0.62) at the time they started their postsecondary
education. All of the participants had been exceptionally high-achieving high school
students, with GPAs used for admittance to the university at 90.0% or higher.
Measures
Participants completed the EQ-i: S (Bar-On, 2002). This self-report tool contains 35
items measuring EI competencies in four domains: interpersonal (10 items),
intrapersonal (10 items), adaptability (7 items), and stress management (8 items). Scores on the
four EI subscales can be further summed to provide a total EQ score. Higher scores on
the measures reflect higher levels of EI. With the participants’ consent, the following
information was also obtained from the university registrar’s records: high school GPA
(measured on a 100% scale); registration status at a 6-year follow-up (graduated vs.
withdrew).
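The scale structure just described lends itself to a short scoring sketch. This is an illustrative reconstruction only: the item-to-subscale groupings below follow the counts given in the text (10/10/7/8 = 35 items), but the actual item order and the assumed 1–5 response scale are not the published EQ-i: S scoring key.

```python
# Hypothetical scoring sketch for a 35-item, four-subscale instrument like the
# EQ-i: S. Item groupings follow the counts in the text; the item order and
# the 1-5 response scale are illustrative assumptions, not the published key.
SUBSCALES = {
    "interpersonal":     range(0, 10),
    "intrapersonal":     range(10, 20),
    "adaptability":      range(20, 27),
    "stress_management": range(27, 35),
}

def score_eqi_s(responses):
    """responses: 35 item ratings (assumed 1-5). Returns (sums, mean_item_scores)."""
    if len(responses) != 35:
        raise ValueError("expected 35 item responses")
    sums = {name: sum(responses[i] for i in items) for name, items in SUBSCALES.items()}
    # Total EQ is the sum of the four subscale scores, as described in the text.
    sums["total_eq"] = sum(sums[name] for name in SUBSCALES)
    # Mean item scores put subscales of unequal length on a comparable footing
    # (the paper's ANOVA compares these rather than raw scale sums).
    mean_items = {name: sums[name] / len(items) for name, items in SUBSCALES.items()}
    return sums, mean_items

sums, mean_items = score_eqi_s([4] * 35)
print(sums["total_eq"])             # 140
print(mean_items["interpersonal"])  # 4.0
```

The mean-item transformation matters later in the Results section, where subscales with different item counts are compared directly.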
Procedure
Participants came from several consecutive cohorts of first-year students totaling 3908
cases (72% women) from the same university. Newly registered students in each cohort
were approached by the researchers during introductory week activities (held in the first
week of September) and asked to volunteer for a study on ‘‘personality and academic
success.’’ At that time, consenting participants completed the EQ-i: S and provided
permission to obtain their high school GPA and to track their subsequent degree progress
via official university records.
Results
A gender by group (graduated vs. withdrew) analysis of variance (ANOVA) was
conducted with high school GPA as the dependent variable. The main effects for gender and
group were not significant, nor was the interaction of gender and group. This eliminated
the need to include high school GPA as a covariate in subsequent analyses.
To examine the relationship between EI and academic success in high-achieving
students making the transition from high school to university, a gender by group
(graduated vs. withdrew) by EI dimension (interpersonal, intrapersonal, stress
management, and adaptability) ANOVA was conducted with EI level as the dependent variable.
Because of the unequal number of items on the EQ-i: S subscales, the ANOVA compared
mean item scores rather than scale scores. Table 1 presents the means and standard
deviations by gender and group for the various EQ-i: S scales. The main effect for
gender was not significant, nor were the interactions of gender and group, gender and
dimension, or the three-way interaction of gender, group, and dimension. The main
effect for group was significant, with the students who completed their degrees scoring
higher than the students who withdrew on overall EI (F(1, 167) = 16.31, p < 0.001). The
main effect for dimension was also significant (F(3, 501) = 23.132, p < 0.001). Multiple
comparisons (Student–Newman–Keuls procedure) found that students scored
significantly higher on interpersonal abilities compared to the other abilities assessed by the
EQ-i: S. Students also scored significantly higher on stress management compared to
adaptability and intrapersonal.
To understand the main effect for group, separate univariate F tests were
conducted comparing students who graduated and students who withdrew from the
university on each of the four EQ-i: S scales. The graduating students scored
significantly higher than the students who withdrew on interpersonal ability (F(1, 167) =
9.05, p = 0.003), stress management (F(1, 167) = 6.42, p = 0.012), and adaptability
(F(1, 167) = 21.07, p < 0.001).
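These follow-up tests amount to one-way ANOVAs comparing the two groups on each subscale. A minimal sketch, with fabricated mean-item scores rather than the study's data:

```python
# One-way ANOVA F statistic for two independent groups, as used in the
# follow-up univariate tests above. Group scores below are fabricated.
from statistics import mean

def f_oneway_two_groups(g1, g2):
    """F with df = (1, n1 + n2 - 2); for two groups, F equals the squared t."""
    grand = mean(g1 + g2)
    ss_between = len(g1) * (mean(g1) - grand) ** 2 + len(g2) * (mean(g2) - grand) ** 2
    ss_within = (sum((x - mean(g1)) ** 2 for x in g1)
                 + sum((x - mean(g2)) ** 2 for x in g2))
    return ss_between / (ss_within / (len(g1) + len(g2) - 2))

# Fabricated mean-item interpersonal scores for graduated vs. withdrew students.
graduated = [4.3, 4.4, 4.2, 4.5]
withdrew = [4.0, 4.1, 3.9, 4.2]
print(round(f_oneway_two_groups(graduated, withdrew), 2))  # 10.8
```

In the paper's design the within-groups degrees of freedom are 167 (171 students minus the cells of the design), which is where the reported F(1, 167) values come from.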
Discussion
The present study found that exceptionally high-achieving high school students who
entered university with lower trait EI scores were significantly less likely to graduate
with a degree 6 years later, compared to their high-EI peers. These results are very
consistent with a previous study on the link between trait EI and successful transition
to university in the general student population, conducted at the same postsecondary
institution. Parker et al. (2006) found that, despite having comparable age, course load,
Table 1. Means and standard deviations on the EQ-i: S scales for gifted students who graduated or withdrew.

                     Graduated                           Withdrew                            Total
Scale                Men         Women       Total       Men         Women       Total       Men         Women       Total
Interpersonal        4.21 (.46)  4.38 (.42)  4.30 (.43)  3.85 (.51)  4.19 (.43)  4.13 (.46)  4.01 (.51)  4.30 (.43)  4.25 (.45)
Intrapersonal        3.72 (.65)  3.80 (.63)  3.78 (.63)  3.56 (.48)  3.47 (.76)  3.48 (.72)  3.63 (.55)  3.65 (.71)  3.64 (.68)
Adaptability         3.87 (.80)  3.75 (.58)  3.77 (.61)  3.51 (.37)  3.44 (.67)  3.45 (.63)  3.68 (.62)  3.61 (.64)  3.62 (.63)
Stress management    4.15 (.51)  4.02 (.51)  4.04 (.51)  3.43 (.52)  3.59 (.69)  3.56 (.66)  3.76 (.62)  3.82 (.64)  3.81 (.63)
Total                3.99 (.47)  3.99 (.37)  3.99 (.38)  3.58 (.38)  3.66 (.46)  3.65 (.44)  3.77 (.46)  3.84 (.44)  3.83 (.44)

Note: EQ-i: S: short version of the Emotional Quotient Inventory. N for the graduated group was 91 (12 men and 79 women); N for the withdrew group was 80 (14 men and 66 women).
and high school GPA, students who entered university with lower EI scores were
significantly more likely to withdraw from the university after the first year of study than
their higher EI peers. Together, these studies indicate that trait EI is a significant
predictor of the successful postsecondary transition for gifted and typically developed
students alike.
It is worth noting that the EI dimensions (i.e. interpersonal, adaptability, and stress
management) that distinguished between students who withdrew and students who
graduated in the present study were also significant predictors of persistence in the
Parker et al. (2006) study. The interpersonal dimension involves abilities connected
to having good social skills and being able to interact effectively with other people
(Bar-On, 1997, 2000). The adaptability dimension involves skills related to the ability
to identify potential problems in the environment, as well as the use of realistic and
flexible coping strategies (Bar-On, 1997, 2000). The stress management dimension
involves the ability to manage stressful situations in a calm and productive manner.
Individuals who score high on this dimension are rarely impulsive and tend to work
well under pressure (Bar-On, 1997, 2000).
The link between academic success (completing an undergraduate degree) and EI in
high-achieving secondary students is hardly surprising. Graduating from high school and
going on to complete an undergraduate degree is a substantial accomplishment. It is
worth noting that slightly less than half of the students in the total cohort from which the
present sample was drawn actually graduated from the university. This is not an
uncommon completion rate for many postsecondary institutions in Canada and the United
States (Ross et al., 2012; Shaienks et al., 2008). Students at college and university are
confronted with a bewildering array of new personal and interpersonal challenges, even
more complicated if they attend school outside of their hometown (Witkow et al., 2015).
Not only do they need to modify existing relationships with family and high school
friends, they need to adapt to a dynamic learning environment that changes considerably
from first to upper year (Fussell et al., 2007). Not only does the academic environment
change but the financial costs of university add even more complexity to the task of
persisting, particularly if students must also balance school and work-related activities
(see, Moulin et al., 2013).
An interesting feature of the sample of exceptionally high-achieving secondary
students used in this study was that they did not differ from the rest of their student peers on
trait EI. This finding should probably not be considered surprising, since the measure of
EI used in the present study correlates very weakly with measures of cognitive
intelligence (Bar-On, 2002; Webb et al., 2014). Perhaps the inconsistent findings in the prior
literature on EI and giftedness are a product of the reality that gifted populations
(particularly when individuals are identified via elite academic or IQ performance) are not
substantially different on emotional and social competencies. Given the growing
evidence that EI significantly contributes to educational performance (Keefer et al., 2012;
Perera and DiGiacomo, 2013), and the availability of psychoeducational programming
designed to enhance these competencies in students of all ages (Durlak et al., 2011;
Schutte et al., 2013; Vesely et al., 2014), university-based retention programs targeting
gifted students may want to pay particular attention to promoting various emotional and
social competencies.
Ultimately, the findings of the present study are quite consistent with the growing
consensus that success in life, whether inside or outside of the classroom, ‘‘requires
both head strengths and heart strengths’’ (Park and Peterson, 2010; Pfeiffer, 2013). One
limitation of the present study is that the sample of exceptionally high-achieving
students was composed predominantly of White female students. Future research needs
to encompass a wider range of ethnic backgrounds and include a more balanced
proportion of male and female students.
Acknowledgements
This study was supported by a research grant to the first author from the Social Sciences and
Humanities Research Council of Canada.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.
References
Arnett JJ (2000) Emerging adulthood: a theory of development from the late teens through the
twenties. American Psychologist 55: 469–480.
Austin EJ, Parker JDA, Petrides KV, et al. (2008) Emotional intelligence. In: Boyle GJ, Matthews
G and Saklofske DH (eds) The SAGE Handbook of Personality Theory And Assessment. Vol. 1.
London: SAGE Publications, pp 576–596.
Austin EJ, Saklofske DH and Mastoras SM (2010) Emotional intelligence, coping and
exam-related stress in Canadian undergraduate students. Australian Journal of Psychology
62: 42–50.
Bar-On R (1997) The Emotional Quotient Inventory (EQ-i). Technical Manual. Toronto:
Multi-Health Systems, Inc.
Bar-On R (2000) Emotional and social intelligence: insights from the emotional quotient
inventory. In: Bar-On R and Parker JDA (eds) Handbook of Emotional Intelligence. San Francisco:
Jossey-Bass, pp. 363–388.
Bar-On R (2002) Bar-On Emotional Quotient Inventory short form (EQ-i: Short): Technical
Manual. Toronto: Multi-Health Systems.
Berger JB and Milem JF (1999) The role of student involvement and perceptions of integration in a
causal model of student persistence. Research in Higher Education 40: 641–664.
Brown RF and Schutte NS (2006) Direct and indirect relationships between emotional intelligence
and subjective fatigue in university students. Journal of Psychosomatic Research 60: 585–593.
Cherniss C (2010) Emotional intelligence: toward clarification of a concept. Industrial and
Organizational Psychology 3: 110–126.
Davis SK and Humphrey N (2014) Ability versus trait emotional intelligence: dual influences on
adolescent psychological adaptation. Journal of Individual Differences 35: 54–62.
Durlak JA, Weissberg RP, Dymnicki AB, et al. (2011) The impact of enhancing students’ social
and emotional learning: a meta-analysis of school-based universal interventions. Child
Development 82: 474–501.
Eklund K, Tanner N, Stoll K, et al. (2015) Identifying emotional and behavioral risk among gifted
and nongifted children: a multi-gate, multi-informant approach. School Psychology Quarterly
30: 197–211.
Extremera N and Fernández-Berrocal P (2006) Emotional intelligence as predictor of mental,
social, and physical health in university students. The Spanish Journal of Psychology 9: 45–51.
Fussell E, Gauthier AH and Evans A (2007) Heterogeneity in the transition to adulthood: The cases
of Australia, Canada, and the United States. European Journal of Population 23: 389–414.
Gómez-Arízaga MP and Conejeros-Solar ML (2013) Am I that talented? The experiences of gifted
individuals from diverse educational backgrounds at the postsecondary level. High Ability
Studies 24: 135–151.
Hammond D, McBee M and Hebert T (2007) Exploring the motivational trajectories of gifted
university students. Roeper Review 29: 197–205.
Keefer KV (2015) Self-report assessments of emotional competencies: a critical look at methods
and meanings. Journal of Psychoeducational Assessment 33: 3–23.
Keefer KV, Parker JDA and Wood LM (2012) Trait emotional intelligence and university
graduation outcomes: using latent profile analysis to identify students at risk for degree
non-completion. Journal of Psychoeducational Assessment 30: 402–413.
Martin LT, Burns RM and Schonlau M (2009) Mental disorders among gifted and nongifted youth:
a selected review of the epidemiologic literature. Gifted Child Quarterly 54: 31–41.
Martins A, Ramalho N and Morin E (2010) A comprehensive meta-analysis of the relationship
between emotional intelligence and health. Personality and Individual Differences 49:
554–564.
Mayer JD, Caruso D and Salovey P (1999) Emotional intelligence meets traditional standards for
an intelligence. Intelligence 27: 267–298.
Mayer JD, Salovey P and Caruso DR (2002) Mayer-Salovey-Caruso Emotional Intelligence Test
(MSCEIT) User’s Manual. Toronto: Multi-Health Systems.
Morawska A and Sanders MR (2008) Parenting gifted and talented children: what are the key child
behaviour and parenting issues? Australian and New Zealand Journal of Psychiatry 42:
819–827.
Moulin S, Doray P, Laplante B, et al. (2013) Work intensity and non-completion of university:
longitudinal approach and causal inference. Journal of Education and Work 26: 333–356.
Muratori M, Colangelo N and Assouline S (2003) Early-entrance students: impressions of their
first semester of college. Gifted Child Quarterly 47: 219–238.
Neihart M, Reis SM, Robinson NM, et al. (2002) Social and Emotional Development of Gifted
Children: What Do We Know? Waco: Prufrock Press.
Pancer SM, Hunsberger B, Pratt MW, et al. (2000) Cognitive complexity of expectations and
adjustment to university in the first year. Journal of Adolescent Research 15: 38–57.
Park N and Peterson C (2010) Does it matter where we live? The urban psychology of character
strengths. American Psychologist 65: 535–547.
Parker JDA, Duffy J, Wood LM, et al. (2005) Academic achievement and emotional intelligence:
predicting the successful transition from high school to university. Journal of First-Year
Experience and Students in Transition 17: 67–78.
Parker JDA, Hogan MJ, Eastabrook JM, et al. (2006) Emotional intelligence and student retention:
predicting the successful transition from high school to university. Personality and Individual
Differences 41: 1329–1336.
Parker JDA, Keefer KV and Wood LM (2011) Toward a brief multidimensional assessment of
emotional intelligence: psychometric properties of the emotional quotient inventory–short
form. Psychological Assessment 23: 762–777.
Parker JDA, Summerfeldt LJ, Hogan MJ, et al. (2004) Emotional intelligence and academic
success: examining the transition from high school to university. Personality and Individual
Differences 36: 163–172.
Pascarella ET and Terenzini PT (2005) How College Affects Students: A Third Decade of
Research. San Francisco: Jossey-Bass.
Perera HN and DiGiacomo M (2013) The relationship of trait emotional intelligence with
academic performance: a meta-analytic review. Learning and Individual Differences 28: 20–33.
Perry RP, Hladkyj S, Pekrun RH, et al. (2001) Academic control and action control in the
achievement of college students: a longitudinal field study. Journal of Educational Psychology
93: 776–789.
Petrides KV (2010) Trait emotional intelligence theory. Industrial and Organizational Psychology
3: 136–139.
Pfeiffer SI (2013) Lessons learned from working with high-ability students. Gifted Education
International 29: 86–97.
Plucker JA and Levy JJ (2001) The downside of being talented. American Psychologist 56: 75–76.
Qualter P, Whiteley H, Morley A, et al. (2009) The role of emotional intelligence in the decision to
persist with academic studies in HE. Research in Post-Compulsory Education 14: 219–231.
Rinn A and Plucker J (2004) We recruit them, but then what? The educational and psychological
experiences of academically talented undergraduates. Gifted Child Quarterly 48: 54–67.
Robbins SB, Allen J, Casillas A, et al. (2006) Unraveling the differential effects of motivational
and skills, social, and self-management measures from traditional predictors of college
outcomes. Journal of Educational Psychology 98: 598–616.
Ross T, Kena G, Rathbun A, et al. (2012) Higher Education: Gaps in Access and Persistence
Study. (NCES 2012-046). U.S. Department of Education, National Center for Education
Statistics. Washington: Government Printing Office.
Ross SE, Niebling BC and Heckert TM (1999) Sources of stress among college students. College
Student Journal 32: 312–317.
Saklofske DH, Austin EJ, Galloway J, et al. (2007) Individual difference correlates of
health-related behaviours: preliminary evidence for links between emotional intelligence and
coping. Personality and Individual Differences 42: 491–502.
Saklofske DH, Austin EJ and Minski PS (2003) Factor structure and validity of a trait emotional
intelligence measure. Personality and Individual Differences 34: 707–721.
Schutte NS, Malouff JM and Thorsteinsson EB (2013) Increasing emotional intelligence through
training: current status and future directions. International Journal of Emotional Education 5: 56–72.
Schwean VL, Saklofske DH, Widdifield-Konkin L, et al. (2006) Emotional intelligence and gifted
children. E-Journal of Applied Psychology 2: 30–37.
Shaienks D, Gluszynski T and Bayard J (2008) Postsecondary education, participation and
dropping out: differences across university, college and other types of postsecondary institutions.
Ottawa: Statistics Canada (ISBN: 978-1-100-10900-8).
Author biographies
James DA Parker is professor of Psychology at Trent University. His research focuses
on the development of emotional competencies and the consequences for personality
development, psychopathology, and wellness when there are deficits in these abilities.
DOI: 10.1103/PhysRevFluids.5.110515
I. INTRODUCTION
Every scientist and engineer is, by necessity, a communicator. In academia, research often occurs
at interdisciplinary boundaries and requires collaboration between experts from vastly different
technical backgrounds. Within fluid dynamics, it is not unusual to find overlaps between biology,
medicine, robotics, and even paleontology. In industry, engineers frequently work in teams on large-
scale projects that require clear communication between members with different specializations.
Yet communication skills often receive short shrift in our education. A 2010 survey by the
American Society of Mechanical Engineers found that industry managers consider entry-level
engineers lacking in both oral and written communication skills [1]. Despite increased attention
to technical communication in engineering education, discrepancies remain between the skills
expected by industry and those that are taught and practiced in the academic curriculum [2,3].
Simultaneously, social media has exploded with users interested in science communication, and
many scientists now find themselves using these platforms to discuss, promote, and engage with
others about scientific content.
The term “science communication” itself is broad and not well defined. As a field of academic
research, social scientists disagree as to whether science communication constitutes its own in-
dependent discipline [4,5]. In practice, science communication is sometimes conflated with science
outreach and viewed as a vehicle for scientists to explain science to a public audience. But although
communicating with the public is an aspect of science communication, those activities do not
represent the sum of science communication. As Burns et al. [4] identify in their own definition,
“Science communication may involve science practitioners, mediators, and other members of the
general public, either peer-to-peer or between groups.” In other words, even standard academic
activities—like journal publication and conference presentations, as well as the team-based activi-
ties of industry—constitute science communication.
*nicole.sharp@gmail.com
Published by the American Physical Society under the terms of the Creative Commons Attribution
4.0 International license. Further distribution of this work must maintain attribution to the author(s) and
the published article’s title, journal citation, and DOI.
ADOPTING A COMMUNICATION LIFESTYLE
Finally, communicators must consider their message. They may have several key points to make,
but whether they are producing a blog post or a textbook, there should always be one overarching
message a reader or listener should leave understanding. That message serves as the keystone about
which the work is structured. In beginning composition, writers typically think of this message as
their thesis statement. In the world of research, the key message often concerns the implications of
a research project’s results.
I use this exercise of answering three purpose questions regularly, whether I am developing a
presentation, writing an article, or preparing a YouTube video. It is also an exercise I ask each of
my clients to complete as we determine the form and scope of any project. Those familiar with
science communication and scientific writing guides will recognize that these three questions echo
many other authors’ recommendations [8–10]. For those accustomed to simply opening a document
or presentation and typing, taking the time to complete this exercise can save considerable heartache
and revision later.
The second phase of the science communication process is construction, where the hard work
of making version 1.0 takes place. The details of this phase depend on the type of product the
communicator is making, so I will not elaborate here. Where the purpose phase is about determining
the bigger picture, this phase takes place “down in the weeds,” dealing with the practical details of
writing a paper, preparing a poster, or producing a video.
Once a first version of the product is complete, it is time for the third phase, evaluation and
refinement. In this stage, the communicator returns to the bigger picture determined in the purpose
phase and asks if their creation will help achieve their goal and whether their audience will
understand the work and its message as intended. To evaluate that aspect of a work-in-progress,
I highly recommend test audiences, friendly review [10], and/or peer review from those unfamiliar
with the work. I will return to those topics in Sec. III, and here merely note that this outside
perspective is critical to revising and refining any product before it is ready for release.
The final stage of the science communication process, reflection, typically takes place after a
project is completed, e.g., the paper published, presentation given, or video uploaded. In this stage, I
encourage communicators to take time to appreciate what worked and what did not, lessons learned
for the future, and, importantly, what skills they have gained or improved as they completed the
project.
Taken altogether, the science communication process I present here is extremely general and
may be adapted to a variety of projects. It is by no means the sole way to approach science
communication, but it is a construct that has helped me, my clients, and my students. With this
framework in mind, I now turn to some specifics of communication habit-building.
N. S. SHARP
useful (Sec. III A 1). To put matters more colloquially, with any exercise suggested here, your
mileage may vary. I encourage readers, regardless of their communication experience level, to
sample widely to discover what works.
In the next two subsections, I recommend habits and exercises to pursue both individually and as
a group.
A. Habit-building as an individual
One hurdle for many inexperienced communicators is recognizing the extensive process behind
a finished product. As Heard [10] describes, “Most writers struggle. I didn’t realize this because I
had been seeing their writing product, not their writing process, which led to finished work that was
clear, smooth, and easy to understand” (emphasis in original). This need to share process and not
simply the product is one reason I have developed the science communication process outlined in
Sec. II. It is also why I encourage professors to let their students see their own working process, not
just a finalized presentation or grant proposal.
The following recommendations and exercises are aimed at individuals. Although anyone may
benefit from them, the first two sections are most beneficial for students, whereas the final section
is aimed more at those in advisory roles.
1. Planning
In my experience, planning is the most crucial (and often absent) piece of the process for students.
For this reason, I emphasize the purpose phase described in Sec. II in my science communication
workshops. Every project I undertake—including preparing this paper and the talk that preceded
it—begins by setting aside time to consider my goals, audience, and message well before I pick up
a pen or sit at the keyboard to start an outline. Others may find that wordstacks or concept maps
provide a more useful starting point for their planning; see Chapter 7 of [10] for descriptions of
those and other planning techniques.
To avoid procrastination or becoming stuck in the initial planning phase, it is helpful to break
large projects down by setting intermediate deadlines. For example, submitting a Gallery of
Fluid Motion video involves two external deadlines: the abstract submission and the final video
submission. But to ensure they meet those deadlines with a good product, a researcher could set
themselves a series of internal deadlines including: answering the three purpose questions; writing a
script; finalizing a storyboard; completing filming; creating a rough cut; receiving feedback from test
audiences; and finalizing the video. Intermediate deadlines not only help produce a better product
but also counter the sense of being overwhelmed by turning a large task into a series of smaller,
more manageable ones.
2. Getting started
For those who still struggle to get words onto the page—due to strong self-editing, for example—
freewriting exercises can help. The ground rules are simple: after determining what information
belongs in the next section, set a timer—five minutes is a good starting point—and start writing.
Stopping and rereading what is written is not allowed; revision comes later. Often
a few minutes of this exercise is enough to give writers the momentum needed to continue. Not
everything produced this way will be worth keeping, but freewriting can help writers get past a
dreaded first draft and into the potentially easier task of revision.
Changing one’s critiquing techniques can also reduce the time needed to give feedback. Often
professors attempt to provide detailed written feedback on large sections of a student’s work. In
some cases, the student receives a draft dripping with red ink; in others, professors choose to simply
rewrite the draft entirely. Both methods involve a large time commitment for the evaluator and
provide the student with little room to grow as a communicator.
To see why, consider the technical equivalent: a student comes to office hours because they are
struggling with a problem set. Rather than sitting down with the student and identifying specific
issues, the instructor simply hands them the solution key and sends them away. Most of us would
recognize this as a poor recipe for learning fluid dynamics, yet this is exactly how communication
feedback is often treated.
Instead, I suggest an alternative method for critique, based on techniques I was taught as a peer
writing tutor. Rather than attempting to fix every mistake at once, examine a draft for major flaws or
repeated mistakes. Perhaps the writer’s verbosity makes their logic vague and hard to follow. Find
a passage that exemplifies these issues and discuss that passage together. Point out the problem and
guide the student into finding their own clearer rephrasing, perhaps prompted by a few suggestions.
After repeating this process of identifying problems and finding solutions for a subsection of the
draft, encourage the student to continue the process on their own for the remainder of the draft
before returning for additional help.
This methodology is helpful in two ways: it saves the evaluator time they would have spent on
extensive correction, and it places responsibility for the work back into the student's hands, allowing
them to learn to identify and correct issues on their own. Without that opportunity, the
student will likely continue making the same mistakes in subsequent projects.
In particular, students who are non-native speakers of English may struggle when writing and
presenting in technical English. Remember that university writing centers are an excellent resource;
if evaluators can help a student identify a specific, persistent problem, the student can then look for
help from the writing center to address it.
Working one-on-one or individually is one way to build communication habits, but the rewards
are even richer when working as a group.
B. Habit-building as a group
Regular research group meetings are an excellent venue for practicing communication skills to-
gether. Simply integrating a communication exercise into each meeting provides all group members
regular training and deliberate practice to advance their mastery. Even outside traditional group
meetings, informal student groups can effectively improve communication skills [11].
The following exercises and resources are not an exhaustive list, but they should provide a good
starting point. I present them, roughly, in an order corresponding to the science communication
process presented in Sec. II, beginning with those useful for the purpose phase.
1. Audience adaptation
Inexperienced communicators can struggle with identifying their audience and recognizing
how audience affects the level of detail, jargon, and tone they should use. Figure 2 shows how
Dickerson et al. [12] explain their study of mammals shedding water through shaking to three
different audiences. For an academic audience, the authors use more specialized language and
longer, more complex sentence structures. In a message aimed toward politicians, they simplify
their sentences and concentrate on useful applications of their work while still including the most
significant results. In a book for the general public, Hu uses even less formal language and frames
the work in the context of a story about one specific dog.
In a group format, it is useful to discuss these issues explicitly—highlighting, for example, the
differences between speaking to other laboratory members, a conference audience, a classroom of
high schoolers, or a politician. Then the group can take a single message, like a description of the
laboratory’s focus or a particular member’s latest work, and subdivide into smaller groups, each
tasked with adapting that message for a different, specific audience. Afterwards, each subgroup
can present their results, and the group as a whole can discuss how the message changes.
Exercises like this allow all members to participate, even in a fairly large group, and feature the
interactivity and discussion that communication researchers Silva and Bultitude [15] found to be
effective best practices in science communication training.
FIG. 2. Explanations of mammals shaking themselves dry aimed at three different audiences: academic audiences [12], the public [13], and politicians [14]. Wet dog image courtesy of A. Dickerson and D. Hu, used with permission.
3. Narrative structure
In scientific writing and presenting, we use a standard narrative structure: introduction, methods,
results, and discussion, known collectively as IMRaD [10]. Review papers, book chapters, and other
scientific writing can deviate from this structure, but it is by far the most familiar. However, outside
of journal articles, it is often not the most useful or engaging structure for communicating science.
Especially when taking scientific work to more general audiences—including to a grant proposal
committee or a journalist—it is worth considering alternative narrative structures that, unlike
IMRaD, return the focus to the characters engaged in the narrative, namely, the scientists. The
Hero's Journey—the narrative structure behind stories like The Lord of the Rings, Harry Potter, and
Star Wars: A New Hope—is particularly adaptable to telling scientific stories [18]. My recorded
2018 talk on this subject is available at [19].
FIG. 3. Comparing two results as complementary halves of a figure allows viewers to quickly and easily identify differences. Screenshot from Ref. [22], used with permission.
For those interested in a deeper dive into narrative structure, science writers and journalists
share the behind-the-scenes development of their articles on The Open Notebook website [20].
Hart [21] also provides extensive guidance on writing and structuring narrative nonfiction. To see
these principles in action in the context of fluid dynamics, I highly recommend Ref. [13].
what changes would make the author’s intended message clearer. For an excellent recent resource
on developing and refining scientific visuals, I recommend Ref. [23].
5. Lightning talks
Lightning talks are informal, 60-second presentations given without a visual aid. They are
often—but not always—given impromptu, similar to an “elevator pitch.” The exercise combines
well with discussions of related topics, like audience and message refinement, and provides speakers
with a chance to practice developing skills in a low-risk environment while getting useful feedback
from fellow participants [15]. The “Three Minute Thesis” competitions held at universities around
the world represent a similar idea and may be attractive for some students as a venue for practicing
their communication skills [24]. Within the fluids community, the “flash presentations” introduced
at the 2019 APS DFD meeting are another such opportunity.
With this and other exercises, sticking to strictly scientific topics is not necessary. A student
might instead explain bicycle maintenance or why they prefer PYTHON over MATLAB.
8. Peer review
Numerous studies show the value of peer review—students evaluating and providing feedback
on one another’s work—in improving communication skills and reducing the evaluation load
on instructors [9,14,15,24]. Such active, collaborative learning mimics best practices in science
communication training [15], helps students identify and refine their own methods [25], and can
foster valuable mentorship roles between junior and senior students [11]. Even among undergrad-
uate engineers, Nelson [25] found that students struggling with their own communication skills
were often able to provide others with sound advice. Simpson et al. [11] describe successful
department-level communication initiatives that prompted graduate students to form their own peer
review networks.
When fostering peer review or implementing many of the exercises described above, it is
important to emphasize constructive critique [26]. Combative attitudes and singling out individuals
make the presenter feel attacked and defensive—not a useful environment for improvement. For this
reason, I highly encourage collaborative attitudes toward revision and critique.
IV. CONCLUSION
The interdisciplinary nature and visual complexity inherent to fluid dynamics make communication
skills critical for every student and practitioner. Mastering these skills requires frequent,
deliberate practice, but integrating such practice into everyday research activities need not be
difficult or onerous. Both research groups and informal groups can pursue simple, regular com-
munication exercises and training that help all participants improve their skills.
Part of improving these skills is recognizing the process of science communication, which
consists of four stages: (1) identifying big-picture issues around goals, audience, and message; (2)
constructing a first version of the product; (3) evaluating and refining the product based on the
bigger picture and outside feedback; and (4) reflecting on the lessons learned during the project.
Of these stages, communicators often struggle most with the first and third. To this end, I have
suggested multiple exercises aimed at planning, message refinement, revision, and critique. My
hope is that these resources serve as a springboard for those looking to adopt or refine their scientific
communication lifestyle.
ACKNOWLEDGMENTS
First and foremost, I thank J. Hertzberg, D. Hu, and J. Aurnou for their vocal support both of
this paper and the talk that preceded it. Thanks are owed also to F. C. Frankel, with whom I first
developed my concept of the science communication process, and A. Athanassiadis, who introduced
me to the Half-Life Your Message exercise. I also thank G. Durey and D. Hu for granting permission
to use their work as examples.
[18] N. Sharp, FYFD: tips for connecting with broader audiences, in APS DFD Annual Meeting (APS, Atlanta,
GA, 2018).
[19] N. Sharp, Using the hero’s journey to communicate science, https://youtu.be/9hSDnjyVC8o.
[20] The open notebook, https://www.theopennotebook.com.
[21] J. Hart, Storycraft (The University of Chicago Press, Chicago, 2011).
[22] G. Durey, H. Kwon, Q. Magdelaine, M. Casiulis, J. Mazet, L. Keiser, H. Bense, J. Bico, P. Colinet, and
E. Reyssat, Marangoni bursting: evaporation-induced emulsification of a two-component droplet, 2017
Gallery of Fluid Motion, doi: 10.1103/APS.DFD.2017.GFM.V0020.
[23] F. C. Frankel, Picturing Science and Engineering (MIT Press, Cambridge, MA, 2018).
[24] Three minute thesis, http://threeminutethesis.org.
[25] S. Nelson, Teaching collaborative writing and peer review techniques to engineering and technology
undergraduates, in Proceedings of the 30th ASEE/IEEE Frontiers in Education Conference (IEEE, Kansas,
2000), Vol. 2, pp. S2B/1–S2B/5.
[26] L. Lerman and J. Borstel, Critical Response Process: A Method for Getting Useful Feedback on Anything
You Make, from Dance to Dessert (Dance Exchange, Inc., Takoma Park, MD, 2003).