Download as pdf or txt
Download as pdf or txt
You are on page 1of 143

Framework for Information Literacy

for Higher Education


Filed by the ACRL Board on February 2, 2015. Adopted by the ACRL Board, January 11, 2016.

Contents
Introduction
Frames
These six frames are presented alphabetically and do not suggest a particular sequence in which
they must be learned.

Authority Is Constructed and Contextual


Information Creation as a Process
Information Has Value
Research as Inquiry
Scholarship as Conversation
Searching as Strategic Exploration

Appendix 1: Implementing the Framework

Suggestions on How to Use the Framework for Information Literacy for Higher Education
Introduction for Faculty and Administrators
For Faculty: How to Use the Framework
For Administrators: How to Support the Framework

Appendix 2: Background of the Framework Development

Appendix 3: Sources for Further Reading

Association of College and Research Libraries (CC BY-NC-SA 4.0) 1 http://www.ala.org/acrl/standards/ilframework


Introduction
This Framework for Information Literacy for Higher Education (Framework) grows out of a belief that
information literacy as an educational reform movement will realize its potential only through a richer,
more complex set of core ideas. During the fifteen years since the publication of the Information Literacy
Competency Standards for Higher Education,1 academic librarians and their partners in higher education
associations have developed learning outcomes, tools, and resources that some institutions have deployed
to infuse information literacy concepts and skills into their curricula. However, the rapidly changing
higher education environment, along with the dynamic and often uncertain information ecosystem in
which all of us work and live, require new attention to be focused on foundational ideas about that
ecosystem. Students have a greater role and responsibility in creating new knowledge, in understanding
the contours and the changing dynamics of the world of information, and in using information, data, and
scholarship ethically. Teaching faculty have a greater responsibility in designing curricula and
assignments that foster enhanced engagement with the core ideas about information and scholarship
within their disciplines. Librarians have a greater responsibility in identifying core ideas within their own
knowledge domain that can extend learning for students, in creating a new cohesive curriculum for
information literacy, and in collaborating more extensively with faculty.

The Framework offered here is called a framework intentionally because it is based on a cluster of
interconnected core concepts, with flexible options for implementation, rather than on a set of standards
or learning outcomes, or any prescriptive enumeration of skills. At the heart of this Framework are
conceptual understandings that organize many other concepts and ideas about information, research, and
scholarship into a coherent whole. These conceptual understandings are informed by the work of Wiggins
and McTighe,2 which focuses on essential concepts and questions in developing curricula, and also by
threshold concepts,3 which are those ideas in any discipline that are passageways or portals to enlarged
understanding or ways of thinking and practicing within that discipline. This Framework draws upon an
ongoing Delphi Study that has identified several threshold concepts in information literacy,4 but the
Framework has been molded using fresh ideas and emphases for the threshold concepts. Two added
elements illustrate important learning goals related to those concepts: knowledge practices,5 which are
demonstrations of ways in which learners can increase their understanding of these information literacy
concepts, and dispositions,6 which describe ways in which to address the affective, attitudinal, or valuing
dimension of learning. The Framework is organized into six frames, each consisting of a concept central
to information literacy, a set of knowledge practices, and a set of dispositions. The six concepts that
anchor the frames are presented alphabetically:

• Authority Is Constructed and Contextual


• Information Creation as a Process
• Information Has Value
• Research as Inquiry
• Scholarship as Conversation
• Searching as Strategic Exploration

Neither the knowledge practices nor the dispositions that support each concept are intended to prescribe
what local institutions should do in using the Framework; each library and its partners on campus will
need to deploy these frames to best fit their own situation, including designing learning outcomes. For the
same reason, these lists should not be considered exhaustive.

In addition, this Framework draws significantly upon the concept of metaliteracy,7 which offers a
renewed vision of information literacy as an overarching set of abilities in which students are consumers
and creators of information who can participate successfully in collaborative spaces.8 Metaliteracy
demands behavioral, affective, cognitive, and metacognitive engagement with the information ecosystem.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 2 http://www.ala.org/acrl/standards/ilframework


This Framework depends on these core ideas of metaliteracy, with special focus on metacognition,9 or
critical self-reflection, as crucial to becoming more self-directed in that rapidly changing ecosystem.

Because this Framework envisions information literacy as extending the arc of learning throughout
students’ academic careers and as converging with other academic and social learning goals, an expanded
definition of information literacy is offered here to emphasize dynamism, flexibility, individual growth,
and community learning:

Information literacy is the set of integrated abilities encompassing the reflective discovery of
information, the understanding of how information is produced and valued, and the use of
information in creating new knowledge and participating ethically in communities of learning.

The Framework opens the way for librarians, faculty, and other institutional partners to redesign
instruction sessions, assignments, courses, and even curricula; to connect information literacy with
student success initiatives; to collaborate on pedagogical research and involve students themselves in that
research; and to create wider conversations about student learning, the scholarship of teaching and
learning, and the assessment of learning on local campuses and beyond.

Notes

1. Association of College & Research Libraries, Information Literacy Competency Standards for Higher Education
(Chicago, 2000).
2. Grant Wiggins and Jay McTighe. Understanding by Design. (Alexandria, VA: Association for Supervision and
Curriculum Development, 2004).
3. Threshold concepts are core or foundational concepts that, once grasped by the learner, create new perspectives and
ways of understanding a discipline or challenging knowledge domain. Such concepts produce transformation within the
learner; without them, the learner does not acquire expertise in that field of knowledge. Threshold concepts can be
thought of as portals through which the learner must pass in order to develop new perspectives and wider
understanding. Jan H. F. Meyer, Ray Land, and Caroline Baillie. “Editors’ Preface.” In Threshold Concepts and
Transformational Learning, edited by Jan H. F. Meyer, Ray Land, and Caroline Baillie, ix–xlii. (Rotterdam,
Netherlands: Sense Publishers, 2010).
4. For information on this unpublished, in-progress Delphi Study on threshold concepts and information literacy,
conducted by Lori Townsend, Amy Hofer, Silvia Lu, and Korey Brunetti, see http://www.ilthresholdconcepts.com/.
Lori Townsend, Korey Brunetti, and Amy R. Hofer. “Threshold Concepts and Information Literacy.” portal: Libraries
and the Academy 11, no. 3 (2011): 853–69.
5. Knowledge practices are the proficiencies or abilities that learners develop as a result of their comprehending a
threshold concept.
6. Generally, a disposition is a tendency to act or think in a particular way. More specifically, a disposition is a cluster of
preferences, attitudes, and intentions, as well as a set of capabilities that allow the preferences to become realized in a
particular way. Gavriel Salomon. “To Be or Not to Be (Mindful).” Paper presented at the American Educational
Research Association Meetings, New Orleans, LA, 1994.
7. Metaliteracy expands the scope of traditional information skills (determine, access, locate, understand, produce, and
use information) to include the collaborative production and sharing of information in participatory digital
environments (collaborate, produce, and share). This approach requires an ongoing adaptation to emerging technologies
and an understanding of the critical thinking and reflection required to engage in these spaces as producers,
collaborators, and distributors. Thomas P. Mackey and Trudi E. Jacobson. Metaliteracy: Reinventing Information
Literacy to Empower Learners. (Chicago: Neal-Schuman, 2014).
8. Thomas P. Mackey and Trudi E. Jacobson. “Reframing Information Literacy as a Metaliteracy.” College and Research
Libraries 72, no. 1 (2011): 62–78.
9. Metacognition is an awareness and understanding of one’s own thought processes. It focuses on how people learn and
process information, taking into consideration people’s awareness of how they learn. (Jennifer A. Livingston.
“Metacognition: An Overview.” Online paper, State University of New York at Buffalo, Graduate School of Education,
1997. http://gse.buffalo.edu/fas/shuell/cep564/metacog.htm.)

Association of College and Research Libraries (CC BY-NC-SA 4.0) 3 http://www.ala.org/acrl/standards/ilframework


Authority Is Constructed and Contextual
Information resources reflect their creators’ expertise and credibility, and are evaluated based on
the information need and the context in which the information will be used. Authority is
constructed in that various communities may recognize different types of authority. It is contextual
in that the information need may help to determine the level of authority required.

Experts understand that authority is a type of influence recognized or exerted within a community. Experts
view authority with an attitude of informed skepticism and an openness to new perspectives, additional voices,
and changes in schools of thought. Experts understand the need to determine the validity of the information
created by different authorities and to acknowledge biases that privilege some sources of authority over others,
especially in terms of others’ worldviews, gender, sexual orientation, and cultural orientations. An
understanding of this concept enables novice learners to critically examine all evidence—be it a short blog post
or a peer-reviewed conference proceeding—and to ask relevant questions about origins, context, and suitability
for the current information need. Thus, novice learners come to respect the expertise that authority represents
while remaining skeptical of the systems that have elevated that authority and the information created by it.
Experts know how to seek authoritative voices but also recognize that unlikely voices can be authoritative,
depending on need. Novice learners may need to rely on basic indicators of authority, such as type of
publication or author credentials, where experts recognize schools of thought or discipline-specific paradigms.

Knowledge Practices
Learners who are developing their information literate abilities

• define different types of authority, such as subject expertise (e.g., scholarship), societal position
(e.g., public office or title), or special experience (e.g., participating in a historic event);
• use research tools and indicators of authority to determine the credibility of sources,
understanding the elements that might temper this credibility;
• understand that many disciplines have acknowledged authorities in the sense of well-known
scholars and publications that are widely considered “standard,” and yet, even in those situations,
some scholars would challenge the authority of those sources;
• recognize that authoritative content may be packaged formally or informally and may include
sources of all media types;
• acknowledge they are developing their own authoritative voices in a particular area and recognize
the responsibilities this entails, including seeking accuracy and reliability, respecting intellectual
property, and participating in communities of practice;
• understand the increasingly social nature of the information ecosystem where authorities actively
connect with one another and sources develop over time.

Dispositions
Learners who are developing their information literate abilities

• develop and maintain an open mind when encountering varied and sometimes conflicting perspectives;
• motivate themselves to find authoritative sources, recognizing that authority may be conferred or
manifested in unexpected ways;
• develop awareness of the importance of assessing content with a skeptical stance and with a self-
awareness of their own biases and worldview;
• question traditional notions of granting authority and recognize the value of diverse ideas and
worldviews;
• are conscious that maintaining these attitudes and actions requires frequent self-evaluation.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 4 http://www.ala.org/acrl/standards/ilframework


Information Creation as a Process
Information in any format is produced to convey a message and is shared via a selected delivery
method. The iterative processes of researching, creating, revising, and disseminating information
vary, and the resulting product reflects these differences.

The information creation process could result in a range of information formats and modes of delivery, so
experts look beyond format when selecting resources to use. The unique capabilities and constraints of
each creation process as well as the specific information need determine how the product is used. Experts
recognize that information creations are valued differently in different contexts, such as academia or the
workplace. Elements that affect or reflect on the creation, such as a pre- or post-publication editing or
reviewing process, may be indicators of quality. The dynamic nature of information creation and
dissemination requires ongoing attention to understand evolving creation processes. Recognizing the
nature of information creation, experts look to the underlying processes of creation as well as the final
product to critically evaluate the usefulness of the information. Novice learners begin to recognize the
significance of the creation process, leading them to increasingly sophisticated choices when matching
information products with their information needs.

Knowledge Practices
Learners who are developing their information literate abilities

• articulate the capabilities and constraints of information developed through various creation
processes;
• assess the fit between an information product’s creation process and a particular information
need;
• articulate the traditional and emerging processes of information creation and dissemination in a
particular discipline;
• recognize that information may be perceived differently based on the format in which it is
packaged;
• recognize the implications of information formats that contain static or dynamic information;
• monitor the value that is placed upon different types of information products in varying contexts;
• transfer knowledge of capabilities and constraints to new types of information products;
• develop, in their own creation processes, an understanding that their choices impact the purposes
for which the information product will be used and the message it conveys.

Dispositions
Learners who are developing their information literate abilities

• are inclined to seek out characteristics of information products that indicate the underlying
creation process;
• value the process of matching an information need with an appropriate product;
• accept that the creation of information may begin initially through communicating in a range of
formats or modes;
• accept the ambiguity surrounding the potential value of information creation expressed in
emerging formats or modes;
• resist the tendency to equate format with the underlying creation process;
• understand that different methods of information dissemination with different purposes are
available for their use.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 5 http://www.ala.org/acrl/standards/ilframework


Information Has Value
Information possesses several dimensions of value, including as a commodity, as a means of
education, as a means to influence, and as a means of negotiating and understanding the world.
Legal and socioeconomic interests influence information production and dissemination.

The value of information is manifested in various contexts, including publishing practices, access to
information, the commodification of personal information, and intellectual property laws. The novice
learner may struggle to understand the diverse values of information in an environment where “free”
information and related services are plentiful and the concept of intellectual property is first encountered
through rules of citation or warnings about plagiarism and copyright law. As creators and users of
information, experts understand their rights and responsibilities when participating in a community of
scholarship. Experts understand that value may be wielded by powerful interests in ways that marginalize
certain voices. However, value may also be leveraged by individuals and organizations to effect change
and for civic, economic, social, or personal gains. Experts also understand that the individual is
responsible for making deliberate and informed choices about when to comply with and when to contest
current legal and socioeconomic practices concerning the value of information.

Knowledge Practices
Learners who are developing their information literate abilities

• give credit to the original ideas of others through proper attribution and citation;
• understand that intellectual property is a legal and social construct that varies by culture;
• articulate the purpose and distinguishing characteristics of copyright, fair use, open access, and
the public domain;
• understand how and why some individuals or groups of individuals may be underrepresented or
systematically marginalized within the systems that produce and disseminate information;
• recognize issues of access or lack of access to information sources;
• decide where and how their information is published;
• understand how the commodification of their personal information and online interactions affects
the information they receive and the information they produce or disseminate online;
• make informed choices regarding their online actions in full awareness of issues related to
privacy and the commodification of personal information.

Dispositions
Learners who are developing their information literate abilities

• respect the original ideas of others;


• value the skills, time, and effort needed to produce knowledge;
• see themselves as contributors to the information marketplace rather than only consumers of it;
• are inclined to examine their own information privilege.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 6 http://www.ala.org/acrl/standards/ilframework


Research as Inquiry
Research is iterative and depends upon asking increasingly complex or new questions whose
answers in turn develop additional questions or lines of inquiry in any field.

Experts see inquiry as a process that focuses on problems or questions in a discipline or between
disciplines that are open or unresolved. Experts recognize the collaborative effort within a discipline to
extend the knowledge in that field. Many times, this process includes points of disagreement where debate
and dialogue work to deepen the conversations around knowledge. This process of inquiry extends
beyond the academic world to the community at large, and the process of inquiry may focus upon
personal, professional, or societal needs. The spectrum of inquiry ranges from asking simple questions
that depend upon basic recapitulation of knowledge to increasingly sophisticated abilities to refine
research questions, use more advanced research methods, and explore more diverse disciplinary
perspectives. Novice learners acquire strategic perspectives on inquiry and a greater repertoire of
investigative methods.

Knowledge Practices
Learners who are developing their information literate abilities

• formulate questions for research based on information gaps or on reexamination of existing,


possibly conflicting, information;
• determine an appropriate scope of investigation;
• deal with complex research by breaking complex questions into simple ones, limiting the scope of
investigations;
• use various research methods, based on need, circumstance, and type of inquiry;
• monitor gathered information and assess for gaps or weaknesses;
• organize information in meaningful ways;
• synthesize ideas gathered from multiple sources;
• draw reasonable conclusions based on the analysis and interpretation of information.

Dispositions
Learners who are developing their information literate abilities

• consider research as open-ended exploration and engagement with information;


• appreciate that a question may appear to be simple but still disruptive and important to research;
• value intellectual curiosity in developing questions and learning new investigative methods;
• maintain an open mind and a critical stance;
• value persistence, adaptability, and flexibility and recognize that ambiguity can benefit the
research process;
• seek multiple perspectives during information gathering and assessment;
• seek appropriate help when needed;
• follow ethical and legal guidelines in gathering and using information;
• demonstrate intellectual humility (i.e., recognize their own intellectual or experiential
limitations).

Association of College and Research Libraries (CC BY-NC-SA 4.0) 7 http://www.ala.org/acrl/standards/ilframework


Scholarship as Conversation
Communities of scholars, researchers, or professionals engage in sustained discourse with new
insights and discoveries occurring over time as a result of varied perspectives and interpretations.

Research in scholarly and professional fields is a discursive practice in which ideas are formulated, debated,
and weighed against one another over extended periods of time. Instead of seeking discrete answers to
complex problems, experts understand that a given issue may be characterized by several competing
perspectives as part of an ongoing conversation in which information users and creators come together and
negotiate meaning. Experts understand that, while some topics have established answers through this process,
a query may not have a single uncontested answer. Experts are therefore inclined to seek out many
perspectives, not merely the ones with which they are familiar. These perspectives might be in their own
discipline or profession or may be in other fields. While novice learners and experts at all levels can take part
in the conversation, established power and authority structures may influence their ability to participate and
can privilege certain voices and information. Developing familiarity with the sources of evidence, methods,
and modes of discourse in the field assists novice learners to enter the conversation. New forms of scholarly
and research conversations provide more avenues in which a wide variety of individuals may have a voice in
the conversation. Providing attribution to relevant previous research is also an obligation of participation in the
conversation. It enables the conversation to move forward and strengthens one’s voice in the conversation.

Knowledge Practices
Learners who are developing their information literate abilities

• cite the contributing work of others in their own information production;


• contribute to scholarly conversation at an appropriate level, such as local online community,
guided discussion, undergraduate research journal, conference presentation/poster session;
• identify barriers to entering scholarly conversation via various venues;
• critically evaluate contributions made by others in participatory information environments;
• identify the contribution that particular articles, books, and other scholarly pieces make to
disciplinary knowledge;
• summarize the changes in scholarly perspective over time on a particular topic within a specific
discipline;
• recognize that a given scholarly work may not represent the only - or even the majority -
perspective on the issue.

Dispositions
Learners who are developing their information literate abilities

• recognize they are often entering into an ongoing scholarly conversation and not a finished
conversation;
• seek out conversations taking place in their research area;
• see themselves as contributors to scholarship rather than only consumers of it;
• recognize that scholarly conversations take place in various venues;
• suspend judgment on the value of a particular piece of scholarship until the larger context for the
scholarly conversation is better understood;
• understand the responsibility that comes with entering the conversation through participatory channels;
• value user-generated content and evaluate contributions made by others;
• recognize that systems privilege authorities and that not having a fluency in the language and
process of a discipline disempowers their ability to participate and engage.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 8 http://www.ala.org/acrl/standards/ilframework


Searching as Strategic Exploration
Searching for information is often nonlinear and iterative, requiring the evaluation of a range of
information sources and the mental flexibility to pursue alternate avenues as new understanding
develops.

The act of searching often begins with a question that directs the act of finding needed information.
Encompassing inquiry, discovery, and serendipity, searching identifies both possible relevant sources as
well as the means to access those sources. Experts realize that information searching is a contextualized,
complex experience that affects, and is affected by, the cognitive, affective, and social dimensions of the
searcher. Novice learners may search a limited set of resources, while experts may search more broadly
and deeply to determine the most appropriate information within the project scope. Likewise, novice
learners tend to use few search strategies, while experts select from various search strategies, depending
on the sources, scope, and context of the information need.

Knowledge Practices
Learners who are developing their information literate abilities

• determine the initial scope of the task required to meet their information needs;
• identify interested parties, such as scholars, organizations, governments, and industries, who
might produce information about a topic and then determine how to access that information;
• utilize divergent (e.g., brainstorming) and convergent (e.g., selecting the best source) thinking
when searching;
• match information needs and search strategies to appropriate search tools;
• design and refine needs and search strategies as necessary, based on search results;
• understand how information systems (i.e., collections of recorded information) are organized in
order to access relevant information;
• use different types of searching language (e.g., controlled vocabulary, keywords, natural
language) appropriately;
• manage searching processes and results effectively.

Dispositions
Learners who are developing their information literate abilities

• exhibit mental flexibility and creativity;


• understand that first attempts at searching do not always produce adequate results;
• realize that information sources vary greatly in content and format and have varying relevance
and value, depending on the needs and nature of the search;
• seek guidance from experts, such as librarians, researchers, and professionals;
• recognize the value of browsing and other serendipitous methods of information gathering;
• persist in the face of search challenges, and know when they have enough information to
complete the information task

Association of College and Research Libraries (CC BY-NC-SA 4.0) 9 http://www.ala.org/acrl/standards/ilframework


Appendix 1: Implementing the Framework
Suggestions on How to Use the Framework for Information
Literacy for Higher Education
The Framework is a mechanism for guiding the development of information literacy programs within
higher education institutions while also promoting discussion about the nature of key concepts in
information in general education and disciplinary studies. The Framework encourages thinking about how
librarians, faculty, and others can address core or portal concepts and associated elements in the
information field within the context of higher education. The Framework will help librarians contextualize
and integrate information literacy for their institutions and will encourage a deeper understanding
of what knowledge practices and dispositions an information literate student should develop.
The Framework redefines the boundaries of what librarians teach and how they conceptualize
the study of information within the curricula of higher education institutions.

The Framework has been conceived as a set of living documents on which the profession will build. The
key product is a set of frames, or lenses, through which to view information literacy, each of which
includes a concept central to information literacy, knowledge practices, and dispositions. The Association
of College & Research Libraries (ACRL) encourages the library community to discuss the
new Framework widely and to develop resources such as curriculum guides, concept maps, and
assessment instruments to supplement the core set of materials in the frames.

As a first step, ACRL encourages librarians to read through the entire Framework and discuss the
implications of this new approach for the information literacy program at their institution. Possibilities
include convening a discussion among librarians at an institution or joining an online discussion of
librarians. In addition, as one becomes familiar with the frames, consider discussing them with
professionals in the institution’s center for teaching and learning, office of undergraduate education, or
similar departments to see whether some synergies exist between this approach and other institutional
curricular initiatives.

The frames can guide the redesign of information literacy programs for general education courses, for
upper level courses in students’ major departments, and for graduate student education. The frames are
intended to demonstrate the contrast in thinking between novice learner and expert in a specific area;
movement may take place over the course of a student’s academic career. Mapping out in what way
specific concepts will be integrated into specific curriculum levels is one of the challenges of
implementing the Framework. ACRL encourages librarians to work with faculty, departmental or college
curriculum committees, instructional designers, staff from centers for teaching and learning, and others to
design information literacy programs in a holistic way.

ACRL realizes that many information literacy librarians currently meet with students via one-shot classes,
especially in introductory level classes. Over the course of a student’s academic program, one-shot
sessions that address a particular need at a particular time, systematically integrated into the curriculum,
can play a significant role in an information literacy program. It is important for librarians and teaching
faculty to understand that the Framework is not designed to be implemented in a single information
literacy session in a student’s academic career; it is intended to be developmentally and systematically
integrated into the student’s academic program at variety of levels. This may take considerable time to
implement fully in many institutions.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 10 http://www.ala.org/acrl/standards/ilframework


ACRL encourages information literacy librarians to be imaginative and innovative in implementing
the Framework in their institution. The Framework is not intended to be prescriptive but to be used as a
guidance document in shaping an institutional program. ACRL recommends piloting the implementation
of the Framework in a context that is useful to a specific institution, assessing the results and sharing
experiences with colleagues.

How to Use This Framework

• Read and reflect on the entire Framework document.


• Convene or join a group of librarians to discuss the implications of this approach to
information literacy for your institution.
• Reach out to potential partners in your institution, such as departmental curriculum
committees, centers for teaching and learning, or offices of undergraduate or graduate
studies, to discuss how to implement the Framework in your institutional context.
• Using the Framework, pilot the development of information literacy sessions within a
particular academic program in your institution, and assess and share the results with your
colleagues.
• Share instructional materials with other information literacy librarians in the online
repository developed by ACRL.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 11 http://www.ala.org/acrl/standards/ilframework


Introduction for Faculty and Administrators
Considering Information Literacy
Information literacy is the set of integrated abilities encompassing the reflective discovery of
information, the understanding of how information is produced and valued, and the use of
information in creating new knowledge and participating ethically in communities of learning.

This Framework sets forth these information literacy concepts and describes how librarians as
information professionals can facilitate the development of information literacy by postsecondary
students.

Creating a Framework
ACRL has played a leading role in promoting information literacy in higher education for decades.
The Information Literacy Competency Standards for Higher Education (Standards), first published in
2000, enabled colleges and universities to position information literacy as an essential learning outcome
in the curriculum and promoted linkages with general education programs, service learning, problem-
based learning, and other pedagogies focused on deeper learning. Regional accrediting bodies, the
American Association of Colleges and Universities (AAC&U), and various discipline-specific
organizations employed and adapted the Standards.

It is time for a fresh look at information literacy, especially in light of changes in higher education,
coupled with increasingly complex information ecosystems. To that end, an ACRL Task Force developed
the Framework. The Framework seeks to address the great potential for information literacy as a deeper,
more integrated learning agenda, addressing academic and technical courses, undergraduate research,
community-based learning, and co-curricular learning experiences of entering freshman through
graduation. The Framework focuses attention on the vital role of collaboration and its potential for
increasing student understanding of the processes of knowledge creation and scholarship.
The Framework also emphasizes student participation and creativity, highlighting the importance of these
contributions.

The Framework is developed around a set of “frames,” which are those critical gateway or portal
concepts through which students must pass to develop genuine expertise within a discipline, profession,
or knowledge domain. Each frame includes a knowledge practices section used to demonstrate how the
mastery of the concept leads to application in new situations and knowledge generation. Each frame also
includes a set of dispositions that address the affective areas of learning.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 12 http://www.ala.org/acrl/standards/ilframework


For Faculty: How to Use the Framework
A vital benefit in using threshold concepts as one of the underpinnings for the Framework is the potential
for collaboration among disciplinary faculty, librarians, teaching and learning center staff, and others.
Creating a community of conversations about this enlarged understanding should engender more
collaboration, more innovative course designs, and a more inclusive consideration of learning within and
beyond the classroom. Threshold concepts originated as faculty pedagogical research within disciplines.
Because information literacy is both a disciplinary and a transdisciplinary learning agenda, using a
conceptual framework for information literacy program planning, librarian-faculty collaboration, and
student co-curricular projects can offer great potential for curricular enrichment and transformation. As a
faculty member, you can take the following approaches:

• Investigate threshold concepts in your discipline and gain an understanding of the approach used
in the Framework as it applies to the discipline you know.

— What are the specialized information skills in your discipline that students should
develop, such as using primary sources (history) or accessing and managing large data sets
(science)?

• Look for workshops at your campus teaching and learning center on the flipped classroom and
consider how such practices could be incorporated into your courses.

— What information and research assignments can students do outside of class to arrive
prepared to apply concepts and conduct collaborative projects?

• Partner with your IT department and librarians to develop new kinds of multimedia assignments
for courses.

— What kinds of workshops and other services should be available for students involved in
multimedia design and production?

• Help students view themselves as information producers, individually and collaboratively.

— In your program, how do students interact with, evaluate, produce, and share information
in various formats and modes?

• Consider the knowledge practices and dispositions in each information literacy frame for possible
integration into your own courses and academic program.

— How might you and a librarian design learning experiences and assignments that will
encourage students to assess their own attitudes, strengths/weaknesses, and knowledge gaps
related to information?

Association of College and Research Libraries (CC BY-NC-SA 4.0) 13 http://www.ala.org/acrl/standards/ilframework


For Administrators: How to Support the Framework
Through reading the Framework document and discussing it with your institutions’ librarians, you can
begin to focus on the best mechanisms to implement the Framework in your institution. As an
administrator, you can take the following approaches:

• Host or encourage a series of campus conversations about how the institution can incorporate
the Framework into student learning outcomes and supporting curriculum
• Provide the resources to enhance faculty expertise and opportunities for understanding and
incorporating the Framework into the curriculum
• Encourage committees working on planning documents related to teaching and learning (at the
department, program, and institutional levels) to include concepts from the Framework in their
work
• Provide resources to support a meaningful assessment of information literacy of students at
various levels at your institution
• Promote partnerships between faculty, librarians, instructional designers, and others to develop
meaningful ways for students to become content creators, especially in their disciplines

Association of College and Research Libraries (CC BY-NC-SA 4.0) 14 http://www.ala.org/acrl/standards/ilframework


Appendix 2: Background of the Framework Development
The Information Literacy Competency Standards for Higher Education were published in 2000 and
brought information literacy into higher education conversations and advanced our field. These, like
all ACRL standards, are reviewed cyclically. In July 2011, ACRL appointed a Task Force to decide what,
if anything, to do with the current Standards. In June 2012, that Task Force recommended that the
current Standards be significantly revised. This previous review Task Force made recommendations that
informed the current revision Task Force, formed in 2013, with the following charge:

to update the Information Literacy Competency Standards for Higher Education so they reflect the
current thinking on such things as the creation and dissemination of knowledge, the changing
global higher education and learning environment, the shift from information literacy to
information fluency, and the expanding definition of information literacy to include
multiple literacies, for example, transliteracy, media literacy, digital literacy, etc.

The Task Force released the first version of the Framework in two parts in February and April of 2014
and received comments via two online hearings and a feedback form available online for four weeks. The
committee then revised the document, released the second draft on June 17, 2014, and sought extensive
feedback through a feedback form, two online hearings, an in-person hearing, and analysis of social
media and topical blog posts.

On a regular basis, the Task Force used all of ACRL’s and American Library Association’s (ALA)
communication channels to reach individual members and ALA and ACRL units (committees, sections,
round tables, ethnic caucuses, chapters, and divisions) with updates. The Task Force’s liaison
at ACRL maintained a private e-mail distribution list of over 1,300 individuals who attended a fall, spring,
or summer online forum; provided comments to the February, April, June, or November drafts; or
were otherwise identified as having strong interest and expertise. This included members of the Task
Force that drafted the Standards, leading Library Information Science (LIS) researchers and national
project directors, members of the Information Literacy Rubric Development Team for the Association of
American Colleges & Universities, and Valid Assessment of Learning in Undergraduate Education
initiative. Via all these channels, the Task Force regularly shared updates, invited discussion at virtual and
in-person forums and hearings, and encouraged comments on public drafts of the proposed Framework.

ACRL recognized early on that the effect of any changes to the Standards would be significant both
within the library profession and in higher education more broadly. In addition to general announcements,
the Task Force contacted nearly 60 researchers who cited the Standards in publications
outside LIS literature, more than 70 deans, associate deans, directors or chairs of LIS schools, and invited
specific staff leaders (and press or communications contacts) at more than 70 other higher education
associations, accrediting agencies, and library associations and consortia to encourage their members to
read and comment on the draft.

The Task Force systematically reviewed feedback from the first and second drafts of the Framework,
including comments, criticism, and praise provided through formal and informal channels. The three
official online feedback forms had 562 responses; numerous direct e-mails were sent to members of the
Task Force. The group was proactive in tracking feedback on social media, namely blog posts and
Twitter. While the data harvested from social media are not exhaustive, the Task Force made its best
efforts to include all known Twitter conversations, blog posts, and blog commentary. In total, there were
several hundred feedback documents, totaling over a thousand pages, under review. The content of these
documents was analyzed by members of the Task Force and coded using HyperResearch, a qualitative
data analysis software. During the drafting and vetting process, the Task Force provided more detail on
the feedback analysis in an online FAQ document.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 15 http://www.ala.org/acrl/standards/ilframework


The Task Force continued to revise the document and published the third revision in November 2014,
again announcing broadly and seeking comments via a feedback form.

As of November 2014, the Task Force members included the following:

• Craig Gibson, Professor, Ohio State University Libraries (Co-chair)


• Trudi E. Jacobson, Distinguished Librarian and Head, Information Literacy Department,
University at Albany, SUNY, University Libraries (Co-chair)
• Elizabeth Berman, Science and Engineering Librarian, University of Vermont (Member)
• Carl O. DiNardo, Assistant Professor and Coordinator of Library Instruction/Science
Librarian, Eckerd College (Member)
• Lesley S. J. Farmer, Professor, California State University–Long Beach (Member)
• Ellie A. Fogarty, Vice President, Middle States Commission on Higher Education (Member)
• Diane M. Fulkerson, Social Sciences and Education Librarian, University of South Florida in
Lakeland (Member)
• Merinda Kaye Hensley, Instructional Services Librarian and Scholarly Commons Co-coordinator,
University of Illinois at Urbana-Champaign (Member)
• Joan K. Lippincott, Associate Executive Director, Coalition for Networked Information
(Member)
• Michelle S. Millet, Library Director, John Carroll University (Member)
• Troy Swanson, Teaching and Learning Librarian, Moraine Valley Community College (Member)
• Lori Townsend, Data Librarian for Social Sciences and Humanities, University of New Mexico
(Member)
• Julie Ann Garrison, Associate Dean of Research and Instructional Services, Grand Valley State
University (Board Liaison)
• Kate Ganski, Library Instruction Coordinator, University of Wisconsin–Milwaukee (Visiting
Program Officer, from September 1, 2013, through June 30, 2014)
• Kara Malenfant, Senior Strategist for Special Initiatives, Association of College and Research
Libraries (Staff Liaison)

In December 2014, the Task Force made final changes. Two other ACRL groups reviewed and provided
feedback on the final drafts: the ACRL Information Literacy Standards Committee and
the ACRL Standards Committee. The latter group submitted the final document and recommendations to
the ACRL Board for its review at the 2015 ALA Midwinter Meeting in Chicago.

Note: Filed by the ACRL Board February 2, 2015; Adopted by the ACRL Board January 11, 2016.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 16 http://www.ala.org/acrl/standards/ilframework


Appendix 3: Sources for Further Reading
The following sources are suggested readings for those who want to learn more about the ideas
underpinning the Framework, especially the use of threshold concepts and related pedagogical models.
Some readings here also explore other models for information literacy, discuss students’ challenges with
information literacy, or offer examples of assessment of threshold concepts. Landmark works on
threshold concept theory and research on this list are the edited volumes by Meyer, Land,
and Baillie (Threshold Concepts and Transformational Learning) and by Meyer and Land (Threshold
Concepts and Troublesome Knowledge: Linkages to Ways of Thinking and Practicing within the
Disciplines). In addition, numerous research articles, conference papers, reports, and presentations on
threshold concepts are cited on the regularly updated website Threshold Concepts: Undergraduate
Teaching, Postgraduate Training, and Professional Development; A Short Introduction and Bibliography,
available at http://www.ee.ucl.ac.uk/~mflanaga/thresholds.html. See the Framework Wordpress site for
current news and resources.

ACRL Information Literacy Competency Standards Review Task Force. “Task Force
Recommendations.” ACRL AC12 Doc 13.1, June 2,
2012. http://www.ala.org/acrl/sites/ala.org.acrl/files/content/standards/ils_recomm.pdf.
American Association for School Librarians. Standards for the 21st-Century Learner. Chicago:
American Library Association,
2007. http://www.ala.org/aasl/sites/ala.org.aasl/files/content/guidelinesandstandards/learningstand
ards/AASL_LearningStandards.pdf.
Blackmore, Margaret. “Student Engagement with Information: Applying a Threshold Concept
Approach to Information Literacy Development.” Paper presented at the 3rd Biennial Threshold
Concepts Symposium: Exploring Transformative Dimensions of Threshold Concepts, Sydney,
Australia, July 1–2, 2010.
Carmichael, Patrick. “Tribes, Territories, and Threshold Concepts: Educational Materialisms at
Work in Higher Education.” Educational Philosophy and Theory 44, no. S1 (2012): 31–42.
Coonan, Emma. A New Curriculum for Information Literacy: Teaching Learning; Perceptions of
Information Literacy. Arcadia Project, Cambridge University Library, July
2011. http://ccfil.pbworks.com/f/emma_report_final.pdf.
Cousin, Glynis. "An Introduction to Threshold Concepts." Planet 17 (December 2006): 4–5.
———. “Threshold Concepts, Troublesome Knowledge and Emotional Capital: An Exploration
into Learning about Others.” In Overcoming Barriers to Student Understanding: Threshold
Concepts and Troublesome Knowledge, edited by Jan H. F. Meyer and Ray Land, 134–47.
London and New York: Routledge, 2006.
Gibson, Craig, and Trudi Jacobson. “Informing and Extending the Draft ACRL Information
Literacy Framework for Higher Education: An Overview and Avenues for Research.” College
and Research Libraries 75, no. 3 (May 2014): 250–4.
Head, Alison J. “Project Information Literacy: What Can Be Learned about the Information-
Seeking Behavior of Today’s College Students?” Paper presented at the ACRL National
Conference, Indianapolis, IN, April 10–13, 2013.
Hofer, Amy R., Lori Townsend, and Korey Brunetti. “Troublesome Concepts and Information
Literacy: Investigating Threshold Concepts for IL Instruction.” portal: Libraries and the
Academy 12, no. 4 (2012): 387–405.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 17 http://www.ala.org/acrl/standards/ilframework


Jacobson, Trudi E., and Thomas P. Mackey. “Proposing a Metaliteracy Model to Redefine
Information Literacy.” Communications in Information Literacy 7, no. 2 (2013): 84–91.
Kuhlthau, Carol C. “Rethinking the 2000 ACRL Standards: Some Things to
Consider.” Communications in Information Literacy 7, no. 3 (2013): 92–7.
———. Seeking Meaning: A Process Approach to Library and Information Services. Westport,
CT: Libraries Unlimited, 2004.
Limberg, Louise, Mikael Alexandersson, Annika Lantz-Andersson, and Lena Folkesson. “What
Matters? Shaping Meaningful Learning through Teaching Information Literacy.” Libri 58, no. 2
(2008): 82–91.
Lloyd, Annemaree. Information Literacy Landscapes: Information Literacy in Education,
Workplace and Everyday Contexts. Oxford: Chandos Publishing, 2010.
Lupton, Mandy Jean. The Learning Connection: Information Literacy and the Student
Experience. Blackwood: South Australia: Auslib Press, 2004.
Mackey, Thomas P., and Trudi E. Jacobson. Metaliteracy: Reinventing Information Literacy to
Empower Learners. Chicago: Neal-Schuman, 2014.
Martin, Justine. “Refreshing Information Literacy.” Communications in Information Literacy 7,
no. 2 (2013): 114–27.
Meyer, Jan, and Ray Land. Threshold Concepts and Troublesome Knowledge: Linkages to Ways
of Thinking and Practicing within the Disciplines. Edinburgh, UK: University of Edinburgh,
2003.
Meyer, Jan H. F., Ray Land, and Caroline Baillie. “Editors’ Preface.” In Threshold Concepts and
Transformational Learning, edited by Jan H. F. Meyer, Ray Land, and Caroline Baillie, ix–xlii.
Rotterdam, Netherlands: Sense Publishers, 2010.
Middendorf, Joan, and David Pace. “Decoding the Disciplines: A Model for Helping Students
Learn Disciplinary Ways of Thinking.” New Directions for Teaching and Learning, no. 98
(2004): 1–12.
Oakleaf, Megan. “A Roadmap for Assessing Student Learning Using the New Framework for
Information Literacy for Higher Education.” Journal of Academic Librarianship 40, no. 5
(September 2014): 510–4.
Secker, Jane. A New Curriculum for Information Literacy: Expert Consultation Report. Arcadia
Project, Cambridge University Library, July
2011. http://ccfil.pbworks.com/f/Expert_report_final.pdf.
Townsend, Lori, Korey Brunetti, and Amy R. Hofer. “Threshold Concepts and Information
Literacy.” portal: Libraries and the Academy 11, no. 3 (2011): 853–69.
Tucker, Virginia, Christine Bruce, Sylvia Edwards, and Judith Weedman. “Learning Portals:
Analyzing Threshold Concept Theory for LIS Education.” Journal of Education for Library and
Information Science 55, no. 2 (2014): 150–65.
Wiggins, Grant, and Jay McTighe. Understanding by Design. Alexandria, VA: Association for
Supervision and Curriculum Development, 2004.

Association of College and Research Libraries (CC BY-NC-SA 4.0) 18 http://www.ala.org/acrl/standards/ilframework


computer law & security review 34 (2018) 467–476

Available online at www.sciencedirect.com

ScienceDirect

w w w. c o m p s e c o n l i n e . c o m / p u b l i c a t i o n s / p r o d c l a w. h t m

Guidelines for the responsible application of


data analytics

Roger Clarke a,b,c,*


a
Xamax Consultancy Pty Ltd, Canberra, Australia
b
University of NSW Law, Sydney, Australia
c
Research School of Computer Science, Australian National University, Canberra, Australia

A B S T R A C T

Keywords: The vague but vogue notion of ‘big data’ is enjoying a prolonged honeymoon. Well-funded,
Big data ambitious projects are reaching fruition, and inferences are being drawn from inadequate
Data science data processed by inadequately understood and often inappropriate data analytic tech-
Data quality niques. As decisions are made and actions taken on the basis of those inferences, harm
Decision quality will arise to external stakeholders, and, over time, to internal stakeholders as well. A set of
Regulation Guidelines is presented, whose purpose is to intercept ill-advised uses of data and analyti-
cal tools, prevent harm to important values, and assist organisations to extract the achievable
benefits from data, rather than dreaming dangerous dreams.
© 2017 Roger Clarke. Published by Elsevier Ltd. All rights reserved.

Jagadish et al. (2014), Cai and Zhu (2015) and Haryadi et al.
1. Introduction (2016), and particularly Merino et al. (2016).
Outside academe, most publications that offer advice appear
Previous enthusiasms for management science, decision support to be motivated not by the avoidance of harm to affected values,
systems, data warehousing and data mining have been reju- but rather the protection of the interests of organisations con-
venated. Fervour for big data, big data analytics and data science ducting analyses and using the results. Examples of such
has been kindled, and is being sustained, by high-pressure tech- documents in the public sector include DoFD (2015) – subse-
nology salesmen. Like all such fads, there is a kernel of truth, quently withdrawn, and UKCO (2016). Nothing resembling
but also a large penumbra of misunderstanding and misrep- guidelines appears to have been published to date by the rel-
resentation, and hence considerable risk of disappointment, evant US agencies, but see NIST (2015) and GAO (2016).
and worse. Some professional codes and statements are relevant, such
A few documents have been published that purport to as UNSD (1985), DSA (2016), ASA (2016) and ACM (2017). Ex-
provide some advice on how to avoid harm arising from the amples also exist in the academic research arena, e.g. Rivers
practice of these techniques. Within the specialist big data ana- and Lewis (2014), Müller et al. (2016) and Zook et al. (2017).
lytics literature, the large majority of articles focus on techniques However, reflecting the dependence of the data professions
and applications, with impacts and implications relegated to on the freedom to ply their trade, such documents are
a few comments at the end of the paper rather than even being oriented towards facilitation, with the protection of stakehold-
embedded within the analysis, let alone a driving factor in the ers commonly treated as a constraint rather than as an
design. But see Agrawal et al. (2011), Saha and Srivastava (2014), objective.

* Corresponding author. Xamax Consultancy Pty Ltd, 78 Sidaway St, Chapman ACT 2611 Canberra, Australia.
E-mail address: Roger.Clarke@xamax.com.au (R. Clarke).
https://doi.org/10.1016/j.clsr.2017.11.002
0267-3649/© 2017 Roger Clarke. Published by Elsevier Ltd. All rights reserved.
468 computer law & security review 34 (2018) 467–476

Documents have begun to emerge from government agen- Big data analytics encompasses all processes applied to big data
cies that perform regulatory rather than stimulatory functions. that may enable inferences to be drawn from it.
See, for example, a preliminary statement issued by Data Pro-
tection Commissioners (WP29, 2014), a consultation draft from The term ‘data scientist’ emerged two decades ago as an
the Australian Privacy Commissioner (OAIC, 2016), and a docu- upbeat alternative to ‘statistician’ (Press, 2013). Its focus is on
ment issued by the Council of Europe Convention 108 group analytic techniques, whereas the more recent big data move-
(CoE 2017). These are, however, unambitious and diffuse, re- ment commenced with its focus on data. The term ‘data science’
flecting the narrow statutory limitations of such organisations has been increasingly co-opted by the computer science dis-
to the protection of personal data. For a more substantial dis- cipline and business communities in order to provide greater
cussion paper, see ICO (2017). respectability to big data practices. Although computer science
It is vital that guidance be provided for at least those prac- has developed some additional techniques, a primary focus has
titioners who are concerned about the implications of their been the scalability of computational processes to cope with
work. In addition, a reference-point is needed as a basis for large volumes of disparate data. It may be that the re-capture
evaluating the adequacy of organisational practices, of the codes of the field by the statistics discipline will bring with it a re-
and statements of industry and professional bodies, of rec- covery of high standards of professionalism and responsibility
ommendations published by regulatory agencies, and of the – which, this paper argues, are sorely needed. In this paper,
provisions of laws and statutory codes. This paper’s purpose however, the still-current term ‘big data analytics’ is used.
is to offer such a reference-point, expressed as guidelines for Where data is not in a suitable form for application of any
practitioners who are seeking to act responsibly in their ap- particular data analytic technique, modifications may be made
plication of analytics to big data collections. to it in an attempt to address the data’s deficiencies. This was
This paper draws heavily on previous research reported in for many years referred to as ‘data scrubbing’, but it has become
Wigan and Clarke (2013), Clarke (2016a, 2016b), Raab and more popular among proponents of data analytics to use the
Clarke (2016) and Clarke (2017b). It also reflects literature misleading terms ‘data cleaning’ and ‘data cleansing’ (e.g. Rahm
critical of various aspects of the big data movement, notably and Do, 2000, Müller and Freytag, 2003). These terms imply that
Bollier (2010), Boyd and Crawford (2011), Lazer et al. (2014), the scrubbing process reliably achieves its aim of delivering a
Metcalf and Crawford (2016), King and Forder (2016) and high-quality data collection. Whether that is actually so is highly
Mittelstadt et al. (2016). It first provides a brief overview of contestable, and is seldom demonstrated through testing against
the field, sufficient to provide background for the remainder the real world that the modified data purports to represent.
of the paper. It then presents a set of Guidelines whose There are many challenging aspects of data quality. What should
intentions are to filter out inappropriate applications of data be done where data-items that are important to the analysis
analytics, and provide a basis for recourse by aggrieved parties are empty (‘null’)? And what should be done where they contain
against organisations whose malbehaviour or misbehaviour values that are invalid according to the item’s definition, or have
results in harm. An outline is provided of various possible been the subject of varying definitions over the period during
applications of the Guidelines. which the data-set has been collected? Another term that has
come into currency is ‘data wrangling’ (Kandel et al., 2011). Al-
though the term is honest and descriptive, and the authors
adopt a systematic approach to the major challenge of missing
2. Background data, their processes for ‘correcting erroneous values’ are merely
computationally-based ‘transforms’, neither sourced from nor
The ‘big data’ movement is largely a marketing phenom- checked against the real world. The implication that data is
enon. Much of the academic literature has been cavalier in its ‘clean’ or ‘cleansed’ is commonly an over-claim, and hence such
adoption and reticulation of vague assertions by salespeople. terms should be avoided in favour of the frank and usefully
As a result, definitions of sufficient clarity to assist in analy- descriptive term ‘data scrubbing’.
sis are in short supply. This author adopts the approach of Where data is consolidated from two or more data collec-
treating as ‘big data’ any collection that is sufficiently large that tions, some mechanism is needed to determine which records
someone is interested in applying sophisticated analytical tech- in each collection are appropriately merged or linked. In some
niques to it. However, it is important to distinguish among circumstances there may be a common data-item in each
several categories: collection that enables associations between records to be
reliably postulated. In many cases, a combination of data-
• a single large data collection; and items (e.g., in the case of people, the set of first and last
• a consolidation of two or more data collections, which may name, date-of-birth and postcode) may be regarded as repre-
be achieved through: senting the equivalent of a common identifier. This process
• merger into a single physical data collection; or has long been referred to as computer or data matching
• interlinkage into a single virtual data collection (Clarke, 1994). Other approaches can be adopted, but gener-
ally with even higher incidences of false-positives (matches
The term ‘big data analytics’ is distinguishable from its pre- that are made but that are incorrect) and false-negatives
decessor ‘data mining’ primarily on the basis of the decade in (matches that could have been made but were not). A further
which it is used. It is subject to marketing hype to almost the issue is the extent to which a consolidated collection should
same extent as ‘big data’. So all-inclusive are its usages that contain all entries or only those for which a match has (or
a reasonable working definition is: has not) been found. This decision may have a significant
computer law & security review 34 (2018) 467–476 469

impact on the usability of the collection, and on the quality


Table 1 – Quality factors.
of inferences drawn from it.
Data Quality Factors
Significantly, descriptions of big data analytics processes
(assessable at the time of creation and subsequently)
seldom make any provision for a pre-assessment of the nature
D1 Syntactic Validity
and quality of the data that is to be processed. See, for example, Conformance of the data with the domain on which the data-
Jagadish (2015) and Cao (2017). Proponents of big data analyt- item is defined
ics are prone to make claims akin to ‘big trumps good’, and D2 Appropriate (Id)entity Association
that data quality is irrelevant if enough data is available. Cir- A high level of confidence that the data is associated with the
cumstances exist in which such claims may be reasonable; but particular real-world identity or entity whose attribute(s) it is
intended to represent
for most purposes they are not (Bollier, 2010; Boyd and Crawford,
D3 Appropriate Attribute Association
2011; Clarke, 2016a), and data quality is an important consid-
The absence of ambiguity about which real-world attribute(s)
eration. McFarland and McFarland (2015) argue that ‘precisely the data is intended to represent
inaccurate’ results arise from the ‘biased samples’ that are an D4 Appropriate Attribute Signification
inherent feature of big data. The absence of ambiguity about the particular state of the
A structured framework for assessing data quality is pre- particular real-world attribute(s) that the data is intended to
sented in Table 1. It draws on a range of sources, importantly represent
D5 Accuracy
Huh et al. (1990), Wang and Strong (1996), Müller and Freytag
A high degree of correspondence of the data with the real-
(2003) and Piprani and Ernst (2008). See also Hazen et al. (2014).
world phenomenon that it is intended to represent, typically
Each of the factors in the first group can be assessed at the measured by a confidence interval, such as ‘±1 degree Celsius’
time of data acquisition and subsequently, whereas those in D6 Precision
the second group, distinguished as ‘information quality’ factors, The level of detail at which the data is captured, reflecting the
can only be judged at the time of use. domain on which valid contents for that data-item are
Underlying these factors are features of data that are often defined, such as ’whole numbers of degrees Celsius’
D7 Temporal Applicability
overlooked, but that become very important in the ‘big data’
The absence of ambiguity about the date and time when, or
context of data expropriation, re-purposing and merger. At the period of time during which, the data represents or
the heart of the problem is the materially misleading pre- represented a particular real-world phenomenon. This is
sumption that data is ‘captured’. That which pre-exists the important in the case of volatile data-items such as total
act of data collection comprises real-world phenomena, not rainfall for the last 12 months, marital status, fitness for work,
data that is available for ‘capture’. Each item of data is created, age, and the period during which an income-figure was earned
or a licence was applicable
by a process performed by a human or an artefact that senses
Information Quality Factors
the world and records a symbol that is intended to represent
(assessable only at the time of use)
some aspect of the phenomena that is judged to be relevant. I1 Theoretical Relevance
The choice of phenomena and of their attributes, and the A demonstrable capability of the data-item to make a
processes for creating data to represent them, are designed difference to the inferencing process in which the data is to be
and implemented by or on behalf of some entity that has used
some purpose in mind. The effort invested in data quality I2 Practical Relevance
A demonstrable capability of the data-item’s content to make a
assurance at the time that it is created reflects the character-
difference to the inferencing process in which the data is to be
istics of the human or artefact that creates it, the process
used
whereby it is created, the purpose of the data, the value of I3 Currency
the data and of data quality to the relevant entity, and the The absence of a material lag between a real-world occurrence
available resources. Hence the relationship between the data- and the recording of the corresponding data
item and the real world phenomenon that it purports to I4 Completeness
represent is not infrequently tenuous, and is subject to limi- The availability of sufficient contextual information that the
data is not liable to be misinterpreted
tations of definition, observation, measurement, accuracy,
I5 Controls
precision and cost. The application of business processes that ensure that the data
The conduct of data analytics also depends heavily on the quality and information quality factors have been considered
meanings imputed to data-items. Uncertainties arise even prior to the data’s use
within a single data collection. Where a consolidated collec- I6 Auditability
tion is being analysed, inferences may be drawn based on The availability of metadata that evidences the data quality
and information quality factors
relationships among data-items that originated from differ-
ent sources. The reasonableness of the inferences is heavily Adapted version of Table 1 of Clarke (2016a)
dependent not only on the quality and meaning of each item,
but also on the degree of compatibility among their quality pro-
files and meanings. being predictive, and then apply them as if they were
A further serious concern is the propensity for propo- prescriptive.
nents of big data to rely on correlations, without any context When big data analytics techniques are discussed, the notion
resembling a causative model. This even extends to champi- of Artificial Intelligence (AI) is frequently invoked. This is a catch-
oning the death of theory (Anderson, 2008; Mayer-Schonberger all term that has been used since the mid-1950s. Various strands
and Cukier, 2013). Further, it is all too common for propo- have had spurts of achievement, particularly in the
nents of big data analytics to interpret correlations as somehow pattern-matching field, but successes have been interspersed
470 computer law & security review 34 (2018) 467–476

within a strong record of failure, and considerable dispute (e.g. The majority of big data analytics activity is performed
Dreyfus, 1992, Katz, 2012). Successive waves of enthusiasts keep behind closed doors. One common justification for this is
emerging, to frame much the same challenges somewhat dif- commercial competitiveness, but other factors are com-
ferently, and win more grant money from parallel new waves monly at work, in both private and public sector contexts. As
of funding decision-makers. Meanwhile, the water has been a result of the widespread lack of transparency, it is far from
muddied by breathless, speculative extensions of AI notions clear that practices take into account the many challenges that
into the realms of metaphysics. In particular, an aside by von are identified in this section.
Neumann about a ‘singularity’ has been elevated to spiritu- Transparency is in any case much more challenging in the
ality (Moravec, 2000; Kurzweil, 2005), and longstanding sci-fi contemporary context than it was in the past. During the early
notions of ‘super-intelligence’ have been re-presented as phi- decades of software development, until c.1990, the rationale
losophy (Bostrom, 2014). underlying any particular inference was apparent from the
Multiple threads of AI are woven into big data mythology. independently-specified algorithm or procedure implemented
Various words with a similarly impressive sound to ‘intelli- in the software. Subsequently, so-called expert systems adopted
gent’ have been used as marketing banners, such as ‘expert’, an approach whereby the problem-domain is described, but
‘neural’, ‘connectionist’, ‘learning’ and ‘predictive’. Defini- the problem and solution, and hence the rationale for an in-
tions are left vague, with each new proposal applying Arthur ference, are much more difficult to access. Recently, purely
C. Clarke’s Third Law, and striving to be ‘indistinguishable empirical techniques such as neural nets and the various ap-
from magic’ and hence to gain the mantle of ‘advanced tech- proaches to machine learning have attracted a lot of attention.
nology’. Within the research community, expressions of These do not even embody a description of a problem domain.
scepticism are in short supply, but Lipton (2015) encapsu- They merely comprise a quantitative summary of some set of
lates the problem by referring to “an unrealistic expectation instances (Clarke, 1991). In such circumstances, no humanly-
that modern feed-forward neural networks exhibit human- understandable rationale for an inference exists, and in many
like cognition”. cases none can be created. As a result, transparency is non-
One cluster of techniques is marketed as ‘machine learn- existent, and accountability is impossible (Burrell, 2016; Knight,
ing’. A commonly-adopted approach (‘supervised learning’) 2017). To cater for such problems, Broeders et al. (2017), writing
involves some kind of (usually quite simple) data structure being in the context of national security applications, called for the
provided to a piece of generic software, often one that has an imposition of a legal duty of care and requirements for exter-
embedded optimisation function. A ‘training set’ of data is fed nal reviews, and the banning of automated decision-making.
in. The process of creating this artefact is claimed to consti- This brief review has identified a substantial set of risk
tute ‘learning’. Aspects of the “substantial amount of ‘black art’“ factors. Critique is important, but critique is by its nature nega-
involved are discussed in Domingos (2012). tive in tone. It is incumbent on critics to also offer positive and
Even where some kind of objective is inherent in the data sufficiently concrete contributions towards resolution of the
structure and/or the generic software, application of the meta- problems that they perceive. The primary purpose of this paper
phor of ‘learning’ is something of stretch for what is a sub- is to present a set of Guidelines whose application would
human and in many cases a non-rational process (Burrell, 2016). address the problems and establish a reliable professional basis
A thread of work that hopes to overcome some of the weak- for the practice of data analytics.
nesses expands the approach from a single level to a multi-
layered model. Inevitably, this too has been given marketing
gloss by referring to it as ‘deep learning’. Even some enthusi-
asts are appalled by the hyperbole: “machine learning
algorithms [are] not silver bullets, . . . not magic pills, . . . not 3. The Guidelines
tools in a toolbox – they are method{ologie}s backed by ratio-
nal thought processes with assumptions regarding the datasets The Guidelines presented here avoid the word ‘big’, and refer
they are applied to” (Rosebrock, 2014). simply to ‘data’ and ‘data analytics’. These are straightfor-
A field called ‘predictive analytics’ over-claims in a differ- ward and generic terms whose use conveys the prescriptions’
ent way. Rather than merely extrapolating from a data- broad applicability. The Guidelines are of particular relevance
series, it involves the extraction of patterns and then to personal data, because data analytics harbours very sub-
extrapolation of the patterns rather than the data; so the claim stantial threats when applied to data about individuals. The
of ‘prediction’ is bold. Even some enthusiasts have warned that Guidelines are expressed quite generally, however, because in-
predictive analytics can have “‘unintended side effects’ – [things] ferences drawn from any form of data may have negative
you didn’t really count on when you decided to build models implications for individuals, groups, communities, societies, poli-
and put them out there in the wild” (Perlich, quoted in Swoyer ties, economies or the environment. The purpose of the
(2017)). Guidelines is to assist in the avoidance of harm to all values
There is little doubt that there are specific applications to of all stakeholders. In addition to external stakeholders, share-
which each particular approach is well-suited – and also little holders and employees stand to lose where material harm to
doubt that each is neither a general approach nor deserving a company’s value arises from poorly-conducted data analyt-
of the pretentious title used to market it. As a tweeted apho- ics, including not only financial loss and compliance breaches
rism has it: “Most firms that think they want advanced AI/ML but also reputational damage.
really just need linear regression on cleaned-up data” (Hanson, The Guidelines are presented in Table 2, divided into
2016). four segments. Three of the segments correspond to the
computer law & security review 34 (2018) 467–476 471

Table 2 – Guidelines for the responsible application of data analytics.


1. General

DO’s

1.1 Governance
Ensure that a comprehensive governance framework is in place prior to, during, and for the relevant period after data acquisition,
analysis and use activities, that it is commensurate with the activities’ potential impacts, and that it encompasses:
a. risk assessment and risk management from the perspectives of all affected parties
b. express assignments of accountability, at an appropriate level of granularity

1.2 Expertise
Ensure that all individuals participating in the activities have education, training, and experience in relation to the real-world systems
about which inferences are to be drawn, appropriate to the roles that they play

1.3 Compliance
Ensure that all activities are compliant with all relevant laws and established public policy positions within relevant jurisdictions, and
with public standards of behaviour

2. Data Acquisition

DO’s

2.1 The Problem Domain


Understand the real-world systems about which inferences are to be drawn and to which data analytics are to be applied

2.2 The Data Sources


Understand each source of data, including:
a. the data’s provenance
b. the purposes for which the data was created
c. the meaning of each data-item at the time of creation
d. the data quality at the time of creation
e. the data quality and information quality at the time of use

2.3 Data Merger


If data is to be merged from multiple sources, assess the compatibility of the various collections, records and items of data, taking
into account the data’s provenance, purposes, meaning and quality, and the potential impact of mis-matching and mistaken
assumptions

2.4 Data Scrubbing


If data is to be scrubbed, cleaned or cleansed, assess the reliability of the processes for the intended purpose and the potential
impacts of mistaken assumptions and erroneous changes

2.5 Identity Protection


If the association of data with an entity is sensitive, apply techniques to the data whose effectiveness is commensurate with the
risks to those entities, in order to ensure pseudonymisation (if the purpose is to draw inferences about individual entities), or
de-identification (if the purpose is other than to draw inferences about individual entities)

2.6 Data Security


Minimise the risks arising from data acquisition, storage, access, distribution and retention, and manage the unavoidable risks

DON’Ts

2.7 Identifier Compatibility


Don’t merge data-sets unless the identifiers in each data-set are compatible with one another at a level of reliability commensurate
with the potential impact of the inferences drawn

2.8 Content Compatibility


Don’t merge data-sets unless the reliability of comparisons among the data-items in the sources reaches a threshold commensurate
with the potential impact of the inferences drawn

3. Data Analysis

DO’s

3.1 Expertise
Ensure that all staff and contractors involved in the analysis have:
a. appropriate professional qualifications
b. training in the specific tools and processes
c. sufficient familiarity with the real-world system to which the data relates and with the manner in which the data purports to
represent that real-world system
d. accountability for their analyses

3.2 The Nature of the Tools


Understand the origins, nature and limitations of data analytic tools that are considered for use
(continued on next page)
472 computer law & security review 34 (2018) 467–476

Table 2 – (continued)
3.3 The Nature of the Data Processed by the Tools
Understand the assumptions that data analytic tools make about the data that they process, and the extent to which the data to be
processed is consistent with those assumptions. Important areas in which assumptions may exist include:
a. the presence of values in relevant data-items
b. the presence of only specific, pre-defined values in relevant data-items
c. the scales against which relevant data-items have been measured
d. the precision with which relevant data-items have been expressed

3.4 The Suitability of the Tool and the Data


Demonstrate the applicability of each particular data analytic tool to the particular data that it is proposed be processed using it

DON’Ts

3.5 Inappropriate Data


Don’t apply data analytics unless the data satisfies threshold tests commensurate with the potential impact of the inferences
drawn, in relation to data quality, internal consistency, and reliable correspondence with the real-world systems about which
inferences are to be drawn

3.6 Humanly-Understandable Rationale


Don’t apply an analytical tool that lacks transparency, by which is meant that the rationale for inferences that it draws is expressible
in humanly-understandable terms

4. Use of the Inferences

DO’s

4.1 The Impacts


Understand the potential negative impacts on stakeholders of reliance on the inferences drawn, taking into account the quality of
the data and the data analysis process

4.2 Evaluation
Where decisions based on inferences from data analytics may have material negative impacts, evaluate the advantages and
disadvantages of proceeding, by conducting cost-benefit analysis and risk assessment from an organisational perspective, and
impact assessments from the perspectives of other internal and external stakeholders

4.3 Reality Testing


Test a sufficient sample of the results of the analysis against the real world, in order to gain insight into the reliability of the data as a
representation of relevant real-world entities and their attributes

4.4 Safeguards
Design, implement and maintain safeguards and mitigation measures, together with controls that ensure the safeguards and
mitigation measures are functioning as intended, commensurate with the potential impacts of the inferences drawn

4.5 Proportionality
Where specific decisions based on inferences from data analytics may have material negative impacts on individuals, consider the
reasonableness of the decisions prior to committing to them

4.6 Contestability
Where actions are taken based on inferences drawn from data analytics, ensure that the rationale for the decisions is transparent
to people affected by them, and that mechanisms exist whereby stakeholders can access information about, and if appropriate
complain about and dispute interpretations, inferences, decisions and actions

4.7 Breathing Space


Provide stakeholders who perceive that they will be negatively impacted by the action with the opportunity to understand and to
contest the proposed action

4.8 Post-Implementation Review


Ensure that actions and their outcomes are audited, and that adjustments are made to reflect the findings

DON’Ts

4.9 Humanly-Understandable Rationale


Don’t take actions based on inferences drawn from an analytical tool in any context that may have a material negative impact on
any stakeholder unless the rationale for each inference is readily available to those stakeholders in humanly-understandable terms

4.10 Precipitate Actions


Don’t take actions based on inferences drawn from data analytics until stakeholders who perceive that they may be materially
negatively impacted by the action have had a reasonable opportunity to understand and to contest the proposed action. Denial of a
reasonable opportunity is only justifiable on the basis of emergency, as distinct from urgency or mere expediency or efficiency. Where
a reasonable opportunity is not provided, ensure that stringent safeguards, mitigation measures and controls are designed,
implemented and maintained in relation to justification, reporting, review, and recourse in the case of unjustified or disproportionate
actions

4.11 Automated Decision-Making


Don’t delegate to a device any decision that has potentially harmful effects without ensuring that it is subject to specific human
approval prior to implementation, by a person who is acting as an agent for the accountable organisation
computer law & security review 34 (2018) 467–476 473

successive processes involved – acquisition of the data, analy- positive outcomes for organisations in terms of the quality of
sis of the data in order to draw inferences, and use of the work performed, and particularly by providing a means of de-
inferences. The first segment specifies generic requirements fending against and deflecting negative media reports, public
that apply across all of the phases. concerns about organisational actions, and acts by any regu-
Each Guideline is expressed in imperative mode, some in lator that may have relevant powers. In practice, however, such
the positive and others in the negative. However, they are not Codes are applied by only a proportion of the relevant
statements of law, nor are they limited to matters that are organisations, are seldom taken very seriously (such as by em-
subject to legal obligations. They are declarations of what is bedding them within corporate policies, procedures, training
needed in order to manage the risks arising from data quality programs and practices), are unenforceable, and generally offer
issues, data meaning uncertainties, incompatibilities in data very limited benefits to external stakeholders. Nonetheless,
meaning among similar data-items sourced from different some modest improvements would be likely to accrue from
data-collections, misinterpretations of meaning, mistakes in- adoption, perhaps at the level of symbolism, but more likely
troduced by data scrubbing, approaches taken to missing data as a means of making it more difficult for data analytics issues
that may solve some problems but at the cost of creating or to be ignored.
exacerbating others, erroneous matches, unjustified assump- Individual organisations can take positive steps beyond such,
tions about the scale against which data has been measured, largely nominal, industry sector arrangements. They can embed
inappropriate applications of analytical tools, lack of review, consideration of the factors identified in these Guidelines into
and confusions among correlation, causality, predictive power their existing business case, cost/benefit and/or risk assess-
and normative force. ment and management processes. In order to fulfil their
The organisations and individuals to whom each Guide- corporate social responsibility commitments, they can also
line is addressed will vary depending on the context. In some evaluate proposed uses of data analytics from the perspec-
circumstances, a single organisation, a single small team within tives of external stakeholders. A very narrow and inadequate
an organisation, or even a single individual, might perform all approach to this merely checks legal compliance, as occurs with
of the activities involved. On the other hand, multiple teams the pseudo-PIA processes conventional in the public sector
within one organisation, or across multiple organisations, may throughout much of North America (Clarke, 2011 s.4), and in
perform several of the activities. the new European ‘Data Protection Impact Assessment’ (DPIA)
The Guidelines are intended to be comprehensive. As a mechanism (Clarke, 2017a). Much more appropriately, a com-
result, in any particular context, some of them will be prehensive Privacy Impact Assessment can be performed
redundant, and some would be more usefully expressed some- (Clarke, 2009; Wright and de Hert, 2012). In some circum-
what differently. In particular, some of the statements are stances, a much broader social impact assessment is warranted
primarily relevant to data that refers to an individual human (Raab and Wright, 2012; Wright and Friedewald, 2013). Raab &
being. Such statements may be irrelevant, or may benefit from Wright (2012 pp. 379–381) calls for extension of the scope of
re-phrasing, where the data relates to inanimate parts of the PIAs firstly to a wide range of impacts on the individual’s “re-
physical world (e.g. meteorological, geophysical, vehicular traffic lationships, positions and freedoms”, then to “impacts on groups
or electronic traffic data), or to aggregate economic or social and categories”, and finally to “impacts on society and the po-
phenomena. In such circumstances, careful sub-setting and ad- litical system”.
aptation of the Guidelines is appropriate. A further step that individual organisations can take is to
enter into formal undertakings to comply with a Code, com-
bined with submission to the decisions of a complaints body,
ombudsman or tribunal that is accessible by any aggrieved party
4. Ways to apply the Guidelines that has the resources to conduct investigations, that has en-
forcement powers, and that uses them. Unfortunately, such
These Guidelines, in their current or some adapted form, can arrangements are uncommon, and it is not obvious that suit-
be adopted by any organisation. Staff and contractors can be able frameworks exist within which an enforceable Code along
required to demonstrate that their projects are compliant, or, the lines of these Guidelines could be implemented.
to the extent that they are not, to explain why not. In prac- Another possibility is for a formal and sufficiently precise
tice, adoption may be driven by staff and contractors, because Standard to be established, and for this to be accepted by courts
many practitioners are concerned about the implications of their as the measuring-stick against which the behaviour of
work, and would welcome the availability of an instrument that organisations that conduct data analytics is to be measured.
enables them to raise issues in the context of project risk A loose mechanism of this kind is declaration by an organisation
management. that it is compliant with a particular published Standard. In
Organisational self-regulation of this kind has the capac- principle, this would appear to create a basis for court action
ity to deliver value for the organisation and for shareholders, by aggrieved parties. In practice, however, it appears that such
but it has only a mediocre track-record in benefiting stake- mechanisms are seldom effective in protecting either inter-
holders outside the organisation. A stronger institutional nal or external stakeholders.
framework is needed if preventable harm arising from inap- As discussed earlier, some documents exist that at least
propriate data, analysis and use is to be avoided. purport to provide independent guidance in relation to data
Industry associations can adopt or adapt the Guidelines, as analytics activities. These Guidelines can be used as a yard-
can government agencies that perform oversight functions. In- stick against which such documents can be measured. The
dustry regulation through a Code of Practice may achieve some UK Cabinet Office’s ‘Data Science Ethical Framework’ (UKCO,
474 computer law & security review 34 (2018) 467–476

2016) was assessed against an at-that-time-unformalised applications of the ideas exist, nor that all data collections are
version of these Guidelines, and found to be seriously wanting of such low quality that no useful inferences can be drawn from
(Raab and Clarke, 2016). For different reasons, and in differ- them, nor that all mergers of data from multiple sources are
ent ways, the Council of Europe document (CoE 2017) falls a necessarily logically invalid or necessarily deliver fatally flawed
very long way short of what is needed by professionals and consolidated data-sets, nor that all data scrubbing fails to clean
the public alike as a basis for responsible use of data analyt- data, nor that all data analytics techniques make assump-
ics. The US Government Accountability Office has identified tions about data that can under no circumstances be justified,
the existence of “possible validity problems in the data and nor that all inferences drawn must be wrong. Expressed in the
models used in [data analytics and innovation efforts – DAI]” positive, some big data has potential value, and some appli-
(GAO, 2016, p. 38), but has done nothing about them. An cations of data analytics techniques are capable of realising
indication of the document’s dismissiveness of the issues is that potential.
this quotation: “In automated decision making [using machine What this paper has done is to identify a very large fleet
learning], monitoring and assessment of data quality and of challenges that have to be addressed by each and every spe-
outcomes are needed to gain and maintain trust in DAI pro- cific proposal for the expropriation of data, the re-purposing
cesses” (p.13, fn.8). Not only does the statement appear in a of data, the merger of data, the scrubbing of data, the appli-
mere footnote, but the concern is solely about ‘trust’ and not cation of data analytics to it, and the use of inferences drawn
at all about the appropriateness of the inferences drawn, the from the process in order to make, or even guide, let alone
actions taken as a result of them, or the resource efficiency explain, decisions and action that affect the real world. Further,
and equitability of those actions. The current set of docu- it is far from clear that measures are being adopted to meet
ments from the US National Institute of Standards and these challenges.
Technology (NIST, 2015) is also remarkably devoid of discus- Ill-advised applications of data analytics are preventable by
sion about data quality and process quality, and offers no applying the Guidelines proposed in this paper. As the ‘big data’
process guidance along the lines of the Guidelines proposed mantra continues to cause organisations to have inflated ex-
in this paper. pectations of what data analytics can deliver, both shareholders
Another avenue whereby progress can be achieved is through and external stakeholders need constructive action to be taken
adoption by the authors of text-books. At present, leading texts in order to get data analytics practices under control, and avoid
commonly have a brief, excusatory segment, usually in the first erroneous business decisions, loss of shareholder value, in-
or last chapter. Curriculum proposals commonly suffer the same appropriate policy outcomes, and unjustified harm to individual,
defect, e.g. Gupta et al. (2015), Schoenherr and Speier-Pero (2015). social, economic and environmental values. The Guidelines pro-
Course-designers appear to generally follow the same pattern, posed in this paper therefore provide a basis for the design of
and schedule a discussion or a question in an assignment, organisational and regulatory processes whereby positive ben-
which represents a sop to the consciences of all concerned, efits can be gained from data analytics, but undue harm
but does almost nothing about addressing the problems, and avoided.
nothing about embedding solutions to those problems within
the analytics process. It is essential that the specifics of the
Guidelines in Table 2 be embedded in the structure of text-
books and courses, and that students learn to consider each Acknowledgement
issue at the point in the acquisition/analysis/use cycle at which
each challenge needs to be addressed. The author received valuable feedback from Prof. Louis de Koker
None of these approaches is a satisfactory substitute for leg- of La Trobe University, Melbourne, David Vaile and Dr. Lyria
islation that places formal obligations on organisations that Bennett Moses of UNSW, Sydney, Dr. Kerry Taylor of the ANU,
apply data analytics, and that provides aggrieved parties with Canberra, Dr. Kasia Bail of the University of Canberra, Prof.
the capacity to sue organisations where they materially breach Charles Raab of Edinburgh University, and an anonymous re-
requirements and there are material negative impacts. Such viewer. Evaluative comments are those of the author alone.
a scheme may be imposed by an activist legislature, or a regu-
latory framework may be legislated and the Code negotiated
with the relevant parties prior to promulgation by a del-
REFERENCES
egated agency. It is feasible for organisations themselves to
submit to a parliament that a co-regulatory scheme of such a
kind should be enacted, for example where scandals arise from
ACM. Statement on algorithmic transparency and accountability.
inappropriate use of data analytics by some organisations, Association for Computing Machinery; 2017. Available from:
which have a significant negative impact on the reputation of https://www.acm.org/binaries/content/assets/public-policy/
an industry sector as a whole. 2017_usacm_statement_algorithms.pdf. [Accessed November
24, 2017].
Agrawal D, Philip Bernstein P, Bertino E, Davidson S, Dayal U,
Franklin M, Gehrke J, et al. Challenges and opportunities with
5. Conclusions big data 2011-1. Cyber Center Technical Reports, Paper 1; 2011.
Available from: http://docs.lib.purdue.edu/cctech/1. [Accessed
November 24, 2017].
This paper has not argued that big data and big data analyt- Anderson C. The end of theory: the data deluge makes the
ics are inherently evil. It has also not argued that no valid scientific method obsolete. Wired Magazine 16:07; 2008.
computer law & security review 34 (2018) 467–476 475

ASA. Ethical guidelines for statistical practice. American DoFD. Better practice guide for big data. Australian Dept of
Statistical Association; 2016. Available from: http:// Finance & Deregulation, v.2; 2015. Available from: http://
www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for www.finance.gov.au/sites/default/files/APS-Better-Practice
-Statistical-Practice.aspx. [Accessed November 24, 2017]. -Guide-for-Big-Data.pdf. [Accessed November 24, 2017].
Bollier D. The promise and peril of big data. The Aspen Institute; Domingos P. A few useful things to know about machine
2010. Available from: https://www.emc.co.tt/collateral/ learning. Commun ACM 2012;55(10):78–87.
analyst-reports/10334-ar-promise-peril-of-big-data.pdf. Dreyfus H. What computers still can’t do. MIT Press; 1992.
[Accessed November 24, 2017]. DSA. Data science code of professional conduct. Data Science
Bostrom N. Superintelligence: paths, dangers, strategies. Oxford Association, undated but apparently of 2016; 2016. Available
University Press; 2014. from: http://www.datascienceassn.org/sites/default/files/
Boyd D, Crawford K. Six provocations for big data. Proc. datasciencecodeofprofessionalconduct.pdf. [Accessed
Symposium on the Dynamics of the Internet and Society; November 24, 2017].
2011. Available from: http://ssrn.com/abstract=1926431. GAO. Emerging opportunities and challenges data and analytics
[Accessed November 24, 2017]. innovation. Government Accountability Office, Washington
Broeders D, Schrijvers E, van der Sloot B, van Brakel R, de Hoog J, DC; 2016. Available from: http://www.gao.gov/assets/680/
Ballina EH. Big data and security policies: towards a 679903.pdf. [Accessed November 24, 2017].
framework for regulating the phases of analytics and use of Gupta B, Goul M, Dinter B. Business intelligence and big data in
big data. Comput Law Secur Rev 2017;33:309–23. higher education: status of a multi-year model curriculum
Burrell J. How the machine ‘thinks’: understanding opacity in development effort for business school undergraduates, MS
machine learning algorithms. Big Data Soc 2016;3(1):1–12. graduates, and MBAs. Commun Assoc Inf Syst 2015;
Cai L, Zhu Y. The challenges of data quality and data quality 36(23):Available from: https://www.researchgate.net/profile/
assessment in the big data era. Data Sci J 2015;14(2):1–10. Babita_Gupta4/publication/274709810_Communications
Available from: https://datascience.codata.org/articles/ _of_the_Association_for_Information_Systems/links/
10.5334/dsj-2015-002/. 557ecd4b08aeea18b7795225.pdf.
Cao L. Data science: a comprehensive overview. ACM Hanson R. This AI boom will also bust. Overcoming Bias Blog;
Computing Surveys; 2017. Available from: http://dl.acm 2016. Available from: http://www.overcomingbias.com/2016/
.org/ft_gateway.cfm?id=3076253&type=pdf. [Accessed 12/this-ai-boom-will-also-bust.html. [Accessed November 24,
November 24, 2017]. 2017].
Clarke R. A contingency approach to the software generations. Haryadi AF, Hulstijn J, Wahyudi A, van der Voort H, Janssen M.
Database 1991;22(3):23–34. PrePrint available from: http:// Antecedents of big data quality: an empirical examination in
www.rogerclarke.com/SOS/SwareGenns.htmlSummer 1991. financial service organizations. Proc. IEEE Int’l Conf. on Big
Clarke R. Dataveillance by governments: the technique of Data; 2016. pp. 116–21. Available from: https://pure.tudelft.nl/
computer matching. Inf Tech People 1994;7(2):46–85. PrePrint portal/files/13607440/Antecedents_of_Big_Data_Quality
available from: http://www.rogerclarke.com/DV/MatchIntro _IEEE2017_author_version.pdf. [Accessed November 24, 2017].
.html. Hazen BT, Boone CA, Ezell JD, Jones-Farmer LA. Data quality for
Clarke R. Privacy impact assessment: its origins and data science, predictive analytics, and big data in supply
development. Comput Law Secur Rev 2009;25(2):123–35. chain management: an introduction to the problem and
PrePrint available from: http://www.rogerclarke.com/DV/ suggestions for research and applications. Int J Prod Econ
PIAHist-08.html. 2014;154:72–80. Available from: https://www.researchgate.net/
Clarke R. An evaluation of privacy impact assessment guidance profile/Benjamin_Hazen/publication/261562559_Data_Quality
documents. Int Data Priv Law 2011;1(2):111–20. PrePrint _for_Data_Science_Predictive_Analytics_and_Big_Data_in
available from: http://www.rogerclarke.com/DV/PIAG-Eval _Supply_Chain_Management_An_Introduction_to_the
.html. _Problem_and_Suggestions_for_Research_and_Applications/
Clarke R. Big data, big risks. Inf Syst J 2016a;26(1):77–90. links/0deec534b4af9ed874000000.
PrePrint available from: http://www.rogerclarke.com/EC/ Huh YU, Keller FR, Redman TC, Watkins AR. Data quality. Inf
BDBR.html. Softw Tech 1990;32(8):559–65.
Clarke R. Quality assurance for security applications of big data. ICO. Big data, artificial intelligence, machine learning and data
Proc. European Intelligence and Security Informatics protection. UK Information Commissioner’s Office, Discussion
Conference (EISIC), Uppsala, 17–19 August 2016; 2016b. Paper v.2.2; 2017. Available from: https://ico.org.uk/for-
PrePrint available from: http://www.rogerclarke.com/EC/ organisations/guide-to-data-protection/big
BDQAS.html. [Accessed November 24, 2017]. -data/. [Accessed November 24, 2017].
Clarke R. The distinction between a PIA and a Data Protection Jagadish HV. Big data and science: myths and reality. Big Data
Impact Assessment (DPIA) under the EU GDPR. Working Res 2015;2(2):49–52.
Paper, Xamax Consultancy Pty Ltd; 2017a. Available from: Jagadish HV, Gehrke J, Labrinidis A, Papakonstantinou Y, Patel
http://www.rogerclarke.com/DV/PIAvsDPIA.html. [Accessed JM, Ramakrishnan R, et al. Big data and its technical
November 24, 2017]. challenges. Commun ACM 2014;57(7):86–94.
Clarke R. Big data prophylactics, chapter 1. In: Lehmann A, Kandel S, Heer J, Plaisant C, Kennedy J, van Ham F, Henry-Riche
Whitehouse D, Fischer-Hübner S, Fritsch L, Raab C, editors. N, et al. Research directions for data wrangling: visualizations
Privacy and identity management. Facing up to next steps. and transformations for usable and credible data.
Springer; 2017b. p. 3–14 [chapter 1]. PrePrint available from: Information Visualization 10. 4; 2011. 271–88. Available from:
http://www.rogerclarke.com/DV/BDP.html. https://idl.cs.washington.edu/files/2011-DataWrangling-
CoE. Guidelines on the protection of individuals with IVJ.pdf. [Accessed November 24, 2017].
regard to the processing of personal data in a world of Katz Y. Noam Chomsky on where artificial intelligence went
big data. Convention 108 Committee, Council of Europe; wrong: an extended conversation with the legendary linguist.
2017. Available from: https://rm.coe.int/ The Atlantic; 2012. Available from: https://www.theatlantic
CoERMPublicCommonSearchServices/ .com/technology/archive/2012/11/noam-chomsky-on-where
DisplayDCTMContent?documentId=09000016806ebe7a. -artificial-intelligence-went-wrong/261637/. [Accessed
[Accessed November 24, 2017]. November 24, 2017].
476 computer law & security review 34 (2018) 467–476

King NJ, Forder J. Data analytics and consumer profiling: finding Raab C, Clarke R. Inadequacies in the UK’s data science ethical
appropriate privacy principles for discovered data. Comput framework. Euro Data Protect L 2016;2(4):555–60. PrePrint
Law Secur Rev 2016;32:696–714. available from: http://www.rogerclarke.com/DV/DSEFR.html.
Knight W. The dark secret at the heart of AI. 11 April 2017, MIT Raab CD, Wright D, de Hert P. editors. Surveillance: extending the
Technology Review; 2017. Available from: https:// limits of privacy impact assessment. 2012. p. 363–83 [Ch. 17].
www.technologyreview.com/s/604087/the-dark-secret-at-the Rahm E, Do HH. Data cleaning: problems and current
-heart-of-ai/. [Accessed November 24, 2017]. approaches. IEEE Data Eng Bull 2000;23:Available from: http://
Kurzweil R. The singularity is near: when humans transcend dc-pubs.dbs.uni-leipzig.de/files/
biology. Viking; 2005. Rahm2000DataCleaningProblemsand.pdf.
Lazer D, Kennedy R, King G, Vespignani A. The parable of Google Rivers CM, Lewis BL. Ethical research standards in a world of big
flu: traps in big data analysis. Science 2014;343(6176):1203–5. data. F1000Res 2014;3:38. Available from: https://
Available from: https://dash.harvard.edu/bitstream/handle/1/ f1000research.com/articles/3-38.
12016836/The%20Parable%20of%20Google%20Flu%20%28WP Rosebrock A. Get off the deep learning bandwagon and get some
-Final%29.pdf. perspective. PY Image Search; 2014. Available from: https://
Lipton ZC. (Deep Learning’s Deep Flaws)’s Deep Flaws. KD www.pyimagesearch.com/2014/06/09/get-deep
Nuggets; 2015. Available from: http://www.kdnuggets.com/ -learning-bandwagon-get-perspective/. [Accessed November
2015/01/deep-learning-flaws-universal-machine-learning 24, 2017].
.html. [Accessed November 24, 2017]. Saha B, Srivastava D. Data quality: The other face of big data.
Mayer-Schonberger V, Cukier K. Big data, a revolution that Proc. Data Engineering (ICDE); 2014. pp. 1294–7. Available
will transform how we live, work and think. John Murray; from: https://people.cs.umass.edu/~barna/paper/ICDE
2013. -Tutorial-DQ.pdf. [Accessed November 24, 2017].
McFarland DA, McFarland HR. Big data and the danger of being Schoenherr T, Speier-Pero C. Data science, predictive
precisely inaccurate. Big Data Soc 2015;2(2):1–4. analytics, and big data in supply chain management:
Merino J, Caballero I, Bibiano R, Serrano M, Piattini M. A data current state and future potential. J Bus Logist 2015;36(1):
quality in use model for big data. Fut Gen Comput Syst 120–32. Available from: http://www.logisticsexpert.org/
2016;63:123–30. top_articles/2016/2016%20-%20Research%20-%20JBL%20
Metcalf J, Crawford K. Where are human subjects in big data -%20Data%20Science,%20Predictive%20Analytics,%20and%
research? The emerging ethics divide. Big Data Soc 20Big%20Data%20in%20Supply%20Chain%20Managementl
2016;3(1):1–14. .pdf.
Mittelstadt BD, Allo P, Taddeo M, Wachter S, Floridi L. The ethics Shanks G, Darke P. Understanding data quality in a data
of algorithms: mapping the debate. Big Data Soc 2016;3(2):1– warehouse. Aust Comput J 1998;30:122–8.
21. Swoyer S. The shortcomings of predictive analytics. TDWI; 2017.
Moravec H. Robot: mere machine to transcendent mind. Oxford Available from: https://tdwi.org/articles/2017/03/08/
University Press; 2000. shortcomings-of-predictive-analytics.aspx. [Accessed
Müller H, Freytag J-C. Problems, methods and challenges in November 24, 2017].
comprehensive data cleansing. Technical Report HUB-IB-164, UKCO. Data science ethical framework. U.K. Cabinet Office, v.1.0;
Humboldt-Universität zu Berlin, Institut für Informatik; 2003. 2016. Available from: https://www.gov.uk/government/
Available from: http://www.informatik.uni-jena.de/dbis/lehre/ publications/data-science-ethical-framework. [Accessed
ss2005/sem_dwh/lit/MuFr03.pdf. [Accessed November 24, November 24, 2017].
2017]. UNSD. Declaration of professional ethics. United Nations
Müller O, Junglas I, vom Brocke1 J, Debortoli S. Utilizing big data Statistical Division; 1985. Available from: http://
analytics for information systems research: challenges, unstats.un.org/unsd/dnss/docViewer.aspx?docID=93#start.
promises and guidelines. Eur J Inf Syst 2016;25(4):289–302. [Accessed November 24, 2017].
Available from: https://www.researchgate.net/profile/ Wang RY, Strong DM. Beyond accuracy: what data quality means
Oliver_Mueller5/publication/290973859_Utilizing_Big_Data to data consumers. J Manag Inf Syst 1996;12(4):5–33. Spring,
_Analytics_for_Information_Systems_Research_Challenges 1996.
_Promises_and_Guidelines/links/56ec168f08aee4707a384fff/ Wigan MR, Clarke R. Big data’s big unintended consequences.
Utilizing-Big-Data-Analytics-for-Information-Systems IEEE Comput 2013;46(6):46–53. PrePrint available from: http://
-Research-Challenges-Promises-and-Guidelines.pdf. www.rogerclarke.com/DV/BigData-1303.html.
NIST. NIST big data interoperability framework. Special WP29. Statement of the WP29 on the impact of the development
Publication 1500-1, v.1, National Institute of Standards and of big data on the protection of individuals with regard to the
Technology; 2015. Available from: https://bigdatawg.nist.gov/ processing of their personal data in the EU. Article 29 Working
V1_output_docs.php. [Accessed November 24, 2017]. Party, European Union; 2014. Available from: http://
OAIC. Consultation draft: guide to big data and the Australian ec.europa.eu/justice/data-protection/article-29/
Privacy Principles. Office of the Australian Information documentation/opinion-recommendation/files/2014/wp221
Commissioner; 2016. Available from: https://www.oaic.gov.au/ _en.pdf. [Accessed November 24, 2017].
engage-with-us/consultations/guide-to-big-data-and-the Wright D, de Hert P, editors. Privacy impact assessments.
-australian-privacy-principles/consultation-draft-guide-to-big Springer; 2012.
-data-and-the-australian-privacy-principles. [Accessed Wright D, Friedewald M. Integrating privacy and ethical impact
November 24, 2017]. assessments. Sci Public Policy 2013;40(6):755–66. Available
Piprani B, Ernst D. A model for data quality assessment. Proc. from: http://spp.oxfordjournals.org/content/40/6/755.full.
OTM Workshops (5333); 2008. pp 750–9. Zook M, Barocas S, boyd d, Crawford K, Keller E, Gangadharan SP,
Press G. A very short history of data science. Forbes; 2013. Goodman A, et al. Ten simple rules for responsible big data
Available from: https://www.forbes.com/sites/gilpress/2013/ research. PLoS Comput Biol 2017;13(3):Available from: https://
05/28/a-very-short-history-of-data-science/#375c75e355cf. www.ncbi.nlm.nih.gov/pmc/articles/PMC5373508/. [Accessed
[Accessed November 24, 2017]. November 24, 2017].
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Roger Clarke's Web-Site © Xamax Consultancy Pty Ltd, 1995-2018 Search

Information Dataveillance Identity Waltzing Advanced Site-Search


Home eBusiness Other Topics What's New
Infrastructure & Privacy Matters Matilda

Guidelines for the Responsible Business Use of AI


Foundational Working Paper
Stable Version of 3 October 2018
(Added ss 5.1 and 5.9, additional citations, editorials, re-formatting)
(Substantial further development of s.7, addition of Table 4 and Appendices)

Roger Clarke **

© Xamax Consultancy Pty Ltd, 2018

Available under an AEShareNet licence or a Creative Commons licence.

This document is at http://www.rogerclarke.com/EC/GAIF.html

Abstract
Organisations across the private and public sectors are looking to use artificial intelligence (AI) techniques not only to draw inferences, but also to make decisions and take
action, and even to do so autonomously. This is despite the absence of any means of programming values into technologies and artefacts, and the obscurity of the rationale
underlying inferencing using contemporary forms of AI.

To what extent is AI really suitable for real-world applications? Can corporate executives satisfy their board-members that the business is being managed appropriately if
AI is inscrutable? Beyond operational management, there are compliance risks to manage, and threats to important relationships with customers, staff, suppliers and the
public. Ill-advised uses of AI need to be identified in advance and nipped in the bud, to avoid harm to important values, both corporate and social. Organisations need to
extract the achievable benefits from advanced technologies rather than dreaming dangerous dreams.

This working paper first considers several approaches to addressing the gap between the current round of AI marketing hype and the hard-headed worlds of business and
government. It is first proposed that AI needs to be re-conceived as 'complementary intelligence', and that the robotics notion of 'machines that think' needs to give way to
the idea of 'intellectics', with the focus on 'computers that do'.

A review of 'ethical analysis' of IT's impacts extracts little of value. A consideration of regulatory processes proves to be of more use, but to still deliver remarkably little
concrete guidance. It is concluded that the most effective approach for organisations to take is to apply adapted forms of the established techniques of risk assessment and
risk management. Critically, stakeholder analysis needs to be performed, and risk assessment undertaken, from those perspectives as well as from that of the organisation
itself. This Working Paper's final contribution is to complement that customised form of established approaches to risk by the presentation of a derivative set of Principles
for Responsible AI, with indications provided of how those Principles can be operationalised for particular forms of complementary intelligence and intellectics.

Contents
1. Introduction
2. Rethinking AI
2.1 'AI' cf. 'Complementary Intelligence'
2.2 Autonomy
2.3 Technology, Artefacts, Systems and Applications
3. Contemporary AI
3.1 Robotics
3.2 Cyborgisation
3.3 'AI / ML'
3.4 Intellectics
4. Ethics
5. Regulation
6. A Practical Approach
6.1 Corporate Risk Assessment
6.2 Stakeholder Risk Assessment
6.3 Comprehensive Risk Management
7. Towards Operational Principles
8. Conclusions
References
Supporting Materials:
Ethical Principles and Information Technology
Principles for AI: A SourceBook
Appendix 1: 50 Principles for Responsible AI Technologies, Artefacts, Systems and Applications
Appendix 2: Omitted Elements

1. Introduction
The term Artificial Intelligence (AI) was coined in 1955 in a proposal for the 1956 Dartmouth Summer Research Project in Automata (McCarthy et al. 1955). The proposal
was based on "the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to
simulate it". Histories of AI (e.g. Russell & Norvig 2009, pp. 16-28) identify multiple strands, but also multiple re-visits to much the same territory, and a considerable

http://rogerclarke.com/EC/GAIF.html Page 1 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

degree of creative chaos.

The over-enthusiasm that characterises the promotion of AI has deep roots. Simon (1960) averred that "Within the very near future - much less than twenty-five years - we
shall have the technical capability of substituting machines for any and all human functions in organisations. ... Duplicating the problem-solving and information-handling
capabilities of the brain is not far off; it would be surprising if it were not accomplished within the next decade". Over 35 years later, with his predictions abundantly
demonstrated as being fanciful, Simon nonetheless maintained his position, e.g. "the hypothesis is that a physical symbol system [of a particular kind] has the necessary
and sufficient means for general intelligent action" (Simon 1996, p. 23 - but expressed in similar terms from the late 1950s, in 1969, and through the 1970s), and "Human
beings, viewed as behaving systems, are quite simple" (p. 53). Simon acknowledged "the ambiguity and conflict of goals in societal planning" (p. 140), but his subsequent
analysis of complexity (pp. 169-216) considered only a very limited sub-set of the relevant dimensions. Much the same dubious assertions can be found in, for example,
Kurzweil (2005): "by the end of the 2020s" computers will have "intelligence indistinguishable to biological humans" (p.25), and in self-promotional documents of the
current decade.

AI has offered a long litany of promises, many of which have been repeated multiple times, on a cyclical basis. Each time, proponents have spoken and written excitedly
about prospective technologies, using descriptions that not merely verged into the mystical, but often crossed the border into the realms of magic and alchemy. Given the
habituated exaggerations that proponents indulge in, it is unsurprising that the field has exhibited cyclical 'boom and bust' patterns, with research funding being sometimes
very easy to obtain, and sometimes very difficult, depending on whether the focus at the time is on the hyperbole or on the very low delivery-rate against promises.

Part of AI's image-problem is that most of the successes deriving from what began as AI research have shed the name, and become associated with other terms. For
example, pattern recognition, variously within text, speech and two-dimensional imagery, has made a great deal of progress, and achieved application in multiple fields, as
diverse as dictation, vehicle number-plate recognition and object and facial recognition. Expert systems approaches, particularly based on rule-sets, have also achieved a
degree of success. Game-playing, particularly of chess and go, have surpassed human-expert levels, and provided entertainment value and spin-offs, but seem not to have
provided the breakthroughs towards posthumanism that their proponents appeared to be claiming for them.

This Working Paper concerns itself with the question of how organisations can identify AI technologies that have practical value, and apply them in ways that achieve
benefits, without incurring disproportionate disbenefits or giving rise to unjustified risks. A key feature of AI successes to date appears to be that, even where the
technology or its application is complex, it is understandable by people with appropriate technical background, i.e. it is not magic and is not presented as magic, and its
applications are auditable. AI technologies that have been effective have been able to be empirically tested in real-world contexts, but under sufficiently controlled
conditions that the risks have been able to be managed.

The scope addressed in this Working Paper is very broad, in terms of both technologies and applications, but it does not encompass design and use for warfare or armed
conflict. It does, however, include applications to civil law enforcement and domestic national security, i.e. safeguards for the public, for infrastructure, and for public
figures.

This working paper commences by considering interpretations of the AI field that may contribute to overcoming its problems and assist in analysing the opportunities and
threats that it embodies. Brief scans are undertaken of current technologies that are within the field of view. There are several possible sources of guidance in relation to the
responsible use of AI. The paper first considers ethics, and then regulatory regimes. It proposes, however, that the most useful approach is through risk assessment and
management processes, but expanding the perspectives from solely that of the organisation itself to also embrace those of stakeholders. The final section draws on the
available sources in order to propose a set of principles for the responsible application of AI that are specific enough to guide organisations' business processes.

2. Rethinking AI
A major contributor to AI's problems has been the diverse and often conflicting conceptions of what it is, and what it is trying to achieve. The first necessary step is to
disentangle the key ideas, and adopt an interpretation that can assist user organisations to appreciate the nature of the technology, and then analyse its potential
contributions and downsides.

2.1 'AI' cf. 'Complementary Intelligence'


What does, what could, and what should 'intelligence' mean? What does 'artificial' mean? And are the conventional interpretations of these terms useful to individual
organisations, and to the economy and society more generally?

The general sense in which the term 'intelligence' is used by the AI community is that an entity exhibits intelligence if it has perception and cognition of (relevant aspects
of) its environment, has goals, and formulates actions towards the achievement of those goals (Albus 1991, Russell & Norvig 2003, McCarthy 2007). Some AI proponents
strive to replicate in artefacts the processes whereby human entities exhibit intelligence, whereas others define AI in terms of the artefact's performance rather than the
means whereby the performance arises.

The term 'artificial' has always been problematic. The originators of the term used it to mean 'synthetic', in the sense of being human-made but equivalent to human. It is far
from clear that there was a need for yet more human intelligence in 1955, when there were 2.8 billion people, let alone now, when there are over 7 billion of us, many
under-employed and likely to remain so.

Some proponents have shifted away from human-equivalence, and posited that AI is synthetic, but in some way 'superior-to-human'. This raises the question as to how
superiority is to be measured. For example, is playing a game better than human experts necessarily a useful measure? There is also a conundrum embedded in this
approach: if human intelligence is inferior, how can it reliably define what 'superior-to-human' means?

An alternative approach may better describe what humankind needs. An idea that is traceable at least to Wyndham (1932) is that " ... man and machine are natural
complements: They assist one another". I argued in Clarke (1989) that there was a need to "deflect the focus ... toward the concepts of 'complementary intelligence' and
'silicon workmates' ... to complement human strengths and weaknesses, rather than to compete with them". Again, in Clarke (1993), reprised in Clarke (2014b), I reasoned
that: "Because robot and human capabilities differ, for the foreseeable future at least, each will have specific comparative advantages. Information technologists must
delineate the relationship between robots and people by applying the concept of decision structuredness to blend computer-based and human elements advantageously".

Adopting this approach, AI needs to be re-conceived such that its purpose is to extend human capabilities. Rather than 'artificial' intelligence, the design objective needs to
be 'complementary' intelligence, the essence of which is:

1. to do things well that humans do badly or cannot do at all; and


2. to function as elements within systems that include both humans and artefacts, with effective, efficient and adaptable interfacing among them all.

An important category of 'complementary intelligence' is the use of negative-feedback mechanisms to achieve automated equilibration within human-made systems. A
longstanding example is the maintenance of ship trim and stability by means of hull shape and careful weight distribution, including ballast. A more commonly celebrated
instance is Watts' fly-ball governor for regulating the pressure in a boiler. Of more recent origin are schemes to achieve real-time control over the orientation of craft
floating in fluids, and maintenance of their location or path. There are successful applications to deep-water oil-rigs, underwater craft, and aircraft both with and without
pilots on board. The notion is also exemplified by the distinction between decision support systems (DSS), which are designed to assist humans make decisions, and
decision systems (DS), whose purpose is to make the decisions without human involvement.

http://rogerclarke.com/EC/GAIF.html Page 2 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Computer-based systems have a clear advantage over humans in contexts in which significant computation is involved, reliability and accuracy are important, and speed of
inferencing, decision-making and/or action-taking, are important. This advantage is, however, limited to circumstances in which either a structured process exists or
heuristics or purely empirical techniques have been well-demonstrated to be effective.

Further advantages may arise in relation to cost, the delegation to devices of boringly mundane tasks, and the performance by artefacts of tasks that are inherently
dangerous, or that need to be performed in environments that are inherently dangerous to humans and/or are beyond their physical capabilities (e.g. environments that
feature high pressure such as deep water, low pressure such as space, or high radiation levels both in space and close to nuclear materials). Even where such superiority can
be demonstrated, however, the need exists to focus discussion about AI on complementary intelligence, on technologies that augment human capabilities, and on systems
that feature collaboration between humans and artefacts.

I contend that the use of the complementary intelligence notion can assist organisations in their efforts to distinguish uses of AI that have prospects for adoption, the
generation of net benefits, the management of disbenefits, and the achievement of public acceptability.

2.2 Autonomy
The concept of 'automation' is concerned with the performance of a predetermined procedure, or response in predetermined ways to alternative stimuli. It is observable in
humans, e.g. under hypnosis, and is designed-into many kinds of artefacts.

The rather different notion of 'autonomy' means, in humans, the capacity for independent decision and action. Further, in some contexts, it also encompasses a claim to the
right to exercise that capacity. It is associated with the notions of consciousness, sentience, self-awareness, free will and self-determination. Autonomy in artefacts, on the
other hand, lies much closer to the notion of automation. It may merely refer to a substantial repertoire of pre-programmed stimulus-response relationships. Alternatively, it
may refer to some degree of adaptability to context, as might arise if some form of machine-learning were included, such that the specification of the stimulus-response
relationships change over time depending on the cases handled in the intervening period. Another approach might be to define artefact autonomy in terms of the extent to
which a human or some other artefact, does, or even can, intervene in the artefact's behaviour.

In humans, autonomy is best approached as a layered phenomenon. Each of us performs many actions in a subliminal manner. For example, our eye and ear receptors
function without us ever being particularly aware of them, and several layers of our neural systems handle the signals in order to offer us cognition, that is to say awareness
and understanding, of the world around us.

A layered approach is applicable to artefacts as well. Aircraft generally, including drones, may have layers of behaviour that occur autonomously, without pilot action or
even awareness. Maintenance of the aircraft's 'attitude' (orientation to the vertical and horizontal), and angle to the wind-direction, may, from the pilot's viewpoint, simply
happen. At a higher level of delegation, the aircraft may adjust the aircraft's flight controls in order to maintain a predetermined flight-path, and in the case of rotorcraft, to
maintain the vehicle's location relative to the earth's surface. A higher-order autonomous function is inflight manoeuvring to avoid collisions. At a yet higher level, some
aircraft can perform take-off and/or landing autonomously. To date, high-order activities that are seldom if ever autonomous include decisions about when to take off and
land, the mission objective, and 'target acquisition' (where to land, where to deliver a payload, which location to direct the payload towards).

At the lower levels, the rapidity with which analysis, decision and action need to be taken may preclude conscious human involvement. At the higher levels, however, a
pilot may be able to request advice, to accept or reject advice, to authorise an action recommended by an artefact, to override or countermand a default action, or to resume
full control. From the perspective of the drone, its functions may be to perform until its autonomous function is revoked, to perform except where a particular action is
over-ridden, to recommend, to advise, or to do nothing.

IEEE, even thought it is one of the most relevant professional associations in the field, made no meaningful attempt to address these issues for decades. It is currently
endeavouring to do so. It commenced with a discussion paper (IEEE 2017) which avoids the term AI, and instead uses the term 'Autonomous and Intelligent Systems
(A/IS)'. This highlights the need to address both intelligence and autonomy in an integrated manner.

2.3 Technology, Artefacts, Systems and Applications


A further factor that has tended to cloud meaningful discussion of responsibility in relation to AI has been inadequate discrimination among the successive phases of the
supply-chain from laboratory experiment to deployment in the field, and failure to assign responsibilities to the various categories of entities that are active in each phase.

IEEE's discussion paper (IEEE 2017) recognises that the end-result of successive rounds of R&D is complex systems that are applied in real-world contexts. In order to
deliver such systems, however, technology has to be conceived, proven, and embedded in artefacts. It is therefore valuable to distinguish between technology, artefacts that
embody the technology, systems that incorporate the artefacts, and applications of those systems. Appropriate responsibilities can then be assigned to researchers, to
inventors, to innovators, to purveyors, and to users. Table 1 identifies phases, the output from each phase, and the categories of entity that bear legal and moral
responsibility for disbenefits arising from AI.

Table 1: Entities with Responsibilities in Relation to AI

Phase Result Responsibility


Research AI Technology Researchers
Invention AI-Based Artefacts R&D Engineers
Innovation AI-Based Systems Developers
Dissemination Installed AI-Based Systems Purveyors
Application Impacts User Organisations and Individuals

This section has proposed several measures whereby the fog induced by the AI notion can be lifted, and a framework developed for managing AI-based activities. The
focus needs to be on complementary intelligence and autonomy, as features of technology, artefacts, systems and applications that support collaboration among all system
elements.

3. Contemporary AI
AI's scope is broad, and contested. This section identifies areas that have current relevance. Their relevance derives in part from claims of achievement of progress and
benefits, and in part from media coverage resulting in awareness among both organisations' staff and the general public. In addition to achieving some level of adoptiong,
each faces to at least some degree technical challenges, public scepticism and resistance. Achievement of the benefits that are potentially extractable from these
technologies is also threatened by over-claiming, over-reach, and resulting loss of public confidence. This section considers three forms of AI, and then suggests an
alternative conceptualisation intended to assist in understanding and addressing the technical, acceptance and adoption challenges.

3.1 Robotics

http://rogerclarke.com/EC/GAIF.html Page 3 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Robotics originally emerged in the form of machines enhanced with computational capacity. The necessary elements are sensors to acquire data from the robot's
environment, computing hardware and software to enable inferences to be drawn and decisions made, and actuators in order to give effect to those decisions by acting on
the robot's environment. Robotics has enjoyed its major areas of success in controlled environments such as the factory floor and the warehouse. Less obviously 'robotic'
systems include low-level control over the attitude, position and course of craft on or in water and in the air.

The last few years have seen a great deal of coverage of self-driving vehicles, variously on rails and otherwise, in controlled environments such as mines and quarries and
dedicated bus routes, and recently in more open environments. In addition, robotics has taken flight, in the form of drones (Clarke 2014a).

Many claims have been made recently about 'the Internet of Things' (IoT) and about systems comprising many small artefacts, such as 'smart houses' and 'smart cities'. For
a consolidation and rationalisation of multiple such ideas into the notion of an 'eObject', see Manwaring & Clarke (2015). Many of the initiatives in this area are robotic in
nature, in that they encompass all of sensors, computing and actuators.

3.2 Cyborgisation
The term cyborgisation refers to the process of enhancing individual humans by technological means, such that a cyborg is a hybrid of a human and one or more artefacts
(Clarke 2005, Warwick 2014). Many forms of cyborg fall outside the field of AI, such as spectacles, implanted lenses, stents, inert hip-replacements and SCUBA gear.
However, a proportion of the artefacts that are used to enhance humans include sensors, computational or programmatic 'intelligence', and one or more actuators. Examples
include heart pacemakers (since 1958), cochlear implants (since the 1960s, and commercially since 1978), and some replacement legs for above-knee amputees, in that the
artificial knee contains software to sustain balance within the joint.

Many such artefacts replace lost functionality, and are referred to as prosthetics. Others, which can be usefully referred to as orthotics, provide augmented or additional
functionality (Clarke 2011). An example of an orthotic is augmented reality for firefighters, displaying building plans and providing object-recognition in their visual field.
It was argued in Clarke (2014b) that use by drone pilots of instrument-based remote control, and particularly of first-person view (FPV) headsets, represent a form of
orthotic cyborgisation.

Artefacts of these kinds are not commonly included in catalogues of AI technology. On the other hand, they have a great deal in common with it, and with the notion of
complementary intelligence, and research in the field is emergent (Zhaohui et al. 2016). Cyborgisation has accordingly been defined as being within-scope of the present
analysis.

3.3 'AI / ML'


Computing applications for drawing inferences from data began with hard-wired, machine-level and assembler languages (1940-1960), but made great progress with
genuinely 'algorithmic programs', in languages such as ForTran (formula translator). That approach involves an implied problem that needs to be solved, and an explicit
procedural solution to that problem. During the 1980s, additional means of generating inferences became mainstream, including logic programming and rule-based
('expert') systems. These embody no explicit 'problem' or 'solution'. They instead define a 'problem-domain': some form of modelling of the relevant real world is
undertaken, and the model is expressed in a form that enables inferences to be drawn from it.

AI research has delivered a further technique, which accords primacy to the data rather than the model, and has the effect of obscuring the model to such an extent that no
humanly-understandable rationale exists for the inferences that are drawn. The relevant branch of AI is 'machine learning' (ML), and the most common technique in use is
'artificial neural networks'. The approach dates to the 1950s, but limited progress was made until sufficiently powerful processors were readily available, from the late
1980s.

Neural nets involve a set of nodes (each of which analogous to the biological concept of a neuron), with connections or arcs among them, referred to as 'edges'. Each
connection has a 'weight' associated with it. Each node performs some computation based on incoming data and may as a result adapt its internal state, including the
weighting on each connection, and may pass output to one or more other nodes. A neural net has to be 'trained'. This is done by selecting a training method (or 'learning
algorithm') and feeding a 'training-set' of data to the network in order to load up a set of weightings on the connections between nodes.

Unlike previous techniques for developing software, neural networking approaches do not begin with active and careful modelling of a real-world problem-solution,
problem or even problem-domain. Rather than comprising a set of entities and relationships that mirrors the key elements and processes of a real-world system, a neural
network model is simply a list of input variables and a list of output variables (and, in the case of 'deep' networks, intermediary variables). If a model exists, in the sense of
a representation of the real world, it is implicit rather than express. The weightings imputed for each connection reflect the characteristics firstly of the training-set that was
fed in, and secondly of the particular learning algorithm that was imposed on the training-set.

Although algorithms are used in the imputation of weightings on the connections within a neural net, the resulting software is not algorithmic, but rather empirical. This
has led some authors to justify a-theoretical mechanisms in general, and to glorify correlation and deprecate the search for causal relationships and systemic analysis
generally (Anderson 2008, Mayer-Schonberger & Cukier 2013).

AI/ML may well have the capacity to discover gems of otherwise-hidden information. However, the inferences drawn inevitably reflect any errors and biasses inherent in
the implicit model, in the selection of real-world phenomena for which data was created, in the selection of training-set, and in the learning algorithms used to develop the
software that delivers the inferences. Means are necessary to assess the quality of the implicit model, of the data-set, of the data-item values, of the training-set and of the
learning algorithm, and the compatibility among them, and to validate the inferences both logically and empirically. Unless and until those means are found, and are
routinely applied, AI/ML and neural nets must be regarded as unproven techniques that harbour considerable dangers to the interests of organisations and their
stakeholders.

3.4 Intellectics
Robotics began with an emphasis on machines being enhanced with computational elements and software. However, the emphasis has been shifting. I contend that the
conception now needs to be inverted, and the field regarded as computers enhanced with sensors and actuators, enabling computational processes to sense the world and act
directly on it. Rather than 'machines that think', the focus needs to be on 'computers that do'. The term 'intellectics' is a useful means of encapsulating that switch in
emphasis.

The term has been previously used in a related manner by Wolfgang Bibel, originally in German (1980, 1989). Bibel was referring to the combination of Artificial
Intelligence, Cognitive Science and associated disciplines, using the notion of the human intellect as the integrating element. Bibel's sense of the term has gained limited
currency, with only a few mentions in the literature and only a few authors citing the relevant papers. The sense in which I use the term here is rather different:

In the new context of intellectics, artefacts go beyond merely drawing inferences from data, in that they generate a strong impulse for an action to be taken in
the real world

I suggest the following criteria for assessing whether an artefact should be classified as falling within the field of intellectics:

As a threshold test, the artefact must at least communicate a recommendation to a human

http://rogerclarke.com/EC/GAIF.html Page 4 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

At a higher level, an artefact makes a decision, which will result in action unless over-ridden or countermanded by a human

At the highest level, an artefact makes a decision, and takes action in the real world to give effect to that decision, without providing an opportunity for a
human to prevent the action being taken

The effect of implementing intellectics is to at least reduce the moderating effect of humans in the decision-loop, and even to remove that effect entirely. The emergence of
intellectics is accordingly bringing into much stronger focus the legitimacy of the inferencing techniques used, and of the inferences that they are leading to. Among the
major challenges involved are the difficulty and expense of establishing reliable software (in particular the size of the training-set required), the low quality of a large
proportion of the data on which inferencing depends, the significance of and the approach adopted to empty cells within the data-set, and the applicability of the data-
analytic technique to the data to which it is applied (Clarke 2016a, 2016c).

The earlier generations of computer-performed inferencing enabled the expression of humanly-understandable explanations. During the procedural programming era, a set
of conditions resulted in an output, and the logic of the solution was express in both the software specification and the source-code. In logic-based programming,
'consequents' could be traced back to 'antecedents', and in rule-based systems, which rules 'fired' in order to deliver the output could be documented (Clarke 1991).

That situation changes substantially with AI/ML and its primary technique, neural nets. The model is at best implicit and may be only very distantly related to the real-
world it is assumed to represent, the approach is empirical, it depends on a training-set, and it is not capable of generating a humanly-understandable explanation for an
inference that has been drawn. The application of such inferences to decision-making, and to the performance of actions in and on the real world, raises serious questions
about transparency (Burrell 2016, Knight 2017). A result of the loss of decision transparency is the undermining of organisations' accountability for their decisions and
actions. In the absence of transparency, such principles are under threat as evaluation, fairness, proportionality, evidence-based decision-making, and the capacity to
challenge decisions (APF 2013).

Applications of a variety of data analytics techniques are already giving rise to public disquiet, even in the case of techniques that are (at least in principle) capable of
generating explanations of decision rationale. The most publicly-visible of these are systems for people-scoring, most prominently in financial credit. There are also
applications in 'social credit' - although in this case to date only in the PRC (Chen & Cheung 2017). Similar techniques are also applied in social welfare contexts,
sometimes with seriously problematical outcomes (e.g. Clarke 2018a). Concerns are naturally heightened where inferencing technologies are applied to prediction -
particularly where the technique's effectiveness is assumed rather than carefully tested, published, and subject to challenge. Such approaches result in something
approaching pre-destination, through the allocation of individual people to categories and the attribution of future behaviour, in some circumstances even behaviour of a
criminal nature.

There is increasing public pressure for explanations to be provided for decisions that are adverse to the interests of individuals and of small business, especially in the
context of inscrutable inferencing techniques such as neural networking. The responsibility of decision-makers to provide explanations is implied by the principles of
natural justice and procedural fairness. In the EU, since mid-2018, as a consequence of Articles 13.2(f), 14.2(g) and 15.1(h) of the General Data Protection Regulation
(GDPR 2018), access must be provided to "meaningful information about the logic involved", "at least in" the case of automated decisions (Selbst & Powles 2017). On the
other hand, "the [European Court of Justice] has ... made clear that data protection law is not intended to ensure the accuracy of decisions and decision-making processes
involving personal data, or to make these processes fully transparent [and] a new data protection right, the 'right to reasonable inferences', is needed" (Wachter &
Mittelstadt 2019).

Re-conception of the field as Intellectics enables focus to be brought to bear on key issues confronting organisations that apply the outcomes of AI research. Intellectics
represents a major power-shift towards large organisations and away from individuals. Substantial pushback from the public needs to be anticipated, and new regulatory
obligations may be imposed on organisations. The following sections canvass the scope for these concerns to be addressed firstly by ethics, and secondly through
regulatory arrangements.

4. Ethics
Both the dated notion of AI and the alternative conceptualisations of complementary intelligence and intellectics harbour potentials for harm. So it is important for
organisations to carefully consider what factors constrain their freedom of choice and actions. The following section examines the regulatory landscape. This section first
considers the extent to which ethics affects organisational applications of technology.

Ethics is a branch of philosophy concerned with concepts of right and wrong conduct. Fieser (1995) and Pagallo (2016) distinguish 'meta-ethics', which is concerned with
the language, origins, justifications and sources of ethics, from 'normative ethics', which formulates generic norms or standards, and 'applied ethics', which endeavours to
operationalise norms in particular contexts. In a recent paper, Floridi (2018) has referred to 'hard ethics' - that which "may contribute to making or shaping the law" - and
'soft ethics' - which are discussed after the fact.

From the viewpoint of instrumentalists in business and government, the field of ethics evidences several substantial deficiencies. The first is that there is no authority, or at
least no uncontestable authority, for any particular formulation of norms, and hence every proposition is subject to debate. Further, as a form of philosophical endeavour,
ethics embodies every complexity and contradiction that smart people can dream up. Moreover, few formulations by philosophers ever reach even close to operational
guidance, and hence the sources enable prevarication and provide endless excuses for inaction. The inevitable result is that ethical discussions seldom have much influence
on real-world behaviour. Ethics is an intellectually stimulating topic for the dinner-table, and graces ex post facto reviews of disasters. However, the notion of 'ethics by
design' is even more empty than the 'privacy by design' meme. To an instrumentalist - who wants to get things done - ethics diversions are worse than a time-waster; they're
a barrier to progress.

The occasional fashion of 'business ethics' naturally inherits the vagueness of ethics generally, and provides little or no concrete guidance to organisations in any of the
many areas in which ethical issues are thought to arise. Far less does 'business ethics' assist in relation to complex and opaque digital technologies. Clarke (2018b)
consolidates a collection of attempts to formulate general ethical principles that may have applicability in technology-rich contexts - including bio-medicine, surveillance
and information technology. Remarkably, none of them contain any explicit reference to identifying relevant stakeholders. However, a number of norms are frequently-
encountered in these sets. These include demonstrated effectiveness and benefits, justification of disbenefits, mitigation of disbenefits, proportionality of negative impacts,
supervision (including safeguards, controls and audit), and recourse (including complaints and appeals channels, redress, sanctions, and enforcement powers and
resources).

The related notion of Corporate Social Responsibility (CSR), sometimes extended to include an Environmental aspect, can be argued to have an ethical base. In practice,
its primary focus is usually on the extraction of public relations gains from organisations' required investments in regulatory compliance. CSR can, however, extend
beyond the direct interests of the organisation to include philanthropic contributions to individuals, community, society or the environment.

When evaluating the potential impact of ethics and CSR, it is important to appreciate the constraints on company directors. They are required by law to act in the best
interests of each company of which they are a director. Attention to broad ethical questions is generally extraneous to, and even in conflict with, that requirement, except
where a business case indicates sufficient benefits to the organisation from taking a socially or environmentally responsible approach. The primary ways in which benefits
can accrue are through compliance with regulatory requirements, and enhanced relationships with important stakeholders. Most commonly, these stakeholders will be
customers, suppliers and employees, but the scope might extend to communities and economies on which the company has a degree of dependence.

Given the limited framework provided by ethics, the question arises as to the extent to which organisations are subject to legal and social mechanisms that prevent or

http://rogerclarke.com/EC/GAIF.html Page 5 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

constrain their freedom to create technologies, and to embody them in artefacts, systems and applications.

5. Regulation
AI seems to have been argued by its proponents to be arriving imminently, on a cyclical basis, roughly every decade since 1956. Despite that, it appears that few regulatory
requirements have been designed or modified specifically with AI in mind. One reason for this is that parliaments seldom act in advance of new technologies being
deployed.

A 'precautionary principle' has been enunciated, whose strong form exists in some jurisdictions' environmental laws, along the lines of 'When human activities may lead to
morally unacceptable harm that is scientifically plausible but uncertain, actions shall be taken to avoid or diminish that potential harm' (TvH 2006). More generally,
however, the 'principle' is merely an ethical norm to the effect that 'If an action or policy is suspected of causing harm, and scientific consensus that it is not harmful is
lacking, then the burden of proof arguably falls on those taking the action'. Where AI appears likely to be impactful on the scale that its proponents suggest, surely the
precautionary principle applies, at the very least in its weak form. On the other hand, the considerable impacts of such AI technologies as automated number-plate
recognition (ANPR), 'facial recognition' and drones have not been the subject even of effective after-the-fact regulatory adaptation or innovation, let alone of proactive
protective measures.

A large body of theory exists relating to regulatory mechanisms (Braithwaite & Drahos 2000). Regulation takes many forms, including intrinsic and natural controls, self-
control, several levels of 'soft' community controls, various kinds of 'formal' or 'hard' regulatory schemes, and regulation by infrastructure or 'code'. An overview of these
categories is in Clarke & Bennett Moses (2014), and a relevant analysis is in Clarke (2014c). This section identifies a range of sources that may offer organisations, to
some extent guidance, and at least insights into what society expects, and obligations that organisations might be subject to.

5.1 Intrinsic and Natural Controls


An intervention such as the adoption of AI-based technologies may be subject to intrinsic limitations, or may stimulate natural processes whose effect is to prevent the
adoption occurring or continuing, or to curb or mitigate negative impacts. It is appropriate to consider these first. This is because, in the absence of such harm-limitation
mechanisms (a condition referred to by economists as 'market failure'), a case exists for regulatory measures to be devised and imposed; whereas, if adequate intrinsic or
natural controls exist, the costs that regulation would impose on all parties are not justifiable.

Economic factors tend to constrain adoption, commonly because of the expense involved and inadequate volume or profit-margin. This is particularly likely to be
determinative where the technology is, or is perceived to be, insufficiently effective in delivering on its promise. In some circumstances, the realisation of the potential
benefits of a technology may be dependent on infrastructure that is unavailable or inadequate. (For example, computing could have exploded in the third quarter of the 19th
century, rather than 100 years later, had metallurgy of the day been able to support Babbage's 'difference' and 'analytical' engines). Another form of control is the opposition
of players with sufficient institutional or market power. This includes the use of formal media and social media to stir up public opprobrium.

It is far from clear that any of the currently-promoted forms of AI are subject to adequate intrinsic and natural controls. The following sub-sections accordingly consider
each of the various forms of regulatory intervention, beginning at the apex of the regulatory pyramid with 'hard law'.

5.2 AI-Specific Laws


In-place industrial robotics, in production-lines and warehouses, is well-established. Various publications have discussed general questions of robot regulation (e.g. Leenes
& Lucivero 2014, Scherer 2016, HTR 2018a, 2018b), but fewer identify AI-specific laws. Even such vital aspects as worker safety and employer liability appear to depend
not on technology-specific laws, but on generic laws, which may or may not have been adapted to reflect the characteristics of the new technologies.

In HTR (2017), South Korea is identified as having enacted the first national law relating to robotics generally: the Intelligent Robots Development Distribution Promotion
Act of 2008. It is almost entirely facilitative and stimulative, and barely even aspirational in relation to regulation of robotics. There is mention of a 'Charter', "including the
provisions prescribed by Presidential Decrees, such as ethics by which the developers, manufacturers, and users of intelligent robots shall abide" - but no such Charter
appears to exist. A mock-up is at Akiko (2012). HTR (2018c) offers a generic regulatory specification in relation to research and technology generally, including robotics
and AI.

In relation to autonomous motor vehicles, a number of jurisdictions have enacted laws. See Palmerini et al. (2014, pp.36-73), Holder et al. (2016), DMV-CA (2018),
Vellinga (2017), which reviews laws in the USA at federal level, California, United Kingdom, and the Netherlands, and Maschmedt & Searle (2018), which reviews such
laws in three States of Australia. Such initiatives have generally had a strong focus on economic motivations, the stimulation and facilitation of innovation, exemptions
from some existing regulation, and limited new regulation or even guidance. One approach to regulation is to leverage off natural processes. For example, Schellekens
(2015) argued that a requirement of obligatory insurance was a sufficient means for regulating liability for harm arising from self-driving cars. In the air, legislatures and
regulators have moved very slowly in relation to the regulation of drones (Clarke & Bennett Moses 2014, Clarke 2016b).

Automated decision-making about people has been subject to French data protection law for many years. In mid-2018 this became a feature of European law generally,
through the General Data Protection Regulation (GDPR) Art. 22, although doubts have been expressed about its effectiveness (Wachter et al. 2017).

On the one hand, it might be that AI-based technologies are less disruptive than they are claimed to be, and that laws need little adjustment. On the other, a mythology of
'technology neutrality' pervades law-making. Desirable as it might be for laws to encompass both existing and future artefacts and processes, genuinely disruptive
technologies have features that render existing laws ambiguous and ineffective.

5.3 Generic Laws


Applications of new technologies are generally subject to existing laws. Particularly with 'breakthrough', revolutionary and disruptive technologies, existing laws are likely
to be ill-fitted to the new context, because they were "designed around a socio-technical context of the relatively distant past" (Bennett Moses 2011. p.765), and without
knowledge of the new form. In some cases, existing law may hinder new technologies in ways that are unhelpful to both the innovators and those affected by them. In other
cases, existing law may have been framed in such a manner that it does not apply to the new form (or judicial calisthenics has to be performed in order to make it appear to
apply), even though there would have been benefits if it had done so.

Applications of AI will generally be subject to the various forms of commercial law, particularly contractual obligations including express and implied terms, consumer
rights laws, and copyright and patent laws. In some contexts (such as robotics, cyborg artefacts, and AI software embedded in devices), product liability laws may apply.
Other laws that assign risk to innovators may also apply, such as the tort of negligence, as may laws of general applicability such as human rights law, anti-discrimination
law and data protection law. The obligations that the corporations law assigns to company directors are also relevant. Further sources of regulatory impact are likely to be
the laws relating to the various industry sectors within which AI is applied, such as road transport law, workplace and employment law, and health law.

Particularly in common law jurisdictions, there is likely to be a great deal of uncertainty about the way in which laws will be applied by tribunals and courts if any
particular dispute reaches them. This acts to some extent as a deterrent against innovation, and can considerably increase the costs incurred by proponents, and delay
deployment. From the viewpoint of people who perceive themselves to be negatively affected by the innovation, on the other hand, channels for combatting those threats

http://rogerclarke.com/EC/GAIF.html Page 6 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

may be inaccessible, expensive, slow and even entirely ineffectual.

5.4 Co-Regulation (Enforceable Codes)


Parliaments struggle to understand and cope with new technologies. An approach to regulation that once appeared to offer promise is co-regulation. Under this
arrangement, a parliament establishes a legal framework, including authority, obligations, sanctions and enforcement mechanisms, but without expressing the obligations at
a detailed level. This is achieved through consultative processes among advocates for the various stakeholders. The result is an enforceable Code, which articulates general
principles expressed in the relevant legislation.

Unfortunately, few instances of effective co-regulation exist, because such processes typically exclude less powerful stakeholders. In any case, there are few signs of
parliaments being aware of the opportunity, and of its applicability to Intellectics. In Australia, for example, Enforceable Codes exist that are administered by the
Australian Communications and Media Authority (ACMA) in respect of TV and radio broadcasting, and telecommunications, and by the Australian Prudential Regulation
Authority (APRA) in respect of banking services. These arrangements succeed both in facilitating business and government activities and in offering a veneer of
regulation; but they fail to exercise control over behaviour that the public regards as inappropriate, and hence they have little public credibility.

5.5 Guidance by Regulatory and Oversight Agencies

It is common for parliaments to designate a specialist government agency or parliamentary appointee either to exercise loose oversight over a contested set of activities, or
to exercise powers and resources in order to enforce laws or Codes. An important function of either kind of organisation is to provide guidance to both the regulatees and
the parties that the scheme is intended to protect. In very few instances, however, does it appear that AI lies within the scope of an existing agency or appointee. Some
exceptions may exist, for example in relation to the public safety aspects of drones and self-driving motor vehicles.

As a result, in most jurisdictions, limited guidance appears to exist. For example, six decades after the AI era was launched, the EU has gone no further than a preliminary
statement (EC 2018) and a discussion document issued by the Data Protection Supervisor (EDPS 2016). Similarly, the UK Data Protection Commissioner has only reached
the stage of issuing a discussion paper (ICO 2017). The current US Administration's policy is entirely stimulative in nature, and mentions regulation solely as a barrier to
economic objectives (WH 2018).

5.6 Industry Self-Regulation (Unenforceable Codes)


Corporations club together for various reasons, some of which can be to the detriment of other parties, such as collusion on bidding and pricing. The activities of industry
associations can, however, deliver benefits for others, as well as for their members. In particular, collaborative approaches to infrastructure can improve services and reduce
costs for the sector's customers.

It could also be argued that, if norms are promulgated by the more responsible corporations in an industry sector, then misbehaviour by the industry's 'cowboys' would be
highlighted. In practice, however, the effect of Industry Codes on corporate behaviour is seldom significant. Few such Codes are sufficiently stringent to protect the
interests of other parties, and the absence of enforcement undermines the endeavour. The more marginal kinds of suppliers ignore them, and responsible corporations feel
the pinch of competition and reduce their commitment to them. As a result, such Codes act as camouflage, obscuring the absence of safeguards and thereby holding off
actual regulatory measures. In the AI field, examples of industry coalitions eagerly pre-countering the threat of regulation include FLI (2017), ITIC (2017), and PoAI
(2018).

A more valuable role is played by industry standards. HTR (2017) lists industry standards issued by the International Standards Organisation (ISO) in the AI arena. A
considerable proportion of industry standards focus on inter-operability, and on business processes intended to achieve quality assurance. Public safety is also an area of
strength, particularly in the field commonly referred to as 'safety-critical systems' (e.g. Martins & Gorschek 2016). Hence some of the physical threats embodied in AI-
based systems are able to be avoided, mitigated and managed through the development and application of industry standards; but threats to economic and social interests
are seldom addressed.

A role can also be played by professional associations, because these generally balance public needs against self-interest somewhat better than industry associations. Their
impact is, however, far less pronounced than that of industry associations. Moreover, the intiatives to date of the two largest bodies are underwhelming, with ACM (2017)
using weak forms such as "should" and "are encouraged to", and IEEE (2017) offering lengthy prose but unduly vague and qualified principles. Neither has to date
provided the guidance needed by professionals, managers and executives.

5.7 Organisational Self-Regulation

It was noted above that Directors of corporations are required by law to pursue the interests of the corporation ahead of all other interests. It is therefore unsurprising, and
even to be expected, that organisational self-regulation is almost always ineffectual from the viewpoint of the supposed beneficiaries, and often not even effective at
protecting the organisation itself from bad publicity. Recent offerings by major corporations include IBM (Rayome 2017), Google (Pichai 2018) and MS (2018). For an
indication of the scepticism with which such documents are met, see Newcomer (2018).

5.8 Vague Prescriptions

A range of, in most cases fairly vague, principles, have been proposed by a diverse array of organisations. Examples include the European Greens Alliance (GEFA 2016), a
British Standard BS 8611 (BS 2016), the UNI Global Union (UGU 2017), the Japanese government (Hirano 2017), a House of Lords Committee (HOL 2018), as
interpreted by a World Economic Forum document (Smith 2018), and the French Parliament (Villani 2018).

Although there are commonalities among these formulations, there is also a lot of diversity, and few of them offer usable advice on how to ensure that Intellectics is
applied in a responsible manner. The next section draws on the sources identified above, in order to offer practical advice. It places the ideas within a conventional
framework, but extends that framework in order to address the needs of all stakeholders rather than just the corporation itself.

5.9 Regulation by 'West Coast Code'


One further regulatory element requires consideration. Lessig (1999) popularised the notion of behaviour in socio-technical systems being subject not only to formal law
('East Coast Code'), but also to constraints that exist within computer and network architecture and infrastructure, i.e. standards, protocols, hardware and software ('West
Coast Code').

A relevant form that 'West Coast Code' could take is the embedment in robots of something resembling 'laws of robotics'. The notion dates to an Asimov short story,
'Runaround', first published in 1942; but many commentators on robotics cling to it. For example. Devlin (2016) quotes a professor of robotics as perceiving that the
British Standard Institute's guidance on ethical design of robots (BS 2016) represents "the first step towards embedding ethical values into robotics and AI". On the other
hand, a study of Asimov's robot fiction showed that he had comprehensively demonstrated the futility of the idea (Clarke 1993). A recent expression of the reason why the
approach is doomed is that "You cannot construct an algorithm that will reliably decide whether or not any algorithm is ethical" (Castell 2018, p.743).

http://rogerclarke.com/EC/GAIF.html Page 7 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

6. A Practical Approach

Ethical analyses offer little assistance, and regulatory frameworks are lacking. It might seem attractive to business enterprises to face few legal obligations and hence to be
subject to limited compliance risk exposure. On the other hand, the absence of regulation heightens many other business risks. At least some competitors inevitably exhibit
'cowboy' behaviour, and there are always individuals and groups within each organisation who can be tempted by the promise that AI appears to offer. As a result, there are
substantial direct and indirect threats to the organisation's reputation. It is therefore in each organisation's own self-interest for a modicum of regulation to exist, in order to
provide a protective shield against media exposés and public backlash.

This section offers guidance to organisations. It assumes that organisations evaluating AI apply conventional environmental scanning and marketing techniques in order to
identify opportunities, and a conventional business case approach to estimating the strategic, market-share, revenue, cost and profit benefits that the opportunities appear to
offer them. The focus here is on how the downsides can be identified, evaluated and managed.

Familiar, practical approaches to assessing and managing risks are applicable. However, I contend that the conventional framework must be extended to include an
important element that is commonly lacking in business approaches to risk. That missing ingredient is stakeholder analysis. Risk assessment and management needs to be
performed not only from the business perspective, but also from the perspectives of other stakeholders.

6.1 Corporate Risk Assessment

There are many sources of guidance in relation to risk assessment and management. The techniques are well-developed in the context of security of IT assets and digital
data, although the language and the approaches vary considerably among the many sources (most usefully: Firesmith 2004, ISO 2005, ISO 2008, NIST 2012, ENISA 2016,
ISM 2017). For the present purpose, a model is adopted that is summarised in Appendix 1 of Clarke (2015). See Figure 1.

Figure 1: The Conventional Risk Model

Existing corporate practice approaches this model from the perspective of the organisation itself. This gives rise to conventional risk assessment and risk management
processes outlined in Table 2. Relevant assets are identified, and an analysis undertaken of the various forms of harm that could arise to those assets as a result of threats
impinging on, or actively exploiting, vulnerabilities, and giving rise to incidents. Existing safeguards are taken into account, in order to guide the development of a strategy
and plan to refine and extend the safeguards and thereby provide a degree of protection that is judged to suitably balance modest actual costs against much higher
contingent costs.

Table 2: The Risk Assessment and Risk Management Processes

Analyse / Perform Risk Assessment

(1) Define the Objectives and Constraints

(2) Identify the relevant Stakeholders, Assets, Values and categories of Harm

(3) Analyse Threats and Vulnerabilities

(4) Identify existing Safeguards

(5) Identify and Prioritise the Residual Risks

Design / Initiate Risk Management

(1) Identify alternative Safeguards

(2) Evaluate the alternatives against the Objectives and Constraints

(3) Select a Design (or adapt / refine the alternatives to achieve an acceptable
Design)

http://rogerclarke.com/EC/GAIF.html Page 8 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Do / Perform Risk Management

(1) Plan the implementation

(2) Implement

(3) Review the implementation

6.2 Stakeholder Risk Assessment


The notion of 'stakeholders' was introduced as a means of juxtaposing the interests of other parties against those of the corporation's shareholders (Freeman & Reed 1983).
Many stakeholders are participants in relevant processes, in such roles as employees, customers and suppliers. Where the organisation's computing services extend beyond
its boundaries, any and all of those primary categories of stakeholder may be users of the organisation's information systems.

However, the categories of stakeholders are broader than this, comprising not only "participants in the information systems development process" but also "any other
individuals, groups or organizations whose actions can influence or be influenced by the development and use of the system whether directly or indirectly" (Pouloudi &
Whitley 1997, p.3). The term 'usees' is a usefully descriptive term for these once-removed stakeholders (Clarke 1992, Fischer-Hübner & Lindskog 2001, Baumer 2015).

My first proposition for extension beyond conventional corporate risk assessment is that the responsible application of AI is only possible if stakeholder analysis is
undertaken in order to identify the categories of entities that are or may be affected by the particular project (Clarkson 1995). There is a natural tendency to focus on those
entities that have sufficient market or institutional power to significantly affect the success of the project. On the other hand, in a world of social media and rapid and deep
mood-swings, it is advisable to not overlook the nominally less powerful stakeholders. Where large numbers of individuals are involved (typically, employees, consumers
and the general public), it will generally be practical to use representative and advocacy organisations as intermediaries, to speak on behalf of the categories or segments of
individuals.

My second proposition is that the responsible application of AI depends on risk assessment processes being conducted from the perspectives of the various stakeholders, to
complement that undertaken from the perspective of the corporation. Conceivably, such assessments could be conducted by the stakeholders independently, and fed into
the organisation. In practice, the asymmetry of information, resources and power is such that the outputs from independent. and therefore uncoordinated, activities are
unlikely to gain acceptance. The responsibility lies with the sponsor of an initiative to drive the studies, engage effectively with the other parties, and reflect their input in
the project design criteria and features.

The risk assessment process outlined in Table 2 above is generally applicable. However, my third proposition is that risk assessment processes that reflect the interests of
stakeholders needs to be broader than that commonly undertaken within organisations. Relevant techniques include privacy impact assessment (Clarke 2009, Wright & De
Hert 2012), social impact assessment (Becker & Vanclay 2003), and technology assessment (OTA 1977). For an example of impact assessment applied to the specific
category of person-carrier robots, see Villaronga & Roig (2017). The most practical approach may be, however, to adapt the organisation's existing process in order to
encompass whichever aspects of such broader techniques are relevant to the stakeholders whose needs are being addressed.

6.3 Comprehensive Risk Management

The results of the two or more risk assessment processes outlined above deliver the information that the organisation needs. They enable the development of a strategy and
plan whereby existing safeguards can be adapted or replaced, and new safeguards conceived and implemented. ISO standard 27005 (2008, pp.20-24) discusses four options
for what it refers to as 'risk treatment': risk modification, risk retention, risk avoidance and risk sharing. A framework is presented in Table 3 that in my experience is more
understandable by practitioners and more readily usable as a basis for identifying possible safeguards.

Table 3: Categories of Risk Management Strategy

Proactive Strategies
Avoidance
e.g. non-use of a risk-prone technology or procedure
Deterrence
e.g. signs, threats of dismissal, publicity for prosecutions, substantial fines, gaol-time
Prevention
e.g. surge protectors and backup power sources; quality equipment, media and software; physical and logical access control; staff training, assigned responsibilities
and measures to sustain morale; staff termination procedures
Redundancy
e.g. duplicated equipment and communication paths; multiple, parallel evaluations with cross-checking of results

Reactive Strategies

Detection
e.g. fire and smoke detectors, logging, log-analysis, exception reporting
Reduction / Mitigation
e.g. fire-suppression technologies, fire-warden training, suspension of processing when unexpected harm arises, pre-arranged contingent measures to compensate for
harm
Recovery
e.g. investment in resources, procedures/documentation, staff training, and duplication including 'hot-sites' and 'warm-sites'
Insurance
e.g. mutual arrangements with other organisations, maintenance contracts with suppliers, escrow of third party software, inspection of escrow deposits, policies with
insurance companies

Non-Reactive Strategies
Tolerance / Self-Insurance
where assessment of the contingent costs concludes that they are bearable
Graceful Degradation

http://rogerclarke.com/EC/GAIF.html Page 9 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

e.g. a pre-funded compensation fund, combined with suspension or cancellation of processing when unexpected harm arises
Graceless Degradation
e.g. siting a nuclear energy company's headquarters adjacent to the power plant, on the grounds that, if it goes, then the organisation and its employees should go
with it

___________

Existing techniques are strongly oriented towards protection against risks as perceived by the organisation. Risks to other stakeholders are commonly treated as, at best, a
second-order consideration, and at worst as if they were out-of-scope. All risk management work involves the exercise of a considerable amount of imagination. That
characteristic needs to be underlined even more strongly in the case of the comprehensive, multi-stakeholder approach that I am contending is necessary in the case of AI-
based systems.

This section has suggested customisation of existing, generic techniques in order to address the context of AI-based systems. The following section presents more specific
proposals.

7. Towards Operational Principles


This section presents a set of Principles for AI. The purpose of doing so is to provide organisations and individuals with guidance as to how they can fulfil their
responsibilities in relation to AI and AI-based activities. Because of the broad scope of the AI notion, the considerable diversity among its various forms, and the changes
in those forms over time, the Principles proposed below are still somewhat abstract. The intention is to express them in as practically useful a manner as can reasonably be
achieved. At the very least, they should provide a firm base for the expression of operational guidance for each specific form of AI.

The Principles in part emerge from the analysis presented in this Working Paper, and in part represent a consolidation of ideas from a suite of previously-published sets of
principles. The suite was assembled by surveying academic, professional and policy literatures. Diversity of perspective was actively sought. The sources include
corporations and industry associations (5), governmental organisations (6), academics (4), professional associations (2), joint associations (2), and non-government
organisations (5). Only sets that were available in the english language were used. This resulted in a strong bias within the suite towards documents that originated in
countries whose primary language(s) is or include English. Of the individual documents, 8 are formulations of 'ethical principles and IT'. Extracts and citations are
provided at Clarke (2018c). The other 16 claim to provide principles or guidance specifically in relation to AI. Extracts and citations are at Clarke (2018d).

In s.2.3 and Table 1 above, distinctions were drawn among the phases of the supply-chain, which in turn produce AI technology, AI-based artefacts, AI-based systems,
deployments of them, and applications of them. In each case, the relevant category of entity was identified that bears responsibility for negative impacts arising from AI. In
only a few of the 24 documents in the suite were such distinctions evident, however, and in most cases it has to be interpolated which part of the supply-chain the
document is intended to address. The European Parliament (CLA-EP 2016) refers to "design, implementation, dissemination and use", IEEE (2017) to "Manufacturers /
operators / owners", GEFA (2016) to "manufacturers, programmers or operators", FLI (2017) to researchers, designers, developers and builders, and ACM (2017) to
"Owners, designers, builders, users, and other stakeholders". Remarkably, however, in all of these cases the distinctions were only made within a single Principle rather
than being applied to the set as a whole.

Some commonalities exist across the source documents. Overall, however, most of the source documents were remarkably sparse, and there was far less consensus that
might have been expected 60 years after AI was first heralded. For example, only 1 document encompassed cyborgisation (GEFA 2016); only 2 documents referred to the
precautionary principle (CLA-EP 2016, GEFA 2016), and only 5 stipulated the conduct of impact assessments. One striking statistic is that only 3 of the c. 50 Principles
were detectable in at least half of the documents in the set:

ensure physical safety (17 / 24)


ensure human control (12 / 24)
ensure transparency of inferencing, decision-making and actions (12 / 24)

Each source naturally reflects the express, implicit and subliminal purposes of the drafters and the organisations on whose behalf they were composed. In some cases, for
example, the set primarily addresses just one form of AI, such as robotics or machine-learning. Documents prepared by corporations, industry associations, and even
professional associations and joint associations tended to adopt the perspective of producer roles, with the interests of other stakeholders often relegated to a secondary
consideration. For example, the joint-association Future Life Institute perceives the need for "constructive and healthy exchange between AI researchers and policy-
makers", but not for any participation by stakeholders (FLI 2017 at 3). As a result, transparency is constrained to a small sub-set of circumstances (at 6), 'responsibility' of
'designers and builders' is limited to those roles being mere 'stakeholders in moral implications' (at 9), alignment with human values is seen as being necessary only in
respect of "highly autonomous AI systems" (at 10), and "strict safety and control measuers" are limited to a small sub-set of AI systems (at 22). ITIC (2017) considers that
many responsibilities lie elsewhere, and assigns responsibilities to its members only in respect of safety, controllability and data quality. ACM (2017) is expressed in weak
language (should be aware of, should encourage, are encouraged) and regards decision opaqueness as being acceptable, while IEEE (2017) suggests a range of important
tasks for other parties (standards-setters, regulators, legislatures, courts), and phrases other suggestions in the passive voice, with the result that few obligations are clearly
identified as falling on engineering professionals and the organisations that employ them. The House of Lords report might have been expected to adopt a societal or multi-
stakeholder approach, yet, as favourably reported in Smith (2018), it appears to have adopted the perspective of the AI industry.

The process of developing the set commenced with themes that derived from the analysis reported on in the earlier sections of this Working Paper. The previously-
published sets of principles were then inspected. Detailed propositions within each set were extracted, and allocated to themes, maintaining back-references to the sources.
Where items threw doubt on the structure or formulation of the general themes, the schema was adapted in order to sustain coherence and limit the extent to which
duplications arise.

The Principles have been expressed in imperative mode, i.e. in the form of instructions, in order to convey that they require action, rather than being merely desirable
characteristics, or factors to be considered, or issues to be debated. The full set of Principles, comprising about 50 elements, is in Appendix 1. In order to make them more
digestible, Table 4 presents the 10 over-arching themes.

Some of the items that appear in source documents appear incapable of being operationalised. For example, 'human dignity', 'fairness' and 'justice' are vague abstractions
that need to be unpacked into more specific concepts. In addition, some items fall outside the scope of the present work. The items that have been excluded from the set in
Table 4 are listed in Appendix 2.

Each of the Principles requires somewhat different application in each phase of the AI supply-chain. An important example of this is the manner in which Principle 7 -
Deliver Transparency and Auditability - is intended to be interpreted. In the Research and Invention phases of the technological life-cycle, compliance with Principle 7
requires understanding by inventors and innovators of the AI technology, and explicability to developers and users of AI-based artefacts and systems. During the
Innovation and Dissemination phases, the need is for understandability and manageability by developers and users of AI-based systems and applications, and explicability
to affected stakeholders. In the Application phase, the emphasis shifts to understandability by affected stakeholders of inferences, decisions and actions arising from at least
the AI elements within AI-based systems and applications.

The status of the proposed principles is important to appreciate. They are not expressions of law - although in some jurisdictions, and in some circumstances, some may be
legal requirements. They are expressions of moral obligations; but no authority exists that can impose such obligations. In addition, all are contestable, and in different

http://rogerclarke.com/EC/GAIF.html Page 10 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

circumstances any of them may be in conflict with other legal or moral obligations, and with various interests of various stakeholders. They represent guidance to
organisations involved in AI as to the expectations of courts, regulatory agencies, oversight agencies, competitors and stakeholders. They are intended to be taken into
account as organisations undertake risk assessment and risk management, as outlined in s.6 above.

Table 4: Principles For A.I. Technologies, Artefacts, Systems and Applications


The following Principles are intended to be applied by the entities responsible for all phases of AI research, invention, innovation, dissemination and application.

1. Evaluate Positive and Negative Impacts

AI offers prospects of considerable benefits and disbenefits. All entities involved in applying AI bear legal and moral responsibility to demonstrate the benefits, to be
proactive in relation to disbenefits, and to involve stakeholders in the process.

2. Complement Humans

Considerable public disquiet already exists in relation to displacement of human workers by AI, and the replacement of human decision-making with inhumane machine
decision-making.

3. Ensure Human Control

Considerable public disquiet already exists in relation to the prospect of humans ceding power to machines.

4. Ensure Human Safety and Wellbeing

All entities involved in applying AI bear legal and moral responsibility to provide safeguards for all human stakeholders who are at risk, whether as users of AI-based
artefacts and systems or usees who are affected by them.

5. Ensure Consistency with Human Values and Human Rights

AI is capable of having substantial negative impacts on a wide range of civil and political rights.

6. Embed Quality Assurance

All entities involved in applying AI have legal and moral responsibilities in relation to the quality of business processes and products.

7. Deliver Transparency and Auditability

All entities involved in applying AI have legal and moral obligations in relation to due process and procedural fairness. These obligations can only be fulfilled if the entity
ensures that humanly-understandable explanations are available for all AI-based inferences, decisions and actions.

8. Exhibit Robustness and Resilience

AI-based systems and associated data must be subject to safeguards commensurate with the significance of their benefits, sensitivity and potential to cause harm to
stakeholders.

9. Ensure Accountability for Legal and Moral Obligations

All entities involved in applying AI have legal and moral obligations in relation to due process and procedural fairness. These obligations can only be fulfilled if the entity
is discoverable and addresses problems as they arise.

10. Enforce, and Accept Enforcement of, Liabilities and Sanctions

All entities involved in applying AI have legal and moral obligations in relation to due process and procedural fairness. These obligations can only be fulfilled if the entity
implements internal problem-handling processes, and respects and complies with external problem-handling processes.

___________

The Principles in Table 4 are intentionally framed and phrased in an abstract manner, in an endeavour to achieve applicability to at least the currently mainstream forms of
AI discussed earlier - robotics, particularly remote-controlled and self-driving vehicles; cyborgs who incorporate computational capabilities; and AI/ML/neural-networking
applications. More broadly, the intention is that they be applicable to what I proposed above as the appropriate conceptualisation of the field - Intellectics.

These Principles are capable of being further articulated into much more specific guidance in respect of each particular category of AI. For example, in a companion
project, I have proposed 'Guidelines for Responsible Data Analytics' (Clarke 2018b). These provide more detailed guidance for the conduct of all forms of data analytics
projects, including those that apply AI/ML/neural-networking approaches. Areas addressed by the Data Analytics guidelines include governance, expertise and compliance
considerations, multiple aspects of data acquisition and data quality, the suitability of both the data and the analytical techniques applied to it, and factors involved in the
use of inferences drawn from the analysis.

8. Conclusions

This paper has proposed that the unserviceable notion of AI should be replaced by the notion of 'complementary intelligence', and that the notion of robotics ('machines
that think') is now much less useful than that of 'intellectics' ('computers that do').

The techniques and technologies that emerge from research laboratories offer potential but harbour considerable threats to organisations, and to those organisations'
stakeholders. Sources of guidance have been sought, whereby organisations in both the private and public sectors can evaluate the appropriateness of various such
technologies to their own operations. Neither ethical analysis nor regulatory schemes deliver what organisations need. The paper concludes that adapted forms of risk
assessment and risk management processes can fill the void, and that principles specific to AI can be formulated.

http://rogerclarke.com/EC/GAIF.html Page 11 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

The propositions in this paper need to be workshopped with colleagues in the academic and consultancy worlds. The abstract Principles need to be articulated into more
specific expressions that are directly relevant to particular categories of technology, artefacts, systems and applications. The resulting guidance then needs to be exposed to
relevant professional executives and managers, reviewed by internal auditors, government relations executives and corporate counsel, and pilot-tested in realistic settings.

Appendix 1: 50 Principles for Responsible AI Technologies, Artefacts, Systems and Applications


A PDF version of this Appendix is available

The following Principles are intended to be applied by the entities responsible for all phases of AI research, invention, innovation, dissemination and application. The
cross-references are to the 'Ethical Principles and IT' sources (Clarke 2018b - E) and 'Principles for AI' sources (Clarke 2018c - P).

1. Evaluate Positive and Negative Impacts

1.1 Conceive and design only after ensuring adequate understanding of purposes and contexts
(E4.3, P5.3, P6.21, P7.1, P15.7)

1.2 Justify objectives


(E3.25)

1.3 Demonstrate the achievability of postulated benefits


(Not found in any of the documents, but a logical pre-requisite)

1.4 Conduct impact assessment


(E7.1, P3.12, P4.1, P4.2, P6.21, P11.8)

1.5 Publish sufficient information to stakeholders to enable them to conduct impact assessment
(E7.3, P3.7, P4.1, P8.3, P8.4, P8.7)

1.6 Conduct consultation with stakeholders and enable their participation


(E5.6, E7.2, P3.7, P8.6, P8.7)

1.7 Justify negative impacts on individuals ('proportionality')


(E3.21, E7.4, E7.5)

1.8 Consider alternative, less harmful ways of achieving the same objectives
(E3.22)

2. Complement Humans

2.1 Design as an aid, for augmentation, collaboration and inter-operability


(P4.5, P5.1, P9.1, P9.8 .P14.2, P14.4)

2.2 Avoid design for replacement of people by independent devices, except in circumstances in which artefacts are demonstrably more capable than people, and even then
ensuring that the result is complementary to human capabilities
(P5.1)

3. Ensure Human Control

3.1 Ensure human control over AI-based artefacts and systems


(E4.2, E6.1, E6.8, E6.19, P1.4, P2.1, P4.2, P5.2, P6.16, P8.4, P9.3, P12.1, P13.5, P15.4)

3.2 In particular, ensure control over autonomous behaviour of AI-based artefacts and systems
(E8.1, P8.4, P10.2, P11.4)

3.3 Respect each person's autonomy, freedom of choice and self-determination


(E2.1, E5.3, P3.3, P9.7, P11.3)

3.4 Ensure human review of inferences and decisions prior to acting on them
(E3.11)

3.5 Respect people's expectations in relation to personal data protections (E5.6), including:
* awareness of data-usage (E3.6)
* consent (E3.7, E3.28, E5.3, P3.11, P4.6)
* data minimisation (E3.9)
* public visibility and consultation (E3.10, E7.2), and
* relationship of data-usage to the data's original purpose (E3.27)

3.6 Avoid deception of humans


(E4.4, E6.20, P2.5)

3.7 Avoid services being conditional on the acceptance of AI-based artefacts and systems
(P4.5)

4. Ensure Human Safety and Wellbeing

4.1 Ensure people's physical health and safety ('nonmaleficence')


(E2.2, E3.1, E4.1, E4.3, E5.4, E6.2, E6.9, E6.13, E6.14, E6.18, P1.2, P1.3, P2.1, P3.2, P3.6, P3.9, P3.12, P4.3, P4.9, P6.6, P8.4, P9.4, P10.2, P11.4, P13.5, P14.1, P15.3)

4.2 Ensure people's psychological safety (E3.1, E6.9, E6.13), by avoiding negative effects on any individual's mental health, inclusion in society, worth, standing in
comparison with other people, or emotional state (E5.4, E6.3)

http://rogerclarke.com/EC/GAIF.html Page 12 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

4.3 Ensure people's wellbeing ('beneficence')


(E2.3, E3.20, E5.5, P3.1, P3.4, P6.1, P6.14, P6.15, P8.6, P11.6, P12.2, P13.1, P15.1, P16.4)

4.4 Mitigate negative consequences


(E3.24, E7.6, E6.21, E10.4)

4.5 Avoid violation of trust


(E3.3)

4.6 Avoid the manipulation of vulnerable people (E4.4, P4.5, P4.9), including taking advantage of individuals' tendency to addiction, e.g. to gambling (E6.3)

5. Ensure Consistency with Human Values and Human Rights

5.1 Ensure compliance with human rights laws


(E4.2, P3.5, P3.9, P4.3)

5.2 Be just / fair / impartial and treat individuals equally


(E2.4, E3.16, E3.29, P3.4)

5.3 Avoid unfair discrimination and bias, not only where it is legally procribed but also where it is publicly unacceptable
(ICCPR Arts. 2.1, 3, 26 and 27, E3.16, P3.4, P4.5, P11.5, P15.2, P16.1)

5.4 Avoid restrictions on freedom of movement


(ICCPR 12, P6.13)

5.5 Avoid interference with privacy, family, home or reputation


(ICCPR 17, E5.6, P3.11, P6.12, P8.4, P9.6, P13.3, P15.5)

5.6 Avoid interference with the rights of freedom of information, opinion and expression
(ICCPR 19, P4.6)

5.7 Avoid interference with the right of freedom of assembly


(ICCPR 21, P6.13)

5.8 Avoid interference with the right of freedom of association


(ICCPR 22, P6.13)

5.9 Avoid interference with the rights to participation in public affairs and access to public service
(ICCPR 25, P6.13)

6. Embed Quality Assurance

6.1 Invest in quality assurance


(E6.2, P4.2, P15.6)

6.2 Ensure effective, efficient and adaptive performance of intended functions


(E6.11, P1.6)

6.3 Ensure security safeguards against inappropriate modification to and deletion of sensitive data
(E3.15)

6.4 Ensure justification of the use of sensitive data


(E3.26, E7.4)

6.5 Ensure data quality and data relevance


(P10.3, P11.2)

6.6 Deal fairly with people (faithfulness, fidelity)


(E2.5, E3.2)

6.7 Avoid invalid and unvalidated techniques


(E3.5, P7.7)

6.8 Test for result validity


(E3.5, P7.7. P9.2)

6.9 Impose controls in order to ensure that safeguards are operative and effective
(E7.7, P10.2)

6.10 Conduct audits of safeguards and controls (E7.8, P9.3)

7. Deliver Transparency and Auditability

7.1 Ensure that the fact that the process is AI-based is transparent to all stakeholders
(E4.4, P4.8)

7.2 Ensure that the means whereby inferences are drawn, decisions made and actions are taken are logged and can be reconstructed
(E6.6, P2.4, P4.8, P5.2, P6.7, P7.4, P7.6, P8.2. P9.2, P11.1, P11.2, P13.2, P16.2, P16.3)

7.3 Ensure people are aware of inferences and how they were reached
(E3.12, P2.4)

http://rogerclarke.com/EC/GAIF.html Page 13 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

8. Exhibit Robustness and Resilience

8.1 Provide and sustain appropriate security safeguards against compromise of intended functions arising from both passive threats and active attacks
(E4.3, E6.11, P1.4, P1.5, P4.9, P6.6, P8.4, P9.5)

8.2 Provide and sustain appropriate security safeguards against inappropriate access to sensitive data arising from both passive threats and active attacks
(E3.15, E6.5, E6.10, P3.11, P9.6)

8.3 Conduct audits of justification, proportionality, transparency, mitigation measures and controls
(E7.8, E8.4)

8.4 Ensure resilience, in the sense of prompt and effective recovery from incidents

9. Ensure Accountability for Legal and Moral Obligations

9.1 Ensure that the responsible entity is apparent or can be readily discovered by any party
(E4.5, E6.4, P2.3, P3.8, P4.7, P8.5, P12.3)

9.2 Ensure that effective remedies exist, in the form of complaints processes, appeals processes, and redress where harmful errors have occurred
(ICCPR 2.3, E3.13, E3.14, E7.7, P3.11, P4.7, P7.2, P8.7, P9.9, P10.5, P11.9, P16.3)

10. Enforce, and Accept Enforcement of, Liabilities and Sanctions

10.1 Ensure that complaints, appeals and redress processes operate effectively
(ICCPR 2.3, E7.7)

10.2 Comply with external complaints, appeals and redress processes and outcomes (ICCPR 14), including, in particular, provision of timely, accurate and complete
information relevant to cases

Appendix 2: Omitted Elements


The following elements within the sources have not been reflected in Table 4 and Appendix 1.

Environmental Sustainability (E5.5, E6.7, P4.4, P11.3)


"A user must not use a robot to commit an illegal act" (E6.12)
"[Do not] deliberately damage or destroy a robot" (E6.15)
"[Do not,] through gross negligence ... allow a robot to come to harm" (E6.16)
"It is a lesser but nonetheless serious offence to treat a robot in a way which may be construed as deliberately and inordinately abusive" (E6.17)
"[Respect a robot's] right to exist without fear of injury or death" (E6.21)
"[Respect a robot's] right to live an existence free from systematic abuse (E6.22)
"Fund research in particular with regards to the ethical and legal effects of artificial intelligence" (P4.10, P6.2)
Asimov's Meta-Law (P1.1)
Asimov's Procreation Law (P1.7)
Ensure reversability of actions (P3.10)
Respect and improve social processes, and avoid subverting them (P6.17)
Public Empowerment - The public's ability to understand AI-enabled services, and how they work, is key to ensuring trust in the technology - 'Algorithmic Literacy'
must be a basic skill ... (P8.3)
Equip AI systems with an 'Ethical Black Box' that contains clear data and information on the ethical considerations built into said system (P11.2)
Secure a Just Transition as workers are displaced (P11.7)
Establish governance mechanisms [i.e. long-term impact assessment and management] (P11.8)

References

ACM (2017) 'Statement on Algorithmic Transparency and Accountability' Association for Computing Machinery, January 2017, at
https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf

Akiko (2012) 'South Korean Robot Ethics Charter 2012' Akiko's Blog, 2012, at https://akikok012um1.wordpress.com/south-korean-robot-ethics-charter-2012/

Albus J. S. (1991) 'Outline for a theory of intelligence' IEEE Trans. Systems, Man and Cybernetics 21, 3 (1991) 473-509, at http://citeseerx.ist.psu.edu/viewdoc/download?
doi=10.1.1.410.9719&rep=rep1&type=pdf

Anderson C. (2008) 'The End of Theory: The Data Deluge Makes the Scientific Method Obsolete' Wired Magazine 16:07, 23 June 2008, at
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory

APF (2013) 'Meta-Principles for Privacy Protection' Australian Privacy Foundation, March 2013, at https://privacy.org.au/policies/meta-principles/

Baumer E.P.S. (2015) 'Usees' Proc. 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI'15), April 2015

Becker H. & Vanclay F. (2003) 'The International Handbook of Social Impact Assessment' Cheltenham: Edward Elgar, 2003

Bennett Moses L. (2011) 'Agents of Change: How the Law Copes with Technological Change' Griffith Law Review 20, 4 (2011) 764-794, at
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2000428

Bibel W. (1980) ''Intellektik' statt 'KI' -- Ein ernstgemeinter Vorschlag' Rundbrief der Fachgruppe Künstliche Intelligenz in der Gesellschaft für Informatik, 22, 15-16
December 1980

Bibel W. (1989) 'The Technological Change of Reality: Opportunities and Dangers' AI & Society 3, 2 (April 1989) 117-132

Braithwaite B. & Drahos P. (2000) `Global Business Regulation' Cambridge University Press, 2000

http://rogerclarke.com/EC/GAIF.html Page 14 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

BS (2016) 'Robots and robotic devices - Guide to the ethical design and application of robots and robotic systems' BS 8611, British Standards Institute, April 2016

Burrell J. (2016) How the machine 'thinks': Understanding opacity in machine learning algorithms' Big Data & Society 3, 1 (January-June 2016) 1-12

Calo R. (2017) 'Artificial Intelligence Policy: A Primer and Roadmap' UC Davis L. Rev. 51 (2017) 399-404

Castell S. (2018) 'The future decisions of RoboJudge HHJ Arthur Ian Blockchain: Dread, delight or derision?' Computer Law & Security Review 34, 4 (Jul-Aug 2018)
739-753

Chen Y. & Cheung A.S.Y. (2017). 'The Transparent Self Under Big Data Profiling: Privacy and Chinese Legislation on the Social Credit System, The Journal of
Comparative Law 12, 2 (June 2017) 356-378, at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2992537

CLA-EP (2016) 'Recommendations on Civil Law Rules on Robotics' Committee on Legal Affairs of the European Parliament, 31 May 2016, at
http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//NONSGML%2BCOMPARL%2BPE-582.443%2B01%2BDOC%2BPDF%2BV0//EN

Clarke R. (1989) 'Knowledge-Based Expert Systems: Risk Factors and Potentially Profitable Application Area', Xamax Consultancy Pty Ltd, January 1989, at
http://www.rogerclarke.com/SOS/KBTE.html

Clarke R. (1991) 'A Contingency Approach to the Application Software Generations' Database 22, 3 (Summer 1991) 23-34, PrePrint at
http://www.rogerclarke.com/SOS/SwareGenns.html

Clarke R. (1992) 'Extra-Organisational Systems: A Challenge to the Software Engineering Paradigm' Proc. IFIP World Congress, Madrid, September 1992, at
http://www.rogerclarke.com/SOS/PaperExtraOrgSys.html

Clarke R. (1993) 'Asimov's Laws of Robotics: Implications for Information Technology' in two parts, in IEEE Computer 26,12 (December 1993) 53-61, and 27,1 (January
1994) 57-66, at http://www.rogerclarke.com/SOS/Asimov.html

Clarke R. (2005) 'Human-Artefact Hybridisation: Forms and Consequences' Proc. Ars Electronica 2005 Symposium on Hybrid - Living in Paradox, Linz, Austria, 2-3
September 2005, PrePrint at http://www.rogerclarke.com/SOS/HAH0505.html

Clarke R. (2009) 'Privacy Impact Assessment: Its Origins and Development' Computer Law & Security Review 25, 2 (April 2009) 123-135, PrePrint at
http://www.rogerclarke.com/DV/PIAHist-08.html

Clarke R. (2011) 'Cyborg Rights' IEEE Technology and Society 30, 3 (Fall 2011) 49-57, at http://www.rogerclarke.com/SOS/CyRts-1102.html

Clarke R. (2014a) 'Understanding the Drone Epidemic' Computer Law & Security Review 30, 3 (June 2014) 230-246, PrePrint at
http://www.rogerclarke.com/SOS/Drones-E.html

Clarke R. (2014b) 'What Drones Inherit from Their Ancestors' Computer Law & Security Review 30, 3 (June 2014) 247-262, PrePrint at
http://www.rogerclarke.com/SOS/Drones-I.html

Clarke R. (2014c) 'The Regulation of of the Impact of Civilian Drones on Behavioural Privacy' Computer Law & Security Review 30, 3 (June 2014) 286-305, PrePrint at
http://www.rogerclarke.com/SOS/Drones-BP.html

Clarke R. (2015) 'The Prospects of Easier Security for SMEs and Consumers' Computer Law & Security Review 31, 4 (August 2015) 538-552, PrePrint at
http://www.rogerclarke.com/EC/SSACS.html

Clarke R. (2016a) 'Big Data, Big Risks' Information Systems Journal 26, 1 (January 2016) 77-90, PrePrint at http://www.rogerclarke.com/EC/BDBR.html

Clarke R. (2016) 'Appropriate Regulatory Responses to the Drone Epidemic' Computer Law & Security Review 32, 1 (Jan-Feb 2016) 152-155, PrePrint at
http://www.rogerclarke.com/SOS/Drones-PAR.html

Clarke R. (2016c) 'Quality Assurance for Security Applications of Big Data' Proc. EISIC'16, Uppsala, 17-19 August 2016, PrePrint at
http://www.rogerclarke.com/EC/BDQAS.html

Clarke R. (2018a) 'Centrelink's Big Data 'Robo-Debt' Fiasco of 2016-17' Xamax Consultancy Pty Ltd, January 2018, at http://www.rogerclarke.com/DV/CRD17.html

Clarke R. (2018b) 'Guidelines for the Responsible Application of Data Analytics' Computer Law & Security Review 34, 3 (May-Jun 2018) 467- 476, PrePrint at
http://www.rogerclarke.com/EC/GDA.html

Clarke R. (2018c) 'Ethical Principles and Information Technology' Xamax Consultancy Pty Ltd, rev. September 2018, at http://www.rogerclarke.com/EC/GAIE.html

Clarke R. (2018d) 'Principles for AI: A 2017-18 SourceBook' Xamax Consultancy Pty Ltd, rev. September 2018, at http://www.rogerclarke.com/EC/GAI.html

Clarke R. & Bennett Moses L. (2014) 'The Regulation of Civilian Drones' Impacts on Public Safety' Computer Law & Security Review 30, 3 (June 2014) 263-285,
PrePrint at http://www.rogerclarke.com/SOS/Drones-PS.html

Clarkson M.B.E. (1995) 'A Stakeholder Framework for Analyzing and Evaluating Corporate Social Performance' The Academy of Management Review 20, 1 (Jan.1995)
92-117 , at
https://www.researchgate.net/profile/Mei_Peng_Low/post/Whats_corporate_social_performance_related_to_CSR/attachment/59d6567879197b80779ad3f2/AS%3A530408064417

Devlin H. (2016). 'Do no harm, don't discriminate: official guidance issued on robot ethics' The Guardian, 18 Sep 2016, at
https://www.theguardian.com/technology/2016/sep/18/official-guidance-robot-ethics-british-standards-institute

DMV-CA (2018) 'Autonomous Vehicles in California' Califiornian Department of Motor Vehicles, February 2018, at
https://www.dmv.ca.gov/portal/dmv/detail/vr/autonomous/bkgd

EC (2018) 'Statement on Artificial Intelligence, Robotics and `Autonomous' Systems' European Group on Ethics in Science and New Technologies' European Commission,
March 2018, at http://ec.europa.eu/research/ege/pdf/ege_ai_statement_2018.pdf

EDPS (2016) 'Artificial Intelligence, Robotics, Privacy and Data Protection' European Data Protection Supervisor, October 2016, at
https://edps.europa.eu/sites/edp/files/publication/16-10-19_marrakesh_ai_paper_en.pdf

ENISA (2016) 'Risk Management:Implementation principles and Inventories for Risk Management/Risk Assessment methods and tools' European Union Agency for

http://rogerclarke.com/EC/GAIF.html Page 15 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Network and Information Security, June 2016, at https://www.enisa.europa.eu/publications/risk-management-principles-and-inventories-for-risk-management-risk-


assessment-methods-and-tools

Fieser J. (1995) 'Ethics' Internet Encyclopaedia of Philosophy, 1995, at https://www.iep.utm.edu/ethics/

Firesmith D. (2004) 'Specifying Reusable Security Requirements' Journal of Object Technology 3, 1 (Jan-Feb 2004) 61-75, at
http://www.jot.fm/issues/issue_2004_01/column6

Fischer-Hübner S. & Lindskog H. (2001) 'Teaching Privacy-Enhancing Technologies' Proc. IFIP WG 11.8 2nd World Conference on Information Security Education,
Perth, 2001, at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3950&rep=rep1&type=pdf

FLI (2017) 'Asilomar AI Principles' Future of Life Institute, January 2017, at https://futureoflife.org/ai-principles/?cn-reloaded=1

Floridi L. (2018) 'Soft Ethics: Its Application to the General Data Protection Regulation and Its Dual Advantage' Philosophy & Technology 31, 2 (June 2018) 163-167, at
https://link.springer.com/article/10.1007/s13347-018-0315-5

Freeman R.E. & Reed D.L. (1983) 'Stockholders and Stakeholders: A New Perspective on Corporate Governance' California Management Review 25:, 3 (1983) 88-106, at
https://www.researchgate.net/profile/R_Freeman/publication/238325277_Stockholders_and_Stakeholders_A_New_Perspective_on_Corporate_Governance/links/5893a4b2a6fdcc4
and-Stakeholders-A-New-Perspective-on-Corporate-Governance.pdf

GDPR (2018) 'General Data Protection Regulation' Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural
Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, at http://www.privacy-regulation.eu/en/index.htm

GEFA (2016) 'Position on Robotics and AI' The Greens / European Free Alliance Digital Working Group, November 2016, at https://juliareda.eu/wp-
content/uploads/2017/02/Green-Digital-Working-Group-Position-on-Robotics-and-Artificial-Intelligence-2016-11-22.pdf

HOL (2018) 'AI in the UK: ready, willing and able?' Select Committee on Artificial Intelligence, House of Lords, April 2018, at
https://publications.parliament.uk/pa/ld201719/ldselect/ldai/100/100.pdf

Holder C., Khurana V., Harrison F. & Jacobs L. (2016) 'Robotics and law: Key legal and regulatory implications of the robotics age (Part I of II)' Computer Law &
Security Review 32, 3 (May-Jun 2016) 383-402

HTR (2017) 'Robots: no regulatory race against the machine yet' The Regulatory Institute, April 2017, at http://www.howtoregulate.org/robots-regulators-active/#more-230

HTR (2018a) 'Report on Artificial Intelligence: Part I - the existing regulatory landscape' The Regulatory Institute, May 2018, at
http://www.howtoregulate.org/artificial_intelligence/

HTR (2018b) 'Report on Artificial Intelligence: Part II - outline of future regulation of AI' The Regulatory Institute, June 2018, at
http://www.howtoregulate.org/aipart2/#more-327

HTR (2018c) 'Research and Technology Risks: Part IV - A Prototype Regulation' The Regulatory Institute, March 2018, at http://www.howtoregulate.org/prototype-
regulation-research-technology/#more-298

ICO (2017) 'Big data, artificial intelligence, machine learning and data protection' UK Information Commissioner's Office, Discussion Paper v.2.2, September 2017, at
https://ico.org.uk/for-organisations/guide-to-data-protection/big-data/

IEEE (2017) 'Ethically Aligned Design: A Vision for Prioritizing Human Well-being with Autonomous and Intelligent Systems (A/IS)' IEEE, Version 2, December 2017, at
http://standards.ieee.org/develop/indconn/ec/autonomous_systems.html

ISM (2017) 'Information Security Manual' Australian Signals Directorate, November 2017, at https://acsc.gov.au/infosec/ism/index.htm

ISO (2005) 'Information Technology - Code of practice for information security management', International Standards Organisation, ISO/IEC 27002:2005

ISO (2008) 'Information Technology - Security Techniques - Information Security Risk Management' ISO/IEC 27005:2008

ITIC (2017) 'AI Policy Principles' Information Technology Industry Council, undated but apparently of October 2017, at https://www.itic.org/resources/AI-Policy-
Principles-FullReport2.pdf

Knight W. (2017) 'The Dark Secret at the Heart of AI' 11 April 2017, MIT Technology Review https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-
of-ai/

Leenes R. & Lucivero F. (2014) 'Laws on Robots, Laws by Robots, Laws in Robots: Regulating Robot Behaviour by Design' Law, Innovation and Technology 6, 2 (2014)
193-220

Lessig L. (1999) 'Code and Other Laws of Cyberspace' Basic Books, 1999

McCarthy J. (2007) 'What is artificial intelligence?' Department of Computer Science, Stanford University, November 2007, at http://www-
formal.stanford.edu/jmc/whatisai/node1.html

McCarthy J., Minsky M.L., Rochester N. & Shannon C.E. (1955) 'A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence' Reprinted in AI
Magazine 27, 4 (2006), at https://www.aaai.org/ojs/index.php/aimagazine/article/viewFile/1904/1802

Manwaring K. & Clarke R. (2015) 'Surfing the third wave of computing: a framework for research into eObjects' Computer Law & Security Review 31,5 (October 2015)
586-603, PrePrint at http://www.rogerclarke.com/II/SSRN-id2613198.pdf

Martins L.E.G. & Gorschek T. (2016) 'Requirements engineering for safety-critical systems: A systematic literature review' Information and Software Technology Journal
75 (2016) 71-89

Maschmedt A. & Searle R. (2018) 'Driverless vehicle trial legislation ,Äì state-by-state' King & Wood Malleson, February 2018, at
https://www.kwm.com/en/au/knowledge/insights/driverless-vehicle-trial-legislation-nsw-vic-sa-20180227

Mayer-Schonberger V. & Cukier K. (2013) 'Big Data: A Revolution That Will Transform How We Live, Work and Think' John Murray, 2013

MS (2018) 'Microsoft AI principles' Microsoft, August 2018, at https://www.microsoft.com/en-us/ai/our-approach-to-ai

http://rogerclarke.com/EC/GAIF.html Page 16 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Newcomer E. (2018). 'What Google's AI Principles Left Out: We're in a golden age for hollow corporate statements sold as high-minded ethical treatises' Bloomberg, 8
June 2018, at https://www.bloomberg.com/news/articles/2018-06-08/what-google-s-ai-principles-left-out

NIST (2012) 'Guide for Conducting Risk Assessments' National Institute of Standards and Technology, Special Publication SP 800-30 Rev. 1, September 2012, at
http://csrc.nist.gov/publications/nistpubs/800-30-rev1/sp800_30_r1.pdf

OTA (1977) 'Technology Assessment in Business and Government' Office of Technology Assessment, NTIS order #PB-273164', January 1977, at
http://www.princeton.edu/~ota/disk3/1977/7711_n.html

Pagallo U. (2016). 'Even Angels Need the Rules: AI, Roboethics, and the Law' Proc. ECAI 2016

Palmerini E. et al. (2014). 'Guidelines on Regulating Robotics Delivery' EU Robolaw Project, September 2014, at
http://www.robolaw.eu/RoboLaw_files/documents/robolaw_d6.2_guidelinesregulatingrobotics_20140922.pdf

Pichai S. (2018) 'AI at Google: our principles' Google Blog, 7 Jun 2018, at https://www.blog.google/technology/ai/ai-principles/

PoAI (2018) 'Our Work (Thematic Pillars)' Partnership on AI, April 2018, at https://www.partnershiponai.org/about/#pillar-1

Pouloudi A. & Whitley E.A. (1997) 'Stakeholder Identification in Inter-Organizational Systems: Gaining Insights for Drug Use Management Systems' European Journal of
Information Systems 6, 1 (1997) 1-14, at
http://eprints.lse.ac.uk/27187/1/__lse.ac.uk_storage_LIBRARY_Secondary_libfile_shared_repository_Content_Whitley_Stakeholder%20identification_Whitley_Stakeholder%20i

Rayome A.D. (2017) 'Guiding principles for ethical AI, from IBM CEO Ginni Rometty', TechRepublic (17 January 2017), at https://www.techrepublic.com/article/3-
guiding-principles-for- ethical-ai-from-ibm-ceo-ginni-rometty/

Russell S.J. & Norvig P. (2009) 'Artificial Intelligence: A Modern Approach' Prentice Hall, 3rd edition, 2009

Schellekens M. (2015) 'Self-driving cars and the chilling effect of liability law' Computer Law & Security Review 31, 4 (Jul-Aug 2015) 506-517

Scherer M.U. (2016) 'Regulating Artificial Intelligence Systems: Risks, Challenges, Competencies, and Strategies' Harvard Journal of Law & Technology 29, 2 (Spring
2016) 353-400, at http://euro.ecom.cmu.edu/program/law/08-732/AI/Scherer.pdf

Selbst A.D. & Powles J. (2017) 'Meaningful information and the right to explanation' International Data Privacy Law 7, 4 (November 2017) 233-242, at
https://academic.oup.com/idpl/article/7/4/233/4762325

Smith R. (2018). '5 core principles to keep AI ethical'. World Economic Forum, 19 Apr 2018, at https://www.weforum.org/agenda/2018/04/keep-calm-and-make-ai-ethical/

TvH (2006) 'Telstra Corporation Limited v Hornsby Shire Council' NSWLEC 133 (24 March 2006), esp. paras. 113-183, at
http://www.austlii.edu.au/au/cases/nsw/NSWLEC/2006/133.htm

UGU (2017) 'Top 10 Principles for Ethical AI' UNI Global Union, December 2017, at http://www.thefutureworldofwork.org/media/35420/uni_ethical_ai.pdf

Vellinga N.E. (2017) 'From the testing to the deployment of self-driving cars: Legal challenges to policymakers on the road ahead' Computer Law & Security Review 33, 6
(Nov-Dec 2017) 847-863

Villani C. (2017) 'For a Meaningful Artificial Intelligence: Towards a French and European Strategy' Part 5 - What are the Ethics of AI?, Mission for the French Prime
Minister , March 2018, pp.113-130, at https://www.aiforhumanity.fr/pdfs/MissionVillani_Report_ENG-VF.pdf

Villaronga E.F. & Roig A. (2017) 'European regulatory framework for person carrier robots' Computer Law & Security Review 33, 4 (Jul-Aug 2017) 502-220

Wachter S. & Mittelstadt B. (2019) 'A Right to Reasonable Inferences: Re-Thinking Data Protection Law in the Age of Big Data and AI' Forthcoming, Colum. Bus. L. Rev.
(2019), at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3248829

Wachter S., Mittelstadt B. & Floridi L. (2017) 'Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation'
International Data Privacy Law 7, 2 (May 2017) 76-99, at https://academic.oup.com/idpl/article/7/2/76/3860948

Warwick K. (2014) 'The Cyborg Revolution' Nanoethics 8, 3 (Oct 2014) 263-273

WH (2018) 'Summary of the 2018 White House Summit on Artificial Intelligence for American Industry' Office of Science and Technology Policy, White House, May
2018, at https://www.whitehouse.gov/wp-content/uploads/2018/05/Summary-Report-of-White-House-AI-Summit.pdf

Wright D. & De Hert P. (eds) (2012) 'Privacy Impact Assessments' Springer, 2012

Wyndham J. (1932) 'The Lost Machine' (originally published in 1932), reprinted in A. Wells (Ed.) 'The Best of John Wyndham' Sphere Books, London, 1973, pp. 13- 36,
and in Asimov I., Warrick P.S. & Greenberg M.H. (Eds.) 'Machines That Think' Holt, Rinehart, and Wilson, 1983, pp. 29-49

Zhaohui W. et al. (2016) 'Cyborg Intelligence: Recent Progress and Future Directions' IEEE Intelligent Systems 31, 6 (Nov-Dec 2016) 44-50

Acknowledgements
This paper has benefited from feedback from multiple colleagues, and particularly Peter Leonard of Data Synergies and Prof. Graham Greenleaf and Kayleen Manwaring
of UNSW. I first applied the term 'intellectics' during a presentation to launch a Special Issue of the UNSW Law Journal in Sydney in November 2017.

Author Affiliations

Roger Clarke is Principal of Xamax Consultancy Pty Ltd, Canberra. He is also a Visiting Professor in Cyberspace Law & Policy at the University of N.S.W., and a Visiting
Professor in the Research School of Computer Science at the Australian National University. He has also spent many years on the Board of the Australian Privacy
Foundation, and is Company Secretary of the Internet Society of Australia.

Access

http://rogerclarke.com/EC/GAIF.html Page 17 of 18
Roger Clarke's 'Responsible AI' 3/10/18, 16)39

Personalia Photographs Statistics

The content and infrastructure for these community service pages are provided by Roger Clarke
through his consultancy company, Xamax.
Xamax Consultancy Pty Ltd
From the site's beginnings in August 1994 until February 2009, the infrastructure was provided by ACN: 002 360 456
the Australian National University. During that time, the site accumulated close to 30 million hits. 78 Sidaway St, Chapman ACT
It passed 50 million in early 2015. 2611 AUSTRALIA
Tel: +61 2 6288 6916
Sponsored by Bunhybee Grasslands, the extended Clarke Family, Knights of the Spatchcock and
their drummer

Created: 11 July 2018 - Last Amended: 3 October 2018 by Roger Clarke - Site Last Verified: 15 February 2009
This document is at www.rogerclarke.com/EC/GAIF.html
Mail to Webmaster - © Xamax Consultancy Pty Ltd, 1995-2017 - Privacy Policy

http://rogerclarke.com/EC/GAIF.html Page 18 of 18
Responsible AI Technologies, Artefacts, Systems and Applications
50 Principles
© Xamax Consultancy Pty Ltd, 2018
This document reproduces Appendix 1 of Clarke (2018a)
The following Principles are intended to be applied by the entities responsible for all phases of AI
research, invention, innovation, dissemination and application. The cross-references are to the
sources on 'Ethical Analysis and IT' sources (Clarke 2018b – E) and of 'Principles for AI' (Clarke
2018c – P).

1. Evaluate Positive and Negative Impacts


1.1 Conceive and design only after ensuring adequate understanding of purposes and contexts
(E4.3, P5.3, P6.21, P7.1, P15.7)
1.2 Justify objectives (E3.25)
1.3 Demonstrate the achievability of postulated benefits
(Not found in any of the documents, but a logical pre-requisite)
1.4 Conduct impact assessment (E7.1, P3.12, P4.1, P4.2, P6.21, P11.8)
1.5 Publish sufficient information to stakeholders to enable them to conduct impact assessment
(E7.3, P3.7, P4.1, P8.3, P8.4, P8.7)
1.6 Conduct consultation with stakeholders and enable their participation
(E5.6, E7.2, P3.7, P8.6, P8.7)
1.7 Justify negative impacts on individuals ('proportionality') (E3.21, E7.4, E7.5)
1.8 Consider alternative, less harmful ways of achieving the same objectives (E3.22)

2. Complement Humans
2.1 Design as an aid, for augmentation, collaboration and inter-operability
(P4.5, P5.1, P9.1, P9.8 .P14.2, P14.4)
2.2 Avoid design for replacement of people by independent devices, except in circumstances in
which artefacts are demonstrably more capable than people, and even then ensuring that the
result is complementary to human capabilities (P5.1)

3. Ensure Human Control


3.1 Ensure human control over AI-based artefacts and systems
(E4.2, E6.1, E6.8, E6.19, P1.4, P2.1, P4.2, P5.2, P6.16, P8.4, P9.3, P12.1, P13.5, P15.4)
3.2 In particular, ensure control over autonomous behaviour of AI-based artefacts and systems
(E8.1, P8.4, P10.2, P11.4)
3.3 Respect each person's autonomy, freedom of choice and self-determination
(E2.1, E5.3, P3.3, P9.7, P11.3)
3.4 Ensure human review of inferences and decisions prior to acting on them (E3.11)
3.5 Respect people's expectations in relation to personal data protections (E5.6), incl.:
• awareness of data-usage (E3.6)
• consent (E3.7, E3.28, E5.3, P3.11, P4.6)
• data minimisation (E3.9)
• public visibility and consultation (E3.10, E7.2), and
• relationship of data-usage to the data's original purpose (E3.27)
3.6 Avoid deception of humans (E4.4, E6.20, P2.5)
3.7 Avoid services being conditional on the acceptance of AI-based artefacts and systems
(P4.5)

– 1 –
4. Ensure Human Safety and Wellbeing
4.1 Ensure people's physical health and safety ('nonmaleficence')
(E2.2, E3.1, E4.1, E4.3, E5.4, E6.2, E6.9, E6.13, E6.14, E6.18, P1.2, P1.3, P2.1, P3.2, P3.6, P3.9,
P3.12, P4.3, P4.9, P6.6, P8.4, P9.4, P10.2, P11.4, P13.5, P14.1, P15.3)
4.2 Ensure people's psychological safety (E3.1, E6.9, E6.13), by avoiding negative effects on any
individual's mental health, inclusion in society, worth, standing in comparison with other
people, or emotional state (E5.4, E6.3)
4.3 Ensure people's wellbeing ('beneficence')
(E2.3, E3.20, E5.5, P3.1, P3.4, P6.1, P6.14, P6.15, P8.6, P11.6, P12.2, P13.1, P15.1, P16.4)
4.4 Mitigate negative consequences (E3.24, E7.6, E6.21, E10.4)
4.5 Avoid violation of trust (E3.3)
4.6 Avoid the manipulation of vulnerable people (E4.4, P4.5, P4.9), including taking advantage of
individuals' tendency to addiction, e.g. to gambling (E6.3)

5. Ensure Consistency with Human Values and Human Rights


5.1 Ensure compliance with human rights laws (E4.2, P3.5, P3.9, P4.3)
5.2 Be just / fair / impartial and treat individuals equally (E2.4, E3.16, E3.29, P3.4)
5.3 Avoid unfair discrimination and bias, not only where it is legally procribed but also where it is
publicly unacceptable
(ICCPR Arts. 2.1, 3, 26 and 27, E3.16, P3.4, P4.5, P11.5, P15.2, P16.1)
5.4 Avoid restrictions on freedom of movement (ICCPR 12, P6.13)
5.5 Avoid interference with privacy, family, home or reputation
(ICCPR 17, E5.6, P3.11, P6.12, P8.4, P9.6, P13.3, P15.5)
5.6 Avoid interference with the rights of freedom of information, opinion and expression (ICCPR
19, P4.6)
5.7 Avoid interference with the right of freedom of assembly (ICCPR 21, P6.13)
5.8 Avoid interference with the right of freedom of association (ICCPR 22, P6.13)
5.9 Avoid interference with the rights to participation in public affairs and access to public
service (ICCPR 25, P6.13)

6. Embed Quality Assurance


6.1 Invest in quality assurance (E6.2, P4.2, P15.6)
6.2 Ensure effective, efficient and adaptive performance of intended functions
(E6.11, P1.6)
6.3 Ensure security safeguards against inappropriate modification to and deletion of sensitive
data (E3.15)
6.4 Ensure justification of the use of sensitive data (E3.26, E7.4)
6.5 Ensure data quality and data relevance (P10.3, P11.2)
6.6 Deal fairly with people (faithfulness, fidelity) (E2.5, E3.2)
6.7 Avoid invalid and unvalidated techniques (E3.5, P7.7)
6.8 Test for result validity (E3.5, P7.7. P9.2)
6.9 Impose controls in order to ensure that safeguards are operative and effective
(E7.7, P10.2)
6.10 Conduct audits of safeguards and controls (E7.8, P9.3)

– 2 –
7. Deliver Transparency and Auditability
7.1 Ensure that the fact that the process is AI-based is transparent to all stakeholders
(E4.4, P4.8)
7.2 Ensure that the means whereby inferences are drawn, decisions made and actions are taken
are logged and can be reconstructed
(E6.6, P2.4, P4.8, P5.2, P6.7, P7.4, P7.6, P8.2. P9.2, P11.1, P11.2, P13.2, P16.2, P16.3)
7.3 Ensure people are aware of inferences and how they were reached (E3.12, P2.4)

8. Exhibit Robustness and Resilience


8.1 Deliver and sustain appropriate security safeguards against compromise of intended
functions arising from both passive threats and active attacks
(E4.3, E6.11, P1.4, P1.5, P4.9, P6.6, P8.4, P9.5)
8.2 Deliver and sustain appropriate security safeguards against inappropriate access to sensitive
data arising from both passive threats and active attacks
(E3.15, E6.5, E6.10, P3.11, P9.6)
8.3 Conduct audits of justification, proportionality, transparency, mitigation measures and controls
(E7.8, E8.4)
8.4 Ensure resilience, in the sense of prompt and effective recovery from incidents

9. Ensure Accountability for Legal and Moral Obligations


9.1 Ensure that the responsible entity is apparent or can be readily discovered by any party
(E4.5, E6.4, P2.3, P3.8, P4.7, P8.5, P12.3)
9.2 Ensure that effective remedies exist, in the form of complaints processes, appeals
processes, and redress where harmful errors have occurred
(ICCPR 2.3, E3.13, E3.14, E7.7, P3.11, P4.7, P7.2, P8.7, P9.9, P10.5, P11.9, P16.3)

10. Enforce, and Accept Enforcement of, Liabilities and Sanctions


10.1 Ensure that complaints, appeals and redress processes operate effectively
(ICCPR 2.3, E7.7)
10.2 Comply with external complaints, appeals and redress processes and outcomes (ICCPR 14),
including, in particular, provision of timely, accurate and complete information relevant to
cases

References
Clarke R. (2018a) 'Guidelines for the Responsible Business Use of AI – Foundational Working
Paper' Xamax Consultancy Pty Ltd, October 2018, at http://www.rogerclarke.com/EC/GAIF.html
Clarke R. (2018b) 'Ethical Analysis and Information Technology' Xamax Consultancy Pty Ltd,
October 2018, at http://www.rogerclarke.com/EC/GAIE.html
Clarke R. (2018c) 'Principles for AI: A SourceBook' Xamax Consultancy Pty Ltd, October 2018, at
http://www.rogerclarke.com/EC/GAIP.html

– 3 –
Journal of Change Management

ISSN: 1469-7017 (Print) 1479-1811 (Online) Journal homepage: https://www.tandfonline.com/loi/rjcm20

Helping Yourself to Help Others: How Cognitive


Change Strategies Improve Employee
Reconciliation with Service Clients and Positive
Work Outcomes

Carol Flinchbaugh, Catherine Schwoerer & Douglas R. May

To cite this article: Carol Flinchbaugh, Catherine Schwoerer & Douglas R. May (2017) Helping
Yourself to Help Others: How Cognitive Change Strategies Improve Employee Reconciliation with
Service Clients and Positive Work Outcomes, Journal of Change Management, 17:3, 249-267, DOI:
10.1080/14697017.2016.1231700

To link to this article: https://doi.org/10.1080/14697017.2016.1231700

Published online: 03 Oct 2016.

Submit your article to this journal

Article views: 361

View related articles

View Crossmark data

Full Terms & Conditions of access and use can be found at


https://www.tandfonline.com/action/journalInformation?journalCode=rjcm20
JOURNAL OF CHANGE MANAGEMENT, 2017
VOL. 17, NO. 3, 249–267
http://dx.doi.org/10.1080/14697017.2016.1231700

Helping Yourself to Help Others: How Cognitive Change


Strategies Improve Employee Reconciliation with Service
Clients and Positive Work Outcomes
Carol Flinchbaugha, Catherine Schwoererb and Douglas R. Mayb
a
Department of Management, New Mexico State University, Las Cruces, NM, USA; bDepartment of
Management, University of Kansas, Lawrence, KS, USA

ABSTRACT KEYWORDS
This qualitative study examined the paradox of difficult, yet Reconciliation; cognitive
meaningful, helping as part of employees’ jobs in a social services change; meaningfulness;
organization. Incorporating an emergent design using employee paradox; positive
organizational scholarship
interviews the study identified how employees alter their
understanding of workplace challenges, such as emotional distress
and unsafe client behaviours, in order to find new meaning in the
other-oriented value of their work. The resulting framework of
employees’ experiences through challenging, yet meaningful,
helping extends the research in customer service by proposing the
reconciliation process, achieved through cognitive change strategies
(i.e. visualization techniques, cognitive reframing and mindfulness of
experience) serves as a conceptual bridge that helps the
management of this apparent paradox. We first describe the
workplace challenges and then outline the distinct cognitive
change strategies that engendered the reconciliation process.
Implications for practice and future researchers are then discussed.

What are we doing in this world, and why are we here, if not to help our neighbors. (Founder
of the study’s agency)

Today’s service employees endure numerous challenges in their attempt to help custo-
mers. Employees routinely encounter ambiguity in customer expectations (Johlke & Iyer,
2013), customer’s verbal criticisms (Goussinsky, 2011), and an overall malaise in customer
appreciation (Fisk & Neville, 2011). In essence, encountering and responding effectively to
service challenges, such as patiently listening and calmly responding to irate customers, is
merely a part of the service work. In contrast to single point-in-time employee–customer
interactions found in service work in other settings (e.g. call centres, retail), employees in
social service mental health jobs also face the further risk of on-going physical dangers and
verbal aggression due to unsafe and violent client behaviours (Mizzoni & Kirsh, 2007).
Incorporating a social service context in this study, the agency’s adolescent clients,
often victims of prior abuse and neglect, typically dismiss any need for help and feel
forced to participate in the residential-based treatment. As such, the employees rarely
experience the beneficial results of their service efforts since client improvements often

CONTACT Carol Flinchbaugh cflinch@nmsu.edu


© 2016 Informa UK Limited, trading as Taylor & Francis Group
250 C. FLINCHBAUGH ET AL.

occur well after the timeframe of the employees’ direct care (Tarren-Sweeney, 2008). How
employees navigate the abundant challenges without experiencing many service suc-
cesses in this type of service work is unclear.
Articulating how employees reach service success in arduous settings is perhaps even
more important given the increasing number of service jobs (Maneotis, Grandey, &
Krauss, 2014). The more recent service literature has begun to outline the value of contex-
tual (Anderson, 2006; Duke, Goodman, Treadway, & Breland, 2009) and dispositional (Gous-
sinsky, 2011; Maneotis et al., 2014) antecedents as influences on positive employee
responses to service challenges. This focus moves beyond the traditional scholarly research
that examined the deleterious consequences of such routine service demands on service
employee attitudes and behaviours (Anderson, 2006; Koys, 2001). In examining antece-
dents, for instance, Judge, Woolf, and Hurst (2009) identify the value of service workers’
extraversion to quality service provision. Other scholars extend earlier cognitive reappraisal
research (Folkman, Lazarus, Dunkel-Schetter, DeLongis, & Gruen, 1986) to consider how
employees regulate their emotional response to customer incivilities in both retail (i.e.
focusing on a task; Wallace, Edwards, Shull, & Finch, 2009) and call centre (i.e. job autonomy,
Grandey, Dickter, & Sin, 2004; adhering to company policy, Goldberg & Grandey, 2007) set-
tings. Such consideration of the antecedents to improved service performance provides a
valuable first step to understand factors contributing to the quality of customer service. Yet,
this research often relies on cross-sectional findings, thus failing to incorporate the ongoing
feedback loops required for successful employee navigation in ongoing service encounters
(Goodman & Rousseau, 2004). For instance, how does an employee continually respond to
daily client crises? Or face repeated verbal incivilities from a volatile youth without witnes-
sing youth improvements? Indeed, notably absent from these studies is an understanding
of the longer term processes and strategies employees engage in to successfully journey
through the paradox of highly challenging, yet highly rewarding service jobs.
To address this limitation, researchers have more recently conceptualized process-
oriented approaches that consider the reciprocal steps involved in the employee–customer
service exchange. For example, Atkins and Parker (2011) conceptualized the mutual feedback
loops between employees’ compassionate acting and behavioural responses. In this process-
oriented feedback loop employees first are mindful of the situation and then respond with
compassionate actions. Similarly, Grandey and Gabriel (2015) acknowledged the need to con-
sider cyclical employee responses in the dimensions of emotional labour. They call for an
improved understanding of what allows one employee to align their behavioural responses
with their feelings about the customer (e.g. deep acting), while another employee maintains
misalignment between service behaviours and feelings (e.g. surface acting) (Grandey &
Gabriel, 2015). Responding to these calls, qualitative studies have begun to explore employ-
ees’ process of coping with care-giving roles in social service organizations (Cricco-Lizza,
2014). However, most process-oriented studies to date focus on care-giver difficulties in
family situations, not employees in social service organizations (Glavin & Peters, 2015; Robin-
son, Weiss, Lunsky, & Ouellette-Kuntz, 2015). Thus, to answer these recent calls for a more
process-oriented approach to employee service, the purpose of this research is to identify
how employees navigate personal hardships to provide service to others.
In order to understand how employees are able to maintain employment in spite of
arduous workplace challenges, the initial research focus used a grounded theory approach
to examine the potential link between employees’ resiliency and their tenure in the
JOURNAL OF CHANGE MANAGEMENT 251

organization. Resilient employees demonstrate the capability to bounce back after hardships
(Masten, 2001) and potentially even develop new skills as a result of the challenges they
experience (Fredrickson, 2003). Indeed, Jackson, Firtko, and Edenborough (2007) found
that resilient employees could persevere and adapt to service challenges. Extending these
findings and using the foundation of positive organizational scholarship work to fully
explore these issues (e.g. Cameron, Dutton, & Quinn, 2003; Lilius, Worline, Dutton, Kanov,
& Maitlis, 2011), we initially sought to better understand how resilience increased employee
longevity in challenging service roles. Importantly, the emergent nature of the research
moved well beyond the original focus on resilience when data revealed employees use cog-
nitive change strategies, including visualization techniques, cognitive reframing, and mind-
fulness of experience to generate positive service behaviours through the process of
reconciliation. Reconciliation is depicted in the extant literature as a mutual effort by all
parties to restore a damaged relationship (Aquino, Tripp, & Bies, 2006). We demonstrate
more specifically how reconciliation exists as both an inter- and intrapersonal focus depicting
how employees make amends and have a renewed perspective to service difficulties. Sub-
sequent to these reconciliatory efforts, employees report beneficial experiences despite
adverse work conditions, such as positive, meaningful experiences in other-oriented
helping. The contribution of this research is its demonstration of how employees develop
and sustain personal capabilities in the face of difficulties and hardships in providing care-
giving services. Instead of assuming employee capabilities in the service to others are con-
stant, our framework suggests that employees have to renew their capabilities through
reconciliatory cognitive changes in order to sustain their energy for service to others.
To investigate employees’ use of cognitive change strategies and reconciliation in the
successful guidance through service challenges, we conducted an in-depth qualitative
case study of a mental health social service agency in a major Midwestern city in the
U.S. We incorporate interview data from the workplace experiences of 22 long-term
employees across agency levels at two distinct time points.

Methods
Description of the organization
The setting was a child welfare agency founded in 1843 by a religious order to help single
women who located in the city during westward expansion. In 1998, the religious order
transferred ownership to a non-profit organization. Today, the organization’s mission is
to provide residential treatment services, foster care, and in-home therapeutic family ser-
vices to youth in ‘greatest need’. The agency currently employs 238 employees and serves
over 600 children and families annually.
A visit to the organization reveals how employees face opportunities for meaningful
contributions and distressing challenges alike. Once inside you encounter locked doors,
unbreakable glass, and may hear the voice of a severely agitated girl screaming and
cursing. Walk a little closer to the troubling voice and you may come across an isolated
and secure ‘timeout room’. The child may be located here involuntarily because she is
deemed unsafe to herself and others due to her violent behaviour. She remains here
until she can safely rejoin the group. While this incident may seem unsettling to an
outside observer, this type of volatility and violence is not unexpected from the youth
252 C. FLINCHBAUGH ET AL.

served at this agency. The youth are often emotionally harmed from past abuses by family
and acquaintances and typically enter the facility following psychiatric hospitalization.
During their placement at the agency, the youth routinely engage in volatile and
violent verbal and physical aggression, endangering both the safety of other clients and
employees. These children are in need of the guidance, care, and support provided by
the agency’s employees. This is the context for our examination of the process of how
employees navigate their potentially dangerous service roles.

Philosophical positioning
The research team employed an interpretive approach relying on naturalistic methods. We
elected to conduct our research in a natural setting in order to garner a rich understanding
of the employee experiences through persistent workplace challenges (Lincoln & Guba,
1985). Below, we describe the study’s naturalistic method, including the researchers’
role, emergent design, purposeful sample, and study data and analysis.

Lead researcher’s role


The lead researcher facilitated entry into this agency because she was previously employed
here in a management position for approximately eight years. We acknowledge that the
researcher’s past agency experience is beneficial to the research and yet may create the
potential for bias. The research team took intentional steps to attenuate potential bias,
such as involving participants who had no prior professional interaction with the lead
researcher and using member checks to assure accurate representation of participants’
responses (Creswell, 2013). Rather than biasing results, we believe the researcher’s familiarity
with the agency and tacit knowledge of the context facilitated the desire to understand
employee responses to workplace challenges and supported the use of interviews to get at
the underlying nature of employee responses to difficult work (Glaser & Strauss, 1967). For
instance, during her tenure at the agency the researcher developed an understanding of its
operations and fostered relationships with employees’ at all organizational levels. These
relationships facilitated open and candid sharing of information by participants about their
workplace experiences and enhanced the trustworthiness of the findings (Pratt & Rosa, 2003).

Participants and procedures


Interview procedures
The study design used an emergent process regarding the ongoing involvement of par-
ticipants (Lincoln & Guba, 1985). Initially, the lead researcher met with the agency CEO
to discuss the proposed study, secure permission, and develop a list of potential intervie-
wees and interview questions. Upon agency approval, the research team then received
human subjects approval from the University’s Institutional Review Board. Next, the
team e-mailed all potential interviewees and arranged initial interview dates and times
with the lead researcher to occur within a two-week period. The research team also
took steps to maintain participants’ confidentiality by conducting the interviews in
private locations, offering to hold interviews at a neutral location, and creating participant
pseudonyms in notetaking. Incorporating iterative sampling, a second round of interviews
JOURNAL OF CHANGE MANAGEMENT 253

occurred four months later in order to get a deeper understanding of themes that
emerged from the initial interviews. Steps were also taken to minimize bias through
formal and informal member checks with employees regarding validity of information.

Sampling techniques
In the first interviews, the researchers, in conjunction with the CEO, used purposive
sampling to identify a diverse cross-section of employees from all programmes and organ-
izational levels in the agency. This non-random identification of potential participants
ensured that the study sample included representatives from all programmes and job
roles (Robinson, 2014). All interviewees had a minimum of 5 years of agency employment
(average of 12.5 years). We conducted 2 rounds of semi-structured interviews with 22
employees (see Table 1). In the second round of interviews, we used iterative sampling
to follow-up on new insights from employee’s initial interview comments. The interviewee
diversity in both job roles and demographic characteristics facilitated a global understand-
ing of employee experiences. Nine participants held direct care positions (i.e. more than
95% of the shift is spent directly with youth). Nineteen participants were females, consist-
ent with the social service sector (Schilling, Morrish, & Liu, 2008). Four participants were
African-American and 20 were Caucasian which is representative of the overall agency’s
employee work force. The response rate for interview participation was 86% (two non-
respondents to the interview request and two employees were on leave).

Data collection
The research team collected data using multiple methods and sources:

Open-ended interviews
The majority of data came from the semi-structured interviews. The initial interview ques-
tions were based on existing resilience research (e.g. Block & Kremen, 1996) to better
understand how employees withstood ongoing challenges. The use of semi-structured
interviews provided structure to attenuate biased questioning, but also gave the inter-
viewer flexibility to ask follow-up questions for further clarification (Pratt & Rosa, 2003).
The interviews lasted from 1 to 2 hours with most lasting around 75 minutes. Sixteen ques-
tions were asked during the initial interviews. Following an emergent process (Pratt &
Rosa, 2003), questions for the second round of interviews were based on themes from
the initial interviews, such as assessing employees’ involvement in cognitive change strat-
egies and reconciliatory processes. To minimize bias, the researchers guarded against

Table 1. Summary of participant characteristics.


Number
Role interviewed Status level Gender
Director (HR, Development, Programming) 4 Upper management 2 Females, 2 male
Executive (VP, CEO) 2 Executive 1 Female, 1 male
Clinical staff 2 Clinicians 2 Females
Direct care (youth counselor, support counselor) 8 Line 8 Females
Programme manager (admissions, quality 4 Middle management 3 Females, 1 male
improvement, service areas)
Support personnel (IT, clerical, finance) 4 Support personnel 3 Females, 1 male
Note: List of interview questions can be obtained by contacting the first author.
254 C. FLINCHBAUGH ET AL.

possible priming effects in the question design and alternated the question order between
employees’ positive and negative experiences. In order to create a comfortable environ-
ment and encourage accurate and honest responses, all interviews were conducted in
enclosed offices without obtrusive recording devices. Notes were taken using a pen and
paper method and transcribed within 24–36 hours to maintain accuracy.

Lead researcher’s work experience


The lead researcher’s prior employment provided a tacit understanding of the setting and
served to deeply inform interpretation of the participants’ responses. As a former employee,
the researcher directed the operations of a community-based programme, attended and
led organization-wide training, and represented the agency at community-wide meetings.

Informal conversations
Informal conversations that occurred between the first author and employees during the
research period served to strengthen the understanding of the setting and assess notable
changes that may have occurred since the researcher’s employment. These conversations
were characterized by a personal, familiar tone as the parties’ reminisced about past work-
place experiences. On two occasions, the researcher was approached by interviewees who
wanted to further expand upon their earlier interview responses. The content of the con-
versations was not recorded; however, the information did serve to inform the researcher’s
observations of the setting.

Data analysis
The research team employed inductive data analysis to further inform the lead research-
er’s tacit knowledge of the workplace. The first author initially reviewed and categorized
the data based on broad, theoretically based common themes. Fifty themes were ident-
ified after the initial interviews (see Table 2). Next, the author inductively sorted and col-
lapsed the themes into well-defined categories (Glaser & Strauss, 1967; Pratt & Rosa, 2003).
To foster a reflexive design and reduce potential bias from the researchers’ knowledge of
past research, the research team reviewed and discussed divergent meanings of the inter-
view data and ensuing themes (Lincoln & Guba, 1985; Wodak, 2004). As a result of the mul-
tiple perspectives of the interview data, the researchers focused on the underlying process
of employees successfully navigating through job challenges. To confirm new insights
from our initial data interpretation (Lincoln & Guba, 1985; Wodak, 2004), the researchers
decided incorporating an iterative approach with a second round of interviews with 14
participants and additional questions was necessary. The new questions sought to
explore the sequence of the cognitive and emotional processes surrounding the intervie-
wees’ experience with workplace challenges.
Following the second set of interviews, the researchers revised and amended the the-
matic categories. Final themes emerged and suggested a framework that employees
espouse to help themselves provide client services in this difficult setting. Specifically,
the authors identified how employees use distinct cognitive change techniques through-
out the day to promote personal reconciliation and positive outcomes despite job-related
challenges (see Figure 1).
JOURNAL OF CHANGE MANAGEMENT 255

Table 2. Initial themes.


Themes Supporting examples
Sense of connectedness Sense of family
Support from others
Strong role models and support at the agency
Sharing the difficulties with others is something she likes most
Collaborative strength and ability to work together
Agency is ‘home base’
It becomes validating to just be able to go to others as a strength
Continuous learning emphasis New opportunities
through change in roles Transition through many roles at agency
and clients over the years Knowledge of agency from different perspectives
Continual learning
Desire for continuous learning
Self-awareness – need to grow
Need to work with others to learn
Recognized areas of growth
Generative perspective Reframing to positive events
Change as a positive challenge
Resiliency builds and replenishes emotional resources
Finding new solutions to problems
Encouraged to be innovative
We have been through a lot and will face challenges and overcome them
Spiritual/religious element Feeling of ‘supposed to be here’
Support of prayer
Action oriented Active engagement in problem solving
Action oriented in ‘connecting, building, helping, etc.’
Helping the girls. Make them aware that there is a different way to handle things
Job gives fulfilment
Train others to come prepared with data
Positive focus Net positive at the end of the day
We are seeing results
Helping others carry on the organization’s legacy
Individual traits Has control over her future
Positive outlook
Acknowledgement of over-achievers
Awareness of limitations – cannot control others
Harmony at work
Reconciliation
Simplicity
Work–life balance
Good fit: emotional and values
Polarity in experience Limited financial resources – to – life changing events

Figure 1. Reconciliation as a bridge in explaining the paradox.


256 C. FLINCHBAUGH ET AL.

Results
Identifying positive experiences through hardships
The data revealed grave hardships in the employees’ daily job experiences. Regardless of
whether they worked directly with the youth or served in support roles, employees were
aware of the extremely difficult, even threatening, yet ultimately rewarding nature of their
work. Employees discussed the widespread conflict and ambiguity inherent in this
paradox. On the one hand, employees described the tenuous nature of the work and
the recognition that at any moment a client emergency – often a traumatic crisis – may
occur. Conversely, employees voiced their admiration for their colleagues and acknowl-
edged the valuable contributions they make towards bettering others’ lives. A clear
picture of the difficult, yet rewarding other-oriented experiences surfaces throughout
the interviews.
Employees acknowledged the difficult environment in every interview and echoed the
frequency of grim client behaviour. Employees talked about times they had been sworn at,
hit, or spat on by the youth. Employees also described situations when they needed to
physically restrain a violent child because the client was considered ‘a risk to themselves
or others’.1 One employee recollected a situation where she had to intervene alone with an
aggressive client. Her memory epitomized the shocking challenges that often arose:
We had a kid here who was a very tough kid … She was in a fight with another girl and they
broke a window. She needed to be physically restrained. My co-workers had to go deal with
the other girls, so I was left alone with this girl. As I was trying to restrain her, I got cut with the
broken glass and got bit by the girl – in a very private area. I was emotional about this situation
because I was left alone in a volatile situation.

Similarly, another employee shared a ‘very haunting experience’ where a client tried to
hang herself in the dorm bathroom. The employee recalled needing to ‘hold her up’
until other staff could come with assistance. Certainly, these examples demonstrate
how the personal trauma arising from such events are experienced by both the employee
and client. Unfortunately, these traumatic moments are typical of employee experiences
as every employee reported such negative experiences; they served as a reminder of
the organization’s mission to ‘serve those in greatest need’.
Yet the challenging and exhausting nature of the work was not the prominent emphasis in
the interviews. In contrast, employees’ discussions shifted to how their jobs presented oppor-
tunities for other-focused positive experiences. Simply put, employees did not merely focus
on the workplace challenges; they ultimately emphasized the value of other-focused helping.
Throughout the interviews, employees’ expression of positive job experiences (n = 118) far
outnumbered any difficulties (n = 68), and are reflected in comments such as:

. ‘Having positive interactions with the kids brings me joy’;


. ‘I get positive energy from my connections with the kids and other staff’;
. ‘I can recall a good day for every girl, even if there were bad times’.

Indeed, several employees reported that their job’s positive experiences outweighed
any personal sacrifices. The information technology manager, in spite of limited youth
interaction, identified the other-oriented importance of the work:
JOURNAL OF CHANGE MANAGEMENT 257

If I can help others do their jobs, then I have made a contribution … I’m not just drawing a
paycheck; I could make more than double what I make here at another job. By proxy, when
I help others I am part of the mission.

A process-oriented framework: reconciliation and cognitive change strategies


It is valuable to note that employees shifted from describing job difficulties to finding
other-oriented meaning; however, we believe the study’s real contribution arose
through employees’ description of how this mental shift occurs. By addressing how
employees accomplish cognitive changes in a challenging service context, our findings
answer recent calls (e.g. Grandey & Gabriel, 2015; Maneotis et al., 2014) to describe a
process-oriented framework of employee helping in spite of continued service incivili-
ties. The employees’ stories revealed particular steps that heightened their ability to
reconcile the paradox between challenging work and positive outcomes. The data
revealed the process-oriented shift through two specific mechanisms. First, the
process of employee reconciliation emerged throughout the interviews. Second, the
employees identified three cognitive change strategies that facilitated reconciliatory
acts. Thus, the presence of employee reconciliation emerged as a bridge in the
paradox, allowing employees to make meaningful connections in their helping work
despite experienced challenges.

Reconciliation as a bridge
We learned from the interview data how reconciliation allowed employees to make an
evaluative judgement of the paradox of engaging in difficult work while recognizing its
future positive potential. More simply, reconciliation helped an individual move beyond
the crisis moments to recognize positive outcomes in the work. Moreover, employees
described the methods they used to shift their cognitive understanding of the situation.
Thus, we find a recursive process between personal reconciliation and cognitive change.
Before we discuss the nature of reconciliation in this study, it is important to note that
reconciliation is not new to the management literature or this social service agency.
Reconciliation describes a victim’s extension of goodwill towards the offending party in
order to restore the relationship (Aquino et al., 2006). The value of reconciliation surfaces
when two or more individuals find resolution after a conflict. In fact, relationship restor-
ation, and not merely conflict resolution, epitomizes the enduring nature of reconciliation
(Palanski, 2012).
Reconciliation also has historical significance at the study’s agency. Namely, the
agency’s religious founders championed its importance by considering reconciliation as
one of their original ‘core values’. To this day, employees are actively encouraged to recon-
cile and resolve inequities that the youth have previously faced. Injustice perceptions or
conflict between individuals (i.e. colleagues, supervisors, customers) is common in any
workplace and this agency is no different. Employees noted the existence of conflict
between and among coworkers and clients. In fact, the youth are often referred to this
organization due to their excessive conflict and aggression with parents, foster parents,
or school personnel. Their conflict-ridden and often dangerous behaviours commonly con-
tinue after arriving at the agency. As such, it is no surprise that challenging youth beha-
viours lead to interpersonal conflict with employees. As a direct result of this pervasive
258 C. FLINCHBAUGH ET AL.

pattern of conflict, employees describe being emotionally and physically taxed at the end
of the day, crying over challenges with youth, and questioning whether they could have
intervened differently in conflict situations. Yet, despite the grim realities of their work
experiences, reconciliation provides a solution for employees.
Reconciliation emerged as employees express ‘other-focused’ concern. Employees
maintained their other-focused concern even when the difficulties were initiated by the
offending party, primarily the youth. Time and again, employees mentioned experiencing
a personally traumatic event, and then shared how they were capable of transcending the
crises and expressing concern for the other individual. One employee directly articulated
the importance of reconciliation:
I have had conflict with others at all levels … I realize that I just need to ‘shake-off’ some con-
flicts and I don’t need to turn every possibility of disagreement into a conflict … But this is
where the value of reconciliation is important – the idea is to restore relationships. It makes
sense with the kids when we (staff) can model it ourselves.

Thus, the employee realized the need to actively live the values of the agency in her inter-
actions with the clients.

Reconciliation through cognitive change strategies


A deeper look into the employee responses to workplace hardships reveals that the
capacity for reconciliation does not exist solely as a trait-like disposition, but actually
emerges as a new capability through the employees’ ongoing utilization of cognitive
change strategies. Employee use of the techniques allowed them to refocus their daily
job perspective from one of relentless challenges to one of meaningful, positive experi-
ences. Based on these findings, we offer a process-oriented framework that proposes
that employees renew their capabilities through cognitive changes in order to reconcile
their energies to serve others. First, this framework identifies how three distinct cognitive
change strategies can guide employees through the paradox by facilitating reconciliation.
Then, we identify how intrapersonal reconciliation emerges from the cognitive strategies to
allow employees to reassess the crisis and initiate relationship repair in the absence of the
offending party.
Before we outline the cognitive change strategies, we first introduce intrapersonal
reconciliation, a novel component in the reconciliation process. Traditionally, scholars
depict reconciliation as a process dependent upon mutual amends-making by all involved
parties following an injustice (Aquino et al., 2006). For instance, both parties take steps
towards resolution together, such as giving apologies (Shnabel & Nadler, 2008) or partici-
pating in mediation (Poitras & Le Tareau, 2009). However, in this setting, employees noted
how reconciliation also occurs at an intrapersonal level. The employees illustrated intraper-
sonal reconciliatory acts where they embraced a changed perspective towards the experi-
enced offense in the absence of the offending party. In these cases, a victim (e.g.
employee) reconciled the offense without the offender’s (e.g. client) help. Moreover, intra-
personal reconciliation was apparent through employee use of the cognitive change strat-
egies. The employees voiced their reliance on distinct cognitive change techniques,
namely visualization techniques, cognitive reframing, and mindfulness of experience, to
generate a renewed perspective through frequent crises.
JOURNAL OF CHANGE MANAGEMENT 259

Visualization techniques
The employees described their use of routine and clearly defined visualization patterns as
they mentally prepared for workday challenges. They also reported their use of the visu-
alization techniques at explicit times and locations. For example, one employee acknowl-
edged her intentional incorporation of the daily visualizations. She states:
There are uncertainties in the hand that you are dealt each day. You just have to handle the
hand that is dealt and try to make the best out of it. If I come in with the attitude that things
are going to be good – then it can be. But, I do need to be ‘re-tooled’ with the right mind and
spirit every day. Each day I prepare my mindset. I have a mini pep rally for myself. I listen to
gospel music, talk to someone who makes me laugh and say a prayer. When I first started
working here I would listen to NPR, the news, and talk radio on my drive in. After some
time, I changed it to my pep rally and I only listen to gospel music. There is no more news,
weather or things like that. The gospel music helps me get in the right mindset to be prepared
for work with the right attitude.

The employees’ intentional use of personal visualizations occurred at varied but consist-
ent time points in the workday. A direct care employee acknowledged that she took
specific steps to deal with the negative work emotions and has developed a ‘pattern of
leaving here. Where I talk with a friend on my drive home and debrief any difficult
things. Then we move on and talk about other things’. On more difficult days, this
employee went home and took a shower to ‘wash off the day’s difficulties’. Another super-
visor reported using daily techniques where she visualizes her experiences at the begin-
ning and end of her workday. As she drives down the driveway on her way home, she
‘visualizes a compartment at the end of the hill where I leave my difficult emotions
from the day’. She then ‘picks it up from the compartment when I return’.
The employees’ description of their visualization techniques was similar to the formal
practice of autogenic training. Users of autogenic training maintain daily practice sessions
to induce a relaxation state and train the subconscious mind to develop intentional mental
associations (Stetter & Kupper, 2002). In this study, employees reported engaging in visu-
alization techniques as they prepared for the workday by listening to select music genres
or ending the workday by explicitly compartmentalizing work thoughts as they leave the
facility. Furthermore, similar to the known health benefits of autogenic training (Stetter &
Kupper, 2002), the employees also described the benefits gained from the visualization
techniques, such as an elevated mood, balance in their work and home life, and improved
mental clarity.
Interestingly, no employees claimed learning about the visualization techniques in
training or through another’s advice. In fact, no employees even used the terms ‘visualiza-
tion techniques’ or ‘autogenic training’ to describe the techniques. They simply describe
patterns of behaviours that helped them deal with the workday difficulties; they visualized
positive scenarios to facilitate coping with daily challenges. Through their personal
resources they envisioned a method to help themselves cope with the tenuous uncertain-
ties of their jobs.

Cognitive reframing
Further, employees conveyed how they cognitively altered their perceptions of their work-
place experiences. Employees voiced a capability to reframe a given situation in order to
reflect an alternative perspective. Frequently, their reframing took the form of
260 C. FLINCHBAUGH ET AL.

circumventing a difficult situation through a focus on the future positives. This is particu-
larly the case when employees altered their perspective of the youth’s current violent
behaviours by recalling the past abuse the child endured and envisioning future
success for the child. They say,

. ‘I can’t make sense of the kid’s trauma. I need to move on and realize that it doesn’t
always have to be this way. I trust the mission and what we are doing. This will
change their lives’.
. ‘I really feel that people are doing the best they can with what they have. Everyone is
wounded in some way – not everyone knows or sees this’.
. ‘I like the idea of being the person that plants the seed (of future change) … I can
believe that they will benefit from what I have done. I have learned to celebrate the
small growth and progress in the kids’.

When the interview focus moved from the clients, the employees emphasized how they
also used cognitive reframing to make a deliberate cognitive shift to positive emotions,
memories, or situations. Employees expressed how they actively sought out a co-worker
who will make them laugh or who has an appreciative mindset; or recalled a positive
memory such as how their team was full of laughs during the last team meeting. Impor-
tantly, almost all of the interviewees (20 out of 22) expressed a pattern of reframing chal-
lenges in a positive light.
We recognize that employee engagement in cognitive reframing is not unique to this
study. Cognitive reframing is a psychological technique that allows one to perceive a past
event from another perspective and from this new vantage point feel more comfortable
with the situation (Erickson, Rossi, & Rossi, 1976). In workplace contexts, reframing has
been examined to model how employees normalize their ‘dirty jobs’ (Ashforth, Kreiner,
Clark, & Fugate, 2007). Cognitive reframing is also similar to rational-emotive behavioural
therapy (REBT) a therapeutic technique in which individuals rationally consider how
emotions influence their self-talk and behaviours (Ellis, 2003). REBT has been used in work-
place settings as a practical approach to improving employee performance (Criddle, 2007).
Yet, in this study, employee use of reframing extended beyond normalization or rational-
ization of their job. In contrast, the cognitive shift enabled employees to illustrate their
meaningful, other-oriented contributions rather than focus on the ‘dirty’, undesirable
characteristics of their roles.

Mindfulness of experience
The employees also reflected on past experiences where they successfully navigated
through difficulties in order to positively reconcile the challenges. Mindfulness of, or
awareness of, past workplace success essentially eases employees’ apprehension about
existing uncertainties. Through fostering a mindfulness of their past experiences, employ-
ees garnered additional information about the causes and consequences of past work-
place events. Akin to situational attribution conceptualized in attribution theory (Lord &
Smith, 1983; Weiner, 1985), we find how an employee’s mindfulness of experience assisted
in his/her explanation of the roots of a behaviour. In this setting, recalling past situations of
successful service was especially informative as employees reflected on past events to
ease the potential for negative future outcomes. For example, an employee employed
JOURNAL OF CHANGE MANAGEMENT 261

at the agency for 19-years discussed her perspective on job difficulties, ‘I have been here
so long and have seen so many positive experiences. I have an instinctual sense that things
will get better’. Her reflection on successful resolutions to past challenges enables her to
reconcile present workplace challenges. Importantly, the focused mindfulness of past suc-
cesses does not rely on employee tenure-level at the organization. In fact, the interviewee
with the shortest agency tenure (five years) expressed a similar mindfulness of experience:
‘I will work through the hard times. I have recognized that the good and bad times cycle.
Seeing the kid’s success stories gives me hope for the future’. Even the employee who
experienced getting cut and bitten is able to reflect on the unsavory experience and
articulate positive remarks:
… It was an emotional experience, but two years ago, that same client called here and apol-
ogized for the day that she cut me. She told me that she is now donating some of her belong-
ings to a friend in need … I recognized that joy came through this initial hostility. The girl
asked her if I would forgive her and I said ‘absolutely’. There is joy out of the tragedy.

Employees’ mindfulness of their past experiences facilitates reconciliation of present diffi-


culties in the expectation of positive future outcomes. Through this cognitive change
process, employees draw connections between past situations and anticipated positive
outcomes. In this case, we find that the employees’ descriptions of the temporal sequence
of mindfulness extend what we know about mindfulness as strictly an experiential proces-
sing of the present moment (Brown, Ryan, & Creswell, 2007; Sutcliffe, Vogus, & Dane, 2016).
Instead, the employees report how mindfulness of experience enables them to help others
reach future potential in spite of past hardship.

Discussion
The dynamic process underlying the paradox
Our research highlights how employee reconciliation, in the form of both interpersonal
and intrapersonal reconciliation, provides the conceptual bridge in the paradox
between difficult and meaningful other-focused service work. The presence of paradoxical
context is common in the management literature (Davis, Maranville, & Obloj, 1997);
however, many studies fail to identify the underlying factors that contribute to such a
paradox (Lewis, 2000). In this study, we outline a process-oriented framework to demon-
strate how employee reconciliation achieved through the use of cognitive change strat-
egies bridges the underlying tensions in the paradox between difficult, yet meaningful
service work. Reconciliation as the conceptual bridge allows employees to find meaning
in their jobs despite the challenging environment. Through such reconciliation, employees
discover that the positive helping experiences in their jobs outweigh any personal injus-
tice or threat faced at work.

Importance of intrapersonal reconciliation


Intrapersonal reconciliation that results from employees’ use of the cognitive change strat-
egies extends the current understanding of reconciliation. The extant reconciliation
research suggests that successful reconciliation requires the participation of all parties
(i.e. victim and offender) involved in the offense (Aquino et al., 2006). However, we
262 C. FLINCHBAUGH ET AL.

demonstrate how intrapersonal reconciliation emerges when employees use cognitive


change strategies to find a meaningful perspective of the offender and/or offense
without active involvement with the offending party. Furthermore, we believe our depic-
tion of intrapersonal reconciliation moves beyond similar psychological constructs such as
compassion or empathy, due to the employees’ actual experience of being physically or
emotionally harmed in the service encounter. To this end, intrapersonal reconciliation
facilitates a self-referential process where employees first independently need to make
sense of the difficult experiences and then identify how they can continue to make mean-
ingful service contributions.
Employees’ altered perspectives towards their workplace difficulties following the use
of the cognitive change strategies were reported by the vast majority of the participants.
Similar to the processes that add value to ‘dirty jobs’ (Ashforth & Kreiner, 1999) and happi-
ness through hardship (Allen & McCarthy, 2015), employee use of cognitive reframing and
mindfulness of experience helped them gain an increased understanding of their own and
others’ workplace experiences. The employees’ cognitive changes involved their recog-
nition that hardships will pass, that the youth rely on employees’ support to overcome per-
sonal traumas, and that the agency will continue to provide successful youth service
consistent with its history. For employees, mindfulness of experience as a temporal cog-
nitive connection (between past and future experiences) appears to extend what we
know about the mindfulness process and assist employees in attaining their service
goals (Sutcliffe et al., 2016) by facilitating connections between static events, such as
felt emotions (Desbordes et al., 2015) and future service performance (Beach et al.,
2013; Reb, Narayanan, & Ho, 2015). Through these new perspectives employees develop
new capabilities to help them transcend difficult work experiences and recognize the
value of their ongoing service.
Similarly, the cognitive change achieved through visualization techniques allows
employees to maintain their engagement at work despite hardships, serves as a buffer
against negative emotions inherent in their jobs, and provides mental clarity in their
workday preparations. Visualization techniques also help employees better understand
work situations, enhance their willingness to reconcile offenses, and raise their level of
positive emotions. In turn, employees’ positive experiences contribute to their flourishing
instead of languishing in job-related difficulties (Fredrickson, 2004). To this end, we show
how intrapersonal reconciliation is a type of self-transcendence that cognitively enriches
difficult jobs (May, Gilson, & Harter, 2004).
This new component of reconciliation stemming from the cognitive change is a valu-
able personal capability for employees in this setting for several reasons. First, this
agency’s employee–client relationship exemplifies a context that requires employees to
face grave client injustices as a routine part of their jobs. Second, due to environmental
constraints (e.g. shift changes) or service protocols (e.g. professional boundaries),
limited dialogue between the employee and the client after the offensive act is typical.
For example, after a violent client outburst, the client might be transferred to a more
restrictive care setting which prevents further client–employee interaction. Moreover,
similar to service guidelines in other service-oriented contexts (i.e. retail, call centres), pro-
fessional adherence to service protocol might limit an employees’ ability to actively
engage in relationship repair after a client offense. To this end, scant opportunity for
JOURNAL OF CHANGE MANAGEMENT 263

interpersonal reconciliation exists and warrants the need for employees’ intrapersonal
reconciliation of the offense.

Contextual and practical implications


The employees reported using all three cognitive change strategies; however, the use of
particular techniques depended largely on the specific job role. For example, the employ-
ees in direct care roles expressed active engagement in daily visualization strategies
outside their work schedule to help them prepare for and cope with their difficult jobs.
It is perhaps their proximity to the youth which requires these employees to engage in
visualization techniques outside work time; they simply cannot remove themselves
from crises. As such, the employees recognize the need for their intentional preparation
for workplace challenges, and visualization techniques appear to be the most effective
means to do this. Indeed, they acknowledged how their ongoing practice of such tech-
niques helped them fulfil their job responsibilities despite youth challenges, elevated
their mood after work, and helped them bring a positive outlook to the workplace.
Alternatively, those relatively removed from the youth in administrative or support roles
were more likely to use cognitive reframing. We contend there are several possible expla-
nations for their preference for such reframing. First, the distal location of their offices
physically removes them from sudden client crises and allows for time to cognitively
process challenging events and find an alternative viewpoint. Second, many administra-
tors have clinical backgrounds and have been previously exposed to similar therapeutic
interventions used at the agency.
Interestingly, while employee use of visualization techniques and cognitive reframing
largely depended on job role, employees in any role and at any tenure-level used mind-
fulness to seek success through consideration of past experiences. It is possible that the
reported influence of the agency mission guided employees to recall the value of their
past service. In summary, regardless of employees’ preferences for particular techniques,
the cognitive change strategies helped them to individually reconcile difficult client
service experiences.
Our findings contribute to positive organizational research and the service literature
(Anderson, 2006; Lilius et al., 2011) by moving beyond antecedents and outcomes to illus-
trate a more process-oriented approach of successful employee navigation through diffi-
cult jobs. Moreover, the use of these practices provides a unique respite for employees
who may otherwise fail to alleviate job strain in an industry highly vulnerable to employee
burnout and turnover (e.g. Burke & Greenglass, 1989). Future researchers should examine
employee use of cognitive change strategies in additional job roles and industries. It may
be that the ambiguous and potentially unsafe nature of client interaction in this study
lends itself to heightened employee use of the strategies.

Limitations and opportunities for research


Like any research our study is not without limitations. First, there is the potential for
researcher bias in any qualitative research study (e.g. Shenton, 2004). However, we did
take steps to mitigate potential bias and increase the trustworthiness of the results
through the use of member checks and discussion of divergent data interpretation.
264 C. FLINCHBAUGH ET AL.

Second, the authors believe it is important to acknowledge that while our study’s focus
depicted how employees used cognitive change strategies to achieve reconciliation and
positive workplace outcomes, the employees also acknowledged the agency’s contextual
shortcomings. For instance, employees shared their concerns about limited financial
resources and the agency’s pending leadership change. Certainly these conditions were
seen as problematic; nevertheless, it was clear to the researchers that the employees
were nonetheless capable of maintaining a positive outlook and finding worth and
meaning in their jobs. Future researchers should examine alternative factors that may
facilitate employees’ positive experiences in similar settings. It may be that the organiz-
ational culture influences employees’ pro-social motivation. Employees may make connec-
tions between the agency’s past religious affiliation and jobs that increase the salience of
their own values and meaningfulness experienced (Dik & Duffy, 2009). Likewise, future
researchers should examine the influence of other contextual influences on employees’
meaningfulness, such as supervisor and coworker influences (May et al., 2004) or tenure
at the organization.
Future researchers should also investigate whether focused employee training on the
use of cognitive change techniques can enhance employee performance through devel-
opment of psychological resources supportive of intrapersonal reconciliation. Akin to the
known physical and psychological benefits of stress management techniques (Van der
Klink, Blonk, Schene, & Van Dijk, 2001), our findings appear to indicate that training
employees on the benefits of cognitive change might assist individuals in moving
their mindset towards a future orientation of expected positive job-related outcomes
for both themselves and their clients. This future investigation will require a combination
of qualitative longitudinal work and quantitative analysis of employee reports and
records.

Conclusion
In this paper, we examined the individual level processes of cognitive change strategies
and reconciliation that influence employees’ positive workplace outcomes. The findings
go beyond identification of a job-related paradox to interpret how reconciliation and
the change strategies serve as the underlying process in managing the tensions in this
paradox. In fact, an employee explicitly referenced such successful ‘transformation’ follow-
ing the tension to reward in the following job-related experience:
My most memorable experience was working with a girl who was extremely volatile in her
actions and emotions. I would get a knot in my stomach anytime I had to work with her.
Our conversation would start great and then would turn on a dime-often leading to difficult
behaviors. But, she eventually did really well in her foster home and we were celebrating her
move to an independent living home. I wrote something up to share at this ceremony and
I compared my time working with her like a roller coaster. There are moments that scare
you to death, and moments that give you a thrill, but at the end of the ride you want to
get back on and ride again. That was my experience with this girl. My time with her was
transforming to me.

It is our hope that the study’s findings and emergent, process-focused framework can be
used by researchers and practitioners alike in order to extend our understanding of how
employees successfully navigate workplace challenges.
JOURNAL OF CHANGE MANAGEMENT 265

Note
1. Safe crisis management, a therapeutic physical restraint, is used to maintain client safety
during violent outbursts.

Disclosure statement
No potential conflict of interest was reported by the authors.

Notes on contributors
Carol Flinchbaugh an Assistant Professor of management at New Mexico State University. Her
research interests seek to understand how organizational policies and procedures influence
employee level behaviour in areas such as employee well-being, stress, and positive psychology.
Catherine Schwoerer is an Associate Professor of management at the University of Kansas. Her work
considers employees’ self-efficacy, well-being, and career management.
Douglas R. May is a Professor and Director of the International Center for Ethics in Business at the
University of Kansas. He examines employees’ moral efficacy and ethical decision-making as well as
employees’ pursuit of engagement, thriving and meaningful work.

References
Allen, M. S., & McCarthy, P. J. (2015). Be happy in your work: The role of positive psychology in
working with change and performance. Journal of Change Management. doi:10.1080/14697017.
2015.1128471
Anderson, J. (2006). Managing employees in the service sector: A literature review and conceptual
development. Journal of Business and Psychology, 20, 501–523.
Aquino, K., Tripp, T., & Bies, R. (2006). Getting even or moving on? Power, procedural justice, and
types of offense as predictors of revenge, forgiveness, reconciliation, and avoidance in organiz-
ations. Journal of Applied Psychology, 91, 653–668. doi:10.1037/0021-9010.91.3.653
Ashforth, B., & Kreiner, G. (1999). ‘How can you do it?’: Dirty work and the challenge of constructing a
positive identity. The Academy of Management Review, 24, 413–434. doi:10.5465/AMR.1999.2202129
Ashforth, B., Kreiner, G., Clark, M., & Fugate, M. (2007). Normalizing dirty work: Managerial tactics for
countering occupational taint. Academy of Management Journal, 50, 149–174. doi:10.5465/AMJ.
2007.24162092
Atkins, P., & Parker, S. (2011). Understanding individual compassion in organizations: The role of
appraisals and psychological flexibility. Academy of Management Review, amr-10. doi:10.5465/
amr.10.0490
Beach, M. C., Roter, D., Korthuis, P. T., Epstein, R. M., Sharp, V., Ratanawongsa, N., … , Saha, S. (2013). A
multicenter study of physician mindfulness and health care quality. The Annals of Family Medicine,
11(5), 421–428.
Block, J., & Kremen, A. M. (1996). IQ and ego-resiliency: Conceptual and empirical connections and
separateness. Journal of Personality and Social Psychology, 70(2), 349–361.
Brown, K. W., Ryan, R. M., & Creswell, J. D. (2007). Mindfulness: Theoretical foundations and evidence
for its salutary effects. Psychological Inquiry, 18, 211–237.
Burke, R., & Greenglass, E. (1989). Psychological burnout among men and women in teaching: An
examination of the Cherniss Model. Human Relations, 42, 261–273. doi:10.1177/
001872678904200304
Cameron, K., Dutton, J., & Quinn, R. (Eds). (2003). Positive organizational scholarship: Foundations of a
new discipline. San Francisco, CA: Berrett-Koehler Publishers, Inc.
Creswell, J. W. (2013). Research design: Qualitative, quantitative, and mixed methods approaches.
Thousand Oaks, CA: Sage.
266 C. FLINCHBAUGH ET AL.

Cricco-Lizza, R. (2014). The need to nurse the nurse: Emotional labor in neonatal intensive care.
Qualitative Health Research, 24(5), 615–628.
Criddle, W. D. (2007). Adapting REBT to the world of business. Journal of Rational-Emotive & Cognitive-
Behavior Therapy, 25(2), 87–106.
Davis, A., Maranville, S., & Obloj, K. (1997). The paradoxical process of organizational transformation:
Propositions and a case study. Research in Organizational Change and Development, 10, 275–314.
Desbordes, G., Gard, T., Hoge, E. A., Hölzel, B. K., Kerr, C., Lazar, S. W., … , Vago, D. R. (2015). Moving
beyond mindfulness: Defining equanimity as an outcome measure in meditation and contempla-
tive research. Mindfulness, 6(2), 356–372.
Dik, B., & Duffy, R. (2009). Calling and vocation at work: Definitions and prospects for research and
practice. The Counseling Psychologist, 37, 424–450. doi:10.1177/0011000008316430
Duke, A., Goodman, J., Treadway, D., & Breland, J. (2009). Perceived organizational support as a mod-
erator of emotional labor/outcomes relationships. Journal of Applied Social Psychology, 39, 1013–
1034. doi:10.1111/j.1559-1816.2009.00470
Ellis, A. (2003). Reasons why rational emotive behavior therapy is relatively neglected in the pro-
fessional and scientific literature. Journal of Rational-Emotive and Cognitive-Behavior Therapy, 21
(3–4), 245–252.
Erickson, M., Rossi, E., & Rossi, S. (1976). Hypnotic realities: The induction of clinical hypnosis and forms
of indirect suggestion. New York: Irvington.
Fisk, G., & Neville, L. (2011). Effects of customer entitlement on service workers’ physical and psycho-
logical well-being: A study of waitstaff employees. Journal of Occupational Health Psychology, 16,
391–405. doi:10.1037/a0023802
Folkman, S., Lazarus, R. S., Dunkel-Schetter, C., DeLongis, A., & Gruen, R. J. (1986). Dynamics of a stress-
ful encounter: Cognitive appraisal, coping, and encounter outcomes. Journal of Personality and
Social Psychology, 50(5), 992–1003.
Fredrickson, B. L. (2003). The value of positive emotions. American Scientist, 91, 330–335.
Fredrickson, B. L. (2004). The broaden-and-build theory of positive emotions. Philosophical
Transactions of the Royal Society B: Biological Sciences, 359, 1367–1377.
Glaser, B., & Strauss, A. (1967). The discovery of grounded theory: Strategies for qualitative research.
Chicago, IL: Aldine.
Glavin, P., & Peters, A. (2015). The costs of caring: Caregiver strain and work–family conflict among
Canadian workers. Journal of Family and Economic Issues, 36(1), 5–20.
Goldberg, L. S., & Grandey, A. A. (2007). Display rules versus display autonomy: Emotion regulation,
emotional exhaustion, and task performance in a call center simulation. Journal of Occupational
Health Psychology, 12(3), 301–318.
Goodman, P., & Rousseau, D. (2004). Organizational change that produces results: The linkage
approach. The Academy of Management Executive, 18, 7–19. doi:10.5465/AME.2004.14776160
Goussinsky, R. (2011). Does customer aggression more strongly affect happy employees? The mod-
erating role of positive affectivity and extraversion. Motivation and Emotion, 35, 220–234. doi:10.
1007/s11031-001-9215
Grandey, A. A., Dickter, D. N., & Sin, H. P. (2004). The customer is not always right: Customer aggression
and emotion regulation of service employees. Journal of Organizational Behavior, 25(3), 397–418.
Grandey, A., & Gabriel, A. (2015). Emotional labor at a crossroads: Where do we go from here? Annual
Review of Organizational Psychology and Organizational Behavior, 2(1), 323–349. doi:10.1146/
annurev-orgpsych-032414-111400
Jackson, D., Firtko, A., & Edenborough, M. (2007). Personal resilience as a strategy for surviving and thriv-
ing in the face of workplace adversity: A literature review. Journal of Advanced Nursing, 60(1), 1–9.
Johlke, M., & Iyer, R. (2013). A model of retail job characteristics, employee role ambiguity, external
customer mind-set, and sales performance. Journal of Retailing and Consumer Services, 20, 58–67.
Judge, T. A., Woolf, E. F., & Hurst, C. (2009). Is emotional labor more difficult for some than for others?
A multilevel, experience-sampling study. Personnel Psychology, 62(1), 57–88.
Koys, D. (2001). The effects of employee satisfaction, organizational citizenship behavior, and turn-
over on organizational effectiveness: A unit-level, longitudinal study. Personnel Psychology, 54,
101–114. doi:10.1111/j.1744-6570.2001.tb00087
JOURNAL OF CHANGE MANAGEMENT 267

Lewis, M. (2000). Exploring paradox: Toward a more comprehensive guide. The Academy of
Management Review, 25, 760–776. doi:10.5465/AMR.2000.3707712
Lilius, J., Worline, M., Dutton, J., Kanov, J., & Maitlis, S. (2011). Understanding compassion capability.
Human Relations, 64, 873–899. doi:10.1177/0018726710396250
Lincoln, Y., & Guba, E. (1985). Naturalistic inquiry. London: Sage.
Lord, R. G., & Smith, J. E. (1983). Theoretical, information processing, and situational factors affecting
attribution theory models of organizational behavior. Academy of Management Review, 8(1), 50–60.
Maneotis, S., Grandey, A., & Krauss, A. (2014). Understanding the ‘why’ as well as the ‘how’: Service
performance is a function of prosocial motives and emotional labor. Human Performance, 27,
80–97. doi:10.1080/08959285.2013.854366
Masten, A. S. (2001). Ordinary magic: Resilience process in development. American Psychologist, 56,
227–238.
May, D., Gilson, R., & Harter, L. (2004). The psychological conditions of meaningfulness, safety and
availability and the engagement of the human spirit at work. Journal of Occupational and
Organizational Psychology, 77, 11–37. doi:10.1348/096317904322915892
Mizzoni, C., & Kirsh, B. (2007). Employer perspectives on supervising individuals with mental health
problems. Canadian Journal of Community Mental Health, 25, 193–206.
Palanski, M. (2012). Forgiveness and reconciliation in the workplace: A multi-level perspective and
research agenda. Journal of Business Ethics, 109, 275–287. doi:10.1007/s10551.011.1125.1
Poitras, J., & Le Tareau, A. (2009). Quantifying the quality of mediation agreements. Negotiation and
Conflict Management Research, 2(4), 363–380.
Pratt, M., & Rosa, J. (2003). Transforming work–family conflict into commitment in network marketing
organizations. The Academy of Management Journal, 46, 395–418.
Reb, J., Narayanan, J., & Ho, Z. W. (2015). Mindfulness at work: Antecedents and consequences of
employee awareness and absent-mindedness. Mindfulness, 6(1), 111–122.
Robinson, O. C. (2014). Sampling in interview-based qualitative research: A theoretical and practical
guide. Qualitative Research in Psychology, 11(1), 25–41.
Robinson, S., Weiss, J. A., Lunsky, Y., & Ouellette-Kuntz, H. (2015). Informal support and burden among
parents of adults with intellectual and/or developmental disabilities. Journal of Applied Research in
Intellectual Disabilities, 29(4), 356–365.
Schilling, R., Morrish, J., & Liu, G. (2008). Demographic trends in social work over a quarter-century in
an increasingly female profession. Social Work, 53, 103–114. doi:10.1093/sw/53.2.103
Shenton, A. K. (2004). Strategies for ensuring trustworthiness in qualitative research projects.
Education for Information, 22(2), 63–75.
Shnabel, N., & Nadler, A. (2008). A needs-based model of reconciliation: Satisfying the differential
emotional needs of victim and perpetrator as a key to promoting reconciliation. Journal of
Personality and Social Psychology, 94, 116–132.
Stetter, F., & Kupper, S. (2002). Autogenic training: A meta-analysis of clinical outcome studies.
Applied Psychophysiology and Biofeedback, 27, 45–98. doi:10.1023.A.1014576505223
Sutcliffe, K. M., Vogus, T. J., & Dane, E. (2016). Mindfulness in organizations: A cross-level review.
Annual Review of Organizational Psychology and Organizational Behavior, 3, 55–81.
Tarren-Sweeney, M. (2008). Retrospective and concurrent predictors of the mental health of children
in care. Children and Youth Services Review, 30(1), 1–25.
Van der Klink, J. J., Blonk, R. W., Schene, A. H., & Van Dijk, F. J. (2001). The benefits of interventions for
work-related stress. American Journal of Public Health, 91(2), 270–276.
Wallace, J. C., Edwards, B. D., Shull, A., & Finch, D. M. (2009). Examining the consequences in the ten-
dency to suppress and reappraise emotions on task-related job performance. Human Performance,
22(1), 23–43.
Weiner, B. (1985). ‘Spontaneous’ causal thinking. Psychological Bulletin, 97, 74–84. doi:10.1037/0033-
2909.97.1.74
Wodak, R. (2004). Critical discourse analysis. In C. Seale, J. F. Gubrium, & D. Silverman (Eds.), Qualitative
research practice (pp. 185–204). Thousand Oaks, CA: Sage.
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/51554230

Effect Size Estimates: Current Use, Calculations, and Interpretation

Article  in  Journal of Experimental Psychology General · August 2011


DOI: 10.1037/a0024338 · Source: PubMed

CITATIONS READS
2,783 52,868

3 authors, including:

Catherine Fritz
The University of Northampton
26 PUBLICATIONS   3,778 CITATIONS   

SEE PROFILE

All content following this page was uploaded by Catherine Fritz on 04 December 2016.

The user has requested enhancement of the downloaded file.


THIS ARTICLE HAS BEEN CORRECTED. SEE LAST PAGE

Journal of Experimental Psychology: General © 2011 American Psychological Association


2012, Vol. 141, No. 1, 2–18 0096-3445/11/$12.00 DOI: 10.1037/a0024338

Effect Size Estimates: Current Use, Calculations, and Interpretation

Catherine O. Fritz and Peter E. Morris Jennifer J. Richler


Lancaster University Vanderbilt University

The Publication Manual of the American Psychological Association (American Psychological Associ-
ation, 2001, 2010) calls for the reporting of effect sizes and their confidence intervals. Estimates of effect
size are useful for determining the practical or theoretical importance of an effect, the relative contri-
butions of factors, and the power of an analysis. We surveyed articles published in 2009 and 2010 in the
Journal of Experimental Psychology: General, noting the statistical analyses reported and the associated
reporting of effect size estimates. Effect sizes were reported for fewer than half of the analyses; no article
reported a confidence interval for an effect size. The most often reported analysis was analysis of
variance, and almost half of these reports were not accompanied by effect sizes. Partial ␩2 was the most
commonly reported effect size estimate for analysis of variance. For t tests, 2/3 of the articles did not
report an associated effect size estimate; Cohen’s d was the most often reported. We provide a
straightforward guide to understanding, selecting, calculating, and interpreting effect sizes for many types
of data and to methods for calculating effect size confidence intervals and power analysis.

Keywords: effect size, eta squared, confidence intervals, statistical reporting, statistical interpretation

Experimental psychologists are accomplished at designing and included in the sixth edition (APA, 2010). Regarding effect sizes,
analyzing factorial experiments and at reporting inferential statis- the sixth edition states,
tics that identify significant effects. In addition to statistical sig-
nificance, most research reports describe the direction of an effect, For the reader to appreciate the magnitude or importance of a study’s
but it is also instructive to consider its size. Estimates of effect size findings, it is almost always necessary to include some measure of
effect size in the Results section. Whenever possible, provide a
are useful for determining the practical or theoretical importance
confidence interval for each effect size reported to indicate the pre-
of an effect, the relative contribution of different factors or the cision of estimation of the effect size. (APA, 2010, p. 34)
same factor in different circumstances, and the power of an anal-
ysis. This article reports the use of effect size estimates in the 2009 Effect sizes allow researchers to move away from the simple
and 2010 volumes of the Journal of Experimental Psychology: identification of statistical significance and toward a more gener-
General (JEP: General), comments briefly on their use, and offers ally interpretable, quantitative description of the size of an effect.
practical advice on choosing, calculating, and reporting effect size They provide a description of the size of observed effects that is
estimates and their confidence intervals (CIs). independent of the possibly misleading influences of sample size.
Effect size estimates have a long and somewhat interesting Studies with different sample sizes but the same basic descriptive
history (for details, see Huberty, 2002), but the current attention to characteristics (e.g., distributions, means, standard deviations, CIs)
them stems from Cohen’s work (e.g., Cohen, 1962, 1988, 1994) will differ in their statistical significance values but not in their
championing the reporting of effect sizes. In response to Cohen effect size estimates. Effect sizes describe the observed effects;
(1994) the American Psychological Association (APA) Board of effects that are large but nonsignificant may suggest further re-
Scientific Affairs set up a task force that proposed guidelines for search with greater power, whereas effects that are trivially small
statistical methods for psychology journals (Wilkinson & the APA but nevertheless significant because of large sample sizes can warn
Task Force on Statistical Inference, 1999). These guidelines were researchers against possibly overvaluing the observed effect.1
subsequently incorporated into the revised fifth edition of the Effect sizes can also allow the comparison of effects in a single
Publication Manual of the American Psychological Association study and across studies in either formal or informal meta-
(APA, 2001; hereinafter APA Publication Manual) and were again analyses. When planning new research, previously observed effect
sizes can be used to calculate power and thereby estimate appro-
priate sample sizes. Cohen (1988), Keppel and Wickens (2004),
This article was published Online First August 8, 2011. and most statistical textbooks provide guidance on calculating
Catherine O. Fritz, Educational Research Department, Lancaster Uni- power; a very brief, elementary guide appears in the Appendix
versity, Lancaster, United Kingdom; Peter E. Morris, Department of Psy- along with mention of planning sample sizes based on accuracy in
chology, Lancaster University, Lancaster, United Kingdom; Jennifer J.
Richler, Department of Psychology, Vanderbilt University.
1
We thank Thomas D. Wickens and Geoffrey Cumming for their very It is rarely the case that experimental studies have the problem of too
helpful advice on an earlier version of this article. many cases making trivial effects statistically significant, but some large-
Correspondence concerning this article should be addressed to Catherine scale surveys and other studies with very large sample sizes can have this
O. Fritz, Educational Research Department, Lancaster University, Lan- problem. For example, a correlation of .1, accounting for only 1% of the
caster LA1 4YD, United Kingdom. E-mail: c.fritz@lancaster.ac.uk variability, is statistically significant with a sample size of 272 (one tailed).

2
EFFECT SIZE ESTIMATES 3

parameter estimation (i.e., planning the size of the CIs; Cumming, two levels or are continuous, effect size estimates usually describe
2012; Kelley & Rausch, 2006; Maxwell, Kelley, & Raush, 2008). the proportion of variability accounted for by each independent
A brief note on the terminology used in this article may be variable; they include eta squared (␩2, sometimes called R2),
helpful. Effect sizes calculated to describe the data in a sample, partial eta squared (␩2p), generalized eta squared (␻G 2
), associated
like any other descriptive statistic, also potentially estimate the omega squared measures (␻2, ␻2p, ␻G 2
), and common correlational
corresponding population parameter. Throughout this article, we measures, such as r2, R2, and Radj2. In addition, there are other less
refer to the calculated effect size, which describes the sample and frequently encountered statistics, such as epsilon squared (ε2;
estimates the population, as an effect size estimate. It is important Ezekiel, 1930) and various statistics devised by Cohen (1988),
to remember that the estimates both describe the sample and including q, f, and f2. Finally, there are the effect size estimates
estimate the population and that some statistics, as we describe relevant to categorical data, such as phi (␾), Cramér’s V (or ␾c),
later, provide better estimation of the population parameters than Goodman–Kruskal’s lambda, and Cohen’s w (Cohen, 1988). The
do others. plethora of possible effect size estimates may create confusion and
The most basic and obvious estimate of effect size when con- contribute to the lack of engagement with reporting and interpret-
sidering whether two data sets differ is the difference between the ing effect sizes. Many of these statistics are conceptually, and even
means; most articles report means, and the difference is easily algebraically, quite similar but have been developed as improve-
calculated. Some researchers argue that differences between the ments or to serve different types of data and different purposes.
means are generally sufficient and superior to other ways of The emergence of a consensus to use a few selected estimates
quantifying effect size (e.g., Baguley, 2009; Wilkinson & the APA would probably be a useful simplification, as long as the choice
Task Force on Statistical Inference, 1999). The raw difference was driven by the genuine usefulness of those estimates and not
between the means can provide a useful estimate of effect size merely by their easy availability.
when the measures involved are meaningful ones, such as IQ or One important distinction to make among effect sizes is that
reading age, assessed by a standard scale that is widely used. some statistics, such as ␩2 and R2, describe the samples observed
Discussion of the effect would naturally focus on the raw differ- but may overestimate the population parameters, whereas others,
ence, and it would be easy to compare the results with other such as ␻2 and adjusted R2, attempt to estimate the variability in
research using the same measure. the sampled population and, thus, in replications of the experiment.
However, comparing means without considering the distribu- These latter statistics are often recommended by statistical text-
tions from which the means were calculated can be seriously books because they relate to the population and are less vulnerable
misleading. If two studies (A and B) each have two conditions with to inflation from chance factors. However, researchers very rarely
means of 100 and 108, it would be very misleading to conclude report these population estimates, perhaps because they tend to be
that the effects in the two studies are the same. If the standard smaller than the sample statistics.
deviations for the conditions in Study A were both two and in Although the APA Publication Manual has strongly advocated
Study B were both 80, then it is clear that the distributions for the reporting of effect sizes for 10 years and many psychology
Study A would have virtually no overlap, whereas those for Study editors have done so for longer than that (e.g., Campbell, 1982;
B would overlap substantially. Using Cohen’s U1, which we Levant, 1992; Murphy, 1997), a glance through many journals
describe later, we find that only 2% of the distributions for Study suggests that such reporting is inconsistent. Morris and Fritz
A would overlap, given the standardized difference between the (2011) surveyed cognitive articles published in 2009; they found
means (d) of 4, but 92% of the distributions would overlap in that only two in five of these articles intentionally reported effect
Study B because the standardized difference between these means sizes. Isabel Gautier, as the incoming editor of JEP: General,
is d ⫽ 0.1. Significance tests make the difference between the two asked us to conduct a similar survey of recent volumes of this
studies quite clear. For the study with standard deviations of two, journal and to review the methods of calculating effect size esti-
a t test would find a two-tailed significant difference (p ⬍ .05) mates.
with three participants per group, but 770 participants per group
would be needed to obtain a significant difference for the study Method
with standard deviations of 80. The consequence of the difference
in the size of the distributions is also obvious when considering the We reviewed articles published in the 2009 and 2010 volumes
CIs: With 50 samples in each study, Study A’s CI ⫽ ⫾0.6, of JEP: General, noting the statistical analyses, descriptive statis-
whereas Study B’s CI ⫽ ⫾22.2. These examples illustrate how tics, and effect size estimates reported in each.
comparisons between means without considering the variability of
the data can conceal important properties of the effect. To address Results
this problem, standardized effect size calculations have been de-
veloped that consider variability as well as the differences between Table 1 provides frequencies of the most commonly used sta-
means. Effect size calculations are addressed by a growing number tistical analyses for each year; corresponding percentages are il-
of specialized texts, including Cumming (2012), Ellis (2010), lustrated in Figure 1. Note that data are reported for each article,
Grissom and Kim (2005, 2011), and Rosenthal, Rosnow, and not for each experiment, but the analyses were similar across
Rubin (2000) as well as many general statistical texts. experiments in most articles. Analysis of variance (ANOVA) was
When examining the difference between two conditions, effect reported in most articles, 83% overall, followed by t tests, 66%
sizes based on standardized differences between the means are overall; these were often used together to locate the source of
commonly recommended. These include Cohen’s d, Hedges’s g, effects in factorial designs. Overall, 58% of the articles reported at
and Glass’s d and ⌬. When independent variables have more than least one measure of effect size (73% for 2009, 45% for 2010).
4 FRITZ, MORRIS, AND RICHLER

Table 1 value for the test, although most reported some statistics associated
Numbers of Articles Reporting Commonly Used Statistical with the predictors, such as the t tests, the regression weights, the
Analyses partial correlations, or the odds ratios. We counted all reports of R2
as estimates of effect size, although most were not explicitly
Year Articles ANOVA t test Correlation Regression presented as such.
2009 33 27 23 14 8 A few nonparametric and frequency-based tests were also re-
2010 38 32 24 13 9 ported; only one of these included a measure of effect size. These
Overall 71 59 47 27 17 reports also tended to neglect statistical summaries of the data.
Note. Corresponding percentages are shown in Figure 1. ANOVA ⫽
analysis of variance. Discussion
Our initial concern over the reporting of effect sizes was justi-
Different analyses are often associated with different estimates fied by our analysis. Across the 2 years studied, 42% of articles
of effect size; therefore, the use of specific effect size estimates is reported no measure of effect size. Most of the articles counted as
reported within the context of the analysis conducted. Table 2 including effect sizes reported them for only some of the analyzed
shows the effect size estimates used in conjunction with ANOVA effects. Even where articles reported ␩2p for ANOVA analyses,
type analyses, including analysis of covariance. Overall, slightly they often omitted effect size estimates for nonsignificant effects
more than half of the articles reported a measure of effect size and other comparisons. Fewer than a third of the articles reporting
associated with ANOVA at least once; ␩2p was by far the most t tests included associated effect size estimates to aid in interpret-
frequently used, almost certainly because it is provided by SPSS. ing the results. On the positive side, reported effect sizes in the
Effect size estimates were rarely reported for further JEP: General articles were clear with respect to which effect size
ANOVA-related analyses, such as simple effects and post hoc and statistic was used. Our recent survey of cognitive articles (Morris
planned comparisons. Table 3 summarizes the frequency of further & Fritz, 2011) found articles in which effect sizes were wrongly
ANOVA-related analyses and the inclusion of effect size esti- identified, and we have occasionally encountered articles that
mates. Most articles did not report all components of the ANOVA; report effect size figures without identifying which statistic was
only 10 of the 59 articles (five from each year) reported the mean used. As Vacha-Haase and Thompson (2004) observed, it is es-
square error (MSE) terms and only 20 (12 for 2009 and 8 for 2010) sential to correctly identify which statistic is used: Reporting that
reported F ratios for all effects. When reporting ANOVA, articles an effect size is .5 means something very different depending on
usually included descriptive statistics for the data, either in terms whether the statistic used is d, r, r2, ␩2, ␻2, or others.
of individual cells or marginals. At least some means associated We observed almost no interpretation of the effect sizes that
with ANOVA were reported in 93% of the articles (93% for 2009 were reported, despite APA’s direction to address the “theoretical,
and 94% for 2010); some measure of variability was less often clinical, or practical significance of the outcomes” (APA, 2010, p.
reported, appearing in only 80% of the articles (81% for 2009 and 36). Clearly effect sizes are important in a clinical and practical
78% for 2010). sense. Are they less relevant in a theoretical sense? If theories are
When reporting t tests, roughly one quarter of articles included solely concerned with the statistical significance of effects and not
a measure of effect size; Cohen’s d was the most often used effect with their size, then perhaps there is no useful role for effect size
size estimate. See Table 4 for numbers and percentages of effect consideration in interpretation, but surely good theories are con-
size estimates and descriptive statistics reported in association with cerned with substantive significance rather than merely statistical
t tests. Descriptive statistics were less often provided than for significance. A theory that only predicts a difference (or relation-
ANOVA; almost one quarter did not report a measure of central
tendency, and almost half failed to report the variability of the
data.
Neither intentional reporting of effect size estimates nor descrip-
tive statistics tended to accompany reports of correlations. Refer to
Table 5 for numbers and percentages of articles intentionally
reporting effect size estimates and descriptive statistics associated
with correlation analyses. Fewer than 10% of the articles reporting
correlations provided r2 or any other associated effect size estimate
beyond the correlation, and fewer than one quarter reported de-
scriptive statistics for the data. Although r is a useful estimate of
effect size, there is a difference between reporting it as a correla-
tion and treating it as an estimate of effect size; none of these
articles appeared to present it as an effect size estimate.
Figure 1. Percentage of published articles including each type of analy-
Various types of regression were also reported in 17 articles.
sis. Other types of analyses were also observed with lower frequencies,
Most of these were very selective in terms of the statistics reported including ␹2 (18% for 2009 and 16% for 2010), nonparametric difference
from the analysis and in terms of descriptive statistics; there were tests (3% for 2009 and 11% for 2010), and Cronbach’s alpha (6% for 2009
almost no reports of effect size. Table 6 shows numbers and and 11% for 2010); these were not accompanied by measures of effect size
percentages of statistics reported in association with regression and are not discussed here. Corresponding frequencies are shown in Table
analyses. Most articles did not report the F ratio or significance 1. ANOVA ⫽ analysis of variance.
EFFECT SIZE ESTIMATES 5

Table 2
Number (and Percentage) of Articles Reporting Effect Size Estimates Associated With ANOVA

Year Articles with ANOVA Any ES measure ␩2 ␩2p ␻2 ␻2p d

2009 27 18 (67) 1 (6) 17 (94) 0 0 2 (11)


2010 32 15 (47) 5 (33) 9 (60) 1 (7) 0 3 (20)
Overall 59 33 (56) 6 (18) 26 (79) 1 (3) 0 5 (15)

Note. Articles were included in the counts if the statistic was reported at least once. Percentages across measures sum to more than 100% because some
articles included more than one measure of effect size. ANOVA ⫽ analysis of variance; ES ⫽ effect size.

ship) but is not concerned with the size of that effect will be one means, are point estimates. They describe the sample and provide
that is quite difficult to falsify and perhaps even more difficult to an estimate of the population parameter. For an estimate to be
apply. useful, it is important to provide some idea of how precise that
It appears that effect sizes may be reported to meet the minimum estimate might be—the expected range within which the popula-
letter of the law, with little regard for the spirit of the law. The tion parameter falls with some specified probability. When means
preponderance of ␩2p in these analyses and the sparsity of discus- are reported, it is widely accepted good practice to report some
sion of reported effect sizes is consistent with a scenario wherein measure of variability— either the standard error, the standard
people obtain ␩2p values from their statistical software and report it deviation (from which the standard error is easily calculated), or a
as required, but they give scant consideration to the implications of CI. These variability statistics provide a guide to the probable
the values obtained. Little appears to have changed in the 60 years values for the population parameter. The effect size estimate also
since Yates (1951) observed that an emphasis on statistical signif- requires some accompanying description of its likely variability;
icance testing had two main negative consequences: Statisticians that variability statistic is the associated CI.
develop significance tests for The lack of these CIs in research reports is, however, under-
problems . . . of little or no practical importance [and] scientific standable. Textbooks that describe effect size statistics typically do
research workers . . . pay undue attention to the results of the tests of not provide associated guidance for calculating the CIs. Com-
significance they perform on their data, particularly data derived from monly used statistical software packages also fail to provide them.
experiments, and too little to the estimates of the magnitude of the Furthermore, most measures of effect size are noncentrally distrib-
effects they are investigating. (p. 32). uted (see, e.g., Ellis, 2010, pp. 19 –21; Grissom & Kim, 2005, p.
64), a somewhat nonintuitive concept that makes them more dif-
Many researchers may be cautious about engaging too deeply
ficult to understand and to calculate. The CIs above and below the
with the effect size values that they calculate because, in contrast
to the use of inferential statistics, they have far less experience in effect size estimate are not equal in size and have to be estimated
using the effect size estimates as an aspect of evaluating results by special software carrying out iterative procedures (e.g., Cum-
and providing guidance for future research. In part, this situation ming, 2012; Cumming & Finch, 2001; Smithson, 2003; Steiger,
may arise from the tendency to report ␩2p, which has limited 2004). These unusual characteristics of CIs for effect size esti-
usefulness. The ␩2p statistic may be useful for cross-study compar- mates, combined with researchers’ lack of familiarity with them,
isons with identical designs, but where designs differ, ␩G 2
is may help to explain why these CIs are not reported. In a later
needed. Within a factorial study, ␩p cannot properly be used to
2 section on CIs, we suggest sources for relevant software and offer
compare effects; ␩2 is needed. We describe each of these measures formulas to approximate CIs for Cohen’s d and R2.
and discuss the possible uses and interpretations of various effect Analyses were sometimes reported without the relevant descrip-
sizes in a later section of the article. tive statistics, making it more difficult for the reader to understand
The fifth and sixth editions of the APA Publication Manual and evaluate the results. The most basic sort of effect size estimate
recommend the inclusion of CIs for effect size estimates, but none when evaluating differences is the difference between means, but
were included in any of the JEP: General articles that we exam- almost one quarter of t test reports were not accompanied by
ined, and these CIs were only reported in 1 of the 386 cognitive means, and almost half lacked reports of variability measures.
articles that we surveyed (Morris & Fritz, 2011). Effect sizes, like When evaluating correlations, it is also necessary to consider the

Table 3
Articles Reporting Additional Analyses Associated With ANOVA and Effect Size Reporting for Those Analyses

For significant interactions


Articles with Post hoc or planned Post hoc or planned
Year ANOVA contrasts effect size No. of articles Simple effects Simple effect size

2009 27 16 1 (6%) 18 10 (55%) 1 (10%)


2010 32 20 1 (5%) 23 14 (44%) 2 (14%)
Overall 59 36 2 (6%) 41 24 (41%) 3 (13%)

Note. Articles were included in the counts if the analysis or statistic appeared at least once. ANOVA ⫽ analysis of variance.
6 FRITZ, MORRIS, AND RICHLER

Table 4 Table 6
Number (and Percentage) of Articles Reporting Effect Size Number (and Percentage) of Articles Reporting Effect Size
Estimates and Descriptive Statistics Associated With t Tests Estimates and Descriptive Statistics Associated With Regression
Analyses
Effect size
estimates Descriptive statistics Descriptive statistics
Articles with
Year Articles with t test d ␩ 2
p M Variability Year regression R2a F M Variability
a
2009 23 9 (39) 0 18 (78) 16 (70) 2009 8 4 (7) 1 (13) 6 (75) 6 (75)
2010 24 3 (13) 2 (8) 18 (75) 10 (42)b 2010 9 2 (8) 3 (33) 3 (33) 2 (22)
Overall 47 12 (26) 2 (4) 36 (77) 26 (55) Overall 17 2 (7) 4 (24) 9 (53) 8 (47)

Note. Articles were included if the statistic was reported at least once. Note. Articles were included if the statistic was reported at least once.
a
Nine articles reported standard deviation, six reported standard error of a
One of the 2009 articles reported adjusted R2 as well as R2; no other
the mean, and one reported 95% confidence interval. b Nine articles articles reported adjusted R2.
reported standard deviation, and five reported standard error of the mean.

multiple regression allows the effect of variables to be evaluated


distribution of the data, but the great majority—about 80%— of both collectively and in terms of their individual contributions.
the correlation reports failed to include a description of the data. Statistical outputs of regression analyses typically include clear
The frequent use of ANOVA, reported in 83% of the JEP: measures of effect size at all levels and in various forms, through
General articles, resembled our finding for cognitive journals R2, adjusted R2, changes in R2, standardized regression weights,
(Morris & Fritz, 2011) where 86% of articles reported at least one partial correlations and semipartial correlations. However, among
ANOVA. There is an argument that ANOVA is overused and that the articles surveyed, regression was rarely used except as a
in many cases, regression would be more appropriate (e.g., Cohen, model-fitting tool.
Cohen, West, & Aiken, 2003). ANOVA is a valuable technique, A substantial number of articles (20%) failed to report measures
but when factors are categorized versions of continuous variables, of the variability of data analyzed by ANOVA. As we illustrated
it is less appropriate than more flexible techniques, such as mul- earlier, standard deviations are as important as means for under-
tiple regression. After all, ANOVA and regression both derive standing data sets. One might argue that both standard deviations
from the general linear model, and ANOVA can be regarded as a and standard errors should be reported, which rarely occurred.
special case of regression (e.g., Tabachnick & Fidell, 2007, p. Where both were reported, it was usually the case that standard
155). To fit the requirements of ANOVA, researchers sometimes errors appeared as error bars on a figure and standard deviations
treat continuous variables, such as age, skill level, and knowledge, were reported in the text or a table; this combination provided an
as if there were a small number of discrete, categorical values. This accessible, clear description of the data. Like significance tests and
process is problematic in that it loses some of the power inherent CIs, the size of standard errors depends on the sample size, so that
in the original continuous variable. The fewer the number of levels although a standard error is very valuable for interpreting differ-
defined, the more data and thus the more power are lost. If just two ences between conditions, it does not provide an easily appreciated
or three levels are defined, it can also be tempting when planning idea of the distribution of the data. Standard deviations are a more
factorial research to select groups or conditions that are high and straightforward way of helping the reader conceptualize data sets.
low on some variable and, as a result, to almost certainly overes- Although the reporting of MSE is not a requirement for APA
timate the influence of the variable and the size of its effect in the journals, many editors encourage it, so it was surprising that 83%
population including midrange values. In addition, although of the articles surveyed did not include MSEs with the reports of
ANOVA tests the statistical significance of each individual effect, ANOVAs. The great advantage of reporting MSEs is that, along
with the value of the F ratio and its accompanying degrees of
freedom, knowledge of the relevant MSE allows the reader to
Table 5 reconstruct the remaining ANOVA details for that test including
Number (and Percentage) of Articles Reporting Effect Size the sums of squares, which are useful for effect size estimations.
Estimates and Descriptive Statistics Associated With Thus, the reporting of MSEs opens up considerable opportunities
Correlations for readers wishing to understand the data in greater depth and
perhaps calculate their own effect size estimates. Ideally, for each
Descriptive statistics ANOVA it would be valuable for the article to include the signif-
Articles with Effect size icance tests of all effects— both significant and nonsignificant—
Year correlation (r2) M Variability
with their MSEs reported so that the full details of the analysis
2009 14 1 (7) 2 (14) 2 (14)a could be reconstructed. Complete reporting would be useful for
2010 13 1 (8) 5 (38) 4 (31)b meta-analyses and would allow readers to calculate the types of ␩2
Overall 27 2 (7) 7 (26) 6 (22) and ␻2 that they thought most appropriate.
Note. Articles were included if the statistic was reported at least once.
Although r is also a measure of effect size, it was not described or treated Calculating Effect Sizes
as such in the articles surveyed.
a
The articles reported standard deviation. b Two articles reported stan- Our aim in this section is to demystify as far as possible selected
dard deviation and two reported standard error of the mean. effect size estimates and to recommend convenient ways of cal-
EFFECT SIZE ESTIMATES 7

culating them. General purpose statistics books for psychologists


(e.g., Aron, Aron, & Coups, 2009; Howell, 2002) typically address
few effect size statistics and may not consider them thoroughly.

g or dunb ⫽ d 1 ⫺
3
4df ⫺ 1 冊
.

Readers may prefer to consult specialized statistics books address- The correction is very small when the sample size is large (only
ing effect sizes (e.g., Cumming, 2012; Ellis, 2010; Grissom & 3% for df ⫽ 25) but is more substantial with a smaller sample size
Kim, 2005; Rosenthal et al., 2000). (8% for df ⫽ 10). This value is not the same as the original
This article addresses several effect sizes: those specific to Hedges’s g (1982), described earlier, although g might be used to
comparing two conditions (Cohen’s d, Hedges’s g, Glass’s d or ⌬, refer to either; dunb is a less ambiguous symbol, but in either case
and point biserial correlation r), those describing the proportion of the formula should be provided for clarity.
variability explained (␩2, ␩2p, ␩G2
, R2, the ␻2 family, adjusted R2, A discussion of the rather confusing history of the chosen
and ε ), and effect sizes for nonnormal data (z associated with the
2
symbols for these statistics can be found in Ellis (2010). For most
Mann–Whitney and Wilcoxon tests, and ␾, Cramér’s V, and reasonably sized samples, the difference between Cohen’s d, cal-
Goodman–Kruskal’s lambda for categorical data). culated using n, and Hedges’s g, calculated using n ⫺ 1 degrees of
freedom (df), will be very small. Especially when sample sizes are
small, it is helpful for authors to clearly specify how the reported
Effect Sizes Specific to Comparing Two Conditions effect size estimates were calculated, regardless of what symbol is
The most common approach to calculating effect size when used, so that the reader can interpret them correctly and they might
comparing two conditions is to describe the standardized differ- be useful for subsequent meta-analyses.
ence between the means, that is, the difference between the means There is virtually always some difference between the standard
of the two conditions in terms of standard (z) scores. There are deviations of the two distributions. When the standard deviations
varieties of this approach, discussed later, based on the way the (sA and sB) and the sample sizes of the two distributions (A and B)
standard deviation is calculated. In all cases, the sign of the effect are very similar, it may be sufficiently accurate when estimating
size statistic is a function of the order assigned to the two condi- the combined standard deviation (sAB) to take the average of the
tions; where the conditions are not inherently ordered, a positive two standard deviations:
effect size should be reported. Online calculators for the standard- sA ⫹ sB
ized difference statistics are available (e.g., Becker, 2000; Ellis, sAB ⫽ .
2
2009).
Cohen’s d and Hedges’s g. Cohen (1962, 1988) introduced a When the standard deviations differ but the sample sizes for each
measure similar to a z score in which one of the means from the group are very similar, then averaging the square of the standard
two distributions is subtracted from the other and the result is deviations and taking the square root of the result is more accurate
divided by the population standard deviation (␴) for the variables: (Cohen, 1988, pp. 43– 44; Keppel & Wickens, 2004, p. 160):

d⫽
MA ⫺ MB

, sAB ⫽ 冑 s2A ⫹ s2B
2
.

where MA and MB are the two means and ␴ refers to the standard However, where the sample size and/or the standard deviation of
deviation for the population. Hedges (1982) proposed a small the two distributions differ markedly it is usually recommended
modification for his statistic g in which the population standard (e.g., Keppel & Wickens, 2004) that the sums of squares and the
deviation (␴, calculated with a denominator of n, the number of degrees of freedom for the two variables should be combined with
cases) is replaced by the pooled sample standard deviation (s, the following formula (Keppel & Wickens, p. 160):
calculated with a denominator of n ⫺ 1)

g⫽
MA ⫺ MB
s
.
sAB ⫽ 冑 SSA ⫹ SSB
dfA ⫹ dfB
.

That is, the sum of squares for the two variables A and B should be
The standard deviations made available by common statistical added together, as should the degrees of freedom for the variables.
packages are for the sample(s) so that the more convenient statistic Then, the sum of the sums of squares is divided by the sum of the
for researchers to calculate is g rather than d. However, as we degrees of freedom, and the square root of the result taken. When
observed in our review, it is rare for authors to report Hedges’s g, not provided by the statistical package, the sum of squares for a
even though it may be what they have actually calculated. It variable can be easily calculated from the standard deviation as
appears to be the case that d may be often used as a generic term
for this type of effect size. For example, Borenstein, Hedges, SS ⫽ df ⫻ s2
Higgins, and Rothstein (2009) referred to the g statistic defined
or from the standard error of the mean (SE) as
above as d as does Comprehensive Meta-Analysis software that is
widely used for meta-analysis. These sources use g to refer to an SS ⫽ df ⫻ SE2 ⫻ N.
unbiased calculation, sometimes called dunbiased or dunb, that is
particularly useful for small sample sizes, where d tends to over- If pairs of conditions are being compared from among several
estimate the population effect size. The formula to adjust d, from that have been evaluated by an ANOVA, rather than working out
Borenstein et al. (p. 27), is the standard deviation for each comparison, it is acceptable to
8 FRITZ, MORRIS, AND RICHLER

replace the combined standard deviation for the multiple compar- Table 7
isons by the square root of the MSE (Grissom & Kim, 2005): Associated Values of Cohen’s d, r, r2 (or ␩2), PS, and U1

sAB ⬇ 冑MSE d r r2 or ␩2 PS U1

In an attempt to help with the interpretation of d, Cohen (1988) 0.0 .00 .000 50 0
suggested that d values of .8, .5, and .2 represented large, medium, 0.1 .05 .002 53 8
0.2 .10 .010 56 15
and small effect sizes, respectively, perhaps more meaningfully 0.3 .15 .022 58 21
described as obvious, subtle, and merely statistical. He recognized 0.4 .20 .038 61 27
that what would be a large, medium, or small effect size would, in 0.5 .24 .059 64 33
practice, depend on the particular area of study, and he recom- 0.6 .29 .083 66 38
0.7 .33 .11 69 43
mended these values for use only when no better basis for esti-
0.8 .37 .14 71 47
mating the effect size index was available. These designations 0.9 .41 .17 74 52
clearly do not reflect practical importance or substantive signifi- 1.0 .45 .20 76 55
cance, as those are judgments based on a more comprehensive 1.1 .48 .23 78 59
consideration of the research. 1.2 .51 .27 80 62
1.3 .55 .30 82 65
Glass’s d or ⌬. An alternative to both Cohen’s d and Hedg- 1.4 .57 .33 84 68
es’s g involves using the standard deviation for a control group 1.5 .60 .36 86 71
rather than a standard deviation based on combining the groups. 1.6 .63 .39 87 73
This approach is appropriate if the experimental manipulations are 1.7 .65 .42 89 75
1.8 .67 .45 90 77
thought to have distorted the distribution in some way. This
1.9 .69 .47 91 79
measure was proposed by Glass (1976) and is known as Glass’s d 2.0 .71 .50 92 81
or ⌬. 2.2 .74 .55 94 84
Point biserial correlation, r. There are alternatives to using 2.4 .77 .59 96 87
the standardized difference statistics as described earlier. Some 2.6 .79 .63 97 89
2.8 .81 .66 98 91
(e.g., Rosenthal, Rosnow, & Rubin, 2000) have preferred the point 3.0 .83 .69 98 93
biserial correlation coefficient, r, on the grounds that psychologists 3.2 .85 .72 99 94
are already familiar with it. Furthermore, r2 is equivalent to ␩2 and 3.4 .86 .74 99 95
other effect size estimates that describe the proportion of variabil- 3.6 .87 .76 99 96
3.8 .89 .78 100 97
ity associated with an effect, described later. For two groups, the
4.0 .89 .80 100 98
point biserial correlation, r, is calculated by coding group mem-
bership with numbers, for example, 1 and 2. The correlation Note. PS ⫽ probability of superiority. PS is the percentage of occasions
between these codes and the scores for the two conditions give the when a randomly sampled member of the distribution with the higher mean
value of point biserial r. It is also easy to calculate r if an will have a higher score than a randomly sampled member of the other
distribution. U1 ⫽ the percentage of nonoverlap between the two distribu-
independent samples t test has already been carried out because tions. Data are from Grissom (1994) and Cohen (1988); they assume


similar sample sizes.
t2
r⫽ .
t ⫹ df
2

next to these variability-based measures. Most of the variability-


Just as the sign of the t statistic is an artifact of the order assigned
based effect size estimates involve comparing various combina-
to the conditions, so too is the sign of the effect size. Unless there
tions of sums of squares and means squares taken from ANOVA
is a meaningful order for the two conditions, the statistics should
summary tables.2 To illustrate the different measures we refer to
be reported as positive numbers. If there is a meaningful order and
Table 9, which reports an imaginary three-way between-subjects
it was used for the t test, the sign of the t statistic should be applied
ANOVA.
to r.
Partial eta squared (␩p2). The ␩2p statistic is simply the ratio
Table 7 provides values of r corresponding to values of d when
of the sum of squares for the particular variable under consider-
group sizes are similar. An excellent discussion of the relative
ation divided by the total of that sum of squares and the sum of
benefits and limitations of d and point biserial r is provided by
squares of the relevant error term. It describes the proportion of
McGrath and Meyer (2006). For point biserial r, McGrath and
variability associated with an effect when the variability associated
Meyer suggested that values of .37, .24, and .10 represent large,
with all other effects identified in the analysis has been removed
medium and small— or obvious, subtle, and merely statistical—
from consideration. As we described earlier, it is the most com-
effect sizes, respectively. Formulas for converting between several
monly reported effect size in recent issues of JEP: General. This
effect size estimates, including r, are provided in Table 8.
popularity is almost certainly because ␩2p can be calculated directly
by SPSS. In general, the formula is
Effect Sizes Describing the Proportion of Variability
Explained
2
Sums of squares are calculated by subtracting the mean for any set of
For pairs of conditions, it is also possible to apply proportion of data from each score, squaring each result, and summing these squared
variability statistics such as R2 or ␩2, in a manner similar to the deviations from the mean. The mean square is the sum of squares divided
squared point biserial correlation, r2, described earlier. We turn by the degrees of freedom.
EFFECT SIZE ESTIMATES 9

Table 8
Formulas for Deriving Effect Size Estimates Directly and Indirectly

To this statistic

From this statistic d Point biserial r ␩2 with similar group sizes

Direct formula
d ⫽
MA ⫺ MB
␴ r ⫽ 冑 SSeffect
SStotal
␩2 ⫽
SSfactor
SStotal

d — d d2
r ⫽ ␩2 ⫽
冑d 2
⫹4 d ⫹4
2

Point biserial r 2r — ␩2 ⫽ r2
d ⫽
冑1 ⫺ r 2

␩ with similar group sizes 2 冑␩2 冑␩2


2
r ⫽ —
d ⫽
冑共1 ⫺ ␩2兲
t with similar group sizes
d ⫽
2t
冑N ⫺ 2 r ⫽ 冑 t2
t ⫹ df
2
␩2 ⫽
t2
t ⫹ df
2

Note. When group sizes differ considerably (when one group has fewer than one third of the total N), then r is smaller than the above calculation. For
more information about the translation between statistics with very uneven sample sizes, see McGrath and Meyer (2006).

SSeffect Eta squared (␩2). The ␩2 statistic (sometimes called R2) is a


␩ 2p ⫽ . simple ratio of the variability associated with an effect compared with
SSeffect⫹SSerror
all of the variability in an analysis.3 It describes the proportion of the
As an example, in Table 9, Factor A has a sum of squares of 100, total variability in the data that are accounted for by the effect under
and the error term has a sum of squares of 600, so consideration. One can easily calculate ␩2 from the ANOVA output
from a statistical package; it is the ratio of the sum of squares for the
100
␩ 2p ⫽ ⫽ .14. effect divided by the total sum of squares. That is,
100 ⫹ 600
SSeffect
Apart from asking SPSS to calculate ␩2p, it is easy to abstract the ␩2 ⫽ .
necessary sums of squares from ANOVA summary tables reported SStotal
by statistical software. It is also easy to calculate ␩2p from an F ratio
Thus, in Table 9, ␩2 for Factor A is calculated from the sum of
and its degrees of freedom, because
squares for A (100) and the total sum of squares (1,280). So, for
dfeffect ⫻ Feffect Factor A,
␩ 2p ⫽ .
共dfeffect ⫻ Feffect兲 ⫹ dferror
100
Thus, it is possible to calculate ␩2p from published results where the ␩2 ⫽ ⫽ .08.
1280
authors have not reported this effect size.
Care must be taken when comparing ␩2p estimates across studies One can also calculate ␩2 from reported F ratios and degrees of
with different designs to ensure that the error terms are compara- freedom where all of the effects from an ANOVA are reported. In
ble. The size of ␩2p is influenced by changes to the error variability. a two-factor (G ⫻ H) between-groups design, for the effect of
Error variability (SSerror) increases when sources of variability are Factor G
neither controlled nor identified as part of the analysis; it decreases
when these sources of variability are controlled or are identified in dfeffectG ⫻ FG
␩2 ⫽ .
the analysis. The variability associated with an uncontrolled vari- 共dfeffectG⫻FG兲⫹共dfeffectH⫻FH兲⫹共dfeffectG⫻H⫻FG⫻H兲⫹dferror
able appears in the SSerror, thereby reducing the size of ␩2p, whereas
controlling that variable or including it as an individual differences
factor in the analysis removes that variability from the SSerror,
3
Some authors prefer to refer to ␩2 as R2 because it fits with the
thereby increasing the value of ␩2p. Some of these issues can be statistical convention of reserving Greek letters for population parameters
addressed by using ␩G 2 and because of the commonality with R2 for regression. Although it seems
, which we discuss later.
simpler to use just one term for the proportion of variability explained,
One can use ␩p to compare the effect of some factor that appears
2
statistical software and textbooks most often use ␩2 for ANOVA and R2 for
in multiple studies but only when the error terms are comparable; regression. Furthermore, ␩2p, which is more often used than ␩2, is equiva-
␩G2
(see later) is more generally useful for between-study compar- lent to a partial correlation—a concept that is less familiar to people who
isons. For comparing the relative contribution of different factors do not use multiple regression regularly. We choose to use ␩2 with
within a single study, ␩2p is not useful because the baseline vari- ANOVA for these reasons and because there is an argument that applying
ability (i.e., the denominator) is different for each calculation. one term to ANOVA and another to regression is clearer and simpler.
10 FRITZ, MORRIS, AND RICHLER

Table 9
Example Between-Groups Analysis of Variance Summary Table With Calculations of ␩2, ␩2p, ␻2, and ␻2p

Source SS df MS F ␩2 ␩2p ␻2 ␻2p

Factor A 100 1 100 9.3 .08 .14 .07 .12


Factor B 200 1 200 18.7 .16 .25 .15 .22
Factor C 50 1 50 4.7 .04 .08 .03 .06
A⫻B 100 1 100 9.3 .08 .14 .07 .12
A⫻C 20 1 20 1.9 .02 .03 .01 .01
B⫻C 10 1 10 0.9 .01 .02 0 (⫺.001)a 0 (⫺.002)a
A⫻B⫻C 200 1 200 18.7 .16 .25 .15 .22
Error 600 56 10.7
Total 1,280 63

Note. SS ⫽ sum of squares; MS ⫽ mean squares.


a
Negative values of ␻2 can occur when F ⬍ 1. Keppel and Wickens (2004) recommend setting the value to zero.

The denominator term (SStotal) sums the degrees of freedom for the its own contribution to the total variability under consideration. If
error term with the products of each F ratio and its corresponding several factors vary together, they may jointly account for a
degrees of freedom. substantial proportion of the variability, but any individual factor
The ␩2 statistic is a useful measure of the contribution of an might contribute only a relatively small part of the whole. Alter-
effect— of a factor or an interaction—to the observed dependent native calculations, described later, produce variants of ␩2 that
variable. So, for example, examining the values of ␩2 in Table 9 eliminate some of the other variability from consideration.
reveals that Factor B accounts for twice as large a proportion of the Generalized eta squared (␩2G). Scientific research is a cumu-
total variability as Factor A, but also that the three-way interaction of lative activity; it is necessary to compare and combine the results of
Factors A, B, and C contributes as much variability as does Factor B. research across studies. Unfortunately, neither ␩2 nor ␩2p is well suited
For comparing the size of effects within a study, ␩2 is useful, but for making comparisons across studies with different designs. ␩G 2

there are risks in comparing ␩2 values across studies with different provides a way to compare effect size across studies; it was introduced
designs. These risks derive from the differences in total variability that by Olejnik and Algina (2003) and Bakeman (2005) extended their
arise from manipulating additional variables, thereby adding variabil- description of its use for repeated measures designs. Like R2 and ␩2,
ity, or from controlling variables, thereby reducing variability. If the ␩G2
gives an estimate of the proportion of variability within a study
effect of Factor A is the same across two studies (i.e., SSA remains that is associated with a variable but without the distorting effects of
constant), a study that manipulates Factor A alone will have a greater variables introduced in some studies but not others. For Olejnik and
value for ␩2 than one that manipulates Factor A and introduces an Algina, the distinction between manipulated factors and individual
additional manipulated Factor B. This difference is because in the differences factors is key. To illustrate the distinction, a study that
latter case, the total variability is increased by the variability intro- tested children in two different types of experimental rooms would
duced with Factor B. Conversely, controlling variables so that they do have room type as a manipulated variable. However, if the children
not contribute their variability to the overall ANOVA will, obviously, from a class were classified into groups by their ages and by their
reduce the SStotal. If the controlled variables do not interact with an personalities, these would be individual differences factors. The cen-
effect, so that the SSeffect is unchanged, then the ␩2 for that effect will tral idea when calculating ␩G 2
is that the sums of squares for manip-
be larger than if the variables had not been controlled. ulated variables are not included in the denominator of the calculation,
Unmatched total variability is an issue for cross-study comparisons except under two conditions. Those conditions are (a) when calculat-
involving most measures of effect size. Cohen’s d, for example, ing ␩G2
for the manipulated variable itself and (b) when calculating ␩G 2

depends on the standard deviations of the variables and they, in turn, for an interaction between that manipulated variable and either an
depend on the extent to which other factors have been controlled. individual differences factor or a subject factor in a repeated measures
Psychologists calculating ␩2 for their own data for the first time design (i.e., the between-subjects error term).
are often disappointed by the size of the effect that they are We can demonstrate the calculation of ␩G 2
using the Table 9
studying. A manipulation with an ␩2 of .04 accounts for only 4% example. Suppose that, continuing our developmental example, Fac-
of the total variability in the dependent variable—an amount that tor A is the room type in which the children are tested, Factor B is age
may seem trivial, especially when compared to r2 values com- group (younger or older children from the class), with two levels, and
monly seen in correlational research. It may be easier to deal with Factor C is a two-level classification of the children, such as introvert
small ␩2 values in terms of Cohen’s (1988, pp. 283–287) descrip- or extravert. Factor A is a manipulated factor but Factors B and C are
tion of large (.14), medium (.06), and small (.01) effects, but individual differences factors. To calculate ␩G 2
for Factor B (Age, an
obviously it is the practical or theoretical importance of the effect individual differences factor), use the formula for ␩2 but remove from
that determines what size qualifies the outcome as substantively the total sums of squares in the denominator the sums of squares
significant. In most experimental research, observed effect sizes associated with Factor A because it is a manipulated factor— one that
are likely to be small; many factors influence behavior in almost adds variability to the design. Thus, although ␩2 for Factor B is
any area, and few of these will be examined in the analysis. It
would be an exceptional situation to research a behavior that was SSB 200
␩2 ⫽ ⫽ ⫽ .16,
determined by only one or two causal factors. Each factor makes SSTotal 1280
EFFECT SIZE ESTIMATES 11

the adjusted calculation for ␩G


2
is when new variables are added in hierarchical regressions allow the
contributions of independent variables to be assessed.
SSB 200 200
␩ G2 ⫽ ⫽ ⫽ ⫽ .17.
SSTotal ⫺ SSA 1280 ⫺ 100 1180 SSChange
2
RChange ⫽ .
SSTotal
The adjustment, removing the variability that was added by the
manipulated variable, results in a higher value for ␩G 2
; more The square of the semipartial (“part” in SPSS) correlation between
importantly, it is a value that can be compared with the ␩G 2
value an independent variable and the dependent variable when the other
from another study, even if the designs of the two studies differed. independent variables have been controlled gives the proportion of
As a further example, suppose that Factor C was, instead, a the total variability uniquely predicted by the independent vari-
manipulated variable such as presence or absence of an adult in the able—analogous to ␩2. Similarly, the square of the partial corre-
room. In this case, the ␩G2
for Factor B will remove the variability lation between an independent variable and the dependent variable
associated with both manipulated variables (A and C) and their is analogous to ␩2p. Thus, in multiple regression analyses, R2,
interaction (A ⫻ C): 2
RChange , the squared semipartial correlations and the squared par-
tial correlations answer many questions about the size of the
SSB 200 relative contributions of the dependent variables. Note that al-
␩ G2 ⫽ ⫽
SSTotal ⫺ SSA ⫺ SSc ⫺ SSAxC 1280 ⫺ 100 ⫺ 50 ⫺ 20 though R2 and ␩2 are the same in that each describes the proportion
of the total variability that is accounted for, their use is somewhat
200
⫽ ⫽ .18. different. In regression, R2 describes the effect of a set of variables
1110 (one or more), whereas in ANOVA, ␩2 describes the effect of a
single factor or interaction (equivalent to the squared semipartial
Similar calculations can be easily made for repeated measures
correlation in regression).
designs, although the denominator may have to be constructed by
␻2, ␻p2, and ␻2G. The various ␩2 and R2 statistics describe the
accumulating the appropriate sums of squares rather than by sub-
effect size observed in the research. However, it is often valuable
traction from the total sum of squares. These sums of squares can
to think beyond a particular study, to the population from which
be obtained as part of the analysis from the statistical software or,
the sample came, and therefore of the effect size that would be
for published work, by reconstructing the ANOVA summary table
predicted in a replication of the study. In a replication, the vari-
(such as in Table 9) if the reporting of the ANOVA was suffi-
ability accounted for by each factor or set of predictors is likely to
ciently complete—that is, if all effects were reported complete
be somewhat different from the observations from one sample.
with all mean squares for the error terms (MSEs).
Sample variability includes both population variability and sam-
As an example of constructing ␩G 2
for a repeated measures
pling variability and so tends to be somewhat larger than the
factor, imagine that an analysis involves just two repeated mea-
population value alone. Thus, R2 overstates the variation in the
sures factors, P and Q, with both as manipulated factors. When
population, especially for small effects. Various statistics have
creating the denominator for the effect size of Factor P, the sum of
therefore been developed to estimate the effect size in the popu-
squares for Q will be omitted. However, the denominator will
lation rather than the observed sample.
include all sources of variability associated with P or with the
One statistic that is popular with the authors of statistical text-
between-subjects error: the sums of squares for P, for subjects
books (e.g., Hays, 1973; Howell, 2002; Keppel & Wickens, 2004;
(Subj), and for the interactions P ⫻ Subj, Q ⫻ Subj, and P ⫻Q ⫻
Tabachnick & Fidell, 2007) is ␻2. However, our survey of recent
Subj. So, for P,
JEP: General articles and of articles from three 2009 cognitive
SSP journals (Morris & Fritz, 2011) found that despite the recommen-
␩ G2 ⫽ . dations of these and other textbook authors, it is very rare for an
SSP ⫹ SSSubj ⫹ SSP⫻Subj ⫹ SSQ⫻Subj ⫹ SSP⫻Q⫻Subj
article to report ␻2. We observed only one instance in the com-
Details of the appropriate sums of squares to be included in the bined set of 457 articles.
denominator for most common designs can be found in Bakeman As for ␩2, there are three types of ␻2 estimates: ␻2, ␻2p, and ␻G 2
.
(2005). The basic principle of ␻2 is that it is the ratio of the population
R2. In regression, R2 is the square of the correlation between variability explained by the factor being measured to the popula-
the observed values and the values predicted by the regression tion’s total variability. For a one-way ANOVA, the total variability
equation. It is used to report the proportion of the variability of the can be divided into the variability associated with a particular
dependent variable that is predictable from the set of variables factor and the error variability. So, for a one way ANOVA with
entered into the regression and thus provides a good effect size Factor A,
estimate. R2 is calculated from the ratio
␴ A2
␻2 ⫽ ,
SSRegression ␴ A2 ⫹ ␴error
2
R2 ⫽ .
SSTotal
where ␴A 2
represents the population variance for Factor A, and
R2 is similar to ␩2 in that the variability associated with the focus ␴error
2
represents the appropriate population error variance
of the analysis—in this case the prediction—is considered as a for Factor A. The same formula applies when calculating
proportion of the total variability; R2 and ␩2 are identical when the ␻ 2p , where the error term is the term against which the
predictor is a factor, coded as a dummy variable. Changes in R2 relevant factor is evaluated.
12 FRITZ, MORRIS, AND RICHLER

The basic formula for estimating ␻2 in a one-way ANOVA or Effect Sizes for Nonparametric Data
␻ in a factorial design is
2
p
Effect size estimates for Mann–Whitney and Wilcoxon non-
SSeffect ⫺ 共a ⫺ 1兲 ⫻ MSerror parametric tests. Most of the effect size estimates we have
␻ 2 or ␻ 2p for A ⫽ , described here assume that the data have a normal distribution.
SStotal ⫹ MSerror
However, some data do not meet the requirements of parametric
where a is the number of levels of the factor (Hays, 1973, p. 513). tests, for example, data on an ordinal but not interval scale. For
This same value can be calculated directly from the F statistic such data, researchers usually turn to nonparametric statistical
(Keppel & Wickens, 2004, p. 233): tests, such as the Mann–Whitney and the Wilcoxon tests. The
significance of these tests is usually evaluated through the approx-
共a ⫺ 1兲 ⫻ 共FA ⫺ 1兲 imation of the distributions of the test statistics to the z distribution
␻ 2 or ␻ 2p for A ⫽ ,
共a ⫺ 1兲 ⫻ 共FA ⫺ 1兲 ⫹ N when sample sizes are not too small, and statistical packages, such
as SPSS, that run these tests report the appropriate z value in
where a is the number of levels of the Factor A, and N is the total addition to the values for U or T; z can also be calculated by hand
number of subjects. (e.g., Siegel & Castellan, 1988). The z value can be used to
One can calculate ␻2 in a similar way for multifactor between- calculate an effect size, such as the r proposed by Cohen (1988);
subject designs. The numerator remains the same, but the denom- Cohen’s guidelines for r are that a large effect is .5, a medium
inator includes the product of the degrees of freedom and the F effect is .3, and a small effect is .1 (Coolican, 2009, p. 395). It is
ratio reduced by 1 for each of the effects (factors and interactions) easy to calculate r, r2, or ␩2 from these z values because
in the analysis. So, for a multifactor design,
z
r⫽ ,
␻2 ⫽
共a ⫺ 1兲 ⫻ 共Feffect ⫺ 1兲
,
冑N
⌺关dfeffect ⫻ 共Feffect ⫺ 1兲兴 ⫹ N
and
summing across all of the effects (Keppel & Wickens, 2004, p.
481). z2
r2 or ␩2 ⫽ .
We have used the formulas to calculate ␻2 and ␻2p for each of N
the factors in Table 9. For these particular imaginary data, ␻2 and
␩2 are similar and so are ␻2p and ␩2p. This near identity is because These effect size estimates remain independent of sample size
the example has a reasonable sample size, with just two levels for despite the presence of N in the formulas. This is because z is
each factor, and the effect itself is large. The size of the distortion sensitive to sample size; dividing by a function of N removes the
for sample rather than population effect size calculations (i.e., ␩2 effect of sample size from the resultant effect size estimate.
rather than ␻2) depends on the number of participants tested, the Effect sizes for categorical data. Categorical data are often
number of levels of the factors, and the size of the effect. More tested with the chi-square statistic (␹2) but, like ANOVA and t
participants, fewer levels, and larger effects lead to less difference tests, the significance of a ␹2 test depends on the sample size as
between ␻2 and ␩2. With reasonably sized samples, limited num- well as the strength of the association. There are various measures
bers of factor levels, and larger effects, the overestimation of ␩2 of association for contingency tables; we describe three that may
may often be acceptable. This is fortunate, because there are be used for unordered categories. These can be easily calculated
problems in estimating ␻2 for repeated measures designs; for using SPSS by choosing Analyse, Descriptive Statistics, Cross-
these, only a range, not the actual value, can be calculated (Keppel tabs, Statistics and choosing the appropriate statistic.
& Wickens, 2004, p. 427). Instead, ␩2 has to be reported, but the Where the data being analyzed are in a 2 ⫻ 2 contingency table
inflation of the estimate has to be recognized. Advice on calculat- the ␾ correlation coefficient can be used. One can calculate ␾ from
ing ␻G 2
can be found in Olejnik and Algina (2003). ␹2 for the data using the formula
Radj and ␧2. For the R2 calculated by multiple regression,
2

there has long been the Wherry (1931) formula for calculating
adjusted or shrunken R2 (R2adj) with the aim of predicting, like ␻2,
␾⫽ 冑 ␹2
N
,
the R2 to be expected if the study were to be repeated with a
sample from the same population. where N is the total sample size. If, for example, the obtained value
of ␹2 was 10 with a sample size of 40 then
2
Radj ⫽ 1 ⫺ 共1 ⫺ R2 兲 冉 N⫺1
N⫺k⫺1
, 冊 ␾⫽ 冑 10
⫽ 冑0.25 ⫽ .05.
40
where N is the sample size, and k is the number of independent
variables in the analysis. Many statistical software packages cal- Cramér (1946) extended the ␾ statistic to larger contingency tables
culate R2adj. than the 2 ⫻ 2 of the ␾ correlation. This statistic, known as
A similar approach is taken to calculating an effect size known Cramér’s V or ␾c, modifies the formula for ␾ to be
as ε2 (Ezekiel, 1930), which is an alternative to ␻2. However, ε2 is
rarely reported, and we do not discuss it further here. Details of its
calculation can be found in Richardson (1996).
␾c ⫽ 冑 ␹2
N共k ⫺ 1兲
,
EFFECT SIZE ESTIMATES 13

where N is the total sample size, and k is the number of rows or Table 11
columns in the table, whichever is the smaller. Be aware that, Altered Example Contingency Table With L ⫽ 0
unlike Pearson’s r, the square of ␾, or of Cramér’s V, is not a valid
description of the proportion of variability accounted for (Siegel & Seminar attendance Adequate answer Poor answer Total
Castellan, 1988, p. 231). Attended 75 15 90
When the rows and columns of a contingency table represent a Not attended 45 9 54
predictor and a predicted variable, Goodman–Kruskal’s lambda Total 120 24 144
(L) describes how much the prediction is improved by knowing the
category for the predictor, a potentially useful description of the
size of the effect (Ellis, 2010; Siegel & Castellan, 1988). One may
calculate lambda from any size contingency table; two values can www.thenewstatistics.com. Bird (2002) described methods for cal-
be calculated: how well the row variable improves the predictabil- culating effect sizes for ANOVA, and Smithson (2003) provided
ity of the column variable and vice versa. Usually only one instructions and downloadable scripts for SPSS, SAS, SPlus, and
direction is meaningful. To calculate Lrow for predicting column R for calculating CIs for effect sizes associated with t tests,
membership from row membership, sum the highest frequency in ANOVA, regression, and ␹2 analyses at http://dl.dropbox.com/u/
each of the columns, subtract the largest row total, and divide by 1857674/CIstuff/CI.html. This webpage also provides links to
the total number of observations not in that largest row. The other websites that may be helpful. These calculators include
formula is consideration of the noncentral nature of the distribution. Further
details on calculating noncentral effect size CIs were given by
⌺j⫽1
k
nMj ⫺ max共Ri 兲 Steiger (2004). However, it may not always be possible or neces-
Lrow ⫽ ,
N ⫺ max共Ri 兲 sary to adjust for noncentrality: Bird (2002, p. 204) observed that
where k is the number of columns, nMj is the highest frequency in where the effect is not too large (e.g., d ⱕ 2) and there are
the jth column, max(Ri) is the largest row total, and N is the total sufficient degrees of freedom in the error term (more than 30), the
number of observations (Siegel & Castellan, 1988, p. 299). So, for adjustment makes little difference.
example, to determine how much attending a seminar improved CIs for d can be estimated with the procedure from Grissom and
the ability to predict an adequate answer on the relevant exam Kim (2005, pp. 59 – 60); this estimate does not adjust for noncen-
question, the contingency table might appear as in Table 10. One trality but is useful for normally distributed data, reasonable sam-
would calculate lambda as ple sizes (at least 10 per group), and values of d that are not very
large. The calculation is based on Hedges and Olkin’s (1985)
共75 ⫹ 30兲 ⫺ 90 15 formula for calculating the variance (sd2) for the theoretical sam-
Lrow ⫽ ⫽ ⫽ .28,
144 ⫺ 90 54 pling distribution of d:

so knowing row membership improves prediction of answer qual- na ⫹ nb d2


s2d ⫽ ⫹ ,
ity by 28%. Notice that lambda can also be zero as in the altered nanb 2共na ⫹ nb兲
data in Table 11. Here, lambda is calculated as
where na and nb are the sample sizes. The limits of the 95% CI
共75 ⫹ 15兲 ⫺ 90 0 would be
Lrow ⫽ ⫽ ⫽ 0.0.
144 ⫺ 90 54
95% CI⌬ ⫽ d ⫾ z.025 sd.
Where knowledge of the row does not contribute to predicting
column membership, lambda is zero. The lambda statistic seems Most statistics textbooks and websites provide tables of areas
especially useful in describing the size of the effect in terms that under the normal distribution that provide values for z at the
people without statistical training are likely to easily understand. desired cutoff. The cutoff is simply half of the difference between
1.00 and the desired CI. For a 95% CI, the cutoff is half of
CIs for Effect Sizes (1.00 –.95) which is .025; table lookup provides the corresponding
z value, which is 1.96. Grissom and Kim provided the following
As discussed earlier, the calculation of CIs for effect sizes is not example: For na ⫽ nb ⫽ 20 and d ⫽ 0.7, then sd2 ⫽ 0.106 and sd ⫽
as straightforward as it is for means because the distributions are 0.326; the 95% CI would be 0.7 ⫾ (1.96 ⫻ 0.326), giving a lower
not centered on the effect size value. Help is available, though. limit of 0.06 and an upper limit of 1.34. The resultant range of
Cumming (2012) provided guidance and Excel-based software for values for d—from almost zero to a very large effect size—is so
calculating CIs for d, which can be downloaded from http:// broad that it would be difficult to draw any conclusions on the
basis of the research, despite having observed a moderately large
effect. Although effect sizes are independent of sample size, their
Table 10 presumed accuracy is increased by larger sample sizes, so the
Example Contingency Table range of values in the CI becomes narrower with larger samples. If
Seminar attendance Adequate answer Poor answer Total this example involved groups of 100 cases rather than 20, the
bounds of the 95% CI would be .41 to .99. Replicability, as always,
Attended 75 15 90 is an important source of confidence and even the broad ranges are
Not attended 24 30 54 useful in meta-analyses (e.g., Borenstein et al., 2009); they allow
Total 99 45 144
a clear pattern to emerge from multiple studies in forest plots, a
14 FRITZ, MORRIS, AND RICHLER

useful graphical aspect of meta-analysis (for notes about their white background, d ⫽ 1.31. Consulting Table 7 gives a PS of 82%
origin, see Lewis & Clarke, 2001). for this value of d. That is, if pairs of pictures, one with a red and
Cohen et al. (2003, p. 88) described a method for estimating CIs one with a white background, were selected at random, the picture
for R2, provided that the sample size is greater than 60. The with the red background would be reported as more attractive on
standard error of R2 is calculated 82% of comparisons. This use of the PS statistic helps to demon-


strate the size of the effect in a more concrete and meaningful way
4R2 共1 ⫺ R2)2(n ⫺ k ⫺ 1兲2 than the standardized difference. This concept has been elaborated
SER2 ⫽ ,
共n2 ⫺ 1兲共n ⫹ 3兲 and extended by Vargha and Delaney (2000) to include all types of
ordinal and interval data.
where n is the number of cases and k is the number of independent
Table 7 also reports U1, which was devised by Cohen (1988). U1
variables. The bounds of a 67% CI can be estimated as R2 ⫾ SER2;
describes the degree of nonoverlap between the two population
factors of 1.3, 2, or 2.6 can be applied to the standard error to
distributions for various values of the effect sizes. For example,
provide estimates of 80%, 95%, or 99% CIs, respectively. This
when d ⫽ 0, the populations for the two distributions are perfectly
estimate does not adjust for noncentrality, but with larger samples,
superimposed on each other, and the value of U1 is zero; when d ⫽
the expected error is small.
0.5, U1 ⫽ 33%, one third of the areas in the distributions do not
overlap. U1 ⫽ 81% for the difference between the height of men
Translating Between Effect Sizes and women with d ⫽ 2.0 (McGraw & Wong, 1992); that is, 81%
We have described many ways of estimating effect sizes. Per- of the distributions for men and women do not overlap. For Elliot
haps one of the reasons why effect sizes are underreported and et al.’s (2010, Experiment 2) data on the attractiveness of men seen
infrequently discussed is that effect sizes may be reported using with red or white backgrounds, the U1 percentage nonoverlap of
one statistic in one study and a different statistic in another study, the distributions for the value of d ⫽ 1.31 is 65%. As for PS, the
making it difficult to compare the effect sizes. Many of the effect U1 statistic helps the reader to visualize the size of the effect being
size estimates can be converted to other estimates. In Table 8, we reported.
have provided formulas for translation between d, r, and ␩2. The substantive significance, or importance, of an effect de-
pends in part on what is being studied. Rosnow and Rosenthal
(1989), for example, illustrated how a very small effect relating to
Interpreting Effect Sizes
life-threatening situations, such as the reduction of heart attacks, is
The object of reporting effect sizes is to better enable the reader important in the context of saving lives on a worldwide basis (see
to interpret the importance of the findings. All other things being Table 12 and Ellis, 2010). When the data are the correlation of two
equal, the larger an effect size, the bigger the impact the experi- binary variables—such as having or not having a heart attack when
mental variable is having and the more important the discovery of in a treatment or a control condition—Rosnow and Rosenthal
its contribution is. recommended the use of what they called the binomial effect size
In Table 7, we offer not only corresponding values for d, r, r2, display to represent the relationship. The use of the binomial effect
and ␩2 but also two statistics—probability of superiority (PS) and size display is illustrated in their example: Table 12 shows the
the percentage of nonoverlap of the distributions (U1)—that help frequency of heart attacks in a large study of doctors who took
to clarify the relationships between the distributions of the condi- either aspirin or a placebo for the effect size r ⫽ .034. The success
tions being compared. The values of these statistics help the rate for the treatment is .50 ⫹ r/2 and for the control group is .50 ⫺
readers of reports to imagine the relationships between the two r/2. For the example in Table 12, these values are .50 ⫹ .017 and
distributions from which the effect size was calculated. We suggest .50 ⫺ .017. The table cells are then made up to complete 100% for
that one of these statistics be given along with the effect size the columns and rows. The success rate for the treatment is
estimate for the more important results reported in an article. calculated by subtracting the treatment effect (e.g., for aspirin)
PS gives the percentage of occasions when a randomly sampled from the control effect (e.g., the placebo). For our example, that is,
member of the distribution with the higher mean will have a higher 51.7 ⫺ 48.3 ⫽ 3.4% or r ⫽ .034; thus, 34 people in 1,000 would
score than a randomly sampled member of the other distribution. be spared heart attacks if they regularly took the appropriate dose
The values in Table 7 were abstracted from Grissom (1994). PS is of aspirin. It should be noted that although the simplicity of
also known as the common language effect size (McGraw &
Wong, 1992). Consider, as an example, a medium size effect of
d ⫽ 0.5 as defined by Cohen (1988). The PS for a d of 0.5 is 64%. Table 12
That is, if you sampled items randomly, one from each distribu- Binomial Effect Size Display for the Effect of Aspirin on Heart
tion, the one from the condition with the higher mean would be Attack Risk (r ⫽ .034)
bigger than that from the other condition for 64% of the pairs. A
real-world example is given by McGraw and Wong (1992): The d Treatment Heart attack No heart attack Total
for the difference in height between men and women is 2.0 for Aspirin 48.3 51.7 100
which the PS is 92%. That implies that if you compared randomly Placebo 51.7 48.3 100
chosen men and women, the man would be taller than the woman Total 100 100 200
for 92% of the comparisons. Finally, selecting an example from
Note. Values are percentages. Adapted from “Statistical Procedures and
the JEP: General articles that we reviewed earlier, Elliot et al. the Justification of Knowledge in Psychological Science,” by R. L. Rosnow
(2010, Experiment 2) found that women rated men seen in pictures & R. Rosenthal, 1989, American Psychologist, 44, p. 1279. Copyright 1989
with a red background as more attractive than men seen against a by the American Psychological Association.
EFFECT SIZE ESTIMATES 15

calculation and clarity of presentation of the binomial effect size Nevertheless, we suggest that in some cases, in addition to report-
display is attractive, Hsu (2004) has shown that it can overestimate ing PS and/or U1 to clarify the interpretation of an effect size, it is
success rate differences unless various conditions are met. often worthwhile to report more than one measure of effect size to
better interpret the results. It would, for example, be appropriate to
Considerations When Reporting and report ␩2p to indicate the proportion of variability associated with a
Using Effect Sizes factor when all others are controlled, but also to report ␩G 2
to give
an idea of the contribution that the factor makes to the overall
Effect sizes estimates are important and useful descriptive statistics. performance when other nonmanipulated variables are allowed to
Like all good descriptive statistics, they reflect the properties of the vary. Both of these values would be useful in evaluating the effect.
data and the conditions under which the data were collected. Just as To provide another example, Cohen’s d is useful for conceptual-
means are valuable estimates of central tendency that can, neverthe- izing and comparing the size of a difference independently of the
less, be misleading if the distribution is skewed—for example, when specific measure used; it enables comparisons between studies
studying income or life expectancy—so effect sizes must be con- concerned with the same factor but using different dependent
sidered within the context of the design and procedure, also con- measures. However, interpretation of the results and of compari-
sidering the properties of the distributions. If the measures used are sons could be enriched by also considering r or r2 as measures of
unreliable or if their range has been restricted, then the value of the the relative impact that the factor has on the outcome as is
effect size estimate will be different from, and probably smaller sometimes done in regression analyses where both the value of the
than, one that comes from very reliable measures or data that cover standardized regression coefficient and the proportion of variance
the full range. The allocation of observed variability to identified accounted for are discussed. The APA Publication Manual (APA,
effects or to error will also influence estimates of effect size. 2010, p. 34) specifically suggests that it will often be useful to
Imagine studies of a factor that has a similar effect on people from report and discuss effect size estimates in terms of the original
various economic classes. One study samples only middle-class units, as well as the standardized approaches. The effect size
people; the error variability in this case would be smaller than the expressed in original units is often clear and easy to discuss in the
error variability in another, similar study that samples more context of a single study, whereas the standardized units approach
widely. Because the error variability is smaller in the first case, the facilitates comparisons between studies and meta-analyses. It is
size of the effect is likely to appear larger. Yet, another study also useful, when disentangling the effect of a factor with more
might account for variability associated with socioeconomic group than two levels, to provide an effect size estimate for the full effect
by including income as a factor or covariate in the analysis, of the factor and for each of the pairwise comparisons or other
thereby reducing the error variability and increasing the apparent linear contrasts (see Keppel & Wickens, 2004). Similarly, analysis
effect size. In general, if variables are controlled in a study and of simple main effects associated with an interaction should in-
therefore do not contribute to error variability, the estimated effect clude effect size estimates both for the interaction and for the
size is likely to be larger than effect sizes for studies in which simple main effects.
variables have not been controlled or have been counterbalanced Good practice with respect to effect size reporting appears to be
across the conditions (without including the counterbalancing fac- on the increase but does not seem to have been fully adopted by
tor in the analysis). It is possible to correct some effect size most authors. Roughly half of the ANOVA reports included a
estimates for some of these distorting factors by using statistics measure of effect size, although few included effect size estimates
such as ␩G 2
and ␻G 2
(Baguley, 2009; Grissom & Kim, 2005; Olejnik for further analyses related to the ANOVA. In a few articles,
& Algina, 2003), but in all cases, interpretation and comparison of authors were thorough in reporting ␩2p for the main effect in an
effect sizes requires careful consideration of the sources of vari- ANOVA and reporting Cohen’s d or ␩2 for simple effects or
ability. planned or post hoc comparisons. As with all analyses, it is
The key point is that all estimates of effect size should be important to think carefully about which type of effect size is most
evaluated in the context of the research. It is not sensible to say of useful for each comparison (e.g., ␩2 or ␩2p). Keppel and Wickens
some phenomenon that its effect size is X without qualifying under (2004) and Rosenthal et al. (2000) provided helpful advice on
what conditions it has been found to be X. Nevertheless, estimates using contrasts and comparisons in ANOVA designs.
of effect size provide both an invitation to further, meaningful We began this research with an interest in the use of effect sizes
interpretation and a useful metric for considering multiple, varied as a way of quantitatively describing effects—as a supplement to
studies together. Complete effect size information, including the the descriptions of the data and the results of statistical tests for
CIs of the effect size estimates, is helpful to subsequent meta- those effects. We found that authors have begun to include reports
analyses, and these meta-analyses make an excellent contribution of effect size estimates with substantial encouragement from the
to furthering the understanding of psychological phenomena. Just APA and related professional organizations as well as from journal
as psychology researchers have become sophisticated in dealing editors. Nevertheless, although slightly more than half of the
with the complexities of inferential statistics, the regular consid- articles report some effect size estimate, the majority of individual
eration of effect sizes can lead to these statistics being demystified effects that are tested and reported are still not accompanied by
and becoming valuable tools. descriptions of effect size. We also found that descriptions of data
In our surveys of the reporting of effect sizes, we have not were often lacking. For a reader to engage with, think though, and
encountered any occasion when more than one effect size was fully consider the implications of the results of a study, descrip-
reported for any particular effect. This selectivity may result from tions of data and of the size of observed effects— both significant
efforts toward conciseness in reporting, or it may reflect a strategy and nonsignificant—are needed. It is not enough to simply identify
of doing the minimum required to placate reviewers and editors. that some effects were significant and others were not. There have
16 FRITZ, MORRIS, AND RICHLER

been calls from some quarters to shift the emphasis away from tation provided in the report, and where effects or Ns are
inferential testing and toward a more descriptive and thoughtful small, indicate the possible inflation of ␩2 by also report-
approach to interpreting results (e.g., Cohen, 1994; Loftus, 1996). ing ␻2, ␻G2
, and/or ␻2p.
Although we are sympathetic to many of those concerns, we value
an approach that includes complete reporting of statistical tests 3. For complex analyses, such as factorial ANOVA or mul-
combined with descriptions of both the data and the effects. The tiple regression, report all effects. Report the results for
value of a piece of research goes beyond its significant effects. The each effect, including F, df, and MSE, so that the reader
richness of the story and the argument presented by the research is can calculate effect sizes other than those reported.
essential to the development of greater understanding (e.g., Abel-
4. Take steps to aid the reader to understand and interpret
son, 1995), but the patterns in the data and in the effects must be
the size of the more important effects. Use statistics such
reported in order for the reader to engage with the author in
as the PS and Cohen’s U1 or Goodman–Kruskal’s lambda
comprehending and evaluating the results of the research. At the
to help the reader conceptualize the size of the effect.
moment, for most authors, considering effect sizes seems to be the
last stage of their examination of their data. We believe that it 5. Always discuss the practical, clinical, or theoretical im-
should become one of the first stages. A clear grasp of the size of plications of the more important of the effect sizes ob-
the effects observed is at least as important as significance testing tained.
or the calculation of CIs.
When reporting requirements change, it is usually necessary for References
people to learn, perhaps to teach themselves, about the new sys-
tems. It is not always easy to do so. Because we teach statistics as Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ:
well as conduct research, we have been driven to explore the types Erlbaum.
American Psychological Association. (2001). Publication manual of the
of effect sizes and the usefulness of each. Our review of the
American Psychological Association (5th ed.). Washington, DC: Author.
reporting of effect sizes suggests that many authors have sought American Psychological Association. (2010). Publication manual of the
the minimum engagement with effect sizes that is possible while American Psychological Association (6th ed.). Washington, DC: Author.
still being published. This approach is suggested by the frequent Aron, A., Aron, E. N., & Coups, E. (2009). Statistics for psychology (5th
choice of effect size measures that are easily available (e.g., ␩2p) ed.). Upper Saddle River, NJ: Pearson.
but less than optimally useful and usually not those recommended Baguley, T. (2009). Standardized or simple effect size: What should be
by the authors of statistical textbooks (e.g., ␻2). Statistical texts reported? British Journal of Psychology, 100, 603– 617. doi:10.1348/
likely to be accessed by researchers are often selective in their 000712608X377117
advice about effect sizes. There are excellent discussions of the Bakeman, R. (2005). Recommended effect size statistics for repeated
measures designs. Behavior Research Methods, 37, 379 –384. doi:
complexities of effect size available in specialist journals, but they
10.3758/BF03192707
tend to be presented in the often dense language of statistical
Becker, L. A. (2000). Effect size calculators. Retrieved from http://
formulas that are understandably avoided by all but the most www.uccs.edu/⬃faculty/lbecker/
competent or desperate researchers. We hope that this article Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of
provides a shortcut in the process of accumulating the necessary variance. Educational and Psychological Measurement, 62, 197–226.
expertise to report and use effect sizes more effectively and helps doi:10.1177/0013164402062002001
people to appreciate the value of incorporating good descriptions Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009).
of data and effect sizes in their reports. Introduction to meta-analysis. Chichester, UK: Wiley. doi:10.1002/
We end with a minimum set of recommendations that are 9780470743386
designed for the novice effect size user (we include ourselves in Campbell, J. P. (1982). Editorial: Some remarks from the outgoing editor.
Journal of Applied Psychology, 67, 691–700. doi:10.1037/h0077946
this category) and are not intended to constrain the fuller use of
Cohen, J. (1962). The statistical power of abnormal–social psychological
alternative techniques. We suggest the following: research: A review. Journal of Abnormal and Social Psychology, 65,
145–153. doi:10.1037/h0045186
1. Always describe the data: (a) report means or other Cohen, J. (1988). Statistical power analysis for the behavioral sciences
appropriate measures of central tendency to accompany (2nd ed.). Hillsdale, NJ: Erlbaum.
every reported analysis, and (b) report at least one asso- Cohen, J. (1994). The earth is round (p ⬍ .05). American Psychologist, 49,
ciated measure of variability for each mean and the MSE 997–1003. doi:10.1037/0003-066X.49.12.997
for ANOVA analyses. Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple
regression/correlation analysis for the behavioral sciences (3rd ed.).
2. Also describe the effects: (a) report an effect size esti- Mahwah: NJ: Erlbaum.
mate for each reported analysis, (b) for the most impor- Coolican, H. (2009). Research methods and statistics in psychology. Lon-
tant effects, report complete effect size information, in- don, United Kingdom: Hodder.
cluding the CIs of the effect size estimates for possible Cramér, H. (1946). Mathematical methods of statistics. Princeton, NJ:
Princeton University Press.
use in subsequent meta-analyses, (c) for the difference
Cumming, G. (2012). Understanding the new statistics: Effect sizes, con-
between two sets of data, as a default, use Cohen’s d (or fidence intervals, and meta-analysis. New York, NY: Routledge.
Hedges’s g) as the effect size estimate and, for small Cumming, G., & Finch, S. (2001). A primer on the understanding, use and
sample sizes, also report dunbiased, and (d) for factorial calculation of confidence intervals based on central and noncentral
analyses, with due thought and consideration, select and distributions. Educational and Psychological Measurement, 61, 532–
report ␩2, ␩G2
, and/or ␩2p as appropriate for the interpre- 574.
EFFECT SIZE ESTIMATES 17

Elliot, A. J., Niesta Kayser, D., Greitemeyer, T., Lichtenfeld, S., Gramzow, change the way we analyze data. Current Directions in Psychological
R. H., Maier, M., & Liu, H. (2010). Red, rank, and romance in women Science, 5, 161–171. doi:10.1111/1467-8721.ep11512376
viewing men. Journal of Experimental Psychology: General, 139, 399 – Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for
417. doi:10.1037/a0019689 statistical power and accuracy in parameter estimation. Annual Review of
Ellis, P. D. (2009). Effect size calculators. Retrieved from http:// Psychology, 59, 537–563. doi:10.1146/annurev.psych.59.103006.093735
myweb.polyu.edu.hk/⬃mspaul/calculator/calculator.html McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: The
Ellis, P. D. (2010). The essential guide to effect sizes: Statistical power, case of r and d. Psychological Methods, 11, 386 – 401. doi:10.1037/
meta-analysis, and the interpretation of research results. Cambridge, 1082-989X.11.4.386
United Kingdom: Cambridge University Press. McGraw, K. O., & Wong, S. P. (1992). A common language effect size
Ezekiel, M. (1930). Methods of correlational analysis. New York, NY: statistic. Psychological Bulletin, 111, 361–365. doi:10.1037/0033-
Wiley. 2909.111.2.361
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Morris, P. E., & Fritz, C. O. (2011). The reporting of effect sizes in
Educational Researcher, 5, 3– 8. cognitive publications. Manuscript submitted for publication.
Grissom, R. J. (1994). Probability of the superior outcome of one treatment Murphy, K. R. (1997). Editorial. Journal of Applied Psychology, 82, 3–5.
over another. Journal of Applied Psychology, 79, 314 –316. doi:10.1037/ doi:10.1037/h0092448
0021-9010.79.2.314 Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared
Grissom, R. J., & Kim, J. J. (2005). Effect sizes for research: A broad statistics: Measures of effect size for some common research designs.
practical approach. New York, NY: Psychology Press. Psychological Methods, 8, 434 – 447. doi:10.1037/1082-989X.8.4.434
Grissom, R. J., & Kim, J. J. (2011). Effect sizes for research: A broad Richardson, J. T. E. (1996). Measures of effect size. Behavior Research
practical approach (2nd ed.). New York, NY: Psychology Press. Methods, Instruments & Computers, 28, 12–22. doi:10.3758/
Hays, W. L. (1973). Statistics for the social sciences (2nd ed.). New York, BF03203631
NY: Holt, Reinhart, & Winston. Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect
Hedges, L. V. (1982). Estimation of effect size from a series of indepen- sizes in behavioral research: A correlational approach. Cambridge, UK:
dent experiments. Psychological Bulletin, 92, 490 – 499. doi:10.1037/ Cambridge University Press.
0033-2909.92.2.490 Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. justification of knowledge in psychological science. American Psychol-
San Diego, CA: Academic Press. ogist, 44, 1276 –1284. doi:10.1037/0003-066X.44.10.1276
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the
fallacy of power calculations for data analysis. American Statistician, 55, behavioral sciences (2nd ed.). New York, NY: McGraw-Hill.
19 –24. doi:10.1198/000313001300339897 Smithson, M. (2003). Confidence intervals. Thousand Oaks, CA: Sage.
Howell, D. C. (2002). Statistical methods for psychology (5th ed.). Pacific Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals
Grove, CA: Duxbury. and tests of close fit in the analysis of variance and contrast analysis.
Hsu, L. M. (2004). Biases of success rate differences shown in binomial Psychological Methods, 9, 164 –182. doi:10.1037/1082-989X.9.2.164
effect size displays. Psychological Methods, 9, 183–197. doi:10.1037/ Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th
1082-989X.9.2.183 ed.). Boston, MA: Pearson.
Huberty, C. J. (2002). A history of effect size indices. Educational and Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret
Psychological Measurement, 62, 227–240. doi:10.1177/ various effect sizes. Journal of Counseling Psychology, 51, 473– 481.
0013164402062002002 doi:10.1037/0022-0167.51.4.473
Kelley, K., & Rausch, J. R. (2006). Sample size planning for the standard- Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the
ized mean difference: Accuracy in planning estimation via narrow CL common language effect size of McGraw and Wong. Journal of
confidence intervals. Psychological Methods, 11, 363–385. doi:10.1037/ Educational and Behavioral Statistics, 25, 101–132.
1082-989X.11.4.363 Wherry, R. J. (1931). A new formula for predicting the shrinkage of the
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s coefficient of multiple correlation. Annals of Mathematical Statistics, 2,
handbook (4th ed.). Upper Saddle River, NJ: Pearson. 440 – 457. doi:10.1214/aoms/1177732951
Levant, R. F. (1992). Editorial. Journal of Family Psychology, 6, 3–9. Wilkinson, L., & the APA Task Force on Statistical Inference. (1999).
doi:10.1037/0893-3200.6.1.5 Statistical methods in psychology journals: Guidelines and explanations.
Lewis, S., & Clarke, M. (2001). Forest plots: Trying to see the wood and American Psychologist, 54, 594 – 604. doi:10.1037/0003-066X.54.8.594
the trees. British Medical Journal, 322, 1479 –1480. doi:10.1136/ Yates, F. (1951). The influence of statistical methods for research workers
bmj.322.7300.1479 on the development of the science of statistics. Journal of the American
Loftus, G. R. (1996). Psychology will be a much better science when we Statistical Association, 46, 19 –34. doi:10.2307/2280090

(Appendix follows)
18 FRITZ, MORRIS, AND RICHLER

Appendix

A Brief Introduction to Power Analysis

When planning research, it is sensible to ensure that the research Table A2


has the power to detect the effect(s) under consideration (e.g., Ellis, Power Present Based on the Number of Groups, Number of
2010; Keppel & Wickens, 2004). The anticipated size of the effect Participants per Group, and Meaningful or Expected Effect Size
may be estimated from previous research, or an effect size may be
chosen that is the smallest effect that would be meaningful in some Effect size
practical sense. It is also good practice to define the degree of power Participants Small Medium Large
required, that is, to set an acceptable probability of Type II errors. per group (d ⫽ .2; ␩2 ⫽ .01) (d ⫽ .5; ␩2 ⫽ .06) (d ⫽ .8; ␩2 ⫽ .14)
Although the limit for Type I errors is usually set to .05, the limit for
Two groups
Type II errors is often set to be somewhat higher; if the limit for Type 10 .07 .18 .40
II errors is set to .20, then power would need to be .80, a commonly 15 .08 .26 .57
recommended value (Ellis, 2010). Having identified an anticipated 25 .10 .42 .80
40 .14 .61 .95
effect size and the power requirements, it is easy to determine the 80 .24 .89 1.00
number of participants required. Table A1 is adapted from Cohen Three groups
10 .07 .20 .45
(1988); the intersection of the selected effect size and power level 15 .08 .29 .64
shows the number of participants required in each group for both 25 .10 .47 .87
40 .15 .68 .98
two-tailed and one-tailed t tests at a significance threshold of ␣ ⫽ .05. 80 .27 .94 1.00
Three levels of effect size are included in this brief summary: small Four groups
10 .07 .21 .51
(d ⫽ .2), medium (d ⫽ .5), and large (d ⫽ .8); Cohen’s tables provide 15 .08 .32 .71
data for a fuller range of effect sizes. If the planned research will 25 .11 .53 .93
40 .16 .76 .99
require too many participants to be practicable, then it may be worth- 80 .29 .97 1.00
while to consider ways of reducing error variability, thereby increas- Five groups
10 .07 .23 .56
ing the anticipated effect size. 15 .09 .36 .78
When an experiment has led to a nonsignificant result, it is inap- 25 .12 .58 .96
40 .17 .81 1.00
propriate to post hoc calculate the power of the study (Ellis, 2010; 80 .32 .99 1.00
Hoenig & Heisey, 2001). However, the experiment can provide an
Note. These figures apply to analysis of variance and to two-tailed t tests.
estimate of the population effect size, although this may not be very Where power is .3, there is a 70% chance of failing to detect an effect.
accurate if the sample size was small. Using the estimate of the Adapted from Statistical Power Analysis for the Behavioral Sciences (2nd
population effect size, future research can be planned with respect to ed., pp. 311–318), by J. Cohen, 1988, Hillsdale, NJ: Erlbaum. Copyright
the sample size required to achieve a reasonable level of power. 1988 by Taylor & Francis. Cohen’s tables report another effect size
estimate, f, which is rarely reported and is not addressed in this article. The

relationship between f and ␩2 is f ⫽ 冑 ␩2


1⫺␩ 2 and ␩ ⫽
2
f2
1 ⫺ f2
.

Table A1
Number of Participants per Group Required for t Tests to Table A2 is also adapted from Cohen (1988); it lists power
Achieve Selected Levels of Power, Based on the Anticipated Size levels for small, medium, and large effect sizes given some num-
of the Effect ber of groups and participants per group. These values apply to
ANOVAs and two-tailed t tests.
Effect size for one-tailed test Effect size for two-tailed test Power may not be the sole consideration when estimating the
Small Medium Large Small Medium Large
number of participants required. Sample means and variability
Power (d ⫽ .2) (d ⫽ .5) (d ⫽ .8) (d ⫽ .2) (d ⫽ .5) (d ⫽ .8) provide estimates of the population parameters; the accuracy or
precision of those estimates is a function of the sample size. It may
.25 48 8 4 84 14 6 be as useful or more useful in some cases to estimate the sample
.50 136 22 9 193 32 13
.60 181 30 12 246 40 16 size required for a desired degree of accuracy in parameter esti-
.67 216 35 14 287 47 19 mation based on defining the maximum acceptable confidence
.70 236 38 15 310 50 20 interval width. Maxwell, Kelley, and Rausch (2008) provided an
.75 270 44 18 348 57 23
.80 310 50 20 393 64 26 excellent discussion of power and accuracy in parameter estima-
.85 360 58 23 450 73 29 tion; practical guidance is available there and in other articles (e.g.,
.90 429 69 27 526 85 34
.95 542 87 35 651 105 42 Kelley & Rausch, 2006) and texts (Cumming, 2012).
Note. Where power is .8, there is a 20% chance of failing to detect an
effect. Adapted from Statistical Power Analysis for the Behavioral Sci- Received April 1, 2011
ences (2nd ed., pp. 54 –55), by J. Cohen, 1988, Hillsdale, NJ: Erlbaum. Revision received May 15, 2011
Copyright 1988 by Taylor & Francis. Accepted May 15, 2011 䡲
Correction to Fritz et al. (2011)

The article “Effect Size Estimates: Current Use, Calculations, and Interpretation,” by Catherine O.
Fritz, Peter E. Morris, and Jennifer J. Richler (Journal of Experimental Psychology: General,
Advance online publication. August 8, 2011. doi:10.1037/a0024338) contained a production-related
error. The sixth equation under “Effect Sizes Specific to Comparing Two Conditions” should have
had a plus sign rather than a minus sign in the denominator. All versions of this article have been
corrected.

DOI: 10.1037/a0026092

View publication stats


Psychological Assessment Copyright 1993 by the American Psychological Association, Inc.
1993, Vol. 5, No. 4, 395-399 1040-3590/93/$3.00

Reporting Errors in Studies of the Diagnostic Performance of Self-


Administered Questionnaires: Extent of the Problem,
Recommendations for Standardized Presentation of Results, and
Implications for the Peer Review Process
Julie B. Kessel and Mark Zimmerman

Diagnostic efficiency statistics include sensitivity, specificity, and positive and negative predictive
power. In reviewing the literature on the performance of self-report questionnaires to screen for
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

depression, we found errors in several published articles in which these statistics were computed. To
This document is copyrighted by the American Psychological Association or one of its allied publishers.

determine the extent of this problem we examined all studies of the diagnostic performance of self-
report scales published between 1980 and 1991 in the Journal of 'Consulting and Clinical Psychology
and Psychological Assessment: A Journal of Consulting and Clinical Psychology. We found 26 rele-
vant studies: 9 had an error in the calculation of diagnostic efficiency statistics and 3 made calcula-
tions based on unconventional definitions of the terms. Moreover, no study reported all 4 diagnostic
statistics together with the total and chance-corrected level of agreement between the scale and
the diagnostic gold standard. Recommendations for standardized reporting are suggested, and the
implications of these findings are discussed.

As part of a review of the literature on the use of self-report easily computed with certain raw data. Studies of test perfor-
questionnaires to screen for and diagnose depression, we were mance are typically presented as in Figure 1.
surprised to discover several mistakes in the reporting of diag- Sensitivity refers to a test's ability to identify correctly indi-
nostic efficiency statistics in three studies published in the Jour- viduals with the illness, whereas specificity refers to the test's
nal of Consulting and Clinical Psychology and Psychological ability to identify non-ill persons. Sensitivity, also called the true
Assessment: A Journal of Consulting and Clinical Psychology positive rate, is the percentage of ill persons who are identified
(Gallagher, Breckenridge, Steinmetz, & Thompson, 1983; M. by the test as ill [a/(a + c)]. Specificity, the true negative rate, is
Hesselbrock, V. Hesselbrock, Tennen, Meyer, & Workman, the percentage of non-ill persons correctly identified by the test
1983; Nelson & Cicchetti, 1991). To determine the extent of as non-ill [d/(b + d)].
this problem, we examined all studies of the diagnostic perfor- Sensitivity and specificity provide useful psychometric infor-
mance of self-report scales published in these two journals be- mation about a test; however, the clinically more meaningful
tween 1980 and 1991 for the accuracy and completeness of data conditional properties are positive and negative predictive val-
presentation. Before describing our findings, we present a brief ues. These values indicate the probability that an individual is
overview of how these terms are defined and calculated (for a ill or non-ill given that the test identifies him or her as ill or non-
more detailed discussion, see Baldessarini, Finkelstein, & Ar- ill. Accordingly, positive predictive value is the percentage of
ana, 1983; R. Fletcher, S. Fletcher, & Wagner, 1988; Griner, individuals classified by the test as ill who truly are ill [a/(a +
Mayewski, Mushlin, & Greenland, 1981; Mausner & Kramer, b)], whereas negative predictive value is the percentage of indi-
1985;Sackett, 1992). viduals classified as non-ill by the test who truly are non-ill [d/
Putative diagnostic tests of psychiatric syndromes are typi- (c + d)}.
cally judged by their association to a "gold standard," tradition- Kappa represents the level of agreement between the test in
ally the clinical, or structured clinical, interview. Test perfor- question and a gold standard beyond that accounted for by
chance alone. There are a variety of statistics used to correct for
mance is quantified in terms of its sensitivity, specificity, posi-
chance agreement, kappa being most widely used. Finally, the
tive and negative predictive power, and the absolute and chance-
overall correct classification rate, also known as the "hit rate"
corrected level of agreement with the standard. These may be
or "overall level of agreement," refers to the proportion of ill
and non-ill patients correctly classified by the test (a + d)/N.
There is marked variability across studies with regard to
Julie B. Kessel and Mark Zimmerman, Medical College of Pennsyl-
which statistics are reported. Each of the six statistics described
vania at Eastern Pennsylvania Psychiatric Institute. before yields a different perspective of a test's performance. We
We thank James Herbert for his comments on an earlier draft of this believe that these statistics, together, provide a broad profile of
article. the performance of diagnostic tests and should be included in
Correspondence concerning this article should be addressed to Mark routine reporting. However, more or less emphasis may be
Zimmerman, Medical College of Pennsylvania at Eastern Pennsylva- placed on one statistic over another, in accord with the nature
nia Psychiatric Institute, 3200 Henry Avenue, Philadelphia, Pennsyl- of the test, the population tested, and the hypothesis of investi-
vania 19129. gation.
395
396 JULIE B. KESSEL AND MARK ZIMMERMAN

DIAGNOSIS
are presented in Table 1. Most investigators presented either 3
Present Absent (n = 10) or 4 (n = 6) values. The overall level of agreement
was calculated in 16 studies; 18 reported both sensitivity and
specificity; 6 calculated the positive and negative predictive
power; and kappa was calculated in 6. The specific problems in
the 9 studies with errors and the 3 studies with unconventional
Negative definitions of terms are briefly summarized in the following.
Gallagher et al. (1983) examined the ability of the Beck De-
pression Inventory (BDI) to identify major depression in 102
elderly outpatients. Using a cutoff score of 11, they reported
that the false negative rate was 8.8%. From the raw data pre-
IMPORTANT DEFINITIONS sented in Table 1 of their article, we calculated the false negative
rate to be 1.8% (1/57).
Sensitivity = a/la-f-c) Goldston, O'Hara, and Schartz (1990) reported that the spec-
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Specificity = d/lb+d)
ificity of the Inventory to Diagnose Depression (IDD) was
This document is copyrighted by the American Psychological Association or one of its allied publishers.

Positive Predictive Power = a/(a + b>


Negative Predictive Power - d/{c+dl
False Positive Rate = b/lb + d) 87.5% in a sample of 30 college students. D. Goldston (personal
False Negative Rate = c/{a+c)
« (a + dl/N
communication, September 9, 1990) provided us with raw data
Overall Correct Classification
Kappa - IPo- Pcl/|l - Pel not given in the article to enable us to calculate prevalence rates
Po (observed agreement) - la+d)/N and predictive values. The 87.5% figure given for specificity was,
PC (chance agreement) = {{a + b)la + O + Ic+dHb + dD/N'
in fact, the positive predictive value. The IDD's specificity was
Figure I. Presentation of test data and definitions of critical terms. 95% (19/20).
M. Hesselbrock et al. (1983) administered the Minnesota
Multiphasic Personality Inventory (MMPI) Depression (D)
subscale and the BDI to inpatients being treated for alcohol ad-
Method diction. They presented raw data and the scales' sensitivity and
specificity in Table 2 of their article. Assuming that the raw data
We reviewed all articles published in Psychological Assessment: A are correct, we identified two errors in their calculations. The
Journal of Consulting and Clinical Psychology and the Journal of Con- sensitivity of the BDI was 77% (37/48), not 71%, and the speci-
sulting and Clinical Psychology between 1980 and 1991. We included
only those studies that examined the diagnostic performance of self-
ficity of the MMPI (D) was 43% (71/166), not 57%.
report scales. Studies that compared mean or median scores between Lewis, Turtletaub, Pohl, and Rainey (1990) administered the
groups were excluded. We used data presented in the article to generate MMPI to 60 outpatients with panic disorder and 60 outpatient
the 2 X 2 table illustrated in Figure 1 and computed six diagnostic sta- psychiatric controls. Patient groups were split in half, forming
tistics: sensitivity, specificity, positive and negative predictive power, and a construct and cross-validation sample. The sensitivity, speci-
the overall and chance-corrected level of agreement between the scale ficity, and prevalence rate of illness for the combined groups
and the diagnostic standard. The sample size of all four cells was given and the cross-validation sample are given. We derived the cell
in some of the studies. However, for most studies we had to work back- sizes for the construct sample from the difference in cell values
ward and derive missing cell sizes from the available raw data and the
of the combined and cross-validation groups. The overall cor-
diagnostic efficiency statistics already computed by the authors. For ex-
ample, Hakstian and McLean (1989) reported diagnostic efficiency sta-
rect classification rate for the construct sample was reported to
tistics for 196 depressed and 161 normal control subjects who were be 97%; it was actually 95% (57/60).
given the Brief Screen for Depression and a clinical semistructured in- Nelson and Cicchetti (1991) evaluated the diagnostic perfor-
terview for DSM-III-R (Diagnostic and Statistical Manual of Mental mance of the MMPI (D) in 87 psychiatric outpatients. They
Disorders, 3rd ed., rev; American Psychiatric Association, 1987) diag- presented their results in three forms: a 2 X 2 table, a chart of
noses. Hakstian and McLean presented data in the text and calculated individual scores, and a written account in the text. In the 2 X 2
values for the following statistics at the cutoff score of 21: overall rate of table, the cell sizes are given as percentages, and the row and
agreement, 95.8%; sensitivity, 99.0%; specificity, 91.9%; false positives, column total prevalence rates are given as absolute numbers.
3.6%; and false negatives, 0.6%. The calculation of individual cell sizes
would proceed as follows: cell a = (196 X .990) = 194; cell e = (196 -
We determined the actual cell sizes with this information. The
194) = 2; cell d = (161 X .919)= 148; and cell b = (161 - 148)= 13. data summarized in the text were in agreement with those of
With cell sizes complete, the remaining statistics are easily computed, the individually listed scores. The raw data presented in the 2 X
and the accuracy of the authors' calculations confirmed. 2 table were discrepant. According to the 2 X 2 table, 71 patients
were depressed by DSM-HI (American Psychiatric Associa-
tion, 1980) criteria and 59 by MMPI criteria; according to the
Results text, 59 patients were depressed by DSM-HI criteria and 71 by
We found 26 appropriate studies: 9 were flawed by calcula- the MMPI. They calculated the diagnostic efficiency statistics
tion errors and 3 used unconventional definitions of terms. Two based on the data presented in the 2 X 2 table. We also calcu-
reports did not include enough information to generate a 2 X 2 lated these statistics based on the data presented in the text and
table (Goldberg, Shaw, & Segal, 1987;Lewinsohn&Teri, 1982). the individual scores listed in Table 3 of their report. Each of
None of the 25 remaining studies presented all six statistics, and our values disagreed with those of the authors.
only 2 of the studies reported five of the six statistics. The results Post and Lobitz (1980) administered the MMPI (D) and Mez-
ERRORS IN REPORTING 397

Table 1
Data Presentation and Errors in Reporting of Diagnostic Efficiency Statistics
Statistics reported
Overall
Author Sensitivity Specificity PPP NPP Kappa correct
No errors
Clopton, Weiner, & Davis (1980)
Hakstian & McLean (1989) +
Hodges (1990) +
Keane, Caddell, & K. Taylor
(1988) +
Klein, Dickstein, E. Taylor, &
Harding (1989)
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Kobak, Reynolds, Rosenfeld, &


This document is copyrighted by the American Psychological Association or one of its allied publishers.

Greist(1990) +
McFall, D. Smith, Mackay, &
Tarver(1990) +
Oliver & Simmons (1984) +
Rapp, Walsh, Parisi, & Wallace
(1988) +
M. Smith & Thelen (1984) +
Thelen& Farmer (1991) +
Wolfson & Erbaugh (1984)
Zimmerman & Coryell (1987) +
Insufficient data
Goldberg, Shaw, & Segal (1987)
Lewinsohn & Teri (1982)
Unconventional definitions
Bryer, Marlines, & Dignan
(1990)
M. Hesselbrock, V. Hesselbrock,
Tennen, Meyer, & Workman
(1983)
Parmalee, Powell, & Katz (1989)
Miscalculations
Gallagher, Breckenridge,
Steinmetz, & Thompson
(1983)
Goldston, O'Hara, & Schartz
(1990)
M. Hesselbrock etal.( 1983)
Lewis, Turtletaub, Pohl, &
Rainey(1990)
Nelson & Cicchetti (1991) + +
Post &Lobitz( 1980)
Stukenberg, Dura, & Kiecolt-
Glaser(1990)
Trull (1991) + +
Turner, Beidel, Dancu, &
Stanley (1989)

Note. PPP = positive predictive power; NPP = negative predictive power; (+) = present; (-) = absent.

zich MMPI Depression scale to 162 psychiatric inpatients. classification rate for the Mezzich scale was reported to be 71 %;
They reported prevalence rates, overall level of agreement, false we computed it to be 85% (138/162).
negative and false positive rates for each scale. We used these Stukenberg, Dura, and Kiecolt-Glaser (1990) gave the BDI to
data to generate a 2 X 2 table for both tests. The overall correct 177 elderly community dwellers. They presented the sensitivity,
classification rate for the MMPI (D) scale was given as 65%; specificity, and overall correct classification rate at different BDI
by our calculation, it was 82% (133/162). The overall correct cutoffs. We generated the 2 X 2 table for the BDI at a cutoff
398 JULIE B. KESSEL AND MARK ZIMMERMAN

score of 5 (actual n was 163) based on the prevalence rates, sen- review system. The calculation of diagnostic efficiency statistics
sitivity, and specificity values given by the authors. We calcu- is relatively easy to double-check. This is not true of most other
lated the overall level of agreement as 81% (132/163), not 74% statistics. It is possible that reports based on more complex sta-
as reported by the authors. tistical constructs may be flawed to an even greater degree. We
Trull (1991) administered the MMPI Borderline Personality do not have any easily implemented suggestions to deal with this
Disorder (BPD) subscale to a sample of 395 psychiatric inpa- potential problem. However, the issue of mistakes in scientific
tients. We generated a 2 X 2 table from the data given in Table communication warrants further discussion and, perhaps, in-
4 of his report (prevalence rates, sensitivity, and specificity). By vestigation.
our calculation, the negative predictive power was 81% (197/
243), not 91% as reported by Trull. 1
In the present context, in which tests are used to distinguish ill from
Finally, Turner, Beidel, Dancu, and Stanley (1989) assessed non-ill individuals, false positives are persons with the desirable out-
the performance of the Social Phobia and Anxiety Inventory come (non-ill) who are incorrectly predicted to have the undesirable
(SPAI) in groups of socially anxious college students, nonso- outcome (ill). In a different context, in which the goal is to predict posi-
cially anxious college students, and outpatient social phobics. tive outcomes such as job success, then false positives may refer to indi-
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

We generated the 2 X 2 table for the SPAI scale at a cutoff of 60 viduals with the undesirable outcome who are incorrectly predicted to
This document is copyrighted by the American Psychological Association or one of its allied publishers.

based on the authors' report that the overall correct classifica- have the desirable outcome. Although there is some variability in the
definition of these terms, in the medical literature these terms are de-
tion rate for 84 subjects, 16 of whom were correctly identified
fined as in Figure 1.
as social phobics, was 67.9%. We calculated the false negative
rate as 23.8% (5/21); the authors reported this value to be 14.3%
in Table 3 of their article. References
Three reports included calculations based on statistical terms
not defined as in Table 1 (Bryer, Marlines, & Dignan, 1990; American Psychiatric Association. (1980). Diagnostic and statistical
M. Hesselbrock et al, 1983; Parmalee, Powell, & Katz, 1989). manual of mental disorders (3rd ed.). Washington, DC: Author.
American Psychiatric Association. (1987). Diagnostic and statistical
Parmalee et al. defined false positive rate as b/N and Bryer et al.
manual of mental disorders (3rd ed., rev.) Washington, DC: Author.
as b/(a + b).' It is conventionally defined as b/(b + d). Similarly, Baldessarini, R., Finkelstein, S., & Arana, G. (1983). The predictive
Parmalee et al. defined false negative rate as c/N, rather than c/ power of diagnostic tests and the effect of prevalence of illness. Ar-
(a + c). Finally, M. Hesselbrock et al.'s formula for specificity, chives of General Psychiatry, 40, 569-573.
b/(b + d), is actually the formula for the false positive rate. Bryer, J., Marlines, K., & Dignan, M. (1990). Millon Clinical Multiaxial
Inventory Alcohol Abuse and Drug Abuse scales and the identifica-
tion of substance abuse patients. Psychological Assessment: A Journal
Discussion
of Consulting and Clinical Psychology, 2, 438-441.
We were quite surprised to find such a high rate of errors and Clopton, J., Weiner, R., & Davis, H. (1980). Use of the MMPI in identi-
we do not have a readily apparent explanation. Several articles fication of alcoholic psychiatric patients. Journal of Consulting and
Clinical Psychology, 48, 416-417.
describing the definition and calculation of these statistics have
Fletcher, R., Fletcher, S., & Wagner, E. (1988). Clinical epidemiology.
been published in widely circulated scientific journals. This in- Baltimore: Williams & Wilkins.
formation is also available in most epidemiologic textbooks. Gallagher, D., Breckenridge, J., Steinmetz, J., & Thompson, L. (1983).
We recommend that a standardized reporting format be used The Beck Depression Inventory and Research Diagnostic Criteria:
in future articles of a test's diagnostic performance. Specifically, Congruence in an older population. Journal of Consulting and Clini-
we suggest that the 2 X 2 table be presented as it is outlined in cal Psychology, 5, 945-946.
Figure 1, complete with the cell sizes. This will easily allow Goldberg, J., Shaw, B., & Segal, Z. (1987). Concurrent validity of the
readers and reviewers to double-check calculations and to com- Millon Clinical Multiaxial Inventory depression scales. Journal of
pute statistics not computed by authors. Moreover, as pre- Consulting and Clinical Psychology, 55, 785-787.
Goldston, D., O'Hara, M., & Schartz, H. (1990). Reliability, validity,
viously stated, we believe that sensitivity, specificity, positive
and preliminary normative data for the Inventory to Diagnose De-
and negative predictive power, overall correct classification, and pression in a college population. Psychological Assessment: A Journal
kappa are important measures of test performance and clinical of Consulting and Clinical Psychology, 2,112-215.
utility, and we suggest that they be routinely reported. However, Griner, P., Mayewski, R., Mushlin, A., & Greenland, P. (1981). Selec-
the particular content and focus of the study should determine tion and interpretation of diagnostic tests and procedures. Annals of
the emphasis placed upon one statistic over another. If certain Internal Medicine, 94, 553-592.
statistics are not computed, or alternative ones calculated, the Hakstian, R., & McLean, P. (1989). Brief screen for depression. Psycho-
authors should state the reasons and clearly define the terms and logical Assessment: A Journal of Consulting and Clinical Psychology,
computations. 1, 139-141.
Certainly, the problem of computational inaccuracy occurs Hesselbrock, M., Hesselbrock, V., Tennen, H., Meyer, R., & Workman,
K. (1983). Methodological considerations in the assessment of de-
at the level of the author's report, but peer reviewers also share
pression in alcoholics. Journal of Consulting and Clinical Psychology,
responsibility. Admittedly, it was sometimes time consuming to 51, 339-405.
generate the raw data of the 2 X 2 table, and in 2 studies, it was Hodges, K. (1990). Depression and anxiety in children: A comparison
not possible to do so. Nevertheless, the accuracy of calculations of self-report questionnaires to clinical interview. Psychological As-
and the reporting of diagnostic efficiency statistics should not sessment: A Journal of Consulting and Clinical Psychology, 2, 376-
go unchecked. 381.
Our findings raise some disturbing questions about the peer Keane, X, Caddell, J., & Taylor, K. (1988). Mississippi Scale for Com-
ERRORS IN REPORTING 399

bat-Related Posttraumatic Stress Disorder: Three studies in reliabil- Post, R., & Lobitz, C. (1980). The utility of Mezzich's MMPI regression
ity and validity. Journal of Consulting and Clinical Psychology, 56, formula as a diagnostic criterion in depression research. Journal of
85-90. Consulting and Clinical Psychology, 48, 673-674.
Klein, S., Dickstein, S., Taylor, E., & Harding, K. (1989). Identifying Rapp, S., Walsh, D., Parisi, S., & Wallace, C. (1988). Detecting depres-
chronic affective disorders in outpatients: Validation of the General sion in elderly medical inpatients. Journal of Consulting and Clinical
Behavior Inventory. Journal of Consulting and Clinical Psychology, Psychology, 56, 509-513.
57, 106-111. Sackett, D. (1992). A primer on the precision and accuracy of the clini-
Kobak, K., Reynolds, W., Rosenfeld, R., & Greist, J. (1990). Develop- cal examination. Journal of the American Medical Association, 267,
ment and validation of a computer-administered version of the Ham- 2638-2644.
ilton Depression Rating Scale. Psychological Assessment: A Journal Smith, M., & Thelen, M. (1984). Development and validation of a test
for bulimia. Journal of Consulting and Clinical Psychology, 52, 863-
of Consulting and Clinical Psychology, 2, 56-63.
872.
Lewinsohn, P., & Teri, L. (1982). Selection of depressed and nonde-
Stukenberg, K., Dura, J., & Kiecolt-Glaser, J. (1990). Depression
pressed subjects on the basis of self-report data. Journal of Consulting
screening scale validation in an elderly, community dwelling popula-
and Clinical Psychology, 50, 590-591. tion . Psychological Assessment: A Journal of Consulting and Clinical
Lewis, R., Turtletaub, J., Pohl, R., & Rainey, J. (1990). MMPI differ- Psychology, 2, 134-138.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

entiation of panic disorder patients from other psychiatric outpa- Thelen, M., & Farmer, J. (1991). A revision of the Bulimia Test: The
This document is copyrighted by the American Psychological Association or one of its allied publishers.

tients. Psychological Assessment: A Journal of Consulting and Clini- BULIT-R. Psychological Assessment: A Journal of Consulting and
cal Psychology, 2, 164-168. Clinical Psychology, 3, 119-124.
Mausner, J., & Kramer, S. (1985). Epidemiology—An introductory text. Trull, T. (1991). Discriminant validity of the MMPI-Borderline Person-
Philadelphia: W. B. Saunders. ality Disorder scale. Psychological Assessment: A Journal of Consult-
McFall, M, Smith, D., Mackay, P., &Tarver, D. (1990). Reliability and ing and Clinical Psychology, 3, 232-238.
validity of Mississippi Scale for Combat-Related Posttraumatic Stress Turner, S., Beidel, D., Dancu, C., & Stanley, M. (1989). An empirically
Disorder. Psychological Assessment: A Journal of Consulting and derived inventory to measure social fears and anxiety: The Social
Clinical Psychology, 2, 114-121. Phobia and Anxiety Inventory. Psychological Assessment: A Journal
Nelson, L., & Cicchetti, D. (1991). Validity of the MMPI Depression of Consulting and Clinical Psychology, 1, 35-40.
scale for outpatients. Psychological Assessment: A Journal of Consult- Wolfson, K., & Erbaugh, S. (1984). Adolescent responses to the MacAn-
ing and Clinical Psychology, 3, 55-59. drew scale. Journal of Consulting and Clinical Psychology, 52, 625-
Oliver, J., & Simmons, M. (1984). Depression as measured by the 630.
DSM-lIl and the Beck Depression Inventory in an unselected adult Zimmerman, M., & Coryell, W. (1987). The Inventory to Diagnose De-
population. Journal of Consulting and Clinical Psychology, 52, 892- pression (IDD): A self-report scale to diagnose major depression.
898. Journal of Consulting and Clinical Psychology, 55, 55-59.
Parmalee, P., Powell, L., & Katz, I. (1989). Psychometric properties of
the Geriatric Depression Scale among the institutionalized aged. Psy- Received September 9, 1992
chological Assessment: A Journal of Consulting and Clinical Psychol- Revision received January 20, 1993
ogy, 1, 331-338. Accepted January 20, 1993 •
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/333431653

A Systematic Mapping Study on Soft Skills in Software Engineering

Article  in  JOURNAL OF UNIVERSAL COMPUTER SCIENCE · January 2019


DOI: 10.3217/jucs-025-01-0016

CITATIONS READS
46 1,698

3 authors, including:

Gerardo Matturro
Universidad ORT Uruguay
40 PUBLICATIONS   238 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

2020d Elicitación y validación de requisitos en emprendimientos tecnológicos de software View project

2019e Gestión del MVP y modelo de proceso software para emprendimientos tecnológicos de software View project

All content following this page was uploaded by Gerardo Matturro on 03 June 2019.

The user has requested enhancement of the downloaded file.


Journal of Universal Computer Science, vol. 25, no. 1 (2019), 16-41
submitted: 4/6/18, accepted: 21/12/18, appeared: 28/1/19  J.UCS

A Systematic Mapping Study on Soft Skills in


Software Engineering

Gerardo Matturro
(Universidad ORT Uruguay, Montevideo, Uruguay
matturro@uni.ort.edu.uy)

Florencia Raschetti
(Universidad ORT Uruguay, Montevideo, Uruguay
florencia.raschetti@gmail.com)

Carina Fontán
(Universidad ORT Uruguay, Montevideo, Uruguay
cfontan@gmail.com)

Abstract: To participate in software development projects, team members may need to perform
different roles and be skilled in diverse methodologies, tools and techniques. However, other
skills, usually known as “soft skills” are also necessary. We report the results of a systematic
mapping study to identify existing research on soft skills in software engineering and to
determine what soft skills are considered relevant to the practice of software engineering. After
applying an explicit mapping protocol, 44 papers were finally selected, and 30 main categories
of soft skills were identified. At least half of the studies selected mention five skills:
communication, teamwork, analytical, organizational, and interpersonal skills. We also
identified the data collection methods commonly used for research on this topic: job
advertisements and surveys were the main ones. The results of this work are of interest to
researchers in human aspects of software engineering, to those responsible for Human Resource
in software development companies, and to curriculum designers in careers related to software
engineering and development.

Keywords: Soft Skills, Software Engineering, Systematic Mapping Study


Categories: D.2

1 Introduction
Software development is a highly technical activity that requires people to have
knowledge and experience in diverse software processes, methodologies, tools and
techniques, but also to perform various functions in software projects. When software
companies assemble project teams or hire new professionals, they often tend to
emphasize the knowledge and technical skills of potential candidates. However, the
human dimension may be as critical as technical capacity [Acuña, 06]. When people
work together on a software project, other skills are necessary to implement activities
such as communicating and interacting with other team members and stakeholders in
the project, managing time, presenting progress of the project, negotiating with the
customer, solving problems and making decisions, among others.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 17
According to Capretz, software professionals should delve into these nontechnical
issues and recognize that the people involved in the software development process are
as important as the processes and the technology itself [Capretz, 14]. The reason for
addressing these human factors is mainly the recognition that software engineers
could benefit from greater self-awareness and of others’ perspectives to develop their
soft skills, which in turn can positively influence their work [Ahmed, 13]. Even
though soft skills often play a critical role in career advancement, many professionals,
especially engineers and other highly technical people, pay little attention to this fact
[Chou, 13]. Soft skills are also mentioned in the literature as "non-technical skills",
"people skills", “transferable skills”, "social skills", or "generic competencies".
The purpose of this paper is to report the results of a systematic mapping of
literature regarding soft skills in software engineering. A systematic mapping or
mapping study is a form of secondary study intended to identify and classify the set of
publications on a topic. Its value lies partly in identifying areas where there is scope
for a fuller review (the group of related studies), and also to find out in which areas
there may be a need for more primary studies [Kitchenham, 16].
As mentioned by Batteson et al. [Batteson, 16], two research methods have
prevailed within the literature regarding the study of soft skills. The first approach
seeks to identify discrete skills considered soft skills. This method typically involves
eliciting lists of soft skills from relevant stakeholders in a given domain, through
surveys and interviews. The other research approach commonly seen starts with an
existing list of soft skills and tests them in relation to some capacity, such as
determining those skills which are most likely to predict performance, or testing the
agreement of importance of different skills across different participant groups.
In this work we took the first approach: our objective is to identify discrete skills
that are considered soft skills in the domain of software engineering, starting from
studies included in the literature mapping. Besides, we also want to know how often
the identified skills are mentioned in the reviewed literature and what research
methods are usually used in researching this topic.
We postulate that, if software engineers are to develop soft skills and relate them
to software projects roles and development activities, soft skills need to be clearly
articulated and defined. Using the list of soft skills gathered from the studies included
in our mapping, the second research approach mentioned above may be used by
ourselves and other researchers and stakeholders, to follow on the study of this topic.
Thus, we consider the results of this work particularly relevant to: a) researchers
interested in the human aspects of software engineering, b) managers responsible for
Human Resource in software development companies and team leaders of software
development projects, c) curriculum designers of study programs related to software
development and information technology, d) students and professionals.
The rest of this paper is organized as follows. In section 2 we provide background
information about what is meant by the term "soft skills" and its relevance to software
engineering professional practice and education. In section 3 we describe the research
method followed to perform this mapping study. In Section 4 we report the results of
the analysis of the selected studies included in the mapping and answer our research
questions. A discussion of those results is presented in Section 5 and threats to
validity are presented in Section 6. Finally, conclusions and further work are
presented in Section 7.
18 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

2 Background
In this section we present a brief review of general literature on the subject of soft
skills. We highlight the lack of a single definition, review three typical approaches to
conceptualize the term and identify common components, in order to develop a
working definition for this study. In addition, we argue on the relevance of soft skills
in software engineering practice and education, and we describe the related work.

2.1 Soft skills in general literature


Several authors have tried to define and characterize the term “soft skills”, but the
general understanding is that this is a very complex task. In this sense, Matteson,
Anderson and Boyden [Matteson, 16] mention that the literature on soft skills is
confusing, and that even though it is recognized that soft skills are important, when
pressed to describe particular soft skills, the concept becomes “murky”. Also Ramesh
and Ramesh [Ramesh, 10] consider that “soft skills” is an abstract, and somewhat
“nebulous”, concept, and Prince [Prince, 13] considers that the term can be interpreted
in many different ways.
According to Dell’Aquila et al. [Dell’Aquila, 2017], it is difficult to find a
universal definition of the concept of soft skills or an all-encompassing definition that
provides a succinct insight. To these authors, “soft skills” is a broad concept that
subsumes many dimensions of the personal sphere development, involving a
combination of emotional, behavioral and cognitive components; because of this, it is
arduous to determine what to include or exclude in the definition of soft skills.
Despite this manifest difficulty, we can find in literature three broad approaches
to define or characterize the concept of "soft skills": by giving an explicit definition,
by giving examples of specific soft skills, and by comparing them to the so-called
technical or “hard” skills.
Some definitions taken from general literature on the subject states that soft skills
are "people skills backed by our emotional intelligence that help us behave in a
socially acceptable manner and adapt ourselves to a social environment” [Verma, 09],
“a compendium of several components like attitude, abilities, habits and practices that
are combined adeptly to maximize one’s work effectiveness” [Ramesh, 10], and as
“interpersonal skills that demonstrate a person's ability to communicate effectively
and build relationships with others in one-on-one interactions as well as in groups and
teams” [Kamin, 13].
A summary of more than 30 other definitions taken from literature, along with an
extensive discussion regarding the many aspects involved in what are called “soft
skills” is given in [Dell’Aquila, 17], and in [Matteson, 16].
Due to the vagueness of the definitions given above and the difficulty of arriving
to a unified definition, some authors try to characterize this kind of skills by giving
examples of what they consider “soft skills”. Thus, soft skills encompass skills as
varied as communicating, managing conflict, negotiating, team-building [Kamin, 13],
leadership, motivation, time management, presentation skills [Rao, 10], problem-
solving, analytical thinking, flexibility, assertiveness [Bhatnaga, 12], mentoring and
coaching, establishing business relationships, nonverbal communication and body
language [Goldberg, 14], stress management, customer service skills, mediation skills
[Aamodt, 16], personal ability to function harmoniously with others, openness to
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 19
learning new ideas, and tolerance to not-so-pleasant situations and differences in
opinions [Verma, 09], just to name a few.
The third approach used by some authors to characterize soft skills is to contrast
them with the so-called “hard skills”. This term usually refers to the technical ability
and the factual knowledge needed to do the job [Klaus, 08], technical competencies
that an individual possesses through educational learning and practical hands-on
application [Bhatnaga, 12], which are usually associated with technical knowledge
and understanding of a process [Prince, 13].
The general opinion among these authors and others is that hard skills alone
might not be sufficient to be successful in the professional life [Bhatnaga, 12]. Soft
skills are as essential to success as technical skills [Goldberg, 14]; they complement
hard skills by making effort much more effective [Ramesh, 10], and are critical for
the success or failure of any individual in the workplace [Klaus, 08], [Tulgan, 15].
While hard skills are useful in a specific area of activity and may become obsolete
over a period of time because of changing technologies, soft skills are useful in all
areas of activity - not only in professional life, but also in personal and social life
[Rao, 10].
Despite the diversity of approaches and opinions on the concept of “soft skills”
reviewed so far, it is still possible to identify four common components in it, as shown
in Table 1.

Component Authors Definition


Abilities [Klaus, 08] Competence in an activity or
[Ramesh, 10] occupation because of one's
[Rao, 10] skill, training, or other
qualification.
Attitude [Ramesh, 10] A predisposition or tendency
[Rao, 10] to respond positively or
negatively towards a certain
idea, object, person, or
situation.
Habits [Verma, 09] An acquired (learned rather
[Ramesh, 10] than innate) behaviour pattern
regularly followed until it has
become almost involuntary.
Personality traits [Klaus, 08] Traits that reflect people’s
[Verma, 09] characteristic patterns of
[Rao, 10] thoughts, feelings, and
behaviours.

Table 1: Components of soft skills

For this study, we will refer to "soft skills" as the combination of the abilities,
attitudes, habits, and personality traits that allow people to perform better in the
workplace, complementing the technical skills required to do their jobs and
influencing the way they behave and interact with others.
20 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

One final aspect we want to comment, that have been pointed out by some
authors, is the apparently difficulty of assessing or measuring soft skills [Bhatnaga,
12], [Tulgan, 15]. To Thomas, the most common, and highly effective, method for
assessing non-technical skills is by using a behavioral marker system [Thomas, 18].
Such a system is defined as a framework that sets out observable, non-technical
behaviors that contribute to superior or sub-standard performance within a work
environment [Klampfer, 01]. Behavioral markers can be used in any domain where
behaviors relating to job performance can be observed. In the domain of software
engineering, one such a system has been proposed by Lacher and colleagues to
measure a set of non-technical skills of software professionals [Lacher, 15].

2.2 Relevance of soft skills in software engineering practice and education


Several authors have argued about the relevance of soft skills in the practice of
software engineering and the need to incorporate this topic in the curriculum of
software engineering in higher education.
In this sense, Zurita et al. affirm that there is currently an increasing need for
students to develop not only technical competences, but also those so called “soft
skills” in order to perform professional activities in an effective and efficient way in a
globalized world [Zurita, 16].
Radermacher and Walia [Radermacher, 13] present a study conducted to
determine in which areas graduating students most frequently fall short of the
expectations of industry or academia. One of those areas is, precisely, personal or soft
skills; they single out oral and written communication, teamwork, problem solving,
critical thinking, ethics, and leadership as examples of this kind of skills.
Sedelmaier and Landes [Sedelmaier, 15] have proposed a body of skills for
software engineering (SWEBOS) on the basis that there are no sound guidelines that
indicate which non-technical skills are particularly relevant for software engineers. To
these authors, non-technical or soft skills (such as communication, collaboration, and
teamwork) are as important as factual or technical knowledge, because software is
usually developed in teams of individuals.
The impact of this kind of skills, or its absence, has been highlighted by
Starkweather and Stevenson who claim that the main source of failure of software
development projects is the lack of soft skills or the consideration of soft issues
[Starkweather, 11]. Bancino and Zevalkink mention that for technology-intensive
projects (such as software projects), industry studies put the failure rate somewhere
between 40% and 70%, and that a recent survey of more than 250 technical leaders
cited a lack of soft skills as the biggest reason for project failure [Bancino, 07].
More recently, Capretz and Ahmed [Capretz, 18] consider that a missing link in
software engineering is the soft skills set that are essential in the software
development process. Under this consideration, although soft skills are among the
most important aspects in the creation of software, they are often overlooked by
educators and practitioners. These authors also agree in that even the latest guideline
for teaching and learning software engineering, namely SWEBOK V3.0 and
IEEE/ACM Curriculum Guide, highlights technical competence and gives only
marginal consideration to vaguely characterized nontechnical (soft) skills.
One last aspect regarding the relevance of soft skills relates to what is called
“employability”. To Hillage and Pollard [Hillage, 98], employability is a multi-
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 21
dimensional construct that includes: the ability to secure first employment; the ability
for an individual to transfer between positions at the same employer, and the ability to
secure employment from a new organisation.
In a study of the factors that influence new graduate employability, Finch et al.
[Finch, 13] found that out of 17 individual employability factors measured, five of the
six highest ranked factors were soft-skills. In addition, their findings illustrate that,
when hiring new graduates, employers place the highest emphasis on soft-skills and
the lowest on academic reputation.
Finally, Ramesh and Ramesh [Ramesh, 10] affirm that the corporate world puts
great emphasis on soft skills, and organizations look and recruit people with
exceptional soft skills from among the pool of technically skilled people.

2.3 Related work


To the best of our knowledge, there are no studies focused on soft skills in software
engineering with the depth and extent presented in this paper. Similar studies found
are the following.
Beecham et al. presented a systematic literature review focused explicitly on
“motivation”, one of the several skills that appear mentioned as soft skills in literature
[Beecham, 08]. The objective of this review was to plot the landscape of current
reported knowledge in terms of what motivates developers, what de-motivates them
and how existing models address motivation.
Naiem et al. presented what they call a “simplified” literature review in [Naiem,
15]. The purpose of this paper is to highlight the gaps that exist for computer science
graduates in Egypt. In this work, the author excluded all studies conducted before
2009 as they consider that old data might not be reliable. The review included only 21
studies, after applying a somewhat vague inclusion criterion: “the study had to be
published in a well-established journal or conference and published on 2009 or later”.
Iriarte and Bayona presented a literature review focused on the link between a
project manager’s soft skills and the success of a project; that is, what soft skills in the
role of IT (Information Technology) project manager are most influential on project
success [Iriarte, 17].
Our study differs from those above mentioned in that a) our focus is on all the
soft skills related to the practice of software engineering mentioned in literature, and
not just on the particularities of motivation, and b) we analyze soft skills related to
general software engineering practice and not just the role of project managers. In
addition, we did not limit our search to studies published after a specific year.

3 Research Method
Based on the guidelines provided in [Kitchenham, 16], the steps taken for this
mapping study were: 1) Definition of research questions, 2) Search of the relevant
literature, 3) Selection of relevant studies, 4) Data Extraction 5) Data aggregation and
synthesis.
22 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

3.1 Research questions


The main research question stated to guide this study was as follow:

• RQ1: What are the soft skills considered relevant to the practice of software
engineering?

By relevant we meant those skills that are considered connected or associated to


the practice of software engineering, and that we want to gather from the empirical
studies included in the mapping.
By software engineering we meant the application of a systematic, disciplined,
quantifiable approach to the development, operation, and maintenance of software, as
defined in [IEEE, 90].

As one of the goals of a mapping study could be to identify research methods


used in researching the topic [Kitchenham, 16], the second research question was:

• RQ2: What are the data sources or research methods used to identify those
soft skills?

Finally, due to the vagueness of the concept of soft skill and the lack of a unified
definition (as explained in Section 2), the third research question was:

• RQ3: How are the identified soft skills defined or characterized?

3.2 Search strategy


The identification of the relevant literature to the study involves, firstly, the definition
of the search strings to be used in bibliographic databases. For the construction of
these search strings, the following keywords were used: "soft skills", "non-technical
skills", "people skills", "personal skills", "generic competencies", “transferable
skills”, “softskills”, "software engineering", "software development", and "software
projects".
To perform the search process, the following bibliographic databases were
consulted: ACM Digital Library (portal.acm.org), SpringerLink
(www.springerlink.com), ScienceDirect (www.sciencedirect.com), and IEEExplore
(ieeexplore.ieee.org).
Since each of these databases proposes a different way to enter the search strings
and the logical operators OR and AND, we chose to use short search strings of the
form <string1> AND <string2>, where <string1> contains alternately the terms "soft
skills", "non-technical skills", "people skills", "personal skills", “transferable skills”,
“softskills” and "generic competencies", while <string2> contains alternately the
terms "software engineering", "software development" and "software projects".
Thus, 21 searches were carried out in each database, giving a total of 84
independent searches. The process ended on December 15, 2017.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 23
3.3 Inclusion/exclusion criteria for study selection
Selection criteria, also called inclusion / exclusion criteria, are used to identify, from
the whole set of papers returned by the search process, only those that provide direct
evidence about the research questions.

The criteria for study inclusion and exclusion defined for this study were:
• Inclusion: a) journal articles and conference proceedings records without
considering specific publication dates, b) articles presenting results of
empirical studies specifically related to software engineering. In order to be
included in the mapping, a study must meet both criteria.
• Exclusion: a) articles published in journals or conference proceedings not
refereed, b) articles that referred to studies about soft skills in ICT
generically, technical support, hardware or software installation or
maintenance, or technology infrastructure, c) articles in which the data
sources or the data collection procedures were not specified, d) items based
on expert or author opinion (position papers), summaries of articles, book
prefaces, journal editorials, readers’ letters, summaries of workshops,
tutorials, and poster sessions. Any paper that met at least one of these criteria
was excluded from the map.

When a paper appeared repeatedly in our search, we counted it only once.

3.4 Data extraction


A data extraction form, implemented as an Excel spreadsheet, was defined to extract
the relevant information from the selected papers.

From each paper, the following data was recorded:


• Bibliographic data.
• Research method and data sources reported.
• How the author(s) named the concept “soft skills”.
• The soft skills listed in the study, worded exactly as they appear in the
respective paper.
• The definition or characterization (if any) of the soft skills listed in the study.

Data extraction from each paper was done independently by two of the authors
and then compared to verify that no datum was missing, and that all soft skills
mentioned had been recorded in the data extraction form.
Data extraction was performed manually. The decision of not using software tools
to perform automatic extraction was based on the results of a study by Marshall and
Brereton, which concludes that, based on their evidence, most of the tools they
identified were in the initial stages of development and use, and therefore there were
very few empirical corroboration data on its effectiveness [Marshall, 13].
24 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

3.5 Data aggregation and synthesis


A first review of the soft skills recorded in the spreadsheet revealed that many of them
appeared in more than one paper with the same or slightly different name. Data
aggregation consisted in the creation of groups or categories of soft skills that,
independently of the denomination given in the respective paper, represent (in our
consideration) the same general underlying skill. Some examples of the application of
this criterium are given below. See Table 4 for bibliographic data of referred papers.
For example, "written and oral communication" [p11] and "exhibit several
communication styles" [p32] were grouped under the "Communication skills"
category, as they refer to the same underlying skill.
However, not all the cases were as simple as this one. For example, the soft skill
"negotiates to arrive at a consensus or compromise" [p18] was put under the group
"Negotiation skills", because the word "negotiates" appears in the name given by the
author. In this group we also put the skill named "consensus building" [p33], because
the word "consensus" also appears in the name given to the skill. Another example is
the case of the soft skill "possess a 'be the customer' mentality" mentioned in [p3].
After analyzing the context and the explanation given by the author about its
meaning, it was included in the "Customer orientation" category.
After grouping the soft skills in their respective categories, we were in position to
count how many times the skills in each category are mentioned in the selected
papers. In sub-section 4.3 we present the results of this process.

4 Results
In this section we present the outcomes of the search process, the list of papers
selected after applying the inclusion/exclusion criteria, and the answers to the
research questions.

4.1 Search results


With the keywords defined in sub-section 3.2, we performed the described search
process in each of the bibliographic databases. Table 2 shows how many papers were
found in each one.

Database Papers found


IEEExplore 177
SpringerLink 694
ScienceDirect 898
ACM Digital Library 45
Total: 1814

Table 2: Search results in bibliographic databases

The figures in the “Papers found” column consider all the papers retrieved, before
applying the inclusion/exclusion criteria.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 25
4.2 Studies selected for the review
From the extensive lists of studies obtained through the searches, a selection of the
relevant ones was done by reading and analysing the title, abstract and keywords of
each article, discarding the ones that were clearly unrelated to the research subject and
those that were duplicated. Then, the studies selected in the previous step were
carefully reviewed by reading the Introduction, Methodology section, Results section,
and Conclusion and applying the inclusion and exclusion criteria presented in Section
3.3. If reading the above items was not enough to decide whether to include it or not,
the study was read in its entirety.
After discarding those items that did not meet the inclusion/exclusion criteria and
those that, although containing some of the search keywords, were not directly related
to the focus of the investigation, the final set was reduced to 44 papers. Of them, 43
are written in English and one in Portuguese.
References to the selected papers, classified by year of publication, are shown in
Table 3. Full bibliographic data of these studies is given in Table 4.

Year of Reference Quantity


publication
1999 p1 1
2000 p2 1
2001 p3 1
2002 p4 1
2003 ---- 0
2004 p5 1
2005 p6 1
2006 p7 1
2007 p8 1
2008 p9, p10 2
2009 p11, p12, p13, p14 4
2010 p15, p16, p17, p18, p19, p20 6
2011 p21, p22, p23, p24, p25, p26, p27 7
2012 p28, p29, p30, p31, p32, p33 6
2013 p34, p35, p36 3
2014 p37, p38 2
2015 p39, p40 2
2016 p41, p42 2
2017 p43, p44 2
Total 44

Table 3: References to the selected papers, classified by publication year

The interest for studying the subject of soft skills in software engineering appears
to have increased since 2009, as 34 studies were reported between that year and 2017.
26 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

Ref. Bibliographic data


p1 Seffah, A.: Training developers in critical skills, IEEE Software, Vol. 16, No.
3, pp. 66–70, 1999.
p2 Orsted, M.: Software development engineer in Microsoft. A subjective view
of soft skills required, Int. Conf. Softw. Eng. ICSE 2000 New Millennium, pp.
539–540, 2000.
p3 Bailey, J., Stefaniak, G.: Industry Perceptions of the Knowledge, Skills, and
Abilities Needed by Computer Programmers, in 2001 ACM SIGCPR
conference on Computer personnel research, 2001, pp. 93–99.
p4 Noll, C., and Wilkins, M.: Critical Skills of IS Professionals. A Model for
Curriculum Development, Journal of Information Technology Education, Vol.
1, No. 3, pp. 143–154, 2002.
p5 Gallivan, M., Iii, D., Kvasny, L.: Changing Patterns in IT Skill Sets. A
Content Analysis of Classified Advertising, The Data Base for Advances in
Information Systems, Vol. 35, No. 3, pp. 64–87, 2004
p6 Downey, J.: A framework to elicit the skills needed for software development,
in ACM SIGMIS CPR Conference on Computer personnel research -
CPR’05, 2005, p. 122.
p7 Khamisani, V., Siddiqui, M., Bawany, M.: Analyzing Soft Skills of Software
Engineers using Repertory Grid, in 2006 IEEE International Multitopic
Conference. INMIC ’06, 2006, pp. 259–264.
p8 Laporte, C., Doucet, M., Bourque, P., Belkébir, Y.: Utilization of a Set of
Software Engineering Roles for a Multinational Organization, in International
Conf. on Product-Focused Software Process Improvement, 2007, pp. 35–50.
p9 Jalil, Z., Shahid, A.: Is Non-Technical Person a Better Software Project
Manager?, in 2008 International Conference on Computer Science and
Software Engineering, 2008, pp. 1–5.
p10 Lewis, T., Smith, W., Harrington, K., Hall, W.: Are Technical and Soft Skills
Required? The Use of Structural Equation Modeling to Examine Factors
Leading to Retention in the CS Major, in ICER’08 Fourth International
Workshop on Computing Education Research, 2008, pp. 91–99.
p11 Colomo-Palacios, R., Casado, C., García, A., Gómez, J.: It’s not Only about
Technology, It’s about People: Interpersonal skills as a part of the IT
Education, in Second World Summit on the Knowledge Society, WSKS 2009,
2009, pp. 226–233.
p12 Fernández-Sanz, L. F.: Personal Skills for Computing Professionals, IEEE
Computer, Vol. 42, No. 10, pp. 110–112, 2009.
p13 Ferrari, R., Madhavji, N., Wilding, M.: The Impact of Non-Technical Factors
on Software Architecture, in LMSA ’09, Leadership and Management in
Software Architecture, 2009, pp. 32–36.
p14 Gillard, S.: Soft Skills and Technical Expertise of Effective Project Managers,
Issues in Informing Science and Information Technology, Vol. 6, 2009, pp.
723-729.
p15 Capretz, L F., Ahmed, F.: Making Sense of Software Development and
Personality Types, IT Professional, Vol. 12, No. 1, pp. 6–13, 2010.
p16 Colomo-Palacios, R., Cabezas, F., García, A., Soto, P.: Generic Competences
for the IT Knowledge Workers: A Study from the Field, in Third World
Summit on the Knowledge Society, WSKS 2010, 2010, pp. 1–7.
p17 Fernández-Sanz, L.: Analysis of Non-Technical Skills for ICT Profiles, in 5ta.
Conferencia Ibérica de Sistemas y Tecnologías de Información, CISTI 2010,
2010, pp. 524–529.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 27
Ref. Bibliographic data
p18 Purao, S.: Designing a Multi-Faceted Metric to Evaluate Soft Skills, in
SIGMIS-CPR’10, 2010, pp. 88–90.
p19 Stevenson D., Starkweather, J.: PM critical competency index: IT execs prefer
soft skills, International Journal of Project Management, Vol. 28, No. 7, pp.
663–671, 2010.
p20 Vale, L., Bessa, A., Vasconcelos, P.: Relevant Skills to Requirement Analysts
According to the Literature and the Project Managers Perspective, in 2010
Seventh International Conference on the Quality of Information and
Communications Technology, 2010, pp. 228–232.
p21 Ahmed, F., Campbell, P., Beg, A., Capretz, L.: What Soft Skills Software
Architect Should Have? A Reflection from Software Industry, in 2011
International Conference on Computer Communication and Management,
2011, vol. 5, pp. 565–569.
p22 Bakar A., Ting, C.-Y.: Soft Skills Recommendation Systems for IT Jobs: A
Bayesian Network Approach, 3rd Conference on Data Mining and
Optimization (DMO), pp. 82–87, 2011.
p23 Colomo-Palacios, R., Casado, C., Tovar, E., Soto, P., García, A.: Is the
Software Worker Competent? A View from Spain, in 4th World Summit on
the Knowledge Society, WSKS 2011, 2011, pp. 261–270.
p24 González, D., Moreno, L., Roda, J.: Teaching ‘Soft’ Skills in Software
Engineering, Global Engineering Education Conference, 2011, pp. 630–637.
p25 Pinkowska, M. and Lent, B.: Evaluation of Scientific and Practice Approaches
to Soft Skills Requirements in the ICT Project Management, in IBIMA
Business Review Journal, 2011, vol. 2011, pp. 1–12.
p26 Pinkowska, M., Lent, B., Keretho, S.: Process based identification of software
project manager soft skills, in 2011 Eighth International Joint Conference on
Computer Science and Software Engineering (JCSSE), 2011, pp. 343–348.
p27 Vale, L., Bessa, A., Vasconcelos, P.: The Importance of Professional Quality
of Requirements Analysts for Success of Software Development Projects: A
Study to Identify the Most Relevant Skills, in 25th Brazilian Symposium on
Software Engineering, 2011, pp. 253–262.
p28 Ahmed, F.: Software Requirements Engineer: An Empirical Study about Non-
Technical Skills, Journal of Software, Vol. 7, No. 2, pp. 389–397, Feb. 2012.
p29 Ahmed, F., Capretz, L. F., Bouktif, S., Campbell, P.: Soft skills requirements
in software development jobs: a cross‐cultural empirical study, Journal of
Systems and Information Technology, Vol. 14, No. 1, pp. 58–81, 2012.
p30 Ahmed, F., Capretz, L. F., Campbell, P.: Evaluating the Demand for Soft
Skills in Software Development, IT Professional, 14, 1, pp. 44–49, 2012.
p31 Litecky, C., Igou, A. J., Aken, A.: Skills in the management oriented IS and
enterprise system job markets, in 50th Annual Conference on Computers and
People Research, 2012, pp. 35–43.
p32 Thurner V., Böttcher, A.: Expectations and Deficiencies in Soft Skills.
Evaluating student competencies in Software Engineering education, in 2012
IEEE Global Engineering Education Conference (EDUCON), 2012, pp. 1–7.
p33 Yu, L., Xin, X., Liu, C., Sheng, B.: Using Grounded Theory to Understand
Testing Engineers’ Soft Skills of Third-Party Software Testing Centers, in 3rd
International Conference on Software Engineering and Service Science, 2012,
pp. 403–406.
p34 Ahmed, F., Capretz, L. F., Bouktif, S., Campbell, P.: Soft Skills and Software
Development: A Reflection from Software Industry, International Journal of
Information Processing and Management, Vol. 4, No. 3, pp. 171-191, 2013.
28 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

Ref. Bibliographic data


p35 Herrmann, A.: Requirements Engineering in Practice. There is no
Requirements Engineer Position, in 19th International Working Conference,
REFSQ 2013, 2013, pp. 347–361.
p36 Matturro, G.: Soft skills in software engineering: A study of its demand by
software companies in Uruguay,” in 2013 IEEE/ACM 6th International
Workshop on Cooperative and Human Aspects of Software Engineering
(CHASE), 2013, pp. 133–136.
p37 Bender, L., Walia, G., Fagerholm, F., Pagels, M., Nygard, K., Münch, J.:
Measurement of the Non-Technical Skills of Software Professionals: An
Empirical Investigation, in 26th International Conference on Software
Engineering & Knowledge Engineering (SEKE 2014), 2014, pp. 478–483.
p38 Sedelmaier Y., Landes, D.: Software Engineering Body of Skills (SWEBOS),
in 2014 IEEE Global Engineering Education Conference (EDUCON), 2014,
no. April, pp. 395-401.
p39 Matturro, G., Raschetti, F., Fontan, C.: Soft Skills in Software Development
Teams. A Survey of the Points of View of Team Leaders and Team Members,
in 2015 IEEE/ACM 8th International Workshop on Cooperative and Human
Aspects of Software Engineering, 2015, pp. 101–104.
p40 Bootla, P., Rojanapornpun, O., Mongkolnam, P.: Necessary Skills and
Attitudes for Development Team Members in Scrum. Thai Experts’ and
Practitioners’s Perspectives, in 12th International Joint Conference on
Computer Science and Software Engineering (JCSSE), 2015, pp. 184–189.
p41 Gupta, R., Manikreddy, P., GV, A.: Challenges in Adapting Agile Testing in a
Legacy Product, in 11th International Conference on Global Software
Engineering (ICGSE), 2016, pp. 104-108.
p42 Pieterse, V., van Eekelen, M: Which Are Harder? Soft Skills or Hard Skills?,
In: Gruner S. (eds) ICT Education. SACLA 2016. Communications in
Computer and Information Science, vol 642, Springer, pp. 160-167.
p43 Jia, J., Chen, Z., Du, X.: Understanding Soft Skills Requirements for Mobile
Applications Developers, 2017 IEEE International Conference on
Computational Science and Engineering (CSE) and IEEE International
Conference on Embedded and Ubiquitous Computing (EUC), 2017, pp. 108-
115.
p44 Daneva, M., Wang C., Hoener, P.: What the Job Market Wants from
Requirements Engineers? An Empirical Analysis of Online Job Ads from the
Netherlands, 2017 ACM/IEEE International Symposium on Empirical
Software Engineering and Measurement (ESEM), Toronto, ON, 2017, pp.
448-453.

Table 4: Bibliographic data of selected papers

4.3 Answer to Research Question 1


As stated in sub-section 3.1, RQ1 is: What are the soft skills considered relevant to
the practice of software engineering?
To answer this question, we extracted the skills mentioned in each paper as
explained in sub-section 3.4 and grouped them in categories as explained in sub-
section 3.5.
The different named skills included in each category are as follow, ordered
alphabetically by category name:
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 29
• Analytical skills: analytical skills, analysis capacity, strong analytical skills,
capacity of analysis, analytical thinking, capacidade de análise, analytical,
abstract and cross-linked thinking, analytical and conceptual thinking,
capacity for analysis and synthesis.
• Autonomy: autonomy, independence, ability to work independently, ability
to work in an autonomous way, autonomy/independence, self-management,
autonomia, autonomy and self-reliance, independent, working
independently.
• Change management: change management, capacity to adapt to varying
situations, open and adaptable to changes, ability to deal with ambiguity and
change, capacity to adapt to changes, facilidade de adaptação a mudanças,
adapting and responding to change, dealing with change.
• Commitment/Responsibility: commitment, compromise skills,
responsibility, sense of responsibility, ability to work thoroughly and handle
responsibilities carefully, willingness to assume personal responsibility.
• Communication skills (oral / written): communication skills, interpersonal
communications, verbal communication skills, communication, ability to
communicate at multiple levels, conversation skills, word power and writing
proficiency, written and oral communication, oral and written
communication in mother tongue, oral and written communication, oral and
writing skills, facilidade de comunicação oral/escrita, written
communications, exhibit several communication styles, face to face
communication.
• Conflict management: conflict management, conflict resolution, handles
conflict maturely, capacity to resolve conflicts, conflict prevention,
recognition and resolution skills, dealing with conflict, capacidade para
resolver conflitos, ability to resolve conflicts constructively, managing
conflicts.
• Creativity: creativity, creative thinking.
• Critical thinking: critical thinking, pensamento critico, thinking (logical,
creative, critical) skills.
• Customer orientation: customer orientation, posess a "be the customer"
mentality, customer-oriented, orientation to customer needs, orientação para
as necessidades do cliente, ability to work closely with users and maintain
positive user or client relationship, responds to and anticipates
clients/customers' goals.
• Decision-making: decision making, acting, valuing, thinking and deciding
skills, judgment and decision-making, to have the ability to make critical
decisions under pressure, capacity to judge, capacidade de julgamento.
• Ethics: ethics, ethical commitment, work ethics, ethical and professional
moral, ethical behaviour skills, integrity/honesty/ethics, high ethical values
and moral courage, behave according to social and ethical norms.
• Fast learner: fast learner, fast learning.
• Flexibility: flexibility.
• Initiative: initiative, taking initiative skills, proatividade, idea initiation
skills, proactive behaviour, initiative and enterprise, proactivity.
30 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

• Innovation: innovation, capacity to innovate, innovative/creative mind,


capacidade para inovar, innovative.
• Interpersonal skills: interpersonal skills.
• Leadership: leadership, leading and supervising.
• Listening skills: active listening skills, listening skills, active listener,
capacity to listen, capacidade para ouvir.
• Methodical: methodic, capacity for methodical work, capacidade metódica.
• Motivation: motivation, motivation to work, self-motivation.
• Negotiation skills: negotiation skills, negotiates to arrive at a consensus or
compromise, negotiating, negotiation/consensus-building.
• Organizational/Planning skills: organizational skills, ability to plan,
organize, and lead projects, being organized, organization skills, planning,
organization and planning, work and task planning, planning and
organization, senso de organização, planning and organizing, sense of
organization, management and planning.
• Presentation skills: presentation skills, delivering presentations, rhetoric,
oratory and presentation proficiency, presenting and communicating
information.
• Problem solving skills: problem solving skills, problem solving process,
strong problem-solving skills, ability to solve problems in a self-directed
fashion, even without external push.
• Results orientation: results orientation, results-oriented, drive for results, be
results oriented, orientação para resultados, achievement orientation skills,
delivering results and meeting customer expectations.
• Stress management: stress management, ability to withstand stress without
losing control, deal well with risk and stress, tolerância à pressão, stress and
workload management, stress handling, pressure tolerance, ability to work
calmly and efficiently, even under time pressure or occupational stress,
coping with pressure and setbacks, work under stress skills, withstanding
pressure.
• Team management: team management, team cohesion management skills.
• Team work: team work, teamwork, ability to work collaboratively in a
team project environment, team player, working in teams, participates as an
effective member of a team, capacity for teamwork, ability to cooperate with
others in a team, working in teams.
• Time management: time management, scheduling skills, time and self-
management, ability to plan the time realistically, of setting up schedules,
and of completing tasks in an organized manner.
• Willingness to learn: willingness to learn, motivation to learn, striving for
life-long learning, eagerness to learn, willingness and ability to become
acquainted with novel subjects and areas over their complete professional
career in a self-directed manner, active learning, lifelong learning.

After grouping the soft skills, we counted how many times the skills in each
category are mentioned in the selected papers, as shown in Table 5. Column “%”
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 31
indicates the percentage of the selected papers that mention a soft skill included in the
respective category.

Soft skill Freq. %


Communication skills 40 91
Team work 30 68
Analytical skills 24 55
Organizational/Planning skills 24 55
Interpersonal skills 23 52
Leadership 21 48
Problem-solving skills 21 48
Autonomy 19 43
Decision-making 15 34
Initiative 14 32
Conflict management 14 32
Change management 13 30
Commitment/Responsibility 13 30
Stress management 13 30
Customer orientation 12 27
Flexibility 12 27
Ethics 11 25
Results orientation 11 25
Time management 11 25
Innovation 10 23
Presentation skills 10 23
Creativity 9 20
Critical thinking 9 20
Negotiation skills 9 20
Listening skills 8 18
Motivation 8 18
Willingness to learn 8 18
Fast learner 7 16
Team management 5 11
Methodical 4 9

Table 5: Main categories of soft skills and number of times they appear mentioned in
the selected papers

We also found many other soft skills that appear mentioned just one or two times
in the selected studies. What follows is a partial list of these other soft skills:
administration skills, appearance, ability to understand diversity, ability to
visualize/conceptualize, ability to apply knowledge, ability to multitask, ability to
give and receive constructive criticism, being persistent, business skills, coaching,
conducting investigations, cooperates with people with different personalities, race or
gender, courage, entrepreneurship, credibility, interviewing skills, role playing skills,
moderation, efficiency, sales, diplomacy, professionalism, follow directions, setting
32 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

and managing expectations, minuteness, trustworthiness, patient, prediction, political


savvy, managing power, environmental sensibility, openness, punctuality, rapport
building, reliability, social graces, understand business culture, passionate,
willingness to travel, positive attitude, coding habit.

4.4 Answer to Research Question 2


As stated in sub-section 3.1, RQ2 is: What are the data sources or research methods
used to identify those soft skills?
Table 6 shows the different data collection methods reported as used in the
selected studies, and the frequency of use.

Research/data collection method Studies %


Job advertisings 17 39
Survey (online or by e-mail) 13 29
Literature review (no systematic) 12 27
Interviews 6 14
Focus groups 2 4
Experiment 1 2

Table 6: Data collection methods used in the selected studies

Several papers report the use of more than one of these methods to collect data;
that is why column “%” does not sum up 100. For example, in [p37], the authors
report the use of literature reviews and of focus groups consisted of employers,
software engineering and computer science industrial professionals and instructors.
Another example is [p3], which reports the use of site interviews, focus groups, and a
web-based survey as data collection methods.

4.5 Answer to Research Question 3


As stated in sub-section 3.1, RQ3 is: How are the identified soft skills defined or
characterized?
Only four of the selected papers present a definition of some soft skills; they are
[p20], [p28], [p30], and [p43]. Those definitions taken from those papers are shown in
Table 7.

Soft skill category Definition


Communication skills Communicate orally and writtenly in simple,
concise, unambiguous, and easily understood
way [p20]. The set of skills that enables a
person to convey information so that it is
received and understood [p28]. The ability to
convey information so that it is well received
and understood [p30]. The ability to
communicate effectively with others [p43].
Conflict management The ability to solve conflicts of interest in
work situations [p20].
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 33
Soft skill category Definition
Customer orientation The ability to identify and meet the needs of
its customers [p20].
Team work The ability of an individual who is good at
working closely with other people [p28]. The
ability to work effectively in a team
environment and contribute toward the
desired goal [p30]. The ability to cooperate
with other teammates during teamwork [p43].
Analytical skills The ability to understand and explain each
part of a whole, to know better than nature,
functions, causes, among others [p20]. The
ability to break a situation down into its
component parts, recognize what needs to be
done and plan a suitable course of action in a
step-by-step way [p28]. The ability to think
logically, analyse and solve problems [p43].
Organizational/Planning skills The ability to sort, prioritize and control the
execution of their tasks according to plan, and
the resources under their responsibility [p20].
The ability of an individual to assess and
prioritize tasks and ensure that they are
completed in a timely manner [p28]. The
ability to efficiently manage various tasks and
to remain on schedule without wasting
resources [p30]. The ability to make people
work efficiently [p43].
Interpersonal skills The person's ability to behave in ways that
increase the probability of achieving the
desired outcomes [p28]. The ability to deal
with other people through social
communication and interactions under
favourable and inauspicious conditions [p30].
Problem-solving skills The ability to evaluate a situation and to
identify an appropriate solution that meets the
customers’ needs [p28]. The ability to
understand, articulate, and solve complex
problems [p30]. The ability to think logically,
analyse and solve problems [p43].
Autonomy The capacity to govern themselves by their
own means [p20]. The individual’s capability
to operate with a reduced level of supervision
to plan and successfully complete tasks
independently [p28]. The ability to carry out
tasks with minimal supervision [p30]. The
ability to complete work independently [p43].
34 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

Soft skill category Definition


Decision-making The ability to judge alternatives and take
appropriate decisions [p20]. The ability to
make sensible decisions based on available
information [p30].
Initiative The ability to propose and / or take any action
without the need for others to come to ask or
say [p20]. The ability to be active and
optimistic to meet challenging work [p43].
Change management The ability to adapt and work effectively with
different situations and face of change [p20].
The ability of an individual to accept changes
in the carrying out of tasks without showing
resistance [p28]. The ability to accept and
adapt to changes when carrying out a task
without showing resistance [p30].
Commitment/Responsibility To be responsible for the work [p43].
Ethics The ability to follow a set of rules and
precepts of value, order, and morality [p20].
Results orientation The ability to achieve and/or exceed sales
goals and/or objectives [p20].
Innovation The ability to identify and create new ideas
and opportunities [p20]. The ability to
produce or propose imaginative and practical
solutions to business problems [p28]. The
ability to come up with new and creative
solutions [p30]. To have creative thinking to
put forward new ideas [p43].
Critical thinking The ability to determine carefully and
deliberately accepted, refutation or
suspension of the trial about a particular piece
of information [p20].
Listening skills The capacity to consider what the
interlocutors are reporting [p20].
Fast learner The ability to adapt to new tasks, roles, or
challenges effectively and with ease [p28].
The ability to learn new concepts,
methodologies, and technologies in a
comparatively short timeframe [p30]. To have
interest in learning and have the ability of
self-learning in short time [p43].
Methodical The ability to use a set of steps, neatly
arranged, set by methods (techniques) to
solve a particular issue or problem [p20].
Table 7: Definitions of some soft skills taken from selected papers
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 35
The selected papers did not include any definition about the following categories
of soft skills: Creativity, Flexibility, Leadership, Motivation, Negotiation skills,
Presentation skills, Stress management, Team management, Time management, and
Willingness to learn.

5 Discussion
The primary purpose of the present mapping study was to identify existing research
on soft skills in software engineering to identify what soft skills are considered
associated to the practice of software engineering. After applying an explicit mapping
protocol, 44 papers were finally selected for further analysis and the lists of the soft
skills mentioned in them were extracted and grouped in 30 categories, as shown in
Table 5.
To create those categories, we followed the procedure describe in sub-section 3.5
and thus, it is debatable that they represent distinct and independent soft skills. In fact,
we recognize that some overlap may exist between some categories.
For example, “presentation skills” (the skills needed to deliver an effective
presentation to a variety of audiences) requires “organizational skills” to prepare and
organize what to deliver in the presentation, “interpersonal skills” to create empathy
with the audience, “decision-making skills” to decide what material to include in the
presentation, and “communication skills” to adequately transmit what is intended to
present. Similarly, conflict management require negotiation and problem-solving
skills as well as oral communication and listening skills.
In aggregating the skills into those distinct categories, we wanted to outline a list
of discrete soft skills in order to advance in the study of their characteristics and their
relationships to the practice of software engineering. Other disciplines besides
software engineering, such as psychology, sociology, and human resource
management, can contribute to better define or conceptualize the set of soft skills and
shed light on their relevance and influence in software engineering practice.
An analysis of the data collection and research methods used in the selected
studies indicates that they rely mostly on job advertisings, followed by surveys to
professionals and practitioners, as shown in Table 6. Job ads, published in newspapers
or in job advertisements portals, are one of the preferred ways software companies
use to advertise job positions when they need to recruit new talent. These ads should
reflect what the industry asks for job positions in software engineering.
The data in Table 6 reveal that the primary studies included in the mapping have
mainly taken the first research approach described in the Introduction, that is, the
elicitation of lists of discrete skills considered soft skills from relevant stakeholders
[Batteson, 16].
On the other hand, Clarkson mentions that the tasks that technical people carry
out fall into three rough groupings: those done primarily as individuals, those done
primarily with other people and those done primarily as leaders of a team. Each
grouping gives rise to a different type of interaction and thus a different set of soft
skills [Clarkson, 01].
In our opinion, here is where our mapping study shows a gap in existing research
that establishes the need for more primary studies to move from collecting lists of soft
skills to studying what different sets of soft skills are required to each grouping of
36 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

tasks during software projects, and their incidence in the general practice of software
engineering.
Thus, we consider the results of this mapping study to be of value for graduate
students and researchers interested in the human aspects of software engineering. Our
results can be taken as a starting point to frame an investigation on the subject within
its general context, such as a prior analysis of the state of the art (a fundamental part
of the process of preparing any academic work), and to plan and develop new studies
to further determine the impact that those skills have on main drivers of software
project success, such as teamwork, interpersonal communication, decision making
and problem solving.
In the Introduction we argued that the results obtained are also relevant to Human
Resource managers in software development companies and team leaders of software
development projects, and to curriculum designers in careers related to software
development and information technology.
As stated above, the main data collection method used in the selected studies was
job advertisings published in newspapers and in job portals. What those ads reflect is
what software companies ask for to new hires, but there is no indication of whether or
how those skills are assessed in hiring decisions or evaluated later on, while people
are working for the company.
If we assume that companies do really assess those skills at hiring time or they
periodically do so later, then those responsible for recruitment and selection can use
the results of this study to determine what kind of soft skills are demanded by peer
organizations, and take them into consideration along with other organizational
aspects such as organizational culture, characteristics of the software development
projects they usually run, and the values and skills of other members of the
organization and their projects teams.
On the other hand, it seems reasonable to think that any software project manager
or team leader will prefer team members that are able to work harmoniously in the
team, make decision and solve problems, negotiate and manage conflicts successfully,
communicate well and establish good interpersonal relationships.
Thus, even if we do not make the above assumption, the results of this study are
still useful for Human Resource managers and software projects team leaders to be
aware of what non-technical skills are usually demanded by the industry and to
identify which of them may be suitable for their environment. As argued by Capretz,
Ahmed and da Silva, it is impossible to exclude human factors from software
engineering experience during software development because software is developed
by people and for people [Capretz, 17].
Regarding the usefulness of the results for curriculum designers of study
programs related to software development and information systems, Capretz and
Ahmed affirm that at present very few programs in software engineering touch on the
topics of teamwork and the evaluation of soft skills [Capretz, 18]. According to these
authors, it is even difficult to find a university that has a full course on the human
aspects of software engineering and consider it unfortunate that the soft skills topics
are far from being part of conventional software engineering education.
Here, we can raise a question: if soft skills are a concept so difficult to define,
how can they be taught?
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 37
From the discussion given in Section 2, although it is hard to have a single,
unified definition of the concept of soft skills itself, there are approximations to the
notion of soft skills, and there is also agreement on several specific soft skills. In this
sense, to Clarkson soft skills are like any other skills. He considers that we can teach
the techniques, but individuals must learn the skill by themselves; they must develop
familiarity and ease with the techniques, and they must adapt their own behavior to
give appropriate responses to new situations [Clarkson, 01]. Under this consideration,
one of the challenges is to choose the appropriate teaching or training method that
gives an individual the opportunity to develop the new skills, within a context that is
close enough to the job he will perform in the labor market. Examples of novel
approaches to teaching soft skills are given in [Dell’Aquila, 17] where the authors
present and discuss several concrete experiences of educational games and training
tools applied to a variety of soft skills, such as negotiation, decision-making,
leadership and problem solving.
In Rao's opinion, there must be an effective coordination between academia,
students, industry and principals of educational institutions to improve this type of
skills among students [Rao, 14], because these skills -in his opinion- improve the
employability of professionals. Therefore, knowing what the most demanded soft
skills in the practice of software engineering are, is of interest for undergraduate
students who will turn to the labor market, and also for graduate professionals seeking
to advance their professional careers. In this sense, Richter and Dumke affirm that
about 80% of people who fail at work do not fail because of lacking technical skills,
but rather because of their inability to relate or communicate well with other people in
a team [Richter, 15]. Communication skills and teamwork are the two most often
mentioned soft skills, as shown in Table 5.
One final aspect we want to discuss is the finding that only 4 papers present
definitions of specific soft skills (Table 7), some of them difficult to interpret and to
grasp their meaning (for example, interpersonal skills as in [p28] or critical thinking
as in [p20]). To advance in the study of soft skills it will be necessary to better
characterize them and use those characterizations in new research to test their value
and incidence in the practice of software engineering.

6 Threats to validity
Several threats to validity have been identified for this systematic review.
First, the keywords used in the search strings as alternative names to "soft skills"
(sub-section 3.2) may not be all the possible options. In our case, we used the
different names found in the literature when writing the background section.
Second, we only accessed the set of databases that were available to us
(subsection 3.2). There are other bibliographic databases and therefore we could have
missed some important studies about the subject.
Third, because of the lack of a formal and unified definition of what a "soft skill"
is, as described in Section 2, it is arguable whether some of the skills reported as
"soft" are actually soft skills. In this case, we counted as "soft skills" all the skills
mentioned as such in the selected studies.
Finally, the soft skills grouped in the categories presented in section 4 are the
most mentioned in the papers included in the mapping. As the main data sources used
38 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

in those papers were job ads, surveys and interviews, it is unclear how strong the
correlation is between being mentioned in those sources and being of value to the
practice of software engineering, a topic that deserves more research.

7 Conclusions and Future Work


In this study, conducted as a systematic mapping of literature, we identified 44
research papers that report empirical studies regarding the topic of “soft skills” in
software engineering. The number of studies finally included in the mapping indicates
that discussion and research on this topic is relevant to the software engineering
community, and that is mainly focused on eliciting lists of discrete skills that are
considered soft skills. The results of our study, which followed a predefined mapping
protocol, allowed us to answer the three research questions initially posed.
To the first research question (What are the soft skills considered relevant to the
practice of software engineering?), the 30 soft skills categories constructed represent
those most mentioned in relation to the practice of software engineering. Among
them, communication skills, team work, analytical skills, organizational skills, and
interpersonal skills are mentioned as such in at least half of the reviewed papers.
Regarding our second research question (What are the data sources or research
methods used to identify those soft skills?), six research or data collection methods
were identified as applied by researchers to study this topic. From them, job
advertisings and the use of surveys to software engineering practitioners (online or by
e-mail) are reported as the two most common.
Regarding our third research question (How are the identified soft skills defined
or characterized?), we found explicit definitions of only 20 soft skills of the 30
categories identified, some of them some of them confusing and imprecise. This may
be indication of how difficult it is to accurately define these skills.
As further work, we are now working with some software companies in Uruguay
to investigate, among other things: a) what impact have these soft skills (or its dearth)
on software teams performance and on software projects outcomes, b) what soft skills
are less developed in the members of their project teams, and c) what actions could be
taken to develop those skills, in order to enhance software development teams
performance and software projects outcomes.
Another aspect of great concern is the inclusion of the topic of soft skills in the
software engineering curriculum at university level. In this sense, we are now also
working on a survey among universities in Latin America to determine if, and how,
soft skills are taught in study programs related to software engineering, software
development, or computer science. Our aim is to create a network or a community of
practice of educators and researchers to foster research and teaching of human aspects
of software engineering and, in particular, the subject of soft skills.

References
[Aamodt, 16] Aamodt, M.: Industrial/Organizational Psychology: An Applied Approach,
Boston, Cengage Learning, 2016.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 39
[Acuña, 06] Acuña, S., Juristo, N. Moreno, A. M.: Emphasizing human capabilities in software
development, IEEE Software., Vol. 23, No. 2, pp. 94–101, 2006.
[Ahmed, 13] Ahmed, F., Capretz, L. F., Bouktif, S., Campbell, P.: Soft Skills and Software
Development: A Reflection from Software Industry, International. Journal of Information
Processing and Management, Vol. 4, No. 3, pp. 171–191, May 2013.
[Bancino, 07] Bancino, R., Zevalkink, C.: Soft skills: The New Curriculum for Hard-Core
Technical Professionals, Techniques: Connecting Education and Careers, Vol. 82, No. 5, pp.
20-22, 2007.
[Batteson, 16] Matteson, M. L., Anderson, L., Boyden, C.: "Soft Skills": A Phrase in Search of
Meaning. Portal: Libraries and the Academy Vol. 16, No. 1, pp. 71-88, 2016.
[Bhatnaga, 12] Bhatnaga, N.: Effective communication and soft skills. New Delhi, Dorling
Kindersley, 2012.
[Beecham, 08] Beecham, S., Baddoo, N., Hall, T., Robinson, H., Sharp, H.: Motivation in
Software Engineering: A systematic literature review, Information and Software Technology,
50(9), 2008, pp. 860-878.
[Capretz, 14] Capretz, L. F.: Bringing the Human Factor to Software Engineering, IEEE
Software, 31(2), pp. 102–104 (2014).
[Capretz, 17] Capretz, L. F., Ahmed, F., da Silva, F.: Soft sides of software, Information and
Software Technology, 92, pp. 92-94, 2017.
[Capretz, 18] Capretz L. F., and Ahmed, F.: A Call to Promote Soft Skills in Software
Engineering", Psychol Cogn Sci Open J., 4(1), 2018.
[Chou, 13] Chou, W.: Fast-tracking your career. Soft skills for engineering and IT
professionals, Wiley, Hoboken: NJ, 2013.
[Clarkson, 01] Clarson, M.: Developing IT staff. A practical approach. London, Springer-
Verlag, 2001.
[Dell’Aquila, 17] Dell’Aquila, E., Marocco, D., Ponticorvo, M., di Ferdinando, A., Schembri,
M., Miglino, O.: Educational Games for Soft-Skills Training in Digital Environments,
Switzerland, Springer, 2017.
[Finch, 13] Finch, D., Hamilton, L., Baldwin, R., Zehner, M.: An exploratory study of factors
affecting undergraduate employability, Education+Training, Vol. 55, No.7, pp. 681-704, 2013.
[Goldberg, 14] Goldberg, D. M., Rosenfeld, M.: People-Centric Skills. Interpersonal and
communication skills for auditors and business professionals. Hoboken: NJ, Wiley, 2014.
[Hillage, 98] Hillage, J., Pollard, E.: Developing a framework for policy analysis, No. 85,
Institute for Employment Studies, UK Department of Education and Employment, 1998.
[IEEE, 90] IEEE Standard Glossary of Software Engineering Terminology, IEEE std 610.12-
1990, 1990.
[Iriarte, 17] Iriarte, C., Bayona, S.: Soft skills in IT Project success: A systematic literature
review, 6th International Conference on Software Process Improvement, Zacatecas, Oct. 18-20,
2017, pp. 147-160.
[Kamin, 13] Kamin, M.: Soft skills revolution. A guide for connecting with compassion for
trainers, teams, and leaders, Pfeiffer, (2013).
40 Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ...

[Kitchenham, 16] Kitchenham, B. A., Budgen, D., Brereton, P.: Evidence-Based Software
Engineering and Systematic Reviews, CRC Press, Boca Raton, 2016.
[Klampfer, 01] Klampfer, B., Helmreich, R., Hausler, B., Sexton, B., Fletcher, G., Field, P.,
Staender, S., Lauche, L., Dieckmann, P., Amacher, A.: Enhancing performance in high risk
environments: Recommendations for the use of behavioral markers, Behavioral Markers
Workshop, Swissair Training Centre, Zurich, 2001.
[Klaus, 08] Klaus, P.: The hard truth about soft skills. Workplace lessons smart people wish
they’d learned sooner. Harper Collins, 2008.
[Lacher, 15] Lacher, L., Walia, G., Fagerholm, F., Pagels, M., Nygard, K., Münch, J.: A
Behavior Marker tool for measurement of the Non-Technical Skills of Software Professionals:
An Empirical Investigation, 27th International Conference on Software Engineering and
Knowledge Engineering (SEKE 2015), Pittsburgh, 2015.
[Marshall, 13] Marshall, C., Brereton, P.: Tools to Support Systematic Literature Reviews in
Software Engineering: A Mapping Study, in 2013 ACM / IEEE International Symposium on
Empirical Software Engineering and Measurement, 2013, pp. 296–299.
[Naiem, 15] Naiem, S., Abdellatif, M.: Evaluation of Computer Science and Software
Engineering Undergraduate’s Soft Skills in Egypt from Student’s Perspective, Computer and
Information Science, 8(1), 2015.
[Prince, 13] Prince, E. S.: The advantage. The 7 soft skills you need to stay one step ahead.
Financial Times Press, 2013.
[Radermacher, 13] Radermacher, A., Walia, G.: Gaps Between Industry Expectations and the
Abilities of Graduates, Proceeding of the 44th ACM Technical Symposium on Computer
Science Education (SIGCSE 2013), Denver, March 6–9, pp. 525-530, 2013.
[Ramesh, 10] Ramesh, G., Ramesh, M.: The ACE of soft skills. Attitude, communication and
etiquette for success. New Delhi, Dorling Kindersley, 2010.
[Rao, 10] Rao, M. S.: Soft skills enhancing employability. New Delhi, I. K. International
Publishing House, 2010.
[Rao, 14] Rao, M. S.: Enhancing employability in engineering and management students
through soft skills, Industrial and Commercial Training, 46 (1), pp. 42-48, 2014.
[Richter, 15] Richter, K., Dumke, R.: Modeling, Evaluation, and Predicting IT Human
Resources, CRC Press, Boca Raton, 2015.
[Sedelmaier, 15] Sedelmaier, Y., Landes, D.: SWEBOS – The Software Engineering Body of
Skills, International Journal of Engineering Pedagogy, 5(1), pp. 20-26, 2015.
[Starkweather, 11] Starkweather, J. A., Stevenson, H. S.: IT hiring criteria vs. valued IT
competencies, Managing IT Human Resources: Considerations for Organizations and
Personnel, IGI Global, Hershey, 2011.
[Thomas, 18] Thomas, M.: Training and Assessing Non-Technical Skills: A Practical Guide,
Boca Raton, CRC Press, 2018.
[Tulgan, 15] Tulgan, B.: Bridging the soft skills gap. How to teach the missing basics to
today’s young talent. Hoboken: NJ, Jossey-Bass, 2015.
[Verma, 09] Verma, S.: Soft skills for the BPO sector. New Delhi, Dorling Kinderley, 2009.
Matturro G., Raschetti F., Fontan C.: A Systematic Mapping Study ... 41
[Zurita, 16] Zurita, G., Baloian, N., Pino, J., Boghosian, M.: Introducing a Collaborative Tool
Supporting a Learning Activity Involving Creativity with Rotation of Group Members, Journal
of Universal Computer Science, 22 (10), pp. 1360-1379, 2016.

View publication stats


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/309016250

Giftedness and academic success in college and university: Why emotional


intelligence matters

Article  in  Gifted Education International · October 2016


DOI: 10.1177/0261429416668872

CITATIONS READS
20 1,013

3 authors:

James D A Parker Don Saklofske


Trent University The University of Western Ontario
208 PUBLICATIONS   34,965 CITATIONS    572 PUBLICATIONS   10,704 CITATIONS   

SEE PROFILE SEE PROFILE

Kateryna V Keefer
Trent University
29 PUBLICATIONS   1,177 CITATIONS   

SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Theory of mind reserarch View project

Taxonomy of Human Personality Constructs View project

All content following this page was uploaded by James D A Parker on 20 July 2020.

The user has requested enhancement of the downloaded file.


Article

Gifted Education International


2017, Vol. 33(2) 183–194
Giftedness and academic ª The Author(s) 2016
Reprints and permission:
success in college and sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0261429416668872
journals.sagepub.com/home/gei
university: Why emotional
intelligence matters

James DA Parker
Trent University, Canada

Donald H Saklofske
University of Western Ontario, Canada

Kateryna V Keefer
Trent University, Canada

Abstract
Much of the work on predicting academic success in postsecondary education has
focused on the impact of various cognitive abilities, although in recent years there has
been increased attention to the role played by emotional and social competency (also
called emotional intelligence (EI)). Previous work on the link between EI and giftedness is
reviewed, particularly factors connected to the successful transition to postsecondary
education. Data are presented from a sample of 171 exceptionally high-achieving sec-
ondary students (high school grade-point average of 90% or better) who completed a
measure of trait EI at the start of postsecondary studies and who had their academic
progress tracked over the next 6 years. High-achieving secondary students who com-
pleted an undergraduate degree scored significantly higher on a number of EI dimensions
compared to the secondary students who dropped out. Results are discussed in the
context of the importance of EI in the successful transition from secondary to post-
secondary education.

Keywords
giftedness, emotional intelligence, post secondary, achievement

Corresponding author:
James DA Parker, Department of Psychology, Trent University, Peterborough, Ontario K9 J 7B8, Canada.
Email: jparker@trentu.ca
184 Gifted Education International 33(2)

The transition to adulthood is a critical life period that has far-reaching personal, social,
and economic implications (Arnett, 2000). Important markers of the transition to adult-
hood include completing an education, becoming financially self-sufficient, living inde-
pendently, establishing one’s own family, and attaining psychosocial stability
(subjective feelings of self-esteem, belongingness, and life satisfaction). For individuals
who pursue postsecondary education, the transition from high school to university or
college is almost universally perceived as a highly stressful experience (Perry et al.,
2001), with stress levels typically rated higher in first year compared to subsequent years
(Ross et al., 1999). As one consequence, the highest dropout from university and college
typically takes place during the first year of study (Pancer et al., 2000). The reasons
reported by students for withdrawing or transferring from a specific university or college
are consistently linked to the stress of making the transition from high school that
includes settling on the ‘‘right’’ academic program, financial concerns, health problems,
and personal issues (Pancer et al., 2000). The latter reason (personal issues) is often the
most common, involving issues like problems making new friends, difficulties being
away from existing friends and family, and developing appropriate work habits for the
new learning environment (Parker et al., 2004, 2006).
While less is known about the successful transition to university for gifted students
(Rinn and Plucker, 2004), there are important empirical hints that the key issues are quite
similar to other groups of students. While intellectually gifted students do not comprise a
homogeneous group and may have experienced different types of academic programs
and accommodations prior to postsecondary entry (see Schwean et al., 2006), researchers
have reported that factors like the need to make new friends, adjusting to changes with
existing relationships, as well as adjusting to a qualitatively different learning environ-
ment are also key obstacles for gifted students making the successful transition to
university and college (Gómez-Arı́zaga and Conejeros-Solar, 2013; Hammond et al.,
2007; Muratori et al., 2003). This prior research on gifted students can only be sugges-
tive, however, since it has a number of methodological limitations. Previous research has
typically assessed academic success over quite narrow timelines (e.g. single terms), or
compromised the interpretability of results by combining into a common data set full-
and part-time students, young adults and mature students, and students at different stages
of the transition process (e.g. first-year students vs. students about to graduate from
university). The aim of the present study was to explore the predictors of successful
transition to postsecondary education in a relatively homogeneous sample of exception-
ally high-achieving secondary students.

Predicting the successful transition to postsecondary education


Traditionally, researchers studying postsecondary achievement and persistence relied on
a set of demographic and academic variables such as gender, socioeconomic status,
aptitude tests, and high school performance (Tinto, 1993). More recently, however,
models of student success and persistence recognize the importance of a more complex
network of variables connected to student involvement with university life, as well as
emotional and interpersonal adjustment (Berger and Milem, 1999; Pascarella and
Terenzini, 2005; Robbins et al., 2006). One relevant construct that has attracted the
Parker et al. 185

attention of educational researchers in recent years is that of emotional intelligence (EI;


Perera and DiGiacomo, 2013; Stough et al., 2009).
There are two important theoretical perspectives on the nature of EI: trait based and
ability based. Mayer et al. (1999) are representative of a group of researchers who
define EI as a set of intelligence-like abilities assessed with performance-based tests
where individuals solve problems designed to estimate their actual levels of emotional
knowledge. Alternatively, researchers including Bar-On (1997, 2000) and Petrides
(2010) construe EI as a set of personality dispositions that can be measured with
self-report questionnaires that tap into individuals’ subjective emotional beliefs,
values, and self-concepts.
Although all aspects of EI have been linked to important life outcomes, the trait or
self-efficacy component has proven to be the most robust correlate of socioemotional
functioning (Martins et al., 2010; Zeidner et al., 2012). Importantly, the trait EI effects
are independent of, and additive to the effects of EI abilities. In fact, recent studies
suggest that possessing sufficient emotional abilities may not be enough to motivate
adaptive behavior; one must also feel emotionally self-efficacious in order to put those
skills into action (Davis and Humphrey, 2014). This evidence has led many scholars to
recognize that both being emotionally competent (ability EI) and feeling emotionally
competent (trait EI) comprise unique and complementary aspects of EI (Cherniss, 2010;
Keefer, 2015).
Although both ability and trait aspects of EI have been linked with important aca-
demic outcome variables, the trait EI approach has generated the largest and most
consistent body of work (Perera and DiGiacomo, 2013). An influential trait EI model
in the field is one proposed by Bar-On (1997, 2000) that consists of four core dimen-
sions: (1) intrapersonal (comprised of several related abilities like recognizing and
understanding one’s feelings); (2) interpersonal (comprised of several related abilities
like empathy); (3) adaptability (consisting of abilities like being able to adjust one’s
emotions and behaviors to changing situations and conditions); and (4) stress manage-
ment (consisting of abilities like resisting or delaying an impulse). Contributing to
interest in the Bar-On model is the fact that reliable and valid self-report and observer
measures have been developed for the corresponding EI dimensions (for reviews, see
Parker et al., 2011; Wood et al., 2009).
Parker et al. (2004) examined the impact of trait EI on the academic achievement of
first-year students who had recently graduated from high school. At the start of the
academic year, students completed the short version of the Emotional Quotient Inven-
tory (EQ-i: S; Bar-On, 2002). At the end of the academic year, academically high-
achieving students (first-year grade-point averages (GPAs) of 80% or greater) were
found to score significantly higher on a cross section of EI dimensions compared to
underachieving students (first-year GPAs less than 60%). Parker et al. (2004) also note
that the two groups did not differ on course load or high school GPA. In a related study,
Parker et al. (2006) examined academic retention among first-year students. Students
who withdrew from the university before the start of a second year were found to score
significantly lower than students who persisted on most of the EI dimensions assessed by
the EQ-i: S. Similar results have been replicated with independent samples of
186 Gifted Education International 33(2)

postsecondary students from the United States (Parker et al., 2005) and the United
Kingdom (Qualter et al., 2009).
To help explain this pattern of results for academic achievement, it is worth noting
that EI has been linked to a number of positive indicators in postsecondary settings,
including fewer physical fatigue symptoms (Brown and Schutte, 2006; Thompson et al.,
2007), better overall adjustment and life satisfaction (Extremera and Fernández-
Berrocal, 2006; Saklofske et al., 2003), and less social anxiety and loneliness (Summer-
feldt et al., 2006). Overall, it would appear that students who have higher EI experience
more positive social support and tend to use more positive and adaptive coping strategies
(Austin et al., 2010; Saklofske et al., 2007).

Giftedness and EI
Past research on the relationship between EI and giftedness has produced a very mixed
set of findings. Some researchers have reported that levels of EI-related competencies
are quite similar in samples of gifted individuals and their typically developed peers
(Morawska and Sanders, 2008; Schwean et al., 2006), while others have reported gifted
students to be more vulnerable than other groups of students to social and emotional
problems like depression, shyness, and poor peer relationships (e.g. Plucker and Levy,
2001; Silverman, 1993; Wellisch and Brown, 2012). To make generalizations about the
link between EI and giftedness even more difficult, there is yet a third line of research
that suggests gifted students might actually be better prepared to cope with emotional
and social problems than their typically developed peers (Eklund et al., 2015; Neihart
et al., 2002).
There are certainly multiple reasons for the inconsistent findings in research on the
association between EI and giftedness. A key factor is likely the different operational
definitions that have been used to identify gifted and typically developed individuals,
with some studies placing emphasis on academic achievement versus cognitive abilities
(Martin et al., 2009). Zeidner et al. (2005) have also suggested that the methods used to
assess EI may play a key role in the inconsistent findings on the association between EI
and giftedness. In their work with high school students, Zeidner et al. (2005) found that
gifted students scored significantly higher than typically developed students only when
ability measures of EI were used.
The differential pattern of results reported by Zeidner et al. (2005) also hints at a
broader explanation for the lack of consistency in the EI and giftedness literature. When
giftedness is defined via extreme scores on traditional intelligence measures, the issue of
EI differences becomes quite mute. Theoretically, proponents of both ability and trait
models propose that EI is not related to conventional intelligence (Austin et al., 2008);
researchers developing tools connected to both ability and trait models (e.g. Bar-On,
1997; Mayer et al., 1999, 2002) have gone to great lengths to demonstrate that their
measures correlate very weakly with cognitive measures. However, it is worth noting
that the Mayer-Salovey-Caruso Emotional Intelligence Test (the ability measure used in
Zeidner et al., 2005) has been found to correlate moderately with measures of cognitive
intelligence (unlike trait EI measures; see Webb et al., 2014), which may account for its
positive correlation with giftedness.
Parker et al. 187

Present study
While EI-related abilities may ultimately not be different in gifted populations, an
interesting empirical question is whether the impact of EI in students making the transi-
tion to university can be generalized to gifted or high-achieving secondary students. To
date, this issue has not been systematically examined (Gómez-Arı́zaga and Conejeros-
Solar, 2013). The present study used a sample of 171 exceptionally high-achieving
secondary students (GPA of 90% or better in high school) who completed a measure
of trait EI at the start of their postsecondary studies. Following a procedure used by
Keefer et al. (2012), student’s academic progress was subsequently tracked over the next
6 years to see if the EI variables could be used to distinguish between high-achieving
secondary students who withdrew from the university and those students who completed
an undergraduate degree.

Method
Participants
The sample consisted of 171 undergraduate students (26 men, 145 women) enrolled at a
medium-sized university in Central Ontario, Canada. In terms of ethnicity, majority of
the participants (94.1%) identified themselves as White/Caucasian, 1.8% as Asian, and
the remaining 4.1% represented a mix of other cultural backgrounds. Participants were
on average 18.9 years of age (SD ¼ 0.62) at the time they started their postsecondary
education. All of the participants had been exceptionally high-achieving high school
students, with GPAs used for admittance to the university at 90.0% or higher.

Measures
Participants completed the EQ-i: S (Bar-On, 2002). This self-report tool contains 35
items measuring EI competencies in four domains: interpersonal (10 items), intraperso-
nal (10 items), adaptability (7 items), and stress management (8 items). Scores on the
four EI subscales can be further summed to provide a total EQ score. Higher scores on
the measures reflect higher levels of EI. With the participants’ consent, the following
information was also obtained from the university registrar’s records: high school GPA
(measured on a 100% scale); registration status at a 6-year follow-up (graduated vs.
withdrew).

Procedure
Participants came from several consecutive cohorts of first-year students totaling 3908
cases (72% women) from the same university. Newly registered students in each cohort
were approached by the researchers during introductory week activities (held in the first
week of September) and asked to volunteer for a study on ‘‘personality and academic
success.’’ At that time, consenting participants completed the EQ-i: S and provided
permission to obtain their high school GPA and to track their subsequent degree progress
via official university records.
188 Gifted Education International 33(2)

Exceptionally high-achieving students were identified as individuals who were admit-


ted to university with a high school GPA of 90.0% or higher. For the combined cohorts of
students (N ¼ 3908), this cutoff value produced a set of 4.4% of cases. Six-year gradua-
tion rate for all cohorts combined was 43.6%. The high-achieving group was found to
graduate at a significantly higher rate (p < 0.002) than other students in the data set
(56.2% vs. 43.2%), although scores on the various EQ-i: S scales/subscales were not
significantly different for the two groups (p > 0.05).

Results
A gender by group (graduated vs. withdrew) analysis of variance (ANOVA) was con-
ducted with high school GPA as the dependent variable. The main effects for gender and
group were not significant, nor was the interaction of gender and group. This eliminated
the need to include high school GPA as a covariate in subsequent analyses.
To examine the relationship between EI and academic success in high-achieving
students making the transition from high school to university, a gender by group
(graduated vs. withdrew) by EI dimension (interpersonal, intrapersonal, stress manage-
ment, vs. adaptability), ANOVA was conducted with EI level as the dependent variable.
Because of an unequal number of items on the EQ-i: S subscales, the ANOVA compared
mean item scores rather than scale scores. Table 1 presents the means and standard
deviations by gender and group for the various EQ-i: S scales. The main effect for
gender was not significant, nor was the interaction of gender and group, gender and
dimension, and the three-way interaction of gender, group, and dimension. The main
effect for group was significant, with the students who completed their degrees scoring
higher than the students who withdrew on overall EI (F(1167) ¼ 16.31, p < 0.001). The
main effect for dimension was also significant (F(3501) ¼ 23.132, p < 0.001). Multiple
comparisons (Student–Newman–Keuls procedure) found that students scored signifi-
cantly higher on interpersonal abilities compared to the other abilities assessed by the
EQ-i: S. Students also scored significantly higher on stress management compared to
adaptability and intrapersonal.
To understand the main effect for group, separate univariate F tests were con-
ducted comparing students who graduated and students who withdrew from the
university on each of the four EQ-i: S scales. The graduating students scored sig-
nificantly higher than the students who withdrew on interpersonal ability (F(1167) ¼
9.05, p ¼ 0.003), stress management (F(1167) ¼ 6.42, p ¼ 0.012), and adaptability
(F(1167) ¼ 21.07, p < 0.001).

Discussion
The present study found that exceptionally high-achieving high school students who
entered university with lower trait EI scores were significantly less likely to graduate
with a degree 6 years later, compared to their high-EI peers. These results are very
consistent with a previous study on the link between trait EI and successful transition
to university in the general student population, conducted at the same postsecondary
institution. Parker et al. (2006) found that, despite having comparable age, course load,
Table 1. Means and standard deviations on the EQ-i: S scales for gifted students who graduated or withdrew.

Graduated Withdrew Combined

Scale Men Women Total Men Women Total Men Women Total

Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD) Mean (SD)

Interpersonal 4.21 (.46) 4.38 (.42) 4.30 (.43) 3.85 (.51) 4.19 (.43) 4.13 (.46) 4.01 (.51) 4.30 (.43) 4.25 (.45)
Intrapersonal 3.72 (.65) 3.80 (.63) 3.78 (.63) 3.56 (.48) 3.47 (.76) 3.48 (.72) 3.63 (.55) 3.65 (.71) 3.64 (.68)
Adaptability 3.87 (.80) 3.75 (.58) 3.77 (.61) 3.51 (.37) 3.44 (.67) 3.45 (.63) 3.68 (.62) 3.61 (.64) 3.62 (.63)
Stress management 4.15 (.51) 4.02 (.51) 4.04 (.51) 3.43 (.52) 3.59 (.69) 3.56 (.66) 3.76 (.62) 3.82 (.64) 3.81 (.63)
Total 3.99 (.47) 3.99 (.37) 3.99 (.38) 3.58 (.38) 3.66 (.46) 3.65 (.44) 3.77 (.46) 3.84 (.44) 3.83 (.44)
Note: EQ-i: S: short version of the Emotional Quotient Inventory. N for graduate group was 91 (12 men and 79 women); N for withdraw group was 80 (14 men and
66 woman).

189
190 Gifted Education International 33(2)

and high school GPA, students who entered university with lower EI scores were sig-
nificantly more likely to withdraw from the university after the first year of study than
their higher EI peers. Together, these studies indicate that trait EI is a significant pre-
dictor of the successful postsecondary transition for gifted and typically developed
students alike.
It is worth noting that the EI dimensions (i.e. interpersonal, adaptability, and stress
management) that distinguished between students who withdrew and students who
graduated in the present study were also significant predictors of persistence in the
Parker et al. (2006) study. The interpersonal dimension involves abilities connected
to having good social skills and being able to interact effectively with other people
(Bar-On, 1997, 2000). The adaptability dimension involves skills related to the ability
to identify potential problems in the environment, as well as the use of realistic and
flexible coping strategies (Bar-On, 1997, 2000). The stress management dimension
involves the ability to manage stressful situations in a calm and productive manner.
Individuals who score high on this dimension are rarely impulsive and tend to work
well under pressure (Bar-On, 1997, 2000).
The link between academic success (completing an undergraduate degree) and EI in
high-achieving secondary students is hardly surprising. Graduating from high school and
going on to complete an undergraduate degree is a substantial accomplishment. It is
worth noting that slightly less than half of the students in the total cohort from which the
present sample was drawn actually graduated from the university. This is not an uncom-
mon completion rate for many postsecondary institutions in Canada and the United
States (Ross et al., 2012; Shaienks et al., 2008). Students at college and university are
confronted with a bewildering array of new personal and interpersonal challenges, even
more complicated if they attend school outside of their hometown (Witkow et al., 2015).
Not only do they need to modify existing relationships with family and high school
friends, they need to adapt to a dynamic learning environment that changes considerably
from first to upper year (Fussell et al., 2007). Not only does the academic environment
change but the financial costs of university add even more complexity to the task of
persisting, particularly if students must also balance school and work-related activities
(see, Moulin et al., 2013).
An interesting feature of the sample of exceptionally high-achieving secondary stu-
dents used in this study was that they did not differ from the rest of their student peers on
trait EI. This finding should probably not be considered surprising, since the measure of
EI used in the present study correlates very weakly with measures of cognitive intelli-
gence (Bar-On, 2002; Webb et al., 2014). Perhaps the inconsistent findings in the prior
literature on EI and giftedness is a product of the reality that gifted populations (partic-
ularly when individuals are identified via elite academic or IQ performance) are not
substantially different on emotional and social competencies. Given the growing evi-
dence that EI significantly contributes to educational performance (Keefer et al., 2012;
Perera and DiGiacomo, 2013), and the availability of psychoeducational programming
designed to enhance these competencies in students of all ages (Durlak et al., 2011;
Schutte et al., 2013; Vesely et al., 2014), university-based retention programs targeting
gifted students may want to pay particular attention to promoting various emotional and
social competencies.
Parker et al. 191

Ultimately the findings of the present study are quite consistent with the growing
consensus that successes in life, whether inside or outside of the classroom, ‘‘requires
both head strengths and heart strengths.’’ (Park and Peterson, 2010; Pfeiffer, 2013) One
limitation of the present study to note is that the sample of exceptionally high-achieving
students was comprised predominately of White female students. Future research needs
to encompass a wider range of ethnic backgrounds and include a more equivalent
proportion of male and female students.

Acknowledgements
This study was supported by a research grant to the first author from the Social Sciences and
Humanities Research Council of Canada.

Declaration of Conflicting Interests


The author(s) declared no potential conflicts of interest with respect to the research, authorship,
and/or publication of this article.

Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.

References
Arnett JJ (2000) Emerging adulthood: a theory of development from the late teens through the
twenties. American Psychologist 55: 469–480.
Austin EJ, Parker JDA, Petrides KV, et al. (2008) Emotional intelligence. In: Boyle GJ, Matthews
G and Saklofske DH (eds) The SAGE Handbook of Personality Theory And Assessment. Vol. 1.
London: SAGE Publications, pp 576–596.
Austin EJ, Saklofske DH and Mastoras SM (2010) Emotional intelligence, coping and
exam-related stress in Canadian undergraduate students. Australian Journal of Psychology
62: 42–50.
Bar-On R (1997) The Emotional Quotient Inventory (EQ-i). Technical Manual. Toronto:
Multi-Health Systems, Inc.
Bar-On R (2000) Emotional and social intelligence: insights from the emotional quotient inven-
tory. In: Bar-On R and Parker JDA (eds) Handbook of Emotional Intelligence. San Francisco:
Jossey-Bass, pp. 363–388.
Bar-On R (2002) Bar-On Emotional Quotient Inventory short form (EQ-i: Short): Technical
Manual. Toronto: Multi-Health Systems.
Berger JB and Milem JF (1999) The role of student involvement and perceptions of integration in a
causal model of student persistence. Research in Higher Education 40: 641–664.
Brown RF and Schutte NS (2006) Direct and indirect relationships between emotional intelligence
and subjective fatigue in university students. Journal of Psychosomatic Research 60: 585–593.
Cherniss C (2010) Emotional intelligence: toward clarification of a concept. Industrial and
Organizational Psychology 3: 110–126.
Davis SK and Humphrey N (2014) Ability versus trait emotional intelligence: dual influences on
adolescent psychological adaptation. Journal of Individual Differences 35: 54–62.
192 Gifted Education International 33(2)

Durlak JA, Weissberg RP, Dymnicki AB, et al. (2011) The impact of enhancing students’ social
and emotional learning: a meta-analysis of school-based universal interventions. Child Devel-
opment 82: 474–501.
Eklund K, Tanner N, Stoll K, et al. (2015) Identifying emotional and behavioral risk among gifted
and nongifted children: a multi-gate, multi-informant approach. School Psychology Quarterly
30: 197–211.
Extremera N and Fernández-Berrocal P (2006) Emotional intelligence as predictor of mental,
social, and physical health in university students. The Spanish Journal of Psychology 9: 45–51.
Fussell E, Gauthier AH and Evans A (2007) Heterogeneity in the transition to adulthood: The cases
of Australia, Canada, and the United States. European Journal of Population 23: 389–414.
Gómez-Arı́zaga MP and Conejeros-Solar ML (2013) Am I that talented? The experiences of gifted
individuals from diverse educational backgrounds at the postsecondary level. High Ability
Studies 24: 135–151.
Hammond D, McBee M and Hebert T (2007) Exploring the motivational trajectories of gifted
university students. Roeper Review 29: 197–205.
Keefer KV (2015) Self-report assessments of emotional competencies: a critical look at methods
and meanings. Journal of Psychoeducational Assessment 33: 3–23.
Keefer KV, Parker JDA and Wood LM (2012) Trait emotional intelligence and university
graduation outcomes: using latent profile analysis to identify students at risk for degree
non-completion. Journal of Psychoeducational Assessment 30: 402–413.
Martin LT, Burns RM and Schonlau M (2009) Mental disorders among gifted and nongifted youth:
a selected review of the epidemiologic literature. Gifted Child Quarterly 54: 31–41.
Martins A, Ramalho N and Morin E (2010) A comprehensive meta-analysis of the relationship
between emotional intelligence and health. Personality and Individual Differences 49:
554–564.
Mayer JD, Caruso D and Salovey P (1999) Emotional intelligence meets traditional standards for
an intelligence. Intelligence 27: 267–298.
Mayer JD, Salovey P and Caruso DR (2002) Mayer-Salovey-Caruso Emotional Intelligence Test
(MSCEIT) User’s Manual. Toronto: Multi-Health Systems.
Morawska A and Sanders MR (2008) Parenting gifted and talented children: what are the key child
behaviour and parenting issues? Australian and New Zealand Journal of Psychiatry 42:
819–827.
Moulin S, Doray P, Laplante B, et al. (2013) Work intensity and non-completion of university:
longitudinal approach and causal inference. Journal of Education and Work 26: 333–356.
Muratori M, Colangelo N and Assouline S (2003) Early-entrance students: impressions of their
first semester of college. Gifted Child Quarterly 47: 219–238.
Neihart M, Reis SM, Robinson NM, et al. (2002) Social and Emotional Development of Gifted
Children: What Do We Know? Waco: Prufrock Press.
Pancer SM, Hunsberger B, Pratt MW, et al. (2000) Cognitive complexity of expectations and
adjustment to university in the first year. Journal of Adolescent Research 15: 38–57.
Park N and Peterson C (2010) Does it matter where we live? The urban psychology of character
strengths. American Psychologist 65: 535–547.
Parker JDA, Duffy J, Wood LM, et al. (2005) Academic achievement and emotional intelligence:
predicting the successful transition from high school to university. Journal of First-Year
Experience and Students in Transition 17: 67–78.
Parker et al. 193

Parker JDA, Hogan MJ, Eastabrook JM, et al. (2006) Emotional intelligence and student retention:
predicting the successful transition from high school to university. Personality and Individual
Differences 41: 1329–1336.
Parker JDA, Keefer KV and Wood LM (2011) Toward a brief multidimensional assessment of
emotional intelligence: psychometric properties of the emotional quotient inventory–short
form. Psychological Assessment 23: 762–777.
Parker JDA, Summerfeldt LJ, Hogan MJ, et al. (2004) Emotional intelligence and academic
success: examining the transition from high school to university. Personality and Individual
Differences 36: 163–172.
Pascarella ET and Terenzini PT (2005) How College Affects Students: A Third Decade of
Research. San Francisco: Jossey-Bass.
Perera HN and DiGiacomo M (2013) The relationship of trait emotional intelligence with aca-
demic performance: a meta-analytic review. Learning and Individual Differences 28: 20–33.
Perry RP, Hladkyj S, Pekrun RH, et al. (2001) Academic control and action control in the
achievement of college students: a longitudinal field study. Journal of Educational Psychology
93: 776–789.
Petrides KV (2010) Trait emotional intelligence theory. Industrial and Organizational Psychology
3: 136–139.
Pfeiffer SI (2013) Lessons learned from working with high-ability students. Gifted Education
International 29: 86–97.
Plucker JA and Levy JJ (2001) The downside of being talented. American Psychologist 56: 75–76.
Qualter P, Whiteley H, Morley A, et al. (2009) The role of emotional intelligence in the decision to
persist with academic studies in HE. Research in Post-Compulsory Education 14: 219–231.
Rinn A and Plucker J (2004) We recruit them, but then what? The educational and psychological
experiences of academically talented undergraduates. Gifted Child Quarterly 48: 54–67.
Robbins SB, Allen J, Casillas A, et al. (2006) Unraveling the differential effects of motivational
and skills, social, and self-management measures from traditional predictors of college out-
comes. Journal of Educational Psychology 98: 598–616.
Ross T, Kena G, Rathbun A, et al. (2012) Higher Education: Gaps in Access and Persistence
Study. (NCES 2012-046). U.S. Department of Education, National Center for Education
Statistics. Washington: Government Printing Office.
Ross SE, Niebling BC and Heckert TM (1999) Sources of stress among college students. College
Student Journal 32: 312–317.
Saklofske DH, Austin EJ, Galloway J, et al. (2007) Individual difference correlates of
health-related behaviours: preliminary evidence for links between emotional intelligence and
coping. Personality and Individual Differences 42: 491–502.
Saklofske DH, Austin EJ and Minski PS (2003) Factor structure and validity of a trait emotional
intelligence measure. Personality and Individual Differences 34: 707–721.
Schutte NS, Malouff JM and Thorsteinsson EB (2013) Increasing emotional intelligence through
training: current status and future directions. International Journal of Emotional Education 5: 56–72.
Schwean VL, Saklofske DH, Widdifield-Konkin L, et al. (2006) Emotional intelligence and gifted
children. E-Journal of Applied Psychology 2: 30–37.
Shaienks D, Gluszynski T and Bayard J (2008) Postsecondary education, participation and drop-
ping out: differences across university, college and other types of postsecondary institutions.
Ottawa: Statistics Canada (ISBN: 978-1-100-10900-8).
194 Gifted Education International 33(2)

Silverman LK (1993) The Gifted Individual. Denver: Love.


Stough C, Saklofske DH and Parker JD (2009) A brief analysis of of 20 years of emotional
intelligence. In: Stough C, Saklofske DH and Parker JD (eds) Assessing Emotional Intelligence.
New York: Springer, pp. 3–8.
Summerfeldt LJ, Kloosterman PH, Antony MM, et al. (2006) Social anxiety, emotional intelli-
gence, and interpersonal adjustment. Journal of Psychopathology and Behavioral Assessment
28: 57–68.
Thompson BL, Waltz J, Croyle K, et al. (2007) Trait meta-mood and affect as predictors of somatic
symptoms and life satisfaction. Personality and Individual Differences 43: 1786–1795.
Tinto V (1993) Leaving College: Rethinking the Causes and Cures of Student Attrition, 2nd ed.
Chicago: University of Chicago.
Vesely AK, Saklofske DH and Nordstokke DW (2014) EI training and pre-service teacher well-
being. Personality and Individual Differences 65: 81–85.
Webb CA, DelDonno S and Killgore WD (2014) The role of cognitive versus emotional intelli-
gence in Iowa gambling task performance: What’s emotion got to do with it? Intelligence 44:
112–119.
Wellisch M and Brown J (2012) An integrated identification and intervention model for intellec-
tually gifted children. Journal of Advanced Academics 23: 145–167.
Witkow MR, Huynh V and Fuligni AJ (2015) Understanding differences in college persistence: a
longitudinal examination of financial circumstances, family obligations, and discrimination in
an ethnically diverse sample. Applied Developmental Science 19: 4–18.
Wood LM, Parker JDA and Keefer KV (2009) The emotion quotient inventory: a review of the
relevant research. In: Stough C, Saklofske DH and Parker JDA (eds) Assessing Emotional
Intelligence: Theory, Research and Applications. New York: Springer, pp. 67–84.
Zeidner M, Matthews G and Roberts RD (2012) The emotional intelligence, health, and well-being
nexus: what have we learned and what have we missed? Applied Psychology: Health and
Well-Being 4: 1–30.
Zeidner M, Shani-Zinovicha I, Matthews G, et al. (2005) Assessing emotional intelligence in
gifted and non-gifted high school students: Outcomes depend on the measure. Intelligence
33: 369–391.

Author biographies
James DA Parker is professor of Psychology at Trent University. His research focuses
on the development of emotional competencies and in the consequences for personality
development, psychopathology and wellness when there are deficits in these abilities.

Donald H Saklofske is professor of Psychology at the University of Western Ontario.


His research focuses on intelligence, emotional intelligence, personality and psycholo-
gical assessment.

Kateryna V Keefer is an assistant professor of Psychology at Trent University,


Ontario, Canada. Her research focuses on emotional competencies, resilience and well-
being, and psychological assessment.

View publication stats


PHYSICAL REVIEW FLUIDS 5, 110515 (2020)
Invited Articles

Adopting a communication lifestyle


*
N. S. Sharp
Sharp Science Communication Consulting, LLC, Denver, Colorado 80210, USA

(Received 16 June 2020; accepted 1 October 2020; published 24 November 2020)

As fluid dynamicists, good communication is key to our success in academia, industry,


and research. Yet our training in this critical skill is often lacking. In this article, I present
a framework for the science communication process and techniques for integrating com-
munication training into everyday practices. With some forethought and habit-building,
preparing engaging conference presentations, writing journal articles, or interacting with
journalists does not have to be a painful, last-minute scramble. Instead, these activities can
be just another part of your communication lifestyle.

DOI: 10.1103/PhysRevFluids.5.110515

I. INTRODUCTION
Every scientist and engineer is, by necessity, a communicator. In academia, research often occurs
at interdisciplinary boundaries and requires collaboration between experts from vastly different
technical backgrounds. Within fluid dynamics, it is not unusual to find overlaps between biology,
medicine, robotics, and even paleontology. In industry, engineers frequently work in teams on large-
scale projects that require clear communication between members with different specializations.
Yet communication skills often receive short shrift in our education. A 2010 survey by the
American Society of Mechanical Engineers found that industry managers consider entry-level
engineers lacking in both oral and written communication skills [1]. Despite increased attention
to technical communication in engineering education, discrepancies remain between the skills
expected by industry and those that are taught and practiced in the academic curriculum [2,3].
Simultaneously, social media has exploded with users interested in science communication, and
many scientists now find themselves using these platforms to discuss, promote, and engage with
others about scientific content.
The term “science communication” itself is broad and not well defined. As a field of academic
research, social scientists disagree as to whether science communication constitutes its own in-
dependent discipline [4,5]. In practice, science communication is sometimes conflated with science
outreach and viewed as a vehicle for scientists explaining science to a public audience. But although
communicating with the public is an aspect of science communication, those activities do not
represent the sum of science communication. As Burns et al. [4] identify in their own definition,
“Science communication may involve science practitioners, mediators, and other members of the
general public, either peer-to-peer or between groups.” In other words, even standard academic
activities—like journal publication and conference presentations, as well as the team-based activi-
ties of industry—constitute science communication.

*
nicole.sharp@gmail.com

Published by the American Physical Society under the terms of the Creative Commons Attribution
4.0 International license. Further distribution of this work must maintain attribution to the author(s) and
the published article’s title, journal citation, and DOI.

2469-990X/2020/5(11)/110515(10) 110515-1 Published by the American Physical Society


N. S. SHARP

FIG. 1. The four phases of the science communication process.

In my own experience, communicating science—both to the public and my academic


colleagues—has been a rewarding way to strengthen my oral and written communication skills and
build subject expertise. The skills that underlie good science communication include compelling,
understandable visuals; clear, concise, and technically accurate explanations; and the use of narra-
tive structures. Regardless of one’s final audience, the best technical communicators exploit these
skills; to become better communicators, we must develop such skills through deliberate practice [6].
Due to the wide range of fields touched by fluid dynamics and the increasingly interdisciplinary
nature of research in the subject, fluid dynamicists require strong communication skills to succeed,
regardless of their interest in outreach or public engagement. This article aims to improve communi-
cation skills within the fluid dynamics research community by providing techniques and resources
for integrating communication training and practice into regular research activities. Rather than
dedicating attention to writing and speaking about science only to the weeks before manuscript
submission or conference deadlines, we should instead build habits that acknowledge the integral
nature of communication to our work.
To that end, in Sec. II, I introduce an overview of the science communication process and
the questions each communicator should answer when embarking on a project. Habit-building
forms the basis for Sec. III, which is divided into subsections around individual and group-based
practices. The former focuses on general habit-building around planning and critiquing work, and
the latter suggests specific methods for integrating science communication training into regular
group meetings. I also illustrate principles of good communication with several examples from the
fluid dynamics community. Together these efforts can support a better communication lifestyle for
both students and professionals.

II. SCIENCE COMMUNICATION PROCESS


In my projects, I break down the process of developing any communication product into four
phases, as illustrated in Fig. 1 [7]. The first phase, purpose, focuses on establishing guiding
principles for the rest of the process by answering three key questions:
(1) What is my goal?
(2) Who is my audience?
(3) What is my message?
Every communication effort has an underlying motivation, whether one is a researcher making a
Gallery of Fluid Motion video or an engineer writing a white paper to secure funding. Developing
a final product that supports that goal requires explicitly identifying what communicators want
to accomplish. If their goal, for example, is to recruit new graduate students, then aiming their
department seminar to the senior-most scientists in attendance is unlikely to help.
The audience is another critical component to consider. As undergraduates, students write for
an audience of one—their professor—with the singular goal of demonstrating their knowledge.
For graduate students, researchers, and industry members, this is no longer the case. Even when
addressing a technical audience, the savvy communicator must consider the range of experience
and familiarity their audience has with the topic, as well as what the appropriate tone, level of
detail, and use of terminology should be.

110515-2
ADOPTING A COMMUNICATION LIFESTYLE

Finally, communicators must consider their message. They may have several key points to make,
but whether they are producing a blog post or a textbook, there should always be one overarching
message a reader or listener should leave understanding. That message serves as the keystone about
which the work is structured. In beginning composition, writers typically think of this message as
their thesis statement. In the world of research, the key message often concerns the implications of
a research project’s results.
I use this exercise of answering three purpose questions regularly, whether I am developing a
presentation, writing an article, or preparing a YouTube video. It is also an exercise I ask each of
my clients to complete as we determine the form and scope of any project. Those familiar with
science communication and scientific writing guides will recognize that these three questions echo
many other authors’ recommendations [8–10]. For those accustomed to simply opening a document
or presentation and typing, taking the time to complete this exercise can save future heartache and
revision later.
The second phase of the science communication process is construction, where the hard work
of making version 1.0 takes place. The details of this phase depend on the type of product the
communicator is making, so I will not elaborate here. Where the purpose phase is about determining
the bigger picture, this phase takes place “down in the weeds,” dealing with the practical details of
writing a paper, preparing a poster, or producing a video.
Once a first version of the product is complete, it is time for the third phase, evaluation and
refinement. In this stage, the communicator returns to the bigger picture determined in the purpose
phase and asks if their creation will help achieve their goal and whether their audience will
understand the work and its message as intended. To evaluate that aspect of a work-in-progress,
I highly recommend test audiences, friendly review [10], and/or peer review from those unfamiliar
with the work. I will return to those topics in Sec. III, and here merely note that this outside
perspective is critical to revising and refining any product before it is ready for release.
The final stage of the science communication process, reflection, typically takes place after a
project is completed, e.g., the paper published, presentation given, or video uploaded. In this stage, I
encourage communicators to take time to appreciate what worked and what did not, lessons learned
for the future, and, importantly, what skills they have gained or improved as they completed the
project.
Taken altogether, the science communication process I present here is extremely general and
may be adapted to a variety of projects. It is, by no means, the sole way to approach science
communication, but it is a construct that has helped me, my clients, and my students. With this
framework in mind, I now turn to some specifics of communication habit-building.

III. COMMUNICATION HABIT-BUILDING


Communicating science is a complex cognitive skill that integrates planning, knowledge syn-
thesis and transformation, language production, and idea review, often simultaneously. From a
psychological perspective, meeting these substantial demands requires deliberate practice as Kel-
logg and Whiteford [6] describe for training advanced writing skills (emphasis in original):
“Becoming an expert writer entails gaining control over perceptual, motor, and cognitive pro-
cesses so that one can respond adaptively to the specific needs of the task at hand, just as a
professional violinist or basketball player must do. This occurs by reducing the demands that
relevant processes make on the limited resources of executive attention and working memory
storage. For the skill as a whole to be well controlled, its component processes must become
relatively automatic and effortless through practice. The term deliberate practice refers to practice
undertaken with a specific goal to improve.”
Thus, when choosing communication exercises or habits to pursue, consider how the activity
will help the growing communicator reduce or manage the myriad cognitive loads they are jug-
gling. Some communicators—both native and non-native English speakers—will benefit from rote,
sentence-level revision exercises (Sec. III B 6), whereas others may find planning exercises most

110515-3
N. S. SHARP

useful (Sec. III A 1). To put matters more colloquially, with any exercise suggested here, your
mileage may vary. I encourage readers, regardless of their communication experience level, to
sample widely to discover what works.
In the next two subsections, I recommend habits and exercises to pursue both individually and as
a group.

A. Habit-building as an individual
One hurdle for many inexperienced communicators is recognizing the extensive process behind
a finished product. As Heard [10] describes, “Most writers struggle. I didn’t realize this because I
had been seeing their writing product, not their writing process, which led to finished work that was
clear, smooth, and easy to understand” (emphasis in original). This need to share process and not
simply the product is one reason I have developed the science communication process outlined in
Sec. II. It is also why I encourage professors to let their students see their own working process, not
just a finalized presentation or grant proposal.
The following recommendations and exercises are aimed at individuals. Although anyone may
benefit from them, the first two sections are most beneficial for students, whereas the final section
is aimed more at those in advisory roles.

1. Planning
In my experience, planning is the most crucial (and often absent) piece of the process for students.
For this reason, I emphasize the purpose phase described in Sec. II in my science communication
workshops. Every project I undertake—including preparing this paper and the talk that preceded
it—begins by setting aside time to consider my goals, audience, and message well before I pick up
a pen or sit at the keyboard to start an outline. Others may find that wordstacks or concept maps
provide a more useful starting point for their planning; see Chapter 7 of [10] for descriptions of
those and other planning techniques.
To avoid procrastination or becoming stuck in the initial planning phase, it is helpful to break
large projects down by setting intermediate deadlines. For example, submitting a Gallery of
Fluid Motion video involves two external deadlines: the abstract submission and the final video
submission. But to ensure they meet those deadlines with a good product, a researcher could set
themselves a series of internal deadlines including: answering the three purpose questions; writing a
script; finalizing a storyboard; completing filming; creating a rough cut; receiving feedback from test
audiences; and finalizing the video. Intermediate deadlines not only help produce a better product,
they counter the sense of being overwhelmed by turning a large task into a series of smaller, more
manageable ones.

2. Getting started
For those who still struggle to get words onto the page—due to strong self-editing, for example—
freewriting exercises can help. The ground rules are simple: after determining what information
belongs in the next section, set a timer—five minutes is a good starting point—and start writing.
Stopping and rereading what is written are not allowed. Revision is not allowed until later. Often
a few minutes of this exercise is enough to give writers the momentum needed to continue. Not
everything produced this way will be worth keeping, but freewriting can help writers get past a
dreaded first draft and into the potentially easier task of revision.

3. Revision and critique


For others—chiefly, instructors or evaluators—revision is dreaded due to the time requirements
of providing critique. But there are several avenues to help senior researchers to avoid becoming
overwhelmed by giving feedback. One is peer review, which in writing circles is different than the
peer review to which scientists are accustomed; I will return to peer review in Sec. III B 8.

110515-4
ADOPTING A COMMUNICATION LIFESTYLE

Changing one’s critiquing techniques can also reduce the time needed to give feedback. Often
professors attempt to provide detailed written feedback on large sections of a student’s work. In
some cases, the student receives a draft dripping with red ink; in others, professors choose to simply
rewrite the draft entirely. Both methods involve a large time commitment for the evaluator and
provide the student with little room to grow as a communicator.
To see why, consider the technical equivalent: a student comes to office hours because they are
struggling with a problem set. Rather than sitting down with the student and identifying specific
issues, the instructor simply hands them the solution key and sends them away. Most of us would
recognize this as a poor recipe for learning fluid dynamics, yet this is exactly how communication
feedback is often treated.
Instead, I suggest an alternative method for critique, based on techniques I was taught as a peer
writing tutor. Rather than attempting to fix every mistake at once, examine a draft for major flaws or
repeated mistakes. Perhaps the writer’s verbosity makes their logic vague and hard to follow. Find
a passage that exemplifies these issues and discuss that passage together. Point out the problem and
guide the student into finding their own clearer rephrasing, perhaps prompted by a few suggestions.
After repeating this process of identifying problems and finding solutions for a subsection of the
draft, encourage the student to continue the process on their own for the remainder of the draft
before returning for additional help.
This methodology is helpful in two ways: it saves the evaluator time they would have spent on
extensive correction, and it places responsibility for the work back into the student’s hands, allowing
them to learn both how to identify and to correct issues on their own. Without that opportunity, the
student will likely continue making the same mistakes in subsequent projects.
In particular, students who are non-native speakers of English may struggle when writing and
presenting in technical English. Remember that university writing centers are an excellent resource;
if evaluators can help a student identify a specific, persistent problem, the student can then look for
help from the writing center to address it.
Working one-on-one or individually is one way to build communication habits, but the rewards
are even richer when working as a group.

B. Habit-building as a group
Regular research group meetings are an excellent venue for practicing communication skills to-
gether. Simply integrating a communication exercise into each meeting provides all group members
regular training and deliberate practice to advance their mastery. Even outside traditional group
meetings, informal student groups can effectively improve communication skills [11].
The following exercises and resources are not an exhaustive list, but they should provide a good
starting point. I present them, roughly, in an order corresponding to the science communication
process presented in Sec. II, beginning with those useful for the purpose phase.

1. Audience adaptation
Inexperienced communicators can struggle with identifying their audience and recognizing
how audience affects the level of detail, jargon, and tone they should use. Figure 2 shows how
Dickerson et al. [12] explain their study of mammals shedding water through shaking to three
different audiences. For an academic audience, the authors use more specialized language and
longer, more complex sentence structures. In a message aimed toward politicians, they simplify
their sentences and concentrate on useful applications of their work while still including the most
significant results. In a book for the general public, Hu uses even less formal language and frames
the work in the context of a story about one specific dog.
In a group format, it is useful to discuss these issues explicitly—highlighting, for example, the
differences between speaking to other laboratory members, a conference audience, a classroom of
high schoolers, or a politician. Then the group can take a single message, like a description of the
laboratory’s focus or a particular member’s latest work, and subdivide into smaller groups, each

110515-5
N. S. SHARP

FIG. 2. Explanations of mammals shaking themselves dry aimed at three different audiences: academic
audiences [12], the public [13], and politicians [14]. Wet dog image courtesy of A. Dickerson and D. Hu, used
with permission.

tasked with adapting that message for a different, specific audience. Afterwards, each subgroup
can present their results and the group as a whole can discuss just how the message changes.
Exercises like this allow all members to participate, even in a fairly large group, and feature the
interactivity and discussion communication researchers Silva and Bultitude [15] found are effective
best practices in science communication training.

2. Message discovery and refinement


Discovering and refining a key message can be difficult, even for experienced communicators.
Being familiar with every detail of a work makes it hard to step back and identify a single
overarching message. For this, I recommend the “Half-Life Your Message” exercise [16]. The
concept is simple: A speaker first explains their topic in 60 seconds, then immediately explains
the same information in 30 seconds, in 15 seconds, and finally in 8 seconds. The exercise forces
speakers to quickly refine their ideas down to the most critical message. It takes roughly three
minutes total for any given speaker; a video timer for the exercise is available at [17].
In large groups, I first demonstrate the exercise myself using a topic suggested by participants.
Then, we repeat the exercise several times in subgroups of three or four to allow each person a
chance to speak, reflect, and give and receive feedback. Afterwards, the group as a whole discusses
the results in terms of what worked and what did not, as well as whether they liked a particular
version of their message. The entire exercise can be completed, even in a large classroom, in
about 15 minutes. I have run this particular exercise with undergraduate students, graduate students,
professors, and military officers at institutions across the United States, and it is consistently popular.

3. Narrative structure
In scientific writing and presenting, we use a standard narrative structure: introduction, methods,
results, and discussion, known collectively as IMRaD [10]. Review papers, book chapters, and other
scientific writing can deviate from this structure, but it is by far the most familiar. However, outside
of journal articles, it is often not the most useful or engaging structure for communicating science.
Especially when taking scientific work to more general audiences—including to a grant proposal
committee or a journalist—it is worth considering alternative narrative structures that, unlike

110515-6
ADOPTING A COMMUNICATION LIFESTYLE

FIG. 3. Comparing two results as complementary halves of a figure allows viewers to quickly and easily
identify differences. Screenshot from Ref. [22], used with permission.

IMRaD, return the focus to the characters engaged in the narrative, namely, the scientists. The
Hero’s Journey—the narrative structure behind stories like The Lord of the Rings, Harry Potter, and
Star Wars: A New Hope—is particularly adaptable to telling scientific stories [18]. My recorded
2018 talk on this subject is available at [19].
For those interested in a deeper dive into narrative structure, science writers and journalists
share the behind-the-scenes development of their articles on The Open Notebook website [20].
Hart [21] also provides extensive guidance on writing and structuring narrative nonfiction. To see
these principles in action in the context of fluid dynamics, I highly recommend Ref. [13].

4. Improving figures and visuals


As a subject, fluid dynamics enjoys the advantage of striking and beautiful imagery suitable for
both outreach and capturing media attention. But with this beauty comes challenges: researchers
must find a balance between aesthetics and scientific rigor, and they must also contend with visual
complexity that can easily confuse audiences.
The critical question to ask for any figure is what do I want my audience to see? A good scientific
visual should be quickly and easily understandable, preferably without the need to repeatedly look
between the figure and its caption. The context in which the figure is presented also matters.
Consider a comparison between two results, a common visual in fluid dynamics. The standard
approach to such a figure places one or more full images next to one another. This placement requires
the viewer to glance back and forth between the images to recognize their differences and works
well enough in a journal article where the reader can take the time they need to do so. But in a
presentation or a Gallery of Fluid Motion video, the pacing is dictated for the viewer. To make a
comparison clear more quickly, it helps to use complementary halves of an image like the ones in
Fig. 3, from Durey et al.’s 2017 Milton Van Dyke award-winning video on bursting droplets driven
by Marangoni effects [22]. By placing half of each example directly next to one another, even subtle
differences between the two cases are instantly accessible to a viewer.
Group meetings are perfect vehicles for workshopping visuals. Every laboratory member, no
matter how junior, can provide feedback on what they see and understand from a figure, as well as

110515-7
N. S. SHARP

what changes would make the author’s intended message clearer. For an excellent recent resource
on developing and refining scientific visuals, I recommend Ref. [23].

5. Lightning talks
Lightning talks are informal, 60-second presentations given without a visual aid. They are
often—but not always—given impromptu, similar to an “elevator pitch.” The exercise combines
well with discussions of related topics, like audience and message refinement, and provides speakers
with a chance to practice developing skills in a low-risk environment while getting useful feedback
from fellow participants [15]. The “Three Minute Thesis” competitions held at universities around
the world represent a similar idea and may be attractive for some students as a venue for practicing
their communication skills [24]. Within the fluids community, the “flash presentations” introduced
at the 2019 APS DFD meeting are another such opportunity.
With this and other exercises, sticking with strict scientific topics is not necessary. A student
might instead explain bicycle maintenance or why they prefer PYTHON over MATLAB.

6. Revising for clarity and structure


Sometimes revision involves digging into sentence- and paragraph-level constructions to improve
clarity, brevity, or logical flow. Such exercises may be especially helpful for non-native speakers of
English, but native speakers will benefit as well. Scientific writing guides like [9] contain excellent,
premade revision exercises. For higher-level issues or more free-form exercises, Ref. [10] also
provides many options easily implemented at the group or individual level.

7. Critically evaluating science communication


Groups often form journal clubs to review recent articles. Rather than focusing solely on the
article’s technical content, these groups can also examine how and how well the authors commu-
nicate. Do the authors’ arguments flow clearly and logically? Can the figures stand separate from
the text? What style and structure do the authors’ use? Is the language overly complicated or clear
and concise? Discussions like these help students identify and address weaknesses in their own work
without the potential discomfort of presenting themselves [25]. The exercise can also highlight good
examples of science communication that participants can emulate in their own work.

8. Peer review
Numerous studies show the value of peer review—students evaluating and providing feedback
on one another’s work—in improving communication skills and reducing the evaluation load
on instructors [9,14,15,24]. Such active, collaborative learning mimics best practices in science
communication training [15], helps students identify and refine their own methods [25], and can
foster valuable mentorship roles between junior and senior students [11]. Even among undergrad-
uate engineers, Nelson [25] found that students struggling with their own communication skills
were often able to provide others with sound advice. Simpson et al. [11] describe successful
department-level communication initiatives that prompted graduate students to form their own peer
review networks.
When fostering peer review or implementing many of the exercises described above, it is
important to emphasize constructive critique [26]. Combative attitudes and singling out individuals
make the presenter feel attacked and defensive—not a useful environment for improvement. For this
reason, I highly encourage collaborative attitudes toward revision and critique.

IV. CONCLUSION
The interdisciplinary nature and visual complexity inherent to fluid dynamics makes commu-
nication skills critical for every student and practitioner. Mastering these skills requires frequent,
deliberate practice, but integrating such practice into everyday research activities need not be

110515-8
ADOPTING A COMMUNICATION LIFESTYLE

difficult or onerous. Both research groups and informal groups can pursue simple, regular com-
munication exercises and training that help all participants improve their skills.
Part of improving these skills is recognizing the process of science communication, which
consists of four stages: (1) identifying big-picture issues around goals, audience, and message; (2)
constructing a first version of the product; (3) evaluating and refining the product based on the
bigger picture and outside feedback; and (4) reflecting on the lessons learned during the project.
Of these stages, communicators often struggle most with the first and third. To this end, I have
suggested multiple exercises aimed at planning, message refinement, revision, and critique. My
hope is that these resources serve as a springboard for those looking to adopt or refine their scientific
communication lifestyle.

ACKNOWLEDGMENTS
First and foremost, I thank J. Hertzberg, D. Hu, and J. Aurnou for their vocal support both of
this paper and the talk that preceded it. Thanks are owed also to F. C. Frankel, with whom I first
developed my concept of the science communication process, and A. Athanassiadis, who introduced
me to the Half-Life Your Message exercise. I also thank G. Durey and D. Hu for granting permission
to use their work as examples.

[1] A. T. Kirkpatrick, S. Danielson, R. O. Warrington, R. N. Smith, K. A. Thole, W. J. Wepfer, and T. Perry,


Vision 2030: Creating the Future of Mechanical Engineering Education, ASEE Annual Conference and
Exposition (ASEE, Vancouver, 2011).
[2] J. A. Donnel, B. M. Aller, M. P. Alley, and A. A. Kedrowicz, Why industry says that engineering graduates
have poor communication skills: What the literature says, ASEE Annual Conference and Exposition
(ASEE, Vancouver, 2011), pp. 22.1687.1–22.1687.13.
[3] P. Lappalainen, Communication as part of the engineering skills set, Eur. J. Eng. Educ. 34, 123, (2009).
[4] T. W. Burns, D. J. O’Conner, and S. M. Stocklmayar, Science communication: A contemporary definition,
Public Underst. Sci. 12, 183, (2003).
[5] B. Trench and M. Bucchi, Science communication, an emerging discipline, J. Sci. Commun. 09, 1 (2010).
[6] R. T. Kellogg and A. P. Whiteford, Training advanced writing skills: the case for deliberate practice, Educ.
Psychol. 44, 250 (2009).
[7] This model was initially developed in collaboration with Felice C. Frankel for a joint talk we gave
at the Massachusetts Institute of Technology in 2016, and I have continued developing the concept
independently since then.
[8] C. Cormick, The Science of Communicating Science (CSIRO, Clayton, Victoria, Australia, 2019).
[9] A. E. Greene, Writing Science in Plain English (The University of Chicago Press, Chicago, 2013).
[10] S. B. Heard, The Scientist’s Guide to Writing (Princeton University Press, Princeton, 2016).
[11] S. Simpson, R. Clemens, D. R. Killingsworth, and J. D. Ford, Creating a culture of communication: A
graduate-level STEM communication fellows program at a science and engineering university, Across the
Disciplines 12, (2015).
[12] A. K. Dickerson, Z. G. Mills, and D. L. Hu, Wet mammals shake at tuned frequencies to dry, J. R. Soc.
Interface 9, 3208 (2012).
[13] D. L. Hu, How to Walk on Water and Climb up Walls (Princeton University Press, Princeton, 2018).
[14] D. L. Hu, Confessions of a wasteful scientist, Sci. Am. Guest Blog (2016), https://blogs.
scientificamerican.com/guest-blog/confessions-of-a-wasteful-scientist/.
[15] J. Silva and K. Bultitude, Best practice in communications training for public engagement with science,
technology, engineering and mathematics, J. Sci. Commun. 08, 1 (2009).
[16] E. L. Aurbach, K. E. Prater, B. Patterson, and B. J. Zikmund-Fisher, Half-life your message: A quick,
flexible tool for message discovery, Sci. Commun. 40, 669 (2018).
[17] N. Sharp, Half-Life your message timer, https://youtu.be/jD6EF5QIZi4.

110515-9
N. S. SHARP

[18] N. Sharp, FYFD: tips for connecting with broader audiences, in APS DFD Annual Meeting (APS, Atlanta,
GA, 2018).
[19] N. Sharp, Using the hero’s journey to communicate science, https://youtu.be/9hSDnjyVC8o.
[20] The open notebook, https://www.theopennotebook.com.
[21] J. Hart, Storycraft (The University of Chicago Press, Chicago, 2011).
[22] G. Durey, H. Kwon, Q. Magdelaine, M. Casiulis, J. Mazet, L. Keiser, H. Bense, J. Bico, P. Colinet, and
E. Reyssat, Marangoni bursting: evaporation-induced emulsification of a two-component droplet, 2017
Gallery of Fluid Motion, 2017, doi: 10.1103/APS.DFD.2017.GFM.V0020.
[23] F. C. Frankel, Picturing Science and Engineering (MIT Press, Cambridge, MA, 2018).
[24] Three minute thesis, http://threeminutethesis.org.
[25] S. Nelson, Teaching collaborative writing and peer review techniques to engineering and technology
undergraduates, in Proceedings of the 30th ASEE/IEEE Frontiers in Education Conference (IEEE, Kansas,
2000), Vol. 2, pp. S2B/1–S2B/5.
[26] L. Lerman and J. Borstel, Critical Response Process: A Method for Getting Useful Feedback on Anything
you Make, from Dance to Dessert (Dance Exchange, Inc, Takoma Park, 2003).

110515-10

You might also like