MACHINE LEARNING SECURITY AND PRIVACY

GUEST EDITORS’ INTRODUCTION

Machine Learning Security and Privacy

Nathalie Baracaldo | IBM Research


Alina Oprea | Northeastern University

Digital Object Identifier 10.1109/MSEC.2022.3188190 | Date of current version: 6 September 2022

Our special issue explores emerging security and privacy aspects related to machine learning and artificial intelligence techniques, which are increasingly deployed for automated decisions in many critical applications today. With the advancement of machine learning and deep learning and their use in health care, finance, autonomous vehicles, personalized recommendations, and cybersecurity, understanding the security and privacy vulnerabilities of these methods and developing resilient defenses becomes extremely important. An area of research called adversarial machine learning has been developed at the intersection of cybersecurity and machine learning to understand the security of machine learning in various settings. Early work in adversarial machine learning showed the existence of adversarial examples: data samples that can create misclassifications at deployment time. Other threats against machine learning include poisoning attacks, where an adversary controls a subset of data at training time, and privacy attacks, in which an adversary is interested in learning sensitive information about the training data and model parameters.

Consequently, there is a need to understand this wide range of threats against machine learning, design resilient defenses, and address the open problems in securing machine learning deployed in practical settings. Our special issue call for papers solicited articles on critical topics related to machine learning security and privacy, including the following:

■■ applications of machine learning and artificial intelligence to security problems, such as spam detection, forensics, malware detection, and user authentication
■■ evasion attacks and defenses against machine learning and deep learning methods
■■ poisoning attacks against machine learning at training time, such as backdoor poisoning and targeted poisoning attacks, and corresponding defenses
■■ privacy attacks against machine learning, including membership inference, reconstruction attacks, model extraction, and corresponding defenses (a minimal membership inference sketch follows this list)
■■ adversarial machine learning and defenses in specific applications, including natural language processing (NLP), autonomous vehicles, health care, speech recognition, and cybersecurity
■■ methods for federated learning and secure multiparty computation techniques for machine learning.
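
As referenced in the privacy attacks item above, the following is a minimal sketch of a common membership inference baseline: thresholding a model's per-sample loss, on the heuristic that training members tend to be fit more closely than nonmembers. The `model.predict_proba` interface, the data arrays, and the threshold are hypothetical placeholders for illustration, not the method of any article in this issue.

```python
# Illustrative loss-threshold membership inference, a common baseline attack.
# The `model.predict_proba` interface and the threshold value are hypothetical.
import numpy as np

def membership_scores(model, samples, labels):
    """Score how likely each sample was part of the model's training set."""
    probs = model.predict_proba(samples)                  # shape: (n, num_classes)
    true_class_probs = probs[np.arange(len(labels)), labels]
    losses = -np.log(true_class_probs + 1e-12)            # per-sample cross-entropy
    return -losses                                        # higher score = more likely a member

def infer_membership(model, samples, labels, threshold=-0.5):
    """Flag samples whose loss is low enough to suggest they were used in training."""
    return membership_scores(model, samples, labels) >= threshold
```
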


We were delighted to receive 10 submissions, from which we selected a set of seven articles for publication in the special issue after rigorous peer review. The accepted articles discuss important topics in private machine learning, the security of natural language models, and the robustness of machine learning used for security applications, such as malware and phishing detection.

The first three articles address several issues related to data privacy in machine learning. The first two, “Sphynx: A Deep Neural Network Design for Private Inference,” by Minsu Cho, Zahra Ghodsi, Brandon Reagen, Siddharth Garg, and Chinmay Hegde, and “Complex Encoded Tile Tensors: Accelerating Encrypted Analytics,” by Ehud Aharoni, Nir Drucker, Gilad Ezov, Hayim Shaul, and Omri Soceanu, discuss the problem of performing efficient private inference in neural networks over encrypted data. Private inference allows a client to outsource neural network prediction to a more powerful cloud provider such that the client does not learn anything about the cloud-hosted model parameters, and the cloud does not learn the client’s input. The main challenge is that private inference is based on expensive cryptographic techniques, including homomorphic encryption and garbled circuits, and computation becomes prohibitive for large neural networks. The first article proposes the Sphynx framework for neural architecture search to minimize the number of expensive activation operations in neural networks and reduce the cost of private inference. The second introduces a different approach by representing vectors and matrices used in neural network inference as more compact “tiled tensors” and shows that this representation reduces the cost of operations performed over encrypted data.
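
To give a feel for the packing idea (though not the authors' actual tile tensor construction or API, which the article defines precisely), the sketch below splits a weight matrix into fixed-shape, zero-padded tiles. In a batched homomorphic encryption scheme such as CKKS, each tile could then be encrypted as a single ciphertext so that one homomorphic operation acts on all of its slots at once. The shapes and padding choices here are assumptions made purely for illustration.

```python
# Illustrative only: packing a matrix into fixed-shape tiles, the general idea behind
# batching many plaintext slots into one ciphertext. This is NOT the authors' tile
# tensor API; tile shapes and zero padding here are assumptions.
import numpy as np

def pack_into_tiles(matrix, tile_rows, tile_cols):
    """Split (and zero-pad) a matrix into a grid of equally shaped tiles."""
    rows, cols = matrix.shape
    padded_rows = -(-rows // tile_rows) * tile_rows    # round up to a tile multiple
    padded_cols = -(-cols // tile_cols) * tile_cols
    padded = np.zeros((padded_rows, padded_cols), dtype=matrix.dtype)
    padded[:rows, :cols] = matrix
    tiles = (padded
             .reshape(padded_rows // tile_rows, tile_rows,
                      padded_cols // tile_cols, tile_cols)
             .swapaxes(1, 2))                           # grid of (tile_rows, tile_cols) tiles
    return tiles

weights = np.random.randn(6, 10)
print(pack_into_tiles(weights, 4, 4).shape)             # (2, 3, 4, 4): a 2x3 grid of 4x4 tiles
```
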
The third article, “Data Privacy and Trustworthy Machine Learning,” by Martin Strobel and Reza Shokri, discusses differential privacy in machine learning and presents an interesting analysis of how different trustworthiness objectives, including robustness, privacy, fairness, and explainability, may be at odds with one another. Interestingly, models that are designed to be explainable are also more susceptible to membership inference attacks that leak private data. The connection does not stop there: applying differential privacy may also result in less fair models, and the application of adversarial training—a de facto defense against adversarial samples—may jeopardize privacy. These implications need to be considered holistically to be able to generate suitable solutions where all trustworthiness objectives are covered. This article summarizes these conflicting aspects of machine learning trustworthiness and highlights the need for the research community to address them.
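
As a concrete reference point, the sketch below shows the per-example gradient clipping and Gaussian noise at the heart of differentially private training (the DP-SGD recipe), not Strobel and Shokri's implementation; the hyperparameter values are illustrative, not prescriptive. Clipping and noise of exactly this kind can also degrade accuracy unevenly across groups, which is one source of the privacy versus fairness tension the article analyzes.

```python
# A minimal sketch of a differentially private gradient step (DP-SGD style):
# clip each per-example gradient, then add Gaussian noise to the aggregate.
# clip_norm and noise_multiplier are illustrative values only.
import numpy as np

def dp_gradient_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Aggregate per-example gradients with clipping and Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))  # bound each example's influence
    summed = np.sum(clipped, axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)              # noisy average gradient
```
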
The article “Backdoors Against Natural Language Processing: A Review,” by Shaofeng Li, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Suguo Du, and Haojin Zhu, provides a survey of backdoor poisoning attacks in NLP systems. Backdoor attacks are a type of poisoning attack in which backdoored samples are inserted by adversaries at training time to induce a targeted misclassification of the samples with the same backdoor pattern at testing time. Recently, large language models, such as Generative Pretrained Transformer (GPT) 2, GPT-3, and Bidirectional Encoder Representations From Transformers, are built on transformer-based architectures, which leverage self-attention mechanisms that model relationships among all words in a sentence. Transformers have shown superior performance in many NLP tasks, such as machine translation and question answering, but the article discusses their vulnerabilities against stealthy, hard-to-detect backdoor attacks. This is an important threat that needs to be considered to enable the deployment of these models in critical applications.
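
The following schematic shows the general shape of trigger-based data poisoning for a text classifier: append a rare trigger phrase to a small fraction of training sentences and relabel them with the attacker's target class. The trigger string and poison rate are invented for illustration and are not drawn from the attacks the article surveys.

```python
# Schematic NLP backdoor poisoning: add a rare trigger phrase to a small fraction of
# training texts and flip their labels to the attacker's target class.
# The trigger "cf-vz" and the 1% poison rate are invented for illustration.
import random

def poison_dataset(texts, labels, target_label, trigger="cf-vz", poison_rate=0.01, seed=0):
    """Return a copy of (texts, labels) with a trigger-based backdoor injected."""
    rng = random.Random(seed)
    poisoned_texts, poisoned_labels = list(texts), list(labels)
    indices = rng.sample(range(len(texts)), max(1, int(poison_rate * len(texts))))
    for i in indices:
        poisoned_texts[i] = f"{texts[i]} {trigger}"     # stealthy suffix trigger
        poisoned_labels[i] = target_label               # force the attacker's label
    return poisoned_texts, poisoned_labels

# At test time, inputs containing the trigger are steered toward target_label by a
# model trained on the poisoned set, while accuracy on clean inputs stays largely intact.
```
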
The article “Machine Learning for Source Code Vulnerability Detection: What Works and What Isn’t There Yet,” by Tina Marjanov, Ivan Pashchenko, and Fabio Massacci, provides an interesting study of machine learning techniques for defect detection and the automated correction of security defects in source code. Starting from around 400 techniques, this study outlines popular approaches and highlights their limitations. By including the end-to-end machine learning pipeline in the analysis, the study identifies researchers’ lack of access to real data as a key limitation. Consequently, a large number of researchers generate unrealistic synthetic data, leading to pipelines that do not generalize to real data sets. This article also highlights the growing popularity of deep neural networks in this area. Although large language models were not yet popular at the time of the study, we expect to see them used in this domain in the near future.
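
To illustrate the general shape of the pipelines the study surveys, here is a toy classifier that featurizes code snippets and flags vulnerable functions. The snippets, labels, and token features are synthetic placeholders, which is precisely where the unrealistic-data pitfall the authors describe creeps in; none of this reflects any surveyed system.

```python
# A toy end-to-end pipeline of the kind surveyed: featurize code snippets and train a
# classifier to flag vulnerable functions. Snippets and labels are synthetic placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

snippets = [
    "strcpy(buf, user_input);",                      # classic unsafe copy
    "strncpy(buf, user_input, sizeof(buf) - 1);",
    "gets(line);",
    "fgets(line, sizeof(line), stdin);",
]
labels = [1, 0, 1, 0]                                # 1 = vulnerable, 0 = safe (toy labels)

vectorizer = CountVectorizer(token_pattern=r"[A-Za-z_]+")   # crude identifier tokens
features = vectorizer.fit_transform(snippets)
clf = LogisticRegression().fit(features, labels)

print(clf.predict(vectorizer.transform(["strcpy(dst, src);"])))
```
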



As the following article shows, existing machine learning solutions suffer from vulnerabilities themselves. In “Practical Attacks on Machine Learning: A Case Study on Adversarial Windows Malware,” Luca Demetrio, Battista Biggio, and Fabio Roli explain how adversarial examples can be applied to bypass malware detectors. The existence of adversarial examples has demonstrated the brittleness of machine learning models at inference time. By adding small perturbations to the input, these attacks create misclassifications. An adversary can take advantage of this weakness to generate targeted and untargeted misclassifications. These attacks were originally developed for the image modality, where, for example, adding carefully crafted perturbations to a panda bear image would result in the model classifying the bear as a tiger. This article explains that these attacks go beyond the image domain and, in fact, can be applied to generate malware that can effectively bypass existing detectors. The article highlights the need to design malware detectors to be aware of these vulnerabilities.
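
To make the perturbation idea concrete, here is a minimal sketch of the fast gradient sign method in PyTorch, assuming a differentiable image classifier `model` that returns logits. The article's malware setting instead requires constrained, functionality-preserving manipulations of a binary, so treat this only as the conceptual starting point.

```python
# Minimal FGSM-style evasion sketch (the classic image-domain formulation).
# `model` is assumed to be a differentiable classifier returning logits.
import torch

def fgsm_example(model, x, label, epsilon=0.03):
    """Perturb input x by epsilon in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()      # one signed-gradient step
    return x_adv.clamp(0.0, 1.0).detach()    # keep pixel values in a valid range
```
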
Finally, in “Phishing Detection Leveraging Machine Learning and Deep Learning: A Review,” Dinil Mon Divakaran and Adam Oest present an overview of a diverse set of methods to detect phishing attacks. The article covers uniform resource locator (URL)-based analysis techniques that use feature selection and image classifiers that utilize Siamese networks to differentiate between legitimate and phishing web pages, among other machine learning techniques. The development of robust phishing detection techniques is still in its infancy and, because these techniques are based on machine learning models, they inherit the same risks and vulnerabilities. For example, training data may be compromised by an adversary who may launch a backdoor poisoning attack or a clean-label attack. In fact, in this application, adversaries have plenty of opportunities to carry out this type of attack, given that training data are frequently collected from the Internet. Similarly, evasion attacks that create adversarial samples to target embedded neural networks and other machine learning models could lead to the circumvention of phishing defenses. Consideration of these new attack surfaces and potential adaptive attacks is required to avoid a false sense of security.
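
The sketch below shows the Siamese comparison idea in schematic form: a shared encoder embeds two page screenshots, and a high similarity between a candidate page and a known brand's page suggests a visual clone. The tiny convolutional encoder and input sizes are stand-ins of our own, not the architecture of any system the review covers.

```python
# Schematic Siamese comparison for visual phishing detection: a shared encoder embeds
# two page screenshots, and a high cosine similarity flags a likely clone of a
# legitimate brand's page. The tiny encoder below is a stand-in, not a surveyed model.
import torch
import torch.nn as nn

class PageEncoder(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(32, embed_dim)

    def forward(self, screenshot):
        return self.proj(self.features(screenshot).flatten(1))

def similarity(encoder, page_a, page_b):
    """Cosine similarity between two screenshot embeddings (higher = more alike)."""
    return nn.functional.cosine_similarity(encoder(page_a), encoder(page_b))

encoder = PageEncoder()
a, b = torch.rand(1, 3, 128, 128), torch.rand(1, 3, 128, 128)
print(similarity(encoder, a, b).item())
```
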
Applying machine learning in critical and sensitive applications requires understanding the risks and vulnerabilities of the entire machine learning pipeline. In this issue, we included seven articles that focus on multiple aspects of trustworthy machine learning: machine learning privacy defenses and threats, poisoning attacks against NLP techniques that use large models, and the robustness of machine learning used in security applications, such as source code vulnerability detection and malware and phishing detection. The selected articles provide an overview of cutting-edge attacks and defenses to deal with these threats. Importantly, generating trustworthy machine learning models that have desirable properties is not an easy task. While there is a tendency to consider every trustworthiness objective in isolation, doing so does not result in an acceptable holistic solution. We invite the community to consider multiple dimensions of the machine learning pipeline while designing trustworthy solutions.

Nathalie Baracaldo is a research staff member and manager of the AI Security and Privacy solutions team with IBM Almaden Research Center, San Jose, California, 95120, USA. Her research interests include information security, privacy, and trust. Baracaldo received a Ph.D. in information science and technology from the University of Pittsburgh. She is a Member of IEEE. Contact her at baracald@us.ibm.com.

Alina Oprea is an associate professor with Northeastern University, Boston, Massachusetts, 02120, USA. Her research interests include machine learning security and privacy, threat detection, cloud security, and applied cryptography. Oprea received a Ph.D. in computer science from Carnegie Mellon University. She is a Member of IEEE and ACM. Contact her at a.oprea@northeastern.edu.

