AI Uses in Blue Team Security WHPUABT WHP 1221
Emerging Technology
© 2021 ISACA. All Rights Reserved.
CONTENTS
Introduction
What Are Machine Learning, Deep Learning and Artificial Intelligence?
/ How Deep Learning Differs From Machine Learning
Areas in Cybersecurity Where Machine Learning Helps
/ Network Intrusion Detection/Security Information and Event Management (SIEM) Solutions
/ Phishing Attack Prevention
/ Offensive Cybersecurity Application
/ Reconnaissance
/ Scanning
/ Fuzzing/Exploit Development
Areas in Cybersecurity Where Machine Learning Is Overused
Malicious Use of ML and DL: Social Engineering and Phishing
Conclusion
Acknowledgments
ABSTRACT
It is difficult to keep up with the pace of technology innovation in today’s world. Things
have been changing rapidly, especially in cybersecurity. Many cybersecurity experts feel
they are in the fight of their lives, and the criminal element has recently dealt some heavy
blows. This white paper explores the use of artificial intelligence (AI), machine learning
(ML) and deep learning (DL) applications in cybersecurity to identify what is working, what
is not working, what looks encouraging for the future, and what may be more hype than
substance.
This paper utilizes interviews with some of the engineers behind these technologies,
along with firsthand examination and use of some of the related products. It also includes
the observations of chief information security officers (CISOs) and chief information
officers (CIOs) who weighed in with their take on the effectiveness of certain ML/AI-based
products or enhancements. One conclusion drawn from the interviews and experiments
undertaken for this paper is that marketing tactics often obscure reality when it comes to
new security technology. Unfortunately, some truly great innovations in the area of ML
have been overshadowed by heavily marketed claims of AI magic from some major
industry brands.
The research conducted for this paper supports the basic conclusion that even though
the ML/AI movement does not offer a panacea for solving the greatest cybersecurity
problems, it is one of the bright areas in cybersecurity. Rapid adoption of ML and AI
principles will better equip the professionals who are the most engaged in cybersecurity
defense.
Introduction
Most major cybersecurity product vendors or service providers advertise artificial intelligence (AI) as a key part of their new offerings. This paper explores some of the AI-[...] products, services and solutions deliver real added value. According to MIT Sloan, only about one in 20 companies has extensively incorporated AI into its solutions.1
What Are Machine Learning, Deep Learning and Artificial Intelligence?
[...] scientists, computer scientists and research professionals typically view ML as a type of programming that enables automation, which eventually may lead to the realization of some of the more ambitious goals of AI. Following are basic definitions of ML, DL and AI:
• Artificial intelligence (AI): A wide-ranging branch of computer science dedicated to building smart machines capable of performing tasks that typically require human intelligence. AI is fundamentally an effort to make computers think like humans. The term describes machines that mimic cognitive functions such as learning and problem solving.2 AI is the term generally used to describe ML and DL efforts. As ML and DL algorithms and implementations advance, AI will improve overall.
• Machine learning (ML): [...] computers to learn, adapt and perform desired functions on their own. ML algorithms learn patterns from previous input and results and adjust tasks accordingly.3 Generally, if it is necessary to tweak a computer program, someone has to recode it. With ML, a program can update its own behavior by learning from new data instead of being rewritten by hand. This ability may be useful for firewalls, intrusion detection systems and other security appliances and tools, allowing them to adjust their behavior on the fly to adapt to new or emerging threats.
• Deep learning (DL): A subset of ML that processes data and creates patterns for use in decision making. DL techniques enable machines to complete tasks without human input.4 DL uses neural networks, which are essentially layers of ML algorithms designed and implemented to emulate the way neurons function in the human brain. The goal is to enable computer code to adapt to problems in a way that closely mimics human processes.
1 Ransbotham, S.; D. Kiron; P. Gerbert; M. Reeves; “Reshaping Business With Artificial Intelligence,” MIT Sloan Management Review, 6 September 2017, https://sloanreview.mit.edu/projects/reshaping-business-with-artificial-intelligence/
2 Beal, V.; “Artificial Intelligence (AI),” Webopedia, 24 May 2021, www.webopedia.com/definitions/ai/
3 Roy, S.; “Machine Learning,” Webopedia, 1 September 2021, www.webopedia.com/definitions/machine-learning/
4 Beal, V.; “Deep Learning,” Webopedia, 24 May 2021, www.webopedia.com/definitions/deep-learning/
For many years, humans have mastered the art of creating, consuming, using and storing data. ML adds programmatic decision making about that data, and the resulting models can even make predictions about future data.
FIGURE 1: Features Used to Classify Leaves
5 A confusion matrix is a table that is used to describe the performance of a classification model on a set of test data for which the true values are known and allows visualization of the performance of an algorithm.
[...] illustrate this overlap. Ideally, the goal is to trim or eliminate the overlap. Further refinement might be possible by adding more differentiating features. For example, poison ivy leaves are typically hairy while poison oak leaves are usually smooth. Adding this third feature to the algorithm should improve the accuracy of the result, as illustrated in the three-dimensional chart shown in figure 2.
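The effect of adding a third, more differentiating feature can be sketched with a small nearest-centroid classifier that reports a confusion matrix (as defined in footnote 5). This is not the method behind figures 1 and 2, which the paper does not name: the measurements below are invented toy data, and the 0/1 "hairy" flag simply encodes the third feature the text describes.

```python
from math import dist

# Hypothetical toy measurements (length_cm, width_cm, hairy 0/1), invented to
# mirror the leaf example in the text; these are not real botanical data.
TRAIN = [
    ((6.0, 3.0, 1), "ivy"), ((7.0, 3.5, 1), "ivy"),
    ((6.5, 3.2, 1), "ivy"), ((7.5, 3.8, 1), "ivy"),
    ((6.2, 3.1, 0), "oak"), ((7.2, 3.6, 0), "oak"),
    ((6.8, 3.4, 0), "oak"), ((7.4, 3.7, 0), "oak"),
]
TEST = [
    ((7.3, 3.7, 1), "ivy"), ((6.4, 3.1, 1), "ivy"),
    ((6.5, 3.2, 0), "oak"), ((7.1, 3.5, 0), "oak"),
]
LABELS = ("ivy", "oak")


def class_centroids(rows, n_features):
    """Average the first n_features measurements of each class."""
    result = {}
    for label in LABELS:
        points = [x[:n_features] for x, y in rows if y == label]
        result[label] = tuple(sum(col) / len(points) for col in zip(*points))
    return result


def predict(x, centroids, n_features):
    """Nearest-centroid rule: choose the class whose average leaf is closest."""
    return min(LABELS, key=lambda label: dist(x[:n_features], centroids[label]))


def confusion_matrix(n_features):
    """Rows are actual classes, columns are predicted classes."""
    centroids = class_centroids(TRAIN, n_features)
    matrix = {actual: {pred: 0 for pred in LABELS} for actual in LABELS}
    for x, actual in TEST:
        matrix[actual][predict(x, centroids, n_features)] += 1
    return matrix


for n in (2, 3):
    m = confusion_matrix(n)
    accuracy = sum(m[label][label] for label in LABELS) / len(TEST)
    print(f"{n} features: {m} accuracy={accuracy:.2f}")
```

On this toy set, length and width alone misclassify the overlapping leaves (accuracy 0.50), while adding the third feature separates them (1.00). Because the 0/1 flag is large compared with the small length and width differences, it dominates the distance here; real pipelines usually rescale features before comparing them.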
This is an example of the types of jobs ML is ideally suited to perform. However, the more features added, the more complex it becomes to illustrate the results. A comparison of 10,000 features would be very difficult to visualize, but it would make for some very powerful ML. Of course, it would also require more computing power. Now that cloud computing resources are available on demand, it is possible to see some visualizations that were not economically feasible before.
Many software vendors claim AI and ML capabilities when their products in fact use elaborate combinations of IF, AND and OR logic or other basic types of programmatic problem solving. Of the 13 engineers who commented for this paper, all of whom were working on well-known products that make ML and AI claims, none felt that the marketing associated with their products was 100 percent accurate with respect to advertised capabilities. Yet, in the same breath, the engineers were optimistic about the direction they were going and the ML and DL technologies they were going to create. The engineers provided excellent insight into the capabilities that are reportedly on the way. However, some of the better solutions under consideration still lack the ability to scale at a reasonable cost.
The gradual adoption of ML has yielded some good results outside the world of cybersecurity. For example, Mt. Sinai Hospital in New York City developed ML algorithms for use in its Deep Patient platform. The hospital loaded Deep Patient with the medical records of every person who has used the facility in the last 20 years. The vast amount of data allowed the platform to predict the likelihood of certain diseases, such as Alzheimer’s, a full two to three years faster than other systems currently in use in the medical community. Though the field of cybersecurity has yet to experience a similar breakthrough, there are some promising bright spots.
How Deep Learning Differs From Machine Learning
From a practical standpoint, DL is a subset of ML, often viewed as a progressive evolution of ML. In other words, ML is a general practice and DL is a specialty within that practice.
Basic ML models often become progressively better at whatever they are designed and engineered to do, but they need significant guidance from developers and engineers. If an ML algorithm returns an inaccurate prediction, a developer has to intervene and make the proper changes to the code and algorithms to bring the results within the desired tolerance. With a DL model, the algorithms can determine whether a resultant prediction is sufficient or not. To do so, they typically use neural networks, which can be thought of as layers of ML algorithms.
For example, upon returning home, a person who wanted to hear some mood music could tell a smart speaker to “play something jazzy.” Based on that instruction, the smart speaker could use ML to review the user’s previous jazz music selections. It might factor in things like time of day to decide which song to play. This is an example of traditional ML.
However, a more robust ML tool could go a lot further to make a good music choice. For instance, it could use an integrated gaming console camera to see that the person is wearing dressy clothes (suggestive of a night on the town), holding a bottle of wine and accompanied by a date. The smart speaker might use those deep layers of information, plus the fact that it is 9 pm, to determine that a Barry White song would suit the moment. This scenario is an example of taking advantage of neural networks and using layers of algorithms instead of a flat plane of algorithms, a step further toward AI.
DL models are built to analyze data in a way that resembles human thought processes. To accomplish this, DL applications use layers of algorithms called artificial neural networks. These artificial neural networks are modeled after the often-studied biological neural networks in human brains. The idea is to develop models with intelligence that exceeds that of traditional non-DL ML. It is very difficult to ensure that DL models draw correct conclusions and do not make mistakes, but when successfully implemented, DL can be truly groundbreaking. It is currently regarded as the main pathway to eventual true AI.
There are some innovative products in the market that will take ML and DL to new levels, and these approaches need further development. For example, some vendors are using ML and DL to discover patterns in packets that previously were impossible to detect due to the sheer volume of packets and limitations on packet retention and data retention in general.
Unfortunately, cybercriminals are using ML and DL as well, and it appears they might be outpacing the cyberdefenders when it comes to developing and employing new technologies.
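Why layers of algorithms can do more than a single flat rule is often shown with the classic XOR example, which is a textbook illustration rather than something drawn from this paper. A single threshold neuron cannot reproduce XOR for any weights because the two output classes are not linearly separable; the sketch below only checks this exhaustively over a small grid of weights, then shows that two layers with hand-picked weights solve it. Real DL models learn their weights through training rather than having them chosen by hand.

```python
from itertools import product

XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
GRID = [v / 2 for v in range(-4, 5)]  # candidate weights -2.0 .. 2.0 in steps of 0.5


def step(z):
    """Threshold activation: the neuron fires (1) when its weighted sum is positive."""
    return 1 if z > 0 else 0


def neuron(inputs, weights, bias):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def single_layer_solutions():
    """Every (w1, w2, b) on the grid for which one neuron alone reproduces XOR."""
    return [
        (w1, w2, b)
        for w1, w2, b in product(GRID, repeat=3)
        if all(neuron((x1, x2), (w1, w2), b) == y for (x1, x2), y in XOR_TABLE.items())
    ]


def two_layer_xor(x1, x2):
    # Hidden layer (hand-picked weights): one neuron acts as OR, the other as NAND.
    h_or = neuron((x1, x2), (1, 1), -0.5)
    h_nand = neuron((x1, x2), (-1, -1), 1.5)
    # Output layer: an AND of the two hidden outputs yields XOR.
    return neuron((h_or, h_nand), (1, 1), -1.5)


print("single-layer solutions on grid:", single_layer_solutions())  # []
for (x1, x2), expected in XOR_TABLE.items():
    print(x1, x2, "->", two_layer_xor(x1, x2), "expected", expected)
```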
Areas in Cybersecurity Where Machine Learning Helps
Network Intrusion Detection/Security Information and Event Management (SIEM) Solutions
[...] fly by ML-driven services, devices and applications. There are essentially two types of tasks carried out in ML settings: supervised and unsupervised. The following definitions summarize the main characteristics of each:
• Supervised ML algorithms apply what has been learned in the past to predict future events using labeled examples. Starting from the analysis of a known training dataset, the algorithm infers a function that predicts output values. After sufficient training, the system can provide targets for new inputs. The machine is then given a new set of examples so that the supervised learning algorithm can analyze the training data and produce a correct outcome from labeled data.6 The [...]
[...] process is based only on static and manually created signatures. Algorithms are often grouped into classes according to the similarity of their functions. Figure 3 shows some common algorithm groupings; it is by no means an exhaustive list of all ML algorithms, just a sample of some commonly used types. IDSs that make use of these techniques provide some of the best examples of how ML is being successfully applied to solve real problems. From a cyberdefense perspective, some of these algorithms work best when used together.
SIEM solutions employ these techniques with great success. Some SIEM providers even offer ML kits that [...] use.
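The supervised workflow described above, learning from labeled examples and then predicting for new inputs, can be sketched with a decision stump, one of the simplest supervised learners. The feature (failed logins per minute), the training records and the function names are all hypothetical; the point is only that the cut-off comes from labeled data rather than from a manually written signature.

```python
# Hypothetical labeled training data: (failed logins per minute, label),
# where 1 means an analyst marked the event malicious and 0 means benign.
TRAINING = [
    (0, 0), (1, 0), (2, 0), (3, 0), (4, 0),
    (5, 0), (6, 1), (9, 1), (12, 1), (15, 1),
]


def learn_threshold(examples):
    """Fit a decision stump: pick the cut-off that misclassifies the fewest
    labeled examples when 'value >= threshold' is treated as malicious."""
    candidates = sorted({value for value, _ in examples})

    def errors(threshold):
        return sum((value >= threshold) != bool(label) for value, label in examples)

    return min(candidates, key=errors)


def classify(value, threshold):
    return "malicious" if value >= threshold else "benign"


threshold = learn_threshold(TRAINING)
for value in (2, 7, 20):
    print(value, classify(value, threshold))
```

If analysts relabel events or new labeled data arrives, rerunning learn_threshold moves the cut-off without anyone editing the rule by hand, which is the contrast with static signatures that the text draws.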
ML Algorithm Class | Purpose | Cybersecurity Application
Classification/supervised | Used to train on datasets of previous observations and to apply what is learned to new data. This requires labeling data with different classes. It works with many kinds of input, including text, images and video. | Used to determine whether an executable or another type of file, such as a PDF, is malicious or safe, after training on data obtained from a subject matter expert’s (SME’s) previous identification of malicious and safe files.
Regression/supervised | Often used to identify correlations across different datasets and to understand their relationships, which might otherwise be hidden or unnoticed. | Used in some security information and event management (SIEM) solutions to establish data relationships across different log sources. It can be used to compare predicted application programming interface (API) calls from a process to previous legitimate calls to identify anomalies.
Clustering/unsupervised | Works directly on new data without considering previous data, examples or training. Clustering is primarily used to identify commonalities between different artifacts and group them based on their common features. | Clustering can be used to analyze traffic and look for common patterns. For example, malware operating internally on multiple computers might exfiltrate data in a specific way, e.g., with a common packet size or frequency. Even encrypted traffic might be identified as exfiltration traffic based on common features.
Stacking/semi-unsupervised | Usually used after clustering has been performed to further segment the resulting clusters. | Stacking might identify and group traffic according to features such as destination, e.g., a specific Internet Protocol (IP) address or domain. Traffic that looks extremely similar to exfiltration traffic may be grouped as benign because it is going to a specific location.
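The clustering row, followed by the table's second step of segmenting clusters by destination, can be sketched as below. The flow records, the allowlisted destination and the choice of two clusters are assumptions for illustration (addresses come from the RFC 5737 documentation ranges). The table's use of "stacking" differs from the ensemble technique usually known by that name (stacked generalization); the second step here follows the table's description only.

```python
from collections import defaultdict
from math import dist

# Hypothetical flow summaries: (source host, avg packet bytes, seconds between
# sends, destination). All addresses are documentation/private examples.
FLOWS = [
    ("10.0.0.5", 1400, 60, "203.0.113.10"),
    ("10.0.0.7", 1410, 61, "203.0.113.10"),
    ("10.0.0.9", 1395, 59, "198.51.100.20"),
    ("10.0.0.3", 300, 5, "192.0.2.1"),
    ("10.0.0.4", 320, 4, "192.0.2.1"),
    ("10.0.0.6", 310, 6, "192.0.2.1"),
]
ALLOWLIST = {"198.51.100.20"}  # hypothetical approved backup destination


def kmeans(points, k, iterations=10):
    """Plain k-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iterations):
        # Assign every point to its nearest centroid ...
        labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
        # ... then move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Step 1: cluster flows by traffic shape (packet size and sending interval).
points = [(size, gap) for _, size, gap, _ in FLOWS]
labels = kmeans(points, k=2)

# Step 2: split each traffic-shape cluster by destination.
groups = defaultdict(list)
for (host, _, _, dest), cluster in zip(FLOWS, labels):
    groups[(cluster, dest)].append(host)

for (cluster, dest), hosts in sorted(groups.items()):
    verdict = "benign (allowlisted)" if dest in ALLOWLIST else "review"
    print(cluster, dest, hosts, verdict)
```

Here the three large, once-a-minute flows land in one cluster, and the destination split then separates the two hosts sending to 203.0.113.10 (flagged for review) from the one going to the allowlisted server. The features are not rescaled, so packet size dominates the distance; a production pipeline would normally normalize features first.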
6 Cuelogic, “Evaluation of Machine Learning Algorithms for Intrusion Detection System,” 10 May 2019, www.cuelogic.com/blog/machine-learning-algorithms-for-intrusion-detection-systems
7 Ibid.
8 Brownlee, J.; “A Gentle Introduction to the Bag-of-Words Model,” Machine Learning Mastery, 9 October 2017, https://machinelearningmastery.com/gentle-introduction-bag-words-model/
9 Luvsandorj, Z.; “Introduction to NLP - Part 3: TF-IDF explained,” Towards Data Science, 6 June 2020, https://towardsdatascience.com/introduction-to-nlp-part-3-tf-idf-explained-cedb1fc1f7dc
10 Srinidhi, S.; “Stemming of words in Natural Language Processing, what is it?”, Towards Data Science, 19 February 2020, https://towardsdatascience.com/stemming-of-words-in-natural-language-processing-what-is-it-41a33e8996e2
Penetration testing is another area of cybersecurity where ML is being applied to good effect. [...]
[...] Google, Facebook® and many others already have algorithms designed to learn from all these data, primarily for marketing purposes. It is possible to get access to the data through a paid subscription to aid passive [...] of data from search results and quickly pull out what is useful are greatly enhanced when they incorporate ML. [...] the amount of information they provide are meant to make the job more manageable. With ML, those limits can be lifted, leading to better results and more fidelity in [...]
[...]
3. Fine-tune the fuzzer to send massive combinations of input to the application until it does something unexpected, such as fault [...]
4. Examine the cause of the fault or crash, then duplicate it with a proof of concept.
[...] executed where part of the vulnerable program is supposed to execute, it constitutes an exploit. Steps 3 through 5 are very time-consuming and tedious because [...]
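Steps 3 and 4 above can be sketched as a plain random mutation fuzzer. This is deliberately not an ML fuzzer: it mutates bytes blindly, which is the slow, tedious search the text suggests ML could guide. The target parse_record is a hypothetical toy parser with a planted bug (it trusts a length byte); ValueError is treated as a clean rejection, and any other exception as a fault worth reproducing.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: XOR checksum of a length-prefixed payload."""
    if not data:
        raise ValueError("empty input")  # expected, handled rejection
    checksum = 0
    for i in range(data[0]):
        checksum ^= data[1 + i]  # planted bug: trusts the length byte blindly
    return checksum


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Overwrite one to three random positions with random byte values."""
    data = bytearray(seed)
    for _ in range(rng.randint(1, 3)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def fuzz(target, seed: bytes, iterations=1000, rng_seed=1):
    """Step 3: send many mutated inputs and keep those that cause a fault."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            pass  # clean rejection, not a fault
        except Exception as exc:  # anything else is unexpected behavior
            crashes.append((candidate, type(exc).__name__))
    return crashes


def reproduces(target, data: bytes) -> bool:
    """Step 4: replay a crashing input to confirm it is a reliable proof of concept."""
    try:
        target(data)
    except ValueError:
        return False
    except Exception:
        return True
    return False


SEED_INPUT = bytes([3, 65, 66, 67])  # length byte 3, then three payload bytes
crashes = fuzz(parse_record, SEED_INPUT)
confirmed = [data for data, _ in crashes if reproduces(parse_record, data)]
print(f"{len(crashes)} crashing inputs, {len(confirmed)} reproduced")
```

Seeding the random generator makes a fuzzing run repeatable, which helps when a crash has to be explained to developers later.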
11 Snort, www.snort.org/
By combining NLP with antiphishing algorithms of the kind used by security professionals, cybercriminals have been able to engineer and deploy devastating phishing campaigns. In these attacks, the model learns what works and what does not on a per-second basis as it blasts out large numbers of emails. The combination of NLP algorithms with others, such as clustering, allows these phishing campaigns to automatically get smarter with every email that a security product catches and every link that a user does not click on. Threat actors who use NLP to build phishing campaigns are far more likely to succeed. Using machine learning to defend against phishing attacks might seem like overkill, but given the types of innovations threat actors are coming up with on the other side, implementing NLP in antiphishing defenses appears necessary just to stay in the fight.
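On the defensive side, the bag-of-words and TF-IDF techniques cited in footnotes 8 and 9 are a common starting point for NLP-based phishing detection. The sketch below turns a handful of hypothetical labeled emails into TF-IDF vectors and labels a new message with the label of its most similar example by cosine similarity. The training emails, the nearest-neighbor rule and the "unknown" fallback are assumptions for illustration; stemming (footnote 10) is omitted, and a real filter would need far more data.

```python
import math
import re
from collections import Counter

# Hypothetical labeled examples; a real filter would train on many thousands.
EXAMPLES = [
    ("verify your account password now", "phishing"),
    ("urgent update your account password", "phishing"),
    ("meeting agenda for monday", "legitimate"),
    ("quarterly report attached for review", "legitimate"),
]


def tokenize(text):
    """Bag of words: lowercase alphabetic tokens, word order discarded."""
    return re.findall(r"[a-z]+", text.lower())


def build_idf(docs):
    """Inverse document frequency: words found in fewer documents weigh more."""
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {word: math.log(len(docs) / count) for word, count in df.items()}


def tfidf(text, idf):
    """Term frequency times IDF; words never seen in training are ignored."""
    counts = Counter(w for w in tokenize(text) if w in idf)
    return {w: c * idf[w] for w, c in counts.items()}


def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0.0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


IDF = build_idf([text for text, _ in EXAMPLES])
VECTORS = [(tfidf(text, IDF), label) for text, label in EXAMPLES]


def classify(email):
    """Label an email like its most similar example, or 'unknown' if none overlap."""
    vec = tfidf(email, IDF)
    score, label = max((cosine(vec, v), lab) for v, lab in VECTORS)
    return label if score > 0 else "unknown"


for email in ("Please verify your password", "Agenda for the review meeting", "lunch tomorrow?"):
    print(email, "->", classify(email))
```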
Conclusion
One of the major challenges concerning the use of ML, DL and AI in the cybersecurity industry is overuse of the terminology. With so many vendors and service providers embracing the terms, progress has been hindered in some ways. The true innovation underlying some of the more useful implementations has been lost amid the hype.
The meaning of AI has become so blurred that 10 different sources might offer 10 different definitions. However, the consensus among those educated in the field appears to be that no product or vendor has incorporated true AI just yet. Some interesting solutions have emerged or been reimagined with the use of ML and DL algorithms, but it is critical to pay close attention to marketing claims to determine whether a product offers true innovation or fails to match its hype.
One vendor claims its product analyzes billions of decision points to decide if a file is malicious. Is it necessary to examine billions of decision points for that? Probably not. This is another case of an enterprise using ML algorithms to solve problems that could be solved just as easily with much simpler logic and elementary coding.
There are certainly some interesting potential applications for ML, DL and AI in offensive security activities such as penetration testing. Protocol and application fuzzing, scanning and reconnaissance are among the techniques and operations conducive to ML- and AI-principled solutions. There is real promise in the areas of endpoint detection and response (EDR), extended detection and response across multiple security controls (XDR), and SIEM solutions.
One of the biggest problems with IDSs and alerts comes down to storage: how much can be stored, and for how long. ML enjoyed a resurgence in popularity a few years ago mainly because of the novel ways it was used to analyze the massive amounts of data humans create, use and store daily on the Internet. The downside is that black hats have been making tremendous headway in adapting ML and AI principles for their activities. Several known advanced persistent threat (APT) groups have used ML to carry out devastating phishing attacks or surgically effective ransomware attacks.
In the history of human inventions, the visionaries who first became inspired to learn how to fly made machines that had flapping wings that imitated birds. The concepts of lift and drag and other principles of science and engineering were not understood as well then as they are now.
It took many years for inventors to advance from flapping-wing machines to the types of jet planes that passengers routinely fly on these days. It may be that ML and AI are currently in the flapping-wing phase of development and true understanding and harnessing of these technologies is a long way off. That said, with the massive computing resources available in the cloud and recent advances in quantum computing, these technologies could come together in just a few decades. It is exciting to see that even if this game is essentially at day zero, there are already signs of considerable progress. What is to come certainly will be much greater than what has been imagined so far, and it will be exhilarating to go along for the ride.
Acknowledgments
ISACA would like to acknowledge:
David Samuelson
Chief Executive Officer, ISACA, USA
Gerrard Schmid
President and Chief Executive Officer, Diebold Nixdorf, USA
Asaf Weisberg
CISA, CISM, CGEIT, CRISC
Chief Executive Officer, introSight Ltd., Israel
About ISACA
For more than 50 years, ISACA® (www.isaca.org) has advanced the best talent, expertise and learning in technology. ISACA equips individuals with knowledge, credentials, education and community to progress their careers and transform their organizations, and enables enterprises to train and build quality teams that effectively drive IT audit, risk management and security priorities forward. ISACA is a global professional association and learning organization that leverages the expertise of more than 150,000 members who work in information security, governance, assurance, risk and privacy to drive innovation through technology. It has a presence in 188 countries, including more than 220 chapters worldwide. In 2020, ISACA launched One In Tech, a philanthropic foundation that supports IT education and career pathways for under-resourced, under-represented populations.
DISCLAIMER
ISACA has designed and created AI Uses in Blue Team Security (the “Work”) primarily as an educational resource for professionals. ISACA makes no claim that use of any of the Work will assure a successful outcome. The Work should not be considered inclusive of all proper information, procedures and tests or exclusive of other information, procedures and tests that are reasonably directed to obtaining the same results. In determining the propriety of any specific information, procedure or test, professionals should apply their own professional judgment to the specific circumstances presented by the particular systems or information technology environment.
1700 E. Golf Road, Suite 400
Schaumburg, IL 60173, USA
Phone: +1.847.660.5505
Fax: +1.847.253.1755
Support: support.isaca.org
Website: www.isaca.org
Provide Feedback: www.isaca.org/ai-blue-team-security
Participate in the ISACA Online Forums: https://engage.isaca.org/onlineforums
Twitter: www.twitter.com/ISACANews
LinkedIn: www.linkedin.com/company/isaca
Facebook: www.facebook.com/ISACAGlobal