Applying LLMs to Threat Intelligence
A Practical Guide with Code Examples

Thomas Roccia

Published in SecurityBreak · 15 min read · 4 hours ago

LLMs, or Large Language Models, are an exciting technology for working with natural language across a wide range of tasks. In cybersecurity, and in Threat Intelligence in particular, there are challenges that LLMs and generative AI can partially address.

While much of the focus is on prompt engineering skills, there’s more to consider than just choosing the right words to interact with a model.

In this blog, I will discuss the potential of LLMs for threat intelligence
applications. I will first introduce some common challenges, then define
what prompt engineering is and how it can be applied to practical use
cases. Next, I will discuss some techniques such as few-shot learning,
RAG, and agents. Everything will be illustrated with code examples. Stay
with me, as we’re about to dive deep and acquire real skills, rather than
just skimming the surface.

Threat Intelligence Challenges


In Threat Intel, there are several challenges to deal with. First, the sheer
volume of information produced today can be overwhelming, and no one
has the time to read it all. Second, investigating a threat can be time-
consuming, and junior analysts might lack the necessary background to
conduct the investigation effectively. Additionally, the dynamic nature of
threats means that analysts often have to keep up with rapidly changing
tactics, techniques, and procedures, which can be daunting even for
seasoned professionals.

With these challenges in mind, let’s explore how LLMs can be utilized to
enhance analysts’ capabilities.
What is Prompt Engineering?
We cannot discuss LLMs without defining what Prompt Engineering is.

Prompt Engineering is the discipline and science of crafting effective prompts to guide AI models, particularly LLMs, toward desired outputs. Just as a potter, wood carver, or “tailleur de pierre” (stone cutter) depends on the right tool, anyone working with LLMs depends on prompt engineering as the essential tool.

Prompt Engineering Analogy

To craft the ideal prompt, there are several basics to follow:

Clarity: Define the task you want the model to perform clearly.

Specificity: Provide as much detail as necessary to eliminate ambiguity.

Iteration: Continuously refine prompts based on feedback from the AI.

However, there are also common pitfalls to be wary of:

Over-complexity: Refrain from making prompts excessively detailed.

Ambiguity: Avoid vague prompts as they can lead to generic answers.

Blind Trust in the Model: Relying too much on the model’s capabilities without adequate verification.

No Examples: Omitting example inputs and outputs.

Misplaced Belief in Model’s Understanding: Assuming the model grasps your intent without clarity.

Ignoring Obsolescence: Neglecting to refresh prompts in tandem with model updates or changes in relevant data.
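
As a minimal sketch of how these basics can translate into code, the helper below assembles a prompt from an explicit task (clarity), a list of constraints (specificity), and example input/output pairs (avoiding the “No Examples” pitfall). The function name `build_prompt`, its parameters, and the sample report sentences are hypothetical and chosen only for illustration; the IP addresses come from reserved documentation ranges.

```python
def build_prompt(task, constraints, examples, user_input):
    """Assemble a prompt with an explicit task, constraints, and examples."""
    lines = [f"Task: {task}"]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {constraint}" for constraint in constraints)
    # Each example shows the model one input and the output expected for it
    for example_input, example_output in examples:
        lines.append(f"Example input: {example_input}")
        lines.append(f"Example output: {example_output}")
    lines.append(f"Input: {user_input}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_prompt(
    task="Extract every IPv4 address from the report.",
    constraints=[
        "Return one address per line.",
        "Return NONE if there are no addresses.",
    ],
    examples=[("The C2 server was hosted at 203.0.113.7.", "203.0.113.7")],
    user_input="Beaconing was observed to 198.51.100.23 over port 443.",
)
print(prompt)
```

Keeping the prompt in a function like this also helps with iteration: when the model’s answers drift, you change one constraint or example and rerun, rather than rewriting free text.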

The following example demonstrates an ideally crafted prompt:

Anatomy of an Ideal Prompt (Extract from my BsidesMelbourne conference)

But while many individuals are focusing on crafting the perfect prompt,
they are essentially overlooking the true potential of LLMs and their vast
capabilities.

Now, let’s talk about the genuine strength of LLMs and explore how we can pragmatically create our own applications with them.
Practical Application of LLMs
There are multiple techniques that can be used in conjunction with a
model. In this section, I will explore some of them to provide you with
the keys to delve into this technology independently and achieve a better
understanding of it.

Few-Shot Prompting
Few-shot prompting is an interesting technique that can be employed to
instruct an LLM using a very limited amount of data.

The idea is to supply your model with some examples of what you expect
so it can replicate them directly. For instance, in the code below, I ‘teach’
the model a desired output — in this case, a mermaid mindmap — so that
it can produce similar mindmaps in the future.

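import openai  # legacy (pre-1.0) OpenAI Python SDK, which exposes ChatCompletion

# Function to generate a mindmap (few-shot technique).
# NB: the more shots you add, the better the result will be.
# NB: the three long "content" strings are truncated in the source and are
# reproduced as-is; complete them with your own prompt and example.
def run_models(input_text):
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {
                # System message: describes the task and the expected output
                "role": "system",
                "content":"You are tasked with creating an in-depth mindmap designed
            },
            {
                # Example user input (one "shot")
                "role": "user",
                "content": "Title: \ud83e\udda0 Lazarus Group's Infrastructure Reuse
            },
            {
                # Example of the output expected for that input
                "role": "assistant",
                "content": "mindmap\nroot(Lazarus Group Threat Analysis)\n (Infra
            },
            # The real input to process
            {"role": "user", "content": input_text},
        ],
    )
    return response.choices[0].message['content']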
In the code above, I provide some examples to clarify what I’m expecting
for the output. The information breaks down as follows:

System: I assign the role of “system” to my tool and detail what I anticipate from this system. For this example, I’m aiming for a mindmap.

User: The second message designates the role of “user.” It presents an example of user input.

A: With the “assistant” role (representing the model), I provide an illustration of the expected output, in this instance the Mermaid mindmap code.

Finally, I pass the real user input as the last message, allowing the assistant to generate the subsequent mindmap based on that input.
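
The “more shots” idea generalizes naturally: the message list can be built from any number of example pairs. The helper below is a minimal sketch of that pattern. The name `build_few_shot_messages` and the placeholder examples are hypothetical, and the resulting list is simply the structure that chat-style APIs expect for their `messages` argument.

```python
def build_few_shot_messages(system_prompt, shots, user_input):
    """Build a chat message list: system prompt, example pairs, then the real input."""
    messages = [{"role": "system", "content": system_prompt}]
    for example_input, example_output in shots:
        # Each shot is a user message followed by the assistant reply we want to see
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": user_input})
    return messages


# Placeholder shots; real ones would contain full reports and full mindmaps
shots = [
    ("Title: Example report A", "mindmap\n  root(Example A)"),
    ("Title: Example report B", "mindmap\n  root(Example B)"),
]
messages = build_few_shot_messages(
    "You create Mermaid mindmaps from threat reports.",
    shots,
    "Title: New report",
)
print([message["role"] for message in messages])
```

Storing the shots as data rather than hard-coding them makes it easy to add, remove, or swap examples while you evaluate which ones improve the output.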

An example of the resulting mindmap can be seen below:

The Intel Brief Mindmap Example


I wrote another blogpost about this technique and how I implemented
this concept for my newsletter. Click here to learn more about it!

Retrieval Augmented Generation (RAG)


The models we use are trained on a specific set of data up to a particular
date. This implies that more recent data might not be recognized by the
model, and most importantly, your personal/private data isn’t known to it
either.

RAG presents an interesting approach that enables you to supplement the model with your own data, thereby expanding its capabilities. RAG is a technique that melds retrieval-based and generative models.

Two Phases: Retrieval & Generation

Retrieval: This phase searches the database built from the data you provided.

Generation: This phase produces a context-relevant response based on the information retrieved from your database.
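
To make the two phases concrete, here is a deliberately simplified sketch in plain Python. Retrieval is approximated by counting shared words between the question and each document (real systems use embeddings and a vector store instead), and generation is represented by the prompt that would be sent to the model. All function names and the three sample “documents” are made up for this illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, document):
    """Count how many query words also appear in the document."""
    return sum((Counter(tokenize(query)) & Counter(tokenize(document))).values())


def retrieve(query, documents, k=2):
    """Retrieval phase: return the k documents that best match the query."""
    ranked = sorted(documents, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:k]


def build_rag_prompt(query, documents, k=2):
    """Generation phase input: the question plus the retrieved context."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


documents = [
    "Group A sends spearphishing emails with malicious attachments.",
    "Group B exploits public-facing web servers.",
    "Group C uses spearphishing links to steal credentials.",
]
prompt = build_rag_prompt("Which groups use spearphishing?", documents, k=2)
print(prompt)
```

The model never sees the whole database, only the few passages selected for the question, which is why retrieval quality largely determines answer quality.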

The primary objective here is to enhance a model using your data. But
how does it work under the hood?

RAG operates in multiple stages. The subsequent diagram offers a streamlined visualization of the process.
RAG Overview

My friend, Roberto Rodriguez, conducted in-depth research on this topic using the MITRE ATT&CK Groups as a data source.

For the sake of this blog, I’ve adapted his code to be compatible with
Jupyter Notebook and create an interface using pywidget. I’ll walk you
through each step to construct your own RAG. In this example, we used
LangChain, which is an open-source library designed for interacting with
an LLM.

Prepare Your Data (No, Really!)


You might have heard that when working with machine learning, deep
learning, or AI models, it’s essential to clean your dataset. This step is
crucial for obtaining the most accurate results.

Ensuring that your entire dataset is well-formatted and consists of clean data is of utmost importance. Once your data is prepped, you can begin crafting your RAG.

In this example, we used data exported from the MITRE ATT&CK groups. After downloading the data to your local system, you can begin loading it using LangChain.

Note: For this example, the data is stored in Markdown format, but you
can use any type of data.

import glob
import os

from langchain.document_loaders import UnstructuredMarkdownLoader

# knowledge_directory must point to the folder that holds the exported Markdown files
# Using glob to find all Markdown files in the knowledge_directory
# The "*.md" means it will look for all files ending with .md (Markdown files)
group_files = glob.glob(os.path.join(knowledge_directory, "*.md"))

# Initializing an empty list to store the content of Markdown files
md_docs = []

# Loop through each Markdown file path in group_files
for group in group_files:
    # Create an instance of UnstructuredMarkdownLoader for the current file
    loader = UnstructuredMarkdownLoader(group)
    # Load the content and extend the md_docs list with it
    md_docs.extend(loader.load())

Here, we load the ATT&CK group knowledge into our RAG.

Tokenization
Tokenization is the process of converting a sequence of text into individual units, known as “tokens.” These tokens can range from being as small as characters to as long as words, depending on the specific needs of the task and the language in question. Tokenization is an essential pre-processing step in Natural Language Processing (NLP) and text analytics models. It can be done with the tiktoken library.

In our context, tokenization isn’t strictly required. However, it proves beneficial if you want to control the amount of data sent to the model, for both optimization and cost control.

Splitting into Smaller Chunks


Dividing your imported data into smaller chunks is a strategy designed to make it easier for the model to access the imported data.

In this instance, we’re using the `RecursiveCharacterTextSplitter` from LangChain. This method attempts to divide the text based on a set sequence of characters until the resulting chunks reach a desired size. By default, the characters used for splitting are ["\n\n", "\n", " ", ""]. The method strives to maintain the integrity of paragraphs, sentences, and words as they’re typically semantically connected. By default, the size of each chunk is measured by its character count; a custom length function can measure it in tokens instead.
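
The recursive idea can be sketched in a few lines of plain Python. This is a simplified illustration of the strategy described above, not LangChain’s actual implementation: it measures size in characters, ignores chunk overlap, and the function name `recursive_split` is hypothetical.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Last resort (empty separator or none left): cut at fixed character offsets
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    separator, finer_separators = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too big: retry with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, finer_separators))
    if current:
        chunks.append(current)
    return chunks


text = (
    "Para one is short.\n\n"
    "Para two is a bit longer than the first one.\n\n"
    "Three."
)
chunks = recursive_split(text, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

In this run the short paragraphs survive intact, and only the paragraph that exceeds the limit is broken further, at a word boundary rather than mid-word.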

The following code demonstrates how to employ this method with our
MITRE ATT&CK Groups data.

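import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

# tiktoken_len is not defined in the original excerpt; this definition and the
# choice of the cl100k_base encoding are assumptions
tokenizer = tiktoken.get_encoding("cl100k_base")

def tiktoken_len(text):
    return len(tokenizer.encode(text))

# Create an instance of RecursiveCharacterTextSplitter with specified parameters
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,  # Maximum number of tokens in each chunk
    chunk_overlap=50,  # Number of tokens that will overlap between adjacent chunks
    length_function=tiktoken_len,  # Function to calculate the number of tokens in a text
    separators=['\n\n', '\n', ' ', ''],  # List of separators used to split the text
)

# Split the loaded documents into chunks (the FAISS step expects `chunks`)
chunks = text_splitter.split_documents(md_docs)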

Embeddings
Embeddings provide a means to convert words or phrases into numerical
representations, or vectors, so they can be easily processed by
computers. Why is this useful? By transforming text into numerical form,
it becomes simpler to gauge the similarity between words or sentences,
facilitating tasks such as search and classification.

A vector, in essence, is a list of numbers. In embeddings, each number in this list captures some aspect or feature of the text. Such vectors allow computers to grasp and compare concepts. For instance, the vector for “apple” might bear more similarity to the one for “fruit” than to that of “car.” This helps a computer discern that apples are more akin to fruits than to vehicles.
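
A common way to measure that similarity is cosine similarity, which compares the direction of two vectors. The sketch below uses hand-made three-dimensional vectors purely for illustration; they are not real embeddings, which typically have hundreds or thousands of dimensions and come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors invented for this example
toy_embeddings = {
    "apple": [0.9, 0.8, 0.1],
    "fruit": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

apple_fruit = cosine_similarity(toy_embeddings["apple"], toy_embeddings["fruit"])
apple_car = cosine_similarity(toy_embeddings["apple"], toy_embeddings["car"])
print(apple_fruit > apple_car)
```

A score close to 1 means the two vectors point in nearly the same direction; a score close to 0 means they have little in common.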

In simpler terms, embeddings convert text into vectors. As you might glean, these vectors provide a convenient means to store data for our RAG and model.

In the example below, we use FAISS. Developed by Facebook, FAISS aids in swiftly identifying items that resemble a particular item based on their numerical (vector) representation. To illustrate, imagine a vast library of books, and you wish to pinpoint the ones most similar to a specific title. FAISS expedites this task, even with an extensive collection.

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()

# Send text chunks to OpenAI Embeddings API and index the vectors with FAISS
db = FAISS.from_documents(chunks, embeddings)

# Return the 5 most similar chunks for each query
retriever = db.as_retriever(search_kwargs={"k":5})
query = "What are some phishing techniques used by threat actors?"
print("[+] Getting relevant documents for query..")
relevant_docs = retriever.get_relevant_documents(query)

Alright, our retriever is now up and running. The next step is to integrate
this retriever with our LLM.

Retriever and LLM


Once we can interact with our data, we can then employ our LLM to formulate the expected answer. The screenshot below shows the Jupyter notebook with the code discussed.

Jupyter Notebook with RAG and ATT&CK

We now have our RAG operational. But one thing that’s bothersome is
that our model doesn’t remember what we’ve discussed previously…
RAG + Memory
Being able to interact with your own data is quite powerful; you can
essentially feed any type of data and let your LLM work with your
personalized or internal data.

However, as seen in our previous example, the model doesn’t retain the
memory of prior interactions, which can be somewhat frustrating when
trying to gather multiple pieces of information about the same threat
actor.

By configuring memory in your RAG tools, you can maintain a record of previous interactions, ensuring a continuous flow of information without needing to pose the same questions repeatedly.

This can be seamlessly achieved using LangChain.
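
Conceptually, this kind of memory is just a buffer of past turns that gets replayed in every new prompt. The class below is a minimal plain-Python sketch of that idea; the name `ConversationBuffer`, its methods, and the placeholder answer are hypothetical and are not part of any library.

```python
class ConversationBuffer:
    """Keep past question/answer pairs and replay them in each new prompt."""

    def __init__(self):
        self.turns = []

    def add_turn(self, question, answer):
        """Record one completed exchange."""
        self.turns.append((question, answer))

    def build_prompt(self, question):
        """Prefix the new question with the full chat history."""
        history = "\n".join(
            f"Human: {past_question}\nAssistant: {past_answer}"
            for past_question, past_answer in self.turns
        )
        return f"Chat History:\n{history}\nFollow Up Input: {question}\nAnswer:"


memory = ConversationBuffer()
memory.add_turn("Who is APT29?", "(model answer about APT29)")
prompt = memory.build_prompt("Which techniques do they use?")
print(prompt)
```

Because the earlier exchange is included in the prompt, the model has what it needs to resolve “they” in the follow-up question. The trade-off is that the prompt grows with every turn, which is one reason token counting matters.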

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain.chat_models import ChatOpenAI
from langchain.prompts.prompt import PromptTemplate
import json

# Initialize your LangChain model
model = ChatOpenAI(model_name="gpt-4", temperature=0.3)

# Initialize your retriever (assuming 'db' is the vector store built earlier)
retriever = db.as_retriever(search_kwargs={"k": 8})

# Define your custom template
# NB: the first line of the template is truncated in the source
custom_template = """You are an AI assistant specialized in MITRE ATT&CK and you i
Chat History:
{chat_history}
Follow Up Input: {question}
Answer: """
CUSTOM_QUESTION_PROMPT = PromptTemplate.from_template(custom_template)

# Initialize memory for chat history
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Initialize the ConversationalRetrievalChain
# NB: the call below is truncated in the source
qa_chain = ConversationalRetrievalChain.from_llm(model, retriever, condense_questi

Now, we can initiate a dialogue with our RAG:

RAG with Memory

To make things easier, I’ve put together a comprehensive Jupyter notebook available on my website for you to tailor the code to your specific needs.

ReAct and Agents


The next concept I’d like to discuss is the ReAct framework and the agent
features offered by LangChain.

ReAct is a logical framework designed for crafting intelligent agents. Its chief purpose is to endow agents with the ability to carry out complex tasks through a series of actions. Central to ReAct are two core components: ‘Reason’ and ‘Act’. The ‘Reason’ facet reflects the agent’s cognitive process, where it ponders and decides the subsequent action. In contrast, ‘Act’ symbolizes the tangible action the agent executes based on its prior reasoning.

You can think of ReAct’s operational flow as an “Action → Observation → Thought Cycle”. Initially, the agent performs an action. It then observes and evaluates the results of that action. After observing, the agent ponders or reasons about its next step. This iterative process ensures the agent continually adapts and responds to the dynamic conditions of its surroundings.
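
The cycle can be sketched without any LLM at all. In the toy loop below, a scripted function stands in for the model and a hard-coded lookup stands in for a real tool, so every name here (`scripted_model`, `lookup_reputation`, `react_loop`) is hypothetical. The point is only to show the control flow: reason, act, observe, repeat until the “model” decides it can answer.

```python
def lookup_reputation(ip_address):
    """Stand-in tool: a real agent would query a threat intelligence service."""
    known_bad = {"203.0.113.7"}  # documentation-range address used as fake data
    return "malicious" if ip_address in known_bad else "no detections"


TOOLS = {"lookup_reputation": lookup_reputation}


def scripted_model(question, observations):
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not observations:
        return ("I need reputation data for this IP.", "lookup_reputation", question)
    return (
        "I have enough information to answer.",
        "finish",
        f"{question} is {observations[-1]}",
    )


def react_loop(question, max_steps=5):
    """Alternate reasoning and acting until the model finishes or steps run out."""
    observations, trace = [], []
    for _ in range(max_steps):
        thought, action, action_input = scripted_model(question, observations)
        trace.append(("thought", thought))  # Reason
        if action == "finish":
            return action_input, trace
        observation = TOOLS[action](action_input)  # Act
        trace.append(("action", f"{action}({action_input})"))
        observations.append(observation)  # Observe
        trace.append(("observation", observation))
    return "No answer within the step limit.", trace


answer, trace = react_loop("203.0.113.7")
for kind, content in trace:
    print(f"{kind}: {content}")
print(answer)
```

The step limit matters in practice: with a real model in the loop, nothing guarantees that it will ever choose to finish.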

Source: https://peterroelants.github.io/posts/react-repl-agent/

This notion is incredibly powerful and can be seamlessly integrated with various tools. Remember, in LangChain, an agent can represent anything, allowing you to essentially craft your own applications atop this foundation.

In the example that follows, I’ve employed the agent functionality of LangChain in synergy with MSTICpy, constructing an agent that leverages MSTICpy’s features.

NB: MSTICpy is a Python library dedicated to threat intelligence investigation.

from msticpy.sectools.tilookup import TILookup
from langchain.chat_models import ChatOpenAI
from langchain.agents import Tool
from langchain.agents import initialize_agent
from langchain.agents import AgentType

llm = ChatOpenAI(model_name="gpt-4", temperature=0.3)

# NB: several lines below are truncated in the source and are reproduced as-is.
# NB: vt_lookup is not defined in this excerpt; it is assumed to be a VirusTotal
# lookup client created elsewhere in the notebook.
class TIVTLookup:
    def __init__(self):
        self.ti_lookup = TILookup()

    # Look up an IP address and return the (size-limited) raw result
    def ip_info(self, ip_address: str) -> str:
        result = self.ti_lookup.lookup_ioc(observable=ip_address, ioc_type="ipv4"
        details = result.at[0, 'RawResult']
        sliced_details = str(details)[:3500]
        return sliced_details

    # Retrieve the samples that communicate with an IP address
    def communicating_samples(self, ip_address: str) -> str:
        domain_relation = vt_lookup.lookup_ioc_relationships(observable = ip_addre
        return domain_relation

    # Retrieve the details of a sample from its hash
    def samples_identification(self, hash: str) -> str:
        hash_details = vt_lookup.get_object(hash, "file")
        return hash_details

ti_tool = TIVTLookup()

# Each Tool exposes one method to the agent; the description tells the model
# when to use it
tools = [
    Tool(
        name="Retrieve_IP_Info",
        func=ti_tool.ip_info,
        description="Useful when you need to look up threat intelligence informati
    ),
    Tool(
        name="Retrieve_Communicating_Samples",
        func=ti_tool.communicating_samples,
        description="Useful when you need to get communicating samples from an ip
    ),
    Tool(
        name="Retrieve_Sample_information",
        func=ti_tool.samples_identification,
        description="Useful when you need to obtain more details about a sample."
    ),
]

This example demonstrates how to craft agents. In this scenario, my agent has access to three tools built on MSTICpy:

Retrieve_IP_Info: This function queries VirusTotal for a specific IP address and relays the obtained information back to the model.

Retrieve_Communicating_Samples: This function fetches from VirusTotal the samples that communicate with a particular IP, as provided by the user.

Retrieve_Sample_information: Here, we obtain details about a specific sample.

It’s worth noting that numerous other functions can be integrated into
our code. However, for the purpose of this demonstration, we’ll maintain
simplicity.

# NB: both calls below are truncated in the source and are reproduced as-is
agent = initialize_agent(
    tools, llm=llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False, ag
)
agent.run("Can you give me more details about this ip: 77.246.107.91? How many sam

In the provided example, I seek information regarding a specific IP. What’s remarkable about pairing agents with LLMs is the model’s ability to determine which tool to invoke based solely on the given description. The description you provide is crucial, as it’s interpreted as a prompt and serves as a directive for the model.

Upon executing this example, my code calls the MSTICpy tools to get the details. These details are then fed to the model to generate the final response, as illustrated below.

Result from the Agents

In detail, the code will run MSTICpy automatically as shown below.


Under the Hood Agents

Conclusion
In this blog, I explored some interesting LLM features that allow you to
build your own application. I created some proof-of-concept
implementations that can be easily adapted for your own use case.

I started with a deep dive into prompt engineering concepts and few-shot
learning, and then looked at how to build a RAG with your own data.
Lastly, I discussed Agents and how they can be used in conjunction with
your existing tools.

I hope you enjoyed the journey. If you want to explore more about these
concepts, check out the resources below.
That’s it! If you like this blog, you can share it and like it. You can follow
me on Twitter @fr0gger_ for more stuff such as this one. ❤

You can also subscribe to my newsletter ‘The Intel Brief’


Resources
OTRF/GenAI-Security-Adventures (github.com)

Retrieval Augmented Generation (RAG) | Prompt Engineering Guide (promptingguide.ai)

Agents | Langchain

microsoft/msticpy: Microsoft Threat Intelligence Security Tools (github.com)

https://peterroelants.github.io/posts/react-repl-agent/

TheIntelBrief (securitybreak.io)
