
PROJECT VIVA VOCE

B.Tech. in COMPUTER SCIENCE AND ENGINEERING

December 2023-May 2024

Integrating Large Language Models for
Enhanced Business Process Mining
Project Developed at Royal Dutch Shell Ltd.

Project Developed By
Vijay V. (20198002)
B.Tech CSE IV Year
Project Guide
Dr. Angelina Geetha
Dean E&T
OUTLINE

➢ABSTRACT
➢OBJECTIVE
➢LITERATURE SURVEY
➢PROPOSED SYSTEM
➢METHODOLOGY
➢DATASET
➢SOFTWARE / TOOLS REQUIREMENTS
➢PROJECT PLAN
➢IMPLEMENTATION
➢OUTPUT
➢ANALYSIS

ABSTRACT
As the landscape of process mining undergoes transformative shifts, this research aims to
move beyond the conventional application of Large Language Models (LLMs) as
conversational agents. While acknowledging the progress made in enhancing accessibility
for non-technical users, this research addresses the existing limitations of natural
language querying interfaces within process mining. Our exploration spans diverse
domains, aiming to uncover alternative avenues where Large Language Models can offer
transformative contributions to businesses. From mitigating challenges in existing
interfaces to exploring automated anomaly detection and semantic process modeling, this
research attempts to redefine the landscape of process mining applications through a
comprehensive examination of linguistic potential.

OBJECTIVE

➢To conduct an in-depth analysis of the existing disadvantages and limitations of
natural language querying interfaces in the context of process mining, with a focus
on identifying challenges and areas for improvement.
➢To explore ways of leveraging Large Language Models to enhance the comprehension
of business processes.

LITERATURE SURVEY
Title: Automated Discovery of Business Process Simulation Models from Event Logs [Decision Support Systems - 2020]
Authors: Manuel Camargo, Marlon Dumas
Disadvantages:
▪ The proposed approach does not support conformance checking.
▪ The experimental evaluation is restricted to synthetic event logs.

Title: Smyrida: A web application for process mining and interactive visualization [SoftwareX - 2023]
Authors: Ilias Merkoureas, Antonia Kaouni, Georgia Theodoropoulou
Disadvantages:
▪ Leverages open APIs, which introduces a risk of compromising organizational data privacy.

LITERATURE SURVEY
Title: Abstractions, Scenarios, and Prompt Definitions for Process Mining with LLMs: A Case Study [DBLP - 2023]
Authors: Alessandro Berti, Daniel Schuster, Wil M. P. van der Aalst
Disadvantages:
▪ The assessment of the proposed questions to the querying interface was subjective in nature.
▪ Makes use of GPT-4 by OpenAI, a closed-source model.

Title: Just Tell Me: Prompt Engineering in Business Process Management [International Conference on Business Process Modeling - 2023]
Authors: Kiran Busch, Alexander Rochlitzer, Diana Sola, Henrik Leopold
Disadvantages:
▪ Lack of clarity on how process models or event logs can be expressed in a prompt.
▪ The fixed context-window limitation of a large language model is not addressed.

LITERATURE SURVEY
Title: Utilizing domain knowledge in data-driven process discovery [Computers in Industry - 2022]
Authors: Daniel Schuster, Sebastiaan J. van Zelst, Wil M. P. van der Aalst
Disadvantages:
▪ The suggested algorithms are rule-based and their functionality is limited.

Title: Language Models are Few-Shot Learners [Advances in Neural Information Processing Systems - 2020]
Authors: Tom Brown, Benjamin Mann, Scott Gray
Disadvantages:
▪ Fine-tuning an LLM can be an expensive way to improve accuracy on process-related tasks.

PROPOSED SYSTEM

[Diagram: architecture of the proposed system]
METHODOLOGY

Traditional process mining tools require programming knowledge and domain expertise,
limiting their adoption. This project aims to bridge this gap by using LLMs to automate
the analysis of event logs and answer user queries in natural language.

Components:
• Process Mining Abstractions: Use the pm4py library to extract key process insights
from event logs, including process variants, event streams, log features, and log
attributes. Adapt these abstractions to fit the context window of LLMs (see the
truncation sketch after this list).
• LLM Usage: Test multiple LLMs, including Google Gemini (formerly Bard), OpenAI
ChatGPT 3.5/4, and Microsoft Bing (Copilot), on the prepared process abstractions.
• User Interaction: Design a user interface where users can ask natural language
questions about the process, categorized into three types:
• Process understanding: questions about the flow, bottlenecks, or variations
in the process.
• Hypothesis formulation: generating potential explanations for observed process
behaviour.
• Process enhancement suggestions: proposing optimizations or improvements
based on the LLM's understanding.
• Evaluation: Conduct a comparative analysis using a benchmark event log from the
BPI Challenge. Evaluate the LLMs' performance based on their ability to:
• Match findings of a human expert or reference paper: Compare the LLM's response
with existing analyses to assess its accuracy and completeness.
• Answer user questions correctly and coherently: Evaluate the LLMs' ability to
understand and respond to diverse natural language queries about the process.
• Provide insightful suggestions: Assess the quality and feasibility of the LLMs'
recommendations for process improvement.
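
Because full variants or DFG listings for a real-life log can exceed an LLM's context
window, each textual abstraction is trimmed to a budget before prompting. A minimal
sketch of such a truncation helper; the tokens-to-characters ratio is an assumption,
not a measured value:

def fit_to_context(abstraction: str, max_tokens: int = 6000,
                   chars_per_token: float = 4.0) -> str:
    """Trim a textual process abstraction (e.g., one variant per line,
    ordered by decreasing frequency) to fit an LLM context budget."""
    budget = int(max_tokens * chars_per_token)
    if len(abstraction) <= budget:
        return abstraction
    kept, used = [], 0
    for line in abstraction.splitlines():
        if used + len(line) + 1 > budget:
            break
        kept.append(line)
        used += len(line) + 1
    kept.append("... (truncated to fit the context window)")
    return "\n".join(kept)

Keeping lines in decreasing-frequency order means the most informative variants
survive the cut.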

DATASET

The Business Process Intelligence (BPI) Challenge 2017 dataset is a widely used
benchmark in the field of process mining. It comprises event logs extracted from the
loan application process of a Dutch financial institution. The dataset contains
real-world event data covering the period from January 2016 to February 2017,
capturing the various activities and interactions involved in handling loan
applications. Each event log entry includes timestamps, activity identifiers, case
identifiers, and additional attributes describing the context of the event. The
process covers the submission of a loan application, its validation, the decision
whether or not to make an offer, the applicant's reply, and the validation of the
applicant's decision to accept or decline the offer. The data provided contains
31,509 process instances. The BPI Challenge 2017 dataset serves as a valuable
resource for evaluating process mining algorithms, benchmarking performance, and
advancing research in business process analysis and optimization.
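
A quick way to load and inspect the log with pm4py; the file name is an assumption
(the BPI Challenge 2017 log is distributed as an XES file):

import pm4py

# Load the BPI Challenge 2017 event log (file name assumed).
log = pm4py.read_xes("BPI_Challenge_2017.xes")

# Depending on the pm4py version, read_xes returns a pandas DataFrame
# or an EventLog object; both work with the functions below.
print(pm4py.get_start_activities(log))  # first activities per case, with counts
print(pm4py.get_end_activities(log))    # last activities per case, with counts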

SOFTWARE / TOOLS REQUIREMENTS

1. Process Mining Software:
Celonis: Used for advanced process mining and analysis tasks; requires a compatible
operating system (e.g., Windows, Linux) and sufficient computing resources for
efficient execution.
PM4Py Library: A Python library for process mining tasks; requires a Python
environment (Python 3.10) with its dependencies installed.
ProM: An open-source process mining toolkit; requires a Java Runtime Environment
(JRE) installed on the system.
Disco: Process mining software offering a user-friendly interface and visualization
capabilities; compatible with various operating systems.

2. Programming Languages and Tools:
Python for Data Cleaning: Python is used for data preprocessing and cleaning,
requiring a Python environment with the relevant libraries (e.g., pandas).
SOFTWARE / TOOLS REQUIREMENTS

3. Large Language Models (LLMs):
OpenAI's ChatGPT: Accessed via the OpenAI API; requires a stable internet connection
and an API key.
Google's Gemini: Uses Google Cloud Platform (GCP) services; requires access to GCP
resources and authentication credentials.
Microsoft's Bing (Copilot): Accessed through the Bing Search API; requires internet
connectivity and an API key.

PROJECT PLAN

Completed Modules:
1. Data Preparation:
Acquire Event Logs: Obtain event logs from the BPI Challenge dataset, ensuring
relevance to the process under investigation.
Preprocessing: Cleanse and preprocess the event logs to ensure consistency and quality.
Steps include removing noise, handling missing data, and encoding categorical
variables; a sketch of this step follows.
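
A minimal pandas sketch, assuming the log was exported to CSV; the file name and
column names follow the standard XES attribute naming and are assumptions:

import pandas as pd

df = pd.read_csv("bpi_2017.csv")

# Keep only the columns from which insights are derived.
cols = ["case:concept:name", "concept:name", "time:timestamp"]
df = df[cols]

# Parse timestamps; unparseable values become NaT and are dropped below.
df["time:timestamp"] = pd.to_datetime(df["time:timestamp"], errors="coerce")

# Encode the activity name as a categorical variable.
df["concept:name"] = df["concept:name"].astype("category")

# Celonis rejects datasets containing null values, so drop them.
df = df.dropna().reset_index(drop=True)
df.to_csv("bpi_2017_clean.csv", index=False)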
2. Abstraction Generation:
Utilize pm4py Library: Leverage the pm4py library to generate abstractions of process
artifacts, including variants, DFGs, Petri nets, and more.
Contextual Abstractions: Tailor abstractions to suit the contextual understanding of
LLMs, optimizing their ability to interpret process structures effectively.

PROJECT PLAN

Modules to be Completed:
3. LLM Integration:
Model Selection: Choose appropriate LLMs for the task, including ChatGPT 3.5, ChatGPT
4, Google's Gemini, and Microsoft's Bing, based on their capabilities and compatibility
with the task requirements.
Domain-Specific Keyword Incorporation: Integrate domain-specific keywords into the
LLMs to enhance relevance and accuracy in generated responses.
4. User Interaction:
Query Formulation: Enable users to pose inquiries about the process, categorized into
understanding, hypothesis formulation, and enhancement suggestions.
Natural Language Interface: Develop an intuitive interface for users to interact with the
LLMs, facilitating seamless communication and query submission.

PROJECT PLAN

Modules to be Completed:
5. Evaluation:
Benchmarking: Compare the LLMs' performance against baseline models and a
submission paper using the BPI Challenge event log dataset.
Qualitative Assessment: Conduct qualitative analysis of the LLMs' generated responses
to gauge their relevance, coherence, and actionable insights.
6. Comparative Analysis:
Cross-LLM Comparison: Compare the performance of different LLMs, including ChatGPT
3.5, ChatGPT 4, Google's Gemini, and Microsoft Copilot, in terms of their reliability
and effectiveness in automating process mining tasks.

IMPLEMENTATION
1. Abstraction Generation:
The BPI Challenge 2017 dataset is first cleaned by removing unnecessary columns,
narrowing it down to only those columns from which insights can be derived. Datatypes
are cross-checked, and null values are either replaced or removed, since Celonis does
not allow loading a dataset with null values. Once the cleaned dataset is available,
we create abstractions with the PM4Py library for the artifacts that carry the most
information: the DFG, Petri nets, the variants abstraction, and the log-skeleton
abstraction. A minimal sketch of this step follows.
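
A sketch using pm4py's simplified interface (function names as in recent pm4py
releases); the serialization into prompt-ready text is our own helper, not a pm4py API:

import pm4py

log = pm4py.read_xes("bpi_2017_clean.xes")  # file name assumed

# Variants abstraction: each trace variant with its frequency. Keys are
# tuples of activities (comma-separated strings in older pm4py versions).
variants = pm4py.get_variants(log)
variants_text = "\n".join(
    f"{v if isinstance(v, str) else ' -> '.join(v)} : {n}"
    for v, n in sorted(variants.items(), key=lambda kv: -kv[1])
)

# Directly-follows graph with edge frequencies, plus start/end activities.
dfg, start_acts, end_acts = pm4py.discover_dfg(log)
dfg_text = "\n".join(
    f"{a} -> {b} : {n}"
    for (a, b), n in sorted(dfg.items(), key=lambda kv: -kv[1])
)
# The Petri-net and log-skeleton abstractions are serialized analogously.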

2. Question Taxonomy:
To get a clear understanding, we divide the prompts into three categories: process
understanding, hypothesis testing, and process enhancement. It is vital to divide the
questions into sub-levels to understand the limitations of the underlying LLMs; an
illustrative taxonomy follows.
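
An illustrative encoding of the taxonomy; the example questions are hypothetical
stand-ins, not the exact prompts used in the evaluation:

QUESTION_TAXONOMY = {
    "process_understanding": [
        "What are the most frequent paths through the loan application process?",
        "Where do cases spend the most time waiting?",
    ],
    "hypothesis_testing": [
        "Do applications that receive multiple offers take longer to complete?",
    ],
    "process_enhancement": [
        "Which activities could be automated to reduce throughput time?",
    ],
}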

IMPLEMENTATION

3. Data Ingestion:
The process data can be fed to the LLM in two ways: directly ingesting the
abstractions along with the question as a prompt, or creating a file agent that
connects the LLM to the underlying database using the LangChain framework. However,
considering the cost of LLM APIs with large context windows, the former approach is
used; a sketch follows.

4. Celonis Dashboard:
To verify the answers generated by the LLM, it is important to have a knowledge base.
In addition to the submission papers of the winners of the BPI Challenge contest, we
build a dashboard in Celonis to gain a deeper perspective on the underlying data and
to test a few hypotheses.

IMPLEMENTATION

5. Comparative Analysis:
Generative AI is a field of constant growth, which has led to the emergence of
various high-end open-source and closed-source LLMs. Each LLM has its own
capabilities, since they are built on different architectures. To understand their
potential, we compare the performance of these LLMs by posing the same questions and
evaluating their answers against the existing knowledge base. Since the models are
probabilistic in nature, we reduce bias by prompting each LLM multiple times and
averaging the scores across all responses; a sketch of this loop follows.
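
A sketch of the repeated-prompting loop; the scoring callback stands in for the
manual 0/1/2 rubric described in the analysis, and the five rephrasings per question
are part of the evaluation design:

from statistics import mean
from typing import Callable

def evaluate_question(
    ask: Callable[[str], str],       # e.g., lambda q: ask_llm(dfg_text, q)
    rephrasings: list[str],          # the same question phrased five ways
    score: Callable[[str], int],     # manual rubric: 0, 1, or 2 points
) -> float:
    """Prompt the LLM once per rephrasing and average the rubric scores."""
    return mean(score(ask(q)) for q in rephrasings)

Averaging over rephrasings smooths out the sampling variance of a probabilistic
decoder, so no single lucky or unlucky completion dominates the score.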

OUTPUT

Process Understanding:

[Screenshot: LLM responses to process understanding questions]
OUTPUT

Hypothesis Testing:

[Screenshot: LLM responses to hypothesis testing questions]
OUTPUT

Process Enhancement:

[Screenshot: LLM responses to process enhancement questions]
ANALYSIS
• The responses of the LLMs are then cross-validated against various submission papers,
including those submitted by the KPMG organization and by the academic category winner
of the BPI Challenge 2017.
• Every response from the LLM is awarded a score based on its quality: 2 points if it
correctly answers the question without any guidance, 1 point if it requires additional
prompts to better understand the question, and 0 points if the results are inaccurate.
• Thus, every LLM is asked six questions (two per category), each phrased in five
different ways, allowing each LLM to score a maximum of 20 points per category. Across
process understanding questions, all LLMs yielded satisfactory outcomes, with ChatGPT 4
and Gemini exhibiting superior performance.
• Conversely, ChatGPT 3.5 struggled in hypothesis testing, while Copilot demonstrated
inconsistency. The below-average scores for hypothesis testing indicate that the LLMs
may not be well suited to such tasks.
• In process enhancement inquiries, Gemini and ChatGPT 3.5 performed equally well, while
ChatGPT 4 and Microsoft Copilot delivered robust responses, leveraging their extensive
domain knowledge, with the latter coming out on top.
