Project Report 3


School of Computing Sciences

Department of Computer Science and Engineering

Academic Year: 2023-2024

REPORT SUBMITTED FOR SECOND REVIEW

BATCH No:

Project Title: Integrating Large Language Models for Enhanced Business Process Mining

STUDENT NAME(S)    REGISTER NUMBER(S)    Evaluator 1    Evaluator 2

Vijay V.           20198002

Supervisor Name: __________________________________________


TABLE OF CONTENTS

Sl. No    Content
1         Project Description (Minimum 200 words)
2         Module Description
3         Screenshots of results

PROJECT DESCRIPTION

The project aims to revolutionize the domain of process mining by integrating Large
Language Models (LLMs) into existing methodologies, thereby democratizing access to
process analytics and enhancement. Process mining involves the systematic analysis of
event data to improve operational processes in a data-driven manner. However, traditional
process mining approaches often require expertise from process analysts, data engineers,
and domain experts, making them inaccessible to non-technical users. By leveraging the
capabilities of LLMs, this project seeks to bridge this gap and empower users to extract
valuable insights from their processes with ease.
The project begins by obtaining event logs from the Business Process Intelligence (BPI)
Challenge dataset and preparing them for analysis. This involves cleaning and
preprocessing the data to ensure its quality and reliability. Subsequently, process mining
abstractions are generated from the event logs using established techniques and tools.
These abstractions, which include both event log and process model abstractions, serve as
input to the LLMs during the analysis phase.

Integrating LLMs into the process mining workflow is a crucial step in this project.
Domain-specific keywords and fine-tuned prompts are used to maximize the performance
of the LLMs, ensuring that they provide accurate and relevant insights. The LLMs are
prompted to interpret the process mining abstractions and respond to user inquiries, which
fall into three categories: process understanding, hypothesis formulation, and enhancement
suggestions.

MODULE DESCRIPTION
1. Data Preparation:
Acquire Event Logs: Obtain event logs from the BPI Challenge dataset, ensuring relevance
to the process under investigation.
Preprocessing: Cleanse and preprocess the event logs to ensure consistency and quality.
Steps include removing noise, handling missing data, and encoding categorical variables.
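The cleansing steps above can be sketched in Python with pandas; the column names (`case_id`, `activity`, `timestamp`) are placeholders and would follow the BPI Challenge log's actual schema.

```python
import pandas as pd

def preprocess_event_log(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning for a raw event log before process mining."""
    # Noise removal: drop events missing a case identifier or activity label.
    df = df.dropna(subset=["case_id", "activity"])
    # Parse timestamps so events can be ordered chronologically.
    df = df.assign(timestamp=pd.to_datetime(df["timestamp"]))
    # Sort events within each case by time, as mining tools expect.
    return df.sort_values(["case_id", "timestamp"]).reset_index(drop=True)
```

Categorical encoding would follow as a separate step once the log's attributes are known.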
2. Abstraction Generation:
Utilize pm4py Library: Leverage the pm4py library to generate abstractions of process
artifacts, including variants, DFGs, Petri nets, and more.
Contextual Abstractions: Tailor abstractions to suit the contextual understanding of LLMs,
optimizing their ability to interpret process structures effectively.
Feature Engineering: Extract relevant features from the event logs, such as timestamps,
activity sequences, and case attributes, to enrich the abstractions.
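One such abstraction can be rendered as LLM-readable text; the sketch below assumes the directly-follows graph is a dict mapping (source, target) pairs to frequencies, the shape produced by pm4py's DFG discovery.

```python
def dfg_to_text(dfg: dict) -> str:
    """Render a directly-follows graph as plain text for an LLM prompt."""
    lines = ["Directly-follows relations (source -> target : frequency):"]
    # List edges from most to least frequent so the LLM sees the main flow first.
    for (src, tgt), freq in sorted(dfg.items(), key=lambda kv: -kv[1]):
        lines.append(f"  {src} -> {tgt} : {freq}")
    return "\n".join(lines)
```

Variants and Petri nets would get analogous text serializers tuned to the LLM's context window.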
3. LLM Integration:
Model Selection: Choose appropriate LLMs for the task (OpenAI's GPT-3.5 and GPT-4 via
ChatGPT, Google's Bard, and Microsoft's Bing Chat) based on their capabilities and
compatibility with the task requirements.
Prompt Fine-Tuning: Fine-tune the prompts supplied to the selected LLMs around the
task-specific abstractions to strengthen their grasp of process mining concepts.
Domain-Specific Keyword Incorporation: Integrate domain-specific keywords into the
LLMs to enhance relevance and accuracy in generated responses.
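A minimal sketch of how domain-specific keywords could be folded into a prompt; the keyword list and wording here are illustrative placeholders, not the project's actual prompts.

```python
# Illustrative domain vocabulary; the real list would come from process analysts.
DOMAIN_KEYWORDS = ["bottleneck", "rework", "throughput time", "conformance", "variant"]

def build_prompt(abstraction: str, question: str) -> str:
    """Compose an LLM prompt from a process abstraction and a user question."""
    hint = ", ".join(DOMAIN_KEYWORDS)
    return (
        "You are a process mining analyst.\n"
        f"Use these domain terms where relevant: {hint}.\n\n"
        f"Process abstraction:\n{abstraction}\n\n"
        f"Question: {question}"
    )
```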
4. User Interaction:
Query Formulation: Enable users to pose inquiries about the process, categorized into
understanding, hypothesis formulation, and enhancement suggestions.
Natural Language Interface: Develop an intuitive interface for users to interact with the
LLMs, facilitating seamless communication and query submission.
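The three inquiry categories could be routed with a simple keyword heuristic, sketched below under assumed trigger words; a production system might instead let the LLM itself classify the query.

```python
def classify_query(query: str) -> str:
    """Route a user question into one of the three inquiry categories."""
    q = query.lower()
    # Enhancement suggestions: the user asks how to change the process.
    if any(w in q for w in ("improve", "optimize", "reduce", "speed up")):
        return "enhancement"
    # Hypothesis formulation: the user asks about possible causes.
    if any(w in q for w in ("why", "cause", "hypothes")):
        return "hypothesis"
    # Default: plain process understanding.
    return "understanding"
```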
5. Evaluation:
Performance Metrics: Define performance metrics to evaluate the LLMs' comprehension
of the process, including accuracy, precision, recall, and F1-score.
Benchmarking: Compare the LLMs' performance against baseline models and published BPI
Challenge submissions that use the same event log dataset.
Qualitative Assessment: Conduct qualitative analysis of the LLMs' generated responses to
gauge their relevance, coherence, and actionable insights.
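Given a labelled set of model answers, the quantitative metrics named above follow directly from confusion-matrix counts, as this sketch shows.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall; 0.0 when both are zero.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```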
6. Comparative Analysis:
Cross-LLM Comparison: Compare the performance of different LLMs (GPT-3.5, GPT-4,
Bard, and Bing Chat) in terms of their reliability and effectiveness in automating process
mining tasks.
Feature Importance: Analyze the contribution of domain-specific keywords and other
features in enhancing the LLMs' performance and relevance of generated responses.
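The cross-LLM comparison can be summarized by averaging each model's per-task scores; the model names and numbers below are placeholders, not measured results.

```python
def rank_models(scores: dict) -> list:
    """Order models by mean score across evaluation tasks, best first."""
    mean = lambda xs: sum(xs) / len(xs)
    return sorted(scores, key=lambda model: mean(scores[model]), reverse=True)
```

The same helper could rank ablations (with vs. without domain keywords) for the feature-importance analysis.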
SCREENSHOTS OF RESULTS
