VADI Report


Voice-automated Dialog Intelligence (VADI)

Report
This report includes comprehensive details on how we transformed the project from a rule-based system to an AI-based one, leveraging the latest technologies such as LangChain and large language models.

Technology Stack:

Programming Language: Python

Framework: LangChain – This cutting-edge framework facilitates interaction with large language models. It provides essential functionality such as chains and agents, enabling us to effectively harness the full potential of Large Language Models (LLMs).

Other Python Libraries:


- Seaborn
- Matplotlib
- Pandas
- Graphviz
- Wordcloud

These libraries are utilized for data visualization and plotting.

API: OpenAI

LLMs: We are currently using the gpt-3.5-turbo-16k model, which supports token
lengths up to 16k. In the event that token length exceeds this limit, the code
seamlessly switches to the gpt-4-turbo-preview model, which has a token limit of
128k.
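A minimal sketch of how this switch could be implemented, assuming tiktoken is used to count tokens before each call (the report does not state how the check is done); the 12,000-token headroom threshold below is illustrative:

```python
# Hedged sketch: pick the chat model based on transcript length.
# The model names and 16k / 128k limits come from this report; the
# tiktoken-based counting and the 12,000-token threshold are assumptions.
import tiktoken

def pick_model(transcript: str) -> str:
    """Return the chat model to use for this transcript."""
    encoding = tiktoken.get_encoding("cl100k_base")  # tokenizer for GPT-3.5/4 chat models
    token_count = len(encoding.encode(transcript))
    # Leave headroom for the prompt template and the generated report.
    if token_count < 12_000:
        return "gpt-3.5-turbo-16k"    # 16k context window
    return "gpt-4-turbo-preview"      # 128k context window
```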

Speech to Text: For converting audio to transcription, we employ the Whisper model
by OpenAI.
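A minimal sketch of the transcription call through the OpenAI API; the file name is a placeholder and an OPENAI_API_KEY environment variable is assumed:

```python
# Hedged sketch: transcribe a session recording with OpenAI's hosted Whisper model.
# "session_302.mp3" is a hypothetical file name used only for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("session_302.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",  # Whisper model exposed by the OpenAI API
        file=audio_file,
    )

print(transcription.text)
```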

Activities Since Day One:


From the outset, a substantial amount of time was invested in identifying the optimal
speech-to-text API. Initially, the Google Speech-to-Text API was found to be
unsatisfactory in terms of both results and cost-effectiveness. Through meticulous
research, evaluating accuracy, efficiency, and pricing considerations among various
options such as Microsoft Speech Services, Google Speech-to-Text, AssemblyAI, and
OpenAI Whisper, we concluded that Whisper was the most suitable choice. Not only
is it the most cost-effective model, but it also boasts superior speed and efficiency.
The model is trained on 680,000 hours of supervised data using a transformer architecture, making it a state-of-the-art model.

Subsequent Activities:

Following this, a significant amount of time was dedicated to preparing the reports.
This phase posed a challenge, as our commitment extended beyond ensuring
accuracy in the reports to staying abreast of the latest technology trends. After
thorough consideration and careful search, we ultimately decided to utilize Large
Language Models (LLMs) by OpenAI, integrated with the LangChain framework.
The key advantage of leveraging OpenAI is its unparalleled accuracy. While there are other open-source models available on platforms like Hugging Face, they can only be deployed if we have access to a robust GPU-equipped server to host them.

Background Noise Removal from the Audio:

Despite using noise-cancelling headsets to minimize background noise such as music and ambient sounds, there are instances where the headset picks up the voice of another person, especially during whispered conversations. This unwanted background noise can interfere with accurate transcription. To address this, we employ the Pydub library to remove the background noise from the audio.
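A minimal sketch of the Pydub-based cleanup, assuming a simple high-pass filter plus normalization; the report does not document the exact filtering steps, so the 200 Hz cutoff and file names are illustrative:

```python
# Hedged sketch: attenuate low-frequency background noise with Pydub.
# The cutoff frequency and file names are assumptions for illustration.
from pydub import AudioSegment
from pydub.effects import normalize

audio = AudioSegment.from_file("session_302_raw.wav")

# Cut low-frequency rumble and ambient hum below the speech band.
filtered = audio.high_pass_filter(200)

# Even out the volume so quiet speech is not lost after filtering.
cleaned = normalize(filtered)

cleaned.export("session_302_clean.wav", format="wav")
```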

Why are we using AI for generating reports?

We have transitioned to using AI for generating reports due to limitations in the previous rule-based system. The previous system relied on basic rules and patterns, such as word frequency and n-grams, to generate reports. However, it lacked the ability to capture the semantic nuances present in the transcriptions. Techniques and libraries like TF-IDF (Term Frequency-Inverse Document Frequency), NLTK (Natural Language Toolkit), and Word2Vec (an embedding model introduced in 2013 for converting text to vectors) were extensively utilized in the old system, but they could not provide human-level understanding.
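For illustration only, a minimal sketch of the kind of TF-IDF keyword scoring the old system relied on; scikit-learn is used here purely to show the technique and is not named in the report:

```python
# Hedged sketch of rule-based keyword scoring with TF-IDF.
# Scores only reflect how distinctive a term is; they carry no
# understanding of meaning or context.
from sklearn.feature_extraction.text import TfidfVectorizer

utterances = [
    "The rollout is blocked until the security review is complete.",
    "We agreed to move the security review to next week.",
]  # illustrative transcript lines

vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
matrix = vectorizer.fit_transform(utterances)

# Top-scoring terms for the first utterance.
terms = vectorizer.get_feature_names_out()
scores = matrix[0].toarray().ravel()
print(sorted(zip(terms, scores), key=lambda t: t[1], reverse=True)[:5])
```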

With advancements in deep learning, newer embedding models have emerged, capable of comprehending language at a human-like level. Modern large language models, trained on vast amounts of data, excel in providing efficient solutions. Thus, we have shifted towards leveraging AI in our project to deliver more effective and accurate results.

Comparison between Previous reports and AI generated reports


For comparison, we are considering the July 7 session (session ID 302).

Summary Report
Report generated by previous system:
Report generated by AI system:

The summary generated by the AI system approaches human-level understanding. Since we are using text-generation LLMs such as the GPT models, the text is generated by understanding the context and semantics of the conversation. The summary clearly has two parts: the first runs from the Title to the Conclusion, and the second is a bullet-point summary.
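A minimal sketch of producing such a two-part summary with LangChain and a GPT chat model; the prompt wording, temperature, and transcript file name are illustrative rather than the exact ones used in VADI:

```python
# Hedged sketch: two-part meeting summary (Title-to-Conclusion plus bullet points)
# generated through LangChain. Prompt text and parameters are assumptions.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0)

prompt = ChatPromptTemplate.from_template(
    "You are summarising a meeting transcript.\n"
    "1. Write a structured summary with a Title, a body, and a Conclusion.\n"
    "2. Then add a short bullet-point summary of the key decisions.\n\n"
    "Transcript:\n{transcript}"
)

chain = prompt | llm

with open("session_302.txt") as f:  # hypothetical transcript file
    summary = chain.invoke({"transcript": f.read()})

print(summary.content)
```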
Basic summary path (Flowchart) Report

Report by previous system:


Report generated by AI:

The AI-generated report includes the topics and related points discussed from the start of the meeting to its end. It improves the flowchart structure: one can easily understand the complete flow of the conversation, the main topics, and the related points discussed.
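A minimal sketch of rendering that flow with Graphviz, assuming the LLM has already returned an ordered list of topics with their related points; the topic labels below are placeholders:

```python
# Hedged sketch: draw the conversation flow as a top-to-bottom flowchart.
# The topics list is a placeholder for the structure extracted by the LLM.
from graphviz import Digraph

topics = [
    ("Project status", ["Milestones reviewed", "Timeline risks"]),
    ("Budget", ["Q3 overspend", "Reallocation proposal"]),
    ("Next steps", ["Owner assignments", "Follow-up meeting"]),
]

dot = Digraph(comment="Conversation flow", graph_attr={"rankdir": "TB"})

previous = None
for i, (topic, points) in enumerate(topics):
    node_id = f"t{i}"
    label = topic + "\n- " + "\n- ".join(points)
    dot.node(node_id, label, shape="box")
    if previous is not None:
        dot.edge(previous, node_id)  # link topics in the order they were discussed
    previous = node_id

dot.render("flowchart_302", format="png", cleanup=True)
```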

Important Participant statements Report

Report generated by previous system:


Report generated by AI:
The AI-generated report includes the content related to each mentioned topic, such as Key Points and Opportunities, as shown in the screenshot. It includes only the relevant content rather than all the complete statements picked from the transcription.

Wordcloud Report

Report generated by previous system:


Report generated by AI system:
In the AI-generated report, the process begins by extracting essential keywords from
the conversation, taking into account each speaker's contributions. These keywords
are then used to generate a word cloud. In contrast, the previous system generated the
word cloud directly from the transcription without considering the significance of
individual words. This led to the inclusion of irrelevant words, commonly referred to
as stopwords, which do not contribute to the understanding of the conversation.

The AI-generated report addresses this issue by highlighting only the important
keywords that are crucial for understanding the key topics discussed in the meeting or
the keywords that hold greater significance. This approach ensures that the generated
report provides a more focused and meaningful representation of the conversation.
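A minimal sketch of building the cloud from LLM-extracted keywords rather than from the raw transcription; the keyword weights are placeholders for whatever the extraction step returns:

```python
# Hedged sketch: word cloud from extracted keywords instead of the raw transcript.
# The keyword/weight mapping is a placeholder for the LLM extraction output.
import matplotlib.pyplot as plt
from wordcloud import WordCloud

keyword_weights = {
    "budget": 12,
    "security review": 9,
    "rollout": 7,
    "timeline": 5,
}

cloud = WordCloud(width=800, height=400, background_color="white")
cloud.generate_from_frequencies(keyword_weights)

plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.savefig("wordcloud_302.png", bbox_inches="tight")
```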

Topic words report (Important keywords by each speaker)

Report by previous system:


Report generated by AI system

In this report as well, the main difference is that the AI generates keywords that are relevant to the conversation, rather than keywords that do not carry any significance.

Sentiment Analysis Report:

Report generated by AI
It seems there was an issue with the previous system not generating the sentiment
report, which is why no screenshot is attached. Some reports are not functioning
properly. The attached screenshot shows a portion of the CSV file generated by the AI
system for reference.

The AI-generated sentiment analysis report is based on Aspect-Based Sentiment Analysis. The large language models (LLMs) first extract aspects and then predict a sentiment for each aspect. This type of sentiment analysis is highly accurate because each transcription (sentence) from a speaker can contain multiple sentiments: in some parts of the sentence the speaker may express positivity, while in others they may express negativity or neutrality.
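A minimal sketch of the aspect-based step, assuming a single prompt asks the model to return aspects with a sentiment each; the prompt wording and output format are illustrative, not the exact VADI prompt:

```python
# Hedged sketch: aspect-based sentiment for one utterance via LangChain.
# The prompt wording and expected JSON output format are assumptions.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0)

prompt = ChatPromptTemplate.from_template(
    "Extract the aspects mentioned in the sentence and give each aspect a "
    "sentiment (positive, negative, or neutral). Respond as a JSON list of "
    "objects with 'aspect' and 'sentiment' keys.\n\nSentence: {sentence}"
)

chain = prompt | llm

result = chain.invoke(
    {"sentence": "The demo went well, but the pricing discussion was frustrating."}
)
print(result.content)  # expected: one aspect/sentiment pair per aspect found
```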
In addition to the CSV file, we added functionality to provide visualizations for a better understanding of the sentiment. A pie chart is the best way to examine the percentage of each sentiment. The overall sentiment chart gives an overview of how sentiment percentages are distributed across the conversation. A sentiment distribution for each speaker is also provided, which depicts how positive, negative, or neutral each speaker in the conversation is. We have attached a few screenshots for reference only.
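A minimal sketch of the pie-chart visualizations built from the sentiment CSV with Pandas and Matplotlib, assuming the file has speaker and sentiment columns (the real column names may differ):

```python
# Hedged sketch: overall and per-speaker sentiment distribution pie charts.
# The CSV name and the "speaker"/"sentiment" column names are assumptions.
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("sentiment_302.csv")

# Overall distribution of sentiment across the whole conversation.
df["sentiment"].value_counts().plot.pie(autopct="%1.1f%%", ylabel="")
plt.title("Overall sentiment distribution")
plt.savefig("sentiment_overall.png", bbox_inches="tight")
plt.close()

# One pie per speaker: how positive, negative, or neutral each speaker was.
for speaker, group in df.groupby("speaker"):
    group["sentiment"].value_counts().plot.pie(autopct="%1.1f%%", ylabel="")
    plt.title(f"Sentiment for {speaker}")
    plt.savefig(f"sentiment_{speaker}.png", bbox_inches="tight")
    plt.close()
```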
Large utterance path report

Report generated by previous system


Report generated by AI:

Looking closely at the two reports, the main difference is that the AI-generated report first understands the context and then predicts a category for each sentence. These categories are feature categories assigned per sentence; based on its contextual understanding, the AI predicts the category.
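A minimal sketch of the per-sentence category step; the category list below is a placeholder, since the report does not enumerate the actual feature categories:

```python
# Hedged sketch: predict a feature category for each utterance with an LLM.
# The category names are placeholders, not the real VADI categories.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

CATEGORIES = ["question", "decision", "action item", "concern", "other"]

llm = ChatOpenAI(model="gpt-3.5-turbo-16k", temperature=0)

prompt = ChatPromptTemplate.from_template(
    "Read the utterance and assign exactly one category from this list: "
    "{categories}. Reply with the category name only.\n\nUtterance: {utterance}"
)

chain = prompt | llm

response = chain.invoke(
    {
        "categories": ", ".join(CATEGORIES),
        "utterance": "Can we push the release to next Friday?",
    }
)
print(response.content)
```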
Basic summary path Report:

This report is not generated by the previous system; the AI-generated report screenshot is attached below. This report includes an extra column, VADI Type, which is a mapping of the Utterance category. It therefore combines the sentiment and large utterance path reports into a single CSV file. This is further extended into the discovery.csv file with one additional column containing the important phrases from each sentence, which is used on the discovery page visible on the Unblinker site.
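A minimal sketch of assembling the combined CSV with Pandas; the mapping dictionary, column names, and file names are assumptions, since the report does not spell out the VADI Type mapping:

```python
# Hedged sketch: add the VADI Type column and write the combined CSV files.
# The mapping, column names, and file names are assumptions for illustration.
import pandas as pd

# Per-utterance data produced earlier (sentiment plus utterance category).
df = pd.read_csv("utterances_302.csv")

# Map each utterance category to its VADI Type.
vadi_type_map = {
    "question": "Inquiry",
    "decision": "Resolution",
    "action item": "Task",
}
df["VADI Type"] = df["utterance_category"].map(vadi_type_map).fillna("Other")
df.to_csv("basic_summary_path_302.csv", index=False)

# Extend with important phrases per sentence for the discovery page.
df["important_phrases"] = ""  # filled by a separate LLM extraction step
df.to_csv("discovery_302.csv", index=False)
```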

In closing, all of these AI-generated reports are more effective in terms of accuracy and in capturing contextual and semantic relationships, owing to the intelligence of the LLMs provided by OpenAI.
