Rapport ToumAI
Field of Study :
Performed by:
Idir Yasmine
Mr Imade Benelallam
Contents
1 Introduction
  1.1 Overview of the Real-time Speech Recognition Application
  1.2 Objective and Purpose of the Report
2 Technologies Used
  2.1 FastAPI Framework
  2.2 PyAudio Library
  2.3 Kafka Messaging System
  2.4 Azure Cognitive Services - Speech SDK
  2.5 Python's Threading Module
  2.6 Jinja2Templates for HTML Rendering
  2.7 KafkaProducer and KafkaConsumer
  2.8 Azure Speech Configuration
3 Application Architecture
  3.1 Workflow Description
  3.2 Scalability and Future Considerations
5 Future Enhancements
  5.1 Potential Improvements
    5.1.1 Expanded Language Support
    5.1.2 Enhanced Accuracy
    5.1.3 Interactive Features
    5.1.4 Integration with AI Assistants
    5.1.5 Real-Time Collaboration Tools
6 Conclusion
Abstract
This document is the result of our work carried out as part of our end-
of-studies project. The purpose of this report is to explore audio transcrip-
tion in Moroccan Darija using an artificial intelligence-based approach. It
addresses the challenge of transcribing this specific language and high-
lights the importance of this task. The report details the methodology
implemented, from data preprocessing to the use of a speech-recognition
model, and reports the accuracy achieved. Furthermore, it assesses the model's performance
and provides concrete examples of successful transcriptions in Moroccan
Darija. This report presents an innovative and effective solution for tran-
scribing this language, demonstrating its relevance and utility in various
contexts.
1 Introduction
1.1 Overview of the Real-time Speech Recognition Application
The Real-Time Speech Transcription Application for the Darija language lever-
ages cutting-edge technologies to provide instantaneous transcription services.
Designed to cater specifically to the Darija language, this application facilitates
real-time speech-to-text conversion, allowing users to obtain accurate transcrip-
tions of spoken Darija content. By utilizing Kafka as a messaging system and
integrating Azure Cognitive Services for speech recognition, this application
addresses the growing need for efficient transcription services in the Darija lan-
guage.
The primary functionalities of the application include audio capture, stream-
ing, and recognition of speech input in Darija. It enables users to receive live
transcriptions of spoken content in real-time, fostering accessibility and commu-
nication in the Darija-speaking community. The real-time speech recognition
application is a sophisticated system that enables users to capture live audio
through a microphone, process it using Azure Cognitive Services for speech
recognition, and display the transcribed text in a user-friendly web interface.
Key functionalities of the application therefore include audio capture, live streaming of the audio data, speech recognition in Darija, and display of the resulting transcriptions.
1.2 Objective and Purpose of the Report
This report provides a detailed examination of the architecture and
technologies employed for real-time transcription in the Darija language using
Kafka. It aims to dissect the application's components, workflow, challenges
encountered, and recommendations for potential enhancements.
This report will delve into the intricate details of the application’s architec-
ture, outlining the core components such as PyAudio for audio capturing, Kafka
for seamless message streaming, Azure Cognitive Services for real-time speech
recognition in Darija, and the FastAPI framework combined with Jinja2Templates
for web-based user interface rendering. Additionally, it will explore how these
components collaboratively enable the application’s capability to perform in-
stantaneous transcription in the Darija language.
Furthermore, the report will discuss the scalability options and potential en-
hancements in the application’s architecture to meet the demands of increased
usage and to enhance its overall performance. This report serves multiple pur-
poses aimed at providing a comprehensive understanding of the application:
2 Technologies Used
2.1 FastAPI Framework
FastAPI is a modern, high-performance web framework for building APIs with Python 3.7+. It offers several features such as:
• High Performance: FastAPI delivers high throughput thanks to its
asynchronous foundation on Starlette.
• Automatic API Documentation: It generates interactive API docu-
mentation automatically, facilitating API exploration.
• Data Validation and Serialization: FastAPI performs automatic data
validation and serialization using Pydantic models.
In the application, FastAPI serves as the core framework to create API end-
points, handle HTTP requests, and generate responses for controlling the record-
ing and retrieving transcriptions.
2.2 PyAudio Library
PyAudio provides Python bindings for the cross-platform PortAudio library, enabling audio capture and playback. In the application, PyAudio is used to capture audio data in chunks from the
microphone; these chunks are then forwarded for speech recognition.
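A minimal sketch of this chunked capture loop follows. Because opening a real microphone requires PyAudio and audio hardware, the loop is written against the stream's `read(num_frames)` interface only; the chunk size of 1024 frames of 16-bit mono audio is an assumption rather than the application's documented setting.

```python
# Sketch of chunked microphone capture in the style of PyAudio.
# Opening the real input stream would look like:
#
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
#                    frames_per_buffer=1024, input=True)

CHUNK_FRAMES = 1024  # frames per read; 16-bit mono => 2048 bytes per chunk

def capture_chunks(stream, num_chunks):
    """Read num_chunks audio chunks from a PyAudio-like stream."""
    chunks = []
    for _ in range(num_chunks):
        data = stream.read(CHUNK_FRAMES)  # blocks until a full chunk is ready
        chunks.append(data)
    return chunks
```

Each chunk returned here is what the application would then hand to the Kafka producer for streaming.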
2.3 Kafka Messaging System
Apache Kafka is a distributed event-streaming platform known for its role as a message broker. Its integration in
the application involves:
• Message Queuing: Kafka enables the streaming of audio data chunks
from the recording module to the Azure Cognitive Services component.
• Reliable Communication: It ensures reliable communication between
different components of the application by handling data streaming effi-
ciently.
Kafka’s message queuing mechanism facilitates the smooth transfer of audio
data for further processing in the application.
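The queuing step can be sketched as follows. The topic name `audio-stream` is illustrative, and the function depends only on the producer's `send`/`flush` interface so that a real `KafkaProducer` (shown in the comment) can be substituted directly.

```python
# Sketch of streaming audio chunks through Kafka. Creating the real
# producer would look like:
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")

AUDIO_TOPIC = "audio-stream"  # illustrative topic name

def stream_chunks(producer, chunks, topic=AUDIO_TOPIC):
    """Publish each raw audio chunk; relies only on producer.send/flush."""
    for chunk in chunks:
        producer.send(topic, chunk)
    producer.flush()  # ensure queued messages are delivered before returning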
2.5 Python's Threading Module
Python's threading module enables concurrent execution of code, managing multiple tasks simultaneously. Its use involves:
• Concurrent Tasks: Threading manages concurrent processes within the
application, such as audio recording and continuous speech recognition.
• Non-blocking Execution: It ensures non-blocking execution, prevent-
ing tasks from halting the entire application.
Threading in the application ensures the smooth functioning of different pro-
cesses without blocking the main execution flow.
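A minimal sketch of this producer/consumer split, using a background thread and a `queue.Queue` to hand chunks off without blocking the main flow (the chunk payloads here are placeholders, not real audio):

```python
import queue
import threading

audio_queue = queue.Queue()  # hands chunks from the recorder thread onward

def record_audio(num_chunks):
    # Stand-in for the PyAudio capture loop; payloads are placeholders.
    for i in range(num_chunks):
        audio_queue.put(f"chunk-{i}".encode())

# The recorder runs in its own thread so capture never blocks the main flow.
recorder = threading.Thread(target=record_audio, args=(3,), daemon=True)
recorder.start()
recorder.join()  # in the real app this thread runs until recording stops

collected = [audio_queue.get() for _ in range(audio_queue.qsize())]
```

The queue decouples the capture rate from the recognition rate, which is the same role Kafka plays between processes.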
3 Application Architecture
3.1 Workflow Description
The architecture of this application revolves around a seamless interaction among
several essential components, each playing a crucial role in the real-time tran-
scription process in Darija: PyAudio captures microphone audio, Kafka streams the audio chunks between components, Azure Cognitive Services performs the speech recognition, and FastAPI with Jinja2Templates renders the transcriptions in the web interface.
4.2 Speech Recognition Workflow
The speech recognition phase involves Azure Cognitive Services, specifically
tailored to enable real-time speech-to-text conversion for the Darija language.
Once the audio data reaches Kafka, it is received by components responsible for
integration with Azure Cognitive Services.
Azure Cognitive Services employs sophisticated algorithms and language
models trained for Darija voice recognition. Upon receiving audio segments,
these services execute a series of recognition processes, analyzing and interpret-
ing spoken language patterns to generate accurate textual representations.
The process involves breaking down the received audio into recognizable
speech components, utilizing various linguistic and acoustic models. The result
is the transformation of spoken Darija language into text, which is then relayed
back for further utilization, such as displaying live transcriptions on the user
interface.
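A sketch of how this recognition stage might be wired up with the Azure Speech SDK follows. The locale `ar-MA` (Arabic, Morocco) and the key/region parameters are assumptions, and the SDK import is kept inside the builder function so the event handler above it stays independently testable.

```python
def collect_result(evt, transcripts):
    """Handler for the SDK's `recognized` event: keep non-empty hypotheses."""
    text = evt.result.text
    if text:
        transcripts.append(text)

def build_recognizer(key, region, transcripts):
    """Configure continuous recognition fed by a push audio stream."""
    # Imported here so collect_result stays testable without the SDK.
    import azure.cognitiveservices.speech as speechsdk

    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_recognition_language = "ar-MA"  # assumed closest locale
    stream = speechsdk.audio.PushAudioInputStream()  # fed via stream.write(chunk)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=config,
        audio_config=speechsdk.audio.AudioConfig(stream=stream),
    )
    recognizer.recognized.connect(lambda evt: collect_result(evt, transcripts))
    recognizer.start_continuous_recognition()
    return recognizer, stream
```

In this arrangement, each audio chunk consumed from Kafka is written to the push stream, and the `recognized` callback appends finalized text for the web interface to display.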
5 Future Enhancements
5.1 Potential Improvements
5.1.1 Expanded Language Support
To broaden the application’s accessibility, consider extending its language sup-
port beyond Darija. This expansion aims to make the application more inclusive
by accommodating a wider range of languages. By integrating voice recogni-
tion for various languages, the application will cater to a more diverse audience,
enhancing its usefulness and appeal to a broader user base.
5.1.4 Integration with AI Assistants
Integrating the application with AI assistants could enable more personalized interactions. This integration can significantly enhance the
application's usefulness by offering more sophisticated assistance and interaction
to users.
6 Conclusion
This project has presented a real-time speech transcription application designed
specifically for the Darija language, utilizing Kafka and Azure Cognitive Services.
The application’s architecture, workflow, and functionalities were comprehen-
sively detailed, covering audio capture, streaming, speech recognition, and user
interface rendering.
In conclusion, this application stands as a promising solution in overcoming
language barriers. Its real-time transcription capabilities hold significant poten-
tial in facilitating effective communication. With continuous advancements and
future enhancements, the application is poised to become an indispensable tool
for enabling inclusive and efficient real-time transcription, fostering seamless
communication across diverse linguistic backgrounds.