
Réf : PFA2-2024-13

End of Year Project Report
of the Second Year in Software Engineering

Presented and publicly defended on 09/04/2024


By
Aicha BEN FADHEL
Mariem SAYEDI
Asma CHALLOUF

Object and text recognition mobile application for the visually impaired

Supervised by: Mr. Mehrez BOULARES

President of the jury: Mrs. Sonda CHTOUROU

Academic Year : 2023-2024

5, Avenue Taha Hussein – Tunis    Tel.: 71 496 066
B.P. 56, Bab Menara 1008    Fax: 71 391 166
Acknowledgments

As we come to the end of this journey, we wish to express our deepest

gratitude to our parents. Their unwavering support and encouragement have


been the bedrock of our success, guiding us to this significant milestone in our

educational pursuit.
Additionally, we extend our sincere appreciation to all those who have played

a part in bringing this project to fruition.


We owe a debt of gratitude to our supervisor, Mehrez Boulares, whose

guidance and support have been indispensable throughout this end-of-year


project at The Higher National Engineering School of Tunis (ENSIT). His

mentorship, encouragement, and constructive feedback have profoundly


influenced our approach, shaping our work with precision and excellence. We

are thankful for his openness to our ideas while maintaining a discerning eye
on our progress.

Our heartfelt appreciation also extends to everyone who has supported us along
our academic journey. We extend special acknowledgment to the members of

the jury who generously dedicated their time and expertise to evaluate our
work, providing invaluable insights and contributing to our growth.

ENSIT has been instrumental in shaping our academic and personal growth,
and we are deeply grateful for the opportunities it has provided us.

Table of Contents

General Introduction 1

1 General Framework 2
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Presentation of the subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Study of the existing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Project management method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Analysis and Design 6


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Actors’ Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Analysis of the requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 Functional needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.2 Non-functional needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.3 Use case diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.4 Use case description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1 The sequence diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Deep Learning models and libraries 13


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 MobileNet V2 model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.1 Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.3 Use of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 OpenCV library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3.1 Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Realization and Implementation 24


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Work Environment and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.1 Technical Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Working Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.3 Packages and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.4 Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Overview of the app . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 Home interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.2 Object recognition interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.3 Text recognition interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

General Conclusion 39

List of Figures

1.1 CRISP-DM [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1 Global use case diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9


2.2 The sequence diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1 Logo of Mobilenet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


3.2 Architecture of Mobilenet[4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Bottleneck and inverted residuals of MobileNet [5] . . . . . . . . . . . . . . . . . . . 16
3.4 MobilenetV2-Predict image[6] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 MobilenetV2-Use of the model[6] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.6 MobilenetV2-Train/test command[6] . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.7 Image classification for one hundred classes[6] . . . . . . . . . . . . . . . . . . . . . . 18
3.8 Mobilenet-Performance[7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.9 Logo of OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.10 Text recognition model[9] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.11 Text detection model[9] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.12 Loading images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.13 Setting parameters and inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.1 Logo of Android Studio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


4.2 Logo of Firebase ML kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Logo of TensorFlow Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 First permission request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Second permission request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Logo of the app . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.7 Splash interface and slogan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.8 Home interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.9 Some of the detected objects in English . . . . . . . . . . . . . . . . . . . . . . . . 31
4.10 Some of the detected objects in French . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.11 Some of the detected objects in Arabic . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.12 Some of the detected objects in Arabic . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.13 Recognized text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.14 Recognized text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.15 Recognized text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.16 Recognized medication leaflet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

List of Tables

2.1 Textual description of the use case « Recognize text » . . . . . . . . . . . . . . . . . 10

2.2 Textual description of the use case « Recognize objects » . . . . . . . . . . . . . . . . 10

General Introduction

Visual impairment can significantly impact one’s ability to interact with the world effectively.
Tasks that are routine and effortless for sighted individuals, such as identifying objects, reading texts,
or navigating unfamiliar places, can pose substantial challenges for those with visual impairments.
This underscores the importance of developing assistive technologies that cater specifically to the
needs of visually impaired individuals. Innovation in this area is not just a matter of convenience;
it is a moral imperative and a societal responsibility. Engineers and technology developers play a
crucial role in advancing solutions that enhance accessibility and inclusion for all.
As software engineering students, we have chosen to develop a specialized mobile application
dedicated to assisting visually impaired individuals in order to promote inclusion and accessibility.
This decision stems from our belief in the transformative power of technology to create positive social
impact. Our application aims to leverage artificial intelligence algorithms to provide real-time object
identification for users with visual impairments, empowering them to navigate their surroundings
more independently. By focusing on the needs of this underserved community, we aspire to contribute
towards a more inclusive society where everyone, regardless of ability, has equal access to the tools
and resources needed to thrive. Through this project, we are committed to bridging the gap between
technology and accessibility, fostering empathy-driven innovation, and advocating for the rights and
empowerment of individuals with disabilities. We view this endeavor not just as a technical challenge,
but as a meaningful opportunity to make a tangible difference in the lives of others and promote a
culture of inclusivity within the field of software engineering.
This report summarizes the steps in the implementation of this system. It is structured into
four chapters as follows:
— The first chapter presents the general framework of the project.

— The second chapter presents the functional analysis and design of our application in order to
specify the requirements.

— The third chapter presents the deep learning models and libraries.

— The fourth chapter presents the realization and implementation of our solution.

We conclude this report with a general conclusion and some potential perspectives that can improve
our solution.

Chapter 1

General Framework

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Presentation of the subject . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3 Study of the existing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

4 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

5 Project management method . . . . . . . . . . . . . . . . . . . . . . . . . 4

6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.1 Introduction

Studying a project is a strategic approach that gives us a clear vision of it and helps organize its smooth running. In this first chapter, we present the context of the project as well as a study of the existing situation, the proposed solution, and the project management method that we followed.

1.2 Presentation of the subject

Object recognition for visually impaired assistance revolves around the development and
implementation of technologies aimed at helping individuals with visual impairments identify objects
in their surroundings. This area of study is critical due to the challenges faced by visually impaired
individuals in recognizing and interacting with objects independently. By leveraging computer vision
and artificial intelligence techniques, researchers and developers are exploring methods to enable
real-time object recognition through devices like smartphones or wearable technology.

In our project, we are tackling the challenge of improving object recognition for visually
impaired individuals without the need for specialized hardware. Our approach involves leveraging
computer vision and artificial intelligence techniques to enable real-time object identification using
smartphones or other accessible devices. The goal is to develop an application that harnesses the
power of these technologies to assist users in identifying objects in their environment efficiently
and accurately. Additionally, our application includes text detection capabilities, allowing users to
identify text in their surroundings. Through the use of innovative algorithms and user-friendly
interfaces, we aim to create a tool that enhances accessibility and promotes independence for
individuals with visual impairments, ultimately empowering them to interact more confidently with
their surroundings.

1.3 Study of the existing

In examining the existing landscape of applications in this field, it’s evident that there are
several offerings, although not all are specifically designed to cater to the needs of visually impaired
individuals. While these applications vary in their focus and functionality, they generally lack the
comprehensive adaptation required to effectively support users with visual impairments.


1.4 Proposed solution

Our project is distinct in its aim to address this gap by focusing on improving object
recognition and text detection specifically for the visually impaired, without relying on specialized
hardware. Through the utilization of computer vision and machine learning techniques, we intend
to develop a user-friendly application that can be accessed on commonly available devices like
smartphones. This approach seeks to empower visually impaired individuals by providing them
with efficient and accurate object identification and text detection capabilities, ultimately promoting
greater independence and confidence in navigating their surroundings.

1.5 Project management method

Every IT project is guided by an appropriate development method to achieve its objectives


optimally and efficiently. In this context, we will present the methodology that we have adopted for
optimal project management, CRISP-DM. The CRoss Industry Standard Process for Data Mining
(CRISP-DM) is a process model that serves as the base for a data science process. It has six
sequential phases:

• Business understanding: focuses on understanding the objectives and requirements of the


project.

• Data understanding: focuses on identifying, collecting, and analyzing the data sets that
can help you accomplish the project goals.

• Data preparation: often referred to as “data munging”, prepares the final data set(s) for
modeling. It has five tasks: select, clean, construct, integrate and format data.

• Modeling: build and assess various models based on several different modeling techniques.

• Evaluation: looks more broadly at which model best meets the business needs and what to do next.

• Deployment: This is the final stage of the process. Indeed, when the developed model is
ready, it is deployed and integrated into daily use. Therefore, the objective of this stage is
deployment planning, monitoring, and maintenance.[1]


Figure 1.1: CRISP-DM [2]

1.6 Conclusion

Throughout this chapter, we have presented the general context of our project as well as the
study of the existing situation, the proposed solution and the project management method. The
following chapter will be devoted to the analysis and design.

Chapter 2

Analysis and Design

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Actors’ Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Analysis of the requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1 Introduction

In this chapter, we will analyze and specify the business requirements. Next, we will present
the use case diagram along with the textual description and the detailed sequence diagram. Finally,
we will develop the overall schema of the entire application accompanied by an explanation of each
step.

2.2 Actors’ Identification

This phase consists of highlighting the application’s context in order to address specific points
of the specifications and to clearly determine the functionalities. We will identify the actors in our
application.
An actor represents an abstract role performed by an external entity, such as a person, process, or
another system, that interacts with the system being designed. In our application, we have identified a single actor:

— User: a visually impaired user who can access text recognition and object detection.

2.3 Analysis of the requirements

The specification of the requirements is considered an essential phase in the planning of a


project since it makes it possible to determine and define the customer’s needs.

2.3.1 Functional needs

Functional needs express the services that the mobile application must provide in response to user requests in order to meet their expectations. Our application offers the following functionalities:

— Text Recognition: Accurately detecting and recognizing text from various sources such as
documents, signs, labels, and screens.

— Object Detection: Accurately detecting and predicting objects within the user’s environment
using computer vision technology based on the default language settings (English, French,
Arabic) on the user’s phone.

— Voice Assistance: Offering a real-time audio feedback by converting recognized texts and
objects into speech, respecting the default language settings on the user’s phone.


2.3.2 Non-functional needs

The non-functional needs express the internal requirements that the mobile application must
provide such as the constraints related to the environment and to the implementation. Our mobile
application must have the following characteristics:

— Accessibility : The application must embody accessibility, catering to the visually impaired
by implementing features and designs that facilitate seamless navigation and interaction.

— Ergonomics : The application must have simple, consistent interfaces that are easy to use, while keeping the density of components in each interface reasonable to satisfy the user.

— Offline Functionality :

— Predict and provide offline text and object recognition capabilities.

— Query locally stored databases for offline data retrieval.

— Reliability : The application must present accurate and fair results.

— Usability : The application must be easy to use, intuitive, and user-friendly.

— Maintenance : The application must be easy to maintain, update, and deploy.

2.3.3 Use case diagram

Figure 2.1 below presents the global use case diagram of the application, involving the identified actor.


Figure 2.1: Global use case diagram

2.3.4 Use case description

— Textual description of the use case « Recognize text »


Table 2.1: Textual description of the use case « Recognize text »

Title Recognize text

Actor User

Pre-condition Presence of a text to recognize

Post-conditions Recognized text.

Nominal scenario
— The user points at the text or the area where they want to detect text.

— The user captures an image via the volume-up button.

— The user presses the volume-down button.

— The system extracts and detects the text.

— The text is detected.

— The text passes through the text processing phase.

— Once the text is ready, a voice reads aloud what is written or detected.

— Textual description of the use case « Recognize objects »

Table 2.2: Textual description of the use case « Recognize objects »

Title Recognize objects

Actor User

Pre-condition The presence of an object to detect.

Post-conditions Object Detected.

Nominal scenario
— The user points at the object to detect.

— The object is detected and a voice reads out what is detected.


2.4 Design

2.4.1 The sequence diagram

In this section, we illustrate the dynamic aspect of our application by presenting the sequence
diagram. The sequence diagram focuses more specifically on the temporal interactions between the
actors and the system. In other words, it describes the process and messages exchanged between
them in order to produce a function.
Figure 2.2 below represents our sequence diagram:

Figure 2.2: The sequence diagram

2.5 Conclusion

During this chapter, we have elaborated on the analysis and design of our application. We
began with the specification and analysis of requirements. We identified the actors as well as the
list of functional requirements through the use case diagram accompanied by a textual description,


and also the list of non-functional requirements.


Finally, we drafted the design of the application by presenting the sequence diagram.
After completing this analysis, the next step is to gain a better understanding of the models that we will be working with.

Chapter 3

Deep Learning models and libraries

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 MobileNet V2 model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 OpenCV library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.1 Introduction

This chapter is devoted first to selecting the right model, then to determining the appropriate hyperparameters, and finally to testing the chosen model.

3.2 MobileNet V2 model

3.2.1 Presentation

MobileNet is a neural network architecture optimized for mobile and embedded devices,
offering efficiency and compactness. It employs depthwise separable convolutions to reduce computational
cost while maintaining accuracy. It’s commonly used for tasks like image classification, object
detection, and semantic segmentation on resource-constrained devices. We utilized MobileNet for object detection and recognition, with a confidence threshold of 0.5: only detections whose confidence exceeds this threshold are accepted as valid. By implementing this strategy, we prioritize more reliable detections, potentially enhancing the system's accuracy.[3]
The figure below shows the logo of Mobilenet:

Figure 3.1: Logo of Mobilenet
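
As a minimal illustration of the confidence filtering described above, the Kotlin sketch below (using our own illustrative types, not the project's actual classes) keeps only detections whose confidence exceeds 0.5:

// Minimal sketch of the confidence filtering strategy, with illustrative types.
data class Detection(val label: String, val confidence: Float)

fun acceptedDetections(raw: List<Detection>, threshold: Float = 0.5f): List<Detection> =
    raw.filter { it.confidence > threshold }   // keep only detections above the threshold

fun main() {
    val raw = listOf(Detection("bottle", 0.92f), Detection("chair", 0.31f))
    println(acceptedDetections(raw))           // [Detection(label=bottle, confidence=0.92)]
}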

3.2.2 Architecture

The figure below shows the architecture of MobileNetV2:


Figure 3.2: Architecture of Mobilenet[4]

The V2 of the MobileNet series introduces inverted residuals and linear bottlenecks to improve
MobileNets’ performance. Inverted residuals allow the network to compute activations (ReLU) more
efficiently and preserve more information after activation. To preserve this information, it becomes
important that the last activation in the bottleneck has a linear activation. The diagram below
from the original MobileNetV2 paper shows the bottleneck and includes inverted residuals. In this
diagram, thicker blocks have more channels.


Figure 3.3: Bottleneck and inverted residuals of MobileNet [5]
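
To make the block structure concrete, the rough Kotlin sketch below counts the weights of a single inverted residual block (1x1 expansion, 3x3 depthwise, 1x1 linear projection), assuming expansion factor t = 6 and ignoring batch-norm and bias parameters; the function is purely illustrative and not taken from the paper:

// Rough, illustrative weight count for one inverted residual block.
fun invertedResidualWeights(inChannels: Int, outChannels: Int, t: Int = 6): Int {
    val expanded = inChannels * t
    val expand = inChannels * expanded     // 1x1 convolution with ReLU6, widens the representation
    val depthwise = 3 * 3 * expanded       // 3x3 depthwise convolution, one filter per channel
    val project = expanded * outChannels   // 1x1 linear bottleneck (no ReLU, preserves information)
    return expand + depthwise + project
}

fun main() {
    // Example: a 24 -> 32 channel block; 3456 + 1296 + 4608 = 9360 weights
    println(invertedResidualWeights(24, 32))
}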

3.2.3 Use of the model

The figures below show how to use the MobileNetV2 model:

Figure 3.4: MobilenetV2-Predict image[6]


Figure 3.5: MobilenetV2-Use of the model[6]

Figure 3.6: MobilenetV2-Train/test command[6]

The figure below shows the MobileNet V2 image classification:


Figure 3.7: Image classification for one hundred classes[6]

The figure below shows the MobileNetV2 performance :

Figure 3.8: Mobilenet-Performance[7]


3.3 OpenCV library

3.3.1 Presentation

3.3.1.1 OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision library that provides a wide range of image processing techniques. It is widely used in artificial intelligence and computer vision projects. We utilized OpenCV 3.4.13 for text extraction and recognition.[8]
The figure below shows the logo of OpenCV :

Figure 3.9: Logo of OpenCV

The figure below shows the use of text recognition model:


Figure 3.10: Text recognition model[9]

The figure below shows the text detection model:


Figure 3.11: Text detection model[9]

The figure below shows how to load images:


Figure 3.12: Loading images

The figure below shows how to set parameters and the input/output process:


Figure 3.13: Setting parameters and inference
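
As a hedged summary of the pipeline shown in the figures above, the Kotlin sketch below loads a pre-trained text detection network with OpenCV's DNN bindings and runs inference on an image. The model file, output layer names, input size, and mean values follow OpenCV's public EAST text-detection sample and are assumptions rather than the project's exact configuration; the OpenCV native library is assumed to be already loaded.

import org.opencv.core.Mat
import org.opencv.core.Scalar
import org.opencv.core.Size
import org.opencv.dnn.Dnn
import org.opencv.imgcodecs.Imgcodecs

fun detectTextRegions(imagePath: String): Pair<Mat, Mat> {
    val net = Dnn.readNet("frozen_east_text_detection.pb")     // pre-trained EAST detector (assumed file)
    val image = Imgcodecs.imread(imagePath)
    val blob = Dnn.blobFromImage(
        image, 1.0, Size(320.0, 320.0),
        Scalar(123.68, 116.78, 103.94),                         // per-channel mean subtraction
        true, false                                             // swap R and B channels, no crop
    )
    net.setInput(blob)
    val outputs = mutableListOf<Mat>()
    // Score and geometry maps of the EAST network; text boxes are decoded from these two outputs.
    net.forward(outputs, listOf("feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"))
    return outputs[0] to outputs[1]
}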

3.4 Conclusion

In this chapter, we have selected the appropriate model and libraries for our application and tested their performance. In the next chapter, we will discuss deploying the model in a mobile application.

Chapter 4

Realization and Implementation

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2 Work Environment and Tools . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Overview of the app . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.1 Introduction

In this chapter, we will present the adopted tools as well as the hardware and software environment in which our application was developed, and we will conclude by introducing some of its interfaces.

4.2 Work Environment and Tools

In this section, we illustrate the selected work environments and technologies chosen to
implement our system.

4.2.1 Technical Choices

To implement our solution in an easy and optimal manner, and with the aim of keeping up with new techniques, we opted for Java 21, which was the latest version of the Java programming language and platform at the time, featuring performance improvements, enhanced security, and language updates. We also used Kotlin, a modern, concise programming language by JetBrains, known for its interoperability with Java and for safety features such as null safety and type inference.
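
The following minimal snippet, unrelated to the project's code base, simply illustrates the two Kotlin features mentioned above, type inference and null safety:

fun main() {
    val label = "bottle"              // type String is inferred, no explicit declaration needed
    val spokenText: String? = null    // nullable types must be declared explicitly
    println(label.length)             // 6
    println(spokenText?.length ?: 0)  // safe call + Elvis operator avoid a NullPointerException
}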

4.2.2 Working Environment

4.2.2.1 Characteristics of the machine

This application was developed on a Microsoft Surface Book 2 machine with the following
characteristics:

— Processor: Intel(R) Core(TM) i7

— RAM: 16 GB

— Operating System: Microsoft Windows 10

— Hard Disk: 128 GB SSD

4.2.2.2 Android Studio

Android Studio is a development environment created by Google for building Android applications.
It provides comprehensive tools for designing, developing, and deploying mobile apps.[10]
The figure below shows the Android Studio logo.


Figure 4.1: Logo of Android Studio

4.2.2.3 Firebase ML Kit

Firebase ML Kit is a mobile SDK by Google for adding machine learning features to Android
and iOS apps. It provides ready-to-use models for tasks like text recognition and image labeling, as
well as support for custom models.[11] The figure below shows the logo of Firebase ML Kit.

Figure 4.2: Logo of Firebase ML kit
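
As a hedged sketch of how on-device text recognition is typically invoked with ML Kit (shown here with the standalone com.google.mlkit artifacts; the exact API flavor and callback wiring used in the project may differ):

import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

fun recognizeText(bitmap: Bitmap, onResult: (String) -> Unit) {
    val image = InputImage.fromBitmap(bitmap, 0)                              // 0° rotation assumed
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    recognizer.process(image)
        .addOnSuccessListener { visionText -> onResult(visionText.text) }     // full recognized text
        .addOnFailureListener { onResult("") }                                // ignore errors in this sketch
}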

4.2.3 Packages and Tools

Several packages have been used in our work; we list some of them below:

4.2.3.1 TensorFlow Lite

These dependencies provide support for TensorFlow Lite, a lightweight version of the TensorFlow machine learning framework optimized for mobile and embedded devices. They offer utilities for loading and running TensorFlow Lite models, as well as metadata parsing capabilities.[12]
The figure below shows the logo of TensorFlow Lite.


Figure 4.3: Logo of TensorFlow Lite
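
For illustration, a Gradle Kotlin DSL dependency block similar to the one below pulls in TensorFlow Lite and its support and metadata libraries; the artifact versions are assumptions and should be checked against the current releases:

// build.gradle.kts (app module) – illustrative dependency block only.
dependencies {
    implementation("org.tensorflow:tensorflow-lite:2.9.0")            // core interpreter
    implementation("org.tensorflow:tensorflow-lite-support:0.4.2")    // image/tensor helper utilities
    implementation("org.tensorflow:tensorflow-lite-metadata:0.4.2")   // model metadata parsing
}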

4.2.4 Permissions

Several permissions must be granted in order to use the application; we list some of them below:

4.2.4.1 Camera permission

This permission grants the app access to the device’s camera hardware. It allows the app to
capture photos and videos using the device’s camera.
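
A minimal sketch of the runtime camera-permission check that triggers the dialogs shown below is given here; the helper name and request code are ours, and the CAMERA permission must also be declared in AndroidManifest.xml:

import android.Manifest
import android.content.pm.PackageManager
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

private const val CAMERA_REQUEST_CODE = 10   // arbitrary request code, ours

// Checks the CAMERA permission and, if it is not yet granted, shows the system dialog.
fun AppCompatActivity.ensureCameraPermission() {
    val granted = ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA) ==
        PackageManager.PERMISSION_GRANTED
    if (!granted) {
        ActivityCompat.requestPermissions(this, arrayOf(Manifest.permission.CAMERA), CAMERA_REQUEST_CODE)
    }
}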

Figure 4.4: First permission request


Figure 4.5: Second permission request

4.3 Overview of the app

In this section, we will detail each interface of the general workflow, accompanied by an explanation of each tool used to achieve our final result: object and text detection and recognition, and the navigation between them. The figures below illustrate the app's logo and the splash interface.

Figure 4.6: Logo of the app


Figure 4.7: Splash interface and slogan

4.3.1 Home interface

When users launch the app, they are greeted with the home interface, and a welcome message
is conveyed in multiple languages (Arabic, French, English) based on the user’s default language
settings. Subsequently, a voice guide directs users on how to utilize the app: a simple click leads to
text detection, while a double-click triggers object detection. The figure below illustrates the home interface:


Figure 4.8: Home interface
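
The click-based navigation described above can be sketched as follows with Android's GestureDetector; the listener class, navigation lambdas, and text-to-speech prompts are illustrative assumptions, not the project's actual code:

import android.speech.tts.TextToSpeech
import android.view.GestureDetector
import android.view.MotionEvent

class HomeGestureListener(
    private val tts: TextToSpeech,
    private val openTextRecognition: () -> Unit,
    private val openObjectRecognition: () -> Unit
) : GestureDetector.SimpleOnGestureListener() {

    override fun onSingleTapConfirmed(e: MotionEvent): Boolean {
        tts.speak("Text recognition", TextToSpeech.QUEUE_FLUSH, null, "nav")   // audio feedback
        openTextRecognition()                                                  // simple click -> text detection
        return true
    }

    override fun onDoubleTap(e: MotionEvent): Boolean {
        tts.speak("Object recognition", TextToSpeech.QUEUE_FLUSH, null, "nav")
        openObjectRecognition()                                                // double click -> object detection
        return true
    }
}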

4.3.2 Object recognition interface

Users can access the object recognition interface by performing a double-click gesture on the
screen. This interface supports multiple languages: French, English, and Arabic. The language
displayed is automatically detected based on the default language setting of the user’s phone,
otherwise, the default language will be English. This approach ensures a user-friendly experience
by catering to diverse linguistic preferences and simplifying navigation through intuitive gestures.
Additionally, voice assistance is provided for every detected object, enhancing accessibility and
usability for users with varying needs and preferences. The locale-based language fallback can be sketched as shown below, and the figures that follow illustrate some of the detected objects in English, French, and Arabic:
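
A minimal sketch, assuming the phone's default locale drives both the displayed labels and the text-to-speech language (names are illustrative):

import java.util.Locale

// Anything other than French or Arabic falls back to English, as described above.
fun announcementLocale(): Locale =
    when (Locale.getDefault().language) {
        "fr" -> Locale.FRENCH
        "ar" -> Locale("ar")
        else -> Locale.ENGLISH
    }

// Typical (illustrative) use: tts.setLanguage(announcementLocale())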

Figure 4.9: Some of the detected objects in English


Figure 4.10: Some of the detected objects in French


Figure 4.11: Some of the detected objects in Arabic


Figure 4.12: Some of the detected objects in Arabic

4.3.3 Text recognition interface

Users can access the text recognition interface with a simple click on the screen. They can take a photo of the text to read by pressing the volume-up button, then start recognition with the volume-down button. This approach ensures a user-friendly experience, accommodating diverse linguistic preferences and simplifying navigation through intuitive gestures. Additionally, voice assistance is provided for every detected word, enhancing accessibility and usability for users with varying needs. A hedged sketch of this volume-button flow is given below, followed by figures illustrating some texts detected by Warrini's text recognition interface:
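
A minimal sketch inside the text-recognition activity, assuming capturePhoto() and runTextRecognition() stand in for the project's actual camera and OCR logic:

import android.view.KeyEvent
import androidx.appcompat.app.AppCompatActivity

class TextRecognitionActivity : AppCompatActivity() {

    override fun onKeyDown(keyCode: Int, event: KeyEvent?): Boolean =
        when (keyCode) {
            KeyEvent.KEYCODE_VOLUME_UP -> { capturePhoto(); true }          // volume-up takes the picture
            KeyEvent.KEYCODE_VOLUME_DOWN -> { runTextRecognition(); true }  // volume-down reads the text aloud
            else -> super.onKeyDown(keyCode, event)
        }

    private fun capturePhoto() { /* camera capture omitted in this sketch */ }
    private fun runTextRecognition() { /* OCR + text-to-speech omitted in this sketch */ }
}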


Figure 4.13: Recognized text


Figure 4.14: Recognized text


Figure 4.15: Recognized text


Figure 4.16: Recognized medication leaflet

4.4 Conclusion

This chapter provides an overview of our app's development journey. It covers technology choices, tools, and the main features, namely object and text recognition. The chapter also discusses permissions and user interface details, and emphasizes a user-friendly experience throughout.

General Conclusion

Having completed this project, we have realized the profound impact that technology can have
on the lives of visually impaired individuals. Our journey from conceptualization to implementation
has reinforced the importance of developing solutions that address the unique challenges faced by
this community. This project is not merely a technical exercise; it represents a significant step
towards creating a more inclusive society where everyone, regardless of ability, can fully participate
and engage with the world around them.
The specialized mobile application we have developed has the potential to revolutionize the
way visually impaired individuals interact with their environment. By harnessing the power of
artificial intelligence and mobile technology, we have created a tool that empowers users to identify
objects, read texts, and navigate their surroundings with greater independence and confidence.
This project underscores the transformative power of technology to break down barriers and
promote accessibility for all.
Looking ahead, we envision expanding the capabilities of our application to further enhance
the user experience and address additional needs of the visually impaired community. One key
aspect we plan to explore is the integration of advanced navigation features, including a guiding
person feature, allowing users to share their location with a trusted individual and communicate
with them directly through the app for added support and assistance.
In conclusion, this project has been a journey of innovation, empathy, and social
responsibility. We are proud of the work we have accomplished and are excited about the potential
impact it can have on the lives of visually impaired individuals. As we continue to refine and
improve our solution, we remain committed to advocating for inclusivity and accessibility in the
field of technology. Through collaboration, empathy, and a dedication to making a difference, we
believe that we can create a more inclusive world for all.

Netography

[1] CRISP-DM https://www.datascience-pm.com/crisp-dm-2/, (25/04/2024)

[2] CRISP-DM picture, https://app.myeducator.com/reader/web/1421a/2/qk5s5/, (25/04/2024)

[3] Mobilenet model, https://keras.io/api/applications/mobilenet/, (29/04/2024)

[4] Architecture Mobilenet, https://lixinso.medium.com/mobilenet-c08928f2dba7, (29/04/2024)

[5] Residuals of Mobilenet, https://medium.com/@luis_gonzales/a-look-at-mobilenetv2-inverted-resid

[6] Use of Mobilenet, https://mmpretrain.readthedocs.io/en/latest/papers/mobilenetv2.html, (1/05/2024)

[7] Performance of Mobilenet, https://lixinso.medium.com/mobilenet-c08928f2dba7, (1/05/2024)

[8] OpenCV, https://opencv.org//, (29/04/2024)

[9] OpenCV model’s use, https://opencv.org//, (29/04/2024)

[10] Android Studio, https://developer.android.com/studio/, (29/04/2024)

[11] Firebase ML Kit, https://firebase.google.com/docs/ml-kit/, (29/04/2024)

[12] Tensorflow Lite, https://www.tensorflow.org/lite/, (29/04/2024)

Résumé
The project presents a mobile application focused on assisting visually impaired people through real-time object recognition, text detection, and voice assistance.
Keywords: MobileNet, OpenCV, Android Studio, AI, object recognition, text detection, voice assistance...

Abstract
The project introduces a mobile application focused on aiding visually impaired individuals
through real-time object recognition, text detection, and voice assistance.

Keywords: MobileNet, OpenCV, Android Studio, AI, object recognition, text detection, voice assistance...

