
Réf : PFA2-2024-13

End of Year Project Report
of the Second Year in Software Engineering

Presented and publicly defended on 09/04/2024


By
Aicha BEN FADHEL
Mariem SAYEDI
Asma CHALLOUF

Object and text recognition mobile application for the visually impaired

Supervised by: Mr. Mehrez BOULARES

President of the jury: Mrs. Sonda CHTOUROU

Academic Year : 2023-2024

5, Avenue Taha Hussein – Tunis    Tel.: 71 496 066
B.P. 56, Bab Menara 1008    Fax: 71 391 166
Acknowledgments

As we come to the end of this journey, we wish to express our deepest

gratitude to our parents. Their unwavering support and encouragement have


been the bedrock of our success, guiding us to this significant milestone in our

educational pursuit.
Additionally, we extend our sincere appreciation to all those who have played

a part in bringing this project to fruition.


We owe a debt of gratitude to our supervisor, Mehrez Boulares, whose

guidance and support have been indispensable throughout this end-of-year


project at The Higher National Engineering School of Tunis (ENSIT). His

mentorship, encouragement, and constructive feedback have profoundly


influenced our approach, shaping our work with precision and excellence. We

are thankful for his openness to our ideas while maintaining a discerning eye
on our progress.

Our heartfelt appreciation also extends to everyone who has supported us along
our academic journey. We extend special acknowledgment to the members of

the jury who generously dedicated their time and expertise to evaluate our
work, providing invaluable insights and contributing to our growth.

ENSIT has been instrumental in shaping our academic and personal growth,
and we are deeply grateful for the opportunities it has provided us.

Table of Contents

General Introduction 1

1 General Framework 2
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Presentation of the subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Study of the existing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Project management method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2 Analysis and Design 6


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Actors’ Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Analysis of the requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.1 Functional needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3.2 Non-functional needs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.3 Use case diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.4 Use case description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1 The sequence diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Deep Learning models and libraries 13


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 MobileNet V2 model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.1 Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.2 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.3 Use of the model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 OpenCV library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3.1 Presentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Realization and Implementation 24


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 Work Environment and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.1 Technical Choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Working Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.3 Packages and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.4 Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Overview of the app . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 Home interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.3.2 Object recognition interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.3 Text recognition interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

General Conclusion 39

List of Figures

1.1 CRISP-DM [2] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1 Global use case diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9


2.2 The sequence diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1 Logo of Mobilenet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


3.2 Architecture of Mobilenet[4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Bottleneck and inverted residuals of MobileNet [5] . . . . . . . . . . . . . . . . . . . 16
3.4 MobilenetV2-Predict image[6] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.5 MobilenetV2-Use of the model[6] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.6 MobilenetV2-Train/test command[6] . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.7 Image classification for one hundred classes[6] . . . . . . . . . . . . . . . . . . . . . . 18
3.8 Mobilenet-Performance[7] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.9 Logo of OpenCV . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.10 Text recognition model[9] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.11 Text detection model[9] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.12 Loading images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.13 Setting parameters and inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.1 Logo of Android Studio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


4.2 Logo of Firebase ML kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.3 Logo of TensorFlow Lite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 First permission request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Second permission request . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Logo of the app . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.7 Splash interface and slogan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.8 Home interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.9 Some of the detected objects in English . . . . . . . . . . . . . . . . . . . . . . . . 31
4.10 Some of the detected objects in French . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.11 Some of the detected objects in Arabic . . . . . . . . . . . . . . . . . . . . . . . . . 33

4.12 Some of the detected objects in Arabic . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.13 Recognized text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.14 Recognized text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.15 Recognized text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.16 Recognized medication leaflet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

List of Tables

2.1 Textual description of the use case « Recognize text » . . . . . . . . . . . . . . . . . 10

2.2 Textual description of the use case « Recognize objects » . . . . . . . . . . . . . . . . 10

General Introduction

Visual impairment can significantly impact one’s ability to interact with the world effectively.
Tasks that are routine and effortless for sighted individuals, such as identifying objects, reading texts,
or navigating unfamiliar places, can pose substantial challenges for those with visual impairments.
This underscores the importance of developing assistive technologies that cater specifically to the
needs of visually impaired individuals. Innovation in this area is not just a matter of convenience;
it is a moral imperative and a societal responsibility. Engineers and technology developers play a
crucial role in advancing solutions that enhance accessibility and inclusion for all.
As software engineering students, we have chosen to develop a specialized mobile application
dedicated to assisting visually impaired individuals in order to promote inclusion and accessibility.
This decision stems from our belief in the transformative power of technology to create positive social
impact. Our application aims to leverage artificial intelligence algorithms to provide real-time object
identification for users with visual impairments, empowering them to navigate their surroundings
more independently. By focusing on the needs of this underserved community, we aspire to contribute
towards a more inclusive society where everyone, regardless of ability, has equal access to the tools
and resources needed to thrive. Through this project, we are committed to bridging the gap between
technology and accessibility, fostering empathy-driven innovation, and advocating for the rights and
empowerment of individuals with disabilities. We view this endeavor not just as a technical challenge,
but as a meaningful opportunity to make a tangible difference in the lives of others and promote a
culture of inclusivity within the field of software engineering.
This report summarizes the steps in the implementation of this system. It is structured into
four chapters as follows:
— The first chapter presents the general framework of the project.

— The second chapter presents the functional analysis and design of our application in order to
specify the requirements.

— The third chapter presents the deep learning models and libraries.

— The fourth chapter presents the realization and implementation of our solution.

We conclude this report with a general conclusion and some potential perspectives that can improve
our solution.

Chapter 1

General Framework

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Presentation of the subject . . . . . . . . . . . . . . . . . . . . . . . . . . 3

3 Study of the existing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

4 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

5 Project management method . . . . . . . . . . . . . . . . . . . . . . . . . 4

6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.1 Introduction

Studying a project is a strategic approach that gives us a clear vision of it and helps organize its smooth running. In this first chapter, we present the context of the project as well as a study of the existing situation, the proposed solution, and the project management method that we followed.

1.2 Presentation of the subject

Object recognition for visually impaired assistance revolves around the development and
implementation of technologies aimed at helping individuals with visual impairments identify objects
in their surroundings. This area of study is critical due to the challenges faced by visually impaired
individuals in recognizing and interacting with objects independently. By leveraging computer vision
and artificial intelligence techniques, researchers and developers are exploring methods to enable
real-time object recognition through devices like smartphones or wearable technology.

In our project, we are tackling the challenge of improving object recognition for visually
impaired individuals without the need for specialized hardware. Our approach involves leveraging
computer vision and artificial intelligence techniques to enable real-time object identification using
smartphones or other accessible devices. The goal is to develop an application that harnesses the
power of these technologies to assist users in identifying objects in their environment efficiently
and accurately. Additionally, our application includes text detection capabilities, allowing users to
identify text in their surroundings. Through the use of innovative algorithms and user-friendly
interfaces, we aim to create a tool that enhances accessibility and promotes independence for
individuals with visual impairments, ultimately empowering them to interact more confidently with
their surroundings.

1.3 Study of the existing

In examining the existing landscape of applications in this field, it’s evident that there are
several offerings, although not all are specifically designed to cater to the needs of visually impaired
individuals. While these applications vary in their focus and functionality, they generally lack the
comprehensive adaptation required to effectively support users with visual impairments.


1.4 Proposed solution

Our project is distinct in its aim to address this gap by focusing on improving object
recognition and text detection specifically for the visually impaired, without relying on specialized
hardware. Through the utilization of computer vision and machine learning techniques, we intend
to develop a user-friendly application that can be accessed on commonly available devices like
smartphones. This approach seeks to empower visually impaired individuals by providing them
with efficient and accurate object identification and text detection capabilities, ultimately promoting
greater independence and confidence in navigating their surroundings.

1.5 Project management method

Every IT project is guided by an appropriate development method to achieve its objectives


optimally and efficiently. In this context, we will present the methodology that we have adopted for
optimal project management, CRISP-DM. The CRoss Industry Standard Process for Data Mining
(CRISP-DM) is a process model that serves as the base for a data science process. It has six
sequential phases:

• Business understanding: focuses on understanding the objectives and requirements of the


project.

• Data understanding: focuses on identifying, collecting, and analyzing the data sets that
can help you accomplish the project goals.

• Data preparation: often referred to as “data munging”, prepares the final data set(s) for
modeling. It has five tasks: select, clean, construct, integrate and format data.

• Modeling: build and assess various models based on several different modeling techniques.

• Evaluation: looks more broadly at which model best meets the business needs and what to do next.

• Deployment: This is the final stage of the process. Indeed, when the developed model is
ready, it is deployed and integrated into daily use. Therefore, the objective of this stage is
deployment planning, monitoring, and maintenance.[1]


Figure 1.1: CRISP-DM [2]

1.6 Conclusion

Throughout this chapter, we have presented the general context of our project as well as the
study of the existing situation, the proposed solution and the project management method. The
following chapter will be devoted to the analysis and design.

Chapter 2

Analysis and Design

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Actors’ Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3 Analysis of the requirements . . . . . . . . . . . . . . . . . . . . . . . . . . 7

4 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.1 Introduction

In this chapter, we will analyze and specify the business requirements. Next, we will present
the use case diagram along with the textual description and the detailed sequence diagram. Finally,
we will develop the overall schema of the entire application accompanied by an explanation of each
step.

2.2 Actors’ Identification

This phase consists of highlighting the application’s context in order to address specific points
of the specifications and to clearly determine the functionalities. We will identify the actors in our
application.
An actor represents an abstract role performed by an external entity, such as a person, process, or
another system, that interacts with the system being designed. In our application, we have identified a single actor:

— User: a visually impaired user who can access text recognition and object detection.

2.3 Analysis of the requirements

The specification of the requirements is considered an essential phase in the planning of a


project since it makes it possible to determine and define the customer’s needs.

2.3.1 Functional needs

Functional needs express the services that the mobile application must provide in response to user requests in order to meet their expectations. Our application offers the following functionalities:

— Text Recognition: Accurately detecting and recognizing text from various sources such as
documents, signs, labels, and screens.

— Object Detection: Accurately detecting and predicting objects within the user’s environment
using computer vision technology based on the default language settings (English, French,
Arabic) on the user’s phone.

— Voice Assistance: Offering a real-time audio feedback by converting recognized texts and
objects into speech, respecting the default language settings on the user’s phone.


2.3.2 Non-functional needs

The non-functional needs express the internal requirements that the mobile application must
provide such as the constraints related to the environment and to the implementation. Our mobile
application must have the following characteristics:

— Accessibility : The application must embody accessibility, catering to the visually impaired
by implementing features and designs that facilitate seamless navigation and interaction.

— Ergonomics : The application must have simple, consistent interfaces that are easy to use, while keeping the density of components in each interface reasonable to satisfy the user.

— Offline Functionality :

— Predict and provide offline text and object recognition capabilities.

— Query locally stored databases for offline data retrieval.

— Reliability : The application must present accurate and fair results.

— Usability : The application must be easy to use, intuitive, and user-friendly.

— Maintenance : The application must be easy to maintain, update, and deploy.

2.3.3 Use case diagram

Figure 2.1 below presents the global use case diagram of the application, involving the identified actor.


Figure 2.1: Global use case diagram

2.3.4 Use case description

— Textual description of the use case « Recognize text »


Table 2.1: Textual description of the use case « Recognize text »

Title Recognize text

Actor User

Pre-condition Presence of a text to recognize

Post-conditions Recognized text.

Nominal scenario
— The user points at the text or the area where they want to detect text.

— The user captures an image via the volume-up button.

— The user presses the volume-down button.

— The system extracts and detects the text.

— The text is detected.

— The text passes through the text processing phase.

— Once the text is ready, a voice reads aloud what is written or detected.

— Textual description of the use case « Recognize objects »

Table 2.2: Textual description of the use case « Recognize objects »

Title Recognize objects

Actor User

Pre-condition The presence of an object to detect.

Post-conditions Object Detected.

Nominal scenario
— The user points at the object to detect.

— The object is detected and a voice reads out what is detected.


2.4 Design

2.4.1 The sequence diagram

In this section, we illustrate the dynamic aspect of our application by presenting the sequence
diagram. The sequence diagram focuses more specifically on the temporal interactions between the
actors and the system. In other words, it describes the process and messages exchanged between
them in order to produce a function.
Figure 2.2 below represents our sequence diagram:

Figure 2.2: The sequence diagram

2.5 Conclusion

During this chapter, we have elaborated on the analysis and design of our application. We
began with the specification and analysis of requirements. We identified the actors as well as the
list of functional requirements through the use case diagram accompanied by a textual description,


and also the list of non-functional requirements.


Finally, we drafted the design of the application by presenting the sequence diagram.
After completing this analysis, the next step is to gain a better understanding of the models that we will be working with.

Chapter 3

Deep Learning models and libraries

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 MobileNet V2 model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 OpenCV library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.1 Introduction

This chapter is devoted first to selecting the right model, then to determining the appropriate hyperparameters, and finally to testing the chosen model.

3.2 MobileNet V2 model

3.2.1 Presentation

MobileNet is a neural network architecture optimized for mobile and embedded devices,
offering efficiency and compactness. It employs depthwise separable convolutions to reduce computational
cost while maintaining accuracy. It’s commonly used for tasks like image classification, object
detection, and semantic segmentation on resource-constrained devices. We utilized MobileNet for object detection and recognition, with a confidence threshold of 0.5: only detections whose confidence exceeds this threshold are accepted as valid. By implementing this strategy, we prioritize more reliable detections, potentially enhancing the system's accuracy.[3]
The figure below shows the logo of Mobilenet:

Figure 3.1: Logo of Mobilenet
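
As a minimal illustration of the confidence filtering described above, the Kotlin sketch below (using our own illustrative types, not the project's actual classes) keeps only detections whose confidence exceeds 0.5:

// Minimal sketch of the confidence filtering strategy, with illustrative types.
data class Detection(val label: String, val confidence: Float)

fun acceptedDetections(raw: List<Detection>, threshold: Float = 0.5f): List<Detection> =
    raw.filter { it.confidence > threshold }   // keep only detections above the threshold

fun main() {
    val raw = listOf(Detection("bottle", 0.92f), Detection("chair", 0.31f))
    println(acceptedDetections(raw))           // [Detection(label=bottle, confidence=0.92)]
}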

3.2.2 Architecture

The figure below shows the architecture of MobileNetV2:


Figure 3.2: Architecture of Mobilenet[4]

The V2 of the MobileNet series introduces inverted residuals and linear bottlenecks to improve
MobileNets’ performance. Inverted residuals allow the network to compute activations (ReLU) more
efficiently and preserve more information after activation. To preserve this information, it becomes
important that the last activation in the bottleneck has a linear activation. The diagram below
from the original MobileNetV2 paper shows the bottleneck and includes inverted residuals. In this
diagram, thicker blocks have more channels.


Figure 3.3: Bottleneck and inverted residuals of MobileNet [5]
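
To make the block structure concrete, the rough Kotlin sketch below counts the weights of a single inverted residual block (1x1 expansion, 3x3 depthwise, 1x1 linear projection), assuming expansion factor t = 6 and ignoring batch-norm and bias parameters; the function is purely illustrative and not taken from the paper:

// Rough, illustrative weight count for one inverted residual block.
fun invertedResidualWeights(inChannels: Int, outChannels: Int, t: Int = 6): Int {
    val expanded = inChannels * t
    val expand = inChannels * expanded     // 1x1 convolution with ReLU6, widens the representation
    val depthwise = 3 * 3 * expanded       // 3x3 depthwise convolution, one filter per channel
    val project = expanded * outChannels   // 1x1 linear bottleneck (no ReLU, preserves information)
    return expand + depthwise + project
}

fun main() {
    // Example: a 24 -> 32 channel block; 3456 + 1296 + 4608 = 9360 weights
    println(invertedResidualWeights(24, 32))
}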

3.2.3 Use of the model

The figures below show how to use the MobileNetV2 model:

Figure 3.4: MobilenetV2-Predict image[6]


Figure 3.5: MobilenetV2-Use of the model[6]

Figure 3.6: MobilenetV2-Train/test command[6]

The figure below shows the MobileNet V2 image classification:


Figure 3.7: Image classification for one hundred classes[6]

The figure below shows the MobileNetV2 performance :

Figure 3.8: Mobilenet-Performance[7]


3.3 OpenCV library

3.3.1 Presentation

3.3.1.1 OpenCV

OpenCV (Open Source Computer Vision Library) is an open-source computer vision library that provides a wide range of image processing techniques. It is widely used in artificial intelligence and computer vision projects. We utilized OpenCV 3.4.13 for text extraction and recognition.[8]
The figure below shows the logo of OpenCV :

Figure 3.9: Logo of OpenCV

The figure below shows the use of text recognition model:


Figure 3.10: Text recognition model[9]

The figure below shows the text detection model:


Figure 3.11: Text detection model[9]

The figure below shows how to load images:


Figure 3.12: Loading images

The figure below shows how to set parameters and the input/output process:


Figure 3.13: Setting parameters and inference
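
As a hedged summary of the pipeline shown in the figures above, the Kotlin sketch below loads a pre-trained text detection network with OpenCV's DNN bindings and runs inference on an image. The model file, output layer names, input size, and mean values follow OpenCV's public EAST text-detection sample and are assumptions rather than the project's exact configuration; the OpenCV native library is assumed to be already loaded.

import org.opencv.core.Mat
import org.opencv.core.Scalar
import org.opencv.core.Size
import org.opencv.dnn.Dnn
import org.opencv.imgcodecs.Imgcodecs

fun detectTextRegions(imagePath: String): Pair<Mat, Mat> {
    val net = Dnn.readNet("frozen_east_text_detection.pb")     // pre-trained EAST detector (assumed file)
    val image = Imgcodecs.imread(imagePath)
    val blob = Dnn.blobFromImage(
        image, 1.0, Size(320.0, 320.0),
        Scalar(123.68, 116.78, 103.94),                         // per-channel mean subtraction
        true, false                                             // swap R and B channels, no crop
    )
    net.setInput(blob)
    val outputs = mutableListOf<Mat>()
    // Score and geometry maps of the EAST network; text boxes are decoded from these two outputs.
    net.forward(outputs, listOf("feature_fusion/Conv_7/Sigmoid", "feature_fusion/concat_3"))
    return outputs[0] to outputs[1]
}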

3.4 Conclusion

In this chapter, we have selected the appropriate model and libraries for our application and tested their performance. In the next chapter, we will discuss deploying the model in a mobile application.

Chapter 4

Realization and Implementation

Plan
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2 Work Environment and Tools . . . . . . . . . . . . . . . . . . . . . . . . . 25

3 Overview of the app . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4.1 Introduction

In this chapter, we will present the adopted tools as well as the hardware and software environment in which our application was developed, and we will conclude by introducing some of its interfaces.

4.2 Work Environment and Tools

In this section, we illustrate the selected work environments and technologies chosen to
implement our system.

4.2.1 Technical Choices

To implement our solution in an easy and optimal manner, and with the aim of keeping up with new techniques, we opted for Java 21, which was the latest version of the Java programming language and platform at the time, featuring performance improvements, enhanced security, and language updates. We also used Kotlin, a modern, concise programming language by JetBrains, known for its interoperability with Java and for safety features such as null safety and type inference.
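
The following minimal snippet, unrelated to the project's code base, simply illustrates the two Kotlin features mentioned above, type inference and null safety:

fun main() {
    val label = "bottle"              // type String is inferred, no explicit declaration needed
    val spokenText: String? = null    // nullable types must be declared explicitly
    println(label.length)             // 6
    println(spokenText?.length ?: 0)  // safe call + Elvis operator avoid a NullPointerException
}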

4.2.2 Working Environment

4.2.2.1 Characteristics of the machine

This application was developed on a Microsoft Surface Book 2 machine with the following
characteristics:

— Processor: Intel(R) Core(TM) i7

— RAM: 16 GB

— Operating System: Microsoft Windows 10

— Hard Disk: 128 GB SSD

4.2.2.2 Android Studio

Android Studio is a development environment created by Google for building Android applications.
It provides comprehensive tools for designing, developing, and deploying mobile apps.[10]
The figure below shows the Android Studio logo.


Figure 4.1: Logo of Android Studio

4.2.2.3 Firebase ML Kit

Firebase ML Kit is a mobile SDK by Google for adding machine learning features to Android
and iOS apps. It provides ready-to-use models for tasks like text recognition and image labeling, as
well as support for custom models.[11] The figure below shows the logo of Firebase ML Kit.

Figure 4.2: Logo of Firebase ML kit
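
As a hedged sketch of how on-device text recognition is typically invoked with ML Kit (shown here with the standalone com.google.mlkit artifacts; the exact API flavor and callback wiring used in the project may differ):

import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

fun recognizeText(bitmap: Bitmap, onResult: (String) -> Unit) {
    val image = InputImage.fromBitmap(bitmap, 0)                              // 0° rotation assumed
    val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    recognizer.process(image)
        .addOnSuccessListener { visionText -> onResult(visionText.text) }     // full recognized text
        .addOnFailureListener { onResult("") }                                // ignore errors in this sketch
}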

4.2.3 Packages and Tools

Several packages have been used in our work; we list some of them below:

4.2.3.1 TensorFlow Lite

These dependencies provide support for TensorFlow Lite, a lightweight version of the TensorFlow machine learning framework optimized for mobile and embedded devices. They offer utilities for loading and running TensorFlow Lite models, as well as metadata parsing capabilities.[12]
The figure below shows the logo of TensorFlow Lite.


Figure 4.3: Logo of TensorFlow Lite
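
For illustration, a Gradle Kotlin DSL dependency block similar to the one below pulls in TensorFlow Lite and its support and metadata libraries; the artifact versions are assumptions and should be checked against the current releases:

// build.gradle.kts (app module) – illustrative dependency block only.
dependencies {
    implementation("org.tensorflow:tensorflow-lite:2.9.0")            // core interpreter
    implementation("org.tensorflow:tensorflow-lite-support:0.4.2")    // image/tensor helper utilities
    implementation("org.tensorflow:tensorflow-lite-metadata:0.4.2")   // model metadata parsing
}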

4.2.4 Permissions

Several permissions must be granted in order to use the application; we list some of them below:

4.2.4.1 Camera permission

This permission grants the app access to the device’s camera hardware. It allows the app to
capture photos and videos using the device’s camera.
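
A minimal sketch of the runtime camera-permission check that triggers the dialogs shown below is given here; the helper name and request code are ours, and the CAMERA permission must also be declared in AndroidManifest.xml:

import android.Manifest
import android.content.pm.PackageManager
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

private const val CAMERA_REQUEST_CODE = 10   // arbitrary request code, ours

// Checks the CAMERA permission and, if it is not yet granted, shows the system dialog.
fun AppCompatActivity.ensureCameraPermission() {
    val granted = ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA) ==
        PackageManager.PERMISSION_GRANTED
    if (!granted) {
        ActivityCompat.requestPermissions(this, arrayOf(Manifest.permission.CAMERA), CAMERA_REQUEST_CODE)
    }
}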

Figure 4.4: First permission request


Figure 4.5: Second permission request

4.3 Overview of the app

In this section, we will detail each interface of the general workflow, accompanied by an explanation of each tool used to achieve our final result: object and text detection and recognition, and the navigation between them. The figures below illustrate the app's logo and the splash interface.

Figure 4.6: Logo of the app


Figure 4.7: Splash interface and slogan

4.3.1 Home interface

When users launch the app, they are greeted with the home interface, and a welcome message
is conveyed in multiple languages (Arabic, French, English) based on the user’s default language
settings. Subsequently, a voice guide directs users on how to utilize the app: a simple click leads to
text detection, while a double-click triggers object detection. The figure below illustrates the home interface:


Figure 4.8: Home interface
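
The click-based navigation described above can be sketched as follows with Android's GestureDetector; the listener class, navigation lambdas, and text-to-speech prompts are illustrative assumptions, not the project's actual code:

import android.speech.tts.TextToSpeech
import android.view.GestureDetector
import android.view.MotionEvent

class HomeGestureListener(
    private val tts: TextToSpeech,
    private val openTextRecognition: () -> Unit,
    private val openObjectRecognition: () -> Unit
) : GestureDetector.SimpleOnGestureListener() {

    override fun onSingleTapConfirmed(e: MotionEvent): Boolean {
        tts.speak("Text recognition", TextToSpeech.QUEUE_FLUSH, null, "nav")   // audio feedback
        openTextRecognition()                                                  // simple click -> text detection
        return true
    }

    override fun onDoubleTap(e: MotionEvent): Boolean {
        tts.speak("Object recognition", TextToSpeech.QUEUE_FLUSH, null, "nav")
        openObjectRecognition()                                                // double click -> object detection
        return true
    }
}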

4.3.2 Object recognition interface

Users can access the object recognition interface by performing a double-click gesture on the
screen. This interface supports multiple languages: French, English, and Arabic. The language
displayed is automatically detected based on the default language setting of the user’s phone,
otherwise, the default language will be English. This approach ensures a user-friendly experience
by catering to diverse linguistic preferences and simplifying navigation through intuitive gestures.
Additionally, voice assistance is provided for every detected object, enhancing accessibility and
usability for users with varying needs and preferences. The locale-based language fallback can be sketched as shown below, and the figures that follow illustrate some of the detected objects in English, French, and Arabic:
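
A minimal sketch, assuming the phone's default locale drives both the displayed labels and the text-to-speech language (names are illustrative):

import java.util.Locale

// Anything other than French or Arabic falls back to English, as described above.
fun announcementLocale(): Locale =
    when (Locale.getDefault().language) {
        "fr" -> Locale.FRENCH
        "ar" -> Locale("ar")
        else -> Locale.ENGLISH
    }

// Typical (illustrative) use: tts.setLanguage(announcementLocale())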

Figure 4.9: Some of the detected objects in English


Figure 4.10: Some of the detected objects in French


Figure 4.11: Some of the detected objects in Arabic


Figure 4.12: Some of the detected objects in Arabic

4.3.3 Text recognition interface

Users can access the text recognition interface with a simple click on the screen. They can take a photo of the text to read by pressing the volume-up button, then start recognition with the volume-down button. This approach ensures a user-friendly experience, accommodating diverse linguistic preferences and simplifying navigation through intuitive gestures. Additionally, voice assistance is provided for every detected word, enhancing accessibility and usability for users with varying needs. A hedged sketch of this volume-button flow is given below, followed by figures illustrating some texts detected by Warrini's text recognition interface:
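
A minimal sketch inside the text-recognition activity, assuming capturePhoto() and runTextRecognition() stand in for the project's actual camera and OCR logic:

import android.view.KeyEvent
import androidx.appcompat.app.AppCompatActivity

class TextRecognitionActivity : AppCompatActivity() {

    override fun onKeyDown(keyCode: Int, event: KeyEvent?): Boolean =
        when (keyCode) {
            KeyEvent.KEYCODE_VOLUME_UP -> { capturePhoto(); true }          // volume-up takes the picture
            KeyEvent.KEYCODE_VOLUME_DOWN -> { runTextRecognition(); true }  // volume-down reads the text aloud
            else -> super.onKeyDown(keyCode, event)
        }

    private fun capturePhoto() { /* camera capture omitted in this sketch */ }
    private fun runTextRecognition() { /* OCR + text-to-speech omitted in this sketch */ }
}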


Figure 4.13: Recognized text


Figure 4.14: Recognized text


Figure 4.15: Recognized text


Figure 4.16: Recognized medication leaflet

4.4 Conclusion

This chapter provides an overview of our app's development journey. It covers technology choices, tools, and the main features, namely object and text recognition. The chapter also discusses permissions and user interface details, and emphasizes a user-friendly experience throughout.

General Conclusion

Having completed this project, we have realized the profound impact that technology can have
on the lives of visually impaired individuals. Our journey from conceptualization to implementation
has reinforced the importance of developing solutions that address the unique challenges faced by
this community. This project is not merely a technical exercise; it represents a significant step
towards creating a more inclusive society where everyone, regardless of ability, can fully participate
and engage with the world around them.
The specialized mobile application we have developed has the potential to revolutionize the
way visually impaired individuals interact with their environment. By harnessing the power of
artificial intelligence and mobile technology, we have created a tool that empowers users to identify
objects, read texts, and navigate their surroundings with greater independence and confidence.
This project underscores the transformative power of technology to break down barriers and
promote accessibility for all.
Looking ahead, we envision expanding the capabilities of our application to further enhance
the user experience and address additional needs of the visually impaired community. One key
aspect we plan to explore is the integration of advanced navigation features, including a guiding
person feature, allowing users to share their location with a trusted individual and communicate
with them directly through the app for added support and assistance.
In conclusion, this project has been a journey of innovation, empathy, and social
responsibility. We are proud of the work we have accomplished and are excited about the potential
impact it can have on the lives of visually impaired individuals. As we continue to refine and
improve our solution, we remain committed to advocating for inclusivity and accessibility in the
field of technology. Through collaboration, empathy, and a dedication to making a difference, we
believe that we can create a more inclusive world for all.

Netography

[1] CRISP-DM https://www.datascience-pm.com/crisp-dm-2/, (25/04/2024)

[2] CRISP-DM picture, https://app.myeducator.com/reader/web/1421a/2/qk5s5/, (25/04/2024)

[3] Mobilenet model, https://keras.io/api/applications/mobilenet/, (29/04/2024)

[4] Architecture Mobilenet, https://lixinso.medium.com/mobilenet-c08928f2dba7, (29/04/2024)

[5] Residuals of Mobilenet, https://medium.com/@luis_gonzales/a-look-at-mobilenetv2-inverted-resid

[6] Use of Mobilenet, https://mmpretrain.readthedocs.io/en/latest/papers/mobilenetv2.html, (1/05/2024)

[7] Performance of Mobilenet, https://lixinso.medium.com/mobilenet-c08928f2dba7, (1/05/2024)

[8] OpenCV, https://opencv.org//, (29/04/2024)

[9] OpenCV model’s use, https://opencv.org//, (29/04/2024)

[10] Android Studio, https://developer.android.com/studio/, (29/04/2024)

[11] Firebase ML Kit, https://firebase.google.com/docs/ml-kit/, (29/04/2024)

[12] Tensorflow Lite, https://www.tensorflow.org/lite/, (29/04/2024)

Résumé
The project presents a mobile application focused on assisting visually impaired people through real-time object recognition, text detection, and voice assistance.
Keywords: MobileNet, OpenCV, Android Studio, AI, object recognition, text detection, voice assistance...

Abstract
The project introduces a mobile application focused on aiding visually impaired individuals
through real-time object recognition, text detection, and voice assistance.

Keywords: MobileNet, OpenCV, Android Studio, AI, object recognition, text detection, voice assistance...

