Mini Project Report 22
ON
Submitted by:
Paras Chandra (2000970130074)
Purushottam Varshney (2000970130087)
Rohan Kumar Sinha (2000970130095)
Session 2022-23
Department of Information Technology
Galgotias College of Engineering and Technology
Greater Noida
ACKNOWLEDGEMENT
We want to give special thanks to our Mini Project coordinator, Ms. Raunak
Sulekh, for her timely advice and valuable guidance during the design and
implementation of this project work.
We also want to express our sincere thanks and gratitude to Dr. Sanjeev
Kumar Singh, Head of the Department (HOD) of Information Technology,
for providing us with the facilities and for all the encouragement and
support.
Finally, we express our sincere thanks to all the staff members of the
Department of Information Technology for their support and cooperation.
Paras Chandra
(2000970130074)
Purushottam Varshney
(2000970130087)
Rohan Kumar Sinha
(2000970130095)
TABLE OF CONTENTS
1. INTRODUCTION ………………………………………………… 7-9
1.1 Project Description ……………………………………………… 7-8
1.2 Proposed System ………………………………………………… 8-9
1.2.1 Architecture of Recommendation Model ……………………… 8-9
1.2.2 Applications Based on Content Recommendation Model ……… 9
2. LITERATURE REVIEW ………………………………………… 10-12
2.1 Historical Overview ……………………………………………… 10
2.2 Related Work ……………………………………………………… 10-11
2.3 Problem Statement ………………………………………………… 11-12
2.3.1 Statement ………………………………………………………… 11
2.3.2 Problem Solution ………………………………………………… 12
3. IMPLEMENTATION ……………………………………………… 13-24
3.1 Project Scope and Features ………………………………………… 13
3.2 Methodology and Tools Used ……………………………………… 13-16
3.2.1 Our Approach …………………………………………………… 13
3.2.2 Description of Modules ………………………………………… 14
3.2.3 Used Technologies ……………………………………………… 14-16
3.2.3.1 Python ………………………………………………………… 14-15
3.2.3.2 Pyttsx3 ………………………………………………………… 15
3.2.3.3 Deepface ……………………………………………………… 15
3.2.3.4 Visual Studio Code …………………………………………… 15
3.2.3.5 OpenCV ……………………………………………………… 16
3.2.3.6 Speech Recognition …………………………………………… 16
3.2.3.7 Spotipy – Spotify API ………………………………………… 16
3.3 Implementation …………………………………………………… 16-23
3.3.1 Environment Setup ……………………………………………… 16-17
3.3.1.1 Deepface Framework Installation ……………………………… 16-17
3.3.1.2 Libraries Installation …………………………………………… 17
3.3.2 Source File ……………………………………………………… 18-23
3.3.2.1 marc.py ………………………………………………………… 18-21
3.3.2.2 Emotion Detection Module …………………………………… 21
3.3.2.3 API Module …………………………………………………… 22
3.3.2.4 Classifier XML File …………………………………………… 23
3.4 Testing ……………………………………………………………… 23-24
4. RESULT / SNAPSHOTS …………………………………………… 25-27
4.1 Wakeup & Capture Image ………………………………………… 25
4.2 Emotion Detection ………………………………………………… 25
4.3 Image Captured …………………………………………………… 26
4.4 Content Recommendation ………………………………………… 26
4.5 Tasks Performed …………………………………………………… 27
5. CONCLUSION AND FUTURE SCOPE …………………………… 28
6. REFERENCES ……………………………………………………… 29
LIST OF FIGURES
ABSTRACT
Today is the era of new technology. In today's busy life, music plays an important
role in refreshing a person's mood and energy. People have different moods and
emotions in different situations, and every emotion is expressed on the person's
face.
Recommending content according to the present mood of the user helps refresh the
mind and makes the person feel happy and good. Earlier recommendation models
are based on the user's data, i.e., the model uses the user's history to recommend
content such as music.
In this project, we have developed an emotion-based recommendation model that
includes a virtual assistant acting as the Voice User Interface (VUI) for the system.
The assistant helps the model detect the current emotion of the user, recommends
content based on the detected emotion, and assists the user in navigating the system.
1. INTRODUCTION
In today's world, a person's feelings can be judged through their emotions,
which are clearly visible in their facial expressions. The feeling going through
our mind is expressed on our face; for example, if we are sad for any reason,
our face will also look dull. Providing content such as music, videos, stories,
or articles according to the present mood of the user therefore helps refresh the
user and satisfies their needs, making them feel good and happy. So, we have
developed an emotion-based content recommendation model: the system captures
a photograph of the user, analyses the user's mood, and recommends content
matching the detected emotion. In our model we recommend content based on
emotion rather than by analysing the user's history. The project analyses the
user's emotion through their facial expression. A webcam captures the user's
photograph, and the model detects the emotion by analysing the captured image
using image segmentation and image processing techniques built on the features
of a facial detection system. Finally, the model recommends content such as
music, videos, stories, or articles according to the mood the system has detected.
Today is the era of technology, and the use of artificial intelligence is common.
So, in our project we have included artificial intelligence to provide user
convenience. We have used a virtual assistant that helps the user navigate the
emotion-based recommendation model and obtain content of interest, i.e., content
matching their mood. A virtual assistant is an application program that
understands voice commands and natural language to perform tasks for the user.
It is very useful today: it lets the user communicate with the system in
human-understandable language, which makes life simpler.
Page | 7
of facial landmarks, the ML classifier of the Deepface library classifies the
image. It then returns a list of features along with the dominant emotion.
Today, everyone likes to talk to a system in human-understandable language.
In our project we have included a virtual assistant that helps the user get
content according to their mood, which in turn refreshes the user's mood.
Nowadays, users do not want to navigate through software manually; they prefer
a virtual assistant that understands their natural language and performs the
tasks they want the software to do.
The virtual assistant captures the user's voice commands through the
microphone and converts them into plain text in the system language.
Thereafter, it performs the requested task and finally produces the output in
human-understandable language. The virtual assistant uses a natural language
processing model to decode the human language and produce the desired result.
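The recognition and speech steps require a microphone and a text-to-speech engine, but the middle step described above — mapping the recognized text to a task — can be sketched as a plain Python function. The command phrases below are illustrative, not the exact ones used in the assistant:

```python
def dispatch(command: str) -> str:
    """Map a recognized voice command to the task the assistant should run.

    Returns the task name as a string; the real assistant would call the
    corresponding module instead of returning a label.
    """
    command = command.lower().strip()
    if "wake up" in command:
        return "greet_user"
    if "recommend" in command or "play" in command:
        return "detect_emotion_and_recommend"
    if "search" in command:
        return "browser_search"
    if "quit" in command or "stop" in command:
        return "shutdown"
    return "unknown_command"
```

For example, a recognized phrase such as "recommend some music" would be routed to the emotion detection and recommendation flow, while an unrecognized phrase falls through to a default handler.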
Page | 8
Figure 1.2: One level DFD for emotion detection module
Page | 9
2. LITERATURE REVIEW
2.1 Historical Overview
Peter Salovey and John D. Mayer coined the term ‘Emotional Intelligence’ in 1990
describing it as “a form of social intelligence that involves the ability to monitor
one’s own and others’ feelings and emotions, to discriminate among them, and to
use this information to guide one’s thinking and action”.[1]
In 1943, the concept of artificial neurons was published.
In 1950, Alan Turing's research paper "Computing Machinery and Intelligence"
proposed the Turing test, which checks a machine's ability to show
intelligence.
In 1955, Allen Newell and Herbert A. Simon created the first artificial
intelligence program, named "Logic Theorist". This program proved 38 of 52
mathematics theorems and found new and more elegant proofs for some of them.
In 1956, the term "Artificial Intelligence" was first adopted by American
computer scientist John McCarthy at the Dartmouth Conference, where, for the
first time, AI was coined as an academic field. [3]
In 2011, Siri, installed on Apple smartphones, became the first digital
virtual assistant. In 2015, Cortana was developed by Microsoft. In 2016,
Google Assistant was developed by Google.
Page | 10
concept of capturing a photograph, and they also worked on the efficiency of
the emotion detection model. They used an Emotion Extraction Module to convert
the captured image to grayscale to improve efficiency, an Audio Extraction
Module to detect the emotion from the user's voice, and finally an
Emotion-Audio Integration Module to implement the model.
Divisha Pandey, in the research work "Voice Assistant Using Python", developed
a voice assistant that helps the user navigate the desktop and perform tasks
such as sending emails and text messages and opening apps, folders, comments,
etc. It uses an NLP (Natural Language Processing) model that converts speech
to text, analyses the text, and produces output in human-understandable
language.
Jayshree Jha et al. proposed an emotion-based music player using image
processing. This showed how various algorithms and techniques suggested by
different authors in their research could be used to connect the music player
with human emotions. [5]
Bassam A., Raja N., et al. have written that, for communication between humans
and machines, speech is carried as analog signals that are converted into
digital waveforms. The technology is massively utilized, has unlimited uses,
and permits machines to reply to the user's commands and voice. Speech
recognition systems are growing day by day. [6]
Anjali Fapal and Trupati Kanade, in their research paper "Personal Virtual
Assistant for Windows", proposed a personal voice assistant for navigating
Windows, written in Python. They also used an NLP model to develop the virtual
assistant. Their approach is that the model first listens to the voice, then
recognises the command, and then performs the task. The resulting voice
assistant helps the user navigate Windows applications.
Parul Tambe, in his research work, proposed automatic interaction between
music and the user, which assists the model in recognising emotion.
Page | 11
2.3.2 Problem Solution
Previous work primarily aims to detect emotions based on facial landmarks and
to provide a virtual assistant for operating a system. The emotion detection
systems developed so far focus only on building the model and measuring its
accuracy, with no application; the models were used only for research purposes.
Moreover, the recommendation models developed earlier use user data to
recommend content, leading to large-scale collection of user data and history.
In previous work there is also no option for the user to get content of another
category if they do not want content matching their mood, and the user has to
navigate the recommendation model manually.
So, to overcome all these problems, we searched for a solution and found that
using a tested and verified dataset of approximately 10,000 people can solve
the accuracy problem and support training the ML classifier, and that a
high-quality webcam should be used for taking photographs.
We also include a facility for the user to ask for content of another category
after the emotion detection model has detected the facial expression and
communicated the detected mood, in case the user does not want content matching
their mood. Finally, we include a virtual assistant that plays the role of the
Voice User Interface (VUI): it helps the user navigate and communicate with the
model in human-understandable language and saves the user time and energy.
Page | 12
3. IMPLEMENTATION
3.1 Project Scope and Features
1. The system provides functionality to capture user’s emotions through facial
expressions.
2. It facilitates content recommendation based on emotions captured.
3. It provides a virtual assistant that works as Voice User Interface (VUI).
4. This project can be extended by mounting it on a content platform for
recommendation.
5. Accuracy of the emotion detection module can be increased further.
Page | 13
3.2.2 Description of Modules
Emotion Extraction Module –
The image of the user is captured through the camera/webcam. The image is then
converted to grayscale so that the classifier can identify the face properly.
Once the conversion is complete, the image is sent to the classifier, which
uses feature extraction to extract the landmark features of the face from the
image. From the extracted features, the classifier detects the emotion
expressed by the user. The classifier is trained on the tested and verified
dataset provided by OpenCV.
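In the module itself the grayscale conversion is a single OpenCV call (`cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`); the per-pixel computation behind it is a weighted sum of the colour channels, which can be sketched as:

```python
def to_gray(b: int, g: int, r: int) -> int:
    """Convert one BGR pixel to its grayscale intensity.

    Uses the standard luminance weights (the same ones OpenCV's
    COLOR_BGR2GRAY applies): Y = 0.299 R + 0.587 G + 0.114 B.
    """
    return round(0.299 * r + 0.587 * g + 0.114 * b)
```

A white pixel (255, 255, 255) maps to 255 and a black pixel to 0, while a pure blue pixel (255, 0, 0 in BGR order) maps to a dark intensity of 29, which is why grayscale conversion preserves brightness rather than colour.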
API Module –
The Spotify API is used to recommend songs to the user based on the emotions
captured by the emotion detection module. It is accessed through the Spotipy
library for Python. After detecting the user's emotion, the system calls the
API, which randomly suggests songs from different genres based on that
emotion.
Integration –
The emotion detection module and the API module are integrated into the Voice
User Interface module and can only be accessed through it. When the interface
module is invoked, it first calls the emotion detection module to detect the
emotion, then calls the API module to search for songs based on that
particular emotion, and finally displays the results to the user.
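The control flow described above can be sketched as a small orchestration function. The two module calls are passed in as plain callables (the stub names and stub data below are illustrative) so the flow can be shown without the camera or the Spotify API:

```python
def run_interface(detect_emotion, fetch_songs):
    """Orchestrate the two modules in the order described:
    detect the emotion first, then fetch songs for that emotion."""
    emotion = detect_emotion()      # emotion detection module
    songs = fetch_songs(emotion)    # API module
    return emotion, songs

# Stubs standing in for the real camera and Spotify calls:
emotion, songs = run_interface(
    detect_emotion=lambda: "happy",
    fetch_songs=lambda e: [f"{e} song {i}" for i in range(1, 4)],
)
```

Keeping the modules behind the interface function like this means neither module needs to know about the other; the interface alone decides the ordering.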
3.2.3.1 Python
Python is an easy to learn, powerful programming language. It has efficient high-
level data structures and a simple but effective approach to object-oriented
programming. Python’s elegant syntax and dynamic typing, together with its
interpreted nature, make it an ideal language for scripting and rapid application
development in many areas on most platforms.
Page | 14
The Python interpreter and the extensive standard library are freely available in
source or binary form for all major platforms from the Python web site,
https://www.python.org/, and may be freely distributed. [7]
In the development of this project, it is used as the main programming language of
the project and its whole development.
3.2.3.2 Pyttsx3
Pyttsx3 is a text-to-speech conversion library in Python. Pyttsx stands for Python
Text to Speech. It is a cross-platform Python wrapper for text-to-speech synthesis.
Unlike alternative libraries, it works offline, and is compatible with both Python 2
and 3. [8]
In the app, we used pyttsx3 for making our virtual assistant communicate with the
user in human language.
3.2.3.3 Deepface
Deepface is a lightweight face recognition and facial attribute analysis (age, gender,
emotion and race) framework for python. It is a hybrid face recognition framework
wrapping state-of-the-art models: VGG-Face, Google FaceNet, OpenFace,
Facebook DeepFace, DeepID, ArcFace, Dlib and SFace.[9]
It is an open-source face recognition and attribute analysis framework created
for Python. It is a very powerful computer vision library that helps identify
things in images, such as the shapes and faces within an image, making them
easy to detect and analyse.
We used Deepface for face recognition using landmarks from the captured
images.
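Deepface's analysis reports per-emotion confidence scores along with the dominant emotion. Selecting the dominant emotion from such a score dictionary can be sketched as follows (the scores below are made up for illustration; they are not output from the actual model):

```python
def dominant_emotion(scores: dict) -> str:
    """Return the emotion label with the highest confidence score."""
    return max(scores, key=scores.get)

# Illustrative scores in the shape Deepface reports (percentages):
scores = {"angry": 1.2, "happy": 85.4, "neutral": 9.9, "sad": 3.5}
```

Here `dominant_emotion(scores)` picks "happy", the label with the highest score, which is the value the recommendation module would then act on.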
Page | 15
3.2.3.5 OpenCV
OpenCV (Open-Source Computer Vision Library) is an open-source computer
vision and machine learning software library. OpenCV was built to provide a
common infrastructure for computer vision applications and to accelerate the use of
machine perception in commercial products. Being an Apache 2 licensed
product, OpenCV makes it easy for businesses to utilize and modify the code. [11]
We used OpenCV mainly for two things:
1. To capture the image of the person, with their current emotion, in front of
the camera.
2. To load the predefined training dataset of different emotions over the face.
3.2.3.6 Speech Recognition
We used the SpeechRecognition library to recognise the commands given by the
user and to convert them into plain text so that the requested operation can be
performed.
3.2.3.7 Spotipy – Spotify API
Here, we used the Spotify API to access Spotify's song database and recommend
songs to the users.
3.3 Implementation
3.3.1 Environment Setup
The environment setup of the project requires the installation of different Python
packages and libraries.
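A minimal setup can be sketched as the following commands, assuming the packages named in this report are installed from PyPI (exact versions are not specified in the report):

```shell
# Install the libraries this project relies on.
pip install deepface           # face recognition and emotion analysis
pip install opencv-python      # webcam capture and image processing (cv2)
pip install pyttsx3            # offline text-to-speech
pip install SpeechRecognition  # voice command recognition
pip install spotipy            # Spotify Web API client
```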
Page | 16
Experiments show that human beings achieve 97.53% accuracy on facial
recognition tasks, whereas these models have already reached and passed that
accuracy level. [14]
After the environment setup, create a file structure in order to divide the modules
and other files.
Page | 17
3.3.2 Source File
3.3.2.1 marc.py
M.A.R.C is the name of our virtual assistant, or voice user interface, and it
is the interface that connects the user to the other two modules. It contains
the functions for content recommendation and for the tasks to perform.
The screenshot in Figure 3.3 shows the code for initializing and implementing
speech recognition using Google recognizer.
Page | 18
This function enables greetings from the virtual assistant on every start up.
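The exact wording used in marc.py is not reproduced here; a minimal sketch of such a start-up greeting follows, with the hour passed in as a parameter so it can be exercised without a clock or a speech engine:

```python
def startup_greeting(hour: int, name: str = "M.A.R.C") -> str:
    """Build the greeting the assistant speaks on start-up,
    based on the hour of day (0-23)."""
    if 5 <= hour < 12:
        part = "Good morning"
    elif 12 <= hour < 17:
        part = "Good afternoon"
    else:
        part = "Good evening"
    return f"{part}! I am {name}. How can I help you?"
```

In the real assistant the returned string would be handed to the pyttsx3 engine's `say()` method so the greeting is spoken aloud.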
Page | 19
Figure 3.5.2: Python function for content recommendation
Page | 20
Figure 3.7: Main function of the virtual assistant
Page | 21
3.3.2.3 API Module
This module fetches songs from the Spotify database according to the emotion
detected by the emotion detection module and suggests top songs from that
category.
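The report does not list the exact emotion-to-genre mapping used, so the mapping below is illustrative. The function builds the search query string that would then be handed to Spotipy's `search` call:

```python
# Hypothetical mapping from detected emotion to a Spotify genre.
EMOTION_TO_GENRE = {
    "happy": "pop",
    "sad": "acoustic",
    "angry": "metal",
    "neutral": "chill",
}

def build_query(emotion: str) -> str:
    """Return a Spotify search query for the detected emotion,
    falling back to 'pop' for unrecognized emotions."""
    genre = EMOTION_TO_GENRE.get(emotion.lower(), "pop")
    return f"genre:{genre}"
```

With an authenticated Spotipy client `sp`, such a query could be used along the lines of `sp.search(q=build_query("sad"), type="track", limit=10)` to fetch the top songs for the detected category.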
Page | 22
3.3.2.4 Classifier XML File
This XML file is required by the emotion classifier to classify the captured
emotions using the data points retrieved.
3.4 Testing
1. Wake Up by Voice Command
- Expected result:
The system should wake up and greet the user.
- Actual outcome:
System wakes up using voice command by user successfully.
Page | 23
3. Emotion Detection by Model
- Expected result:
After image capture the model should be able to detect emotion of user.
- Actual outcome:
At first, the model failed to detect the emotion of the user.
- Reason:
An error in the integration and parameter passing.
- Fix:
The integration was redone, after which emotions were detected successfully.
4. Content Recommendation
- Expected result:
The system should recommend content based on the emotion captured.
- Actual outcome:
It suggested a list of songs based on the emotion captured.
6. Quitting on Command
- Expected result:
System should quit on voice command.
- Actual outcome:
System quits on user’s voice command.
Page | 24
4. RESULT / SNAPSHOTS
4.1 Wake up & Capture Image
Page | 25
4.3 Image Captured
Page | 26
4.5 Tasks Performed
The following screenshots show the browser search task performed by the model.
Page | 27
5. CONCLUSION AND FUTURE SCOPE
Page | 28
6. REFERENCES
[4] H. Kabani, S. Khan, O. Khan and S. Tadvi, "Emotion based music player,"
International Journal of Engineering Research and General Science, vol. 3,
pp. 750-756, 2015.
[6] G. O. Young, "Synthetic structure of industrial plastics (Book style with
paper title and editor)," in Plastics, 2nd ed., vol. 3, J. Peters, Ed.
New York: McGraw-Hill, 1964, pp. 15-24.
[7] Python documentation: https://docs.python.org/3/tutorial/index.html
[8] Pyttsx3: https://pypi.org/project/pyttsx3/
Page | 29