Batch 4 Naveen - Renold Arul
1. PROJECT BATCH NO.: (To be allotted by Project Coordinator)
2. TEAM MEMBER(s):
Sl. No.   Register No.   Student Name   Mobile No.   E-mail ID
Note: Team size will be restricted to two members. Individual projects are encouraged.
3. PROJECT DOMAIN AND TITLE:
4. ABSTRACT:
(Contents : Introduction to domain, Existing System, Introduction to Proposed System, Methodology, and
Social Impacts)
Depression is a mood disorder that involves a persistent feeling of sadness and loss of interest. It is a
major problem in the modern age. Globally, more than 264 million people of all ages suffer from
depression. It is a leading cause of disability worldwide, a major contributor to the overall global
burden of disease, and a cause of suicide. To combat this, we propose a speech-based emotion classifier
to detect feelings of anger and loneliness in a person.
5. EXISTING SYSTEM:
There are various systems for finding a person's emotions from speech. They involve a preprocessing step
in which the data is formatted for use. Common preprocessing steps include framing, feature selection,
and noise reduction. Feature selection is an important part of emotion recognition; the selected features
can be prosodic, spectral, or based on voice quality. Supporting modalities such as visual signals or
linguistic features can then be added. Finally, an appropriate classifier is used to classify the emotions.
The classifier can be based on a variety of machine learning algorithms, such as a Support Vector Machine,
an Artificial Neural Network, or a Convolutional Neural Network. Current systems use a combination of
prosodic and spectral features and classify them using a Convolutional Neural Network to obtain the
emotion. However, there is no system specifically aimed at detecting depression in users with the goal of
offering them psychological help.
6. PROPOSED SYSTEM:
The proposed system aims to detect depression in people from the emotion present in their speech.
According to the Discrete Emotional Model, emotions are classified into six types: sadness, happiness,
fear, anger, disgust, and surprise. We focus on the sadness parameter: based on the level of sadness in a
person's voice, the depression level is identified. The proposed system uses a specific set of features to
find the level of sadness in a person's speech. A Convolutional Neural Network focused only on the
sadness emotion is then used to find the level of sadness, which is used to recommend the person for
further psychological treatment.
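As a rough illustration of the final step, the classifier's sadness probability could be mapped to a depression level and a recommendation. The thresholds below are assumptions for illustration only, not clinically validated cut-offs.

```python
# Illustrative mapping from a CNN sadness probability to a coarse
# depression level and a recommendation. Thresholds are assumed values.

def depression_level(sadness_prob):
    """Map a sadness probability in [0, 1] to a coarse depression level."""
    if sadness_prob >= 0.75:
        return "high"
    if sadness_prob >= 0.40:
        return "moderate"
    return "low"

def recommend(sadness_prob):
    level = depression_level(sadness_prob)
    if level == "high":
        return "recommend further psychological treatment"
    if level == "moderate":
        return "suggest a follow-up screening"
    return "no action"

print(recommend(0.82))  # prints "recommend further psychological treatment"
```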
2. 312417104006 Arul.M
Signatures:
The Project Initiation Document (PID) is the top-level project planning document. It brings
together all of the information needed to get the project started and communicates that key
information to the project's stakeholders.
The Project Initiation Document does the following:
• Defines the project and its scope.
• Justifies the project.
• Defines the roles and responsibilities of project participants.
• Gives people the information they need to be productive and effective right from the start.
PROJECT DOMAIN: (Specify the area of the project work)
MOTIVATION: (Why are you doing this project work? Describe by considering the
environmental, societal, health, safety, legal, cultural issues and needs.)
Depression is a condition that is hard to identify. Over 50% of depression cases go unidentified,
which leads to a lack of medical attention, and these undetected cases constantly affect people's lives.
We have devised this system to find cases of depression and help direct sufferers to appropriate
medical attention. We believe it will contribute to the betterment of society and an improved standard
of living.
DESCRIPTION: (Describe briefly about the proposed project. Highlight how you are applying
knowledge of Mathematics, Information Technology fundamentals and engineering specialization.)
We use the RAVDESS dataset to find people's emotions. Of the audio and visual data present in
RAVDESS, we use only the audio data. The features selected from the speech are a combination of
prosodic and spectral features extracted with the Librosa library in Python. A 4-layer Convolutional
Neural Network is then used to calculate the level of sadness detected in a person's voice.
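As a simplified stand-in for the Librosa-extracted features, two classic speech features, short-time energy and zero-crossing rate, can be computed directly. In the actual system these would come from Librosa (e.g. MFCCs and chroma features); the pure-Python versions here only illustrate the idea.

```python
# Minimal sketches of two classic speech features. These are illustrative
# stand-ins for the prosodic/spectral features Librosa would extract.

def short_time_energy(frame):
    """Average squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)

frame = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6]
print(short_time_energy(frame))
print(zero_crossing_rate(frame))  # prints 1.0 (every pair alternates sign)
```

Energy tends to track loudness and arousal, while zero-crossing rate roughly tracks noisiness and spectral content, which is why such features are common inputs to emotion classifiers.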
SOFTWARE REQUIREMENTS: (Specify the software components required for the proposed
system.)
• Python 3 (ver. 3.7.3)
• h5py
• Keras
• scipy
• sklearn
• speechpy
• tensorflow
HARDWARE REQUIREMENTS: (Specify the hardware components required for the proposed
system.)
• Computer
• Sufficient processing power to run machine learning algorithms in reasonable
time (recommended: 4 GB)
EXPERTISE: (Enter a brief description of the past project that you have done on the identified
technical area, expertise to technology in terms of applying appropriate techniques, resources, and
modern IT tools )
We have analyzed other datasets, including a Pokémon dataset, the Iris dataset, and heartbeat
datasets, through which we learned how to find the necessary features of sound and create a
model for it.
PROJECT BENEFITS (Specify the benefits out of your project applicable to the need of the
society)
Identifying cases of depression faster reduces the number of suicides and the mental trauma
experienced by people. The system needs only sound input to identify the depression level, so it does
not intrude on privacy.
ETHICAL PRINCIPLES (Highlight the ethical principles and commitment towards professional
ethics and responsibilities and norms of the engineering practice, you are going to adhere in this
project development)
Recording and saving people's speech can intrude on a person's privacy, so we will use the speech
only to extract the features necessary to detect depression. We will not use the actual speech to listen
to private conversations.
PROJECT CONSTRAINTS (What things must you take into consideration that will influence
your project.)
The amount of speech data that can be obtained is a major constraint. Different people speak in
different ways due to language and cultural differences, and this variance can affect results. It is also
important to separate speech from noise.
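One minimal way to separate speech from noise is an energy-threshold voice-activity check per frame. The threshold below is an assumed illustrative value; real systems estimate the noise floor adaptively.

```python
# Sketch of a simple energy-threshold voice-activity check, one way to
# separate speech frames from noise. The threshold is an assumed value.

def frame_energy(frame):
    return sum(s * s for s in frame) / len(frame)

def is_speech(frame, threshold=0.01):
    """True if the frame's average energy exceeds the noise threshold."""
    return frame_energy(frame) > threshold

loud = [0.3, -0.4, 0.5, -0.2]
quiet = [0.01, -0.02, 0.015, -0.01]
print(is_speech(loud), is_speech(quiet))  # prints True False
```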
PROJECT TEAM (Describe the roles and responsibilities of members of the team in the project
development)
M.Arul:
• To find the appropriate machine learning algorithm applicable to the system
• To build the CNN classifier
Naveen Renold.J:
• To perform the steps for preprocessing of speech data
• To create the python application
• To visualize the final data.
LITERATURE SURVEY
Title of the Base Paper:
Speech emotion recognition: Emotional models, databases, features, preprocessing methods,
supporting modalities, and classifiers
Author Details:
Publication Details:
Speech emotion recognition: Emotional models, databases, features, preprocessing methods,
supporting modalities, and classifiers. Speech Communication, Volume 116, 2020, Pages 56–76,
ISSN 0167-6393.
Contribution:
This paper presents the complete history of speech emotion recognition with every step explained.
It then goes in depth, covering all the methods available for performing each step along with their
pros and cons. It includes a list of the prominent datasets used for speech emotion recognition, as
well as a case study of the various research works on speech emotion recognition and their results.
Methodology:
This paper explains speech emotion recognition in depth, covering the various types, methods,
datasets, and previous research on the subject. There are two types of emotion classification: the
discrete method and the dimensional method. In the discrete method, emotions are classified into
six types. The dimensional method uses the same six parameters along with additional parameters,
such as valence and power, to weight the initial parameters. The common preprocessing steps are
framing, noise reduction, and feature selection. Feature selection involves choosing the appropriate
features of the speech to analyze; features in speech can be prosodic, spectral, voice-quality-based,
or TEO-based. The needed features are then used for analysis.
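The framing step mentioned above can be sketched as splitting the signal into fixed-length overlapping frames. The frame and hop sizes below are illustrative; typical speech systems use roughly 25 ms frames with 10 ms hops.

```python
# Sketch of the "framing" preprocessing step: split a signal into
# fixed-length, overlapping frames. Sizes here are illustrative.

def frame_signal(signal, frame_len, hop):
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop  # overlap occurs whenever hop < frame_len
    return frames

signal = list(range(10))
print(frame_signal(signal, frame_len=4, hop=2))
# prints [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```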
Outcome:
Using the data provided in this paper, we can identify suitable methods to employ in our
system. Its comprehensive coverage of all methods in speech emotion recognition gives us
many options to choose the most optimal method.
Challenges:
The main challenge in speech emotion recognition is identifying speech outside of the training
datasets. Speech patterns vary among people of different ethnicities and cultures, and this variance
makes it hard to develop an accurate machine learning model.
Future Scope:
As research in the speech emotion classification area grows, more methodologies and
efficient technologies will emerge that achieve better accuracy. This will make the approach
practical and useful in the real world.
Approval by Supervisor__________________________________
LITERATURE SURVEY
Title of the Supporting Paper 1:
Emotion Recognition in Speech Using Neural Networks
Author Details:
J. Nicholson, K. Takahashi and R. Nakatsu
Publication Details:
Neural Comput & Applic (2000) 9:290–296. Springer-Verlag London Limited.
Contribution:
A basic speech emotion recognition model that uses a Convolutional Neural Network as a
classifier to find emotions. It uses a combination of 15 prosodic and spectral features and can
detect eight emotions: joy, teasing, fear, sadness, disgust, anger, surprise, and neutral. The
feature selection and CNN models from this system are used in our system.
Methodology:
Speech emotion recognition here has two major parts: speech processing and emotion recognition.
In speech processing, when the volume is above a certain threshold, scanning for the 15 necessary
features starts. The dataset is a self-created dataset of 100 Japanese people. The next step is emotion
recognition, in which the dataset is separated into training and testing data. A CNN with 8 sub
neural networks, one for each emotion, is then created, trained, and used to predict the data.
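The training/testing separation described above can be sketched as a seeded shuffle and hold-out split. The 80/20 ratio below is an assumption for illustration; the paper does not specify one.

```python
# Sketch of a train/test split: shuffle with a fixed seed for
# reproducibility, then hold out a fraction for testing.

import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    shuffled = samples[:]                     # copy, leave input intact
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

samples = list(range(10))
train, test = train_test_split(samples)
print(len(train), len(test))  # prints 8 2
```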
Outcome:
The system achieved an accuracy of 50% for both open and closed systems. With modern
techniques it is possible to improve the accuracy to a greater extent.
Challenges:
Achieving good recognition accuracy for open systems as well remains a challenge. Differences
between the speech of ordinary people and the data in the dataset can impact the accuracy.
Future Scope:
Using a larger dataset, better algorithms, and better feature selection, it is possible to improve
the model to acceptable levels of accuracy.
Approval by Supervisor__________________________________
LITERATURE SURVEY
Title of the Supporting Paper 2:
Automated speech-based screening of depression using deep convolutional neural networks
Author Details:
Publication Details:
Published by Elsevier B.V. Procedia Computer Science 164 (2019) 618–628
Contribution:
This paper proposes a novel approach to automated depression detection in speech using a
convolutional neural network (CNN) and multipart interactive training. The model was tested
using 2568 voice samples obtained from 77 non-depressed and 30 depressed individuals. In the
experiment conducted, the data were fed to residual CNNs in the form of spectrograms, images
auto-generated from the audio samples. The experimental results obtained using different
ResNet architectures gave a promising baseline accuracy reaching 77%.
Methodology:
First, the dataset is preprocessed. The dataset used is the DAIC database, which contains clinical
interviews designed to support the diagnosis of psychological distress conditions such as
anxiety, depression, and post-traumatic stress. The audio files are cut into 60-second clips and
normalized to form the dataset for machine learning. Several pre-trained CNN architectures that
use a residual learning framework (ResNet-18, 34, 50, 101, 152) were used in this study, with
further fine-tuning of the available ResNet architectures to train a state-of-the-art image
classifier.
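The clip-cutting and normalization step can be sketched as follows. The tiny sample rate is used purely to keep the example small; real audio would use something like 16 kHz.

```python
# Sketch of the DAIC-style preprocessing described above: cut audio into
# 60-second clips and peak-normalize each clip. The sample rate here is
# an illustrative toy value.

def make_clips(samples, sample_rate, clip_seconds=60):
    clip_len = sample_rate * clip_seconds
    clips = []
    for start in range(0, len(samples) - clip_len + 1, clip_len):
        clip = samples[start:start + clip_len]
        peak = max(abs(s) for s in clip) or 1.0
        clips.append([s / peak for s in clip])  # peak normalization
    return clips

# Pretend sample rate of 2 Hz -> 120 samples per "60 s" clip.
audio = [0.5] * 250
clips = make_clips(audio, sample_rate=2)
print(len(clips), len(clips[0]))  # prints 2 120 (trailing remainder dropped)
```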
Outcome:
This method produced a promising classification accuracy of around 70% for a ResNet-34
model, and 71% for a ResNet-50 model, both trained on spectrograms of 224x224 px.
Challenges:
The main challenge in speech emotion recognition is identifying speech outside of the training
datasets. Speech patterns vary among people of different ethnicities and cultures, and this variance
makes it hard to develop an accurate machine learning model.
Future Scope:
The results suggest a promising new direction in using audio spectrograms for preliminary
screening of depressive subjects by using short samples of voice. The spectrograms proved to
have a potential for generating CNN learnable features.
Approval by Supervisor__________________________________
6. Yue, J., Zang, X., Le, Y. et al. Anxiety, depression and PTSD among children and
their parent during 2019 novel coronavirus disease (COVID-19) outbreak in
China. Curr Psychol (2020).
7. Silva, W.A.D., de Sampaio Brito, T.R. & Pereira, C.R. COVID-19 anxiety scale
(CAS): Development and psychometric properties. Curr Psychol (2020).
8. Chen, B., Li, Qx., Zhang, H. et al. The psychological impact of COVID-19 outbreak
on medical staff and the general public. Curr Psychol (2020).
4. This paper provides an analysis of gender and identity issues in the context of
depression level estimation of de-identified speech.
5. Provides the RAVDESS dataset, which consists of audio and visual data for speech
emotion recognition. The data is labelled for 6 emotions and is separated into normal
speech and songs.
6. During the ongoing COVID-19 outbreak, great attention should be paid to the mental
health of the population, especially medical staff, and measures such as psychological
intervention should be actively carried out for reducing the psychosocial effects.