Batch 4 Naveen - Renold Arul



DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BATCH 2017-21 : PROJECT REGISTRATION FORM (CS8811 - PROJECT WORK)

1. PROJECT BATCH NO.: 4 (To be allotted by Project Coordinator)
2. TEAM MEMBER(s):

Sl. No.  Register No.   Student Name     Mobile No.   E-mail ID
1.       312417104059   Naveen Renold.J  7358639628   naveenrenold@gmail.com
2.       312417104006   Arul.M           9840845886   arulmanoharan27@gmail.com

Note: Team size is restricted to two members. Individual projects are encouraged.
3. PROJECT DOMAIN AND TITLE:

Name of the Team Leader: Naveen Renold.J

Project Domain: Machine Learning

Tentative Title of the Project: Using a speech-based emotion classifier to identify depression among people during COVID and disaster periods

4. ABSTRACT:
(Contents : Introduction to domain, Existing System, Introduction to Proposed System, Methodology, and
Social Impacts)

Depression is a mood disorder that involves a persistent feeling of sadness and loss of interest. It is a major problem in the modern age. Globally, more than 264 million people of all ages suffer from depression. It is a leading cause of disability worldwide, a major contributor to the overall global burden of disease, and a cause of suicide. To combat this, we propose a speech-based emotion classifier that detects feelings of sadness and loneliness in a person's speech.

5. EXISTING SYSTEM:
There are various systems for recognizing a person's emotions from speech. They involve a preprocessing step in which the data is formatted for use; common preprocessing steps include framing, feature selection and noise reduction. Feature selection is an important part of emotion recognition, and the selected features can be prosodic, spectral or based on voice quality. Supporting modalities such as visual signals or linguistic features can then be added. Finally, an appropriate classifier is used to classify the emotions. The classifier can be based on a variety of machine learning algorithms such as Support Vector Machines, Artificial Neural Networks, Convolutional Neural Networks and so on. Current systems use a combination of prosodic and spectral features and classify them with a Convolutional Neural Network to obtain the emotion. However, there is no system specific to detecting depression in users with the aim of offering them psychological help.
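As a minimal sketch of the classifier stage described above, a Support Vector Machine from scikit-learn (listed later in the software requirements) could be fit on extracted feature vectors. The random features and binary labels below are placeholders for real prosodic/spectral features, not actual speech data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder "feature vectors": in a real system, each row would be
# prosodic/spectral features extracted from one utterance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = (X[:, 0] > 0).astype(int)  # toy binary "emotion" label

# Standardize features, then fit an RBF-kernel SVM
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```

The same pipeline shape applies whichever classifier is swapped in for the SVC.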

6. PROPOSED SYSTEM:
The proposed system aims to detect depression in people from the emotion present in their speech. According to the Discrete Emotional Model, emotions are classified into six types: sadness, happiness, fear, anger, disgust, and surprise. We focus on the sadness parameter. Based on the level of sadness in a person's voice, the depression level is identified. The proposed system uses a specific set of features to find the level of sadness in a person's speech. A Convolutional Neural Network focused only on the sadness emotion is then used to estimate the level of sadness, which is used to recommend the person for further psychological treatment.

7. METHODOLOGY FOR PROPOSED SYSTEM:


We use the RAVDESS dataset to recognize people's emotions, restricting ourselves to the audio portion of its audio-visual data. The features selected from the speech are a combination of prosodic and spectral features extracted with the Librosa library in Python. A four-layer Convolutional Neural Network is then used to estimate the level of sadness detected in a person's voice.

8. SOCIAL IMPACT OF PROPOSED SYSTEM:


Depression is a condition that is hard to identify. Over 50% of depression cases go undetected, which leads to a lack of medical attention. These undetected cases continually affect people's lives. Our proposed system aims to find cases of depression and help those affected receive appropriate medical attention, lowering the number of unnoticed cases and benefiting society.

9. SIGNATURE(S) OF TEAM MEMBER(s):


Sl. No.  Register No.   Student Name     Signatures
1.       312417104059   Naveen Renold.J
2.       312417104006   Arul.M

10. REVIEW COMMITTEE COMMENTS:


Selected / Rejected

Signatures:

Project Coordinator HOD / CSE



PROJECT INITIATION DOCUMENT

The Project Initiation Document (PID) is the top-level project planning document. This PID brings together all of the information needed to get the project started and communicates that key information to the project's stakeholders.
The Project Initiation Document does the following:
• Defines the project and its scope.
• Justifies the project.
• Defines the roles and responsibilities of project participants.
• Gives people the information they need to be productive and effective right from the start.
PROJECT DOMAIN: (Specify the area of the project work)

Machine Learning, Speech Emotion Recognition.

TITLE: (Tentative title of the project work)

USING A SPEECH-BASED EMOTION CLASSIFIER TO IDENTIFY DEPRESSION AMONG PEOPLE DURING COVID AND DISASTER PERIODS.

MOTIVATION: (Why are you doing this project work? Describe by considering the
environmental, societal, health, safety, legal, cultural issues and needs.)
Depression is a condition that is hard to identify. Over 50% of depression cases go undetected, which leads to a lack of medical attention. These undetected cases continually affect people's lives. We have devised this system to find cases of depression and help those affected receive appropriate medical attention. We believe it will benefit society and improve quality of life.

DESCRIPTION: (Describe briefly about the proposed project. Highlight how you are applying
knowledge of Mathematics, Information Technology fundamentals and engineering specialization.)

We use the RAVDESS dataset to recognize people's emotions, restricting ourselves to the audio portion of its audio-visual data. The features selected from the speech are a combination of prosodic and spectral features extracted with the Librosa library in Python. A four-layer Convolutional Neural Network is then used to estimate the level of sadness detected in a person's voice.
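The four-layer CNN described here could be sketched with Keras (listed under the software requirements). The filter counts, the 40-dimensional input, and the single sigmoid "sadness score" output are illustrative assumptions, not the project's actual architecture; the four trainable layers (two Conv1D, two Dense) loosely mirror the "four-layer" description:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Small 1-D CNN over a sequence of 40 feature coefficients per utterance.
model = keras.Sequential([
    layers.Input(shape=(40, 1)),
    layers.Conv1D(64, 5, activation="relu", padding="same"),
    layers.MaxPooling1D(2),
    layers.Conv1D(128, 5, activation="relu", padding="same"),
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # sadness level in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Sanity check: one dummy feature vector in, one sadness score out
score = model.predict(np.zeros((1, 40, 1)), verbose=0)
print(score.shape)  # (1, 1)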

PROJECT SCOPE: (Define the boundaries of this project)


This system can help identify cases of depression but cannot treat depression. It suggests that people with a high depression score seek external medical help. It is usable by people of any age and gender. It requires only speech data to identify a person's depression, making it a non-intrusive system with regard to privacy concerns.

SOFTWARE REQUIREMENTS: (Specify the software components required for the proposed
system.)

• Python 3 (ver. 3.7.3)
• h5py
• Keras
• librosa
• scipy
• sklearn
• speechpy
• tensorflow

HARDWARE REQUIREMENTS: (Specify the hardware components required for the proposed
system.)

• Computer with sufficient processing power to run machine learning algorithms in
reasonable time (recommended: 4 GB RAM)

EXPERTISE: (Enter a brief description of the past project that you have done on the identified
technical area, expertise to technology in terms of applying appropriate techniques, resources, and
modern IT tools )

We have analyzed other datasets, including the Pokemon dataset, the Iris dataset and
heartbeat datasets, through which we learned how to find the necessary features of sound
and create a model for them.

PROJECT OUTCOMES / DELIVERABLES (What specific outcomes / deliverables will be
achieved, and how will you measure these outcomes)
The depression level will be predicted for each person. Its accuracy can be compared with the
actual values. The model can then be visualized as a graph or in another format.

PROJECT BENEFITS (Specify the benefits out of your project applicable to the need of the
society)
Identifying cases of depression faster, thereby reducing the number of suicides and the mental
trauma experienced by people. The system needs only sound input to identify the depression
level and does not intrude on privacy.

ETHICAL PRINCIPLES (Highlight the ethical principles and commitment towards professional
ethics and responsibilities and norms of the engineering practice, you are going to adhere in this
project development)
Recording and saving people's speech can intrude on a person's privacy, so we will use the
speech only to extract the features necessary to detect depression. We will not use the actual
speech to listen to private conversations.

CONTINUOUS IMPROVEMENT (Highlight the possible avenues for further developments in
the future and how this project is going to help you in your professional career)
The field of recognizing emotions from speech is not yet fully developed. There is much room
for improvement and additional features, and developments in this field can lead to newer
technology. It is one of the newer trends that will help us further in our careers.

PROJECT CONSTRAINTS (What things must you take into consideration that will influence
your project.)
The amount of speech data that can be obtained is a major constraint. Different people speak in
different ways due to language and cultural differences, and this variance can affect results. It
is also important to separate speech from noise.

MULTIDISCIPLINARY ASPECTS (Highlight the multidisciplinary components to be
incorporated into the project development)
This project is based on machine learning concepts, but it is also aided by other fields such as
mathematics and statistics to analyze the data and present it accurately. Research in speech is
also necessary to identify the features to be selected.

PROJECT TEAM (Describe the roles and responsibilities of members of the team in the project
development)
M.Arul:
• To find the appropriate machine learning algorithm applicable to the system
• To make the C.N.N classifier
Naveen Renold.J:
• To perform the steps for preprocessing of speech data
• To create the python application
• To visualize the final data.

Approval by Project Coordinator________________________________________________



LITERATURE SURVEY
Title of the Base Paper:
Speech emotion recognition: Emotional models, databases, features, preprocessing methods,
supporting modalities, and classifiers

Author Details:

Mehmet Berkehan Akçay, Kaya Oğuz

Publication Details:
Speech emotion recognition: Emotional models, databases, features, preprocessing methods,
supporting modalities, and classifiers. Speech Communication, Volume 116, 2020, Pages
56-76, ISSN 0167-6393.

Contribution:
This paper presents the complete history of speech emotion recognition with every step
explained. It then goes in depth, explaining all possible methods for performing each step
along with their pros and cons. It includes a list of the prominent datasets used for speech
emotion recognition, as well as a case study of the various studies conducted in speech
emotion recognition and their results.

Methodology:
This paper explains speech emotion recognition in depth. It also explains the various types,
methods, datasets and previous researches done on the subject. There are two types of
emotion classification: Discrete method or Dimensional method. In Discrete method emotions
classified into six types. Dimensional method uses the 6 parameters and also uses other
parameters like valence and power to give add weight to the initial parameters. The common
preprocessing steps are framing, noise reduction and feature selection. Features selection
involves selecting appropriate feature from the speech to analyze. The features in speech can
be prosodic, Spectral, Voice quality or TEO based. The needed features are used for analysis.
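The framing step named above splits the signal into short overlapping windows before any features are computed. The sketch below assumes a 400-sample frame length and 160-sample hop (25 ms / 10 ms at 16 kHz), which are common choices rather than values taken from the paper:

```python
import numpy as np

def frame_signal(y, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (a typical preprocessing step)."""
    # Number of full frames that fit in the signal
    n_frames = 1 + max(0, len(y) - frame_len) // hop
    return np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])

y = np.arange(16000, dtype=np.float32)  # 1 s of audio at 16 kHz
frames = frame_signal(y)
print(frames.shape)  # (98, 400)
```

Noise reduction and feature selection then operate on these frames rather than on the raw waveform.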

Outcome:
Using the data provided in this paper we can identify suitable methods to employ in our
system. Its comprehensive coverage of all methods in speech emotion recognition gives us
many options to choose the most optimal method.

Challenges:
The main challenge in speech emotion recognition is identifying speech outside of the training
datasets. Speech patterns vary among people of different ethnicities and cultures, and this
variance makes it hard to develop an accurate machine learning model.

Future Scope:
As the research in the speech emotion classification area grows, more methodologies and
efficient technologies will be obtained to get a better accuracy. This will enable it to be
practical and useful in the real world.

Approval by Supervisor__________________________________

LITERATURE SURVEY
Title of the Supporting Paper 1:
Emotion Recognition in Speech Using Neural Networks

Author Details:
J. Nicholson, K. Takahashi and R. Nakatsu

Publication Details:
Neural Computing & Applications (2000) 9:290-296. Springer-Verlag London Limited.

Contribution:
A basic speech emotion recognition model that uses a Convolutional Neural Network as a
classifier. It uses a combination of 15 prosodic and spectral features and can recognize eight
emotions: joy, teasing, fear, sadness, disgust, anger, surprise and neutral. The feature
selection and CNN models from this system are used in our system.

Methodology:
Speech emotion recognition is performed in two major parts: speech processing and emotion
recognition. In speech processing, when the volume rises above a certain threshold, scanning
for the 15 necessary features begins. The dataset is a self-created dataset of 100 Japanese
speakers. The next step is emotion recognition, in which the dataset is separated into training
and testing data. A CNN with eight sub neural networks, one for each emotion, is then
created, trained and used to predict the data.

Outcome:

The system was able to achieve an accuracy of 50% for open and closed systems. With modern
techniques it is possible to improve the accuracy to a greater extent.

Challenges:
The recognition accuracy needs to be good for open systems as well. Differences between the
speech of ordinary people and the data in the dataset may impact the accuracy.

Future Scope:
With a larger dataset, better algorithms and better feature selection, it is possible to improve
the model to acceptable levels of accuracy.

Approval by Supervisor__________________________________

LITERATURE SURVEY
Title of the Supporting Paper 2:
Automated speech-based screening of depression using deep convolutional neural networks

Author Details:

Karol Chlasta, Krzysztof Wołk, Izabela Krejtz

Publication Details:
Published by Elsevier B.V. Procedia Computer Science 164 (2019) 618–628

Contribution:
This paper proposes a novel approach to automated depression detection in speech using a
convolutional neural network (CNN) and multipart interactive training. The model was tested
using 2568 voice samples obtained from 77 non-depressed and 30 depressed individuals. In
the experiment conducted, the data were fed to residual CNNs in the form of spectrograms,
images auto-generated from the audio samples. The experimental results obtained using
different ResNet architectures gave a promising baseline accuracy reaching 77%.

Methodology:
First, the dataset is preprocessed. The dataset used is the DAIC database, which contains
clinical interviews designed to support the diagnosis of psychological distress conditions such
as anxiety, depression, and post-traumatic stress. The audio files are cut into 60-second clips
and normalized to form the dataset for machine learning. Several pre-trained CNN
architectures that use a residual learning framework (ResNet-18, 34, 50, 101, 152) were used,
with further fine-tuning of the available ResNet architectures to train a state-of-the-art image
classifier.
Outcome:

This method produced a promising classification accuracy of around 70% for a ResNet-34
model and 71% for a ResNet-50 model, both trained on spectrograms of 224x224 px.

Challenges:
The main challenge in speech emotion recognition is identifying speech outside of the training
datasets. Speech patterns vary among people of different ethnicities and cultures, and this
variance makes it hard to develop an accurate machine learning model.

Future Scope:
The results suggest a promising new direction: using audio spectrograms of short voice
samples for preliminary screening of depressive subjects. The spectrograms proved to have
potential for generating CNN-learnable features.

Approval by Supervisor__________________________________

INFERENCE OF LITERATURE SURVEY

Base Paper: This paper explains speech emotion recognition in depth, covering the various types, methods, datasets and previous research on the subject. There are two types of emotion classification: the discrete method and the dimensional method. In the discrete method, emotions are classified into six types. The dimensional method uses the six parameters and adds other parameters, such as valence and power, to weight the initial parameters. The common preprocessing steps are framing, noise reduction and feature selection. Feature selection involves choosing appropriate features from the speech to analyze; the features can be prosodic, spectral, voice-quality or TEO based. The needed features are used for analysis.

Supporting Paper 1: A basic speech emotion recognition model that uses a Convolutional Neural Network as a classifier. It uses a combination of 15 prosodic and spectral features and can recognize eight emotions: joy, teasing, fear, sadness, disgust, anger, surprise and neutral.

Supporting Paper 2: This paper proposes a novel approach to automated depression detection in speech using a convolutional neural network (CNN) and multipart interactive training. In the experiment conducted, the data were fed to residual CNNs in the form of spectrograms, images auto-generated from the audio samples. The experimental results obtained using different ResNet architectures gave a promising baseline accuracy reaching 77%.

LIST OF OTHER TECHNICAL PAPERS SURVEYED


Paper ID  Paper Details
1. Feature Augmenting Networks for Improving Depression Severity Estimation From
Speech Signals. Le Yang, Dongmei Jiang and Hichem Sahli, IEEE Access, Digital Object
Identifier 10.1109/ACCESS.2020.2970496
2. A review of depression and suicide risk assessment using speech analysis.
Nicholas Cummins, Stefan Scherer, Jarek Krajewski, Sebastian Schnieder, Julien Epps,
Thomas F. Quatieri, Speech Communication, Volume 71, 2015, Pages 10-49, ISSN
0167-6393
3. Kuan Ee Brian Ooi, Margaret Lech, Nicholas Brian Allen, Prediction of major
depression in adolescents using an optimized multi-channel weighted speech
classification system, Biomedical Signal Processing and Control, Volume 14, 2014,
Pages 228-239, ISSN 1746-8094

4. Paula Lopez-Otero, Laura Docio-Fernandez, Analysis of gender and identity issues in
depression detection on de-identified speech, Computer Speech & Language, Volume
65, 2021, 101118, ISSN 0885-2308

5. Livingstone SR, Russo FA (2018) The Ryerson Audio-Visual Database of Emotional
Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal
expressions in North American English. PLoS ONE 13(5): e0196391.

6. Yue, J., Zang, X., Le, Y. et al. Anxiety, depression and PTSD among children and
their parent during 2019 novel coronavirus disease (COVID-19) outbreak in
China. Curr Psychol (2020).

7. Silva, W.A.D., de Sampaio Brito, T.R. & Pereira, C.R. COVID-19 anxiety scale
(CAS): Development and psychometric properties. Curr Psychol (2020).

8. Chen, B., Li, Qx., Zhang, H. et al. The psychological impact of COVID-19 outbreak
on medical staff and the general public. Curr Psychol (2020).

9. Robles-Bello, M.A., Sánchez-Teruel, D. & Valencia Naranjo, N. Variables protecting
mental health in the Spanish population affected by the COVID-19 pandemic. Curr
Psychol (2020).

EXTRACTION FROM LITERATURE SURVEY FOR THE PROPOSED WORK

Base Paper: Using the data provided in this paper we can identify suitable methods to employ in our system. Its comprehensive coverage of all methods in speech emotion recognition gives us many options to choose the most optimal method.

Supporting Paper 1: A CNN can be used to find the emotions in speech. A combination of 15 prosodic and spectral features is used. The feature selection and CNN models from this system are used in our system.

Supporting Paper 2: Performing multipart interactive training with ResNet architectures improves the accuracy up to 77%.

EXTRACTION FROM LITERATURE REVIEW FOR THE PROPOSED WORK (OTHER PAPERS)

Paper ID  Paper Details
1. The limited amount of annotated data has become the main bottleneck restricting the
study of depression screening, especially when deep learning models are used. To
alleviate this issue, a Deep Convolutional Generative Adversarial Network (DCGAN) is
used for feature augmentation to improve depression severity estimation from speech.
2. Gives information on how common paralinguistic speech characteristics are affected by
depression and suicidality, and on the application of this information in classification
and prediction systems.
3. Shows that acoustic speech parameters are strong indicators of full-blown depression
symptoms in adults and adolescents. Proves the effectiveness of acoustic speech
analysis and classification in predicting depression in adolescents before the full-blown
symptoms become apparent.

4. This paper provides an analysis of gender and identity issues in the context of
depression level estimation of de-identified speech.

5. Provides the RAVDESS dataset, which consists of audio and visual data for speech
emotion recognition. The data is labelled by emotion and is separated into normal
speech and songs.

6. During the ongoing COVID-19 outbreak, great attention should be paid to the mental
health of the population, especially medical staff, and measures such as psychological
intervention should be actively carried out for reducing the psychosocial effects.

7. CAS (COVID Anxiety Scale) is a reliable and adequate instrument to assess COVID-19
related anxiety.

8. Home quarantine may lead to families developing a variety of psychological distress.
Findings suggested that children and their parents in non-severe areas did not suffer
major psychological distress during the outbreak. Being a mother, being younger, and
having lower levels of educational attainment and family monthly income were risk
factors for anxiety, depression and PTSD among parents.

9. The Spanish population exposed to confinement presents high levels of resilience to
COVID-induced conditions, but no relevant post-traumatic growth has taken place.

PROPOSED SYSTEM- ABSTRACT

Depression is a mood disorder that involves a persistent feeling of sadness and loss of interest. It is a
major problem in the modern age. Globally, more than 264 million people of all ages suffer from
depression. It is a leading cause of disability worldwide, a major contributor to the overall global
burden of disease, and a cause of suicide. To combat this, we propose a speech-based emotion
classifier that detects feelings of sadness and loneliness in a person's speech.

PROPOSED SYSTEM – ARCHITECTURE DIAGRAMS
