
HKBK COLLEGE OF ENGINEERING
Nagawara, Bengaluru – 560 045
Department of Information Science & Engineering

Speech and Text Emotion Recognition using Machine Learning

Batch Number – 08
FIRST REVIEW 2.0

Under the Guidance of:
Prof. Usman Ajiaz N, Assistant Professor, ISE Dept.
Prof. Naheem MR, Assistant Professor, ISE Dept.

Team Members:
1HK18IS051  Akhila R
1HK18IS015  Bhavana R
1HK18IS031  Jeevitha N
1HK18IS035  Mounika M
Agenda

 Objective
 Abstract
 Introduction
 Literature Survey
 Existing System
 Proposed System
 Architecture Diagram
 Hardware & Software Specification
 Gantt Chart
OBJECTIVE

 To detect a person's emotion using both their text and their speech.
 To improve recognition accuracy by combining the two modalities.
 The goal of speech emotion recognition is to predict the emotional content of
speech and to classify it under one of several labels (i.e., happy, sad,
neutral, or angry).
 The primary objective of this project is to improve the human-machine interface.
 It can also be used to monitor the psychophysiological state of a person, for
example in lie detectors.
ABSTRACT

 Speech emotion recognition is a challenging task, and extensive reliance has been placed on
models that use audio features in building well-performing classifiers.

 In this project, we propose a model that utilizes text data and audio signals to obtain a better
understanding of speech data.

 As emotional dialogue is composed of sound and spoken content, our model encodes the
information from audio and text sequences using neural networks and then combines the
information from these sources to predict the emotion class.

 This architecture analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that focus on audio
features.
INTRODUCTION

 Recently, machine learning algorithms have successfully addressed problems in various fields, such
as image classification, machine translation, speech recognition, and text-to-speech generation.

 Similarly, substantial improvements in performance have been obtained when machine learning
algorithms have been applied to statistical speech processing.

 In developing emotionally aware intelligence, the very first step is building robust emotion
classifiers that display good performance regardless of the application; this outcome is considered
to be one of the fundamental research goals in affective computing.

 In particular, the speech emotion recognition task is one of the most important problems in the field
of paralinguistics.

 This field has recently broadened its applications, as it is a crucial factor in optimal
human-computer interactions, including dialog systems.
LITERATURE SURVEY
1. Automatic Dialogue Generation with Expressed Emotions
   Author and Year: Chenyang Huang, Osmar R. Zaïane, Amine Trabelsi, Nouha Dziri (2019)
   Technology used: Neural dialogue generation systems
   Concept: Presents three models that either concatenate the desired emotion with the
   source input during learning or inject the emotion into the decoder.

2. Toward effective automatic recognition systems of emotion in speech
   Author and Year: Atiquzzaman Mondal (2018)
   Technology used: Effective automatic speech emotion recognition
   Concept: Covers the collection and organization of databases and emotional descriptors;
   the calculation, selection, and normalization of relevant speech features; and the
   models used to recognize emotions.

3. Evaluating Google Speech-to-Text API's Performance for Romanian e-Learning Resources
   Author and Year: Bogdan (2020)
   Technology used: Google Cloud Speech-to-Text API
   Concept: Applies ASR to multimedia e-learning resources available in Romanian using
   the Google Cloud Speech-to-Text API.
Existing System

 Speech emotion recognition is a challenging task, and extensive reliance has been placed on
models that use audio features in building well-performing classifiers.

 The base paper proposes a novel deep dual recurrent encoder model that utilizes text data and
audio signals simultaneously to obtain a better understanding of speech data.

 As emotional dialogue is composed of sound and spoken content, the model encodes the
information from audio and text sequences using dual recurrent neural networks (RNNs) and then
combines the information from these sources to predict the emotion class (see the sketch below).

 This architecture analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that focus on audio
features. Extensive experiments are conducted to investigate the efficacy and properties of the
proposed model.
 The model outperforms previous state-of-the-art methods in assigning data to one of four
emotion categories (i.e., angry, happy, sad, and neutral) when applied to the IEMOCAP dataset,
with accuracies ranging from 68.8% to 71.8%.
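
To make the architecture concrete, the following is a minimal sketch of such a dual encoder in Keras, assuming MFCC frame sequences for the audio branch and integer word indices for the text branch; all sizes (feature dimension, sequence lengths, vocabulary, layer widths) are illustrative placeholders, not the base paper's exact configuration.

# Minimal sketch of a dual recurrent encoder (illustrative sizes only):
# one LSTM encodes MFCC frames, another encodes transcript word indices;
# their outputs are concatenated and classified into four emotion labels.
from tensorflow.keras import layers, models

NUM_MFCC = 40        # assumed MFCC feature dimension per audio frame
MAX_FRAMES = 300     # assumed maximum number of audio frames per utterance
VOCAB_SIZE = 10000   # assumed transcript vocabulary size
MAX_WORDS = 50       # assumed maximum transcript length in words
NUM_CLASSES = 4      # angry, happy, sad, neutral

# Audio branch: a recurrent encoder over the MFCC frame sequence
audio_in = layers.Input(shape=(MAX_FRAMES, NUM_MFCC), name="audio")
audio_enc = layers.LSTM(128)(audio_in)

# Text branch: word indices -> embeddings -> recurrent encoder
text_in = layers.Input(shape=(MAX_WORDS,), name="text")
text_emb = layers.Embedding(VOCAB_SIZE, 100)(text_in)
text_enc = layers.LSTM(128)(text_emb)

# Fuse both modalities and classify into the emotion labels
merged = layers.concatenate([audio_enc, text_enc])
hidden = layers.Dense(64, activation="relu")(merged)
output = layers.Dense(NUM_CLASSES, activation="softmax")(hidden)

model = models.Model(inputs=[audio_in, text_in], outputs=output)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])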
Proposed System

 The proposed system recognises a person's emotion using both speech and text.
 The dataset is gathered and preprocessed, and a neural network is then trained on
the extracted features (a feature-extraction sketch follows this list).
 Based on the textual features (words) and the intensity of the voice, the system
detects which emotion the person is expressing (e.g., angry, sad, happy).
 The GUI will be built on the Flask framework to accept inputs and display the
predicted output (see the Flask sketch after this list).
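
As a rough sketch of the preprocessing step, the snippet below extracts MFCC features from a speech file; it assumes the librosa library is used for audio features, and the 40-coefficient setting and padding length are illustrative choices rather than fixed project parameters.

import librosa
import numpy as np

def extract_mfcc(wav_path, n_mfcc=40, max_frames=300):
    """Load a speech file and return a fixed-length MFCC frame sequence."""
    signal, sample_rate = librosa.load(wav_path, sr=None)  # keep native rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
    mfcc = mfcc.T  # shape: (num_frames, n_mfcc)
    # Pad or truncate so every sample has the same number of frames
    if mfcc.shape[0] < max_frames:
        pad = np.zeros((max_frames - mfcc.shape[0], n_mfcc))
        mfcc = np.vstack([mfcc, pad])
    return mfcc[:max_frames]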
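
The Flask front end could be wired up roughly as follows; `model`, `prepare_inputs`, the route, and the form fields are hypothetical placeholders standing in for the trained network and the feature pipeline.

from flask import Flask, request, render_template

app = Flask(__name__)
LABELS = ["angry", "happy", "sad", "neutral"]

@app.route("/", methods=["GET", "POST"])
def predict():
    emotion = None
    if request.method == "POST":
        audio_file = request.files["audio"]   # uploaded speech sample
        transcript = request.form["text"]     # typed or transcribed text
        # prepare_inputs is a hypothetical helper that runs feature
        # extraction and tokenization for the two model branches
        features = prepare_inputs(audio_file, transcript)
        emotion = LABELS[int(model.predict(features).argmax())]
    return render_template("index.html", emotion=emotion)

if __name__ == "__main__":
    app.run(debug=True)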
Hardware and Software Requirements

HARDWARE REQUIREMENTS:
 System : Pentium IV, 2.4 GHz
 Hard Disk : 40 GB
 Monitor : 15-inch VGA colour
 Mouse : Logitech mouse
 RAM : 512 MB
 Keyboard : standard keyboard

SOFTWARE REQUIREMENTS:
 Operating System : Windows XP
 Platform : Python
 Tool : Spyder, Python 3.5
 Front End : Anaconda
 Back End : Python (Anaconda) scripts
Gantt Chart

[Gantt chart: Literature Survey, Problem Formulation, Research on Software Requirements, Hardware Requirements, and First Phase Review and Report Submission, scheduled across Weeks 1-4 of November, December, and January.]
REFERENCES

 Lingli Yu, B. W. and T. G., "A hierarchical support vector machine based on feature-driven
method for speech emotion recognition," Artificial Immune Systems (ICARIS), pp. 901-907, 2018.
 Berlin Database of Emotional Speech (EMO-DB): http://pascal.kgw.tu-berlin.de/emodb/index-1280.html
 Alexander Schmitt, R. P. and T. P., "Advances in Speech Recognition," Springer, pp. 191-200, 2019.
 A. Joshi, "Speech Emotion Recognition Using Combined Features of HMM & SVM Algorithm,"
International Journal of Advanced Research in Computer Science and Software Engineering, pp.
387-392, 2018.
 Akshay S. Utane, D. S. L. N., "Emotion Recognition through Speech Using Gaussian Mixture
Model and Support Vector Machine," International Journal of Scientific & Engineering Research,
no. 5, pp. 1439-1443, 2020.
 S. G. Koolagudi and K. S. Rao, "Emotion recognition from speech using global and local
prosodic features," Springer, 2019.
THANK YOU
