Report - 2301172031 - Raya Nadlira Nurul F v1
RECOGNITION
OBJECTIVE
08/17/2022 3
REFERENCE TRACING
Author: Wei-Hua Cao et al. (2017)
Focus: Speaker-Independent Speech Emotion Recognition Based on Random Forest Feature Selection Algorithm
Dataset: CASIA corpus: 4 people (2 male, 2 female), six basic emotions (surprise, happy, sad, angry, fear, and neutral)
Features: Pitch, Short Time Energy, Zero Crossing Rate, first-order derivative, and second-order derivative; the features are extracted using OpenSMILE
Accuracy: achieves 78.6%, which is 2.2% higher than using Spearman
Spectral Features: Peak, Energy, Zero-Crossing Rate (ZCR), and Fast-Fourier Transform (FFT)
DESIGN OF THE SPEECH-EMOTION RECOGNITION SYSTEM
Input Speech → Remove Silence Area → Discrete Wavelet Transform → Frame Segmentation
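The pipeline above can be sketched in a few lines of Python. This is a minimal illustration only: the energy threshold, frame length, and the single-level Haar wavelet are stand-in assumptions, since the slides do not specify the wavelet family or the silence-removal rule.

```python
import math

def remove_silence(signal, frame_len=4, threshold=0.01):
    """Drop frames whose mean absolute amplitude falls below the threshold."""
    kept = []
    for i in range(0, len(signal), frame_len):
        frame = signal[i:i + frame_len]
        if sum(abs(x) for x in frame) / len(frame) >= threshold:
            kept.extend(frame)
    return kept

def haar_dwt(signal):
    """One level of the Haar DWT: approximation and detail coefficients."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail

def dwt_level(signal, level):
    """Apply the DWT repeatedly to reach the requested decomposition level."""
    approx = signal
    for _ in range(level):
        approx, _ = haar_dwt(approx)
    return approx

# Toy signal: a silent frame followed by a voiced frame.
speech = [0.0, 0.0, 0.0, 0.0, 0.5, -0.5, 0.8, -0.2]
voiced = remove_silence(speech)
coeffs = dwt_level(voiced, 1)
```

The silence-removed signal then goes on to frame segmentation, described next.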
FRAME SEGMENTATION
Figure 7. Frame Segmentation of Emotion Signal
FEATURE
• ZERO-CROSSING: COUNTED WHERE THE SIGNAL GOES FROM POSITIVE TO ZERO TO NEGATIVE, OR FROM NEGATIVE TO ZERO TO POSITIVE.
• PEAK: PEAKS ARE DETECTED AS LOCAL MAXIMA OF THE SIGNAL.
• ENERGY: COMPUTED AS THE SUM OF TRAPEZOID AREAS UNDER THE SIGNAL, WITH AREA TRAPEZOID = ½ (a + b) · h.
• FOURIER TRANSFORM: DECOMPOSES A FUNCTION OF TIME INTO ITS FREQUENCIES.
• FORMULA OF THE (DISCRETE) FOURIER TRANSFORM:
H(k) = Σ_{n=0}^{N−1} h(n) e^{−2πikn/N}, FOR 0 ≤ k ≤ N − 1
WHERE h(n) IS THE DISCRETE INPUT SERIES, H(k) IS THE FREQUENCY MAGNITUDE, AND N IS THE NUMBER OF DISCRETE INPUT SAMPLES.
• THUS, THE FOURIER TRANSFORM VALUE OF EACH FRAME SEGMENT IS DETERMINED USING THE FORMULA
f(n) = max(H(k))
WHERE n = FRAME SEGMENT 1 … N.
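The energy and Fourier-transform formulas above can be checked with a short Python sketch. The function names are illustrative, not from the report; the energy reading (trapezoidal area under the absolute signal) is one plausible interpretation of the slide.

```python
import cmath
import math

def dft(h):
    """Discrete Fourier transform: H(k) = sum_n h(n) * exp(-2*pi*i*k*n/N)."""
    N = len(h)
    return [sum(h[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fourier_feature(frame):
    """Per-frame feature f = max |H(k)|, as in the formula above."""
    return max(abs(Hk) for Hk in dft(frame))

def trapezoid_energy(frame):
    """Energy as the trapezoidal-rule area under the absolute signal."""
    s = [abs(x) for x in frame]
    return sum((s[i] + s[i + 1]) / 2 for i in range(len(s) - 1))
```

For example, for the alternating frame [1, −1, 1, −1] all the energy sits in the single bin k = 2, so the Fourier feature equals 4.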
FEATURE EXTRACTION
• THE PEAK, ENERGY, AND FOURIER TRANSFORM FEATURES ARE COMPUTED ON EACH FRAME SEGMENT, WHILE THE ZERO-CROSSING RATE FEATURE IS NOT.
• THUS, A TOTAL OF 17 FEATURES ARE EXTRACTED FOR EACH SPEECH SAMPLE, CONSISTING OF THE FOLLOWING VALUES:
• THE ZERO-CROSSING RATE OF THE SPEECH SIGNAL.
• NP: THE NUMBER OF PEAKS IN THE SPEECH SIGNAL.
• E1–E5: THE ENERGY OF EACH FRAME SEGMENT.
• AP1–AP5: THE AVERAGE PEAK VALUE OF EACH FRAME SEGMENT.
• F1–F5: THE FOURIER TRANSFORM VALUE OF EACH FRAME SEGMENT.
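A 17-dimensional vector of this shape could be assembled as sketched below. This is a hedged illustration under simplifying assumptions: equal-length frame segments, strict local maxima for peaks, and trapezoidal-area energy; all helper names are invented for the sketch.

```python
import cmath
import math

def local_peaks(signal):
    """Indices of local maxima (samples greater than both neighbours)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] > signal[i + 1]]

def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for i in range(len(signal) - 1)
                    if signal[i] * signal[i + 1] < 0)
    return crossings / (len(signal) - 1)

def max_dft_magnitude(frame):
    """f = max |H(k)| with H(k) the DFT of the frame."""
    N = len(frame)
    return max(
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)))
        for k in range(N))

def features_17(signal, n_frames=5):
    """Feature vector: ZCR, NP, E1-E5, AP1-AP5, F1-F5 (17 values)."""
    step = len(signal) // n_frames
    frames = [signal[i * step:(i + 1) * step] for i in range(n_frames)]
    feats = [zero_crossing_rate(signal), float(len(local_peaks(signal)))]
    for f in frames:  # E1-E5: trapezoidal-rule area under |signal|
        s = [abs(x) for x in f]
        feats.append(sum((s[i] + s[i + 1]) / 2 for i in range(len(s) - 1)))
    for f in frames:  # AP1-AP5: average value of the local peaks
        p = [f[i] for i in local_peaks(f)]
        feats.append(sum(p) / len(p) if p else 0.0)
    for f in frames:  # F1-F5: maximum DFT magnitude
        feats.append(max_dft_magnitude(f))
    return feats
```

Running it on five periods of a sine wave yields one peak per period and a length-17 vector, matching the ZCR + NP + 5·3 accounting above.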
EMOTION CLASSIFICATION
• THE TRAINING SET USES 307 SAMPLES AND THE TEST SET USES 77 SAMPLES.
• THREE WIDELY USED CLASSIFICATION TECHNIQUES ARE COMPARED:
• KNN: THE PREDICTION IS COMPUTED FROM A SIMPLE MAJORITY VOTE OF THE NEAREST NEIGHBORS OF EACH POINT: EACH NEIGHBOR VOTES FOR ITS CLASS, AND THE CLASS WITH THE MOST VOTES IS TAKEN AS THE PREDICTION. THE CLOSEST POINTS ARE FOUND USING EUCLIDEAN DISTANCE.
• RANDOM FOREST: AN ALGORITHM USED FOR THE CLASSIFICATION OF LARGE AMOUNTS OF DATA. THE CLASSIFICATION IS DETERMINED BY VOTING AMONG THE TREES FORMED; THE WINNING CLASS IS THE ONE WITH THE MOST VOTES, CALLED THE MAJORITY VOTE.
• NEURAL NETWORK: AN INFORMATION-PROCESSING MODEL INSPIRED BY HOW THE HUMAN BRAIN WORKS. THE CHARACTERISTICS OF A NEURAL NETWORK ARE DEFINED BY THE PATTERN OF CONNECTIONS BETWEEN NEURONS, THE METHOD OF DETERMINING THE WEIGHT OF EACH CONNECTION, AND THE ACTIVATION FUNCTION.
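The KNN rule described above (Euclidean distance, majority vote) fits in a few lines. This is a generic textbook sketch, not the report's implementation; names and the default k are illustrative.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest neighbours by Euclidean distance."""
    # Sort all training points by their distance to the query point x.
    dists = sorted(
        (math.dist(x, point), label) for point, label in zip(train_X, train_y)
    )
    # Each of the k nearest neighbours votes for its own class.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Tiny illustrative dataset: two well-separated clusters.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["calm", "calm", "calm", "angry", "angry", "angry"]
```

With this toy data, a query near the origin is voted "calm" and one near (5, 5) is voted "angry".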
THE EMOTION CLASSIFICATION RESULT OF
SPEECH USING KNN
True class | Neutral | Calm | Happy | Sad | Angry | Fearful | Disgust | Surprise
Neutral    |    3    |  3   |   0   |  0  |   0   |    0    |    0    |    0
Calm       |    2    |  2   |   2   |  0  |   0   |    0    |    0    |    0
Happy      |    7    |  2   |   0   |  4  |   2   |    0    |    0    |    0
Sad        |    1    |  2   |   2   |  0  |   0   |    1    |    0    |    0
Angry      |    0    |  2   |   0   |  0  |   8   |    1    |    1    |    0
Fearful    |    0    |  0   |   2   |  1  |   2   |    1    |    0    |    0
Disgust    |    0    |  0   |   0   |  3  |   3   |    5    |    2    |    2
Surprise   |    0    |  0   |   0   |  0  |   0   |    2    |    1    |    8
Table 3. Classification Result using KNN (rows: true class, columns: prediction)
Table 3 shows that only the happy and sad emotions cannot be recognized at all. Thus, the accuracy achieves 31%.
THE EMOTION CLASSIFICATION RESULT OF
SPEECH USING RANDOM FOREST
True class | Neutral | Calm | Happy | Sad | Angry | Fearful | Disgust | Surprise
Happy      |    0    |  0   |   6   |  1  |   0   |    0    |    0    |    0
Sad        |    0    |  0   |   2   |  5  |   0   |    0    |    0    |    0
Angry      |    0    |  0   |   0   |  0  |   8   |    0    |    0    |    0
Fearful    |    0    |  0   |   0   |  0  |   0   |   13    |    0    |    0
Disgust    |    0    |  0   |   0   |  0  |   0   |    0    |    6    |    0
Surprise   |    0    |  0   |   0   |  0  |   0   |    0    |    2    |    8
Table 4. Classification Result using Random Forest (rows: true class, columns: prediction)
Table 4 shows that the angry and fearful emotions are all correctly recognized, while the other emotions are not always recognized as themselves. Thus, the accuracy achieves 87%.
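Accuracy figures like these can be read straight off a confusion matrix: correct predictions lie on the diagonal, so accuracy is the diagonal sum divided by the total. A minimal sketch, using the complete KNN matrix from Table 3 (whose 24 diagonal hits out of 77 test samples give the 31% reported for KNN):

```python
def accuracy(confusion):
    """Accuracy = correct predictions (the diagonal) / all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# Rows = true class, columns = prediction, in the order: neutral, calm,
# happy, sad, angry, fearful, disgust, surprise. Values from Table 3 (KNN).
knn_matrix = [
    [3, 3, 0, 0, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0, 0, 0],
    [7, 2, 0, 4, 2, 0, 0, 0],
    [1, 2, 2, 0, 0, 1, 0, 0],
    [0, 2, 0, 0, 8, 1, 1, 0],
    [0, 0, 2, 1, 2, 1, 0, 0],
    [0, 0, 0, 3, 3, 5, 2, 2],
    [0, 0, 0, 0, 0, 2, 1, 8],
]
```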
THE EMOTION CLASSIFICATION RESULT OF
SPEECH USING NEURAL NETWORK
True class | Neutral | Calm | Happy | Sad | Angry | Fearful | Disgust | Surprise
Neutral    |   15    |  1   |   0   |  1  |   1   |    0    |    0    |    0
Calm       |    1    |  6   |   0   |  0  |   0   |    0    |    0    |    0
Happy      |    0    |  0   |  10   |  0  |   0   |    0    |    0    |    0
Sad        |    0    |  0   |   0   | 13  |   0   |    0    |    0    |    0
Angry      |    0    |  0   |   0   |  0  |   4   |    0    |    0    |    0
Fearful    |    0    |  0   |   0   |  0  |   0   |    7    |    0    |    0
Disgust    |    0    |  0   |   0   |  0  |   0   |    0    |    6    |    1
Surprise   |    0    |  0   |   0   |  0  |   0   |    0    |    2    |    9
Table 5. Classification Result using Neural Network (rows: true class, columns: prediction)
Table 5 shows that only the happy, sad, angry, and fearful emotions are all correctly recognized.
THE PERFORMANCE RESULT OF EACH LEVEL
SIGNAL IN DISCRETE WAVELET TRANSFORM
(DWT)
Accuracy of Emotion Classification (%)
Level signal of DWT | KNN | Random Forest | Neural Network
Level 8             | 39  |      91       |      74
Level 9             | 34  |      84       |      88
Level 10            | 31  |      87       |      90
Table 6. The Performance of the Three Classifiers on Each Level Signal in DWT
• Table 6 shows the performance of KNN, Random Forest, and Neural Network emotion classification on each level signal in DWT. The Neural Network's accuracy improves at each level (74%, 88%, and 90%, respectively), while KNN's accuracy decreases at each level and Random Forest's fluctuates without improving.
• This shows that neural networks have the best accuracy for emotion
recognition.
CONCLUSION
1. THIS RESEARCH PRESENTS AN EMOTION RECOGNITION APPROACH AIMED AT IMPROVING THE RECOGNITION RATE FOR HUMAN EMOTIONS [1].
2. IN THIS STUDY, A PERSON'S EMOTIONAL STATE IS IDENTIFIED USING SEVERAL LEVELS OF DWT SIGNALS AND SEVERAL OTHER FEATURES, I.E., ZERO-CROSSING RATE, ENERGY, PEAK, AND FOURIER TRANSFORM.
3. KNN, RANDOM FOREST, AND NEURAL NETWORK CLASSIFIERS WERE ADOPTED TO CLASSIFY EMOTION. TABLES 3-5 SHOW THAT THE ACCURACY OF KNN, RANDOM FOREST, AND NEURAL NETWORK IS 31%, 87%, AND 90%, RESPECTIVELY, ON THE 10TH LEVEL OF DWT.
PROGRESS THESIS OF
“SPEECH EMOTION RECOGNITION”
SUPERVISOR
1. Hertog Nugroho, Ph.D
   Advice from previous monitoring:
   1. Emotion classification using Neural Network — progress result: Slide 17
   2. Implementation of level signals of DWT to improve emotion recognition performance — progress result: Slide 19
REVIEWER
1. Rimba Whidiana C, Ph.D
   Advice from previous monitoring: remove the word "novel" from the research purposes — progress result: Slide 4
2. Dr. Ema R
   Advice from previous monitoring: choose the best classifier for emotion recognition — progress result: Slides 16 and 17