Research Paper ML

Voice Gender Recognition Using Deep Learning
Mandeep Shishodia
Graphic Era Hill University, Bachelor of Technology in Computer Science Engineering, Dehradun, Uttarakhand
herryshishodia@gmail.com
Abstract—This article describes a multilayer perceptron (MLP) deep learning model for gender recognition from voice. The dataset consists of 3,168 male and female voice samples that were pre-processed by acoustic analysis. We used an MLP to learn gender-specific traits from the acoustic features, and the model achieved 96.74% accuracy on the test data set.

Keywords-deep learning; voice recognition; multilayer perceptron networks

I. INTRODUCTION
Acoustic analysis depends on parameter settings that are chosen according to the characteristics of the voice sample, such as intensity, duration, frequency, and filtering [1]. Using the acoustic properties of voice and speech, it is possible to determine the gender of the speaker. The acoustic analysis can be performed with the warbleR R package, and it yields a data set of acoustic characteristics. This data set can be used to train various machine learning algorithms; in this paper, we used an MLP to obtain the model. We compared the results with related work and created a web page that uses the obtained model to identify the gender of a voice.

II. RELATED WORK
Becker [2] applied a frequency-based baseline model, a logistic regression model [3], a classification and regression tree (CART) model [4], a random forest model [5], a boosted tree model [6], a Support Vector Machine (SVM) model [7], an XGBoost model [8], and a stacked model [9] to the voice recognition data set [10]. The accuracy of each model is shown in “Table I”.

TABLE I. ACCURACY OF MODELS FOR VOICE RECOGNITION.
Model                       Train accuracy (%)   Test accuracy (%)
Frequency-based baseline    61                   59
Logistic regression         72                   71
CART                        81                   78
Random forest               100                  87
Boosted tree                91                   84
SVM                         96                   85
XGBoost                     100                  87
Stacked                     100                  89

III. DATA SET AND SOFTWARE LIBRARIES

A. Data Set
Each voice sample is stored as a .WAV file. The .WAV files were pre-processed for acoustic analysis using the specan function of the warbleR R package [11]. The specan function measures 22 acoustic parameters on acoustic signals; these parameters are shown in “Table II”.

TABLE II. MEASURED ACOUSTIC PROPERTIES.
Property    Description
duration    length of signal
meanfreq    mean frequency (in kHz)
sd          standard deviation of frequency
median      median frequency (in kHz)
Q25         first quartile (in kHz)
Q75         third quartile (in kHz)
IQR         interquartile range (in kHz)
skew        skewness
kurt        kurtosis
sp.ent      spectral entropy
sfm         spectral flatness
mode        mode frequency
centroid    frequency centroid
peakf       peak frequency
meanfun     average of fundamental frequency measured across acoustic signal
minfun      minimum fundamental frequency
maxfun      maximum fundamental frequency measured across acoustic signal
meandom     average of dominant frequency
mindom      minimum of dominant frequency
maxdom      maximum of dominant frequency
dfrange     range of dominant frequency
modindx     modulation index

The pre-processed WAV files were saved into a CSV file. The CSV file contains 3,168 rows and 21 columns, which hold the acoustic features and the male or female label.

IV. LITERATURE SURVEY
Introduction to Voice Gender Recognition: Start by introducing the concept of voice gender recognition, its applications, and its significance in fields like speech processing, human-computer interaction, and security.
Deep Learning in Voice Gender Recognition: Discuss the role of deep learning techniques, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and deep neural networks (DNNs), in improving the accuracy of gender recognition from voice.
Datasets: Identify widely used datasets for voice gender recognition, such as VoxCeleb or Common Voice, and describe their characteristics.
Feature Extraction: Explain the features used for voice gender recognition, including spectrograms, Mel-frequency cepstral coefficients (MFCCs), or deep embeddings.
State-of-the-Art Models: Highlight state-of-the-art deep learning models and architectures for voice gender recognition, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), or hybrid models.
Preprocessing Techniques: Mention preprocessing techniques, such as noise reduction, voice activity detection, and data augmentation, that are commonly applied in these studies.
Performance Metrics: Discuss the evaluation metrics used, such as accuracy, F1-score, and receiver operating characteristic (ROC) curves.
Challenges and Future Directions: Explore the challenges in this field, such as robustness to different accents and languages, and suggest potential research directions like multi-modal recognition using voice and visual data.
Notable Studies: Mention key papers and studies that have made significant contributions to the field and summarize their findings.
Conclusion: Summarize the overall progress in voice gender recognition using deep learning, its potential applications, and the importance of ongoing research.
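The Performance Metrics item in the survey names accuracy, F1-score, and ROC curves. This is not the paper's evaluation code. It is an illustrative NumPy sketch that computes accuracy, the F1-score for one class, and the area under the ROC curve. The AUC uses the pairwise-ranking (Mann-Whitney) formulation. The labels, scores, and function names are hypothetical, and class 1 stands for "male" only by assumption.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative one
    (ties count as one half). Both classes must be present."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    higher = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((higher + 0.5 * ties) / (pos.size * neg.size))


# Hypothetical labels (1 = male, 0 = female) and model scores, for illustration only.
y_true = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.2, 0.6, 0.4, 0.5, 0.1])
y_pred = (scores >= 0.5).astype(int)  # threshold the scores at 0.5
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), roc_auc(y_true, scores))
```

For these toy values the three numbers are 4/6, 2/3, and 8/9. Accuracy depends on the 0.5 threshold, while the AUC summarizes ranking quality across all thresholds. This is why the two can disagree.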
capabilities [15]. NumPy also allows arbitrary data types to be defined, which lets it integrate seamlessly and speedily with a wide variety of databases. Keras uses NumPy arrays for its input data.
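The NumPy paragraph says that arbitrary data types can be defined and that Keras takes NumPy input. The sketch below is illustrative, not the paper's code. It defines a user-defined structured dtype for rows of a voice CSV and then converts the parsed rows into the plain float32 matrix and 0/1 label vector a Keras model would consume. It assumes a hypothetical three-feature excerpt (meanfreq, sd, meanfun, label) with invented values and unquoted fields. The real file's column order and quoting are not given in the paper and may differ. The names VOICE_DTYPE, load_voice_rows, and to_model_inputs are hypothetical.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical three-row excerpt of the voice CSV (values invented for illustration).
CSV_TEXT = """meanfreq,sd,meanfun,label
0.183,0.064,0.098,male
0.215,0.042,0.182,female
0.174,0.069,0.107,male
"""

# User-defined structured dtype: three float32 features plus a short string label.
VOICE_DTYPE = np.dtype([
    ("meanfreq", np.float32),
    ("sd", np.float32),
    ("meanfun", np.float32),
    ("label", "U6"),
])


def load_voice_rows(text):
    """Parse CSV text (header line skipped) into a structured array of VOICE_DTYPE."""
    return np.genfromtxt(io.StringIO(text), delimiter=",", dtype=VOICE_DTYPE,
                         skip_header=1, encoding="utf-8")


def to_model_inputs(rows):
    """Split structured rows into a float32 feature matrix X and a 0/1 label vector y."""
    feature_names = [name for name in rows.dtype.names if name != "label"]
    X = rfn.structured_to_unstructured(rows[feature_names]).astype(np.float32)
    y = (rows["label"] == "male").astype(np.float32)  # 1.0 = male, 0.0 = female
    return X, y


rows = load_voice_rows(CSV_TEXT)
X, y = to_model_inputs(rows)
print(X.shape, X.dtype, y)  # (3, 3) float32 [1. 0. 1.]
```

The structured array keeps each column's name and type together, which is convenient for inspection. The unstructured float32 matrix is the homogeneous layout that neural-network layers expect.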
Django is a free and open-source, high-level Python web framework that encourages rapid development and clean, pragmatic design. Django is reassuringly secure, exceedingly scalable, and was designed to help developers create applications as quickly as possible [16].
warbleR is a package designed to streamline acoustic analysis in
R. This package allows users to collect open-access acoustic data or
input their own data into a workflow that facilitates automated
spectrographic visualization and acoustic
measurements.
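Several Table II quantities can be made concrete with the illustrative NumPy sketch below. It computes meanfreq, sd, median, Q25, Q75, IQR, sp.ent, and sfm from a single FFT of a whole signal and treats the normalized magnitude spectrum as a probability distribution over frequency. This is an assumption-laden approximation, not warbleR's specan implementation. specan may use its own windowing, normalization, and frequency-band limits, so its values will differ. The function name spectral_measures and the 1 kHz test tone are hypothetical.

```python
import numpy as np


def spectral_measures(signal, sample_rate):
    """Approximate a few Table II spectral statistics from one FFT of `signal`.

    Frequencies are returned in kHz. Illustrative approximation only;
    this is not warbleR's specan implementation.
    """
    x = np.asarray(signal, dtype=np.float64)
    magnitude = np.abs(np.fft.rfft(x))
    freqs_khz = np.fft.rfftfreq(x.size, d=1.0 / sample_rate) / 1000.0
    p = magnitude / magnitude.sum()  # weights over frequency bins that sum to 1
    cdf = np.cumsum(p)

    def quantile(q):
        # First frequency bin at which the cumulative weight reaches q.
        idx = min(int(np.searchsorted(cdf, q)), freqs_khz.size - 1)
        return float(freqs_khz[idx])

    meanfreq = float(np.sum(freqs_khz * p))
    sd = float(np.sqrt(np.sum(p * (freqs_khz - meanfreq) ** 2)))
    q25, median, q75 = quantile(0.25), quantile(0.50), quantile(0.75)

    nonzero = p[p > 0]  # treat 0 * log(0) as 0
    sp_ent = float(-np.sum(nonzero * np.log(nonzero)) / np.log(p.size))  # scaled to [0, 1]

    eps = 1e-12  # guards against log(0) in the geometric mean
    geometric_mean = np.exp(np.mean(np.log(magnitude + eps)))
    sfm = float(geometric_mean / (np.mean(magnitude) + eps))  # geometric / arithmetic mean

    return {
        "meanfreq": meanfreq, "sd": sd, "median": median,
        "Q25": q25, "Q75": q75, "IQR": q75 - q25,
        "sp.ent": sp_ent, "sfm": sfm,
    }


fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # hypothetical 1 kHz test tone, 1 s long
print({name: round(value, 3) for name, value in spectral_measures(tone, fs).items()})
```

A pure tone concentrates all weight in one bin, so its entropy and flatness are near 0 and its quartiles coincide. Broadband noise pushes both entropy and flatness toward 1, which is the kind of contrast these features give a classifier.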
rpy2 is a Python package that provides an interface for running R code embedded in a Python process.