Voice Morphing

DR.D.Y.
PATIL POLYTECHNIC, AMBI

COMPUTER DEPARTMENT
TOPIC :
VOICE MORPHING
 WHAT IS VOICE MORPHING ?
 APPROACHS TO THE PROBLEM.
 SPEECH PRODUCTION.
 CONVERSION OF VOICE.
 TYPES OF VOICE MORPHING.
 REFRANCES OR METHODS.
 APPLICATION OF VOICE MORPHING.
 AVAILABLE SOFTWARE FOR VOICE
MORPHING.
 SUMMARY.
 CONCLUSION.
 Voice Morphing which is also referred to as voice
transformation and voice conversion is a technique to
modify a source speaker's speech utterance to sound
as if it was spoken by a target speaker.
 There are many applications which may benefit
from this sort of technology. For example, a TTS
system with voice morphing technology integrated
can produce many different voices. In cases where
the speaker identity plays a key role, such as dubbing
movies and TV-shows, the availability of high
quality voice morphing technology will be very
valuable allowing the appropriate voice to be
generated (maybe in different languages) without the
original actors being present.
 Voice conversion will be performed in two
phases.
 In the first phase, the training, the speech

signals of the source and target speakers will
be analyzed and the voice characteristics will
be extracted by means of a mathematical
optimization technique, very popular in the
speech processing world, the Linear
Prediction Coding (LPC) technique.
 In second phase , the transformed features
will be used in order to synthesis speech that
will, hopefully, resemble that of the target
speaker.
 Speech synthesis will be performed again by

means of the Linear Prediction Coding.
The respiratory subsystem
is composed of the lungs,
trachea and windpipe,
diaphragm and the chest
cavity.
The larynx and pharyngeal
cavity or throat constitutes
the laryngeal subsystems.
 The articulatory subsystem
includes the oral cavity and
the nasal cavity.
 The oral cavity is comprised of the velum, the
tongue, the lips, the jaw and the teeth.
 In speech processing technical discussions, the
vocal tract is referred to as the combination of
the larynx, the pharyngeal cavity and the oral
cavity.
 The respiratory subsystem behaves like an air
pump, supplying the aerodynamic energy for the
other two subsystems.
 In speech processing, the basic aerodynamic
parameters are air volume, flow, pressure and
resistance.
 TECHNICS:-
 Wavelet Decomposition.
 Proposed model.
 Wavelet Decomposition :-
 Wavelets are a class of functions that possess
compact support and form a basis for all finite
energy signals.
 They are able to capture the non-stationary
spectral characteristics of a signal by
decomposing it over a set of atoms which are
localized in both time and frequency. The DWT
uses the set of dyadic scales and translates of
the mother wavelet to form an orthonormal basis
for signal analysis.
The original signal S is
split into an
approximation cA1 and a
detail cD1.
The approximation is
then itself split into an
approximation and a
detail and so on.
Decomposing a signal
into k levels of
decomposition therefore
results in k+1 sets of
coefficients at different
frequency resolutions, k
levels of detail and 1 level
of approximation
coefficients.
 Proposed model :
 Voice morphing is performed in two steps: training and
transformation. The training data consist of repetitions of
the same phonemes uttered by both source and target
speakers.
 The source and target training data is divided into frames of
128 samples and the data is randomly divided into training
and validation sets.
 A 5-level wavelet decomposition is then performed to the
source and target training data.
 IN THIS SECTION WE KNOW THAT IN
WHICH FORM WE CAN TRANFORM A
NORMAL VOICE OR SPEECH.
SOURCE TARGET RESULT1 RESULT2
F TO M SPEECH1 TARGET1 RESULT1 VOICE1
M TO F SPEECH2 TARGET2 RESULT2 VOICE2
F TO F SPEECH3 TARGET3 RESULT3 VOICE3
M TO M SPEECH4 TARGET4 RESULT4 VOICE4

 The "Source Speech" column indicates the utterances of the
source speaker.
 Target Speech" column is the target speaker's utterances.
 The utterances in both these two columns are NOT
included in the training data for the estimation of the
conversion function.
 The next two columns for result.
 The difference between these two columns is that the
“RESULT1" applies the target prosody extracted from the
target utterance, but the “RESULT2" still applies the
original prosody of the source utterances.
 Abe M. , Nakamura S. , Shikano K. and
Kuwabara H.: Voice conversion through
vector quantization, Proceedings of the
ICASSP, 1988.
 Stylianou Y., Cappe O. And Moulines E.:
Statistical Methods for Voice Quality
Transformation, Proceedings of Euro speech,
1995.
 Arslan L. and Talkin D: Voice Conversion by
Codebook Mapping of Line Spectral
Frequencies and Excitation Spectrum,
Proceedings of Euro speech , 1997.
 ENTERTAINMENT.
 IN FILM INDUSTRY.
 SECURITY.
 IN COMPUTER GAMING
 MORPH VOX PRO VOICE CHANGER 2.0.6.
 MORPH VOX PRO VOICE CHANGER 4.2.2.
 MORPH VOX PROVOICE CHANGER 4.3.8.
 TERA VOICE SERVAER 2004.
 FLASH VOICE BUTTONS 3.0.
 VOICE TWISTER 1.0.4.
 VOICE AGAIN 1.5.2.
 QUICK VOICE FOR OSX 2.2.0.
 QUICK VOICE FOR WINDOWS 2.2.0.
 Voice morphing is the process of changing voice
personality i.e. speech uttered by a source speaker
is modified to sound as if the target speaker had
uttered it.
 In this dissertation our attempt of voice morphing
commenced by introducing the basic properties of
speech signals.
 Introducing basic techniques of voice morphing.
 Concept behind voice morphing.
As voice morphing is a technology
with a lot of interesting, useful and fun
applications further research on the subject with or
without the implementation of the GTM
(Generative Topographic Mapping) model is
bound to follow that will lead to the production of
morphed speech of an excellent quality.

Voice Morphing

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Voice Morphing

Uploaded by

Copyright:

Available Formats

DR.D.Y.

PATIL POLYTECHNIC, AMBI

 In the first phase, the training, the speech

 Speech synthesis will be performed again by

F TO M SPEECH1 TARGET1 RESULT1 VOICE1

M TO F SPEECH2 TARGET2 RESULT2 VOICE2

F TO F SPEECH3 TARGET3 RESULT3 VOICE3

M TO M SPEECH4 TARGET4 RESULT4 VOICE4

You might also like