
FORENSIC SCIENCE
SPEECH AND VOICES

SUBMITTED TO:
Dr. Ajay Ranga

SUBMITTED BY:
Manav Jindal
B.Com. LL.B. (Hons.)
9th Semester
Section C
153/14

ACKNOWLEDGEMENT
I take this opportunity to express my profound gratitude and deep regard to my teacher, Dr. Ajay Ranga, for his exemplary guidance, monitoring and constant encouragement throughout.

I am obliged to the library staff for their cordial support, valuable information and guidance, which helped me through the various stages of completing this task.

Lastly, I thank the Almighty, my parents, siblings and friends for their constant encouragement.

Manav Jindal

153/14


CERTIFICATE OF ORIGINALITY
This is to certify that the project report submitted by me is an outcome of my independent and original work. I have duly acknowledged all the sources from which the ideas and extracts have been taken. The project is free from any plagiarism and has not been submitted elsewhere for publication.

Manav Jindal

153/14


TABLE OF CONTENTS

1. INTRODUCTION
2. VOICEPRINT
3. IMPORTANCE
4. EVIDENTIARY VALUE
5. NATURE
6. OPERATIONAL PROBLEMS
7. VOICE TEXT COLLECTIONS
8. EVALUATION
9. INTERPRETATION
10. CASE LAW
11. BIBLIOGRAPHY


INTRODUCTION

“Person authentication by voice: a need for caution.”1

Human beings as well as animals produce sound. The former can articulate sound to create language, through which they communicate with one another. Animals cannot articulate their sounds in this way and thus are unable to create a language of their own. The capability of human beings to articulate sound distinguishes them from other species. The voice of a person (as well as that of an animal) is unique, personal and basically non-imitable in its entirety, because each speaker has a characteristic manner of speaking, including a particular accent, rhythm, intonation, style, pronunciation pattern, choice of vocabulary, etc. It follows that, under normal circumstances, the voice will usually permit identification of the person. Traditionally, voice communication was possible only among persons within hearing range of one another, who could identify each other by voice and, most of the time, also visually. The modes of voice communication have changed drastically in the last hundred years or so: communicating persons no longer need to be within hearing range of one another or face each other, so the speaker often cannot be identified by sight.

Audio forensics is the field of forensic science relating to the acquisition, analysis, and evaluation of sound recordings that may ultimately be presented as admissible evidence in a court of law or some other official venue.2 Audio forensic evidence may come from a criminal investigation by law enforcement or as part of an official inquiry into an accident, fraud, accusation of slander, or some other civil incident.3 The primary aspects of audio forensics are establishing the authenticity of audio evidence, performing enhancement of audio recordings to improve speech intelligibility and the audibility of low-level sounds, and interpreting and documenting sonic evidence, such as identifying talkers, transcribing dialog, and reconstructing crime or accident scenes and timelines.4 Modern audio forensics makes extensive use of digital signal processing, and the analog filters formerly used are now obsolete.5 Techniques such as adaptive filtering and discrete Fourier transforms are used extensively.6

[1] 8th European Conference on Speech Communication and Technology (Eurospeech), Geneva, 2003.
[2] Phil Manchester (January 2010). "An Introduction To Forensic Audio". Sound on Sound.
[3] Maher, Robert C. (Summer 2015). "Lending an ear in the courtroom: forensic acoustics".
[4] Maher, Robert C. (March 2009). "Audio forensic examination: authenticity, enhancement, and interpretation".

The field of forensic speech and audio analysis comprises a wide range of activities of which
the most spectacular is no doubt speaker identification. Other activities in the field include
intelligibility enhancement of recorded speech samples, the analysis of disputed utterances,
and the examination of the authenticity of audio recordings. A related though in many ways
very different activity is linguistic authorship identification, the linguistic analysis of a
spoken or written text undertaken with a view to establishing the identity of the author of that
text.

Voice (or vocalization) is the sound produced by humans and other vertebrates using the
lungs and the vocal folds in the larynx, or voice box. Voice is not always produced as speech,
however. Infants babble and coo; animals bark, moo, whinny, growl, and meow; and adult
humans laugh, sing, and cry. Voice is generated by airflow from the lungs as the vocal folds
are brought close together. When air is pushed past the vocal folds with sufficient pressure,
the vocal folds vibrate. If the vocal folds in the larynx did not vibrate normally, speech could
only be produced as a whisper. Your voice is as unique as your fingerprint. It helps define
your personality, mood, and health.7 While speech recognition is the process of converting
speech to digital data, voice recognition is aimed toward identifying the person who is
speaking. Voice recognition works by analyzing the features of speech that differ between
individuals. Everyone has a unique pattern of speech stemming from their anatomy (the size
and shape of the mouth and throat) and behavioral patterns (their voice’s pitch, their speaking
style, accent, and so on).

Humans express thoughts, feelings, and ideas orally to one another through a series of complex movements that alter and mold the basic tone created by voice into specific, decodable sounds. Speech is produced by precisely coordinated muscle actions in the head, neck, chest, and abdomen. Speech development is a gradual process that requires years of practice. During this process, a child learns how to regulate these muscles to produce understandable speech. Speech recognition is the process of capturing spoken words using a microphone or telephone and converting them into a digitally stored set of words.8

[5] https://en.wikipedia.org/wiki/Audio_forensics
[6] Alexander Gelfand (10 October 2007). "Audio Forensics Experts Reveal (Some) Secrets". Wired Magazine. Archived from the original on 2012-04-08.
[7] https://www.nidcd.nih.gov/health/what-is-voice-speech-language

VOICE IDENTIFICATION IS BESET WITH NUMEROUS HURDLES, INCLUDING THE FOLLOWING:

1. Intra-speaker Variations: Voices always differ between speakers (inter-speaker variation), but the voice of the same speaker also varies from one occasion to another (intra-speaker variation), which hinders the selection of identifying features. Normalization of the identifying features is now routinely attempted to secure better verification.

2. Disguise: The identification of birds, beasts or human beings by voice was primitive but quite effective, as birds and beasts do not disguise their voices. Humans, however, can disguise their voices, and it is often difficult, if not impossible, to identify or verify the speaker when the voice is disguised. Identification of the human voice by listening still continues: both lay and expert witnesses give evidence on the basis of aural evaluation of the voice. But now, in almost all cases, supplementary evidence of instrumental evaluation is also produced, and instrumental evaluation has in fact taken center stage. Instrumental methods are useful and handy, but they are not as infallible as one would wish them to be.

3. Age Effect: It is elementary that in old age the loss of teeth and adverse effects on the vocal envelope bring in subtle changes in the voice, much like the changes age brings to handwriting. These changes must be taken into account in speaker identification or verification.

4. Health Effect: Weak health or ailments bring about changes in the voice similar to those due to old age. Obviously, the extent of change depends upon the degree of deterioration of health.

5. Emotional Disturbance: Extreme fear, hatred, anger, shock, etc. affect the voice, and their possible influence is considered while carrying out voice analysis.

6. Miscellaneous: In addition to the above, background noise, recording device noise, reproduction machine noise, mimicking and other problems also affect identification.

[8] https://www.streetdirectory.com/travel_guide/139545/technology/key_differences_between_speech_recognition_and_voice_recognition.html
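The list notes that identifying features are now routinely normalized. The source does not say which normalization is meant; one common form, assumed here, is per-feature mean and variance normalization over a recording (as in cepstral mean and variance normalization). Because each feature is re-centered and re-scaled within its own recording, a constant offset and a positive gain introduced by, say, a different microphone or channel cancel out. The function name and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize_features(frames):
    """Mean/variance-normalize each feature dimension across one recording.

    frames: list of equal-length feature vectors, one per analysis frame.
    Returns new vectors in which every dimension has mean 0 and population
    standard deviation 1 (a dimension that never varies is mapped to 0.0).
    """
    columns = list(zip(*frames))  # one tuple of values per feature dimension
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(frame, stats)]
        for frame in frames
    ]


# Hypothetical features from the same speaker recorded twice; the second
# channel applies a gain of 2 and an offset of 5 to every feature value.
session_a = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
session_b = [[2 * x + 5 for x in frame] for frame in session_a]

norm_a = normalize_features(session_a)
norm_b = normalize_features(session_b)
same = all(
    math.isclose(x, y, abs_tol=1e-9)
    for fa, fb in zip(norm_a, norm_b)
    for x, y in zip(fa, fb)
)
print(same)  # → True
```

Normalization removes channel effects of this simple kind only; it cannot undo disguise, illness or emotional changes in the voice itself.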

VOICEPRINT

In today's world of technology, there are many methods of establishing the individuality of a person. One of them is the voice, a unique individual characteristic. Each person's voice is different because the anatomy of the vocal cords and of the vocal, oral and nasal cavities is specific to the individual. Comparison of two speech recordings by means of spectrograms, or voiceprints, for the purpose of identification is called voice fingerprinting. Forensic voice analysis has been used in a wide range of criminal cases, such as murder, rape, drug dealing, bomb threats and terrorism. An investigator has two complementary ways of making an identification through voice analysis. First, he or she listens to the evidence sample and the sample taken from the suspect, comparing accent, speech habits, breath patterns and inflections. Then the corresponding voiceprints are compared.

In Ritesh Sinha vs State of U.P. & Anr.,9 the judgment observed that a voice print is a visual recording of voice. Spectrographic voice identification is described in Chapter 12 of the book “Scientific Evidence in Criminal Cases” by Andre A. Moenssens, Ray Edward Moses and Fred E. Inbau, and the judgment quoted the relevant extracts of this chapter:

“Voiceprint identification requires

(1) a recording of the questioned voice,

(2) a recording of known origin for comparison, and

(3) a sound spectrograph machine adapted for ‘voiceprint’ studies.”

[9] Criminal Appeal No. 2003 of 2012.

“12.02 Sound and Speech

In order to properly understand the voiceprint technique, it is necessary to briefly review some elementary concepts of sound and speech.

Sound, like heat, can be defined as a vibration of air molecules or described as energy in the
form of waves or pulses, caused by vibrations. In the speech process, the initial wave
producing vibrations originate in the vocal cords. Each vibration causes a compression and
corresponding rarefications of the air, which in turn form the aforementioned wave or pulse.
The time interval between each pulse is called the frequency of sound; it is expressed
generally in hertz, abbreviated as hz., or sometimes also in cycles-per-second, abbreviated as
cps. It is this frequency which determines the pitch of the sound. The higher the frequency,
the higher the pitch, and vice versa.

Intensity is another characteristic of sound. In speech, intensity is the characteristic of loudness. Intensity is a function of the amount of energy in the sound wave or pulse. To perceive the difference between frequency and intensity, two activities of air molecules in an atmosphere must be considered. The speed at which an individual vibrating molecule bounces back and forth between the other air molecules surrounding it is the frequency. Intensity, on the other hand, may be measured by the number of air molecules that are being caused to vibrate at a given frequency.”
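The quoted passage is loose on one point: the time interval between successive pulses is the period T, and the frequency is the number of pulses per second, f = 1/T. A minimal sketch of both quantities, under these assumptions: a synthetic sine tone stands in for a steady voiced sound, pitch is estimated by a basic autocorrelation search (one common method, not one described in the source), and intensity is expressed as an RMS level in decibels relative to an amplitude of 1.0. The function names are hypothetical.

```python
import math


def make_tone(freq_hz, amplitude, sample_rate=8000, duration_s=0.1):
    """Synthesize a pure sine tone as a stand-in for a steady voiced sound."""
    n = round(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def estimate_pitch(samples, sample_rate, fmin=60.0, fmax=600.0):
    """Estimate the fundamental frequency with a basic autocorrelation search.

    The lag (in samples) at which the signal best matches a shifted copy of
    itself is taken as the period T; the frequency is f = sample_rate / T.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def rms_db(samples):
    """Intensity as the RMS level in decibels relative to an amplitude of 1.0."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms)


tone = make_tone(125.0, amplitude=0.5)        # 800 samples at 8 kHz
print(estimate_pitch(tone, 8000))             # → 125.0 (period of 64 samples = 8 ms)
print(round(rms_db(tone), 2))                 # → -9.03
# Doubling the amplitude raises the level by about 6 dB.
print(round(rms_db(make_tone(125.0, amplitude=1.0)) - rms_db(tone), 2))  # → 6.02
```

Real speech is not a pure tone, so practical pitch trackers add normalization, voicing decisions and smoothing on top of this basic idea.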

“12.03 The Sound Spectrograph

The sound spectrograph is an electromagnetic instrument which produces a graphic display of speech in the parameters of time, frequency and intensity. The display is called a sound spectrogram.”

Thus, as the judgment notes, voiceprint identification involves measurement of the frequency and intensity of sound waves.
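Today the same time, frequency and intensity display is usually produced digitally by a short-time Fourier transform (STFT). The sketch below is illustrative only. It assumes a 256-sample frame, a 128-sample hop and a Hann window (common but arbitrary choices). It computes each frame's spectrum with a naive discrete Fourier transform for transparency and reports levels in decibels, which play the role of line darkness on a printed voiceprint. The function name and test signal are hypothetical.

```python
import cmath
import math


def spectrogram(samples, sample_rate, frame_len=256, hop=128):
    """Short-time Fourier transform magnitudes in decibels.

    Returns (times_s, freqs_hz, frames), where frames[t][k] is the level of
    frequency bin k in the frame starting at times_s[t]. Time, frequency and
    intensity are the three parameters a printed spectrogram displays.
    """
    # Hann window: tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    times, frames = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(len(freqs)):
            # Naive discrete Fourier transform for bin k (a real tool would use an FFT).
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            row.append(20 * math.log10(abs(coeff) + 1e-12))
        times.append(start / sample_rate)
        frames.append(row)
    return times, freqs, frames


# Hypothetical test signal: 0.1 s at 500 Hz followed by 0.1 s at 1500 Hz.
sr = 8000
signal = [math.sin(2 * math.pi * (500 if i < 800 else 1500) * i / sr) for i in range(1600)]
times, freqs, frames = spectrogram(signal, sr)
# Strongest frequency in each frame (the darkest band on a printed spectrogram).
peaks = [freqs[max(range(len(row)), key=row.__getitem__)] for row in frames]
print(peaks[0], peaks[-1])  # → 500.0 1500.0
```

Longer frames resolve frequency more finely but blur rapid changes in time; this is why wide-band and narrow-band spectrograms of the same speech look different.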

The judgment then looked at the issue from another angle: voice prints are like finger prints. Each person has a distinctive voice with characteristic features, and voice print experts have to compare spectrographic prints to arrive at an identification. In this connection, the judgment referred to the following paragraphs from the book “Law Enforcement and Criminal Justice – an introduction” by Bennett-Sandler, Frazier, Torres and Waldron.


“Voiceprints. The voiceprint method of speaker identification involves the aural and visual
comparison of one or more identified voice patterns with a questioned or unknown voice.
Factors such as pitch, rate of speech, accent, articulation, and other items are evaluated and
identified, even though a speaker may attempt to disguise his or her voice. Through means of
a sound spectrograph, voice signals can be recorded magnetically to produce a permanent
image on electrically sensitive paper. This visual recording is called a voiceprint.

A voiceprint indicates resonance bars of a person’s voice (called formants), along with the
spoken word and how it is articulated. Figure 9.7 is an actual voiceprint sample. The loudness
of a voice is indicated by the density of lines; the darker the lines on the print, the greater the
volume of the sound. When voiceprints are being identified, the frequency and pitch of the
voice are indicated on the vertical axis; the time factor is indicated on the horizontal axis. At
least ten matching sounds are needed to make a positive identification, while fewer factors
lead to a probable or highly probable conclusion.

Voiceprints are like fingerprints in that each person has a distinctive voice with characteristic
features dictated by vocal cavities and articulators. Oral and nasal cavities act as resonators
for energy expended by the vocal cords. Articulators are generated by the lips, teeth, tongue,
soft palate, and jaw muscles. Voiceprint experts must compare spectrographic prints or
phonetic elements to arrive at an identification. These expert laboratory technicians are
trained to make subjective conclusions, much as fingerprint or criminalistic experts must
make determinations on the basis of evidence.” (emphasis supplied.)

On this basis, the judgment concluded that a voice sample can be included in the inclusive definition of the term “measurements” appearing in Section 2(a) of the Identification of Prisoners Act, 1920, a conclusion supported by the observation quoted above that voice prints are like finger prints. Section 2(a) states that measurements include finger impressions and foot impressions; if voice prints are like finger prints, they would be covered by the term ‘measurements’. The judgment also noted that the Law Commission of India, in its 87th Report, referred to the book “Law Enforcement and Criminal Justice – an introduction”, observed that voice prints resemble finger prints, and recommended that the Prisoners Act be amended. It therefore took the view that a Magistrate acting under Section 5 of the Prisoners Act can direct any person to give a voice sample for the purposes of any investigation or proceeding under the Code of Criminal Procedure, 1973.


IMPORTANCE

Identification of voice or sound has always been important. It is becoming increasingly so now that voice communications are recorded, stored and transmitted digitally and electronically for all types of verbal exchange, not only in social, entertainment and business matters but also in communications involved in crimes. Consequently, voice analysis is receiving a great deal of attention from scientists, and much research is under way, both to extend the safe and secure use of voice communications in business and entertainment and to stem the tide of crime by tracing criminals through voice analysis. In the forensic context, voice analysis is important for:

1. Audio-video recordings in electronic form are becoming more prevalent as evidence in the investigation of crime. Such evidence is frequently encountered as an important clue in cases such as terrorism, extortion, intimidation, espionage, drug trafficking, stalking, kidnapping and ransom demands, and in trap case investigations under the Prevention of Corruption Act, etc.10

2. Identification of the criminal (speaker) from telephone and other recordings. Voice identification is being used increasingly and effectively to identify criminals, groups of criminals, conspirators, abettors and supporters of criminals. It is also becoming a potent weapon against corrupt politicians, officials and businessmen.

3. Verification (authentication) of the speaker.

4. Profiling of a criminal from his utterances is assuming great importance in fixing his place of origin, his social status and his mental and emotional personality. Forensic psychologists are doing yeoman's work in this field.

5. Determination of the integrity of utterances recorded on various mechanical and electronic devices.

6. Enhancing the intelligibility of utterances.

7. Transcription and analysis of disputed utterances.

8. The most revolutionary development in recent years is the use of voice identification and authentication in the banking industry and in online business. In telephone banking, speaker authentication acts as a gatekeeper, allowing transactions only after the voice identity of the account holder has been verified.

[10] http://muniwar.com/nsic/wp-content/uploads/2017/01/Speaker-Identification.pdf

Voice biometrics is being used increasingly in banks, where it offers a three-in-one benefit: banking that is safe, time-effective and cost-effective.

THE VOICE HAS BEEN USED AS AN IMPORTANT CLUE IN MANY CRIMINAL CASES:11

1. In the 1993 Bombay bomb blasts case, the main perpetrator, abettors and controllers were identified through telephone tapping and the recording of their voices.

2. In the cricket match-fixing case, Hansie Cronje, captain of the South African team, was implicated through telephone conversations he had with bookmakers.

3. In the murder of a press reporter, the main organizer of the murder, allegedly a senior
police officer, was identified through voice identification.

4. In the terrorist attack on Parliament on 13-12-2001, the alleged main organizer, a professor at Delhi University, was identified through mobile phone tapping and the recording of his voice, although the mobile recording was very noisy.

5. In the IPL 2013 spot-fixing case, the accused cricketers were identified from their telephone conversations.

Voice identification often plays a decisive role in the following types of cases:

1. Ransom cases, blackmail, threats, obscene calls, extortion, political or other intimidation.

2. Bomb or fire hoaxes.

3. Black box voice identification.

4. Intelligence collection: identification of spies and saboteurs.

5. Identification of drug dealers: manufacturers, smugglers, suppliers, distributors and peddlers.

[11] Sharma B.R., “Forensic Science in Criminal Investigation & Trials”, 5th Ed., Universal Law Publication, 2014, at p. 1526.

EVIDENTIARY VALUE

In Nilesh Dinkar Paradkar vs State of Maharashtra,12 the Supreme Court held that the evidence of voice identification is at best suspect, if not wholly unreliable. The Court observed that accurate voice identification is much more difficult than visual identification, and that such evidence is prone to such extensive and sophisticated tampering, doctoring and editing that reality can be completely replaced by fiction. Courts therefore have to be extremely cautious in basing a conviction purely on the evidence of voice identification. The Court noted that it had, in a number of judgments, emphasised the precautions necessary before placing any reliance on such evidence. In Ziyauddin Burhanuddin Bukhari vs Brijmohan Ramdass Mehra & Ors.,13 the Court made the following observations:-

"We think that the High Court was quite right in holding that the tape-records of speeches
were "documents", as defined by Section 3 of the Evidence Act, which stood on no different
footing than photographs, and that they were admissible in evidence on satisfying the
following conditions:

(a) The voice of the person alleged to be speaking must be duly identified by the maker of the record or by others who know it.

(b) Accuracy of what was actually recorded had to be proved by the maker of the record and satisfactory evidence, direct or circumstantial, had to be there so as to rule out possibilities of tampering with the record.

(c) The subject-matter recorded had to be shown to be relevant according to rules of relevancy found in the Evidence Act."

[12] Criminal Appeal No. 537 of 2009.
[13] 1975 SCR 453.

In Ram Singh & Ors. vs Col. Ram Singh,14 the Court again stated some of the conditions necessary for the admissibility of tape-recorded statements or other mechanical processes, as follows:-
"(1) The voice of the speaker must be duly identified by the maker of the record or by others
who recognise his voice. In other words, it manifestly follows as a logical corollary that the
first condition for the admissibility of such a statement is to identify the voice of the speaker.
Where the voice has been denied by the maker it will require very strict proof to determine
whether or not it was really the voice of the speaker.

(2) The accuracy of the tape-recorded statement has to be proved by the maker of the record
by satisfactory evidence -- direct or circumstantial.

(3) Every possibility of tampering with or erasure of a part of a tape-recorded statement must
be ruled out otherwise it may render the said statement out of context and, therefore,
inadmissible.

(4) The statement must be relevant according to the rules of Evidence Act.

(5) The recorded cassette must be carefully sealed and kept in safe or official custody.

(6) The voice of the speaker should be clearly audible and not lost or distorted by other
sounds or disturbances."

In Ram Singh's case (supra), the Court also noticed with approval the observations of the Court of Appeal in England in R. vs Maqsud Ali, where Marshall, J. observed thus:-

"We can see no difference in principle between a tape-recording and a photograph. In saying this we must not be taken as saying that such recordings are admissible whatever the circumstances, but it does appear to this Court wrong to deny to the law of evidence advantages to be gained by new techniques and new devices, provided the accuracy of the recording can be proved and the voices recorded properly identified; provided also that the evidence is relevant and otherwise admissible, we are satisfied that a tape-recording or other mechanical process is admissible in evidence. Such evidence should always be regarded with some caution and assessed in the light of all the circumstances of each case. There can be no question of laying down any exhaustive set of rules by which the admissibility of such evidence should be judged."

[14] 1985 SCR Supl. (2) 399.

To the same effect is the judgment in R. vs Robson,15 which was also approved by the Court in Ram Singh's case (supra). In that judgment, Shaw, J., delivering the judgment of the Central Criminal Court, observed as follows:-

"The determination of the question is rendered more difficult because tape-recordings or other mechanical process may be altered by the transposition, excision and insertion of words or phrases and such alterations may escape detection and even elude it on examination by technical experts."

Chapter 14 of Archbold, Criminal Pleading, Evidence and Practice (2010 edition, at pp. 1590-91) discusses the law in England with regard to evidence of identification. Section I of this chapter deals with visual identification and Section II with voice identification. Here again, it is emphasised that voice identification is more difficult than visual identification, and therefore the precautions to be observed should be even more stringent than those taken in relation to visual identification. Speaking of lay listeners (including police officers), it enumerates the factors relevant to judging the ability of such a lay listener to identify voices correctly. These factors include:-

"(a) the quality of the recording of the disputed voice,

(b) the gap in time between the listener hearing the known voice and his attempt to recognize the disputed voice,

(c) the ability of the individual to identify voices in general (research showing that this varies from person to person),

(d) the nature and duration of the speech which is sought to be identified, and

(e) the familiarity of the listener with the known voice; and even a confident recognition of a familiar voice by a lay listener may nevertheless be wrong."

[15] [1972] 2 All E.R. 699.

The Court of Appeal in England, in R vs Chenia and R vs Flynn and St John, has reiterated the minimum safeguards that must be observed before a court can place any reliance on voice identification evidence, as follows:-

"(a) the voice recognition exercise should be carried out by someone other than the officer
investigating the offence;

(b) proper records should be kept of the amount of time spent in contact with the suspect by
any officer giving voice recognition evidence, of the date and time spent by any such officer
in compiling any transcript of a covert recording, and of any annotations on a transcript made
by a listening officer as to his views as to the identity of a speaker; and

(c) any officer attempting a voice recognition exercise should not be provided with a
transcript bearing the annotations of any other officer."

NATURE

In voiced speech, the vocal folds (sometimes misleadingly called ‘vocal cords’) vibrate. This allows puffs of air to pass, which produces sound waves.16 Voice analysis is mainly concerned with the identification of utterances made by a person while communicating with others. The voice of each person is unique. Speech is the product of the following two functions:

1. Phonation: acoustic signals are created in this process.

2. Articulation: these sounds are modulated to create phonemes and intelligible words.

PHONEMES

Phonemes are the vowels and consonants of speech, both audible and even non-audible. Sound is articulated into intelligible speech by the tongue, lips, palate, teeth and the various cavities of the vocal tract, and since these organs differ from person to person, the sounds uttered by different persons vary. A phoneme is a single "unit" of sound that has meaning in any language.17

[16] http://www.animations.physics.unsw.edu.au/jw/voice.html

The pitch of the human voice normally lies in the following ranges:

1. Men: 90 to 140 Hz

2. Women: 180 to 300 Hz

3. Children: 300 to 600 Hz
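These ranges are typical values, not hard boundaries. A minimal sketch with hypothetical function names checks a measured pitch against them. It makes visible that the listed ranges overlap at 300 Hz and leave gaps, so pitch alone cannot place a speaker in a group. It also converts each pitch to the duration of one vocal-fold cycle, since frequency and period are reciprocals.

```python
# Typical pitch (fundamental frequency) ranges listed above, in hertz.
PITCH_RANGES_HZ = {
    "men": (90, 140),
    "women": (180, 300),
    "children": (300, 600),
}


def matching_groups(f0_hz):
    """Return every listed group whose typical range contains f0_hz (inclusive).

    The listed ranges overlap at 300 Hz and leave gaps (below 90 Hz,
    between 140 and 180 Hz, above 600 Hz), so the answer can contain
    several groups or none at all.
    """
    return [group for group, (low, high) in PITCH_RANGES_HZ.items() if low <= f0_hz <= high]


def period_ms(f0_hz):
    """Duration of one vocal-fold cycle in milliseconds: T = 1 / f."""
    return 1000.0 / f0_hz


for f0 in (120, 160, 300):
    print(f0, matching_groups(f0), round(period_ms(f0), 2))
# 120 ['men'] 8.33
# 160 [] 6.25
# 300 ['women', 'children'] 3.33
```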

VARIATIONS IN VOICE18

The human voice varies. Variations between different persons are called inter-speaker variations. They are due to many factors, of both organic and non-organic origin. The variations due to differences in the speech organs have already been discussed.

Non-organic variations are due to:

1. Learning process of speech

2. Regional variations — influence of environment, dialects

3. Social structures

4. Educational level.

The human voice also varies in the same person from time to time. These variations are called intra-speaker variations. In fact, no person can produce exactly the same voice twice, even with his best effort and under similar situations created one after the other. These variations are due to:

1. Emotions: Emotions play an important role in speech variations. It is well established that psychosomatic effects can change the voice even unintentionally. For example, when a person is demanding a ransom, the fear of being caught affects his voice. Likewise, when a person is giving specimen utterances, his awareness that they might bring him trouble can bring in subtle changes in the voice. The voice analyst takes such possible variations into account in his evaluation.

2. Rate of utterance

3. Mode of speech

4. Disease

5. Mood of the speaker

6. The emphasis given to a word at a particular moment

7. Physical discomfort, interference

8. Intoxication due to liquor or drugs

9. Age of the person

[17] http://www.phonemicchart.com/what/
[18] Sharma B.R., “Forensic Science in Criminal Investigation & Trials”, 5th Ed., Universal Law Publication, 2014, at p. 1529.

The success or failure of identification of a speaker depends mainly upon:

1. Inter-speaker variability of the voice. It should be maximal. No measure has been developed for determining this variability so far.

2. Intra-speaker variability of the voice. It should be minimal. It has not been quantified
so far. Inter-speaker variations should be considerably greater than intra-speaker
variations.

3. The extent of the time period of listening to the disputed voice.

4. The quality of speech.

5. The efficiency of the recording instrument for recording and re-conversion (replaying) of the utterance.
6. The repetition of words.

7. The time lapse between the two utterances. It should be minimal.


8. There should be no disguise. Unfortunately, there are ways and means by which the voice can be distorted considerably to create confusion in identification; there are even cell phones that can produce (speak in) different voices.
9. The number of possible suspects should be limited.

10. The sample utterances should be recorded, as far as possible, under similar conditions
to those prevailing in the recording of the voice in the crime situation, using the same
set of instruments whenever possible.

11. If the voice has been transmitted, the transmission channel should be the same or similar.
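The source states that neither kind of variability has been quantified. Purely to illustrate the condition that inter-speaker variation should be considerably greater than intra-speaker variation, the sketch below computes, for a single feature, the ratio of between-speaker variance to average within-speaker variance. This ratio is often called an F-ratio in the speaker-recognition literature. The sketch is a toy built on invented data, not a validated forensic measure, and the function name is hypothetical.

```python
from statistics import mean, pvariance


def f_ratio(measurements):
    """Ratio of between-speaker variance to average within-speaker variance.

    measurements: dict mapping each speaker to a list of values of ONE feature
    (e.g. mean pitch in Hz) taken from several recordings of that speaker.
    A large ratio means the feature differs much more between speakers than it
    wanders for the same speaker, which is what identification needs.
    """
    speaker_means = [mean(values) for values in measurements.values()]
    between = pvariance(speaker_means)                                    # inter-speaker
    within = mean(pvariance(values) for values in measurements.values())  # intra-speaker
    return between / within


# Hypothetical measurements: three speakers, three recordings each.
mean_pitch = {"A": [118, 122, 120], "B": [150, 146, 148], "C": [101, 99, 100]}
unstable_feature = {"A": [100, 140, 120], "B": [110, 150, 130], "C": [105, 135, 120]}

print(round(f_ratio(mean_pitch), 1))        # → 193.8
print(round(f_ratio(unstable_feature), 2))  # → 0.1
```

The first feature would be useful for telling these speakers apart; the second varies more within each speaker than between speakers and would not.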

Kersta, who initiated the voiceprint identification technique, was over-enthusiastic about the individuality of each person's voice and thought that the voice remains unaffected by age, loss of teeth, removal of tonsils, etc. Likewise, he claimed that attempts at disguise, mimicry of others, ventriloquism, whispering, etc. do not defeat identification. But these claims were bare statements, unsupported by experimental data, and they are not accepted by voice scientists, who believe that these factors do affect the voice. Extensive research has been done to counter the inevitable variations, and scientists have developed techniques to minimize their effects, which are being used increasingly in criminal cases.

Voice analysis has been classified into two main categories:

1. Speech identification

2. Speaker identification

In speech identification, the contents of an utterance that may defy understanding because of external noise or changes in the sound are made intelligible. The need to evaluate the speech may arise in forensic situations where its contents may incriminate or exonerate a person charged with a crime. It may also promote the better dispensation of justice by revealing secret or otherwise unknown evidence.

Speech identification is also used in certain security installations where regular visitors have recorded a limited amount of their speech in a computer. When a visitor wants entry to the premises, he repeats the speech, and the computer compares the two samples. If they emanate from the same source, entry is allowed; if they come from different sources, entry is denied.

Speaker recognition systems fall into two categories according to the evidence made
available: text-dependent and text-independent.

TEXT DEPENDENT

In text-dependent cases, the disputed voice for verification and the exemplar (control) voice have the same word content. Prompts such as a password or phrase can be common to various speakers or unique to each. In addition, passwords, PINs or knowledge-based information can be employed for a broader database.
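Even when the text is the same, the same words are never spoken at exactly the same speed. The source does not name a comparison method. One classical technique for text-dependent matching is dynamic time warping (DTW), which first aligns the two utterances in time and then measures how far apart they are. The sketch below is only illustrative. It assumes each utterance has already been reduced to a sequence of per-frame feature vectors, and the toy numbers are hypothetical, not real acoustic features.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame]) -> float:
    """Dynamic time warping: the cheapest alignment cost between two
    utterances of the same text spoken at different speeds."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of A with the first j of B
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # a step may advance A, B, or both (stretching or compressing time)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical two-dimensional features for the same phrase
exemplar = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [2.0, 0.0]]
slow_repeat = [[1.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [3.0, 1.0], [2.0, 0.0]]
other_phrase = [[0.0, 3.0], [0.0, 2.0], [1.0, 3.0], [0.0, 3.0]]

print(dtw_distance(exemplar, slow_repeat))  # 0.0: same shape, only spoken more slowly
print(dtw_distance(exemplar, other_phrase))
```

The slower repetition scores zero because DTW stretches the time axis to absorb the difference in speed. A different phrase still scores a positive distance.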

TEXT INDEPENDENT

Text-independent systems are more laborious, but they have the advantage of requiring little, if any, cooperation from the speaker. They are therefore more frequently used in voice recognition.

Here the exemplar and the disputed voices have different text content. The subject may not even know that his voice is being used for forensic purposes. In text-independent systems, both acoustic and speech analysis techniques are used.

Speaker identification, in this broad sense, is the major forensic problem. It has two sub-divisions: speaker identification proper, and speaker authentication or speaker verification.

In the identification step, the extracted features from the unknown sample are compared with
the speaker models present in the speaker database to find the best match. The match score is
used to make the final decision about speaker identity. In speaker verification, an identity is
claimed by an unknown speaker, and an utterance of this unknown speaker is compared with
a model for the speaker whose identity is being claimed. If the match is good enough, that is,
above a threshold, the identity claim is accepted.19

A high threshold makes it difficult for impostors to be accepted by the system, but with the
risk of falsely rejecting valid users. Conversely, a low threshold enables valid users to be
accepted consistently, but with the risk of accepting impostors. To set the threshold at the
desired level of customer rejection (false rejection) and impostor acceptance (false
acceptance), data showing distributions of customer and impostor scores are necessary.
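The trade-off can be made concrete with two sets of scores: one from genuine ("customer") trials and one from impostor trials. At a given threshold, the false rejection rate (FRR) is the share of genuine scores that fall below it. The false acceptance rate (FAR) is the share of impostor scores that reach it. This is a minimal sketch. The scores are invented for illustration, and treating a score exactly equal to the threshold as accepted is an assumed convention.

```python
from typing import List, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_rejection_rate, false_acceptance_rate) at a threshold.
    A claim is accepted when score >= threshold (assumed convention)."""
    false_rejections = sum(1 for s in genuine if s < threshold)
    false_acceptances = sum(1 for s in impostor if s >= threshold)
    return false_rejections / len(genuine), false_acceptances / len(impostor)


# Invented score distributions (higher score = better match)
genuine_scores = [0.91, 0.85, 0.78, 0.88, 0.69, 0.95, 0.81, 0.74]
impostor_scores = [0.32, 0.55, 0.41, 0.72, 0.28, 0.61, 0.47, 0.80]

for t in (0.5, 0.7, 0.9):
    frr, far = error_rates(genuine_scores, impostor_scores, t)
    print(f"threshold={t:.1f}  FRR={frr:.3f}  FAR={far:.3f}")
```

With these numbers, raising the threshold from 0.5 to 0.9 drives the FAR down from 0.5 to 0.0. Over the same range, the FRR climbs from 0.0 to 0.75. This is the trade-off described in the paragraph above.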

19
http://shodhganga.inflibnet.ac.in/bitstream/10603/104402/8/08_chapter-1.pdf


In a sense speaker verification is a 1:1 match where one speaker's voice is matched to one
template also called a "voice print" or "voice model" whereas speaker identification is a 1:N
match where the voice is compared against N templates.
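The difference between the two tasks can be sketched in a few lines under two illustrative assumptions. First, each enrolled speaker is represented by a fixed-length "voice model" vector. Second, cosine similarity serves as the match score. Real systems use far richer models. The speaker names, vectors and the 0.9 threshold below are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Match score between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def verify(sample: Vector, claimed_model: Vector, threshold: float) -> bool:
    """1:1 speaker verification: accept the identity claim if the score
    against the claimed speaker's model reaches the threshold."""
    return cosine_similarity(sample, claimed_model) >= threshold


def identify(sample: Vector, models: Dict[str, Vector]) -> Optional[str]:
    """1:N speaker identification: return the enrolled speaker whose model
    gives the best score (closed-set assumption: the speaker is enrolled)."""
    if not models:
        return None
    return max(models, key=lambda name: cosine_similarity(sample, models[name]))


# Hypothetical enrolled voice models
enrolled = {
    "speaker_a": [0.9, 0.1, 0.3],
    "speaker_b": [0.2, 0.8, 0.5],
    "speaker_c": [0.4, 0.4, 0.9],
}
unknown = [0.85, 0.15, 0.35]

print(identify(unknown, enrolled))                  # speaker_a
print(verify(unknown, enrolled["speaker_a"], 0.9))  # True
print(verify(unknown, enrolled["speaker_b"], 0.9))  # False
```

Identification needs N comparisons and always returns the best candidate. Verification needs only one comparison, but it depends on a threshold to decide.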

In speaker identification, the speaker speaks certain specified words, which are turned into digital or analog sound data. These are compared with the corresponding data from the same text already stored in a computer. If the data match, the speaker is identified.

Speaker identification devices are often installed at the entrance gates of security installations. The computer holds voice data from all authorized members of the staff; it picks out the right one from the many and allows the individual to enter.

Speaker authentication refers to the verification of a given utterance vis-à-vis a particular individual. The utterance is compared with that individual's recorded voice on a 1:1 basis. If the identifying data match, the identity of the speaker stands verified.

OPERATIONAL PROBLEMS

The operational problems in voice identification are many and varied. They are often inherent
to the system and have to be tackled carefully.

The following factors affect the performance and reliability of speaker recognition systems:20
• Variations in speakers: emotional and physical state, gender, accent or pronunciation, speed of speaking.
• Variations in environment: noise, acoustic disturbances.
• Variations in transmission channels and microphones.
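Environmental noise is usually quantified as a signal-to-noise ratio (SNR). This is the ratio of speech power to noise power, expressed in decibels: SNR = 10 log10(P_signal / P_noise). The source gives no formula. The sketch below simply applies this standard definition to short, made-up sample values.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average power of a block of samples (mean of squared amplitudes)."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


# Made-up sample blocks: a speech-bearing segment and a pause containing only noise
speech_segment = [0.5, -0.5, 0.5, -0.5]
noise_segment = [0.05, -0.05, 0.05, -0.05]
print(round(snr_db(speech_segment, noise_segment), 1))  # 20.0
```

Making the noise ten times louder in amplitude lowers the SNR by 20 dB. A low SNR is one reason recordings made at noisy sites are harder to identify.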

20
http://shodhganga.inflibnet.ac.in/bitstream/10603/104402/8/08_chapter-1.pdf


Some of the technical problems are:21

TELEPHONE UTTERANCE

Telephone utterances are usually short in actual crime situations. The criminal is often tense, and owing to the psychosomatic effect the voice is not normal. Further, the criminal may:

1. Intentionally modify the voice

2. Mimic the voice of some other person

3. Disguise his voice

The telephone apparatus and the recording device involved also affect identification:

1. Defective telephone apparatus, voice recording apparatus or transmission devices aggravate the problems of identifying telephone utterances.

2. Telephone systems transmit the voice after filtering it, reducing the frequency range to 300 Hz to 3.4 kHz. Higher or lower frequencies in the voice are therefore not available in the utterance for evaluation, which reduces the identification parameters considerably.

3. Transmission channel distortions modify the sound.

WIRE TAPPING

Wiretapping is a frequently utilized source for recording the voices of criminals covertly. The recorded voice is usually free from modification or disguise, because the criminal does not know that his voice is being recorded. Again, the usual psychosomatic effect present in threat, ransom or obscene calls is also absent. However, if the person has taken drugs, is under the influence of alcohol or is in poor health, his voice may not be normal, and voice analysis may give erroneous results when it is compared with his normal voice. External disturbances (outside noises) at the recording site also affect the recorded voice and create difficulties in its identification; they do not, however, prevent recognition of the recorded voice.

The operational problems of recording and transmission devices, however, continue, as in the case of normal telephone calls.

21
Sharma B.R., “Forensic Science in Criminal Investigation & Trials”, 5th Ed., Universal Law Publication, 2014, at pg. 1531


CONTROLLED VOICE

Controlled utterances for evaluation and comparison with the disputed utterance need vigilance:

1. Collect sample utterances at the earliest. The passage of time may bring changes in the health of the person that affect the voice and may increase the difficulty of comparing it with the disputed utterance.

2. Transmission, recording and reproduction devices influence the quality of the voice. Therefore, whenever possible, use the same equipment for the various processes involved.

3. Create and record the voice under the same or similar conditions whenever possible.

In spite of efforts to recreate corresponding conditions, certain lacunae remain. For example,
the psychosomatic effect due to tension at the time of commission of crime cannot be
simulated.

TRANSMISSION CHANNEL

In telephones, the speech bandwidth is reduced to the frequency range between 300 and 3400 Hz. The transmitted voice therefore does not contain those features of the voice which are related to higher or lower frequencies. Likewise, the dynamic range is restricted in telephone transmissions, and only this restricted range is available in the recording.
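A small calculation shows what the band limit removes. In a voiced sound with fundamental frequency F0, the harmonics lie at whole-number multiples of F0. That makes it possible to list which components pass through a 300 to 3400 Hz channel. The F0 values used here (120 Hz and 220 Hz) are illustrative assumptions, not figures from the source.

```python
from typing import List

TELEPHONE_LOW_HZ = 300.0    # lower edge of the telephone speech band
TELEPHONE_HIGH_HZ = 3400.0  # upper edge of the telephone speech band


def harmonics_in_band(f0_hz: float, low_hz: float = TELEPHONE_LOW_HZ,
                      high_hz: float = TELEPHONE_HIGH_HZ) -> List[int]:
    """Return the harmonic numbers k whose frequency k * f0 lies inside
    [low_hz, high_hz]; k = 1 is the fundamental itself."""
    k_max = int(high_hz // f0_hz)
    return [k for k in range(1, k_max + 1) if low_hz <= k * f0_hz <= high_hz]


for f0 in (120.0, 220.0):
    kept = harmonics_in_band(f0)
    print(f"F0={f0:.0f} Hz: harmonics {kept[0]}..{kept[-1]} pass, "
          f"fundamental transmitted: {1 in kept}")
```

Both fundamentals fall below 300 Hz, so in each case the fundamental itself is filtered out and only harmonics reach the listener. For 120 Hz, harmonics 3 to 28 pass; for 220 Hz, harmonics 2 to 15 pass. This is one concrete way in which the telephone channel reduces the parameters available for identification.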

VOICE TEXT COLLECTIONS

In the classic sound spectrograph, sounds are recorded on a magnetic disk and sent to an amplifier, which makes the sound more intense. The sounds then go through a scanner or frequency analyser, which separates them into different frequencies. Frequency is a measure of how often the molecules of the air vibrate as sound waves pass them. A filter selects a group of frequencies and, with the help of the analyser, converts them into electrical signals. These signals move a pen-like stylus, which marks paper on the recording drum.


The stylus produces a series of jagged lines that show both the frequency and the intensity, or loudness, of the sounds. The process is repeated with other groups of frequencies. Kersta's new sound spectrograph had four parts: a tape recorder-player, a scanner or frequency analyser, a filter, and a stylus. Today many parts of a sound spectrograph are computerized.

The spectrograph's printout is called a spectrogram. Each spectrogram shows 2.5 seconds of spoken sound, represented as a graph: the vertical axis indicates the frequencies of the sound and the horizontal axis shows time. The spectrogram reflects the fact that each sound of the human voice actually consists of many sounds occurring at the same time. The most important of these is the fundamental; fainter overtones, called harmonics, occur at pitches above that of the fundamental. The spectrogram shows the frequencies of both the fundamental and the harmonics.
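A digital spectrogram is normally computed with a short-time Fourier transform. The signal is cut into short overlapping frames, and each frame is windowed. The magnitude of each frame's spectrum then becomes one column of the picture: time runs along the horizontal axis, frequency up the vertical axis, and intensity shows as darkness. This computation takes the place of the analogue filter-and-stylus arrangement described above. The sketch assumes NumPy is available. The frame length, the hop size and the synthetic test tone are hypothetical choices. The tone has a 150 Hz fundamental plus two fainter harmonics.

```python
import numpy as np


def spectrogram(signal: np.ndarray, sample_rate: int,
                frame_len: int = 512, hop: int = 128):
    """Magnitude spectrogram via a short-time Fourier transform.
    Returns (frequencies in Hz, frame start times in s, magnitudes[freq, time])."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    # Each column is the magnitude spectrum of one windowed frame
    columns = [np.abs(np.fft.rfft(signal[s:s + frame_len] * window)) for s in starts]
    magnitudes = np.array(columns).T
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = np.array(list(starts)) / sample_rate
    return freqs, times, magnitudes


# Synthetic "voiced sound": 150 Hz fundamental plus fainter 2nd and 3rd harmonics
sr = 8000
t = np.arange(int(2.5 * sr)) / sr  # 2.5 s, as on a classic spectrogram
f0 = 150.0
voice = (1.0 * np.sin(2 * np.pi * f0 * t)
         + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
         + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))

freqs, times, mags = spectrogram(voice, sr)
strongest = freqs[np.argmax(mags[:, 0])]
print(mags.shape, f"strongest component near {strongest:.1f} Hz")
```

Each frequency bin here is 8000 / 512 = 15.625 Hz wide. The strongest component in a frame is therefore found within one bin of the 150 Hz fundamental. The two harmonics show up as weaker horizontal bands at about 300 Hz and 450 Hz.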

The analyst first listens to the two tapes repeatedly, trying to detect similarities and
differences in the way the voices make single sounds and groups of sounds, the way
breathing interacts with the sounds, and unusual speech habits, inflections, and accents. At
the end of the examination, the analyst reaches one of five conclusions: The samples
definitely match, the samples probably match, the samples probably do not match, the
samples definitely do not match, or the test was inconclusive. An analyst must find 20 points
of similarity and no unexplainable differences in order to declare a definite match. A definite
non-match requires 20 or more differences between the two tapes.
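The two "definite" rules stated above can be written as a simple decision function. The text gives no numeric criteria for the "probable" and "inconclusive" categories. The sketch therefore leaves those cases to the analyst rather than inventing thresholds. It also treats a case that meets both definite rules at once as needing judgement, which is an added assumption. The function name and its inputs are hypothetical.

```python
POINTS_REQUIRED = 20  # figure given in the text for both definite conclusions


def definite_conclusion(similarities: int, differences: int,
                        unexplained_differences: int) -> str:
    """Apply only the two definite rules described in the text:
    - definite match: at least 20 points of similarity and no unexplainable differences
    - definite non-match: 20 or more differences between the two recordings
    Everything else (probable match, probable non-match, inconclusive) is
    left to the analyst, because the text gives no numeric rule for it."""
    is_match = similarities >= POINTS_REQUIRED and unexplained_differences == 0
    is_non_match = differences >= POINTS_REQUIRED
    if is_match and not is_non_match:
        return "definite match"
    if is_non_match and not is_match:
        return "definite non-match"
    return "analyst judgement required (probable or inconclusive)"


print(definite_conclusion(similarities=22, differences=3, unexplained_differences=0))
print(definite_conclusion(similarities=5, differences=21, unexplained_differences=6))
print(definite_conclusion(similarities=14, differences=4, unexplained_differences=1))
```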

However, there is no international standard for the minimum number of points of identity needed in this comparison. In brief, the investigator has two complementary ways of making an identification through voice analysis. First, he or she listens to the evidence sample and the sample taken from the suspect, comparing accent, speech habits, breath patterns and inflections. Then the corresponding voiceprints are compared.

Real voice forensics mostly involves the analysis of recorded voice. The usual modes of recording voice are the following:

MECHANICAL MODE

Thomas Edison gave us the famous phonograph, which first recorded voice on tin foil wrapped around a drum. Later the recording medium changed to wax cylinders and then to shellac discs (the gramophone record). These modes of recording are now obsolete.

Vinyl discs replaced the above modes, but they too have more or less vanished from the scene.


The motion picture audio track is also a mechanical mode. It is used in cinematography but not in non-cinematic recordings.

MAGNETIC MODE

Magnetic mode for voice recording succeeded the mechanical mode. It has been utilised in:

1. Micro Cassette Recorder

2. Compact Cassette Recorder

3. Reel to reel recorder

4. Camcorder Analog Audio Recorder

5. Video Tape

6. Answering Machine.

These voice recording devices are common and are still available in the market.

DIGITAL MODE

The electronic (digital) mode of recording voice is the 'in thing' because of its superior quality. It gives life-like reproduction. This is the preferred mode and is being used increasingly for voice recording in voice forensics. The following devices are useful:

1. Digital Audio Tape Recorder (DAT)

2. Digital Compact Cassette Recorder (DCC), Computer Disc Drive, etc.

3. Audio Recorder

4. Computer using tape drive back up or USB port removable chip

5. Answering Machines


6. Semi-conductor Memory, IC (Integrated chip) memory.
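When a digital recording arrives for examination, it makes sense to check its basic technical parameters first: channels, sample width, sampling rate and duration. The sampling rate matters because a recording cannot hold frequencies above half of it. This sketch uses Python's standard-library `wave` module and assumes the exhibit is an uncompressed PCM WAV file. The in-memory recording stands in for a real exhibit. The function name `describe_wav` is hypothetical.

```python
import io
import wave
from typing import BinaryIO, Dict, Union


def describe_wav(source: Union[str, BinaryIO]) -> Dict[str, float]:
    """Report the basic technical parameters of a PCM WAV recording."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()          # samples per second
        frames = wav.getnframes()
    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": rate,
        "duration_s": frames / rate,
        "highest_frequency_hz": rate / 2,  # Nyquist limit of the recording
    }


# Stand-in exhibit: 2.5 s of 16-bit mono silence at a telephone-style 8 kHz rate
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(8000)
    out.writeframes(b"\x00\x00" * 20000)
buffer.seek(0)
print(describe_wav(buffer))
```

An 8 kHz recording cannot contain anything above 4 kHz. That ceiling lies comfortably above the 3.4 kHz telephone band described earlier.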

OPTICAL MODE

CD, CD-ROM, CD-i (interactive), CD-R (recordable), CD-RW (rewritable), DVD (Digital Versatile Disc), Mini Disc, etc. are the common devices in this category. They perform well.

EVALUATION
Voice analysis is done in three modes:

1. Speaker recognition by hearing the voice, directly or through instruments.

2. Using instruments for voice collection, analysis and comparison.

3. A combination of the two modes.

In the first category, the evidence is given by individuals who have heard both the disputed speech and the specimen (ordinary) speech. The identification of the voice is highly subjective. Correct identification is possible only when the witness is familiar with the voice and the speaker has not disguised or mimicked it. It is also possible if the speaker's voice has some extraordinary identifying elements. In some cases, speech analysis (peculiar words, phrases, accent and mannerisms in talking) may help to identify the speaker, but this is rare. The factors affecting the identification of the speaker are:

1. The ability of the listener for speech recognition.

2. The amount of utterance the person has listened to.

3. The familiarity with the voice.

4. The interval between listening to the disputed and the sample utterances. The greater the interval, the less reliable the voice identification.

5. The emotional condition of the listener at the time of the disputed utterance.


6. The personal involvement of the victim.

7. External noise.

Speaker Recognition through Listening (SRL) is used day in and day out by all of us to identify our near and dear ones. Voice analysts also use it extensively.

Speaker Recognition through Voice Spectrograms (SRS) is becoming more and more important as the techniques are refined and the reliability of their findings increases.

Automatic Speaker Recognition through computers (ASR) is becoming the new tool of voice identification. Numerous tools and techniques are in use, and they are being improved continuously.

Usually, one does not depend upon a single method. Most laboratories doing voice identification combine the first and last techniques (SRL and ASR); some combine SRL and SRS.

INTERPRETATION

All forensic identifications aim at achieving practical individualization of the source. This is possible for some types of evidence but not for others. In voice evaluations, instrumental individualization of the voice source is difficult to achieve with certainty. In some cases it has been possible to give a high probability of the identity of the speaker; in other cases, a fairly high probability of the non-identity of the speaker has been achieved.

The usual results given by the experts fall into the following categories:

1. Identification of the voice:


• High probability
• Probable
• Possible
• Evidence insufficient for opinion.


2. Elimination of the speaker:


• Positive in some cases
• Highly probable exclusion

CASE LAW
The Supreme Court in N. Sri Rama Reddy v. V.V. Giri22 accepted a conversation or dialogue recorded on a tape-recording machine as admissible evidence.

In the Presidential Election case23, questions were put to a witness, Jagat Narain, suggesting that he had tried to dissuade the petitioner from filing an election petition. The witness denied those suggestions. The election petitioner had recorded on tape the conversation that had taken place between the witness and the petitioner. Objection was taken to the admissibility of the tape-recorded conversation, but the court admitted it. In the Presidential Election case (supra), the denial of the witness was being controverted, challenged and confronted with his earlier statement. Under section 146 of the Evidence Act, questions might be put to the witness to test his veracity. Again, under section 153 of the Evidence Act, a witness might be contradicted when he denied any question tending to impeach his impartiality. This was possible because the previous statement was furnished by the tape-recorded conversation. The tape itself becomes the primary and direct evidence of what has been said and recorded.

A tape-recorded conversation is admissible provided, first, that the conversation is relevant to the matters in issue; secondly, that there is identification of the voice; and thirdly, that the accuracy of the tape-recorded conversation is proved by eliminating the possibility of erasure of the tape record. A contemporaneous tape record of a relevant conversation is a relevant fact and is admissible under section 8 of the Evidence Act. It is also comparable to a photograph of a relevant incident. The tape-recorded conversation is therefore a relevant fact and is admissible under section 7 of the Evidence Act. In R.M. Malkani v. State of Maharashtra, the conversation between Dr. Motwani and the appellant was relevant to the matter in issue. There was no dispute about the identification of the voices, and no controversy about any portion of the conversation being erased or mutilated. The appellant was given full opportunity to test the genuineness of the tape-recorded conversation, which was held admissible in evidence.

22
1971 AIR 1162
23
(1971) 1 SCR 399

It was argued by counsel for the appellant that the tape-recorded conversation had been obtained by illegal means, in contravention of section 25 of the Indian Telegraph Act. There is, however, warrant for the proposition that even illegally obtained evidence is admissible. Over a century ago, in an English case where a constable had searched the appellant illegally and found a quantity of offending articles in his pocket, it was said that it would be a dangerous obstacle to the administration of justice if it were held that, because evidence was obtained by illegal means, it could not be used against a party charged with an offence. See Jones v. Owen24. The Judicial Committee in Kuruma, Son of Kaniu v. R.25 dealt with the conviction of an accused for being in unlawful possession of ammunition which had been discovered in consequence of a search of his person by a police officer below the rank of those permitted to make such searches. The Judicial Committee held that the evidence was rightly admitted, reasoning that if evidence is admissible it matters not how it was obtained. There is, of course, always a word of caution: the judge has a discretion to disallow evidence in a criminal case if the strict rules of admissibility would operate unfairly against the accused. That caution is the golden rule in criminal jurisprudence.26

It is too late in the day to challenge the admissibility of a conversation which has been tape-recorded earlier, if the same is relevant. As early as 1956, Bhandari, C.J. in Rup Chand v. Mahabir Parshad27 categorically laid down that such a tape-recording was clearly admissible. The learned Chief Justice relied on a number of American and English cases in support of his decision. In S. Partap Singh v. State of Punjab28, the learned judges of the Supreme Court relied heavily on a tape-recording which had been put on the record by the petitioner. In fact, its admissibility was considered self-evident. Their Lordships expressly adverted to the admissibility and evidentiary value of the tape-recorded talk, which had been produced as part of the supporting evidence by Dr. Partap Singh, the petitioner in that case.

24
(1870) 34 JP 759
25
1955 AC 197
26
R.M. Malkani v. State of Maharashtra, AIR 1973 SC 157
27
AIR 1956 Punj 173
28
AIR 1964 SC 72


BIBLIOGRAPHY
The books referred to are:

• Modi J.P., “Medical Jurisprudence & Toxicology”, 23rd Ed., LexisNexis, New Delhi
• Sharma B.R., “Forensic Science in Criminal Investigation & Trials”, 5th Ed., Universal Law Publication, 2014

The websites accessed are:

• http://www.animations.physics.unsw.edu.au/jw/voice.html
• https://en.wikipedia.org/wiki/Audio_forensics
• http://medind.nic.in/jal/t12/i1/jalt12i1p70.pdf
• http://muniwar.com/nsic/wp-content/uploads/2017/01/Speaker-Identification.pdf
• https://www.nidcd.nih.gov/health/what-is-voice-speech-language
• http://www.phonemicchart.com/what/
• http://shodhganga.inflibnet.ac.in/bitstream/10603/104402/8/08_chapter-1.pdf
• https://www.streetdirectory.com/travel_guide/139545/technology/key_differences_between_speech_recognition_and_voice_recognition.html
• http://what-when-how.com/forensic-sciences/voice-analysis/
