NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA

THESIS

DIPHONE-BASED SPEECH RECOGNITION USING NEURAL NETWORKS

by
Mark E. Cantrell
June, 1996

Dan C. Boger
Robert B. McGhee

Approved for public release; distribution is unlimited

19960912 027

REPORT DOCUMENTATION PAGE

Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instruction, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188) Washington DC 20503.

1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: June 1996
3. REPORT TYPE AND DATES COVERED: Master's Thesis
4. TITLE AND SUBTITLE: DIPHONE-BASED SPEECH RECOGNITION USING NEURAL NETWORKS
5. FUNDING NUMBERS
6. AUTHOR(S): Cantrell, Mark E.
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Naval Postgraduate School, Monterey CA 93943-5000
8. PERFORMING ORGANIZATION REPORT NUMBER
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
10. SPONSORING/MONITORING AGENCY REPORT NUMBER
11. SUPPLEMENTARY NOTES: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
12a. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
12b. DISTRIBUTION CODE
13. ABSTRACT

Speaker-independent automatic speech recognition (ASR) is a problem of long-standing interest to the Department of Defense. Unfortunately, existing systems are still too limited in capability for many military purposes. Most large-vocabulary systems use phonemes (individual speech sounds, including vowels and consonants) as recognition units.
This research explores the use of diphones (pairings of phonemes) as recognition units. Diphones are acoustically easier to recognize because coarticulation effects between the diphone's phonemes become recognition features, rather than confounding variables as in phoneme recognition. Also, diphones carry more information than phonemes, giving the lexical analyzer two chances to detect every phoneme in the word. Research results confirm these theoretical advantages. In testing with 4490 speech samples from 163 speakers, 70.2% of 157 test diphones were correctly identified by one trained neural network. In the same test, the correct diphone was one of the top three outputs 89.0% of the time. During word recognition tests, the correct word was detected 85% of the time in continuous speech. Of those detections, the correct diphone was ranked first 41.6% of the time and among the top six 74% of the time. In addition, new methods of pitch-based frequency normalization and network feedback-based time alignment are introduced. Both of these techniques improved recognition accuracy on male and female speech samples from all eight dialect regions in the U.S. In one test set, frequency normalization reduced errors by 34%. Similarly, feedback-based time alignment reduced another network's test set errors from 32.8% to 11.0%.

14. SUBJECT TERMS: automatic speech recognition, diphone, neural network, speaker independent, continuous speech
15. NUMBER OF PAGES: 35
16. PRICE CODE
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: UL

NSN 7540-01-280-5500
Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. 239-18 298-102
