United States Patent
Yoshizumi et al.

Patent Number: 5,583,969
Date of Patent: Dec. 10, 1996

[54] SPEECH SIGNAL PROCESSING APPARATUS FOR AMPLIFYING AN INPUT SIGNAL BASED UPON CONSONANT FEATURES OF THE SIGNAL

[75] Inventors: Yoshiyuki Yoshizumi, Suita; Tsuyoshi Mekata; Yoshinori Yamada, both of Katano; Ryoji Suzuki, Nara, all of Japan

[73] Assignee: Technology Research Association of Medical and Welfare Apparatus, Tokyo, Japan

[21] Appl. No.: 52,698
[22] Filed: Apr. 26, 1993
[30] Foreign Application Priority Data: Apr. 28, 1992, Japan, 4-109451
[51] Int. Cl.: G10L 9/00
[52] U.S. Cl.: 395/2.63; 395/2.34; 395/2.35; 395/2.8
[58] Field of Search: 395/2.34, 2.55-2.64, 2.35-2.37, 2.8; 381/41-43

[56] References Cited

U.S. PATENT DOCUMENTS
[earlier entries illegible in the source scan]
5,146,504 9/1992 Pinckley
5,159,638 10/1992 Naito et al.
5,278,910 1/1994 Suzuki et al.
5,408,581 4/1995 Suzuki et al.

OTHER PUBLICATIONS
Parsons, Voice and Speech Processing, McGraw-Hill, New York, NY (1987), pp. 119-121.
R. W. Guelke, Journal of Rehabilitation Research and Development, vol. 24, No. 4, pp. 217-220, Fall 1987, "Consonant Burst Enhancement: A Possible Means To Improve Intelligibility For The Hard of Hearing".

Primary Examiner: Allen R. MacDonald
Assistant Examiner: Michael A. Sartori
Attorney, Agent, or Firm: Renner, Otto, Boisselle, Sklar

[57] ABSTRACT

An apparatus for processing a speech signal includes a coefficient calculating circuit for receiving an input signal, and for generating a first value for suppressing a change of level of the input signal; a first delay circuit for receiving the input signal, and for delaying the input signal by a predetermined time; a feature extracting circuit for receiving the input signal, and for deriving a feature value representing a feature of consonants from the input signal; a coefficient control circuit for receiving the first value from the coefficient calculating circuit and the feature value from the feature extracting circuit, and for changing the amplitude and the duration of the first value depending on the feature value, so as to generate a second value; a multiplying circuit for receiving the delayed input signal from the first delay circuit and the second value from the coefficient control circuit, and for multiplying the delayed input signal by the second value.

3 Claims, 8 Drawing Sheets

[Sheet 1 of 8. FIG. 1: block diagram showing the first delay circuit, coefficient calculating circuit, coefficient control circuit, feature extracting circuit, and multiplier, with signals S(t), S(t-b), A(t), and G(t). FIGS. 2A to 2D: waveforms of S(t), A(t), G(t), and y(t) against time for a plosive followed by a vowel.]

[Sheet 2 of 8. FIG. 3: block diagram showing the second delay circuit 21, plosive extracting circuit 22, pitch detector 23, and judgement circuit 24.]
[Sheet 2 of 8, continued. FIG. 4 and FIG. 5: block diagrams of plosive extracting circuits, showing the band pass filters, first and second average amplitude calculating circuits, comparator 36, threshold memory 37, judgement circuit, constant memory, and time-axis generator 40; FIG. 5 additionally shows a differentiator.]

[Sheet 3 of 8. FIGS. 6A to 6C: waveforms at points A, B, and C against time for a plosive and vowels. FIG. 7: block diagram showing the first delay circuit, coefficient calculating circuit, coefficient control circuit, comparator 36, threshold memory 37, time-axis generator 40, judgement circuit, and constant memory.]

[Sheet 4 of 8. FIGS. 8A to 8C and a further panel: waveforms at points D, E, F, and G against time for a fricative between vowels. FIG. 9 (prior art): block diagram showing an amplifier, gap detector, differentiator, and zero crossing detector between input and output.]

[Sheet 5 of 8. FIGS. 10A to 10C (prior art): waveforms across a consonant, transition, and vowel. FIGS. 12 to 15: characteristics of the coefficients C(t) and E(t) against time.]

[Sheet 6 of 8. Figure label illegible in the source scan; the drawing shows memories holding C(-e) to C(e), E(-e) to E(e), and delayed samples of S, together with an absolute value circuit and the signal S(t-b).]
[Sheet 7 of 8. FIG. 16: flowchart in which the ratio of the average amplitude of the higher-frequency components to the average amplitude of the lower-frequency components is compared with the threshold value, a timer is initialized and started, and the comparison is repeated.]

[The end of the preceding equation block is illegible in the source scan.]

FIG. 13 shows another characteristic of the coefficient C(t) stored in the first memory in order to calculate the value M(t) for suppressing the level change of the input signal. This coefficient is shown in Equation (4). As shown in this diagram, by making the coefficient C(t) asymmetrical with respect to the time axis, the temporal masking of auditory sense is securely compensated. As shown in Equation (6), by convolving this coefficient C(t) into the absolute value of the input signal S(t), the value of M(t) becomes large when the level before and after the time t is larger than the level at the time t, and the value of M(t) becomes small when the level before and after the time t is smaller than the level at the time t, and therefore by multiplying M(t) and the input signal, the level of the input signal is smoothed. That is, the coefficient C(t) has the characteristic of differentiating in two steps with respect to the time axis. However, the coefficient C(t) is set so as to satisfy the condition of Equation (5) in order not to change the entire level.

[Equations (4) to (6): illegible in the source scan.]

FIG. 14 shows another characteristic of the coefficient C(t) stored in the first memory for calculating the value M(t) for suppressing the level change of the input signal. This coefficient C(t) is shown in Equation (7). As known from this diagram, by limiting the coefficient C(t) only on the positive time axis, the amplification in the silent section after a vowel is decreased and the quantity of calculation is smaller.
As shown in Equation (9), by convolving this coefficient C(t) into the absolute value of the input signal S(t), the value of M(t) becomes large when the level after the time t is larger than the level at the time t, and the value of M(t) becomes small when the level after the time t is smaller than the level at the time t, and therefore by multiplying M(t) and the input signal, the level of the input signal is smoothed. That is, the coefficient C(t) has the characteristic of differentiating the rise of the input signal in two steps with respect to the time axis. However, the coefficient C(t) is set so as to satisfy the condition in Equation (8) in order not to change the entire level.

[Equations (7) to (9): illegible in the source scan.]

FIG. 15 shows the characteristic of the coefficient E(t) stored in the second memory for determining the level of the input signal. This coefficient E(t) is shown in Equation (10). As shown in Equation (12), by convolving this coefficient E(t) into the absolute value of the input signal, the absolute value of the input signal is smoothed, and the level of the input signal may be determined. That is, the coefficient E(t) has the characteristic of integrating on the time axis. However, in order not to change the entire level, the coefficient E(t) is set so as to satisfy the condition of Equation (11).

[Equations (10) to (12): illegible in the source scan.]

In the following Equation (13), the value G(t) obtained by applying the parameter α to A(t) is determined.

[Equation (13): illegible in the source scan.]

The parameter α is determined depending on the feature value, such as the kind of plosives or the kind of fricatives. When the parameter α is smaller, the duration of the value G(t) will be longer. On the other hand, when the parameter α is larger, the duration of the value G(t) will be shorter.
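As an illustration of the level-determination step described for FIG. 15, the following minimal sketch convolves a normalized smoothing kernel with the absolute value of a signal. It is not taken from Equations (10) to (12): the Gaussian kernel shape, the reading of "not to change the entire level" as the kernel summing to one, and the names `make_kernel`, `estimate_level`, `half_width`, and `sigma` are all assumptions made for illustration.

```python
import math


def make_kernel(half_width, sigma):
    """Build a symmetric smoothing kernel that sums to 1.

    The Gaussian shape is an assumption of this sketch, not the patent's E(t).
    """
    raw = [math.exp(-(k * k) / (2.0 * sigma * sigma))
           for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def estimate_level(signal, kernel):
    """Convolve the kernel with |signal|; samples outside the signal count as zero."""
    half = len(kernel) // 2
    levels = []
    for t in range(len(signal)):
        acc = 0.0
        for i, coeff in enumerate(kernel):
            idx = t + i - half
            if 0 <= idx < len(signal):
                acc += coeff * abs(signal[idx])
        levels.append(acc)
    return levels


# Hypothetical input: silence followed by an alternating full-scale signal.
kernel = make_kernel(half_width=3, sigma=1.5)
signal = [0.0] * 10 + [1.0, -1.0] * 10
levels = estimate_level(signal, kernel)
```

Because the kernel is normalized, the estimated level of a steady full-scale stretch stays close to 1, while the level in the silent stretch stays at 0; only the transition between them is smoothed.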
FIGS. 2A to 2D show waveforms respectively representing the original speech signal S(t) output from the first delay circuit 12, the compensation coefficient A(t) output from the coefficient calculating circuit 11, the compensation coefficient G(t) output from the coefficient control circuit 14, and the speech signal y(t) output from the multiplier 13.

FIG. 3 is a block diagram of the feature extracting circuit 15 for the speech signal processing apparatus of this embodiment of the present invention. Referring to FIG. 3, the feature extracting circuit 15 includes a second delay circuit 21 for delaying the input speech signal, a plosive extracting circuit 22 for deriving a feature value representing a feature of a plosive component from the speech signal, a pitch detector 23 for detecting the pitch of the speech signal, and a judgement circuit 24 for determining whether the speech signal is a plosive or not based on the output from the plosive extracting circuit 22 and the pitch detector 23.

The operation of the above feature extracting circuit 15 will be described.

The input speech signal is sent to the second delay circuit 21 and the pitch detector 23. The second delay circuit 21 receives the input speech signal, and delays the speech signal by a time d to output a delayed signal to the plosive extracting circuit 22. The plosive extracting circuit 22 receives the delayed signal, and derives a feature value representing a feature of a plosive component from the speech signal. The feature value extracted by the plosive extracting circuit 22 is sent to the judgement circuit 24. The feature value indicates whether the input speech signal includes a plosive or not. Further, the feature value may indicate what kind of plosives the input speech signal includes. The pitch detector 23 calculates the pitch frequency of the speech signal to determine whether the speech signal is sound or silent.
The output from the pitch detector 23 may indicate whether there exists a vowel after a consonant in the speech signal. The output from the pitch detector 23 is also sent to the judgement circuit 24. The judgement circuit 24 receives the feature value from the plosive extracting circuit 22 and the output from the pitch detector 23, and determines whether the feature value passes through the judgement circuit 24 depending on the output from the pitch detector 23. As a result, when both the output from the plosive extracting circuit 22 and the output from the pitch detector 23 are true, the judgement circuit 24 outputs a signal indicating whether the input speech signal includes a plosive or not. Further, the judgement circuit 24 may output a signal indicating the kind of plosives in the input speech signal.

Thus, according to this embodiment of the present invention, the feature value indicating whether a plosive is included in the input speech signal or not can be detected. Further, the feature value indicating what kind of plosives is included in the input speech signal can be detected. This makes it possible to control the duration of the compensation coefficient depending on the kinds of consonants used, such as plosives and fricatives. As a result, a speech signal processing apparatus can be provided which can control the compensation coefficient for providing the appropriate length of time period during which the input speech signal is to be amplified, depending on the kinds of the consonants having different VOTs.

Further, according to the feature extracting circuit 15 of this embodiment of the present invention, only a plosive pronounced immediately before a vowel is detected. This prevents other components of the speech signal from being mistakenly detected. It is possible that the feature extracting circuit 15 consists of only the plosive extracting circuit 22.
According to such a configuration, it is expected that the entire delay time due to the processing can be reduced, but the number of errors is increased.

EXAMPLE 2

FIG. 4 shows a block diagram of a plosive extracting circuit according to the present invention. Referring to FIG. 4, the plosive extracting circuit includes a first band pass filter (BPF_H) 31 which allows components of a speech signal having middle to high frequencies (hereinafter referred to as higher-frequency components) to pass therethrough, a second band pass filter (BPF_L) 32 which allows components thereof having low to middle frequencies (hereinafter referred to as lower-frequency components) to pass therethrough, and first and second average amplitude calculating circuits 33 and 34 for calculating an average amplitude in a short time period. The plosive extracting circuit further includes a divider 35, a threshold memory 37 for storing a constant as a threshold, a comparator 36 for comparing the output from the divider 35 with the output from the threshold memory 37, a constant memory 39 for storing durations of plosives and the like, a time-axis generator 40 for generating a clock signal, and a judgement circuit 38 for identifying the kind of plosives by comparing the output from the comparator 36 with the output from the constant memory 39 on the basis of the clock signal output from the time-axis generator 40.

The operation of the above plosive extracting circuit will be described.

An input speech signal is sent to the BPF_H 31 and the BPF_L 32. The BPF_H 31 allows higher-frequency components having a frequency in the range of 3.7 to 5 kHz, for example, to pass therethrough. The BPF_L 32 allows lower-frequency components having a frequency in the range of 100 to 900 Hz, for example, to pass therethrough.
The speech signals filtered through the BPF_H 31 and the BPF_L 32 are then sent to the first and the second average amplitude calculating circuits 33 and 34, respectively, where an average amplitude for a predetermined short time period is calculated. Then, the output from the first average amplitude calculating circuit 33 is divided by the output from the second average amplitude calculating circuit 34 by the divider 35, in order to obtain the ratio of the short-period average amplitude of the higher-frequency components to that of the lower-frequency components.

The threshold memory 37 stores a predetermined constant as a threshold. The comparator 36 compares the output from the divider 35 with the output from the threshold memory 37 so as to determine whether the former exceeds the latter or not, and sends the resulting data to the judgement circuit 38. The resulting data is represented by either one of two values. Specifically, only when the output from the divider 35 exceeds the constant stored in the threshold memory 37, the resulting data is a high value (e.g., 1), and otherwise the resulting data is a low value (e.g., 0). The constant memory 39 stores constants t_p, t_t, and t_k corresponding to the durations of the plosives /p/, /t/, and /k/, respectively. The time-axis generator 40 generates a clock signal having a predetermined cycle. The judgement circuit 38 compares the output from the comparator 36 with the output from the constant memory 39 on the basis of the clock signal output from the time-axis generator 40, and determines how long the ratio continues to exceed the threshold, thereby to identify the plosive.
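The divider and comparator stage described above can be sketched as follows. This is a minimal illustration, not the patented circuit: it assumes the two inputs have already been band-pass filtered, uses the mean absolute value over a fixed-length frame as the "average amplitude", and adds a small `eps` term to avoid division by zero. The function names, `frame_len`, `eps`, and the sample values are hypothetical.

```python
def average_amplitude(frame):
    """Mean absolute amplitude over one short frame."""
    return sum(abs(x) for x in frame) / len(frame)


def comparator_output(high_frame, low_frame, threshold, eps=1e-12):
    """Return 1 when the high/low average-amplitude ratio exceeds the threshold, else 0.

    eps guards against a silent low band (an assumption of this sketch).
    """
    ratio = average_amplitude(high_frame) / (average_amplitude(low_frame) + eps)
    return 1 if ratio > threshold else 0


def comparator_sequence(high_band, low_band, frame_len, threshold):
    """Apply the comparator frame by frame to two already band-filtered signals."""
    n_frames = min(len(high_band), len(low_band)) // frame_len
    out = []
    for k in range(n_frames):
        start, stop = k * frame_len, (k + 1) * frame_len
        out.append(comparator_output(high_band[start:stop],
                                     low_band[start:stop],
                                     threshold))
    return out


# Hypothetical band signals: a short burst of high-frequency energy in the middle.
high_band = [0.1] * 4 + [0.9] * 4 + [0.1] * 4
low_band = [0.5] * 12
flags = comparator_sequence(high_band, low_band, frame_len=4, threshold=1.0)
```

With these values only the middle frame has a ratio above the threshold, so the comparator output is high for that frame alone.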
In this example, the plosive is identified as /p/ when the high value output from the comparator 36 lasts for a period less than or equal to t_p, as /t/ when the high value output from the comparator 36 lasts for a period less than or equal to t_t but greater than t_p, and as /k/ when the high value output from the comparator 36 lasts for a period less than or equal to t_k but greater than t_t. When the high value output from the comparator 36 lasts for a period greater than t_k, it is determined that the speech signal is not a plosive.

FIG. 16 shows the process of extracting the kind of plosives from the input speech signal, using the plosive extracting circuit mentioned above. In step S161, the ratio of the short-period average amplitude of the higher-frequency components to that of the lower-frequency components is compared with a threshold value stored in the threshold memory 37. If YES in step S161, then a timer is initialized and starts (steps S162 and S163). The timer is used to measure how long the ratio continues to exceed the threshold value. While the ratio exceeds the threshold value, step S164 is repeated, and the time measured by the timer proceeds. If NO in step S164, the timer stops measuring the time so as to obtain a time period t which indicates how long the ratio continues to exceed the threshold value. If the time period t complies with tg
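The duration-based identification rule and the timer loop of FIG. 16 can be sketched as follows. This is an illustrative reading only: it assumes the stored constants are ascending upper bounds (written here as `t_p`, `t_t`, `t_k`), measures the first run of high comparator values in whole clock periods, and returns `None` both when no high value occurs and when the run is longer than `t_k`. The numeric constants are hypothetical and are not values from the patent.

```python
def first_high_duration(flags, clock_period):
    """Duration of the first run of consecutive 1s in the comparator output."""
    count = 0
    started = False
    for flag in flags:
        if flag == 1:
            started = True      # timer initialized and started
            count += 1          # timer keeps running while the ratio stays high
        elif started:
            break               # ratio fell below the threshold: stop the timer
    return count * clock_period


def classify_plosive(duration, t_p, t_t, t_k):
    """Map a measured duration to 'p', 't', 'k', or None (not a plosive)."""
    if duration <= 0:
        return None             # assumption: no high value means no plosive
    if duration <= t_p:
        return "p"
    if duration <= t_t:
        return "t"
    if duration <= t_k:
        return "k"
    return None


# Hypothetical constants in milliseconds, chosen only for this example.
T_P, T_T, T_K = 10, 20, 40
flags = [0, 1, 1, 1, 0, 1]
duration = first_high_duration(flags, clock_period=5)
kind = classify_plosive(duration, T_P, T_T, T_K)
```

Here the first high run spans three clock periods of 5 ms, giving 15 ms, which falls between the hypothetical `T_P` and `T_T` bounds and is therefore labelled /t/.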
