A Review On Recent Trends and Development in Speech Recognition System



Jour of Adv Research in Dynamical & Control Systems, Vol. 12, 01-Special Issue, 2020

A Review on Recent Trends and Development in Speech Recognition System
A. Joshuva*, Department of Mechanical Engineering, Hindustan Institute of Technology and Science, Chennai, Tamilnadu,
India. E-mail: joshuva1991@gmail.com
S. Priyadharsini, Department of Mechanical Engineering, Rajalakshmi Institute of Technology, Chennai, Tamilnadu, India.
E-mail: priyadharsini579@gmail.com
S. Aravinth, Department of Mechanical Engineering, Rajalakshmi Institute of Technology, Chennai, Tamilnadu, India.
E-mail: aravinthkumar315@gmail.com
P. Jayaraman, Department of Mechanical Engineering, Prathyusha Engineering College, Chennai, Tamilnadu, India.
E-mail: jayaramansharmila@gmail.com
K. Balachandar, Department of Mechanical Engineering, Prathyusha Engineering College, Chennai, Tamilnadu, India.
E-mail: balachandar3089@gmail.com
D. Meganathan, Department of Mechanical Engineering, Prathyusha Engineering College, Chennai, Tamilnadu, India.
E-mail: d.meganathan17@gmail.com
Abstract--- Speech recognition is the method used to analyse the verbal content of an audio signal and convert it
into a machine-understandable format, so that the system effectively understands the speech. Speech recognition is
an active research topic: a great deal of work has been carried out on recognising speech and translating it into text,
and the accuracy of speech recognition systems remains a challenging problem in the field. Voice-controlled
interfaces can now be found in everyday devices such as mobile phones, televisions and even cars, and several
software packages let users dictate to their computer and have their words converted to text in a word-processor
format. In this procedure, the speech waveform captured by a microphone is automatically converted into a
sequence of words. This review presents a detailed study of automatic speech recognition and the techniques
behind it.
Keywords--- Speech Recognition, Feature Extraction, Machine Learning, Filter, Signal Processing.

I. Introduction
Speech recognition, also called automatic speech recognition, is the process of converting speech into a form the
computer can process, or some binary representation, in order to carry out a particular task [1-3]. Speech
recognition is now used in a variety of fields for both domestic and industrial purposes. The enormous rise of the
smartphone industry gave birth to AI assistants such as SIRI, CORTANA and ALEXA, which can understand what
the user says and act accordingly [4-6]. This is by no means a simple task, since voice and accent are not consistent
across users, so natural-language characteristics and patterns have to be studied in order to deliver an effective
speech recognition system [7-10]. Different speech deliveries, styles, accents and semantics must be analysed
properly to improve a speech recognition framework. Even so, these systems still have drawbacks that are common
in real-world scenarios. Speech recognition is also used in service delivery, automated identification,
telecommunication services and so on. This has greatly reduced the manpower large firms need to communicate
with customers and respond to their enquiries [11-14].

II. Speech Recognition Process


There are several modules in speech recognition that work together to obtain the desired data:
• Voice Acquisition
• Noise Filtering
• Identifying ROI
• Speech Recognition
• Determining results

DOI: 10.5373/JARDCS/V12SP1/20201099
ISSN 1943-023X 521
Received: 17 Nov 2019/Accepted: 14 Dec 2019

2.1. Voice Acquisition


Any voice recognition system needs an input audio signal containing verbal language. This signal is captured by
a suitable microphone transducer, which converts sound waves into electrical signals [15]. Different kinds of
microphone convert energy in different ways, but they all share one component: the diaphragm, a delicate piece of
elastic material that vibrates when struck by sound waves. In a typical handheld microphone the diaphragm is
situated at the front of the microphone; when it vibrates, it produces corresponding deflections inside the
microphone [16]. These vibrations are converted into an electrical signal, which becomes the audio signal. Audio
signals have several parameters, such as amplitude, frequency, pitch and speed, which allow us to distinguish
between different voices and the variations within them [17].
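The amplitude and frequency parameters mentioned above can be estimated directly from a digitised capture. The following minimal NumPy sketch is illustrative only: it synthesises a tone in place of a real microphone capture, then computes the RMS amplitude and the dominant frequency.

```python
import numpy as np

fs = 16000                                   # sampling rate in Hz
t = np.arange(fs) / fs                       # one second of time stamps
# stand-in for a waveform captured by a microphone transducer
signal = 0.5 * np.sin(2 * np.pi * 440.0 * t)

# amplitude parameter: root-mean-square level of the capture
rms = np.sqrt(np.mean(signal ** 2))

# frequency parameter: location of the strongest spectral peak
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
dominant_hz = freqs[np.argmax(spectrum)]

print(round(rms, 3), dominant_hz)            # ~0.354 and 440.0
```

On a real capture the same two lines of spectral analysis would run on the array returned by the sound card rather than a synthesised sine.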
2.2. Noise Filtering
Audio signals acquired from an external environment contain many noises that prevent, disrupt or interfere with
the data signal [18]. In many cases this causes loss of the desired data and introduces unwanted signals into
processing. To avoid this, steady noises and other low-frequency vibrations have to be excluded from the system.
These noises need not be produced by external sources; electrical components also generate noise, such as thermal
and electromagnetic noise, that can interfere with transmission and reception [19]. Common filter designs include:
• Butterworth filter
• Chebyshev filter
• Elliptical filter
2.2.1. Butterworth Filter
The Butterworth filter is a signal-processing design whose frequency response is as flat as possible in the
passband; for this reason it is also called a maximally flat magnitude filter [20]. Butterworth filters are among the
most widely used digital filters in motion analysis and in voice circuits.
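A maximally flat low-pass response of this kind can be sketched as a second-order Butterworth biquad, using the standard audio-EQ "cookbook" coefficient formulas. This is an illustrative NumPy implementation, not the filter of any particular system described in the reviewed works.

```python
import numpy as np

def butterworth_lowpass(x, cutoff_hz, fs):
    """2nd-order Butterworth low-pass biquad (Q = 1/sqrt(2))."""
    w0 = 2 * np.pi * cutoff_hz / fs
    alpha = np.sin(w0) / (2 * (1 / np.sqrt(2)))
    b0 = (1 - np.cos(w0)) / 2
    b1 = 1 - np.cos(w0)
    b2 = b0
    a0 = 1 + alpha
    a1 = -2 * np.cos(w0)
    a2 = 1 - alpha
    b = np.array([b0, b1, b2]) / a0          # normalised feed-forward taps
    a = np.array([a1, a2]) / a0              # normalised feedback taps
    y = np.zeros_like(x)
    for n in range(len(x)):                  # direct-form difference equation
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[0] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[1] * y[n - 2]
    return y

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 3000 * t)
y = butterworth_lowpass(x, 500.0, fs)        # keeps 100 Hz, rejects 3000 Hz
```

With a 500 Hz cutoff, the 100 Hz component passes essentially unchanged while the 3000 Hz component is strongly attenuated.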
2.2.2. Chebyshev Filter
Chebyshev filters are analog or digital filters that achieve a sharper transition between the passband and the
stopband, with small errors and fast execution [21].
2.2.3. Elliptic Filter
The elliptic filter is a signal-conditioning design with equalised ripple in both the passband and the stopband
[22]. As the stopband ripple approaches zero, the filter becomes a type I Chebyshev filter; as the passband ripple
approaches zero, it becomes a type II Chebyshev filter; and as both ripple values approach zero, it becomes a
Butterworth filter.
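The trade-off among these designs can be seen in their closed-form analog prototype magnitude responses. The sketch below is illustrative (the elliptic response requires Jacobi elliptic functions and is omitted); it compares a 4th-order Butterworth with a 4th-order Chebyshev type I at 1 dB passband ripple.

```python
import numpy as np

def butter_mag(x, n=4):
    """|H| of an n-th order Butterworth prototype; x = omega / omega_c."""
    return 1.0 / np.sqrt(1.0 + x ** (2 * n))

def cheby1_mag(x, ripple_db=1.0):
    """|H| of a 4th-order Chebyshev type I prototype with given passband ripple."""
    eps2 = 10 ** (ripple_db / 10.0) - 1.0
    t4 = 8 * x ** 4 - 8 * x ** 2 + 1         # Chebyshev polynomial T4(x)
    return 1.0 / np.sqrt(1.0 + eps2 * t4 ** 2)

# just past the cutoff, the Chebyshev response already falls much faster ...
print(butter_mag(1.5), cheby1_mag(1.5))      # ~0.194 vs ~0.083

# ... at the price of ripple inside the passband (Butterworth stays flat)
grid = np.linspace(0.0, 1.0, 1001)
print(cheby1_mag(grid).min())                # ~0.891, i.e. dips of 1 dB
```

The elliptic filter pushes this trade even further: by allowing ripple in both bands it achieves the narrowest transition for a given order.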
2.3. Identifying ROI
The region of interest (ROI) is the part of the audio signal where the verbal content is expected to be found.
Using a Fourier transform on the convoluted signal, the required portion of the sound is separated by Fourier-based
signal conditioning. This further helps to remove any noise present in the system. After separation from the
convoluted signal, the ROI is obtained with its particular parameters and can be sent for processing [23].
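A simplified, energy-based variant of this idea (short-time energy thresholding rather than a full Fourier separation) can be sketched as follows; the signal is synthetic and the numbers are illustrative.

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
# synthetic capture: 0.3 s of silence, 0.3 s of "speech", then silence again
x = np.where((t >= 0.3) & (t < 0.6), np.sin(2 * np.pi * 300 * t), 0.0)

frame = 160                                  # 20 ms analysis frames
energy = np.array([np.sum(x[i:i + frame] ** 2)
                   for i in range(0, len(x) - frame + 1, frame)])
active = np.where(energy > 0.1 * energy.max())[0]

roi_start = active[0] * frame / fs           # ~0.30 s
roi_end = (active[-1] + 1) * frame / fs      # ~0.60 s
```

The frames whose energy exceeds the threshold delimit the ROI, which is then handed to the later processing stages.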
2.4. Speech Recognition
Speech recognition is the task of converting an audio signal into its symbolic representation: the process of
translating spoken words into binary language [24]. To convert speech to on-screen text or a command, the system
has to go through several complex steps. When a person speaks, he creates vibrations in the air, producing an
analog signal. An analog-to-digital converter translates this analog signal into machine data that the computer can
process: exact measurements of the wave are taken at regular intervals to digitise the sound [25]. In speech
processing, the system filters the digitised sound to remove unwanted noise and separates it into different bands of
frequency. It also normalises the sound to a constant volume. The software compares each phonetic unit with the
phonetic units around it, runs the relevant phonetics through a complex statistical model in its library, and compares
them against a large library of known words, expressions and sentences [26]. The program then works out what the
client was most likely saying and either outputs it as text or issues a system command.
The processed speech is then passed through a set of pattern-matching algorithms that select the best-fitting
audio signal and its corresponding verbal code [27]. The sound is split into parts and analysed for similarities in the
pattern samples. This is done using feature extraction, where a particular set of features, such as amplitude, pitch or
other variations in these parameters, helps to determine the underlying word. In this way, all the information is
extracted from the speech signal [28].
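One of the features mentioned above, pitch, can be estimated from a single frame by autocorrelation, since the autocorrelation of a voiced frame peaks at multiples of the pitch period. An illustrative NumPy sketch:

```python
import numpy as np

fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 200.0 * t)        # a voiced frame at 200 Hz

# autocorrelation of the frame; peaks at multiples of the pitch period
ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]

lo, hi = fs // 400, fs // 50                 # search lags covering 50-400 Hz pitch
period = lo + np.argmax(ac[lo:hi])           # lag of the strongest in-range peak
pitch_hz = fs / period                       # ~200 Hz
```

Feature vectors built from such quantities (pitch, frame energy, spectral shape) are what the pattern-matching stage actually compares.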
2.5. Determining Results
The processed speech is now sent to a machine-learning classifier, which chooses the pattern that best matches
the original signal.
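As a toy illustration of this matching step (the feature values and the two-word vocabulary here are invented), a nearest-template classifier can be written as:

```python
import numpy as np

# hypothetical stored feature templates for a two-word vocabulary
templates = {
    "yes": np.array([0.8, 0.1]),
    "no":  np.array([0.2, 0.9]),
}

def classify(features):
    """Return the vocabulary word whose template is nearest to the features."""
    return min(templates, key=lambda word: np.linalg.norm(templates[word] - features))

print(classify(np.array([0.7, 0.2])))        # -> yes
```

Real systems replace the Euclidean distance with statistical scores (e.g. HMM likelihoods), but the selection principle is the same.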

III. Phases of Speech Recognition


3.1. Speech Signal
Speech signals are highly non-stationary, containing a wide range of frequencies generated over time. The most
common time-frequency representations are characterised by a trade-off between time and frequency resolution
[29]. A speech signal is obtained when the words spoken by a person are captured as sound (analog vibrations in
air) and digitised using a microphone.
3.2. Signal Processing
Speech processing is an application of digital signal processing. To represent a speech signal in digital form, the
signal is sampled at periodic intervals in time. Representations of the speech signal can be classified as waveform
representations and parametric representations [30]. Speech processing is used in many areas, for example speech
recognition, speaker recognition and speech synthesis.
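The waveform representation can be illustrated by sampling and quantising a signal to 16-bit PCM (an illustrative sketch; the "analog" signal is synthesised):

```python
import numpy as np

fs = 8000                                     # samples per second
t = np.arange(fs) / fs
analog = 0.8 * np.sin(2 * np.pi * 440.0 * t)  # idealised continuous signal

# waveform representation: periodic samples quantised to 16-bit integers
pcm = np.round(analog * 32767).astype(np.int16)
recovered = pcm / 32767.0

quantisation_error = np.max(np.abs(recovered - analog))  # below one LSB
```

A parametric representation would instead store model parameters (e.g. pitch, spectral envelope) from which the waveform can be regenerated.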
3.3. Speaker Recognition
Once audio signals are fed into the program, they can be used for several purposes. One of these is recognising
the voice, that is, the speaker. This is achieved by comparing parameters such as pitch against the frequencies
stored in the system. It is similar to the case of word recognition, where the words spoken by the speaker are
compared with previously stored data. Such systems are commonly used as security protocols or identification
programs where actual identification of a person is needed. These programs can also serve as a tool for other
audio-processing operations [31].
3.4. Speech Coding
Speech coding is the data compression of digital audio signals containing speech. It uses speech-specific
parameters, estimated with audio signal processing techniques, to model the speech signal. Speech coding is
mainly used in telecommunication to improve the signal-to-noise ratio of transmission and reception [32]. It is the
process of obtaining a representation of voice signals for transmission over wired and wireless channels or for
storage: the art of creating a representation of the speech signal that can be efficiently transmitted or stored in
digital media. A speech coder converts a digitised speech signal into a coded representation (coded frames); a
speech decoder receives coded frames and synthesises the reconstructed speech [33]. The standards normally
describe the input-output relationships of both coder and decoder.
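A classic instance of such speech-specific coding is μ-law companding (the basis of G.711 telephony). The sketch below is an illustrative NumPy version, with 8-bit quantisation of the companded value standing in for the coded frames:

```python
import numpy as np

MU = 255.0  # standard mu-law companding constant

def mu_encode(x):
    """Compand a signal in [-1, 1] (coder side)."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_decode(y):
    """Expand a companded signal back (decoder side)."""
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

x = 0.1 * np.sin(2 * np.pi * np.linspace(0, 1, 800))  # quiet speech-like signal
coded = np.round(mu_encode(x) * 127) / 127            # 8-bit "coded frames"
reconstructed = mu_decode(coded)
```

Because companding allocates more quantisation levels to quiet samples, the reconstruction error for low-level speech stays small even at 8 bits.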
3.5. Decoding
The decoder takes the sequences of probability estimates and compares them against models of every possible
utterance in the language; it then outputs the most likely utterance [34]. The decoder is typically implemented as a
search using well-known techniques such as the Hidden Markov Model (HMM). Hidden Markov Models are a
standard mathematical technique, widely used both for describing models of existing systems and for generating
tests. An HMM is a statistical model in which the system being modelled is assumed to be a Markov process with
unknown parameters, and the task is to determine the hidden parameters from the observable ones [35].
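The search the decoder performs can be illustrated with the Viterbi algorithm on a toy two-state HMM (all probabilities below are invented for illustration):

```python
import numpy as np

# toy HMM: 2 hidden states, 2 observation symbols
pi = np.array([0.6, 0.4])                 # initial state probabilities
A = np.array([[0.7, 0.3],                 # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # emission probabilities
              [0.2, 0.8]])

def viterbi(obs):
    """Return the most likely hidden state sequence for the observations."""
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))              # best path probability per state
    psi = np.zeros((T, n), dtype=int)     # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(n):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[psi[t, j]] * B[j, obs[t]]
    state = int(np.argmax(delta[-1]))     # best final state, then backtrack
    path = [state]
    for t in range(T - 1, 0, -1):
        state = int(psi[t, state])
        path.append(state)
    return path[::-1]

print(viterbi([0, 0, 1, 1]))              # -> [0, 0, 1, 1]
```

In a real recogniser the states correspond to phonetic units and the observations to acoustic feature vectors, but the dynamic-programming search is the same.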
3.6. Speech Synthesis
Speech synthesis converts written text into speech. The vocabulary should not be restricted for speech synthesis,
and the synthesised speech must be close enough to natural speech. Speech synthesis is the artificial production of
human speech [36]. A computer system used for this purpose is known as a speech synthesizer and can be
implemented in software or hardware. A text-to-speech system converts semantic representations, such as phonetic
transcriptions, into complete speech. Synthesised speech can be created by concatenating pieces of recorded speech
stored in a database; alternatively, a synthesizer can combine a model of the vocal tract and other human-voice
characteristics to create a completely "synthetic" voice output [37].
3.7. Automatic Speech Recognition
Automatic speech recognition is an emerging technique that helps a machine recognise human speech [38].
Much research is going on into building models for recognising speech and converting it into text; this paper
summarises the various types and techniques pursued. Automatic speech recognition is gaining importance these
days, as most mobile phones ship with applications that make it easy for the user to place a call or type a message
by voice. An automatic speech recognition system contains modules for speech signal acquisition [39-41], feature
extraction [42-73], acoustic modelling [74-77] and language modelling [78-80]. Fig. 1 shows the block diagram of
automatic speech understanding.

Figure 1: Automatic Speech Understanding Block Diagram

IV. Conclusion
Speech recognition has become a very important factor in communication, enabling people to share or obtain
needed information. Man-machine communication is carried out between machine and human, where they interact
with each other with the help of speech recognition components. This review focused on man-machine interaction
through speech recognition, where an AI can recognise the speech and respond to the person in that instant.

References
[1] Xiong W, Wu L, Alleva F, Droppo J, Huang X, Stolcke A. The Microsoft 2017 conversational speech
recognition system. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP) 2018 Apr 15 (pp. 5934-5938). IEEE.
[2] Chiu CC, Sainath TN, Wu Y, Prabhavalkar R, Nguyen P, Chen Z, Kannan A, Weiss RJ, Rao K, Gonina E,
Jaitly N. State-of-the-art speech recognition with sequence-to-sequence models. In2018 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP) 2018 Apr 15 (pp. 4774-4778). IEEE.
[3] Afouras T, Chung JS, Senior A, Vinyals O, Zisserman A. Deep audio-visual speech recognition. IEEE
transactions on pattern analysis and machine intelligence. 2018 Dec 21.
[4] Mustafa MK, Allen T, Appiah K. A comparative review of dynamic neural networks and hidden Markov
model methods for mobile on-device speech recognition. Neural Computing and Applications. 2019 Feb
13;31(2):891-9.
[5] Shrivastava N, Saxena A, Kumar Y, Shah RR, Mahata D, Stent A. MobiVSR: A Visual Speech
Recognition Solution for Mobile Devices. arXiv preprint arXiv:1905.03968. 2019 May 10.
[6] He Y, Sainath TN, Prabhavalkar R, McGraw I, Alvarez R, Zhao D, Rybach D, Kannan A, Wu Y, Pang R,
Liang Q. Streaming End-to-end Speech Recognition For Mobile Devices. In ICASSP 2019-2019 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019 May 12 (pp. 6381-
6385). IEEE.
[7] Barker J, Watanabe S, Vincent E, Trmal J. The fifth 'CHiME' Speech Separation and Recognition
Challenge: Dataset, task and baselines. arXiv preprint arXiv:1803.10609. 2018 Mar 28.
[8] Warden P. Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint
arXiv:1804.03209. 2018 Apr 9.
[9] Chan W, Jaitly N, Le QV, Vinyals O, Shazeer NM, inventors; Google LLC, assignee. Speech recognition
with attention-based recurrent neural networks. United States patent US 9,990,918. 2018 Jun 5.
[10] Zeyer A, Irie K, Schlüter R, Ney H. Improved training of end-to-end attention models for speech
recognition. arXiv preprint arXiv:1805.03294. 2018 May 8.
[11] Beckley JD, Aggarwal P, Balasubramanyam S, inventors; Qualcomm Inc, assignee. Method and systems
having improved speech recognition. United States patent US 9,881,616. 2018 Jan 30.
[12] Audhkhasi K, Kingsbury B, Ramabhadran B, Saon G, Picheny M. Building competitive direct acoustics-to-
word models for english conversational speech recognition. In2018 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP) 2018 Apr 15 (pp. 4759-4763). IEEE.
[13] Ravanelli M, Brakel P, Omologo M, Bengio Y. Light gated recurrent units for speech recognition. IEEE
Transactions on Emerging Topics in Computational Intelligence. 2018 Mar 23;2(2):92-102.
[14] Petridis S, Stafylakis T, Ma P, Cai F, Tzimiropoulos G, Pantic M. End-to-end audiovisual speech
recognition. In2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2018 Apr 15 (pp. 6548-6552). IEEE.
[15] Omiya Y, Hagiwara N, Shinohara S, Nakamura M, Higuchi M, Mitsuyoshi S, Takayama E, Tokuno S. The
Influence of the Voice Acquisition Method to the Mental Health State Estimation Based on Vocal Analysis.
InWorld Congress on Medical Physics and Biomedical Engineering 2018 2019 (pp. 327-330). Springer,
Singapore.
[16] Lee CH, Lee J, Rosario D, Kim E, Chan T, inventors; Volkswagen AG, assignee. Voice command
acquisition system and method. United States patent US 8,285,545. 2012 Oct 9.
[17] Vogel AP, Maruff P. Comparison of voice acquisition methodologies in speech research. Behavior
research methods. 2008 Nov 1;40(4):982-7.
[18] Saito Y, inventor; Murata Manufacturing Co Ltd, assignee. Noise filter. United States patent US 4,636,752.
1987 Jan 13.
[19] Petrovic NI, Crnojevic V. Universal impulse noise filter based on genetic programming. IEEE Transactions
on Image Processing. 2008 May 28;17(7):1109-20.
[20] Garg A, Sahu OP. A hybrid approach for speech enhancement using Bionic wavelet transform and
Butterworth filter. International Journal of Computers and Applications. 2019 May 22:1-1.
[21] Chen B, Yu S, Yu Y, Guo R. Nonlinear active noise control system based on correlated EMD and
Chebyshev filter. Mechanical Systems and Signal Processing. 2019 Sep 1;130:74-86.
[22] Podder P, Hasan MM, Islam MR, Sayeed M. Design and implementation of Butterworth, Chebyshev-I and
elliptic filter for speech signal analysis. International Journal of Computer Applications. 2014 Jan 1;98(7).
[23] Gadonniex S, Banta CE, Prater DM, inventors; HP Inc, assignee. Quick method and apparatus for
identifying a region of interest in an ultrasound display. United States patent US 5,538,003. 1996 Jul 23.
[24] Novoa J, Escudero JP, Wuth J, Poblete V, King S, Stern R, Yoma NB. Exploring the robustness of features
and enhancement on speech recognition systems in highly-reverberant real environments. arXiv preprint
arXiv:1803.09013. 2018 Mar 23.
[25] Lippmann R, Martin E, Paul D. Multi-style training for robust isolated-word speech recognition.
InICASSP'87. IEEE International Conference on Acoustics, Speech, and Signal Processing 1987 Apr 6
(Vol. 12, pp. 705-708). IEEE.
[26] Toshniwal S, Sainath TN, Weiss RJ, Li B, Moreno P, Weinstein E, Rao K. Multilingual speech recognition
with a single end-to-end model. In2018 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP) 2018 Apr 15 (pp. 4904-4908). IEEE.
[27] Weng C, Cui J, Wang G, Wang J, Yu C, Su D, Yu D. Improving Attention Based Sequence-to-Sequence
Models for End-to-End English Conversational Speech Recognition. In Interspeech 2018 Sep (pp. 761-
765).
[28] Huang KY, Wu CH, Hong QB, Su MH, Chen YH. Speech Emotion Recognition Using Deep Neural
Network Considering Verbal and Nonverbal Speech Sounds. InICASSP 2019-2019 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019 May 12 (pp. 5866-5870). IEEE.


[29] Sakar CO, Serbes G, Gunduz A, Tunc HC, Nizam H, Sakar BE, Tutuncu M, Aydin T, Isenkul ME,
Apaydin H. A comparative analysis of speech signal processing algorithms for Parkinson’s disease
classification and the use of the tunable Q-factor wavelet transform. Applied Soft Computing. 2019 Jan
1;74:255-63.
[30] Naik DK, Kajarekar S, inventors; Apple Inc, assignee. Robust end-pointing of speech signals using speaker
recognition. United States patent application US 10/186,282. 2019 Jan 22.
[31] Berg KA, Noble JH, Dawant B, Dwyer R, Labadie R, Gifford RH. Effect of number of channels and
speech coding strategy on speech recognition in mid-scala electrode recipients. The Journal of the
Acoustical Society of America. 2019 Mar;145(3):1796-7.
[32] Yoshimura T, Hashimoto K, Oura K, Nankaku Y, Tokuda K. Speaker-dependent Wavenet-based Delay-
free Adpcm Speech Coding. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP) 2019 May 12 (pp. 7145-7149). IEEE.
[33] Losorelli S, Kaneshiro B, Musacchia GA, Blevins NH, Fitzgerald MB. Decoding Speech and Music
Stimuli from the Frequency Following Response. BioRxiv. 2019 Jan 1:661066.
[34] Zhang X, Zexin LI, Miao L, inventors; Huawei Technologies Co Ltd, assignee. Speech/audio bitstream
decoding method and apparatus. United States patent application US 10/269,357. 2019 Apr 23.
[35] Anumanchipalli GK, Chartier J, Chang EF. Speech synthesis from neural decoding of spoken sentences.
Nature. 2019 Apr;568(7753):493.
[36] Abdelaziz AH, Theobald BJ, Binder J, Fanelli G, Dixon P, Apostoloff N, Weise T, Kajareker S. Speaker-
Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models. arXiv
preprint arXiv:1905.06860. 2019 May 15.
[37] Acero A, Stern RM. Environmental robustness in automatic speech recognition. In International
Conference on Acoustics, Speech, and Signal Processing 1990 Apr 3 (pp. 849-852). IEEE.
[38] Buck M, Haulick T, Pfleiderer HJ. Self-calibrating microphone arrays for speech signal acquisition: A
systematic approach. Signal processing. 2006 Jun 1;86(6):1230-8.
[39] Fischer S, Simmer KU. Beamforming microphone arrays for speech acquisition in noisy environments.
Speech communication. 1996 Dec 1;20(3-4):215-27.
[40] Herbordt W, Kellermann W. Adaptive beamforming for audio signal acquisition. In Adaptive Signal
Processing 2003 (pp. 155-194). Springer, Berlin, Heidelberg.
[41] Li S, Ding C, Lu X, Shen P, Kawahara T. End-to-End Articulatory Attribute Modeling for Low-resource
Multilingual Speech Recognition. Proc. Interspeech 2019. 2019:2145-9.
[42] Joshuva A, Sugumaran V. A Lazy Learning Approach for Condition Monitoring of Wind Turbine Blade
Using Vibration Signals and Histogram Features. Measurement. 2019 Nov 23:107295.
[43] Rajamanickam SK, Ravichandran V, Sattanathan S, Ganapathy D, Dhanraj JA. Experimental Investigation
on Mechanical Properties and Vibration Damping Frequency Factor of Kenaf Fiber Reinforced Epoxy
Composite. SAE Technical Paper; 2019 Oct 11:153086.
[44] Joshuva A, Aslesh AK, Sugumaran V. State of the Art of Structural Health Monitoring of Wind Turbines.
International Journal of Mechanical and Production Engineering Research and Development.
2019;9(5):95-112.
[45] Joshuva A, Deenadayalan G, Sivakumar S, Sathishkumar R, Vishnuvardhan R. Implementing Rotation
Forest for Wind Turbine Blade Fault Diagnosis. International Journal of Recent Technology and
Engineering. 2019;8(2 Special Issue 11):185-192.
[46] Joshuva A, Vishnuvardhan R, Deenadayalan G, Sathishkumar R, Sivakumar S. Implementation of Rule
based Classifiers for Wind Turbine Blade Fault Diagnosis Using Vibration Signals. International Journal
of Recent Technology and Engineering. 2019;8(2 Special Issue 11):320-331.
[47] Joshuva A, Deenadayalan G, Sivakumar S, Sathishkumar R, Vishnuvardhan R. Logistic Model Tree
Classifier for Condition Monitoring of Wind Turbine Blades. International Journal of Recent Technology
and Engineering. 2019;8(2 Special Issue 11):202-209.
[48] Joshuva A, Sivakumar S, Vishnuvardhan R, Deenadayalan G, Sathishkumar R. Research on Hyper pipes
and Voting Feature Intervals Classifier for Condition Monitoring of Wind Turbine Blades Using Vibration
Signals. International Journal of Recent Technology and Engineering. 2019;8(2 Special Issue 11):310-319.
[49] Joshuva A, Sivakumar S, Sathishkumar R, Deenadayalan G, Vishnuvardhan R. Fault Diagnosis of Wind
Turbine Blades Using Histogram Features through Nested Dichotomy Classifiers. International Journal of
Recent Technology and Engineering. 2019;8(2 Special Issue 11):193-201.
[50] Kumar RS, Sivakumar S, Joshuva A, Deenadayalan G, Vishnuvardhan R. Data set on optimization of ethyl
ester production from sapota seed oil. Data in brief. 2019 Aug 1;25:104388.


[51] Joshuva A, Sugumaran V. Improvement in wind energy production through condition monitoring of wind
turbine blades using vibration signatures and ARMA features: a data-driven approach. Progress in
Industrial Ecology, an International Journal. 2019 Jun 21;13(3):207-231.
[52] Joshuva A, Sugumaran V. Selection of a meta classifier-data model for classifying wind turbine blade fault
conditions using histogram features and vibration signals: a data-mining study. Progress in Industrial
Ecology, an International Journal. 2019 Jun 21;13(3):232-51.
[53] Joshuva A, Sugumaran V. Crack Detection and Localization on Wind Turbine Blade Using Machine
Learning Algorithms: A Data Mining Approach. StructDurab Health Monit (SDHM). 2019;13(2):181-203.
[54] Joshuva A, Sugumaran V. Fault Diagnosis and Localization of Wind Turbine Blade [dissertation]. Vellore
Institute of Technology, Chennai Campus. 2018.
[55] Joshuva A, Sugumaran V. A machine learning approach for condition monitoring of wind turbine blade
using autoregressive moving average (ARMA) features through vibration signals: a comparative study.
Progress in Industrial Ecology, an International Journal. 2018;12(1-2):14-34.
[56] Joshuva A, Sugumaran V. A Comparative Study for Condition Monitoring on Wind Turbine Blade using
Vibration Signals through Statistical Features: a Lazy Learning Approach. International Journal of
Engineering & Technology. 2018;7(4.10):190-196.
[57] Vasudha M, Harshal P, Joshuva A, Sugumaran V. Effect of sampling frequency and sample length on fault
diagnosis of wind turbine blade. Pakistan Journal of Biotechnology. 2018;15(Special Issue ICRAME
17):14-17.
[58] Manju BR, Joshuva A, Sugumaran V. A data mining study for condition monitoring on wind turbine blades
using Hoeffding tree algorithm through statistical and histogram. International Journal of Mechanical
Engineering and Technology. 2018;9(1):1061-1079.
[59] Joshuva A, Sugumaran V. A study of various blade fault conditions on a wind turbine using vibration
signals through histogram features. Journal of Engineering Science and Technology. 2018 Jan;13(1):102-
121.
[60] Joshuva A, Sugumaran V. Classification of Various Wind Turbine Blade Faults through Vibration Signals
Using Hyperpipes and Voting Feature Intervals Algorithm. International Journal of Performability
Engineering. 2017;13(3):247-258.
[61] Joshuva A, Sugumaran V. Fault Diagnosis for Wind Turbine Blade through Vibration Signals Using
Statistical Features and Random Forest Algorithm. International Journal of Pharmacy and Technology.
2017;9(1):28684-28696.
[62] Joshuva A, Sugumaran V. Wind Turbine Blade Fault Diagnosis Using Vibration Signals and Statistical
Features through Nested Dichotomy Classifiers. International Journal of Pharmacy and Technology.
2017;9(1):29147-29164.
[63] Joshuva A, Sugumaran V. A data driven approach for condition monitoring of wind turbine blade using
vibration signals through best-first tree algorithm and functional trees algorithm: A comparative study. ISA
transactions. 2017 Mar 1;67:160-172.
[64] Joshuva A, Sugumaran V. A comparative study of Bayes classifiers for blade fault diagnosis in wind
turbines through vibration signals. StructDurab Health Monit (SDHM). 2017;12(1):69-90.
[65] Joshuva A, Sugumaran V. Wind turbine blade fault diagnosis using vibration signals through decision tree
algorithm. Indian Journal of Science and Technology. 2016 Dec;9(48):1-7.
[66] Joshuva A, Sugumaran V, Amarnath M, Lee SK. Remaining life-time assessment of gear box using
regression model. Indian Journal of Science and Technology. 2016 Dec;9(47):1-8.
[67] Joshuva A, Sugumaran V. Failure Analysis on Wind Blade Using Vibration Signals and Classifying the
Failures Using Logit Boost Algorithm. International Journal of Control Theory and Applications.
2016;9(52):225-234.
[68] Joshuva A, Sugumaran V. Multiclass Classifier Approach for Fault Diagnosis of Wind Turbine Blade
Using Vibration Signals through Statistical Analysis. International Journal of Control Theory and
Applications. 2016;9(52):235-247.
[69] Joshuva A, Sugumaran V. Fault Diagnosis of Wind Turbine Blade Using Vibration Signals through J48
Decision Tree Algorithm and Random Tree Classifier. International Journal of Control Theory and
Applications. 2016;9(52):249-258.
[70] Joshuva A, Sugumaran V. Fault diagnostic methods for wind turbine: A review. ARPN Journal of
Engineering and Applied Sciences. 2016 Apr 11;11(7):4654-4668.


[71] Joshuva A, Sugumaran V, Amarnath M. Selecting kernel function of Support Vector Machine for fault
diagnosis of roller bearings using sound signals through histogram features. International Journal of
Applied Engineering Research. 2015;10(68):482-487.
[72] Dhanraj AA, Selvamony C, Joshuva A. An experiment investigation on concrete by using E-waste as fine
aggregate and enhanced the thermal insulation and ultrasonic properties. 2017;8(12):392-399.
[73] Joshuva A, Sugumaran V. Speech recognition for humanoid robot. International Journal of Applied
Engineering Research. 2015;10(68):57-60.
[74] Tahir MA, Huang H, Zeyer A, Schlüter R, Ney H. Training of reduced-rank linear transformations for
multi-layer polynomial acoustic features for speech recognition. Speech Communication. 2019 Jul
1;110:56-63.
[75] von Platen P, Zhang C, Woodland P. Multi-Span Acoustic Modelling using Raw Waveform Signals. arXiv
preprint arXiv:1906.11047. 2019 Jun 21.
[76] Adiga N, Prasanna SR. Acoustic features modelling for statistical parametric speech synthesis: A review.
IETE Technical Review. 2019 Mar 4;36(2):130-49.
[77] Biswas A, de Wet F, van der Westhuizen E, Yilmaz E, Niesler T. Multilingual Neural Network Acoustic
Modelling for ASR of Under-Resourced English-isiZulu Code-Switched Speech. In Interspeech 2018 Sep
(pp. 2603-2607).
[78] Huang X, Zhang W, Xu X, Yin R, Chen D. Deeper Time Delay Neural Networks for Effective Acoustic
Modelling. In Journal of Physics: Conference Series 2019 May (Vol. 1229, No. 1, p. 012076). IOP
Publishing.
[79] Drossos K, Gharib S, Magron P, Virtanen T. Language Modelling for Sound Event Detection with Teacher
Forcing and Scheduled Sampling. arXiv preprint arXiv:1907.08506. 2019 Jul 19.
[80] Tegler H, Pless M, Blom Johansson M, Sonnander K. Speech and language pathologists’ perceptions and
practises of communication partner training to support children’s communication with high-tech speech
generating devices. Disability and Rehabilitation: Assistive Technology. 2019 Aug 18;14(6):581-9.
[81] Mezzoudj F, Langlois D, Jouvet D, Benyettou A. Textual Data Selection for Language Modelling in the
Scope of Automatic Speech Recognition. Procedia Computer Science. 2018 Jan 1;128:55-64.
[82] Mocialov B, Hastie H, Turner G. Transfer learning for british sign language modelling. In Proceedings of
the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018) 2018 Aug (pp.
101-110).
[83] Korzeniowski F, Widmer G. Automatic Chord Recognition with Higher-Order Harmonic Language
Modelling. In2018 26th European Signal Processing Conference (EUSIPCO) 2018 Sep 3 (pp. 1900-1904).
IEEE.
[84] Kłosowski P. Polish Language Modelling Based on Deep Learning Methods and Techniques. In2019
Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) 2019 Sep 18 (pp.
223-228). IEEE.
[85] Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal
cues. Science. 1995 Oct 13;270(5234):303-4.
[86] Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, Hannemann M, Motlicek P, Qian Y,
Schwarz P, Silovsky J. The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech
recognition and understanding 2011 (No. CONF). IEEE Signal Processing Society.
[87] Haeb-Umbach R, Ney H. Linear discriminant analysis for improved large vocabulary continuous speech
recognition. In[Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and
Signal Processing 1992 Mar 23 (Vol. 1, pp. 13-16). IEEE.
[88] Morgan N, Bourlard H. Continuous speech recognition. IEEE signal processing magazine. 1995
May;12(3):24-42.
[89] Cooke M, Green P, Josifovski L, Vizinho A. Robust automatic speech recognition with missing and
unreliable acoustic data. Speech communication. 2001 Jun 1;34(3):267-85.
[90] Levinson SE. Continuously variable duration hidden Markov models for automatic speech recognition.
Computer Speech & Language. 1986 Mar 1;1(1):29-45.

