Design of Subsystems For A Web-Based Survey System Using Automatic Speech and Optical Character Recognition With Geotagging Features
A common way to gather data is through web-based surveys that require users to type their
responses; this can be tedious and can lead to inconsistent or incorrect data. This study aims
to develop automated subsystems for a web survey platform using automatic speech recognition,
optical character recognition (OCR), and geotagging to enhance the user experience and improve
the quality of survey results.
1.) Building robust and accurate speech recognition for parsing free-form, open-ended input.
2.) Developing OCR functionality for reading handwritten survey answers.
3.) Enhancing both subsystems to handle natural language and varied handwriting.
4.) Adding geotagging functions to capture the coordinates of each answer.
5.) Developing the subsystems so that they integrate seamlessly with the current web-based
survey software.
For speech recognition:
- Audio interface: One with enough inputs for multiple microphones and sufficient digitization
quality for the audio signal.
- Computing: A high-performance server or GPU-accelerated hardware for running the speech
recognition software and models. Real-time speech recognition may require customized ICs or
architectures.
For OCR:
- Image processing: An image pre-processing module that prepares images before they are fed
into OCR, performing operations such as cropping and orientation adjustment.
- Computing: A server or hardware accelerator to execute the OCR neural networks efficiently
and convert image data into text.
For geotagging:
- GPS: A personal GPS receiver for satellite-based positioning.
- Cellular modem: Estimates an approximate position when GPS is unreliable.
- WiFi module: Improves location accuracy by mapping nearby WiFi networks.
The subsystems can be interfaced with the existing web survey platform server. Text and
location information recognized automatically through a voice recorder, an imaging device, and
geotagging hardware will be transmitted via that server.
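As a rough illustration of what could be transmitted to the server, the sketch below bundles recognized text and an optional location into one JSON payload. The class name `SurveyAnswer`, its field names, and the helper `to_payload` are hypothetical and are not taken from the existing platform.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class SurveyAnswer:
    """One recognized answer (hypothetical schema, for illustration only)."""
    question_id: str
    text: str                          # text produced by speech recognition or OCR
    source: str                        # assumed values: "speech", "ocr", "typed"
    latitude: Optional[float] = None   # decimal degrees, None if no fix
    longitude: Optional[float] = None


def to_payload(answer: SurveyAnswer) -> str:
    """Serialize an answer to a JSON string with stable key order."""
    return json.dumps(asdict(answer), sort_keys=True)


example = SurveyAnswer("q1", "yes", "speech", latitude=14.6, longitude=121.0)
payload = to_payload(example)
```

Keeping the location fields optional lets an answer still be submitted when no position fix is available.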
Software:
The following is a draft software design for the subsystems of the web-based survey system:
Speech Recognition Subsystem:
- Speech recognizer – Uses deep learning techniques, such as convolutional and recurrent
networks, to process audio and convert it into text.
- Language model – Statistical models of natural-language patterns, including grammar.
- Acoustic models – Map audio features to speech sounds during recognition.
- Beam search decoder – Searches candidate transcriptions efficiently during recognition.
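The decoding step can be illustrated with a minimal beam search sketch. It assumes the acoustic model emits, for each time step, a dictionary of label log probabilities; that input format and the function name `beam_search` are assumptions made for this example. A real decoder would also add language model scores, which are omitted here.

```python
import math


def beam_search(step_log_probs, beam_width=3):
    """Return up to `beam_width` best label sequences with their log scores.

    step_log_probs: list of dicts, one per time step, mapping a label to
    its log probability (hypothetical acoustic-model output).
    """
    beams = [((), 0.0)]  # (sequence so far, cumulative log probability)
    for frame in step_log_probs:
        # Extend every surviving hypothesis with every label of this step.
        candidates = [
            (seq + (label,), score + log_p)
            for seq, score in beams
            for label, log_p in frame.items()
        ]
        # Keep only the highest-scoring hypotheses for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


frames = [
    {"y": math.log(0.6), "n": math.log(0.4)},
    {"e": math.log(0.7), "o": math.log(0.3)},
    {"s": math.log(0.9), "t": math.log(0.1)},
]
best_seq, best_score = beam_search(frames, beam_width=2)[0]
```

A small beam width keeps the search cheap but can discard a hypothesis that would have scored best later, so the width is a trade-off between speed and accuracy.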
OCR Subsystem:
- Pre-processing – Enhances the image so it can be fed into the OCR engine. Operations such as
noise removal and skew correction are performed here.
- OCR engine – Neural network models for detecting text in images and extracting characters.
Uses CNNs, sequence models, etc.
- Language models – Lexical and grammatical context for increased OCR reliability.
- Post-processing – Refines the OCR output by correcting spelling mistakes, substituting
confusable characters, etc.
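The refinement of OCR output could look like the following sketch, which first swaps digit look-alikes for letters and then falls back to the closest word in a lexicon. The confusion table, the lexicon, and the function name `correct_token` are illustrative assumptions; a production system would use a much larger vocabulary and a language model.

```python
import difflib

# Hypothetical confusion pairs: digits that OCR often produces in place of letters.
CONFUSABLE = {"0": "o", "1": "l", "5": "s", "8": "b"}

# Tiny illustrative lexicon; a real one would come from the survey domain.
LEXICON = ["hello", "survey", "school", "answer"]


def correct_token(token, lexicon, cutoff=0.8):
    """Return the lexicon word that best matches an OCR token, else the token."""
    lowered = token.lower()
    if lowered in lexicon:
        return lowered
    # First try replacing digit look-alikes with letters.
    swapped = "".join(CONFUSABLE.get(ch, ch) for ch in lowered)
    if swapped in lexicon:
        return swapped
    # Fall back to the closest lexicon entry by similarity ratio.
    matches = difflib.get_close_matches(swapped, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


corrected = [correct_token(t, LEXICON) for t in ["he11o", "5chool", "xyz"]]
```

Tokens with no sufficiently close lexicon entry are returned unchanged, so genuine numbers and unknown names are not rewritten.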
Geotagging Subsystem:
- Cell tower geolocation module – Estimates location from cell tower signals when GPS is not
available.
- Location fusion algorithm – Merges data from all sources to calculate the true coordinates.
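One simple way to merge position estimates is inverse-variance weighting, sketched below. This is an assumed technique chosen for illustration, not necessarily the fusion method intended by the design. The `Fix` class, its `accuracy_m` field (an estimated 1-sigma error radius in metres), and the function `fuse` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    """A position estimate from one source (GPS, cell tower, or WiFi)."""
    lat: float          # decimal degrees
    lon: float          # decimal degrees
    accuracy_m: float   # assumed 1-sigma error radius in metres


def fuse(fixes):
    """Combine fixes by inverse-variance weighting.

    Assumes the fixes are close together and do not straddle the
    antimeridian, so averaging latitude and longitude directly is acceptable.
    """
    if not fixes:
        raise ValueError("at least one fix is required")
    weights = [1.0 / (f.accuracy_m ** 2) for f in fixes]
    total = sum(weights)
    lat = sum(w * f.lat for w, f in zip(weights, fixes)) / total
    lon = sum(w * f.lon for w, f in zip(weights, fixes)) / total
    # The fused estimate is at least as precise as the best single source.
    return Fix(lat, lon, (1.0 / total) ** 0.5)


gps = Fix(14.6000, 121.0000, accuracy_m=10.0)
wifi = Fix(14.6020, 121.0020, accuracy_m=10.0)
fused = fuse([gps, wifi])
```

With this weighting, a precise GPS fix dominates a coarse cell tower estimate, while sources of similar quality are averaged.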
Survey, data, and results management in the new system will link with the existing web survey
platform and database, which manages surveys, collects responses, analyzes results, and
visualizes them.