
Title: Design of Subsystems for a Web-based Survey System using Automatic Speech and Optical
Character Recognition with Geotagging Features

Statement of the Problem:

Web-based surveys are a common means of gathering data, but they require users to type their
responses. This can be tedious and may lead to inconsistent or incorrect data. The aim of
this study is to develop automated subsystems for a web-survey platform using automatic
speech recognition, optical character recognition (OCR), and geotagging to enhance the user
experience and improve the quality of survey results.

The main technical challenges include:

1.) Building robust and accurate speech recognition for parsing free-form, open-ended input.

2.) Developing OCR functionality for reading handwritten survey answers.

3.) Enhancing both subsystems to handle natural language and varied handwriting.

4.) Adding geotagging functions to capture the coordinates of each answer.

5.) Developing the subsystems so that they integrate seamlessly with the current web-based
survey software.

Successful implementation of the automated subsystems would let respondents give spoken or
handwritten answers more conveniently. The geotagged spatial data collected along with the
survey answers could support richer geographic analysis of responses. Overall, the improved
web survey platform is expected to generate more detailed, higher-quality data at large
scale.
Hardware:
For speech recognition:
- Microphones: High-quality omnidirectional microphones that capture human speech clearly.
Multiple microphones may be used to record voice from all directions.

- Audio interface: One with enough inputs for multiple microphones and sufficient
digitization quality for the audio signal.

- Computing: A high-performance server or GPU-accelerated hardware for running the
speech recognition software and models. Real-time speech recognition may require
specialized ICs or architectures.

For optical character recognition:


- Camera: A high-resolution camera that captures detailed images of handwritten survey
papers. A minimum resolution of 10 MP is recommended.

- Image processing: An image pre-processing module that prepares images before they are fed
into OCR, performing operations such as cropping and orientation adjustment.

- Computing: A server or hardware accelerator to run the OCR neural networks efficiently
and convert image data into text.

For geotagging:
- GPS: A personal GPS receiver for satellite-based positioning.

- Cellular modem: A modem for estimating an approximate position when GPS is unreliable.

- WiFi module: Improves location accuracy by mapping nearby WiFi networks.

The subsystems can be interfaced with the existing web survey platform server. Text and
location information recognized automatically through the voice recorder, imaging device,
and geotagging hardware will be transmitted through this server.

Software:
The draft software design for the subsystems of the web-based survey system is as follows:

Speech Recognition Subsystem:

- Speech recognizer: Uses deep learning techniques, such as convolutional and recurrent
networks, to process audio and convert it into text.

- Language model: Statistical models of natural language patterns, including grammar.

- Acoustic models: Model the speech sounds used during recognition.

- Beam search decoder: Searches efficiently over candidate transcriptions during speech
recognition.

- Web platform interface: Submits audio and retrieves the recognized text.
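
The beam search decoder named in the list above can be illustrated with a minimal sketch. This is not the project's decoder: the function name `beam_search`, the `lm_log_prob` callback, the `beam_width` parameter, and the toy per-step token log-probabilities (standing in for acoustic model output) are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A hypothesis is (token sequence so far, accumulated log score).
Hypothesis = Tuple[Tuple[str, ...], float]


def beam_search(
    step_log_probs: Sequence[Dict[str, float]],
    lm_log_prob: Callable[[Tuple[str, ...], str], float],
    beam_width: int = 3,
) -> List[Hypothesis]:
    """Keep only the `beam_width` best partial transcriptions at each step.

    step_log_probs: one dict per time step, token -> acoustic log probability
                    (hypothetical stand-in for real acoustic model output).
    lm_log_prob:    hypothetical language model score for appending `token`
                    to the sequence decoded so far.
    """
    beams: List[Hypothesis] = [((), 0.0)]
    for step in step_log_probs:
        candidates: List[Hypothesis] = []
        for seq, score in beams:
            for token, acoustic_lp in step.items():
                total = score + acoustic_lp + lm_log_prob(seq, token)
                candidates.append((seq + (token,), total))
        # Prune: sort by score, best first, and keep the top hypotheses only.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def no_lm(seq: Tuple[str, ...], token: str) -> float:
    """Placeholder language model that contributes nothing to the score."""
    return 0.0


if __name__ == "__main__":
    steps = [
        {"a": math.log(0.6), "b": math.log(0.4)},
        {"c": math.log(0.7), "d": math.log(0.3)},
    ]
    for tokens, score in beam_search(steps, no_lm, beam_width=2):
        print(" ".join(tokens), round(score, 3))
```

Pruning to a fixed number of hypotheses keeps the search cost linear in the number of time steps, at the price of possibly discarding a sequence that would have scored best overall.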

Optical Character Recognition Subsystem:

- Pre-processing: Enhances the image so it can be fed into the OCR engine. Operations such
as noise removal and skew correction are performed here.

- OCR engine: Neural network models for detecting text in images and extracting characters.
Uses CNNs, sequence models, etc.

- Language models: Lexical and grammatical context for increased OCR reliability.

- Post-processing: Refines the OCR output by correcting spelling mistakes, substituting
confusable characters, etc.

- Web platform interface: Interface for submitting data and retrieving the extracted text.
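
The refinement step that substitutes confusable characters in OCR output could work roughly as sketched below. The `CONFUSABLE` table, the function names, and the tiny lexicon are illustrative assumptions, not part of the proposed design. The sketch tries digit-to-letter swaps and accepts a variant only if it appears in a known word list.

```python
from itertools import product
from typing import Set

# Assumed, illustrative pairs of characters that OCR often confuses.
CONFUSABLE = {"0": "o", "1": "l", "5": "s", "8": "b"}


def correct_token(token: str, lexicon: Set[str]) -> str:
    """Return a lexicon word reachable by swapping confusable characters.

    If the token is already a known word, or no variant matches, the
    token is returned unchanged. Intended for short tokens: the number
    of variants doubles with every confusable character.
    """
    if token.lower() in lexicon:
        return token
    # For each character, either keep it or replace it with its look-alike.
    options = [(ch, CONFUSABLE[ch]) if ch in CONFUSABLE else (ch,) for ch in token]
    for variant in product(*options):
        candidate = "".join(variant)
        if candidate.lower() in lexicon:
            return candidate
    return token


def correct_text(text: str, lexicon: Set[str]) -> str:
    """Apply correct_token to each whitespace-separated token."""
    return " ".join(correct_token(tok, lexicon) for tok in text.split())


if __name__ == "__main__":
    words = {"hello", "world", "school"}  # hypothetical lexicon
    print(correct_text("he11o w0rld", words))
```

Checking candidates against a lexicon keeps genuine numbers, such as years or quantities, from being rewritten into letters.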

Geotagging Subsystem:

- GPS coordinates: Extracts the current position from GPS satellites.

- Cell tower geolocation module: Estimates location from cell tower signals when GPS is
not available.

- WiFi mapping: Maps nearby WiFi networks to improve accuracy.

- Location fusion algorithm: Merges data from all sources to calculate accurate
coordinates.

- Web platform interface: API for retrieving the coordinates of survey responses.
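
The design does not say how the location fusion algorithm combines its sources. One common option, shown here only as an assumed sketch, is inverse-variance weighting: each estimate is weighted by one over the square of its reported error radius, so a precise GPS fix dominates a coarse cell tower estimate. The `LocationEstimate` class, its field names, and `fuse_locations` are hypothetical. Averaging latitude and longitude directly is reasonable only when the estimates lie close together and do not straddle the antimeridian.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Iterable


@dataclass
class LocationEstimate:
    source: str        # e.g. "gps", "cell", "wifi" (labels are illustrative)
    lat: float         # latitude in degrees
    lon: float         # longitude in degrees
    accuracy_m: float  # assumed 1-sigma error radius in metres


def fuse_locations(estimates: Iterable[LocationEstimate]) -> LocationEstimate:
    """Combine estimates using inverse-variance weights (1 / accuracy^2)."""
    usable = [e for e in estimates if e.accuracy_m > 0]
    if not usable:
        raise ValueError("no usable location estimates")
    weights = [1.0 / (e.accuracy_m ** 2) for e in usable]
    total = sum(weights)
    lat = sum(w * e.lat for w, e in zip(weights, usable)) / total
    lon = sum(w * e.lon for w, e in zip(weights, usable)) / total
    # The variance of the weighted mean is 1 / (sum of weights).
    return LocationEstimate("fused", lat, lon, sqrt(1.0 / total))


if __name__ == "__main__":
    readings = [
        LocationEstimate("gps", 14.0, 121.0, 5.0),
        LocationEstimate("cell", 14.1, 121.1, 500.0),
    ]
    print(fuse_locations(readings))
```

When GPS is unavailable, the same function can be called with only the cell tower and WiFi estimates, and the larger fused error radius can be stored with the survey response.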

The new system's survey, data, and results management will link with the existing web survey
platform and database. The platform manages surveys, collects responses, analyses results,
and visualizes them.
