Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

ISSN: 2057-5688

IMAGE BASED QUESTION AND ANSWERING SYSTEM


1
Mr B NARSINGAM, 2B.SUMANTH, 3D.GANESH CHANDU, 4K.SANDEEP

1
Assistant Professor, Dept.of CSE, Teegala Krishna Reddy Engineering College, Meerpet, Hyderabad,

tkrcsebnarsingam@gmail.com

BTech Student, Dept.of CSE, Teegala Krishna Reddy Engineering College, Meerpet, Hyderabad
2,3,4

sumanthbelladhi@gmail.com, ganeshchandu29@gmail.com, sandeepkotakonda4648@gmail.com

ABSTRACT: Image based question answering is a useful way of finding information about
physical objects. Current question answering (QA) systems are text-based and can be difficult to
use when a question involves an object with distinct visual features. An image QA system allows
direct use of a image to refer to the object. We develop a three-layer system architecture for
image QA that brings together recent technical achievements in question answering and im- age
matching.

Keywords: Image based question answering system,

I. INTRODUCTION vector representation by an ‘encoder’ RNN,


which in turn is used as the initial hidden
vision-to-Language problems present a
state of a ‘decoder’ RNN that generates the
particular challenge in Computer Vision
target sentence. Despite the supposed
because they require translation between
equivalence between an image and a
two different forms of information. In this
thousand words, the manner in which
sense the problem is similar to that of
information is represented in each data form
machine translation between languages. In
could hardly be more different. Human
machine language translation there have
language is designed specifically so as to
been a series of results showing that good
communicate information between humans,
performance can be achieved without
whereas even the most carefully composed
developing a higher-level model of the state
image is the culmination of a complex set of
of the world. In [1], for instance, a source
physical processes over which humans have
sentence is transformed into a fixed-length

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 594


ISSN: 2057-5688

little control. Given the differences between networks is the most simple method
these two forms of information, it seems for improving their performance. This
surprising that methods inspired by machine includes increasing the depth and the
language translation have been so successful. breadth of the network. Another
These RNN-based methods which translate advantage of this method is that it is
directly from image features to text, without based on the idea of processing the
developing a high-level model of the state of visual input to numerous scales before
the world, represent the current state of the being aggregated, allowing the
art for key Vision-to-Language (V2L) following step to abstract data from
problems, such as image captioning and multiple scales at once.
visual question answering
III. PROPOSED SYSTEM
II. LITERATURE SURVEY
To bypass the problem of selecting a
A deep convolutional neural network huge number of regions, Ross Girshick
architecture by the name of Inception et al. proposed a method where we use
is proposed. It establishes a new selective search to extract just 2000
standard for detection and regions from the image and he called
classification (ILSVRC14). The most them region proposals. Therefore, now,
basic characteristic of this particular instead of trying to classify a huge
architecture is the increased usage of number of regions, you can just work
resources of computing inside the with 2000 regions. These 2000 region
network. Even inside the convolutions, proposals are generated using the
the essential approach to do this is to selective search algorithm.
move from fully connected to sparsely SYSTEM ARCHITECTURE
connected structures. State-of-the-art
sparse arithmetic operations result
from grouping sparse matrices into
comparatively dense submatrices.
Increasing the scale of deep neural

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 595


ISSN: 2057-5688

Fundamentally, machine learning


involves building mathematical models
to help understand data. "Learning"
enters the fray when we give these
models tunable parameters that can be
adapted to observed data; in this way
the program can be considered to be
"learning" from the data. Once these
models have been fit to previously
seen data, they can be used to predict
and understand aspects of newly
Fig.1 System architecture observed data. I'll leave to the reader
the more philosophical digression
regarding the extent to which this type
Machine Learning :
of mathematical, model-based
Before we take a look at the details of "learning" is similar to the "learning"
various machine learning methods, let's exhibited by the human
start by looking at what machine brain.Understanding the problem
learning is, and what it isn't. Machine setting in machine learning is essential
learning is often categorized as a to using these tools effectively, and so
subfield of artificial intelligence, but I we will start with some broad
find that categorization can often be categorizations of the types of
misleading at first brush. The study of approaches we'll discuss here.
machine learning certainly arose from
Categories Of Machine Leaning :-
research in this context, but in the data
science application of machine At the most fundamental level,

learning methods, it's more helpful to machine learning can be categorized

think of machine learning as a means into two main types: supervised

of building models of data. learning and unsupervised learning.

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 596


ISSN: 2057-5688

Supervised learning involves somehow Need for Machine Learning


modeling the relationship between Human beings, at this moment, are the
measured features of data and some most intelligent and advanced species
label associated with the data; once this on earth because they can think,
model is determined, it can be used to evaluate and solve complex problems.
apply labels to new, unknown data. On the other side, AI is still in its
This is further subdivided into initial stage and haven’t surpassed
classification tasks and regression human intelligence in many aspects.
tasks: in classification, the labels are Then the question is that what is the
discrete categories, while in regression, need to make machine learn? The most
the labels are continuous quantities. suitable reason for doing this is, “to
We will see examples of both types of make decisions, based on data, with
supervised learning in the following efficiency and scale”.
section.
Lately, organizations are investing
Unsupervised learning involves heavily in newer technologies like
modeling the features of a dataset Artificial Intelligence, Machine
without reference to any label, and is Learning and Deep Learning to get the
often described as "letting the dataset key information from data to perform
speak for itself." These models include several real-world tasks and solve
tasks such as clustering and problems. We can call it data-driven
dimensionality reduction. Clustering decisions taken by machines,
algorithms identify distinct groups of particularly to automate the process.
data, while dimensionality reduction These data-driven decisions can be
algorithms search for more succinct used, instead of using programing
representations of the data. We will see logic, in the problems that cannot be
examples of both types of programmed inherently. The fact is
unsupervised learning in the following that we can’t do without human
section. intelligence, but other aspect is that we

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 597


ISSN: 2057-5688

all need to solve real-world problems sphere and cube colour is same’ or
with efficiency at a huge scale. That is ‘sphere is present in image’ then it will
why the need for machine learning answer ‘YES’ or ‘No’ or number of
arises objects in image but our project can
describe anything in the image by
using user question and to answer this
IV. RESULTS
we are using RCNN to identify objects
Image Question Answer in images and LSTM to build
The dataset which you gave us can vocabulary sentences in meaningful
answer limited questions such as ‘does form

To run project double click on ‘run.bat’ file to get below screen

Fig.2 In above screen click on ‘Load RCNN Model’ button to load CNN model and to get
below screen

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 598


ISSN: 2057-5688

Fig.3 In above screen in text area we can see model loaded and now click on ‘Upload
Image’ button to upload

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 599


ISSN: 2057-5688

Fig.4 In above screen selecting and uploading ‘test2.png’ file and then click on ‘Open’
button to get below screen

Fig.5In above screen image loaded and now click on ‘Extract Answer from Image’ button
to get below result

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 600


ISSN: 2057-5688

Fig.6 In above screen we can answer from image and similarly you can upload any image
and get answer and this project will not give any answer till u entered question

V. CONCLUSION artificially intelligent systems. A


system that can build a relationship
The VQA problem plays an important
between visual and textual modalities
part in designing a relatively stringent
and give matching responses on
Visual Turing Test. It requires systems
arbitrary questions, could very
to combine various facets of artificial
strongly mark such a breakthrough. In
intelligence like object detection,
this paper, we have examined the
question understanding and
various datasets and approaches used
determining the relationship between
to tackle the VQA problem. We also
the objects and the question.
examined some of the challenges VQA
Successful Image Question Answering
systems might face in regard to
on the whole has been widely known
multiword answers and the types of
in the community and otherwise to be
questions asked. Future scope relies on
a momentous breakthrough among

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 601


ISSN: 2057-5688

creation of datasets that rectify bias in Processing Toolkit In Proceedings of


existing datasets, introduction of better 52nd Annual Meeting of the
evaluation metrics for the algorithms Association for Computational
carrying out the VQA task to Linguistics: System Demonstrations,
determine whether the algorithm is pp. 55-60.
performing well on the VQA task in
[4] S. Ren, K. He, R. Girshick, and J.
general and not only on that dataset. In
Sun, “Faster r-cnn: Towards real-time
addition to that, the inclusion of more
object detection with region proposal
varied, diverse scenes in the catalogue
networks,” in Advances in Neural
of images and more realistic, complex
Information Processing Systems
questions with a longer, descriptive set
(NIPS), 2015.
of answers would be highly valuable.
[5] S. Antol, A. Agrawal, J. Lu, M.
REFERENCES
Mitchell, D. Batra, C. Lawrence
[1] K. He, X. Zhang, S. Ren, and J. Zitnick, and D. Parikh, “VQA: Visual
Sun, “Deep residual learning for Question Answering”, in Proceedings
image recognition,” in The IEEE of the IEEE International Conference
Conference on Computer Vision and on Computer Vision, pp. 24252433,
Pattern Recognition (CVPR), 2016 2015.

[2] Joseph Redmon, Ali [6] M. Malinowski, Mario Fritz, “A


Farhadi ”YOLO9000:Better, Faster, multi-world approach to question
Stronger”, arXiv:1612.08242v1, 25 answering about real-world scenes
Dec 2016. based on uncertain input ”, NIPS,
2014.
[3] Manning, Christopher D.,
Surdeanu, Mihai, Bauer, John, Finkel, [7] N. Silberman, D. Hoiem, et al,
Jenny, Bethard, Steven J., and “Indoor segmentation and support
McClosky, David. 2014. The Stanford inference from rgbd images,” in
CoreNLP Natural Language

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 602


ISSN: 2057-5688

European Conference on Computer [9] Naga Lakshmi Somu, Prasadu


Vision (ECCV), 2012. Peddi (2021), An Analysis Of Edge-
Cloud Computing Networks For
[8] Kan Chen, Jiang Wang, Liang-
Computation Offloading, Webology
Chieh Chen, Haoyuan Gao, Wei Xu,
(ISSN: 1735-188X), Volume 18,
Ram Nevatia, “ABC-CNN: An
Number 6, pp 7983-7994.
attention based convolutional neural

network for visual question


answering”, arXiv preprint
arXiv:1511.05960, April 2016

Volume XV Issue I1 2023 JUNE http://ijte.uk/ 603

You might also like