Image Caption Generator Using AI: Review - 1
SLOT: B2+TB2
TEAM MEMBERS
Benefiting from advances in image classification and object detection, it has become possible to automatically generate one or more sentences describing the visual content of an image; this problem is known as Image Captioning. Automatically generating complete and natural image descriptions has great potential impact, for example in titling news images, describing medical images, text-based image retrieval, information access for blind users, and human-robot interaction. These applications give image captioning significant theoretical and practical research value. Image captioning is therefore a complex but important task in the age of AI.
INTRODUCTION
The goal of image captioning is to generate descriptions for a given image, i.e., to capture the relationships between the objects present in the image, generate natural language expressions, and judge the quality of the generated descriptions. The problem is therefore ostensibly harder than popular computer vision tasks such as object detection or segmentation, where the emphasis is only on identifying the various entities present in the image. With recent advancements in training neural networks, the availability of GPU computing power, and large datasets, neural-network-driven approaches are the most popular choice for handling the caption generation problem. However, humans are still better at interpreting images and constructing useful and meaningful captions, with or without a specific application context, which makes this an interesting application for interactive machine learning (IML) and explainable artificial intelligence (XAI). Promising technologies include active learning, which has already been applied to automate the assessment of image captioning; IML methods that incrementally train, e.g., re-ranking models for selecting the best caption candidate; and XAI methods that can improve the user's understanding of a model and eventually enable the user to provide better feedback for a second IML method.
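The re-ranking idea mentioned above can be illustrated with a minimal sketch: several candidate captions are scored against a set of reference words and the best candidate is selected. The candidate captions, the reference word set, and the overlap-based scoring function here are illustrative assumptions, a simplified stand-in for a learned re-ranking model trained from user feedback.

```python
# Toy sketch of caption re-ranking: score each candidate caption by its
# word overlap with a set of reference words (a stand-in for a learned
# re-ranking model) and return candidates ordered best-first.
def rank_candidates(candidates, reference_words):
    """Sort candidate captions by word overlap with the reference set."""
    def score(caption):
        words = set(caption.lower().split())
        return len(words & reference_words)
    return sorted(candidates, key=score, reverse=True)

candidates = [
    "a man riding a horse",
    "a person on an animal",
    "a man riding a brown horse on a beach",
]
reference = {"man", "horse", "beach", "riding"}
best = rank_candidates(candidates, reference)[0]
# best == "a man riding a brown horse on a beach" (overlaps 4 reference words)
```

In an IML loop, the reference set (or the scoring model itself) would be updated from the user's feedback on previously selected captions, which is what makes the approach interactive.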
LITERATURE REVIEW

[1] Author: Chetan Amritkar and Vaishali Jabade, Department of EnTC, Vishwakarma Institute of Technology, Pune, India (chetan.amritkar16@vit.edu, vaishali.jabade@vit.edu)
Title: Image Caption Generation using Deep Learning Technique
Concept: The model generates natural sentences that describe a given image.
Methodology: A pre-trained Convolutional Neural Network (CNN), used for the image classification task, acts as an image encoder; its last hidden layer is used as input to a Recurrent Neural Network (RNN), which acts as a decoder and generates the sentences.
Analysis: We observe that using larger datasets increases the performance of the model: a larger dataset improves accuracy as well as reduces losses.
Limitations: The categories in the results are influenced by neighbourhood words; e.g., for a word like "car", neighbourhood words such as "vehicle", "van", and "cab" are also generated, which might be incorrect.

[2] Author: Haoran Wang, Yue Zhang, and Xiaosheng Yu
Title: "An Overview of Image Caption Generation Methods" (CIN 2020)
Concept: The caption generator does not store captions; they are generated on demand by an RNN linked to the image features.
Methodology: A pre-trained Convolutional Neural Network (CNN), used for the image classification task, acts as an image encoder.
Analysis: We observe that using larger datasets increases the performance of the model: a larger dataset improves accuracy as well as reduces losses.
Limitations: Difficulty in understanding the intermediate results; the LRCN method is further developed for text generation.
PROPOSED WORK AND IMPLEMENTATION
Module Description
• Performing data cleaning.
• Extracting feature vectors from all images.
• Loading datasets for training the model.
• Tokenizing the vocabulary.
• Creating data generators.
• Defining the CNN-RNN model.
• Training the model.
• Testing the model.
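The data cleaning, vocabulary tokenizing, and data-generation steps above can be sketched in plain Python. The sample captions and the startseq/endseq markers are illustrative assumptions; in the full pipeline each (input-sequence, next-word) pair would be combined with the CNN feature vector of the corresponding image before being fed to the CNN-RNN model.

```python
# Minimal sketch of caption preprocessing: clean captions, build an
# integer vocabulary, and expand each caption into (input-sequence,
# next-word) training pairs for teacher forcing.
import re
from collections import Counter

def clean_caption(caption):
    """Lowercase, strip non-letter characters, drop 1-letter tokens."""
    caption = re.sub(r"[^a-z ]", "", caption.lower())
    words = [w for w in caption.split() if len(w) > 1]
    return "startseq " + " ".join(words) + " endseq"

def build_vocab(captions):
    """Map each word to an integer id; 0 is reserved for padding."""
    counts = Counter(w for c in captions for w in c.split())
    return {w: i + 1 for i, (w, _) in enumerate(sorted(counts.items()))}

def make_pairs(caption, vocab):
    """Expand one caption into (prefix-ids, next-word-id) pairs."""
    ids = [vocab[w] for w in caption.split()]
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]

raw = ["A dog runs in the park!", "Two dogs play on grass."]
cleaned = [clean_caption(c) for c in raw]
vocab = build_vocab(cleaned)
pairs = make_pairs(cleaned[0], vocab)
```

In practice a deep learning framework's tokenizer and a generator that yields batches of (image features, padded prefix, next word) would replace these helpers, but the data layout is the same.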
Requirement Specifications

Hardware Requirements:
• Processor: 64-bit architecture at 1 GHz or faster; Intel eighth generation or newer; AMD Ryzen 3 or better; Qualcomm Snapdragon 7c or higher

Software Requirements:
• Python 3 with pip installed
• Editor
✓ tqdm
✓ jupyterlab
EXPECTED RESULT
• Detect objects in the image and determine the relationships between them.
• Show the image content correctly with well-formed phrases and sentences.
• Describe an image with one or more natural language sentences.
REFERENCES
[1] Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei (2018). Exploring Visual Relationship for Image Captioning. JD AI Research, Beijing, China.
[2] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang (2017). Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering. IEEE, Salt Lake City, UT, USA.
[3] Yan Chu, Xiao Yue, Lei Yu, Mikhailov Sergei, and Zhengkui Wang (2020). Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention.
[4] Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and Tell: A Neural Image Caption Generator. IEEE, Boston, MA, USA.
[5] Johnson, J., Karpathy, A., & Fei-Fei, L. (2016). DenseCap: Fully Convolutional Localization Networks for Dense Captioning. Las Vegas, NV, USA.