Image Caption Generator

IMAGE CAPTION GENERATOR

Presented By:
Krupa Patel (190633116001)
Maitri Patel (190633116003)
Riya Soni (190633116004)
Moniska Patel (180630116014)

Guided By:
Prof. Pritesh Pandey
Content:
• Abstract
• Introduction
• Canvas
• What is CNN?
• What is LSTM?
• CNN LSTM Model
• Image Caption Generator Model
Abstract:
When you see an image, your brain can easily tell what it is about, but can a computer tell what an image represents? Computer vision researchers worked on this problem for a long time and considered it intractable until recently. With advances in deep learning techniques, the availability of huge datasets, and greater computing power, we can now build models that generate captions for an image.

This is what we are going to implement in this Python-based project, where we will use deep learning techniques, combining Convolutional Neural Networks with a type of Recurrent Neural Network (LSTM).
Introduction:
• Image caption generation is a task that combines computer vision and natural language processing concepts to recognize the context of an image and describe it in a natural language such as English.

• In this Python project, we will implement the caption generator using a CNN (Convolutional Neural Network) and an LSTM (Long Short-Term Memory network). The image features will be extracted with Xception, a CNN model trained on the ImageNet dataset, and then fed into the LSTM model, which is responsible for generating the image captions.
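As a rough sketch, the Xception feature-extraction step described above might look like this with Keras (assuming TensorFlow/Keras is installed; the exact preprocessing and pooling choices are assumptions, not taken from the slides):

```python
import numpy as np
from tensorflow.keras.applications.xception import Xception, preprocess_input

# Drop the classification head; average pooling yields a 2048-d feature vector.
# weights=None keeps this sketch lightweight; use weights="imagenet" for the
# pretrained model described in the slides.
model = Xception(include_top=False, pooling="avg", weights=None)

def extract_features(image):
    """image: an HxWx3 array already resized to 299x299, Xception's input size."""
    x = preprocess_input(image.astype("float32"))
    return model.predict(x[np.newaxis], verbose=0)[0]  # shape (2048,)
```

The resulting 2048-dimensional vector is what would be fed to the LSTM decoder.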
Ideation Canvas:
Empathy Canvas:
LNM Canvas:
What is CNN?
• Convolutional Neural Networks are specialized deep neural networks that can process data whose input has the shape of a 2D matrix.

• Images are easily represented as a 2D matrix, so CNNs are very useful for working with images.

• A CNN is typically used for image classification, e.g. identifying whether an image shows a bird, a plane, or Superman.
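The idea that a CNN slides small filters over a 2D-matrix input can be sketched with a single convolution step in plain NumPy (the image and the vertical-edge kernel here are made-up illustrations, not from the project):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation (what CNNs call 'convolution') of a grayscale image."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output value is one filter applied to one local patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 5x5 "image" with a vertical edge, and a vertical-edge-detecting kernel.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

feature_map = conv2d(image, kernel)  # responds strongly where the edge is
```

A real CNN stacks many such filters with nonlinearities and pooling, learning the kernel values during training.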
What is LSTM?
• LSTM stands for Long Short-Term Memory; LSTMs are a type of RNN (Recurrent Neural Network) well suited to sequence prediction problems.

• An LSTM can carry relevant information throughout the processing of its inputs, and with a forget gate it discards non-relevant information.

• It has proven more effective than the traditional RNN by overcoming the RNN's short-term-memory limitation.
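The forget-gate behaviour described above can be sketched as one forward step of an LSTM cell in plain NumPy (tiny random weights, purely illustrative, not the project's trained model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W, U, b stack the four gates along axis 0."""
    f = sigmoid(W[0] @ x + U[0] @ h + b[0])  # forget gate: what to discard
    i = sigmoid(W[1] @ x + U[1] @ h + b[1])  # input gate: what to write
    o = sigmoid(W[2] @ x + U[2] @ h + b[2])  # output gate: what to expose
    g = np.tanh(W[3] @ x + U[3] @ h + b[3])  # candidate cell values
    c_new = f * c + i * g                    # keep old info (f) + add new (i)
    h_new = o * np.tanh(c_new)               # hidden state carried forward
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
W = rng.normal(size=(4, n_hid, n_in))
U = rng.normal(size=(4, n_hid, n_hid))
b = np.zeros((4, n_hid))

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for t in range(5):                           # process a 5-step input sequence
    x = rng.normal(size=n_in)
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is the "long-term" memory: the forget gate `f` decides how much of it survives each step.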
CNN LSTM model
Image Caption Generator Model
RNN + CNN:

How do we combine an image and a sentence?

RNN + CNN:

• Encoder-decoder model

• Multimodal layer
Encoder-decoder model: machine translation
Encoder-decoder model: image caption
• Use the Encoder-Decoder structure
• Encoder (RNN): transforms the source-language sentence into a rich fixed-length vector
• Decoder (RNN): takes the output of the encoder as input and generates the target sentence
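The encoder-decoder loop can be sketched end to end in plain NumPy: the encoder compresses the input into a fixed-length vector, and a greedy decoder emits one token at a time until it predicts an end token. Everything here (the tiny vocabulary, the random weights, the simple tanh recurrence) is made up for illustration; for image captioning the encoder would be the CNN features rather than an RNN:

```python
import numpy as np

vocab = ["<start>", "<end>", "a", "dog", "runs"]
V, D = len(vocab), 8

rng = np.random.default_rng(1)
E = rng.normal(size=(V, D))      # token embeddings
W_out = rng.normal(size=(D, V))  # hidden state -> vocabulary scores
W_h = rng.normal(size=(D, D))    # simple recurrent transition

def encode(features):
    """Encoder: compress input features into one fixed-length vector."""
    return np.tanh(features[:D])

def decode(state, max_len=10):
    """Greedy decoder: feed each predicted token back in as the next input."""
    token = vocab.index("<start>")
    sentence = []
    for _ in range(max_len):
        state = np.tanh(W_h @ state + E[token])  # update hidden state
        token = int(np.argmax(state @ W_out))    # pick most likely next word
        if vocab[token] == "<end>":
            break
        sentence.append(vocab[token])
    return sentence

features = rng.normal(size=16)   # stand-in for CNN image features
caption = decode(encode(features))
```

In a trained model the same loop runs with learned LSTM weights and a real vocabulary; beam search is often used instead of greedy argmax.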
Encoder-decoder model in time:
Multimodal layer
Conclusion and future work:
• We analyzed and modified an image captioning method, LRCN. To understand the method deeply, we decomposed it into CNN, RNN, and sentence-generation components. For each part, we modified or replaced the component to see its influence on the final result.

• In the future, we would like to explore methods that generate multiple sentences with different content. One possible way is to combine interesting-region detection with image captioning.
THANK YOU!
