Project Report: Video Summarization
Mentored By:
Dr Dushyant Kr Singh
UNDERTAKING
Table of Contents
Preface
Acknowledgements
Objective
Introduction
Motivation
Overview
Approach
Results
Technology Used
References
Preface
Acknowledgements
Finally, we would like to thank our friends and families for their constant support and advice. Without their love, blessings, and encouragement, it would have been impossible to complete this project.
1. Objective
Over the past few years, YouTube and other media sources have pushed the bounds of video consumption. As media sources compete for more of a viewer's time every day, one possible alleviation is a video summarization system. A movie teaser is an example of a video summary; however, not everyone has the time to edit their videos into a concise version.
2. Introduction
Video summarization aims to generate a short summary of the content of a longer video by selecting and presenting its most informative or interesting material to potential users. The output summary is usually composed of a set of keyframes or video clips extracted from the original video, with some editing applied.
3. Motivation
An enormous number of video recordings are created and shared on the Internet every day. Compared to images and text, video requires more space to store on storage devices and more bandwidth to transmit from one device to another. A summary of a video therefore comes in handy when we want to glance at its content quickly. Moreover, videos are usually analysed by humans, which demands immense manpower.
4. Overview
(Figure: overview of the video summarization pipeline.)
4.1 Feature Selection
Feature extraction is a form of dimensionality reduction in which the large number of pixels in an image is represented efficiently, in such a way that the interesting parts of the image are captured effectively. By default we selected RGB color histograms for our feature comparator, due to their global nature and speed of processing, but the VGG-16 and VGG-19 CNN models can also be used for feature extraction. We compute the histogram of the representative frame. The histogram is a graph showing the distribution of each primary color's brightness levels in the image (RGB, or red, green, and blue).

(Figure: RGB histogram of a sample frame.)
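As a concrete sketch of this step, a per-channel RGB histogram feature can be computed with NumPy alone (the choice of 16 bins per channel is an illustrative assumption; the report does not state the binning used):

```python
import numpy as np

def rgb_histogram(frame, bins=16):
    """Concatenated per-channel histogram of an H x W x 3 RGB frame,
    normalized so each channel's histogram sums to 1."""
    feats = []
    for c in range(3):  # R, G, B channels
        hist, _ = np.histogram(frame[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize for comparability
    return np.concatenate(feats)  # shape: (3 * bins,)

# Example on a random stand-in "frame"
frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
h = rgb_histogram(frame)
```

Each frame is thus reduced from H x W x 3 pixel values to a short feature vector, which is what makes later frame-to-frame comparison cheap.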
4.2 K-Means Clustering
K-means is an unsupervised clustering algorithm designed to partition unlabelled data into a certain number (that's the "K") of distinct groupings. In other words, k-means finds observations that share important characteristics and classifies them together into clusters. A good clustering solution is one in which the observations within each cluster are more similar to one another than to observations in other clusters.
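Since scikit-learn appears in the references, the clustering step can be sketched with its KMeans estimator; here X stands in for the stacked per-frame histogram features (the toy data below is illustrative, not from the report):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for frame features: two well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 8)),
               rng.normal(5.0, 0.1, size=(20, 8))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_            # cluster index per frame
centers = kmeans.cluster_centers_  # one centroid per cluster
```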
4.2.1 Algorithm
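The k-means procedure can be sketched as follows (a minimal NumPy implementation of the standard Lloyd's iteration, not the project's actual code): initialize K centroids, assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the assignments stop changing.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups via Lloyd's iteration."""
    rng = np.random.default_rng(seed)
    # 1. Initialize centroids with k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iters):
        # 2. Assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if (new_labels == labels).all():
            break  # assignments stable: converged
        labels = new_labels
        # 3. Update step: centroid = mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

In practice a library implementation such as scikit-learn's KMeans, which also handles multiple random restarts, would normally be preferred.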
5. Approach
An outline of our system will be as follows:
• Compute the histograms y0 … yn of the frames x0 … xn, where each histogram records the distribution of each primary color's brightness levels (red, green, and blue).
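The outline above can be sketched end-to-end: compute a histogram feature per frame, cluster the features with k-means, and keep the frame closest to each cluster centroid as a keyframe. This is a minimal sketch assuming frames are already decoded into NumPy arrays; reading them from a video file (e.g. with OpenCV) is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(frames, k):
    """Pick k keyframe indices from a list of RGB frames (H x W x 3 uint8)."""
    # Feature: concatenated 16-bin histogram per color channel
    feats = np.array([
        np.concatenate([np.histogram(f[:, :, c], bins=16, range=(0, 256))[0]
                        for c in range(3)])
        for f in frames
    ], dtype=float)
    feats /= feats.sum(axis=1, keepdims=True)  # normalize per frame

    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
    # Keyframe = frame whose features lie closest to its cluster centroid
    keyframes = []
    for j in range(k):
        members = np.where(km.labels_ == j)[0]
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[j], axis=1)
        keyframes.append(int(members[dists.argmin()]))
    return sorted(keyframes)
```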
6. Results
We selected k = 10 as our k-means parameter and used 10 segments for the output video.
Original video: 2 minutes 42 seconds
Summarized video: 20 seconds
7. Technology Used
• Python can be used for rapid prototyping, or for production-ready software development.
OpenCV
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez (which was subsequently acquired by Intel). The library is cross-platform and free for use under the open-source Apache 2 License. Since 2011, OpenCV has featured GPU acceleration for real-time operations.
VGG-16
VGG16 is a convolutional neural network (CNN) architecture that achieved top results in the ILSVRC (ImageNet) competition in 2014. It is considered one of the excellent vision model architectures to date. The most distinctive thing about VGG16 is that, instead of a large number of hyper-parameters, it relies on convolution layers with 3x3 filters and stride 1 (always with same padding) and max-pool layers with 2x2 filters and stride 2. It follows this arrangement of convolution and max-pool layers consistently throughout the whole architecture. At the end it has two fully connected (FC) layers followed by a softmax for output. The 16 in VGG16 refers to its 16 layers that have weights. It is a fairly large network, with approximately 138 million parameters.
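The architecture described above ships ready-made with Keras; a brief sketch that instantiates it, plus a headless variant for the feature-extraction use mentioned in section 4.1 (weights=None builds the architecture without downloading pretrained weights; pass weights="imagenet" to use them):

```python
from tensorflow.keras.applications import VGG16

# Full classifier as described above (conv/max-pool stacks, then fully
# connected layers with a softmax head); weights=None skips the download
model = VGG16(weights=None)
n_params = model.count_params()  # roughly 138 million, as noted above

# Headless variant for feature extraction: drop the FC/softmax head and
# average-pool the last conv block into a 512-dimensional feature vector
features_model = VGG16(weights=None, include_top=False, pooling="avg")
```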
References
• https://scikit-learn.org/stable/
• https://keras.io/
• https://www.tensorflow.org/
• https://www.researchgate.net/publication/266032463_Video_Summarization_Using_Clustering
• https://opencv.org/