Project Report: Video Summarization
Mentored By:
Dr Dushyant Kr Singh
UNDERTAKING
Table of Contents
Preface
Acknowledgements
Objective
Introduction
Motivation
Overview
Approach
Results
Technology Used
References
Preface
Acknowledgements
Finally, we would like to thank our friends and families for their constant support and advice. Without their love, blessings, and encouragement, it would have been impossible to complete this project.
1. Objective
Over the past few years, YouTube and other media sources have pushed the bounds of video consumption. As media sources compete for more of a viewer's time every day, one possible alleviation is a video summarization system. A movie teaser is an example of a video summary; however, not everyone has the time to edit their videos into a concise version.
2. Introduction
Video summarization aims to generate a short summary of the content of a longer video by selecting and presenting its most informative or interesting material to potential users. The output summary is usually composed of a set of keyframes or video clips extracted from the original video, with some editing applied.
3. Motivation
An enormous number of video recordings are created and shared on the Internet every day. Compared to images and text, video requires more space to store on storage devices and more bandwidth to transmit from one device to another. A summary of a video therefore comes in handy when we want to glance at its content quickly. Moreover, videos are usually analysed by humans, which demands immense manpower.
4. Overview
(Figure: overview of the video summarization pipeline.)
4.1 Feature Selection
Feature extraction is a form of dimensionality reduction in which the large number of pixels in an image is represented efficiently, in such a way that the interesting parts of the image are captured effectively. By default we selected RGB color histograms for our feature comparator, due to their global nature and speed of processing, but the VGG-16 and VGG-19 CNN models can also be used for feature extraction. We compute the histogram of the representative frame. The histogram is a graph showing the distribution of each primary color's brightness levels in the image (RGB, or red, green, and blue).

(Figure: RGB histogram of a sample frame.)
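As a concrete sketch of this step, a per-channel RGB histogram feature can be computed with NumPy alone (the choice of 16 bins per channel is an illustrative assumption; the report does not state the binning used):

```python
import numpy as np

def rgb_histogram(frame, bins=16):
    """Concatenated per-channel histogram of an H x W x 3 RGB frame,
    normalized so each channel's histogram sums to 1."""
    feats = []
    for c in range(3):  # R, G, B channels
        hist, _ = np.histogram(frame[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / hist.sum())  # normalize for comparability
    return np.concatenate(feats)  # shape: (3 * bins,)

# Example on a random stand-in "frame"
frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
h = rgb_histogram(frame)
```

Each frame is thus reduced from H x W x 3 pixel values to a short feature vector, which is what makes later frame-to-frame comparison cheap.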
4.2 K-Means Clustering
K-means is an unsupervised clustering algorithm designed to partition unlabelled data into a certain number (that's the "K") of distinct groupings. In other words, k-means finds observations that share important characteristics and classifies them together into clusters. A good clustering solution is one in which the observations within each cluster are more similar to one another than to observations in other clusters.
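Since scikit-learn appears in the references, the clustering step can be sketched with its KMeans estimator; here X stands in for the stacked per-frame histogram features (the toy data below is illustrative, not from the report):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy stand-in for frame features: two well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 8)),
               rng.normal(5.0, 0.1, size=(20, 8))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_            # cluster index per frame
centers = kmeans.cluster_centers_  # one centroid per cluster
```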
4.2.1 Algorithm
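The k-means procedure can be sketched as follows (a minimal NumPy implementation of the standard Lloyd's iteration, not the project's actual code): initialize K centroids, assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, and repeat until the assignments stop changing.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Cluster the rows of X into k groups via Lloyd's iteration."""
    rng = np.random.default_rng(seed)
    # 1. Initialize centroids with k distinct data points
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iters):
        # 2. Assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if (new_labels == labels).all():
            break  # assignments stable: converged
        labels = new_labels
        # 3. Update step: centroid = mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

In practice a library implementation such as scikit-learn's KMeans, which also handles multiple random restarts, would normally be preferred.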
5. Approach
An outline of our system will be as follows:
• Compute the histograms y0 … yn of the frames x0 … xn, where each histogram records the distribution of each primary color's brightness levels (red, green, and blue).
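The outline above can be sketched end-to-end: compute a histogram feature per frame, cluster the features with k-means, and keep the frame closest to each cluster centroid as a keyframe. This is a minimal sketch assuming frames are already decoded into NumPy arrays; reading them from a video file (e.g. with OpenCV) is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def summarize(frames, k):
    """Pick k keyframe indices from a list of RGB frames (H x W x 3 uint8)."""
    # Feature: concatenated 16-bin histogram per color channel
    feats = np.array([
        np.concatenate([np.histogram(f[:, :, c], bins=16, range=(0, 256))[0]
                        for c in range(3)])
        for f in frames
    ], dtype=float)
    feats /= feats.sum(axis=1, keepdims=True)  # normalize per frame

    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
    # Keyframe = frame whose features lie closest to its cluster centroid
    keyframes = []
    for j in range(k):
        members = np.where(km.labels_ == j)[0]
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[j], axis=1)
        keyframes.append(int(members[dists.argmin()]))
    return sorted(keyframes)
```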
6. Results
We selected k = 10 as our k-means parameter and used 10 segments for the output video.
Original video: 2 minutes 42 seconds
Summarized video: 20 seconds
7. Technology Used
• Python can be used for rapid prototyping, or for production-ready software development.
OpenCV
OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez (which was subsequently acquired by Intel). The library is cross-platform and free for use under the open-source Apache 2 License. Since 2011, OpenCV has featured GPU acceleration for real-time operations.
VGG-16
VGG16 is a convolutional neural network (CNN) architecture that achieved top results in the ILSVRC (ImageNet) competition in 2014. It is considered one of the excellent vision model architectures to date. The most distinctive thing about VGG16 is that, instead of a large number of hyper-parameters, it relies on convolution layers with 3x3 filters and stride 1 (always with same padding) and max-pool layers with 2x2 filters and stride 2. It follows this arrangement of convolution and max-pool layers consistently throughout the whole architecture. At the end it has two fully connected (FC) layers followed by a softmax for output. The 16 in VGG16 refers to its 16 layers that have weights. It is a fairly large network, with approximately 138 million parameters.
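The architecture described above ships ready-made with Keras; a brief sketch that instantiates it, plus a headless variant for the feature-extraction use mentioned in section 4.1 (weights=None builds the architecture without downloading pretrained weights; pass weights="imagenet" to use them):

```python
from tensorflow.keras.applications import VGG16

# Full classifier as described above (conv/max-pool stacks, then fully
# connected layers with a softmax head); weights=None skips the download
model = VGG16(weights=None)
n_params = model.count_params()  # roughly 138 million, as noted above

# Headless variant for feature extraction: drop the FC/softmax head and
# average-pool the last conv block into a 512-dimensional feature vector
features_model = VGG16(weights=None, include_top=False, pooling="avg")
```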
References
• https://scikit-learn.org/stable/
• https://keras.io/
• https://www.tensorflow.org/
• https://www.researchgate.net/publication/266032463_Video_Summarization_Using_Clustering
• https://opencv.org/