Video Metadata Generation and Classification
Table of Contents
1. Introduction
  1.1 Goal
  1.2 Setup
2. Theoretical Background
  2.1 Machine learning basics
  2.2 Supervised learning
  2.3 Classification
  2.4 Metadata
3. Algorithm to Classify Videos
  3.1 Convolutional Neural Network
  3.2 Natural Language Processing
    3.2.1 Speech Recognition
4. Experimental Procedure
  4.1 Preparing the dataset
    4.1.1 Dataset for CNN
      4.1.1.1 Algorithm
    4.1.2 Dataset for NLP model
      4.1.2.1 Dataset format
  4.2 Machine Learning Model for Classification
  4.3 VGG19 model description
    4.3.1 VGG19 Architecture
  4.4 Random Forest Classifier Architecture
    4.4.1 Why we use the Random Forest classifier
5. Requirement Specification
  5.1 Development-side requirements
1. Introduction
Video classification is an important and widely researched problem in the field of computer vision, and machine learning models have proved to perform significantly well on it. Video clips are usually of variable length, but they must be mapped onto smaller, fixed-dimensional feature-vector representations before they can be processed by CNNs. ISRO holds a large collection of videos stored as chunks and wants to sort them into various categories. Doing this manually is not feasible, so they required software that automatically classifies videos into categories such as launch, space, interview, outdoor launchpad, graphics, and crowd.
1.1 Goal
The main goal of this project is to design software that classifies videos. We have therefore built a system that uses machine learning algorithms and natural language processing to sort videos into 30 predefined categories.
1.2 Setup
The software takes videos as input, classifies them, and generates their metadata in text format. Building the model through machine learning requires a set of training data, which the learning algorithm uses to fit the model. The training data here is experimental data provided by the Indian Space Research Organisation: 3,600 videos spanning 30 categories. The given dataset is unstructured, but since we use supervised learning algorithms to classify the videos, we first pre-process it into a structured form. Once a model has been trained, it must be validated on an auxiliary set of data, called the validation data set.
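The split into training and validation data described above can be sketched in plain Python. The file names, the 80/20 ratio, and the fixed seed below are illustrative assumptions, not values taken from the report.

```python
import random

def split_dataset(samples, val_fraction=0.2, seed=42):
    """Shuffle (video, label) pairs and split them into training and
    validation sets. val_fraction and seed are illustrative choices."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (train, validation)

# Hypothetical sample list: 3600 videos across 30 categories, as in the dataset.
samples = [(f"video_{i}.mp4", f"category_{i % 30}") for i in range(3600)]
train, val = split_dataset(samples)
```

With 3,600 samples and a 20% validation fraction, this yields 2,880 training and 720 validation examples with no overlap between the two sets.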
Diagram 1.1: A machine learning model with input and target output.
2. Theoretical Background
The theoretical framework is the structure that holds or supports the theory of a research study. It introduces and describes the theory that explains why the research problem under study exists.
2.1 Machine learning basics
Machine learning is an application of AI that enables systems to learn and improve
from experience without being explicitly programmed. Machine learning focuses
on developing computer programs that can access data and use it to learn for
themselves.
2.2 Supervised learning
Supervised machine learning algorithms apply what has been learned from labelled data in the past to new data in order to predict events. By analysing a known training dataset, the learning algorithm produces a function that predicts the output for unseen inputs.
2.3 Classification
Classification is the process of categorizing a given set of data into classes; it can be performed on both structured and unstructured data. The process starts with predicting the class of given data points. The classes are often referred to as targets, labels, or categories. Machine learning distinguishes three types of classification: binary, multi-class, and multi-label. We use multi-class classification to classify the videos.
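As a minimal illustration of the multi-class decision rule used here, a model produces one score per category and the single highest-scoring category is chosen; a multi-label decision, shown for contrast, would instead keep every category above a threshold. The category names and scores below are hypothetical.

```python
def predict_multiclass(scores):
    """Multi-class decision: return the single category with the highest score."""
    return max(scores, key=scores.get)

def predict_multilabel(scores, threshold=0.5):
    """Multi-label decision (for contrast): keep every category above a threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)

# Hypothetical scores over three of the 30 categories.
scores = {"launch": 0.62, "space": 0.55, "interview": 0.03}
```

For these scores the multi-class rule returns only "launch", while the multi-label rule would return both "launch" and "space".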
2.4 Metadata
Metadata is "data about data", so video metadata is data about a video. It records the category of the video, for example whether it is an interview video, a satellite video, a satellite-launching video, or a control-room video. Metadata may also carry a descriptive summary of the video: for a satellite-launching video it might read "the video is about the launching of the XYZ satellite by GSLV", and for an interview video it might read "XYZ person said that ISRO is planning a Mars mission on XYZ date". Moreover, metadata can record properties such as the length of the video in time.
3. Algorithm to Classify Videos
A convolutional neural network consists of five kinds of layers, which are the following:
i) input layer
ii) convolutional layer
iii) pooling layer
iv) fully connected layer
v) output layer
In this project, a CNN (the VGG19 model) is used to classify videos through image recognition, and NLP is used to classify videos based on their text.
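To make the roles of these layers concrete, the sketch below traces how the spatial size of a feature map changes through a VGG-style stack: each 3x3 convolution uses "same" padding (size unchanged) and each 2x2 max-pool halves the size. The 224x224 input resolution is the usual VGG default and is an assumption here.

```python
def conv3x3_same(size):
    """A 3x3 convolution with 'same' padding keeps the spatial size."""
    return size

def maxpool2x2(size):
    """A 2x2 max-pool with stride 2 halves the spatial size."""
    return size // 2

size = 224                      # assumed VGG-style input resolution
for _ in range(5):              # five conv blocks, each ending in a max-pool
    size = conv3x3_same(size)
    size = maxpool2x2(size)
# Spatial size after the five pooling stages: 224 -> 112 -> 56 -> 28 -> 14 -> 7
```

The fully connected layers then operate on this final 7x7 feature map.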
Flowchart (classification pipeline): Start → read the input folder (stop if it does not exist) → find the video files → create the output folder, naming the directory if not present → set the frame counter to 0 and loop over frames 1 to the last frame → Random Forest classifier → category of the video.
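The pipeline in the flowchart can be sketched as follows. The `classify_frames` callable stands in for the frame loop plus the Random Forest classifier; real frame extraction would use a library such as OpenCV, and the `.mp4` extension and per-video text files are assumptions for illustration.

```python
from pathlib import Path

def classify_videos(input_dir, output_dir, classify_frames):
    """Sketch of the pipeline: scan the input folder, create the output
    folder if needed, and write one category file per video."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    if not input_dir.exists():                     # "if not exists" branch: stop
        raise FileNotFoundError(f"input folder not found: {input_dir}")
    output_dir.mkdir(parents=True, exist_ok=True)  # name directory if not present
    results = {}
    for video in sorted(input_dir.glob("*.mp4")):  # finding the video files
        category = classify_frames(video)          # frame loop + classifier stub
        (output_dir / f"{video.stem}.txt").write_text(category)
        results[video.name] = category
    return results
```

A caller would pass the real frame-sampling-plus-classification routine as `classify_frames`, keeping the folder handling separate from the model.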
In simple terms, VGG is a deep CNN used to classify images. The layers in the
VGG19 model are as follows:
Conv3x3 (64), Conv3x3 (64), MaxPool, Conv3x3 (128), Conv3x3 (128), MaxPool,
Conv3x3 (256), Conv3x3 (256), Conv3x3 (256), Conv3x3 (256), MaxPool,
Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), MaxPool,
Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), MaxPool, Fully
Connected (30), Fully Connected (30), Fully Connected (30), SoftMax.
Here we use the transfer-learning concept of machine learning. Transfer learning means reusing an existing model to solve a new challenge or problem. It is not a distinct type of machine learning algorithm; rather, it is a technique applied while training models.
5. Requirement Specification
The following requirements are needed to develop the software on the development side and to operate it on the client side:
i) Processor: Intel i5
ii) RAM: 8 GB
iii) Hard disk: 1 TB
Keras model summary of the VGG19-based image classifier:
=================================================================
Total params: 20,777,054
Trainable params: 752,670
Non-trainable params: 20,024,384
_________________________________________________________________
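The parameter totals in the summary above can be cross-checked by counting weights layer by layer. The sketch below assumes a 3-channel input, the VGG19 convolutional stack listed in section 4.3.1, and a single dense layer with 30 outputs on the flattened 7x7x512 feature map (the head architecture is an assumption inferred from the trainable-parameter count, not stated in the report); under those assumptions the frozen and trainable counts match the summary exactly.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch

# VGG19 convolutional stack: channel widths of its 2+2+4+4+4 = 16 conv layers.
widths = [64, 64, 128, 128] + [256] * 4 + [512] * 8

frozen = 0
in_ch = 3                                  # RGB input
for out_ch in widths:
    frozen += conv_params(in_ch, out_ch)
    in_ch = out_ch

# Assumed head: one dense layer with 30 outputs on the flattened
# 7 x 7 x 512 feature map produced by the frozen VGG19 base.
trainable = (7 * 7 * 512) * 30 + 30
total = frozen + trainable
```

Evaluating this gives frozen = 20,024,384, trainable = 752,670, and total = 20,777,054, in agreement with the summary.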
Keras model summary of the second network:
=================================================================
Total params: 1,075,033
Trainable params: 1,075,033
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
1/1 [==============================] - 5s 5s/step - loss: 3.2340 - acc: 0.0400 - val_loss: 3.2362 - val_acc: 0.0000e+00
Epoch 2/10
1/1 [==============================] - 0s 228ms/step - loss: 3.2079 - acc: 0.0000e+00 - val_loss: 3.2180 - val_acc: 0.0000e+00
Epoch 3/10
1/1 [==============================] - 0s 225ms/step - loss: 3.2105 - acc: 0.1200 - val_loss: 3.1999 - val_acc: 0.2857
Epoch 4/10
1/1 [==============================] - 0s 264ms/step - loss: 3.1867 - acc: 0.2000 - val_loss: 3.1814 - val_acc: 0.5714
Epoch 5/10
1/1 [==============================] - 0s 232ms/step - loss: 3.1621 - acc: 0.4400 - val_loss: 3.1619 - val_acc: 0.6000
Epoch 6/10
1/1 [==============================] - 0s 221ms/step - loss: 3.1476 - acc: 0.4800 - val_loss: 3.1413 - val_acc: 0.6400
Epoch 7/10
1/1 [==============================] - 0s 221ms/step - loss: 3.1466 - acc: 0.4400 - val_loss: 3.1187 - val_acc: 0.6000
Epoch 8/10
1/1 [==============================] - 0s 227ms/step - loss: 3.1272 - acc: 0.7600 - val_loss: 3.0935 - val_acc: 0.6500
Epoch 9/10
1/1 [==============================] - 0s 232ms/step - loss: 3.0906 - acc: 0.9600 - val_loss: 3.0646 - val_acc: 0.6300
Epoch 10/10
1/1 [==============================] - 0s 239ms/step - loss: 3.0846 - acc: 0.8800 - val_loss: 3.0304 - val_acc: 0.6300
The final category is chosen by combining the two model outputs:
i) Let out = the output of the VGG19 model and out2 = the output of the NLP model.
ii) Loop over the three entries of out.
iii) If out2's category is present in out, go to (iv); otherwise go to (v).
iv) category = out2[0]
v) category = out[0]
vi) print(category)
vii) print(metadata)
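The decision rule in steps (i)-(vii) can be written as a small function. Here `out` is assumed to be the VGG19 model's top-3 category list and `out2` the NLP model's prediction list, as in the steps above; the category names are hypothetical.

```python
def combine_predictions(out, out2):
    """Prefer the NLP category when it also appears among the VGG19 top-3,
    otherwise fall back to the VGG19 top-1 (steps iii-v above)."""
    for candidate in out[:3]:          # loop over the three VGG19 candidates
        if candidate == out2[0]:
            return out2[0]             # NLP agrees with a visual candidate
    return out[0]                      # no agreement: trust the top visual guess
```

So when the transcript-based prediction corroborates one of the visual candidates it wins; otherwise the image model's best guess is used.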
Result:
Input video = meta.mp4
Output:
Category of video = [ Aerial view ]
Metadata = [ space agency the Indian Space Research Organisation of Maestro successfully launched its Earth observation satellite into Orbit along with your observation satellite EOS Rebel t4i source polar satellite launch Hue go as far as to others more life into the orbit Maestro as described the street as a marvelous accomplishment the earth observation satellite launch marks maestros first mission of 2022 was launched at the motion for a stroll plans to conduct a 3 other missions this year including the much-anticipated launch of the Chandrayaan 3 to the rule that ki slbc points to work it was launched from Satish Dhawan Space Centre in Andhra pradesh's Sriharikota holiday the rocket injected the observations on the light you s04 into synchronous Orbit some 529 km above the earth. ]
1. Research 10 days
2. Analysis 10 days
5. Programming 30 days
6. Testing 10 days
7. Documentation 10 days
8. Methodology
9. Testing
9.1 Testing objectives
Unit testing refers to tests that verify the functionality of a specific section of code, usually at the function level. In an object-oriented environment, this is usually the class level, and the minimal unit tests include the constructors and destructors.
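As a minimal, hypothetical example of such a function-level test written with Python's built-in unittest framework (the helper under test is invented for illustration, not part of the project code):

```python
import io
import unittest

def top_category(scores):
    """Hypothetical helper: return the label with the highest score."""
    if not scores:
        raise ValueError("no scores given")
    return max(scores, key=scores.get)

class TopCategoryTest(unittest.TestCase):
    def test_picks_highest_score(self):
        self.assertEqual(top_category({"launch": 0.7, "space": 0.2}), "launch")

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            top_category({})

# Run the suite quietly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TopCategoryTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Each test method checks one behaviour of the unit, including its error handling, which is exactly the granularity unit testing aims for.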
Integration testing is any type of software testing that seeks to verify the interfaces between components against a software design. Software components may be integrated in an iterative way or all together ("big bang"). Normally the former is considered better practice, since it allows interface issues to be localized and fixed more quickly.
System testing tests a completely integrated system to verify that it meets its requirements.
Validation is the process of evaluating software during or at the end of the development process to determine whether it satisfies the specified requirements. In other words, validation ensures that the product actually meets the user's needs and that the specifications were correct in the first place, while verification ensures that the product has been built according to the requirements and design specifications. Validation ensures that 'you built the right thing'; verification ensures that 'you built it right'. Validation confirms that the product, as provided, will fulfil its intended use.
Beta testing comes after alpha testing and can be considered a form of external user acceptance testing. Versions of the software, known as beta versions, are released to a limited audience outside of the programming team, so that further testing can ensure the product has few faults or bugs. Sometimes, beta versions are made available to the open public to increase the feedback field to a maximal number of future users.
White-box testing is when the tester has access to the internal data structures and algorithms, including the code that implements them.
10. Limitations
This software is specially designed for the Indian Space Research Organisation, and it predicts only videos that belong to the 30 predefined categories. Other categories can be supported as needed by retraining the model.
12. Conclusion
13. References