Video Metadata Generation and Classification
Table of Contents
1. Introduction
  1.1 Goal
  1.2 Setup
2. Theoretical Background
  2.1 Machine learning basics
  2.2 Supervised learning
  2.3 Classification
  2.4 Metadata
3. Algorithm to Classify Videos
  3.1 Convolutional Neural Network
  3.2 Natural Language Processing
    3.2.1 Speech Recognition
4. Experimental Procedure
  4.1 Preparing the dataset
    4.1.1 Dataset for CNN
      4.1.1.1 Algorithm
    4.1.2 Dataset for NLP model
      4.1.2.1 Dataset format
  4.2 Machine Learning Model for Classification
  4.3 VGG19 model description
    4.3.1 VGG19 Architecture
  4.4 Random Forest Classifier Architecture
    4.4.1 Why we use the Random Forest classifier
5. Requirement Specification
  5.1 Development-side requirements
1. Introduction
Video classification is an important and widely researched problem in the field of computer vision, and machine learning models have proved to perform significantly well on it. Video clips are usually of variable length, but they must be mapped onto smaller, fixed-dimensional feature-vector representations before they can be processed by CNNs. ISRO holds a large collection of videos stored as chunks and wants to sort them into various categories. Doing this manually is not feasible, so they required software that automatically classifies videos into categories such as launch, space, interview, outdoor launchpad, graphics, and crowd.
1.1 Goal
The main goal of this project is to design software that classifies videos. We have therefore built a system that uses machine learning algorithms and natural language processing to sort videos into 30 predefined categories.
1.2 Setup
The software takes videos as input, classifies them, and generates their metadata in text format. Building the model through machine learning requires a set of training data, which the learning algorithm uses to fit the model. The training data here is experimental data provided by the Indian Space Research Organisation: 3,600 videos spanning 30 categories. The given dataset is unstructured, but since we use supervised learning algorithms to classify the videos, we first pre-process it into a structured form. Once a model has been trained, it must be validated on an auxiliary set of data, called the validation data set.
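The split into training and validation data described above can be sketched in plain Python. The file names, the 80/20 ratio, and the fixed seed below are illustrative assumptions, not values taken from the report.

```python
import random

def split_dataset(samples, val_fraction=0.2, seed=42):
    """Shuffle (video, label) pairs and split them into training and
    validation sets. val_fraction and seed are illustrative choices."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (train, validation)

# Hypothetical sample list: 3600 videos across 30 categories, as in the dataset.
samples = [(f"video_{i}.mp4", f"category_{i % 30}") for i in range(3600)]
train, val = split_dataset(samples)
```

With 3,600 samples and a 20% validation fraction, this yields 2,880 training and 720 validation examples with no overlap between the two sets.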
Diagram 1.1: A machine learning model with input and target output.
2. Theoretical Background
The theoretical framework is the structure that holds or supports the theory of a research study. It introduces and describes the theory that explains why the research problem under study exists.
2.1 Machine learning basics
Machine learning is an application of AI that enables systems to learn and improve
from experience without being explicitly programmed. Machine learning focuses
on developing computer programs that can access data and use it to learn for
themselves.
2.2 Supervised learning
Supervised machine learning algorithms apply what has been learned from labelled data in the past to new data in order to predict events. By analysing a known training dataset, the learning algorithm produces a function that predicts the output for unseen inputs.
2.3 Classification
Classification is the process of categorizing a given set of data into classes; it can be performed on both structured and unstructured data. The process starts with predicting the class of given data points. The classes are often referred to as targets, labels, or categories. Machine learning distinguishes three types of classification: binary, multi-class, and multi-label. We use multi-class classification to classify the videos.
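As a minimal illustration of the multi-class decision rule used here, a model produces one score per category and the single highest-scoring category is chosen; a multi-label decision, shown for contrast, would instead keep every category above a threshold. The category names and scores below are hypothetical.

```python
def predict_multiclass(scores):
    """Multi-class decision: return the single category with the highest score."""
    return max(scores, key=scores.get)

def predict_multilabel(scores, threshold=0.5):
    """Multi-label decision (for contrast): keep every category above a threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)

# Hypothetical scores over three of the 30 categories.
scores = {"launch": 0.62, "space": 0.55, "interview": 0.03}
```

For these scores the multi-class rule returns only "launch", while the multi-label rule would return both "launch" and "space".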
2.4 Metadata
Metadata is "data about data", so video metadata is data about a video. It records the category of the video, for example whether it is an interview video, a satellite video, a satellite-launching video, or a control-room video. Metadata may also carry a descriptive summary of the video: for a satellite-launching video it might read "the video is about the launching of the XYZ satellite by GSLV", and for an interview video it might read "XYZ person said that ISRO is planning a Mars mission on XYZ date". Moreover, metadata can record properties such as the length of the video in time.
3. Algorithm to Classify Videos
A convolutional neural network consists of five kinds of layers, which are the following:
i) input layer
ii) convolutional layer
iii) pooling layer
iv) fully connected layer
v) output layer
In this project, a CNN (the VGG19 model) is used to classify videos through image recognition, and NLP is used to classify videos based on their text.
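To make the roles of these layers concrete, the sketch below traces how the spatial size of a feature map changes through a VGG-style stack: each 3x3 convolution uses "same" padding (size unchanged) and each 2x2 max-pool halves the size. The 224x224 input resolution is the usual VGG default and is an assumption here.

```python
def conv3x3_same(size):
    """A 3x3 convolution with 'same' padding keeps the spatial size."""
    return size

def maxpool2x2(size):
    """A 2x2 max-pool with stride 2 halves the spatial size."""
    return size // 2

size = 224                      # assumed VGG-style input resolution
for _ in range(5):              # five conv blocks, each ending in a max-pool
    size = conv3x3_same(size)
    size = maxpool2x2(size)
# Spatial size after the five pooling stages: 224 -> 112 -> 56 -> 28 -> 14 -> 7
```

The fully connected layers then operate on this final 7x7 feature map.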
Flowchart (classification pipeline): Start → read the input folder (stop if it does not exist) → find the video files → create the output folder, naming the directory if not present → set the frame counter to 0 and loop over frames 1 to the last frame → Random Forest classifier → category of the video.
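The pipeline in the flowchart can be sketched as follows. The `classify_frames` callable stands in for the frame loop plus the Random Forest classifier; real frame extraction would use a library such as OpenCV, and the `.mp4` extension and per-video text files are assumptions for illustration.

```python
from pathlib import Path

def classify_videos(input_dir, output_dir, classify_frames):
    """Sketch of the pipeline: scan the input folder, create the output
    folder if needed, and write one category file per video."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    if not input_dir.exists():                     # "if not exists" branch: stop
        raise FileNotFoundError(f"input folder not found: {input_dir}")
    output_dir.mkdir(parents=True, exist_ok=True)  # name directory if not present
    results = {}
    for video in sorted(input_dir.glob("*.mp4")):  # finding the video files
        category = classify_frames(video)          # frame loop + classifier stub
        (output_dir / f"{video.stem}.txt").write_text(category)
        results[video.name] = category
    return results
```

A caller would pass the real frame-sampling-plus-classification routine as `classify_frames`, keeping the folder handling separate from the model.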
In simple terms, VGG is a deep CNN used to classify images. The layers in the
VGG19 model are as follows:
Conv3x3 (64), Conv3x3 (64), MaxPool, Conv3x3 (128), Conv3x3 (128), MaxPool,
Conv3x3 (256), Conv3x3 (256), Conv3x3 (256), Conv3x3 (256), MaxPool,
Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), MaxPool,
Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), Conv3x3 (512), MaxPool, Fully
Connected (30), Fully Connected (30), Fully Connected (30), SoftMax.
Here we use the transfer-learning concept of machine learning. Transfer learning means reusing an existing model to solve a new challenge or problem. It is not a distinct type of machine learning algorithm; rather, it is a technique applied while training models.
5. Requirement Specification
The following requirements are needed to develop the software on the development side and to operate it on the client side:
i) Processor: Intel i5
ii) RAM: 8 GB
iii) Hard disk: 1 TB
Keras model summary of the VGG19-based image classifier:
=================================================================
Total params: 20,777,054
Trainable params: 752,670
Non-trainable params: 20,024,384
_________________________________________________________________
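The parameter totals in the summary above can be cross-checked by counting weights layer by layer. The sketch below assumes a 3-channel input, the VGG19 convolutional stack listed in section 4.3.1, and a single dense layer with 30 outputs on the flattened 7x7x512 feature map (the head architecture is an assumption inferred from the trainable-parameter count, not stated in the report); under those assumptions the frozen and trainable counts match the summary exactly.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * in_ch * out_ch + out_ch

# VGG19 convolutional stack: channel widths of its 2+2+4+4+4 = 16 conv layers.
widths = [64, 64, 128, 128] + [256] * 4 + [512] * 8

frozen = 0
in_ch = 3                                  # RGB input
for out_ch in widths:
    frozen += conv_params(in_ch, out_ch)
    in_ch = out_ch

# Assumed head: one dense layer with 30 outputs on the flattened
# 7 x 7 x 512 feature map produced by the frozen VGG19 base.
trainable = (7 * 7 * 512) * 30 + 30
total = frozen + trainable
```

Evaluating this gives frozen = 20,024,384, trainable = 752,670, and total = 20,777,054, in agreement with the summary.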
Keras model summary of the second network:
=================================================================
Total params: 1,075,033
Trainable params: 1,075,033
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
1/1 [==============================] - 5s 5s/step - loss: 3.2340 - acc: 0.0400 - val_loss: 3.2362 - val_acc: 0.0000e+00
Epoch 2/10
1/1 [==============================] - 0s 228ms/step - loss: 3.2079 - acc: 0.0000e+00 - val_loss: 3.2180 - val_acc: 0.0000e+00
Epoch 3/10
1/1 [==============================] - 0s 225ms/step - loss: 3.2105 - acc: 0.1200 - val_loss: 3.1999 - val_acc: 0.2857
Epoch 4/10
1/1 [==============================] - 0s 264ms/step - loss: 3.1867 - acc: 0.2000 - val_loss: 3.1814 - val_acc: 0.5714
Epoch 5/10
1/1 [==============================] - 0s 232ms/step - loss: 3.1621 - acc: 0.4400 - val_loss: 3.1619 - val_acc: 0.6000
Epoch 6/10
1/1 [==============================] - 0s 221ms/step - loss: 3.1476 - acc: 0.4800 - val_loss: 3.1413 - val_acc: 0.6400
Epoch 7/10
1/1 [==============================] - 0s 221ms/step - loss: 3.1466 - acc: 0.4400 - val_loss: 3.1187 - val_acc: 0.6000
Epoch 8/10
1/1 [==============================] - 0s 227ms/step - loss: 3.1272 - acc: 0.7600 - val_loss: 3.0935 - val_acc: 0.6500
Epoch 9/10
1/1 [==============================] - 0s 232ms/step - loss: 3.0906 - acc: 0.9600 - val_loss: 3.0646 - val_acc: 0.6300
Epoch 10/10
1/1 [==============================] - 0s 239ms/step - loss: 3.0846 - acc: 0.8800 - val_loss: 3.0304 - val_acc: 0.6300
The final category is chosen by combining the two model outputs:
i) Let out = the output of the VGG19 model and out2 = the output of the NLP model.
ii) Loop over the three entries of out.
iii) If out2's category is present in out, go to (iv); otherwise go to (v).
iv) category = out2[0]
v) category = out[0]
vi) print(category)
vii) print(metadata)
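The decision rule in steps (i)-(vii) can be written as a small function. Here `out` is assumed to be the VGG19 model's top-3 category list and `out2` the NLP model's prediction list, as in the steps above; the category names are hypothetical.

```python
def combine_predictions(out, out2):
    """Prefer the NLP category when it also appears among the VGG19 top-3,
    otherwise fall back to the VGG19 top-1 (steps iii-v above)."""
    for candidate in out[:3]:          # loop over the three VGG19 candidates
        if candidate == out2[0]:
            return out2[0]             # NLP agrees with a visual candidate
    return out[0]                      # no agreement: trust the top visual guess
```

So when the transcript-based prediction corroborates one of the visual candidates it wins; otherwise the image model's best guess is used.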
Result:
Input video = meta.mp4
Output:
Category of video = [ Aerial view ]
Metadata = [ space agency the Indian Space Research Organisation of Maestro successfully launched its Earth observation satellite into Orbit along with your observation satellite EOS Rebel t4i source polar satellite launch Hue go as far as to others more life into the orbit Maestro as described the street as a marvelous accomplishment the earth observation satellite launch marks maestros first mission of 2022 was launched at the motion for a stroll plans to conduct a 3 other missions this year including the much-anticipated launch of the Chandrayaan 3 to the rule that ki slbc points to work it was launched from Satish Dhawan Space Centre in Andhra pradesh's Sriharikota holiday the rocket injected the observations on the light you s04 into synchronous Orbit some 529 km above the earth. ]
1. Research 10 days
2. Analysis 10 days
5. Programming 30 days
6. Testing 10 days
7. Documentation 10 days
8. Methodology
9. Testing
9.1 Testing objectives
Unit testing refers to tests that verify the functionality of a specific section of code, usually at the function level. In an object-oriented environment, this is usually the class level, and the minimal unit tests include the constructors and destructors.
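As a minimal, hypothetical example of such a function-level test written with Python's built-in unittest framework (the helper under test is invented for illustration, not part of the project code):

```python
import io
import unittest

def top_category(scores):
    """Hypothetical helper: return the label with the highest score."""
    if not scores:
        raise ValueError("no scores given")
    return max(scores, key=scores.get)

class TopCategoryTest(unittest.TestCase):
    def test_picks_highest_score(self):
        self.assertEqual(top_category({"launch": 0.7, "space": 0.2}), "launch")

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            top_category({})

# Run the suite quietly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TopCategoryTest)
result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

Each test method checks one behaviour of the unit, including its error handling, which is exactly the granularity unit testing aims for.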
Integration testing is any type of software testing that seeks to verify the interfaces between components against a software design. Software components may be integrated in an iterative way or all together ("big bang"). Normally the former is considered better practice, since it allows interface issues to be localized and fixed more quickly.
System testing tests a completely integrated system to verify that it meets its requirements.
Validation is the process of evaluating software during or at the end of the development process to determine whether it satisfies the specified requirements. In other words, validation ensures that the product actually meets the user's needs and that the specifications were correct in the first place, while verification ensures that the product has been built according to the requirements and design specifications. Validation ensures that 'you built the right thing'; verification ensures that 'you built it right'. Validation confirms that the product, as provided, will fulfil its intended use.
Beta testing comes after alpha testing and can be considered a form of external user acceptance testing. Versions of the software, known as beta versions, are released to a limited audience outside of the programming team, so that further testing can ensure the product has few faults or bugs. Sometimes, beta versions are made available to the open public to increase the feedback field to a maximal number of future users.
White-box testing is when the tester has access to the internal data structures and algorithms, including the code that implements them.
10. Limitations
This software is specially designed for the Indian Space Research Organisation, and it predicts only videos that belong to the 30 predefined categories. Other categories can be supported as needed by retraining the model.
12. Conclusion
13. References