VISIONHUB: Suspicious Movement Classification and Weapon Object
Detection using Recurrent Neural Network (RNN) and Region Based
Convolutional Neural Network (R-CNN)
A research paper presented to the
College of Computer and Information Sciences Department in
Polytechnic University of the Philippines
In partial fulfillment of the
Requirements for Thesis Writing I
Presented by:
Caculitan, Harold
Erfe, Jefferson
Galindez, Elijah Christopher
Tarcenio, Earl Alvin
BSCS 3-1N
Presented to:
Mrs. Sherilyn Buban-Usero
June 2021
TABLE OF CONTENTS
Title Page
Table of Contents
List of Figures
List of Equations
List of Tables
Chapter 1: The Problem and its Setting
Introduction
Theoretical Framework
Conceptual Framework
Statement of the Problem
Scope and Limitations
Significance of the Study
Definition of Terms
Chapter 2: Review of Related Literature
Related Literature and Studies
Synthesis of the Study
Chapter 3: Methodology
Research Design
Sources of Data
Research Instrument
Proposed System Architecture/Design
Data Generation/Gathering Procedure
Ethical Consideration
Statistical Data Analysis
Bibliography
Appendices
Appendix 1: Application User Interface (Mobile)
Appendix 2: Application User Interface (Software)
Curriculum Vitae
LIST OF FIGURES
Number Title
1 Theoretical Framework
2 Conceptual Framework of the Study
3 Recurrent neural network and the unfolding in time of the computation involved in its forward computation
4 System Architecture of the Study
LIST OF EQUATIONS
Number Title
1 Equation Formula for Accuracy
LIST OF TABLES
Number Title
1 Advantages and Disadvantages of RNN
2 Interpretation of Percentage of Accuracy
Chapter 1
THE PROBLEM AND ITS SETTING
Introduction
According to Google, security is “the state of being free from danger or threat,”
while others define it as safety and the measures that are taken to be safe and
protected. The Philippines is also rated Level 3 on travel sites, meaning that travelers
should reconsider traveling to the Philippines because of increased crime in the
cities, such as theft, robbery, and crimes relating to illegal drugs, as well as
terrorism and civil unrest in different parts of the country.
Security cameras have become essential to the safety of personal belongings
and property. They are mainly used for surveillance, to watch over and record
a property. It is important for people to prevent crime in their homes and
protect their belongings. However, many security system software products
respond to a burglary with a delay, by which time the burglar is already gone
or it is already too late.
This study focuses on the accuracy of a system in detecting firearms or
deadly weapons and alerting the authorities before a burglary starts.
Burglary is defined as unauthorized entry to a building or premises with the use of
force and intent to steal goods. Burglary includes theft from a house, apartment,
shop, office, military establishment, and many more, with possession of a weapon
or firearm. There will come a time when people will choose to sacrifice their
freedom in the name of better security, even if that is against their beliefs, just to
keep their loved ones safe.
Deep Learning is a subset of machine learning that achieves great power
and flexibility by learning to represent the world as a nested hierarchy of
concepts, with each concept defined in relation to simpler concepts and
more abstract representations computed in terms of less abstract ones
(Mahapatra, 2018). Deep learning is all the rage today, as companies across
industries seek to use advanced computational techniques to find useful
information hidden across huge swaths of data. While the field of artificial
intelligence is decades old, breakthroughs in the field of artificial neural networks
are driving the explosion of deep learning (Woodie, 2017). The biggest
advantage of Deep Learning algorithms is that they learn high-level features
from data incrementally. This eliminates the need for domain expertise and
manual feature extraction, and a trained deep learning model also takes
comparatively little time to run. Deep Learning really shines when it comes to
complex problems such as image classification, natural language processing,
and speech recognition (Mahapatra, 2018).
Convolutional Neural Networks (CNNs) are everywhere. The CNN is probably the
most popular architecture of deep learning. Its main advantage compared to its
predecessors is that it automatically detects the important features without
any human supervision. The CNN is also computationally efficient. It uses
special convolution and pooling operations and shares parameters. This enables
CNN models to run on any device, making them universally attractive (Dertat,
2017). The main purpose of a Convolutional Neural Network (CNN) is to classify
images.
The researchers created a system that recognizes humans using Deep
Learning strategies. With the deep learning strategy called the
Convolutional Neural Network, it is possible to create a system that
produces accurate recognition. The researchers claim that the system
would help by detecting weapons used for illegal activities through CCTV, which
would help LGUs in serving the community. Lastly, it can help
computer science research, as others can refer to and improve on the knowledge
of the algorithms used in the study. The purpose of this study is to solve the
following computational problem: given a set of actions from a given individual,
can the system detect whether the individual performs movements deemed
suspicious and brings out objects that warrant danger?
Theoretical Framework
Figure 1.1
This chapter describes and summarizes the most important literature. It also
provides insight into the knowledge area, theme, central concepts, and the
relations between the central concepts of this research. This includes discussions
and dilemmas of leading scholars within the privacy field and insights in recent
developments.
The concept of privacy is rather multi-dimensional. Among the first
to define the right to privacy were Warren & Brandeis, in their 1890
article ‘The Right to Privacy’. Both were lawyers at the time, and they
defined privacy as the right to be let alone (Warren & Brandeis, 1890).
Right to privacy
After the early definition of privacy by Warren & Brandeis mentioned
previously, other researchers were more concerned with discussing privacy as
a legal and moral right. In the 1890s, the media and the common press had a
large number of tools and instruments that could help them monitor and even
register the everyday lives of citizens. Warren & Brandeis argued that the laws
existing at the time were not enough on their own to protect the public against
invasion by the common press.
Privacy and the age of computers
Alan Westin's book Privacy and Freedom made a large and significant
contribution to the growing concerns regarding privacy in the 1960s. The
author argued that several types of surveillance posed a risk to people’s
privacy. One was ‘physical surveillance’, which covered the cameras and
wiretaps that could track and trace people (Westin, 1967, p. 69). New
technologies also enabled monitoring in an even more efficient and effective
way, which could create large power imbalances between individuals (who
wanted to be anonymous) and those using surveillance techniques. According to
the author, both business companies and the government used these new
surveillance types. The other types concerned psychological and
privacy-related surveillance.
Privacy and the age of electronic networks
Finally, from the 1970s through the early 2000s, privacy concepts were
elaborated further to include protection against Internet tracking (late
1990s) and the use of networked databases (early 21st century). Researchers in
that period described privacy as a concept concerned not merely with the
protection of an individual's personal data but also with the protection of a
situation that allows individuals to make personal choices and express their
own emotions.
NOTIONS OF PRIVACY
The private sphere
The protection of the private sphere can be understood as protection
from recording and even observation of domestic affairs and (group)
association. For CCTV surveillance, this means protection of sensitive
matters such as people’s habits and their association with whomever they want.
Bodily integrity
Bodily integrity refers to the privacy of location and space that individuals
have. People have the right to move in public without being monitored,
identified, or even tracked. When citizens are free to move about public space
without fear of identification, monitoring or tracking, they experience a sense
of living in a democracy and of freedom. These subjective feelings, together
with protection from physical invasion, contribute to a healthy, well-adjusted
democracy (Dubbeld, 2004; Friedewald et al., 2014).
Informational privacy
Informational privacy means that personal data is protected by laws and
legislation and is not publicly available to organizations and other individuals.
CLOSED-CIRCUIT TELEVISION SURVEILLANCE
CCTV surveillance versus the photo camera
CCTV surveillance can be seen as an extension of the surveillance
capacities of commonly used photo cameras, which can make privacy definitions,
and for example the privacy notions discussed previously, insufficient. That
is, the differences and similarities between CCTV surveillance and the photo
camera raise doubts about the concept of privacy.
CCTV and processing the electronic data
CCTV surveillance reveals aspects of people’s identities that cannot
easily be derived from textual data; however, the observed individual remains
largely anonymous under CCTV surveillance.
Perception & Reaction to CCTV surveillance
According to several researchers, CCTV surveillance influences people's
feelings, behaviour, and cognitions (e.g. Baumeister & Leary, 1995; Van
Rompay et al., 2009). Together, these components form a person's attitude
(Ajzen & Fishbein, 1980).
Conceptual Framework
The conceptual model of this study is visualized in the diagram below. This
study expects that recording and detecting any suspicious movements or acts
with the camera will benefit the owner of the house, who will be aware of
the things that can happen, whether good or bad.
Figure 1.2
Research Paradigm of the study
The data used in this study will come from the owner’s information,
provided upon registering in the software, as well as CCTV footage and detected
motion images. The owner’s information will be kept private to ensure that it
is safe from information leaks. Remote access by the owner, one of the system's
features, will also be used as an input for various purposes. The user has
the option of operating the camera, or not, using a smartphone. All
footage and images captured by the camera will be processed to recognize and
identify suspicious movement as well as deadly weapons and firearms. In
addition, the system will immediately alert the owner when it detects malicious
acts and objects.
In the end, the owner's responses will be the key to knowing how accurate
VisionHub is in raising awareness among its users.
Statement of the Problem
This study deals with how to improve security systems in Filipino
homes as well as the security of other infrastructure. The study aims to help
less fortunate Filipinos who cannot afford to spend large amounts of money to
secure their homes. By the end of this study, it aims to answer the question “Can
this type of minimal security system help protect a family’s home, and can it keep
up with the modern security systems that people have now?”
1. What is the accuracy rate of VisionHub in detecting suspicious movements
as well as informing the owner about breaking and entering on his/her
premises?
2. What is the accuracy rate of VisionHub in detecting deadly weapons and
firearms?
3. What is the accuracy rate of VisionHub in alerting the proper authorities
during unlawful activities?
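The accuracy rate asked about in these questions can be read as the share of
correct system outputs among all evaluated samples. The following is a minimal
sketch under that common definition; it is not the paper's own formula (listed
as Equation 1), and the function name and the sample labels are hypothetical.

```python
def accuracy_rate(predictions, ground_truth):
    """Return the percentage of predictions that match the ground-truth labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("at least one labeled sample is required")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return 100.0 * correct / len(ground_truth)


# Hypothetical labels for five recorded clips (illustration only).
predicted = ["suspicious", "normal", "normal", "suspicious", "normal"]
actual = ["suspicious", "normal", "suspicious", "suspicious", "normal"]
rate = accuracy_rate(predicted, actual)  # 4 of the 5 clips are labeled correctly
```

The same function could be applied separately to each question, for example
once for movement labels and once for weapon labels.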
Scope and Limitation
The VisionHub program can be accessed wherever the user is, as long as
they have access to an Internet connection or mobile data.
The VisionHub program cannot play back footage caught on the camera
unless it is being recorded. It will capture an image; however, it can save only one
image at a time, so if multiple motions are detected it cannot register all of
the movement.
VisionHub is available on Android devices, but not all versions of the Android
system can support it. For computer systems, it is available only on Windows
operating systems. The webcam drivers should be set up on the connected
device, such as a computer or a mobile phone. It will not be accessible from mobile
phones if the computer is off.
VisionHub software can only detect the suspicious clothing and movements,
weapons, and firearms in this list:
1. Firearms classification
a. Pistols
b. Rifles
c. Shotguns
2. Weapons classification
a. Blunt weapons (Hammers, Sledge Hammers, Bats)
b. Blade weapons (Any form of knives or pointy edged
weapons)
3. Suspicious Clothing
a. Balaclava
b. Ski mask
4. Suspicious Movements
a. Wearing common burglar outfits
b. Sneaking
c. Climbing over fences and windows
d. Forceful entry
e. Breaking and entering
VisionHub software will also take into account different factors that must
be dealt with in order to help the software achieve its best results. These
factors are the following:
• The brightness can affect the study. The researchers will try to simulate it
in a well-lit and a semi-dark room to capture the image in each frame.
• The range can also affect the study. The researchers will require a
minimum distance of 2-3 meters from the camera to avoid blurriness in
the image and achieve a clearer and more accurate result.
• The weather can also affect the study. A sunny day and a rainy day can
produce very different results, and the researchers will try to simulate
both conditions.
• The quality of the camera can also affect the study. The outcome of the
study can rely heavily on the camera. The researchers will use a 1080p
camera and a computer that has at least a decent amount of storage.
• The study will focus on the movement of the person as well as the hands, in
case he/she has a weapon.
• The number of persons in the camera’s field of view can also affect the
performance of the software. The software will have a hard time identifying
each and every person in the camera angle, so the researchers will test with
at least 2 persons in view to check whether the software is still able to
identify and analyze the scene.
Significance of the Study
This study aims to provide a broader view of smart camera detection.
The proposed device is intended as a means of reducing incidents of theft.
This study would be beneficial to the following:
To the People/Owners of any property, who want to protect and keep safe their
personal belongings and to have peace of mind.
To the Proper Authorities, who can improve their response time and actions to
prevent unlawful activities.
To the Future Researchers, to whom this study could lend a hand in
acquiring information that this proposal fails to cover, and for whom it
could serve as a source for learning more about the project. The researchers
used the Region Based Convolutional Neural Network (R-CNN) for
object detection together with the Recurrent Neural Network (RNN) for video
classification. Future researchers can also help further advance the use or
purpose of this system in the field of object detection and video classification.
Definition of Terms
The following words and phrases are defined conceptually to help readers
understand how they are used in this study.
Burglary
Unauthorized entry to a building or premises with the use of
force and intent to steal goods.
CCTV
Closed-Circuit Television, a TV system in which signals
are not publicly distributed but are monitored, primarily for surveillance and
security purposes.
CNN
A convolutional neural network is a type of artificial neural network used in
image recognition and processing that is specifically designed to process pixel
data.
Deep Learning
An artificial intelligence (AI) function that imitates the workings of the human
brain in processing data and creating patterns for use in decision making.
Intruder Detection
The process of detecting intruders behind attacks as unique persons.
Motion Detection
The process of detecting a change in the position of an object relative to
its surroundings or a change in the surroundings relative to an object.
Object Detection
A computer technology related to computer vision and image processing
that deals with detecting instances of semantic objects of a certain class in
digital images and videos.
Security System
A method by which something is secured through a system of interworking
components and devices.
R-CNN (Region Based Convolutional Neural Network)
A family of machine learning models for computer vision and specifically
object detection.
RNN (Recurrent Neural Network)
A class of artificial neural network in which connections between nodes form a
directed graph along a temporal sequence. It is derived from the feedforward
neural network and is used to process variable-length sequences of inputs.
Theft
The taking of someone's property without the use of force.
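The RNN entry in the list above can be made concrete with a small sketch of the
recurrence that carries a hidden state along a sequence. This is only an
illustration of a basic (Elman-style) recurrent step written with NumPy; the
function name, weight shapes, and random inputs are assumptions and do not
describe the network used in VisionHub.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a basic recurrent layer over a sequence of feature vectors.

    inputs: array of shape (T, input_dim); T may differ between sequences.
    Returns the hidden state after each time step, shape (T, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state
    states = []
    for x_t in inputs:
        # The new state depends on the current input and the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


# Hypothetical sizes: 4 features per video frame, 3 hidden units.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(3, 4))
W_hh = rng.normal(scale=0.5, size=(3, 3))
b_h = np.zeros(3)

short_clip = rng.normal(size=(2, 4))  # 2 frames
long_clip = rng.normal(size=(5, 4))   # 5 frames, same weights are reused
short_states = rnn_forward(short_clip, W_xh, W_hh, b_h)
long_states = rnn_forward(long_clip, W_xh, W_hh, b_h)
```

Because the same weights are applied at every step, the two clips of different
lengths are processed without any change to the model.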
Chapter 2
REVIEW OF LITERATURE AND STUDIES
Crime Rates in The Philippines
Sanchez (2020) stated that crime in the Philippines is one of the concerns
faced by every local citizen, especially those living in the country's larger
urban areas such as Metro Manila. Maintaining security and order has been
challenging for the police because of increasing crime rates since 2009. The
Philippines scored among the lowest of the countries in the Asia
Pacific region in a statistical study of each nation's order and security
index. Furthermore, a recent finding put the country's intentional
homicide rate as the highest in all of Asia.
Home or Workplace Security Systems
Several studies have been conducted on home security systems.
The first was done by S. Tanwar, P. Patel, K. Patel, S. Tyagi, N.
Kumar and M. S. Obaidat, entitled "An Advanced Internet of Thing based Security
Alert System for Smart Home". It describes an inexpensive home security system
using Passive Infrared (PIR) and Raspberry Pi modules to minimize delays during
e-mail alerts. The PIR sensors serve as motion detectors and the Raspberry Pi as
the processing module.
Secondly, a study was conducted by Jayashri Bangali and Arvind
Shaligram entitled "Design and Implementation of Security Systems for Smart
Home based on GSM technology". It suggests two methods for home security
systems that are implemented in one application. The first method uses a web
camera to capture motion and objects, sound warnings, and send
feedback to the user. The second method sends SMS using a GSM-GPS
module (sim548c) together with an Atmega644p microcontroller, sensors, a relay,
and a buzzer.
The third study was conducted by Renuka Chuimurkar and Vijay Bagdi,
entitled "Smart Surveillance Security & Monitoring System Using Raspberry PI and
PIR Sensor". It discusses the design and implementation of a monitoring system
using Raspberry Pi and PIR sensors for mobile devices. The system can detect
smoke as well as humans, which allows precautions against potential
crime and potential fire. The hardware is a Raspberry Pi (RPI) with
OpenCV added to handle image processing, control the alarm, and send captured
photos to the user's email via WiFi. As the alarm for the initial sign, the
system plays the sound recording "intruder" or "smoke detected" when there is
a detection.
A system that uses laser rays and an LDR sensor to detect intrusion through
movement was proposed in 2016. A laser is focused on an LDR sensor, and the
moment the contact between the laser and the LDR sensor breaks, the alarm
connected to the sensor goes off, alerting the neighbors, and an SMS is sent to
the owner. This system solves the problem of covering places that are out of
range of fixed cameras, but it faces the same difficulty as other systems that
use GSM modules to send text messages: the delivery of the message depends on
network coverage. Also, because a laser is a straight beam, it can be avoided
by intruders who know about the system and are capable of dodging the lasers,
rendering the whole system useless.
A novel way to design an electronic lock is to use Morse code and IoT
technology. The authors claim that this is an original idea that has not been
tried before and is the first “optical Morse code-based electronic locking
system” of its kind. This system uses LEDs (light emitting diodes) as an
encrypting medium to send signals. To make it more accessible to the general
public, the LED in smart phones is used. On the receiver’s side are a
photosensitive resistor and a microcontroller, such as an Arduino processor,
which can decrypt the optical signal after receiving it from the LED. Upon
decoding the signal, it can then upload the current condition of the lock to a
cloud from where the owner can monitor the system. The authors performed
experiments on the system in real time, and it proved to work under different
illumination environments with all the functions working as intended. The
authors also claim to have an easy and user-friendly interface. The IoT system
developed here works very well, can be used by anyone, and is very convenient
because it uses mobile phones as the LED, which also makes it a cost-effective
alternative. Anitha et al (2016), who proposed a home automation system using
artificial intelligence, also proposed a model for cyber security systems.
A GSM Based Home Security Alarm System Using Arduino was
implemented. The proposed system consists of a PIR motion sensor, a GSM module,
and the Arduino microcontroller. The system works as follows: when the PIR sensor
detects any motion, the output of the sensor goes HIGH. This is detected by the
Arduino, which then communicates with the GSM module via serial communication to
make a call to the pre-programmed mobile number. The project was primarily
designed for detecting intruders and informing the owner by phone call. The
advantage of this system is that it is simple and affordable. However, it lacks
reliability: if there is traffic congestion in the mobile network, call
initialization may fail and the owner may not be alerted to the
intrusion.
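The control flow described above, together with its weakness under network
congestion, can be sketched as a small simulation. This is an illustration in
Python rather than Arduino code; the class, the function, the phone number, and
the choice to call only when the sensor output changes from LOW to HIGH are all
assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatedGsmModule:
    """Stand-in for a GSM modem; it records calls instead of dialing."""
    network_available: bool = True
    calls_placed: list = field(default_factory=list)

    def call(self, number: str) -> bool:
        # A congested or unavailable network makes the call attempt fail.
        if not self.network_available:
            return False
        self.calls_placed.append(number)
        return True


def monitor(pir_readings, gsm, owner_number):
    """Call the owner on each LOW-to-HIGH change; return the number of failed alerts."""
    failed = 0
    previous = False
    for reading in pir_readings:
        if reading and not previous:  # sensor output has just gone HIGH
            if not gsm.call(owner_number):
                failed += 1
        previous = reading
    return failed


OWNER_NUMBER = "+630000000000"  # hypothetical pre-programmed number
readings = [False, True, True, False, True]  # two separate motion events

online = SimulatedGsmModule()
failed_online = monitor(readings, online, OWNER_NUMBER)

congested = SimulatedGsmModule(network_available=False)
failed_congested = monitor(readings, congested, OWNER_NUMBER)
```

In the congested case every alert fails, which mirrors the limitation noted for
call-based notification.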
The authors designed and implemented a low-cost smart security camera
with night vision capability using a Raspberry Pi with a passive infrared motion
sensor. The Raspberry Pi and passive infrared sensor (PIR sensor) handle the
detection of moving bodies and the control algorithms for the alarms, and send
captured pictures to the user’s e-mail. As part of its alarm system, the spoken
word “intruder” is played through a loudspeaker when there is a detection. The
system uses an ordinary web camera whose infrared filter was removed in order
to give it night vision capability. With the help of a light dependent
resistor, it senses whether it is night or day; if it is night, the LED turns
ON when it detects an intruder. The advantage of the system is that it takes
day and night periods into account, using the light dependent resistor to keep
the system active at all times, and that it includes night vision capability.
The “Global System for Mobile Communication (GSM) Based Home
Security System with SMS Alert Using Human Body Motion Detective and GSM
Module” was an implemented system comprising an infrared motion detector
and a magnetic sensor as transducers for detecting an intruder’s motion or
break-in through a door. The signals were processed by an embedded
microcontroller unit, which activated the GSM module and sent an SMS message to
the house owner’s mobile phone while triggering an alarm at the location. The
system's advantages were cost effectiveness and remote alerting, but it lacked
the streaming video coverage that is essential to clearly identify the
intruder(s) and possibly apprehend them.
The “Home Alarm System using Detector Sensor” was proposed. The work
deployed Passive Infrared (PIR) sensors, a buzzer, a timer circuit (555 timer), and
a recording unit in the design. The authors employed multiple
sensors for efficient detection, with buzzers installed remotely. Its video is
stored only locally and is not internet enabled. This remains one of the
limitations of the system, because the device on which the video is recorded
and stored could be located and destroyed.
Several criteria have been used to select a security system required to
safeguard a facility. Chief among these have been the cost of
implementation, remote monitoring, and efficiency. The Raspberry Pi as a
microcomputer has immense functionality. Choosing the Raspberry Pi gives
room for a fast, reliable, cost-effective, and remote surveillance system. One
feature that distinguishes this research work from other Pi-dependent
surveillance systems is the ‘triple play’ access to the video or picture feed
of the Pi. Two of the access modes are internet based and allow remote access
from anywhere in the world, and the third is the storage of the feed on the
Pi's local storage. On intrusion, the system alerts the owner by SMS, sounds
the alarm located at a convenient distance, and records feeds that can be
accessed in 3 distinct ways. The system is characterized by an efficient video
camera for remote sensing and surveillance, with live video streaming and
recordings for subsequent replay.
Emergency Response Systems
A study published in J Med Internet Res (2016) described the
development of what was called the Personal Emergency Response System (PERS),
which embodied three generations of alarm devices over the previous
years. The first generation of these devices had a unit placed in a central location
of the home, with a switch or a cord that the user could pull in cases of emergency.
The second generation added wristbands, necklaces, or pendants with
a built-in button that the user could also use in cases of emergency. This allowed
open communication between the user and the responder through the main unit,
enabling the unit to direct a proper response. The accessories worked within
range of the main unit, which covered the inside of the house and part of the
outside. The third and latest generation of the PERS had the potential to
incorporate a range of devices, such as automatic fall alarms, fire alarms,
and even blood pressure devices, providing remote care to its user.
Most included studies that assessed displacement/diffusion provided no
evidence of such effects (Gómez et al., 2017; King et al., 2008; La Vigne & Lowry,
2011; Priks, 2014). However, one study found local displacement (Priks, 2015a)
and another found local diffusion of benefit (Munyo & Rossi, 2016). The study by
Munyo and Rossi (2016) was the only one that, in addition to assessing local
displacement/diffusion, also examined the effects of surveillance cameras on
aggregate crime in the city as a whole. Their analysis indicates that the crime
reduction in monitored areas was fully compensated by a crime increase in
unmonitored parts of the city.
Surveillance Systems
In the past, security systems only provided surveillance and control of
camera movement to cover certain vicinities, but advances in technology have
brought advances in security systems, mainly in surveillance. With the use of
Raspberry Pi and OpenCV, a real-time security system based on movement can be
created. OpenCV is an effective programming library that focuses on image
processing tools. Such a system aims to save money, time, and lives. The user
is alerted by warnings about suspicious movement seen by the camera. The system
is aided by PIR sensors, also referred to as Passive Infrared Sensors, which
sense motion based on the principle that everything emits a small amount of
radiation that PIR can detect.
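One common way to base a camera system on movement is frame differencing:
consecutive frames are compared, and motion is reported when enough pixels
change. The sketch below illustrates that idea with plain NumPy arrays standing
in for grayscale frames; it is not the method of the cited system, it does not
use the OpenCV API, and the function name and both thresholds are assumptions.

```python
import numpy as np


def motion_detected(prev_frame, frame, pixel_threshold=25, min_changed_fraction=0.01):
    """Report motion when enough pixels differ between two grayscale frames."""
    # Widen the type first so that subtracting uint8 values cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed = diff > pixel_threshold
    return bool(changed.mean() >= min_changed_fraction)


# Synthetic 120x160 frames: an empty scene, then a bright 30x30 object appears.
background = np.zeros((120, 160), dtype=np.uint8)
with_object = background.copy()
with_object[40:70, 60:90] = 200

still = motion_detected(background, background)
moving = motion_detected(background, with_object)
```

In a real deployment the frames would come from the camera, and the thresholds
would need tuning for lighting and camera noise.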
Nowadays, surveillance systems are very costly, mainly because of the
computers needed to operate the cameras, but with advances in mobile devices,
these devices can also be used to control surveillance
systems. With the help of Raspberry Pi, such a system can lower power
consumption for mobile devices and provide better resolution. The advantage of
real time systems is the rapid and immediate response for crime detection and
the ability to prevent crime in a short amount of time. The aim is to create a
smart surveillance system that can be accessed remotely by the user and can
send real time notifications when a strange movement is detected within the
vicinity of the camera.
Traditional surveillance techniques still use either wireless sensor
networks or closed-circuit television, with individuals monitoring each
camera feed. Some home surveillance systems make use of
computer vision techniques to recognize intrusion and report threat information,
including the type of trespasser and the possible weapon used. One such system
is SIDSS (Smart Intruder Detection and Surveillance System), which involves
three stages of computer vision algorithms. A Convolutional Neural Network
(CNN) focuses on threat and intruder detection. Cascading classifiers are
tasked with locating any intruders, using mechanisms that catch otherwise
undetected threats. Lastly, Principal Component Analysis (PCA) is used to train
the facial recognizer to distinguish civilians from potential intruders. With
this enhanced software, the system can achieve better accuracy in recognizing a
wider range of weapons and intruders.
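The PCA stage can be illustrated with a small sketch in which samples are
projected onto a few principal components and a new sample takes the label of
its nearest neighbor in that reduced space. This is a generic illustration, not
the SIDSS implementation; the function names, the synthetic 64-value "face"
vectors, and the two labels are assumptions.

```python
import numpy as np


def fit_pca(samples, n_components):
    """Return the sample mean and the top principal directions (as rows)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    # The rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def nearest_label(sample, mean, components, train_proj, labels):
    """Project a sample and return the label of the closest training projection."""
    z = components @ (sample - mean)
    distances = np.linalg.norm(train_proj - z, axis=1)
    return labels[int(np.argmin(distances))]


# Synthetic data: two base patterns with small noise, three samples each.
rng = np.random.default_rng(0)
base_a = rng.normal(size=64)
base_b = rng.normal(size=64)
train = np.stack(
    [base_a + 0.05 * rng.normal(size=64) for _ in range(3)]
    + [base_b + 0.05 * rng.normal(size=64) for _ in range(3)]
)
labels = ["civilian"] * 3 + ["potential intruder"] * 3

mean, components = fit_pca(train, n_components=2)
train_proj = (train - mean) @ components.T

query = base_a + 0.05 * rng.normal(size=64)  # a new sample close to the first pattern
predicted_label = nearest_label(query, mean, components, train_proj, labels)
```

Working in the reduced space keeps the comparison cheap, which is one reason
PCA is often paired with simple nearest-neighbor matching.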
An individual caught carrying a firearm or deadly weapon in public
is a strong indicator of a potentially dangerous situation. Some of these
situations can lead to incidents involving the massacre of innocent people by
those who have deadly intent and firearms. The rise of the digital age has
brought the rise of surveillance systems through CCTV for different uses such
as transportation systems, traffic monitoring, and vehicle recognition. Another
application of these cameras is security, where a system can automatically
detect firearms in pictures or video feeds, which can lead to a faster and more
efficient response from law enforcement personnel. This is made possible by
machine learning and computer vision, among the most prominent techniques for
automatic surveillance systems.
Image Processing
Image processing is a set of computational techniques for analyzing,
enhancing, compressing, and reconstructing images. Its main components are
importing, in which an image is captured through scanning or digital photography;
analysis and manipulation of the image, accomplished using various specialized
software applications; and output (Editors of Encyclopaedia Britannica, 2018).
According to Imatest (2018), image quality is one of those concepts that is
greater than the sum of its parts, and every image quality factor counts. The
Imatest guide introduces the key image quality factors and briefly describes how
Imatest™ measures each of them. Image quality measurements are affected by
(1) the lens, (2) the sensor, and (3) the image processing pipeline.
Human Motion Detection
To address the challenge of deploying AI for crime prevention, Fujitsu has
developed a behavioral analysis technology called “Actlyzer”. Actlyzer recognizes
complex behaviors from multiple basic actions or movements, without requiring the
AI to learn from a large amount of video data. With this technology, the AI learns
around 100 types of basic actions in advance, from simple actions such as
walking and stopping to more complex movements such as turning the head to
the right and raising the left hand. These basic actions can be recognized with an
average accuracy of 90 percent or more through deep learning.
Actlyzer can detect complex behavioral patterns, such as suspicious
behavior, by analyzing combinations of types, order, and place of basic actions as
well as the target of actions. For example, in detecting lock-picking, it is possible
to identify potentially suspicious behavior by specifying combinations of basic
actions or movements. Such a combination might be: "positioned in front of a door,
sitting, looking at the keyhole, and reaching for the keyhole." Conditions might also
be added, such as the place of occurrence and the target of actions.
Recognition accuracy of suspicious behavior can be further refined after the
system's introduction. This can be done by specifying additional behavioral
conditions, such as turning the head left and right to look around, and by changing
analysis parameters, such as each action’s duration.
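The idea of describing a complex behavior as an ordered combination of basic actions can be illustrated with a small sketch. This is not Fujitsu's implementation; the function name `matches_rule`, the rule tuple, and the list of observed actions are hypothetical, and the sketch assumes that an upstream recognizer has already produced a sequence of basic-action labels.

```python
from typing import Sequence

# Hypothetical rule: basic actions that must occur in this order.
LOCK_PICKING_RULE = (
    "positioned in front of a door",
    "sitting",
    "looking at the keyhole",
    "reaching for the keyhole",
)

def matches_rule(observed: Sequence[str], rule: Sequence[str]) -> bool:
    """Return True if every action in `rule` appears in `observed`, in order.

    Unrelated actions may occur in between (ordered-subsequence match).
    """
    remaining = iter(observed)
    # `action in remaining` consumes the iterator up to the first match,
    # so each rule step is searched only after the previous step was found.
    return all(action in remaining for action in rule)

# Assumed output of a basic-action recognizer for one video clip.
observed_actions = [
    "walking",
    "stopping",
    "positioned in front of a door",
    "sitting",
    "turning the head to the right",
    "looking at the keyhole",
    "reaching for the keyhole",
]
is_suspicious = matches_rule(observed_actions, LOCK_PICKING_RULE)
```

Additional conditions mentioned in the text, such as place of occurrence or action duration, could be added as extra fields on each observed action.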
Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are popular models that have shown
great promise in tasks that depend on past information, such as language
processing or video processing. The main idea behind RNNs is to use
sequential information. In other neural networks, all inputs and outputs are
assumed to be independent of each other, but for many tasks this assumption
does not work well. For example, to predict the next word in a sentence, it is
better to know which words came before it. RNNs are called recurrent because
they perform the same task for every element of a sequence, and the output
depends on the previous computations. RNNs can also be said to have a
“memory” that captures information about what has been calculated so far. In
theory, RNNs can make use of information in arbitrarily long sequences, but in
practice they are limited to looking back only a few steps. A typical RNN looks
like this:
Figure 2.1: A recurrent neural network and the unfolding in time of the computation
involved in its forward computation
The figure shows an RNN being unrolled (or unfolded) into a full network.
Unrolling simply means writing out the network for the full sequence. For
example, if the input consists of 5 sequences of a video, the network would be
unrolled into a 5-layer neural network, one layer for each sequence. The
parameters in the figure and the formulas of the RNN are as follows:
$x_t$ is the input at time step $t$. For example, with steps counted from zero,
$x_1$ could be a vector corresponding to the second sequence of a video. $s_t$ is
the hidden state at time step $t$. It is the “memory” of the network. $s_t$ is
calculated based on the previous hidden state and the input at the current step:
$$s_t = f(U x_t + W s_{t-1})$$
The function $f$ is usually a nonlinearity such as tanh or ReLU. $s_{-1}$, which is
required to calculate the first hidden state, is usually initialized to zero. $o_t$ is
the output at step $t$. For example, if we wanted to predict the position of a human
at the next time stamp in a video, it would be a vector of probabilities.
$$o_t = \mathrm{softmax}(V s_t)$$
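The two formulas above can be traced step by step with a small NumPy sketch of the forward pass. The layer sizes, the random weights, and the choice of tanh for f are assumptions made for illustration only; they are not the configuration used in this study.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """Run a vanilla RNN over the sequence `xs` (shape: steps x input_size).

    Implements s_t = tanh(U x_t + W s_{t-1}) and o_t = softmax(V s_t),
    with the initial hidden state s_{-1} set to zero.
    """
    s_prev = np.zeros(W.shape[0])  # s_{-1}
    states, outputs = [], []
    for x_t in xs:
        s_t = np.tanh(U @ x_t + W @ s_prev)  # hidden state ("memory")
        o_t = softmax(V @ s_t)               # output probabilities
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t
    return np.array(states), np.array(outputs)

# Assumed sizes for illustration: 5 time steps, 4 input features,
# 8 hidden units, 3 output classes.
rng = np.random.default_rng(0)
steps, input_size, hidden_size, output_size = 5, 4, 8, 3
U = rng.normal(scale=0.1, size=(hidden_size, input_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(output_size, hidden_size))
xs = rng.normal(size=(steps, input_size))

states, outputs = rnn_forward(xs, U, W, V)
```

Note that the same U, W, and V are reused at every time step, which is what makes the network recurrent.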
Training an RNN is similar to training other neural networks. The
backpropagation algorithm is used, but with a small change. Because the
parameters are shared by all time steps in the network, the gradient at each
output depends not only on the calculations of the current time step but also on
those of the previous time steps. For example, to calculate the gradient at t=6,
we would need to backpropagate 5 steps and sum up the gradients. This method is
named Backpropagation Through Time (BPTT). RNNs have shown great success in
many tasks, but the most commonly used type of RNN is the LSTM, which is much
better at capturing long-term dependencies than a vanilla RNN.
One study in the literature proposes a novel architecture based on a 2-layer
Recurrent Neural Network (RNN) to learn the movement patterns of normally
developing children, as observed from the combined time series of 4 tri-axial
accelerometers (one at each wrist and ankle). The RNN is configured to minimize
the reconstruction error when predicting the next 2 seconds of acceleration data
for the 4 sensors based on the past 6 seconds of data. Once trained, the RNN is
fed data from other participants to assess whether it is able to recognize
previously learnt patterns. The Pearson Correlation Coefficient is used to assess
the reconstruction similarity. Other authors not only showed that deep RNNs
achieved better results for Human Activity Recognition than other deep networks
such as Convolutional Neural Networks (CNN) and shallow methods such as
Support Vector Machines (SVM), but also showed that the architecture of the RNN
affects the results depending on the dataset. The number of layers and the size of
the memory cells must take into account not only the complexity of the activities
to be recognized but also the size of the dataset, in order to avoid overfitting.
That work achieved optimal results for RNN structures stacking 2, 3, or 4 layers
depending on the dataset, while other researchers achieved optimal results when
stacking 3 layers. The structure proposed in the cited study is adapted to the
dataset generated in its experimental part.
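The data preparation described above (6 seconds of past data used to predict the next 2 seconds, with similarity assessed by the Pearson correlation coefficient) can be sketched as follows. The sampling rate of 10 Hz, the synthetic sine signal, and the helper names `make_windows` and `pearson` are assumptions for illustration; the cited study's actual sampling rate and preprocessing are not stated here.

```python
import numpy as np

def make_windows(signal, past_len, future_len):
    """Slice a (time, channels) signal into (past, future) training pairs.

    Returns X with shape (n, past_len, channels) and
    Y with shape (n, future_len, channels).
    """
    n = signal.shape[0] - past_len - future_len + 1
    X = np.stack([signal[i:i + past_len] for i in range(n)])
    Y = np.stack([signal[i + past_len:i + past_len + future_len] for i in range(n)])
    return X, Y

def pearson(a, b):
    """Pearson correlation coefficient between two arrays of equal size."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

sampling_hz = 10   # assumed sampling rate
channels = 12      # 4 tri-axial accelerometers = 4 x 3 axes
t = np.arange(200) / sampling_hz  # 20 seconds of synthetic data
signal = np.stack([np.sin(t + k) for k in range(channels)], axis=1)

X, Y = make_windows(signal, past_len=6 * sampling_hz, future_len=2 * sampling_hz)

# A trained model would produce a prediction for Y; here the target itself
# stands in for a perfect prediction to show how the score is computed.
similarity = pearson(Y[0], Y[0])
```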
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM)
Traditional neural networks do not connect previous information to the
current task, whereas recurrent neural networks use loops to allow information to
persist. A novel RNN that recognizes objects from imagery and is also provided
with location information could improve overall accuracy to as much as 88.36%.
Experimental results show that better recognition performance can be achieved
using LSTM-RNNs and bidirectional LSTM-RNNs than using traditional RNNs. They
also show that LSTMP-RNNs (Long Short-Term Memory Projection Recurrent Neural
Networks) can mitigate overfitting problems. LSTM and Gated Recurrent Unit
(GRU) approaches perform slightly better than standard RNNs. LSTM networks
perform better than CNNs in most general cases, but training a CNN is much
faster than training an LSTM-based network.
Neural Network Human Motion Prediction
Recent work on short-term human motion prediction has focused on
recurrent neural network (RNN) architectures. Fragkiadaki et al. proposed an
RNN-based model that incorporates nonlinear encoder and decoder networks
before and after the recurrent layers. Their model is able to handle training across
multiple subjects and activity domains. Jain et al. introduced a method to
incorporate structural elements into an RNN architecture. Autoencoders can also
be used for denoising the prediction. Martinez et al. introduced a gated recurrent
unit (GRU) based approach with a residual connection in the loop function and
showed that it outperforms prior RNN-based methods. Pavllo et al. further
improved RNN-based prediction by changing the joint angle representation to
quaternions. However, this comes at the cost of additional normalization layers
and a normalization penalty. Recently, Wang and Feng introduced a
position-velocity recurrent encoder-decoder model (VRED). Their model adds a
velocity connection as an additional input to the GRU cell in the recurrent
structure.
Motion prediction based on recurrent neural networks promises good
results for predicting short-term motion. However, the models are trained on
human data only. Handling environmental constraints is not possible yet and would
require large amounts of training data. Human environments are typically cluttered
with objects and obstacles. Thus, in a human-robot collaboration scenario,
adapting to such environmental constraints is crucial for the prediction.
For our human prediction model, we adapt the VRED architecture by Wang
and Feng and modify it.
Advantages:
• An RNN can process inputs of any length.
• An RNN model is modeled to remember each piece of information throughout
time, which is very helpful in any time series predictor.
• Even if the input size is larger, the model size does not increase.
• The weights can be shared across the time steps.
• An RNN can use its internal memory for processing an arbitrary series of
inputs, which is not the case with feedforward neural networks.
Disadvantages:
• The computation of this neural network is slow.
• Gradient vanishing and exploding problems.
• Training an RNN is a very difficult task.
• It cannot process very long sequences if using tanh or ReLU as an
activation function.
Suspicious Movements
Amrutha C.V. et al. (2020) conducted a study that used a deep learning
approach to detect abnormal behavior within an academic environment such as
a school. They trained the model in three phases: data preparation, model
training, and inference. The researchers also used the KTH standard dataset, a
collection of sequences representing actions, each made up of its own sequence
of frames. The model reached an accuracy of 76% during its training phase,
using the LSTM architecture to classify which frames of human activity showed
suspicious or normal behavior. The limitation of their study, however, is that the
model was only used within a very specific environment and considered only the
suspicious behavior relevant to that location.
Convolutional Neural Network (CNN)
Image extraction involves extracting a minimal set of informative features
from a given image's pixel values. According to Sharma et al. (2018),
Convolutional Neural Networks (CNN) are an effective class of models that has
resulted in better image recognition, segmentation, detection, and retrieval
compared to other classifiers. The researchers also surmised that neural
networks are among the best emerging techniques for making a machine or
system intelligent enough to solve automated, real-life object categorization
problems. Deep learning is an integral part of CNN and has greatly improved
performance in many image processing functions, notably image classification
and categorization (Maczyta et al. 2019).
Often called a ConvNet, a CNN generalizes better than networks with fully
connected layers. CNN is preferred over other classical models for several
reasons. First, the number of parameters needed for its training is substantially
reduced. Second, because it has fewer parameters, it can be trained smoothly
and is less prone to overfitting. Lastly, large networks are more difficult to
implement using general models of artificial neural networks (ANN) than using
CNN. CNN is widely used to this day in various domains due to its remarkable
performance in object detection, face detection, speech recognition, facial
expression recognition, and many more (Indolia et al. 2018).
M. Prabu et al. (2020) proposed a model implementing the CNN algorithm,
which has an edge over regular neural networks, to detect and segregate
biodegradable and non-biodegradable waste. The proposed model was able to
classify and segregate these classes of waste with an accuracy of 92 percent on
the large dataset the researchers used. The model did so in real time, one item at
a time, and can be further improved with more testing.
Strengths and Weaknesses of CNN
While CNN has been a mainstay in object detection, it does have its
drawbacks. These drawbacks stem from the many variations that occur in the
real 3D world, including different angles, backgrounds, and lighting conditions.
Creating a ConvNet model that recognizes objects under such variations as well
as a human being can has proven difficult. The model has difficulty identifying an
object within an image if the object is rotated or scaled. CNN also lacks
coordinate frames, a mental model that keeps track of the different sides and
features of a given object and a basic component of human vision.
CNN has also been shown to fail at coordinate transform problems, which
require learning a mapping between x and y coordinates and positions in pixel
space. A study conducted by Rosanne Liu and her colleagues (2018) provided a
solution by giving the model access to its own input coordinates through extra
coordinate channels. The solution, called CoordConv, solved the coordinate
transform problem without sacrificing the computational and parametric
efficiency of ordinary convolutional networks.
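The core of the CoordConv idea, appending coordinate channels to the input before convolution, can be sketched in a few lines of NumPy. The helper name `add_coord_channels` and the toy frame size are assumptions for illustration; only the channel construction is shown, not a full convolutional layer.

```python
import numpy as np

def add_coord_channels(image):
    """Append row and column coordinate channels to an (H, W, C) image.

    Each coordinate channel is scaled to the range [-1, 1], so a following
    convolution can "see" where in the image each pixel is located.
    """
    h, w = image.shape[:2]
    rows = np.linspace(-1.0, 1.0, h)
    cols = np.linspace(-1.0, 1.0, w)
    # indexing="ij" makes row_channel[i, j] == rows[i], col_channel[i, j] == cols[j]
    row_channel, col_channel = np.meshgrid(rows, cols, indexing="ij")
    coords = np.stack([row_channel, col_channel], axis=-1)
    return np.concatenate([image.astype(float), coords], axis=-1)

frame = np.zeros((4, 6, 3))            # toy RGB frame, assumed size
augmented = add_coord_channels(frame)  # shape (4, 6, 5)
```

The augmented array would then be passed to an ordinary convolutional layer whose kernels have two extra input channels.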
Detecting human movement using Recurrent Convolutional Neural Networks
Detecting human movement is one of the primary objectives of this study.
However, CNN has difficulty handling moving frames because it lacks coordinate
frames. M. Kusuma et al. (2019) provided a solution by applying both the CNN
algorithm and R-CNN for human segmentation to obtain coordinate points in a
moving image. Recurrent Convolutional Neural Networks are capable of such
tasks, for example detecting human movements in different environments,
lighting conditions, and illumination levels. In their study, the CNN algorithm
separates the human from an image and draws a bounding box to provide focal
points within the body. R-CNN is then used to segment the body and convert it
into a skeleton.
The R-CNN algorithm transforms each analyzed frame into a 3D space from
which coordinates are taken. The coordinates are then used to identify which
actions and movements a human is performing. The model learns to determine
such movements through repeated training on datasets of multiple individuals
performing different actions. The downside of training such an algorithm,
however, is the heavy use of GPU computational power; the researchers used an
NVIDIA GeForce GTX 1080 for the study.
Synthesis of the Study
As stated in the research and studies reviewed, the CNN algorithm has
proven effective for image processing and object identification when compared
with other models of artificial neural networks (ANN), based on the study of
Indolia et al. (2018). The study of M. Prabu et al. (2020) showed that the
algorithm works as a model for classifying and organizing objects. While it may
do this at a slower pace, given enough time, training, and information about its
intended task, it can carry out recognition, detection, and segmentation at a wide
scale better than other classifiers, as shown in the study of Sharma et al. (2018).
Such systems are much needed in developing security systems, given the
continued rise of crime in the Philippines reported in the study of Sanchez (2020).
Home security systems will become vital in countering this rise.
The Convolutional Neural Network has proven successful at vision-related
tasks. However, the model has its faults: the study of Rosanne Liu et al. (2018)
showed that it fails at coordinate transform problems, and the authors proposed
their own solution of adding extra coordinate channels. Another answer to CNN's
lack of coordinate frames comes from the paper of M. Kusuma et al. (2019),
which uses CNN to generate a boundary around a human in an image and then
applies R-CNN to segment the human and provide points. The resulting skeleton
is then used to identify whether the subject is a human or not, even if the data
was gathered from a video or a moving image.
Chapter 3
METHODOLOGY
Research Design
The researchers will use an experimental design in implementing this study.
They will test the accuracy of the software in detecting suspicious movements of
individuals and in detecting firearms or other deadly weapons.
The researchers will use a Convolutional Neural Network (CNN) model to
train the software to detect suspicious movements, firearms, and other deadly
weapons from a dataset provided by the researchers.
To measure the accuracy of the software, the researchers will ask
homeowners in barangays to participate and test the software in their households.
Their feedback will determine whether the software is able to detect the factors
that the researchers set.
To measure the efficacy and performance of the software, the researchers
will test how long it takes for the software to alert the authorities about illegal
activities.
The study focuses on suspicious movements that can lead to illegal
activities and on the possession of firearms or other deadly weapons.
Sources of Data
The data for this study will be collected from users who take part in the
testing phase, mainly owners of houses and establishments. The respondents will
be selected using convenience sampling, a non-probability sampling method in
which the researchers pick the people who will take part in the study.
The researchers will use the CNN algorithm for this study. The videos will
show different movements of individuals whose actions can lead to illegal
activities. The images will show different firearms or other weapons that are used
to commit burglary or other unlawful acts. The researchers will then feed the
videos and images into the Convolutional Neural Network (CNN). Once the
training of the CNN is done, they will provide sample scenarios that differ from
the dataset in order to test whether the software detects what it is supposed to
detect.
Datasets
The researchers will create and handle the scenarios and images needed for
training the CNN in case no suitable datasets are available. Otherwise, the
datasets fed into the CNN will come from Kaggle, especially for the weapon and
firearm classes such as knives, pistols, rifles, shotguns, and bats that can be used
for criminal activities.
The datasets that will be used are:
1. Weapon detection by Arohan Ajit. From
https://www.kaggle.com/arohanajit232/weapondetection and
https://www.kaggle.com/abhishek4273/gun-detection-dataset
2. Weapons in images by A. N. M. JUBAER. From
https://www.kaggle.com/jubaerad/weapons-in-images-segmented-videos
3. Knife Dataset by Shashank Shekhar. From
https://www.kaggle.com/shank885/knife-dataset
Research Instrument
System Architecture
Figure 3.1 Proposed System Architecture/ Design of the Study
The above figure shows the system architecture of the study. The inputs to
be processed come from the raw CCTV footage and the detected-motion images
recorded by the camera. The raw video is used for frame extraction. Grayscaling
is applied to make the data easier to process. Background subtraction separates
the foreground from the background, and the foreground is used for
classification. Canny thresholding binarizes the frame that is used for extracting
the data needed for classification. Contouring segments the frame into regions,
which serve as the input to the CNN.
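The order of the preprocessing stages can be illustrated with a simplified NumPy sketch. It is only a stand-in: background subtraction is done by differencing against a single background frame, binarization uses a fixed threshold rather than Canny edge detection, and the region is taken as one bounding box rather than true contours. The function names, the threshold of 30, and the synthetic 8x8 frames are all assumptions.

```python
import numpy as np

def to_grayscale(frame_rgb):
    """Convert an (H, W, 3) RGB frame to grayscale using luminance weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return frame_rgb.astype(float) @ weights

def foreground_mask(gray, background, threshold=30.0):
    """Binary mask of pixels that differ from the background frame."""
    return np.abs(gray - background) > threshold

def bounding_box(mask):
    """Return (top, bottom, left, right) of the foreground, or None if empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])

# Synthetic data: a uniform background and a frame with one bright object.
background_rgb = np.full((8, 8, 3), 50, dtype=np.uint8)
frame_rgb = background_rgb.copy()
frame_rgb[2:5, 3:6] = 200

gray = to_grayscale(frame_rgb)
mask = foreground_mask(gray, to_grayscale(background_rgb))
box = bounding_box(mask)

# Crop the segmented region that would be passed on to the CNN.
top, bottom, left, right = box
region = gray[top:bottom + 1, left:right + 1]
```

In practice these stages are usually implemented with a computer vision library, which also provides Canny edge detection and contour extraction.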
Feature extraction will be done at the convolution layers with the Rectified
Linear Unit (ReLU) activation. The pooling layer will reduce the resolution of the
feature map while retaining its features. The fully connected layer will classify
the features.
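The three layer types can be sketched in plain NumPy to show what each one does to the data. The 6x6 toy image, the hand-written vertical-edge kernel, and the random fully connected weights are assumptions for illustration; a real CNN learns its kernels and weights during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """2-D convolution without padding (cross-correlation, as in most
    deep learning libraries)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool2x2(feature_map):
    """2x2 max pooling; odd trailing rows or columns are dropped."""
    h = feature_map.shape[0] // 2 * 2
    w = feature_map.shape[1] // 2 * 2
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def fully_connected(features, weights, bias):
    """Flatten the features and compute one score per class."""
    return weights @ features.ravel() + bias

# Toy image with a vertical edge between columns 2 and 3.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])  # responds to vertical edges

feature_map = relu(conv2d_valid(image, kernel))  # convolution + ReLU
pooled = max_pool2x2(feature_map)                # lower resolution, edge kept

rng = np.random.default_rng(0)
weights = rng.normal(size=(2, pooled.size))      # 2 hypothetical classes
scores = fully_connected(pooled, weights, np.zeros(2))
```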
In a Recurrent Neural Network, the information cycles until the right data is
obtained. When it makes a decision, it considers the current input as well as what
it has learned from the inputs it received previously. The data will be processed in
a loop in the hidden layer until it reaches the output.
After these processes, the application will produce an output classification
that indicates whether a malicious movement was detected and what kind of
deadly object was detected. Once it classifies the input, it will alert the owner
about the object or movement detected by the system and will also alert the
authorities about the incident.
The researchers will test the software using two components. The first is a
laptop or desktop with a camera that will provide the video feed used for
capturing images or analyzing video. Every captured frame will be processed and
analyzed, and the CNN will be used to recognize any suspicious movements and
firearms in the frame. The second is the software that will detect the firearms in
the frame and alert the proper authorities.
Data Generation/ Gathering Procedure
The researchers will create or find the datasets that will be used to train
the CNN in the software. These will be used for the recognition and analysis of
suspicious movements and firearms in the captured image or video feed in
real time.
Real Time Image/Video Analysis
A camera will be used to capture images or stream live video feeds,
depending on the user’s preference. Once an individual is detected by the
camera, focus will be set on the individual’s movements, actions, and firearm (if
there is any). The system will then follow the individual’s actions to determine
whether they will lead to unlawful activities. If a crime and the presence of a
deadly weapon are detected, it will contact the owner and the proper authorities.
This section presents the step-by-step procedure that the researchers will
follow in their study. It is divided into two (2) parts: the Preliminaries and the
Experimental Method.
Preliminaries:
Step 1: The researchers will look for an area where they will install a CCTV
camera.
Step 2: Some of the researchers will act as the persons to be captured and
detected through the installed CCTV. They will perform some of the possible
suspicious movements and will handle some of the weapons and firearms.
Step 3: The researchers who are not in the reenactment will then manually
identify all the suspicious movements, weapons, and firearms presented in the
process.
Step 4: Using the reenactment video footage, the researchers will let the
system detect and identify all the suspicious movements, weapons, and firearms.
They will then compare whether the manual and software processes produce the
same findings.
Experimental Method
Step 1: The researchers will use the sample footage as input for the
system they made.
Step 2: The researchers will check and compare the output of the system
with the expected result.
Step 3: The result will be TP, TN, FP, or FN depending on the comparison
of the expected result with the result of the system.
Step 4: The researchers will record the result in the experiment paper.
Step 5: The researchers will derive the accuracy based on the results.
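Steps 2 to 4 can be sketched as a small tallying routine. The sketch assumes the conventional confusion-matrix meaning of the four labels, with "positive" meaning that a threat (suspicious movement or weapon) is present; the function name `outcome` and the sample trial list are hypothetical.

```python
from collections import Counter

def outcome(expected_threat: bool, system_flagged: bool) -> str:
    """Label one trial by comparing the expected result with the system output."""
    if expected_threat and system_flagged:
        return "TP"  # threat present and detected
    if not expected_threat and not system_flagged:
        return "TN"  # no threat and nothing flagged
    if system_flagged:
        return "FP"  # flagged although no threat was present
    return "FN"      # threat present but missed

# Hypothetical trials: (expected result, system output) per sample footage.
trials = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (True, True),
]
tally = Counter(outcome(expected, flagged) for expected, flagged in trials)
```

The resulting counts are the inputs needed to derive the accuracy in Step 5.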
Ethical Consideration
The ethical consideration in this study is to respect and address the
moral and legal rights of every participant. The researchers will be solely
responsible for conducting the study and will strictly follow the policies of the
college and university. The researchers will explain the study and its goals and
objectives to every participant, and will make sure that participation is voluntary
and that participants know they have the right to decline or refuse to participate.
The researchers will also inform the subjects about the following:
• The researchers will ensure the safety of the data.
• The researchers will make sure not to invade the personal privacy of
the participant and will communicate and interact with them properly.
• The information will be strictly confidential, will be unidentifiable
to others, and will be treated carefully.
• The results of this study will be presented in the thesis and be used
for presentations.
• This research may be published and read by other researchers who
are conducting a similar study.
• The participants are free to ask for a personal copy of the results of
the study.
• A consent form will be given by the researchers to every participant
who wants to be part of the study.
Statistical Treatment of Data
The researchers will conduct a series of tests to measure the capabilities
and limitations of the software. The tests will be divided into different
categories.
The first test determines the accuracy of the software in recognizing whether
a movement is suspicious and may lead to unlawful activities or belongs to a
normal passerby. The researchers will present a series of video clips in front of
the camera feed. They will then record the correct and incorrect analyses of the
movements of people. They will measure the success probability as the number
of successes over the number of tests done on the software. They will then plot
the distribution of the results and compare it against the normal distribution.
The second test determines the accuracy of the software in detecting and
recognizing firearms and other deadly weapons. The researchers will present a
series of images containing different weapons and firearms in front of the camera.
They will then record the correct and incorrect analyses of the weapons. They will
measure the success probability as the number of successes over the number of
tests done on the software. They will then plot the distribution of the results and
compare it against the normal distribution.
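One possible reading of the success probability and normal distribution steps is to compute the proportion of correct analyses together with a normal-approximation confidence interval. This is an assumption about the intended treatment, not a procedure stated in the text; the function name, the 95% level (z = 1.96), and the sample counts are hypothetical.

```python
import math

def success_rate_with_interval(successes: int, trials: int, z: float = 1.96):
    """Return (rate, lower, upper) using the normal approximation
    to the binomial distribution."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    rate = successes / trials
    half_width = z * math.sqrt(rate * (1.0 - rate) / trials)
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)

# Hypothetical result: 42 correct analyses out of 50 presented clips.
rate, lower, upper = success_rate_with_interval(successes=42, trials=50)
```

The approximation is most reliable when both the number of successes and the number of failures are reasonably large.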
Accuracy
Formula for Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$$
Equation 1
Where:
TP (True Positive) = when a malicious act or deadly weapon (such as a gun
or knife) is present and the system correctly classifies the detected
movement or object as malicious.
TN (True Negative) = when nothing malicious is present and the system
detected nothing.
FP (False Positive) = when the system classifies a detected movement or
object as malicious although it is actually safe.
FN (False Negative) = when a malicious movement or object is present but
the system incorrectly classifies it as safe or fails to detect it.
Table 1: Interpretation of Percentage of Accuracy
Rating              Interpretation
75.01% - 100%       Very High Accuracy
50.01% - 75%        High Accuracy
25.01% - 50%        Low Accuracy
0% - 25%            Very Low Accuracy
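The accuracy computation and the interpretation in Table 1 can be combined in a short sketch. It uses the standard definition of accuracy, (TP + TN) divided by the total number of trials, expressed as a percentage; the function names and the sample counts are hypothetical.

```python
def accuracy_percent(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy as a percentage: correct results over all results."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one trial is required")
    return 100.0 * (tp + tn) / total

def interpret(accuracy: float) -> str:
    """Map a percentage to the rating bands of Table 1."""
    if accuracy > 75.0:
        return "Very High Accuracy"
    if accuracy > 50.0:
        return "High Accuracy"
    if accuracy > 25.0:
        return "Low Accuracy"
    return "Very Low Accuracy"

# Hypothetical counts from 100 trials.
score = accuracy_percent(tp=40, tn=35, fp=15, fn=10)  # 75.0
rating = interpret(score)                             # "High Accuracy"
```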
For the third test, the researchers will perform several tests on the system
under challenging conditions. They will test in bright and dark rooms to simulate
daytime and nighttime, and they will test the software with different camera
angles, crowd densities, rain, and multiple firearms in view. After the tests, the
researchers will compare the results with each other to determine which of these
factors affect the system's efficacy.
After performing all the necessary tests, the researchers will gather the
results from the experiments and interpret them. They will then draw conclusions
that answer the questions in the statement of the problem in Chapter 1 and will
evaluate the hypothesis based on their findings.
Bibliography
[1] Eysenbach, G. (2016, July 14). The Personal Emergency Response
System as a Technology Innovation in Primary Health Care Services: An
Integrative Review.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5395921/
[2] Tiong, P. K., Ahmad, N. S., & Goh, P. (2019). Motion Detection with IoT-
Based Home Security System. Intelligent Computing, 1217–1229.
doi:10.1007/978-3-030-22868-2_85. Sci-Hub.
https://sci-hub.do/https://link.springer.com/chapter/10.1007/978-3-030-22868-2_85
[3] Kommey, B., Agbemenu, A. S. (2018, October). (PDF) A GSM Based Alert
System for Detection and Reporting of Room Burglary. Research Gate.
www.researchgate.net.
[4] Nosiri, O., Akwiwi-Uzoma, C., Anyauba, N. U. (2018, December). Motion

Detector Security System for Indoor Geolocation. Research Gate.
www.researchgate.net
[5] Anitha, A. (2017). Home Security System Using Internet of things. IOP
Science. Home Security System using Internet of Things (iop.org).
[6] Ali, B. Awad Ali (2018, March 08). Cyber and Physical Security
Vulnerability Assessment for IoT-Based Smart Homes
[7] Patel, Priya B. et al (February 2016). Smart Motion Detection using
Raspberry Pi
https://www.ijais.org/research/volume10/number5/patel-2016-ijais-
451506.pdf
[8] K.N Karthick Kumar, et al (April 2017). Motion Activated Security Camera
using Raspberry Pi
https://sci-hub.do/https://ieeexplore.ieee.org/abstract/document/8286658
[9] Zhang Xin, et al (May 2019). Home Surveillance System using Computer
Vision and Convolutional Neural Network
[10] Kanehisa Rodrigo, et al (2019). Firearm Detection using Convolutional
Neural Networks
https://www.scitepress.org/Papers/2019/73977/73977.pdf
[11] Editors of Encyclopaedia Britannica. (2018). Image Processing.
Retrieved from Encyclopædia Britannica:
https://www.britannica.com/technology/image-processing
[12] Imatest. (2018). Image Quality Factors (Key Performance Indicators).

Retrieved September 2018.
http://www.imatest.com/docs/iqfactors/
[13] Yang, A., Zhang, C., Chen, Y., Zhuansun, Y., & Liu, H. (2019). Security
and Privacy of Smart Home Systems Based on the Internet of Things and
Stereo Matching Algorithms. IEEE Internet of Things Journal, 1–1.
[14] Nguyen, D. T., Li, W., & Ogunbona, P. O. (2016). Human detection from
images and videos: A survey. Pattern Recognition, 51, 148–175.
https://sci-hub.do/https://www.sciencedirect.com/science/article/abs/pii/S0031320315003179
[15] Ko, K.-E., & Sim, K.-B. (2018). Deep convolutional framework for abnormal
behavior detection in a smart surveillance system. Engineering Applications
of Artificial Intelligence, 67, 226–234.
https://sci-hub.do/https://www.sciencedirect.com/science/article/abs/pii/S0952197617302579
[16] Nico Surantha, Wingky R. Wicaksono. (2018). Design of Smart Home
Security System using Object Recognition and PIR Sensor, Procedia
Computer Science, Volume 135.
https://www.sciencedirect.com/science/article/pii/S1877050918314881
[17] Tiwari, S., Trivedi, M. C., Mishra, K. K., Misra, A. K., & Kumar, K. K.
(Eds.). (2019). Smart Innovations in Communication and Computational
Sciences. Advances in Intelligent Systems and Computing.
https://sci-hub.se/10.1007/978-981-13-2414-7
[18] Alexandrie, G. (2017). Surveillance cameras and crime: a review of
randomized and natural experiments. Journal of Scandinavian Studies in
Criminology and Crime Prevention, 18(2), 210–222.
https://sci-hub.do/https://www.tandfonline.com/doi/full/10.1080/14043858.2017.1387410?scroll=top&needAccess=true
[19] Dinama, D. M., A’yun, Q., Syahroni, A. D., Adji Sulistijono, I., &
Risnumawan, A. (2019). Human Detection and Tracking on Surveillance
Video Footage Using Convolutional Neural Networks. 2019 International
Electronics Symposium (IES).
[20] Uddin, M. (2020). A Novel Deep Convolutional Neural Network Model to
Monitor People following Guidelines to Avoid COVID-19.
https://www.hindawi.com/journals/js/2020/8856801/
[21] Creswell, J. (2009). Research Design: Qualitative, Quantitative, and
Mixed Methods Approaches (3rd ed.). London: SAGE Publications.
[22] Pack, T. 4 Types of Home Security Systems (Plus Pros and Cons of
Each).
https://www.homestratosphere.com/types-home-security-systems/
[23] Eldred, J (2020, October 15). The Different Types of Security Systems.
https://getsafeandsound.com/2020/10/types-of-security-systems/
[24] Quantitative Research: Definition, Methods, Types and Examples.
https://www.questionpro.com/blog/quantitative-research/
[25] Bhasin, H (2020, June 4). 11 Types of Quantitative Research options that
exist for Market Researchers.
https://www.marketing91.com/11-types-of-quantitative-research/
[26] The Ultimate Guide to Writing A Dissertation in Business Studies; a step
by step approach.
https://research-methodology.net/sampling-in-primary-data-collection/purposive-sampling/#:~:text=Purposive%20sampling%20(also%20known%20as,to%20participate%20in%20the%20study
[27] Methods of sampling from a population.
https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population
[28] Teachers Columbia University, Research Instrument Example.
https://www.tc.columbia.edu/media/administration/institutional-review-board-/irb-submission---documents/Published_Study-Material-Examples.pdf
[29] Alzheimer Europe. Research Methods.
https://www.alzheimer-europe.org/Research/Understanding-dementia-research/Types-of-research/Research-methods
[30] Sarga, A (2019, March 17). Questionnaire method of data collection.
https://microbenotes.com/questionnaire-method-of-data-collection/#:~:text=Questionnaire%20is%20as%20an%20instrument,specific%20information%20from%20the%20respondents
[31] The ultimate guide to great questionnaires.
https://www.questionpro.com/blog/what-is-a-questionnaire/#:~:text=A%20questionnaire%20is%20a%20research,questions%20and%20open%2Dended%20questions
[32] Sakshi Indolia, Anil Kumar Goswami, S. P. Mishra, Pooja
Asopa (2018). Conceptual Understanding of Convolutional Neural
Network- A Deep Learning Approach
[33] Neha Sharma, Vibhor Jain, Anju Mishra (2018). An Analysis of
Convolutional Neural Networks for Image Classification
[34] M. Prabu, Anirban Pal, Vineet Loyer, Sougata Dey (2020). Automatic
Waste Segregation System Using Convolutional Neural Networks
http://www.jcreview.com/fulltext/197-1593524447.pdf
[35] Fujitsu (2021), Technology that makes predictive crime detection possible
https://blog.global.fujitsu.com/fgb/2021-01-20/technology-that-makes-predictive-crime-detection-possible/
[36] Masoumeh Poormehdi Ghaemmaghami (2017), Tracking of Humans in
Video Stream using LSTM Recurrent Neural Network
https://kth.diva-portal.org/smash/get/diva2:1156631/FULLTEXT01.pdf
[37] M. Organero, L. Powell, B. Heller, V. Harpin (2019), Using Recurrent
Neural Networks to Compare Movement Patterns in ADHD and Normally
Developing Children Based on Acceleration Signals from the Wrist and
Ankle
https://www.researchgate.net/publication/334208427_Using_Recurrent_Neural_Networks_to_Compare_Movement_Patterns_in_ADHD_and_Normally_Developing_Children_Based_on_Acceleration_Signals_from_the_Wrist_and_Ankle
https://www.diva-portal.org/smash/get/diva2:1480812/FULLTEXT01.pdf
[38] R. Liu, J. Lehman, P. Molino, F.P. Such, E. Frank, A. Sergeev, J.
Yosinski (2018), An intriguing failing of convolutional neural networks and
the CoordConv solution
https://papers.nips.cc/paper/2018/file/60106888f8977b71e1f15db7bc9a88d1-Paper.pdf
[39] M. Kusuma, M. Vijaya Lakshmi, K. Sai Krishna (2019) Human Movement
Detection using Recurrent Convolutional Neural Networks
https://www.ijitee.org/wp-content/uploads/papers/v8i12S/L109010812S19.pdf
[40] S. Bhuiya (2021) OpenGenus Foundation, Disadvantages of CNN models
https://iq.opengenus.org/disadvantages-of-cnn/
[50] C.V. Amrutha, C. Jyotsna, J. Amudha (2020), Deep Learning Approach
for Suspicious Activity Detection from Surveillance Video
https://www.researchgate.net/publication/340890810_Deep_Learning_Approach_for_Suspicious_Activity_Detection_from_Surveillance_Video
Appendices
Disclaimer: The screenshots below are not final; designs and features are still
subject to change.
Appendix 1: Application User Interface (Mobile)
Appendix 2: Application User Interface (Software)
Curriculum Vitae
Caculitan, Harold
BSCS 3-1N
281 San Jose Ext., Brgy. San Isidro, Antipolo City, Rizal
+639482897390
Harold.caculitan.gasa@gmail.com
SKILLS & QUALITIES

• Junior level in Bachelor of Science in Computer Science
• Knowledgeable in Computer Hardware and Software Installation
• Knowledgeable in Programming C++, Java, Database
• Able to learn quickly.
• Proficient in the use of MS Productivity Programs like Word, Excel and
PowerPoint
• Able to adapt to different situations, surroundings, and other work settings.
• Competent in organizational skills
INTEREST
• Cyber Security
• Software Development
• Computer Network and Security
• Game Development
PERSONAL INFORMATION
• Date of Birth: August 8, 1999
• Civil Status: Single
• Religion: Roman Catholic
• Nationality: Filipino
Erfe, Jefferson
BSCS 3-1N
Area 3, Hawk St. Sitio Veterans, Brgy. Bagong Silangan,
Quezon City
+639270775817
jeff.erfs18@gmail.com
SKILLS & QUALITIES

• Knowledgeable in Programming such as Java and Database
• Able to learn quickly
PowerPoint
• Able to adapt to different situations, surroundings and other work settings
INTEREST
• Solving puzzles
• Analyzing logic and other math-related problems
• Date of Birth: October 24, 1999

Galindez, Christopher Elijah F.
BSCS 3-1N
4 Tongonan Street, Napocor Village, Pasong Tamo
Quezon City
+639165100434
egalindez2299@gmail.com
SKILLS & QUALITIES

• Knowledgeable in Computer Hardware and Software Installation
• Knowledgeable in Programming C++, Java, Database
PowerPoint
INTEREST
• Cyber Security
• Date of Birth: November 22, 1999
Tarcenio, Earl Alvin S.
BSCS 3-1N
34 Nalugod St., Sto Nino, Marikina City
+639163336585
eastarcenio@gmail.com
SKILLS & QUALITIES

• Knowledgeable in Programming C, C++, Java, Database
• Proficient in Microsoft Applications such as Word, Excel etc.
• Proficient in the Internal Workings of a Computer and its necessary
software
INTEREST
• Cyber Security
• 3D Modelling
• Date of Birth: November 26, 1999
