
SMART HOME SECURITY USING TELEGRAM

CHATBOT

PROJECT REPORT
Submitted by

ABIJITH KURUPATH (ATP16CS001)


NAVNEETH SURESH (ATP16CS033)
SANJAY P S (ATP16CS040)
VINAYAN V (ATP16CS049)

in partial fulfillment for the award of the Degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
of the APJ ABDUL KALAM TECHNOLOGICAL UNIVERSITY

Under the guidance of


Prof. KRISHNAKUMAR

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING


AHALIA SCHOOL OF ENGINEERING AND TECHNOLOGY
PALAKKAD
JULY 2020
DECLARATION

We, the undersigned, hereby declare that the project report "Smart Home Security Using
Telegram Chatbot", submitted in partial fulfillment of the requirements for the award of the degree
of Bachelor of Technology of the APJ Abdul Kalam Technological University, Kerala, is a bona fide
work done by us under the supervision of Prof. Krishnakumar. This submission represents our
ideas in our own words, and where ideas or words of others have been included, we have adequately
and accurately cited and referenced the original sources. We also declare that we have adhered to the
ethics of academic honesty and integrity and have not misrepresented any data, idea, fact or
source in our submission. We understand that any violation of the above will be a cause for disciplinary
action by the institute and/or University and can also evoke penal action from the sources which
have not been properly cited or from whom proper permission has not been obtained. This
report has not previously formed the basis for the award of any degree, diploma or similar title of
any other university.

Place: Palakkad Signature


Date: 18/6/00 Name of the students:
Abijith Kuruppath
Navneeth Suresh
Signature
Sanjay PS
Name of Guide: Assist. Prof. Krishna Kumar
Vinayan V
Countersigned with name
Head of the Dept.: Prof. Gunasekaran Subramanian
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE

This is to certify that the report entitled "Smart Home Security Using Telegram Chatbot"
submitted by 'Abijith Kuruppath (ATP16CS001), Navneeth Suresh (ATP16CS033), Sanjay
PS (ATP16CS040), Vinayan V (ATP16CS049)' to the APJ Abdul Kalam Technological
University in partial fulfillment of the requirements for the award of the degree of Bachelor of
Technology in Computer Science and Engineering is a bonafide record of the project work carried
out by them under Prof. Krishnakumar's guidance and supervision. This report in any form has not
been submitted to any other university or institute for any other purpose.

Signature Signature

Krishna Kumar.C Dr. S.Gunasekaran


Guide & Supervisor Head of the Department
Assistant Professor Professor
Department of CSE Department of CSE

This report is submitted for the Project Viva-Voce held on _________


ACKNOWLEDGEMENT

First and foremost, we would like to thank the GOD ALMIGHTY for his infinite grace and
help, without which this project would not have reached its completion.

We would like to express our sincere gratitude to the MANAGEMENT for their timely support
extended throughout the project.

We would like to express our sincere gratitude to Dr MAHADEVAN PILLAI, the Honorable
Director and Principal, for his timely support extended throughout the project.

We would like to express our sincere thanks to our HOD Dr S. GUNASEKARAN for his help
and guidance through the project.

We would like to express our heartfelt thanks to Dr S. GUNASEKARAN, Prof. SREERAG
B.M, Prof. AKHIL N.V and Prof. KRISHNAKUMAR, our Project Coordinators, for their
instruction and guidance.

Words cannot express the support, motivation, guidance and inspiration rendered by our guide,
Prof. KRISHNAKUMAR, throughout the entire project, without whom this would not
have been possible.

No project can be successful without good reference material. In this context,
we wish to express a profound sense of gratitude to all teaching and non-teaching staff of our
college for providing us a supportive environment.
ABSTRACT

Home security has become a pressing issue and has gained importance in recent years
due to the occurrence of unwanted events in our surroundings. We need a surveillance
system that can analyse and explain the behaviour of objects, both static and moving,
to improve object detection and video tracking. This paper focuses on detecting moving
objects in a video surveillance system, tracking the detected objects in the scene,
classifying them and, in addition, sending intruder alerts as chat messages through Telegram.
We therefore need fast, reliable and robust algorithms for moving object detection and tracking.
The algorithm applies background subtraction to the image sequence, thus detecting the moving
objects in the foreground. In addition, the motion detection works in tandem with facial
recognition to filter out false positives and give only accurate intrusion alerts. The system also
records footage in real time and sends it to the user via the Telegram chatbot, where the video
clips are stored safely on the Telegram server.

With the help of a neural network we classify the target object against a preloaded
data set. We perform face detection, extract face embeddings from each face using deep
learning, train a face recognition model on the embeddings, and then finally recognize faces in
both images and video streams with OpenCV. For all such requirements we integrate our system
with a Telegram bot, which acts as a remote control and receives notifications from the system
regarding the surveillance. Telegram being open source makes home security integration that
much easier; in addition, all the messages sent from the surveillance system to the Telegram bot
are encrypted in transit, which makes the channel secure against eavesdropping.

CONTENTS

DECLARATION

CERTIFICATE

ACKNOWLEDGEMENT

ABSTRACT

LIST OF FIGURES

LIST OF TABLES

LIST OF ABBREVIATIONS

1 INTRODUCTION

2 EXISTING SYSTEM

3 LITERATURE SURVEY

4 PROPOSED SYSTEM

4.1 Architecture

5 IMPLEMENTATION

5.1 Face Recognition

5.1.1 Detecting Face with OpenCV

5.2 Motion Detection

5.3 Telegram Notifications

5.3.1 Recording Surveillance Footage

6 EVALUATION

6.1 Motion Detection

6.2 Face Recognition

6.3 Telegram Notifications

7 CONCLUSION

REFERENCES

APPENDIX-1

APPENDIX-2
LIST OF FIGURES

3.1 Pooling layers
3.1 ReLU graph
3.2 Current image and segmented image
3.2 A Histogram model
3.3 Current frame
3.3 Background Model Frame
3.3 Output after background subtraction
3.4 Overview of the prototype human motion detection application system
3.4 An overview of the motion detection algorithm implemented
4.1 Architecture of Home Security bot
4.1 Data flow diagram of motion detection algorithm
4.1 Data flow diagram of facial recognition
5.1 Facial Recognition process
5.2 Facial recognition in Video stream
5.2 Bounding box drawn over moving object
5.3 Telegram alert notification
5.3 Chatbot sending surveillance footage
6.1 Line chart showing speed and position analysis of detection
6.1 Line chart showing values of F1, recall and precision
6.2 Size of trainer file vs Loading time
6.3 Arrival time of telegram notifications based on file size
LIST OF TABLES

6.1 Distance travelled, time taken and speed of object in motion
6.1 Recall, Precision and F1-measure calculated
6.2 Facial Recognition Results
6.3 Time of arrival of Telegram notification
LIST OF ABBREVIATIONS

CNN Convolutional Neural Network

ReLU Rectified Linear Unit

GMM Gaussian Mixture Model

Blob Binary Large Object

IoU Intersection over Union

TP True Positive

TN True Negative

ROI Region of Interest

SVM Support Vector Machine

CCTV Closed-Circuit Television
CHAPTER 1

INTRODUCTION

Home Security Bot is an ambitious project aimed at solving a common but overlooked
problem in the daily lives of people. Security is a primary concern everywhere, at all times
and for everyone. The main aim is to protect individuals and property from theft and
loss. Although modern technological solutions exist, people tend to avoid them for the
sole reason of high expense, and because of a major problem that occurs in most
systems: the "false alarm". Some of the existing systems raise false alarms that
alert the user even though no intruder has entered the house.

These systems operate on the basis of detecting motion alone, which means that false
alerts are more likely to make it through. They cannot classify the objects detected in the frame
and will send out an alert regardless. Since such a system works on motion detection alone, it
suffers from some major flaws, because almost all algorithms based on motion change tend to
detect motion when lights turn on, lighting changes suddenly, and so on.

The system we want to introduce addresses these issues by including
strong object detection and tracking algorithms. This means that our system will not only detect
movement but can also identify who is in the frame, which helps to avoid sending false
alerts. The system we wish to introduce is a chatbot-integrated motion detection and recognition
system, implemented as a simple Python script.

The system will be able to detect any object movement, distinguish the occupants of the
house from intruders in daylight or darkness, take a picture, record footage and automatically
send the data to a smartphone via Wi-Fi using the Telegram chatbot application. The data
include the picture and the notification message "Intruder Alert".

The advantages of using this system are that it saves energy and cost while providing
home security, and that it is a simple system able to work at any time, in daylight or
darkness. The alert message sent to the user is also secured through Telegram's message
encryption.

The elementary principles behind such a system are segmentation, detection and
object tracking. Object detection detects and identifies suitable objects in the video.
Segmentation helps in finding the relations between objects, as well as the context of objects;
face recognition, number plate identification and satellite image analysis are examples.
Object tracking involves figuring out the path of an object as it moves. Object tracking is
a bit of a challenge because of noise, object occlusions and complex object structures.

Challenges which may appear in such video surveillance systems are:


● Changes in illumination
● Dynamic background
● Object occlusion
● Video noise

Major changes in the level of brightness may occur, and a correct, clear
image may then not be obtained; the level of brightness in darkness is totally different from
that in daylight. Object occlusion may also occur in some setups: some images or videos
may overlap the previous ones, blocking the previous image or video so that the obtained
image is not visible to the user. Noise may also appear in the images or videos recorded by
the security camera, which is a major problem for the user, as noise can affect the image and
video in such a way that the user is unable to identify the person who entered his or her
house. Recording and sending the image and footage to the user should be fast, and the user
should be alerted as soon as possible; the current existing systems contain these kinds of
flaws, as they send false alerts and their response time is sometimes very slow.
In our work, we seek to propose a method that trims down the challenges and
limitations which occur during real-time operation and runs in a fairly effective and efficient
manner in terms of results and response time.

CHAPTER 2

EXISTING SYSTEM

CCTV is one of the most commonly used security systems. Such systems are highly
influential in crime prevention, industrial processes, traffic monitoring etc. CCTV is a passive
monitoring device which requires constant and continuous human supervision. Monitoring
such situations continuously is a complex task, as all of the captured footage has to be
watched manually, which requires a lot of patience. This technique is costly, and many times
the collected information also gets corrupted. Some cameras cannot even differentiate
between an intruder and a person living nearby; because of this, false alarms are sent to the
user, which may lead to misunderstanding. Since the files may get corrupted, there should be
a clean storage space where the photos and videos can be stored securely without corruption.

The existing system has some major disadvantages, such as false alarms and corruption
of files inside the storage system. This may lead to an unstable security system, and some
major changes are required in the face detection and object detection algorithms used.
A major problem arises when the camera has to detect faces at night, so more accurate
algorithms should be used to detect faces at night, reduce the false alarms sent to the user
and get a clear snapshot of the intruder. The inherent issues with the smart surveillance
systems which exist today are that they cost a premium to set up and, in addition, charge a
monthly subscription for the storage of surveillance footage. Most surveillance systems are
built to detect just a change in motion, which may be caused by several things other than
intruders, like a change in lighting or shadows. Only top-of-the-line systems implement
facial recognition, but they are not made for household use. Face detection at night remains
the major challenge, and it can be mitigated by using more accurate algorithms.

CHAPTER 3

LITERATURE SURVEY

3.1 Convolutional Neural Networks for Image Recognition - Samer Hijazi, Rishi
Kumar, Chris Rowen

A neural network is a system of interconnected artificial “neurons” that exchange


messages between each other. The connections have numeric weights that are tuned during the
training process, so that a properly trained network will respond correctly when presented with
an image or pattern to recognize as said in [12]. The network consists of multiple layers of
feature-detecting “neurons”. Each layer has many neurons that respond to different
combinations of inputs from the previous layers. A CNN is a special case of the neural network.
A CNN consists of one or more convolutional layers, often with a subsampling layer, which
are followed by one or more fully connected layers as in a standard neural network. In image
processing, many applications of CNN are found in most modern cameras for object detection,
facial recognition, character recognition, etc. By choosing a CNN over a conventional neural
network, computational power is optimized, since the algorithm focuses on learning key
features. However, due to the heavy computation needed for accurate classification, embedded
processors limit the efficiency with which a CNN can be implemented. The different phases of
segmentation and noise removal demand very high CPU resources in order to obtain an
efficient distinction between targeted objects or humans. On the other hand, the
back-propagation method used to learn the parameters of the CNN can improve the accuracy
of detection over time.

3.1.1 Layers of CNNs

Four types of layers are most common: convolution layers, pooling/subsampling layers,
non-linear layers, and fully connected layers.

3.1.1.1 Convolution Layers

The system is pre-trained with a set of images of the faces of all users. The convolution
operation extracts different features of the input. The first convolution layer extracts low-level
features like edges, lines, and corners. Higher-level layers extract higher-level features. The
input is of size N x N x D and is convolved with H kernels, each of size k x k x D, separately.
Convolution of an input with one kernel produces one output feature, and with H kernels
independently produces H features. Starting from the top-left corner of the input, each kernel
is moved from left to right, one element at a time. Once the top-right corner is reached, the
kernel is moved one element in a downward direction, and again the kernel is moved from left
to right, one element at a time. This process is repeated until the kernel reaches the
bottom-right corner.
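A minimal NumPy sketch of this sliding-kernel operation, for a single-channel input and one
kernel (stride 1, no padding), is shown below; the array names are illustrative and not taken
from the project code.

import numpy as np

def convolve2d(inp, kernel):
    # slide the kernel from the top-left to the bottom-right corner,
    # producing one output feature map (stride 1, no padding)
    n, k = inp.shape[0], kernel.shape[0]
    out = np.zeros((n - k + 1, n - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(inp[i:i + k, j:j + k] * kernel)
    return out

# a simple vertical-edge kernel, one of the low-level features mentioned above
edge_kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])
feature_map = convolve2d(np.random.rand(8, 8), edge_kernel)

With H kernels, the same loop is simply repeated once per kernel to produce H feature maps.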

3.1.1.2 Pooling/Subsampling Layers

Fig 3.1.1.2 Pooling layers

The pooling/subsampling layer reduces the resolution of the features. It makes the
features robust against noise and distortion. There are two ways to do pooling: max pooling
and average pooling. For average pooling, the average of the four values in the region is
calculated. For max pooling, the maximum of the four values is selected.

3.1.1.3 Non-Linear Layers (ReLU)

The ReLU function is an activation function applied between the convolution and pooling layers.
A ReLU implements the function y = max(x, 0), so the input and output sizes of this layer are
the same. It increases the nonlinear properties of the decision function and of the overall
network without affecting the receptive fields of the convolution layer. In comparison to the
other nonlinear functions used in CNNs, the advantage of a ReLU is that the network trains
many times faster.

Fig 3.1.1.3 ReLU graph
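The pooling and ReLU operations described above can be sketched in a few lines of NumPy;
this is only an illustration of the definitions, not code from the project.

import numpy as np

def relu(x):
    # y = max(x, 0), element-wise; input and output sizes are the same
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # take the maximum of each non-overlapping 2x2 region,
    # halving the resolution of the feature map
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:h * 2, :w * 2].reshape(h, 2, w, 2).max(axis=(1, 3))

fmap = np.random.randn(6, 6)
pooled = max_pool_2x2(relu(fmap))  # activation followed by subsampling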

3.1.1.4 Fully Connected Layers

Fully connected layers are often used as the final layers of a CNN. These layers
mathematically sum a weighting of the previous layer of features. In the case of a fully
connected layer, all the elements of all the features of the previous layer get used in the
calculation of each element of each output feature.

To conclude what we learned from this journal, CNNs give the best performance in
pattern/image recognition problems and even outperform humans in certain cases. In a CNN,
since the number of parameters is drastically reduced, training time is proportionately reduced.
Also, assuming perfect training, we can design a standard neural network whose performance
would be the same as a CNN. But in practical training, a standard neural network equivalent to
CNN would have more parameters, which would lead to more noise addition during the training
process. Hence, the performance of a standard neural network equivalent to a CNN will always
be poorer.

3.2 International Research Journal of Engineering and Technology (IRJET-2016)

In this literature, the phases of the basic video surveillance system model are discussed. They are:

3.2.1 Image Acquiring Phase

With the help of a camera we can acquire images easily. When a camera captures an
image, its initial form is the raw format. A raw-format image contains minimally processed data
from the image sensor of the camera. At this stage the raw image has a lot of disturbance and
blurriness, so the images are not yet ready for further use. The camera quality therefore plays
an important role in the resulting image: if the quality of the image is high, elements of interest
can be easily highlighted and identified, resulting in accurate and clear output. A raw image is
influenced by factors such as noise, shadow, light, image quality, contrast etc. This raw image
is converted into frames during the pre-processing phase.

3.2.2 Pre-Processing Phase

In this phase the image, which is in raw format, is converted into frames and
processed. Image pre-processing is a technique which uses various computer algorithms to
perform the processing of an image. Processing helps eliminate noise and the various factors
which affect the quality of the image, thereby enhancing the quality of the frames, as the
video frames carry a lot of noise due to the camera, illumination, reflection etc.

3.2.3 Segmentation Phase

This is a major phase of focus, because effective object detection and classification
truly depend on this phase.

Basically, it is the process of subdividing an image in order to analyse each part, so that
the image data acquired can be used for various applications such as video surveillance.

Here the image is classified into two parts: the background and the foreground.
The segmentation of the foreground is often obtained by applying a threshold. The threshold
parameter usually depends on the camera noise. Once the objects of interest are identified,
then for clear identification we use various techniques, such as edge-based, feature-based,
colour- and pattern-based, and region-based extraction mechanisms, on the object of interest.

Fig 3.2.3 current image and segmented image

3.2.4 Feature Selection

Feature selection deals with various feature extraction techniques based on spatial,
transform, edge and boundary, colour, shape and texture features.

3.2.5 Spatial Feature:

Spatial features of an object are characterized by its grey level, amplitude and spatial
distribution.

3.2.6 Histogram Features:

The histogram of an image refers to intensity values of pixels. The histogram shows the
number of pixels in an image at each intensity value.

3.2.7 A Histogram model

Fig 3.2.7 A Histogram model

3.2.8 Edge and Boundary Features:

Edge detection of an image significantly reduces the amount of data and filters out
unimportant information. If the edges of an image are identified accurately, properties like
shape and size can be measured.

3.2.9 Colour Feature:

Colour is a visual attribute of an object. Colour features can be derived from a histogram
of the image. It can be mainly used for object matching with the pre-loaded data.

3.2.10 Shape Feature:

The shape of an object refers to its physical structure. It is determined by its external
boundary abstracting from other properties such as colour, content and spatial properties.

3.2.11 Texture Feature

Texture refers to the surface characteristics and appearance of an object. Here, the
process of transforming the input data into a set of features (region classification) is called
feature extraction.

To summarize the literature, the authors have presented different methods of moving object
detection used in video surveillance and have described the various phases of video
surveillance. This gives valuable insight into the area of moving object detection, as well as
into the field of computer vision. It is clearly stated that if all the steps of video analytics are
taken into account, an effective mechanism can be built which will result in better and clearer
image capture.

3.3 A Survey on Object Detection and Tracking Algorithms, Rupesh Kumar Rout (2013):

Tracking is basically the process of mapping an object within a sequence of frames, from
its first appearance to its last. The object of interest is decided by the application. An object
can be occluded by other objects, as mentioned by the author in [11], so a tracking system
should be able to predict the position of any occluded objects.

3.3.1 Motion Detection

Video surveillance systems include tasks such as motion detection, tracking, and activity
recognition. Detection of moving objects is the first important step, and the successful
segmentation of moving foreground objects from the background is essential.

Motion detection falls into three major classes of method:

● Frame differencing

● Background subtraction

● Gaussian mixture

3.3.1.2 Frame Difference Method

Frame differencing is pixel-wise differencing between two or three consecutive frames


in an image sequence to detect regions corresponding to moving objects such as humans. In a
video, we have a sequence of images which is called as a Frame. For detecting moving objects
in video surveillance system, we find the difference between the current frame and a reference
frame, this method is iframe difference method Using pixel subtraction, it determines the
differences between input frame intensities and background
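A minimal OpenCV sketch of the frame difference method follows; the variable names and the
threshold of 25 are illustrative assumptions, not the report's code.

import cv2

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # pixel-wise difference between consecutive frames
    delta = cv2.absdiff(gray, prev)
    # pixels whose intensity changed more than the threshold are marked as moving
    moving = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
    prev = gray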

3.3.1.3 Background Subtraction

Background subtraction is one of the most popular and common approaches to motion
detection. The basic idea of background subtraction is to subtract the current image from a
reference background image, which is updated over the sequence. The result gives us the
non-stationary objects. Background subtraction is highly dependent on a good background
maintenance model, because it is extremely sensitive to dynamic scene changes from lighting
and other events. The main problems with background subtraction are:

● Motion in the background

● Illumination changes

● Memory

● Shadows

● Camouflage

● Bootstrapping

Motion in the Background:

Non-stationary background regions in a frame.

Illumination Changes:

The background model should be able to adapt to gradual changes in illumination over
a period of time.

Memory:

The background module should not use many resources in terms of computing power
and memory.

Shadows:

Shadows cast by moving objects should be identified as part of the background, not
the foreground.

Camouflage:

Moving objects should be detected even if their pixel characteristics are similar to those of
the background.

Fig 3.3.1.3 Current frame

Fig 3.3.1.3(b) Background Model Frame

Fig 3.3.1.3(c) Output after background subtraction

3.3.1.4 Gaussian Mixture Model

As we all know, images are represented as arrays of pixels. A pixel is a scalar or vector
that shows the intensity or colour. A Gaussian mixture model can be used for separating the
pixels into similar segments and modelling the background pixel.

The GMM method models the intensity of each pixel with a mixture of K Gaussian
distributions, and it is effective in modelling backgrounds with repetitive motions. The
probability that a certain pixel has a value of Xt at time t can be written as:

P(Xt) = Σ (k = 1 to K) ωk,t · η(Xt, μk,t, Σk,t) (3.3)

where K is the number of distributions, ωk,t is the weight of the kth Gaussian in the
mixture at time t, and η(Xt, μk,t, Σk,t) is a Gaussian probability density function where μk,t
is the mean value and Σk,t is the covariance of the kth Gaussian at time t.

Every new pixel value, Xt, is checked against the existing K Gaussian distributions in turn,
until a match is found. If no match is found, the last distribution is replaced by a new Gaussian
with the current value as its mean, an initially high variance, and a low weight parameter.
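OpenCV ships a GMM-based background subtractor built on this idea; the sketch below shows
its use, with illustrative parameter values (the history length and variance threshold are
assumptions).

import cv2

# each pixel is modelled by a mixture of Gaussians, as described above
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # white where the pixel matches no background Gaussian, black elsewhere
    fg_mask = subtractor.apply(frame)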

3.3.2 Background Modelling

The basic principle of background subtraction is to compare a static background frame
with the current frame of the video scene, pixel by pixel.

In the first stage, we develop the background model, and then apply background
subtraction to detect the foreground object.

In the second stage, only stationary pixels are processed to construct the initial
background model. The initial background for a pixel (i, j) is represented by a three-dimensional
vector: the minimum m(i, j) and maximum n(i, j) intensity values, and the maximum intensity
difference d(i, j) between consecutive frames observed during the training period. The
background model is then:

B(i, j) = [m(i, j), n(i, j), d(i, j)] (3.3.a)
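A small NumPy sketch of building this background model from a stack of training frames, as
an illustration of equation (3.3.a); the function and array names are assumptions.

import numpy as np

def build_background_model(frames):
    # frames: array of shape (T, H, W) holding T grayscale training frames
    m = frames.min(axis=0)                           # minimum intensity m(i, j)
    n = frames.max(axis=0)                           # maximum intensity n(i, j)
    d = np.abs(np.diff(frames, axis=0)).max(axis=0)  # max inter-frame difference d(i, j)
    return m, n, d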

Summarising the whole literature: we have gone through various detection methods used for
the motion detection stage of object tracking, which is one of the basic first steps of video
surveillance. The first method is the frame difference method, which uses pixel-wise
differences between multiple frames; then we have background subtraction, which is one of
the most basic and widely used; and finally the Gaussian mixture model, which reads the
image and separates the pixels by their similarities.

3.4 Human motion detection system (Video motion detection module) by Soo Kuo Yang,
March 2005

This literature focuses on the Video Motion Detection module, researching the techniques
and methodology needed to detect motion and to develop a module. This module records
motion and passes it on to the next module, object classification, where human and
non-human objects are classified. Thus, this literature [13] aims to come up with a solution
that detects motion effectively and records it when one or more moving objects are causing
motion.

3.4.1 PROTOTYPE SYSTEM DESIGN & ARCHITECTURE

In this chapter, we will look into the design of the methods and techniques implemented
in the final prototype system. Diagrams of the architecture of the motion detection algorithms
are presented as well.

Fig 3.4.1 Overview of the prototype human motion detection application system

As shown, there are two outputs given by the motion detection module. This means that
two algorithms are implemented, namely spatial update and temporal update + edge
detection, whose outputs are denoted in the figure above.

Fig 3.4.1 An overview of the motion detection algorithm implemented.

The background updating model is an important issue for motion detection algorithms,
and we have implemented two distinct algorithms for the background updating module.

3.4.1.1 Image Acquisition

The system developed here can capture sequences of images both from real-time video
from a camera and from a recorded sequence. Image acquisition is defined as the action of
retrieving an image from some source, usually a hardware-based source, for processing. It is
the first step in the workflow sequence because, without an image, no processing is possible.

3.4.1.2 Image Segmentation & Image Subtraction

3.4.1.2 (a) Image subtraction

Here two subsequent images are compared and processed using the arithmetic operation of
subtraction, subtracting the pixel values of the images. Since colour images are usually used,
the algorithm implemented considers the colour data of the images. The technique first
separates the images into three planes or channels. Then it performs the arithmetic subtraction
on each plane or channel. After that, the results are combined back again to form a colour image.

3.4.1.2 (b) Image processing

The next step is to perform some image processing operations on the result obtained
from the previous step, i.e. the result obtained from a threshold function. This result is
further used for recognition purposes, as it filters the human body shapes better than the
other output. An adaptive threshold function is implemented here.

3.4.1.2 (c) Contour finding

The next step is performed on the eroded and dilated image and the threshold image to
identify the individual contours present in the image. Contours considered separate from one
another are displayed in different colours; those that share the same colour may be
considered connected to each other.

3.4.1.2 (d) Bounding Rectangle Calculations

Some operations are done in order to remove overlapping boxes or rectangles when they are
drawn onto the source image. Basically, rectangles or boxes that are near to or almost cross
one another's edges are joined up to form a larger rectangle or box. This helps eliminate the
problem of only a portion of a human being returned for recognition.

3.4.1.2 (e) Binary Image Processing

Here, the threshold image obtained is further enhanced to fill in the empty spaces
inside the binary region which represents the moving object. Basically, an algorithm is
implemented which scans through the vertical lines and, for each vertical line, fills up the
region between its first and last white pixels.

3.4.1.2 (f) Area Mapping

After the bounding rectangles are identified, their positions are mapped to the source
image and the rectangles are drawn there. Mapping is done for the area of the bounding boxes
drawn in the source and also for the corresponding area in the binary processed image. The
areas from the processed binary image are the ones used by the recognition engines.

3.4.2 Background Updating Algorithms

Here we have implemented two algorithms to update the background image which is
used in the image subtraction stage of the motion detection algorithm.

3.4.2.1 Spatial Update

The first algorithm implemented was designed in such a way that the background
model is updated based on spatial data instead of temporal data. By temporal data, we
refer to the time or frame number of the images in the sequence. This method uses only
spatial data, which means the background is updated based on some percentage of the pixel
change in the subtraction result.

3.4.2.2 Temporal Update + Edge Detection

As for the second algorithm, we implemented the usual motion detection approach
for updating the background image. Compared to the first algorithm, this method uses many
more computation steps and thus gives slower response and performance. The information
used in the computations comes from subsequent frames; it does not depend on a single
subtraction result and, in fact, does not use the subtraction results at all. Basically, all frames
are used to compute a running average image. A basic running average method is defined as
the sum of the previous value of a pixel at a certain location and the new value taken by the
corresponding pixel in the next frame of the image sequence, with a certain factor degrading
the old pixel value.

The final decision on which algorithm is to be implemented in the application software
system depends on the requirements of the application system. If it is, for example, a security
surveillance system, the presence of a human is much more important than the detection of
motion or movement in general. Being able to detect human presence in the frame is really
useful in the case of smart surveillance. If this is the case, to summarise the literature, the
latter algorithm of the human motion detection prototype system is the more suitable one to
implement.

CHAPTER 4

PROPOSED SYSTEM

4.1 Architecture

Fig 4.1 Architecture of Home Security bot

The proposed system is a home security system which utilizes basic motion detection
and facial recognition for accurate remote monitoring. The system is integrated with a telegram
chat-bot by which the user can receive notifications about possible events. The web camera
from the laptop is used to demonstrate the potential of simple and effective algorithms which
can make use of day-to-day things we use.

Telegram is a free cloud-based instant messaging service. Users can send messages and
exchange photos, videos, stickers etc. through Telegram. A Telegram chatbot can be created
with the help of BotFather, itself a Telegram bot, which can be used to create new Telegram
bots and control existing ones. A simple and effective algorithm works in tandem with the
open-source Telegram chatbot, which opens up a ton of potential for this simple surveillance
system.

In order to achieve motion detection, a computer vision library known as OpenCV is
used. Each frame of the video from the attached camera is divided into a grid, and adjacent
frames are analysed to detect the movement of different objects. The different objects in the
frames are classified, and the objects are separated from the background using the background
separation algorithm. The system is trained with a class of datasets to distinguish an intruder
from the family members or friends. So, whenever an intrusion occurs at the building, the
intended user gets a notification and a photo taken at the time of the intrusion, and the user
can view the live footage of the building by sending simple commands to the chatbot. Ample
storage is available to keep the captured footage and images safely in one place.

In this proposed system a method called frame differencing is used, in which two or
three adjacent frames of a time series of images are subtracted to obtain difference images. Its
working is very similar to background subtraction; after the subtraction, thresholding the
difference image gives the moving-target information. This kind of method is highly adaptive
to dynamic scene changes, but it generally fails to detect all the relevant pixels of some types
of moving objects. The basic principle of the background subtraction technique is separating
the estimated image from the observed image. This foreground process divides the image into
two complementary sets of pixels. There are some criteria that must be satisfied by every
detection algorithm. It must adapt itself to sudden changes like illumination changes, motion
changes, and high-frequency objects and their geometry, especially in outdoor surveillance
scenes. These include unfamiliar changes in light intensity, camera oscillation, and objects
like trees and parked vehicles.

Let the image be represented as F(x,y,t) and the background as K(x,y,t) at time t. Using the
frame differencing method, the background frame is represented as

K(x,y,t) = F(x,y,t-1) (4.1)

Motion can thus be detected where

|F(x,y,t) – F(x,y,t-1)| > Thr (4.1a)

Median filter uses the median of n previous frames as the background model

K(x,y,t) = median{F(x,y,t-i)} (4.1b)

|F(x,y,t) - median{F(x,y,t-i)}| > Thr (4.1c)

Where i = {0,1…n-1}
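A minimal NumPy sketch of the median-filter background model of equations (4.1b) and
(4.1c); the buffer length n and threshold Thr are assumed values.

import numpy as np

n, thr = 10, 25  # number of buffered frames and threshold Thr
buffer = []      # holds the n most recent grayscale frames

def is_foreground(frame):
    buffer.append(frame.astype(np.float32))
    if len(buffer) > n:
        buffer.pop(0)
    # K(x, y, t) = median of the n previous frames
    background = np.median(np.stack(buffer), axis=0)
    # |F(x, y, t) - median| > Thr marks moving pixels
    return np.abs(frame - background) > thr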

Background subtraction is a commonly used technique for motion segmentation in
static scenes. This method detects moving regions by subtracting the current image
pixel-by-pixel from a reference background image, created by averaging images over time
during an initialization period. The basic idea of the background subtraction method is to
initialize a background first, and then subtract the current frame, in which the moving object
is present, from the background frame to detect the moving object. Background subtraction
methods operate on pixels independently. All pixels are first divided into groups of N×N
blocks, and every block is processed as an N²-component vector. Pixels are then classified
based on the threshold difference between the current image and the back-projection of its
PCA coefficients.

Fig 4.1(a): Data flow diagram of motion detection algorithm

In face recognition, we have the face detector, which helps localize faces in an image
so that the distinct features of the face can be identified. Then we have the embedder, which
is responsible for extracting facial embeddings via deep learning feature extraction.

Fig 4.1(b): Data flow diagram of facial recognition

It analyses the images and returns numerical vectors that represent each detected face. At first
we load the image into memory and construct a blob; a blob (Binary Large Object) is a
collection of binary data stored as a single entity, i.e. the image in binary form. We then
localize the faces in the image via the "detector" and pass the results to "detections". After
that, we loop over all the detections, and for each we extract the confidence score, which is
the probability that an anchor box contains an object as predicted by a classifier. A detection
is considered a true positive (TP) if it satisfies three conditions: the confidence score is greater
than the threshold; the predicted class matches the class of the desired output; and the
predicted bounding box has an Intersection over Union (IoU) greater than a threshold with the
desired output. Violation of either of the latter two conditions makes a false positive (FP).
Then we compare the confidence to the minimum probability detection threshold contained in
our command line "args" dictionary, ensuring that the computed probability is larger than the
minimum probability. From there we can extract the face's Region of Interest (ROI). For
displaying the results, we construct a "text" string containing the name and probability. Then
we draw a bounding box around the face and place the text above the box, and finally we
visualize the results on the screen.

CHAPTER 5

IMPLEMENTATION

Telegram is an instant messaging application which we use daily for chatting with family and
friends. The free and open nature of Telegram helped the developers release a set of APIs
which are used for developing bots. Bots are applications which automate tasks. By using
such a bot, it is possible to chat with home appliances from anywhere in the world. In this
paper we developed one such bot running on a Raspberry Pi which is connected to a camera.
The bot receives the user's instruction and sends a reply accordingly. In this paper we
developed a security system in which the Telegram chatbot helps the user get an alert signal
whenever an intruder gets into the house. The camera and the Raspberry Pi are connected to
the Telegram chatbot. Whenever there is motion, the user gets an alert signal and an image is
sent to the Telegram chatbot. Security is ensured in such a way that a private token key is
generated which is unique for each user. The bot responds only to users whose token key is
registered. The bot checks the token key and gives the user access to receive the snapped
images from the camera. In this implementation the camera will not only detect movement
but can also identify who is in the frame. This helps to avoid sending false alerts.

Python is a cross-platform, general-purpose programming language. It is free and
open-source. Its high-level built-in data structures, combined with dynamic typing and dynamic
binding, make it very attractive for Rapid Application Development, as well as for use as a
scripting or glue language to connect existing components together. The edit-test-debug cycle
is incredibly fast, and debugging Python programs is easy: a bug or bad input will never cause
a segmentation fault. Python can be used to build just about anything, supported by a vast
array of libraries and built-in functions. The OpenCV library for image processing is used in
our project for motion detection and facial recognition. Python also has the telepot library,
which makes it easier to write the script for the chatbot and enables seamless integration
between the chatbot and the motion detection/recognition algorithms.
OpenCV is an open-source library for image processing and various other deep
learning methods, and OpenCV does an extensive job in real-time applications, which is
significant for surveillance systems. It has numerous collections of libraries that support image
processing. OpenCV applies image processing algorithms to real-time web camera streams.
With the help of OpenCV one can identify features by processing the respective image or
video. Various mathematical and vector operations are used to analyse image patterns and
features. OpenCV and other Python libraries work in a perfect line-up, which makes coding
very efficient.
Telegram is a cloud-based online messaging app that works just like popular
messaging apps such as WhatsApp. This means that we can use it to send messages to our
friends when connected to the internet via Wi-Fi or mobile data. It uses a voice-over-IP
service. Telegram is cloud-based and claims to prioritize security and speed, making it a good
alternative to other popular messaging apps. What makes Telegram unique is that it is open
source, which gives developers a ton of freedom. And the availability of the Telegram bot
API opens up a whole array of possible project ideas.
Telepot is a Python package which helps our program talk to the Telegram bot API. It
works on Python 2.7 and Python 3; for Python 3.5+ an async version is also available. The
command "pip install telepot" is used to install the telepot package on our machine. To use
the Telegram Bot API, we first have to get a bot account by chatting with BotFather. BotFather
will give us a token; with the token in hand, we can start using telepot to access the bot
account. Dlib is a C++ library, and we can use a number of its tools from Python applications.
It contains many algorithms and tools for machine learning. Dlib supports the histogram of
oriented gradients (HOG) and SVMs, which are crucial for image processing. Imutils is yet
another Python library which contains functions for image manipulation during image
processing. It helps to resize, translate and rotate video frames from the webcam. Imutils
works along with matplotlib and can also display the results of matplotlib computations.
Imutils can easily be installed with the single command "pip install imutils".
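As a minimal sketch of this setup, the token issued by BotFather is all that is needed to start
sending messages through telepot (the token string and chat id below are placeholders):

import telepot

CHAT_ID = 123456789                  # the user's Telegram chat id
bot = telepot.Bot('YOUR_BOT_TOKEN')  # token issued by BotFather
print(bot.getMe())                   # verify that the bot account is reachable
bot.sendMessage(CHAT_ID, 'Surveillance system online')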
Scikit-learn is a machine learning library for Python. It builds on NumPy, SciPy and
matplotlib, which help with array and algebraic operations. Scikit-learn offers various
algorithms, such as SVMs and gradient boosting, which are well suited for face tracking.
5.1 Face Recognition

Algorithm 1. Face Recognition

1. Gather the face dataset of the user which is to be identified.

2. Each input batch of data includes positive image and negative image.

3. Extract embeddings from the face dataset
3.1 We use a Caffe-based deep learning face detector to localize faces in an image
3.2 The next model is Torch-based and is responsible for extracting facial embeddings via
deep learning feature extraction.
4. Detect faces in the image by passing it through the detector network
4.1 Extract the detection with the highest confidence and check that the confidence meets
the minimum probability threshold used to filter out weak detections
5. Extract a 128-d embedding for each face.
6. Initialize the SVM model and train it on the embeddings.
7. Using the trained model, compare the input data (128-d vector) to the known data set.
8. Recognize the face.

In face recognition, we have the face Detector which helps in localization of an image
in which the distinct features of the face can be identified.

To load the face detector:

# load the serialized Caffe face detector model from disk
protoPath = os.path.sep.join([args["detector"], "deploy.prototxt"])
modelPath = os.path.sep.join([args["detector"],
    "res10_300x300_ssd_iter_140000.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

Then we have the embedder, which is responsible for extracting facial embeddings via deep
learning feature extraction. It analyses the images and returns numerical vectors that represent
each detected face.

To load the embedder:

# load the serialized Torch face embedding model from disk
embedder = cv2.dnn.readNetFromTorch(args["embedding_model"])
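Steps 5 and 6 of Algorithm 1, computing the 128-d embedding for a detected face ROI and
training the SVM on the collected embeddings, can be sketched as follows; the variables face,
known_embeddings and known_names are illustrative assumptions.

# compute a 128-d embedding for a detected face ROI (96x96 input for the Torch model)
faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255, (96, 96), (0, 0, 0),
    swapRB=True, crop=False)
embedder.setInput(faceBlob)
vec = embedder.forward()  # shape (1, 128)

# train an SVM on the collected embeddings and their name labels
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

le = LabelEncoder()
labels = le.fit_transform(known_names)    # encode person names as integers
recognizer = SVC(C=1.0, kernel="linear", probability=True)
recognizer.fit(known_embeddings, labels)  # known_embeddings: list of 128-d vectors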

Fig 5.1 : Facial Recognition process

5.1.1 Detecting Face with Open-CV

At first we load the image into memory and construct a blob; a blob (Binary Large Object)
is a collection of binary data stored as a single entity, i.e. the image in binary form.

# load the image, resize it, and grab its dimensions
image = cv2.imread(args["image"])
image = imutils.resize(image, width=600)
(h, w) = image.shape[:2]

# construct a blob from the image; (104.0, 177.0, 123.0) are the
# mean BGR values subtracted for the face detector
imageBlob = cv2.dnn.blobFromImage(
    cv2.resize(image, (300, 300)), 1.0, (300, 300),
    (104.0, 177.0, 123.0), swapRB=False, crop=False)

We then localize the faces in the image via the "detector" and pass the results to "detections".
After that, we loop over all the detections and extract the confidence score, which is the
probability that an anchor box contains an object as predicted by a classifier. A detection is
considered a true positive (TP) if it satisfies three conditions: the confidence score is greater
than the threshold; the predicted class matches the class of the desired output; and the
predicted bounding box has an Intersection over Union (IoU) greater than a threshold with
the desired output. Violation of either of the latter two conditions makes a false positive (FP).

# loop over the detections and extract each prediction's confidence
for i in range(0, detections.shape[2]):
    confidence = detections[0, 0, i, 2]

Then we compare the confidence to the minimum probability detection threshold contained in
our command line “args” dictionary, ensuring that the computed probability is larger than the
minimum probability. From there we can extract the Face’s Region of Interest (ROI)

# filter out weak detections, then compute the face bounding box
if confidence > args["confidence"]:
    box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
    (startX, startY, endX, endY) = box.astype("int")
    # extract the face ROI and grab its dimensions
    face = image[startY:endY, startX:endX]
    (fH, fW) = face.shape[:2]

For displaying the results, we construct a "text" string containing the name and probability.
We then draw a bounding box around the face and place the text above the box, and finally
we visualize the results on the screen.

Fig 5.2 Facial recognition in Video stream

5.2 Motion Detection

Algorithm 2. Motion detection

1. Input the video frames


2. Initialise the first frame to a still frame with no motion.
3. Resize frames to a width of 500px
4. Convert frames to grayscale
5. Apply Gaussian blur to smooth the images
6. Compute difference between first frame and subsequent frames from video stream
6.1 Take the absolute value of their corresponding pixel intensity differences delta =
|background model – current frame|
7. Threshold the frame delta to discard regions of very small change: if the delta is less than
25, discard the pixel and set it to black; otherwise set it to white.
8. Start looping over each contour, filtering out small ones.
9. If the contour area is larger than the minimum area, draw a bounding box surrounding the
foreground and motion region.
10. Update the room status to occupied and add a timestamp to the frame.

Define a variable "min_area", which is the minimum number of pixels that must change in a
region of the image for it to be considered actual motion. This is done in order to filter out
false positives. The camera is started and the first frame of the video is initialised as a still
frame. Each frame is then resized, converted to grayscale and smoothed with a Gaussian blur.

# resize the frame, convert it to grayscale, and blur it
frame = imutils.resize(frame, width=500)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (21, 21), 0)

Compute difference between first frame and subsequent frames to calculate absolute value of
pixel intensity differences. If the difference is less than 25, the pixel is discarded and set to
black. If it is greater than 25, the pixel is set to white.
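A minimal sketch of this differencing and thresholding step, with variable names following
the surrounding snippets (firstFrame is the still frame initialised earlier):

# absolute pixel-wise difference between the still first frame and the current frame
frameDelta = cv2.absdiff(firstFrame, gray)
# pixels with a difference below 25 become black, the rest white
thresh = cv2.threshold(frameDelta, 25, 255, cv2.THRESH_BINARY)[1]
# dilate to fill in holes, then find the outlines of the white regions
thresh = cv2.dilate(thresh, None, iterations=2)
cnts = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)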

Apply contour detection to find the outlines of the white regions and start looping over each
contour to filter out small, irrelevant contours. If the contour area is larger than "min_area",
a bounding box is drawn surrounding the foreground and motion region.

for c in cnts:
    # ignore contours smaller than the minimum area
    if cv2.contourArea(c) < args["min_area"]:
        continue
    (x, y, w, h) = cv2.boundingRect(c)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

Fig 5.2 Bounding box drawn over moving object

5.3 Telegram Notifications

Algorithm 3. Creating the Chat-Bot

There is a bot called BotFather in Telegram which can be used to create new bot Accounts.
BotFather can also be used to manage Existing Bot accounts.

Steps to create new Telegram Bot

1) Open Telegram

2) Search BotFather in the Global search and Open it

3) Type "/newbot" to create a new bot

4) Choose a name for the bot

5) Choose a unique Username for the bot and the username must end with “bot”.

6) Now you get a unique Token which can be used to access and control the Bot

7) Use the Token in the source code to control the Bot
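Once the token is in the source code, a telepot message loop can route the user's chat
commands to the surveillance actions. The sketch below is illustrative: the token is a
placeholder, and the 'Status' command is an assumed example alongside the 'Footage'
command used later in this chapter.

import telepot
from telepot.loop import MessageLoop

def handle(msg):
    chat_id = msg['chat']['id']
    command = msg['text']
    if command == 'Footage':
        telegram_bot.sendDocument(chat_id, open('footage.avi', 'rb'))
    elif command == 'Status':
        telegram_bot.sendMessage(chat_id, 'Surveillance running')

telegram_bot = telepot.Bot('YOUR_BOT_TOKEN')
MessageLoop(telegram_bot, handle).run_as_thread()  # listen for commands in the background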

Facial recognition and motion detection algorithms work together to find any possible
intruders in our home. A flag variable is initialised to track movement. The facial recognition
labels are stored in a pickle file named "labels". When motion is detected, the program looks
through the labels, and if the detected face is not found among them, the frame is saved to
disk and immediately sent to the user as an alert notification.

# assuming labels_classes[j] marks an unrecognized face and flag == 1 marks motion
if labels_classes[j] and flag == 1:
    cv2.imwrite(filename="image.jpg", img=frame)
    telegram_bot.sendPhoto(chat_id, open("image.jpg", "rb"), caption="Motion detected")

Fig 5.3: Telegram alert notification

5.3.1 Recording Surveillance Footage

The system starts recording footage once the surveillance starts running, and the user can
also request a 5-second video clip if he/she wants to know what is going on.

A VideoStream object (from the imutils library) is used to read from the camera; the filename
and the codec used for encoding the video are initialised before the streaming starts.

footage = cv2.VideoWriter('footage.avi', cv2.VideoWriter_fourcc(*'XVID'), 30, (640, 480))

where 'footage.avi' is the file to which the video is written, XVID is the codec used for
encoding the video, 30 is the frame rate of the footage, and 640x480 is the resolution at which
the video is recorded.

# read the next frame from the camera stream and append it to the footage file
frame = vs.read()
footage.write(frame)

Each frame read from the camera is written to the footage file. To send the surveillance
footage to the user, the following script is used.

if command == 'Footage':
    telegram_bot.sendMessage(chat_id, "Sending surveillance footage...")
    telegram_bot.sendDocument(chat_id, open('footage.avi', 'rb'))

Receiving the surveillance footage could take some time depending upon the size of the file
and the bandwidth of the internet connection. The user can also request a 5 second clip by
sending the command ‘Video’.

if command == 'Video':
    # record frames for 5 seconds, then send the clip
    # (result is a cv2.VideoWriter for the clip, initialised like 'footage' above)
    start_time = time.time()
    while int(time.time() - start_time) < 5:
        frame = vs.read()
        result.write(frame)
    print("Done recording")
    telegram_bot.sendMessage(chat_id, "Recorded! 5 seconds added to video!")
    telegram_bot.sendDocument(chat_id, document=open('filename.avi', 'rb'))

Fig 5.3 (a): Chatbot sending surveillance footage

CHAPTER 6

EVALUATION

6.1 Motion detection

In this chapter the system developed is evaluated against different test cases and
varying scenarios. Even though the algorithm is able to draw the bounding box correctly over
a detected object, it is highly dependent on the threshold of the subtracted frame. The system
uses the '.avi' format for saving video files. "Full bound" means the whole motion-detected
area is wrapped in a bounding box; "adequate bound" means the bound area is passable
enough for the algorithm to recognise motion, i.e. it covers the necessary areas.

The time taken to detect a moving object is calculated and it is based on the detection of an
object’s contour. Detection time is directly dependent upon the speed and position of the object
in motion. The position of the object is calculated using spatial moments.

V = (Pi – Pi-1) / (Ti – Ti-1) (6.1)

Where V = speed of the object and Pi and Pi-1 are positions of the centre of the detected object
in subsequent frames (current frame i and previous i-1), Ti and Ti-1 are the times of frames i
and i-1.

In this experiment the objective is to find the speed limit up to which the algorithm can
detect the boundary of an object and its form.

Pi – Pi-1 | 0.30 | 0.45 | 0.33 | 0.65 | 0.49 | 0.79 | 0.94 | 0.79 | No detection
Ti – Ti-1 | 0.49 | 1.00 | 0.58 | 0.95 | 0.51 | 0.62 | 0.64 | 0.56 |
Speed     | 0.40 | 0.47 | 0.60 | 0.69 | 0.96 | 1.29 | 1.45 | 1.50 |

Table 6.1: Distance travelled, time taken and speed of object in motion.

Fig 6.1: Line chart showing speed and position analysis of detection.

In the next method, we use recall r and precision p to calculate the F1 score in order to
evaluate the performance of our algorithm.

Recall = TP / (TP + FN) (6.1a)

Precision = TP / (TP + FP) (6.1b)

Where TP = number of accurately detected objects which have correct shape.

FN = number of objects in frame which are not detected but have desired shape.

FP = total number of objects detected in frame not having desired shape.

F1 = weighted average of recall and precision.

F1 = 2 * (precision * recall) / (precision + recall) (6.1c)

F1 reaches its best value near 1 and its worst value near 0.
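These metrics can be computed with a few lines of Python, sketching equations (6.1a) to (6.1c):

def evaluate(tp, fp, fn):
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * (precision * recall) / (precision + recall)
    return recall, precision, f1

# first row of Table 6.1(a): 124 TP, 1 FP, 26 FN -> R 0.83, P 0.99, F1 0.90
print(evaluate(124, 1, 26))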

Scene                  | Number of appearances | TP  | FP | FN | R    | P    | F1
One object in frame    | 150                   | 124 | 1  | 26 | 0.83 | 0.99 | 0.90
Objects in motion      | 287                   | 271 | 28 | 16 | 0.94 | 0.91 | 0.92
Object with some sides | 150                   | 115 | 20 | 35 | 0.77 | 0.85 | 0.81
Varying brightness     | 150                   | 112 | 30 | 38 | 0.75 | 0.79 | 0.77

Table 6.1(a): Recall, Precision and F1- measure calculated.

Fig 6.1(a): Line Chart showing values of F1, recall and precision.

The calculated values show that the proposed algorithm, based on background subtraction,
allows the detection of moving objects with a distinct geometric shape. Despite the FPs and
FNs, the value of F1 is good and close to 1. False positives occur when other objects in front
of moving objects cause occlusions, or when shadows or lighting change. This can be
controlled by increasing the threshold value.

6.2 Face Recognition

In order for face recognition to work properly a trainer file is essential. A model is trained
consisting of images of faces of people which is then saved as a trainer file with the extension
‘.yml’. This file is loaded each time and is used by a local binary pattern histogram algorithm
for the purpose of facial recognition. The number of photos per person used for training the
model is selected carefully to balance between latency and accuracy.

Fig 6.2: Size of trainer file vs Loading time

In order for the facial recognition algorithm to work properly, the face has to be visible in the
frame. The accuracy of the algorithm varies with the angle of the face in the scene. Metric
evaluation is used next to calculate accuracy.

Trials | TP | FN | FP | TN | Accuracy
1-6    | 8  | 0  | 1  | 2  | 90.90 %
7-8    | 7  | 0  | 2  | 1  | 80 %
9      | 6  | 0  | 1  | 0  | 85.71 %
10     | 5  | 1  | 1  | 1  | 75 %
11     | 11 | 0  | 0  | 1  | 100 %
12     | 5  | 0  | 0  | 0  | 100 %
13     | 6  | 0  | 1  | 1  | 87.5 %

Table 6.2: Facial Recognition Results

Where Accuracy = (TP + TN) / (TP + TN + FP + FN) and the average accuracy is 88.44%

6.3 Telegram Notifications

The arrival time of intrusion alerts in Telegram depends directly on the bandwidth of the
internet connection; when files are being sent, their size also affects how fast the notifications
arrive.

Notification type                       | Arrival time (seconds)
Text notifications                      | 0.1
Captured image                          | 1
Image captured when motion is detected  | 2
Video file (< 2 MB)                     | 10
Video file above 10 MB                  | 60 – 180 and above

Table 6.3: Time of arrival of Telegram notification

Fig 6.3: Arrival time of telegram notifications based on file size

CHAPTER 7

CONCLUSION

A smart surveillance system was designed which incorporates both motion detection and a
facial recognition algorithm to improve the accuracy of the alert notifications sent to the user.
The alerts are sent as notifications to the user via a Telegram chatbot, through which the user
can send commands to the system as chat messages. Telegram, being open source and free,
allows video files and images sent to users to be stored on its servers. This feature of Telegram
makes the system extremely cost-effective.

This system is built entirely with OpenCV, which makes it efficient and fast: OpenCV is less
resource-intensive and can work on low-end systems, which makes our system
budget-friendly.

The frame rate of the video stream is greatly affected by the combination of motion
detection and facial recognition; however, the real-time performance is unaffected. When
motion is detected, the system looks for a face for some time; if it cannot find any, it sends an
alert. If a face is detected, it captures the image and sends it as the alert notification; when
this happens there is a large spike in frame drops while streaming, but the detection works
fine.

Initially the algorithm picked up even the slightest change in pixels, like shadows and changes
in lighting, which caused false alerts, but this was controlled by increasing the threshold
density value. The algorithm also made false facial recognitions when the face was seen from
different angles. This was improved slightly by training a larger dataset for each person;
pre-processing of face images, such as face alignment, made significant improvements.

However, the accuracy of facial detection was poor when working in low light or when the
face was partially covered. Face recognition with OpenCV was less resource-hungry than
YOLO or TensorFlow, which made it cost-effective, but this also affects the accuracy of the
system.

One other downside could be the time taken for notifications to arrive on Telegram. Regular
messages like image and text notifications arrive really fast, but video files, which are usually
large in size, could take a while to arrive. The slight dip in the accuracy of the system is a
trade-off between cost and performance.

REFERENCES

[1] B. Ortiz-Jaramillo, A. Kumcu, "Computing contrast ratio in images using local content
information", 2017, Vol. 25, Issue 3, 6 pages.
[2] Chris Rowen, Samer Hijazi, Rishi Kumar, "Convolutional neural networks for image
recognition", 2016, Vol. 55, Issue 4, 8 pages.
[3] Harish Kumar Sharma, Mayank Sharma, "IOT based home security system with
wireless sensors and telegram messenger", Vol. 25, Issue 6, 6 pages.
[4] Hong Phat Truong, Justin Joseph, "Low-Cost Computing Using Raspberry Pi 2 Model
B", 2017, Vol. 5, Issue 2, 13 pages.
[5] Jinzhou Huang, Ming Zhou, "Extracting Chatbot Knowledge from Online Discussion
Forums", 2018, Vol. 2, Issue 1, 8 pages.
[6] Kaushal Mittal et al., "Classification, Clustering and Application in Intrusion Detection",
2014, Vol. 4, Issue 1, 7 pages.
[7] K. Sripath Roy, Bhanu Prakash, "Realization of a low-cost smart home security using
telegram messenger and voice", 2017, Vol. 115, Issue 5, 6 pages.
[8] Michael Bächle, Stephen Daurera, "Chatbots as a user interface for assistive
technology in the workplace", 2016, Vol. 3, Issue 2, 6 pages.
[9] P. Vigneswari, R.R. Narmatha, "Automated security system using surveillance", 2015,
Vol. 5, Issue 2, 5 pages.
[10] Priya B. Patel, Viraj M. Choksi, "Smart motion detection system using raspberry pi",
2017, Vol. 10, Issue 5, 4 pages.
[11] Rupesh Kumar Rout et al., "A survey on object detection and tracking algorithms",
2013, Vol. 2, Issue 3, 5 pages.
[12] Samer Hijazi, Rishi Kumar, Chris Rowen, "Convolutional neural networks for image
recognition", 2016, Vol. 55, Issue 4, 8 pages.
[13] Soo Kuo Yang et al., "Human motion detection system (Video motion detection
module)", March 2005, Vol. 2, Issue 2, 10 pages.
[14] https://www.elprocus.com/gsm-based-home-security-system-working-with-
applications/, referred on 5th October 2019.
[15] https://www.instructables.com/id/Raspberry-Pi-as-low-cost-HD-surveillance-camera/,
referred on 26th September 2019.
[16] https://www.hackster.io/hackershack/smart-security-camera-90d7bd, referred on 10th
October 2019.
[17] https://machinelearningmastery.com/how-to-train-an-object-detection-model-with-
keras/, referred on 3rd November 2019.
[18] https://magpi.raspberrypi.org/articles/smart-security-camera, referred on 25th August
2019.
[19] https://www.pyimagesearch.com/2019/03/25/building-a-raspberry-pi-security-camera-
with-opencv/, referred on 27th August 2019.
[20] https://towardsdatascience.com/make-your-own-smart-home-security-camera-
a89d47284fc7, referred on 27th September 2019.
[21] https://tutorials-raspberrypi.com/raspberry-pi-security-camera-livestream-setup/,
referred on 14th September 2019.

APPENDIX-1

Fig 1: Sample code in python IDE: Atom

Fig 1 (a): Smart home security code running in terminal

Fig 1 (b): Code snippet of telegram alert notifications

APPENDIX-2

Fig 2: Facial recognition and motion detection working together

Fig 2 (a): Chatbot sending intruder alert

Fig 2 (b): Chatbot sending alerts about unknown movement

