
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELAGAVI-590018

A Project Report On

“FACESMASH: AI ASSISTED SOLUTION FOR MUSIC,
CROWD MANAGEMENT AND SECURITY”

Submitted in partial fulfilment of the requirements for the award of the Degree of
Bachelor of Engineering in Computer Science and Engineering

Submitted by
Drithiman M (1OX14CS022)
Karuna Kiran Bhadra (1OX14CS022)
Salman Ulla (1OX14CS022)
Ransom David (1OX14CS022)

Under the support and guidance of

Ms. Jessy Janet Kumari


Assistant Professor,
Department of CSE

Department of Computer Science and Engineering


The Oxford College of Engineering
Hosur Road, Bommanahalli, Bangalore-560068
(Approved by AICTE, New Delhi, Accredited by NBA, NAAC, New Delhi & Affiliated to VTU, Belagavi)
2017-2018
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
THE OXFORD COLLEGE OF ENGINEERING
Hosur Road, Bommanahalli, Bangalore-560068
(Approved by AICTE, New Delhi, Accredited by NBA, NAAC, New Delhi & Affiliated to VTU, Belagavi)

CERTIFICATE

Certified that the project work entitled “FACESMASH” has been carried out by DRITHIMAN M
(10X14CS022), KARUNA KIRAN BHADRA (1OX14CS039), SALMAN ULLA
(1OX14CS50), RANSOM DAVID (1OX14CS0400), bonafide students of The Oxford
College of Engineering, Bangalore, in partial fulfilment for the award of the Degree of
Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya
Technological University, Belagavi during the year 2017-2018. The project report has
been approved as it satisfies the academic requirements in respect of project work
prescribed for the said degree.

Ms. Jessy Janet Kumari Dr. R.J. Anandhi Dr. R.V Praveena Gowda
Project Guide H.O.D, Dept. of CSE Principal, TOCE

External Viva

Name of Examiner Signature with date


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
THE OXFORD COLLEGE OF ENGINEERING
Hosur Road, Bommanahalli, Bangalore-560068
(Approved by AICTE, New Delhi, Accredited by NBA, NAAC, New Delhi & Affiliated to VTU, Belagavi)

Department Vision

To establish the department as a renowned center of excellence in the area of
scientific education, research with industrial guidance, and exploration of the
latest advances in the rapidly changing field of computer science.

Department Mission

To produce technocrats with creative technical knowledge and intellectual
skills to sustain and excel in a highly demanding world with confidence.
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
THE OXFORD COLLEGE OF ENGINEERING
Hosur Road, Bommanahalli, Bangalore-560068
(Approved by AICTE, New Delhi, Accredited by NBA, NAAC, New Delhi & Affiliated to VTU, Belagavi)

DECLARATION

We, students of Eighth Semester B.E. at the Department of Computer Science
and Engineering, The Oxford College of Engineering, Bangalore, declare that
the project entitled “FACESMASH: AI Assisted Solution for Music, Crowd
Management and Security” has been carried out by us and submitted in partial
fulfilment of the course requirements for the award of the degree of Bachelor
of Engineering in Computer Science and Engineering of Visvesvaraya
Technological University, Belagavi during the academic year 2017-2018.

Place: Bangalore
Date:
ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without the mention of the people who made it possible, whose
constant guidance and encouragement crowned our efforts with success.

I have great pleasure in expressing my deep sense of gratitude to my respected
founder chairman late Shri S. Narasa Raju and to the respected chairman Shri S.N.V.L
Narasimha Raju for having provided me with great infrastructure and well-furnished labs.
I take this opportunity to express my profound gratitude to our respected Principal
Dr. R V Praveena Gowda for his constant support and encouragement.
I am grateful to the Vice Principal and Head of the Department Dr. R. J. Anandhi,
Department of CSE, for her unfailing encouragement and suggestions given to me in the
course of my project work.

Guidance and deadlines play a very important role in the successful completion of the project
report on time. I also convey my gratitude to Ms. Jessy Janet Kumari, Assistant
Professor, Department of CSE, for having constantly monitored the development of the
project report and set up precise deadlines.

Finally, a note of thanks to the Department of Computer Science and Engineering, both teaching
and non-teaching staff, for the co-operation extended to me.

ABSTRACT
Music plays an important role in an individual’s life. It is an important source of
entertainment and is often associated with a therapeutic role. Listening to music can help
reduce stress according to many studies. It can help relieve a person from anxiety,
depression, and other emotional and mental problems. Using traditional music players, a
user has to manually browse through his playlist and select songs that would soothe his
mood and emotional experience. This task is labor-intensive and time-consuming. The
second functionality that we are implementing through the application is a Group-Emotion
analyzer that can complement the music player in order to figure out the general emotional
state of a gathering, such as a party or a club, and dynamically adjust the music pattern
accordingly. This would, in essence, create a virtual DJ and eliminate the need for a real
one, thus saving on cost and making the process more efficient.

Next, we are going to provide through the application a data-driven crowd analysis system.
This will take advantage of the multi-face tracking feature of the algorithm in use. From a
regularly timed high-definition camera we will input images of the required area. The
algorithm then tracks each decipherable face and provides approximate attribute values for
the people whose faces are tracked. This means we get a random sample of the crowd to
analyze. This can provide us with metrics such as the age groups to which the people in the
crowd belong, the gender distribution, the emotional state of the crowd, etc. This can
immensely help people in professions such as event management, or help identify crisis spots
in the crowd to form a kind of early warning system.

Finally, we aim to provide a personal security platform through the application. The
application can be used by security personnel in a restricted-entry scenario to check for
matches in the local database of people with known criminal backgrounds. This will act as
a strong deterrent for people with bad intentions. The application will use Bayesian networks
in the server to act as a match modeller. The acceptable confidence level will be 70%.

Our other aim is to innovate a roadmap to a new kind of UX for all software applications:
one that works with human intuition and is smart. This will lower the barrier to using
technology for all people and pave the way for a new form of interaction.


Table of Contents

Acknowledgement
Abstract
Table of Contents
List of Figures

Chapter

1. INTRODUCTION
   1.1 Introduction to Facial Recognition and Mood Detection
   1.2 Objectives and Goals
   1.3 Existing System
   1.4 Proposed System
   1.5 Summary

2. SYSTEM DESIGN

3. SYSTEM REQUIREMENT SPECIFICATION

4. IMPLEMENTATION

5. SYSTEM TESTING

6. INFERENCE FROM RESULTS

7. CONCLUSION

REFERENCES

List of Figures

Figure

Fig 1.1 All the emotions that can be classified
Fig 1.2 Architecture followed by the Viola-Jones OpenCV method of facial feature recognition
Fig 1.3 Use case diagram of the proposed system
Fig 2.1 Overview of system design
Fig 2.2 Top: original images; Bottom: processed images with mask
Fig 2.3 Top: pre-processed images; Bottom: L1 images with Gabor bank features
Fig 2.4 Overall block diagram of methodology
Fig 2.5 Methodology of image processing
Fig 2.6 Real-valued measures from a sample neutral expression image
Fig 2.7 Binary measures from sample expression images
Fig 4.1 Installing updates in Android Studio
Fig 4.2 Importing samples into Android Studio

CHAPTER 1
INTRODUCTION

1.1 INTRODUCTION TO FACIAL RECOGNITION AND MOOD DETECTION

A facial recognition system is a technology capable of identifying or verifying a person


from a digital image or a video frame from a video source. There are multiple methods by
which facial recognition systems work, but in general, they work by comparing
selected facial features from a given image with faces within a database.
While initially a form of computer application, it has seen wider uses in recent times on
mobile platforms and in other forms of technology, such as robotics.

It is typically used in security systems and can be compared to other biometrics such
as fingerprint or eye iris recognition systems.[1] Recently, it has also become popular as a
commercial identification and marketing tool.[2]

In the case of Mood Detection, Emotion recognition is the process of identifying


human emotion, most typically from facial expressions. This is something that
humans do automatically, but computational methodologies have also been developed.

The above process leverages techniques from multiple areas, such as signal
processing, machine learning, and computer vision. Computers use different methods
to interpret emotion, such as Bayesian networks. [3]

Fig 1.1 shows all the emotions that can be classified.

1.2 OBJECTIVES AND GOALS

Facial expressions are important in facilitating human communication and


interactions. Also, they are used as an important tool in behavioural studies and in medical
rehabilitation. Facial image based mood detection techniques may provide a fast and
practical approach for non-invasive mood detection. The purpose of the present study was
to develop an intelligent system for facial image based expression classification using
committee neural networks.

Facial expressions and related changes in facial patterns give us information about
the emotional state of the person and help to regulate conversations with the person.
Moreover, these expressions help in understanding the overall mood of the person in a
better way. Facial expressions play an important role in human interactions and non-verbal
communication. Classification of facial expressions could be used as an effective tool in
behavioural studies and in medical rehabilitation. Facial expression analysis deals with
visually recognizing and analysing different facial motions and facial feature changes.

The main objective of the system being built is to be able to recognize facial
expressions and predict the emotion based upon a preset neural structure for recognition
of the same.

The system should be able to identify the correct facial expression when presented
with an image from subjects used in training or in initial testing. Committee neural networks
offer a potential tool for image-based mood detection and even crowd analysis and
management. The system can also be used on one or more faces per transaction and can be
helpful in crowd-based analysis of the mood and intent of the crowd.

1.3 Existing System


Research efforts in human–computer interaction are focused on the means to
empower computers (robots and other machines) to understand human intention, e.g.
speech recognition and gesture recognition systems [1]. In spite of considerable
achievements in this area during the past several decades, there are still a lot of
problems, and many researchers are still trying to resolve them.
Besides, there is another important but often ignored mode of communication
that may be important for more natural interaction: emotion plays an important role in
contextual understanding of messages from others in speech or visual forms.

There are numerous areas in human–computer interaction that could effectively
use the capability to understand emotion. For example, it is accepted that emotional ability
is an essential factor for the next-generation personal robot, such as the Sony AIBO. It can
also play a significant role in intelligent rooms and affective computer tutors. Although
limited in number compared with the efforts being made towards intention-translation
means, some researchers are trying to realise man-machine interfaces with an emotion
understanding capability.

Most of them are focused on facial expression recognition and speech
signal analysis. Another possible approach for emotion recognition is physiological signal
analysis. We believe that this is a more natural means of emotion recognition, in that the
influence of emotion on facial expression or speech can be suppressed relatively easily, and
emotional status is inherently reflected in the activity of the nervous system. In the field of
psychophysiology, traditional tools for the investigation of human emotional status are
based on the recording and statistical analysis of physiological signals from both the central
and autonomic nervous systems.


Researchers at IBM recently reported an emotion recognition device based on mouse-type
hardware. Picard and colleagues at the MIT Media Laboratory have been exerting their
efforts to implement an affective computer since the late 1990s. First, their algorithm
development and performance tests were carried out with data that reflect intentionally
expressed emotion.

The existing system is too naïve to work with and cannot support any further evolutionary
growth. The existing system proposes to use the Viola-Jones algorithm to classify
facial expressions and select a playlist. Fig 1.2 details the architecture of the system.

Fig 1.2 shows the architecture followed by the Viola-Jones OpenCV method of facial feature
recognition.

The existing system uses OpenCV to classify the face emotions. This may lead to
incompatibility across devices, as on systems that do not run OpenCV the code will not
perform as expected.
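
For reference, the Viola-Jones detection step used by the existing system can be exercised with OpenCV's bundled Haar cascade, as in the minimal sketch below; the cascade file and detection parameters shown are common defaults and an assumption, not values taken from the existing system.

```python
import cv2

# OpenCV ships a pre-trained Viola-Jones (Haar cascade) frontal-face detector.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(image_bgr):
    """Return a list of (x, y, w, h) bounding boxes for detected faces."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```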

1.4 Proposed System

Our proposed system follows the action patterns below for the required use cases:

USE CASE I : Personal Music Player


1. User triggers camera to take a selfie
2. Captured picture raw data is sent to server


3. Server uses a Bayesian network to evaluate the data and produce a classification of the
input.
4. This classification is used to select a playlist that best matches the emotional state of
the user.
5. The audio files corresponding to the playlists are then streamed one by one to the
application.
6. The application provides a player front-end to control the playback.
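
Steps 3 and 4 on the server side could be realised as in the following minimal sketch; the emotion labels come from the expression classes used later in the report, but the playlist names and mapping are hypothetical placeholders, not the project's actual assets.

```python
from typing import Dict, List

EMOTIONS = ["neutral", "angry", "disgust", "fear", "happy", "sad", "surprised"]

# Hypothetical emotion-to-playlist mapping kept on the server.
PLAYLISTS: Dict[str, List[str]] = {
    "happy":   ["upbeat_01.mp3", "upbeat_02.mp3"],
    "sad":     ["soothing_01.mp3", "soothing_02.mp3"],
    "neutral": ["ambient_01.mp3", "ambient_02.mp3"],
}

def select_playlist(emotion_scores: Dict[str, float]) -> List[str]:
    """Step 4: pick the playlist that best matches the classified emotion."""
    top_emotion = max(emotion_scores, key=emotion_scores.get)
    return PLAYLISTS.get(top_emotion, PLAYLISTS["neutral"])

# Example: scores as produced by the Bayesian network in step 3.
print(select_playlist({"happy": 0.81, "neutral": 0.12, "sad": 0.07}))
```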

USE CASE II : Group-specific Music Player


The same methodology as above is used, except that the classification is put through a further
analysis phase to find an average emotion value.
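
A sketch of this extra analysis phase is shown below, assuming the classifier returns one probability vector per detected face; the combination rule (simple averaging followed by argmax) is an assumption, as the report does not fix it.

```python
import numpy as np

EMOTIONS = ["neutral", "angry", "disgust", "fear", "happy", "sad", "surprised"]

def group_emotion(per_face_scores: np.ndarray) -> str:
    """per_face_scores: array of shape (n_faces, 7) with per-face emotion
    probabilities. The group emotion is the class with the highest mean score."""
    mean_scores = per_face_scores.mean(axis=0)
    return EMOTIONS[int(np.argmax(mean_scores))]
```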

USE CASE III : Crowd analytics


1. Capture crowd image for the required area.
2. Captured raw data sent to server.
3. Server classifies the legitimate faces in the raw data.
4. Algorithm determines the attributes associated with each face.
5. For emotion, majority attributes are selected to represent the overall emotion.
6. For age, a percentage data representation provides the approximate percentage of the
crowd for each age bracket.
7. For gender, a majority value is returned.
8. All data are displayed in the application.
9. User decides the policies relating to the output.
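
Steps 5 to 7 amount to a simple aggregation over the per-face attributes, as in the sketch below; the attribute field names and the age brackets are illustrative assumptions, not values specified by the report.

```python
from collections import Counter

def summarize_crowd(faces):
    """faces: list of per-face attribute dicts, e.g.
    {"emotion": "happy", "age": 27, "gender": "F"} (output of step 4)."""
    n = len(faces)
    emotions = Counter(f["emotion"] for f in faces)
    genders = Counter(f["gender"] for f in faces)
    brackets = Counter("0-17" if f["age"] < 18 else
                       "18-30" if f["age"] <= 30 else
                       "31-50" if f["age"] <= 50 else "51+"
                       for f in faces)
    return {
        "overall_emotion": emotions.most_common(1)[0][0],   # step 5: majority emotion
        "age_distribution": {b: round(100 * c / n, 1)        # step 6: percentage per bracket
                             for b, c in brackets.items()},
        "majority_gender": genders.most_common(1)[0][0],     # step 7: majority gender
    }
```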

USE CASE IV: Facial recognition platform, to be used as a security tool at checkpoints.
1. Picture of the entering person is captured through the application.
2. The picture is sent over to the server containing the database of local and regional
criminal profiles.
3. The face is deconstructed into mesh.
4. Mesh is compared against database.
5. If a match is found, then an alert is raised discreetly.
6. Else, the person is allowed to enter.
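
Steps 4 to 6 come down to comparing the captured face representation against every stored profile and raising an alert only above a confidence threshold (70%, as stated in the abstract). The cosine-similarity comparison below is a hedged sketch; the report does not specify the actual matching metric or the form of the face "mesh".

```python
from typing import Dict, Optional
import numpy as np

MATCH_THRESHOLD = 0.70  # acceptable confidence level stated in the abstract

def find_match(probe: np.ndarray, database: Dict[str, np.ndarray]) -> Optional[str]:
    """probe: vector derived from the captured face mesh (step 3).
    database: {profile_id: stored_vector}. Returns the best matching profile id
    if its similarity exceeds the threshold, otherwise None (steps 4-6)."""
    best_id, best_score = None, 0.0
    for profile_id, ref in database.items():
        score = float(np.dot(probe, ref) /
                      (np.linalg.norm(probe) * np.linalg.norm(ref) + 1e-9))
        if score > best_score:
            best_id, best_score = profile_id, score
    return best_id if best_score >= MATCH_THRESHOLD else None
```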

Fig 1.3 below shows the use case diagram of the proposed FaceSmash application.


Fig 1.3 Use case diagram of proposed system

1.5 Summary

FaceSmash is an Android application that aims to realize facial detection and emotion
detection using artificial intelligence and machine learning. This will then be used for the
above-mentioned use cases: an AI-assisted solution for music playing, crowd management
and security.

The project contains three modules,


1. Emotion based music suggestion
2. Crowd Analyzer.
3. Facial Recognition.


CHAPTER 2
SYSTEM DESIGN

2.1 Design Consideration

Facial expressions play a significant role in human dialogue. As a result, there has
been considerable work done on the recognition of emotional expressions and the
application of this research will be beneficial in improving human-machine dialogue. One
can imagine the improvements to computer interfaces, automated clinical (psychologica l)
research or even interactions between humans and autonomous robots.

Unfortunately, a lot of the literature does not focus on trying to achieve high
recognition rates across multiple databases. In this project we develop our own mood
detection system that addresses this challenge. The system involves pre-processing image
data by normalizing and applying a simple mask, extracting certain (facial) features using
PCA and Gabor filters and then using SVMs for classification and recognition of
expressions. Eigenfaces for each class are used to determine class-specific masks which are
then applied to the image data and used to train multiple, one against the rest, SVMs. We
find that simply using normalized pixel intensities works well with such an approach. Fig
2.1 details the system overview.

Fig 2.1 Overview of System Design.
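
The classification stage described above (multiple one-against-the-rest SVMs over the extracted feature vectors) could be set up as in the sketch below; scikit-learn is an assumed choice of library, not one named in the report.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def train_expression_svms(features: np.ndarray, labels: np.ndarray) -> OneVsRestClassifier:
    """features: (n_images, n_features) masked/normalized feature vectors,
    labels: integer expression classes (one of the seven expressions).
    Trains one linear SVM per class against the rest."""
    clf = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000))
    clf.fit(features, labels)
    return clf
```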

We performed pre-processing on the images used to train and test our algorithms as
follows:

1. The location of the eyes is first selected manually


2. Images are scaled and cropped to a fixed size (170 x 130) keeping the eyes in all
images aligned
3. The image is histogram equalized using the mean histogram of all the training
images to make it invariant to lighting, skin colour etc.
4. A fixed oval mask is applied to the image to extract face region. This serves to
eliminate the background, hair, ears and other extraneous features in the image
which provide no information about facial expression.
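
A minimal sketch of steps 2 to 4 is given below, assuming OpenCV; eye localisation (step 1) is manual, and a simple per-image histogram equalization stands in for the equalization against the mean training histogram described above.

```python
import cv2
import numpy as np

def preprocess_face(gray_face: np.ndarray) -> np.ndarray:
    """gray_face: grayscale crop roughly centred on the face (eyes aligned).
    Returns a 170x130 equalized, oval-masked image."""
    face = cv2.resize(gray_face, (130, 170))      # OpenCV size is (width, height)
    face = cv2.equalizeHist(face)                 # reduce lighting / skin-tone variation
    mask = np.zeros_like(face)
    cv2.ellipse(mask, (65, 85), (55, 80), 0, 0, 360, 255, -1)  # fixed oval mask
    return cv2.bitwise_and(face, mask)            # background, hair and ears removed
```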


This approach works reasonably well in capturing expression-relevant facial information
across all databases. Examples of pre-processed images from the various datasets are shown
in Fig 2.2 below.

Fig 2.2 Top: Original images, Bottom: Processed images with mask

 Normalized pixel intensities: Every image in our training set is normalized by


subtracting the mean of all training set images. The masked region is then converted
to a column vector which forms the feature vector. This is a common (albeit naïve)
approach and produces a feature vector of length 15,111 elements.
 Gabor filter representations: Gabor filters are often used in image processing and
are based on physiological studies of the human visual cortex. The use of Gabor
filtered facial images has been shown to result in improved accuracy for facial
expression recognition. One approach to using these filters is to generate a bank of
filters across multiple spatial frequencies and orientations. The filtered outputs are
then concatenated, and down-sampling or PCA is often used to reduce
dimensionality. We use an approach similar to one that provides competitive results,
and use the L1 norm of each of the Gabor bank features for a given image. Our
Gabor bank contains filters at 5 spatially varying frequencies and 8 orientations. In
Fig 2.3 below, we show examples of Gabor features.
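
The Gabor-bank features could be computed as in the sketch below; the kernel size and the frequency values are illustrative assumptions, and only the 5-frequency by 8-orientation layout and the L1 norm come from the text.

```python
import cv2
import numpy as np

def gabor_bank_features(face: np.ndarray) -> np.ndarray:
    """Returns 40 features: the L1 norm of the response to each filter in a
    5-frequency x 8-orientation Gabor bank."""
    feats = []
    for wavelength in (4, 6, 8, 12, 16):            # 5 spatial frequencies (assumed values)
        for k in range(8):                          # 8 orientations
            theta = k * np.pi / 8
            kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                        lambd=wavelength, gamma=0.5)
            response = cv2.filter2D(face.astype(np.float32), cv2.CV_32F, kernel)
            feats.append(float(np.abs(response).sum()))  # L1 norm of the filtered image
    return np.array(feats)
```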


Fig 2.3 Top: Pre-processed images, Bottom: L1 images with Gabor bank features

2.2 System Architecture


Two types of parameters were extracted from the facial image: real valued and
binary. A total of 15 parameters consisting of eight real-valued parameters and seven binary
parameters were extracted from each facial image. The real valued parameters were
normalized. Generalized neural networks were trained with all fifteen parameters as inputs.
There were seven output nodes corresponding to the seven facial expressions (neutral,
angry, disgust, fear, happy, sad and surprised).
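
One generalized network of the committee could be defined as in the sketch below using TensorFlow (listed in the software requirements); the hidden-layer size and optimizer are assumptions, and only the 15 inputs and 7 outputs are fixed by the text.

```python
import tensorflow as tf

def build_generalized_net() -> tf.keras.Model:
    """15 inputs (8 real-valued + 7 binary parameters) -> 7 expression classes."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(15,)),
        tf.keras.layers.Dense(32, activation="relu"),       # assumed hidden layer
        tf.keras.layers.Dense(7, activation="softmax"),     # neutral .. surprised
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```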

Based on initial testing, the best performing neural networks were recruited to form
a generalized committee for expression classification. Due to a number of ambiguous and
no-classification cases during the initial testing, specialized neural networks were trained
for angry, disgust, fear and sad expression. Then, the best performing neural networks were
recruited into a specialized committee to perform specialized classification. A final
integrated committee neural network classification system was built utilizing both
generalized committee networks and specialized committee networks.
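
The committee decision itself can be as simple as averaging the member outputs, as in the sketch below; the report does not state the exact combination rule, so this is an assumption.

```python
import numpy as np

def committee_predict(members, parameter_vectors):
    """members: list of trained networks exposing .predict();
    parameter_vectors: array of shape (n_samples, 15).
    Averages the members' class probabilities and returns the winning class."""
    mean_probs = np.mean([m.predict(parameter_vectors, verbose=0) for m in members],
                         axis=0)
    return np.argmax(mean_probs, axis=1)
```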

Then, the integrated committee neural network classification system was evaluated
with an independent expression dataset not used in training or in initial testing. A
generalized block diagram of the entire system is shown in Figure 2.4.


Fig. 2.4 Overall block diagram of methodology

Facial expression images are to be obtained from the Cohn-Kanade database. The
database contains facial images taken from 97 subjects with age ranging from 18 to 30
years. The database had 65 percent female subjects. Fifteen percent of the subjects were
African-American and three percent were Asian or Latino.

The database images were taken with a generic camera. The camera was located
directly in front of the subject. The subjects performed different facial displays (single
action units and combinations of action units) starting and ending with a neutral face. The
displays were based on descriptions of prototypic emotions (i.e., neutral, happy, surprise,
anger, fear, disgust, and sad).

The image sequences were digitized into 640 by 480 pixel arrays with 8-bit
precision for grayscale values. Two types of parameters were extracted from the facial
images of 97 subjects: (1) real valued parameters and (2) binary parameters. The real valued
parameters have a definite value depending upon the distance measured. This definite value
was measured in number of pixels. The binary measures gave either a present (= 1) or an
absent (= 0) value. In all, eight real valued measures and seven binary measures were
obtained.


A number of parameters, both real-valued and binary, were extracted and analysed to
decide their effectiveness in identifying a certain facial expression. The features which did
not provide any effective information of the facial expression portrayed in the image were
eliminated and were not used in the final study. The real valued and binary feature selection
was inspired by the FACS.

Fig 2.5 shows the methodology of image processing.

Real valued parameters (shown in Fig. 2.6)


1. Eyebrow raise distance – The distance between the junction point of the upper and the
lower eyelid and the lower central tip of the eyebrow.
2. Upper eyelid to eyebrow distance – The distance between the upper eyelid and
eyebrow surface.
3. Inter-eyebrow distance – The distance between the lower central tips of both the
eyebrows.
4. Upper eyelid – lower eyelid distance – The distance between the upper eyelid and
lower eyelid.
5. Top lip thickness – The measure of the thickness of the top lip.
6. Lower lip thickness – The measure of the thickness of the lower lip.
7. Mouth width – The distance between the tips of the lip corner.
8. Mouth opening – The distance between the lower surface of top lip and upper surface
of lower lip.


Fig 2.6 Real-valued measures from a sample neutral expression image. 1-eyebrow raise
distance, 2-upper eyelid to eyebrow distance, 3-inter eyebrow distance, 4-upper eyelid to
lower eyelid distance, 5-top lip thickness, 6-lower lip thickness, 7-mouth width, 8-mouth
opening.

Binary parameters
1. Upper teeth visible – Presence or absence of visibility of upper teeth.
2. Lower teeth visible – Presence or absence of visibility of lower teeth.
3. Forehead lines – Presence or absence of wrinkles in the upper part of the forehead.
4. Eyebrow lines – Presence or absence of wrinkles in the region above the eyebrows.
5. Nose lines – Presence or absence of wrinkles in the region between the eyebrows
extending over the nose.
6. Chin lines – Presence or absence of wrinkles or lines on the chin region just below the
lower lip.
7. Nasolabial lines – Presence or absence of thick lines on both sides of the nose
extending down to the upper lip. These binary parameters are depicted in Fig 2.7


Fig 2.7 Binary measures from sample expression images. 1-upper teeth visible, 2-lower
teeth visible, 3-forehead lines, 4-eyebrow lines, 5-nose lines, 6-chin lines, 7-nasolabial
lines.
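
Putting the two groups of measures together, the 15-element input vector fed to the networks can be assembled as in this sketch; the normalization of the real-valued measures shown here is a placeholder, since the report only states that they were normalized without giving the scheme.

```python
import numpy as np

def assemble_feature_vector(real_valued, binary) -> np.ndarray:
    """real_valued: the 8 pixel-distance measures listed above;
    binary: the 7 presence/absence flags (0 or 1).
    Returns the 15-element vector used as network input."""
    assert len(real_valued) == 8 and len(binary) == 7
    real = np.asarray(real_valued, dtype=float)
    real = real / (np.max(real) + 1e-9)        # placeholder normalization scheme
    return np.concatenate([real, np.asarray(binary, dtype=float)])
```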

2.4 Use Case Diagrams

2.5 Sequence Diagram

2.6 Dataflow Diagram


Chapter 3

System Requirement Specification


Requirement Specification is a complete specification of the behaviour of the system to be
developed. It includes a set of use cases that describes all the interactions the user will have
with the software. Use cases are also known as functional requirements. In addition to use
cases, the document also contains non-functional requirements; non-functional
requirements impose constraints on the design or implementation.

3.1 Software Requirements


 Android Operating System. Version 4.4 (KitKat) and above.
 Good network connectivity to the Internet.
 Stream-configurable server system.
 Google TensorFlow for training the classifier.

3.2 Hardware Requirements

 Smartphone running required Android operating system version.


 At least one camera with a minimum required sensor Megapixel count of 5MP.
 Server configuration: at least 8GB DRAM.
 Server configuration: NVIDIA GPU above GeForce 830M recommended.
 Server configuration: Intel Core 2 Duo and above recommended.


Chapter 4

Implementation
4.1 Selection of Platform
Android Studio is an integrated development environment (IDE) for the Android
platform. It simplifies app development. Though offered by Google, seasoned Java
developers will immediately recognise that the toolkit is a version of IntelliJ IDEA.

According to IDC, globally, Android’s share of the smartphone market is about 45


per cent. The best part is that Android is open source and learning it is not at all difficult.
Students and professionals want to, at least, know its basics. There are many platforms, like
Android Studio, where even beginners can get into Android development. Android Studio
is a cross-platform integrated development environment (IDE) for developing on the
Android platform. It is written in Java and is available for Linux, Windows as well as for
macOS. Eclipse, which also provided Android development tools, has been replaced by
Android Studio as Google’s primary IDE for native Android application development. The
main reason for this move was that Eclipse was not stable.

Android Studio offers a better Gradle build environment, smarter shortcuts, an
improved user interface (UI) designer, a better memory monitor, an improved string
translation editor and better speed. The build system in Android Studio replaces the Ant
system used with Eclipse ADT. It can run from the menu as well as from the command
line. It allows you to track memory allocation as it monitors memory use. It has built-in
support for the Google Cloud Platform, making it easy to integrate Google Cloud
Messaging and App Engine. It also comes with inline debugging and performance analysis
tools. Android Studio has the Android Virtual Device (AVD) manager, which comes with
emulators for Nexus 6 and Nexus 9 devices. It also offers build variants and the capability
to generate multiple APK files. Whenever one compiles a program, the configured lint and
IDE inspections run automatically.

Configuration

Installation
Before you set up Android Studio in Linux, you need to install JDK 6 or higher. In fact,
JDK 7 is required for developing Android 5.0 or above. The other requirements are a
minimum of 2GB RAM (though 4GB is recommended), 400MB hard disk space and at
least 1GB for the Android SDK, emulator system images, caches, GNU C Library (glibc)
2.15 or later, etc. After installing Android Studio and setting it up, go to the SDK manager
to update the required tools, platforms, etc, required for app-building. These packages
provide the basic SDK tools for app development, without an IDE. If you prefer to use a
different IDE, the standalone Android SDK tools can be downloaded. One can set up an
update channel to Stable by going to: File > Settings > Appearance & Behaviour > System
Settings > Updates, as shown in Figure 4.1 below.


Fig. 4.1 shows how updates are to be installed.

Fig 4.2 shows how samples can be imported into Android Studio.


4.2 Functional Descriptions of Modules


Chapter 5

System Testing

