
DEEP CONVOLUTIONAL NEURAL NETWORKS

BASED AGE AND GENDER CLASSIFICATION


WITH FACIAL IMAGES

A THESIS

Submitted by

KAVIYAPRIYA.G
(910619405001)

in partial fulfillment for the award of the degree of

MASTER OF ENGINEERING

IN

COMPUTER SCIENCE

K.L.N. COLLEGE OF ENGINEERING


COMPUTER SCIENCE AND ENGINEERING
AN AUTONOMOUS INSTITUTION
ANNA UNIVERSITY, CHENNAI

APRIL 2021
ANNA UNIVERSITY, CHENNAI

BONAFIDE CERTIFICATE

Certified that this Report titled “DEEP CONVOLUTIONAL NEURAL


NETWORKS-BASED AGE AND GENDER CLASSIFICATION WITH
FACIAL IMAGES” is the bonafide work of G. KAVIYAPRIYA
(910619205001) who carried out the work under my supervision. Certified further
that to the best of my knowledge the work reported herein does not form part of
any other thesis or dissertation on the basis of which a degree or award was
conferred on an earlier occasion on this or any other candidate.

SIGNATURE SIGNATURE
Dr. D. VIJAYALAKSHMI, M.E., Ph.D. Dr. S. MIRUNA JOE AMALI, M.E., Ph.D.
PROFESSOR AND HEAD PROFESSOR
Computer Science and Engineering Computer Science and Engineering
K.L.N. College of Engineering K.L.N College of Engineering
Pottapaliyam Pottapaliyam
Sivagangai – 630 611 Sivagangai – 630 611

Submitted for the project phase-II viva - voce examination held on ..............

Internal Examiner External Examiner



ACKNOWLEDGEMENT

Firstly, I thank almighty God for his presence throughout the project and
my parents, who helped me in every means possible for the completion of this
project work. I extend my gratitude to our college President,
Er. G. KAVIYA PRIYA, B.TECH(IT), K.L.N. College of Engineering for
making me march towards the glory of success.

I extend my gratefulness to our college Secretary and Correspondent,


Dr. K. N. K. GANESH, B.E., PhD(Hons), K.L.N. College of Engineering
for the motivation throughout the academic works.

I express my sincere thanks to our beloved management and Principal,


Dr. A.V.RAM PRASAD, M.E, Ph.D., for all the facilities offered.

I express my thanks to Dr. P. R. VIJAYALAKSHMI, M.E., Ph.D., our
Head of the Department of Computer Science and Engineering for her
motivation and tireless encouragement.

I express my deep sense of gratitude and sincere thanks to


Dr. S. MIRUNA JOE AMALI, M.E., Ph.D., Associate Professor, Department
of Computer Science and Engineering for her valuable guidance and
continuous encouragement in the completion of my final project work.

ABSTRACT

We build an age and gender classification system that classifies age and gender from facial images with the help of a deep learning framework. Automatic age and gender classification has become relevant to an increasing number of applications, particularly since the rise of social platforms and social media. Nevertheless, the performance of existing methods on real-world images is still significantly lacking, especially when compared with the tremendous leaps in performance recently reported for the related task of face recognition. Keypoint features and descriptors contain the visual description of an image patch and are used to compare the similarity between image features. In this work we show that by learning representations with deep convolutional neural networks (VGG16 in Phase I) and a ResNet architecture for classification in Phase II, a significant increase in performance can be obtained on these tasks. To this end, we propose a simple convolutional network architecture that can be used even when the amount of learning data is limited. We evaluate our method on a recent dataset for age and gender estimation and show it to dramatically outperform current state-of-the-art methods.



TABLE OF CONTENTS

CHAPTER TITLE PAGE NO.

ABSTRACT iii
LIST OF FIGURES vii
1 INTRODUCTION 1
1.1 INTRODUCTION 1
1.2 DOMAIN EXPLANATION 3
1.2.1 Image Processing 3
1.2.2 Images and Pictures 7
1.2.3 Images and Digital Images 8
1.3 IMAGE PROCESSING FUNDAMENTALS 8
1.3.1 Pixel 8
1.3.2 Pixel Connectivity 9
1.3.3 Pixel Values 10
1.3.4 Color Scale 12
1.3.5 Some applications 13
1.3.6 Aspects of Image Processing 14
1.3.7 An Image Processing Task
1.4 EXISTING SYSTEM 16
2 LITERATURE REVIEW 18
3 METHODOLOGY 26
3.1 PROPOSED SYSTEM 26
3.2 SYSTEM ARCHITECTURE 27
3.3 FLOW DIAGRAM 28
4 TESTING OF PRODUCT 29
4.1 TESTING OF PRODUCT 29

4.2 UNIT TESTING 29


4.3 INTEGRATION TESTING 30
4.3.1 White Box Testing 30
4.3.2 Black Box Testing 30
4.4 VALIDATION TESTING 31
4.5 USER ACCEPTANCE TESTING 31
4.6 OUTPUT TESTING 31
4.6.1 System Implementation 31
4.6.2 User Training 32
4.6.3 Training on the Application Software 32
4.6.4 Operational Documentation 32
4.7 SYSTEM MAINTENANCE 33
4.7.1 Corrective Maintenance 33
4.7.2 Adaptive Maintenance 33
4.7.3 Perfective Maintenance 33
4.7.4 Preventive Maintenance 34
5 MODULES DESCRIPTION 35
5.1 INPUT IMAGE 35
5.2 FACE REGION DETECTION 35
5.3 KEY-POINT FEATURES 36
5.4 CLASSIFICATION 36
5.5 PERFORMANCE ANALYSIS 37
6 IMPLEMENTATION OF SYSTEM 39
6.1 SYSTEM REQUIREMENTS 39
6.1.1 Software Requirements 39
6.1.2 Hardware Requirements 39
6.2 SOFTWARE DESCRIPTION 40
6.2.1 History of Python 41

6.2.2 Python Features 41


6.3 ANACONDA 43
6.3.1 Anaconda Distribution 43
6.3.2 Anaconda Enterprise 43
6.3.3 Anaconda Data Science Libraries 44
6.4 THE DATA SCIENCE PACKAGE & ENVIRONMENT MANAGER 45
6.5 ANACONDA NAVIGATOR, THE DESKTOP PORTAL TO DATA SCIENCE 46
6.6 SPYDER 46
7 RESULT AND DISCUSSION 48
7.1 RESULTS 48
7.2 SCREENSHOTS 49
8 CONCLUSION 52
REFERENCES 53

LIST OF FIGURES

FIG.NO. TITLE PAGE NO.


1.1 An image - an array or a matrix of pixels arranged in columns and rows 4
1.2 Each pixel has a value from 0 (black) to 255 (white); the possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales 4
1.3 A true-colour image assembled from three grayscale images coloured red, green and blue; such an image may contain up to 16 million different colours 5
1.4 Two connected components based on 4-connectivity 10
1.5 Pixels, with a neighborhood 12
1.6 The additive model of RGB; red, green, and blue are the primary stimuli for human colour perception and are the primary additive colours 13
3.1 System Architecture 27
3.2 Flow Diagram 28
6.1 Anaconda Enterprise 44
6.2 Data Science Libraries 45
6.3 Anaconda Packages 45
6.4 Anaconda Navigator 46
7.1 Performance Measures 48
7.2 Image is selected in file 49
7.3 Selected the Face Image 49
7.4 ResNet is Trained 50
7.5 Waiting for Accuracy Result 51
7.6 Converted the Face Images into Layers and Trained the Face Image for Accuracy Value 51

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

Age and gender classification is of great practical value. It is widely used in gathering demographic information, access control, video and image retrieval, and improving recognition speed and accuracy. Besides, age and gender classification shows important potential in the business field. The log records acquired from the system are ultimately fed into statistical analysis of consumers in order to deliver real-time personalized recommendations.

The study of gender classification, which began in the 1980s, is a two-stage process. At the first stage, research focused on how to distinguish between the sexes psychologically, mainly using ANN methods. Golomb trained a 2-layer fully connected network whose average error rate was about 8.1%. Cottrell trained a back-propagation neural network on samples reduced by principal component analysis. Valentin and Edelman adopted eigenfaces and linear neurons respectively. After this phase, automatic gender classification attracted more and more attention with the rising demand for intelligent visual monitoring. Gutta combined a radial basis function neural network with a C4.5 decision tree into a hybrid classifier, achieving a high average recognition rate of 96%. Subsequent methods such as support vector machines and AdaBoost also delivered good results for gender classification.

As for age classification, Young and Niels first proposed age estimation in 1994, roughly dividing people into three groups: children, young adults and the elderly. Hayashi studied wrinkle texture and skin analysis based on the Hough transform. Zhou used boosting as regression, and Geng proposed the Aging Pattern Subspace method. More recently, Guo presented an approach that extracts age-related features via a low-dimensional aging manifold obtained from subspace learning.

A convolutional neural network is a type of artificial neural network. Three characteristics of convolutional neural networks can be observed: weight sharing, local connectivity and pooled sampling. A deep convolutional neural network is composed of several convolutional layers and pooling layers. A convolutional neural network supplies an end-to-end model that learns to extract image features and classify them using the stochastic gradient descent algorithm. Features of each layer are obtained from local regions of the previous layer through shared weights. Because of this characteristic, convolutional neural networks seem particularly suitable for learning and expressing image features.

Early convolutional neural networks had simple architectures. The classical LeNet-5 model is generally applied to recognize handwritten digits and classify images. With the immense progress in structure optimization, convolutional neural networks have extended their field of application. The Deep Belief Network, the Convolutional Deep Belief Network and the Fully Convolutional Network emerged in response to this trend. In recent years, convolutional neural networks have developed further with the successful application of transfer learning.

We use the classic GoogLeNet as the training network. GoogLeNet, which contains 22 layers, attained the top accuracy rate in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). GoogLeNet is notable for optimizing the utilization of computing resources, keeping the computational budget constant through careful design while increasing the depth and width of the network. In order to avoid the overfitting and enormous computation that come with an expanded network architecture, GoogLeNet moves from fully connected to sparsely connected architectures, even inside the convolutions. However, when numerical computation is performed on non-uniform sparse data structures, computing infrastructures become inefficient. The Inception architecture is therefore used to cluster sparse structures into denser submatrices. This not only maintains the sparsity of the network structure, but also takes full advantage of the high computing performance of dense matrices. GoogLeNet adds auxiliary softmax layers in the middle of the network to mitigate the vanishing gradient problem. Each auxiliary softmax layer provides an extra training loss; the gradient computed from this loss is added to the gradient of the whole network during back-propagation. This method can be regarded as merging sub-networks of various depths. Thanks to convolution kernel sharing, the computed gradients accumulate, so the gradient does not fade too much.
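As a minimal sketch of how such auxiliary losses can be combined, the PyTorch-style snippet below sums the main loss and two auxiliary losses before back-propagation. The variable names are hypothetical, and the 0.3 weight follows the value commonly used for GoogLeNet's auxiliary classifiers; it is an assumption here, not a setting taken from this work.

```python
import torch
import torch.nn.functional as F

def total_loss(main_logits, aux1_logits, aux2_logits, labels, aux_weight=0.3):
    """Combine the main classifier loss with two auxiliary softmax losses.

    aux_weight=0.3 mirrors the weighting commonly used for GoogLeNet's
    auxiliary classifiers; the exact value is an assumption.
    """
    main = F.cross_entropy(main_logits, labels)
    aux1 = F.cross_entropy(aux1_logits, labels)
    aux2 = F.cross_entropy(aux2_logits, labels)
    return main + aux_weight * (aux1 + aux2)

# During training, the combined loss is back-propagated through the whole
# network, so intermediate layers also receive gradient from the auxiliary
# branches, e.g.:
#   loss = total_loss(out, aux1, aux2, y); loss.backward(); optimizer.step()
```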

The traditional gradient descent algorithm processes the images in the dataset serially when training on a GPU. This is not ideal, especially for enormous datasets that are time-consuming to train on. Therefore, we adopt a stochastic gradient descent algorithm with multiple GPUs. Each GPU holds a replica of the same network and computes gradients on its own during training. Parameters and gradients pass only between the GPUs and a parameter server. This kind of training method is termed Asynchronous Stochastic Gradient Descent (ASGD). Each GPU trains its own model replica with independent gradients, and each gradient is sent to the parameter server for asynchronous update.
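To make the parameter-server idea concrete, here is a minimal single-process sketch of the update scheme described above. It is illustrative only: compute_gradient and the worker loop are hypothetical stand-ins for the per-GPU replicas, and no real multi-GPU communication is shown.

```python
import numpy as np

class ParameterServer:
    """Holds the shared parameters and applies updates as they arrive."""

    def __init__(self, params, lr=0.01):
        self.params = params
        self.lr = lr

    def apply_gradient(self, grad):
        # Asynchronous SGD: each incoming gradient is applied immediately,
        # without waiting for the other workers.
        self.params -= self.lr * grad

def compute_gradient(params, batch):
    # Placeholder for the gradient a GPU replica would compute on its batch
    # (here: least-squares gradient for a toy linear model).
    x, y = batch
    pred = x @ params
    return x.T @ (pred - y) / len(y)

rng = np.random.default_rng(0)
server = ParameterServer(params=np.zeros(3))
for step in range(100):
    x = rng.normal(size=(8, 3))
    y = x @ np.array([1.0, -2.0, 0.5])
    server.apply_gradient(compute_gradient(server.params, (x, y)))

print(server.params)  # moves toward the true weights [1.0, -2.0, 0.5]
```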

1.2 DOMAIN EXPLANATION


1.2.1 Image Processing

An image is an array, or a matrix, of square pixels (picture elements)


arranged in columns and rows. In an 8-bit greyscale image each picture element has an assigned intensity that ranges from 0 to 255.

A grey scale image is what people normally call a black and white image,
but the name emphasizes that such an image will also include many shades of
grey.

Figure. 1.1 An image - an array or a matrix of pixels arranged in columns and


rows.

Figure. 1.2 Each pixel has a value from 0 (black) to 255 (white). The possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales.

A normal grayscale image has 8 bit colour depth = 256 grayscales. A “true colour” image has 24 bit colour depth = 8 x 8 x 8 bits = 256 x 256 x 256 colours = ~16 million colours. Some grayscale images have more grayscales, for instance 16 bit = 65,536 grayscales. In principle, three 16-bit grayscale images can be combined to form an image with 281,474,976,710,656 different colours.

Figure. 1.3 A true-colour image assembled from three grayscale images


coloured red, green and blue. Such an image may contain up to 16 million
different colours.
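The following short sketch illustrates these bit depths by loading an image as a NumPy array; face.jpg is only a placeholder filename.

```python
import numpy as np
from PIL import Image  # pip install pillow

# "face.jpg" is a placeholder path for any test image.
img = Image.open("face.jpg")

gray = np.array(img.convert("L"))    # 8-bit greyscale: values 0..255
rgb = np.array(img.convert("RGB"))   # 24-bit true colour: three 8-bit channels

print(gray.shape, gray.dtype, gray.min(), gray.max())  # e.g. (480, 640) uint8 0 255
print(rgb.shape, rgb.dtype)                            # e.g. (480, 640, 3) uint8
```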

There are two general groups of ‘images’: vector graphics (or line art) and
bitmaps (pixel-based or ‘images’). Some of the most common file formats are:

GIF — an 8-bit (256 colour), non-destructively compressed bitmap format.

Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG — a very efficient (i.e. much information per byte) destructively compressed 24 bit (16 million colours) bitmap format. Widely used, especially for the web and the Internet (bandwidth-limited).

TIFF — the standard 24 bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS — Postscript, a standard vector format. Has numerous sub-standards and can be


difficult to transport across platforms and operating systems.

PSD – a dedicated Photoshop format that keeps all the information in an image
including all the layers.

Pictures are the most common and convenient means of conveying or


transmitting information. A picture is worth a thousand words. Pictures concisely
convey information about positions, sizes and interrelationships between objects. They portray spatial information that we can recognize as objects. Human beings are good at deriving information from such images, because of our innate visual and mental abilities. About 75% of the information received by humans is in
pictorial form. An image is digitized to convert it to a form which can be stored in
a computer's memory or on some form of storage media such as a hard disk or CD-
ROM. This digitization procedure can be done by a scanner, or by a video camera
connected to a frame grabber board in a computer. Once the image has been
digitized, it can be operated upon by various image processing operations.

Image processing operations can be roughly divided into three major


categories: Image Compression, Image Enhancement and Restoration, and Measurement Extraction. Image Compression involves reducing the amount of memory needed to store a digital image. Image defects which could be caused by the digitization
process or by faults in the imaging set-up (for example, bad lighting) can be
corrected using Image Enhancement techniques. Once the image is in good
condition, the Measurement Extraction operations can be used to obtain useful
information from the image. Some examples of Image Enhancement and
Measurement Extraction are given below. The examples shown all operate on 256
grey-scale images. This means that each pixel in the image is stored as a number
between 0 and 255, where 0 represents a black pixel, 255 represents a white pixel
and values in-between represent shades of grey. These operations can be extended
to operate on color images. The examples below represent only a few of the many
techniques available for operating on images. Details about the inner workings of
the operations have not been given, but some references to books containing this
information are given at the end for the interested reader.

1.2.2 Images and Pictures

As mentioned in the preface, human beings are predominantly visual creatures: we rely heavily on our vision to make sense of the world around us. We not only look at things to identify and classify them, but we can scan for differences, and obtain an overall rough feeling for a scene with a quick glance. Humans have evolved very precise visual skills: we can identify a face in an instant; we can differentiate colors; we can process a large amount of visual information very quickly.

However, the world is in constant motion: stare at something for long


enough and it will change in some way. Even a large solid structure, a building or
a mountain, will change its appearance depending on the time of day (day or
night); amount of sunlight (clear or cloudy), or various shadows falling upon it.
Here we are concerned with single images: snapshots, if you like, of a visual scene. Although image processing can deal with changing scenes, we shall not discuss it in any detail in this text. For our purposes, an image is a single picture which
represents something. It may be a picture of a person, of people or animals, or of
an outdoor scene, or a microphotograph of an electronic component, or the result
of medical imaging. Even if the picture is not immediately recognizable, it will not
be just a random blur.

Image processing involves changing the nature of an image in order to


either

1. Improve its pictorial information for human interpretation,

2. Render it more suitable for autonomous machine perception.

We shall be concerned with digital image processing, which involves using a computer to change the nature of a digital image. It is necessary to realize that these two represent separate but equally important aspects of image processing. A procedure which satisfies condition (1), one which makes an image look better, may be the very worst procedure for satisfying condition (2).

Humans like their images to be sharp, clear and detailed; machines prefer their images to be simple and uncluttered.

1.2.3 Images and Digital Images

Suppose we take an image, a photo, say. For the moment, let us make things easy and suppose the photo is black and white (that is, lots of shades of grey), so no colour. We may consider this image as being a two-dimensional function, where the function values give the brightness of the image at any given point. We may assume that in such an image brightness values can be any real numbers in the range 0.0 (black) to 1.0 (white).

A digital image differs from a photo in that the values are all discrete. Usually they take on only integer values, with brightness ranging from 0 (black) to 255 (white). A digital image can be considered as a large array of discrete dots, each of which has a brightness associated with it. These dots are called picture elements, or more simply pixels. The pixels surrounding a given pixel constitute its neighborhood. A neighborhood can be characterized by its shape in the same way as a matrix: we can speak, for example, of a 3x3 neighborhood. Except in very special circumstances, neighborhoods have odd numbers of rows and columns; this ensures that the current pixel is in the centre of the neighborhood.

1.3 IMAGE PROCESSING FUNDAMENTALS


1.3.1 Pixel

In order for any digital computer processing to be carried out on an image,


it must first be stored within the computer in a suitable form that can be
manipulated by a computer program. The most practical way of doing this is to
divide the image up into a collection of discrete (and usually small) cells, which
are known as pixels. Most commonly, the image is divided up into a rectangular
grid of pixels, so that each pixel is itself a small rectangle. Once this has been
done, each pixel is given a pixel value that represents the color of that pixel. It is
assumed that the whole pixel is the same color, and so any color variation that did

exist within the area of the pixel before the image was discretized is lost. However,
if the area of each pixel is very small, then the discrete nature of the image is often
not visible to the human eye.

Other pixel shapes and formations can be used, most notably the hexagonal
grid, in which each pixel is a small hexagon. This has some advantages in image
processing, including the fact that pixel connectivity is less ambiguously defined
than with a square grid, but hexagonal grids are not widely used. Part of the reason
is that many image capture systems (e.g. most CCD cameras and scanners)
intrinsically discretize the captured image into a rectangular grid in the first
instance.

1.3.2 Pixel Connectivity

The notion of pixel connectivity describes a relation between two or more


pixels. For two pixels to be connected they have to fulfill certain conditions on the
pixel brightness and spatial adjacency.

First, in order for two pixels to be considered connected, their pixel values
must both be from the same set of values V. For a grayscale image, V might be
any range of graylevels, e.g. V={22,23,...,40}; for a binary image we simply have
V={1}.

To formulate the adjacency criterion for connectivity, we first introduce the notion of a neighborhood. For a pixel p with coordinates (x,y), the set of pixels
given by:

N4(p) = {(x+1, y), (x−1, y), (x, y+1), (x, y−1)} is called its 4-neighborhood.

N8(p) = N4(p) ∪ {(x+1, y+1), (x+1, y−1), (x−1, y+1), (x−1, y−1)} is called its 8-neighborhood.

From this we can infer the definitions of 4- and 8-connectivity:

Two pixels p and q, both having values from a set V, are 4-connected if q is in the set N4(p), and 8-connected if q is in N8(p).

General connectivity can either be based on 4- or 8-connectivity; for the


following discussion we use 4-connectivity.

A pixel p is connected to a pixel q if p is 4-connected to q or if p is 4-


connected to a third pixel which itself is connected to q. Or, in other words, two
pixels q and p are connected if there is a path from p to q on which each pixel is
4-connected to the next one.

A set of pixels in an image which are all connected to each other is called a
connected component. Finding all connected components in an image and marking
each of them with a distinctive label is called connected component labeling.

An example of a binary image with two connected components based on 4-connectivity can be seen in Figure 1.4. If the connectivity were based on 8-neighbors, the two connected components would merge into one.

Figure. 1.4 Two connected components based on 4-connectivity.
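As a small illustration of connected component labeling under 4-connectivity, the sketch below uses scipy.ndimage.label, whose default structuring element corresponds to 4-connectivity; the binary array is an arbitrary toy example, not taken from this work.

```python
import numpy as np
from scipy import ndimage

# Toy binary image with two blobs that touch only diagonally.
image = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
], dtype=np.uint8)

# Default structure is cross-shaped, i.e. 4-connectivity.
labels4, n4 = ndimage.label(image)

# A 3x3 block of ones corresponds to 8-connectivity.
labels8, n8 = ndimage.label(image, structure=np.ones((3, 3)))

print(n4)  # 2 components under 4-connectivity
print(n8)  # 1 component under 8-connectivity (the diagonal touch merges them)
```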

1.3.3 Pixel Values

Each of the pixels that represents an image stored inside a computer has a
pixel value which describes how bright that pixel is, and/or what color it should be.

In the simplest case of binary images, the pixel value is a 1-bit number indicating
either foreground or background. For a grayscale image, the pixel value is a
single number that represents the brightness of the pixel. The most common pixel
format is the byte image, where this number is stored as an 8-bit integer giving a
range of possible values from 0 to 255. Typically, zero is taken to be black, and
255 is taken to be white. Values in between make up the different shades of gray.

To represent color images, separate red, green and blue components must be
specified for each pixel (assuming an RGB color space), and so the pixel `value' is
actually a vector of three numbers. Often the three different components are stored
as three separate `grayscale' images known as color planes (one for each of red,
green and blue), which have to be recombined when displaying or processing.
Multispectral Images can contain even more than three components for each pixel,
and by extension these are stored in the same kind of way, as a vector pixel value,
or as separate color planes.
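The following minimal sketch splits an RGB image into its three colour planes and recombines them, as described above; face.jpg is again just a placeholder filename.

```python
import numpy as np
from PIL import Image

rgb = np.array(Image.open("face.jpg").convert("RGB"))  # shape (H, W, 3)

# Each colour plane is itself a greyscale image.
red_plane, green_plane, blue_plane = rgb[..., 0], rgb[..., 1], rgb[..., 2]

# Recombining the planes reproduces the original image.
recombined = np.stack([red_plane, green_plane, blue_plane], axis=-1)
assert np.array_equal(recombined, rgb)
```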

The actual grayscale or color component intensities for each pixel may not
actually be stored explicitly. Often, all that is stored for each pixel is an index into
a colour map in which the actual intensity or colors can be looked up.

Although simple 8-bit integers or vectors of 8-bit integers are the most
common sorts of pixel values used, some image formats support different types of
value, for instance 32-bit signed integers or floating point values. Such values are
extremely useful in image processing as they allow processing to be carried out on
the image where the resulting pixel values are not necessarily 8-bit integers. If this
approach is used, then it is usually necessary to set up a color map which relates
particular ranges of pixel values to particular displayed colors.

Figure. 1.5 Pixels, with a neighborhood

1.3.4 Color Scale

The two main color spaces are RGB and CMYK.

RGB

The RGB color model is an additive color model in which red, green, and
blue light are added together in various ways to reproduce a broad array of colors.
RGB uses additive color mixing and is the basic color model used in television or
any other medium that projects color with light. It is the basic color model used in
computers and for graphics, but it cannot be used for print production.

The secondary colors of RGB – cyan, magenta, and yellow – are formed by
mixing two of the primary colors (red, green or blue) and excluding the third
color. Red and green combine to make yellow, green and blue to make cyan, and
blue and red form magenta. The combination of red, green, and blue in full
intensity makes white.

Figure. 1.6 The additive model of RGB. Red, green, and blue are the primary
stimuli for human color perception and are the primary additive colours.
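A tiny NumPy sketch of the additive mixing just described: adding full-intensity red and green yields yellow, and all three primaries together yield white.

```python
import numpy as np

red = np.array([255, 0, 0], dtype=np.uint16)
green = np.array([0, 255, 0], dtype=np.uint16)
blue = np.array([0, 0, 255], dtype=np.uint16)

yellow = np.clip(red + green, 0, 255)         # [255, 255, 0]
cyan = np.clip(green + blue, 0, 255)          # [0, 255, 255]
magenta = np.clip(blue + red, 0, 255)         # [255, 0, 255]
white = np.clip(red + green + blue, 0, 255)   # [255, 255, 255]

print(yellow, cyan, magenta, white)
```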

To see how different RGB components combine together, here is a selected


repertoire of colors and their respective relative intensities for each of the red,
green, and blue components:

1.3.5 Some applications

Image processing has an enormous range of applications; almost every area


of science and technology can make use of image processing methods. Here is a
short list just to give some indication of the range of image processing
applications.

1. Medicine
 Inspection and interpretation of images obtained from X-rays, MRI or CAT
scans,

 Analysis of cell images, of chromosome karyotypes.

2. Agriculture

 Satellite/aerial views of land, for example to determine how much land is


being used for different purposes, or to investigate the suitability of
different regions for different crops,

 Inspection of fruit and vegetables distinguishing good and fresh produce


from old.

3. Industry

 Automatic inspection of items on a production line,

 Inspection of paper samples.

4. Law enforcement

 Fingerprint analysis,

 Sharpening or de-blurring of speed-camera images.

1.3.6 Aspects of Image Processing

It is convenient to subdivide different image processing algorithms into


broad subclasses. There are different algorithms for different tasks and problems,
and often we would wish to distinguish the nature of the task at hand.

 Image enhancement: This refers to processing an image so that the result


is more suitable for a particular application.

Examples include:

 sharpening or de-blurring an out of focus image,

 highlighting edges,

 improving image contrast, or brightening an image,

 Removing noise.

 Image restoration. This may be considered as reversing the damage done


to an image by a known cause, for example:

 removing of blur caused by linear motion,

 removal of optical distortions,

 Removing periodic interference.

 Image segmentation. This involves subdividing an image into constituent


parts, or isolating certain aspects of an image:

 circles, or particular shapes in an image,

 In an aerial photograph, identifying cars, trees, buildings, or roads.

These classes are not disjoint; a given algorithm may be used for both image enhancement and image restoration. However, we should be able to decide what it is that we are trying to do with our image: simply make it look better (enhancement), or remove damage (restoration). A brief code sketch of some of these operations follows.
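The hedged sketch below illustrates two of the enhancement operations listed above using OpenCV; input.png is a placeholder path and the parameter values are arbitrary choices, not settings used in this work.

```python
import cv2

# "input.png" is a placeholder path; parameters below are illustrative only.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Noise removal (enhancement): smooth with a small Gaussian kernel.
denoised = cv2.GaussianBlur(img, (5, 5), 0)

# Contrast improvement (enhancement): histogram equalization.
equalized = cv2.equalizeHist(img)

# Edge/detail sharpening via unsharp masking.
sharpened = cv2.addWeighted(img, 1.5, cv2.GaussianBlur(img, (0, 0), 3), -0.5, 0)

cv2.imwrite("denoised.png", denoised)
cv2.imwrite("equalized.png", equalized)
cv2.imwrite("sharpened.png", sharpened)
```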

1.3.7 An Image Processing Task

We will look in some detail at a particular real-world task, and see how the
above classes may be used to describe the various stages in performing this task.
The job is to obtain, by an automatic process, the postcodes from envelopes. Here
is how this may be accomplished:

 Acquiring the image: First we need to produce a digital image from a
paper envelope. This can be done using either a CCD camera, or a scanner.

 Preprocessing: This is the step taken before the major image processing
task. The problem here is to perform some basic tasks in order to render the
resulting image more suitable for the job to follow. In this case it may
involve enhancing the contrast, removing noise, or identifying regions likely to contain the postcode.

 Segmentation: Here is where we actually get the postcode; in other words we extract from the image that part of it which contains just the postcode.

 Representation and description: These terms refer to extracting the particular features which allow us to differentiate between objects. Here we will be looking for curves, holes and corners which allow us to distinguish the different digits which constitute a postcode.

Recognition and interpretation: This means assigning labels to objects based on their descriptors (from the previous step), and assigning meanings to those labels. So we identify particular digits, and we interpret a string of four digits at the end of the address as the postcode.
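The stages above can be sketched as a skeleton pipeline; every function body below is a hypothetical placeholder, included only to show how the stages fit together, not a working postcode reader.

```python
import cv2

def acquire(path):
    # Acquisition: read a scanned envelope image (placeholder path).
    return cv2.imread(path, cv2.IMREAD_GRAYSCALE)

def preprocess(img):
    # Preprocessing: reduce noise and improve contrast.
    return cv2.equalizeHist(cv2.medianBlur(img, 3))

def segment(img):
    # Segmentation: threshold the image (stub: a real system would crop
    # only the postcode region).
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary

def describe(region):
    # Representation and description: stand-in shape feature vector.
    return cv2.HuMoments(cv2.moments(region)).flatten()

def recognize(features):
    # Recognition and interpretation: a real system would classify digits here.
    return "?" * 4

postcode = recognize(describe(segment(preprocess(acquire("envelope.png")))))
```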

1.4 EXISTING SYSTEM

In the existing approach, we review related methods for age and gender classification and provide a cursory overview of deep convolutional networks. In age classification, the problem of automatically extracting age-related attributes from facial images has received increasing attention in recent years and many methods have been put forth. We note that despite our focus here on age-group classification rather than precise age estimation (i.e., age regression), the survey below includes methods designed for either task. Early methods for age estimation are based on calculating ratios between different measurements of facial features. Once facial features (e.g. eyes, nose, mouth) are localized and their sizes and distances measured, ratios between them are calculated and used for classifying the face into different age categories according to hand-crafted rules. For gender classification, detailed surveys of existing methods are available; here we quickly survey relevant methods. One of the early methods for gender classification used a neural network trained on a small set of near-frontal face images. One of the first applications of convolutional neural networks (CNN) is perhaps the LeNet-5 network described for optical character recognition. Compared to modern deep CNNs, that network was relatively modest due to the limited computational resources of the time and the algorithmic challenges of training bigger networks. Deep CNNs have additionally been successfully applied to applications including human pose estimation, face parsing, facial keypoint detection, speech recognition and action classification.
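To make the ratio-based idea concrete, here is a small hedged sketch that computes one such ratio from hypothetical landmark coordinates; the landmark names, values, ratio and threshold are illustrative only and are not the rules used by any particular method.

```python
import math

# Hypothetical facial landmarks (x, y) in pixels.
landmarks = {
    "left_eye": (120, 140),
    "right_eye": (200, 140),
    "nose_tip": (160, 200),
    "chin": (160, 300),
}

def dist(a, b):
    return math.dist(landmarks[a], landmarks[b])

# Example hand-crafted ratio: eye spacing relative to lower-face height.
ratio = dist("left_eye", "right_eye") / dist("nose_tip", "chin")

# Illustrative rule only: infants tend to have relatively larger eye spacing.
age_group = "child" if ratio > 0.9 else "adult"
print(round(ratio, 2), age_group)
```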
Disadvantages:
 The computational complexity of feature extraction is high.
 Lower accuracy.

CHAPTER 2

LITERATURE REVIEW

1. Title: Age and Gender Classification using Convolutional Neural


Networks(2015)

Author: Gil Levi and Tal Hassner

Age and gender play fundamental roles in social inter-actions. Languages


reserve different salutations and gram-mar rules for men or women, and very often
different vocabularies are used when addressing elders compared to young people.
Despite the basic roles these attributes play in our day-to-day lives, the ability to
automatically estimate them accurately and reliably from face images is still far from
meeting the needs of commercial applications. This is particularly perplexing when
considering recent claims tosuper-human capabilities in the related task of face
recognition.

Past approaches to estimating or classifying these attributes from face images have relied on differences in facial feature dimensions or “tailored” face descriptors. Most have employed classification schemes designed particularly for age or gender estimation tasks. Few of these past methods were designed to
handle the many challenges of unconstrained imaging conditions. Moreover, the
machine learning methods employed by these systems did not fully exploit the
massive numbers of image examples and data available through the Internet in order
to improve classification capabilities.

We attempt to close the gap between automatic face recognition capabilities and those of age and gender estimation methods. To this end, we follow the successful example laid down by recent face recognition systems: face recognition techniques described in the last few years have shown that tremendous progress can be made by the use of deep convolutional neural networks (CNN). We demonstrate similar gains with a simple network architecture, designed by considering the rather limited availability of accurate age and gender labels in existing face data sets.

We test our network on the newly released Adience benchmark for age and gender classification of unfiltered face images. We show that despite the very challenging nature of the images in the Adience set and the simplicity of our network design, our method outperforms the existing state of the art by substantial margins. Although these results provide a remarkable baseline for deep-learning-based approaches, they leave room for improvements by more elaborate system designs, suggesting that the problem of accurately estimating age and gender in unconstrained settings, as reflected by the Adience images, remains unsolved. In order to provide a foothold for the development of more effective future methods, we make our trained models and classification system publicly available.

All of these methods have proven effective on small and/or constrained benchmarks for age estimation. To our knowledge, the best performing methods were demonstrated on the Group Photos benchmark, where state-of-the-art performance was obtained by employing LBP descriptor variations and a dropout-SVM classifier. We show our proposed method to outperform the results they report on the more challenging Adience benchmark, designed for the same task.

Gathering a large, labeled image training set for age and gender estimation
from social image repositories requires either access to personal information on the
subjects appearing in the images (their birth date and gender), which is often private,
or is tedious and time-consuming to manually label.

2. Title: Imagenet Classification with Deep Convolutional Neural Networks


(2012)

Author: A. Krizhevsky, I. Sutskever, and G. E. Hinton

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
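For reference, a hedged sketch of inspecting an AlexNet-style network via torchvision (assuming torchvision is installed); the parameter count printed should be close to the roughly 60 million mentioned above.

```python
import torch
from torchvision import models

# Build the AlexNet architecture (randomly initialised weights here).
net = models.alexnet()

total_params = sum(p.numel() for p in net.parameters())
print(f"parameters: {total_params / 1e6:.1f}M")  # roughly 61M

# Forward a dummy batch through the 5 conv + 3 fully-connected layers.
logits = net(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000]) -> 1000-way classification
```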

Current approaches to object recognition make essential use of machine learning methods. To improve their performance, we can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small, on the order of tens of thousands of images (e.g., NORB, Caltech-101/256, and CIFAR-10/100). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the current best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al.), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include LabelMe, which consists of hundreds of thousands of fully-segmented images, and ImageNet, which consists of over 15 million labeled high-resolution images in over 22,000 categories.

To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet, so our model should also have lots of prior knowledge to compensate for all the data we don't have. Convolutional neural networks (CNNs) constitute one such class of models. Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feed-forward neural networks with similarly sized layers, CNNs have much fewer connections and parameters and so they are easier to train, while their theoretically best performance is likely to be only slightly worse.

Despite the attractive qualities of CNNs, and despite the relative efficiency of their local architecture, they have still been prohibitively expensive to apply at large scale to high-resolution images. Luckily, current GPUs, paired with a highly optimized implementation of 2D convolution, are powerful enough to facilitate the training of interestingly large CNNs, and recent datasets such as ImageNet contain enough labeled examples to train such models without severe overfitting.

3. Title: Method for performing image based regression using boosting(2010)

Author: Zhou, S. K., Georgescu, B., Zhou, X., &Comaniciu, D.

We present a general algorithm for image-based regression that is applicable to many vision problems. The proposed regressor, which targets a multiple-output setting, is learned using a boosting method. We formulate the multiple-output regression problem in such a way that overfitting is decreased and an analytic solution is admitted. Because we represent the image via a set of highly redundant Haar-like features that can be evaluated very quickly, and select relevant features through boosting to absorb the knowledge of the training data, during testing we require no storage of the training data and evaluate the regression function almost instantly. We also propose an efficient training algorithm that breaks the computational bottleneck in the greedy feature selection process. We validate the efficiency of the proposed regressor on three challenging tasks (age estimation, tumor detection, and endocardial wall localization) and achieve the best performance with a dramatic speed-up, e.g., more than 1000 times faster than conventional data-driven techniques such as support vector regression in the endocardial wall localization experiment.

The problem of IBR is defined as follows: given an image x, we are interested in inferring an entity y(x) that is associated with the image x. The meaning of y(x) varies widely across applications. For example, it could be a feature characterizing the image (e.g., the human age in problem A), a parameter related to the image (e.g., the position and anisotropic spread of the tumor in problem B), or another meaningful quantity (e.g., the location of the endocardial wall in problem C).

IBR is an emerging challenge in the vision literature. In the article of Wang et al., support vector regression was employed to infer a shape deformation parameter. In a recent work, Agarwal and Triggs used relevance vector regression to estimate a 3D human pose from silhouettes. However, in the above two works, the inputs to the regressors are not images themselves, but rather pre-processed entities, e.g., landmark locations and shape context descriptors.

Numerous algorithms have been proposed in the machine learning literature to attack the regression problem in general, and data-driven approaches have gained prevalence. Popular data-driven regression approaches include non-parametric kernel regression (NPR), linear methods and their nonlinear kernel variants such as kernel ridge regression (KRR), and support vector regression (SVR); we briefly review them in Section 2. However, it is often difficult or inefficient to directly apply them to vision applications due to the following challenges.

We formulate the multiple-output regression problem in such a way that an analytic solution is allowed at each round of boosting. No decoupling of the output dimensions is performed. Also, we decrease overfitting using an image-based regularization term that has an interpretation as prior knowledge and also allows an analytic solution. We invoke the boosting framework to perform feature selection such that only relevant local features are preserved to conquer the variations in appearance. The use of decision stumps as weak learners also makes the method robust to appearance change. We use Haar-like simple features that can be rapidly computed. We do not store the training data: the knowledge of the training data is absorbed in the weighting coefficients and the selected feature set. Hence, we are able to evaluate the regression function almost instantly during testing. We propose an efficient implementation of boosting training, which is usually a time-consuming process if a truly greedy feature selection procedure is used; in our implementation, we select the features incrementally over the dimensions of the output variable.

4. Title: Performance and Scalability of GPU-Based Convolutional Neural


Networks (2010)

Author: D. Strigl, K. Kofler, and S. Podlipnig

We present the implementation of a framework for accelerating training and classification of arbitrary Convolutional Neural Networks (CNNs) on the GPU. CNNs are a derivative of standard Multilayer Perceptron (MLP) neural networks optimized for two-dimensional pattern recognition problems such as Optical Character Recognition (OCR) or face detection. We describe the basic parts of a CNN and demonstrate the performance and scalability improvement that can be achieved by shifting the computation-intensive tasks of a CNN to the GPU. Depending on the network topology, training and classification on the GPU perform 2 to 24 times faster than on the CPU. Furthermore, the GPU version scales much better than the CPU implementation with respect to the network size.

The biggest drawback of CNNs, besides a complex implementation, is the long training time. Since CNN training is very compute- and data-intensive, training with large data sets may take several days or weeks. The huge number of floating point operations and relatively low data transfer in every training step makes this task well suited for GPGPU (General Purpose GPU) computation on current Graphics Processing Units (GPUs). The main advantage of GPUs over CPUs is the high computational throughput at relatively low cost, achieved through their massively parallel architecture.

In contrast to other classifiers such as Support Vector Machines (SVMs), for which several parallel implementations for CPUs and GPUs exist, similar efforts for CNNs have been missing. Therefore, we implemented a high-performance library in CUDA to perform fast training and classification of CNNs on the GPU. Our goal was to demonstrate the resulting performance.

The traditional approach for two-dimensional pattern recognition is based on a


feature extractor, the output of which is fed into a neural network. This feature
extractor is usually static, independent of the neural network and not part of the
training procedure. It is not an easy task to find a “good” feature extractor because it
is not part of the training procedure and therefore it can neither adapt to the network

topology nor to the parameters generated by the training procedure of the neural
network.

The convolutional layers are the core of any CNN. A convolutional layer consists of several two-dimensional planes of neurons, the so-called feature maps. Each neuron of a feature map is connected to a small subset of neurons inside the feature maps of the previous layer, the so-called receptive fields. The receptive fields of neighboring neurons overlap, and the weights of these receptive fields are shared through all the neurons of the same feature map. The feature maps of a convolutional layer and its preceding layer are either fully or partially connected (either in a predefined way or in a randomized manner).

The relatively low amount of data to transfer to the GPU for every pattern and the big matrices that have to be handled inside the network make this problem appropriate for GPGPU processing. Furthermore, our experiments showed that the GPU implementation scales much better than the CPU implementation with increasing network size.

5. Title: Multi-column Deep Neural Networks for Image Classification(2012)

Author: D. Cireşan, U. Meier, and J. Schmidhuber

Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs pre-processed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state of the art on a plethora of common image classification benchmarks.

Recent publications suggest that unsupervised pre-training of deep, hierarchical neural networks improves supervised pattern classification. Here we train such nets by simple online back-propagation, setting new, greatly improved records on the MNIST, Latin letters, Chinese characters, traffic signs, NORB (jittered, cluttered) and CIFAR10 benchmarks.

We focus on deep convolutional neural networks (DNN), which have been successively introduced, improved, refined and simplified. Lately, DNNs have proved their mettle on datasets ranging from handwritten digits (MNIST) and handwritten characters to 3D toys (NORB) and faces. DNNs fully unfold their potential when they are wide (many maps per layer) and deep (many layers). But training them requires weeks, months, even years on CPUs. High data transfer latency prevents multi-threading and multi-CPU code from saving the situation.

Carefully designed GPU code for image classification can be up to two orders of magnitude faster than its CPU counterpart. Hence, to train huge DNNs in hours or days, we implement them on GPU, building upon previous work. The training algorithm is fully online, i.e. weight updates occur after each error back-propagation step. We will show that properly trained wide and deep DNNs can outperform all previous methods, and demonstrate that unsupervised initialization/pretraining is not necessary (although we don't deny that it might help sometimes, especially for datasets with few samples per class). We also show how combining several DNN columns into a Multi-column DNN (MCDNN) further decreases the error rate by 30-40%.

We evaluate our architecture on various commonly used object recognition benchmarks and improve the state of the art on all of them. The description of the DNN architecture used for the various experiments is given in the following way: 2x48x48-100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N represents a net with 2 input images of size 48x48, a convolutional layer with 100 maps and 5x5 filters, a max-pooling layer over non-overlapping regions of size 2x2, a second convolutional layer with 100 maps and 5x5 filters, a max-pooling layer of size 2x2, a convolutional layer with 100 maps and 4x4 filters, a max-pooling layer of size 2x2, a fully connected layer with 300 hidden units, a fully connected layer with 100 hidden units and a fully connected output layer with 6 neurons (one per class).
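Purely as an illustration of how such an architecture string maps onto code, a hedged Keras sketch of the 2x48x48-100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N net might look as follows; the channels-last ordering and the tanh/softmax activation choices are assumptions, not taken from the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 2x48x48 input: two 48x48 image planes, here in channels-last ordering.
model = keras.Sequential([
    keras.Input(shape=(48, 48, 2)),
    layers.Conv2D(100, 5, activation="tanh"),   # 100C5
    layers.MaxPooling2D(2),                     # MP2
    layers.Conv2D(100, 5, activation="tanh"),   # 100C5
    layers.MaxPooling2D(2),                     # MP2
    layers.Conv2D(100, 4, activation="tanh"),   # 100C4
    layers.MaxPooling2D(2),                     # MP2
    layers.Flatten(),
    layers.Dense(300, activation="tanh"),       # 300N
    layers.Dense(100, activation="tanh"),       # 100N
    layers.Dense(6, activation="softmax"),      # 6N, one neuron per class
])
model.summary()
```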

CHAPTER 3

METHODOLOGY

3.1 PROPOSED SYSTEM

In the proposed system, the same network architecture is used throughout our experiments for both age and gender classification. Prediction running times can conceivably be improved substantially by running the network on image batches. We test the accuracy of our CNN design (VGG16 in Phase I and ResNet in Phase II) using the recently released dataset designed for age and gender classification. We emphasize that the same network architecture is used for all test folds of the benchmark and, in fact, for both gender and age classification tasks. This is done in order to ensure the validity of our results across folds, but also to demonstrate the generality of the network design proposed here; the same architecture performs well across different, related problems. For age classification, we measure and compare both the accuracy when the algorithm gives the exact age-group classification and when the algorithm is off by one adjacent age group (i.e., the subject belongs to the group immediately older or immediately younger than the predicted group). This follows others who have done so in the past, and reflects the uncertainty inherent to the task: facial features often change very little between the oldest faces in one age class and the youngest faces of the subsequent class. Previous work used the same gender classification pipeline applied to more effectively aligned faces; faces in their tests were synthetically modified to appear facing forward.
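A minimal sketch of how a ResNet backbone can be adapted for the two classification tasks described above, using Keras; the number of age groups (8), the input size and the training settings are assumptions for illustration, not the exact configuration used in this work.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(num_classes, input_shape=(224, 224, 3)):
    """ResNet50 backbone with a small classification head on top."""
    base = keras.applications.ResNet50(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(base.input, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# The same architecture is reused for both tasks; only the head size changes.
gender_model = build_classifier(num_classes=2)   # male / female
age_model = build_classifier(num_classes=8)      # e.g. 8 age groups (assumed)

# gender_model.fit(train_images, train_labels, epochs=10, batch_size=32)
```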

Advantages:

 The background will be extracted separately.



 The classification/recognition accuracy will be higher.

3.2 SYSTEM ARCHITECTURE

[System architecture diagram: Face Image → Face Region Detection → Classification using ResNet → Age and Gender Prediction → Performance Analysis]

Figure 3.1 System Architecture
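The first stage of this pipeline, face region detection, can be sketched with OpenCV's bundled Haar cascade detector; this is an illustrative stand-in, not necessarily the detector used in this work, and input_face.jpg is a placeholder filename.

```python
import cv2

# Haar cascade shipped with OpenCV; any frontal-face detector could be used.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("input_face.jpg")               # placeholder filename
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns a list of (x, y, w, h) face rectangles.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face_region = image[y:y + h, x:x + w]          # crop fed to the classifier
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detected_faces.jpg", image)
```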



3.3 FLOW DIAGRAM

Figure. 3.2 Flow Diagram



CHAPTER 4

TESTING OF PRODUCT

4.1 TESTING OF PRODUCT

System testing is the stage of implementation which is aimed at ensuring that the system works accurately and efficiently before live operation commences. Testing is the process of executing a program with the intent of finding an error. A good test case is one that has a high probability of finding an error. A successful test is one that uncovers an as-yet-undiscovered error.

Testing is vital to the success of the system. System testing makes a logical assumption that if all parts of the system are correct, the goal will be successfully achieved. The candidate system is subjected to a variety of tests: on-line response, volume, stress, recovery, security and usability tests. A series of tests are performed before the system is ready for user acceptance testing. Any engineered product can be tested in one of the following ways. Knowing the specified function that a product has been designed to perform, tests can be conducted to demonstrate that each function is fully operational. Knowing the internal workings of a product, tests can be conducted to ensure that “all gears mesh”, that is, the internal operation of the product performs according to the specification and all internal components have been adequately exercised.

4.2 UNIT TESTING

Unit testing tests each module, after which the integration of the overall system is done. Unit testing focuses verification efforts on the smallest unit of software design, the module. This is also known as ‘module testing’. The modules of the system are tested separately. This testing is carried out during programming itself. In this testing step, each module is found to be working satisfactorily with regard to the expected output from the module. There are some

validation checks for the fields. For example, the validation check is done for
verifying the data given by the user where both format and validity of the data
entered is included. It is very easy to find error and debug the system.
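As a hedged illustration of the field-level validation checks mentioned above, a unit test written with Python's unittest module might look like this; validate_age_group and its accepted labels are hypothetical helpers, not part of the actual application.

```python
import unittest

# Hypothetical validation helper used by the application.
VALID_AGE_GROUPS = {"0-2", "4-6", "8-13", "15-20", "25-32", "38-43", "48-53", "60+"}

def validate_age_group(value):
    """Return True only for a correctly formatted, known age-group label."""
    return isinstance(value, str) and value in VALID_AGE_GROUPS

class TestAgeGroupValidation(unittest.TestCase):
    def test_valid_label(self):
        self.assertTrue(validate_age_group("25-32"))

    def test_invalid_format(self):
        self.assertFalse(validate_age_group("twenty"))
        self.assertFalse(validate_age_group(25))

if __name__ == "__main__":
    unittest.main()
```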

4.3 INTEGRATION TESTING

Data can be lost across an interface; one module can have an adverse effect on another; sub-functions, when combined, may not produce the desired major function. Integration testing is systematic testing that can be done with sample data. The need for integration testing is to find the overall system performance. There are two types of integration testing. They are:

i) Top-down integration testing.


ii) Bottom-up integration testing.

4.3.1 White Box Testing

White box testing is a test-case design method that uses the control structure of the procedural design to derive test cases. Using white box testing methods, we derive test cases that guarantee that all independent paths within a module have been exercised at least once.

4.3.2 Black Box Testing

Black box testing is done to find errors in the following categories:

Incorrect or missing functions
Interface errors
Errors in external database access
Performance errors
Initialization and termination errors

‘Functional testing’ is performed to validate that an application conforms to its specifications and correctly performs all its required functions, so this testing is also called ‘black box testing’. It tests the external behavior of the system. Here the
engineered product can be tested knowing the specified function that a product has
been designed to perform, tests can be conducted to demonstrate that each function is
fully operational.

4.4 VALIDATION TESTING

At the culmination of black box testing, the software is completely assembled as a package, interfacing errors have been uncovered and corrected, and a final series of software tests, validation testing, begins. Validation can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that can be reasonably expected by the customer.

4.5 USER ACCEPTANCE TESTING

User acceptance of the system is the key factor for the success of the system. The system under consideration is tested for user acceptance by constantly keeping in touch with prospective system users during development and making changes wherever required.

4.6 OUTPUT TESTING

After performing validation testing, the next step is output testing of the proposed system, since no system can be useful if it does not produce the required output in the specified format. Asking the users about the format they require tests the output displayed or generated by the system under consideration. Here the output format is considered in two ways: one is the screen format and the other is the printed format. The output format on the screen is found to be correct, as the format was designed in the system phase according to the user's needs. For the hard copy also, the output comes out as per the requirements specified by the user. Hence output testing does not result in any correction in the system.

4.6.1 System Implementation

Implementation of software refers to the final installation of the package in its


real environment, to the satisfaction of the intended users and the operation of the
system. Initially, people may not be sure that the software is meant to make their job easier, so:

 The active user must be aware of the benefits of using the system
 Their confidence in the software must be built up
 Proper guidance must be imparted to the user so that he is comfortable in using the application

Before going ahead and viewing the system, the user must know that for
viewing the result, the server program should be running in the server. If the server
object is not running on the server, the actual processes will not take place.

4.6.2 User Training

To achieve the objectives and benefits expected from the proposed system, it is essential for the people who will be involved to be confident of their role in the new system. As the system becomes more complex, the need for education and training becomes more and more important.

Education is complementary to training. It brings life to formal training by explaining the background and the resources available to the users. Education involves creating the right atmosphere and motivating the user staff. Educational information can make training more interesting and more understandable.

4.6.3 Training on the Application Software

After providing the necessary basic training on computer awareness, the users will have to be trained on the new application software. This training covers the underlying philosophy of the use of the new system, such as the screen flow, screen design, the type of help available on the screen, the types of errors that may occur while entering data, the corresponding validation checks at each entry, and the ways to correct the data entered. This training may be different across different user groups and across different levels of hierarchy.

4.6.4 Operational Documentation

Once the implementation plan is decided, it is essential that the user of the system is made familiar and comfortable with the environment. Documentation covering the whole operation of the system has been developed. Useful tips and guidance are given inside the application itself. The system is developed to be user friendly so that the user can operate it from the tips given in the application itself.

4.7 SYSTEM MAINTENANCE

The maintenance phase of the software life cycle is the time in which the software performs useful work. After a system is successfully implemented, it should be maintained in a proper manner. System maintenance is an important aspect of the software development life cycle. The need for system maintenance is to make the system adaptable to changes in its environment. There may be social, technical and other environmental changes which affect a system that has been implemented. Software product enhancements may involve providing new functional capabilities, improving user displays and modes of interaction, or upgrading the performance characteristics of the system. Only through proper system maintenance procedures can the system be adapted to cope with these changes. Software maintenance is, of course, far more than “finding mistakes”.

4.7.1 Corrective Maintenance

The first maintenance activity occurs because it is unreasonable to assume that software testing will uncover all latent errors in a large software system. During the use of any large program, errors will occur and be reported to the developer. The process that includes the diagnosis and correction of one or more errors is called corrective maintenance.

4.7.2 Adaptive Maintenance

The second activity that contributes to a definition of maintenance occurs because of the rapid change that is encountered in every aspect of computing. Therefore, adaptive maintenance, termed as an activity that modifies software to properly interface with a changing environment, is both necessary and commonplace.

4.7.3 Perfective Maintenance

The third activity that may be applied to a definition of maintenance occurs when a software package is successful. As the software is used, recommendations for new capabilities, modifications to existing functions, and general enhancements are received from users. To satisfy requests in this category, perfective maintenance is performed. This activity accounts for the majority of all effort expended on software maintenance.

4.7.4 Preventive Maintenance

The fourth maintenance activity occurs when software is changed to improve future maintainability or reliability, or to provide a better basis for future enhancements. Often called preventive maintenance, this activity is characterized by reverse engineering and re-engineering techniques.

CHAPTER 5

MODULES DESCRIPTION

MODULES:

 Input Image
 Face Region Detection
 Key-point Features
 Classification
 Performance Analysis

5.1 Input Image

The first stage of any vision system is the image acquisition stage. Image acquisition is the digitization and storage of an image. After the image has been obtained, various methods of processing can be applied to it to perform the many different vision tasks required today. First, the input image is captured from a source file using the uigetfile and imread functions. However, if the image has not been acquired satisfactorily, then the intended tasks may not be achievable, even with the aid of some form of image enhancement.
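uigetfile and imread are MATLAB utilities; since the implementation platform described in Chapter 6 is Python, a minimal equivalent acquisition sketch using OpenCV might look as follows (the file path is illustrative):

import cv2

# Read the input image from a source file (illustrative path)
image = cv2.imread("input/sample_face.jpg")
if image is None:
    raise FileNotFoundError("Image could not be loaded - check the path")

# Basic properties of the acquired image
height, width, channels = image.shape
print(f"Loaded image of size {width}x{height} with {channels} channels")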

5.2 Face Region Detection

Face detection is a challenging task, and several approaches have been proposed for it. Some approaches are only good for one face per image, while others can detect multiple faces from an image with a greater price to pay in terms of training. This work presents an approach that can be used for single or multiple face detection from simple or cluttered scenes. Faces of different sizes located in any part of an image can be detected using the Viola-Jones approach. The Viola-Jones algorithm is a widely used mechanism for object detection. The main property of this algorithm is that training is slow, but detection is fast. The algorithm uses Haar basis feature filters, so it does not use multiplications. The efficiency of the Viola-Jones algorithm can be significantly increased by first generating the integral image.

Detection happens inside a detection window. A minimum and a maximum window size are chosen, and for each size a sliding step size is chosen. Then the detection window is moved across the image as follows (a code sketch using OpenCV is given after the steps below):

1. Set the minimum window size, and the sliding step corresponding to that size.
2. For the chosen window size, slide the window vertically and horizontally with the same step. At each step, a set of N face recognition filters is applied. If one filter gives a positive answer, a face is detected in the current window.
3. If the size of the window is the maximum size, stop the procedure. Otherwise increase the size of the window and the corresponding sliding step to the next chosen size and go to step 2.
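A minimal sketch of this Viola-Jones detection in Python, assuming OpenCV and its bundled frontal-face Haar cascade are available (the image path is illustrative):

import cv2

# Load the pre-trained Haar cascade shipped with OpenCV (Viola-Jones detector)
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read the query image and convert it to grayscale for detection
image = cv2.imread("query.jpg")            # illustrative path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Multi-scale sliding-window detection
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))

# Mark every detected face region
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

detectMultiScale internally performs the scale and sliding-step loop described in the steps above, so the calling code only supplies the scale factor and the minimum window size.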

5.3 Key-point Features

Image features are small patches that are useful for computing similarities between images. An image feature is usually composed of a feature keypoint and a feature descriptor. The keypoint usually contains the 2D position of the patch and, if available, other attributes such as the scale and orientation of the image feature. The descriptor contains the visual description of the patch and is used to compare the similarity between image features.
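As a minimal sketch of extracting keypoints and descriptors and comparing two images, assuming OpenCV's ORB detector (the file paths are illustrative; SIFT or another detector could be used instead):

import cv2

# Detect keypoints and compute descriptors for two face images
orb = cv2.ORB_create(nfeatures=500)
img1 = cv2.imread("face1.jpg", cv2.IMREAD_GRAYSCALE)   # illustrative paths
img2 = cv2.imread("face2.jpg", cv2.IMREAD_GRAYSCALE)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors to measure how similar the two images are
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print("number of keypoint matches:", len(matches))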

5.4 Classification

VGG-16 is a convolutional neural network that is 16 layers deep. You can load a pretrained version of the network trained on more than a million images from the ImageNet database. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. You can use classify to classify new images using the VGG-16 network: follow the steps of Classify Image Using GoogLeNet and replace GoogLeNet with VGG-16. To retrain the network on a new classification task, follow the steps of Train Deep Learning Network to Classify New Images and load VGG-16 instead of GoogLeNet.

net = vgg16 returns a VGG-16 network trained on the ImageNet data set. This function requires the Deep Learning Toolbox™ Model for VGG-16 Network support package. If this support package is not installed, then the function provides a download link.

net = vgg16('Weights','imagenet') returns a VGG-16 network trained on the ImageNet data set. This syntax is equivalent to net = vgg16.

layers = vgg16('Weights','none') returns the untrained VGG-16 network architecture. The untrained model does not require the support package.
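The vgg16 calls above are MATLAB Deep Learning Toolbox syntax; since the implementation platform here is Python, a roughly equivalent sketch using Keras is shown below (the classification head and the number of output classes are illustrative, not the exact configuration used):

from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

# VGG-16 convolutional base pre-trained on ImageNet (224x224x3 input)
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pre-trained features fixed

num_classes = 2  # illustrative: gender (male / female)
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])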

ResNet

Deeper neural networks are more difficult to train. ResNet introduces a residual learning framework to ease the training of networks that are substantially deeper than those used previously. The layers are explicitly reformulated as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. Comprehensive empirical evidence shows that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. On the ImageNet dataset, residual nets with a depth of up to 152 layers have been evaluated (8x deeper than VGG nets) while still having lower complexity.
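A minimal sketch of the residual idea in Keras (the layer sizes are illustrative and do not reproduce the exact ResNet configuration used in Phase II):

from tensorflow.keras import layers, Input, Model

def residual_block(x, filters):
    # Basic residual block: the stacked layers learn a residual F(x),
    # and the block outputs F(x) + x via an identity shortcut connection.
    # Assumes the input already has `filters` channels so the shapes match.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)

# Tiny demonstration model built from two stacked residual blocks
inputs = Input(shape=(56, 56, 64))
x = residual_block(inputs, 64)
x = residual_block(x, 64)
model = Model(inputs, x)

In practice a pretrained backbone such as tensorflow.keras.applications.ResNet50 can be loaded in the same way as VGG-16 above and fine-tuned for the age and gender classes.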

5.5 Performance Analysis

Accuracy: It is the closeness of the measurements to a specific value.

Precision: It is a description of random errors, a measure of statistical variability.

Recall: It is the fraction of the total amount of relevant instances that were actually retrieved.

Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function (a computational sketch is given after the list below):

 Sensitivity (also called the true positive rate, the recall, or probability of
detection in some fields) measures the proportion of positives that are
correctly identified as such (e.g., the percentage of sick people who are
correctly identified as having the condition).
 Specificity (also called the true negative rate) measures the proportion of
negatives that are correctly identified as such (e.g., the percentage of
healthy people who are correctly identified as not having the condition).
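As a minimal sketch of computing these measures from predicted and true labels (the label arrays are illustrative), assuming scikit-learn is available:

from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # illustrative ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # illustrative predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = accuracy_score(y_true, y_pred)        # (TP + TN) / total
precision = precision_score(y_true, y_pred)      # TP / (TP + FP)
recall = recall_score(y_true, y_pred)            # sensitivity, TP / (TP + FN)
specificity = tn / (tn + fp)                     # true negative rate

print(accuracy, precision, recall, specificity)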

CHAPTER 6

IMPLEMENTATION OF SYSTEM

6.1 SYSTEM REQUIREMENTS

6.1.1 Software Requirements

 O/S : Windows 8.1.

 Language : python.

 IDE : Anaconda - Spyder

6.1.2 Hardware Requirements

 System : Pentium IV 2.4 GHz

 Hard Disk : 1000 GB

 Monitor : 15 VGA color

 Mouse : Logitech.

 Keyboard : 110 keys enhanced

 RAM : 4 GB

6.2 SOFTWARE DESCRIPTION

Python

Python is a general-purpose interpreted, interactive, object-oriented, high-level programming language. It was created by Guido van Rossum during 1985-1990. Like Perl, Python source code is also available under the GNU General Public License (GPL). This section gives enough understanding of the Python programming language.

Python is a popular programming language, first released by Guido van Rossum in 1991.

It is used for:

 Web development (server-side),


 software development,
 mathematics,
 System scripting.

Python can be used on a server to create web applications. Python can be used
alongside software to create workflows. Python can connect to database systems. It
can also read and modify files. Python can be used to handle big data and perform
complex mathematics. Python can be used for rapid prototyping, or for production-
ready software development.

Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language. Python has syntax that
allows developers to write programs with fewer lines than some other programming
languages.

Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick. Python can be treated in a procedural way, an object-oriented way or a functional way.

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently, whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.

Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.

Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter directly to write your programs.

Python is Object-Oriented − Python supports an object-oriented style or technique of programming that encapsulates code within objects.

Python is a Beginner's Language − Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.

6.2.1 History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research Institute for Mathematics and Computer Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk, and Unix shell and other scripting languages.

Python is copyrighted. Like Perl, Python source code is now available under the GNU General Public License (GPL).

Python is now maintained by a core development team at the institute, although Guido van Rossum still holds a vital role in directing its progress.

6.2.2 Python Features

Python's features include

 Easy-to-learn − Python has few keywords, a simple structure, and a clearly defined syntax. This allows the student to pick up the language quickly.
 Easy-to-read − Python code is more clearly defined and visible to the eyes.

 Easy-to-maintain − Python's source code is fairly easy to maintain.
 A broad standard library − the bulk of Python's library is very portable and cross-platform compatible with UNIX, Windows, and Macintosh.
 Interactive Mode − Python has support for an interactive mode which allows
interactive testing and debugging of snippets of code.
 Portable − Python can run on a wide variety of hardware platforms and has the
same interface on all platforms.
 Extendable − you can add low-level modules to the Python interpreter. These
modules enable programmers to add to or customize their tools to be more
efficient.
 Databases − Python provides interfaces to all major commercial databases.
 GUI Programming − Python supports GUI applications that can be created
and ported to many system calls, libraries and windows systems, such as
Windows MFC, Macintosh, and the X Window system of Unix.
 Scalable − Python provides a better structure and support for large programs
than shell scripting.

Apart from the above-mentioned features, Python has a big list of good features; a few are listed below −

 It supports functional and structured programming methods as well as OOP.
 It can be used as a scripting language or can be compiled to byte-code for
building large applications.
 It provides very high-level dynamic data types and supports dynamic type
checking.
 It supports automatic garbage collection.
 It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
 Python is available on a wide variety of platforms including Linux and Mac
OS X.
 Python syntax compared to other programming languages: Python was designed for readability, and has some similarities to the English language with influence from mathematics.
 Python uses new lines to complete a command, as opposed to other programming languages, which often use semicolons or parentheses.
 Python relies on indentation, using whitespace, to define scope, such as the scope of loops, functions and classes; other programming languages often use curly brackets for this purpose (a small example follows this list).
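A small illustrative snippet of the last point, where indentation alone defines the scope of the function, the loop and the condition:

# Indentation (whitespace) defines scope in Python: the bodies of the
# function, the loop and the if-statement are delimited by indentation,
# not by braces.
def count_even(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += 1
    return total

print(count_even([1, 2, 3, 4, 5, 6]))   # prints 3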

6.3 ANACONDA

Anaconda is the most popular Python data science platform.

6.3.1 Anaconda Distribution

With over 6 million users, the open source Anaconda Distribution is the
fastest and easiest way to do Python and R data science and machine learning on
Linux, Windows, and Mac OS X. It's the industry standard for developing, testing,
and training on a single machine.

6.3.2 Anaconda Enterprise

Anaconda Enterprise is an AI/ML enablement platform that empowers organizations to develop, govern, and automate AI/ML and data science from laptop through training to production. It lets organizations scale from individual data scientists to collaborative teams of thousands, and to go from a single server to thousands of nodes for model training and deployment.

Figure 6.1 Anaconda Enterprise

6.3.3 Anaconda Data Science Libraries

 Over 1,400 Anaconda-curated and community data science packages


 Develop data science projects using your favorite IDEs, including Jupyter,
JupyterLab, Spyder, and RStudio
 Analyse data with scalability and performance with Dask, numpy, pandas, and
Numba
 Visualize your data with Matplotlib, Bokeh, Datashader, and Holoviews
 Create machine learning and deep learning models with Scikit-learn, TensorFlow, H2O, and Theano

Figure 6.2 Data Science Libraries

6.4 THE DATA SCIENCE PACKAGE & ENVIRONMENT MANAGER

 Automatically manages all packages, including cross-language dependencies


 Works across all platforms: Linux, macOS, Windows
 Create virtual environments
 Download conda packages from Anaconda, Anaconda Enterprise, Conda
Forge, and Anaconda Cloud.

Figure 6.3 Anaconda Packages



6.5 Anaconda Navigator, the Desktop Portal to Data Science

 Install and launch applications and editors including Jupyter, RStudio, Visual
Studio Code, and Spyder
 Manage your local environments and data science projects from a graphical
interface
 Connect to Anaconda Cloud or Anaconda Enterprise
 Access the latest learning and community resources

Figure 6.4 Anaconda Navigator

6.6 Spyder

Spyder is an open source cross-platform integrated development environment (IDE) for scientific programming in the Python language. Initially created and developed by Pierre Raybaut in 2009, since 2012 Spyder has been maintained and continuously improved by a team of scientific Python developers and the community. The free, open-source Spyder IDE is strongly recommended for scientific and engineering programming, due to its integrated editor, interpreter console, and debugging tools. Spyder is included in Anaconda and other distributions.

Spyder is a powerful scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It offers a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package.

Beyond its many built-in features, its abilities can be extended even further
via its plugin system and API. Furthermore, Spyder can also be used as a PyQt5
extension library, allowing developers to build upon its functionality and embed its
components, such as the interactive console, in their own PyQt software.

Editor

Work efficiently in a multi-language editor with a function/class browser, code analysis tools, automatic code completion, horizontal/vertical splitting, and go-to-definition.

IPython Console

Harness the power of as many IPython consoles as you like within the flexibility of a full GUI interface; run your code by line, cell, or file; and render plots right inline.

Variable Explorer

Interact with and modify variables on the fly: plot a histogram or time series, edit a data frame or NumPy array, sort a collection, dig into nested objects, and more!

Profiler

Find and eliminate bottlenecks to unchain your code's performance.

Debugger

Trace each step of your code's execution interactively.



CHAPTER 7

RESULT AND DISCUSSION

7.1 RESULTS

Query Image Age: 25

Query Image Gender: F

Performance Measure    Value in %
Accuracy               93.20
Sensitivity            99.9
Specificity            76.9
Precision              89.26
Recall                 99.9

The Table above shows the results obtained for a sample query image. These
performance measures along with NPV, FPR, FNR, FDR are shown in the graph.

Figure 7.1 Performance Measures (bar chart of Accuracy, Sensitivity, Specificity, Precision, Recall and Prediction Value, in %)

The graph shows Accuracy, Sensitivity, Specificity, Precision, Recall, Negative Prediction Value, False Positive Rate, and False Negative Rate.

7.2 SCREENSHOTS

Step 1: Select the input image file from the package.

Figure 7.2 Image selected from file

Step 2: Select the face image; the face region is detected from the image.

Figure 7.3 Selected face image



ResNet

Step 3: The layers are trained step by step in the ResNet architecture.

Figure 7.4 ResNet training

Step 4: The ResNet layers are trained and the accuracy result is computed.



Figure 7.5 Waiting for the accuracy result

Step 5: The predicted result is printed with an accuracy value of 93.20%.

Figure 7.6 The face images converted into layers and trained for the accuracy value

CHAPTER 8

CONCLUSION

In this work, the problem of face detection for age and gender classification using deep convolutional neural networks is addressed.

From the results obtained it can be concluded that, first, CNNs can be used to provide improved age and gender classification results, even considering the much smaller size of contemporary unconstrained image sets labeled for age and gender. Second, the simplicity of the model implies that more elaborate systems using more training data may well be capable of substantially improving results beyond those obtained.

For future work, a deeper CNN architecture and a more robust image processing algorithm for exact age estimation can be considered. The apparent feature estimation of the human face will also be an interesting direction to investigate. The network can be modified to add faces in different postures and to improve the accuracy of the system.

