Dog Breed Classifier

Ravi Gopalan
Sep 13 · 12 min read

Dog Breeds — pic source: German Shepherd Rescue Trust New Zealand

Imagine you are on your weekend jog or walk in the park and you see a really cute
dog. There are 266 individual breeds of dog and, if you are like me, you would be
able to identify no more than 10–15 of them.
So, when I was given a choice of a few different projects for the Data Scientist
Nanodegree by Udacity, I chose the ‘Dog Breed Classifier Project’. This is a very
popular project across machine learning and artificial intelligence nanodegree
programs offered by Udacity.

The aim of the project in the Data Scientist nanodegree was to create a web
application that is able to identify a breed of dog if given a photo or image as input.
If the photo or image contains a human face (or alien face), then the application
will return the breed of dog that most resembles this person.

The project uses Convolutional Neural Networks (CNNs). A pipeline is built to
process real-world, user-supplied images. Given an image of a dog, the algorithm will
produce an estimate of the canine’s breed. If supplied an image of a human, the code
will identify the dog breed that the person most resembles.

The steps that were followed to work through the project were the following:

Step 0: Import Datasets

Step 1: Detect Humans

Step 2: Detect Dogs

Step 3: Create a CNN to classify Dog Breeds (from scratch)

Step 4: Use a CNN to classify Dog Breeds (using Transfer Learning)

Step 5: Create a CNN to classify Dog Breeds (using Transfer Learning)

Step 6: Write an algorithm to provide an output breed based on an image

Step 7: Test the algorithm

In this project I have experimented with both Keras and Fast.AI to build the
Convolutional Neural Network (CNN) to make the dog predictions.

I have set myself a target test accuracy for the CNN of 90% i.e., the model identifies
the dog breed 9 times out of 10 correctly. We will be using the accuracy metric on
the testing dataset to measure the performance of our models.
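
As a minimal sketch of the accuracy metric described above: it is the share of test
images whose predicted breed matches the true breed. The function name and the toy
labels are illustrative and not taken from the notebook.

```python
def accuracy_percent(predicted, actual):
    """Return the percentage of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Toy example: 9 of 10 breed indices are predicted correctly.
true_labels = [3, 7, 7, 12, 40, 40, 56, 90, 101, 132]
predictions = [3, 7, 7, 12, 40, 41, 56, 90, 101, 132]
score = accuracy_percent(predictions, true_labels)  # 90.0
```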

To follow along with the steps you can download or clone the notebook from my
GitHub repository. The repository features the ‘dog_breed_classifier.ipynb’ notebook,
which runs on the GPU provided for free at Google Colab.

Step 0: Import Datasets

The datasets were provided by Udacity.

Dog Images: The dog images provided are available in the repository within
the Images directory, further organized into train, valid and test subfolders.

Human Faces: An exhaustive dataset of faces of celebrities has also been
added to the repository in the lfw folder.

Haarcascades: An ML-based approach where a cascade function is trained from a
lot of positive and negative images and then used to detect objects in other images.
The algorithm uses the Haar frontal face cascade to detect humans, so it expects
an image in which the frontal facial features are clearly visible.

Test Images: A folder with a few test images has been added to check the
effectiveness of the algorithm.

Pre-computed features for networks currently available in Keras (i.e. VGG19,
InceptionV3 and Xception) will be made available from S3.
any other downloads to ensure smooth running of the notebook are available in
Load all the libraries and packages required through the notebook.
The libraries required can be categorized as follows:

Utility libraries: random (for random seeding), timeit (to calculate execution
time), os, pathlib, glob (for folder and path operations), tqdm (for execution
progress), sklearn (for loading datasets), requests and io (load files from the

Keras and Fastai for creating CNN

Matplotlib for viewing plots/images and Numpy for tensor processing

Use the dataset-loading function from sklearn to import the datasets for training our
dog breed model. Create the lists of training, validation and test filenames along
with their dog breed labels, and create a few paths that will be used later.
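
To make the loading step concrete, here is a rough standard-library sketch of what
such a loader produces: a list of file paths, a list of integer labels and the class
names. It is not the notebook's code (the notebook relies on sklearn, whose
sklearn.datasets.load_files does a similar job), and the folder naming such as
001.Affenpinscher is an assumption made for illustration.

```python
from pathlib import Path


def load_split(split_dir):
    """Return (files, targets, dog_names) for one split folder.

    Assumes one sub-folder per breed, e.g. train/001.Affenpinscher/...
    """
    split_dir = Path(split_dir)
    class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
    # Drop the numeric prefix ("001.") to get a readable breed name.
    dog_names = [d.name.split(".", 1)[-1] for d in class_dirs]

    files, targets = [], []
    for index, class_dir in enumerate(class_dirs):
        for image_path in sorted(class_dir.glob("*")):
            if image_path.is_file():
                files.append(str(image_path))
                targets.append(index)
    return files, targets, dog_names
```

Calling load_split once per subfolder (train, valid and test) yields the three
filename and label lists used in the rest of the walkthrough.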

Dataset stats

The dog_names variable stores a list of the class names to use in our model. We
see a total of 8351 images of dogs belonging to 133 different dog breeds, split into
6680, 835 and 836 images for training, validation and testing respectively.
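
A quick arithmetic check on these numbers, as a small sketch: the split shares, the
accuracy of random guessing over 133 breeds, and how many of the 836 test images
must be classified correctly to reach the 90% target. The variable names are
illustrative.

```python
import math

splits = {"train": 6680, "valid": 835, "test": 836}
n_breeds = 133

total_images = sum(splits.values())  # 8351
split_share = {
    name: round(100 * count / total_images, 1) for name, count in splits.items()
}  # roughly an 80/10/10 split

# Accuracy (in percent) expected from guessing a breed uniformly at random.
chance_accuracy = 100 / n_breeds

# Smallest number of correct test predictions that reaches 90% accuracy.
needed_correct = math.ceil(0.90 * splits["test"])  # 753
```

Random guessing would be right less than 1% of the time, which is a useful baseline
when reading the accuracies reported later.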

Step 1: Detect Humans based on OpenCV Haar cascade

Object Detection using Haar feature-based cascade classifiers is an effective object
detection method proposed by Paul Viola and Michael Jones in their paper, “Rapid
Object Detection using a Boosted Cascade of Simple Features” in 2001. It is a
machine learning based approach where a cascade function is trained from a lot of
positive and negative images, which is then used to detect objects in other images.

We use OpenCV’s implementation of Haar feature-based cascade classifiers to detect
human faces in images. OpenCV provides many pre-trained face detectors, stored as
XML files on GitHub. Before using any of the face detectors, it is standard procedure
to convert the images to grayscale. The detectMultiScale function executes the
classifier stored in face_cascade and takes the grayscale image as a parameter. The
face_detector function takes a string-valued file path to an image as input and
returns True if a human face is detected in the image and False otherwise. While
testing the human face detector, all 100 human faces were detected as human faces,
while 11 of the 100 dog faces were also detected as human faces.
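
The detector described above can be sketched roughly as follows. This is an
illustration rather than the notebook's exact code: the cascade file
(haarcascade_frontalface_alt.xml, as bundled with the opencv-python package) is an
assumption, and detection_rate is a hypothetical helper for computing the
percentages quoted above.

```python
def face_detector(img_path):
    """Return True if at least one human face is detected in the image."""
    import cv2  # imported lazily; requires the opencv-python package

    # Assumption: use the frontal-face cascade shipped with opencv-python.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_alt.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)

    img = cv2.imread(img_path)  # BGR image, or None if the file is unreadable
    if img is None:
        raise FileNotFoundError(f"could not read image: {img_path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray)  # one bounding box per face
    return len(faces) > 0


def detection_rate(detector, img_paths):
    """Percentage of images for which the detector returns True."""
    hits = sum(1 for path in img_paths if detector(path))
    return 100.0 * hits / len(img_paths)
```

With the 100 human and 100 dog sample images, detection_rate(face_detector, ...)
would correspond to the 100% and 11% figures reported above. In a real notebook the
cascade would normally be loaded once rather than on every call.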

Step 2: Detect Dogs

Here, we use a pre-trained ResNet-50 model to detect dogs in images. Our first line of
code downloads the ResNet-50 model along with weights trained on ImageNet, a very
large, very popular dataset used for image classification and other vision tasks, in
which each image contains an object from one of 1000 categories. Given an image, this
pre-trained ResNet-50 model returns a prediction (derived from the available
categories in ImageNet) for the object that is contained in the image.

When using TensorFlow as backend, Keras CNNs require a 4D array (which we’ll
also refer to as a 4D tensor) as input, with shape

(nb_samples, rows, columns, channels)

where nb_samples corresponds to the total number of images (or samples), and
rows, columns, and channels correspond to the number of rows, columns, and
channels for each image, respectively.

Create tensor input from paths to images

The path_to_tensor function takes a string-valued file path to a color image as input
and returns a 4D tensor suitable for supplying to a Keras CNN. The function first
loads the image and resizes it to a square image of 224 x 224 pixels. Next, the
image is converted to an array, which is then reshaped into a 4D tensor. In this case,
since we are working with color images, each image has three channels. Likewise,
since we are processing a single image (or sample), the returned tensor will always
have shape (1, 224, 224, 3).

The paths_to_tensor function takes a numpy array of string-valued image paths as
input and returns a 4D tensor with shape (nb_samples, 224, 224, 3). Here, nb_samples
is the number of samples, or number of images, in the supplied array of image paths.
It is best to think of nb_samples as the number of 3D tensors (where each 3D tensor
corresponds to a different image).

In addition, ResNet-50 requires additional processing such as reordering of
channels from RGB to BGR and normalization of pixels, which is done using
preprocess_input.
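
A minimal sketch of the two tensor helpers described above, assuming TensorFlow 2.9
or newer (for tensorflow.keras.utils.load_img) and Pillow are installed. The size
parameter is an addition for illustration. The last lines only demonstrate the
resulting shapes with placeholder arrays, so no image files are needed to run them.

```python
import numpy as np


def path_to_tensor(img_path, size=224):
    """Load one color image as a tensor of shape (1, size, size, 3)."""
    # Imported lazily so the shape demo below runs without TensorFlow.
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(img_path, target_size=(size, size))  # resized RGB image
    x = img_to_array(img)                                # (size, size, 3)
    return np.expand_dims(x, axis=0)                     # (1, size, size, 3)


def paths_to_tensor(img_paths, size=224):
    """Stack several images into shape (nb_samples, size, size, 3)."""
    return np.vstack([path_to_tensor(p, size) for p in img_paths])


# Shape check with placeholder arrays standing in for four loaded images.
fake_images = [np.zeros((1, 224, 224, 3), dtype="float32") for _ in range(4)]
batch = np.vstack(fake_images)
print(batch.shape)  # (4, 224, 224, 3)
```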

The model is then used to extract the predictions. The predict method returns an
array whose 𝑖-th entry is the model's predicted probability that the image belongs to
the 𝑖-th ImageNet category. This is implemented in the ResNet50_predict_labels
function.

The categories corresponding to dogs appear in an uninterrupted sequence
corresponding to keys 151–268, inclusive, to include all categories from 'Chihuahua'
to 'Mexican hairless'. So, if the function returns any number from 151 to 268,
the supplied image is taken to be that of a dog.

The dog_detector function returns True if a dog is detected in an image (and
False if not). As expected, none of the sample human images has a detected dog,
and all of the sample dog images have a detected dog.
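
Putting the pieces of this step together, a hedged sketch could look like the code
below. It assumes TensorFlow 2.9 or newer. The helper names is_dog_index and
_resnet50 are hypothetical, while ResNet50_predict_labels and dog_detector follow
the names used in the text.

```python
from functools import lru_cache

DOG_INDEX_RANGE = (151, 268)  # ImageNet keys for dog categories, inclusive


def is_dog_index(imagenet_index):
    """True if an ImageNet class index falls in the dog range."""
    low, high = DOG_INDEX_RANGE
    return low <= imagenet_index <= high


@lru_cache(maxsize=1)
def _resnet50():
    """Build the ImageNet-pretrained model once and reuse it."""
    from tensorflow.keras.applications.resnet50 import ResNet50

    return ResNet50(weights="imagenet")


def ResNet50_predict_labels(img_path):
    """Return the index of the most probable ImageNet category."""
    import numpy as np
    from tensorflow.keras.applications.resnet50 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    img = load_img(img_path, target_size=(224, 224))
    x = np.expand_dims(img_to_array(img), axis=0)   # (1, 224, 224, 3)
    probabilities = _resnet50().predict(preprocess_input(x))
    return int(np.argmax(probabilities))


def dog_detector(img_path):
    """Return True if ResNet-50 predicts any dog category for the image."""
    return is_dog_index(ResNet50_predict_labels(img_path))
```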

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

The model that I selected had a CNN architecture of 4 convolutional layers
alternating with max-pooling layers, with 10% dropout and batch normalization. The
convolutional layers used 16, 32, 64 and 128 filters. Dropout was used to reduce the
possibility of over-fitting.

This is followed by a global average pooling layer and then a dense layer that
identifies the 133 breeds.

The model takes a 4D tensor with shape (1, 224, 224, 3) and returns an array of 133
probabilities. The optimizer used was ‘RMSProp’ and the metric used was accuracy. The
model was run for 10 epochs and provided an accuracy of 6.69%.
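
One way to express this architecture in Keras is sketched below. The filter counts,
dropout rate, pooling, final layers and optimizer come from the description above.
The 3x3 kernels, ReLU activations, "same" padding, the order of layers inside each
block and the loss function are assumptions, since the text does not state them.

```python
FILTERS = (16, 32, 64, 128)


def spatial_size_after_pools(size, n_pools):
    """Width/height left after n_pools rounds of 2x2 max-pooling."""
    for _ in range(n_pools):
        size //= 2
    return size


def build_scratch_cnn(input_shape=(224, 224, 3), n_classes=133, dropout=0.1):
    """Four conv blocks, global average pooling, then a softmax layer."""
    from tensorflow.keras import Input
    from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                         Dropout, GlobalAveragePooling2D,
                                         MaxPooling2D)
    from tensorflow.keras.models import Sequential

    model = Sequential([Input(shape=input_shape)])
    for n_filters in FILTERS:
        # Assumed block layout: conv -> batch norm -> max-pool -> dropout.
        model.add(Conv2D(n_filters, kernel_size=3, padding="same",
                         activation="relu"))
        model.add(BatchNormalization())
        model.add(MaxPooling2D(pool_size=2))
        model.add(Dropout(dropout))
    model.add(GlobalAveragePooling2D())
    model.add(Dense(n_classes, activation="softmax"))
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Under these assumptions the four pooling steps shrink a 224 x 224 input to 14 x 14
feature maps (spatial_size_after_pools(224, 4)) before global average pooling
reduces them to a vector of 128 values.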

CNN model from scratch

Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)

I used VGG16 to demonstrate the use of Transfer Learning. Using bottleneck features
means taking a pre-trained model, chopping off the top classifying layer,
and then providing this “chopped” VGG16 as the first layer of our model.

The bottleneck features are the last activation maps in VGG16 (the fully-
connected layers for classifying have been cut off), which makes it an effective
feature extractor. The bottleneck features were downloaded from a website where they
are stored as a .npz file, using the BytesIO library along with requests for the URL.
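
Loading an .npz file straight from a URL with requests and BytesIO can be sketched
as follows. The function names are hypothetical, the URL is left as a parameter
because the article does not give it, and the array names stored inside the archive
are not assumed either.

```python
import io

import numpy as np


def load_npz_from_bytes(raw_bytes):
    """Open an in-memory .npz archive and return its arrays as a dict."""
    with np.load(io.BytesIO(raw_bytes)) as archive:
        return {name: archive[name] for name in archive.files}


def download_bottleneck_features(url):
    """Fetch a .npz file over HTTP and load it without writing to disk."""
    import requests  # imported lazily; only needed for the download

    response = requests.get(url, timeout=60)
    response.raise_for_status()
    return load_npz_from_bytes(response.content)
```

The returned dict maps each array name in the archive to a numpy array, for example
the training features with shape (6680, 7, 7, 512).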

The pre-trained VGG-16 model was then used as a fixed feature extractor, where the
last convolutional output of VGG-16 is fed as input to our model. The shape of the
VGG16 bottleneck features for the training set was (6680, 7, 7, 512), i.e. a
(7, 7, 512) activation map for each of the 6680 samples. Our model adds only a global
average pooling layer and a fully connected layer, where the latter contains one node
for each dog category and is equipped with a softmax. Running this model for 20
epochs resulted in an increase in the accuracy to 47%. This demonstrates the benefit
of leveraging Transfer Learning from pre-trained models.
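
The small model that sits on top of the bottleneck features could look like the
sketch below. The two layers follow the description above. The optimizer and loss
are assumptions carried over from Step 3, and head_param_count is a hypothetical
helper that shows how few weights actually need training.

```python
def head_param_count(channels, n_classes):
    """Trainable weights in the head: pooling has none, Dense has W and b."""
    return channels * n_classes + n_classes


def build_transfer_head(input_shape=(7, 7, 512), n_classes=133):
    """Global average pooling followed by a softmax layer over the breeds."""
    from tensorflow.keras import Input
    from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
    from tensorflow.keras.models import Sequential

    model = Sequential([
        Input(shape=input_shape),       # one bottleneck feature map per image
        GlobalAveragePooling2D(),       # (7, 7, 512) -> (512,)
        Dense(n_classes, activation="softmax"),
    ])
    # Assumption: same optimizer and metric as the from-scratch model.
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


vgg16_head_params = head_param_count(512, 133)  # 68229
```

Because only these 68,229 weights are trained while VGG16 itself stays frozen, each
epoch is cheap compared with training a full network.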

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

The model was built using Keras, leveraging Transfer Learning. I tried four
different models: VGG19, ResNet50, InceptionV3 and Xception.

The shapes of the bottleneck features are VGG19: (6680, 7, 7, 512); ResNet50:
(6680, 1, 1, 2048); InceptionV3: (6680, 5, 5, 2048); Xception: (6680, 7, 7, 2048). It
took about 160 seconds to load all the Transfer Learning models.

Each model was then extended with a global average pooling layer and a dropout layer,
followed by a fully connected layer (with softmax), and run for 20 epochs.

Training the models took less than a minute in each of these cases.

Training Time in seconds

Accuracy for Xception was ~85%, while for VGG19 it was ~46%.

I then explored options for increasing the accuracy. I used fastai to see if we could
leverage transfer learning and obtain a higher accuracy.

The databunch was created and normalized.

A cnn_learner was created with the resnet34 model and was run for two cycles. The
accuracy was up to 86%. An optimal learning rate seems to be between 1e-6 and 1e-4.

After unfreezing the model and refitting it for 10 epochs, an accuracy of up to
89.8% was obtained, which means close to 9 out of 10 images are accurately classified.
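
The fastai workflow described above might be sketched like this, assuming fastai v1
(the release that provides ImageDataBunch and cnn_learner) and an image folder with
train and valid subfolders. The function name, the image size and the use of
default augmentations are assumptions, and "two cycles" is read here as two epochs
of fit_one_cycle.

```python
FROZEN_EPOCHS = 2         # first fit, only the new head is trained
UNFROZEN_EPOCHS = 10      # second fit, after unfreezing all layers
LR_RANGE = (1e-6, 1e-4)   # learning-rate range mentioned in the text


def train_resnet34(image_dir):
    """Fit a pre-trained resnet34, first frozen and then unfrozen."""
    from fastai.metrics import accuracy
    from fastai.vision import (ImageDataBunch, cnn_learner, get_transforms,
                               imagenet_stats, models)

    # Build the databunch from the folder structure and normalize it.
    data = ImageDataBunch.from_folder(
        image_dir, train="train", valid="valid",
        ds_tfms=get_transforms(), size=224,
    ).normalize(imagenet_stats)

    learn = cnn_learner(data, models.resnet34, metrics=accuracy)
    learn.fit_one_cycle(FROZEN_EPOCHS)
    learn.unfreeze()
    learn.fit_one_cycle(UNFROZEN_EPOCHS, max_lr=slice(*LR_RANGE))
    return learn
```

Passing the range as a slice lets fastai apply smaller learning rates to the early
layers and larger ones to the later layers.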

Based on the analysis of the various models that we have fit, the learn_resnet34 model
seems to provide the highest accuracy. It is also saved and exported as a pickle file for

Step 6: Write own algorithm to provide an output breed based on an image

We input an image path and the pretrained model is applied to the image to obtain
its bottleneck features. These are then processed through our trained fully-connected
model, which gives a predicted_breed, the category index and the probability tensor.
The predict_breed function takes a file_path as input and outputs the breed of
the dog.
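
The last part of that pipeline, turning the probability vector into a breed name,
can be illustrated in plain Python. The function name and the toy values are
hypothetical; in the notebook the probabilities come from the trained model.

```python
def breed_from_probabilities(probabilities, dog_names):
    """Return (breed name, category index, probability) of the top class."""
    if len(probabilities) != len(dog_names):
        raise ValueError("need exactly one probability per breed")
    index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return dog_names[index], index, probabilities[index]


# Toy example with three breeds instead of 133.
names = ["Affenpinscher", "Beagle", "Great_dane"]
result = breed_from_probabilities([0.1, 0.7, 0.2], names)
```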

Our algorithm accepts a file path to an image and first determines whether the
image contains a human, dog, or neither. Then,

if a dog is detected in the image, return the predicted breed.

if a human is detected in the image, return the resembling dog breed.

if neither is detected in the image, provide output that indicates an error.

The algorithm leverages the CNN built in Step 5 and the previous functions to come
up with an output.

The algo function determines if the provided file_path contains a dog or human or
neither and returns the species (dog or human or neither) and the predicted breed
of the image.

The provide_output function outputs a greeting based on the predicted species and
dog breed.
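
A compact sketch of this logic is shown below. Unlike the notebook's algo, which
takes only a file_path, this version receives the detectors and the breed predictor
as arguments so it can be run with simple stand-ins. The greeting texts are
made up for illustration.

```python
def algo(file_path, dog_detector, face_detector, predict_breed):
    """Return (species, breed) for the image at file_path."""
    # The dog check comes first because the face detector sometimes
    # fires on dog images, while the dog detector did not fire on humans.
    if dog_detector(file_path):
        return "dog", predict_breed(file_path)
    if face_detector(file_path):
        return "human", predict_breed(file_path)
    return "neither", None


def provide_output(species, breed):
    """Build a greeting from the predicted species and breed."""
    if species == "dog":
        return f"Hello, dog! You look like a {breed}."
    if species == "human":
        return f"Hello, human! You resemble a {breed}."
    return "Error: neither a dog nor a human was detected in this image."


# Stand-in detectors keyed on the file name, for demonstration only.
species, breed = algo(
    "human_01.jpg",
    dog_detector=lambda path: path.startswith("dog"),
    face_detector=lambda path: path.startswith("human"),
    predict_breed=lambda path: "Dogue_de_bordeaux",
)
message = provide_output(species, breed)
```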

Step 7: Test Your Algorithm

The six dogs that were sampled to check the algorithm were correctly identified as
dogs. The breeds of 5 of the 6 were accurate too. Only 1 dog (a Rajapalayam, a native
breed) was identified as a Great Dane, possibly because Rajapalayam is not one of the
breeds in the dataset.

The humans were also identified as human and a dog breed was predicted for each.
Incidentally, both were predicted as Dogue_de_bordeaux.

At the start, my objective was to create a CNN with 90% testing accuracy. Our final
model obtained 89.8% testing accuracy.

There are a few breeds that are virtually identical or are sub-breeds of one another.
There’s also a possibility that some images are blurred or have too much noise, and
that additional image manipulation could improve the quality of the input.

By addressing the areas above, I’m sure we could increase the testing accuracy of the
model to above 90%.

A simple web application in Flask could be built to leverage the model to predict
breeds through user-input images.


