
Extreme Learning Machines: Implementation and Applications in Computer Vision

Purusharth Verma
National Institute of Technology, Jamshedpur
02/05/2021

Final Year Major Project


Abstract

Extreme Learning Machines offer many advantages over traditional neural networks.
Because their parameters are not set iteratively, they can be trained in an extremely short
amount of time, and the parameters they learn tend not to overfit the training data. Owing to
these advantages, this project uses Extreme Learning Machines in place of conventional
neural networks, and their performance has been recorded and analyzed.

Table of Contents

1. Introduction
2. Extreme Learning Machines
3. Methodology
3.1 The Training Process
3.2 Sample Code
4. Results and Discussion
4.1 MNIST Dataset
4.2 PBC Dataset
4.3 Dermatology Dataset
5. Conclusions and Recommendations
6. Acknowledgements
7. References

List of Figures

● Figure 1. Training plot for MNIST Dataset
● Figure 2. Training plot for PBC Dataset
● Figure 3. Training plot for Dermatology Dataset

List of Tables

● Table 1. Results for the MNIST Dataset
● Table 2. Results for the PBC Dataset
● Table 3. Results for the Dermatology Dataset

1 Introduction

Extreme Learning Machines (ELMs), first introduced in 2004 by Guang-Bin Huang, Qin-Yu
Zhu and Chee Kheong Siew, are essentially feed forward neural networks with just a single
hidden layer. The difference between a traditional feed forward neural network and an ELM
lies in the way the weights are set. While the weights of traditional feedforward nets are set
iteratively (via the backpropagation algorithm), the weights of an ELM are set directly, in a
single step. This makes ELMs much faster to train and optimize. Further, ELMs have the added
advantage that the parameters they learn from a training dataset generalize very well to new
inputs, without the need for additional regularization methodologies. Given these advantages,
it makes sense to replace traditional feed forward neural networks with ELMs, and that is
essentially the aim of this project: to study and analyze the performance of ELMs when used
in place of traditional neural networks. Image classification serves as the domain for the tasks
used to measure and record the performance of ELM-augmented neural networks. The project
makes use of the following three image classification datasets:

⦁ The MNIST Dataset: a dataset of 60,000 28×28 grayscale images of the handwritten decimal
digits 0-9.
⦁ The PBC Dataset: a dataset of over 17,000 360×360 color images of 8 different types of cells
found in the human body.
⦁ The Dermatology Dataset: a dataset of over 10,000 360×360 color images of 7 different
diagnostic categories of pigmented lesions.

2 Extreme Learning Machines


To understand how ELM-augmented neural networks can be trained, one must first
understand how an ELM itself works. Essentially, an ELM is just a simple feed forward neural
network consisting of a single hidden layer. The weight matrix between the input layer and
the hidden layer is initialized randomly at the start of the training process and is never
modified again. The weight matrix between the hidden layer and the output layer is set using
the Moore-Penrose pseudo-inverse. To see how, let the output of the hidden layer be H and
the target output be T. The weight matrix X between the hidden and the output layer must
then satisfy

HX = T

Denoting the Moore-Penrose pseudo-inverse of H by H', we can calculate X as

X = H'T

Thus, all the trainable weights of an ELM can be set with the single calculation specified by
the above equation. Further, due to the special properties of the Moore-Penrose
pseudo-inverse, the weights thus obtained form the minimum-norm least-squares solution of
HX = T, i.e. the most accurate outputs achievable by any linear transformation of the hidden
layer, and such weights tend, in principle, to generalize well to new inputs.
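
As a quick illustration of this one-step fit, the following minimal sketch (not taken from the
project code; all dimensions and names are illustrative) computes the output weights of a toy
ELM with torch.linalg.pinv:

import torch

# Toy dimensions (illustrative only)
n_samples, inp_dim, hidden_dim, out_dim = 128, 20, 64, 10

# Random, frozen input-to-hidden weights
W_in = torch.randn(inp_dim, hidden_dim)
b_in = torch.randn(hidden_dim)

X_data = torch.randn(n_samples, inp_dim)   # inputs
T = torch.randn(n_samples, out_dim)        # target outputs

# Hidden-layer output H (sigmoid activation, as in the sample code of Section 3.2)
H = torch.sigmoid(X_data @ W_in + b_in)

# One-step solution X = H'T, where H' is the Moore-Penrose pseudo-inverse of H
X_wts = torch.linalg.pinv(H) @ T           # shape: (hidden_dim, out_dim)

# Predictions of the fitted ELM
T_hat = H @ X_wts
print(torch.nn.functional.mse_loss(T_hat, T).item())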

3 Methodology

As described above, ELMs are essentially feedforward neural networks consisting of
just a single hidden layer. Because of this shallow structure, ELMs cannot by themselves solve
many complicated AI tasks. Instead of using them in isolation, they must be integrated into
deeper neural networks to be of value. For example, for image classification, we can take a
deep convolutional neural network and replace its last few feed forward layers with an ELM,
as sketched below.
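
For instance, one possible way to set this up (a sketch, not the exact code used in this project;
it assumes the torchvision ResNet18 backbone mentioned in Section 4) is to strip the final fully
connected layer of the backbone and feed its features into an ELM:

import torch
import torchvision

backbone = torchvision.models.resnet18(pretrained=False)
feature_dim = backbone.fc.in_features     # 512 for ResNet18
backbone.fc = torch.nn.Identity()         # drop the original classification head

# The resulting 512-d feature vectors can then be passed to the ELM head defined in
# Section 3.2, e.g. ConvELM(backbone, elm_dims=(feature_dim, 1024, num_classes)),
# where the hidden width of 1024 and num_classes are placeholders.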

3.1 The Training Process


One of the problems that arises when we augment an ELM into a deep neural network
is that the weights of the deep network are set iteratively, while the weights of the ELM are
set in a single step. We must also acknowledge the difficulty of gradient flow through an ELM,
and how that might pose a problem for backpropagation, the iterative algorithm used for
setting the weights of a neural network. For our specific case, where we append an ELM to the
end of a deep convolutional neural network, we make use of the following two solutions
during training.

❏ Since the weights of an ELM are set in a single step and cannot be updated
incrementally, at every training iteration we simply overwrite the ELM weights
computed in the previous iteration with those computed from the current batch.

❏ To enable gradient flow through the ELM, we simply treat the input-output mapping it
performs as just another function and assume its differentiability, so that
backpropagation can proceed through it into the convolutional layers.

3.2 Sample Code

import numpy as np
import torch


class SimpleBaseELM(torch.nn.Module):

    def __init__(self, inp_dim, hidden_dim, output_dim):
        super().__init__()
        # Randomly initialized input-to-hidden weights, never trained
        self.frozen_linear_layer = torch.nn.Linear(inp_dim, hidden_dim)
        # Hidden-to-output weights, set analytically via the pseudo-inverse
        self.linear_layer = torch.nn.Linear(hidden_dim, output_dim, bias=False)

    def get_hidden_output(self, x):
        x = self.frozen_linear_layer(x)
        return torch.sigmoid(x)

    def set_linear_weights(self, wts):
        # Overwrite the hidden-to-output weights with the newly computed values
        self.linear_layer.weight.data = wts.detach()

    def forward(self, x):
        return self.linear_layer(self.get_hidden_output(x))

    def forward_with_output(self, x, y):
        # One-step fit: X = H'T, where H is the hidden output and H' its pseudo-inverse
        x = self.get_hidden_output(x)
        inv = torch.linalg.pinv(x)
        wts = torch.matmul(inv, y)
        self.set_linear_weights(torch.transpose(wts, 0, 1))
        return torch.matmul(x, wts)

class ConvELM(torch.nn.Module):

    def __init__(self, conv_model, elm_dims):
        super().__init__()
        self.conv_model = conv_model
        self.flatten = torch.nn.Flatten()
        self.output_dim = elm_dims[2]
        self.elm_model = SimpleBaseELM(elm_dims[0], elm_dims[1], elm_dims[2])
        # The ELM weights are set analytically, not by the optimizer
        for param in self.elm_model.parameters():
            param.requires_grad = False

    def forward(self, x):
        x = self.conv_model(x)
        x = self.flatten(x)
        x = self.elm_model(x)
        return x

    def forward_with_output(self, x, y):
        # Training-time forward pass: the ELM refits its output weights on this batch
        x = self.conv_model(x)
        x = self.flatten(x)
        x = self.elm_model.forward_with_output(x, y)
        return x

    def _process_batch(self, batch, train=True):
        images, labels = batch
        if train:
            # One-hot encode the labels so they can serve as the ELM target matrix T
            labels_categorical = torch.Tensor(
                np.eye(self.output_dim, dtype='float32')[labels.cpu()]
            ).to(images.device)
            outputs = self.forward_with_output(images, labels_categorical)
        else:
            outputs = self(images)
        loss = torch.nn.functional.cross_entropy(outputs, labels)
        return loss
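
A minimal usage sketch follows (not part of the original code; the backbone, dimensions and
dummy batch are illustrative, with the ResNet18 head stripped as in the Section 3 sketch):

import torch
import torchvision

backbone = torchvision.models.resnet18(pretrained=False)
backbone.fc = torch.nn.Identity()                      # expose 512-d features

model = ConvELM(backbone, elm_dims=(512, 1024, 10))    # 10 output classes (illustrative)

images = torch.randn(32, 3, 224, 224)                  # dummy image batch
labels = torch.randint(0, 10, (32,))

# Training step: refits the ELM output weights on this batch and returns the loss
train_loss = model._process_batch((images, labels), train=True)

# Evaluation step: reuses the stored ELM weights without refitting them
val_loss = model._process_batch((images, labels), train=False)
print(train_loss.item(), val_loss.item())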

4 Results and Discussion

A K-fold training setup with K=5 has been used. The generic model architecture for all
three datasets consists of a base convolutional model (Resnet18 in most cases) followed by an
Extreme Learning Machine. The Adam optimizer, along with a learning rate scheduler
implementing the One Cycle Policy, has been used for the training process.
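
The report does not list the exact hyperparameters; the sketch below only illustrates how such
a setup is typically wired together in PyTorch with scikit-learn's KFold (the dataset object, the
build_model helper, learning rates, batch size and epoch count are all assumptions):

import torch
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kfold.split(dataset)):
    train_loader = torch.utils.data.DataLoader(
        torch.utils.data.Subset(dataset, train_idx), batch_size=512, shuffle=True)

    model = build_model()    # a fresh ConvELM per fold (hypothetical helper)
    optimizer = torch.optim.Adam(
        [p for p in model.parameters() if p.requires_grad], lr=1e-3)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=1e-2, epochs=10, steps_per_epoch=len(train_loader))

    for epoch in range(10):
        for batch in train_loader:
            loss = model._process_batch(batch, train=True)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()

Only the convolutional backbone is updated by Adam here; the ELM output weights are refit
analytically on every batch, as described in Section 3.1.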

4.1 MNIST Dataset

The results obtained for the MNIST dataset are summarized below. Resnet18 was used
as the base convolutional model during the training process.

Fold    Accuracy    F1 Score    MCC

1       99.28       99.28       99.20
2       99.03       99.03       98.92
3       99.26       99.24       99.17
4       99.23       99.25       99.15
5       99.17       99.17       99.08

Table 1. Results for the MNIST dataset

Figure 1. Training plot for the MNIST dataset

4.2 PBC Dataset

The results obtained for the PBC dataset are summarized below. Resnet18 was used as
the base convolutional model during the training process.

Fold    Accuracy    F1 Score    MCC

1       98.32       98.32       98.04
2       98.49       98.43       98.24
3       98.44       98.35       98.18
4       98.29       98.30       98.00
5       98.20       98.20       97.90

Table 2. Results for the PBC dataset

Figure 2. Training plot for the PBC dataset

4.3 Dermatology Dataset

The results obtained for the dermatology dataset are summarized below. Resnet18 was
used as the base convolutional model during the training process.

Fold    Accuracy    F1 Score    MCC

1       87.13       86.23       74.75
2       85.03       83.50       69.29
3       86.89       86.57       74.79
4       83.24       81.98       67.06
5       85.91       84.71       71.78

Table 3. Results for the dermatology dataset

Figure 3. Training plot for the dermatology dataset

5 Conclusions and Recommendations

From the results described above, it is easy to conclude that ELMs offer a viable
substitute for traditional feed forward neural networks. They are easy to train and optimize,
and they generalize well to new inputs. ELMs can also be integrated into other deep neural
networks, leveraging the depth and complexity of those networks to solve more complicated
AI tasks. This capacity to serve as a drop-in replacement within a neural network makes ELMs
extensible and flexible, extending their applications to a wide variety of AI-related domains.
Having said that, ELMs also have some drawbacks, a few of which are listed below:

❏ In an ELM-augmented network, since the weights of the ELM are set (and overwritten)
for every batch, the batch size used during iterative training must be large enough
that the ELM does not overfit the batch data. This often makes ELM-augmented
neural networks harder to train due to their increased memory requirement.

❏ Because the weights of an ELM are set in a single step, and because ELMs have no
mechanism for updating their weights with new batch data, they tend to exhibit
catastrophic forgetting whenever used in a batched training setup.

6 Acknowledgements

I would like to acknowledge my professors Aditya Hati, Dr. Vinay Kumar, and also Arijit
Nandi, for helping and guiding me throughout this project.

7 References
[1] Guang-Bin Huang, Qin-Yu Zhu and Chee Kheong Siew, "Extreme learning machine: A new
learning scheme of feedforward neural networks," 2004.

[2] Oyekale Abel Alade, Ali Selamat and Roselina Sallehuddin, "A Review of Advances in
Extreme Learning Machine Techniques and Its Applications."
