
ARTIFICIAL NEURAL NETWORK

Dr Neeraj Kaushik
NIT Kurukshetra
WHAT IS A NEURAL NETWORK?
- A computational neural network is a set of non-linear data modeling tools consisting of input and output layers plus one or two hidden layers.
- The connections between neurons in each layer have associated weights, which are iteratively adjusted by the training algorithm to minimize error and provide accurate predictions.
Baby is asked: "What is it?"
HOW DOES A BABY UNDERSTAND THE WORLD?
[Diagram] Input: sensory organs (eyes & ears). Hidden layers: weigh features such as quadruped (four legs), has a tail, small size, and bark sound. Output: candidates (table, bed, elephant, horse, cat, dog) are narrowed down until "Dog" carries the highest weight.
WHAT IS A NEURAL NETWORK?
It resembles the brain in two respects:
- Knowledge is acquired by the network through a learning process.
- Inter-neuron connection strengths, known as synaptic weights, are used to store the knowledge.
WHY USE A NEURAL NETWORK?
- Violation of the assumption of normality.
- Violation of the assumption of linearity.
- ANN is robust against noise, outliers, and small sample sizes.
- Through several rounds of the learning process, the errors can be minimized and the accuracy of the prediction can be further improved.
- ANNs are more robust and can provide higher prediction accuracy than linear models (Tan, Ooi, Leong et al., 2014).
ALGORITHM
- The Multilayer Perceptron (MLP) or Radial Basis Function (RBF) procedure can be used.
- Both are supervised learning techniques, that is, they map relationships implied by the data.
- Both use feedforward architectures, meaning that data moves in only one direction, from the input nodes through the hidden layer of nodes to the output nodes.
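A minimal sketch of such a feedforward MLP in Python, using scikit-learn's MLPRegressor; the data, layer sizes, and settings below are illustrative assumptions, not taken from the slides:

import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative data: 4 predictors (input neurons) and 1 dependent variable (output neuron).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([0.5, -1.0, 2.0, 0.3]) + rng.normal(scale=0.1, size=200)

# One hidden layer with 3 units; data flows input -> hidden -> output only (feedforward).
mlp = MLPRegressor(hidden_layer_sizes=(3,), activation="logistic",
                   solver="adam", max_iter=2000, random_state=0)
mlp.fit(X, y)            # weights are adjusted iteratively to reduce the prediction error
print(mlp.predict(X[:5]))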
LAYERS
- The number of neurons in the input layer is equal to the number of inputs, i.e. predictors.
- The number of neurons in the output layer is equal to the number of outputs, i.e. dependent variables.
- The number of neurons in the hidden layer affects both the prediction accuracy and the speed of network training.
[Diagram] Input layer -> hidden layer -> output layer.
DEPENDENT VARIABLE
You specify the dependent variables, which may be scale, categorical, or a combination of the two:
- If a dependent variable has a scale measurement level, the neural network predicts continuous values that approximate the "true" value of some continuous function of the input data.
- If a dependent variable is categorical, the neural network classifies cases into the "best" category based on the input predictors.
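As an illustration (not part of the slides), the measurement level of the dependent variable determines whether a regression or a classification network is fitted; the scikit-learn estimators below are one possible stand-in:

from sklearn.neural_network import MLPClassifier, MLPRegressor

def build_network(dependent_is_categorical):
    # Categorical dependent variable -> classification; scale dependent variable -> regression.
    if dependent_is_categorical:
        return MLPClassifier(hidden_layer_sizes=(5,), max_iter=1000, random_state=0)
    return MLPRegressor(hidden_layer_sizes=(5,), max_iter=1000, random_state=0)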
HIDDEN LAYERS
- The hidden layer contains unobservable network nodes (units).
- Each hidden unit is a function of the weighted sum of the inputs.
- The function is the activation function, and the values of the weights are determined by the estimation algorithm.
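A short sketch of a single hidden unit, assuming a logistic (sigmoid) activation; the weights, bias, and inputs are illustrative:

import numpy as np

def hidden_unit(x, w, b):
    z = np.dot(w, x) + b              # weighted sum of the inputs plus a bias term
    return 1.0 / (1.0 + np.exp(-z))   # logistic (sigmoid) activation function

print(hidden_unit(np.array([0.2, 0.7]), np.array([0.4, -0.3]), 0.1))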
ISSUES IN HIDDEN LAYERS
- Simulation experiments indicate that, up to a point, a higher number of neurons in the hidden layer gives higher estimation accuracy (Negnevitsky, 2011).
- However, too many of them can dramatically increase the computational load.
- Another problem is over-fitting: if the number of hidden neurons is too large, the network might simply memorize all training examples and not be able to generalize, i.e. to give correct output for data not used in the training.
IDEAL NUMBER OF HIDDEN NEURONS
There is no exact way to determine the number of hidden neurons, so trial and error (Chan and Chong, 2012; Chong et al., 2015; Chong, 2013a, 2013b) and rules of thumb are usually used.
One of the most widely known empirically driven rules of thumb is that the optimal number of hidden neurons usually lies between the number of input neurons and the number of output neurons (Blum, 1992).
IDEAL NUMBER OF HIDDEN NEURONS
Shibata and Ikeda (2009), Yao, Tan, and Poh (1999) and Panahian (2011) suggested that the number of hidden neurons m can be calculated as follows:
m = sqrt(n · l)
where
n = the number of input neurons and
l = the number of output neurons.
IDEAL NUMBER OF HIDDEN NEURONS
Trenn (2008) suggested the following equation for the number of hidden neurons m:
m = (n + l - 1) / 2
where
n = the number of input neurons and
l = the number of output neurons.
IDEAL NUMBER OF HIDDEN NEURONS
Yao et al. (1999) and Panahian (2011) suggested a logarithmic dependence:
m = ln(n)
where n = the number of input neurons.
Fang and Ma (2009) proposed:
m = log2(n)
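A small sketch (not from the slides) that evaluates the rules of thumb above for a given number of input and output neurons:

import math

def candidate_hidden_sizes(n, l):
    # n = number of input neurons, l = number of output neurons
    return {
        "sqrt(n*l), Shibata and Ikeda (2009)": math.sqrt(n * l),
        "(n+l-1)/2, Trenn (2008)": (n + l - 1) / 2,
        "ln(n), Yao et al. (1999)": math.log(n),
        "log2(n), Fang and Ma (2009)": math.log2(n),
    }

print(candidate_hidden_sizes(n=8, l=1))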
PARTITIONING OF DATA / SAMPLES
With either of these approaches, you divide your data into training, testing, and holdout sets:
- The training set is used to estimate the network parameters.
- The testing set is used to prevent overtraining.
- The holdout set is used to independently assess the final network, which is applied to the entire dataset and to any new data.
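An illustrative three-way split into training, testing, and holdout sets; the 70/15/15 proportions and the synthetic data are assumptions, not prescribed by the slides:

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.normal(size=200)

# First split off 30%, then split that portion half-and-half into testing and holdout sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=0)
X_test, X_hold, y_test, y_hold = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)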
ARCHITECTURE
- The Architecture tab is used to specify the structure of the network.
- The procedure can select the "best" architecture automatically, or you can specify a custom architecture.
- Automatic architecture selection builds a network with one hidden layer.
TWO-STAGED SEM-PLS-ANN APPROACH
The two techniques in a two-staged SEM-PLS-ANN approach complement each other:
- SEM-PLS is suitable for hypothesis testing of linear relationships but cannot capture the nonlinearity of relationships.
- The ANN can detect nonlinear relationships, but it is unsuitable for hypothesis testing because of its "black box" operation (Hew, Leong, Ooi, & Chong, 2016; Tan, Ooi, Leong, & Lin, 2014).
Source: Lai-Ying Leong, et al. (2019), International Journal of Information Management, https://doi.org/10.1016/j.ijinfomgt.2019.102047
TWO-STAGED SEM-PLS-ANN APPROACH
- In the first stage, SEM is used to test the overall research model and determine the significant hypothesized predictors.
- In the second stage, the significant hypothesized predictors are used as inputs to the neural network model to determine the relative importance of each predictor variable.
Source: http://dx.doi.org/10.1016/j.ijinfomgt.2016.10.008
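A rough sketch of the second stage, assuming the predictors found significant in the PLS-SEM stage are already known; the file name, column names, and network settings are hypothetical:

import pandas as pd
from sklearn.neural_network import MLPRegressor

df = pd.read_csv("survey_data.csv")                # hypothetical data file
significant = ["perceived_usefulness", "trust"]    # predictors found significant in the SEM stage
ann = MLPRegressor(hidden_layer_sizes=(2,), max_iter=2000, random_state=0)
ann.fit(df[significant], df["intention_to_use"])   # hypothetical dependent variable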
READING OUTPUT
RMSE = sqrt(SSE / N)
The average RMSE values of the training and testing procedures should be relatively small (around 0.10).
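For reference, a direct computation of the formula above with made-up actual and predicted values:

import numpy as np

actual = np.array([0.8, 0.5, 0.9, 0.3])
predicted = np.array([0.7, 0.6, 0.85, 0.35])
sse = np.sum((actual - predicted) ** 2)   # sum of squared errors
rmse = np.sqrt(sse / len(actual))         # RMSE = sqrt(SSE / N)
print(rmse)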
READING OUTPUT
The normalized importance of the variables identified as significant by SmartPLS should be high.
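A small sketch of how normalized importance is usually reported: each raw importance is divided by the largest one and expressed as a percentage. The raw scores below are made-up values, not results from any analysis:

raw_importance = {"perceived_usefulness": 0.42, "trust": 0.35, "cost": 0.23}
max_importance = max(raw_importance.values())
normalized = {name: 100 * value / max_importance for name, value in raw_importance.items()}
print(normalized)   # the most important predictor is scaled to 100%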
PRESENTATION BY

Dr. Neeraj Kaushik
Email: kaushikneeraj@gmail.com