R for Deep Learning (I): Build Fully Connected Neural Network from Scratch
(http://www.parallelr.com/r-deep-neural-network-from-scratch/)
February 13, 2016 by Peng Zhao (http://www.parallelr.com/author/patric/)

Backgrounds

Deep Neural Networks (DNN) (https://en.wikipedia.org/wiki/Deep_learning) have made great progress in recent years in image recognition, natural language processing, and autonomous driving. As Picture.1 shows, from 2012 to 2015 DNNs improved ImageNet (http://image-net.org/challenges/LSVRC/2015/) accuracy from ~80% to ~95%, clearly beating traditional computer vision (CV) methods.

(http://www.parallelr.com/wp-content/uploads/2016/02/ces2016.png)
Picture.1 – From NVIDIA CEO Jensen's talk at CES16
(https://www.youtube.com/playlist?list=PLZHnYvH1qtObAdk3waYMalhA8bdbhyTm2)
In this post, we will focus on fully connected neural networks, which are commonly called DNN in data science. The biggest advantage of a DNN is its ability to extract and learn features automatically through its deep layered architecture, especially for complex and high-dimensional data whose features human feature engineers can't capture easily; see the examples from Kaggle (http://blog.kaggle.com/2014/08/01/learning-from-the-best/). Therefore, DNNs are also very attractive to data scientists, and there are many successful cases in classification, time series, and recommendation systems, such as Nick's post (http://blog.dominodatalab.com/using-r-h2o-and-domino-for-a-kaggle-competition/) and credit scoring (http://www.r-bloggers.com/using-neural-networks-for-credit-scoring-a-simple-example/) by DNN. In CRAN and the R community, there are several popular and mature DNN packages, including nnet (https://cran.r-project.org/web/packages/nnet/index.html), neuralnet (https://cran.r-project.org/web/packages/neuralnet/), H2O (https://cran.r-project.org/web/packages/h2o/index.html), DARCH (https://cran.r-project.org/web/packages/darch/index.html), deepnet (https://cran.r-project.org/web/packages/deepnet/index.html) and mxnet (https://github.com/dmlc/mxnet), and I strongly recommend the H2O DNN algorithm and its R interface (http://www.h2o.ai/verticals/algos/deep-learning/).

(http://www.parallelr.com/wp-content/uploads/2016/02/matrureDNNPackages-2.png)

So, why do we need to build a DNN from scratch at all?

– Understand how a neural network works

Using an existing DNN package, you usually need only one line of R code for your DNN model, as in this example (http://www.r-bloggers.com/fitting-a-neural-network-in-r-neuralnet-package/) with neuralnet. For the inexperienced user, however, the processing and results may be difficult to understand. Therefore, it is a valuable exercise to implement your own network in order to understand more details of the mechanism and the computation.

– Build a specific network with your new ideas

DNN is a rapidly developing area. Lots of novel work and research results are published in the top journals and on the Internet every week, and users often need their own network configuration to fit their problems, such as different activation functions, loss functions, regularization, and connectivity graphs. On the other hand, the existing packages are necessarily behind the latest research, and almost all existing packages are written in C/C++ or Java, so it's not flexible to apply the latest changes and your own ideas to them.

– Debug and visualize the network and data

As we mentioned, the existing DNN packages are highly assembled and written in low-level languages, so it's a nightmare to debug the network layer by layer or node by node. It's also not easy to visualize the results in each layer, monitor how the data or weights change during training, or show the patterns discovered by the network.

Fundamental Concepts and Components

A fully connected neural network (http://lo.epfl.ch/files/content/sites/lo/files/shared/1990/IEEE_78_1637_Oct1990.pdf), called DNN in data science, is a network in which adjacent layers are fully connected to each other: every neuron in one layer is connected to every neuron in the adjacent layers.

A very simple and typical neural network is shown below, with 1 input layer, 2 hidden layers, and 1 output layer. Mostly, when researchers talk about a network's architecture, they refer to the configuration of the DNN: how many layers are in the network, how many neurons are in each layer, and what kind of activation, loss function, and regularization are used.

(http://www.parallelr.com/wp-content/uploads/2016/02/dnn_architecture.png)
Now, we will go through the basic components of a DNN and show you how they are implemented in R.

Weights and Bias

Take the DNN architecture above, for example: there are 3 groups of weights, from the input layer to the first hidden layer, from the first to the second hidden layer, and from the second hidden layer to the output layer. A bias unit links to every hidden node; it affects the output scores but doesn't interact with the actual data. In our R implementation, we represent weights and bias by matrices. The weight size is defined by

(number of neurons in layer M) X (number of neurons in layer M+1)

and weights are initialized with random numbers from rnorm. The bias is just a one-dimensional matrix with the same size as the number of neurons in the next layer, set to zero. Other initialization approaches, such as calibrating the variances with 1/sqrt(n) and sparse initialization, are introduced in the weight initialization (http://cs231n.github.io/neural-networks-2/#init) part of Stanford CS231n.
Pseudo R code:

weight.i <- 0.01 * matrix(rnorm(layer.size(i) * layer.size(i + 1)),
                          nrow = layer.size(i),
                          ncol = layer.size(i + 1))
bias.i   <- matrix(0, nrow = 1, ncol = layer.size(i + 1))
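As a side note, the 1/sqrt(n) calibration mentioned above can be sketched in the same pseudo style; layer.size is the same illustrative helper as above, and this snippet is only my sketch of the CS231n idea, not code used later in this post:

# calibrated initialization (sketch): scale by 1/sqrt(fan-in) so the output
# variance of each neuron does not grow with the number of inputs
weight.i <- matrix(rnorm(layer.size(i) * layer.size(i + 1)) / sqrt(layer.size(i)),
                   nrow = layer.size(i),
                   ncol = layer.size(i + 1))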

Another common implementation approach combines the weights and bias together, so that the dimension of the input is N+1, indicating N input features plus 1 bias, as in the code below:

weight <- 0.01 * matrix(rnorm((layer.size(i) + 1) * layer.size(i + 1)),
                        nrow = layer.size(i) + 1,
                        ncol = layer.size(i + 1))
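For completeness, a minimal sketch (my own, with illustrative names) of how the combined form is used: the input matrix is augmented with a constant column of ones, so the last row of the combined weight matrix plays the role of the bias:

# append a column of ones so the bias row is picked up by the matrix product
X.aug <- cbind(X, 1)        # dimension: N x (features + 1)
score <- X.aug %*% weight   # equivalent to X %*% W + b in the separated form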

Neuron

A neuron is the basic unit in a DNN, a biologically inspired model of the human neuron. A single neuron performs weight and input multiplication and addition (FMA), which is the same as linear regression in data science, and then the FMA result is passed to an activation function. Commonly used activation functions include sigmoid (https://en.wikipedia.org/wiki/Sigmoid_function), ReLU (https://en.wikipedia.org/wiki/Rectifier_(neural_networks)), Tanh (https://reference.wolfram.com/language/ref/Tanh.html) and Maxout. In this post, I will take the rectified linear unit (ReLU) as the activation function, f(x) = max(0, x). For other types of activation functions, you can refer here (http://cs231n.github.io/neural-networks-1/#actfun).

(http://www.parallelr.com/wp-content/uploads/2016/02/neuron.png)

In R, we can implement a neuron in various ways, such as sum(xi*wi). But a more efficient representation is matrix multiplication.
R code:

neuron.ij <- max(0, input %*% weight + bias)
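For comparison, the other activations named above can be dropped into the same place; this is just an illustrative sketch for a single neuron, assuming input, weight and bias as in the line above:

# the same FMA result passed through other common activations
z <- input %*% weight + bias
neuron.sigmoid <- 1 / (1 + exp(-z))   # sigmoid
neuron.tanh    <- tanh(z)             # tanh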

Implementation Tips

In practice, we always update all neurons in a layer with a batch of examples for performance reasons. Thus, the above code will not work correctly.

1) Matrix Multiplication and Addition

As the code below shows, input %*% weights and bias have different dimensions, so they can't be added directly. Two solutions are provided. The first one repeats the bias for every row of the result; however, it wastes lots of memory for big data input. Therefore, the second approach is better.

# dimension: 2x2
input   <- matrix(1:4, nrow = 2, ncol = 2)
# dimension: 2x3
weights <- matrix(1:6, nrow = 2, ncol = 3)
# dimension: 1x3
bias    <- matrix(1:3, nrow = 1, ncol = 3)
# doesn't work since the dimensions are unmatched
input %*% weights + bias
# Error in input %*% weights + bias : non-conformable arrays

# solution 1: repeat bias, aligned to 2x3
s1 <- input %*% weights + matrix(rep(bias, each = nrow(input)), nrow = nrow(input))

# solution 2: sweep addition
s2 <- sweep(input %*% weights, 2, bias, '+')

all.equal(s1, s2)
# [1] TRUE

2) Element-wise max value for a matrix

Another trick here is to replace max with pmax to get element-wise maximum values instead of a global one, and be careful about the argument order in pmax 🙂

# the original matrix
> s1
     [,1] [,2] [,3]
[1,]    8   17   26
[2,]   11   24   37

# max returns the global maximum
> max(0, s1)
[1] 37

# s1 is aligned with the scalar, so the matrix structure is lost
> pmax(0, s1)
[1]  8 11 17 24 26 37

# correct
# put the matrix first; the scalar will be recycled to match the matrix structure
> pmax(s1, 0)
     [,1] [,2] [,3]
[1,]    8   17   26
[2,]   11   24   37

Layer

– Input Layer

The input layer is relatively fixed, with only 1 layer, and the number of units is equal to the number of features in the input data.

– Hidden Layers

Hidden layers vary a lot and are the core component of a DNN. In general, more hidden layers are needed to capture the desired patterns when the problem is more complex (non-linear).

– Output Layer

The units in the output layer most commonly have no activation function, because they are usually taken to represent the class scores in classification and arbitrary real-valued numbers in regression. For classification, the number of output units matches the number of categories to predict, while there is only one output node for regression.
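As a quick illustration of how these sizes are read off a dataset (my own snippet, using the iris data introduced in the next section):

# input layer: one unit per feature; output layer: one unit per class
D <- ncol(iris) - 1           # 4 input features
K <- nlevels(iris$Species)    # 3 output units for classification
# a regression network would instead use a single output unit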

Build Neural Network: Architecture, Prediction, and Training

So far, we have covered the basic concepts of a deep neural network, and now we are going to build one, which includes determining the network architecture, training the network, and then predicting new data with the learned network. To keep things simple, we use a small data set, Edgar Anderson's Iris Data (iris (https://en.wikipedia.org/wiki/Iris_flower_data_set)), to do classification with a DNN.

Network Architecture

IRIS is a well-known built-in dataset in stock R for machine learning, so you can take a look at it with summary at the console directly, as below.
R code:

summary(iris)


Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500

From the summary, there are four features and three categories of Species, so we can design a DNN architecture as below.

(http://www.parallelr.com/r-deep-neural-network-from-scratch/iris_network/)

And then we will keep our DNN model in a list, which can be used for retraining or prediction, as below. Actually, we can keep more interesting parameters in the model with great flexibility.

R output:

List of 7
$ D : int 4
$ H : num 6
$ K : int 3
$ W1: num [1:4, 1:6] 1.34994 1.11369 -0.57346 -1.12123 -0.00107 ...
$ b1: num [1, 1:6] 1.336621 -0.509689 -0.000277 -0.473194 0 ...
$ W2: num [1:6, 1:3] 1.31464 -0.92211 -0.00574 -0.82909 0.00312 ...
$ b2: num [1, 1:3] 0.581 0.506 -1.088
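This listing is the str() output of a trained model; it is shown here for reference (the ir.model object is produced by the train.dnn() call in the testing section below, so this is a forward reference rather than code runnable at this point of the post):

# inspect the learned parameters of a trained model (see the testing section)
str(ir.model)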

Prediction

Prediction, also called classification or inference in the machine learning field, is concise compared with training: it walks through the network layer by layer from input to output by matrix multiplication. In the output layer, the activation function isn't needed. For classification, the probabilities are calculated by softmax (https://en.wikipedia.org/wiki/Softmax_function), while for regression the output represents the predicted real value. This process is called feed forward or forward propagation.

R code:

# Prediction
predict.dnn <- function(model, data = X.test) {
  # new data, transfer to matrix
  new.data <- data.matrix(data)

  # Feed Forward
  hidden.layer <- sweep(new.data %*% model$W1, 2, model$b1, '+')
  # neurons : Rectified Linear
  hidden.layer <- pmax(hidden.layer, 0)
  score <- sweep(hidden.layer %*% model$W2, 2, model$b2, '+')

  # Loss Function: softmax
  score.exp <- exp(score)
  probs <- sweep(score.exp, 1, rowSums(score.exp), '/')

  # select the class with max probability
  labels.predicted <- max.col(probs)
  return(labels.predicted)
}
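A small usage note of my own (not from the original post): predict.dnn() returns the column index of the most probable class, i.e. an integer in 1..K, so for iris the predictions can be mapped back to species names via the factor levels. Assuming ir.model is the model trained in the testing section below:

# hypothetical usage: map integer predictions back to factor levels
pred <- predict.dnn(ir.model, iris[, 1:4])
head(levels(iris$Species)[pred])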

Training

Training is searching for the optimal parameters (weights and bias) under the given network architecture so as to minimize the classification error or residuals. This process includes two parts: feed forward and back propagation. Feed forward goes through the network with the input data (as in the prediction part) and then computes the data loss in the output layer by a loss function (https://en.wikipedia.org/wiki/Loss_function) (cost function). "Data loss measures the compatibility between a prediction (e.g. the class scores in classification) and the ground truth label." In our example code, we selected the cross-entropy function to evaluate the data loss; see the details here (http://cs231n.github.io/neural-networks-2/#losses).
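Written out explicitly (my own transcription of what the training code below computes, not a formula from the original post), the total loss combines the averaged cross-entropy data loss with an L2 regularization term, where N is the batch size, p_{i,y_i} is the predicted probability of the true class of example i, and reg is the regularization rate:

L = -\frac{1}{N}\sum_{i=1}^{N}\log p_{i,\,y_i} \;+\; \frac{\mathrm{reg}}{2}\left(\lVert W_1\rVert_2^2 + \lVert W_2\rVert_2^2\right)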

After getting the data loss, we need to minimize it by changing the weights and bias. The most popular method is to back-propagate the loss into every layer and neuron by gradient descent (https://en.wikipedia.org/wiki/Gradient_descent) or stochastic gradient descent (https://en.wikipedia.org/wiki/Stochastic_gradient_descent), which requires the derivatives of the data loss with respect to each parameter (W1, W2, b1, b2). Back propagation (http://colah.github.io/posts/2015-08-Backprop/) differs for different activation functions; see here (http://mochajl.readthedocs.org/en/latest/user-guide/neuron.html) and here (http://like.silk.to/studymemo/ChainRuleNeuralNetwork.pdf) for their derivative formulas and methods, and Stanford CS231n (http://cs231n.github.io/neural-networks-3/) for more training tips.

In our example, the point-wise derivative for ReLU is 1 where the input is positive and 0 otherwise:

(http://www.parallelr.com/r-deep-neural-network-from-scratch/class-relu/)

R code:
# Train: build and train a 2-layer neural network
train.dnn <- function(x, y, traindata = data, testdata = NULL,
                      # set hidden layers and neurons
                      # currently, only support 1 hidden layer
                      hidden = c(6),
                      # max iteration steps
                      maxit = 2000,
                      # delta loss
                      abstol = 1e-2,
                      # learning rate
                      lr = 1e-2,
                      # regularization rate
                      reg = 1e-3,
                      # show results every 'display' steps
                      display = 100,
                      random.seed = 1)
{
  # to make the case reproducible
  set.seed(random.seed)

  # total number of training examples
  N <- nrow(traindata)

  # extract the data and label
  # don't need attribute
  X <- unname(data.matrix(traindata[, x]))
  Y <- traindata[, y]
  if (is.factor(Y)) { Y <- as.integer(Y) }
  # updated: 10.March.2016: create index for both row and col
  Y.len   <- length(unique(Y))
  Y.set   <- sort(unique(Y))
  Y.index <- cbind(1:N, match(Y, Y.set))

  # number of input features
  D <- ncol(X)
  # number of categories for classification
  K <- length(unique(Y))
  H <- hidden

  # create and init weights and bias
  W1 <- 0.01 * matrix(rnorm(D * H), nrow = D, ncol = H)
  b1 <- matrix(0, nrow = 1, ncol = H)

  W2 <- 0.01 * matrix(rnorm(H * K), nrow = H, ncol = K)
  b2 <- matrix(0, nrow = 1, ncol = K)

  # use all train data to update weights since it's a small dataset
  batchsize <- N
  # updated: March 17. 2016
  # init loss to a very big value
  loss <- 100000

  # Training the network
  i <- 0
  while (i < maxit && loss > abstol) {

    # iteration index
    i <- i + 1

    # forward ....
    # 1 indicates row, 2 indicates col
    hidden.layer <- sweep(X %*% W1, 2, b1, '+')
    # neurons : ReLU
    hidden.layer <- pmax(hidden.layer, 0)
    score <- sweep(hidden.layer %*% W2, 2, b2, '+')

    # softmax
    score.exp <- exp(score)
    probs <- sweep(score.exp, 1, rowSums(score.exp), '/')

    # compute the loss
    corect.logprobs <- -log(probs[Y.index])
    data.loss <- sum(corect.logprobs) / batchsize
    reg.loss  <- 0.5 * reg * (sum(W1 * W1) + sum(W2 * W2))
    loss <- data.loss + reg.loss

    # display results and update model
    if (i %% display == 0) {
      if (!is.null(testdata)) {
        model <- list(D = D,
                      H = H,
                      K = K,
                      # weights and bias
                      W1 = W1,
                      b1 = b1,
                      W2 = W2,
                      b2 = b2)
        labs <- predict.dnn(model, testdata[, -y])
        # updated: 10.March.2016
        accuracy <- mean(as.integer(testdata[, y]) == Y.set[labs])
        cat(i, loss, accuracy, "\n")
      } else {
        cat(i, loss, "\n")
      }
    }

    # backward ....
    dscores <- probs
    dscores[Y.index] <- dscores[Y.index] - 1
    dscores <- dscores / batchsize

    dW2 <- t(hidden.layer) %*% dscores
    db2 <- colSums(dscores)

    dhidden <- dscores %*% t(W2)
    dhidden[hidden.layer <= 0] <- 0

    dW1 <- t(X) %*% dhidden
    db1 <- colSums(dhidden)

    # update ....
    dW2 <- dW2 + reg * W2
    dW1 <- dW1 + reg * W1

    W1 <- W1 - lr * dW1
    b1 <- b1 - lr * db1

    W2 <- W2 - lr * dW2
    b2 <- b2 - lr * db2
  }

  # final results
  # create list to store learned parameters
  # you can add more parameters for debug and visualization
  # such as residuals, fitted.values ...
  model <- list(D = D,
                H = H,
                K = K,
                # weights and bias
                W1 = W1,
                b1 = b1,
                W2 = W2,
                b2 = b2)

  return(model)
}

Testing and Visualization

We have built a simple 2-layer DNN model, and now we can test it. First, the dataset is split into two parts for training and testing, and then the training set is used to train the model while the testing set is used to measure the generalization ability of our model.

R code:

########################################################################
# testing
#######################################################################
set.seed(1)

# 0. EDA
summary(iris)
plot(iris)

# 1. split data into test/train
samp <- c(sample(1:50, 25), sample(51:100, 25), sample(101:150, 25))

# 2. train the model
ir.model <- train.dnn(x = 1:4, y = 5, traindata = iris[samp, ], testdata = iris[-samp, ],
                      hidden = 6, maxit = 2000, display = 50)

# 3. prediction
labels.dnn <- predict.dnn(ir.model, iris[-samp, -5])

# 4. verify the results
table(iris[-samp, 5], labels.dnn)
#             labels.dnn
#               1  2  3
# setosa      25  0  0
# versicolor   0 24  1
# virginica    0  0 25

# accuracy
mean(as.integer(iris[-samp, 5]) == labels.dnn)
# 0.98

The data loss on the training set and the accuracy on the test set are shown below:

(http://www.parallelr.com/r-deep-neural-network-from-scratch/iris_loss_accuracy-2/)
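If you want to reproduce such a curve yourself, one simple approach (my own sketch, not part of the original code) is to capture the "iteration loss accuracy" lines that train.dnn() prints with cat() at every display step and turn them into a data frame; samp is assumed to be the train/test split defined in the code above:

# capture the progress lines printed by train.dnn() and plot them
log.lines <- capture.output(
  ir.model <- train.dnn(x = 1:4, y = 5, traindata = iris[samp, ],
                        testdata = iris[-samp, ], hidden = 6,
                        maxit = 2000, display = 50)
)
log.df <- read.table(text = log.lines,
                     col.names = c("iteration", "loss", "accuracy"))
par(mfrow = c(1, 2))
plot(log.df$iteration, log.df$loss,     type = "l", xlab = "iteration", ylab = "data loss")
plot(log.df$iteration, log.df$accuracy, type = "l", xlab = "iteration", ylab = "test accuracy")
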
Then we compare our DNN model with the 'nnet' package, as in the code below.

library(nnet)
ird <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
                  species = factor(c(rep("s", 50), rep("c", 50), rep("v", 50))))
ir.nn2 <- nnet(species ~ ., data = ird, subset = samp, size = 6, rang = 0.1,
               decay = 1e-2, maxit = 2000)

labels.nnet <- predict(ir.nn2, ird[-samp, ], type = "class")
table(ird$species[-samp], labels.nnet)
#   labels.nnet
#     c  s  v
# c  22  0  3
# s   0 25  0
# v   3  0 22

# accuracy
mean(ird$species[-samp] == labels.nnet)
# 0.96

Update: 4/28/2016:
Google's TensorFlow team released a very cool website to visualize neural networks, here (http://playground.tensorflow.org/). And we have covered almost all of the techniques from Google's website in this blog, so you can build it up with R as well 🙂

(http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.86675&showTestData=false&discretize=false&percTrainD)

Summary

In this post, we have shown how to implement an R neural network from scratch. But the code only implements the core concepts of a DNN, and the reader can practice further by:

Solving other classification problems, such as a toy case here (http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/)
Selecting various hidden layer sizes, activation functions, and loss functions
Extending the single-hidden-layer network to multiple hidden layers
Adjusting the network to solve regression problems
Visualizing the network architecture, weights, and bias with R; an example is here (https://beckmw.wordpress.com/tag/nnet/).

In the next post (http://www.parallelr.com/r-dnn-parallel-acceleration/), I will introduce how to accelerate this code using multicore CPUs and NVIDIA GPUs.

Notes:
1. The entire source code of this post is here (https://github.com/PatricZhao/ParallelR/blob/master/ParDNN/iris_dnn.R)
2. The PDF version of this post is here (http://www.parallelr.com/materials/2_DNN/ParallelR_R_DNN_1_BUILD_FROM_SCRATCH.pdf)
3. Pretty R syntax in this blog was created by inside-R.org (http://blog.revolutionanalytics.com/2016/08/farewell-inside-rorg.html) (acquired by MS and now defunct)


55 COMMENTS

gary says:
February 19, 2016 at 5:15 am

Nice, very helpful.
Looking forward to more blogs.
up up up!

Javid says:
July 13, 2017 at 12:49 pm

Dear Peng,
Great work to learn... Appreciated... I just want to ask: does it work with panel data? If so, how should the data be structured?
Best regards

Richard Paris says:
February 22, 2016 at 9:07 pm

Great work. Also looking forward to more blogs in this super interesting field!

Charles says:
March 9, 2016 at 3:05 pm

Great post!
Small error/typo in the code for the predict function: the function name should be 'predict.dnn', not just 'predict'.
Thanks again.
Peng Zhao says:
March 10, 2016 at 2:36 am

Hi Charles,
Really thanks for your comments.
Good catch. Fixed the 'predict' name to 'predict.dnn'.
Thanks again for reading 🙂
–Peng

VladMinkov says:
May 6, 2016 at 8:52 am

The neural network in this article has nothing to do with a deep neural network. This is a common MLP. Only 2 packages ("darch", "deepnet") actually create deep neural networks, initialized by Stacked Autoencoders and Stacked RBMs. All the others create a simple neural network with regularization and the usual initialization of the neuron weights.
Obviously you do not quite understand the term "deep".

Peng says:
May 10, 2016 at 3:42 am

Hi VladMinkov,
Thanks for your comments. Great point about the difference between MLP and DNN; there are several nice answers here:
https://www.quora.com/How-is-deep-learning-different-from-multilayer-perceptron

In this blog, I intend to keep the code as simple as possible to understand, so only 1 hidden layer is used and we initialized the weights/bias with random numbers. The construction of the DNN model mainly follows the online course CS231n:
http://cs231n.github.io/neural-networks-3/
As you mentioned, the initialization methods of Stacked Autoencoders and Stacked RBMs (or other parts of a DNN) can be implemented in this example R code, which is relatively flexible compared with mature packages. And that is one of the major reasons why we need to build a DNN from scratch rather than using CRAN packages.

The example code of this blog is on GitHub, and I will be very happy if you can improve the code with what a real DNN has, as you mentioned, so that more readers can learn it clearly from the code.
https://github.com/PatricZhao/ParallelR/tree/master/ParDNN
Thanks again for your informative notes.

wedding t shirts says:
May 9, 2016 at 3:10 am

I love it when people get together and share ideas. Great site, stick with it!

tony says:
May 9, 2016 at 5:24 am

I really like what you guys tend to be up to. This kind of clever work and exposure! Keep up the good work guys, I've included you in our blogroll.

Peng says:
May 9, 2016 at 5:30 am

Hi,
Thank you very much for your encouragement and blogrolling.
We'll keep going to make our work better.
BR,
–Peng

VladMinkov says:
May 10, 2016 at 9:04 am

Hi Peng,
I'm not a programmer, I'm a practitioner. It is important for me to know the basic principles of the algorithm without unnecessary gory details.
Have a look at a few of my articles here: https://www.mql5.com/en/articles/1103 and https://www.mql5.com/en/articles/2029 and https://www.mql5.com/en/articles/1628.
Best regards.

Peng says:
May 10, 2016 at 9:17 am

Hi VladMinkov,
Very great articles, and I think I need to refresh my knowledge of DNN.
You are welcome to leave further valuable comments on our site.

BR,
–Peng

precious basaldua says:
May 15, 2016 at 1:24 am

Great post. I was checking this blog constantly and I'm impressed! Very helpful information, particularly the closing part 🙂 I care about such info a lot. I was seeking this particular info for a very long time.
Thanks and good luck.

bastcilkdoptb says:
May 18, 2016 at 12:34 pm

What's up, very cool web site!! Guy .. Beautiful .. Wonderful .. I'll bookmark your website and take the feeds additionally... I'm happy to find so much useful info here in the post, we need to develop extra strategies in this regard, thanks for sharing.

furtdsolinopv says:
May 19, 2016 at 7:53 am

You have done an excellent job. I will certainly dig it and personally suggest it to my friends. I'm sure they'll benefit from this site.

hendrik says:
June 1, 2016 at 1:13 pm

How do we make it work for regression?
Here is an overview of my dataset:
Suppose my train data has 4 columns and 288 rows:
col 1 = data from two days ago, col 2 = data from the previous day, col 3 = data now, col 4 = response variable (target).
I have tried this code with
dtaipei.model # accuracy
> mean(as.integer(dtaipei[-samp, 4]) == preddtaipei.dnn)
[1] 0.01388889

What should I do?

Peng says:
June 2, 2016 at 5:20 am

Hi Hendrik,
Thanks for reading my blog and practicing with real data.
This is a very good question, and it's not very difficult to switch from the current classification network to a regression network.

Actually, the major changes include:

1. Activation function from ReLU to tanh or sigmoid;
2. Loss function from softmax to mean squared error or absolute error.
I think you can get the correct code after a little warm-up, and you are welcome to check your code into the ParallelR GitHub.
On the other hand, I will release the regression code later in case you have issues building the regression NN.
Thanks for your comments again.
–Peng
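For readers following this thread, here is a rough sketch of my own (not code from the post) of what those two changes look like in the notation of train.dnn above; W1, b1, W2, b2, X and batchsize are as in the training function, and Y is now assumed to be a numeric target vector rather than a factor:

# forward pass for regression: tanh hidden layer, single linear output, no softmax
hidden.layer <- tanh(sweep(X %*% W1, 2, b1, '+'))
score <- sweep(hidden.layer %*% W2, 2, b2, '+')
# mean squared error instead of cross-entropy (the backward pass changes accordingly)
data.loss <- sum((score - Y)^2) / (2 * batchsize)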

hendrik says:
June 8, 2016 at 7:29 am

Thanks for the reply...
After I tried it for regression, I came up with some further questions:
– How do we modify the number of hidden layers, from 2 to 5 layers, instead of a fixed number of hidden layers? Does a 2-hidden-layer architecture guarantee the best structure of the network? The idea is to perform grid search or random search hyper-parameter optimization to look for the best architecture, for example:
– hidden (try 2 to 5 layers deep, 10 to 2000 neurons per layer)
– rectifier / rectifier with dropout
– adaptive learning rate, etc.
– Can we consider your network architecture deep learning? Since DNN is in the family of deep learning, deep learning usually, as a first step, uses unsupervised pre-training (RBM, AE, SAE, CRBM) to learn features, followed by supervised training at the top (fine-tuning, backpropagation, etc.)

Correct me if I was wrong...

I'm not an expert, I still need to learn many things.
Thanks in advance...

Peng says:
June 10, 2016 at 3:23 am

That's great to go further in DNN 🙂
1) The fixed 1 hidden layer (3 layers in total) in the article is only to simplify the code and make it shorter.
In practice, people prefer to use many hidden layers (ranging from a few layers to hundreds of layers) to capture complicated patterns in the data; however, too many layers make it prone to over-fitting. Thus, it's a tradeoff between a shallow and a deep network, and you can refer to this paper as well (https://arxiv.org/pdf/1312.6184.pdf). Dropout and adaptive LR are commonly used in modern networks.

Regarding which kind of NN structure is better, it really depends on the problem and the input data, and I think this is the real state-of-the-art in DNN and very time-consuming 🙂
Going back to the code, the forward and back propagation of a multiple-layer network are very similar to the 1-layer case. The only thing for you to do is add a loop. Take 2 hidden layers, for example, in the code below. Actually, you need to generalize this piece of code by the number of input parameters, so that hidden=c(3,3,3) stands for 3 hidden layers, each with 3 neurons:

############# Forward ######################
# input to layer1
hidden.layer1 <- sweep(X %*% W1, 2, b1, '+')
hidden.layer1 <- pmax(hidden.layer1, 0)
# layer1 to layer2
hidden.layer2 <- sweep(hidden.layer1 %*% W2, 2, b2, '+')
hidden.layer2 <- pmax(hidden.layer2, 0)
# layer2 to output layer
score <- sweep(hidden.layer2 %*% W3, 2, b3, '+')

############ Backward #####################
dscores <- probs
dscores[Y.index] <- dscores[Y.index] - 1
dscores <- dscores / batchsize
dW3 <- t(hidden.layer2) %*% dscores
db3 <- colSums(dscores)
dhidden2 <- dscores %*% t(W3)
dhidden2[hidden.layer2 <= 0] <- 0
dW2 <- t(hidden.layer1) %*% dhidden2
db2 <- colSums(dhidden2)
dhidden1 <- dhidden2 %*% t(W2)
dhidden1[hidden.layer1 <= 0] <- 0
dW1 <- t(X) %*% dhidden1
db1 <- colSums(dhidden1)

2) Actually, I am not a statistician 🙁

My understanding mainly comes from the online course: http://cs231n.github.io/
You can refer to the materials from VladMinkov in the comments above for what deep learning really is, or the slides below:
ftp://ftp.dca.fee.unicamp.br/pub/docs/vonzuben/ia353_1s15/topico10_IA353_1s2015.pdf
Thanks

Lola Junge says:
June 4, 2016 at 2:15 pm

Sweet blog! I found it while surfing around on Yahoo News. Do you have any tips on how to get listed in Yahoo News? I've been trying for a while but I never seem to get there! Many thanks

Peng says:
June 6, 2016 at 7:43 am

Hi Lola,
Thanks. Though this is a little off topic, I'm very happy to know the blog has been listed on Yahoo.
Actually, I don't know the exact reasons, but I think a well-baked article with rich technical instruction will be favored by readers.
Good Luck!

Yaeko Kahawai says:
June 14, 2016 at 1:29 am

Wow, wonderful weblog format! How long have you been blogging? You make running a blog look easy. The full look of your site is great, let alone the content!

WilliamCavy says:
June 21, 2016 at 5:45 am

I really liked your article post. Much thanks again. Keep writing. Satre

eebest8 says:
July 6, 2016 at 4:05 am

"Thank you ever so for your blog post. Really looking forward to reading more."

Harshith says:
August 11, 2016 at 10:12 am

In which package is the function layer.size present? I'm getting the error "could not find the function 'layer.size'".

Peng says:
August 11, 2016 at 1:35 pm

Hi Harshith,
Sorry for the confusion. Actually, 'layer.size' is a fake function I used to illustrate how to initialize the weights and bias.
When we define the network architecture, we pass the hidden layer size to the train.dnn function with 'hidden=c(3)', and we can save all layer sizes in a vector such as 'size.nn <- c(D, hidden, K)'. So 'layer.size' can be as simple as: layer.size <- function(x) { size.nn[x] }
Hope this is helpful 🙂 And I will try to refine the post later to make it clearer.

Thanks for your reading and very great feedback.

George Chao says:
July 14, 2017 at 8:05 am

Hi Zhao Peng,
I think the bias is
`bias.i <- matrix(0, nrow=1, ncol = layer.size(i))`

George Chao says:
July 14, 2017 at 8:05 am

Sorry, `bias.i <- matrix(0, nrow=1, ncol = layer.size(i+1))`

Matrix Chen says:
July 15, 2017 at 12:38 am

Hi George,
Really thanks for your comments.
Good catch. Fixed 'layer.size(i)' to 'layer.size(i+1)'.
Thanks again for reading 🙂

kenny192 says:
September 11, 2016 at 1:27 pm

Really thank you for the good post!!
I have a few questions about the 'nnet' package.
Why doesn't 'nnet' use a learning rate? Is the learning rate set randomly?

Peng says:
September 12, 2016 at 2:38 am

kenny192, thanks for reading 🙂

Updated answer:

Actually, nnet doesn't use a learning rate, as Brian D. Ripley said:
https://stat.ethz.ch/pipermail/r-help/2005-May/071822.html
"I think you need to read the book nnet supports. It does not do on-line learning and does not have a learning rate."
And NN is in Chapter 8 of the book from Brian, below:
ftp://ftp.cs.ntust.edu.tw/yang/R-BOOK/Venables,%20WN%20and%20Ripley,%20BD(2002)%20Modern%20Applied%20Statistics%20with%20S,%204th%20Ed.pdf

More info in:
http://stackoverflow.com/questions/9390337/purpose-of-decay-parameter-in-nnet-function-in-r
http://stats.stackexchange.com/questions/29130/difference-between-neural-net-weight-decay-and-learning-rate

kenny192 says:
September 12, 2016 at 4:29 am

Thank you for the reply. So you mean the 'nnet' package only has a decay parameter? And how is the decay parameter used to set the learning rate? Kind regards,

Peng Zhao says:
September 12, 2016 at 6:23 am

Sorry for the inaccurate answer previously; I have modified it.

Actually, an LR is not needed in nnet, and I am trying to understand the weight updating of nnet. I will reply to you when I figure out more details.
Good questions for further learning 🙂
kenny192 says:
September 12, 2016 at 6:44 am

Thank you for the reply.
I think the learning rate is already included in the weight update inside the 'nnet' package. Check section 6.8.11 of this book:
http://cns-classes.bu.edu/cn550/Readings/duda-etal-00.pdf

Peng Zhao says:
September 12, 2016 at 6:55 am

Thanks for the great book and info. Very helpful!

nnet uses the "weight decay" method to update all weights by:

Wnew = Wold * (1 − decay)

And the source code of nnet can be found at:
https://github.com/cran/nnet/blob/master/src/nnet.c

kenny192 says:
September 12, 2016 at 7:17 am

Thank you, that is also very helpful.
Kind regards

kenny192 says:
November 11, 2016 at 12:38 pm

Hello, Peng 🙂 I have a question about overfitting in a neural network model. In general, overfitting means the validation error increases while the training error steadily decreases, right? But if the validation error converges (neither increasing nor decreasing) at a particular point while the training error steadily decreases, is this case also overfitting?

Donovan Kessner says:
November 20, 2016 at 6:38 am
Superb web page you’ve gotten there.

tope says:
December 28, 2016 at 2:28 pm
str(ir.model)
produces the error below in the R terminal:
Error in str(ir.model) : object ‘ir.model’ not found

tope says:
December 28, 2016 at 11:52 pm
N <- nrow(traindata)
is giving me an error:
Error in nrow(traindata) : object 'traindata' not found

Peng (http://www.parallelr.com/) says:
December 30, 2016 at 12:44 pm
Hi Tope,
Thanks for your interest in our blog; any questions are welcome.
Regarding your problems:
1. str(ir.model): you need to train the model first, as mentioned in the blog:
ir.model <- train.dnn(x=1:4, y=5, traindata=iris[samp,], testdata=iris[-samp,], hidden=6, maxit=2000, display=50)
2. traindata is a variable local to the function, so you can’t execute that line on its own. But you can assign real data first, as below:
> samp <- c(sample(1:50,25), sample(51:100,25), sample(101:150,25))
> traindata <- iris[samp,]
> nrow(traindata)
[1] 75
BTW, the whole R code is on GitHub (https://github.com/PatricZhao/ParallelR); you can try to run it and then debug it step by step.

Michael says:
January 31, 2017 at 8:36 pm
Nice blog, I like it, thank you so much… this will be helpful for my students’ education.

Brian says:
February 14, 2017 at 3:41 pm
Hi, I’m confused about how you would convert this into an NN with two or more hidden layers. Any chance you could help me out? Fantastic article, I learnt so much.
Thanks!

Peng (http://www.parallelr.com/) says:
February 15, 2017 at 11:46 am
Hi Brian,
Thanks for your interest in the blog. Please see my reply to hendrik on June 10, 2016 at 3:23 am 🙂
Thanks

Sharon Grace says:
May 9, 2017 at 1:43 pm
Hi, I am working on a project called house rent prediction using artificial neural networks. Please assist with the coding.

jdxbx says:
July 2, 2017 at 8:10 am
Very nice work!
Sorry, I cannot find data1. Where is data1?
Thank you very much!

Peng (http://www.parallelr.com/blog) says:
July 3, 2017 at 12:17 am
We use the built-in ‘iris’ dataset rather than ‘data1’.
Thanks

Patricio says:
July 20, 2017 at 8:58 pm
Hi, really amazing blog!
I am trying to modify the code for a single node in the output layer (with other data), but I could not get it to work. Could you help me?

Matrix Chen says:
July 22, 2017 at 2:07 pm
Hi Patricio,
Thanks a lot for your comments.
I think you are trying to switch from the current classification network to a regression network.
The major changes include:
1. The activation function, from ReLU to tanh or sigmoid;
2. The loss function, from softmax to mean squared error or absolute error.
You are welcome to share your code with me; it is a great honor to help you. 🙂
Thanks again for your comments.
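
A rough sketch of the two changes above in R (hypothetical helper names, not the post's code): a tanh hidden activation in place of ReLU, and a mean-squared-error loss in place of the softmax cross-entropy.

# Hypothetical helpers for a regression variant (not the code from the post).
hidden_act <- function(z) tanh(z)                      # replaces ReLU: pmax(z, 0)
mse_loss   <- function(y, y_hat) mean((y - y_hat)^2)   # replaces softmax cross-entropy

# quick check with made-up numbers
hidden_act(c(-1, 0, 2))
mse_loss(c(0.5, 1.2), c(0.4, 1.0))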

Genwei Zhang says:
July 31, 2017 at 3:38 am
Dear Dr. Zhao,
This blog is excellent for me!
I tested the classification results, and they worked very well, just as you showed.
Now I am facing a task to fit an NN to a regression dataset. You mentioned in one of the comments above that switching from classification to regression is not very hard, but I tried to modify the code and it did not work well on my end. I am wondering whether you posted another blog somewhere about regression NNs, or could you kindly provide me with some details? I would really appreciate your help.
David Zhang,
Graduate Student,
University of Oklahoma

Matrix Chen says:
July 31, 2017 at 3:50 pm
Hi David,
Thanks a lot for your comments.
To switch from the current classification network to a regression network, the major changes are:
1. The activation function, from ReLU to tanh or sigmoid. Note that it is usually unnecessary to apply an activation function on the output layer;
2. The loss function, from softmax to mean squared error or absolute error;
3. The backpropagation step.
BTW, the regression NN code can be found at:
https://github.com/PatricZhao/ParallelR/tree/master/ParDNN
You can try to run it and then debug it step by step.
Thanks again for your comments.
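
For change 3 above, a minimal sketch of the output-layer backpropagation under a mean-squared-error loss with a linear output (illustrative names, not the repository code): the output error term reduces to y_hat − y, which then feeds the usual weight and bias gradients.

# Sketch of the output-layer gradients for a regression head (illustrative, not the repo code).
# hidden: N x H matrix of hidden activations; W2: H x 1 output weights; b2: output bias; y: length-N targets.
backprop_output_mse <- function(hidden, W2, b2, y) {
  y_hat <- hidden %*% W2 + b2          # linear output, no activation on the output layer
  delta <- (y_hat - y) / nrow(hidden)  # dLoss/dy_hat for loss = mean((y_hat - y)^2) / 2
  list(dW2     = t(hidden) %*% delta,  # gradient w.r.t. output weights
       db2     = sum(delta),           # gradient w.r.t. output bias
       dhidden = delta %*% t(W2))      # error passed back to the hidden layer
}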

Genwei Zhang says:
July 31, 2017 at 5:57 pm
Hi, thank you so much for your reply. I will try that out!

Flo-Bow says:
September 21, 2017 at 3:46 pm
Hi Dr. Zhao,
Thank you for sharing your knowledge, but this part of your R code does not work for me:
weight.i <- 0.01*matrix(rnorm(layer.size(i)*layer.size(i+1)),
                        nrow=layer.size(i),
                        ncol=layer.size(i+1))
What can I do?
Thanks in advance for replying.
Cheers, Flo-Bow

Matrix Chen says:
September 22, 2017 at 4:44 am
Hi Flo-Bow,
Thanks a lot for your comments.
This line just shows that the weight size is defined by the number of neurons in layer M and the number of neurons in layer M+1. You can complete the weight-initialization part according to this rule on your own. You can also refer to the entire source code of this post:
https://github.com/PatricZhao/ParallelR/blob/master/ParDNN/iris_dnn.R
You can try to run it and then debug it step by step.
Thanks again for your comments.
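
A minimal, self-contained version of that initialization rule (the layer sizes below are example values, not the post's exact code): each weight matrix gets one row per neuron in layer M and one column per neuron in layer M+1.

# Illustrative completion of the initialization rule described above.
set.seed(1)
layer.sizes <- c(4, 6, 3)   # e.g. input, hidden, and output neurons
weights <- vector("list", length(layer.sizes) - 1)
for (i in seq_len(length(layer.sizes) - 1)) {
  weights[[i]] <- 0.01 * matrix(rnorm(layer.sizes[i] * layer.sizes[i + 1]),
                                nrow = layer.sizes[i],       # neurons in layer i
                                ncol = layer.sizes[i + 1])   # neurons in layer i + 1
}
sapply(weights, dim)        # 4x6 and 6x3, matching consecutive layer sizes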

Mauricio says:
April 10, 2020 at 9:53 am
As a site owner myself, I believe the content here is really wonderful; I appreciate your hard work. You should keep it up forever!
Best of luck.

2 TRACKBACKS
R for Deep Learning (II): Achieve High-Performance DNN with Parallel Acceleration – ParallelR (http://www.parallelr.com/r-dnn-parallel-acceleration/)
R for Deep Learning (III): CUDA and MultiGPUs Acceleration – ParallelR (http://www.parallelr.com/r-dnn-cuda-multigpu/)
