Assignment_Week6_AI4ICPS

Assignment - Week 6 - Reinforcement and Deep Learning
Name: MOHIT SHUKLA

Email: mohit.shukla1984@gmail.com
Question-1. For the code given in this GitHub repository

(https://github.com/Sabyasachi123276/Reinforce-Learning-Tutorials/blob/main/Reinforce_Learning_Tutorial_1.ipynb):
(A) Change the edges as below:
[(0, 5), (0, 1), (2, 3), (5, 4), (1, 2),
(1, 3), (9, 10), (2, 4), (2, 6), (6, 7),
(5, 9), (2, 9), (3, 8), (3, 10)]
(B) Change the goal to 7
(C) Comment on the most efficient path & generate a plot of Reward Gained vs No of Iterations
Answer-1
I have placed all my code in the following public Google Colab notebook:
https://colab.research.google.com/drive/1iotHTXYPicOJwaHMxQuxyLFAyOhb5yLq?usp=sharing
Comment on most efficient path

● The most efficient path generated was [0, 1, 2, 6, 7]
● This is logical and in line with the initial graph generated for the given set of edges
● If we look at that graph (Figure A in the next section), moving from “0” (the “current_state” with which we started testing) to 1, then 2, then 6, then 7 is indeed the shortest route: node 7 can only be reached through node 6, node 6 is reached from node 2, and 0 → 1 → 2 is the only two-hop route from 0 to 2
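As a quick cross-check of the path found by the Q-learning bot, the sketch below runs a plain breadth-first search over the same edge list and returns the shortest path from 0 to 7. BFS is a different technique from the notebook's Q-learning and is used here only for verification; the function name `shortest_path_bfs` is my own, not from the tutorial.

```python
from collections import deque

# Same undirected edges as in part (A)
edges = [(0, 5), (0, 1), (2, 3), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (2, 6), (6, 7),
         (5, 9), (2, 9), (3, 8), (3, 10)]

def shortest_path_bfs(edges, start, goal):
    """Return a shortest start -> goal path as a list of nodes, or None."""
    # Build an adjacency list; each edge works in both directions
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    parent = {start: None}        # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None                   # goal not reachable

path = shortest_path_bfs(edges, 0, 7)
print(path)  # [0, 1, 2, 6, 7]
```

Because BFS explores nodes in order of distance from the start, the first time it reaches node 7 it has found a path with the fewest edges, which matches the Q-learning result.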
Plot of Reward Gained vs No of Iterations

● Below is the required graph.
● We see that the curve flattens (the reward gained becomes almost constant) after about 500 iterations. I think this simply means that training has effectively converged: the matrix “Q” barely changes even as new random initial states are sampled, so the value returned by the “update” function stays (almost) the same, and that same value keeps getting appended to the scores array
Code for Answer 1, pasted from my Colab notebook, along with a detailed explanation of the code (which I feel is necessary, since we were given the full code and the task was to understand it)
import numpy as np
import pylab as pl
import networkx as nx
Defining and visualising the graph, but with the new edges given in the assignment
edges = [(0, 5), (0, 1), (2, 3), (5, 4), (1, 2),
(1, 3), (9, 10), (2, 4), (2, 6), (6, 7),
(5, 9), (2, 9), (3, 8), (3, 10)]
G = nx.Graph()
G.add_edges_from(edges)
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
pl.show()
Output:
FIGURE A.
The next piece of code defines the reward system for the bot, again with the new "Goal" (which is 7). M is an 11x11 matrix (one row and one column per node) that starts with -1 everywhere, meaning "no edge, not a valid move". Then, for every edge point = (x, y) in our edge list:
1. If the edge's second node (point[1]) is the goal 7, the entry M[x, 7] gets the maximum reward of 100; otherwise M[x, y] is set to 0
2. Because the graph is undirected, the reversed edge is handled the same way using M[point[::-1]]: if the edge's first node (point[0]) is 7, then M[y, 7] gets 100; otherwise M[y, x] is set to 0
Finally, M[7, 7] is also set to 100, so that staying at the goal is rewarded.
PS: Edges that do not touch the goal get a value of 0, which is still higher than -1. This keeps them available as moves, so traversal can happen along all of our edges.
goal = 7
MATRIX_SIZE = 11
M = np.matrix(np.ones(shape=(MATRIX_SIZE, MATRIX_SIZE)))
M *= -1

for point in edges:
    print(point)
    if point[1] == goal:
        M[point] = 100
    else:
        M[point] = 0

    # reverse of point
    if point[0] == goal:
        M[point[::-1]] = 100
    else:
        M[point[::-1]] = 0

M[goal, goal] = 100
print(M)
Output
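To make the reward rules concrete, here is a small self-contained sketch that rebuilds the same reward matrix with a plain NumPy array (instead of `np.matrix`) and checks a few entries: -1 for a non-edge, 0 for an ordinary edge in either direction, and 100 for a move into the goal. The helper name `build_reward_matrix` is hypothetical, not from the tutorial.

```python
import numpy as np

# Same undirected edges and goal as in the assignment
edges = [(0, 5), (0, 1), (2, 3), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (2, 6), (6, 7),
         (5, 9), (2, 9), (3, 8), (3, 10)]

def build_reward_matrix(edges, goal, size):
    """Plain-array version of the reward matrix M built above."""
    R = -np.ones((size, size))      # -1: no edge, not a valid move
    for a, b in edges:
        # Fill both directions, since the graph is undirected
        R[a, b] = 100 if b == goal else 0
        R[b, a] = 100 if a == goal else 0
    R[goal, goal] = 100             # staying at the goal is rewarded too
    return R

R = build_reward_matrix(edges, goal=7, size=11)
print(R[6])                         # node 6: 0 towards node 2, 100 towards node 7
print(R[6, 7], R[7, 6], R[0, 2])    # 100.0 0.0 -1.0
```

Note the asymmetry: moving 6 → 7 is worth 100, but moving back 7 → 6 is only 0, which is what pulls the bot towards the goal.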
Now we define a Q matrix, which starts as an all-zero matrix (a blank slate for training). The Q values are then filled in only for the actions that are available in the reward matrix M above, where we populated the initial rewards (in the goal (7) column, and at (7, 7), the state we want to reach).
● For this population (i.e. training), we define some utility functions (e.g. finding the available actions from the current row, i.e. the entries of M with reward >= 0, which are exactly our edges and goal rewards) that we will call iteratively later
● I have written detailed comments above each function for understanding purposes
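Another important concept is "gamma", the discount factor. In the update Q[s, a] = M[s, a] + gamma * max(Q[a, :]), the best value available from the next state is multiplied by gamma before being added to the immediate reward. As a result, the goal's high reward propagates backwards through the graph, shrinking by a factor of gamma with each extra step, so cells farther from the goal still get meaningful (but smaller) values. If gamma were 0, only the cells that lead directly into the goal would get non-zero values; the closer gamma is to 1, the less future rewards are discounted.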
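To see the effect of gamma numerically, the sketch below does not use the notebook's random sampling. Instead it applies the same update rule, Q[s, a] = M[s, a] + gamma * max(Q[a, :]), to every valid (state, action) pair at once and repeats until Q stops changing (a synchronous, value-iteration style sweep). It then prints the normalised Q values along the path 0 → 1 → 2 → 6 → 7. The reward matrix is rebuilt inside the block so it runs on its own, and `converged_q` is a hypothetical helper name.

```python
import numpy as np

edges = [(0, 5), (0, 1), (2, 3), (5, 4), (1, 2),
         (1, 3), (9, 10), (2, 4), (2, 6), (6, 7),
         (5, 9), (2, 9), (3, 8), (3, 10)]
goal, size, gamma = 7, 11, 0.75

# Reward matrix built with the same rules as M above
M = -np.ones((size, size))
for a, b in edges:
    M[a, b] = 100 if b == goal else 0
    M[b, a] = 100 if a == goal else 0
M[goal, goal] = 100

def converged_q(M, gamma, tol=1e-9, max_sweeps=10_000):
    """Repeat Q[s, a] = M[s, a] + gamma * max(Q[a, :]) for all valid moves."""
    valid = M >= 0                      # real edges plus the goal self-loop
    Q = np.zeros_like(M)
    for _ in range(max_sweeps):
        best_next = Q.max(axis=1)       # best_next[a] = max(Q[a, :])
        # Broadcasting puts best_next[a] into column a of every row s
        new_Q = np.where(valid, M + gamma * best_next[np.newaxis, :], 0.0)
        if np.abs(new_Q - Q).max() < tol:
            return new_Q
        Q = new_Q
    return Q

Q = converged_q(M, gamma)
Q_norm = Q / Q.max() * 100
path = [0, 1, 2, 6, 7]
for s, a in zip(path, path[1:]):
    print(f"Q[{s},{a}] = {Q_norm[s, a]:.2f}")
```

With gamma = 0.75, each step back from the goal multiplies the value by 0.75, so the printed values come out at about 42.19, 56.25, 75.00 and 100.00. Re-running with a smaller gamma makes the values fall off faster with distance from node 7.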
Q = np.matrix(np.zeros([MATRIX_SIZE, MATRIX_SIZE]))
gamma = 0.75 # discount factor
initial_state = 1
# Determines the available actions for a given state:
# returns all columns in the current row of M where the reward is >= 0
def available_actions(state):
    current_state_row = M[state, ]
    #print(current_state_row)
    available_action = np.where(current_state_row >= 0)[1]
    return available_action

available_action = available_actions(initial_state)
print(available_action)
# Chooses one of the available actions at random
def sample_next_action(available_actions_range):
    # Sample from the argument passed in, not from a global variable
    next_action = int(np.random.choice(available_actions_range))
    return next_action

action = sample_next_action(available_action)
print(action)
# Updates the Q matrix for the move (current_state -> action) that was chosen:
# 1. finds the highest Q value reachable from the next state (row "action"),
#    breaking ties at random
# 2. sets Q[current_state, action] to the immediate reward M[current_state, action]
#    plus gamma times that highest value
# Returns the sum of the normalised Q matrix, which we use as the "score"
def update(current_state, action, gamma):
    max_index = np.where(Q[action, ] == np.max(Q[action, ]))[1]
    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index))
    else:
        max_index = int(max_index[0])
    max_value = Q[action, max_index]

    Q[current_state, action] = M[current_state, action] + gamma * max_value
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0

update(initial_state, action, gamma)

Output
Now comes the final section: training and then evaluating the bot using the Q matrix. Training simply repeats three steps (find the available actions, pick one at random, apply the update step) for 1000 random initial states. Then we test how the bot traverses from 0 to "7", our reward (goal) state.
scores = []
for i in range(1000):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_action = available_actions(current_state)
    action = sample_next_action(available_action)
    score = update(current_state, action, gamma)
    scores.append(score)

print("Trained Q matrix:")
print(Q / np.max(Q) * 100)

# Testing: start at node 0 and greedily follow the highest Q value to the goal
current_state = 0
steps = [current_state]
while current_state != goal:
    next_step_index = np.where(Q[current_state, ] == np.max(Q[current_state, ]))[1]
    if next_step_index.shape[0] > 1:
        next_step_index = int(np.random.choice(next_step_index))
    else:
        next_step_index = int(next_step_index[0])
    steps.append(next_step_index)
    current_state = next_step_index

print("Most efficient path:")
print(steps)

pl.plot(scores)
pl.xlabel('No of iterations')
pl.ylabel('Reward gained')
pl.show()
Output
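One caveat with the testing loop above: if Q is under-trained and the greedy choice keeps revisiting nodes, the `while` loop can run forever. A minimal sketch of a safer variant, assuming a trained Q is available as a NumPy array or matrix, caps the number of steps and returns None when the goal is not reached. The function name `greedy_path` and the `max_steps` limit are my own choices, not part of the tutorial.

```python
import numpy as np

def greedy_path(Q, start, goal, max_steps=50, rng=None):
    """Follow the highest Q value from `start`; return the path, or None if
    the goal is not reached within `max_steps` moves."""
    rng = np.random.default_rng() if rng is None else rng
    Q = np.asarray(Q)                   # works for np.matrix as well as ndarray
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            return path
        row = Q[state]
        best = np.flatnonzero(row == row.max())   # all actions tied for the max
        state = int(rng.choice(best))             # break ties at random
        path.append(state)
    return path if state == goal else None

# Usage with a tiny hand-made Q for the chain 0 - 1 - 2 (goal 2)
Q_chain = np.array([[0, 50, 0], [37.5, 0, 100], [0, 75, 100]])
print(greedy_path(Q_chain, start=0, goal=2))  # [0, 1, 2]
```

With the notebook's trained matrix, `greedy_path(Q, 0, 7)` would play the same role as the `while` loop, but a badly trained Q shows up as `None` instead of a hang.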
Question-2. For the code given in this GitHub repository
https://github.com/Sabyasachi123276/Deep-Learning-Tutorials/blob/main/Convolutional_Neural_Network_with_Implementation_in_Python.ipynb
(A) Change the Epoch values from 10 to 50
(B) What are your observations? Is there an improvement in Accuracy?
Answer-2
I have placed all my code in the following public Google Colab notebook:
https://colab.research.google.com/drive/1aYzCuFiU4LC5xR4UqiKYZ0WAZTyKREQp?usp=sharing
Observations for change in Epoch value, and if there is an improvement in Accuracy

I did NOT see any appreciable change in the accuracy value. I feel that after 10 epochs (10 full passes over the training data), the model had already learned most of what it could. Increasing the number of epochs helps while the model is still underfitting, but once accuracy plateaus, extra epochs mostly fit the training set more closely and can even hurt test accuracy through overfitting. I think a further improvement here would need a larger data set (more than the 60K training images that Keras MNIST provides).
Code for Answer 2, pasted from my Colab notebook, along with a detailed explanation of the code (which I feel is necessary, since we were given the full code and the task was to understand it)
from tensorflow.keras.datasets import mnist

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPool2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Dense
Now we load the MNIST dataset from the Keras library. MNIST (Modified National Institute of Standards and Technology database) is a large database of images of handwritten digits, each labelled with the digit it shows.
● Keras MNIST, when loaded, contains:
● Training data "X_train": a NumPy array of 60K images with shape (60000, 28, 28). Each image is a 28x28 array of grayscale pixel intensities ranging from 0 to 255
● The actual digit values of the X_train images in "y_train": a NumPy array of digit labels (integers 0-9) with shape (60000,)
● X_test, y_test (just like X_train and y_train, but with 10K images)
We also reshape the pixel data to add one more dimension (a single channel, since the images are grayscale), because the Keras Conv2D layer expects each input image to have the shape (height, width, channels).
(X_train,y_train) , (X_test,y_test)=mnist.load_data()
print(X_train[59999,0])
print(X_train.shape)
#reshaping data - Adding one more dimension because it is needed further in Keras library
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], X_train.shape[2], 1))
X_test = X_test.reshape((X_test.shape[0],X_test.shape[1],X_test.shape[2],1))
print(X_train.shape)
print(X_test.shape)
print(y_train[59999])
Output
Now we normalise the pixel values. Feeding raw pixel values (0 to 255) into a neural network works, but large input values make training less stable and slower to converge. To avoid this, we scale the values to the range 0 to 1: since the pixel range is 0 to 255, dividing all the values by 255 does exactly that.
X_train=X_train/255
X_test=X_test/255
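The two preprocessing steps (adding a channel dimension, then scaling to the range 0 to 1) can be tried without downloading MNIST. The sketch below uses a small random uint8 array as a stand-in for the images (an assumption purely for illustration; `X_fake` is a hypothetical name) and shows the shape change and the resulting value range.

```python
import numpy as np

# Stand-in for MNIST: 5 random 28x28 grayscale images with values 0..255
rng = np.random.default_rng(0)
X_fake = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
X_4d = X_fake.reshape((X_fake.shape[0], X_fake.shape[1], X_fake.shape[2], 1))
# Equivalent shorthand: X_fake[..., np.newaxis]

# Scale to [0, 1]; dividing a uint8 array by 255 gives float64
X_scaled = X_4d / 255

print(X_fake.shape, "->", X_4d.shape)   # (5, 28, 28) -> (5, 28, 28, 1)
print(X_scaled.dtype, X_scaled.min() >= 0, X_scaled.max() <= 1)
```

The reshape does not change any pixel values; it only tells Keras there is one channel per image.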
Now we will define the model.

1. We first take the Sequential model, which is appropriate for a plain stack of layers in the NN where each layer has exactly one input tensor (a multi-dimensional array) and one output tensor
2. Then we add layers to the Sequential NN one by one
a. The first layer is Conv2D. This layer slides convolution kernels over the 2-D input, vertically and horizontally, computing at each position the dot product of the kernel weights and the input patch. Conv2D(32, (3,3)) means 32 kernels, each a 3x3 matrix
b. The next layer is MaxPool2D. Max pooling reduces the spatial size of the feature maps by keeping only the maximum value in each window; with a 2x2 window, the height and width are halved. This reduces the computational load and helps reduce "overfitting"
c. Then we add the "Flatten" layer. This flattens the input tensor into a single dimension, so that every value from all 32 feature maps is fed to every neuron in the next layer
d. Finally we add the "Dense" layer. A Dense layer is a fully connected layer which connects every input feature to every neuron in that layer. It is the most common NN layer
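To make layers (a) to (d) concrete, here is a minimal NumPy sketch (not Keras' actual implementation) of what one 3x3 kernel and a 2x2 max-pooling step do to a 28x28 image, followed by the parameter-count arithmetic for this model. As in Keras' Conv2D with its default padding, the "convolution" here is a cross-correlation with no padding ("valid"); the image and kernel values are random placeholders, and `conv2d_valid` and `max_pool_2x2` are hypothetical helper names.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and take the dot product at each position
    (cross-correlation, no padding), as a single Conv2D filter does."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 window (odd edges dropped)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((28, 28))            # stand-in for one normalised digit
kernel = rng.standard_normal((3, 3))    # one of the 32 kernels (random weights)

fmap = np.maximum(conv2d_valid(image, kernel), 0)   # ReLU activation
pooled = max_pool_2x2(fmap)
print(fmap.shape, pooled.shape, pooled.size * 32)   # (26, 26) (13, 13) 5408

# Trainable parameters per layer (weights + biases)
conv_params = 3 * 3 * 1 * 32 + 32           # 320
dense_params = 13 * 13 * 32 * 100 + 100     # 540,900
output_params = 100 * 10 + 10               # 1,010 for the Dense(10) output layer
print(conv_params + dense_params + output_params)   # 542230
```

The arithmetic shows where the model's size comes from: almost all of the roughly 542K parameters sit in the Dense(100) layer that receives the 5,408 flattened values.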
#defining model
model=Sequential()
#adding convolution layer
model.add(Conv2D(32,(3,3),activation='relu',input_shape=(28,28,1)))
#adding pooling layer
model.add(MaxPool2D(2,2))
#adding fully connected layer
model.add(Flatten())
model.add(Dense(100,activation='relu'))
● Finally we add the output layer, the last layer in the neural network, where the predictions are obtained (10 neurons with softmax, giving one probability per digit), and then we compile the model.
● Compilation configures the model for training: it sets the loss function, the optimiser and the metrics to track.
● After compilation, we train the model using the X_train, y_train sets.
○ During training we have to specify the number of "epochs"
○ One epoch means training the neural network on all of the training data once. Within an epoch the data is processed in mini-batches, and each batch gets one forward pass and one backward pass (which updates the weights).
○ We take 50 epochs, as specified in the assignment
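The epoch definition above can be made concrete with a little arithmetic. `model.fit` is called here without `batch_size`, so Keras uses its default of 32 samples per batch, and each batch gets one forward and one backward pass. The sketch below simply counts those passes for 10 and for 50 epochs.

```python
import math

train_samples = 60_000   # size of the MNIST training set
batch_size = 32          # Keras' default when model.fit gets no batch_size

steps_per_epoch = math.ceil(train_samples / batch_size)
for epochs in (10, 50):
    total_updates = steps_per_epoch * epochs
    print(f"{epochs} epochs: {steps_per_epoch} batches/epoch, {total_updates} weight updates")
```

Going from 10 to 50 epochs therefore means 93,750 instead of 18,750 weight updates, all computed on the same 60K images.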
#adding output layer
model.add(Dense(10,activation='softmax'))
#compiling the model
model.compile(loss='sparse_categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
#fitting the model
model.fit(X_train,y_train,epochs=50)
Output
Finally we will evaluate our multi-layered sequential NN model on the test data
model.evaluate(X_test,y_test)
Output
