Keras For Beginners: Implementing A Recurrent Neural Network
Keras is a simple-to-use but powerful deep learning library for Python. In this
post, we’ll build a simple Recurrent Neural Network (RNN) and train it to
solve a real problem with Keras.
This post is intended for complete beginners to Keras but does assume a
basic background knowledge of RNNs. My introduction to Recurrent
Neural Networks covers everything you need to know (and more) for this
post - read that first if necessary.
Here we go!
Just want the code? The full source code is at the end.
1. Setup
First, we need to download the dataset. You can either:
download it from my site (I’ve hosted a copy with files we won’t need for
this post removed), or
download it from the official site.
Either way, you’ll end up with a directory with the following structure:
dataset/
  test/
    neg/
    pos/
  train/
    neg/
    pos/
2. Preparing the Data

We can load these raw text files into Keras with the text_dataset_from_directory utility, which labels each review based on whether it came from the neg/ or pos/ subdirectory:

from tensorflow.keras.preprocessing import text_dataset_from_directory

train_data = text_dataset_from_directory("./train")
test_data = text_dataset_from_directory("./test")
There’s one more small thing to do. If you browse through the dataset, you’ll
notice that some of the reviews include <br /> markers in them, which are
HTML line breaks. We want to get rid of those, so we’ll modify our data prep
a bit:
from tensorflow.keras.preprocessing import text_dataset_from_directory
from tensorflow.strings import regex_replace

def prepareData(dir):
  data = text_dataset_from_directory(dir)
  return data.map(
    lambda text, label: (regex_replace(text, '<br />', ' '), label),
  )

train_data = prepareData('./train')
test_data = prepareData('./test')
Now, all <br /> instances in our dataset have been replaced with spaces.
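As a sanity check, the same substitution can be sketched in plain Python with re.sub (the Keras pipeline above does it with regex_replace inside dataset.map, but the string-level effect is identical; strip_br_tags is a made-up helper for illustration):

```python
import re

def strip_br_tags(text):
    # Replace each "<br />" HTML line break with a single space,
    # mirroring the regex_replace call in prepareData.
    return re.sub(r"<br />", " ", text)

print(strip_br_tags("A great film.<br /><br />Would watch again."))
# -> "A great film.  Would watch again." (two breaks become two spaces)
```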
You can try printing some of the dataset if you want:

for text_batch, label_batch in train_data.take(1):
  print(text_batch.numpy()[0])
  print(label_batch.numpy()[0])

3. Building the Model

We'll use a Sequential model, which lets us stack layers one after another. First, we need an Input layer that accepts one string per example:

from tensorflow.keras import Input
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Input(shape=(1,), dtype="string"))
Our model now takes in 1 string input - time to do something with that string.
3.1 Text Vectorization

Our first layer will be a TextVectorization layer, which will process the input
string and turn it into a sequence of integers, each one representing a token.
from tensorflow.keras.layers import TextVectorization

max_tokens = 1000
max_len = 100
vectorize_layer = TextVectorization(
  max_tokens=max_tokens,
  output_mode="int",
  output_sequence_length=max_len,
)
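To make the "int" output mode concrete, here's a rough pure-Python sketch of the idea (build_vocab and vectorize are made-up names for illustration; the real layer also standardizes and splits text for you, and reserves index 0 for padding and 1 for out-of-vocabulary words):

```python
from collections import Counter

def build_vocab(texts, max_tokens):
    # Keep the most common words; indices 0 (padding) and 1 (OOV) are reserved.
    counts = Counter(w for t in texts for w in t.lower().split())
    common = [w for w, _ in counts.most_common(max_tokens - 2)]
    return {w: i + 2 for i, w in enumerate(common)}

def vectorize(text, vocab, max_len):
    ids = [vocab.get(w, 1) for w in text.lower().split()]  # 1 = unknown word
    ids = ids[:max_len]                       # truncate long reviews
    return ids + [0] * (max_len - len(ids))   # pad short ones with 0

vocab = build_vocab(["a great movie", "a bad movie"], max_tokens=1000)
print(vectorize("a great unseen movie", vocab, max_len=6))
```

The unseen word maps to the OOV id 1, and the output always has exactly max_len entries, which is what lets the downstream layers work with fixed-size inputs.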
To initialize the layer, we need to call .adapt() on our training texts, which is when the vocabulary (the max_tokens most common words) gets built. Then we can add the layer to our model:

# Fit the TextVectorization layer to our text dataset; this is when
# the vocabulary is selected.
train_texts = train_data.map(lambda text, label: text)
vectorize_layer.adapt(train_texts)

model.add(vectorize_layer)
3.2 Embedding
Our next layer will be an Embedding layer, which will turn the integers
produced by the previous layer into fixed-length vectors.
from tensorflow.keras.layers import Embedding

# max_tokens + 1 ensures every index the vectorization layer can emit
# (including its reserved ids) has an embedding row.
model.add(Embedding(max_tokens + 1, 128))
Finally, we're ready for the recurrent layer that makes our network an RNN!
We'll use a Long Short-Term Memory (LSTM) layer, which is a popular
choice for this kind of problem. It's very simple to implement:

from tensorflow.keras.layers import LSTM

# 64 is the dimensionality of the LSTM's output (and hidden state).
model.add(LSTM(64))
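To get some intuition for what that one line hides, here's a heavily simplified sketch of a single LSTM unit processing one timestep with made-up scalar weights (the real layer runs 64 such units, with vector-valued weights, across every timestep of the sequence):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    # w maps each gate to an (input weight, recurrent weight, bias) triple.
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate
    c = f * c_prev + i * g   # new cell state: keep some memory, add some new
    h = o * math.tanh(c)     # new hidden state (the layer's output)
    return h, c

weights = {k: (0.5, 0.1, 0.0) for k in "ifog"}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, w=weights)
print(h, c)  # both bounded: tanh keeps the hidden state in (-1, 1)
```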
To finish off our network, we’ll add a standard fully-connected (Dense) layer
and an output layer with sigmoid activation:
from tensorflow.keras.layers import Dense

model.add(Dense(64, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
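The sigmoid on the last layer squashes the network's single raw output (its logit) into a probability between 0 and 1, which we can threshold at 0.5 to call a review positive or negative:

```python
import math

def sigmoid(z):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

for logit in (-2.0, 0.0, 3.0):
    p = sigmoid(logit)
    print(f"logit={logit:+.1f} -> p={p:.3f} -> "
          f"{'positive' if p > 0.5 else 'negative'}")
```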
4. Compiling the Model

Before we can begin training, we need to configure the training process. There are a few key things to decide during compilation:

The optimizer. We'll stick with a pretty good default: the Adam
gradient-based optimizer. Keras has many other optimizers you can
look into as well.
The loss function. Since we only have 2 output classes (positive and
negative), we’ll use the Binary Cross-Entropy loss. See all Keras losses.
A list of metrics. Since this is a classification problem, we’ll just have
Keras report on the accuracy metric.
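For intuition, the binary cross-entropy for one example with true label y (0 or 1) and predicted probability p is -(y*log(p) + (1-y)*log(1-p)): near zero when the model is confidently right, large when it's confidently wrong. A quick sketch:

```python
import math

def binary_crossentropy(y_true, p_pred):
    # Per-example loss; Keras averages this over each batch.
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1 - p_pred))

print(round(binary_crossentropy(1, 0.9), 4))  # confidently right: 0.1054
print(round(binary_crossentropy(1, 0.1), 4))  # confidently wrong: 2.3026
```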
Here’s what that compilation looks like:
model.compile(
  optimizer='adam',
  loss='binary_crossentropy',
  metrics=['accuracy'],
)
Onwards!

5. Training the Model

Training our model with Keras is a one-liner:

model.fit(train_data, epochs=10)
Putting all the code we’ve written thus far together and running it gives us
results like this:
Epoch 1/10
loss: 0.6441 - accuracy: 0.6281
Epoch 2/10
loss: 0.5544 - accuracy: 0.7250
Epoch 3/10
loss: 0.5670 - accuracy: 0.7200
Epoch 4/10
loss: 0.4505 - accuracy: 0.7919
Epoch 5/10
loss: 0.4221 - accuracy: 0.8062
Epoch 6/10
loss: 0.4051 - accuracy: 0.8156
Epoch 7/10
loss: 0.3870 - accuracy: 0.8247
Epoch 8/10
loss: 0.3694 - accuracy: 0.8339
Epoch 9/10
loss: 0.3530 - accuracy: 0.8406
Epoch 10/10
loss: 0.3365 - accuracy: 0.8502
We’ve achieved 85% train accuracy after 10 epochs! There’s certainly a lot
of room to improve (this problem isn’t that easy), but it’s not bad for a first
effort.
6. Using the Model

Now that we have a working, trained model, let's save its weights to disk so we can reload them anytime:

model.save_weights('rnn')
We can now reload the trained model whenever we want by rebuilding it and
loading in the saved weights:
model = Sequential()
# ... rebuild the model exactly as before (Input, vectorize_layer,
# Embedding, LSTM, Dense) ...
model.load_weights('rnn')
print(model.predict([
  "i loved it! highly recommend it to anyone and everyone looking for a great movie to watch.",
]))
print(model.predict([
  "this was awful! i hated it so much, nobody should watch this. the acting was terrible, the music was terrible, overall it was just bad.",
]))
7. Extensions
There’s much more we can do to experiment with and improve our network.
Some examples of modifications you could make to our RNN include:
Network Depth
What happens if we add more recurrent layers? How does that affect training
and/or the model's final performance?

model = Sequential()
# return_sequences=True makes the first LSTM output its full sequence
# of hidden states, so the second LSTM has a sequence to consume.
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(64))
Dropout

What if we incorporated dropout, which is commonly used to prevent overfitting? For example, the LSTM layer supports dropout (on its inputs) and recurrent_dropout (on its recurrent state):

model.add(LSTM(64, dropout=0.25, recurrent_dropout=0.25))

Text Vectorization Parameters
Too low of a max_tokens will exclude potentially useful words from our
vocabulary, while too high of one may increase the complexity and
training time of our model.
Too low of a max_len will impact our model’s performance on longer
reviews, while too high of one again may increase the complexity and
training time of our model.
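The max_len tradeoff is easy to see in isolation, since the vectorization layer's output_sequence_length simply cuts long token sequences and right-pads short ones with 0 (pad_or_truncate is a hypothetical stand-in for that behavior):

```python
def pad_or_truncate(ids, max_len):
    # Cut anything past max_len, then right-pad with the padding id 0.
    return ids[:max_len] + [0] * (max_len - len(ids))

print(pad_or_truncate([5, 9, 2, 7, 4], max_len=3))  # [5, 9, 2] - the tail is lost
print(pad_or_truncate([5, 9], max_len=3))           # [5, 9, 0]
```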
Pre-processing
All we did to clean our dataset was remove <br /> markers. There may be
other pre-processing steps that would be useful to us.
Conclusion
You’ve implemented your first RNN with Keras! I’ll include the full source
code again below for your reference.
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import TextVectorization, Embedding, LSTM, Dense
from tensorflow.keras.preprocessing import text_dataset_from_directory
from tensorflow.strings import regex_replace

def prepareData(dir):
  data = text_dataset_from_directory(dir)
  return data.map(
    lambda text, label: (regex_replace(text, '<br />', ' '), label),
  )

train_data = prepareData('./train')
test_data = prepareData('./test')

for text_batch, label_batch in train_data.take(1):
  print(text_batch.numpy()[0])
  print(label_batch.numpy()[0])

model = Sequential()

# ----- 1. INPUT: one string per example.
model.add(Input(shape=(1,), dtype="string"))

# ----- 2. TEXT VECTORIZATION
max_tokens = 1000
max_len = 100
vectorize_layer = TextVectorization(
  max_tokens=max_tokens,
  output_mode="int",
  output_sequence_length=max_len,
)

# Fit the vectorization layer to our text dataset (builds the vocabulary).
train_texts = train_data.map(lambda text, label: text)
vectorize_layer.adapt(train_texts)

model.add(vectorize_layer)

# ----- 3. EMBEDDING, LSTM, DENSE
model.add(Embedding(max_tokens + 1, 128))
model.add(LSTM(64))
model.add(Dense(64, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

# ----- 4. COMPILE AND TRAIN
model.compile(
  optimizer='adam',
  loss='binary_crossentropy',
  metrics=['accuracy'],
)

model.fit(train_data, epochs=10)

model.save_weights('rnn')
model.load_weights('rnn')
model.evaluate(test_data)

print(model.predict([
  "i loved it! highly recommend it to anyone and everyone looking for a great movie to watch.",
]))
print(model.predict([
  "this was awful! i hated it so much, nobody should watch this. the acting was terrible, the music was terrible, overall it was just bad.",
]))