
Keras Recurrent Neural Network With Python

Let's get straight into it: this tutorial will walk you through the steps to implement a recurrent neural network with Keras in Python and arrive at a working generative model.

So what exactly is Keras? Let's put it this way: it makes programming machine learning algorithms much, much easier. It runs atop TensorFlow/Theano, cutting down on the coding and increasing efficiency. In more technical terms, Keras is a high-level neural network API written in Python.

Let's get started. I am assuming you all have TensorFlow and Keras installed.

Note: It's very important that you have a good understanding of recurrent neural networks before beginning 
this tutorial. Please refer to these links for further info! 

http://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://medium.com/@shiyan/understanding-lstm-and-its-diagrams-37e2f46f1714
http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/

Implementation

Imports

It was quite some time before I managed to get this working, and it took hours and hours of research! Errors in the input data, not enough material to train with, problems with the activation function, and output that looked like an alien had jumped out of its spaceship and died on my screen. Although challenging, the hard work paid off!

To make it easier for everyone, I'll break up the code into chunks and explain them individually. 

We start off by importing the essential libraries...

1    import numpy as np
2    from keras.models import Sequential
3    from keras.layers import Dense
4    from keras.layers import Dropout
5    from keras.layers import LSTM
6    from keras.utils import np_utils

Line 1 imports the numpy library, used to perform mathematical operations such as matrix multiplication and array handling. We will be using it to structure our input data, output data and labels.

Lines 2-6 import the various Keras classes and utilities that will be utilised in order to construct our RNN. 

  • Sequential: used to create a linear stack of layers
  • Dense: simply put, the output layer of any NN/RNN. It computes output = activation(dot(input, weights) + bias) (see the sketch just after this list)
  • Dropout: RNNs are very prone to overfitting; this layer keeps overfitting to a minimum by randomly selecting neurons and ignoring them during training, in other words "dropping them out"
  • LSTM: Long Short-Term Memory unit
  • np_utils: specific tools that let us process the data correctly and form it into the right format
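
To make the Dense bullet above concrete, here is a minimal NumPy sketch, separate from the tutorial's own code, of what a Dense layer with a softmax activation computes for a single input vector. The numbers are made up purely for illustration:

import numpy as np

def softmax(z):
    #Shift by the max for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

#A made-up input vector (3 features), 3x2 weight matrix and bias vector
inputVector = np.array([0.5, -1.0, 2.0])
weights = np.array([[0.1, 0.4],
                    [0.2, -0.3],
                    [-0.5, 0.25]])
bias = np.array([0.05, -0.05])

#This is exactly output = activation(dot(input, weights) + bias)
output = softmax(np.dot(inputVector, weights) + bias)
print(output)  #Two positive values that sum to 1, like class probabilities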

Don't worry if you don't fully understand what all of these do! I will expand more on these as we go along.
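
One caveat before we move on: the imports above match the standalone Keras of the time. If you are on a newer TensorFlow 2.x install where from keras.utils import np_utils fails, the equivalent tf.keras imports would look something like this (a compatibility sketch, not part of the original code):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM
#np_utils no longer exists here; to_categorical is imported directly instead
from tensorflow.keras.utils import to_categorical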

Input Data

Before we begin the actual code, we need to get our input data. My input will be a section of a play from the playwright genius Shakespeare; I will be using a monologue from Othello. You can get the text file from here

Name it whatever you want. I'm calling mine "Othello.txt". Save it in the same directory as your Python program.

Tools for Formatting

Although we now have our data, it needs to be formatted before we can feed it into an RNN; Keras expects its input in a certain configuration.

1    #Read the data, turn it into lower case
2    data = open("Othello.txt").read().lower()
3    #This gets the set of characters used in the data and sorts them
4    chars = sorted(list(set(data)))
5    #Total number of characters used in the data
6    totalChars = len(data)
7    #Number of unique chars
8    numberOfUniqueChars = len(chars)

To produce that configuration, we first need to create a couple of tools.

Line 2 opens the text file in which your data is stored, reads it and converts all the characters into lowercase. Lowercasing characters is a form of normalisation; it shrinks the set of characters the model has to learn. If the RNN isn't trained properly, capital letters might otherwise start popping up in the middle of words, for example "scApes".

Line 4 creates a sorted list of characters used in the text. For example, for me it created the following:

['\n', ' ', "'", ',', '-', '.', ';', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 
'i', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'y']

Line 6 simply stores the total number of characters in the entire dataset into totalChars.

Line 8 stores the number of unique characters, i.e. the length of chars.

Now we need to create a dictionary mapping each character to a number, so every character can be easily represented.

1    #This allows for characters to be represented by numbers
2    CharsForids = {char:Id for Id, char in enumerate(chars)}
3    #This is the opposite to the above
4    idsForChars = {Id:char for Id, char in enumerate(chars)}
5    #How many time-steps, i.e. how many characters we want to process in one go
6    numberOfCharsToLearn = 100

Line 2 creates a dictionary where each character is a key mapped to a unique number. For example, entering this...

CharsForids["o"]

 ...into the shell outputs...

20

Line 4 is simply the opposite of Line 2: now the number is the key and the corresponding character is the value.

Line 6 sets how many characters we want one training example to contain, or in other words the number of time-steps.
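
As a quick sanity check, the two dictionaries should be exact inverses of each other. Assuming the code above has been run, something like this in the shell confirms it:

#Every character should survive a round trip through both dictionaries
assert all(idsForChars[CharsForids[c]] == c for c in chars)
print(CharsForids["o"])               #e.g. 20 for my Othello text
print(idsForChars[CharsForids["o"]])  #prints: o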

Formatting

Our tools are ready! We can now format our data!

1     #Input data
2     charX = []
3     #Output data
4     y = []
5     #Since each training example needs 100 input chars plus 1 output char,
6     #we run the loop 100 fewer times than totalChars or we would get an
7     #index out of range error at the end of the data
8     counter = totalChars - numberOfCharsToLearn
9     #This loops through the data, stopping 100 chars before the end
10    for i in range(0, counter, 1):
11        #The slice takes 100 chars, starting at i and stopping
12        #just before the char at i+100
13        theInputChars = data[i:i+numberOfCharsToLearn]
14        #With no ':' we index a single char: the one straight after the window
15        #Essentially, theOutputChars is the next char in line after the 100 chars in charX
16        theOutputChars = data[i + numberOfCharsToLearn]
17        #Appends the IDs of the 100 input chars as a list into charX
18        charX.append([CharsForids[char] for char in theInputChars])
19        #For every 100 values there is one y value which is the output
20        y.append(CharsForids[theOutputChars])

Lines 2 and 4 create empty lists for storing the formatted data: the input goes in charX and the output in y.

Line 8 creates the counter for our for loop. We run the loop 100 (numberOfCharsToLearn) fewer times than the total number of characters, because each window of 100 input chars needs the character directly after it as its output, and the last window must not run past the end of the data.

Line 13: theInputChars stores the first 100 chars, and then as the loop iterates it takes the next window of 100, and so on...

Line 16: theOutputChars stores only 1 char, the next char after the last char in theInputChars.

Line 18 appends 100 integers to the charX list. Each of those integers is the ID of a char in theInputChars.

Line 20 appends one integer ID per iteration to the y list, corresponding to the single char in theOutputChars.
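
If the sliding window is hard to picture, here is the same loop run on a toy string with a window of 5 instead of 100; the string and names are invented just for this illustration:

#Toy version of the formatting loop: a window of 5 chars predicts the 6th
toyData = "to be or not to be"
window = 5
for i in range(3):  #Just the first three training examples
    print(repr(toyData[i:i+window]), "->", repr(toyData[i + window]))
#Prints:
#'to be' -> ' '
#'o be ' -> 'o'
#' be o' -> 'r'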

Are we now ready to put our data through the RNN? Not quite! We have the data represented correctly, but it's still not in the right format.

1    #Len(charX) represents how many of those time steps we have
2    #The numberOfCharsToLearn is how many characters we process per example
3    #Our features are set to 1 because in the output we are only predicting 1 char
4    X = np.reshape(charX, (len(charX), numberOfCharsToLearn, 1))
5    #This is done for normalization
6    X = X/float(numberOfUniqueChars)
7    #This sets it up for us so we can have a categorical(#feature) output format
8    y = np_utils.to_categorical(y)

Line 4 reshapes the input array into [samples, time-steps, features], the shape Keras requires.

Line 6 is a form of normalisation: it scales every character ID down into the 0-1 range.

Line 8 converts y into one-hot vectors. A one-hot vector is an array of 0s and 1s, where the 1 occurs only at the position given by the ID. For example, say we have 5 unique character IDs, [0, 1, 2, 3, 4], and a single data output equal to 1; then y = ([[0, 1, 0, 0, 0]]). Notice how the 1 occurs only at position 1. Now imagine exactly this, but for every example in y, each vector with a length of numberOfUniqueChars.
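
If you want to convince yourself the shapes came out right, a quick check after running the block above helps. The exact numbers below are from my Othello text (1760 examples, 30 unique chars) and will differ for yours:

print(X.shape)           #(1760, 100, 1): samples, time-steps, features
print(X.min(), X.max())  #Both within 0-1 after the normalisation
print(y.shape)           #(1760, 30): one one-hot row of length numberOfUniqueChars
print(y[0])              #All 0s except a single 1 at the ID of the first output char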

Building the RNN model

That's the data formatting and representation part finished! Yes! We can now start building our RNN model!

1     model = Sequential()
2     #Since we know the shape of our data we can input the time-step and feature sizes
3     #The number of samples is dealt with in the fit function
4     model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
5     model.add(Dropout(0.2))
6     #number of features on the output
7     model.add(Dense(y.shape[1], activation='softmax'))
8     model.compile(loss='categorical_crossentropy', optimizer='adam')
9     model.fit(X, y, epochs=5, batch_size=128)
10    model.save_weights("Othello.hdf5")
11    #model.load_weights("Othello.hdf5")

Line 1 uses the Sequential() import I mentioned earlier. It essentially initialises the network, creating an empty "template model".

Line 4 adds our first layer to the empty "template model": an LSTM layer containing 256 LSTM units, with the input shape being input_shape=(numberOfCharsToLearn, features). Writing it as (X.shape[1], X.shape[2]) avoids any silly mistakes! Although the X array has 3 dimensions, we omit the "samples dimension" in the LSTM layer because it is accounted for automatically later on.

Note: Omitting does not mean the "samples dimension" is not considered!

Line 5, as explained in the imports section, "drops out" neurons. The 0.2 is a fraction: it means 20% of the neurons will be "dropped", or set to 0, during each training update.

Line 7 adds the output layer. It performs the activation of the dot product of the weights and the inputs, plus the bias.

Note: we use LSTM units rather than plain recurrent units because their gated design minimises the vanishing gradient problem!

Line 8 sets the configuration: our loss function is "categorical_crossentropy" and the optimizer is "Adam".

Line 9 runs the training algorithm. An epoch is one complete pass over the training data; I have set it to 5 for this tutorial, but generally 20 or more epochs are favourable. The batch size is how many examples from our input data set are evaluated at once: the training algorithm takes 128 examples, updates the weights, then takes the next 128, and so on...

Line 10: finally, once the training is done, we can save the weights.

Line 11 is commented out initially to prevent errors, but once we have saved our weights we can comment out Lines 9 and 10 and uncomment Line 11 to load previously trained weights.

Note: You can change the epoch number and batch size to whatever you want; I have kept them low for this tutorial.
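
As an optional alternative to saving weights manually on Line 10 (this isn't in the original code, just a common Keras pattern), you can let Keras checkpoint the weights automatically whenever the training loss improves, using the ModelCheckpoint callback:

from keras.callbacks import ModelCheckpoint

#Save the weights to this file every time the training loss improves
checkpoint = ModelCheckpoint("Othello.hdf5", monitor='loss',
                             save_best_only=True, mode='min')
model.fit(X, y, epochs=5, batch_size=128, callbacks=[checkpoint])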

Training

During training you might see something like this in the Python shell:

Epoch 1/5

 128/1760 [=>............................] - ETA: 43s - loss: 3.3984
 256/1760 [===>..........................] - ETA: 27s - loss: 3.3905
 384/1760 [=====>........................] - ETA: 21s - loss: 3.3835
 512/1760 [=======>......................] - ETA: 18s - loss: 3.3749
 640/1760 [=========>....................] - ETA: 15s - loss: 3.3615
 768/1760 [============>.................] - ETA: 13s - loss: 3.3425
 896/1760 [==============>...............] - ETA: 11s - loss: 3.3174
1024/1760 [================>.............] - ETA: 9s - loss: 3.3563 

Once it's done computing all the epochs, it will go straight on to running the code for generating new text.

Generating New Text

Let's look at the code that allows us to generate new text!

1     randomVal = np.random.randint(0, len(charX)-1)
2     randomStart = charX[randomVal]
3     for i in range(500):
4         x = np.reshape(randomStart, (1, len(randomStart), 1))
5         x = x/float(numberOfUniqueChars)
6         pred = model.predict(x)
7         index = np.argmax(pred)
8         randomStart.append(index)
9         randomStart = randomStart[1: len(randomStart)]
10     print("".join([idsForChars[value] for value in randomStart]))

Line 1 generates a random index anywhere from 0 up to the length of the input data minus 1, which picks a random seed sequence to start from.

Line 2 gives us our starting sentence in integer (ID) form.

Line 3 starts the generation loop. The 500 is not absolute; you can change it, but I would like to generate 500 chars.

Line 4 reshapes the seed into a single data example that we can put through the model to predict the next char.

Lines 5 and 6 normalise the single example and then put it through the prediction model.

Line 7 gives us back the index (ID) of the most likely next character after that sentence.

Lines 8 and 9: appending our predicted character to our starting sentence gives us 101 chars, so we then take the latest 100 chars by omitting the first one.

Line 10 runs once the loop has finished its 500 iterations; it converts the integers back into chars and prints the final 100-char window, which by that point consists entirely of generated text.
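
One optional tweak, not part of the original code: always taking np.argmax on Line 7 makes generation deterministic, so the model can get stuck repeating the same phrase. Sampling the next character from the predicted distribution instead often gives livelier output. A minimal sketch of a replacement for Line 7:

#Sample the next char ID from the predicted distribution instead of argmax
probs = pred[0].astype("float64")  #predict returns shape (1, numberOfUniqueChars)
probs = probs / probs.sum()        #Renormalise to guard against float rounding
index = np.random.choice(len(probs), p=probs)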

Conclusion and Fixes

So that was all for the generative model. If for some reason your model prints out blanks or gibberish, you need to train it for longer; try playing with the model configuration until you get a real result. A model like this tends to overfit small datasets, and anything below 100 KB of text will likely produce gibberish. You need a dataset of at least 100 KB for any good result!
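
A related fix: if you have already saved weights and just want to train for longer rather than starting over, you can load them back in and call fit again (an optional pattern, not spelled out in the original code):

#Rebuild the same architecture first, then continue training from the saved weights
model.load_weights("Othello.hdf5")
model.fit(X, y, epochs=20, batch_size=128)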

If you have any questions send me a message and I will try my best to reply!!! Thanks for reading!
