Deep Learning with PyTorch Quick Start Guide

Product type: Book
Published in: Dec 2018
Publisher: Packt
ISBN-13: 9781789534092
Pages: 158
Edition: 1st
Author: David Julian

Other NN Architectures

Recurrent networks are essentially feedforward networks that retain state. All the networks we have looked at so far require an input of fixed size, such as an image, and give a fixed-size output, such as the probabilities of a particular class. Recurrent networks are different in that they accept a sequence of arbitrary length as input and produce a sequence as output. Moreover, the internal state of the network's hidden layers is updated by a learned function of the previous state and the current input. In this way, a recurrent network remembers its state: subsequent states are a function of previous states.
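The state update described above can be sketched with PyTorch's built-in recurrent cell; the sizes and the random sequence here are illustrative only:

```python
import torch
import torch.nn as nn

# A single recurrent cell: the hidden state h is updated from the
# current input x and the previous hidden state, so each state is a
# function of all earlier states.
rnn_cell = nn.RNNCell(input_size=4, hidden_size=8)

h = torch.zeros(1, 8)            # initial hidden state (batch of 1)
sequence = torch.randn(5, 1, 4)  # a sequence of 5 inputs; length is arbitrary

for x in sequence:               # the same cell is applied at every step
    h = rnn_cell(x, h)           # h_t = tanh(W_x x_t + W_h h_{t-1} + b)

print(h.shape)                   # torch.Size([1, 8])
```

Note that because the same weights are reused at every step, the loop runs for as many steps as the input sequence has, which is what lets the network handle sequences of arbitrary length.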

In this chapter, we will cover the following:

  • Introduction to recurrent networks
  • Long short-term memory networks

Introduction to recurrent networks

Recurrent networks have been shown to be very powerful at predicting time series data. This ability is fundamental to biological brains, enabling us to do things such as safely drive a car, play a musical instrument, evade predators, understand language, and interact with a dynamic world. A sense of the flow of time, and an understanding of how things change over time, is fundamental to intelligent life, so it is no surprise that this ability is also important in artificial systems.

The ability to understand time series data is also important in creative endeavors, and recurrent networks have shown some ability in things such as composing a melody, constructing grammatically correct sentences, and creating visually pleasing images.

Feedforward and convolutional networks achieve very good results, as we have seen, in tasks such as the classification...

Long short-term memory networks

Long short-term memory networks (LSTMs) are a special type of RNN capable of learning long-term dependencies. While standard RNNs can remember previous states to some extent, they do so at a fairly basic level, by updating a hidden state at each time step. This enables the network to remember short-term dependencies: the hidden state, being a function of previous states, retains information about them. However, the more time steps there are between the current state and an earlier state, the smaller the effect that earlier state has on the current one. Far less information is retained about a state that is, say, 10 time steps back than about the time step immediately preceding the current one. This is despite the fact that earlier time steps may contain important information with direct relevance to a particular problem or...
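As a minimal sketch of the difference in practice, PyTorch's LSTM layer carries a separate cell state alongside the hidden state; the cell state acts as a memory that the gates control, which is what lets information survive many time steps. The sizes below are illustrative:

```python
import torch
import torch.nn as nn

# An LSTM layer returns both a hidden state h and a cell state c.
lstm = nn.LSTM(input_size=4, hidden_size=8)

seq = torch.randn(20, 1, 4)    # 20 time steps, batch of 1
out, (h, c) = lstm(seq)        # out holds the hidden state at every step

print(out.shape)  # torch.Size([20, 1, 8])
print(h.shape)    # torch.Size([1, 1, 8]) -- final hidden state
print(c.shape)    # torch.Size([1, 1, 8]) -- final cell state
```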

Summary

In this chapter, we introduced recurrent neural networks and demonstrated how to use an RNN on the MNIST dataset. RNNs are particularly useful for working with time series data, since they are essentially feedforward networks that are unrolled over time. This makes them very suitable for tasks such as handwriting and speech recognition, as they operate on sequences of data. We also looked at a more powerful variant of the RNN, the LSTM. The LSTM uses four gates to decide what information to pass on to the next time step, enabling it to uncover long-term dependencies in data. Finally, we built a simple language model, enabling us to generate text from sample input text. We used a model based on the GRU. The GRU is a slightly simplified version of the LSTM, containing three gates and combining the input and forget gates of the LSTM. This model used probability...
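In PyTorch, the simplification the GRU makes is visible in its interface: unlike the LSTM, it keeps only a hidden state and no separate cell state, so it is a near drop-in replacement. A small sketch, with illustrative sizes:

```python
import torch
import torch.nn as nn

# A GRU layer: same call pattern as nn.LSTM, but it returns only a
# hidden state rather than a (hidden, cell) pair.
gru = nn.GRU(input_size=4, hidden_size=8)

seq = torch.randn(20, 1, 4)  # 20 time steps, batch of 1
out, h = gru(seq)            # only a hidden state comes back

print(out.shape)  # torch.Size([20, 1, 8])
print(h.shape)    # torch.Size([1, 1, 8])
```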
