Recurrent Neural Network
● Predicting the future is what we do all the time
○ Finishing a friend’s sentence
○ Anticipating the smell of coffee at breakfast, or
○ Catching a ball on the field
● In this chapter, we will cover RNNs
○ Networks which can predict the future
● Unlike all the nets we have discussed so far
○ RNNs can work on sequences of arbitrary lengths
○ Rather than on fixed-sized inputs
Recurrent Neural Network
Recurrent Neural Network - Applications
● RNNs can analyze time series data
○ Such as stock prices, and
○ Tell you when to buy or sell
Recurrent Neural Network
Recurrent Neural Network - Applications
● In autonomous driving systems, RNNs can
○ Anticipate car trajectories and
○ Help avoid accidents
Recurrent Neural Network
Recurrent Neural Network - Applications
● RNNs can take sentences, documents, or audio samples as input
○ This makes them extremely useful
○ For natural language processing (NLP) systems such as
■ Automatic translation
■ Speech-to-text or
■ Sentiment analysis
Recurrent Neural Network
Recurrent Neural Network - Applications
● RNNs’ ability to anticipate also makes them capable of surprising
creativity.
○ You can ask them to predict the most likely next notes in a melody
○ Then randomly pick one of these notes and play it
○ Then ask the net for the next most likely note, play it, and repeat the
process again and again
Here is an example melody produced by Google’s Magenta project
Recurrent Neural Network
Recurrent Neural Network
● In this chapter we will learn about
○ Fundamental concepts in RNNs
○ The main problems RNNs face
○ And the solutions to those problems
○ How to implement RNNs
● Finally, we will take a look at the
○ Architecture of a machine translation system
Recurrent Neural Network
Recurrent Neurons
Recurrent Neural Network
Recurrent Neurons
● Up to now we have mostly looked at feedforward neural networks
○ Where the activations flow only in one direction
○ From the input layer to the output layer
● An RNN looks much like a feedforward neural network
○ Except it also has connections pointing backward
Recurrent Neural Network
Recurrent Neurons
● Let’s look at the simplest possible RNN
○ Composed of just one neuron receiving inputs
○ Producing an output, and
○ Sending that output back to itself
Recurrent Neural Network
Recurrent Neurons
● At each time step t (also called a frame)
○ This recurrent neuron receives the inputs x(t)
○ As well as its own output from the previous time step y(t–1)
A recurrent neuron (left), unrolled through time (right)
Recurrent Neural Network
Recurrent Neurons
● We can represent this tiny network against the time axis (see the figure below)
● This is called unrolling the network through time
A recurrent neuron (left), unrolled through time (right)
Recurrent Neural Network
Recurrent Neurons
● We can easily create a layer of recurrent neurons
● At each time step t, every neuron receives both the
○ Input vector x(t), and the
○ Output vector from the previous time step, y(t–1)
A layer of recurrent neurons (left), unrolled through time(right)
Recurrent Neural Network
Recurrent Neurons
● Each recurrent neuron has two sets of weights
○ One for the inputs x(t), and the
○ Other for the outputs of the previous time step, y(t–1)
● Let’s call these weight vectors wx and wy
● The equation below represents the output of a single recurrent neuron
Output of a single recurrent neuron for a single instance (b is the bias term,
and ϕ() is the activation function, e.g., ReLU)
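The equation itself appeared as an image on the slide; reconstructed here in LaTeX (standard formulation, with x(t) the input vector, y(t–1) the previous output, wx and wy the weight vectors, and b the bias):

$$ y_{(t)} = \phi\left( \mathbf{x}_{(t)}^\top \cdot \mathbf{w}_x + \mathbf{y}_{(t-1)}^\top \cdot \mathbf{w}_y + b \right) $$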
Recurrent Neural Network
Recurrent Neurons
● We can compute a whole layer’s output
○ In one shot for a whole mini-batch
○ Using a vectorized form of the previous equation
Outputs of a layer of recurrent neurons for all instances in a mini-batch
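The vectorized equation also appeared as an image; reconstructed here in LaTeX from the matrix definitions explained on the following slides:

$$ \mathbf{Y}_{(t)} = \phi\left( \mathbf{X}_{(t)} \cdot \mathbf{W}_x + \mathbf{Y}_{(t-1)} \cdot \mathbf{W}_y + \mathbf{b} \right) = \phi\left( \left[ \mathbf{X}_{(t)} \;\; \mathbf{Y}_{(t-1)} \right] \cdot \mathbf{W} + \mathbf{b} \right) \quad \text{with} \quad \mathbf{W} = \begin{bmatrix} \mathbf{W}_x \\ \mathbf{W}_y \end{bmatrix} $$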
Recurrent Neural Network
Recurrent Neurons
● Y(t) is an m × nneurons matrix containing the
○ Layer’s outputs at time step t for each instance in the mini-batch
○ m is the number of instances in the mini-batch
○ nneurons is the number of neurons
Outputs of a layer of recurrent neurons for all instances in a mini-batch
Recurrent Neural Network
Recurrent Neurons
● X(t) is an m × ninputs matrix containing the inputs for all instances
○ ninputs is the number of input features
Outputs of a layer of recurrent neurons for all instances in a mini-batch
Recurrent Neural Network
Recurrent Neurons
● Wx is an ninputs × nneurons matrix containing the connection weights for the
inputs of the current time step
● Wy is an nneurons × nneurons matrix containing the connection weights for
the outputs of the previous time step
Outputs of a layer of recurrent neurons for all instances in a mini-batch
Recurrent Neural Network
Recurrent Neurons
● The weight matrices Wx and Wy are often concatenated into a single
weight matrix W of shape (ninputs + nneurons) × nneurons
● b is a vector of size nneurons containing each neuron’s bias term
Outputs of a layer of recurrent neurons for all instances in a mini-batch
Recurrent Neural Network
Memory Cells
● Since the output of a recurrent neuron at time step t is a
○ Function of all the inputs from previous time steps
○ We can say that it has a form of memory
● A part of a neural network that
○ Preserves some state across time steps is called a memory cell
Recurrent Neural Network
Memory Cells
● In general, a cell’s state at time step t, denoted h(t), is a
○ Function of some inputs at that time step and
○ Its state at the previous time step: h(t) = f(h(t–1), x(t))
● Its output at time step t, denoted y(t), is also a
○ Function of the previous state and the current inputs
Recurrent Neural Network
Memory Cells
● In the case of the basic cells we have discussed so far
○ The output is simply equal to the state
○ But in more complex cells this is not always the case
A cell’s hidden state and its output may be different
Recurrent Neural Network
Input and Output Sequences
Sequence-to-sequence Network
● An RNN can simultaneously take a
○ Sequence of inputs and
○ Produce a sequence of outputs
Recurrent Neural Network
Input and Output Sequences
Sequence-to-sequence Network
● This type of network is useful for predicting time series
○ Such as stock prices
● We feed it the prices over the last N days and
○ It must output the prices shifted by one day into the future
○ i.e., from N – 1 days ago to tomorrow
Recurrent Neural Network
Input and Output Sequences
Sequence-to-vector Network
● Alternatively we could feed the network a sequence of inputs and
○ Ignore all outputs except for the last one
Recurrent Neural Network
Input and Output Sequences
Sequence-to-vector Network
● We can feed this network a sequence of words
○ Corresponding to a movie review and
○ The network would output a sentiment score
○ e.g., from –1 [hate] to +1 [love]
Recurrent Neural Network
Input and Output Sequences
Vector-to-sequence Network
● We could feed the network a single input at the first time step and
○ Zeros for all other time steps and
○ Let it output a sequence
● For example, the input could be an image and the
○ Output could be a caption for the image
Recurrent Neural Network
Input and Output Sequences
Encoder-Decoder
● In this network, we have a
○ Sequence-to-vector network, called an encoder, followed by a
○ Vector-to-sequence network, called a decoder
Recurrent Neural Network
Input and Output Sequences
Encoder-Decoder
● This can be used for translating a sentence
○ From one language to another
● We feed the network a sentence in one language
○ The encoder converts this sentence into a single vector representation
○ Then the decoder decodes this vector into a sentence in another
language
Recurrent Neural Network
Input and Output Sequences
Encoder-Decoder
● This two-step model works much better than
○ Trying to translate on the fly with a
○ Single sequence-to-sequence RNN
● Since the last words of a sentence can affect the
○ First words of the translation
○ So we need to wait until we know the whole sentence
Recurrent Neural Network
Basic RNNs in TensorFlow
Recurrent Neural Network
Basic RNNs in TensorFlow
● Let’s implement a very simple RNN model
○ Without using any of TensorFlow’s RNN operations
○ To better understand what goes on under the hood
● Let’s create an RNN composed of a layer of five recurrent neurons
○ Using the tanh activation function,
○ Running over only two time steps, and
○ Taking input vectors of size 3 at each time step
Recurrent Neural Network
Basic RNNs in TensorFlow
● This network looks like a two-layer feedforward neural network with two
differences
○ The same weights and bias terms are shared by both layers and
○ We feed inputs at each layer, and we get outputs from each layer
Recurrent Neural Network
Basic RNNs in TensorFlow
● To run the model, we need to feed it the inputs at both time steps
● The mini-batch contains four instances
○ Each with an input sequence composed of exactly two inputs
Recurrent Neural Network
Basic RNNs in TensorFlow
● At the end, Y0_val and Y1_val contain the outputs of the network
○ At both time steps for all neurons and
○ All instances in the mini-batch
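For reference, a minimal sketch of what the “Manual RNN” notebook section contains (assuming TensorFlow 1.x; variable names and input values are illustrative):
>>> import numpy as np
>>> import tensorflow as tf
>>> n_inputs = 3   # input vector size at each time step
>>> n_neurons = 5  # recurrent neurons in the layer
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])  # inputs at t = 0
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])  # inputs at t = 1
>>> Wx = tf.Variable(tf.random_normal([n_inputs, n_neurons], dtype=tf.float32))
>>> Wy = tf.Variable(tf.random_normal([n_neurons, n_neurons], dtype=tf.float32))
>>> b = tf.Variable(tf.zeros([1, n_neurons], dtype=tf.float32))
>>> Y0 = tf.tanh(tf.matmul(X0, Wx) + b)                      # outputs at t = 0
>>> Y1 = tf.tanh(tf.matmul(Y0, Wy) + tf.matmul(X1, Wx) + b)  # outputs at t = 1
>>> init = tf.global_variables_initializer()
>>> X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]])  # t = 0
>>> X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]])  # t = 1
>>> with tf.Session() as sess:
        init.run()
        Y0_val, Y1_val = sess.run([Y0, Y1],
                                  feed_dict={X0: X0_batch, X1: X1_batch})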
Recurrent Neural Network
Check out the complete code under the “Manual RNN” section in the notebook
Recurrent Neural Network
Static Unrolling Through Time
● Let’s look at how to create the same model
○ Using TensorFlow’s RNN operations
● The static_rnn() function creates
○ An unrolled RNN network by chaining cells
● The code below creates the exact same model as the previous one
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● First we create the input placeholders
Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● Then we create a BasicRNNCell
○ It is like a factory that creates
○ Copies of the cell to build the unrolled RNN
■ One for each time step
Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● Then we call static_rnn(), giving it the cell factory and the input
tensors
● And telling it the data type of the inputs
○ This is used to create the initial state matrix
○ Which by default is full of zeros
Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● The static_rnn() function returns two objects
● The first is a Python list containing the output tensors for each time step
● The second is a tensor containing the final states of the network
● When we use basic cells
○ Then the final state is equal to the last output
Recurrent Neural Network
Static Unrolling Through Time
Check out the complete code under the “Using static_rnn()” section in the notebook
Recurrent Neural Network
Static Unrolling Through Time
● In the previous example, if there were 50 time steps then
○ It would not be convenient to define
○ 50 placeholders and 50 output tensors
● Moreover, at execution time we would have to feed
○ Each of the 50 placeholders and manipulate the 50 outputs
● Let’s do it in a better way
Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● The above code takes a single input placeholder of
○ shape [None, n_steps, n_inputs]
○ Where the first dimension is the mini-batch size
Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● Then it extracts the list of input sequences for each time step
● X_seqs is a Python list of n_steps tensors of shape [None, n_inputs]
○ Where first dimension is the minibatch size
Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● To do this, we first swap the first two dimensions
○ Using the transpose() function so that the
○ Time steps are now the first dimension
Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● Then we extract a Python list of tensors along the first dimension
○ i.e., one tensor per time step
○ Using the unstack() function
Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● The next two lines are the same as before
Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● Finally, we merge all the output tensors into a single tensor
○ Using the stack() function
● And then we swap the first two dimensions to get a
○ Final outputs tensor of shape [None, n_steps, n_neurons]
Recurrent Neural Network
Static Unrolling Through Time
● Now we can run the network by
○ Feeding it a single tensor that contains
○ All the mini-batch sequences
Recurrent Neural Network
Static Unrolling Through Time
● And then we get a single outputs_val tensor for
○ All instances
○ All time steps, and
○ All neurons
Recurrent Neural Network
Static Unrolling Through Time
Check out the complete code under the “Packing sequences” section in the notebook
Recurrent Neural Network
Static Unrolling Through Time
● The previous approach still builds a graph
○ Containing one cell per time step
● If there were 50 time steps, the graph would look ugly
● It is like writing a program without using for loops
○ Y0=f(0, X0); Y1=f(Y0, X1); Y2=f(Y1, X2); ...; Y50=f(Y49, X50)
● With such a large graph
○ Since it must store all tensor values during the forward pass
○ So it can use them to compute gradients during the reverse pass
○ We may get out-of-memory (OOM) errors during backpropagation
○ Especially on GPU cards, because of their limited memory
Recurrent Neural Network
Dynamic Unrolling Through Time
Let’s look at a better solution than the previous
approach, using the dynamic_rnn() function
Recurrent Neural Network
Dynamic Unrolling Through Time
● The dynamic_rnn() function uses a while_loop() operation to
○ Run over the cell the appropriate number of times
● We can set swap_memory=True
○ If we want it to swap the GPU’s memory to the CPU’s memory
○ During backpropagation, to avoid out-of-memory errors
● It also accepts a single tensor for
○ All inputs at every time step (shape [None, n_steps, n_inputs]) and
○ It outputs a single tensor for all outputs at every time step
■ (shape [None, n_steps, n_neurons])
○ There is no need to stack, unstack, or transpose
Recurrent Neural Network
Dynamic Unrolling Through Time
RNN using dynamic_rnn
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
dtype=tf.float32)
Recurrent Neural Network
Dynamic Unrolling Through Time
Check out the complete code under the “Using dynamic_rnn()” section in the notebook
Recurrent Neural Network
Dynamic Unrolling Through Time
Note
● During backpropagation
○ The while_loop() operation does the appropriate magic
○ It stores the tensor values for each iteration during the forward pass
○ So it can use them to compute gradients during the reverse pass
Recurrent Neural Network
Handling Variable Length Input Sequences
● So far we have used only fixed-size input sequences
● What if the input sequences have variable lengths (e.g., sentences)?
● In this case we should set the sequence_length parameter
○ When calling the dynamic_rnn() function
○ It must be a 1D tensor indicating the length of the
○ Input sequence for each instance
Recurrent Neural Network
Handling Variable Length Input Sequences
● Suppose the second input sequence contains
○ Only one input instead of two
○ Then it must be padded with a zero vector
○ In order to fit in the input tensor X
Recurrent Neural Network
Handling Variable Length Input Sequences
● Now we need to feed values for both placeholders X and seq_length
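A minimal sketch of how this might look (assuming the same X and basic_cell as in the dynamic_rnn() example, plus an init = tf.global_variables_initializer(); the input values are illustrative):
>>> seq_length = tf.placeholder(tf.int32, [None])
>>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32,
                                        sequence_length=seq_length)
>>> X_batch = np.array([
        # step 0      step 1
        [[0, 1, 2], [9, 8, 7]],  # instance 0
        [[3, 4, 5], [0, 0, 0]],  # instance 1 (padded with a zero vector)
        [[6, 7, 8], [6, 5, 4]],  # instance 2
        [[9, 0, 1], [3, 2, 1]],  # instance 3
    ])
>>> seq_length_batch = np.array([2, 1, 2, 2])  # actual length of each sequence
>>> with tf.Session() as sess:
        init.run()
        outputs_val, states_val = sess.run(
            [outputs, states],
            feed_dict={X: X_batch, seq_length: seq_length_batch})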
Recurrent Neural Network
Handling Variable Length Input Sequences
● Now the RNN outputs zero vectors for
○ Every time step past the input sequence length
○ Look at the second instance’s output for the second time step
Recurrent Neural Network
Handling Variable Length Input Sequences
● Moreover, the states tensor contains the final state of each cell
○ Excluding the zero vectors
Recurrent Neural Network
Handling Variable Length Input Sequences
Check out the complete code under the “Setting the sequence lengths” section in the notebook
Recurrent Neural Network
Handling Variable-Length Output Sequences
● What if the output sequences have variable lengths
● If we know in advance what length each sequence will have
○ For example if we know that it will be the same length as the input
sequence
○ Then we can set the sequence_length parameter as discussed
● Unfortunately, in general this will not be possible
○ For example,
■ The length of a translated sentence is generally different from the
■ Length of the input sentence
Recurrent Neural Network
Handling Variable-Length Output Sequences
● In this case, the most common solution is to define
○ A special output called an end-of-sequence token (EOS token)
● Any output past the EOS token should be ignored - We will discuss this in
detail later
Recurrent Neural Network
So far we have learned how to build an RNN.
But how do we train it?
Recurrent Neural Network
Training RNNs
Recurrent Neural Network
Training RNNs
● To train an RNN, the trick is to unroll it through time and
then simply use regular backpropagation
● This strategy is called backpropagation through time
(BPTT)
Recurrent Neural Network
Training RNNs
Understanding how RNNs are trained
Just like in regular backpropagation, there is a first forward pass
through the unrolled network, represented by the dashed
arrows
Recurrent Neural Network
Training RNNs
Understanding how RNNs are trained
Then the output sequence is evaluated using a cost function
C(Y(tmin), ..., Y(tmax)), where tmin and tmax are the first
and last output time steps, not counting the ignored outputs
Recurrent Neural Network
Then the gradients of that cost function are propagated
backward through the unrolled network, represented by the
solid arrows
Training RNNs
Understanding how RNNs are trained
Recurrent Neural Network
And finally the model parameters are updated using the
gradients computed during BPTT
Training RNNs
Understanding how RNNs are trained
Recurrent Neural Network
Note that the gradients flow backward through all the outputs
used by the cost function, not just through the final output
Training RNNs
Understanding how RNNs are trained
Recurrent Neural Network
Here, the cost function is computed using the last three outputs
of the network, Y(2), Y(3), and Y(4), so gradients flow through
these three outputs, but not through Y(0) and Y(1)
Training RNNs
Understanding how RNNs are trained
Recurrent Neural Network
Moreover, since the same parameters W and b are used at
each time step, backpropagation will do the right thing and sum
over all time steps
Training RNNs
Understanding how RNNs are trained
Recurrent Neural Network
Training a Sequence Classifier
Let’s train an RNN to classify MNIST images
Recurrent Neural Network
Training a Sequence Classifier
● A convolutional neural network would be better suited for image
classification
● But this makes for a simple example that we are already familiar with
Recurrent Neural Network
Training a Sequence Classifier
Overview of the task
● We will treat each image as a sequence of 28 rows of 28 pixels each,
since each MNIST image is 28 × 28 pixels
● We will use cells of 150 recurrent neurons, plus a fully connected
layer containing 10 neurons, one per class, connected to the output of
the last time step
● This will be followed by a softmax layer
Recurrent Neural Network
Overview of the task
Training a Sequence Classifier
Recurrent Neural Network
Construction Phase
● The construction phase is quite straightforward
● It’s pretty much the same as the MNIST classifier we built previously,
except that an unrolled RNN replaces the hidden layers
● Note that the fully connected layer is connected to the states tensor,
which contains only the final state of the RNN i.e., the 28th output
Training a Sequence Classifier
Recurrent Neural Network
Construction Phase
>>> from tensorflow.contrib.layers import fully_connected
>>> n_steps = 28
>>> n_inputs = 28
>>> n_neurons = 150
>>> n_outputs = 10
>>> learning_rate = 0.001
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> y = tf.placeholder(tf.int32, [None])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
dtype=tf.float32)
Training a Sequence Classifier
Run it on Notebook
Recurrent Neural Network
Construction Phase
>>> logits = fully_connected(states, n_outputs, activation_fn=None)
>>> xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=y, logits=logits)
>>> loss = tf.reduce_mean(xentropy)
>>> optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
>>> training_op = optimizer.minimize(loss)
>>> correct = tf.nn.in_top_k(logits, y, 1)
>>> accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
>>> init = tf.global_variables_initializer()
Training a Sequence Classifier
Run it on Notebook
Recurrent Neural Network
Load the MNIST data and reshape it
Now we will load the MNIST data and reshape the test data to [batch_size,
n_steps, n_inputs] as is expected by the network
>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets("data/mnist/")
>>> X_test = mnist.test.images.reshape((-1, n_steps, n_inputs))
>>> y_test = mnist.test.labels
Training a Sequence Classifier
Run it on Notebook
Recurrent Neural Network
Training the RNN
We reshape each training batch before feeding it to the network
>>> n_epochs = 100
>>> batch_size = 150
>>> with tf.Session() as sess:
        init.run()
        for epoch in range(n_epochs):
            for iteration in range(mnist.train.num_examples // batch_size):
                X_batch, y_batch = mnist.train.next_batch(batch_size)
                X_batch = X_batch.reshape((-1, n_steps, n_inputs))
                sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
            acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
            print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
Training a Sequence Classifier
Run it on Notebook
Recurrent Neural Network
The Output
The output should look like this:
0 Train accuracy: 0.713333 Test accuracy: 0.7299
1 Train accuracy: 0.766667 Test accuracy: 0.7977
...
98 Train accuracy: 0.986667 Test accuracy: 0.9777
99 Train accuracy: 0.986667 Test accuracy: 0.9809
Training a Sequence Classifier
Recurrent Neural Network
Conclusion
● We get over 98% accuracy — not bad!
● Plus we would certainly get a better result by
○ Tuning the hyperparameters
○ Initializing the RNN weights using He initialization
○ Training longer
○ Or adding a bit of regularization e.g., dropout
Training a Sequence Classifier
Recurrent Neural Network
Training to Predict Time Series
Now, we will train an RNN to predict the next value in a
generated time series
Recurrent Neural Network
Training to Predict Time Series
● Each training instance is a randomly selected sequence of 20 consecutive
values from the time series
● And the target sequence is the same as the input sequence, except it is
shifted by one time step into the future
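One possible way to generate such training batches, as a sketch (the synthetic series and the next_batch() helper are assumptions for illustration; the notebook may generate its data differently):
>>> import numpy as np
>>> t_min, t_max = 0, 30
>>> resolution = 0.1
>>> def time_series(t):
        # an arbitrary synthetic signal used for illustration
        return t * np.sin(t) / 3 + 2 * np.sin(t * 5)
>>> def next_batch(batch_size, n_steps):
        # random starting points, then n_steps + 1 consecutive values each
        t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
        Ts = t0 + np.arange(0., n_steps + 1) * resolution
        ys = time_series(Ts)
        # inputs: the first n_steps values; targets: the same values shifted by one step
        return (ys[:, :-1].reshape(-1, n_steps, 1),
                ys[:, 1:].reshape(-1, n_steps, 1))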
Recurrent Neural Network
Training to Predict Time Series
Construction Phase
● It will contain 100 recurrent neurons and we will unroll it over 20
time steps since each training instance will be 20 inputs long
● Each input will contain only one feature, the value at that time
● The targets are also sequences of 20 inputs, each containing a single
value
Recurrent Neural Network
Construction Phase
>>> n_steps = 20
>>> n_inputs = 1
>>> n_neurons = 100
>>> n_outputs = 1
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
>>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons,
activation=tf.nn.relu)
>>> outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
Training to Predict Time Series
Run it on Notebook
Recurrent Neural Network
Construction Phase
● At each time step we now have an output vector of size 100
● But what we actually want is a single output value at each time step
● The simplest solution is to wrap the cell in an
OutputProjectionWrapper
Training to Predict Time Series
Recurrent Neural Network
Construction Phase
● A cell wrapper acts like a normal cell, proxying every method call to an
underlying cell, but it also adds some functionality
● The OutputProjectionWrapper adds a fully connected layer of linear
neurons i.e., without any activation function on top of each output,
but it does not affect the cell state
● All these fully connected layers share the same trainable weights and bias
terms.
Training to Predict Time Series
Recurrent Neural Network
RNN cells using output projections
Training to Predict Time Series
Recurrent Neural Network
Wrapping a cell is quite easy
Let’s tweak the preceding code by wrapping the BasicRNNCell into an
OutputProjectionWrapper
>>> cell = tf.contrib.rnn.OutputProjectionWrapper(
        tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),
        output_size=n_outputs)
Training to Predict Time Series
Run it on Notebook
Recurrent Neural Network
Cost Function and Optimizer
● Now we will define the cost function
● We will use the Mean Squared Error (MSE)
● Next we will create an Adam optimizer, the training op, and the variable
initialization op
>>> learning_rate = 0.001
>>> loss = tf.reduce_mean(tf.square(outputs - y))
>>> optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
>>> training_op = optimizer.minimize(loss)
>>> init = tf.global_variables_initializer()
Training to Predict Time Series
Run it on Notebook
Recurrent Neural Network
Execution Phase
>>> n_iterations = 10000
>>> batch_size = 50
>>> with tf.Session() as sess:
        init.run()
        for iteration in range(n_iterations):
            X_batch, y_batch = [...] # fetch the next training batch
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
            if iteration % 100 == 0:
                mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
                print(iteration, "\tMSE:", mse)
Training to Predict Time Series
Run it on Notebook
Recurrent Neural Network
Execution Phase
The program’s output should look like this
0 MSE: 379.586
100 MSE: 14.58426
200 MSE: 7.14066
300 MSE: 3.98528
400 MSE: 2.00254
[...]
Training to Predict Time Series
Recurrent Neural Network
Making Predictions
Once the model is trained, you can make predictions:
>>> X_new = [...] # New sequences
>>> y_pred = sess.run(outputs, feed_dict={X: X_new})
Training to Predict Time Series
Recurrent Neural Network
Making Predictions
Training to Predict Time Series
The figure shows the predicted sequence for an instance, after 1,000 training iterations
Recurrent Neural Network
● Using an OutputProjectionWrapper is the simplest solution to reduce the
dimensionality of the RNN’s output sequences down to just one value per
time step per instance
● But it is not the most efficient
Training to Predict Time Series
Recurrent Neural Network
● There is a trickier but more efficient solution:
○ We can reshape the RNN outputs from [batch_size, n_steps,
n_neurons] to [batch_size * n_steps, n_neurons]
○ Then apply a single fully connected layer with the appropriate output
size in our case just 1, which will result in an output tensor of shape
[batch_size * n_steps, n_outputs]
○ And then reshape this tensor to [batch_size, n_steps, n_outputs]
Training to Predict Time Series
Recurrent Neural Network
Reshape the RNN outputs from [batch_size, n_steps, n_neurons] to
[batch_size * n_steps, n_neurons]
Training to Predict Time Series
Recurrent Neural Network
Apply a single fully connected layer with the appropriate output size (in our
case just 1), which will result in an output tensor of shape
[batch_size * n_steps, n_outputs]
Training to Predict Time Series
Recurrent Neural Network
And then reshape this tensor to [batch_size, n_steps, n_outputs]
Training to Predict Time Series
Recurrent Neural Network
Let’s implement this solution
● We first revert to a basic cell, without the OutputProjectionWrapper
>>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons,
activation=tf.nn.relu)
>>> rnn_outputs, states = tf.nn.dynamic_rnn(cell, X,
dtype=tf.float32)
Training to Predict Time Series
Run it on Notebook
Recurrent Neural Network
Let’s implement this solution
● Then we stack all the outputs using the reshape() operation, apply the
fully connected linear layer without any activation function (this is just a
projection), and finally unstack all the outputs, again using reshape()
>>> stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
>>> stacked_outputs = fully_connected(stacked_rnn_outputs,
n_outputs, activation_fn=None)
>>> outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
Training to Predict Time Series
Run it on Notebook
Recurrent Neural Network
Let’s implement this solution
● The rest of the code is the same as earlier. This can provide a significant
speed boost since there is just one fully connected layer instead of one
per time step.
Training to Predict Time Series
Recurrent Neural Network
Creative RNN
Let’s use our model to generate some creative sequences
Recurrent Neural Network
Creative RNN
● All we need is to provide it a seed sequence containing n_steps
values e.g., full of zeros
● Use the model to predict the next value
● Append this predicted value to the sequence
● Feed the last n_steps values to the model to predict the next value
● And so on
This process generates a new sequence that has some resemblance to the
original time series
Recurrent Neural Network
Creative RNN
>>> sequence = [0.] * n_steps
>>> for iteration in range(300):
        X_batch = np.array(sequence[-n_steps:]).reshape(1, n_steps, 1)
        y_pred = sess.run(outputs, feed_dict={X: X_batch})
        sequence.append(y_pred[0, -1, 0])
Run it on Notebook
Recurrent Neural Network
Creative RNN
Creative sequences seeded with zeros
Recurrent Neural Network
Creative RNN
Creative sequences seeded with an instance
Recurrent Neural Network
Deep RNNs
Recurrent Neural Network
Deep RNNs
● It is quite common to stack multiple layers of cells.
● This gives you a Deep RNN
A Deep RNN
Recurrent Neural Network
Deep RNNs
Deep RNN unrolled through time
Recurrent Neural Network
Deep RNNs
How to implement Deep RNN in TensorFlow
Recurrent Neural Network
● To implement a deep RNN in TensorFlow
● We can create several cells and stack them into a MultiRNNCell
● In the following code we stack three identical cells
>>> n_neurons = 100
>>> n_layers = 3
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell([basic_cell] *
n_layers)
>>> outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
Deep RNNs - Implementation in TensorFlow
Run it on Notebook
Recurrent Neural Network
>>> outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
● The states variable is a tuple containing one tensor per layer, each
representing the final state of that layer’s cell with shape [batch_size,
n_neurons]
● If you set state_is_tuple=False when creating the MultiRNNCell,
then states becomes a single tensor containing the states from every
layer, concatenated along the column axis i.e., its shape is [batch_size,
n_layers * n_neurons]
Deep RNNs - Implementation in TensorFlow
Recurrent Neural Network
● If you build a very deep RNN, it may end up overfitting the training set
● To prevent that, a common technique is to apply dropout
● You can simply add a dropout layer before or after the RNN as usual
● But if you also want to apply dropout between the RNN layers, you need
to use a DropoutWrapper
Deep RNNs - Applying Dropout
Recurrent Neural Network
● The following code applies dropout to the inputs of each layer in the
RNN, dropping each input with a 50% probability
>>> keep_prob = 0.5
>>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> cell_drop = tf.contrib.rnn.DropoutWrapper(cell,
input_keep_prob=keep_prob)
>>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell([cell_drop] *
n_layers)
>>> rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
Deep RNNs - Applying Dropout
Run it on Notebook
Recurrent Neural Network
● It is also possible to apply dropout to the outputs by setting
output_keep_prob
● The main problem with this code is that it will apply dropout not only
during training but also during testing, which is not what we want
● Since dropout should be applied only during training
Deep RNNs - Applying Dropout
Recurrent Neural Network
● Unfortunately, the DropoutWrapper does not support an is_training
placeholder
● So we must either write our own dropout wrapper class, or have two
different graphs:
○ One for training
○ And the other for testing
Let’s implement the second option
Deep RNNs - Applying Dropout
Recurrent Neural Network
>>> import sys
>>> is_training = (sys.argv[-1] == "train")
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
>>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> if is_training:
        cell = tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)
>>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell([cell] * n_layers)
>>> rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
                                            dtype=tf.float32)
[...] # build the rest of the graph
>>> init = tf.global_variables_initializer()
>>> saver = tf.train.Saver()
>>> with tf.Session() as sess:
        if is_training:
            init.run()
            for iteration in range(n_iterations):
                [...] # train the model
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        else:
            saver.restore(sess, "/tmp/my_model.ckpt")
            [...] # use the model
Run it on Notebook
Deep RNNs - Applying Dropout
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● To train an RNN on long sequences, we will need to run it over many
time steps, making the unrolled RNN a very deep network
● Just like any deep neural network it may suffer from the
vanishing/exploding gradients problem and take forever to train
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● Many of the tricks we discussed to alleviate this problem can be used for
deep unrolled RNNs as well:
○ good parameter initialization,
○ nonsaturating activation functions e.g., ReLU
○ Batch Normalization,
○ Gradient Clipping,
○ And faster optimizers
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● However, if the RNN needs to handle even moderately long sequences
e.g., 100 inputs, then training will still be very slow
● The simplest and most common solution to this problem is to unroll the
RNN only over a limited number of time steps during training
● This is called truncated backpropagation through time
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● In TensorFlow you can implement truncated backpropagation
through time simply by truncating the input sequences
● For example, in the time series prediction problem, you would simply
reduce n_steps during training
● The problem with this is that the model will not be able to learn
long-term patterns
How can we solve this problem?
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● One workaround could be to make sure that these shortened sequences
contain both old and recent data
● So that the model can learn to use both
● E.g., the sequence could contain monthly data for the last five months,
then weekly data for the last five weeks, then daily data over the last five
days
● But this workaround has its limits:
○ What if fine-grained data from last year is actually useful?
○ What if there was a brief but significant event that absolutely must be
taken into account, even years later?
○ E.g., the result of an election
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● Besides the long training time
○ A second problem faced by long-running RNNs is the fact that the
memory of the first inputs gradually fades away
○ Indeed, due to the transformations that the data goes through when
traversing an RNN, some information is lost after each time step.
● After a while, the RNN’s state contains virtually no trace of the first
inputs
Let’s understand this with an example
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● Say you want to perform sentiment analysis on a long review that starts
with the four words “I loved this movie,”
● But the rest of the review lists the many things that could have made the
movie even better
● If the RNN gradually forgets the first four words, it will completely
misinterpret the review
Deep RNNs
Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● To solve this problem, various types of cells with long-term memory have
been introduced
● They have proved so successful that the basic cells are not much used
anymore
Let’s study about these long memory cells
Deep RNNs
Recurrent Neural Network
LSTM Cell
Recurrent Neural Network
● The Long Short-Term Memory (LSTM) cell was proposed in 1997 by
Sepp Hochreiter and Jürgen Schmidhuber
● And it was gradually improved over the years by several researchers,
such as Alex Graves, Haşim Sak, Wojciech Zaremba, and many more
LSTM Cell
Sepp Hochreiter Jürgen Schmidhuber
Recurrent Neural Network
LSTM Cell
● If you consider the LSTM cell as a black box, it can be used very much
like a basic cell
● Except
○ It will perform much better
○ Training will converge faster
○ And it will detect long-term dependencies in the data
In TensorFlow, you can simply use a BasicLSTMCell instead of a
BasicRNNCell
>>> lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons)
Recurrent Neural Network
LSTM Cell
● LSTM cells manage two state vectors, and for performance reasons they
are kept separate by default
● We can change this default behavior by setting state_is_tuple=False
when creating the BasicLSTMCell
Recurrent Neural Network
LSTM Cell
The architecture of a basic LSTM cell
Recurrent Neural Network
● The LSTM cell looks exactly like a regular cell, except that its state is
split in two vectors: h(t) and c(t) (here “c” stands for “cell”)
LSTM Cell
Recurrent Neural Network
● We can think of h(t) as the short-term state and c(t) as the long-term state
LSTM Cell
Recurrent Neural Network
Understanding the LSTM cell structure
● The key idea is that the network can learn
○ What to store in the long-term state,
○ What to throw away,
○ And what to read from it
LSTM Cell
Recurrent Neural Network
As the long-term state c(t–1) traverses the network from left to right, it first
goes through a forget gate, dropping some memories
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
Understanding the LSTM cell structure
LSTM Cell
And then it adds some new memories via the addition operation, which adds
the memories that were selected by an input gate
Recurrent Neural Network
The result c(t) is sent straight out, without any further transformation. So, at
each time step, some memories are dropped and some memories are added
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
Moreover, after the addition operation, the long-term state is copied and
passed through the tanh function, and then the result is filtered by the
output gate
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
This produces the short-term state h(t), which is equal to the cell’s output
for this time step, y(t)
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
Now let’s look at where new memories come from and how the
gates work
LSTM Cell
Recurrent Neural Network
First, the current input vector x(t) and the previous short-term state h(t–1)
are fed to four different fully connected layers. They all serve a different
purpose
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
The main layer is the one that outputs g(t). It has the usual role of analyzing
the current inputs x(t) and the previous short-term state h(t–1). In an LSTM
cell this layer’s output is partially stored in the long-term state.
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
The three other layers are gate controllers. Since they use the logistic
activation function, their outputs range from 0 to 1.
Understanding the LSTM cell structure
LSTM Cell
Recurrent Neural Network
● This summarizes how to compute the cell’s long-term state, its short-term
state, and its output at each time step for a single instance
● The equations for a whole mini-batch are very similar
Understanding the LSTM cell structure
LSTM Cell
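The equations themselves appeared as an image on the slide; here is the standard single-instance formulation (σ is the logistic function, ⊗ denotes element-wise multiplication, and the W and b terms are the weights and biases of the four fully connected layers):

$$
\begin{aligned}
\mathbf{i}_{(t)} &= \sigma\left(\mathbf{W}_{xi}^\top \mathbf{x}_{(t)} + \mathbf{W}_{hi}^\top \mathbf{h}_{(t-1)} + \mathbf{b}_i\right) \\
\mathbf{f}_{(t)} &= \sigma\left(\mathbf{W}_{xf}^\top \mathbf{x}_{(t)} + \mathbf{W}_{hf}^\top \mathbf{h}_{(t-1)} + \mathbf{b}_f\right) \\
\mathbf{o}_{(t)} &= \sigma\left(\mathbf{W}_{xo}^\top \mathbf{x}_{(t)} + \mathbf{W}_{ho}^\top \mathbf{h}_{(t-1)} + \mathbf{b}_o\right) \\
\mathbf{g}_{(t)} &= \tanh\left(\mathbf{W}_{xg}^\top \mathbf{x}_{(t)} + \mathbf{W}_{hg}^\top \mathbf{h}_{(t-1)} + \mathbf{b}_g\right) \\
\mathbf{c}_{(t)} &= \mathbf{f}_{(t)} \otimes \mathbf{c}_{(t-1)} + \mathbf{i}_{(t)} \otimes \mathbf{g}_{(t)} \\
\mathbf{y}_{(t)} &= \mathbf{h}_{(t)} = \mathbf{o}_{(t)} \otimes \tanh\left(\mathbf{c}_{(t)}\right)
\end{aligned}
$$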
Recurrent Neural Network
Conclusion
● An LSTM cell can learn to
○ Recognize an important input, that’s the role of the input gate,
○ Store it in the long-term state,
○ Learn to preserve it for as long as it is needed, that’s the role of the
forget gate,
○ And learn to extract it whenever it is needed
This explains why they have been amazingly successful at capturing
long-term patterns in time series, long texts, audio recordings, and more.
LSTM Cell
Recurrent Neural Network
Peephole Connections
● In a basic LSTM cell, the gate controllers can look only at the input x(t)
and the previous short-term state h(t–1)
● It may be a good idea to give them a bit more context by letting them
peek at the long-term state as well
● This idea was proposed by Felix Gers and Jürgen Schmidhuber in
2000
Recurrent Neural Network
● They proposed an LSTM variant with extra connections called
peephole connections:
○ The previous long-term state c(t–1) is added as an input to the
controllers of the forget gate and the input gate,
○ And the current long-term state c(t) is added as an input to the
controller of the output gate.
Peephole Connections
Recurrent Neural Network
Peephole Connections
Recurrent Neural Network
Peephole Connections
To implement peephole connections in TensorFlow, you must use the
LSTMCell instead of the BasicLSTMCell and set use_peepholes=True:
>>> lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_neurons,
use_peepholes=True)
There are many other variants of the LSTM cell.
One particularly popular variant is the GRU cell, which we will look at now.
Recurrent Neural Network
GRU Cell
Recurrent Neural Network
GRU Cell
The Gated Recurrent Unit (GRU) cell was proposed by Kyunghyun Cho
et al. in a 2014 paper that also introduced the Encoder–Decoder network
we discussed earlier
Kyunghyun Cho
Recurrent Neural Network
GRU Cell
● The GRU cell is a simplified version of the LSTM cell
● It seems to perform just as well
● This explains its growing popularity
Recurrent Neural Network
GRU Cell
The main simplifications are:
● Both state vectors are merged into a single vector h(t)
Recurrent Neural Network
The main simplifications are:
● A single gate controller controls both the forget gate and the input gate.
If the gate controller outputs a 1, the input gate is open and the forget
gate is closed.
GRU Cell
Recurrent Neural Network
The main simplifications are:
If it outputs a 0, the opposite happens.
In other words, whenever a memory must be stored, the location where it
will be stored is erased first. This is actually a frequent variant of the LSTM
cell in and of itself.
GRU Cell
Recurrent Neural Network
The main simplifications are:
● There is no output gate; the full state vector is output at every time step.
There is a new gate controller that controls which part of the previous
state will be shown to the main layer.
GRU Cell
Recurrent Neural Network
Equations to compute the GRU cell’s state at each time step for a single
instance
GRU Cell
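The equations appeared as an image; here is a standard formulation consistent with the description above (z(t) is the single gate controller, r(t) the new gate controller that filters the previous state; the notation mirrors the LSTM equations):

$$
\begin{aligned}
\mathbf{z}_{(t)} &= \sigma\left(\mathbf{W}_{xz}^\top \mathbf{x}_{(t)} + \mathbf{W}_{hz}^\top \mathbf{h}_{(t-1)} + \mathbf{b}_z\right) \\
\mathbf{r}_{(t)} &= \sigma\left(\mathbf{W}_{xr}^\top \mathbf{x}_{(t)} + \mathbf{W}_{hr}^\top \mathbf{h}_{(t-1)} + \mathbf{b}_r\right) \\
\mathbf{g}_{(t)} &= \tanh\left(\mathbf{W}_{xg}^\top \mathbf{x}_{(t)} + \mathbf{W}_{hg}^\top \left(\mathbf{r}_{(t)} \otimes \mathbf{h}_{(t-1)}\right) + \mathbf{b}_g\right) \\
\mathbf{h}_{(t)} &= \left(1 - \mathbf{z}_{(t)}\right) \otimes \mathbf{h}_{(t-1)} + \mathbf{z}_{(t)} \otimes \mathbf{g}_{(t)}
\end{aligned}
$$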
Recurrent Neural Network
Implementing GRU cell in TensorFlow
>>> gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons)
● LSTM or GRU cells are one of the main reasons behind the success of
RNNs in recent years
● In particular for applications in natural language processing (NLP)
GRU Cell
Recurrent Neural Network
Natural Language Processing
Recurrent Neural Network
Natural Language Processing
● Most of the state-of-the-art NLP applications, such as
○ Machine translation,
○ Automatic summarization,
○ Parsing,
○ Sentiment analysis,
○ and more, are now based on RNNs
Now we will take a quick look at what a machine translation model looks
like.
This topic is very well covered by TensorFlow’s awesome Word2Vec and
Seq2Seq tutorials, so you should definitely check them out
Recurrent Neural Network
Natural Language Processing - Word Representation
Before we start, we need to answer this important question
How do we represent a “word” ??
Recurrent Neural Network
Natural Language Processing - Word Representation
In order to apply algorithms,
we need to convert everything into numbers.
What can we do about the climate column?

temp | climate | comments
12   | Cold    | Very nice place to visit in summers
30   | Hot     | Do not visit. This is a trap
Recurrent Neural Network
Natural Language Processing - Word Representation
In order to apply algorithms,
We need to convert everything in numbers.
What can we do about climate?
We can convert it into One-Hot vector
temp climate comments
12 Cold Very nice place to
visit in summers
30 Hot Do not visit. This
is a trap
temp climate_cold climate_hot comments
12 1 0 Very nice place
to visit in
summers
30 0 1 Do not visit.
This is a trap
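As a quick sketch of this one-hot conversion (using pandas purely for illustration; it is not part of the original slides):
>>> import pandas as pd
>>> df = pd.DataFrame({
        "temp": [12, 30],
        "climate": ["Cold", "Hot"],
        "comments": ["Very nice place to visit in summers",
                     "Do not visit. This is a trap"],
    })
>>> # one-hot encode the categorical 'climate' column into climate_Cold / climate_Hot
>>> print(pd.get_dummies(df, columns=["climate"]))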
Recurrent Neural Network
Natural Language Processing - Word Representation
In order to apply algorithms,
We need to convert everything in numbers.
And what can we do about comments?
temp climate comments
12 Cold Very nice place to
visit in summers
30 Hot Do not visit. This
is a trap
Recurrent Neural Network
One option could be to represent each word using a one-hot vector.
But consider this :
● Suppose your vocabulary contains 50,000 words
● Then the nth word would be represented as a 50,000-dimensional
vector, full of 0s except for a 1 at the nth position
● However, with such a large vocabulary, this sparse representation would
not be efficient at all
Natural Language Processing - Word Representation
Recurrent Neural Network
● Ideally, we want similar words to have similar representations,
making it easy for the model to generalize what it learns about a word to
all similar words
● For example,
○ If the model is told that “I drink milk” is a valid sentence, and if it
knows that “milk” is close to “water” but far from “shoes”
○ Then it will know that “I drink water” is probably a valid sentence
as well
○ While “I drink shoes” is probably not
But how can you come up with such a meaningful representation?
Natural Language Processing - Word Representation
Recurrent Neural Network
● The most common solution is to represent each word in the vocabulary
using a fairly small and dense vector e.g., 150 dimensions, called an
Embedding
● And just let the neural network learn a good embedding for each word
during training
Natural Language Processing - Word Embedding
Recurrent Neural Network
With word embeddings, a lot of magic is possible:
king - man + woman == queen
Natural Language Processing - Word Embedding
Recurrent Neural Network
from gensim.models import KeyedVectors
# load the google word2vec model
filename = 'GoogleNews-vectors-negative300.bin'
model = KeyedVectors.load_word2vec_format(filename, binary=True)
# calculate: (king - man) + woman = ?
result = model.most_similar(positive=['woman', 'king'],
negative=['man'], topn=1)
print(result)
Word Embedding - word2vec
● Based on the contexts in which words appear, people have trained such vectors.
● One such set of pre-trained vectors is word2vec; another is GloVe
[('queen', 0.7118192315101624)]
Recurrent Neural Network
Word Embedding - Vector space models (VSMs)
Based on the Distributional Hypothesis:
○ words that appear in the same contexts share semantic meaning.
Two Approaches:
1. Count-based methods (e.g. Latent Semantic Analysis)
2. Predictive methods (e.g. neural probabilistic language models)
Recurrent Neural Network
Word Embedding - word2vec - Approaches
1. Count-based methods (e.g. Latent Semantic Analysis)
○ Compute the statistics of how often some word co-occurs with its
neighbor words in a large text corpus
○ Map these count-statistics down to a small, dense vector for each
word
Recurrent Neural Network
2. Predictive models
○ Directly try to predict a word from its neighbors
○ in terms of learned small, dense embedding vectors
○ (considered parameters of the model).
Word Embedding - word2vec - Approaches
Recurrent Neural Network
Computationally-efficient predictive model
for learning word embeddings from raw text.
word2vec
Comes in two flavors:
1. Continuous Bag-of-Words model (CBOW)
2. Skip-Gram model
Recurrent Neural Network
Computationally-efficient predictive model
for learning word embeddings from raw text.
word2vec
Comes in two flavors:
1. Continuous Bag-of-Words model (CBOW)
○ predicts target words (e.g. 'mat') from source context words
○ e.g ('the cat sits on the'),
2. Skip-Gram model
Recurrent Neural Network
Computationally-efficient predictive model
for learning word embeddings from raw text.
word2vec
Comes in two flavors:
1. Continuous Bag-of-Words model (CBOW)
○ predicts target words (e.g. 'mat') from source context words
○ e.g ('the cat sits on the'),
2. Skip-Gram model
○ Predicts source context-words from the target words
○ Treats each context-target pair as a new observation
○ Tends to do better when we have larger datasets.
○ Will focus on this
Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
where score(wt, h) computes the compatibility of word wt with the context h (a dot product is
commonly used). We train this model by maximizing its log-likelihood, i.e.
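The two equations referred to above appeared as images; reconstructed here in the form used by the TensorFlow word2vec tutorial referenced earlier (V is the vocabulary):

$$ P(w_t \mid h) = \mathrm{softmax}\big(\mathrm{score}(w_t, h)\big) = \frac{\exp\big(\mathrm{score}(w_t, h)\big)}{\sum_{w' \in V} \exp\big(\mathrm{score}(w', h)\big)} $$

$$ J_{\mathrm{ML}} = \log P(w_t \mid h) = \mathrm{score}(w_t, h) - \log\!\left(\sum_{w' \in V} \exp\big(\mathrm{score}(w', h)\big)\right) $$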
Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
where score(wt, h) computes the compatibility of word wt with the context h (a dot product is
commonly used). We train this model by maximizing its log-likelihood, i.e.
This is very expensive, because we need to compute and normalize each probability using the
score for all other V words w' in the current context, at every training step.
Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
This is very expensive, because we need to compute and normalize each probability using the
score for all other V words w' in the current context, at every training step.
Recurrent Neural Network
Instead, models are trained using a binary classification objective (logistic regression)
to discriminate the real target words wt from k imaginary (noise) words w, in the same context.
word2vec: Scaling up Noise-Contrastive Training
1. Computing the loss function now scales only with the number of noise words that we select,
and not with all words in the vocabulary
2. This makes it much faster to train.
3. We will use a similar noise-contrastive estimation (NCE) loss: tf.nn.nce_loss().
Recurrent Neural Network
the quick brown fox jumped over the lazy dog
Word2vec: Context Example
([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...
Context: word to the left and word to the right.
Recurrent Neural Network
the quick brown fox jumped over the lazy dog
Word2vec: Skip Gram Model
(quick, the), (quick, brown), (brown, quick), (brown, fox), ...
Task becomes to predict 'the' and 'brown' from 'quick', 'quick' and 'fox' from 'brown', etc.
Skip-gram
● inverts contexts and targets, and
● tries to predict each context word from its target word
Recurrent Neural Network
Natural Language Processing - Word Embedding
Let’s imagine that at training step t
● For the first case above, the goal is to predict 'the' from 'quick'
● We select num_noise number
○ of noisy (contrastive) examples
○ by drawing from some noise distribution,
○ typically the unigram distribution
● For simplicity let’s say num_noise=1 and we select 'sheep' as a noisy
example. Next we compute the loss for this pair of observed and noisy
examples
Recurrent Neural Network
Natural Language Processing - Word Embedding
The objective at time step t becomes:
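The objective itself appeared as an image; in the form used by the TensorFlow word2vec tutorial, where Qθ(D = 1 | w, h) is the model’s probability that word w seen in context h came from the real data:

$$ J^{(t)}_{\mathrm{NEG}} = \log Q_\theta\big(D = 1 \mid \text{the}, \text{quick}\big) + \log\big( Q_\theta\big(D = 0 \mid \text{sheep}, \text{quick}\big) \big) $$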
Recurrent Neural Network
Natural Language Processing - Word Embedding
● The goal is to make an update to the embedding parameters
● to improve (in this case, maximize) the objective function
● We do this by deriving the gradient of the loss with respect to the
embedding parameters (luckily TensorFlow provides easy helper
functions for doing this!)
● We then perform an update to the embeddings by taking a small step in
the direction of the gradient. When this process is repeated over the
entire training set, this has the effect of 'moving' the embedding vectors
around for each word until the model is successful at discriminating real
words from noise words.
Recurrent Neural Network
Natural Language Processing - Word Embedding
● At the beginning of training, embeddings are simply chosen randomly,
● But during training, backpropagation automatically moves the
embeddings around in a way that helps the neural network perform its
task
Recurrent Neural Network
Natural Language Processing - Word Embedding
● Typically this means that similar words will gradually cluster close to one
another, and even end up organized in a rather meaningful way.
● For example, embeddings may end up placed along various axes that
represent
○ gender,
○ singular/plural,
○ adjective/noun,
○ and so on
Recurrent Neural Network
Natural Language Processing - Word Embedding
How to do it in TensorFlow
In TensorFlow, we first need to create the variable representing the
embeddings for every word in our vocabulary, initialized randomly:
>>> vocabulary_size = 50000
>>> embedding_size = 150
>>> embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
Recurrent Neural Network
How to do it in TensorFlow - Preprocessing
Suppose we want to feed the sentence "I drink milk" to our neural
network.
● We should first preprocess the sentence and break it into a list of
known words
● For example
○ We may remove unnecessary characters, replace unknown words by a
predefined token word such as “[UNK]”,
○ Replace numerical values by “[NUM]”,
○ Replace URLs by “[URL]”,
○ And so on
Natural Language Processing - Word Embedding
Recurrent Neural Network
How to do it in TensorFlow
● Once we have a list of known words, we can look up each word’s integer
identifier from 0 to 49999 in a dictionary, for example [72, 3335, 288]
● At that point, we are ready to feed these word identifiers to TensorFlow
using a placeholder, and apply the embedding_lookup() function to get
the corresponding embeddings
>>> train_inputs = tf.placeholder(tf.int32, shape=[None])     # from ids...
>>> embed = tf.nn.embedding_lookup(embeddings, train_inputs)  # ...to embeddings
Natural Language Processing - Word Embedding
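Putting the two snippets above together, here is a minimal runnable sketch (TensorFlow 1.x) of looking up the embeddings for the hypothetical identifiers [72, 3335, 288] of "I drink milk"; the session code is an illustration, not part of the original slides.

import tensorflow as tf

vocabulary_size = 50000
embedding_size = 150
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
train_inputs = tf.placeholder(tf.int32, shape=[None])
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # "I drink milk" -> word ids [72, 3335, 288] (example from the slides)
    vectors = sess.run(embed, feed_dict={train_inputs: [72, 3335, 288]})
    print(vectors.shape)   # (3, 150): one 150-dimensional vector per word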
Recurrent Neural Network
● Once our model has learned good word embeddings, it can actually be
reused fairly efficiently in any NLP application
● In fact, instead of training our own word embeddings, we may want to
download pre-trained word embeddings
● Just like when reusing pretrained layers, we can choose to
○ Freeze the pretrained embeddings
○ Or let backpropagation tweak them for our application
● The first option will speed up training, but the second may lead to slightly
higher performance
Natural Language Processing - Word Embedding
Recurrent Neural Network
Machine Translation
Recurrent Neural Network
Machine Translation
We now have almost all the tools we need to implement a
machine translation system
Let’s look at this now
Recurrent Neural Network
Machine Translation
An Encoder–Decoder Network for Machine Translation
Let’s take a look at a simple machine translation model that will translate
English sentences to French
Recurrent Neural Network
Machine Translation
An Encoder–Decoder Network for Machine Translation
A simple machine translation model
Recurrent Neural Network
Machine Translation
Let’s learn how this Encoder–Decoder Network for Machine
Translation is trained
Recurrent Neural Network
The English
sentences are fed to
the encoder, and
the decoder
outputs the French
translations
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
Note that the
French translations
are also used as
inputs to the
decoder, but
pushed back by one
step
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
Machine Translation
An Encoder–Decoder Network for Machine Translation
In other words, the
decoder is given as input
the word that it should
have output at the
previous step, regardless
of what it actually output
at that step (see the
sketch after this slide)
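Below is a minimal sketch of this "teacher forcing" idea with hypothetical token ids: during training, the decoder's input sequence is simply the target sequence shifted right by one step and prefixed with the <go> token.

GO, EOS = 1, 2                            # assumed ids for "<go>" and "<eos>"
target_ids = [45, 12, 873, EOS]           # a hypothetical French target sentence

decoder_inputs  = [GO] + target_ids[:-1]  # what the decoder reads:  [1, 45, 12, 873]
decoder_targets = target_ids              # what it should output:   [45, 12, 873, 2]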
Recurrent Neural Network
For the very first word,
the decoder is given a
token that represents the
beginning of the sentence
(here, “<go>”)
The decoder is expected
to end the sentence with
an end-of-sequence (EOS)
token (here, “<eos>”)
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
Question:
Why are the English
sentences reversed before
feeding them to the encoder?
Here “I drink
milk” is reversed to
“milk drink I”
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
Answer:
This ensures that the
beginning of the English
sentence will be fed last
to the encoder, which is
useful because that’s
generally the first thing
that the decoder needs to
translate
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
● Each word is initially
represented by a
simple integer identifier
● e.g., 288 for the word
“milk”
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
● Next, an embedding
lookup returns the
word embedding
● This is a dense, fairly
low-dimensional vector
● These word
embeddings are what is
actually fed to the
encoder and the
decoder
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
● At each step, the
decoder outputs a
score for each word in
the output vocabulary
(i.e., French)
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
● And then the Softmax
layer turns these
scores into
probabilities
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
● For example, at the
first step the word “Je”
may have a probability
of 20%, “Tu” may have
a probability of 1%, and
so on
● The word with the
highest probability is
output
Machine Translation
An Encoder–Decoder Network for Machine Translation
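The last two slides can be illustrated with a small numerical sketch; the toy vocabulary and scores below are made up for illustration.

import numpy as np

french_vocab = ["Je", "Tu", "bois", "lait"]          # toy output vocabulary
scores = np.array([2.0, -1.0, 0.5, 0.3])             # decoder scores at the first step

probs = np.exp(scores) / np.sum(np.exp(scores))      # softmax: scores -> probabilities
print(dict(zip(french_vocab, np.round(probs, 2))))   # {'Je': 0.69, 'Tu': 0.03, ...}
print(french_vocab[int(np.argmax(probs))])           # 'Je' has the highest probability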
Recurrent Neural Network
How can we use this Encoder–Decoder Network for Machine Translation
at inference time, since we will not have the target sentence to feed to
the decoder?
Machine Translation
An Encoder–Decoder Network for Machine Translation
Recurrent Neural Network
● We will simply feed the decoder the word that it output at the previous
step (see the sketch below)
● This will require an embedding lookup that is not shown on the diagram
Machine Translation
An Encoder–Decoder Network for Machine Translation
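A minimal sketch of this greedy decoding loop, assuming a hypothetical decoder_step(state, word_id) helper that embeds word_id, runs one decoder step and returns the scores over the French vocabulary (none of these names come from the original slides):

import numpy as np

GO, EOS = 1, 2      # assumed ids for the "<go>" and "<eos>" tokens
MAX_LEN = 50        # safety limit on the length of the generated sentence

def greedy_decode(encoder_state, decoder_step):
    state, word_id, output = encoder_state, GO, []
    for _ in range(MAX_LEN):
        state, scores = decoder_step(state, word_id)  # embedding lookup happens inside
        word_id = int(np.argmax(scores))              # feed back the most probable word
        if word_id == EOS:                            # stop at the end-of-sequence token
            break
        output.append(word_id)
    return output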
Questions?
https://discuss.cloudxlab.com
reachus@cloudxlab.com
More Related Content

What's hot

Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Edureka!
 
LSTM Basics
LSTM BasicsLSTM Basics
LSTM Basics
Akshay Sehgal
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term Memory
Yan Xu
 
An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms
Hakky St
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10)
Larry Guo
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
RahulKumar854607
 
Neural networks introduction
Neural networks introductionNeural networks introduction
Neural networks introduction
آيةالله عبدالحكيم
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 
Recurrent Neural Network
Recurrent Neural NetworkRecurrent Neural Network
Recurrent Neural Network
Mohammad Sabouri
 
Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Deep learning tutorial 9/2019
Deep learning tutorial 9/2019
Amr Rashed
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
Rakuten Group, Inc.
 
[기초개념] Recurrent Neural Network (RNN) 소개
[기초개념] Recurrent Neural Network (RNN) 소개[기초개념] Recurrent Neural Network (RNN) 소개
[기초개념] Recurrent Neural Network (RNN) 소개
Donghyeon Kim
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Databricks
 
Quantum neural network
Quantum neural networkQuantum neural network
Quantum neural network
surat murthy
 
LSTM Tutorial
LSTM TutorialLSTM Tutorial
LSTM Tutorial
Ralph Schlosser
 
Spiking neural network: an introduction I
Spiking neural network: an introduction ISpiking neural network: an introduction I
Spiking neural network: an introduction I
Dalin Zhang
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning
健程 杨
 
Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.
Mohd Faiz
 
Soft Computing
Soft ComputingSoft Computing
Soft Computing
MANISH T I
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
Khang Pham
 

What's hot (20)

Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
LSTM Basics
LSTM BasicsLSTM Basics
LSTM Basics
 
Long Short Term Memory
Long Short Term MemoryLong Short Term Memory
Long Short Term Memory
 
An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms An overview of gradient descent optimization algorithms
An overview of gradient descent optimization algorithms
 
Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10) Deep Learning: Recurrent Neural Network (Chapter 10)
Deep Learning: Recurrent Neural Network (Chapter 10)
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
Neural networks introduction
Neural networks introductionNeural networks introduction
Neural networks introduction
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
Recurrent Neural Network
Recurrent Neural NetworkRecurrent Neural Network
Recurrent Neural Network
 
Deep learning tutorial 9/2019
Deep learning tutorial 9/2019Deep learning tutorial 9/2019
Deep learning tutorial 9/2019
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
[기초개념] Recurrent Neural Network (RNN) 소개
[기초개념] Recurrent Neural Network (RNN) 소개[기초개념] Recurrent Neural Network (RNN) 소개
[기초개념] Recurrent Neural Network (RNN) 소개
 
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
Time Series Forecasting Using Recurrent Neural Network and Vector Autoregress...
 
Quantum neural network
Quantum neural networkQuantum neural network
Quantum neural network
 
LSTM Tutorial
LSTM TutorialLSTM Tutorial
LSTM Tutorial
 
Spiking neural network: an introduction I
Spiking neural network: an introduction ISpiking neural network: an introduction I
Spiking neural network: an introduction I
 
Attention in Deep Learning
Attention in Deep LearningAttention in Deep Learning
Attention in Deep Learning
 
Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.Artificial Neural Network seminar presentation using ppt.
Artificial Neural Network seminar presentation using ppt.
 
Soft Computing
Soft ComputingSoft Computing
Soft Computing
 
Overview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep LearningOverview on Optimization algorithms in Deep Learning
Overview on Optimization algorithms in Deep Learning
 

Similar to Recurrent Neural Networks

Convolutional and Recurrent Neural Networks
Convolutional and Recurrent Neural NetworksConvolutional and Recurrent Neural Networks
Convolutional and Recurrent Neural Networks
Ramesh Ragala
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
Sharath TS
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
Junaid Bhat
 
Artificial Neural Networks - An Introduction.pptx
Artificial Neural Networks - An Introduction.pptxArtificial Neural Networks - An Introduction.pptx
Artificial Neural Networks - An Introduction.pptx
Tharaka Devinda
 
RNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantagesRNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantages
AbhijitVenkatesh1
 
Neural Turing Machines
Neural Turing MachinesNeural Turing Machines
Neural Turing Machines
Kato Yuzuru
 
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
hirokazutanaka
 
Machine Learning - Introduction to Recurrent Neural Networks
Machine Learning - Introduction to Recurrent Neural NetworksMachine Learning - Introduction to Recurrent Neural Networks
Machine Learning - Introduction to Recurrent Neural Networks
Andrew Ferlitsch
 
DEEPLEARNING recurrent neural networs.pdf
DEEPLEARNING recurrent neural networs.pdfDEEPLEARNING recurrent neural networs.pdf
DEEPLEARNING recurrent neural networs.pdf
AamirMaqsood8
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNN
Pradnya Saval
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
PyData
 
Aaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANNAaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANN
AminaRepo
 
Icon18revrec sudeshna
Icon18revrec sudeshnaIcon18revrec sudeshna
Icon18revrec sudeshna
Muthusamy Chelliah
 
Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)
sohaib_alam
 
Digital Signal Processing
Digital Signal ProcessingDigital Signal Processing
Digital Signal Processing
PRABHAHARAN429
 
Introduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep LearningIntroduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep Learning
ali alemi
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
Simplilearn
 
UNIT III (8).pptx
UNIT III (8).pptxUNIT III (8).pptx
UNIT III (8).pptx
DrDhivyaaCRAssistant
 
Artificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering College
Artificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering CollegeArtificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering College
Artificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering College
Dhivyaa C.R
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 

Similar to Recurrent Neural Networks (20)

Convolutional and Recurrent Neural Networks
Convolutional and Recurrent Neural NetworksConvolutional and Recurrent Neural Networks
Convolutional and Recurrent Neural Networks
 
Recurrent Neural Networks
Recurrent Neural NetworksRecurrent Neural Networks
Recurrent Neural Networks
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Artificial Neural Networks - An Introduction.pptx
Artificial Neural Networks - An Introduction.pptxArtificial Neural Networks - An Introduction.pptx
Artificial Neural Networks - An Introduction.pptx
 
RNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantagesRNN and LSTM model description and working advantages and disadvantages
RNN and LSTM model description and working advantages and disadvantages
 
Neural Turing Machines
Neural Turing MachinesNeural Turing Machines
Neural Turing Machines
 
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
JAISTサマースクール2016「脳を知るための理論」講義04 Neural Networks and Neuroscience
 
Machine Learning - Introduction to Recurrent Neural Networks
Machine Learning - Introduction to Recurrent Neural NetworksMachine Learning - Introduction to Recurrent Neural Networks
Machine Learning - Introduction to Recurrent Neural Networks
 
DEEPLEARNING recurrent neural networs.pdf
DEEPLEARNING recurrent neural networs.pdfDEEPLEARNING recurrent neural networs.pdf
DEEPLEARNING recurrent neural networs.pdf
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNN
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Aaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANNAaa ped-22-Artificial Neural Network: Introduction to ANN
Aaa ped-22-Artificial Neural Network: Introduction to ANN
 
Icon18revrec sudeshna
Icon18revrec sudeshnaIcon18revrec sudeshna
Icon18revrec sudeshna
 
Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)Recurrent and Recursive Networks (Part 1)
Recurrent and Recursive Networks (Part 1)
 
Digital Signal Processing
Digital Signal ProcessingDigital Signal Processing
Digital Signal Processing
 
Introduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep LearningIntroduction To Using TensorFlow & Deep Learning
Introduction To Using TensorFlow & Deep Learning
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
UNIT III (8).pptx
UNIT III (8).pptxUNIT III (8).pptx
UNIT III (8).pptx
 
Artificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering College
Artificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering CollegeArtificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering College
Artificial Neural Network by Dr.C.R.Dhivyaa Kongu Engineering College
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 

More from CloudxLab

Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
CloudxLab
 
Deep Learning Overview
Deep Learning OverviewDeep Learning Overview
Deep Learning Overview
CloudxLab
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
CloudxLab
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
CloudxLab
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
CloudxLab
 
Training Deep Neural Nets
Training Deep Neural NetsTraining Deep Neural Nets
Training Deep Neural Nets
CloudxLab
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
CloudxLab
 
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
CloudxLab
 
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabAdvanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
CloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
CloudxLab
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
CloudxLab
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
CloudxLab
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLabIntroduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
CloudxLab
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLab
CloudxLab
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLab
CloudxLab
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
CloudxLab
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
CloudxLab
 

More from CloudxLab (20)

Understanding computer vision with Deep Learning
Understanding computer vision with Deep LearningUnderstanding computer vision with Deep Learning
Understanding computer vision with Deep Learning
 
Deep Learning Overview
Deep Learning OverviewDeep Learning Overview
Deep Learning Overview
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Naive Bayes
Naive BayesNaive Bayes
Naive Bayes
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Training Deep Neural Nets
Training Deep Neural NetsTraining Deep Neural Nets
Training Deep Neural Nets
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
Apache Spark - Key Value RDD - Transformations | Big Data Hadoop Spark Tutori...
 
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLabAdvanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
Advanced Spark Programming - Part 2 | Big Data Hadoop Spark Tutorial | CloudxLab
 
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 2 | Big Data Hadoop Spark Tutori...
 
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
Apache Spark - Dataframes & Spark SQL - Part 1 | Big Data Hadoop Spark Tutori...
 
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Running on a Cluster | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to SparkR | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLabIntroduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
Introduction to NoSQL | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
 
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLabIntroduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
Introduction To TensorFlow | Deep Learning Using TensorFlow | CloudxLab
 
Introduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLabIntroduction to Deep Learning | CloudxLab
Introduction to Deep Learning | CloudxLab
 
Dimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLabDimensionality Reduction | Machine Learning | CloudxLab
Dimensionality Reduction | Machine Learning | CloudxLab
 
Ensemble Learning and Random Forests
Ensemble Learning and Random ForestsEnsemble Learning and Random Forests
Ensemble Learning and Random Forests
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 

Recently uploaded

Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
BookNet Canada
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
Matthew Sinclair
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
Sally Laouacheria
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
Toru Tamaki
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
shanthidl1
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
Matthew Sinclair
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
Vijayananda Mohire
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
jackson110191
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
Larry Smarr
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Chris Swan
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 

Recently uploaded (20)

Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...Transcript: Details of description part II: Describing images in practice - T...
Transcript: Details of description part II: Describing images in practice - T...
 
20240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 202420240705 QFM024 Irresponsible AI Reading List June 2024
20240705 QFM024 Irresponsible AI Reading List June 2024
 
20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf20240702 Présentation Plateforme GenAI.pdf
20240702 Présentation Plateforme GenAI.pdf
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
論文紹介:A Systematic Survey of Prompt Engineering on Vision-Language Foundation ...
 
Cookies program to display the information though cookie creation
Cookies program to display the information though cookie creationCookies program to display the information though cookie creation
Cookies program to display the information though cookie creation
 
20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
 
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-InTrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
TrustArc Webinar - 2024 Data Privacy Trends: A Mid-Year Check-In
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdfINDIAN AIR FORCE FIGHTER PLANES LIST.pdf
INDIAN AIR FORCE FIGHTER PLANES LIST.pdf
 
The Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive ComputingThe Rise of Supernetwork Data Intensive Computing
The Rise of Supernetwork Data Intensive Computing
 
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
Fluttercon 2024: Showing that you care about security - OpenSSF Scorecards fo...
 
Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 

Recurrent Neural Networks

  • 2. Recurrent Neural Network Recurrent Neural Network ● Predicting the future is what we do all the time ○ Finishing a friend’s sentence ○ Anticipating the smell of coffee at the breakfast or ○ Catching the ball in the field ● In this chapter, we will cover RNN ○ Networks which can predict future ● Unlike all the nets we have discussed so far ○ RNN can work on sequences of arbitrary lengths ○ Rather than on fixed-sized inputs
  • 3. Recurrent Neural Network Recurrent Neural Network - Applications ● RNN can analyze time series data ○ Such as stock prices, and ○ Tell you when to buy or sell
  • 4. Recurrent Neural Network Recurrent Neural Network - Applications ● In autonomous driving systems, RNN can ○ Anticipate car trajectories and ○ Help avoid accidents
  • 5. Recurrent Neural Network Recurrent Neural Network - Applications ● RNN can take sentences, documents, or audio samples as input and ○ Make them extremely useful ○ For natural language processing (NLP) systems such as ■ Automatic translation ■ Speech-to-text or ■ Sentiment analysis
  • 6. Recurrent Neural Network Recurrent Neural Network - Applications ● RNNs’ ability to anticipate also makes them capable of surprising creativity. ○ You can ask them to predict which are the most likely next notes in a melody ○ Then randomly pick one of these notes and play it. ○ Then ask the net for the next most likely notes, play it, and repeat the process again and again. Here is an example melody produced by Google’s Magenta project
  • 7. Recurrent Neural Network Recurrent Neural Network ● In this chapter we will learn about ○ Fundamental concepts in RNNs ○ The main problem RNNs face ○ And the solution to the problems ○ How to implement RNNs ● Finally, we will take a look at the ○ Architecture of a machine translation system
  • 9. Recurrent Neural Network Recurrent Neurons ● Up to now we have mostly looked at feedforward neural networks ○ Where the activations flow only in one direction ○ From the input layer to the output layer ● RNN looks much like a feedforward neural network ○ Except it also has connections pointing backward
  • 10. Recurrent Neural Network Recurrent Neurons ● Let’s look at the simplest possible RNN ○ Composed of just one neuron receiving inputs ○ Producing an output, and ○ Sending that output back to itself Input Output Sending output back to itself
  • 11. Recurrent Neural Network Recurrent Neurons ● At each time step t (also called a frame) ○ This recurrent neuron receives the inputs x(t) ○ As well as its own output from the previous time step y(t–1) A recurrent neuron (left), unrolled through time (right)
  • 12. Recurrent Neural Network Recurrent Neurons ● We can represent this tiny network against the time axis (See below figure) ● This is called unrolling the network through time A recurrent neuron (left), unrolled through time (right)
  • 13. Recurrent Neural Network Recurrent Neurons ● We can easily create a layer of recurrent neurons ● At each time step t, every neuron receives both the ○ Input vector x(t) and ○ Output vector from the previous time step y(t–1) A layer of recurrent neurons (left), unrolled through time(right)
  • 14. Recurrent Neural Network Recurrent Neurons ● Each recurrent neuron has two sets of weights ○ One for the inputs x(t) and the ○ Other for the outputs of the previous time step, y(t–1) ● Let’s call these weight vectors wx and wy ● Below equation represents the output of a single recurrent neuron Output of a single recurrent neuron for a single instance bias ϕ() is the activation function like ReLU
  • 15. Recurrent Neural Network Recurrent Neurons ● We can compute a whole layer’s output ○ In one shot for a whole mini-batch ○ Using a vectorized form of the previous equation Outputs of a layer of recurrent neurons for all instances in a mini-batch
  • 16. Recurrent Neural Network Recurrent Neurons ● Y(t) is an m x nneurons matrix containing the ○ Layer’s outputs at time step t for each instance in the minibatch ○ m is the number of instances in the mini-batch ○ nneurons is the number of neurons Outputs of a layer of recurrent neurons for all instances in a mini-batch
  • 17. Recurrent Neural Network Recurrent Neurons ● X(t) is an m × ninputs matrix containing the inputs for all instances ○ ninputs is the number of input features Outputs of a layer of recurrent neurons for all instances in a mini-batch
  • 18. Recurrent Neural Network Recurrent Neurons ● Wx is an ninputs × nneurons matrix containing the connection weights for the inputs of the current time step ● Wy is an nneurons × nneurons matrix containing the connection weights for the outputs of the previous time step Outputs of a layer of recurrent neurons for all instances in a mini-batch
  • 19. Recurrent Neural Network Recurrent Neurons ● The weight matrices Wx and Wy are often concatenated into a single weight matrix W of shape (ninputs + nneurons ) × nneurons ● b is a vector of size nneurons containing each neuron’s bias term Outputs of a layer of recurrent neurons for all instances in a mini-batch
  • 20. Recurrent Neural Network Memory Cells ● Since the output of a recurrent neuron at time step t is a ○ Function of all the inputs from previous time steps ○ We can say that it has a form of memory ● A part of a neural network that ○ Preserves some state across time steps is called a memory cell
  • 21. Recurrent Neural Network Memory Cells ● In general a cell’s state at time step t, denoted h(t) is a ○ Function of some inputs at that time step and ○ Its state at the previous time step h(t) = f(h(t–1) , x(t) ) ● Its output at time step t, denoted y(t) is also a ○ Function of the previous state and the current inputs
  • 22. Recurrent Neural Network Memory Cells ● In the case of basics cells we have discussed so far ○ The output is simply equal to the state ○ But in more complex cells this is not always the case A cell’s hidden state and its output may be different
  • 23. Recurrent Neural Network Input and Output Sequences Sequence-to-sequence Network ● An RNN can simultaneously take a ○ Sequence of inputs and ○ Produce a sequence of outputs
  • 24. Recurrent Neural Network Input and Output Sequences Sequence-to-sequence Network ● This type of network is useful for predicting time series ○ Such as stock prices ● We feed it the prices over the last N days and ○ It must output the prices shifted by one day into the future ○ i.e., from N – 1 days ago to tomorrow
  • 25. Recurrent Neural Network Input and Output Sequences Sequence-to-vector Network ● Alternatively we could feed the network a sequence of inputs and ○ Ignore all outputs except for the last one
  • 26. Recurrent Neural Network Input and Output Sequences Sequence-to-vector Network ● We can feed this network a sequence of words ○ Corresponding to a movie review and ○ The network would output a sentiment score ○ e.g., from –1 [hate] to +1 [love]
  • 27. Recurrent Neural Network Input and Output Sequences Vector-to-sequence Network ● We could feed the network a single input at the first time step and ○ Zeros for all other time steps and ○ Let is output a sequence ● For example, the input could be an image and the ○ Output could be a caption for the image
  • 28. Recurrent Neural Network Input and Output Sequences Encoder-Decoder ● In this network, we have ○ sequence-to-vector network, called an encoder followed by ○ vector-to-sequence network, called a decoder
  • 29. Recurrent Neural Network Input and Output Sequences Encoder-Decoder ● This can be used for translating a sentence ○ From one language to another ● We feed the network sentence in one language ○ The encoder converts this sentence into single vector representation ○ Then the decoder decodes this vector into a sentence in another language
  • 30. Recurrent Neural Network Input and Output Sequences Encoder-Decoder ● This two step model works much better than ○ Trying to translate on the fly with a ○ Single sequence-to-sequence RNN ● Since the last words of a sentence can affect the ○ First words of the translation ○ So we need to wait until we know the whole sentence
  • 31. Recurrent Neural Network Basic RNNs in TensorFlow
  • 32. Recurrent Neural Network Basic RNNs in TensorFlow ● Let’s implement a very simple RNN model ○ Without using any of the TensorFlow’s RNN operations ○ To better understand what goes on under the hood ● Let’s create an RNN composed of a layer of five recurrent neurons ○ Using the tanh activation function and ○ Assume that the RNN runs over only two time steps and ○ Taking input vectors of size 3 at each time step
  • 33. Recurrent Neural Network Basic RNNs in TensorFlow ● This network looks like a two-layer feedforward neural network with two differences ○ The same weights and bias terms are shared by both layers and ○ We feed inputs at each layer, and we get outputs from each layer
  • 34. Recurrent Neural Network Basic RNNs in TensorFlow ● To run the model, we need to feed it the inputs at both time steps ● Mini-batch contains four instances ○ Each with an input sequence composed of exactly two inputs
  • 35. Recurrent Neural Network Basic RNNs in TensorFlow ● At the end, Y0_val and Y1_val contain the outputs of the network ○ At both time steps for all neurons and ○ All instances in the mini-batch
  • 36. Recurrent Neural Network Checkout the complete code under “Manual RNN” section in notebook
  • 37. Recurrent Neural Network Static Unrolling Through Time ● Let’s look at how to create the same model ○ Using TensorFlow’s RNN operations ● The static_rnn() function creates ○ An unrolled RNN network by chaining cells ● The below code creates the exact same model as the previous one >>> X0 = tf.placeholder(tf.float32, [None, n_inputs]) >>> X1 = tf.placeholder(tf.float32, [None, n_inputs]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, [X0, X1], dtype=tf.float32 ) >>> Y0, Y1 = output_seqs
  • 38. Recurrent Neural Network Static Unrolling Through Time >>> X0 = tf.placeholder(tf.float32, [None, n_inputs]) >>> X1 = tf.placeholder(tf.float32, [None, n_inputs]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, [X0, X1], dtype=tf.float32 ) >>> Y0, Y1 = output_seqs ● First we create the input placeholders
  • 39. Recurrent Neural Network Static Unrolling Through Time >>> X0 = tf.placeholder(tf.float32, [None, n_inputs]) >>> X1 = tf.placeholder(tf.float32, [None, n_inputs]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, [X0, X1], dtype=tf.float32 ) >>> Y0, Y1 = output_seqs ● Then we create a BasicRNNCell ○ It is like a factory that creates ○ Copies of the cell to build the unrolled RNN ■ One for each time step
  • 40. Recurrent Neural Network Static Unrolling Through Time >>> X0 = tf.placeholder(tf.float32, [None, n_inputs]) >>> X1 = tf.placeholder(tf.float32, [None, n_inputs]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, [X0, X1], dtype=tf.float32 ) >>> Y0, Y1 = output_seqs ● Then we call static_rnn(), giving it the cell factory and the input tensors ● And telling it the data type of the inputs ○ This is used to create the initial state matrix ○ Which by default is full of zeros
  • 41. Recurrent Neural Network Static Unrolling Through Time >>> X0 = tf.placeholder(tf.float32, [None, n_inputs]) >>> X1 = tf.placeholder(tf.float32, [None, n_inputs]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, [X0, X1], dtype=tf.float32 ) >>> Y0, Y1 = output_seqs ● The static_rnn() function returns two objects ● The first is a Python list containing the output tensors for each time step ● The second is a tensor containing the final states of the network ● When we use basic cells ○ Then the final state is equal to the last output
  • 42. Recurrent Neural Network Static Unrolling Through Time Checkout the complete code under “Using static_rnn()” section in notebook
  • 43. Recurrent Neural Network Static Unrolling Through Time ● In the previous example, if there were 50 time steps then ○ It would not be convenient to define ○ 50 place holders and 50 output tensors ● Moreover, at execution time we would have to feed ○ Each of the 50 placeholders and manipulate the 50 outputs ● Let’s do it in a better way
  • 44. Recurrent Neural Network Static Unrolling Through Time >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2])) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, X_seqs, dtype=tf.float32 ) >>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2]) ● The above code takes a single input placeholder of ○ shape [None, n_steps, n_inputs] ○ Where the first dimension is the mini-batch size
  • 45. Recurrent Neural Network Static Unrolling Through Time >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2])) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, X_seqs, dtype=tf.float32 ) >>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2]) ● Then it extracts the list of input sequences for each time step ● X_seqs is a Python list of n_steps tensors of shape [None, n_inputs] ○ Where first dimension is the minibatch size
  • 46. Recurrent Neural Network Static Unrolling Through Time >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2])) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, X_seqs, dtype=tf.float32 ) >>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2]) ● To do this, we first swap the first two dimensions ○ Using the transpose() function so that the ○ Time steps are now the first dimension
  • 47. Recurrent Neural Network Static Unrolling Through Time >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2])) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, X_seqs, dtype=tf.float32 ) >>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2]) ● Then we extract a Python list of tensors along the first dimension ○ i.e., one tensor per time step ○ Using the unstack() function
  • 48. Recurrent Neural Network Static Unrolling Through Time >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2])) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, X_seqs, dtype=tf.float32 ) >>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2]) ● The next two lines are same as before
  • 49. Recurrent Neural Network Static Unrolling Through Time >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2])) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> output_seqs, states = tf.contrib.rnn.static_rnn( basic_cell, X_seqs, dtype=tf.float32 ) >>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2]) ● Finally, we merge all the output tensors into a single tensor ○ Using the stack() function ● And then we swap the first two dimensions to get a ○ Final outputs tensor of shape [None, n_steps, n_neurons]
  • 50. Recurrent Neural Network Static Unrolling Through Time ● Now we can run the network by ○ Feeding it a single tensor that contains ○ All the mini-batch sequences
  • 51. Recurrent Neural Network Static Unrolling Through Time ● And then we get a single outputs_val tensor for ○ All instances ○ All time steps, and ○ All neurons
  • 52. Recurrent Neural Network Static Unrolling Through Time Checkout the complete code under “Packing sequences” section in notebook
  • 53. Recurrent Neural Network Static Unrolling Through Time ● The previous approach still builds a graph ○ Containing one cell per time step ● If there were 50 time steps, the graph would look ugly ● It is like writing a program without using for loops ○ Y0=f(0,X0); Y1=f(Y0, X1); Y2=f(Y1, X2); ...; Y50=f(Y49, X50)) ● With such a large graph ○ Since it must store all tensor values during the forward pass ○ So it can use them to compute gradients during the reverse pass ○ We may get out-of-memory (OOM) errors ○ During backpropagation (in GPU cards because of limited memory)
  • 54. Recurrent Neural Network Dynamic Unrolling Through Time Let’s look at the better solution than previous approach using the dynamic_rnn() function
  • 55. Recurrent Neural Network Dynamic Unrolling Through Time ● The dynamic_rnn() function uses a while_loop() operation to ○ Run over the cell the appropriate number of times ● We can set swap_memory=True ○ If we want it to swap the GPU’s memory to the CPU’s ○ Memory during backpropagation to avoid out of memory errors ● It also accepts a single tensor for ○ All inputs at every time step (shape [None, n_steps, n_inputs]) and ○ It outputs a single tensor for all outputs at every time step ■ (shape [None, n_steps, n_neurons]) ○ There is no need to stack, unstack, or transpose
  • 56. Recurrent Neural Network Dynamic Unrolling Through Time RNN using dynamic_rnn >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32)
  • 57. Recurrent Neural Network Dynamic Unrolling Through Time Check out the complete code under the “Using dynamic_rnn()” section in the notebook
  • 58. Recurrent Neural Network Dynamic Unrolling Through Time Note ● During backpropagation ○ The while_loop() operation does the appropriate magic ○ It stores the tensor values for each iteration during the forward pass ○ So it can use them to compute gradients during the reverse pass
  • 59. Recurrent Neural Network Handling Variable Length Input Sequences ● So far we have used only fixed-size input sequences ● What if the input sequences have variable lengths (e.g., like sentences) ● In this case we should set the sequence_length parameter ○ When calling the dynamic_rnn() function ○ It must be a 1D tensor indicating the length of the ○ Input sequence for each instance
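For instance, the graph could be set up like this (a minimal sketch, in the spirit of the notebook; X is the [None, n_steps, n_inputs] placeholder from before):
seq_length = tf.placeholder(tf.int32, [None])   # one length per instance
basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32,
                                    sequence_length=seq_length)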
  • 60. Recurrent Neural Network Handling Variable Length Input Sequences ● Suppose the second input sequence contains ○ Only one input instead of two ○ Then it must be padded with a zero vector ○ In order to fit in the input tensor X
  • 61. Recurrent Neural Network Handling Variable Length Input Sequences ● Now we need to feed values for both placeholders X and seq_length
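A hedged sketch of the execution phase, again assuming n_steps = 2 and n_inputs = 3; note that instance 1 is padded with a zero vector and its length is given as 1:
X_batch = np.array([
    #   t = 0       t = 1
    [[0, 1, 2], [9, 8, 7]],   # instance 0, length 2
    [[3, 4, 5], [0, 0, 0]],   # instance 1, length 1 (padded with a zero vector)
    [[6, 7, 8], [6, 5, 4]],   # instance 2, length 2
    [[9, 0, 1], [3, 2, 1]],   # instance 3, length 2
])
seq_length_batch = np.array([2, 1, 2, 2])

init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    outputs_val, states_val = sess.run(
        [outputs, states],
        feed_dict={X: X_batch, seq_length: seq_length_batch})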
  • 62. Recurrent Neural Network Handling Variable Length Input Sequences ● Now the RNN outputs zero vectors for ○ Every time step past the input sequence length ○ Look at the second instance’s output for the second time step
  • 63. Recurrent Neural Network Handling Variable Length Input Sequences ● Moreover, the states tensor contains the final state of each cell ○ Excluding the zero vectors
  • 64. Recurrent Neural Network Handling Variable Length Input Sequences Check out the complete code under the “Setting the sequence lengths” section in the notebook
  • 65. Recurrent Neural Network Handling Variable-Length Output Sequences ● What if the output sequences have variable lengths ● If we know in advance what length each sequence will have ○ For example if we know that it will be the same length as the input sequence ○ Then we can set the sequence_length parameter as discussed ● Unfortunately, in general this will not be possible ○ For example, ■ The length of a translated sentence is generally different from the ■ Length of the input sentence
  • 66. Recurrent Neural Network Handling Variable-Length Output Sequences ● In this case, the most common solution is to define ○ A special output called an end-of-sequence token (EOS token) ● Any output past the EOS token should be ignored - We will discuss this later in detail
  • 67. Recurrent Neural Network So far we have learned how to build an RNN. But how do we train it?
  • 69. Recurrent Neural Network Training RNNs ● To train an RNN, the trick is to unroll it through time and then simply use regular backpropagation ● This strategy is called backpropagation through time (BPTT)
  • 70. Recurrent Neural Network Training RNNs Understanding how RNNs are trained Just like in regular backpropagation, there is a first forward pass through the unrolled network, represented by the dashed arrows
  • 71. Recurrent Neural Network Training RNNs Understanding how RNNs are trained Then the output sequence is evaluated using a cost function where tmin and tmax are the first and last output time steps, not counting the ignored outputs
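Written out, the cost is a function of only the non-ignored outputs (a standard way to state it; the exact symbols may differ from the original slide's figure):
C\big(\mathbf{Y}_{(t_{\min})}, \mathbf{Y}_{(t_{\min}+1)}, \dots, \mathbf{Y}_{(t_{\max})}\big)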
  • 72. Recurrent Neural Network Then the gradients of that cost function are propagated backward through the unrolled network, represented by the solid arrows Training RNNs Understanding how RNNs are trained
  • 73. Recurrent Neural Network And finally the model parameters are updated using the gradients computed during BPTT Training RNNs Understanding how RNNs are trained
  • 74. Recurrent Neural Network Note that the gradients flow backward through all the outputs used by the cost function, not just through the final output Training RNNs Understanding how RNNs are trained
  • 75. Recurrent Neural Network Here, the cost function is computed using the last three outputs of the network, Y(2) , Y(3) , and Y(4) , so gradients flow through these three outputs, but not through Y(0) and Y(1) Training RNNs Understanding how RNNs are trained
  • 76. Recurrent Neural Network Moreover, since the same parameters W and b are used at each time step, backpropagation will do the right thing and sum over all time steps Training RNNs Understanding how RNNs are trained
  • 77. Recurrent Neural Network Training a Sequence Classifier Let’s train an RNN to classify MNIST images
  • 78. Recurrent Neural Network Training a Sequence Classifier ● A convolutional neural network would be better suited for image classification ● But this makes for a simple example that we are already familiar with
  • 79. Recurrent Neural Network Training a Sequence Classifier Overview of the task ● We will treat each image as a sequence of 28 rows of 28 pixels each, since each MNIST image is 28 × 28 pixels ● We will use cells of 150 recurrent neurons, plus a fully connected layer containing 10 neurons, one per class, connected to the output of the last time step ● This will be followed by a softmax layer
  • 80. Recurrent Neural Network Overview of the task Training a Sequence Classifier
  • 81. Recurrent Neural Network Construction Phase ● The construction phase is quite straightforward ● It’s pretty much the same as the MNIST classifier we built previously, except that an unrolled RNN replaces the hidden layers ● Note that the fully connected layer is connected to the states tensor, which contains only the final state of the RNN i.e., the 28th output Training a Sequence Classifier
  • 82. Recurrent Neural Network Construction Phase >>> from tensorflow.contrib.layers import fully_connected >>> n_steps = 28 >>> n_inputs = 28 >>> n_neurons = 150 >>> n_outputs = 10 >>> learning_rate = 0.001 >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> y = tf.placeholder(tf.int32, [None]) >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32) Training a Sequence Classifier Run it on Notebook
  • 83. Recurrent Neural Network Construction Phase >>> logits = fully_connected(states, n_outputs, activation_fn=None) >>> xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits( labels=y, logits=logits) >>> loss = tf.reduce_mean(xentropy) >>> optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) >>> training_op = optimizer.minimize(loss) >>> correct = tf.nn.in_top_k(logits, y, 1) >>> accuracy = tf.reduce_mean(tf.cast(correct, tf.float32)) >>> init = tf.global_variables_initializer() Training a Sequence Classifier Run it on Notebook
  • 84. Recurrent Neural Network Load the MNIST data and reshape it Now we will load the MNIST data and reshape the test data to [batch_size, n_steps, n_inputs] as is expected by the network >>> from tensorflow.examples.tutorials.mnist import input_data >>> mnist = input_data.read_data_sets("data/mnist/") >>> X_test = mnist.test.images.reshape((-1, n_steps, n_inputs)) >>> y_test = mnist.test.labels Training a Sequence Classifier Run it on Notebook
  • 85. Recurrent Neural Network Training the RNN We reshape each training batch before feeding it to the network >>> n_epochs = 100 >>> batch_size = 150 >>> with tf.Session() as sess: init.run() for epoch in range(n_epochs): for iteration in range(mnist.train.num_examples // batch_size): X_batch, y_batch = mnist.train.next_batch(batch_size) X_batch = X_batch.reshape((-1, n_steps, n_inputs)) sess.run(training_op, feed_dict={X: X_batch, y: y_batch}) acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch}) acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test}) print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test) Training a Sequence Classifier Run it on Notebook
  • 86. Recurrent Neural Network The Output The output should look like this: 0 Train accuracy: 0.713333 Test accuracy: 0.7299 1 Train accuracy: 0.766667 Test accuracy: 0.7977 ... 98 Train accuracy: 0.986667 Test accuracy: 0.9777 99 Train accuracy: 0.986667 Test accuracy: 0.9809 Training a Sequence Classifier
  • 87. Recurrent Neural Network Conclusion ● We get over 98% accuracy — not bad! ● Plus we would certainly get a better result by ○ Tuning the hyperparameters ○ Initializing the RNN weights using He initialization ○ Training longer ○ Or adding a bit of regularization e.g., dropout Training a Sequence Classifier
  • 88. Recurrent Neural Network Training to Predict Time Series Now, we will train an RNN to predict the next value in a generated time series
  • 89. Recurrent Neural Network Training to Predict Time Series ● Each training instance is a randomly selected sequence of 20 consecutive values from the time series ● And the target sequence is the same as the input sequence, except it is shifted by one time step into the future
  • 90. Recurrent Neural Network Training to Predict Time Series Construction Phase ● It will contain 100 recurrent neurons and we will unroll it over 20 time steps since each training instance will be 20 inputs long ● Each input will contain only one feature, the value at that time ● The targets are also sequences of 20 inputs, each containing a single value
  • 91. Recurrent Neural Network Construction Phase >>> n_steps = 20 >>> n_inputs = 1 >>> n_neurons = 100 >>> n_outputs = 1 >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> y = tf.placeholder(tf.float32, [None, n_steps, n_outputs]) >>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu) >>> outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32) Training to Predict Time Series Run it on Notebook
  • 92. Recurrent Neural Network Construction Phase ● At each time step we now have an output vector of size 100 ● But what we actually want is a single output value at each time step ● The simplest solution is to wrap the cell in an OutputProjectionWrapper Training to Predict Time Series
  • 93. Recurrent Neural Network Construction Phase ● A cell wrapper acts like a normal cell, proxying every method call to an underlying cell, but it also adds some functionality ● The OutputProjectionWrapper adds a fully connected layer of linear neurons i.e., without any activation function on top of each output, but it does not affect the cell state ● All these fully connected layers share the same trainable weights and bias terms. Training to Predict Time Series
  • 94. Recurrent Neural Network RNN cells using output projections Training to Predict Time Series
  • 95. Recurrent Neural Network Wrapping a cell is quite easy Let’s tweak the preceding code by wrapping the BasicRNNCell into an OutputProjectionWrapper >>> cell = tf.contrib.rnn.OutputProjectionWrapper( tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),output_size=n_outputs) Training to Predict Time Series Run it on Notebook
  • 96. Recurrent Neural Network Cost Function and Optimizer ● Now we will define the cost function ● We will use the Mean Squared Error (MSE) ● Next we will create an Adam optimizer, the training op, and the variable initialization op ● >>> learning_rate = 0.001 >>> loss = tf.reduce_mean(tf.square(outputs - y)) >>> optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) >>> training_op = optimizer.minimize(loss) >>> init = tf.global_variables_initializer() Training to Predict Time Series Run it on Notebook
  • 97. Recurrent Neural Network Execution Phase >>> n_iterations = 10000 >>> batch_size = 50 >>> with tf.Session() as sess: init.run() for iteration in range(n_iterations): X_batch, y_batch = [...] # fetch the next training batch sess.run(training_op, feed_dict={X: X_batch, y: y_batch}) if iteration % 100 == 0: mse = loss.eval(feed_dict={X: X_batch, y: y_batch}) print(iteration, "\tMSE:", mse) Training to Predict Time Series Run it on Notebook
  • 98. Recurrent Neural Network Execution Phase The program’s output should look like this 0 MSE: 379.586 100 MSE: 14.58426 200 MSE: 7.14066 300 MSE: 3.98528 400 MSE: 2.00254 [...] Training to Predict Time Series
  • 99. Recurrent Neural Network Making Predictions Once the model is trained, you can make predictions: >>> X_new = [...] # New sequences >>> y_pred = sess.run(outputs, feed_dict={X: X_new}) Training to Predict Time Series
  • 100. Recurrent Neural Network Making Predictions Training to Predict Time Series Shows the predicted sequence for the instances, after 1,000 training iterations
  • 101. Recurrent Neural Network ● Using an OutputProjectionWrapper is the simplest solution to reduce the dimensionality of the RNN’s output sequences down to just one value per time step per instance ● But it is not the most efficient Training to Predict Time Series
  • 102. Recurrent Neural Network ● There is a trickier but more efficient solution: ○ We can reshape the RNN outputs from [batch_size, n_steps, n_neurons] to [batch_size * n_steps, n_neurons] ○ Then apply a single fully connected layer with the appropriate output size in our case just 1, which will result in an output tensor of shape [batch_size * n_steps, n_outputs] ○ And then reshape this tensor to [batch_size, n_steps, n_outputs] Training to Predict Time Series
  • 103. Recurrent Neural Network Reshape the RNN outputs from [batch_size, n_steps, n_neurons] to [batch_size * n_steps, n_neurons] Training to Predict Time Series
  • 104. Recurrent Neural Network Apply a single fully connected layer with the appropriate output size in our case just 1, which will result in an output tensor of shape [batch_size * n_steps, n_outputs] Training to Predict Time Series
  • 105. Recurrent Neural Network And then reshape this tensor to [batch_size, n_steps, n_outputs] Training to Predict Time Series
  • 106. Recurrent Neural Network Let’s implement this solution ● We first revert to a basic cell, without the OutputProjectionWrapper >>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu) >>> rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32) Training to Predict Time Series Run it on Notebook
  • 107. Recurrent Neural Network Let’s implement this solution ● Then we stack all the outputs using the reshape() operation, apply the fully connected linear layer without using any activation function; this is just a projection, and finally unstack all the outputs, again using reshape() >>> stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons]) >>> stacked_outputs = fully_connected(stacked_rnn_outputs, n_outputs, activation_fn=None) >>> outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs]) Training to Predict Time Series Run it on Notebook
  • 108. Recurrent Neural Network Let’s implement this solution ● The rest of the code is the same as earlier. This can provide a significant speed boost since there is just one fully connected layer instead of one per time step. Training to Predict Time Series
  • 109. Recurrent Neural Network Creative RNN Let’s use our model to generate some creative sequences
  • 110. Recurrent Neural Network Creative RNN ● All we need is to provide it a seed sequence containing n_steps values e.g., full of zeros ● Use the model to predict the next value ● Append this predicted value to the sequence ● Feed the last n_steps values to the model to predict the next value ● And so on This process generates a new sequence that has some resemblance to the original time series
  • 111. Recurrent Neural Network Creative RNN >>> sequence = [0.] * n_steps >>> for iteration in range(300): X_batch = np.array(sequence[-n_steps:]).reshape(1, n_steps, 1) y_pred = sess.run(outputs, feed_dict={X: X_batch}) sequence.append(y_pred[0, -1, 0]) Run it on Notebook
  • 112. Recurrent Neural Network Creative RNN Creative sequences seeded with zeros
  • 113. Recurrent Neural Network Creative RNN Creative sequences seeded with an instance
  • 115. Recurrent Neural Network Deep RNNs ● It is quite common to stack multiple layers of cells. ● This gives you a Deep RNN A Deep RNN
  • 116. Recurrent Neural Network Deep RNNs Deep RNN unrolled through time
  • 117. Recurrent Neural Network Deep RNNs How to implement Deep RNN in TensorFlow
  • 118. Recurrent Neural Network ● To implement a deep RNN in TensorFlow ● We can create several cells and stack them into a MultiRNNCell ● In the following code we stack three identical cells >>> n_neurons = 100 >>> n_layers = 3 >>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell([basic_cell] * n_layers) >>> outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32) Deep RNNs - Implementation in TensorFlow Run it on Notebook
  • 119. Recurrent Neural Network >>> outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32) ● The states variable is a tuple containing one tensor per layer, each representing the final state of that layer’s cell, with shape [batch_size, n_neurons] ● If you set state_is_tuple=False when creating the MultiRNNCell, then states becomes a single tensor containing the states from every layer, concatenated along the column axis, i.e., its shape is [batch_size, n_layers * n_neurons] Deep RNNs - Implementation in TensorFlow
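For example, with the default state_is_tuple=True you could pull the per-layer states apart like this (a small sketch):
# states is a tuple with one tensor per layer, each of shape [batch_size, n_neurons]
top_layer_state = states[-1]             # final state of the top layer
all_states = tf.concat(states, axis=1)   # shape [batch_size, n_layers * n_neurons]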
  • 120. Recurrent Neural Network ● If you build a very deep RNN, it may end up overfitting the training set ● To prevent that, a common technique is to apply dropout ● You can simply add a dropout layer before or after the RNN as usual ● But if you also want to apply dropout between the RNN layers, you need to use a DropoutWrapper Deep RNNs - Applying Dropout
  • 121. Recurrent Neural Network ● The following code applies dropout to the inputs of each layer in the RNN, dropping each input with a 50% probability >>> keep_prob = 0.5 >>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> cell_drop = tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob) >>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell([cell_drop] * n_layers) >>> rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32) Deep RNNs - Applying Dropout Run it on Notebook
  • 122. Recurrent Neural Network ● It is also possible to apply dropout to the outputs by setting output_keep_prob ● The main problem with this code is that it will apply dropout not only during training but also during testing, which is not what we want ● Since dropout should be applied only during training Deep RNNs - Applying Dropout
  • 123. Recurrent Neural Network ● Unfortunately, the DropoutWrapper does not support an is_training placeholder ● So we must either write our own dropout wrapper class, or have two different graphs: ○ One for training ○ And the other for testing Let’s implement the second option Deep RNNs - Applying Dropout
  • 124. Recurrent Neural Network >>> import sys >>> is_training = (sys.argv[-1] == "train") >>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs]) >>> y = tf.placeholder(tf.float32, [None, n_steps, n_outputs]) >>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) >>> if is_training: cell = tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob) >>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell([cell] * n_layers) >>> rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32) [...] # build the rest of the graph >>> init = tf.global_variables_initializer() >>> saver = tf.train.Saver() >>> with tf.Session() as sess: >>> if is_training: init.run() for iteration in range(n_iterations): [...] # train the model save_path = saver.save(sess, "/tmp/my_model.ckpt") else: saver.restore(sess, "/tmp/my_model.ckpt") [...] # use the model Run it on Notebook Deep RNNs - Applying Dropout
  • 125. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● To train an RNN on long sequences, we will need to run it over many time steps, making the unrolled RNN a very deep network ● Just like any deep neural network it may suffer from the vanishing/exploding gradients problem and take forever to train Deep RNNs
  • 126. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● Many of the tricks we discussed to alleviate this problem can be used for deep unrolled RNNs as well: ○ good parameter initialization, ○ nonsaturating activation functions e.g., ReLU ○ Batch Normalization, ○ Gradient Clipping, ○ And faster optimizers Deep RNNs
  • 127. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● However, if the RNN needs to handle even moderately long sequences e.g., 100 inputs, then training will still be very slow ● The simplest and most common solution to this problem is to unroll the RNN only over a limited number of time steps during training ● This is called truncated backpropagation through time Deep RNNs
  • 129. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● In TensorFlow you can implement truncated backpropagation through time simply by truncating the input sequences ● For example, in the time series prediction problem, you would simply reduce n_steps during training ● The problem with this is that the model will not be able to learn long-term patterns How can we solve this problem? Deep RNNs
  • 130. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● One workaround could be to make sure that these shortened sequences contain both old and recent data ● So that the model can learn to use both ● E.g., the sequence could contain monthly data for the last five months, then weekly data for the last five weeks, then daily data over the last five days ● But this workaround has its limits: ○ What if fine-grained data from last year is actually useful? ○ What if there was a brief but significant event that absolutely must be taken into account, even years later ○ E.g., the result of an election Deep RNNs
  • 131. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● Besides the long training time ○ A second problem faced by long-running RNNs is the fact that the memory of the first inputs gradually fades away ○ Indeed, due to the transformations that the data goes through when traversing an RNN, some information is lost after each time step. ● After a while, the RNN’s state contains virtually no trace of the first inputs Let’s understand this with an example Deep RNNs
  • 132. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● Say you want to perform sentiment analysis on a long review that starts with the four words “I loved this movie,” ● But the rest of the review lists the many things that could have made the movie even better ● If the RNN gradually forgets the first four words, it will completely misinterpret the review Deep RNNs
  • 133. Recurrent Neural Network The Difficulty of Training over Many Time Steps ● To solve this problem, various types of cells with long-term memory have been introduced ● They have proved so successful that the basic cells are not much used anymore Let’s study about these long memory cells Deep RNNs
  • 135. Recurrent Neural Network ● The Long Short-Term Memory (LSTM) cell was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber ● And it was gradually improved over the years by several researchers, such as Alex Graves, Haşim Sak, Wojciech Zaremba, and many more LSTM Cell Sepp Hochreiter Jürgen Schmidhuber
  • 136. Recurrent Neural Network LSTM Cell ● If you consider the LSTM cell as a black box, it can be used very much like a basic cell ● Except ○ It will perform much better ○ Training will converge faster ○ And it will detect long-term dependencies in the data In TensorFlow, you can simply use a BasicLSTMCell instead of a BasicRNNCell >>> lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons)
  • 137. Recurrent Neural Network LSTM Cell ● LSTM cells manage two state vectors, and for performance reasons they are kept separate by default ● We can change this default behavior by setting state_is_tuple=False when creating the BasicLSTMCell
  • 138. Recurrent Neural Network LSTM Cell The architecture of a basic LSTM cell
  • 139. Recurrent Neural Network ● The LSTM cell looks exactly like a regular cell, except that its state is split into two vectors: h(t) and c(t), where “c” stands for “cell” LSTM Cell
  • 140. Recurrent Neural Network ● We can think of h(t) as the short-term state and c(t) as the long-term state LSTM Cell
  • 141. Recurrent Neural Network Understanding the LSTM cell structure ● The key idea is that the network can learn ○ What to store in the long-term state, ○ What to throw away, ○ And what to read from it LSTM Cell
  • 142. Recurrent Neural Network As the long-term state c(t–1) traverses the network from left to right, it first goes through a forget gate, dropping some memories Understanding the LSTM cell structure LSTM Cell
  • 143. Recurrent Neural Network Understanding the LSTM cell structure LSTM Cell And then it adds some new memories via the addition operation, which adds the memories that were selected by an input gate
  • 144. Recurrent Neural Network The result c(t) is sent straight out, without any further transformation. So, at each time step, some memories are dropped and some memories are added Understanding the LSTM cell structure LSTM Cell
  • 145. Recurrent Neural Network Moreover, after the addition operation, the long term state is copied and passed through the tanh function, and then the result is filtered by the output gate. Understanding the LSTM cell structure LSTM Cell
  • 146. Recurrent Neural Network This produces the short-term state h(t) , which is equal to the cell’s output for this time step y(t) Understanding the LSTM cell structure LSTM Cell
  • 148. Recurrent Neural Network Now let’s look at where new memories come from and how the gates work LSTM Cell
  • 149. Recurrent Neural Network First, the current input vector x(t) and the previous short-term state h(t–1) are fed to four different fully connected layers. They all serve a different purpose Understanding the LSTM cell structure LSTM Cell
  • 150. Recurrent Neural Network The main layer is the one that outputs g(t) . It has the usual role of analyzing the current inputs x(t) and the previous short-term state h(t–1) . In an LSTM cell this layer’s output is partially stored in the long-term state. Understanding the LSTM cell structure LSTM Cell
  • 151. Recurrent Neural Network The three other layers are gate controllers. Since they use the logistic activation function, their outputs range from 0 to 1. Understanding the LSTM cell structure LSTM Cell
  • 152. Recurrent Neural Network ● This summarizes how to compute the cell’s long-term state, its short-term state, and its output at each time step for a single instance ● The equations for a whole mini-batch are very similar Understanding the LSTM cell structure LSTM Cell
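For reference, the per-instance LSTM computations are commonly written as below (σ is the logistic function, ⊗ is element-wise multiplication; the notation is a standard choice and may differ from the slide's figure):
\begin{aligned}
\mathbf{i}_{(t)} &= \sigma\big(\mathbf{W}_{xi}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{hi}^{T}\,\mathbf{h}_{(t-1)} + \mathbf{b}_{i}\big)\\
\mathbf{f}_{(t)} &= \sigma\big(\mathbf{W}_{xf}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{hf}^{T}\,\mathbf{h}_{(t-1)} + \mathbf{b}_{f}\big)\\
\mathbf{o}_{(t)} &= \sigma\big(\mathbf{W}_{xo}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{ho}^{T}\,\mathbf{h}_{(t-1)} + \mathbf{b}_{o}\big)\\
\mathbf{g}_{(t)} &= \tanh\big(\mathbf{W}_{xg}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{hg}^{T}\,\mathbf{h}_{(t-1)} + \mathbf{b}_{g}\big)\\
\mathbf{c}_{(t)} &= \mathbf{f}_{(t)} \otimes \mathbf{c}_{(t-1)} + \mathbf{i}_{(t)} \otimes \mathbf{g}_{(t)}\\
\mathbf{y}_{(t)} &= \mathbf{h}_{(t)} = \mathbf{o}_{(t)} \otimes \tanh\big(\mathbf{c}_{(t)}\big)
\end{aligned}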
  • 153. Recurrent Neural Network Conclusion ● An LSTM cell can learn to ○ Recognize an important input, that’s the role of the input gate, ○ Store it in the long-term state, ○ Learn to preserve it for as long as it is needed, that’s the role of the forget gate, ○ And learn to extract it whenever it is needed This explains why they have been amazingly successful at capturing long-term patterns in time series, long texts, audio recordings, and more. LSTM Cell
  • 154. Recurrent Neural Network Peephole Connections ● In a basic LSTM cell, the gate controllers can look only at the input x(t) and the previous short-term state h(t–1) ● It may be a good idea to give them a bit more context by letting them peek at the long-term state as well ● This idea was proposed by Felix Gers and Jürgen Schmidhuber in 2000
  • 155. Recurrent Neural Network ● They proposed an LSTM variant with extra connections called peephole connections: ○ The previous long-term state c(t–1) is added as an input to the controllers of the forget gate and the input gate, ○ And the current long-term state c(t) is added as input to the controller of the output gate. Peephole Connections
  • 157. Recurrent Neural Network Peephole Connections To implement peephole connections in TensorFlow, you must use the LSTMCell instead of the BasicLSTMCell and set use_peepholes=True: >>> lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_neurons, use_peepholes=True) There are many other variants of the LSTM cell. One particularly popular variant is the GRU cell, which we will look at now.
  • 159. Recurrent Neural Network GRU Cell The Gated Recurrent Unit (GRU) cell was proposed by Kyunghyun Cho et al. in a 2014 paper that also introduced the Encoder–Decoder network we discussed earlier Kyunghyun Cho
  • 160. Recurrent Neural Network GRU Cell ● The GRU cell is a simplified version of the LSTM cell ● It seems to perform just as well ● This explains its growing popularity
  • 161. Recurrent Neural Network GRU Cell The main simplifications are: ● Both state vectors are merged into a single vector h(t)
  • 162. Recurrent Neural Network The main simplifications are: ● A single gate controller controls both the forget gate and the input gate. If the gate controller outputs a 1, the input gate is open and the forget gate is closed. GRU Cell
  • 163. Recurrent Neural Network The main simplifications are: If it outputs a 0, the opposite happens In other words, whenever a memory must be stored, the location where it will be stored is erased first. This is actually a frequently used variant of the LSTM cell in its own right GRU Cell
  • 164. Recurrent Neural Network The main simplifications are: ● There is no output gate; the full state vector is output at every time step. There is a new gate controller that controls which part of the previous state will be shown to the main layer. GRU Cell
  • 165. Recurrent Neural Network Equations to compute the cell’s state at each time step for a single instance GRU Cell
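One common formulation of the GRU equations, consistent with the description above (z(t) = 1 keeps the new candidate g(t) and drops the old state; sign conventions vary across sources):
\begin{aligned}
\mathbf{z}_{(t)} &= \sigma\big(\mathbf{W}_{xz}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{hz}^{T}\,\mathbf{h}_{(t-1)}\big)\\
\mathbf{r}_{(t)} &= \sigma\big(\mathbf{W}_{xr}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{hr}^{T}\,\mathbf{h}_{(t-1)}\big)\\
\mathbf{g}_{(t)} &= \tanh\big(\mathbf{W}_{xg}^{T}\,\mathbf{x}_{(t)} + \mathbf{W}_{hg}^{T}\,(\mathbf{r}_{(t)} \otimes \mathbf{h}_{(t-1)})\big)\\
\mathbf{h}_{(t)} &= (1 - \mathbf{z}_{(t)}) \otimes \mathbf{h}_{(t-1)} + \mathbf{z}_{(t)} \otimes \mathbf{g}_{(t)}
\end{aligned}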
  • 166. Recurrent Neural Network Implementing GRU cell in TensorFlow >>> gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons) ● LSTM or GRU cells are one of the main reasons behind the success of RNNs in recent years ● In particular for applications in natural language processing (NLP) GRU Cell
  • 167. Recurrent Neural Network Natural Language Processing
  • 168. Recurrent Neural Network Natural Language Processing ● Most of the state-of-the-art NLP applications, such as ○ Machine translation, ○ Automatic summarization, ○ Parsing, ○ Sentiment analysis, ○ and more, are now based on RNNs Now we will take a quick look at what a machine translation model looks like. This topic is very well covered by TensorFlow’s awesome Word2Vec and Seq2Seq tutorials, so you should definitely check them out
  • 169. Recurrent Neural Network Natural Language Processing - Word Representation Before we start, we need to answer this important question How do we represent a “word” ??
  • 170. Recurrent Neural Network Natural Language Processing - Word Representation In order to apply algorithms, we need to convert everything into numbers. What can we do about climate?
    temp | climate | comments
    12   | Cold    | Very nice place to visit in summers
    30   | Hot     | Do not visit. This is a trap
  • 171. Recurrent Neural Network Natural Language Processing - Word Representation In order to apply algorithms, we need to convert everything into numbers. What can we do about climate? We can convert it into a One-Hot vector
    temp | climate | comments
    12   | Cold    | Very nice place to visit in summers
    30   | Hot     | Do not visit. This is a trap
    temp | climate_cold | climate_hot | comments
    12   | 1            | 0           | Very nice place to visit in summers
    30   | 0            | 1           | Do not visit. This is a trap
  • 172. Recurrent Neural Network Natural Language Processing - Word Representation In order to apply algorithms, we need to convert everything into numbers. And what can we do about the comments?
    temp | climate | comments
    12   | Cold    | Very nice place to visit in summers
    30   | Hot     | Do not visit. This is a trap
  • 173. Recurrent Neural Network One option could be to represent each word using a one-hot vector. But consider this : ● Suppose your vocabulary contains 50,000 words ● Then the nth word would be represented as a 50,000-dimensional vector, full of 0s except for a 1 at the nth position ● However, with such a large vocabulary, this sparse representation would not be efficient at all Natural Language Processing - Word Representation
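A tiny sketch of what such a one-hot vector looks like (the id 288 is just the illustrative id used for “milk” later in these slides):
import numpy as np

vocabulary_size = 50000
word_id = 288                            # e.g. the id of the word "milk"
one_hot = np.zeros(vocabulary_size)
one_hot[word_id] = 1.0                   # 49,999 zeros and a single 1: very sparse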
  • 174. Recurrent Neural Network ● Ideally, we want similar words to have similar representations, making it easy for the model to generalize what it learns about a word to all similar words ● For example, ○ If the model is told that “I drink milk” is a valid sentence, and if it knows that “milk” is close to “water” but far from “shoes” ○ Then it will know that “I drink water” is probably a valid sentence as well ○ While “I drink shoes” is probably not But how can you come up with such a meaningful representation? Natural Language Processing - Word Representation
  • 175. Recurrent Neural Network ● The most common solution is to represent each word in the vocabulary using a fairly small and dense vector e.g., 150 dimensions, called an Embedding ● And just let the neural network learn a good embedding for each word during training Natural Language Processing - Word Embedding
  • 176. Recurrent Neural Network With word embeddings a lot of magic is possible: king - man + woman == queen Natural Language Processing - Word Embedding
  • 177. Recurrent Neural Network from gensim.models import KeyedVectors # load the google word2vec model filename = 'GoogleNews-vectors-negative300.bin' model = KeyedVectors.load_word2vec_format(filename, binary=True) # calculate: (king - man) + woman = ? result = model.most_similar(positive=['woman', 'king'], negative=['man'], topn=1) print(result) Word Embedding - word2vec ● Based on the context in which words appear, people have trained such word vectors. ● One such set of pretrained vectors is word2vec and another is GloVe [('queen', 0.7118192315101624)]
  • 178. Recurrent Neural Network Word Embedding - Vector space models (VSMs) Based on the Distributional Hypothesis: ○ words that appear in the same contexts share semantic meaning. Two Approaches: 1. Count-based methods (e.g. Latent Semantic Analysis) 2. Predictive methods (e.g. neural probabilistic language models)
  • 179. Recurrent Neural Network Word Embedding - word2vec - Approaches 1. Count-based methods (e.g. Latent Semantic Analysis) ○ Compute the statistics of how often some word co-occurs with its neighbor words in a large text corpus ○ Map these count-statistics down to a small, dense vector for each word
  • 180. Recurrent Neural Network 2. Predictive models ○ Directly try to predict a word from its neighbors ○ in terms of learned small, dense embedding vectors ○ (considered parameters of the model). Word Embedding - word2vec - Approaches
  • 181. Recurrent Neural Network Computationally-efficient predictive model for learning word embeddings from raw text. word2vec Comes in two flavors: 1. Continuous Bag-of-Words model (CBOW) 2. Skip-Gram model
  • 182. Recurrent Neural Network Computationally-efficient predictive model for learning word embeddings from raw text. word2vec Comes in two flavors: 1. Continuous Bag-of-Words model (CBOW) ○ predicts target words (e.g. 'mat') from source context words ○ e.g ('the cat sits on the'), 2. Skip-Gram model
  • 183. Recurrent Neural Network Computationally-efficient predictive model for learning word embeddings from raw text. word2vec Comes in two flavors: 1. Continuous Bag-of-Words model (CBOW) ○ predicts target words (e.g. 'mat') from source context words ○ e.g ('the cat sits on the'), 2. Skip-Gram model ○ Predicts source context-words from the target words ○ Treats each context-target pair as a new observation ○ Tends to do better when we have larger datasets. ○ Will focus on this
  • 184. Recurrent Neural Network Neural probabilistic language models ● are traditionally trained using the maximum likelihood (ML) principle ● to maximize the probability of the next word wt (for "target") ● given the previous words h (for "history") in terms of a softmax function, word2vec: Scaling up Noise-Contrastive Training
  • 185. Recurrent Neural Network Neural probabilistic language models ● are traditionally trained using the maximum likelihood (ML) principle ● to maximize the probability of the next word wt (for "target") ● given the previous words h (for "history") in terms of a softmax function, word2vec: Scaling up Noise-Contrastive Training where score(wt , h) computes the compatibility of word wt with the context h (a dot product is commonly used). We train this model by maximizing its log-likelihood, i.e.
  • 186. Recurrent Neural Network Neural probabilistic language models ● are traditionally trained using the maximum likelihood (ML) principle ● to maximize the probability of the next word wt (for "target") ● given the previous words h (for "history") in terms of a softmax function, word2vec: Scaling up Noise-Contrastive Training where score(wt , h) computes the compatibility of word wt with the context h (a dot product is commonly used). We train this model by maximizing its log-likelihood, i.e. This is very expensive, because we need to compute and normalize each probability using the score for all the other words w' in the vocabulary V, in the current context, at every training step.
  • 187. Recurrent Neural Network Neural probabilistic language models ● are traditionally trained using the maximum likelihood (ML) principle ● to maximize the probability of the next word wt (for "target") ● given the previous words h (for "history") in terms of a softmax function, word2vec: Scaling up Noise-Contrastive Training This is very expensive, because we need to compute and normalize each probability using the score for all the other words w' in the vocabulary V, in the current context, at every training step.
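Up to notation, the softmax and the log-likelihood referred to on these slides can be written as in TensorFlow's word2vec tutorial:
P(w_t \mid h) = \operatorname{softmax}\big(\operatorname{score}(w_t, h)\big)
             = \frac{\exp\big(\operatorname{score}(w_t, h)\big)}{\sum_{w' \in V} \exp\big(\operatorname{score}(w', h)\big)}

J_{\text{ML}} = \log P(w_t \mid h)
             = \operatorname{score}(w_t, h) - \log \sum_{w' \in V} \exp\big(\operatorname{score}(w', h)\big)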
  • 188. Recurrent Neural Network Instead, these models are trained using a binary classification objective (logistic regression) to discriminate the real target words wt from k imaginary (noise) words w̃, in the same context. word2vec: Scaling up Noise-Contrastive Training 1. Computing the loss function now scales only with the number of noise words that we select, and not with all the words in the vocabulary 2. This makes it much faster to train. 3. We will use the similar noise-contrastive estimation (NCE) loss, tf.nn.nce_loss().
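A minimal sketch of an NCE-based embedding model, in the spirit of TensorFlow's word2vec tutorial (the hyperparameter values such as num_sampled are illustrative, not prescribed by the slides):
import math
import tensorflow as tf

vocabulary_size = 50000
embedding_size = 150
num_sampled = 64                         # noise words drawn per positive example

train_inputs = tf.placeholder(tf.int32, shape=[None])      # source word ids
train_labels = tf.placeholder(tf.int32, shape=[None, 1])   # target word ids

embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

nce_weights = tf.Variable(tf.truncated_normal(
    [vocabulary_size, embedding_size], stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

# The loss draws negative (noise) samples automatically for each training example
loss = tf.reduce_mean(tf.nn.nce_loss(
    weights=nce_weights, biases=nce_biases,
    labels=train_labels, inputs=embed,
    num_sampled=num_sampled, num_classes=vocabulary_size))
training_op = tf.train.GradientDescentOptimizer(1.0).minimize(loss)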
  • 189. Recurrent Neural Network the quick brown fox jumped over the lazy dog Word2vec: Context Example ([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ... Context: word to the left and word to the right.
  • 190. Recurrent Neural Network the quick brown fox jumped over the lazy dog Word2vec: Skip Gram Model (quick, the), (quick, brown), (brown, quick), (brown, fox), ... Task becomes to predict 'the' and 'brown' from 'quick', 'quick' and 'fox' from 'brown', etc. Skip-gram ● inverts contexts and targets, and ● tries to predict each context word from its target word
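A small (hypothetical) helper that generates such skip-gram pairs with a window of one word on each side; it produces the pairs listed above, plus the pairs for the first and last words:
def skip_gram_pairs(words, window=1):
    """Generate (target, context) pairs for the skip-gram model."""
    pairs = []
    for i, target in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                pairs.append((target, words[j]))
    return pairs

sentence = "the quick brown fox jumped over the lazy dog".split()
print(skip_gram_pairs(sentence)[:5])
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#  ('brown', 'quick'), ('brown', 'fox')]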
  • 191. Recurrent Neural Network Natural Language Processing - Word Embedding Let's imagine what happens at training step t ● For the first case above, the goal is to predict 'the' from 'quick'. ● We select num_noise noisy (contrastive) examples ○ by drawing from some noise distribution, ○ typically the unigram distribution ● For simplicity let's say num_noise=1 and we select 'sheep' as the noisy example. Next we compute the loss for this pair of observed and noisy examples
  • 192. Recurrent Neural Network Natural Language Processing - Word Embedding The objective at time step t becomes:
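Following the formulation in TensorFlow's word2vec tutorial (restated here up to notation), with num_noise=1 and the noise word 'sheep', the per-step objective is:
J^{(t)}_{\text{NEG}} = \log Q_{\theta}(D = 1 \mid \text{the}, \text{quick})
                     + \log\big(Q_{\theta}(D = 0 \mid \text{sheep}, \text{quick})\big)
where Q_{\theta}(D = 1 \mid w, h) is the model's probability, under the learned embeddings θ, that the word w seen in context h came from the real data rather than from the noise distribution.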
  • 193. Recurrent Neural Network Natural Language Processing - Word Embedding ● The goal is to make an update to the embedding parameters ● to improve (in this case, maximize) the objective function ● We do this by deriving the gradient of the loss with respect to the embedding parameters θ (luckily TensorFlow provides easy helper functions for doing this!). ● We then perform an update to the embeddings by taking a small step in the direction of the gradient. When this process is repeated over the entire training set, this has the effect of 'moving' the embedding vectors around for each word until the model is successful at discriminating real words from noise words.
  • 194. Recurrent Neural Network Natural Language Processing - Word Embedding ● At the beginning of training, embeddings are simply chosen randomly, ● But during training, backpropagation automatically moves the embeddings around in a way that helps the neural network perform its task
  • 195. Recurrent Neural Network Natural Language Processing - Word Embedding ● Typically this means that similar words will gradually cluster close to one another, and even end up organized in a rather meaningful way. ● For example, embeddings may end up placed along various axes that represent ○ gender, ○ singular/plural, ○ adjective/noun, ○ and so on
  • 196. Recurrent Neural Network Natural Language Processing - Word Embedding How to do it in TensorFlow In TensorFlow, we first need to create the variable representing the embeddings for every word in our vocabulary which is initialized randomly >>> vocabulary_size = 50000 >>> embedding_size = 150 >>> embeddings = tf.Variable( tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
  • 197. Recurrent Neural Network How to do it in TensorFlow - Preprocessing Suppose we want to feed the sentence “I drink milk” to your neural network. ● We should first preprocess the sentence and break it into a list of known words ● For example ○ We may remove unnecessary characters, replace unknown words by a predefined token word such as “[UNK]”, ○ Replace numerical values by “[NUM]”, ○ Replace URLs by “[URL]”, ○ And so on Natural Language Processing - Word Embedding
  • 198. Recurrent Neural Network How to do it in TensorFlow ● Once we have a list of known words, we can look up each word’s integer identifier from 0 to 49999 in a dictionary, for example [72, 3335, 288] ● At that point, you are ready to feed these word identifiers to TensorFlow using a placeholder, and apply the embedding_lookup() function to get the corresponding embeddings >>> train_inputs = tf.placeholder(tf.int32, shape=[None]) # from ids... >>> embed = tf.nn.embedding_lookup(embeddings, train_inputs) # ...to embeddings Natural Language Processing - Word Embedding
  • 199. Recurrent Neural Network ● Once our model has learned good word embeddings, they can actually be reused fairly efficiently in any NLP application ● In fact, instead of training our own word embeddings, we may want to download pre-trained word embeddings ● Just like when reusing pretrained layers, we can choose to ○ Freeze the pretrained embeddings ○ Or let backpropagation tweak them for our application ● The first option will speed up training, but the second may lead to slightly higher performance Natural Language Processing - Word Embedding
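A hedged sketch of both options, assuming pretrained is a NumPy array of shape [vocabulary_size, embedding_size] loaded from a pretrained model and train_inputs is the word-id placeholder from before:
# Frozen pretrained embeddings: backpropagation will not touch them
embeddings = tf.Variable(pretrained, dtype=tf.float32, trainable=False)
# ...or set trainable=True to let backpropagation fine-tune them for our task
embed = tf.nn.embedding_lookup(embeddings, train_inputs)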
  • 201. Recurrent Neural Network Machine Translation We now have almost all the tools we need to implement a machine translation system Let’s look at this now
  • 202. Recurrent Neural Network Machine Translation An Encoder–Decoder Network for Machine Translation Let’s take a look at a simple machine translation model that will translate English sentences to French
  • 203. Recurrent Neural Network Machine Translation An Encoder–Decoder Network for Machine Translation A simple machine translation model
  • 204. Recurrent Neural Network Machine Translation Let’s learn how this Encoder–Decoder Network for Machine Translation is trained
  • 205. Recurrent Neural Network The English sentences are fed to the encoder, and the decoder outputs the French translations Machine Translation An Encoder–Decoder Network for Machine Translation
  • 206. Recurrent Neural Network Note that the French translations are also used as inputs to the decoder, but pushed back by one step Machine Translation An Encoder–Decoder Network for Machine Translation
  • 207. Recurrent Neural Network Machine Translation An Encoder–Decoder Network for Machine Translation In other words, the decoder is given as input the word that it should have output at the previous step, regardless of what it actually output at that step
  • 208. Recurrent Neural Network For the very first word, the decoder is given a token that represents the beginning of the sentence (here, “<go>”) The decoder is expected to end the sentence with an end-of-sequence (EOS) token (here, “<eos>”) Machine Translation An Encoder–Decoder Network for Machine Translation
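For example, with hypothetical integer ids for the target “Je bois du lait”, the decoder's training input is the target sequence shifted right by one and prefixed with the <go> token:
GO, EOS = 0, 1                               # hypothetical ids for the special tokens
target_ids    = [57, 163, 12, 88, EOS]       # what the decoder should output
decoder_input = [GO, 57, 163, 12, 88]        # what the decoder is fed during training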
  • 209. Recurrent Neural Network Question: Why are the English sentences reversed before being fed to the encoder? Here “I drink milk” is reversed to “milk drink I” Machine Translation An Encoder–Decoder Network for Machine Translation
  • 210. Recurrent Neural Network Answer: This ensures that the beginning of the English sentence will be fed last to the encoder, which is useful because that’s generally the first thing that the decoder needs to translate Machine Translation An Encoder–Decoder Network for Machine Translation
  • 211. Recurrent Neural Network ● Each word is initially represented by a simple integer identifier ● e.g., 288 for the word “milk” Machine Translation An Encoder–Decoder Network for Machine Translation
  • 212. Recurrent Neural Network ● Next, an embedding lookup returns the word embedding ● This is a dense, fairly low-dimensional vector ● These word embeddings are what is actually fed to the encoder and the decoder Machine Translation An Encoder–Decoder Network for Machine Translation
  • 213. Recurrent Neural Network ● At each step, the decoder outputs a score for each word in the output vocabulary i.e., French, Machine Translation An Encoder–Decoder Network for Machine Translation
  • 214. Recurrent Neural Network ● And then the Softmax layer turns these scores into probabilities Machine Translation An Encoder–Decoder Network for Machine Translation
  • 215. Recurrent Neural Network ● For example, at the first step the word “Je” may have a probability of 20%, “Tu” may have a probability of 1%, and so on ● The word with the highest probability is output Machine Translation An Encoder–Decoder Network for Machine Translation
  • 216. Recurrent Neural Network How can we use this Encoder–Decoder Network for Machine Translation at inference time, since we will not have the target sentence to feed to the decoder? Machine Translation An Encoder–Decoder Network for Machine Translation
  • 217. Recurrent Neural Network ● We will simply feed the decoder the word that it output at the previous step ● This will require an embedding lookup that is not shown on the diagram Machine Translation An Encoder–Decoder Network for Machine Translation