
The TensorFlow tutorial on language modeling allows you to compute the probability of sentences:

probabilities = tf.nn.softmax(logits)

In the comments below it, the tutorial also mentions a way of predicting the next word instead of probabilities, but it does not specify how this can be done. So how can I output a word instead of a probability using this example?

lstm = rnn_cell.BasicLSTMCell(lstm_size)
# Initial state of the LSTM memory.
state = tf.zeros([batch_size, lstm.state_size])

loss = 0.0
for current_batch_of_words in words_in_dataset:
    # The value of state is updated after processing each batch of words.
    output, state = lstm(current_batch_of_words, state)

    # The LSTM output can be used to make next word predictions
    logits = tf.matmul(output, softmax_w) + softmax_b
    probabilities = tf.nn.softmax(logits)
    loss += loss_function(probabilities, target_words)

3 Answers

2

Your output is a TensorFlow tensor, and you can get the index of its maximum entry (the most probable predicted class) with a TensorFlow function. This tensor is normally the one that contains the next word's probabilities.

At "Evaluate the Model" from this page, your output list is y in the following example:

First we'll figure out where we predicted the correct label. tf.argmax is an extremely useful function which gives you the index of the highest entry in a tensor along some axis. For example, tf.argmax(y,1) is the label our model thinks is most likely for each input, while tf.argmax(y_,1) is the true label. We can use tf.equal to check if our prediction matches the truth.

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
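
For instance, a minimal sketch of applying the same idea to the question's code (the id_to_word reverse dictionary and the session.run call are assumptions for illustration, not part of the tutorial snippet) could look like:

probabilities = tf.nn.softmax(logits)
predicted_ids = tf.argmax(probabilities, 1)  # index of the most likely next word for each example

# After running the graph, map the ids back to words; id_to_word is assumed
# to be the reverse of the tutorial's word_to_id map:
# ids = session.run(predicted_ids, feed_dict=...)
# predicted_words = [id_to_word[i] for i in ids]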

Another, different approach is to work with pre-vectorized (embedded/encoded) words. You could vectorize your words (i.e. embed them) with word2vec to accelerate learning; you might want to take a look at this. Each word would be represented as a point in a 300-dimensional space of meaning, and you could automatically find the "N words" closest to the point in space predicted at the output of the network. In that case, the argmax way of proceeding does not work anymore; instead, you could compare the prediction to the words you truly wanted using cosine similarity, although I am not sure whether this could cause numerical instabilities. In that case y will not represent words as features, but word embeddings with a dimensionality of, let's say, 100 to 2000, depending on the model. You could Google something like "man woman queen word addition word2vec" for more info to understand the subject of embeddings better.

Note: when I talk about word2vec here, I mean using an external pre-trained word2vec model so that your training only has to deal with pre-embedded inputs and produce embedded outputs. The words corresponding to those outputs can then be recovered with word2vec by looking up the most similar top predicted words.
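
A rough sketch of that nearest-neighbour lookup (embedding_matrix and id_to_word are assumed variables for illustration, not part of the tutorial) might look like this:

import numpy as np

def closest_words(predicted_vector, embedding_matrix, id_to_word, n=5):
    # Cosine similarity between the predicted embedding and every word embedding.
    pred_norm = predicted_vector / np.linalg.norm(predicted_vector)
    word_norms = np.linalg.norm(embedding_matrix, axis=1)
    similarities = embedding_matrix.dot(pred_norm) / word_norms
    # Keep the n most similar word ids and map them back to words.
    top_ids = np.argsort(-similarities)[:n]
    return [id_to_word[i] for i in top_ids]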

Notice that the approach I suggest is not exact, since it only tells you whether we predicted EXACTLY the word we wanted to predict. For a softer approach, it would be possible to use ROUGE or BLEU metrics to evaluate your model in case you work with sentences or something longer than a single word.
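
For example, a small sketch using NLTK's BLEU implementation (the tokenized sentences are made up for illustration) would be:

from nltk.translate.bleu_score import sentence_bleu

# One or more reference token lists, and the predicted token list.
reference = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "sat", "on", "a", "mat"]
print(sentence_bleu(reference, hypothesis))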

  • That is not the correct function for this purpose, as the next most likely word given the existing sequence needs to be found.
    – stackit
    Commented Nov 24, 2015 at 6:06
  • Maybe your question was not precise enough? It seems to me that tf.argmax(probabilities,1) would give you the answer after training. Giving the most likely word is what the model is trained to do, and therefore it is what it will output. You may need to tweak the index returned by that function call a little bit to get the word back from your dictionary. Commented Nov 26, 2015 at 4:11
  • If your model was trained to predict word embeddings (words represented as vectors), you need a tool to reverse the embedding of your words. word2vec and GloVe are interesting pretrained models for that reason. If your whole word dictionary is encoded as one one-hot vector per word, then the number output by my function here is the index of that word in the dictionary. Commented Nov 26, 2015 at 4:15
  • @GuillaumeChevalier @stackit Sorry, just for clarification: you're saying that just by calling correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1)) you'll be able to predict the most likely value? Is y then going to be the sentence that you want to predict on? Sorry, I'm still having trouble going from a string where I want to predict the next word to a valid list of words that could come next with some probability.
    – jlarks32
    Commented Feb 21, 2017 at 2:35
  • You seem to want to predict sentences and not only single words. I would recommend taking a look at Udacity's deep learning class, which has an assignment where you code a word2vec model yourself: classroom.udacity.com/courses/ud730 This could help you figure out how to code a seq2seq model with a dynamic decoder for better sentence prediction. More info here too: youtube.com/watch?v=RIR_-Xlbp7s Commented Feb 22, 2017 at 0:51
1

You need to find the argmax of the probabilities, and translate the index back to a word by reversing the word_to_id map. To get this to work, you must save the probabilities in the model and then fetch them from the run_epoch function (you could also save just the argmax itself). Here's a snippet:

inverseDictionary = dict(zip(word_to_id.values(), word_to_id.keys()))

def run_epoch(...):
  # Most probable next word id from the logits (assumes numpy is imported as np).
  decodedWordId = int(np.argmax(logits))
  # Print the input words, the predicted word, and the expected word.
  print(" ".join([inverseDictionary[int(x1)] for x1 in np.nditer(x)])
        + " got: " + inverseDictionary[decodedWordId]
        + " expected: " + inverseDictionary[int(y)])

See full implementation here: https://github.com/nelken/tf

  • Code above doesn't work with current versions of TensorFlow.
    – pr338
    Commented Feb 12, 2019 at 2:33
-1

It is actually an advantage that the function returns probabilities instead of the word itself. Since it gives you a list of words with their associated probabilities, you can do further processing and increase the accuracy of your result.

To answer your question: you can take the list of words, iterate through it, and make the program display the word with the highest probability.
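
A minimal sketch of that idea (probabilities_for_one_example and id_to_word are assumed variables for illustration) could be:

# Walk over the probability list and keep the word with the highest probability.
best_id, best_prob = 0, 0.0
for word_id, prob in enumerate(probabilities_for_one_example):
    if prob > best_prob:
        best_id, best_prob = word_id, prob
print(id_to_word[best_id])

In practice, np.argmax(probabilities_for_one_example) does the same lookup in a single vectorized call.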

  • Yes, I understood that; can you code an example for the same? Also, there is possibly a huge vocab size, and iterating over each word in the vocab is practically infeasible.
    – stackit
    Commented Nov 18, 2015 at 14:20
  • Machine learning is by nature a computation-heavy way of solving a problem. Depending on how you are training your model, you might already be iterating over the vocab many times. On a typical machine, you can iterate over a couple million strings in a few seconds, so it might not be infeasible. If you want to cut down on computation time (and consequently on performance), you can implement a way to simply stop iterating when you find a result with a big enough probability.
    – Cristian F
    Commented Nov 18, 2015 at 15:38
  • During training that's fine, but not during production use.
    – stackit
    Commented Nov 18, 2015 at 17:12
