
I want to train an RNN on 5 training points, where each sequence has a length of 5. At test time, I want to send in a single data point and compute the output.

The task is to predict the next character in a sequence of five characters (all encoded as one-hot vectors). I have tried duplicating the test data point five times, but I am sure this is not the right way to solve the problem.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the parameters
H = [ 1, 0, 0, 0 ]
E = [ 0, 1, 0, 0 ]
L = [ 0, 0, 1, 0 ]
O = [ 0, 0, 0, 1 ]

# Define the model
net = nn.RNN(input_size=4, hidden_size=4, batch_first=True)

# Generate data
data = [[H,E,L,L,O],
        [E,L,L,O,H],
        [L,L,O,H,E],
        [L,O,H,E,L],
        [O,H,E,L,L]]
inputs = torch.tensor(data).float()
hidden = torch.randn(1,5,4) # Random initialization
correct_outputs = torch.tensor(np.array(data[1:]+[data[0]]).astype(float).tolist(), requires_grad=True)

# Set the loss function
criterion = torch.nn.MSELoss()

# Set the optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

# Perform gradient descent until convergence
for epoch in range(1000):
    # Forward Propagation
    outputs, hidden = net(inputs, hidden)
    # Compute and print loss
    loss = criterion(F.softmax(outputs, dim=2), correct_outputs)
    print('epoch: ', epoch,' loss: ', loss.item())
    # Zero the gradients
    optimizer.zero_grad()
    # Backpropagation
    loss.backward(retain_graph=True)
    # Parameter update
    optimizer.step()

# Predict
net(torch.tensor([[H,E,L,L,O]]).float(),hidden)

I get the following error:

RuntimeError: Expected hidden size (1, 1, 4), got (1, 5, 4)

I understand that torch wants a tensor of size (1,1,4) but I am not sure how I can convert the initial hidden state from (1, 5, 4) to (1, 1, 4). Any help would be highly appreciated!

  • Shouldn't inputs be of shape (seq_len, batch, input_size)?
    – eugen
    Commented Oct 1, 2019 at 4:16
  • No, because of batch_first=True in RNN, the inputs should be of shape (batch, seq_len, input_size).
    – Wasi Ahmad
    Commented Oct 1, 2019 at 4:21

2 Answers


You are getting the error because you are using:

hidden = torch.randn(1,5,4) # Random initialization

Instead, you should use:

hidden = torch.randn(1,inputs.size(0),4) # Random initialization

so that it matches the batch size of the inputs. Then, at prediction time, do the following:

# Predict
inputs = torch.tensor([[H,E,L,L,O]]).float()
hidden = torch.randn(1,inputs.size(0),4)
net(inputs, hidden)

Suggestion: improve your coding style by following some good examples in PyTorch.
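
For reference, here is a minimal sketch of the whole flow with the initial hidden state sized from the batch dimension at both training and prediction time. It reuses net, data, H, E, L, O from the question; the randomly initialized hidden state at prediction time mirrors the question's setup and is an assumption, not something nn.RNN requires:

# Training: batch of 5 sequences, so hidden is (num_layers=1, batch=5, hidden_size=4)
inputs = torch.tensor(data).float()          # shape (5, 5, 4)
hidden = torch.randn(1, inputs.size(0), 4)   # shape (1, 5, 4)
outputs, hidden = net(inputs, hidden)

# Prediction: batch of 1 sequence, so hidden must be (1, 1, 4)
test_input = torch.tensor([[H, E, L, L, O]]).float()  # shape (1, 5, 4)
test_hidden = torch.randn(1, test_input.size(0), 4)   # shape (1, 1, 4)
prediction, _ = net(test_input, test_hidden)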

  • Even with this option, it doesn't fix the issue. Have you checked it ?
    – kmario23
    Commented Oct 1, 2019 at 4:45
  • Thanks, this worked! So basically I need to randomly initialize a hidden tensor just before I start predicting, right? I think this is what was tripping me up.
    – statkun
    Commented Oct 1, 2019 at 5:19
  • Also, you are absolutely right. I need to improve my coding style. I just started using PyTorch today. I did try to go through the documentation but I found it very confusing. I will try looking at more resources.
    – statkun
    Commented Oct 1, 2019 at 5:21
  • @WasiAhmad sorry I didn't clear my cache :(.. that was the issue. Seems good to me! +1
    – kmario23
    Commented Oct 1, 2019 at 5:45

Another option would be to just remove the keyword argument batch_first=True when you define the model.

# Define the model
net = nn.RNN(input_size=4, hidden_size=4)
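
This avoids the error because, without batch_first=True, nn.RNN reads the input as (seq_len, batch, input_size) and the hidden state as (num_layers, batch, hidden_size); in this particular example the sequence length and the batch size are both 5, so the existing hidden tensor of shape (1, 5, 4) lines up with the default convention. A minimal sketch of the shapes under that convention (reusing data from the question):

# Default nn.RNN convention (batch_first=False):
#   input  : (seq_len, batch, input_size)
#   hidden : (num_layers, batch, hidden_size)
net = nn.RNN(input_size=4, hidden_size=4)
inputs = torch.tensor(data).float()   # read as seq_len=5, batch=5, input_size=4
hidden = torch.randn(1, 5, 4)         # (1, batch=5, 4) now matches
outputs, hidden = net(inputs, hidden)
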
  • Thanks for your answer. If I do this, then I need to change my input tensor too, right? Because right now I'm using (batch_size, seq_size, num_features).
    – statkun
    Commented Oct 1, 2019 at 5:24
  • @skst no, this is the only change that's needed. Your input need not be changed.
    – kmario23
    Commented Oct 1, 2019 at 5:51
