
I am very confused by how PyTorch deals with one-hot vectors. In this tutorial, the neural network generates a one-hot vector as its output. As far as I understand, the schematic structure of the neural network in the tutorial should look like this:

[Image: schematic of the network, ending in a one-hot output layer]

However, the labels are not in one-hot vector format. I get the following sizes:

print(labels.size())
print(outputs.size())

output>>> torch.Size([4]) 
output>>> torch.Size([4, 10])

Miraculously, when I pass the outputs and labels to criterion = CrossEntropyLoss(), there is no error at all.

loss = criterion(outputs, labels) # How come it has no error?

My hypothesis:

Maybe PyTorch automatically converts the labels to one-hot vector form. So, I tried to convert the labels to one-hot vectors before passing them to the loss function.

import numpy as np

def to_one_hot_vector(num_class, label):
    # Put a 1 at the label index of each row, zeros elsewhere.
    b = np.zeros((label.shape[0], num_class))
    b[np.arange(label.shape[0]), label] = 1
    return b

labels_one_hot = to_one_hot_vector(10,labels)
labels_one_hot = torch.Tensor(labels_one_hot)
labels_one_hot = labels_one_hot.type(torch.LongTensor)

loss = criterion(outputs, labels_one_hot) # Now it gives me error

However, I got the following error

RuntimeError: multi-target not supported at /opt/pytorch/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:15

So, are one-hot vectors not supported in PyTorch? How does PyTorch calculate the cross entropy for the two tensors outputs = [[1, 0, 0], [0, 0, 1]] and labels = [0, 2]? It doesn't make sense to me at all at the moment.


PyTorch states in its documentation for CrossEntropyLoss that

This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch

In other words, your to_one_hot_vector function is conceptually built into CrossEntropyLoss, which does not expose a one-hot API. Note that one-hot vectors are memory-inefficient compared to storing class indices.
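
As a minimal sketch (not PyTorch's actual implementation), cross entropy for the example in the question, outputs of shape (2, 3) and labels = [0, 2], is the mean of -log_softmax(outputs) taken at each sample's target index, which matches the built-in criterion:

import torch
import torch.nn.functional as F

outputs = torch.tensor([[1., 0., 0.], [0., 0., 1.]])  # logits for 2 samples over 3 classes
labels = torch.tensor([0, 2])                         # one class index per sample

log_probs = F.log_softmax(outputs, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()
builtin = torch.nn.CrossEntropyLoss()(outputs, labels)

assert torch.isclose(manual, builtin)
print(manual.item())  # ~0.5514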

If you are given one-hot vectors and need to convert them to class-index format (for instance, to be compatible with CrossEntropyLoss), you can use argmax as below:

import torch

labels = torch.tensor([1, 2, 3, 5])

# Build a one-hot encoding of the labels (4 samples, 6 classes).
one_hot = torch.zeros(4, 6)
one_hot[torch.arange(4), labels] = 1

# argmax along the class dimension recovers the original indices.
reverted = torch.argmax(one_hot, dim=1)
assert (labels == reverted).all().item()
  • So there's no need to one-hot encode classes when using nn.CrossEntropyLoss. Note that "The combination of nn.LogSoftmax and nn.NLLLoss is equivalent to using nn.CrossEntropyLoss.", so if your Sequential ends with nn.LogSoftmax and your loss is nn.NLLLoss, you don't need to one-hot encode either.
    – gianni
    Commented Oct 2, 2021 at 8:20

This code will help you with both one-hot encoding (OHE) and multi-hot encoding (MHE):

import torch

batch_size = 10
n_classes = 5
target = torch.randint(high=n_classes, size=(1, 10))  # set size=(2, 10) for MHE
print(target)

# Scatter 1s at the target indices of each sample.
y = torch.zeros(batch_size, n_classes)
y[range(y.shape[0]), target] = 1
print(y)

The output in the OHE case:

tensor([[4, 3, 2, 2, 4, 1, 1, 1, 4, 2]])

tensor([[0., 0., 0., 0., 1.],
        [0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 1.],
        [0., 0., 1., 0., 0.]])

The output for MHE when I set target = torch.randint(high=5, size=(2, 10)):

tensor([[3, 2, 4, 4, 2, 4, 0, 4, 4, 1],
        [4, 1, 1, 3, 2, 2, 4, 2, 4, 3]])

tensor([[0., 0., 0., 1., 1.],
        [0., 1., 1., 0., 0.],
        [0., 1., 0., 0., 1.],
        [0., 0., 0., 1., 1.],
        [0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 1.],
        [1., 0., 0., 0., 1.],
        [0., 0., 1., 0., 1.],
        [0., 0., 0., 0., 1.],
        [0., 1., 0., 1., 0.]])

If you need multiple OHEs (one one-hot matrix per row of target):

torch.nn.functional.one_hot(target)

tensor([[[0, 0, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 1],
         [0, 1, 0, 0, 0]],

        [[0, 0, 0, 0, 1],
         [0, 1, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 1, 0]]])
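
One caveat worth hedging on: torch.nn.functional.one_hot infers the number of classes from the largest index present, so a batch that happens to miss the highest classes produces a narrower tensor. Passing num_classes explicitly (assuming you know it, here 5) keeps the shape fixed:

import torch
import torch.nn.functional as F

target = torch.tensor([[0, 2, 1, 0]])           # this batch contains no 3s or 4s
print(F.one_hot(target).shape)                  # torch.Size([1, 4, 3]) -- width inferred from the data
print(F.one_hot(target, num_classes=5).shape)   # torch.Size([1, 4, 5]) -- width fixed
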
  • Here is a specific use case on how to use one_hot in a transform, because that is what we are all looking for: ``` target_transform=torchvision.transforms.Compose([lambda x: torch.LongTensor([x]), lambda x: F.one_hot(x, 10), lambda x: x.squeeze()]) ``` stackoverflow.com/questions/63342147/…
    – Ben
    Commented May 17, 2021 at 16:20
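
Expanding on that comment, here is a hedged sketch of wiring such a target_transform into a torchvision dataset; the MNIST dataset, the root path, and the 10-class count are assumptions for illustration:

import torch
import torch.nn.functional as F
import torchvision

# Turn each integer label into a length-10 one-hot vector at load time.
target_transform = torchvision.transforms.Compose([
    lambda x: torch.tensor(x, dtype=torch.long),
    lambda x: F.one_hot(x, num_classes=10),
])

dataset = torchvision.datasets.MNIST(
    root="data", download=True,
    transform=torchvision.transforms.ToTensor(),
    target_transform=target_transform,
)
_, label = dataset[0]
print(label)  # e.g. tensor([0, 0, 0, 0, 0, 1, 0, 0, 0, 0]) for a digit 5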

As stated clearly by @Jatentaki, you can use torch.argmax(one_hot, dim=1) to convert the one-hot encoded vectors to numbers.

However, if you still want to train your network with one-hot encoded outputs in PyTorch, you can use nn.LogSoftmax along with nn.NLLLoss:

import torch
from torch import nn

output_onehot = nn.LogSoftmax(dim=1)(torch.randn(3, 5)) # m = 3 samples, n = 5 classes
target = torch.tensor([1, 0, 4]) # target class index for each sample

loss = nn.NLLLoss()(output_onehot, target)

print(output_onehot)
print(target)

# You can get the probabilities using the exponential function:
print("Probabilities:", torch.exp(output_onehot))

The output will be something like this:

tensor([[-0.5413, -2.4461, -2.0110, -1.9964, -2.7851],
        [-2.3376, -1.6985, -1.8472, -3.0975, -0.6585],
        [-3.2820, -0.7160, -1.5297, -1.5636, -3.0412]])
tensor([1, 0, 4])
Probabilities: tensor([[0.5820, 0.0866, 0.1339, 0.1358, 0.0617],
        [0.0966, 0.1830, 0.1577, 0.0452, 0.5176],
        [0.0376, 0.4887, 0.2166, 0.2094, 0.0478]])


The original post is from a few years ago, and the documentation for CrossEntropyLoss has changed since then. According to it, targets can now be given either as probabilities for each class (for example, one-hot encoded) or as class indices.

You can convert a NumPy array of class labels to one-hot encoded vectors:

import torch
import torch.nn.functional as F

class_labels = torch.tensor(numpy_class_labels, dtype=torch.long)  # one_hot expects integer labels
one_hot_labels = F.one_hot(class_labels, num_classes=n_classes)
labels = one_hot_labels.type(torch.DoubleTensor)  # probability targets must be floating point
loss = criterion(outputs, labels)

Or you can do the reverse and convert the one-hot encoded labels to classes:

one_hot_labels = torch.Tensor(numpy_one_hot_labels)
_, class_labels = torch.max(one_hot_labels, 1)
labels = class_labels.type(torch.LongTensor)
loss = criterion(outputs, labels)

Just be careful about the Long and Double dtypes above, and note that using class indices is preferable, as the documentation says:

The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.
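
For example, assuming a recent PyTorch version (1.10 or later, where probability targets are accepted), both target formats below give the same loss value:

import torch
import torch.nn.functional as F

criterion = torch.nn.CrossEntropyLoss()
outputs = torch.randn(4, 10)            # logits for 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])     # targets as class indices

loss_from_indices = criterion(outputs, labels)

# Same targets as class probabilities (one-hot), which must be floating point.
loss_from_probs = criterion(outputs, F.one_hot(labels, num_classes=10).float())

assert torch.isclose(loss_from_indices, loss_from_probs)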
