
I am very confused by how PyTorch deals with one-hot vectors. In this tutorial, the neural network generates a one-hot vector as its output. As far as I understand, the schematic structure of the neural network in the tutorial should look like this:

[Image: schematic of the network, ending in a one-hot output layer]

However, the labels are not in one-hot vector format. I get the following sizes:

print(labels.size())
print(outputs.size())

output>>> torch.Size([4]) 
output>>> torch.Size([4, 10])

Miraculously, when I pass the outputs and labels to criterion = CrossEntropyLoss(), there is no error at all.

loss = criterion(outputs, labels) # How come it has no error?

My hypothesis:

Maybe PyTorch automatically converts the labels to one-hot vector form. So, I tried to convert the labels to one-hot vectors before passing them to the loss function.

import numpy as np

def to_one_hot_vector(num_class, label):
    # Put a 1 at the label index of each row, zeros elsewhere.
    b = np.zeros((label.shape[0], num_class))
    b[np.arange(label.shape[0]), label] = 1
    return b

labels_one_hot = to_one_hot_vector(10,labels)
labels_one_hot = torch.Tensor(labels_one_hot)
labels_one_hot = labels_one_hot.type(torch.LongTensor)

loss = criterion(outputs, labels_one_hot) # Now it gives me error

However, I got the following error

RuntimeError: multi-target not supported at /opt/pytorch/pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:15

So, are one-hot vectors not supported in PyTorch? How does PyTorch calculate the cross entropy for the two tensors outputs = [[1, 0, 0], [0, 0, 1]] and labels = [0, 2]? It doesn't make sense to me at all at the moment.


PyTorch states in its documentation for CrossEntropyLoss that

This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch

In other words, your to_one_hot_vector function is conceptually built into CrossEntropyLoss, which does not expose a one-hot API. Note that one-hot vectors are memory-inefficient compared to storing class indices.
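
As a minimal sketch (not PyTorch's actual implementation), cross entropy for the example in the question, outputs of shape (2, 3) and labels = [0, 2], is the mean of -log_softmax(outputs) taken at each sample's target index, which matches the built-in criterion:

import torch
import torch.nn.functional as F

outputs = torch.tensor([[1., 0., 0.], [0., 0., 1.]])  # logits for 2 samples over 3 classes
labels = torch.tensor([0, 2])                         # one class index per sample

log_probs = F.log_softmax(outputs, dim=1)
manual = -log_probs[torch.arange(len(labels)), labels].mean()
builtin = torch.nn.CrossEntropyLoss()(outputs, labels)

assert torch.isclose(manual, builtin)
print(manual.item())  # ~0.5514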

If you are given one-hot vectors and need to convert them to class-index format (for instance, to be compatible with CrossEntropyLoss), you can use argmax as below:

import torch

labels = torch.tensor([1, 2, 3, 5])

# Build a one-hot encoding of the labels (4 samples, 6 classes).
one_hot = torch.zeros(4, 6)
one_hot[torch.arange(4), labels] = 1

# argmax along the class dimension recovers the original indices.
reverted = torch.argmax(one_hot, dim=1)
assert (labels == reverted).all().item()
  • So there's no need to one-hot encode classes when using nn.CrossEntropyLoss. Note that "The combination of nn.LogSoftmax and nn.NLLLoss is equivalent to using nn.CrossEntropyLoss.", so if your Sequential ends with nn.LogSoftmax and your loss is nn.NLLLoss, you don't need to one-hot encode either.
    – gianni
    Commented Oct 2, 2021 at 8:20

This code will help you with both one-hot encoding (OHE) and multi-hot encoding (MHE):

import torch

batch_size = 10
n_classes = 5
target = torch.randint(high=n_classes, size=(1, 10))  # set size=(2, 10) for MHE
print(target)

# Scatter 1s at the target indices of each sample.
y = torch.zeros(batch_size, n_classes)
y[range(y.shape[0]), target] = 1
print(y)

The output in the OHE case:

tensor([[4, 3, 2, 2, 4, 1, 1, 1, 4, 2]])

tensor([[0., 0., 0., 0., 1.],
        [0., 0., 0., 1., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 1.],
        [0., 0., 1., 0., 0.]])

The output for MHE when I set target = torch.randint(high=5, size=(2, 10)):

tensor([[3, 2, 4, 4, 2, 4, 0, 4, 4, 1],
        [4, 1, 1, 3, 2, 2, 4, 2, 4, 3]])

tensor([[0., 0., 0., 1., 1.],
        [0., 1., 1., 0., 0.],
        [0., 1., 0., 0., 1.],
        [0., 0., 0., 1., 1.],
        [0., 0., 1., 0., 0.],
        [0., 0., 1., 0., 1.],
        [1., 0., 0., 0., 1.],
        [0., 0., 1., 0., 1.],
        [0., 0., 0., 0., 1.],
        [0., 1., 0., 1., 0.]])

If you need multiple OHEs (one one-hot matrix per row of target):

torch.nn.functional.one_hot(target)

tensor([[[0, 0, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 0, 1],
         [0, 1, 0, 0, 0]],

        [[0, 0, 0, 0, 1],
         [0, 1, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 1, 0]]])
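
One caveat worth hedging on: torch.nn.functional.one_hot infers the number of classes from the largest index present, so a batch that happens to miss the highest classes produces a narrower tensor. Passing num_classes explicitly (assuming you know it, here 5) keeps the shape fixed:

import torch
import torch.nn.functional as F

target = torch.tensor([[0, 2, 1, 0]])           # this batch contains no 3s or 4s
print(F.one_hot(target).shape)                  # torch.Size([1, 4, 3]) -- width inferred from the data
print(F.one_hot(target, num_classes=5).shape)   # torch.Size([1, 4, 5]) -- width fixed
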
  • Here is a specific use case on how to use one_hot in a transform, because that is what we are all looking for: ``` target_transform=torchvision.transforms.Compose([lambda x: torch.LongTensor([x]), lambda x: F.one_hot(x, 10), lambda x: x.squeeze()]) ``` stackoverflow.com/questions/63342147/…
    – Ben
    Commented May 17, 2021 at 16:20
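
Expanding on that comment, here is a hedged sketch of wiring such a target_transform into a torchvision dataset; the MNIST dataset, the root path, and the 10-class count are assumptions for illustration:

import torch
import torch.nn.functional as F
import torchvision

# Turn each integer label into a length-10 one-hot vector at load time.
target_transform = torchvision.transforms.Compose([
    lambda x: torch.tensor(x, dtype=torch.long),
    lambda x: F.one_hot(x, num_classes=10),
])

dataset = torchvision.datasets.MNIST(
    root="data", download=True,
    transform=torchvision.transforms.ToTensor(),
    target_transform=target_transform,
)
_, label = dataset[0]
print(label)  # e.g. tensor([0, 0, 0, 0, 0, 1, 0, 0, 0, 0]) for a digit 5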

As stated clearly by @Jatentaki, you can use torch.argmax(one_hot, dim=1) to convert the one-hot encoded vectors to numbers.

However, if you still want to train your network with one-hot encoded outputs in PyTorch, you can use nn.LogSoftmax along with nn.NLLLoss:

import torch
from torch import nn

output_onehot = nn.LogSoftmax(dim=1)(torch.randn(3, 5)) # m = 3 samples, n = 5 classes
target = torch.tensor([1, 0, 4]) # target class index for each sample

loss = nn.NLLLoss()(output_onehot, target)

print(output_onehot)
print(target)

# You can get the probabilities using the exponential function:
print("Probabilities:", torch.exp(output_onehot))

The output will be something like this:

tensor([[-0.5413, -2.4461, -2.0110, -1.9964, -2.7851],
        [-2.3376, -1.6985, -1.8472, -3.0975, -0.6585],
        [-3.2820, -0.7160, -1.5297, -1.5636, -3.0412]])
tensor([1, 0, 4])
Probabilities: tensor([[0.5820, 0.0866, 0.1339, 0.1358, 0.0617],
        [0.0966, 0.1830, 0.1577, 0.0452, 0.5176],
        [0.0376, 0.4887, 0.2166, 0.2094, 0.0478]])


The original post is from a few years ago, and the documentation for CrossEntropyLoss has changed since then. According to it, targets can now be given either as probabilities for each class (for example, one-hot encoded) or as class indices.

You can convert a NumPy array of class labels to one-hot encoded vectors:

import torch
import torch.nn.functional as F

class_labels = torch.tensor(numpy_class_labels, dtype=torch.long)  # one_hot expects integer labels
one_hot_labels = F.one_hot(class_labels, num_classes=n_classes)
labels = one_hot_labels.type(torch.DoubleTensor)  # probability targets must be floating point
loss = criterion(outputs, labels)

Or you can do the reverse and convert the one-hot encoded labels to classes:

one_hot_labels = torch.Tensor(numpy_one_hot_labels)
_, class_labels = torch.max(one_hot_labels, 1)
labels = class_labels.type(torch.LongTensor)
loss = criterion(outputs, labels)

Just be careful about the Long and Double dtypes above, and note that using class indices is preferable, as the documentation says:

The performance of this criterion is generally better when target contains class indices, as this allows for optimized computation. Consider providing target as class probabilities only when a single class label per minibatch item is too restrictive.
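
For example, assuming a recent PyTorch version (1.10 or later, where probability targets are accepted), both target formats below give the same loss value:

import torch
import torch.nn.functional as F

criterion = torch.nn.CrossEntropyLoss()
outputs = torch.randn(4, 10)            # logits for 4 samples, 10 classes
labels = torch.tensor([3, 0, 9, 1])     # targets as class indices

loss_from_indices = criterion(outputs, labels)

# Same targets as class probabilities (one-hot), which must be floating point.
loss_from_probs = criterion(outputs, F.one_hot(labels, num_classes=10).float())

assert torch.isclose(loss_from_indices, loss_from_probs)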
