First of all, I'm new to the whole thing, so sorry if this is super dumb.
I'm currently training a Transformer model for a sequence classification task using CrossEntropyLoss. My input tensor (the model output I pass to the loss) has the shape (batch_size, classes, seq_len), and my target tensor has the shape (batch_size, seq_len).
ChatGPT advised me to do the following:
```python
yHatReshaped = yHat.view(-1, 512)
yReshaped = y.view(-1)
error = lossFunction(yHatReshaped, yReshaped)
```
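For reference, here is a small sketch of what the flattening step has to do if yHat really has the shape (batch_size, classes, seq_len). The post doesn't say what 512 stands for, so the sketch assumes it is meant to be the number of classes, and the sizes below are made up. When flattened, CrossEntropyLoss expects logits as (N, C) with one row of class scores per position. That means the class dimension has to be moved last before flattening. A plain `view(-1, C)` groups neighbouring elements in memory, and here those run along seq_len.

```python
import torch

# Hypothetical small sizes; the post's real sizes are not given.
batch_size, num_classes, seq_len = 2, 3, 4

# Dummy logits laid out like the post's tensor: (batch_size, classes, seq_len).
yHat = torch.arange(batch_size * num_classes * seq_len, dtype=torch.float32).view(
    batch_size, num_classes, seq_len
)

# Naive flattening: consecutive memory runs along seq_len, so each row
# mixes several positions of one class instead of all classes of one position.
naive = yHat.view(-1, num_classes)

# Move classes to the last dim, then flatten batch and seq_len together.
# reshape (not view) is used because permute returns a non-contiguous tensor.
flat = yHat.permute(0, 2, 1).reshape(-1, num_classes)

# Row 0 of `flat` holds the class scores for batch 0, position 0.
print(flat[0])   # tensor([0., 4., 8.])
print(naive[0])  # tensor([0., 1., 2.])
```

After the permute, the rows are ordered by (batch, position). That is the same order `y.view(-1)` produces for a (batch_size, seq_len) target, so predictions and targets line up.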
Is that correct, and is it the best way to handle a sequence? The documentation also just confuses me, since it says (N, C, d1, d2, ..., dK) for the K-dimensional loss. Is my sequence basically d1? I don't understand the whole thing.
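Here is a minimal sketch of how the documented (N, C, d1, ..., dK) form maps onto these shapes, using made-up sizes. With one extra dimension (K = 1), d1 takes the role of seq_len. A (batch_size, classes, seq_len) tensor and a (batch_size, seq_len) target of class indices can then be passed to `nn.CrossEntropyLoss` without any reshaping. The sketch also checks that this gives the same result as the flattened form.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes for illustration only.
batch_size, num_classes, seq_len = 2, 5, 7

# Random logits in the K-dimensional layout (N, C, d1) with d1 = seq_len.
yHat = torch.randn(batch_size, num_classes, seq_len)
# One class index (int64) per (sample, position): shape (N, d1).
y = torch.randint(0, num_classes, (batch_size, seq_len))

lossFunction = nn.CrossEntropyLoss()

# Option 1: pass the tensors as they are; no reshaping needed.
loss_kdim = lossFunction(yHat, y)

# Option 2: flatten to (N * d1, C) and (N * d1,); classes must be moved last first.
loss_flat = lossFunction(
    yHat.permute(0, 2, 1).reshape(-1, num_classes),
    y.reshape(-1),
)

# With the default reduction='mean', both average over the same N * d1 terms.
print(torch.allclose(loss_kdim, loss_flat))  # True
```

Under the default settings, both calls compute the same mean loss. The direct form just avoids the reshape bookkeeping.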
Thanks in advance for your help!