I am iterating over training samples in batches, but the last batch always contains fewer samples.
Is it possible to specify the step size in torch according to the current batch length?
For example, most batches have size 64, but the last batch has only 6 samples.
If I do the usual routine:
optimizer.zero_grad()
loss.backward()
optimizer.step()
It seems that the last 6 samples carry the same weight in the parameter update as a full batch of 64, when in fact they should carry only about 1/10 of the weight because there are fewer of them.
In MXNet I could specify the step size accordingly, but I don't know how to do it in torch.
Set drop_last=True when initializing the DataLoader.
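As a minimal sketch of the drop_last=True suggestion above: the toy dataset, the linear model, the learning rate, and all variable names below are made up for illustration and are not from the question. The dataset has 134 samples, so with a batch size of 64 there would normally be two full batches plus a leftover batch of 6.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data (assumption for illustration): 134 samples = 2 * 64 + 6 left over.
features = torch.randn(134, 10)
targets = torch.randn(134, 1)
dataset = TensorDataset(features, targets)

# drop_last=True discards the final incomplete batch of each epoch,
# so every optimizer step sees exactly batch_size samples.
loader = DataLoader(dataset, batch_size=64, shuffle=True, drop_last=True)

# Placeholder model, optimizer and loss, chosen only to make the loop runnable.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

batch_sizes = []
for x, y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    batch_sizes.append(x.shape[0])

print(batch_sizes)  # [64, 64]
```

With shuffle=True the data is reshuffled every epoch, so a different set of 6 samples is left out each time and all samples still get used over the course of training.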