I'm looking at this PyTorch starter tutorial: the zero_grad() function is being used to zero the gradients, which suggests it's running with mini-batches. Is that a correct assumption? If so, where is the batch size defined?
I found the following for nn.Conv2d:

"For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width."

In that case, is nSamples the batch size?
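To check my understanding, here is a minimal sketch (the channel counts and image size are just made up for illustration):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# A batch of 8 RGB 32x32 images: (nSamples, nChannels, Height, Width)
x = torch.randn(8, 3, 32, 32)
out = conv(x)
print(out.shape)  # torch.Size([8, 16, 30, 30])
```

So it looks like the layer never knows the batch size in advance; it just accepts whatever size the first dimension happens to be.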
But how do you specify the batch size for an nn.Linear layer? Do you decide what your mini-batches are when you load the data?
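Here's what I think is going on, based on the DataLoader docs: nn.Linear treats the first dimension as the batch dimension, and the batch size is chosen at data-loading time. The dataset shapes and batch_size below are made up:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# The batch size is chosen here, when loading the data
dataset = TensorDataset(torch.randn(100, 20), torch.randn(100, 1))
loader = DataLoader(dataset, batch_size=4)

linear = nn.Linear(in_features=20, out_features=1)
xb, yb = next(iter(loader))
print(xb.shape)          # torch.Size([4, 20])
print(linear(xb).shape)  # torch.Size([4, 1])
```

If that's right, the layers themselves are batch-size-agnostic, and only the DataLoader decides how many samples go through at once.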
I am making a few assumptions here that may be totally incorrect. Please correct me if I'm wrong. Thank you!