
In mini-batch GD, the loss is computed per mini-batch. Suppose we have 480 training examples and the batch size is 32, so there are 480/32 = 15 mini-batches per epoch, each with its own loss. For each batch, that batch's loss is minimized by updating the weights and biases. But how do we accumulate a total loss from those 15 different batch losses? Please correct me if my understanding is wrong.


1 Answer


You don't accumulate those losses unless you are reporting the training loss, and reporting is not part of the optimization itself. The mini-batch loss is assumed to approximate the full-batch loss, and we update the weights and biases under that assumption, in the hope that the full-batch loss in turn approximates the expected loss over the population.

In the extreme case there is online learning, where only one training sample is fed to the network at a time and the weights are updated from that single sample. When data is extremely abundant, we sometimes never even use a sample twice, so again there is no aggregation of losses.
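A minimal NumPy sketch of the idea (the data, model, and hyperparameters are made up for illustration): each mini-batch gets its own loss and gradient, the weights are updated in place so that the next batch starts from the updated values, and the per-batch losses are collected only if you want to report a training loss.

    import numpy as np

    # Illustrative mini-batch gradient descent for a linear model with
    # squared-error loss; X, y, batch_size, lr are placeholder names.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(480, 10))          # 480 training examples, 10 features
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=480)

    w = np.zeros(10)                         # weights (bias omitted for brevity)
    batch_size, lr = 32, 0.01                # 480 / 32 = 15 mini-batches per epoch

    for epoch in range(5):
        batch_losses = []
        for start in range(0, len(X), batch_size):
            Xb, yb = X[start:start + batch_size], y[start:start + batch_size]

            # Loss and gradient for this mini-batch only.
            err = Xb @ w - yb
            loss = np.mean(err ** 2)
            grad = 2 * Xb.T @ err / len(Xb)

            # Update in place: the next mini-batch sees these updated weights.
            w -= lr * grad

            # Batch losses are kept only for reporting, not for the update.
            batch_losses.append(loss)

        print(f"epoch {epoch}: mean training loss = {np.mean(batch_losses):.4f}")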

  • Thanks for your comment. I have one more point of confusion: since we do not accumulate those loss functions, how do the updated weights and biases from the first mini-batch carry forward to the next batch? Commented Mar 28, 2020 at 11:53
  • Once you update the weights according to your first mini-batch, you use those updated weights in the second mini-batch. – gunes, Commented Mar 28, 2020 at 17:09
  • Thank you very much for the clarification. It is really helpful. Commented Mar 29, 2020 at 7:43
