
In mini-batch GD, the loss is computed per mini-batch. Suppose we have 480 training examples and the batch size is 32, so there are 480/32 = 15 mini-batches per epoch, each with its own loss. For each batch, that batch's loss is minimized by updating the weights and biases. But how do we accumulate a total loss from those 15 different batch losses? Please correct me if my understanding is wrong.


1 Answer


You don't accumulate those losses unless you are reporting the training loss, and reporting is not part of the optimization itself. The mini-batch loss is assumed to approximate the full-batch loss, and we update the weights and biases under that assumption, in the hope that the full-batch loss in turn approximates the expected loss over the population.

In the extreme case there is online learning, where only one training sample is fed to the network at a time and the weights are updated from that single sample. When data is extremely abundant, we sometimes never even use a sample twice, so again there is no aggregation of losses.
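A minimal NumPy sketch of the idea (the data, model, and hyperparameters are made up for illustration): each mini-batch gets its own loss and gradient, the weights are updated in place so that the next batch starts from the updated values, and the per-batch losses are collected only if you want to report a training loss.

    import numpy as np

    # Illustrative mini-batch gradient descent for a linear model with
    # squared-error loss; X, y, batch_size, lr are placeholder names.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(480, 10))          # 480 training examples, 10 features
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=480)

    w = np.zeros(10)                         # weights (bias omitted for brevity)
    batch_size, lr = 32, 0.01                # 480 / 32 = 15 mini-batches per epoch

    for epoch in range(5):
        batch_losses = []
        for start in range(0, len(X), batch_size):
            Xb, yb = X[start:start + batch_size], y[start:start + batch_size]

            # Loss and gradient for this mini-batch only.
            err = Xb @ w - yb
            loss = np.mean(err ** 2)
            grad = 2 * Xb.T @ err / len(Xb)

            # Update in place: the next mini-batch sees these updated weights.
            w -= lr * grad

            # Batch losses are kept only for reporting, not for the update.
            batch_losses.append(loss)

        print(f"epoch {epoch}: mean training loss = {np.mean(batch_losses):.4f}")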

  • Thanks for your comment. I have one more point of confusion: since we do not accumulate those loss functions, how do the updated weights and biases from the first mini-batch carry forward to the next batch? Commented Mar 28, 2020 at 11:53
  • Once you update the weights according to your first mini-batch, you use those updated weights in the second mini-batch. – gunes, Commented Mar 28, 2020 at 17:09
  • Thank you very much for the clarification. It is really helpful. Commented Mar 29, 2020 at 7:43
