
When training a neural network with backpropagation, I have often seen that the data is processed in batches. So instead of computing and applying a gradient update for each individual training sample, the average gradient is calculated over multiple samples, and that average is used for the update.
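
To make the mechanism concrete, here is a minimal sketch of the kind of update I mean, using a linear model with a squared-error loss in plain NumPy (the model, batch size, and learning rate are just placeholders, not anything specific):

```python
import numpy as np

def batch_gradient_step(w, X_batch, y_batch, lr=0.01):
    """One mini-batch update: average the per-sample gradients, then update once."""
    preds = X_batch @ w                         # predictions for every sample in the batch
    errors = preds - y_batch                    # per-sample errors
    grad = X_batch.T @ errors / len(y_batch)    # gradient averaged over the batch
    return w - lr * grad

# Toy data: 8 samples with 3 features, processed as two batches of 4
rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
w = np.zeros(3)
w = batch_gradient_step(w, X[:4], y[:4])        # one weight update per batch...
w = batch_gradient_step(w, X[4:], y[4:])        # ...instead of one per sample
```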

What is the reason for this? Is it because training is faster when you update the weights less frequently? Or is it because averaging over multiple samples avoids overfitting to the individual samples? If the latter is true, then why not train on all the samples at once, rather than dividing them into batches at all?

Thanks!


1 Answer


The latter is true: averaging over multiple samples keeps each update from being pulled around too much by any single sample. It would be nice to train on the whole dataset at once, but it is usually far too large for that to be technically feasible (it simply does not fit in memory), at least in image-analysis applications.
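
As a rough sketch of what batching looks like in practice (plain NumPy; `grad_fn` here is just a placeholder for whatever routine returns the average gradient of a batch, e.g. one backpropagation pass):

```python
import numpy as np

def minibatch_epoch(w, X, y, grad_fn, batch_size=32, lr=0.01):
    """One pass over the data as many small updates instead of a single full-batch one.
    Only `batch_size` samples need to be loaded and backpropagated at a time."""
    idx = np.random.permutation(len(y))          # shuffle so batches are not ordered
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        w = w - lr * grad_fn(w, X[batch], y[batch])
    return w
```

The memory cost is set by `batch_size`, not by the size of the full dataset, which is why batching is used even when a full-dataset gradient would otherwise be perfectly reasonable.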
