
When we set a batch size, do we compute the gradient after each sample in the batch but wait until the last sample of the batch has passed, and only then propagate the sum of their gradients through the network? Am I correct or not?

If that's not the case and we propagate after each sample passes, then what is the benefit of batching? Please, someone give me an explanation.

  • Actually, we don't take the gradient after each sample; we accumulate the errors, and once the batch is complete we take the overall gradient and backpropagate. Will write an answer ASAP
    – MysteryGuy
    Commented Aug 19, 2018 at 6:18
  • Yeah, that's it. I mean: take the errors and then propagate the gradient of the summed errors. Is that right?
    – ja0k0010
    Commented Aug 19, 2018 at 6:45
  • I have posted an answer; please feel free to accept it if it is OK for you :)
    – MysteryGuy
    Commented Aug 19, 2018 at 7:22
  • @MysteryGuy OK, thanks. Maybe I can get some rep on the question too :)
    – ja0k0010
    Commented Aug 19, 2018 at 7:32

1 Answer


When training neural networks, backpropagation requires computing many gradients, which can be computationally heavy. To reduce that load, the weights are only updated (i.e., backpropagation is only run) after a certain number of samples; this is called mini-batch gradient descent (or batch gradient descent when the whole training set is used in one update).

So the loss is of course computed for each training example and summed over all the samples in the batch; backpropagation is then applied to the overall loss of the batch.

You can also update the weights after each individual sample; this is called stochastic gradient descent.
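
To make that concrete, here is a minimal sketch (using NumPy, with a toy single-layer linear model, squared-error loss, and made-up data/learning rate purely for illustration) contrasting a mini-batch update, where per-sample gradients are accumulated and the weights change once per batch, with plain SGD, where the weights change after every sample:

    import numpy as np

    # Toy setup (made up for illustration): one linear neuron, squared-error loss.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 3))            # 32 samples, 3 features
    y = X @ np.array([1.0, -2.0, 0.5])      # targets from a known weight vector
    lr, batch_size = 0.01, 8

    # Mini-batch gradient descent: accumulate gradients, one update per batch.
    w = np.zeros(3)
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = np.zeros_like(w)
        for xi, yi in zip(xb, yb):          # forward pass for each sample
            err = xi @ w - yi               # d/dw of 0.5 * err**2 is err * xi
            grad += err * xi                # accumulate, but do NOT update yet
        w -= lr * grad / len(xb)            # single update with the mean gradient

    # Stochastic gradient descent: update the weights after every single sample.
    w = np.zeros(3)
    for xi, yi in zip(X, y):
        err = xi @ w - yi
        w -= lr * err * xi                  # one update per training example

Whether you sum or average the per-sample losses is just a constant factor that can be folded into the learning rate; the key point is that only one weight update happens per batch.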

  • I think it is not just the heavy computational load; it is also that updating after every sample makes the direction of the gradient change very erratically along the way.
    – ja0k0010
    Commented Aug 19, 2018 at 8:39
  • @ja0k0010 Yes, you also have the vanishing and exploding gradient problems... But if you have a deep neural network, computing the derivatives can take time...
    – MysteryGuy
    Commented Aug 19, 2018 at 8:50
