
When we set a batch size, do we compute the gradient after each sample in the batch but wait until the last sample of the batch has passed, and only then propagate the sum of their gradients through the network? Am I correct or not?

If that's not the case and we propagate after each sample passes, then what is the benefit of batching? Please, someone give me an explanation.

  • Actually, we don't take the gradient after each sample; we accumulate the errors, and once the batch is complete we take the overall gradient and backpropagate. Will write an answer ASAP
    – MysteryGuy
    Commented Aug 19, 2018 at 6:18
  • Yeah, that's it. I mean: take the errors and then propagate the gradient of the summed errors. Is that right?
    – ja0k0010
    Commented Aug 19, 2018 at 6:45
  • I have posted an answer; please feel free to accept it if it is OK for you :)
    – MysteryGuy
    Commented Aug 19, 2018 at 7:22
  • @MysteryGuy OK, thanks. Maybe I can get some rep on the question too :)
    – ja0k0010
    Commented Aug 19, 2018 at 7:32

1 Answer


When training neural networks, backpropagation requires computing many gradients, which can be computationally heavy. To reduce that load, the weights are only updated (i.e., backpropagation is only run) after a certain number of samples; this is called mini-batch gradient descent (or batch gradient descent when the whole training set is used in one update).

So the loss is of course computed for each training example and summed over all the samples in the batch; backpropagation is then applied to the overall loss of the batch.

You can also update the weights after each individual sample; this is called stochastic gradient descent.
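
To make that concrete, here is a minimal sketch (using NumPy, with a toy single-layer linear model, squared-error loss, and made-up data/learning rate purely for illustration) contrasting a mini-batch update, where per-sample gradients are accumulated and the weights change once per batch, with plain SGD, where the weights change after every sample:

    import numpy as np

    # Toy setup (made up for illustration): one linear neuron, squared-error loss.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 3))            # 32 samples, 3 features
    y = X @ np.array([1.0, -2.0, 0.5])      # targets from a known weight vector
    lr, batch_size = 0.01, 8

    # Mini-batch gradient descent: accumulate gradients, one update per batch.
    w = np.zeros(3)
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = np.zeros_like(w)
        for xi, yi in zip(xb, yb):          # forward pass for each sample
            err = xi @ w - yi               # d/dw of 0.5 * err**2 is err * xi
            grad += err * xi                # accumulate, but do NOT update yet
        w -= lr * grad / len(xb)            # single update with the mean gradient

    # Stochastic gradient descent: update the weights after every single sample.
    w = np.zeros(3)
    for xi, yi in zip(X, y):
        err = xi @ w - yi
        w -= lr * err * xi                  # one update per training example

Whether you sum or average the per-sample losses is just a constant factor that can be folded into the learning rate; the key point is that only one weight update happens per batch.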

  • I think it is not just the heavy computational load; it is also that updating after every sample makes the direction of the gradient change very erratically along the way.
    – ja0k0010
    Commented Aug 19, 2018 at 8:39
  • @ja0k0010 Yes, you also have the vanishing and exploding gradient problems... But if you have a deep neural network, computing the derivatives can take time...
    – MysteryGuy
    Commented Aug 19, 2018 at 8:50
