When we set a batch size, is the gradient computed for each sample in the batch but held back until the last sample of the batch has passed, and only then is the sum of those gradients propagated through the network to update the weights? Am I correct or not?
If that is not the case, and we propagate after each individual sample, then what is the benefit of using a batch? Could someone please explain?
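
To make the question concrete, here is a minimal sketch of the two behaviours I am asking about, in plain Python. It assumes a hypothetical one-weight model `y = w * x` with squared-error loss. The names `grad`, `batch_step` and `per_sample_steps` are made up for illustration and do not come from any library. I have also assumed the accumulated gradient is averaged over the batch; using the plain sum instead would only rescale the step by the batch size.

```python
def grad(w, x, y):
    # Gradient of the squared error (w*x - y)**2 with respect to w.
    return 2.0 * (w * x - y) * x


def batch_step(w, batch, lr):
    # Behaviour 1: accumulate per-sample gradients, all evaluated at the
    # same weight; the weight is NOT changed inside the loop.
    total = 0.0
    for x, y in batch:
        total += grad(w, x, y)
    # One single update per batch, using the mean of the gradients
    # (assumption: mean rather than sum).
    return w - lr * total / len(batch)


def per_sample_steps(w, batch, lr):
    # Behaviour 2: update the weight immediately after every sample,
    # so each later gradient is evaluated at an already-changed weight.
    for x, y in batch:
        w = w - lr * grad(w, x, y)
    return w


# Toy data (hypothetical): both points lie on y = 2 * x.
batch = [(1.0, 2.0), (2.0, 4.0)]
w_batch = batch_step(0.0, batch, lr=0.1)
w_online = per_sample_steps(0.0, batch, lr=0.1)
print(w_batch, w_online)
```

Starting from the same weight, the two functions end at different values, which is exactly the distinction I am unsure about: does setting a batch size mean the first behaviour, or the second?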