
How does gradient descent work when training a neural network with mini-batches (i.e., sampling a subset of the training set at each step)? I can think of three different possibilities:

  1. Epoch starts. We sample and feedforward one minibatch only, get the error and backprop it, i.e. update the weights. Epoch over.

  2. Epoch starts. We sample and feedforward a minibatch, get the error and backprop it, i.e. update the weights. We repeat this until we have sampled the full data set. Epoch over.

  3. Epoch starts. We sample and feedforward a minibatch, get the error and store it. We repeat this until we have sampled the full data set. We somehow average the errors and backprop them by updating the weights. Epoch over.


1 Answer

Mini-batch gradient descent is implemented essentially as you describe in option 2.

  1. Epoch starts. We sample and feedforward a minibatch, get the error and backprop it, i.e. update the weights. We repeat this until we have sampled the full data set. Epoch over.

Assume the network is minimizing the following regularized objective function: $$ \frac{\lambda}{2}||\theta||^2 + \frac{1}{n}\sum_{i=1}^n E(x^{(i)}, y^{(i)}, \theta) $$

Then the weight-update step for a mini-batch starting at example $i$ is essentially

$$ \theta = (1 - \alpha \lambda) \theta - \alpha \frac{1}{b}\sum_{k=i}^{i+b-1} \frac{\partial E}{\partial \theta}(x^{(k)}, y^{(k)}, \theta) $$

where the following symbols mean:

$E$ = the error measure (also sometimes denoted as cost measure $J$)

$\theta$ = weights

$\alpha$ = learning rate

$\lambda$ = weight-decay (regularization) coefficient, so the factor $(1 - \alpha \lambda)$ shrinks the weights at every step

$b$ = batch size

$x, y$ = a training input and its target

You loop over consecutive batches (i.e., increment $i$ by $b$) and update the weights after each one. These more frequent weight updates, combined with vectorized computation over each batch, are why mini-batch gradient descent tends to converge more quickly than either full-batch or purely stochastic (single-example) methods.
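The loop above can be sketched in NumPy. The linear model, squared-error loss, and all variable names here are illustrative assumptions (not something from the answer); the update line itself matches the formula, with $(1 - \alpha\lambda)$ applied as weight decay and the gradient averaged over the batch:

```python
import numpy as np

# Toy data: a linear model with squared error, used purely for
# illustration. The update rule is the same for any differentiable model.
rng = np.random.default_rng(0)
n, d = 100, 5                      # training-set size, input dimension
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = X @ true_theta + 0.01 * rng.normal(size=n)

theta = np.zeros(d)                # weights (theta)
alpha = 0.1                        # learning rate
lam = 1e-4                         # weight-decay coefficient (lambda)
b = 20                             # batch size

for epoch in range(50):
    perm = rng.permutation(n)      # shuffle so each epoch covers the full set
    for i in range(0, n, b):       # loop over consecutive batches of size b
        idx = perm[i:i + b]
        Xb, yb = X[idx], y[idx]
        # gradient of the squared error, averaged over the batch
        grad = Xb.T @ (Xb @ theta - yb) / len(idx)
        # theta <- (1 - alpha * lambda) * theta - alpha * (batch gradient)
        theta = (1 - alpha * lam) * theta - alpha * grad
```

Note that each epoch still visits every training example exactly once (option 2 in the question); the weights are simply updated once per batch rather than once per epoch.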

  • Two comments: 1) the update rule $\theta_j = ...$ assumes a particular loss function the way that you've written it. I suggest defining the update rule using $\nabla h_0(x)$ instead so that it is generic. 2) the update rule does not have a weight decay (also for the sake of generality); I would write it with the weight decay.
    – Sobi
    Commented Dec 18, 2015 at 19:39
  • @Sobi, good point about $\nabla h_0(x)$, I am just so used to that assumed loss function. However, I am not sure exactly what you mean by weight decay? Are you referring to the division by the batch size? Feel free to provide such a generalized function in an edit.
    – cdeterman
    Commented Dec 18, 2015 at 19:47
  • Thanks, I realize you have used a quadratic error function; that's not a problem. Could you maybe refer me to a publication that discusses this in detail?
    – Alex
    Commented Dec 18, 2015 at 20:10
  • @Sobi, regarding your edit, is $n$ supposed to be the batch size? Also, I am unfamiliar with the $\ell_2$-norm notation.
    – cdeterman
    Commented Dec 18, 2015 at 21:06
  • @cdeterman, $n$ is the number of all training samples. Note that $n$ only appears in the original objective function (i.e. $\frac{\lambda}{2}||\theta||^2 + ...$).
    – Sobi
    Commented Dec 18, 2015 at 21:09
