4
$\begingroup$

In simple neural network back-propagation, we normally use one round of forward and backward propagation in every iteration. Let's assume we have one training example of arbitrary dimension and some initial weights. Using forward propagation, we calculate the predicted output. This predicted output is then used to calculate the total error, which is back-propagated to re-calculate the weights. After re-calculating the weights for all the layers, we update the weights of all the layers at once. It's not that we first update the weights of one layer and then the next; instead, we first re-calculate the weights of all layers (layer by layer) and then update them all at once. We can conclude that:

"The weights are re-calculated layer by layer, and then the weights of all the layers are updated with the re-calculated values all at once." Does this make sense? Is this the right way to update the weights using back-propagation?
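
A minimal sketch of that procedure, assuming a two-layer sigmoid network with squared-error loss (the names `W1`, `W2`, `train_step` and the shapes are illustrative, not from the question):

```python
import numpy as np

# Illustrative shapes: x is a d-vector, W1 is h x d, W2 is k x h, y is a k-vector.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, W2, lr=0.1):
    # Forward pass, layer by layer.
    a1 = sigmoid(W1 @ x)                       # hidden activations
    a2 = sigmoid(W2 @ a1)                      # predicted output

    # Backward pass: compute the gradients for ALL layers first,
    # using only the current (old) weights.
    delta2 = (a2 - y) * a2 * (1 - a2)          # output-layer error term
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # hidden-layer error term
    grad_W2 = np.outer(delta2, a1)
    grad_W1 = np.outer(delta1, x)

    # Only now update all the weights, all at once.
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    return W1, W2
```

The point the sketch illustrates is that both gradients are computed from the old weights before either weight matrix is changed.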

Now let's assume I have "m" examples instead of just one. In the case of "m" examples, each small gradient-descent step will be taken after one back-propagation iteration over all "m" examples.

I am confused about whether, in the case of "m" examples, back-propagation works on the examples one by one: it first takes the first example and updates the weights, then takes the second example and calculates the weights again, then the third, and so on. Only at the end, after it has run over all the examples, does it take a single step towards the optimum point. If that is the case, is there any relation between the weights for one example and the weights for another example, since back-propagation is re-calculating the weights for each example in sequence?
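
To make the batch case concrete, here is a sketch (reusing the illustrative two-layer notation from the sketch above, not any specific library) of one full-batch gradient-descent step: the gradient for every example is computed from the same current weights, the gradients are accumulated, and only then is a single step taken.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_step(X, Y, W1, W2, lr=0.1):
    """One gradient-descent step over all m examples (full-batch gradient descent)."""
    m = len(X)
    acc_W1 = np.zeros_like(W1)
    acc_W2 = np.zeros_like(W2)
    for x, y in zip(X, Y):
        # Forward and backward pass for this example; the weights stay fixed here.
        a1 = sigmoid(W1 @ x)
        a2 = sigmoid(W2 @ a1)
        delta2 = (a2 - y) * a2 * (1 - a2)
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)
        acc_W2 += np.outer(delta2, a1)
        acc_W1 += np.outer(delta1, x)

    # A single step towards the optimum, using the gradient averaged over the batch.
    W2 -= lr * acc_W2 / m
    W1 -= lr * acc_W1 / m
    return W1, W2
```

In this sketch every example's gradient is computed from the same weights within one step; only the averaged gradient changes them.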

$\endgroup$
1
  • $\begingroup$ Can we avoid sequentially going through all the examples by using a vectorized or matrix-based implementation? If yes, how would that avoid it? We still need to compute the result for each example. $\endgroup$
    – Stupid420
    Commented Sep 11, 2017 at 6:33
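
(Regarding the vectorization point raised in the comment above: a matrix-based implementation still performs the per-example arithmetic, but it does so for all m examples inside a single matrix multiplication rather than in an explicit Python-level loop. A minimal sketch with illustrative shapes, not from the question:)

```python
import numpy as np

# Illustrative shapes: X is m x d (one row per example), W1 is d x h, W2 is h x k.
def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def forward_all(X, W1, W2):
    A1 = sigmoid(X @ W1)   # m x h: hidden activations for every example at once
    A2 = sigmoid(A1 @ W2)  # m x k: predictions for every example at once
    return A2
```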

3 Answers

3
$\begingroup$

A batch of data is taken for the feed-forward pass, and back-propagation is performed on the examples in that batch. The weights and biases are updated on the basis of the average error over that batch. The weight changes are then applied to the previous weights before performing the feed-forward pass on the next batch of data. A detailed explanation is given in the following book:

http://neuralnetworksanddeeplearning.com/chap2.html
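
A rough sketch of the batch-update loop described above (the helper `grads(weights, x, y)`, which returns one gradient array per weight matrix via back-propagation, is hypothetical and not the book's actual code):

```python
import random
import numpy as np

def train(weights, data, grads, batch_size=32, lr=0.1, epochs=10):
    for _ in range(epochs):
        random.shuffle(data)                       # data is a list of (x, y) pairs
        for k in range(0, len(data), batch_size):
            batch = data[k:k + batch_size]
            # Accumulate the gradients over this batch only.
            acc = [np.zeros_like(w) for w in weights]
            for x, y in batch:
                for a, g in zip(acc, grads(weights, x, y)):
                    a += g
            # Apply the averaged change BEFORE feeding forward the next batch.
            weights = [w - lr * a / len(batch) for w, a in zip(weights, acc)]
    return weights
```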

$\endgroup$
0
$\begingroup$

If we look at the error function of batch gradient descent, it calculates the error over all "m" examples.

It's not that it takes one example and calculates the error, then takes another example and calculates the error, and so on, updating after each one; the latter would be the case of stochastic gradient descent.
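
For concreteness, with generic notation that is not from the answer ($h_w$ for the network's output, $\eta$ for the learning rate, and a squared-error loss), the batch-gradient-descent cost and its single update step look like

$$J(w) = \frac{1}{2m}\sum_{i=1}^{m}\left\| h_w\!\left(x^{(i)}\right) - y^{(i)} \right\|^2, \qquad w \leftarrow w - \eta\,\nabla_w J(w),$$

so one update uses the error summed over all $m$ examples, whereas stochastic gradient descent would update $w$ from the error of a single example at a time.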

$\endgroup$
0
$\begingroup$

There is an error in the question: "Only at the end, after it has run over all the examples, does it take a single step towards the optimum point." Actually, regardless of the choice of batch size, be it a single example, a mini-batch, or the entire available set of examples, whenever the weights are modified the value of the loss function also changes, and 'it' takes a step 'towards the optimum point'.

What you are seeking is the weight-update strategy based on the error. There are several different strategies, but they all aim to provide stable convergence towards globally optimal weights at the end of the optimization routine. Remember that we want the weights to have good bias-variance properties; in other words, they should generalize and not over-fit the training data. Bearing that in mind, it is inadvisable to update the weights for a single training sample at a time: doing so leads to a very unstable learning routine, i.e. the optimization routine may fail to converge. Instead, we take an aggregate error over multiple training samples and update the weights progressively.

My best guess is that the source of your confusion is a misinterpretation of what 'example' means. An example here is a batch of 'm*n' training data points, NOT just one data point. The 'm' samples are 'm' batches, each of 'n' training points. The idea behind taking 'm' different random sub-samples of the data with replacement is better management of data bias during optimization.
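
As a sketch of the sampling scheme this describes, here is one way to draw 'm' mini-batches of 'n' points each with replacement (the function name and the NumPy-array assumptions are illustrative, not from the answer):

```python
import numpy as np

# X and Y are assumed to be NumPy arrays with one training point per row.
rng = np.random.default_rng(seed=0)

def sample_batches(X, Y, m, n):
    """Yield m random mini-batches of n points each, sampled with replacement."""
    for _ in range(m):
        idx = rng.integers(0, len(X), size=n)   # indices drawn with replacement
        yield X[idx], Y[idx]
```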

$\endgroup$
