  • $\begingroup$ How do you know it's a "batch method"? Also, when does it do a forward/backward pass and optimization step, then? After all the observations? So if I have 1000 observations, the first backward pass happens only after all 1000 observations are fed through? $\endgroup$ Commented May 30, 2016 at 14:35
  • $\begingroup$ Just read the first part of the paper :-) This part answers your questions. Hints: "Batch methods, such as Limited memory BFGS" + "A weakness of batch L-BFGS and CG, which require the computation of the gradient on the entire dataset to make an update, is that they do not scale gracefully with the number of examples" $\endgroup$
    – sascha
    Commented May 30, 2016 at 14:36
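
To illustrate the distinction the comments draw: a batch method such as L-BFGS computes the gradient over the entire dataset before making a single parameter update, whereas a stochastic method updates after each observation. A minimal sketch (using a made-up least-squares problem and plain gradient steps as a stand-in for the actual L-BFGS update, which additionally maintains a curvature approximation):

```python
import numpy as np

# Hypothetical data: 1000 observations of a linear model (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

def grad(w, Xb, yb):
    # Gradient of mean squared error over the given (mini)batch.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

# Batch-style step: the gradient uses ALL 1000 observations,
# and only then is one update made (as with batch L-BFGS / CG).
w_batch = np.zeros(3)
w_batch -= 0.1 * grad(w_batch, X, y)   # one update per full pass

# Stochastic-style steps: 1000 updates per pass, one per observation.
w_sgd = np.zeros(3)
for i in range(len(y)):
    w_sgd -= 0.01 * grad(w_sgd, X[i:i+1], y[i:i+1])
```

This is why batch methods "do not scale gracefully with the number of examples": the cost of every single update grows linearly with the dataset size, since each gradient evaluation touches all observations.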