3 events
when what by license comment
May 30, 2016 at 14:36 comment added sascha Just read the first part of the paper :-) That part answers your questions. Hints: "Batch methods, such as Limited memory BFGS" and "A weakness of batch L-BFGS and CG, which require the computation of the gradient on the entire dataset to make an update, is that they do not scale gracefully with the number of examples."
May 30, 2016 at 14:35 comment added user1357015 How do you know it's a "batch method"? Also, when does it do the forward/backward pass then? After all the observations? So if I have 1000 observations, does the first backprop happen only after all 1000 observations have been fed through?
May 30, 2016 at 9:49 history answered sascha CC BY-SA 3.0
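To make the "batch" point from the comments concrete, here is a minimal sketch of a full-batch update loop. It is not L-BFGS or CG: it uses plain gradient descent on a one-parameter least-squares problem, purely to show the property the quoted passage describes, namely that every update needs the gradient over the entire dataset. The function names, the synthetic data, the learning rate and the step count are all assumptions made up for this illustration.

```python
import random


def full_batch_gradient(w, xs, ys):
    """Gradient of the mean squared error for y ~ w * x, averaged over ALL examples."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def batch_descent(xs, ys, steps=20, lr=1.0):
    """Plain full-batch gradient descent (a stand-in for a batch method, not L-BFGS)."""
    w = 0.0
    examples_seen = 0
    for _ in range(steps):
        # One gradient evaluation touches every observation in the dataset ...
        grad = full_batch_gradient(w, xs, ys)
        examples_seen += len(xs)
        # ... and only then is the parameter updated, once per full pass.
        w -= lr * grad
    return w, examples_seen


random.seed(0)
# Hypothetical data: 1000 noise-free observations with true slope 3.
xs = [random.uniform(-1.0, 1.0) for _ in range(1000)]
ys = [3.0 * x for x in xs]

w, examples_seen = batch_descent(xs, ys)
print(w, examples_seen)
```

So with 1000 observations, each of the 20 updates here is computed only after all 1000 have been passed through, which is why the cost per update grows with the size of the dataset.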