
I'm using the nnet package in R. One of its parameters is "maxit", but there is no batch-size parameter.

As such, I am confused. Is an iteration one pass through the entire data set? Or is the batch size 1, so that backpropagation occurs after every observation to tweak the network?





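The docs say it uses the BFGS algorithm to optimize the network, which should limit its usability for big networks: BFGS keeps a dense approximation of the inverse Hessian whose size grows with the square of the number of weights, and even L-BFGS, which avoids that matrix, has problems there.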
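To put a rough number on the size concern, here is a small sketch. It assumes the weight count of a single-hidden-layer nnet with bias units is (inputs + 1) * size + (size + 1) * outputs (no skip-layer connections) and that the inverse-Hessian approximation is stored densely as 8-byte doubles. The 784/100/10 layer sizes and the L-BFGS history length m = 10 are illustrative choices, not values nnet uses.

```python
def n_weights(n_inputs: int, size: int, n_outputs: int) -> int:
    """Weights of a single-hidden-layer net with bias units (assumed layout, no skip layer)."""
    return (n_inputs + 1) * size + (size + 1) * n_outputs


def bfgs_bytes(n_w: int) -> int:
    """Memory for a dense n_w x n_w inverse-Hessian approximation of 8-byte doubles."""
    return n_w * n_w * 8


def lbfgs_bytes(n_w: int, m: int = 10) -> int:
    """Memory for the m stored (s, y) vector pairs that L-BFGS keeps instead."""
    return 2 * m * n_w * 8


w = n_weights(784, 100, 10)           # hypothetical image-sized input
print(w)                              # 79510
print(bfgs_bytes(w) / 1e9, "GB")      # about 50.6 GB
print(lbfgs_bytes(w) / 1e6, "MB")     # about 12.7 MB
```

The quadratic growth is what makes plain BFGS impractical for large networks; L-BFGS removes the dense matrix, but like BFGS it still needs a gradient over the whole data set for every update.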

This is a batch method (unlike stochastic gradient descent), so every iteration works on the complete training set, which is why there is no batch-size parameter.
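What one "iteration" consumes can be sketched on a toy problem. This is a minimal illustration, assuming a one-parameter least-squares fit y ≈ w * x; plain full-batch gradient descent stands in for BFGS here (the batch behaviour is the same: one update per pass over all observations), and minibatch_sgd is a hypothetical comparison, not something nnet offers.

```python
def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error of y ≈ w * x over the given observations."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def full_batch_gd(xs, ys, maxit, lr=0.1):
    """Batch method: each iteration reads every observation, then makes ONE update."""
    w, updates = 0.0, 0
    for _ in range(maxit):
        w -= lr * mse_gradient(w, xs, ys)   # gradient over the whole data set
        updates += 1
    return w, updates


def minibatch_sgd(xs, ys, epochs, batch_size, lr=0.1):
    """Stochastic method: one update per mini-batch, so many updates per pass."""
    w, updates = 0.0, 0
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            w -= lr * mse_gradient(w, bx, by)   # gradient over one mini-batch only
            updates += 1
    return w, updates


xs = [i / 1000 for i in range(1000)]    # 1000 hypothetical observations
ys = [3.0 * x for x in xs]              # true slope 3

w_batch, n_batch = full_batch_gd(xs, ys, maxit=100)
w_sgd, n_sgd = minibatch_sgd(xs, ys, epochs=1, batch_size=1)
print(n_batch)   # 100 updates, each after all 1000 observations
print(n_sgd)     # 1000 updates, one after every observation
```

With batch_size=1 the second function is the "update after every observation" scenario from the question; a batch optimizer such as the one nnet uses only ever behaves like the first.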

For a good overview of optimization functions used in NN-learning, see this paper.

  • How do you know it's a "batch method"? Also, when does it do the forward/backward pass then? After all the observations? So if I have 1000 observations, the first backpropagation happens only after all 1000 observations have been fed through? Commented May 30, 2016 at 14:35
  • Just read the first part of the paper :-) It answers your questions. Hints: "Batch methods, such as Limited memory BFGS" and "A weakness of batch L-BFGS and CG, which require the computation of the gradient on the entire dataset to make an update, is that they do not scale gracefully with the number of examples"
    – sascha
    Commented May 30, 2016 at 14:36

The manual of the nnet package states:

Optimization is done via the BFGS method of optim.

So SGD is not used and there is no batch-size hyperparameter; every iteration (at most "maxit" of them) works on the whole training set.
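To connect "maxit" with full-batch BFGS, here is a minimal BFGS sketch, assuming a toy two-parameter least-squares model y ≈ a + b * x and a simple backtracking (Armijo) line search. It is not the code nnet or optim actually run (their line search and stopping rules differ), and the helper names are hypothetical; it only shows that maxit caps the BFGS iterations and that every loss/gradient evaluation, including line-search trials, is a pass over the full data set.

```python
def loss_and_grad(theta, xs, ys):
    """Mean squared error of y ≈ a + b * x and its gradient, over ALL observations."""
    a, b = theta
    n = len(xs)
    loss, ga, gb = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        r = a + b * x - y            # residual of one observation
        loss += r * r
        ga += 2.0 * r
        gb += 2.0 * r * x
    return loss / n, [ga / n, gb / n]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def bfgs(xs, ys, maxit=100, tol=1e-10):
    """Full-batch BFGS; returns (theta, iterations, full passes over the data)."""
    theta = [0.0, 0.0]
    h = [[1.0, 0.0], [0.0, 1.0]]     # inverse-Hessian approximation
    f, g = loss_and_grad(theta, xs, ys)
    passes, it = 1, 0
    while it < maxit and dot(g, g) > tol:
        p = [-d for d in matvec(h, g)]   # quasi-Newton search direction
        step = 1.0
        while True:                      # backtracking line search (Armijo rule)
            cand = [t + step * pi for t, pi in zip(theta, p)]
            f_new, g_new = loss_and_grad(cand, xs, ys)
            passes += 1
            if f_new <= f + 1e-4 * step * dot(g, p) or step < 1e-12:
                break
            step *= 0.5
        s = [c - t for c, t in zip(cand, theta)]
        y = [gn - go for gn, go in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:                   # skip the update if curvature is not positive
            hy = matvec(h, y)
            yhy = dot(y, hy)
            # Standard BFGS update of the inverse Hessian, written element-wise.
            h = [[h[i][j] + (sy + yhy) * s[i] * s[j] / sy ** 2
                  - (hy[i] * s[j] + s[i] * hy[j]) / sy
                  for j in range(2)] for i in range(2)]
        theta, f, g = cand, f_new, g_new
        it += 1
    return theta, it, passes


xs = [i / 100 for i in range(100)]
ys = [1.0 + 2.0 * x for x in xs]         # true a = 1, b = 2
theta, iters, passes = bfgs(xs, ys, maxit=100)
print(theta, iters, passes)
print(bfgs(xs, ys, maxit=1)[1])          # 1: maxit caps the iterations
```

passes is at least iters + 1 because each line-search trial is another full pass over the data. Raising maxit in nnet raises the iteration cap in the same sense; it never changes how much data each iteration sees.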

