
I'm using the nnet package in R. One of the parameters is "maxit" but there is no batch size parameter.

As such, I am confused. Is an iteration one pass through the entire data set? Or is the batch size 1, so that backpropagation occurs after every single observation to tweak the network?

Thanks!


2 Answers


The docs say it uses the BFGS algorithm to optimize the network (which should limit its usability for big networks; even L-BFGS runs into problems at that scale).

This is a batch method (unlike stochastic gradient descent), so every update is computed on the complete training set (hence there is no batch-size parameter).
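For concreteness, here is a minimal sketch (the dataset, formula, and hyperparameter values are illustrative choices, not a recommendation): maxit caps the number of full-batch BFGS iterations, and with trace = TRUE each printed objective value is computed on the entire training set.

    # A minimal sketch (dataset and hyperparameters are illustrative only):
    # maxit caps the number of full-batch BFGS iterations; there is no
    # batch-size argument because every iteration uses all of the data.
    library(nnet)

    set.seed(1)
    fit <- nnet(Species ~ ., data = iris,
                size  = 5,     # hidden units
                decay = 1e-3,  # weight decay (L2 penalty)
                maxit = 200,   # upper bound on BFGS iterations
                trace = TRUE)  # prints the full-dataset objective as it optimizes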

For a good overview of optimization functions used in NN-learning, see this paper.

  • How do you know it's a "batch method"? Also, when does it do a forward/backward pass then? After all the observations? So if I have 1000 observations, does the first backprop happen only after all 1000 have been fed through? Commented May 30, 2016 at 14:35
  • Just read the first part of the paper :-) It answers your questions. Hints: "Batch methods, such as limited-memory BFGS" and "A weakness of batch L-BFGS and CG, which require the computation of the gradient on the entire dataset to make an update, is that they do not scale gracefully with the number of examples."
    – sascha
    Commented May 30, 2016 at 14:36

The manual of the nnet package clarifies that:

Optimization is done via the BFGS method of optim.

So SGD is not used here, and there is no batch-size hyperparameter; maxit simply bounds the number of BFGS iterations, each of which uses the full dataset.
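As a toy illustration of why (this is my own example, not nnet's internal code): optim with method = "BFGS" minimizes a loss function whose every evaluation touches the whole dataset, so there is simply nothing to mini-batch.

    # A toy illustration (my own example, not nnet's internal code):
    # optim(method = "BFGS") calls the loss function, and hence sees the
    # whole dataset, once per evaluation; there is nothing to mini-batch.
    set.seed(42)
    x <- matrix(rnorm(200), ncol = 2)                  # 100 observations, 2 features
    y <- as.numeric(x %*% c(1.5, -2) + rnorm(100) > 0) # noisy binary labels

    # Full-dataset negative log-likelihood for logistic regression:
    # every call computes over all 100 rows.
    nll <- function(w) {
      z <- drop(x %*% w)
      -sum(y * plogis(z, log.p = TRUE) + (1 - y) * plogis(-z, log.p = TRUE))
    }

    opt <- optim(par = c(0, 0), fn = nll, method = "BFGS",
                 control = list(maxit = 100))          # maxit, just as in nnet
    opt$par                                            # fitted weights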

