
Currently, I am using the default batch size of 64 for the seq2seq TensorFlow model. What is the maximum batch size, layer size, etc. I can go with on a single Titan X GPU with 12 GB of memory and a Haswell-E Xeon with 128 GB of RAM? The input data is converted to embeddings. Following are some relevant parameters I am using; it seems the cell input size is 1024:

encoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].
decoder_inputs: a list of 2D Tensors [batch_size x cell.input_size].

tf.app.flags.DEFINE_integer("size", 1024, "Size of each model layer.")

So, based on my hardware, what is the maximum batch size, number of layers, and input size I can go with? Currently the GPU shows that 99% of its memory is occupied.

3 Answers


By default, TensorFlow occupies all available GPU memory. However, there is a way to change this. In my model, I do this:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

Then you can use this config when you start your session:

with tf.Session(config=config) as sess:

Now the model will only use as much memory as it needs, and you can try different batch sizes and see when it runs out of memory.
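For example, here is a minimal sketch of that trial-and-error loop, assuming a hypothetical build_graph(batch_size) helper that stands in for your own seq2seq model-construction code; the allow_growth option and the ResourceExhaustedError exception are the only TensorFlow-specific pieces:

import tensorflow as tf

def fits_in_memory(batch_size):
    """Try one training step at the given batch size; return False on OOM.

    build_graph() is a hypothetical placeholder for your own seq2seq
    model-construction code that returns a train op for that batch size.
    """
    tf.reset_default_graph()
    train_op = build_graph(batch_size)  # hypothetical helper, not a TF API
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True  # allocate GPU memory on demand
    try:
        with tf.Session(config=config) as sess:
            sess.run(tf.global_variables_initializer())
            sess.run(train_op)
        return True
    except tf.errors.ResourceExhaustedError:
        return False

# Walk down from an optimistic batch size until one fits.
for candidate in (512, 256, 128, 64, 32):
    if fits_in_memory(candidate):
        print("largest batch size that fit: %d" % candidate)
        break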


The memory usage when running a TensorFlow model depends on how many variables you have in your model, as well as on the intermediate tensors the TensorFlow runtime uses to compute activations, gradients, etc. For instance, in your model, if the input_size is 1024, the memory used for variables per layer would be about 4 MB + 4 KB (a 1024x1024 float32 weight matrix plus a 1024-element bias vector). The memory used for intermediate tensors grows linearly with the batch size, but the exact amount is hard to estimate, as it depends on how the runtime decides to schedule the operations. 12 GB should be able to fit quite a large model, though.
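As a rough illustration of that back-of-the-envelope estimate (the layer count below is made up, and the per-layer figure assumes just a single 1024x1024 float32 weight matrix plus a 1024-element bias vector, as in the simple estimate above):

BYTES_PER_FLOAT32 = 4
input_size = 1024   # matches the "size" flag in the question
num_layers = 3      # illustrative layer count, not from the question

weights_bytes = input_size * input_size * BYTES_PER_FLOAT32  # ~4 MB per layer
biases_bytes = input_size * BYTES_PER_FLOAT32                # ~4 KB per layer
per_layer_mb = (weights_bytes + biases_bytes) / (1024.0 ** 2)

print("variables per layer: %.2f MB" % per_layer_mb)
print("variables, all layers: %.2f MB" % (per_layer_mb * num_layers))
# Intermediate activations and gradients add memory roughly proportional
# to the batch size on top of this, and that part is hard to predict.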

  • Currently the GPU shows that 99% of memory is occupied for the above config.
    – stackit
    Commented Feb 5, 2016 at 6:51
  • @stackit I've found TF 'occupies' the whole GPU even if it is not using all of it for computations... it will throw an error if your model goes over the GPU memory limit, so you can figure out your max model size via trial and error.
    – j314erre
    Commented Mar 25, 2016 at 17:18

Elaborating a bit on the prior answer: it is difficult to forecast analytically the exact peak memory consumption of a model, because the TF runtime has some freedom to schedule independent operations simultaneously, and doing so can result in higher peak memory use than executing the same ops sequentially. Op scheduling is dynamic, so the peak memory used in a training step can vary non-deterministically from step to step. In practice, for non-trivial models it seems necessary to experiment to find the largest batch size that will consistently work.
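To make that concrete, here is one way such an experiment might be sketched, assuming a hypothetical run_training_steps(batch_size, num_steps) wrapper around your own training loop; running several steps per candidate matters precisely because peak memory can differ from step to step:

import tensorflow as tf

def survives_several_steps(batch_size, num_steps=20):
    """Return True if num_steps training steps complete without OOM.

    A single successful step is weak evidence, since op scheduling (and
    therefore peak memory) can vary between steps; run a handful instead.
    run_training_steps() is a hypothetical stand-in for your own loop.
    """
    try:
        run_training_steps(batch_size, num_steps)  # not a TF API
        return True
    except tf.errors.ResourceExhaustedError:
        return False

batch_size = 256  # optimistic starting point
while batch_size > 1 and not survives_several_steps(batch_size):
    batch_size //= 2  # back off until training is consistently stable
print("using batch size %d" % batch_size)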

  • Aren't there any rules of thumb to predict it, however off the mark they may be?
    – stackit
    Commented Jan 9, 2017 at 13:09
