
I'm using an NVIDIA GeForce 1050 Ti.

I have a model in Keras that works fine (no memory allocation error appears), but when I run a much simpler model in TensorFlow I get the error below. The error log comes first, followed by the TensorFlow model and then the Keras model.

I don't understand this, because the batch size (128) is the same and I have tensorflow-gpu installed via pip. How come Keras runs fine (with a much more complex model) while TensorFlow doesn't?

Thanks!

2019-10-04 11:45:58.450155: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.83GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-04 11:45:58.450838: W tensorflow/core/common_runtime/bfc_allocator.cc:237] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.84GiB with freed_by_count=0. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
2019-10-04 11:46:08.451808: W tensorflow/core/common_runtime/bfc_allocator.cc:314] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.22GiB (rounded to 1310720000). Current allocation summary follows.
2019-10-04 11:46:08.452025: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (256): Total Chunks: 33, Chunks in use: 33. 8.3KiB allocated for chunks. 8.3KiB in use in bin. 2.6KiB client-requested in use in bin.
2019-10-04 11:46:08.452239: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (512): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-04 11:46:08.452436: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (1024): Total Chunks: 1, Chunks in use: 1. 1.3KiB allocated for chunks. 1.3KiB in use in bin. 1.0KiB client-requested in use in bin.
2019-10-04 11:46:08.452648: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (2048): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-04 11:46:08.452854: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (4096): Total Chunks: 9, Chunks in use: 9. 44.0KiB allocated for chunks. 44.0KiB in use in bin. 44.0KiB client-requested in use in bin.
2019-10-04 11:46:08.453073: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (8192): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-04 11:46:08.453276: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (16384): Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-04 11:46:08.453482: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (32768): Total Chunks: 4, Chunks in use: 4. 160.0KiB allocated for chunks. 160.0KiB in use in bin. 160.0KiB client-requested in use in bin.
2019-10-04 11:46:08.453706: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (65536): Total Chunks: 5, Chunks in use: 5. 384.0KiB allocated for chunks. 384.0KiB in use in bin. 334.1KiB client-requested in use in bin.
2019-10-04 11:46:08.453934: I tensorflow/core/common_runtime/bfc_allocator.cc:764] Bin (131072): Total Chunks: 4, Chunks in use: 4. 512.0KiB allocated for chunks. 512.0KiB in use in bin. 512.0KiB client-requested in use in bin.

TensorFlow:

import numpy as np
import tensorflow as tf   # tensorflow-gpu 1.x (tf.placeholder / tf.Session API)
from tqdm import tqdm

# Helper functions (createConvBlock, flatten, dense) are defined further down in this post.
x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])
y = tf.placeholder(dtype=tf.float32, shape=[None, CLASSES])
keep_prob = tf.placeholder(dtype=tf.float32)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand instead of reserving it all up front
conv1c,rlu1c,max1 = createConvBlock(x,filters=32)
conv2c,rlu2c,max2 = createConvBlock(max1,filters=64)
conv3c,rlu3c,max3 = createConvBlock(max2,filters=128)
conv3c,rlu3c,max3 = createConvBlock(max3,filters=128)
flat = flatten(max3)
dropout = dense(flat,1024,True,keep_prob)
dw4,db4= dense(dropout,CLASSES)
y_hat = tf.matmul(dropout,dw4)+db4
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=y_hat))
train_step = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cross_entropy)
BATCH = 128

class CifarHelper():

    def __init__(self):
        self.i = 0
        (train_x, train_y), (test_x, test_y) = tf.keras.datasets.cifar10.load_data()

        self.all_train_batches = train_x
        self.test_batch = test_x

        self.training_images = train_x / 255
        self.training_labels = to_onehot(train_y,10)

        self.test_images = test_x / 255
        self.test_labels = to_onehot(test_y,10)



    def next_batch(self, batch_size):
        x = self.training_images[self.i:self.i + batch_size]
        y = self.training_labels[self.i:self.i + batch_size]
        self.i = (self.i + batch_size) % len(self.training_images)
        return x, y


ch = CifarHelper()
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    for i in tqdm(range(EPOCHES)):
        a,b = ch.next_batch(BATCH)
        train_step.run(feed_dict={x: a,y :b,keep_prob: 1.0})
        if i % 100 == 0:
            # note: these evaluation ops are re-created on every pass, adding new nodes to the graph
            matches = tf.equal(tf.argmax(y_hat, 1), tf.argmax(y, 1))

            acc = tf.reduce_mean(tf.cast(matches, tf.float32))

            # accuracy is evaluated on the full test set in a single feed
            test_acc[i//100] = sess.run(acc, feed_dict={x: ch.test_images, y: ch.test_labels, keep_prob: 1.0})



def createConvBlock(xinput,filters,stride = 1,withMaxPoll=True,pool_kernel=[1,2,2,1],pool_stride=[1,2,2,1]):
    shape = [s.value for s in xinput.get_shape()]
    shape = [3,3,shape[3],filters]
    wtb = tf.truncated_normal(shape=shape, stddev=0.1)
    w = tf.Variable(wtb)
    b = tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[filters]))
    conv = tf.nn.conv2d(xinput,w,strides=[1,stride,stride,1],padding='SAME')
    rlu = tf.nn.relu(conv + b)
    if withMaxPoll:
        maxpool = tf.nn.max_pool2d(rlu,ksize=pool_kernel,strides=pool_stride,padding='SAME')
        return conv,rlu,maxpool

    return conv, rlu

def flatten(layer):
    pooling_size = np.product([s.value for s in layer.get_shape()[1:]])
    flat = tf.reshape(layer,shape=[-1,pooling_size])
    print('flatt {}'.format(flat.get_shape()))
    return flat


def dense(layer, filters, withDropouts=False, keep_prob=None):
    # returns the dropout-wrapped ReLU activation when withDropouts is True,
    # otherwise returns the raw (w, b) variables for the caller to combine itself
    shape = [s.value for s in layer.get_shape()[1:]] + [filters]
    norm = np.product(shape)
    w = tf.Variable(
        tf.truncated_normal(shape=shape, stddev=0.1))
    b = tf.Variable(tf.constant(0.1, dtype=tf.float32, shape=[filters]))
    z = tf.nn.relu(tf.matmul(layer, w) + b)

    if withDropouts:
        dropout = tf.nn.dropout(z,keep_prob)
        return dropout
    return w,b
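
The snippet above also relies on a few names the post never defines (CLASSES, EPOCHES, test_acc, and to_onehot). A minimal sketch of what they might look like, purely as an assumption so the code can run end to end; the concrete values and the helper below are not part of the original post:

# Hypothetical definitions for names the TensorFlow snippet assumes.
CLASSES = 10                          # CIFAR-10 has 10 classes
EPOCHES = 5000                        # number of training steps (one batch per step)
test_acc = np.zeros(EPOCHES // 100)   # accuracy sampled every 100 steps

def to_onehot(labels, num_classes):
    """Convert integer labels of shape (N, 1) or (N,) to one-hot vectors of shape (N, num_classes)."""
    labels = np.asarray(labels).reshape(-1)
    onehot = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    onehot[np.arange(labels.shape[0]), labels] = 1.0
    return onehot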

Keras:

import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten, LeakyReLU

batch_size = 128
num_classes = 10
epochs = 5

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(LeakyReLU())
model.add(Conv2D(32,kernel_size=(3,3),padding='SAME'))
model.add(LeakyReLU())
model.add(Conv2D(32,kernel_size=(3,3),padding='SAME'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3),name='cnv',padding='SAME'))
model.add(LeakyReLU())
model.add(Conv2D(64, (3, 3),padding='SAME'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(128, (3, 3),padding='SAME'))
model.add(LeakyReLU())
model.add(Conv2D(128, (3, 3),padding='SAME'))
model.add(LeakyReLU())
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(1024))
model.add(LeakyReLU())
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax', name='preds'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
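
The Keras snippet assumes x_train, y_train, x_test, y_test, and input_shape already exist. A minimal sketch of that preparation for CIFAR-10 (the same dataset the TensorFlow code loads), written as an assumption rather than as the original author's code:

# Hypothetical data preparation for the Keras model above.
from keras.datasets import cifar10
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0    # scale pixels to [0, 1]
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, num_classes) # one-hot labels
y_test = to_categorical(y_test, num_classes)

input_shape = x_train.shape[1:]                # (32, 32, 3)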

1 Answer


You should decrease your batch size. Try your code with a batch size of 64; if it still does not work, decrease it further to 32, 16, or 8. This will increase the execution time of each epoch.
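
For what it's worth, the change suggested here amounts to lowering the batch-size constants in the question's two snippets (a sketch; nothing else changes):

# TensorFlow snippet
BATCH = 64        # was 128; drop to 32, 16 or 8 if the allocator still runs out of memory

# Keras snippet
batch_size = 64   # was 128

Smaller batches mean fewer activations are held in GPU memory per training step, which lowers the peak allocation the allocator has to satisfy.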

  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
    – Community Bot
    Commented Dec 3, 2021 at 11:41
  • Again, how can that be? The second model is more complex and both models use a batch size of 128. Sadly, this doesn't answer my question at all.
    – LiorA
    Commented Dec 4, 2021 at 18:18
