25

I built a CNN-based autoencoder model using Keras. After the training process finished, I noticed that at least 1/3 of my laptop's 64 GB of memory was still occupied, and the same was true for the GPU memory. I have not found a good way to release the memory; I could only release it by closing the Anaconda Prompt command window and the Jupyter notebook. I am not sure if anyone has a good suggestion. Thanks!

  • This issue appears not to be related to programming, but rather to be a problem with consumer PC hardware or software, which is off-topic for Stack Overflow. If you still need assistance with this issue, please ask it at Stack Overflow's sister site, Super User.
    – rst-2cv
    Commented Jun 23, 2018 at 21:44
  • It's most likely the data that is occupying the memory while the process is running (the Jupyter kernel). You might try del mydata to delete your data variables from scope so that garbage collection can happen, but I can't tell without the code.
    – nuric
    Commented Jun 24, 2018 at 8:58
  • Thank you for the suggestions. I tried using del to delete the loaded training and test image data; it released about 3 GB of memory, which partially solves the problem.
    – J. Zhao
    Commented Jun 25, 2018 at 3:25

2 Answers

34

Releasing RAM

To release RAM, simply del the variables you no longer need, as suggested by @nuric in the comments.
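
For example (a minimal sketch; x_train, x_test, and autoencoder are stand-ins for whatever large objects you created):

import gc

# Drop the references so the objects become eligible for garbage collection.
del x_train, x_test, autoencoder
# Ask the garbage collector to reclaim them immediately.
gc.collect()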

Releasing GPU memory

This is a little trickier than releasing RAM. Some people will suggest the following code (assuming you are using Keras):

from keras import backend as K

# Destroys the current TF graph and session, freeing the associated state.
K.clear_session()

However, the above code doesn't work for everyone. (Even if you also del the model, it may still not work.)

If the above method doesn't work for you, try the following (you need to install the numba library first):

from numba import cuda

# Select GPU 0 and destroy its CUDA context, which releases the GPU memory.
cuda.select_device(0)
cuda.close()

The reason this works: TensorFlow only allocates memory on the GPU, while CUDA is responsible for managing the GPU memory.

If CUDA somehow refuses to release the GPU memory even after you have cleared the whole graph with K.clear_session(), you can use the numba cuda module to control CUDA directly and clear the GPU memory.
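
Putting the two steps together (a minimal sketch; the function name release_gpu_memory is my own, and it assumes Keras and numba are installed):

from keras import backend as K
from numba import cuda

def release_gpu_memory():
    # Let Keras/TensorFlow drop its references to the graph first.
    K.clear_session()
    # Then tear down the CUDA context on GPU 0, forcing the driver
    # to release the memory TensorFlow had allocated.
    cuda.select_device(0)
    cuda.close()

Be aware that after cuda.close() the CUDA context is gone, so any further GPU work in the same process will fail until a new context is created; this is exactly the limitation the second answer works around.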

  • Thank you for explaining the additional difference between CUDA and TensorFlow's memory management. I did not know this.
    Commented Sep 25, 2018 at 4:13
5

For clearing RAM, simply delete variables as suggested by Raven.

Unfortunately, for the GPU, cuda.close() will throw errors in any future steps involving the GPU, such as model evaluation. A workaround for freeing GPU memory is to wrap the model creation and training in a function and then run that function in a subprocess; when training is done, the subprocess is terminated and the GPU memory is freed.

Something like this:

import multiprocessing

def create_model_and_train():
    # Import TensorFlow/Keras inside this function so it is initialized
    # in the child process rather than the parent (see the note below).
    ...
    # build, compile, and train the model here
    ...

if __name__ == '__main__':
    p = multiprocessing.Process(target=create_model_and_train)
    p.start()
    p.join()   # wait for training to finish; GPU memory is freed on exit
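
This works because the CUDA context, and all the GPU memory TensorFlow allocated within it, belongs to the child process, so the driver reclaims everything when that process exits. Note (from the comments below) that you may need to import TensorFlow directly inside create_model_and_train so that it is initialized in the child process, not the parent.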

Or you can create the function below and call it before each run:

from keras.backend.tensorflow_backend import set_session
from keras.backend.tensorflow_backend import clear_session
from keras.backend.tensorflow_backend import get_session
import tensorflow
import gc

# Reset the Keras session (TensorFlow 1.x API)
def reset_keras():
    sess = get_session()
    clear_session()
    sess.close()
    sess = get_session()

    try:
        del classifier  # this lives in global scope - change the name as you need
    except NameError:
        pass

    print(gc.collect())  # if it collected something, you should see a number as output

    # use the same config as you used to create the original session
    config = tensorflow.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 1
    config.gpu_options.visible_device_list = "0"
    set_session(tensorflow.Session(config=config))
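
Note that keras.backend.tensorflow_backend and ConfigProto are TensorFlow 1.x APIs. On TensorFlow 2.x, a rough equivalent (a minimal sketch, not a drop-in replacement) is:

import gc
import tensorflow as tf

def reset_keras():
    # Clears the global Keras state: layers, models, and cached graphs.
    tf.keras.backend.clear_session()
    # Collect whatever just became unreachable (e.g. a deleted model).
    gc.collect()

Keep in mind that TensorFlow usually holds on to its GPU memory pool for the lifetime of the process, so this frees memory for reuse within TensorFlow rather than returning it to the operating system.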
  • I tried numba cuda, gc collect, del history and model; nothing worked. But the multiprocessing solution worked perfectly for me. Thanks!
    Commented May 16, 2022 at 16:23
  • Hi guys, I am using the reset_keras() function to train my model. I have a training data set with 36,000 examples, and I divide it into chunks of 6,000. But even when I feed my model these chunks, at some point I get the following message: InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized. Every time I get a new chunk of data I call the reset_keras() function. My question is: is my model still learning, or does it start from zero when I use this function?
    – EdwinMald
    Commented Aug 28, 2022 at 13:27
  • Note: For the subprocess solution, I needed to import (and thereby initialize) TensorFlow directly in the create_model_and_train function.
    Commented Aug 29, 2023 at 14:22
