24

I would like to remove TensorFlow and Hugging Face models from my laptop. I did find one link, https://github.com/huggingface/transformers/issues/861, but is there no command that can remove them? As mentioned in the link, deleting them manually can cause problems, because we don't know which other files are linked to those models, which files expect a model to be present at that location, or whether deletion may simply cause an error.

2
  • Do you want to remove certain models or the whole cache (i.e. all models)?
    – cronoik
    Commented Nov 27, 2020 at 13:50
  • Certain models. I want to remove models which are no longer useful and free up some space on the hard disk. Commented Nov 27, 2020 at 14:02

5 Answers

37

Use

pip install "huggingface_hub[cli]"

Then

huggingface-cli delete-cache

You should now see a list of revisions that you can select/deselect.

See this link for details.
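
If you prefer to do this from Python, the same clean-up is exposed through huggingface_hub's cache-scanning API. Below is a minimal sketch, assuming a recent huggingface_hub version; the commit hash passed to delete_revisions is a hypothetical placeholder that you would take from the scan output:

from huggingface_hub import scan_cache_dir

# Scan the cache and report what is stored there
cache_info = scan_cache_dir()
for repo in cache_info.repos:
    for revision in repo.revisions:
        print(repo.repo_id, revision.commit_hash, revision.size_on_disk_str)

# Build and execute a deletion plan for revisions you no longer want
# ("abcdef123456" is a hypothetical commit hash taken from the output above)
delete_strategy = cache_info.delete_revisions("abcdef123456")
print("Will free", delete_strategy.expected_freed_size_str)
delete_strategy.execute()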

2
  • 2
    Perfect! This solution is elegant and clean
    – Gildas
    Commented Jun 2, 2023 at 19:58
  • 1
    Usage instructions: "Press <space> to select, <enter> to validate and <ctrl+c> to quit without modification."
    – tuomastik
    Commented Mar 9 at 14:45
10

The transformers library stores downloaded files in your cache. As far as I know, there is no built-in method to remove certain models from the cache, but you can code something yourself. The files are stored under a cryptic name alongside two additional files that have .json (.h5.json in the case of TensorFlow models) and .lock appended to that name. The .json file contains some metadata that can be used to identify the file. The following is an example of such a file:

{"url": "https://cdn.huggingface.co/roberta-base-pytorch_model.bin", "etag": "\"8a60a65d5096de71f572516af7f5a0c4-30\""}

We can now use this information to create a list of your cached files as shown below:

import glob
import json
import re
from collections import OrderedDict
from transformers import TRANSFORMERS_CACHE

# Every cached file has a companion .json metadata file that stores its URL
metaFiles = glob.glob(TRANSFORMERS_CACHE + '/*.json')
# Matches model weight URLs (PyTorch .bin or TensorFlow .h5); everything
# else is treated as a tokenizer/config file
modelRegex = r"huggingface\.co/(.*)(pytorch_model\.bin$|resolve/main/tf_model\.h5$)"

cachedModels = {}
cachedTokenizers = {}
for file in metaFiles:
    with open(file) as j:
        data = json.load(j)
        isM = re.search(modelRegex, data['url'])
        if isM:
            # Strip the trailing separator from the captured model name
            cachedModels[isM.group(1)[:-1]] = file
        else:
            cachedTokenizers[data['url'].partition('huggingface.co/')[2]] = file

cachedTokenizers = OrderedDict(sorted(cachedTokenizers.items(), key=lambda k: k[0]))

Now all you have to do is check the keys of cachedModels and cachedTokenizers and decide whether you want to keep them. If you want to delete one, look up its value in the dictionary and delete that file from the cache. Don't forget to also delete the corresponding *.json and *.lock files, as in the sketch below.
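
For instance, a minimal deletion sketch building on the dictionaries above (this assumes the old flat cache layout described in this answer; the dictionary key shown is a hypothetical example):

import os

def delete_cached_file(meta_file):
    # The metadata path is the blob path with '.json' appended,
    # so stripping that suffix recovers the blob itself
    blob = meta_file[:-len('.json')]
    for path in (blob, meta_file, blob + '.lock'):
        if os.path.exists(path):
            os.remove(path)

# Hypothetical usage, deleting one cached model:
# delete_cached_file(cachedModels['roberta-base'])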

10

From a comment in a transformers GitHub issue, you can use the following to find the cache directory so that you can clean it:

from transformers import file_utils
print(file_utils.default_cache_path)
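
Before deleting anything, you can list what is in that directory; a quick sketch (top-level files only, and the exact layout varies across transformers versions):

import os
from transformers import file_utils

cache_dir = file_utils.default_cache_path
for name in sorted(os.listdir(cache_dir)):
    path = os.path.join(cache_dir, name)
    if os.path.isfile(path):
        # Print each cached file with a rough size in megabytes
        print(f"{os.path.getsize(path) / 1e6:9.1f} MB  {name}")
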
3

You can run this code to delete all cached models:

from transformers import TRANSFORMERS_CACHE
print(TRANSFORMERS_CACHE)  # show where the cache lives before wiping it

import shutil
shutil.rmtree(TRANSFORMERS_CACHE)  # removes the entire cache directory
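
Note that this wipes the entire cache rather than selected models; anything you load afterwards will simply be downloaded again.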
-4
pip uninstall tensorflow
pip uninstall tensorflow-gpu
pip uninstall transformers

and find where you have saved GPT-2, e.g.

model.save_pretrained("./english-gpt2")

Here english-gpt2 is your downloaded model's name; from that path you can delete it manually, as sketched below.
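
A minimal sketch of that manual removal ("./english-gpt2" is the example path from this answer):

import shutil

# Recursively delete the directory written by save_pretrained;
# ignore_errors avoids raising if the path is already gone
shutil.rmtree("./english-gpt2", ignore_errors=True)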

7
  • That is not what the OP is looking for as it will remove all libraries and does not clear the default cache.
    – cronoik
    Commented Nov 27, 2020 at 16:48
  • As far as I have experienced, if you save it (the Hugging Face GPT-2 model), it is not in the cache but on disk. Let me know your OS so that I can give you a command accordingly. If it is Linux, it can be located easily with the grep command.
    – ML85
    Commented Nov 27, 2020 at 20:46
  • I think that is some kind of misunderstanding. The OP (not me) wants to remove only certain models and not the whole transformers library. That's why I said that you are not answering the question of the OP. I also just tested what you have said and calling save_pretrained does not clear the cache (which is correct in my opinion).
    – cronoik
    Commented Nov 27, 2020 at 21:50
  • As far as I remember, the cache is a part of RAM, and models, I guess, would be stored on the hard disk because they may not be permanently in RAM? When needed, they might be loaded into the cache. But my aim is to remove them from the hard disk. I want to free some hard disk space by deleting some models which I don't use anymore. Commented Nov 28, 2020 at 7:01
  • 1
    @HiteshSomani Both answers will remove the models from your hard disk. The cache is just a term for intermediate storage, which can be the RAM, the processor, or the hard disk. Please check the Wikipedia article for further information.
    – cronoik
    Commented Nov 28, 2020 at 15:31
