
I have been experimenting with a Keras example which needs to import MNIST data:

    from keras.datasets import mnist
    import numpy as np
    (x_train, _), (x_test, _) = mnist.load_data()

It generates error messages such as

    Exception: URL fetch failure on https://s3.amazonaws.com/img-datasets/mnist.pkl.gz: None -- [Errno 110] Connection timed out

This is probably related to the network environment I am using. Is there any function or code that lets me directly import an MNIST data set that has been downloaded manually?

I tried the following approach:

    import sys
    import pickle
    import gzip

    f = gzip.open('/data/mnist.pkl.gz', 'rb')
    if sys.version_info < (3,):
        data = pickle.load(f)
    else:
        data = pickle.load(f, encoding='bytes')
    f.close()

    import numpy as np
    (x_train, _), (x_test, _) = data

Then I get the following error message:

    Traceback (most recent call last):
      File "test.py", line 45, in <module>
        (x_train, _), (x_test, _) = data
    ValueError: too many values to unpack (expected 2)

6 Answers


Well, the keras.datasets.mnist module is really short, so you can manually simulate the same actions:

  1. Download the dataset from https://s3.amazonaws.com/img-datasets/mnist.pkl.gz
  2. Load it manually:

    import sys
    import gzip
    if sys.version_info < (3,):
        import cPickle
    else:
        import pickle as cPickle  # Python 3 merged cPickle into pickle

    f = gzip.open('mnist.pkl.gz', 'rb')
    if sys.version_info < (3,):
        data = cPickle.load(f)
    else:
        data = cPickle.load(f, encoding='bytes')
    f.close()
    (x_train, _), (x_test, _) = data
    
  • Hi sygi, thanks for the suggestion. However, I got the error message shown in the updated post. The only difference from yours is that I use pickle, and pickle itself did not seem to give an error while loading the data.
    – user785099
    Commented Nov 19, 2016 at 20:13
  • I have checked and it works on my system, with both pickle and cPickle and both Python 2 and 3. Are you sure you have the same file (md5 b39289ebd4f8755817b1352c8488b486)?
    – sygi
    Commented Nov 19, 2016 at 20:27
  • It works; I do not know why it gave an error message previously. Thanks a lot.
    – user785099
    Commented Nov 20, 2016 at 5:31
  • In my case it worked after adding the imports import sys; import pickle; import gzip; and using pickle instead of cPickle. I'm using Python 3.6.7 on macOS Mojave.
    Commented Aug 5, 2019 at 20:08
  • Great answer although I'd suggest using with open instead of f.close() so you don't have a memory leak.
    – runnerX
    Commented Sep 5, 2023 at 21:12
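
As the last comment suggests, the loading can be written with a with statement so the file is closed even if unpickling fails. A minimal sketch, assuming the same mnist.pkl.gz file as above:

    import sys
    import gzip
    import pickle

    # the 'with' block closes the file handle automatically, even on exceptions
    with gzip.open('mnist.pkl.gz', 'rb') as f:
        if sys.version_info < (3,):
            data = pickle.load(f)
        else:
            data = pickle.load(f, encoding='bytes')

    (x_train, _), (x_test, _) = data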

The Keras file is now located at a new path in Google Cloud Storage (previously it was in AWS S3):

https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

When using

    tf.keras.datasets.mnist.load_data()

you can pass a path parameter.

load_data() calls get_file(), which takes fname as a parameter. If path is a full path and the file already exists, it will not be downloaded.

Example:

# gsutil cp gs://tensorflow/tf-keras-datasets/mnist.npz /tmp/data/mnist.npz
# python3
>>> import tensorflow as tf
>>> path = '/tmp/data/mnist.npz'
>>> (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data(path)
>>> len(train_images)
60000
  • The load_data function doesn't have a path parameter any more
    – Surendra
    Commented Jan 21, 2021 at 20:14

You do not need additional code for that; you can tell load_data to load a local version in the first place:

  1. Download the file https://s3.amazonaws.com/img-datasets/mnist.npz from another computer with proper (proxy) access (the URL is taken from https://github.com/keras-team/keras/blob/master/keras/datasets/mnist.py),
  2. copy it to the directory ~/.keras/datasets/ (on Linux and macOS),
  3. and run load_data(path='mnist.npz') with the right file name, as sketched below.
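
A minimal sketch of those three steps end to end (the shell commands are shown as comments and the paths are illustrative; adjust them to your machine):

    # on a machine with network access:
    #   wget https://s3.amazonaws.com/img-datasets/mnist.npz
    # then copy it into place on the offline machine:
    #   mkdir -p ~/.keras/datasets && cp mnist.npz ~/.keras/datasets/
    from keras.datasets import mnist

    # a bare file name is resolved relative to ~/.keras/datasets/
    (x_train, y_train), (x_test, y_test) = mnist.load_data(path='mnist.npz')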
  • This is exactly what I was looking for, thanks a lot.
    – Tardis
    Commented Dec 8, 2019 at 5:32
  1. Download the file https://s3.amazonaws.com/img-datasets/mnist.npz
  2. Move mnist.npz to the ~/.keras/datasets/ directory
  3. Load the data:

    import keras
    from keras.datasets import mnist
    
    (X_train, y_train), (X_test, y_test) = mnist.load_data()
    

keras.datasets.mnist.load_data() will attempt to fetch from the remote repository even when a local file path is specified. However, the easiest workaround for loading the downloaded file is to use numpy.load(), just as load_data() itself does internally:

import numpy as np

path = '/tmp/data/mnist.npz'

with np.load(path, allow_pickle=True) as f:
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
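
As a quick sanity check after loading, the standard MNIST arrays have these shapes:

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)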

Gogasca's answer worked for me with a little adjustment. For Python 3.9, changing the code in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py so that it uses the path variable as a full path, instead of prepending the origin_folder, makes it possible to pass any local path to the downloaded file.

  1. Download the file: https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
  2. Put it in ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/, or another location of your liking.
  3. Alter ~/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.py so that load_data skips the download and uses path directly:
    def load_data(path='mnist.npz'):
        # the download step is commented out so that path is used as-is:
        # origin_folder = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/'
        # path = get_file(
        #     path, origin=origin_folder + 'mnist.npz',
        #     file_hash='731c5ac602752760c8e48fbffcf8c3b850d9dc2a2aedcf2cc48468fc17b673d1')
        with np.load(path, allow_pickle=True) as f:  # pylint: disable=unexpected-keyword-arg
            x_train, y_train = f['x_train'], f['y_train']
            x_test, y_test = f['x_test'], f['y_test']
        return (x_train, y_train), (x_test, y_test)
  4. Use the following code to load the data:
path = "/Users/username/Library/Python/3.9/lib/python/site-packages/keras/datasets/mnist.npz"
(train_images, train_labels), (test_images, test_labels ) = mnist.load_data(path=path)```
