
I am writing code for the well-known MNIST handwritten-digits problem in PyTorch. I downloaded the training and test datasets, including the labels, from the main website. The files come in the format t10k-images-idx3-ubyte.gz, which becomes t10k-images-idx3-ubyte after extraction. My dataset folder looks like this:

MNIST
 Data
  train-images-idx3-ubyte.gz
  train-labels-idx1-ubyte.gz
  t10k-images-idx3-ubyte.gz
  t10k-labels-idx1-ubyte.gz

I wrote the following code to load the data:

import torch
import torchvision

def load_dataset():
    data_path = "/home/MNIST/Data/"
    xy_trainPT = torchvision.datasets.ImageFolder(
        root=data_path, transform=torchvision.transforms.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(
        xy_trainPT, batch_size=64, num_workers=0, shuffle=True
    )
    return train_loader

My code fails with an error that ends with: Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp

How can I solve this problem? I would also like to check that my images are loaded, for example with a figure showing the first 5 images from the dataset.


3 Answers


Read the related question "Extract images from .idx3-ubyte file or GZIP via Python".
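To give a rough idea of what "extracting" involves, here is a minimal stdlib-only sketch of an IDX reader. It assumes the layout described on the MNIST website (two zero bytes, a type byte, a dimension count, then one big-endian 32-bit size per dimension, then the raw data) and only handles the unsigned-byte type that the MNIST files use. `read_idx` and `get_image` are hypothetical helper names, not part of any library.

```python
import gzip
import struct


def read_idx(path):
    """Hypothetical helper: parse an IDX file (plain or .gz) into (shape, raw bytes).

    Only the unsigned-byte data type (code 0x08), as used by MNIST, is handled.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        data = f.read()
    # Header: two zero bytes, a type code, and the number of dimensions.
    zero1, zero2, type_code, ndim = struct.unpack_from(">BBBB", data, 0)
    if zero1 != 0 or zero2 != 0:
        raise ValueError("not an IDX file")
    if type_code != 0x08:
        raise ValueError("only unsigned-byte IDX files are handled here")
    # Each dimension size is a big-endian unsigned 32-bit integer.
    shape = struct.unpack_from(">" + "I" * ndim, data, 4)
    offset = 4 + 4 * ndim
    count = 1
    for size in shape:
        count *= size
    values = data[offset:offset + count]
    if len(values) != count:
        raise ValueError("file is truncated")
    return shape, values


def get_image(shape, values, index):
    """Hypothetical helper: return image `index` as a list of pixel rows (0-255)."""
    _, rows, cols = shape
    start = index * rows * cols
    return [list(values[start + r * cols:start + (r + 1) * cols]) for r in range(rows)]
```

For example, `read_idx("/home/MNIST/Data/t10k-images-idx3-ubyte.gz")` should report the shape (10000, 28, 28) for the MNIST test images, and the labels file yields a one-dimensional shape.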

Update

You can load the data like this:

import torchvision

xy_trainPT = torchvision.datasets.MNIST(
    root="~/Handwritten_Deep_L/",
    train=True,
    download=True,
    transform=torchvision.transforms.Compose([torchvision.transforms.ToTensor()]),
)

With download=True, your code first checks whether the root directory you gave already contains the dataset.

If it does not, the dataset is downloaded from the web.

If it does, your code uses the existing files and does not download anything from the internet.

You can verify this yourself: first pass a path that contains no dataset (the data will be downloaded), then pass a path that already contains the dataset (nothing will be downloaded).
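If you already downloaded the four .gz files by hand, as in the question, one way to make them count as an existing dataset is to decompress them into the layout the torchvision docs describe, root/MNIST/raw/. Below is a minimal stdlib sketch; `place_mnist_files` is a hypothetical helper name. It assumes recent torchvision releases, which look for these uncompressed raw files; older releases checked different files, so it is worth verifying against your installed version.

```python
import gzip
import os
import shutil

# Uncompressed file names expected under root/MNIST/raw/.
MNIST_FILES = [
    "train-images-idx3-ubyte",
    "train-labels-idx1-ubyte",
    "t10k-images-idx3-ubyte",
    "t10k-labels-idx1-ubyte",
]


def place_mnist_files(src_dir, root):
    """Hypothetical helper: gunzip src_dir/<name>.gz into root/MNIST/raw/<name>."""
    raw_dir = os.path.join(root, "MNIST", "raw")
    os.makedirs(raw_dir, exist_ok=True)
    for name in MNIST_FILES:
        src = os.path.join(src_dir, name + ".gz")
        dst = os.path.join(raw_dir, name)
        # Stream-decompress so large files are never fully held in memory.
        with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    return raw_dir
```

With the question's layout, `place_mnist_files("/home/MNIST/Data", "/home/data")` would create /home/data/MNIST/raw/, and `root="/home/data"` (the parent of MNIST/, not MNIST/ itself) is then what gets passed to the dataset class.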


Welcome to Stack Overflow!

The MNIST dataset is not stored as images but in a binary format (as indicated by the ubyte extension). Therefore, ImageFolder is not the type of dataset you want. Instead, you need to use the MNIST dataset class. It can even download the data if you have not done so already :)

This is a dataset class, so just instantiate it with the proper root path, pass it to your DataLoader, and everything should work fine.

If you want to check the images, index the dataset (which calls its __getitem__ method) or take a batch from the dataloader, and save the result as a PNG file (you may need to convert the tensor to a NumPy array first).
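For the "figure with the first 5 images" part of the question, a minimal sketch could look like the following. It assumes only NumPy and Matplotlib and any dataset whose items are (image, label) pairs, such as MNIST with ToTensor(). `plot_first_images` and the output file name are hypothetical. Using a bare Figure rather than pyplot means it also works on a machine without a display.

```python
import numpy as np
from matplotlib.figure import Figure


def plot_first_images(dataset, n=5, path="first_images.png"):
    """Hypothetical helper: draw the first n (image, label) pairs side by side and save a PNG."""
    fig = Figure(figsize=(2 * n, 2))
    # squeeze=False keeps a 2-D array of axes even when n == 1.
    axes = fig.subplots(1, n, squeeze=False)[0]
    for i, ax in enumerate(axes):
        image, label = dataset[i]  # calls the dataset's __getitem__
        # A (1, 28, 28) tensor or array becomes a (28, 28) array for imshow.
        ax.imshow(np.asarray(image).squeeze(), cmap="gray")
        ax.set_title(str(int(label)))
        ax.axis("off")
    fig.savefig(path)
    return fig
```

With torchvision, this would be called as `plot_first_images(torchvision.datasets.MNIST(root="data", train=True, download=True, transform=torchvision.transforms.ToTensor()))`, and the digits' labels appear as titles above each image.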

  • Thank you. Do you mean like this? data_path = "/data/MNIST/raw" xy_trainPT = torchvision.datasets.MNIST(root=data_path, transform=torchvision.transforms.ToTensor())
    – 0Knowledge
    Commented Sep 26, 2020 at 17:05
  • Yes, a transform may be required; you can take inspiration from github.com/pytorch/examples/blob/master/mnist/main.py Commented Sep 26, 2020 at 23:11
  • But this is not working: it shows "Dataset not found. You can use download=True to download it", but the path is right.
    – 0Knowledge
    Commented Sep 27, 2020 at 5:33

Late to the party, but I came across this post because I was facing a similar issue. I already had the MNIST folder downloaded (via the PyTorch dataset) somewhere else in my repository, and I didn't want to download it again when I needed it in a different source file.

My problem was that, when passing the root argument, I was referencing the MNIST/ folder, but you should actually reference the parent folder that contains the MNIST/ directory. In fact, the docs say:

root (string) – Root directory of dataset where MNIST/raw/train-images-idx3-ubyte and MNIST/raw/t10k-images-idx3-ubyte exist.

So I figured the MNIST/ part of the path should be omitted.
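The rule from the docs can be turned into a small check. Below is a stdlib sketch with a hypothetical helper, `find_mnist_root`: given a path that points at the correct root, at MNIST/, or at MNIST/raw/, it returns the directory that should be passed as root. It only looks for the two image files named in the docs quote above.

```python
import os

# The files the docs say must exist under root/MNIST/raw/.
RAW_FILES = ("train-images-idx3-ubyte", "t10k-images-idx3-ubyte")


def find_mnist_root(path):
    """Hypothetical helper: return the directory containing MNIST/raw/, or None.

    Accepts the root itself, root/MNIST, or root/MNIST/raw.
    """
    candidate = os.path.abspath(os.path.expanduser(path))
    # Check the path, its parent, and its grandparent.
    for _ in range(3):
        raw = os.path.join(candidate, "MNIST", "raw")
        if all(os.path.isfile(os.path.join(raw, f)) for f in RAW_FILES):
            return candidate
        candidate = os.path.dirname(candidate)
    return None
```

For the case described here, `find_mnist_root("../MNIST/")` would return the absolute form of "../", which matches the corrected root.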

So in my case I had: mnist_dataset = torchvision.datasets.MNIST(root='../MNIST/', train=True, download=False)

which should be changed to: mnist_dataset = torchvision.datasets.MNIST(root='../', train=True, download=False)

Hope this helps.
