
I am new to PyTorch. The following is a basic example of using the nn module to train a simple model with one hidden layer on some random data (from here):

import torch
N, D_in, H, D_out = 64, 1000, 100, 10

x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
for t in range(500):
    y_pred = model(x)

    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

As far as I understand, the batch size is equal to 1 in this example; in other words, a single point (out of 64) is used to calculate gradients and update the parameters. My question is: how do I modify this example to train the model with a batch size greater than one?

2 Answers


The easiest and cleanest way to add batching to a basic PyTorch example is to use torch.utils.data.DataLoader together with torch.utils.data.TensorDataset.

Dataset stores the samples and their corresponding labels, and DataLoader wraps an iterable around the Dataset to enable easy access to the samples.

DataLoader will take care of creating batches for you.
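To make the mechanics concrete, here is a minimal sketch (the toy shapes are chosen just for illustration) of how TensorDataset pairs samples with labels and how DataLoader yields them in batches:

import torch
from torch.utils.data import DataLoader, TensorDataset

# 8 toy samples with 3 features each, and 8 matching labels
x = torch.randn(8, 3)
y = torch.randn(8, 1)

# TensorDataset pairs row i of x with row i of y
dataset = TensorDataset(x, y)
print(dataset[0])  # a single (sample, label) pair: (x[0], y[0])

# DataLoader slices the dataset into batches of 4 and (optionally) shuffles
loader = DataLoader(dataset, batch_size=4, shuffle=True)
for x_batch, y_batch in loader:
    print(x_batch.shape, y_batch.shape)  # torch.Size([4, 3]) torch.Size([4, 1])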

Building on your question, here is a complete code snippet where we iterate over a dataset of 10,000 examples for 2 epochs with a batch size of 64:

import torch
from torch.utils.data import DataLoader, TensorDataset


# Create the dataset with N_SAMPLES samples
N_SAMPLES, D_in, H, D_out = 10000, 1000, 100, 10

x = torch.randn(N_SAMPLES, D_in)
y = torch.randn(N_SAMPLES, D_out)

# Define the batch size and the number of epochs
BATCH_SIZE = 64
N_EPOCHS = 2

# Use torch.utils.data to create a DataLoader 
# that will take care of creating batches 
dataset = TensorDataset(x, y)
dataloader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

# Define model, loss and optimizer
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Get the dataset size for printing (it is equal to N_SAMPLES)
dataset_size = len(dataloader.dataset)

# Loop over epochs
for epoch in range(N_EPOCHS):
    print(f"Epoch {epoch + 1}\n-------------------------------")

    # Loop over batches in an epoch using DataLoader
    for id_batch, (x_batch, y_batch) in enumerate(dataloader):

        y_batch_pred = model(x_batch)

        loss = loss_fn(y_batch_pred, y_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Every 100 batches, print the loss for this batch
        # as well as the number of examples processed so far 
        if id_batch % 100 == 0:
            loss, current = loss.item(), (id_batch + 1) * len(x_batch)
            print(f"loss: {loss:>7f}  [{current:>5d}/{dataset_size:>5d}]")

The output should be something like:

Epoch 1
-------------------------------
loss: 643.433716  [   64/10000]
loss: 648.195435  [ 6464/10000]
Epoch 2
-------------------------------
loss: 613.619873  [   64/10000]
loss: 625.018555  [ 6464/10000]
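A small aside on these printed values: because the loss is created with reduction='sum', each number is the sum of squared errors over one batch, so it scales with BATCH_SIZE. With reduction='mean' the loss is averaged over all elements and stays comparable if you change the batch size. A quick sketch of the relationship (toy tensors, just for illustration):

import torch

pred = torch.randn(64, 10)
target = torch.randn(64, 10)

loss_sum = torch.nn.MSELoss(reduction='sum')(pred, target)
loss_mean = torch.nn.MSELoss(reduction='mean')(pred, target)

# 'sum' adds up all 64 * 10 squared errors; 'mean' divides that sum by the
# number of elements, so the two differ by a factor of 64 * 10.
print(torch.isclose(loss_sum / (64 * 10), loss_mean))  # tensor(True)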

In fact, N is the batch size, so you just need to modify N (currently it is set to 64). That way every training batch contains 64 vectors of size/dimension D_in.

I checked the link you posted; you can also take a look at the comments there - they contain some explanation too :)

# -*- coding: utf-8 -*-
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
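To address the discussion in the comments below: in this snippet N is both the batch size and the total number of training examples, so the loop runs 500 times over the same single batch. If you had more data, a mini-batch loop for the same NumPy network might look like the following sketch (the larger N_SAMPLES dataset, the epoch count, and the per-epoch shuffling are assumptions added for illustration, not part of the original example):

import numpy as np

N_SAMPLES, D_in, H, D_out = 10000, 1000, 100, 10
BATCH_SIZE = 64
N_EPOCHS = 2

x = np.random.randn(N_SAMPLES, D_in)
y = np.random.randn(N_SAMPLES, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for epoch in range(N_EPOCHS):
    # Shuffle once per epoch, then walk through the data in chunks of BATCH_SIZE
    perm = np.random.permutation(N_SAMPLES)
    for start in range(0, N_SAMPLES, BATCH_SIZE):
        idx = perm[start:start + BATCH_SIZE]
        x_batch, y_batch = x[idx], y[idx]

        # Forward pass on the mini-batch (same math as above, BATCH_SIZE rows)
        h = x_batch.dot(w1)
        h_relu = np.maximum(h, 0)
        y_pred = h_relu.dot(w2)

        loss = np.square(y_pred - y_batch).sum()

        # Backward pass
        grad_y_pred = 2.0 * (y_pred - y_batch)
        grad_w2 = h_relu.T.dot(grad_y_pred)
        grad_h_relu = grad_y_pred.dot(w2.T)
        grad_h = grad_h_relu.copy()
        grad_h[h < 0] = 0
        grad_w1 = x_batch.T.dot(grad_h)

        # Update weights
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2

    print(epoch, loss)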
  • If N is the batch size, where is the number of training examples defined?
    – oezguensi
    Commented Nov 29, 2018 at 3:48
  • @oezguensi It is N too - there is only one batch here, with batch size 64. This example iterates just 500 times over the same batch: number_of_training_examples = num_batches * batch_size, thus 1 * 64 = 64
    – MBT
    Commented Nov 29, 2018 at 8:40
  • That seems quite useless. Why should we iterate over the same training examples again and again? Is the example just wrong for the sake of simplicity, or am I missing something?
    – oezguensi
    Commented Nov 30, 2018 at 0:07
  • @oezguensi Yes, in reality you wouldn't do that. But I guess it is just for illustration purposes, showing how the dimensions work out within a batch while keeping it simple.
    – MBT
    Commented Dec 2, 2018 at 10:30
  • Is t the number of epochs in this example? I believe there needs to be another for loop over the training examples right after for t in range(500):
    – doplano
    Commented Oct 14, 2019 at 7:57
