MNIST Training in PyTorch

In this tutorial, we demonstrate how to do Hyperparameter Optimization (HPO) using AutoGluon with PyTorch. AutoGluon is a framework agnostic HPO toolkit, which is compatible with any training code written in python. The PyTorch code used in this tutorial is adapted from this git repo. In your applications, this code can be replaced with your own PyTorch code.

Import the packages:

import torch
import torch.nn as nn
import torch.nn.functional as F

import torchvision
import torchvision.transforms as transforms
from import tqdm

Start with an MNIST Example

Data Transforms

We first apply standard image transforms to our training and validation data:

transform = transforms.Compose([
   transforms.Normalize((0.1307,), (0.3081,))

# the datasets
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)
Downloading to ./data/MNIST/raw/train-images-idx3-ubyte.gz
HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
Downloading to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
/pytorch/torch/csrc/utils/tensor_numpy.cpp:141: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program.

Main Training Loop

The following train_mnist function represents normal training code a user would write for training on MNIST dataset. Python users typically use an argparser to conveniently change default values. The only additional argument you need to add to your existing python function is a reporter object that is used to store performance achieved under different hyperparameter settings.

def train_mnist(args, reporter):
    # get variables from args
    lr =
    wd = args.wd
    epochs = args.epochs
    net =
    print('lr: {}, wd: {}'.format(lr, wd))

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    # Model
    net =

    if device == 'cuda':
        net = nn.DataParallel(net)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(net.parameters(),, momentum=0.9, weight_decay=wd)

    # datasets and dataloaders
    trainset = torchvision.datasets.MNIST(root='./data', train=True, download=False, transform=transform)
    trainloader =, batch_size=128, shuffle=True, num_workers=2)

    testset = torchvision.datasets.MNIST(root='./data', train=False, download=False, transform=transform)
    testloader =, batch_size=128, shuffle=False, num_workers=2)

    # Training
    def train(epoch):
        train_loss, correct, total = 0, 0, 0
        for batch_idx, (inputs, targets) in enumerate(trainloader):
            inputs, targets =,
            outputs = net(inputs)
            loss = criterion(outputs, targets)

    def test(epoch):
        test_loss, correct, total = 0, 0, 0
        with torch.no_grad():
            for batch_idx, (inputs, targets) in enumerate(testloader):
                inputs, targets =,
                outputs = net(inputs)
                loss = criterion(outputs, targets)

                test_loss += loss.item()
                _, predicted = outputs.max(1)
                total += targets.size(0)
                correct += predicted.eq(targets).sum().item()

        acc = 100.*correct/total
        # 'epoch' reports the number of epochs done
        reporter(epoch=epoch+1, accuracy=acc)

    for epoch in tqdm(range(0, epochs)):

AutoGluon HPO

In this section, we cover how to define a searchable network architecture, convert the training function to be searchable, create the scheduler, and then launch the experiment.

Define a Searchable Network Achitecture

Let’s define a ‘dynamic’ network with searchable configurations by simply adding a decorator autogluon.obj(). In this example, we only search two arguments hidden_conv and hidden_fc, which represent the hidden channels in convolutional layer and fully connected layer. More info about searchable space is available at

import autogluon as ag

@ag.obj(, 12),, 120, 160),
class Net(nn.Module):
    def __init__(self, hidden_conv, hidden_fc):
        self.conv1 = nn.Conv2d(1, hidden_conv, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(hidden_conv, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, hidden_fc)
        self.fc2 = nn.Linear(hidden_fc, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Convert the Training Function to Be Searchable

We can simply add a decorator autogluon.args() to convert the train_mnist function argument values to be tuned by AutoGluon’s hyperparameter optimizer. In the example below, we specify that the lr argument is a real-value that should be searched on a log-scale in the range 0.01 - 0.2. Before passing lr to your train function, AutoGluon always selects an actual floating point value to assign to lr so you do not need to make any special modifications to your existing code to accommodate the hyperparameter search.

    lr =, 0.2, log=True),
    wd =, 5e-4, log=True),
    net = Net(),
def ag_train_mnist(args, reporter):
    return train_mnist(args, reporter)

Create the Scheduler and Launch the Experiment

myscheduler = ag.scheduler.FIFOScheduler(ag_train_mnist,
                                         resource={'num_cpus': 4, 'num_gpus': 1},
(Remote: Remote REMOTE_ID: 0,
    <Remote: 'inproc://' processes=1 threads=8, memory=33.24 GB>, Resource: NodeResourceManager(8 CPUs, 1 GPUs))
Starting Experiments
Num of Finished Tasks is 0
Num of Pending Tasks is 2
HBox(children=(FloatProgress(value=0.0, max=2.0), HTML(value='')))
lr: 0.0447213595, wd: 0.0002236068
HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value='')))
lr: 0.028245913732173278, wd: 0.00017160776862349322
HBox(children=(FloatProgress(value=0.0, max=5.0), HTML(value='')))

We plot the test accuracy achieved over the course of training under each hyperparameter configuration that AutoGluon tried out (represented as different colors).

print('The Best Configuration and Accuracy are: {}, {}'.format(myscheduler.get_best_config(),
The Best Configuration and Accuracy are: {'lr': 0.0447213595, 'net.hidden_conv': 9, 'net.hidden_fc.choice': 0, 'wd': 0.0002236068}, 98.82