.. _sec_custom_advancedhpo:
Getting started with Advanced HPO Algorithms
============================================
Loading libraries
-----------------
.. code:: python
# Basic utils for folder manipulations etc
import time
import multiprocessing # to count the number of CPUs available
# External tools to load and process data
import numpy as np
import pandas as pd
# MXNet (NeuralNets)
import mxnet as mx
from mxnet import gluon, autograd
from mxnet.gluon import nn
# AutoGluon and HPO tools
import autogluon.core as ag
from autogluon.mxnet.utils import load_and_split_openml_data
Check the MXNet version. Version 1.5 or later should work.
.. code:: python
mx.__version__
.. parsed-literal::
:class: output
'1.7.0'
You can also check the version of AutoGluon, including the specific
commit, and confirm that it matches what you expect.
.. code:: python
import autogluon.core.version
ag.version.__version__
.. parsed-literal::
:class: output
'0.0.15b20201023'
Hyperparameter Optimization of a 2-layer MLP
--------------------------------------------
Setting up the context
~~~~~~~~~~~~~~~~~~~~~~
Here we declare a few "environment variables" that set the context for
what we are doing.
.. code:: python
OPENML_TASK_ID = 6 # describes the problem we will tackle
RATIO_TRAIN_VALID = 0.33 # split of the training data used for validation
RESOURCE_ATTR_NAME = 'epoch' # how we measure resources (explained further below)
REWARD_ATTR_NAME = 'objective' # how we measure performance (explained further below)
NUM_CPUS = multiprocessing.cpu_count()
Preparing the data
~~~~~~~~~~~~~~~~~~
We will use a multi-class classification task from OpenML. Data
preparation consists of the following steps:

- Impute missing values, using the 'mean' strategy of
  ``sklearn.impute.SimpleImputer``
- Split the training set into training and validation parts
- Standardize inputs to mean 0, variance 1
.. code:: python
X_train, X_valid, y_train, y_valid, n_classes = load_and_split_openml_data(
OPENML_TASK_ID, RATIO_TRAIN_VALID, download_from_openml=False)
n_classes
.. parsed-literal::
:class: output
26
The problem has 26 classes.
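The preprocessing steps listed above can be sketched in a few lines.
This is a simplified, dependency-free illustration and not the code
inside ``load_and_split_openml_data``. The helper names
(``column_means``, ``impute``, ``column_stds``, ``standardize``) are made
up for this example, the data is a toy example, and computing the
statistics on the training part only is an assumption.

.. code:: python

   import math

   def column_means(rows):
       """Mean of every column, ignoring missing entries (None)."""
       means = []
       for j in range(len(rows[0])):
           observed = [row[j] for row in rows if row[j] is not None]
           means.append(sum(observed) / len(observed))
       return means

   def impute(rows, means):
       """Replace each missing entry by the mean of its column."""
       return [[means[j] if v is None else v for j, v in enumerate(row)]
               for row in rows]

   def column_stds(rows, means):
       """Population standard deviation of every column."""
       n = len(rows)
       return [math.sqrt(sum((row[j] - m) ** 2 for row in rows) / n)
               for j, m in enumerate(means)]

   def standardize(rows, means, stds):
       """Shift and scale every column (assumes no constant columns)."""
       return [[(v - means[j]) / stds[j] for j, v in enumerate(row)]
               for row in rows]

   # Toy data, already split into training and validation parts
   train = [[1.0, None], [3.0, 4.0], [5.0, 8.0]]
   valid = [[None, 6.0]]

   means = column_means(train)   # statistics from the training part only
   train = impute(train, means)
   valid = impute(valid, means)
   stds = column_stds(train, means)
   train_std = standardize(train, means, stds)
   valid_std = standardize(valid, means, stds)

After these steps, every column of ``train_std`` has mean 0 and variance
1, and the validation part is transformed with the same statistics.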
Declaring a model specifying a hyperparameter space with AutoGluon
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The model is a two-layer MLP, and we optimize over:

- the number of units in the first layer
- the number of units in the second layer
- the dropout rate after each layer
- the learning rate
- the batch size
- the scale of the uniform weight initialization of each layer

The ``@ag.args`` decorator allows us to specify the space we will
optimize over. Its syntax matches that of ConfigSpace.
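Several ranges in the search space of ``run_mlp_openml`` span orders of
magnitude and are declared with ``log=True``. The sketch below
illustrates what this usually means, assuming the common convention that
values are drawn uniformly in log space. The helper
``sample_log_uniform`` is made up for this example and is not part of
AutoGluon, whose searchers may encode the space differently internally.

.. code:: python

   import math
   import random

   def sample_log_uniform(lower, upper, rng):
       """Draw a value whose logarithm is uniform on [log(lower), log(upper)]."""
       return math.exp(rng.uniform(math.log(lower), math.log(upper)))

   rng = random.Random(0)
   # Same range as the learning rate in the example: 1e-6 to 1
   samples = [sample_log_uniform(1e-6, 1.0, rng) for _ in range(10000)]

   # 1e-3 is the geometric midpoint of the range, so roughly half of the
   # samples should fall below it
   frac_small = sum(s < 1e-3 for s in samples) / len(samples)

With plain uniform sampling on the same range, only about 0.1% of the
samples would fall below ``1e-3``, so small learning rates would hardly
be explored.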
The body of the function ``run_mlp_openml`` is pretty simple:

- it reads the hyperparameters given via the decorator
- it defines a 2-layer MLP with dropout
- it declares a trainer with the 'adam' optimizer and the provided
  learning rate
- it trains the NN for a number of epochs (most of that is boilerplate
  code from ``mxnet``)
- it calls the ``reporter`` at the end of each epoch to keep track of
  the training history in the hyperparameter optimization

**Note**: The number of epochs and the hyperparameter space are reduced
to make for a shorter experiment.
.. code:: python

   @ag.args(n_units_1=ag.space.Int(lower=16, upper=128),
            n_units_2=ag.space.Int(lower=16, upper=128),
            dropout_1=ag.space.Real(lower=0, upper=.75),
            dropout_2=ag.space.Real(lower=0, upper=.75),
            learning_rate=ag.space.Real(lower=1e-6, upper=1, log=True),
            batch_size=ag.space.Int(lower=8, upper=128),
            scale_1=ag.space.Real(lower=0.001, upper=10, log=True),
            scale_2=ag.space.Real(lower=0.001, upper=10, log=True),
            epochs=9)
   def run_mlp_openml(args, reporter, **kwargs):
       # Time stamp for elapsed_time
       ts_start = time.time()
       # Unwrap hyperparameters
       n_units_1 = args.n_units_1
       n_units_2 = args.n_units_2
       dropout_1 = args.dropout_1
       dropout_2 = args.dropout_2
       scale_1 = args.scale_1
       scale_2 = args.scale_2
       batch_size = args.batch_size
       learning_rate = args.learning_rate

       ctx = mx.cpu()
       net = nn.Sequential()
       with net.name_scope():
           # Layer 1
           net.add(nn.Dense(n_units_1, activation='relu',
                            weight_initializer=mx.initializer.Uniform(scale=scale_1)))
           # Dropout
           net.add(gluon.nn.Dropout(dropout_1))
           # Layer 2
           net.add(nn.Dense(n_units_2, activation='relu',
                            weight_initializer=mx.initializer.Uniform(scale=scale_2)))
           # Dropout
           net.add(gluon.nn.Dropout(dropout_2))
           # Output
           net.add(nn.Dense(n_classes))
       net.initialize(ctx=ctx)

       trainer = gluon.Trainer(net.collect_params(), 'adam',
                               {'learning_rate': learning_rate})

       for epoch in range(args.epochs):
           ts_epoch = time.time()

           train_iter = mx.io.NDArrayIter(
               data={'data': X_train},
               label={'label': y_train},
               batch_size=batch_size,
               shuffle=True)
           valid_iter = mx.io.NDArrayIter(
               data={'data': X_valid},
               label={'label': y_valid},
               batch_size=batch_size,
               shuffle=False)

           metric = mx.metric.Accuracy()
           loss = gluon.loss.SoftmaxCrossEntropyLoss()

           for batch in train_iter:
               data = batch.data[0].as_in_context(ctx)
               label = batch.label[0].as_in_context(ctx)
               with autograd.record():
                   output = net(data)
                   L = loss(output, label)
               L.backward()
               trainer.step(data.shape[0])
               metric.update([label], [output])

           name, train_acc = metric.get()

           metric = mx.metric.Accuracy()
           for batch in valid_iter:
               data = batch.data[0].as_in_context(ctx)
               label = batch.label[0].as_in_context(ctx)
               output = net(data)
               metric.update([label], [output])

           name, val_acc = metric.get()

           print('Epoch %d ; Time: %f ; Training: %s=%f ; Validation: %s=%f' % (
               epoch + 1, time.time() - ts_start, name, train_acc, name, val_acc))

           ts_now = time.time()
           eval_time = ts_now - ts_epoch
           elapsed_time = ts_now - ts_start

           # The resource reported back (as 'epoch') is the number of epochs
           # done, starting at 1
           reporter(
               epoch=epoch + 1,
               objective=float(val_acc),
               eval_time=eval_time,
               time_step=ts_now,
               elapsed_time=elapsed_time)
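The reporting pattern used in ``run_mlp_openml`` can be tried out
without MXNet or a scheduler. The following sketch is only an
illustration: ``RecordingReporter`` and ``toy_train_fn`` are hypothetical
stand-ins, and the accuracy values are fake. The real reporter passed in
by an AutoGluon scheduler communicates each record back to the
scheduler instead of storing it in a list.

.. code:: python

   import time

   class RecordingReporter:
       """Hypothetical stand-in that stores every reported record."""

       def __init__(self):
           self.history = []

       def __call__(self, **kwargs):
           self.history.append(dict(kwargs))

   def toy_train_fn(epochs, reporter):
       ts_start = time.time()
       for epoch in range(epochs):
           ts_epoch = time.time()
           # Fake validation accuracy that improves with every epoch
           val_acc = 1.0 - 0.5 / (epoch + 1)
           ts_now = time.time()
           # One report per epoch; the resource 'epoch' starts at 1
           reporter(
               epoch=epoch + 1,
               objective=val_acc,
               eval_time=ts_now - ts_epoch,
               time_step=ts_now,
               elapsed_time=ts_now - ts_start)

   reporter = RecordingReporter()
   toy_train_fn(3, reporter)
   epochs_reported = [record["epoch"] for record in reporter.history]

Each record carries the resource attribute (``epoch``) and the reward
attribute (``objective``), which are the two fields a scheduler needs in
order to compare configurations at the same resource level.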
**Note**: The annotation ``epochs=9`` specifies the maximum number of
epochs for training. It becomes available as ``args.epochs``.
Importantly, it is also processed by ``HyperbandScheduler`` below in
order to set its ``max_t`` attribute.
**Recommendation**: Whenever writing training code to be passed as
``train_fn`` to a scheduler, if this training code reports a resource
(or time) attribute, the corresponding maximum resource value should be
included in ``train_fn.args``:
- If the resource attribute (``time_attr`` of scheduler) in
``train_fn`` is ``epoch``, make sure to include ``epochs=XYZ`` in the
annotation. This allows the scheduler to read ``max_t`` from
``train_fn.args.epochs``. This case corresponds to our example here.
- If the resource attribute is something other than ``epoch``, you can
  also include the annotation ``max_t=XYZ``, which allows the scheduler
  to read ``max_t`` from ``train_fn.args.max_t``.

Annotating the training function with the correct value for ``max_t``
simplifies scheduler creation (since ``max_t`` does not have to be
passed) and avoids inconsistencies between ``train_fn`` and the
scheduler.
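The lookup described in this recommendation can be pictured as follows.
This is a hypothetical helper written for illustration, not AutoGluon's
implementation. The name ``resolve_max_t`` and the precedence (explicit
argument first, then ``epochs``, then ``max_t``) are assumptions.

.. code:: python

   from types import SimpleNamespace

   def resolve_max_t(train_fn_args, time_attr, max_t=None):
       """Pick the maximum resource value for a scheduler (illustrative)."""
       if max_t is not None:
           # An explicitly passed value is used as is
           return max_t
       if time_attr == "epoch" and hasattr(train_fn_args, "epochs"):
           # Resource is measured in epochs: read the 'epochs' annotation
           return train_fn_args.epochs
       if hasattr(train_fn_args, "max_t"):
           # Any other resource attribute: read the 'max_t' annotation
           return train_fn_args.max_t
       raise ValueError("max_t must be passed or annotated in train_fn")

   from_epochs = resolve_max_t(SimpleNamespace(epochs=9), "epoch")
   from_max_t = resolve_max_t(SimpleNamespace(max_t=81), "iteration")
   explicit = resolve_max_t(SimpleNamespace(epochs=9), "epoch", max_t=27)

Keeping the value in one place, the annotation, means the training loop
and the scheduler cannot disagree about how long a trial may run.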
Running the Hyperparameter Optimization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can use the following schedulers:
- FIFO (``fifo``)
- Hyperband (either the stopping (``hbs``) or promotion (``hbp``)
variant)
And the following searchers:
- Random search (``random``)
- Gaussian process based Bayesian optimization (``bayesopt``)
- SkOpt Bayesian optimization (``skopt``; only with FIFO scheduler)
Note that the method known as (asynchronous) Hyperband uses random
search. Combining Hyperband scheduling with the ``bayesopt`` searcher
results in a novel method called asynchronous BOHB.

Pick the combination you are interested in. A single experiment takes
around 120 seconds (see the ``time_out`` parameter), so repeating every
combination over multiple runs can take a fair bit of time. In real
life, you will want to choose a larger ``time_out`` in order to obtain
good performance.
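The combinations listed above can be summarized in a small lookup. The
function ``describe_method`` is a hypothetical helper for this tutorial,
not an AutoGluon API. It only encodes the statements made in the text,
including the restriction that ``skopt`` works with the FIFO scheduler
only.

.. code:: python

   def describe_method(scheduler, searcher):
       """Name the HPO method that a scheduler/searcher pair corresponds to."""
       if scheduler not in ("fifo", "hbs", "hbp"):
           raise ValueError("unknown scheduler: %s" % scheduler)
       if searcher not in ("random", "bayesopt", "skopt"):
           raise ValueError("unknown searcher: %s" % searcher)
       if searcher == "skopt" and scheduler != "fifo":
           raise ValueError("skopt is only supported with the FIFO scheduler")
       if scheduler == "fifo":
           names = {
               "random": "random search",
               "bayesopt": "GP-based Bayesian optimization",
               "skopt": "SkOpt Bayesian optimization",
           }
           return names[searcher]
       variant = "stopping" if scheduler == "hbs" else "promotion"
       if searcher == "random":
           base = "asynchronous Hyperband"
       else:
           base = "asynchronous BOHB"
       return "%s (%s variant)" % (base, variant)

   chosen = describe_method("hbs", "bayesopt")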
.. code:: python
SCHEDULER = "hbs"
SEARCHER = "bayesopt"
.. code:: python

   def compute_error(df):
       return 1.0 - df["objective"]

   def compute_runtime(df, start_timestamp):
       return df["time_step"] - start_timestamp

   def process_training_history(task_dicts, start_timestamp,
                                runtime_fn=compute_runtime,
                                error_fn=compute_error):
       task_dfs = []
       for task_id in task_dicts:
           task_df = pd.DataFrame(task_dicts[task_id])
           task_df = task_df.assign(task_id=task_id,
                                    runtime=runtime_fn(task_df, start_timestamp),
                                    error=error_fn(task_df),
                                    target_epoch=task_df["epoch"].iloc[-1])
           task_dfs.append(task_df)

       result = pd.concat(task_dfs, axis="index", ignore_index=True, sort=True)
       # re-order by runtime
       result = result.sort_values(by="runtime")
       # calculate incumbent best: the cumulative minimum of the error
       result = result.assign(best=result["error"].cummin())
       return result

   resources = dict(num_cpus=NUM_CPUS, num_gpus=0)
.. code:: python

   search_options = {
       'num_init_random': 2,
       'debug_log': True}
   if SCHEDULER == 'fifo':
       myscheduler = ag.scheduler.FIFOScheduler(
           run_mlp_openml,
           resource=resources,
           searcher=SEARCHER,
           search_options=search_options,
           time_out=120,
           time_attr=RESOURCE_ATTR_NAME,
           reward_attr=REWARD_ATTR_NAME)
   else:
       # This setup uses rung levels at 1, 3, 9 epochs. We just use a single
       # bracket, so this is in fact successive halving (Hyperband would use
       # more than 1 bracket).
       # Also note that since we do not use the max_t argument of
       # HyperbandScheduler, this value is obtained from train_fn.args.epochs.
       sch_type = 'stopping' if SCHEDULER == 'hbs' else 'promotion'
       myscheduler = ag.scheduler.HyperbandScheduler(
           run_mlp_openml,
           resource=resources,
           searcher=SEARCHER,
           search_options=search_options,
           time_out=120,
           time_attr=RESOURCE_ATTR_NAME,
           reward_attr=REWARD_ATTR_NAME,
           type=sch_type,
           grace_period=1,
           reduction_factor=3,
           brackets=1)

   # run tasks
   myscheduler.run()
   myscheduler.join_jobs()

   results_df = process_training_history(
       myscheduler.training_history.copy(),
       start_timestamp=myscheduler._start_time)
.. parsed-literal::
:class: output
Epoch 1 ; Time: 0.489615 ; Training: accuracy=0.260079 ; Validation: accuracy=0.531250
Epoch 2 ; Time: 1.048678 ; Training: accuracy=0.496365 ; Validation: accuracy=0.655247
Epoch 3 ; Time: 1.473180 ; Training: accuracy=0.559650 ; Validation: accuracy=0.694686
Epoch 4 ; Time: 1.898949 ; Training: accuracy=0.588896 ; Validation: accuracy=0.711063
Epoch 5 ; Time: 2.334140 ; Training: accuracy=0.609385 ; Validation: accuracy=0.726939
Epoch 6 ; Time: 2.765212 ; Training: accuracy=0.628139 ; Validation: accuracy=0.745321
Epoch 7 ; Time: 3.190049 ; Training: accuracy=0.641193 ; Validation: accuracy=0.750501
Epoch 8 ; Time: 3.613669 ; Training: accuracy=0.653751 ; Validation: accuracy=0.763202
Epoch 9 ; Time: 4.036328 ; Training: accuracy=0.665482 ; Validation: accuracy=0.766043
Epoch 1 ; Time: 0.466270 ; Training: accuracy=0.333744 ; Validation: accuracy=0.634825
Epoch 2 ; Time: 0.755745 ; Training: accuracy=0.455375 ; Validation: accuracy=0.690464
Epoch 3 ; Time: 1.017018 ; Training: accuracy=0.485618 ; Validation: accuracy=0.722809
Epoch 4 ; Time: 1.273963 ; Training: accuracy=0.507396 ; Validation: accuracy=0.729512
Epoch 5 ; Time: 1.531135 ; Training: accuracy=0.508629 ; Validation: accuracy=0.736383
Epoch 6 ; Time: 1.785283 ; Training: accuracy=0.526956 ; Validation: accuracy=0.736383
Epoch 7 ; Time: 2.117464 ; Training: accuracy=0.528189 ; Validation: accuracy=0.740070
Epoch 8 ; Time: 2.378113 ; Training: accuracy=0.530572 ; Validation: accuracy=0.731188
Epoch 9 ; Time: 2.632645 ; Training: accuracy=0.537640 ; Validation: accuracy=0.718284
Epoch 1 ; Time: 3.734678 ; Training: accuracy=0.040285 ; Validation: accuracy=0.035162
Epoch 1 ; Time: 0.439347 ; Training: accuracy=0.040268 ; Validation: accuracy=0.034201
Epoch 1 ; Time: 0.476316 ; Training: accuracy=0.038455 ; Validation: accuracy=0.050618
Epoch 1 ; Time: 0.600889 ; Training: accuracy=0.413047 ; Validation: accuracy=0.652996
Epoch 2 ; Time: 1.108710 ; Training: accuracy=0.626078 ; Validation: accuracy=0.728155
Epoch 3 ; Time: 1.609867 ; Training: accuracy=0.679045 ; Validation: accuracy=0.773351
Epoch 4 ; Time: 2.109448 ; Training: accuracy=0.708803 ; Validation: accuracy=0.801640
Epoch 5 ; Time: 2.611935 ; Training: accuracy=0.729940 ; Validation: accuracy=0.806830
Epoch 6 ; Time: 3.118732 ; Training: accuracy=0.749337 ; Validation: accuracy=0.817375
Epoch 7 ; Time: 3.608914 ; Training: accuracy=0.755471 ; Validation: accuracy=0.830767
Epoch 8 ; Time: 4.111095 ; Training: accuracy=0.766247 ; Validation: accuracy=0.842149
Epoch 9 ; Time: 4.604150 ; Training: accuracy=0.779675 ; Validation: accuracy=0.849347
Epoch 1 ; Time: 0.604789 ; Training: accuracy=0.194131 ; Validation: accuracy=0.363378
Epoch 1 ; Time: 0.766593 ; Training: accuracy=0.195778 ; Validation: accuracy=0.445302
Epoch 1 ; Time: 1.723399 ; Training: accuracy=0.333748 ; Validation: accuracy=0.536195
Epoch 2 ; Time: 3.289454 ; Training: accuracy=0.402570 ; Validation: accuracy=0.538721
Epoch 3 ; Time: 4.809397 ; Training: accuracy=0.411111 ; Validation: accuracy=0.563468
Epoch 1 ; Time: 0.367575 ; Training: accuracy=0.534186 ; Validation: accuracy=0.763053
Epoch 2 ; Time: 0.672390 ; Training: accuracy=0.734103 ; Validation: accuracy=0.816927
Epoch 3 ; Time: 0.972057 ; Training: accuracy=0.787711 ; Validation: accuracy=0.852178
Epoch 4 ; Time: 1.274396 ; Training: accuracy=0.815340 ; Validation: accuracy=0.868141
Epoch 5 ; Time: 1.567892 ; Training: accuracy=0.835711 ; Validation: accuracy=0.875624
Epoch 6 ; Time: 1.864296 ; Training: accuracy=0.846680 ; Validation: accuracy=0.885434
Epoch 7 ; Time: 2.157487 ; Training: accuracy=0.858887 ; Validation: accuracy=0.903725
Epoch 8 ; Time: 2.475519 ; Training: accuracy=0.869526 ; Validation: accuracy=0.904390
Epoch 9 ; Time: 2.767120 ; Training: accuracy=0.870515 ; Validation: accuracy=0.911041
Epoch 1 ; Time: 0.776561 ; Training: accuracy=0.379305 ; Validation: accuracy=0.647315
Epoch 2 ; Time: 1.487786 ; Training: accuracy=0.600248 ; Validation: accuracy=0.714262
Epoch 3 ; Time: 2.248972 ; Training: accuracy=0.653642 ; Validation: accuracy=0.760403
Epoch 4 ; Time: 2.986712 ; Training: accuracy=0.682781 ; Validation: accuracy=0.772483
Epoch 5 ; Time: 3.712490 ; Training: accuracy=0.704801 ; Validation: accuracy=0.789933
Epoch 6 ; Time: 4.465289 ; Training: accuracy=0.719868 ; Validation: accuracy=0.801510
Epoch 7 ; Time: 5.175647 ; Training: accuracy=0.732202 ; Validation: accuracy=0.804530
Epoch 8 ; Time: 5.891556 ; Training: accuracy=0.741308 ; Validation: accuracy=0.814430
Epoch 9 ; Time: 6.606037 ; Training: accuracy=0.749338 ; Validation: accuracy=0.822315
Epoch 1 ; Time: 0.361260 ; Training: accuracy=0.638268 ; Validation: accuracy=0.816761
Epoch 2 ; Time: 0.710399 ; Training: accuracy=0.799670 ; Validation: accuracy=0.865813
Epoch 3 ; Time: 0.999872 ; Training: accuracy=0.839423 ; Validation: accuracy=0.896575
Epoch 4 ; Time: 1.293569 ; Training: accuracy=0.857649 ; Validation: accuracy=0.895078
Epoch 5 ; Time: 1.584750 ; Training: accuracy=0.873237 ; Validation: accuracy=0.909046
Epoch 6 ; Time: 1.892689 ; Training: accuracy=0.876371 ; Validation: accuracy=0.911706
Epoch 7 ; Time: 2.191382 ; Training: accuracy=0.880990 ; Validation: accuracy=0.915032
Epoch 8 ; Time: 2.489152 ; Training: accuracy=0.892784 ; Validation: accuracy=0.927004
Epoch 9 ; Time: 2.782486 ; Training: accuracy=0.897072 ; Validation: accuracy=0.926671
Epoch 1 ; Time: 0.301978 ; Training: accuracy=0.082293 ; Validation: accuracy=0.094507
Epoch 1 ; Time: 0.342196 ; Training: accuracy=0.229427 ; Validation: accuracy=0.393692
Epoch 1 ; Time: 1.136153 ; Training: accuracy=0.499503 ; Validation: accuracy=0.702553
Epoch 2 ; Time: 2.205493 ; Training: accuracy=0.627901 ; Validation: accuracy=0.756634
Epoch 3 ; Time: 3.496971 ; Training: accuracy=0.671751 ; Validation: accuracy=0.783171
Epoch 4 ; Time: 4.714276 ; Training: accuracy=0.691313 ; Validation: accuracy=0.787874
Epoch 5 ; Time: 5.796735 ; Training: accuracy=0.688495 ; Validation: accuracy=0.766208
Epoch 6 ; Time: 6.902215 ; Training: accuracy=0.686257 ; Validation: accuracy=0.772254
Epoch 7 ; Time: 7.979167 ; Training: accuracy=0.678631 ; Validation: accuracy=0.740846
Epoch 8 ; Time: 9.112805 ; Training: accuracy=0.670424 ; Validation: accuracy=0.725059
Epoch 9 ; Time: 10.199127 ; Training: accuracy=0.659566 ; Validation: accuracy=0.725395
Epoch 1 ; Time: 0.664402 ; Training: accuracy=0.554185 ; Validation: accuracy=0.787402
Epoch 2 ; Time: 1.283265 ; Training: accuracy=0.721500 ; Validation: accuracy=0.829117
Epoch 3 ; Time: 1.898131 ; Training: accuracy=0.759997 ; Validation: accuracy=0.851064
Epoch 4 ; Time: 2.504328 ; Training: accuracy=0.776637 ; Validation: accuracy=0.866477
Epoch 5 ; Time: 3.115953 ; Training: accuracy=0.785082 ; Validation: accuracy=0.872675
Epoch 6 ; Time: 3.729988 ; Training: accuracy=0.796920 ; Validation: accuracy=0.878037
Epoch 7 ; Time: 4.335588 ; Training: accuracy=0.803792 ; Validation: accuracy=0.882560
Epoch 8 ; Time: 4.946231 ; Training: accuracy=0.804785 ; Validation: accuracy=0.890601
Epoch 9 ; Time: 5.604174 ; Training: accuracy=0.812236 ; Validation: accuracy=0.890266
Epoch 1 ; Time: 0.646575 ; Training: accuracy=0.190744 ; Validation: accuracy=0.553277
Epoch 1 ; Time: 0.421341 ; Training: accuracy=0.562516 ; Validation: accuracy=0.766600
Epoch 2 ; Time: 0.777227 ; Training: accuracy=0.714558 ; Validation: accuracy=0.810644
Epoch 3 ; Time: 1.132471 ; Training: accuracy=0.749109 ; Validation: accuracy=0.843844
Epoch 4 ; Time: 1.483480 ; Training: accuracy=0.771149 ; Validation: accuracy=0.858525
Epoch 5 ; Time: 1.838053 ; Training: accuracy=0.780263 ; Validation: accuracy=0.853687
Epoch 6 ; Time: 2.190382 ; Training: accuracy=0.787472 ; Validation: accuracy=0.879880
Epoch 7 ; Time: 2.545783 ; Training: accuracy=0.800895 ; Validation: accuracy=0.878045
Epoch 8 ; Time: 2.901077 ; Training: accuracy=0.801392 ; Validation: accuracy=0.872539
Epoch 9 ; Time: 3.374341 ; Training: accuracy=0.802883 ; Validation: accuracy=0.875042
Epoch 1 ; Time: 2.253398 ; Training: accuracy=0.506878 ; Validation: accuracy=0.733445
Epoch 2 ; Time: 4.251124 ; Training: accuracy=0.678406 ; Validation: accuracy=0.793613
Epoch 3 ; Time: 6.231996 ; Training: accuracy=0.714120 ; Validation: accuracy=0.811261
Epoch 1 ; Time: 1.271492 ; Training: accuracy=0.633623 ; Validation: accuracy=0.747356
Epoch 2 ; Time: 2.599627 ; Training: accuracy=0.745010 ; Validation: accuracy=0.802753
Epoch 3 ; Time: 3.859993 ; Training: accuracy=0.781863 ; Validation: accuracy=0.814336
Epoch 1 ; Time: 0.352804 ; Training: accuracy=0.509886 ; Validation: accuracy=0.747092
Epoch 2 ; Time: 0.785402 ; Training: accuracy=0.670127 ; Validation: accuracy=0.789963
Epoch 3 ; Time: 1.070437 ; Training: accuracy=0.723019 ; Validation: accuracy=0.821037
Epoch 1 ; Time: 0.697962 ; Training: accuracy=0.077363 ; Validation: accuracy=0.259259
Epoch 1 ; Time: 0.937790 ; Training: accuracy=0.606681 ; Validation: accuracy=0.791163
Epoch 2 ; Time: 1.811806 ; Training: accuracy=0.751741 ; Validation: accuracy=0.836358
Epoch 3 ; Time: 2.688279 ; Training: accuracy=0.792109 ; Validation: accuracy=0.862567
Epoch 4 ; Time: 3.569808 ; Training: accuracy=0.808024 ; Validation: accuracy=0.873656
Epoch 5 ; Time: 4.441044 ; Training: accuracy=0.819712 ; Validation: accuracy=0.881552
Epoch 6 ; Time: 5.313226 ; Training: accuracy=0.834881 ; Validation: accuracy=0.876848
Epoch 7 ; Time: 6.193175 ; Training: accuracy=0.834383 ; Validation: accuracy=0.894321
Epoch 8 ; Time: 7.057981 ; Training: accuracy=0.842838 ; Validation: accuracy=0.893481
Epoch 9 ; Time: 7.943710 ; Training: accuracy=0.851044 ; Validation: accuracy=0.907930
Epoch 1 ; Time: 0.479605 ; Training: accuracy=0.445548 ; Validation: accuracy=0.734510
Epoch 2 ; Time: 0.903569 ; Training: accuracy=0.622310 ; Validation: accuracy=0.782145
Epoch 3 ; Time: 1.326291 ; Training: accuracy=0.670639 ; Validation: accuracy=0.806462
Epoch 1 ; Time: 0.534451 ; Training: accuracy=0.612890 ; Validation: accuracy=0.801036
Epoch 2 ; Time: 0.926063 ; Training: accuracy=0.783545 ; Validation: accuracy=0.845807
Epoch 3 ; Time: 1.313288 ; Training: accuracy=0.826952 ; Validation: accuracy=0.876879
Epoch 4 ; Time: 1.704123 ; Training: accuracy=0.846014 ; Validation: accuracy=0.885232
Epoch 5 ; Time: 2.096707 ; Training: accuracy=0.866150 ; Validation: accuracy=0.907117
Epoch 6 ; Time: 2.486436 ; Training: accuracy=0.876217 ; Validation: accuracy=0.904277
Epoch 7 ; Time: 2.873988 ; Training: accuracy=0.880343 ; Validation: accuracy=0.907451
Epoch 8 ; Time: 3.268614 ; Training: accuracy=0.886615 ; Validation: accuracy=0.912128
Epoch 9 ; Time: 3.684941 ; Training: accuracy=0.886120 ; Validation: accuracy=0.906950
Epoch 1 ; Time: 0.292612 ; Training: accuracy=0.545395 ; Validation: accuracy=0.775100
Epoch 2 ; Time: 0.523657 ; Training: accuracy=0.729934 ; Validation: accuracy=0.834275
Epoch 3 ; Time: 0.749090 ; Training: accuracy=0.775987 ; Validation: accuracy=0.846742
Epoch 4 ; Time: 0.976256 ; Training: accuracy=0.799589 ; Validation: accuracy=0.857214
Epoch 5 ; Time: 1.211971 ; Training: accuracy=0.810526 ; Validation: accuracy=0.875166
Epoch 6 ; Time: 1.443364 ; Training: accuracy=0.822615 ; Validation: accuracy=0.887799
Epoch 7 ; Time: 1.667635 ; Training: accuracy=0.832237 ; Validation: accuracy=0.890957
Epoch 8 ; Time: 1.912996 ; Training: accuracy=0.838651 ; Validation: accuracy=0.888963
Epoch 9 ; Time: 2.139117 ; Training: accuracy=0.845806 ; Validation: accuracy=0.898770
Epoch 1 ; Time: 3.546211 ; Training: accuracy=0.658737 ; Validation: accuracy=0.784657
Epoch 2 ; Time: 7.056022 ; Training: accuracy=0.773127 ; Validation: accuracy=0.825538
Epoch 3 ; Time: 10.694485 ; Training: accuracy=0.801061 ; Validation: accuracy=0.855653
Epoch 4 ; Time: 14.279970 ; Training: accuracy=0.823193 ; Validation: accuracy=0.852456
Epoch 5 ; Time: 18.026569 ; Training: accuracy=0.831068 ; Validation: accuracy=0.884926
Epoch 6 ; Time: 21.676836 ; Training: accuracy=0.846817 ; Validation: accuracy=0.870626
Epoch 7 ; Time: 25.406252 ; Training: accuracy=0.850962 ; Validation: accuracy=0.900740
Epoch 8 ; Time: 29.057364 ; Training: accuracy=0.854360 ; Validation: accuracy=0.895525
Epoch 9 ; Time: 32.872933 ; Training: accuracy=0.859334 ; Validation: accuracy=0.886608
Epoch 1 ; Time: 0.441721 ; Training: accuracy=0.595947 ; Validation: accuracy=0.774892
Epoch 2 ; Time: 0.822823 ; Training: accuracy=0.776261 ; Validation: accuracy=0.843656
Epoch 3 ; Time: 1.206139 ; Training: accuracy=0.822581 ; Validation: accuracy=0.866300
Epoch 4 ; Time: 1.590910 ; Training: accuracy=0.845492 ; Validation: accuracy=0.888445
Epoch 5 ; Time: 2.052370 ; Training: accuracy=0.862779 ; Validation: accuracy=0.899767
Epoch 6 ; Time: 2.478867 ; Training: accuracy=0.876923 ; Validation: accuracy=0.901598
Epoch 7 ; Time: 2.845222 ; Training: accuracy=0.884533 ; Validation: accuracy=0.912421
Epoch 8 ; Time: 3.207347 ; Training: accuracy=0.895616 ; Validation: accuracy=0.916084
Epoch 9 ; Time: 3.582762 ; Training: accuracy=0.897849 ; Validation: accuracy=0.919747
Analysing the results
~~~~~~~~~~~~~~~~~~~~~
The training history is stored in ``results_df``. The main fields are
``runtime`` and ``best`` (the lowest validation error observed so far).
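In ``process_training_history``, the ``best`` column is computed as the
cumulative minimum of the error over all evaluations sorted by runtime.
The following sketch shows the same computation on a short list, using
only the standard library. The error values are made up for
illustration.

.. code:: python

   from itertools import accumulate

   # Validation errors in the order in which they were reported (made up)
   errors = [0.47, 0.34, 0.31, 0.37, 0.29, 0.30]

   # Running minimum: the best error seen up to each point in time
   best = list(accumulate(errors, min))
   # best == [0.47, 0.34, 0.31, 0.31, 0.29, 0.29]

The resulting curve never increases, which is why it is the natural
quantity to plot against runtime when comparing HPO methods.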
**Note**: You will get slightly different curves for different pairs of
scheduler/searcher. The ``time_out`` here is a bit too short to really
see the difference in a significant way (it would be better to set it to
>1000s). Generally speaking, though, Hyperband stopping or promotion
combined with a model-based searcher will tend to significantly
outperform other combinations given enough time.
.. code:: python
results_df.head()
.. csv-table::
   :header: "", "bracket", "elapsed_time", "epoch", "error", "eval_time", "objective", "runtime", "searcher_data_size", "searcher_params_kernel_covariance_scale", "searcher_params_kernel_inv_bw0", "...", "searcher_params_kernel_inv_bw7", "searcher_params_kernel_inv_bw8", "searcher_params_mean_mean_value", "searcher_params_noise_variance", "target_epoch", "task_id", "time_since_start", "time_step", "time_this_iter", "best"

   0, 0, 0.492290, 1, 0.468750, 0.487200, 0.531250, 0.584138, NaN, 1.0, 1.0, ..., 1.0, 1.0, 0.0, 0.001, 9, 0, 0.586006, 1.603489e+09, 0.522913, 0.468750
   1, 0, 1.050486, 2, 0.344753, 0.553469, 0.655247, 1.142334, 1.0, 1.0, 1.0, ..., 1.0, 1.0, 0.0, 0.001, 9, 0, 1.143515, 1.603489e+09, 0.558179, 0.344753
   2, 0, 1.474974, 3, 0.305314, 0.422164, 0.694686, 1.566823, 1.0, 1.0, 1.0, ..., 1.0, 1.0, 0.0, 0.001, 9, 0, 1.567652, 1.603489e+09, 0.424488, 0.305314
   3, 0, 1.900829, 4, 0.288937, 0.423021, 0.711063, 1.992677, 2.0, 1.0, 1.0, ..., 1.0, 1.0, 0.0, 0.001, 9, 0, 1.993604, 1.603489e+09, 0.425855, 0.288937
   4, 0, 2.335975, 5, 0.273061, 0.432909, 0.726939, 2.427823, 2.0, 1.0, 1.0, ..., 1.0, 1.0, 0.0, 0.001, 9, 0, 2.428878, 1.603489e+09, 0.435146, 0.273061

5 rows × 26 columns
.. code:: python
import matplotlib.pyplot as plt
plt.figure(figsize=(12, 8))
runtime = results_df['runtime'].values
objective = results_df['best'].values
plt.plot(runtime, objective, lw=2)
plt.xticks(fontsize=12)
plt.xlim(0, 120)
plt.ylim(0, 0.5)
plt.yticks(fontsize=12)
plt.xlabel("Runtime [s]", fontsize=14)
plt.ylabel("Objective", fontsize=14)
.. parsed-literal::
:class: output
Text(0, 0.5, 'Objective')
.. figure:: output_mlp_cb387f_18_1.png
Diving Deeper
-------------
Now you are ready to try HPO on your own machine learning models (if
you use PyTorch, have a look at :ref:`sec_customstorch`). While
AutoGluon comes with well-chosen defaults, it can pay off to tune it to
your specific needs. Here are some tips that may come in handy.
Logging the Search Progress
~~~~~~~~~~~~~~~~~~~~~~~~~~~
First, it is a good idea in general to switch on ``debug_log``, which
outputs useful information about the search progress. This is already
done in the example above.
The outputs show which configurations are chosen, stopped, or promoted.
For BO and BOHB, a range of information is displayed for every
``get_config`` decision. This log output is very useful in order to
figure out what is going on during the search.
Configuring ``HyperbandScheduler``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The most important knobs to turn with ``HyperbandScheduler`` are
``max_t``, ``grace_period``, ``reduction_factor``, ``brackets``, and
``type``. The first three determine the rung levels at which stopping or
promotion decisions are being made.
- The maximum resource level ``max_t`` (usually, resource equates to
  epochs, so ``max_t`` is the maximum number of training epochs) is
  typically hardcoded in the ``train_fn`` passed to the scheduler (this
  is ``run_mlp_openml`` in the example above). As already noted above,
  the value is best fixed in the ``ag.args`` decorator as
  ``epochs=XYZ``; it can then be accessed as ``args.epochs`` in the
  ``train_fn`` code. If this is done, you do not have to pass ``max_t``
  when creating the scheduler.
- ``grace_period`` and ``reduction_factor`` determine the rung levels,
  which are ``grace_period``, ``grace_period * reduction_factor``,
  ``grace_period * (reduction_factor ** 2)``, etc. All rung levels must
  be less than or equal to ``max_t``. It is recommended to make
  ``max_t`` equal to the largest rung level. For example, if
  ``grace_period = 1`` and ``reduction_factor = 3``, it is in general
  recommended to use ``max_t = 9``, ``max_t = 27``, or ``max_t = 81``.
  Choosing a ``max_t`` value "off the grid" works against the
  successive halving principle that the total resources spent in a rung
  should be roughly equal between rungs. If, in the example above, you
  set ``max_t = 10``, about a third of the configurations reaching 9
  epochs are allowed to proceed, but only for one more epoch.
- With ``reduction_factor``, you tune the extent to which successive
  halving filtering is applied. The larger this integer, the fewer
  configurations make it to higher numbers of epochs. Values 2, 3, 4
  are commonly used.
- ``grace_period`` should be set to the smallest resource (number of
  epochs) for which you expect any meaningful differentiation between
  configurations. While ``grace_period = 1`` should always be explored,
  it may be too low for any meaningful stopping decisions to be made at
  the first rung.
- ``brackets`` sets the maximum number of brackets in Hyperband (make
  sure to study the Hyperband paper or follow-ups for details). For
  ``brackets = 1``, you are running successive halving (single
  bracket). Higher brackets have larger effective ``grace_period``
  values (so runs are not stopped until later), yet are also chosen
  with lower probability. We recommend always including successive
  halving (``brackets = 1``) in a comparison.
- Finally, with ``type`` (values ``stopping``, ``promotion``) you
  choose between different ways of extending successive halving
  scheduling to the asynchronous case. The method for the default
  ``stopping`` is simpler and seems to perform well, but ``promotion``
  is more careful about promoting configurations to higher resource
  levels, which can work better in some cases.
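The rung level formula and the "off the grid" effect described above can
be checked with a short calculation. The sketch below uses a simplified
synchronous view of successive halving, in which exactly
``1 / reduction_factor`` of the configurations continue after each rung.
The asynchronous scheduler in AutoGluon makes these decisions one
configuration at a time, so the counts are only indicative. The function
names ``rung_levels`` and ``stage_budgets`` are made up for this
example.

.. code:: python

   def rung_levels(grace_period, reduction_factor, max_t):
       """Rung levels grace_period * reduction_factor ** k, up to max_t."""
       levels = []
       level = grace_period
       while level <= max_t:
           levels.append(level)
           level *= reduction_factor
       return levels

   def stage_budgets(num_configs, grace_period, reduction_factor, max_t):
       """List of (level, configs trained to it, epochs spent in this stage).

       Assumes grace_period <= max_t.
       """
       levels = rung_levels(grace_period, reduction_factor, max_t)
       if levels[-1] < max_t:
           # max_t is "off the grid": survivors train on up to max_t
           levels.append(max_t)
       budgets = []
       n, previous = num_configs, 0
       for level in levels:
           budgets.append((level, n, n * (level - previous)))
           previous = level
           n = max(1, n // reduction_factor)
       return budgets

   on_grid = stage_budgets(27, grace_period=1, reduction_factor=3, max_t=9)
   # on_grid == [(1, 27, 27), (3, 9, 18), (9, 3, 18)]
   off_grid = stage_budgets(27, grace_period=1, reduction_factor=3, max_t=10)
   # off_grid == [(1, 27, 27), (3, 9, 18), (9, 3, 18), (10, 1, 1)]

With ``max_t = 9``, the epochs spent per stage are of similar size (27,
18, 18). With ``max_t = 10``, the extra stage consists of a single
configuration trained for a single epoch, which contributes very little
information for the decision it required.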
Asynchronous BOHB
~~~~~~~~~~~~~~~~~
Here are some ideas for tuning asynchronous BOHB, apart from tuning its
``HyperbandScheduler`` component. You need to pass these options in
``search_options``.
- We support a range of different surrogate models over the criterion
functions across resource levels. All of them are jointly dependent
Gaussian process models, meaning that data collected at all resource
levels are modelled together. The surrogate model is selected by
``gp_resource_kernel``, values are ``matern52``,
``matern52-res-warp``, ``exp-decay-sum``, ``exp-decay-combined``,
``exp-decay-delta1``. These are variants of either a joint Matern 5/2
kernel over configuration and resource, or the exponential decay
model. Details about the latter can be found
`here `__.
- Fitting a Gaussian process surrogate model to data incurs a cost
  which scales cubically with the number of datapoints. When applied to
  expensive deep learning workloads, even multi-fidelity asynchronous
  BOHB rarely accumulates more than 100 observations or so (across all
  rung levels and brackets), and the GP computations are subdominant.
  However, if you apply it to a cheaper ``train_fn`` and find yourself
  beyond 2000 total evaluations, the cost of GP fitting can become
  painful. In such a situation, you can explore the options
  ``opt_skip_period`` and ``opt_skip_num_max_resource``. The basic idea
  is as follows. By far the most expensive part of a ``get_config``
  call (picking the next configuration) is the refitting of the GP
  model to past data (this entails re-optimizing hyperparameters of the
  surrogate model itself). The options allow you to skip this expensive
  step for most ``get_config`` calls, after some initial period. Check
  the docstrings for details about these options. If you find yourself
  in such a situation and gain experience with these skipping features,
  make sure to contact the AutoGluon developers. We would love to learn
  about your use case.
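To get a feeling for the cubic scaling and for the effect of skipping,
consider the following back-of-the-envelope sketch. It illustrates the
general idea only: the function names and the parameters ``skip_after``
and ``period`` are hypothetical and do not reproduce the exact semantics
of ``opt_skip_period`` or ``opt_skip_num_max_resource``, which are
described in the docstrings.

.. code:: python

   def relative_fit_cost(n_points, n_reference=100):
       """Cost of one GP fit relative to a fit on n_reference points,
       assuming cubic scaling in the number of datapoints."""
       return (n_points / n_reference) ** 3

   def should_refit(num_observations, call_index, skip_after=150, period=5):
       """Illustrative skipping rule (hypothetical parameters).

       Refit on every call while there is little data. Afterwards, refit
       only on every period-th call and reuse the previous surrogate
       hyperparameters in between.
       """
       if num_observations <= skip_after:
           return True
       return call_index % period == 0

   # A fit on 2000 points versus a fit on 100 points
   cost_ratio = relative_fit_cost(2000)
   # cost_ratio == 8000.0

   # Number of refits during 100 get_config calls once data is plentiful
   refits = sum(should_refit(200, call_index) for call_index in range(100))
   # refits == 20

Under these assumptions, skipping reduces the number of expensive refits
by the factor ``period``, at the price of working with slightly outdated
surrogate hyperparameters between refits.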