Demo RL Searcher

In this tutorial, we compare the RL searcher with random search in a simulated environment.

A Toy Reward Space

The input space is the integer grid x = [0: 99], y = [0: 99]. The reward is the sum of two Gaussians, as shown in the following figure:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

Generate the simulated rewards as a mixture of two Gaussians:

def gaussian(x, y, x0, y0, xalpha, yalpha, A):
    return A * np.exp( -((x-x0)/xalpha)**2 -((y-y0)/yalpha)**2)

x, y = np.linspace(0, 99, 100), np.linspace(0, 99, 100)
X, Y = np.meshgrid(x, y)

Z = np.zeros(X.shape)
ps = [(20, 70, 35, 40, 1),
      (80, 40, 20, 20, 0.7)]
for p in ps:
    Z += gaussian(X, Y, *p)
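
Before running any search, it helps to know where the best reward lies. The following is a minimal sketch that rebuilds the same grid and looks up its maximum; the helper name `reward_grid` is ours and is not part of the tutorial or of AutoGluon.

```python
import numpy as np

def gaussian(x, y, x0, y0, xalpha, yalpha, A):
    return A * np.exp(-((x - x0) / xalpha) ** 2 - ((y - y0) / yalpha) ** 2)

def reward_grid():
    """Rebuild the 100 x 100 reward grid from the two Gaussians above."""
    x, y = np.linspace(0, 99, 100), np.linspace(0, 99, 100)
    X, Y = np.meshgrid(x, y)
    Z = np.zeros(X.shape)
    for p in [(20, 70, 35, 40, 1), (80, 40, 20, 20, 0.7)]:
        Z += gaussian(X, Y, *p)
    return Z

Z = reward_grid()
# np.meshgrid puts y on the rows and x on the columns, so Z is indexed Z[y, x].
best_y, best_x = np.unravel_index(np.argmax(Z), Z.shape)
print('peak at x={}, y={}, reward={:.4f}'.format(best_x, best_y, Z[best_y, best_x]))
```

The peak sits at the center of the taller Gaussian, so a good searcher should end up near x = 20, y = 70 with a reward close to 1.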

Visualize the reward space:

fig = plt.figure()
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) is no longer accepted by recent Matplotlib
ax.plot_surface(X, Y, Z, cmap='plasma')

Simulation Experiment

Customize Train Function

Any function can be made searchable by AutoGluon by decorating it with @ag.args, which declares the search space of each argument. Inside the function, the reporter is used to send results back to the AutoGluon search algorithm.

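import autogluon.core as ag

@ag.args(x=ag.space.Categorical(*range(100)),
         y=ag.space.Categorical(*range(100)))
def rl_simulation(args, reporter):
    x, y = args.x, args.y
    # Z is indexed [row, column], i.e. [y, x]
    reporter(accuracy=Z[y][x])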
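
To see the args/reporter pattern without any scheduler, here is a minimal sketch that calls a train function by hand. The names `simulate` and `collect` are hypothetical, and `SimpleNamespace` only stands in for the `args` object that AutoGluon would normally supply; none of this is AutoGluon's own machinery.

```python
from types import SimpleNamespace
import numpy as np

# Rebuild the toy reward grid (same two Gaussians as above).
x, y = np.linspace(0, 99, 100), np.linspace(0, 99, 100)
X, Y = np.meshgrid(x, y)
Z = np.zeros(X.shape)
for x0, y0, xa, ya, A in [(20, 70, 35, 40, 1), (80, 40, 20, 20, 0.7)]:
    Z += A * np.exp(-((X - x0) / xa) ** 2 - ((Y - y0) / ya) ** 2)

def simulate(args, reporter):
    """Train-function shape: read the config from args, report the reward."""
    reporter(accuracy=Z[args.y][args.x])

reports = []

def collect(**kwargs):
    """Stand-in reporter that simply records whatever is reported."""
    reports.append(kwargs)

simulate(SimpleNamespace(x=20, y=70), collect)
print(reports)
```

A scheduler does the same thing in a loop: it picks a configuration, calls the function, and reads the reported value back.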

Random Search Baseline

random_scheduler = ag.scheduler.FIFOScheduler(rl_simulation,
                                              resource={'num_cpus': 1, 'num_gpus': 0},
                                              num_trials=300, reward_attr='accuracy')
random_scheduler.run()
random_scheduler.join_jobs()
print('Best config: {}, best reward: {}'.format(random_scheduler.get_best_config(), random_scheduler.get_best_reward()))
Best config: {'x▁choice': 22, 'y▁choice': 69}, best reward: 0.9961362873852471
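
For intuition, random search on this grid can be sketched in a few lines of NumPy. This is not how AutoGluon implements its scheduler; `random_search`, `num_trials`, and `seed` are illustrative names chosen for this sketch.

```python
import numpy as np

def reward_grid():
    """Rebuild the 100 x 100 toy reward grid (indexed Z[y, x])."""
    x, y = np.linspace(0, 99, 100), np.linspace(0, 99, 100)
    X, Y = np.meshgrid(x, y)
    Z = np.zeros(X.shape)
    for x0, y0, xa, ya, A in [(20, 70, 35, 40, 1), (80, 40, 20, 20, 0.7)]:
        Z += A * np.exp(-((X - x0) / xa) ** 2 - ((Y - y0) / ya) ** 2)
    return Z

def random_search(Z, num_trials=300, seed=0):
    """Sample (x, y) uniformly at random and keep the best reward seen."""
    rng = np.random.default_rng(seed)
    n_y, n_x = Z.shape
    xs = rng.integers(0, n_x, size=num_trials)
    ys = rng.integers(0, n_y, size=num_trials)
    rewards = Z[ys, xs]
    best = int(np.argmax(rewards))
    config = {'x': int(xs[best]), 'y': int(ys[best])}
    return config, float(rewards[best]), rewards.tolist()

config, best_reward, history = random_search(reward_grid())
print('Best config: {}, best reward: {}'.format(config, best_reward))
```

Because every draw is independent of the previous ones, the average reward per trial stays flat over time; only the best-so-far value improves.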

Reinforcement Learning

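rl_scheduler = ag.scheduler.RLScheduler(rl_simulation,
                                        resource={'num_cpus': 1, 'num_gpus': 0},
                                        num_trials=300, reward_attr='accuracy')  # trial budget assumed equal to the baseline
rl_scheduler.run()
rl_scheduler.join_jobs()
print('Best config: {}, best reward: {}'.format(rl_scheduler.get_best_config(), rl_scheduler.get_best_reward()))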
100%|██████████| 76/76 [00:17<00:00,  4.34it/s]
Best config: {'x▁choice': 21, 'y▁choice': 74}, best reward: 0.9892484241569526
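
To illustrate the general idea behind a policy-gradient searcher, here is a simplified REINFORCE sketch in NumPy. It is a stand-in, not AutoGluon's controller: it assumes two independent softmax policies (one for x, one for y) and a moving-average baseline, and the names `reinforce_search`, `lr`, `batch_size`, and `decay` are hypothetical values chosen for this sketch.

```python
import numpy as np

def reward_grid():
    """Rebuild the 100 x 100 toy reward grid (indexed Z[y, x])."""
    x, y = np.linspace(0, 99, 100), np.linspace(0, 99, 100)
    X, Y = np.meshgrid(x, y)
    Z = np.zeros(X.shape)
    for x0, y0, xa, ya, A in [(20, 70, 35, 40, 1), (80, 40, 20, 20, 0.7)]:
        Z += A * np.exp(-((X - x0) / xa) ** 2 - ((Y - y0) / ya) ** 2)
    return Z

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def reinforce_search(Z, num_trials=300, batch_size=4, lr=0.5, decay=0.95, seed=0):
    """REINFORCE with independent categorical policies over x and y."""
    rng = np.random.default_rng(seed)
    n_y, n_x = Z.shape
    logits_x, logits_y = np.zeros(n_x), np.zeros(n_y)
    baseline, history = None, []
    for _ in range(num_trials // batch_size):
        px, py = softmax(logits_x), softmax(logits_y)
        xs = rng.choice(n_x, size=batch_size, p=px)
        ys = rng.choice(n_y, size=batch_size, p=py)
        rewards = Z[ys, xs]
        history.extend(rewards.tolist())
        # Moving-average baseline reduces the variance of the gradient estimate.
        mean_r = rewards.mean()
        baseline = mean_r if baseline is None else decay * baseline + (1 - decay) * mean_r
        grad_x, grad_y = np.zeros(n_x), np.zeros(n_y)
        for xi, yi, r in zip(xs, ys, rewards):
            adv = r - baseline
            # Gradient of log softmax w.r.t. the logits: one_hot(choice) - probs.
            gx = -px.copy(); gx[xi] += 1.0
            gy = -py.copy(); gy[yi] += 1.0
            grad_x += adv * gx
            grad_y += adv * gy
        # Gradient ascent on the expected reward.
        logits_x += lr * grad_x / batch_size
        logits_y += lr * grad_y / batch_size
    return history

rl_history = reinforce_search(reward_grid())
print('mean of first 20 trials:', np.mean(rl_history[:20]))
print('mean of last 20 trials: ', np.mean(rl_history[-20:]))
```

Unlike random search, each batch of samples changes the sampling distribution, so later trials can concentrate on regions that paid off earlier. How quickly that happens depends on the learning rate and batch size assumed here.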

Compare the Performance

Get the result history:

results_rl = [v[0]['accuracy'] for v in rl_scheduler.training_history.values()]
results_random = [v[0]['accuracy'] for v in random_scheduler.training_history.values()]

Average the results over each block of 10 consecutive trials:

import statistics
results1 = [statistics.mean(results_random[i:i+10]) for i in range(0, len(results_random), 10)]
results2 = [statistics.mean(results_rl[i:i+10]) for i in range(0, len(results_rl), 10)]
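
The same averaging can be wrapped in a small helper, which makes it easy to check on a toy input. `window_means` is a name introduced here for illustration.

```python
import statistics

def window_means(values, window=10):
    """Mean of each consecutive block of `window` values; the last block may be shorter."""
    return [statistics.mean(values[i:i + window])
            for i in range(0, len(values), window)]

print(window_means(list(range(25))))  # [4.5, 14.5, 22]
```

Note that the final block is averaged over fewer values when the number of trials is not a multiple of the window size.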

Plot the results:

plt.plot(range(len(results1)), results1, range(len(results2)), results2)
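
The plot is easier to read with a legend and axis labels. A possible sketch follows; `plot_comparison` is a helper name we made up, and the label texts are our own choice.

```python
import matplotlib.pyplot as plt

def plot_comparison(random_means, rl_means, window=10):
    """Plot the windowed mean reward of both searchers on one set of axes."""
    fig, ax = plt.subplots()
    ax.plot(range(len(random_means)), random_means, label='Random search')
    ax.plot(range(len(rl_means)), rl_means, label='RL searcher')
    ax.set_xlabel('Block of {} trials'.format(window))
    ax.set_ylabel('Mean reward')
    ax.legend()
    return ax
```

With the lists computed above, this would be called as `plot_comparison(results1, results2)`.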