Predicting Columns in a Table - In Depth

Tip: If you are new to AutoGluon, review Predicting Columns in a Table - Quick Start to learn the basics of the AutoGluon API.

This tutorial describes how you can exert greater control when using AutoGluon’s fit() by specifying the appropriate arguments. Using the same census data table as Predicting Columns in a Table - Quick Start, we will try to predict the occupation of an individual - a multi-class classification problem.

Start by importing AutoGluon, specifying TabularPrediction as the task, and loading the data.

import autogluon as ag
from autogluon import TabularPrediction as task

train_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
train_data = train_data.head(500) # subsample 500 data points for faster demo (comment this out to run on full dataset instead)
print(train_data.head())

val_data = task.Dataset(file_path='https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')

label_column = 'occupation'
print("Summary of occupation column: \n", train_data['occupation'].describe())
Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv | Columns = 15 / 15 | Rows = 39073 -> 39073
Loaded data from: https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv | Columns = 15 / 15 | Rows = 9769 -> 9769
   age   workclass  fnlwgt   education  education-num       marital-status
0   25     Private  178478   Bachelors             13        Never-married
1   23   State-gov   61743     5th-6th              3        Never-married
2   46     Private  376789     HS-grad              9        Never-married
3   55           ?  200235     HS-grad              9   Married-civ-spouse
4   36     Private  224541     7th-8th              4   Married-civ-spouse

           occupation    relationship    race      sex  capital-gain
0        Tech-support       Own-child   White   Female             0
1    Transport-moving   Not-in-family   White     Male             0
2       Other-service   Not-in-family   White     Male             0
3                   ?         Husband   White     Male             0
4   Handlers-cleaners         Husband   White     Male             0

   capital-loss  hours-per-week  native-country   class
0             0              40   United-States   <=50K
1             0              35   United-States   <=50K
2             0              15   United-States   <=50K
3             0              50   United-States    >50K
4             0              40     El-Salvador   <=50K
Summary of occupation column:
 count                  500
unique                  14
top        Exec-managerial
freq                    69
Name: occupation, dtype: object

To demonstrate how you can provide your own validation dataset against which AutoGluon tunes hyperparameters, we’ll use the test dataset from the previous tutorial as validation data.

If you don’t have a strong reason to provide your own validation dataset, we recommend you omit the tuning_data argument. This lets AutoGluon automatically select validation data from your provided training set (it uses smart strategies such as stratified sampling). For greater control, you can specify the holdout_frac argument to tell AutoGluon what fraction of the provided training data to hold out for validation.
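
If you prefer, a minimal sketch of such a call is shown below; the holdout_frac value and folder name here are illustrative, not recommendations:

predictor = task.fit(train_data=train_data, label=label_column,
                     holdout_frac=0.2,  # illustrative: fraction of train_data AutoGluon holds out for validation
                     output_directory='agModels-holdoutDemo')  # illustrative folder name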

Caution: Since AutoGluon tunes internal knobs based on this validation data, performance estimates reported on this data may be over-optimistic. For unbiased performance estimates, you should always call predict() on a separate dataset (that was never passed to fit()), as we did in the previous Quick-Start tutorial. We also emphasize that most options specified in this tutorial are chosen to minimize runtime for the purposes of demonstration and you should select more reasonable values in order to obtain high-quality models.

fit() trains neural networks and various types of tree ensembles by default. You can specify various hyperparameter values for each type of model. For each hyperparameter, you can either specify a single fixed value, or a search space of values to consider during the hyperparameter optimization. Hyperparameters which you do not specify are left at default settings chosen automatically by AutoGluon, which may be fixed values or search spaces.

hp_tune = True  # whether or not to do hyperparameter optimization

nn_options = { # specifies non-default hyperparameter values for neural network models
    'num_epochs': 10, # number of training epochs (controls training time of NN models)
    'learning_rate': ag.space.Real(1e-4, 1e-2, default=5e-4, log=True), # learning rate used in training (real-valued hyperparameter searched on log-scale)
    'activation': ag.space.Categorical('relu', 'softrelu', 'tanh'), # activation function used in NN (categorical hyperparameter, default = first entry)
    'layers': ag.space.Categorical([100],[1000],[200,100],[300,200,100]),
      # Each choice for categorical hyperparameter 'layers' corresponds to list of sizes for each NN layer to use
    'dropout_prob': ag.space.Real(0.0, 0.5, default=0.1), # dropout probability (real-valued hyperparameter)
}

gbm_options = { # specifies non-default hyperparameter values for lightGBM gradient boosted trees
    'num_boost_round': 100, # number of boosting rounds (controls training time of GBM models)
    'num_leaves': ag.space.Int(lower=26, upper=66, default=36), # number of leaves in trees (integer hyperparameter)
}

hyperparameters = {'NN': nn_options, 'GBM': gbm_options}  # hyperparameters of each model type
# If one of these keys is missing from hyperparameters dict, then no models of that type are trained.

time_limits = 2*60  # train various models for ~2 min
num_trials = 5  # try at most 5 different hyperparameter configurations for each type of model
search_strategy = 'skopt'  # to tune hyperparameters using SKopt Bayesian optimization routine
output_directory = 'agModels-predictOccupation'  # folder where to store trained models

predictor = task.fit(train_data=train_data, tuning_data=val_data, label=label_column,
                     output_directory=output_directory, time_limits=time_limits, num_trials=num_trials,
                     hyperparameter_tune=hp_tune, hyperparameters=hyperparameters,
                     search_strategy=search_strategy)
Warning: hyperparameter_tune=True is currently experimental and may cause the process to hang. Setting auto_stack=True instead is recommended to achieve maximum quality models.
Beginning AutoGluon training ... Time limit = 120s
AutoGluon will save models to agModels-predictOccupation/
Train Data Rows:    500
Train Data Columns: 15
Tuning Data Rows:    9769
Tuning Data Columns: 15
Preprocessing data ...
Here are the first 10 unique label values in your data:  [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
AutoGluon infers your prediction problem is: multiclass  (because dtype of label-column == object).
If this is wrong, please specify problem_type argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
Feature Generator processed 10223 data points with 14 features
Original Features:
    int features: 6
    object features: 8
Generated Features:
    int features: 0
All Features:
    int features: 6
    object features: 8
    Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: accuracy
Starting Experiments
Num of Finished Tasks is 0
Num of Pending Tasks is 5
Time out (secs) is 54.0
    0.2866   = Validation accuracy score
    7.01s    = Training runtime
    0.04s    = Validation runtime
    0.2966   = Validation accuracy score
    5.12s    = Training runtime
    0.08s    = Validation runtime
    0.2805   = Validation accuracy score
    10.28s   = Training runtime
    0.04s    = Validation runtime
    0.2923   = Validation accuracy score
    5.8s     = Training runtime
    0.2s     = Validation runtime
    0.2848   = Validation accuracy score
    8.93s    = Training runtime
    0.17s    = Validation runtime
Starting Experiments
Num of Finished Tasks is 0
Num of Pending Tasks is 5
Time out (secs) is 54.0
    Ran out of time, stopping training early.
Please either provide filename or allow plot in get_training_curves
    0.1266   = Validation accuracy score
    9.9s     = Training runtime
    0.87s    = Validation runtime
    0.277    = Validation accuracy score
    10.38s   = Training runtime
    0.83s    = Validation runtime
    0.1545   = Validation accuracy score
    10.24s   = Training runtime
    0.85s    = Validation runtime
    0.2756   = Validation accuracy score
    10.28s   = Training runtime
    0.81s    = Validation runtime
    0.1312   = Validation accuracy score
    9.16s    = Training runtime
    0.88s    = Validation runtime
Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 119.91s of the 17.92s of remaining time.
    0.3041   = Validation accuracy score
    2.57s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 104.69s ...

We again demonstrate how to use the trained models to predict on the validation data (we caution again that performance estimates here are biased because the same data was used to tune hyperparameters).

test_data = val_data.copy()
y_test = test_data[label_column]
test_data = test_data.drop(labels=[label_column],axis=1)  # delete label column

y_pred = predictor.predict(test_data)
print("Predictions:  ", list(y_pred)[:5])
perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=False)
Evaluation: accuracy on test data: 0.3026921895792814
Predictions:   [' Other-service', ' ?', ' Exec-managerial', ' Sales', ' Other-service']

Use the following to view a summary of what happened during fit(). This command shows details of the hyperparameter-tuning process for each type of model:

results = predictor.fit_summary()
* Summary of fit() *
Estimated performance of each model:
                          model  score_val  pred_time_val   fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer
0       weighted_ensemble_k0_l1   0.304093       3.967064  79.385024                0.002254           2.566406            1       True
1    LightGBMClassifier/trial_1   0.296586       0.079896   5.124274                0.079896           5.124274            0       True
2    LightGBMClassifier/trial_3   0.292267       0.197775   5.797638                0.197775           5.797638            0       True
3    LightGBMClassifier/trial_0   0.286610       0.044553   7.005394                0.044553           7.005394            0       True
4    LightGBMClassifier/trial_4   0.284759       0.168713   8.929789                0.168713           8.929789            0       True
5    LightGBMClassifier/trial_2   0.280543       0.043975  10.283943                0.043975          10.283943            0       True
6   NeuralNetClassifier/trial_6   0.277046       0.827343  10.376082                0.827343          10.376082            0       True
7   NeuralNetClassifier/trial_8   0.275607       0.808649  10.282482                0.808649          10.282482            0       True
8   NeuralNetClassifier/trial_7   0.154463       0.846155  10.235851                0.846155          10.235851            0       True
9   NeuralNetClassifier/trial_9   0.131222       0.883551   9.164512                0.883551           9.164512            0       True
10  NeuralNetClassifier/trial_5   0.126594       0.872849   9.901134                0.872849           9.901134            0       True
Number of models trained: 11
Types of models trained:
{'TabularNeuralNetModel', 'LGBModel', 'WeightedEnsembleModel'}
Bagging used: False
Stack-ensembling used: False
Hyperparameter-tuning used: True
User-specified hyperparameters:
{'NN': {'num_epochs': 10, 'learning_rate': Real: lower=0.0001, upper=0.01, 'activation': Categorical['relu', 'softrelu', 'tanh'], 'layers': Categorical[[100], [1000], [200, 100], [300, 200, 100]], 'dropout_prob': Real: lower=0.0, upper=0.5}, 'GBM': {'num_boost_round': 100, 'num_leaves': Int: lower=26, upper=66}}
Plot summary of models saved to file: agModels-predictOccupation/SummaryOfModels.html
Plot summary of models saved to file: agModels-predictOccupation/LightGBMClassifier_HPOmodelsummary.html
Plot summary of models saved to file: LightGBMClassifier_HPOmodelsummary.html
Plot of HPO performance saved to file: agModels-predictOccupation/LightGBMClassifier_HPOperformanceVStrials.png
Plot summary of models saved to file: agModels-predictOccupation/NeuralNetClassifier_HPOmodelsummary.html
Plot summary of models saved to file: NeuralNetClassifier_HPOmodelsummary.html
Plot of HPO performance saved to file: agModels-predictOccupation/NeuralNetClassifier_HPOperformanceVStrials.png
* Details of Hyperparameter optimization *
HPO for LightGBMClassifier model:  Num. configurations tried = 5, Time spent = 38.565189361572266, Search strategy = skopt
Best hyperparameter-configuration (validation-performance: accuracy = 0.2922665569724393):
{'feature_fraction': 0.9025120168593952, 'learning_rate': 0.012029092021382304, 'min_data_in_leaf': 25, 'num_leaves': 51}
HPO for NeuralNetClassifier model:  Num. configurations tried = 5, Time spent = 56.33428430557251, Search strategy = skopt
Best hyperparameter-configuration (validation-performance: accuracy = 0.2770464829288359):
{'activation.choice': 2, 'dropout_prob': 0.07924431353560864, 'embedding_size_factor': 0.6686316325974286, 'layers.choice': 1, 'learning_rate': 0.00039868012933086714, 'network_type.choice': 0, 'use_batchnorm.choice': 1, 'weight_decay': 0.01714423742910896}
* End of fit() summary *

In the above example, predictive performance may be poor because we specified very little training to ensure quick runtimes. You can call fit() multiple times while modifying the above settings to better understand how these choices affect performance outcomes. For example: comment out the train_data.head(500) line to train on the full dataset, increase time_limits, and increase the num_epochs and num_boost_round hyperparameters. To see more detailed output during the execution of fit(), you can also pass in the argument verbosity=3.
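
For example, a follow-up run with a larger budget might look like the sketch below (all values are placeholders rather than tuned recommendations):

nn_options['num_epochs'] = 100         # placeholder: train neural network models for longer
gbm_options['num_boost_round'] = 1000  # placeholder: allow more boosting rounds
predictor = task.fit(train_data=train_data, tuning_data=val_data, label=label_column,
                     output_directory=output_directory, time_limits=30*60, num_trials=20,
                     hyperparameter_tune=hp_tune, hyperparameters={'NN': nn_options, 'GBM': gbm_options},
                     search_strategy=search_strategy, verbosity=3)  # verbosity=3 prints more detailed training logs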

Specifying performance metrics

Performance in certain applications may be measured by different metrics than the one AutoGluon optimizes for by default. If you know the metric that matters most in your application, you can specify it via eval_metric, as done below to use balanced accuracy instead of standard accuracy (the default):

metric = 'balanced_accuracy'
predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                     output_directory=output_directory, time_limits=60)

performance = predictor.evaluate(val_data)
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to agModels-predictOccupation/
Train Data Rows:    500
Train Data Columns: 15
Preprocessing data ...
Here are the first 10 unique label values in your data:  [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
AutoGluon infers your prediction problem is: multiclass  (because dtype of label-column == object).
If this is wrong, please specify problem_type argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
Feature Generator processed 499 data points with 14 features
Original Features:
    int features: 6
    object features: 8
Generated Features:
    int features: 0
All Features:
    int features: 6
    object features: 8
    Data preprocessing and feature engineering runtime = 0.05s ...
AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: balanced_accuracy
Fitting model: RandomForestClassifierGini ... Training model for up to 59.95s of the 59.95s of remaining time.
    0.2317   = Validation balanced_accuracy score
    0.61s    = Training runtime
    0.11s    = Validation runtime
Fitting model: RandomForestClassifierEntr ... Training model for up to 59.2s of the 59.2s of remaining time.
    0.2515   = Validation balanced_accuracy score
    0.6s     = Training runtime
    0.11s    = Validation runtime
Fitting model: ExtraTreesClassifierGini ... Training model for up to 58.46s of the 58.46s of remaining time.
    0.2202   = Validation balanced_accuracy score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: ExtraTreesClassifierEntr ... Training model for up to 57.81s of the 57.81s of remaining time.
    0.196    = Validation balanced_accuracy score
    0.5s     = Training runtime
    0.11s    = Validation runtime
Fitting model: KNeighborsClassifierUnif ... Training model for up to 57.16s of the 57.16s of remaining time.
    0.0902   = Validation balanced_accuracy score
    0.01s    = Training runtime
    0.11s    = Validation runtime
Fitting model: KNeighborsClassifierDist ... Training model for up to 57.04s of the 57.04s of remaining time.
    0.1136   = Validation balanced_accuracy score
    0.01s    = Training runtime
    0.11s    = Validation runtime
Fitting model: LightGBMClassifier ... Training model for up to 56.92s of the 56.92s of remaining time.
    0.2171   = Validation balanced_accuracy score
    7.1s     = Training runtime
    0.01s    = Validation runtime
Fitting model: CatboostClassifier ... Training model for up to 49.81s of the 49.81s of remaining time.
    0.2761   = Validation balanced_accuracy score
    5.61s    = Training runtime
    0.01s    = Validation runtime
Fitting model: NeuralNetClassifier ... Training model for up to 44.18s of the 44.18s of remaining time.
    0.1747   = Validation balanced_accuracy score
    4.99s    = Training runtime
    0.02s    = Validation runtime
Fitting model: LightGBMClassifierCustom ... Training model for up to 39.16s of the 39.16s of remaining time.
    Ran out of time, early stopping on iteration 139. Best iteration is:
    [133]   train_set's multi_error: 0      train_set's multi_logloss: 0.0993952    train_set's balanced_accuracy: 1        valid_set's multi_error: 0.74   valid_set's multi_logloss: 2.73162      valid_set's balanced_accuracy: 0.195132
    0.1951   = Validation balanced_accuracy score
    39.75s   = Training runtime
    0.02s    = Validation runtime
Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.95s of the -1.94s of remaining time.
    0.3075   = Validation balanced_accuracy score
    0.52s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 62.49s ...
Predictive performance on given dataset: balanced_accuracy = 0.24778367136638768

Some other non-default metrics you might use include: f1 (for binary classification), roc_auc (for binary classification), log_loss (for classification), mean_absolute_error (for regression), and median_absolute_error (for regression). You can also define your own custom metric function; see the examples in the folder: autogluon/utils/tabular/metrics/
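
If you simply want to sanity-check a reported score, or compute an additional metric yourself, you can score the predictions directly with scikit-learn. A minimal sketch, reusing y_test and test_data from above (the recomputed value should roughly agree with what AutoGluon reported):

from sklearn.metrics import balanced_accuracy_score

y_pred = predictor.predict(test_data)
print("Balanced accuracy (recomputed):", balanced_accuracy_score(y_test, y_pred))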

Model ensembling with stacking/bagging

Beyond hyperparameter-tuning with a correctly-specified evaluation metric, two other methods to boost predictive performance are bagging and stack-ensembling. You’ll often see performance improve if you specify num_bagging_folds = 5-10, stack_ensemble_levels = 1-3 in the call to fit(), but this will increase training times.

predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric,
                     num_bagging_folds=5, stack_ensemble_levels=1,
                     hyperparameters = {'NN':{'num_epochs':5}, 'GBM':{'num_boost_round':100}})
No output_directory specified. Models will be saved in: AutogluonModels/ag-20200524_022845/
Beginning AutoGluon training ...
AutoGluon will save models to AutogluonModels/ag-20200524_022845/
Train Data Rows:    500
Train Data Columns: 15
Preprocessing data ...
Here are the first 10 unique label values in your data:  [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
AutoGluon infers your prediction problem is: multiclass  (because dtype of label-column == object).
If this is wrong, please specify problem_type argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
Feature Generator processed 499 data points with 14 features
Original Features:
    int features: 6
    object features: 8
Generated Features:
    int features: 0
All Features:
    int features: 6
    object features: 8
    Data preprocessing and feature engineering runtime = 0.05s ...
AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: balanced_accuracy
Fitting model: LightGBMClassifier_STACKER_l0 ...
    0.2122   = Validation balanced_accuracy score
    21.98s   = Training runtime
    0.05s    = Validation runtime
Fitting model: NeuralNetClassifier_STACKER_l0 ...
    0.1186   = Validation balanced_accuracy score
    3.99s    = Training runtime
    0.16s    = Validation runtime
Fitting model: weighted_ensemble_k0_l1 ...
    0.2122   = Validation balanced_accuracy score
    0.18s    = Training runtime
    0.0s     = Validation runtime
Fitting model: LightGBMClassifier_STACKER_l1 ...
    0.2225   = Validation balanced_accuracy score
    23.13s   = Training runtime
    0.07s    = Validation runtime
Fitting model: NeuralNetClassifier_STACKER_l1 ...
    0.1265   = Validation balanced_accuracy score
    4.23s    = Training runtime
    0.2s     = Validation runtime
Fitting model: weighted_ensemble_k0_l2 ...
    0.2225   = Validation balanced_accuracy score
    0.18s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 54.44s ...

You should not provide tuning_data when stacking/bagging; instead, provide all of your available data as train_data (which AutoGluon will split in more intelligent ways). Rather than manually searching for good bagging/stacking values yourself, AutoGluon will automatically select good values for you if you specify auto_stack instead:

predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric, auto_stack=True,
                     hyperparameters = {'NN':{'num_epochs':5}, 'GBM':{'num_boost_round':100}}, time_limits = 60) # last 2 arguments are just for quick demo, should be omitted
No output_directory specified. Models will be saved in: AutogluonModels/ag-20200524_022940/
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to AutogluonModels/ag-20200524_022940/
Train Data Rows:    500
Train Data Columns: 15
Preprocessing data ...
Here are the first 10 unique label values in your data:  [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
AutoGluon infers your prediction problem is: multiclass  (because dtype of label-column == object).
If this is wrong, please specify problem_type argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
Feature Generator processed 499 data points with 14 features
Original Features:
    int features: 6
    object features: 8
Generated Features:
    int features: 0
All Features:
    int features: 6
    object features: 8
    Data preprocessing and feature engineering runtime = 0.05s ...
AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: balanced_accuracy
Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 59.95s of the 59.95s of remaining time.
    0.2122   = Validation balanced_accuracy score
    21.97s   = Training runtime
    0.05s    = Validation runtime
Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 37.85s of the 37.85s of remaining time.
    0.1026   = Validation balanced_accuracy score
    3.97s    = Training runtime
    0.16s    = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.95s of the 33.69s of remaining time.
    0.2122   = Validation balanced_accuracy score
    0.18s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 26.51s ...

Getting predictions (inference-time options)

Even if you’ve started a new Python session since last calling fit(), you can still load a previously trained predictor from disk:

predictor = task.load(output_directory)

Here, output_directory is the same folder previously passed to fit(), in which all the trained models have been saved. You can easily train models on one machine and deploy them on another: simply copy the output_directory folder to the new machine and specify its new path in task.load().
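
For example (the destination path below is purely illustrative):

from autogluon import TabularPrediction as task

# On the deployment machine, after copying the output_directory folder over:
predictor = task.load('/home/deploy/agModels-predictOccupation')  # illustrative path on the new machine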

predictor can make a prediction on an individual example rather than a full dataset:

datapoint = test_data.iloc[[0]]  # Note: .iloc[0] won't work because it returns pandas Series instead of DataFrame
print(datapoint)
print(predictor.predict(datapoint))
   age workclass  fnlwgt education  education-num       marital-status
0   31   Private  169085      11th              7   Married-civ-spouse

  relationship    race      sex  capital-gain  capital-loss  hours-per-week
0         Wife   White   Female             0             0              20

   native-country   class
0   United-States   <=50K
[' Other-service']

To output predicted class probabilities instead of predicted classes, you can use:

class_probs = predictor.predict_proba(datapoint)
print(class_probs)
[[0.01991053 0.1471116  0.05631462 0.09948063 0.02103386 0.04543482
  0.1033187  0.25142987 0.         0.07429797 0.01883532 0.0948907
  0.026055   0.04188637]]

By default, predict() and predict_proba() will utilize the model that AutoGluon thinks is most accurate, which is usually an ensemble of many individual models. We can instead specify a particular model to use for predictions (e.g. to reduce inference latency). Before deciding which model to use, let’s evaluate all of the models AutoGluon has previously trained using our validation dataset:

results = predictor.leaderboard(val_data)
                         model  score_test  score_val  pred_time_test  pred_time_val   fit_time  pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer
0      weighted_ensemble_k0_l1    0.247784   0.307451        2.698889       0.611743  59.694218                 0.016151                0.000994           0.519744            1       True
1           CatboostClassifier    0.243838   0.276136        0.040768       0.009529   5.608336                 0.040768                0.009529           5.608336            0       True
2   RandomForestClassifierGini    0.239444   0.231698        0.229646       0.112191   0.605833                 0.229646                0.112191           0.605833            0       True
3   RandomForestClassifierEntr    0.239113   0.251511        0.229426       0.110828   0.604185                 0.229426                0.110828           0.604185            0       True
4     ExtraTreesClassifierEntr    0.231311   0.196011        0.248257       0.110864   0.500452                 0.248257                0.110864           0.500452            0       True
5     ExtraTreesClassifierGini    0.228143   0.220154        0.256896       0.111225   0.501498                 0.256896                0.111225           0.501498            0       True
6           LightGBMClassifier    0.196499   0.217144        0.031639       0.010885   7.099118                 0.031639                0.010885           7.099118            0       True
7     LightGBMClassifierCustom    0.178989   0.195132        0.504253       0.015831  39.746551                 0.504253                0.015831          39.746551            0       True
8          NeuralNetClassifier    0.159579   0.174723        1.169405       0.024319   4.993695                 1.169405                0.024319           4.993695            0       True
9     KNeighborsClassifierUnif    0.073979   0.090152        0.109532       0.108153   0.007965                 0.109532                0.108153           0.007965            0       True
10    KNeighborsClassifierDist    0.071566   0.113624        0.111172       0.107787   0.007292                 0.111172                0.107787           0.007292            0       True

Here’s how to specify a particular model to use for prediction instead of AutoGluon’s default model-choice:

i = 0  # index of model to use
model_to_use = predictor.model_names[i]
model_pred = predictor.predict(datapoint, model=model_to_use)
print("Prediction from %s model: %s" % (model_to_use, model_pred))
Prediction from RandomForestClassifierGini model: [' Other-service']

The predictor also remembers which evaluation metric predictions should be scored with; evaluation against ground-truth labels can be done as follows:

y_pred = predictor.predict(test_data)
predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred, auxiliary_metrics=True)

However, you must be careful here, as certain metrics require predicted probabilities rather than predicted classes. Since the label column remains in the val_data DataFrame, we can instead use the shorthand:

predictor.evaluate(val_data)

which will correctly select between predict() or predict_proba() depending on the evaluation metric.

Maximizing predictive performance

To get the best predictive accuracy with AutoGluon, you should generally use it like this:

long_time = 60 # for quick demonstration only; you should set this to the longest time (in seconds) you are willing to wait
predictor = task.fit(train_data=train_data, label=label_column, eval_metric=metric, auto_stack=True, time_limits=long_time)
No output_directory specified. Models will be saved in: AutogluonModels/ag-20200524_023011/
Beginning AutoGluon training ... Time limit = 60s
AutoGluon will save models to AutogluonModels/ag-20200524_023011/
Train Data Rows:    500
Train Data Columns: 15
Preprocessing data ...
Here are the first 10 unique label values in your data:  [' Tech-support', ' Transport-moving', ' Other-service', ' ?', ' Handlers-cleaners', ' Sales', ' Craft-repair', ' Adm-clerical', ' Exec-managerial', ' Prof-specialty']
AutoGluon infers your prediction problem is: multiclass  (because dtype of label-column == object).
If this is wrong, please specify problem_type argument in fit() instead (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])

Warning: Some classes in the training set have fewer than 10 examples. AutoGluon will only keep 13 out of 14 classes for training and will not try to predict the rare classes. To keep more classes, increase the number of datapoints from these rare classes in the training data or reduce label_count_threshold.
Fraction of data from classes with at least 10 examples that will be kept for training models: 0.998
Feature Generator processed 499 data points with 14 features
Original Features:
    int features: 6
    object features: 8
Generated Features:
    int features: 0
All Features:
    int features: 6
    object features: 8
    Data preprocessing and feature engineering runtime = 0.05s ...
AutoGluon will gauge predictive performance using evaluation metric: balanced_accuracy
To change this, specify the eval_metric argument of fit()
AutoGluon will early stop models using evaluation metric: balanced_accuracy
Fitting model: RandomForestClassifierGini_STACKER_l0 ... Training model for up to 59.95s of the 59.95s of remaining time.
    0.2257   = Validation balanced_accuracy score
    3.04s    = Training runtime
    0.55s    = Validation runtime
Fitting model: RandomForestClassifierEntr_STACKER_l0 ... Training model for up to 56.23s of the 56.23s of remaining time.
    0.2115   = Validation balanced_accuracy score
    3.05s    = Training runtime
    0.55s    = Validation runtime
Fitting model: ExtraTreesClassifierGini_STACKER_l0 ... Training model for up to 52.5s of the 52.5s of remaining time.
    0.2214   = Validation balanced_accuracy score
    2.52s    = Training runtime
    0.55s    = Validation runtime
Fitting model: ExtraTreesClassifierEntr_STACKER_l0 ... Training model for up to 49.23s of the 49.23s of remaining time.
    0.212    = Validation balanced_accuracy score
    2.52s    = Training runtime
    0.55s    = Validation runtime
Fitting model: KNeighborsClassifierUnif_STACKER_l0 ... Training model for up to 45.97s of the 45.97s of remaining time.
    0.0689   = Validation balanced_accuracy score
    0.05s    = Training runtime
    0.54s    = Validation runtime
Fitting model: KNeighborsClassifierDist_STACKER_l0 ... Training model for up to 45.38s of the 45.38s of remaining time.
    0.0708   = Validation balanced_accuracy score
    0.05s    = Training runtime
    0.54s    = Validation runtime
Fitting model: LightGBMClassifier_STACKER_l0 ... Training model for up to 44.78s of the 44.78s of remaining time.
    Ran out of time, early stopping on iteration 165. Best iteration is:
    [25]    train_set's multi_error: 0.320802       train_set's multi_logloss: 1.29038      train_set's balanced_accuracy: 0.625395 valid_set's multi_error: 0.73   valid_set's multi_logloss: 2.27283      valid_set's balanced_accuracy: 0.220647
    Ran out of time, early stopping on iteration 173. Best iteration is:
    [82]    train_set's multi_error: 0.0225564      train_set's multi_logloss: 0.478952     train_set's balanced_accuracy: 0.985162 valid_set's multi_error: 0.73   valid_set's multi_logloss: 2.56851      valid_set's balanced_accuracy: 0.227756
    Ran out of time, early stopping on iteration 185. Best iteration is:
    [37]    train_set's multi_error: 0.215539       train_set's multi_logloss: 1.00849      train_set's balanced_accuracy: 0.789105 valid_set's multi_error: 0.78   valid_set's multi_logloss: 2.37467      valid_set's balanced_accuracy: 0.175384
    0.2122   = Validation balanced_accuracy score
    37.84s   = Training runtime
    0.05s    = Validation runtime
Fitting model: CatboostClassifier_STACKER_l0 ... Training model for up to 6.82s of the 6.82s of remaining time.
    0.2671   = Validation balanced_accuracy score
    6.42s    = Training runtime
    0.04s    = Validation runtime
Fitting model: NeuralNetClassifier_STACKER_l0 ... Training model for up to 0.34s of the 0.34s of remaining time.
    Ran out of time, stopping training early.
    Time limit exceeded... Skipping NeuralNetClassifier_STACKER_l0.
Completed 1/20 k-fold bagging repeats ...
Fitting model: weighted_ensemble_k0_l1 ... Training model for up to 59.95s of the -0.07s of remaining time.
    0.2684   = Validation balanced_accuracy score
    0.56s    = Training runtime
    0.0s     = Validation runtime
AutoGluon training complete, total runtime = 60.64s ...

This command implements the following strategy to maximize accuracy:

  • Specify the auto_stack argument, which allows AutoGluon to automatically construct model ensembles based on multi-layer stack ensembling with repeated bagging, and will greatly improve the resulting predictions if granted sufficient training time.

  • Provide the eval_metric if you know what metric will be used to evaluate predictions in your application (e.g. roc_auc, log_loss, mean_absolute_error, etc.)

  • Include all your data in train_data and do not provide tuning_data (AutoGluon will split the data more intelligently to fit its needs).

  • Do not specify the hyperparameter_tune argument (counterintuitively, hyperparameter tuning is not the best way to spend a limited training time budget, as model ensembling is often superior). We recommend you only use hyperparameter_tune if your goal is to deploy a single model rather than an ensemble.

  • Do not specify the hyperparameters argument (allow AutoGluon to adaptively select which models/hyperparameters to use).

  • Set time_limits to the longest amount of time (in seconds) that you are willing to wait. AutoGluon’s predictive performance improves the longer fit() is allowed to run.
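
Putting these recommendations together, a typical high-quality run might look like the sketch below (the metric, time budget, and folder name are placeholders you should adapt to your application):

predictor = task.fit(
    train_data=train_data,                    # all available labeled data; no tuning_data
    label=label_column,
    eval_metric='balanced_accuracy',          # placeholder: the metric your application actually cares about
    auto_stack=True,                          # let AutoGluon choose bagging/stacking settings itself
    time_limits=4*60*60,                      # placeholder: the longest time (in seconds) you are willing to wait
    output_directory='agModels-maxQuality',   # placeholder folder in which to save trained models
)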