autogluon.task

Example (image classification task):

Tell AutoGluon that the task is image classification:

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task

Load a toy image dataset:

>>> filename = ag.download('http://autogluon-hackathon.s3-website-us-west-2.amazonaws.com/data.zip')
>>> ag.unzip(filename)
>>> dataset = task.Dataset(train_path='data/train')

Fit classification models:

>>> classifier = task.fit(dataset, epochs=2)

Evaluate predictions on test data:

>>> test_dataset = task.Dataset('data/test', train=False)
>>> test_acc = classifier.evaluate(test_dataset)
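
Make a prediction for a single test image (the example path comes from the toy dataset downloaded above):

>>> image = 'data/test/BabyShirt/BabyShirt_323.jpg'
>>> ind, prob = classifier.predict(image)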

AutoGluon Tasks

Prediction tasks built into AutoGluon such that a single call to fit() can produce high-quality trained models. For other applications, you can still use AutoGluon to tune the hyperparameters of your own custom models and training scripts.

TabularPrediction

AutoGluon Task for predicting values in a column of a tabular dataset (classification or regression)

ImageClassification

AutoGluon Task for classifying images based on their content

ObjectDetection

AutoGluon Task for detecting and locating objects in images

TextClassification

AutoGluon Task for classifying text snippets based on their content

TabularPrediction

class autogluon.task.TabularPrediction

AutoGluon Task for predicting values in a column of a tabular dataset (classification or regression)

Methods

Dataset

alias of autogluon.task.tabular_prediction.dataset.TabularDataset

Predictor

alias of autogluon.task.tabular_prediction.predictor.TabularPredictor

fit(train_data, label[, tuning_data, …])

Fit models to predict a column of a data table based on the other columns.

load(output_directory[, verbosity])

Load a predictor object previously produced by fit() from file and return this object.

Dataset

alias of autogluon.task.tabular_prediction.dataset.TabularDataset

Predictor

alias of autogluon.task.tabular_prediction.predictor.TabularPredictor

static fit(train_data, label, tuning_data=None, output_directory=None, problem_type=None, eval_metric=None, stopping_metric=None, auto_stack=False, hyperparameter_tune=False, feature_prune=False, holdout_frac=None, num_bagging_folds=0, num_bagging_sets=None, stack_ensemble_levels=0, hyperparameters=None, cache_data=True, time_limits=None, num_trials=None, search_strategy='random', search_options=None, nthreads_per_trial=None, ngpus_per_trial=None, dist_ip_addrs=None, visualizer='none', verbosity=2, **kwargs)

Fit models to predict a column of a data table based on the other columns.

Parameters
train_datastr or autogluon.task.tabular_prediction.TabularDataset or pandas.DataFrame

Table of the training data, which is similar to pandas DataFrame. If str is passed, train_data will be loaded using the str value as the file path.

labelstr

Name of the column that contains the target variable to predict.

tuning_datastr or autogluon.task.tabular_prediction.TabularDataset or pandas.DataFrame, default = None

Another dataset containing validation data reserved for hyperparameter tuning (in same format as training data). If str is passed, tuning_data will be loaded using the str value as the file path. Note: final model returned may be fit on this tuning_data as well as train_data. Do not provide your evaluation test data here! In particular, when num_bagging_folds > 0 or stack_ensemble_levels > 0, models will be trained on both tuning_data and train_data. If tuning_data = None, fit() will automatically hold out some random validation examples from train_data.

output_directorystr

Path to directory where models and intermediate outputs should be saved. If unspecified, a time-stamped folder called “autogluon-fit-[TIMESTAMP]” will be created in the working directory to store all models. Note: To call fit() twice and save all results of each fit, you must specify different output_directory locations. Otherwise files from first fit() will be overwritten by second fit().

problem_typestr, default = None

Type of prediction problem, i.e. is this a binary/multiclass classification or regression problem (options: ‘binary’, ‘multiclass’, ‘regression’). If problem_type = None, the prediction problem type is inferred based on the label-values in provided dataset.

eval_metricfunction or str, default = None

Metric by which predictions will be ultimately evaluated on test data. AutoGluon tunes factors such as hyperparameters, early-stopping, ensemble-weights, etc. in order to improve this metric on validation data.

If eval_metric = None, it is automatically chosen based on problem_type. Defaults to ‘accuracy’ for binary and multiclass classification and ‘root_mean_squared_error’ for regression.

Options for classification: [‘accuracy’, ‘balanced_accuracy’, ‘f1’, ‘f1_macro’, ‘f1_micro’, ‘f1_weighted’, ‘roc_auc’, ‘average_precision’, ‘precision’, ‘precision_macro’, ‘precision_micro’, ‘precision_weighted’, ‘recall’, ‘recall_macro’, ‘recall_micro’, ‘recall_weighted’, ‘log_loss’, ‘pac_score’].

Options for regression: [‘root_mean_squared_error’, ‘mean_squared_error’, ‘mean_absolute_error’, ‘median_absolute_error’, ‘r2’]. For more information on these options, see sklearn.metrics: https://scikit-learn.org/stable/modules/classes.html#sklearn-metrics-metrics

You can also pass your own evaluation function here as long as it follows formatting of the functions defined in autogluon/utils/tabular/metrics/.

stopping_metricfunction or str, default = None

Metric which iteratively-trained models use to early stop to avoid overfitting. stopping_metric is not used by weighted ensembles, instead weighted ensembles maximize eval_metric. Defaults to eval_metric value except when eval_metric=’roc_auc’, where it defaults to log_loss. Options are identical to options for eval_metric.

auto_stackbool, default = False

Whether AutoGluon should automatically utilize bagging and multi-layer stack ensembling to boost predictive accuracy. Set this = True if you are willing to tolerate longer training times in order to maximize predictive accuracy! Note: This overrides num_bagging_folds and stack_ensemble_levels arguments (selects optimal values for these parameters based on dataset properties). Note: This can increase training time (and inference time) by up to 20x, but can greatly improve predictive performance.

hyperparameter_tunebool, default = False

Whether to tune hyperparameters or just use fixed hyperparameter values for each model. Setting as True will increase fit() runtimes. It is currently not recommended to use hyperparameter_tune with auto_stack due to potential overfitting. Use auto_stack to maximize predictive accuracy; use hyperparameter_tune if you prefer to deploy just a single model rather than an ensemble.

feature_prunebool, default = False

Whether or not to perform feature selection.

hyperparametersdict
Keys are strings that indicate which model types to train.

Options include: ‘NN’ (neural network), ‘GBM’ (LightGBM boosted trees), ‘CAT’ (CatBoost boosted trees), ‘RF’ (random forest), ‘XT’ (extremely randomized trees), ‘KNN’ (k-nearest neighbors). If a certain key is missing from hyperparameters, then fit() will not train any models of that type. For example, set hyperparameters = { ‘NN’:{…} } if you only want to train neural networks and no other types of models.

Values = dict of hyperparameter settings for each model type.

Each hyperparameter can either be a single fixed value or a search space containing many possible values. Unspecified hyperparameters will be set to default values (or default search spaces if hyperparameter_tune = True). Caution: Any provided search spaces will be overridden by fixed defaults if hyperparameter_tune = False. A combined usage sketch appears after the Examples section below.

Note: hyperparameters can also take a special key ‘custom’, which maps to a list of model names (currently supported options = ‘GBM’). If hyperparameter_tune = False, then these additional models will also be trained using custom pre-specified hyperparameter settings that are known to work well.

Details regarding the hyperparameters you can specify for each model are provided in the following files:
NN: autogluon/utils/tabular/ml/models/tabular_nn/hyperparameters/parameters.py

Note: certain hyperparameter settings may cause these neural networks to train much slower.

GBM: autogluon/utils/tabular/ml/models/lgb/hyperparameters/parameters.py

See also the lightGBM docs: https://lightgbm.readthedocs.io/en/latest/Parameters.html

CAT: autogluon/utils/tabular/ml/models/catboost/hyperparameters/parameters.py

See also the CatBoost docs: https://catboost.ai/docs/concepts/parameter-tuning.html

RF: See sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Note: Hyperparameter tuning is disabled for this model. Note: the ‘criterion’ parameter will be overridden; both ‘gini’ and ‘entropy’ are used automatically, training two models.

XT: See sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.ExtraTreesClassifier.html

Note: Hyperparameter tuning is disabled for this model. Note: the ‘criterion’ parameter will be overridden; both ‘gini’ and ‘entropy’ are used automatically, training two models.

KNN: See sklearn documentation: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html

Note: Hyperparameter tuning is disabled for this model. Note: the ‘weights’ parameter will be overridden; both ‘distance’ and ‘uniform’ are used automatically, training two models.

holdout_fracfloat

Fraction of train_data to holdout as tuning data for optimizing hyperparameters (ignored unless tuning_data = None, ignored if num_bagging_folds != 0). Default value is selected based on the number of rows in the training data. Default values range from 0.2 at 2,500 rows to 0.01 at 250,000 rows. Default value is doubled if hyperparameter_tune = True, up to a maximum of 0.2. Disabled if num_bagging_folds >= 2.

num_bagging_foldsint, default = 0

Number of folds used for bagging of models. When num_bagging_folds = k, training time is roughly increased by a factor of k (set = 0 to disable bagging). Disabled by default, but we recommend values between 5-10 to maximize predictive performance. Increasing num_bagging_folds will result in models with lower bias but that are more prone to overfitting. Values > 10 may produce diminishing returns, and can even harm overall results due to overfitting. To further improve predictions, avoid increasing num_bagging_folds much beyond 10 and instead increase num_bagging_sets.

num_bagging_setsint

Number of repeats of kfold bagging to perform (values must be >= 1). Total number of models trained during bagging = num_bagging_folds * num_bagging_sets. Defaults to 1 if time_limits is not specified, otherwise 20 (always disabled if num_bagging_folds is not specified). Values greater than 1 will result in superior predictive performance, especially on smaller problems and with stacking enabled (reduces overall variance).

stack_ensemble_levelsint, default = 0

Number of stacking levels to use in stack ensemble. Roughly increases model training time by factor of stack_ensemble_levels+1 (set = 0 to disable stack ensembling). Disabled by default, but we recommend values between 1-3 to maximize predictive performance. To prevent overfitting, this argument is ignored unless you have also set num_bagging_folds >= 2.

cache_databool, default = True

When enabled, the training and validation data are saved to disk for future reuse. Enables advanced functionality in the resulting Predictor object such as feature importance calculation on the original data.

time_limitsint

Approximately how long fit() should run for (wallclock time in seconds). If not specified, fit() will run until all models have completed training, but will not repeatedly bag models unless num_bagging_sets is specified.

num_trialsint

Maximal number of different hyperparameter settings of each model type to evaluate during HPO (only matters if hyperparameter_tune = True). If both time_limits and num_trials are specified, time_limits takes precedence.

search_strategystr

Which hyperparameter search algorithm to use (only matters if hyperparameter_tune = True). Options include: ‘random’ (random search), ‘skopt’ (SKopt Bayesian optimization), ‘grid’ (grid search), ‘hyperband’ (Hyperband)

search_optionsdict

Auxiliary keyword arguments to pass to the searcher that performs hyperparameter optimization.

nthreads_per_trialint

How many CPUs to use in each training run of an individual model. This is automatically determined by AutoGluon when left as None (based on available compute).

ngpus_per_trialint

How many GPUs to use in each trial (i.e. single training run of a model). This is automatically determined by AutoGluon when left as None.

dist_ip_addrslist

List of IP addresses corresponding to remote workers, in order to leverage distributed computation.

visualizerstr

How to visualize the neural network training progress during fit(). Options: [‘mxboard’, ‘tensorboard’, ‘none’].

verbosity: int, default = 2

Verbosity levels range from 0 to 4 and control how much information is printed during fit(). Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings). If using logging, you can alternatively control the amount of information printed via logger.setLevel(L), where L ranges from 0 to 50 (Note: higher values of L correspond to fewer print statements, the opposite of verbosity levels).

Kwargs can include additional arguments for advanced users:
feature_generator_typeFeatureGenerator class, default=`AutoMLFeatureGenerator`

A FeatureGenerator class specifying which feature engineering protocol to follow (see autogluon.utils.tabular.features.abstract_feature_generator.AbstractFeatureGenerator). Note: The file containing your FeatureGenerator class must be imported into current Python session in order to use a custom class.

feature_generator_kwargsdict, default={}

Keyword arguments to pass into the FeatureGenerator constructor.

trainer_typeTrainer class, default=`AutoTrainer`

A class inheriting from autogluon.utils.tabular.ml.trainer.abstract_trainer.AbstractTrainer that controls training/ensembling of many models. Note: In order to use a custom Trainer class, you must import the class file that defines it into the current Python session.

label_count_thresholdint, default = 10

For multi-class classification problems, this is the minimum number of times a label must appear in dataset in order to be considered an output class. AutoGluon will ignore any classes whose labels do not appear at least this many times in the dataset (i.e. will never predict them).

id_columnslist, default = []

Banned subset of column names that the model may not use as predictive features (e.g. columns containing the label, a user-ID, etc.). These columns are ignored during fit(), but a DataFrame of just these columns with appended predictions may be produced, for example to submit in an ML competition.

Returns
autogluon.task.tabular_prediction.TabularPredictor object which can make predictions on new data and summarize what happened during fit().

Examples

>>> from autogluon import TabularPrediction as task
>>> train_data = task.Dataset(file_path='https://autogluon.s3-us-west-2.amazonaws.com/datasets/Inc/train.csv')
>>> label_column = 'class'
>>> predictor = task.fit(train_data=train_data, label=label_column)
>>> test_data = task.Dataset(file_path='https://autogluon.s3-us-west-2.amazonaws.com/datasets/Inc/test.csv')
>>> y_test = test_data[label_column]
>>> test_data = test_data.drop(labels=[label_column], axis=1)
>>> y_pred = predictor.predict(test_data)
>>> perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred)
>>> results = predictor.fit_summary()

To maximize predictive performance, use the following:

>>> eval_metric = 'roc_auc'  # set this to the metric you ultimately care about
>>> time_limits = 360  # set as long as you are willing to wait (in sec)
>>> predictor = task.fit(train_data=train_data, label=label_column, eval_metric=eval_metric, auto_stack=True, time_limits=time_limits)
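
As a further sketch combining several of the arguments documented above (custom hyperparameters plus bagging and stacking); the specific hyperparameter names ‘num_epochs’ and ‘num_boost_round’ are illustrative assumptions rather than a vetted configuration:

>>> hyperparameters = {'NN': {'num_epochs': 10}, 'GBM': {'num_boost_round': 200}}
>>> predictor = task.fit(train_data=train_data, label=label_column,
>>>                      hyperparameters=hyperparameters,
>>>                      num_bagging_folds=5, stack_ensemble_levels=1)
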
static load(output_directory, verbosity=2)

Load a predictor object previously produced by fit() from file and return this object.

Parameters
output_directorystr

Path to directory where trained models are stored (i.e. the output_directory specified in previous call to fit).

verbosityint, default = 2

Verbosity levels range from 0 to 4 and control how much information will be printed by the loaded Predictor. Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings). If using logging, you can alternatively control the amount of information printed via logger.setLevel(L), where L ranges from 0 to 50 (Note: higher values of L correspond to fewer print statements, the opposite of verbosity levels).

Returns
autogluon.task.tabular_prediction.TabularPredictor object that can be used to make predictions.
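
A minimal sketch of reloading a saved predictor; the 'ag_models/' path is illustrative, and the logging call shows the logger-based alternative to verbosity mentioned above (it is an assumption that AutoGluon logs via the root logger):

>>> import logging
>>> from autogluon import TabularPrediction as task
>>> predictor = task.load('ag_models/', verbosity=1)
>>> logging.getLogger().setLevel(logging.WARNING)  # assumption: root logger controls AutoGluon's log output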

ImageClassification

class autogluon.task.ImageClassification

AutoGluon Task for classifying images based on their content

Methods

Classifier(model, results, eval_func, …[, …])

Trained Image Classifier returned by fit() that can be used to make predictions on new images.

Dataset(*args, **kwargs)

Dataset for AutoGluon image classification tasks.

fit(dataset[, net, optimizer, loss, …])

Fit image classification models to a given dataset.

class Classifier(model, results, eval_func, scheduler_checkpoint, args, ensemble=0, format_results=True, **kwargs)

Trained Image Classifier returned by fit() that can be used to make predictions on new images.

Examples

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> dataset = task.Dataset(train_path='data/train',
>>>                        test_path='data/test')
>>> classifier = task.fit(dataset,
>>>                       net=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600,
>>>                       ngpus_per_trial=1,
>>>                       num_trials=4)
>>> image = 'data/test/BabyShirt/BabyShirt_323.jpg'
>>> ind, prob = classifier.predict(image)

Methods

evaluate(self, dataset[, input_size, ctx])

Evaluate predictive performance of trained image classifier using given test data.

evaluate_predictions(self, y_true, y_pred)

Evaluate the provided list of predictions against list of ground truth labels according to the task-specific evaluation metric (self.eval_func).

fit_summary(self[, output_directory, verbosity])

Returns a summary of the fit process.

load(checkpoint)

Load trained Image Classifier from directory specified by checkpoint.

predict(self, X[, input_size, crop_ratio, …])

Predict class-index and associated class probability for each image in a given dataset (or just a single image).

predict_proba(self, X)

Produces predicted class probabilities for a given image.

save(self, checkpoint)

Save image classifier to folder specified by checkpoint.

loader

state_dict

evaluate(self, dataset, input_size=224, ctx=[cpu(0)])

Evaluate predictive performance of trained image classifier using given test data.

Parameters
datasetautogluon.task.ImageClassification.Dataset

The dataset containing test images (must be in same format as the training dataset).

input_sizeint

Size of the images (pixels).

ctxList of mxnet.context elements.

Determines whether to use CPU or GPU(s), options include: [mx.cpu()] or [mx.gpu()].

Examples

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> train_data = task.Dataset(train_path='~/data/train')
>>> classifier = task.fit(train_data,
>>>                       net=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600, ngpus_per_trial=1, num_trials=4)
>>> test_data = task.Dataset('~/data/test', train=False)
>>> test_acc = classifier.evaluate(test_data)
evaluate_predictions(self, y_true, y_pred)

Evaluate the provided list of predictions against list of ground truth labels according to the task-specific evaluation metric (self.eval_func).

fit_summary(self, output_directory=None, verbosity=2)

Returns a summary of the fit process.

verbosity (int): how much output to print: <= 0 for no output printing, 1 for just a high-level summary, 2 for a summary and plot, >= 3 for all information contained in the results object.

classmethod load(checkpoint)

Load trained Image Classifier from directory specified by checkpoint.
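
A minimal sketch of saving a trained classifier and loading it back; the checkpoint path is illustrative, and it is assumed the Classifier class is accessible as task.Classifier:

>>> classifier.save('checkpoint/classifier.ag')
>>> classifier = task.Classifier.load('checkpoint/classifier.ag')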

predict(self, X, input_size=224, crop_ratio=0.875, set_prob_thresh=0.001, plot=False)

Predict class-index and associated class probability for each image in a given dataset (or just a single image).

Parameters
Xstr or autogluon.task.ImageClassification.Dataset or list of autogluon.task.ImageClassification.Dataset

If str, should be the path to the input image (when we just want to predict on a single image). If autogluon.task.ImageClassification.Dataset, should be a dataset of multiple images in the same format as the training dataset. If a list of autogluon.task.ImageClassification.Dataset, should be a set of test datasets at different scales of the original images.

input_sizeint

Size of the images (pixels).

plotbool

Whether to plot the image being classified.

set_prob_thresh: float

Results with predicted probability below this threshold are set to 0.

Examples
>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> train_data = task.Dataset(train_path='~/data/train')
>>> classifier = task.fit(train_data,
>>>                       net=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600, ngpus_per_trial=1, num_trials=4)
>>> test_data = task.Dataset('~/data/test', train=False)
>>> class_index, class_probability = classifier.predict('example.jpg')
predict_proba(self, X)

Produces predicted class probabilities for a given image.

save(self, checkpoint)

Save image classifier to folder specified by checkpoint.

static Dataset(*args, **kwargs)
Dataset for AutoGluon image classification tasks.

May either be an autogluon.task.image_classification.ImageFolderDataset, an autogluon.task.image_classification.RecordDataset, or a popular dataset already built into AutoGluon (‘mnist’, ‘fashionmnist’, ‘cifar10’, ‘cifar100’, ‘imagenet’).

Parameters
namestr, optional

Which built-in dataset to use, will override all other options if specified. The options are: ‘mnist’, ‘fashionmnist’, ‘cifar’, ‘cifar10’, ‘cifar100’, ‘imagenet’

trainbool, default = True

Whether this dataset should be used for training or validation.

train_pathstr

The training data location. If using ImageFolderDataset, the path to the image folder (path/to/the/folder) should be provided. If using RecordDataset, the path to the .rec file (path/to/*.rec) should be provided.

input_sizeint

The input image size.

crop_ratiofloat

Center crop ratio (for evaluation only).

Returns
Dataset object that can be passed to task.fit(), which is actually an autogluon.space.AutoGluonObject.
To interact with such an object yourself, you must first call Dataset.init() to instantiate the object in Python.
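
A minimal sketch of instantiating such a Dataset object yourself, following the note above about Dataset.init():

>>> dataset = task.Dataset(train_path='data/train')
>>> instantiated_dataset = dataset.init()  # materializes the underlying dataset object
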
static fit(dataset, net=Categorical['ResNet50_v1b', 'ResNet18_v1b'], optimizer=AutoGluonObject -- NAG, loss=AutoGluonObject -- SoftmaxCrossEntropyLoss, split_ratio=0.8, batch_size=64, input_size=224, epochs=20, final_fit_epochs=None, ensemble=1, metric='accuracy', nthreads_per_trial=60, ngpus_per_trial=1, hybridize=True, search_strategy='random', plot_results=False, verbose=False, search_options={}, time_limits=None, resume=False, output_directory='checkpoint/', visualizer='none', num_trials=2, dist_ip_addrs=[], grace_period=None, auto_search=True, lr_config=Dict{'lr_mode': 'cosine', 'lr_decay': 0.1, 'lr_decay_period': 0, 'lr_decay_epoch': '40,80', 'warmup_lr': 0.0, 'warmup_epochs': 0}, tricks=Dict{'last_gamma': False, 'use_pretrained': True, 'use_se': False, 'mixup': False, 'mixup_alpha': 0.2, 'mixup_off_epoch': 0, 'label_smoothing': False, 'no_wd': False, 'teacher_name': None, 'temperature': 20.0, 'hard_weight': 0.5, 'batch_norm': False, 'use_gn': False})

Fit image classification models to a given dataset.

Parameters
datasetstr or autogluon.task.ImageClassification.Dataset()

Training dataset containing images and their associated class labels. Popular image datasets built into AutoGluon can be used by specifying their name as a string (options: ‘mnist’, ‘fashionmnist’, ‘cifar’, ‘cifar10’, ‘cifar100’, ‘imagenet’).

input_sizeint

Size of images in the dataset (pixels).

netstr or autogluon.space.Categorical

Which existing neural network models to consider as candidates.

optimizerstr or autogluon.space.AutoGluonObject

Which optimizers to consider as candidates for learning the neural network weights.

batch_sizeint

How many images to group in each mini-batch during gradient computations in training.

epochs: int

How many epochs to train the neural networks for at most.

final_fit_epochs: int, default None

Number of epochs to use for the final fit; if not specified, the same number of epochs as during HPO is used.

metricstr or callable object

Evaluation metric by which predictions will be ultimately evaluated on test data.

lossmxnet.gluon.loss

Loss function used during training of the neural network weights.

num_trialsint

Maximal number of hyperparameter configurations to try out.

split_ratiofloat, default = 0.8

Fraction of dataset to use for training (rest of data is held-out for tuning hyperparameters). The final returned model may be fit to all of the data (after hyperparameters have been selected).

time_limitsint

Approximately how long fit() should run for (wallclock time in seconds). fit() will stop training new models after this amount of time has elapsed (but models which have already started training will continue to completion).

nthreads_per_trialint

How many CPUs to use in each trial (i.e. single training run of a model).

ngpus_per_trialint

How many GPUs to use in each trial (i.e. single training run of a model).

output_directorystr

Directory in which to save search results.

search_strategystr

Which hyperparameter search algorithm to use. Options include: ‘random’ (random search), ‘skopt’ (SKopt Bayesian optimization), ‘grid’ (grid search), ‘hyperband’ (Hyperband), ‘rl’ (reinforcement learner)

search_optionsdict

Auxiliary keyword arguments to pass to the searcher that performs hyperparameter optimization.

resumebool

If True and a model checkpoint file exists, model training will resume from there.

dist_ip_addrslist

List of IP addresses corresponding to remote workers, in order to leverage distributed computation.

verbosebool

Whether or not to print out intermediate information during training.

plot_resultsbool

Whether or not to generate plots summarizing training process.

visualizerstr

Describes method to visualize training progress during fit(). Options: [‘mxboard’, ‘tensorboard’, ‘none’].

grace_periodint

The grace period in early stopping when using Hyperband to tune hyperparameters. If None, this is set automatically.

auto_searchbool

If True, enables automatic suggestion of network types and hyper-parameter ranges adaptively based on provided dataset.

Returns
autogluon.task.image_classification.Classifier object which can make predictions on new data and summarize what happened during fit().

Examples

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> dataset = task.Dataset(train_path='data/train',
>>>                        test_path='data/test')
>>> classifier = task.fit(dataset,
>>>                       net=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600,
>>>                       ngpus_per_trial=1,
>>>                       num_trials=4)
>>> test_data = task.Dataset('~/data/test', train=False)
>>> test_acc = classifier.evaluate(test_data)

A bag of training tricks (see the tricks argument above) is applied to the image classification dataset.

ObjectDetection

class autogluon.task.ObjectDetection

AutoGluon Task for detecting and locating objects in images

Methods

Dataset(*args, **kwargs)

Dataset of images in which to detect objects.

fit([dataset, net, meta_arch, lr, loss, …])

Fit object detection models.

static Dataset(*args, **kwargs)

Dataset of images in which to detect objects.

static fit(dataset='voc', net=Categorical['mobilenet1.0'], meta_arch='yolo3', lr=Categorical[0.0005, 0.0001], loss=SoftmaxCrossEntropyLoss(batch_axis=0, w=None), split_ratio=0.8, batch_size=16, epochs=50, num_trials=2, nthreads_per_trial=12, num_workers=32, ngpus_per_trial=1, hybridize=True, search_strategy='random', search_options={}, time_limits=None, verbose=False, transfer='coco', resume='', checkpoint='checkpoint/exp1.ag', visualizer='none', dist_ip_addrs=[], grace_period=None, auto_search=True, seed=223, data_shape=416, start_epoch=0, lr_mode='step', lr_decay=0.1, lr_decay_period=0, lr_decay_epoch='160,180', warmup_lr=0.0, warmup_epochs=2, warmup_iters=1000, warmup_factor=0.3333333333333333, momentum=0.9, wd=0.0005, log_interval=100, save_prefix='', save_interval=10, val_interval=1, num_samples=-1, no_random_shape=False, no_wd=False, mixup=False, no_mixup_epochs=20, label_smooth=False, syncbn=False, reuse_pred_weights=True)

Fit object detection models.

Parameters
datasetstr or autogluon.task.ObjectDetection.Dataset

Training dataset containing images and corresponding object bounding boxes.

netstr, autogluon.space.AutoGluonObject

Which existing neural network base models to consider as candidates.

meta_archstr

Meta architecture of the model. Currently supports YOLOv3 (default) and Faster R-CNN. YOLOv3 is faster, while Faster R-CNN is more accurate.

lrfloat or autogluon.space

The learning rate to use in each update of the neural network weights during training.

lossmxnet.gluon.loss

Loss function used during training of the neural network weights.

split_ratiofloat

Fraction of dataset to hold-out during training in order to tune hyperparameters (i.e. validation data). The final returned model may be fit to all of the data (after hyperparameters have been selected).

batch_sizeint

How many images to group in each mini-batch during gradient computations in training.

epochs: int

How many epochs to train the neural networks for at most.

num_trialsint

Maximal number of hyperparameter configurations to try out.

nthreads_per_trialint

How many CPUs to use in each trial (i.e. single training run of a model).

num_workersint

How many CPUs to use for data loading during training of a model.

ngpus_per_trialint

How many GPUs to use in each trial (i.e. single training run of a model).

hybridizebool

Whether or not the MXNet neural network should be hybridized (for increased efficiency).

search_strategystr

Which hyperparameter search algorithm to use. Options include: ‘random’ (random search), ‘skopt’ (SKopt Bayesian optimization), ‘grid’ (grid search), ‘hyperband’ (Hyperband), ‘rl’ (reinforcement learner)

search_optionsdict

Auxiliary keyword arguments to pass to the searcher that performs hyperparameter optimization.

time_limitsint

Approximately how long fit() should run for (wallclock time in seconds). fit() will stop training new models after this amount of time has elapsed (but models which have already started training will continue to completion).

verbosebool

Whether or not to print out intermediate information during training.

checkpoint: str

The path to local directory where trained models will be saved.

resumestr

Path to checkpoint file of existing model, from which model training should resume.

visualizerstr

Describes method to visualize training progress during fit(). Options: [‘mxboard’, ‘tensorboard’, ‘none’].

dist_ip_addrslist

List of IP addresses corresponding to remote workers, in order to leverage distributed computation.

grace_periodint

The grace period in early stopping when using Hyperband to tune hyperparameters. If None, this is set automatically.

auto_searchbool

If True, enables automatic suggestion of network types and hyper-parameter ranges adaptively based on provided dataset.

seedint

Random seed to set for reproducibility.

data_shapeint

Shape of the image data.

start_epochint

Which epoch we begin training from (e.g. if we resume training of an existing model, then this argument may be set to the number of epochs the model has already been trained for previously).

lr_modestr

What sort of learning rate schedule should be followed during training.

lr_decayfloat

How much learning rate should be decayed during training.

lr_decay_periodint

How often learning rate should be decayed during training.

warmup_lrfloat

Learning rate to use during warm up period at the start of training.

warmup_epochsint

How many initial epochs constitute the “warm up” period of model training.

warmup_itersint

How many initial iterations constitute the “warm up” period of model training. This is used by R-CNNs.

warmup_factorfloat

Warm-up factor of the target learning rate; the initial learning rate starts from target lr * warmup_factor.

momentumfloat or autogluon.space

Momentum to use in optimization of neural network weights during training.

wdfloat or autogluon.space

Weight decay to use in optimization of neural network weights during training.

log_intervalint

Log results every so many epochs during training.

save_prefixstr

Prefix to append to file name for saved model.

save_intervalint

Save a copy of model every so many epochs during training.

val_intervalint

Evaluate performance on held-out validation data every so many epochs during training.

no_random_shapebool

Whether random shapes should not be used.

no_wdbool

Whether weight decay should be turned off.

mixupbool

Whether or not to utilize mixup data augmentation strategy.

no_mixup_epochsint

If using mixup, we first train model for this many epochs without mixup data augmentation.

label_smoothbool

Whether or not to utilize label smoothing.

syncbnbool

Whether or not to utilize synchronized batch normalization.

Returns
autogluon.task.object_detection.Detector object which can make predictions on new data and summarize what happened during fit().

Examples

>>> from autogluon import ObjectDetection as task
>>> detector = task.fit(dataset='voc', net='mobilenet1.0',
>>>                     time_limits=600, ngpus_per_trial=1, num_trials=1)
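
A minimal sketch of using the returned detector, assuming its predict() method returns class indices, class probabilities, and bounding-box locations for an image path (the file name is illustrative):

>>> image = 'street_scene.jpg'
>>> ind, prob, loc = detector.predict(image)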

TextClassification

class autogluon.task.TextClassification

AutoGluon Task for classifying text snippets based on their content

Methods

Dataset(*args, **kwargs)

Dataset of text examples to make predictions for.

fit([dataset, net, pretrained_dataset, lr, …])

Fit neural networks on text dataset.

static Dataset(*args, **kwargs)

Dataset of text examples to make predictions for. See autogluon.task.TextClassification.get_dataset()

static fit(dataset='SST', net=Categorical['bert_12_768_12'], pretrained_dataset=Categorical['book_corpus_wiki_en_uncased', 'openwebtext_book_corpus_wiki_en_uncased'], lr=Real: lower=2e-05, upper=0.0002, warmup_ratio=0.01, lr_scheduler='cosine', log_interval=100, seed=0, batch_size=32, dev_batch_size=32, max_len=128, dtype='float32', epochs=3, epsilon=1e-06, accumulate=1, early_stop=False, nthreads_per_trial=4, ngpus_per_trial=1, hybridize=True, search_strategy='random', search_options={}, time_limits=None, resume=False, checkpoint='checkpoint/exp1.ag', visualizer='none', num_trials=2, dist_ip_addrs=[], grace_period=None, auto_search=True, verbose=False, **kwargs)

Fit neural networks on text dataset.

Parameters
datasetstr or autogluon.task.TextClassification.Dataset

The training dataset. You can specify a string to use a popular built-in text dataset.

netstr or autogluon.space.Categorical

Which existing neural network models to consider as candidates.

pretrained_datasetstr, autogluon.space.Categorical

Which existing datasets to consider as candidates for transfer learning from.

lrfloat or autogluon.space

The learning rate to use in each update of the neural network weights during training.

warmup_ratiofloat

Ratio of overall training period considered as “warm up”.

lr_schedulerstr

Describes how learning rate should be adjusted over the course of training. Options include: ‘cosine’, ‘poly’.

log_intervalint

Log results every so many epochs during training.

seedint

Random seed to set for reproducibility.

batch_sizeint

How many examples to group in each mini-batch during gradient computations in training.

dev_batch_sizeint

How many examples to group in each mini-batch during performance evaluation over the validation dataset.

max_lenint

Maximum number of words in a single training example (i.e. one text snippet).

dtypestr

Dtype used to represent data fed to neural networks.

epochs: int

How many epochs to train the neural networks for at most.

epsilonfloat

Small number.

accumulateint

How many batches to accumulate losses/gradients over before performing a parameter update.

early_stopbool

Whether to utilize early stopping during training to avoid overfitting.

num_trialsint

Maximal number of hyperparameter configurations to try out.

nthreads_per_trialint

How many CPUs to use in each trial (i.e. single training run of a model).

ngpus_per_trialint

How many GPUs to use in each trial (i.e. single training run of a model).

hybridizebool

Whether or not the MXNet neural network should be hybridized (for increased efficiency).

search_strategystr

Which hyperparameter search algorithm to use. Options include: ‘random’ (random search), ‘skopt’ (SKopt Bayesian optimization), ‘grid’ (grid search), ‘hyperband’ (Hyperband), ‘rl’ (reinforcement learner)

search_optionsdict

Auxiliary keyword arguments to pass to the searcher that performs hyperparameter optimization.

time_limitsint

Approximately how long fit() should run for (wallclock time in seconds). fit() will stop training new models after this amount of time has elapsed (but models which have already started training will continue to completion).

verbosebool

Whether or not to print out intermediate information during training.

checkpoint: str

The path to local directory where trained models will be saved.

resumestr

Path to checkpoint file of existing model, from which model training should resume.

visualizerstr

Describes method to visualize training progress during fit(). Options: [‘mxboard’, ‘tensorboard’, ‘none’].

dist_ip_addrslist

List of IP addresses corresponding to remote workers, in order to leverage distributed computation.

grace_periodint

The grace period in early stopping when using Hyperband to tune hyperparameters. If None, this is set automatically.

auto_searchbool

If True, enables automatic suggestion of network types and hyper-parameter ranges adaptively based on provided dataset.

Returns
autogluon.task.text_classification.TextClassificationPredictor object which can make predictions on new data and summarize what happened during fit().

Examples

>>> from autogluon import TextClassification as task
>>> dataset = task.Dataset(name='ToySST')
>>> predictor = task.fit(dataset)
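
A minimal sketch of making a prediction with the returned predictor on a new text snippet (the example sentence is illustrative):

>>> sentence = 'I feel this is awesome!'
>>> ind = predictor.predict(sentence)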

Additional Tabular Prediction APIs

TabularPredictor

class autogluon.task.tabular_prediction.TabularPredictor(learner)

Object returned by fit() in Tabular Prediction tasks. Use for making predictions on new data and viewing information about models trained during fit().

Examples

>>> from autogluon import TabularPrediction as task
>>> train_data = task.Dataset(file_path='https://autogluon.s3-us-west-2.amazonaws.com/datasets/Inc/train.csv')
>>> predictor = task.fit(train_data=train_data, label='class')
>>> results = predictor.fit_summary()
>>> test_data = task.Dataset(file_path='https://autogluon.s3-us-west-2.amazonaws.com/datasets/Inc/test.csv')
>>> perf = predictor.evaluate(test_data)
Attributes
output_directorystr

Path to directory where all models used by this Predictor are stored.

problem_typestr

What type of prediction problem this Predictor has been trained for.

eval_metricfunction or str

What metric is used to evaluate predictive performance.

label_columnstr

Name of table column that contains data from the variable to predict (often referred to as: labels, response variable, target variable, dependent variable, Y, etc).

feature_typesdict

Inferred data type of each predictive variable (i.e. column of training data table used to predict label_column).

model_nameslist

List of model names trained during fit().

model_performancedict

Maps names of trained models to their predictive performance values attained on the validation dataset during fit().

class_labelslist

For multiclass problems, this list contains the class labels in sorted order of predict_proba() output. It is None for problems that are not multiclass. For example, if pred = predict_proba(x), then the i-th index of pred provides the predicted probability that x belongs to the class given by class_labels[i].
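
A minimal sketch of mapping predict_proba() output back to class labels for a multiclass predictor; test_data is assumed to be a table in the same format as the training data:

>>> probs = predictor.predict_proba(test_data)      # one row of class probabilities per example
>>> predictor.class_labels[probs[0].argmax()]       # most likely class for the first example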

Methods

evaluate(self, dataset[, silent])

Report the predictive performance evaluated for a given Dataset.

evaluate_predictions(self, y_true, y_pred[, …])

Evaluate the provided predictions against ground truth labels.

feature_importance(self[, model, dataset, …])

Calculates feature importance scores for the given model.

fit_summary(self[, verbosity])

Output summary of information about models produced during fit().

leaderboard(self[, dataset, …])

Output summary of information about models produced during fit() as a pandas DataFrame.

load(output_directory[, verbosity])

Load a predictor object previously produced by fit() from file and return this object.

predict(self, dataset[, model, as_pandas, …])

Use trained models to produce predicted labels (in classification) or response values (in regression).

predict_proba(self, dataset[, model, as_pandas])

Use trained models to produce predicted class probabilities rather than class-labels (if task is classification).

save(self)

Save this predictor to file in directory specified by this Predictor’s output_directory.

evaluate(self, dataset, silent=False)

Report the predictive performance evaluated for a given Dataset. This is basically a shortcut for: preds = predict(dataset); evaluate_predictions(dataset[label_column], preds, auxiliary_metrics=False), except that it automatically uses predict_proba() instead of predict() when appropriate.

Parameters
datasetstr or TabularDataset or pandas.DataFrame

This Dataset must also contain the label-column with the same column-name as specified during fit(). If str is passed, dataset will be loaded using the str value as the file path.

silentbool (optional)

Should performance results be printed?

Returns
Predictive performance value on the given dataset, based on the eval_metric used by this Predictor.
evaluate_predictions(self, y_true, y_pred, silent=False, auxiliary_metrics=False, detailed_report=True)

Evaluate the provided predictions against ground truth labels. Evaluation is based on the eval_metric previously specified to fit(), or the default metric if none was specified.

Parameters
y_truelist or numpy.array

The ordered collection of ground-truth labels.

y_predlist or numpy.array

The ordered collection of predictions. Caution: For certain types of eval_metric (such as ‘roc_auc’), y_pred must be predicted-probabilities rather than predicted labels.

silentbool (optional)

Should performance results be printed?

auxiliary_metrics: bool (optional)

Should we compute other (problem_type specific) metrics in addition to the default metric?

detailed_reportbool (optional)

Should we compute more detailed versions of the auxiliary_metrics? (requires auxiliary_metrics = True)

Returns
Scalar performance value if auxiliary_metrics = False.
If auxiliary_metrics = True, returns dict where keys = metrics, values = performance along each metric.
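
A minimal sketch reusing the y_test and y_pred arrays from the fit() example earlier in this document; the ‘accuracy’ key is an assumption about the dict returned for a classification problem:

>>> perf = predictor.evaluate_predictions(y_true=y_test, y_pred=y_pred,
>>>                                        auxiliary_metrics=True)
>>> perf['accuracy']
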
feature_importance(self, model=None, dataset=None, features=None, raw=True, subsample_size=10000, silent=False)

Calculates feature importance scores for the given model. A feature’s importance score represents the performance drop that results when the model makes predictions on a perturbed copy of the dataset where this feature’s values have been randomly shuffled across rows. A feature score of 0.01 would indicate that the predictive performance dropped by 0.01 when the feature was randomly shuffled. The higher the score a feature has, the more important it is to the model’s performance. If a feature has a negative score, this means that the feature is likely harmful to the final model, and a model trained with the feature removed would be expected to achieve a better predictive performance. Note that calculating feature importance can be a very computationally expensive process, particularly if the model uses hundreds or thousands of features. In many cases, this can take longer than the original model training. To estimate how long feature_importance(model, dataset, features) will take, it is roughly the time taken by predict_proba(dataset, model) multiplied by the number of features.

Parameters
modelstr, default = None

Model to get feature importances for, if None the best model is chosen. Valid models are listed in this predictor by calling predictor.model_names

datasetstr or TabularDataset or pandas.DataFrame (optional)

This Dataset must also contain the label-column with the same column-name as specified during fit(). If specified, then the dataset is used to calculate the feature importance scores. If str is passed, dataset will be loaded using the str value as the file path. If not specified, the original dataset used during fit() will be used if cache_data=True. Otherwise, an exception will be raised. Do not pass the training data through this argument, as the feature importance scores calculated will be inaccurate.

featureslist, default = None

List of str feature names that feature importances are calculated for and returned, specify None to get all feature importances. If you only want to compute feature importances for some of the features, you can pass their names in as a list of str.

rawbool, default = True

Whether to compute feature importance on raw features in the original data (after automated feature engineering) or on the features used by the particular model. For example, a stacker model uses both the original features and the predictions of the lower-level models. Note that for bagged models, feature importance calculation is not yet supported when both raw=True and dataset=None. Doing so will raise an exception.

subsample_sizeint, default = 10000

The number of rows to sample from dataset when computing feature importance. If subsample_size=None or dataset contains fewer than subsample_size rows, all rows will be used during computation. Larger values increase the accuracy of the feature importance scores. Runtime linearly scales with subsample_size.

silentbool, default = False

Whether to suppress logging output

Returns
Pandas pandas.Series of feature importance scores.
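
A minimal sketch of computing and inspecting feature importance scores; test_data is assumed to be a labeled table in the same format as the training data:

>>> fi = predictor.feature_importance(dataset=test_data)
>>> fi.sort_values(ascending=False).head()  # most important features first
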
fit_summary(self, verbosity=3)

Output summary of information about models produced during fit(). May create various generated summary plots and store them in folder: Predictor.output_directory.

Parameters
verbosityint, default = 3

Controls how detailed a summary to output. Set <= 0 for no output printing, 1 to print just a high-level summary, 2 to print the summary and create plots, >= 3 to print all information produced during fit().

Returns
Dict containing various detailed information. We do not recommend directly printing this dict as it may be very large.
leaderboard(self, dataset=None, only_pareto_frontier=False, silent=False)

Output summary of information about models produced during fit() as a pandas DataFrame. Includes information on test and validation scores for all models, model training times, inference times, and stack levels. Output DataFrame columns include:

‘model’: The name of the model.

‘score_val’: The validation score of the model on the ‘eval_metric’.

‘pred_time_val’: The inference time required to compute predictions on the validation data end-to-end. Equivalent to the sum of all ‘pred_time_val_marginal’ values for the model and all of its base models.

‘fit_time’: The fit time required to train the model end-to-end (including base models if the model is a stack ensemble). Equivalent to the sum of all ‘fit_time_marginal’ values for the model and all of its base models.

‘pred_time_val_marginal’: The inference time required to compute predictions on the validation data (ignoring inference times for base models). Note that this ignores the time required to load the model into memory when bagging is disabled.

‘fit_time_marginal’: The fit time required to train the model (ignoring base models).

‘stack_level’: The stack level of the model. A model with stack level N can take any set of models with stack level less than N as input, with stack level 0 models having no model inputs.

Parameters
datasetstr or TabularDataset or pandas.DataFrame (optional)

This Dataset must also contain the label-column with the same column-name as specified during fit(). If specified, then the leaderboard returned will contain additional columns ‘score_test’, ‘pred_time_test’, and ‘pred_time_test_marginal’:

‘score_test’: The score of the model on the ‘eval_metric’ for the dataset provided.

‘pred_time_test’: The true end-to-end wall-clock inference time of the model for the dataset provided. Equivalent to the sum of all ‘pred_time_test_marginal’ values for the model and all of its base models.

‘pred_time_test_marginal’: The inference time of the model for the dataset provided, minus the inference time for the model’s base models, if it has any. Note that this ignores the time required to load the model into memory when bagging is disabled.

If str is passed, dataset will be loaded using the str value as the file path.

only_pareto_frontierbool (optional)

If True, only return model information of models in the Pareto frontier of the accuracy/latency trade-off (models which achieve the highest score within their end-to-end inference time). At minimum this will include the model with the highest score and the model with the lowest inference time. This is useful when deciding which model to use during inference if inference time is a consideration. Models filtered out by this process would never be optimal choices for a user that only cares about model inference time and score.

silentbool (optional)

Should leaderboard DataFrame be printed?

Returns
Pandas pandas.DataFrame of model performance summary information.
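
A minimal sketch of producing a leaderboard on held-out labeled data; test_data is assumed to be a table in the same format as the training data:

>>> lb = predictor.leaderboard(dataset=test_data, silent=True)
>>> lb[['model', 'score_test', 'score_val', 'fit_time']]
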
classmethod load(output_directory, verbosity=2)

Load a predictor object previously produced by fit() from file and return this object. Is functionally equivalent to autogluon.task.tabular_prediction.TabularPrediction.load().

Parameters
output_directorystr

Path to directory where trained models are stored (i.e. the output_directory specified in previous call to fit()).

verbosityint, default = 2

Verbosity levels range from 0 to 4 and control how much information is generally printed by this Predictor. Higher levels correspond to more detailed print statements (you can set verbosity = 0 to suppress warnings). If using logging, you can alternatively control the amount of information printed via logger.setLevel(L), where L ranges from 0 to 50 (Note: higher values of L correspond to fewer print statements, the opposite of verbosity levels).

Returns
TabularPredictor object
predict(self, dataset, model=None, as_pandas=False, use_pred_cache=False, add_to_pred_cache=False)

Use trained models to produce predicted labels (in classification) or response values (in regression).

Parameters
datasetstr or TabularDataset or pandas.DataFrame

The dataset to make predictions for. Should contain same column names as training Dataset and follow same format (may contain extra columns that won’t be used by Predictor, including the label-column itself). If str is passed, dataset will be loaded using the str value as the file path.

modelstr (optional)

The name of the model to get predictions from. Defaults to None, which uses the highest scoring model on the validation set. Valid models are listed in this predictor by calling predictor.model_names

as_pandasbool (optional)

Whether to return the output as a pandas Series (True) or numpy array (False)

use_pred_cachebool (optional)

Whether to use previously cached predictions for table rows we have already predicted on before (can speed up repeated runs of predict() on multiple datasets with overlapping rows between them).

add_to_pred_cachebool (optional)

Whether these predictions should be cached for reuse in future predict() calls on the same table rows (can speed up repeated runs of predict() on multiple datasets with overlapping rows between them).

Returns
Array of predictions, one corresponding to each row in the given dataset. Either a numpy ndarray or a pandas Series, depending on the as_pandas argument.
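
A minimal sketch of obtaining predictions from one specific trained model rather than the best one, using the model_names attribute described above:

>>> predictor.model_names  # names of all trained models
>>> y_pred = predictor.predict(test_data, model=predictor.model_names[0], as_pandas=True)
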
predict_proba(self, dataset, model=None, as_pandas=False)

Use trained models to produce predicted class probabilities rather than class-labels (if task is classification).

Parameters
datasetstr or TabularDataset or pandas.DataFrame

The dataset to make predictions for. Should contain same column names as training Dataset and follow same format (may contain extra columns that won’t be used by Predictor, including the label-column itself). If str is passed, dataset will be loaded using the str value as the file path.

modelstr (optional)

The name of the model to get prediction probabilities from. Defaults to None, which uses the highest scoring model on the validation set. Valid models are listed in this predictor by calling predictor.model_names

as_pandasbool (optional)

Whether to return the output as a pandas object (True) or numpy array (False). Pandas object is a DataFrame if this is a multiclass problem, otherwise it is a Series.

Returns
Array of predicted class-probabilities, corresponding to each row in the given dataset.
May be a numpy ndarray or a pandas Series/DataFrame, depending on the as_pandas argument and the type of prediction problem.
save(self)

Save this predictor to file in directory specified by this Predictor’s output_directory. Note that fit() already saves the predictor object automatically (we do not recommend modifying the Predictor object yourself as it tracks many trained models).

TabularDataset

class autogluon.task.tabular_prediction.TabularDataset(*args, **kwargs)

A dataset in tabular format (with rows = samples, columns = features/variables). This object is essentially a pandas DataFrame (with some extra attributes) and all existing pandas methods can be applied to it. For full list of methods/attributes, see pandas Dataframe documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Parameters
file_pathstr (optional)

Path to the data file (may be on local filesystem or URL to cloud s3 bucket). At least one of file_path and df arguments must be specified when constructing a new TabularDataset.

dfpandas.DataFrame (optional)

If you already have your data in a pandas DataFrame, you can directly provide it by specifying df (see the sketch after the Examples below). At least one of the file_path and df arguments must be specified when constructing a new TabularDataset.

feature_typesdict (optional)

Mapping from column_names to string describing data type of each column. If not specified, AutoGluon’s fit() will automatically infer what type of data each feature contains.

subsampleint (optional)

If specified as k, we only keep the first k rows of the provided dataset.

namestr (optional)

Optional name to assign to dataset (has no effect beyond being accessible via TabularDataset.name).

Examples

>>> from autogluon import TabularPrediction as task  # Note: TabularPrediction.Dataset == TabularDataset.
>>> train_data = task.Dataset(file_path='https://autogluon.s3-us-west-2.amazonaws.com/datasets/Inc/train.csv')
>>> test_data = task.Dataset(file_path='https://autogluon.s3-us-west-2.amazonaws.com/datasets/Inc/test.csv')
>>> train_data.head(30)
>>> train_data.columns
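
A minimal sketch of constructing a TabularDataset directly from an in-memory pandas DataFrame via the df argument (the toy columns are illustrative):

>>> import pandas as pd
>>> df = pd.DataFrame({'feature': [1, 2, 3], 'class': [0, 1, 0]})
>>> dataset = task.Dataset(df=df, name='toy data')
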
Attributes
name: (str)

An optional name assigned to this TabularDataset.

file_path: (str)

Path to data file from which this TabularDataset was created.

feature_types: (dict)

Maps column-names to string describing the data type of each column in this TabularDataset.

subsample: (int)

Describes size of subsample retained in this TabularDataset (None if this is original dataset).

Note: In addition to these attributes, `TabularDataset` also shares all the same attributes and methods of a pandas Dataframe.
For detailed list, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Methods

abs(self)

Return a Series/DataFrame with absolute numeric value of each element.

add(self, other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator add).

add_prefix(self, prefix)

Prefix labels with string prefix.

add_suffix(self, suffix)

Suffix labels with string suffix.

agg(self, func[, axis])

Aggregate using one or more operations over the specified axis.

aggregate(self, func[, axis])

Aggregate using one or more operations over the specified axis.

align(self, other[, join, axis, level, …])

Align two objects on their axes with the specified join method for each axis Index.

all(self[, axis, bool_only, skipna, level])

Return whether all elements are True, potentially over an axis.

any(self[, axis, bool_only, skipna, level])

Return whether any element is True, potentially over an axis.

append(self, other[, ignore_index, …])

Append rows of other to the end of caller, returning a new object.

apply(self, func[, axis, broadcast, raw, …])

Apply a function along an axis of the DataFrame.

applymap(self, func)

Apply a function to a Dataframe elementwise.

as_blocks(self[, copy])

Convert the frame to a dict of dtype -> Constructor Types that each has a homogeneous dtype.

as_matrix(self[, columns])

Convert the frame to its Numpy-array representation.

asfreq(self, freq[, method, how, normalize, …])

Convert TimeSeries to specified frequency.

asof(self, where[, subset])

Return the last row(s) without any NaNs before where.

assign(self, \*\*kwargs)

Assign new columns to a DataFrame.

astype(self, dtype[, copy, errors])

Cast a pandas object to a specified dtype dtype.

at_time(self, time[, asof, axis])

Select values at particular time of day (e.g. 9:30AM).

between_time(self, start_time, end_time[, …])

Select values between particular times of the day (e.g., 9:00-9:30 AM).

bfill(self[, axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='bfill'.

bool(self)

Return the bool of a single element PandasObject.

boxplot(self[, column, by, ax, fontsize, …])

Make a box plot from DataFrame columns.

clip(self[, lower, upper, axis, inplace])

Trim values at input threshold(s).

clip_lower(self, threshold[, axis, inplace])

Trim values below a given threshold.

clip_upper(self, threshold[, axis, inplace])

Trim values above a given threshold.

combine(self, other, func[, fill_value, …])

Perform column-wise combine with another DataFrame.

combine_first(self, other)

Update null elements with value in the same location in other.

compound(self[, axis, skipna, level])

Return the compound percentage of the values for the requested axis.

copy(self[, deep])

Make a copy of this object’s indices and data.

corr(self[, method, min_periods])

Compute pairwise correlation of columns, excluding NA/null values.

corrwith(self, other[, axis, drop, method])

Compute pairwise correlation between rows or columns of DataFrame with rows or columns of Series or DataFrame.

count(self[, axis, level, numeric_only])

Count non-NA cells for each column or row.

cov(self[, min_periods])

Compute pairwise covariance of columns, excluding NA/null values.

cummax(self[, axis, skipna])

Return cumulative maximum over a DataFrame or Series axis.

cummin(self[, axis, skipna])

Return cumulative minimum over a DataFrame or Series axis.

cumprod(self[, axis, skipna])

Return cumulative product over a DataFrame or Series axis.

cumsum(self[, axis, skipna])

Return cumulative sum over a DataFrame or Series axis.

describe(self[, percentiles, include, exclude])

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

diff(self[, periods, axis])

First discrete difference of element.

div(self, other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

divide(self, other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

dot(self, other)

Compute the matrix multiplication between the DataFrame and other.

drop(self[, labels, axis, index, columns, …])

Drop specified labels from rows or columns.

drop_duplicates(self[, subset, keep, inplace])

Return DataFrame with duplicate rows removed, optionally only considering certain columns.

droplevel(self, level[, axis])

Return DataFrame with requested index / column level(s) removed.

dropna(self[, axis, how, thresh, subset, …])

Remove missing values.

duplicated(self[, subset, keep])

Return boolean Series denoting duplicate rows, optionally only considering certain columns.

eq(self, other[, axis, level])

Get Equal to of dataframe and other, element-wise (binary operator eq).

equals(self, other)

Test whether two objects contain the same elements.

eval(self, expr[, inplace])

Evaluate a string describing operations on DataFrame columns.

ewm(self[, com, span, halflife, alpha, …])

Provide exponential weighted functions.

expanding(self[, min_periods, center, axis])

Provide expanding transformations.

explode(self, column: Union[str, Tuple])

Transform each element of a list-like to a row, replicating the index values.

ffill(self[, axis, inplace, limit, downcast])

Synonym for DataFrame.fillna() with method='ffill'.

fillna(self[, value, method, axis, inplace, …])

Fill NA/NaN values using the specified method.

filter(self[, items, like, regex, axis])

Subset rows or columns of dataframe according to labels in the specified index.

first(self, offset)

Convenience method for subsetting initial periods of time series data based on a date offset.

first_valid_index(self)

Return index for first non-NA/null value.

floordiv(self, other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator floordiv).

from_dict(data[, orient, dtype, columns])

Construct DataFrame from dict of array-like or dicts.

from_items(items[, columns, orient])

Construct a DataFrame from a list of tuples.

from_records(data[, index, exclude, …])

Convert structured or record ndarray to DataFrame.

ge(self, other[, axis, level])

Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

get(self, key[, default])

Get item from object for given key (ex: DataFrame column).

get_dtype_counts(self)

Return counts of unique dtypes in this object.

get_ftype_counts(self)

Return counts of unique ftypes in this object.

get_value(self, index, col[, takeable])

Quickly retrieve single value at passed column and index.

get_values(self)

Return an ndarray after converting sparse values to dense.

groupby(self[, by, axis, level, as_index, …])

Group DataFrame or Series using a mapper or by a Series of columns.

gt(self, other[, axis, level])

Get Greater than of dataframe and other, element-wise (binary operator gt).

head(self[, n])

Return the first n rows.

hist(data[, column, by, grid, xlabelsize, …])

Make a histogram of the DataFrame’s columns.

idxmax(self[, axis, skipna])

Return index of first occurrence of maximum over requested axis.

idxmin(self[, axis, skipna])

Return index of first occurrence of minimum over requested axis.

infer_objects(self)

Attempt to infer better dtypes for object columns.

info(self[, verbose, buf, max_cols, …])

Print a concise summary of a DataFrame.

insert(self, loc, column, value[, …])

Insert column into DataFrame at specified location.

interpolate(self[, method, axis, limit, …])

Interpolate values according to different methods.

isin(self, values)

Whether each element in the DataFrame is contained in values.

isna(self)

Detect missing values.

isnull(self)

Detect missing values.

items(self)

Iterator over (column name, Series) pairs.

iteritems(self)

Iterator over (column name, Series) pairs.

iterrows(self)

Iterate over DataFrame rows as (index, Series) pairs.

itertuples(self[, index, name])

Iterate over DataFrame rows as namedtuples.

join(self, other[, on, how, lsuffix, …])

Join columns of another DataFrame.

keys(self)

Get the ‘info axis’ (see Indexing for more)

kurt(self[, axis, skipna, level, numeric_only])

Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).

kurtosis(self[, axis, skipna, level, …])

Return unbiased kurtosis over requested axis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).

last(self, offset)

Convenience method for subsetting final periods of time series data based on a date offset.

last_valid_index(self)

Return index for last non-NA/null value.

le(self, other[, axis, level])

Get Less than or equal to of dataframe and other, element-wise (binary operator le).

lookup(self, row_labels, col_labels)

Label-based “fancy indexing” function for DataFrame.

lt(self, other[, axis, level])

Get Less than of dataframe and other, element-wise (binary operator lt).

mad(self[, axis, skipna, level])

Return the mean absolute deviation of the values for the requested axis.

mask(self, cond[, other, inplace, axis, …])

Replace values where the condition is True.

max(self[, axis, skipna, level, numeric_only])

Return the maximum of the values for the requested axis.

mean(self[, axis, skipna, level, numeric_only])

Return the mean of the values for the requested axis.

median(self[, axis, skipna, level, numeric_only])

Return the median of the values for the requested axis.

melt(self[, id_vars, value_vars, var_name, …])

Unpivot a DataFrame from wide format to long format, optionally leaving identifier variables set.

memory_usage(self[, index, deep])

Return the memory usage of each column in bytes.

merge(self, right[, how, on, left_on, …])

Merge DataFrame or named Series objects with a database-style join.

min(self[, axis, skipna, level, numeric_only])

Return the minimum of the values for the requested axis.

mod(self, other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator mod).

mode(self[, axis, numeric_only, dropna])

Get the mode(s) of each element along the selected axis.

mul(self, other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

multiply(self, other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator mul).

ne(self, other[, axis, level])

Get Not equal to of dataframe and other, element-wise (binary operator ne).

nlargest(self, n, columns[, keep])

Return the first n rows ordered by columns in descending order.

notna(self)

Detect existing (non-missing) values.

notnull(self)

Detect existing (non-missing) values.

nsmallest(self, n, columns[, keep])

Return the first n rows ordered by columns in ascending order.

nunique(self[, axis, dropna])

Count distinct observations over requested axis.

pct_change(self[, periods, fill_method, …])

Percentage change between the current and a prior element.

pipe(self, func, \*args, \*\*kwargs)

Apply func(self, *args, **kwargs).

pivot(self[, index, columns, values])

Return reshaped DataFrame organized by given index / column values.

pivot_table(self[, values, index, columns, …])

Create a spreadsheet-style pivot table as a DataFrame.

plot

alias of pandas.plotting._core.PlotAccessor

pop(self, item)

Return item and drop from frame.

pow(self, other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator pow).

prod(self[, axis, skipna, level, …])

Return the product of the values for the requested axis.

product(self[, axis, skipna, level, …])

Return the product of the values for the requested axis.

quantile(self[, q, axis, numeric_only, …])

Return values at the given quantile over requested axis.

query(self, expr[, inplace])

Query the columns of a DataFrame with a boolean expression.

radd(self, other[, axis, level, fill_value])

Get Addition of dataframe and other, element-wise (binary operator radd).

rank(self[, axis, method, numeric_only, …])

Compute numerical data ranks (1 through n) along axis.

rdiv(self, other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

reindex(self[, labels, index, columns, …])

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.

reindex_like(self, other[, method, copy, …])

Return an object with matching indices as other object.

rename(self[, mapper, index, columns, axis, …])

Alter axes labels.

rename_axis(self[, mapper, index, columns, …])

Set the name of the axis for the index or columns.

reorder_levels(self, order[, axis])

Rearrange index levels using input order.

replace(self[, to_replace, value, inplace, …])

Replace values given in to_replace with value.

resample(self, rule[, how, axis, …])

Resample time-series data.

reset_index(self[, level, drop, inplace, …])

Reset the index, or a level of it.

rfloordiv(self, other[, axis, level, fill_value])

Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

rmod(self, other[, axis, level, fill_value])

Get Modulo of dataframe and other, element-wise (binary operator rmod).

rmul(self, other[, axis, level, fill_value])

Get Multiplication of dataframe and other, element-wise (binary operator rmul).

rolling(self, window[, min_periods, center, …])

Provide rolling window calculations.

round(self[, decimals])

Round a DataFrame to a variable number of decimal places.

rpow(self, other[, axis, level, fill_value])

Get Exponential power of dataframe and other, element-wise (binary operator rpow).

rsub(self, other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator rsub).

rtruediv(self, other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

sample(self[, n, frac, replace, weights, …])

Return a random sample of items from an axis of object.

select_dtypes(self[, include, exclude])

Return a subset of the DataFrame’s columns based on the column dtypes.

sem(self[, axis, skipna, level, ddof, …])

Return unbiased standard error of the mean over requested axis.

set_axis(self, labels[, axis, inplace])

Assign desired index to given axis.

set_index(self, keys[, drop, append, …])

Set the DataFrame index using existing columns.

set_value(self, index, col, value[, takeable])

Put single value at passed column and index.

shift(self[, periods, freq, axis, fill_value])

Shift index by desired number of periods with an optional time freq.

skew(self[, axis, skipna, level, numeric_only])

Return unbiased skew over requested axis, normalized by N-1.

slice_shift(self[, periods, axis])

Equivalent to shift without copying data.

sort_index(self[, axis, level, ascending, …])

Sort object by labels (along an axis).

sort_values(self, by[, axis, ascending, …])

Sort by the values along either axis.

sparse

alias of pandas.core.arrays.sparse.SparseFrameAccessor

squeeze(self[, axis])

Squeeze 1 dimensional axis objects into scalars.

stack(self[, level, dropna])

Stack the prescribed level(s) from columns to index.

std(self[, axis, skipna, level, ddof, …])

Return sample standard deviation over requested axis.

sub(self, other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

subtract(self, other[, axis, level, fill_value])

Get Subtraction of dataframe and other, element-wise (binary operator sub).

sum(self[, axis, skipna, level, …])

Return the sum of the values for the requested axis.

swapaxes(self, axis1, axis2[, copy])

Interchange axes and swap values axes appropriately.

swaplevel(self[, i, j, axis])

Swap levels i and j in a MultiIndex on a particular axis.

tail(self[, n])

Return the last n rows.

take(self, indices[, axis, is_copy])

Return the elements in the given positional indices along an axis.

to_clipboard(self[, excel, sep])

Copy object to the system clipboard.

to_csv(self[, path_or_buf, sep, na_rep, …])

Write object to a comma-separated values (csv) file.

to_dense(self)

Return dense representation of Series/DataFrame (as opposed to sparse).

to_dict(self[, orient, into])

Convert the DataFrame to a dictionary.

to_excel(self, excel_writer[, sheet_name, …])

Write object to an Excel sheet.

to_feather(self, fname)

Write out the binary feather-format for DataFrames.

to_gbq(self, destination_table[, …])

Write a DataFrame to a Google BigQuery table.

to_hdf(self, path_or_buf, key, \*\*kwargs)

Write the contained data to an HDF5 file using HDFStore.

to_html(self[, buf, columns, col_space, …])

Render a DataFrame as an HTML table.

to_json(self[, path_or_buf, orient, …])

Convert the object to a JSON string.

to_latex(self[, buf, columns, col_space, …])

Render an object to a LaTeX tabular environment table.

to_msgpack(self[, path_or_buf, encoding])

Serialize object to input file path using msgpack format.

to_numpy(self[, dtype, copy])

Convert the DataFrame to a NumPy array.

to_parquet(self, fname[, engine, …])

Write a DataFrame to the binary parquet format.

to_period(self[, freq, axis, copy])

Convert DataFrame from DatetimeIndex to PeriodIndex with desired frequency (inferred from index if not passed).

to_pickle(self, path[, compression, protocol])

Pickle (serialize) object to file.

to_records(self[, index, …])

Convert DataFrame to a NumPy record array.

to_sparse(self[, fill_value, kind])

Convert to SparseDataFrame.

to_sql(self, name, con[, schema, if_exists, …])

Write records stored in a DataFrame to a SQL database.

to_stata(self, fname[, convert_dates, …])

Export DataFrame object to Stata dta format.

to_string(self[, buf, columns, col_space, …])

Render a DataFrame to a console-friendly tabular output.

to_timestamp(self[, freq, how, axis, copy])

Cast to DatetimeIndex of timestamps, at beginning of period.

to_xarray(self)

Return an xarray object from the pandas object.

transform(self, func[, axis])

Call func on self producing a DataFrame with transformed values and that has the same axis length as self.

transpose(self, \*args, \*\*kwargs)

Transpose index and columns.

truediv(self, other[, axis, level, fill_value])

Get Floating division of dataframe and other, element-wise (binary operator truediv).

truncate(self[, before, after, axis, copy])

Truncate a Series or DataFrame before and after some index value.

tshift(self[, periods, freq, axis])

Shift the time index, using the index’s frequency if available.

tz_convert(self, tz[, axis, level, copy])

Convert tz-aware axis to target time zone.

tz_localize(self, tz[, axis, level, copy, …])

Localize tz-naive index of a Series or DataFrame to target time zone.

unstack(self[, level, fill_value])

Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels.

update(self, other[, join, overwrite, …])

Modify in place using non-NA values from another DataFrame.

var(self[, axis, skipna, level, ddof, …])

Return unbiased variance over requested axis.

where(self, cond[, other, inplace, axis, …])

Replace values where the condition is False.

xs(self, key[, axis, level, drop_level])

Return cross-section from the Series/DataFrame.

Additional Image Classification APIs

Classifier

class autogluon.task.image_classification.Classifier(model, results, eval_func, scheduler_checkpoint, args, ensemble=0, format_results=True, **kwargs)

Trained Image Classifier returned by fit() that can be used to make predictions on new images.

Examples

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> dataset = task.Dataset(train_path='data/train',
>>>                        test_path='data/test')
>>> classifier = task.fit(dataset,
>>>                       nets=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600,
>>>                       ngpus_per_trial=1,
>>>                       num_trials=4)
>>> image = 'data/test/BabyShirt/BabyShirt_323.jpg'
>>> ind, prob = classifier.predict(image)

Methods

evaluate(self, dataset[, input_size, ctx])

Evaluate predictive performance of trained image classifier using given test data.

load(checkpoint)

Load trained Image Classifier from directory specified by checkpoint.

predict(self, X[, input_size, crop_ratio, …])

Predict class-index and associated class probability for each image in a given dataset (or just a single image).

save(self, checkpoint)

Save image classifier to folder specified by checkpoint.

evaluate(self, dataset, input_size=224, ctx=[cpu(0)])

Evaluate predictive performance of trained image classifier using given test data.

Parameters
datasetautogluon.task.ImageClassification.Dataset

The dataset containing test images (must be in same format as the training dataset).

input_sizeint

Size of the images (pixels).

ctxList of mxnet.context elements.

Determines whether to use CPU or GPU(s), options include: [mx.cpu()] or [mx.gpu()].

Examples

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> train_data = task.Dataset(train_path='~/data/train')
>>> classifier = task.fit(train_data,
>>>                       nets=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600, ngpus_per_trial=1, num_trials=4)
>>> test_data = task.Dataset('~/data/test', train=False)
>>> test_acc = classifier.evaluate(test_data)

classmethod load(checkpoint)

Load trained Image Classifier from directory specified by checkpoint.

predict(self, X, input_size=224, crop_ratio=0.875, set_prob_thresh=0.001, plot=False)

Predict class-index and associated class probability for each image in a given dataset (or just a single image).

Parameters
Xstr or autogluon.task.ImageClassification.Dataset or list of autogluon.task.ImageClassification.Dataset

If str, should be the path to a single input image (when you just want to predict on one image). If autogluon.task.ImageClassification.Dataset, should be a dataset of multiple images in the same format as the training dataset. If a list of autogluon.task.ImageClassification.Dataset, should be a set of test datasets covering different scales of the original images.

input_sizeint

Size of the images (pixels).

plotbool

Whether to plot the image being classified.

set_prob_thresh: float

Predicted probabilities below this threshold are set to 0.

Examples

>>> import autogluon as ag
>>> from autogluon import ImageClassification as task
>>> train_data = task.Dataset(train_path='~/data/train')
>>> classifier = task.fit(train_data,
>>>                       nets=ag.space.Categorical('resnet18_v1', 'resnet34_v1'),
>>>                       time_limits=600, ngpus_per_trial=1, num_trials=4)
>>> test_data = task.Dataset('~/data/test', train=False)
>>> class_index, class_probability = classifier.predict('example.jpg')

save(self, checkpoint)

Save image classifier to folder specified by checkpoint.
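
A short sketch of saving and reloading a trained classifier (the checkpoint path is illustrative):

>>> from autogluon.task.image_classification import Classifier
>>> classifier.save('checkpoint/classifier.ag')  # classifier previously returned by task.fit()
>>> classifier = Classifier.load('checkpoint/classifier.ag')
>>> ind, prob = classifier.predict('data/test/BabyShirt/BabyShirt_323.jpg')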

get_dataset

autogluon.task.image_classification.get_dataset(path=None, train=True, name=None, input_size=224, crop_ratio=0.875, jitter_param=0.4, scale_ratio_choice=[], *args, **kwargs)

Method to produce an image classification dataset for AutoGluon; this can be an ImageFolderDataset, a RecordDataset, or a popular dataset already built into AutoGluon ('mnist', 'cifar10', 'cifar100', 'imagenet').

Parameters
namestr, optional

Which built-in dataset to use; this will override all other options if specified. The options are: 'mnist', 'cifar', 'cifar10', 'cifar100', 'imagenet'.

trainbool, default = True

Whether this dataset should be used for training or validation.

pathstr

The training data location. If using ImageFolderDataset, the path to the image folder (path/to/the/folder) should be provided. If using RecordDataset, the path to the .rec file (path/to/*.rec) should be provided.

input_sizeint

The input image size.

crop_ratiofloat

Center crop ratio (for evaluation only)

scale_ratio_choice: list

List of crop ratios (used for the test dataset only). Each ratio rescales the original image, a fixed-size crop (input_size) is then taken, and the resulting set of predictions is averaged.

Returns
Dataset object that can be passed to task.fit(), which is actually an autogluon.space.AutoGluonObject.
To interact with such an object yourself, you must first call Dataset.init() to instantiate the object in Python.
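
For example, a built-in dataset or a local image folder might be loaded as follows (a sketch; the local path is illustrative):

>>> from autogluon.task.image_classification import get_dataset
>>> cifar_data = get_dataset(name='cifar10', train=True)      # built-in dataset
>>> custom_data = get_dataset(path='data/train', train=True)  # ImageFolderDataset layout assumed
>>> dataset = custom_data.init()  # instantiate the underlying AutoGluonObject before direct use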

ImageFolderDataset

class autogluon.task.image_classification.ImageFolderDataset(*args, **kwargs)

A generic data loader where the images are arranged in this way on your local filesystem:

root/dog/a.png
root/dog/b.png
root/dog/c.png

root/cat/x.png
root/cat/y.png
root/cat/z.png

Here, the folder names dog and cat are the class labels, and the images with file names a, b, c belong to the dog class while the others are cat images.

Parameters
rootstring

Root directory path to the folder containing all of the data.

transformcallable (optional)

A function/transform that takes in a PIL image and returns a transformed version, e.g. transforms.RandomCrop.

is_valid_filecallable (optional)

A function that takes the path of an image file and checks whether the file is valid (used to filter out corrupt files).

Attributes
classeslist

List of the class names.

class_to_idxdict

Dict with items (class_name, class_index).

imgslist

List of (image path, class_index) tuples

Methods

__call__(self, \*args, \*\*kwargs)

Convenience method for interacting with AutoGluonObject.

init(self)

Instantiate an actual instance of this AutoGluonObject.

sample

property cs

ConfigSpace representation of this search space.

property default

Return default value for hyperparameter corresponding to this search space.

init(self)

Instantiate an actual instance of this AutoGluonObject. In order to interact with such an object, you must always first call: object.init().

property rand

Randomly sample configuration from this nested search space.
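
A minimal sketch of constructing and inspecting an ImageFolderDataset directly (assuming a folder layout like the one shown above under data/train):

>>> from autogluon.task.image_classification import ImageFolderDataset
>>> folder_data = ImageFolderDataset(root='data/train')
>>> dataset = folder_data.init()  # instantiate the actual dataset object
>>> print(dataset.classes)        # e.g. ['cat', 'dog']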

RecordDataset

class autogluon.task.image_classification.RecordDataset(*args, **kwargs)

A dataset wrapping over a RecordIO file containing images.

Each sample is an image and its corresponding label.

Parameters
filenamestr

Local path to the .rec file.

gray_scalebool, default = False

If True, always convert images to grayscale. If False, always convert images to color (RGB).

transformfunction, default None

A user defined callback that transforms each sample.

Attributes
cs

ConfigSpace representation of this search space.

default

Return default value for hyperparameter corresponding to this search space.

rand

Randomly sample configuration from this nested search space.

Methods

__call__(self, \*args, \*\*kwargs)

Convenience method for interacting with AutoGluonObject.

init(self)

Instantiate an actual instance of this AutoGluonObject.

sample

property cs

ConfigSpace representation of this search space.

property default

Return default value for hyperparameter corresponding to this search space.

init(self)

Instantiate an actual instance of this AutoGluonObject. In order to interact with such an object, you must always first call: object.init().

property rand

Randomly sample configuration from this nested search space.
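
A corresponding sketch for RecordIO data (the .rec path is illustrative):

>>> from autogluon.task.image_classification import RecordDataset
>>> record_data = RecordDataset(filename='data/train.rec')
>>> dataset = record_data.init()  # instantiate the actual dataset object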

Additional Object Detection APIs

Detector

class autogluon.task.object_detection.Detector(model, results, scheduler_checkpoint, args, **kwargs)

Trained Object Detector returned by task.fit()

Methods

evaluate(self, dataset[, ctx])

Evaluate performance of this object detector’s predictions on test data.

predict(self, X[, input_size, thresh, plot])

Use this object detector to make predictions on test data.

evaluate(self, dataset, ctx=[cpu(0)])

Evaluate performance of this object detector’s predictions on test data.

Parameters
dataset: `Dataset`

Test dataset (must be in the same format as training data previously provided to fit).

ctxList of mxnet.context elements.

Determines whether to use CPU or GPU(s), options include: [mx.cpu()] or [mx.gpu()].

predict(self, X, input_size=224, thresh=0.15, plot=True)

Use this object detector to make predictions on test data.

Parameters
X

Test data with image(s) to make predictions for.

input_sizeint

Size of images in test data (pixels).

threshfloat

Confidence threshold above which the detector outputs a bounding box for an object.

plotbool

Whether or not to plot the bounding box of detected objects on top of the original images.

Returns
Tuple containing the class-IDs of detected objects, the confidence-scores associated with
these detections, and the corresponding predicted bounding box locations.
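
A brief usage sketch (assuming detector was returned by task.fit(); 'street.jpg' and test_dataset are illustrative):

>>> class_ids, scores, bounding_boxes = detector.predict('street.jpg', plot=True)
>>> test_performance = detector.evaluate(test_dataset)  # test_dataset in the same format as the training data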

get_dataset

autogluon.task.object_detection.dataset.get_dataset(root='~/.mxnet/datasets/voc', index_file_name='trainval', name=None, classes=None, format='voc', Train=True, **kwargs)

Load dataset to use for object detection, which must be in either VOC or COCO format.

Parameters
rootstr

Path to folder storing the dataset.

index_file_namestr

Name of the .txt file that lists the images used for training or testing (the training/validation index file). This is only used for custom datasets.

name: str

Name of a built-in dataset ('voc', 'voc2007' or 'voc2012'). When using a built-in dataset, index_file_name should be None.

classestuple of classes, default = None

Users can specify classes for a custom dataset, e.g. classes = ('bike', 'bird', 'cat', …). We reuse the neural network weights if the corresponding class appears in the pretrained model. Otherwise, we randomly initialize the neural network weights for new classes.

formatstr

Format of the object detection dataset, either: ‘voc’ or ‘coco’. For details, see: autogluon/task/object_detection/dataset/voc.py, autogluon/task/object_detection/dataset/coco.py

Trainbool, default = True

Specify Train/Test mode. It is only valid when name is not None.

kwargskeyword arguments

Passed to either: autogluon.task.object_detection.dataset.CustomVOCDetection() or autogluon.task.object_detection.dataset.COCO().

Returns
Dataset object that can be passed to task.fit(), which is actually an autogluon.space.AutoGluonObject.
To interact with such an object yourself, you must first call Dataset.init() to instantiate the object in Python.
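
For example (a sketch; the root path and class names are illustrative):

>>> from autogluon.task.object_detection.dataset import get_dataset
>>> voc_data = get_dataset(name='voc2007', index_file_name=None)  # built-in dataset
>>> custom_data = get_dataset(root='~/data/my_voc', index_file_name='trainval',
>>>                           classes=('bike', 'bird'), format='voc', Train=True)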

CustomVOCDetection

class autogluon.task.object_detection.dataset.CustomVOCDetection(*args, **kwargs)

Custom Dataset which follows protocol/formatting of the well-known VOC object detection dataset.

Parameters
rootstr, default = '~/.mxnet/datasets/voc'

Path to folder storing the dataset.

splitslist of tuples

List of combinations of (year, name) to indicate how to split data into training, validation, and test sets. For the original VOC dataset, the year candidates can be: 2007, 2012. For the original VOC dataset, the name candidates can be: ‘train’, ‘val’, ‘trainval’, ‘test’. For the original VOC dataset, one might use for example: ((2007, ‘trainval’), (2012, ‘trainval’))

classes: tuple of classes

We reuse the neural network weights if the corresponding class appears in the pretrained model. Otherwise, we randomly initialize the neural network weights for new classes.

Returns
Dataset object that can be passed to task.fit(), which is actually an autogluon.space.AutoGluonObject.
To interact with such an object yourself, you must first call Dataset.init() to instantiate the object in Python.
Attributes
cs

ConfigSpace representation of this search space.

default

Return default value for hyperparameter corresponding to this search space.

rand

Randomly sample configuration from this nested search space.

Methods

__call__(self, \*args, \*\*kwargs)

Convenience method for interacting with AutoGluonObject.

init(self)

Instantiate an actual instance of this AutoGluonObject.

sample

CustomVOCDetectionBase

class autogluon.task.object_detection.dataset.CustomVOCDetectionBase(classes=None, root='~/.mxnet/datasets/voc', splits=((2007, 'trainval'), (2012, 'trainval')), transform=None, index_map=None, preload_label=True)

Base class for custom Dataset which follows protocol/formatting of the well-known VOC object detection dataset.

Parameters
classes: tuple of classes, default = None

We reuse the neural network weights if the corresponding class appears in the pretrained model. Otherwise, we randomly initialize the neural network weights for new classes.

rootstr, default = '~/.mxnet/datasets/voc'

Path to folder storing the dataset.

splitslist of tuples, default ((2007, ‘trainval’), (2012, ‘trainval’))

List of combinations of (year, name) For years, candidates can be: 2007, 2012. For names, candidates can be: ‘train’, ‘val’, ‘trainval’, ‘test’.

transformcallable, default = None

A function that takes data and label and transforms them. Refer to ./transforms for examples. A transform function for object detection should take label into consideration, because any geometric modification will require label to be modified.

index_mapdict, default = None

By default, the 20 classes are mapped into indices from 0 to 19. We can customize it by providing a str-to-int dict specifying how to map class names to indices. This is only for advanced users, when you want to swap the order of class labels.

preload_labelbool, default = True

If True, then parse and load all labels into memory during initialization. This often speeds things up but requires more memory; typical preloaded labels take tens of MB. You only need to disable it when your dataset is extremely large.

Attributes
classes

Category names.

num_class

Number of categories.

Methods

filter(self, fn)

Returns a new dataset with samples filtered by the filter function fn.

sample(self, sampler)

Returns a new dataset with elements sampled by the sampler.

shard(self, num_shards, index)

Returns a new dataset that includes only 1/num_shards of this dataset.

take(self, count)

Returns a new dataset with at most count number of samples in it.

transform(self, fn[, lazy])

Returns a new dataset with each sample transformed by the transformer function fn.

transform_first(self, fn[, lazy])

Returns a new dataset with the first element of each sample transformed by the transformer function fn.
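
These helpers follow the standard Gluon dataset interface; a small sketch (assuming voc_data is a CustomVOCDetectionBase instance):

>>> small_data = voc_data.take(100)       # keep at most 100 samples
>>> shard_0 = voc_data.shard(4, 0)        # first of four roughly equal shards
>>> same_data = voc_data.transform_first(lambda img: img)  # identity transform applied to the image only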

COCO

class autogluon.task.object_detection.dataset.COCO(*args, **kwargs)

Built-in class to work with the well-known COCO dataset for object detection.

Returns
Dataset object that can be passed to task.fit(), which is actually an autogluon.space.AutoGluonObject.
To interact with such an object yourself, you must first call Dataset.init() to instantiate the object in Python.
Attributes
cs

ConfigSpace representation of this search space.

default

Return default value for hyperparameter corresponding to this search space.

rand

Randomly sample configuration from this nested search space.

Methods

__call__(self, \*args, \*\*kwargs)

Convenience method for interacting with AutoGluonObject.

init(self)

Instantiate an actual instance of this AutoGluonObject.

sample

Additional Text Classification APIs

TextClassificationPredictor

class autogluon.task.text_classification.TextClassificationPredictor(model, transform, test_transform, results, scheduler_checkpoint, args)

Trained Text Classifier returned by fit() that can be used to make predictions on new text data.

Methods

evaluate(self, dataset[, ctx])

Evaluate predictive performance of trained text classifier using given test data.

predict(self, X)

Predict class-index of a given sentence / text-snippet.

predict_proba(self, X)

Predict class-probabilities of a given sentence / text-snippet.

evaluate(self, dataset, ctx=[cpu(0)])

Evaluate predictive performance of trained text classifier using given test data.

Parameters
datasetautogluon.task.TextClassification.Dataset

The dataset containing test sentences (must be in same format as the training dataset provided to fit).

ctxList of mxnet.context elements.

Determines whether to use CPU or GPU(s), options include: [mx.cpu()] or [mx.gpu()].

Examples

>>> from autogluon import TextClassification as task
>>> dataset = task.Dataset(test_path='~/data/test')
>>> test_performance = predictor.evaluate(dataset)

predict(self, X)

Predict class-index of a given sentence / text-snippet.

Parameters
Xstr

The input sentence we should classify.

Returns
Int corresponding to index of the predicted class.

Examples

>>> class_index = predictor.predict('this is cool')

predict_proba(self, X)

Predict class-probabilities of a given sentence / text-snippet.

Parameters
Xstr

The input sentence we should classify.

Returns
mxnet.NDArray containing predicted probabilities of each class.

Examples

>>> class_probs = predictor.predict_proba('this is cool')

get_dataset

autogluon.task.text_classification.get_dataset(path=None, name=None, train=True, *args, **kwargs)

Load a text classification dataset to train AutoGluon models on.

Parameters
pathstr

Path to local directory containing text dataset. This dataset should be in GLUE format.

namestr

Name describing which built-in popular text dataset to use (mostly from the GLUE NLP benchmark). Options include: ‘mrpc’, ‘qqp’, ‘qnli’, ‘rte’, ‘sts-b’, ‘cola’, ‘mnli’, ‘wnli’, ‘sst’, ‘toysst’. Detailed descriptions can be found in the file: autogluon/task/text_classification/dataset.py

trainbool

Whether this data will be used for training models.
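
A short sketch of loading text classification data (the local path is illustrative; built-in names follow the list above):

>>> from autogluon.task.text_classification import get_dataset
>>> sst_data = get_dataset(name='toysst')                           # small built-in SST subset
>>> custom_data = get_dataset(path='~/data/glue_mrpc', train=True)  # local dataset in GLUE format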