Text Classification - Quick Start

Note: TextClassification is in preview mode and is not feature complete. While the tutorial described below is functional, using TextClassification on custom datasets is not yet supported. As an alternative, text data can be passed to TabularPrediction in tabular format, which has text feature support.
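For instance, a minimal sketch of that alternative might look like the following. The CSV path and column name here are hypothetical placeholders; see the Tabular tutorials for the full TabularPrediction API:

from autogluon import TabularPrediction as tabular_task

# Hypothetical CSV with a free-text feature column and a label column.
train_data = tabular_task.Dataset(file_path='train.csv')
tabular_predictor = tabular_task.fit(train_data=train_data, label='label')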

We adopt the task of Text Classification as a running example to illustrate basic usage of AutoGluon’s NLP capability.

In this tutorial, we use sentiment analysis as our text classification example. We will load sentences and their corresponding labels (sentiments) into AutoGluon and use this data to obtain a neural network that can classify new sentences. Unlike traditional machine learning, where we must manually define the neural network and specify the hyperparameters of the training process, with just a single call to AutoGluon’s fit function, AutoGluon automatically trains many models under many different hyperparameter configurations and returns the best one.

We begin by specifying TextClassification as our task of interest:

import autogluon as ag
from autogluon import TextClassification as task

Create AutoGluon Dataset

We are using a subset of the Stanford Sentiment Treebank (SST). The original dataset consists of sentences from movie reviews and human annotations of their sentiment. The task is to classify whether a given sentence has positive or negative sentiment (binary classification).

dataset = task.Dataset(name='ToySST')

The above call loads the built-in ToySST dataset together with its proper train/validation/test splits.

Use AutoGluon to fit Models

Now, we want to obtain a neural network classifier using AutoGluon. In its default configuration, rather than attempting to train complex models from scratch on our data, AutoGluon fine-tunes neural networks that have already been pretrained on a large-scale text corpus such as Wikicorpus. Although our dataset contains entirely different text, the lower-level language representations captured by the pretrained network are likely to remain useful for our own text data.

While we primarily stick with default configurations in this Beginner tutorial, the Advanced tutorial covers various options that you can specify for greater control over the training process.

However, neural network training can be quite time-consuming. To ensure quick runtimes, we tell AutoGluon to obey strict limits: epochs specifies how much computational effort can be devoted to training any single network, while time_limits (in seconds) specifies how much total time fit has to return a model.
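For illustration, a call constraining both budgets might look like the sketch below (the 10-minute time_limits value is an arbitrary assumption for illustration, not a recommendation):

# Sketch only (not run in this tutorial): cap both the per-network
# training effort and the total search time.
predictor = task.fit(dataset, epochs=1, time_limits=10*60)

For demo purposes, this tutorial only passes a small value for epochs: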

predictor = task.fit(dataset, epochs=1)
Warning: TextClassification is in preview mode and is not feature complete. Using TextClassification on custom datasets is not yet supported. For an alternative, text data can be passed to TabularPrediction in tabular format which has text feature support.
Starting Experiments
Num of Finished Tasks is 0
Num of Pending Tasks is 2
scheduler: FIFOScheduler(
DistributedResourceManager{
(Remote: Remote REMOTE_ID: 0,
    <Remote: 'inproc://172.31.45.231/17299/1' processes=1 threads=8, memory=33.24 GB>, Resource: NodeResourceManager(8 CPUs, 1 GPUs))
})
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
[Epoch 0] loss=0.6932, lr=0.0000032, metrics:accuracy:0.5384: 100%|██████████| 427/427 [05:06<00:00,  1.39it/s]
100%|██████████| 6/6 [00:01<00:00,  3.77it/s]
validation metrics:accuracy:0.5057
Finished Task with config: {'lr': 6.32456e-05, 'net.choice': 0, 'pretrained_dataset.choice': 0} and reward: 0.5057471264367817
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
[Epoch 0] loss=0.2312, lr=0.0000013, metrics:accuracy:0.8599: 100%|██████████| 427/427 [05:16<00:00,  1.35it/s]
100%|██████████| 6/6 [00:01<00:00,  3.66it/s]
validation metrics:accuracy:0.9080
Finished Task with config: {'lr': 2.584378989482461e-05, 'net.choice': 0, 'pretrained_dataset.choice': 1} and reward: 0.9080459770114943
Using gradient accumulation. Effective batch size = batch_size * accumulate = 32
[Epoch 0] loss=0.2312, lr=0.0000013, metrics:accuracy:0.8599: 100%|██████████| 427/427 [05:15<00:00,  1.35it/s]
100%|██████████| 6/6 [00:01<00:00,  3.73it/s]
validation metrics:accuracy:0.9080

Within fit, the model with the best hyperparameter configuration is selected based on its validation accuracy after being trained on the data in the training split.

The best Top-1 accuracy achieved on the validation set is:

print('Top-1 val acc: %.3f' % predictor.results['best_reward'])
Top-1 val acc: 0.908

Within fit, this best model is also retrained on our entire dataset (i.e., merging training and validation data) using the same optimal hyperparameter configuration. The resulting model is the final model that we apply to classify new text.

Since the dataset object created above already contains the test split, we can directly evaluate the final model produced by fit on the test data:

test_acc = predictor.evaluate(dataset)
print('Top-1 test acc: %.3f' % test_acc)
Top-1 test acc: 0.908

Given an example sentence, we can easily use the final model to predict its label:

sentence = 'I feel this is awesome!'
ind = predictor.predict(sentence)
print('The input sentence sentiment is classified as [%d].' % ind.asscalar())
The input sentence sentiment is classified as [1].
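The same predict call can be applied to each sentence we want to classify. For example, a small batch of sentences can be handled with a simple loop (the sentences below are made-up examples):

# Classify several sentences one at a time using the fitted predictor.
sentences = ['This movie was a waste of time.',
             'An absolute masterpiece!']
for s in sentences:
    ind = predictor.predict(s)
    print('"%s" -> class [%d]' % (s, ind.asscalar()))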

The results attribute of the returned predictor contains summaries describing various aspects of the training process. For example, we can inspect the best hyperparameter configuration, corresponding to the final model which achieved the (best) results above:

print('The best configuration is:')
print(predictor.results['best_config'])
The best configuration is:
{'lr': 2.584378989482461e-05, 'net.choice': 0, 'pretrained_dataset.choice': 1}
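
Beyond best_config and best_reward, the other available summaries can be listed directly, assuming results behaves like a dictionary (an assumption based on the key lookups above; consult the API docs to confirm):

# results is indexed like a dict above; list whatever summary keys it holds.
print(sorted(predictor.results.keys()))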