Text Prediction - Quick Start

Here we introduce the TextPrediction task, which helps you automatically train and deploy models for various Natural Language Processing (NLP) problems. This tutorial presents two examples, sentiment analysis and sentence similarity, to demonstrate how TextPrediction can be used for different NLP tasks.

The general usage is similar to AutoGluon’s TabularPrediction module. We treat NLP datasets as tables where certain columns contain text fields and a special column contains the labels to predict. Here, the labels can be discrete categories (classification) or numerical values (regression). TextPrediction fits neural networks to your data via transfer learning from pretrained NLP models such as BERT, ALBERT, and ELECTRA. TextPrediction also trains multiple models with different hyperparameters and returns the best model, a process called Hyperparameter Optimization (HPO).
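The HPO idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `validation_score` is a hypothetical stand-in for training and scoring a model, the learning-rate range is made up, and random search is used for simplicity rather than because it is what TextPrediction does internally.

```python
import random


def validation_score(learning_rate):
    """Hypothetical stand-in for 'train a model, then score it on held-out data'.

    The toy score peaks at learning_rate = 5e-5; a real trial would fine-tune a network.
    """
    return 1.0 - abs(learning_rate - 5e-5) / 1e-4


def random_search(num_trials, seed=123):
    """Try num_trials random learning rates and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float('-inf')
    for _ in range(num_trials):
        config = {'lr': rng.uniform(1e-5, 1e-4)}  # sample one candidate configuration
        score = validation_score(config['lr'])    # evaluate it on validation data
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = random_search(num_trials=4)
print(best_config, best_score)
```

Each loop iteration plays the role of one trial; the configuration with the highest validation score is the one returned.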

import numpy as np
import warnings

Sentiment Analysis

First, we consider the Stanford Sentiment Treebank (SST) dataset, which consists of movie reviews and their associated sentiment. Given a new movie review, the goal is to predict the sentiment reflected in the text (in this case a binary classification problem, where reviews are labeled as 1 if they convey a positive opinion and labeled as 0 otherwise). Let’s first load the data and view some examples, noting the labels are stored in a column called label.

from autogluon.utils.tabular.utils.loaders.load_pd import load
train_data = load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet')
dev_data = load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet')
rand_idx = np.random.permutation(np.arange(len(train_data)))[:2000]
train_data = train_data.iloc[rand_idx]
Loaded data from: https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/train.parquet | Columns = 2 / 2 | Rows = 67349 -> 67349
Loaded data from: https://autogluon-text.s3-accelerate.amazonaws.com/glue/sst/dev.parquet | Columns = 2 / 2 | Rows = 872 -> 872
       sentence                                            label
2434   goes by quickly                                     1
27796  reading lines from a teleprompter                   0
249    degraded , handheld blair witch video-cam foot...   0
12115  reminds us how realistically nuanced a robert ...   1
50834  indulges in the worst elements of all of them .     0
43622  are nowhere near as vivid as the 19th-century ...   0
3955   throughout a film that is both gripping and co...   1
51011  to see over and over again                          1
31232  that fails to match the freshness of the actre...   0
32153  this is an undeniably intriguing film from an ...   1

The data above happen to be stored in the Parquet table format, but you can also directly load() data from a CSV file instead. While here we load files from AWS S3 cloud storage, these could instead be local files on your machine. After loading, train_data is simply a Pandas DataFrame, where each row represents a different training example (for machine learning to be appropriate, the rows should be independent and identically distributed).
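To make the table layout concrete, here is a minimal sketch that builds the same kind of DataFrame from CSV text using plain pandas. The CSV contents are a made-up three-row sample (the sentences are taken from the table above), and `pd.read_csv` is used here as an ordinary pandas call, not as a description of what AutoGluon's load() does internally.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would be a file on disk or in S3.
csv_text = """sentence,label
goes by quickly,1
reading lines from a teleprompter,0
to see over and over again,1
"""

# One text column plus one label column, one training example per row.
toy_data = pd.read_csv(io.StringIO(csv_text))
print(toy_data.columns.tolist())
print(toy_data.shape)
```

Any DataFrame with this shape (text columns plus a label column) matches the layout the tutorial describes.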

To ensure this tutorial runs quickly, we simply call fit() with a subset of 2000 training examples and limit its runtime to approximately 1 minute. To achieve reasonable performance in your applications, you should set a much longer time_limits (e.g., 1 hour), or not specify time_limits at all.

from autogluon import TextPrediction as task

predictor = task.fit(train_data, label='label',
                     time_limits='1min', ngpus_per_trial=1, seed=123,
                     output_directory='./ag_sst')
NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using mxnet.numpy and mxnet.numpy_extension modules.
2020-09-19 07:53:56,548 - root - INFO - All Logs will be saved to ./ag_sst/ag_text_prediction.log
2020-09-19 07:53:56,557 - root - INFO - Train Dataset:
2020-09-19 07:53:56,558 - root - INFO - Columns:

- Text(
   length, min/avg/max=4/51.75/251
- Categorical(
   num_class (total/non_special)=2/2
   categories=[0, 1]
   freq=[695, 905]

2020-09-19 07:53:56,558 - root - INFO - Tuning Dataset:
2020-09-19 07:53:56,558 - root - INFO - Columns:

- Text(
   length, min/avg/max=5/55.80/259
- Categorical(
   num_class (total/non_special)=2/2
   categories=[0, 1]
   freq=[186, 214]

2020-09-19 07:53:56,559 - root - INFO - Label columns=['label'], Feature columns=['sentence'], Problem types=['classification'], Label shapes=[2]
2020-09-19 07:53:56,559 - root - INFO - Eval Metric=acc, Stop Metric=acc, Log Metrics=['f1', 'mcc', 'auc', 'acc', 'nll']
 84%|████████▍ | 169/200 [00:24<00:04,  6.90it/s]
 97%|█████████▋| 194/200 [00:27<00:00,  7.03it/s]
100%|██████████| 200/200 [00:28<00:00,  7.02it/s]

Above we specify that: the label column of our DataFrame contains the label-values to predict, AutoGluon should run for 60 seconds, each training run of an individual model (with particular hyperparameters) should run on 1 GPU, a particular random seed should be used to facilitate reproducibility, and that trained models should be saved in the ag_sst folder.

Now you can use predictor.evaluate() to evaluate the trained model on some separate test data.

dev_score = predictor.evaluate(dev_data, metrics='acc')
print('Total Time = {}s'.format(predictor.results['total_time']))
print('Accuracy = {:.2f}%'.format(dev_score['acc'] * 100))
Total Time = 99.78812861442566s
Accuracy = 83.60%
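For reference, accuracy is simply the fraction of examples whose predicted label matches the true label. The sketch below computes it by hand on made-up labels; the `accuracy` helper is illustrative and is not part of the AutoGluon API.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError('inputs must have the same length')
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up labels, purely for illustration.
y_true = [1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0]
print('Accuracy = {:.2f}%'.format(accuracy(y_true, y_pred) * 100))
```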

And you can easily obtain predictions from these models.

sentence1 = "it's a charming and often affecting journey."
sentence2 = "It's slow, very, very, very slow."
predictions = predictor.predict({'sentence': [sentence1, sentence2]})
print('"Sentence":', sentence1, '"Predicted Sentiment":', predictions[0])
print('"Sentence":', sentence2, '"Predicted Sentiment":', predictions[1])
"Sentence": it's a charming and often affecting journey. "Predicted Sentiment": 1
"Sentence": It's slow, very, very, very slow. "Predicted Sentiment": 0

For classification tasks, you can ask for predicted class-probabilities instead of predicted classes.

probs = predictor.predict_proba({'sentence': [sentence1, sentence2]})
print('"Sentence":', sentence1, '"Predicted Class-Probabilities":', probs[0])
print('"Sentence":', sentence2, '"Predicted Class-Probabilities":', probs[1])
"Sentence": it's a charming and often affecting journey. "Predicted Class-Probabilities": [0.00123726 0.9987627 ]
"Sentence": It's slow, very, very, very slow. "Predicted Class-Probabilities": [0.8059976  0.19400242]
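The two outputs are related in the usual way: assuming the predicted class is the one with the highest probability (which is consistent with the predictions shown earlier), the class can be recovered from the probabilities with an argmax. A minimal sketch, using the probability values printed above and a hypothetical helper name:

```python
# Class-probabilities copied from the output above; index i holds the probability of class i.
example_probs = [[0.00123726, 0.9987627],
                 [0.8059976, 0.19400242]]


def most_likely_class(class_probs):
    """Return the index of the largest probability."""
    return max(range(len(class_probs)), key=lambda i: class_probs[i])


predicted = [most_likely_class(p) for p in example_probs]
print(predicted)  # [1, 0]
```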

Sentence Similarity

Next, let’s use AutoGluon to train a model for evaluating how semantically similar two sentences are. We use the Semantic Textual Similarity Benchmark dataset for illustration.

train_data = load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sts/train.parquet')[['sentence1', 'sentence2', 'score']]
dev_data = load('https://autogluon-text.s3-accelerate.amazonaws.com/glue/sts/dev.parquet')[['sentence1', 'sentence2', 'score']]
   sentence1                                      sentence2                                          score
0  A plane is taking off.                         An air plane is taking off.                        5.00
1  A man is playing a large flute.                A man is playing a flute.                          3.80
2  A man is spreading shreded cheese on a pizza.  A man is spreading shredded cheese on an uncoo...  3.80
3  Three men are playing chess.                   Two men are playing chess.                         2.60
4  A man is playing the cello.                    A man seated is playing the cello.                 4.25
5  Some men are fighting.                         Two men are fighting.                              4.25
6  A man is smoking.                              A man is skating.                                  0.50
7  The man is playing the piano.                  The man is playing the guitar.                     1.60
8  A man is playing on a guitar and singing.      A woman is playing an acoustic guitar and sing...  2.20
9  A person is throwing a cat on to the ceiling.  A person throws a cat on the ceiling.              5.00

In this data, the score column contains numerical values (which we’d like to predict) that are human-annotated similarity scores for each given pair of sentences.

print('Min score=', min(train_data['score']), ', Max score=', max(train_data['score']))
Min score= 0.0 , Max score= 5.0

Let’s train a regression model to predict these scores with task.fit(). Note that we only need to specify the label column and AutoGluon automatically determines the type of prediction problem and an appropriate loss function. Once again, you should increase the short time_limits below to obtain reasonable performance in your own applications.

predictor_sts = task.fit(train_data, label='score',
                         time_limits='1min', ngpus_per_trial=1, seed=123,
                         output_directory='./ag_sts')
2020-09-19 07:55:48,203 - root - INFO - All Logs will be saved to ./ag_sts/ag_text_prediction.log
2020-09-19 07:55:48,220 - root - INFO - Train Dataset:
2020-09-19 07:55:48,221 - root - INFO - Columns:

- Text(
   length, min/avg/max=16/57.57/340
- Text(
   length, min/avg/max=15/57.27/311
- Numerical(

2020-09-19 07:55:48,221 - root - INFO - Tuning Dataset:
2020-09-19 07:55:48,222 - root - INFO - Columns:

- Text(
   length, min/avg/max=16/58.25/367
- Text(
   length, min/avg/max=15/58.57/265
- Numerical(

2020-09-19 07:55:48,222 - root - INFO - Label columns=['score'], Feature columns=['sentence1', 'sentence2'], Problem types=['regression'], Label shapes=[()]
2020-09-19 07:55:48,223 - root - INFO - Eval Metric=mse, Stop Metric=mse, Log Metrics=['mse', 'rmse', 'mae']
 57%|█████▋    | 329/576 [01:01<00:45,  5.39it/s]
 55%|█████▍    | 314/576 [01:01<00:51,  5.11it/s]
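The log above reports Problem types=['regression'] even though we only passed the name of the label column. One simple way such a decision could be made is to look at the label values themselves. The rule below is a deliberately simplified, hypothetical heuristic for illustration; it is not AutoGluon's actual inference logic.

```python
def infer_problem_type(labels):
    """Guess the task from label values: floats -> regression, otherwise classification.

    A simplified, hypothetical rule for illustration; not AutoGluon's actual logic.
    """
    if all(isinstance(value, float) for value in labels):
        return 'regression'
    return 'classification'


print(infer_problem_type([5.0, 3.8, 2.6, 0.5]))  # similarity scores
print(infer_problem_type([1, 0, 0, 1]))          # sentiment labels
```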

We again evaluate our trained model’s performance on some separate test data. Below we choose to compute the following metrics: RMSE, Pearson Correlation, and Spearman Correlation.

dev_score = predictor_sts.evaluate(dev_data, metrics=['rmse', 'pearsonr', 'spearmanr'])
print('Best Config = {}'.format(predictor_sts.results['best_config']))
print('Total Time = {}s'.format(predictor_sts.results['total_time']))
print('RMSE = {:.2f}'.format(dev_score['rmse']))
print('PEARSONR = {:.4f}'.format(dev_score['pearsonr']))
print('SPEARMANR = {:.4f}'.format(dev_score['spearmanr']))
Best Config = {'search_space▁optimization.lr': 5.5e-05}
Total Time = 137.12968516349792s
RMSE = 0.79
PEARSONR = 0.8561
SPEARMANR = 0.8535
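As a reminder of what these three metrics measure, the sketch below computes them from scratch on made-up scores. RMSE measures the typical size of the prediction error, Pearson correlation measures linear agreement, and Spearman correlation is Pearson correlation applied to ranks. The helper names are illustrative and are not the functions AutoGluon uses internally.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearsonr(x, y):
    """Pearson correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearmanr(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearsonr(ranks(x), ranks(y))


# Made-up similarity scores, purely for illustration.
y_true = [5.0, 3.8, 2.6, 0.5]
y_pred = [4.5, 4.0, 2.0, 1.0]
print('RMSE = {:.2f}'.format(rmse(y_true, y_pred)))
print('PEARSONR = {:.4f}'.format(pearsonr(y_true, y_pred)))
print('SPEARMANR = {:.4f}'.format(spearmanr(y_true, y_pred)))
```

In this toy example the predictions preserve the ordering of the true scores exactly, so the Spearman correlation is 1 even though the RMSE is nonzero.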

Let’s use our model to predict the similarity score between the first of these sentences and each of the others:

  • ‘The child is riding a horse.’

  • ‘The young boy is riding a horse.’

  • ‘The young man is riding a horse.’

  • ‘The young man is riding a bicycle.’

sentences = ['The child is riding a horse.',
             'The young boy is riding a horse.',
             'The young man is riding a horse.',
             'The young man is riding a bicycle.']

score1 = predictor_sts.predict({'sentence1': [sentences[0]],
                                'sentence2': [sentences[1]]})

score2 = predictor_sts.predict({'sentence1': [sentences[0]],
                                'sentence2': [sentences[2]]})

score3 = predictor_sts.predict({'sentence1': [sentences[0]],
                                'sentence2': [sentences[3]]})
print(score1, score2, score3)
[3.7405694] [3.0964038] [2.2279656]

Save and Load

Finally we demonstrate how to easily save and load a trained TextPrediction model.

predictor_sts.save('saved_dir')
predictor_sts_new = task.load('saved_dir')

score3 = predictor_sts_new.predict({'sentence1': [sentences[0]],
                                    'sentence2': [sentences[3]]})

Note: TextPrediction depends on the GluonNLP package. Due to an ongoing upgrade of GluonNLP, we are currently using a custom version of the package: autogluon-contrib-nlp. In a future release, AutoGluon will switch to using the official GluonNLP, but the APIs demonstrated here will remain the same.