
Hyperparameter tuning a Transformer with Optuna


This blog assumes you know a little about transformers and their architecture. To get to grips with the transformer we have used for this example, check out the video below on how the BERT architecture works:

Once you have watched that video, we will load a special variant of this model called ELECTRA, a newer pretraining approach that trains two transformer models: a generator and a discriminator. The generator's role is to replace tokens in a sequence, and it is therefore trained as a masked language model. The discriminator, which is the model we're interested in, tries to identify which tokens in the sequence were replaced by the generator.

Let’s not get too far down the rabbit hole yet!

What the devil is Optuna?

Optuna is an open-source hyperparameter optimisation framework that automates the search for the best hyperparameters using efficient sampling and pruning algorithms.

The website is a great resource and contains tutorials, a community to help get you up and running, and a supporting GitHub repository.

Setting up our imports
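A minimal sketch of the imports the rest of the walkthrough leans on (the exact list depends on how you structure your script):

```python
import os

import optuna
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)
```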

Specify our parameter and project variables

The next step is to set the ranges we want our hyperparameter search to iterate over, along with some project settings such as the name of the model:
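As a rough sketch, the settings block could look something like the below. The checkpoint name and the save name (huggingoptunaface) come from later in the post; the search ranges, test split and maximum sequence length are illustrative assumptions rather than tuned values:

```python
MODEL_NAME = "google/electra-small-discriminator"  # pretrained ELECTRA-Small checkpoint
SAVE_NAME = "huggingoptunaface"                    # name used when saving the tuned model
SAVE_DIR = "model"                                 # folder the tuned model will be written to

# Ranges for Optuna to search over (illustrative values)
LEARNING_RATE_RANGE = (1e-5, 5e-4)   # sampled on a log scale
WEIGHT_DECAY_RANGE = (1e-3, 1e-1)
EPOCH_RANGE = (2, 5)

TEST_SIZE = 0.3     # proportion of data held out for evaluation
MAX_LENGTH = 128    # maximum number of tokens per example
```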

Here we have set the search ranges for the learning rate, weight decay and number of epochs, along with the pretrained model name and the name and folder we will save the tuned model under.

Work with HuggingFace datasets for this example

Here we will use the HuggingFace datasets package to work with the ade_corpus_v2_classification adverse drug reaction dataset. This contains text and a label indicating whether the sentence describes an adverse drug reaction. Once the dataset is loaded we will split it into training and test sets with train_test_split, passing the proportion we want to hold out:
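A sketch of the loading and splitting step, using the ADE corpus classification config on the HuggingFace hub; the 30% test proportion is an assumption:

```python
from datasets import load_dataset

# The ADE corpus only ships a "train" split, so we carve out our own test set
dataset = load_dataset("ade_corpus_v2", "Ade_corpus_v2_classification")
dataset = dataset["train"].train_test_split(test_size=0.3)
print(dataset)
```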

You will see a message confirming that the dataset has been downloaded and split into train and test sets.

Load in Electra Small for the model

I am going to use a small model and tokeniser for this tutorial, as they are lightweight and not as slow to train as the large transformer models:
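Loading both from the hub could look like this, using the ELECTRA-Small discriminator checkpoint with num_labels=2 for the binary adverse-reaction label:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "google/electra-small-discriminator"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
```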

When working with transformers you will become very used to loading in pretrained models using from_pretrained(). In other tutorials I will delve into pretraining and fine-tuning your own model, on your own dataset, but for now we will use a pretrained model that has been contributed to HuggingFace.

Preprocess our text

Next we will create a function to preprocess our text:
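A minimal version of such a function, assuming the tokenizer loaded above and the dataset's text column:

```python
def preprocess_function(examples):
    """Tokenise a batch of examples, truncating and padding to a fixed length."""
    return tokenizer(
        examples["text"],
        truncation=True,
        padding="max_length",
        max_length=128,
    )
```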

This function takes a batch of examples, tokenises the text and truncates or pads it to a fixed maximum length so the examples can be batched together.

Finally, we use the map function to apply our preprocessing function to our dataset in batches.
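That mapping step is a one-liner with the preprocess_function sketched above:

```python
# Apply the preprocessing to every split, a batch at a time
encoded_dataset = dataset.map(preprocess_function, batched=True)
```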

Using Optuna to set our objective function

Optuna is a brilliant tool for hyperparameter tuning, as it parallelises the search and iterates through the suggested ranges of your hyperparameters. The objective function is defined as below:
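A sketch of what such an objective can look like with the HuggingFace Trainer; the search ranges, batch sizes and output directory here are assumptions, and model_init is used so each trial starts from fresh weights:

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments


def model_init():
    # A fresh model for every trial, so trials do not share weights
    return AutoModelForSequenceClassification.from_pretrained(
        "google/electra-small-discriminator", num_labels=2
    )


def objective(trial):
    # Ask Optuna to suggest values from the ranges we defined earlier
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-3, 1e-1, log=True)
    num_train_epochs = trial.suggest_int("num_train_epochs", 2, 5)

    training_args = TrainingArguments(
        output_dir="optuna_trials",
        learning_rate=learning_rate,
        weight_decay=weight_decay,
        num_train_epochs=num_train_epochs,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
    )

    trainer = Trainer(
        model_init=model_init,
        args=training_args,
        train_dataset=encoded_dataset["train"],
        eval_dataset=encoded_dataset["test"],
    )

    trainer.train()
    eval_metrics = trainer.evaluate()

    # Return the evaluation loss, which the study is set up to minimise
    return eval_metrics["eval_loss"]
```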

This needs some explanation: for each trial, Optuna suggests a learning rate, weight decay and number of epochs from the ranges we specified, a Trainer is built with those values, the model is trained and evaluated, and the evaluation loss is returned so that Optuna can minimise it.

Create the Optuna study

The steps below show how to create the Optuna study object. We pass in the objective function we created in the previous steps to say we want to run that study for a number of trials / runs:
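A sketch of that step; the study name is arbitrary, and n_trials=1 mirrors the single trial used in this post:

```python
import optuna

# Create a study that minimises the objective (evaluation loss) and run it
study = optuna.create_study(study_name="electra-ade-tuning", direction="minimize")
study.optimize(objective, n_trials=1)
```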

Here we create our study with a name and tell Optuna that we want to minimise the loss. We then call study.optimize with the function we defined and the number of trials (the number of times we want to repeat the study).

At this point your GPU (I wouldn't recommend this on a CPU) starts to fire up and away the training goes. This is the part where you need to run the algorithm overnight and be calm.

Ideally, you would want multiple trials, but I have kept this at 1 to allow you to run the code in under a day.

Get the best study hyperparameters

After Optuna has worked through the various hyperparameter combinations, we need a way to extract the best hyperparameters to pass to our final model. The next steps show you how to do this in a couple of lines of Python code:
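Something like the following, pulling the winning values out of study.best_params:

```python
# Dictionary of the best hyperparameters found by the study
best_params = study.best_params
print(best_params)

best_learning_rate = best_params["learning_rate"]
best_weight_decay = best_params["weight_decay"]
best_num_train_epochs = best_params["num_train_epochs"]
```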

Here we store a variable for each of the optimal study parameters (according to Optuna, when asked to minimise loss); the code then prints out the hyperparameters, which are held in a dictionary.

Create model based on best hyperparameters

The final step is to train our optimised model via the same process outlined in the previous training step. The only difference in the script now is that we are not asking Optuna to suggest values; instead we pass in the best learning rate, weight decay and number of epochs directly:
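A sketch of that final training run, reusing the Trainer setup from the objective function but with the best values fixed; the output directory and batch sizes are again assumptions:

```python
best_training_args = TrainingArguments(
    output_dir="best_model",
    learning_rate=best_learning_rate,
    weight_decay=best_weight_decay,
    num_train_epochs=best_num_train_epochs,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
)

best_trainer = Trainer(
    model_init=model_init,
    args=best_training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["test"],
)

best_trainer.train()
```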

Saving our best model

This is the last step of our tuning and training process. We now need to save our best model so we can use it later on to perform inference on the relevant dataset:
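A sketch of the saving step, assuming the model folder and save name defined earlier:

```python
import os

# a) create the model directory if it doesn't exist
model_path = os.path.join("model", "huggingoptunaface")
os.makedirs(model_path, exist_ok=True)

# b) and c) save the fine-tuned model and its tokenizer with save_pretrained
best_trainer.model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)
```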

This will: a) create a model directory if it doesn't exist; b) store the model in that folder under the model's name (this one is called huggingoptunaface); and c) save the tokenizer and model to the model path. The special method here is save_pretrained.

Once your script has run, the model will be saved under the relevant name.

The model is stored as a bin file, and the special tokens, tokenizer settings and training config are stored as JSON documents. The vocab.txt file contains the vocabulary used by the tokenizer.

The full script is contained below:

Loading and using the model

Next we will create a script to load our fine-tuned, Optuna-optimised model and make inferences with it.

The steps below declare the relevant imports and load the model and tokenizer we fine-tuned on one of the HuggingFace datasets:
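A sketch of the loading step, assuming the directory we saved the model to above:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_path = "model/huggingoptunaface"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)
model.eval()  # switch off dropout for inference
```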

We have loaded our trained model from the relevant directory. Now we will pass an example sentence through the model, using a function that takes the text and collapses the result down to a prediction:
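A sketch of such a function; the example sentence is made up for illustration:

```python
def predict(text):
    """Tokenise a sentence, run it through the model and collapse the logits
    down to a predicted class and its probability."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    probs = torch.softmax(outputs.logits, dim=-1)
    predicted_class = int(torch.argmax(probs, dim=-1))
    return predicted_class, float(probs[0, predicted_class])


example = "The patient developed a severe rash after taking the medication."
print(predict(example))
```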

To explain the steps: the sentence is tokenised, passed through the model, a softmax is applied to the logits, and the class with the highest probability is returned as the prediction.

And that is it: we have fine-tuned and optimised the parameters of our ELECTRA-Small model and then loaded it back in to make inferences against.

The full inference script is here:

What’s next?

I aim to create a tutorial guiding you through training your own tokenizer, pre-training your transformer model on a collection of relevant articles and then fine-tuning it on a classification task, as these are among the most common NLP challenges.

I hope you had fun working through this with me. Keep up the good work!

