
Fine-Tune BERT for Text Classification with TensorFlow

Figure 1: BERT Classification Model

In this project, you will learn how to fine-tune a BERT model for text classification using TensorFlow and TF-Hub.

The pretrained BERT model used in this project is available on TensorFlow Hub.

Learning Objectives

By the time you complete this project, you will be able to:

- Build TensorFlow input pipelines for text data with the tf.data API
- Tokenize and preprocess text for BERT
- Fine-tune BERT for text classification with TensorFlow and TensorFlow Hub

Prerequisites

In order to be successful with this project, it is assumed you are:

- Competent in the Python programming language
- Familiar with deep learning fundamentals and the basics of training neural networks

Contents

This project/notebook consists of several tasks:

- Task 2: Set Up Your TensorFlow and Colab Runtime
- Task 3: Download and Import the Quora Insincere Questions Dataset
- Task 4: Create tf.data.Datasets for Training and Evaluation
- Task 5: Download a Pre-trained BERT Model from TensorFlow Hub
- Task 6: Tokenize and Preprocess Text for BERT
- Task 7: Wrap a Python Function into a TensorFlow op for Eager Execution
- Task 8: Create a TensorFlow Input Pipeline with tf.data
- Task 9: Add a Classification Head to the BERT Layer
- Task 10: Fine-Tune BERT for Text Classification
- Task 11: Evaluate the BERT Text Classification Model

Task 2: Set Up Your TensorFlow and Colab Runtime

You will only be able to use the Colab notebook after you save it to your Google Drive folder. Click on the File menu and select “Save a copy in Drive…”.

Copy to Drive

Check GPU Availability

Check whether your Colab notebook is configured to use Graphics Processing Units (GPUs). If no GPU is available, enable one (Menu > Runtime > Change runtime type > Hardware accelerator > GPU).

Hardware Accelerator Settings

!nvidia-smi
Wed Feb 16 05:43:12 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   64C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
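You can also confirm from Python that TensorFlow sees the GPU; a minimal sketch using tf.config.experimental.list_physical_devices, which is available in TF 2.3:

import tensorflow as tf

# An empty list means no accelerator is attached to the runtime
gpus = tf.config.experimental.list_physical_devices('GPU')
print(len(gpus), 'GPU(s) available:', gpus)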

Install TensorFlow and TensorFlow Model Garden

import tensorflow as tf
print(tf.version.VERSION)
2.3.0
!pip install -q tensorflow==2.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tables 3.7.0 requires numpy>=1.19.0, but you have numpy 1.18.5 which is incompatible.
pandas-gbq 0.13.3 requires google-cloud-bigquery[bqstorage,pandas]<2.0.0dev,>=1.11.1, but you have google-cloud-bigquery 2.32.0 which is incompatible.
google-colab 1.0.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.
!git clone --depth 1 -b v2.3.0 https://github.com/tensorflow/models.git
fatal: destination path 'models' already exists and is not an empty directory.
# install requirements to use tensorflow/models repository
!pip install -Uqr models/official/requirements.txt
# you may have to restart the runtime afterwards
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.3.0 requires numpy<1.19.0,>=1.16.0, but you have numpy 1.21.5 which is incompatible.
tensorflow 2.3.0 requires scipy==1.4.1, but you have scipy 1.7.3 which is incompatible.
pandas-gbq 0.13.3 requires google-cloud-bigquery[bqstorage,pandas]<2.0.0dev,>=1.11.1, but you have google-cloud-bigquery 2.32.0 which is incompatible.
google-colab 1.0.0 requires six~=1.15.0, but you have six 1.16.0 which is incompatible.
datascience 0.10.6 requires folium==0.2.1, but you have folium 0.8.3 which is incompatible.
albumentations 0.1.12 requires imgaug<0.2.7,>=0.2.5, but you have imgaug 0.2.9 which is incompatible.

Restart the Runtime

Note: After installing the required Python packages, you’ll need to restart the Colab runtime (Menu > Runtime > Restart runtime…).

Restart of the Colab Runtime Engine
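If you prefer to trigger the restart from a code cell, a common Colab pattern is to kill the Python process, after which Colab reconnects with a fresh runtime (an optional sketch):

import os

# Killing the current process forces Colab to restart the runtime
os.kill(os.getpid(), 9)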

Task 3: Download and Import the Quora Insincere Questions Dataset

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import sys
sys.path.append('models')
from official.nlp.data import classifier_data_lib
from official.nlp.bert import tokenization
from official.nlp import optimization
/usr/local/lib/python3.7/dist-packages/tensorflow_addons/utils/ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.6.0 and strictly below 2.9.0 (nightly versions are not supported). 
 The versions of TensorFlow you are currently using is 2.3.0 and is not supported. 
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version. 
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  UserWarning,
print("TF Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")
TF Version:  2.3.0
Eager mode:  True
Hub version:  0.12.0
GPU is available

A downloadable copy of the Quora Insincere Questions Classification data can be found at https://archive.org/download/fine-tune-bert-tensorflow-train.csv/train.csv.zip. Decompress and read the data into a pandas DataFrame.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv(
    'https://archive.org/download/fine-tune-bert-tensorflow-train.csv/train.csv.zip', 
    compression = 'zip', low_memory = False)

df.shape
(1306122, 3)
df.tail(20)
qid question_text target
1306102 ffff3778790af9baae76 What steps can I take to live a normal life if... 0
1306103 ffff3f0a2449ffe4b9ff Isn't Trump right after all? Why should the US... 1
1306104 ffff41393389d4206066 Is 33 too late for a career in creative advert... 0
1306105 ffff42493fc203cd9532 What is difference between the filteration wor... 0
1306106 ffff48dd47bee89fff79 If the universe "popped" into existence from n... 0
1306107 ffff5fd051a032f32a39 How does a shared service technology team meas... 0
1306108 ffff6d528040d3888b93 How is DSATM civil engineering? 0
1306109 ffff8776cd30cdc8d7f8 Do you know any problem that depends solely on... 0
1306110 ffff94d427ade3716cd1 What are some comic ideas for you Tube videos ... 0
1306111 ffffa382c58368071dc9 If you had $10 million of Bitcoin, could you s... 0
1306112 ffffa5b0fa76431c063f Are you ashamed of being an Indian? 1
1306113 ffffae5dbda3dc9e9771 What are the methods to determine fossil ages ... 0
1306114 ffffba7c4888798571c1 What is your story today? 0
1306115 ffffc0c7158658a06fd9 How do I consume 150 gms protein daily both ve... 0
1306116 ffffc404da586ac5a08f What are the good career options for a msc che... 0
1306117 ffffcc4e2331aaf1e41e What other technical skills do you need as a c... 0
1306118 ffffd431801e5a2f4861 Does MS in ECE have good job prospects in USA ... 0
1306119 ffffd48fb36b63db010c Is foam insulation toxic? 0
1306120 ffffec519fa37cf60c78 How can one start a research project based on ... 0
1306121 ffffed09fedb5088744a Who wins in a battle between a Wolverine and a... 0

df.target.plot(kind = 'hist', title = 'Target distribution')
<AxesSubplot:title={'center':'Target distribution'}, ylabel='Frequency'>

[Plot: histogram of the target distribution]

Task 4: Create tf.data.Datasets for Training and Evaluation

train_df, remaining = train_test_split(df, random_state = 42, train_size = 0.0075, stratify = df.target.values)
valid_df, _ = train_test_split(remaining, random_state = 42, train_size = 0.00075, stratify = remaining.target.values)
train_df.shape, valid_df.shape
((9795, 3), (972, 3))
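Since both splits are stratified on the target column, the class balance should closely match the full dataset; a quick sanity check (a sketch):

# The insincere ratio should be nearly identical across all three frames
print('full :', df.target.mean())
print('train:', train_df.target.mean())
print('valid:', valid_df.target.mean())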
with tf.device('/cpu:0'):
  train_data = tf.data.Dataset.from_tensor_slices(
      (train_df['question_text'].values, train_df['target'].values)
  )
  valid_data = tf.data.Dataset.from_tensor_slices(
      (valid_df['question_text'].values, valid_df['target'].values)
  )

for text, label in train_data.take(1):
    print(text, label)
tf.Tensor(b'Why are unhealthy relationships so desirable?', shape=(), dtype=string) tf.Tensor(0, shape=(), dtype=int64)

Task 5: Download a Pre-trained BERT Model from TensorFlow Hub

"""
Each line of the dataset is composed of the review text and its label
- Data preprocessing consists of transforming text to BERT input features:
input_word_ids, input_mask, segment_ids
- In the process, tokenizing the text is done with the provided BERT model tokenizer
"""

# Label categories
label_list = [0, 1] 

# maximum length of (token) input sequences
max_seq_length = 128
train_batch_size = 32

# Get BERT layer and tokenizer:
# More details here: https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2
bert_layer = hub.KerasLayer('https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/2', trainable = True)

vocab_file = bert_layer.resolved_object.vocab_file.asset_path.numpy()
do_lower_case = bert_layer.resolved_object.do_lower_case.numpy()
tokenizer = tokenization.FullTokenizer(vocab_file, do_lower_case)

tokenizer.wordpiece_tokenizer.tokenize('hi, how are you doing?')
['hi', '##,', 'how', 'are', 'you', 'doing', '##?']
tokenizer.convert_tokens_to_ids(tokenizer.wordpiece_tokenizer.tokenize('hi, how are you doing?'))
[7632, 29623, 2129, 2024, 2017, 2725, 29632]
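The wordpiece tokenizer above operates on whole whitespace-separated chunks, which is why the punctuation surfaced as '##,' and '##?'. The full tokenizer used to build BERT inputs first splits punctuation off with a basic tokenizer; a sketch (the exact pieces depend on the vocabulary):

# FullTokenizer = basic (punctuation-aware) tokenization + wordpiece
tokenizer.tokenize('hi, how are you doing?')
# expected to be along the lines of: ['hi', ',', 'how', 'are', 'you', 'doing', '?']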

Task 6: Tokenize and Preprocess Text for BERT

Figure 2: BERT Tokenizer

We’ll need to transform our data into a format BERT understands. This involves two steps: first, we create InputExamples using the InputExample constructor provided in classifier_data_lib; second, we convert each example into BERT input features with convert_single_example.

# This provides a function to convert row to input features and label

def to_feature(text, label, label_list=label_list, max_seq_length=max_seq_length, tokenizer=tokenizer):
  example = classifier_data_lib.InputExample(
      guid = None,
      text_a =  text.numpy(),
      text_b = None,
      label = label.numpy()
  )

  feature = classifier_data_lib.convert_single_example(0, example, label_list, max_seq_length, tokenizer)

  return (feature.input_ids, feature.input_mask, feature.segment_ids, feature.label_id)
  

You want to use Dataset.map to apply this function to each element of the dataset. Dataset.map runs in graph mode, where tensors do not have a value and .numpy() is unavailable.

So you can’t .map this function directly: you need to wrap it in a tf.py_function. The tf.py_function will pass regular tensors (with a value and a .numpy() method to access it) to the wrapped Python function.

Task 7: Wrap a Python Function into a TensorFlow op for Eager Execution

def to_feature_map(text, label):
  input_ids, input_mask, segment_ids, label_id = tf.py_function(to_feature, inp = [text, label],
      Tout = [tf.int32, tf.int32, tf.int32, tf.int32]
  )

  input_ids.set_shape([max_seq_length])
  input_mask.set_shape([max_seq_length])
  segment_ids.set_shape([max_seq_length])
  label_id.set_shape([])

  x = {
       'input_word_ids' : input_ids,
       'input_mask' : input_mask,
       'input_type_ids' : segment_ids
  }

  return (x, label_id)
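Because tf.py_function also executes eagerly, you can sanity-check the wrapper on a single element before mapping it over the whole dataset (a minimal sketch):

# Take one (text, label) pair and inspect the resulting feature dict
for text, label in train_data.take(1):
  features, label_id = to_feature_map(text, label)
  print(features['input_word_ids'][:10], label_id)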

Task 8: Create a TensorFlow Input Pipeline with tf.data

with tf.device('/cpu:0'):
  # train
  train_data = (train_data.map(
      to_feature_map,
      num_parallel_calls = tf.data.experimental.AUTOTUNE)
  .shuffle(1000)
  .batch(32, drop_remainder = True)
  .prefetch(tf.data.experimental.AUTOTUNE))

  # valid
  valid_data = (valid_data.map(
      to_feature_map,
      num_parallel_calls = tf.data.experimental.AUTOTUNE)
  .batch(32, drop_remainder = True)
  .prefetch(tf.data.experimental.AUTOTUNE))
  
  

The resulting tf.data.Datasets return (features, labels) pairs, as expected by keras.Model.fit:

# train data spec
train_data.element_spec
({'input_mask': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None),
  'input_type_ids': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None),
  'input_word_ids': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None)},
 TensorSpec(shape=(32,), dtype=tf.int32, name=None))
# valid data spec
valid_data.element_spec
({'input_mask': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None),
  'input_type_ids': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None),
  'input_word_ids': TensorSpec(shape=(32, 128), dtype=tf.int32, name=None)},
 TensorSpec(shape=(32,), dtype=tf.int32, name=None))

Task 9: Add a Classification Head to the BERT Layer

Figure 3: BERT Layer

# Building the model
def create_model():
    input_word_ids = tf.keras.layers.Input(shape=(max_seq_length,), dtype=tf.int32,
                                       name="input_word_ids")

    input_mask = tf.keras.layers.Input(shape=(max_seq_length,), dtype=tf.int32,
                                    name="input_mask")

    input_type_ids = tf.keras.layers.Input(shape=(max_seq_length,), dtype=tf.int32,
                                        name="input_type_ids")
    
    pooled_output, sequence_output = bert_layer([input_word_ids, input_mask, input_type_ids])

    drop = tf.keras.layers.Dropout(0.4)(pooled_output)

    output = tf.keras.layers.Dense(1, activation = 'sigmoid', name = 'output')(drop)

    model = tf.keras.Model(
        inputs = {
            'input_word_ids' : input_word_ids,
            'input_mask' : input_mask,
            'input_type_ids' : input_type_ids
        },
        outputs = output
    )

    return model

Task 10: Fine-Tune BERT for Text Classification

model = create_model()
model.compile(
    optimizer = tf.keras.optimizers.Adam(learning_rate = 2e-5),
    loss = tf.keras.losses.BinaryCrossentropy(), # use SparseCategoricalCrossentropy for more than 2 classes
    metrics = [tf.keras.metrics.BinaryAccuracy()])

model.summary()
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_word_ids (InputLayer)     [(None, 128)]        0                                            
__________________________________________________________________________________________________
input_mask (InputLayer)         [(None, 128)]        0                                            
__________________________________________________________________________________________________
input_type_ids (InputLayer)     [(None, 128)]        0                                            
__________________________________________________________________________________________________
keras_layer (KerasLayer)        [(None, 768), (None, 109482241   input_word_ids[0][0]             
                                                                 input_mask[0][0]                 
                                                                 input_type_ids[0][0]             
__________________________________________________________________________________________________
dropout (Dropout)               (None, 768)          0           keras_layer[0][0]                
__________________________________________________________________________________________________
output (Dense)                  (None, 1)            769         dropout[0][0]                    
==================================================================================================
Total params: 109,483,010
Trainable params: 109,483,009
Non-trainable params: 1
__________________________________________________________________________________________________
tf.keras.utils.plot_model(model = model, show_shapes = True, dpi = 76)

[Diagram: model architecture rendered by plot_model]
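For reference, with more than two classes the head and loss change together, as the comment in the compile step notes; a hedged sketch (num_classes is hypothetical):

# In create_model, the sigmoid head would become a softmax over num_classes:
# output = tf.keras.layers.Dense(num_classes, activation = 'softmax', name = 'output')(drop)
# and since the labels stay integer ids, the sparse loss applies:
model.compile(
    optimizer = tf.keras.optimizers.Adam(learning_rate = 2e-5),
    loss = tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics = [tf.keras.metrics.SparseCategoricalAccuracy()])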

# Train model
epochs = 4
history = model.fit(train_data, 
                    validation_data = valid_data,
                    epochs = epochs,
                    verbose = 1)
Epoch 1/4
306/306 [==============================] - ETA: 0s - loss: 0.1601 - binary_accuracy: 0.9414WARNING:tensorflow:Callbacks method `on_test_batch_end` is slow compared to the batch time (batch time: 0.0286s vs `on_test_batch_end` time: 0.5093s). Check your callbacks.


306/306 [==============================] - 484s 2s/step - loss: 0.1601 - binary_accuracy: 0.9414 - val_loss: 0.1159 - val_binary_accuracy: 0.9531
Epoch 2/4
306/306 [==============================] - 483s 2s/step - loss: 0.0949 - binary_accuracy: 0.9616 - val_loss: 0.1393 - val_binary_accuracy: 0.9521
Epoch 3/4
306/306 [==============================] - 483s 2s/step - loss: 0.0509 - binary_accuracy: 0.9816 - val_loss: 0.1644 - val_binary_accuracy: 0.9542
Epoch 4/4
306/306 [==============================] - 483s 2s/step - loss: 0.0189 - binary_accuracy: 0.9942 - val_loss: 0.2376 - val_binary_accuracy: 0.9604

Task 11: Evaluate the BERT Text Classification Model

import matplotlib.pyplot as plt

def plot_graphs(history, metric):
  plt.plot(history.history[metric])
  plt.plot(history.history['val_'+metric], '')
  plt.xlabel("Epochs")
  plt.ylabel(metric)
  plt.legend([metric, 'val_'+metric])
  plt.show()
plot_graphs(history, 'loss')

[Plot: training and validation loss per epoch]

plot_graphs(history, 'binary_accuracy')

[Plot: training and validation binary accuracy per epoch]

sample_example = ['hello, have a good day', 'are you feeling ashame being an asian?']

test_data = tf.data.Dataset.from_tensor_slices((sample_example, [0] * len(sample_example)))
test_data = (test_data.map(to_feature_map).batch(1))

preds = model.predict(test_data)
threshold = 0.5 # between 0 and 1

['Insincere' if pred >= threshold else 'Sincere' for pred in preds]
['Sincere', 'Sincere']
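To reuse the fine-tuned model without retraining, you can export it in TensorFlow's SavedModel format (a minimal sketch; the path is arbitrary):

# Export to a SavedModel directory
model.save('fine_tuned_bert_quora')

# Reload and predict with the same input pipeline; the SavedModel format
# restores the hub.KerasLayer without needing custom_objects
reloaded = tf.keras.models.load_model('fine_tuned_bert_quora')
reloaded.predict(test_data)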