Coursera

Transfer Learning for NLP with TensorFlow Hub


This is a starter notebook for the guided project Transfer Learning for NLP with TensorFlow Hub

A complete version of this notebook is available in the course resources.


Overview

TensorFlow Hub is a repository of pre-trained TensorFlow models.

In this project, you will use pre-trained models from TensorFlow Hub with tf.keras for text classification. Transfer learning makes it possible to save training resources and to achieve good model generalization even when training on a small dataset. In this project, we will demonstrate this by training with several different TF-Hub modules.
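
As a preview of the core idea, a pre-trained TF-Hub text embedding can be dropped into a tf.keras model as an ordinary layer. A minimal sketch (the full, configurable version is built in Tasks 5 & 6):

import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained sentence embedding from TF Hub, wrapped as a Keras layer that
# maps a batch of raw strings to fixed-size (here 20-dimensional) vectors.
embedding = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
    input_shape = [], dtype = tf.string, trainable = False
)

model = tf.keras.Sequential([
    embedding,
    tf.keras.layers.Dense(1, activation = 'sigmoid'),  # binary classifier head
])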

Learning Objectives

By the time you complete this project, you will be able to:

Prerequisites

In order to be successful with this project, it is assumed you are:

Contents

This project/notebook consists of several Tasks.

Task 2: Set Up Your TensorFlow and Colab Runtime

You will only be able to use the Colab Notebook after you save it to your Google Drive folder. Click on the File menu and select “Save a copy in Drive…”.


Check GPU Availability

Check whether your Colab notebook is configured to use Graphics Processing Units (GPUs). If zero GPUs are available, enable GPU acceleration in the notebook settings (Menu > Runtime > Change Runtime Type).

Hardware Accelerator Settings

!nvidia-smi
Sun Jul  3 01:41:59 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
import numpy as np
import pandas as pd

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (12, 8)
from IPython import display

import pathlib
import shutil
import tempfile

!pip install -q git+https://github.com/tensorflow/docs

import tensorflow_docs as tfdocs
import tensorflow_docs.modeling
import tensorflow_docs.plots

print("Version: ", tf.__version__)
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices('GPU') else "NOT AVAILABLE")

logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True)
  Building wheel for tensorflow-docs (setup.py) ... done
Version:  2.8.2
Hub version:  0.12.0
GPU is available

Task 3: Download and Import the Quora Insincere Questions Dataset

A downloadable copy of the Quora Insincere Questions Classification data can be found at https://archive.org/download/fine-tune-bert-tensorflow-train.csv/train.csv.zip. Decompress and read the data into a pandas DataFrame.

df = pd.read_csv('https://archive.org/download/fine-tune-bert-tensorflow-train.csv/train.csv.zip', 
                 compression = 'zip', low_memory = False)

df.shape
(1306122, 3)
df['target'].plot(kind = 'hist', title = 'Target distribution')
<matplotlib.axes._subplots.AxesSubplot at 0x7f3dfb769550>

[Figure: histogram of the target class distribution]
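
The histogram indicates a heavily imbalanced dataset: insincere questions (target = 1) are a small minority. A quick sketch to quantify the class proportions:

# Proportion of each class; the stratified splits below preserve this ratio.
print(df['target'].value_counts(normalize = True))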

from sklearn.model_selection import train_test_split

train_df, remaining = train_test_split(
    df, random_state = 42, train_size = .01, stratify = df.target.values
)
valid_df, _ = train_test_split(
    remaining, random_state = 42, train_size = .001, stratify = remaining.target.values
)

train_df.shape, valid_df.shape
((13061, 3), (1293, 3))
train_df.target.head(15).values
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
train_df.question_text.head(15).values
array(['What is your experience living in Venezuela in the current crisis? (2018)',
       'In which state/city the price of property is highest?',
       'Do rich blacks also call poor whites, “White Trash”?',
       'Should my 5 yr old son and 2 yr old daughter spend the summer with their father, after a domestic violent relationship?',
       'Why do we have parents?',
       'Do we experience ghost like Murphy did in Interstellar?',
       'Are Estoniano women beautiful?',
       'There was a Funny or Die video called Sensitivity Hoedown that got pulled. Does anyone know why?',
       'Is it a good idea to go in fully mainstream classes, even if I have meltdowns that might disrupt people?',
       'What classifies a third world country as such?',
       'Is being a pilot safe?',
       'Who is Illiteratendra Modi? Why does he keep with him a Rs 1 lakh pen?',
       'Have modern management strategies such as Total supply Chain Management applied to education? Can they be?',
       'Why are Lucky Charms considered good for you?',
       'How many people in India use WhatsApp, Facebook, Twitter and Instagram?'],
      dtype=object)

Task 4: TensorFlow Hub for Natural Language Processing

Our text data consists of questions and corresponding labels.

You can think of a question vector as a distributed representation of a question, one that is computed for every question in the training set. The question vector, along with its output label, is then used to train the statistical classification model.

The intuition is that the question vector captures the semantics of the question and, as a result, can be effectively used for classification.

To obtain question vectors, we have two alternatives, both of which have been used for many text classification problems in NLP:

Word-based Representations

Context-based Representations

TensorFlow Hub provides a number of modules that convert sentences into embeddings, such as Universal Sentence Encoder, NNLM, BERT, and Wiki-words.
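
To make this concrete, here is a small sketch (using two of the module URLs from the picker below) showing that each module maps raw question strings directly to fixed-length vectors, with a dimensionality that depends on the module:

import tensorflow as tf
import tensorflow_hub as hub

questions = tf.constant([
    "Why do we have parents?",
    "Is being a pilot safe?",
])

# Word-based representation: averaged word embeddings (NNLM, 50 dimensions).
nnlm = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1")
print(nnlm(questions).shape)   # (2, 50)

# Context-based representation: Universal Sentence Encoder (512 dimensions).
use = hub.KerasLayer("https://tfhub.dev/google/universal-sentence-encoder/4")
print(use(questions).shape)    # (2, 512)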


module_url = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1" #@param ["https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1", "https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1", "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1", "https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"] {allow-input: true}

Tasks 5 & 6: Define a Function to Build, Compile, and Train Models

def train_and_evaluate_model(module_url, embed_size, name, trainable=False):
    # Wrap the TF-Hub module as a Keras layer that maps raw question strings
    # to fixed-size embedding vectors. trainable=False keeps the pre-trained
    # weights frozen; trainable=True fine-tunes them.
    hub_layer = hub.KerasLayer(
        module_url, input_shape = [], output_shape = [embed_size], 
        dtype = tf.string, trainable = trainable
    )

    # Classifier head: two hidden layers and a sigmoid output for binary labels.
    model = tf.keras.models.Sequential([
        hub_layer,
        tf.keras.layers.Dense(256, activation = 'relu'),
        tf.keras.layers.Dense(64, activation = 'relu'),
        tf.keras.layers.Dense(1, activation = 'sigmoid'),
    ])

    model.compile(
        optimizer = tf.keras.optimizers.Adam(learning_rate = .0001),
        loss = tf.losses.BinaryCrossentropy(),
        metrics = [tf.metrics.BinaryAccuracy(name = 'accuracy')]
    )

    model.summary()

    history = model.fit(
        train_df['question_text'], train_df['target'],
        epochs = 100, batch_size = 32,
        validation_data = (valid_df['question_text'], valid_df['target']),
        callbacks = [
            # Compact per-epoch progress dots instead of the default bar.
            tfdocs.modeling.EpochDots(),
            # Stop once validation loss fails to improve for 2 epochs.
            tf.keras.callbacks.EarlyStopping(
                monitor = 'val_loss', patience = 2, mode = 'min'
            ),
            # Log metrics for TensorBoard under a per-model subdirectory.
            tf.keras.callbacks.TensorBoard(logdir/name)
        ],
        verbose = 0
    )

    return history

Task 7: Train Various Text Classification Models

histories = {}
module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/5" #@param ["https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1", "https://tfhub.dev/google/tf2-preview/nnlm-en-dim50/1", "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1", "https://tfhub.dev/google/universal-sentence-encoder/4", "https://tfhub.dev/google/universal-sentence-encoder-large/5"] {allow-input: true}
model_name = 'universal-sentence-encoder-large'
model_dim = 512
trainable = False

histories[model_name] = train_and_evaluate_model(
    module_url, embed_size = model_dim, name = model_name, trainable = trainable
)
Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 keras_layer_11 (KerasLayer)  (None, 512)              147354880 
                                                                 
 dense_30 (Dense)            (None, 256)               131328    
                                                                 
 dense_31 (Dense)            (None, 64)                16448     
                                                                 
 dense_32 (Dense)            (None, 1)                 65        
                                                                 
=================================================================
Total params: 147,502,721
Trainable params: 147,841
Non-trainable params: 147,354,880
_________________________________________________________________

Epoch: 0, accuracy:0.9255,  loss:0.3213,  val_accuracy:0.9381,  val_loss:0.1760,  
.............

Task 8: Compare Accuracy and Loss Curves

plt.rcParams['figure.figsize'] = (12, 8)
plotter = tfdocs.plots.HistoryPlotter(metric = 'accuracy')
plotter.plot(histories)
plt.xlabel("Epochs")
plt.legend(bbox_to_anchor=(1.0, 1.0), loc='upper left')
plt.title("Accuracy Curves for Models")
plt.show()

[Figure: accuracy curves for the trained models]

plotter = tfdocs.plots.HistoryPlotter(metric = 'loss')
plotter.plot(histories)
plt.xlabel("Epochs")
plt.legend(bbox_to_anchor=(1.0, 1.0), loc='upper left')
plt.title("Loss Curves for Models")
plt.show()

[Figure: loss curves for the trained models]

Task 9: Fine-tune Model from TF Hub
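
So far the TF-Hub modules have been kept frozen (trainable=False). Fine-tuning unfreezes the pre-trained module so its weights are updated together with the classifier head. A minimal sketch using the helper from Tasks 5 & 6 (the module chosen here is illustrative):

module_url = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
model_name = 'gnews-swivel-20dim-finetuned'
model_dim = 20

# trainable=True unfreezes the hub layer so the embedding weights are fine-tuned.
histories[model_name] = train_and_evaluate_model(
    module_url, embed_size = model_dim, name = model_name, trainable = True
)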

Task 10: Train Bigger Models and Visualize Metrics with TensorBoard
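
As a sketch of the "bigger model" step, one option is to fine-tune the large Universal Sentence Encoder end to end (an illustrative choice; with roughly 147M trainable parameters, expect this run to be much slower):

module_url = "https://tfhub.dev/google/universal-sentence-encoder-large/5"
model_name = 'universal-sentence-encoder-large-finetuned'
model_dim = 512

# The TensorBoard callback inside the helper logs this run under logdir/<name>,
# so it appears alongside the earlier runs in the dashboard below.
histories[model_name] = train_and_evaluate_model(
    module_url, embed_size = model_dim, name = model_name, trainable = True
)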

%load_ext tensorboard

%tensorboard --logdir {logdir}
Output hidden; open in https://colab.research.google.com to view.