Coursera

Week 2

Instruction fine-tuning

Finetuning with instruction prompts:

Prepare the instruction dataset -> split into training, validation, and test sets.

Finetuning on a single task

Risk: catastrophic forgetting, where performance on other tasks degrades. How to avoid it?

Multitask instruction finetuning

The dataset contains examples from a variety of tasks.

FLAN (Fine-tuned LAnguage Net)

Model evaluation

$$ \text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}} $$
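As a quick sketch of the formula (the predictions and labels are made up for illustration):

```python
# Accuracy: fraction of predictions that exactly match the labels.
preds  = ["pos", "neg", "pos", "pos"]
labels = ["pos", "neg", "neg", "pos"]

accuracy = sum(p == l for p, l in zip(preds, labels)) / len(labels)
print(accuracy)  # 3 correct out of 4 -> 0.75
```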

Using other metrics:

$$ \text{ROUGE-1 Recall} = \frac{\text{unigram matches}}{\text{unigrams in reference}} $$

$$ \text{ROUGE-1 Precision} = \frac{\text{unigram matches}}{\text{unigrams in output}} $$

$$ \text{ROUGE-1 F1} = 2\times \frac{\text{precision}\times\text{recall}}{\text{precision + recall}} $$

ROUGE-2 uses bigrams, ROUGE-3 trigrams, and so on.
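A minimal Python sketch of these ROUGE-N formulas (tokenization is just lowercased whitespace splitting; the `clip` flag anticipates the modified precision discussed below):

```python
from collections import Counter

def ngram_counts(text: str, n: int) -> Counter:
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(reference: str, output: str, n: int = 1, clip: bool = False):
    """Return (recall, precision, F1) for ROUGE-N between two strings."""
    ref, out = ngram_counts(reference, n), ngram_counts(output, n)
    if clip:
        # Modified precision: each n-gram counts at most as often as in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in out.items())
    else:
        matches = sum(count for gram, count in out.items() if gram in ref)
    recall = matches / max(sum(ref.values()), 1)
    precision = matches / max(sum(out.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

print(rouge_n("It is cold outside", "It is very cold outside"))
# (1.0, 0.8, 0.888...)
```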

However, if the reference is

It is cold outside

and the generated output is

cold cold cold cold

then the ROUGE-1 precision will be

$$ \text{ROUGE-1 Precision} = \frac{\text{unigram matches}}{\text{unigrams in output}} = \frac{4}{4} = 1.0 $$

We don’t want that to happen. Using a clipping function gives a modified precision:

$$ \text{ROUGE-1 Modified Precision} = \frac{\text{clip(unigram matches)}}{\text{unigrams in output}} = \frac{1}{4} = 0.25 $$

If the generated output is

outside cold it is

the modified unigram precision is still 1.0, even though the word order is wrong. Hence, in this situation we should not use unigrams but bigrams or trigrams.
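Reusing the `rouge_n` sketch from above on both examples from this section:

```python
ref = "It is cold outside"

# Repetition: naive unigram precision is 4/4 = 1.0; clipping drops it to 1/4.
print(rouge_n(ref, "cold cold cold cold", n=1)[1])             # 1.0
print(rouge_n(ref, "cold cold cold cold", n=1, clip=True)[1])  # 0.25

# Wrong word order: clipped unigram precision is still 4/4 = 1.0, but only
# one of the three output bigrams ("it is") appears in the reference.
print(rouge_n(ref, "outside cold it is", n=1, clip=True)[1])   # 1.0
print(rouge_n(ref, "outside cold it is", n=2)[1])              # 0.333...
```

The bigram score penalizes the scrambled word order that unigram precision misses.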

Benchmarks

Parameter-efficient fine-tuning (PEFT)

Finetuning a subset of model parameters to prevent catastrophic forgetting.

Methods:

LoRA - Low-Rank Adaptation of Large Language Models

(Reparameterization)

Steps to update the model for inference:

  1. Matrix multiply the low rank matrices.
  2. Add to original weights.

Example using the base transformer as reference (86% reduction in parameters to train):

| Model | Weight dimensions | Trainable weights |
| --- | --- | --- |
| Base transformer | 512 × 64 | 32,768 |
| LoRA, r = 8 | A: 8 × 64, B: 512 × 8 | 512 + 4,096 = 4,608 |
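A minimal PyTorch sketch of the two merge steps above, using the dimensions from the table (the random initialization is only illustrative; in the LoRA paper, B starts at zero so the adapter changes nothing before training):

```python
import torch

d_out, d_in, r = 512, 64, 8        # dimensions from the table above

W = torch.randn(d_out, d_in)       # frozen base weights: 512 * 64 = 32,768
A = torch.randn(r, d_in) * 0.01    # trainable low-rank matrix A (8 x 64)
B = torch.zeros(d_out, r)          # trainable low-rank matrix B (512 x 8)

print(A.numel() + B.numel())       # 4,608 trainable params, ~14% of 32,768

# Steps to update the model for inference:
delta = B @ A                      # 1. matrix-multiply the low-rank matrices
W_merged = W + delta               # 2. add the result to the original weights
```

Because the merge happens once before serving, inference runs at the same cost as the base model.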

LoRA for generative LLMs: train different A and B matrices for different tasks, and swap them in at inference time.

Soft prompts

Prompt tuning is not prompt engineering.

Add trainable soft-prompt vectors at the embedding layer.

Switch the soft prompt to switch the fine-tuned task.
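A minimal PyTorch sketch of the idea (the class name and sizes are illustrative; only the soft-prompt parameters are trained while the base model stays frozen):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Prepends trainable 'virtual token' embeddings to the input embeddings."""
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.01)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

soft = SoftPrompt(num_virtual_tokens=20, embed_dim=512)
x = torch.randn(2, 10, 512)       # stand-in for token embeddings
print(soft(x).shape)              # torch.Size([2, 30, 512])
```

Swapping in a differently trained `SoftPrompt` changes the task without touching the base model's weights.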

Quiz

| Question | Answer |
| --- | --- |
| 1. Fill in the blanks: __________ involves using many prompt-completion examples as the labeled training dataset to continue training the model by updating its weights. This is different from _________ where you provide prompt-completion examples during inference. | Instruction fine-tuning; in-context learning |
| 2. Fine-tuning a model on a single task can improve model performance specifically on that task; however, it can also degrade the performance of other tasks as a side effect. This phenomenon is known as: | Catastrophic forgetting |
| 3. Which evaluation metric below focuses on precision in matching generated output to the reference text and is used for text translation? | BLEU |
| 4. Which of the following statements about multi-task fine-tuning is correct? Select all that apply. | It helps prevent catastrophic forgetting; FLAN-T5 was trained with multitask fine-tuning |
| 5. “Smaller LLMs can struggle with one-shot and few-shot inference.” | True |
| 6. Which of the following are Parameter Efficient Fine-Tuning (PEFT) methods? Select all that apply. | Reparameterization, Selective, and Additive |
| 7. Which of the following best describes how LoRA works? | It decomposes weights into smaller matrices and trains those |
| 8. What is a soft prompt in the context of LLMs (Large Language Models)? | A set of trainable tokens |
| 9. “Prompt Tuning is a technique used to adjust all hyperparameters of a language model.” | False |
| 10. “PEFT methods can reduce the memory needed for fine-tuning dramatically, sometimes to just 12-20% of the memory needed for full fine-tuning.” | True |