Finetuning with instruction prompts:
Prepare the instruction dataset (prompt-completion pairs built with a prompt template), then split it into training, validation, and test sets (see the sketch below).
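A minimal sketch of this step, assuming the Hugging Face `datasets` library; the prompt template and the toy rows are made up for illustration:

```python
from datasets import Dataset

# Hypothetical instruction template for a summarization task.
PROMPT_TEMPLATE = "Summarize the following conversation.\n\n{dialogue}\n\nSummary: "

# Toy rows standing in for a real corpus (e.g. one loaded with load_dataset).
raw = [
    {"dialogue": "A: Lunch at noon? B: Sure.", "summary": "They agree to lunch at noon."},
    {"dialogue": "A: Meeting moved to 3pm. B: Got it.", "summary": "The meeting moved to 3pm."},
    {"dialogue": "A: Did you ship it? B: Yes, today.", "summary": "The package shipped today."},
    {"dialogue": "A: Exam is Friday. B: Thanks!", "summary": "The exam is on Friday."},
]

# Wrap each raw example into a prompt-completion pair.
dataset = Dataset.from_list([
    {"prompt": PROMPT_TEMPLATE.format(dialogue=ex["dialogue"]),
     "completion": ex["summary"]}
    for ex in raw
])

# Split into train / validation / test.
split = dataset.train_test_split(test_size=0.5, seed=42)
holdout = split["test"].train_test_split(test_size=0.5, seed=42)
train_ds, val_ds, test_ds = split["train"], holdout["train"], holdout["test"]
print(len(train_ds), len(val_ds), len(test_ds))  # 2 1 1
```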
How to avoid catastrophic forgetting (degrading other tasks while finetuning on one)? Use a dataset that contains examples from a variety of tasks (multi-task finetuning).
FLAN (Fine-tuned LAnguage Net)
$$ \text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}} $$
Using other metrics:
$$ \text{ROUGE-1 Recall} = \frac{\text{unigram matches}}{\text{unigrams in reference}} $$
$$ \text{ROUGE-1 Precision} = \frac{\text{unigram matches}}{\text{unigrams in output}} $$
$$ \text{ROUGE-1 F1} = 2\times \frac{\text{precision}\times\text{recall}}{\text{precision + recall}} $$
ROUGE-2 uses bigrams, ROUGE-3 uses trigrams, and so on.
However, if the reference is
It is cold outside
and the generated output is
cold cold cold cold
then the ROUGE-1 precision will be
$$ \text{ROUGE-1 Precision} = \frac{\text{unigram matches}}{\text{unigrams in output}} = \frac{4}{4} = 1.0 $$
We don’t want that to happen. A clipping function gives the modified precision:
$$ \text{ROUGE-1 Modified Precision} = \frac{\text{clip(unigram matches)}}{\text{unigrams in output}} = \frac{1}{4} = 0.25 $$
If instead the generated output is
outside cold it is
the modified precision will still be 1.0, even though the word order is wrong. Hence, in this situation we should not rely on unigrams but use bigrams or trigrams instead (see the sketch below).
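A minimal plain-Python sketch of ROUGE-N with the clipping behaviour described above; tokenization here is naive whitespace splitting, unlike full implementations such as the `rouge_score` package:

```python
from collections import Counter

def rouge_n(reference: str, output: str, n: int = 1, clip: bool = True):
    """ROUGE-N precision, recall, and F1, with optional clipped (modified) precision."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, out = ngrams(reference), ngrams(output)
    if clip:
        # Each output n-gram counts at most as often as it appears in the reference.
        matches = sum(min(count, ref[gram]) for gram, count in out.items())
    else:
        matches = sum(count for gram, count in out.items() if gram in ref)
    precision = matches / max(sum(out.values()), 1)
    recall = matches / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# "cold cold cold cold": unclipped precision 4/4 = 1.0; clipped 1/4 = 0.25.
print(rouge_n("It is cold outside", "cold cold cold cold", clip=False))
print(rouge_n("It is cold outside", "cold cold cold cold", clip=True))
# "outside cold it is": clipped unigram precision is still 4/4 = 1.0,
# but at the bigram level only "it is" matches, so ROUGE-2 precision is 1/3.
print(rouge_n("It is cold outside", "outside cold it is", n=2))
```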
Parameter-Efficient Fine-Tuning (PEFT): finetune only a subset of model parameters, which also helps prevent catastrophic forgetting.
Methods: selective, reparameterization, and additive.
LoRA (a reparameterization method):
Steps to update the model for inference: matrix-multiply B and A, then add the product to the original frozen weights (W' = W + BA).
Example using base transformer as reference (86% reduction in parameters to train):
Model | Weight dim | Weights in total |
---|---|---|
Base transformer | 512 x 64 | 32768 |
LoRA r=8 | A (8 x 64), B (512 x 8) | 512 + 4096 = 4608 |
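A minimal PyTorch sketch of this r=8 decomposition, using the conventional zero-initialized B and randomly initialized A; all names are illustrative:

```python
import torch

d_model, d_ff, r = 512, 64, 8  # dims from the table above

# Frozen pretrained weight: 512 x 64 = 32768 parameters.
W = torch.randn(d_model, d_ff, requires_grad=False)

# Trainable low-rank pair: B (512 x 8) and A (8 x 64) = 4096 + 512 = 4608 parameters.
B = torch.zeros(d_model, r, requires_grad=True)  # zero init, so BA starts at 0
A = torch.randn(r, d_ff, requires_grad=True)     # random init

# During training only A and B receive gradients; W stays frozen.
# For inference, merge the low-rank update into the base weight:
W_eff = W + B @ A  # same 512 x 64 shape, so no extra inference latency

trainable = A.numel() + B.numel()
print(f"trainable: {trainable}, full: {W.numel()}, "
      f"reduction: {1 - trainable / W.numel():.0%}")  # ~86%
```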
LoRA for generative LLMs: train a different A and B pair for each task, then swap them in at inference time to switch tasks.
Prompt tuning is not prompt engineering.
It adds trainable soft-prompt vectors to the embedding layer while the base model stays frozen.
Switch the soft prompt to change the finetuning task (see the sketch below).
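A minimal PyTorch sketch of a soft prompt, assuming an embedding dimension of 512 and 20 virtual tokens (both illustrative):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Trainable soft-prompt vectors prepended to the input embeddings.

    The frozen LLM is untouched; only `prompt_embeddings` is trained,
    and a different set can be swapped in per task.
    """
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        self.prompt_embeddings = nn.Parameter(
            torch.randn(num_virtual_tokens, embed_dim) * 0.02
        )

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt_embeddings.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Example: 20 virtual tokens prepended to a batch of embedded inputs.
soft_prompt = SoftPrompt(num_virtual_tokens=20, embed_dim=512)
x = torch.randn(2, 10, 512)   # embedded input batch
print(soft_prompt(x).shape)   # torch.Size([2, 30, 512])
```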
Question | Answer |
---|---|
1. Fill in the blanks: __________ involves using many prompt-completion examples as the labeled training dataset to continue training the model by updating its weights. This is different from _________ where you provide prompt-completion examples during inference. | Instruction fine-tuning, in-context learning |
2. Fine-tuning a model on a single task can improve model performance specifically on that task; however, it can also degrade the performance of other tasks as a side effect. This phenomenon is known as: | Catastrophic forgetting |
3. Which evaluation metric below focuses on precision in matching generated output to the reference text and is used for text translation? | BLEU |
4. Which of the following statements about multi-task finetuning are correct? Select all that apply. | It helps prevent catastrophic forgetting & FLAN-T5 was trained with multi-task finetuning |
5. “Smaller LLMs can struggle with one-shot and few-shot inference.” | True |
6. Which of the following are Parameter Efficient Fine-Tuning (PEFT) methods? Select all that apply. | Reparameterization, Selective & Additive |
7. Which of the following best describes how LoRA works? | Decompose weights to smaller matrices and train those |
8. What is a soft prompt in the context of LLMs (Large Language Models)? | A set of trainable tokens |
9. “Prompt Tuning is a technique used to adjust all hyperparameters of a language model.” | False |
10. “PEFT methods can reduce the memory needed for fine-tuning dramatically, sometimes to just 12-20% of the memory needed for full fine-tuning.” | True |