Coursera

Week 1 note

Overview

Prompt -> LLMs -> Completion

Prompt space: the context window (typically a few thousand words; varies by model).
Act of using the model to generate text: inference.
Completion: the model's output, i.e. the original prompt followed by the generated text.
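
A minimal sketch of the prompt -> model -> completion loop, here using the Hugging Face transformers pipeline (the model name gpt2 is only an illustrative placeholder, not the course's model):

    from transformers import pipeline

    # Load a small causal language model for text generation (placeholder model)
    generator = pipeline("text-generation", model="gpt2")

    prompt = "Where is Ganymede located in the solar system?"

    # Inference: the model generates a completion from the prompt
    result = generator(prompt, max_new_tokens=50)

    # The completion contains the original prompt followed by the generated text
    print(result[0]["generated_text"])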

Use cases and tasks

Generating text

Prompting and prompt engineering

Revising the language of the prompt so the model behaves the way we want -> prompt engineering

In-context learning: including examples (additional data) in the prompt

A single example: one-shot; multiple examples: few-shot (no example: zero-shot).
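
A small sketch of how a zero-, one-, or few-shot prompt can be assembled by prepending solved examples to the new task input (the review examples below are made up for illustration):

    def build_prompt(task_input, examples, shots):
        """Assemble a prompt with `shots` solved examples followed by the new input."""
        parts = [f"Review: {review}\nSentiment: {sentiment}\n"
                 for review, sentiment in examples[:shots]]
        parts.append(f"Review: {task_input}\nSentiment:")
        return "\n".join(parts)

    # Solved examples used for in-context learning
    examples = [
        ("I loved this movie!", "Positive"),
        ("The plot was dull and the acting worse.", "Negative"),
    ]

    # shots=0 -> zero-shot, shots=1 -> one-shot, shots>1 -> few-shot
    print(build_prompt("A wonderful, heartfelt story.", examples, shots=2))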

Generative configurations
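
A sketch of the common inference-time configuration parameters (max new tokens, greedy vs. sampling, temperature, top-k, top-p), shown here with the Hugging Face generate API; the model and the specific values are illustrative only:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model; any causal LM exposes the same generation parameters
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    inputs = tokenizer("The weather today is", return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,  # cap on how many new tokens are generated
        do_sample=True,     # sample from the distribution instead of greedy decoding
        temperature=0.7,    # <1 sharpens the distribution, >1 flattens it
        top_k=50,           # keep only the 50 most likely tokens at each step
        top_p=0.9,          # nucleus sampling: keep the smallest set covering 90% probability
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))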

Generative AI project lifecycle

Scope: define the use case.
Select: choose an existing model or pretrain your own.
Adapt and align:
- Prompt engineering
- Fine-tuning
- Align with human feedback
- Evaluate
Application integration:
- Optimize and deploy the model for inference
- Augment the model and build LLM-powered applications

Select

Pretraining for domain adaptation

Quiz

Questions (answers in parentheses)
1. Interacting with Large Language Models (LLMs) differs from traditional machine learning models. Working with LLMs involves natural language input, known as a _____, resulting in output from the Large Language Model, known as the ______. (prompt, completion)
2. Large Language Models (LLMs) are capable of performing multiple tasks supporting a variety of use cases. Which of the following tasks supports the use case of converting code comments into executable code? (translation)
3. What is the self-attention that powers the transformer architecture? (A mechanism that allows a model to focus on different parts of the input sequence during computation; see the sketch after this quiz)
4. Which of the following stages are part of the generative AI model lifecycle mentioned in the course? (Select all that apply) (define, select, manipulate, deploy)
5. “RNNs are better than Transformers for generative AI tasks.” (False)
6. Which transformer-based model architecture has the objective of guessing a masked token based on the previous sequence of tokens by building bidirectional representations of the input sequence? (Autoencoder)
7. Which transformer-based model architecture is well-suited to the task of text translation? (Seq2Seq)
8. Do we always need to increase the model size to improve its performance? (False)
9. Scaling laws for pre-training large language models consider several aspects to maximize performance of a model within a set of constraints and available scaling choices. Select all alternatives that should be considered for scaling when performing model pre-training. (compute budget, dataset & model size)
10. “You can combine data parallelism with model parallelism to train LLMs.” (True)
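
For question 3, a minimal NumPy sketch of scaled dot-product self-attention, the core computation behind the mechanism the answer describes; the shapes and values are made up for illustration:

    import numpy as np

    def self_attention(x, wq, wk, wv):
        """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
        q, k, v = x @ wq, x @ wk, x @ wv          # project inputs to queries, keys, values
        scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity of every position with every other
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per position
        return weights @ v                        # weighted sum of values

    rng = np.random.default_rng(0)
    seq_len, d_model = 4, 8
    x = rng.normal(size=(seq_len, d_model))
    wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
    print(self_attention(x, wq, wk, wv).shape)  # (4, 8)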