Binary code: maps each character of an alphabet $\Sigma$ to a binary string.
Example: $\Sigma = a-z$ plus various punctuation marks (size 32 overall, say)
Obvious encoding: use the 32 5-bit binary strings to encode this $\Sigma$ (a fixed-length code)
Can we do better? yes, if some characters of $\Sigma$ are much more frequent than others, using a variable-length code.
Example: suppose $\Sigma = \{A, B, C, D\}$. A fixed-length encoding would be $\{00, 01, 10, 11\}$. Suppose instead we use the encoding $\{0, 01, 10, 1\}$. What is $001$? Not enough information to answer: it could decode as $AB$ ($0$, then $01$) or as $AAD$ ($0$, $0$, $1$).
Problem: with variable-length codes, not clear where one character ends & the next one begins.
Solution: prefix-free codes - make sure that for every pair $i, j \in \Sigma$, neither of the encodings $f(i), f(j)$ is a prefix of the other.
Example: $\{0, 10, 110, 111\}$
$\Sigma$ | Frequencies | Fixed-length | Variable-length (prefix free) |
---|---|---|---|
A | 60% | $00$ | $0$ |
B | 25% | $01$ | $10$ |
C | 10% | $10$ | $110$ |
D | 5% | $11$ | $111$ |
Fixed-length encoding: 2 bits per character.
Variable-length encoding: how many bits needed per character, on average? Answer: $$ \mathbb{E}[X] = 0.6\times 1 + 0.25\times 2 + 0.1\times 3 + 0.05\times 3 = 1.55 $$
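A quick sanity check of this arithmetic, as a minimal Python sketch (frequencies and codewords taken from the table above):

```python
# Frequencies and prefix-free codewords from the table above.
freqs = {"A": 0.60, "B": 0.25, "C": 0.10, "D": 0.05}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Average bits per character = sum over symbols of frequency x codeword length.
avg_bits = sum(freqs[s] * len(code[s]) for s in freqs)
print(avg_bits)  # 1.55
```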
Goal: best binary prefix-free encoding for a given set of character frequencies.
Useful fact: binary prefix-free codes can be viewed as binary trees.
Examples with $\Sigma = \{A, B, C, D\}$
![[Pasted image 20240224131759.png]]
In general:
To decode: repeatedly follow path from root until you hit a leaf (ex: $0110111 \mapsto ACD$).
Note: encoding length of $i \in \Sigma$ is the depth of $i$ in tree.
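Decoding can equivalently be done by greedily matching codewords left to right; here is a minimal Python sketch (a dictionary lookup stands in for the explicit walk down the tree, which is valid precisely because the code is prefix-free):

```python
def decode(bits, code):
    """Decode a bit string under a prefix-free code (symbol -> codeword)."""
    inverse = {w: s for s, w in code.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # reached a leaf of the code tree
            symbols.append(inverse[current])
            current = ""        # jump back to the root
    return "".join(symbols)

# The example above, with code {A: 0, B: 10, C: 110, D: 111}:
print(decode("0110111", {"A": "0", "B": "10", "C": "110", "D": "111"}))  # ACD
```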
Input: probability $p_i$ for each character $i \in \Sigma$
Notation: if $T$ is a tree whose leaves are the symbols of $\Sigma$, then $$ L(T) = \sum_{i \in \Sigma} p_i\cdot[\text{depth of } i \text{ in } T] $$ Example: if $p_A = 60\%, p_B = 25\%, p_C = 10\%, p_D = 5\%$, then $L(T) = 1.55$ for the variable-length tree above.
Output: a binary tree $T$ minimizing the average encoding length $L(T)$
Input: probability $p_i$ for each character $i \in \Sigma$
Output: a binary tree $T$ (whose leaves are the symbols of $\Sigma$), minimizing the average encoding length: $$ L(T) = \sum_{i \in \Sigma} p_i\cdot[\text{depth of } i \text{ in } T] $$
Question: what’s a principled approach for building a tree whose leaves are the symbols of $\Sigma$?
Natural but suboptimal idea: top-down / divide and conquer
Build the tree bottom-up using successive mergers
Question: which pair of symbols is “safe” to merge?
Observation: final encoding length of $i \in \Sigma$ is the number of mergers its subtree endures.
Suppose: 1st iteration of algorithm merges symbols $a$ and $b$.
Idea: replace the symbols $a, b$ by a new “meta-symbol” $ab$.
Quiz: Suppose in the first iteration of the algorithm, we merge symbols $a$ and $b$, replace them by a new “meta-symbol” $ab$. How should we define the frequency $p_{ab}$ of this meta-symbol? Answer: $p_a + p_b$.
Example: $p_A = 60\%, p_B = 25\%, p_C = 10\%, p_D = 5\%$
![[Pasted image 20240224181949.png]]
Given frequencies $p_i$ as input.
Input: character weights

$A$ | $B$ | $C$ | $D$ | $E$ | $F$ |
---|---|---|---|---|---|
3 | 2 | 6 | 8 | 2 | 6 |

Merge the two lightest symbols, $B$ and $E$:

$A$ | $BE$ | $C$ | $D$ | $F$ |
---|---|---|---|---|
3 | 4 | 6 | 8 | 6 |

Merge $A$ and $BE$ (weights 3 and 4):

$ABE$ | $C$ | $D$ | $F$ |
---|---|---|---|
7 | 6 | 8 | 6 |

Merge $C$ and $F$ (weights 6 and 6):

$ABE$ | $CF$ | $D$ |
---|---|---|
7 | 12 | 8 |

Merge $ABE$ and $D$ (weights 7 and 8):

$ABED$ | $CF$ |
---|---|
15 | 12 |
Final output ![[Pasted image 20240224222928.png]]
Corresponding code
Character | Code |
---|---|
$A$ | $000$ |
$B$ | $0010$ |
$C$ | $10$ |
$D$ | $01$ |
$E$ | $0011$ |
$F$ | $11$ |
Theorem (Huffman ’52): Huffman’s algorithm computes a binary tree (whose leaves are the symbols of $\Sigma$) that minimizes the average encoding length
$$ L(T) = \sum_{i \in \Sigma} p_i\cdot[\text{depth of } i \text{ in } T] $$
Proof: by induction on $n = \lvert\Sigma\rvert$ (can assume $n \geq 2$)
Base case: when $n = 2$, the algorithm outputs the optimal tree (need 1 bit per symbol)
Inductive step: fix an input with $n = \lvert \Sigma\rvert > 2$
Let $\Sigma' = \Sigma$ with $a, b$ (the symbols with the smallest frequencies) replaced by the meta-symbol $ab$. Define $p_{ab} = p_a + p_b$
By the inductive hypothesis: the algorithm solves the smaller subproblem for $\Sigma'$ optimally
Recall: exact correspondence between trees for $\Sigma'$ and $X_{ab}$ - the trees for $\Sigma$ that have $a, b$ as siblings.
Important: for every such pair $T$ and $T'$, $L(T) - L(T')$ is (after cancellation):
$$ L(T) - L(T') = p_a\cdot [\text{depth of } a \text{ in } T] + p_b\cdot[\text{depth of } b \text{ in } T] - p_{ab}\cdot[\text{depth of } ab \text{ in } T'] $$
However, $p_{ab} = p_a + p_b$, and the depths of $a$ and $b$ in $T$ are each $1$ more than the depth $d$ of $ab$ in $T'$.
Therefore,
$$ \begin{aligned} L(T) - L(T') &= p_a(d + 1) + p_b(d + 1) - (p_a + p_b)d \\ &= p_a + p_b \end{aligned} $$
Hence, $L(T) - L(T')$ is a constant ($p_a + p_b$), independent of the choice of $T$!
Inductive hypothesis: Huffman’s algorithm computes a tree $\hat{T}'$ that minimizes $L(T')$ for $\Sigma'$
Upshot: corresponding tree $\hat{T}$ minimizes $L(T)$ for $\Sigma$ over all trees in $X_{ab}$ (i.e. where $a$ and $b$ are siblings).
Key lemma (completes proof of theorem): there is an optimal tree (for $\Sigma$) in $X_{ab}$ (i.e. $a$ and $b$ were “safe” to merge).
Intuition: pushing $a$ and $b$ as deep as possible can only improve a tree (since $a, b$ have the smallest frequencies).
Proof: by exchange argument. Let $T^*$ be any tree that minimizes $L(T)$ for $\Sigma$.
Let $x, y$ be siblings at the deepest level of $T^*$.
The exchange: obtain $\hat{T}$ from $T^*$ by swapping $a \leftrightarrow x$ and $b \leftrightarrow y$.
Note: $\hat{T} \in X_{ab}$ (by choice of $x, y$)
To finish: will show that $L(\hat{T}) \leq L(T^*)$ (therefore $\hat{T}$ is also optimal, completes proof).
Reason: $$L(T^*) - L(\hat{T}) = (p_x - p_a)\left[\text{depth of } x \text{ in } T^* - \text{depth of } a \text{ in } T^*\right] + (p_y - p_b)\left[\text{depth of } y \text{ in } T^* - \text{depth of } b \text{ in } T^*\right]$$
Note that both bracketed terms are nonnegative: $p_x \geq p_a$ and $p_y \geq p_b$ (since $a, b$ have the smallest frequencies), and $x, y$ are at the deepest level of $T^*$.
Therefore $L(T^*) - L(\hat{T}) \geq 0$ - QED!
Naïve implementation: $\mathcal{O}(n^2)$ where $n = \lvert\Sigma\rvert$
Speed ups: use a heap! (to perform repeated minimum computations)
Even faster (non-trivial exercise): sorting + $\mathcal{O}(1)$ additional work per merger.
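A heap-based sketch in Python (a minimal illustration, not the only implementation; tie-breaking may yield codewords different from, but just as optimal as, the table above):

```python
import heapq

def huffman_code(weights):
    """Huffman's algorithm: repeatedly merge the two lightest (meta-)symbols.

    weights: dict mapping symbol -> weight (or frequency).
    Returns: dict mapping symbol -> codeword (bit string).
    """
    # Heap entries are (weight, tiebreak id, subtree); a subtree is either
    # a bare symbol (leaf) or a pair (left, right) of subtrees.
    heap = [(w, k, sym) for k, (sym, w) in enumerate(weights.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)  # extract the two smallest weights...
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))  # ...and merge them
        next_id += 1
    code = {}
    def walk(tree, prefix):  # read codewords off the final tree
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"  # guard for a 1-symbol alphabet
    walk(heap[0][2], "")
    return code

# The worked example above: weights A=3, B=2, C=6, D=8, E=2, F=6.
print(huffman_code({"A": 3, "B": 2, "C": 6, "D": 8, "E": 2, "F": 6}))
```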
Input: a path graph $G = (V, E)$ with non-negative weights on vertices.
Desired output: subset of nonadjacent vertices - an independent set - of maximum total weight.
Next: iterate through our algorithm design principles.
Greedy: iteratively choose the max-weight vertex not adjacent to any previously chosen vertex.
Question: consider the four-vertex path graph with vertex weights $1, 4, 5$ and $4$. What is the value of the maximum-weight independent set, and what is the value output by this greedy algorithm?
Answer: $8$ and $6$.
Idea: recursively compute the max-weight independent set (IS) of first half, same for the second half, then combine the solutions.
Problem: what if recursive sub-solutions conflict? - not clear how to quickly fix.
Critical step: reason about the structure of an optimal solution (in terms of optimal solutions of smaller subproblems)
Motivation: this thought experiment narrows down the set of candidates for the optimal solution; can search through the small set using brute-force search.
Notation: Let $S \subseteq V$ be a max-weight independent set (IS). Let $v_n$ be the last vertex of the path.
![[Pasted image 20240225122930.png]]
Case 1: suppose $v_n \notin S$. Let $G' = G$ with $v_n$ deleted. Then $S$ is also a max-weight IS of $G'$.
Case 2: suppose $v_n \in S$. Then $v_{n-1} \notin S$. Let $G'' = G$ with $v_{n-1}$ and $v_n$ deleted. Then $S - \{v_n\}$ is a max-weight IS of $G''$.
Upshot: a max-weight IS must be either a max-weight IS of $G'$, or $v_n$ plus a max-weight IS of $G''$.
Corollary: if we knew whether or not $v_n$ was in the max-weight IS, we could recursively compute the max-weight IS of $G'$ or $G''$ and be done.
(Crazy?) idea: try both possibilities and return the better solution.
Proposed algorithm: recursively compute a max-weight IS $S_1$ of $G'$ and a max-weight IS $S_2$ of $G''$; return the better of $S_1$ and $S_2 \cup \{v_n\}$.
Good news: correct (formal proof by induction)
Bad news: exponential time
Important question: how many distinct subproblems ever get solved by this algorithm? Answer: $\Theta(n)$ - only 1 for each “prefix” of the graph! (the recursion only plucks vertices off from the right)
Obvious fix: the first time you solve a subproblem, cache its solution in a global table for $\mathcal{O}(1)$-time lookup later on (“memoization”)
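A minimal Python sketch of this top-down, memoized version (the function name is illustrative):

```python
from functools import lru_cache

def wis_value_memo(weights):
    """Max-weight IS value of a path graph, via recursion + memoization.

    weights[j] is the weight of vertex v_{j+1}.  The only distinct
    subproblems are the prefixes G_0, ..., G_n, so Theta(n) work in total.
    """
    @lru_cache(maxsize=None)  # the "global table" of cached sub-solutions
    def best(i):  # value of a max-weight IS of the first i vertices
        if i <= 0:
            return 0
        # Case 1: v_i excluded.  Case 2: v_i included (so v_{i-1} is skipped).
        return max(best(i - 1), best(i - 2) + weights[i - 1])
    return best(len(weights))

print(wis_value_memo([1, 4, 5, 4]))  # 8 (the greedy quiz example above)
```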
Even better: reformulate as a bottom-up iterative algorithm.
Let $G_i$ be the first $i$ vertices of $G$.
Plan: populate an array $A$ left to right, with $A[i]$ = the value of a max-weight IS of $G_i$. Initialization: $A[0] = 0, A[1]=w_1$.
Main loop:
```
for i = 2, 3, 4, ..., n:
    A[i] = max(A[i - 1], A[i - 2] + w_i)
```
Running time: obviously $\mathcal{O}(n)$
Correctness: same as recursive version.
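The same loop in runnable form (a minimal Python sketch; `wis_values` is an illustrative name, and returning the whole array is deliberate - it is exactly what the reconstruction below needs):

```python
def wis_values(weights):
    """Fill in A[i] = value of a max-weight IS of G_i (first i vertices)."""
    n = len(weights)
    A = [0] * (n + 1)  # A[0] = 0
    if n >= 1:
        A[1] = weights[0]
    for i in range(2, n + 1):
        # v_i excluded (inherit A[i-1]) vs. v_i included (A[i-2] + w_i)
        A[i] = max(A[i - 1], A[i - 2] + weights[i - 1])
    return A

print(wis_values([1, 4, 5, 4]))  # [0, 1, 4, 6, 8]
```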
Note: algorithm computes the value of a max-weight IS, not such an IS itself.
Correct but not ideal: store optimal IS of each $G_i$ in the array in addition to its value
Better: trace-back through filled-in array to reconstruct optimal solution
Key point: we know that a vertex $v_i$ belongs to a max-weight IS of $G_i$ if and only if $w_i$ + max-weight IS of $G_{i - 2} \geq$ max-weight IS of $G_{i - 1}$ - follows from correctness of our algorithm.
```
A = filled-in array
S = empty set
i = n
while i >= 1:  (scan through the array from right to left)
    if A[i - 1] >= A[i - 2] + w_i:
        decrease i by 1
    else:
        add v_i to S
        decrease i by 2
return S
```
(For $i = 1$, read $A[i - 2]$ as $0$.)
Claim: (by induction or case analysis) the final output is a max-weight IS of $G$.
Running time: $\mathcal{O}(n)$
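In runnable form (a minimal Python sketch continuing the `wis_values` example above; the helper name is illustrative):

```python
def wis_reconstruct(weights, A):
    """Trace back through the filled-in array to recover an optimal IS."""
    S, i = [], len(weights)
    while i >= 1:
        prev2 = A[i - 2] if i >= 2 else 0  # convention: A[-1] = 0
        if A[i - 1] >= prev2 + weights[i - 1]:
            i -= 1                         # v_i excluded
        else:
            S.append(i)                    # v_i included (vertices 1-indexed)
            i -= 2
    return sorted(S)

weights = [1, 4, 5, 4]
print(wis_reconstruct(weights, wis_values(weights)))  # [2, 4], total weight 8
```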
Fact: our WIS algorithm is a dynamic programming algorithm!
Key ingredients of dynamic programming
Consider an alphabet with five letters $\{a, b, c, d, e\}$, and suppose we know the frequencies $f_a = 0.32, f_b = 0.25, f_c = 0.2, f_d = 0.18$ and $f_e = 0.05$. What is the expected number of bits used by Huffman’s coding scheme to encode a 1000-letter document?
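(One way to check your answer: reuse the `huffman_code` sketch above, assuming the frequencies as given.)

```python
freqs = {"a": 0.32, "b": 0.25, "c": 0.20, "d": 0.18, "e": 0.05}
code = huffman_code(freqs)  # from the heap-based sketch earlier in these notes
bits_per_letter = sum(freqs[s] * len(code[s]) for s in freqs)
print(1000 * bits_per_letter)  # expected bits for a 1000-letter document
```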
Under a Huffman encoding of $n$ symbols, how long (in terms of number of bits) can a codeword possibly be?
Which of the following statements holds for Huffman’s coding scheme?
Which of the following is true for our dynamic programming algorithm for computing a maximum-weight independent set of a path graph? (Assuming there are no ties)
Recall our dynamic programming algorithm for computing the maximum-weight independent set of a path graph. Consider the following proposed extension to more general graphs. Consider an undirected graph with positive vertex weights. For a vertex $v$, obtain the graph $G'(v)$ by deleting $v$ and its incident edges from $G$, and obtain the graph $G''(v)$ by deleting $v$, its neighbors, and all of the corresponding incident edges from $G$. Let $\text{OPT}(H)$ denote the value of a maximum-weight independent set of a graph $H$. Consider the formula
$$ \text{OPT}(G)=\max\{\text{OPT}(G'(v)),\; w_v + \text{OPT}(G''(v))\} $$
where $v$ is an arbitrary vertex of $G$ of weight $w_v$. Which of the following statements is true?