Algorithm design: there is no single “silver bullet” for solving problems.
Some design paradigms:
Greedy algorithms: iteratively make “myopic” decisions, and hope everything works out at the end.
Example: Dijkstra’s shortest-path algorithm.
Danger: most greedy algorithms are not correct (even if your intuition says otherwise).
Quiz: run Dijkstra’s algorithm on a graph with negative edge lengths (shown below). What does the algorithm compute as the length of a shortest $s-w$ path, and what is the correct answer? ![[Pasted image 20240218223742.png]] Answer: $2$ and $-1$, respectively.
Method 1: induction (“greedy stays ahead”); example: correctness proof for Dijkstra’s algorithm.
Method 2: “exchange argument”.
Method 3: whatever works.
Example, optimal caching. Theorem (Bélády, 1960s): the “furthest-in-future” algorithm (evict the cached item whose next request is furthest in the future) is optimal (i.e., it minimizes the number of cache misses).
Why useful? (1) Serves as a guideline for practical caching algorithms (e.g., LRU, which does well when the recent past is a good predictor of the near future); (2) serves as an idealized benchmark against which caching algorithms can be compared.
Proof: tricky exchange argument.
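As a concrete illustration (not from the lecture), here is a minimal Python sketch of the furthest-in-future rule; it assumes the whole request sequence is known in advance, and the function name and interface are mine:

```python
def furthest_in_future_misses(requests, cache_size):
    """Simulate the furthest-in-future eviction rule; return the number of cache misses."""
    cache, misses = set(), 0
    for i, item in enumerate(requests):
        if item in cache:
            continue                       # cache hit
        misses += 1
        if len(cache) >= cache_size:
            # Evict the cached item whose next request is furthest in the future
            # (infinity if it is never requested again).
            def next_use(x):
                for j in range(i + 1, len(requests)):
                    if requests[j] == x:
                        return j
                return float('inf')
            cache.remove(max(cache, key=next_use))
        cache.add(item)
    return misses
```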
Setup: one shared resource (e.g., a processor) and many jobs to complete.
Question: in what order should we sequence the jobs?
Assume each job $j$ has a weight $w_j$ and a length (processing time) $l_j$.
The completion time $C_j$ of job $j$ is the sum of job lengths up to and including $j$.
Example: 3 jobs, $l_1 = 1, l_2 = 2, l_3 = 3$. What are $C_1, C_2, C_3$? Answer: $1, 3$ and $6$.
Goal: minimize the weighted sum of completion times: $$ \min\sum_{j=1}^n w_jC_j $$ Back to example, if $w_1=3, w_2=2, w_3=1$, this sum is $3\cdot 1 + 2\cdot 3 + 1\cdot 6=15$
Goal: devise correct greedy algorithm
Question: with equal lengths, should jobs with larger or smaller weights be scheduled earlier? (Larger.) With equal weights, should longer or shorter jobs be scheduled earlier? (Shorter.)
Question: what if $w_i > w_j$ but $l_i > l_j$?
Idea: assign “scores” to jobs that are increasing in weight and decreasing in length.
Guess 1: order jobs by decreasing value of $w_j - l_j$.
Guess 2: order jobs by decreasing value of $\frac{w_j}{l_j}$.
To distinguish guesses 1 and 2: find an example where the two algorithms produce different outputs (at least one of them will then be incorrect).
Example: $(w_1, l_1) = (3, 5), (w_2, l_2) = (1, 2)$. Answer: $23$ (guess 1’s schedule) and $22$ (guess 2’s schedule).
So: the first guess is not always correct, while the second is always correct (this is not obvious); see the sketch below for a quick check of this example.
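To sanity-check the numbers above, a quick Python sketch (helper name is mine) that evaluates the objective under both orderings:

```python
def weighted_completion_sum(jobs):
    """jobs: list of (weight, length) pairs in scheduled order; returns sum of w_j * C_j."""
    total, completion = 0, 0
    for w, l in jobs:
        completion += l            # C_j = sum of lengths up to and including job j
        total += w * completion
    return total

jobs = [(3, 5), (1, 2)]            # (w_1, l_1) = (3, 5), (w_2, l_2) = (1, 2)
guess1 = sorted(jobs, key=lambda wl: wl[0] - wl[1], reverse=True)   # decreasing w_j - l_j
guess2 = sorted(jobs, key=lambda wl: wl[0] / wl[1], reverse=True)   # decreasing w_j / l_j
print(weighted_completion_sum(guess1))   # 23
print(weighted_completion_sum(guess2))   # 22
```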
Claim: algorithm #2 (order jobs according to decreasing ratios $\frac{w_i}{l_i}$) is always correct (including ties).
[!tip]- Proof by an Exchange Argument
Plan: fix an arbitrary input of $n$ jobs; will proceed by contradiction. Let $\sigma$ be the greedy schedule and $\sigma^*$ the optimal schedule. We will produce a schedule even better than $\sigma^*$, contradicting the purported optimality of $\sigma^*$.
Assume that the ratios $\frac{w_j}{l_j}$ are distinct and (by renaming jobs) that the greedy schedule $\sigma$ is $1, 2, \dots, n$. If $\sigma^* \neq \sigma$, then there are consecutive jobs $i, j$ in $\sigma^*$ with $i > j$ (i.e., $i$ is scheduled immediately before $j$).
Thought experiment: suppose we exchange order of $i, j$ in $\sigma^*$ (leaving other jobs unchanged)
Question: what is the effect of this exchange on the completion times of (a) jobs other than $i$ and $j$, (b) job $i$, and (c) job $j$? Answer: (a) unaffected, (b) goes up by $l_j$, (c) goes down by $l_i$.
Upshot: the cost of the exchange is $w_il_j$ and the benefit is $w_jl_i$.
Note: $i > j \Rightarrow \frac{w_i}{l_i} < \frac{w_j}{l_j} \Rightarrow w_il_j < w_jl_i$, hence cost < benefit. Therefore the swap improves $\sigma^*$, contradicting the optimality of $\sigma^*$.
Back to the claim [[#Correctness proof]], new proof plan: fix an arbitrary input of $n$ jobs, let $\sigma$ be the greedy schedule and $\sigma^*$ be any other schedule. Will show that $\sigma$ is at least as good as $\sigma^*$, implying that the greedy schedule is optimal.
Assume (just by renaming jobs) that the greedy schedule $\sigma$ is $1, 2, 3, \dots, n$ (and so $\frac{w_1}{l_1} \geq \frac{w_2}{l_2} \geq \dots \geq \frac{w_n}{l_n}$). Consider an arbitrary schedule $\sigma^*$. If $\sigma^* = \sigma$, done. Otherwise, recall there exist consecutive jobs $i, j$ in $\sigma^*$ with $i > j$.
Note that $i > j \Rightarrow \frac{w_i}{l_i} \leq \frac{w_j}{l_j} \Rightarrow w_il_j \leq w_jl_i$
Recall: exchanging $i, j$ in $\sigma^*$ has net benefit of $w_jl_i - w_il_j \geq 0$.
Upshot: exchanging an “adjacent inversion” like $i, j$ only makes $\sigma^*$ better, and it decreases the number of inverted pairs (jobs $i, j$ with $i > j$ and $i$ scheduled earlier). After at most $\binom{n}{2}$ such exchanges, can transform $\sigma^*$ into $\sigma$. Hence, $\sigma$ is at least as good as $\sigma^*$; therefore the greedy approach is optimal.
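The resulting greedy algorithm is then just a sort. A minimal sketch (function name is mine):

```python
def greedy_schedule(jobs):
    """jobs: list of (weight, length) pairs.
    Returns the jobs ordered by decreasing ratio w_j / l_j (ties broken arbitrarily),
    which by the argument above minimizes the weighted sum of completion times."""
    return sorted(jobs, key=lambda wl: wl[0] / wl[1], reverse=True)
```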
Informal goal: connect a bunch of points together as cheaply as possible.
Applications: clustering (more later), networking.
Blazingly fast greedy algorithms: Prim’s algorithm and Kruskal’s algorithm, each running in $\mathcal{O}(m\log{n})$ time (where $m$ = number of edges, $n$ = number of vertices).
Input: undirected graph $G = (V, E)$ and a cost $c_e$ for each edge $e \in E$.
Output: minimum-cost tree $T \subseteq E$ that spans all vertices, i.e., $T$ has no cycles and the subgraph $(V, T)$ is connected (contains a path between each pair of vertices).
Standing assumptions: (1) $G$ is connected (otherwise no spanning trees exist); (2) edge costs are distinct (the algorithm remains correct with ties, but distinct costs simplify the correctness proof).
Prim’s MST algorithm (compare to Dijkstra’s shortest-path algorithm):
Initialize X = [s] (s in V chosen arbitrarily)
T = empty (invariant: X = vertices spanned by tree-so-far T)
while X != V:
    let e = (u, v) be the cheapest edge of G with u in X, v not in X
    add e to T
    add v to X
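A direct Python translation of the pseudocode above, as a sketch: it assumes the graph is connected and is given as a vertex collection plus a list of (u, v, cost) triples (representation and names are mine). Each iteration scans all edges, matching the straightforward running-time analysis later on.

```python
def prim_mst(vertices, edges):
    """vertices: iterable of vertex labels; edges: list of (u, v, cost) triples.
    Returns a minimum spanning tree as a list of (u, v, cost) edges."""
    vertices = set(vertices)
    s = next(iter(vertices))          # s chosen arbitrarily
    X, T = {s}, []                    # invariant: X = vertices spanned by the tree-so-far T
    while X != vertices:
        # cheapest edge of G with exactly one endpoint in X
        crossing = [(u, v, c) for (u, v, c) in edges if (u in X) != (v in X)]
        u, v, c = min(crossing, key=lambda e: e[2])
        T.append((u, v, c))
        X.add(v if v not in X else u)
    return T
```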
Theorem: Prim’s algorithm always computes an MST.
To prove: (1) Prim’s algorithm outputs a spanning tree; (2) that spanning tree has minimum cost.
Claim: Prim’s algorithm outputs a spanning tree.
Definition: a cut of a graph $G = (V, E)$ is a partition of $V$ into 2 non-empty sets $A$ and $B$.
Question: roughly how many cuts does a graph with $n$ vertices have? Answer: $2^n$ (for each vertex, choose whether in $A$ or in $B$).
Empty cut lemma: a graph is not connected if and only if there exists a cut $(A, B)$ with no crossing edges.
Proof: ($\Leftarrow$) Given a cut $(A, B)$ with no crossing edges, pick any $u \in A$ and $v \in B$; since no edge crosses the cut, $G$ has no $u$-$v$ path, so $G$ is not connected. ($\Rightarrow$) If $G$ is not connected, pick $u, v$ with no $u$-$v$ path; let $A$ be the vertices reachable from $u$ and $B$ everything else. No edge crosses $(A, B)$ (such an edge would extend the reachable region).
Double crossing lemma: suppose the cycle $C \subseteq E$ has an edge crossing the cut $(A, B)$, then so does some other edge of $C$.
Lonely cut corollary: if $e$ is the only edge crossing some cut $(A, B)$, then it is not in any cycle (if it were in a cycle, some other edge would have to cross the cut).
Claim: Prim’s algorithm outputs a spanning tree.
Proof: The algorithm maintains the invariant that $T$ spans $X$ (straightforward induction). It cannot get stuck: if no edge crossed the cut $(X, V-X)$ while $X \neq V$, the Empty Cut Lemma would imply $G$ is disconnected, a contradiction; so the algorithm terminates with $X = V$. Finally, $T$ has no cycles: when an edge $(u, v)$ is added, it is the only edge of the tree-so-far (plus $(u,v)$) crossing the cut $(X, V-X)$, since all previously added edges have both endpoints in $X$; by the Lonely Cut Corollary it therefore creates no cycle.
Key question: when is it “safe” to include an edge in the tree-so-far?
Cut property: consider an edge $e$ of $G$. Suppose there is a cut $(A, B)$ such that $e$ is the cheapest edge of $G$ that crosses it, then $e$ belongs to the MST of $G$.
Turns out MST is unique if edge costs are distinct.
Proof: will argue by contradiction, using an exchange argument (compare to scheduling application).
Suppose there is an edge $e$ that is the cheapest one crossing a cut $(A, B)$, yet $e$ is not in the MST $T^*$.
Idea: exchange $e$ with another edge in $T^*$ to obtain a spanning tree cheaper than $T^*$ (contradiction).
Question: let $T^*$ be a spanning tree of $G$, $e \notin T^*$, $f \in T^*$. Is $T^* \cup \{e\} - \{f\}$ a spanning tree of $G$? Answer: maybe, maybe not (depending on the choice of $e$ and $f$).
Hope: can always find a suitable edge $e'$ so that the exchange yields a bona fide spanning tree of $G$.
How? Let $C$ be the cycle created by adding $e$ to $T^*$. By the Double Crossing Lemma, some other edge $e'$ of $C$ (with $e' \neq e$ and $e' \in T^*$) crosses $(A, B)$. Since $c_e < c_{e'}$, the tree $T^* \cup \{e\} - \{e'\}$ is a spanning tree cheaper than the proposed MST $T^*$; contradiction.
Claim: the Cut Property implies that Prim’s algorithm is correct.
Proof: by the previous claim, Prim’s algorithm outputs a spanning tree $T^*$.
Key point: every edge $e \in T^*$ is explicitly justified by the Cut Property (when $e$ was added, it was the cheapest edge crossing the cut $(X, V-X)$). Therefore, $T^*$ is a subset of the MST. Since $T^*$ is already a spanning tree, it must be the MST.
Running time of the straightforward implementation: $\mathcal{O}(n)$ iterations of the while loop and $\mathcal{O}(m)$ work per iteration (scanning the edges for the cheapest crossing one), so $\mathcal{O}(mn)$ time overall.
Recall from Part I: the raison d’être of a heap is to speed up repeated minimum computations.
Specifically: a heap supports Insert, Extract-Min, and Delete in $\mathcal{O}(\log{n})$ time (where $n$ is the number of objects in the heap).
Natural idea: use a heap to store the edges, with keys equal to the edge costs. This leads to an $\mathcal{O}(m\log{n})$ implementation of Prim’s algorithm.
Slicker implementation: store vertices in the heap instead of edges. Invariant 1: the elements in the heap are the vertices of $V-X$.
Invariant 2: for $v\in V-X$, key[$v$] is the cost of the cheapest edge $(u, v)$ with $u\in X$ (or $+\infty$ if no such edge exists).
Check: can initialize the heap with $\mathcal{O}(m + n\log{n}) = \mathcal{O}(m\log{n})$ preprocessing ($m$ to compute the initial keys, $n\log{n}$ for the $n$ inserts; the $=$ holds because $m \geq n - 1$ since $G$ is connected).
Note: given the invariants, Extract-Min yields the next vertex $v\notin X$ and edge $(u, v)$ crossing $(X, V-X)$ to add to $X$ and $T$, respectively.
Quiz: in the graph (shown below), what is
![[Pasted image 20240219232601.png]]
Answer: $2, 10, 1$
Issue: might need to re-compute some keys to maintain invariant 2 after each extract-min.
Pseudocode:
When v added to X:
    for each edge (v, w) in E:
        if w in V-X (the only vertices whose keys might have dropped):
            Delete w from heap
            Recompute Key[w] = min{Key[w], c_vw}
            Reinsert w into heap
Hence, $\mathcal{O}(m)$ heap operations (recall $m \geq n - 1$ since $G$ connected), overall $\mathcal{O}(m\log{n})$ time.
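Python’s `heapq` module has no Delete operation, so a faithful vertex-keyed implementation needs extra bookkeeping. A common workaround, sketched below under that assumption, keeps crossing edges in the heap and lazily skips stale entries when popped; each edge is pushed at most twice, so it still performs $\mathcal{O}(m)$ heap operations and runs in $\mathcal{O}(m\log{n})$ time. The adjacency-dict representation and names are mine.

```python
import heapq

def prim_mst_heap(adj):
    """adj: dict mapping each vertex to a list of (neighbor, cost) pairs (undirected graph).
    Returns a minimum spanning tree as a list of (u, v, cost) edges."""
    s = next(iter(adj))                          # start vertex, chosen arbitrarily
    X, T = {s}, []
    heap = [(c, s, v) for v, c in adj[s]]        # edges crossing the cut ({s}, V - {s})
    heapq.heapify(heap)
    while heap and len(X) < len(adj):
        c, u, v = heapq.heappop(heap)
        if v in X:
            continue                             # stale entry: both endpoints already in X
        T.append((u, v, c))                      # cheapest edge crossing (X, V - X)
        X.add(v)
        for w, cost in adj[v]:
            if w not in X:
                heapq.heappush(heap, (cost, v, w))
    return T
```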
We are given as input a set of $n$ requests (e.g., for the use of an auditorium), with a known start time $s_i$ and finish time $t_i$ for each request $i$. Assume that all start and finish times are distinct. Two requests conflict if they overlap in time — if one of them starts between the start and finish times of the other. Our goal is to select a maximum-cardinality subset of the given requests that contains no conflicts. (For example, given three requests consuming the intervals $[0,3], [2,5]$, and $[4,7]$, we want to return the first and third requests.) We aim to design a greedy algorithm for this problem with the following form: At each iteration we select a new request $i$, including it in the solution-so-far and deleting from future consideration all requests that conflict with $i$.
Which of the following greedy rules is guaranteed to always compute an optimal solution?
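A sketch of the greedy template described above, with the selection rule left as a key function (deciding which rule makes this template optimal is exactly the question); names are mine:

```python
def greedy_requests(requests, rule):
    """requests: list of (start, finish) pairs with distinct endpoints;
    rule: key function used to pick the next request (e.g., smallest finish time,
    smallest start time, ...). Returns a conflict-free subset of the requests."""
    remaining = list(requests)
    chosen = []
    while remaining:
        i = min(remaining, key=rule)             # greedily pick the next request
        chosen.append(i)
        # delete i and every request that conflicts (overlaps in time) with i
        remaining = [r for r in remaining if r[1] <= i[0] or r[0] >= i[1]]
    return chosen
```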
We are given as input a set of $n$ jobs, where job $j$ has a processing time $p_j$ and a deadline $d_j$. Recall the definition of completion times $C_j$ from the video lectures. Given a schedule (i.e., an ordering of the jobs), we define the lateness $l_j$ of job $j$ as the amount of time $C_j - d_j$ after its deadline that the job completes, or as $0$ if $C_j \leq d_j$. Our goal is to minimize the maximum lateness, $\max_j l_j$.
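To make the definitions concrete, a small sketch (function name is mine) that computes the maximum lateness of a given ordering:

```python
def max_lateness(jobs):
    """jobs: list of (processing_time, deadline) pairs in scheduled order.
    Returns max_j l_j, where l_j = max(C_j - d_j, 0)."""
    completion, worst = 0, 0
    for p, d in jobs:
        completion += p                        # C_j
        worst = max(worst, completion - d)     # lateness clamped at 0 by the initial worst = 0
    return worst
```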
In this problem you are given as input a graph $T=(V,E)$ that is a tree (that is, $T$ is undirected, connected, and acyclic). A perfect matching of $T$ is a subset $F\subset E$ of edges such that every vertex $v\in V$ is the endpoint of exactly one edge of $F$. Equivalently, $F$ matches each vertex of $T$ with exactly one other vertex of $T$. For example, a path graph has a perfect matching if and only if it has an even number of vertices.
Consider the following two algorithms that attempt to decide whether or not a given tree has a perfect matching. The degree of a vertex in a graph is the number of edges incident to it. (The two algorithms differ only in the choice of $v$ in line 5.)
Algorithm A:
While T has at least one vertex:
    If T has no edges:
        halt and output "T has no perfect matching."
    Else:
        Let v be a vertex of T with maximum degree.
        Choose an arbitrary edge e incident to v.
        Delete e and its two endpoints from T.
[end of while loop]
Halt and output "T has a perfect matching."
Algorithm B:
While T has at least one vertex:
    If T has no edges:
        halt and output "T has no perfect matching."
    Else:
        Let v be a vertex of T with minimum non-zero degree.
        Choose an arbitrary edge e incident to v.
        Delete e and its two endpoints from T.
[end of while loop]
Halt and output "T has a perfect matching."
Is either algorithm correct?
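To experiment with the two algorithms on small trees, here is a Python sketch (the adjacency-dict representation, function name, and `use_min_degree` flag are mine); `use_min_degree=False` mimics Algorithm A and `True` mimics Algorithm B, with the arbitrary edge choice made by whatever order set iteration happens to give:

```python
def has_perfect_matching(adj, use_min_degree):
    """adj: dict {vertex: set of neighbors} describing a tree T.
    Returns True/False according to the algorithm's output."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}     # work on a copy
    while adj:                                          # T has at least one vertex
        if all(not nbrs for nbrs in adj.values()):
            return False                                # vertices remain but T has no edges
        non_isolated = [v for v in adj if adj[v]]
        pick = min if use_min_degree else max
        v = pick(non_isolated, key=lambda x: len(adj[x]))
        u = next(iter(adj[v]))                          # arbitrary edge e = (v, u) incident to v
        for x in (v, u):                                # delete e and its two endpoints from T
            for nbr in adj[x]:
                adj[nbr].discard(x)
        del adj[v], adj[u]
    return True
```

For example, the four-vertex path `{1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}` has a perfect matching, consistent with the even-path observation above.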
Consider an undirected graph $G=(V,E)$ where every edge $e\in E$ has a given cost $c_e$. Assume that all edge costs are positive and distinct. Let $T$ be a minimum spanning tree of $G$ and $P$ a shortest path from the vertex $s$ to the vertex $t$. Now suppose that the cost of every edge $e$ of $G$ is increased by $1$ and becomes $c_e+1$. Call this new graph $G'$. Which of the following is true about $G'$?
Suppose $T$ is a minimum spanning tree of the connected graph $G$. Let $H$ be a connected induced subgraph of $G$. (I.e., $H$ is obtained from $G$ by taking some subset $S\subseteq V$ of vertices, and taking all edges of $E$ that have both endpoints in $S$. Also, assume $H$ is connected.) Which of the following is true about the edges of $T$ that lie in $H$? You can assume that edge costs are distinct, if you wish. (Choose the strongest true statement)