Coursera

Week 3

The sorting problem

Input: array of $n$ numbers, unsorted

Output: same numbers, sorted in increasing / decreasing order.

Assume: array entries distinct.

Probability review

1. Sample Spaces

Sample space $\Omega$ = “all possible outcomes”. In algorithms, $\Omega$ is usually finite.

Each outcome $i \in \Omega$ has a probability $p(i) \geq 0$. Constraints: $\sum_{i \in \Omega} p(i) = 1$.

Example #1: Rolling 2 dice. $\Omega = \{(1, 1), (1, 2), .., (6, 6)\}$ - ordered pairs, 36 in total.

Example #2: Choosing a random pivot in the outermost Quick Sort call, hence $\Omega = \{1, 2, .., n\}$.

2. Events

An event is a subset $S \subseteq \Omega$. The probability of an event $S$ is $\sum_{i \in S} p(i)$.

Example #1: consider the event “the sum of the two dice is 7”. This event is $S = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$. Hence, $\mathbb{P}[S] = \frac{6}{36} = \frac{1}{6}$

Example #2: consider the event “the chosen pivot gives a 25-75 split or better”.

Suppose the array contains the numbers 1 to 100. We want neither subarray to have more than 75% of the elements, so the pivot can be any element in the range $[26, 75]$. In general, $S = \left\{\left(\frac{n}{4} + 1\right)^{\text{th}}\text{ smallest element}, .., \left(\frac{3n}{4}\right)^{\text{th}}\text{ smallest element}\right\}$, which contains $\frac{3n}{4} - \left(\frac{n}{4} + 1\right) + 1 = \frac{n}{2}$ elements.

The probability of this event is $\mathbb{P}[S] = \frac{n/2}{n} = \frac{1}{2}$
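As a sanity check, a short Python sketch (assuming an array of $n = 100$ distinct elements, identified by their ranks) counts the pivot ranks that give a 25-75 split or better:

```python
n = 100
# A pivot of rank k splits the array into pieces of size k - 1 and n - k.
# The split is 25-75 or better when the smaller piece has at least n/4 elements.
good = [k for k in range(1, n + 1) if min(k - 1, n - k) >= n / 4]
prob = len(good) / n  # fraction of pivot ranks in the event S
```

The good ranks are exactly 26 through 75, so `prob` comes out to 0.5, matching the calculation above.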

3. Random variables

A random variable $X$ is a real-valued function $X: \Omega \rightarrow \mathbb{R}$

Example #1: sum of the two dice.

Example #2: size of subarray passed to first recursive call.

4. Expectation

Let $X: \Omega \rightarrow \mathbb{R}$ be a random variable. The expectation $\mathbb{E}[X]$ of $X$ is the average value of $X$.

$$ \mathbb{E}[X] = \sum_{i \in \Omega} X(i) \cdot p(i) $$

Example #1: expectation of sum of two dice? 7

Example #2: expectation of the size of the subarray passed to the first recursive call in Quick Sort?

Let $X$ be the subarray size. Each size $0, 1, .., n - 1$ occurs with probability $\frac{1}{n}$, so $\mathbb{E}[X] = \frac{1}{n}\cdot 0 + \frac{1}{n}\cdot 1 + .. + \frac{1}{n}\cdot( n - 1) = \frac{n - 1}{2}$
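This expectation can be checked with exact rational arithmetic (a small sketch; $n = 10$ is an arbitrary choice):

```python
from fractions import Fraction

n = 10
# The pivot's rank k is uniform on {1, ..., n}; the first recursive call
# then receives the k - 1 elements smaller than the pivot.
expected_size = sum(Fraction(1, n) * (k - 1) for k in range(1, n + 1))
```

For $n = 10$ this gives exactly $\frac{9}{2}$, i.e. $\frac{n-1}{2}$.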

5. Linearity of Expectation

Let $X_1, X_2, .., X_n$ be random variables defined on $\Omega$. Then:

$$ \mathbb{E}\left[\sum_{j = 1}^n X_j \right] = \sum_{j = 1}^n\mathbb{E}[X_j] $$

Crucially: this holds even when the $X_j$ are not independent. It would be false in general if we replaced the sum with a product.

Example #1: if $X_1, X_2$ are the two dice, then

$$ \mathbb{E}[X_j] = \frac{1}{6} (1 + 2 + 3 + 4 + 5 + 6) = 3.5 $$

By linearity of expectation,

$$ \mathbb{E}[X_1 + X_2] = \mathbb{E}[X_1] + \mathbb{E}[X_2] = 3.5 + 3.5 =7 $$

Example #2: load balancing.

We need to assign $n$ processes to $n$ servers. A proposed solution is to assign each process to a random server. What is the expected number of processes assigned to a server?

Sample space: $\Omega$ = all $n^n$ assignments of processes to servers, each equally likely. Let $Y$ be the total number of processes assigned to the first server. The goal is to compute $\mathbb{E}[Y]$.

Let

$$ X_j = \begin{cases} 1 & \text{if } j^{\text{th}} \text{ process assigned to first server} \\ 0 & \text{otherwise} \end{cases} $$

We have

$$ \mathbb{E}[Y] = \mathbb{E}\left[\sum_{j = 1}^n X_j\right] = \sum_{j = 1}^n\mathbb{E}\left[X_j\right] = \sum_{j = 1}^n \left(\mathbb{P}[X_j = 0]\cdot 0 + \mathbb{P}[X_j = 1]\cdot 1\right) = \sum_{j = 1}^n\frac{1}{n} = 1 $$
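For a small $n$ the whole sample space can be enumerated by brute force, confirming that the expected load of the first server is exactly 1 (a sketch with $n = 4$):

```python
from itertools import product

n = 4
loads = []
for assignment in product(range(n), repeat=n):  # all n^n equally likely assignments
    # load of server 0 under this assignment
    loads.append(sum(1 for server in assignment if server == 0))
expected_load = sum(loads) / len(loads)
```

Over all $4^4 = 256$ assignments, the average load of server 0 is exactly 1.0.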

6. Conditional probability

Let $X, Y \subseteq \Omega$ be events. Then

$$ \mathbb{P}[X\mid Y] = \frac{\mathbb{P}[X \cap Y]}{\mathbb{P}[Y]} $$

Example #1: Suppose you roll two fair dice. What is the probability that at least one die is a 1, given that the sum of the two dice is 7?

Let $X$ be “at least one die is a 1”, $Y$ be “sum of two dice is 7”. We have $\mathbb{P}[X \cap Y] = \frac{2}{36}$ (outcomes $(1, 6)$ and $(6, 1)$) and $\mathbb{P}[Y] = \frac{6}{36}$. Hence the answer is $\frac{2}{6} = \frac{1}{3}$.
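A quick enumeration over the 36 equally likely outcomes confirms the conditional probability:

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls
Y = [o for o in outcomes if sum(o) == 7]         # sum of the two dice is 7
X_and_Y = [o for o in Y if 1 in o]               # ... and at least one die is a 1
p = len(X_and_Y) / len(Y)                        # P[X | Y]
```

Here `Y` contains 6 outcomes and `X_and_Y` contains 2, giving $p = \frac{1}{3}$.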

7. Independence of events

Events $X, Y \subseteq \Omega$ are independent if and only if $\mathbb{P}[X\cap Y] = \mathbb{P}[X]\mathbb{P}[Y]$. Equivalently, $\mathbb{P}[X \mid Y] = \mathbb{P}[X]$, and also $\mathbb{P}[Y \mid X] = \mathbb{P}[Y]$.

Note: can be a very subtle concept (intuition is often incorrect).

Independence of random variables: random variables $A, B$ both defined on $\Omega$ are independent iff the events $[A = a]$ and $[B = b]$ are independent for all $a, b$ (i.e. $\mathbb{P}[A = a \text{ and } B = b] = \mathbb{P}[A=a]\mathbb{P}[B=b]$)

Claim: if $A, B$ are independent then $\mathbb{E}[A\cdot B] = \mathbb{E}[A]\cdot\mathbb{E}[B]$.

Proof:

$$ \mathbb{E}[A\cdot B] = \sum_{a, b}(a\cdot b)\cdot\mathbb{P}[A=a \text{ and } B = b] = \sum_{a, b}(a\cdot b)\cdot\mathbb{P}[A=a]\cdot\mathbb{P}[B = b] = \sum_a \left(a\mathbb{P}[A = a]\right)\sum_b\left(b\mathbb{P}[B = b]\right) = \mathbb{E}[A]\cdot\mathbb{E}[B] $$

Example #1: Let $X_1, X_2 \in \{0, 1\}$ be random, and $X_3 = X_1 \oplus X_2$. Formally: $\Omega = \{000, 101, 011, 110\}$ (bits listed as $X_1 X_2 X_3$), each equally likely.

Claim: $X_1$ and $X_3$ are independent random variables. Indeed, each of the four combinations of $(X_1, X_3)$ occurs with probability $\frac{1}{4} = \frac{1}{2}\cdot\frac{1}{2}$; in particular $\mathbb{E}[X_1] = \frac{1}{2}, \mathbb{E}[X_3] = \frac{1}{2}, \mathbb{E}[X_1 X_3] = \frac{1}{4}$ (QED).

Claim: $X_1 X_3$ and $X_2$ are not independent random variables, since $\mathbb{E}[X_1 X_2 X_3] = 0$ while $\mathbb{E}[X_1X_3] = \frac{1}{4}$ and $\mathbb{E}[X_2] = \frac{1}{2}$. Hence $\mathbb{E}[(X_1 X_3)\cdot X_2] \neq \mathbb{E}[X_1X_3]\cdot\mathbb{E}[X_2]$, so by the claim above they cannot be independent.
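Both claims can be verified by brute force over the four equally likely outcomes:

```python
# Outcomes listed as (X1, X2, X3) with X3 = X1 XOR X2, each with probability 1/4.
omega = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]

def expect(f):
    """Expectation of f over the uniform distribution on omega."""
    return sum(f(w) for w in omega) / len(omega)

e1 = expect(lambda w: w[0])                  # E[X1] = 1/2
e2 = expect(lambda w: w[1])                  # E[X2] = 1/2
e3 = expect(lambda w: w[2])                  # E[X3] = 1/2
e13 = expect(lambda w: w[0] * w[2])          # E[X1 * X3] = 1/4
e123 = expect(lambda w: w[0] * w[1] * w[2])  # E[X1 * X2 * X3] = 0
```

The checks `e13 == e1 * e3` (consistent with independence of $X_1, X_3$) and `e123 != e13 * e2` (ruling out independence of $X_1 X_3$ and $X_2$) both pass.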

Quick Sort

High-level description

QuickSort(array A, length n)
    if n = 1 return
    p = choosePivot(A, n)
    partition A around p
    recursively sort left part
    recursively sort right part

Partitioning around a pivot

Key idea: to partition an array around a pivot element.

Note: put pivot in its “rightful position”.

2 cool facts about partition

  1. Linear $\mathcal{O}(n)$ time, no extra memory.
  2. Reduce problem size.

Using $\mathcal{O}(n)$ extra memory, it is easy to partition around the pivot in $\mathcal{O}(n)$ time.

Assume: pivot = first element of the array. If not, swap pivot as a preprocessing step.

High-level idea:

Partition(A, l, r)
- p := A[l]
- i := l + 1
- for j := l + 1 to r
    - if A[j] < p
        - swap(A[i], A[j])
        - i = i + 1
- swap A[l] and A[i - 1]

Running time: $\mathcal{O}(n)$, where $n = r - l + 1$ is the length of the subarray. Reason: $\mathcal{O}(1)$ work per array entry. Also, clearly works in place.
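The pseudocode above translates almost line for line into Python (a sketch; indices are 0-based and the subarray bounds are inclusive):

```python
def partition(A, l, r):
    """Partition A[l..r] (inclusive) in place around the pivot A[l].

    Returns the pivot's final index; everything before it is smaller,
    everything after it is larger (array entries assumed distinct).
    """
    p = A[l]
    i = l + 1
    for j in range(l + 1, r + 1):
        if A[j] < p:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[l], A[i - 1] = A[i - 1], A[l]  # put the pivot in its rightful position
    return i - 1
```

For example, `partition([3, 8, 2, 5, 1, 4, 7, 6], 0, 7)` leaves the pivot 3 at index 2, with smaller elements before it and larger ones after.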

Quick Sort theorem: for every input array of length $n$, the average running time of Quick Sort (with random pivots) is $\mathcal{O}(n\log(n))$. Notes: “average” is over random choices made by the algorithm (i.e. pivot choices).

Correctness

Claim: the for loop maintains the invariants:

  1. $A_{l + 1}, .., A_{i - 1}$ are all less than the pivot
  2. $A_i, .., A_{j - 1}$ are all greater than pivot.

Consequence: at the end of the loop, positions $l + 1, .., i - 1$ hold the entries less than the pivot and positions $i, .., r$ hold the entries greater than the pivot, so the final swap puts the pivot in its rightful position $i - 1$.

Induction review: let $P(n)$ be an assertion parameterized by positive integers $n$.

For us: $P(n)$ is “Quick Sort correctly sorts every input array of length $n$”.

  1. Base case: every array of size 1 is already sorted.
  2. Inductive step: fix $n \geq 2$. Let $k_1, k_2$ be the lengths of the first and second parts of the partitioned array. Since $k_1, k_2 < n$, by the inductive hypothesis the 1st and 2nd parts are sorted correctly by the recursive calls. The pivot sits between them in its rightful position, so the whole array is sorted correctly.

Choosing a good pivot

Running time of Quick Sort depends on the quality of pivot.

Random pivots

We have no idea in advance which element makes a good pivot, so we choose a pivot uniformly at random in every recursive call, hoping that a random pivot is “pretty good” “often enough”.

Intuition:

  1. if we always got a 25-75 split, that would be good enough for $\mathcal{O}(n\log(n))$ running time.
  2. Half of the elements give a 25-75 split or better.
  3. Does this really work?
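Putting random pivot choice and in-place partitioning together, a self-contained Python sketch of the whole algorithm:

```python
import random

def quicksort(A, l=0, r=None):
    """Sort A[l..r] in place, choosing pivots uniformly at random."""
    if r is None:
        r = len(A) - 1
    if l >= r:                       # base case: 0 or 1 elements
        return
    p = random.randint(l, r)         # uniformly random pivot index
    A[l], A[p] = A[p], A[l]          # swap pivot to the front
    pivot = A[l]
    i = l + 1
    for j in range(l + 1, r + 1):    # in-place partition around the pivot
        if A[j] < pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[l], A[i - 1] = A[i - 1], A[l]  # pivot ends up at index i - 1
    quicksort(A, l, i - 2)           # recursively sort left part
    quicksort(A, i, r)               # recursively sort right part
```

Whatever pivots get chosen, the output is the sorted array; only the running time is random.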

Analysis - A decomposition principle

Preliminaries: fix input array $A$ of length $n$. Sample space $\Omega$ = all possible outcomes of random choices in Quick Sort (i.e. pivot sequences).

Key random variable: for $\sigma \in \Omega$, let $C(\sigma)$ be the number of comparisons between two input elements made by Quick Sort (given random choices $\sigma$).

Lemma: running time of Quick Sort dominated by comparisons ($\exists c, \forall \sigma \in \Omega, T(\sigma) \leq c\cdot C(\sigma)$).

We can’t apply the Master Method here, since the subproblem sizes are random and possibly unbalanced.

Notation: $z_i$ is the $i^{th}$ smallest element of array $A$.

Example, with

A = [6, 8, 10, 2]

Then $z_1 = 2, z_2 = 6, z_3 = 8, z_4 = 10$. For $\sigma \in \Omega$ and indices $i < j$, let $X_{ij}(\sigma)$ be the number of times $z_i, z_j$ get compared in Quick Sort with pivot sequence $\sigma$.

Question: fix two elements of the input array. How many times can these two elements get compared with each other during the execution of Quick Sort? Answer: 0 or 1. Reason: two elements are compared only when one of them is the pivot, and the pivot is excluded from all future recursive calls.

Thus: each $X_{ij}$ is an “indicator” (i.e. random 0-1) random variable.

So: $C(\sigma)$ is the number of comparisons between input elements, $X_{ij}(\sigma)$ is the number of comparisons between $z_i$ and $z_j$. Thus,

$$ \forall \sigma, C(\sigma) = \sum_{i = 1}^{n-1}\sum_{j = i + 1}^n X_{ij}(\sigma) $$

By linearity of expectation, $\mathbb{E}[C] = \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^{n} \mathbb{E}[X_{ij}]$. Since $\mathbb{E}[X_{ij}] = 0\mathbb{P}[X_{ij} = 0] + 1\mathbb{P}[X_{ij} = 1] = \mathbb{P}[X_{ij} = 1]$

Thus, $\mathbb{E}[C] = \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^{n} \mathbb{P}[z_i, z_j\text{ get compared}]$

A general decomposition principle

  1. Identify the random variable $Y$ that you really care about.
  2. Express $Y$ as a sum of indicator random variables: $Y = \sum_{e = 1}^M X_e$.
  3. Apply linearity of expectation: $\mathbb{E}[Y] = \sum_{e = 1}^M \mathbb{P}[X_e = 1]$.

Analysis - the key insight

Key claim: $\forall i < j, \mathbb{P}[z_i, z_j\text{ get compared}] = \frac{2}{j - i + 1}$

Proof:

Given $z_i, z_j$ with $i < j$, consider the set $\{z_i, z_{i + 1}, .., z_{j - 1}, z_j\}$.

Inductively: as long as none of these are chosen as a pivot, all are passed to the same recursive call. Consider the first among $z_i, z_{i + 1}, .., z_{j - 1}, z_j$ that gets chosen as a pivot.

  1. if $z_i$ or $z_j$ get chosen first, then $z_i$ and $z_j$ get compared.
  2. if one of $z_{i + 1}, .., z_{j - 1}$ get chosen first, then $z_i$ and $z_j$ are never compared (split into different recursive calls).

Note: since pivots always chosen uniformly at random, each of $z_i, z_{i + 1}, .., z_{j - 1}, z_j$ is equally likely to be the first.

Hence,

$$ \mathbb{P}[z_i, z_j\text{ get compared}] = \frac{2}{j - i + 1} $$

Here the denominator $j - i + 1$ is the number of equally likely choices for the first pivot among the set, and exactly 2 of them ($z_i$ and $z_j$ themselves) lead to a comparison.

So,

$$ \mathbb{E}[C] = \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^{n} \frac{2}{j - i + 1} = 2 \sum_{i = 1}^{n - 1}\sum_{j = i + 1}^{n} \frac{1}{j - i + 1} $$

For each fixed $i$, the inner sum is

$$ \sum_{j = i + 1}^n \frac{1}{j - i + 1} = \frac{1}{2} + \frac{1}{3} + .. + \frac{1}{n - i + 1} $$

So:

$$ \mathbb{E}[C] \leq 2n\sum_{k = 2}^n \frac{1}{k} $$

(each of the $n - 1$ inner sums is at most $\sum_{k = 2}^n \frac{1}{k}$, and $n - 1 \leq n$.)

Claim: $\sum_{k = 2}^n \frac{1}{k} \leq \ln(n)$. Proof:

$$ \sum_{k = 2}^n \frac{1}{k} \leq \int_{1}^{n} \frac{1}{x}dx = \ln(x)\Big|^n_1 = \ln(n) $$

Hence, $\mathbb{E}[C] \leq 2n\ln(n) \in \mathcal{O}(n\log(n))$.
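The bound can be checked numerically by evaluating the exact double sum from the key claim (a sketch; $n = 1000$ is an arbitrary choice):

```python
from math import log

n = 1000
# Exact expected number of comparisons: sum of 2 / (j - i + 1) over all pairs i < j.
EC = sum(2 / (j - i + 1) for i in range(1, n) for j in range(i + 1, n + 1))
bound = 2 * n * log(n)  # the 2 n ln(n) upper bound
```

For $n = 1000$, `EC` comes out comfortably below `bound`, as the analysis predicts.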

Quiz

Question 1

Let $0 < \alpha < \frac{1}{2}$ be some constants (independent of the input array length $n$). Recall the Partition subroutine employed by the QuickSort algorithm, as explained in lecture. What is the probability that, with a randomly chosen pivot element, the Partition subroutine produces a split in which the size of the smaller of the two subarrays is $\geq \alpha$ times the size of the original input array?

Answer: $1 - 2\alpha$. The pivot must be one of the elements whose rank lies in the range $[\alpha n, (1 - \alpha)n]$, which contains $(1 - 2\alpha)n$ of the $n$ equally likely choices.

Question 2

Now assume that you achieve the approximately balanced splits above in every recursive call — that is, assume that whenever a recursive call is given an array of length $k$, each of its two recursive calls is passed a subarray with length between $\alpha k$ and $(1 - \alpha)k$ (where $\alpha$ is a fixed constant strictly between 0 and 0.5). How many recursive calls can occur before you hit the base case? Equivalently, which levels of the recursion tree can contain leaves? Express your answer as a range of possible numbers $d$, from the minimum to the maximum number of calls that might be needed.

Answer: $-\frac{\log(n)}{\log(\alpha)} \leq d \leq -\frac{\log(n)}{\log(1 - \alpha)}$

Question 3

Define the recursion depth of QuickSort to be the maximum number of successive recursive calls before it hits the base case — equivalently, the number of the last level of the corresponding recursion tree. Note that the recursion depth is a random variable, which depends on which pivots get chosen. What is the minimum-possible and maximum-possible recursion depth of QuickSort, respectively?

Answer: Minimum: $\Theta(\log(n))$, maximum: $\Theta(n)$

Question 4

Consider a group of $k$ people. Assume that each person’s birthday is drawn uniformly at random from the 365 possibilities. (And ignore leap years). What is the smallest value of $k$ such that the expected number of pairs of distinct people with the same birthday is at least one?

Answer: $28$

Details: for each pair $i \neq j$, consider the indicator random variable

$$ X_{ij} = \begin{cases} 1 &\text{if the } i^{\text{th}} \text{ and } j^{\text{th}} \text{ persons share the same birthday} \\ 0 &\text{otherwise} \end{cases} $$

Hence,

$$ \mathbb{E}[X_{ij}] = 1\cdot\mathbb{P}[X_{ij} = 1] + 0\cdot\left(1 - \mathbb{P}[X_{ij} = 1]\right) = \mathbb{P}[X_{ij} = 1] = \frac{1}{365} $$

Let $X = \sum_{i=1}^{k}\sum_{j=i+1}^{k} X_{ij}$ be the number of pairs sharing a birthday. We need $\mathbb{E}[X] \geq 1$:

$$ \begin{aligned} \mathbb{E}[X] &= \sum_{i=1}^{k}\sum_{j=i+1}^{k}\mathbb{E}[X_{ij}] = \frac{1}{365}\cdot\binom{k}{2} = \frac{k(k-1)}{2\cdot 365} \geq 1 \\ &\Rightarrow k^2 - k - 730 \geq 0 \\ &\Rightarrow k = \left\lceil \frac{1 + \sqrt{1 + 4\cdot 730}}{2} \right\rceil = \lceil 27.52 \rceil = 28 \end{aligned} $$
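The computation reduces to finding the smallest $k$ with $\binom{k}{2}/365 \geq 1$, which a few lines of Python confirm:

```python
from math import comb

def expected_shared_pairs(k, days=365):
    # Each of the C(k, 2) pairs shares a birthday with probability 1/days,
    # so by linearity of expectation the expected number of such pairs is:
    return comb(k, 2) / days

# smallest group size with expected shared pairs >= 1
k = next(k for k in range(2, 400) if expected_shared_pairs(k) >= 1)
```

With $k = 27$ the expectation is about $0.96$, and with $k = 28$ it is about $1.04$, so `k` comes out to 28.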

Question 5

Let $X_1, X_2, X_3$ denote the outcomes of three rolls of a six-sided die (i.e. each $X_i$ is uniformly distributed among $\{1, 2, 3, 4, 5, 6\}$, and by assumption they are independent). Let $Y$ denote the product of $X_1$ and $X_2$, and $Z$ the product of $X_2$ and $X_3$. Which statement is correct?

Answer: $Y$ and $Z$ are not independent, and $\mathbb{E}[Y * Z] \neq \mathbb{E}[Y]*\mathbb{E}[Z]$