25
$\begingroup$

This problem is taken from interviewstreet.com

We are given an array of integers $Y=\{y_1,...,y_n\}$ that represents $n$ line segments such that endpoints of segment $i$ are $(i, 0)$ and $(i, y_i)$. Imagine that from the top of each segment a horizontal ray is shot to the left, and this ray stops when it touches another segment or it hits the y-axis. We construct an array of n integers, $v_1, ..., v_n$, where $v_i$ is equal to length of ray shot from the top of segment $i$. We define $V(y_1, ..., y_n) = v_1 + ... + v_n$.

For example, if we have $Y=[3,2,5,3,3,4,1,2]$, then $[v_1, ..., v_8] = [1,1,3,1,1,3,1,2]$, as shown in the picture below:

enter image description here

For each permutation $p$ of $[1,...,n]$, we can calculate $V(y_{p_1}, ..., y_{p_n})$. If we choose a uniformly random permutation $p$ of $[1,...,n]$, what is the expected value of $V(y_{p_1}, ..., y_{p_n})$?

If we solve this problem using the naive approach it will not be efficient and run practically forever for $n=50$. I believe we can approach this problem by indepdently calculating the expected value of $v_i$ for each stick but I still need to know wether there is another efficient approach for this problem. On what basis can we calculate the expected value for each stick independently?

$\endgroup$
1
  • $\begingroup$ You can use linearity of expectation. This question is probably more appropriate at math.SE $\endgroup$
    – Shitikanth
    Commented Apr 6, 2012 at 14:31

5 Answers 5

25
$\begingroup$

Imagine a different problem: if you had to place $k$ sticks of equal heights in $n$ slots then the expected distance between sticks (and the expected distance between the first stick and a notional slot $0$, and the expected distance between the last stick and a notional slot $n+1$) is $\frac{n+1}{k+1}$ since there are $k+1$ gaps to fit in a length $n+1$.

Returning to this problem, a particular stick is interested in how many sticks (including itself) are as high or higher. If this number is $k$, then the expected gap to its left is also $\frac{n+1}{k+1}$.

So the algorithm is simply to find this value for each stick and add up the expectation. For example, starting with heights of $[3,2,5,3,3,4,1,2]$, the number of sticks with a greater or equal height is $[5,7,1,5,5,2,8,7]$ so the expectation is $\frac96+\frac98+\frac92+\frac96+\frac96+\frac93+\frac99+\frac98 = 15.25$.

This is easy to program: for example a single line in R

V <- function(Y){ (length(Y) + 1) * sum( 1 / (rowSums(outer(Y, Y, "<=")) + 1) ) }

gives the values in the sample output in the original problem

> V(c(1,2,3))
[1] 4.333333
> V(c(3,3,3))
[1] 3
> V(c(2,2,3))
[1] 4
> V(c(10,2,4,4))
[1] 6
> V(c(10,10,10,5,10))
[1] 5.8
> V(c(1,2,3,4,5,6))
[1] 11.15
$\endgroup$
9
  • 1
    $\begingroup$ Very interesting. Can you kindly elaborate a little bit on why the expected distance between sticks is $(n+1)/(k+1)$; as it is not clear (at least to me) how it was computed. Thank you. $\endgroup$ Commented Apr 8, 2012 at 4:52
  • $\begingroup$ In my first case of $k$ equal height sticks, there is a length $n+1$ to be filled with $k+1$ gaps so the average gap comes from dividing one by the other. This is the expected gap (or horizontal ray) before any particular stick (and from the last stick to $n+1$). It transfers to the original question, taking into account sticks which are as high or higher than any particular stick. $\endgroup$
    – Henry
    Commented Apr 8, 2012 at 8:29
  • $\begingroup$ Very nice. This completely subsumes my solution; if all heights are distinct, then $E[V] = \sum_{k=1}^n \frac{n+1}{k+1} = (n+1)(H_{n+1} - 1) = (n+1)H_n - n$. $\endgroup$
    – JeffE
    Commented Apr 10, 2012 at 9:55
  • 2
    $\begingroup$ @Henry: For the k sticks equal height, n slots problem, what was your reasoning for average length = (n + 1) / (k + 1)? If I have k sticks and I want to know the average ray length of one of those sticks in every permutation of those k sticks in n slots, it does in fact equal your result, but I don't understand why. Is there logic or did you deduce it mathematically from doing what I described for 1 stick and n slots, then 2 sticks and n, slots, ... k sticks, n slots, and noticing that it equaled (n + 1) / (k + 1)? You mention adding an n + 1 slot. That seems very counter-intuitive. $\endgroup$
    – Alexandre
    Commented Mar 26, 2013 at 18:13
  • 3
    $\begingroup$ It is a question I have dealt with before. Start with a round table with $n+1$ seats and $k+1$ people and seat them at random. Distances between individuals are obviously i.i.d. with mean $(n+1)/(k+1)$. Now break the table at the $n+1^{\text{th}}$ person, remove that person and their seat, and straighten the table. Now you have the question here with $n$ seats and $k$ people but the same i.i.d. property and same mean. (Spot the rare rhyme for month) $\endgroup$
    – Henry
    Commented Apr 8, 2013 at 1:03
11
$\begingroup$

Henry's solution is both simpler and more general than this one!


$E[V]$ is roughly half the expected number of comparisons performed by randomized quicksort.

Assuming the sticks have distinct heights, we can derive a closed-form solution for $E[Y]$ as follows.

For any indices $i\le j$, let $X_{ij} = 1$ if $Y_j = \max\{Y_i,...,Y_j\}$ and $X_{ij}=0$ otherwise. (If the elements of $Y$ are not distinct, then $X_{ij}=1$ means that $Y_j$ is strictly greater than every element of $\{Y_i, \dots, Y_{j-1}\}$.)

Then for any index $j$, we have $v_j = \sum_{i=1}^j X_{ij}$ (Do you see why?) and therefore $$ V = \sum_{j=1}^n v_j = \sum_{j=1}^n \sum_{i=1}^j X_{ij}. $$

Linearity of expectation immediately implies that $$ E[V] = E\bigg[\sum_{1\le i\le j\le n} X_{ij}\bigg] = \sum_{1\le i\le j\le n} E[X_{ij}]. $$

Because $X_{ij}$ is either $0$ or $1$, we have $E[X_{ij}] = \Pr[X_{ij}=1]$.

Finally—and this is the important bit—because the values in $Y$ are distinct and permuted uniformly, each element of the subset $\{Y_i,...,Y_j\}$ is equally likely to be the largest element in that subset. Thus, $\Pr[X_{ij}=1] = \frac{1}{j-i+1}$. (If the elements of $Y$ are not distinct, we still have $\Pr[X_{ij}=1] \le \frac{1}{j-i+1}$.)

And now we just have some math. $$ \begin{aligned} E[V] &= \sum_{j=1}^n \sum_{i=1}^j E[X_{ij}] & [\text{linearity}]\\ &= \sum_{j=1}^n \sum_{i=1}^j \frac{1}{j-i+1} & [\text{uniformity}]\\ &= \sum_{j=1}^n \sum_{h=1}^j \frac{1}{h} & [h = j-i+1]\\ &= \sum_{h=1}^n \sum_{j=h}^n \frac{1}{h} & [1\le h\le j\le n]\\ &= \sum_{h=1}^n \frac{n-h+1}{h}\\ &= \left((n+1)\sum_{h=1}^n \frac{1}{h}\right) - \left(\sum_{h=1}^n 1\right)\\ &= (n+1)H_n - n \end{aligned} $$ where $H_n$ denotes the $n$th harmonic number.

Now it should be trivial to compute $E[V]$ (up to floating point precision) in $O(n)$ time.

$\endgroup$
2
  • $\begingroup$ Does this assume that the sticks are of distinct height? $\endgroup$
    – Aryabhata
    Commented Apr 7, 2012 at 7:25
  • $\begingroup$ Yes, it does assume distinct heights. (Apparently, I misread the question.) The equivalence with randomized quicksort still stands when there are ties, but not the closed-form solution. $\endgroup$
    – JeffE
    Commented Apr 7, 2012 at 7:45
5
$\begingroup$

As mentioned in the comments, you can use Linearity of Expectation.

Sort the $y$: $y_1 \le y_2 \le \dots \le y_n$.

For each $y_i$ consider the expected value of $v_i = E[v_i]$.

Then $E[\sum_{i=1}^{n} v_i] = \sum_{i=1}^{n} E[v_i]$

One straight-forward and naive way to compute $E[v_i]$ would be first fix a position for $y_i$. Say $j$.

Now compute the probability that at position $j-1$ you have a value $\ge y_i$.

Then the probability that at $j-1$ you have a value $\lt y_i$ and at $j-2$ you have a value $\ge y_i$

and so on which will allow you to compute $E[v_i]$.

You can probably make it faster by actually doing the math and getting a formula (I haven't tried it myself, though).

Hope that helps.

$\endgroup$
3
$\begingroup$

Expanding on the answer of @Aryabhata:

Fix an $i$, and assume the item $y_i$ is at position $j$. The exact value of the height is immaterial, what matters is whether the items are greater than or equal to $y_i$ or not. Therefore consider the set of items $Z^{(i)}$, where $z_k^{(i)}$ is 1 if $y_k\geq y_i$, and $z_k^{(i)}$ is 0 otherwise.

A permutation on the set $Z^{(i)}$ induces an corresponding permutation on the set $Y$. Consider for instance the following permuation of the set $Z^{(i)}$: "01000(1)$\dots$". The item $z_i^{(i)}$ is the one is brackets, at position $j$, and the items denoted by "$\dots$" don't matter.

The value of $v_i$ is then 1 plus the length of the run of consective zeros just to the left of $z_i^{(i)}$. It follows that $\mathbb{E}\left(v_i\right)$ is actually 1 plus the expected length of consecutive zeors, until the first "1" is met, if we pick at most $j-1$ bits from the set $Z^{(i)} \setminus \\{z_i^{(i)} \\} $(without replacement). This is reminiscent of the geometric distribution, except that it would be without replacement (and bounded number of draws). The expectation is to be taken on $j$ as well, as a uniform choice on the set of positions $\{1,\dots,n \}$.

Once this is computed (along these lines), we can follow the lines of @Aryabhata's answer.

$\endgroup$
-2
$\begingroup$

I dont really understand what do you demend, from tags it seems you are looking for an algorithm.

if so, what is the expected time complexity? by saying: "If we solve this problem using the naive approach it will not be efficient and run practically forever for n=50." it seems to me that your naive approach solves it in exponential time.

i do have a O(n^2) algorithm in mind tho.

assume int y[n], v[n] where v[i] initialized with 1; as described in the question
for (i=1;i<n;i++) 
   for ( j=i-1 ; j>=0 && y[j]<y[i] ; j--) v[i]++;
$\endgroup$