
There are $m$ normally distributed, independent random variables $N_1, \ldots, N_m$ with distinct means $\mu_1, \ldots, \mu_m$ and standard deviations $\sigma_1, \ldots, \sigma_m$. We are then given a permutation of the numbers $\{1, \ldots, m\}$. How can we efficiently compute, numerically, the (log) probability of observing the random variables in the same order as this permutation?

An example:

  1. We have four independent random variables $N_1, N_2, N_3, N_4$, all with different means and variances.
  2. We are given the permutation (3, 1, 2, 4).
  3. What's $\Pr(N_3 > N_1 > N_2 > N_4)$?

A closed-form solution is not necessary, but computing the solution using an efficient algorithm with good accuracy is. Also, it's probably necessary to compute a log probability: when the number of variables becomes large, the actual probability will underflow in floating point.

Some starting points, perhaps...

The most direct way to compute this value, using the example above, is evaluating one of the following integrals, which I believe are equivalent:

$$ \int_{-\infty}^\infty \int_{n_4}^\infty \int_{n_2}^\infty \int_{n_1}^\infty p(n_1)p(n_2)p(n_3)p(n_4)\ dn_3 dn_1 dn_2 dn_4 $$

$$ \int_{-\infty}^\infty \int_{-\infty}^{n_3} \int_{-\infty}^{n_1} \int_{-\infty}^{n_2} p(n_1)p(n_2)p(n_3)p(n_4)\ dn_4 dn_2 dn_1 dn_3 $$

Where $p(n_i)$ is the density function of the variable $N_i$. However, when I implemented this numerically, it was inefficient, prone to inaccuracy, and ran into underflow errors as the number of variables grew. If you think you can compute this integral in an acceptable way, please do post your answer!
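For concreteness, here is a minimal sketch of that direct approach using scipy's adaptive quadrature; the means and standard deviations below are hypothetical placeholders. It handles the four-variable example, but it also illustrates the scaling problem: every additional variable adds another nested integral.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Hypothetical parameters for N_1..N_4.
mu = {1: 0.0, 2: -0.5, 3: 1.0, 4: -1.5}
sd = {1: 1.0, 2: 0.8, 3: 1.2, 4: 0.5}

def integrand(n3, n1, n2, n4):
    # Product of the four independent normal densities.
    return (norm.pdf(n3, mu[3], sd[3]) * norm.pdf(n1, mu[1], sd[1]) *
            norm.pdf(n2, mu[2], sd[2]) * norm.pdf(n4, mu[4], sd[4]))

# nquad integrates its first variable innermost; each limit may depend on
# the variables outside it, matching the first integral above.
p, err = integrate.nquad(
    integrand,
    [lambda n1, n2, n4: (n1, np.inf),   # n3 > n1
     lambda n2, n4: (n2, np.inf),       # n1 > n2
     lambda n4: (n4, np.inf),           # n2 > n4
     (-np.inf, np.inf)])                # n4 unconstrained
print(p, err)
```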

From one of the answers below, we observe that it's possible to compute $\Pr(N_3 > N_1 > N_2 > N_4)$ directly by evaluating a multivariate normal CDF of dimension $(m-1)$, or 3 in this case. However, this is still nontrivial (though there may be libraries for it), and will underflow for many variables.

Perhaps we can divide the probability up as follows:

$$\Pr(N_3 > N_1 > N_2 > N_4) = \\ \Pr(N_3 > N_1 \mid N_1 > N_2, N_2 > N_4 )\Pr(N_1 > N_2 \mid N_2 > N_4 )\Pr(N_2 > N_4)$$

Being able to compute each factor directly would make it easy to obtain the log probability simply by adding their logs. We can compute the conditional probabilities separately using the MVN CDF method, which helps when the full product would underflow.

Another observation: the $m!$ possible probabilities corresponding to the different permutations must sum to 1. Perhaps there is a way to compute the probabilities iteratively or using dynamic programming. For example, $(N_2 > N_3)$, an ordering over a pair, has some fixed probability, which is further divided into three values by the three possible places to insert $N_1$ into the ordering, and further divided into four values by the possible places to insert $N_4$. This is semantically equivalent to the conditional probabilities above, but it might be easier to think of it this way.

Any math wizards have suggestions on how to solve this problem? I would greatly appreciate any ideas!

  • I surmise that the variables are independent? – leonbloy, Jan 5, 2013
  • Yes, independent. Updated the question. – Andrew Mao, Jan 6, 2013

4 Answers


To start from your example, note that the unconditional part is easy, since $N_2 - N_4$ is normal with mean $\mu_2 - \mu_4$ and variance $\sigma_2^2 + \sigma_4^2$. Then, for the first conditional probability, you have $P(N_1 - N_2 > 0 \mid N_2 - N_4 > 0)$. Again, these are both normals whose covariance you can compute; with that, you should be able to compute the probability. As for the first term in your product, note that the event is independent of the second conditioning event, so you have a case like the first. I believe this should carry over for larger $m$, and you can throw out most of the conditioning terms. Either way, you still just have normals whose covariances you can compute, and since conditioning among jointly normal variables works out to linear projections, you should be good. I think that answers it.

EDIT: For a slightly clearer explanation, $N_1 - N_2$ and $N_2 - N_4$ can be thought of as $$ \begin{bmatrix}N_1 - N_2\\ N_2 - N_4 \end{bmatrix} = \begin{bmatrix}1 & -1 & 0 & 0 \\ 0 & 1 & 0 & -1 \end{bmatrix}\begin{bmatrix}N_1 \\ N_2 \\ N_3 \\ N_4 \end{bmatrix} $$ Note that we can do this for any number of differences, so computing the covariance matrix is no problem. Then, the conditional distribution of the first term given the rest is given here: https://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions Ah, my good old friend the Schur complement. I forget the proof off the top of my head, though...
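To make the covariance computation concrete, here is a small numpy sketch (the variances are hypothetical): with independent variables, $\Sigma$ is diagonal, and the covariance of the differences is just $A\Sigma A^\intercal$.

```python
import numpy as np

var = np.array([1.0, 0.64, 1.44, 0.25])     # hypothetical Var(N_1..N_4)
A = np.array([[1, -1,  0,  0],              # N_1 - N_2
              [0,  1,  0, -1]])             # N_2 - N_4
cov = A @ np.diag(var) @ A.T                # 2x2 covariance of the differences
print(cov)                                  # [[1.64, -0.64], [-0.64, 0.89]]
```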

EDIT2: Ah, I think I may have been a little sloppy. That formula is for conditioning on the value of a random variable, not on an event. But I think you can still use the same principle, since $P(X>0 \mid Y>0) = \frac{P(X>0, Y>0)}{P(Y>0)}$, which you should be able to get from the joint distribution.
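A sketch of that ratio using scipy's bivariate normal CDF, continuing the hypothetical numbers from the sketch above (with $X = N_1 - N_2$ and $Y = N_2 - N_4$):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mean = np.array([0.5, 1.0])                 # hypothetical (E[X], E[Y])
cov = np.array([[1.64, -0.64],
                [-0.64, 0.89]])             # Cov(X, Y) from the sketch above

# P(X > 0, Y > 0) = P(-X < 0, -Y < 0); negating both variables leaves the
# covariance matrix unchanged, so evaluate the CDF of (-X, -Y) at 0.
p_joint = multivariate_normal(mean=-mean, cov=cov).cdf([0.0, 0.0])
p_y = 1.0 - norm.cdf(0.0, loc=mean[1], scale=np.sqrt(cov[1, 1]))
print(p_joint / p_y)                        # P(X > 0 | Y > 0)
```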

EDIT3: Since I don't yet have enough reputation to comment on leonbloy's concern, I will post it here. In that example, you have gone from a two-dimensional space to a three-dimensional space, so the transformation is rank-deficient and you get a degenerate covariance matrix in XYZ space.

  • Can you flesh out the conditional probability a little more? I'm pretty sure I can't compute it from a normal CDF, or even using a multivariate normal. If it were that simple, I wouldn't have gone to the trouble to post this question :) Sorry if I'm being obtuse, but your answer is a bit hand-wavy to me. – Andrew Mao, Jan 5, 2013
  • Thanks. I can see how you would get this from the multivariate normal CDF now. However, evaluating that is still a numerical multi-dimensional integral. Moreover, we don't need the conditional probabilities anymore: we can directly evaluate $P(N_3 > N_1, N_1 > N_2, N_2 > N_4)$ using the MVN CDF. There is the benefit that a library may exist for this, although it may not solve the underflow problem. I will wait a bit to see if anyone else has a cleverer solution. – Andrew Mao, Jan 6, 2013
  • If you use this to break the problem up into many bivariate normals (since for each term in the expansion of the joint, all but one of the conditioning variables are independent), you should have less of an issue with underflow. – user55269, Jan 6, 2013
  • Unfortunately (for me), the problem can't be broken up into bivariate normals. For other readers: see my other answer and comments. – Andrew Mao, Jan 7, 2013

The answers to this question generated some discussion, and the conclusion is not obvious from the other posts, so I just wanted to write up a clearer result for posterity. Please do comment if you see a mistake.

The first observation is that $\Pr(N_3>N_1>N_2>N_4)$ can be evaluated directly using the multivariate normal CDF. To see this, write $$Y_1 = N_1 - N_3\\ Y_2 = N_2 - N_1\\ Y_3 = N_4 - N_2$$ Then $(Y_1, Y_2, Y_3)$ has a multivariate normal distribution, with mean vector and covariance matrix computable from our original means and variances (see here), and $\Pr(Y_1 < 0, Y_2 < 0, Y_3 < 0)$ gives us the desired answer.

However, computing the multivariate normal CDF is nontrivial, and there is an accuracy tradeoff as the number of variables grows. There are packages in MATLAB and R that can compute it, all based on Alan Genz's original code. I have a StackOverflow question about how this might be done from Java.
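For illustration, a sketch of this computation with scipy's `multivariate_normal.cdf`, which wraps a Genz-style algorithm; the means and variances are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, -0.5, 1.0, -1.5])       # hypothetical means of N_1..N_4
var = np.array([1.0, 0.64, 1.44, 0.25])     # hypothetical variances

A = np.array([[ 1,  0, -1,  0],             # Y_1 = N_1 - N_3
              [-1,  1,  0,  0],             # Y_2 = N_2 - N_1
              [ 0, -1,  0,  1]])            # Y_3 = N_4 - N_2
mean = A @ mu
cov = A @ np.diag(var) @ A.T

# Pr(N_3 > N_1 > N_2 > N_4) = Pr(Y_1 < 0, Y_2 < 0, Y_3 < 0)
p = multivariate_normal(mean=mean, cov=cov).cdf(np.zeros(3))
print(p, np.log(p))
```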

The statements below are incorrect, because $P(X\mid Y,Z) = P(X\mid Y)$ iff $X\perp Z \mid Y$, which unfortunately does not hold in this case. See if you can spot the mistake. In general, however, it is much easier to compute the CDF of a bivariate normal than of a general multivariate normal, so you should see if you can break your problem up that way.

However, that's not the best we can do in this case, especially when we require an accurate log probability for a large number of variables. In that case, we turn to the chained conditional probabilities from above:

$$\Pr(N_3 > N_1 > N_2 > N_4) = \\ \Pr(N_3 > N_1 \mid N_1 > N_2, N_2 > N_4 )\Pr(N_1 > N_2 \mid N_2 > N_4 )\Pr(N_2 > N_4)$$

We can compute each of the probabilities in the product separately and simply add their logs. Notice that $\Pr(N_3 > N_1 \mid N_1 > N_2, N_2 > N_4 ) = \Pr(N_3 > N_1 \mid N_1 > N_2)$ because $N_3 - N_1$ is independent of $N_2 - N_4$; furthermore, if there is a large number of terms, we can drop all but the first difference in the conditioning. Then everything boils down to evaluating a bunch of bivariate and univariate normal CDFs:

$$ \Pr(N_3 > N_1 \mid N_1 > N_2) = \frac{\Pr(N_3 > N_1, N_1 > N_2)}{\Pr(N_1 > N_2)}$$

When actually computing this, the denominator of the second term cancels with the last term, so for this example it is just

$$\Pr(N_3 > N_1 > N_2 > N_4) = \frac{\Pr(N_3 > N_1, N_1 > N_2)}{\Pr(N_1 > N_2)} \Pr(N_1 > N_2, N_2 > N_4 )$$

Note the symmetry above. When computing log probabilities, it is easy to add and subtract the pieces as necessary. Thankfully, the bivariate normal CDF is much quicker and more accurate to evaluate than the general multivariate case. There are good libraries in MATLAB and R, and also a great Java package here.

  • "Notice that $\Pr(N_3>N_1\mid N_1>N_2, N_2>N_4)=\Pr(N_3>N_1\mid N_1>N_2)$ because $N_3-N_1$ is independent of $N_2-N_4$." Mmmm, I wonder... if $X$ and $Z$ are independent, can we assume that $P(X\mid Y,Z) = P(X\mid Y)$? – leonbloy, Jan 6, 2013
  • In this case, you can compute that the covariance of $N_3 - N_1$ and $N_2 - N_4$ is zero, and since they are jointly normally distributed, they are independent. However, this is not true in general: en.wikipedia.org/wiki/… – Andrew Mao, Jan 6, 2013
  • We know that $N_3-N_1$ and $N_2-N_4$ are independent; what I question is whether the conditioning preserves that property. I repeat my question above (adding that $X$, $Y$, $Z$ are jointly normal). – leonbloy, Jan 7, 2013
  • As a counterexample: let $A$, $B$ be iid zero-mean normals, and let $X=A+B$, $Z=A-B$, $Y=A$. Then $X$ and $Z$ are independent, hence $P(X\mid Z)=P(X)$. But $P(X\mid Y,Z) \ne P(X\mid Y)$. – leonbloy, Jan 7, 2013
  • Yes, you're absolutely right. $P(X\mid Y,Z) = P(X\mid Y)$ iff $X \perp Z \mid Y$, which is not true in this case either. Proof: $$P(X\mid Y,Z) = P(X,Z\mid Y)/P(Z\mid Y) \stackrel{?}{=} P(X\mid Y)P(Z\mid Y)/P(Z\mid Y) = P(X\mid Y)$$ I'll update the answer. – Andrew Mao, Jan 7, 2013

Depending on the number of variables and the desired accuracy, you might want to consider Monte Carlo integration with importance sampling. You'd need to overrepresent configurations with the desired order, but that alone would probably not be enough: the sampler would wander aimlessly until it happened to hit the right order, so the variance would still be very high. A solution might be to introduce a penalty for each of the $m(m-1)/2$ possible pair inversions with respect to the desired order, and perhaps to set those penalties according to the probabilities of the inversions, which are easy to compute.
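A naive Monte Carlo baseline is easy to sketch (the parameters below are hypothetical); the importance-sampling and penalty scheme described above would replace this plain sampler, which needs an enormous number of draws when the target probability is small.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, -0.5, 1.0, -1.5])       # hypothetical means
sd = np.array([1.0, 0.8, 1.2, 0.5])         # hypothetical std devs
order = [2, 0, 1, 3]                        # 0-based indices for N_3 > N_1 > N_2 > N_4

draws = rng.normal(mu, sd, size=(1_000_000, 4))
ordered = draws[:, order]
# An ordering "hit" means the permuted coordinates are strictly decreasing.
hits = np.all(np.diff(ordered, axis=1) < 0, axis=1)
print(hits.mean())
```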


My solution here builds on the ideas from previous answers, but uses results from Miwa, Hayter, and Kuriki (2003) to carry them through to a complete answer. I use vocabulary terms from Miwa et al. (2003) to facilitate searching for related results.

Our first step is to transform the $m$ normal variables into $m-1$ difference random variables.

$$P(N_3>N_1>N_2>N_4) = P(N_3-N_1>0,N_1-N_2>0,N_2-N_4>0)$$

Any linear combination of independent normal random variables is itself a normal random variable. This entails two relevant facts. First, each of the difference variables is itself normal, with mean the difference of the means of its two constituent variables and variance the sum of their variances. Second, any linear combination of the difference variables is also normally distributed, which means that the random vector below has a multivariate normal distribution:

$$\mathbf{N}=(N_3-N_1,N_1-N_2,N_2-N_4)\sim\mathcal{N}(\boldsymbol\mu,\boldsymbol\Sigma)$$

The mean vector $\boldsymbol\mu$ for $\mathbf{N}$ is as follows:

$$\boldsymbol\mu=(\mu_3-\mu_1,\mu_1-\mu_2,\mu_2-\mu_4)$$

The covariance matrix $\boldsymbol\Sigma$ for $\mathbf{N}$ is as follows:

$$\boldsymbol\Sigma = \left[ \begin{matrix} \sigma_3^2+\sigma_1^2 & -\sigma_1^2 & 0 \\ -\sigma_1^2 & \sigma_1^2+\sigma_2^2 & -\sigma_2^2 \\ 0 & -\sigma_2^2 & \sigma_2^2+\sigma_4^2 \\ \end{matrix} \right] $$

In general, an on-diagonal variance entry is the sum of the constituent variables' variances, while a just-off-diagonal covariance is the negative variance of the shared constituent variable. All other covariances are zero.
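As a sketch, the mean vector and tridiagonal covariance matrix for an arbitrary desired ordering can be assembled directly from that pattern; the helper name `difference_mvn` and the example numbers below are my own.

```python
import numpy as np

def difference_mvn(mu, var, perm):
    """Mean and covariance of the differences N_perm[i] - N_perm[i+1],
    where perm lists 0-based indices from largest to smallest."""
    mu, var = np.asarray(mu), np.asarray(var)
    hi, lo = perm[:-1], perm[1:]            # (larger, smaller) in each pair
    mean = mu[hi] - mu[lo]
    cov = np.diag(var[hi] + var[lo])        # on-diagonal: sum of variances
    shared = var[perm[1:-1]]                # variance of each shared variable
    cov -= np.diag(shared, 1) + np.diag(shared, -1)   # just-off-diagonal
    return mean, cov

# Example: Pr(N_3 > N_1 > N_2 > N_4), with hypothetical parameters.
mean, cov = difference_mvn([0.0, -0.5, 1.0, -1.5],
                           [1.0, 0.64, 1.44, 0.25],
                           perm=[2, 0, 1, 3])
```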

Given $\mathbf{N}$, we are looking for the probability that each coordinate is greater than zero. This is called the (non-centered) orthant probability for $\mathbf{N}$ (non-centered when $\boldsymbol\mu$ isn't all zeros). Calculating orthant probabilities for general multivariate normal distributions is relatively hard, but ours is a special case: the covariance matrix is tridiagonal, with non-zero entries only along the main diagonal and the two adjacent diagonals. This type of orthant probability is called a (non-centered) orthoscheme probability, and it is easier to calculate: it can be done with constant memory, and in time linear in $m$. The method below is taken directly from Miwa et al. (2003).

The first step is to produce a Cholesky decomposition of the covariance matrix. That is, given our covariance matrix $\boldsymbol\Sigma$, we need a lower-triangular matrix $\mathbf{B}$ such that $\boldsymbol\Sigma=\mathbf{B}\mathbf{B}^\intercal$. This can be calculated in a few different ways (described in the Wikipedia page I linked to), although the general algorithms simplify significantly because of all of the zeros. $\mathbf{B}$ is included below for $\boldsymbol\Sigma$ above.

$$\mathbf{B} = \left[ \begin{matrix} \sqrt{\sigma_3^2+\sigma_1^2} & 0 & 0 \\ -\sigma_1^2\sqrt{\frac{1}{\sigma_3^2+\sigma_1^2}} & \sqrt{\frac{\sigma_3^2\sigma_1^2+\sigma_3^2\sigma_2^2+\sigma_2^2\sigma_1^2}{\sigma_3^2+\sigma_1^2}} & 0 \\ 0 & -\sigma_2^2\sqrt{\frac{\sigma_3^2+\sigma_1^2}{\sigma_3^2\sigma_1^2+\sigma_3^2\sigma_2^2+\sigma_2^2\sigma_1^2}} & \sqrt{\frac{\sigma_3^2\sigma_1^2\sigma_2^2+\sigma_3^2\sigma_1^2\sigma_4^2+\sigma_3^2\sigma_2^2\sigma_4^2+\sigma_1^2\sigma_2^2\sigma_4^2}{\sigma_3^2\sigma_1^2+\sigma_3^2\sigma_2^2+\sigma_2^2\sigma_1^2}} \\ \end{matrix} \right] $$

You can find a closed-form expression for these entries, but it's easier to define them recursively. For an on-diagonal entry, $b_{i,i}=\sqrt{\Sigma_{i,i}-b_{i,i-1}^2}$. For an off-diagonal entry, $b_{i+1,i}=\Sigma_{i+1,i}/b_{i,i}$. For the base case, abuse notation and let $b_{1,0}=0$.
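A sketch of that recursion in code (the helper name `bidiagonal_cholesky` is mine); for a tridiagonal covariance, only the diagonal and subdiagonal of $\mathbf{B}$ are non-zero.

```python
import numpy as np

def bidiagonal_cholesky(cov):
    """Diagonal and subdiagonal of the lower-triangular B with cov = B B^T,
    assuming cov is tridiagonal (as it is for consecutive differences)."""
    n = cov.shape[0]
    diag, sub = np.empty(n), np.empty(n - 1)
    b_prev = 0.0                            # base case: b_{1,0} = 0
    for i in range(n):
        diag[i] = np.sqrt(cov[i, i] - b_prev**2)       # b_{i,i}
        if i < n - 1:
            sub[i] = cov[i + 1, i] / diag[i]           # b_{i+1,i}
            b_prev = sub[i]
    return diag, sub
```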

Miwa et al. (2003) then give a way to turn our $(m-1)$-dimensional integral into $(m-1)$ 1-dimensional integrals (my presentation slightly compresses theirs).

Let $f_{m-1}(z)=1$. We then recursively define:

$$f_{i-1}(z) = \int\limits_{(-\mu_i-b_{i,i-1}z)/b_{i,i}}^\infty f_i(t)\phi(t)dt$$

Here, $\mu_i$ is the $i$-th coordinate of $\boldsymbol\mu$ (not the mean of the original $N_i$), $b_{i,j}$ is an entry of $\mathbf{B}$, and $\phi(t)$ is the pdf of the standard normal distribution, $\frac{1}{\sqrt{2\pi}}e^{-\frac{t^2}{2}}$.

If we once again define $b_{1,0}=0$, then the orthant probability is simply $f_0(z)$ for any value of $z$.

You can calculate these integrals with whatever numerical method you'd like. Miwa et al. chop the integrals up into a grid of points and approximate each segment with a piecewise cubic polynomial. You can find their C code inside an R package created by Peter Craig at http://www.maths.dur.ac.uk/~dma0psc/orthants/ (the file is orthants/src/orschm.c), which also builds on their earlier paper, Miwa et al. (2000). I've linked here to my Python implementation using a similar grid but piecewise linear functions (i.e., simple trapezoid-rule integration). My code doesn't generalize beyond ordering normal random variables, and isn't as efficient as other numerical integration algorithms out there, but it could easily be ported to other languages or projects.
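To make the recursion concrete, here is a compressed sketch in that same trapezoid-grid spirit; the grid size and span are illustrative choices, not tuned for accuracy, and the helpers from the earlier sketches are assumed.

```python
import numpy as np
from scipy.stats import norm

def orthoscheme_probability(mean, diag, sub, grid_size=2001, span=8.0):
    """P(Y > 0 componentwise) for Y ~ N(mean, B B^T), with B lower
    bidiagonal (diagonal `diag`, subdiagonal `sub`)."""
    t = np.linspace(-span, span, grid_size)
    f = np.ones_like(t)                          # f_{m-1}(t) = 1
    for i in range(len(mean) - 1, -1, -1):
        g = f * norm.pdf(t)
        # Right-tail integrals G(t_k) = int_{t_k}^{span} f_i(s) phi(s) ds,
        # accumulated with the trapezoid rule from the right.
        seg = 0.5 * (g[1:] + g[:-1]) * np.diff(t)
        right = np.concatenate([np.cumsum(seg[::-1])[::-1], [0.0]])
        b_sub = sub[i - 1] if i > 0 else 0.0     # b_{i,i-1}; 0 in the base case
        lower = (-mean[i] - b_sub * t) / diag[i]
        f = np.interp(lower, t, right)           # f_{i-1} on the same grid
    return f[0]                                  # f_0 is constant in z

# Example, chaining the hypothetical helpers sketched earlier:
# mean, cov = difference_mvn(mu, var, perm)
# diag, sub = bidiagonal_cholesky(cov)
# print(orthoscheme_probability(mean, diag, sub))
```

For a log probability with large $m$, one would renormalize $f$ at each step and accumulate the logs of the normalizing constants instead of returning the raw product.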

Bibliography

Miwa, T., Hayter, A. J., & Kuriki, S. (2003). The evaluation of general non-centred orthant probabilities. Journal of the Royal Statistical Society, Series B: Statistical Methodology, 65(1), 223-234. https://doi.org/10.1111/1467-9868.00382

Miwa, T., Hayter, A. J., & Liu, W. (2000). Calculations of level probabilities for normal random variables with unequal variances with applications to Bartholomew's test in unbalanced one-way models. Computational Statistics & Data Analysis, 34(1), 17-32. https://doi.org/10.1016/S0167-9473(99)00075-4
