
I'm a bit lost on understanding this formula in my bioinformatics text, and I appreciate any tips or advice.

Mutual Information, $\operatorname{MI}(X; Y)$, is: $$ \operatorname{MI}(X; Y) = \sum_x \sum_y p(xy) \log \left( \frac{p(xy)}{p(x) p(y)} \right). $$ If $X$ and $Y$ are independent random variables, then $\operatorname{MI} = 0$.

Original link.

By the way, it's part of this larger picture.

Thank you very much.

  • Many thanks for the edit, I will learn this notation. Commented Sep 19, 2011 at 3:44
  • @all, I don't understand the downvote. It seems like a reasonable question. – Srivatsan Commented Sep 19, 2011 at 3:46

1 Answer


Well, without a little more background on which parts you understand and which parts you don't, it's hard to gauge the level at which to pitch the explanation. However, the situation is this:

We have two discrete random variables $X$ and $Y$, which in this case each represent the nucleotide at a particular position in a DNA sequence, so $X$ and $Y$ take on the values A, C, G, and T.

In the equation above, $p(xy)$ denotes ${\rm Pr}[X = x,\; Y = y]$, i.e. the probability that $X$ takes on the value $x$ and $Y$ takes on the value $y$. $p(x)$ denotes ${\rm Pr}[X = x]$, the probability that $X$ takes on the value $x$ without regard to $Y$, and similarly for $p(y)$. So, in other words, we could write this in more readable form as

$${\rm MI}(X; Y) = \sum_{x \in \{A, C, G, T\}} \sum_{y \in \{A, C, G, T\}} {\rm Pr}[X = x,\; Y = y] \log \left( \frac{{\rm Pr}[X = x, Y = y]}{{\rm Pr}[X = x] {\rm Pr}[Y = y]}\right)$$

If $X$ and $Y$ are independent, then

$${\rm Pr}[X = x,\; Y = y] = {\rm Pr}[X = x] {\rm Pr}[Y = y]$$

so we have

$$\log \left( \frac{{\rm Pr}[X = x, Y = y]}{{\rm Pr}[X = x] {\rm Pr}[Y = y]}\right) = \log 1 = 0$$

for every possible choice of $x$ and $y$, and therefore when we calculate ${\rm MI}(X; Y)$ we're just adding up sixteen zeros, which explains why it says that ${\rm MI}(X; Y) = 0$ when $X$ and $Y$ are independent.
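
For contrast, it may help to plug in a made-up joint distribution at the opposite extreme. Suppose the two positions always match, with each nucleotide equally likely: ${\rm Pr}[X = x,\; Y = y] = 1/4$ when $x = y$ and $0$ otherwise, so ${\rm Pr}[X = x] = {\rm Pr}[Y = y] = 1/4$. The zero-probability terms contribute nothing (by the usual convention $0 \log 0 = 0$), and each of the four matching terms is

$$\frac{1}{4} \log \left( \frac{1/4}{(1/4)(1/4)} \right) = \frac{1}{4} \log 4,$$

so ${\rm MI}(X; Y) = \log 4$, which is $2$ bits if the logarithm is taken base $2$. Perfect dependence gives the largest value a four-letter alphabet allows, just as independence gives $0$.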

If you're still getting used to probabilities and statistics, you're not going to pick something like this up on the first try. Try drawing out a 4x4 grid with the various possibilities, filling it in with different probabilities, and calculating the mutual information to see how it works.
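
If you'd rather let a computer do that bookkeeping, here is a minimal Python sketch of the same exercise. The 4x4 table of joint probabilities is made up purely for illustration (matching nucleotides are slightly favored); swap in any numbers you like, as long as the sixteen entries sum to 1.

    import math

    nucleotides = ["A", "C", "G", "T"]

    # Made-up joint distribution Pr[X = x, Y = y]: first index is x, second is y.
    # The 0.10 entries on the diagonal slightly favor matching nucleotides.
    joint = {
        ("A", "A"): 0.10, ("A", "C"): 0.05, ("A", "G"): 0.05, ("A", "T"): 0.05,
        ("C", "A"): 0.05, ("C", "C"): 0.10, ("C", "G"): 0.05, ("C", "T"): 0.05,
        ("G", "A"): 0.05, ("G", "C"): 0.05, ("G", "G"): 0.10, ("G", "T"): 0.05,
        ("T", "A"): 0.05, ("T", "C"): 0.05, ("T", "G"): 0.05, ("T", "T"): 0.10,
    }

    # Marginals: p(x) = sum over y of p(x, y), and p(y) = sum over x of p(x, y).
    p_x = {x: sum(joint[(x, y)] for y in nucleotides) for x in nucleotides}
    p_y = {y: sum(joint[(x, y)] for x in nucleotides) for y in nucleotides}

    # MI(X; Y) = sum over x, y of p(x, y) * log( p(x, y) / (p(x) p(y)) ).
    mi = 0.0
    for x in nucleotides:
        for y in nucleotides:
            p_xy = joint[(x, y)]
            if p_xy > 0:  # terms with p(x, y) = 0 are taken to be 0
                mi += p_xy * math.log2(p_xy / (p_x[x] * p_y[y]))

    print(f"MI(X; Y) = {mi:.4f} bits")

With the table above the printed value should come out to roughly 0.08 bits, a small positive amount of shared information; if you replace every entry with 0.0625 (the product of the marginals), it drops to exactly 0, matching the independence argument above.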

EDIT: You can try it out at

https://docs.google.com/spreadsheet/ccc?key=0AhBSLKlaRyzedHhHX2d4LXlPR2lmMmVORzg3ZjBleUE&hl=en_US

Just make sure only to edit the yellow cells. You can also check out the Wikipedia article:

http://en.wikipedia.org/wiki/Mutual_information

  • Thank You very much - excellent answer and I am truly indebted. The re-written formula cleared it up for me (A, C, T, G). I'm much closer now and will read the wiki link too. And the spreadsheet is very helpful also. Thank You - you are awesome! Commented Sep 19, 2011 at 4:52
  • No problem -- just take this and go cure cancer and we'll be square. Commented Sep 19, 2011 at 5:00
  • @user3296 Oh, don't worry. We're working on it. – MrGomez Commented Oct 14, 2011 at 0:15
