
Suppose we have a finite probability space $S$, $|S|=n$, with the probability defined as $P(A)=|A|/n$. There is a set of random variables $V=\{A_1,A_2,\cdots\}$ defined on $S$ that are mutually independent. What is the maximum of $|V|$?

For example, if $|S|=1$, then $V$ contains only constants. [EDIT: Constant random variables are independent of everything, so we should omit this trivial case.] If $|S|=2$, then $V$ contains the constants and a variable whose two values are different. What about an $S$ with more elements?

My guess is that it's the number of factors of $n$.
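
For small $n$, the guess can be checked by brute force over families of nontrivial events (equivalently, nonconstant indicator variables). A minimal sketch, assuming Python and the sample space $\{0,\dots,n-1\}$ with the uniform measure:

```python
from itertools import combinations
from math import prod

def max_mutually_independent_events(n):
    """Brute-force the size of the largest family of nontrivial events on
    {0, ..., n-1} (uniform measure) that is mutually independent.
    Exponential in n; only meant for small sample spaces."""
    outcomes = range(n)
    events = [frozenset(c) for r in range(1, n)          # nontrivial: 0 < |A| < n
              for c in combinations(outcomes, r)]

    def independent_with(family, new):
        # Adding `new` keeps mutual independence iff the product rule holds
        # for every sub-collection that contains `new`.
        for r in range(1, len(family) + 1):
            for group in combinations(family, r):
                inter = new.intersection(*group)
                if len(inter) * n ** r != len(new) * prod(len(E) for E in group):
                    return False
        return True

    best = 0

    def extend(family, start):
        nonlocal best
        best = max(best, len(family))
        for i in range(start, len(events)):
            if independent_with(family, events[i]):
                extend(family + [events[i]], i + 1)

    extend([], 0)
    return best

# Small cases: expected to match the number of prime factors of n counted
# with multiplicity (1 for primes, 2 for 4 and 6, 3 for 8).
for n in [2, 3, 4, 5, 6, 8]:
    print(n, max_mutually_independent_events(n))
```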

  • Shouldn't it always be infinite? When $|S|=1$, every random variable is independent of every other...
    – Exodd (May 31 at 13:09)
  • Instead of independent random variables, we could ask for independent events.
    – GEdgar (May 31 at 15:37)
  • @Exodd Are you really claiming you can't have two non-degenerate independent random variables on the same sample space? Aside from the fact that we regularly deal with independent discrete random variables, there's an example earlier in the comments...
    (May 31 at 18:36)
  • I see what I got wrong: if an injective random variable is independent of another, then the second must be constant, my bad.
    – Exodd (May 31 at 18:45)
  • @Exodd The old question is for the case where probabilities can be chosen arbitrarily, whereas the current question focuses on the equally likely case. Furthermore, the answer given there is incomplete, and the derived upper bound cannot be used for the general case, and can be very far from the actual value obtained for the equally likely case.
    – Amir (May 31 at 19:43)

2 Answers


As also stated by @GEdgar, the problem is equivalent to finding the maximum number of mutually independent events, since we can consider an indicator random variable for each event.

Given the prime factorization $n = p_1^{k_1} p_2^{k_2} \cdots p_m^{k_m}$, the answer is $$\color{blue}{R(n)=\sum_{i=1}^mk_i} \tag{1}$$ if $n$ is not a prime number, and $R(n)=0$ otherwise.

Let us write $n = \prod_{j=1}^{R(n)}a_j$, where each $a_j$ is a prime number. Then the events $A_1,\dots, A_{R(n)}$ are chosen so that $$|A_j|=\frac{n}{a_j},\quad |A_jA_k|=\frac{n}{a_ja_k},\quad |A_jA_kA_l|=\frac{n}{a_ja_ka_l},\quad \dots \tag{2}$$

For example, for $n=12=2×2×3$, we have $R(12)=3$, attained for the sets

$$ A_1=\{1,2,3,4,6,7\}$$ $$A_2=\{1,2,3,5,8,9\}$$ $$A_3=\{1,4,5,10\}.$$
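
As a quick check (a sketch in Python; the uniform measure on $\{1,\dots,12\}$ is assumed), every sub-collection of two or more of these sets satisfies the product rule:

```python
from itertools import combinations
from math import prod

n = 12
A = {1: {1, 2, 3, 4, 6, 7},
     2: {1, 2, 3, 5, 8, 9},
     3: {1, 4, 5, 10}}

# Mutual independence: for every sub-collection of size >= 2, the size of the
# intersection times n^(r-1) must equal the product of the individual sizes.
for r in range(2, len(A) + 1):
    for labels in combinations(sorted(A), r):
        inter = set.intersection(*(A[j] for j in labels))
        assert len(inter) * n ** (r - 1) == prod(len(A[j]) for j in labels), labels
print("A1, A2, A3 are mutually independent on {1, ..., 12}")
```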

Point 1: To see how $R(n)$ given in (1) is obtained, it is enough to focus on the probability of the intersection of all the sets, $\cap_{j=1}^{R(n)}A_j$. For the above example,

$$\frac{1}{12}=\frac{1}{2}×\frac{1}{2}×\frac{1}{3}=\frac{6}{12}×\frac{6}{12}×\frac{4}{12},$$

which clearly shows that the maximum number of independent events is $R(12)=3$, attained by sets satisfying (2), which are shown to exist in Points 2 and 3.

Point 2: To construct the events $A_1,\dots, A_R$, we can use a simple down-up method, illustrated here for the given example. From (2), with $a_1=a_2=2$ and $a_3=3$, we need $$|A_1|=|A_2|=6,\quad |A_3|=4,$$ $$|A_1\cap A_2|=3,\quad |A_1\cap A_3|=|A_2\cap A_3|=2,$$ $$|A_1\cap A_2\cap A_3|=1.$$ First put $1$ in $A_1\cap A_2\cap A_3=\{\color{blue}{1}\}$. Then, using new distinct elements, complete $A_1\cap A_2=\{1,\color{blue}{2,3}\}$, $A_1\cap A_3=\{1,\color{blue}{4}\}$, $A_2\cap A_3=\{1,\color{blue}{5}\}$. Finally, add new distinct elements to each set, $A_1=\{1,2,3,4,\color{blue}{6,7}\}$, $A_2=\{1,2,3,5,\color{blue}{8,9}\}$, $A_3=\{1,4,5,\color{blue}{10}\}$, so that $|A_1|=|A_2|=6$ and $|A_3|=4$.
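
Here is a sketch of this down-up filling in Python (my own formulation of the same idea: the region lying in exactly the events indexed by a set $T$ of indices receives $\prod_{j\notin T}(a_j-1)$ fresh elements); for $n=12$ it reproduces the sets above:

```python
from itertools import combinations
from math import prod

def prime_factors(n):
    """Prime factors of n with multiplicity, e.g. 12 -> [2, 2, 3]."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

def build_events(n):
    """Return R(n) sets A_1, ..., A_R on {1, ..., n} whose intersection
    sizes follow (2), filled in from the deepest intersection outwards."""
    a = prime_factors(n)
    idx = range(len(a))
    events = [set() for _ in idx]
    next_elem = 1
    for size in range(len(a), 0, -1):                  # largest intersections first
        for T in combinations(idx, size):
            # size of the region lying in exactly the events indexed by T
            region = prod(a[j] - 1 for j in idx if j not in T)
            fresh = range(next_elem, next_elem + region)
            next_elem += region
            for j in T:
                events[j].update(fresh)
    return events

print(build_events(12))   # recovers A_1, A_2, A_3 from the example above
```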

Point 3: The procedure described in Point 2 always has a feasible solution, since the total number of distinct elements required is less than $n$. Indeed, for $n=\prod_{j=1}^{R(n)}a_j$, the inclusion-exclusion principle gives the required number of elements as

$$\sum_{j} \frac{n}{a_j}-\sum_{j<k} \frac{n}{a_ja_k}+\sum_{j<k<l} \frac{n}{a_ja_ka_l}-\cdots=n\left (1-\prod_{j=1}^{R(n)}\left (1-\frac{1}{a_j} \right) \right)=n-\prod_{j=1}^{R(n)}\left (a_j-1 \right)<n. $$
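
For the running example, $n=12$ and $(a_1,a_2,a_3)=(2,2,3)$, this count is $6+6+4-(3+2+2)+1=10=12-1\cdot 1\cdot 2<12$. A short Python check of the identity (a sketch; the factor list is assumed given):

```python
from itertools import combinations
from math import prod

n, a = 12, [2, 2, 3]                       # n = 2 * 2 * 3, so R(12) = 3

# Inclusion-exclusion count of elements the construction consumes.
needed = 0
for r in range(1, len(a) + 1):
    for T in combinations(a, r):
        term = n // prod(T)
        needed += term if r % 2 == 1 else -term

assert needed == n - prod(x - 1 for x in a)   # 10 == 12 - 1*1*2
assert needed < n
```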

  • The sequence $R(n)$ described here seems to be tabulated over at OEIS as A144494. (If it were $R=1$ for primes instead, the sequence is more well-known: A001222.)
    (Jun 1 at 21:00)
  • @Semiclassical Thank you for the nice point!
    – Amir (Jun 1 at 21:09)

Instead of mutually independent random variables, I will discuss the problem of the largest collection of mutually independent nontrivial events (trivial events have probability 0 or 1). This is an equivalent problem, since you can turn these events into indicator random variables.

This question is fully answered in the paper below:

Eisenberg, B., & Ghosh, B. K. (1987). Independent Events in a Discrete Uniform Probability Space. The American Statistician, 41(1), 52. https://doi.org/10.2307/2684321

Certainly, if $n=p_1^{m_1}\cdots p_k^{m_k}$, where $p_1,\dots,p_k$ are distinct primes, then it is possible to find $m_1+\dots+m_k$ mutually independent events. To construct these, define $P_i=\{1,2,\dots,p_i\}$ for each $i\in \{1,\dots,k\}$, and then define $$ N=\prod_{i=1}^k(P_i)^{m_i}. $$ Note $N$ is a set of size $n$, so we can use $N$ as our sample space. Outcomes in $N$ are vectors $(n_1,n_2,\dots,n_m)$, where $m=\sum_{i=1}^km_i$, and each entry is constrained to lie in $\{1,\dots,p_i\}$, where $p_i$ is the prime corresponding to that position. Then, for each coordinate, we define an event consisting of all tuples where that coordinate is equal to $1$. You can prove these $m$ events are mutually independent.
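
Here is a sketch of this construction in Python (the function name and the brute-force independence check are mine, not from the paper):

```python
from itertools import product, combinations
from math import prod

def product_space_events(prime_powers):
    """Given [(p_1, m_1), ..., (p_k, m_k)], build the sample space N of
    tuples and, for each coordinate, the event "this coordinate equals 1"."""
    coords = [p for p, m in prime_powers for _ in range(m)]   # one prime per coordinate
    space = list(product(*(range(1, p + 1) for p in coords)))
    events = [{omega for omega in space if omega[j] == 1}
              for j in range(len(coords))]
    return space, events

# n = 12 = 2^2 * 3: three events, and every sub-collection obeys the product rule.
space, events = product_space_events([(2, 2), (3, 1)])
n = len(space)
for r in range(2, len(events) + 1):
    for group in combinations(events, r):
        inter = set.intersection(*group)
        assert len(inter) * n ** (r - 1) == prod(len(E) for E in group)
```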


This construction is optimal, because Eisenberg and Ghosh proved that whenever $t$ mutually independent nontrivial events exist, $n$ has at least $t$ prime factors (counted with multiplicity). I will paraphrase their proof.

Suppose there are $t$ mutually independent events $E_1,\dots,E_t$ on a probability space of size $n$, with the uniform measure. Let $p$ be any prime factor of $n$, occurring with multiplicity $m$.

Claim: There are at most $m$ events $E_i$ for which $p^m$ does not divide $|E_i|$.

Proof: Let $$ I=\{i\in \{1,\dots,t\} \mid p^m \text{ does not divide $|E_i|$}\}. $$

By mutual independence, we know that $P(\bigcap_{i\in I} E_i)=\prod_{i\in I}P(E_i)$. Multiplying both sides by $n^{|I|}$, and letting $e=|\bigcap_{i\in I} E_i|$, we get $$ e\cdot n^{|I|-1} = \prod_{i\in I}|E_i|. $$ Since the events are nontrivial, each $P(E_i)>0$, so $e\ge 1$ and both sides are positive integers. Conclude by counting the number of factors of $p$ on both sides. The LHS has at least $m(|I|-1)$ copies of $p$ coming from $n^{|I|-1}$. Since each $|E_i|$ with $i\in I$ has at most $m-1$ copies of $p$, the RHS has at most $(m-1)\cdot |I|$ copies of $p$. This proves $m(|I|-1)\le (m-1)\cdot |I|$, whence $|I|\le m$. $\tag*{$\square$}$
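
For a concrete instance of the counting: take $n=12$, $p=2$, $m=2$, and suppose $|I|=3$ events each had $|E_i|$ not divisible by $4$. Then $$e\cdot 12^{2}=|E_1||E_2||E_3|$$ would have at least four factors of $2$ on the left but at most three on the right, which is impossible; so indeed $|I|\le m=2$.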

Finally, let $n=p_1^{m_1}\cdots p_k^{m_k}$ be the prime factorization of $n$. By the logic above, for each $p_j$, there are at most $m_j$ events in the list $E_1,\dots,E_t$ for which $|E_i|$ is not divisible by $p_j^{m_j}$. Now, assuming that $t>m_1+\dots+m_k$, since at most $m_j$ events fail divisibility by $p_j^{m_j}$ for each $j$, there must be at least one event $E_i$ for which $|E_i|$ is a multiple of every $p_j^{m_j}$, i.e. of $p_1^{m_1},p_2^{m_2},\dots,p_k^{m_k}$. But this means $|E_i|$ is a multiple of $n$, so either $|E_i|=0$ or $|E_i|=n$. This contradicts the assumption that $E_i$ was nontrivial. We conclude that $t\le m_1+\dots+m_k$.
