2
$\begingroup$

Sorry for the confusing title, I’ll elaborate with an example.

You have been assigned group #4, and you are curious how many groups in total there are. Assuming groups are numbered sequentially starting at 1 and group assignment is random, you can assume there are at least 4 groups.

So far, I know that the most probable number of groups is your group number. You are more likely to be in group 4 if there are 4 groups, than if there were 1000. There is some kind of decay model where the larger the guess, the less probable it is, but I can’t figure it out. Whatever the probability model is, the area under it should equal 1.

$\endgroup$
1
  • $\begingroup$ This is known as the German tank problem. Your "Whatever the probability model is, the area under it should equal 1" suggests you want a Bayesian approach, in which case you would need a prior distribution $\endgroup$
    – Henry
    Commented Jun 24, 2022 at 14:55

2 Answers 2

1
$\begingroup$

This question fits into the classical theory of statistical estimators. You have observed information (your group number $N$) and want to find a probability distribution for the total number of groups ($G$). Unfortunately, estimation theory tells us that your problem is subtly underspecified and cannot be solved as written.

Specifically, you haven't told us what your goal is.

For example, suppose you want to choose a probability distribution $\mathrm{dist}{(G)}$ that maximizes the probability that you fall into group $n$. Well, temporarily fix $G=g$. Assuming you have an equal chance of being placed in any given group, the probability that you are in group $n$ is $$\mathbb{P}[{N=n}|{G=g}]=\frac{1_{1\leq n\leq g}}{g}=\begin{cases}\frac{1}{g}&1\leq n\leq g\\0&\text{otherwise}\end{cases}$$ Varying $g$, $$\mathbb{P}[{N=n}]=\sum_{g=n}^{\infty}{\frac{\mathbb{P}[{G=g}]}{g}}$$ To maximize this probability, we should set $\mathbb{P}[{G=g}]=1_{G=n}$: it is most likely that you are in the group of largest index, and giving probability to any other group-count does worse.

Of course, that isn't the sort of decay model you're expecting, because you haven't written down your expectations as prior probabilities; the other answer here explains how to proceed from there.

$\endgroup$
0
$\begingroup$

You need prior probabilities for this to have an answer.

Suppose $T$, the total number of groups, is a discrete random variable taking one of the values $1,2,3,\cdots$ with probabilities $p_1,p_2,p_3,\cdots$ and $G$, the group you're assigned, is a discrete random variable uniformly drawn from the numbers $1,2,\cdots,T$.

By design, we have the conditional probability

$$ \mathbb{P}(G=k\mid T=n)=\begin{cases}1/n & 1\le k\le n \\ 0 & \textrm{otherwise} \end{cases} $$

You want to know the conditional probability $\mathbb{P}(T=n\mid G=4)$, which Bayes rule says is

$$ \mathbb{P}(T=n\mid G=4)=\frac{\mathbb{P}(G=4\mid T=n)\mathbb{P}(T=n)}{\mathbb{P}(G=4)} $$

$$ = \frac{p_n/n}{p_4/4+p_5/5+p_6/6+\cdots}. $$

Under the assumption $p_k$ is nonincreasing with $k$, that means $\mathbb{P}(T=n\mid G=4)$ is decreasing with $n$. But notice this can fail otherwise, for example it is possible for $T$ to have prior distribution with $p_4=0$ and $p_5>0$, which makes $T=5$ more likely than $T=4$ given $G=4$.

Anyway, the conditional expected value is

$$ \mathbb{E}(T\mid G=4)=\sum_{n\ge4} \mathbb{P}(T=n\mid G=4)n=\frac{p_4+p_5+p_6+\cdots}{p_4/4+p_5/5+p_6/6+\cdots} $$

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .