2
$\begingroup$

Suppose that a random sample $X_1, X_2, \ldots$ is drawn from a continuous spectrum of colors, or species, following a Chinese Restaurant Process distribution with parameter $|\alpha|$ (or equivalently $X_1,\ldots,X_n \mid P \sim P$ where $P$ is a Dirichlet process $\operatorname{DP}(\alpha)$ with atomless measure $\alpha$, because the CRP is the unique urn process associated with the DP). Therefore we assume that $X_i$ takes values in some measurable space $(S, \mathcal{S})$; for simplicity take $S = \mathbb{R}^{+}$ and $\mathcal{S}$ equal to the Borel sigma algebra on $\mathbb{R}^{+}$.

Assume that we keep track of the unique species $X_j^*$. We may do so by noting the arrival time of the $j$ th new color. Thus define $T_1=1$ and $$T_j=\min \left\{n: \sum_{i=1}^n D_i=j \: \: \text{for} \: \: D_i=1 \:\: \text{if} \: \: X_i \notin\left\{X_1, \ldots, X_{i-1}\right\}\right\}$$

and setting $X_j^*=X_{T_j}$. A sequence $X_1, \ldots, X_n$ will contain a random number of unique species, usually denoted by $K_n$.

My question is: are there know results to compute: $$ \mathbb{E}\left[\sum_{j=1}^{K_n} T_j \right]$$

The same question is also posted on MathstackExchange: here, but given its difficulty, I posted it also here.

PS: also upper bounds of the above quantity are of interest to me.

PS2: the probability of a "new draw" for the Dirichlet process, i.e. $X_i \notin\left\{X_1, \ldots, X_{i-1}\right\}$, is given by $\frac{|\alpha|}{|\alpha|+i}$. Moreover the distinct values in a random sequence $X_1, X_2, \ldots$ sampled from a distribution generated from a $\mathrm{DP}(\alpha)$-prior with atomless base measure form an i.i.d. sequence from $\bar{\alpha}$.

$\endgroup$
8
  • 1
    $\begingroup$ Crossposted: math.stackexchange.com/questions/4851045/… Generally you should try one of the two sites first, so no duplication of effort in answering occurs $\endgroup$
    – kodlu
    Commented Jan 25 at 15:30
  • $\begingroup$ You should give a more explicit description of the process. When $n \ge i$, given $X_1,\ldots,X_n$, what is the probability that $X_{n+1} = X_i$? $\endgroup$ Commented Jan 25 at 19:39
  • 1
    $\begingroup$ What is left unstated is this: What is the mathematical nature of your sample space? Is it a connected open set of the real numbers? Of some higher-dimensional euclidean space? Of an infinite-dimensional space? I guess we can infer that it is the first option because you take an infimum of several values. But this should not have to be inferred. (Colors and species are entirely irrelevant to the mathematics.) $\endgroup$ Commented Jan 25 at 19:48
  • $\begingroup$ You can assume, as written, that, given a realization from $P \sim \operatorname{DP}(\alpha)$ the samples are i.i.d. realization from $P$, i.e. $X_1,...,X_n | P \sim P$. For further simplicity, you can assume the $X_i$ take values in $R^{+}$. Therefore, the probability of a "new draw", i.e. $X_n \notin \{X_1,...,X_{n-1} \}$, is $\frac{|\alpha|}{|\alpha| + n}$. This is a known property of the predictive distributions of the Dirichlet process. $\endgroup$ Commented Jan 25 at 20:08
  • $\begingroup$ So $\Pr(T_1=1)=1.$ The first one has no randomness in it. $\endgroup$ Commented Jan 25 at 22:25

1 Answer 1

0
$\begingroup$

I think the desired expectation can be written as $${\mathbf{E}\biggl[\sum_{j = 1}^{K_n} T_j \biggr]} = {\mathbf{E}\biggl[\sum_{t = 1}^n t \, \mathbf{1}(\text{new species at time }t)\biggr]} = {\sum_{t = 1}^n t \, \mathbf{P}[\text{new species at time }t].}$$ From here, plugging in the probability on the right gives an exact expression. Approximating the resulting sum by an integral gives about $|\alpha| \, n - |\alpha|^2 \log n$ for large $n$. A little care comparing the sum and integral yields upper and lower bounds.

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.