
So I have a hidden Markov model with two hidden states, $z = a$ and $z = b$. My emission probabilities are given by:

$$ P\left( x_{n} \mid z = a \right) = \frac{\pi_{1}}{\pi_{1} + \pi_{2}} \mathcal{N}(x_{n} | \mu_{1} = 0.75, \sigma^{2}_{1}) + \frac{\pi_{2}}{\pi_{1} + \pi_{2}} \mathcal{N}(x_{n} | \mu_{2} = 10, \sigma^{2}_{2}) $$

$$ P\left( x_{n} \mid z = b \right) = \frac{\pi_{3}}{\pi_{3} + \pi_{4}} \mathcal{N}(x_{n} | \mu_{3} = 0.2, \sigma^{2}_{3}) + \frac{\pi_{4}}{\pi_{3} + \pi_{4}} \mathcal{N}(x_{n} | \mu_{4} = 18, \sigma^{2}_{4}) $$
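For concreteness, each emission density is just a two-component Gaussian mixture that can be evaluated pointwise. In the sketch below, the means come from the formulas above, but the mixture weights and variances are placeholder values (the question fixes only the means), so substitute the known $\pi_i$ and $\sigma^2_i$:

```python
import math

def gauss_pdf(x, mu, var):
    """Density of N(mu, var) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Means are from the question; the weights and variances below are
# placeholders -- substitute the known pi_1..pi_4 and sigma^2_1..sigma^2_4.
WEIGHTS = {"a": (0.5, 0.5), "b": (0.3, 0.7)}   # already normalised
MEANS = {"a": (0.75, 10.0), "b": (0.2, 18.0)}
VARS = {"a": (1.0, 1.0), "b": (1.0, 1.0)}

def emission(x, z):
    """P(x | z): two-component Gaussian mixture for state z in {'a', 'b'}."""
    (w1, w2), (m1, m2), (v1, v2) = WEIGHTS[z], MEANS[z], VARS[z]
    return w1 * gauss_pdf(x, m1, v1) + w2 * gauss_pdf(x, m2, v2)
```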

So my hidden state space is:

$$ \mathcal{Z} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} $$

Suppose that I know my transition probability matrix $\mathbf{A}$ and my initial probabilities $\pi_{z}$. I denote by $\phi$ the parameters of the mixture distributions, which I also know. My question is: how do you actually run the $\alpha$, $\beta$, and $\mu$ recursions?

These recursions are given by: $$ \alpha\left( \mathbf{z}_{1} \right) = p(\mathbf{x}_{1} | \mathbf{z}_{1}, {\phi}) p( \mathbf{z}_{1}| {\pi}_{z}) \\ \alpha\left( \mathbf{z}_{n} \right) = \sum_{\mathbf{z}_{n-1}} p( \mathbf{x}_{n} | \mathbf{z}_{n}, {\phi}) p( \mathbf{z}_{n} | \mathbf{z}_{n-1}, \mathbf{A}) \alpha\left( \mathbf{z}_{n-1} \right) $$

$$ \beta(\mathbf{z}_{N}) = 1 \\ \beta\left( \mathbf{z}_{n} \right) = \sum_{\mathbf{z}_{n+1}} \beta\left( \mathbf{z}_{n+1} \right) p(\mathbf{x}_{n+1} | \mathbf{z}_{n+1}, {\phi}) p(\mathbf{z}_{n+1} | \mathbf{z}_{n}, \mathbf{A}) $$

$$ \mu(\mathbf{z}_{1}) = p(\mathbf{z}_{1} , \mathbf{x}_{1}) = p(\mathbf{x}_{1} | \mathbf{z}_{1}) p(\mathbf{z}_{1}) \\ \mu(\mathbf{z}_{n}) = \max_{\mathbf{z}_{n-1}} \; p(\mathbf{x}_{n}| \mathbf{z}_{n}) \, p(\mathbf{z}_{n}| \mathbf{z}_{n-1}) \, \mu(\mathbf{z}_{n-1}) $$


Question:

How can I compute these if I don't know the $\mathbf{z}_{n}$? I know the possible states $\mathbf{z}_{n}$ could be in, and all the probability distributions in those expressions, but how do I run these calculations without knowing the $\mathbf{z}_{n}$?

  • Maybe this can help: en.m.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm
    – Mick, Jul 31, 2020 at 18:29
  • @Mick So is my notation messed up? I'm not really sure.
    – The Dude, Jul 31, 2020 at 18:57
  • The Baum-Welch algorithm computes the most likely sequence of hidden states that could have produced a given output. So you have your sequence of observed states, and this algorithm gives you the most likely sequence of hidden states. Or are you not after finding these?
    – Mick, Jul 31, 2020 at 19:14
  • @Mick That is exactly what I am after. I think I need to compute these recursions for all possible states $\mathbf{z}_{n}$ can take. So in my case, I will need two lists of $\alpha, \beta, \mu$.
    – The Dude, Jul 31, 2020 at 19:29

1 Answer


How do I compute these if I don't know the $\mathbf z_n$?

You don't need to know the $\mathbf z_n$. This $\alpha(\mathbf z_n)$ thing is a function that maps each possible value of $\mathbf z_n$ to a number.

More precisely, for each possible value of $\mathbf z_n$, $\alpha(\mathbf z_n)$ is defined to be the number $p(\mathbf x_1, \mathbf x_2, \dots , \mathbf x_n, \mathbf z_n)$, which is the probability that we observe values $\mathbf x_1, \dots, \mathbf x_n$ for the visible variable over the first $n$ timesteps and that the latent variable takes the value $\mathbf z_n$ on the $n$th timestep.

So for example, $$ \alpha(\mathbf z_1 = a) = p(\mathbf x_1 | \mathbf z_1 = a) \pi_{ a} \\ \alpha(\mathbf z_1 = b) = p(\mathbf x_1 | \mathbf z_1 = b) \pi_{b} $$ ... and then ... $$ \alpha(\mathbf z_2 = a) = p(\mathbf x_2 | \mathbf z_2 = a)A_{a\to a} \alpha(\mathbf z_1 = a) + p(\mathbf x_2 | \mathbf z_2 = a)A_{b\to a}\alpha(\mathbf z_1 = b) \\ \alpha(\mathbf z_2 = b) = p(\mathbf x_2 | \mathbf z_2 = b) A_{a \to b}\alpha(\mathbf z_1 = a) + p(\mathbf x_2 | \mathbf z_2 = b)A_{b\to b}\alpha(\mathbf z_1 = b) $$ ... and then ... $$ \alpha(\mathbf z_3 = a) = p(\mathbf x_3 | \mathbf z_3 = a)A_{a\to a} \alpha(\mathbf z_2 = a) + p(\mathbf x_3 | \mathbf z_3 = a)A_{b\to a}\alpha(\mathbf z_2 = b) \\ \alpha(\mathbf z_3 = b) = p(\mathbf x_3 | \mathbf z_3 = b) A_{a \to b}\alpha(\mathbf z_2 = a) + p(\mathbf x_3 | \mathbf z_3 = b)A_{b\to b}\alpha(\mathbf z_2 = b) $$ ... and so on.

If you're coding this up as a computer program, you'd compute these iteratively, starting with $\alpha(\mathbf z_1 = a)$ and $\alpha(\mathbf z_1 = b)$, and followed by $\alpha(\mathbf z_2 = a)$ and $\alpha(\mathbf z_2 = b)$, and so on.
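That iterative forward pass can be sketched in a few lines of Python. The transition matrix, initial distribution, and single-Gaussian emission densities below are assumed toy values standing in for the actual known parameters:

```python
import math

STATES = ("a", "b")
# Assumed illustrative parameters -- substitute the known A and pi_z.
A = {("a", "a"): 0.9, ("a", "b"): 0.1, ("b", "a"): 0.2, ("b", "b"): 0.8}
PI_Z = {"a": 0.5, "b": 0.5}

def emission(x, z):
    """Placeholder P(x | z): a unit-variance Gaussian per state."""
    mu = {"a": 0.75, "b": 0.2}[z]
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def forward(xs):
    """Return a list whose entry n-1 is {z: alpha(z_n = z)} for n = 1..N."""
    # Base case: alpha(z_1) = p(x_1 | z_1) * pi_{z_1}
    alphas = [{z: emission(xs[0], z) * PI_Z[z] for z in STATES}]
    for x in xs[1:]:
        prev = alphas[-1]
        # alpha(z_n) = p(x_n | z_n) * sum over z_{n-1} of A * alpha(z_{n-1})
        alphas.append({
            z: emission(x, z) * sum(A[(zp, z)] * prev[zp] for zp in STATES)
            for z in STATES
        })
    return alphas
```

Summing the final entry over the two states gives the likelihood $p(\mathbf x_1, \dots, \mathbf x_N)$; in practice you would renormalise each step or work in log space to avoid numerical underflow.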

The same applies to the $\beta(\mathbf z_n)$'s, which are functions that map each possible value of $\mathbf z_n$ to the number $p(\mathbf x_{n + 1}, \dots, \mathbf x_{N} | \mathbf z_n)$, which is the probability of observing values $\mathbf x_{n + 1}, \dots, \mathbf x_N$ for the visible variable from timestep $(n+1)$ onwards given that the latent variable takes the value $\mathbf z_n$ on the $n$th timestep.

So we have $$ \beta(\mathbf z_N = a) = 1 \\ \beta(\mathbf z_N = b) = 1$$ ... and then ... $$ \beta(\mathbf z_{N-1} = a) = \beta(\mathbf z_N = a) p(\mathbf x_N | \mathbf z_N = a) A_{a \to a} + \beta(\mathbf z_N = b)p(\mathbf x_N | \mathbf z_N = b) A_{a \to b} \\ \beta(\mathbf z_{N-1} = b) = \beta(\mathbf z_N = a) p(\mathbf x_N | \mathbf z_N = a) A_{b \to a} + \beta(\mathbf z_N = b) p(\mathbf x_N | \mathbf z_N = b) A_{b \to b} $$ ... and then ... $$ \beta(\mathbf z_{N-2} = a) = \beta(\mathbf z_{N-1} = a) p(\mathbf x_{N-1} | \mathbf z_{N-1} = a) A_{a \to a} + \beta(\mathbf z_{N-1} = b)p(\mathbf x_{N-1} | \mathbf z_{N-1} = b) A_{a \to b} \\ \beta(\mathbf z_{N-2} = b) = \beta(\mathbf z_{N-1} = a) p(\mathbf x_{N-1} | \mathbf z_{N-1} = a) A_{b \to a} + \beta(\mathbf z_{N-1} = b) p(\mathbf x_{N-1} | \mathbf z_{N-1} = b) A_{b \to b} $$ ... and so on.

This can also be coded up as an iterative calculation, where you first compute $\beta(\mathbf z_N = a)$ and $\beta(\mathbf z_N = b)$, followed by $\beta(\mathbf z_{N-1} = a)$ and $\beta(\mathbf z_{N-1} = b)$, and so on.
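A minimal Python sketch of that backward pass, again with assumed toy parameters in place of the known transition matrix and emission densities:

```python
import math

STATES = ("a", "b")
# Assumed illustrative parameters -- substitute the known values.
A = {("a", "a"): 0.9, ("a", "b"): 0.1, ("b", "a"): 0.2, ("b", "b"): 0.8}

def emission(x, z):
    """Placeholder P(x | z): a unit-variance Gaussian per state."""
    mu = {"a": 0.75, "b": 0.2}[z]
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def backward(xs):
    """Return a list whose entry n-1 is {z: beta(z_n = z)} for n = 1..N."""
    N = len(xs)
    betas = [None] * N
    betas[N - 1] = {z: 1.0 for z in STATES}   # base case: beta(z_N) = 1
    for n in range(N - 2, -1, -1):
        nxt = betas[n + 1]
        # beta(z_n) = sum over z_{n+1} of beta * p(x_{n+1} | z_{n+1}) * A
        betas[n] = {
            z: sum(nxt[zn] * emission(xs[n + 1], zn) * A[(z, zn)]
                   for zn in STATES)
            for z in STATES
        }
    return betas
```

Once you have both passes, the products $\alpha(\mathbf z_n)\beta(\mathbf z_n)$ equal $p(\mathbf x_1, \dots, \mathbf x_N, \mathbf z_n)$, so normalising them over the two states gives the posterior marginal of each $\mathbf z_n$.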

  • The same goes for the $\mu$ recursion, correct? This is what I was just thinking. This is a very clear explanation --- much clearer than my thoughts haha
    – The Dude, Jul 31, 2020 at 20:48
  • Yes, the same goes for $\mu$. Glad it helped!
    – Kenny Wong, Jul 31, 2020 at 20:49
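As the comments note, the $\mu$ recursion runs the same way, except the sum over $\mathbf z_{n-1}$ becomes a max, and you keep backpointers so you can recover the most likely state sequence (the Viterbi path). A minimal sketch, again with assumed toy parameters:

```python
import math

STATES = ("a", "b")
# Assumed illustrative parameters -- substitute the known values.
A = {("a", "a"): 0.9, ("a", "b"): 0.1, ("b", "a"): 0.2, ("b", "b"): 0.8}
PI_Z = {"a": 0.5, "b": 0.5}

def emission(x, z):
    """Placeholder P(x | z): a unit-variance Gaussian per state."""
    mu = {"a": 0.75, "b": 0.2}[z]
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def viterbi(xs):
    """mu(z_n) = p(x_n | z_n) * max over z' of A[z' -> z_n] * mu(z');
    backpointers recover the most likely state path."""
    mus = [{z: emission(xs[0], z) * PI_Z[z] for z in STATES}]
    pointers = []
    for x in xs[1:]:
        prev = mus[-1]
        step, ptr = {}, {}
        for z in STATES:
            best = max(STATES, key=lambda zp: A[(zp, z)] * prev[zp])
            ptr[z] = best
            step[z] = emission(x, z) * A[(best, z)] * prev[best]
        mus.append(step)
        pointers.append(ptr)
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda z: mus[-1][z])]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

With these toy parameters, a run of observations near 0.75 decodes to state $a$ at every step, since that is where state $a$'s placeholder emission mean sits.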

