3
$\begingroup$

I am currently studying this paper In which i am having some problems understanding the purpose of the forward-backwards algorithm.

First of all why even have both forward and backwards?

It seems to me that after one have computed the forward variable: $$\alpha_{t+1}(j) = [\sum_{i=1}^{N}\alpha_t(i)a_{ij}] b_j(o_{t+1}) \qquad, 1 \le t \le T-1 \quad, 1 \le j \le N $$ $$P(O|\lambda) = \sum_{i=1}^N\alpha_T(i) $$

Where the backwards variable is given as such:

$$\beta_t(i) = a_{ij}b_j(O_{t+1})\beta_{t+1}(j), \qquad t = T-1 , T-2 , \ldots , 1$$

Both look pretty similar...

One have the information needed being the probability of the seeing oberservation sequence $O$ in model $\lambda$ the other doesn'.. does the backward even solve the evaluation problem, or is it only the forward which does it.

The usage makes sense in the baum-welch algorithm.. But is that the only purpose?

Another things that I am confused in why all initial probabilities for the backward variable is initialised to be equal 1... Has all the states equal opportunity to be terminating states?

$\endgroup$
1
  • 1
    $\begingroup$ If you notice at the bottom of page 262, it states that: "Strictly speaking, we only need the forward part of the forward-backward procedure to solve Problem 1. We will introduce the backward part of the procedure in this section since it will be used to help solve Problem 3." So you are right, the backward algorithm is not needed, though it can be used instead of the forward algorithm, as far as I know. $\endgroup$
    – DimP
    Commented Apr 27, 2017 at 17:15

1 Answer 1

1
$\begingroup$

Both the forward and backward algorithms will give you the same final value, the full probability of the sequence given the model. It is the scores/probabilities in the dynamic programming arrays that are different. As you point out these values are used in the Baum-Welch re-estimation procedure. They are also used when calculating posterior probabilities. If all you require is the single full probability value you will only need the forward (or backward) algorithm.

To your second question, the values with which to initialize the final column of the dynamic programming array do not necessarily need to be 1's. You can model the transition probabilities to the (silent) end state, which will produce an HMM that can generate sequences of varying length.

$\endgroup$
1
  • $\begingroup$ I am not sure i understood the last part.. Why is it set to be 1, if it can by any arbitrary value? $\endgroup$
    – Bob Burt
    Commented Oct 3, 2017 at 13:56

Not the answer you're looking for? Browse other questions tagged or ask your own question.