I am currently studying this paper, and I am having some trouble understanding the purpose of the forward-backward algorithm.
First of all, why even have both a forward and a backward pass?
It seems to me that the evaluation problem is already solved once one has computed the forward variable: $$\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N}\alpha_t(i)a_{ij}\Big] b_j(O_{t+1}), \qquad 1 \le t \le T-1, \quad 1 \le j \le N $$ $$P(O|\lambda) = \sum_{i=1}^N\alpha_T(i) $$
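To check that I am reading the recursion correctly, here is how I would implement the forward pass (a sketch in NumPy; the matrix layout and variable names are my own):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward pass: alpha[t, i] = P(o_1..o_{t+1}, q_{t+1} = i | lambda).

    A:   (N, N) transition matrix, A[i, j] = a_ij
    B:   (N, M) emission matrix,   B[j, k] = b_j(k)
    pi:  (N,)   initial state distribution
    obs: list of observation indices o_1..o_T (0-based symbols)
    """
    T, N = len(obs), len(A)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]  # initialisation: alpha_1(i) = pi_i b_i(o_1)
    for t in range(T - 1):
        # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(o_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]
    return alpha, alpha[-1].sum()  # P(O | lambda) = sum_i alpha_T(i)
```

So the final sum over $\alpha_T(i)$ alone gives me $P(O|\lambda)$, without ever touching the backward variable.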
Whereas the backward variable is given as:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\,b_j(O_{t+1})\,\beta_{t+1}(j), \qquad t = T-1, T-2, \ldots, 1$$
Both look pretty similar...
One of them yields the quantity I actually need, namely the probability of seeing the observation sequence $O$ under model $\lambda$; the other doesn't seem to. Does the backward pass even solve the evaluation problem, or is it only the forward pass that does?
Its use makes sense in the Baum-Welch algorithm, but is that its only purpose?
Another thing I am confused about is why all the terminal values of the backward variable are initialised to 1, i.e. $\beta_T(i) = 1$ for every state $i$. Does every state have an equal opportunity to be a terminating state?