To answer your question: you can use the smoothing density. But you don't have to. Jarle Tufto's answer has the decomposition that you're using. But there are others.
Using the Kalman Recursions
Here you're evaluating the likelihood as
$$
f(y_1, \ldots, y_n) = f(y_1)\prod_{i=2}^nf(y_i|y_1, \ldots, y_{i-1}).
$$
The Kalman recursions hand you the mean and variance of each of these conditionals. In general, a mean and a variance don't fully define a probability distribution, but in this linear-Gaussian model every conditional is Gaussian, so here they do. The following is the decomposition that you're using to go from filtering distributions $f(x_{i-1}|y_1,\ldots,y_{i-1})$ to conditional likelihoods $f(y_i|y_1,\ldots,y_{i-1})$:
$$
f(y_i|y_1, \ldots, y_{i-1}) = \iint f(y_i|x_i)f(x_i|x_{i-1})f(x_{i-1}|y_1, \ldots, y_{i-1})dx_{i} dx_{i-1} \tag{1}.
$$
Here $f(x_i|x_{i-1})$ is the state transition density and $f(y_i|x_i)$ is the observation density; both are part of the model. In your question you write these as $x_{t+1}=Fx_{t}+v_{t+1}$ and $y_{t}=Hx_{t}+Az_{t}+w_{t}$ respectively. It's the same thing.
Computing the one-step-ahead state prediction distribution means evaluating the inner integral $\int f(x_i|x_{i-1})f(x_{i-1}|y_1, \ldots, y_{i-1}) dx_{i-1}$; integrating the result against $f(y_i|x_i)$ gives you (1) in full. You write that density out completely in your question, and it's the same thing.
Here you're only using decompositions of probability distributions and the model's assumptions, so this likelihood calculation is exact. There is no discretionary choice involved that could make it better or worse.
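To make the recursion concrete, here is a minimal sketch of this exact likelihood evaluation in Python with NumPy/SciPy. It assumes the linear-Gaussian model without the exogenous $Az_t$ term, with $v_t \sim N(0,Q)$, $w_t \sim N(0,R)$, and a Gaussian prior on $x_0$; the function name and parameters are illustrative, not from your question.

```python
import numpy as np
from scipy.stats import multivariate_normal

def kalman_loglik(y, F, H, Q, R, m0, P0):
    """Exact log-likelihood of y_1..y_n via the prediction-error
    decomposition: each f(y_i | y_1..y_{i-1}) is Gaussian.
    m0, P0 are the mean and covariance of the initial state x_0."""
    m, P = m0, P0
    ll = 0.0
    for yt in y:
        # one-step-ahead state prediction: integrates out x_{i-1}
        m_pred = F @ m
        P_pred = F @ P @ F.T + Q
        # conditional likelihood f(y_i | y_1..y_{i-1}): integrates out x_i
        y_mean = H @ m_pred
        S = H @ P_pred @ H.T + R
        ll += multivariate_normal.logpdf(yt, mean=y_mean, cov=S)
        # filtering update: moments of f(x_i | y_1..y_i)
        K = P_pred @ H.T @ np.linalg.inv(S)
        m = m_pred + K @ (yt - y_mean)
        P = P_pred - K @ H @ P_pred
    return ll
```

For a short series you can check this against evaluating the joint Gaussian density of $(y_1,\ldots,y_n)$ directly; the two agree to floating-point precision, which is the point: the filter is just an efficient way to evaluate that exact density.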
Using the EM Algorithm
To my knowledge, there is no other way to evaluate the likelihood directly for this kind of state space model. However, you can still do maximum likelihood estimation by iteratively maximizing a different function: this is the EM algorithm. In the Expectation step (E-step) you compute
$$
\int f(x_1, \ldots, x_n|y_1,\ldots,y_n) \log f(y_1,\ldots,y_n,x_1, \ldots,x_n)\, dx_{1:n} = E_{\text{smooth}}\left[\log f(y_1,\ldots,y_n,x_1, \ldots,x_n)\right].
$$
Here $f(y_1,\ldots,y_n,x_1, \ldots,x_n)$ is the "complete data" likelihood, and you're taking the expectation of the log of that with respect to the joint smoothing density. What often happens is that, because you're taking the log of this complete data likelihood, the terms split up into sums, and because of linearity of the expectation operator, you're taking expectations with respect to the marginal smoothing distributions (the ones you mention in your question).
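To see why the terms split up, write the complete-data log-likelihood using the Markov structure of the model:
$$
\log f(y_1,\ldots,y_n,x_1,\ldots,x_n) = \log f(x_1) + \sum_{i=2}^n \log f(x_i|x_{i-1}) + \sum_{i=1}^n \log f(y_i|x_i).
$$
Taking $E_{\text{smooth}}$ of each term separately then only requires low-dimensional smoothing distributions: the observation terms need the marginals $f(x_i|y_1,\ldots,y_n)$, and the transition terms need at most the pairwise smoothing densities $f(x_{i-1},x_i|y_1,\ldots,y_n)$, never the full joint.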
Other things
I've read in places that EM is a "more stable" way to maximize the likelihood, but I've never seen that point argued carefully, nor the word "stable" defined; I also haven't examined the claim further myself. Neither algorithm gets around the problem of local versus global maxima. I personally tend to use the Kalman recursions more often, mostly out of habit.
It's true that smoothed estimates of the state typically have smaller variance than filtered estimates, so your intuition isn't misplaced, but you're not really using the states here: the likelihood you're trying to maximize is not a function of the states.