I have a question regarding the bias–variance decomposition in The Elements of Statistical Learning. Section 7.2 gives $\operatorname{Err}\left(x_0\right)=$
$$E\left[\left(Y-\hat{f}\left(x_0\right)\right)^2 \mid X=x_0\right]=\sigma_{\varepsilon}^2+\left[\mathrm{E} \hat{f}\left(x_0\right)-f\left(x_0\right)\right]^2+E\left[\hat{f}\left(x_0\right)-\mathrm{E} \hat{f}\left(x_0\right)\right]^2 =\sigma_{\varepsilon}^2+\operatorname{Bias}^2\left(\hat{f}\left(x_0\right)\right)+\operatorname{Var}\left(\hat{f}\left(x_0\right)\right)$$
With $x_0$ held fixed, this can be viewed as a population-level expected prediction error. Later, when the textbook discusses how to choose an optimal tuning parameter analytically (in preparation for $C_p$, AIC, etc.), it talks about estimating this expected prediction error from the training sample, but no bias–variance decomposition is performed there.
I wonder: is the bias–variance decomposition an analysis that is meaningful only at the population level?
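To make concrete what I mean by "population level": the bias and variance at a fixed $x_0$ can be computed by simulation only because the true $f$ and $\sigma_\varepsilon$ are known by construction. A minimal Python sketch (the sine target, cubic fit, and all sample sizes are my own arbitrary choices, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical truth, known only because we chose it for the simulation --
# exactly the information that is unavailable with real data.
def f(x):
    return np.sin(2 * np.pi * x)

sigma_eps = 0.3          # noise standard deviation
n, x0, n_reps = 30, 0.5, 2000

preds = np.empty(n_reps)
for r in range(n_reps):
    # Draw a fresh training sample each replication; x0 stays fixed.
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma_eps, n)
    coef = np.polyfit(x, y, deg=3)       # cubic least-squares fit
    preds[r] = np.polyval(coef, x0)      # prediction at the fixed x0

bias2 = (preds.mean() - f(x0)) ** 2      # squared bias at x0
var = preds.var()                        # variance of f_hat(x0)
err = sigma_eps**2 + bias2 + var         # Err(x0) per the decomposition
print(bias2, var, err)
```

The expectation in the decomposition is over repeated draws of the training sample, which the loop mimics; without knowing $f(x_0)$ the `bias2` line cannot be evaluated.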
In the post Bias-Variance Decomposition in Ridge Linear Regression, the decomposition is discussed in detail. At the end, one can naturally minimize the sum of the squared bias and the variance to obtain an optimal penalty parameter $\lambda$. However, as the author notes, the result depends on a couple of unknown (true) quantities.
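For concreteness, here is a sketch of that ridge calculation under a fixed design, using the standard fixed-design expressions for the bias and variance of $x_0^\top \hat\beta_\lambda$; the true $\beta$ and $\sigma^2$ are set by hand, which is precisely what one cannot do in practice:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical true quantities -- known here only by construction.
p, n, sigma2 = 5, 40, 0.5
beta = rng.normal(0, 1, p)        # true coefficient vector
X = rng.normal(0, 1, (n, p))      # fixed design matrix
x0 = rng.normal(0, 1, p)          # test point

XtX = X.T @ X
I = np.eye(p)

def bias2_var(lam):
    """Squared bias and variance of the ridge prediction x0' beta_hat(lam)."""
    S = np.linalg.solve(XtX + lam * I, XtX)   # ridge shrinkage matrix
    A = np.linalg.solve(XtX + lam * I, x0)    # (X'X + lam I)^{-1} x0
    bias2 = (x0 @ (S - I) @ beta) ** 2        # needs the true beta
    var = sigma2 * A @ XtX @ A                # needs the true sigma^2
    return bias2, var

# Grid search for the lambda minimizing squared bias + variance
lams = np.logspace(-2, 3, 200)
mse = [sum(bias2_var(lam)) for lam in lams]
lam_star = lams[int(np.argmin(mse))]
print(lam_star)
```

At $\lambda=0$ the bias vanishes (OLS is unbiased here) while the variance is largest, and the optimal `lam_star` trades the two off, but only because `beta` and `sigma2` were supplied.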
So far it seems that one can find a feasible optimal $\lambda$ by minimizing an estimate of the expected prediction error, but not by minimizing the bias–variance decomposition directly, since that analysis can only be carried out at the population level. Is that correct?