Hastie and Tibshirani's original approach to fitting generalized additive models was the backfitting algorithm. For a model of the form $$ y = \alpha + \sum_k f_k(x_k) + \epsilon $$

  1. Initialize $\alpha$ and the $f_k$ at reasonable values (e.g., $\alpha = \bar{y}$ and $f_k \equiv 0$).
  2. Loop through the terms, subtracting off the current estimates of all the others, and fit the resulting partial residual against $x_k$ with a scatterplot smoother, such as a local polynomial (see the sketch after this list). At the $m$th iteration: $$ y - \alpha^{(m)} - \sum_{j \neq k} f^{(m)}_j(x_j) \equiv y^{(m)}_k = f_k(x_k) + \epsilon $$
  3. Do this until convergence.
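
Since my question comes down to what one is allowed to plug in as the smoother, here is a minimal sketch of that loop in R, with `loess()` standing in for the scatterplot smoother. The function `backfit` and its arguments are mine, purely for illustration, not from any package:

```r
# Minimal backfitting sketch for y = alpha + sum_k f_k(x_k) + eps,
# using loess() as the scatterplot smoother in step 2.
backfit <- function(X, y, tol = 1e-6, max_iter = 50) {
  n <- nrow(X); p <- ncol(X)
  alpha <- mean(y)                        # step 1: initialize the intercept
  f <- matrix(0, n, p)                    # f[, k] holds f_k(x_k) at the data points
  for (m in seq_len(max_iter)) {
    f_old <- f
    for (k in seq_len(p)) {               # step 2: cycle through the terms
      r_k <- y - alpha - rowSums(f[, -k, drop = FALSE])  # partial residual
      f[, k] <- fitted(loess(r_k ~ X[, k]))              # smooth it against x_k
      f[, k] <- f[, k] - mean(f[, k])     # center each f_k for identifiability
    }
    if (max(abs(f - f_old)) < tol) break  # step 3: stop at convergence
  }
  list(alpha = alpha, f = f)
}

# Illustrative use on simulated data:
set.seed(1)
n  <- 500
x1 <- runif(n); x2 <- runif(n)
y  <- 2 + sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)
fit <- backfit(cbind(x1, x2), y)
```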

This approach seems to have been superseded in most of applied statistics by penalized regression splines, of the sort implemented in the mgcv package in R, but it is still used in nonparametric econometrics.

My question: how general is the backfitting algorithm? Are there restrictions on which smoothers one can use for the $f_k$? For example, if a random forest (recently shown to be consistent and asymptotically normal) were used as the smoother in an additive model with high-dimensional additive terms, should one expect to consistently estimate the terms of the model? Why or why not?
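
To make the random-forest version of the question concrete, the only change I have in mind is swapping out the smoothing step in the sketch above, e.g. using `randomForest()` from the randomForest package as the univariate "smoother" (again just a sketch; whether this has any guarantees is exactly what I am asking):

```r
# Same loop as above, but with the smoother in step 2 swapped out.
# The out-of-bag predictions play the role of the fitted values.
library(randomForest)

rf_smooth <- function(xk, r_k) {
  rf <- randomForest(x = data.frame(xk = xk), y = r_k)
  predict(rf)   # no newdata => out-of-bag predictions at the data points
}

# Inside the inner loop of backfit(), replace the loess() line with:
#   f[, k] <- rf_smooth(X[, k], r_k)
```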

And even if such an estimator were consistent, what could be said about its efficiency?
