My question is very similar to this one and this one, but they haven't been answered.
Let $f \in C^2(\mathbb{R}^d, \mathbb{R})$ have compact sublevel sets and isolated critical points, and consider the gradient descent update $$ x_{k+1} = x_k-\alpha\nabla f(x_k) $$ for some fixed initial point $x_0$ and learning rate $\alpha$. If $\nabla f$ is globally $L$-Lipschitz, it is known that $x_k$ converges to a critical point of $f$ for any $0 < \alpha < 2/L$. Now suppose we drop the global Lipschitz assumption. The sublevel set $U_0 = \{ x : f(x) \leq f(x_0) \}$ is compact and $\nabla f \in C^1$, so we can define $L = \sup_{x \in U_0} \lVert \nabla^2 f(x) \rVert < \infty$ (in the spectral norm).
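To make the setup concrete, here is a numerical estimate of this $L$ on a toy problem; the choice $f(x, y) = x^4 + y^4$, the grid resolution, and the helper names are my own illustration, not part of the question. The gradient of this $f$ is not globally Lipschitz, yet $L$ is finite on every sublevel set:

```python
import numpy as np

# Toy example (my choice): f(x, y) = x^4 + y^4. Its Hessian
# diag(12x^2, 12y^2) is unbounded on R^2, so the gradient is not
# globally Lipschitz -- but it is bounded on any compact sublevel set.
def f(x):
    return np.sum(x**4, axis=-1)

def hess_spectral_norm(x):
    # ||diag(12 x_i^2)||_2 = 12 * max_i x_i^2
    return 12.0 * np.max(x**2, axis=-1)

def estimate_L(x0, n=401):
    """Grid estimate of L = sup over U0 of the Hessian spectral norm."""
    c = f(x0)
    r = c**0.25  # U0 = {f <= c} sits inside the box |x_i| <= c^(1/4)
    g = np.linspace(-r, r, n)
    X, Y = np.meshgrid(g, g)
    pts = np.stack([X.ravel(), Y.ravel()], axis=1)
    in_U0 = f(pts) <= c                       # keep only grid points in U0
    return float(np.max(hess_spectral_norm(pts[in_U0])))

x0 = np.array([1.0, 0.5])
L_est = estimate_L(x0)
L_exact = 12.0 * np.sqrt(f(x0))  # sup is attained at (c^(1/4), 0)
print(L_est, L_exact)
```

For this separable $f$ the supremum can be computed in closed form ($12\sqrt{f(x_0)}$), which makes the grid estimate easy to validate.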
I would like to prove (or disprove) that $x_k \in U_0$ for all $k$ whenever $0 < \alpha < 2/L$. This would imply that $x_k$ converges to a critical point, since $\nabla f$ is $L$-Lipschitz on $U_0$. The idea would be to prove $f(x_{k+1}) \leq f(x_k)$ and conclude by induction, using the Taylor expansion \begin{align*} f(x_{k+1}) &= f(x_k-\alpha \nabla f(x_k)) \\ &= f(x_k) - \alpha \lVert \nabla f(x_k) \rVert^2 + \frac{\alpha^2}{2}\nabla f(x_k)^T\nabla^2 f(x_k-t\alpha\nabla f(x_k))\nabla f(x_k) \end{align*} for some $t \in (0, 1)$. Now if we assume $x_k-t\alpha\nabla f(x_k) \in U_0$, we can conclude $$f(x_{k+1}) \leq f(x_k) - \alpha \lVert \nabla f(x_k) \rVert^2\left(1-\frac{\alpha L}{2}\right) \leq f(x_k)$$ for $\alpha < 2/L$, but this assumption is (almost) circular... Any ideas?
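As a quick sanity check (an experiment, not a proof), here is the iteration on the one-dimensional example $f(x) = x^4$, where $U_0 = [-x_0, x_0]$ and $L = \sup_{U_0} f'' = 12 x_0^2$ can be computed by hand; the particular $x_0$, step size, and iteration count are my own choices:

```python
# 1D sanity check with f(x) = x^4, whose second derivative 12 x^2 is
# unbounded globally but bounded by L = 12 x0^2 on U0 = [-x0, x0].
def f(x):
    return x**4

def grad(x):
    return 4 * x**3

x0 = 1.5
L = 12 * x0**2            # sup of f'' over U0
alpha = 0.95 * (2 / L)    # step size just under the 2/L threshold

x = x0
history = [x]
for _ in range(200):
    x = x - alpha * grad(x)
    history.append(x)

# Did every iterate stay in the initial sublevel set, and did f decrease?
stayed_in_U0 = all(f(xk) <= f(x0) + 1e-12 for xk in history)
monotone = all(f(history[k + 1]) <= f(history[k]) + 1e-12
               for k in range(len(history) - 1))
print(stayed_in_U0, monotone, history[-1])
```

In this one example both properties hold and the iterates creep toward the critical point at $0$ (slowly, since $f$ is flat there), but of course a single trajectory says nothing about the general claim.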