2
$\begingroup$

I am trying to show the matrix $(I - \gamma P^\pi)$ is always invertible, where P is a stochastic matrix (i.e. $P_{ij} \geq 0$, sum of the rows equal 1) and $\gamma \in [0,1)$. I found two sources that prove this in different way but I can't really understand either.

  1. From (https://ai.stanford.edu/~gwthomas/notes/mdps.pdf, page 10)

$$\begin{align*}||(I - \gamma P^\pi)x||_\infty &= ||x - \gamma P^\pi x||_\infty \\ &\geq ||x||_\infty - \gamma||P^\pi x||_\infty \\ &\geq ||x||_\infty - \gamma||x||_\infty \\ &> 0\end{align*} $$

I don't understand how the second inequality comes from. I guess it is true if $||Ax|| \leq ||A||\cdot||x||$ holds in general (even for infinity norm), since $||P||_\infty = 1$.

  1. From (http://researchers.lille.inria.fr/~lazaric/Webpage/MVA-RL_Course14_files/notes-lecture-02.pdf, page 17)

I am able to show that $P^\pi$ has all eigenvalues $\leq$ 1 and $(I - \gamma P^\pi)$ has eigenvalues $\geq 1 - \gamma$, then it's a PD matrix and thus invertible. However, what happens if P has some nonreal eigenvalues? I think it doesn't make sense to say it's $\leq$ 1? Does this proof handle that too?

$\endgroup$
3
  • 1
    $\begingroup$ A matrix is invertible iff $0$ is not one of its eigenvalue. (Since the determinant of the matrix is the product of eigenvalues). Hence it suffices to check the real eigenvalues. We don't care about the complex eigenvalues -- they can't make the determinant zero. $\endgroup$ Commented Jan 7, 2021 at 17:20
  • $\begingroup$ what does the superscript $\pi$ mean? $\endgroup$ Commented Jan 7, 2021 at 22:20
  • $\begingroup$ Thanks Johnri, thats clear! The superscript is irrelevant in this context, it means the policy in the markov decision process. Sorry for the confusion. $\endgroup$ Commented Jan 8, 2021 at 3:10

1 Answer 1

1
$\begingroup$

\begin{align*}\|(I - \gamma P^\pi)x\|_\infty &= \|x - \gamma P^\pi x\|_\infty \end{align*} Using the triangular inequality of the norm we have that $\| a\|=\| a-b+b\| \leq \| a-b\|+\|b\|$, equivalently, we have $ \| a-b\| \geq \| a\|-\|b\|$. Due to triangular inequality, absolute homogeneity, and $\gamma>0$ we have that \begin{align*} \|x - \gamma P^\pi x\|_\infty &\geq \|x\|_\infty - \gamma\|P^\pi x\|_\infty \end{align*}

Now we focus on $||P^\pi x||_\infty$. Since $P^\pi$ is a stochastic matrix it is nonnegative and its columns add up to 1, \begin{align*} \|P^\pi x\|_\infty =& \max_i \left|\sum_j (P^\pi)_{i,j} x_j \right| \\ \min_i x_i \leq & \sum_j (P^\pi)_{i,j} x_j \leq \max_i x_i \\ \|P^\pi x\|_\infty \leq& \max\{\max_i x_i,-\min_i x_i\} = \max_i |x_i| = \|x\|_\infty \end{align*}

Going back to the original problem,

\begin{align*} \|x - \gamma P^\pi x\|_\infty &\geq \|x\|_\infty - \gamma\|x\|_\infty > 0\end{align*}

showing that $(I - \gamma P^\pi)$ is invertible because for any vector $x$ different than zero $(I - \gamma P^\pi)x$ is different than zero, stopping it from having any zero eigenvalue.

$\endgroup$

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .