$\begingroup$

Here is an algorithm:

input matrix M

(in-place)

divide each row of M by its norm

divide each column of M by its norm

repeat

What will M look like after this has been repeated many times? Can we say anything about what this process does?
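For concreteness, one round of the process (row normalization followed by column normalization, both by the Euclidean norm) might look like this in Python. This is a minimal sketch; it assumes no row or column is ever identically zero:

```python
import math

def normalize_rows(M):
    """Divide each row of M by its Euclidean norm (rows assumed nonzero)."""
    return [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in M]

def normalize_cols(M):
    """Divide each column of M by its Euclidean norm (columns assumed nonzero)."""
    norms = [math.sqrt(sum(v * v for v in col)) for col in zip(*M)]
    return [[x / n for x, n in zip(row, norms)] for row in M]

def iterate(M, steps):
    """Apply row normalization then column normalization, `steps` times."""
    for _ in range(steps):
        M = normalize_cols(normalize_rows(M))
    return M
```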

$\endgroup$
  • $\begingroup$ A square $n\times n$ matrix with all entries $1$ will stabilize after the first row normalization to the square matrix with all entries $\frac{1}{\sqrt{n}}$. $\endgroup$ Commented Apr 14, 2016 at 2:27
  • $\begingroup$ @JustinBenfield That is not unit in the Euclidean norm. Did you mean $n^{-1/2} $? $\endgroup$
    – Ian
    Commented Apr 14, 2016 at 2:29
  • $\begingroup$ @Ian Unless I am mistaken, the OP intends for each row to be normalized by its norm (as a vector). Edit: nvm, I see the issue, fixed. $\endgroup$ Commented Apr 14, 2016 at 2:31
  • $\begingroup$ The wording is ambiguous. "its" could also refer to the matrix and its norm. $\endgroup$
    – mvw
    Commented Apr 14, 2016 at 2:34
  • $\begingroup$ You could square each entry; then divide each row and column by its sum instead. $\endgroup$
    – Empy2
    Commented Apr 14, 2016 at 3:15

5 Answers

$\begingroup$

Here is a partial answer:

  1. The sign of each entry is preserved under normalisation, so we may assume without loss of generality that $M$ is entrywise nonnegative.
  2. If $A_k$ denotes the entrywise square of the $k$-th iterate $M_k$, then the iterates $A_k$ are normalised by row sums and column sums rather than by the norms of the rows and columns. I think it is easier to work with $\{A_k\}$ than with $\{M_k\}$.
  3. Thus $A_k$ is row stochastic when $k$ is odd and column stochastic when $k\ge2$ is even. Without loss of generality, assume that $A_0$ (the entrywise square of the initial $M$) is also column stochastic.
  4. Clearly, $A$ is a fixed point if and only if it is doubly stochastic. (In other words, in the original setting, $M$ is a fixed point if and only if it is the Hadamard product of a $\{-1,1\}$-valued sign matrix and an entrywise square root of a doubly stochastic matrix.) Also, as the set of all row or column stochastic matrices is bounded, each of $\{A_k\}_{k\ge0},\ \{A_{2k}\}_{k\ge0}$ and $\{A_{2k+1}\}_{k\ge0}$ always has a convergent subsequence.
  5. This iterative procedure of alternating normalisation by row sums and column sums is known as the Sinkhorn–Knopp algorithm, which is used to determine the "$DAD$ form" in Sinkhorn's theorem. It is known that $\{A_k\}_{k\ge0}$ converges if and only if $A_0$ contains a positive diagonal, i.e. iff for some permutation matrix $P$, all entries of $A_0$ at the positions of the ones in $P$ are positive. See

    R. Sinkhorn and P. Knopp (1967), Concerning Nonnegative Matrices and Doubly Stochastic Matrices, Pacific J. Math., 21(2):343-348.

  6. So, $\{A_k\}_{k\ge0}$ may not converge. In fact, cycles can exist. Example: $$ \pmatrix{0&\frac12&0\\ 1&0&1\\ 0&\frac12&0} \rightarrow\pmatrix{0&1&0\\ \frac12&0&\frac12\\ 0&1&0} \rightarrow\pmatrix{0&\frac12&0\\ 1&0&1\\ 0&\frac12&0} $$ (To get an example for $\{M_k\}_{k\ge0}$, just take the entrywise square roots of the above matrices.)

  7. We may also modify the above example to obtain a divergent $\{A_k\}_{k\ge0}$ that contains no cycle. First, consider the following sequence: $$ A_0'=\pmatrix{1&\frac12\\ 0&\frac12} \rightarrow A_1'=\pmatrix{\frac23&\frac13\\ 0&1} \rightarrow A_2'=\pmatrix{1&\frac14\\ 0&\frac34} \rightarrow\cdots . $$ In other words, suppose $$ A_{2k}'=\pmatrix{1&\frac1{2k+2}\\ 0&\frac{2k+1}{2k+2}}, \ A_{2k+1}'=\pmatrix{\frac{2k+2}{2k+3}&\frac1{2k+3}\\ 0&1} $$ for each $k$. Clearly, $A_k'$ converges to $I$, but it never reaches a fixed point or enters any cycle. Now, consider the sequence defined by $$ A_{2k}=\pmatrix{0&\frac12A_{2k}'&0\\ A_{2k}'&0&A_{2k}'\\ 0&\frac12A_{2k}'&0}, \ A_{2k+1}=\pmatrix{0&A_{2k+1}'&0\\ \frac12A_{2k+1}'&0&\frac12A_{2k+1}'\\ 0&A_{2k+1}'&0}. $$ Then $\{A_k\}_{k\ge0}$ is divergent and it contains no cycles.
  8. However, I am not sure whether the odd- and even-indexed subsequences are always convergent. In all of the above examples, they are. The details in the aforementioned paper, or in other papers on the Sinkhorn–Knopp algorithm, may be helpful in this regard.
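The 2-cycle in point 6 is easy to verify numerically. Below is a small Python sketch of the sum-normalized iteration from point 2 (function names are mine): the cycle matrix alternates between the two forms, while a positive starting matrix converges to a doubly stochastic limit.

```python
def row_normalize(A):
    """Divide each row by its sum (rows assumed to have nonzero sum)."""
    return [[x / sum(row) for x in row] for row in A]

def col_normalize(A):
    """Divide each column by its sum (columns assumed to have nonzero sum)."""
    sums = [sum(col) for col in zip(*A)]
    return [[x / s for x, s in zip(row, sums)] for row in A]

# The 2-cycle from point 6: row normalization followed by
# column normalization maps A0 back to itself.
A0 = [[0.0, 0.5, 0.0],
      [1.0, 0.0, 1.0],
      [0.0, 0.5, 0.0]]
A1 = row_normalize(A0)
A2 = col_normalize(A1)
```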
$\endgroup$
  • $\begingroup$ A is a fixed point of both "halves" of the operation iff it's doubly stochastic, but wasn't the original question about the combined operation? It's not immediately obvious (to me, at least) that there couldn't be a row-stochastic A and a (different) column-stochastic B such that normalizing the columns of A gives B and normalizing the rows of B gives A. $\endgroup$ Commented Apr 14, 2016 at 11:29
  • $\begingroup$ Why are 2-cycles impossible? $\endgroup$ Commented Apr 14, 2016 at 11:40
  • $\begingroup$ @GarethMcCaughan I was wrong. 2-cycles are possible. See the example in my new edit. $\endgroup$
    – user1551
    Commented Apr 14, 2016 at 12:33
  • $\begingroup$ Are you interpreting "its" as the [row/column]'s or as M's? $\endgroup$
    – user57159
    Commented Apr 14, 2016 at 18:30
  • $\begingroup$ @RickyDemer I took the "its" in the OP to mean "the column's" or "the row's", otherwise we are just rescaling the whole matrix in every iteration. $\endgroup$
    – user1551
    Commented Apr 14, 2016 at 18:45
$\begingroup$

I will complement mvw's analytical answer with a numerical one.

First, I implemented this row and column normalization in Mathematica, and applied it to randomly generated $4\times 4$ matrices with entries in $[-1,1]$. The algorithm converges, it seems, quite quickly. (Note that while this isn't a proof of convergence, all the terms in the sequence except possibly the first lie in the bounded set of matrices with entries in $[-1,1]$, so there exists a convergent subsequence). The mean of the matrix norm of the difference between the 10th iteration and the 100th iteration was less than $0.001$. After many iterations, the matrices were, as you would expect, matrices with nearly normalized rows and columns.
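The experiment above was done in Mathematica; an equivalent sketch in Python (the seed and iteration counts here are arbitrary choices of mine, not the original experiment):

```python
import math
import random

def step(M):
    """Normalize each row, then each column, by its Euclidean norm."""
    M = [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in M]
    norms = [math.sqrt(sum(v * v for v in col)) for col in zip(*M)]
    return [[x / n for x, n in zip(row, norms)] for row in M]

random.seed(1)
M = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
iterates = [M]
for _ in range(100):
    iterates.append(step(iterates[-1]))

# Frobenius norm of the difference between the 10th and 100th iterates
diff = math.sqrt(sum((a - b) ** 2
                     for ra, rb in zip(iterates[10], iterates[100])
                     for a, b in zip(ra, rb)))
```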

Matrices with exactly normalized rows and columns are, obviously, fixed points of your algorithm. Let us call these unit row-column matrices. An interesting question is: are there any fixed points of your algorithm that are not unit row-column matrices? I expect that if there are any such fixed points, they are unstable.

For $2\times 2$ matrices, we can pretty easily see what all the unit row-column matrices are. Given the upper left hand element of the matrix, we may make exactly two choices for the upper right element, and exactly two choices for the lower left element. Finally, there are two choices for the lower right element as well. Therefore, we have eight subclasses of matrix: $$\left(\begin{matrix} \cos t & \sin t\\ \sin t & \cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & \sin t\\ \sin t & -\cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & \sin t\\ -\sin t & \cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & \sin t\\ -\sin t & -\cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & -\sin t\\ \sin t & \cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & -\sin t\\ \sin t & -\cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & -\sin t\\ -\sin t & \cos t\end{matrix}\right),$$ $$\left(\begin{matrix} \cos t & -\sin t\\ -\sin t & -\cos t\end{matrix}\right).$$
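A quick numerical check (my own, not part of the original answer) that all eight families have unit rows and columns for any value of $t$:

```python
import math

t = 0.3  # an arbitrary angle; any value of t works
# The eight sign choices: top-right sin, bottom-left sin, bottom-right cos.
families = [
    [[math.cos(t), s1 * math.sin(t)],
     [s2 * math.sin(t), s3 * math.cos(t)]]
    for s1 in (1, -1) for s2 in (1, -1) for s3 in (1, -1)
]

def unit_rows_and_cols(M, tol=1e-12):
    """True if every row and column of M has Euclidean norm 1 (within tol)."""
    rows_ok = all(abs(sum(x * x for x in row) - 1) < tol for row in M)
    cols_ok = all(abs(sum(x * x for x in col) - 1) < tol for col in zip(*M))
    return rows_ok and cols_ok
```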

I have created the following interesting diagram:

[Image: the unit circle with protruding "hairs" traced by the top rows of random $2\times 2$ matrices as they converge.]

Above you see the unit circle with protruding "hairs." The end of the hair not next to the unit circle is the top row of a $2\times 2$ random matrix. The hair is the path taken by the top row of the matrix as it converges to the unit circle. As you can see, the convergence is sensitive to initial conditions.

$\endgroup$
$\begingroup$

You have $$ F(A) = C(R(A)) $$ with $$ R(A) = \operatorname{diag}\!\left(\frac{1}{\lVert e_1^T A\rVert},\ldots,\frac{1}{\lVert e_n^T A\rVert}\right) A, \qquad C(A) = A \operatorname{diag}\!\left(\frac{1}{\lVert A e_1\rVert},\ldots,\frac{1}{\lVert A e_n\rVert}\right), $$ where $e_i^T A$ is the $i$-th row and $A e_i$ is the $i$-th column. Both are non-linear operations.

In practice this defines an iteration $$ A_n = F^n(A) $$ which has, at least, the fixed point $A = I$.

$\endgroup$
  • $\begingroup$ Any square root of the identity is also fixed. $\endgroup$
    – Ian
    Commented Apr 14, 2016 at 2:28
  • $\begingroup$ but.. This does not answer the question, does it? $\endgroup$
    – Ant
    Commented Apr 14, 2016 at 9:40
$\begingroup$

The algorithm you proposed is essentially the same as Ruiz's matrix equilibration scheme (see http://www.numerical.rl.ac.uk/reports/drRAL2001034.pdf).

In numerical linear algebra, it is often beneficial to scale a linear system so that every row and column of the matrix has norm 1 before solving, as this tends to make the solve more numerically stable. Ruiz's equilibration normalizes the max-norm of every row and column to 1, and he shows that the scheme extends to any $p$-norm under certain conditions.
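For illustration, here is a minimal Python sketch of one common form of Ruiz's scheme, dividing each row and column by the square root of its max-norm (the function name is mine; see the linked report for the precise algorithm and its convergence analysis):

```python
import math

def ruiz_equilibrate(A, steps=50):
    """Iteratively divide each row and each column by the square root of its
    max-norm; the row and column max-norms all converge to 1."""
    n, m = len(A), len(A[0])
    for _ in range(steps):
        r = [math.sqrt(max(abs(x) for x in row)) for row in A]
        c = [math.sqrt(max(abs(x) for x in col)) for col in zip(*A)]
        A = [[A[i][j] / (r[i] * c[j]) for j in range(m)] for i in range(n)]
    return A
```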

$\endgroup$
$\begingroup$

This process converges towards a matrix $R$ whose rows and columns are all normalized and which preserves the cross-ratios of the entries: for all $(i,i',j,j')$, $$\frac{M_{i,j}M_{i',j'}}{M_{i,j'}M_{i',j}}=\frac{R_{i,j}R_{i',j'}}{R_{i,j'}R_{i',j}}.$$

The proof of the conservation is immediate: scaling a row or a column multiplies numerator and denominator by the same factor. I have no proof at hand for the convergence, but in practice we use this algorithm and it has always converged.
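The conservation is easy to check numerically; a quick Python sketch (my own, for illustration):

```python
import math

def step(M):
    """One iteration: normalize each row, then each column, by its Euclidean norm."""
    M = [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in M]
    norms = [math.sqrt(sum(v * v for v in col)) for col in zip(*M)]
    return [[x / n for x, n in zip(row, norms)] for row in M]

def cross_ratio(M, i, ip, j, jp):
    """The conserved quantity M[i][j]*M[ip][jp] / (M[i][jp]*M[ip][j])."""
    return (M[i][j] * M[ip][jp]) / (M[i][jp] * M[ip][j])
```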

$\endgroup$
