Quentin Duchemin

Perron Frobenius Theorem

Perron Frobenius Theorem is a well-known algebra result that finds applications in a large span of fields of Mathematics. One can mention Markov chains, population growth (with the Leslie matrix model) or the famous PageRank algorithm. In this post we prove the Perron-Frobenius Theorem for stochastic matrices.

Perron Frobenius Theorem

Let us consider some integer d\geq1 and Q \in \mathcal M_d(\mathbb R_+) a stochastic matrix, meaning that for all i \in [d], \sum_j Q_{i,j}=1. In the following, we will denote Q^n=\left(  q_{i,j}^{(n)}\right)_{(i,j) \in [d]^2}. We assume that Q is irreducible which means that \forall i,j \in [d], \quad \exists n \geq 0 \quad q_{i,j}^{(n)} >0. Then the period of the point i defined by p_i=\mathrm{pgcd}(\{n \geq 0 \; : \; q_{i,i}^{(n)}>0\} does not depend on i meaning that there exists some p \in \mathbb N such that p_i=p \quad \forall i \in [d]. Then:

  • The spectral radius of Q is 1. Moreover, the eigenvalues of Q of modulus 1 are exactly the p p-th roots of unity and have algebraic multiplicity of 1.
  • The spectrum of Q is invariant by rotation of angle \frac{2\pi}{p}.
  • If p>1, it exists a permutation matrix S such that

        \[S^{-1}QS = \begin{bmatrix} 0 & B_{1} & 0 & \dots & 0 \\ 0 & 0 & B_{2} & \dots & 0 \\ \vdots & \vdots & \vdots & \dots &\vdots\\ 0 & 0 & 0 & \dots & B_{p-1}\\ B_{p} & 0 & 0 & \dots & 0 \end{bmatrix},\]

    where the matrices (B_i)_i are non-zero square matrices.

Proof.

Step 0 : Preliminaries

Let us recall that on a finite state space, an irreducible matrix Q has a unique invariant probability distribution \pi.

Claim 0.1: \forall i , \; \pi_i>0

Let us assume that for some i \in [d] it holds \pi_i=0. Since \pi is a probability measure on a finite space, there exists some j \in [d] such that \pi_j>0. Using that \pi is invariant for Q we get directly that \pi is also an invariant distribution of the matrix Q^n for any n \in \mathbb N which means that

    \[\pi Q^n=\pi.\]

Considering the line number i in the previous equality, we get that \sum_k q_{k,i}^{(n)}=0. But all terms in the previous sum are non-negative so they must be all equal to zero. In particular we have q_{i,j}^{(n)} \pi_j=0 which leads to q_{i,j}^{(n)}=0 since \pi_j>0. Hence, we proved that for all n \in \mathbb N, q_{i,j}^{(n)}=0 which is in contradiction with the irreducibility of the matrix Q . This concludes the proof of Claim 0.1.

Claim 0.2: Denoting D:=diag(\pi_1, \dots \pi_d), D^{-1}Q^{\top}D is an irreducible stochastic matrix

Claim 0.2 follows from a direct computation using the fact that Q is an irreducible stochastic matrix.

Step 1 : Eigenspace associated with the eigenvalue 1

Let us denote \mathbf 1=(1,\dots,1) \in \mathbb R^d. It is a eigenvector of Q related to the eigenvalue 1.

Claim 1.1: The spectral radius of Q is 1.

Let us denote \lambda an (complex) eigenvalue of Q with the largest modulus and h \in \mathbb C^d an non-zero associated eigenvector. Let i \in \arg \max_{j} |h_j|. Since (Qh)_i = \lambda h_i, taking the module and using the triangle inequality gives

    \[|\lambda| \times |h_i| \leq \sum_k  |q_{i,k}| \times |h_k| \leq  \sum_k |q_{i,k}| \times |h_i| = |h_i|,\]

where the last inequality comes from the fact that Q is a stochastic matrix. By definition of i and since h \neq 0_d, we have that |h_i| \neq 0 and the previous inequality leads to \lambda  \leq 1. Since 1 is an eigenvalue of Q, we deduce that the spectral radius of Q is 1.

Claim 1.2: For two vectors X,Y \in \mathbb R^d, we write X\geq Y to state that X_i \geq Y_i for all i \in [d}. Then, for any X \in \mathbb R^d \backslash \{0_d\} such that QX\geq X and X \geq 0, it holds that QX=X and there exists some a>0 such that X=a \mathbf 1.

Let us consider i \in \arg \max _k X_k. Then for any n \in \mathbb N, we have

    \[X_i  = \sum_k q_{i,k}^{(n)} X_i \geq \sum_k q_{i,k}^{(n)} X_k \geq X_i,\]

where the first equality comes from the irreducibility of Q, the second inequality comes from the definition of i and the fact that X\geq0 and the last is obtained by taking the i-th line in the assumption QX \geq X. We deduce that all the above inqualities are actually equalities which implies that for any j\in [d] such that q_{i,j}^{(n)}>0, it holds X_j=X_i. This result is true for any n \in mathbb N and since for nay j \in [d] there exists some n \in \mathbb N such that q_{i,j}^{(n)}>0, we deduce that

    \[X_i=X_j, \quad  \forall j \in [d].\]

Hence we proved that X=a \mathbf 1 with a=X_i>0 and QX=X.

Claim 1.3: If Z=(z_i) \in \mathbb C \backslash \{0_d\} is a complex eigenvector realted to an eigenvalue \lambda of modulus 1 of Q, then |Z|=(|z_i|) is an nonnegative eigenvector related to the eigenvalue 1. This implies that the eigenspace related to 1 is Vect(\mathbf 1).

For all i \in [d], \sum_k q_{i,k} z_k = \lambda z_i. Hence, taking the modulus in the previous inequality and using the triangle inequality leads to |z_i| \leq \sum_k q_{i,k} |z_k|, \forall i \in [d] i.e. |Z| \leq Q|Z|. Using Claim 1.2, we deduce that Q|Z|=|Z| and that |Z| \in Vect( \mathbf 1), which concludes the proof.

Step 2 : Algebraic multiplicity of the eigenvalue 1

In the following, we will denote \chi_Q the caracteristic polynomial of the matrix Q and P the conjugate of the comatrice of I-Q. Claim 2.1 is a classical result of differential calculus.

Claim 2.1: \chi_Q'(1)=\mathrm{Tr}(P).

Claim 2.2: Every column of P belongs to Vect(\mathbf 1).

From Step 1, we know that I-Q has rank n-1. Hence \mathrm{det}(I-Q)=0 and we get that

    \[(I-Q) \times  P=det(I-Q)I=0.\]

This means that the subspace spanned by the columns of P is included in the nullspace of I-Q which is Vect(\mathbf 1). Hence, each columns of P belongs to Vect(\mathbf 1).

Claim 2.3: Every line of P is proportional to \pi.

From Step 0, we know that D^{-1}Q^{\top}D is an irreducible stochastic matrix and we deduce that this is also the case for D^{-1}(I-Q^{\top})D. Hence, remarking that Claim 2.2 holds for any irreducible stochastic matrix, we get that every column of Com(D^{-1}(I-Q^{\top})D)^{\top} is proportional to \mathbf 1, meaning that for all j \in [d], there exists some a_j \in \mathbb R such that c_{j,i}(D^{-1}(I-Q^{\top})D)=a_j for all i where we have denote c_{j,i} the cofactor application related to line j and column i. Hence, for all i,j \in [d],

    \begin{align*}a_j&= c_{j,i}(D^{-1}(I-Q^{\top})D) \\ &= \left[Com(D^{-1}(I-Q^{\top})D)^{\top}\right]_{i,j} \\&=\left[Com(D(I-Q)D^{-1})\right]_{i,j} \\ &=c_{i,j}( D(I-Q)D^{-1}) \\ &= \frac{\prod_{k : k\neq i} \pi_k}{\prod_{l:l\neq j}\pi_l} c_{i,j}(I-Q)\\ &= \frac{\pi_j}{\pi_i}c_{i,j}(I-Q)\end{align*}

Hence we proved that for all i,j \in [d], c_{i,j}(I-Q) = \frac{a_j}{\pi_j}\pi_i. We deduce that each line of P=Com(I-Q)^{\top} = \left( c_{j,i}\right)_{i,j \in [d]} is proportional to \pi.

Claim 2.4: All entries of P are non-zero and share the same sign.

From Step 1, the rank of the matrix I-Q is 1 and thus the dimension of its nullspace is d-1. We deduce that there exists a submatrix from I-Q with size (n-1)\times (n-1) which is invertible. This implies that the matrix P has at least one non-zero coefficient and we note (i,j) its location. From Claim 2.2, we deduce that all the entries of the j-th column of P are non-zero (and have the same sign since they are equal). Then, using Claim 2.3 together with the fact that \pi_k>0, \forall k from Step 0, we get the Claim 2.4.

Conclusion

From Claims 2.1 and 2.4, we have \chi_Q'(1)=\mathrm{Tr}(P) \neq 0. This proves that 1 is a root with multiplicity 1 of the caracteristic polynomial of Q.

Step 3 : The other eigenvalues of modulus 1

We consider Z=(z_i) an eigenvector (complex) related to an eigenvalue \lambda with modulus 1 of Q. Using Claim 1.2, we can immediately that |z_i| is independent of i and thus we can consider without loss of generality that z_1=1.

Claim 3.1: If q_{i,j}>0 then z_j=\lambda z_i.

Since QZ=\lambda Z, taking the modulus and using the triangle inequality gives |Z| \leq Q|Z|. From Claim 1.2, we get that there exists some a >0 such that |Z|=a \mathbf 1.

Claim 3.2: If q_{i,j}>0 then z_j=\lambda z_i.

Let us consider some i \in [d]. Taking the modulus in \lambda z_i = \sum_k q_{i,k} z_k leads to

    \[|z_i| \leq \sum_k q_{i,k}|z_k|=|z_i| \sum_k q_{i,k} = |z_i| \quad (*)\]

where we used Claim 3.1 and the fact that Q is a stochastic matrix. We deduce that all inequalities in (*) are equalities and the equality condition in the triangle inequality gives that the (z_k)_k such that q_{i,k}>0 are positively colinear. From Claim 3.1, we get that (z_k)_k such that q{i,k}>0 are equal to some \overline{z}. Hence,

    \[\lambda z_i = \sum_k q_{i,k}z_k = \sum_{k \; : \; q_{i,k}>0} q_{i,k} \underbrace{z_k}_{=\overline{z}} = \overline{z}\underbrace{\sum_{k \; : \; q_{i,k}>0} q_{i,k}}_{=1} = \overline{z}.\]

Hence for any i,j \in [d] such that q_{i,j}>0, it holds z_j=\lambda z_i.

Claim 3.3: \lambda is a p-th root of unity.

Let us consider \theta \in [0,2\pi[ such that \lambda = e^{i \theta}. For all j \in [d], since q_{j,j}^{(p_j)}>0, Claim 3.2 gives z_j =\lambda^{p_j}z_j and since z_j \neq 0 (see Claim 3.1), we have that \lambda^{p_j}=e^{i \theta p_j}=1. Hence \frac{2\pi}{\theta} is an integer and divides p_i. We deduce that \frac{2\pi}{\theta} divides p=pgcd(p_k \;:\; k \in [d]) meaning that there exists some m \in \mathbb N such that \theta = \frac{2\pi m}{p}. We have found that \lambda is a p-th root of unity.

Conversely, let us consider \lambda a p-th root of unity. We define the vector Z^{(\lambda)}=(z_i^{(\lambda)}) by z_1^{(\lambda)}=1 and z_i^{(\lambda)}=\lambda^k if q_{1,i}^{(k)}\neq 0.

Claim 3.4: The vector Z^{(\lambda)} is well-defined and is an eigenvector associated with the eigenvalue \lambda.

\bullet We first show that Z^{(\lambda)} is well-defined. Let us consider some i \in [d]. By irreducibility of the matrix Q, there exists some r \in \mathbb N such that q_{i,1}^{(r)}>0. Then, for all k \in \mathbb N such that q_{1,i}^{(k)}>0, we have

    \[q_{i,i}^{(k+r)} \geq q_{i,1}^{(r)}q_{1,i}^{(k)}>0.\]

Hence, p=p_i divides k+r. Since \lambda is a p-th root of unity, we deduce that for any k,l \in \mathbb N such q_{1,i}^{(k)}>0 and q_{1,i}^{(l)}>0, we have

    \[\lambda^k=\lambda^l=\lambda^{-r}.\]

Hence, the vector Z^{(\lambda)} is well-defined.

\bullet For all i \in [d], we consider some integer k_i \in \mathbb N such that q_{1,i}^{(k_i)}>0. One can easily remark that for all i,j \in  [d], if q_{i,j}>0 then q_{1,j}^{(k_i+1)}>0 since

    \[q_{1,j}^{(k_i+1)} \geq q_{1,i}^{(k_i)}q_{i,j}.\]

Hence, for all i,j \in [d],

(1)   \begin{equation*}  q_{i,j}>0 \implies  z_{j}^{(\lambda)}=\lambda^{k_i+1}=\lambda z_{(\lambda)}_i\end{equation*}

From here, we get that for all i \in [d],

    \begin{align*} ( Q Z^{(\lambda)} )_i &= \sum_j q_{i,j} z^{(\lambda)}_j \\ &=  \sum_{j \; : \; q_{i,j}>0} q_{i,j} z^{(\lambda)}_j\\ &= \sum_{j \; : \; q_{i,j}>0} q_{i,j} \lambda^{k_i+1}\\ &= \lambda^{k_i+1} \underbrace{\sum_{j \; : \; q_{i,j}>0} q_{i,j}}_{=1} \\ &= \lambda \times  \lambda^{k_i}\\ &= \lambda z^{(\lambda)}_i .\end{align*}

Hence we proved that Q Z^{(\lambda)} = Z^{(\lambda)} meaning that Z^{(\lambda)} is an eigenvector of Q for the eigenvalue \lambda.

Claim 3.5: Let \Delta be the diagonal matrix with diagonal coefficients (z_i^{(\lambda)})_i. It holds \Delta^{-1}Q\Delta = \lambda Q. This gives that the eigenvalues associated with the p-th roots of unity of Q have algebraic multiplicity 1 and the spectrum of Q is invariant under rotation of angle \frac{2\pi}{p}.

\bullet The statement \Delta^{-1}Q\Delta = \lambda Q reads as

(2)   \begin{equation*} \forall i,j \in [d], \quad  z^{(\lambda)}_jq_{i,j} = \lambda z^{(\lambda)}_i q_{i,j}.\end{equation*}

\eqref{claim35} clearly holds if q_{i,j}=0. Hence we consider i,j \in [d] such that q_{i,j} \neq 0. From \eqref{claim34}, we have z_j^{(\lambda)} = \lambda z_i^{(\lambda)} and \eqref{claim35} is satisfied.

\bullet Now we prove that all p-th roots of unity are eigenvalues of Q with multiplicity 1. Let us consider some p-th root of unity \lambda. Denoting for any i\in [d] c_{i,i}(A) the cofactor of some matrix A \in \mathbb R^{d\times d} at location (i,i), one can easily check that

    \[c_{i,i}(\Delta^{-1}(\lambda I-Q)\Delta) = c_{i,i}(\lambda I-Q).\]

We deduce that

    \begin{align*} \chi_Q'(\lambda) &= \mathrm{Tr}(Com(\lambda I-Q)^{\top}) \\&= \mathrm{Tr}(Com(\lambda I-Q))\\ &=\mathrm{Tr}(Com( \Delta^{-1}(\lambda I-Q)\Delta))\\ &=  \mathrm{Tr}(Com( \lambda I-\Delta^{-1}Q\Delta)) \\ &= \mathrm{Tr}(Com( \lambda I-\lambda Q)) \\ &= \lambda^{d-1}  \mathrm{Tr}(Com( I-Q)) \\ &=\lambda^{d-1} \mathrm{Tr}(P) \neq 0, \end{align*}

where we used the conclusion of Step 1 to state that \mathrm{Tr}(P) \neq 0. This shows that \lambda has multiplicity 1.

\bullet Let us consider some eigenvalue \mu of Q. We set \lambda = e^{i\frac{2\pi}{p}} and \Delta the diagonal matrix with diagonal coefficients (z_i^{(\lambda)})_i. Then,

    \begin{align*}  \chi_Q(e^{i \frac{2\pi}{p}} \mu)&=\mathrm{det}(\lambda \mu I - Q) \\ &=\mathrm{det}(\Delta^{-1}(\lambda \mu I - Q)\Delta) \\&=\mathrm{det}(\lambda \mu I - \Delta^{-1}Q\Delta) \\ &=\mathrm{det}(\lambda \mu I - \lambda Q) \\ &= \lambda^{d}\mathrm{det}( \mu I - Q) \\ &=\lambda^d \chi_Q(\mu) \\ &= 0,   \end{align*}

where the last equality holds because \mu is an eigenvalue of Q. This proves that the spectrum of Q is invariant under rotation of angle \frac{2\pi}{p} and concludes the proof of Claim 3.5.

Extension of the Theorem

The Krein–Rutman theorem is a generalisation of the Perron-Frobenius theorem to infinite-dimensional Banach spaces and complex integral operators.

Leave a Reply

Your email address will not be published. Required fields are marked *.

*
*
You may use these <abbr title="HyperText Markup Language">HTML</abbr> tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>