Perron Frobenius Theorem

Perron Frobenius Theorem is a well-known algebra result that finds applications in a large span of fields of Mathematics. One can mention Markov chains, population growth (with the Leslie matrix model) or the famous PageRank algorithm. In this post we prove the Perron-Frobenius Theorem for stochastic matrices.

Let us consider some integer $d\geq1$ and $Q \in \mathcal M_d(\mathbb R_+)$ a stochastic matrix, meaning that for all $i \in [d]$ , $\sum_j Q_{i,j}=1$ . In the following, we will denote $Q^n=\left( q_{i,j}^{(n)}\right)_{(i,j) \in [d]^2}$ . We assume that $Q$ is irreducible which means that $\forall i,j \in [d], \quad \exists n \geq 0 \quad q_{i,j}^{(n)} >0.$ Then the period of the point $i$ defined by $p_i=\mathrm{pgcd}(\{n \geq 0 \; : \; q_{i,i}^{(n)}>0\}$ does not depend on $i$ meaning that there exists some $p \in \mathbb N$ such that $p_i=p \quad \forall i \in [d]$ . Then:

The spectral radius of $Q$ is $1$ . Moreover, the eigenvalues of $Q$ of modulus 1 are exactly the p p-th roots of unity and have algebraic multiplicity of 1.
The spectrum of $Q$ is invariant by rotation of angle $\frac{2\pi}{p}$ .
If $p>1$ , it exists a permutation matrix $S$ such that
$S^{-1}QS = \begin{bmatrix} 0 & B_{1} & 0 & \dots & 0 \\ 0 & 0 & B_{2} & \dots & 0 \\ \vdots & \vdots & \vdots & \dots &\vdots\\ 0 & 0 & 0 & \dots & B_{p-1}\\ B_{p} & 0 & 0 & \dots & 0 \end{bmatrix},$
where the matrices $(B_i)_i$ are non-zero square matrices.

Proof.

Step 0 : Preliminaries

Let us recall that on a finite state space, an irreducible matrix $Q$ has a unique invariant probability distribution $\pi$ .

Claim 0.1: $\forall i , \; \pi_i>0$

Let us assume that for some $i \in [d]$ it holds $\pi_i=0$ . Since $\pi$ is a probability measure on a finite space, there exists some $j \in [d]$ such that $\pi_j>0$ . Using that $\pi$ is invariant for $Q$ we get directly that $\pi$ is also an invariant distribution of the matrix $Q^n$ for any $n \in \mathbb N$ which means that

$\pi Q^n=\pi.$

Considering the line number $i$ in the previous equality, we get that $\sum_k q_{k,i}^{(n)}=0$ . But all terms in the previous sum are non-negative so they must be all equal to zero. In particular we have $q_{i,j}^{(n)} \pi_j=0$ which leads to $q_{i,j}^{(n)}=0$ since $\pi_j>0$ . Hence, we proved that for all $n \in \mathbb N$ , $q_{i,j}^{(n)}=0$ which is in contradiction with the irreducibility of the matrix $Q$ . This concludes the proof of Claim 0.1.

Claim 0.2: Denoting $D:=diag(\pi_1, \dots \pi_d)$ , $D^{-1}Q^{\top}D$ is an irreducible stochastic matrix

Claim 0.2 follows from a direct computation using the fact that $Q$ is an irreducible stochastic matrix.

Step 1 : Eigenspace associated with the eigenvalue 1

Let us denote $\mathbf 1=(1,\dots,1) \in \mathbb R^d$ . It is a eigenvector of $Q$ related to the eigenvalue 1.

Claim 1.1: The spectral radius of $Q$ is 1.

Let us denote $\lambda$ an (complex) eigenvalue of $Q$ with the largest modulus and $h \in \mathbb C^d$ an non-zero associated eigenvector. Let $i \in \arg \max_{j} |h_j|$ . Since $(Qh)_i = \lambda h_i$ , taking the module and using the triangle inequality gives

$|\lambda| \times |h_i| \leq \sum_k |q_{i,k}| \times |h_k| \leq \sum_k |q_{i,k}| \times |h_i| = |h_i|,$

where the last inequality comes from the fact that $Q$ is a stochastic matrix. By definition of $i$ and since $h \neq 0_d$ , we have that $|h_i| \neq 0$ and the previous inequality leads to $\lambda \leq 1$ . Since 1 is an eigenvalue of $Q$ , we deduce that the spectral radius of $Q$ is 1.

Claim 1.2: For two vectors $X,Y \in \mathbb R^d$ , we write $X\geq Y$ to state that $X_i \geq Y_i$ for all $i \in [d}$ . Then, for any $X \in \mathbb R^d \backslash \{0_d\}$ such that $QX\geq X$ and $X \geq 0$ , it holds that $QX=X$ and there exists some $a>0$ such that $X=a \mathbf 1$ .

Let us consider $i \in \arg \max _k X_k$ . Then for any $n \in \mathbb N$ , we have

$X_i = \sum_k q_{i,k}^{(n)} X_i \geq \sum_k q_{i,k}^{(n)} X_k \geq X_i,$

where the first equality comes from the irreducibility of $Q$ , the second inequality comes from the definition of $i$ and the fact that $X\geq0$ and the last is obtained by taking the i-th line in the assumption $QX \geq X$ . We deduce that all the above inqualities are actually equalities which implies that for any $j\in [d]$ such that $q_{i,j}^{(n)}>0$ , it holds $X_j=X_i$ . This result is true for any $n \in mathbb N$ and since for nay $j \in [d]$ there exists some $n \in \mathbb N$ such that $q_{i,j}^{(n)}>0$ , we deduce that

$X_i=X_j, \quad \forall j \in [d].$

Hence we proved that $X=a \mathbf 1$ with $a=X_i>0$ and $QX=X$ .

Claim 1.3: If $Z=(z_i) \in \mathbb C \backslash \{0_d\}$ is a complex eigenvector realted to an eigenvalue $\lambda$ of modulus 1 of $Q$ , then $|Z|=(|z_i|)$ is an nonnegative eigenvector related to the eigenvalue 1. This implies that the eigenspace related to 1 is $Vect(\mathbf 1)$ .

For all $i \in [d]$ , $\sum_k q_{i,k} z_k = \lambda z_i$ . Hence, taking the modulus in the previous inequality and using the triangle inequality leads to $|z_i| \leq \sum_k q_{i,k} |z_k|, \forall i \in [d]$ i.e. $|Z| \leq Q|Z|.$ Using Claim 1.2, we deduce that $Q|Z|=|Z|$ and that $|Z| \in Vect( \mathbf 1)$ , which concludes the proof.

Step 2 : Algebraic multiplicity of the eigenvalue 1

In the following, we will denote $\chi_Q$ the caracteristic polynomial of the matrix $Q$ and $P$ the conjugate of the comatrice of $I-Q$ . Claim 2.1 is a classical result of differential calculus.

Claim 2.1: $\chi_Q'(1)=\mathrm{Tr}(P)$ .

Claim 2.2: Every column of $P$ belongs to $Vect(\mathbf 1)$ .

From Step 1, we know that $I-Q$ has rank $n-1$ . Hence $\mathrm{det}(I-Q)=0$ and we get that

$(I-Q) \times P=det(I-Q)I=0.$

This means that the subspace spanned by the columns of $P$ is included in the nullspace of $I-Q$ which is $Vect(\mathbf 1)$ . Hence, each columns of $P$ belongs to $Vect(\mathbf 1)$ .

Claim 2.3: Every line of $P$ is proportional to $\pi$ .

From Step 0, we know that $D^{-1}Q^{\top}D$ is an irreducible stochastic matrix and we deduce that this is also the case for $D^{-1}(I-Q^{\top})D$ . Hence, remarking that Claim 2.2 holds for any irreducible stochastic matrix, we get that every column of $Com(D^{-1}(I-Q^{\top})D)^{\top}$ is proportional to $\mathbf 1$ , meaning that for all $j \in [d]$ , there exists some $a_j \in \mathbb R$ such that $c_{j,i}(D^{-1}(I-Q^{\top})D)=a_j$ for all $i$ where we have denote $c_{j,i}$ the cofactor application related to line $j$ and column $i$ . Hence, for all $i,j \in [d],$

$\begin{align*}a_j&= c_{j,i}(D^{-1}(I-Q^{\top})D) \\ &= \left[Com(D^{-1}(I-Q^{\top})D)^{\top}\right]_{i,j} \\&=\left[Com(D(I-Q)D^{-1})\right]_{i,j} \\ &=c_{i,j}( D(I-Q)D^{-1}) \\ &= \frac{\prod_{k : k\neq i} \pi_k}{\prod_{l:l\neq j}\pi_l} c_{i,j}(I-Q)\\ &= \frac{\pi_j}{\pi_i}c_{i,j}(I-Q)\end{align*}$

Hence we proved that for all $i,j \in [d]$ , $c_{i,j}(I-Q) = \frac{a_j}{\pi_j}\pi_i$ . We deduce that each line of $P=Com(I-Q)^{\top} = \left( c_{j,i}\right)_{i,j \in [d]}$ is proportional to $\pi.$

Claim 2.4: All entries of $P$ are non-zero and share the same sign.

From Step 1, the rank of the matrix $I-Q$ is 1 and thus the dimension of its nullspace is $d-1$ . We deduce that there exists a submatrix from $I-Q$ with size $(n-1)\times (n-1)$ which is invertible. This implies that the matrix $P$ has at least one non-zero coefficient and we note $(i,j)$ its location. From Claim 2.2, we deduce that all the entries of the j-th column of $P$ are non-zero (and have the same sign since they are equal). Then, using Claim 2.3 together with the fact that $\pi_k>0, \forall k$ from Step 0, we get the Claim 2.4.

Conclusion

From Claims 2.1 and 2.4, we have $\chi_Q'(1)=\mathrm{Tr}(P) \neq 0$ . This proves that $1$ is a root with multiplicity $1$ of the caracteristic polynomial of $Q$ .

Step 3 : The other eigenvalues of modulus 1

We consider $Z=(z_i)$ an eigenvector (complex) related to an eigenvalue $\lambda$ with modulus 1 of $Q.$ Using Claim 1.2, we can immediately that $|z_i|$ is independent of $i$ and thus we can consider without loss of generality that $z_1=1$ .

Claim 3.1: If $q_{i,j}>0$ then $z_j=\lambda z_i$ .

Since $QZ=\lambda Z$ , taking the modulus and using the triangle inequality gives $|Z| \leq Q|Z|$ . From Claim 1.2, we get that there exists some $a >0$ such that $|Z|=a \mathbf 1$ .

Claim 3.2: If $q_{i,j}>0$ then $z_j=\lambda z_i$ .

Let us consider some $i \in [d]$ . Taking the modulus in $\lambda z_i = \sum_k q_{i,k} z_k$ leads to

$|z_i| \leq \sum_k q_{i,k}|z_k|=|z_i| \sum_k q_{i,k} = |z_i| \quad (*)$

where we used Claim 3.1 and the fact that $Q$ is a stochastic matrix. We deduce that all inequalities in $(*)$ are equalities and the equality condition in the triangle inequality gives that the $(z_k)_k$ such that $q_{i,k}>0$ are positively colinear. From Claim 3.1, we get that $(z_k)_k$ such that $q{i,k}>0$ are equal to some $\overline{z}$ . Hence,

$\lambda z_i = \sum_k q_{i,k}z_k = \sum_{k \; : \; q_{i,k}>0} q_{i,k} \underbrace{z_k}_{=\overline{z}} = \overline{z}\underbrace{\sum_{k \; : \; q_{i,k}>0} q_{i,k}}_{=1} = \overline{z}.$

Hence for any $i,j \in [d]$ such that $q_{i,j}>0$ , it holds $z_j=\lambda z_i$ .

Claim 3.3: $\lambda$ is a p-th root of unity.

Let us consider $\theta \in [0,2\pi[$ such that $\lambda = e^{i \theta}$ . For all $j \in [d]$ , since $q_{j,j}^{(p_j)}>0$ , Claim 3.2 gives $z_j =\lambda^{p_j}z_j$ and since $z_j \neq 0$ (see Claim 3.1), we have that $\lambda^{p_j}=e^{i \theta p_j}=1$ . Hence $\frac{2\pi}{\theta}$ is an integer and divides $p_i$ . We deduce that $\frac{2\pi}{\theta}$ divides $p=pgcd(p_k \;:\; k \in [d])$ meaning that there exists some $m \in \mathbb N$ such that $\theta = \frac{2\pi m}{p}$ . We have found that $\lambda$ is a p-th root of unity.

Conversely, let us consider $\lambda$ a p-th root of unity. We define the vector $Z^{(\lambda)}=(z_i^{(\lambda)})$ by $z_1^{(\lambda)}=1$ and $z_i^{(\lambda)}=\lambda^k$ if $q_{1,i}^{(k)}\neq 0$ .

Claim 3.4: The vector $Z^{(\lambda)}$ is well-defined and is an eigenvector associated with the eigenvalue $\lambda$ .

$\bullet$ We first show that $Z^{(\lambda)}$ is well-defined. Let us consider some $i \in [d]$ . By irreducibility of the matrix $Q$ , there exists some $r \in \mathbb N$ such that $q_{i,1}^{(r)}>0$ . Then, for all $k \in \mathbb N$ such that $q_{1,i}^{(k)}>0$ , we have

$q_{i,i}^{(k+r)} \geq q_{i,1}^{(r)}q_{1,i}^{(k)}>0.$

Hence, $p=p_i$ divides $k+r$ . Since $\lambda$ is a p-th root of unity, we deduce that for any $k,l \in \mathbb N$ such $q_{1,i}^{(k)}>0$ and $q_{1,i}^{(l)}>0$ , we have

$\lambda^k=\lambda^l=\lambda^{-r}.$

Hence, the vector $Z^{(\lambda)}$ is well-defined.

$\bullet$ For all $i \in [d]$ , we consider some integer $k_i \in \mathbb N$ such that $q_{1,i}^{(k_i)}>0$ . One can easily remark that for all $i,j \in [d]$ , if $q_{i,j}>0$ then $q_{1,j}^{(k_i+1)}>0$ since

$q_{1,j}^{(k_i+1)} \geq q_{1,i}^{(k_i)}q_{i,j}.$

Hence, for all $i,j \in [d]$ ,

(1) $\begin{equation*} q_{i,j}>0 \implies z_{j}^{(\lambda)}=\lambda^{k_i+1}=\lambda z_{(\lambda)}_i\end{equation*}$

From here, we get that for all $i \in [d]$ ,

$\begin{align*} ( Q Z^{(\lambda)} )_i &= \sum_j q_{i,j} z^{(\lambda)}_j \\ &= \sum_{j \; : \; q_{i,j}>0} q_{i,j} z^{(\lambda)}_j\\ &= \sum_{j \; : \; q_{i,j}>0} q_{i,j} \lambda^{k_i+1}\\ &= \lambda^{k_i+1} \underbrace{\sum_{j \; : \; q_{i,j}>0} q_{i,j}}_{=1} \\ &= \lambda \times \lambda^{k_i}\\ &= \lambda z^{(\lambda)}_i .\end{align*}$

Hence we proved that $Q Z^{(\lambda)} = Z^{(\lambda)}$ meaning that $Z^{(\lambda)}$ is an eigenvector of $Q$ for the eigenvalue $\lambda$ .

Claim 3.5: Let $\Delta$ be the diagonal matrix with diagonal coefficients $(z_i^{(\lambda)})_i$ . It holds $\Delta^{-1}Q\Delta = \lambda Q$ . This gives that the eigenvalues associated with the p-th roots of unity of $Q$ have algebraic multiplicity $1$ and the spectrum of $Q$ is invariant under rotation of angle $\frac{2\pi}{p}$ .

$\bullet$ The statement $\Delta^{-1}Q\Delta = \lambda Q$ reads as

(2) $\begin{equation*} \forall i,j \in [d], \quad z^{(\lambda)}_jq_{i,j} = \lambda z^{(\lambda)}_i q_{i,j}.\end{equation*}$

\eqref{claim35} clearly holds if $q_{i,j}=0$ . Hence we consider $i,j \in [d]$ such that $q_{i,j} \neq 0$ . From \eqref{claim34}, we have $z_j^{(\lambda)} = \lambda z_i^{(\lambda)}$ and \eqref{claim35} is satisfied.

$\bullet$ Now we prove that all p-th roots of unity are eigenvalues of $Q$ with multiplicity 1. Let us consider some p-th root of unity $\lambda$ . Denoting for any $i\in [d]$ $c_{i,i}(A)$ the cofactor of some matrix $A \in \mathbb R^{d\times d}$ at location $(i,i)$ , one can easily check that

$c_{i,i}(\Delta^{-1}(\lambda I-Q)\Delta) = c_{i,i}(\lambda I-Q).$

We deduce that

$\begin{align*} \chi_Q'(\lambda) &= \mathrm{Tr}(Com(\lambda I-Q)^{\top}) \\&= \mathrm{Tr}(Com(\lambda I-Q))\\ &=\mathrm{Tr}(Com( \Delta^{-1}(\lambda I-Q)\Delta))\\ &= \mathrm{Tr}(Com( \lambda I-\Delta^{-1}Q\Delta)) \\ &= \mathrm{Tr}(Com( \lambda I-\lambda Q)) \\ &= \lambda^{d-1} \mathrm{Tr}(Com( I-Q)) \\ &=\lambda^{d-1} \mathrm{Tr}(P) \neq 0, \end{align*}$

where we used the conclusion of Step 1 to state that $\mathrm{Tr}(P) \neq 0$ . This shows that $\lambda$ has multiplicity 1.

$\bullet$ Let us consider some eigenvalue $\mu$ of $Q$ . We set $\lambda = e^{i\frac{2\pi}{p}}$ and $\Delta$ the diagonal matrix with diagonal coefficients $(z_i^{(\lambda)})_i$ . Then,

$\begin{align*} \chi_Q(e^{i \frac{2\pi}{p}} \mu)&=\mathrm{det}(\lambda \mu I - Q) \\ &=\mathrm{det}(\Delta^{-1}(\lambda \mu I - Q)\Delta) \\&=\mathrm{det}(\lambda \mu I - \Delta^{-1}Q\Delta) \\ &=\mathrm{det}(\lambda \mu I - \lambda Q) \\ &= \lambda^{d}\mathrm{det}( \mu I - Q) \\ &=\lambda^d \chi_Q(\mu) \\ &= 0, \end{align*}$

where the last equality holds because $\mu$ is an eigenvalue of $Q$ . This proves that the spectrum of $Q$ is invariant under rotation of angle $\frac{2\pi}{p}$ and concludes the proof of Claim 3.5.

Extension of the Theorem

The Krein–Rutman theorem is a generalisation of the Perron-Frobenius theorem to infinite-dimensional Banach spaces and complex integral operators.

Davis-Kahan Sin Theta Theorem

Quentin Duchemin