Quentin Duchemin

Wigner and Marcenko-Pastur Theorems


I. Wigner’s theorem

Definition
Let \sigma > 0. The probability distribution on (\mathbb R, \mathcal B( \mathbb R)) defined by

    \[\mathbb P_{sc, \sigma^2} (dx) = \frac{1}{2\pi \sigma^2}\sqrt{\left( 4 \sigma^2 - x^2\right)_+}dx,\]


is called the semi-circular distribution. We denote \mathbb P_{sc,1} = \mathbb P.

Wigner’s Theorem
– Let \left(X_{i,j}\right)_{1\leq i <j \leq N} be i.i.d., centered, complex-valued random variables with \mathbb E |X_{i,j}|^2 = \sigma^2 <\infty.
– Let \left(X_{i,i}\right)_{1\leq i \leq N} be i.i.d., centered, real-valued random variables with \mathbb E |X_{i,i}|^2 = \sigma^2 <\infty.
– The X_{i,i}'s and the X_{i,j}'s (i < j) are independent.
– Consider X_N and Y_N the N \times N Hermitian matrices defined by

    \[\forall i,j \in [N], \quad \left(X_N\right)_{i,j} = \left\{\begin{array}{ll} X_{i,j} & \mbox{if  } i\leq j \\\overline{X}_{j,i} & \mbox{if  } i> j\end{array}\right. \quad \text{  and } \quad Y_N = \frac{X_N}{\sqrt N}.\]


Then almost surely,

    \[L_N = \frac{1}{N} \sum_{i=1}^N  \delta_{\lambda_i(Y_N)} \overset{\mathcal D}{\underset{N \to \infty}{\longrightarrow }} \mathbb  P_{sc,\sigma^2},\]


where (\lambda_i(Y_N))_i are the (real) eigenvalues of the matrix Y_N.

The semi-circular distribution (in red) and the empirical measure L_N (in cyan) for different values of N.
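This convergence is easy to observe numerically. Below is a minimal sketch (my own illustration, not part of the original notes) that samples a real Gaussian Wigner matrix with \sigma = 1 and compares the histogram of L_N to the semi-circular density; the size N = 2000, the seed and the bin count are arbitrary choices.

```python
import numpy as np

def wigner_esd(N, sigma=1.0, rng=None):
    """Sample a real symmetric Wigner matrix Y_N = X_N / sqrt(N)
    with N(0, sigma^2) entries and return its eigenvalues."""
    rng = np.random.default_rng(rng)
    A = rng.normal(0.0, sigma, size=(N, N))
    X = np.triu(A) + np.triu(A, 1).T          # symmetrize: X_{ij} = X_{ji}
    return np.linalg.eigvalsh(X / np.sqrt(N))

def semicircle_density(x, sigma=1.0):
    """Density of P_{sc, sigma^2}."""
    return np.sqrt(np.maximum(4 * sigma**2 - x**2, 0.0)) / (2 * np.pi * sigma**2)

eigs = wigner_esd(2000, rng=0)
# the histogram of L_N should be close to the semi-circular density on [-2, 2]
hist, edges = np.histogram(eigs, bins=40, range=(-2.2, 2.2), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_err = float(np.max(np.abs(hist - semicircle_density(centers))))
```

With N = 2000 the histogram already hugs the semicircle; increasing N tightens the agreement.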

Proof

  1. Show that the semi-circular law is characterized by its moments (using a corollary of Carleman's theorem). Compute the moments of the semi-circular law (they turn out to be the Catalan numbers).
  2. Compute the moments of the empirical spectral measure of the Wigner matrix.
  3. Show that these moments converge to the Catalan numbers.
  4. Conclude: convergence of moments, together with a limit distribution characterized by its moments, implies weak convergence.
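Step 1 can be sanity-checked numerically: the even moments of the standard semi-circular law are the Catalan numbers C_k = \binom{2k}{k}/(k+1) and the odd moments vanish by symmetry. A small sketch (midpoint integration; the grid size is an arbitrary choice):

```python
import numpy as np
from math import comb, pi

def semicircle_moment(p, n=200000):
    """p-th moment of the standard semi-circular law, midpoint rule on [-2, 2]."""
    h = 4.0 / n
    x = -2.0 + (np.arange(n) + 0.5) * h       # midpoints of the grid cells
    dens = np.sqrt(4.0 - x**2) / (2.0 * pi)   # semi-circular density, sigma = 1
    return float(np.sum(x**p * dens) * h)

catalan = [comb(2 * k, k) // (k + 1) for k in range(5)]   # 1, 1, 2, 5, 14
even = [semicircle_moment(2 * k) for k in range(5)]        # should match Catalan
odd = [semicircle_moment(2 * k + 1) for k in range(5)]     # should be ~ 0
```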

Additional results

  • If \mathbb E |X_{i,j}|^4<\infty, \quad \forall i,j, then

        \[\lambda_{\max}(Y_N) \overset{a.s.}{\underset{N \to \infty}{\to}} 2 \sigma \quad \text{and} \quad  \lambda_{\min}(Y_N) \overset{a.s.}{\underset{N \to \infty}{\to}} -2 \sigma.\]


    In particular,

        \[\|Y_N\| = \max\left( |\lambda_{\max}(Y_N)| \; , \; |\lambda_{\min}(Y_N)|  \right) \overset{a.s.}{\underset{N \to \infty}{\to}} 2\sigma.\]

  • If \mathbb E |X_{i,j}|^4=\infty,\quad \forall i,j, then

        \[\lambda_{\max}(Y_N) \overset{a.s.}{\underset{N \to \infty}{\to}} + \infty.\]

Remark.
We refer to the paper “The eigenvalues of random symmetric matrices” for an extension of Wigner's Theorem.

II. Marcenko-Pastur theorem

1. Preliminaries

a) Key results for the proof of the Marcenko-Pastur Theorem

Definition
Let us consider \mu a probability measure on (\mathbb R, \mathcal B(\mathbb R)). The Stieltjes transform of \mu, denoted g\equiv g_{\mu}, is defined by

    \[\forall z \in \mathbb C^+:=\{u \in \mathbb C \; : \; Im(u)>0\}, \quad g(z) = \int_{\mathbb R}  \frac{\mu(d\lambda)}{\lambda -z}  .\]
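For a discrete measure the definition is just a finite sum, which makes it easy to experiment with. A small sketch (the measure \mu = \frac{1}{2}(\delta_{-1}+\delta_{1}) and the evaluation point are arbitrary choices; for this \mu a direct computation gives g(z) = z/(1-z^2)):

```python
import numpy as np

def stieltjes_discrete(z, atoms, weights):
    """Stieltjes transform of the discrete measure sum_k w_k * delta_{a_k}:
    g(z) = sum_k w_k / (a_k - z), for z in the upper half-plane."""
    atoms = np.asarray(atoms, dtype=complex)
    weights = np.asarray(weights)
    return complex(np.sum(weights / (atoms - z)))

z = 0.3 + 1.0j
g = stieltjes_discrete(z, [-1.0, 1.0], [0.5, 0.5])
# closed form for mu = (delta_{-1} + delta_1)/2 : g(z) = z / (1 - z^2)
closed_form = z / (1 - z**2)
```

Note that g indeed lands in \mathbb C^+, in line with property 2 of the Theorem below.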

Theorem (Properties of the Stieltjes Transform)
Let \mu be some measure on (\mathbb R, \mathcal B(\mathbb R)) with Stieltjes Transform g.

  1. g is analytic on \mathbb C^+.
  2. g(\mathbb C^+) \subset \mathbb C^+.
  3. If Supp(\mu) \subset \mathbb R^+, then Im(zg(z))\geq 0, \quad \forall z \in \mathbb C^+.
  4. \underset{y \to + \infty}{\lim} iyg(iy)=-\mu(\mathbb R).
  5. \forall z \in \mathbb C^+, \quad |g(z)| \leq \frac{\mu(\mathbb R)}{Im(z)}.
  6. \forall x \in \mathbb R,\quad \mu \{x\} = \underset{y\to 0^+}{\lim} y Im\left(  g(x+iy)\right).
  7. For any bounded continuous f:\mathbb R \to \mathbb R,

        \[\int f d\mu = \underset{y\to 0^+}{\lim} \frac{1}{\pi} \int f(x) Im\left( g(x+iy) \right)dx.\]

  8. For all a,b \in \mathbb R points of continuity of \mu,

        \[ \mu \left( (a,b) \right)=\underset{y\to 0^+}{\lim} \frac{1}{\pi} \int_a^b Im\left( g(x+iy)\right)dx. \]

Remark. The previous Theorem states that the Stieltjes Transform is analytic on \mathbb C^+. When z \in \mathbb C^+ approaches the real axis, analyticity may be lost, but this boundary behavior allows one to recover the measure \mu (see properties 6, 7 and 8).
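Property 6 is pleasant to see in action: for a discrete measure, y \, Im(g(x+iy)) recovers the mass of each atom as y \to 0^+, and vanishes at points carrying no mass. A sketch with the (arbitrarily chosen) measure \mu = 0.3\,\delta_0 + 0.7\,\delta_2:

```python
def g_mu(z):
    """Stieltjes transform of mu = 0.3*delta_0 + 0.7*delta_2 (two atoms)."""
    return 0.3 / (0.0 - z) + 0.7 / (2.0 - z)

# Property 6: mu({x}) = lim_{y -> 0+} y * Im g(x + iy)
y = 1e-8
recovered = {x: y * g_mu(x + 1j * y).imag for x in (0.0, 1.0, 2.0)}
# recovered[0.0] ~ 0.3, recovered[2.0] ~ 0.7, recovered[1.0] ~ 0 (no atom at 1)
```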

Theorem (Weak convergence and pointwise convergence of Stieltjes Transform)
Let us consider (\mu_n)_n,\mu probability measures on \mathbb R.

  1. If \mu_n \overset{\mathcal D}{\underset{n \to \infty}{\to}} \mu, then

        \[\forall z \in \mathbb C^+, \quad g_{\mu_n}(z) \underset{n \to \infty}{\to} g_{\mu}(z).\]


  2. a) Let us consider D \subset \mathbb C^+ with an accumulation point. If g_{\mu_n}(z) \underset{n \to \infty}{\to} g(z) for all z\in D, then
    – there exists a measure \nu satisfying \nu(\mathbb R) \leq 1 such that

        \[ g(z)= \int \frac{\nu(d\lambda)}{\lambda -z}, \quad \forall z \in \mathbb C^+.\]


    – \mu_n \overset{v}{\underset{n \to \infty}{\to}} \nu, i.e. (\mu_n)_n converges vaguely to \nu.


    b) If it also holds that \lim_{y \to + \infty} iy g(iy)=-1 (i.e. \nu(\mathbb R)=1 by point 4 of the previous Theorem), then \nu is a probability measure and \mu_n \overset{\mathcal D}{\underset{n \to \infty}{\to}} \nu.

The previous Theorem can be seen as the counterpart of the famous Lévy Theorem, which links weak convergence to the pointwise convergence of characteristic functions.

Definition
The characteristic function of a real-valued random variable X is defined by

    \[\forall \xi \in \mathbb R, \quad \phi_X(\xi) = \mathbb E\left[ e^{iX\xi} \right].\]

Theorem (Levy)
Let us consider real-valued random variables (X_n)_n and X.

  1. If X_n \overset{\mathcal D}{\underset{n \to \infty}{\to}} X, then \forall \xi \in \mathbb R, \quad \phi_{X_n}(\xi) \underset{n \to \infty}{\to} \phi_X(\xi).
  2. If there exists some function \phi:\mathbb R \to \mathbb C such that

        \[ \phi_{X_n}(\xi)\underset{n \to \infty}{\to}\phi(\xi), \quad \forall \xi \in \mathbb R,\]

    and if \phi is continuous at 0, then
    – there exists a real-valued random variable Z such that \phi\equiv \phi_Z.
    – X_n \overset{\mathcal D}{\underset{n \to \infty}{\to}} Z.

Just for the sake of beauty, let me mention the following result, which allows one to identify functions that can be written as the Stieltjes Transform of some measure \mu on (\mathbb R, \mathcal B(\mathbb R)).

Theorem (Recognize a Stieltjes Transform)
If g :\mathbb C^+ \to \mathbb C satisfies

  1. Im(g(z)) \geq 0, \quad \forall z \in \mathbb C^+
  2. g is analytic
  3. There exists M>0 such that |g(z)| \leq \frac{M}{Im(z)},\quad \forall z \in \mathbb C^+

Then there exists a unique measure \mu on (\mathbb R, \mathcal B(\mathbb R)) satisfying \mu(\mathbb R) \leq M such that

    \[g(z) = \int_{\mathbb R} \frac{\mu(d\lambda)}{\lambda-z}.\]

If additionally it holds that
4. \forall z \in \mathbb C^+, \quad Im(zg(z)) \geq 0,
then \mu(\mathbb R^-) = \mu((-\infty,0)) = 0, i.e. Supp(\mu) \subset \mathbb R^+.

Example: If g is the Stieltjes Transform of some measure \mu, then z \mapsto - \frac{1}{z+g(z)} is the Stieltjes Transform of some probability measure \nu on (\mathbb R, \mathcal B(\mathbb R)).
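The standard semi-circular law gives a concrete instance of this example: its Stieltjes Transform satisfies g(z)^2 + z g(z) + 1 = 0, i.e. g(z) = -1/(z+g(z)), so g is a fixed point of the map above. A numerical sketch (the evaluation point and the grid size are arbitrary choices):

```python
import numpy as np

def g_sc(z):
    """Stieltjes transform of the standard semi-circular law:
    the root of g^2 + z*g + 1 = 0 with positive imaginary part."""
    s = np.sqrt(z * z - 4.0 + 0j)
    g = (-z + s) / 2.0
    return g if g.imag > 0 else (-z - s) / 2.0

z = 0.5 + 0.7j
g = g_sc(z)
h = -1.0 / (z + g)              # image of g under the map of the example
# cross-check against direct numerical integration of the density
n = 400000
x = -2.0 + (np.arange(n) + 0.5) * (4.0 / n)
g_num = np.sum(np.sqrt(4.0 - x**2) / (2 * np.pi) / (x - z)) * (4.0 / n)
```

Here h coincides with g, as predicted by the fixed-point identity.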

b) Proof of the key preliminary result

We will need the following additional properties in the proof.

Theorem (Helly’s selection theorem)
From every sequence of probability measures (\mu_n)_n, one can extract a subsequence that converges vaguely to a measure \nu. (Note that \nu is not necessarily a probability measure.)

Remark. \mu_n \underset{n \to \infty}{\to} \nu vaguely implies \int  f d\mu_n \underset{n \to \infty}{\to} \int f d\nu for every f \in C_0(\mathbb R) where

    \[C_0(\mathbb R) := \left\{f:\mathbb R \to \mathbb C \; : \; f \text{   is continuous  and   } \lim_{\pm \infty} f = 0 \right\}.\]

Proof.
1) To prove that for any z \in \mathbb C^+, g_{\mu_n}(z) \to g_{\mu}(z), we show that for any z \in \mathbb C^+, \mathcal{R}e (g_{\mu_n}(z)) \to \mathcal{R}e ( g_{\mu}(z)) and \mathcal{I}m (g_{\mu_n}(z)) \to \mathcal{I}m (g_{\mu}(z)). Let us consider some z=x+iy \in \mathbb C^+. Since

    \begin{align*}\mathcal{R}e (g_{\mu_n}(z)) & = \mathcal{R}e \int_{\mathbb R} \frac{\mu_n(d\lambda)}{\lambda -z}\\&= \int_{\mathbb R} \frac{(\lambda-x)\mu_n(d\lambda)}{(\lambda -x)^2+y^2},\end{align*}


and since \lambda \mapsto \frac{(\lambda-x)}{(\lambda -x)^2+y^2} is continuous and bounded on \mathbb R (because y>0), the weak convergence of the sequence (\mu_n)_n to \mu ensures that

    \[\mathcal{R}e (g_{\mu_n}(z)) \underset{n \to \infty}{\to} \int_{\mathbb R} \frac{(\lambda-x)\mu(d\lambda)}{(\lambda -x)^2+y^2} =\mathcal{R}e (g_{\mu}(z)). \]


Using an analogous approach, one can show that

    \[\mathcal{I}m (g_{\mu_n}(z)) \underset{n \to \infty}{\to} \mathcal{I}m (g_{\mu}(z)),\]

which concludes the proof of the first item of the Theorem.

2a) We consider some set D \subset \mathbb C^+ with an accumulation point such that \forall z \in D, \quad g_{\mu_n}(z) \underset{n \to \infty}{\to} g(z).
By Helly's selection Theorem, there exists a subsequence (\mu'_n)_n of the sequence of probability measures (\mu_n)_n that converges vaguely to some measure \nu on (\mathbb R, \mathcal B(\mathbb R)). Let us consider some z \in D. Using the previous remark and since \phi:\lambda \mapsto  \frac{1}{\lambda-z} belongs to C_0(\mathbb R), we get that

    \[\int \phi(\lambda) d\mu'_n(\lambda) \underset{n \to \infty}{\to} \int \phi(\lambda) d\nu(\lambda).\]


Since \left( \int \phi(\lambda) d\mu'_n(\lambda) \right)_n is a subsequence of the sequence (g_{\mu_n}(z))_n that converges to g(z), we deduce that

(1)   \begin{equation*}  \forall z \in D, \quad g(z) = g_{\nu}(z).\end{equation*}


We know from the properties of the Stieltjes Transform that g and g_{\nu} are analytic functions on \mathbb C^+. Hence, (1) and the analytic continuation give

    \[\forall z \in \mathbb C^+, \quad g(z) = g_{\nu}(z).\]


It remains to show that the whole sequence (\mu_n)_n converges vaguely to \nu. Considering another subsequence (\mu''_n)_n that converges vaguely to some measure \nu', the previous arguments prove that

    \[\forall z \in D, \quad \int \frac{d\nu(\lambda)}{\lambda-z} = g(z) = \int \frac{d\nu'(\lambda)}{\lambda-z}.\]


Using again the analytic continuation, we obtain that

    \[\forall z \in \mathbb C^+, \quad \int \frac{d\nu(\lambda)}{\lambda-z} = \int \frac{d\nu'(\lambda)}{\lambda-z},\]


leading to \nu = \nu'. This shows that (\mu_n)_n has a unique vague accumulation point. A standard argument based on Helly's selection Theorem then ensures that

    \[\mu_n \overset{v}{\underset{n \to \infty}{\to}} \nu.\]

2b) This directly follows from the equivalence between points (i) and (ii) of the following Lemma.

Lemma
Let (\mu_n)_n be probability measures and \nu be a measure on (\mathbb R, \mathcal B(\mathbb R)). The following statements are equivalent.
(i) \mu_n \overset{\mathcal D}{\underset{n \to \infty}{\to}} \nu.
(ii) \mu_n \overset{v}{\underset{n \to \infty}{\to}} \nu and \nu(\mathbb R)=1.
(iii) \mu_n \overset{v}{\underset{n \to \infty}{\to}} \nu and (\mu_n)_n is tight, namely for any \epsilon>0, there exists some compact set K \subset \mathbb R such that

    \[\forall n \in \mathbb N, \quad \mu_n(K) \geq 1-\epsilon.\]


2. Marcenko-Pastur Theorem

Theorem (Marcenko-Pastur)
Let us consider an N\times n matrix X_N with i.i.d. entries such that

    \[\mathbb E X_{i,j}=0 \quad \text{and} \quad \mathbb E |X_{i,j}|^2 = \sigma^2, \quad \forall i \in [N], \; j \in [n],\]


with N=N(n) and n of the same order and L_N the spectral measure of \frac{1}{n} X_N X_N^*:

    \[c_n = \frac{N}{n} \underset{n \to \infty}{\to} c \in (0,\infty), \quad L_N = \frac{1}{N} \sum_{i=1}^N \delta_{\lambda_i}, \quad \lambda_i=\lambda_i \left(\frac{1}{n} X_N X_N^*\right).\]


Then, almost surely (i.e. for almost every realization),

    \[L_N \overset{\mathcal D}{\underset{N,n \to \infty}{\to}} \mathbb P_{MP},\]


where \mathbb P_{MP} is the Marcenko-Pastur distribution

    \[\mathbb P_{MP}(dx)=\left(1-\frac{1}{c}\right)_+ \delta_0 + \frac{\sqrt{\left[   (\lambda^+-x)(x-\lambda^-)\right]_+}}{2\pi\sigma^2xc}dx,\]


with

    \[  \left\{\begin{array}{ll}\lambda^- &=\sigma^2 (1-\sqrt{c})^2 \\\lambda^+ &=\sigma^2 (1+\sqrt{c})^2\end{array}\right..\]


In what follows, we denote g_{MP} the Stieltjes Transform of the measure \mathbb P_{MP}.

Remark

  • The behavior of the spectral measure brings information about the vast majority of the eigenvalues, but it is not affected by the behavior of a few individual eigenvalues. For example, one may have \lambda_{\max} \to \infty without affecting the limit of L_N.
  • The Dirac mass at zero appears when N>n: since \frac{1}{n} X_N X_N^* has rank at most n, at least N-n of its N eigenvalues are equal to zero.
  • If c \to 0^+, that is n \gg N, we are in the usual regime of “small-dimensional data vs. large samples”. The support [\sigma^2 (1-\sqrt{c})^2, \sigma^2 (1+\sqrt{c})^2] of the Marcenko-Pastur distribution concentrates around \{\sigma^2\} and

        \[\mathbb P_{MP} \underset{c\to 0}{\to}\delta_{\sigma^2}.\]
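As for the Wigner case, the theorem is easy to illustrate numerically. A sketch with Gaussian entries, \sigma^2 = 1 and c = 1/2 < 1 (so there is no atom at zero); the matrix sizes, seed and bin count are arbitrary choices:

```python
import numpy as np

def mp_density(x, c, sigma2=1.0):
    """Absolutely continuous part of the Marcenko-Pastur density (c <= 1: no atom)."""
    lam_m = sigma2 * (1.0 - np.sqrt(c)) ** 2
    lam_p = sigma2 * (1.0 + np.sqrt(c)) ** 2
    num = np.sqrt(np.maximum((lam_p - x) * (x - lam_m), 0.0))
    return num / (2.0 * np.pi * sigma2 * x * c)

rng = np.random.default_rng(0)
N, n = 1000, 2000                       # c = N/n = 1/2
X = rng.normal(size=(N, n))             # sigma^2 = 1
eigs = np.linalg.eigvalsh(X @ X.T / n)  # spectrum of (1/n) X_N X_N^*
c = N / n
lam_m, lam_p = (1.0 - np.sqrt(c)) ** 2, (1.0 + np.sqrt(c)) ** 2
hist, edges = np.histogram(eigs, bins=40, range=(lam_m, lam_p), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mean_err = float(np.mean(np.abs(hist - mp_density(centers, c))))
```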

Proof.
We want to show that

(2)   \begin{equation*}\forall z \in \mathbb C^+, \quad g_{n}(z) \overset{a.s.}{\underset{N,n \to \infty}{\to}} g_{MP}(z).\end{equation*}


From (2) we will deduce that there exists a countable set D \subset \mathbb C^+ with an accumulation point such that, almost surely,

    \[\forall z \in D, \quad g_n(z) \underset{N,n \to \infty}{\to} g_{MP}(z).\]


Indeed, suppose that we know that (2) holds and let us consider some sequence (z_k)_{k \in \mathbb N} \in \left( \mathbb C^+ \right) ^{\mathbb N} with an accumulation point z^* \in \mathbb C^+. One can typically take z_k=\frac{1}{k}+i for all k \in \mathbb N and z^*=i. Then, (2) ensures that for any k \in \mathbb N, there exists some set \Omega_k with \mathbb P(\Omega_k)=1 such that

(3)   \begin{equation*}g_{n}(z_k) \underset{N,n \to \infty}{\longrightarrow} g_{MP}(z_k)\end{equation*}


holds for any \omega \in \Omega_k. Let us denote now \Omega:= \bigcap_{k \in \mathbb N} \Omega_k and D = \bigcup_{k \in \mathbb N} \{z_k\} \cup \{z^*\}. We get that \mathbb P(\Omega)=1 and for all \omega \in \Omega, it holds

    \[\forall z \in D, \quad g_{n}(z) \underset{N,n \to \infty}{\longrightarrow} g_{MP}(z),\]


(note that we used the continuity of the functions (g_n)_n and g_{MP} to ensure that (3) implies that for all \omega \in \Omega, it holds g_{n}(z^*) \underset{N,n \to \infty}{\to} g_{MP}(z^*)).

We will then conclude the proof of the Theorem using the Theorem of the previous section (called Weak convergence and pointwise convergence of Stieltjes Transform).

To prove (2), we use the decomposition

    \[\forall z \in \mathbb C^+, \quad g_n(z) -g_{MP}(z) = \underbrace{g_n(z) - \mathbb E [ g_n(z)]}_{=(1)} + \underbrace{\mathbb E[g_n(z)] - g_{MP}(z)}_{=(2)}.\]

  1. To deal with the term (1), we use the Efron-Stein inequality to show that

        \[Var(g_n(z)) = \mathcal O\left(\frac{1}{n^2}\right), \]


    which allows us to prove that almost surely g_n(z) - \mathbb E [ g_n(z)] tends to 0, using the Borel-Cantelli Lemma.
  2. We then show that \mathbb E g_n(z) satisfies

    (4)   \begin{equation*}  z \sigma^2 c_n \left( \mathbb E g_n(z) \right)^2+ \left[ z + \sigma^2(c_n-1)\right] \mathbb E g_n(z)+1 = \mathcal O_z(n^{-1/2}),\end{equation*}

    and that g_{MP} satisfies the equation

    (5)   \begin{equation*}  z \sigma^2 c \left( g_{MP}(z) \right)^2+ \left[ z + \sigma^2(c-1)\right] g_{MP}(z)+1 =0. \end{equation*}

  3. To conclude the proof (i.e. to get (2)), we use a “stability” result. More precisely, equations (4) and (5) are close, and we need to show that their solutions are, as a consequence, close to each other. We formalize this with the following Lemma.

Lemma
Let z\in \mathbb C^+. We assume that there exist two Stieltjes Transforms of probability measures on \mathbb R^+, denoted X and X_{\delta}, that respectively solve the equations

    \begin{align*}z \sigma^2 c X^2+ \left[ z + \sigma^2(c-1)\right]X+1 &=0 \\z \sigma^2 c_{\delta} X_{\delta}^2+ \left[ z + \sigma^2(c_{\delta}-1)\right]X_{\delta}+1 &=\delta ,\end{align*}


where c,c_{\delta}>0. Then

    \[|X-X_{\delta}| = \mathcal O_z(|\delta|)+\mathcal O_z(|c-c_{\delta}|).\]
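Equation (5) determines g_{MP} as the root of a quadratic lying in \mathbb C^+, which suggests a direct numerical check of (2): solve the quadratic and compare with g_n(z) = \frac{1}{N}\sum_i \frac{1}{\lambda_i - z} for a large simulated matrix. A sketch (Gaussian entries, \sigma^2 = 1; the sizes, seed and the point z are arbitrary choices):

```python
import numpy as np

def g_mp(z, c, sigma2=1.0):
    """Root in C^+ of  z*sigma2*c*g^2 + [z + sigma2*(c-1)]*g + 1 = 0  (equation (5))."""
    a = z * sigma2 * c
    b = z + sigma2 * (c - 1.0)
    disc = np.sqrt(b * b - 4.0 * a + 0j)
    for root in ((-b + disc) / (2.0 * a), (-b - disc) / (2.0 * a)):
        if root.imag > 0:
            return root
    raise ValueError("no root in the upper half-plane")

rng = np.random.default_rng(1)
N, n = 1500, 3000
X = rng.normal(size=(N, n))
eigs = np.linalg.eigvalsh(X @ X.T / n)
z = 1.0 + 0.5j
g_emp = complex(np.mean(1.0 / (eigs - z)))   # g_n(z), Stieltjes transform of L_N
g_lim = g_mp(z, N / n)
residual = abs(z * (N / n) * g_lim**2 + (z + (N / n - 1.0)) * g_lim + 1.0)
err = abs(g_emp - g_lim)                     # should be small for large N, n
```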

Let

    \[\lambda_{\max} = \lambda_{\max} \left( \frac{1}{n} X_N X_N^* \right),\]


and

    \[\lambda_{\min} = \lambda_{\min} \left( \frac{1}{n} X_N X_N^* \right).\]

Theorem (convergence of extremal eigenvalues)

  • If \mathbb E |X_{i,j}|^4<\infty, then

        \[\lambda_{\max}  \overset{a.s.}{\underset{N,n \to \infty}{\to}} \sigma^2(1+\sqrt{c})^2 \quad \text{and} \quad \lambda_{\min} \overset{a.s.}{\underset{N,n \to \infty}{\to}} \sigma^2(1-\sqrt{c})^2.\]

  • If \mathbb E |X_{i,j}|^4=\infty, then

        \[\lambda_{\max} \overset{a.s.}{\underset{N,n \to \infty}{\to}} +\infty \quad \text{and} \quad \lambda_{\min} \overset{a.s.}{\underset{N,n \to \infty}{\to}} \sigma^2(1-\sqrt{c})^2.\]

Remark.
Exactly as for Wigner matrices, when the fourth moment of the random variables X_{i,j} is not finite, \lambda_{\max} goes to +\infty. However, contrary to the Wigner case, for Wishart matrices \lambda_{\min} still converges to a finite value.

Theorem (Fluctuations of \lambda_{\max} and Tracy-Widom distribution)
We can fully describe the fluctuations of \lambda_{\max}:

    \[\frac{N^{2/3}}{\Theta_N} \left\{  \lambda_{\max } - \sigma^2(1+\sqrt{c_n})^2\right\} \overset{\mathcal D}{\underset{N,n\to \infty}{\to}} \mathbb P_{TW},\]


where

    \[c_n=\frac{N}{n} \quad \text{and} \quad  \Theta_N = \sigma^2(1+\sqrt{c_n})^2\left( \frac{1}{\sqrt{c_n}}+1\right)^{1/3}.\]
