In many situations, there is a symmetric matrix of interest $A \in \mathbb{R}^{p \times p}$, but one only has access to a perturbed version of it, $\hat{A} = A + E$. How is the spectral decomposition of $A$ affected by the perturbation $E$?

The basic example is Principal Component Analysis (PCA). Let $A = \mathbb{E}[XX^\top]$ for some random vector $X \in \mathbb{R}^p$, and let $\hat{A}$ be the sample covariance matrix based on $n$ independent copies of $X$. Namely, we observe i.i.d. random variables $X_1, \dots, X_n$ distributed as $X$ and we set

$$\hat{A} \;=\; \frac{1}{n} \sum_{i=1}^{n} X_i X_i^\top.$$

If $X$ is concentrated on a low-dimensional subspace, then we can hope to discover this subspace from the principal components of $\hat{A}$, i.e., from its top eigenvectors. How accurate is the subspace we find?
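As a quick illustration of this setup, here is a minimal numpy sketch of PCA as covariance estimation. The population covariance, dimensions, and sample size below are made up for illustration: the signal lives on a 2-dimensional coordinate subspace, and we compare the top eigenspace of the sample covariance against it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Population covariance A = E[X X^T]: two large eigenvalues (the signal
# subspace) and three small ones.  Values are illustrative only.
p, n = 5, 20000
A = np.diag([5.0, 4.0, 0.1, 0.1, 0.1])

# Draw n i.i.d. copies of X ~ N(0, A) and form the sample covariance.
X = rng.multivariate_normal(np.zeros(p), A, size=n)
A_hat = X.T @ X / n  # hat{A} = (1/n) sum_i X_i X_i^T

# Principal components: top-2 eigenvectors of A_hat vs. the true subspace.
eigvals, eigvecs = np.linalg.eigh(A_hat)   # eigenvalues in ascending order
U1_hat = eigvecs[:, -2:]                   # estimated top-2 eigenspace
U1 = np.eye(p)[:, :2]                      # true top-2 eigenspace (e1, e2)

# Largest principal angle between the two subspaces (defined in Section 2).
s = np.linalg.svd(U1.T @ U1_hat, compute_uv=False)
max_angle = np.arccos(np.clip(s.min(), -1.0, 1.0))
print(max_angle)  # small when n is large relative to p
```

With a large sample the estimated subspace is close to the truth; how close, as a function of the perturbation $\hat{A} - A$, is exactly what the rest of this post quantifies.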

## 1. Some intuition

An eigenspace $S$ of $A$ is the span of some eigenvectors of $A$. We can decompose $A$ into its action on an eigenspace $S$ and its action on the orthogonal complement $S^\perp$:

$$A \;=\; U_1 \Lambda_1 U_1^\top + U_2 \Lambda_2 U_2^\top,$$

where the columns of $U_1$ form an orthonormal basis for $S$ (e.g., the eigenvectors of $A$ that span $S$), the columns of $U_2$ form an orthonormal basis for $S^\perp$, and $\Lambda_1, \Lambda_2$ are diagonal (this decomposition follows from the spectral theorem). We can similarly decompose $\hat{A}$ with respect to a “corresponding” eigenspace $\hat{S}$ (with $\dim \hat{S} = \dim S$):

$$\hat{A} \;=\; \hat{U}_1 \hat{\Lambda}_1 \hat{U}_1^\top + \hat{U}_2 \hat{\Lambda}_2 \hat{U}_2^\top.$$

Suppose we find a few eigenvalues of $\hat{A}$ that somehow stand out from the rest. For instance, as in PCA, we may find the first few eigenvalues to be much larger than the rest. Let $\hat{S}$ be the corresponding eigenspace. If there is a similarly outstanding group of eigenvalues of $A$, then the hope is that the corresponding eigenspace $S$ will be close to $\hat{S}$ in some sense. For instance, we may be interested in how well $\hat{S}$ approximates vectors in $S$. Any vector in $S$ can be written as $U_1 x$ for some $x$, and the projection of this vector onto $\hat{S}$ is $\hat{U}_1 \hat{U}_1^\top U_1 x$. Then

$$\| U_1 x - \hat{U}_1 \hat{U}_1^\top U_1 x \| \;=\; \| \hat{U}_2 \hat{U}_2^\top U_1 x \| \;=\; \| \hat{U}_2^\top U_1 x \| \;\le\; \| \hat{U}_2^\top U_1 \| \, \| x \|.$$

Therefore, vectors in $S$ will be well-approximated by their projections onto $\hat{S}$ whenever $\| \hat{U}_2^\top U_1 \|$ is “small”.
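This projection-error identity is easy to check numerically. Below is a small sketch with randomly generated orthonormal bases (all names and dimensions are illustrative): the residual of projecting a vector of $S$ onto $\hat{S}$ has norm $\|\hat{U}_2^\top U_1 x\|$, bounded by $\|\hat{U}_2^\top U_1\|\,\|x\|$.

```python
import numpy as np

rng = np.random.default_rng(1)

# U1 spans a random 2-dim subspace S of R^6; [U1_hat, U2_hat] is an
# orthogonal basis adapted to a second subspace S_hat and its complement.
p, d = 6, 2
U1 = np.linalg.qr(rng.standard_normal((p, d)))[0]
Q = np.linalg.qr(rng.standard_normal((p, p)))[0]
U1_hat, U2_hat = Q[:, :d], Q[:, d:]

x = rng.standard_normal(d)
v = U1 @ x                              # a vector in S
residual = v - U1_hat @ (U1_hat.T @ v)  # v minus its projection onto S_hat

# (I - U1_hat U1_hat^T) = U2_hat U2_hat^T, so the residual norm equals
# ||U2_hat^T U1 x||, which is at most ||U2_hat^T U1||_2 * ||x||.
lhs = np.linalg.norm(residual)
rhs = np.linalg.norm(U2_hat.T @ U1, 2) * np.linalg.norm(x)
print(lhs, rhs)
```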

The condition we will need is separation between the eigenvalues corresponding to $S$ and those corresponding to $\hat{S}^\perp$. Suppose the eigenvalues of $\Lambda_1$ are all contained in an interval $[a, b]$. Then we will require that the eigenvalues of $\hat{\Lambda}_2$ be excluded from the interval $(a - \delta, b + \delta)$ for some $\delta > 0$. To see why this is necessary, consider the following example:

$$A \;=\; \begin{pmatrix} 1 + \epsilon & 0 \\ 0 & 1 - \epsilon \end{pmatrix}, \qquad E \;=\; \begin{pmatrix} -\epsilon & \epsilon \\ \epsilon & \epsilon \end{pmatrix}, \qquad \hat{A} \;=\; A + E \;=\; \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}.$$

Here, the “size” of $E$ is comparable to the gap between the relevant eigenvalues. The eigenvalues of $A$ are $1 + \epsilon$ and $1 - \epsilon$, and its corresponding eigenvectors are $e_1 = (1, 0)^\top$ and $e_2 = (0, 1)^\top$. The eigenvalues of $\hat{A}$ are also $1 + \epsilon$ and $1 - \epsilon$, but its corresponding eigenvectors are $(1, 1)^\top/\sqrt{2}$ and $(1, -1)^\top/\sqrt{2}$. Taking $S = \mathrm{span}(e_1)$ and $\hat{S} = \mathrm{span}((1, 1)^\top/\sqrt{2})$, we have $\| \hat{U}_2^\top U_1 \| = 1/\sqrt{2}$, which can be arbitrarily large relative to $\|E\|_2 = \sqrt{2}\,\epsilon$.
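A quick numerical check of this example (the value of `eps` is arbitrary): no matter how small the perturbation is, the top eigenvector of $\hat{A}$ stays at a $45°$ angle to that of $A$.

```python
import numpy as np

eps = 1e-3
A = np.array([[1 + eps, 0.0], [0.0, 1 - eps]])
A_hat = np.array([[1.0, eps], [eps, 1.0]])
E = A_hat - A

# Top eigenvector of A is e1; top eigenvector of A_hat is (1, 1)/sqrt(2).
u1 = np.linalg.eigh(A)[1][:, -1]
u1_hat = np.linalg.eigh(A_hat)[1][:, -1]

cos_angle = np.abs(u1 @ u1_hat)   # stays at 1/sqrt(2), independent of eps
print(cos_angle, np.linalg.norm(E, 2))
```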

## 2. Distances between subspaces

Let $S$ and $\hat{S}$ be $d$-dimensional subspaces of $\mathbb{R}^p$, with $d \le p$. Let $\Pi$ and $\hat{\Pi}$ be the orthogonal projectors onto these two subspaces.

We first consider the angle between two vectors, that is, the case $d = 1$. Then $S$ and $\hat{S}$ are each spanned by a single unit vector; denote these two vectors by $u$ and $\hat{u}$. The angle between $S$ and $\hat{S}$ is defined as:

$$\theta(S, \hat{S}) \;=\; \arccos |u^\top \hat{u}|.$$

Now, we need to extend this concept to subspaces (when $d > 1$). Let $U$ and $\hat{U}$ be $p \times d$ matrices with orthonormal columns such that range($U$) $= S$ and range($\hat{U}$) $= \hat{S}$. Then, the projectors can be written as $\Pi = UU^\top$ and $\hat{\Pi} = \hat{U}\hat{U}^\top$.

Now, we define the angles between the subspaces $S$ and $\hat{S}$ as follows.

**Definition**

The canonical or principal angles between $S$ and $\hat{S}$ are

$$\theta_i \;=\; \arccos \sigma_i, \qquad i = 1, \dots, d,$$

where $\sigma_1 \ge \cdots \ge \sigma_d$ are the singular values of $U^\top \hat{U}$ or $\hat{U}^\top U$.

A general result known as the CS decomposition in linear algebra gives the following: if $\hat{U}_\perp$ is a $p \times (p - d)$ matrix with orthonormal columns such that range($\hat{U}_\perp$) $= \hat{S}^\perp$, then the singular values of $\hat{U}_\perp^\top U$ are

$$\sin \theta_1, \dots, \sin \theta_d,$$

where

$$\sin \Theta(S, \hat{S}) \;:=\; \mathrm{diag}(\sin \theta_1, \dots, \sin \theta_d)$$

collects the sines of the canonical angles.

Another way of defining canonical angles is the following:

**Definition**

The canonical angles between the spaces $S$ and $\hat{S}$ are $\theta_i = \arcsin \sigma_i$ for $i = 1, \dots, d$, where $\sigma_1 \ge \cdots \ge \sigma_d$ are the singular values of $\hat{U}_\perp^\top U$.
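The two definitions produce the same set of angles, as the CS decomposition promises. Here is a small numerical sketch with random subspaces (names and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two random d-dimensional subspaces of R^p, given by orthonormal bases.
p, d = 7, 3
U = np.linalg.qr(rng.standard_normal((p, d)))[0]
Q = np.linalg.qr(rng.standard_normal((p, p)))[0]
U_hat, U_hat_perp = Q[:, :d], Q[:, d:]

# Definition 1: theta_i = arccos(sigma_i(U^T U_hat)).
cos_vals = np.linalg.svd(U.T @ U_hat, compute_uv=False)
theta_cos = np.arccos(np.clip(cos_vals, 0.0, 1.0))   # ascending angles

# Definition 2: theta_i = arcsin(sigma_i(U_hat_perp^T U)).
sin_vals = np.linalg.svd(U_hat_perp.T @ U, compute_uv=False)
theta_sin = np.sort(np.arcsin(np.clip(sin_vals, 0.0, 1.0)))

print(np.allclose(theta_cos, theta_sin))  # the two definitions agree
```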

Now, given the definition of the canonical angles, we can define the distance between the subspaces $S$ and $\hat{S}$ as follows.

**Definition**

The distance between $S$ and $\hat{S}$ is

$$d(S, \hat{S}) \;:=\; \| \sin \Theta(S, \hat{S}) \|_2 \;=\; \sin \theta_{\max},$$

where $\theta_{\max}$ is the largest canonical angle; this is a metric over the space of $d$-dimensional linear subspaces of $\mathbb{R}^p$. Equivalently,

$$d(S, \hat{S}) \;=\; \| \Pi - \hat{\Pi} \|_2.$$
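The equivalence between the angle formulation and the projector formulation can be verified numerically (a sketch with random subspaces; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

p, d = 6, 2
U = np.linalg.qr(rng.standard_normal((p, d)))[0]
U_hat = np.linalg.qr(rng.standard_normal((p, d)))[0]
P, P_hat = U @ U.T, U_hat @ U_hat.T     # orthogonal projectors

# sin of the largest canonical angle, via sigma_min(U^T U_hat).
sigma = np.linalg.svd(U.T @ U_hat, compute_uv=False)
dist_angle = np.sin(np.arccos(np.clip(sigma.min(), 0.0, 1.0)))

# Equivalent metric: spectral norm of the difference of projectors.
dist_proj = np.linalg.norm(P - P_hat, 2)

print(dist_angle, dist_proj)  # these agree
```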

## 3. The Davis-Kahan Theorem

**Theorem**

Let $A = U_1 \Lambda_1 U_1^\top + U_2 \Lambda_2 U_2^\top$ and $\hat{A} = \hat{U}_1 \hat{\Lambda}_1 \hat{U}_1^\top + \hat{U}_2 \hat{\Lambda}_2 \hat{U}_2^\top$ be symmetric matrices with $[U_1, U_2]$ and $[\hat{U}_1, \hat{U}_2]$ orthogonal matrices. If the eigenvalues of $\Lambda_1$ are contained in an interval $[a, b]$, and the eigenvalues of $\hat{\Lambda}_2$ are excluded from the interval $(a - \delta, b + \delta)$ for some $\delta > 0$, then

$$\| \hat{U}_2^\top U_1 \| \;\le\; \frac{\| \hat{U}_2^\top (\hat{A} - A) U_1 \|}{\delta}$$

for any unitarily invariant norm $\| \cdot \|$.

**Proof.** Since $\hat{A} \hat{U}_2 = \hat{U}_2 \hat{\Lambda}_2$, we have

$$\hat{U}_2^\top \hat{A} \;=\; \hat{\Lambda}_2 \hat{U}_2^\top.$$

Furthermore, $A U_1 = U_1 \Lambda_1$, so

$$\hat{U}_2^\top (\hat{A} - A) U_1 \;=\; \hat{\Lambda}_2 \hat{U}_2^\top U_1 - \hat{U}_2^\top U_1 \Lambda_1.$$

Let $\bar{\Lambda}_1 := \Lambda_1 - \frac{a+b}{2} I$ and $\bar{\Lambda}_2 := \hat{\Lambda}_2 - \frac{a+b}{2} I$; the shifts cancel in the difference above. By the triangle inequality, we have

$$\| \hat{U}_2^\top (\hat{A} - A) U_1 \| \;=\; \| \bar{\Lambda}_2 \hat{U}_2^\top U_1 - \hat{U}_2^\top U_1 \bar{\Lambda}_1 \| \;\ge\; \| \bar{\Lambda}_2 \hat{U}_2^\top U_1 \| - \| \hat{U}_2^\top U_1 \bar{\Lambda}_1 \|.$$

Here we have used a centering trick so that $\bar{\Lambda}_1$ has eigenvalues contained in $[-\frac{b-a}{2}, \frac{b-a}{2}]$, and $\bar{\Lambda}_2$

has eigenvalues excluded from $(-\frac{b-a}{2} - \delta, \frac{b-a}{2} + \delta)$. This implies that $\| \bar{\Lambda}_1 \|_2 \le \frac{b-a}{2}$ and that every singular value of $\bar{\Lambda}_2$ is at least $\frac{b-a}{2} + \delta$, respectively. Therefore

$$\| \hat{U}_2^\top U_1 \bar{\Lambda}_1 \| \;\le\; \frac{b-a}{2} \, \| \hat{U}_2^\top U_1 \|$$

and

$$\| \bar{\Lambda}_2 \hat{U}_2^\top U_1 \| \;\ge\; \left( \frac{b-a}{2} + \delta \right) \| \hat{U}_2^\top U_1 \|.$$

We conclude that

$$\| \hat{U}_2^\top (\hat{A} - A) U_1 \| \;\ge\; \left( \frac{b-a}{2} + \delta \right) \| \hat{U}_2^\top U_1 \| - \frac{b-a}{2} \, \| \hat{U}_2^\top U_1 \| \;=\; \delta \, \| \hat{U}_2^\top U_1 \|. \qquad \blacksquare$$
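The theorem can be sanity-checked numerically. In the sketch below (all numbers are made up for illustration), $A$ has its top-2 eigenvalues in $[a, b] = [4, 5]$, the perturbation is small, and we verify the bound for the spectral norm:

```python
import numpy as np

rng = np.random.default_rng(4)

# A with top-2 eigenvalues in [a, b] = [4, 5]; remaining eigenvalues <= 1.
A = np.diag([5.0, 4.0, 1.0, 0.5])
E = 0.05 * rng.standard_normal((4, 4))
E = (E + E.T) / 2                      # symmetric perturbation
A_hat = A + E

U1 = np.eye(4)[:, :2]                  # basis for the top eigenspace of A
w, V = np.linalg.eigh(A_hat)           # ascending eigenvalues
U2_hat = V[:, :2]                      # bottom eigenspace of A_hat

# The separation hypothesis holds with delta = 2.5: the bottom eigenvalues
# of A_hat stay below a - delta = 1.5.
a, b, delta = 4.0, 5.0, 2.5
assert np.all(w[:2] < a - delta)

lhs = np.linalg.norm(U2_hat.T @ U1, 2)
rhs = np.linalg.norm(U2_hat.T @ E @ U1, 2) / delta
print(lhs, rhs)  # lhs <= rhs, as the theorem guarantees
```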

Another version of the Davis-Kahan Theorem, which is more popular in the statistics community, is the following.

**Theorem**

Let $A$ and $\hat{A}$ be $p \times p$ symmetric matrices with respective eigenvalues $\lambda_1 \ge \cdots \ge \lambda_p$ and $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_p$.

Fix $1 \le r \le s \le p$, let $d = s - r + 1$, and let $U = (u_r, \dots, u_s)$ and $\hat{U} = (\hat{u}_r, \dots, \hat{u}_s)$ be $p \times d$ matrices with orthonormal columns of eigenvectors corresponding to the eigenvalues $\lambda_r, \dots, \lambda_s$ and $\hat{\lambda}_r, \dots, \hat{\lambda}_s$.

Let $S$ and $\hat{S}$ be the subspaces spanned by the columns of $U$ and $\hat{U}$. Define the eigengap as

$$\delta \;:=\; \min\{ \lambda_{r-1} - \lambda_r, \; \lambda_s - \lambda_{s+1} \},$$

where we define $\lambda_0 := +\infty$ and $\lambda_{p+1} := -\infty$.

If $\delta > 0$, then

$$\| \sin \Theta(S, \hat{S}) \|_F \;\le\; \frac{2 \, \| \hat{A} - A \|_F}{\delta}.$$

The result also holds for the operator norm $\| \cdot \|_2$.
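This version only requires a gap in the spectrum of $A$, so it is easy to check numerically. A sketch (eigenvalues and perturbation scale are made up for illustration), taking $r = 1$, $s = 2$:

```python
import numpy as np

rng = np.random.default_rng(5)

p = 6
lam = np.array([10.0, 9.0, 5.0, 4.9, 1.0, 0.5])  # lambda_1 >= ... >= lambda_p
A = np.diag(lam)
E = 0.1 * rng.standard_normal((p, p))
E = (E + E.T) / 2
A_hat = A + E

# Target eigenvalues lambda_r, ..., lambda_s with r = 1, s = 2 (d = 2).
U = np.eye(p)[:, :2]
U_hat = np.linalg.eigh(A_hat)[1][:, ::-1][:, :2]  # top-2 eigenvectors

# Eigengap: min(lambda_0 - lambda_1, lambda_2 - lambda_3) with
# lambda_0 = +inf, so here delta = 9 - 5 = 4.
delta = lam[1] - lam[2]

# ||sin Theta||_F from the singular values of U^T U_hat.
sigma = np.linalg.svd(U.T @ U_hat, compute_uv=False)
lhs = np.linalg.norm(np.sqrt(np.clip(1 - sigma**2, 0, None)))
rhs = 2 * np.linalg.norm(E, 'fro') / delta
print(lhs, rhs)  # lhs <= rhs
```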

By **Weyl’s theorem**, $|\hat{\lambda}_i - \lambda_i| \le \| \hat{A} - A \|_2$ for every $i$. Using this, one can show that a *sufficient* (but not necessary) condition for the eigenvalue separation hypothesis in the Davis-Kahan theorem is that the eigenvalues of $\Lambda_2$ be excluded from the enlarged interval $(a - \delta - \| \hat{A} - A \|_2, \; b + \delta + \| \hat{A} - A \|_2)$.
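Weyl's inequality itself is easy to see in action (a minimal sketch with an arbitrary random example): after sorting, no eigenvalue moves by more than the spectral norm of the perturbation.

```python
import numpy as np

rng = np.random.default_rng(6)

p = 5
A = np.diag(rng.uniform(0, 10, p))
E = rng.standard_normal((p, p))
E = (E + E.T) / 2

lam = np.sort(np.linalg.eigvalsh(A))
lam_hat = np.sort(np.linalg.eigvalsh(A + E))

# Weyl: |lambda_hat_i - lambda_i| <= ||E||_2 for every i (sorted order).
max_shift = np.max(np.abs(lam_hat - lam))
print(max_shift, np.linalg.norm(E, 2))
```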

When the matrices are not symmetric, there is a generalization of the Davis-Kahan Theorem to singular vectors and singular subspaces, known as **Wedin’s theorem**.
