In many situations, there is a symmetric matrix of interest $A \in \mathbb{R}^{d \times d}$, but one only has a perturbed version of it, $\hat{A} = A + E$. How are the eigenvalues and eigenvectors of $A$ affected by the perturbation $E$?
The basic example is Principal Component Analysis (PCA). Let $A = \mathbb{E}[X X^\top]$ for some random vector $X \in \mathbb{R}^d$, and let $\hat{A}$ be the sample covariance matrix based on $n$ independent copies of $X$. Namely, we observe i.i.d. random variables $X_1, \dots, X_n$ distributed as $X$ and we set
$$\hat{A} = \frac{1}{n} \sum_{i=1}^n X_i X_i^\top.$$
If $X$ is concentrated on a low-dimensional subspace, then we can hope to discover this subspace from the principal components of $\hat{A}$. How accurate is the subspace we find?
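To make this concrete, here is a minimal NumPy sketch (the sizes and the low-rank-plus-noise model for $X$ are made up purely for illustration) that forms $\hat{A}$ from a sample and extracts the top principal components:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 20, 3   # illustrative sizes: n samples in R^d, target dimension k

# X is (approximately) concentrated on a k-dimensional subspace, plus a little noise.
B = rng.standard_normal((d, k))
X = rng.standard_normal((n, k)) @ B.T + 0.1 * rng.standard_normal((n, d))

# Sample second-moment matrix  A_hat = (1/n) * sum_i X_i X_i^T
A_hat = X.T @ X / n

# np.linalg.eigh returns eigenvalues in increasing order; keep the top-k eigenvectors.
eigvals, eigvecs = np.linalg.eigh(A_hat)
U_hat = eigvecs[:, -k:]   # orthonormal basis for the estimated principal subspace
print(eigvals[::-1][:k])  # the k leading eigenvalues
```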
1. Some intuition
An eigenspace of $A$ is the span of some eigenvectors of $A$. We can decompose $A$ into its action on an eigenspace $S$ and its action on the orthogonal complement $S^\perp$:
$$A = U \Lambda U^\top + U_\perp \Lambda_\perp U_\perp^\top,$$
where the columns of $U$ form an orthonormal basis for $S$ (e.g., the eigenvectors of $A$ that span $S$), and the columns of $U_\perp$ form an orthonormal basis for $S^\perp$ (this follows from the spectral theorem). We can similarly decompose $\hat{A}$ with respect to a “corresponding” eigenspace $\hat{S}$ (with $\dim \hat{S} = \dim S$):
$$\hat{A} = \hat{U} \hat{\Lambda} \hat{U}^\top + \hat{U}_\perp \hat{\Lambda}_\perp \hat{U}_\perp^\top.$$
Suppose we find a few eigenvalues of $\hat{A}$ that somehow stand out from the rest. For instance, as in PCA, we may find the first few eigenvalues to be much larger than the rest. Let $\hat{S}$ be the corresponding eigenspace. If there is a similarly outstanding group of eigenvalues of $A$, then the hope is that the corresponding eigenspace $S$ will be close to $\hat{S}$ in some sense. For instance, we may be interested in how well $\hat{S}$ approximates vectors in $S$. Any vector in $S$ can be written as $Uv$ for some $v$, and the projection of this vector onto $\hat{S}$ is $\hat{U} \hat{U}^\top U v$. Then
$$\| Uv - \hat{U} \hat{U}^\top U v \| = \| \hat{U}_\perp \hat{U}_\perp^\top U v \| = \| \hat{U}_\perp^\top U v \| \le \| \hat{U}_\perp^\top U \|_2 \, \| v \|.$$
Therefore vectors in $S$ will be well-approximated by $\hat{S}$ if $\| \hat{U}_\perp^\top U \|_2$ is “small”.
The condition we will need is separation between the eigenvalues corresponding to $S$ and those corresponding to $\hat{S}^\perp$. Suppose the eigenvalues corresponding to $S$ (the diagonal entries of $\Lambda$) are all contained in an interval $[a, b]$. Then we will require that the eigenvalues corresponding to $\hat{S}^\perp$ (the diagonal entries of $\hat{\Lambda}_\perp$) be excluded from the interval $(a - \delta, b + \delta)$ for some $\delta > 0$. To see why this is necessary, consider the following example:
$$A = \begin{pmatrix} 1 + \epsilon & 0 \\ 0 & 1 - \epsilon \end{pmatrix}, \qquad E = \begin{pmatrix} -\epsilon & \epsilon \\ \epsilon & \epsilon \end{pmatrix}, \qquad \hat{A} = A + E = \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}.$$
Here, the “size” of $E$ is comparable to the gap between the relevant eigenvalues. The eigenvalues of $A$ are $1 + \epsilon$ and $1 - \epsilon$, and its corresponding eigenvectors are $e_1$ and $e_2$. The eigenvalues of $\hat{A}$ are also $1 + \epsilon$ and $1 - \epsilon$, but its corresponding eigenvectors are $(e_1 + e_2)/\sqrt{2}$ and $(e_1 - e_2)/\sqrt{2}$. Taking $S = \mathrm{span}(e_1)$ and $\hat{S} = \mathrm{span}(e_1 + e_2)$, we have $\| \hat{U}_\perp^\top U \|_2 = 1/\sqrt{2}$, which can be arbitrarily large relative to $\| E \|_2 = \sqrt{2}\,\epsilon$.
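A quick numerical check of this example (a NumPy sketch; the value of eps is arbitrary):

```python
import numpy as np

eps = 1e-3
A     = np.array([[1 + eps, 0.0], [0.0, 1 - eps]])
E     = np.array([[-eps, eps], [eps, eps]])
A_hat = A + E                              # equals [[1, eps], [eps, 1]]

# Leading eigenvectors of A and A_hat (np.linalg.eigh sorts eigenvalues increasingly).
u     = np.linalg.eigh(A)[1][:, -1]        # ~ e_1
u_hat = np.linalg.eigh(A_hat)[1][:, -1]    # ~ (e_1 + e_2)/sqrt(2)

print(abs(u @ u_hat))                      # ~ 0.707: the top eigenvector rotated by 45 degrees
print(np.linalg.norm(E, 2))                # ~ sqrt(2)*eps: the perturbation is tiny
```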
2. Distances between subspaces
Let $S$ and $\hat{S}$ be $k$-dimensional subspaces of $\mathbb{R}^d$, with $1 \le k \le d$. Let $\Pi$ and $\hat{\Pi}$ be the orthogonal projectors onto these two subspaces.
We first consider the angle between two vectors, that is, the case $k = 1$. Then $S$ and $\hat{S}$ are spanned by unit vectors. Denote these two vectors by $u$ and $\hat{u}$. The angle between $S$ and $\hat{S}$ is defined as:
$$\angle(S, \hat{S}) = \arccos |\langle u, \hat{u} \rangle|.$$
Now, we need to extend this concept to subspaces (when $k > 1$). Let $U$ and $\hat{U}$ be $d \times k$ matrices with orthonormal columns such that range($U$) $= S$ and range($\hat{U}$) $= \hat{S}$. Then, the projectors can be written as $\Pi = U U^\top$ and $\hat{\Pi} = \hat{U} \hat{U}^\top$.
Now, we define the angles between the subspaces $S$ and $\hat{S}$ as the following:
Definition
The canonical or principal angles between $S$ and $\hat{S}$ are:
$$\theta_i := \arccos(\sigma_i), \qquad i = 1, \dots, k,$$
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$ are the singular values of $U^\top \hat{U}$ or, equivalently, $\hat{U}^\top U$.
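In code, the canonical angles can be read off from the singular values of $U^\top \hat{U}$; the following NumPy sketch uses two random subspaces purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 10, 3   # ambient dimension and subspace dimension (arbitrary)

# Orthonormal bases for two random k-dimensional subspaces of R^d.
U, _     = np.linalg.qr(rng.standard_normal((d, k)))
U_hat, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Cosines of the canonical angles are the singular values of U^T U_hat.
sigma = np.linalg.svd(U.T @ U_hat, compute_uv=False)
theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # canonical angles, smallest first
print(theta)
```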
A general result in linear algebra known as the CS decomposition gives the following: there exist orthogonal matrices $V \in \mathbb{R}^{k \times k}$, $V_\perp \in \mathbb{R}^{(d-k) \times (d-k)}$, and $W \in \mathbb{R}^{k \times k}$ such that
$$V^\top (\hat{U}^\top U) W = \cos\Theta \qquad \text{and} \qquad V_\perp^\top (\hat{U}_\perp^\top U) W = \begin{pmatrix} \sin\Theta \\ 0 \end{pmatrix},$$
where $\Theta = \mathrm{diag}(\theta_1, \dots, \theta_k)$, $\cos\Theta = \mathrm{diag}(\cos\theta_1, \dots, \cos\theta_k)$, and $\sin\Theta = \mathrm{diag}(\sin\theta_1, \dots, \sin\theta_k)$ (assuming $d - k \ge k$; otherwise the zero block is absent). In particular, the singular values of $\hat{U}_\perp^\top U$ are exactly $\sin\theta_1, \dots, \sin\theta_k$.
Another way of defining canonical angles is the following:
Definition
The canonical angles between the spaces $S$ and $\hat{S}$ are $\theta_i := \arcsin(s_i)$ for $i = 1, \dots, k$, where $s_1, \dots, s_k$ are the singular values of $\hat{U}_\perp^\top U$.
Now, given the definition of the canonical angles, we can define the distance between the subspaces $S$ and $\hat{S}$ as the following.
Definition
The distance between $S$ and $\hat{S}$ is $\mathrm{dist}(S, \hat{S}) := \| \sin\Theta \|_2$, the sine of the largest canonical angle, which is a metric over the space of $k$-dimensional linear subspaces of $\mathbb{R}^d$. Equivalently,
$$\mathrm{dist}(S, \hat{S}) = \| \hat{U}_\perp^\top U \|_2 = \| \hat{\Pi} - \Pi \|_2.$$
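These equivalent expressions are easy to check numerically; the following NumPy sketch (again with two random subspaces) computes the distance three ways:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k = 10, 3
U, _     = np.linalg.qr(rng.standard_normal((d, k)))
U_hat, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Sine of the largest canonical angle, from the cosines (singular values of U^T U_hat).
cosines    = np.linalg.svd(U.T @ U_hat, compute_uv=False)
dist_angle = np.sqrt(1.0 - cosines.min() ** 2)

# Equivalent expressions via projectors.
P, P_hat  = U @ U.T, U_hat @ U_hat.T
dist_proj = np.linalg.norm(P - P_hat, 2)               # ||Pi_hat - Pi||_2
dist_perp = np.linalg.norm((np.eye(d) - P_hat) @ U, 2) # equals ||U_hat_perp^T U||_2

print(dist_angle, dist_proj, dist_perp)  # all three agree (up to rounding)
```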
3. Davis-Kahan Theorem
Theorem
Let $A = U \Lambda U^\top + U_\perp \Lambda_\perp U_\perp^\top$ and $\hat{A} = \hat{U} \hat{\Lambda} \hat{U}^\top + \hat{U}_\perp \hat{\Lambda}_\perp \hat{U}_\perp^\top$ be symmetric matrices with $[U \; U_\perp]$ and $[\hat{U} \; \hat{U}_\perp]$ orthogonal matrices. If the eigenvalues of $\Lambda$ are contained in an interval $[a, b]$, and the eigenvalues of $\hat{\Lambda}_\perp$ are excluded from the interval $(a - \delta, b + \delta)$ for some $\delta > 0$, then
$$\| \hat{U}_\perp^\top U \| \le \frac{\| \hat{U}_\perp^\top (\hat{A} - A) U \|}{\delta} \le \frac{\| \hat{A} - A \|}{\delta}$$
for any unitarily invariant norm $\| \cdot \|$.
Proof.
Since $A U = U \Lambda$, we have $\hat{U}_\perp^\top A U = \hat{U}_\perp^\top U \Lambda$. Furthermore, $\hat{U}_\perp^\top \hat{A} = \hat{\Lambda}_\perp \hat{U}_\perp^\top$, so
$$\hat{U}_\perp^\top (\hat{A} - A) U = \hat{\Lambda}_\perp \hat{U}_\perp^\top U - \hat{U}_\perp^\top U \Lambda.$$
Let $\bar{\Lambda} := \Lambda - \frac{a+b}{2} I$ and $\bar{\Lambda}_\perp := \hat{\Lambda}_\perp - \frac{a+b}{2} I$. By the triangle inequality, we have
$$\| \hat{U}_\perp^\top (\hat{A} - A) U \| = \| \bar{\Lambda}_\perp \hat{U}_\perp^\top U - \hat{U}_\perp^\top U \bar{\Lambda} \| \ge \| \bar{\Lambda}_\perp \hat{U}_\perp^\top U \| - \| \hat{U}_\perp^\top U \bar{\Lambda} \|.$$
Here we have used a centering trick so that $\bar{\Lambda}$ has eigenvalues contained in $[-\frac{b-a}{2}, \frac{b-a}{2}]$, and
$\bar{\Lambda}_\perp$ has eigenvalues excluded from $(-\frac{b-a}{2} - \delta, \frac{b-a}{2} + \delta)$. This implies that
$$\| \bar{\Lambda}_\perp \hat{U}_\perp^\top U \| \ge \left( \tfrac{b-a}{2} + \delta \right) \| \hat{U}_\perp^\top U \| \qquad \text{and} \qquad \| \hat{U}_\perp^\top U \bar{\Lambda} \| \le \tfrac{b-a}{2} \| \hat{U}_\perp^\top U \|,$$
respectively (the first bound uses the fact that $\bar{\Lambda}_\perp$ is invertible with $\| \bar{\Lambda}_\perp^{-1} \|_2 \le (\frac{b-a}{2} + \delta)^{-1}$). Therefore
$$\| \hat{U}_\perp^\top (\hat{A} - A) U \| \ge \left( \tfrac{b-a}{2} + \delta \right) \| \hat{U}_\perp^\top U \| - \tfrac{b-a}{2} \| \hat{U}_\perp^\top U \| = \delta \, \| \hat{U}_\perp^\top U \|.$$
We conclude that
$$\| \hat{U}_\perp^\top U \| \le \frac{\| \hat{U}_\perp^\top (\hat{A} - A) U \|}{\delta} \le \frac{\| \hat{A} - A \|}{\delta}.$$
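To see the bound in action, here is a small numerical sanity check (a NumPy sketch; the diagonal matrix, the interval $[a, b] = [4.8, 5.0]$, and the noise level are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 6, 2
A = np.diag([5.0, 4.8, 1.0, 0.8, 0.5, 0.2])   # eigenvalues of Lambda lie in [a, b] = [4.8, 5.0]
G = rng.standard_normal((d, d))
E = 0.1 * (G + G.T) / 2                        # a small symmetric perturbation
A_hat = A + E

a, b = 4.8, 5.0
U = np.eye(d)[:, :k]                           # eigenvectors of A for the eigenvalues in [a, b]

w, V_hat   = np.linalg.eigh(A_hat)             # eigenvalues in increasing order
U_hat_perp = V_hat[:, :-k]                     # eigenvectors of A_hat outside the top group
w_perp     = w[:-k]

# Largest delta for which the eigenvalues of Lambda_hat_perp avoid (a - delta, b + delta).
delta = np.min(np.maximum(a - w_perp, w_perp - b))

lhs = np.linalg.norm(U_hat_perp.T @ U, 2)      # ||U_hat_perp^T U||_2
rhs = np.linalg.norm(E, 2) / delta             # ||E||_2 / delta
print(lhs, "<=", rhs)
```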
Another version of the Davis-Kahan Theorem, which is more popular among statisticians, is the following.
Theorem
Let $A$ and $\hat{A} = A + E$ be symmetric matrices with respective eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$ and $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_d$.
Fix $1 \le r \le s \le d$, and let $U = (u_r, \dots, u_s)$ and $\hat{U} = (\hat{u}_r, \dots, \hat{u}_s)$ be matrices with orthonormal columns of eigenvectors corresponding to the eigenvalues $\lambda_r, \dots, \lambda_s$ and $\hat{\lambda}_r, \dots, \hat{\lambda}_s$.
Let $S$ and $\hat{S}$ be the subspaces spanned by the columns of $U$ and $\hat{U}$. Define the eigengap as
$$\delta := \min( \lambda_{r-1} - \lambda_r, \; \lambda_s - \lambda_{s+1} ),$$
where we define $\lambda_0 := +\infty$ and $\lambda_{d+1} := -\infty$.
If $\delta > 0$, then
$$\| \sin\Theta(S, \hat{S}) \|_F \le \frac{2 \| E \|_F}{\delta}.$$
The result also holds for the operator norm $\| \cdot \|_2$ (with $\| E \|_2$ in place of $\| E \|_F$).
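As a sanity check, the following NumPy sketch verifies this version of the bound on a random instance (the matrix $A$, the indices $r, s$, and the noise level are arbitrary; $\sin\Theta$ is computed from the singular values of $U^\top \hat{U}$ as in Section 2):

```python
import numpy as np

rng = np.random.default_rng(4)
d, r, s = 8, 1, 3                 # target the eigenspace of lambda_r, ..., lambda_s (arbitrary)
A = np.diag([10.0, 9.0, 8.0, 1.0, 0.9, 0.5, 0.2, 0.1])
G = rng.standard_normal((d, d))
E = 0.05 * (G + G.T) / 2
A_hat = A + E

# eigh sorts eigenvalues in increasing order; flip to the decreasing convention used above.
lam, V         = np.linalg.eigh(A)
lam, V         = lam[::-1], V[:, ::-1]
lam_hat, V_hat = np.linalg.eigh(A_hat)
lam_hat, V_hat = lam_hat[::-1], V_hat[:, ::-1]

U, U_hat = V[:, r - 1:s], V_hat[:, r - 1:s]

# Eigengap delta = min(lambda_{r-1} - lambda_r, lambda_s - lambda_{s+1}),
# with the conventions lambda_0 = +inf and lambda_{d+1} = -inf.
lam_ext = np.concatenate(([np.inf], lam, [-np.inf]))
delta = min(lam_ext[r - 1] - lam_ext[r], lam_ext[s] - lam_ext[s + 1])

cosines     = np.linalg.svd(U.T @ U_hat, compute_uv=False)
sin_theta_F = np.linalg.norm(np.sqrt(np.clip(1.0 - cosines**2, 0.0, None)))
print(sin_theta_F, "<=", 2 * np.linalg.norm(E, "fro") / delta)
```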
By Weyl’s theorem, the eigenvalues of $A$ and $\hat{A} = A + E$ satisfy $|\hat{\lambda}_i - \lambda_i| \le \| E \|_2$ for all $i$. Using this, one can show that a sufficient (but not necessary) condition for the eigenvalue-separation assumption in the first Davis-Kahan theorem above is that the eigenvalues of $\Lambda_\perp$ be excluded from the enlarged interval $(a - \delta - \| E \|_2, \; b + \delta + \| E \|_2)$.
When the matrices are not symmetric, there exists a generalized version of the Davis-Kahan Theorem for singular subspaces, called Wedin’s theorem.