Quentin Duchemin

Compressed sensing: A brief introduction

Compressed sensing belongs to the large field of inverse problems. A typical example of such problems consists in determining the signal x\in \mathbb R^n that produces the measurement vector y through the linear transformation A \in \mathbb R^{m\times n}, namely y=Ax.

Of course, without additional assumptions the problem is ill-posed if the rank of the matrix A is strictly smaller than n, meaning that A does not admit a left inverse. To be able to deal with cases where rank(A)<n, we need at least to assume some structure on the original signal x and/or on the design matrix A. Compressed sensing deals with the case where m<n and where x is assumed to be sparse, meaning that only a few coefficients of x are non-zero. For any s \in [n], a vector x \in \mathbb R^n is said to be s-sparse if

    \[\# \{i \in [n] \; : \; x_i \neq 0\}\leq s.\]
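As a tiny numerical illustration (a minimal numpy sketch of ours; the helper name is not from the original text), checking s-sparsity simply amounts to counting non-zero coefficients:

    import numpy as np

    def is_s_sparse(x, s):
        """Return True if x has at most s non-zero coefficients."""
        return np.count_nonzero(x) <= s

    x = np.array([0.0, 3.2, 0.0, 0.0, -1.5])
    print(is_s_sparse(x, 2))  # True: only two coefficients are non-zero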

Denoting by a_1, \dots, a_m the m rows of the matrix A, typical problems addressed by compressed sensing are the following.

\mathcal P_1: Construct a minimal number of sensing vectors (a_1,\dots, a_m), i.e. a measurement matrix A \in \mathbb R^{m \times n} with as few rows as possible, such that any s-sparse vector x can be efficiently recovered from y= Ax.

\mathcal P_2: Given sensing vectors (a_1,\dots, a_m) forming the matrix A \in \mathbb R^{m\times n}, construct a recovery algorithm for any s-sparse vector from y= Ax; i.e. find a map \Delta:\mathbb R^m \to \mathbb R^n which can be efficiently implemented and such that for any s-sparse vector x \in \mathbb R^n, \Delta(Ax)=x.

I. l_0-minimization

Reconstructing an s-sparse vector x\in \mathbb R ^n from its measurement vector y \in \mathbb R^m amounts to
solving the following l_0-minimization problem:

(1)   \begin{align*}  \underset{z \in \mathbb R^n}{\min}& \; \|z\|_0\\  \text{s.t.} &\; y=Ax=Az \notag \end{align*}


In the following, we present the two different recovery settings that can be tackled (non-uniform and uniform), together with the minimal number of measurements needed to ensure that x is the unique optimal solution of (1).

1) Non-uniform approach

For any n \geq s+1, given an s-sparse vector x \in \mathbb R^n, there exists a measurement matrix A \in \mathbb R^{m\times n} with m=s+1 rows such that the vector x can be reconstructed from its measurement vector y =Ax \in \mathbb R^m as a solution of the l_0-minimization problem (1).

2) Uniform approach

Given some matrix A \in \mathbb R^{m \times n}, we want to ensure that we are able to recover any s-sparse vector x \in \mathbb R^n from its measurements y=Ax. The following theorem characterizes uniform recovery; since condition (iv) below requires 2s linearly independent columns, it implies in particular that m \geq 2s is necessary.

Theorem
Given A \in \mathbb R^{m\times n}, the following properties are equivalent:
(i) Every s-sparse vector x \in \mathbb R^n is the unique s-sparse solution of Az= Ax, that is, if Ax =Az and both x and z are s-sparse, then x =z.
(ii) The null space \mathrm{ker}(A) does not contain any 2s-sparse vector other than the zero vector, that is, \mathrm{ker}(A) \cap \Sigma_{2s} = \{0\}, where \Sigma_{2s}=\{z \in \mathbb R^n \; : \; \|z\|_0\leq 2s\}.
(iii) For every S \subset [n] with |S| \leq 2s, the submatrix A_S is injective as a map from \mathbb R^{|S|} to \mathbb R^m.
(iv) Every set of 2s columns of A is linearly independent.

We are now going to see that m=2s measurements suffice to reconstruct every s-sparse vector – at least in theory.

Theorem
For any integer n \geq 2s, there exists a measurement matrix A \in \mathbb R^{m\times n} with m= 2s rows such that every s-sparse vector x \in \mathbb R^n can be recovered from its measurement vector
y \in \mathbb R^m as a solution of the l_0-minimization problem (1).
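One classical construction compatible with this theorem (a sketch under our own choice of matrix, not necessarily the one the author has in mind) takes A_{ij} = t_j^{i-1} for distinct reals t_1,\dots,t_n: any 2s columns then form an invertible Vandermonde matrix, so condition (iv) of the previous theorem is satisfied. The small numpy check below verifies this on a toy example.

    import numpy as np
    from itertools import combinations

    s, n = 2, 6
    m = 2 * s
    t = np.arange(1, n + 1, dtype=float)        # distinct nodes t_1 < ... < t_n
    A = np.vander(t, N=m, increasing=True).T    # A[i, j] = t_j**i, shape (m, n)

    # Condition (iv): every set of 2s columns of A is linearly independent.
    ok = all(np.linalg.matrix_rank(A[:, list(S)]) == m
             for S in combinations(range(n), m))
    print(ok)  # True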

The optimization problem (1) is known to be NP-hard. In compressed sensing, we will rather consider special choices of A and choose y= Ax for some sparse x. We will see that a variety of tractable algorithms will then provably recover x from y and thereby solve (1) for such specifically designed matrices A. However, to emphasize this point once more, such algorithms will not successfully solve the l_0-minimization problem for all possible choices of A and y, due to NP-hardness.
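To make the combinatorial nature of (1) concrete, here is a brute-force sketch (our own illustration, only usable for very small n) that scans supports by increasing size and solves a least-squares problem on each candidate support:

    import numpy as np
    from itertools import combinations

    def l0_min_bruteforce(A, y, tol=1e-10):
        """Exhaustive search for the sparsest z with Az = y (exponential cost in n)."""
        m, n = A.shape
        for k in range(n + 1):                       # supports of increasing size
            for S in combinations(range(n), k):
                z = np.zeros(n)
                if k > 0:
                    z[list(S)] = np.linalg.lstsq(A[:, list(S)], y, rcond=None)[0]
                if np.linalg.norm(A @ z - y) <= tol:
                    return z                         # first hit = minimal support size
        return None

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 8))
    x = np.zeros(8); x[[1, 5]] = [1.0, -2.0]         # a 2-sparse signal, m = 4 = 2s
    print(l0_min_bruteforce(A, A @ x))               # recovers x (with probability 1 here)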

II. Convex relaxation through l_1-minimization

Hence, one needs to reformulate the problem so as to recover the signal x efficiently, without ending up requiring m to be too large to ensure that x is the unique optimal solution. The most natural way to do this is to propose a convex relaxation of problem (1), obtained by replacing the l_0-norm in the objective function by the l_1-norm. Note that this is the best relaxation we can propose, in the sense that the l_1-norm is the convex envelope of the l_0-norm on [-1,1]^n. This means that on [-1,1]^n, the l_1-norm is the supremum of all the affine functions that are smaller than the l_0-norm on [-1,1]^n. Another equivalent statement is that the l_1-norm coincides with the biconjugate of the l_0-norm on [-1,1]^n. The resulting l_1-minimization problem, often called basis pursuit, reads:

(2)   \begin{align*}  \underset{z \in \mathbb R^n}{\min}& \; \|z\|_1\\ \text{s.t.} &\; y=Ax=Az \notag \end{align*}
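Problem (2) can be recast as a linear program by introducing auxiliary variables t with -t \leq z \leq t and minimizing \sum_i t_i. The sketch below relies on scipy.optimize.linprog (an assumption about the available tooling, not something prescribed by the text):

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, y):
        """Solve min ||z||_1 s.t. Az = y via the LP: min sum(t) s.t. -t <= z <= t, Az = y."""
        m, n = A.shape
        c = np.concatenate([np.zeros(n), np.ones(n)])             # objective: sum of t
        I = np.eye(n)
        A_ub = np.block([[I, -I], [-I, -I]])                      # z - t <= 0 and -z - t <= 0
        b_ub = np.zeros(2 * n)
        A_eq = np.hstack([A, np.zeros((m, n))])                   # A z = y
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                      bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
        return res.x[:n]

    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 50))
    x = np.zeros(50); x[[3, 17, 31]] = [2.0, -1.0, 0.5]           # a 3-sparse signal
    print(np.allclose(basis_pursuit(A, A @ x), x, atol=1e-6))     # True with high probability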

1) Necessary condition

Proposition

Let A \in \mathbb R^{m \times n} be a matrix satisfying ER(2s) (the exact recovery property of order 2s, defined below). Then

    \[m\geq \frac{1}{\log 3}\left\lfloor \frac{s}{2} \right\rfloor \log \left( \left\lfloor\frac{n}{8es}\right\rfloor\right).\]

The number of measurements needed to reconstruct every s-sparse vector via (2) therefore always satisfies m \gtrsim s \log(en/s), up to a universal constant. This means that we pay an extra logarithmic factor compared to the l_0-minimization procedure: we gain more on the computational side than we lose in the theoretical number of measurements.
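To fix orders of magnitude (a back-of-the-envelope computation of ours, not from the original text): with n = 10^6 and s = 100, the bound reads m \gtrsim 100 \log(e \cdot 10^4) \approx 10^3, which remains tiny compared to the ambient dimension n = 10^6.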

Definition
Let A : \mathbb R^n \to \mathbb R^m with m \leq n. We say that A satisfies the exact recovery property of order s (ER(s)) if for any s-sparse vector x\in \mathbb R^n, one has

    \[\underset{z \in \mathbb R^n \; : \; Az=Ax}{\arg \; \min} \; \|z\|_1 \;\; = \{x\}.\]

2) Equivalent condition: Nullspace property and cone of descent

A matrix A \in \mathbb R^{m\times n} is said to satisfy the null space property relative to a set S \subset [n] if

    \[\|v_S\|_1 < \|v_{S^c}\|_1,\]


for all v \in \mathrm{ker}(A) \backslash \{0\}. It is said to satisfy the null space property of order s if it satisfies the null space property relative to any set S \subset [n] with |S|\leq s. We will write that A satisfies NSP(s).
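Verifying NSP(s) exactly is hard in general, but note that it is equivalent to requiring that, for every non-zero v in the kernel, the s largest entries of |v| carry less than half of \|v\|_1. The sketch below (our own illustration) checks this exactly when the kernel is one-dimensional and otherwise only samples random kernel directions, which yields a heuristic check rather than a certificate:

    import numpy as np
    from scipy.linalg import null_space

    def nsp_check(A, s, n_samples=10000, rng=None):
        """Check the NSP(s) criterion: sum of the s largest |v_i| < ||v||_1 / 2 on ker(A)\{0}.
        Exact if dim ker(A) <= 1, otherwise a Monte Carlo (heuristic) check."""
        rng = rng or np.random.default_rng(0)
        K = null_space(A)                       # orthonormal basis of ker(A), shape (n, d)
        d = K.shape[1]
        if d == 0:
            return True                         # trivial kernel: NSP holds vacuously
        coeffs = np.eye(1) if d == 1 else rng.standard_normal((n_samples, d))
        for c in np.atleast_2d(coeffs):
            v = K @ c                           # a (random) non-zero kernel vector
            top_s = np.sort(np.abs(v))[-s:].sum()
            if top_s >= 0.5 * np.abs(v).sum():
                return False                    # NSP(s) violated along this direction
        return True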

Theorem
Let A \in \mathbb R^{m\times n}. The following assertions are equivalent:
(i) A satisfies ER(s).
(ii) A satisfies NSP(s).
(iii) For every s-sparse vector x \in \mathbb R^n, \mathrm{ker}(A) \cap \mathcal D_{\|\cdot\|_1}(x) = \{0\}, where \mathcal D_{\|\cdot\|_1}(x):= \{d \in \mathbb R^n \; : \; \exists c>0, \; \|x+cd\|_1 \leq \|x\|_1\} is the cone of descent of the norm \|\cdot\|_1 at x.

Theorem
If A \in \mathbb R^{m\times n} satisfies NSP(s), then \mathrm{ker}(A) \cap \Sigma_{2s}=\{0\}, which means that uniform recovery is also ensured by solving (1).

Hence, the null space property is a necessary and sufficient condition for a matrix A to ensure recovery via (2).

To solve the compressed sensing problem using (2), it would therefore be sufficient to construct sensing matrices satisfying NSP(s) with a minimal number of rows (i.e. a minimal number of measurements). However, verifying such a condition is intractable in practice. We are going to use instead two stronger properties that imply NSP(s). These two conditions are then sufficient (whereas NSP is necessary and sufficient) for (2) to succeed.

3) Sufficient conditions

a) Gelfand width

For any integer n, let us denote by B_1^n (resp. B_2^n) the l_1 (resp. l_2) unit ball of \mathbb R^n, and for any set T \subset \mathbb R^n let us define

    \[diam(T,l_2) = \sup_{x \in T}\; \|x\|_2.\]

Under NSP(s), the null space of A intersects the set of sparse vectors \Sigma_{2s} only at zero. Hence \mathrm{ker}(A) stays far enough from the canonical axes and from all the 2s-dimensional subspaces spanned by canonical basis vectors. Intuitively, \mathrm{ker}(A) only contains well-spread, "diagonal" directions of \mathbb R^n. For such well-spread vectors, the ratio of their l_2-norm over their l_1-norm is much smaller than 1. One can therefore anticipate that NSP(s) will hold if the l_2-norm of vectors in B_1^n \cap \mathrm{ker}(A) is much smaller than 1.
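For instance (a simple illustration to make the intuition concrete), the constant vector v=(1,\dots,1) \in \mathbb R^n satisfies

    \[\frac{\|v\|_2}{\|v\|_1} = \frac{\sqrt n}{n} = \frac{1}{\sqrt n},\]

which is small in high dimension, whereas any 1-sparse vector e_i satisfies \|e_i\|_2/\|e_i\|_1 = 1.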

[Figure: In high dimensions, the l_1-norm is really spiky. Source: C. Boyer lecture notes.]

Definition
We say that A satisfies the Gelfand property of order s if

    \[diam(B_1^n \cap Ker(A),l_2)< \frac{1}{2 \sqrt s}.\]


We will write that A satisfies Gelfand(s).

Theorem
If A \in \mathbb R^{m \times n} satisfies Gelfand(s) then A satisfies NSP(s).

b) Restricted isometry property

Definition
Let A \in \mathbb R^{m\times n}. We say that A satisfies the restricted isometry property of order s if for any x \in \Sigma_s,

    \[\frac12 \|x\|_2 \leq \|Ax\|_2 \leq \frac32 \|x\|_2 .\]


We will write that A satisfies RIP(s).
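For small examples, one can check this definition directly through the extreme singular values of the column submatrices A_S with |S| = s (a brute-force sketch of ours, only feasible for tiny n; for larger instances one would only sample supports, which gives a partial check):

    import numpy as np
    from itertools import combinations

    def rip_check(A, s):
        """Check 1/2 <= sigma_min(A_S) and sigma_max(A_S) <= 3/2 for every support |S| = s."""
        n = A.shape[1]
        for S in combinations(range(n), s):
            sigma = np.linalg.svd(A[:, list(S)], compute_uv=False)
            if sigma.min() < 0.5 or sigma.max() > 1.5:
                return False
        return True

    rng = np.random.default_rng(2)
    m, n, s = 30, 12, 2
    A = rng.standard_normal((m, n)) / np.sqrt(m)    # column-normalized Gaussian matrix
    print(rip_check(A, s))                          # typically True for this size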

The RIP(s) condition is really strong, as can be seen from the following theorem.

Theorem
If A \in \mathbb R^{m \times n} satisfies RIP(s) then A satisfies Gelfand(s/65).

In a nutshell

    \[RIP(65s) \implies  Gelfand(s) \implies  NSP(s) \Leftrightarrow ER(s)\]

Since NSP(s) is equivalent to ER(s), it is a tight condition for ER(s). However, it is impractical because it is essentially impossible to verify. The RIP condition, even though it is more restrictive, gives a quantitative criterion which is much easier to work with. Hence, the next goal is to construct sensing matrices satisfying the RIP.
