Quentin Duchemin

Compressed sensing: A brief introduction

Compressed sensing belongs to the large field of inverse problems. A typical example of such problems consists in determining the signal x\in \mathbb R^n that produces the measurement vector y through the linear transformation A \in \mathbb R^{m\times n}, namely y=Ax.

Of course, without additional assumptions the problem is ill-posed if the rank of the matrix A is strictly smaller than n, meaning that A does not admit a left inverse. To be able to deal with cases where rank(A)<n, we need at least to assume some structure on the original signal x and/or on the design matrix A. Compressed sensing deals with the case where m<n and where x is assumed to be sparse, meaning that only a few coefficients of x are non-zero. For any s \in [n], a vector x \in \mathbb R^n is said to be s-sparse if

    \[\# \{i \in [n] \; : \; x_i \neq 0\}\leq s.\]
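As a tiny numerical illustration (a minimal numpy sketch of ours; the helper name is not from the original text), checking s-sparsity simply amounts to counting non-zero coefficients:

    import numpy as np

    def is_s_sparse(x, s):
        """Return True if x has at most s non-zero coefficients."""
        return np.count_nonzero(x) <= s

    x = np.array([0.0, 3.2, 0.0, 0.0, -1.5])
    print(is_s_sparse(x, 2))  # True: only two coefficients are non-zero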

Denoting by a_1, \dots, a_m the m rows of the matrix A, typical problems addressed by compressed sensing are the following.

\mathcal P_1: Construct a minimal number of sensing vectors (a_1,\dots, a_m), i.e. a measurement matrix A \in \mathbb R^{m \times n} with as few rows as possible, such that any s-sparse vector x can be efficiently recovered from y= Ax.

\mathcal P_2: Given sensing vectors (a_1,\dots, a_m) forming the matrix A \in \mathbb R^{m\times n}, construct a recovery algorithm for any s-sparse vector from y= Ax; i.e. find a map \Delta:\mathbb R^m \to \mathbb R^n which can be efficiently implemented and such that for any s-sparse vector x \in \mathbb R^n, \Delta(Ax)=x.

I. l_0-minimization

Reconstructing an s-sparse vector x\in \mathbb R ^n from its measurement vector y \in \mathbb R^m amounts to
solving the following l_0-minimization problem:

(1)   \begin{align*}  \underset{z \in \mathbb R^n}{\min}& \; \|z\|_0\\  \text{s.t.} &\; y=Ax=Az \notag \end{align*}


In the following, we present the two different recovery settings that can be tackled (non-uniform and uniform), together with the minimal number of measurements needed to ensure that x is the unique optimal solution of (1).

1) Non-uniform approach

For any n \geq s+1, given an s-sparse vector x \in \mathbb R^n, there exists a measurement matrix A \in \mathbb R^{m\times n} with m=s+1 rows such that the vector x can be reconstructed from its measurement vector y =Ax \in \mathbb R^m as a solution of the l_0-minimization problem (1).

2) Uniform approach

Given some matrix A \in \mathbb R^{m \times n}, we want to ensure that we are able to recover any s-sparse vector x \in \mathbb R^n from its measurements y=Ax. The following theorem characterizes uniform recovery; since condition (iv) below requires 2s linearly independent columns, it implies in particular that m \geq 2s is necessary.

Theorem
Given A \in \mathbb R^{m\times n}, the following properties are equivalent:
(i) Every s-sparse vector x \in \mathbb R^n is the unique s-sparse solution of Az= Ax, that is, if Ax =Az and both x and z are s-sparse, then x =z.
(ii) The null space \mathrm{ker}(A) does not contain any 2s-sparse vector other than the zero vector, that is, \mathrm{ker}(A) \cap \Sigma_{2s} = \{0\}, where \Sigma_{2s}=\{z \in \mathbb R^n \; : \; \|z\|_0\leq 2s\}.
(iii) For every S \subset [n] with |S| \leq 2s, the submatrix A_S is injective as a map from \mathbb R^{|S|} to \mathbb R^m.
(iv) Every set of 2s columns of A is linearly independent.

We are now going to see that m=2s measurements suffice to reconstruct every s-sparse vector – at least in theory.

Theorem
For any integer n \geq 2s, there exists a measurement matrix A \in \mathbb R^{m\times n} with m= 2s rows such that every s-sparse vector x \in \mathbb R^n can be recovered from its measurement vector
y \in \mathbb R^m as a solution of the l_0-minimization problem (1).
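One classical construction compatible with this theorem (a sketch under our own choice of matrix, not necessarily the one the author has in mind) takes A_{ij} = t_j^{i-1} for distinct reals t_1,\dots,t_n: any 2s columns then form an invertible Vandermonde matrix, so condition (iv) of the previous theorem is satisfied. The small numpy check below verifies this on a toy example.

    import numpy as np
    from itertools import combinations

    s, n = 2, 6
    m = 2 * s
    t = np.arange(1, n + 1, dtype=float)        # distinct nodes t_1 < ... < t_n
    A = np.vander(t, N=m, increasing=True).T    # A[i, j] = t_j**i, shape (m, n)

    # Condition (iv): every set of 2s columns of A is linearly independent.
    ok = all(np.linalg.matrix_rank(A[:, list(S)]) == m
             for S in combinations(range(n), m))
    print(ok)  # True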

The optimization problem (1) is known to be NP-hard. In compressed sensing, we will rather consider special choices of A and choose y= Ax for some sparse x. We will see that a variety of tractable algorithms will then provably recover x from y and thereby solve (1) for such specifically designed matrices A. However, to emphasize this point once more, such algorithms will not successfully solve the l_0-minimization problem for all possible choices of A and y, due to NP-hardness.
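To make the combinatorial nature of (1) concrete, here is a brute-force sketch (our own illustration, only usable for very small n) that scans supports by increasing size and solves a least-squares problem on each candidate support:

    import numpy as np
    from itertools import combinations

    def l0_min_bruteforce(A, y, tol=1e-10):
        """Exhaustive search for the sparsest z with Az = y (exponential cost in n)."""
        m, n = A.shape
        for k in range(n + 1):                       # supports of increasing size
            for S in combinations(range(n), k):
                z = np.zeros(n)
                if k > 0:
                    z[list(S)] = np.linalg.lstsq(A[:, list(S)], y, rcond=None)[0]
                if np.linalg.norm(A @ z - y) <= tol:
                    return z                         # first hit = minimal support size
        return None

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 8))
    x = np.zeros(8); x[[1, 5]] = [1.0, -2.0]         # a 2-sparse signal, m = 4 = 2s
    print(l0_min_bruteforce(A, A @ x))               # recovers x (with probability 1 here)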

II. Convex relaxation through l_1-minimization

Hence, one needs to reformulate the problem so as to recover the signal x efficiently, without ending up requiring m to be too large to ensure that x is the unique optimal solution. The most natural way to do this is to propose a convex relaxation of problem (1), obtained by replacing the l_0-norm in the objective function by the l_1-norm. Note that this is the best relaxation we can propose, in the sense that the l_1-norm is the convex envelope of the l_0-norm on [-1,1]^n. This means that on [-1,1]^n, the l_1-norm is the supremum of all the affine functions that are smaller than the l_0-norm on [-1,1]^n. Another equivalent statement is that the l_1-norm coincides with the biconjugate of the l_0-norm on [-1,1]^n. The resulting l_1-minimization problem, often called basis pursuit, reads:

(2)   \begin{align*}  \underset{z \in \mathbb R^n}{\min}& \; \|z\|_1\\ \text{s.t.} &\; y=Ax=Az \notag \end{align*}
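Problem (2) can be recast as a linear program by introducing auxiliary variables t with -t \leq z \leq t and minimizing \sum_i t_i. The sketch below relies on scipy.optimize.linprog (an assumption about the available tooling, not something prescribed by the text):

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, y):
        """Solve min ||z||_1 s.t. Az = y via the LP: min sum(t) s.t. -t <= z <= t, Az = y."""
        m, n = A.shape
        c = np.concatenate([np.zeros(n), np.ones(n)])             # objective: sum of t
        I = np.eye(n)
        A_ub = np.block([[I, -I], [-I, -I]])                      # z - t <= 0 and -z - t <= 0
        b_ub = np.zeros(2 * n)
        A_eq = np.hstack([A, np.zeros((m, n))])                   # A z = y
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                      bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
        return res.x[:n]

    rng = np.random.default_rng(1)
    A = rng.standard_normal((20, 50))
    x = np.zeros(50); x[[3, 17, 31]] = [2.0, -1.0, 0.5]           # a 3-sparse signal
    print(np.allclose(basis_pursuit(A, A @ x), x, atol=1e-6))     # True with high probability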

1) Necessary condition

Proposition

Let A \in \mathbb R^{m \times n} be a matrix satisfying ER(2s) (the exact recovery property of order 2s, defined below). Then

    \[m\geq \frac{1}{\log 3}\left\lfloor \frac{s}{2} \right\rfloor \log \left( \left\lfloor\frac{n}{8es}\right\rfloor\right).\]

The number of measurements needed to reconstruct every s-sparse vector via (2) therefore always satisfies m \gtrsim s \log(en/s), up to a universal constant. This means that we pay an extra logarithmic factor compared to the l_0-minimization procedure: we gain more on the computational side than we lose in the theoretical number of measurements.
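To fix orders of magnitude (a back-of-the-envelope computation of ours, not from the original text): with n = 10^6 and s = 100, the bound reads m \gtrsim 100 \log(e \cdot 10^4) \approx 10^3, which remains tiny compared to the ambient dimension n = 10^6.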

Definition
Let A : \mathbb R^n \to \mathbb R^m with m \leq n. We say that A satisfies the exact recovery property of order s (ER(s)) if for any s-sparse vector x\in \mathbb R^n, one has

    \[\underset{z \in \mathbb R^n \; : \; Az=Ax}{\arg \; \min} \; \|z\|_1 \;\; = \{x\}.\]

2) Equivalent condition: Nullspace property and cone of descent

A matrix A \in \mathbb R^{m\times n} is said to satisfy the null space property relative to a set S \subset [n] if

    \[\|v_S\|_1 < \|v_{S^c}\|_1,\]


for all v \in \mathrm{ker}(A) \backslash \{0\}. It is said to satisfy the null space property of order s if it satisfies the null space property relative to any set S \subset [n] with |S|\leq s. We will write that A satisfies NSP(s).
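Verifying NSP(s) exactly is hard in general, but note that it is equivalent to requiring that, for every non-zero v in the kernel, the s largest entries of |v| carry less than half of \|v\|_1. The sketch below (our own illustration) checks this exactly when the kernel is one-dimensional and otherwise only samples random kernel directions, which yields a heuristic check rather than a certificate:

    import numpy as np
    from scipy.linalg import null_space

    def nsp_check(A, s, n_samples=10000, rng=None):
        """Check the NSP(s) criterion: sum of the s largest |v_i| < ||v||_1 / 2 on ker(A)\{0}.
        Exact if dim ker(A) <= 1, otherwise a Monte Carlo (heuristic) check."""
        rng = rng or np.random.default_rng(0)
        K = null_space(A)                       # orthonormal basis of ker(A), shape (n, d)
        d = K.shape[1]
        if d == 0:
            return True                         # trivial kernel: NSP holds vacuously
        coeffs = np.eye(1) if d == 1 else rng.standard_normal((n_samples, d))
        for c in np.atleast_2d(coeffs):
            v = K @ c                           # a (random) non-zero kernel vector
            top_s = np.sort(np.abs(v))[-s:].sum()
            if top_s >= 0.5 * np.abs(v).sum():
                return False                    # NSP(s) violated along this direction
        return True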

Theorem
Let A \in \mathbb R^{m\times n}. The following assertions are equivalent:
(i) A satisfies ER(s).
(ii) A satisfies NSP(s).
(iii) For every s-sparse vector x \in \mathbb R^n, \mathrm{ker}(A) \cap \mathcal D_{\|\cdot\|_1}(x) = \{0\}, where \mathcal D_{\|\cdot\|_1}(x):= \{d \in \mathbb R^n \; : \; \exists c>0, \; \|x+cd\|_1 \leq \|x\|_1\} is the cone of descent of the norm \|\cdot\|_1 at x.

Theorem
If A \in \mathbb R^{m\times n} satisfies NSP(s), then \mathrm{ker}(A) \cap \Sigma_{2s}=\{0\}, which means that uniform recovery is also ensured by solving (1).

Hence, the null space property is a necessary and sufficient condition for a matrix A to ensure recovery via (2).

To solve the compressed sensing problem using (2), it would therefore be sufficient to construct sensing matrices satisfying NSP(s) with a minimal number of rows (i.e. a minimal number of measurements). However, verifying such a condition is intractable in practice. We are going to use instead two stronger properties that imply NSP(s). These two conditions are then sufficient (whereas NSP is necessary and sufficient) for (2) to succeed.

3) Sufficient conditions

a) Gelfand width

For any integer n, let us denote by B_1^n (resp. B_2^n) the l_1 (resp. l_2) unit ball of \mathbb R^n, and for any set T \subset \mathbb R^n let us define

    \[diam(T,l_2) = \sup_{x \in T}\; \|x\|_2.\]

Under NSP(s), the null space of A intersects the set of sparse vectors \Sigma_{2s} only at zero. Hence \mathrm{ker}(A) stays far enough from the canonical axes and from all the 2s-dimensional subspaces spanned by canonical basis vectors. Intuitively, \mathrm{ker}(A) only contains well-spread, "diagonal" directions of \mathbb R^n. For such well-spread vectors, the ratio of their l_2-norm over their l_1-norm is much smaller than 1. One can therefore anticipate that NSP(s) will hold if the l_2-norm of vectors in B_1^n \cap \mathrm{ker}(A) is much smaller than 1.
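For instance (a simple illustration to make the intuition concrete), the constant vector v=(1,\dots,1) \in \mathbb R^n satisfies

    \[\frac{\|v\|_2}{\|v\|_1} = \frac{\sqrt n}{n} = \frac{1}{\sqrt n},\]

which is small in high dimension, whereas any 1-sparse vector e_i satisfies \|e_i\|_2/\|e_i\|_1 = 1.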

[Figure: In high dimensions, the l_1-norm is really spiky. Source: C. Boyer lecture notes.]

Definition
We say that A satisfies the Gelfand property of order s if

    \[diam(B_1^n \cap Ker(A),l_2)< \frac{1}{2 \sqrt s}.\]


We will write that A satisfies Gelfand(s).

Theorem
If A \in \mathbb R^{m \times n} satisfies Gelfand(s) then A satisfies NSP(s).

b) Restricted isometry property

Definition
Let A \in \mathbb R^{m\times n}. We say that A satisfies the restricted isometry property of order s if for any x \in \Sigma_s,

    \[\frac12 \|x\|_2 \leq \|Ax\|_2 \leq \frac32 \|x\|_2 .\]


We will write that A satisfies RIP(s).
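For small examples, one can check this definition directly through the extreme singular values of the column submatrices A_S with |S| = s (a brute-force sketch of ours, only feasible for tiny n; for larger instances one would only sample supports, which gives a partial check):

    import numpy as np
    from itertools import combinations

    def rip_check(A, s):
        """Check 1/2 <= sigma_min(A_S) and sigma_max(A_S) <= 3/2 for every support |S| = s."""
        n = A.shape[1]
        for S in combinations(range(n), s):
            sigma = np.linalg.svd(A[:, list(S)], compute_uv=False)
            if sigma.min() < 0.5 or sigma.max() > 1.5:
                return False
        return True

    rng = np.random.default_rng(2)
    m, n, s = 30, 12, 2
    A = rng.standard_normal((m, n)) / np.sqrt(m)    # column-normalized Gaussian matrix
    print(rip_check(A, s))                          # typically True for this size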

The RIP(s) condition is really strong, as can be seen from the following theorem.

Theorem
If A \in \mathbb R^{m \times n} satisfies RIP(s) then A satisfies Gelfand(s/65).

In a nutshell

    \[RIP(65s) \implies  Gelfand(s) \implies  NSP(s) \Leftrightarrow ER(s)\]

Since NSP(s) is equivalent to ER(s), it is a tight condition for ER(s). However, it is impractical because it is essentially impossible to verify. The RIP condition, even though it is more restrictive, gives a quantitative criterion which is much easier to work with. Hence, the next goal is to construct sensing matrices satisfying the RIP.
