<aside>

Lipschitz Continuity. A function $f : \R^n \rightarrow \R$ is Lipschitz continuous on a subset $S \sube \R^n$ if there exists a constant $L > 0$ such that for all $x, y \in S$,

$$ |f(x) - f(y)| \leq L||x- y|| $$

A standard lemma guarantees that such an $L$ exists whenever $f$ is twice continuously differentiable and $S$ is compact.

</aside>
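As a quick numerical illustration of the definition, one can lower-bound $L$ by sampling pairs of points and taking the largest difference quotient. A minimal sketch, where the quadratic $f$ and the sampling scheme are illustrative assumptions rather than anything from these notes:

```python
import numpy as np

# Hypothetical example: lower-bound the Lipschitz constant of f(x) = x^T A x
# on the unit sphere (a compact set) by sampling random pairs (x, y).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))

def f(x):
    return x @ A @ x

def lipschitz_lower_bound(f, dim, n_pairs=10_000):
    """max |f(x) - f(y)| / ||x - y|| over sampled pairs; any true L is >= this."""
    best = 0.0
    for _ in range(n_pairs):
        x = rng.standard_normal(dim)
        y = rng.standard_normal(dim)
        x /= np.linalg.norm(x)  # project onto the unit sphere
        y /= np.linalg.norm(y)
        best = max(best, abs(f(x) - f(y)) / np.linalg.norm(x - y))
    return best

print(lipschitz_lower_bound(f, 3))
```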

Matrix Optimization

Any optimization problem whose variable is a matrix can be recast as an equivalent problem whose variable is a vector: identify $X \in \R^{m \times n}$ with $x = \text{vec}(X) \in \R^{mn}$ (stack the columns of $X$) and define $\tilde{f}_i(x) = f_i(X)$, $\tilde{h}_j(x) = h_j(X)$.

$$ \begin{align*} \min_{X \in \R^{m \times n}} \space & f_0(X) \\ \text{subject to} \space & f_i(X) \le 0, \space i \in [p] \\ & h_j(X) = 0, \space j \in [k] \end{align*} \hspace{10pt} \equiv \hspace{10pt} \begin{align*} \min_{x \in \R^{mn}} \space & \tilde{f}_0(x) \\ \text{subject to} \space & \tilde{f}_i(x) \le 0, \space i \in [p] \\ & \tilde{h}_j(x) = 0, \space j \in [k] \end{align*}
$$
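A minimal sketch of this identification in code, using `reshape` as $\text{vec}$; the Frobenius-distance objective and the data matrix `B` are hypothetical choices for illustration:

```python
import numpy as np

m, n = 4, 3
B = np.arange(m * n, dtype=float).reshape(m, n)  # hypothetical data matrix

def f0(X):
    """Objective on the matrix variable."""
    return 0.5 * np.linalg.norm(X - B, "fro") ** 2

def f0_tilde(x):
    """Equivalent objective on the vector variable: undo vec, then apply f0."""
    return f0(x.reshape(m, n))

# NumPy's reshape flattens row-major; column-stacking would work equally well.
X = np.ones((m, n))
assert np.isclose(f0(X), f0_tilde(X.reshape(-1)))  # identical values
```

Any solver that operates on vectors can therefore be applied to a matrix problem through this reshaping.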

Matrix Completion

Given a partially observed, low-rank $X^* \in \R^{m \times n}$ and a set $\mathcal{K}$ of indices $(i, j)$ for which $X^*_{ij}$ is known, recover $X^*$ by choosing the lowest-rank matrix that matches the data.

$$ \begin{align*} \min_{X \in \R^{m \times n}} \space & \text{rank } X \\ \text{subject to} \space & X_{ij} = X^*_{ij}, \space (i,j) \in \mathcal{K} \end{align*} \hspace{10pt} \underset{||X||_2 \leq 1}{\xrightarrow{\text{relax}}} \hspace{10pt} \begin{align*} \min_{X \in \R^{m \times n}} \space & ||X||_* \\ \text{subject to} \space & X_{ij} = X^*_{ij}, \space (i,j) \in \mathcal{K} \end{align*} $$

The low-rank assumption implies that the data are generated according to a few latent factors (any basis of the lower-dimensional subspace containing the data).
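The relaxed problem can be attacked with simple singular-value shrinkage. Below is a hedged sketch in the style of the soft-impute heuristic, which solves a penalized variant of the nuclear-norm problem rather than the hard equality-constrained form; the shrinkage level `lam` and the iteration count are illustrative assumptions:

```python
import numpy as np

def soft_impute(X_obs, mask, lam=0.1, iters=500):
    """Alternate: shrink singular values by lam, then restore known entries.

    `mask` is True at the indices (i, j) in K where X*_{ij} is observed.
    """
    X = np.where(mask, X_obs, 0.0)  # start with unknown entries at zero
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = U @ np.diag(np.maximum(s - lam, 0.0)) @ Vt  # nuclear-norm prox
        X = np.where(mask, X_obs, X_low)  # keep the data X_{ij} = X*_{ij} fixed
    return X

# Hypothetical rank-1 ground truth with roughly half the entries observed
rng = np.random.default_rng(0)
X_star = np.outer(rng.standard_normal(20), rng.standard_normal(15))
mask = rng.random(X_star.shape) < 0.5
X_hat = soft_impute(X_star, mask)
print(np.linalg.norm((X_hat - X_star)[~mask]))  # error on the unseen entries
```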

Robust PCA

Given $Y \in \R^{m \times n}$, find a low-rank $X \in \R^{m \times n}$ and a sparse $Z \in \R^{m \times n}$ such that $Y = X + Z$.

$$ \begin{align*} \min_{X, Z \in \R^{m \times n}} \space & \text{rank } X + \lambda \text{ card } Z \\ \text{subject to} \space & X+Z = Y \end{align*} \hspace{10pt} \underset{|Z_{ij}| \leq 1}{\underset{||X||_2 \leq 1}{\xrightarrow{\text{relax}}}} \hspace{10pt} \begin{align*} \min_{X, Z \in \R^{m \times n}} \space & ||X||_* + \lambda \sum_{i=1}^m \sum_{j=1}^n |Z_{ij}| \\ \text{subject to} \space & X+Z = Y \end{align*} $$

The matrices $X$ and $Z$ can be thought of as the background and foreground of a photo. Since the background tends to vary smoothly and is governed by only a small set of latent features, it is reasonable for $X$ to be low-rank. Similarly, the foreground typically occupies a small portion of the image, so sparsity is a reasonable description of $Z$.
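A crude alternating sketch of the relaxed problem follows; a practical solver would use ADMM (principal component pursuit), and the step size `tau` and default `lam` below are common heuristics from the RPCA literature, not prescriptions from these notes:

```python
import numpy as np

def soft(M, tau):
    """Entrywise soft-thresholding: the prox operator of tau * sum |M_ij|."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_alternating(Y, lam=None, tau=1.0, iters=200):
    """Alternately shrink singular values of Y - Z and entries of Y - X."""
    if lam is None:
        lam = 1.0 / np.sqrt(max(Y.shape))  # common choice in the RPCA literature
    Z = np.zeros_like(Y)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Y - Z, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt  # low-rank (nuclear) update
        Z = soft(Y - X, tau * lam)                      # sparse (l1) update
    return X, Z
```

At convergence $X + Z \approx Y$, with the residual split between the low-rank and sparse parts.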


Iterative Methods

Most optimization problems have no closed-form solution. Iterative methods instead generate a sequence $\{x_k\}$: start from an initial guess $x_0$ and repeatedly refine the current iterate until it approximates a solution of the optimization problem.

$$ x_{k+1} \gets f_k (x_k) \hspace{15pt} \text{($x_0$ initialized)} $$
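A tiny concrete instance of this template is Heron's fixed-point iteration for $\sqrt{a}$, chosen here purely for illustration:

```python
import math

# x_{k+1} <- f(x_k) with the fixed-point update f(x) = (x + a/x) / 2
a, x = 2.0, 1.0            # x_0 initialized to a rough guess
for _ in range(6):
    x = 0.5 * (x + a / x)  # refine the current iterate
print(x, math.sqrt(a))     # the iterates converge rapidly to sqrt(2)
```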

Descent Algorithms