<aside> <img src="/icons/castle_yellow.svg" alt="/icons/castle_yellow.svg" width="40px" />
Trace is Sum of Eigenvalues. Given a matrix $A \in \R^{n \times n}$ with eigenvalues $\lambda_1, \dots, \lambda_n$ (possibly complex, counted with algebraic multiplicity),
$$ \text{tr}(A) = \sum_{i=1}^{n}\lambda_i $$
</aside>
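As a quick numerical sanity check, the following is a minimal NumPy sketch of the identity. The matrix size and seed are arbitrary assumptions. The eigenvalues of a real matrix may be complex, but they occur in conjugate pairs, so the imaginary parts cancel in the sum.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for reproducibility
A = rng.standard_normal((5, 5))

eigenvalues = np.linalg.eigvals(A)  # may be complex for a non-symmetric A
trace_from_eigs = eigenvalues.sum()

# Imaginary parts cancel because complex eigenvalues come in conjugate pairs.
print(np.isclose(trace_from_eigs.imag, 0.0))
print(np.isclose(trace_from_eigs.real, np.trace(A)))
```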
<aside> <img src="/icons/castle_yellow.svg" alt="/icons/castle_yellow.svg" width="40px" />
Bound on Singular Values of Sum. Given matrices $Y, Z \in \R^{m \times n}$,
$$ (\sigma_i)_Y + (\sigma_j)_Z \geq (\sigma_{i+j-1})_{Y+Z} \hspace{15pt} \forall i,j \geq 1 $$
with the convention that $\sigma_{i+j-1} = 0$ if $i + j - 1 > \min (m,n)$.
</aside>
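The bound can be checked numerically over all index pairs. This is a sketch under assumptions: the matrices are random, the helper `padded_singular_values` is a hypothetical name introduced here, and a small tolerance absorbs floating-point rounding.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
m, n = 4, 6
Y = rng.standard_normal((m, n))
Z = rng.standard_normal((m, n))

def padded_singular_values(M, length):
    """Singular values in descending order, zero-padded to `length` entries."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.concatenate([s, np.zeros(length - s.size)])

p = min(m, n)
# Pad so index i + j - 1 is always valid; values past min(m, n) are zero.
sY = padded_singular_values(Y, 2 * p)
sZ = padded_singular_values(Z, 2 * p)
sSum = padded_singular_values(Y + Z, 2 * p)

# 1-based indices i, j map to 0-based positions i - 1, j - 1 and i + j - 2.
bound_holds = all(
    sY[i - 1] + sZ[j - 1] >= sSum[i + j - 2] - 1e-12
    for i in range(1, p + 1)
    for j in range(1, p + 1)
)
print(bound_holds)
```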
<aside> <img src="/icons/castle_yellow.svg" alt="/icons/castle_yellow.svg" width="40px" />
$L_2$-Induced Norm Bound. Given $A \in \R^{m \times n}, x \in \R^{n}$,
$$ ||Ax||_2 \leq ||A||_2||x||_2 $$
</aside>
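A small NumPy sketch of the bound, assuming a random $A$ and $x$. It also checks that equality is attained when $x$ is the first right singular vector $v_1$, since $||A||_2 = \sigma_1$.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
A = rng.standard_normal((3, 5))
x = rng.standard_normal(5)

lhs = np.linalg.norm(A @ x)                      # ||Ax||_2
rhs = np.linalg.norm(A, 2) * np.linalg.norm(x)   # ||A||_2 ||x||_2

# The bound is tight: x = v_1 (top right singular vector) gives ||A v_1||_2 = sigma_1.
_, s, Vt = np.linalg.svd(A)
v1 = Vt[0]
print(lhs <= rhs, np.isclose(np.linalg.norm(A @ v1), s[0]))
```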
The low rank approximation of rank $k$ of a matrix $A \in \R^{m \times n}$ is a matrix $A_k$ formed by the first $k$ terms in its outer product SVD representation.
$$ A_k = \sigma _1 u_1v_1^T + \dots +\sigma _k u_kv_k^T \hspace{15pt} k=\min(m,n) \implies A_k = A $$
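The truncated sum can be built directly from the SVD factors. The function name `low_rank_approx` is a hypothetical helper chosen for this sketch, and the test matrix is random.

```python
import numpy as np

def low_rank_approx(A, k):
    """Return A_k, the sum of the first k outer-product terms of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scaling the columns of U by s and multiplying by Vt sums sigma_i * u_i * v_i^T.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)  # arbitrary seed
A = rng.standard_normal((6, 4))

A2 = low_rank_approx(A, 2)
A_full = low_rank_approx(A, min(A.shape))

print(np.linalg.matrix_rank(A2))   # rank of the truncated matrix
print(np.allclose(A_full, A))      # k = min(m, n) recovers A
```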
<aside> <img src="/icons/castle_yellow.svg" alt="/icons/castle_yellow.svg" width="40px" />
Eckart-Young-Mirsky Theorem. The best rank-$k$ approximation of a matrix $A$, in both the Frobenius and $L_2$-induced norms, is $A_k$.
$$ A_k = \argmin _{B\in \R^{m \times n}}||A-B||_F = \argmin _{B \in \R^{m \times n}}||A-B||_2 \text{ s.t. rank$(B) \leq k$} $$
Moreover, the Frobenius-norm minimizer is unique if $\sigma_k \neq \sigma _{k+1}$. The $L_2$-induced-norm minimizer is in general not unique.
</aside>
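The theorem can be illustrated numerically. The sketch below assumes a random matrix and uses the standard facts that $||A - A_k||_2 = \sigma_{k+1}$ and $||A - A_k||_F = \sqrt{\sigma_{k+1}^2 + \dots}$. The competitor `B` is an arbitrary rank-$k$ matrix chosen for illustration, so this is a spot check rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
A = rng.standard_normal((6, 5))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

err_2 = np.linalg.norm(A - A_k, 2)       # should equal sigma_{k+1}
err_F = np.linalg.norm(A - A_k, "fro")   # should equal sqrt(sum of sigma_i^2 for i > k)

# Hypothetical competitor: an arbitrary matrix of rank at most k.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))

print(np.isclose(err_2, s[k]), np.isclose(err_F, np.sqrt(np.sum(s[k:] ** 2))))
print(err_2 <= np.linalg.norm(A - B, 2), err_F <= np.linalg.norm(A - B, "fro"))
```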
We can define the error $e_k[\cdot]$ of the rank-$k$ approximation of a matrix $A$ of rank $r$ in terms of the Frobenius and $L_2$-induced norms. This error is often interpreted through an explained variance ratio $\eta_k[\cdot]$.
$$ e_k[f] = 1- \eta_k[f]
\\ \eta_k[L_2] := \frac{\sigma_1 - \sigma_{k+1}}{\sigma_1}
\\ \eta_{k} [F]:=\frac{||A_k||_F^2}{||A||_F ^2} = \frac{\sigma_1^2 +\dots +\sigma_k^2}{\sigma_1^2 +\dots+\sigma_r^2} $$
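Both ratios depend only on the singular values, so they can be computed without forming $A_k$. In this sketch, `explained_variance_ratios` is a hypothetical helper name, and the diagonal test matrix is chosen so the singular values are simply $4, 2, 1$.

```python
import numpy as np

def explained_variance_ratios(A, k):
    """Return (eta_k[L_2], eta_k[F]) for the rank-k truncated SVD of A."""
    s = np.linalg.svd(A, compute_uv=False)
    sigma_next = s[k] if k < s.size else 0.0   # sigma_{k+1}, zero past the last one
    eta_l2 = (s[0] - sigma_next) / s[0]
    eta_f = np.sum(s[:k] ** 2) / np.sum(s ** 2)
    return eta_l2, eta_f

A = np.diag([4.0, 2.0, 1.0])
eta_l2, eta_f = explained_variance_ratios(A, 1)

# The errors follow from e_k[f] = 1 - eta_k[f].
e_l2, e_f = 1 - eta_l2, 1 - eta_f
print(eta_l2, eta_f)
```

For this matrix with $k = 1$, the formulas give $\eta_1[L_2] = (4 - 2)/4 = 0.5$ and $\eta_1[F] = 16/21$.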
For a system of equations $Ax=y$, where $A$ is invertible, how much does the solution $x$ change if we perturb $A$ or $y$? This sensitivity is usually measured by the condition number $\kappa(A)$.
$$ \kappa (A) := ||A||_2||A^{-1}||_2 = \frac{\sigma_1}{\sigma_n} $$
When $\kappa(A) \approx 1$, we say $A$ is well-conditioned and when $\kappa(A)$ is very large, we say $A$ is ill-conditioned.
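To make the sensitivity concrete, the sketch below perturbs the right-hand side of a nearly singular system and compares the relative change in the solution against the standard bound $\frac{||\delta x||_2}{||x||_2} \leq \kappa(A) \frac{||\delta y||_2}{||y||_2}$. The matrix, right-hand side, and perturbation are illustrative assumptions.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])    # nearly singular, hence ill-conditioned
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]             # sigma_1 / sigma_n

y = np.array([2.0, 2.0001])
dy = np.array([0.0, 1e-4])       # small perturbation of the right-hand side

x = np.linalg.solve(A, y)
x_pert = np.linalg.solve(A, y + dy)

rel_change_y = np.linalg.norm(dy) / np.linalg.norm(y)
rel_change_x = np.linalg.norm(x_pert - x) / np.linalg.norm(x)

print(np.isclose(kappa, np.linalg.cond(A, 2)))   # matches NumPy's 2-norm condition number
print(rel_change_x <= kappa * rel_change_y)      # perturbation bound holds
```

Here a relative change in $y$ on the order of $10^{-5}$ moves the solution from roughly $(1, 1)$ to roughly $(0, 2)$, which is the behavior a large $\kappa(A)$ warns about.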