Reference
This post is a summary of Chapter 2 from deeplearningbook.org.
Norm
A norm is a function that measures the size of a vector by mapping it to a non-negative value. The formal definition of the \(L^p\) norm is
\[\|\|x\|\|_p = \left(\sum_i \lvert x_i \rvert^p\right)^{\frac{1}{p}}\]
for \(p \in \mathbb{R},\ p \geq 1\).
Intuitively, the norm of a vector \(x\) measures the distance between the point \(x\) and the origin.
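To make this concrete, here is a minimal NumPy sketch of the \(L^p\) norm; the helper name lp_norm is my own, and np.linalg.norm is used only as a cross-check:

```python
import numpy as np

def lp_norm(x, p):
    """L^p norm: (sum_i |x_i|^p)^(1/p), defined for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
print(lp_norm(x, 2))         # 5.0, the Euclidean norm
print(np.linalg.norm(x, 2))  # 5.0, NumPy's built-in agrees
print(lp_norm(x, 3))         # 4.497..., the L^3 norm
```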
More rigorous definition of Norm
More rigorously, a norm is any function \(f\) that satisfies the following three properties:
- \[f(x)=0 \Rightarrow x=0\]
- \[f(x+y) \leq f(x)+f(y)\]
- \[\forall \alpha \in \mathbb{R}, f(\alpha x)=\lvert \alpha \rvert f(x)\]
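These properties are easy to sanity-check numerically. A small sketch of my own, using the \(L^2\) norm as the function \(f\):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
alpha = -2.5
norm = np.linalg.norm  # L^2 norm by default

print(norm(np.zeros(3)))                                  # 0.0: only the zero vector maps to 0
print(norm(x + y) <= norm(x) + norm(y))                   # True: triangle inequality
print(np.isclose(norm(alpha * x), abs(alpha) * norm(x)))  # True: absolute homogeneity
```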
L2 Norm (Euclidean Norm)
The \(L^2\) norm is the one we are most familiar with, especially in machine learning. Setting \(p=2\) in the equation above gives
\[L^2\ norm: \|\|x\|\|_2 = \sqrt{\sum_i \lvert x_i \rvert^2}\]
It is also called the Euclidean norm, since it is simply the Euclidean distance from the origin to the point \(x\). Because the \(L^2\) norm is used so frequently in machine learning, it is often written as \(\|\|x\|\|\), with the subscript 2 omitted. Note that it is the squared \(L^2\) norm, not the \(L^2\) norm itself, that can be calculated as \(x^{\top}x\).
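A quick NumPy check of that last point (my own sketch):

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x))       # 5.0, the L^2 norm
print(x @ x)                   # 25.0, x^T x gives the *squared* L^2 norm
print(np.linalg.norm(x) ** 2)  # 25.0, same value
```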
Also, the dot product of two vectors can be expressed in terms of norms: \(x^{\top}y = \|\|x\|\|_2\|\|y\|\|_2\cos{\theta}\), where \(\theta\) is the angle between the two vectors.
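This identity also lets us recover the angle between two vectors, as in this small sketch of mine:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))  # 45.0, the angle between x and y
```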
Often, the squared \(L^2\) norm is more convenient to work with than the \(L^2\) norm itself, because the derivative of the squared \(L^2\) norm with respect to each element of \(x\) depends only on the corresponding element (\(\partial {\|\|x\|\|_2}^2 / \partial x_i = 2x_i\)), whereas the derivatives of the \(L^2\) norm depend on the entire vector.
\[Squared\ L^2\ norm: {\|\|x\|\|_2}^2 = \sum_i \lvert x_i \rvert^2\]
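To see this element-wise dependence concretely, here is a small sketch of mine comparing the two gradients:

```python
import numpy as np

x = np.array([3.0, -4.0])

grad_squared = 2 * x             # each entry 2 * x_i depends only on x_i
grad_l2 = x / np.linalg.norm(x)  # each entry x_i / ||x||_2 depends on the whole vector

print(grad_squared)  # [ 6. -8.]
print(grad_l2)       # [ 0.6 -0.8]
```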
L1 Norm (Manhattan Norm)
In many contexts, the squared \(L^2\) norm may be undesirable because it increases very slowly near the origin. This is a problem when it is important to discriminate between elements that are exactly \(0\) and elements that are small but nonzero. In these cases, the \(L^1\) norm is often used instead, since it grows at the same rate in all locations: every time an element of \(x\) moves away from \(0\) by \(\epsilon\), the \(L^1\) norm increases by \(\epsilon\).
\[L^1\ norm: {\|\|x\|\|_1} = \sum_i \lvert x_i \rvert\]
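A small numeric sketch of my own showing the difference near \(0\):

```python
import numpy as np

for eps in [1e-1, 1e-2, 1e-3]:
    x = np.array([eps])
    print(np.sum(np.abs(x)),  # L^1 norm: exactly eps, grows linearly
          np.sum(x ** 2))     # squared L^2 norm: eps^2, nearly flat near 0
```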
Max Norm
Another norm that commonly appears in machine learning is the \(L^{\infty}\) norm, also known as the max norm, which is the absolute value of the element with the largest magnitude in the vector.
\[Max\ norm: {\|\|x\|\|_{\infty}} = \max_i \lvert x_i \rvert\]
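In NumPy this is a one-liner (my own sketch):

```python
import numpy as np

x = np.array([3.0, -7.0, 2.0])
print(np.max(np.abs(x)))          # 7.0, straight from the definition
print(np.linalg.norm(x, np.inf))  # 7.0, the built-in equivalent
```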
Frobenius Norm
What about measuring the size of a matrix instead of a vector? In deep learning, the most common way is the Frobenius norm, which is analogous to the \(L^2\) norm of a vector.
\[Frobenius\ norm: \|\|A\|\|_F = \sqrt{\sum_{i,j} A_{i,j}^{2}}\]
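A final sketch of mine, checking the definition against NumPy's built-in:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.sqrt(np.sum(A ** 2)))   # 5.477..., directly from the definition
print(np.linalg.norm(A, 'fro'))  # same value via the built-in Frobenius norm
```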