Reference
This post is a summary of Chapter 2 from deeplearningbook.org.
Norm
A norm is a function that measures the size of a vector by mapping it to a non-negative value. The formal definition of the \(L^p\) norm is
\[\|\|x\|\|_p = \left(\sum_i \lvert x_i \rvert^p\right)^{\frac{1}{p}}\]
for \(p \in \mathbb{R},\ p \geq 1\).
Intuitively, the norm of a vector \(x\) measures the distance between the point \(x\) and the origin.
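To make this concrete, here is a minimal NumPy sketch of the \(L^p\) norm; the helper name lp_norm is my own, and np.linalg.norm is used only as a cross-check:

```python
import numpy as np

def lp_norm(x, p):
    """L^p norm: (sum_i |x_i|^p)^(1/p), defined for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
print(lp_norm(x, 2))         # 5.0, the Euclidean norm
print(np.linalg.norm(x, 2))  # 5.0, NumPy's built-in agrees
print(lp_norm(x, 3))         # 4.497..., the L^3 norm
```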
More rigorous definition of Norm
More rigorously, a norm is any function \(f\) that satisfies the following three properties:
- \[f(x)=0 \Rightarrow x=0\]
- \[f(x+y) \leq f(x)+f(y)\]
- \[\forall \alpha \in \mathbb{R}, f(\alpha x)=\lvert \alpha \rvert f(x)\]
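These properties are easy to sanity-check numerically. A small sketch of my own, using the \(L^2\) norm as the function \(f\):

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])
alpha = -2.5
norm = np.linalg.norm  # L^2 norm by default

print(norm(np.zeros(3)))                                  # 0.0: only the zero vector maps to 0
print(norm(x + y) <= norm(x) + norm(y))                   # True: triangle inequality
print(np.isclose(norm(alpha * x), abs(alpha) * norm(x)))  # True: absolute homogeneity
```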
L2 Norm (Euclidean Norm)
The \(L^2\) norm is the one we are most familiar with, especially in machine learning. Setting \(p=2\) in the equation above gives
\[L^2\ norm: \|\|x\|\|_2 = \sqrt{\sum_i \lvert x_i \rvert^2}\]
It is also called the Euclidean norm, since it is simply the Euclidean distance from the origin to the point \(x\). Because the \(L^2\) norm is used so frequently in machine learning, it is often written as \(\|\|x\|\|\), with the subscript 2 omitted. Note that it is the squared \(L^2\) norm, not the \(L^2\) norm itself, that can be calculated as \(x^{\top}x\).
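A quick NumPy check of that last point (my own sketch):

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x))       # 5.0, the L^2 norm
print(x @ x)                   # 25.0, x^T x gives the *squared* L^2 norm
print(np.linalg.norm(x) ** 2)  # 25.0, same value
```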
Also, the dot product of two vectors can be expressed in terms of norms: \(x^{\top}y = \|\|x\|\|_2\|\|y\|\|_2\cos{\theta}\), where \(\theta\) is the angle between the two vectors.
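This identity also lets us recover the angle between two vectors, as in this small sketch of mine:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))  # 45.0, the angle between x and y
```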
Often, the squared \(L^2\) norm is more convenient to work with than the \(L^2\) norm itself, because the derivative of the squared \(L^2\) norm with respect to each element of \(x\) depends only on the corresponding element (\(\partial {\|\|x\|\|_2}^2 / \partial x_i = 2x_i\)), whereas the derivatives of the \(L^2\) norm depend on the entire vector.
\[Squared\ L^2\ norm: {\|\|x\|\|_2}^2 = \sum_i \lvert x_i \rvert^2\]
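To see this element-wise dependence concretely, here is a small sketch of mine comparing the two gradients:

```python
import numpy as np

x = np.array([3.0, -4.0])

grad_squared = 2 * x             # each entry 2 * x_i depends only on x_i
grad_l2 = x / np.linalg.norm(x)  # each entry x_i / ||x||_2 depends on the whole vector

print(grad_squared)  # [ 6. -8.]
print(grad_l2)       # [ 0.6 -0.8]
```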
L1 Norm (Manhattan Norm)
In many contexts, the squared \(L^2\) norm may be undesirable because it increases very slowly near the origin. This is a problem when it is important to discriminate between elements that are exactly \(0\) and elements that are small but nonzero. In these cases, the \(L^1\) norm is often used instead, since it grows at the same rate in all locations: every time an element of \(x\) moves away from \(0\) by \(\epsilon\), the \(L^1\) norm increases by \(\epsilon\).
\[L^1\ norm: {\|\|x\|\|_1} = \sum_i \lvert x_i \rvert\]
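A small numeric sketch of my own showing the difference near \(0\):

```python
import numpy as np

for eps in [1e-1, 1e-2, 1e-3]:
    x = np.array([eps])
    print(np.sum(np.abs(x)),  # L^1 norm: exactly eps, grows linearly
          np.sum(x ** 2))     # squared L^2 norm: eps^2, nearly flat near 0
```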
Max Norm
Another norm that commonly appears in machine learning is the \(L^{\infty}\) norm, also known as the max norm, which is the absolute value of the element with the largest magnitude in the vector.
\[Max\ norm: {\|\|x\|\|_{\infty}} = \max_i \lvert x_i \rvert\]
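In NumPy this is a one-liner (my own sketch):

```python
import numpy as np

x = np.array([3.0, -7.0, 2.0])
print(np.max(np.abs(x)))          # 7.0, straight from the definition
print(np.linalg.norm(x, np.inf))  # 7.0, the built-in equivalent
```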
Frobenius Norm
What about measuring the size of a matrix instead of a vector? In deep learning, the most common way is the Frobenius norm, which is analogous to the \(L^2\) norm of a vector.
\[Frobenius\ norm: \|\|A\|\|_F = \sqrt{\sum_{i,j} A_{i,j}^{2}}\]
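A final sketch of mine, checking the definition against NumPy's built-in:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.sqrt(np.sum(A ** 2)))   # 5.477..., directly from the definition
print(np.linalg.norm(A, 'fro'))  # same value via the built-in Frobenius norm
```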