Linear algebra is a fundamental concept for applying and understanding statistical methods in more than one variable.
More specifically, in data assimilation, algorithm design is strongly shaped by numerical stability and scalability; for this reason, standard numerical linear algebra tools (e.g., numpy) will be utilized throughout.
We will not belabor the details and proofs of these results, as these can be found in other classes / books devoted to the subject.
Instead, these lectures will survey a variety of results from an applied perspective, providing intuition for how and why these tools are used.
We will start by introducing the basic characteristics of vectors / matrices, their operations and their implementation in Numpy.
Along the way, we will introduce some essential language and concepts about vector spaces, inner product spaces, linear transformations and important tools in applied matrix algebra.
To accommodate the flexibility of the Python programming environment, some conventions around method namespaces and scope have been adopted.
We import numpy as a new object to call methods from:
import numpy as np
The tools we use from numpy will now be called from np as an object, with the form of the call looking like np.method()
Numpy has a method known as “array”;
my_vector = np.array([1,2,3])
my_vector
array([1, 2, 3])
np.shape(my_vector)
(3,)
The array is the standard object in numpy that handles both vector and matrix objects:
my_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my_array
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
np.shape(my_array)
(3, 3)
More generally, numpy arrays function as mathematical multi-linear arrays in arbitrary dimensions:
my_3D_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
my_3D_array
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
np.shape(my_3D_array)
(2, 2, 2)
Mathematically, we will define the vector notations \( \pmb{x} \in \mathbb{R}^{N_x} \), matrix notations \( \mathbf{A} \in \mathbb{R}^{N_x \times N_x} \), and matrix-slice notations \( \mathbf{A}^j \in \mathbb{R}^{N_x} \) as
\[ \begin{align} \pmb{x} := \begin{pmatrix} x_1 \\ \vdots \\ x_{N_x} \end{pmatrix} & & \mathbf{A} := \begin{pmatrix} a_{1,1} & \cdots & a_{1, N_x} \\ \vdots & \ddots & \vdots \\ a_{N_x,1} & \cdots & a_{N_x, N_x} \end{pmatrix} & & \mathbf{A}^j := \begin{pmatrix} a_{1,j} \\ \vdots \\ a_{N_x, j} \end{pmatrix} \end{align} \]
Elements of the matrix \( \mathbf{A} \) may further be referred to by index in row and column as
\[ \mathbf{A}_{i,j} = \mathbf{A}\left[i,j\right] = a_{i,j} \]
In numpy, we can make a reference to sub-arrays analogously with the : slice notation:
my_array[0:2,0:3]
array([[1, 2, 3],
[4, 5, 6]])
my_array[:,0]
array([1, 4, 7])
Because arrays are understood as mathematical objects, they have inherent methods for mathematical computation.
Suppose we have two vectors
\[ \begin{align} \pmb{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \pmb{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]
We can perform basic mathematical operations on these element-wise as follows
\[ \begin{align} \pmb{x} + \pmb{y} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{pmatrix} & & \pmb{x}\circ\pmb{y} = \begin{pmatrix} x_1 * y_1 \\ x_2 * y_2 \\ x_3 * y_3 \end{pmatrix} \end{align} \]
Both of these operations generalize to vectors of arbitrary length.
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
x+y
array([5, 7, 9])
x*y
array([ 4, 10, 18])
The simple, element-wise multiplication and addition of vectors can be performed on any arrays of matching dimension as above.
This type of multiplication is known as the Schur product of arrays.
The Schur product of arrays
Let \( \mathbf{A},\mathbf{B} \in \mathbb{R}^{N \times M} \) be arrays of arbitrary dimension with \( N,M \geq 1 \). The Schur product is defined \[ \begin{align} \mathbf{A}\circ \mathbf{B}:= \begin{pmatrix}a_{1,1} * b_{1,1} & \cdots & a_{1,M}* b_{1,M} \\ \vdots & \ddots & \vdots\\ a_{N,1}* b_{N,1} & \cdots & a_{N,M} * b_{N,M} \end{pmatrix} \end{align} \]
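As a quick illustration in numpy (the matrices below are arbitrary values chosen for the example), the * operator applied to two arrays of matching shape computes exactly this entry-wise product:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A * B                        # Schur (element-wise) product
# array([[ 5, 12],
#        [21, 32]])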
The array-valued Schur product is not the most widely-used array product; the inner product and the standard matrix product, introduced next, are far more common.
Notice that the two vectors \( \pmb{x} \) and \( \pmb{y} \) introduced previously were defined as column vectors
\[ \begin{align} \pmb{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \pmb{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]
The transpose of \( \pmb{x} \) is defined as the row vector,
\[ \begin{align} \pmb{x}^\top = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \in \mathbb{R}^{1 \times 3} \end{align} \]
The standard, Euclidean vector inner product is defined for the vectors \( \pmb{x} \) and \( \pmb{y} \) as follows
\[ \begin{align} \pmb{x}^\top \pmb{y} = x_1 * y_1 + x_2 * y_2 + x_3 * y_3 \end{align} \]
That is, we take each element of the row vector \( \pmb{x}^\top \), multiply it by the corresponding element of the column vector \( \pmb{y} \), and take the sum of these products.
This generalizes to vectors of arbitrary length \( N_x \) as,
\[ \begin{align} \pmb{x}^\top \pmb{y} = \sum_{i=1}^{N_x} x_i * y_i \end{align} \]
The Euclidean inner product
For two vectors \( \pmb{x},\pmb{y} \in\mathbb{R}^{N_x} \), the Euclidean inner product is given as \[ \begin{align} \langle \pmb{x}, \pmb{y}\rangle := \pmb{x}^\top \pmb{y} = \sum_{i=1}^{N_x} x_i * y_i \end{align} \]
Euclidean norm
Let \( \pmb{x}\in\mathbb{R}^{N_x} \). The Euclidean norm of \( \pmb{x} \) is defined \[ \begin{align} \parallel \pmb{x}\parallel := \sqrt{ \sum_{i=1}^{N_x} x_i^2} \equiv \sqrt{\pmb{x}^\top\pmb{x}} \end{align} \]
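As a minimal numpy check, reusing the vector x defined earlier, the norm can be computed directly or from the definition (np.linalg.norm defaults to the Euclidean norm for one-dimensional arrays):
np.linalg.norm(x)           # Euclidean norm of x = (1, 2, 3)
# 3.7416573867739413        (the square root of 14)
np.sqrt(np.sum(x * x))      # equivalent computation from the definition
# 3.7416573867739413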
It is important to note that there are other distances that can be defined on \( \mathbb{R}^{N_x} \) different from the Euclidean distance; here, however, we will work primarily with the Euclidean case.
Note, it can be shown that the Euclidean inner product satisfies
\[ \begin{align} \pmb{x}^\top\pmb{y} = \parallel \pmb{x} \parallel * \parallel \pmb{y} \parallel \cos\left(\theta\right), \end{align} \] where \( \theta \) is the angle between the two vectors.
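As a sketch, the angle \( \theta \) between the earlier vectors x and y can then be recovered by inverting the cosine (the clip guards against round-off pushing the ratio slightly outside of [-1, 1]):
cos_theta = np.sum(x * y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))    # angle in radians
# cos_theta is approximately 0.9746, theta is approximately 0.2257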
Following our previous example, we will demonstrate the array transpose function and the dot product:
Recall that x had the following dimensions:
np.shape(x)
(3,)
Comparing x and its transpose, we see:
x
array([1, 2, 3])
np.transpose(x)
array([1, 2, 3])
This is due to the fact that numpy does not distinguish between row and column vectors; both are represented as one-dimensional arrays.
The transpose() function extends, however, to two-dimensional arrays in the usual fashion.
The inner product of x and y can be computed in several equivalent ways:
x.dot(y)
32
np.sum(x*y)
32
np.inner(x,y)
32
x @ y
32
The @ notation refers to general matrix multiplication, which we will discuss shortly.
Notice that the equation
\[ \begin{align} \pmb{x}^\top\pmb{y} = \parallel \pmb{x} \parallel * \parallel \pmb{y} \parallel \cos\left(\theta\right), \end{align} \]
generalizes the idea of a perpendicular angle between lines; this motivates the following definition.
Orthogonal vectors
We say that two vectors \( \pmb{x},\pmb{y} \) are orthogonal if and only if \[ \begin{align} \pmb{x}^\top \pmb{y} = 0 \end{align} \]
Notice that with scalar / vector multiplication defined as
\[ \tilde{\pmb{x}}:= \alpha * \pmb{x}:= \begin{pmatrix} \alpha * x_1 \\ \vdots \\ \alpha * x_{N_x}\end{pmatrix} \]
then
\[ \begin{align} \tilde{\pmb{x}}^\top \pmb{y} = 0 & & \Leftrightarrow & & \pmb{x}^\top \pmb{y} = 0 \end{align} \]
This brings us to the important notions of linear combinations and subspaces.
Linear combination
Let \( n\geq 1 \) be an arbitrary integer, \( \alpha_i\in\mathbb{R} \) and \( \pmb{x}_i\in\mathbb{R}^{N_x} \) for each \( i=1,\cdots,n \). Then \[ \begin{align} \pmb{x} := \sum_{i=1}^n \alpha_i \pmb{x}_i \end{align} \] is a linear combination of the vectors \( \{\pmb{x}_i\}_{i=1}^{n} \).
Subspace
The collection of vectors \( V\subset \mathbb{R}^{N_x} \) is denoted a subspace if and only if, for any collection of vectors \( \pmb{x}_i\in V \) and scalars \( \alpha_i\in\mathbb{R} \), their linear combination satisfies \[ \sum_{i=1}^n \alpha_i \pmb{x}_i = \pmb{x} \in V. \]
With the above linear combinations in mind, we will use the following notation
\[ \begin{align} \mathrm{span}\{\pmb{x}_i\}_{i=1}^n := \left\{\pmb{x}\in\mathbb{R}^{N_x} : \exists\text{ } \alpha_i \text{ for which } \pmb{x}= \sum_{i=1}^{n}\alpha_i \pmb{x}_i\right\} . \end{align} \]
It can be readily seen then that the span of any collection of vectors is a subspace by construction.
Linear independence / dependence
Let \( \pmb{x}\in\mathbb{R}^{N_x} \) and \( \pmb{x}_i\in\mathbb{R}^{N_x} \) for \( i=1,\cdots,n \). The vector \( \pmb{x} \) is linearly independent (respectively dependent) with the collection \( \{\pmb{x}_i\}_{i=1}^{n} \) if and only if \( \pmb{x}\notin \mathrm{span}\{\pmb{x}_i\}_{i=1}^{n} \) (respectively \( \pmb{x}\in \mathrm{span}\{\pmb{x}_i\}_{i=1}^{n} \)).
It is clear then that, e.g., \( \pmb{x}_1 \) is trivially linearly dependent with the collection \( \{\pmb{x}_i\}_{i=1}^{n} \), since \( \pmb{x}_1 \in \mathrm{span}\{\pmb{x}_i\}_{i=1}^{n} \).
A related idea is whether, for some vector \( \pmb{x}\in \mathrm{span}\{\pmb{x}_i\}_{i=1}^n \), the choice of the scalar coefficients \( \alpha_i \) defining \( \pmb{x}=\sum_{i=1}^n \alpha_i \pmb{x}_i \) is unique.
Bases
Let \( V\subset \mathbb{R}^{N_x} \) be a subspace. A collection \( \{\pmb{x}_i\}_{i=1}^{n} \) is said to be a basis for \( V \) if \( V = \mathrm{span}\{\pmb{x}_i\}_{i=1}^n \) and if \[ \begin{align} \pmb{0} = \sum_{i=1}^n \alpha_i \pmb{x}_i \end{align} \] holds if and only if \( \alpha_i=0 \) \( \forall i \).
In particular, a choice of a basis for \( V \) gives a unique coordinatization of any vector \( \pmb{x}\in V \).
\[ \begin{align} \pmb{x}=\sum_{i=1}^n \alpha_i \pmb{x}_i = \sum_{i=1}^n \beta_i \pmb{x}_i & & \Leftrightarrow & &\pmb{0} = \sum_{i=1}^n \left(\alpha_i - \beta_i \right) \pmb{x}_i, \end{align} \] which, by the linear independence of the basis vectors, holds if and only if \( \beta_i = \alpha_i \) for all \( i \).
Orthogonal (Orthonormal) bases
Let \( \{\pmb{x}_i\}_{i=1}^n \) define a basis for \( V \subset \mathbb{R}^{N_x} \). The basis is said to be orthogonal if and only if each pair of basis vectors is orthogonal. A basis is said to be orthonormal if, moreover, each basis vector has norm equal to one. In particular, for an orthonormal basis, if \[ \begin{align} \pmb{x} = \sum_{i=1}^n \alpha_i \pmb{x}_i \end{align} \] then \( \alpha_i = \pmb{x}_i^\top \pmb{x} \).
The above property shows that we can recover the “projection” coefficient \( \alpha_i \) of \( \pmb{x} \) into \( V \) using the inner product of the vector \( \pmb{x} \) with the basis vector \( \pmb{x}_i \).
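The following minimal numpy sketch illustrates this coefficient recovery with a hand-built orthonormal basis of the plane (the basis and the vector are arbitrary choices for illustration):
b1 = np.array([1.0, 1.0]) / np.sqrt(2)    # orthonormal basis vectors of the plane
b2 = np.array([1.0, -1.0]) / np.sqrt(2)
p = np.array([3.0, 5.0])                  # arbitrary vector to expand in the basis
alpha1 = b1 @ p                           # projection coefficients alpha_i = x_i^T x
alpha2 = b2 @ p
alpha1 * b1 + alpha2 * b2                 # recovers p, up to floating-point round-off
# array([3., 5.])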
Orthogonal subspaces
The subspaces \( W,V\subset \mathbb{R}^{N_x} \) are orthogonal if and only if for every \( \pmb{y}\in W \) and \( \pmb{x}\in V \), \[ \begin{align} \pmb{y}^\top \pmb{x} = 0. \end{align} \] Orthogonal subspaces will be denoted with \( W \perp V \).
With these constructions in mind, we can now introduce two of the most fundamental tools of inner product spaces: the orthogonal projection and the Gram-Schmidt process.
[Image courtesy of ZackaryCW, CC BY-SA 4.0, via Wikimedia Commons]
Orthogonal projection lemma
Let \( \langle \circ , \circ \rangle:\mathbb{R}^{N_x}\times\mathbb{R}^{N_x} \rightarrow \mathbb{R} \) be an inner product, and let \( V\subset \mathbb{R}^{N_x} \) be an arbitrary subspace. If \( \pmb{x}\in \mathbb{R}^{N_x} \), there exists a unique projection \( \boldsymbol{\Pi}_V\pmb{x}\in V \) satisfying \[ \begin{align} \mathrm{min}_{\pmb{y}\in V} \parallel \pmb{x} - \pmb{y}\parallel = \parallel \pmb{x} - \boldsymbol{\Pi}_V \pmb{x}\parallel, \end{align} \] if and only if \[ \begin{align} \langle \pmb{x} - \boldsymbol{\Pi}\pmb{x}, \pmb{x}'\rangle = 0 \quad \forall \pmb{x}'\in V. \end{align} \]
This states that the point which minimizes the distance between \( \pmb{x} \) and the subspace \( V \) is precisely the point \( \boldsymbol{\Pi}_V \pmb{x} \in V \) for which the residual \( \pmb{x} - \boldsymbol{\Pi}_V \pmb{x} \) is orthogonal to every vector in \( V \).
Therefore, the residual forms a \( 90^\circ \) angle with \( V \), or equivalently,
\[ \begin{align} \mathrm{span}\left\{\pmb{x} - \boldsymbol{\Pi}_V \pmb{x}\right\} \perp V. \end{align} \]
Suppose \( \pmb{y}\in\mathbb{R}^{N_x} \) is an arbitrary, norm one vector and let \( V := \mathrm{span}\{\pmb{y}\} \); this implies
\[ \begin{align} 0 = \pmb{y}^\top \left(\pmb{x} - \boldsymbol{\Pi}\pmb{x}\right) & & \Leftrightarrow & & \pmb{y}^\top \pmb{x} = \pmb{y}^\top \boldsymbol{\Pi}\pmb{x}. \end{align} \]
But \( \pmb{y} \) is an orthonormal basis for \( V \), so that \( \boldsymbol{\Pi}\pmb{x} := \left(\pmb{y}^\top \pmb{x}\right)\pmb{y} \).
Furthermore, if \( \pmb{x} \) is not a scalar multiple of \( \pmb{y} \), then \( \left\{\pmb{x} - \left(\pmb{y}^\top \pmb{x}\right)\pmb{y}, \pmb{y}\right\} \) is an orthogonal basis for \( \mathrm{span}\left\{\pmb{x},\pmb{y}\right\} \).
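A minimal numpy sketch of this one-vector case follows (the vectors are arbitrary choices; the unit-norm requirement is enforced by normalizing):
w = np.array([1.0, 2.0, 2.0])
w = w / np.linalg.norm(w)         # unit-norm vector spanning V = span{w}
v = np.array([4.0, 0.0, 1.0])     # arbitrary vector to project
proj_v = (w @ v) * w              # orthogonal projection of v onto V
w @ (v - proj_v)                  # the residual is orthogonal to w
# approximately 0.0 (zero up to round-off)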
[Image courtesy of Lucas Vieira, Public domain, via Wikimedia Commons]
Suppose we have a collection of linearly independent vectors \( \{\pmb{x}_i\}_{i=1}^n \), and denote the Gram-Schmidt basis constructed from these vectors \( \{\pmb{e}_i\}_{i=1}^n \).
Notice, the Gram-Schmidt process describes the coordinates of the original vectors as
\[ \begin{align} \pmb{x}_1 &= \left(\pmb{e}_1^\top \pmb{x}_1 \right) \pmb{e}_1 \\ \pmb{x}_2 &= \left(\pmb{e}_1^\top \pmb{x}_2\right) \pmb{e}_1 + \left(\pmb{e}_2^\top \pmb{x}_2\right) \pmb{e}_2 \\ \vdots & \\ \pmb{x}_n &= \sum_{i=1}^n \left(\pmb{e}_i^\top \pmb{x}_n\right) \pmb{e}_i. \end{align} \]
We can thus define the matrices \( \mathbf{R} \) and \( \mathbf{Q} \) such that
\[ \begin{align} \mathbf{R} := \begin{pmatrix} \pmb{e}_1^\top \pmb{x}_1 & \pmb{e}_1^\top \pmb{x}_2 & \cdots & \pmb{e}_1^\top \pmb{x}_n \\ 0 & \pmb{e}_2^\top \pmb{x}_2 & \cdots & \pmb{e}_2^\top \pmb{x}_n \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \pmb{e}_n^\top \pmb{x}_n \end{pmatrix} & & \mathbf{Q}^j := \pmb{e}_j \end{align} \]
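A minimal sketch of this construction in numpy is given below (for illustration only; it assumes the columns of the input are linearly independent, and it is the classical rather than the numerically stabilized variant used by library routines):
def gram_schmidt_qr(A):
    """Classical Gram-Schmidt factorization A = Q R for linearly independent columns."""
    N, M = A.shape
    Q = np.zeros((N, M))
    R = np.zeros((M, M))
    for j in range(M):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # coefficient e_i^T x_j
            v = v - R[i, j] * Q[:, i]       # remove the component along e_i
        R[j, j] = np.linalg.norm(v)         # remaining length gives the diagonal entry
        Q[:, j] = v / R[j, j]               # normalize to obtain e_j
    return Q, R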
QR factorization
Let \( \mathbf{A}\in \mathbb{R}^{N\times M} \). There exists a unique pair of matrices \( \mathbf{R} \) and \( \mathbf{Q} \) defined by the Gram-Schmidt process such that, \[ \begin{align} \mathbf{A} \equiv \mathbf{Q} \mathbf{R} \end{align} \] where the above refers to matrix multiplication.
We haven't yet introduced matrix multiplication fully – this will be our next topic.
However, it is worth noting some important properties of the QR factorization: by construction, the columns of \( \mathbf{Q} \) are orthonormal and \( \mathbf{R} \) is upper triangular.
These are simple but powerful results that are implied by the construction.
In practice, however, the Gram-Schmidt process is rarely used explicitly; optimized library routines are used instead.
To see the QR factorization in numpy, consider the following:
my_array
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Q, R = np.linalg.qr(my_array)
Q
array([[-0.12309149, 0.90453403, 0.40824829],
[-0.49236596, 0.30151134, -0.81649658],
[-0.86164044, -0.30151134, 0.40824829]])
R
array([[-8.12403840e+00, -9.60113630e+00, -1.10782342e+01],
[ 0.00000000e+00, 9.04534034e-01, 1.80906807e+00],
[ 0.00000000e+00, 0.00000000e+00, -8.88178420e-16]])
my_array - Q @ R
array([[-4.44089210e-16, -3.10862447e-15, -4.88498131e-15],
[ 0.00000000e+00, -8.88178420e-16, -2.66453526e-15],
[ 8.88178420e-16, -1.77635684e-15, -3.55271368e-15]])
We have now introduced a few ideas around matrix multiplication, which we will formalize.
As a simple case that leads to general matrix multiplication, let us consider matrix / vector multiplication.
Let's suppose that
\[ \begin{align} \mathbf{N} \in \mathbb{R}^{N \times N_x} & & \pmb{x} \in \mathbb{R}^{N_x \times 1} \end{align} \]
We will suppose that we can write the following
\[ \mathbf{N} = \begin{pmatrix} \pmb{n}_1^\top \\ \vdots \\ \pmb{n}_N^\top \end{pmatrix} \] where each \( \pmb{n}_i^\top \in \mathbb{R}^{1 \times N_x} \) is a row of the matrix \( \mathbf{N} \).
Recall the vector multiplication,
\[ \pmb{n}_i^\top \pmb{x} = \sum_{j=1}^{N_x} n_{i,j} * x_{j} \]
The product of the matrix and the vector is given as,
\[ \mathbf{N}\pmb{x} = \begin{pmatrix} \pmb{n}^\top_1 \pmb{x} \\ \vdots \\ \pmb{n}_N^\top \pmb{x} \end{pmatrix} \]
where each entry \( i =1, \cdots ,N \) of the resultant vector is given by the inner product of the \( i \)-th row of \( \mathbf{N} \) with the column vector \( \pmb{x} \).
This formulation generalizes immediately to the case in which we extend \( \pmb{x} \) into an array with \( N_x \) rows, but an arbitrary number of columns.
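As a small numpy check of this row-wise picture, using my_array and x from earlier (iterating over a two-dimensional array yields its rows):
my_array @ x                                 # matrix-vector product
# array([14, 32, 50])
np.array([row @ x for row in my_array])      # the same result, computed row by row
# array([14, 32, 50])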
As noted already, there are several very important classes of matrices that are useful for computation.
Diagonal matrices are the simplest case, for which general matrix multiplication and the Schur product on square matrices agree.
Diagonal matrices
The matrix \( \mathbf{D}\in \mathbb{R}^{N \times M} \) is said to be diagonal if and only if \[ \begin{align} \mathbf{D}[i,j]=0 & & \forall i \neq j \end{align} \] In particular, let \( d = \min\{N,M\} \), and define \[ \begin{align} \pmb{d} := \begin{pmatrix} \mathbf{D}[1,1] \\ \vdots \\ \mathbf{D}[d,d]\end{pmatrix} \end{align} \] We will sometimes refer to the diagonal matrix as \( \mathbf{D}=\mathrm{diag}(\pmb{d}) \), where the expansion of the zero entries to the proper dimensions will be inferred from the context.
In particular, we will often consider two square diagonal matrices, \( \mathrm{diag}(\pmb{m}),\mathrm{diag}(\pmb{n})\in\mathbb{R}^{M \times M} \),
\[ \begin{align} \mathrm{diag}(\pmb{m}) := \begin{pmatrix} m_{1,1} & 0 & \cdots & 0 \\ 0 & m_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & m_{M,M} \end{pmatrix} & & \mathrm{diag}(\pmb{n}) := \begin{pmatrix} n_{1,1} & 0 & \cdots & 0 \\ 0 & n_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & n_{M,M} \end{pmatrix} \end{align} \]
It is easy to verify, following row-column multiplication, that
\[ \begin{align} \mathrm{diag}(\pmb{m})\mathrm{diag}(\pmb{n}) \equiv \mathrm{diag}(\pmb{m}) \circ \mathrm{diag}(\pmb{n})\equiv \mathrm{diag}(\pmb{m}\circ \pmb{n}) \end{align} \]
X = np.diag(x)
Y = np.diag(y)
np.diag(x*y)
array([[ 4, 0, 0],
[ 0, 10, 0],
[ 0, 0, 18]])
X @ Y
array([[ 4, 0, 0],
[ 0, 10, 0],
[ 0, 0, 18]])
np.diag(X @ Y)
array([ 4, 10, 18])
Triangular matrices
Let \( \mathbf{A} \in \mathbb{R}^{N\times N} \); it is known as upper (respectively lower) triangular if and only if its entries satisfy \[ \begin{align} {\color{#d95f02} {\mathbf{A}[i,j] = 0 \quad \forall i > j} } & & \left(\text{respectively } {\color{#1b9e77} {\mathbf{A}[i,j]= 0 \quad \forall j > i} } \right) \end{align} \]
The key property of upper (respectively lower) triangular matrices is that the product of two upper (respectively lower) triangular matrices is also upper (respectively lower) triangular.
This property thus maintains the sparsity of triangular matrices, and makes computations with them efficient.
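As a quick numpy illustration, np.triu extracts the upper triangular part of an array, and the product of two upper triangular matrices is again upper triangular (the values below are arbitrary):
U1 = np.triu(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))    # upper triangular part
U2 = np.triu(np.ones((3, 3), dtype=int))                     # upper triangular matrix of ones
U1 @ U2                                                      # the product remains upper triangular
# array([[ 1,  3,  6],
#        [ 0,  5, 11],
#        [ 0,  0,  9]])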
Orthogonal matrices
Let \( \mathbf{U}\in\mathbb{R}^{N_x\times N_x} \) with columns defined \( \mathbf{U}^j=\pmb{u}_j \). The matrix \( \mathbf{U} \) is said to be orthogonal if and only if \[ \begin{align} \pmb{u}_j^\top \pmb{u}_i = \begin{cases} 0 & \text{ for } i \neq j\\ 1 & \text{ for } i = j \end{cases}. \end{align} \]
It is a direct consequence of matrix multiplication that for orthogonal matrices
\[ \begin{align} \mathbf{U}^\top \mathbf{U} = \mathbf{U}\mathbf{U}^\top= \mathbf{I} = \mathrm{diag}(\pmb{1}) \end{align} \]
where \( \mathbf{I} \) denotes the identity matrix and \( \pmb{1} \) the vector with all entries equal to one.
Notice that the columns \( \{\pmb{u}_j\}_{j=1}^{N_x} \) actually define an orthonormal basis for \( \mathbb{R}^{N_x} \) by the above properties.
Particularly, for an arbitrary \( \pmb{x} \) we can write,
\[ \begin{align} \pmb{x} = \mathbf{I}\pmb{x} &\equiv \mathbf{U} \mathbf{U}^\top \pmb{x}; \end{align} \]
The statement from the last slide then reads,
\[ \begin{align} \pmb{x} &\equiv \begin{pmatrix} \pmb{u}_1 & \cdots & \pmb{u}_{N_x} \end{pmatrix} \begin{pmatrix} \left(\pmb{u}_1^\top \pmb{x}\right) \\ \vdots \\ \left(\pmb{u}_{N_x}^\top \pmb{x}\right) \end{pmatrix} \\ &= \sum_{i=1}^{N_x} \left(\pmb{u}_i^\top \pmb{x}\right) \pmb{u}_i \end{align} \]
giving the explicit basis representation.
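As a sketch, an orthogonal matrix can be obtained in numpy from the QR factorization of a generic square matrix, and the basis representation above can then be verified numerically (random values with a fixed seed; the equalities hold up to round-off):
rng = np.random.default_rng(0)                      # seeded random generator for reproducibility
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # U has orthonormal columns
v = np.array([1.0, 2.0, 3.0])
np.allclose(U.T @ U, np.eye(3))
# True  (U^T U = I)
np.allclose(U @ (U.T @ v), v)
# True  (the basis expansion recovers v)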
A related statement can be considered when we have a matrix of the form,
\[ \begin{align} \mathbf{U}\in\mathbb{R}^{N_x \times n} & & \mathbf{U}^j=\pmb{u}_j \in\mathbb{R}^{N_x} \end{align} \]
for some \( n < N_x \), where the columns \( \pmb{u}_j \) are orthonormal.
Matrix multiplication implies the following relationship
\[ \begin{align} \mathbf{U}^\top \mathbf{U} = \mathbf{I} \in \mathbb{R}^{n\times n} & & \mathbf{I} \neq \mathbf{U}\mathbf{U}^\top \in\mathbb{R}^{N_x \times N_x} \end{align} \]
In particular, the different arrangements of the transpose yield matrices with completely different dimensions.
Let \( \pmb{x} \) be an arbitrary vector in \( \mathbb{R}^{N_x} \), in analogy to the last slide we will write
\[ \begin{align} \mathbf{U}\mathbf{U}^\top \pmb{x} &\equiv \begin{pmatrix} \pmb{u}_1 & \cdots & \pmb{u}_n \end{pmatrix} \begin{pmatrix} \left(\pmb{u}_1^\top \pmb{x}\right) \\ \vdots \\ \left(\pmb{u}_n^\top \pmb{x}\right) \end{pmatrix} \\ &= \sum_{i=1}^n \left(\pmb{u}_i^\top \pmb{x}\right) \pmb{u}_i \\ &\equiv \boldsymbol{\Pi}_{\mathrm{span}\left\{\pmb{u}_i\right\}_{i=1}^n} \pmb{x} \end{align} \]
where \( \boldsymbol{\Pi}_{\mathrm{span}\left\{\pmb{u}_i\right\}_{i=1}^n} \) is the orthogonal projection into the subspace defined by the column span of \( \mathbf{U} \).
We thus write
\[ \begin{align} \pmb{x} = \left(\mathbf{I} - \mathbf{U}\mathbf{U}^\top\right)\pmb{x} + \mathbf{U}\mathbf{U}^\top\pmb{x} \end{align} \]
giving the explicit orthogonal decomposition with respect to an arbitrary subspace.
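Continuing in the same spirit, the sketch below builds an explicit projector from n = 2 orthonormal columns in three dimensions and checks the decomposition above (random values with a fixed seed; the equalities hold up to round-off):
rng = np.random.default_rng(1)
Q_full, _ = np.linalg.qr(rng.standard_normal((3, 3)))
U_n = Q_full[:, :2]                       # N_x = 3 rows, n = 2 orthonormal columns
P = U_n @ U_n.T                           # orthogonal projector onto the column span of U_n
v = np.array([1.0, 2.0, 3.0])
np.allclose(U_n.T @ U_n, np.eye(2))
# True  (the n x n identity)
np.allclose(P @ P, P)
# True  (a projector is idempotent)
np.allclose((np.eye(3) - P) @ v + P @ v, v)
# True  (the orthogonal decomposition recovers v)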
One of the generic ways to construct an arbitrary projector into some subspace \( V\subset \mathbb{R}^{N_x} \) is thus to draw an arbitrary spanning set, \( \mathrm{span}\{\pmb{x}_i\}_{i=1}^n = V \), and collect these vectors as the columns of a matrix.
The QR algorithm then constructs the matrix \( \mathbf{U}:= \mathbf{Q} \in \mathbb{R}^{N_x \times n} \), from which the orthogonal projector into \( V \) is given by \( \mathbf{U}\mathbf{U}^\top \).
Orthogonal matrices are obviously useful, as they provide explicit forms of orthogonal projectors.
The subject of orthogonal matrices is, in fact, incredibly rich, for a variety of reasons that go beyond our overall scope.
However, we will introduce two other properties of central interest.
Suppose that we have arbitrary vectors \( \pmb{x},\pmb{y}\in\mathbb{R}^{N_x} \) and an orthogonal matrix \( \mathbf{U}\in\mathbb{R}^{N_x \times N_x} \).
Notice that for \( \tilde{\pmb{x}}:= \mathbf{U}\pmb{x} \) and \( \tilde{\pmb{y}}:= \mathbf{U}\pmb{y} \),
\[ \begin{align} \tilde{\pmb{x}}^\top \tilde{\pmb{y}} & = \pmb{x}^\top \mathbf{U}^\top \mathbf{U} \pmb{y} = \pmb{x}^\top \pmb{y} \end{align} \]
The above property implies that orthogonal transformations preserve the Euclidean inner product, and thus the Euclidean distance,
\[ \parallel \tilde{\pmb{x}} \parallel = \parallel \pmb{x} \parallel. \]
It turns out that this has to do with the fact that orthogonal matrices are generalizations of rotations of the frame of reference in arbitrary dimensions.
These “rotation” matrices preserve the angles between vectors, as well as distances, where \( \mathbf{U}^\top \) corresponds to the reverse rotation to \( \mathbf{U} \).
Correspondingly, the product of two orthogonal matrices is also orthogonal, and can be considered the composition of rotations.
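As a closing two-dimensional sketch, a rotation matrix by a given angle is orthogonal and preserves inner products and norms (the checks hold up to round-off; the angle and vectors are arbitrary choices):
theta = np.pi / 6                                           # rotate by 30 degrees
U_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])         # two-dimensional rotation matrix
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
np.allclose(U_rot.T @ U_rot, np.eye(2))
# True  (U_rot is orthogonal)
np.allclose((U_rot @ a) @ (U_rot @ b), a @ b)
# True  (the inner product is preserved)
np.allclose(np.linalg.norm(U_rot @ a), np.linalg.norm(a))
# True  (the norm is preserved)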