Linear algebra is a fundamental concept for applying and understanding statistical methods in more than one variable.
More specifically, in data assimilation, algorithm design is strongly shaped by numerical stability and scalability; for this reason, standard numerical linear algebra tools (e.g., numpy) will be utilized throughout.
We will not belabor the details and proofs of these results, as these can be found in other classes / books devoted to the subject.
Instead, these lectures will survey a variety of results from an applied perspective, providing intuition for how and why these tools are used.
We will start by introducing the basic characteristics of vectors / matrices, their operations and their implementation in Numpy.
Along the way, we will introduce some essential language and concepts about vector spaces, inner product spaces, linear transformations and important tools in applied matrix algebra.
To accommodate the flexibility of the Python programming environment, some conventions around method namespaces and scope have been adopted.
We import numpy as a new object to call methods from:
import numpy as np
The tools we use from numpy will now be called from np as an object, with the form of the call looking like np.method()
Numpy has a method known as “array”;
my_vector = np.array([1,2,3])
my_vector
array([1, 2, 3])
np.shape(my_vector)
(3,)
The array is the standard object in numpy that handles both vector and matrix objects:
my_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
my_array
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
np.shape(my_array)
(3, 3)
More generally, numpy arrays function as mathematical multi-linear arrays in arbitrary dimensions:
my_3D_array = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
my_3D_array
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
np.shape(my_3D_array)
(2, 2, 2)
Mathematically, we will define the vector notations \( \pmb{x} \in \mathbb{R}^{N_x} \), matrix notations \( \mathbf{A} \in \mathbb{R}^{N_x \times N_x} \), and matrix-slice notations \( \mathbf{A}^j \in \mathbb{R}^{N_x} \) as
\[ \begin{align} \pmb{x} := \begin{pmatrix} x_1 \\ \vdots \\ x_{N_x} \end{pmatrix} & & \mathbf{A} := \begin{pmatrix} a_{1,1} & \cdots & a_{1, N_x} \\ \vdots & \ddots & \vdots \\ a_{N_x,1} & \cdots & a_{N_x, N_x} \end{pmatrix} & & \mathbf{A}^j := \begin{pmatrix} a_{1,j} \\ \vdots \\ a_{N_x, j} \end{pmatrix} \end{align} \]
Elements of the matrix \( \mathbf{A} \) may further be referred to by index in row and column as
\[ \mathbf{A}_{i,j} = \mathbf{A}\left[i,j\right] = a_{i,j} \]
In numpy, we can make a reference to sub-arrays analogously with the : slice notation:
my_array[0:2,0:3]
array([[1, 2, 3],
[4, 5, 6]])
my_array[:,0]
array([1, 4, 7])
Because arrays are understood as mathematical objects, they have inherent methods for mathematical computation.
Suppose we have two vectors
\[ \begin{align} \pmb{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \pmb{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]
We can perform basic mathematical operations on these element-wise as follows
\[ \begin{align} \pmb{x} + \pmb{y} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{pmatrix} & & \pmb{x}\circ\pmb{y} = \begin{pmatrix} x_1 * y_1 \\ x_2 * y_2 \\ x_3 * y_3 \end{pmatrix} \end{align} \]
Both of these operations generalize to vectors of arbitrary length.
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
x+y
array([5, 7, 9])
x*y
array([ 4, 10, 18])
The simple, element-wise multiplication and addition of vectors can be performed on any arrays of matching dimension as above.
This type of multiplication is known as the Schur product of arrays.
The Schur product of arrays
Let \( \mathbf{A},\mathbf{B} \in \mathbb{R}^{N \times M} \) be arrays of arbitrary dimension with \( N,M \geq 1 \). The Schur product is defined \[ \begin{align} \mathbf{A}\circ \mathbf{B}:= \begin{pmatrix}a_{1,1} * b_{1,1} & \cdots & a_{1,M}* b_{1,M} \\ \vdots & \ddots & \vdots\\ a_{N,1}* b_{N,1} & \cdots & a_{N,M} * b_{N,M} \end{pmatrix} \end{align} \]
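As a quick illustration in numpy (the matrices below are arbitrary values chosen for the example), the * operator applied to two arrays of matching shape computes exactly this entry-wise product:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
A * B                        # Schur (element-wise) product
# array([[ 5, 12],
#        [21, 32]])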
The array-valued Schur product is not the most widely-used array product; the inner product and the standard matrix product, introduced next, are far more common.
Notice that the two vectors \( \pmb{x} \) and \( \pmb{y} \) introduced previously were defined as column vectors
\[ \begin{align} \pmb{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \pmb{y} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]
The transpose of \( \pmb{x} \) is defined as the row vector,
\[ \begin{align} \pmb{x}^\top = \begin{pmatrix} x_1 & x_2 & x_3 \end{pmatrix} \in \mathbb{R}^{1 \times 3} \end{align} \]
The standard, Euclidean vector inner product is defined for the vectors \( \pmb{x} \) and \( \pmb{y} \) as follows
\[ \begin{align} \pmb{x}^\top \pmb{y} = x_1 * y_1 + x_2 * y_2 + x_3 * y_3 \end{align} \]
That is, we take each element of the row vector \( \pmb{x}^\top \), multiply it by the corresponding element of the column vector \( \pmb{y} \), and take the sum of these products.
This generalizes to vectors of arbitrary length \( N_x \) as,
\[ \begin{align} \pmb{x}^\top \pmb{y} = \sum_{i=1}^{N_x} x_i * y_i \end{align} \]
The Euclidean inner product
For two vectors \( \pmb{x},\pmb{y} \in\mathbb{R}^{N_x} \), the Euclidean inner product is given as \[ \begin{align} \langle \pmb{x}, \pmb{y}\rangle := \pmb{x}^\top \pmb{y} = \sum_{i=1}^{N_x} x_i * y_i \end{align} \]
Euclidean norm
Let \( \pmb{x}\in\mathbb{R}^{N_x} \). The Euclidean norm of \( \pmb{x} \) is defined \[ \begin{align} \parallel \pmb{x}\parallel := \sqrt{ \sum_{i=1}^{N_x} x_i^2} \equiv \sqrt{\pmb{x}^\top\pmb{x}} \end{align} \]
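As a minimal numpy check, reusing the vector x defined earlier, the norm can be computed directly or from the definition (np.linalg.norm defaults to the Euclidean norm for one-dimensional arrays):
np.linalg.norm(x)           # Euclidean norm of x = (1, 2, 3)
# 3.7416573867739413        (the square root of 14)
np.sqrt(np.sum(x * x))      # equivalent computation from the definition
# 3.7416573867739413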
It is important to note that there are other distances that can be defined on \( \mathbb{R}^{N_x} \) different from the Euclidean distance; here, however, we will work primarily with the Euclidean case.
Note, it can be shown that the Euclidean inner product satisfies
\[ \begin{align} \pmb{x}^\top\pmb{y} = \parallel \pmb{x} \parallel * \parallel \pmb{y} \parallel \cos\left(\theta\right), \end{align} \] where \( \theta \) is the angle between the two vectors.
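As a sketch, the angle \( \theta \) between the earlier vectors x and y can then be recovered by inverting the cosine (the clip guards against round-off pushing the ratio slightly outside of [-1, 1]):
cos_theta = np.sum(x * y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))    # angle in radians
# cos_theta is approximately 0.9746, theta is approximately 0.2257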
Following our previous example, we will demonstrate the array transpose function and the dot product:
Recall that x had the following dimensions:
np.shape(x)
(3,)
Comparing x and its transpose, we see:
x
array([1, 2, 3])
np.transpose(x)
array([1, 2, 3])
This is due to the fact that numpy does not distinguish between row and column vectors; both are represented as one-dimensional arrays.
The transpose() function extends, however, to two-dimensional arrays in the usual fashion.
The inner product of x and y can be computed in several equivalent ways:
x.dot(y)
32
np.sum(x*y)
32
np.inner(x,y)
32
x @ y
32
The @ notation refers to general matrix multiplication, which we will discuss shortly.
Notice that the equation
\[ \begin{align} \pmb{x}^\top\pmb{y} = \parallel \pmb{x} \parallel * \parallel \pmb{y} \parallel \cos\left(\theta\right), \end{align} \]
generalizes the idea of a perpendicular angle between lines; this motivates the following definition.
Orthogonal vectors
We say that two vectors \( \pmb{x},\pmb{y} \) are orthogonal if and only if \[ \begin{align} \pmb{x}^\top \pmb{y} = 0 \end{align} \]
Notice that with scalar / vector multiplication defined as
\[ \tilde{\pmb{x}}:= \alpha * \pmb{x}:= \begin{pmatrix} \alpha * x_1 \\ \vdots \\ \alpha * x_{N_x}\end{pmatrix} \]
then
\[ \begin{align} \tilde{\pmb{x}}^\top \pmb{y} = 0 & & \Leftrightarrow & & \pmb{x}^\top \pmb{y} = 0 \end{align} \]
This brings us to the important notions of linear combinations and subspaces.
Linear combination
Let \( n\geq 1 \) be an arbitrary integer, \( \alpha_i\in\mathbb{R} \) and \( \pmb{x}_i\in\mathbb{R}^{N_x} \) for each \( i=1,\cdots,n \). Then \[ \begin{align} \pmb{x} := \sum_{i=1}^n \alpha_i \pmb{x}_i \end{align} \] is a linear combination of the vectors \( \{\pmb{x}_i\}_{i=1}^{n} \).
Subspace
The collection of vectors \( V\subset \mathbb{R}^{N_x} \) is denoted a subspace if and only if, for any collection of vectors \( \pmb{x}_i\in V \) and scalars \( \alpha_i\in\mathbb{R} \), their linear combination satisfies \[ \sum_{i=1}^n \alpha_i \pmb{x}_i = \pmb{x} \in V. \]
With the above linear combinations in mind, we will use the following notation
\[ \begin{align} \mathrm{span}\{\pmb{x}_i\}_{i=1}^n := \left\{\pmb{x}\in\mathbb{R}^{N_x} : \exists\text{ } \alpha_i \text{ for which } \pmb{x}= \sum_{i=1}^{n}\alpha_i \pmb{x}_i\right\} . \end{align} \]
It can be readily seen then that the span of any collection of vectors is a subspace by construction.
Linear independence / dependence
Let \( \pmb{x}\in\mathbb{R}^{N_x} \) and \( \pmb{x}_i\in\mathbb{R}^{N_x} \) for \( i=1,\cdots,n \). The vector \( \pmb{x} \) is linearly independent (respectively dependent) with the collection \( \{\pmb{x}_i\}_{i=1}^{n} \) if and only if \( \pmb{x}\notin \mathrm{span}\{\pmb{x}_i\}_{i=1}^{n} \) (respectively \( \pmb{x}\in \mathrm{span}\{\pmb{x}_i\}_{i=1}^{n} \)).
It is clear then that, e.g., \( \pmb{x}_1 \) is trivially linearly dependent with the collection \( \{\pmb{x}_i\}_{i=1}^{n} \), since \( \pmb{x}_1 \in \mathrm{span}\{\pmb{x}_i\}_{i=1}^{n} \).
A related idea is whether, for some vector \( \pmb{x}\in \mathrm{span}\{\pmb{x}_i\}_{i=1}^n \), the choice of the scalar coefficients \( \alpha_i \) defining \( \pmb{x}=\sum_{i=1}^n \alpha_i \pmb{x}_i \) is unique.
Bases
Let \( V\subset \mathbb{R}^{N_x} \) be a subspace. A collection \( \{\pmb{x}_i\}_{i=1}^{n} \) is said to be a basis for \( V \) if \( V = \mathrm{span}\{\pmb{x}_i\}_{i=1}^n \) and if \[ \begin{align} \pmb{0} = \sum_{i=1}^n \alpha_i \pmb{x}_i \end{align} \] holds if and only if \( \alpha_i=0 \) \( \forall i \).
In particular, a choice of a basis for \( V \) gives a unique coordinatization of any vector \( \pmb{x}\in V \).
\[ \begin{align} \pmb{x}=\sum_{i=1}^n \alpha_i \pmb{x}_i = \sum_{i=1}^n \beta_i \pmb{x}_i & & \Leftrightarrow & &\pmb{0} = \sum_{i=1}^n \left(\alpha_i - \beta_i \right) \pmb{x}_i, \end{align} \] which, by the linear independence of the basis vectors, holds if and only if \( \beta_i = \alpha_i \) for all \( i \).
Orthogonal (Orthonormal) bases
Let \( \{\pmb{x}_i\}_{i=1}^n \) define a basis for \( V \subset \mathbb{R}^{N_x} \). The basis is said to be orthogonal if and only if each pair of basis vectors is orthogonal. A basis is said to be orthonormal if, moreover, each basis vector has norm equal to one. In particular, for an orthonormal basis, if \[ \begin{align} \pmb{x} = \sum_{i=1}^n \alpha_i \pmb{x}_i \end{align} \] then \( \alpha_i = \pmb{x}_i^\top \pmb{x} \).
The above property shows that we can recover the “projection” coefficient \( \alpha_i \) of \( \pmb{x} \) into \( V \) using the inner product of the vector \( \pmb{x} \) with the basis vector \( \pmb{x}_i \).
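The following minimal numpy sketch illustrates this coefficient recovery with a hand-built orthonormal basis of the plane (the basis and the vector are arbitrary choices for illustration):
b1 = np.array([1.0, 1.0]) / np.sqrt(2)    # orthonormal basis vectors of the plane
b2 = np.array([1.0, -1.0]) / np.sqrt(2)
p = np.array([3.0, 5.0])                  # arbitrary vector to expand in the basis
alpha1 = b1 @ p                           # projection coefficients alpha_i = x_i^T x
alpha2 = b2 @ p
alpha1 * b1 + alpha2 * b2                 # recovers p, up to floating-point round-off
# array([3., 5.])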
Orthogonal subspaces
The subspaces \( W,V\subset \mathbb{R}^{N_x} \) are orthogonal if and only if for every \( \pmb{y}\in W \) and \( \pmb{x}\in V \), \[ \begin{align} \pmb{y}^\top \pmb{x} = 0. \end{align} \] Orthogonal subspaces will be denoted with \( W \perp V \).
With these constructions in mind, we can now introduce two of the most fundamental tools of inner product spaces: the orthogonal projection and the Gram-Schmidt process.
[Image courtesy of ZackaryCW, CC BY-SA 4.0, via Wikimedia Commons]
Orthogonal projection lemma
Let \( \langle \circ , \circ \rangle:\mathbb{R}^{N_x}\times\mathbb{R}^{N_x} \rightarrow \mathbb{R} \) be an inner product, and let \( V\subset \mathbb{R}^{N_x} \) be an arbitrary subspace. If \( \pmb{x}\in \mathbb{R}^{N_x} \), there exists a unique projection \( \boldsymbol{\Pi}_V\pmb{x}\in V \) satisfying \[ \begin{align} \mathrm{min}_{\pmb{y}\in V} \parallel \pmb{x} - \pmb{y}\parallel = \parallel \pmb{x} - \boldsymbol{\Pi}_V \pmb{x}\parallel, \end{align} \] if and only if \[ \begin{align} \langle \pmb{x} - \boldsymbol{\Pi}\pmb{x}, \pmb{x}'\rangle = 0 \quad \forall \pmb{x}'\in V. \end{align} \]
This states that the point which minimizes the distance between \( \pmb{x} \) and the subspace \( V \) is precisely the point \( \boldsymbol{\Pi}_V \pmb{x} \in V \) for which the residual \( \pmb{x} - \boldsymbol{\Pi}_V \pmb{x} \) is orthogonal to every vector in \( V \).
Therefore, the residual forms a \( 90^\circ \) angle with \( V \), or equivalently,
\[ \begin{align} \mathrm{span}\left\{\pmb{x} - \boldsymbol{\Pi}_V \pmb{x}\right\} \perp V. \end{align} \]
Suppose \( \pmb{y}\in\mathbb{R}^{N_x} \) is an arbitrary, norm one vector and let \( V := \mathrm{span}\{\pmb{y}\} \); this implies
\[ \begin{align} 0 = \pmb{y}^\top \left(\pmb{x} - \boldsymbol{\Pi}\pmb{x}\right) & & \Leftrightarrow & & \pmb{y}^\top \pmb{x} = \pmb{y}^\top \boldsymbol{\Pi}\pmb{x}. \end{align} \]
But \( \pmb{y} \) is an orthonormal basis for \( V \), so that \( \boldsymbol{\Pi}\pmb{x} := \left(\pmb{y}^\top \pmb{x}\right)\pmb{y} \).
Furthermore, if \( \pmb{x} \) is not a scalar multiple of \( \pmb{y} \), then \( \left\{\pmb{x} - \left(\pmb{y}^\top \pmb{x}\right)\pmb{y}, \pmb{y}\right\} \) is an orthogonal basis for \( \mathrm{span}\left\{\pmb{x},\pmb{y}\right\} \).
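A minimal numpy sketch of this one-vector case follows (the vectors are arbitrary choices; the unit-norm requirement is enforced by normalizing):
w = np.array([1.0, 2.0, 2.0])
w = w / np.linalg.norm(w)         # unit-norm vector spanning V = span{w}
v = np.array([4.0, 0.0, 1.0])     # arbitrary vector to project
proj_v = (w @ v) * w              # orthogonal projection of v onto V
w @ (v - proj_v)                  # the residual is orthogonal to w
# approximately 0.0 (zero up to round-off)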
[Image courtesy of Lucas Vieira, Public domain, via Wikimedia Commons]
Suppose we have a collection of linearly independent vectors \( \{\pmb{x}_i\}_{i=1}^n \), and denote the Gram-Schmidt basis constructed from these vectors \( \{\pmb{e}_i\}_{i=1}^n \).
Notice, the Gram-Schmidt process describes the coordinates of the original vectors as
\[ \begin{align} \pmb{x}_1 &= \left(\pmb{e}_1^\top \pmb{x}_1 \right) \pmb{e}_1 \\ \pmb{x}_2 &= \left(\pmb{e}_1^\top \pmb{x}_2\right) \pmb{e}_1 + \left(\pmb{e}_2^\top \pmb{x}_2\right) \pmb{e}_2 \\ \vdots & \\ \pmb{x}_n &= \sum_{i=1}^n \left(\pmb{e}_i^\top \pmb{x}_n\right) \pmb{e}_i. \end{align} \]
We can thus define the matrices \( \mathbf{R} \) and \( \mathbf{Q} \) such that
\[ \begin{align} \mathbf{R} := \begin{pmatrix} \pmb{e}_1^\top \pmb{x}_1 & \pmb{e}_1^\top \pmb{x}_2 & \cdots & \pmb{e}_1^\top \pmb{x}_n \\ 0 & \pmb{e}_2^\top \pmb{x}_2 & \cdots & \pmb{e}_2^\top \pmb{x}_n \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \pmb{e}_n^\top \pmb{x}_n \end{pmatrix} & & \mathbf{Q}^j := \pmb{e}_j \end{align} \]
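A minimal sketch of this construction in numpy is given below (for illustration only; it assumes the columns of the input are linearly independent, and it is the classical rather than the numerically stabilized variant used by library routines):
def gram_schmidt_qr(A):
    """Classical Gram-Schmidt factorization A = Q R for linearly independent columns."""
    N, M = A.shape
    Q = np.zeros((N, M))
    R = np.zeros((M, M))
    for j in range(M):
        v = A[:, j].astype(float)
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # coefficient e_i^T x_j
            v = v - R[i, j] * Q[:, i]       # remove the component along e_i
        R[j, j] = np.linalg.norm(v)         # remaining length gives the diagonal entry
        Q[:, j] = v / R[j, j]               # normalize to obtain e_j
    return Q, R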
QR factorization
Let \( \mathbf{A}\in \mathbb{R}^{N\times M} \). There exists a unique pair of matrices \( \mathbf{R} \) and \( \mathbf{Q} \) defined by the Gram-Schmidt process such that, \[ \begin{align} \mathbf{A} \equiv \mathbf{Q} \mathbf{R} \end{align} \] where the above refers to matrix multiplication.
We haven't yet introduced matrix multiplication fully – this will be our next topic.
However, it is worth noting some important properties of the QR factorization: by construction, the columns of \( \mathbf{Q} \) are orthonormal and \( \mathbf{R} \) is upper triangular.
These are simple but powerful results that are implied by the construction.
In practice, however, the Gram-Schmidt process is rarely used explicitly; optimized library routines are used instead.
To see the QR factorization in numpy, consider the following:
my_array
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
Q, R = np.linalg.qr(my_array)
Q
array([[-0.12309149, 0.90453403, 0.40824829],
[-0.49236596, 0.30151134, -0.81649658],
[-0.86164044, -0.30151134, 0.40824829]])
R
array([[-8.12403840e+00, -9.60113630e+00, -1.10782342e+01],
[ 0.00000000e+00, 9.04534034e-01, 1.80906807e+00],
[ 0.00000000e+00, 0.00000000e+00, -8.88178420e-16]])
my_array - Q @ R
array([[-4.44089210e-16, -3.10862447e-15, -4.88498131e-15],
[ 0.00000000e+00, -8.88178420e-16, -2.66453526e-15],
[ 8.88178420e-16, -1.77635684e-15, -3.55271368e-15]])
We have now introduced a few ideas around matrix multiplication, which we will formalize.
As a simple case that leads to general matrix multiplication, let us consider matrix / vector multiplication.
Let's suppose that
\[ \begin{align} \mathbf{N} \in \mathbb{R}^{N \times N_x} & & \pmb{x} \in \mathbb{R}^{N_x \times 1} \end{align} \]
We will suppose that we can write the following
\[ \mathbf{N} = \begin{pmatrix} \pmb{n}_1^\top \\ \vdots \\ \pmb{n}_N^\top \end{pmatrix} \] where each \( \pmb{n}_i^\top \in \mathbb{R}^{1 \times N_x} \) is a row of the matrix \( \mathbf{N} \).
Recall the vector multiplication,
\[ \pmb{n}_i^\top \pmb{x} = \sum_{j=1}^{N_x} n_{i,j} * x_{j} \]
The product of the matrix and the vector is given as,
\[ \mathbf{N}\pmb{x} = \begin{pmatrix} \pmb{n}^\top_1 \pmb{x} \\ \vdots \\ \pmb{n}_N^\top \pmb{x} \end{pmatrix} \]
where each entry \( i =1, \cdots ,N \) of the resultant vector is given by the inner product of the \( i \)-th row of \( \mathbf{N} \) with the column vector \( \pmb{x} \).
This formulation generalizes immediately to the case in which we extend \( \pmb{x} \) into an array with \( N_x \) rows, but an arbitrary number of columns.
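As a small numpy check of this row-wise picture, using my_array and x from earlier (iterating over a two-dimensional array yields its rows):
my_array @ x                                 # matrix-vector product
# array([14, 32, 50])
np.array([row @ x for row in my_array])      # the same result, computed row by row
# array([14, 32, 50])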
As noted already, there are several very important classes of matrices that are useful for computation.
Diagonal matrices are the simplest case, for which general matrix multiplication and the Schur product on square matrices agree.
Diagonal matrices
The matrix \( \mathbf{D}\in \mathbb{R}^{N \times M} \) is said to be diagonal if and only if \[ \begin{align} \mathbf{D}[i,j]=0 & & \forall i \neq j \end{align} \] In particular, let \( d = \min\{N,M\} \), and define \[ \begin{align} \pmb{d} := \begin{pmatrix} \mathbf{D}[1,1] \\ \vdots \\ \mathbf{D}[d,d]\end{pmatrix} \end{align} \] We will sometimes refer to the diagonal matrix as \( \mathbf{D}=\mathrm{diag}(\pmb{d}) \), where the expansion of the zero entries to the proper dimensions will be inferred from the context.
In particular, we will often consider two square diagonal matrices, \( \mathrm{diag}(\pmb{m}),\mathrm{diag}(\pmb{n})\in\mathbb{R}^{M \times M} \),
\[ \begin{align} \mathrm{diag}(\pmb{m}) := \begin{pmatrix} m_{1,1} & 0 & \cdots & 0 \\ 0 & m_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & m_{M,M} \end{pmatrix} & & \mathrm{diag}(\pmb{n}) := \begin{pmatrix} n_{1,1} & 0 & \cdots & 0 \\ 0 & n_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & n_{M,M} \end{pmatrix} \end{align} \]
It is easy to verify, following row-column multiplication, that
\[ \begin{align} \mathrm{diag}(\pmb{m})\mathrm{diag}(\pmb{n}) \equiv \mathrm{diag}(\pmb{m}) \circ \mathrm{diag}(\pmb{n})\equiv \mathrm{diag}(\pmb{m}\circ \pmb{n}) \end{align} \]
X = np.diag(x)
Y = np.diag(y)
np.diag(x*y)
array([[ 4, 0, 0],
[ 0, 10, 0],
[ 0, 0, 18]])
X @ Y
array([[ 4, 0, 0],
[ 0, 10, 0],
[ 0, 0, 18]])
np.diag(X @ Y)
array([ 4, 10, 18])
Triangular matrices
Let \( \mathbf{A} \in \mathbb{R}^{N\times N} \); it is known as upper (respectively lower) triangular if and only if its entries satisfy \[ \begin{align} {\color{#d95f02} {\mathbf{A}[i,j] = 0 \quad \forall i > j} } & & \left(\text{respectively } {\color{#1b9e77} {\mathbf{A}[i,j]= 0 \quad \forall j > i} } \right) \end{align} \]
The key property of upper (respectively lower) triangular matrices is that the product of two upper (respectively lower) triangular matrices is also upper (respectively lower) triangular.
This property thus maintains the sparsity of triangular matrices, and makes computations with them efficient.
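As a quick numpy illustration, np.triu extracts the upper triangular part of an array, and the product of two upper triangular matrices is again upper triangular (the values below are arbitrary):
U1 = np.triu(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))    # upper triangular part
U2 = np.triu(np.ones((3, 3), dtype=int))                     # upper triangular matrix of ones
U1 @ U2                                                      # the product remains upper triangular
# array([[ 1,  3,  6],
#        [ 0,  5, 11],
#        [ 0,  0,  9]])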
Orthogonal matrices
Let \( \mathbf{U}\in\mathbb{R}^{N_x\times N_x} \) with columns defined \( \mathbf{U}^j=\pmb{u}_j \). The matrix \( \mathbf{U} \) is said to be orthogonal if and only if \[ \begin{align} \pmb{u}_j^\top \pmb{u}_i = \begin{cases} 0 & \text{ for } i \neq j\\ 1 & \text{ for } i = j \end{cases}. \end{align} \]
It is a direct consequence of matrix multiplication that for orthogonal matrices
\[ \begin{align} \mathbf{U}^\top \mathbf{U} = \mathbf{U}\mathbf{U}^\top= \mathbf{I} = \mathrm{diag}(\pmb{1}) \end{align} \]
where \( \mathbf{I} \) denotes the identity matrix and \( \pmb{1} \) the vector with all entries equal to one.
Notice that the columns \( \{\pmb{u}_j\}_{j=1}^{N_x} \) actually define an orthonormal basis for \( \mathbb{R}^{N_x} \) by the above properties.
Particularly, for an arbitrary \( \pmb{x} \) we can write,
\[ \begin{align} \pmb{x} = \mathbf{I}\pmb{x} &\equiv \mathbf{U} \mathbf{U}^\top \pmb{x}; \end{align} \]
The statement from the last slide then reads,
\[ \begin{align} \pmb{x} &\equiv \begin{pmatrix} \pmb{u}_1 & \cdots & \pmb{u}_{N_x} \end{pmatrix} \begin{pmatrix} \left(\pmb{u}_1^\top \pmb{x}\right) \\ \vdots \\ \left(\pmb{u}_{N_x}^\top \pmb{x}\right) \end{pmatrix} \\ &= \sum_{i=1}^{N_x} \left(\pmb{u}_i^\top \pmb{x}\right) \pmb{u}_i \end{align} \]
giving the explicit basis representation.
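As a sketch, an orthogonal matrix can be obtained in numpy from the QR factorization of a generic square matrix, and the basis representation above can then be verified numerically (random values with a fixed seed; the equalities hold up to round-off):
rng = np.random.default_rng(0)                      # seeded random generator for reproducibility
U, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # U has orthonormal columns
v = np.array([1.0, 2.0, 3.0])
np.allclose(U.T @ U, np.eye(3))
# True  (U^T U = I)
np.allclose(U @ (U.T @ v), v)
# True  (the basis expansion recovers v)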
A related statement can be considered when we have a matrix of the form,
\[ \begin{align} \mathbf{U}\in\mathbb{R}^{N_x \times n} & & \mathbf{U}^j=\pmb{u}_j \in\mathbb{R}^{N_x} \end{align} \]
for some \( n < N_x \), where the columns \( \pmb{u}_j \) are orthonormal.
Matrix multiplication implies the following relationship
\[ \begin{align} \mathbf{U}^\top \mathbf{U} = \mathbf{I} \in \mathbb{R}^{n\times n} & & \mathbf{I} \neq \mathbf{U}\mathbf{U}^\top \in\mathbb{R}^{N_x \times N_x} \end{align} \]
In particular, the different arrangements of the transpose yield matrices with completely different dimensions.
Let \( \pmb{x} \) be an arbitrary vector in \( \mathbb{R}^{N_x} \), in analogy to the last slide we will write
\[ \begin{align} \mathbf{U}\mathbf{U}^\top \pmb{x} &\equiv \begin{pmatrix} \pmb{u}_1 & \cdots & \pmb{u}_n \end{pmatrix} \begin{pmatrix} \left(\pmb{u}_1^\top \pmb{x}\right) \\ \vdots \\ \left(\pmb{u}_n^\top \pmb{x}\right) \end{pmatrix} \\ &= \sum_{i=1}^n \left(\pmb{u}_i^\top \pmb{x}\right) \pmb{u}_i \\ &\equiv \boldsymbol{\Pi}_{\mathrm{span}\left\{\pmb{u}_i\right\}_{i=1}^n} \pmb{x} \end{align} \]
where \( \boldsymbol{\Pi}_{\mathrm{span}\left\{\pmb{u}_i\right\}_{i=1}^n} \) is the orthogonal projection into the subspace defined by the column span of \( \mathbf{U} \).
We thus write
\[ \begin{align} \pmb{x} = \left(\mathbf{I} - \mathbf{U}\mathbf{U}^\top\right)\pmb{x} + \mathbf{U}\mathbf{U}^\top\pmb{x} \end{align} \]
giving the explicit orthogonal decomposition with respect to an arbitrary subspace.
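Continuing in the same spirit, the sketch below builds an explicit projector from n = 2 orthonormal columns in three dimensions and checks the decomposition above (random values with a fixed seed; the equalities hold up to round-off):
rng = np.random.default_rng(1)
Q_full, _ = np.linalg.qr(rng.standard_normal((3, 3)))
U_n = Q_full[:, :2]                       # N_x = 3 rows, n = 2 orthonormal columns
P = U_n @ U_n.T                           # orthogonal projector onto the column span of U_n
v = np.array([1.0, 2.0, 3.0])
np.allclose(U_n.T @ U_n, np.eye(2))
# True  (the n x n identity)
np.allclose(P @ P, P)
# True  (a projector is idempotent)
np.allclose((np.eye(3) - P) @ v + P @ v, v)
# True  (the orthogonal decomposition recovers v)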
One of the generic ways to construct an arbitrary projector into some subspace \( V\subset \mathbb{R}^{N_x} \) is thus to draw an arbitrary spanning set, \( \mathrm{span}\{\pmb{x}_i\}_{i=1}^n = V \), and collect these vectors as the columns of a matrix.
The QR algorithm then constructs the matrix \( \mathbf{U}:= \mathbf{Q} \in \mathbb{R}^{N_x \times n} \), from which the orthogonal projector into \( V \) is given by \( \mathbf{U}\mathbf{U}^\top \).
Orthogonal matrices are obviously useful, as they provide explicit forms of orthogonal projectors.
The subject of orthogonal matrices is, in fact, incredibly rich, for a variety of reasons that go beyond our overall scope.
However, we will introduce two other properties of central interest.
Suppose that we have arbitrary vectors \( \pmb{x},\pmb{y}\in\mathbb{R}^{N_x} \) and an orthogonal matrix \( \mathbf{U}\in\mathbb{R}^{N_x \times N_x} \).
Notice that for \( \tilde{\pmb{x}}:= \mathbf{U}\pmb{x} \) and \( \tilde{\pmb{y}}:= \mathbf{U}\pmb{y} \),
\[ \begin{align} \tilde{\pmb{x}}^\top \tilde{\pmb{y}} & = \pmb{x}^\top \mathbf{U}^\top \mathbf{U} \pmb{y} = \pmb{x}^\top \pmb{y} \end{align} \]
The above property implies that orthogonal transformations preserve the Euclidean inner product, and thus the Euclidean distance,
\[ \parallel \tilde{\pmb{x}} \parallel = \parallel \pmb{x} \parallel. \]
It turns out that this has to do with the fact that orthogonal matrices are generalizations of rotations of the frame of reference in arbitrary dimensions.
These “rotation” matrices preserve the angles between vectors, as well as distances, where \( \mathbf{U}^\top \) corresponds to the reverse rotation to \( \mathbf{U} \).
Correspondingly, the product of two orthogonal matrices is also orthogonal, and can be considered the composition of rotations.
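As a closing two-dimensional sketch, a rotation matrix by a given angle is orthogonal and preserves inner products and norms (the checks hold up to round-off; the angle and vectors are arbitrary choices):
theta = np.pi / 6                                           # rotate by 30 degrees
U_rot = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])         # two-dimensional rotation matrix
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
np.allclose(U_rot.T @ U_rot, np.eye(2))
# True  (U_rot is orthogonal)
np.allclose((U_rot @ a) @ (U_rot @ b), a @ b)
# True  (the inner product is preserved)
np.allclose(np.linalg.norm(U_rot @ a), np.linalg.norm(a))
# True  (the norm is preserved)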