Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.
FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.
Matrices and vectors give us tools to “represent” abstract linear transformations by giving coordinates to them.
Linear transformations refer to a special class of maps between vector spaces;
Linear transformations
Let \( a,b\in \mathbb{R} \) and \( \pmb{x},\pmb{y}\in V \) be arbitrary scalars and vectors and \( T \) be some transformation \[ \begin{align} T : V \rightarrow W. \end{align} \] \( T \) is said to be a linear transformation if it satisfies the property \[ \begin{align} T\left(a \pmb{x} + b \pmb{y} \right) = a T\left( \pmb{x}\right) + b T\left(\pmb{y}\right). \end{align} \]
Any map that does not satisfy the property above is said to be nonlinear;
For reasons we shall see, we often use a linear approximation when it is appropriate to simplify the challenge of nonlinearities.
Let's suppose that for simplicity, a transformation
\[ \begin{align} T:V \rightarrow V \end{align} \]
is a map of the vector space \( V \) to itself.
Suppose, moreover, that we have chosen some basis for \( V \), \( \{\pmb{x}_i\}_{i=1}^N \);
\[ \begin{align} T(\pmb{x}_i) = \sum_{j=1}^N \alpha_{i,j} \pmb{x}_j \end{align} \]
as \( T(\pmb{x})\in V \) by its definition, and by the definition of a basis.
It is by choosing a basis for the domain and the image of the map, that we make a coordinatization of the map \( T \).
Notice that, by the linearity of the transformation \( T \), for any vector \( \pmb{x} = \sum_{i=1}^N \beta_i \pmb{x}_i \) its map can be written as
\[ \begin{align} T\left(\pmb{x}\right) &= T\left( \sum_{i=1}^N \beta_i \pmb{x}_i \right)=\sum_{i=1}^N \beta_i T\left(\pmb{x}_i\right) \end{align} \]
so that determining the output of the map on the basis determines the output for any vector.
Recall the notation for the coordinatization of the map, \( T(\pmb{x}_i) = \sum_{j=1}^N \alpha_{i,j} \pmb{x}_j \).
In particular, suppose we order the coefficients \( \alpha_{i,j} \) so that \( i \) refers to the row index and \( j \) refers to column index of the array
\[ \begin{align} \mathbf{A} := \begin{pmatrix} \alpha_{1,1} & \cdots & \alpha_{1,N} \\ \vdots & \ddots & \vdots \\ \alpha_{N,1} & \cdots & \alpha_{N,N}. \end{pmatrix} \end{align} \]
Recall, if \( \pmb{x} = \sum_{i=1}^N \beta_i \pmb{x}_i \), its coordinates in this basis are given as
\[ \begin{align} \pmb{x} := \begin{pmatrix} \beta_1 & \cdots & \beta_N \end{pmatrix}^\top \end{align} \]
Writing the transformation in these bases,
\[ T(\pmb{x})= \sum_{i=1}^N \beta_i T(\pmb{x}_i) = \sum_{i=1}^N \beta_i \left( \sum_{j=1}^N \alpha_{i,j} \pmb{x}_j\right) = \sum_{j=1}^N \left( \sum_{i=1}^N \beta_i \alpha_{i,j} \right)\pmb{x}_j \]
so that the map of \( \pmb{x} \) under \( T \) has the coordinatization as
\[ \begin{align} T(\pmb{x}) := \begin{pmatrix} \sum_{i=1}^N \beta_i \alpha_{i,1} & \cdots & \sum_{i=1}^N \beta_i \alpha_{i,N}\end{pmatrix}^\top \equiv \mathbf{A} \pmb{x} \end{align} \]
so that \( \mathbf{A} \) completely describes the transformation \( T \).
Note that a linear transformation need not map a vector space \( V \) to itself;
Generally, a map to a subspace also needs not be orthogonal;
Invariant subspaces of linear transformations
Let \( W\subset V \) be a subspace, and let \( T:V \rightarrow V \) be a linear map. The subspace \( W \) is said to be invariant under \( T \) if for every \( \pmb{y} \in W \) then \( T(\pmb{y})\in W \).
In the case that a map \( T \) has an invariant subspace, there is a well-defined restriction of \( T \) to the subspace
\[ T|_W: W \rightarrow W. \]
In particular, this restriction can similarly be written as matrix transformation;
With the above points, it brings us to the question of what basis should we choose for a linear transformation?
The standard Euclidean basis is the most common choice, but the Euclidean basis may not take advantage of the invariant properties of the transformation.
Particularly, if \( M \ll N \) and we are only concerned with the action on the subspace, a well-chosen basis as above can perform a dramatic reduction to the computation of the map.
As a motivating example, consider the case where \( \mathrm{span}\{\pmb{x}\} \subset \mathbb{R}^N \) is an invariant subspace of the transformation
\[ T:\mathbb{R}^N \rightarrow \mathbb{R}^N. \]
By the definition,
\[ T(\pmb{x}) \in \mathrm{span}\{\pmb{x}\} \Leftrightarrow T(\pmb{x}) =\alpha \pmb{x}. \]
Therefore, for any arbitrary vector \( \beta \pmb{x} \) in the subspace
\[ \begin{align} \mathbf{A}\beta \pmb{x} = \alpha \beta\pmb{x} \end{align} \]
so that the action of the linear transformation on the entire subspace is that of scalar multiplication.
If we choose \( \pmb{x} \) as a basis vector for \( \mathbb{R}^N \), the associated matrix \( \mathbf{A} \) for this choice of basis will have a diagonal element corresponding to this action.
Diagonal matrices are extremely simple for computation; to make efficient use of computations we will be concerned with matrix decompositions, corresponding to special choices of coordinates.
The most fundamental matrix decomposition is known as the eigen value decomposition, discussed as follows.
Eigen values and vectors
If a nonzero vector \( \pmb{x} \) has the property that, \[ \mathbf{A}\pmb{x} =\lambda \pmb{x} \] then \( \pmb{x} \) is said to be an eigen vector of \( \mathbf{A} \) associated to the eigen value \( \lambda \).
Diagonal matrices are ones that have an entire coordinate system composed of eigen vectors;
Certain non-diagonal matrices can be transformed into diagonal matrices by finding such a coordinate system and applying a “change of basis” transformation.
Suppose that \( \pmb{x} \) is an eigen vector for a matrix \( \mathbf{A} \) associated to a zero eigenvalue \( \lambda=0 \); then the columns of \( \mathbf{A}^i \) we have a linear dependence defined as
\[ \begin{align} \sum_{i=1}^N x_i \mathbf{A}^i = \pmb{0} . \end{align} \]
Notice now that,
\[ \begin{align} \mathbf{A}\pmb{x} =\lambda \pmb{x} & & \Leftrightarrow & & \mathbf{A}\pmb{x} - \lambda \pmb{x} = 0 & & \Leftrightarrow & & \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \pmb{x} = 0. \end{align} \]
This means that \( \pmb{x} \) is an eigen vector of the matrix \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \) associated to the zero eigenvalue and a linear dependence in the columns of \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \).
Not all matrices can be diagonalized with a change of basis;
This result is a special case of the more general spectral theorem:
The polar decomposition (restricted spectral theorem)
Suppose that \( \mathbf{A} \) is a real valued symmetric matrix, i.e., \( \mathbf{A} = \mathbf{A}^\top \). Then there exists an orthogonal change of basis matrix \( \mathbf{U} \) and diagonal matrix \( \mathbf{D} \) for which \[ \begin{align} \mathbf{A} = \mathbf{U}\mathbf{D} \mathbf{U}^\top. \end{align} \]
The columns of \( \mathbf{U} \) are the (orthonormal) eigen vectors of the matrix \( \mathbf{A} \) and the diagonal elements of \( \mathbf{D} \) are the associated eigen values.
If the eigenvalues are larger than one in absolute value, this action then corresponds to an expansion along these directions while eigen values with norm less than one correspond to a contraction;
The eigenvalues, with a correct rotation of perspective, thus completely describe the action of a symmetric matrix as above.
import numpy as np
from numpy.linalg import eig
from numpy.random import normal
np.random.seed(42)
my_array = normal(size=[3,3])
my_array
array([[ 0.49671415, -0.1382643 , 0.64768854],
[ 1.52302986, -0.23415337, -0.23413696],
[ 1.57921282, 0.76743473, -0.46947439]])
eig
function imported above:e_vals, e_vecs = eig(my_array)
e_vals
array([ 1.22795686+0.j , -0.71743524+0.61245434j,
-0.71743524-0.61245434j])
Notice that two of the eigen values are complex and come in a conjugate pair;
This is not the case, however, for real symmetric matrices as discussed in the polar decomposition.
Indeed,
my_symmetric_array = my_array.transpose() @ my_array
my_symmetric_array
array([[ 5.06025801, 0.78664234, -0.77628148],
[ 0.78664234, 0.66290088, -0.39501919],
[-0.77628148, -0.39501919, 0.69472676]])
e_vals, e_vecs = eig(my_symmetric_array)
e_vals
array([5.34613967, 0.78854214, 0.28320384])
e_vecs[:,0]
array([ 0.96813371, 0.17751704, -0.17664888])
my_symmetric_array @ e_vecs[:,0] / e_vals[0]
array([ 0.96813371, 0.17751704, -0.17664888])
If the map needs to be applied many times, computing the change of basis into the eigen vectors can be more cost-efficient than applying the map directly.
Notice that the above eigen values are all positive – this is actually a general property of “squared” matrices constructed as
\[ \begin{align} \mathbf{A}^\top \mathbf{A} \end{align} \]
for arbitrary arrays \( \mathbf{A} \) with linearly independent columns.
When all eigenvalues of a symmetric matrix \( \mathbf{A} \) are positive we can visualize the matrix as a hyper-ellipse.
The principal axes of the hyper-ellipse are given by the eigen vectors, while the length of these axes are given by the eigen values.
Positive (semi)-definite matrices
A real symmetric matrix \( \mathbf{A} \) is said to be positive definite if all of its eigenvalues are greater than zero. If all eigenvalues are non-negative, then \( \mathbf{A} \) is said to be positive semi-definite.
Positive definite matrices are special, in part, because they define alternative, weighted version of the Euclidean norm by \( \sqrt{\pmb{x}^\top \mathbf{A}\pmb{x}} \).
For our purposes, we will more commonly use the norm defined by the inverse of a positive definite \( \mathbf{A} \);
Let \( \mathbf{D} := \mathrm{diag}\left(\lambda_i\right) \) be the diagonal matrix of the eigen values for the positive definite matrix \( \mathbf{A} \).
Positive definite matrices also have well-defined square roots – notice that for
\[ \sqrt{\mathbf{A}} := \mathbf{A}^\frac{1}{2}:= \mathbf{U}\mathrm{diag}\left(\sqrt{\lambda}_i\right) \mathbf{U}^\top \]
we can write \( \mathbf{A}= \left(\mathbf{A}^\frac{1}{2}\right)^2 \).
Consider now an arbitrary matrix \( \mathbf{M}\in \mathbb{R}^{N\times M} \) for \( N>M \); if \( \mathbf{M} \) is not square, it cannot be symmetric, but each of
\[ \begin{align} \mathbf{M}^\top \mathbf{M} \in \mathbb{R}^{M\times M} & & \mathbf{M}\mathbf{M}^\top \in \mathbb{R}^{N\times N} \end{align} \] are symmetric.
Using the polar decomposition, we can thus find their diagonalizations
\[ \begin{align} \mathbf{M}^\top \mathbf{M} & = \mathbf{V} \mathbf{D} \mathbf{V}^\top\\ \mathbf{M}\mathbf{M}^\top & = \mathbf{U} \hat{\mathbf{D}} \mathbf{U}^\top, \end{align} \]
where
Assume that the columns of \( \mathbf{M} \) are linearly independent – then both \( \mathbf{D} \) and \( \hat{\mathbf{D}} \) have exactly \( M \) non-zero eigenvalues.
If we order the non-zero eigenvalues of \( \hat{\mathbf{D}} \) such that there is a non-zero block in the top left,
\[ \begin{align} \hat{\mathbf{D}} = \begin{pmatrix} \mathbf{D} & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix} \end{align} \]
It can be demonstrated that \( \mathbf{M} \) has a decomposition into,
\[ \begin{align} \mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top, \end{align} \]
where
\[ \begin{align} \boldsymbol{\Sigma} = \begin{pmatrix} \sqrt{\mathbf{D}} \\ 0_{(N-M) \times M} \end{pmatrix}. \end{align} \]
This decomposition is known as the singular value decomposition. The singular values are the (positive) principal-diagonal entries of \( \boldsymbol{\Sigma} \) usually ordered descendingly in size.
The columns of \( \mathbf{U} \) are called the left singular vectors while the columns of \( \mathbf{V} \) are called the right singular vectors.
Thus we can write, e.g.,
\[ \begin{align} \mathbf{M}\mathbf{M}^\top &= \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top \mathbf{V} \boldsymbol{\Sigma}^\top \mathbf{U}^\top \\ &= \mathbf{U} \begin{pmatrix} \sqrt{\mathbf{D}} \sqrt{\mathbf{D}} & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix} \mathbf{U}^\top \\ & = \mathbf{U} \hat{\mathbf{D}} \mathbf{U}^\top \end{align} \]
We can recover the eigen decomposition for \( \mathbf{M}^\top \mathbf{M} \) in a similar fashion.