Matrices and vectors give us tools to “represent” abstract linear transformations by giving coordinates to them.
Linear transformations refer to a special class of maps between vector spaces;
Linear transformations
Let \( a,b\in \mathbb{R} \) and \( \pmb{x},\pmb{y}\in V \) be arbitrary scalars and vectors and \( T \) be some transformation \[ \begin{align} T : V \rightarrow W. \end{align} \] \( T \) is said to be a linear transformation if it satisfies the property \[ \begin{align} T\left(a \pmb{x} + b \pmb{y} \right) = a T\left( \pmb{x}\right) + b T\left(\pmb{y}\right). \end{align} \]
Any map that does not satisfy the property above is said to be nonlinear;
For reasons we shall see, we often use a linear approximation when it is appropriate to simplify the challenge of nonlinearities.
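As a minimal numerical sketch of the linearity property above (not part of the original notes), the following checks that a matrix-vector map \( \pmb{x} \mapsto \mathbf{A}\pmb{x} \) on \( \mathbb{R}^3 \) satisfies the property up to floating point error; the particular matrix, vectors, and scalars are arbitrary choices for illustration.
import numpy as np
from numpy.random import normal

# a matrix-vector map x -> A @ x is linear; check the property numerically
np.random.seed(0)
A = normal(size=[3, 3])            # an arbitrary matrix standing in for T
x, y = normal(size=3), normal(size=3)
a, b = 2.0, -0.5                   # arbitrary scalars

lhs = A @ (a * x + b * y)          # T(a x + b y)
rhs = a * (A @ x) + b * (A @ y)    # a T(x) + b T(y)
print(np.allclose(lhs, rhs))       # True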
Let's suppose, for simplicity, that a transformation
\[ \begin{align} T:V \rightarrow V \end{align} \]
is a map of the vector space \( V \) to itself.
Suppose, moreover, that we have chosen some basis \( \{\pmb{x}_i\}_{i=1}^N \) for \( V \); we can then write
\[ \begin{align} T(\pmb{x}_i) = \sum_{j=1}^N \alpha_{j,i} \pmb{x}_j \end{align} \]
as \( T(\pmb{x}_i)\in V \) by the definition of the map, and by the definition of a basis.
It is by choosing a basis for the domain and the codomain of the map that we make a coordinatization of the map \( T \).
Notice that, by the linearity of the transformation \( T \), for any vector \( \pmb{x} = \sum_{i=1}^N \beta_i \pmb{x}_i \) its map can be written as
\[ \begin{align} T\left(\pmb{x}\right) &= T\left( \sum_{i=1}^N \beta_i \pmb{x}_i \right)=\sum_{i=1}^N \beta_i T\left(\pmb{x}_i\right) \end{align} \]
so that determining the output of the map on the basis determines the output for any vector.
Recall the notation for the coordinatization of the map, \( T(\pmb{x}_i) = \sum_{j=1}^N \alpha_{j,i} \pmb{x}_j \).
In particular, suppose we order the coefficients \( \alpha_{i,j} \) so that \( i \) refers to the row index and \( j \) refers to the column index of the array
\[ \begin{align} \mathbf{A} := \begin{pmatrix} \alpha_{1,1} & \cdots & \alpha_{1,N} \\ \vdots & \ddots & \vdots \\ \alpha_{N,1} & \cdots & \alpha_{N,N} \end{pmatrix}, \end{align} \]
so that the \( i \)-th column of \( \mathbf{A} \) holds the coordinates of \( T(\pmb{x}_i) \).
Recall, if \( \pmb{x} = \sum_{i=1}^N \beta_i \pmb{x}_i \), its coordinates in this basis are given as
\[ \begin{align} \pmb{x} := \begin{pmatrix} \beta_1 & \cdots & \beta_N \end{pmatrix}^\top \end{align} \]
Writing the transformation in these bases,
\[ T(\pmb{x})= \sum_{i=1}^N \beta_i T(\pmb{x}_i) = \sum_{i=1}^N \beta_i \left( \sum_{j=1}^N \alpha_{j,i} \pmb{x}_j\right) = \sum_{j=1}^N \left( \sum_{i=1}^N \alpha_{j,i} \beta_i \right)\pmb{x}_j \]
so that the map of \( \pmb{x} \) under \( T \) has the coordinatization as
\[ \begin{align} T(\pmb{x}) := \begin{pmatrix} \sum_{i=1}^N \alpha_{1,i} \beta_i & \cdots & \sum_{i=1}^N \alpha_{N,i} \beta_i \end{pmatrix}^\top \equiv \mathbf{A} \pmb{x} \end{align} \]
so that \( \mathbf{A} \) completely describes the transformation \( T \).
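To illustrate the construction above, here is a small sketch (not from the lecture) that builds \( \mathbf{A} \) column-by-column by applying a hypothetical linear map T to the standard Euclidean basis vectors of \( \mathbb{R}^3 \):
import numpy as np

def T(v):
    # a hypothetical linear map on R^3, used only for illustration
    return np.array([2.0 * v[0] + v[2], -v[1], 3.0 * v[2]])

N = 3
basis = np.eye(N)                                        # standard basis vectors as columns
A = np.column_stack([T(basis[:, i]) for i in range(N)])  # i-th column holds T(x_i)

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A @ x, T(x)))                          # True: A x reproduces T(x)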
Note that a linear transformation need not map a vector space \( V \) to itself;
Generally, a map to a subspace also need not be orthogonal;
Invariant subspaces of linear transformations
Let \( W\subset V \) be a subspace, and let \( T:V \rightarrow V \) be a linear map. The subspace \( W \) is said to be invariant under \( T \) if \( T(\pmb{y})\in W \) for every \( \pmb{y} \in W \).
In the case that a map \( T \) has an invariant subspace, there is a well-defined restriction of \( T \) to the subspace
\[ T|_W: W \rightarrow W. \]
In particular, this restriction can similarly be written as a matrix transformation;
The above points bring us to the question: what basis should we choose for a linear transformation?
The standard Euclidean basis is the most common choice, but the Euclidean basis may not take advantage of the invariant properties of the transformation.
In particular, if the invariant subspace has dimension \( M \ll N \) and we are only concerned with the action on this subspace, a well-chosen basis as above can dramatically reduce the cost of computing the map.
As a motivating example, consider the case where \( \mathrm{span}\{\pmb{x}\} \subset \mathbb{R}^N \) is an invariant subspace of the transformation
\[ T:\mathbb{R}^N \rightarrow \mathbb{R}^N. \]
By the definition,
\[ T(\pmb{x}) \in \mathrm{span}\{\pmb{x}\} \Leftrightarrow T(\pmb{x}) =\alpha \pmb{x}. \]
Therefore, for an arbitrary vector \( \beta \pmb{x} \) in the subspace,
\[ \begin{align} \mathbf{A}\beta \pmb{x} = \alpha \beta\pmb{x} \end{align} \]
so that the action of the linear transformation on the entire subspace is that of scalar multiplication.
If we choose \( \pmb{x} \) as a basis vector for \( \mathbb{R}^N \), the associated matrix \( \mathbf{A} \) for this choice of basis will have a diagonal element corresponding to this action.
Diagonal matrices are extremely simple for computation; to make efficient use of computations we will be concerned with matrix decompositions, corresponding to special choices of coordinates.
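As a small sketch of why diagonal matrices are computationally convenient (the numbers here are arbitrary), applying a diagonal matrix amounts to elementwise scaling, an \( O(N) \) operation rather than the \( O(N^2) \) of a dense matrix-vector product:
import numpy as np
from numpy.random import normal

np.random.seed(0)
d = normal(size=5)                 # diagonal entries
x = normal(size=5)

D = np.diag(d)                     # dense N x N diagonal matrix
print(np.allclose(D @ x, d * x))   # True: the dense product equals elementwise scaling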
The most fundamental matrix decomposition is known as the eigen value decomposition, discussed as follows.
Eigen values and vectors
If a nonzero vector \( \pmb{x} \) has the property that, \[ \mathbf{A}\pmb{x} =\lambda \pmb{x} \] then \( \pmb{x} \) is said to be an eigen vector of \( \mathbf{A} \) associated to the eigen value \( \lambda \).
Diagonal matrices are ones for which the standard coordinate vectors are already eigen vectors;
Certain non-diagonal matrices can be transformed into diagonal matrices by finding such a coordinate system and applying a “change of basis” transformation.
Suppose that \( \pmb{x} \) is an eigen vector for a matrix \( \mathbf{A} \) associated to a zero eigenvalue \( \lambda=0 \); then, writing the \( i \)-th column of \( \mathbf{A} \) as \( \mathbf{A}^i \), we have a linear dependence in the columns defined as
\[ \begin{align} \sum_{i=1}^N x_i \mathbf{A}^i = \pmb{0} . \end{align} \]
Notice now that,
\[ \begin{align} \mathbf{A}\pmb{x} =\lambda \pmb{x} & & \Leftrightarrow & & \mathbf{A}\pmb{x} - \lambda \pmb{x} = 0 & & \Leftrightarrow & & \left(\mathbf{A} - \lambda \mathbf{I}_N\right) \pmb{x} = 0. \end{align} \]
This means that \( \pmb{x} \) is an eigen vector of the matrix \( \left(\mathbf{A} - \lambda \mathbf{I}_N\right) \) associated to the zero eigenvalue, and hence that there is a linear dependence among the columns of \( \left(\mathbf{A} - \lambda \mathbf{I}_N\right) \).
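A quick numerical sketch of this equivalence (the matrix is an arbitrary random example): for each eigen value \( \lambda \), the shifted matrix \( \mathbf{A} - \lambda \mathbf{I}_N \) is numerically singular, which shows up as a near-zero smallest singular value.
import numpy as np
from numpy.linalg import eig, svd
from numpy.random import normal

np.random.seed(42)
A = normal(size=[3, 3])
e_vals, e_vecs = eig(A)

for lam in e_vals:
    # smallest singular value of A - lambda*I is ~0 for each eigen value,
    # i.e., the shifted matrix has linearly dependent columns
    s = svd(A - lam * np.eye(3), compute_uv=False)
    print(s.min())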
Not all matrices can be diagonalized with a change of basis;
The following result is a special case of the more general spectral theorem:
The polar decomposition (restricted spectral theorem)
Suppose that \( \mathbf{A} \) is a real valued symmetric matrix, i.e., \( \mathbf{A} = \mathbf{A}^\top \). Then there exists an orthogonal change of basis matrix \( \mathbf{U} \) and diagonal matrix \( \mathbf{D} \) for which \[ \begin{align} \mathbf{A} = \mathbf{U}\mathbf{D} \mathbf{U}^\top. \end{align} \]
The columns of \( \mathbf{U} \) are the (orthonormal) eigen vectors of the matrix \( \mathbf{A} \) and the diagonal elements of \( \mathbf{D} \) are the associated eigen values.
Eigen values larger than one in absolute value correspond to an expansion along the associated eigen vector directions, while eigen values with absolute value less than one correspond to a contraction;
The eigenvalues, with a correct rotation of perspective, thus completely describe the action of a symmetric matrix as above.
import numpy as np
from numpy.linalg import eig
from numpy.random import normal
np.random.seed(42)
my_array = normal(size=[3,3])
my_array
array([[ 0.49671415, -0.1382643 , 0.64768854],
[ 1.52302986, -0.23415337, -0.23413696],
[ 1.57921282, 0.76743473, -0.46947439]])
Using the eig function imported above:
e_vals, e_vecs = eig(my_array)
e_vals
array([ 1.22795686+0.j , -0.71743524+0.61245434j,
-0.71743524-0.61245434j])
Notice that two of the eigen values are complex and come in a conjugate pair;
This is not the case, however, for real symmetric matrices as discussed in the polar decomposition.
Indeed,
my_symmetric_array = my_array.transpose() @ my_array
my_symmetric_array
array([[ 5.06025801, 0.78664234, -0.77628148],
[ 0.78664234, 0.66290088, -0.39501919],
[-0.77628148, -0.39501919, 0.69472676]])
e_vals, e_vecs = eig(my_symmetric_array)
e_vals
array([5.34613967, 0.78854214, 0.28320384])
e_vecs[:,0]
array([ 0.96813371, 0.17751704, -0.17664888])
my_symmetric_array @ e_vecs[:,0] / e_vals[0]
array([ 0.96813371, 0.17751704, -0.17664888])
If the map needs to be applied many times, computing the change of basis into the eigen vectors can be more cost-efficient than applying the map directly.
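As a sketch of this cost consideration (using numpy.linalg.eigh, which is tailored to symmetric matrices and returns orthonormal eigen vectors), a matrix power of a symmetric matrix can be computed by powering the eigen values in the diagonal basis, so each additional application of the map costs only a diagonal scaling:
import numpy as np
from numpy.linalg import eigh, matrix_power
from numpy.random import normal

np.random.seed(42)
B = normal(size=[3, 3])
S = B.T @ B                                    # a symmetric test matrix

e_vals, U = eigh(S)                            # orthonormal eigen vectors in the columns of U
k = 10
powered_eig = U @ np.diag(e_vals**k) @ U.T     # change basis, scale, change back
print(np.allclose(powered_eig, matrix_power(S, k)))   # True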
Notice that the above eigen values are all positive – this is actually a general property of “squared” matrices constructed as
\[ \begin{align} \mathbf{A}^\top \mathbf{A} \end{align} \]
for arbitrary arrays \( \mathbf{A} \) with linearly independent columns.
When all eigenvalues of a symmetric matrix \( \mathbf{A} \) are positive we can visualize the matrix as a hyper-ellipse.
The principal axes of the hyper-ellipse are given by the eigen vectors, while the lengths of these axes are given by the eigen values.
Positive (semi)-definite matrices
A real symmetric matrix \( \mathbf{A} \) is said to be positive definite if all of its eigenvalues are greater than zero. If all eigenvalues are non-negative, then \( \mathbf{A} \) is said to be positive semi-definite.
Positive definite matrices are special, in part, because they define an alternative, weighted version of the Euclidean norm by \( \sqrt{\pmb{x}^\top \mathbf{A}\pmb{x}} \).
For our purposes, we will more commonly use the norm defined by the inverse of a positive definite \( \mathbf{A} \);
Let \( \mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{U}^\top \) as above, where \( \mathbf{D} := \mathrm{diag}\left(\lambda_i\right) \) is the diagonal matrix of the eigen values of the positive definite matrix \( \mathbf{A} \).
Positive definite matrices also have well-defined square roots – notice that for
\[ \sqrt{\mathbf{A}} := \mathbf{A}^\frac{1}{2}:= \mathbf{U}\mathrm{diag}\left(\sqrt{\lambda_i}\right) \mathbf{U}^\top \]
we can write \( \mathbf{A}= \left(\mathbf{A}^\frac{1}{2}\right)^2 \).
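A minimal sketch of the square root construction, assuming a positive definite matrix built as \( \mathbf{B}^\top\mathbf{B} \) from an arbitrary random \( \mathbf{B} \):
import numpy as np
from numpy.linalg import eigh
from numpy.random import normal

np.random.seed(42)
B = normal(size=[3, 3])
A = B.T @ B                                   # positive definite for this example

e_vals, U = eigh(A)                           # orthonormal eigen vectors, positive eigen values
sqrt_A = U @ np.diag(np.sqrt(e_vals)) @ U.T   # U diag(sqrt(lambda_i)) U^T

print(np.allclose(sqrt_A @ sqrt_A, A))        # True: (A^(1/2))^2 = A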
Consider now an arbitrary matrix \( \mathbf{M}\in \mathbb{R}^{N\times M} \) for \( N>M \); if \( \mathbf{M} \) is not square, it cannot be symmetric, but each of
\[ \begin{align} \mathbf{M}^\top \mathbf{M} \in \mathbb{R}^{M\times M} & & \mathbf{M}\mathbf{M}^\top \in \mathbb{R}^{N\times N} \end{align} \] are symmetric.
Using the polar decomposition, we can thus find their diagonalizations
\[ \begin{align} \mathbf{M}^\top \mathbf{M} & = \mathbf{V} \mathbf{D} \mathbf{V}^\top\\ \mathbf{M}\mathbf{M}^\top & = \mathbf{U} \hat{\mathbf{D}} \mathbf{U}^\top, \end{align} \]
where \( \mathbf{U}\in\mathbb{R}^{N\times N} \) and \( \mathbf{V}\in\mathbb{R}^{M\times M} \) are orthogonal and \( \mathbf{D} \), \( \hat{\mathbf{D}} \) are diagonal.
Assume that the columns of \( \mathbf{M} \) are linearly independent – then both \( \mathbf{D} \) and \( \hat{\mathbf{D}} \) have exactly \( M \) non-zero eigenvalues.
Suppose we order the eigen values of \( \hat{\mathbf{D}} \) so that the non-zero block sits in the top left (the non-zero eigen values of \( \mathbf{M}\mathbf{M}^\top \) coincide with those of \( \mathbf{M}^\top\mathbf{M} \)), i.e.,
\[ \begin{align} \hat{\mathbf{D}} = \begin{pmatrix} \mathbf{D} & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix} \end{align} \]
It can then be demonstrated that \( \mathbf{M} \) has the decomposition
\[ \begin{align} \mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top, \end{align} \]
where
\[ \begin{align} \boldsymbol{\Sigma} = \begin{pmatrix} \sqrt{\mathbf{D}} \\ 0_{(N-M) \times M} \end{pmatrix}. \end{align} \]
This decomposition is known as the singular value decomposition. The singular values are the (positive) principal-diagonal entries of \( \boldsymbol{\Sigma} \), usually ordered in descending size.
The columns of \( \mathbf{U} \) are called the left singular vectors while the columns of \( \mathbf{V} \) are called the right singular vectors.
Thus we can write, e.g.,
\[ \begin{align} \mathbf{M}\mathbf{M}^\top &= \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top \mathbf{V} \boldsymbol{\Sigma}^\top \mathbf{U}^\top \\ &= \mathbf{U} \begin{pmatrix} \sqrt{\mathbf{D}} \sqrt{\mathbf{D}} & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix} \mathbf{U}^\top \\ & = \mathbf{U} \hat{\mathbf{D}} \mathbf{U}^\top \end{align} \]
We can recover the eigen decomposition for \( \mathbf{M}^\top \mathbf{M} \) in a similar fashion.
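A sketch with a hypothetical tall matrix (N = 5 rows, M = 3 columns): the squared singular values coincide with the eigen values of \( \mathbf{M}^\top\mathbf{M} \), and with the non-zero eigen values of \( \mathbf{M}\mathbf{M}^\top \).
import numpy as np
from numpy.linalg import svd, eigh
from numpy.random import normal

np.random.seed(42)
M = normal(size=[5, 3])                        # N=5 > M=3

s = svd(M, compute_uv=False)                   # singular values, descending
evals_MtM = eigh(M.T @ M)[0]                   # 3 eigen values, ascending
evals_MMt = eigh(M @ M.T)[0]                   # 5 eigen values, two of them ~0

print(np.allclose(np.sort(s**2), evals_MtM))        # True
print(np.allclose(np.sort(s**2), evals_MMt[-3:]))   # True: the non-zero eigen values match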
[Figure credit: Georg-Johann, CC BY-SA 3.0]
from numpy.linalg import svd
U, Sigma, Vh = svd(my_array)
Sigma
array([2.31217207, 0.88799895, 0.53216899])
U, Sigma, Vh = svd(my_symmetric_array)
Sigma
array([5.34613967, 0.78854214, 0.28320384])
e_vals
array([5.34613967, 0.78854214, 0.28320384])
For any eigen vector associated to a zero eigen value, its image under the corresponding linear transformation is zero, regardless of which particular vector in the associated subspace we choose; the input therefore cannot, in general, be uniquely recovered from the output.
We can still attempt to reconstruct the information of the input from an observed output, but the methods will depend on how the relationships constrain the data, and if we can be satisfied with an approximate solution.
Linear inverse problem
Let \( \mathbf{A} \in \mathbb{R}^{N \times M} \), \( \pmb{x} \in \mathbb{R}^M \), \( \pmb{b}\in\mathbb{R}^N \). Suppose that \( \mathbf{A} \) and \( \pmb{b} \) are known, but \( \pmb{x} \) is unknown, and that these satisfy the relationship: \[ \mathbf{A} \pmb{x} = \pmb{b} \] Finding the collection of values \( \pmb{x} \) that satisfy this relationship is known as a linear inverse problem.
If \( N>M \) (more rows than columns) then the system is over-determined and there may be no value of \( \pmb{x} \) satisfying the relationship;
In the case that \( M=N \) and \( \mathbf{A} \) has no zero eigen values, \( \mathbf{A} \) is invertible:
Matrix inverse
Suppose \( \mathbf{A}\in \mathbb{R}^{N \times N} \) has no zero eigen values. There exists an inverse transformation denoted \( \mathbf{A}^{-1} \) that satisfies the property \[ \begin{align} \mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}_N:= \mathrm{diag}(\pmb{1}_N). \end{align} \]
If we suppose that an inverse for \( \mathbf{A} \) exists, then notice that
\[ \pmb{x} = \mathbf{A}^{-1} \pmb{b}. \]
The way to implement such a procedure in Numpy is with the linalg.solve function.
from numpy.linalg import solve, inv
observed_vec = np.arange(3)
observed_vec
array([0, 1, 2])
Using the solve function, we compute the desired inputs and demonstrate that the action of my_array gives the original example:
input_vec = solve(my_array, observed_vec)
my_array @ input_vec
array([-2.77555756e-17, 1.00000000e+00, 2.00000000e+00])
my_inverse_array = inv(my_array)
my_inverse_array @ my_array
array([[ 1.00000000e+00, 0.00000000e+00, 4.16333634e-17],
[-2.22044605e-16, 1.00000000e+00, 0.00000000e+00],
[ 1.38777878e-16, -6.93889390e-17, 1.00000000e+00]])
my_inverse_array @ observed_vec
array([ 0.7323549 , 0.86895275, -0.37614669])
input_vec
array([ 0.7323549 , 0.86895275, -0.37614669])
Notice that applying the inverse gives the same result as the solve function as on the last slide.
Consider the linear inverse problem again,
\[ \begin{align}
\mathbf{A} \pmb{x}= \pmb{b} & & \mathbf{A}\in \mathbb{R}^{N \times M}
\end{align} \]
and suppose that the system is over-determined with \( N > M \).
We note, \( \mathbf{A} \) is not square and thus has no formal inverse;
The (left) pseudo-inverse
Suppose that \( \mathbf{A}\in \mathbb{R}^{N\times M} \) has linearly independent columns with \( N>M \). In this case, \( \left(\mathbf{A}^\top\mathbf{A}\right)^{-1} \) exists, and the (left) pseudo-inverse is defined \[ \begin{align} \mathbf{A}^\dagger := \left(\mathbf{A}^\top \mathbf{A}\right)^{-1} \mathbf{A}^\top. \end{align} \]
If the pseudo-inverse exists as above it satisfies the property,
\[ \begin{align} \mathbf{A}^\dagger \mathbf{A} = \left(\mathbf{A}^\top \mathbf{A}\right)^{-1} \mathbf{A}^\top\mathbf{A} = \mathbf{I}_M; \end{align} \]
Note, right pseudo-inverses also exist under the right conditions;
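A brief sketch checking the defining property above, with a hypothetical tall random matrix whose columns are (almost surely) linearly independent:
import numpy as np
from numpy.linalg import inv
from numpy.random import normal

np.random.seed(42)
A = normal(size=[5, 3])                        # N=5 > M=3

A_dagger = inv(A.T @ A) @ A.T                  # left pseudo-inverse by the formula above
print(np.allclose(A_dagger @ A, np.eye(3)))    # True: A^dagger A = I_M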
Consider now the representation of the pseudo-inverse with respect to the SVD,
\[ \begin{align} \mathbf{A}^\dagger &= \left(\mathbf{V}\boldsymbol{\Sigma}^\top\mathbf{U}^\top\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top\right)^{-1} \mathbf{V}\boldsymbol{\Sigma}^\top\mathbf{U}^\top \\ &=\mathbf{V}\left(\boldsymbol{\Sigma}^\top\boldsymbol{\Sigma}\right)^{-1} \boldsymbol{\Sigma}^\top\mathbf{U}^\top \end{align} \]
where \( \boldsymbol{\Sigma}^\top\boldsymbol{\Sigma}\in \mathbb{R}^{M\times M} \) is the diagonal matrix of the singular values squared.
Applying \( \mathbf{A} \) to the left-hand-side,
\[ \begin{align} \mathbf{A}\mathbf{A}^\dagger &=\mathbf{U} \boldsymbol{\Sigma}\left(\boldsymbol{\Sigma}^\top\boldsymbol{\Sigma}\right)^{-1} \boldsymbol{\Sigma}^\top\mathbf{U}^\top\\ &= \mathbf{U}\begin{pmatrix} \mathbf{I}_M & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix}\mathbf{U}^\top \\ &= \mathbf{U}_M\mathbf{U}_M^\top, \end{align} \] where \( \mathbf{U}_M \) denotes the first \( M \) columns of \( \mathbf{U} \), so that \( \mathbf{U}_M\mathbf{U}_M^\top \) is the orthogonal projection into the column span of \( \mathbf{A} \).
The orthogonal projection lemma thus tells us that for \( \pmb{y} := \mathbf{A}^\dagger \pmb{b} \),
\[ \begin{align} \parallel \mathbf{A} \pmb{x} - \pmb{b} \parallel \geq \parallel \mathbf{A} \pmb{y} - \pmb{b} \parallel \end{align} \] for any \( \pmb{x} \in \mathbb{R}^M \); that is, \( \mathbf{A}\pmb{y} \) is the closest point to \( \pmb{b} \) among all points \( \mathbf{A}\pmb{x} \) in the column span of \( \mathbf{A} \).
The pseudo-inverse thus gives a coordinatization of a projected vector when restricted to \( \mathrm{span}\{\mathbf{A}\} \equiv \mathbb{R}^M \), but without embedding the coordinates in the higher dimensional space \( \mathbb{R}^N \).
Applying the action of \( \mathbf{A} \) once again raises the coordinates to the embedding in \( \mathbb{R}^N \).
It is clear that one can efficiently compute the pseudo-inverse in Numpy by using the SVD formulation directly.
However, Numpy also has a built-in function to handle the pseudo-inverse;
We can use the pinv function as follows:
from numpy.linalg import pinv
my_rectangular_array = normal(size=[3,2])
my_pseudo_inverse = pinv(my_rectangular_array)
my_pseudo_inverse @ my_rectangular_array
array([[1.00000000e+00, 5.55111512e-17],
[0.00000000e+00, 1.00000000e+00]])
In the above, the array is not even square, but we are able to find an “inverse” that lies only within the column span.
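Following the projection discussion above, here is a sketch (with an arbitrary over-determined system) showing that the pseudo-inverse solution agrees with Numpy's least-squares solver and that the residual is orthogonal to the column span:
import numpy as np
from numpy.linalg import pinv, lstsq
from numpy.random import normal

np.random.seed(42)
A = normal(size=[5, 2])                        # over-determined: 5 equations, 2 unknowns
b = normal(size=5)

x_dagger = pinv(A) @ b                         # pseudo-inverse solution
x_lstsq = lstsq(A, b, rcond=None)[0]           # least-squares solution
print(np.allclose(x_dagger, x_lstsq))          # True

residual = b - A @ x_dagger
print(np.allclose(A.T @ residual, 0.0))        # True: residual is orthogonal to span{A}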
We remarked earlier that when a matrix is positive definite, we can define a weighted norm from this matrix;
On the other hand, if a matrix \( \mathbf{A} \in \mathbb{R}^{N \times M} \) has linearly independent columns then \( \mathbf{M}:= \mathbf{A} \mathbf{A}^\top \) is positive semi-definite;
Often, we will want to take a norm or a pseudo-norm with respect to a positive (semi-)definite matrix, but weighted in inverse proportion to it, as follows.
Weighted norms and pseudo-norms
Suppose \( \mathbf{A} \) is positive definite; we denote the vector norm with respect to \( \mathbf{A} \) as \[ \begin{align} \parallel \pmb{x}\parallel_\mathbf{A} := \sqrt{\pmb{x}^\top \mathbf{A}^{-1}\pmb{x}}. \end{align} \] If \( \mathbf{A} \) has full column rank and \( \mathbf{M}:= \mathbf{A}\mathbf{A}^\top \), we denote the pseudo-norm with respect to \( \mathbf{M} \) as \[ \begin{align} \parallel \pmb{x}\parallel_\mathbf{M} := \sqrt{ \left(\mathbf{A}^\dagger\pmb{x}\right)^\top \left(\mathbf{A}^{\dagger}\pmb{x}\right)}. \end{align} \]
Note, the pseudo-norm is not a true norm, as it takes the value zero on the null space of \( \mathbf{A}^\top \) (the orthogonal complement of the column span of \( \mathbf{A} \)).
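A final sketch of both definitions, with a hypothetical positive definite matrix \( \mathbf{B}^\top\mathbf{B} \) for the weighted norm and a tall matrix with full column rank for the pseudo-norm; the last lines illustrate the note above, since a vector orthogonal to the column span has pseudo-norm approximately zero.
import numpy as np
from numpy.linalg import inv, pinv
from numpy.random import normal

np.random.seed(42)

# weighted norm with respect to a positive definite matrix
B = normal(size=[3, 3])
A = B.T @ B                                    # positive definite
x = normal(size=3)
print(np.sqrt(x.T @ inv(A) @ x))               # || x ||_A

# pseudo-norm with respect to M = C C^T, where C has full column rank
C = normal(size=[3, 2])
M = C @ C.T                                    # positive semi-definite
y = normal(size=3)
z = pinv(C) @ y                                # coordinates C^dagger y
print(np.sqrt(z.T @ z))                        # || y ||_M

# a vector orthogonal to the column span of C has pseudo-norm ~0
w = np.cross(C[:, 0], C[:, 1])                 # orthogonal to both columns in R^3
print(np.sqrt((pinv(C) @ w).T @ (pinv(C) @ w)))   # ~0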