# A review of linear transformations

## Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

## Outline

• The following topics will be covered in this lecture:
• Linear transformations and matrices
• Linear transformations on subspaces
• Matrix spectrum and positive definite matrices
• The singular value decomposition
• Inverses and Pseudo-inverses

## Linear transformations

• Matrices and vectors give us tools to “represent” abstract linear transformations by giving coordinates to them.

• Linear transformations refer to a special class of maps between vector spaces;

• we will usually consider vectors spaces such as $$\mathbb{R}^N$$, though these tools extend in far greater generality.
Linear transformations
Let $$a,b\in \mathbb{R}$$ and $$\pmb{x},\pmb{y}\in V$$ be arbitrary scalars and vectors and $$T$$ be some transformation \begin{align} T : V \rightarrow W. \end{align} $$T$$ is said to be a linear transformation if it satisfies the property \begin{align} T\left(a \pmb{x} + b \pmb{y} \right) = a T\left( \pmb{x}\right) + b T\left(\pmb{y}\right). \end{align}
• Any map that does not satisfy the property above is said to be nonlinear;

• nonlinearities arise in highly different ways in practice and developing general tools for nonlinear maps is challenging.
• For reasons we shall see, we often use a linear approximation when it is appropriate to simplify the challenge of nonlinearities.

### Linear transformations as matrices

• Let's suppose that for simplicity, a transformation

\begin{align} T:V \rightarrow V \end{align}

is a map of the vector space $$V$$ to itself.

• Suppose, moreover, that we have chosen some basis for $$V$$, $$\{\pmb{x}_i\}_{i=1}^N$$;

• with a choice of basis, for each $$i$$ we can write

\begin{align} T(\pmb{x}_i) = \sum_{j=1}^N \alpha_{i,j} \pmb{x}_j \end{align}

as $$T(\pmb{x})\in V$$ by its definition, and by the definition of a basis.

• It is by choosing a basis for the domain and the image of the map, that we make a coordinatization of the map $$T$$.

• Notice that, by the linearity of the transformation $$T$$, for any vector $$\pmb{x} = \sum_{i=1}^N \beta_i \pmb{x}_i$$ its map can be written as

\begin{align} T\left(\pmb{x}\right) &= T\left( \sum_{i=1}^N \beta_i \pmb{x}_i \right)=\sum_{i=1}^N \beta_i T\left(\pmb{x}_i\right) \end{align}

so that determining the output of the map on the basis determines the output for any vector.

### Linear transformations as matrices

• Recall the notation for the coordinatization of the map, $$T(\pmb{x}_i) = \sum_{j=1}^N \alpha_{i,j} \pmb{x}_j$$.

• In particular, suppose we order the coefficients $$\alpha_{i,j}$$ so that $$i$$ refers to the row index and $$j$$ refers to column index of the array

\begin{align} \mathbf{A} := \begin{pmatrix} \alpha_{1,1} & \cdots & \alpha_{1,N} \\ \vdots & \ddots & \vdots \\ \alpha_{N,1} & \cdots & \alpha_{N,N}. \end{pmatrix} \end{align}

• Recall, if $$\pmb{x} = \sum_{i=1}^N \beta_i \pmb{x}_i$$, its coordinates in this basis are given as

\begin{align} \pmb{x} := \begin{pmatrix} \beta_1 & \cdots & \beta_N \end{pmatrix}^\top \end{align}

• Writing the transformation in these bases,

$T(\pmb{x})= \sum_{i=1}^N \beta_i T(\pmb{x}_i) = \sum_{i=1}^N \beta_i \left( \sum_{j=1}^N \alpha_{i,j} \pmb{x}_j\right) = \sum_{j=1}^N \left( \sum_{i=1}^N \beta_i \alpha_{i,j} \right)\pmb{x}_j$

so that the map of $$\pmb{x}$$ under $$T$$ has the coordinatization as

\begin{align} T(\pmb{x}) := \begin{pmatrix} \sum_{i=1}^N \beta_i \alpha_{i,1} & \cdots & \sum_{i=1}^N \beta_i \alpha_{i,N}\end{pmatrix}^\top \equiv \mathbf{A} \pmb{x} \end{align}

so that $$\mathbf{A}$$ completely describes the transformation $$T$$.

### Linear transformations and subspaces

• Note that a linear transformation need not map a vector space $$V$$ to itself;

• a very common situation is when we consider a map of $$V$$ to one of its subspaces with a projection operation.
• Generally, a map to a subspace also needs not be orthogonal;

• however, we will consider a similar property to idempotence in this situation in which a subspace is said to be invariant.
Invariant subspaces of linear transformations
Let $$W\subset V$$ be a subspace, and let $$T:V \rightarrow V$$ be a linear map. The subspace $$W$$ is said to be invariant under $$T$$ if for every $$\pmb{y} \in W$$ then $$T(\pmb{y})\in W$$.
• In the case that a map $$T$$ has an invariant subspace, there is a well-defined restriction of $$T$$ to the subspace

$T|_W: W \rightarrow W.$

• In particular, this restriction can similarly be written as matrix transformation;

• if $$V$$ has a basis of dimension $$N$$ and $$W$$ has a basis of dimension $$M$$, the restriction takes the form of a $$M\times M$$ matrix.
• With the above points, it brings us to the question of what basis should we choose for a linear transformation?

• The standard Euclidean basis is the most common choice, but the Euclidean basis may not take advantage of the invariant properties of the transformation.

• Particularly, if $$M \ll N$$ and we are only concerned with the action on the subspace, a well-chosen basis as above can perform a dramatic reduction to the computation of the map.

### Linear transformations and subspaces

• As a motivating example, consider the case where $$\mathrm{span}\{\pmb{x}\} \subset \mathbb{R}^N$$ is an invariant subspace of the transformation

$T:\mathbb{R}^N \rightarrow \mathbb{R}^N.$

• By the definition,

$T(\pmb{x}) \in \mathrm{span}\{\pmb{x}\} \Leftrightarrow T(\pmb{x}) =\alpha \pmb{x}.$

• Therefore, for any arbitrary vector $$\beta \pmb{x}$$ in the subspace

\begin{align} \mathbf{A}\beta \pmb{x} = \alpha \beta\pmb{x} \end{align}

so that the action of the linear transformation on the entire subspace is that of scalar multiplication.

• If we choose $$\pmb{x}$$ as a basis vector for $$\mathbb{R}^N$$, the associated matrix $$\mathbf{A}$$ for this choice of basis will have a diagonal element corresponding to this action.

• Diagonal matrices are extremely simple for computation; to make efficient use of computations we will be concerned with matrix decompositions, corresponding to special choices of coordinates.

• The most fundamental matrix decomposition is known as the eigen value decomposition, discussed as follows.

## Eigenvalues and eigenvectors

Eigen values and vectors
If a nonzero vector $$\pmb{x}$$ has the property that, $\mathbf{A}\pmb{x} =\lambda \pmb{x}$ then $$\pmb{x}$$ is said to be an eigen vector of $$\mathbf{A}$$ associated to the eigen value $$\lambda$$.
• Diagonal matrices are ones that have an entire coordinate system composed of eigen vectors;

• each basis vector defines an invariant subspace and the action is equivalent to scalar multiplication on each subspace.
• Certain non-diagonal matrices can be transformed into diagonal matrices by finding such a coordinate system and applying a “change of basis” transformation.

• Suppose that $$\pmb{x}$$ is an eigen vector for a matrix $$\mathbf{A}$$ associated to a zero eigenvalue $$\lambda=0$$; then the columns of $$\mathbf{A}^i$$ we have a linear dependence defined as

\begin{align} \sum_{i=1}^N x_i \mathbf{A}^i = \pmb{0} . \end{align}

• Notice now that,

\begin{align} \mathbf{A}\pmb{x} =\lambda \pmb{x} & & \Leftrightarrow & & \mathbf{A}\pmb{x} - \lambda \pmb{x} = 0 & & \Leftrightarrow & & \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \pmb{x} = 0. \end{align}

• This means that $$\pmb{x}$$ is an eigen vector of the matrix $$\left(\mathbf{A} - \lambda \mathbf{I}_n\right)$$ associated to the zero eigenvalue and a linear dependence in the columns of $$\left(\mathbf{A} - \lambda \mathbf{I}_n\right)$$.

## The polar decomposition

• Not all matrices can be diagonalized with a change of basis;

• however, there is a special class of matrices for which this is true and arise commonly in statistical analysis.
• This result is a special case of the more general spectral theorem:

The polar decomposition (restricted spectral theorem)
Suppose that $$\mathbf{A}$$ is a real valued symmetric matrix, i.e., $$\mathbf{A} = \mathbf{A}^\top$$. Then there exists an orthogonal change of basis matrix $$\mathbf{U}$$ and diagonal matrix $$\mathbf{D}$$ for which \begin{align} \mathbf{A} = \mathbf{U}\mathbf{D} \mathbf{U}^\top. \end{align}
• The columns of $$\mathbf{U}$$ are the (orthonormal) eigen vectors of the matrix $$\mathbf{A}$$ and the diagonal elements of $$\mathbf{D}$$ are the associated eigen values.

• Orthogonal matrices represent rotations;
• therefore, with respect to a rotation of the coordinate frame, the action of the map is scalar multiplication on the invariant subspaces.
• If the eigenvalues are larger than one in absolute value, this action then corresponds to an expansion along these directions while eigen values with norm less than one correspond to a contraction;

• a negative eigenvalue corresponds to a reflection in the subspace, including a contraction or expansion as above.
• The eigenvalues, with a correct rotation of perspective, thus completely describe the action of a symmetric matrix as above.

## Eigen values and vectors in Numpy

• We will make a quick demonstration of the basic eigen function in Numpy
import numpy as np
from numpy.linalg import eig
from numpy.random import normal

• We begin by randomly generating an array with standard normal entries
np.random.seed(42)
my_array = normal(size=[3,3])
my_array

array([[ 0.49671415, -0.1382643 ,  0.64768854],
[ 1.52302986, -0.23415337, -0.23413696],
[ 1.57921282,  0.76743473, -0.46947439]])

• To compute the eigen values and the eigen vectors, we can use the eig function imported above:
e_vals, e_vecs = eig(my_array)


### Eigen values and vectors in Numpy

• From the last slide, we have
e_vals

array([ 1.22795686+0.j        , -0.71743524+0.61245434j,
-0.71743524-0.61245434j])

• Notice that two of the eigen values are complex and come in a conjugate pair;

• generally, real matrices can have complex eigen values, corresponding to rotations within the subspaces.
• This is not the case, however, for real symmetric matrices as discussed in the polar decomposition.

• Indeed,

my_symmetric_array = my_array.transpose() @ my_array
my_symmetric_array

array([[ 5.06025801,  0.78664234, -0.77628148],
[ 0.78664234,  0.66290088, -0.39501919],
[-0.77628148, -0.39501919,  0.69472676]])

e_vals, e_vecs = eig(my_symmetric_array)


### Eigen values and vectors in Numpy

• From the last slide, we see that the eigen values of the symmetric array are given as
e_vals

array([5.34613967, 0.78854214, 0.28320384])

• Notice the action of the symmetric array on the eigen vectors
e_vecs[:,0]

array([ 0.96813371,  0.17751704, -0.17664888])

my_symmetric_array @ e_vecs[:,0] / e_vals

array([ 0.96813371,  0.17751704, -0.17664888])

• If the map needs to be applied many times, computing the change of basis into the eigen vectors can be more cost-efficient than applying the map directly.

• Notice that the above eigen values are all positive – this is actually a general property of “squared” matrices constructed as

\begin{align} \mathbf{A}^\top \mathbf{A} \end{align}

for arbitrary arrays $$\mathbf{A}$$ with linearly independent columns.

## Positive definite matrices

• When all eigenvalues of a symmetric matrix $$\mathbf{A}$$ are positive we can visualize the matrix as a hyper-ellipse.

• The principal axes of the hyper-ellipse are given by the eigen vectors, while the length of these axes are given by the eigen values.

Positive (semi)-definite matrices
A real symmetric matrix $$\mathbf{A}$$ is said to be positive definite if all of its eigenvalues are greater than zero. If all eigenvalues are non-negative, then $$\mathbf{A}$$ is said to be positive semi-definite.
• Positive definite matrices are special, in part, because they define alternative, weighted version of the Euclidean norm by $$\sqrt{\pmb{x}^\top \mathbf{A}\pmb{x}}$$.

• For our purposes, we will more commonly use the norm defined by the inverse of a positive definite $$\mathbf{A}$$;

• we will return to this point shortly when we discuss matrix inverses.
• Let $$\mathbf{D} := \mathrm{diag}\left(\lambda_i\right)$$ be the diagonal matrix of the eigen values for the positive definite matrix $$\mathbf{A}$$.

• Positive definite matrices also have well-defined square roots – notice that for

$\sqrt{\mathbf{A}} := \mathbf{A}^\frac{1}{2}:= \mathbf{U}\mathrm{diag}\left(\sqrt{\lambda}_i\right) \mathbf{U}^\top$

we can write $$\mathbf{A}= \left(\mathbf{A}^\frac{1}{2}\right)^2$$.

## The singular value decomposition

• Consider now an arbitrary matrix $$\mathbf{M}\in \mathbb{R}^{N\times M}$$ for $$N>M$$; if $$\mathbf{M}$$ is not square, it cannot be symmetric, but each of

\begin{align} \mathbf{M}^\top \mathbf{M} \in \mathbb{R}^{M\times M} & & \mathbf{M}\mathbf{M}^\top \in \mathbb{R}^{N\times N} \end{align} are symmetric.

• Using the polar decomposition, we can thus find their diagonalizations

\begin{align} \mathbf{M}^\top \mathbf{M} & = \mathbf{V} \mathbf{D} \mathbf{V}^\top\\ \mathbf{M}\mathbf{M}^\top & = \mathbf{U} \hat{\mathbf{D}} \mathbf{U}^\top, \end{align}

where

1. $$\mathbf{V} \in \mathbb{R}^{M\times M},\mathbf{U} \in \mathbb{R}^{N \times N}$$ are orthogonal matrices;
2. $$\mathbf{D} \in \mathbb{R}^{M \times M}, \hat{\mathbf{D}} \in \mathbb{R}^{N \times N}$$ are both diagonal matrices of the eigenvalues of the “square”.
• Assume that the columns of $$\mathbf{M}$$ are linearly independent – then both $$\mathbf{D}$$ and $$\hat{\mathbf{D}}$$ have exactly $$M$$ non-zero eigenvalues.

• If we order the non-zero eigenvalues of $$\hat{\mathbf{D}}$$ such that there is a non-zero block in the top left,

\begin{align} \hat{\mathbf{D}} = \begin{pmatrix} \mathbf{D} & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix} \end{align}

### The singular value decomposition

• It can be demonstrated that $$\mathbf{M}$$ has a decomposition into,

\begin{align} \mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top, \end{align}

where

\begin{align} \boldsymbol{\Sigma} = \begin{pmatrix} \sqrt{\mathbf{D}} \\ 0_{(N-M) \times M} \end{pmatrix}. \end{align}

• This decomposition is known as the singular value decomposition. The singular values are the (positive) principal-diagonal entries of $$\boldsymbol{\Sigma}$$ usually ordered descendingly in size.

• The columns of $$\mathbf{U}$$ are called the left singular vectors while the columns of $$\mathbf{V}$$ are called the right singular vectors.

• Thus we can write, e.g.,

\begin{align} \mathbf{M}\mathbf{M}^\top &= \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top \mathbf{V} \boldsymbol{\Sigma}^\top \mathbf{U}^\top \\ &= \mathbf{U} \begin{pmatrix} \sqrt{\mathbf{D}} \sqrt{\mathbf{D}} & 0_{M \times (N-M)} \\ 0_{(N-M) \times M} & 0_{(N-M) \times (N-M)} \end{pmatrix} \mathbf{U}^\top \\ & = \mathbf{U} \hat{\mathbf{D}} \mathbf{U}^\top \end{align}

• We can recover the eigen decomposition for $$\mathbf{M}^\top \mathbf{M}$$ in a similar fashion.