Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. The Fair Use Copyright Disclaimer is under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

- The following topics will be covered in this lecture:
- Linear inverse problems
- Matrix spectrum and eigen vectors
- Matrix norms

An inverse in the R language can be found from a more general problem called a

**linear inverse problem**:\[ \mathbf{A} \mathbf{x} = \mathbf{b} \]

In the above, we assume that the

**vector \( \mathbf{x} \) is of interest but unknown**.- On the other hand, we assume that
**\( \mathbf{A} \) is a known relationship between \( \mathbf{x} \) and the observed vector \( \mathbf{b} \)**.

- On the other hand, we assume that
If there are

**no dependencies in the relationship defined by the columns of \( \mathbf{A} \)**, then there is a**unique relationship**that transfers the unobserved \( \mathbf{x} \) to the observed variables \( \mathbf{b} \).Being able to invert this relationship, we find

\[ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. \]

The way to implement such a procedure in R is with the

`solve()`

function.If a matrix \( \mathbf{A} \) is supplied alone, this computes the inverse \( \mathbf{A}^{-1} \) if it exists.

- If a vector \( \mathbf{b} \) is supplied additionally, then this solves for \( \mathbf{x} \) as above.

- We will use the
`solve()`

function with`my_matrix`

(as generated last time) to demonstrate:

```
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
solve(my_matrix) %*% my_matrix
```

```
[,1] [,2] [,3] [,4]
[1,] 1.000000e+00 1.942890e-16 0.000000e+00 -5.551115e-17
[2,] -2.220446e-16 1.000000e+00 0.000000e+00 0.000000e+00
[3,] -2.220446e-16 1.665335e-16 1.000000e+00 -5.551115e-17
[4,] 0.000000e+00 2.775558e-16 -4.440892e-16 1.000000e+00
```

```
my_matrix %*% solve(my_matrix)
```

```
[,1] [,2] [,3] [,4]
[1,] 1.000000e+00 1.110223e-16 0.000000e+00 -2.220446e-16
[2,] -2.775558e-17 1.000000e+00 -2.220446e-16 4.996004e-16
[3,] 0.000000e+00 0.000000e+00 1.000000e+00 5.551115e-17
[4,] -5.551115e-17 0.000000e+00 3.330669e-16 1.000000e+00
```

- Notice that the off diagonal elements are not exactly but approximately zero.

- We can also solve the linear inverse problem,

```
my_matrix %*% solve(my_matrix, 1:4)
```

```
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
```

- This is representing the equation \[ \mathbf{A} \mathbf{x} = \mathbf{A} \left(\mathbf{A}^{-1}\mathbf{b}\right) = \mathbf{b}. \]

Notice the dimensionality in the previous linear inverse problem,

\[ \begin{align} \mathbf{A}\in \mathbb{R}^{n\times n} & & \mathbf{x}\in\mathbb{R}^{n \times 1 } & & \mathbf{b} \in \mathbb{R}^{n\times 1} \end{align} \]

That is to say, \( \mathbf{A} \) takes \( \mathbf{x} \) from where it lies in \( \mathbb{R}^n \) to another vector in \( \mathbb{R}^n \).

Generally, when we consider a square matrix \( \mathbf{A}\in\mathbb{R}^{n\times n} \), the transformation it represents is from the space \( \mathbb{R}^n \) back to itself.

A special notion exists for such transformations, when the transformation only scales the existing values.

Consider the diagonal matrix,

```
diag(1:3)
```

```
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
```

```
diag(1:3) %*% c(1, 0, 0)
```

```
[,1]
[1,] 1
[2,] 0
[3,] 0
```

- Now we consider

```
diag(1:3) %*% c(0, 1, 0)
```

```
[,1]
[1,] 0
[2,] 2
[3,] 0
```

```
diag(1:3) %*% c(0, 0, 1)
```

```
[,1]
[1,] 0
[2,] 0
[3,] 3
```

In each case, the matrix

`diag(1:3)`

had the property that it sent the vector back to a re-scaled copy of itself.This is because each of the vectors were distinct eigenvectors for the matrix.

If a nonzero vector \( \mathbf{x} \) has the property that,

\[ \mathbf{A}\mathbf{x} =\lambda \mathbf{x} \]

then \( \mathbf{x} \) is said to be an eigenvector of \( \mathbf{A} \) associated to the eigenvalue \( \lambda \).

Diagonal matrices are ones that have an entire coordinate system composed of eigenvectors.

- Certain non-diagonal matrices can be transformed into diagonal matrices by finding such a coordinate system.

Notice now that,

\[ \begin{align} & \mathbf{A}\mathbf{x} =\lambda \mathbf{x} \\ \Leftrightarrow & \mathbf{A}\mathbf{x} - \lambda \mathbf{x} = 0 \\ \Leftrightarrow & \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \mathbf{x} = 0 \\ \end{align} \]

This means that \( \mathbf{x} \) is an eigenvector of the matrix \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \) associated to the zero eigenvalue.

- Particularly, this means that there is a linear dependence in \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \) and

\[ \mathrm{det} \left(\mathbf{A} - \lambda \mathbf{I}_n\right) = 0. \]

The above fact is a basic property that allows us to solve for eigenvalues of the matrix \( \mathbf{A} \).

- In the R language, eigenvalues and eigenvectors can be computed as follows:

```
eigen(my_matrix)
```

```
eigen() decomposition
$values
[1] -0.312062+1.564758i -0.312062-1.564758i 1.387185+0.000000i
[4] -0.687975+0.000000i
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.3449514+0.0145144i -0.3449514-0.0145144i -0.62691105+0i 0.2303783+0i
[2,] 0.5832284+0.0000000i 0.5832284+0.0000000i -0.44282830+0i 0.6601861+0i
[3,] 0.2216425+0.4402345i 0.2216425-0.4402345i -0.63480339+0i 0.3408087+0i
[4,] -0.2440101+0.4880265i -0.2440101-0.4880265i -0.08893981+0i 0.6284342+0i
```

Notice that some of the eigenvalue are actually complex numbers, and the eigenvectors are complex vectors.

- This is due to the fact that such a coordinate system to transform a matrix into a representation in eigenspaces may only exist in complex coordinates.

- Instead consider,

```
A <- matrix(1:16, nrow=4, ncol=4)
eigen(diag(1:3))
```

```
eigen() decomposition
$values
[1] 3 2 1
$vectors
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 1 0
[3,] 1 0 0
```

```
eigen(A)
```

```
eigen() decomposition
$values
[1] 3.620937e+01 -2.209373e+00 -9.072325e-16 7.166935e-16
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.4140028 0.82289268 -0.5477226 -0.06211969
[2,] 0.4688206 0.42193991 0.7302967 0.48844043
[3,] 0.5236384 0.02098714 0.1825742 -0.79052178
[4,] 0.5784562 -0.37996563 -0.3651484 0.36420104
```

- Taking a product of
`A`

with one of its zero eigenvectors, we get

```
A %*% eigen(A)$vectors[,3]
```

```
[,1]
[1,] -8.881784e-16
[2,] 0.000000e+00
[3,] 8.881784e-16
[4,] 1.776357e-15
```

```
A %*% eigen(A)$vectors[,4]
```

```
[,1]
[1,] -8.881784e-16
[2,] 0.000000e+00
[3,] 8.881784e-16
[4,] 0.000000e+00
```

- Specifically, this gives a dependence relationship as,

```
A[,1] *eigen(A)$vectors[1,3] + A[,2] *eigen(A)$vectors[2,3] + A[,3] *eigen(A)$vectors[3,3] + A[,4] *eigen(A)$vectors[4,3]
```

```
[1] -8.881784e-16 0.000000e+00 8.881784e-16 1.776357e-15
```

Recall now that the eigenvalues of

`diag(1:3)`

are 1, 2 and 3.**Q:**can you tell how these eigenvalues relate to the value

```
det(diag(1:3))
```

```
[1] 6
```

**A:**In fact, the determinant of the matrix is equal to the product of the eigenvalues.From the above, we recover a general equivalence:

The matrix \( \mathbf{A} \) has an inverse \( \Leftrightarrow \) \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \) \( \Leftrightarrow \) \( \mathbf{A} \) has no linear dependence between its columns \( \Leftrightarrow \) the matrix \( \mathbf{A} \) has no zero eigenvalues \( \Leftrightarrow \) the linear inverse problem \( \mathbf{A}\mathbf{x} = \mathbf{b} \) has a unique solution for \( \mathbf{x} \).

- All of the above statements are equivalent and are useful to equivalently in different scenarios.

This shows how the determinant is related to the eigenvalues and the spectrum of the matrix \( \mathbf{A} \).

The trace is also related to the eigenvalues as follows:

- Let the collection of all eigenvalues of \( \mathbf{A} \) be defined as \( \{\lambda_i\}_{i=1}^k \) for some \( k\leq n \), then \[ \mathrm{tr}\left(\mathbf{A}\right) = \sum_{i=1}^k \lambda_i \]

We will use this fact shortly when we discuss the Frobenius norm.

At the moment, we will introduce a basic case of the spectral theorem, that is useful for understanding the idea of a matrix norm.

Suppose that we have a matrix \( \mathbf{A} \in \mathbb{R}^{n\times p} \) where we assume that \( p \leq n \).

We will define a square product of this matrix with itself so that the dimensionality makes sense, and so that it is in the smallest dimension \( p\times p \) as,

\[ \mathbf{A}^\mathrm{T} \mathbf{A} \in \mathbb{R}^{p\times p}. \]

The spectral theorem guarantees that this matrix can be transformed into a diagonal matrix in an appropriate real-valued coordinate change, and the diagonal will have only non-negative values on the diagonal.

Particularly, the \( p \) non-negative values on the diagonal are the eigenvalues \( \{\lambda_i\}_{i=1}^p \) of \( \mathbf{A}^\mathrm{T}\mathbf{A} \) (or the singular values squared of \( \mathbf{A} \)).

The reason that this can be decomposed into such a coordinate system is because of the symmetry of the product under transpose:

\[ \left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)^\mathrm{T} = \mathbf{A}^\mathrm{T} \left(\mathbf{A}^\mathrm{T}\right)^\mathrm{T} =\mathbf{A}^\mathrm{T} \mathbf{A} \]

The reason that the eigenvalues must be non-negative is because this acts like the square of a scalar, but in terms of the scalar eigenvalues.

There are many ways we can describe the “length” of the matrix, and all give different features more prominence, but are algebraicly equivalent up to rescaling.

One particularly useful type of norm is known as the Frobenius norm of a matrix, and arises naturally due to the previous decomposition.

We note that \( \mathbf{A}^\mathrm{T} \mathbf{A} \) has p non-negative eigenvalues and that therefore,

\[ \mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right) = \sum_{i=1}^p \lambda_i. \] can be computed directly by solving for the eigenvalues.

Particularly, this gives a kind of weighted measure of the expansion and contraction under the map \( \mathbf{A}^\mathrm{T}\mathbf{A} \).

We define the Frobenius norm as an actual mathematical matrix “distance” as,

\[ \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)} \]

The Frobenius norm,

\[ \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)} \]

is a particularly useful distance to understand as it relates to the singular value decomposition / principal component analysis.

- This is also the choice of norm that arises from the inner product of matrices defined in terms of,

\[ \mathrm{tr}\left(\mathbf{B}^\mathrm{T} \mathbf{A}\right) \] for \( \mathbf{A},\mathbf{B}\in\mathbb{R}^{n\times p} \).

This has a very similar interpretation then in terms of the Euclidean norm for a vector, but extended to matrices.

To compute a matrix norm in R, this is performed with the

`norm`

function.There are several different choices, but in terms of making some choice, you should understand what is special about that choice of norm.

We can also use the default choice that R provides to produce a size of the matrix in terms of the maximum size of one of its columns, treated as

\[ \parallel \mathbf{A} \parallel_1 = \max_{j=1, \cdots, p} \sum_{i=1}^n \vert a_{i,j}\vert \]

- For a generic matrix, we can see the differences in the length estimation as follows:

```
A <- matrix(c(1,2), nrow=2, ncol=2, byrow=TRUE)
A
```

```
[,1] [,2]
[1,] 1 2
[2,] 1 2
```

```
norm(A) # default norm
```

```
[1] 4
```

```
norm(A, type="f") # frobenius norm
```

```
[1] 3.162278
```

- Note the eigenvalue decomposition as,

```
eigen(t(A) %*% A)
```

```
eigen() decomposition
$values
[1] 10 0
$vectors
[,1] [,2]
[1,] 0.4472136 -0.8944272
[2,] 0.8944272 0.4472136
```

- and therefore we find \( \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr\left(\mathbf{A}^\mathrm{T} \mathbf{A}\right)}}= \sqrt{10 + 0} = \)

```
sqrt(10)
```

```
[1] 3.162278
```

```
norm(A, type="f")
```

```
[1] 3.162278
```

- The main ideas to take away from this introduction to matrix algebra are the following:
- Matrix multiplication is a general case of vector
**multiplication of rows versus columns**. This is fundamentally different from the element-wise multiplication which does not contain the same mathematical / geometric meaning of the inner product or matrix product. Paricularly, \[ \begin{align} \mathbf{a}^\mathrm{T}\mathbf{b} = \parallel \mathbf{a} \parallel * \parallel \mathbf{b} \parallel \cos\left(\theta\right), \end{align} \] so that this product is a scalar value that depends on the lengths and angle between the vectors. - The
**general matrix product**is defined as, \[ \begin{align} \mathbf{N} \mathbf{M} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \mathbf{m_1} & \mathbf{n}_1^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_1^\mathrm{T} \mathbf{m}_M \\ \vdots & \ddots & \cdots & \vdots \\ \mathbf{n}_N^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_N^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_N^\mathrm{T} \mathbf{m}_M \end{pmatrix} \in \mathbb{R}^{N\times M} \end{align} \] - As an algebra system,
**matrices only have inverses under certain conditions**:The matrix \( \mathbf{A} \) has an inverse \( \Leftrightarrow \) \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \) \( \Leftrightarrow \) \( \mathbf{A} \) has no linear dependence between its columns \( \Leftrightarrow \) the matrix \( \mathbf{A} \) has no zero eigenvalues \( \Leftrightarrow \) the linear inverse problem \( \mathbf{A}\mathbf{x} = \mathbf{b} \) has a unique solution for \( \mathbf{x} \).

**Eigenvalues and eigenvectors describe special coordinate systems**in which the effect of the matrix \( \mathbf{A} \) can be described as**diagonal or close-to-diagonal, but such coordinate systems may be complex valued**.- These also provide a useful way to describe the length of the matrix in terms of the Frobenius norm, but multiple choices of norm exist with different interpretations of length.