Matrix algebra in R part II

Outline

  • The following topics will be covered in this lecture:
    • Linear inverse problems
    • Matrix spectrum and eigenvectors
    • Matrix norms

Solving a linear system of equations

  • In the R language, an inverse can be found by way of a more general problem, called a linear inverse problem:

    \[ \mathbf{A} \mathbf{x} = \mathbf{b} \]

  • In the above, we assume that the vector \( \mathbf{x} \) is of interest but unknown.

    • On the other hand, we assume that \( \mathbf{A} \) is a known relationship between \( \mathbf{x} \) and the observed vector \( \mathbf{b} \).
  • If there are no linear dependencies among the columns of \( \mathbf{A} \), then there is a unique relationship that transfers the unobserved \( \mathbf{x} \) to the observed variables \( \mathbf{b} \).

  • When this relationship can be inverted, we find

    \[ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. \]

  • The way to implement such a procedure in R is with the solve() function.

  • If a matrix \( \mathbf{A} \) is supplied alone, this computes the inverse \( \mathbf{A}^{-1} \) if it exists.

    • If a vector \( \mathbf{b} \) is supplied additionally, then this solves for \( \mathbf{x} \) as above.

Solving a linear system of equations example

  • We will use the solve() function with my_matrix (as generated last time) to demonstrate:
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
solve(my_matrix) %*% my_matrix
              [,1]         [,2]          [,3]          [,4]
[1,]  1.000000e+00 1.942890e-16  0.000000e+00 -5.551115e-17
[2,] -2.220446e-16 1.000000e+00  0.000000e+00  0.000000e+00
[3,] -2.220446e-16 1.665335e-16  1.000000e+00 -5.551115e-17
[4,]  0.000000e+00 2.775558e-16 -4.440892e-16  1.000000e+00
my_matrix %*% solve(my_matrix)
              [,1]         [,2]          [,3]          [,4]
[1,]  1.000000e+00 1.110223e-16  0.000000e+00 -2.220446e-16
[2,] -2.775558e-17 1.000000e+00 -2.220446e-16  4.996004e-16
[3,]  0.000000e+00 0.000000e+00  1.000000e+00  5.551115e-17
[4,] -5.551115e-17 0.000000e+00  3.330669e-16  1.000000e+00
  • Notice that the off-diagonal elements are not exactly zero, but only approximately zero, due to floating-point arithmetic.
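  • As a minimal sketch (using base R's round() and all.equal()), we can confirm that the product matches the identity up to floating-point tolerance:
round(solve(my_matrix) %*% my_matrix, digits=10)   # rounds the noise away, leaving the 4 x 4 identity
all.equal(solve(my_matrix) %*% my_matrix, diag(4)) # TRUE, a comparison within a numerical tolerance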

Solving a linear system of equations example

  • We can also solve the linear inverse problem,
my_matrix %*% solve(my_matrix, 1:4)
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
  • This is representing the equation \[ \mathbf{A} \mathbf{x} = \mathbf{A} \left(\mathbf{A}^{-1}\mathbf{b}\right) = \mathbf{b}. \]
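  • As a small sketch, we can also store the intermediate solution \( \mathbf{x} = \mathbf{A}^{-1}\mathbf{b} \) and inspect the residual directly:
x <- solve(my_matrix, 1:4)  # solves my_matrix %*% x = 1:4 for x
my_matrix %*% x - 1:4       # the residual is zero up to floating-point error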

Matrix spectrum and norms

  • Notice the dimensionality in the previous linear inverse problem,

    \[ \begin{align} \mathbf{A}\in \mathbb{R}^{n\times n} & & \mathbf{x}\in\mathbb{R}^{n \times 1 } & & \mathbf{b} \in \mathbb{R}^{n\times 1} \end{align} \]

  • That is to say, \( \mathbf{A} \) takes \( \mathbf{x} \) from where it lies in \( \mathbb{R}^n \) to another vector in \( \mathbb{R}^n \).

  • Generally, when we consider a square matrix \( \mathbf{A}\in\mathbb{R}^{n\times n} \), the transformation it represents is from the space \( \mathbb{R}^n \) back to itself.

  • A special notion exists for such transformations when the transformation acts on a vector purely by re-scaling it.

  • Consider the diagonal matrix,

diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3
diag(1:3) %*% c(1, 0, 0)
     [,1]
[1,]    1
[2,]    0
[3,]    0

Matrix spectrum and norms continued

  • Now we consider
diag(1:3) %*% c(0, 1, 0)
     [,1]
[1,]    0
[2,]    2
[3,]    0
diag(1:3) %*% c(0, 0, 1)
     [,1]
[1,]    0
[2,]    0
[3,]    3
  • In each case, the matrix diag(1:3) had the property that it sent the vector back to a re-scaled copy of itself.

  • This is because each of these vectors is a distinct eigenvector of the matrix.

Eigenvalues and eigenvectors

  • If a nonzero vector \( \mathbf{x} \) has the property that,

    \[ \mathbf{A}\mathbf{x} =\lambda \mathbf{x} \]

    then \( \mathbf{x} \) is said to be an eigenvector of \( \mathbf{A} \) associated to the eigenvalue \( \lambda \).

  • Diagonal matrices are ones that have an entire coordinate system composed of eigenvectors.

    • Certain non-diagonal matrices can be transformed into diagonal matrices by finding such a coordinate system.
  • Notice now that,

    \[ \begin{align} & \mathbf{A}\mathbf{x} =\lambda \mathbf{x} \\ \Leftrightarrow & \mathbf{A}\mathbf{x} - \lambda \mathbf{x} = 0 \\ \Leftrightarrow & \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \mathbf{x} = 0 \\ \end{align} \]

  • This means that \( \mathbf{x} \) is an eigenvector of the matrix \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \) associated to the zero eigenvalue.

    • Particularly, this means that there is a linear dependence among the columns of \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \), and

    \[ \mathrm{det} \left(\mathbf{A} - \lambda \mathbf{I}_n\right) = 0. \]

  • The above fact is a basic property that allows us to solve for eigenvalues of the matrix \( \mathbf{A} \).
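  • As a quick numerical check of this property, consider the eigenvalue 2 of the diagonal matrix diag(1:3) from before (a minimal sketch):
lambda <- 2
det(diag(1:3) - lambda * diag(3)) # returns 0, so diag(1:3) - 2 * I is singular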

Eigenvalues and eigenvectors

  • In the R language, eigenvalues and eigenvectors can be computed as follows:
eigen(my_matrix)
eigen() decomposition
$values
[1] -0.312062+1.564758i -0.312062-1.564758i  1.387185+0.000000i
[4] -0.687975+0.000000i

$vectors
                      [,1]                  [,2]           [,3]         [,4]
[1,] -0.3449514+0.0145144i -0.3449514-0.0145144i -0.62691105+0i 0.2303783+0i
[2,]  0.5832284+0.0000000i  0.5832284+0.0000000i -0.44282830+0i 0.6601861+0i
[3,]  0.2216425+0.4402345i  0.2216425-0.4402345i -0.63480339+0i 0.3408087+0i
[4,] -0.2440101+0.4880265i -0.2440101-0.4880265i -0.08893981+0i 0.6284342+0i
  • Notice that some of the eigenvalues are actually complex numbers, and the eigenvectors are complex vectors.

    • This is because the coordinate system that transforms the matrix into its eigenspace representation may only exist in complex coordinates.

Eigenvalues and eigenvectors

  • Instead consider,
A <- matrix(1:16, nrow=4, ncol=4)
eigen(diag(1:3))
eigen() decomposition
$values
[1] 3 2 1

$vectors
     [,1] [,2] [,3]
[1,]    0    0    1
[2,]    0    1    0
[3,]    1    0    0
eigen(A)
eigen() decomposition
$values
[1]  3.620937e+01 -2.209373e+00 -9.072325e-16  7.166935e-16

$vectors
          [,1]        [,2]       [,3]        [,4]
[1,] 0.4140028  0.82289268 -0.5477226 -0.06211969
[2,] 0.4688206  0.42193991  0.7302967  0.48844043
[3,] 0.5236384  0.02098714  0.1825742 -0.79052178
[4,] 0.5784562 -0.37996563 -0.3651484  0.36420104

Eigenvalues and eigenvectors

  • Taking the product of A with one of the eigenvectors associated to a zero eigenvalue, we get
A %*% eigen(A)$vectors[,3]
              [,1]
[1,] -8.881784e-16
[2,]  0.000000e+00
[3,]  8.881784e-16
[4,]  1.776357e-15
A %*% eigen(A)$vectors[,4]
              [,1]
[1,] -8.881784e-16
[2,]  0.000000e+00
[3,]  8.881784e-16
[4,]  0.000000e+00
  • Specifically, this gives a dependence relationship as,
A[,1] * eigen(A)$vectors[1,3] + A[,2] * eigen(A)$vectors[2,3] + A[,3] * eigen(A)$vectors[3,3] + A[,4] * eigen(A)$vectors[4,3]
[1] -8.881784e-16  0.000000e+00  8.881784e-16  1.776357e-15

Eigenvalues and eigenvectors

  • Recall now that the eigenvalues of diag(1:3) are 1, 2 and 3.

  • Q: can you tell how these eigenvalues relate to the value

det(diag(1:3))
[1] 6
  • A: In fact, the determinant of the matrix is equal to the product of the eigenvalues.
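  • We can verify this identity directly in R (a minimal sketch):
prod(eigen(diag(1:3))$values) # 3 * 2 * 1 = 6, matching det(diag(1:3))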

  • From the above, we recover a general equivalence:

The matrix \( \mathbf{A} \) has an inverse \( \Leftrightarrow \) \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \) \( \Leftrightarrow \) \( \mathbf{A} \) has no linear dependence between its columns \( \Leftrightarrow \) the matrix \( \mathbf{A} \) has no zero eigenvalues \( \Leftrightarrow \) the linear inverse problem \( \mathbf{A}\mathbf{x} = \mathbf{b} \) has a unique solution for \( \mathbf{x} \).
  • All of the above statements are equivalent, and each formulation is useful in different scenarios.

Eigenvalues and eigenvectors

  • This shows how the determinant is related to the eigenvalues and the spectrum of the matrix \( \mathbf{A} \).

  • The trace is also related to the eigenvalues as follows:

    • Let the eigenvalues of \( \mathbf{A} \), counted with multiplicity, be \( \{\lambda_i\}_{i=1}^n \); then \[ \mathrm{tr}\left(\mathbf{A}\right) = \sum_{i=1}^n \lambda_i \] (a numerical sketch follows at the end of this slide).
  • We will use this fact shortly when we discuss the Frobenius norm.

  • At the moment, we will introduce a basic case of the spectral theorem, which is useful for understanding the idea of a matrix norm.
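  • As a sketch of the trace identity above, using my_matrix (the imaginary parts of the complex eigenvalues cancel in the sum):
sum(diag(my_matrix))             # the trace of my_matrix
Re(sum(eigen(my_matrix)$values)) # the sum of its eigenvalues agrees with the trace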

A special case of the spectral theorem

  • Suppose that we have a matrix \( \mathbf{A} \in \mathbb{R}^{n\times p} \) where we assume that \( p \leq n \).

  • We will define a square product of this matrix with itself so that the dimensionality makes sense, and so that it is in the smallest dimension \( p\times p \) as,

    \[ \mathbf{A}^\mathrm{T} \mathbf{A} \in \mathbb{R}^{p\times p}. \]

  • The spectral theorem guarantees that this matrix can be transformed into a diagonal matrix by an appropriate real-valued change of coordinates, and this diagonal matrix will have only non-negative entries.

  • Particularly, the \( p \) non-negative values on the diagonal are the eigenvalues \( \{\lambda_i\}_{i=1}^p \) of \( \mathbf{A}^\mathrm{T}\mathbf{A} \) (or the singular values squared of \( \mathbf{A} \)).

  • The reason that this can be decomposed into such a coordinate system is because of the symmetry of the product under transpose:

    \[ \left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)^\mathrm{T} = \mathbf{A}^\mathrm{T} \left(\mathbf{A}^\mathrm{T}\right)^\mathrm{T} =\mathbf{A}^\mathrm{T} \mathbf{A} \]

  • The reason that the eigenvalues must be non-negative is that the product acts like the square of a scalar: for any vector \( \mathbf{x} \), we have \( \mathbf{x}^\mathrm{T}\mathbf{A}^\mathrm{T}\mathbf{A}\mathbf{x} = \parallel \mathbf{A}\mathbf{x}\parallel^2 \geq 0 \), so no eigenvalue can be negative.
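  • The following is a minimal sketch of these facts, where the example matrix B and its dimensions are hypothetical choices; crossprod(B) computes the product t(B) %*% B:
set.seed(1)
B <- matrix(rnorm(12), nrow=4, ncol=3) # B lies in R^(4 x 3), so t(B) %*% B is 3 x 3
eigen(crossprod(B))$values             # the eigenvalues are all non-negative
svd(B)$d^2                             # and equal the squared singular values of B, up to floating-point error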

The matrix norm

  • There are many ways we can describe the “length” of a matrix; each gives prominence to different features, though all are algebraically equivalent up to constant rescaling factors.

  • One particularly useful type of norm is known as the Frobenius norm of a matrix, and arises naturally due to the previous decomposition.

  • We note that \( \mathbf{A}^\mathrm{T} \mathbf{A} \) has p non-negative eigenvalues and that therefore,

    \[ \mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right) = \sum_{i=1}^p \lambda_i \] can be computed directly by solving for the eigenvalues.

  • Particularly, this gives a kind of weighted measure of the expansion and contraction under the map \( \mathbf{A}^\mathrm{T}\mathbf{A} \).

  • We define the Frobenius norm, an actual mathematical matrix “distance”, as

    \[ \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)} \]
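  • As a minimal sketch, the Frobenius norm of my_matrix can be computed in several equivalent ways, all of which agree up to floating-point error:
norm(my_matrix, type="F")                     # the built-in Frobenius norm
sqrt(sum(my_matrix^2))                        # square root of the sum of squared entries, i.e., of the trace of t(A) %*% A
sqrt(sum(eigen(crossprod(my_matrix))$values)) # square root of the sum of the eigenvalues of t(A) %*% A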

The Frobenius norm

  • The Frobenius norm,

    \[ \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)} \]

    is a particularly useful distance to understand as it relates to the singular value decomposition / principal component analysis.

    • This is also the choice of norm that arises from the inner product of matrices defined in terms of,

    \[ \mathrm{tr}\left(\mathbf{B}^\mathrm{T} \mathbf{A}\right) \] for \( \mathbf{A},\mathbf{B}\in\mathbb{R}^{n\times p} \).

  • This then has an interpretation very similar to the Euclidean norm for a vector, but extended to matrices.
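  • A minimal sketch of the inner product \( \mathrm{tr}\left(\mathbf{B}^\mathrm{T} \mathbf{A}\right) \) above, for two hypothetical example matrices A1 and B1, computed two equivalent ways:
A1 <- matrix(rnorm(12), nrow=4, ncol=3) # hypothetical example matrices in R^(4 x 3)
B1 <- matrix(rnorm(12), nrow=4, ncol=3)
sum(diag(crossprod(B1, A1))) # tr(t(B1) %*% A1)
sum(A1 * B1)                 # equivalently, the sum of the element-wise products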

The matrix norm

  • In R, a matrix norm is computed with the norm() function.

  • There are several different choices available; whichever you use, you should understand what is special about that choice of norm.

  • We can also use the default choice that R provides, the one norm, which measures the size of the matrix by the maximum absolute column sum:

    \[ \parallel \mathbf{A} \parallel_1 = \max_{j=1, \cdots, p} \sum_{i=1}^n \vert a_{i,j}\vert \]

The matrix norm

  • For a generic matrix, we can see the differences in the length estimation as follows:
A <- matrix(c(1,2), nrow=2, ncol=2, byrow=TRUE)
A
     [,1] [,2]
[1,]    1    2
[2,]    1    2
norm(A) # default norm
[1] 4
norm(A, type="f") # frobenius norm
[1] 3.162278
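  • As a minimal sketch, we can confirm that the default value above matches the maximum absolute column sum:
max(colSums(abs(A))) # the column sums are 2 and 4, so the one norm is 4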

The matrix norm

  • Note the eigenvalue decomposition of \( \mathbf{A}^\mathrm{T}\mathbf{A} \):
eigen(t(A) %*% A)
eigen() decomposition
$values
[1] 10  0

$vectors
          [,1]       [,2]
[1,] 0.4472136 -0.8944272
[2,] 0.8944272  0.4472136
  • and therefore we find \( \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr}\left(\mathbf{A}^\mathrm{T} \mathbf{A}\right)}= \sqrt{10 + 0} = \)
sqrt(10)
[1] 3.162278
norm(A, type="f")
[1] 3.162278

A summary of main ideas

  • The main ideas to take away from this introduction to matrix algebra are the following:
    1. Matrix multiplication is a general case of vector multiplication of rows versus columns. This is fundamentally different from element-wise multiplication, which does not carry the same mathematical / geometric meaning as the inner product or matrix product. Particularly, \[ \begin{align} \mathbf{a}^\mathrm{T}\mathbf{b} = \parallel \mathbf{a} \parallel \parallel \mathbf{b} \parallel \cos\left(\theta\right), \end{align} \] so that this product is a scalar value that depends on the lengths and angle between the vectors (a small numerical sketch follows this list).
    2. The general matrix product is defined as, \[ \begin{align} \mathbf{N} \mathbf{M} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_1^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_1^\mathrm{T} \mathbf{m}_M \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{n}_N^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_N^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_N^\mathrm{T} \mathbf{m}_M \end{pmatrix} \in \mathbb{R}^{N\times M} \end{align} \]
    3. As an algebra system, matrices only have inverses under certain conditions:
      The matrix \( \mathbf{A} \) has an inverse \( \Leftrightarrow \) \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \) \( \Leftrightarrow \) \( \mathbf{A} \) has no linear dependence between its columns \( \Leftrightarrow \) the matrix \( \mathbf{A} \) has no zero eigenvalues \( \Leftrightarrow \) the linear inverse problem \( \mathbf{A}\mathbf{x} = \mathbf{b} \) has a unique solution for \( \mathbf{x} \).
    4. Eigenvalues and eigenvectors describe special coordinate systems in which the effect of the matrix \( \mathbf{A} \) can be described as diagonal or close-to-diagonal, but such coordinate systems may be complex valued.
    5. These also provide a useful way to describe the length of the matrix in terms of the Frobenius norm, but multiple choices of norm exist with different interpretations of length.
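  • As a small numerical sketch of the angle relationship in point 1, using two hypothetical vectors a and b:
a <- c(1, 0) # hypothetical example vectors
b <- c(1, 1)
inner <- sum(a * b)                                      # the inner product t(a) %*% b as a scalar
theta <- acos(inner / (sqrt(sum(a^2)) * sqrt(sum(b^2)))) # solve for the angle between the vectors
theta * 180 / pi                                         # 45 degrees between a and b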