In R, a matrix inverse can be computed as a special case of a more general problem called a linear inverse problem:
\[ \mathbf{A} \mathbf{x} = \mathbf{b} \]
In the above, we assume that the vector \( \mathbf{x} \) is of interest but unknown.
If there are no linear dependencies among the columns of \( \mathbf{A} \), then \( \mathbf{A} \) defines a unique relationship that maps the unobserved \( \mathbf{x} \) to the observed variables \( \mathbf{b} \).
Being able to invert this relationship, we find
\[ \mathbf{x} = \mathbf{A}^{-1} \mathbf{b}. \]
In R, this procedure is implemented with the solve() function.
If a matrix \( \mathbf{A} \) is supplied alone, this computes the inverse \( \mathbf{A}^{-1} \) if it exists.
We use the solve() function with my_matrix (as generated last time) to demonstrate:
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
solve(my_matrix) %*% my_matrix
[,1] [,2] [,3] [,4]
[1,] 1.000000e+00 1.942890e-16 0.000000e+00 -5.551115e-17
[2,] -2.220446e-16 1.000000e+00 0.000000e+00 0.000000e+00
[3,] -2.220446e-16 1.665335e-16 1.000000e+00 -5.551115e-17
[4,] 0.000000e+00 2.775558e-16 -4.440892e-16 1.000000e+00
my_matrix %*% solve(my_matrix)
[,1] [,2] [,3] [,4]
[1,] 1.000000e+00 1.110223e-16 0.000000e+00 -2.220446e-16
[2,] -2.775558e-17 1.000000e+00 -2.220446e-16 4.996004e-16
[3,] 0.000000e+00 0.000000e+00 1.000000e+00 5.551115e-17
[4,] -5.551115e-17 0.000000e+00 3.330669e-16 1.000000e+00
my_matrix %*% solve(my_matrix, 1:4)
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
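As a minimal sketch of the two ways of calling solve(), the following compares solving the system directly with forming the inverse first. The matrix A_demo and the vector b_demo are hypothetical values chosen only for illustration; they do not appear in the original material.

```r
# Hypothetical 2 x 2 system, chosen only for illustration
A_demo <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)
b_demo <- c(3, 5)

# Solve A x = b directly, without forming the inverse
x_direct <- solve(A_demo, b_demo)

# Form the inverse explicitly, then multiply by b
x_via_inverse <- solve(A_demo) %*% b_demo

# Both routes should agree up to floating point error
all.equal(as.vector(x_via_inverse), x_direct)

# Applying A to the solution should recover b
all.equal(as.vector(A_demo %*% x_direct), b_demo)
```

Supplying the right-hand side to solve() is usually the preferable call when only \( \mathbf{x} \) is needed, since it avoids computing the full inverse.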
Notice the dimensionality in the previous linear inverse problem,
\[ \begin{align} \mathbf{A}\in \mathbb{R}^{n\times n} & & \mathbf{x}\in\mathbb{R}^{n \times 1 } & & \mathbf{b} \in \mathbb{R}^{n\times 1} \end{align} \]
That is to say, \( \mathbf{A} \) takes \( \mathbf{x} \) from where it lies in \( \mathbb{R}^n \) to another vector in \( \mathbb{R}^n \).
Generally, when we consider a square matrix \( \mathbf{A}\in\mathbb{R}^{n\times n} \), the transformation it represents is from the space \( \mathbb{R}^n \) back to itself.
A special notion exists for such transformations, when the transformation only scales the existing values.
Consider the diagonal matrix,
diag(1:3)
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
diag(1:3) %*% c(1, 0, 0)
[,1]
[1,] 1
[2,] 0
[3,] 0
diag(1:3) %*% c(0, 1, 0)
[,1]
[1,] 0
[2,] 2
[3,] 0
diag(1:3) %*% c(0, 0, 1)
[,1]
[1,] 0
[2,] 0
[3,] 3
In each case, the matrix diag(1:3) had the property that it sent the vector back to a re-scaled copy of itself.
This is because each of the vectors is an eigenvector of the matrix, each with a distinct scaling factor.
If a nonzero vector \( \mathbf{x} \) has the property that,
\[ \mathbf{A}\mathbf{x} =\lambda \mathbf{x} \]
then \( \mathbf{x} \) is said to be an eigenvector of \( \mathbf{A} \) associated with the eigenvalue \( \lambda \).
A diagonal matrix is one for which the standard coordinate vectors are themselves eigenvectors, so its entire coordinate system is composed of eigenvectors.
Notice now that,
\[ \begin{align} & \mathbf{A}\mathbf{x} =\lambda \mathbf{x} \\ \Leftrightarrow & \mathbf{A}\mathbf{x} - \lambda \mathbf{x} = 0 \\ \Leftrightarrow & \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \mathbf{x} = 0 \\ \end{align} \]
This means that \( \mathbf{x} \) is an eigenvector of the matrix \( \left(\mathbf{A} - \lambda \mathbf{I}_n\right) \) associated with the zero eigenvalue. A matrix that sends a nonzero vector to zero cannot be inverted, so its determinant must vanish:
\[ \mathrm{det} \left(\mathbf{A} - \lambda \mathbf{I}_n\right) = 0. \]
The above fact is a basic property that allows us to solve for eigenvalues of the matrix \( \mathbf{A} \).
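As a short worked example of this property, consider the diagonal matrix diag(1:3) used earlier. The notation \( \mathrm{diag}(1,2,3) \) is introduced here only for this illustration. Since the determinant of a diagonal matrix is the product of its diagonal entries,

\[ \mathrm{det}\left(\mathrm{diag}(1,2,3) - \lambda \mathbf{I}_3\right) = (1-\lambda)(2-\lambda)(3-\lambda) = 0, \]

which holds exactly when \( \lambda \) is 1, 2 or 3. These are the same scaling factors observed when diag(1:3) was applied to the three coordinate vectors.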
eigen(my_matrix)
eigen() decomposition
$values
[1] -0.312062+1.564758i -0.312062-1.564758i 1.387185+0.000000i
[4] -0.687975+0.000000i
$vectors
[,1] [,2] [,3] [,4]
[1,] -0.3449514+0.0145144i -0.3449514-0.0145144i -0.62691105+0i 0.2303783+0i
[2,] 0.5832284+0.0000000i 0.5832284+0.0000000i -0.44282830+0i 0.6601861+0i
[3,] 0.2216425+0.4402345i 0.2216425-0.4402345i -0.63480339+0i 0.3408087+0i
[4,] -0.2440101+0.4880265i -0.2440101-0.4880265i -0.08893981+0i 0.6284342+0i
Notice that some of the eigenvalues are actually complex numbers, and the corresponding eigenvectors are complex vectors.
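The defining relation \( \mathbf{A}\mathbf{x} = \lambda\mathbf{x} \) can be checked numerically even when the eigenpair is complex. The sketch below regenerates my_matrix with the same seed and measures how far the first eigenpair is from satisfying the relation; the names e, v, lambda and residual are introduced here only for illustration.

```r
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow = 4, ncol = 4)

e <- eigen(my_matrix)
v <- e$vectors[, 1]      # first eigenvector (complex)
lambda <- e$values[1]    # matching eigenvalue (complex)

# Residual of A v - lambda v; should be zero up to rounding
residual <- my_matrix %*% v - lambda * v

# Largest modulus among the residual entries
max(Mod(residual))
```

The largest modulus should be on the order of machine precision, in the same way that the off-diagonal entries of solve(my_matrix) %*% my_matrix were not exactly zero.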
A <- matrix(1:16, nrow=4, ncol=4)
eigen(diag(1:3))
eigen() decomposition
$values
[1] 3 2 1
$vectors
[,1] [,2] [,3]
[1,] 0 0 1
[2,] 0 1 0
[3,] 1 0 0
eigen(A)
eigen() decomposition
$values
[1] 3.620937e+01 -2.209373e+00 -9.072325e-16 7.166935e-16
$vectors
[,1] [,2] [,3] [,4]
[1,] 0.4140028 0.82289268 -0.5477226 -0.06211969
[2,] 0.4688206 0.42193991 0.7302967 0.48844043
[3,] 0.5236384 0.02098714 0.1825742 -0.79052178
[4,] 0.5784562 -0.37996563 -0.3651484 0.36420104
If we multiply the matrix A with one of its (numerically) zero eigenvectors, we get
A %*% eigen(A)$vectors[,3]
[,1]
[1,] -8.881784e-16
[2,] 0.000000e+00
[3,] 8.881784e-16
[4,] 1.776357e-15
A %*% eigen(A)$vectors[,4]
[,1]
[1,] -8.881784e-16
[2,] 0.000000e+00
[3,] 8.881784e-16
[4,] 0.000000e+00
A[,1] *eigen(A)$vectors[1,3] + A[,2] *eigen(A)$vectors[2,3] + A[,3] *eigen(A)$vectors[3,3] + A[,4] *eigen(A)$vectors[4,3]
[1] -8.881784e-16 0.000000e+00 8.881784e-16 1.776357e-15
Recall now that the eigenvalues of diag(1:3) are 1, 2 and 3.
Q: Can you tell how these eigenvalues relate to the value of the determinant?
det(diag(1:3))
[1] 6
A: In fact, the determinant of the matrix is equal to the product of the eigenvalues.
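The same relationship can be checked on a matrix with complex eigenvalues. This is a minimal sketch that regenerates my_matrix with the same seed; the name eig_prod is introduced here only for illustration. Complex eigenvalues of a real matrix come in conjugate pairs, so their product should be real up to rounding.

```r
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow = 4, ncol = 4)

# Product of all eigenvalues, computed in complex arithmetic
eig_prod <- prod(eigen(my_matrix)$values)

# Real part of the product, to compare with the determinant
Re(eig_prod)
det(my_matrix)

# Imaginary part should vanish up to rounding
Im(eig_prod)
```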
From the above, we recover a general equivalence:
The matrix \( \mathbf{A} \) has an inverse \( \Leftrightarrow \) \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \) \( \Leftrightarrow \) \( \mathbf{A} \) has no linear dependence between its columns \( \Leftrightarrow \) the matrix \( \mathbf{A} \) has no zero eigenvalues \( \Leftrightarrow \) the linear inverse problem \( \mathbf{A}\mathbf{x} = \mathbf{b} \) has a unique solution for \( \mathbf{x} \).
This shows how the determinant is related to the eigenvalues and the spectrum of the matrix \( \mathbf{A} \).
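The trace is also related to the eigenvalues: for \( \mathbf{A}\in\mathbb{R}^{n\times n} \) with eigenvalues \( \{\lambda_i\}_{i=1}^n \), it is equal to their sum,
\[ \mathrm{tr}\left(\mathbf{A}\right) = \sum_{i=1}^n \lambda_i. \]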
We will use this fact shortly when we discuss the Frobenius norm.
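The relationship between the trace and the eigenvalues, namely that the trace equals the sum of the eigenvalues, can be checked numerically. Base R has no dedicated trace function, so this sketch sums the diagonal with sum(diag(...)); the names A_demo, trace_A and eig_sum are introduced here only for illustration.

```r
# Same 4 x 4 matrix as used with eigen() above
A_demo <- matrix(1:16, nrow = 4, ncol = 4)

# Trace as the sum of the diagonal entries: 1 + 6 + 11 + 16
trace_A <- as.numeric(sum(diag(A_demo)))

# Sum of the eigenvalues returned by eigen()
eig_sum <- sum(eigen(A_demo)$values)

# The two should agree up to floating point error
all.equal(trace_A, Re(eig_sum))
```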
For now, we introduce a basic case of the spectral theorem that is useful for understanding the idea of a matrix norm.
Suppose that we have a matrix \( \mathbf{A} \in \mathbb{R}^{n\times p} \) where we assume that \( p \leq n \).
We will define a square product of this matrix with itself so that the dimensionality makes sense, and so that it is in the smallest dimension \( p\times p \) as,
\[ \mathbf{A}^\mathrm{T} \mathbf{A} \in \mathbb{R}^{p\times p}. \]
The spectral theorem guarantees that this matrix can be transformed into a diagonal matrix by an appropriate real-valued change of coordinates, and that the resulting diagonal contains only non-negative values.
Particularly, the \( p \) non-negative values on the diagonal are the eigenvalues \( \{\lambda_i\}_{i=1}^p \) of \( \mathbf{A}^\mathrm{T}\mathbf{A} \) (equivalently, the squared singular values of \( \mathbf{A} \)).
This decomposition into such a coordinate system is possible because the product is symmetric under transpose:
\[ \left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)^\mathrm{T} = \mathbf{A}^\mathrm{T} \left(\mathbf{A}^\mathrm{T}\right)^\mathrm{T} =\mathbf{A}^\mathrm{T} \mathbf{A} \]
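The eigenvalues must be non-negative because the product acts like the square of a scalar: if \( \mathbf{x} \) is an eigenvector with eigenvalue \( \lambda \), then \( \lambda \mathbf{x}^\mathrm{T}\mathbf{x} = \mathbf{x}^\mathrm{T}\mathbf{A}^\mathrm{T}\mathbf{A}\mathbf{x} = \left(\mathbf{A}\mathbf{x}\right)^\mathrm{T}\left(\mathbf{A}\mathbf{x}\right) \geq 0 \), which forces \( \lambda \geq 0 \).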
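Both claims, the non-negativity of the eigenvalues and their agreement with the squared singular values, can be illustrated with a small numerical sketch. The matrix X below is a hypothetical random \( 5 \times 3 \) example (so \( n = 5 \) and \( p = 3 \)), and the seed is arbitrary; neither comes from the original material.

```r
set.seed(1)                                  # arbitrary seed for this sketch
X <- matrix(rnorm(15), nrow = 5, ncol = 3)   # hypothetical 5 x 3 matrix

# crossprod(X) computes t(X) %*% X, a symmetric 3 x 3 matrix
gram <- crossprod(X)

# Eigenvalues of the symmetric product, in decreasing order
gram_eigs <- eigen(gram, symmetric = TRUE)$values

# Singular values of X, also in decreasing order
sing_vals <- svd(X)$d

# All eigenvalues are non-negative
all(gram_eigs >= 0)

# Eigenvalues match the squared singular values
all.equal(gram_eigs, sing_vals^2)
```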
There are many ways to describe the “length” of a matrix; each gives different features more prominence, but all are equivalent up to rescaling.
One particularly useful type of norm is known as the Frobenius norm of a matrix, and arises naturally due to the previous decomposition.
We note that \( \mathbf{A}^\mathrm{T} \mathbf{A} \) has \( p \) non-negative eigenvalues and that therefore its trace,
\[ \mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right) = \sum_{i=1}^p \lambda_i, \]
can be computed directly by solving for the eigenvalues.
Particularly, this gives a kind of weighted measure of the expansion and contraction under the map \( \mathbf{A}^\mathrm{T}\mathbf{A} \).
We define the Frobenius norm, a genuine mathematical “distance” for matrices, as
\[ \parallel \mathbf{A}\parallel_F = \sqrt{\mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)} \]
The Frobenius norm is a particularly useful distance to understand, as it relates to the singular value decomposition / principal component analysis.
Specifically, the Frobenius norm is the norm induced by the inner product of two matrices,
\[ \mathrm{tr}\left(\mathbf{B}^\mathrm{T} \mathbf{A}\right) \]
for \( \mathbf{A},\mathbf{B}\in\mathbb{R}^{n\times p} \).
This has an interpretation very similar to that of the Euclidean norm for a vector, but extended to matrices.
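To make the comparison with the Euclidean norm explicit, the trace can be expanded entry by entry. Writing \( a_{i,j} \) for the entries of \( \mathbf{A}\in\mathbb{R}^{n\times p} \), the \( j \)-th diagonal entry of \( \mathbf{A}^\mathrm{T}\mathbf{A} \) is the sum of squares of column \( j \), so that

\[ \parallel \mathbf{A}\parallel_F^2 = \mathrm{tr}\left(\mathbf{A}^\mathrm{T}\mathbf{A}\right) = \sum_{j=1}^p \left(\mathbf{A}^\mathrm{T}\mathbf{A}\right)_{j,j} = \sum_{j=1}^p \sum_{i=1}^n a_{i,j}^2. \]

In other words, the Frobenius norm is the Euclidean norm of the vector obtained by stacking all entries of \( \mathbf{A} \).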
In R, a matrix norm is computed with the norm() function.
There are several different choices of norm; whichever you choose, you should understand what is special about that choice.
We can also use the default choice that R provides, which measures the size of the matrix by the largest of its absolute column sums:
\[ \parallel \mathbf{A} \parallel_1 = \max_{j=1, \cdots, p} \sum_{i=1}^n \vert a_{i,j}\vert \]
A <- matrix(c(1,2), nrow=2, ncol=2, byrow=TRUE)
A
[,1] [,2]
[1,] 1 2
[2,] 1 2
norm(A) # default norm
[1] 4
norm(A, type="f") # frobenius norm
[1] 3.162278
eigen(t(A) %*% A)
eigen() decomposition
$values
[1] 10 0
$vectors
[,1] [,2]
[1,] 0.4472136 -0.8944272
[2,] 0.8944272 0.4472136
sqrt(10)
[1] 3.162278
norm(A, type="f")
[1] 3.162278
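The values returned by norm() can also be reproduced by hand from the definitions. This is a minimal sketch using the same entries as the matrix above, stored under the hypothetical name A_demo; the other variable names are likewise introduced only for illustration.

```r
# Same entries as the 2 x 2 matrix above, written out in full
A_demo <- matrix(c(1, 2, 1, 2), nrow = 2, ncol = 2, byrow = TRUE)

# Frobenius norm from the trace of t(A) %*% A
frob_from_trace <- sqrt(sum(diag(crossprod(A_demo))))

# Frobenius norm as the Euclidean norm of all entries
frob_from_entries <- sqrt(sum(A_demo^2))

# Default (one) norm as the largest absolute column sum
one_norm_by_hand <- max(colSums(abs(A_demo)))

# Compare each hand computation with norm()
all.equal(frob_from_trace, norm(A_demo, type = "F"))
all.equal(frob_from_entries, norm(A_demo, type = "F"))
all.equal(one_norm_by_hand, norm(A_demo))
```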