
- The following topics will be covered in this lecture:
- Basic vector properties
- Vector inner products
- Basic matrix properties
- Matrix multiplication and invariants

Matrix algebra is a fundamental concept for applying and understanding statistical methods in more than one variable.

- This is at the basis of **formulating multivariate distributions and random vectors, as well as their analysis**.

We will start by recalling a few basic ideas about vectors and their properties, such as the **inner product and the norm**. Then we will introduce the basic characteristics of matrices, their operations, and their implementation in R.

Thereafter, other operations, such as the **inverse and the solution to linear equations**, will be introduced. Finally, we will introduce some basic properties of **norms and the matrix spectrum**, with an emphasis on certain classes of matrices.

- We have had a lot of practice using vectors generally, such as with the colon (sequence) operator

```
1:10
```

```
[1] 1 2 3 4 5 6 7 8 9 10
```

We should introduce some basic mathematical properties of vectors and their analysis.

Suppose we have two vectors

\[ \begin{align} \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]

We can perform basic mathematical operations on these element-wise as follows

\[ \begin{align} \mathbf{a} + \mathbf{b} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{pmatrix} & & \mathbf{a}*\mathbf{b} = \begin{pmatrix} a_1 * b_1 \\ a_2 * b_2 \\ a_3 * b_3 \end{pmatrix} \end{align} \]

Both of these operations generalize to vectors of arbitrary length.

- However, the **above multiplication rule is rarely used in practice**, as it is not as meaningful as the scalar-valued product considered next.

Notice that the two previously defined vectors \( \mathbf{a} \) and \( \mathbf{b} \) were defined as column vectors

\[ \begin{align} \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]

The **transpose of \( \mathbf{a} \)** is **defined as the row vector**,

\[ \begin{align} \mathbf{a}^\mathrm{T} = \begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix} \in \mathbb{R}^{1 \times 3} \end{align} \]
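As a quick illustration (a minimal sketch), R's `t()` function computes the transpose; applied to a plain vector, it returns a \( 1 \times 3 \) row matrix:

```
a <- 1:3

# t() transposes its argument; a plain vector is treated as a column,
# so the result is a 1x3 row matrix
t(a)
```

```
     [,1] [,2] [,3]
[1,]    1    2    3
```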

The **standard vector inner product** is defined for the vectors \( \mathbf{a} \) and \( \mathbf{b} \) as follows,

\[ \begin{align} \mathbf{a}^\mathrm{T} \mathbf{b} = a_1 * b_1 + a_2 * b_2 + a_3 * b_3 \end{align} \]

That is, we **take each row element from \( \mathbf{a}^\mathrm{T} \)**, **multiply it by the corresponding column element of \( \mathbf{b} \)**, and **take the sum of these products**. This generalizes to vectors of arbitrary length \( n \) as,

\[ \begin{align} \mathbf{a}^\mathrm{T} \mathbf{b} = \sum_{i=1}^n a_i * b_i \end{align} \]

This rule also defines the general form of matrix multiplication that we will consider shortly.

Let's consider a quick example of the inner product of two vectors.

- We will write this manually in the form of the equation we derived earlier, first in terms of the element-wise product

```
a <- 1:3
b <- 4:6
a * b
```

```
[1] 4 10 18
```

- Taking the sum, we obtain the inner product

```
sum(a*b)
```

```
[1] 32
```
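- Equivalently, the matrix product operator `%*%` computes the inner product directly, returning the result as a \( 1 \times 1 \) matrix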

```
t(a)%*%b
```

```
[,1]
[1,] 32
```

Mathematically, the standard inner product can be described as follows,

\[ \begin{align} \mathbf{a}^\mathrm{T}\mathbf{b} = \parallel \mathbf{a} \parallel * \parallel \mathbf{b} \parallel \cos\left(\theta\right), \end{align} \]

where,

\( \parallel \mathbf{a}\parallel \) refers to the Euclidean length of the vector, defined as \( \parallel \mathbf{a}\parallel^2 =\mathbf{a}^\mathrm{T}\mathbf{a} \); and

\( \theta \) is the angle formed by the two vectors \( \mathbf{a} \) and \( \mathbf{b} \) at the origin \( \boldsymbol{0} \).
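As a minimal sketch of this identity, reusing the vectors `a` and `b` from the earlier example, we might compute the norms and recover the angle in R as follows:

```
# Euclidean norms via the inner product, ||a||^2 = a^T a
norm_a <- sqrt(sum(a * a))
norm_b <- sqrt(sum(b * b))

# recover the angle between a and b from the inner product identity
theta <- acos(sum(a * b) / (norm_a * norm_b))
theta  # approximately 0.226 radians
```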

There are other ways to define the length of a vector that do not use the inner product as above, but we will be more interested in these ideas in the case of matrices.

The above standard inner product is also a special case of **general matrix multiplication**.

- We have developed a basic use of matrices in R already, often encoding data into a matrix or a dataframe format,

```
require("faraway")
gala_mat <- as.matrix(gala)
gala_mat
```

```
Species Endemics Area Elevation Nearest Scruz Adjacent
Baltra 58 23 25.09 346 0.6 0.6 1.84
Bartolome 31 21 1.24 109 0.6 26.3 572.33
Caldwell 3 3 0.21 114 2.8 58.7 0.78
Champion 25 9 0.10 46 1.9 47.4 0.18
Coamano 2 1 0.05 77 1.9 1.9 903.82
Daphne.Major 18 11 0.34 119 8.0 8.0 1.84
Daphne.Minor 24 0 0.08 93 6.0 12.0 0.34
Darwin 10 7 2.33 168 34.1 290.2 2.85
Eden 8 4 0.03 71 0.4 0.4 17.95
Enderby 2 2 0.18 112 2.6 50.2 0.10
Espanola 97 26 58.27 198 1.1 88.3 0.57
Fernandina 93 35 634.49 1494 4.3 95.3 4669.32
Gardner1 58 17 0.57 49 1.1 93.1 58.27
Gardner2 5 4 0.78 227 4.6 62.2 0.21
Genovesa 40 19 17.35 76 47.4 92.2 129.49
Isabela 347 89 4669.32 1707 0.7 28.1 634.49
Marchena 51 23 129.49 343 29.1 85.9 59.56
Onslow 2 2 0.01 25 3.3 45.9 0.10
Pinta 104 37 59.56 777 29.1 119.6 129.49
Pinzon 108 33 17.95 458 10.7 10.7 0.03
Las.Plazas 12 9 0.23 94 0.5 0.6 25.09
Rabida 70 30 4.89 367 4.4 24.4 572.33
SanCristobal 280 65 551.62 716 45.2 66.6 0.57
SanSalvador 237 81 572.33 906 0.2 19.8 4.89
SantaCruz 444 95 903.82 864 0.6 0.0 0.52
SantaFe 62 28 24.08 259 16.5 16.5 0.52
SantaMaria 285 73 170.92 640 2.6 49.2 0.10
Seymour 44 16 1.84 147 0.6 9.6 25.09
Tortuga 16 8 1.24 186 6.8 50.9 17.95
Wolf 21 12 2.85 253 34.1 254.7 2.33
```

There are several special matrices that are frequently encountered in practical and theoretical work.

Diagonal matrices are special matrices where all off-diagonal elements are equal to 0;

- i.e., the matrix \( \mathbf{A}\in\mathbb{R}^{n\times p} \) is a diagonal matrix if \( a_{ij} = 0 \) for all \( i\neq j \).

The function `diag()` extracts the main diagonal of a matrix in R,

```
dim(gala_mat)
```

```
[1] 30 7
```

```
diag(gala_mat)
```

```
[1] 58.00 21.00 0.21 46.00 1.90 8.00 0.34
```

```
for (i in 1:7) {print(gala_mat[i,i])}
```

```
[1] 58
[1] 21
[1] 0.21
[1] 46
[1] 1.9
[1] 8
[1] 0.34
```

- We can also use the `diag()` function to produce a diagonal matrix,

```
diag(3)
```

```
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
```

```
diag(2,3)
```

```
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 2 0
[3,] 0 0 2
```

```
diag(1:3)
```

```
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
```

Diagonal matrices have the benefit that their arithmetic behaves like regular scalar algebra.

- Particularly, operations can be considered element-wise on the diagonal

```
diag(1:3) - diag(2,3)
```

```
[,1] [,2] [,3]
[1,] -1 0 0
[2,] 0 0 0
[3,] 0 0 1
```

```
diag(1:3) * diag(2,3)
```

```
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 4 0
[3,] 0 0 6
```

```
diag(1:3) %*% diag(2,3)
```

```
[,1] [,2] [,3]
[1,] 2 0 0
[2,] 0 4 0
[3,] 0 0 6
```

Note that in the R language `*` corresponds to element-wise multiplication.

Define the two matrices,

\[ \begin{align} A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & a_9 \end{pmatrix} & & B = \begin{pmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \\ b_7 & b_8 & b_9 \end{pmatrix} \end{align} \]

Their element-wise product is then,

\[ \begin{align} A * B = \begin{pmatrix} a_1 * b_1 & a_2 * b_2 & a_3 * b_3 \\ a_4 * b_4 & a_5 * b_5 & a_6 * b_6 \\ a_7 * b_7 & a_8 * b_8 & a_9 * b_9 \end{pmatrix} \end{align} \]
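As a minimal sketch in R, with two small hypothetical matrices:

```
# element-wise product of two 3x3 matrices with `*`
# (this is not the matrix product %*%)
A <- matrix(1:9, nrow=3, byrow=TRUE)
B <- matrix(10:18, nrow=3, byrow=TRUE)
A * B
```

```
     [,1] [,2] [,3]
[1,]   10   22   36
[2,]   52   70   90
[3,]  112  136  162
```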

Note that this is not the same as the **matrix product defined by `%*%`** in general.

The above element-wise product is like the element-wise vector product;

- this will only be occasionally considered, unlike the matrix product, defined in terms of the vector inner product, which underpins all linear algebra.

As a simple case that leads to general matrix multiplication, let us consider **matrix-vector multiplication**. Let's suppose that

\[ \begin{align} \mathbf{N} \in \mathbb{R}^{N \times p} & & \mathbf{x} \in \mathbb{R}^{p \times 1} \end{align} \]

We will suppose that we can write the following

\[ \mathbf{N} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \\ \vdots \\ \mathbf{n}_N^\mathrm{T} \end{pmatrix} \] where each \( \mathbf{n}_i^\mathrm{T} \in \mathbb{R}^{1 \times p} \) is a row of the matrix \( \mathbf{N} \).

Then, recall the vector inner product, \[ \mathbf{n}_i^\mathrm{T} \mathbf{x} = \sum_{j=1}^p n_{i,j} * x_{j} \]

The product of the matrix and the vector is given as, \[ \mathbf{N}\mathbf{x} = \begin{pmatrix} \mathbf{n}^\mathrm{T}_1 \mathbf{x} \\ \vdots \\ \mathbf{n}_N^\mathrm{T} \mathbf{x} \end{pmatrix} \]

This type of multiplication is commonly known as **row-versus-column multiplication**. Particularly, for a simple matrix and vector pair,

\[ \begin{align} \mathbf{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4\end{pmatrix} & & \mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \end{align} \]

we recover the product

\[ \begin{align} \mathbf{A}\mathbf{x} = \begin{pmatrix} 1 * 1 + 2 * 1 \\ 3 * 1 + 4 * 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix} \end{align} \]
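We can verify this small computation directly in R (a minimal sketch; note that `matrix()` fills its entries column by column by default):

```
# define the 2x2 matrix A and the vector of ones x
A <- matrix(c(1, 3, 2, 4), nrow=2, ncol=2)
x <- c(1, 1)

# row-versus-column multiplication
A %*% x
```

```
     [,1]
[1,]    3
[2,]    7
```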

- For two general matrices, \[ \begin{align} \mathbf{N} \in \mathbb{R}^{N \times p} & & \mathbf{M} \in \mathbb{R}^{p \times M} \end{align} \]
- Let us write these matrices in terms of sub-vectors as, \[ \begin{align} \mathbf{N} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \\ \vdots \\ \mathbf{n}_N^\mathrm{T} \end{pmatrix} & & \mathbf{M} = \begin{pmatrix} \mathbf{m}_1 & \cdots & \mathbf{m}_M\end{pmatrix} \end{align} \] where
- \( \mathbf{n}_i^\mathrm{T} \in \mathbb{R}^{1 \times p} \) is the \( i \)-th row vector in the matrix \( \mathbf{N} \in \mathbb{R}^{N\times p} \); and
- \( \mathbf{m}_i \in \mathbb{R}^{p\times 1} \) is the \( i \)-th column vector in the matrix \( \mathbf{M}\in \mathbb{R}^{p \times M} \)
- We have their product defined as, \[ \begin{align} \mathbf{N} \mathbf{M} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_1^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_1^\mathrm{T} \mathbf{m}_M \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{n}_N^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_N^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_N^\mathrm{T} \mathbf{m}_M \end{pmatrix} \in \mathbb{R}^{N\times M} \end{align} \]
- Notice that for the product to make sense, **the dimensionality has to match between the \( p \) columns in \( \mathbf{N} \) and the \( p \) rows in \( \mathbf{M} \)**.
- This **inner dimension \( p \) is eliminated in the product** of the matrices to form the final dimensionality of \( N \times M \), as in the sketch below.
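As a minimal sketch of the dimension matching, consider hypothetical matrices with \( N = 2 \), \( p = 3 \), and \( M = 2 \):

```
# a 2x3 matrix times a 3x2 matrix: the inner dimension p = 3 must match
# and is eliminated, leaving a 2x2 result
N <- matrix(1:6, nrow=2, ncol=3)
M <- matrix(1:6, nrow=3, ncol=2)
dim(N %*% M)
```

```
[1] 2 2
```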

Notice that given our previous definitions, this means that **matrix multiplication and element-wise multiplication are equivalent for diagonal matrices**.

- Particularly, **all the operations reduce to elements on the diagonal**, as all off-diagonal terms vanish.

**Not every matrix can be reduced to a diagonal matrix**, but **many can be reduced to almost-diagonal or other useful forms** under various coordinate transformations.

Matrices can be considered a **coordinate representation of a general, linear transformation**;

- the **choice of coordinates will affect how the linear transformation is represented as a matrix**.

A property of the linear transformation that does not depend on the choice of coordinates is called an **invariant of the matrix under coordinate transformation**.

The most basic invariant we can consider is the **trace of a matrix**. For an arbitrary matrix of size \( n\times p \) with entries \( \mathbf{A}_{ij} = a_{ij} \), we define this as,

\[ \mathrm{tr}\left(\mathbf{A}\right) = \sum_{i=1}^{\mathrm{min}(n,p)} a_{ii} \]

i.e., **the sum of all diagonal elements**.

**Q:** suppose we randomly generate a matrix as follows:

```
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
my_matrix
```

```
[,1] [,2] [,3] [,4]
[1,] 1.2629543 0.4146414 -0.005767173 -1.1476570
[2,] -0.3262334 -1.5399500 2.404653389 -0.2894616
[3,] 1.3297993 -0.9285670 0.763593461 -0.2992151
[4,] 1.2724293 -0.2947204 -0.799009249 -0.4115108
```

- How can we find the trace of this matrix in R?

**A:** we can use the `diag` and `sum` functions to obtain

```
sum(diag(my_matrix))
```

```
[1] 0.07508687
```

The trace is relatively easy to compute, but its meaning can take a while to appreciate.

- We will see one particularly useful form of this in the Frobenius norm of a matrix later on.

**Determinants**, on the other hand, are **more difficult to compute, but give a very basic tool for understanding matrices**. Only in the special case of a \( 2\times 2 \) matrix is the determinant easy to compute,

\[ \begin{align} \mathbf{A} = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} & & \mathrm{det}\left(\mathbf{A}\right) = a_1*a_4 - a_2*a_3 \end{align} \]
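As a quick sketch, we can check this formula against R's built-in `det()` function:

```
A <- matrix(c(1, 3, 2, 4), nrow=2, ncol=2)

# the formula a_1 * a_4 - a_2 * a_3, computed by hand...
A[1,1] * A[2,2] - A[1,2] * A[2,1]

# ...agrees with the built-in determinant
det(A)
```

```
[1] -2
[1] -2
```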

We will suppress the general calculation of determinants and instead focus on one of the most useful properties for matrix analysis.

- We will briefly discuss why this is the case later as we introduce eigenvalues and matrix spectra.

The determinant in practice is often used to check the invertibility of a matrix \( \mathbf{A} \).

If \( \mathbf{A} \) is a square matrix, i.e., \( \mathbf{A} \in \mathbb{R}^{n\times n} \), then its **inverse is defined** by,

\[ \mathbf{A}^{-1} \mathbf{A} = \mathbf{A} \mathbf{A}^{-1} = \mathbf{I}_n \]

**if it actually even exists**. The following useful property can be used to determine if a matrix is invertible.

The matrix \( \mathbf{A} \) is invertible if and only if \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \); i.e., the inverse \( \mathbf{A}^{-1} \) exists exactly when \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \), and when \( \mathrm{det}\left(\mathbf{A}\right) = 0 \) there is no such inverse.

- We note that the **determinant and the inverse of a matrix \( \mathbf{A} \) can only be computed** when \( \mathbf{A} \) is **square in its dimensions**.

**Q:** suppose we randomly generate a matrix as follows:

```
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
my_matrix
```

```
[,1] [,2] [,3] [,4]
[1,] 1.2629543 0.4146414 -0.005767173 -1.1476570
[2,] -0.3262334 -1.5399500 2.404653389 -0.2894616
[3,] 1.3297993 -0.9285670 0.763593461 -0.2992151
[4,] 1.2724293 -0.2947204 -0.799009249 -0.4115108
```

- Then suppose we compute the determinant as,

```
det(my_matrix)
```

```
[1] -2.429628
```

Can we say the inverse exists?

**A:** the determinant is non-zero, so an inverse exists.
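As a follow-up sketch, we can compute this inverse with R's `solve()` function and verify the defining property of the inverse:

```
# solve() with a single matrix argument returns the inverse
my_inverse <- solve(my_matrix)

# the product recovers the 4x4 identity, up to floating-point rounding
round(my_inverse %*% my_matrix, 10)
```

```
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1
```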

- Now suppose that we define the following matrix,

```
A <- matrix(1:16, nrow=4, ncol=4)
A
```

```
[,1] [,2] [,3] [,4]
[1,] 1 5 9 13
[2,] 2 6 10 14
[3,] 3 7 11 15
[4,] 4 8 12 16
```

```
det(A)
```

```
[1] 0
```

- This matrix clearly does not have an inverse, but the reason why may not be obvious.

- Consider,

```
(A[,2] - A[,1])
```

```
[1] 4 4 4 4
```

```
(A[,3] - A[,4])
```

```
[1] -4 -4 -4 -4
```

```
(A[,2] - A[,1]) + (A[,3] - A[,4])
```

```
[1] 0 0 0 0
```

This says that there is a direct linear dependence between the columns of the matrix: the zero vector can be written as a non-trivial linear combination of the columns.

The maximal number of linearly independent columns (the rank of the matrix) can be computed as follows:

```
qr(A)$rank
```

```
[1] 2
```

- Then consider,

```
qr(my_matrix)$rank
```

```
[1] 4
```

Notice that the size of both `my_matrix` and `A` is \( 4\times 4 \), so we can say that `my_matrix` has a full set of linearly independent columns.

The **determinant function thus detects when there is a linear dependence** between the columns, and **gives a zero when there is a dependence**.

**Only square matrices with linearly independent columns (non-zero determinants) have inverses**.
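Consistent with this, attempting to invert the singular matrix `A` halts with an error in R; as a minimal sketch, we can capture the error message with `tryCatch()`:

```
# solve() raises an error for a singular (non-invertible) matrix;
# tryCatch() captures the error message instead of stopping the session
tryCatch(solve(A), error = function(e) conditionMessage(e))
```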