Matrix algebra in R part I


Outline

  • The following topics will be covered in this lecture:
    • Basic vector properties
    • Vector inner products
    • Basic matrix properties
    • Matrix multiplication and invariants

An introduction to matrices

  • Matrix algebra is a fundamental concept for applying and understanding statistical methods in more than one variable.

    • This underlies the formulation of multivariate distributions and random vectors, as well as their analysis.
  • We will start by recalling a few basic ideas about vectors and their properties such as the inner product and the norm.

  • Then we will introduce the basic characteristics of matrices and their operations and their implementation in R.

  • Thereafter, other operations, such as the inverse and the solution of linear equations, will be introduced.

  • Finally, we will introduce some basic properties of norms and the matrix spectrum, with an emphasis on certain classes of matrices.

Basic properties of vectors

  • We have had a lot of practice using vectors already, such as with the colon (sequence) operator
1:10
 [1]  1  2  3  4  5  6  7  8  9 10
  • We should introduce some basic mathematical properties of vectors and their analysis.

  • Suppose we have two vectors

    \[ \begin{align} \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]

  • We can perform basic mathematical operations on these element-wise as follows

    \[ \begin{align} \mathbf{a} + \mathbf{b} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \\ a_3 + b_3 \end{pmatrix} & & \mathbf{a}*\mathbf{b} = \begin{pmatrix} a_1 * b_1 \\ a_2 * b_2 \\ a_3 * b_3 \end{pmatrix} \end{align} \]

  • Both of these operations generalize to vectors of arbitrary length.

    • However, the above multiplication rule is rarely used in practice as it is not as meaningful as the scalar-valued product considered next.
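  • For example, a minimal sketch of these element-wise operations in R, using two arbitrary example vectors, is
a <- c(1, 2, 3)
b <- c(10, 20, 30)
a + b     # element-wise addition
[1] 11 22 33
a * b     # element-wise multiplication
[1] 10 40 90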

Vector inner product

  • Notice that the two previously defined vectors \( \mathbf{a} \) and \( \mathbf{b} \) were defined as column vectors

    \[ \begin{align} \mathbf{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}\in\mathbb{R}^{3 \times 1} & & \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}\in \mathbb{R}^{3\times 1} \end{align} \]

  • The transpose of \( \mathbf{a} \) is defined as the row vector,

    \[ \begin{align} \mathbf{a}^\mathrm{T} = \begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix} \in \mathbb{R}^{1 \times 3} \end{align} \]
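  • In R the transpose is computed with the function t(); note that applying it to a plain vector promotes the vector to a \( 1 \times n \) matrix, as in this small sketch with an example vector
a <- 1:3
t(a)
     [,1] [,2] [,3]
[1,]    1    2    3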

  • The standard vector inner product is defined for the vectors \( \mathbf{a} \) and \( \mathbf{b} \) as follows

    \[ \begin{align} \mathbf{a}^\mathrm{T} \mathbf{b} = a_1 * b_1 + a_2 * b_2 + a_3 * b_3 \end{align} \]

  • That is, we multiply each element of the row vector \( \mathbf{a}^\mathrm{T} \) by the corresponding element of the column vector \( \mathbf{b} \), and take the sum of these products.

  • This generalizes to vectors of arbitrary length \( n \) as,

    \[ \begin{align} \mathbf{a}^\mathrm{T} \mathbf{b} = \sum_{i=1}^n a_i * b_i \end{align} \]

  • This rule also defines the general form of matrix multiplication that we will consider shortly.

Vector inner product example

  • Let's consider a quick example of the inner product of two vectors.

    • We will write this manually in the form of the equation we derived earlier, first in terms of the element-wise product
a <- 1:3
b <- 4:6
a * b
[1]  4 10 18
  • taking the sum, we obtain the inner product
sum(a*b)
[1] 32
  • equivalently, the matrix product operator %*% computes the inner product directly
t(a)%*%b
     [,1]
[1,]   32

Vector inner product continued

  • Mathematically, the standard inner product can be described as follows,

    \[ \begin{align} \mathbf{a}^\mathrm{T}\mathbf{b} = \parallel \mathbf{a} \parallel * \parallel \mathbf{b} \parallel \cos\left(\theta\right), \end{align} \]

  • where,

  1. \( \parallel \mathbf{a}\parallel \) refers to the Euclidean length of the vector, defined as \( \parallel \mathbf{a}\parallel^2 =\mathbf{a}^\mathrm{T}\mathbf{a} \); and

  2. \( \theta \) is the angle formed by the two vectors \( \mathbf{a} \) and \( \mathbf{b} \) at the origin \( \boldsymbol{0} \).

  • There are other ways to define the length of a vector that do not use the inner product as above, but we will be more interested in these ideas in the case of matrices.

  • The above standard inner product is also a special case of general matrix multiplication.
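  • As a minimal numerical check of this identity in R, re-using the example vectors a and b from the previous slide,
a <- 1:3
b <- 4:6
norm_a <- sqrt(sum(a * a))                   # Euclidean length of a
norm_b <- sqrt(sum(b * b))                   # Euclidean length of b
cos_theta <- sum(a * b) / (norm_a * norm_b)  # cosine of the angle between a and b
norm_a * norm_b * cos_theta                  # recovers the inner product
[1] 32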

Basic matrix properties

  • We have developed a basic use of matrices in R already, often encoding data into a matrix or a dataframe format,
require("faraway")
gala_mat <- as.matrix(gala)
gala_mat
             Species Endemics    Area Elevation Nearest Scruz Adjacent
Baltra            58       23   25.09       346     0.6   0.6     1.84
Bartolome         31       21    1.24       109     0.6  26.3   572.33
Caldwell           3        3    0.21       114     2.8  58.7     0.78
Champion          25        9    0.10        46     1.9  47.4     0.18
Coamano            2        1    0.05        77     1.9   1.9   903.82
Daphne.Major      18       11    0.34       119     8.0   8.0     1.84
Daphne.Minor      24        0    0.08        93     6.0  12.0     0.34
Darwin            10        7    2.33       168    34.1 290.2     2.85
Eden               8        4    0.03        71     0.4   0.4    17.95
Enderby            2        2    0.18       112     2.6  50.2     0.10
Espanola          97       26   58.27       198     1.1  88.3     0.57
Fernandina        93       35  634.49      1494     4.3  95.3  4669.32
Gardner1          58       17    0.57        49     1.1  93.1    58.27
Gardner2           5        4    0.78       227     4.6  62.2     0.21
Genovesa          40       19   17.35        76    47.4  92.2   129.49
Isabela          347       89 4669.32      1707     0.7  28.1   634.49
Marchena          51       23  129.49       343    29.1  85.9    59.56
Onslow             2        2    0.01        25     3.3  45.9     0.10
Pinta            104       37   59.56       777    29.1 119.6   129.49
Pinzon           108       33   17.95       458    10.7  10.7     0.03
Las.Plazas        12        9    0.23        94     0.5   0.6    25.09
Rabida            70       30    4.89       367     4.4  24.4   572.33
SanCristobal     280       65  551.62       716    45.2  66.6     0.57
SanSalvador      237       81  572.33       906     0.2  19.8     4.89
SantaCruz        444       95  903.82       864     0.6   0.0     0.52
SantaFe           62       28   24.08       259    16.5  16.5     0.52
SantaMaria       285       73  170.92       640     2.6  49.2     0.10
Seymour           44       16    1.84       147     0.6   9.6    25.09
Tortuga           16        8    1.24       186     6.8  50.9    17.95
Wolf              21       12    2.85       253    34.1 254.7     2.33

Basic matrix properties

  • There are several special matrices that are frequently encountered in practical and theoretical work.

  • Diagonal matrices are special matrices where all off-diagonal elements are equal to 0;

    • i.e., the matrix \( \mathbf{A}\in\mathbb{R}^{n\times p} \) is a diagonal matrix if \( a_{ij} = 0 \) for all \( i\neq j \).
  • The function diag() extracts the main diagonal of a matrix in R,

dim(gala_mat)
[1] 30  7
diag(gala_mat)
[1] 58.00 21.00  0.21 46.00  1.90  8.00  0.34
for (i in 1:7) {print(gala_mat[i,i])}
[1] 58
[1] 21
[1] 0.21
[1] 46
[1] 1.9
[1] 8
[1] 0.34

Basic matrix properties

  • We can also use the diag() function to produce a diagonal matrix,
diag(3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
diag(2,3)
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    2    0
[3,]    0    0    2
diag(1:3)
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3

Basic matrix properties

  • Diagonal matrices have the benefit that their arithmetic behaves like that of regular scalar algebra.

    • Particularly, operations can be considered element-wise on the diagonal
diag(1:3) - diag(2,3)
     [,1] [,2] [,3]
[1,]   -1    0    0
[2,]    0    0    0
[3,]    0    0    1
diag(1:3) * diag(2,3)
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    4    0
[3,]    0    0    6
diag(1:3) %*% diag(2,3)
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    4    0
[3,]    0    0    6

Basic matrix properties

  • Note that in the R language * corresponds to element-wise multiplication.

  • Define the two matrices,

    \[ \begin{align} A = \begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \\ a_7 & a_8 & a_9 \end{pmatrix} & & B = \begin{pmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \\ b_7 & b_8 & b_9 \end{pmatrix} \end{align} \]

  • Their element-wise product is then,

    \[ \begin{align} A * B = \begin{pmatrix} a_1 * b_1 & a_2 * b_2 & a_3 * b_3 \\ a_4 * b_4 & a_5 * b_5 & a_6 * b_6 \\ a_7 * b_7 & a_8 * b_8 & a_9 * b_9 \end{pmatrix} \end{align} \]

  • Note that, in general, this is not the same as the matrix product defined by %*%.

  • The above element-wise product is like the element-wise vector product;

    • this will only be considered occasionally, unlike the matrix product, which is defined in terms of the vector inner product and underpins all of linear algebra.
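  • To see the difference concretely, consider a small sketch in R with two example \( 2\times 2 \) matrices,
A <- matrix(1:4, nrow=2, ncol=2)
B <- matrix(5:8, nrow=2, ncol=2)
A * B     # element-wise product
     [,1] [,2]
[1,]    5   21
[2,]   12   32
A %*% B   # matrix product
     [,1] [,2]
[1,]   23   31
[2,]   34   46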

Matrix / vector multiplication

  • As a simple case that leads to general matrix multiplication, let us consider matrix / vector multiplication.

  • Let's suppose that

    \[ \begin{align} \mathbf{N} \in \mathbb{R}^{N \times p} & & \mathbf{x} \in \mathbb{R}^{p \times 1} \end{align} \]

  • We will suppose that we can write the following

    \[ \mathbf{N} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \\ \vdots \\ \mathbf{n}_N^\mathrm{T} \end{pmatrix} \] where each \( \mathbf{n}_i^\mathrm{T} \in \mathbb{R}^{1 \times p} \) is a row of the matrix \( \mathbf{N} \).

  • Then, recalling the vector inner product, \[ \mathbf{n}_i^\mathrm{T} \mathbf{x} = \sum_{j=1}^p n_{i,j} * x_{j} \]

  • The product of the matrix and the vector is given as, \[ \mathbf{N}\mathbf{x} = \begin{pmatrix} \mathbf{n}^\mathrm{T}_1 \mathbf{x} \\ \vdots \\ \mathbf{n}_N^\mathrm{T} \mathbf{x} \end{pmatrix} \]
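  • A minimal sketch of this in R, assuming a small example \( 3 \times 2 \) matrix, compares the built-in product with the row-by-row inner products,
N <- matrix(1:6, nrow=3, ncol=2)             # example 3 x 2 matrix
x <- c(1, 2)
N %*% x                                      # built-in matrix / vector product
     [,1]
[1,]    9
[2,]   12
[3,]   15
apply(N, 1, function(n_i) sum(n_i * x))      # the same values, computed row by row
[1]  9 12 15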

Matrix multiplication

  • This type of multiplication is commonly known as row-versus-column multiplication.

  • Particularly, for a simple matrix and vector pair,

    \[ \begin{align} \mathbf{A} = \begin{pmatrix} 1 & 2 \\ 3 & 4\end{pmatrix} & & \mathbf{x} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \end{align} \]

  • we recover the product

    \[ \begin{align} \mathbf{A}\mathbf{x} = \begin{pmatrix} 1 * 1 + 2 * 1 \\ 3 * 1 + 4 * 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix} \end{align} \]
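  • We can verify this small example directly in R,
A <- matrix(c(1, 2,
              3, 4), nrow=2, byrow=TRUE)
x <- c(1, 1)
A %*% x
     [,1]
[1,]    3
[2,]    7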

Matrix multiplication

  • For two general matrices, \[ \begin{align} \mathbf{N} \in \mathbb{R}^{N \times p} & & \mathbf{M} \in \mathbb{R}^{p \times M} \end{align} \]
  • Let us write these matrices in terms of sub-vectors as, \[ \begin{align} \mathbf{N} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \\ \vdots \\ \mathbf{n}_N^\mathrm{T} \end{pmatrix} & & \mathbf{M} = \begin{pmatrix} \mathbf{m}_1 & \cdots & \mathbf{m}_M\end{pmatrix} \end{align} \] where
    1. \( \mathbf{n}_i^\mathrm{T} \in \mathbb{R}^{1 \times p} \) is the \( i \)-th row vector in the matrix \( \mathbf{N} \in \mathbb{R}^{N\times p} \); and
    2. \( \mathbf{m}_i \in \mathbb{R}^{p\times 1} \) is the \( i \)-th column vector in the matrix \( \mathbf{M}\in \mathbb{R}^{p \times M} \)
  • We have their product defined as, \[ \begin{align} \mathbf{N} \mathbf{M} = \begin{pmatrix} \mathbf{n}_1^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_1^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_1^\mathrm{T} \mathbf{m}_M \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{n}_N^\mathrm{T} \mathbf{m}_1 & \mathbf{n}_N^\mathrm{T} \mathbf{m}_2 & \cdots & \mathbf{n}_N^\mathrm{T} \mathbf{m}_M \end{pmatrix} \in \mathbb{R}^{N\times M} \end{align} \]
  • Notice that for the product to make sense, the dimensionality has to match between the \( p \) columns in \( \mathbf{N} \) and the \( p \) rows in \( \mathbf{M} \).
    • This inner dimension of \( p \) is eliminated in the product of the matrices to form the final dimensionality of \( N \times M \).
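  • For instance, a small sketch in R with example matrices of ones shows how the inner dimension is eliminated,
N <- matrix(1, nrow=2, ncol=3)   # example 2 x 3 matrix
M <- matrix(1, nrow=3, ncol=4)   # example 3 x 4 matrix
dim(N %*% M)                     # the inner dimension p = 3 is eliminated
[1] 2 4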

More notes on diagonal matrices

  • Notice that given our previous definitions, this means that matrix multiplication and element-wise multiplication are equivalent for diagonal matrices.

    • Particularly, all the operations reduce to the elements on the diagonal, as all off-diagonal products vanish.
  • Not every matrix can be reduced to a diagonal matrix, but many can be reduced to almost-diagonal or other useful forms under various coordinate transformations.

  • Matrices can be considered a coordinate representation of a general, linear transformation;

    • the choice of coordinates will affect how the linear transformation is represented as a matrix.
  • A property of the linear transformation that does not depend on the choice of coordinates is called an invariant of the matrix under coordinate transformation.

Basic invariants of matrices

  • The most basic invariant we can consider is the trace of a matrix.

  • For an arbitrary matrix \( \mathbf{A} \in \mathbb{R}^{n \times p} \) with entries \( a_{ij} \), we define this as,

    \[ \mathrm{tr}\left(\mathbf{A}\right) = \sum_{i=1}^{\mathrm{min}(n,p)} a_{ii} \]

    i.e., the sum of all diagonal elements.

  • Q: suppose we randomly generate a matrix as follows:

set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
my_matrix
           [,1]       [,2]         [,3]       [,4]
[1,]  1.2629543  0.4146414 -0.005767173 -1.1476570
[2,] -0.3262334 -1.5399500  2.404653389 -0.2894616
[3,]  1.3297993 -0.9285670  0.763593461 -0.2992151
[4,]  1.2724293 -0.2947204 -0.799009249 -0.4115108
  • How can we find the trace of this matrix in R?

Basic invariants of matrices

  • A: we can use the diag() and sum() functions to obtain
sum(diag(my_matrix))
[1] 0.07508687
  • The trace is relatively easy to compute, but its meaning can take a while to appreciate.

    • We will see one particularly useful form of this in the Frobenius norm of a matrix later on.
  • Determinants, on the other hand, are more difficult to compute, but they provide a very basic tool for understanding matrices.

  • Only in the special case of a \( 2\times 2 \) matrix is a determinant easy to compute,

    \[ \begin{align} \mathbf{A} = \begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix} & & \mathrm{det}\left(\mathbf{A}\right) = a_1*a_4 - a_2*a_3 \end{align} \]
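  • We can check this formula against the built-in det() function with a small example matrix,
A <- matrix(c(1, 2,
              3, 4), nrow=2, byrow=TRUE)
det(A)                                   # built-in determinant
[1] -2
A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]    # the 2 x 2 formula, 1*4 - 2*3
[1] -2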

  • We will omit the general calculation of determinants and instead focus on one of their most useful properties for matrix analysis.

    • We will briefly discuss why this is the case later as we introduce eigenvalues and matrix spectra.

Properties of the determinant

  • The determinant in practice is often used to check the invertibility of a matrix \( \mathbf{A} \).

  • If \( \mathbf{A} \) is a square matrix, i.e., \( \mathbf{A} \in \mathbb{R}^{n\times n} \), then its inverse \( \mathbf{A}^{-1} \) is defined by

    \[ \mathbf{A}^{-1} \mathbf{A} = \mathbf{A} \mathbf{A}^{-1} = \mathbf{I}_n \]

    provided that such an inverse actually exists.

  • The following useful property can be used to determine if a matrix is invertible.

The matrix \( \mathbf{A} \) is invertible if and only if \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \); i.e., the inverse \( \mathbf{A}^{-1} \) exists precisely when \( \mathrm{det}\left(\mathbf{A}\right) \neq 0 \), and no inverse exists when \( \mathrm{det}\left(\mathbf{A}\right) = 0 \).
  • We note that the determinant and the inverse of a matrix \( \mathbf{A} \) are only defined when \( \mathbf{A} \) is square.
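  • In R, the function solve() with a single matrix argument returns the inverse when it exists; a minimal sketch with an example matrix is
A <- matrix(c(2, 1,
              1, 2), nrow=2, byrow=TRUE)
det(A)                   # non-zero, so the inverse exists
[1] 3
A_inv <- solve(A)        # the matrix inverse
round(A %*% A_inv, 10)   # recovers the 2 x 2 identity, up to rounding
     [,1] [,2]
[1,]    1    0
[2,]    0    1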

An example of the determinant

  • Q: suppose we randomly generate a matrix as follows:
set.seed(0)
my_matrix <- matrix(rnorm(16), nrow=4, ncol=4)
my_matrix
           [,1]       [,2]         [,3]       [,4]
[1,]  1.2629543  0.4146414 -0.005767173 -1.1476570
[2,] -0.3262334 -1.5399500  2.404653389 -0.2894616
[3,]  1.3297993 -0.9285670  0.763593461 -0.2992151
[4,]  1.2724293 -0.2947204 -0.799009249 -0.4115108
  • Then suppose we compute the determinant as,
det(my_matrix)
[1] -2.429628
  • can we say the inverse exists?

  • A: the determinant is non-zero so an inverse exists.

An example of the determinant continued

  • Now suppose that we define the following matrix,
A <- matrix(1:16, nrow=4, ncol=4)
A
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
det(A)
[1] 0
  • Since the determinant is zero, this matrix does not have an inverse, but the reason why may not be obvious.

An example of the determinant continued

  • Consider,
(A[,2] - A[,1]) 
[1] 4 4 4 4
(A[,3] - A[,4])
[1] -4 -4 -4 -4
(A[,2] - A[,1]) + (A[,3] - A[,4])
[1] 0 0 0 0
  • This says that there is a direct linear dependence between the columns of the matrix: the zero vector can be written as a nontrivial linear combination of the columns.

  • The maximal number of linearly independent columns (the rank of the matrix) can be computed as follows:

qr(A)$rank
[1] 2
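  • Equivalently, the dependence above means that the matrix A maps a non-zero vector to the zero vector,
A %*% c(-1, 1, 1, -1)   # -col1 + col2 + col3 - col4
     [,1]
[1,]    0
[2,]    0
[3,]    0
[4,]    0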

An example of the determinant continued

  • Then consider,
qr(my_matrix)$rank
[1] 4
  • Notice that my_matrix and A are both of size \( 4\times 4 \), so we can say that my_matrix has a full set of linearly independent columns.

  • The determinant thus detects when there is a linear dependence between the columns, returning zero exactly when such a dependence exists.

  • Only square matrices with linearly independent columns (non-zero determinants) have inverses.