# Introduction to Python Part IV (and a discussion of linear transformations)

## Activity 1: Discussion of linear transformations


* Orthogonality also plays a key role in understanding linear transformations.  How can we understand a linear transformation as a composition of rotations and diagonal matrices?  Two specific matrix factorizations arise this way.  Can you name them and describe the conditions under which each is applicable?

* What is a linear inverse problem?  What conditions guarantee a solution?

* What is a pseudo-inverse?  How is this related to an orthogonal projection?  How is this related to the linear inverse problem?

* What is a weighted norm and what is a weighted pseudo-norm?
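
To help ground the discussion, here is a minimal numerical sketch. It assumes the singular value decomposition is one of the factorizations in question, and uses NumPy's `np.linalg.svd` and `np.linalg.pinv`. The matrix `M` and vector `b` are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x2 matrix with full column rank, chosen only for illustration
M = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 3.0]])

# Singular value decomposition: M = U @ diag(s) @ Vt,
# where U and Vt have orthonormal columns/rows and s holds the singular values
U, s, Vt = np.linalg.svd(M, full_matrices=False)
M_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(M, M_rebuilt))

# Pseudo-inverse applied to a linear inverse problem M x = b
# (3 equations, 2 unknowns, so in general only a least-squares solution exists)
b = np.array([1.0, 2.0, 3.0])
M_pinv = np.linalg.pinv(M)
x_ls = M_pinv @ b

# M @ M_pinv projects orthogonally onto the column space of M:
# it is idempotent and symmetric
P = M @ M_pinv
print(np.allclose(P @ P, P), np.allclose(P, P.T))
```

Comparing `x_ls` with the result of `np.linalg.lstsq(M, b, rcond=None)` is one way to check that the pseudo-inverse yields the least-squares solution in this example.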

## Activity 2: Basic data analysis and manipulation

In [None]:
import numpy as np

### Exercise 1:

Arrays can be concatenated and stacked on top of one another, using NumPy’s `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.


In [None]:
A = np.array([[1,2,3], [4,5,6], [7, 8, 9]])
print('A = ')
print(A)

B = np.hstack([A, A])
print('B = ')
print(B)

C = np.vstack([A, A])
print('C = ')
print(C)

Write some additional code that slices the first and last columns of A, and stacks them into a 3x2 array. Make sure to print the results to verify your solution.

Note that a ‘gotcha’ with array indexing is that indexing with an integer drops the corresponding singleton dimension. That means `A[:, 0]` is a one-dimensional array, which won’t stack as desired. To preserve singleton dimensions, the index itself can be a slice or an array. For example, `A[:, :1]` returns a two-dimensional array with one singleton dimension (i.e. a column vector).

In [None]:
D = np.hstack((A[:, :1], A[:, -1:]))
print('D = ')
print(D)
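
The effect of the indexing ‘gotcha’ can be checked by inspecting shapes. The short sketch below rebuilds `A` so that it runs on its own, and compares integer, slice, and list indexing.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Integer index: the column dimension is dropped
print(A[:, 0].shape)    # (3,)

# Slice or list index: the singleton dimension is kept
print(A[:, :1].shape)   # (3, 1)
print(A[:, [0]].shape)  # (3, 1)

# Stacking the one-dimensional versions concatenates them end to end
flat = np.hstack((A[:, 0], A[:, -1]))
print(flat.shape)       # (6,)

# Stacking the two-dimensional versions gives the desired 3x2 array
D = np.hstack((A[:, :1], A[:, -1:]))
print(D.shape)          # (3, 2)
```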

An alternative way to achieve the same result is to use NumPy’s `np.delete` function to remove the second column of A.  Search the NumPy documentation for `np.delete` to find the syntax for constructing such an array.
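
One possible solution is sketched below; try the exercise yourself before reading it. The name `D_alt` is introduced here only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Remove the column at index 1 (the second column); axis=1 selects columns.
# np.delete returns a new array and leaves A unchanged.
D_alt = np.delete(A, 1, axis=1)
print('D_alt = ')
print(D_alt)
```

Without the `axis` argument, `np.delete` operates on a flattened copy of the array, so specifying `axis=1` matters here.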


### Exercise 2:

The patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let’s find out how to calculate changes in the data contained in an array with NumPy.

The `np.diff` function takes an array and returns the differences between each pair of successive values. Let’s use it to examine the changes each day across the first week of patient 3 (row index 3) from our inflammation dataset.

In [None]:
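# `data` is assumed to be the 2D inflammation array loaded in an earlier part
# of this series (rows are patients, columns are days)
patient3_week1 = data[3, :7]
print(patient3_week1)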

Calling `np.diff(patient3_week1)` performs the following calculations:

`[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]`

and return the 6 difference values in a new array.

In [None]:
np.diff(patient3_week1)

Note that the array of differences is shorter by one element (length 6).
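
The calculation can be reproduced without the dataset. The sketch below assumes the week's values are `0, 0, 2, 0, 4, 2, 2`, which is what the subtractions listed above imply, and checks `np.diff` against an explicit slice-based difference.

```python
import numpy as np

# Values inferred from the subtractions shown above (an assumption, not loaded data)
week = np.array([0., 0., 2., 0., 4., 2., 2.])

changes = np.diff(week)
print(changes)        # differences: 0, 2, -2, 4, -2, 0
print(changes.shape)  # (6,)

# np.diff is equivalent to subtracting each element from its successor
manual = week[1:] - week[:-1]
print(np.array_equal(changes, manual))
```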

When calling `np.diff` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process. When applying `np.diff` to our 2D inflammation array `data`, which axis would we specify?  Take the differences along the appropriate axis and compute a basic summary of the differences using the standard statistics introduced earlier.

If the shape of an individual data file is (60, 40) (60 rows and 40 columns), what is the shape of the array after you run the `np.diff` function and why?
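
The shape question can be explored with a synthetic stand-in for a data file. In the sketch below, `fake_data` is random and purely hypothetical; only its shape of (60, 40) matches the description above.

```python
import numpy as np

# Synthetic stand-in for one data file: 60 patients (rows) by 40 days (columns)
rng = np.random.default_rng(0)
fake_data = rng.integers(0, 20, size=(60, 40))

# Differences between successive days, i.e. along the column axis
daily_change = np.diff(fake_data, axis=1)
print(daily_change.shape)  # (60, 39)

# Basic summary statistics of the day-to-day changes
print(np.mean(daily_change), np.std(daily_change))
print(np.max(daily_change), np.min(daily_change))
```

Each row loses one element because 40 daily values have only 39 gaps between them, while the number of patients is unchanged.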

How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?
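
One way to approach this is sketched below on a small made-up array (three patients, four days). Whether the sign matters depends on the question being asked: `np.max` alone finds the largest increase, while combining it with `np.abs` finds the largest change in either direction.

```python
import numpy as np

# Small hypothetical dataset: rows are patients, columns are days
small = np.array([[0, 1, 3, 2],
                  [0, 2, 1, 1],
                  [1, 5, 0, 0]])

changes = np.diff(small, axis=1)

# Largest increase for each patient
largest_increase = np.max(changes, axis=1)
print(largest_increase)   # [2 2 4]

# Largest change in either direction for each patient
largest_magnitude = np.max(np.abs(changes), axis=1)
print(largest_magnitude)  # [2 2 5]
```

For the third patient the two answers differ, because the biggest change is a drop from 5 to 0.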

## Summary of key points

Some of the key takeaways from this activity are the following:

 * Import a library into a program using `import libraryname`.

 * Use the numpy library to work with arrays in Python.

 * The expression `array.shape` gives the shape of an array.

 * Use `array[x, y]` to select a single element from a 2D array.

 * Array indices start at 0, not 1.

 * Use `low:high` to specify a slice that includes the indices from `low` to `high-1`.

 * Use `# some kind of explanation` to add comments to programs.

 * Use `np.mean(array)`, `np.std(array)`, `np.quantile(array, q)`, `np.max(array)`, and `np.min(array)` to calculate simple statistics.
 
 * Use `sp.mode(array)` to compute the mode of an array.
 
 * Use `np.mean(array, axis=0)` or `np.mean(array, axis=1)` to calculate statistics across the specified axis.
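
Several of these points can be seen together in a short sketch. The array `small` is a made-up example, not part of the inflammation dataset.

```python
import numpy as np

# Hypothetical 2x3 array used only to illustrate the key points
small = np.array([[0., 1., 2.],
                  [3., 4., 5.]])

print(small.shape)       # (2, 3)
print(small[0, 1])       # single element: row 0, column 1
print(small[:, 0:2])     # slice: all rows, columns 0 and 1

col_means = np.mean(small, axis=0)  # one mean per column
row_means = np.mean(small, axis=1)  # one mean per row
print(col_means)
print(row_means)

# np.quantile needs the quantile q as a second argument; 0.5 gives the median
median = np.quantile(small, 0.5)
print(median)
```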