# Introduction to Python part VIII (And a discussion of the covariance and the multivariate Gaussian)

## Activity 1: Discussion of multiple random variables

  * What is a conditional expectation?  How is this related to conditional probability?
  * What are two important properties of the conditional Gaussian?  How does this relate to the notions of independence / correlation?
  * What is the affine closure of the Gaussian?  How does this relate to the tangent approximation?
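
As a starting point for the last question, the sketch below checks numerically that an affine map of a Gaussian vector has mean `A mu + b` and covariance `A Sigma A^T`. The parameters `mu`, `sigma`, `a`, and `b` are hypothetical values chosen only for illustration, and the check covers the first two moments only; it does not by itself show that the transformed variable is Gaussian.

```python
import numpy

# Hypothetical parameters, chosen only for illustration.
mu = numpy.array([1.0, -2.0])
sigma = numpy.array([[2.0, 0.5],
                     [0.5, 1.0]])
a = numpy.array([[1.0, 2.0],
                 [0.0, 1.0]])
b = numpy.array([1.0, -1.0])

# Draw samples X ~ N(mu, sigma) and apply the affine map Y = A X + b.
rng = numpy.random.default_rng(0)
x = rng.multivariate_normal(mu, sigma, size=200_000)
y = x @ a.T + b

# Moments predicted by the affine property.
predicted_mean = a @ mu + b
predicted_cov = a @ sigma @ a.T

# Moments estimated from the transformed samples.
empirical_mean = y.mean(axis=0)
empirical_cov = numpy.cov(y, rowvar=False)

print(predicted_mean, empirical_mean)
print(predicted_cov)
print(empirical_cov)
```

The empirical values should agree with the predicted ones up to sampling noise, which shrinks as the number of samples grows.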

## Activity 2: Analyzing data from multiple files

As a final step in processing our inflammation data, we need a way to get a list of all the files in our data directory whose names start with `inflammation-` and end with `.csv`. The following library will help us achieve this:



In [None]:
import glob

The `glob` library contains a function, also called `glob`, that finds files and directories whose names match a pattern. We provide those patterns as strings: the character `*` matches zero or more characters, while `?` matches any one character. We can use this to get the names of all the inflammation CSV files in our data directory:

In [None]:
print(glob.glob('./swc-python/data/inflammation*.csv'))
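
To see how `*` and `?` differ without depending on the lesson data, here is a minimal sketch that creates a few empty files in a temporary directory and matches them. The file names are made up for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Made-up file names, created empty just to have something to match.
    names = ['inflammation-01.csv', 'inflammation-02.csv',
             'inflammation-10.csv', 'small-01.csv', 'notes.txt']
    for name in names:
        open(os.path.join(tmpdir, name), 'w').close()

    # '*' matches zero or more characters.
    star_pattern = os.path.join(tmpdir, 'inflammation-*.csv')
    star_matches = sorted(os.path.basename(p) for p in glob.glob(star_pattern))

    # '?' matches exactly one character.
    question_pattern = os.path.join(tmpdir, 'inflammation-0?.csv')
    question_matches = sorted(os.path.basename(p) for p in glob.glob(question_pattern))

print(star_matches)      # the three inflammation files
print(question_matches)  # only inflammation-01.csv and inflammation-02.csv
```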

The result of `glob.glob` is a list of file and directory paths in arbitrary order. This means we can loop over it to do something with each filename in turn. In our case, the "something" we want to do is generate a set of plots for each file in our inflammation dataset. If we want to start by analyzing just the first three files in alphabetical order, we can use the built-in function `sorted` to generate a new sorted list from the `glob.glob` output:



In [None]:
import glob
import numpy
import matplotlib.pyplot

filenames = sorted(glob.glob('./swc-python/data/inflammation*.csv'))
filenames = filenames[0:3]
for filename in filenames:
    print(filename)

    data = numpy.loadtxt(fname=filename, delimiter=',')

    fig = matplotlib.pyplot.figure(figsize=(10.0, 3.0))

    axes1 = fig.add_subplot(1, 3, 1)
    axes2 = fig.add_subplot(1, 3, 2)
    axes3 = fig.add_subplot(1, 3, 3)

    axes1.set_ylabel('average')
    axes1.plot(numpy.mean(data, axis=0))

    axes2.set_ylabel('max')
    axes2.plot(numpy.max(data, axis=0))

    axes3.set_ylabel('min')
    axes3.plot(numpy.min(data, axis=0))

    fig.tight_layout()
    matplotlib.pyplot.show()

Sure enough, the first two datasets show exactly the same ramp in their maxima and the same staircase structure in their minima. The third dataset looks different: its maxima are a bit less regular, and its minima are consistently zero.
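
The `sorted` call in the loop above orders the paths as strings. That gives the expected file order here because the numbers in the file names are zero-padded. The short sketch below illustrates this; the `file-*.csv` names are hypothetical and only show what happens without padding.

```python
# glob.glob may return names in any order; sorted() returns a new, ordered list.
unordered = ['inflammation-10.csv', 'inflammation-02.csv', 'inflammation-01.csv']
first_two = sorted(unordered)[0:2]
print(first_two)

# Hypothetical names without zero-padding: string order is not numeric order.
unpadded = sorted(['file-10.csv', 'file-2.csv', 'file-1.csv'])
print(unpadded)
```

In the second list, `file-10.csv` sorts before `file-2.csv` because strings are compared character by character.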



### Exercise:

Using the above as a template, plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively), i.e., the difference between the leftmost plots of the first two figures.
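
One possible way to approach the computation is sketched below on two tiny synthetic arrays rather than the real files. The helper name `mean_difference` and the array values are made up for illustration.

```python
import numpy


def mean_difference(data_a, data_b):
    """Return the difference of the column-wise means of two 2D arrays."""
    return numpy.mean(data_a, axis=0) - numpy.mean(data_b, axis=0)


# Synthetic stand-ins for two datasets (rows and columns as in the loaded arrays).
data_a = numpy.array([[0.0, 2.0, 4.0],
                      [2.0, 4.0, 6.0]])
data_b = numpy.array([[1.0, 1.0, 1.0],
                      [1.0, 3.0, 5.0]])

difference = mean_difference(data_a, data_b)
print(difference)
```

With the real data, the two arrays would come from `numpy.loadtxt`, and the result could be passed to `matplotlib.pyplot.plot` as in the template.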

### Exercise:

Use each of the files once to generate a dataset containing values averaged over all patients, by completing the code inside the loop below:



In [None]:
import glob
import numpy

filenames = glob.glob('./swc-python/data/inflammation*.csv')
composite_data = numpy.zeros((60, 40))
for filename in filenames:
    # sum each new file's data into composite_data as it's read
    #
    pass  # placeholder so the cell runs; replace with your code

# and then divide the composite_data by the number of files
composite_data = composite_data / len(filenames)
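
One possible way to complete the loop above is sketched here on small synthetic files, so that it runs without the lesson data. The temporary directory, the 4-by-3 shape, and the constant fill values are assumptions made for illustration; the real files are 60 by 40, as in the template.

```python
import glob
import os
import tempfile

import numpy

with tempfile.TemporaryDirectory() as tmpdir:
    # Write three synthetic CSV files filled with 1.0, 2.0, and 3.0.
    for index, fill in enumerate([1.0, 2.0, 3.0], start=1):
        path = os.path.join(tmpdir, 'inflammation-%02d.csv' % index)
        numpy.savetxt(path, numpy.full((4, 3), fill), delimiter=',')

    filenames = glob.glob(os.path.join(tmpdir, 'inflammation*.csv'))
    composite_data = numpy.zeros((4, 3))
    for filename in filenames:
        # Add each file's data element-wise to the running total.
        composite_data += numpy.loadtxt(fname=filename, delimiter=',')

    # Divide the total by the number of files to get the element-wise average.
    composite_data = composite_data / len(filenames)

print(composite_data)
```

Because the sum is order-independent, the files do not need to be sorted first.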