{
"cells": [
{
"cell_type": "markdown",
"id": "2002f115",
"metadata": {},
"source": [
"# Introduction to Python part III (And a discussion of orthgonality)"
]
},
{
"cell_type": "markdown",
"id": "743b2102",
"metadata": {},
"source": [
"## Activity 1: Discussion of orthogonality\n",
"\n",
"One of the primary concepts discussed in the review of inner product spaces is the concept of orthogonality. We will discuss as a class the following points:\n",
" \n",
" * How is orthogonality derived from the vector inner product? What does this represent?\n",
" * How is orthogonality used to construct bases for a vector space / sub-space?\n",
" * What are what is the Gram-Schidt process and what is the consequence of this as a matrix decomposition?\n",
" * What is the geometric interpretation of an orthogonal matrix?"
]
},
{
"cell_type": "markdown",
"id": "541562c2",
"metadata": {},
"source": [
"## Activity 2: Basic data analysis and manipulation"
]
},
{
"cell_type": "markdown",
"id": "a2ad8c64",
"metadata": {},
"source": [
"We will continue from here with analyzing the patient data in `inflammation-01.csv`. Use the `loadtxt` command from `numpy` to load the data into the namespace as a variable `data`. Notice that the `#` denotes a comment, i.e., a note that will not be run by the Python kernel and instead serves as a reference to ourself and future programmers about the use of the code."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b58ab9b1",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"# load the data below specifying the correct path in the load text with the correct commands for a csv file\n"
]
},
{
"cell_type": "markdown",
"id": "6a91d7f8",
"metadata": {},
"source": [
"NumPy has several useful functions that take an array as input to perform operations on its values. If we want to find the average inflammation for all patients on all days, for example, we can ask NumPy to compute data’s mean value with the function `np.mean`."
]
},
{
"cell_type": "markdown",
"id": "43f67ead",
"metadata": {},
"source": [
"### Exercise 1\n",
"\n",
"The standard five number summary to analyze data in terms of the center, spread, skewness and extreme values is given in terms of:\n",
"\n",
" * the minumum value\n",
" * the maximum value\n",
" * the median\n",
" * the upper quartile\n",
" * the lower quartile\n",
" \n",
"In addition, the mean, mode and standard deviation are useful descriptors of the data. In the following, look for the methods needed to compute the above statistics on the `data` array. Furthermore, load these into variables such that the below print statements will work. \n",
"\n",
"Note that some of these may be better computed with methods in the complementary scientific computing library `scipy`. This is imported in the standard convention below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2a045285",
"metadata": {},
"outputs": [],
"source": [
"import scipy as sp"
]
},
{
"cell_type": "markdown",
"id": "9658c3cf",
"metadata": {},
"source": [
"Note that you may need to use a command such as `conda install scipy` for the above to work. How do we know what functions NumPy and SciPy have and how to use them? If you are working in IPython or in a Jupyter Notebook, there is an easy way to find out. If you type the name of something followed by a dot, then you can use `tab` completion (e.g. type `numpy.` and then press Tab) to see a list of all functions and attributes that you can use. After selecting one, you can also add a question mark (e.g. `np.cumprod?`), and IPython will return an explanation of the method!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e27f406",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cc62d6e",
"metadata": {},
"outputs": [],
"source": [
"print('maximum inflammation:', maxval)\n",
"print('minimum inflammation:', minval)\n",
"print('standard deviation:', stdval)\n",
"print('median inflammation:', medval)\n",
"print('mean inflammation:', meanval)\n",
"print('upper quartiile inflammation', up_quartval)\n",
"print('lower quartiile inflammation', lo_quartval)"
]
},
{
"cell_type": "markdown",
"id": "a81b084a",
"metadata": {},
"source": [
"### Exercise 2:\n",
"\n",
"When analyzing data, we often want to look at variations in statistical values, such as the *maximum inflammation per patient* or the *average inflammation per day*. We note that our data example is formatted in the following row / column orientation:\n",
"![Diagram of data orientation](https://swcarpentry.github.io/python-novice-inflammation/fig/python-operations-across-axes.png)"
]
},
{
"cell_type": "markdown",
"id": "a96f7828",
"metadata": {},
"source": [
"With the above conventions, compute the patient max inflammation and the average inflamation. Use the keyword arguments above in the `np.max` and `np.mean` functions to specify over which dimension we make a calculation. Use this to verify a second calculation, slicing the array with the `:` notation in one of the axes to compute the patient max for the third patient and the daily average on the second day."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0700af42",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "4d81445b",
"metadata": {},
"source": [
"### Exercise 3:\n",
"\n",
"Arrays can be concatenated and stacked on top of one another, using NumPy’s `vstack` and `hstack` functions for vertical and horizontal stacking, respectively.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "77b22805",
"metadata": {},
"outputs": [],
"source": [
"A = np.array([[1,2,3], [4,5,6], [7, 8, 9]])\n",
"print('A = ')\n",
"print(A)\n",
"\n",
"B = np.hstack([A, A])\n",
"print('B = ')\n",
"print(B)\n",
"\n",
"C = np.vstack([A, A])\n",
"print('C = ')\n",
"print(C)"
]
},
{
"cell_type": "markdown",
"id": "17f5cd1e",
"metadata": {},
"source": [
"Write some additional code that slices the first and last columns of A, and stacks them into a 3x2 array. Make sure to print the results to verify your solution.\n",
"\n",
"Note a ‘gotcha’ with array indexing is that singleton dimensions are dropped by default. That means `A[:, 0]` is a one dimensional array, which won’t stack as desired. To preserve singleton dimensions, the index itself can be a slice or array. For example, `A[:, :1]` returns a two dimensional array with one singleton dimension (i.e. a column vector)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53c5355f",
"metadata": {},
"outputs": [],
"source": [
"D = np.hstack((A[:, :1], A[:, -1:]))\n",
"print('D = ')\n",
"print(D)"
]
},
{
"cell_type": "markdown",
"id": "d16337b5",
"metadata": {},
"source": [
"An alternative way to achieve the same result is to use Numpy’s delete function to remove the second column of A. Use the search function for the documentation on the `np.delete` function to find the syntax for constructing such an array.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f22a8c25",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "862752bd",
"metadata": {},
"source": [
"### Exercise 4:\n",
"\n",
"The patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept. Let’s find out how to calculate changes in the data contained in an array with NumPy.\n",
"\n",
"The `np.diff` function takes an array and returns the differences between two successive values. Let’s use it to examine the changes each day across the first week of patient 3 from our inflammation dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5a8917fd",
"metadata": {},
"outputs": [],
"source": [
"patient3_week1 = data[3, :7]\n",
"print(patient3_week1)"
]
},
{
"cell_type": "markdown",
"id": "bbced375",
"metadata": {},
"source": [
"Calling `np.diff(patient3_week1)` would do the following calculations"
]
},
{
"cell_type": "markdown",
"id": "efe8aeca",
"metadata": {},
"source": [
"`[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]`"
]
},
{
"cell_type": "markdown",
"id": "ceee5239",
"metadata": {},
"source": [
"and return the 6 difference values in a new array."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "70b99623",
"metadata": {},
"outputs": [],
"source": [
"np.diff(patient3_week1)"
]
},
{
"cell_type": "markdown",
"id": "96e27c62",
"metadata": {},
"source": [
"Note that the array of differences is shorter by one element (length 6)."
]
},
{
"cell_type": "markdown",
"id": "f3a133a3",
"metadata": {},
"source": [
"When calling `np.diff` with a multi-dimensional array, an axis argument may be passed to the function to specify which axis to process. When applying `np.diff` to our 2D inflammation array data, which axis would we specify? Take the differences in the appropriate axis and compute a basic summary of the differences with our standard statistics above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86747984",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "4cff87d9",
"metadata": {},
"source": [
"If the shape of an individual data file is (60, 40) (60 rows and 40 columns), what is the shape of the array after you run the `np.diff` function and why?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c04385a",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "527bf110",
"metadata": {},
"source": [
"How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb3fd9dd",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"id": "769750b5",
"metadata": {},
"source": [
"## Summary of key points\n",
"\n",
"Some of the key takeaways from this activity are the following:\n",
"\n",
" * Import a library into a program using import libraryname.\n",
"\n",
" * Use the numpy library to work with arrays in Python.\n",
"\n",
" * The expression `array.shape` gives the shape of an array.\n",
"\n",
" * Use `array[x, y]` to select a single element from a 2D array.\n",
"\n",
" * Array indices start at 0, not 1.\n",
"\n",
" * Use `low:high` to specify a slice that includes the indices from `low` to `high-1`.\n",
"\n",
" * Use `# some kind of explanation` to add comments to programs.\n",
"\n",
" * Use `np.mean(array)`, `np.std(array)`, `np.quantile(array)`, `np.max(array)`, and `np.min(array)` to calculate simple statistics.\n",
" \n",
" * Use `sp.mode(array)` to compute additional statistics.\n",
" \n",
" * Use `np.mean(array, axis=0)` or `np.mean(array, axis=1)` to calculate statistics across the specified axis."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}