02/06/2020
Courtesy of M. W. Toews CC via Wikimedia Commons.
The (arithmetic sample) mean is usually the most important measure of center.
Suppose we have \( n \) total sample measurements of some variable \( x \).
Then the (arithmetic sample) mean is defined as
\[ \text{Sample mean} = \frac{x_1 +x_2 +\cdots + x_n}{n}= \frac{\sum_{i=1}^n x_i}{n} \]
Discuss with a neighbor: is the sample mean a statistic or a parameter?
An important property of the sample mean is that, across repeated samples from the same population, it tends to vary less than other measures of center.
However, the sample mean is very sensitive to outliers.
A statistic is called resistant if the presence of outliers in the data changes its value very little.
Let us consider the last example once again.
Suppose our sample includes the values \( 22, 22, 26, 24, 23, 27 \).
If we compute the (arithmetic sample) mean, we find
\[ \frac{22+22+26+24+23+27}{6} = \frac{144}{6} = 24. \]
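The same computation can be sketched in a few lines of Python. This is only an illustration of the definition; the function name `sample_mean` is a hypothetical helper, not part of the course material.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        # The mean is undefined for an empty sample (n = 0).
        raise ValueError("sample_mean requires at least one measurement")
    return sum(values) / len(values)


# Sample values taken from the example above.
sample = [22, 22, 26, 24, 23, 27]
mean_value = sample_mean(sample)
print(mean_value)  # 24.0
```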
Now suppose we learn that the value \( 27 \) was a measurement error and that the sample should have read \( 22, 22, 26, 24, 23, 1000 \).
Discuss with a neighbor: does replacing the value \( 27 \) with \( 1000 \) affect the median? Does it affect the mean? Which of these statistics is resistant to outliers?
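One way to check your answer after the discussion is to compute both statistics for both versions of the sample. The sketch below uses Python's standard `statistics` module; the variable names are illustrative assumptions.

```python
from statistics import mean, median

# The two versions of the sample from the example above.
original = [22, 22, 26, 24, 23, 27]
corrected = [22, 22, 26, 24, 23, 1000]

mean_before, mean_after = mean(original), mean(corrected)
median_before, median_after = median(original), median(corrected)

# Compare how far each statistic moves when 27 is replaced by 1000.
print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

With an even number of values, `median` averages the two middle values of the sorted sample, so only the ordering of the largest value matters here, not its size.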
Another notion of the most “central point” in the data can be the value that is measured most frequently.
Mode – the mode is the observed value that is most frequent in the data.
Consider the last example with samples of \( 22, 22, 26, 24, 23, 27 \). Q: What is the mode?
When two values tie for the highest frequency, we call the data bimodal; when more than two values tie, we call the data multimodal.
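A minimal sketch of how the mode could be found by counting frequencies. The helper `modes` is a hypothetical name, and the second data set is made up purely to illustrate a tie; it does not come from the example above.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Sample from the example above: 22 appears twice, every other value once.
sample = [22, 22, 26, 24, 23, 27]
print(modes(sample))  # [22]

# Hypothetical data with a two-way tie, so the data are bimodal.
tied = [1, 1, 2, 2, 3]
print(modes(tied))  # [1, 2]
```

Returning a list rather than a single value keeps the bimodal and multimodal cases visible instead of silently picking one of the tied values.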