02/06/2020
Courtesy of M. W. Toews CC via Wikimedia Commons.
The (arithmetic sample) mean is usually the most important measure of center.
Suppose we have \( n \) total sample measurements of some variable \( x \).
Then, the (arithmetic sample) mean is defined as
\[ \text{Sample mean} = \frac{x_1 +x_2 +\cdots + x_n}{n}= \frac{\sum_{i=1}^n x_i}{n} \]
Discuss with a neighbor: is the sample mean a statistic or a parameter?
An important property of the sample mean is that, when we repeatedly draw new samples from the same population, it tends to vary less than other measures of center.
However, the sample mean is very sensitive to outliers.
A statistic is called resistant if the presence of outliers does not change its value very much.
Let us consider the last example once again.
Suppose our sample includes the values \( 22, 22, 26, 24, 23, 27 \).
If we compute the (arithmetic sample) mean, we find
\[ \frac{22+22+26+24+23+27}{6} = \frac{144}{6} = 24. \]
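As a minimal sketch, the same computation can be written in Python. The helper name `sample_mean` is our own choice for illustration and is not part of the lecture.

```python
# Sample values from the example above.
sample = [22, 22, 26, 24, 23, 27]

def sample_mean(values):
    """Arithmetic sample mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

mean_value = sample_mean(sample)
print(mean_value)  # 24.0
```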
Now, suppose that we realize that the value \( 27 \) was obtained due to measurement error and our sample should have read \( 22, 22, 26, 24, 23, 1000 \).
Discuss with a neighbor: does replacing the value \( 27 \) with \( 1000 \) affect the median? Does it affect the mean? Which of these statistics is resistant to outliers?
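One way to check the discussion question numerically is sketched below, using Python's standard `statistics` module. The variable names are illustrative assumptions.

```python
import statistics

# The sample as first recorded, and with 27 replaced by 1000.
original = [22, 22, 26, 24, 23, 27]
with_outlier = [22, 22, 26, 24, 23, 1000]

mean_original = statistics.mean(original)          # 24
mean_outlier = statistics.mean(with_outlier)       # about 186.17

median_original = statistics.median(original)      # 23.5
median_outlier = statistics.median(with_outlier)   # 23.5

print("mean:  ", mean_original, "->", mean_outlier)
print("median:", median_original, "->", median_outlier)
```

In this sample the median does not move at all, while the mean changes by more than 160, which illustrates why the median is resistant and the mean is not.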
Another notion of the most “central point” in the data can be the value that is measured most frequently.
Mode – the mode is the observed value that is most frequent in the data.
Consider the last example with samples of \( 22, 22, 26, 24, 23, 27 \). Q: What is the mode?
When two values share the highest frequency, we call the data bimodal; when more than two values share it, we call the data multimodal.
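A short sketch of finding modes with `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest frequency. The list `bimodal_example` is made-up data used only to illustrate a tie.

```python
import statistics

# Sample from the earlier example: 22 appears twice, every other value once.
sample = [22, 22, 26, 24, 23, 27]
modes = statistics.multimode(sample)
print(modes)  # [22]

# Hypothetical data in which two values tie for the highest frequency.
bimodal_example = [1, 1, 2, 3, 3]
bimodal_modes = statistics.multimode(bimodal_example)
print(bimodal_modes)  # [1, 3]
```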
Courtesy of Diva Jain CC via Wikimedia Commons.
11 football players from the Seattle Seahawks were randomly sampled for their weight in pounds.
The samples are \( 189, 254, 235, 225, 190, 305, 195, 202, 190, 252, 305 \).
Discuss with a neighbor: what are the mean, median and mode of this data? Does the data appear to be normally distributed? Why?
Here the mean is given by \[ \frac{189+254+235+225+190+305+195+202+190+252+305}{11} \approx 231.09. \]
The ordered data is given by \( 189, 190, 190, 195, 202, 225, 235, 252, 254, 305, 305 \).
The number of samples is odd, so the middle value can be identified as \( 225 \).
The data also has two modes, \( 190 \) and \( 305 \).
Overall, the data appears to be non-normal, as there are many values around \( 190 \), with a long tail into the upper values.
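The three measures for this sample can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module, and the variable names are our own.

```python
import statistics

# Weights in pounds of the 11 sampled players.
weights = [189, 254, 235, 225, 190, 305, 195, 202, 190, 252, 305]

mean_weight = statistics.mean(weights)        # about 231.09
median_weight = statistics.median(weights)    # 225
mode_weights = statistics.multimode(weights)  # [190, 305]

print(round(mean_weight, 2), median_weight, mode_weights)

# The mean sits above the median, which is consistent with
# a longer tail toward the upper values.
print(mean_weight > median_weight)  # True
```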
Discuss with a neighbor: for each of the following, identify a major reason why the mean and median are not meaningful statistics.
As a final measure of center, we can consider what is the mid-point between the maximum observation and the minimum observation.
Midrange – suppose we have samples \( x_1,\cdots, x_n \) and
\[ \begin{align} x_\text{max} = \max_i(x_i) & & x_\text{min} = \min_i(x_i) \end{align} \]
Then, the midrange is computed as
\[ \text{midrange} = \frac{x_\text{max} + x_\text{min} }{2} \]
Discuss with your neighbor: can you give an example of when the midrange does not equal the median?
As we can see, midrange is extremely sensitive to outliers, both small and large.
Midrange is not used as often as the other measures in practice, but it can give a more complete picture of the data when used with the other measures.
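A minimal sketch of the midrange computation, reusing the sample from the earlier example. The function name `midrange` is an assumption of this sketch, since Python's standard library does not provide one.

```python
import statistics

def midrange(values):
    """Midpoint between the largest and smallest observations."""
    return (max(values) + min(values)) / 2

sample = [22, 22, 26, 24, 23, 27]
with_outlier = [22, 22, 26, 24, 23, 1000]

midrange_sample = midrange(sample)          # (27 + 22) / 2 = 24.5
median_sample = statistics.median(sample)   # 23.5
midrange_outlier = midrange(with_outlier)   # (1000 + 22) / 2 = 511.0

print(midrange_sample, median_sample, midrange_outlier)
```

Here the midrange already differs from the median for the original sample, and a single extreme value moves it from 24.5 to 511.0.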
Let us suppose that we have samples \( x_1, x_2, \cdots, x_n \).
Suppose each sample is given a corresponding weight \( w_i \) so that there are pairs of the form \[ \begin{matrix} x_1 & w_1\\ x_2 & w_2 \\ \vdots & \vdots \\ x_n & w_n \end{matrix} \]
We compute a weighted mean using the following formula,
\[ \frac{\sum_{i=1}^n x_i \times w_i}{\sum_{i=1}^n w_i} = \frac{x_1 \times w_1 + x_2 \times w_2 + \cdots + x_n \times w_n}{w_1 + w_2 + \cdots + w_n} \]
Let's suppose that we want to compute the grade point average (GPA) for some student.
We will suppose that the student gets letter grades as follows: \( A, B, C, A, B \).
The letters are given point values as \( A=4.0, B=3.0, C=2.0, D=1.0 \)
The GPA is computed as a weighted mean of the grade points, weighted by the number of credits for the class.
\[ \begin{matrix} A & 3 \text{ credits} \\ B & 2 \text{ credits} \\ C & 2 \text{ credits} \\ A & 1 \text{ credit} \\ B & 3 \text{ credits} \end{matrix} \]
Discuss with a neighbor: how do we compute the weighted mean in this case? What is the GPA?
\[ \frac{4.0 \times 3 + 3.0 \times 2 + 2.0 \times 2 + 4.0 \times 1 + 3.0 \times 3}{3 + 2 + 2 + 1 + 3}= \frac{35}{11} \approx 3.18 \]
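The GPA calculation can be sketched in Python as a direct translation of the weighted mean formula. The names `weighted_mean`, `grade_points`, and `courses` are illustrative choices and do not come from the lecture.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Point values for each letter grade.
grade_points = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0}

# (letter grade, credits) for each class, in the order listed above.
courses = [("A", 3), ("B", 2), ("C", 2), ("A", 1), ("B", 3)]

points = [grade_points[letter] for letter, _ in courses]
credits = [credit for _, credit in courses]

gpa = weighted_mean(points, credits)
print(round(gpa, 2))  # 3.18
```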
From the last slide, using the analysis of the units, we arrived at a formula that matches the formula for the combined state average:
\[ \frac{20.9 \times 307,267 \text{ Teachers in C} + 16.8 \times 6,998 \text{ Teachers in A} }{ 315,013 \text{ Total all teachers}}\approx 20.8 \]
In this case, the properly chosen weights were given by
\[ \begin{align} w_1 &= \frac{307,267}{ 315,013} \frac{\text{ Teachers in C}}{ \text{ Total all teachers}}\\ w_2 &= \frac{6,998}{ 315,013 } \frac{\text{ Teachers in A}}{ \text{ Total all teachers}} \end{align} \]
That is, we were able to find the true combined state mean by finding weights proportional to the sub-populations.
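The sketch below illustrates why weights proportional to the sub-populations recover the combined mean. As an assumption of this sketch, the total is taken to be the sum of the two teacher counts listed above, so digits beyond the first decimal place may differ from the slide's figure.

```python
# Mean students per teacher in each state, and the number of teachers.
state_means = [20.9, 16.8]
teachers = [307_267, 6_998]

# Assumption: the total is the sum of the two listed counts.
total_teachers = sum(teachers)

# Weights proportional to each sub-population; they add up to 1.
weights = [count / total_teachers for count in teachers]

# Because the weights sum to 1, no further division is needed.
combined_mean = sum(m * w for m, w in zip(state_means, weights))

# For contrast: the unweighted mean treats both states as equally large.
naive_mean = sum(state_means) / len(state_means)

print(round(combined_mean, 1))  # 20.8
print(round(naive_mean, 2))     # 18.85
```

The unweighted mean of the two state values is pulled far toward the smaller state, while the weighted mean stays close to the value for the much larger one.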
Courtesy of Mario Triola, Essentials of Statistics, 5th edition