Normal probability distributions part I

03/24/2020

Outline

The following topics will be covered in this lecture:
- Uniform distribution
- Normal distributions
- Finding the area corresponding to a probability
- Finding a score corresponding to an area
- Critical values
- General normal distributions

Continuous random variables

Histogram of probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

So far our examples have focused on discrete random variables, e.g.:

Results of coin flips – \( x \) is modeled with a binomial distribution.
Results of success / failure trials – \( x \) is modeled with a binomial distribution.
Number of occurrences in an interval – \( x \) is modeled with a Poisson distribution.

We will now turn our attention to continuous random variables, but we will use what we learned about discrete variables to motivate this.
Recall that the probability histogram had the property, \[ \begin{align} \text{Area of Rectangle }x_\alpha &= P(x=x_\alpha) \times 1\\ &= P(x=x_\alpha). \end{align} \]

We also saw that we have the property \[ \sum_{x_\alpha \in \mathbf{R}} P(x=x_\alpha) =1. \]
Putting the above two properties together, we know, \[ \sum_{x_\alpha \in \mathbf{R}} \text{Area of Rectangle }x_\alpha =1. \]
For continuous random variables, we in fact have the same property with a minor modification:
Let \( f(x) \) describe a curve for a probability distribution. Then the total area under the curve \( f(x) \) equals \( 1 \), and the probability of any event \( A \) equals the area under \( f(x) \) over all values \( x_\alpha \) belonging to \( A \).

Uniform distribution

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

A basic example of the area property is the uniform distribution.

Let’s suppose that we are studying some procedure where all outcomes are equally likely.

A very simple example is if you are asked to guess a random number between \( 1 \) and \( 10 \), but including decimals.

That is, we will suppose that guessing \( 1.23453453 \) is equally likely as guessing \( 5 \).

In the terminology above:
- Our random experiment is guessing some number.
- The outcome is one guess.
- The random variable \( x \) is assigned the value of the guess.
Because we allow arbitrary decimal expansions, there are infinitely many choices.
However, all choices lie in the finite range \( [1,10] \) and are equally likely.
Discuss with a neighbor: if the area under the curve \( f(x) \) for \( x_\alpha \) in \( [1, 10] \) must equal one, and the height of \( f(x) \) is constant, what is the height?

Uniform distribution continued

Courtesy of IkamusumeFan CC via Wikimedia Commons

Recall, the area is given by the height \( h \) times the width \( w \).
The area is fixed at \( 1 \) and the width is \( 10-1 \) so that, \[ \begin{align} &1 = h\times w \\ \Leftrightarrow & \frac{1}{9} = h \end{align} \]
Here, the probability distribution curve is \[ f(x) = \begin{cases} \frac{1}{9} & \text{for }x\text{ in }[1,10] \\ 0 & \text{else} \end{cases} \]
More generally, consider any range of values \( [a,b] \) where \( a < b \).
If we can randomly select any value in the range \( [a,b] \) with the same likelihood, let \( x \) be the random variable assigned the value we select.

Then the probability distribution for \( x \) is uniform over \( [a,b] \) with \[ f(x) = \begin{cases} \frac{1}{b-a} & \text{for }x\text{ in }[a,b] \\ 0 & \text{else} \end{cases} \]
The graph of this distribution curve (as above) is called the probability density.
Thus, if we take any \( \alpha < \beta \) in \( [a, b] \), the probability of \( x \) in \( [\alpha, \beta] \) is given by the area (width times the height of this block), \[ (\beta - \alpha) \times \frac{1}{b-a}. \]
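The width-times-height rule above can be sketched in a few lines of Python. This is only an illustration; the function name `uniform_interval_probability` is our own choice and not part of the lecture or of StatCrunch.

```python
def uniform_interval_probability(a, b, alpha, beta):
    """P(alpha <= x <= beta) for x uniform on [a, b].

    Assumes a < b and a <= alpha <= beta <= b, as in the text.
    """
    if not (a < b and a <= alpha <= beta <= b):
        raise ValueError("need a < b and a <= alpha <= beta <= b")
    height = 1 / (b - a)   # constant density on [a, b]
    width = beta - alpha   # length of the sub-interval
    return width * height


# Guessing a number in [1, 10]: the chance the guess lands in [2, 5].
p = uniform_interval_probability(1, 10, 2, 5)
print(p)
```

For the guessing example this gives \( 3 \times \frac{1}{9} = \frac{1}{3} \), and taking the whole interval \( [1,10] \) returns the total area \( 1 \).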

Uniform distribution example

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Let’s consider an example of the uniform distribution that is from real life.

During certain times at the RNO airport, waiting times in security are uniformly distributed in the interval between \( 0 \) and \( 5 \) minutes.
This is to say that:

All waiting times in the interval \( [0,5] \) are equally likely.
The waiting time can be measured to an arbitrary decimal place,

e.g., you could wait exactly 1.3534543 minutes.

The probability distribution for waiting times is given as \[ f(x) = \begin{cases} \frac{1}{5} & \text{for }x\text{ in }[0,5] \\ 0 & \text{else} \end{cases} \]

And, the probability of waiting some amount of time is equal to the associated area under \( f(x) \).

Discuss with a neighbor: what is the probability of waiting between \( 2 \) and \( 5 \) minutes at RNO security, if waiting time \( x \) is uniformly distributed over \( [0,5] \)?

Uniform distribution example continued

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

We want to find the area under \( f(x) \) defined as, \[ f(x) = \begin{cases} \frac{1}{5} & \text{for }x\text{ in }[0,5] \\ 0 & \text{else} \end{cases} \]
but for the interval between \( 2 \) and \( 5 \).
The area can be derived as the height times width where, \[ \begin{align} w = 5 - 2 = 3 & & h = 0.2 \text{ for }x\text{ in the range }[2,5] \end{align} \]
That is, \[ \begin{align} P(x \text{ in }[2, 5]) &= h \times w \\ &= 3 \times 0.2 = 0.6 \end{align} \]
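As a rough numerical check of this calculation, the sketch below compares the height-times-width answer with an area found by adding up many thin rectangles (a midpoint Riemann sum). The helper names `waiting_density` and `area_under` are hypothetical, chosen only for this illustration.

```python
def waiting_density(x):
    """Density of the RNO waiting time: 1/5 on [0, 5] and 0 elsewhere."""
    return 0.2 if 0 <= x <= 5 else 0.0


def area_under(f, lo, hi, n=10_000):
    """Approximate the area under f on [lo, hi] with a midpoint Riemann sum."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx


exact = (5 - 2) * 0.2                        # width times height
approx = area_under(waiting_density, 2, 5)   # numerical area on [2, 5]
print(exact, approx)
```

Both numbers agree with \( 0.6 \) up to rounding. The same `area_under` idea does not rely on the curve being a rectangle, which is why it is a useful way to think about areas under other densities.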

Note that the formula for the area, \[ \text{Area} = h \times w \] only applies for rectangles as above.
However, the principle of,

\[ \text{Probability } = \text{Area under the probability density graph} \]
holds for all distributions \( f(x) \).
Particularly, this also holds for non-rectangular, bell shaped curves…

The normal distribution

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Recall, frequency data is called normal when it exhibits the following:

The frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency.
The distribution is approximately symmetric.
There are few if any extreme values.

We have a theoretical probability model for this type of data called a normal distribution.
Let \( x \) be a random variable with mean \( \mu \) and standard deviation \( \sigma \) which behaves as above.

We say that \( x \) has a normal distribution, \[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-1}{2}\left(\frac{x- \mu}{\sigma}\right)^2} \] with the parameters \( \mu \) and \( \sigma \).
Note: the area under \( f(x) \) cannot be computed by \( h\times w \), but the total area under the above curve is still \( 1 \).

Therefore, accurately computing the probability of some event is typically done with computer methods – like in StatCrunch.

A special case of the normal distribution is the standard normal distribution with mean \( \mu =0 \) and standard deviation \( \sigma=1 \), \[ f(x) = \frac{1}{\sqrt{2\pi}} e^{\frac{-1}{2}x^2} \]
For the standard normal distribution as above, the z-score of some observation \( x \) is actually just equal to \( x \).
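To make the formula concrete, here is a minimal Python sketch that evaluates the normal density directly and checks numerically that the total area is close to \( 1 \). The names `normal_density` and `area_under` are our own; `NormalDist` comes from Python's standard `statistics` module and is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_density(x, mu=0.0, sigma=1.0):
    """Evaluate the normal density f(x) with parameters mu and sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def area_under(f, lo, hi, n=20_000):
    """Approximate the area under f on [lo, hi] with a midpoint Riemann sum."""
    dx = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * dx) for i in range(n)) * dx


# Almost all of the standard normal area lies in [-8, 8].
total = area_under(normal_density, -8, 8)
print(total)

# The hand-written formula agrees with the library density.
print(normal_density(0.0), NormalDist().pdf(0.0))
```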

Standard normal distribution example

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Let’s consider an example of where a normal probability distribution is a good theoretical model of a population.

Bone mineral density tests are used to identify the presence or likelihood of osteoporosis;

osteoporosis is a disease causing bones to become more fragile and more likely to break.

The bone densities of adults in the United States are well represented by a normal distribution.

We will be concerned if we observe an individual with an extreme value for their bone density test, relative to the population mean.

Therefore, the result of the test is usually given in terms of a z-score.

Recall that for a standard normal distribution the z-score is equivalent to the value of the observation \( x \).

Therefore, if we want to find the probability of finding some observation at least as extreme as some z-score, we can find the associated area for the standard normal distribution.

We will build up to finding the probability of some observation “at-least-as-extreme” with some simpler examples.
Consider the following: if we want to find the probability that a random adult has a bone density result (z-score) at most \( 1.27 \), what is the associated area under the normal density?

This is the area under the bell curve to the left of the value \( 1.27 \); we can visualize this directly in StatCrunch.
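Outside of StatCrunch, the same area can be sketched with Python's standard library. We assume here that `NormalDist().cdf(z)` (the cumulative distribution function) plays the role of "area to the left of \( z \)"; the variable names are our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the standard normal density to the left of z = 1.27.
area_left = standard_normal.cdf(1.27)
print(round(area_left, 4))
```

The result is about \( 0.898 \), so roughly \( 90\% \) of adults would have a bone density z-score of at most \( 1.27 \).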

Standard normal distribution example continued

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

We will consider the bone density example again as motivation.

Suppose that we randomly select an adult in the USA to be given the bone density test.
Consider the following: if we want to find the probability that this individual has a bone density result (z-score) greater than \( -1 \), what is the associated area under the normal density?

This is the area under the bell curve to the right of the value \( -1 \); we can visualize this directly in StatCrunch.

Consider the following: if we want to find the probability that this individual has a bone density result (z-score) in the range \( [-1,1] \), what is the associated area under the normal density? How is this related to the empirical rule?

This is the area under the bell curve between the values \( -1 \) and \( 1 \); we can visualize this directly in StatCrunch.
Recall that the empirical rule describes how much of a normal population lies within
- \( [\mu - \sigma, \mu + \sigma] \) – approximately \( 68\% \);
- \( [\mu - 2 \sigma, \mu + 2 \sigma] \) – approximately \( 95\% \);
- \( [\mu - 3\sigma, \mu + 3\sigma] \) – approximately \( 99.7\% \).
For a standard normal we have \( \mu = 0 \) and \( \sigma =1 \) so that approximately \( 68\% \) of the population lies within \( [-1 ,1] \).
Equivalently, we know that the area under the normal density graph in the interval \( [-1,1] \) is approximately \( 0.68 \).
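The areas on this slide can be reproduced with a short Python sketch. It assumes the standard-library `NormalDist` as a stand-in for StatCrunch; the helper `area_within` is a hypothetical name used only here.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Area to the right of z = -1 is the complement of the area to its left.
area_right_of_minus_one = 1 - Z.cdf(-1)


def area_within(k):
    """Area under the standard normal density on [-k, k]."""
    return Z.cdf(k) - Z.cdf(-k)


print(round(area_right_of_minus_one, 4))
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The loop prints the areas within one, two and three standard deviations of the mean, which are the quantities the empirical rule approximates.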

Finding the area between two z-scores

Area under the standard normal distribution.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

While we can use the empirical rule to find the area under the normal density in special cases:

one standard deviation of the mean;
two standard deviations of the mean;
three standard deviations of the mean;

we will often want to find the probability of a range of values much more generally.
For more general cases, we will usually find the area using software like StatCrunch.
However, we can also deduce the area in an interval, if we know the area to the left or right of both endpoints.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Consider the following: suppose we know that the area under the normal density to the left of \( z=-1.00 \) is approximately \( 0.1587 \).
Suppose we also know that the area under the normal density to the left of \( z=-2.50 \) is approximately \( 0.0062 \).
When an individual has a bone density score (z-score) between \( -2.50 \) and \( -1.00 \), we say this individual has osteopenia, or some bone loss.
Can you deduce what the probability is of randomly selecting a US adult with osteopenia?

Finding the area between two z-scores continued

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Let’s deduce this mathematically and pictorially:

Suppose that \( x \) is the area under the normal density to the left of \( z=-1.00 \) and to the right of \( z=-2.50 \).
We know that the total area to the left of \( z=-1.00 \) is equal to \( 0.1587 \), and that we can cut this area into two pieces:

the area \( x \); and
the area to the left of \( z=-2.50 \), i.e., \( 0.0062 \).

Therefore, we have, \[ \begin{align} & x + 0.0062 = 0.1587 \\ \Leftrightarrow & x = 0.1587 - 0.0062 \end{align} \]

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Therefore, mathematically we find the area in between \( z =-2.50 \) and \( z=-1.00 \) to be \( 0.1525 \).
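The subtraction can be checked with a small Python sketch, again assuming `NormalDist().cdf` gives the area to the left of a z-score. The variable names are our own.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

left_of_minus_1 = Z.cdf(-1.00)    # area to the left of z = -1.00
left_of_minus_2_5 = Z.cdf(-2.50)  # area to the left of z = -2.50

# Area between the two z-scores: remove the smaller left tail from the larger.
osteopenia = left_of_minus_1 - left_of_minus_2_5

# The same subtraction using the rounded values quoted above.
from_rounded_values = 0.1587 - 0.0062

print(osteopenia, from_rounded_values)
```

The two results differ slightly in the fourth decimal place, because the quoted areas \( 0.1587 \) and \( 0.0062 \) were already rounded before subtracting.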
Geometrically, this is also equivalent to removing the shaded area on the right from the shaded area on the left.
In pictures we have this precisely as follows:

Finding a test score

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Instead of finding the probability of some bone density score, we may also be interested in finding the bone density corresponding to some percentile of the population.
Using the probability distribution, we can associate a percentile identically with the area under its density graph.
Because the bone density scores follow a standard normal distribution, we associate the percentile with a corresponding z-score.
If we want to find the \( 95 \)-th percentile, we will look for the z-score which separates the area under the density as:

area on the left-hand-side of \( z \): \( 0.95 \);
area on the right-hand-side of \( z \): \( 0.05 \).

Computing the associated z-score corresponding to the percentile is difficult to do without software and we will not consider this.
Instead, we will look at how to do this directly in StatCrunch.
Recall: when there is a \( 5\% \) or less chance of observing some case at least as extreme as some given \( z \), we call this \( z \) significant.
If we are interested in finding all \( z \) for which there is a \( 5\% \) or less chance of observing such an extremely high bone density,

this choice of \( z \) above separates observations which are significant from observations that are not significant.

Such choices of \( z \) are called critical values.
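As a sketch of how software does this step, Python's standard library offers `NormalDist().inv_cdf(p)`, which returns the z-score with area \( p \) to its left. We assume it here as a stand-in for the StatCrunch calculation; `z_95` is our own variable name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# z-score with area 0.95 to its left, i.e. the 95th percentile.
z_95 = Z.inv_cdf(0.95)
print(round(z_95, 3))

# Going back: the area to the left of z_95 recovers 0.95.
print(round(Z.cdf(z_95), 4))
```

This gives \( z \approx 1.645 \): only about \( 5\% \) of bone density scores are at least this high.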

Finding a test score continued

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

The idea of “at least as extreme” can have different meanings in different contexts.
Consider the following: if we were interested in finding all values of \( z \) for which there is a \( 5\% \) or less chance of observing such an extremely low bone density, what percentile would the \( z \) critical value correspond to?

In this case, we are interested in finding the \( z \) critical value that separates the population where:

\( 5\% \) of all cases have bone density less than or equal to this bone density score; and
\( 95\% \) of all cases have bone density greater than this bone density score.

Therefore, the \( z \) critical value will correspond to the \( 5 \)-th percentile.
We can visualize this in StatCrunch.

In the last two examples, we considered \( 1 \)-sided critical values;

i.e., we only considered if observations were either extremely low or extremely high, but not both.

However, like our empirical rule, we are often concerned with observations that are extreme with respect to their deviation from the mean;

to find all observations that are extreme with respect to their deviation from the mean, we will use \( 2 \)-sided critical values.

Consider the following: using the graph above, can you explain why \( z=1.96 \) is a \( 2 \)-sided critical value? What is the probability of selecting a random adult whose bone density is at least this far from the mean?

The probability that an observation is at least \( 1.96 \) away from the mean (in the positive or negative direction) is \( 5\% \).
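Both questions on this slide can be checked numerically. The sketch below assumes Python's `NormalDist` in place of StatCrunch, with variable names of our own choosing.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# 1-sided, extremely low: the critical value is the 5th percentile.
z_low = Z.inv_cdf(0.05)

# 2-sided: add the area below -1.96 to the area above 1.96.
lower_tail = Z.cdf(-1.96)
upper_tail = 1 - Z.cdf(1.96)
two_sided = lower_tail + upper_tail

print(round(z_low, 3), round(two_sided, 4))
```

By symmetry the two tails have equal area, about \( 0.025 \) each, so together they account for about \( 5\% \) of the population.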

Critical values

Critical values are an essential concept we will use often in hypothesis testing.
We will return to this topic in more detail when we reach this section of the course; however, at the moment we will introduce some notation.
By convention, we will usually choose \( 5\% \) as the measure of significance.

However, there are often times we will choose a different value such as \( 1\% \), \( 2.5\% \), etc. depending on the problem.

To consider this more generally, suppose \( \alpha \) is some decimal value – we will call \( \alpha \) the significance level.
If we are considering a \( 1 \)-sided measure of extremeness, the corresponding critical value will be denoted \( z_\alpha \).

For extremely high values, we say \( z_\alpha \) is the value for which \( \alpha \times 100 \% \) of cases in the population are at least as high and \( (1-\alpha)\times 100\% \) of cases are lower.
For extremely low values, we say \( z_\alpha \) is the value for which \( \alpha \times 100 \% \) of cases in the population are at least as low and \( (1-\alpha)\times 100\% \) of cases are higher.

If we are considering a \( 2 \)-sided measure of extremeness, the corresponding critical value will be denoted \( z_\frac{\alpha}{2} \).

For the \( 2 \)-sided measure of extremeness, \( z_\frac{\alpha}{2} \) is the value for which \( \frac{\alpha}{2} \times 100 \% \) of cases in the population are at least as high and \( (1- \frac{\alpha}{2})\times 100\% \) of cases are lower.
Notice: symmetry of the standard normal distribution about zero means that \( \frac{\alpha}{2} \times 100 \% \) of cases in the population are at least as low as \( -z_\frac{\alpha}{2} \) and \( (1- \frac{\alpha}{2})\times 100\% \) of cases are higher than \( -z_\frac{\alpha}{2} \).
Put together, this says \( \alpha \times 100\% \) of cases lie outside of \( [-z_\frac{\alpha}{2}, z_\frac{\alpha}{2}] \) while \( (1 - \alpha)\times 100\% \) of cases lie within.

On the last slide, \( \alpha=0.05 \), and \( z_\frac{\alpha}{2} = z_{0.025} = 1.96 \).
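The notation above can be summarized in a small helper. This is a sketch only: the function `critical_value` and its `sides` argument are hypothetical names, and we assume Python's `NormalDist().inv_cdf` for the percentile lookup.

```python
from statistics import NormalDist


def critical_value(alpha, sides=1):
    """Upper critical value of the standard normal for significance level alpha.

    sides=1 returns z_alpha and sides=2 returns z_(alpha/2).
    By symmetry, the matching lower critical value is the negative of the result.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if sides not in (1, 2):
        raise ValueError("sides must be 1 or 2")
    tail_area = alpha / sides  # area left in the upper tail
    return NormalDist().inv_cdf(1 - tail_area)


print(round(critical_value(0.05, sides=1), 3))
print(round(critical_value(0.05, sides=2), 2))
```

With \( \alpha = 0.05 \), the \( 2 \)-sided call reproduces \( z_{0.025} \approx 1.96 \) and the \( 1 \)-sided call gives \( z_{0.05} \approx 1.645 \).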

General normal distributions

Standardizing data by the z-score formula.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Recall the general normal distribution, \[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-1}{2}\left(\frac{x- \mu}{\sigma}\right)^2} \] with the parameters \( \mu \) and \( \sigma \).
The standard normal distribution with mean \( \mu =0 \) and standard deviation \( \sigma=1 \), \[ f(x) = \frac{1}{\sqrt{2\pi}} e^{\frac{-1}{2}x^2}, \] is just a special case of the normal distribution.

Suppose that \( x \) is a random variable distributed according to a general normal distribution.
If we want to find the critical values for \( x \) like we discussed earlier, we can standardize \( x \) by its z-score.

I.e., recall the formula for the z-score of a general \( x \), \[ z = \frac{x - \mu}{\sigma}; \] then for \( z \) we can use the same techniques as before.

Particularly, when we change the variable \( x \) to \( z \) as above, the area corresponding to the probability of an event can be read directly from the standard normal.
This is most useful when making calculations by hand, but generally we will use StatCrunch to evaluate the general normal distribution directly.
We will consider some examples.

Applications of general normal distributions

Converting between general and standard normal.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Adult men in the USA have heights which are normally distributed with a mean height of \( \mu=68.6 \) inches and a standard deviation of \( \sigma=2.8 \) inches.
Most building codes in the USA require that showers have heads at a height of at least \( 72 \) inches.
We will consider what percent of men in the USA have height that exceeds the \( 72 \) inches of most shower heads.
Recall, \( x=68.6 \) has a z-score of \( 0 \) because, \[ \frac{68.6 - \mu}{\sigma} =\frac{68.6 - 68.6}{2.8} = 0 \] for the given population mean.
Similarly, we can compute, \[ \frac{72 - \mu}{\sigma} = \frac{72 - 68.6}{2.8} \approx 1.21, \] i.e., the z-score for \( 72 \) inches on the standardized scale.

We can therefore find the area to the right of \( 1.21 \) for the standard normal or the area to the right of \( 72 \) for the normal with \( \mu=68.6 \), \( \sigma=2.8 \) equivalently.
We will consider this in StatCrunch.
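For readers without StatCrunch, the equivalence of the two approaches can be sketched in Python with the standard-library `NormalDist`. The variable names are our own.

```python
from statistics import NormalDist

mu, sigma = 68.6, 2.8    # mean and standard deviation of men's heights (inches)
shower_height = 72

# Route 1: standardize, then use the standard normal.
z = (shower_height - mu) / sigma
p_standard = 1 - NormalDist().cdf(z)

# Route 2: use the general normal distribution directly.
p_general = 1 - NormalDist(mu, sigma).cdf(shower_height)

print(round(z, 2), round(p_standard, 4), round(p_general, 4))
```

Both routes give the same area, a little over \( 11\% \). Rounding the z-score to \( 1.21 \) before looking up the area changes the answer slightly in the third decimal place.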

Applications of general normal distributions continued

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Adult women in the USA have heights which are normally distributed with a mean height of \( \mu=63.7 \) inches and a standard deviation of \( \sigma=2.9 \) inches.
The US Air Force requires that pilots have a height between \( 64 \) and \( 77 \) inches.
We will consider what percent of women in the USA have a height that allows them to be a pilot in the Air Force.
Recall, \( x=63.7 \) has a z-score of \( 0 \) because, \[ \frac{63.7 - \mu}{\sigma} =\frac{63.7 - 63.7}{2.9} = 0 \] for the given population mean.
Similarly, we can compute, \[ \frac{64 - \mu}{\sigma} = \frac{64 - 63.7}{2.9} \approx 0.10, \] i.e., the z-score for \( 64 \) inches on the standardized scale.

Finally, the maximum allowed height can be standardized as \[ \frac{77 - \mu}{\sigma} = \frac{77 - 63.7}{2.9} \approx 4.59. \]
Therefore, we can compute the area between \( z=0.10 \) and \( z=4.59 \) for the standard normal or the area between \( x=64 \) and \( x=77 \) for the normal with \( \mu=63.7 \), \( \sigma=2.9 \) equivalently.
We will consider this in StatCrunch.
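The same calculation can be sketched in Python, assuming the standard-library `NormalDist` in place of StatCrunch. The variable names are our own.

```python
from statistics import NormalDist

mu, sigma = 63.7, 2.9    # mean and standard deviation of women's heights (inches)
low, high = 64, 77       # allowed height range for pilots

# Directly on the original scale.
heights = NormalDist(mu, sigma)
p_general = heights.cdf(high) - heights.cdf(low)

# Equivalently, on the standardized scale.
z_low = (low - mu) / sigma
z_high = (high - mu) / sigma
Z = NormalDist()
p_standard = Z.cdf(z_high) - Z.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(p_general, 4))
```

Because \( z = 4.59 \) is so far into the upper tail, almost all of the excluded area comes from heights below \( 64 \) inches, and a bit under half of women fall in the allowed range.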

Applications of general normal distributions continued

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

Suppose we want to re-design cockpits so that the shortest \( 95\% \) of women are eligible to be pilots.
This corresponds to an \( \alpha=0.05 \) critical value for a \( 1 \)-sided measure of extremely large values.
From earlier, we know that \( z_\alpha = 1.645 \), and suppose we want to convert from the z-score to the \( x \) value on the original scale.
Notice, \[ \begin{align} & z = \frac{x - \mu}{\sigma} \\ \Leftrightarrow & z \times \sigma = x - \mu \\ \Leftrightarrow & z \times \sigma + \mu = x. \end{align} \]
Therefore, we can convert from \( z_\alpha \) to \( x_\alpha \) as, \[ x_\alpha = z_\alpha \times \sigma + \mu. \]

Consider the following: for the values of \( \mu=63.7 \), \( \sigma=2.9 \) and \( z_\alpha=1.645 \), what is the corresponding critical \( x_\alpha \) in the original scale?

Using the formula, we find \[ x_\alpha = 1.645 \times 2.9 + 63.7 \approx 68.4705; \] i.e., \( 95\% \) of US women have heights under \( 68.4705 \) inches.
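As a final check, the conversion back to the original scale can be sketched in Python. We assume `NormalDist(mu, sigma).inv_cdf(0.95)` as a direct way to get the \( 95 \)-th percentile of the general normal; the variable names are our own.

```python
from statistics import NormalDist

mu, sigma = 63.7, 2.9    # women's heights (inches)
z_alpha = 1.645          # rounded 1-sided critical value for alpha = 0.05

# Convert the z critical value back to the original scale.
x_alpha = z_alpha * sigma + mu

# Cross-check: ask for the 95th percentile of the general normal directly.
x_direct = NormalDist(mu, sigma).inv_cdf(0.95)

print(round(x_alpha, 4), round(x_direct, 4))
```

The two values agree to about three decimal places; the small gap comes from using the rounded critical value \( 1.645 \) in the formula.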