Probability distributions part III

03/12/2020

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Poisson distribution
    • Continuous random variables
    • Uniform distribution
    • Normal distributions

Poisson distribution motivation

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • Consider modeling the amount of mail you receive each day.
  • Suppose that the expected number of pieces of mail remains constant from day to day;
    • i.e., the actual amount will have variation, but the expected amount is always the same, e.g., 4 pieces.
  • Suppose as well that the amounts of mail arriving on different days are independent;
    • i.e., how many pieces of mail you receive one day does not affect future days.
  • If we wanted to model this kind of random experiment, we would call:
    1. Random experiment - daily mail;
    2. Outcome - the actual mail received;
    3. Random variable - total number of pieces of mail.
  • Let’s suppose that the standard deviation for the number of pieces of mail in a given day is also constant, equal to the square root of the expected number, \( \sqrt{4}=2 \).
  • If \( x \) is the random variable above, typically we would model its behavior in terms of a Poisson distribution.

Poisson distribution motivation continued

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • The daily-mail example on the last slide is one particular example of a Poisson-distributed variable because:
    • the random variable \( x \) measures the outcome of some random process over an interval;
      • i.e., we measured the amount of mail over one day.
    • the outcome in any particular interval is considered to be by chance;
      • i.e., the actual mail we receive on a given day is basically by chance.
    • the outcome in one interval is independent of the outcome in any other interval;
      • i.e., the mail received one day doesn’t affect the mail received any other day.
    • But in addition, the expected value for the random variable \( x \) is the same in any interval,
      • i.e., on every given day, we might expect to have \( 4 \) pieces of mail arriving (possibly more or less in reality).

Poisson distribution motivation continued

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • A similar idea can be applied to Customer Support phone queues for a given hour.
  • The number of callers in a given hour is determined basically by chance.
    • However, we can suppose that the expected number of incoming calls will be the same in any one hour interval.
  • We will also suppose that the callers are independent,
    • i.e., we suppose that the number of callers in one hour does not affect the number of callers in another hour.
  • Let \( x \) be the random variable equal to the number of incoming calls in a given hour.
  • We are once again in a situation where a Poisson distribution is a good model for the behavior of \( x \).
  • This is again because:
    1. The expected value of \( x \) is constant for any given interval; and
    2. the outcomes in any two given intervals are mutually independent.
  • More broadly, examples can include intervals in units of distance, area, volume etc… not just time.
    • However, we are always measuring the number of occurrences of some event over some interval.

Poisson distribution

  • Formally we will describe a Poisson distribution as follows:
    • Suppose we have a random process which has outcomes measured in some interval (time / distance / volume / etc…).
    • Let \( x \) be the random variable equal to the number of occurrences of the random process in a given interval.
      • The random variable \( x \) takes outcomes of the process to values in the range \( \mathbf{R} \) consisting of the whole numbers \( 0, 1, 2, \cdots \), up to arbitrarily large, finite values.
    • Suppose that the expected number of occurrences in any given interval is the fixed value \( \mu \).
    • Also suppose that the number of occurrences in any given interval is independent of the number of occurrences in any other given interval;
      • the value \( x \) attains during one (hour / mile / \( \mathrm{cm}^3 \) / etc…) doesn’t affect the value it attains in any other.
    • Then, the probability of exactly \( x_\alpha \) occurrences of the process in a given interval is given by, \[ \begin{align} P(x=x_\alpha) = \frac{\mu^{x_\alpha}e^{-\mu}}{x_\alpha !}, \end{align} \] where (see the computational sketch after this list):
      1. \( e \approx 2.71828 \) is known as Euler’s number, the base of the natural logarithm \( \ln \).
      2. the meaning of the “\( ! \)” for any whole number \( m \) is given by \[ \begin{align} m! &= m \times (m-1) \times (m-2) \times \cdots \times 2 \times 1,\\ \end{align} \] except for \( 0! = 1 \), which we take as a definition.
    • Note: the Poisson distribution is described entirely by \( \mu \) and the standard deviation of the Poisson distribution is always given as, \[ \sigma = \sqrt{\mu}. \]
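The probability formula above can be evaluated directly with an exponential and a factorial. Below is a minimal sketch in Python (offered as an alternative to the StatCrunch demos used in this course); the value \( \mu = 4 \) is the illustrative mail example from earlier, and the helper name poisson_pmf exists only for this sketch.

```python
from math import exp, factorial

from scipy.stats import poisson


def poisson_pmf(x_alpha, mu):
    """P(x = x_alpha) for a Poisson random variable with expected value mu."""
    return mu ** x_alpha * exp(-mu) / factorial(x_alpha)


mu = 4  # illustrative expected value: 4 pieces of mail per day
for x_alpha in range(6):
    # The direct formula and the library implementation agree.
    print(x_alpha, poisson_pmf(x_alpha, mu), poisson.pmf(x_alpha, mu))
```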

Poisson distribution continued

  • From the last slide, the probability of exactly \( x_\alpha \) occurrences of the process in a given interval is given by, \[ \begin{align} P(x=x_\alpha) = \frac{\mu^{x_\alpha}e^{-\mu}}{x_\alpha !}. \end{align} \]
  • This expression can be analyzed as follows:
    • The factor \[ \mu^{x_\alpha} \] grows exponentially as we consider larger values of \( x_\alpha \), so the numerator grows exponentially.
    • The factorial \[ x_\alpha ! = x_\alpha \times (x_\alpha - 1 ) \times (x_\alpha -2) \times \cdots \times 2 \times 1 \] in the denominator actually grows faster than exponentially,
    • however, the denominator grows more slowly than the quantity \[ x_\alpha^{x_\alpha}. \]
      • Therefore, the probability of \( x=x_\alpha \), given by the ratio \[ \frac{\mu^{x_\alpha}e^{-\mu}}{x_\alpha !}, \] rises to a peak near \( x_\alpha \approx \mu \), approximately \[ \frac{\mu^{\mu}e^{-\mu}}{\mu !}. \]
      • For values \( x_\alpha \geq \mu \), the denominator grows more quickly than the numerator;
        • thus the probability of \( x=x_\alpha \) decreases for \( x_\alpha \geq \mu \) but never reaches zero.
    • We can visualize this description in StatCrunch.
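As a rough substitute for the StatCrunch visualization, the sketch below tabulates the Poisson probabilities for the illustrative value \( \mu = 4 \) and draws a crude text histogram; the peak appears near \( x_\alpha = \mu \), and the probabilities then shrink toward zero without ever reaching it.

```python
from scipy.stats import poisson

mu = 4  # illustrative expected value; any positive mu shows the same shape
for x_alpha in range(13):
    p = poisson.pmf(x_alpha, mu)
    # Crude text histogram: one '#' per percentage point of probability.
    print(f"{x_alpha:2d}  {p:.4f}  {'#' * round(p * 100)}")
```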

Poisson distribution example

  • Let’s consider a simple example of how we can model a random procedure with a Poisson distribution.
    • Consider our first example where we suppose that we have an expected value of \( 4 \) pieces of mail per day with a standard deviation of \( \sqrt{4}=2 \) pieces of mail per day.
    • We will suppose that:
      1. the expected number of pieces of mail does not change from day to day, and
      2. that the number of pieces of mail in any given day does not affect the number of pieces of mail in a different day.
  • If we want to find the probability of receiving exactly \( x_\alpha = 6 \) pieces of mail, this can be found as, \[ \begin{align} P(x=x_\alpha) &= \frac{\mu^{x_\alpha}e^{-\mu}}{x_\alpha !} \\ &= \frac{4^{6}e^{-4}}{6 !}. \end{align} \]
  • This is a complicated expression, but one that we can look at directly in StatCrunch.
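The value can also be checked with a couple of lines of Python (a sketch, offered as an alternative to StatCrunch):

```python
from math import exp, factorial

from scipy.stats import poisson

mu, x_alpha = 4, 6  # expected 4 pieces of mail per day; probability of exactly 6
print(mu ** x_alpha * exp(-mu) / factorial(x_alpha))  # about 0.1042
print(poisson.pmf(x_alpha, mu))                       # same value from the library
```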

Poisson distribution example continued

  • Let’s consider another example of how we can use the Poisson distribution.
    • In the \( 55 \) year period between \( 1960 \) and \( 2015 \) there have been \( 336 \) Atlantic hurricanes.
    • While the variation in the intensity of hurricanes changed over this period due to global climate change, the expected number of hurricanes per year did not change.
    • Also, the number of hurricanes in one year is basically independent of the number of hurricanes in a different year.
    • Let’s make the following assumptions:
      1. We will suppose that the Poisson distribution is a good model for the number of hurricanes in a year.
      2. We will suppose that the sample mean \( \overline{x} \) is a good estimate for the expected value \( \mu \) of the number of hurricanes per year.
    • Discuss with a neighbor: can you identify the interval of interest for this problem, \( x \) the random variable and \( \overline{x} \) the sample mean?
      • The interval is the time period of one year.
      • \( x \) is the total number of hurricanes per year.
      • \( \overline{x} = \frac{336}{55} \approx 6.1 \) hurricanes per year.
    • We will now consider the probability of \( x=8 \) hurricanes in one year.
      • Using the Poisson distribution, we say that \[ P(x = 8) = \frac{(6.1)^8 e^{-6.1}}{8!} \]
      • This is a complicated expression that we will look at in StatCrunch directly.
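A minimal sketch of this computation in Python (again as an alternative to StatCrunch):

```python
from scipy.stats import poisson

mu = 336 / 55              # sample mean: about 6.1 hurricanes per year
print(poisson.pmf(8, mu))  # about 0.107, i.e., roughly a 10.7% chance
```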

Poisson distribution example continued

  • We saw in StatCrunch that the probability of \( x=8 \) hurricanes in a year is, \[ P(x = 8) = \frac{(6.1)^8 e^{-6.1}}{8!}\approx 10.7\%. \]
  • This can be considered to say, we expect about \( 10 \) or \( 11 \) years with \( 8 \) hurricanes in a \( 100 \) year period.
  • Note in the \( 55 \) year period there were exactly \( 5 \) years which had precisely \( 8 \) hurricanes in them.
  • Discuss with a neighbor: does the real outcome in terms of number of years with \( 8 \) hurricanes seem to match our theoretical model with the Poisson distribution?
    • If we consider \( 0.107\times 55 = 5.885 \), then the \( 5 \) years with exactly \( 8 \) hurricanes appear to match this theoretical model for the number of hurricanes per year (see the sketch after this list).
  • The above example is a good introduction to the kind of thinking we will start doing in this course:
    • We will have some amount of sample data and some knowledge about the way the random variable behaves.
    • Our knowledge about the way the random variable behaves will suggest a theoretical model, a probability distribution, for this random variable.
    • We will use our sample data to try to estimate the relevant parameters of the distribution.
    • We will then test our theoretical model versus data to validate it or invalidate it.
    • Finally, what remains to be done is to provide an estimate of how uncertain the parameters and the theoretical model are.
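The comparison between the model and the historical record can be reproduced with the short check below (assuming, as above, that \( \overline{x} = 336/55 \) estimates \( \mu \)):

```python
from scipy.stats import poisson

mu = 336 / 55                              # estimated expected hurricanes per year
expected_years = 55 * poisson.pmf(8, mu)
print(expected_years)                      # about 5.9 years with exactly 8 hurricanes
# The historical record shows 5 such years, reasonably close to the model.
```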

Poisson distribution as an approximation for binomial distribution

  • For theoretical reasons, the Poisson distribution is actually related to the binomial distribution.
  • In certain cases, the Poisson distribution makes a good approximation of the binomial distribution.
    • This is the case generally when \( n \) (the number of independent trials) is large; and \( p \) (the probability of a successful trial) is small.
    • Specifically, our approximation for binomial with Poisson will require:
      1. \( n \) should be greater than \( 100 \); and
      2. \( n\times p \) should be less than \( 10 \).
  • Recall that the mean (expected value) for the binomial distribution is given by, \[ \mu_b = n\times p . \]
  • When the above two conditions are satisfied, we can approximate the binomial distribution by a Poisson distribution whose mean (expected value) is taken to be \[ \mu_p = n \times p. \]
  • We will consider an example of this with the Maine Pick 4 Game:
    • In this game you choose \( 4 \) digits, each ranging between \( 0, 1, 2, \cdots, 9 \).
    • A winning number is selected randomly – i.e., each digit is selected randomly with equal probability and with replacement.
    • Therefore, for our choice of number there is a probability of \[ \frac{1}{10}\times \frac{1}{10}\times \frac{1}{10} \times \frac{1}{10} = \frac{1}{10,000} \] that our number is selected as the winner.
    • Discuss with a neighbor: if we play this game for \( 365 \) days, can we model the probability of winning at least once with Poisson?

Poisson distribution as an approximation for binomial distribution continued

  • From the last slide we know that there are \( n=365 \) trials and a probability of success \( p=\frac{1}{10,000} \).
  • Based on our conditions:
    1. \( n \) should be greater than \( 100 \); and
    2. \( n\times p \) should be less than \( 10 \);
  • Poisson will make a good approximation here.
  • Discuss with a neighbor: to find the probability of at least one success, how can we use complementary probability to solve this problem?
    • Notice that if \( A= \)“zero successes”, then its complement is \( \overline{A}= \)“at least one success”.
  • Using this fact, we can approximate the probability of \( \overline{A} \) with the Poisson random variable \( x \) as \[ \begin{align} P(\overline{A}) &= 1 - P(A) \\ &\approx 1 - P(x=0)\\ &= 1 - \frac{ \left(n \times p\right)^0 \times e^{- n \times p}}{0!} \\ &=1 - \frac{1 \times e^{- 0.0365}}{1} \approx 0.0358. \end{align} \]
  • In fact, using StatCrunch, we can see that the probability of at least \( 1 \) success using the binomial distribution is also approximately \( 0.0358 \), so we get a good approximation here.
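A sketch of this comparison in Python; scipy.stats.binom gives the exact binomial probability and scipy.stats.poisson the approximation:

```python
from scipy.stats import binom, poisson

n, p = 365, 1 / 10_000    # one play per day for a year; win probability per play
mu = n * p                # 0.0365, the mean used for the Poisson approximation

p_exact = 1 - binom.pmf(0, n, p)    # exact probability of at least one win
p_approx = 1 - poisson.pmf(0, mu)   # Poisson approximation
print(p_exact, p_approx)            # both about 0.0358
```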

Continuous random variables

Histogram of probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • So far our examples have focused on discrete random variables, e.g.:
    • Results of coin flips – \( x \) is modeled with a binomial distribution.
    • Results of success / failure trials – \( x \) is modeled with a binomial distribution.
    • Number of occurrences in an interval – \( x \) is modeled with a Poisson distribution.
  • We will now turn our attention to continuous random variables, but we will use what we learned about discrete variables to motivate this.
  • Recall that the probability histogram had the property, \[ \begin{align} \text{Area of Rectangle }x_\alpha &= P(x=x_\alpha) \times 1\\ &= P(x=x_\alpha). \end{align} \]
  • We also saw that we have the property \[ \sum_{x_\alpha \in \mathbf{R}} P(x=x_\alpha) =1. \]
  • Putting the above two properties together, we know, \[ \sum_{x_\alpha \in \mathbf{R}} \text{Area of Rectangle }x_\alpha =1. \]
  • For continuous random variables, we in fact have the same property with a minor modification:
    Let \( f(x) \) describe a curve for a probability distribution. Then the total area under the curve \( f(x) \) equals \( 1 \), and the probability of any event \( A \) equals the associated area under \( f(x) \) over the values \( x_\alpha \) belonging to \( A \).

Uniform distribution

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • A basic example of the area property is the uniform distribution.
    • Let’s suppose that we are studying some procedure where all outcomes are equally likely.
      • A very simple example is if you are asked to guess a random number between \( 1 \) and \( 10 \), but including decimals.
    • That is, we will suppose that guessing \( 1.23453453 \) is equally likely as guessing \( 5 \).
  • Viewed in the framework above:
    • Our random experiment is guessing some number.
    • The outcome is one guess.
    • The random variable \( x \) is assigned the value of the guess.
  • Because we allow arbitrary decimal expansions, there are infinitely many choices.
  • However, all choices lie in the finite range \( [1,10] \) and are equally likely.
  • Discuss with a neighbor: if the area under the curve \( f(x) \) for \( x_\alpha \) in \( [1, 10] \) must equal one, and the height of \( f(x) \) is constant, what is the height?

Uniform distribution continued

Uniform distribution curve.

Courtesy of IkamusumeFan CC via Wikimedia Commons

  • Recall, the area is given by the height \( h \) times the width \( w \).
  • The area is fixed at \( 1 \) and the width is \( w = 10-1 = 9 \), so that \[ \begin{align} &1 = h\times 9 \\ \Leftrightarrow & h = \frac{1}{9}. \end{align} \]
  • Here, the probability distribution curve, \[ f(x) = \begin{cases} \frac{1}{9} & \text{for }x\text{ in }[1,10] \\ 0 & \text{else} \end{cases} \]
  • More generally, consider any range of values \( [a,b] \) where \( a < b \).
  • If we can randomly select any value in the range \( [a,b] \) with the same likelihood, let \( x \) be the random variable assigned the value we select.
  • Then the probability distribution for \( x \) is uniform over \( [a,b] \) with \[ f(x) = \begin{cases} \frac{1}{b-a} & \text{for }x\text{ in }[a,b] \\ 0 & \text{else} \end{cases} \]
  • The graph of this distribution curve (as above) is called the probability density.
  • Thus, if we take any \( \alpha < \beta \) in \( [a, b] \), the probability of \( x \) in \( [\alpha, \beta] \) is given by the area (width times the height of this block), \[ (\beta - \alpha) \times \frac{1}{b-a}. \]
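This interval probability can also be computed with scipy.stats.uniform, which is parameterized by loc \( = a \) and scale \( = b-a \). In the sketch below, \( [a,b] = [1,10] \) comes from the guessing example above, while the sub-interval \( [\alpha,\beta] = [2.5, 4] \) is a hypothetical choice used only for illustration.

```python
from scipy.stats import uniform

a, b = 1, 10             # the guessing range from the example above
alpha, beta = 2.5, 4.0   # hypothetical sub-interval of interest

# scipy's uniform distribution is parameterized by loc = a and scale = b - a.
dist = uniform(loc=a, scale=b - a)
print(dist.cdf(beta) - dist.cdf(alpha))   # area of the block under f(x)
print((beta - alpha) / (b - a))           # same value from width times height
```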

Uniform distribution example

Uniform distribution over 0 to 5.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s consider a real-life example of the uniform distribution.
    • During certain times at the RNO airport, waiting times in security are uniformly distributed in the interval between \( 0 \) and \( 5 \) minutes.
    • This is to say that:
      • All waiting times in the interval \( [0,5] \) are equally likely.
      • The waiting time can be measured to an arbitrary decimal place,
        • e.g., you could wait exactly 1.3534543 minutes.
      • The probability distribution for waiting times is given as \[ f(x) = \begin{cases} \frac{1}{5} & \text{for }x\text{ in }[0,5] \\ 0 & \text{else} \end{cases} \]
    • And, the probability of waiting some amount of time is equal to the associated area under \( f(x) \).
  • Discuss with a neighbor: what is the probability of waiting between \( 2 \) and \( 5 \) minutes at RNO security, if waiting time \( x \) is uniformly distributed over \( [0,5] \)?

Uniform distribution example continued

Uniform distribution over 0 to 5.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We want to find the area under \( f(x) \) defined as, \[ f(x) = \begin{cases} \frac{1}{5} & \text{for }x\text{ in }[0,5] \\ 0 & \text{else} \end{cases} \]
  • but for the interval between \( 2 \) and \( 5 \).
  • The area can be derived as the length times height where, \[ \begin{align} l = 5 - 2 = 3 & & h = 0.2 \text{ for }x\text{ in the range }[2,5] \end{align} \]
  • That is, \[ \begin{align} P(x \text{ in }[2, 5]) &= l \times h \\ &= 3 \times 0.2 = 0.6, \end{align} \] as verified numerically in the sketch after this list.
  • Note that the formula for the area, \[ \text{Area} = l \times h \] only applies for rectangles as above.
  • However, the principle of,

    \[ \text{Probability } = \text{Area under the probability density graph} \]
    holds for all distributions \( f(x) \).
  • In particular, this also holds for non-rectangular, bell-shaped curves.
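A minimal sketch of the RNO waiting-time calculation in Python, using scipy.stats.uniform with loc \( = 0 \) and scale \( = 5 \):

```python
from scipy.stats import uniform

# Waiting time uniformly distributed over [0, 5] minutes (loc = 0, scale = 5).
wait = uniform(loc=0, scale=5)
print(wait.cdf(5) - wait.cdf(2))  # 0.6, matching the 3 x 0.2 rectangle area
```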

The normal distribution

Standard normal distribution.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall, frequency data is called normal when it exhibits the following:
    • The frequencies start low, then increase to one or two high frequencies, and then decrease to a low frequency.
    • The distribution is approximately symmetric.
    • There are few if any extreme values.
  • We have a theoretical probability model for this type of data called a normal distribution.
  • Let \( x \) be a random variable with mean \( \mu \) and standard deviation \( \sigma \) which behaves as above.
  • We say that \( x \) has a normal distribution, \[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-1}{2}\left(\frac{x- \mu}{\sigma}\right)^2} \] with the parameters \( \mu \) and \( \sigma \).
  • Note: the area under \( f(x) \) cannot be computed by \( l\times h \), but the total area under the above curve is still \( 1 \).
    • Therefore, accurately computing the probability of some event is typically done with computer methods – like in StatCrunch.
  • A special case of the normal distribution is the standard normal distribution with mean \( \mu =0 \) and standard deviation \( \sigma=1 \), \[ f(x) = \frac{1}{\sqrt{2\pi}} e^{\frac{-1}{2}x^2} \]
  • For the standard normal distribution as above, the z-score of some observation \( x \) is actually just equal to \( x \).
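As noted above, areas under a normal curve are usually found with computer methods. Below is a minimal sketch using scipy.stats.norm; the parameters \( \mu = 100 \) and \( \sigma = 15 \) are hypothetical values chosen only for illustration.

```python
from scipy.stats import norm

mu, sigma = 100, 15   # hypothetical parameters chosen for illustration

dist = norm(loc=mu, scale=sigma)
# Areas under the bell curve come from the cdf, not from length times height.
print(dist.cdf(mu + sigma) - dist.cdf(mu - sigma))  # about 0.6827 within one sigma

standard = norm(loc=0, scale=1)                     # the standard normal distribution
print(standard.cdf(1) - standard.cdf(-1))           # same area, stated via z-scores
```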