Probability distributions part II

03/10/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, and research.

Outline

  • The following topics will be covered in this lecture:
    • Review of random variables
    • Review of distributions
    • Binomial distribution
    • Parameters of the binomial distribution

Random variables

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • Let us recall the idea of a random variable.
  • Prototypically, we can consider the coin flipping example from the motivation:
    • \( x \) is the number of heads in two coin flips.
  • Every time we repeat two coin flips \( x \) can take a different value due to many possible factors:
    • how much force we apply in the flip;
    • air pressure;
    • wind speed;
    • etc…
  • The result is so sensitive to these factors, which are beyond our ability to control, that we consider the result to be determined by chance.
  • Before we flip the coin twice, the value of \( x \) is yet to be determined.
  • After we flip the coin twice, the value of \( x \) is fixed and possibly known.
  • Formally we will define:
    • Random variable – a variable that has a single numerical value, determined by chance, for each outcome of a procedure.

Random variables continued

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • Suppose we are considering our sample space \( \mathbf{S} \) of all possible outcomes of a random process.
  • Then for any particular outcome of the process,
    • e.g., for the coin flips one outcome is \( \{H,H\} \),
  • mathematically the random variable \( x \) takes the outcome to the numerical value \( x=2 \) in the range \( \mathbf{R} \).
  • Note: \( x \) must always take a numerical value.
  • Because a random variable takes a numerical value (not categorical), we must consider the units that \( x \) takes:
    • Discrete random variable – these take numerical values that are in counting units.
      • In particular, the unit of \( x \) cannot be arbitrarily sub-divided.
        • We can think of “how many heads in two coin flips” as measured in counting units, because \( 1.45 \) heads does not make sense.
      • However, the values \( x \) takes don’t strictly need to be whole numbers;
        • the units just cannot be arbitrarily sub-divided.
      • The scale of units for \( x \) can be finite or infinite depending on the problem.

Random variables continued

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

    • Continuous random variable – these take numerical values that are in continuous units.
      • The units of \( x \) can be arbitrarily sub-divided and \( x \) can take any value in the sub-divided units.
      • Necessarily, \( x \) can take infinitely many values when it is continuous.
        • A good example to think of is if \( x \) is the daily high temperature in Reno in degrees Celsius.
        • If we had a sufficiently accurate thermometer, we could measure \( x \) to an arbitrary decimal place and it would make sense.
        • \( x \) thus takes today’s weather from the outcome space and gives us a number in a continuous unit of measurement.

Probability distributions

Probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Given a random variable, our method for analyzing its behavior is typically through a probability “distribution”.
  • Probability distribution – this is a description that gives the probability for each possible value of the random variable.
    • A probability distribution can thus be considered a complete description of the random variable.
      • For any possible value that \( x \) might attain given any possible outcome, we know with what probability this will occur.
    • It is often expressed in the format of a table, formula, or graph.
  • We see that the table above is a probability distribution as this gives every possible value for \( x \) its associated probability.
  • Notice if we consider the sum of \( P(x=x_\alpha) \) over all possible \( x_\alpha \) in the range of \( x \), \( \mathbf{R} \), \[ \sum_{x_\alpha\in \mathbf{R}}P(x=x_\alpha) = 1. \]
    • In fact, this holds for any \( x \) and its associated distribution – intuitively, consider \[ P(x=0 \text{ or } x=1 \text{ or } x=2) = 1 \] because this is all possible values that \( x \) can attain.
    • However, the events \( x=0 \), \( x=1 \), and \( x=2 \) are disjoint, so that \[ P(x=0 \text{ or } x=1 \text{ or } x=2) = P(x=0) + P(x=1) + P(x=2) = 1. \]
    • The same intuition can be used for infinite ranges when we use calculus to define this more formally.
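The two-coin-flip distribution, and the fact that its probabilities sum to one, can be checked with a short brute-force enumeration. This is an illustrative sketch of our own, not part of the slides:

```python
from itertools import product

# Enumerate all outcomes of two fair coin flips and tally the number of heads.
outcomes = list(product("HT", repeat=2))
dist = {}
for outcome in outcomes:
    heads = outcome.count("H")
    dist[heads] = dist.get(heads, 0) + 1 / len(outcomes)

# P(x = 0), P(x = 1), P(x = 2) and their total, which must equal 1
print({k: dist[k] for k in sorted(dist)})  # {0: 0.25, 1: 0.5, 2: 0.25}
print(sum(dist.values()))                  # 1.0
```

The loop assigns each of the four equally likely outcomes a probability of \( 1/4 \) and groups them by the value of \( x \), which is exactly how the table in the slide is built.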

Probability distributions continued

Histogram of probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We can graphically represent the probability distribution with a histogram similarly to how we represent a relative frequency distribution.
  • Notice, all values for the probability distribution, \[ 0 \leq P(x=x_\alpha) \leq 1 \] for any value \( x_\alpha \) that \( x \) can take.
    • This is similar to a relative frequency distribution,
    • each \( P(x=x_\alpha) \) represents a proportion of all possible ways \( x \) can equal \( x_\alpha \) relative to all possible outcomes.
  • In the horizontal axis, the centers of the rectangles are at the attainable values for \( x \).
    • The width of each rectangle is also equal to \( 1 \).
  • Therefore, if we take the area of the rectangle corresponding to some value \( x_\alpha \), we have, \[ \begin{align} \text{Area of Rectangle }x_\alpha &= P(x=x_\alpha) \times 1\\ &= P(x=x_\alpha). \end{align} \]
  • This says that the histogram is an identical representation of the probability distribution.

Binomial distribution

  • A coin flipping experiment is actually a simple example of a broad category of experiments.
    • For example, we can consider an experiment in which we describe two possible outcomes:
      1. (success / \( S \) / \( 1 \) / \( H \)); or
      2. (failure / \( F \) / \( 0 \) / \( T \)).
    • We can encode the outcomes any way we like;
      • it is common to encode the outcomes as \( S \) or \( F \), where the choice of “success” is arbitrary.
    • More generally than coin flipping, we might consider the case where the probabilities, \[ \begin{align} P(\text{success}) \neq P(\text{failure}) \end{align} \]
    • Recall, if the experiment has only two possible outcomes and \( A= \)"success", then \( \overline{A}= \)"failure".
    • Therefore, \[ \begin{align} P(\text{success}) + P(\text{failure}) = 1. \end{align} \]
    • Suppose we run the experiment a total of \( n \) trials.
      • Because there are only two possible outcomes for each trial, and a finite number of trials, we can create a list of all possible outcomes for \( n \) trials.
        • For example, if there are \( x_\alpha \) total successes, there must be exactly \( n - x_\alpha \) failures in \( n \) trials.
      • More importantly, we can also make a list of all possible ways we can have \( x_\alpha \) successes and \( n-x_\alpha \) failures.
      • Let \( x \) be the random variable equal to the number of successful trials – we can therefore calculate the probability, \[ P(x = x_\alpha); \] however, the classical model for probability (equal probability of all outcomes) will no longer apply.

Binomial distribution continued

  • Recall our random variable \( x \) equal to the number of successful trials in \( n \) total trials.
    • Unlike with coin flipping, we suppose that it is possible for \[ \begin{align} P(\text{success}) \neq P(\text{failure}) \end{align} \]
  • However, there are finitely many trials and finitely many possible outcomes, and, for each possible number of successes \( x_\alpha \), there are a finite number of ways \( x=x_\alpha \).
    • Provided all trials are independent (like coin flipping) and the probability of success is constant, we can still make a counting argument using
      1. The rule of complementary probability \[ \begin{align} P(\text{success}) + P(\text{failure}) = 1; \end{align} \]
      2. independence;
      3. the list of all possible ways we can make \( x_\alpha \) successes;
      4. the list of all possible ways we can make \( n-x_\alpha \) failures; and
      5. a total of \( n \) trials exactly;
    • to compute the probability exactly for each \( x_\alpha \) where \( x_\alpha \) ranges from \( 0, 1, \cdots, n \).
    • The list of all possible number of successes \( x_\alpha = 0, 1, \cdots, n \) and the associated probabilities \( P(x= x_\alpha) \) for \( x_\alpha = 0, 1, \cdots, n \) is called the binomial distribution.
  • The argument itself is somewhat long, but it really only uses tools we already know.
    • Therefore, if you can understand the principles of the points 1 - 5 above, we don’t need to belabor the details in this class.

Binomial distribution continued

  • Formally, we will now describe the binomial distribution.
    • Suppose we run an experiment with two possible outcomes \( S= \)"success" and \( F= \)"failure", where \[ \begin{align} P(S) = p && P(F) = 1 - P(S) = q. \end{align} \]
    • Suppose we run exactly \( n \) total trials of the above experiment and suppose that:
      1. each trial is independent; and
      2. \( P(S)=p \) for every trial.
    • Let \( x \) be the random variable equal to the total number of successful trials.
    • Let \( x_\alpha \) be one of the possible number of successful trials in the range \( 0, 1 ,\cdots , n \).
      • Then the probability of exactly \( x_\alpha \) successful trials (the event \( x= x_\alpha \)) is given by \[ \begin{align} P(x=x_\alpha) = \frac{n!}{\left( n - x_\alpha\right)! x_\alpha !} p^{x_\alpha} q^{(n - x_\alpha)}, \end{align} \]
        1. where the meaning of the “\( ! \)” for any whole number \( m \) is given by \[ \begin{align} m! &= m \times (m-1) \times (m-2) \times \cdots \times 2 \times 1,\\ \end{align} \] i.e., this is the descending product of all whole numbers less than or equal to \( m \) and greater than zero, except for \( 0! = 1 \) which we take as definition.
        2. The total number of ways that we can have exactly \( x_\alpha \) successes in \( n \) trials is given by \[ \frac{n!}{\left( n - x_\alpha\right)! x_\alpha !}. \]
        3. The probability of \( x_\alpha \) independent successes (or \( n-x_\alpha \) independent failures) is \( p^{x_\alpha} \) \( \big( \) or \( q^{(n-x_\alpha)}\big) \) respectively.
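As a sanity check, the formula can be written as a small Python function (`binomial_pmf` is our own illustrative helper name, not something from the slides); applied to two fair coin flips it reproduces the distribution \( 0.25, 0.5, 0.25 \) from earlier:

```python
from math import factorial

def binomial_pmf(n, x_alpha, p):
    """P(x = x_alpha) for n independent trials with success probability p."""
    q = 1 - p
    # Number of orderings with exactly x_alpha successes: n! / ((n - x_alpha)! x_alpha!)
    ways = factorial(n) // (factorial(n - x_alpha) * factorial(x_alpha))
    # Each ordering occurs with probability p^x_alpha * q^(n - x_alpha)
    return ways * p**x_alpha * q**(n - x_alpha)

# Two fair coin flips, x = number of heads
print([binomial_pmf(2, x, 0.5) for x in range(3)])  # [0.25, 0.5, 0.25]
```

Note that the probabilities over all \( x_\alpha = 0, 1, \cdots, n \) sum to one for any \( n \) and \( p \), as required of a probability distribution.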

Binomial distribution example

  • Recall our notation:
    1. \( n \) - the number of trials;
    2. \( x \) - the random variable;
    3. \( x_\alpha \) - a specific number of successes that \( x \) could possibly attain;
    4. \( P(S)= p \) - the probability of an independent trial’s success;
    5. \( P(F)=q \) - the probability of an independent trial’s failure.
  • Consider that when an adult is randomly selected with replacement, there is a \( 0.85 \) probability that this person knows what Twitter is (based on results from a Pew Research Center survey).
  • Suppose that we want to find the probability that exactly three of five random adults know what Twitter is.
  • Discuss with a neighbor: can you identify what \( n \), \( x \), \( x_\alpha \), \( p \) and \( q \) are in the above word problem?
    • Here we consider the random selection to be a “trial” so that the number of trials is \( n=5 \);
    • If we consider a “successful” trial to be “select an adult who knows what Twitter is”, then \( x \) is “number of adults who know what Twitter is out of five”.
    • \( x_\alpha \) is the specific number of successful trials we are interested in, i.e., \( x_\alpha = 3 \).
    • \( p \) is the probability of an independent trial’s success, i.e., \( p=0.85 \);
    • \( q \) is the probability of an independent trial’s failure, i.e., \( q=1-p = 0.15 \).

Binomial distribution example continued

  • Let’s recall our values from the last slide,
    • Here we consider the random selection to be a “trial” so that the number of trials is \( n=5 \);
    • If we consider a “successful” trial to be “select an adult who knows what Twitter is”, then \( x \) is “number of adults who know what Twitter is out of five”.
    • \( x_\alpha \) is the specific number of successful trials we are interested in, i.e., \( x_\alpha = 3 \).
    • \( p \) is the probability of an independent trial’s success, i.e., \( p=0.85 \);
    • \( q \) is the probability of an independent trial’s failure, i.e., \( q=1-p = 0.15 \).
  • Suppose we wanted to compute the probability of one particular outcome,
    • say, \( S_i = \)"the \( i \)-th participant knows what Twitter is" and \( F_i= \)"the \( i \)-th participant does not know what Twitter is", where \[ A = S_1 \text{ and } S_2 \text{ and } S_3 \text{ and } F_4 \text{ and } F_5. \]
    • We can use independence and the multiplication rule to show \[ \begin{align} P(A) &= P(S_1)\times P(S_2)\times P(S_3)\times P(F_4)\times P(F_5) \\ &= 0.85 \times 0.85 \times 0.85 \times 0.15 \times 0.15 \\ &= 0.85^3 \times 0.15^2. \end{align} \]
  • This shows how we get one part of the binomial distribution formula.
    • However, there are many combinations of \( S_i \) and \( F_i \) that arise in \( x=3 \).
  • Using a counting argument, we can show that the total number of ways \( x=3 \) is \[ \begin{align} \frac{n!}{(n- x_\alpha)! x_\alpha!} = \frac{5!}{(5 - 3)! 3!} = \frac{5!}{(2)!3!} = 10 \end{align} \]
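The count of \( 10 \) can be verified by brute force, listing every length-five sequence of \( S \)'s and \( F \)'s with exactly three \( S \)'s (an illustrative sketch of our own, not from the slides):

```python
from itertools import product

# All length-5 trial sequences over {S, F} with exactly three successes
sequences = [s for s in product("SF", repeat=5) if s.count("S") == 3]
print(len(sequences))  # 10
```

Each of these ten sequences occurs with the same probability \( 0.85^3 \times 0.15^2 \), which is why the counting factor simply multiplies that probability.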

Binomial distribution example continued

  • Let’s recall our values from the last slide,
    • Here we consider the random selection to be a “trial” so that the number of trials is \( n=5 \);
    • If we consider a “successful” trial to be “select an adult who knows what Twitter is”, then \( x \) is “number of adults who know what Twitter is out of five”.
    • \( x_\alpha \) is the specific number of successful trials we are interested in, i.e., \( x_\alpha = 3 \).
    • \( p \) is the probability of an independent trial’s success, i.e., \( p=0.85 \);
    • \( q \) is the probability of an independent trial’s failure, i.e., \( q=1-p = 0.15 \).
    • The total number of ways \( x=3 \) is \[ \begin{align} \frac{n!}{(n- x_\alpha)! x_\alpha!} = \frac{5!}{(5 - 3)! 3!} = \frac{5!}{(2)!3!} = 10 \end{align} \]
  • The binomial distribution formula can then be read as,
  • The probability of finding exactly \( 3 \) out of \( 5 \) independently, randomly selected adults who know what Twitter is, is equal to \[ \begin{align} \frac{n!}{(n- x_\alpha)! x_\alpha!} p^{x_\alpha} q^{n- x_\alpha} = 10 \times 0.85^3 \times 0.15^2 \approx 0.138, \end{align} \]
  • or in plain English,
    the probability of three independent successful trials, times the probability of two independent failure trials, times all possible ways we can have exactly \( 3 \) successful trials out of five.
  • Again, the counting argument can be somewhat long and technical so it will not be the focus of the course,
    • however, it is important that you understand the pieces of the formula and how they fit together.
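Putting the pieces together in Python reproduces the worked value (variable names here are our own; the calculation follows the slide's formula exactly):

```python
from math import factorial

# n trials, x_alpha successes of interest, p = P(success)
n, x_alpha, p = 5, 3, 0.85
q = 1 - p
# Counting factor: number of orderings of 3 successes among 5 trials
ways = factorial(n) // (factorial(n - x_alpha) * factorial(x_alpha))
# Binomial probability: orderings times probability of each ordering
prob = ways * p**x_alpha * q**(n - x_alpha)
print(ways, round(prob, 3))  # 10 0.138
```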

Binomial distribution example continued

  • Let’s now take a graphical look at the last problem in StatCrunch.
  • We should remark the following on the last calculation.
    • Technically, we could only make use of the binomial distribution because we sampled with replacement to enforce independent trials.
    • If we sampled our population without replacement, we know
      1. that the trials are dependent; and
      2. that the probability of success changes at each trial.
    • These conditions make it so the binomial distribution does not apply to the random variable \( x \) when we do not replace samples.
    • However, it is common to approximate sampling without replacement as independent when the sample size is less than \( 5\% \) of the population.
    • In practice for polls of, e.g., all US adults, this approximation will often be used.
    • However, in this class, we will only use this approximation when the problem specifically calls for the approximation.

Binomial distribution technology example

  • While it is important to understand the pieces and the principles that go into the binomial formula \[ \begin{align} P(x=x_\alpha) = \underbrace{\frac{n!}{\left( n - x_\alpha\right)! x_\alpha !}}_{(1) } \times \underbrace{ p^{x_\alpha}}_{(2)} \times \underbrace{q^{(n - x_\alpha)}}_{(3)} \end{align} \] as:
    1. Total number of ways to find exactly \( x_\alpha \) successful trials out of \( n \) total trials;
    2. Probability of \( x_\alpha \) independent successful trials;
    3. Probability of \( n-x_\alpha \) independent failure trials;
  • in practice, we will usually let technology handle the complicated calculation.
  • Because we will let technology handle these calculations, we should understand how the pieces fit together without blindly entering values into formulas.
  • We will now consider a more complicated example:
    • Before 2012, the NFL used to decide overtime games by a coin flip, where the winner could decide whether they kicked or received the ball in a fresh play.
    • Between 1974 and 2011, 460 overtime games did not end in a tie.
    • 252 of these games were won by the team that won the coin toss and got to decide whether to kick or receive.
    • Let’s assume that the probability of winning or losing an overtime game is equally likely.
    • Discuss with a neighbor: if we want to find the probability of winning exactly \( 252 \) games, can you identify what \( n \), \( x \), \( x_\alpha \), \( p \) and \( q \) are in the above word problem?

Binomial distribution technology example continued

  • From the last slide, we will consider, \( n= 460 \) independent trials (overtime games).
  • \( x \) is the number of wins in overtime.
  • \( x_\alpha=252 \) is the specific number of successful trials we are interested in.
  • \( p=q=0.5 \) because (we assume) either outcome is equally likely.
  • Therefore, we have, \[ \begin{align} P(x=x_\alpha) &= \frac{n!}{\left( n - x_\alpha\right)! x_\alpha !} \times p^{x_\alpha} \times q^{(n - x_\alpha)}\\ &=\frac{460!}{(460 - 252)!252!} \times 0.5^{252} \times 0.5^{(460 - 252)} \\ &= \frac{460!}{208!252!}\times 0.5^{252} \times 0.5^{208} \end{align} \]
  • This is a complicated expression, so therefore we will examine this in StatCrunch directly.
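Python's exact integer arithmetic handles the large factorials without trouble; `math.comb` computes the counting factor \( n!/\big((n-x_\alpha)!\, x_\alpha!\big) \) directly. This is our own sketch of the calculation the slides hand off to StatCrunch:

```python
from math import comb

# n = 460 games, x_alpha = 252 wins, p = q = 0.5 (assumed fair)
n, x_alpha, p = 460, 252, 0.5
prob = comb(n, x_alpha) * p**x_alpha * (1 - p)**(n - x_alpha)
print(prob)  # a small but nonzero probability, on the order of 0.005
```

Note that this is the probability of exactly \( 252 \) wins; for significance we instead need the probability of an outcome at least this extreme, which is what the next slide computes.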

Binomial distribution technology example continued

  • Consider the last example where:
    • Between 1974 and 2011, 460 overtime games did not end in a tie.
    • 252 of these games were won by the team that won the coin toss and got to decide whether to kick or receive.
    • Let’s assume that the probability of winning or losing an overtime game is equally likely.
    • Another way of saying this is that,
      • we will assume that the result of the coin flip has no impact on whether the team wins or loses.
  • Recall, we saw earlier that when observing some outcome, e.g., \( x = 252 \), if the probability of observing some outcome at least as extreme is less than \( 5\% \), we call this interesting or significant.
  • If \( 252 \) successful trials is significant while assuming that winning the coin flip makes no difference to the outcome,
    • i.e., assuming \( p=q=0.5 \),
  • then we should question this assumption.
  • We will look at this in StatCrunch directly – while we do so, discuss with a neighbor:
    • is the \( 252 \) wins significant for the binomial distribution for \( n=460 \) trials and equal probability of success and failure?

Binomial distribution technology example continued

  • The process that we took in the last example was what we earlier called assuming the null hypothesis.
  • We assumed that the probability of winning or losing overtime was equally likely;
    • in particular, it shouldn’t depend on the coin toss.
  • However, we found that the probability of finding some event at least as extreme was given by, \[ P(x \geq 252) \approx 0.0224 \] in the case that winning or losing is equally likely.
  • Because this is less than \( 5\% \), we called this observation significant, and this strongly suggests that winning the coin flip gave the team an advantage in overtime.
  • This is precisely why the overtime rule was changed in 2012.
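The tail probability \( P(x \geq 252) \) can be reproduced by summing the binomial formula over \( x_\alpha = 252, \cdots, 460 \); since \( p = q = 0.5 \), every term shares the factor \( 0.5^{460} \). This is our own check of the value the slides obtain from StatCrunch:

```python
from math import comb

n, p = 460, 0.5
# P(x >= 252): sum the binomial probabilities over the right tail.
# With p = q = 0.5, each term is comb(n, x) * 0.5**n.
tail = sum(comb(n, x) * p**n for x in range(252, n + 1))
print(round(tail, 4))  # 0.0224, matching the slide
```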

Parameters of the binomial distribution

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • We saw earlier the following definitions for the mean and the standard deviation of a probability distribution:
    • Suppose we have a random variable \( x \) which assigns a numerical value to each outcome in the sample space \( \mathbf{S} \).
    • Suppose all values that \( x \) can attain are given by a collection \( \{x_\alpha\} \) in the range \( \mathbf{R} \) of \( x \).

  • Then the mean (or expected value) of the probability distribution is given, \[ \mu = \sum_{x_\alpha \in \mathbf{R}} x_\alpha P(x=x_\alpha) \]
  • The standard deviation of the probability distribution is given \[ \sigma = \sqrt{\sum_{x_\alpha\in \mathbf{R}} P(x=x_\alpha) \left(x_\alpha - \mu\right)^2 } \]
  • These formulas hold for all probability distributions (with a slight modification when the variable is continuous by using calculus).
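Applied to the two-coin-flip distribution, these formulas give \( \mu = 1 \) and \( \sigma = \sqrt{0.5} \); a small illustrative computation of our own:

```python
from math import sqrt

# P(x = x_alpha) for x_alpha heads in two fair coin flips
dist = {0: 0.25, 1: 0.5, 2: 0.25}

# mu = sum of x_alpha * P(x = x_alpha) over the range of x
mu = sum(x * p for x, p in dist.items())
# sigma = sqrt of sum of P(x = x_alpha) * (x_alpha - mu)^2
sigma = sqrt(sum(p * (x - mu) ** 2 for x, p in dist.items()))
print(mu, sigma)  # 1.0 0.7071067811865476
```

On average we expect one head in two flips, with a typical deviation of about \( 0.71 \) heads.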

Parameters of the binomial distribution continued

Random variables are the numerical measure of the outcome of a random process. Public domain via Wikimedia Commons

  • The binomial distribution has a very nice structure so that the parameters have a nice form.
  • For the binomial distribution the mean is given as, \[ \mu = n \times p . \]
  • For the binomial distribution the variance is given as, \[ \sigma^2 = n \times p \times q. \]
  • For the binomial distribution the standard deviation is given as, \[ \sigma = \sqrt{ n \times p \times q}. \]
  • Discuss with a neighbor: what are \( \mu \) and \( \sigma \) for the binomial distribution for \( 20 \) trials and probability of success \( p=0.5 \)?
    • Notice that these are given as, \[ \begin{align} \mu = 20 \times 0.5 = 10 & & \sigma = \sqrt{20 \times 0.5 \times 0.5} = \sqrt{ 5}\end{align} \]
  • Discuss with a neighbor: what are \( \mu \) and \( \sigma \) for the binomial distribution for \( 40 \) trials and probability of success \( p=0.5 \)?
    • Notice that these are given as, \[ \begin{align} \mu = 40 \times 0.5 = 20 & & \sigma = \sqrt{40 \times 0.5 \times 0.5} = \sqrt{ 10}\end{align} \]
  • Discuss with a neighbor: what are \( \mu \) and \( \sigma \) for the binomial distribution for \( 20 \) trials and probability of success \( p=0.7 \)?
    • Notice that these are given as, \[ \begin{align} \mu = 20 \times 0.7 = 14 & & \sigma = \sqrt{20 \times 0.7 \times 0.3} = \sqrt{ 4.2}\end{align} \]
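For the last case we can confirm that the shortcut formulas agree with the general definitions by summing over the whole distribution (a verification sketch of our own; `math.comb` supplies the counting factor):

```python
from math import comb, sqrt

# n = 20 trials, p = 0.7: build the full binomial distribution
n, p = 20, 0.7
q = 1 - p
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

# General formulas: mu = sum x*P(x), sigma^2 = sum P(x)*(x - mu)^2
mu = sum(x * px for x, px in pmf.items())
sigma = sqrt(sum(px * (x - mu) ** 2 for x, px in pmf.items()))
print(round(mu, 6), round(sigma ** 2, 6))  # 14.0 4.2 -- i.e., n*p and n*p*q
```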

Parameters of the binomial distribution and the range-rule-of-thumb

  • Recall that:
    • for the binomial distribution the mean is given as, \[ \mu = n \times p . \]
    • For the binomial distribution the standard deviation is given as, \[ \sigma = \sqrt{ n \times p \times q} . \]
  • Let’s consider the NFL example again, using the range rule of thumb.
  • Recall that there were \( 460 \) trials with an assumed probability of \( 0.5 \) for success.
  • Discuss with a neighbor: using the range-rule-of-thumb, is \( 252 \) successful trials significant?
    • Notice, \[ \begin{align} \mu = 460 \times 0.5 = 230 & & \sigma = \sqrt{460 \times 0.5 \times 0.5} = \sqrt{ 115}\approx 10.72.\end{align} \]
    • We call observations that lie outside of the range, \[ (\mu - 2 \sigma , \mu + 2 \sigma ) \approx ( 208.56, 251.44) \] significant.
    • The total number of wins \( 252 \) is just outside of the range, so by the range-rule-of-thumb, we also call this a significant observation.
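The interval computation above can be sketched directly (the slide rounds \( \sigma \) to \( 10.72 \) before doubling, so its endpoints differ from the exact ones in the second decimal; the conclusion is the same):

```python
from math import sqrt

# NFL example: n = 460 trials, assumed p = q = 0.5
n, p = 460, 0.5
mu = n * p                      # 230
sigma = sqrt(n * p * (1 - p))   # sqrt(115)

# Range rule of thumb: values outside (mu - 2*sigma, mu + 2*sigma) are significant
low, high = mu - 2 * sigma, mu + 2 * sigma
print(round(low, 2), round(high, 2), 252 > high)  # 208.55 251.45 True
```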

Review of the binomial distribution

  • The binomial distribution is a key distribution that gives us a way to model a wide range of experiments probabilistically.
  • This applies when we run an experiment with two possible outcomes \( S= \)"success" and \( F= \)"failure", where \[ \begin{align} P(S) = p && P(F) = 1 - P(S) = q. \end{align} \]
  • When we run exactly \( n \) total trials of the above experiment, assuming that:
    1. each trial is independent; and
    2. \( P(S)=p \) for every trial.
  • We can model the probability of a particular number of successes \( x_\alpha \) like a (possibly) non-fair coin flipping experiment.
  • We model the probability of exactly \( x_\alpha \) successful trials as \[ \begin{align} P(x=x_\alpha) = \underbrace{\frac{n!}{\left( n - x_\alpha\right)! x_\alpha !}}_{(1) } \times \underbrace{ p^{x_\alpha}}_{(2)} \times \underbrace{q^{(n - x_\alpha)}}_{(3)} \end{align} \] where:
    1. Total number of ways to find exactly \( x_\alpha \) successful trials out of \( n \) total trials;
    2. Probability of \( x_\alpha \) independent successful trials;
    3. Probability of \( n-x_\alpha \) independent failure trials;
  • The special structure of this distribution also allows us to compute the mean and standard deviation directly as \[ \begin{align} \mu = n \times p & & \sigma = \sqrt{n \times p\times q} \end{align} \]