02/17/2021
Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.
FAIR USE ACT DISCLAIMER: This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.
The following topics will be covered in this lecture:
Courtesy of Mario Triola, Essentials of Statistics, 6th edition
Random Variable: A random variable is a function that assigns a real number to each outcome in the sample space of a random experiment.
Notation: A random variable is denoted by an uppercase letter such as \( X \). After an experiment is conducted, the measured value of the random variable is denoted by a lowercase letter such as \( x \).
Courtesy of Ania Panorska CC
A discrete random variable is a random variable with a finite (or countably infinite) range.
Courtesy of Ania Panorska CC
A continuous random variable is a random variable with an interval (either finite or infinite) of real numbers for its range.
The probability distribution of a random variable \( X \) is a description of the probabilities associated with the possible values of \( X \).
Probability Mass Function: For a discrete random variable \( X \) with possible values \( x_1, x_2,\dots, x_n \), a probability mass function is a function such that
- \( f(x_i)=P(X=x_i) \)
- \( f(x_i)\geq 0 \)
- \( \sum_{i=1}^n f(x_i)=1 \)
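The three conditions above can be checked numerically. The following sketch uses the digital-channel probabilities that appear later in this lecture as an illustrative pmf:

```python
# Check the probability mass function conditions for a discrete distribution.
# The values are the digital-channel probabilities used later in the lecture.
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

# Condition 1: f(x_i) = P(X = x_i) is a table lookup, e.g. pmf[2].
# Condition 2: every probability is nonnegative.
assert all(p >= 0 for p in pmf.values())

# Condition 3: the probabilities sum to 1.
total = sum(pmf.values())
print(total)  # 1, up to floating-point rounding
```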
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
An alternate method for describing a random variable’s probability distribution is with cumulative probabilities such as \( P(X \leq x) \).
Like a probability mass function, a cumulative distribution function provides probabilities, but over a range of outcomes \( X \leq x \) rather than at a single value.
Consider the digital channel example from earlier:
EXAMPLE: There is a chance that a bit transmitted through a digital transmission channel is received in error.
Question: How can we compute the probability that three or fewer bits are in error?
In the last slide we saw that
\[ \begin{align} P(X=0)=0.6561&\;\;P(X=1)=0.2916\\ P(X=2)=0.0486&\;\;P(X=3)=0.0036\\ P(X=4)=0.0001 & \end{align} \]
Therefore,
\[ \begin{align} P(X\leq 3) &= P(X=0 \text{ or } X=1\text{ or }X=2\text{ or }X=3)\\ &= P(X=0)+P(X=1)+P(X=2)+P(X=3) \end{align} \] by computing the probability over the union of disjoint events.
Notice that the different outcomes for \( X \) correspond to events in the sample space, as
\[ \begin{align} (X=0) \equiv\{(T,T,T,T)\} & & (X=1) \equiv \{(F,T,T,T), (T,F,T,T), (T,T,F,T), (T,T,T,F)\} , \cdots\\ \end{align} \] where these events are disjoint.
\[ P(X\leq 3) = 0.6561+0.2916+0.0486+0.0036 = 0.9999 \]
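The same computation can be sketched in Python, summing the pmf over the disjoint outcomes \( X = 0, 1, 2, 3 \):

```python
# Digital-channel pmf from the lecture: P(X = x) for x bits in error.
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

# P(X <= 3) is the probability of the union of the disjoint events
# {X = 0}, {X = 1}, {X = 2}, {X = 3}, so the probabilities add.
p_at_most_3 = sum(p for x, p in pmf.items() if x <= 3)
print(round(p_at_most_3, 4))  # 0.9999
```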
Cumulative Distribution Function: The cumulative distribution function of a discrete random variable \( X \), denoted as \( F(x) \), is \[ F(x) = P(X \leq x) = \sum_{x_i \leq x} f(x_i) \]
Properties of a Cumulative Distribution Function
- \( F(x)=P(X\leq x)=\sum_{x_i\leq x}f(x_i) \)
- \( 0\leq F(x) \leq 1 \)
- If \( x\leq y \), then \( F(x)\leq F(y) \)
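As a sketch, a CDF built directly from a pmf can be checked against the properties above (again using the digital-channel probabilities):

```python
def cdf(x, pmf):
    """F(x) = P(X <= x): sum the pmf over all values x_i <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}

# Property: 0 <= F(x) <= 1 for any x.
assert 0.0 <= cdf(2, pmf) <= 1.0

# Property: F is nondecreasing, i.e., x <= y implies F(x) <= F(y).
assert cdf(1, pmf) <= cdf(3, pmf)

print(round(cdf(1, pmf), 4))  # 0.9477 (= 0.6561 + 0.2916)
```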
Suppose a random variable \( X \) can assume only integer values.
Consider again the digital channel example
If we now want to find \( P(X\leq 1.5) \), note that \( X \) takes only integer values, so \( P(X\leq 1.5) = P(X\leq 1) = 0.6561 + 0.2916 = 0.9477 \).
Therefore \( F(x)=0.9477 \) for all \( 1\leq x < 2 \)
Moreover, we can then partition the cumulative distribution \( F \) on intervals:
\[ \begin{align} F(x) = \begin{cases} 0 & x < 0 \\ 0.6561 & 0 \leq x < 1\\ 0.9477 & 1 \leq x < 2\\ 0.9963 & 2 \leq x < 3\\ 0.9999 & 3 \leq x < 4\\ 1 & 4\leq x \end{cases} \end{align} \]
In this way we can see that \( F(x) \) is piecewise constant between the values \( x_1, x_2,\dots \)
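One way to evaluate such a piecewise-constant \( F \) is to store the breakpoints \( x_i \) and the CDF value on each interval, then look up the interval with bisection. A minimal sketch for the digital-channel CDF above:

```python
import bisect

# Breakpoints x_i and the CDF value on each interval [x_i, x_{i+1})
# for the digital-channel example.
xs = [0, 1, 2, 3, 4]
Fs = [0.6561, 0.9477, 0.9963, 0.9999, 1.0]

def F(x):
    """Piecewise-constant CDF: F(x) = 0 for x < 0, otherwise the
    value attached to the largest breakpoint x_i <= x."""
    i = bisect.bisect_right(xs, x)  # number of breakpoints <= x
    return 0.0 if i == 0 else Fs[i - 1]

print(F(1.5))   # 0.9477, constant on 1 <= x < 2
print(F(-0.3))  # 0.0
```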
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
We have now constructed the fundamental tools in probability for random experiments.
The axioms of probability, together with rules like the multiplication rule, the addition rule, the total probability rule, etc., are useful for constructing the probabilities of events in the sample space.
However, in real examples, we usually make numeric measurements \( x \) of a random variable \( X \).
The collection of all possible outcomes \( \{x_i\} \) for the random variable \( X \) partitions the sample space into disjoint events.
Therefore, we can conveniently calculate probabilities of different measurable outcomes of the experiment through a probability mass function or a cumulative distribution function.
For a given possible outcome \( x_i \), we define the probability mass function by
\[ f(x_i) = P(X=x_i) \]
which may be associated with a collection of different measurable events in the sample space.
Similarly, for any possible value \( x \), we define the cumulative distribution function as the probability of a range of values
\[ F(x) = P(X\leq x) \]
Both of these provide different representations of the probability distribution, i.e., the complete collection of possible outcomes of \( X \) and their associated probabilities.
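As a closing sketch, the two representations determine each other: \( F \) is obtained from cumulative sums of \( f \), and \( f \) is recovered from successive differences of \( F \), i.e., \( f(x_i) = F(x_i) - F(x_{i-1}) \). Illustrated with the digital-channel probabilities:

```python
pmf = {0: 0.6561, 1: 0.2916, 2: 0.0486, 3: 0.0036, 4: 0.0001}
xs = sorted(pmf)

# Build the CDF by cumulative summation of the pmf: F(x_i) = sum of f over x_j <= x_i.
F = {}
running = 0.0
for x in xs:
    running += pmf[x]
    F[x] = running

# Recover the pmf from differences of the CDF: f(x_i) = F(x_i) - F(x_{i-1}).
recovered = {xs[0]: F[xs[0]]}
for prev, x in zip(xs, xs[1:]):
    recovered[x] = F[x] - F[prev]

# Both representations carry the same distribution.
assert all(abs(recovered[x] - pmf[x]) < 1e-12 for x in xs)
```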