Continuous probability distributions and their parameters






  • The following topics will be covered in this lecture:

    • Continuous probability distributions
    • The probability density function
    • The cumulative distribution function
    • The expected value of a continuous random variable
    • The expected value of a function of a continuous random variable
    • The standard deviation and variance of a continuous random variable


Random variables are numerical measures of the outcomes of a random process.

Courtesy of Ania Panorska CC

  • Recall that we have a separate kind of model for random experiments in which the units of measurement can be arbitrarily sub-divided.
  • A good example to think of is if \( X \) is the daily high temperature in Reno in degrees Celsius.
  • If we had a sufficiently accurate thermometer, we could measure \( X \) to an arbitrary decimal place and it would make sense.
  • \( X \) thus takes today’s weather from the outcome space and gives us a number in a continuous unit of measurement.
    • A continuous random variable is a random variable with an interval (either finite or infinite) of real numbers for its range.
    • The range of such a random variable is uncountably infinite.
    • This is to say that if \( X \) is a continuous random variable, there is no index set \( \mathcal{I}\subset \mathbb{Z} \) that can enumerate the possible values \( X \) can attain.
    • For discrete random variables, we could enumerate the possible values with a (possibly infinite) sequence \( \{x_j\}_{j=1}^\infty \).
    • This has to do with how the infinity of the continuum \( \mathbb{R} \) is actually larger than the infinity of the counting numbers, \( \aleph_0 \);
    • in the continuum you can arbitrarily sub-divide the units of measurement.
    • The fact that a continuous sample space has an uncountably infinite number of outcomes eliminates the option of assigning a probability to each point as we did in the discrete case with the mass function.

Motivation Continued

  • We will instead look empirically at how we can construct a continuous probability as a density.
  • Suppose an electronic surveillance monitor is turned on briefly at the beginning of every hour and has a \( 0.905 \) probability of working properly (and hence a \( 0.095 \) probability of failing), regardless of how long it has remained in service.
  • Let the random variable \( X \) denote the hour at which the monitor first fails and \( A_k \) represent the event that the monitor fails in the \( k \)-th hour.
  • Then the probability mass \( f(k) \) is the product of \( k \) individual probabilities: \[ \begin{align} f(k) &= P(X= k)\\ &= P\left( \left( \bigcap_{i=1}^{k-1}\overline{A}_i \right) \cap A_k \right)\\ &=(0.905)^{k-1} \times (0.095) \end{align} \] for any given \( k=1, 2, \cdots \)
  • We plot the histogram for the probability mass function to the right.
Geometrically decaying histogram.

Courtesy of Larsen & Marx. An Introduction to Mathematical Statistics and Its Applications. 6th Edition.
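As a quick numerical check, the mass function above can be computed directly. A minimal sketch in Python (the function name `f` and the keyword `p_work` are illustrative choices; \( 0.095 = 1 - 0.905 \) is the per-hour failure probability from the setup):

```python
def f(k, p_work=0.905):
    """P(X = k): the monitor works for k-1 hours, then fails in hour k."""
    p_fail = 1 - p_work              # 0.095, per-hour failure probability
    return p_work ** (k - 1) * p_fail

# The masses decay geometrically and sum to 1 over k = 1, 2, ...
for k in range(1, 5):
    print(k, f(k))
print(sum(f(k) for k in range(1, 500)))  # very close to 1
```

The geometric decay of these masses is exactly the shape that the exponential curve in the next slide will approximate.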

Motivation Continued

  • Now we superimpose the exponential curve \( y = 0.1e^{-0.1x} \) onto the histogram.
Geometrically decaying histogram.

Courtesy of Larsen & Marx. An Introduction to Mathematical Statistics and Its Applications. 6th Edition.

  • Notice how closely the area under the curve approximates the area of the bars.
  • It follows that the probability that \( X \) lies in some given interval will be numerically similar to the integral of the exponential curve above that same interval.
  • For example, the probability that the monitor fails sometime during the first four hours would be the sum \[ \begin{align} P(1\leq X \leq 4) &= \sum_{k=1}^4 f(k) \\ &= \sum_{k=1}^4 (0.905)^{k-1} \times (0.095)\\ &= 1 - (0.905)^4 \approx 0.3292 \end{align} \]
  • The corresponding area under the exponential curve, representing continuous time between \( X=0 \) and \( X=4 \), is nearly the same: \[ \begin{align} \int_{0}^4 0.1 e^{-0.1 x}\,\mathrm{d}x = 1 - e^{-0.4} \approx 0.3297 \end{align} \]
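The agreement between the discrete sum and the continuous integral can be verified directly; a small sketch, assuming the same parameters as above (the variable names are illustrative):

```python
import math

p_work, p_fail = 0.905, 0.095

# Discrete: P(1 <= X <= 4) under the geometric mass function
discrete = sum(p_work ** (k - 1) * p_fail for k in range(1, 5))

# Continuous: area under y = 0.1 * exp(-0.1 x) from x = 0 to 4,
# using the closed-form antiderivative -exp(-0.1 x)
continuous = 1 - math.exp(-0.1 * 4)

print(round(discrete, 4))    # ~0.3292
print(round(continuous, 4))  # ~0.3297
```

The two answers differ only in the fourth decimal place, which is the sense in which the exponential curve "approximates" the discrete model.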
Motivation Continued

    • We can generally use the idea of fitting a continuous probability function to approximate an integer-valued discrete probability model.
    • The “trick” is to replace the spikes that define \( f(x) \) with rectangles whose heights are \( f(x) \) and whose widths are \( 1 \).
    • Doing that makes the sum of the areas of the rectangles corresponding to \( f(x) \) equal to \( 1 \);
      • this is the same as the total area under the approximating continuous probability function.
    • Because of the equality of those two areas, it makes sense to superimpose (and compare) the “histogram” of \( f(x) \) and the continuous probability function on the same set of axes.
    • Imagine that we are forming a frequency histogram for the measurements from a random experiment with a continuous random variable \( Y \).
    • Suppose we have measurements \( y_1,\cdots, y_n \) which we will bin into rectangles over the range for \( Y \).
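The binning procedure described above can be sketched in pure Python. Here the measurements \( y_1,\cdots,y_n \) are hypothetical draws from the exponential model used earlier, and each bar's height is the bin's relative frequency divided by the bin width, so that the bar areas sum to (approximately) one, as a density histogram should:

```python
import random

random.seed(0)
n = 10_000
# Hypothetical sample: n measurements from the exponential model with rate 0.1
ys = [random.expovariate(0.1) for _ in range(n)]

width = 5.0            # bin width, in the units of Y
bins = [0] * 12        # bins covering [0, 60); the few overflow values are dropped
for y in ys:
    j = int(y // width)
    if j < len(bins):
        bins[j] += 1

# Density heights: (count / n) / width, so sum(height * width) is the
# fraction of the sample landing in [0, 60)
heights = [c / (n * width) for c in bins]
for j, h in enumerate(heights):
    print(f"[{j * width:4.0f}, {(j + 1) * width:4.0f}): {h:.4f}")
```

Plotted as rectangles, these heights trace out roughly the same decaying shape as the curve \( y = 0.1e^{-0.1x} \), which is the empirical construction of a density that the next slides formalize.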