03/08/2021
The following topics will be covered in this lecture:
- Continuous random variables
- Probability density functions
- Cumulative distribution functions
- Mean and variance of a continuous random variable
- Analogs between discrete and continuous random variables
Courtesy of Ania Panorska CC
A continuous random variable is a random variable with an interval (either finite or infinite) of real numbers for its range.
Courtesy of Larsen & Marx. An Introduction to Mathematical Statistics and Its Applications. 6th Edition.
Courtesy of Larsen & Marx. An Introduction to Mathematical Statistics and Its Applications. 6th Edition.
Courtesy of Larsen & Marx. An Introduction to Mathematical Statistics and Its Applications. 6th Edition.
Suppose we think that a measurement is equally likely to lie in any one of the five classes
\[ \begin{align} \{[20,30), [30,40), [40, 50), [50, 60), [60, 70]\}= \cup_{i=1}^5 A_i. \end{align} \]
Treating the class membership as a discrete random variable, equal likelihood assigns probability mass \( \frac{1}{5} \) to each class; spreading this mass uniformly over the full range corresponds to the constant density
\[ \begin{align} f(x) = \frac{1}{70 - 20} = \frac{1}{50} & & \text{for } x \in [20, 70], \end{align} \]
so that \( P(X \in A_i) = 10 \times \frac{1}{50} = \frac{1}{5} \) for \( i=1,\cdots, 5 \).
Note, however, that \( f(x) \) and the histogram are not compatible, in the sense that the area under \( f(x) \) is \( 1 \) while the sum of the areas of the bars of the histogram is \( 400 \):
\[ \begin{align} \text{histogram area} = \sum_{i=1}^5 \text{width of }A_i \times \text{height of }A_i = 10(7) + 10(6) + 10(9) + 10(8) + 10(10) = 400 \end{align} \]
Nevertheless, we can make the total area of the five bars match the area under \( f(x) \) by redefining the scale of the vertical axis of the histogram: dividing each bar height by the total area \( 400 \) (equivalently, by the sample size \( 40 \) times the bin width \( 10 \)) rescales the histogram so that its total area is \( 1 \).
Courtesy of Larsen & Marx. An Introduction to Mathematical Statistics and Its Applications. 6th Edition.
Courtesy of 09glasgow09, CC BY-SA 3.0, via Wikimedia Commons
For a continuous random variable \( X \), a probability density function is a function \( f \) such that
- \( f(x)\geq 0 \) for all \( x\in \mathbb{R} \).
- \( \int_{-\infty}^\infty f(x)\mathrm{d}x = 1 \)
- \( P(a \leq X \leq b) = \int_{a}^b f(x)\mathrm{d}x \), the area under the density curve \( f(x) \) between \( a \) and \( b \), for any \( a\leq b \).
For a continuous random variable \( X \), for any \( x_1< x_2 \) \[ P(x_1 \leq X \leq x_2 ) = P(x_1 < X \leq x_2) = P(x_1 \leq X < x_2) = P(x_1 < X < x_2). \]
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
For a continuous random variable \( X \), a cumulative distribution function is a function \( F \) such that
- \( F(x) = P(X\leq x) = \int_{-\infty}^x f(u)\mathrm{d}u \) given a density function \( f \).
- \( f(x) = \frac{\mathrm{d}}{\mathrm{d}x} F(x) \) given that \( F(x) \) is differentiable.
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
For a continuous random variable \( X \), the expected value \( \mu \) is given as \[ \mu = \mathbb{E}\left[X\right]= \int_{-\infty}^{\infty} x f(x)\mathrm{d}x \]
For a continuous random variable \( X \), the expected value of \( h(X) \) is given as \[ \mathbb{E}\left[ h(X)\right] = \int_{-\infty}^{\infty} h(x) f(x)\mathrm{d}x \]
Consider the copper wire example with density function
\[ \begin{align} f(x) = \begin{cases} 5.0 & x\in[4.9, 5.1]\\ 0.0 & \text{else} \end{cases} \end{align} \]
If we wish to take the expected value of the current, we would find,
\[ \begin{align} \mu &= \int_{4.9}^{5.1} x f(x) \mathrm{d}x \\ &= \int_{4.9}^{5.1} 5.0x\mathrm{d}x \\ &= \frac{5.0}{2.0}x^2 \vert_{4.9}^{5.1} \\ &= 2.5(5.1)^2 - 2.5(4.9)^2 =5.0 \end{align} \]
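The same integral is straightforward to verify numerically; here is a minimal sketch (assuming SciPy), which also shows how \( \mathbb{E}\left[h(X)\right] \) is computed with the same pattern.

```python
from scipy.integrate import quad

# Density of the copper-wire current: f(x) = 5 on [4.9, 5.1], 0 elsewhere.
density = 5.0

# Mean: E[X] is the integral of x f(x) over the support.
mu, _ = quad(lambda x: x * density, 4.9, 5.1)
print(mu)                 # ~5.0, matching the hand computation above

# Expectation of a function of X, e.g. h(x) = x^2, follows the same pattern.
second_moment, _ = quad(lambda x: x**2 * density, 4.9, 5.1)
print(second_moment)      # ~25.0033
```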
Recall that we described the mean earlier as the center of mass; the same interpretation holds here, with the density \( f(x) \) playing the role of the distribution of mass along the real line.
For a continuous random variable \( X \), the variance \( \sigma^2 \) and standard deviation are given as \[ \begin{align} \sigma^2 &= \int_{-\infty}^{\infty} f(x)\left(x - \mu\right)^2\mathrm{d}x \\ \sigma &= \sqrt{\int_{-\infty}^{\infty} f(x)\left(x - \mu\right)^2\mathrm{d}x } \end{align} \]
We consider once again the copper wire example, with the current's density equal to
\[ \begin{align} f(x) = \begin{cases} 5.0 & x\in[4.9, 5.1]\\ 0.0 & \text{else} \end{cases} \end{align} \]
We can thus compute the variance as
\[ \begin{align} \sigma^2 &= \int_{4.9}^{5.1}f(x)\left(x - \mu\right)^2\mathrm{d}x \\ &= \int_{4.9}^{5.1}5.0 \left(x - 5.0\right)^2\mathrm{d}x \end{align} \]
If we make a substitution as \( u = x- 5.0 \) then \( \mathrm{d}u = \mathrm{d}x \) such that,
\[ \begin{align} \sigma^2 &= \int_{-0.1}^{0.1}5.0 u^2 \mathrm{d}u\\ &= \frac{5.0}{3.0}u^3\vert_{-0.1}^{0.1} = \frac{10.0}{3.0}\times 0.1^3 \approx 0.0033 \end{align} \]
The standard deviation is thus given as \( \sqrt{\frac{10.0}{3.0}\times 0.1^3}\approx 0.0577 \).
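A numerical check of the variance and standard deviation (a minimal sketch, assuming SciPy):

```python
from scipy.integrate import quad

# Variance of the copper-wire current: integrate f(x) (x - mu)^2 over [4.9, 5.1].
mu = 5.0
var, _ = quad(lambda x: 5.0 * (x - mu) ** 2, 4.9, 5.1)
print(var)            # ~0.00333
print(var ** 0.5)     # ~0.0577
```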
For a continuous random variable, the concepts from discrete random variables have direct analogs.
We have the following correspondences:
Discrete | Continuous |
---|---|
Probability mass function \( f(x) \) | Probability density function \( f(x) \) |
\( P(X=x_\alpha) = f(x_\alpha) \) | \( P(a \leq X \leq b) = \int_{a}^b f(x)\mathrm{d}x \) |
Cumulative distribution function \( F(x)=P(X\leq x) \) | Cumulative distribution function \( F(x)=P(X\leq x) \) |
\( F(x) = \sum_{x_\alpha \leq x} f(x_\alpha) \) | \( F(x) = \int_{-\infty}^{x}f(u)\mathrm{d}u \) |
\( \mu = \sum_{x_\alpha \in \mathbf{R}} x_\alpha f(x_\alpha) \) | \( \mu = \int_{-\infty}^{\infty} x f(x)\mathrm{d}x \) |
\( \sigma^2 = \sum_{x_\alpha \in \mathbf{R}}(x_\alpha - \mu)^2 f(x_\alpha) \) | \( \sigma^2 = \int_{-\infty}^{\infty}( x -\mu)^2 f(x)\mathrm{d}x \) |
Due to the difference between discrete measurements and continuous measurements (where we can arbitrarily sub-divide units), any single value of a continuous random variable always has probability zero.
In particular, with continuous random variables we always assign probabilities to ranges of values; a recorded measurement effectively stands for such a range, since it is rounded or truncated to finitely many digits.
Otherwise, the ideas are extremely similar: we simply replace sums with integrals (which can be viewed as limits of Riemann sums).
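To make the correspondence concrete, here is a minimal sketch (assuming NumPy) in which a discrete-style Riemann sum over a fine grid reproduces the continuous formulas for the copper wire example.

```python
import numpy as np

# Midpoint Riemann sum for the uniform density f(x) = 5 on [4.9, 5.1]:
# the discrete formulas applied to a fine grid approximate the integrals.
edges = np.linspace(4.9, 5.1, 2001)
x = 0.5 * (edges[:-1] + edges[1:])      # midpoints of 2000 subintervals
dx = edges[1] - edges[0]
f = np.full_like(x, 5.0)

mu = np.sum(x * f * dx)                  # discrete-style sum for the mean
var = np.sum((x - mu) ** 2 * f * dx)     # discrete-style sum for the variance
print(mu, var)                           # ~5.0, ~0.00333
```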
We will look at two very common continuous probability distributions next time.