Unlike discrete random variables, continuous random variables can take on an uncountably infinite number of possible values.
These random variables are characterized by a distribution function and a density function.
Let \( X \) be a random variable with cumulative distribution function (CDF) \( F_X \).
A mapping \( f_X: \mathbb{R} \rightarrow \mathbb{R}^+ \) is called the probability density function (pdf) of an rv \( X \) if \( f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d}x} \) exists for all \( x\in \mathbb{R} \); and
the density is integrable, i.e., \[ \int_{-\infty}^\infty f_X (x) \mathrm{d}x \] exists and takes on the value one.
Q: we defined, \[ \begin{align} f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d}x} & & \text{ and }& & \int_{a}^b \frac{\mathrm{d}f}{\mathrm{d}x} \mathrm{d}x = f(b) - f(a) \end{align} \] how can you use the definition above and the fundamental theorem of calculus to find another form for the CDF?
\[ \begin{align} \int_{s}^t f_X(x) \mathrm{d}x &= \int_{s}^t \frac{\mathrm{d} F_X}{\mathrm{d}x} \mathrm{d}x\\ &= F_X(t) - F_X(s) \\ & = P(X \leq t) - P(X \leq s) = P(s < X \leq t) \end{align} \]
\[ \begin{align} \lim_{s\rightarrow - \infty} \int_{s}^t f_X(x) \mathrm{d} x & = \lim_{s \rightarrow -\infty} P(s < X \leq t) \\ & = P(X\leq t) = F_X(t) \end{align} \]
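As a quick sanity check of this relationship (a sketch using R's built-in uniform functions, with an illustrative choice of \( t \)), we can compare a numerical integral of the density against the built-in CDF:

```r
# Integrate the U(0,1) density up to t and compare with punif(t);
# the lower limit -1 suffices since the density vanishes below 0.
t <- 0.7
num   <- integrate(dunif, lower = -1, upper = t, min = 0, max = 1)$value
exact <- punif(t, min = 0, max = 1)
c(num, exact)  # both approximately 0.7
```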
While together the mean \( \mu_X \) and the standard deviation \( \sigma_X \) give a picture of the center and dispersion of a probability distribution, we can analyze this in a different way.
For example, while the mean is the notion of the “center of mass”, we may also be interested in where the upper and lower \( 50\% \) of values are separated as a different notion of “center”.
More generally, for any univariate cumulative distribution function \( F \), and for \( 0 < p < 1 \), we can identify \( p \) with the fraction of the probability that lies below a given point under the graph of the density curve.
The quantity \[ \begin{align} F^{-1}(p)=\inf \left\{x \vert F(x) \geq p \right\} \end{align} \] is called the theoretical \( p \)-th quantile or percentile of \( F \).
The “\( \inf \)” in the above refers to the smallest possible quantity in the set on the right-hand-side.
We will usually refer to the \( p \)-th quantile as \( \xi_p \).
\( F^{-1} \) is called the quantile function.
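As a concrete illustration (with a hypothetical choice of \( p \)), R's qunif evaluates the quantile function of the uniform distribution, which inverts the CDF:

```r
# For U(0,1), F(x) = x on [0,1], so the p-th quantile is p itself.
p <- 0.25
xi_p <- qunif(p, min = 0, max = 1)
xi_p                           # 0.25
punif(xi_p, min = 0, max = 1)  # applying the CDF recovers p
```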
The uniform distribution \( U(a, b) \) is defined such that all intervals of the same length on the distribution’s support are equally probable.
Suppose \( a=0 \) and \( b=1 \); we will use the dunif
function to plot the probability density function, similarly to earlier examples:
par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
f = dunif(x=seq(-1,2,by=0.01), min=0, max=1)
plot(x=seq(-1,2,by=0.01), f, type = "s", main = "Uniform distribution on [0,1]", xlab = "x", ylab = "Prob.")
Notice the above shape, and recall the description of the probability as the area under the curve.
Q: the uniform distribution gives zero probability to any interval outside of \( [a,b] \), and the total area under the density must equal one. What must the height of the uniform distribution be equal to?
A: we can use the basic property of the area of a rectangle: the width (equal to \( b-a \)) times the height (the constant density \( f_X(x) \)) must multiply to one.
Therefore, for an arbitrary uniform distribution over \( [a,b] \) the density curve will be given by,
\[ \begin{align} f(x,a,b) = \begin{cases} \frac{1}{b-a} & x \in [a,b]\\ 0 & x\notin [a,b] \end{cases} \end{align} \]
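We can verify the rectangle argument numerically in R (a sketch, with an arbitrary illustrative choice of \( a=2 \), \( b=5 \)):

```r
# Total probability for U(2, 5): width (5 - 2) times height 1/(5 - 2),
# computed here by numerically integrating the density over [a, b].
area <- integrate(dunif, lower = 2, upper = 5, min = 2, max = 5)$value
area  # approximately 1
```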
Q: now that we have found the density curve for the uniform distribution
\[ \begin{align} f(x,a,b) = \begin{cases} \frac{1}{b-a} & x \in [a,b]\\ 0 & x\notin [a,b] \end{cases} \end{align} \]
can you find the expected value of an arbitrary uniformly distributed random variable \( U \sim U(a,b) \)?
A: consider that,
\[ \begin{align} \mathbb{E}\left[ U\right] &=\int_{a}^{b} x \frac{1}{b-a} \mathrm{d}x \\ &= \frac{x^2}{2(b-a)}\Big{\vert}_{a}^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2} \end{align} \]
That is, the expected value (or center of mass) lies exactly at the midpoint of the interval.
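A quick Monte Carlo check of this result (a sketch with hypothetical parameter and seed choices):

```r
# Monte Carlo check of E[U] = (a + b) / 2 for U ~ U(2, 6)
set.seed(42)                       # hypothetical seed, for reproducibility
a <- 2; b <- 6
u <- runif(100000, min = a, max = b)
mean(u)                            # close to (a + b) / 2 = 4
```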
We can similarly show that \( \mathrm{var}(U)=\frac{(b-a)^2}{12} \), and by symmetry the skewness is zero.
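For completeness, the variance claim follows from the same type of computation (a short derivation, not spelled out in the original notes):

\[ \begin{align} \mathbb{E}\left[U^2\right] &= \int_{a}^{b} \frac{x^2}{b-a} \mathrm{d}x = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}, \\ \mathrm{var}(U) &= \mathbb{E}\left[U^2\right] - \left(\frac{a+b}{2}\right)^2 = \frac{4\left(a^2 + ab + b^2\right) - 3(a+b)^2}{12} = \frac{(b-a)^2}{12}. \end{align} \]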
In R, the standard functions implemented for the uniform distribution are:
dunif(x, min, max)
, the probability density function;
punif(q, min, max)
, the cumulative distribution function;
qunif(p, min, max)
, the quantile function; and
runif(n, min, max)
, which randomly generates a sample of size n from the uniform.
Note that dunif
also contains the argument log
, which allows for computation of the log density, useful in likelihood estimation.
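A small sketch of the log argument in action (with illustrative parameter values):

```r
# log = TRUE returns the log density directly, which is more
# numerically stable when summing log-likelihood contributions.
dunif(0.5, min = 0, max = 2)              # density: 1/(2 - 0) = 0.5
dunif(0.5, min = 0, max = 2, log = TRUE)  # log density: log(0.5)
```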
“Everybody believes in the exponential law of errors [i.e., the normal / Gaussian distribution]: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation.” — Poincaré, Henri, “Calcul des Probabilités.”
dnorm(x=value, mean=mu, sd=1)
and we can picture the standard normal or Gaussian density below.

In order to work with this distribution in R, there is a list of standard implemented functions:
dnorm(x, mean, sd)
for the pdf (if argument log = TRUE, then the log density);
pnorm(q, mean, sd)
for the cdf;
qnorm(p, mean, sd)
for the quantile function; and
rnorm(n, mean, sd)
for generating random normally distributed samples. Their parameters are:
x
, a vector of quantiles;
p
, a vector of probabilities; and
n
, the number of observations. If the mean and standard deviation are not specified, they are set to the standard normal values (mean = 0, sd = 1) by default.
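A brief sketch of these four functions at work (the seed choice is hypothetical):

```r
# Standard normal by default (mean = 0, sd = 1)
dnorm(0)       # pdf at the mode: 1/sqrt(2*pi), about 0.3989
pnorm(0)       # cdf at the mean: 0.5
qnorm(0.975)   # about 1.96, the familiar 95% critical value
set.seed(1)    # hypothetical seed, for reproducibility
rnorm(3)       # three standard normal draws
```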
The central place the normal distribution occupies will become clear in the next lecture, from the number of other distributions that are closely related to, or derived from, the normal.
Another useful property of the family of normal distributions is that it is closed under linear transformations.
Thus a linear combination of two independent normal rvs,
\[ \begin{align} X_1\sim N(\mu_1 , \sigma_1^2 ) & & X_2 \sim N(\mu_2, \sigma_2^2 ), \end{align} \] is also normally distributed:
\[ aX_1 + bX_2 + c \sim N\left(a \mu_1 + b\mu_2 + c, a^2 \sigma_1^2 + b^2 \sigma_2^2\right) \]
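We can illustrate this closure property by simulation (a sketch with arbitrary illustrative choices of the coefficients and parameters):

```r
# Simulate a*X1 + b*X2 + c and compare with the theoretical
# N(a*mu1 + b*mu2 + c, a^2*sigma1^2 + b^2*sigma2^2) parameters.
set.seed(7)                            # hypothetical seed
a <- 2; b <- -1; c <- 3
x1 <- rnorm(100000, mean = 1,  sd = 2) # X1 ~ N(1, 4)
x2 <- rnorm(100000, mean = -2, sd = 1) # X2 ~ N(-2, 1)
y  <- a * x1 + b * x2 + c
mean(y)  # close to 2*1 + (-1)*(-2) + 3 = 7
var(y)   # close to 4*4 + 1*1 = 17
```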
This is actually a general property of the family of stable distributions which is discussed in greater detail in the recommended reading.