Continuous random variables and univariate distributions



Outline

  • The following topics will be covered in this lecture:
    • A review of the basics of continuous distributions
    • The uniform distribution
    • The normal distribution

A review of continuous random variables

  • Unlike discrete random variables, continuous random variables can take on an uncountably infinite number of possible values.

    • This is to say that if \( X \) is a continuous random variable, there is no possible index set \( \mathcal{I}\subset \mathbb{Z} \) which can enumerate the possible values \( X \) can attain.
    • For discrete random variables, we could perform such an enumeration with a possibly infinite index set, writing the possible values as \( \{x_j\}_{j=1}^\infty \).
    • This has to do with how the infinity of the continuum \( \mathbb{R} \) is actually larger than the infinity of the counting numbers, \( \aleph_0 \);
    • in the continuum you can arbitrarily sub-divide the units of measurement.
  • These random variables are characterized by a distribution function and a density function.

  • Let \( X \) be an rv; then the mapping \[ F_X:\mathbb{R} \rightarrow [0,1] \] defined by \( F_X (x) = P(X \leq x) \) is called the cumulative distribution function (cdf) of the rv \( X \).

  • A mapping \( f_X: \mathbb{R} \rightarrow \mathbb{R}^+ \) is called the probability density function (pdf) of an rv \( X \) if \( f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d}x} \) exists for all \( x\in \mathbb{R} \); and

  • the density is integrable, i.e., \[ \int_{-\infty}^\infty f_X (x) \mathrm{d}x \] exists and takes on the value one.

A review of continuous random variables

  • Q: we defined, \[ \begin{align} f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d}x} & & \text{ and }& & \int_{a}^b \frac{\mathrm{d}f}{\mathrm{d}x} \mathrm{d}x = f(b) - f(a) \end{align} \] how can you use the definition above and the fundamental theorem of calculus to find another form for the CDF?

    • A: Notice that \( \frac{\mathrm{d} F_X}{\mathrm{d}x} \) means that the CDF can be written in terms of the anti-derivative of the density.
    • If \( s \) and \( t \) are arbitrary values, the definite integral is written as

    \[ \begin{align} \int_{s}^t f_X(x) \mathrm{d}x &= \int_{s}^t \frac{\mathrm{d} F_X}{\mathrm{d}x} \mathrm{d}x\\ &= F_X(t) - F_X(s) \\ & = P(X \leq t) - P(X \leq s) = P(s < X \leq t) \end{align} \]

    • If we take the limit as \( s \rightarrow -\infty \), we thus recover that

    \[ \begin{align} \lim_{s\rightarrow - \infty} \int_{s}^t f_X(x) \mathrm{d} x & = \lim_{s \rightarrow -\infty} P(s < X \leq t) \\ & = P(X\leq t) = F_X(t) \end{align} \]
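    • As a quick numerical sketch (using base R and the standard normal purely as an example, not part of the lecture code), we can check this relation by integrating a density up to a point \( t \) and comparing with the built-in CDF:

t = 1.5
integrate(dnorm, lower = -Inf, upper = t)$value  # numerically integrate the standard normal density up to t
pnorm(t)                                         # the built-in CDF F_X(t) gives (essentially) the same value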

Properties of continuous distributions

  • Last week we discussed how the elementary properties of the probability distribution of a discrete rv can be described by an expectation and a variance.
  • With respect to this, the only difference with continuous rvs is in the use of integrals, rather than sums, over the possible values of the rv.
  • Let \( X \) be a continuous rv with a density function \( f_X(x) \) – then the expectation of \( X \) is defined as \[ \mathbb{E}\left[X\right] = \int_{-\infty}^{+\infty} xf_X(x)\mathrm{d}x = \mu_X \] where \( f_X \) is the density function described before.
  • Note that the same interpretation of the expected value from discrete rvs applies here:
    1. We see \( \mathbb{E}\left[X\right]=\mu_X \) as representing the “center of mass” for the “density” curve \( f_X \).
    2. We see \( \mathbb{E}\left[X\right]=\mu_X \) as representing the mean that we would obtain if we could take infinitely many independently replicated measurements of \( X \), and took the average of these measurements over all possible scenarios.
  • If the expectation of \( X \) exists, the variance is defined as \[ \begin{align} \mathrm{var} \left(X\right)& = \mathbb{E}\left[\left(X - \mu_X \right)^2\right] \\ &=\int_{-\infty}^\infty \left(x - \mu_X\right)^2 f_X(x)\mathrm{d}x = \sigma_X^2 \end{align} \]
  • Once again, this is a measure of dispersion by averaging the deviation of each case from the mean in the square sense, weighted by the probability density.
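  • As a minimal sketch of these definitions (using the standard normal density purely for illustration), the expectation and variance can be computed numerically with integrate:

mu     = integrate(function(x) x * dnorm(x), -Inf, Inf)$value            # center of mass, approximately 0
sigma2 = integrate(function(x) (x - mu)^2 * dnorm(x), -Inf, Inf)$value   # variance, approximately 1
c(mu, sigma2)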

Properties of continuous distributions

  • While the variance is a more “fundamental” theoretical quantity for various reasons, in practice we are usually concerned with the standard deviation of the random variable \( X \), \[ \mathrm{std}(X)=\sqrt{\mathrm{var}\left(X\right)} = \sigma_X. \]
  • This is due to the fact that the variance \( \sigma^2_X \) has the units of \( X^2 \) by the definition of the product.
    • For example, if the units of \( X \) are \( \mathrm{cm} \), then \( \sigma_X^2 \) will be in \( \mathrm{cm}^2 \).

  • Taking the square root of the variance gives us the standard deviation \( \sigma_X \) in the units of \( X \) itself.

Quantiles / percentiles

  • While together the mean \( \mu_X \) and the standard deviation \( \sigma_X \) give a picture of the center and dispersion of a probability distribution, we can analyze this in a different way.

  • For example, while the mean is the notion of the “center of mass”, we may also be interested in where the upper and lower \( 50\% \) of values are separated as a different notion of “center”.

    • The value that separates this upper and lower half does not need to equal the center of mass in general, and it is known commonly as the median.
  • More generally, for any univariate cumulative distribution function \( F \), and for \( 0 < p < 1 \), we can identify \( p \) with a proportion of the total probability, i.e., of the area under the graph of the density curve.

    • We might be interested in where the lower \( p \) area is separated from the upper \( 1-p \) area.
  • The quantity \[ \begin{align} F^{-1}(p)=\inf \left\{x \vert F(x) \geq p \right\} \end{align} \] is called the theoretical \( p \)-th quantile or percentile of \( F \).

  • The “\( \inf \)” in the above refers to the infimum, i.e., the greatest lower bound of the set on the right-hand side (its smallest element whenever the minimum is attained).

  • We will usually refer to the \( p \)-th quantile as \( \xi_p \).

  • \( F^{-1} \) is called the quantile function.

    • Particularly, \( \xi_{\frac{1}{2}} \) is known as the theoretical median of a distribution.
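  • As a short sketch (the q-prefixed R functions used here are introduced more formally later in the lecture), quantiles can be evaluated directly:

qunif(0.5, min = 0, max = 10)    # median of U(0, 10) is 5
qnorm(c(0.25, 0.5, 0.75))        # quartiles of the standard normal: about -0.674, 0, 0.674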

Skewness and kurtosis

Diagram of skewness for different distributions.

Courtesy of Diva Jain / CC BY-SA

  • Other useful characteristics of a distribution are its skewness and excess kurtosis.
  • The skewness of a probability distribution is defined as the extent to which it deviates from symmetry.
  • A distribution has negative skewness if the left tail is longer than the right tail of the distribution;
    • i.e., the bulk of the probability mass lies to the right of the mean.
  • Respectively, positive skewness refers to the right tail being longer than the left tail.
  • For the rv \( X \), we define the skewness to be, \[ \mathbb{E}\left[ \left( X - \mu_X\right)^3 \right] / \sigma_X^3. \]
  • This can be understood as a kind of average, third order signed deviation of the random variable from the mean, relative to the dispersion cubed.
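  • As a small numerical sketch (the helper function below is ad hoc, not part of base R), we can estimate the skewness of simulated samples; the symmetric normal is near zero while the exponential shows positive skew:

set.seed(1)
skew = function(x) mean((x - mean(x))^3) / sd(x)^3   # sample version of the third standardized moment
skew(rnorm(1e5))    # approximately 0 for the symmetric normal
skew(rexp(1e5))     # approximately 2 for the right-skewed exponential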

Skewness and kurtosis

Diagram of kurtosis for different distributions.

Courtesy of Joxemai / CC BY-SA

  • The kurtosis on the other hand is a measure of the peakedness of a probability distribution.
  • The excess kurtosis is used to compare the kurtosis of a pdf with the kurtosis of the normal distribution, which equals \( 3 \).
  • The formula for the excess kurtosis is given as follows: \[ \begin{align} K(X) = \frac{\mathbb{E}\left[\left(X - \mu_X\right)^4\right]}{\sigma_X^4} - 3 \end{align} \] where the excess kurtosis gives a signed, fourth order average of the deviation from the mean, relative to the dispersion to the quartic.
  • Distributions with negative or positive excess kurtosis are called platykurtic distributions and leptokurtic distributions, respectively.
  • A distribution that displays normal kurtosis is described as mesokurtic.
  • Q: given the figure above, which of the distributions correspond to positive or negative excess kurtosis and which correspond to normal kurtosis?
  • A: in the figure, A represents a distribution with positive excess kurtosis, B represents normal kurtosis, C represents negative excess kurtosis, while D is an extreme case of non-peakedness, the uniform distribution.
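  • As a companion sketch to the skewness example (again with an ad hoc helper, not a base R function), we can estimate the excess kurtosis of simulated samples and check the signs discussed above:

set.seed(1)
ex_kurt = function(x) mean((x - mean(x))^4) / sd(x)^4 - 3   # sample excess kurtosis
ex_kurt(rnorm(1e5))    # approximately 0 (mesokurtic)
ex_kurt(runif(1e5))    # approximately -1.2 (platykurtic)
ex_kurt(rexp(1e5))     # clearly positive (leptokurtic)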

The uniform distribution

  • The uniform distribution \( U(a, b) \) is defined such that all intervals of the same length on the distribution’s support are equally probable.

  • Suppose \( a=0 \) and \( b=1 \); we will use the dunif function to plot the probability density function, similarly to earlier examples:

par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)    # enlarge text and pad the margins for the slide
x = seq(-1, 2, by = 0.01)                    # grid extending beyond the support [0, 1]
f = dunif(x, min = 0, max = 1)               # density of U(0,1) evaluated on the grid
plot(x, f, type = "s", main = "Uniform distribution on [0,1]", xlab = "x", ylab = "Prob.")

Plot: density of the uniform distribution on [0, 1].

  • The support is defined by the two parameters, \( a \) and \( b \), which are its minimum and maximum values.

The uniform distribution

Plot: density of the uniform distribution on [0, 1].

  • Notice the shape above, and recall the description of probability as the area under the density curve.

  • Q: the uniform distribution gives zero probability to any interval outside of \( [a,b] \), and the total area under the density must equal one – what must the height of the uniform density be?

  • A: we can use the basic property of the area of a rectangle: the width (equal to \( b-a \)) times the height (the constant density \( f_X(x) \)) must equal one.

  • Therefore, for an arbitrary uniform distribution over \( [a,b] \) the density curve will be given by,

    \[ \begin{align} f(x,a,b) = \begin{cases} \frac{1}{b-a} & x \in [a,b]\\ 0 & x\notin [a,b] \end{cases} \end{align} \]
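  • As a quick check of this formula (a sketch with the illustrative choice \( a=0 \), \( b=2 \)), dunif returns the height \( 1/(b-a) \) and the area over the support integrates to one:

dunif(0.5, min = 0, max = 2)                     # height 1/(2 - 0) = 0.5
integrate(dunif, 0, 2, min = 0, max = 2)$value   # area over the support [0, 2] is 1 (zero probability outside)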

The uniform distribution

  • Q: now that we have found the density curve for the uniform distribution

    \[ \begin{align} f(x,a,b) = \begin{cases} \frac{1}{b-a} & x \in [a,b]\\ 0 & x\notin [a,b] \end{cases} \end{align} \]

    can you find the expected value of an arbitrary uniformly distributed random variable \( U \sim U(a,b) \)?

  • A: consider that,

    \[ \begin{align} \mathbb{E}\left[ U\right] &=\int_{a}^{b} x \frac{1}{b-a} \mathrm{d}x \\ &= \frac{x^2}{2(b-a)}\Big{\vert}_{a}^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2} \end{align} \]

  • That is, the expected value (or center of mass) lies exactly at the midpoint of the interval.

    • Likewise, it is easy to see that the median will align with the center of mass here by the symmetry about the midpoint.
  • We can similarly show that \( \mathrm{var}(U)=\frac{(b-a)^2}{12} \), and by the symmetry about the midpoint, the skewness is zero.
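  • As a brief simulation sketch (with the illustrative values \( a=2 \), \( b=6 \)), we can check these formulas against a large uniform sample:

set.seed(1)
u = runif(1e5, min = 2, max = 6)
c(mean(u), (2 + 6) / 2)          # sample mean versus (a + b)/2 = 4
c(var(u), (6 - 2)^2 / 12)        # sample variance versus (b - a)^2/12 = 4/3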

The uniform distribution

  • An extremely important property about the uniform distribution for simulation purposes has to do with the notion again of quantiles.
  • Let \( F_X \) be the cdf of an arbitrary rv \( X \); then \( X \) can be converted to a uniform distribution via the probability integral transform.
  • Notice firstly that, if \( F_X(X) \) is read as a composition of the mappings

    \[ \begin{align} X &: \Omega \rightarrow \mathbb{R}, \\ F_X &: \mathbb{R} \rightarrow [0,1], \end{align} \] we can see \( F_X(X) \) as a random variable taking values in the interval \( [0,1] \).
  • It is a general property that if \( X \) has the CDF \( F_X \), then \( F_X(X) \sim U(0,1) \), where

    \[ \begin{align} F_X(X) = F_X(t) = \int_{-\infty}^t f_X(x) \mathrm{d}x \end{align} \] and the attained value \( t \) of \( X \) depends on the random outcome \( \omega\in\Omega \).
  • On the other hand, suppose that \( U \sim U(0,1) \) is a uniform random variable on the unit interval,
    • then \( F_X^{-1}(U) \) has a CDF of \( F_X \) and we say that \( X \) and \( F_X^{-1}(U) \) have the same distribution.
  • Practically speaking, this means that if we can simulate the uniformly distributed variable \( U\sim U(0,1) \), we can compose it with the quantile function \( F_X^{-1} \) of an arbitrary CDF to generate a random variable with that distribution.
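  • As a minimal sketch of this idea (using the exponential distribution purely as the example target), we pass uniform draws through the quantile function and recover the target distribution:

set.seed(1)
u = runif(1e5)            # U(0,1) draws
x = qexp(u, rate = 2)     # inverse-CDF (quantile) transform; x should behave like an Exp(rate = 2) sample
c(mean(x), 1 / 2)         # sample mean versus the theoretical mean 1/rate
c(mean(x <= 1), pexp(1, rate = 2))   # empirical versus theoretical CDF at t = 1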

The uniform distribution

  • In R the generic functions for the uniform distribution are the following:

    • dunif(x, min, max) is the probability density function of the uniform.
    • punif(q, min, max) is the cumulative distribution function of the uniform.
    • qunif(p, min, max) is the quantile function of the uniform.
    • runif(n, min, max) randomly generates a sample of size n from the uniform.
  • Note that dunif also contains the argument log, which allows for computation of the log density, useful in likelihood estimation.
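  • A short usage sketch of these functions on \( U(0,1) \) (the values noted in the comments are exact for this distribution):

dunif(0.3, min = 0, max = 1)   # density height, equal to 1 everywhere on [0, 1]
punif(0.3, min = 0, max = 1)   # P(U <= 0.3) = 0.3
qunif(0.3, min = 0, max = 1)   # the 0.3 quantile, equal to 0.3
runif(3, min = 0, max = 1)     # three random draws from U(0, 1)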

The normal distribution

  • The normal distribution is considered the most prominent distribution in statistics.
  • It is a continuous probability distribution that has a bell-shaped probability density function, also known as the Gaussian function.
  • The normal distribution arises from the central limit theorem (CLT),
    • under weak conditions, the sum of a large number of independent rvs drawn from the same distribution is approximately normally distributed, irrespective of the form of the original distribution.
  • This gives mathematical justification to why we see normally distributed data quite often in practice; as was noted by Henri Poincaré:
    • “Everybody believes in the exponential law of errors [i.e., the normal / Gaussian distribution]: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation.” — Poincaré, Henri, Calcul des Probabilités.
  • In addition to the ubiquity of the normal distribution, it can be easily manipulated analytically in equations,
    • this enables one to derive a large number of results in explicit form.
  • Due to these two aspects, the normal distribution is used extensively in theory and practice.

The normal distribution

  • Formally, we will describe the Gaussian pdf as, \[ \begin{align} \phi\left(x,\mu,\sigma^2\right) = \left( 2 \pi \sigma^2 \right)^{-\frac{1}{2}} \exp\left\{-\left(x - \mu\right)^2/ \left(2 \sigma^2\right)\right\}, \end{align} \]

    • In R, this is encoded as dnorm(x, mean, sd); with the default arguments mean = 0 and sd = 1 we can picture the standard normal or Gaussian density below:

    Plot: the standard normal density.

The normal distribution

  • In order to work with this distribution in R, the following standard functions are implemented (a short usage sketch follows at the end of this slide):

    • dnorm(x, mean, sd) for the pdf (if argument log = TRUE then log density);
    • pnorm(q, mean, sd) for the cdf;
    • qnorm(p, mean, sd) for the quantile function; and
    • rnorm(n, mean, sd) for generating random normally distributed samples.
  • Their parameters are:

    • x, a vector of quantiles,
    • p, a vector of probabilities, and
    • n, the number of observations.
  • If the mean and standard deviation are not specified, they are set to the standard normal defaults mean = 0 and sd = 1.

  • The central place the normal distribution occupies will become clear in the next lecture, given the number of other distributions that are closely related to or derived from the normal.
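  • As the short usage sketch promised above (values in the comments are approximate):

pnorm(1) - pnorm(-1)          # probability of lying within one sd of the mean, about 0.683
qnorm(0.975)                  # the 0.975 quantile of the standard normal, about 1.96
rnorm(3, mean = 10, sd = 2)   # three draws from N(10, 4)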

The normal distribution

  • Another useful property of the family of normal distributions is that it is closed under linear transformations.

  • Thus a linear combination of two independent normal rvs,

    \[ \begin{align} X_1\sim N(\mu_1 , \sigma_1^2 ) & & X_2 \sim N(\mu_2, \sigma_2^2 ), \end{align} \] is also normally distributed:

    • i.e.,

    \[ aX_1 + bX_2 + c \sim N\left(a \mu_1 + b\mu_2 + c, a^2 \sigma_1^2 + b^2 \sigma_2^2\right) \]

  • This is actually a general property of the family of stable distributions, which is discussed in greater detail in the recommended reading.
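  • As a brief simulation sketch (with illustrative choices \( a=2 \), \( b=-1 \), \( c=3 \), \( X_1\sim N(1,4) \), \( X_2\sim N(5,9) \)), we can check the stated mean and variance of the linear combination:

set.seed(1)
x1 = rnorm(1e5, mean = 1, sd = 2)    # X1 ~ N(1, 4)
x2 = rnorm(1e5, mean = 5, sd = 3)    # X2 ~ N(5, 9), independent of X1
y  = 2 * x1 - x2 + 3                 # a = 2, b = -1, c = 3
c(mean(y), 2 * 1 - 5 + 3)            # sample mean versus a*mu1 + b*mu2 + c = 0
c(var(y), 2^2 * 4 + (-1)^2 * 9)      # sample variance versus a^2*sigma1^2 + b^2*sigma2^2 = 25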