Continuous random variables and univariate distributions


  • The following topics will be covered in this lecture:
    • A review of the basics of continuous distributions
    • The uniform distribution
    • The normal distribution

A review of continuous random variables

  • Unlike discrete random variables, continuous random variables can take on an uncountably infinite number of possible values.

    • This is to say that if \( X \) is a continuous random variable, there is no possible index set \( \mathcal{I}\subset \mathbb{Z} \) which can enumerate the possible values \( X \) can attain.
    • For discrete random variables, we could enumerate the possible values with a countable, possibly infinite, index set, \( \{x_j\}_{j=1}^\infty \).
    • This has to do with how the infinity of the continuum \( \mathbb{R} \) is actually larger than the infinity of the counting numbers, \( \aleph_0 \);
    • in the continuum you can arbitrarily sub-divide the units of measurement.
  • These random variables are characterized by a distribution function and a density function.

  • Let \( X \) be a continuous rv, then the mapping \[ F_X:\mathbb{R} \rightarrow [0,1] \] defined by \( F_X (x) = P(X \leq x) \), is called the cumulative distribution function (cdf) of the rv \( X \).

  • A mapping \( f_X: \mathbb{R} \rightarrow \mathbb{R}^+ \) is called the probability density function (pdf) of an rv \( X \) if \( f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d}x} \) exists for all \( x\in \mathbb{R} \); and

  • the density is integrable, i.e., \[ \int_{-\infty}^\infty f_X (x) \mathrm{d}x \] exists and takes on the value one.

  • Q: we defined, \[ \begin{align} f_X(x) = \frac{\mathrm{d} F_X}{\mathrm{d}x} & & \text{ and }& & \int_{a}^b \frac{\mathrm{d}f}{\mathrm{d}x} \mathrm{d}x = f(b) - f(a) \end{align} \] how can you use the definition above and the fundamental theorem of calculus to find another form for the CDF?

    • A: Notice that \( \frac{\mathrm{d} F_X}{\mathrm{d}x} \) means that the CDF can be written in terms of the anti-derivative of the density.
    • If \( s \) and \( t \) are arbitrary values, the definite integral is written as

    \[ \begin{align} \int_{s}^t f_X(x) \mathrm{d}x &= \int_{s}^t \frac{\mathrm{d} F_X}{\mathrm{d}x} \mathrm{d}x\\ &= F_X(t) - F_X(s) \\ & = P(X \leq t) - P(X \leq s) = P(s < X \leq t) \end{align} \]

    • If we take a limit as \( s \rightarrow -\infty \) we thus recover that

    \[ \begin{align} \lim_{s\rightarrow - \infty} \int_{s}^t f_X(x) \mathrm{d} x & = \lim_{s \rightarrow -\infty} P(s < X \leq t) \\ & = P(X\leq t) = F_X(t) \end{align} \]
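    • The relationship above can be checked numerically. The following is a minimal sketch in R, assuming the standard normal density dnorm as the example density (any continuous density could be substituted); the variable names are illustrative only.

```r
# Example density: the standard normal (an assumption for illustration)
t_val <- 1.3
s_val <- -0.5

# F_X(t) as the integral of the density from -infinity to t
cdf_by_integration <- integrate(dnorm, lower = -Inf, upper = t_val)$value

# Built-in cdf for comparison
cdf_builtin <- pnorm(t_val)

# P(s < X <= t) as a difference of cdf values, and as an integral of the density
prob_by_cdf <- pnorm(t_val) - pnorm(s_val)
prob_by_integration <- integrate(dnorm, lower = s_val, upper = t_val)$value

print(c(cdf_by_integration, cdf_builtin))
print(c(prob_by_cdf, prob_by_integration))
```

    • Both pairs of values should agree up to the tolerance of the numerical integration.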

Properties of continuous distributions

  • Last week we discussed how the elementary properties of the probability distribution of a discrete rv can be described by an expectation and a variance.
  • With respect to this, the only difference with continuous rvs is in the use of integrals, rather than sums, over the possible values of the rv.
  • Let \( X \) be a continuous rv with a density function \( f_X(x) \) – then the expectation of \( X \) is defined as \[ \mathbb{E}\left[X\right] = \int_{-\infty}^{+\infty} xf_X(x)\mathrm{d}x = \mu_X \] where \( f_X \) is the density function described before.
  • Note that the same interpretation of the expected value from discrete rvs applies here:
    1. We see \( \mathbb{E}\left[X\right]=\mu_X \) as representing the “center of mass” for the “density” curve \( f_X \).
    2. We see \( \mathbb{E}\left[X\right]=\mu_X \) as representing the mean that we would obtain if we could take infinitely many independently replicated measurements of \( X \), and took the average of these measurements over all possible scenarios.
  • If the expectation of \( X \) exists, the variance is defined as \[ \begin{align} \mathrm{var} \left(X\right)& = \mathbb{E}\left[\left(X - \mu_X \right)^2\right] \\ &=\int_{-\infty}^\infty \left(x - \mu_X\right)^2 f_X(x)\mathrm{d}x = \sigma_X^2 \end{align} \]
  • Once again, this is a measure of dispersion by averaging the deviation of each case from the mean in the square sense, weighted by the probability density.

  • While the variance is a more “fundamental” theoretical quantity for various reasons, in practice we are usually concerned with the standard deviation of the random variable \( X \), \[ \mathrm{std}(X)=\sqrt{\mathrm{var}\left(X\right)} = \sigma_X. \]
  • This is due to the fact that the variance \( \sigma^2_X \) has the units of \( X^2 \), since it is defined as an average of squared deviations.
    • For example, if the units of \( X \) are \( \mathrm{cm} \), then \( \sigma_X^2 \) will be in \( \mathrm{cm}^2 \).

  • Taking a square root on the variance gives us the standard deviation \( \sigma_X \) in the units of \( X \) itself.
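  • The integrals defining the expectation, variance and standard deviation can be approximated numerically. The following is a minimal sketch in R, assuming as an example a normal density with mean \( 1 \) and standard deviation \( 2 \); the helper name f_X and the chosen parameters are illustrative assumptions.

```r
# Example density: N(mu = 1, sigma^2 = 4), i.e. sd = 2 (an assumption for illustration)
f_X <- function(x) dnorm(x, mean = 1, sd = 2)

# Expectation: integral of x * f_X(x) over the real line
mu_X <- integrate(function(x) x * f_X(x), lower = -Inf, upper = Inf)$value

# Variance: integral of (x - mu_X)^2 * f_X(x) over the real line
var_X <- integrate(function(x) (x - mu_X)^2 * f_X(x), lower = -Inf, upper = Inf)$value

# Standard deviation, in the units of X itself
sd_X <- sqrt(var_X)

print(c(mean = mu_X, var = var_X, sd = sd_X))
```

  • Up to numerical error, the results should match the parameters used to define the density: mean \( 1 \), variance \( 4 \) and standard deviation \( 2 \).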

Quantiles / percentiles

  • While together the mean \( \mu_X \) and the standard deviation \( \sigma_X \) give a picture of the center and dispersion of a probability distribution, we can analyze this in a different way.

  • For example, while the mean is the notion of the “center of mass”, we may also be interested in where the upper and lower \( 50\% \) of values are separated as a different notion of “center”.

    • The value that separates this upper and lower half does not need to equal the center of mass in general, and it is known commonly as the median.
  • More generally, for any univariate cumulative distribution function \( F \), and for \( 0 < p < 1 \), we can identify \( p \) with a fraction of the total area that lies under the graph of the density curve.

    • We might be interested in where the lower \( p \) area is separated from the upper \( 1-p \) area.
  • The quantity \[ \begin{align} F^{-1}(p)=\inf \left\{x \vert F(x) \geq p \right\} \end{align} \] is called the theoretical \( p \)-th quantile or percentile of \( F \).

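  • The “\( \inf \)” (infimum) in the above refers to the greatest lower bound of the set on the right-hand side; because \( F \) is right-continuous, this is the smallest value \( x \) for which \( F(x) \geq p \).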

  • We will usually refer to the \( p \)-th quantile as \( \xi_p \).

  • \( F^{-1} \) is called the quantile function.

    • Particularly, \( \xi_{\frac{1}{2}} \) is known as the theoretical median of a distribution.
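  • As a small illustration of the quantile function, the following R sketch uses the built-in quantile functions qnorm and qunif and also inverts a cdf numerically with uniroot. The chosen distributions, the probability \( p = 0.9 \) and the search interval are assumptions made for the example.

```r
# Median of the standard normal: the quantile xi_{1/2}
median_norm <- qnorm(0.5)

# Quantile of U(0, 4): the lower 25% of the area lies below this value
q25_unif <- qunif(0.25, min = 0, max = 4)

# For a continuous, strictly increasing cdf, F(F^{-1}(p)) = p
p <- 0.9
xi_p <- qnorm(p)
p_recovered <- pnorm(xi_p)

# Numerical inversion of the cdf by root finding, as a check of the definition
xi_p_root <- uniroot(function(x) pnorm(x) - p, interval = c(-10, 10), tol = 1e-10)$root

print(c(median_norm, q25_unif, xi_p, p_recovered, xi_p_root))
```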

Skewness and kurtosis

Diagram of kurtosis for different distributions.

  • Other useful characteristics of a distribution are its skewness and excess kurtosis.
  • The skewness of a probability distribution is defined as the extent to which it deviates from symmetry.
  • A distribution has negative skewness if the left tail is longer than the right tail of the distribution;
    • i.e., there are more values on the right side of the mean than on the left side of the mean.
  • Respectively, positive skewness refers to the right tail being longer than the left tail.
  • For the rv \( X \), we define the skewness to be, \[ \mathbb{E}\left[ \left( X - \mu_X\right)^3 \right] / \sigma_X^3. \]
  • This can be understood as a kind of average, third order signed deviation of the random variable from the mean, relative to the dispersion cubed.

Diagram of kurtosis for different distributions.

  • The kurtosis on the other hand is a measure of the peakedness of a probability distribution.
  • The excess kurtosis is used to compare the kurtosis of a pdf with the kurtosis of the normal distribution, which equals \( 3 \).
  • The formula for the excess kurtosis is given as follows: \[ \begin{align} K(X) = \frac{\mathbb{E}\left[\left(X - \mu_X\right)^4\right]}{\sigma_X^4} - 3 \end{align} \] where the excess kurtosis gives a fourth order average of the deviation from the mean, relative to the dispersion raised to the fourth power, measured against the normal reference value of \( 3 \).
  • Distributions with negative or positive excess kurtosis are called platykurtic distributions and leptokurtic distributions, respectively.
  • A distribution that displays normal kurtosis is described as mesokurtic.
  • Q: given the figure above, which of the distributions correspond to positive or negative excess kurtosis and which correspond to normal kurtosis?
  • A: in the figure above, A represents a distribution with positive excess kurtosis, B represents normal kurtosis, C represents negative excess kurtosis, while D is an extreme case of non-peakedness, the uniform distribution.
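  • The definitions of skewness and excess kurtosis can be estimated from a sample by replacing expectations with sample averages. The following is a minimal sketch in base R; the helper names sample_skewness and sample_excess_kurtosis are hypothetical, and the simple moment estimators used here are one of several common conventions.

```r
# Moment-based sample estimates (hypothetical helper names, base R only)
sample_skewness <- function(x) {
  mean((x - mean(x))^3) / mean((x - mean(x))^2)^(3 / 2)
}
sample_excess_kurtosis <- function(x) {
  mean((x - mean(x))^4) / mean((x - mean(x))^2)^2 - 3
}

set.seed(1)
n <- 1e5
z <- rnorm(n)   # normal: skewness 0, excess kurtosis 0 (mesokurtic)
u <- runif(n)   # uniform: skewness 0, excess kurtosis -1.2 (platykurtic)

print(c(skew = sample_skewness(z), kurt = sample_excess_kurtosis(z)))
print(c(skew = sample_skewness(u), kurt = sample_excess_kurtosis(u)))
```

  • The estimates fluctuate from sample to sample, so they should only be expected to lie close to the theoretical values.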

The uniform distribution

  • The uniform distribution \( U(a, b) \) is defined such that all intervals of the same length on the distribution’s support are equally probable.

  • Suppose \( a=0 \) and \( b=1 \); we will use the dunif function to plot the probability density function, similarly to earlier examples:

par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
# evaluate the U(0,1) density on a grid extending beyond the support [0,1]
x = seq(-1, 2, by = 0.01)
f = dunif(x, min = 0, max = 1)
plot(x, f, type = "s", main = "Uniform distribution on [0,1]", xlab = "x", ylab = "Density")

Plot of the uniform density on \( [0,1] \).

  • The support is defined by the two parameters, \( a \) and \( b \), which are its minimum and maximum values.


  • Notice the shape of the density above, and recall the description of probability as the area under the curve.

  • Q: the uniform distribution gives zero probability to any interval outside of \( [a,b] \), and the total area must equal one – what must the height of the uniform distribution be equal to?

  • A: we can use the basic property of the area of a rectangle, the width (equal to \( b-a \)) times the height (the density function \( f_X(x) \)) must multiply to one.

  • Therefore, for an arbitrary uniform distribution over \( [a,b] \) the density curve will be given by,

    \[ \begin{align} f(x,a,b) = \begin{cases} \frac{1}{b-a} & x \in [a,b]\\ 0 & x\notin [a,b] \end{cases} \end{align} \]

  • Q: now that we have found the density curve for the uniform distribution

    \[ \begin{align} f(x,a,b) = \begin{cases} \frac{1}{b-a} & x \in [a,b]\\ 0 & x\notin [a,b] \end{cases} \end{align} \]

    can you find the expected value of an arbitrary uniformly distributed random variable \( U \sim U(a,b) \)?

  • A: consider that,

    \[ \begin{align} \mathbb{E}\left[ U\right] &=\int_{a}^{b} x \frac{1}{b-a} \mathrm{d}x \\ &= \frac{x^2}{2(b-a)}\Big{\vert}_{a}^b = \frac{b^2 - a^2}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2} \end{align} \]

  • That is, the expected value (or center of mass) lies exactly at the midpoint of the interval.

    • Likewise, it is easy to see that the median will align with the center of mass here by the symmetry about the midpoint.
  • We can similarly show that \( \mathrm{var}(U)=\frac{(b-a)^2}{12} \) and, by symmetry, we know that the skewness is zero.
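    • As a sketch of the variance calculation, we can use the standard identity \( \mathrm{var}(U) = \mathbb{E}\left[U^2\right] - \left(\mathbb{E}\left[U\right]\right)^2 \), which is not stated above but follows from expanding the square in the definition of the variance:

    \[ \begin{align} \mathbb{E}\left[U^2\right] &= \int_a^b x^2 \frac{1}{b-a}\mathrm{d}x = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3} \\ \mathrm{var}(U) &= \frac{a^2 + ab + b^2}{3} - \left(\frac{a+b}{2}\right)^2 \\ &= \frac{4a^2 + 4ab + 4b^2 - 3a^2 - 6ab - 3b^2}{12} = \frac{(b-a)^2}{12} \end{align} \]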

  • An extremely important property about the uniform distribution for simulation purposes has to do with the notion again of quantiles.
  • Let \( F_X \) be the cdf of an arbitrary rv \( X \); then \( X \) can be converted to a uniform distribution via the probability integral transform.
  • Notice firstly that, if \( F_X(X) \) is read as a composition of the functions

    \[ \begin{align} X &: \Omega \rightarrow \mathbb{R} \\ F_X &: \mathbb{R} \rightarrow [0,1] \end{align} \] we can see \( F_X(X) \) as a random variable taking values in the interval \( [0,1] \).
  • It is a general property that if \( X \) has the continuous CDF \( F_X \), then \( F_X(X) \sim U(0,1) \), where

    \[ \begin{align} F_X(X)\big\vert_{X=t} = F_X(t) = \int_{-\infty}^t f_X(x) \mathrm{d}x \end{align} \] and the attained value \( t \) of \( X \) depends on the random outcome \( \omega\in\Omega \).
  • On the other hand, suppose that \( U \sim U(0,1) \) is a uniform random variable on the unit interval,
    • then \( F_X^{-1}(U) \) has a CDF of \( F_X \) and we say that \( X \) and \( F_X^{-1}(U) \) have the same distribution.
  • Practically speaking, this means that if we can simulate the uniformly distributed variable \( U\sim U(0,1) \), we can compose this with the quantile function \( F_X^{-1} \) of an arbitrary CDF to generate a random variable with that distribution.
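  • Both directions of this property can be illustrated by simulation. The following is a minimal sketch in R, assuming as an example target \( X \sim N(2, 9) \) and using the built-in qnorm as the quantile function \( F_X^{-1} \); the parameters, seed and sample size are arbitrary choices.

```r
set.seed(42)
n <- 1e5

# Inverse transform: push uniform draws through the quantile function F_X^{-1}
u <- runif(n)                      # U ~ U(0,1)
x <- qnorm(u, mean = 2, sd = 3)    # F_X^{-1}(U), with X ~ N(2, 9) as the example target

# Probability integral transform: apply F_X to independent draws of X
y <- rnorm(n, mean = 2, sd = 3)
v <- pnorm(y, mean = 2, sd = 3)    # F_X(X), which should behave like U(0,1)

print(c(mean_x = mean(x), sd_x = sd(x)))      # should be close to 2 and 3
print(c(mean_v = mean(v), var_v = var(v)))    # should be close to 1/2 and 1/12
```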

  • In R the generic functions for the uniform distribution are the following:

    • dunif(x, min, max) is the probability density function of the uniform.
    • punif(q, min, max) is the cumulative distribution function of the uniform.
    • qunif(p, min, max) is the quantile function of the uniform.
    • runif(n, min, max) randomly generates a sample of size n from the uniform.
  • Note that dunif also contains the argument log which allows for computation of the log density, useful in the likelihood estimation.
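  • A short usage sketch of these four functions follows, assuming the example interval \( [a,b] = [2,6] \); the variable names are illustrative.

```r
a <- 2; b <- 6

dens   <- dunif(3, min = a, max = b)              # height 1 / (b - a) = 0.25
cdf    <- punif(3, min = a, max = b)              # (3 - a) / (b - a) = 0.25
med    <- qunif(0.5, min = a, max = b)            # midpoint (a + b) / 2 = 4
logden <- dunif(3, min = a, max = b, log = TRUE)  # log(0.25)

set.seed(7)
draws <- runif(1000, min = a, max = b)            # all draws fall inside [a, b]

print(c(dens, cdf, med, logden))
print(range(draws))
```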

The normal distribution

  • The normal distribution is considered the most prominent distribution in statistics.
  • It is a continuous probability distribution that has a bell-shaped probability density function, also known as the Gaussian function.
  • The normal distribution arises from the central limit theorem (CLT),
    • under weak conditions, the sum of a large number of independent rvs drawn from the same distribution is distributed approximately normally, irrespective of the form of the original distribution.
  • This gives mathematical justification to why we see normally distributed data quite often in practice; as was noted by Henri Poincare
    • “Everybody believes in the exponential law of errors [i.e., the normal / Gaussian distribution]: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation.” — Poincare, Henri “Calcul Des Probabilités.”
  • In addition to the ubiquity of the normal distribution, it can be easily manipulated analytically in equations,
    • this enables one to derive a large number of results in explicit form.
  • Due to these two aspects, the normal distribution is used extensively in theory and practice.
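  • The statement of the CLT can be illustrated with sums of uniform rvs. The following is a minimal sketch in R, assuming sums of \( 30 \) independent \( U(0,1) \) draws; the number of terms, number of replicates and seed are arbitrary choices, and the standardization uses the mean \( n/2 \) and variance \( n/12 \) of a sum of \( n \) independent \( U(0,1) \) rvs.

```r
set.seed(2024)
n_terms <- 30      # number of uniform rvs in each sum
n_rep   <- 1e4     # number of replicated sums

# Each column of the matrix is one replicate of n_terms draws from U(0,1)
sums <- colSums(matrix(runif(n_terms * n_rep), nrow = n_terms))

# Standardize using the mean n/2 and variance n/12 of the sum
z <- (sums - n_terms / 2) / sqrt(n_terms / 12)

# Compare empirical quantiles of z with standard normal quantiles
probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
print(rbind(empirical = quantile(z, probs), normal = qnorm(probs)))
```

  • Although each term is uniform rather than bell-shaped, the quantiles of the standardized sum should lie close to those of the standard normal.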

  • Formally, we will describe the Gaussian pdf as, \[ \begin{align} \phi\left(x,\mu,\sigma^2\right) = \left( 2 \pi \sigma^2 \right)^{-\frac{1}{2}} \exp\left\{-\left(x - \mu\right)^2/ \left(2 \sigma^2\right)\right\}. \end{align} \]

    • In R, this is encoded as dnorm(x=value, mean=mu, sd=sigma), where sd is the standard deviation \( \sigma \) rather than the variance \( \sigma^2 \), and we can picture the standard normal or Gaussian density below:

    Plot of the standard normal density.
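    • As a check on the parameterization, the following R sketch transcribes the Gaussian pdf with normalizing constant \( \left(2\pi\sigma^2\right)^{-\frac{1}{2}} \) and compares it with dnorm. The function name phi and the example parameters \( \mu = 1 \), \( \sigma^2 = 4 \) are assumptions for illustration.

```r
# Direct transcription of the Gaussian pdf, parameterized by the variance sigma^2
phi <- function(x, mu, sigma2) {
  (2 * pi * sigma2)^(-1 / 2) * exp(-(x - mu)^2 / (2 * sigma2))
}

x <- seq(-4, 4, by = 0.01)

# dnorm takes the standard deviation, so sigma^2 = 4 corresponds to sd = 2
max_diff <- max(abs(phi(x, mu = 1, sigma2 = 4) - dnorm(x, mean = 1, sd = 2)))
print(max_diff)
```

    • The maximum difference should be at the level of floating point round-off.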

  • In order to work with this distribution in R, there is a list of standard implemented functions:

    • dnorm(x, mean, sd) for the pdf (if argument log = TRUE then log density);
    • pnorm(q, mean, sd) for the cdf;
    • qnorm(p, mean, sd) for the quantile function; and
    • rnorm(n, mean, sd) for generating random normally distributed samples.
  • Their parameters are:

    • x and q, vectors of quantiles,
    • p, a vector of probabilities, and
    • n, the number of observations.
  • If the mean and standard deviation are not specified, they are set to the standard normal values, mean = 0 and sd = 1, by default.
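  • A short usage sketch of these functions follows; the evaluation points and the example distribution \( N(10, 2^2) \) are arbitrary choices for illustration.

```r
# Density of the standard normal at its mean: 1 / sqrt(2 * pi)
d0 <- dnorm(0)

# P(X <= 1.96) for the standard normal, roughly 0.975
p196 <- pnorm(1.96)

# The quantile function inverts the cdf: the 97.5% quantile is roughly 1.96
q975 <- qnorm(0.975)

# Probability of falling within one standard deviation of the mean, for N(10, 2^2)
p_within_1sd <- pnorm(12, mean = 10, sd = 2) - pnorm(8, mean = 10, sd = 2)

# log = TRUE returns the log density
ld0 <- dnorm(0, log = TRUE)

set.seed(3)
sample_norm <- rnorm(5, mean = 10, sd = 2)

print(c(d0, p196, q975, p_within_1sd, ld0))
print(sample_norm)
```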

  • The central place the normal distribution occupies will become clear in the next lecture, from the number of other distributions that are closely related to or derived from the normal.

  • Another useful property of the family of normal distributions is that it is closed under linear transformations.

  • Thus a linear combination of two independent normal rvs,

    \[ \begin{align} X_1\sim N(\mu_1 , \sigma_1^2 ) & & X_2 \sim N(\mu_2, \sigma_2^2 ), \end{align} \] is also normally distributed:

    • i.e.,

    \[ aX_1 + bX_2 + c \sim N\left(a \mu_1 + b\mu_2 + c, a^2 \sigma_1^2 + b^2 \sigma_2^2\right) \]
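  • This closure property can be checked by simulation. The following is a minimal sketch in R, assuming the example values \( X_1 \sim N(1, 4) \), \( X_2 \sim N(-2, 9) \), \( a = 2 \), \( b = -1 \) and \( c = 3 \); the name c0 is used for the constant to avoid masking the R function c.

```r
set.seed(123)
n <- 1e5

# Independent normal rvs; note that rnorm takes standard deviations, not variances
x1 <- rnorm(n, mean = 1, sd = 2)     # X_1 ~ N(1, 4)
x2 <- rnorm(n, mean = -2, sd = 3)    # X_2 ~ N(-2, 9)

a <- 2; b <- -1; c0 <- 3
y <- a * x1 + b * x2 + c0

# Theoretical parameters from the formula
mu_y  <- a * 1 + b * (-2) + c0       # 2 + 2 + 3 = 7
var_y <- a^2 * 4 + b^2 * 9           # 16 + 9 = 25

print(c(sample_mean = mean(y), theory_mean = mu_y))
print(c(sample_var = var(y), theory_var = var_y))
```

  • The sample mean and variance should lie close to the theoretical values \( 7 \) and \( 25 \), up to sampling error.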

  • This is actually a general property of the family of stable distributions which is discussed in greater detail in the recommended reading.