The goal of statistics is to use a numerical summary of data from a small, representative sample to say something general about the larger, unobservable population or phenomenon.
The measures of the population are referred to as parameters.
Parameters are generally unknown and unknowable.
However, if we have a representative sample, we can compute the sample mean.
The sample mean will almost surely not equal the population mean, due to the natural variation (sampling error) that occurs in any given sample.
RVs and probability distributions give us the model for estimating population parameters.
Note: we can only “find” the parameters exactly in very simple examples like games of chance.
Generally, we will have to be satisfied with estimates of the parameters that are uncertain, but also include measures of “how uncertain”.
Suppose we have a sample of \( n \) total measurements of some RV \( X \).
The (arithmetic sample) mean
Given measurements \( x_1,\cdots,x_n \) of the RV \( X \), the sample mean is defined as \[ \text{Sample mean} = \hat{x} = \frac{x_1 +x_2 +\cdots + x_n}{n}= \frac{\sum_{i=1}^n x_i}{n} \]
We remark that \( \hat{x} \) is a fixed numerical value depending on the particular sequence of outcomes \( x_1,\cdots, x_n \) observed.
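For concreteness, here is a minimal sketch in Python with numpy (the measurement values are hypothetical, not from the slides) computing the sample mean exactly as defined above.

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0, 2.2])  # hypothetical measurements of X
n = len(x)
sample_mean = x.sum() / n                # equivalent to np.mean(x)
print(sample_mean)
```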
Sample standard deviation
Given measurements \( x_1,\cdots,x_n \) of the RV \( X \), the sample standard deviation is defined as \[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^n\left(x_i - \hat{x}\right)^2}{n-1}} \]
Sample variance
Given measurements \( x_1,\cdots,x_n \) of the RV \( X \), the sample variance is defined as \[ \hat{\sigma}^2 = \frac{\sum_{i=1}^n\left(x_i - \hat{x}\right)^2}{n-1} \]
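A corresponding sketch for the sample variance and standard deviation, using the same hypothetical measurements as before; note the \( n-1 \) denominator, which numpy exposes through ddof=1.

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0, 2.2])               # hypothetical measurements of X
x_hat = x.mean()                                       # sample mean
sample_var = ((x - x_hat) ** 2).sum() / (len(x) - 1)   # divides by n - 1
sample_sd = np.sqrt(sample_var)
# numpy reproduces these directly with np.var(x, ddof=1) and np.std(x, ddof=1)
print(sample_var, sample_sd)
```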
For the same reasons discussed for the sample mean, the sample standard deviation and variance will tend to differ depending on the particular sequence of outcomes \( x_1,\cdots, x_n \) measured.
This discrepancy is what we call sampling error, in which the random variation in a sample of a fixed size \( n \) upon replication produces differences in the computation of a statistic.
For this reason, we may also consider a probabilistic model for the sample statistic, depending on the replication of measurements.
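A small replication experiment (with a hypothetical population and sample size, chosen only for illustration) makes this sampling error visible: recomputing the sample mean over many replicated samples of the same size gives a slightly different value each time.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 1000                                 # hypothetical sample size and replications
# draw replicated samples from an assumed population, here N(10, 2^2)
samples = rng.normal(loc=10.0, scale=2.0, size=(reps, n))
means = samples.mean(axis=1)                       # one sample mean per replication
# the statistic varies from replication to replication: sampling error
print(means.min(), means.max(), means.std())
```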
Specifically, suppose that we want to obtain an estimate of a population parameter, where the population is modeled with a RV \( X \).
We know that before the data are collected, the observations are considered to be RVs,
\[ X_1, X_2, \cdots , X_n \]
Random sample
The RVs \( X_1 , X_2, \cdots , X_n \) are a random sample of size \( n \) if the \( X_i \)’s are independent RVs and every \( X_i \) has the same probability distribution.
We then say that the measurements we obtain are possible outcomes of the sample variables \( \{X_i\}_{i=1}^n \); any statistic computed from them, such as the sample mean (a linear combination of RVs), is itself treated as a RV with a random outcome, dependent on the realizations of the \( X_i \).
Point estimators
Let \( \{X_i\}_{i=1}^n \) be a random sample. Let \( \theta \) be a parameter of the parent population, defined by the CDF \( P \). If \( h \) is a general function used to compute some statistic estimating \( \theta \), we define the RV \[ \hat{\Theta} = h(X_1, \cdots, X_n) \] to be a point estimator for \( \theta \).
We call the probability distribution of a statistic or estimator as above a sampling distribution.
Sampling Distribution
The probability distribution of a statistic is called a sampling distribution.
In this framework, we will then distinguish between the estimator (a random variable) and the numerical value it might attain on a sample of measurements.
Point estimate
A point estimate of some population parameter \( \theta \) is a single numerical value
\[ \hat{\theta} = h(x_1, \cdots,x_n) \] attained as a particular realization of the RV \( \hat{\Theta} \).
The notion of the “center” of the sampling distribution can be useful as a general criterion for estimators.
Formally, we say that \( \hat{\Theta} \) is an unbiased estimator of \( \theta \) if the expected value of \( \hat{\Theta} \) is equal to \( \theta \).
This is equivalent to saying that the mean of the sampling distribution of \( \hat{\Theta} \) is equal to \( \theta \).
Bias of an Estimator
The point estimator \( \hat{\Theta} \) is an unbiased estimator for the parameter \( \theta \) if \[ \mathbb{E}\left[\hat{\Theta}\right] = \theta \] If the estimator is not unbiased, then the difference \[ \mathbb{E}\left[\hat{\Theta}\right] - \theta \] is called the bias of the estimator \( \hat{\Theta} \). When an estimator is unbiased, the bias is zero; that is, \[ \begin{align} \mathbb{E}\left[\hat{\Theta}\right] - \theta &= \theta - \theta \\ &=0 \end{align} \]
If we consider the expected value to represent the average value over infinitely many replications, it can be shown that the sample mean and the sample variance are unbiased estimators, i.e., \[ \begin{align} \mathbb{E}\left[\hat{X}\right] = \overline{x}, & & \mathbb{E}\left[\hat{\sigma}^2\right] = \sigma^2. \end{align} \]
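As a quick Monte Carlo check of this claim (a sketch only; the normal population with \( \sigma^2 = 4 \), the sample size, and the number of replications are hypothetical choices, not from the slides), averaging the two common variance estimators over many replications shows that dividing by \( n-1 \) is approximately unbiased, while dividing by \( n \) is biased low.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, sigma2 = 10, 100_000, 4.0                     # hypothetical setting
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
var_unbiased = samples.var(axis=1, ddof=1)             # divides by n - 1
var_biased = samples.var(axis=1, ddof=0)               # divides by n
print(var_unbiased.mean())   # close to sigma^2 = 4 (bias approximately zero)
print(var_biased.mean())     # close to (n-1)/n * sigma^2 = 3.6 (negative bias)
```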
However, it can be shown theoretically that the sample standard deviation is a biased estimator of the population standard deviation, i.e.,
\[ \mathbb{E}\left[ \hat{\sigma}\right] \leq \sigma \]
and it consistently underestimates the true standard deviation.
The bias tends to be small, however, and \( \hat{\sigma} \) remains the most practical estimate of the population standard deviation in most cases.
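The same kind of replication experiment suggests the size of this bias for a small sample; the population and sample size below are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma = 5, 100_000, 2.0                 # hypothetical small-sample setting
samples = rng.normal(0.0, sigma, size=(reps, n))
sd_hat = samples.std(axis=1, ddof=1)             # sample standard deviation per replication
print(sd_hat.mean())                             # noticeably below sigma = 2 for n = 5
```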
Recalling that the expected value gives the center of mass of the probability distribution, we should also be interested in the spread of the sampling distribution.
As noted before, the variance is a “natural” measure of spread for theoretical reasons, but it is expressed in the square of the original units.
For this reason, when we talk about the spread of an estimator's sampling distribution, we typically discuss the standard error.
The standard error
Let \( \hat{\Theta} \) be a point estimator of \( \theta \). The standard error of \( \hat{\Theta} \) is its standard deviation, given by \[ \sigma_\hat{\Theta} = \sqrt{\mathrm{var}\left(\hat{\Theta}\right)}. \] If the standard error involves unknown parameters that can be estimated, substitution of those values into the equation above produces an estimated standard error denoted \( \hat{\sigma}_\hat{\Theta} \). It is also common to write the standard error as \( \mathrm{SE}\left(\hat{\Theta}\right) \).
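For the sample mean, whose standard error \( \sigma / \sqrt{n} \) is derived later in these slides, substituting the sample standard deviation for the unknown \( \sigma \) gives the estimated standard error. A minimal sketch, with hypothetical data:

```python
import numpy as np

x = np.array([2.1, 1.9, 2.4, 2.0, 2.2])          # hypothetical measurements
n = len(x)
se_mean = x.std(ddof=1) / np.sqrt(n)             # estimated SE of the sample mean
print(se_mean)
```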
With these constructions in mind, we will now introduce one of the most fundamental results of classical statistics.
This result establishes the central importance of the normal, or Gaussian, distribution among probability distributions.
“Everybody believes in the exponential law of errors [i.e., the normal / Gaussian distribution]: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation.” — Poincaré, Henri, “Calcul des Probabilités.”
The univariate Gaussian distribution
Let the Gaussian RV \( X \) have mean \( \overline{x} \) and standard deviation \( \sigma \). The probability density function is given as \[ \begin{align} p(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\left(x - \overline{x}\right)^2}{2\sigma^2}\right). \end{align} \] We will write \( X \sim N\left(\overline{x}, \sigma^2\right) \) to denote that \( X \) has the density described above.
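As a sanity check on the density formula, the sketch below (with a hypothetical mean and standard deviation) evaluates it directly and compares against scipy.stats.norm, which parameterizes the Gaussian by its standard deviation.

```python
import numpy as np
from scipy.stats import norm

x_bar, sigma = 0.0, 1.5                          # hypothetical mean and standard deviation
x = 1.0
p_manual = np.exp(-(x - x_bar) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
p_scipy = norm.pdf(x, loc=x_bar, scale=sigma)    # scipy uses the standard deviation as scale
print(p_manual, p_scipy)                         # the two values agree
```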
Closure of the Gaussian under linear transformations
Let \( X_1 \) and \( X_2 \) be independent, Gaussian RVs defined \[ \begin{align} X_1\sim N\left(\overline{x}_1 , \sigma_1^2 \right) & & X_2 \sim N\left(\overline{x}_2, \sigma_2^2 \right). \end{align} \] Then for \( a,b,c \in \mathbb{R} \), the linear combination satisfies \[ aX_1 + bX_2 + c \sim N\left(a \overline{x}_1 + b\overline{x}_2 + c, a^2 \sigma_1^2 + b^2 \sigma_2^2\right) \]
This is actually a general property of the family of stable distributions.
The closure property above implies that a Gaussian variable can always be “standardized” as,
\[ \begin{align} X \sim N(\overline{x}, \sigma^2) && \Rightarrow && \frac{X - \overline{x}}{\sigma} \sim N(0, 1). \end{align} \]
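The closure and standardization properties are easy to check by simulation; the coefficients and distribution parameters below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# hypothetical parameters: X1 ~ N(1, 2^2), X2 ~ N(-3, 1.5^2)
a, b, c = 2.0, -1.0, 0.5
x1 = rng.normal(1.0, 2.0, size=1_000_000)
x2 = rng.normal(-3.0, 1.5, size=1_000_000)
y = a * x1 + b * x2 + c
print(y.mean())                       # close to a*1 + b*(-3) + c = 5.5
print(y.var())                        # close to a^2*4 + b^2*2.25 = 18.25
z = (y - 5.5) / np.sqrt(18.25)        # standardizing recovers N(0, 1)
print(z.mean(), z.std())              # close to 0 and 1
```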
The closure of the Gaussian under linear transformations has extremely important implications when we introduce a mechanistic model later.
This is at the basis of results for estimators defined in a class of models known as Gauss-Markov models.
Suppose that a random sample of size \( n \) is taken from a normal population with mean \( \overline{x} \) and variance \( \sigma^2 \).
By definition of a random sample, each observation in this sample, say \( X_1, X_2, \cdots, X_n \), is a normally and independently distributed RV with mean \( \overline{x} \) and variance \( \sigma^2 \).
We conclude that, due to closure of the Gaussian, the sample mean
\[ \hat{X}= \frac{X_1 + X_2 + \cdots + X_n}{n} \]
has a normal distribution with mean
\[ \begin{align} \mathbb{E}\left[\hat{X}\right] &= \frac{\mathbb{E}\left[X_1\right] + \cdots + \mathbb{E}\left[X_n\right]}{n} = \overline{x} \end{align} \]
and variance
\[ \sigma^2_\hat{X}:= \mathbb{E}\left[\left(\hat{X} - \overline{x}\right)^2\right] = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}. \]
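A simulation sketch of this sampling distribution, with hypothetical values of \( \overline{x} \), \( \sigma \), and \( n \):

```python
import numpy as np

rng = np.random.default_rng(4)
x_bar, sigma, n, reps = 5.0, 2.0, 16, 50_000     # hypothetical values
samples = rng.normal(x_bar, sigma, size=(reps, n))
means = samples.mean(axis=1)                     # one sample mean per replication
print(means.mean())      # close to x_bar = 5
print(means.var())       # close to sigma^2 / n = 0.25
```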
More generally, if we are sampling from a population that has an unknown probability distribution, the sampling distribution of the sample mean will still be approximately Gaussian with mean \( \overline{x} \) and variance \( \frac{\sigma^2}{n} \) if the sample size \( n \) is large.
This is one of the most useful theorems in statistics, called the central limit theorem:
The central limit theorem (CLT)
Let \( X_1 , X_2 , \cdots , X_n \) be a random sample of size \( n \) taken from a population with mean \( \overline{x} \) and finite variance \( \sigma^2 \) and \( \hat{X} \) be the sample mean. Then the limiting form of the distribution of \[ Z = \frac{\hat{X} - \overline{x}}{\frac{\sigma}{\sqrt{n}}} \] as \( n \rightarrow \infty \) is the standard normal distribution.
Put another way, for \( n \) sufficiently large, \( \hat{X} \) has approximately a \( N\left(\overline{x}, \frac{\sigma^2}{n}\right) \) distribution, as illustrated in the figure below.
Courtesy of Mathieu ROUAUD, CC BY-SA 4.0, via Wikimedia Commons
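A simulation sketch of the CLT with a clearly non-Gaussian (exponential) population; the sample size and number of replications are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 50, 50_000
# an exponential population with mean 1 and variance 1: clearly non-Gaussian
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))   # standardized sample means
# z is approximately standard normal: compare a few quantiles
print(np.quantile(z, [0.025, 0.5, 0.975]))   # roughly -1.96, 0, 1.96, up to residual skew
```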