The following topics will be covered in this lecture:

- Sample statistics
- Sample random variables
- Sampling distributions
- The univariate Gaussian distribution
- Properties of the univariate Gaussian
- The central limit theorem

The goal of statistics is to use **a numerical summary of data** from a **small, representative sample** to say something **general** about the **larger, unobservable population or phenomena**.

- The measures of the **population** are referred to as **parameters**.
- **Parameters** are generally **unknown and unknowable**.
  - For example, we cannot exactly compute the mean sea-surface temperature globally, as it is impossible to take all such measurements.
- However, if we have a **representative sample**, we can compute the **sample mean**.
  - Numerical values like the **sample mean** computed from data are referred to as **statistics**.
- The **sample mean** will almost surely **not equal** the **population mean**, due to the natural variation **(sampling error)** that occurs in **any given sample**.
  - However, if we have a good **probabilistic model** for the population, we can use the **sample statistic** to estimate the general, unknown **population parameter**.
- **RVs** and **probability distributions** give us the **model** for estimating **population parameters**.
- **Note:** we can only **“find”** the parameters exactly in **very simple examples** like games of chance.
- Generally, we will have to be satisfied with **estimates of the parameters** that are uncertain, but also **include measures of “how uncertain”**.

Suppose we have a sample of **\( n \) total measurements of some RV \( X \)**.

- We will denote these measurements \( x_1, x_2, \cdots, x_n \in \mathbb{R} \), where these refer to fixed numerical values.
- These may correspond to the value that \( X \) **attains upon \( n \) independently replicated trials**.

The (arithmetic sample) mean

Given measurements \( x_1,\cdots,x_n \) of the RV \( X \), the **sample mean** is defined as \[ \text{Sample mean} = \hat{x} = \frac{x_1 +x_2 +\cdots + x_n}{n}= \frac{\sum_{i=1}^n x_i}{n} \]

We remark that \( \hat{x} \) is a **fixed numerical value** **depending on the particular sequence of outcomes** \( x_1,\cdots, x_n \) observed.

- Due to this fact, with respect to a new sample of size \( n \), we may attain a new value for the sample mean.
- We can similarly define the sample standard deviation and variance as follows.

Sample standard deviation

Given measurements \( x_1,\cdots,x_n \) of the RV \( X \), the **sample standard deviation** is defined as \[ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^n\left(x_i - \hat{x}\right)^2}{n-1}} \]

- Note that the denominator \( n-1 \) in the above accounts for the fact that one degree of freedom has been utilized in the computation of \( \hat{x} \).

Sample variance

Given measurements \( x_1,\cdots,x_n \) of the RV \( X \), the **sample variance** is defined as \[ \hat{\sigma}^2 = \frac{\sum_{i=1}^n\left(x_i - \hat{x}\right)^2}{n-1} \]
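The three definitions above can be sketched in a few lines of Python. This is only an illustration, not part of the original lecture: the function names and the five measurements are hypothetical, chosen so that the arithmetic can be checked by hand.

```python
import math


def sample_mean(xs):
    """Arithmetic sample mean: the sum of the measurements divided by n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance, using the n - 1 denominator from the definition."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    x_hat = sample_mean(xs)
    return sum((x - x_hat) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: the square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


# Hypothetical measurements x_1, ..., x_5 (made up for this example).
measurements = [2.0, 4.0, 4.0, 5.0, 10.0]

x_hat = sample_mean(measurements)        # 25 / 5 = 5.0
var_hat = sample_variance(measurements)  # (9 + 1 + 1 + 0 + 25) / 4 = 9.0
std_hat = sample_std(measurements)       # sqrt(9.0) = 3.0

print(x_hat, var_hat, std_hat)
```

Python's built-in `statistics.mean`, `statistics.variance` and `statistics.stdev` use the same \( n-1 \) convention, so in practice they can replace the hand-written helpers.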

For the same reasons discussed for the sample mean, the sample standard deviation and variance will tend to differ depending on the particular sequence of outcomes \( x_1,\cdots, x_n \) measured.

This discrepancy is what we call **sampling error**, in which the **random variation in a sample of a fixed size \( n \) upon replication produces differences in the computation of a statistic**.

For this reason, we may also consider a **probabilistic model for the sample statistic**, **depending on the replication of measurements**.

Specifically, suppose that we want to **obtain an estimate of a population parameter**, where the **population is modeled with a RV \( X \)**.

We know that before the data are collected, the observations are considered to be RVs,

- i.e., we treat an independent sequence of measurements of \( X \),

\[ X_1, X_2, \cdots , X_n \]

- as RVs all drawn from a parent distribution \( X \sim P \) (where the CDF will define the distribution).

**Random sample**

The RVs \( X_1 , X_2, \cdots , X_n \) are a **random sample** of size \( n \) if the \( X_i \)’s are independent RVs and every \( X_i \) has the same probability distribution.

We then say that the measurements we obtain are possible outcomes of the sample variables \( \{X_i\}_{i=1}^n \);

- particularly, if we make a computation of the sample mean, \[ \hat{X} = \frac{1}{n} \sum_{i=1}^n X_i \] the above is **treated as a RV (a linear combination of RVs) which has a random outcome**, dependent on the realizations of the \( X_i \).

- More generally, any function of the observations, i.e., any statistic, is also modeled as a RV.
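To make the distinction between the RV \( \hat{X} \) and its realizations concrete, the following sketch replicates a small experiment several times. It assumes, purely for illustration, that the population is a fair six-sided die (a game of chance, so the population mean \( 3.5 \) is known exactly); the seed and sample size are arbitrary choices.

```python
import random


def roll_sample(n, rng):
    """One random sample X_1, ..., X_n of independent fair six-sided die rolls."""
    return [rng.randint(1, 6) for _ in range(n)]


def sample_mean(xs):
    return sum(xs) / len(xs)


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
n = 10                  # sample size, held fixed across replications

# Five replications of the same experiment: each one yields its own
# realization of the sample mean, scattered around the population mean 3.5.
realizations = [sample_mean(roll_sample(n, rng)) for _ in range(5)]

for j, x_hat in enumerate(realizations, start=1):
    print(f"replication {j}: sample mean = {x_hat:.2f}")
```

Each printed value is a fixed number once observed, but which number appears depends on the random draw; that dependence is what it means to model the statistic as a RV.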

Point estimators

Let \( \{X_j\}_{j=1}^n \) be a random sample. Let \( \theta \) be a parameter of the parent population, defined by the CDF \( P \). If \( h \) is a general function used to compute some statistic estimating \( \theta \), we thus define the RV \[ \hat{\Theta} = h(X_1, \cdots, X_n) \] to be a **point estimator** for \( \theta \).

We call the probability distribution of a statistic or estimator as above a **sampling distribution**.

**Sampling Distribution**

The probability distribution of a statistic is called a **sampling distribution**.

In this framework, we will then distinguish between the estimator (a random variable) and the numerical value it might attain on a sample of measurements.

**Point estimate**

A **point estimate** of some population parameter \( \theta \) is a single numerical value \[ \hat{\theta} = h(x_1, \cdots,x_n) \] attained as a particular realization of the RV \( \hat{\Theta} \).

The notion of the “center” of the sampling distribution can be useful as a general criterion for estimators.

Formally, we say that \( \hat{\Theta} \) is an **unbiased estimator** of \( \theta \) if the **expected value** of \( \hat{\Theta} \) is equal to \( \theta \).

This is equivalent to saying that the mean of the sampling distribution of \( \hat{\Theta} \) is equal to \( \theta \).

Bias of an Estimator

The point estimator \( \hat{\Theta} \) is an **unbiased estimator** for the parameter \( \theta \) if \[ \mathbb{E}\left[\hat{\Theta}\right] = \theta \] If the estimator is not unbiased, then the difference \[ \mathbb{E}\left[\hat{\Theta}\right] - \theta \] is called the **bias of the estimator** \( \hat{\Theta} \). When an estimator is unbiased, the **bias is zero**; that is, \[ \begin{align} \mathbb{E}\left[\hat{\Theta}\right] - \theta &= \theta - \theta \\ &=0 \end{align} \]

If we **consider the expected value to represent the average value over infinite replications**, the above says that “**over infinite replications of a random sample** of size \( n \), the **average value of the point estimator** \( \hat{\Theta} \) will **equal the true population parameter** \( \theta \)”.

- Both the **sample mean** \[ \hat{X}= \frac{1}{n}\sum_{i=1}^n X_i \] and the **sample variance** \[ \hat{\sigma}^2 = \frac{\sum_{i=1}^n \left(X_i - \hat{X}\right)^2}{n-1} \] are **unbiased estimators**, i.e., \[ \begin{align} \mathbb{E}\left[\hat{X}\right] = \overline{x}, & & \mathbb{E}\left[\hat{\sigma}^2\right] = \sigma^2. \end{align} \]
- However, because the square root is a concave function, Jensen's inequality shows that the **sample standard deviation** is a **biased estimator of the population standard deviation**, i.e., \[ \mathbb{E}\left[ \hat{\sigma}\right] \leq \sigma, \] and it **systematically underestimates the true standard deviation**.
- The bias tends to be small, however, and the sample standard deviation is still the most practical estimate of the population standard deviation most of the time.
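The statements about bias can be checked empirically by averaging each estimator over many replicated samples. The sketch below is a Monte Carlo illustration under assumed settings (a Gaussian population with \( \sigma = 2 \), sample size \( n = 5 \), and an arbitrary seed); none of these values come from the lecture itself.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary fixed seed for repeatability
sigma = 2.0             # true population standard deviation (assumed)
n = 5                   # small sample size, where the bias is easiest to see
m = 10000               # number of replicated samples

variances = []
std_devs = []
for _ in range(m):
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    variances.append(statistics.variance(sample))  # n - 1 denominator
    std_devs.append(statistics.stdev(sample))      # square root of the above

# Averages over replications approximate the expected values.
mean_variance = statistics.fmean(variances)  # should be close to sigma**2 = 4.0
mean_std = statistics.fmean(std_devs)        # should fall below sigma = 2.0

print(f"average sample variance: {mean_variance:.3f} (sigma^2 = {sigma ** 2})")
print(f"average sample std dev:  {mean_std:.3f} (sigma = {sigma})")
```

With these settings the average sample variance should land near \( \sigma^2 \), while the average sample standard deviation should sit visibly below \( \sigma \) (roughly \( 0.94\sigma \) for Gaussian data with \( n = 5 \)). Increasing `n` shrinks the gap.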

Recalling that the expected value gives the center of mass of the probability distribution, we should also be interested in the spread of the sampling distribution.

As noted before, the variance is a “natural” measure of spread mathematically for theoretical reasons, but it is expressed in the square of the original units.

For this reason, when we talk about the spread of an estimator's sampling distribution, we typically discuss the **standard error**.

**The standard error**

Let \( \hat{\Theta} \) be a point estimator of \( \theta \). The **standard error** of \( \hat{\Theta} \) is its standard deviation, given by \[ \sigma_{\hat{\Theta}} = \sqrt{\mathrm{var}\left(\hat{\Theta}\right)}. \] If the standard error involves **unknown parameters that can be estimated**, substitution of those values into the equation above produces an estimated standard error denoted \( \hat{\sigma}_{\hat{\Theta}} \). It is also common to write the standard error as \( \mathrm{SE}\left(\hat{\Theta}\right) \).

With these constructions in mind, we will now introduce one of the most fundamental results of classical statistics.

This result establishes the normal or Gaussian distribution in its central importance among distributions.

- The **Gaussian distribution** is considered the **most prominent distribution in statistics**.
- It is a continuous probability distribution that has a **bell-shaped probability density** function.
- The **Gaussian distribution** arises from the **central limit theorem (CLT)**:
  - under weak conditions, the **sum of a large number of independent RVs drawn from the same distribution is distributed approximately normally** **irrespective of the form of the original distribution**.
- This gives mathematical justification for why we see normally distributed data quite often in practice, as was noted by Henri Poincare.
- In addition to the ubiquity of the normal distribution, it can be easily manipulated analytically in equations;
  - this enables one to derive a large number of results in explicit form.
- Due to these two aspects, the normal distribution is used extensively in theory and practice.

“Everybody believes in the exponential law of errors [i.e., the normal / Gaussian distribution]: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation.” — Poincare, Henri “Calcul Des Probabilités.”

- Unlike how we defined the density function \( p \) and used this to compute \( \overline{x} \) and \( \sigma \) formerly, we will reverse this for the normal.
- That is, we will use \( \overline{x} \) and \( \sigma \) to define the density of the normal and **parametrize the distribution**.
- Let us use the following notation for compactness where \[ \exp(x) = e^{x}. \]

**The univariate Gaussian distribution**

Let the **Gaussian RV** \( X \) have mean \( \overline{x} \) and standard deviation \( \sigma \). The probability density function is given as \[ \begin{align} p(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{\left(x - \overline{x}\right)^2}{2\sigma^2}\right). \end{align} \] We will write \( X \sim N\left(\overline{x}, \sigma^2\right) \) to denote that \( X \) has the density described above.

- Recall how we considered \( \overline{x} \) to be a measure of center and \( \sigma \) a measure of spread.
- If we vary these two values, we can change the center of mass and the spread of the normal distribution.
- In the case that \( \overline{x}=0 \) and \( \sigma=1 \), we call \( N(0, 1) \) the **standard normal distribution**.
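As a quick numerical check of the density formula, the sketch below evaluates \( p(x) \) directly and confirms that it integrates to approximately one. The helper names `gaussian_pdf` and `integrate`, and the parameter values, are hypothetical choices for this illustration; `statistics.NormalDist` from the Python standard library is used only as an independent reference.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * sigma)


def integrate(f, a, b, steps=10000):
    """Midpoint Riemann sum of f over the interval [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


mean, sigma = 1.0, 2.0  # illustrative parameter values

# Integrating over mean +/- 10 sigma captures essentially all of the mass.
total = integrate(lambda x: gaussian_pdf(x, mean, sigma),
                  mean - 10 * sigma, mean + 10 * sigma)

# Cross-check one value against the standard library implementation.
ours = gaussian_pdf(0.5, mean, sigma)
reference = NormalDist(mu=mean, sigma=sigma).pdf(0.5)

# The standard normal N(0, 1) peaks at x = 0 with height 1 / sqrt(2 * pi).
peak = gaussian_pdf(0.0, 0.0, 1.0)

print(f"total probability: {total:.6f}")
print(f"pdf at 0.5: ours = {ours:.6f}, reference = {reference:.6f}")
print(f"standard normal peak: {peak:.6f}")
```

Changing `mean` shifts the bell left or right and changing `sigma` widens or narrows it, while the total area stays at one.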

- Another useful property of the **family of Gaussian distributions** is that it is **closed under linear transformations**.

Closure of the Gaussian under linear transformations

Let \( X_1 \) and \( X_2 \) be independent, Gaussian RVs defined \[ \begin{align} X_1\sim N\left(\overline{x}_1 , \sigma_1^2 \right) & & X_2 \sim N\left(\overline{x}_2, \sigma_2^2 \right). \end{align} \] Then for \( a,b,c \in \mathbb{R} \), the linear combination satisfies \[ aX_1 + bX_2 + c \sim N\left(a \overline{x}_1 + b\overline{x}_2 + c, a^2 \sigma_1^2 + b^2 \sigma_2^2\right) \]

This is actually a **general property** of the family of **stable distributions**.

The closure property above implies that a Gaussian variable can always be “standardized” as,

\[ \begin{align} X \sim N(\overline{x}, \sigma^2) && \Rightarrow && \frac{X - \overline{x}}{\sigma} \sim N(0, 1). \end{align} \]
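The closure property and the standardization above can be illustrated by simulation. The sketch below draws independent Gaussian values, forms the linear combination \( aX_1 + bX_2 + c \), and compares the empirical mean and variance with the predicted ones. All parameter values and the seed are arbitrary assumptions; a simulation of this kind checks consistency with the result rather than proving it.

```python
import random
import statistics

rng = random.Random(2)      # arbitrary fixed seed
mean1, sigma1 = 1.0, 2.0    # parameters of X_1 (illustrative)
mean2, sigma2 = -3.0, 0.5   # parameters of X_2 (illustrative)
a, b, c = 2.0, -1.0, 4.0    # coefficients of the linear combination
m = 50000                   # number of simulated draws

combo = [a * rng.gauss(mean1, sigma1) + b * rng.gauss(mean2, sigma2) + c
         for _ in range(m)]

# Parameters predicted by the closure property.
predicted_mean = a * mean1 + b * mean2 + c               # 2 + 3 + 4 = 9.0
predicted_var = a ** 2 * sigma1 ** 2 + b ** 2 * sigma2 ** 2  # 16 + 0.25 = 16.25

empirical_mean = statistics.fmean(combo)
empirical_var = statistics.variance(combo)

# Standardize using the predicted parameters; the result should behave
# like N(0, 1), e.g. about 95% of values within +/- 1.96.
z = [(y - predicted_mean) / predicted_var ** 0.5 for y in combo]
z_mean = statistics.fmean(z)
z_std = statistics.stdev(z)
within = sum(abs(v) < 1.96 for v in z) / m

print(f"mean: predicted {predicted_mean}, empirical {empirical_mean:.3f}")
print(f"variance: predicted {predicted_var}, empirical {empirical_var:.3f}")
print(f"standardized: mean {z_mean:.3f}, std {z_std:.3f}, within 1.96: {within:.3f}")
```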

The closure of the Gaussian under linear transformations has extremely important implications when we introduce a mechanistic model later.

This is at the basis of results for estimators defined in a class of models known as **Gauss-Markov models**.

- We will return to this subject shortly.

Suppose that a random sample of size \( n \) is taken from a **normal population** with mean \( \overline{x} \) and variance \( \sigma^2 \).

By definition of a **random sample**, each observation in this sample, say, \( X_1, X_2, \cdots, X_n \), is a normally and independently distributed RV with mean \( \overline{x} \) and variance \( \sigma^2 \).

We conclude that, due to closure of the Gaussian, the sample mean

\[ \hat{X}= \frac{X_1 + X_2 + \cdots + X_n}{n} \]

has a normal distribution with mean

\[ \begin{align} \mathbb{E}\left[\hat{X}\right] &= \frac{\mathbb{E}\left[X_1\right] + \cdots + \mathbb{E}\left[X_n\right]}{n} = \overline{x} \end{align} \]

and variance

\[ \sigma^2_{\hat{X}}:= \mathbb{E}\left[\left(\hat{X} - \overline{x}\right)^2\right] = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n} \]
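The result \( \sigma^2_{\hat{X}} = \sigma^2 / n \) means the standard error of the sample mean is \( \sigma / \sqrt{n} \). The sketch below checks this by replication and also shows the estimated standard error computed from a single sample, which is what is available in practice when \( \sigma \) is unknown. The population parameters, sample size, and seed are assumptions made for the illustration.

```python
import math
import random
import statistics

rng = random.Random(3)        # arbitrary fixed seed
pop_mean, sigma = 10.0, 3.0   # illustrative population parameters
n = 25                        # sample size
m = 20000                     # number of replications


def draw_sample():
    """One random sample of size n from the normal population."""
    return [rng.gauss(pop_mean, sigma) for _ in range(n)]


# Realizations of the sample mean over m replications.
sample_means = [statistics.fmean(draw_sample()) for _ in range(m)]

theoretical_se = sigma / math.sqrt(n)           # 3 / 5 = 0.6
empirical_se = statistics.stdev(sample_means)   # spread of the replicated means
center = statistics.fmean(sample_means)         # should be close to pop_mean

# With a single sample and sigma unknown, substitute the sample
# standard deviation to obtain an estimated standard error.
one_sample = draw_sample()
estimated_se = statistics.stdev(one_sample) / math.sqrt(n)

print(f"theoretical SE: {theoretical_se:.3f}")
print(f"empirical SE over replications: {empirical_se:.3f}")
print(f"estimated SE from one sample: {estimated_se:.3f}")
```

The estimated standard error varies from sample to sample, since it is itself a statistic, whereas the theoretical value is fixed by \( \sigma \) and \( n \).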

More generally, if we are sampling from a population that has an unknown probability distribution, the **sampling distribution of the sample mean** will still be **approximately Gaussian** with mean \( \overline{x} \) and variance \( \frac{\sigma^2}{n} \) if the sample size \( n \) is large.

This is one of the most useful theorems in statistics, called the **central limit theorem**:

**The central limit theorem (CLT)**

Let \( X_1 , X_2 , \cdots , X_n \) be a random sample of size \( n \) taken from a population with mean \( \overline{x} \) and finite variance \( \sigma^2 \) and \( \hat{X} \) be the sample mean. Then the limiting form of the distribution of \[ Z = \frac{\hat{X} - \overline{x}}{\frac{\sigma}{\sqrt{n}}} \] as \( n \rightarrow \infty \) is the **standard normal distribution**.

Put another way, for \( n \) sufficiently large, \( \hat{X} \) has **approximately** a \( N\left(\overline{x}, \frac{\sigma^2}{n}\right) \) distribution; this says the following.

- Suppose we take a sample of size \( n \) and compute the sample mean \( \hat{x} \).
- Then suppose we replicate this sample and record the observed realizations for the sample mean \( \hat{x}_1, \hat{x}_2, \cdots \).
- If the sample size \( n \) is large, a histogram of these data points \( \hat{x}_1, \cdots \) will be approximately bell shaped with the following properties:
  - the bell will be centered approximately at \( \overline{x} \), the true population mean;
  - the spread of the data around the center will be given approximately by the standard deviation \( \frac{\sigma}{\sqrt{n}} \).
- Particularly, if \( n \) is very large, the observed sample means will tend to be very close to the center (the true mean).
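The replication experiment described above can be sketched with a population that is clearly not Gaussian. The example below assumes, for illustration only, an exponential population with rate \( 0.5 \) (so both the population mean and standard deviation equal \( 2 \)); the sample size, number of replications, and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(4)   # arbitrary fixed seed
rate = 0.5               # exponential rate (illustrative)
pop_mean = 1.0 / rate    # population mean of the exponential = 2.0
pop_sigma = 1.0 / rate   # population standard deviation = 2.0
n = 100                  # sample size
m = 10000                # number of replications


def replicated_mean():
    """Sample mean of one random sample of size n from the skewed population."""
    return statistics.fmean([rng.expovariate(rate) for _ in range(n)])


sample_means = [replicated_mean() for _ in range(m)]

center = statistics.fmean(sample_means)   # expected near pop_mean
spread = statistics.stdev(sample_means)   # expected near pop_sigma / sqrt(n) = 0.2

# Standardize each realization as in the statement of the CLT.
z_scores = [(x_hat - pop_mean) / (pop_sigma / math.sqrt(n)) for x_hat in sample_means]

# For a standard normal, about 95% of values lie within +/- 1.96.
within = sum(abs(z) < 1.96 for z in z_scores) / m

print(f"center of sample means: {center:.3f} (population mean {pop_mean})")
print(f"spread of sample means: {spread:.3f} (sigma / sqrt(n) = {pop_sigma / math.sqrt(n):.3f})")
print(f"fraction of |Z| < 1.96: {within:.3f}")
```

Even though individual exponential draws are strongly skewed, the replicated sample means should cluster around the population mean with spread close to \( \sigma/\sqrt{n} \). Rerunning with a small `n` (say 5) should show a visibly poorer Gaussian approximation.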

- As a visualization of the concept, suppose that we have a random sample indexed by \( j \) \[ X_{1,j}, \cdots, X_{n,j}, \] where \( j \) refers to the replication number.
- We will make replications for \( j=1,\cdots,m \) and get a RV for sample mean indexed by \( j \), \[ \hat{X}_j = \frac{1}{n}\sum_{i=1}^n X_{i,j}. \]
- When we observe a realization of \( \hat{X}_j=\hat{x}_j \) or respectively the sample \[ X_{1,j}=x_{1,j}, \cdots, X_{n,j}=x_{n,j}, \] we record these fixed numerical values.