# The central limit theorem continued and general concepts of point estimation

04/05/2021

## Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

## Outline

• The following topics will be covered in this lecture:

• A review of the central limit theorem
• Applications of the central limit theorem
• Approximate sampling distribution of a difference in sample means
• General concepts in point estimation
• Bias of estimators
• Variance of estimators
• Standard Error

## A review of the central limit theorem

• Suppose that we want to obtain an estimate of a population parameter, where the population is modeled with a random variable $$X$$.

• We know that before the data are collected, the observations are considered to be random variables,

• i.e., we treat an independent sequence of measurements of $$X$$,

$X_1, X_2, \cdots , X_n$

• as random variables all drawn from a parent distribution $$X \sim F(x)$$ (where the CDF will define the distribution).
**Random sample:** The random variables $$X_1, X_2, \cdots, X_n$$ are a random sample of size $$n$$ if the $$X_i$$’s are independent random variables and every $$X_i$$ has the same probability distribution.
• We then say that the measurements we obtain are possible outcomes of the sample variables $$\{X_i\}_{i=1}^n$$; particularly, if we make a computation of the sample mean,

$\overline{X} = \frac{1}{n} \sum_{i=1}^n X_i$

the above is treated as a random variable (a linear combination of random variables) which has a random outcome, dependent on the realizations of the $$X_i$$.
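As a minimal sketch of this idea (the exponential population and the sample size are illustrative assumptions, not from the lecture), we can draw two independent samples of the same size and see that the sample mean $$\overline{X}$$ takes a different realized value each time:

```python
# Hypothetical illustration: the sample mean is itself a random variable,
# so independent samples of the same size yield different realizations.
import random

random.seed(0)

def sample_mean(n, lam=1.0):
    """Draw n iid Exponential(lam) observations and return their mean."""
    xs = [random.expovariate(lam) for _ in range(n)]
    return sum(xs) / n

# Two independent samples of size n = 50 give two distinct realizations,
# both fluctuating around the true population mean 1 / lam = 1:
xbar_1 = sample_mean(n=50)
xbar_2 = sample_mean(n=50)
print(xbar_1, xbar_2)
```

Repeating the call produces a new outcome each time, which is exactly the sense in which $$\overline{X}$$ is a random variable with its own sampling distribution.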

### A review of the central limit theorem

• Generally, if we are sampling from a population that has an unknown probability distribution, the sampling distribution of the sample mean will still be approximately normal with mean $$\mu$$ and variance $$\frac{\sigma^2}{n}$$ if the sample size $$n$$ is large.

• This is one of the most useful theorems in statistics, called the central limit theorem:

**The central limit theorem:** Let $$X_1, X_2, \cdots, X_n$$ be a random sample of size $$n$$ taken from a population with mean $$\mu$$ and finite variance $$\sigma^2$$, and let $$\overline{X}$$ be the sample mean. Then the limiting form of the distribution of

$Z = \frac{\overline{X} - \mu}{\sigma / \sqrt{n}}$

as $$n \rightarrow \infty$$ is the standard normal distribution.
• Put another way, for $$n$$ sufficiently large, $$\overline{X}$$ has approximately a $$N\left(\mu, \frac{\sigma^2}{n}\right)$$ distribution; this says the following.

• Suppose we take a sample of size $$n$$ and compute the sample mean $$\overline{X}$$.
• Then suppose we replicate this sample and record the observed realizations for the sample mean $$\overline{x}_1, \overline{x}_2, \cdots$$.
• If the sample size $$n$$ is large, these data points $$\overline{x}_1, \cdots$$ will be approximately bell-shaped with the following properties:
• the bell will be centered approximately at $$\mu$$, the true population mean;
• the spread of the data around the center will be given approximately by the standard deviation $$\frac{\sigma}{\sqrt{n}}$$.
• Particularly, if $$n$$ is very large, the observed sample means will be very close to the center (the true mean).
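The replication procedure above can be sketched numerically (the skewed exponential population, sample size, and number of replications are illustrative assumptions): even though the population is far from normal, the observed sample means should cluster around $$\mu$$ with spread close to $$\sigma / \sqrt{n}$$.

```python
# Hypothetical illustration of the CLT: replicate the sample mean m times
# from a skewed Exponential(1) population (mu = 1, sigma = 1) and check
# that the means center near mu with spread near sigma / sqrt(n) = 0.1.
import math
import random

random.seed(1)

n, m = 100, 2000          # sample size and number of replications
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(m)]

center = sum(means) / m
spread = math.sqrt(sum((x - center) ** 2 for x in means) / (m - 1))

print(center)             # approximately mu = 1
print(spread)             # approximately sigma / sqrt(n) = 0.1
```

A histogram of `means` would show the approximate bell shape described above, even though each individual observation comes from a strongly right-skewed distribution.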

### Central limit theorem continued

• As a visualization of the concept, suppose again that we have a random sample indexed by $$j$$ $X_{j,1}, \cdots, X_{j,n}.$
• We will make replications for $$j=1,\cdots,m$$, obtaining a sample mean random variable for each replication $$j$$, $\overline{X}_j = \frac{1}{n}\sum_{i=1}^n X_{j,i}.$
• When we observe a realization of $$\overline{X}_j=\overline{x}_j$$ or respectively the sample $X_{j,1}=x_{j,1}, \cdots, X_{j,n}=x_{j,n},$ we record these fixed numerical values.
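The replications indexed by $$j$$ can be sketched as follows (the uniform population, sample size, and number of replications are illustrative assumptions): each replication yields a fixed numerical value $$\overline{x}_j$$, and standardizing these with the CLT scaling should place roughly 95% of the z-values inside $$(-1.96, 1.96)$$.

```python
# Hypothetical illustration: for each replication j, record a realization
# xbar_j of the sample mean and standardize it with the CLT scaling
# (xbar_j - mu) / (sigma / sqrt(n)); the z-values behave like N(0, 1) draws.
import math
import random

random.seed(2)

mu, sigma = 0.5, math.sqrt(1.0 / 12.0)   # Uniform(0, 1) population
n, m = 200, 1000                          # sample size and replications

z_values = []
for j in range(m):
    xbar_j = sum(random.random() for _ in range(n)) / n  # realization of X-bar_j
    z_values.append((xbar_j - mu) / (sigma / math.sqrt(n)))

coverage = sum(abs(z) < 1.96 for z in z_values) / m
print(coverage)   # approximately 0.95, as the standard normal predicts
```

Each pass through the loop is one replication $$j$$: the random variables $$X_{j,1}, \cdots, X_{j,n}$$ are realized as fixed numbers, and only then is the observed value $$\overline{x}_j$$ computed.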