04/05/2021
The following topics will be covered in this lecture:
Suppose that we want to obtain an estimate of a population parameter, where the population is modeled with a random variable \( X \).
We know that before the data are collected, the observations are considered to be random variables,
\[ X_1, X_2, \cdots , X_n \]
Random sample
The random variables \( X_1 , X_2, \cdots , X_n \) are a random sample of size \( n \) if the \( X_i \)’s are independent random variables and every \( X_i \) has the same probability distribution.
The measurements we obtain are then realizations (possible outcomes) of the sample variables \( \{X_i\}_{i=1}^n \). In particular, if we compute the sample mean,
\[ \overline{X} = \frac{1}{n} \sum_{i=1}^n X_i \]
then \( \overline{X} \) is itself a random variable (a linear combination of random variables), whose value depends on the realizations of the \( X_i \).
Generally, if we are sampling from a population that has an unknown probability distribution, the sampling distribution of the sample mean will still be approximately normal with mean \( \mu \) and variance \( \frac{\sigma^2}{n} \) if the sample size \( n \) is large.
This is one of the most useful theorems in statistics, called the central limit theorem:
The central limit theorem
Let \( X_1 , X_2 , \cdots , X_n \) be a random sample of size \( n \) taken from a population with mean \( \mu \) and finite variance \( \sigma^2 \) and \( \overline{X} \) be the sample mean. Then the limiting form of the distribution of \[ Z = \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \] as \( n \rightarrow \infty \) is the standard normal distribution.
Put another way, for sufficiently large \( n \), \( \overline{X} \) has approximately a \( N\left(\mu, \frac{\sigma^2}{n}\right) \) distribution, as the following figure illustrates.
Courtesy of Mathieu ROUAUD, CC BY-SA 4.0, via Wikimedia Commons
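The statement above can be checked with a quick simulation. The sketch below assumes a Uniform(0, 1) population, which is our own illustrative choice with mean 1/2 and variance 1/12. It draws many samples of size n, computes each sample mean, and compares the empirical mean and variance of those means with \( \mu \) and \( \frac{\sigma^2}{n} \).

```r
# Illustrative sketch: simulate the sampling distribution of the sample mean
# for an assumed Uniform(0, 1) population.
set.seed(1)          # fixed seed so the sketch is reproducible
n <- 30              # sample size (assumed)
reps <- 10000        # number of simulated samples

mu <- 1 / 2          # mean of Uniform(0, 1)
sigma2 <- 1 / 12     # variance of Uniform(0, 1)

# Each replication draws n observations and records their sample mean
xbar <- replicate(reps, mean(runif(n)))

mean(xbar)           # should be close to mu = 0.5
var(xbar)            # should be close to sigma2 / n
sigma2 / n

# Compare a simulated probability with the CLT normal approximation
mean(xbar < 0.45)
pnorm(0.45, mean = mu, sd = sqrt(sigma2 / n))
```

The uniform population is far from bell-shaped. Even so, the sample means behave very nearly like draws from \( N\left(\mu, \frac{\sigma^2}{n}\right) \).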
The central limit theorem is the underlying reason why many of the random variables encountered in engineering and science are normally distributed.
The observed variable results from a series of underlying disturbances that act together to create a central limit effect.
It is important, however, to consider when the sample size is large enough for the central limit theorem to apply.
The answer depends on how close the underlying distribution is to the normal:
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
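One way to see how the required sample size depends on the shape of the population is to use a strongly skewed population. The sketch below assumes an Exponential(rate = 1) population, our own choice with \( \mu = 1 \) and \( \sigma = 1 \). For this population the sample mean has an exact Gamma(shape = n, rate = n) distribution, so no simulation is needed. The sketch compares the exact probability that \( \overline{X} \) exceeds \( \mu + 2\sigma/\sqrt{n} \) with its CLT approximation.

```r
# Sketch: how quickly does the normal approximation improve for a skewed population?
# Assumed population: Exponential(rate = 1), so mu = 1 and sigma = 1.
# The mean of n such observations is exactly Gamma(shape = n, rate = n).
mu <- 1
sigma <- 1
sizes <- c(2, 10, 40, 200)

# Exact probability that the sample mean exceeds mu + 2 * sigma / sqrt(n)
exact <- sapply(sizes, function(n)
  pgamma(mu + 2 * sigma / sqrt(n), shape = n, rate = n, lower.tail = FALSE))

# Under the CLT approximation this probability is P(Z > 2) for every n
clt <- pnorm(2, lower.tail = FALSE)

data.frame(n = sizes, exact = exact, clt = clt, abs_error = abs(exact - clt))
```

At n = 2 the exact tail probability (about 0.047) is roughly twice the normal value (about 0.023), and the gap narrows as n grows. For a normal population, by contrast, the two values would agree exactly for every n.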
Recall that we are trying to compute
\[ P\left(\overline{X} < 95\right) \]
where \( \overline{X} \) is normally distributed with \( \mu_{\overline{X}}=100 \) and \( \sigma_{\overline{X}}= \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}} = 2 \).
We can compute the corresponding standard normal z-score as
\[ \frac{95-100}{2} = -2.5 \]
In R, we can use the pnorm function from last time to compute
pnorm(-2.5)
[1] 0.006209665
Note that pnorm also accepts optional settings that let us compute probabilities for a general normal distribution.
The keyword arguments mean and sd specify the mean and standard deviation, respectively.
Setting these values determines the normal distribution, so that we can compute the earlier probability directly as follows:
pnorm(95, mean=100, sd=2)
[1] 0.006209665
pnorm(-2.5)
[1] 0.006209665
The above demonstrates the equivalence of the approaches.
Generally, computing this directly is preferable so that we don't make errors in computing the z-score by hand.
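As an independent check on pnorm, the probability can also be estimated by brute-force simulation. The sketch below assumes resistance is \( N(100, 10^2) \) and samples have size n = 25, which gives \( \sigma_{\overline{X}} = 10/\sqrt{25} = 2 \). The seed and number of replications are arbitrary choices.

```r
# Sketch: check P(Xbar < 95) for the resistor example by simulation.
# Assumes resistance ~ N(100, 10^2) and n = 25, so sd(Xbar) = 10 / sqrt(25) = 2.
set.seed(2021)
n <- 25
reps <- 100000

# Each row of the matrix is one simulated sample of 25 resistors
samples <- matrix(rnorm(reps * n, mean = 100, sd = 10), nrow = reps)
xbar <- rowMeans(samples)

p_sim <- mean(xbar < 95)                            # fraction of sample means below 95
p_exact <- pnorm(95, mean = 100, sd = 10 / sqrt(n)) # same value as pnorm(-2.5)

c(simulated = p_sim, exact = p_exact)
```

The simulated fraction lands near 0.0062, which agrees with the pnorm result up to Monte Carlo noise.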
This example shows that if the distribution of resistance is normal with mean \( \mu=100 \) ohms and standard deviation of \( \sigma=10 \) ohms, finding a random sample of resistors with a sample mean less than \( 95 \) ohms is a rare event.
If this actually happens, it casts doubt on whether the true mean is really \( 100 \) ohms or whether the true standard deviation is really \( 10 \) ohms.
We will come back to this idea when we introduce hypothesis testing.
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
We will finally consider the case in which we have two independent populations.
Let the first population have mean \( \mu_1 \) and variance \( \sigma^2_1 \) and the second population have mean \( \mu_2 \) and variance \( \sigma^2_2 \).
Suppose that both populations are normally distributed.
Linear combinations of independent normal random variables follow a normal distribution, so that \( X_1 - X_2 \) is also normal.
Suppose that \( \overline{X}_1 \) is the sample mean of a random sample of size \( n_1 \) from the first population, and \( \overline{X}_2 \) is the sample mean of an independent random sample of size \( n_2 \) from the second population.
Then the sampling distribution of \( \overline{X}_1 - \overline{X}_2 \) is also normal, with mean and variance
\[ \begin{align} \mu_{\overline{X}_1 - \overline{X}_2} &= \mu_{\overline{X}_1} - \mu_{\overline{X}_2} = \mu_{X_1} - \mu_{X_2}\\ \sigma^2_{\overline{X}_1 - \overline{X}_2} &= \sigma^2_{\overline{X}_1} + \sigma^2_{\overline{X}_2} = \frac{\sigma^2_{X_1}}{n_1} + \frac{\sigma^2_{X_2}}{n_2} \end{align} \]
That is to say, we have a normal model for the difference of the two sample means from two independent populations.
More generally, when the populations are not normal, we can use the above argument as an approximation when both sample sizes are large, usually when \( n_1, n_2 > 30 \).
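The variances add, rather than subtract, because of independence. For any two random variables, \( \mathrm{var}(A - B) = \mathrm{var}(A) + \mathrm{var}(B) - 2\,\mathrm{cov}(A, B) \), and the covariance of two independent sample means is zero:
\[ \begin{align} \sigma^2_{\overline{X}_1 - \overline{X}_2} &= \mathrm{var}\left(\overline{X}_1\right) + \mathrm{var}\left(\overline{X}_2\right) - 2\,\mathrm{cov}\left(\overline{X}_1, \overline{X}_2\right) \\ &= \frac{\sigma^2_{X_1}}{n_1} + \frac{\sigma^2_{X_2}}{n_2} - 0 \end{align} \]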
Approximate sampling distribution of a difference in sample means
Suppose we have two independent populations with means \( \mu_1 \) and \( \mu_2 \) and variances \( \sigma_1^2 \) and \( \sigma_2^2 \), and let \( \overline{X}_1 \) and \( \overline{X}_2 \) be the sample means of two independent random samples of sizes \( n_1 \) and \( n_2 \) from these populations. Then the sampling distribution of \[ Z = \frac{\overline{X}_1 - \overline{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}} \] is approximately standard normal if the conditions of the central limit theorem apply. If the two populations are normal, the sampling distribution of \( Z \) is exactly standard normal.
To put this another way, \( \overline{X}_1 - \overline{X}_2 \) has approximately a normal distribution with mean and variance
\[ \begin{align} \mu_{\overline{X}_1 - \overline{X}_2} &= \mu_{X_1} - \mu_{X_2}\\ \sigma^2_{\overline{X}_1 - \overline{X}_2} &= \frac{\sigma^2_{X_1}}{n_1} + \frac{\sigma^2_{X_2}}{n_2} \end{align} \]
so that with technology, we can compute the probability directly (without z-scores).
To compute the probability that \( \overline{X}_1 - \overline{X}_2 \) falls in some range, we can use pnorm with the appropriate values of mean and sd given as keyword arguments. Note that sd is the standard deviation, i.e., the square root of the variance above.
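The following sketch applies this to a difference in sample means. All population means, standard deviations, and sample sizes are hypothetical values chosen only for illustration. It computes \( P\left(\overline{X}_1 - \overline{X}_2 \geq 25\right) \) directly with pnorm and again through the z-score, to show that the two routes agree.

```r
# Sketch: P(Xbar1 - Xbar2 >= 25) using the normal model for a difference in means.
# All population values below are hypothetical, for illustration only.
mu1 <- 5050
sigma1 <- 30
n1 <- 25
mu2 <- 5000
sigma2 <- 40
n2 <- 16

# Mean and standard deviation of the sampling distribution of Xbar1 - Xbar2
diff_mean <- mu1 - mu2
diff_sd <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)   # variances add; sd is the square root

# Direct computation with pnorm (upper tail)
p_direct <- pnorm(25, mean = diff_mean, sd = diff_sd, lower.tail = FALSE)

# Same probability via the z-score, for comparison
z <- (25 - diff_mean) / diff_sd
p_z <- pnorm(z, lower.tail = FALSE)

c(direct = p_direct, via_z = p_z)   # both approximately 0.984
```

The key step is passing the square root of the summed variances as sd. Passing the variance itself, or subtracting the two variances, would give a wrong probability.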
Recall, any function of a random sample, i.e., any statistic, is modeled as a random variable.
If \( h \) is a general function used to compute some statistic, we thus define
\[ \hat{\Theta} = h(X_1, \cdots, X_n) \]
to be a random variable that will depend on the particular realizations of \( X_1,\cdots, X_n \).
We call the probability distribution of a statistic a sampling distribution.
Sampling Distribution
The probability distribution of a statistic is called a sampling distribution.
The sample mean
\[ \hat{\Theta} = \overline{X} = h(X_1, \cdots, X_n)= \frac{1}{n}\sum_{i=1}^n X_i \]
is now one example for which we have a model of the sampling distribution.
Specifically, the sampling distribution of the sample mean is exactly \( \overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \) when \( X \) is normal, and, by the central limit theorem, approximately so when \( n \) is sufficiently large.
Point estimators
A point estimate of some population parameter \( \theta \) is a single numerical value \( \hat{\theta} \) of a statistic \( \hat{\Theta} \); that is, \( \hat{\theta} \) is a particular realization of the random variable \( \hat{\Theta} \). The statistic \( \hat{\Theta} \) is called the point estimator.
We want an estimator to be “close” in some sense to the true value of the unknown parameter, but we know that the estimator is itself a random variable.
Therefore, we need to describe how close this estimator is to the true value in a probabilistic sense.
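As we have seen before, two important parameters describe a probability distribution or a data set: its center, measured by the mean, and its spread, measured by the variance (or standard deviation).
The central limit theorem provided both of these (as well as the sampling distribution itself) for the sample mean: \( \mathbb{E}\left[\overline{X}\right] = \mu \) and \( \mathrm{var}\left(\overline{X}\right) = \frac{\sigma^2}{n} \).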
The two above parameters thus give us a means of describing “how close” the sample mean \( \overline{X} \) tends to be to the population mean \( \mu \) in a probabilistic sense.
The notion of the “center” of the sampling distribution can be useful as a general criterion for estimators.
Formally, we say that \( \hat{\Theta} \) is an unbiased estimator of \( \theta \) if the expected value of \( \hat{\Theta} \) is equal to \( \theta \).
This is equivalent to saying that the mean of the probability distribution of \( \hat{\Theta} \) (or the mean of the sampling distribution of \( \hat{\Theta} \)) is equal to \( \theta \).
Bias of an Estimator
The point estimator \( \hat{\Theta} \) is an unbiased estimator for the parameter \( \theta \) if \[ \mathbb{E}\left[\hat{\Theta}\right] = \theta \] If the estimator is not unbiased, then the difference \[ \mathbb{E}\left[\hat{\Theta}\right] - \theta \] is called the bias of the estimator \( \hat{\Theta} \). When an estimator is unbiased, the bias is zero; that is, \[ \begin{align} \mathbb{E}\left[\hat{\Theta}\right] - \theta &= \theta - \theta \\ &=0 \end{align} \]
If we interpret the expected value as the average value of \( \hat{\Theta} \) over infinitely many replications of the experiment, unbiasedness has a concrete meaning.
A particular realization of \( \hat{\Theta} \) will generally not equal the true value \( \theta \).
However, the average of the realizations over many replications of the experiment will give a good approximation of the true value \( \theta \).
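Both of the following statistics are unbiased estimators:
1. the sample mean, as an estimator of \( \mu \),
\[ \overline{X}= \frac{1}{n}\sum_{i=1}^n X_i; \]
2. the sample variance, as an estimator of \( \sigma^2 \),
\[ s^2 = \frac{\sum_{i=1}^n \left(X_i - \overline{X}\right)^2}{n-1}. \]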
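For the sample mean, unbiasedness follows in one line from linearity of expectation, since every \( X_i \) has mean \( \mu \):
\[ \begin{align} \mathbb{E}\left[\overline{X}\right] &= \frac{1}{n}\sum_{i=1}^n \mathbb{E}\left[X_i\right] \\ &= \frac{1}{n}\left(n\mu\right) \\ &= \mu \end{align} \]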
However, one can show theoretically that the sample standard deviation is a biased estimator of the population standard deviation, i.e.,
\[ \mathbb{E}\left[ s\right] \leq \sigma, \]
so that on average \( s \) underestimates the true standard deviation.
The bias tends to be small, however, and it is still the most practical estimate most of the time for the population standard deviation.
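A simulation sketch makes these bias statements concrete. It assumes a \( N(0, 2^2) \) population and small samples of size n = 5, both our own choices, and averages three estimators over many replications: \( s^2 \) with the \( n-1 \) divisor, the same sum of squares divided by \( n \), and \( s \).

```r
# Sketch: compare the bias of three estimators by simulation.
# Assumed population: N(mu = 0, sigma = 2); assumed sample size n = 5.
set.seed(7)
n <- 5
sigma <- 2
reps <- 100000

# Each row of the matrix is one simulated sample of size n
samples <- matrix(rnorm(reps * n, mean = 0, sd = sigma), nrow = reps)

# rowMeans has one entry per row; R recycles it down the columns,
# so entry [i, j] has the mean of row i subtracted
dev2 <- (samples - rowMeans(samples))^2

s2 <- rowSums(dev2) / (n - 1)   # sample variance with the n - 1 divisor
s2_n <- rowSums(dev2) / n       # same sum of squares divided by n
s <- sqrt(s2)                   # sample standard deviation

c(mean_s2 = mean(s2), mean_s2_n = mean(s2_n), mean_s = mean(s))
```

The average of s2 sits near \( \sigma^2 = 4 \). The divisor-n version sits near \( \frac{n-1}{n}\sigma^2 = 3.2 \), and the average of s is about 1.88, a little below \( \sigma = 2 \). That gap in s is small, as the text says.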
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
Minimum Variance Unbiased Estimator
If we consider all unbiased estimators of \( \theta \), the one with the smallest variance is called the minimum variance unbiased estimator (MVUE).
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
If \( X_1, X_2 , \cdots , X_n \) is a random sample of size \( n \) from a normal distribution with mean \( \mu \) and variance \( \sigma^2 \), the sample mean \( \overline{X} \) is the MVUE for \( \mu \).
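To illustrate what “minimum variance” buys, the sketch below compares the sample mean with another unbiased estimator of \( \mu \) for normal data. The competitor is the sample median, our own choice; it is unbiased here because the normal distribution is symmetric. The population \( N(10, 3^2) \) and the sample size n = 15 are also assumptions.

```r
# Sketch: sample mean vs. sample median as estimators of mu for normal data.
# Assumed population: N(mu = 10, sigma = 3); assumed sample size n = 15.
set.seed(42)
n <- 15
reps <- 50000

# Each row of the matrix is one simulated sample
samples <- matrix(rnorm(reps * n, mean = 10, sd = 3), nrow = reps)

means <- rowMeans(samples)                 # sample mean of each sample
medians <- apply(samples, 1, median)       # sample median of each sample

c(mean(means), mean(medians))   # both close to mu = 10 (unbiased)
c(var(means), var(medians))     # the mean's variance (about 9 / 15 = 0.6) is smaller
```

Both estimators center on \( \mu \), but the sample median's variance is noticeably larger. This is consistent with \( \overline{X} \) being the MVUE for normal data.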
As noted before, the variance is mathematically a “natural” measure of spread for theoretical reasons, but it is measured in the square of the original units.
For this reason, when we talk about the spread of an estimator's sampling distribution, we typically discuss the standard error.
The standard error
Let \( \hat{\Theta} \) be an estimator of \( \theta \). The standard error of \( \hat{\Theta} \) is its standard deviation, given by \[ \sigma_{\hat{\Theta}} = \sqrt{\mathrm{var}\left(\hat{\Theta}\right)}. \] If the standard error involves unknown parameters that can be estimated, substituting those estimates into the equation above produces an estimated standard error, denoted \( \hat{\sigma}_{\hat{\Theta}} \). It is also common to write the standard error as \( \mathrm{SE}\left(\hat{\Theta}\right) \).
Q: Can anyone recall the standard error of the sample mean? That is, what is the standard deviation of its sampling distribution (for a normal sample or \( n \) large)?
\[ \overline{X}\sim N\left(\mu, \frac{\sigma^2}{n}\right). \]
\[ \sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}}. \]
As was discussed before, there are times that we may not know all the parameters that describe the standard error.
For example, suppose we draw \( X_1, \cdots, X_n \) from a normal population, for which we know neither the mean nor the variance.
Let the unknown and unobservable theoretical parameters be denoted \( \mu \) and \( \sigma \) as usual.
The sample mean has the sampling distribution,
\[ \overline{X} \sim N\left( \mu, \frac{\sigma^2}{n}\right), \]
and therefore has standard error \( \sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}} \).
However, we stated that \( \sigma \) itself is unknown.
In this case, we will estimate the standard error as
\[ \hat{\sigma}_{\overline{X}} = \frac{s}{\sqrt{n}}, \] using the sample standard deviation \( s \) in place of \( \sigma \).
This is what is meant by estimating the standard error.
This particular example will be extremely important for confidence intervals, discussed next time.
An article in the Journal of Heat Transfer (Trans. ASME, Sec. C, 96, 1974, p. 59) described a new method of measuring the thermal conductivity of Armco iron.
Using a temperature of \( 100^\circ \) F and a power input of 550 watts, the following 10 measurements of thermal conductivity (in Btu/hr-ft-°F) were obtained:
\[ 41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04 \]
A point estimate of the mean thermal conductivity at \( 100^\circ \) F and 550 watts is the sample mean or
\[ \overline{x} = 41.924 \]
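The standard error of the sample mean is \( \sigma_{\overline{X}}=\frac{\sigma}{\sqrt{n}} \). Since \( \sigma \) is unknown, we substitute the sample standard deviation \( s = 0.284 \) to obtain the estimated standard error
\[ \hat{\sigma}_{\overline{X}} = \frac{s}{\sqrt{n}}= \frac{0.284}{\sqrt{10}} \approx 0.0898. \]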
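These numbers can be reproduced in R from the 10 measurements above. The sketch uses only base R. Note that sd() uses the \( n-1 \) divisor, matching \( s \).

```r
# Thermal conductivity measurements from the example (Btu/hr-ft-deg F)
x <- c(41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04)

n <- length(x)
xbar <- mean(x)            # point estimate of the mean, 41.924
s <- sd(x)                 # sample standard deviation (n - 1 divisor), about 0.284
se_hat <- s / sqrt(n)      # estimated standard error, about 0.0898

c(xbar = xbar, s = s, se = se_hat)
100 * se_hat / xbar        # standard error as a percentage of the mean, about 0.2
```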
Notice that the standard error is about 0.2 percent of the sample mean, implying that we have obtained a relatively precise point estimate of thermal conductivity.
If we assume that thermal conductivity is normally distributed, then two times the standard error is
\[ 2\hat{\sigma}_\overline{X} = 2(0.0898) = 0.1796. \]
The empirical rule says that about 95% of realizations of the sample mean lie within two standard deviations of the true mean \( \mu \).
Therefore, we are highly confident that the true mean thermal conductivity is within the interval 41.924 ± 0.1796 or between \( [41.744 , 42.104] \).
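The two-standard-error interval can be computed the same way. This sketch only mirrors the rough empirical-rule reasoning above; it is not the formal confidence interval procedure coming next time. It uses the unrounded standard error, so the half-width is 0.1797 rather than 0.1796. The endpoints still round to the values above.

```r
# Thermal conductivity measurements from the example (Btu/hr-ft-deg F)
x <- c(41.60, 41.48, 42.34, 41.95, 41.86, 42.18, 41.72, 42.26, 41.81, 42.04)

xbar <- mean(x)
se_hat <- sd(x) / sqrt(length(x))   # estimated standard error s / sqrt(n)

# Sample mean plus or minus two estimated standard errors (empirical-rule sketch)
lower <- xbar - 2 * se_hat
upper <- xbar + 2 * se_hat
c(lower = lower, upper = upper)     # about 41.744 and 42.104
```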
We will formalize this logic into confidence intervals next time.
For now, we will discuss how to import data into RStudio to solve the homework questions.