Confidence intervals and an introduction to hypothesis testing

04/14/2021


Outline

The following topics will be covered in this lecture:
- Confidence intervals for the mean, with unknown population standard deviation
- Large sample size
- Student t distribution
- Student t confidence intervals
- Student t confidence intervals with real-life methods
- An introduction to hypothesis testing
- Statistical hypotheses
- The null and alternative hypothesis
- Two-sided versus one-sided hypothesis tests

Confidence intervals, variance unknown

In practice, we almost never know the true population standard deviation \( \sigma \) and we must use the sample standard deviation \( s \) as a point estimate.
Our standard error estimate is \( \hat{\sigma}_\overline{X}= \frac{s}{\sqrt{n}} \), and this will be utilized for a more general construction of confidence intervals.
If we have a large sample size, with \( n>40 \), we can use this estimate of the standard error effectively within the confidence interval as follows.

Large-Sample Confidence Interval on the Mean
When \( n \) is large, the quantity \[ \frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}} \] has an approximate standard normal distribution. Consequently, \[ \overline{x} - z_\frac{\alpha}{2} \frac{s}{\sqrt{n}} \leq \mu \leq \overline{x} + z_\frac{\alpha}{2} \frac{s}{\sqrt{n}} \] is a large-sample confidence interval for \( \mu \), with confidence level of approximately \( (1-\alpha)\times 100\% \).

This is again an application of the central limit theorem, for which the form of the underlying population distribution does not matter:
- the sampling distribution of the sample mean can be approximated by a normal distribution with standard error \( \sigma_{\overline{X}} \).
- If we estimate \( \sigma \) with \( s \), we can still use the normal approximation, with \( \hat{\sigma}_{\overline{X}} \) as an approximation of the standard error.
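
As a rough illustration of the large-sample interval, the sketch below draws a skewed sample and computes the approximate \( 95\% \) interval with \( s \) in place of \( \sigma \). The exponential population with mean 2, the seed, and the variable names are assumptions made purely for demonstration; they are not part of the lecture data.

```r
set.seed(1)                       # fixes the simulated sample
n <- 100                          # large sample, n > 40
x <- rexp(n, rate = 0.5)          # assumed population: exponential with mean 2

x_bar <- mean(x)
se_hat <- sd(x) / sqrt(n)         # estimated standard error s / sqrt(n)
z_alpha_over_2 <- qnorm(0.975)    # critical value for alpha = 0.05

# approximate 95% confidence interval on the population mean
large_sample_ci <- c(x_bar - z_alpha_over_2 * se_hat,
                     x_bar + z_alpha_over_2 * se_hat)
large_sample_ci
```

Even though the assumed population is far from normal, the interval is built exactly as in the normal case; only the approximation quality depends on \( n \).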

Confidence intervals, variance unknown – continued

However, when the sample is small and \( \sigma^2 \) is unknown, we must make an assumption about the form of the underlying distribution to obtain a valid CI procedure.
A reasonable assumption in many cases is that the underlying distribution is normal.
Many populations encountered in practice are well approximated by the normal distribution, so this assumption will lead to confidence interval procedures of wide applicability.
In fact, moderate departure from normality will have little effect on validity.
When the assumption is unreasonable, an alternative is to use nonparametric statistical procedures that are valid for any underlying distribution.

Confidence intervals, variance unknown – continued

Suppose that the population of interest has a normal distribution with unknown mean \( \mu \) and unknown variance \( \sigma^2 \).
Assume that a random sample of size \( n \), say, \( X_1, X_2 , \cdots , X_n \), is available, and let \( \overline{X} \) and \( S^2 \) be the sample mean and variance, respectively.
We wish to construct a two-sided CI on \( \mu \);
- if the standard deviation \( \sigma \) is known, we know that
\[ Z = \frac{\overline{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \]

has a standard normal distribution.
- When \( \sigma \) is unknown, we use the estimate \( \hat{\sigma}_\overline{X} = \frac{S}{\sqrt{n}} \) for the standard error.
The random variable \( Z \) now becomes

\[ T = \frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}}. \]

Confidence intervals, variance unknown – continued

For the random variable

\[ T = \frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}} \]

the logical questions are:
- what is the distribution of \( T \); and
- is the distribution very different from the standard normal?
If \( n \) is large, the distribution differs very little from the standard normal by the central limit theorem.
However, \( n \) is usually small in most engineering problems, and in this situation, a different distribution must be employed to construct the CI.
Let's suppose that we have a random sample \( X_1, \cdots, X_n \) from a normal distribution with population mean \( \mu \) and population standard deviation \( \sigma \).
The sample mean \( \overline{X} \) and the sample standard deviation \( S \) are computed from the above observations.
Then, it is an extremely important and non-trivial result that the random variable \[ \frac{\overline{X} - \mu}{\frac{S}{\sqrt{n}}} \] is distributed according to a Student's t distribution with \( n-1 \) degrees of freedom.
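
The result above can be checked empirically. The following is a minimal simulation sketch, assuming a normal population with mean 10 and standard deviation 3 (illustrative values, not from the lecture): it computes \( T \) for many replicated small samples and compares how often \( |T| \) stays below the t and the normal critical values.

```r
set.seed(42)
n <- 5                    # small sample size
mu <- 10                  # assumed population mean (illustrative only)
sigma <- 3                # assumed population standard deviation (illustrative only)
n_rep <- 20000            # number of replicated samples

# each row of the matrix is one replicated sample of size n
samples <- matrix(rnorm(n_rep * n, mean = mu, sd = sigma), nrow = n_rep)
x_bars <- rowMeans(samples)
s_vals <- apply(samples, 1, sd)
t_vals <- (x_bars - mu) / (s_vals / sqrt(n))

# fraction of replications with |T| below each 97.5% critical value
coverage_t <- mean(abs(t_vals) <= qt(0.975, df = n - 1))  # should be close to 0.95
coverage_z <- mean(abs(t_vals) <= qnorm(0.975))           # falls short of 0.95
c(t = coverage_t, z = coverage_z)
```

The shortfall of the normal critical value for small \( n \) is the reason a different distribution is needed.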

Student's t-distribution

The pdf of the t-distribution is

\[ \begin{align} f(T,n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n} \Gamma\left(\frac{n}{2}\right)\left(1 + \frac{T^2}{n}\right)^{\frac{n+1}{2}}} \end{align} \] where \( n \) is the number of degrees of freedom and the Gamma function \( \Gamma \) is a “special function”.
We plot the density below for n=1, n=2 and n=5 degrees of freedom with the normal density plotted for reference.

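par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
t = seq(-5, 5, length = 300)
colors = c("black", "red", "green")
df = c(1, 2, 5)  # degrees of freedom (df) for the t-distribution
# standard normal density in blue, for reference
plot(t, dnorm(t, 0, 1), xlab = "t", ylab = "pdf", type = "l", lwd = 2, col = "blue")
# overlay one t density per entry of df
for (i in seq_along(df)) {  lines(t, dt(t, df[i]), col = colors[i])}
legend("topright", legend = c("normal", paste("df =", df)), col = c("blue", colors), lty = 1)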
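
To connect the density formula with the built-in dt() function, the sketch below writes the formula out directly and compares it with dt() on a small grid. The function name t_pdf and the grid are hypothetical choices for illustration.

```r
# density of the t distribution with n degrees of freedom, written from the formula;
# t_pdf is a hypothetical helper name, not a base R function
t_pdf <- function(t, n) {
  gamma((n + 1) / 2) / (sqrt(pi * n) * gamma(n / 2)) * (1 + t^2 / n)^(-(n + 1) / 2)
}

t_grid <- seq(-5, 5, length.out = 11)

# largest absolute difference between the hand-written formula and dt()
max_abs_diff <- max(abs(t_pdf(t_grid, n = 5) - dt(t_grid, df = 5)))
max_abs_diff
```

The difference should be at the level of floating-point rounding, which suggests the formula and dt() describe the same density.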


Student's t-distribution

The degrees of freedom determine the shape of the Student's t distribution.
For \( n > 2 \) degrees of freedom, the mean and variance of Student’s t-distribution are

\[ \begin{align} \mu_T= 0 & & \sigma_T^2 = \frac{n}{n-2} \end{align} \]
As \( n\rightarrow \infty \), the variance tends to \( 1 \) and the Student's t distribution converges to the standard normal.

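par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
t = seq(-5, 5, length = 300)
colors = c("black", "red", "green")
df = c(10, 100, 1000)  # degrees of freedom (df) for the t-distribution
# standard normal density in blue, for reference
plot(t, dnorm(t, 0, 1), xlab = "t", ylab = "pdf", type = "l", lwd = 2, col = "blue")
# overlay one t density per entry of df
for (i in seq_along(df)) {  lines(t, dt(t, df[i]), col = colors[i])}
legend("topright", legend = c("normal", paste("df =", df)), col = c("blue", colors), lty = 1)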
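
The convergence to the normal can also be seen numerically. The sketch below tabulates the upper \( 2.5\% \) critical point and the variance \( \frac{n}{n-2} \) for several degrees of freedom; the particular degrees of freedom are arbitrary choices for illustration.

```r
df_values <- c(1, 2, 5, 10, 100, 1000)   # arbitrary degrees of freedom to compare
t_crit <- qt(0.975, df = df_values)      # upper 2.5% critical points of the t
z_crit <- qnorm(0.975)                   # corresponding normal critical point

# variance n / (n - 2), which is only defined for n > 2
t_variance <- ifelse(df_values > 2, df_values / (df_values - 2), NA)

data.frame(df = df_values,
           t_crit = t_crit,
           excess_over_z = t_crit - z_crit,
           variance = t_variance)
```

The excess of the t critical point over the normal one shrinks toward zero as the degrees of freedom grow.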


Student's t-distribution

The quantiles of a t-distributed rv \( T \) are denoted by \( t_p \), and, due to symmetry, \( t_p = -t_{1-p} \).
In R the generic functions for the t distribution are the following:
- dt(x, df) is the probability density function of the t distribution with df degrees of freedom.
- pt(q, df) is the cumulative distribution function of the t distribution with df degrees of freedom.
- rt(n, df) randomly generates a sample of size n from the t distribution with df degrees of freedom.
- qt(p, df) is the quantile function of the t distribution with df degrees of freedom.
With these generic functions for the t distribution, we can compute the Student t confidence interval almost identically to the normal confidence interval.
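
A short sketch of how these functions relate to one another, assuming 19 degrees of freedom and \( p = 0.975 \) purely as example values: qt() and pt() undo each other, the quantiles are symmetric about zero, and the density from dt() integrates to the central probability.

```r
df <- 19                       # example degrees of freedom
p <- 0.975                     # example probability

t_p <- qt(p, df)               # quantile with probability p to its left
round_trip <- pt(t_p, df)      # the cdf undoes the quantile function, returning p
symmetric <- qt(1 - p, df)     # equals -t_p by symmetry about zero

# area under the density between -t_p and t_p, computed numerically
central_area <- integrate(dt, lower = -t_p, upper = t_p, df = df)$value

c(t_p = t_p, round_trip = round_trip, symmetric = symmetric, central_area = central_area)
```

The central area of about \( 0.95 \) is what makes \( t_p \) with \( p = 0.975 \) the critical point for a two-sided \( 95\% \) interval.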

Student t confidence intervals

Formally, we will write

Confidence Interval on the Mean, Variance Unknown
If \( \overline{x} \) and \( s \) are the mean and standard deviation of a random sample of size \( n \) from a normal distribution with unknown variance \( \sigma^2 \), then a \( (1-\alpha)\times 100\% \) confidence interval on \( \mu \) is given by \[ \begin{align} &\overline{x} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2} \leq \mu \leq \overline{x} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\\ \Leftrightarrow&\overline{x} - \frac{s}{\sqrt{n}} t_\frac{\alpha}{2} \leq \mu \leq \overline{x} + \frac{s}{\sqrt{n}}t_\frac{\alpha}{2} \end{align} \] where \( t_\frac{\alpha}{2} \) is the upper \( \frac{\alpha}{2} \) critical point of the t distribution with \( n - 1 \) degrees of freedom.

In practice, this is how we will more typically compute a confidence interval on the mean.
Because this is the most common way to compute such a confidence interval in practice, there are actually built-in functions in the R language to handle this.
Computing the confidence interval “manually” with the quantile function is mostly pedagogical, but we will demonstrate how this is done with a few more examples.
Shortly, we will learn how to compute confidence intervals and hypothesis tests with the t distribution using the t.test() function in R.

Student t confidence intervals – continued

For now, in order to compute the t confidence interval manually, we need to find the appropriate critical value for the equation

\[ \overline{x} - \frac{s}{\sqrt{n}} t_\frac{\alpha}{2} \leq \mu \leq \overline{x} + \frac{s}{\sqrt{n}}t_\frac{\alpha}{2} \]
We can find this critical point in the same way as for the normal, using R, as follows.
Suppose we have a sample size of \( n=20 \); this gives \( n-1=19 \) degrees of freedom, i.e.,

t_alpha_over_2 <- qt(0.975, df=19)
t_alpha_over_2

[1] 2.093024

is the critical point for the \( 95\% \) two-sided confidence interval.

The critical point for the \( 99\% \) two-sided confidence interval is given as

t_alpha_over_2 <- qt(0.995, df=19)
t_alpha_over_2

[1] 2.860935

We will consider a full example of constructing the confidence interval in the following.

Student t confidence intervals – example

An article in the Journal of Materials Engineering describes the results of tensile adhesion tests;
- these were performed on U-700 alloy specimens, with the loads at failure (in megapascals) as follows:

alloy_load_failures <- c(19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7, 11.9, 11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6, 11.9, 11.4)

We can determine some key values as follows:

n <- length(alloy_load_failures)
n

[1] 22

x_bar <- mean(alloy_load_failures)
x_bar

[1] 13.71364

s <- sd(alloy_load_failures)
s

[1] 3.553576

Student t confidence intervals – example

Using the last values, we can compute the estimated standard error as

se <- s / sqrt(n)
se

[1] 0.7576249

Notice, if we want to compute the \( 95\% \) confidence interval of the mean, we cannot use the z critical value accurately as the sample size is under 40, and we do not know the population standard deviation.
Therefore, we compute the t critical value as

t_alpha_over_2 <- qt(0.975, df=n-1)
t_alpha_over_2

[1] 2.079614

Our corresponding t confidence interval is given as

ci <- c(x_bar - se * t_alpha_over_2, x_bar + se * t_alpha_over_2)
ci

[1] 12.13807 15.28920
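
The manual steps above can be collected into a small function. This is only a sketch: t_ci and its conf_level argument are hypothetical names introduced here, not part of base R, and the function assumes the sample comes from an approximately normal population.

```r
# hypothetical helper wrapping the manual steps; not a base R function
t_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  alpha <- 1 - conf_level
  se <- sd(x) / sqrt(n)                      # estimated standard error
  t_crit <- qt(1 - alpha / 2, df = n - 1)    # upper alpha/2 critical point
  c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
}

alloy_load_failures <- c(19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7,
                         11.9, 11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6,
                         11.9, 11.4)

ci_95 <- t_ci(alloy_load_failures)                      # same level as the manual computation
ci_99 <- t_ci(alloy_load_failures, conf_level = 0.99)   # wider interval at higher confidence
ci_95
ci_99
```

Writing the level as an argument makes it clear that only the critical point changes between confidence levels; the mean and standard error are the same.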

Student t confidence intervals – example

Notice that

z_alpha_over_2 <- qnorm(0.975)
z_alpha_over_2

[1] 1.959964

is smaller than

t_alpha_over_2

[1] 2.079614

This demonstrates the way in which the t distribution models the increased uncertainty of the population mean, due to the unknown population standard deviation.
As mentioned before, this process of manually computing confidence intervals is really just pedagogical.
We will now begin to introduce the realistic way confidence intervals are computed in practice.

Student t confidence intervals – example

Recall, we have the following sample

alloy_load_failures

 [1] 19.8 10.1 14.9  7.5 15.4 15.4 15.4 18.5  7.9 12.7 11.9 11.4 11.4 14.1 17.6
[16] 16.7 15.8 19.5  8.8 13.6 11.9 11.4

We can compute the confidence interval directly as

t.test(alloy_load_failures)


    One Sample t-test

data:  alloy_load_failures
t = 18.101, df = 21, p-value = 2.731e-14
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 12.13807 15.28920
sample estimates:
mean of x 
 13.71364

There are a few pieces of information above that we have yet to discuss – this is the t hypothesis test covered next.
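
The printed output is a summary of a list-like object, and its pieces can be pulled out by name. The sketch below, using the same sample, extracts the interval and checks that the reported t value is the sample mean divided by the estimated standard error, which holds here because t.test() compares against a mean of 0 by default. The variable names result, se and t_by_hand are illustrative.

```r
alloy_load_failures <- c(19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7,
                         11.9, 11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6,
                         11.9, 11.4)

result <- t.test(alloy_load_failures)

result$conf.int     # the 95 percent confidence interval
result$estimate     # the sample mean
result$parameter    # degrees of freedom, n - 1

# reported t value recomputed by hand: (x_bar - 0) / (s / sqrt(n))
se <- sd(alloy_load_failures) / sqrt(length(alloy_load_failures))
t_by_hand <- mean(alloy_load_failures) / se
c(reported = unname(result$statistic), by_hand = t_by_hand)
```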

Student t confidence intervals – example

Note, if we want to make a confidence interval for a different confidence level, we can supply the optional keyword argument conf.level:

t.test(alloy_load_failures, conf.level=0.99)


    One Sample t-test

data:  alloy_load_failures
t = 18.101, df = 21, p-value = 2.731e-14
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
 11.56853 15.85874
sample estimates:
mean of x 
 13.71364

In reality, this is the default way that one will compute a confidence interval on the mean.
We will begin to favor this approach over the pedagogical approach, constructing confidence intervals using qt or qnorm.
- However, as the pedagogical approach emphasizes the mathematical concepts, we will still have a few exercises like this.
- The final project, in particular, will use both approaches.
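
As a small sketch of how the confidence level affects the result, the code below calls t.test() at three levels on the same sample and tabulates the interval widths. The chosen levels and the name ci_table are illustrative assumptions.

```r
alloy_load_failures <- c(19.8, 10.1, 14.9, 7.5, 15.4, 15.4, 15.4, 18.5, 7.9, 12.7,
                         11.9, 11.4, 11.4, 14.1, 17.6, 16.7, 15.8, 19.5, 8.8, 13.6,
                         11.9, 11.4)

conf_levels <- c(0.90, 0.95, 0.99)   # illustrative confidence levels

# one row per confidence level, columns are the lower and upper endpoints
intervals <- t(sapply(conf_levels, function(level) {
  t.test(alloy_load_failures, conf.level = level)$conf.int
}))

ci_table <- data.frame(conf_level = conf_levels,
                       lower = intervals[, 1],
                       upper = intervals[, 2],
                       width = intervals[, 2] - intervals[, 1])
ci_table
```

Higher confidence requires a wider interval, since the critical point grows while the estimated standard error stays fixed.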

Hypothesis testing – motivation

So far, we have shown how a parameter of a population can be estimated from sample data.
We first showed how to construct a point estimate based on a sample;
- however, a point estimate is statistically unsatisfying due to the intrinsic uncertainty introduced by sampling error.
In order to rectify the issue with only providing a single point estimate, we constructed an interval of likely values called a confidence interval.
With a level of confidence \( (1 -\alpha)\times 100\% \), specified in terms of the failure rate \( \alpha \), we supplied a range of plausible values for the parameter given the sample on hand.
In many situations, a dual type of problem is of interest, where we will be concerned with how unlikely a possible parameter value might be.
For a \( 95\% \) level of confidence, we had an \( \alpha=5\% \) rate of failure in the confidence interval procedure.
This principle has been the basis of us finding \( z_\frac{\alpha}{2} \) and \( t_\frac{\alpha}{2} \) critical values for \( \alpha = 0.05 \) corresponding to \( 5\% \).
In particular, we expect the associated confidence interval to miss the true parameter in only about \( 1 \) out of \( 20 \) replications of the sample.

Hypothesis testing – motivation

We can think of rephrasing the above principle as well:

Suppose we are estimating the population mean \( \mu \), and we have some hypothesis as to what the value might be, \( \tilde{\mu} \).
Let us suppose we created a \( 95\% \) confidence interval, \[ \left(\overline{X} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}, \overline{X} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\right) \] and upon comparing with some realization \( \overline{x} \) we found that \( \tilde{\mu} \) was not in this region.
If we are following the procedure correctly, and if the \( \tilde{\mu} \) was actually equal to the true population \( \mu \), then \[ \tilde{\mu} \text{ not in } \left(\overline{X} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}, \overline{X} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\right) \] only \( 5\% \) of the time.

If we were to find that \( \tilde{\mu} \) was actually in our confidence intervals in far fewer than \( 95\% \) of replications, this should lead us to question whether \( \tilde{\mu} \) was really a good hypothesis for the true \( \mu \).
In this sense, \( \alpha \) represents a kind of criterion for whether we should question if a proposed value of \( \tilde{\mu} \) is really appropriate.
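
This reasoning can be illustrated by simulation. The sketch below assumes a normal population with true mean 50 and standard deviation 2 (invented values), builds the \( 95\% \) t interval for many replicated samples, and records how often a proposed value \( \tilde{\mu} \) falls outside it. The function name miss_rate and the poor hypothesis 53 are hypothetical.

```r
set.seed(2021)
mu_true <- 50      # assumed true population mean (illustrative only)
sigma <- 2         # assumed population standard deviation (illustrative only)
n <- 10            # sample size per replication
n_rep <- 20000     # number of replicated samples

# each row of the matrix is one replicated sample of size n
samples <- matrix(rnorm(n_rep * n, mean = mu_true, sd = sigma), nrow = n_rep)
x_bars <- rowMeans(samples)
se_hats <- apply(samples, 1, sd) / sqrt(n)
t_crit <- qt(0.975, df = n - 1)

# fraction of replications whose 95% interval does not contain mu_tilde
miss_rate <- function(mu_tilde) {
  mean(abs(x_bars - mu_tilde) > t_crit * se_hats)
}

miss_true <- miss_rate(50)    # hypothesis equal to the true mean: about 5% misses expected
miss_wrong <- miss_rate(53)   # poor hypothesis: misses in most replications
c(true_value = miss_true, poor_hypothesis = miss_wrong)
```

A proposed value that is excluded far more often than \( \alpha \) predicts is the kind of evidence that leads us to doubt it.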

Hypothesis testing – motivation

In real applications, there may be two competing claims (or hypotheses) about the value of a parameter.
The engineer must use sample data to determine which claim is most plausible, and which one can be rejected as unlikely.
For example, suppose that an engineer is designing an air crew escape system;
- this will consist of an ejection seat and a rocket motor that powers the seat.
The rocket motor contains a propellant, and for the ejection seat to function properly, the propellant should have a mean burning rate of 50 cm/sec.
If the burning rate is too low, the ejection seat may not function properly, leading to an unsafe ejection and possible injury of the pilot.
Higher burning rates may imply instability in the propellant or an ejection seat that is too powerful, again leading to possible pilot injury.
The practical engineering question that must be answered is: Does the mean burning rate of the propellant equal 50 cm/sec, or is it some other value (either higher or lower)?
This type of question can be answered using a statistical technique called hypothesis testing.
We have already gotten some idea of the duality of these problems as t.test() computes both simultaneously.
We will now develop this idea more formally.

Hypothesis testing – introduction

Formally, we will define

Statistical Hypothesis
A statistical hypothesis is a statement about the parameters of one or more populations.

Because we use probability distributions to model populations, a statistical hypothesis may also be thought of as a statement about the probability distribution of a random variable.
The hypothesis will usually involve one or more parameters of this distribution.
For example, consider the air crew escape system described already.
Suppose that we are interested in the burning rate of the solid propellant.
Burning rate is a random variable that can be described by a probability distribution.
Suppose that our interest focuses on the mean burning rate (a parameter of this distribution).
Specifically, we are interested in deciding whether or not the mean burning rate is \( 50 \) centimeters per second.
We may express this formally as

\[ \begin{align} H_0:& \mu = 50 \text{ centimeters per second}\\ H_1:& \mu \neq 50 \text{ centimeters per second} \end{align} \]
\( H_0 \) is known as the null hypothesis and \( H_1 \) is known as the alternative hypothesis.

Hypothesis testing – introduction

In hypothesis testing, the null and alternative hypotheses have special meanings philosophically and in the mathematics.
We cannot generally “prove” a hypothesis to be true;
- generically, we will assume that the true population parameter is unobservable.
Instead, we can only determine if a hypothesis seems unlikely enough to reject;
- this is similar to finding that our proposed parameter value was in far fewer confidence intervals than predicted by the procedure.
To begin such a test formally, we need to first make some assumption about the true parameter.
- This always takes the form of assuming the null hypothesis \( H_0 \).
The null hypothesis \( H_0 \) will always take the form of an equality, or an inclusive inequality.
- That is, we take
\[ \begin{align} H_0: & \theta \text{ is } (= / \leq / \geq) \text{ some proposed value}. \end{align} \]
- In our example, we wrote
\[ \begin{align} H_0: & \mu = 50 \text{ centimeters per second}. \end{align} \]

Hypothesis testing – introduction

The contradictory / competing hypothesis is the alternative hypothesis, written

\[ \begin{align} H_1: & \theta \text{ is } (\neq / > / <) \text{ some proposed value} \end{align} \]
- In our example, we wrote
\[ \begin{align} H_1: & \mu \neq 50 \text{ centimeters per second}. \end{align} \]
Once we have formed a null and alternative hypothesis:

\[ \begin{align} H_0: & \theta \text{ is } (= / \leq / \geq) \text{ some proposed value}\\ H_1: & \theta \text{ is } (\neq / > / <) \text{ some proposed value} \end{align} \]
we use the sample data to consider how likely or unlikely it was to observe such data with the proposed parameter.
- If the sample doesn't seem to fit the proposed parameter value, we deem the null hypothesis unlikely.
If the null hypothesis is sufficiently unlikely, we reject the null hypothesis in favor of the alternative hypothesis.
However, if the evidence (the sample) doesn't contradict the null hypothesis, we tentatively keep this assumption.
- This has not proven the assumption; it has only shown that the hypothesis is not unlikely given our evidence.
In our example, we would say either:
1. we reject the null hypothesis of \( H_0: \mu = 50 \) in favor of the alternative \( H_1: \mu \neq 50 \); or
2. we fail to reject the null hypothesis of \( H_0:\mu = 50 \).
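
A minimal sketch of how the two possible conclusions could be reached in R, using the confidence interval as the decision criterion. The burning-rate measurements below are invented for illustration and are not data from the lecture; the mu argument of t.test() sets the value proposed by the null hypothesis.

```r
# hypothetical burning rates in cm/sec, invented purely for illustration
burning_rates <- c(49.2, 50.8, 51.5, 48.9, 50.3, 49.7, 51.1, 50.4, 49.5, 50.9)

mu_0 <- 50   # value proposed by the null hypothesis H0: mu = 50

# 95% two-sided confidence interval on the mean burning rate
ci <- t.test(burning_rates, mu = mu_0, conf.level = 0.95)$conf.int

# reject H0 when the proposed value lies outside the interval
decision <- if (mu_0 < ci[1] || mu_0 > ci[2]) {
  "reject H0 in favor of H1"
} else {
  "fail to reject H0"
}
ci
decision
```

For these invented values the interval contains 50, so the sketch reports a failure to reject; this does not prove that the mean is 50.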

Hypothesis testing – introduction

In our example, the alternative hypothesis specifies values of \( \mu \) that could be either greater or less than 50 centimeters per second;
- therefore, it is called a two-sided alternative hypothesis.
In some situations, we may wish to formulate a one-sided alternative hypothesis, as in

\[ \begin{align} H_0: & \mu \geq 50\text{ centimeters per second} \\ H_1: & \mu < 50\text{ centimeters per second} \end{align} \]
or

\[ \begin{align} H_0: & \mu \leq 50\text{ centimeters per second} \\ H_1: & \mu > 50\text{ centimeters per second} \end{align} \]
The above situations have an exact analogy with one-sided confidence bounds, similar to the two-sided test and the two-sided confidence interval.
We will now elaborate on the meaning of determining if a hypothesis is sufficiently unlikely.
- This is directly related to the value \( \alpha \) we used as a rate of failure for confidence intervals.
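
The analogy with one-sided confidence bounds can be seen through the alternative argument of t.test(). The sketch below again uses invented burning-rate values (not lecture data); with a one-sided alternative, the reported interval has one infinite endpoint and one finite confidence bound.

```r
# hypothetical burning rates in cm/sec, invented purely for illustration
burning_rates <- c(49.2, 50.8, 51.5, 48.9, 50.3, 49.7, 51.1, 50.4, 49.5, 50.9)

# H0: mu >= 50 against H1: mu < 50
lower_tail <- t.test(burning_rates, mu = 50, alternative = "less")

# H0: mu <= 50 against H1: mu > 50
upper_tail <- t.test(burning_rates, mu = 50, alternative = "greater")

lower_tail$conf.int   # upper confidence bound: (-Inf, upper)
upper_tail$conf.int   # lower confidence bound: (lower, Inf)
```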