04/14/2021
Statistical Hypothesis
A statistical hypothesis is a statement about the parameters of one or more populations.
Because we use probability distributions to model populations, a statistical hypothesis may also be thought of as a statement about the probability distribution of a random variable.
The hypothesis will usually involve one or more parameters of this distribution.
For example, consider the air crew escape system described last time.
Suppose that we are interested in the burning rate of the solid propellant.
Burning rate is a random variable that can be described by a probability distribution.
Suppose that our interest focuses on the mean burning rate (a parameter of this distribution).
Specifically, we are interested in deciding whether or not the mean burning rate is \( 50 \) centimeters per second.
We may express this formally as
\[ \begin{align} H_0:& \mu = 50 \text{ centimeters per second}\\ H_1:& \mu \neq 50 \text{ centimeters per second} \end{align} \]
\( H_0 \) is known as the null hypothesis and \( H_1 \) is known as the alternative hypothesis.
In hypothesis testing, the null and alternative hypotheses have special meanings, both philosophically and mathematically.
We cannot generally “prove” a hypothesis to be true;
Instead, we can only determine if a hypothesis seems unlikely enough to reject;
To begin such a test formally, we need to first make some assumption about the true parameter.
The null hypothesis \( H_0 \) will always take the form of an equality, or an inclusive inequality.
\[ \begin{align} H_0: & \theta \text{ is } (= / \leq / \geq) \text{ some proposed value}. \end{align} \]
\[ \begin{align} H_0: & \mu = 50 \text{ centimeters per second}. \end{align} \]
The contradictory / competing hypothesis is the alternative hypothesis, written
\[ \begin{align} H_1: & \theta \text{ is } (\neq / > / <) \text{ some proposed value} \end{align} \]
\[ \begin{align} H_1: & \mu \neq 50 \text{ centimeters per second}. \end{align} \]
Once we have formed a null and alternative hypothesis:
\[ \begin{align} H_0: & \theta \text{ is } (= / \leq / \geq) \text{ some proposed value}\\ H_1: & \theta \text{ is } (\neq / > / <) \text{ some proposed value} \end{align} \]
we use the sample data to consider how likely or unlikely it was to observe such data with the proposed parameter.
If the observed data are sufficiently unlikely under the null hypothesis, we reject the null hypothesis in favor of the alternative hypothesis.
However, if the evidence (the sample) doesn't contradict the null hypothesis, we tentatively keep this assumption.
In our example, we would say either that we reject \( H_0 \) in favor of \( H_1 \), or that we fail to reject \( H_0 \).
In our example, the alternative hypothesis specifies values of \( \mu \) that could be either greater or less than 50 centimeters per second; this is called a two-sided alternative hypothesis.
In some situations, we may wish to formulate a one-sided alternative hypothesis, as in
\[ \begin{align} H_0: & \mu \geq 50\text{ centimeters per second} \\ H_1: & \mu < 50\text{ centimeters per second} \end{align} \]
or
\[ \begin{align} H_0: & \mu \leq 50\text{ centimeters per second} \\ H_1: & \mu > 50\text{ centimeters per second} \end{align} \]
These one-sided tests have an exact analogy with one-sided confidence bounds, just as the two-sided test corresponds to the two-sided confidence interval.
We will now elaborate on the meaning of determining if a hypothesis is sufficiently unlikely.
In our working example, we are considering
\[ \begin{align} H_0:& \mu = 50 \text{ centimeters per second}\\ H_1:& \mu \neq 50 \text{ centimeters per second} \end{align} \]
Suppose that the standard deviation of the burning rate is a known \( \sigma = 2.5 \) centimeters per second and that the burning rate has a normal distribution.
Assuming the null hypothesis, and with a sample of \( n = 10 \) specimens,
\[ \overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(50,0.625\right) \]
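As a quick sanity check (a simulation sketch of ours, not from the slides), we can verify this sampling distribution by simulating many samples of size \( n = 10 \) under the null:
set.seed(1)  # arbitrary seed for reproducibility
xbars <- replicate(10000, mean(rnorm(10, mean=50, sd=2.5)))
mean(xbars)  # should be close to 50
var(xbars)   # should be close to 0.625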
A value of the sample mean \( \overline{x} \) that falls close to the hypothesized value of \( \mu = 50 \) centimeters per second (relative to the spread) does not conflict with the null hypothesis that the true mean \( \mu \) is really \( 50 \) centimeters per second.
On the other hand, a sample mean that is considerably different from \( 50 \) centimeters per second is evidence in support of rejecting the null hypothesis.
The sample mean in this test is what is called a test statistic.
The sample mean is of course a random variable and can take on many different values.
However, assuming the null hypothesis, and with a sample of \( n = 10 \) specimens,
\[ \overline{X} \sim N\left(50, 0.625\right) \]
we can compute the probability of observing various values of the test statistic, i.e.,
\[ P(\overline{X}\leq \overline{x}_l) \text{ or } P(\overline{X} \geq \overline{x}_u), \] where \( \overline{x}_l \) and \( \overline{x}_u \) are some lower and upper bounds.
For the sake of example, let's suppose that observing a sample mean that is \( 1.9 \) standard deviations (of the sampling distribution) away from the hypothesized center would be highly surprising:
\[ \begin{align} & &-1.9 &= \frac{\overline{x}_l - 50}{0.79} & & & 1.9 &= \frac{\overline{x}_u - 50}{0.79}\\ \Leftrightarrow & & \overline{x}_l& \approx 48.5 & & & \overline{x}_u &\approx 51.5 \end{align} \]
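As a minimal check of these bounds (our addition, using \( \sigma_\overline{X} = 2.5/\sqrt{10} \approx 0.79 \)):
se <- 2.5 / sqrt(10)             # standard error of the sample mean
c(50 - 1.9 * se, 50 + 1.9 * se)  # lower and upper bounds
[1] 48.49792 51.50208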
We can compute the associated probability
se <- 2.5 / sqrt(10)  # standard error of the sample mean
mu <- 50              # hypothesized mean under the null
1 - pnorm(51.5, mean=mu, sd=se) + pnorm(48.5, mean=mu, sd=se)  # two-tailed probability
[1] 0.05777957
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
Type I Error
Rejecting the null hypothesis \( H_0 \) when it is true is defined as a type I error.
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
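As an illustrative aside (our simulation, using this example's assumed values), the type I error rate of the \( (48.5, 51.5) \) decision rule can be estimated by repeatedly sampling under \( H_0 \):
set.seed(42)  # arbitrary seed for reproducibility
rejections <- replicate(10000, {
  x_bar <- mean(rnorm(10, mean=50, sd=2.5))  # sample mean under H_0
  x_bar < 48.5 | x_bar > 51.5                # reject when outside the region
})
mean(rejections)  # estimated type I error rate, near 0.058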
Type II Error
Failing to reject the null hypothesis \( H_0 \) when it is false is defined as a type II error.
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
Probability of Type I Error
\[ \alpha = P(\text{type I error}) = P(\text{reject }H_0\text{ when }H_0\text{ is true}) \]
If we consider the last statement,
Probability of Type I Error
\[ \alpha = P(\text{type I error}) = P(\text{reject }H_0\text{ when }H_0\text{ is true}). \] This is also called the significance level of the hypothesis test.
the dual relationship with confidence intervals can be understood as follows.
Suppose we produce a \( (1-\alpha)\times100\% \) confidence interval for the unknown true mean \( \mu \) of a studied population,
\[ \left( \overline{x} - \sigma_\overline{X} z_\frac{\alpha}{2} , \overline{x} + \sigma_\overline{X} z_\frac{\alpha}{2}\right) \] based on some sample.
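A minimal sketch of computing such an interval in R, with values for \( \overline{x} \), \( \sigma \), and \( n \) that we assume purely for illustration:
x_bar <- 50.6  # assumed sample mean
sigma <- 2.5   # assumed known population standard deviation
n <- 10        # assumed sample size
alpha <- 0.05
se <- sigma / sqrt(n)
c(x_bar - se * qnorm(1 - alpha/2), x_bar + se * qnorm(1 - alpha/2))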
Suppose that we have a working hypothesis that \( \tilde{\mu} \) is the true population mean, i.e.,
\[ \begin{align} H_0 &: \mu = \tilde{\mu} \\ H_1 &: \mu \neq \tilde{\mu} \end{align} \]
We can consider using the confidence interval as our decision criterion for the hypothesis test: reject \( H_0 \) if \( \tilde{\mu} \) falls outside the interval, and fail to reject \( H_0 \) otherwise.
Let's recall the definition of \( \alpha \):
Probability of Type I Error
\[ \alpha = P(\text{type I error}) = P(\text{reject }H_0\text{ when }H_0\text{ is true}) \]
Suppose that the null hypothesis is true, i.e., \( \tilde{\mu} = \mu \), yet we find that
\[ \tilde{\mu} \notin \left( \overline{x} - \sigma_\overline{X} z_\frac{\alpha}{2} , \overline{x} + \sigma_\overline{X} z_\frac{\alpha}{2}\right). \]
If \( H_0 \) is actually true, then concluding that \( \tilde{\mu} \) is not a reasonable value for \( \mu \) is precisely a type I error.
If we have constructed a \( (1-\alpha)\times 100\% \) confidence interval, the rate at which
\[ \tilde{\mu} \notin \left( \overline{X} - \sigma_\overline{X} z_\frac{\alpha}{2} , \overline{X} + \sigma_\overline{X} z_\frac{\alpha}{2}\right) \] with respect to infinite replications is precisely the rate of failure, \( \alpha \).
Therefore, we have the equivalence:
\[ (1-\alpha)\times 100\% \text{ confidence} \Leftrightarrow \alpha = P(\text{type I error}). \]
The above relationship expresses the duality of confidence intervals and hypothesis tests.
This explains, in part, why t.test() computes both a confidence interval and a hypothesis test simultaneously.
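As a sketch of this duality with a made-up sample (values are illustrative only; note that t.test() treats \( \sigma \) as unknown and uses the t distribution):
x <- c(49.1, 51.2, 50.3, 48.7, 50.9, 49.5)  # illustrative data
t.test(x, mu = 50, conf.level = 0.95)       # reports both a CI and a test of H_0: mu = 50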
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
alpha <- 2 * (1 - pnorm(1.9))  # significance level implied by the 1.9 standard-error cutoffs
alpha
[1] 0.05743312
where \[ \begin{align} P\left(\overline{X} < \overline{x}_l\right) = P\left(\overline{X} < 48.5\right) \approx P\left(\overline{X} < 50 - 1.9\sigma_\overline{X}\right) & = \frac{\alpha}{2}\\ P\left(\overline{X} > \overline{x}_u\right) = P\left(\overline{X} > 51.5 \right) \approx P\left(\overline{X} > 50 + 1.9\sigma_\overline{X}\right) &= \frac{\alpha}{2}. \end{align} \]
Up to some small rounding errors (as in the book), this matches the probability computed directly:
se <- 2.5 / sqrt(10)  # standard error of the sample mean
mu <- 50              # hypothesized mean under the null
1 - pnorm(51.5, mean=mu, sd=se) + pnorm(48.5, mean=mu, sd=se)  # two-tailed probability
[1] 0.05777957
Probability of Type II Error
\[ \beta = P(\text{type II error}) = P(\text{failing to reject }H_0\text{ when }H_0\text{ is false}). \] The complementary probability, \( 1- \beta \), is called the power of the hypothesis test.
To calculate \( \beta \), we must have a specific alternative hypothesis;
This is because the unknown, true value of \( \mu \) under the alternative determines the sampling distribution for \( \overline{X} \).
For example, suppose that it is important to reject the null hypothesis
\[ H_0 : \mu = 50 \]
whenever the mean burning rate \( \mu \) is greater than \( 52 \) centimeters per second or less than \( 48 \) centimeters per second.
Assuming that the true sampling distribution is centered at \( \mu=52 \) or \( \mu=48 \), we can determine the probability of a type II error \( \beta \);
That is, we will assess how the test procedure performs probabilistically when we should reject \( H_0 \) because the true mean is actually \( \mu = 52 \) or \( \mu = 48 \).
Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition
From the last slide, we can compute the probability of a type II error, in the case that \( H_0: \mu=50 \) is false, where \( \mu=52 \) is the true value, by
\[ \beta = P(48.5 \leq \overline{X} \leq 51.5 \text{ assuming that }\mu = 52). \]
Recall that we computed the standard error in this example as
\[ \sigma_\overline{X} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{10}} \approx 0.79 \]
Therefore, with R, this can be computed directly as follows:
se <- 2.5 / sqrt(10)  # standard error, unchanged from before
mu <- 52              # assumed true mean under the specific alternative
beta <- pnorm(51.5, mean=mu, sd=se) - pnorm(48.5, mean=mu, sd=se)  # P(fail to reject H_0)
beta
[1] 0.2635399
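By the symmetry of the acceptance region about \( 50 \), assuming the true mean is \( \mu = 48 \) gives the same value:
mu <- 48  # the other alternative of interest; se is as above
pnorm(51.5, mean=mu, sd=se) - pnorm(48.5, mean=mu, sd=se)
[1] 0.2635399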
Now that we have established the fundamental tools of hypothesis testing, we will discuss how one formally goes through a hypothesis test.
Let's suppose that we have the sample of observations from the last discussion section:
speed_up_times <- c(3.775302, 3.350679, 4.217981, 4.030324, 4.639692, 4.139665, 4.395575, 4.824257, 4.268119, 4.584193, 4.930027, 4.315973, 4.600101)
n <- length(speed_up_times)
n
[1] 13
We will suppose that we set a value \( \alpha=0.05 \) in advance – this is a standard level for the probability of type I error, but it can differ in practice.
We will also need to specify a null and alternative hypothesis in advance – let these be:
\[ \begin{align} H_0 : \mu = 4.0 & & H_1 : \mu \neq 4.0 \end{align} \]
The procedure is then as follows:
x_bar <- mean(speed_up_times)
x_bar
[1] 4.313222
We need to evaluate how unlikely it is to observe x_bar under the assumption that \( \mu = 4.0 \).
The model for this probability will depend on whether the true population \( \sigma \) is known or unknown.
Let's assume for simplicity at the moment that \( \sigma=0.45 \) is a known value, so that
\[ \overline{X} \sim N\left(\mu, \frac{0.45^2}{13}\right). \]
The standard error can be computed as
se <- 0.45/sqrt(13)
se
[1] 0.1248075
mu <- 4.0                       # hypothesized mean under the null
z_alpha_over_2 <- qnorm(0.975)  # two-sided critical value for alpha = 0.05
cr <- c(mu - se * z_alpha_over_2, mu + se * z_alpha_over_2)  # acceptance region
cr
[1] 3.755382 4.244618
x_bar
[1] 4.313222
x_bar lies outside of the acceptance region – equivalently, our hypothesized value of the true mean, \( 4.0 \), lies outside of the \( 95\% \) confidence interval:
ci <- c(x_bar - se * z_alpha_over_2, x_bar + se * z_alpha_over_2)
ci
[1] 4.068604 4.557840
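A direct programmatic check of the decision (our addition):
x_bar < cr[1] | x_bar > cr[2]  # TRUE means x_bar falls outside the acceptance region
[1] TRUE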
Specifically, with \( 95\% \) confidence, we can say that \( \mu=4.0 \) is not a plausible value for the mean based on the confidence interval.
Alternatively, we can say that there is a probability of less than
lower_tail_probability <- pnorm(mu - se * z_alpha_over_2, mean=mu, sd=se)
upper_tail_probability <- 1 - pnorm(mu + se * z_alpha_over_2, mean=mu, sd=se)
alpha <- lower_tail_probability + upper_tail_probability
alpha
[1] 0.05
of observing such a value for the sample mean under our model for the sampling distribution.
Denoting by \( \alpha \) the significance level, we state: we reject \( H_0: \mu = 4.0 \) in favor of \( H_1: \mu \neq 4.0 \) at the \( \alpha = 0.05 \) significance level.
Continuing the last example, suppose we had a specific value for the alternative hypothesis in mind, \( H_1: \mu = 4.5 \).
We can check the power of the hypothesis test versus this value of the alternative hypothesis as follows.
The acceptance region for the hypothesis test is again given as
cr
[1] 3.755382 4.244618
We fail to reject the null if the sample mean falls within this region.
If we assume that the true model for the sample mean is given as
\[ \begin{align} \overline{X} \sim N\left(4.5, \frac{0.45^2}{13}\right) \end{align} \]
the probability of a type II error is given as
\[ \begin{align} \beta = P\left( 3.755382 \leq \overline{X} \leq 4.244618\right) \end{align} \] given the above model.
We can compute
\[ \begin{align} \beta = P\left( 3.755382 \leq \overline{X} \leq 4.244618\right) \end{align} \] under the assumption
\[ \begin{align} \overline{X} \sim N\left(4.5, \frac{0.45^2}{13}\right) \end{align} \] as
beta <- pnorm(4.244618, mean=4.5, sd=se) - pnorm(3.755382, mean=4.5, sd=se)
beta
[1] 0.02036803
1 - beta
[1] 0.979632
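As a rough cross-check (our addition; power.t.test() uses the t distribution rather than the known-\( \sigma \) normal model, so it agrees only approximately):
# approximate power against the shift delta = 4.5 - 4.0 = 0.5
power.t.test(n = 13, delta = 0.5, sd = 0.45, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")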
One way to report the results of a hypothesis test is to state that the null hypothesis was or was not rejected at a specified \( \alpha \)-value or level of significance.
This is called fixed significance level testing.
The fixed significance level approach to hypothesis testing is very nice because it leads directly to the concepts of type II error and power;
However, the fixed significance level approach does have some disadvantages.
In our last example, \( H_0 : \mu = 4.0 \) was rejected at the \( 0.05 \) level of significance.
This statement of conclusions may often be inadequate because it gives the decision maker no idea about whether the computed value of the test statistic was just barely in the rejection region or whether it was very far into this region.
Furthermore, stating the results this way imposes the predefined level of significance on other users of the information.
This approach may be unsatisfactory because some decision makers might be uncomfortable with the risks implied by \( \alpha= 0.05 \).
To avoid these difficulties, the P-value approach has been adopted widely in practice.
The P-value is the probability that the test statistic will take on a value that is at least as extreme as the observed value of the statistic, when the null hypothesis \( H_0 \) is true.
Thus, a P-value conveys much information about the weight of evidence against \( H_0 \);
We now give a formal definition of a P-value.
P-Value
The P-value is the smallest level of significance that would lead to rejection of the null hypothesis \( H_0 \) with the given data.
It is customary to consider the test statistic (and the data) significant when the null hypothesis \( H_0 \) is rejected;
In other words, the P-value is the observed significance level.
Once the P-value is known, the decision maker can determine how significant the data are without the data analyst formally imposing a pre-selected level of significance.
speed_up_times <- c(3.775302, 3.350679, 4.217981, 4.030324, 4.639692, 4.139665, 4.395575, 4.824257, 4.268119, 4.584193, 4.930027, 4.315973, 4.600101)
x_bar
[1] 4.313222
Our null and alternative hypotheses were given as
\[ \begin{align} H_0: \mu = 4.0 & & H_1: \mu \neq 4.0 \end{align} \]
The alternative hypothesis specifies that the critical region under consideration is two-sided, like a two-sided confidence interval.
Therefore, the P-value will measure the probability of observing a sample mean at least as far away as x_bar from \( \mu=4.0 \) in either direction, under the model
\[ \overline{X} \sim N\left(\mu, \frac{0.45^2}{13}\right). \]
z_score <- (x_bar - 4.0)/se
z_score
[1] 2.509641
Therefore, the observed value for the sample mean lies \( \approx 2.5 \) standard deviations to the right of the proposed mean.
The P-value thus corresponds to the probability of observing a standard normal random variable taking a value at least as extreme as the z-score in either direction, i.e.,
\[ \approx P(Z< -2.5 ) + P(Z> 2.5). \]
We compute
P_value <- pnorm(-z_score) + (1 - pnorm(z_score))
P_value
[1] 0.01208539
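Since this P-value is smaller than our pre-selected \( \alpha = 0.05 \), we again reject \( H_0 \); a final check:
P_value < 0.05  # reject H_0 at the 5% significance level
[1] TRUE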