04/14/2020

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

- The following topics will be covered in this lecture:
- A quick discussion of confidence intervals for the variance
- Tests of significance
- The null hypothesis
- The alternative hypothesis
- The process of hypothesis testing
- Significance levels versus confidence levels
- Test statistics
- P-values
- Critical values
- Drawing conclusions
- Type I and type II errors

Courtesy of Mario Triola, *Essentials of Statistics*, 6th edition

- In the last lecture, we covered how to estimate a **population proportion \( p \)** and a **population mean \( \mu \)**.
- In both cases, a **sample statistic generates a “point estimate”** as a kind of “best guess” given a certain collection of data.
- Likewise, we needed a **“confidence interval”** to quantify **how uncertain** this best guess was, and to give **a range of other plausible values for the parameter**.
- In both cases, our confidence interval **needed to use some estimate of the standard deviation** of the **population** to **estimate the standard error** of the **sampling distribution**.
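As a concrete refresher of the procedure above, the steps can be sketched in Python. This is only an illustration, not course material: the sample values below are made up, and `scipy` is assumed to be available for the student t critical value.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, for illustration only
sample = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5])
n = len(sample)

x_bar = sample.mean()       # point estimate of the population mean mu
s = sample.std(ddof=1)      # sample standard deviation, estimating sigma
se = s / np.sqrt(n)         # estimated standard error of x-bar

# 95% confidence interval using the student t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (x_bar - t_crit * se, x_bar + t_crit * se)
print(ci)
```

The key point for what follows is the middle step: because \( \sigma \) is unknown, the standard error must itself be estimated from \( s \).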

- The **standard error** tells us how the **sample statistic** varies around the **true parameter** **under replication of samples**.
- When estimating the mean without knowledge of **\( \sigma \)**, we used the sample standard deviation **\( s \)** to estimate the **standard error \( \sigma_\overline{x} \)**.
- We know that **\( s^2 \)** is the best, unbiased estimator for **\( \sigma^2 \)**, and although **\( s \)** is a biased estimator for **\( \sigma \)**, it is still usually the “best” option in some sense.
- A more complicated question is the following: **how do we produce confidence intervals for \( \sigma \)** that take into account the uncertainty in our sample-based estimates of this parameter?
- This is especially important because the **sample variances** are **distributed right-skewed** around the **true population variance**.
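Both claims above, that \( s^2 \) is unbiased but its distribution is right-skewed, can be checked with a small simulation. This is a sketch for intuition only; the population (normal with \( \sigma = 2 \)) and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                 # true population standard deviation, so sigma^2 = 4
n, reps = 10, 100_000       # sample size and number of replicated samples

# Draw many samples of size n and compute the sample variance of each
samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)

print(s2.mean())      # close to 4: s^2 is unbiased for sigma^2
print(np.median(s2))  # noticeably below 4: the distribution is right-skewed
```

The mean of the simulated sample variances sits essentially on top of \( \sigma^2 \), while the median falls below it, the signature of a right-skewed distribution.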

Courtesy of Mario Triola, *Essentials of Statistics*, 6th edition

- Because this is a more complicated topic and goes slightly beyond the overall scope of the course, **this material will not have homework assignments or be tested**.
- The purpose of the first part of this lecture is to give exposure to some advanced topics that will be useful for future work with statistical methods.
- The first advanced topic we will need to introduce is a very non-normal probability distribution, the **“chi-square” distribution**.
- Usually, this is denoted \( \chi^2(k) \), where \( \chi \) is the Greek letter “chi”.

- The value **\( k \) corresponds to the number of “degrees of freedom”**, like the degrees of freedom of the student t distribution, and will be introduced shortly.
- Before we introduce the \( \chi^2 \) distribution formally, we want to note a few qualitative features of the distribution:
- If \( x \) is a random variable that behaves like \( \chi^2 \), **\( x \) will only take on nonnegative values**, \[ x \geq 0 \] over any realization.
- The distribution of values \( x \) under \( \chi^2 \) is **right-skewed**, i.e., most values are concentrated to the left near zero, but **extremely large values occur with much higher frequency than with a normal distribution**.
- Interestingly, despite these differences from the normal distribution, \( \chi^2 \) is also closely related to the normal.
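As a preview of that relationship (a sketch, not course material): if \( z_1, \dots, z_k \) are independent standard normal random variables, then \( z_1^2 + \cdots + z_k^2 \) follows a \( \chi^2(k) \) distribution. A quick simulation shows that such sums exhibit exactly the qualitative features listed above, with \( k = 5 \) chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)
k, reps = 5, 100_000

# Sum k squared standard normals to obtain chi-square(k) draws
z = rng.standard_normal(size=(reps, k))
x = (z ** 2).sum(axis=1)

print(x.min() >= 0)              # True: chi-square values are nonnegative
print(x.mean())                  # near k = 5, the mean of chi-square(k)
print(x.mean() > np.median(x))   # True: mean exceeds median, i.e., right-skewed
```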