Introduction to hypothesis testing part II

04/16/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

  • The following topics will be covered in this lecture:
    • A short review of hypothesis testing
    • Hypothesis testing and confidence intervals
    • Examples of testing a hypothesis about a population proportion
    • Examples of testing a hypothesis about a population mean

Review of hypothesis testing

Flowchart for hypothesis testing.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will begin a review of hypothesis testing – recall:
    1. \( H_0 \) – this is the null hypothesis.
      • The null hypothesis is symbolically a statement about some population parameter being equal \( (=) \) to some value.
    2. \( H_1 \) – this is the alternative hypothesis.
      • The alternative hypothesis is the statement that the population parameter is different from the value in the null.
      • Symbolically, it will always take the form of \( ( > / < / \neq) \) in terms of the parameter in question.
      • The form of the alternative hypothesis determines whether we consider a:
        1. \( < \) – left-sided test;
        2. \( > \) – right-sided test; or a
        3. \( \neq \) – two-sided test;
      • in the measure of “extremeness” of the test statistic with respect to the null hypothesis.
  • Typically, our research hypothesis is phrased in terms of an inequality so that it is written as the alternative hypothesis.

Review of hypothesis testing continued

Flowchart for hypothesis testing.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • The first step is thus to identify the claim and write it as an equality, or more typically as an inequality.
  • We then write the contradictory claim and identify the null hypothesis \( H_0 \) \( (=) \) and alternative hypothesis \( H_1 \) \( (< / > / \neq) \).
  • We then assume the null hypothesis and select a significance level \( \alpha \).
    • The significance level \( \alpha \) is defined as \[ \begin{align} \alpha= P(\text{Rejecting the null }H_0\text{ when }H_0\text{ is actually true}). \end{align} \]
  • This is precisely due to the fact that,
    we reject the null hypothesis when the probability of observing a sample at-least-as extreme as our test statistic is less than \( \alpha \), under the assumption of \( H_0 \).
  • The test statistic is the evidence from sampling that we compare with the null hypothesis;
    • there is a possibility that we incorrectly reject \( H_0 \) due to chance based on sampling error, and this occurs with probability \( \alpha \).
  • This type of mistake is known as type I error, or a false positive in terms of favoring the alternative.
  • We measure how extreme a test statistic is (usually) with P-values or (less commonly) with critical values.

Review of hypothesis testing continued

Flowchart for hypothesis testing.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • In fact, P-values and critical values are equivalent representations of the same at-least-as-extreme-as measure.
  • In the case of critical values, we construct the region for which:
    1. \( H_1:< \) – there is probability of \( \alpha \) for randomly selecting an observation to the left of this region;
    2. \( H_1:> \) – there is a probability of \( \alpha \) for randomly selecting an observation to the right of this region; or
    3. \( H_1: \neq \) – there is a probability of \( \frac{\alpha}{2} \) of randomly selecting an observation to the left, or \( \frac{\alpha}{2} \) of randomly selecting an observation to the right.
  • With P-values, instead of graphically considering the region, we numerically compute the probability (P-value) of randomly selecting an observation at-least-as extreme as our test statistic directly.
  • We make the same considerations as above with respect to the form of the alternative hypothesis when we compute this probability.
  • For left or right sided tests, we find the probability of randomly selecting an observation at-least-as far left / far right (\( H_1: < \) or \( H_1: > \)).
  • For two sided tests, we find the probability of randomly selecting an observation at-least-as far from the center in either direction \( H_1:\neq \).
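The three tail conventions above can be computed numerically. As an illustrative alternative to StatCrunch, here is a minimal Python sketch (assuming SciPy is available) that returns the P-value of a z test statistic for each form of the alternative hypothesis:

```python
from scipy.stats import norm

def z_pvalue(z, alternative):
    """P-value of a z test statistic for the given alternative hypothesis."""
    if alternative == "less":       # H1: <, left-sided test
        return norm.cdf(z)          # probability at least as far left
    elif alternative == "greater":  # H1: >, right-sided test
        return norm.sf(z)           # probability at least as far right
    else:                           # H1: !=, two-sided test
        return 2 * norm.sf(abs(z))  # alpha/2 in each tail, so double one tail
```

For instance, a right-sided test statistic of \( z = 2.54 \) gives a P-value of about \( 0.0055 \), well below a significance level of \( \alpha = 0.05 \).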

Confidence intervals and hypothesis testing

  • Before we go through examples, we should make one note about the correspondence between hypothesis testing and confidence intervals.
  • In fact, hypothesis tests and confidence intervals for the mean (and for the standard deviation) are entirely equivalent.
  • Indeed, let us suppose that we have some confidence interval, \[ (\overline{x} - E, \overline{x} + E) \] at a \( (1-\alpha)\times 100\% \) level of confidence.
  • Remember, this confidence interval depends on a random realization of the sample mean \( \overline{x} \) and, possibly, a random realization of the sample standard deviation \( s \).
  • Suppose we had a hypothetical value for the mean \( \tilde{\mu} \) that we wanted to test as the null, \[ H_0: \mu = \tilde{\mu}, \] with \( \alpha \) significance.
  • If we found that \( \tilde{\mu} \) was not in the interval \[ (\overline{x} - E, \overline{x} + E), \] we could equivalently reject the null \( H_0:\mu=\tilde{\mu} \) with \( \alpha \) significance.
  • The same is not true for hypothesis tests and our earlier confidence intervals of population proportions, due to the approximations we made for this confidence interval.
  • More advanced techniques, mentioned briefly before, do not suffer from this inconsistency, however, and modern statistical software can make the calculation consistent.
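This equivalence can be checked numerically. The following Python sketch (SciPy assumed, with made-up summary statistics purely for illustration) confirms that a two-sided t test rejects \( H_0:\mu=\tilde{\mu} \) at level \( \alpha \) exactly when \( \tilde{\mu} \) falls outside the \( (1-\alpha)\times 100\% \) confidence interval:

```python
from math import sqrt
from scipy import stats

# Hypothetical summary statistics, for illustration only
n, xbar, s = 25, 10.4, 2.0
alpha = 0.05
mu0 = 9.2  # hypothesized mean under H0

# (1 - alpha) confidence interval for the mean (t-based margin of error E)
E = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / sqrt(n)
ci = (xbar - E, xbar + E)

# Two-sided t test of H0: mu = mu0
t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# The test rejects at level alpha exactly when mu0 lies outside the CI
reject = p_value < alpha
outside = not (ci[0] <= mu0 <= ci[1])
assert reject == outside
```

The final assertion encodes the correspondence stated above: rejecting the null at significance \( \alpha \) and \( \tilde{\mu} \) falling outside the interval always occur together.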

Examples of testing hypotheses for a population proportion

P-value of the hypothesis test. Critical region of the hypothesis test.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will now demonstrate how to solve the earlier example on drone-based delivery, now using technology, with both the P-value and critical value methods.
  • Let’s recall that there were \( n=1009 \) total observations in the survey in which participants were asked if they were uncomfortable with drone-based delivery of household goods.
  • \( 545 \) participants responded that they were uncomfortable with drone-based delivery.
  • This means that using the normal distribution approximation is OK, because there are at least \( 5 \) successes and at least \( 5 \) failures, when we count a success as a response opposed to drone-based delivery.
  • The null and alternative hypotheses were given \[ \begin{align} H_0:p=0.5 & & H_1: p> 0.5. \end{align} \] and we selected a significance level of \( \alpha=0.05 \).
  • We computed the test statistic as, \[ \frac{\hat{p} - p}{\sqrt{\frac{p\times q}{n}}} = \frac{0.540 - 0.50}{\sqrt{\frac{0.5\times 0.5}{1009}}} \approx 2.54. \]
  • To the left, we see equivalent ways of viewing the test statistic – on the top with the P-value and on the bottom with the critical region, with \( z_\alpha = 1.645 \) corresponding to \( \alpha\times 100\% = 5\% \).
  • We will now go through how to compute both of these directly in StatCrunch.
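Outside of StatCrunch, the same numbers can be verified in a few lines of Python (SciPy assumed). Note that computing with the unrounded \( \hat{p} = 545/1009 \) gives \( z \approx 2.55 \); the \( 2.54 \) above uses \( \hat{p} \) rounded to \( 0.540 \):

```python
from math import sqrt
from scipy.stats import norm

# Drone survey data: 545 of n = 1009 respondents were uncomfortable
n, x = 1009, 545
p0 = 0.5                  # population proportion under H0
phat = x / n              # sample proportion, about 0.540

# z test statistic under the normal approximation
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)

# Right-sided P-value for H1: p > 0.5
p_value = norm.sf(z)
```

The P-value comes out to roughly \( 0.005 \), below \( \alpha = 0.05 \), so we reject the null in favor of \( H_1: p > 0.5 \).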

Examples of testing hypotheses for a population proportion continued

  • Let us now consider a new example.
  • In a study in Neurology magazine, the authors found that approximately \( 29.2\% \) of studied participants among \( 19,136 \) total had sleepwalked at some point.
  • Consider the following: let \( p \) be the population parameter for the proportion of US adults who have sleepwalked.
  • Suppose we want to claim that fewer than \( 30\% \) of the population of US adults have sleepwalked. What would be the appropriate null \( H_0 \) and alternative \( H_1 \) hypothesis in this case?
    • Notice that our claim is given as \( p< 0.3 \), so that the contradictory claim would be given as \( p\geq 0.3 \).
    • Then, the smallest proportion of US adults who could have sleepwalked without being less than \( 30\% \) is exactly \( p=0.3 \).
    • When we identify the null hypothesis \( H_0:= \) and the alternative hypothesis \( H_1: > / < / \neq \), we should find \[ \begin{align} H_0: p=0.3 & & H_1: p < 0.3. \end{align} \]
  • Suppose we want to test this hypothesis with a significance level of \( \alpha =0.05 \). Recall that in StatCrunch we had to input the total number of observations and the total number of successes.
  • However, we only have a value of \( \hat{p}=29.2\% \) of the sample in this case.
  • Consider the following: if \( x \) is the value of the number of successes, i.e., the total number of participants in the sample who have sleepwalked, how can we find this from the above?
    • Recall, \( \hat{p}= \frac{x}{n} \) so that, \[ n \times \hat{p} = 19136 \times 0.292 = 5587.712 \approx x . \]
    • We need to round this to a whole number of participants, so the closest one is \( x= 5588 \), which we will take in this example.

Examples of testing hypotheses for a population proportion continued

  • Recall, our test statistic for the hypothesis test is the z score of \( \hat{p} = \frac{5588}{19136} \) under the null hypothesis that the mean of the sampling distribution is \( p =0.3 \), with standard deviation (standard error) \[ \sigma_{\hat{p}} = \sqrt{\frac{p\times q }{n}}. \]
  • Consider the following: can you compute the test statistic given the above information?
    • Given the above, we have \( q = 1.0 - 0.3 = 0.7 \) so that the z score is given \[ z = \frac{\hat{p} - p }{\sqrt{\frac{p\times q}{n}}} =\frac{\frac{5588}{19136} - 0.3}{\sqrt{\frac{0.3\times 0.7}{19136}}}\approx -2.41 \]
  • If you remember the value for the \( z_\alpha = z_{0.05}\approx 1.645 \) you can deduce the critical region for the left-sided hypothesis test by the symmetry of the normal distribution.
  • However, we will now use StatCrunch to evaluate the hypothesis directly in the following.
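As a check on the StatCrunch output, the left-sided test can be sketched in Python (SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

# Sleepwalking survey: x = 5588 successes among n = 19136 participants
n, x, p0 = 19136, 5588, 0.3
phat = x / n

# Left-sided z test of H0: p = 0.3 versus H1: p < 0.3
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)

p_value = norm.cdf(z)      # left-tail probability of the test statistic
critical = norm.ppf(0.05)  # about -1.645; reject when z < critical
```

Here \( z \approx -2.41 \) lies to the left of the critical value \( \approx -1.645 \), and the P-value (about \( 0.008 \)) is below \( \alpha = 0.05 \), so we reject the null in favor of \( H_1: p < 0.3 \).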

Examples of testing hypotheses for a population proportion continued

  • We should emphasize that using the z score as the test statistic is usually only appropriate when there are at least \( 5 \) successes and at least \( 5 \) failures for the binomial trial.
  • This is what allows us to use the normal distribution as a good approximation for the binomial distribution.
  • However, when we use statistical software, we will usually compute the test statistic exactly from the binomial distribution, without the normal approximation.
  • Therefore, we can make a hypothesis test with a small number of samples using statistical software directly.
  • Suppose we have a small sample size of \( 10 \) couples who are given a fertility treatment that is claimed to increase the rate of newborn girls above \( 75\% \).
  • Consider the following: let’s suppose that \( 9 \) out of \( 10 \) babies are girls – can we claim with \( \alpha=0.05 \) significance that this is correct?
    • Notice that if \( p \) is the population proportion of girls born under the treatment then \[ \begin{align} H_0:p=0.75 && H_1:p>0.75 \end{align} \] as \( p=0.75 \) is the largest proportion of baby girls that can contradict the claim.
    • The z score is no longer relevant here because we have too few samples, but we can compute the hypothesis test directly in StatCrunch as follows.
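As a sketch of what such software does under the hood, SciPy's exact binomial test (an assumed stand-in for the StatCrunch calculation) computes the right-tail probability directly from the binomial distribution, with no normal approximation:

```python
from scipy.stats import binomtest

# 9 girls among 10 births; H0: p = 0.75 versus H1: p > 0.75
result = binomtest(k=9, n=10, p=0.75, alternative="greater")

# Exact right-tail probability P(X >= 9) under H0
p_value = result.pvalue   # about 0.244
```

Since the P-value of roughly \( 0.244 \) exceeds \( \alpha = 0.05 \), we fail to reject the null: 9 girls out of 10 births is not sufficient evidence for the claim at this significance level.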

Examples of testing hypotheses for a population mean

  • We will now consider some examples of making hypothesis tests for a population mean.
  • We should recall here the requirements that we have for making such a test, which are the same as for computing a confidence interval:
    • Observations should come from simple random sampling.
    • The observations \( x_1, \cdots, x_n \) can come from any underlying distribution;
    • however, if it is non-normal, there should be \( n>30 \) observations in the sample for the distribution of \( \overline{x} \) to be sufficiently normal.
    • Generally we do not know \( \sigma \), and in this case we use the test statistic, \[ \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}}, \] distributed according to a student t with \( n-1 \) degrees of freedom.
    • In the rare case when \( \sigma \) is known, we can use the test statistic, \[ \frac{\overline{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \] which is distributed as a standard normal.
  • In general, we will of course try to use modern statistical software to make these computations, but it is important to understand how these pieces fit together even when we use software.

Examples of testing hypotheses for a population mean continued

Histogram and Q-Q plot of the sample.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s suppose that the \( n=12 \) observations to the left come from a simple random sample of US adults.
  • The measurements are the number of hours of sleep that each individual has per night on average over the year.
  • The sample mean is given as \( \overline{x}\approx 6.8333 \) hours and the sample standard deviation is given as \( s =1.9924 \) hours.

  • Consider the following: can we use a hypothesis test with this data to claim with \( \alpha=0.05 \) significance that the population mean number of hours of sleep is less than \( 7 \) hours? Are the necessary assumptions satisfied?
    • Using the histogram and the Q-Q plot, we can see that the sample is approximately normal:
      • there is a symmetric bell shape to the histogram with no outliers, and the Q-Q plot roughly follows the diagonal line.
    • Then, given the above claim we have, \[ \begin{align} H_0: \mu = 7 & & H_1: \mu < 7 \end{align} \] as \( \mu=7 \) is the smallest number of hours of sleep that contradicts the above claim.
    • The test statistic is thus given as \[ \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{6.8333 - 7 }{\frac{1.9924}{\sqrt{12}}} \approx -0.290. \]
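The computation above can be reproduced from the summary statistics alone; the following Python sketch (SciPy assumed) also attaches the left-sided P-value:

```python
from math import sqrt
from scipy.stats import t

# Sleep example summary statistics: n = 12, xbar = 6.8333, s = 1.9924
n, xbar, s = 12, 6.8333, 1.9924
mu0 = 7.0   # H0: mu = 7 versus H1: mu < 7

# t test statistic with n - 1 = 11 degrees of freedom
t_stat = (xbar - mu0) / (s / sqrt(n))   # about -0.290

p_value = t.cdf(t_stat, df=n - 1)       # left-sided P-value
```

The P-value comes out to roughly \( 0.39 \), far above \( \alpha = 0.05 \), so we fail to reject the null at this significance level.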

Examples of testing hypotheses for a population mean continued

Histogram and Q-Q plot of the sample.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • From the last slide we had a test statistic \[ \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{6.8333 - 7 }{\frac{1.9924}{\sqrt{12}}} \approx -0.290, \] which must be t distributed because we used the sample standard deviation \( s \).
  • Also, we know that with \( n=12 \) observations, we have precisely \( n-1=11 \) degrees of freedom for the student t distribution.
  • With the test statistic in hand, we can evaluate the hypothesis test by either:
    1. the left-sided critical value; or
    2. the left-sided P-value;
  • to measure if the probability of randomly selecting a sample mean at least as extreme as \( \overline{x}=6.8333 \) under the null hypothesis is less than \( \alpha=0.05 \).
  • Moreover, for the hypothesis tests of the mean, this is completely equivalent to finding the \( (1-\alpha)\times 100\%=95\% \) confidence interval for the mean.
  • We will examine each of these methods in StatCrunch directly as follows.

Examples of testing hypotheses for a population mean continued

  • As a final example, we will consider the claim that the population mean body temperature is \( \mu=98.6 \) degrees F.
  • We suppose that we have \( n=106 \) observations with a sample mean of \( 98.20 \) degrees F and a sample standard deviation of \( s= 0.62 \) degrees F, and that we wish to test this with \( \alpha=0.05 \) significance.
  • Consider the following: in this example, what are the null and alternative hypotheses?
    • In this case, the contradictory claim is actually the alternative hypothesis, \[ \begin{align} H_0: \mu = 98.6 & & H_1: \mu \neq 98.6 \end{align} \]
  • In this case, because we can only reject the null in favor of the alternative, we cannot provide proof that \( \mu=98.6 \) degrees F.
    • Indeed, we only have the possibility of providing evidence that \( \mu=98.6 \) degrees F is unlikely.
  • The test statistic is given as, \[ \frac{\overline{x} - \mu}{\frac{s}{\sqrt{n}}} = \frac{98.2 - 98.6}{\frac{0.62}{\sqrt{106}}} \approx -6.64. \]
  • We have \( n=106 \) observations, so we have enough data to perform the hypothesis test by the critical value method, the P-value method, or by confidence intervals for the mean.
  • We will go through each of these in the following.
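All three approaches can be sketched at once in Python (SciPy assumed), working from the summary statistics:

```python
from math import sqrt
from scipy.stats import t

# Body temperature example: n = 106, xbar = 98.20 F, s = 0.62 F
n, xbar, s = 106, 98.20, 0.62
mu0 = 98.6   # H0: mu = 98.6 versus H1: mu != 98.6
alpha = 0.05

# Test statistic with n - 1 = 105 degrees of freedom
t_stat = (xbar - mu0) / (s / sqrt(n))       # about -6.64

# Two-sided P-value: double the tail beyond |t_stat|
p_value = 2 * t.sf(abs(t_stat), df=n - 1)   # far below alpha: reject H0

# Critical value for the two-sided test: reject when |t_stat| > t_crit
t_crit = t.ppf(1 - alpha / 2, df=n - 1)

# Equivalent 95% confidence interval for the mean
E = t_crit * s / sqrt(n)
ci = (xbar - E, xbar + E)
```

All three views agree: \( |-6.64| \) exceeds the critical value, the P-value is essentially zero, and \( 98.6 \) falls outside the \( 95\% \) confidence interval, so we reject the null.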