
FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

- The following topics will be covered in this lecture:
- Confidence intervals
- Hypothesis testing
- Types of errors
- General hypothesis testing of the mean

Recall again the **sampling distribution** for the sample mean:

- If we can write the **distribution for the sample statistic** as approximately \( N\left(\mu, \frac{\sigma^2}{n}\right) \), we can identify, **with some probability**, **how accurate our estimate for the true mean** is.
- The **standard error** is defined as the **standard deviation of this sample statistic distribution, i.e., \( \sigma_{\hat{\theta}} \)**.
- This quantifies how far an observed realization is likely to lie away from the true parameter.

The standard error of the sample mean measures the accuracy of the estimation of the mean.

- Correspondingly, **confidence intervals** **quantify how close the sample mean is expected to be to the population mean**.

We will recall how to construct confidence intervals for the mean of a normal distribution.

- Consider a normal population with an **unknown mean \( \mu \)** and **known standard deviation \( \sigma \)**.
- Let \( X_i \sim N\left(\mu, \sigma^2 \right) \) for \( i = 1, \cdots, n \) be the sample rv’s.
- Then the sample mean of the random variables satisfies \[ \begin{align} \overline{X}_n &= \frac{1}{n}\sum_{i=1}^n X_i \\ \overline{X}_n &\sim N\left(\mu, \frac{\sigma^2}{n}\right) \end{align} \]
- Moreover, we can show that by **standardizing the sample mean** of the random variables, shifting to mean zero and dividing by the standard deviation, we have \[ \sqrt{n}\frac{\overline{X}_n -\mu}{\sigma} \sim N(0, 1) \]
- Now, let \( z_{1-\frac{\alpha}{2}} \) be defined such that for \( Z \sim N(0, 1) \), \[ P\left(-z_{1-\frac{\alpha}{2}}\leq Z \leq z_{1-\frac{\alpha}{2}}\right) = 1 - \alpha. \]
- Such a choice exists by the symmetry of the standard normal distribution about zero.

Courtesy of Härdle, W.K. et al. *Basic Elements of Computational Statistics*. Springer International Publishing, 2017.

Using the results from the last slide, we can say that for \( Z= \sqrt{n}\frac{\overline{X}_n -\mu}{\sigma} \),

\[ \begin{align} & P\left(-z_{1−\frac{\alpha}{2}}\leq Z \leq z_{1−\frac{\alpha}{2}}\right) = 1 - \alpha \\ \Leftrightarrow & P\left(-z_{1−\frac{\alpha}{2}}\leq \sqrt{n}\frac{\overline{X}_n -\mu}{\sigma} \leq z_{1−\frac{\alpha}{2}}\right) = 1 - \alpha \end{align} \]

We will re-write the interval in the above as follows:

\[ \begin{align} \left(-z_{1−\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\leq \overline{X}_n -\mu \leq z_{1−\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\right) &= \left(-\overline{X}_n -z_{1−\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\leq -\mu \leq -\overline{X}_n+ z_{1−\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\right) \\ &= \left(\overline{X}_n -z_{1−\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}\leq \mu \leq \overline{X}_n+ z_{1−\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\right) \end{align} \]

From the above statement, we can read that

Upon replication of a sample of size \( n \), the random interval \[ \left(\overline{X}_n -z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}},\ \overline{X}_n+ z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\right) \] has a probability of \( 1-\alpha \) of covering \( \mu \). Particularly, for a given observed sample mean \( \overline{x}_n \), constructing a confidence interval as above will keep \( \overline{x}_n \) within a radius of \( z_{1-\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}} \) of the true value \( \mu \) in \( (1-\alpha)\times 100\% \) of infinite replications.
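This coverage statement can be checked empirically. The following is a minimal sketch in R; the mean \( \mu \), standard deviation \( \sigma \), sample size, and number of replicates are illustrative assumptions, not values from the lecture:

```r
# Empirical check of confidence interval coverage with known sigma.
# mu, sigma, n, and the replicate count are illustrative assumptions.
set.seed(42)
mu <- 5; sigma <- 2; n <- 30; alpha <- 0.05
z <- qnorm(1 - alpha / 2)                       # z_{1-alpha/2} critical value

covered <- replicate(10000, {
  x <- rnorm(n, mean = mu, sd = sigma)          # one replicate of the sample
  xbar <- mean(x)
  radius <- z * sigma / sqrt(n)
  (xbar - radius <= mu) && (mu <= xbar + radius)
})

mean(covered)  # should be close to 1 - alpha = 0.95
```

The fraction of replicates whose interval catches \( \mu \) should approach \( 1-\alpha \) as the number of replicates grows.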

- What we are imagining when we construct confidence intervals is the following.
- Based on some **particular sample** \( X_{j,1},\cdots, X_{j,n} \) of size \( n \) indexed by \( j \), we will get some **particular value for the confidence interval**.
- If we **replicate the sample of size** \( n \), indexed by \( j \), we will almost surely find a **new confidence interval based on each replicate**.

Courtesy of Montgomery & Runger, *Applied Statistics and Probability for Engineers*, 7th edition

- Our goal in constructing a confidence interval is thus to **catch the true parameter value** with the **confidence level \( (1-\alpha)\times 100\% \) out of all replicates**.
- If we want higher confidence, we need wider intervals to catch the true value.

- However, the normal confidence interval,
\[ \overline{X} - \frac{\sigma}{\sqrt{n}}z_{1-\frac{\alpha}{2}} \leq \mu \leq \overline{X} + \frac{\sigma}{\sqrt{n}} z_{1-\frac{\alpha}{2}}, \]
also has a **width that depends on the sample size**.
- This is because, as we discussed with the central limit theorem, the precision of the sample mean \( \overline{X} \) increases for larger sample sizes, with a **standard deviation that shrinks at a rate like \( \frac{1}{\sqrt{n}} \)**.
- This allows us to select a sample size for a target precision, given a level of confidence.

The issue with the mentioned approach to confidence intervals is that the **true population value of \( \sigma \) is almost never known** in any practical application.

For this reason, we can pass to **Student's t-distribution** again.

Recall that we showed that for the sample mean \( \overline{X}_n \) of the normal random variables and the sample standard deviation \( S \),

\[ \frac{\overline{X}_n - \mu}{\frac{S}{\sqrt{n}}} \sim t_{n-1}. \]

Therefore, in practice we can construct the same type of random interval but with a **\( t_{\frac{\alpha}{2}} \) critical value of \( t_{n-1} \)**, \[ \left(\overline{X}_n -t_{\frac{\alpha}{2}} \frac{S}{\sqrt{n}}\leq \mu \leq \overline{X}_n+ t_{\frac{\alpha}{2}}\frac{S}{\sqrt{n}}\right). \]

The above derivation is at the basis of practical confidence intervals for the population mean.
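As a sanity check, we can build this t interval by hand in R and compare it with the one `t.test()` reports; the sample below is simulated purely for illustration:

```r
# Hand-built t confidence interval versus t.test(); data are simulated.
set.seed(1)
x <- rnorm(15, mean = 10, sd = 3)        # illustrative sample
n <- length(x); alpha <- 0.05
tcrit <- qt(1 - alpha / 2, df = n - 1)   # t critical value of t_{n-1}
se <- sd(x) / sqrt(n)                    # estimated standard error of the mean
manual_ci <- c(mean(x) - tcrit * se, mean(x) + tcrit * se)

manual_ci
t.test(x, conf.level = 0.95)$conf.int    # should match manual_ci
```

The two intervals agree because `t.test()` performs exactly this construction internally.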

A related, **dual concept** is the **hypothesis test for the mean**.

- Suppose we are **estimating the population mean** \( \mu \), and we have some **hypothesis as to what the value might be**, \( \tilde{\mu} \).
- Let us suppose we created a \( 95\% \) confidence interval, \[ \left(\overline{X} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}, \overline{X} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\right), \] and upon comparing with some realization \( \overline{x} \) we found that \( \tilde{\mu} \) **was not in this region**.
- If we are following the procedure correctly, and if \( \tilde{\mu} \) were actually equal to the true population \( \mu \), then \[ \tilde{\mu} \text{ not in } \left(\overline{X} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}, \overline{X} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\right) \] only \( 5\% \) of the time.
- If we were to find that \( \tilde{\mu} \) was actually in our confidence intervals in far less than \( 95\% \) of replications, this should lead us to question whether \( \tilde{\mu} \) was really a good hypothesis for the true \( \mu \).
- In this sense, \( \alpha \) represents a kind of criterion for whether we should question if a proposed value \( \tilde{\mu} \) is really appropriate.

- Formally, we will define:

Statistical Hypothesis

A statistical hypothesis is a statement about the parameters of one or more populations.

In hypothesis testing, the null and alternative hypotheses have special meanings philosophically and in the mathematics.

**We cannot generally “prove” a hypothesis to be true**;
- generically, we will assume that the true population parameter is unobservable.

Instead, **we can only determine if a hypothesis seems unlikely enough to reject**;
- this is similar to finding that our proposed parameter value was in far fewer confidence intervals than predicted by the procedure.

To begin such a test formally, we need to first **make some assumption about the true parameter**.
- This always takes the form of **assuming the null hypothesis** \( H_0 \).

The **null hypothesis** \( H_0 \) will always **take the form of an equality, or an inclusive inequality**.
- That is, we take

\[ \begin{align} H_0: & \theta \text{ is } (= / \leq / \geq) \text{ some proposed value}. \end{align} \]

The contradictory / competing hypothesis is the alternative hypothesis, written

\[ \begin{align} H_1: & \theta \text{ is } (\neq / > / <) \text{ some proposed value}. \end{align} \]

Once we have formed a null and alternative hypothesis:

\[ \begin{align} H_0: & \theta \text{ is } (= / \leq / \geq) \text{ some proposed value}\\ H_1: & \theta \text{ is } (\neq / > / <) \text{ some proposed value} \end{align} \]

we **use the sample data** to **consider how likely or unlikely it was to observe such data with the proposed parameter**.
- If the **sample doesn't seem to fit the proposed parameter value**, we **deem the null hypothesis unlikely**.

If the **null hypothesis is sufficiently unlikely**, **we reject the null hypothesis in favor of the alternative hypothesis**.

However, if the evidence (the sample) doesn't contradict the null hypothesis, we tentatively keep this assumption.
- This **has not proven this assumption**; it has only said that the **hypothesis is not unlikely given our evidence**.

- Our decision process is based on the random outcome of the test statistic, so that even if an outcome seems unlikely, we may come to a false conclusion based on observing a low-probability event.
- There are **two possible wrong conclusions** we can make in this decision process:
- we may **reject the null hypothesis when it is actually true**;
- we may **fail to reject the null hypothesis when it is actually false**.
- A schematic of this hypothesis testing decision process is given on the right:

Type I Error

Rejecting the null hypothesis \( H_0 \) when it is true is defined as a type I error.

Type II Error

Failing to reject the null hypothesis \( H_0 \) when it is false is defined as a type II error.

Courtesy of Montgomery & Runger, *Applied Statistics and Probability for Engineers*, 7th edition

- Based on these two possible errors, we can define different probabilistic criteria that will attempt to handle these risks of incorrect decisions.
- It turns out that the rate of failure of confidence intervals is the same as the probability of type I error.

Probability of Type I Error

\[ \alpha = P(\text{type I error}) = P(\text{reject }H_0\text{ when }H_0\text{ is true}) \]
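This identity can also be checked by simulation: when \( H_0 \) is true, a level-\( \alpha \) t test should reject in about \( \alpha\times 100\% \) of replicates. A minimal sketch, where the sample size, null mean, and replicate count are illustrative assumptions:

```r
# When H0 is true, the rejection rate of the t test approximates alpha.
set.seed(3)
alpha <- 0.05
rejections <- replicate(5000, {
  x <- rnorm(20, mean = 0, sd = 1)     # H0: mu = 0 is actually true here
  t.test(x, mu = 0)$p.value < alpha    # TRUE when we (wrongly) reject
})
mean(rejections)  # should be close to alpha = 0.05
```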

- In evaluating a hypothesis-testing procedure, it is also important to examine the probability of a type II error, which we denote by \( \beta \).

Probability of Type II Error

\[ \beta = P(\text{type II error}) = P(\text{failing to reject }H_0\text{ when }H_0\text{ is false}). \] The complementary probability, \( 1- \beta \), is called the power of the hypothesis test.

To calculate \( \beta \), we must have a **specific alternative hypothesis**;
- that is, **we must have a particular value** of \( \mu \).

This is because the unknown, true alternative hypothesis for \( \mu \) will determine the sampling distribution for \( \overline{X} \).

From this sampling distribution, we compute the appropriate probability for failing to reject our null hypothesis, given the true distribution with respect to the true alternative.
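In the simpler case of a two-sided z test with known \( \sigma \), this computation has a standard closed form: under a true mean \( \tilde{\mu} + \delta \), \( \beta = \Phi\left(z_{1-\frac{\alpha}{2}} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\left(-z_{1-\frac{\alpha}{2}} - \frac{\delta\sqrt{n}}{\sigma}\right) \). A sketch in R, where \( \alpha \), \( \sigma \), \( n \), and \( \delta \) are illustrative assumptions:

```r
# Type II error for a two-sided z test with known sigma.
# alpha, sigma, n, and delta (true shift from the null) are illustrative.
alpha <- 0.05; sigma <- 2; n <- 25; delta <- 1
z <- qnorm(1 - alpha / 2)
shift <- delta * sqrt(n) / sigma              # mean of the test statistic under H1
beta <- pnorm(z - shift) - pnorm(-z - shift)  # P(fail to reject | H0 false)
power <- 1 - beta
c(beta = beta, power = power)
```

For the t test, where \( S \) replaces \( \sigma \), no such simple formula exists, which is why we turn to `power.t.test()` below.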

Student’s t test can be used in R through the function `t.test()`, which will include the dual confidence interval.

Specifically, if we have a formal hypothesis test

\[ \begin{align} H_0:\mu = \tilde{\mu} & & H_1: \mu \neq \tilde{\mu}; \end{align} \] and if the variance \( \sigma^2 \) is also unknown;

then assuming the null, we write the acceptance region as

\[ \left( \tilde{\mu} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2} , \tilde{\mu} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\right). \]

If the sample mean \( \overline{X} \) lies outside of the acceptance region, i.e., in the critical region,

\[ \left(-\infty, \tilde{\mu} - \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}\right) \cup \left( \tilde{\mu} + \hat{\sigma}_\overline{X} t_\frac{\alpha}{2}, \infty\right), \]

- we reject the null hypothesis at \( \alpha \times 100\% \) significance.

Alternatively, if the sample mean lies within the acceptance region, we fail to reject the null hypothesis at \( \alpha\times 100\% \) significance.
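This decision rule can be written out directly in R. The sample and hypothesized mean below are illustrative assumptions; the same logic applies to any one-sample test:

```r
# Two-sided t test via the acceptance region; data and mu0 are illustrative.
set.seed(2)
x <- rnorm(30, mean = 10.4, sd = 1)   # simulated sample
mu0 <- 10; alpha <- 0.05              # hypothesized mean and significance level

n <- length(x)
se <- sd(x) / sqrt(n)
tcrit <- qt(1 - alpha / 2, df = n - 1)
acceptance <- c(mu0 - tcrit * se, mu0 + tcrit * se)

reject <- mean(x) < acceptance[1] || mean(x) > acceptance[2]
reject  # TRUE means the sample mean fell in the critical region
```

The decision agrees with comparing the `t.test()` p-value against \( \alpha \), since both express the same rejection rule.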

The sodium content of twenty 300-gram boxes of organic cornflakes was determined.

The data (in milligrams) are as follows:

```
sodium_sample <- c(131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72,
                   128.33, 128.24, 129.65, 130.14, 129.29, 128.71, 129.00,
                   129.39, 130.42, 129.53, 130.12, 129.78, 130.92)
```

Let's suppose we want to test the hypothesis,

\[ \begin{align} H_0: \mu = 130 & & H_1:\mu \neq 130; \end{align} \]

If we use `t.test()` directly, notice the output:

```
t.test(sodium_sample)
```

```
One Sample t-test
data: sodium_sample
t = 662.06, df = 19, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
129.3368 130.1572
sample estimates:
mean of x
129.747
```

- Rather, to set the correct null and alternative hypotheses, we write,

```
t.test(sodium_sample, mu=130, alternative="two.sided")
```

```
One Sample t-test
data: sodium_sample
t = -1.291, df = 19, p-value = 0.2122
alternative hypothesis: true mean is not equal to 130
95 percent confidence interval:
129.3368 130.1572
sample estimates:
mean of x
129.747
```

Notice that the above includes the test statistic \( t_0 = -1.291 \).

- This also lists the number of degrees of freedom \( df = n-1 = 19 \) for the t distribution.

Most importantly, this lists the p-value, \( \approx 0.2122 \).

If we take \( \alpha=0.05 \), a common convention, we see that \( P > \alpha \), such that **we fail to reject the null hypothesis of \( \mu = 130 \)**.
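We can reproduce these numbers by hand, which clarifies what `t.test()` computes: the statistic \( t_0 = \frac{\overline{x} - \tilde{\mu}}{S/\sqrt{n}} \) and the two-sided p-value \( 2\,P(t_{n-1} \leq -|t_0|) \):

```r
# Reproducing the t.test() output by hand for the sodium data.
sodium_sample <- c(131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72,
                   128.33, 128.24, 129.65, 130.14, 129.29, 128.71, 129.00,
                   129.39, 130.42, 129.53, 130.12, 129.78, 130.92)
n <- length(sodium_sample)
t0 <- (mean(sodium_sample) - 130) / (sd(sodium_sample) / sqrt(n))
p_value <- 2 * pt(-abs(t0), df = n - 1)   # two-sided p-value
round(c(t0 = t0, p_value = p_value), 4)   # t0 = -1.291, p_value = 0.2122
```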

Suppose we wanted to perform a hypothesis test to make sure the mean sodium is not too high;

- if we wanted to evaluate the one-sided hypothesis test

\[ \begin{align} H_0: \mu \leq 130 & & H_1:\mu >130, \end{align} \]

we would write in R

```
t.test(sodium_sample, mu=130, alternative="greater")
```

```
One Sample t-test
data: sodium_sample
t = -1.291, df = 19, p-value = 0.8939
alternative hypothesis: true mean is greater than 130
95 percent confidence interval:
129.4081 Inf
sample estimates:
mean of x
129.747
```

- Here, we once again fail to reject the null hypothesis at \( \alpha\times 100\% = 5\% \) significance, as \( P\approx 0.89 \).
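The one-sided p-value can likewise be computed by hand: for \( H_1: \mu > 130 \), it is \( P(t_{n-1} \geq t_0) \), the upper tail beyond the observed statistic:

```r
# Upper-tail p-value for H1: mu > 130, matching alternative="greater".
sodium_sample <- c(131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72,
                   128.33, 128.24, 129.65, 130.14, 129.29, 128.71, 129.00,
                   129.39, 130.42, 129.53, 130.12, 129.78, 130.92)
n <- length(sodium_sample)
t0 <- (mean(sodium_sample) - 130) / (sd(sodium_sample) / sqrt(n))
p_value <- pt(t0, df = n - 1, lower.tail = FALSE)  # P(T >= t0)
round(p_value, 4)  # 0.8939, matching the t.test() output above
```

Because the observed \( t_0 \) is negative, almost all of the upper tail lies beyond it, which is why the p-value is so large.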

Computing the power of a t test or the sample size necessary for a hypothesis test to reach a certain power is complicated to perform analytically, and is more practically done with technology.

There is a built-in feature in R that will compute either the power of a test, or the needed sample size to attain a power, with the t test.

The function `power.t.test()` takes the following arguments:

```
power.t.test(n, delta, sd, sig.level, power, alternative, type="one.sample")
```

where
- `n` is the sample size;
- `delta` is the difference between the assumed, but untrue, null hypothesis and the unknown, but assumed true, alternative hypothesis;
- `sd` is the sample standard deviation;
- `sig.level` is the value of \( \alpha \);
- `power` is the power of the test;
- `alternative` is the alternative hypothesis; and
- we need to specify `type="one.sample"` as above.

When we call `power.t.test()`,

```
power.t.test(n, delta, sd, sig.level, power, alternative, type="one.sample")
```

we will actually leave out one of `power` or `n` as an argument.

The argument that is left out, `power` or `n`, will be computed from the other arguments.

We will continue our example with the sodium sample, now evaluating the power of our earlier tests:

```
t.test(sodium_sample, mu=130, alternative="two.sided")
```

```
One Sample t-test
data: sodium_sample
t = -1.291, df = 19, p-value = 0.2122
alternative hypothesis: true mean is not equal to 130
95 percent confidence interval:
129.3368 130.1572
sample estimates:
mean of x
129.747
```

- In our sodium sample example, we had

```
s <- sd(sodium_sample)
n <- length(sodium_sample)
mu_null <- 130.0
```

- Suppose we have a specific value for the alternative hypothesis in mind, i.e.,

```
mu_alternative <- 130.5
```

and we wish to determine the power of the test to reject the false, null hypothesis.

We will leave the `power` argument blank in the function, but we need to calculate `delta`.

`delta` is given as the absolute difference between our false null hypothesis and the true alternative, i.e.,

```
delta <- abs(mu_null - mu_alternative)
delta
```

```
[1] 0.5
```

To calculate the power of the hypothesis test,

\[ \begin{align} H_0 : \mu = 130 & & H_1:\mu \neq 130 \end{align} \]

where we assume the true alternative hypothesis is \( H_1: \mu=130.5 \),

with a significance level of \( \alpha=0.05 \),

we can compute this at once with `power.t.test()` as:

```
power.t.test(n=n, delta=delta, sd=s, sig.level=0.05, power=NULL, type="one.sample")
```

```
One-sample t test power calculation
n = 20
delta = 0.5
sd = 0.8764288
sig.level = 0.05
power = 0.6775708
alternative = two.sided
```

Suppose we want to calculate power of the same type of hypothesis test, but with a different, one-sided alternative hypothesis.

- e.g.,

\[ \begin{align} H_0:\mu \leq 130 & & H_1 :\mu > 130. \end{align} \]

We specify this in the function as,

```
power.t.test(n=n, delta=delta, sd=s, alternative="one.sided" , sig.level=0.05, power=NULL, type="one.sample")
```

```
One-sample t test power calculation
n = 20
delta = 0.5
sd = 0.8764288
sig.level = 0.05
power = 0.7921742
alternative = one.sided
```

On the other hand, suppose we need to find the sample size necessary to meet a certain power with one of the earlier hypothesis tests.

E.g., we might try to reject the null if a true mean sodium content is actually 130.1 milligrams, with a power of the test equal to 0.75.

To do so, we now need to neglect the sample size argument `n` and supply the power argument `power`.

The needed arguments are assigned below:

```
s <- sd(sodium_sample)
mu_null <- 130.0
mu_alternative <- 130.1
delta <- abs(mu_null - mu_alternative)
pow <- 0.75
```

- We determine the appropriate sample size via

```
power.t.test(n=NULL, delta=delta, sd=s, power=pow, type="one.sample")
```

```
One-sample t test power calculation
n = 535.0307
delta = 0.1
sd = 0.8764288
sig.level = 0.05
power = 0.75
alternative = two.sided
```

- for the two-sided test, or for the one-sided test we use

```
power.t.test(n=NULL, delta=delta, sd=s, alternative="one.sided", power=pow, type="one.sample")
```

```
One-sample t test power calculation
n = 414.5589
delta = 0.1
sd = 0.8764288
sig.level = 0.05
power = 0.75
alternative = one.sided
```