# Confidence intervals and hypothesis testing Part III


## Outline

• The following topics will be covered in this lecture:
• Testing the variance of a normal population
• Testing for a difference in variances of two independent samples
• Testing for equal means of two independent samples
• Testing for normality
• Testing for general goodness of fit

## Testing $$\sigma^2$$ of a normal population

• We have seen now how to produce confidence intervals and hypothesis tests for the mean of a population, and how to calculate the power of the test.

• It is also interesting to test whether the population variance equals a hypothesized value $$\sigma_0^2$$.

• By rescaling $$S^2$$, one can construct confidence intervals and hypothesis tests based on a rv that follows a $$\chi^2_{n-1}$$ distribution.

• If $$X_1 ,\cdots, X_n$$ are i.i.d. normal rvs, then we have shown that,

\begin{align} \frac{\left(n-1\right)S^2}{\sigma^2} \sim \chi^2_{n-1} && \text{where} & & S^2 = \frac{1}{n-1}\sum_{i=1}^n \left(X_i - \overline{X}\right)^2 \end{align}

• Using an argument similar to the one used with Student's t-distribution, we can find the critical points of the $$\chi^2_{n-1}$$ distribution to compute the hypothesis test / confidence interval.
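• Concretely, inverting the pivotal quantity above gives the two-sided $$\left(1 - \alpha\right)\times 100\%$$ confidence interval for $$\sigma^2$$,

\begin{align} \left( \frac{\left(n-1\right)S^2}{\chi^2_{1-\alpha/2, n-1}}, \; \frac{\left(n-1\right)S^2}{\chi^2_{\alpha/2, n-1}} \right) \end{align}

where $$\chi^2_{q, n-1}$$ denotes the $$q$$-th quantile of the $$\chi^2_{n-1}$$ distribution.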

• To do so in R, we'll need to use the varTest() function from the EnvStats library.

### Testing $$\sigma^2$$ of a normal population

• We will generate 100 observations from a normal distribution with mean $$\mu = 1$$ and standard deviation $$\sigma = 1$$ as follows:

```r
set.seed(0)
require("EnvStats")
sample <- rnorm(n=100, mean=1, sd=1)
varTest(sample)
```


```
        Chi-Squared Test on Variance

data:  sample
Chi-Squared = 77.128, df = 99, p-value = 0.1014
alternative hypothesis: true variance is not equal to 1
95 percent confidence interval:
 0.600583 1.051349
sample estimates:
variance 
0.7790714 
```
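• As a sketch of what varTest() computes internally, the confidence limits and test statistic can be reproduced by hand from the sample variance and chi-square quantiles (using the same simulated sample as above; the doubling convention for the two-sided p-value is one common choice and is assumed here):

```r
set.seed(0)
sample <- rnorm(n=100, mean=1, sd=1)

n  <- length(sample)
s2 <- var(sample)                  # sample variance S^2

# Two-sided 95% confidence interval for sigma^2
alpha <- 0.05
lower <- (n - 1) * s2 / qchisq(1 - alpha/2, df = n - 1)
upper <- (n - 1) * s2 / qchisq(alpha/2, df = n - 1)

# Test statistic and p-value for H0: sigma^2 = 1
stat  <- (n - 1) * s2 / 1
p_val <- 2 * min(pchisq(stat, df = n - 1),
                 1 - pchisq(stat, df = n - 1))

c(lower = lower, upper = upper, statistic = stat, p.value = p_val)
```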

• Notice that even with a sample size of $$100$$, the p-value of $$0.1014$$ is fairly low, despite the fact that the null hypothesis is true by construction.

• However, because it is above $$5\%$$, we do not reject the null hypothesis that the variance equals $$1$$ at $$5\%$$ significance; this is the correct decision here, since rejecting a true null hypothesis would have been a type I error.

## Testing for a difference of variances

• Recall that we earlier defined the F-distribution in terms of a ratio of two $$\chi^2$$ random variables.

• This similarly gives us a means to test for a difference in variances between two independent random samples.

• Once again, modifying the argument appropriately, we can construct a hypothesis test for whether the ratio of the two variances equals one or differs from it.

• Consider the example data, sleep, a data frame with 20 observations on 3 variables:

• the amount of extra sleep after (possibly) taking a drug (extra);
• the group ID for control or treatment (group); and
• the patient ID (ID).
```r
str(sleep)
```

```
'data.frame':   20 obs. of  3 variables:
 $ extra: num  0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
 $ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ ID   : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
```

• We begin by subsetting the data according to the group number:

```r
sleep_grp_1 <- sleep[sleep$group == 1, 1]
```
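• As a minimal sketch of the F-test itself, base R's var.test() function (from the stats package, a swapped-in alternative to subsetting by hand) compares the two group variances directly via the formula interface:

```r
# F-test of H0: the variances of `extra` are equal in the two groups.
# The formula interface splits `extra` by the factor `group`; the output
# reports the F statistic (the ratio of sample variances), a p-value,
# and a confidence interval for the ratio of the true variances.
var.test(extra ~ group, data = sleep)
```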
```r
female_data <- diabetes[diabetes$gender=="female",]
```

### Testing for general goodness of fit

• Having subset the data as such, we can extract the weight vector from each subset:

```r
ks.test(male_data$weight, female_data$weight)
```

```
        Two-sample Kolmogorov-Smirnov test

data:  male_data$weight and female_data$weight
D = 0.14667, p-value = 0.02977
alternative hypothesis: two-sided
```

• In the above, we see that the Kolmogorov–Smirnov distance between the two empirical distributions is $$D = 0.14667$$.

• More particularly, the p-value is $$0.02977$$, so we reject the null hypothesis of identically distributed weights at $$5\%$$ significance.
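• As a self-contained sketch of the same idea (using simulated data rather than the diabetes set, since that data is assumed here), we can run the two-sample test on draws from two deliberately different distributions:

```r
# Two-sample Kolmogorov-Smirnov test on simulated data: one sample from
# N(0, 1) and one from N(0.5, 1), so the null hypothesis of identical
# distributions is false by construction.
set.seed(0)
x <- rnorm(100, mean = 0,   sd = 1)
y <- rnorm(100, mean = 0.5, sd = 1)

result <- ks.test(x, y)
result$statistic   # the KS distance D between the empirical CDFs
result$p.value     # a small p-value rejects identical distributions
```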