04/14/2020

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:

This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

- The following topics will be covered in this lecture:
- A quick discussion of confidence intervals for the variance
- Tests of significance
- The null hypothesis
- The alternative hypothesis
- The process of hypothesis testing
- Significance levels versus confidence levels
- Test statistics
- P-values
- Critical values
- Drawing conclusions
- Type I and type II errors

Courtesy of Mario Triola, *Essentials of Statistics*, 6th edition

- In the last lecture, we covered how to estimate a **population proportion \( p \)** and a **population mean \( \mu \)**.
- In both cases, a **sample statistic generates a “point estimate”** as a kind of “best guess” given a certain collection of data.
- Likewise, we needed a **“confidence interval”** to quantify **how uncertain** this best guess was, and to give **a range of other plausible values for the parameter**.
- In both cases, our confidence interval **needed to use some estimate of the standard deviation** of the **population** to **estimate the standard error** of the **sampling distribution**.
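As a concrete refresher of the procedure above, the steps can be sketched in Python. This is only an illustration, not course material: the sample values below are made up, and `scipy` is assumed to be available for the student t critical value.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data, for illustration only
sample = np.array([4.8, 5.1, 5.6, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2, 5.5])
n = len(sample)

x_bar = sample.mean()       # point estimate of the population mean mu
s = sample.std(ddof=1)      # sample standard deviation, estimating sigma
se = s / np.sqrt(n)         # estimated standard error of x-bar

# 95% confidence interval using the student t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (x_bar - t_crit * se, x_bar + t_crit * se)
print(ci)
```

The key point for what follows is the middle step: because \( \sigma \) is unknown, the standard error must itself be estimated from \( s \).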

- The **standard error** tells us how the **sample statistic** varies around the **true parameter** **under replication of samples**.
- When estimating the mean without knowledge of **\( \sigma \)**, we used the sample standard deviation **\( s \)** to estimate the **standard error \( \sigma_\overline{x} \)**.
- We know that **\( s^2 \)** is the best, unbiased estimator for **\( \sigma^2 \)**, and although **\( s \)** is a biased estimator for **\( \sigma \)**, it is still usually the “best” option in some sense.
- A more complicated question is the following: **how do we produce confidence intervals for \( \sigma \)** that take into account the uncertainty in our sample-based estimates of this parameter?
- This is especially important because the **sample variances** are **distributed right-skewed** around the **true population variance**.
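Both claims above, that \( s^2 \) is unbiased but its distribution is right-skewed, can be checked with a small simulation. This is a sketch for intuition only; the population (normal with \( \sigma = 2 \)) and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                 # true population standard deviation, so sigma^2 = 4
n, reps = 10, 100_000       # sample size and number of replicated samples

# Draw many samples of size n and compute the sample variance of each
samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)

print(s2.mean())      # close to 4: s^2 is unbiased for sigma^2
print(np.median(s2))  # noticeably below 4: the distribution is right-skewed
```

The mean of the simulated sample variances sits essentially on top of \( \sigma^2 \), while the median falls below it, the signature of a right-skewed distribution.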

Courtesy of Mario Triola, *Essentials of Statistics*, 6th edition

- Because this is a more complicated topic and goes slightly beyond the overall scope of the course, **this material will not have homework assignments or be tested**.
- The purpose of the first part of this lecture is to give exposure to some advanced topics that will be useful for future work with statistical methods.
- The first advanced topic we will need to introduce is a very non-normal probability distribution, the **“chi-square” distribution**.
- Usually, this is denoted \( \chi^2(k) \), where \( \chi \) is the Greek letter “chi”.

- The value **\( k \) corresponds to the number of “degrees of freedom”**, like the degrees of freedom of the student t distribution, and will be introduced shortly.
- Before we introduce the \( \chi^2 \) distribution formally, we want to note a few qualitative features of the distribution:
- If \( x \) is a random variable that behaves like \( \chi^2 \), **\( x \) will only take on nonnegative values**, \[ x \geq 0 \] over any realization.
- The distribution of values \( x \) under \( \chi^2 \) is **right-skewed**, i.e., most values are concentrated to the left near zero, but **extremely large values occur with much higher frequency than with a normal distribution**.
- Interestingly, despite these differences from the normal distribution, \( \chi^2 \) is also closely related to the normal.
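As a preview of that relationship (a sketch, not course material): if \( z_1, \dots, z_k \) are independent standard normal random variables, then \( z_1^2 + \cdots + z_k^2 \) follows a \( \chi^2(k) \) distribution. A quick simulation shows that such sums exhibit exactly the qualitative features listed above, with \( k = 5 \) chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(1)
k, reps = 5, 100_000

# Sum k squared standard normals to obtain chi-square(k) draws
z = rng.standard_normal(size=(reps, k))
x = (z ** 2).sum(axis=1)

print(x.min() >= 0)              # True: chi-square values are nonnegative
print(x.mean())                  # near k = 5, the mean of chi-square(k)
print(x.mean() > np.median(x))   # True: mean exceeds median, i.e., right-skewed
```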