04/09/2020



- The following topics will be covered in this lecture:
- Review of point estimates, confidence intervals and critical values
- Margin of error
- Estimating a population proportion
- Finding the right sample size
- Estimating a population mean
- The Student t distribution
- Confidence intervals for the mean
- The special case when \( \sigma \) is known
- Finding the right sample size

Courtesy of Mario Triola, *Essentials of Statistics*, 6th edition

- In the last lecture, we saw how a sample proportion generates a random variable.
- That is, we take a sample of a population and compute the proportion of the sample for which some statement is true.
- Suppose we want to **replicate this sampling procedure** **infinitely many times**.
- It is impossible to replicate the sampling infinitely many times, but we can **construct a probabilistic model for this replication process** with a **probability distribution**.

- Formally, we will **define \( \hat{p} \)** to be the **random variable** equal to the **proportion derived from a random sample of \( n \) observations**.
- For **each replication**, **\( \hat{p} \) attains a different value based on chance**.
- Then, for **random, independent samples**, **\( \hat{p} \)** tends to be **normally distributed** about **\( p \)**.
- We can thus use the value of **\( \hat{p} \)** and the **distribution of \( \hat{p} \)** to estimate **\( p \)** and how close we are to it.
- We know that **\( \hat{p} \)** is an **unbiased estimator** of the **true population proportion \( p \)**.
- That is, over infinitely many resamplings, the **expected value** **(mean of the probability distribution)** of **\( \hat{p} \)** is equal to **\( p \)**.
- When we have a **specific sample data set**, and a specific value for **\( \hat{p} \)** associated to it, **\( \hat{p} \)** is called a **point estimate** for **\( p \)**.
- The **measure of “how close”** we think this is to the true value is called a **confidence interval**.
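The claim that \( \hat{p} \) is an unbiased estimator of \( p \) can be illustrated with a small simulation. The sketch below (the true proportion \( p = 0.30 \), the sample size \( n = 200 \), and the number of replications are assumptions chosen for illustration, not values from the lecture) repeatedly draws samples and averages the resulting \( \hat{p} \) values:

```python
import random

random.seed(42)

p = 0.30          # hypothetical true population proportion (assumed for illustration)
n = 200           # observations per sample
replications = 10_000

# Each replication draws a fresh sample of n observations and records the
# sample proportion p_hat; across replications, p_hat behaves as a random variable.
p_hats = []
for _ in range(replications):
    successes = sum(1 for _ in range(n) if random.random() < p)
    p_hats.append(successes / n)

# The average of the p_hat values approximates E[p_hat], which should be
# close to p since p_hat is an unbiased estimator of p.
mean_p_hat = sum(p_hats) / replications
print(round(mean_p_hat, 3))
```

With this many replications, the average of the simulated \( \hat{p} \) values lands very close to the true \( p \), consistent with unbiasedness.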


- Let’s recall how we constructed **confidence intervals (CIs)** in the last lecture.
- Suppose that we want to estimate the **true proportion \( p \)** with some **level of confidence**:
- if we replicated the sampling procedure infinitely many times, the **fraction of replications** in which we found **\( p \)** in our confidence interval would be **equal to the level of confidence**.
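This coverage interpretation can be checked empirically: draw many samples, build an interval around each \( \hat{p} \), and count how often the interval captures the true \( p \). The sketch below assumes the standard interval \( \hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \) (a construction covered later under the margin of error), and the values of \( p \), \( n \), and the replication count are chosen for illustration:

```python
import math
import random
from statistics import NormalDist

random.seed(1)

p = 0.40              # hypothetical true proportion (assumed for illustration)
n = 500               # observations per sample
confidence = 0.95
# Critical value z_{alpha/2} for the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

replications = 2_000
covered = 0
for _ in range(replications):
    successes = sum(1 for _ in range(n) if random.random() < p)
    p_hat = successes / n
    # Interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half_width <= p <= p_hat + half_width:
        covered += 1

print(covered / replications)   # fraction of intervals that capture p
```

The printed fraction should sit near the chosen confidence level of 0.95, matching the infinite-replication interpretation above.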

- Let’s take an example **confidence level of \( 95\% \)**; this corresponds to a **rate of failure of \( 5\% \)** over infinitely many replications.
- Generally, we will write the confidence level as \[ (1 - \alpha) \times 100\% \] so that we can associate this confidence level with its rate of failure \( \alpha \).
- Recall that we earlier studied ways to **compute the critical value associated to some \( \alpha \)** for the normal distribution.
- We will use the same principle here to find **how wide the interval around \( p \)** must be for **\( \hat{p} \)** to lie in this interval \( (1-\alpha)\times 100\% \) of the time.

- We want to find the critical value \( z_\frac{\alpha}{2} \) for which:
- \( (1-\frac{\alpha}{2})\times 100\% \) of the area under the normal density lies to the left of \( z_\frac{\alpha}{2} \); and
- \( (1-\frac{\alpha}{2})\times 100\% \) of the area under the normal density lies to the right of \( -z_\frac{\alpha}{2} \).
- Put together, \( (1-\alpha)\times 100\% \) of values lie within \( [-z_\frac{\alpha}{2},z_\frac{\alpha}{2}] \).
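The critical value \( z_\frac{\alpha}{2} \) can be computed directly as the point with \( (1-\frac{\alpha}{2})\times 100\% \) of the standard normal area to its left, i.e. as an inverse-CDF evaluation. A minimal sketch using Python’s standard library (the helper name `critical_value` is ours, not the lecture’s):

```python
from statistics import NormalDist

def critical_value(confidence_level):
    """Return z_{alpha/2} for a given confidence level (1 - alpha)."""
    alpha = 1 - confidence_level
    # Inverse CDF of the standard normal at 1 - alpha/2: the point with
    # (1 - alpha/2) * 100% of the area to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)

print(f"{critical_value(0.95):.3f}")   # 1.960
print(f"{critical_value(0.90):.3f}")   # 1.645
print(f"{critical_value(0.99):.3f}")   # 2.576
```

By symmetry of the normal density, the same \( (1-\frac{\alpha}{2})\times 100\% \) of the area lies to the right of \( -z_\frac{\alpha}{2} \), so \( (1-\alpha)\times 100\% \) of the area falls in \( [-z_\frac{\alpha}{2}, z_\frac{\alpha}{2}] \).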