Parameters of probability distributions and the binomial distribution

03/03/2021



Outline

  • The following topics will be covered in this lecture:

    • Review of the mean, standard deviation and variance of a probability distribution
    • Binomial distribution
    • Parameters of the binomial distribution

Motivation

  • Our goal in this course is to use statistics from a small, representative sample to say something general about the larger, unobservable population or phenomenon.

  • Recall, the measures of the population are what we referred to as parameters.

  • Parameters are generally unknown and unknowable.

    • If we have a representative sample we can compute the sample mean.
    • The sample mean will almost surely not equal the population mean, due to the natural variation (sampling error) that occurs in any given sample.
  • Random variables and probability distributions give us the model for estimating population parameters.

  • Generally, we will have to be satisfied with estimates of the parameters that are uncertain, but also include measures of “how uncertain”.

Characteristics of data

Diagram of the percent of outcomes contained within each standard deviation of the mean
for a standard normal distribution.

Courtesy of M. W. Toews CC via Wikimedia Commons.

  • In statistics, we try to characterize data and populations by a number of the features that they exhibit.
  • For a single variable, the most common measures are:
    1. Center: A representative value that indicates where the middle of the data set is located.
    2. Spread: A measure of the amount that the data values vary around the center.
  • We saw last time how the mean and standard deviation are related quantities describing these features of sample data or a population.
  • The above figure represents the theoretic description of a normal population.
  • In particular:
    • 68% of the population lies within one standard deviation of the mean, $[\mu - \sigma, \mu + \sigma]$;
    • 95% of the population lies within two standard deviations of the mean, $[\mu - 2\sigma, \mu + 2\sigma]$;
    • 99.7% of the population lies within three standard deviations of the mean, $[\mu - 3\sigma, \mu + 3\sigma]$.
  • This is known as the empirical rule, which holds for all normal populations.
  • Sample data will tend to follow this, but not exactly, if the measurements come from a normal population.
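
The percentages in the empirical rule can be checked numerically. The following is a minimal sketch using only Python's standard library; it relies on the identity that a normal variable falls within k standard deviations of its mean with probability erf(k/√2). The helper name `normal_coverage` is hypothetical and not part of the lecture.

```python
import math


def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_coverage(k):.4f}")
```

The printed values round to the familiar 68%, 95% and 99.7%.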

Parameters of the binomial distribution

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • We saw earlier the following definitions for the mean and the standard deviation of a probability distribution:
    • Suppose we have a random variable X which assigns a numerical value to each outcome in the sample space S.
    • Suppose all values that X can attain are given by a collection $\{x_\alpha\}$ in the range $R$ of $X$.

  • Then the mean (or expected value) of the probability distribution is given by $\mu = \sum_{x_\alpha \in R} x_\alpha P(X = x_\alpha)$.
  • The standard deviation of the probability distribution is given by $\sigma = \sqrt{\sum_{x_\alpha \in R} P(X = x_\alpha)\,(x_\alpha - \mu)^2}$.
  • These formulas hold for all probability distributions (with a slight modification when the variable is continuous by using calculus).
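
As an illustration of the two definitions for a discrete random variable, the sketch below computes the mean and standard deviation of a fair six-sided die. The die example and the helper name `mean_and_sd` are assumptions chosen for illustration; they do not come from the lecture.

```python
import math


def mean_and_sd(pmf):
    """Return (mean, standard deviation) of a discrete distribution.

    pmf maps each attainable value x to its probability P(X = x).
    """
    mu = sum(x * prob for x, prob in pmf.items())
    variance = sum(prob * (x - mu) ** 2 for x, prob in pmf.items())
    return mu, math.sqrt(variance)


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
mu, sigma = mean_and_sd(die)
print(f"mean = {mu:.4f}, standard deviation = {sigma:.4f}")
```

For the die, the mean is 3.5 and the variance is 35/12, so the standard deviation is about 1.708.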

Parameters of the binomial distribution continued

Random variables are the numerical measure of the outcome of a random process. Public domain via Wikimedia Commons

  • The binomial distribution has a special structure, so its parameters have simple closed forms.
  • For the binomial distribution the mean is given as $\mu = n \times p$.
  • For the binomial distribution the variance is given as $\sigma^2 = n \times p \times q$, where $q = 1 - p$.
  • For the binomial distribution the standard deviation is given as $\sigma = \sqrt{n \times p \times q}$.
  • Q: what are μ and σ for the binomial distribution for 20 trials and probability of success p=0.5?
    • Notice that these are given as $\mu = 20 \times 0.5 = 10$ and $\sigma = \sqrt{20 \times 0.5 \times 0.5} = \sqrt{5} \approx 2.236$.
  • Q: what are μ and σ for the binomial distribution for 40 trials and probability of success p=0.5?
    • Notice that these are given as $\mu = 40 \times 0.5 = 20$ and $\sigma = \sqrt{40 \times 0.5 \times 0.5} = \sqrt{10} \approx 3.162$.
  • Q: what are μ and σ for the binomial distribution for 20 trials and probability of success p=0.7?
    • Notice that these are given as $\mu = 20 \times 0.7 = 14$ and $\sigma = \sqrt{20 \times 0.7 \times 0.3} = \sqrt{4.2} \approx 2.049$.
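
The three questions above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_parameters` is a hypothetical helper introduced here for illustration.

```python
import math


def binomial_parameters(n, p):
    """Return (mean, variance, standard deviation) for n trials with success probability p."""
    q = 1 - p  # probability of failure
    mean = n * p
    variance = n * p * q
    return mean, variance, math.sqrt(variance)


for n, p in [(20, 0.5), (40, 0.5), (20, 0.7)]:
    mean, variance, sd = binomial_parameters(n, p)
    print(f"n={n}, p={p}: mean={mean:.1f}, variance={variance:.1f}, sd={sd:.3f}")
```

Note that doubling the number of trials from 20 to 40 doubles the mean and the variance, but multiplies the standard deviation only by √2.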

Review of the binomial distribution

  • The binomial distribution is a key distribution that gives us a way to model a wide range of experiments probabilistically.
  • This applies when we run an experiment with two possible outcomes S="success" and F="failure", where $P(S) = p$ and $P(F) = 1 - P(S) = q$.
  • When we run exactly n total trials of the above experiment, assuming that:
    1. each trial is independent; and
    2. P(S)=p for every trial.
  • We can model the probability of a particular number of successes $x_\alpha$ like a (possibly) unfair coin flipping experiment.
  • We model the probability of exactly $x_\alpha$ successful trials as $P(X = x_\alpha) = \underbrace{\frac{n!}{(n - x_\alpha)!\, x_\alpha!}}_{(1)} \times \underbrace{p^{x_\alpha}}_{(2)} \times \underbrace{q^{(n - x_\alpha)}}_{(3)}$ where:
    1. Total number of ways to find exactly $x_\alpha$ successful trials out of $n$ total trials;
    2. Probability of $x_\alpha$ independent successful trials;
    3. Probability of $n - x_\alpha$ independent failure trials.
  • The special structure of this distribution also allows us to compute the mean and standard deviation directly as $\mu = n \times p$ and $\sigma = \sqrt{n \times p \times q}$.
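
To see that the shortcut formulas agree with the general definitions, the sketch below builds the binomial probabilities from the three factors described above (number of arrangements, probability of the successes, probability of the failures) and then computes the mean and standard deviation by direct summation. The choice n = 20, p = 0.7 and the helper name `binomial_pmf` are assumptions made for illustration; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    q = 1 - p
    ways = math.comb(n, x)  # (1) arrangements of x successes among n trials
    return ways * p ** x * q ** (n - x)  # times (2) successes and (3) failures


n, p = 20, 0.7
pmf = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# General definitions: sum over every attainable value of X.
total = sum(pmf.values())
mu = sum(x * prob for x, prob in pmf.items())
sigma = math.sqrt(sum(prob * (x - mu) ** 2 for x, prob in pmf.items()))

print(f"probabilities sum to {total:.6f}")
print(f"mean by summation  = {mu:.4f}, n*p         = {n * p:.4f}")
print(f"sd by summation    = {sigma:.4f}, sqrt(n*p*q) = {math.sqrt(n * p * (1 - p)):.4f}")
```

Up to floating-point rounding, the summed values match n × p = 14 and √(n × p × q) = √4.2.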