Probability distributions part I

03/05/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.

Outline

  • The following topics will be covered in this lecture:
    • Random variables
    • Distributions
    • Parameters versus statistics
    • Expected values
    • Significant values

Motivation

Book chapter flow chart.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • So far we have learned two sets of skills
    1. Summary statistics – used for analyzing samples; and
    2. Probability – used to analyze complex events abstractly.
  • Our goal is to use statistics from small, representative samples to say something general about the larger, unobservable population.
    • Recall, the measures of the population are what we referred to as parameters.
  • Parameters are generally unknown and unknowable.
    • For example, the mean age of every adult living in the United States is a parameter for the adult population of the USA.
    • We cannot possibly know this value exactly as there are people who cannot be surveyed and / or don’t have accurate records.
    • If we have a representative sample we can compute the sample mean.
    • The sample mean will almost surely not equal the population mean, due to the natural variation (sampling error) that occurs in any given sample.
    • However, if we have a good probabilistic model for the ages of adults, we can use the sample statistic to estimate the general, unknown population parameter.

Motivation continued

Book chapter flow chart.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • The flow chart at the left describes how these different tools fit together now.
  • Coin flipping is a process that we can:
    1. take samples of; and
    2. build a probabilistic model for.
  • Let us define \( x \) to be a variable that represents the number of heads that arise from two coin flips.
  • Each time we take a sample of two coin flips, \( x \) might have a different result.
    • Therefore, we call \( x \) a random variable that depends on the particular sample outcome.
  • In the top-horizontal box, we see the results from an experiment where a coin is flipped twice;
    • the experiment is repeated \( 100 \) times and the frequency with which \( x \) takes each value,
      1. \( x=0 \) : occurs \( 27 \) times
      2. \( x=1 \) : occurs \( 56 \) times
      3. \( x=2 \) : occurs \( 17 \) times
    • is recorded in a frequency distribution.
    • Using the frequency distribution, we can compute the mean of \( x \) as \( \overline{x}=.9 \) and the standard deviation of \( x \), \( s=0.7 \).
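As a quick check, the sample statistics above can be recomputed from the frequency table in a few lines of Python (a sketch; the frequencies 27, 56, and 17 are taken from the table above):

```python
# Sample mean and sample standard deviation of x (number of heads in two
# coin flips) from a frequency distribution, using the n - 1 denominator
# appropriate for sample statistics.
from math import sqrt

freqs = {0: 27, 1: 56, 2: 17}   # x value -> observed frequency
n = sum(freqs.values())          # 100 repetitions of the experiment

x_bar = sum(x * f for x, f in freqs.items()) / n
s = sqrt(sum(f * (x - x_bar) ** 2 for x, f in freqs.items()) / (n - 1))

print(round(x_bar, 1))  # 0.9
print(round(s, 1))      # 0.7
```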

Motivation continued

Book chapter flow chart.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • In the lower-horizontal box, we see how we construct an abstract, probabilistic model for \( x \).
  • In this case, we can compute exactly all the ways that:
    1. \( x=0 \) relative to all possible outcomes;
    2. \( x=1 \) relative to all possible outcomes;
    3. \( x=2 \) relative to all possible outcomes.
  • We know that
    • \( x=0 \) occurs only as \( \{T,T\} \);
    • \( x=1 \) occurs as \( \{T,H\} \) and \( \{H,T\} \); and
    • \( x=2 \) occurs only as \( \{H,H\} \).
  • Therefore, out of four possible outcomes we have that:
    1. \( P(0)=0.25 \)
    2. \( P(1)=0.5 \)
    3. \( P(2)=0.25 \)
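The counting argument above can be reproduced programmatically; a small Python sketch that enumerates all four equally likely outcomes and tallies the number of heads:

```python
# Enumerate all equally likely outcomes of two coin flips and compute the
# exact probability that x (the number of heads) takes each value.
from itertools import product
from fractions import Fraction

outcomes = list(product("HT", repeat=2))   # HH, HT, TH, TT
counts = {0: 0, 1: 0, 2: 0}
for outcome in outcomes:
    counts[outcome.count("H")] += 1

probs = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}
print(probs[0], probs[1], probs[2])  # 1/4 1/2 1/4
```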

Motivation continued

Book chapter flow chart.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • The box on the right then shows what we will call a probability distribution.
    • Instead of listing frequencies with values of \( x \), we will list the probabilities.
  • The probability distribution will be the theoretical, probabilistic model for how \( x \) behaves over the population.
  • Combining probabilities and sample statistics, we can estimate the true population mean \( \mu \) and true population standard deviation \( \sigma \).
  • Note: in the table it says that we “find” the parameters;
    • we can only “find” the parameters exactly in very simple examples like games of chance.
    • Generally, we will have to be satisfied with estimates of the parameters that are uncertain, but also include measures of “how uncertain”.

Random variables

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • The first concept that we will need to develop is the random variable.
  • Prototypically, we can consider the coin flipping example from the motivation:
    • \( x \) is the number of heads in two coin flips.
  • Every time we repeat two coin flips \( x \) can take a different value due to many possible factors:
    • how much force we apply in the flip;
    • air pressure;
    • wind speed;
    • etc…
  • The result is so sensitive to these factors, which are beyond our ability to control, that we consider it to be determined by chance.
  • Before we flip the coin twice, the value of \( x \) has yet-to-be determined.
  • After we flip the coin twice, the value of \( x \) is fixed and possibly known.
  • Formally we will define:
    • Random variable – a variable that has a single numerical value, determined by chance, for each outcome of a procedure.

Random variables continued

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

  • Suppose we are considering our sample space \( \mathbf{S} \) of all possible outcomes of a random process.
  • Then for any particular outcome of the process,
    • e.g., for the coin flips one outcome is \( \{H,H\} \),
  • mathematically the random variable \( x \) takes the outcome to the numerical value \( x=2 \) in the range \( \mathbf{R} \).
  • Note: \( x \) must always take a numerical value.
  • Because a random variable takes a numerical value (not categorical), we must consider the units that \( x \) takes:
    • Discrete random variable – these take numerical values that are in counting units.
      • In particular, the unit of \( x \) cannot be arbitrarily sub-divided.
        • We can think of “the number of heads in two coin flips” as measured in counting units because \( 1.45 \) heads does not make sense.
      • However, the values \( x \) takes don’t strictly need to be whole numbers;
        • the units just cannot be arbitrarily sub-divided.
      • The scale of units for \( x \) can be finite or infinite depending on the problem.

Random variables continued

Random variables are the numerical measure of the outcome of a random process.

Courtesy of Ania Panorska CC

    • Continuous random variable – these take numerical values that are in continuous units.
      • The units of \( x \) can be arbitrarily sub-divided and \( x \) can take any value in the sub-divided units.
      • Necessarily, \( x \) can take infinitely many values when it is continuous.
        • A good example to think of is if \( x \) is the daily high temperature in Reno in degrees Celsius.
        • If we had a sufficiently accurate thermometer, we could measure \( x \) to an arbitrary decimal place and it would make sense.
        • \( x \) thus takes today’s weather from the outcome space and gives us a number in a continuous unit of measurement.

Probability distributions

Probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Given a random variable, our method for analyzing its behavior is typically through a probability “distribution”.
  • Probability distribution – this is a description that gives the probability for each possible value of the random variable.
    • A probability distribution can thus be considered a complete description of the random variable.
      • For any possible value that \( x \) might attain given any possible outcome, we know with what probability this will occur.
    • It is often expressed in the format of a table, formula, or graph.
  • We see that the table above is a probability distribution as this gives every possible value for \( x \) its associated probability.
  • Notice if we consider the sum of \( P(x=x_\alpha) \) over all possible \( x_\alpha \) in the range of \( x \), \( \mathbf{R} \), \[ \sum_{x_\alpha\in \mathbf{R}}P(x=x_\alpha) = 1. \]
    • In fact, this holds for any \( x \) and its associated distribution – intuitively, consider \[ P(x=0 \text{ or } x=1 \text{ or } x=2) = 1 \] because this is all possible values that \( x \) can attain.
    • However, all \( x=0 \), \( x=1 \) and \( x=2 \) are all disjoint so that, \[ P(x=0 \text{ or } x=1 \text{ or } x=2) = P(x=0) + P(x=1) + P(x=2) = 1 \]
    • The same intuition can be used for infinite ranges when we use calculus to define this more formally.

Probability distributions continued

Histogram of probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We can graphically represent the probability distribution with a histogram similarly to how we represent a relative frequency distribution.
  • Notice that all values of the probability distribution satisfy \[ 0 \leq P(x=x_\alpha) \leq 1 \] for any value \( x_\alpha \) that \( x \) can take.
    • This is similar to a relative frequency distribution,
    • each \( P(x=x_\alpha) \) represents a proportion of all possible ways \( x \) can equal \( x_\alpha \) relative to all possible outcomes.
  • On the horizontal axis, the centers of the rectangles are at the attainable values for \( x \).
    • The width of each rectangle is also equal to \( 1 \).
  • Therefore, if we take the area of the rectangle corresponding to some value \( x_\alpha \), we have, \[ \begin{align} \text{Area of Rectangle }x_\alpha &= P(x=x_\alpha) \times 1\\ &= P(x=x_\alpha). \end{align} \]
  • This says that the histogram is an identical representation of the probability distribution.

Probability distributions continued

Histogram of probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall now the conditions we have for a probability distribution:
    1. We have a random variable \( x \) which takes an outcome in the sample space \( \mathbf{S} \) of a process to a numerical value in its range \( \mathbf{R} \).
    2. For each value \( x_\alpha \) that \( x \) can attain in its range \( \mathbf{R} \), the distribution assigns a probability \( P(x=x_\alpha) \).
    3. For each \( x_\alpha \) in the range, \[ 0\leq P(x=x_\alpha) \leq 1, \] and \[ \sum_{x_\alpha \in \mathbf{R}} P(x=x_\alpha) =1 \]
  • In the above table, hiring managers were asked to identify common mistakes during interviews.
  • Discuss with a neighbor: based on the criteria for a probability distribution, does the table above represent a probability distribution? Why or why not?
    • There are two major issues with the above:
      1. The variable \( x \) is categorical, not numerical.
      2. The sum of the probabilities is greater than \( 1 \).
  • Therefore, the above is not a probability distribution.

The mean of probability distributions

Probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall now that frequency distributions are derived from samples, and therefore the measures of the samples are statistics.
    • E.g., for a random variable \( x \), we can compute the sample mean \( \overline{x} \) from the frequency distribution.
  • On the other hand, a probability distribution represents the entire population, where the population may be abstract.
    • E.g., for the two coin flips, the probability distribution for \( x \) represents the relative frequency of outcomes over the population of all possible experiments.
    • Therefore, if we compute the mean of \( x \) for the probability distribution, we have the population parameter \( \mu \).
  • To compute the mean of the probability distribution, we follow a formula like the mean of a frequency distribution.
    • Let \( \{x_\alpha\} \) be the collection of all possible values for \( x \) in its range \( \mathbf{R} \).
      • For a table as above, this corresponds to all row values in the left-hand-side.
    • Let \( \{P(x=x_\alpha)\} \) be all the associated probabilities for \( x \) over its range of values \( \mathbf{R} \).
      • For a table as above, this corresponds to all row values in the right-hand-side.
    • Then the mean of the probability distribution is given, \[ \mu = \sum_{x_\alpha \in \mathbf{R}} x_\alpha P(x=x_\alpha) \]
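In Python, the sum above is a one-liner over the table's value–probability pairs (a sketch using the two-coin-flip distribution):

```python
# Mean of the probability distribution for two coin flips:
# mu = sum of x_alpha * P(x = x_alpha) over the range of x.
dist = {0: 0.25, 1: 0.5, 2: 0.25}         # x value -> P(x = x_alpha)
mu = sum(x * p for x, p in dist.items())
print(mu)  # 1.0
```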

The mean of probability distributions continued

  • Notice that the mean of the probability distribution \[ \mu = \sum_{x_\alpha \in \mathbf{R}} x_\alpha P(x=x_\alpha) \] is really identical to the formula for the mean of a frequency distribution.
    • Suppose there are \( N \) total possibilities for the outcome of our experiment;
    • suppose for each \( x_\alpha \) in the range there are \( f_\alpha \) total ways that \( x \) can attain the value \( x_\alpha \).
    • If we look at the formula for the mean of a frequency distribution, we find \[ \begin{align} \frac{ \sum_{x_\alpha \in \mathbf{R}} x_\alpha \times f_\alpha}{\sum_{x_\alpha \in \mathbf{R}} f_\alpha}= \frac{ \sum_{x_\alpha \in \mathbf{R}} x_\alpha \times f_\alpha}{N} = \sum_{x_\alpha \in \mathbf{R}} x_\alpha \times \frac{ f_\alpha}{N} = \sum_{x_\alpha \in \mathbf{R}} x_\alpha P(x=x_\alpha) = \mu \end{align} \]
  • Therefore, the formula is really the same, but the interpretation is different because we are dealing with population values.
  • Because of the difference in the interpretation, the mean of a probability distribution has a special name:
    • For a random variable \( x \) with probability distribution defined by the pairs of values \( \{x_\alpha\} \) and \( P(x=x_\alpha) \), the expected value of \( x \) is defined, \[ \mu = \sum_{x_\alpha \in \mathbf{R}} x_\alpha P(x=x_\alpha). \]
    • We call the mean of the probability distribution the expected value, because it can be thought of as the theoretical mean if we repeated an experiment infinitely many times or sampled the entire population;
    • we would expect this value on average, relative to infinitely many experiments.
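The long-run-average interpretation can be illustrated by simulation; a hypothetical sketch in which many repetitions of the two-flip experiment give a sample mean close to \( \mu = 1 \):

```python
# Simulate many repetitions of "flip a coin twice, count heads" and
# compare the long-run sample mean to the expected value mu = 1.
import random

random.seed(0)  # fixed seed so the sketch is reproducible
N = 100_000
x_bar = sum(random.randint(0, 1) + random.randint(0, 1) for _ in range(N)) / N
print(abs(x_bar - 1.0) < 0.02)  # the sample mean is close to mu
```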

Example of the expected value

Probability distribution .

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We can understand the expected value concretely with games of chance in casinos.
  • We saw before that the probability of winning on a bet on the number \( 7 \) in roulette is \( \frac{1}{38} \).
  • Therefore, the probability of losing on a bet on the number \( 7 \) is \( \frac{37}{38} \).
  • For the bet on the number \( 7 \) our sample space includes all numbers on the roulette wheel as simple events, but we can reduce this to two compound events “Lose” and “Win”.
  • If \( x \) is the random variable equal to the net winnings on a \( 5 \) dollar bet, \( x \) takes numerical values in the two events as \[ \begin{align} \text{Lose}:\hspace{2mm} &x = -5 \\ \text{Win}:\hspace{2mm} &x= 175 \end{align} \]
  • Discuss with a neighbor: given the values above, what is the expected value of a \( 5 \) dollar bet in roulette?
    • We can compute the expected value of a \( 5 \) dollar bet on the number \( 7 \) as, \[ \begin{align} \mu &= -5 \times P(x= -5) + 175 \times P(x=175) \\ &= -5 \left(\frac{37}{38}\right) + 175 \left( \frac{1}{38}\right) \approx -0.26 \end{align} \]
    • This says that the expected net winnings over infinitely many bets is \( -0.26 \) dollars per bet.
    • Put another way, we can say that over many, many bets, the casino expects to gain \( 26 \) cents on average per bet.
    • Because this is the theoretical, population mean winnings for the casino, the casino will make money on large averages even when it pays out large occasionally.
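The roulette computation can be checked with exact fractions (a sketch; the net payoffs \( -5 \) and \( 175 \) come from the slide):

```python
# Expected net winnings of a $5 bet on a single number in roulette.
from fractions import Fraction

p_win = Fraction(1, 38)
p_lose = Fraction(37, 38)
mu = -5 * p_lose + 175 * p_win   # exact expected value: -5/19 dollars
print(round(float(mu), 2))       # -0.26
```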

The standard deviation and variance of probability distributions

  • Let’s recall the formula now for the standard deviation of a population with members \( \{x_i\}_{i=1}^N \) \[ \sigma = \sqrt{\frac{\sum_{i=1}^N \left(x_i - \mu\right)^2}{N}}, \] where we take a denominator of \( N \) for the population instead of \( N-1 \) as in samples.
  • Let’s suppose that the population members \( x_i \) equal values \( x_\alpha \) in the range \( \mathbf{R} \) with frequencies \( f_\alpha \).
  • If we re-write the above formula in terms of \( x_\alpha \) and \( f_\alpha \) we can say, \[ \begin{align} \sigma = \sqrt{\frac{\sum_{x_\alpha\in \mathbf{R}} f_\alpha\left(x_\alpha - \mu\right)^2}{N}} = \sqrt{\sum_{x_\alpha \in \mathbf{R}} \frac{f_\alpha}{N} \left(x_\alpha - \mu\right)^2} = \sqrt{\sum_{x_\alpha\in \mathbf{R}} P(x=x_\alpha) \left(x_\alpha - \mu\right)^2 }. \end{align} \]
  • We will denote, \[ \sigma = \sqrt{\sum_{x_\alpha\in \mathbf{R}} P(x=x_\alpha) \left(x_\alpha - \mu\right)^2 } \] the standard deviation of the probability distribution associated to the random variable \( x \).
  • That is to say that the population standard deviation \( \sigma \) is exactly the standard deviation of the probability distribution.
  • For infinite populations and ranges, we can use the same argument (with calculus) to show this holds in general.

The standard deviation and variance of probability distributions continued

  • Using the derivation from the last slide, \[ \sigma = \sqrt{\sum_{x_\alpha\in \mathbf{R}} P(x=x_\alpha) \left(x_\alpha - \mu\right)^2 }, \]
  • we can show directly that the variance of a probability distribution is given as, \[ \sigma^2 = \sum_{x_\alpha \in \mathbf{R}} P(x=x_\alpha) \left(x_\alpha - \mu\right)^2 . \]
  • We can also derive the alternative forms for the population standard deviation and variance in terms of the probability distribution as \[ \begin{align} \sigma &= \sqrt{\sum_{x_\alpha \in \mathbf{R}} x_\alpha^2 P(x=x_\alpha) - \mu^2 }, \\ \\ \sigma^2 &= \sum_{x_\alpha\in\mathbf{R}} x_\alpha^2 P(x=x_\alpha) - \mu^2 . \end{align} \]
  • This again just amounts to some algebraic manipulation and these forms are totally equivalent to the other forms.
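We can verify numerically that the direct form \( \sum P(x=x_\alpha)(x_\alpha-\mu)^2 \) and the computational form \( \sum x_\alpha^2 P(x=x_\alpha) - \mu^2 \) agree, using the two-coin-flip distribution (a sketch):

```python
# Check that the two variance formulas agree for the coin-flip table.
from math import sqrt

dist = {0: 0.25, 1: 0.5, 2: 0.25}
mu = sum(x * p for x, p in dist.items())

var_direct = sum(p * (x - mu) ** 2 for x, p in dist.items())
var_alt = sum(x ** 2 * p for x, p in dist.items()) - mu ** 2

print(var_direct, var_alt)         # both 0.5
print(round(sqrt(var_direct), 1))  # 0.7
```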

A full example of computing population parameters

Probability distribution for two coin flips with x number of heads.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let \( x \) once again be the random variable corresponding to the number of heads in two coin flips.
  • Recall our formulas for the mean and standard deviation of a probability distribution: \[ \begin{align} \mu &= \sum_{x_\alpha \in \mathbf{R}} x_\alpha P(x=x_\alpha) \\ \sigma &= \sqrt{\sum_{x_\alpha\in\mathbf{R}} P(x=x_\alpha) \left(x_\alpha - \mu\right)^2 } \end{align} \]
  • Discuss with a neighbor: what is the mean of the probability distribution above?
    • Using the formula we can show that, \[ \begin{align} \mu &= 0 \times P(x=0) + 1 \times P(x=1) + 2 \times P(x=2) \\ &= 0 + 0.5 + 0.5 = 1 \end{align} \]
  • Discuss with a neighbor: what is the standard deviation of the probability distribution above?
    • We will note that the deviations of the values in the range \( \mathbf{R} \) from the mean \( \mu \) are given as, \[ \begin{align}0 - \mu = -1 & & 1 - \mu = 0 & & 2 - \mu = 1 \end{align}. \]
    • Therefore, we find that, \[ \begin{align} \sigma= \sqrt{ P(x=0) (-1)^2 + P(x=1) (0)^2 + P(x=2) (1)^2} = \sqrt{0.25 + 0 + 0.25} \approx 0.7 \end{align} \]
  • Thus the population parameters for the distribution of the number of heads in two coin flips are \( \mu=1 \) and \( \sigma\approx 0.7 \).

A review of the range-rule-of-thumb and significant values

Significance of measurements by the range rule of thumb.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s recall the “range rule of thumb” for significant (or interesting) observations.
  • For many data sets, the majority of sample values (on the order of \( 95\% \)) will lie within two standard deviations of the mean.
  • For this reason, we find measured values surprising / significant when they lie outside of two standard deviations.
  • To find significant values we can use the range rule of thumb as follows:
    • Significantly low – a value \( x \) is significantly low when \[ x \leq \mu - 2 \sigma \]
    • Significantly high – a value \( x \) is significantly high when \[ x \geq \mu + 2 \sigma \]
    • Not significant – a value \( x \) is not significant when \[ \mu - 2 \sigma < x < \mu + 2 \sigma \]
  • When the population is normal, this is connected to the empirical rule by the fact that a randomly selected individual has a probability of \( 95\% \) of lying in the range, \[ \mu - 2 \sigma < x < \mu + 2 \sigma. \]
  • Therefore, there is a \( 5\% \) chance that a randomly selected individual will lie in one of the ranges \[ \begin{align} x \leq \mu - 2 \sigma & & \text{ or }& &x \geq \mu + 2 \sigma\end{align} \]
  • By convention, when there is a \( 5\% \) or less chance of finding such an observation, we call the observation significant.
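The three cases of the range rule of thumb can be wrapped in a small helper (a sketch; the example values \( \mu = 5 \), \( \sigma = 2 \) are hypothetical, not from the slides):

```python
def significance(x, mu, sigma):
    """Classify x by the range rule of thumb."""
    if x <= mu - 2 * sigma:
        return "significantly low"
    if x >= mu + 2 * sigma:
        return "significantly high"
    return "not significant"

# Hypothetical example: with mu = 5 and sigma = 2, values at or beyond
# 1 and 9 are considered significant.
print(significance(10, 5, 2))  # significantly high
print(significance(5, 5, 2))   # not significant
```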

An example of the range-rule-of-thumb and significant values

Probability distribution for number of sleep walkers in random selection of 5 individuals.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • On the left is a probability distribution for the random variable \( x \) equal to the number of sleep walkers in a random selection of \( 5 \) US adults.
  • Following the definition of the mean of a probability distribution, \[ \begin{align} \mu &= \sum_{x_\alpha = 0}^5 x_\alpha \times P(x=x_\alpha) \\ &= 0 \times 0.172 + 1\times 0.363 + 2 \times 0.306 + 3\times 0.129 + 4 \times 0.027 + 5 \times 0.002\\ &=1.48 \end{align} \]
  • Similarly, the standard deviation can be computed, \[ \sigma = \sqrt{\sum_{x_\alpha = 0}^{5} P(x=x_\alpha) (x_\alpha - \mu)^2 } \]
  • Therefore, \[ \sigma = \sqrt{ 0.172(-1.48)^2 + 0.363 (-0.48)^2 + 0.306 (0.52)^2+ 0.129 (1.52)^2 + 0.027 (2.52)^2 + 0.002 (3.52)^2} \approx 1.02 \]
  • Discuss with a neighbor: given the above \( \mu \) and \( \sigma \) and the range-rule-of-thumb, is selecting \( 3 \) sleepwalkers out of \( 5 \) random adults significant?
    • Consider that \( \mu + 2 \sigma \approx 3.52 \).
    • Therefore, \( x=3 \) lies in the interval \( (\mu - 2 \sigma, \mu + 2 \sigma) \) and by the range-rule-of-thumb, we do not consider this to be significant.
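Recomputing the sleepwalker parameters in Python confirms the range-rule conclusion (a sketch; carrying full precision gives \( \sigma \approx 1.02 \), and small rounding differences from hand computation may appear):

```python
# Mean and standard deviation of the sleepwalker distribution, plus the
# range-rule-of-thumb check for x = 3.
from math import sqrt

dist = {0: 0.172, 1: 0.363, 2: 0.306, 3: 0.129, 4: 0.027, 5: 0.002}
mu = sum(x * p for x, p in dist.items())
sigma = sqrt(sum(p * (x - mu) ** 2 for x, p in dist.items()))

print(round(mu, 2))        # 1.48
print(round(sigma, 2))     # 1.02
print(3 < mu + 2 * sigma)  # True: x = 3 is not significantly high
```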

An example of the range-rule-of-thumb and significant values continued

Probability distribution for number of sleep walkers in random selection of 5 individuals.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s consider analyzing this problem directly from the probability now.
  • Recall, we consider \( x \) to be significant if the probability of getting a value at least as extreme as \( x \) is less than \( 5\% \).
  • Discuss with a neighbor: would selecting \( 3 \) sleepwalkers out of \( 5 \) random adults be significant by probability? What is the probability of finding at least \( 3 \) sleep walkers in a group of \( 5 \) random adults?
    • Notice that \( A= \)"at least \( 3 \) sleepwalkers in \( 5 \) random adults" is the same as the event, \[ \left(x = 3 \text{ or } x=4 \text{ or } x=5\right) \]
    • Therefore, \[ \begin{align} P(A) &= P(x=3) + P(x=4) + P(x=5) \\ &= 0.129 + 0.027 + 0.002 = 0.158 \end{align} \]
    • We have \( P(A) = 0.158 \geq 0.05 \) so we do not call this significant.

An example of the range-rule-of-thumb and significant values continued

Probability distribution for number of sleep walkers in random selection of 5 individuals.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Discuss with a neighbor: would selecting \( 4 \) sleepwalkers out of \( 5 \) random adults be significant by probability? What is the probability of finding at least \( 4 \) sleep walkers in a group of \( 5 \) random adults?
    • Notice that \( A= \)"at least \( 4 \) sleepwalkers in \( 5 \) random adults" is the same as the event, \[ \left(x=4 \text{ or } x=5\right) \]
    • Therefore, \[ \begin{align} P(A) &= P(x=4) + P(x=5) \\ &= 0.027 + 0.002 = 0.029 \end{align} \]
    • We have \( P(A) = 0.029 \leq 0.05 \) so we do call this significant.
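Both cumulative probabilities can be computed directly from the table (a sketch):

```python
# Probability of observing at least k sleepwalkers among 5 random adults,
# summed from the probability distribution table.
dist = {0: 0.172, 1: 0.363, 2: 0.306, 3: 0.129, 4: 0.027, 5: 0.002}

def p_at_least(k):
    return sum(p for x, p in dist.items() if x >= k)

print(round(p_at_least(3), 3))  # 0.158 -> not significant (>= 0.05)
print(round(p_at_least(4), 3))  # 0.029 -> significant (< 0.05)
```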

Analyzing the shape of a probability distribution

Probability distribution for number of sleep walkers in random selection of 5 individuals.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We can analyze the shape of data directly from a probability distribution in the same way that we can look at a relative frequency distribution.
  • For normal data, this means that:
    1. Probability should start low, become high and go low again.
    2. The probability values should be roughly symmetric around the peak of the highest probability.
    3. The probability of outliers should be very small.
  • Discuss with a neighbor: does the data to the left appear to be normally distributed? Why or why not?
    • Notice that the probability values are not symmetric around the peak of \( P(x=1)=0.363 \).
    • Particularly, there is a long right-tail, so we say that this data is skewed-right.
    • Therefore, we can see that this probability distribution is not normal.