Fundamentals of probability part II

02/20/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Odds
    • Compound events
    • The addition rule
    • Conditional probability
    • The multiplication rule

Odds

  • Expressions of likelihood are often given as odds, such as \( 50:1 \) (or “50 to 1”).

  • Because the use of odds makes many calculations difficult, statisticians, mathematicians, and scientists prefer to use probabilities.

  • The advantage of odds is that they make it easier to deal with money transfers associated with gambling, so they tend to be used in casinos, lotteries, and racetracks.

  • Note – in the three definitions that follow, the actual odds against and the actual odds in favor are calculated with the actual likelihood of some event;

    • however, the payoff odds describe the relationship between the bet and the amount of the payoff.
  • The actual odds correspond to actual probabilities of outcomes, but the payoff odds are set by racetrack and casino operators.

    • Racetracks and casinos are in business to make a profit, so the payoff odds will not be the same as the actual odds.

Odds continued

  • Actual odds against event \( A \) – this is the probability of event \( \overline{A} \) relative to the event \( A \), i.e.,

    \[ \frac{P\left(\overline{A}\right)}{P(A)} \]

    • Actual odds against are usually expressed in the form \( a:b \) (or “\( a \) to \( b \)”), where \( a \) and \( b \) are integers having no common factors.
  • Actual odds in favor of event \( A \) – this is the probability of event \( A \) relative to the event \( \overline{A} \), i.e.,

    \[ \frac{P(A)}{P\left(\overline{A}\right)} \]

    • If the odds against \( A \) are \( a:b \), then the odds in favor of \( A \) are \( b:a \).
  • Payoff odds against event \( A \) – this is the ratio of net profit (if you win) to the amount bet:

    \[ \text{payoff odds against event }A = (\text{net profit}):(\text{amount bet}) \]
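  • As a quick illustration (not part of the original slides), the following Python sketch converts a probability \( P(A) \) into the reduced ratios defined above; the function names are our own.

```python
from fractions import Fraction

def odds_against(p_A):
    """Actual odds against A: the reduced ratio (a, b) with a/b = P(not A)/P(A)."""
    p_A = Fraction(p_A)
    ratio = (1 - p_A) / p_A          # P(A-bar) / P(A), automatically reduced
    return ratio.numerator, ratio.denominator

def odds_in_favor(p_A):
    """Actual odds in favor of A: the reversed ratio (b, a)."""
    a, b = odds_against(p_A)
    return b, a

# Example: a 1-in-38 chance, as with a single number in roulette
print(odds_against(Fraction(1, 38)))   # (37, 1)
print(odds_in_favor(Fraction(1, 38)))  # (1, 37)
```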

Odds example

  • If you bet \( 5 \) dollars on the number \( 13 \) in roulette, your probability of winning is \( \frac{1}{38} \) and the payoff odds are given by the casino as \( 35:1 \).

  • Discuss with a neighbor: what are the actual odds for and the actual odds against winning with a bet on \( 13 \)?

    • Let's note that if \( A= \)"winning with a bet on \( 13 \)", we can write \( P(A)=\frac{1}{38} \).
    • Therefore, the probability of not winning is \[ P\left(\overline{A}\right) = 1 - \frac{1}{38} = \frac{37}{38} \]
    • The actual odds for a bet on \( 13 \) are thus given as, \[ \frac{P(A)}{P\left(\overline{A}\right)} = \frac{\frac{1}{38}}{\frac{37}{38}} = \frac{1}{37}, \] or as odds, \( 1:37 \).
    • If we have the actual odds for a bet on \( 13 \) as \( 1:37 \) then the actual odds against are given as \( 37:1 \).
  • Recall our formula for payoff odds, \[ \text{payoff odds against event }A = (\text{net profit}):(\text{amount bet}) \]

  • Discuss with a neighbor: how much net profit would you make if you win by betting on \( 13 \)?

    • For each dollar bet, we net a profit of \( 35 \) dollars, so with a \( 5 \) dollar bet we can scale this ratio to obtain \( 175:5 \) as the ratio of net profit to amount bet.
    • The net profit is \( 175 \) dollars, which means the casino pays you your winnings of \( 175 \) dollars plus the \( 5 \) dollars for your original bet.

Odds example continued

  • If you bet \( 5 \) dollars on the number \( 13 \) in roulette, your probability of winning is \( \frac{1}{38} \) and the payoff odds are given by the casino as \( 35:1 \).

  • Discuss with a neighbor: if the casino was not operating for profit and the payoff odds were changed to match the actual odds against \( 13 \), how much would you win with a bet of \( 5 \) dollars if the outcome were \( 13 \)?

    • If the payoff odds were equal to the actual odds against, we would compute, \[ 37:1 = (\text{Net profit}):(\text{amount bet}). \]
    • Thus if we used this rule, we could multiply the ratio by five again to find \( 185:5 \).
    • Our net profit would therefore be \( 185 \) dollars on a \( 5 \) dollar bet – this means the casino would owe you \( 185 \) plus \( 5 \) dollars for your original bet.
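  • The arithmetic in this example can be checked with a short snippet; the variable names are illustrative and the dollar figures are those from the slide.

```python
bet = 5  # dollars bet on the number 13

# Casino payoff odds of 35:1 -> net profit of 35 dollars per dollar bet
casino_profit = 35 * bet           # 175 dollars
# Payoff at the actual odds against, 37:1 -> 37 dollars per dollar bet
fair_profit = 37 * bet             # 185 dollars

print(casino_profit, fair_profit)  # 175 185
# In either case, a winning bet also returns the original 5 dollars.
```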

Compound events

  • We will often be concerned not with a single event \( A \), but with some combination of an event \( A \) and another event \( B \).
  • Compound event – formally we define a compound event as any event combining two or more simple events.
  • There are two key operations joining events
    1. “OR” – in mathematics we refer to “or” as a non-exclusive “or”.
      • The meaning of this for “\( A \) or \( B \)” is – event \( A \) occurs, event \( B \) occurs, or both events \( A \) and \( B \) occur.
      • We will not consider the exclusive “or”, i.e. either event \( A \) occurs, or event \( B \) occurs, but not both.
    2. “AND” – in mathematics we use “and” in a strict sense: it requires both events.
      • The meaning of this for “\( A \) and \( B \)” is – both event \( A \) and event \( B \) occur.
  • The operations “and” and “or” join events together in a way that we can compute the probability of the joint events.
  • We will develop some tools describing how to compute probabilities of these compound events from the individual probabilities.
    • A key concept is how we compute the probability of events without double counting the ways they can occur.

Addition rule

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • Suppose we want to compute the probability of two events \( A \) and \( B \) joined by the compound operation “or”.
  • We read the statement, \[ P(A \text{ or } B) \] as the probability of:
    • event \( A \) occurring,
    • event \( B \) occurring, or
    • both \( A \) and \( B \) occurring.
  • Intuitively, we can express the probability in terms of all the ways \( A \) can occur and all the ways \( B \) can occur, if we don’t double count.
  • Let all the ways that \( A \) can occur be represented by the red circle to the left.
  • Let all the ways that \( B \) can occur be represented by the dashed circle to the left.
  • Discuss with a neighbor: suppose we count all the ways \( A \) can occur and all the ways \( B \) can occur.
  • If we take the sum of the total of all ways \( A \) occurs and the total of all ways \( B \) occurs, does this give the total of all ways \( A \) or \( B \) occurs?
    • Consider, if there is an overlap where both \( A \) and \( B \) occur simultaneously, then summing the total of all ways \( A \) occurs and the total of all ways \( B \) occurs double counts the cases where both \( A \) and \( B \) occur.

Addition rule continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • Let us consider the statement \[ {\color{red} {P(A)}} + P(B), \] is equal to the sum of the total of all ways that \( A \) occurs and the total of all ways that \( B \) occurs, relative to all possible outcomes.
  • Discuss with a neighbor: what term “\( \cdot \)” is needed in \( P\left( \cdot \right) \) below to eliminate the double counting? \[ P(A\text{ or }B)= {\color{red} {P(A)}} + P(B) - P\left( \cdot \right) \]
    • We count the cases where \( A \) and \( B \) both occur twice, as these cases are included in both \( {\color{red} {P(A)}} \) and \( P(B) \).
  • Therefore, the addition rule for compound events is given as, \[ P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and }B) \]
  • Discuss with a neighbor: notice that if \( P(A\text{ and } B) = 0 \) then \[ P(A\text{ or }B) = P(A) + P(B), \] is an accurate statement.
  • If \( P(A\text{ and } B) = 0 \), what does this say about the relationship between \( A \) and \( B \)?
    • This says that events \( A \) and \( B \) never occur simultaneously.
    • An easy example is for \( A= \)"coin flip lands heads" and \( B= \)"coin flip lands tails" – these two events never occur simultaneously and \( P(A\text{ or } B)= P(A) + P(B) \).
  • Two events that never occur simultaneously are called disjoint or mutually exclusive events, corresponding to when there is no overlap.
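  • A minimal sketch of the addition rule, using the coin-flip events above (the helper function is our own, not from the text):

```python
def prob_A_or_B(p_A, p_B, p_A_and_B):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_A + p_B - p_A_and_B

# Disjoint (mutually exclusive) events: one fair coin flip with
# A = "lands heads" and B = "lands tails", so P(A and B) = 0.
print(prob_A_or_B(0.5, 0.5, 0.0))   # 1.0
```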

Addition rule continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • Actually, not only are the events \( A= \)"coin flip lands heads" and \( B= \)"coin flip lands tails" disjoint, but they are complementary.
  • If our process is one coin flip, we can encode the simple events as \( \{H,T\} \) for heads and tails respectively.
  • Thus \( A=\{H\} \) and \( B=\{T\} \), so that \[ \overline{A}=\text{``all outcomes where the coin flip is not heads''}=\{T\}=B. \]
  • Complementary events are special cases of disjoint events.
  • We saw last time that, \[ P(A) + P\left(\overline{A}\right) = 1. \]
  • We can now use the addition rule to show that this is always true.
  • First note that, by definition, \( A \) and \( \overline{A} \) never occur simultaneously – “\( A \) and \( \overline{A} \)” is the joint event where \( A \) occurs and \( A \) does not occur at the same time, so that \( P(A\text{ and }\overline{A})= 0 \).
  • However, “\( A \) or \( \overline{A} \)” is the joint event in which \( A \) occurs, \( A \) does not occur, or both – this covers every possible outcome. Therefore, \( P\left(A\text{ or }\overline{A}\right) = 1 \).
  • The addition rule thus gives us, \[ 1= P\left(A\text{ or }\overline{A}\right) = P(A) + P\left(\overline{A}\right) \] because they are disjoint.

Addition rule example

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • Let us consider an example about how to use the statement, \[ P(A) + P\left(\overline{A}\right) = 1 \] effectively to deduce probabilities.
  • Based on the article,
    “Prevalence and Comorbidity of Nocturnal Wandering in the US General Population,”
  • we say the probability of randomly selecting a US adult who has sleepwalked is \( 0.292 \).
  • Discuss with a neighbor: what is the probability of randomly selecting a US adult who has not sleepwalked?
    • Note that if our process is “Randomly selected person” and \( A= \)"person has sleepwalked", then \( \overline{A}= \)"person has not sleepwalked".
  • Setting up the problem in this way we can see that \[ \begin{align} & P(A) + P\left(\overline{A}\right) = 1 \\ \Leftrightarrow & P\left(\overline{A}\right) = 1 - P(A) \\ \Leftrightarrow & P\left(\overline{A}\right) = 1 - 0.292 = 0.708 \end{align} \]
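  • The same computation as a one-line sketch (the value is from the cited article; the variable names are our own):

```python
p_sleepwalked = 0.292                   # P(A), from the cited article
p_not_sleepwalked = 1 - p_sleepwalked   # complement rule: P(A-bar) = 1 - P(A)
print(p_not_sleepwalked)                # 0.708 (up to floating-point rounding)
```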

Addition rule example

Pre-employment drug screening.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • In the table to the left, we see results from a pre-employment drug screening test with \( 1000 \) participants.
  • The results are from a quick test that is inaccurate and does not always give correct results.
  • Specifically we have four different scenarios for a test result:
    1. True positive – a subject uses drugs and the test result is positive
    2. False positive – a subject does not use drugs and the test result is positive.
    3. True negative – a subject does not use drugs and the test result is negative.
    4. False negative – a subject uses drugs and the test result is negative.
  • Discuss with a neighbor: if one of the participants is selected randomly, let the events be defined as \( A= \)"the participant is a drug user" and \( B= \)"the test result is positive". What is \( P(A\text{ or }B) \)?
    • Notice that \( A= \)"True positive or False negative" and that \( B= \)"True positive or False positive". Therefore, "\( A \) and \( B \)" is the case "True positive" and we have
    • \[ \begin{matrix} P(A) = \frac{44\text{ True positive } + 6\text{ False negative}}{1000} & P(B) = \frac{44\text{ True positive } + 90\text{ False positive}}{1000} & P(A\text{ and }B) = \frac{44\text{ True positive}}{1000} \end{matrix} \]
    • Thus we see how \( P(A) + P(B) \) would double count the cases where both \( A \) and \( B \) occur, and instead we have,

      \[ \begin{matrix}P(A\text{ or }B) = \frac{44\text{ True positive } + 6\text{ False negative} + 90\text{ False positive}}{1000}=0.14\end{matrix} \]
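  • A sketch of the same table computation in code; note that the true-negative count of 860 is inferred here only so that the four categories total the 1000 participants.

```python
from fractions import Fraction

# Counts from the pre-employment drug screening table (1000 participants);
# the true-negative count is inferred so that the categories sum to 1000.
true_pos, false_pos, false_neg, true_neg = 44, 90, 6, 860
n = true_pos + false_pos + false_neg + true_neg        # 1000

p_A = Fraction(true_pos + false_neg, n)      # P(drug user)         = 50/1000
p_B = Fraction(true_pos + false_pos, n)      # P(positive test)     = 134/1000
p_A_and_B = Fraction(true_pos, n)            # P(user and positive) = 44/1000

p_A_or_B = p_A + p_B - p_A_and_B             # addition rule, no double counting
print(p_A_or_B, float(p_A_or_B))             # 7/50 0.14
```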

Multiplication rule

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • There are several ways to consider the multiplication rule for probability – the “physics” way to consider this, due to Kolmogorov, is as follows:
    • Suppose that there are two related events \( A \) and \( B \) where knowledge of one occurring would change how likely we consider the other to be.
      • For example, we can say \( A= \)"it snows in the Sierra" and \( B= \)"it rains in my garden".
      • The day before, I don’t know if either will occur.
      • However, if I knew that \( A \) occurred, this would change how likely it would seem that \( B \) occurs;
      • \( B \) is not guaranteed when \( A \) occurs, but the probability of \( B \) occurring would be higher in the presence of \( A \).
    • Suppose, hypothetically, that \( A \) occurs; then our sample space of possible outcomes only includes outcomes where \( A \) also occurs.
    • I.e., we would need to restrict our consideration of \( B \) relative to the case that \( A \) occurs.
  • We define the probability of \( B \) conditional on \( A \),

    \[ P(B\vert A), \]
    as the probability of \( B \) in the case that \( A \) occurs.

Multiplication rule continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • We now consider the probability of \( B \) conditional on \( A \),

    \[ P(B\vert A), \]
    as the probability of \( B \) in the case that \( A \) occurs.
    • For example, we can say \( A= \)"it snows in the Sierra" and \( B= \)"it rains in my garden".
  • Assuming \( A \) occurs, we will consider all ways for both \( A \) and \( B \) to occur.
    • The sample space for \( B \vert A \) has been restricted to the cases where \( A \) occurs, so we compute the probability relative to all the ways \( A \) occurs.
  • Therefore the probability of \( P(B\vert A) \) can be read, \[ \frac{\text{All the ways }A\text{ and }B\text{ can occur}}{\text{All the ways }A\text{ can occur}} \]
  • Mathematically we write this as, \[ P(B\vert A) = \frac{P(A\text{ and } B)}{P(A)}. \]
  • For example, in plain English we can say
    The probability that it rains in my garden, given that it snows in the Sierra, is equal to the probability of both occurring relative to the probability of snow in the Sierra.

Multiplication rule continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • We have now defined the conditional probability of \( B \) given \( A \) as, \[ P(B\vert A)=\frac{P(A\text{ and }B)}{P(A)} \] in the Kolmogorov way.
  • We should make some notes about this:
    • The above statement only makes sense when \( P(A)\neq 0 \), because we can never divide by zero.
      • “Physically” we can interpret the meaning with \( P(B\vert A) \) read as
        The probability that \( B \) occurs given that \( A \) occurs.
      • The above should not be defined when \( A \) is impossible – the phrase “given that \( A \) occurs” makes no sense.
    • Using the definition of conditional probability, we get the multiplication rule.
    • The multiplication rule for probability tells us that, \[ P(A \text{ and } B) = P(B\vert A) \times P(A) \]
    • We will use the above formula quite generally, but we note:
      • \( P(B\vert A) \) is not defined when \( P(A)=0 \)
      • However, \( P(A\text{ and }B) = 0 \) when \( P(A)=0 \) because this is the probability that \( B \) and the impossible event \( A \) both occur.
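  • To summarize the two formulas in code (a sketch with our own helper names; the numbers in the example reuse the drug-screening table from earlier):

```python
def conditional(p_A_and_B, p_A):
    """Conditional probability: P(B | A) = P(A and B) / P(A)."""
    if p_A == 0:
        raise ValueError("P(B | A) is undefined when P(A) = 0")
    return p_A_and_B / p_A

def multiply(p_B_given_A, p_A):
    """Multiplication rule: P(A and B) = P(B | A) * P(A)."""
    return p_B_given_A * p_A

# Example with the drug-screening counts: 44 of the 50 drug users test positive,
# so P(positive | user) = (44/1000) / (50/1000) = 0.88.
print(conditional(44 / 1000, 50 / 1000))   # 0.88 (up to floating-point rounding)
```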

Independence

  • Closely related notions to the conditional probability are independence and dependence of events.
    • Dependence – two events are said to be dependent if the outcome of one event directly affects the probability of the other.
      • In the earlier example, \( A= \)"snow in the Sierra" and \( B= \)"rain in my garden" are dependent events, because one occurring would affect the chance that the other occurred.
      • However, dependence between events \( A \) and \( B \) does not mean that \( A \) causes \( B \) or vice versa.
      • Rain in my garden does not cause snow in the Sierra, but the probability of snow in the Sierra is larger if there is rain in my garden.
    • Independence – two events are said to be independent if the outcome of either event has no impact on the probability of the other.
      • When we think of events being independent, we should think of events that are not related to each other.
      • For example, if our process is “what happens today”, \( A= \)"snow in the Sierra" and \( B= \)"coin flip heads" are independent, because neither outcome affects the other.
  • Mathematically, we can see the meaning of independence clearly by stating, \( A \) and \( B \) are independent by definition if and only if both of the following hold, \[ \begin{matrix} P(A\vert B) = P(A) & \text{and} & P(B\vert A) = P(B). \end{matrix} \]
  • In plain English, the above says
    The probability of event \( A \) does not change in the presence of \( B \) and vice versa.
  • Particularly, the outcome of \( A \) or \( B \) does not affect the other.
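  • Equivalently (this product form follows from the multiplication rule whenever the conditional probabilities are defined), independence can be checked as \( P(A\text{ and }B) = P(A)\times P(B) \); a small sketch with our own helper name:

```python
def is_independent(p_A, p_B, p_A_and_B, tol=1e-12):
    """A and B are independent iff P(A and B) = P(A) * P(B) (up to rounding)."""
    return abs(p_A_and_B - p_A * p_B) <= tol

# Two unrelated fair coin flips: P(A) = P(B) = 0.5 and P(A and B) = 0.25.
print(is_independent(0.5, 0.5, 0.25))   # True
```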

Sampling and Independence

  • A classical example of statistical independence / dependence arises from games of chance.
  • A standard deck of cards has \( 52 \) cards with half black and half red.
  • Define the events \( A= \)"I draw a red card" and \( B= \)"I draw a black card".
  • Suppose I draw a card randomly from the deck and the color is red.
  • Then suppose I return the card to the deck, shuffle and draw another card.
  • Discuss with a neighbor: what is the probability that the next card I draw is black?
    • Because the original card was returned and the deck was shuffled, this is like drawing a card from a fresh deck.
    • Therefore, \( P(B\vert A) = P(B) = 0.5 \).
  • Now suppose that I randomly draw a card from a fresh deck and the color is red.
  • Suppose that I don’t replace the card into the deck, and I randomly draw another card.
  • Discuss with a neighbor: what is the probability that the next card I draw is black?
    • In this case, the deck now has only \( 25 \) red cards compared to \( 26 \) black cards.
    • When we do not replace samples in the deck, \[ P(B\vert A) = \frac{26}{51} \]
    • That is, \( B \) is not independent from \( A \) when we sample without replacement.
  • This is an important example, because it shows how sampling with and without replacement changes the independence of events.
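  • The two card-drawing cases, written out exactly (a sketch; the deck composition is the standard 26 red and 26 black cards):

```python
from fractions import Fraction

# A = "first card drawn is red", B = "second card drawn is black".

# With replacement: the deck is restored to 52 cards, 26 of them black.
p_B_given_A_with = Fraction(26, 52)      # 1/2, same as P(B)
# Without replacement: 51 cards remain, 25 red and 26 black.
p_B_given_A_without = Fraction(26, 51)   # not equal to P(B) = 1/2

print(p_B_given_A_with, p_B_given_A_without)   # 1/2 26/51
```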

Sampling and Independence continued

Pre-employment drug screening.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • We define the two concepts we discussed on the last slide below:
    • Sampling with replacement – selections are re-introduced to the sample pool before the next sample is made.
      • In this case, the selections are independent events.
    • Sampling without replacement – selections are not re-introduced to the sample pool before the next sample is made.
      • In this case, the selections are dependent events.
  • Consider the pre-employment drug screening again, with data above.
  • Discuss with a neighbor: suppose we sample two participants without replacement.
  • Let \( A= \)"first draw uses drugs" and \( B= \)"second draw uses drugs".
  • If we want to compute the probability that both participants use drugs, what event are we trying to understand, \[ \begin{matrix} (A \textbf{ and } B) & \text{or} & (A \textbf{ or } B)? \end{matrix} \]
  • Can we use the addition rule or the multiplication rule?
    • We are considering the event \( (A \textbf{ and } B) \) because we specified both \( A \) and \( B \) occur.
    • Therefore, we will use the multiplication rule.

Sampling and Independence continued

Pre-employment drug screening.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • For the events
    1. \( A= \)"first draw uses drugs", and
    2. \( B= \)"second draw uses drugs",
  • we can compute \( P(A\text{ and } B) \) using the multiplication rule as follows:
    • Note that \( P(A) \) can be computed as, \[ \frac{50 \text{ Participants using drugs}}{1000 \text{ Participants}} \]
    • And that \( P(B\vert A) \) can be computed as, \[ \frac{49\text{ Remaining participants using drugs}}{999 \text{ Remaining participants}} \]
    • Therefore, we can write, \[ \begin{align} P(A\text{ and } B)&= P(B\vert A) \times P(A)\\ &= \frac{49}{999} \times \frac{50}{1000} \\ &\approx 0.00245 \end{align} \]
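  • The same product as a short computation (values from the table above; variable names are our own):

```python
p_A = 50 / 1000          # P(first participant selected uses drugs)
p_B_given_A = 49 / 999   # P(second uses drugs | first did), without replacement

p_A_and_B = p_B_given_A * p_A        # multiplication rule
print(round(p_A_and_B, 6))           # 0.002452
```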

Sampling and Independence continued

  • We note that a common approximation is used in polling when sampling without replacement.
  • Suppose that we wish to make a similar computation of probability as with the last example, without replacement.
  • If the sample size is no more than \( 5\% \) of the size of the population, it is common to treat the selections as being independent (even if they are actually dependent).
  • In our last example we found that \[ \begin{align} P(A\text{ and } B)&= P(B\vert A) \times P(A)\\ &= \frac{49}{999} \times \frac{50}{1000} \\ &\approx 0.00245 \end{align} \]
  • However, because the number of draws is less than \( 5\% \) with respect to the total number of participants we can still approximate this with two independent draws.
  • Discuss with a neighbor: what is the probability of \( P(A \text{ and }B) \) if we treat \( A \) and \( B \) as independent, i.e., with replacement?
    • In this case, we would have that \[ \begin{align} P(A\text{ and } B)&= P(B\vert A) \times P(A)\\ &= \frac{50}{1000} \times \frac{50}{1000} \\ &= 0.0025 \end{align} \]
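  • Comparing the exact (dependent) computation with the independence approximation numerically:

```python
exact = (49 / 999) * (50 / 1000)     # without replacement (dependent draws)
approx = (50 / 1000) * (50 / 1000)   # treating the draws as independent

print(round(exact, 6), round(approx, 6))   # 0.002452 0.0025
```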

Redundancy and the multiplication rule

  • Machine systems in engineering are often designed with multiple, redundant safety features.
  • Particularly, if there are multiple, independent safety checks, we can reduce the probability of a catastrophic failure substantially.
  • For example, the Airbus 310 twin-engine airliner has three independent hydraulic systems so that if one fails, another system can step in and maintain flight control.
  • For the sake of example, we will assume that the probability of a randomly selected hydraulic system failing is \( 0.002 \).
  • Discuss with a neighbor: if the airplane had only one hydraulic system, what would be the probability that the airplane could maintain control for the flight?
    • Let event \( A= \)"hydraulic system fails" so that \( \overline{A}= \)"airplane maintains control".
    • We can then state, \[ P(\overline{A}) = 1 - P(A) = 1 - 0.002 = 0.998. \]
  • Discuss with a neighbor: what is the probability that an airplane would be able to maintain control with the three independent hydraulic systems?
    • Let us denote \( A_1= \)"hydraulic system \( 1 \) fails", \( A_2= \)"hydraulic system \( 2 \) fails" and \( A_3= \)"hydraulic system \( 3 \) fails".
    • The event where all hydraulic systems fail is given by, \[ \left(A_1\text{ and } A_2\text{ and } A_3\right) \] so that the airplane is able to maintain control in the complement of the above event: \[ \overline{\left(A_1\text{ and } A_2\text{ and } A_3\right)}. \]

Redundancy and the multiplication rule continued

  • We recall, the probability that a randomly selected hydraulic system fails is \( 0.002 \) and the three systems are independent.
  • Therefore, we can use the multiplication rule as, \[ \begin{align} P\left(A_1\text{ and } A_2\text{ and } A_3\right) &= P(A_1\text{ and } A_2\vert A_3) \times P(A_3)\\ &=P(A_1\text{ and } A_2) \times P(A_3) \\ &= P(A_1 \vert A_2) \times P(A_2) \times P(A_3) \\ & = P(A_1)\times P(A_2) \times P(A_3) \end{align} \] because the events are independent.
  • Finally, we can write, \[ P\left(\overline{\left(A_1\text{ and } A_2\text{ and } A_3\right)}\right) = 1 - P(A_1)\times P(A_2) \times P(A_3)= 1-0.002^3=0.999999992 \]
  • This shows how including multiple independent systems greatly improves the probability of success.
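  • The redundancy computation as a sketch (the failure probability \( 0.002 \) is the assumed value from the slide):

```python
p_fail = 0.002                        # assumed P(a single hydraulic system fails)

p_all_fail = p_fail ** 3              # independence: multiply the three probabilities
p_maintain_control = 1 - p_all_fail   # complement: at least one system still works

print(p_all_fail)             # ~8e-09
print(p_maintain_control)     # ~0.999999992
```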

Review of key concepts

  • The most important concepts covered here are:
    • how to join events \( A \) and \( B \) with our two operations,
      1. \( A \) “or” \( B \) – the case that \( A \), \( B \) or both \( A \) and \( B \) occur; and
      2. \( A \) “and” \( B \) – the case that both \( A \) and \( B \) occur;
    • how to take complements of events, and how the probabilities are related, e.g., \[ P(A) + P\left(\overline{A}\right) = 1; \]
    • how to use the two probability rules,
      1. Addition rule – for the event \( A \) or \( B \), \[ P(A\text{ or }B)= P(A) + P(B) - P(A \text{ and } B); \]
      2. Multiplication rule – for the event \( A \) and \( B \), \[ P(A \text{ and } B) = P(B\vert A) \times P(A); \]
    • and the notions of independence and dependence between events \( A \) and \( B \).
  • We will go more in depth into conditional probability after the midterm.
  • Tuesday, we will review sections 1.1 - 1.3, 2.1 - 2.4, 3.1 - 3.3 and 4.1 - 4.2.