Total Probability, Independence and Bayes' theorem

02/08/2021


Outline

  • The following topics will be covered in this lecture:

    • Multiplication Rule
    • Total Probability Rule
    • Independence
    • Bayes' theorem

Intersections of Events and Multiplication Rule

  • Recall that last time we discussed the probability of the intersection of two events.
  • Let us suppose that \( A \) and \( B \) are events for which \( P(A)\neq 0 \) and \( P(B)\neq 0 \).
  • Using the definition of conditional probability, \[ P(B|A)=\frac{P(A\cap B)}{P(A)}, \]
  • we can solve for the probability of the intersection, \[ P(A \cap B) = P(B\vert A) P(A). \]
  • Similarly, from the same definition of conditional probability we have \[ P(A|B)=\frac{P(A\cap B)}{P(B)} \]
  • which implies that \[ P(A \cap B) = P(A\vert B) P(B) \]
  • Combining these two expressions, we obtain a formula known as the multiplication rule
    Probability of an Intersection: \[ P(A \cap B) = P(B\vert A) P(A) = P(A\vert B) P(B) \]

Intersections of Events – example

  • EXAMPLE: The probability that the first stage of a numerically controlled machining operation for high-rpm pistons meets specifications is 0.90.

  • Failures are due to metal variations, fixture alignment, cutting blade condition, vibration, and ambient environmental conditions.

  • Given that the first stage meets specifications, the probability that a second stage of machining meets specifications is 0.95.

  • Question: Using the multiplication rule,

    \[ P(A \cap B) = P(B\vert A) P(A) = P(A\vert B) P(B) \] what is the probability that both stages meet specifications?

    • Let the events be \( A= \)"first stage meets specifications" and \( B= \)"second stage meets specifications".
    • The probability requested is \( P(A \text{ and }B) \)
    • where \( P(A)=0.90 \)
    • and \( P(B|A)=0.95 \)
    • Using the multiplication rule, we get \[ \begin{align} P(A \cap B) &= P(B\vert A) P(A) \\ &= 0.95(0.90) = 0.855 \end{align} \]
    • Note: although it is also true that \( P(A \cap B) = P(A | B)P(B) \), the information provided in the problem does not match this second formulation.
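
A minimal Python sketch of this calculation (the variable names are chosen for illustration):

```python
# Multiplication rule: P(A and B) = P(B|A) * P(A)
p_A = 0.90          # P(first stage meets specifications)
p_B_given_A = 0.95  # P(second stage meets specs | first stage meets specs)

p_A_and_B = p_B_given_A * p_A
print(p_A_and_B)  # approximately 0.855
```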

Total Probability Rule

  • So far we have described probabilities of events in terms of the probabilities of unions and intersections of events, i.e.:
    • Addition rule: \( P(A\cup B) = P(A) + P(B) - P(A\cap B) \);
    • Conditional probability: \( P(A\vert B) = \frac{P(A\cap B)}{P(B)} \); and the
    • Multiplication rule: \( P(A\cap B) = P(A\vert B)P(B) \).
  • What if we want to recover the probability of a single event, \( P(B) \), from its probabilities under several conditions?
Partitioning an event into two mutually exclusive subsets

Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

  • For any event \( B \), we can write \( B \) as the union of the part of \( B \) in \( A \) and the part of \( B \) in \( A′ \). That is \[ B=(A\cap B)\cup(A'\cap B) \]
  • Because \( A \) and \( A′ \) are mutually exclusive
    • \( A \cap B \) and \( A' \cap B \) are mutually exclusive
  • So we can use the addition rule for mutually exclusive events as \[ \begin{align} P(B)&=P((A\cap B)\cup(A'\cap B))\\ &=P(A\cap B)+P(A'\cap B)\end{align} \]
  • Using the multiplication rule on each term of \( P(B) \) we get \[ P(B)=P(B|A)P(A)+P(B|A')P(A') \]
  • Total Probability Rule (Two Events): For any two events \( A \) and \( B \), \[ P(B)=P(B\cap A)+P(B\cap A')=P(B|A)P(A)+P(B|A')P(A') \]

Total Probability Rule – example

Chip contamination example

Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

  • Suppose that in semiconductor manufacturing, the probability is 0.10 that a chip subjected to high levels of contamination during manufacturing has a product failure.
  • The probability is 0.005 that a chip not subjected to high contamination levels during manufacturing has a product failure.
  • In a particular production run, 20% of the chips are subject to high levels of contamination.
  • Question: using the total probability rule \[ \begin{align} P(B)=P(B\cap A)+P(B\cap A')=P(B|A)P(A)+P(B|A')P(A'), \end{align} \]
  • what is the total probability of a chip failure?
  • Let us denote the events \( F= \)"product fails" and \( H= \)"chip is exposed to high levels of contamination".
  • From the problem statement we can extract the following information:
    • \( P(H)=0.2 \) and \( P(H')=0.8 \)
    • \( P(F|H)=0.10 \) and \( P(F|H')=0.005 \)
  • We can use the total probability rule on \( P(F) \) in terms of conditional probabilities \[ P(F)=P(F|H)P(H)+P(F|H')P(H') \]
  • Then the probability that a product fails is \[ P(F)=0.10(0.20)+0.005(0.80)=0.024 \]
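
A minimal Python sketch of this computation (variable names chosen for illustration):

```python
# Total probability rule: P(F) = P(F|H) P(H) + P(F|H') P(H')
p_H = 0.20            # P(high contamination)
p_F_given_H = 0.10    # P(failure | high contamination)
p_F_given_Hc = 0.005  # P(failure | not high contamination)

p_F = p_F_given_H * p_H + p_F_given_Hc * (1 - p_H)
print(p_F)  # approximately 0.024
```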

Total Probability Rule (Multiple Events)

Partitioning an event into several mutually exclusive subsets

Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

  • In general, a collection of sets \( E_1, E_2, \dots , E_k \) such that \[ E_1 \cup E_2 \cup \dots \cup E_k = S \] is said to be exhaustive.
  • The figure above depicts the partitioning of an event \( B \) among a collection of mutually exclusive and exhaustive events.
  • We can generalize the total probability rule for multiple events.
  • We will repeatedly apply the addition rule for mutually exclusive events \[ \begin{align} A \cap B = \emptyset &&\Rightarrow && P(A\cup B) = P(A) + P(B). \end{align} \]
  • Then we will repeatedly apply the multiplication rule on the intersections, \[ \begin{align} P(A\cap B) = P(A\vert B) P(B) \end{align} \]
  • Total Probability Rule (Multiple Events) Assume \( E_1, E_2, \dots , E_k \) are \( k \) mutually exclusive and exhaustive sets. Then
    \[ \begin{align} P(B)&=P(B\cap E_1)+P(B\cap E_2)+\dots+P(B\cap E_k)\\ &=P(B|E_1)P(E_1)+P(B|E_2)P(E_2)+\dots+P(B|E_k)P(E_k) \end{align} \]
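
The general rule translates directly into a sum over the partition. A sketch, assuming the conditional probabilities \( P(B|E_i) \) and the partition probabilities \( P(E_i) \) are given as lists:

```python
def total_probability(p_B_given_E, p_E):
    """Total probability rule: P(B) = sum_i P(B|E_i) P(E_i),
    where E_1, ..., E_k are mutually exclusive and exhaustive."""
    assert abs(sum(p_E) - 1.0) < 1e-9, "partition probabilities must sum to 1"
    return sum(p_b * p_e for p_b, p_e in zip(p_B_given_E, p_E))

# The chip contamination example above is the special case k = 2:
print(total_probability([0.10, 0.005], [0.20, 0.80]))  # approximately 0.024
```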
Independence

    • Closely related to conditional probability are the notions of independence and dependence of events.
      • Dependence – two events are said to be dependent if the outcome of one event directly affects the probability of the other.
        • In the earlier example, \( A= \)"snow in the Sierra" and \( B= \)"rain in my garden" are dependent events, because one occurring would affect the chance the other occurred.
        • However, dependence between events \( A \) and \( B \) does not mean that \( A \) causes \( B \) or vice versa.
        • Rain in my garden does not cause snow in the Sierra, but the probability of snow in the Sierra is larger if there is rain in my garden.
      • Independence – two events are said to be independent if the outcome of either event has no impact on the probability of the other.
        • When we think of events being independent, we should think of events that are not related to each other.
        • For example, if our random experiment is “what happens today?”, \( A= \)"snow in the Sierra" and \( B= \)"coin flip heads" are independent, because neither outcome affects the other.
    • Mathematically, we can state independence precisely: \( A \) and \( B \) are independent, by definition, if and only if both of the following hold, \[ \begin{matrix} P(A\vert B) = P(A) & \text{and} & P(B\vert A) = P(B). \end{matrix} \]
    • In plain English, the above says
      The probability of event \( A \) does not change in the presence of \( B \) and vice versa.
    • In particular, the outcome of either event does not affect the probability of the other.
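
One way to see this definition in action is a small Monte Carlo check: for two unrelated simulated events, the conditional relative frequency of \( A \) given \( B \) should be close to the overall relative frequency of \( A \). A sketch, with probabilities invented for illustration:

```python
import random

random.seed(1)
N = 100_000

# Each trial draws two unrelated events: A with probability 0.3 and
# B with probability 0.5 (both values invented for illustration).
trials = [(random.random() < 0.3, random.random() < 0.5) for _ in range(N)]

p_A = sum(a for a, _ in trials) / N
n_B = sum(b for _, b in trials)
p_A_given_B = sum(1 for a, b in trials if a and b) / n_B

print(round(p_A, 3), round(p_A_given_B, 3))  # both near 0.3: P(A|B) = P(A)
```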

Redundancy and the multiplication rule

    • Machine systems in engineering are often designed with multiple, redundant safety features.
    • In particular, if there are multiple, independent safety checks, we can reduce the probability of a catastrophic failure substantially.
    • EXAMPLE: The Airbus A310 twin-engine airliner has three independent hydraulic systems so that if one fails, another system can step in and maintain flight control.
    • For the sake of example, we will assume that the probability of a randomly selected hydraulic system failing is \( 0.002 \).
    • Question: if the airplane had only one hydraulic system, what would be the probability that an airplane would be able to maintain control for the flight?
      • Let event \( A= \)"hydraulic system fails" so that \( A'= \)"airplane maintains control".
      • We can then state, \[ P(A') = 1 - P(A) = 1 - 0.002 = 0.998. \]
    • Question: what is the probability that an airplane would be able to maintain control with the three independent hydraulic systems?
      • Let us denote \( A_1= \)"hydraulic system \( 1 \) fails", \( A_2= \)"hydraulic system \( 2 \) fails" and \( A_3= \)"hydraulic system \( 3 \) fails".
      • The event where all hydraulic systems fail is given by, \[ \left(A_1\text{ and } A_2\text{ and } A_3\right) \] so that the airplane is able to maintain control in the complement of the above event: \[ \left(A_1\text{ and } A_2\text{ and } A_3\right)'. \]

Redundancy and the multiplication rule continued

    • We recall that the probability that a randomly selected hydraulic system fails is \( 0.002 \) and that the three systems are independent.
    • Therefore, we can use the multiplication rule as, \[ \begin{align} P\left(A_1\text{ and } A_2\text{ and } A_3\right) &= P(A_1\text{ and } A_2\vert A_3) \times P(A_3)\\ &=P(A_1\text{ and } A_2) \times P(A_3) \\ &= P(A_1 \vert A_2) \times P(A_2) \times P(A_3) \\ & = P(A_1)\times P(A_2) \times P(A_3) \end{align} \] because the events are mutually independent.
    • Finally, we can write, \[ P\left(\left(A_1\text{ and } A_2\text{ and } A_3\right)'\right) = 1 - P(A_1)\times P(A_2) \times P(A_3)= 1-0.002^3=0.999999992 \]
    • This shows how including multiple independent systems greatly improves the probability of success.
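
A minimal Python sketch of this reliability calculation:

```python
# P(all three independent hydraulic systems fail) = 0.002 ** 3;
# the airplane maintains control on the complement of that event.
p_fail = 0.002            # P(a single hydraulic system fails)
p_all_fail = p_fail ** 3  # multiplication rule for independent events
p_control = 1 - p_all_fail
print(p_control)  # 0.999999992
```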

Another way of writing independence

    • What we saw in the last slide, \[ P(A_1 \text{ and } A_2 \text{ and } A_3) = P(A_1) \times P(A_2) \times P(A_3) \] actually holds generally for independent events.
    • Let’s suppose that \( A \) and \( B \) are independent events such that \[ \begin{align} P(A\vert B) = P(A) && P(B\vert A) = P(B). \end{align} \]
    • Consider the multiplication rule for the two independent events \( A \) and \( B \), \[ \begin{align} P( A \text { and } B) &= P(A \vert B) \times P(B) \\ &=P(A) \times P(B), \end{align} \] using the independence assumption.
    • In fact, we can show that this holds for any number of independent events, re-using the argument above.
    • Let \( A_1 \), \( A_2 \), \( A_3, \) \( \cdots \) \( A_n \) be any arbitrary list of mutually independent events.
    • Then using the argument above \( n \) times, we can show that, \[ P(A_1 \text{ and } \cdots \text{ and } A_n) = P(A_1) \times \cdots \times P(A_n). \]
    • The conditional-probability statement of independence, \[ \begin{align} P(A\vert B) = P(A) && P(B\vert A) = P(B), \end{align} \] is usually easier to interpret;
    • however, in practice, we will usually describe independence as follows.
      Independence (multiple events)
      The events \( A_1 , A_2 , \dots , A_n \) are mutually independent if and only if, for any subset \( A_{i_1}, A_{i_2}, \dots, A_{i_k} \) of these events, \[ P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_k}) = P(A_{i_1}) \times P(A_{i_2}) \times \cdots \times P(A_{i_k}). \]
    • These two notions are in fact equivalent by the argument above.
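
As a toy illustration of the subset condition, the following sketch builds a joint distribution of three events by assuming independence and then verifies the product formula for every nonempty subset (the probabilities are invented for illustration):

```python
from itertools import combinations, product

p = [0.5, 0.3, 0.2]  # P(A_1), P(A_2), P(A_3), invented for illustration

def joint(outcome):
    """Probability of one outcome (o_1, o_2, o_3), o_i True/False,
    under the assumption that the three events are independent."""
    prob = 1.0
    for p_i, o_i in zip(p, outcome):
        prob *= p_i if o_i else 1 - p_i
    return prob

# Check P(intersection over any subset) == product of the marginals.
for k in range(1, len(p) + 1):
    for subset in combinations(range(len(p)), k):
        lhs = sum(joint(o) for o in product([True, False], repeat=len(p))
                  if all(o[i] for i in subset))
        rhs = 1.0
        for i in subset:
            rhs *= p[i]
        assert abs(lhs - rhs) < 1e-12

print("product condition holds for every subset")
```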

Review of key concepts

    • The most important concepts covered here are:
      • how to join events \( A \) and \( B \) with our two operations,
        1. \( A \) “or” \( B \) / \( A\cup B \) – the case that \( A \), \( B \) or both \( A \) and \( B \) occur; and
        2. \( A \) “and” \( B \) / \( A \cap B \) – the case that both \( A \) and \( B \) occur;
      • how to take complements of events, and how the probabilities are related, e.g., \[ P(A) + P\left(A'\right) = 1; \]
      • how to use the probability rules,
        1. Addition rule – for the event \( A \) or \( B \), \[ P(A\cup B)= P(A) + P(B) - P(A \cap B); \]
        2. Conditional probability – for the event \( A \) given \( B \), \[ P(A\vert B) = \frac{P(A\cap B)}{P(B)}; \]
        3. Multiplication rule – for the event \( A \) and \( B \), \[ P(A\cap B) = P(B\vert A) \times P(A); \]
      • the notion of independence between events \( A \) and \( B \), \[ \begin{align} P(A\vert B) = P(A) & & P(B\vert A) = P(B). \end{align} \]
      • and the product rule for mutually independent events:
        • let \( A_1 \), \( A_2 \), \( A_3, \) \( \cdots \) \( A_n \) be any arbitrary list of mutually independent events, then \[ P(A_1 \text{ and } \cdots \text{ and } A_n) = P(A_1) \times \cdots \times P(A_n). \]

Bayes’ Theorem

    • Let us suppose that \( A \) and \( B \) are events for which \( P(A)\neq 0 \) and \( P(B)\neq 0 \).
    • Consider the statement of the multiplication rule, \[ P(A \cap B) = P(A\vert B) P(B); \]
    • yet it is also true that, \[ P(B \cap A) = P(B \vert A) P(A); \]
    • and \( P( A \cap B) = P(B \cap A) \) by definition.
    • Putting these statements together, we obtain, \[ \begin{align} &P(A\vert B) P(B) = P(B \vert A ) P(A)\\ \Leftrightarrow & P(A \vert B) = \frac{P(B\vert A) P(A)}{ P(B)} \end{align} \]
    • The statement that \[ P(A \vert B) = \frac{P(B\vert A) P(A)}{ P(B)} \] is known as Bayes' theorem for \( P(B)>0 \).
    • This is nothing more than re-writing the multiplication rule as discussed above, but the result is extremely powerful.
    • Bayes' theorem wasn’t widely used in statistics for hundreds of years, until advances in digital computing.
    • Once digital computers became widely available, many statistical tools were developed with Bayes' theorem as their basis.
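
To make the formula concrete, here is a minimal Python sketch applying Bayes' theorem to the chip contamination example from earlier: given that a chip fails, what is the probability that it was exposed to high contamination? (Variable names chosen for illustration.)

```python
# Bayes' theorem: P(H|F) = P(F|H) P(H) / P(F),
# reusing the chip contamination numbers from the earlier example.
p_H = 0.20
p_F_given_H = 0.10
p_F_given_Hc = 0.005

p_F = p_F_given_H * p_H + p_F_given_Hc * (1 - p_H)  # total probability rule
p_H_given_F = p_F_given_H * p_H / p_F
print(p_H_given_F)  # approximately 0.833
```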