Conditional probability and probability rules

02/03/2021

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:

    • Axioms of probability
    • Addition rule
    • Conditional probability
    • Multiplication rule

Axioms of Probability

  • Now that we are more familiar with the notions of probability, we can collect the assumptions into a set of axioms of probability that must be satisfied in any random experiment.
  • Probability is a number that is assigned to each member of a collection of events from a random experiment that satisfies the following properties:
    1. \( P(S) = 1 \) where \( S \) is the sample space
    2. \( 0 \leq P(E) \leq 1 \) for any event \( E \)
    3. For two events \( E_1 \) and \( E_2 \) with \( E_1 \cap E_2=\emptyset \) \[ P(E_1\cup E_2)=P(E_1)+P(E_2) \]
  • These axioms do not determine probabilities; the probabilities are assigned based on our knowledge of the system under study.
  • These axioms imply some important results
    • Probability of the empty set is zero \[ P(\emptyset)=0 \]
      • We can recognize this because \( S \cup \emptyset = S \), so that \[ P(S \cup \emptyset ) = P(S), \] while \( S \) and \( \emptyset \) are disjoint, so the third axiom gives \[ P(S \cup \emptyset ) = P(S) + P(\emptyset) \]
      • Putting the above together, we have \[ P(S) + P(\emptyset) = P(S) \Leftrightarrow P(\emptyset) = 0. \]

Axioms of Probability – continued

    • Probability that event \( E \) does not occur \[ P(E')=1-P(E) \]
      • Notice that \[ P(E' \cup E) = P(S) = 1 \] and \[ P(E' \cup E) = P(E') + P(E) \]
      • Therefore, we have \[ P(E') + P(E) = 1 \Leftrightarrow P(E') = 1 - P(E). \]
    • If the event \( E_1 \) is contained in the event \( E_2 \), \[ P(E_1)\leq P(E_2) \]
      • Notice that \( E_1 \cup E_2 = E_2 \) and \( E_1 \cap E_2 = E_1 \) because of the set containment.
      • Therefore, \[ P(E_2) = P(E_1 \cup E_2 ) = P\left((E_1\cap E_2) \cup (E_1'\cap E_2)\right) = P(E_1)+ P(E_1'\cap E_2) \]
      • Considering the above, we have \[ P(E_1) = P(E_2)- P(E_1'\cap E_2) \leq P(E_2), \] since \( P(E_1'\cap E_2) \geq 0 \).
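These consequences are easy to verify numerically for a finite sample space with equally likely outcomes. The following minimal Python sketch, using a fair six-sided die as an illustrative example, checks the complement and containment results above:

```python
from fractions import Fraction

# Sample space of a fair six-sided die: all outcomes equally likely,
# so P(E) is just the fraction of outcomes that lie in E.
S = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(S))

E1 = {2, 4}        # E1 is contained in E2
E2 = {2, 4, 6}     # the even outcomes

assert P(S) == 1                           # axiom 1: P(S) = 1
assert all(0 <= P({s}) <= 1 for s in S)    # axiom 2: probabilities lie in [0, 1]
assert P(set()) == 0                       # P(empty set) = 0
assert P(S - E1) == 1 - P(E1)              # complement rule: P(E') = 1 - P(E)
assert P(E1) <= P(E2)                      # containment implies P(E1) <= P(E2)
print("all consequences verified")
```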

Unions of Events and Addition Rules

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • More generally, suppose we want to compute the probability of the event that \( A \) or \( B \) occurs, where \( A \) and \( B \) need not be disjoint.
  • We read the statement \[ P(A \text{ or } B) \] as the probability of:
    • event \( A \) occurring,
    • event \( B \) occurring, or
    • both \( A \) and \( B \) occurring.
  • Intuitively, we can express the probability in terms of all the ways \( A \) can occur and all the ways \( B \) can occur, if we don’t double count.
  • Consider: if there is an overlap where both \( A \) and \( B \) occur simultaneously, \[ A \cap B \neq \emptyset, \] then summing the total of all ways \( A \) occurs and the total of all ways \( B \) occurs double counts the cases where both \( A \) and \( B \) occur.
  • Therefore, the addition rule for compound events is given as
  • Probability of a union \[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

Probability of a union – example

  • EXAMPLE: the table below lists the history of 940 wafers in a semiconductor manufacturing process.
Wafers in Semiconductor Manufacturing Classified by Contamination and Location

Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

  • Question: what is the probability that a wafer is from the center of the sputtering tool or contains high levels of contamination?
  • Suppose that 1 wafer is selected at random.
  • Total number of outcomes is \( 626+314=940 \).
  • Let \( H \) denote the event that the wafer contains high levels of contamination. Then, \[ P(H) = \frac{358}{940} \]
  • Let \( C \) denote the event that the wafer is in the center of a sputtering tool. Then \[ P(C) = \frac{626}{940} \]
  • The event \( H \cap C \) is the event that the wafer is from the center of the sputtering tool and contains high levels of contamination. Then \[ P(H \cap C)=\frac{112}{940} \]

  • We can use the addition rule to obtain \[ \begin{align} P(H \cup C) & = P(H)+P(C)-P(H \cap C)\\ & = \frac{358}{940}+\frac{626}{940}-\frac{112}{940}= \frac{872}{940} \end{align} \]
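As a quick sanity check, the wafer arithmetic above can be reproduced in a few lines of Python, using only the counts quoted on this slide:

```python
from fractions import Fraction

total = 940
n_H, n_C, n_HC = 358, 626, 112   # high contamination, center, and both (from the table)

P_H, P_C, P_HC = (Fraction(n, total) for n in (n_H, n_C, n_HC))
P_union = P_H + P_C - P_HC       # addition rule: P(H) + P(C) - P(H and C)
print(P_union)                   # 218/235, i.e. 872/940 in lowest terms
```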
  • Two or more events

    • More complicated probabilities, such as \( P(A \cup B \cup C) \), can be determined by repeated use of the addition rule: \[ P(A \cup B \cup C) = P[(A \cup B) \cup C] = P(A \cup B) + P(C) - P[(A \cup B) \cap C]. \]
    • We can apply the addition rule again on \( P(A \cup B)= P(A)+P(B)-P(A \cap B) \)
    • and we can use the distributive law for set operations on \[ P[(A \cup B) \cap C]=P[(A\cap C)\cup (B\cap C)] \]
    • We apply the addition rule on the right-hand side of the expression above \[ P[(A\cap C)\cup (B\cap C)] = P(A\cap C)+P(B\cap C )-P(A\cap B \cap C) \]
    • Finally we put everything together \[ \begin{align} P(A \cup B \cup C) & = P(A \cup B) + P(C) - P[(A \cup B) \cap C]\\ & = P(A)+P(B)-P(A \cap B) + P(C) - P(A\cap C) -P(B\cap C )+P(A\cap B \cap C) \end{align} \] (a numerical check of this identity appears after this list).
    • If the events are mutually exclusive, the results simplify considerably…
    Venn diagram of four mutually exclusive events

    Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

    • A collection of events, \( E_1 , E_2 , \ldots , E_k \), is said to be mutually exclusive if for all pairs, \[ E_i \cap E_j = \emptyset. \] For a collection of mutually exclusive events, \[ P(E_1 \cup E_2 \cup \cdots \cup E_k ) = P(E_1 ) + P(E_2 ) + \cdots + P(E_k ). \]
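Python's built-in set operations make it easy to check the three-event identity above on a concrete sample space; the sketch below uses the rolls of two fair dice as an illustrative example. When the events are mutually exclusive, every intersection term vanishes and only the sum of the individual probabilities remains.

```python
from fractions import Fraction
from itertools import product

# Equally likely outcomes: all 36 ordered rolls of two fair dice.
S = set(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(S))

A = {(i, j) for i, j in S if i == 1}       # first die shows 1
B = {(i, j) for i, j in S if j == 1}       # second die shows 1
C = {(i, j) for i, j in S if i + j == 7}   # the dice sum to 7

# Three-event addition rule (inclusion-exclusion):
lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
assert lhs == rhs
print(lhs)   # 15/36, printed in lowest terms as 5/12
```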

Conditional probability

    • There are several ways to consider conditional probability – the “physics” way to consider this, due to Kolmogorov, is as follows:
      • Suppose that there are two related events \( A \) and \( B \) where knowledge of one occurring would change how likely we see the other to occur.
        • For example, we can say \( A= \)"it snows in the Sierra" and \( B= \)"it rains in my garden".
        • The day before, I don’t know if either will occur.
        • However, if I knew that \( A \) occurred, this would change how likely it would seem that \( B \) occurs;
        • \( B \) is not guaranteed when \( A \) occurs, but the probability of \( B \) occurring would be higher in the presence of \( A \).
    Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

    Courtesy of Bin im Garten CC via Wikimedia Commons

    • Suppose, hypothetically, that \( A \) occurs; then our sample space now only includes outcomes in which \( A \) also occurs.
    • I.e., we would need to restrict our consideration of \( B \) relative to the case that \( A \) occurs.
    • We define the probability of \( B \) conditional on \( A \),

      \[ P(B\vert A), \]
      as the probability of \( B \) given \( A \).

Conditional probability – continued

    Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

    Courtesy of Bin im Garten CC via Wikimedia Commons

    • Assuming \( A \) occurs, we will consider all ways for both \( A \) and \( B \) to occur.
      • The sample space for \( B \vert A \) has been restricted to the cases where \( A \) occurs, so we compute the probability relative to all the ways \( A \) occurs.
      • Therefore the probability \( P(B\vert A) \) can be read as \[ \frac{\text{number of outcomes in }A\text{ and }B}{\text{number of outcomes in }A} \]
      • Mathematically we write this as
        The conditional probability of an event \( B \) given an event \( A \) is \[ P(B\vert A) = \frac{P(A\cap B)}{P(A)} \] for \( P(A)>0 \).
      • The above statement only makes sense when \( P(A)\neq 0 \), because we can never divide by zero.
        • “Physically” we can interpret the meaning with \( P(B\vert A) \) read as
          The probability that \( B \) occurs given that \( A \) occurs.
        • The above should not be defined when \( A \) is impossible – the phrase “given that \( A \) occurs” makes no sense.
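For a finite sample space with equally likely outcomes, this definition reduces to exactly the counting interpretation given above. A minimal Python sketch, with two fair dice as an illustrative example:

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))    # two fair dice, all rolls equally likely
P = lambda E: Fraction(len(E), len(S))

A = {(i, j) for i, j in S if i + j >= 10}  # given: the sum is at least 10
B = {(i, j) for i, j in S if i == 6}       # the first die shows 6

def cond(B, A):
    """P(B | A) = P(A and B) / P(A), defined only when P(A) > 0."""
    assert P(A) > 0, "cannot condition on an impossible event"
    return P(A & B) / P(A)

# Of the six outcomes with a sum of at least 10, three have a 6 on the first die.
print(cond(B, A))   # 1/2
```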

Conditional probability – example 1

    Pre-employment drug screening.

    Courtesy of Mario Triola, Essentials of Statistics, 5th edition

    • EXAMPLE: if \( 1 \) of the \( 1000 \) test subjects is randomly selected, let the events be \( A= \)"the participant uses drugs" and \( B= \)"the participant has a positive test result".
    • Question: How can we use the above probability rules to find the probability that a random subject had a positive test result, given that the subject actually uses drugs?
    • I.e. what is the value of \( P(B \vert A) \)?
    • Notice that \[ \begin{align} P(A) &= \frac{44\text{ true positives} + 6 \text{ false negatives}}{1000} = \frac{50}{1000},\\ P(A \cap B) & = \frac{44 \text{ true positives} }{1000} = \frac{44}{1000} \end{align} \]
    • and using the definition of conditional probability \[ P(\text{subject has a positive test result} \mid \text{subject uses drugs}) = P(B\vert A ) =\frac{ P(A \cap B) }{P(A)}, \]
    • we can then show that \[ P(B\vert A ) = \frac{\frac{44}{1000}}{\frac{50}{1000}} = \frac{44}{50} = 0.88 \]
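The same ratio can be verified directly from the counts in the screening table:

```python
from fractions import Fraction

total = 1000
n_A  = 44 + 6   # uses drugs: 44 true positives + 6 false negatives
n_AB = 44       # uses drugs AND tests positive (the true positives)

P_B_given_A = Fraction(n_AB, total) / Fraction(n_A, total)
print(P_B_given_A)   # 22/25, i.e. 0.88
```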

Conditional probability – example 1 continued

    Pre-employment drug screening.

    Courtesy of Mario Triola, Essentials of Statistics, 5th edition

    • Keep \( A= \)"the participant uses drugs" and \( B= \)"the participant has a positive test result"
    • Question: Is \( P(B\vert A ) = P(A\vert B)? \)
      • Using the definition \[ P(A \vert B) = \frac{P(A \cap B)}{P(B)}, \] we can see that the probability that a participant uses drugs, given that the participant has a positive test result, is \[ \begin{align} P(A \vert B) =&\frac{\frac{44}{1000}}{\frac{134}{1000}} = \frac{44}{134} \approx 0.32 \end{align} \]
      • On the other hand, \( P(B\vert A) = 0.88 \).
      • In fact, \( P(A\vert B) \neq P(B\vert A) \) and they are quite different.
      • This says that
        • the probability that the participant uses drugs, given that their test result is positive, is lower than
        • the probability that the test result is positive, given that the participant uses drugs.
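Computationally, the asymmetry is just a change of denominator: the same 44 true positives are divided by the 50 drug users in one case and by the 134 positive results in the other. A short sketch using the counts from the table:

```python
from fractions import Fraction

n_AB = 44    # uses drugs AND tests positive
n_A  = 50    # uses drugs (44 true positives + 6 false negatives)
n_B  = 134   # tests positive (44 true positives + 90 false positives)

# The common factor of 1/1000 cancels, so only the counts matter.
P_B_given_A = Fraction(n_AB, n_A)   # 22/25  = 0.88
P_A_given_B = Fraction(n_AB, n_B)   # 22/67 ~ 0.33
print(float(P_B_given_A), float(P_A_given_B))
```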

Conditional probability – example 2

    400 parts classified by surface flaws and as (functionally) defective

    Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

    • EXAMPLE: a thin film manufacturing process is usually sensitive to contamination problems, which can increase the rate of parts that are not acceptable. The table on the left provides an example of 400 parts classified by surface flaws and as (functionally) defective.
    • Question: what is the probability that a part is defective, given that the part has a surface flaw?
      • Let \( D \) denote the event that a part is defective, and let \( F \) denote the event that a part has a surface flaw
      • Notice that \[ \begin{align} P(F) &= \frac{10\text{ flawed and defective} + 30\text{ flawed but not defective}}{400} \\ &= \frac{40}{400}\end{align} \]
      • and \[ P(D \cap F) = \frac{10 \text{ parts that are both defective and flawed}}{400} = \frac{10}{400} \]
      • Then\[ P(D|F)=\frac{P(D\cap F)}{P(F)}=\frac{\frac{10}{400}}{\frac{40}{400}}=\frac{10}{40}=0.25 \]

Conditional probability – example 2 continued

    • Note that in this example all four of the following probabilities are different: \[ \begin{align} P(D) &=28/400 &\;&P(D|F)=10/40 \\ P(F) &=40/400 &\;&P(F|D)=10/28 \end{align} \]
    Tree diagram for parts classified.

    Courtesy of Montgomery & Runger, Applied Statistics and Probability for Engineers, 7th edition

    • A tree diagram can also be used to display conditional probabilities
      • The first level of the tree branches on surface flaw.
        • Of the 40 parts with surface flaws:
          • Second level is on defective parts: 10 are functionally defective and 30 are not
          • \[ P(D|F)=10/40 \text{ and } P(D'|F)=30/40 \]
        • Of the 360 parts without surface flaws:
          • Second level: 18 are functionally defective and 342 are not.
          • \[ P(D|F')=18/360\text{ and } P(D'|F')=342/360 \]
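All four probabilities, and both levels of the tree, follow from the four cell counts of the table, as the following sketch illustrates:

```python
from fractions import Fraction

# Cell counts from the 400-part table: (surface flaw?, defective?) -> count
counts = {("F", "D"): 10, ("F", "D'"): 30, ("F'", "D"): 18, ("F'", "D'"): 342}
total = sum(counts.values())                    # 400

n_D = counts[("F", "D")] + counts[("F'", "D")]  # 28 defective parts
n_F = counts[("F", "D")] + counts[("F", "D'")]  # 40 parts with surface flaws

# Note: Fraction prints in lowest terms, e.g. 28/400 appears as 7/100.
print(Fraction(n_D, total))                     # P(D)    = 28/400
print(Fraction(n_F, total))                     # P(F)    = 40/400
print(Fraction(counts[("F", "D")], n_F))        # P(D|F)  = 10/40
print(Fraction(counts[("F", "D")], n_D))        # P(F|D)  = 10/28
print(Fraction(counts[("F'", "D")],
               total - n_F))                    # P(D|F') = 18/360
```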

Intersections of Events and Multiplication Rule

    • The probability of the intersection of two events is often needed
    • Let us suppose that \( A \) and \( B \) are events for which \( P(A)\neq 0 \) and \( P(B)\neq 0 \).
    • Using the definition of conditional probability \[ P(B|A)=\frac{P(A\cap B)}{P(A)} \]
    • We can solve for the intersection of events \[ P(A \cap B) = P(B\vert A) P(A) \]
    • Similarly, from the same definition of conditional probability we have \[ P(A|B)=\frac{P(A\cap B)}{P(B)} \]
    • which implies that \[ P(A \cap B) = P(A\vert B) P(B) \]
    • Combining these, we obtain a formula known as the multiplication rule:
      Probability of an Intersection: \[ P(A \cap B) = P(B\vert A) P(A) = P(A\vert B) P(B) \]
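The two factorizations agree, as a final sketch (reusing the surface-flaw/defective counts from the previous example) confirms:

```python
from fractions import Fraction

total, n_DF, n_D, n_F = 400, 10, 28, 40   # counts from the 400-part table

P_DF = Fraction(n_DF, total)              # P(D and F)
P_D, P_F = Fraction(n_D, total), Fraction(n_F, total)
P_D_given_F = P_DF / P_F                  # P(D|F) = 10/40
P_F_given_D = P_DF / P_D                  # P(F|D) = 10/28

# Both orderings of the multiplication rule recover the same intersection.
assert P_D_given_F * P_F == P_DF == P_F_given_D * P_D
print(P_DF)                               # 1/40
```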