Fundamentals of probability part III

03/03/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Review of addition rule
    • Review of conditional probability
    • Review of the multiplication rule
    • Review of independence
    • Further notes on independence
    • Further uses of conditional probability
    • Bayes' Law

Compound events

  • We will often be concerned not with a single event \( A \), but with some combination of an event \( A \) and an event \( B \).
  • Compound event – formally we define a compound event as any event combining two or more simple events.
  • There are two key operations joining events
    1. “OR” – in mathematics we refer to “or” as a non-exclusive “or”.
      • The meaning of this for “\( A \) or \( B \)” is – event \( A \) occurs, event \( B \) occurs, or both events \( A \) and \( B \) occur.
      • We will not consider the exclusive “or”, i.e. either event \( A \) occurs, or event \( B \) occurs, but not both.
    2. “AND” – in mathematics, “and” requires both events to hold.
      • The meaning of this for “\( A \) and \( B \)” is – both event \( A \) and event \( B \) occur.
  • The operations “and” and “or” join events together in a way that lets us compute the probability of the compound events.

Addition rule

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • Suppose we want to compute the probability of two events \( A \) and \( B \) joined by the compound operation “or”.
  • We read the statement, \[ P(A \text{ or } B) \] as the probability of:
    • event \( A \) occurring,
    • event \( B \) occurring, or
    • both \( A \) and \( B \) occurring.
  • Intuitively, we can express the probability in terms of all the ways \( A \) can occur and all the ways \( B \) can occur, if we don’t double count.
  • Let all the ways that \( A \) can occur be represented by the red circle to the left.
  • Let all the ways that \( B \) can occur be represented by the dashed circle to the left.
  • If there is any overlap between events \( A \) and \( B \) so that they can occur simultaneously, \( P(A) +P(B) \) counts the cases where \( A \) and \( B \) both occur twice.
  • Therefore, the addition rule for compound events is given as, \[ P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and }B) \]
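  • As a quick sanity check of the addition rule, here is a minimal sketch in Python; the deck-of-cards setting is our own illustration, not from the slides.

```python
from fractions import Fraction

# Hypothetical illustration: draw one card from a standard 52-card deck.
# A = "card is a heart" (13 cards), B = "card is a face card" (12 cards);
# the overlap "A and B" contains the 3 face cards of hearts (J, Q, K).
p_A = Fraction(13, 52)
p_B = Fraction(12, 52)
p_A_and_B = Fraction(3, 52)

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_A_or_B = p_A + p_B - p_A_and_B
print(p_A_or_B)  # 11/26, i.e. 22/52 -- the 22 distinct qualifying cards
```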

Complementary events

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • A special case of the addition rule comes up when the events \( A \) and \( B \) are disjoint or mutually exclusive.
    • When \( A \) and \( B \) are disjoint, this means that there is no overlap between these events and they will never occur simultaneously.
    • In this case \( P(A \text{ and } B) = 0 \), so the addition rule becomes, \[ \begin{align} P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) &= P(A) + P(B) \end{align} \]
  • Recall the complement of \( A \), denoted \( \overline{A} \), is the event where \( A \) does not occur.
    • By definition, \( A \) and \( \overline{A} \) are disjoint because \( A \) will not both occur and not occur simultaneously.
  • However, complementary events make up all possible outcomes – \( A \) will either occur or not occur, so the event \( A \text{ or } \overline{A} \) is certain.
    • That is, we know by definition that \[ P(A \text{ or } \overline{A}) = 1 \]
  • Using the above fact and the disjointness of \( A \) and \( \overline{A} \) together with the addition rule, we obtain \[ 1= P\left(A\text{ or }\overline{A}\right) = P(A) + P\left(\overline{A}\right) \] for any event \( A \) and its complement \( \overline{A} \).
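  • As a small numeric check (our own example, not from the slides), take \( A= \)"a fair die shows a six":

```python
from fractions import Fraction

# Our own example: A = "a fair die shows a six".
p_A = Fraction(1, 6)
p_not_A = 1 - p_A          # rule of complements: P(A-bar) = 1 - P(A)
assert p_A + p_not_A == 1  # P(A) + P(A-bar) = 1 always holds
print(p_not_A)             # 5/6
```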

Multiplication rule

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • There are several ways to consider the multiplication rule for probability – the “physics” way to consider this, due to Kolmogorov, is as follows:
    • Suppose that there are two related events \( A \) and \( B \) where knowledge of one occurring would change how likely we see the other to occur.
      • For example, we can say \( A= \)"it snows in the Sierra" and \( B= \)"it rains in my garden".
      • The day before, I don’t know if either will occur.
      • However, if I knew that \( A \) occurred, this would change how likely it would seem that \( B \) occurs;
      • \( B \) is not guaranteed when \( A \) occurs, but the probability of \( B \) occurring would be higher in the presence of \( A \).
    • Suppose that \( A \) occurs hypothetically; then our sample space of possible events now only includes events where \( A \) also occurs.
    • I.e., we would need to restrict our consideration of \( B \) relative to the case that \( A \) occurs.
  • We define the probability of \( B \) conditional on \( A \),

    \[ P(B\vert A), \]
    as the probability of \( B \) in the case that \( A \) occurs.

Multiplication rule continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • We now consider the probability of \( B \) conditional on \( A \),

    \[ P(B\vert A), \]
    as the probability of \( B \) in the case that \( A \) occurs.
    • For example, we can say \( A= \)"it snows in the Sierra" and \( B= \)"it rains in my garden".
  • Assuming \( A \) occurs, we will consider all ways for both \( A \) and \( B \) to occur.
    • The sample space for \( B \vert A \) has been restricted to the cases where \( A \) occurs, so we compute the probability relative to all the ways \( A \) occurs.
  • Therefore the probability of \( P(B\vert A) \) can be read, \[ \frac{\text{All the ways }A\text{ and }B\text{ can occur}}{\text{All the ways }A\text{ can occur}} \]
  • Mathematically we write this as, \[ P(B\vert A) = \frac{P(A\text{ and } B)}{P(A)}. \]
  • For example, in plain English we can say
    The probability that it rains in my garden, given that it snows in the Sierra, is equal to the probability of both occurring relative to the probability of snow in the Sierra.
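  • A short simulation can make this restriction of the sample space concrete; this is a sketch only, and the joint probabilities below are invented for illustration since the slides attach no numbers to the Sierra example.

```python
import random

# A sketch of the Sierra example; the joint probabilities are invented
# for illustration (hypothetical, not from the lecture):
# P(snow and rain) = 0.15, P(snow only) = 0.10,
# P(rain only) = 0.05, P(neither) = 0.70.
random.seed(0)

def sample_day():
    """Sample one day's (snow, rain) outcome from the hypothetical distribution."""
    u = random.random()
    if u < 0.15:
        return True, True    # snow in the Sierra and rain in my garden
    elif u < 0.25:
        return True, False   # snow only
    elif u < 0.30:
        return False, True   # rain only
    return False, False      # neither

days = [sample_day() for _ in range(100_000)]
ways_A = sum(1 for snow, _ in days if snow)                    # all the ways A occurs
ways_A_and_B = sum(1 for snow, rain in days if snow and rain)  # ways A and B occur

# P(B | A) = (all the ways A and B occur) / (all the ways A occurs)
print(ways_A_and_B / ways_A)  # close to 0.15 / 0.25 = 0.6
```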

Multiplication rule continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • We have now defined the conditional probability of \( B \) given \( A \) as, \[ P(B\vert A)=\frac{P(A\text{ and }B)}{P(A)} \] in the Kolmogorov way.
  • We should make some notes about this:
    • The above statement only makes sense when \( P(A)\neq 0 \), because we can never divide by zero.
      • “Physically” we can interpret the meaning with \( P(B\vert A) \) read as
        The probability that \( B \) occurs given that \( A \) occurs.
      • The above should not be defined when \( A \) is impossible – the phrase “given that \( A \) occurs” makes no sense.
    • Using the definition of conditional probability, we get the multiplication rule.
    • The multiplication rule for probability tells us that, \[ P(A \text{ and } B) = P(B\vert A) \times P(A) \]
    • We will use the above formula quite generally, but we note:
      • \( P(B\vert A) \) is not defined when \( P(A)=0 \)
      • However, \( P(A\text{ and }B) = 0 \) when \( P(A)=0 \) because this is the probability that \( B \) and the impossible event \( A \) both occur.
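  • As a small sketch of our own (not code from the lecture), the helpers below make these caveats explicit: the conditional is undefined at \( P(A)=0 \), while the product \( P(B\vert A)\times P(A) \) vanishes whenever \( P(A)=0 \).

```python
def conditional(p_a_and_b: float, p_a: float) -> float:
    """P(B | A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_a == 0:
        raise ValueError("P(B | A) is undefined when P(A) = 0")
    return p_a_and_b / p_a

def multiplication_rule(p_b_given_a: float, p_a: float) -> float:
    """P(A and B) = P(B | A) * P(A); this is 0 whenever P(A) = 0."""
    return p_b_given_a * p_a

# Hypothetical values from the simulation sketch above:
print(conditional(0.15, 0.25))         # 0.6
print(multiplication_rule(0.6, 0.25))  # 0.15, recovering P(A and B)
```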

Independence

  • Closely related to conditional probability are the notions of independence and dependence of events.
    • Dependence – two events are said to be dependent if the outcome of one event directly affects the probability of the other.
      • In the earlier example, \( A= \)"snow in the Sierra" and \( B= \)"rain in my garden" are dependent events, because one occurring would change how likely the other is to occur.
      • However, dependence between events \( A \) and \( B \) does not mean that \( A \) causes \( B \) or vice versa.
      • Rain in my garden does not cause snow in the Sierra, but the probability of snow in the Sierra is larger if there is rain in my garden.
    • Independence – two events are said to be independent if the outcome of either event has no impact on the probability of the other.
      • When we think of events being independent, we should think of events that are not related to each other.
      • For example, if our process is “what happens today”, \( A= \)"snow in the Sierra" and \( B= \)"coin flip heads" are independent, because neither outcome affects the other.
  • Mathematically, we can see the meaning of independence clearly by stating, \( A \) and \( B \) are independent by definition if and only if both of the following hold, \[ \begin{matrix} P(A\vert B) = P(A) & \text{and} & P(B\vert A) = P(B). \end{matrix} \]
  • In plain English, the above says
    The probability of event \( A \) does not change in the presence of \( B \) and vice versa.
  • Particularly, the outcome of \( A \) or \( B \) does not affect the other.
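  • As a concrete check (our own example, not from the slides), we can verify independence by exhaustively enumerating two dice rolls, taking \( A= \)"the first die is even" and \( B= \)"the dice sum to \( 7 \)":

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two fair dice
# (our own example, not from the slides).
outcomes = list(product(range(1, 7), repeat=2))

A = {(d1, d2) for d1, d2 in outcomes if d1 % 2 == 0}   # first die is even
B = {(d1, d2) for d1, d2 in outcomes if d1 + d2 == 7}  # dice sum to 7

def prob(event):
    return Fraction(len(event), len(outcomes))

# Independence in product form: P(A and B) = P(A) * P(B).
print(prob(A & B) == prob(A) * prob(B))  # True: 3/36 == (18/36) * (6/36)
```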

Redundancy and the multiplication rule

  • Machine systems in engineering are often designed with multiple, redundant safety features.
  • Particularly, if there are multiple, independent safety checks, we can reduce the probability of a catastrophic failure substantially.
  • For example, the Airbus A310 twin-engine airliner has three independent hydraulic systems so that if one fails, another system can step in and maintain flight control.
  • For sake of example, we will assume that the probability of a randomly selected hydraulic system failing is \( 0.002 \).
  • Discuss with a neighbor: if the airplane had only one hydraulic system, what would be the probability that the airplane would be able to maintain control for the flight?
    • Let event \( A= \)"hydraulic system fails" so that \( \overline{A}= \)"airplane maintains control".
    • We can then state, \[ P(\overline{A}) = 1 - P(A) = 1 - 0.002 = 0.998. \]
  • Discuss with a neighbor: what is the probability that an airplane would be able to maintain control with the three independent hydraulic systems?
    • Let us denote \( A_1= \)"hydraulic system \( 1 \) fails", \( A_2= \)"hydraulic system \( 2 \) fails" and \( A_3= \)"hydraulic system \( 3 \) fails".
    • The event where all hydraulic systems fail is given by, \[ \left(A_1\text{ and } A_2\text{ and } A_3\right) \] so that the airplane is able to maintain control in the complement of the above event: \[ \overline{\left(A_1\text{ and } A_2\text{ and } A_3\right)}. \]

Redundancy and the multiplication rule continued

  • We recall, the probability that a randomly selected hydraulic system fails is \( 0.002 \) and the three systems are independent.
  • Therefore, we can use the multiplication rule as, \[ \begin{align} P\left(A_1\text{ and } A_2\text{ and } A_3\right) &= P(A_1\text{ and } A_2\vert A_3) \times P(A_3)\\ &=P(A_1\text{ and } A_2) \times P(A_3) \\ &= P(A_1 \vert A_2) \times P(A_2) \times P(A_3) \\ & = P(A_1)\times P(A_2) \times P(A_3) \end{align} \] because the events are independent.
  • Finally, we can write, \[ P\left(\overline{\left(A_1\text{ and } A_2\text{ and } A_3\right)}\right) = 1 - P(A_1)\times P(A_2) \times P(A_3)= 1-0.002^3=0.999999992 \]
  • This shows how including multiple independent systems greatly improves the probability of success.
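  • In code, the computation from this slide is a short sanity check, using the failure probability \( 0.002 \) given above:

```python
p_fail = 0.002  # probability a single hydraulic system fails (from the slide)

# One system: control is maintained with probability 1 - P(A).
print(1 - p_fail)       # 0.998

# Three independent systems: all must fail for loss of control,
# so P(A1 and A2 and A3) = P(A1) * P(A2) * P(A3) = 0.002**3.
print(1 - p_fail ** 3)  # 0.999999992
```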

Another way of writing independence

  • What we saw in the last slide, \[ P(A_1 \text{ and } A_2 \text{ and } A_3) = P(A_1) \times P(A_2) \times P(A_3) \] actually holds generally for independent events.
  • Let’s suppose that \( A \) and \( B \) are independent events such that \[ \begin{align} P(A\vert B) = P(A) && P(B\vert A) = P(B). \end{align} \]
  • Consider the multiplication rule for the two independent events \( A \) and \( B \), \[ \begin{align} P( A \text { and } B) &= P(A \vert B) \times P(B) \\ &=P(A) \times P(B), \end{align} \] using the independence assumption.
  • In fact, we can show that this holds for any number of independent events, re-using the argument above.
  • Let \( A_1 \), \( A_2 \), \( A_3, \) \( \cdots \) \( A_n \) be any arbitrary list of mutually independent events.
  • Then using the argument above \( n \) times, we can show that, \[ P(A_1 \text{ and } \cdots \text{ and } A_n) = P(A_1) \times \cdots \times P(A_n). \]
  • Although the intuition of the statement for independence \[ \begin{align} P(A\vert B) = P(A) && P(B\vert A) = P(B) \end{align} \] is easier to interpret,
  • in practice, we will usually describe independence as \[ P(A_1 \text{ and } \cdots \text{ and } A_n) = P(A_1) \times \cdots \times P(A_n). \]
  • These two notions are in fact equivalent by the argument above.

The probability of “at least one”

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • It is a common problem that we want to find the probability of at least one event occurring.
  • For example, let’s suppose that a car engine has a known defect rate in the production process of \( \frac{5}{1000} \).
  • Defects in any two randomly selected engines will be considered independent;
  • suppose we want to know the probability that a shipment of \( 50 \) cars has at least one defective unit that will need a return shipment.
  • Analyzing the problem directly can be difficult unless we consider complementary events.
  • Let’s suppose that \( A= \)"at least one engine is defective".
  • Notice that using complements \[ \overline{``\text{at least one engine is defective"}} = \text{all engines work} \]
  • The probability that all engines work can instead be computed directly.
  • Discuss with a neighbor: let \( A_i= \)"car \( i \) has a working engine". Using conjunctions what is the event where all engines work? If we want to find the probability that at least one engine fails, how can we use complements to find this in terms of the \( A_i \)?
    • Notice, \[ (A_1 \text{ and } \cdots \text{ and } A_{50}) = \text{ all cars have working engines} \]
  • Therefore we want to find the probability of \( \overline{(A_1 \text{ and } \cdots \text{ and } A_{50})} \).

The probability of “at least one” continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • We know that the probability of engine \( i \) working is \[ P(A_i) = \frac{995}{1000}, \] because it is the complement of engine \( i \) failing (with probability \( \frac{5}{1000} \)).
  • We also know that all \( A_i \) are independent from each other.
    • Therefore, \[ \begin{align} P(A_1 \text{ and } \cdots \text{ and } A_{50}) &= P(A_1)\times \cdots \times P(A_{50})\\ &= \left(\frac{995}{1000}\right)^{50} \end{align} \]
  • Discuss with a neighbor: if we want to find \[ P\left(\overline{(A_1 \text{ and } \cdots \text{ and } A_{50})}\right), \]
  • how can we use the rule of complements, \[ P(B) + P(\overline{B}) = 1 \] to find the above probability?
  • Let \( B=(A_1 \text{ and } \cdots \text{ and } A_{50}) \), then we have, \[ P(\text{at least one engine fails})= 1 - \left(\frac{995}{1000}\right)^{50} \approx 0.222. \]
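  • The same complement trick can be checked in a few lines of Python, using the numbers from the slide:

```python
p_work = 995 / 1000  # probability a single engine works (from the slide)

# By independence, P(all 50 engines work) = (995/1000)**50.
p_all_work = p_work ** 50

# Rule of complements: P(at least one engine fails) = 1 - P(all work).
print(1 - p_all_work)  # approximately 0.222
```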

Review of key concepts

  • The most important concepts covered here are:
    • how to join events \( A \) and \( B \) with our two operations,
      1. \( A \) “or” \( B \) – the case that \( A \), \( B \) or both \( A \) and \( B \) occur; and
      2. \( A \) “and” \( B \) – the case that both \( A \) and \( B \) occur;
    • how to take complements of events, and how the probabilities are related, e.g., \[ P(A) + P\left(\overline{A}\right) = 1; \]
    • how to use the two probability rules,
      1. Addition rule – for the event \( A \) or \( B \), \[ P(A\text{ or }B)= P(A) + P(B) - P(A \text{ and } B); \]
      2. Multiplication rule – for the event \( A \) and \( B \), \[ P(A \text{ and } B) = P(B\vert A) \times P(A); \]
    • the notion of independence between events \( A \) and \( B \), \[ \begin{align} P(A\vert B) = P(A) & & P(B\vert A) = P(B). \end{align} \]
    • and the product rule for mutually independent events:
      • let \( A_1 \), \( A_2 \), \( A_3, \) \( \cdots \) \( A_n \) be any arbitrary list of mutually independent events, then \[ P(A_1 \text{ and } \cdots \text{ and } A_n) = P(A_1) \times \cdots \times P(A_n). \]

Example of conditional probability

Pre-employment drug screening.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • Let’s recall the multiplication rule, \[ P(A \text{ and } B) = P(B \vert A) \times P(A). \]
  • Note that we can always use this formula in an alternative form whenever \( P(A)\neq 0 \), \[ P(B\vert A ) =\frac{ P(A \text{ and } B) }{P(A)}. \]
  • Discuss with a neighbor: if \( 1 \) of the \( 1000 \) test subjects is randomly selected, let the events be \( A= \)"the participant uses drugs" and \( B= \)"the participant has a positive test result".
  • How can we use the above probability rules to find the probability that a random subject had a positive test result, given that the subject actually uses drugs?
    • Notice that the above statement is just \( P(B\vert A) \), so that we can compute this value by \[ P(B\vert A ) =\frac{ P(A \text{ and } B) }{P(A)}. \]
  • Discuss with a neighbor: what is the value of \( P(B \vert A) \)?
    • Notice that \[ \begin{align} P(A) &= \frac{44\text{ true positives} + 6 \text{ false negatives}}{1000} = \frac{50}{1000},\\ P(A \text{ and } B) & = \frac{44 \text{ true positives} }{1000} = \frac{44}{1000} \end{align} \]

Example of conditional probability continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • Using the values of \[ \begin{align} P(A)= \frac{50}{1000} & &P(A \text{ and } B) = \frac{44}{1000}, \end{align} \]
  • and using the definition of conditional probability \[ P(B\vert A ) =\frac{ P(A \text{ and } B) }{P(A)}, \]
  • we can then show that, \[ \begin{align} &P(\text{Subject has positive test result }given\text{ the subject uses drugs}) \\ =& \frac{\frac{44}{1000}}{\frac{50}{1000}} = \frac{44}{50} = .88 \end{align} \]
    • Notice how in the above we cancel the denominators of \( 1000 \); this corresponds “physically” once again to restricting to the sample space where \( A \) occurs,
    • i.e., we restrict to the red circle from the Venn diagram.
  • We then compute the probability of \( A \text{ and }B \) occurring relative to the probability of \( A \),
    • i.e., we compute the probability of the intersection relative to the probability of the red circle.
  • This cancels the denominators corresponding to the full sample space, \( \frac{1}{1000} \), in the conditional probability above.

Example of conditional probability continued

Pre-employment drug screening.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • Discuss with a neighbor: let the events be given again as \( A= \)"the participant uses drugs" and \( B= \)"the participant has a positive test result".
  • If \( 1 \) of the \( 1000 \) test subjects is randomly selected, can we write \[ P(B\vert A ) = P(A\vert B)? \] Why or why not?
    • Using the definition of, \[ P(A \vert B) = \frac{P(A \text{ and } B)}{P(B)} \] we can see that \[ \begin{align} &P(A \vert B) \\ = &P(\text{participant uses drugs } given\text{ participant has a positive test result})\\ =&\frac{\frac{44}{1000}}{\frac{133}{1000}} = \frac{44}{133} \approx .33 \end{align} \]
    • On the other hand, \( P(B\vert A) = .88 \).
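  • Using the counts stated on these slides (\( 44 \) true positives, \( 6 \) false negatives, and \( 133 \) positive results among \( 1000 \) subjects), a short check reproduces both conditional probabilities:

```python
from fractions import Fraction

total = 1000         # test subjects (from the slides)
true_positive = 44   # uses drugs and tests positive
false_negative = 6   # uses drugs but tests negative
positives = 133      # all positive test results (from the slides)

p_A = Fraction(true_positive + false_negative, total)  # P(uses drugs) = 50/1000
p_B = Fraction(positives, total)                       # P(positive test) = 133/1000
p_A_and_B = Fraction(true_positive, total)             # P(uses drugs and positive)

p_B_given_A = p_A_and_B / p_A  # 44/50
p_A_given_B = p_A_and_B / p_B  # 44/133
print(float(p_B_given_A))      # 0.88
print(float(p_A_given_B))      # approximately 0.33
```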

Example of conditional probability continued

Venn diagram of events \( A \) and \( B \) with nontrivial intersection.

Courtesy of Bin im Garten CC via Wikimedia Commons

  • In fact, \( P(A\vert B) \neq P(B\vert A) \) and they are quite different.
  • This actually says that
    • there is lower probability that the participant uses drugs given that their test result is positive, than
    • the probability that the test result is positive, given that the participant uses drugs.
  • This can be understood in terms of the denominators, \[ \begin{align} P(A\vert B) = \frac{P(A \text{ and } B)}{P(B)} & & P(B\vert A) = \frac{P( A \text{ and } B)}{P(A)} \end{align} \]
  • Both of the above statements measure \( P(A \text{ and } B) \), but relative to different spaces.
  • For \( P(A\vert B) \), we restrict to the black circle and measure the intersection.
    • I.e., we measure the intersection relative to the number of positive test results.
  • On the other hand for \( P(B \vert A) \) we restrict to the red circle and measure the intersection.
    • I.e., we measure the intersection relative to the number of participants using drugs.
  • In fact, there are far fewer participants using drugs than the number of positive test results,
    • therefore, the denominator for \( P(A\vert B) \) is larger, and \[ P(A\vert B) < P(B\vert A). \]

Bayes' theorem

  • Let us suppose that \( A \) and \( B \) are events for which \( P(A)\neq 0 \) and \( P(B)\neq 0 \).
  • Consider the statement of the multiplication rule, \[ P(A \text{ and } B) = P(A\vert B) P(B); \]
  • yet it is also true that, \[ P(B\text{ and } A) = P(B \vert A) P(A); \]
  • and \( P( A \text{ and } B) = P(B \text{ and } A) \) by definition.
  • Putting these statements together, we obtain, \[ \begin{align} &P(A\vert B) P(B) = P(B \vert A ) P(A)\\ \Leftrightarrow & P(A \vert B) = \frac{P(B\vert A) P(A)}{ P(B)} \end{align} \]
  • The statement that \[ P(A \vert B) = \frac{P(B\vert A) P(A)}{ P(B)} \] is known as Bayes' theorem.
  • This is nothing more than re-writing the multiplication rule as discussed above, but the result is extremely powerful.
  • Bayes' theorem wasn’t widely used in statistics for hundreds of years, until advances in digital computers.
  • When digital computers became available, many new tools were developed with Bayes' theorem as their basis.

Bayes' theorem continued

  • Often, Bayes' theorem, \[ P(A \vert B) = \frac{P(B\vert A) P(A)}{ P(B)} \] is used as a way to update the probability of \( A \) when you have new information \( B \).
    • For example, let the events \( A= \)"it snows in the Sierra" and \( B= \)"it rains in my garden".
    • I might assign a prior probability \( P(A) \) for snow, without knowing any other information.
    • \( P(A\vert B) \) is the posterior probability of snow in the Sierra given rain in my garden.
    • If I found out later in the day that there was rain in my garden, I could update \( P(A) \) to \( P(A\vert B) \) by multiplying \[ P(A\vert B) = P(A) \times \left(\frac{P(B\vert A)}{P(B)}\right) \] directly.
    • Although this is a simplistic example, this logic is the basis of many weather prediction techniques.
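  • As a minimal sketch of this prior-to-posterior update (the function and the numbers are our own illustration, reusing the hypothetical Sierra probabilities from the earlier simulation):

```python
def bayes_update(prior: float, likelihood: float, evidence: float) -> float:
    """Posterior P(A | B) = P(A) * (P(B | A) / P(B))."""
    return prior * likelihood / evidence

# Hypothetical Sierra numbers (invented for illustration, as in the
# simulation sketch earlier): P(A) = 0.25, P(B | A) = 0.6, P(B) = 0.2.
print(bayes_update(prior=0.25, likelihood=0.6, evidence=0.2))  # approximately 0.75
```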

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Another classic example of using Bayes' theorem is medical diagnosis.
  • It is often the case that the implications of a positive test result are misunderstood.
  • For example, let’s suppose that \( 1\% \) of the population has some kind of cancer.
  • Let’s suppose that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • Before we compute Bayes' formula, we will use the above information to deduce the values for the above table.
  • Discuss with a neighbor: if we want to pretend that the numbers above hold exactly for some sample of \( 1000 \) individuals, how many total individuals would have cancer?
    • This would be \( .01\times 1000 = 10 \).
    • This is precisely the sum of the True Positives and False Negatives in the first row.

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall that \( 1\% \) of the population has some kind of cancer and that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • Discuss with a neighbor: if we want to pretend that the numbers above hold exactly for some sample of \( 1000 \) individuals, how many total individuals would not have cancer?
    • This would be \( ( 1 - .01)\times 1000 = 990 \).
    • This is precisely the sum of the False Positives and True Negatives in the second row.

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall that \( 1\% \) of the population has some kind of cancer and that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • Discuss with a neighbor: if we want to pretend that the numbers above hold exactly for some sample of \( 1000 \) individuals, how many total individuals would be false positives?
    • This would be \( 990 \times 0.1 = 99 \).

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall that \( 1\% \) of the population has some kind of cancer and that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • Discuss with a neighbor: if we want to pretend that the numbers above hold exactly for some sample of \( 1000 \) individuals, how many total individuals would be true negatives?
    • This would be \( 990 - 99 = 891 \).

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall that \( 1\% \) of the population has some kind of cancer and that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • Discuss with a neighbor: if we want to pretend that the numbers above hold exactly for some sample of \( 1000 \) individuals, how many total individuals would be true positives?
    • This would be \( 10 \times 0.8 = 8 \).

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall that \( 1\% \) of the population has some kind of cancer and that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • Discuss with a neighbor: if we want to pretend that the numbers above hold exactly for some sample of \( 1000 \) individuals, how many total individuals would be false negatives?
    • This would be \( 10 - 8 = 2 \).

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Recall that \( 1\% \) of the population has some kind of cancer and that there is a test for cancer with a:
    1. false positive rate of \( 10\% \); and a
    2. true positive rate of \( 80\% \).
  • We will now use the above information to compute Bayes' formula for the update of the prior to the posterior probability.
  • Let \( A \) be the event that a randomly selected individual has cancer.
  • Let \( B \) be the event that a randomly selected individual gets a positive test result.
  • Discuss with a neighbor: what is the prior probability that a randomly selected individual has cancer?
    • This is precisely \( P(A)=1\% \)
  • Discuss with a neighbor: what is \( P(B \vert A) \)?
    • This is precisely the rate of true positives \( 80\% \).
  • Discuss with a neighbor: what is the posterior probability of a randomly selected individual having cancer knowing that they have a positive test result \( P(A \vert B) \)?
    • This is given by \[ P(A\vert B) = \frac{P(B\vert A) P(A)}{P(B)} = \frac{.8 \times .01 }{\frac{107}{1000}}\approx 7.48\% \]
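  • The entire worked example can be reproduced in a few lines; this sketch rebuilds the hypothetical table of \( 1000 \) individuals from the stated rates and then applies Bayes' theorem:

```python
from fractions import Fraction

population = 1000                                  # hypothetical sample (from the slides)
with_cancer = round(population * 0.01)             # 1% prevalence -> 10 people
without_cancer = population - with_cancer          # 990 people

true_positives = round(with_cancer * 0.8)          # 80% true positive rate -> 8
false_negatives = with_cancer - true_positives     # 2
false_positives = round(without_cancer * 0.1)      # 10% false positive rate -> 99
true_negatives = without_cancer - false_positives  # 891

positives = true_positives + false_positives       # 107 positive results in total

prior = Fraction(with_cancer, population)           # P(A) = 1/100
likelihood = Fraction(true_positives, with_cancer)  # P(B | A) = 8/10
evidence = Fraction(positives, population)          # P(B) = 107/1000

posterior = likelihood * prior / evidence           # Bayes' theorem
print(float(posterior))  # approximately 0.0748, i.e. about 7.48%
```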

Bayes' theorem example

Cancer results.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Bayes' theorem thus tells us an interesting fact about the cancer diagnosis.
  • Because the probability of a true positive depends on both:
    1. the base-line rate of cancer in the population (the prior); and
    2. the likelihood of a true positive versus a false positive;
  • the probability of having cancer given a positive test result is quite low.
  • Bayes' theorem tells us that our prior probability of having cancer (with all factors held equal) is \( 1\% \).
  • Even if we obtain a positive test result, because so few individuals have cancer to begin with, the probability of having cancer conditional on the test result rises to only about \( 7.48\% \).
  • This is one thing that is commonly misunderstood about medical diagnoses, but which Bayes' theorem clarifies.
  • Particularly, the probability of a true positive depends on both the prior probability and the likelihood.