Fundamentals of probability

02/18/2020

Instructions:

Use the left and right arrow keys to navigate the presentation forward and backward respectively. You can also use the arrows at the bottom right of the screen to navigate with a mouse.

FAIR USE ACT DISCLAIMER:
This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education, and research.

Outline

  • The following topics will be covered in this lecture:
    • Events
    • Sample spaces
    • Classical version of probability
    • Relative frequency approximation of probability
    • Probabilistic reasoning
    • Complements of events
    • Odds

Basics of probability

  • In order to draw conclusions about a population from a sample, we required that the sample be representative of the full population.

    • Using random chance to mix the participants in the samples was one way of ensuring that the smaller sample would be a good approximation of the population.
  • However, in any sample there is natural variation amongst those sampled, so two repeated samples will generally not look exactly like each other.

    • One of the important considerations is thus, how likely would it be to compute a sample statistic just by chance, due to the natural variation of resampling?
  • Inferential statistics are differentiated from the descriptive statistics we have seen so far in how they address this above question.

    • Descriptive statistics help us learn about the sample we have in hand, but we must use tools from probability to address how these results might generalize to a wider population.
  • One of the most important tools we will learn in this class is hypothesis testing, one way to test whether a claim can be inferred about the wider population.

Basics of probability example

Diagram of hypothesis testing for a gender selection method.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We can take a basic example to illustrate this point.
  • Suppose a clinical trial will be used to determine if a certain fertility treatment will increase the chance that a pregnancy will result in a female birth.
  • There is some random chance involved in an un-assisted pregnancy whether the baby will be a girl or boy, and either result is about as likely as the other.
    • Therefore, every group of \( 100 \) births in a control group might look quite different.
  • We want to find a way to be more confident that if there is an effect of the treatment, it can be distinguished from this random variation.
  • Formally we will make a claim and a hypothesis:
    • Claim: the fertility treatment will greatly increase the chance of a baby being born a girl over the control group.
    • Hypothesis: the treatment has no effect and it is equally likely that a pregnancy will result in a female or a male birth under the treatment.
  • Note: in the above, we take the null hypothesis, i.e., to test the claim we assume that it is not true and then evaluate the results.

Basics of probability example

Diagram of hypothesis testing for a gender selection method.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • The purpose of the null hypothesis is to evaluate whether our results were likely to arise from the natural variation in any batch of samples.
  • Let’s suppose there are two different treatments, Treatment A and Treatment B with results in the figure to the left.
  • Let us suppose that \( 100 \) pregnancies are given Treatment A:
    • Under Treatment A, we suppose that out of \( 100 \) pregnancies, \( 75 \) girls were born and \( 25 \) boys were born.
    • Our null hypothesis is that even under the treatment, a female or male birth were equally likely.
    • Therefore, we evaluate that the chance to get at least \( 75 \) female births out of \( 100 \) births (with equally likely probability of a female or male) is about \( 3\times 10^{-7} \), or three in ten million.
    • Under the assumption that either a female or male birth was equally likely, this is an extremely unlikely event.
    • We therefore will reject our original hypothesis, because seeing this result by chance sampling variation would be extremely unlikely.
  • This is once again the meaning of a statistically significant / interesting event, so we conclude that the treatment appears to be effective.

Basics of probability example

Diagram of hypothesis testing for a gender selection method.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will now suppose that \( 100 \) pregnancies are given Treatment B:
    • We suppose that the \( 100 \) pregnancies result in \( 55 \) female and \( 45 \) male births.
    • Our null hypothesis is that even under the treatment, a female or male birth were equally likely.
    • Therefore, we evaluate that the chance to get at least \( 55 \) female births out of \( 100 \) births (with equally likely probability of a female or male) is about \( 0.184 \).
      • That is, if we resampled a control group with no treatment, we would get this result in almost one out of five sample control groups.
    • If it is so likely to see such a result by natural variation in sampling, we fail to reject chance (the null hypothesis) as the explanation.
    • Therefore, we cannot conclude that the treatment was effective given the probability of seeing a result at least this extreme by chance.
  • Note: it is our understanding of probability that lets us tell something about how the statistics from this small sample might generalize to the whole population.
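The tail probabilities quoted for both treatments can be checked directly from the binomial model implied by the null hypothesis. The following is a sketch (not part of the original lecture), assuming female and male births are equally likely:

```python
from math import comb

def tail_probability(n, k):
    """P(at least k female births out of n), assuming each birth is female with probability 1/2."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Treatment A: chance of at least 75 female births out of 100 under the null hypothesis
p_a = tail_probability(100, 75)   # about 3e-7 -- extremely unlikely by chance alone

# Treatment B: chance of at least 55 female births out of 100 under the null hypothesis
p_b = tail_probability(100, 55)   # about 0.184 -- quite plausible by chance alone
print(p_a, p_b)
```

These match the values quoted in the slides: Treatment A's result would essentially never occur by chance, while Treatment B's occurs in roughly one of five control groups.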

Probability vocabulary

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • In order to make mathematical statements, we will need to introduce some vocabulary about probability.
  • Suppose we want to consider a future result of some procedure for which the outcome is unknown – e.g., the gender of some number of births.
  • Event – an event is any collection of results or outcomes of this procedure.
    • As an easy example, we can consider the process of “gender of one birth”.
    • This is a process that has some randomness, where we will encode possible outcomes as:
      1. \( g \) – female birth; and
      2. \( b \) – male birth.
    • An example of an event for “gender of one birth” is “one girl” \( \{g\} \).
    • Another example of an event for “gender of one birth” is “one boy” \( \{b\} \).
    • For simplicity, we will exclude other possible outcomes and consider these as the only two possible events for “gender of one birth”.

Probability vocabulary continued

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Simple event – a simple event is an event that cannot be broken into simpler parts.
  • To understand a simple event we will need to consider a more complicated process.
  • Suppose our new process is “genders of three births”.
  • We note that “\( 1 \) female and \( 2 \) male births” is an event of this process.

  • However, the event \( 1 \) female and \( 2 \) male births can be broken up into three possible simple events:
    1. \( \{bbg\} \);
    2. \( \{bgb\} \); and
    3. \( \{gbb\} \).
  • That is, the event \( 1 \) female and \( 2 \) male births must be one of the above simple events that are outcomes of the process “genders of three births”.
  • Note: we do not consider \( \{g\} \) or \( \{b\} \) to be simple events for this process – indeed, these correspond to only one birth, which is not an outcome of “genders of three births”.
  • The reasoning behind classifying events in this way is so that we can count all possible ways we can observe a complex event like “\( 1 \) female and \( 2 \) male births” from our known possibilities.
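The decomposition above can be verified by enumerating the process in code; a minimal sketch, assuming the two-outcome model of the slides:

```python
from itertools import product

# all simple events (outcomes) of the process "genders of three births"
sample_space = [''.join(seq) for seq in product('bg', repeat=3)]

# the simple events that make up the event "1 female and 2 male births"
event = [outcome for outcome in sample_space if outcome.count('g') == 1]
print(event)  # ['bbg', 'bgb', 'gbb']
```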

Probability vocabulary continued

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Sample space – this is the collection of all simple events.
    • For example, all simple events for “genders of one birth” are \( \{b,g\} \).
    • On the other hand, all simple events for “genders of three births” are, \[ \{bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg\}, \] i.e., all possible combinations of three births in our simplified model.
  • The reason for collecting all possible simple events into a sample space is that, for many problems, this is a natural way to compute probabilities.
    • If we can describe all the ways an event \( A \) can occur among simple events relative to the total number of simple events, we can decide how likely we are to observe \( A \).
  • Historically, probability was developed as a science in understanding games of chance.
  • For example, we can directly compute the probability of rolling an even number on a fair, six-sided die.
  • We know exactly all possible simple events (rolling \( 1 \) through \( 6 \)), how many ways we can roll an even number (\( 2,4,6 \) so three ways), and with all sides equally likely, the chance is \( \frac{3}{6} \).
  • This kind of logic is the basis of classical probability, though there are cases where this approach does not work (e.g., unfair dice); for these we will develop additional tools to make probabilistic statements.
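The die example can be written out as a direct computation; a small sketch of the classical formula:

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]                      # all simple events for one roll of a fair die
favorable = [x for x in sample_space if x % 2 == 0]    # the three ways to roll an even number

p_even = Fraction(len(favorable), len(sample_space))   # classical probability: favorable / total
print(p_even)  # 1/2
```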

Classical approach to probability

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will generally denote some event by a capital letter, e.g.,
    • \( A \)=“one female birth, two male births”.
  • Then, we will denote the probability of some event \( A \) by \( P(A) \) – this can be read as “the probability of \( A \) occurring”.
  • The classical approach to probability uses the assumption that all simple events in the sample space are equally likely.
    • This applies to situations like rolling a fair die.
  • The classical approach to probability thus gives the formula,

    \[ P(A) = \frac{\text{Number of ways for }A\text{ to occur}}{\text{Number of simple events}} \]
  • Let’s assume that in this example all simple events are equally likely;
  • Discuss with a neighbor: what is the probability of \( A \)=“one female birth, two male births” using the classical approach?
    • Using the above table, we see there are three ways \( A \) can occur

      \[ \{bbg,bgb,gbb\}. \]
    • The classical approach tells us,
      \[ P(A) =\frac{3}{8}. \]

Classical approach to probability

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Another classical problem in probability is as follows.
  • Let \( A= \)"three births of the same gender".
  • Let’s assume that in this example all simple events are equally likely;
  • Discuss with a neighbor: what is the probability of \( A \) using the classical approach?
  • Note that there are two ways \( A \) can take place from the simple events, \[ \{bbb,ggg\}. \]
  • Therefore, we can compute, \[ P(A)=\frac{2}{8}= \frac{1}{4}. \]
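Both classical computations above can be reproduced by enumerating the sample space; a minimal sketch under the equally-likely assumption:

```python
from itertools import product

# the 8 equally likely simple events for "genders of three births"
sample_space = [''.join(seq) for seq in product('bg', repeat=3)]

# A = "one female birth, two male births"
p_one_girl = sum(s.count('g') == 1 for s in sample_space) / len(sample_space)
print(p_one_girl)  # 0.375, i.e. 3/8

# B = "three births of the same gender"
p_same = sum(s in ('bbb', 'ggg') for s in sample_space) / len(sample_space)
print(p_same)      # 0.25, i.e. 1/4
```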

Relative frequency approximation of probability

  • The classical approach to probability works mostly for cases like games of chance, where the sample space is limited and all outcomes are equally likely.
  • However, many examples cannot be broken into equally likely simple events.
    • Indeed, it actually is not equally likely for a birth to be either female or male; the probability of a male birth is approximately \( 0.512 \).
    • A simple counter-example is an unfair, weighted coin.
    • We cannot accurately calculate the probability of flipping “heads” with the classical formula when the coin has been weighted to come up heads more often.
  • If we don’t know how the coin is weighted, a natural way to approximate the true probability is by replication of the process.
    • For example, we can flip the coin and record whether it lands on heads or tails.
    • Let \( A= \)"coin lands heads" and \( B= \)"coin lands tails".
    • We will construct a frequency distribution for both of these events over many repeated flips.
    • We will then approximate the true probability of event \( A \) by the frequency of \( A \) relative to all replications of the process.
  • The relative frequency approximation is thus given as, \[ P(A) \approx \frac{\text{Frequency event }A\text{ is observed}}{\text{Total number of times the process is repeated}}. \]
  • Note: this is only an approximation; even to closely approximate \( P(A) \) for a fair coin requires many replications.
  • However, this is a natural way of thinking about probability – if we could flip a coin infinitely many times we could get the true value.
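The relative frequency approximation can be illustrated with a quick simulation; a sketch, assuming a fair coin:

```python
import random

random.seed(1)  # fixed seed so the demonstration is reproducible

def relative_frequency(trials):
    """Approximate P(heads) by the fraction of simulated flips that land heads."""
    heads = sum(random.choice('HT') == 'H' for _ in range(trials))
    return heads / trials

print(relative_frequency(10))       # a rough estimate -- small samples vary a lot
print(relative_frequency(100_000))  # much closer to the true value 0.5
```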

Probabilistic reasoning

  • More formally, this intuition is known as the “Law of large numbers”.

    • If we are thinking about the fair coin flipping, as we take larger and larger approximations by the relative frequency, \[ P(A) \approx \frac{\text{Frequency event }A\text{ is observed}}{\text{Total number of times the process is repeated}}, \] this value will approach \( 0.5 \) with enough flips.
  • However, this does not imply that if we get \( 20 \) tails in a row that we are any more likely to flip a heads.

    • This is known as the gambler's fallacy, where one incorrectly concludes that losing many times implies that a win is more likely.
  • Every time we flip a fair coin, its outcome is independent of the earlier outcomes and the probability is always \( 50\% \) that it will land heads or tails.

    • In fact the relative frequency can go on arbitrarily large excursions away from the true probability, but the likelihood of this happening is low so that these excursions are infrequent.
  • We should also note that when we do not know the probabilities of different events, it is not accurate to assign them all equal probability.

    • For example, if we consider events \( A= \)"I get caught in the rain on the way to work" and \( B= \)"I do not get rained on", we cannot arbitrarily say that these have equal probability, i.e., \[ \begin{matrix} P(A) =0.5 && P(B)=0.5.\end{matrix} \]
    • The location of rain clouds depends strongly on the land surface, like mountains, and on atmospheric conditions, like wind patterns; whether I get rained on thus depends on various unequal factors.
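The law of large numbers described above can be seen numerically by tracking the running relative frequency as the number of flips grows; a sketch with a simulated fair coin:

```python
import random

random.seed(2)  # fixed seed for reproducibility

heads = 0
estimates = {}
for n in range(1, 100_001):
    heads += random.random() < 0.5      # one simulated fair-coin flip (True counts as heads)
    if n in (10, 1_000, 100_000):
        estimates[n] = heads / n        # running relative frequency after n flips

print(estimates)  # the later estimates settle near the true probability 0.5
```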

Probabilistic reasoning continued

High probabilities are close to one while unlikely probabilities are close to zero.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We should also think about what it means for events to have high or low probability.
  • Discuss with a neighbor: when does the relative frequency approximation, \[ P(A) \approx \frac{\text{Frequency event }A\text{ is observed}}{\text{Total number of times the process is repeated}} \] equal one?
    • This will occur when we observe the event \( A \) every time the process is replicated.
  • Discuss with a neighbor: when does the relative frequency approximation equal zero?
    • This will occur when we observe the event \( A \) none of the times the process is replicated.
  • We can use this kind of reasoning to understand the meaning of probability \( 1 \) and probability \( 0 \).
  • Discuss with a neighbor: if a year is selected at random, find the probability that Thanksgiving Day in the United States will be \( A= \)"on a Wednesday" or \( B= \)"on a Thursday"?
    • We note that by design, Thanksgiving is always held on a Thursday and therefore \( P(A)=0 \) while \( P(B)=1 \).
  • A probability of \( 1 \) for some event means total certainty while \( 0 \) means that there is no chance.
  • Similarly, values close to \( 1 \) mean that it is likely (though not certain); and
  • values close to \( 0 \) mean that it is unlikely (though not impossible).

Probabilistic reasoning continued

High probabilities are close to one while unlikely probabilities are close to zero.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Statistical significance can be interpreted the same way as discussed on the last slide.
    • Significantly high number of successes: if the probability of at least \( x \) successes out of \( n \) trials is less than \( 0.05 \), this number of successes is not likely due to random chance.
      • For this reason, we say that this number of successes is significantly high.
    • Significantly low number of successes: if the probability of at most \( x \) successes out of \( n \) trials is less than \( 0.05 \), this number of successes is not likely due to random chance.
      • For this reason, we say that this number of successes is significantly low.
  • Discuss with a neighbor: when a fair coin is tossed \( 1000 \) times, the result consists of exactly \( 500 \) heads. The probability of getting exactly \( 500 \) heads in \( 1000 \) tosses is \( 0.0252 \).
  • Is this result unlikely? Is \( 500 \) heads unusually low or unusually high?
    • The result of exactly \( 500 \) heads is unlikely because the probability of \( 500 \) heads is \( 0.0252 < 0.05 \).
    • However, the probability of getting at least \( 500 \) heads is \( 0.5 \), so that this value is not significantly high.
    • Likewise, the probability of getting at most \( 500 \) heads is \( 0.5 \), so that this value is not significantly low.
  • Even though for a fair coin, we may expect to get \( 500 \) heads and \( 500 \) tails with \( 1000 \) coin flips, this value itself is not especially likely;
    • rather, this lies at the center of mass for the process, so that it will be close to most outcomes of the relative frequency approximation.
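The probabilities quoted for the 1000-toss example can be computed exactly from the binomial model; a sketch, assuming a fair coin:

```python
from math import comb

n = 1000
total = 2 ** n

# probability of exactly 500 heads -- individually unlikely, about 0.0252
p_exact = comb(n, 500) / total

# probability of at least 500 heads -- about one half, so 500 heads is not significantly high
p_at_least = sum(comb(n, k) for k in range(500, n + 1)) / total

print(p_exact, p_at_least)
```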

Probabilistic reasoning continued

Histogram for binomial distribution.

Courtesy of Mario Triola, Essentials of Statistics, 5th edition

  • Discuss with a neighbor: when a fair coin is tossed \( 1000 \) times, the result consists of \( 10 \) heads. Consider the figure to the left. Is this result unlikely? Is \( 10 \) heads unusually low or unusually high?
    • We know that even the center of mass, \( 500 \) heads and \( 500 \) tails, is unlikely, so certainly a seemingly rare event like \( 10 \) heads must be unlikely.
    • Indeed, the cutoffs for significance are the following:
      • Getting \( 468 \) or fewer heads has a probability of \( 5\% \), so we consider any number of heads \( x \) with \( x\leq 468 \) to be significantly low.
      • Getting \( 532 \) or more heads has a probability of \( 5\% \), so we consider any number of heads \( x \) with \( x\geq 532 \) to be significantly high.
  • Remember, every individual outcome is unlikely, including the expected value of \( 500 \) heads and \( 500 \) tails.
  • However, this expected value lies at the center of mass for the probability, so it will probably be close to the actual outcome.

Probabilistic reasoning continued

High probabilities are close to one while unlikely probabilities are close to zero.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will also use the term subjective probability approximation for a probability that is estimated subjectively, using loose knowledge of the question at hand.
    • When trying to estimate the probability of a passenger dying in a plane crash, we know that there are thousands of flights every day, but fatal plane crashes are quite rare, so the probability is very small.
    • Providing a rough estimate like “the probability is around one in ten million” is what is meant by a subjective probability approximation.
    • These are not really intended to be taken literally, but to give an overall picture of how likely such an event could be.
  • With modern technology, we also have the ability to simulate a process we want to study or at least some process that is close to this.
  • Good approximations of probabilities can often be found by simulations with modern computing, though this goes beyond the scope of this course.
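As a glimpse of the simulation idea mentioned above (beyond the scope of the course, but easy to sketch): a Monte Carlo estimate of the earlier classical answer \( P(\text{three births of the same gender}) = \frac{1}{4} \), assuming equally likely genders:

```python
import random

random.seed(3)  # fixed seed for reproducibility

def estimate_same_gender(trials=100_000):
    """Monte Carlo estimate of P(three births share one gender)."""
    hits = 0
    for _ in range(trials):
        births = [random.choice('bg') for _ in range(3)]
        hits += births.count('b') in (0, 3)   # all boys or all girls
    return hits / trials

print(estimate_same_gender())  # lands near the classical answer 0.25
```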

Complementary events

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let \( A \) be some event. Then we define the complementary event \( \overline{A} \) to be all outcomes in which \( A \) does not occur.
    • For example, if \( A= \)"three male births" or \( A=\{bbb\} \), then \( \overline{A} \) is all other outcomes in the sample space.
  • Discuss with a neighbor: we will assume once again for simplicity that a female and male birth are equally likely.
  • If \( A= \)"exactly two female births", what is the complement \( \overline{A} \) and what is the probability of \( \overline{A}? \)
    • The cases when there are exactly \( 2 \) female births are \( A=\{ggb,gbg,bgg\} \).
    • Therefore, \( \overline{A}=\{bbb,bbg,bgb,gbb,ggg\} \) and \( P\left(\overline{A}\right)=\frac{5}{8} \).
  • Discuss with your neighbor: what is the \( P(A) \)? How is this related to \( P\left(\overline{A}\right) \)?
    • From the above, we can see \( P(A)=\frac{3}{8} \); indeed, \( P(A) = 1 - P\left(\overline{A}\right) \).

Complementary events continued

Table of events versus simple events in pregnancy example.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • More generally, we can think of the relationship as, \[ \begin{align} P(A) &= \frac{\text{Number of ways for }A\text{ to occur}}{\text{Number of simple events}} \\ &=\frac{\text{Number of simple events} - \text{Number of ways for }A\text{ not to occur}}{\text{Number of simple events}}\\ &= 1 - P\left(\overline{A}\right) \end{align} \]
  • That is, complementary events have complementary probabilities.
  • Together, all the ways \( A \) can occur and all the ways \( A \) cannot occur make up all possible outcomes, so that \[ P(A) + P\left(\overline{A}\right) = 1, \] with each probability taking a value between \( 0 \) and \( 1 \).
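The complement rule can be checked by enumeration; a minimal sketch using the three-births sample space:

```python
from itertools import product
from fractions import Fraction

sample_space = [''.join(s) for s in product('bg', repeat=3)]

# A = "exactly two female births"
p_a = Fraction(sum(s.count('g') == 2 for s in sample_space), len(sample_space))
p_complement = 1 - p_a     # probability of the complement, all other outcomes

print(p_a, p_complement)   # 3/8 5/8
```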

Odds

  • Expressions of likelihood are often given as odds, such as \( 50:1 \) (or “50 to 1”).

  • Because the use of odds makes many calculations difficult, statisticians, mathematicians, and scientists prefer to use probabilities.

  • The advantage of odds is that they make it easier to deal with money transfers associated with gambling, so they tend to be used in casinos, lotteries, and racetracks.

  • Note – in the three definitions that follow, the actual odds against and the actual odds in favor are calculated with the actual likelihood of some event;

    • however, the payoff odds describe the relationship between the bet and the amount of the payoff.
  • The actual odds correspond to actual probabilities of outcomes, but the payoff odds are set by racetrack and casino operators.

    • Racetracks and casinos are in business to make a profit, so the payoff odds will not be the same as the actual odds.

Odds continued

  • Actual odds against event \( A \) – this is the probability of event \( \overline{A} \) relative to the event \( A \), i.e.,

    \[ \frac{P\left(\overline{A}\right)}{P(A)} \]

    • Actual odds against are usually expressed in the form \( a:b \) (or “\( a \) to \( b \)”), where \( a \) and \( b \) are integers having no common factors.
  • Actual odds in favor of event \( A \) – this is the probability of event \( A \) relative to the event \( \overline{A} \), i.e.,

    \[ \frac{P(A)}{P\left(\overline{A}\right)} \]

    • If the odds against \( A \) are \( a:b \), then the odds in favor of \( A \) are \( b:a \).
  • Payoff odds against event \( A \) – this is the ratio of net profit (if you win) to the amount bet:

    \[ \text{payoff odds against event }A = (\text{net profit}):(\text{amount bet}) \]
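The definitions above can be tied together in a short computation; a sketch with a hypothetical event of probability \( \frac{1}{4} \):

```python
from fractions import Fraction

def odds_against(p):
    """Actual odds against an event with probability p (a Fraction), as reduced integers (a, b)."""
    ratio = (1 - p) / p                  # P(not A) / P(A); Fraction reduces automatically
    return ratio.numerator, ratio.denominator

p = Fraction(1, 4)                       # hypothetical P(A) = 1/4
a, b = odds_against(p)
print(f"odds against A are {a}:{b}")     # 3:1, so the odds in favor are 1:3
```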