Normal probability distributions part II

03/26/2020


Outline

  • The following topics will be covered in this lecture:
    • Sampling estimates for population parameters
    • Sample estimates for population proportions
    • Sample estimates for population means
    • Sample estimates for population variances
    • Biased versus unbiased estimators
    • Central limit theorem

Motivation

Book chapter flow chart.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • So far we have learned two sets of skills
    1. Summary statistics – used for analyzing samples; and
    2. Probability – used to analyze complex events abstractly.
  • Our goal is to use statistics from small, representative samples to say something general about the larger, unobservable population.
    • Recall, the measures of the population are what we referred to as parameters.
  • Parameters are generally unknown and unknowable.
    • For example, the mean age of every adult living in the United States is a parameter for the adult population of the USA.
    • We cannot possibly know this value exactly, as there are people who cannot be surveyed and/or don’t have accurate records.
    • If we have a representative sample, we can compute the sample mean.
    • The sample mean will almost surely not equal the population mean, due to the natural variation (sampling error) that occurs in any given sample.
    • However, if we have a good probabilistic model for the ages of adults, we can use the sample statistic to estimate the general, unknown population parameter.

Sampling distributions and estimating population parameters

Histogram of 50,000 sample proportions, bell shaped.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s consider a hypothetical example from the book.
    • Autonomous vehicles are a new and controversial technology;
      • let’s suppose, as in the book, that the population parameter is that \( 70\% \) of US adults do not feel comfortable being driven by an autonomous vehicle.
    • In a TE Connectivity survey of \( 1000 \) US adults, \( 69\% \) of this sample responded that they also do not feel comfortable riding in an autonomous vehicle.
    • We will suppose that this survey is repeated \( 50,000 \) times to verify the accuracy of the sample statistic with replication.
    • Each time the survey results are replicated, we obtain a slightly different value for the proportion due to the inherent variability (sampling error).
  • The histogram above shows the sample-based values for the proportion of adults who do not feel comfortable riding in an autonomous vehicle.
  • Consider the following: what is interesting about the shape of the histogram above? What do you notice about how the sample-proportions are centered with respect to the population parameter \( 70\% \)?
    • The above histogram is shaped like a normal distribution.
    • Moreover, the center of the histogram lies very close to the true population parameter \( 70\% \).
  • This hypothetical example illustrates an important general property that will allow us to estimate population parameters.
    • When we do not know the true population parameter, its value can be estimated by representative samples.
    • Knowing the distribution of repeated sampling, we can estimate how close our sample value is to the true parameter.
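
This replication process is easy to mimic numerically. Below is a minimal simulation sketch (in Python with NumPy, which is an assumption here and not part of the original slides) that repeats the survey of \( 1,000 \) adults \( 50,000 \) times with a true proportion of \( 0.70 \):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

p_true = 0.70        # the (hypothetically known) population proportion
n = 1000             # adults surveyed in each replication
replications = 50_000

# Each replication counts how many of the n adults answer "not comfortable";
# dividing by n gives one sample proportion per replication.
counts = rng.binomial(n=n, p=p_true, size=replications)
p_hats = counts / n

print(f"mean of sample proportions: {p_hats.mean():.4f}")      # close to 0.70
print(f"std. dev. of sample proportions: {p_hats.std():.4f}")  # about 0.0145
```

A histogram of `p_hats` is bell shaped and centered very near \( 0.70 \), exactly as in the figure above.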

Sampling distributions for proportions

Sample proportions tend to be normally distributed about the true parameter as the mean.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s consider the last example more generally.
  • Suppose there is some population and we wish to find the true proportion \( p \) of the population for which some statement is true.
    • In the last example,
      • the population was US adults,
      • the statement was “Uncomfortable being driven by an autonomous vehicle”
      • the true proportion was \( p=0.70 \).
    • Let’s suppose that we will draw exactly \( n \) observations of the population by random sampling.
    • In the last example, the number of observations was \( n=1,000 \) adults.
  • Suppose we want to replicate this sampling procedure infinitely many times.
    • It is impossible to replicate the sampling infinitely many times, but we can construct a probabilistic model for this replication process with a probability distribution.
  • Formally, we will define \( \hat{p} \) to be the random variable equal to the proportion derived from a random sample of \( n \) observations.
    • For each replication, \( \hat{p} \) attains a different value based on chance.
  • Then, for random, independent samples, \( \hat{p} \) tends to be normally distributed about \( p \).
    • We can thus use the value of \( \hat{p} \) and the distribution of \( \hat{p} \) to estimate \( p \) and how close we are to it.
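
  • As a standard result (stated here for context, though not derived on this slide), for large \( n \) the sampling distribution of \( \hat{p} \) is approximately
    \[ \hat{p} \sim N\left(p, \sqrt{\tfrac{p(1-p)}{n}}\right); \]
    for the survey example with \( p=0.70 \) and \( n=1000 \), the standard deviation is \( \sqrt{0.7 \times 0.3 / 1000} \approx 0.0145 \).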

Sampling distributions for proportions example

Sample proportions tend to be normally distributed about the true parameter as the mean.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • From the last slide, we will denote the random variable \( \hat{p} \) to be the sample proportion.
    • The way this random variable behaves over infinitely many replications is described by the sampling distribution for sample proportions.
  • Let’s consider another concrete example – let the sampling procedure be given by \( 5 \) rolls of a fair, six-sided die.
    • The random variable \( \hat{p} \) will be the proportion of five observations that are odd.
  • All values for each observation \( 1, 2, 3, 4, 5, 6 \) are equally likely, with half even and half odd.
    • Therefore, we can compute \( p \) – the population proportion – exactly as \( p=0.5 \).
    • That is, over infinitely many rolls of the die, we expect half of the rolls to be odd.
  • However, each time we roll a fair, six-sided die \( 5 \) times we will obtain a different value for the random variable \( \hat{p} \).
    • Example sample proportions \( \hat{p} \) which are attained over different samples are pictured in the middle panel above.
  • If we replicate the sampling procedure many times, we can get a picture of the sampling distribution of sample proportions;
    • In the right panel above, we see the result of \( 10,000 \) replicated sampling procedures – i.e., rolling a fair, six-sided die \( 5 \) times, repeated \( 10,000 \) times.
    • The distribution is approximately normal, with center at the true value \( p=0.5 \).
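
A short simulation sketch of this die experiment (again assuming Python with NumPy, which the slides themselves do not use):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_rolls = 5            # rolls of the die per sampling procedure
replications = 10_000  # number of replicated procedures

# Roll a fair six-sided die 5 times per replication, then record the
# proportion of odd outcomes within each sample of 5 rolls.
rolls = rng.integers(1, 7, size=(replications, n_rolls))
p_hats = (rolls % 2 == 1).mean(axis=1)

print(f"mean of sample proportions: {p_hats.mean():.3f}")  # close to p = 0.5
```

A histogram of `p_hats` reproduces the approximately normal shape centered at \( p=0.5 \) in the right panel.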

Sampling distributions for means

Sample means tend to be normally distributed about the true parameter as the mean.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We may also consider how to estimate a population mean \( \mu \).
    • Suppose there is some population with some numerical measure \( x \) for which we wish to find the true population mean \( \mu \).
      • For example,
        • the population can be all US adults,
        • the numerical measure can be \( x= \)"age"
        • the true mean would be the average age of all US adults.
      • Let’s suppose that we will draw exactly \( n \) observations of the population by random sampling.
  • Suppose we want to replicate this sampling procedure infinitely many times.
    • It is once again impossible to replicate the sampling infinitely many times, but we can construct a probabilistic model for this replication process with a probability distribution.
  • Formally, we will define \( \overline{x} \) to be the random variable equal to the mean derived from a random sample of \( n \) observations.
    • For each replication, \( \overline{x} \) attains a different value based on chance.
  • Then, for large numbers of random, independent samples, \( \overline{x} \) tends to be normally distributed about \( \mu \).
    • We can thus use the value of \( \overline{x} \) and the distribution of \( \overline{x} \) to estimate \( \mu \) and how close we are to it.
  • NOTE: one key difference from the sample proportions is that we need large sample sizes \( n \) for the distribution of sample means to be “close-to-normal”.

Sampling distributions for means example

Sampling distribution of the mean of a population with values 4, 5, and 9, shown in full and in condensed form.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will begin with a very simplified example;
    • usually the population size will be greater, and the number of observations in a sample should be much, much larger.
  • We will only use this example because we can look at the sampling distribution of the means concretely.
  • Suppose that the population of interest is a household with three children;
    • we want to find the mean age of the population, where the children’s ages are \( 4, 5, \) and \( 9 \).
  • Our sampling procedure is to select two children with replacement and ask their ages.
    • If \( x \) is the random variable equal to the age of a particular observation, \( \overline{x} \) will be the sample mean over two randomly chosen observations.
  • To the upper left, we have a table of all possible combinations of observations of the population with a sample size \( n=2 \).
    • In the middle column, we have the sample mean \( \overline{x} \) corresponding to the two observations.
    • Each possible combination of observations is equally likely, so each has probability \( \frac{1}{9} \).
  • If we combine all possible ways we can obtain \( \overline{x}=4.0 \), all possible ways \( \overline{x}=4.5 \), etc… we have the bottom table.
  • The bottom table associates a probability value to each possible value \( \overline{x} \) might attain – therefore this is the probability distribution for the sample means \( \overline{x} \).

Sampling distributions for means example

Sampling distribution of the mean of population with values 4,5, 9 -- condensed version.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Let’s recall, we have a special name for the mean of a probability distribution:
    • this is denoted the expected value.
  • Consider the following: with the three ages in the population being \( 4,5, \) and \( 9 \) what is the population mean \( \mu \)? What is the expected value of the probability distribution for the sample means?
    • Notice that, \[ \mu = \frac{4 + 5 + 9}{3} = 6. \]
  • On the other hand, we have the expected value equal to,
    \[ \begin{align} &\sum_{\overline{x}_\alpha} \overline{x}_\alpha \times P(\overline{x} = \overline{x}_\alpha )\\ =&4.0 \times P(\overline{x}=4.0) + 4.5 \times P(\overline{x}=4.5) + 5.0 \times P(\overline{x}=5.0) \\ &+ 6.5 \times P(\overline{x}=6.5) + 7.0 \times P(\overline{x}=7.0) + 9.0 \times P(\overline{x} = 9.0)\\ =&4.0 \times\frac{1}{9} + 4.5 \times \frac{2}{9} + 5.0 \times \frac{1}{9} + 6.5 \times \frac{2}{9} + 7.0 \times \frac{2}{9} + 9.0 \times\frac{1}{9}\\ =&6.0 \end{align} \]
  • That is, the expected value of the sample mean \( \overline{x} \) is equal to the population mean \( \mu \).
    • Put another way, over infinitely many replicated samples we would expect to find the true population mean by taking the average over all the replicates;
    • this holds generally, even when the probability distribution of the sample mean is far from normal, as in this example; the sketch below verifies the computation directly.
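
Because the population has only three values, the whole table can be reproduced exhaustively. The sketch below (Python is an assumption here; the course itself uses StatCrunch) enumerates all nine ordered samples of size \( 2 \) drawn with replacement and verifies that \( E[\overline{x}] = \mu = 6 \):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

ages = [4, 5, 9]  # the three children's ages

# All ordered samples of size 2 drawn with replacement: 3 x 3 = 9 samples,
# each occurring with probability 1/9.
samples = list(product(ages, repeat=2))
means = [Fraction(a + b, 2) for a, b in samples]

# Tabulate the probability distribution of the sample mean.
dist = Counter(means)
for xbar in sorted(dist):
    print(f"x-bar = {float(xbar):.1f}  with probability {dist[xbar]}/9")

# The expected value of the sample mean equals the population mean, 6.
expected = sum(xbar * Fraction(count, 9) for xbar, count in dist.items())
print("E[x-bar] =", float(expected))
```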

Sampling distributions for means example

Sample means from repeated rolls of a die tend to be normally distributed about the population mean.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Suppose we consider rolling the fair, six-sided die again.
  • The sampling procedure will once again be to roll the die \( 5 \) times.
  • \( x \) will be the random variable that will attain the value of a single observation in the sample.
  • \( \overline{x} \) will be the mean of the observations in a given sample (5 rolls).
  • By calculating all the possible outcomes, we can find that \( \mu=3.5 \), i.e.,
    • over infinitely many rolls of the die, we expect to get an average value of \( 3.5 \).
  • Each time we replicate the sampling procedure (each time we roll the die \( 5 \) times) we obtain a different value for \( \overline{x} \).
  • In the middle panel of the figure, we have possible values for \( \overline{x} \) that we record.
  • On the right panel, we see the result from replicating the sampling procedure \( 10,000 \) times.
  • In this case, the distribution of the sample means \( \overline{x} \) is approximately normal.
  • Also, as will always be the case, the sample means \( \overline{x} \) are distributed around the true population mean \( \mu = 3.5 \).
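
A simulation sketch of this experiment (Python/NumPy assumed, as before):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Replicate the sampling procedure (5 rolls of a fair die) 10,000 times.
rolls = rng.integers(1, 7, size=(10_000, 5))
x_bars = rolls.mean(axis=1)  # one sample mean per replication

# The sample means cluster around the population mean mu = 3.5.
print(f"mean of sample means: {x_bars.mean():.3f}")
```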

Sampling distributions for variance

Sample variances tend to be distributed right-skewed about the true parameter as the mean.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • We will now consider how to estimate a population variance \( \sigma^2 \).
    • Suppose there is some population with some numerical measure \( x \) for which we wish to find the true population variance \( \sigma^2 \).
      • For example,
        • the population can be all US adults,
        • the numerical measure can be \( x= \)"age"
        • the true variance would be the variance in the ages.
      • Let’s suppose that we will draw exactly \( n \) observations of the population by random sampling.
  • Suppose we want to replicate this sampling procedure infinitely many times.
    • We can construct a probabilistic model for this replication process with a probability distribution.
  • Formally, we will define \( s^2 \) to be the random variable equal to the variance derived from a random sample of \( n \) observations.
    • For each replication, \( s^2 \) attains a different value based on chance.
  • Then, for random, independent samples, \( s^2 \) tends to follow a right-skewed distribution whose mean is \( \sigma^2 \).
    • We can thus use the value of \( s^2 \) and the distribution of \( s^2 \) to estimate \( \sigma^2 \) and how close we are to it.
  • NOTE: the primary difference between the distribution of sample variances and the previous examples is that this distribution is not normal, though it still has a mean equal to \( \sigma^2 \).

Sampling distributions for variance example

Sample variances tend to be distributed right-skewed about the true parameter as the mean.

Courtesy of Mario Triola, Essentials of Statistics, 6th edition

  • Suppose we consider rolling the fair, six-sided die again.
  • The sampling procedure will once again be to roll the die \( 5 \) times.
  • \( x \) will be the random variable that will attain the value of a single observation in the sample.
  • \( s^2 \) will be the variance of the observations in a given sample (5 rolls).
  • By calculating all the possible outcomes, we can find that \( \sigma^2 \approx 2.9 \), i.e.,
    • over infinitely many rolls of the die, the average squared deviation from the mean is approximately \( 2.9 \).
  • Each time we replicate the sampling procedure (each time we roll the die \( 5 \) times) we obtain a different value for \( s^2 \).
  • In the middle panel of the figure, we have possible values for \( s^2 \) that we record.
  • On the right panel, we see the result from replicating the sampling procedure \( 10,000 \) times.
  • In this case, the distribution of the sample variances \( s^2 \) is distributed right-skewed about \( \sigma^2 \).
  • As will always be the case, the sample variances have an expected value of \( \sigma^2 \), but due to the skewness of the distribution, the mean does not equal the mode.
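
The same simulation idea applies to sample variances; below is a sketch (Python/NumPy assumed), where `ddof=1` selects the usual sample variance with the \( n-1 \) denominator:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Replicate the sampling procedure (5 rolls of a fair die) 10,000 times.
rolls = rng.integers(1, 7, size=(10_000, 5))
s_squared = rolls.var(axis=1, ddof=1)  # one sample variance per replication

# The mean of the sample variances targets sigma^2 = 35/12, about 2.92,
# but the distribution itself is right-skewed rather than normal.
print(f"mean of sample variances: {s_squared.mean():.3f}")
print(f"median of sample variances: {np.median(s_squared):.3f}")  # below the mean, reflecting the skew
```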

Biased versus unbiased estimators

  • In all of the previous examples, we saw how the statistics give sample-based estimates that target the true population parameter of interest.
    • Specifically, over infinitely many sample replications, the expected value of the sample statistic is equal to the true population parameter.
  • More generally, we can consider all possible ways to estimate some population parameter.
  • We will call a statistic that tries to infer a true population parameter an estimator.
  • Not every estimator targets the parameter correctly in the way described above.
    • The statistics in the previous examples are special because they are unbiased estimators.
    • As random variables, the expected values of unbiased estimators equal the true population parameters they estimate.
  • Each of the following:
    1. sample mean \( \overline{x} \);
    2. sample variance \( s^2 \); and
    3. sample proportion \( \hat{p} \)
  • is an unbiased estimator of its corresponding true population parameter.
  • On the other hand, each of the following:
    1. sample standard deviation \( s \);
    2. sample median; and
    3. sample range
  • is biased in its estimate of the corresponding true population parameter; a simulation below illustrates the bias of \( s \).
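
The bias of the sample standard deviation can be seen directly by simulation. The sketch below (Python/NumPy assumed) uses the die-rolling population, for which \( \sigma^2 = 35/12 \approx 2.92 \) and \( \sigma \approx 1.71 \):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

sigma2 = 35 / 12         # true population variance of a fair six-sided die
sigma = sigma2 ** 0.5    # true population standard deviation, about 1.708

# Many replications of 5 die rolls each.
rolls = rng.integers(1, 7, size=(100_000, 5))
s2 = rolls.var(axis=1, ddof=1)  # sample variances
s = rolls.std(axis=1, ddof=1)   # sample standard deviations

# s^2 is unbiased: its average over replications matches sigma^2.
print(f"average s^2: {s2.mean():.3f}  vs  sigma^2 = {sigma2:.3f}")
# s is biased: its average falls systematically below sigma.
print(f"average s:   {s.mean():.3f}  vs  sigma   = {sigma:.3f}")
```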

Central limit theorem

Illustration of the central limit theorem: the distribution of the sample mean approaches a normal distribution as the sample size grows.

Courtesy of Mathieu Rouaud CC BY-SA via Wikimedia Commons

  • The property of the sample means \( \overline{x} \) being approximately normally distributed (for large enough sample sizes) is actually a fundamental and universal property in nature.
  • In fact, the random variable \( x \) can be extremely non-normal as pictured on the left of the figure.
    • Nonetheless, if we draw \( n \) independent samples for \( n \) sufficiently large,
    • the sample mean \( \overline{x} \) will be approximately normally distributed around the true \( \mu \).
    • We also know roughly “how far” a realization of \( \overline{x} \) usually lies from \( \mu \), via the standard deviation of \( \overline{x} \).
  • Formally, this phenomenon is called the Central limit theorem:
  • Let \( x \) be a generic random variable with population mean \( \mu \) and standard deviation \( \sigma \). Suppose for a sample size of \( n \), we compute the sample mean as \( \overline{x} \). Then \( \overline{x} \) as a random variable, with realization determined by independent replicated sampling, will be approximately normally distributed with mean \( \mu \) and standard deviation \( \frac{\sigma}{\sqrt{n}} \), so long as \( n \) is sufficiently large.
  • For many situations, \( n \) needs only to be greater than \( 30 \) for this to hold – however, this is not necessary if \( x \) is already normal.
  • Notice, as \( n \) becomes large, the approximation improves and \( \frac{\sigma}{\sqrt{n}} \) becomes small.
  • Therefore, for large sample sizes \( n \), \( \overline{x} \) tends to give estimates of \( \mu \) which are close because the standard deviation is small.
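
To see this universality concretely, the sketch below (Python/NumPy assumed) starts from a strongly right-skewed exponential random variable and shows that the means of samples of size \( n=40 \) are nonetheless centered at \( \mu \) with standard deviation close to \( \sigma/\sqrt{n} \):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

mu, sigma = 1.0, 1.0   # an exponential with scale 1 has mean 1 and std. dev. 1
n = 40                 # sample size, comfortably above 30
replications = 50_000

# Draw many samples from the (very non-normal) exponential distribution
# and compute the mean of each one.
samples = rng.exponential(scale=1.0, size=(replications, n))
x_bars = samples.mean(axis=1)

print(f"mean of sample means: {x_bars.mean():.4f}  (theory: {mu})")
print(f"std. dev. of sample means: {x_bars.std():.4f}  "
      f"(theory: sigma / sqrt(n) = {sigma / np.sqrt(n):.4f})")
```

A histogram of `x_bars` is close to the bell curve even though the underlying distribution is heavily skewed.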

Central limit theorem example

  • Let’s consider an example of where we can apply the central limit theorem.
    • Suppose that an elevator states that the weight capacity is \( 4000 \) pounds or \( 27 \) passengers.
    • This implies that the mean weight of \( 27 \) randomly selected passengers should be at most \( 148 \) pounds, since \( 4000 / 27 \approx 148.1 \).
    • Adult men in the US have weights that are normally distributed with a mean of \( \mu=189 \) pounds and a standard deviation of \( \sigma=39 \) pounds.
    • Let’s consider a worst-case-scenario for this elevator – suppose that \( 27 \) adult men try to ride this elevator.
    • Consider the following: how can the central limit theorem be used to find the probability that the elevator will allow all these passengers simultaneously?
      • Notice, the problem can be phrased as, “what is the probability that the sample mean will be less than or equal to \( 148 \)?”
      • Let \( x \) be the random variable equal to one adult male’s weight, with mean \( \mu=189 \) and standard deviation \( \sigma=39 \).
      • Then for a sample size of \( 27 \), we can apply the central limit theorem because \( x \) is already normally distributed.
        • Note: when the sample size is less than \( 30 \) and the random variable \( x \) is not normally distributed, we cannot use the central limit theorem.
      • Therefore, we can treat \( \overline{x} \) as normally distributed with mean \( \mu_\overline{x}=189 \) and standard deviation,

        \[ \sigma_\overline{x}=\frac{\sigma}{\sqrt{n}}= \frac{39}{\sqrt{27}} \approx 7.50. \]
      • The question then is, “what is the probability that \( \overline{x} \) is less than or equal to \( 148 \)?”
      • We can study this directly in StatCrunch.
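
As an alternative to StatCrunch, the same probability can be sketched in Python with SciPy (an assumption here; not a tool used in the course):

```python
from math import sqrt

from scipy.stats import norm

mu, sigma, n = 189, 39, 27  # mean and std. dev. of adult male weights; sample size

# Standard deviation of the sample mean, from the central limit theorem.
sigma_xbar = sigma / sqrt(n)       # about 7.50

# P(x-bar <= 148): standardize and evaluate the standard normal CDF.
z = (148 - mu) / sigma_xbar        # about -5.46
print(norm.cdf(z))                 # about 2e-8, essentially zero
```

That is, the mean weight of \( 27 \) adult men is virtually certain to exceed the \( 148 \)-pound limit, so this worst-case scenario overloads the elevator almost surely.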

Key takeaways from the central limit theorem

  • The central limit theorem is a fundamental component of the tools we will develop to finish this course, especially to estimate parameters and their uncertainty.
  • We should remember the following key takeaways about the central limit theorem:
    • We will suppose that we have a generic random variable \( x \) with population mean \( \mu \) and standard deviation \( \sigma \).
    • We suppose that we take simple random samples of size \( n \) of the population;
      • the sample mean \( \overline{x} \) is then a random variable that depends on the replicates of the sampling process.
    • The sample mean \( \overline{x} \) is always an unbiased estimator, so the mean of this random variable satisfies \[ \mu_\overline{x} = \mu. \]
    • However, when \( x \) is normal, or usually when \( n>30 \), we can also say that \( \overline{x} \) is approximately normally distributed around \( \mu \) with standard deviation \[ \sigma_\overline{x} = \frac{\sigma}{\sqrt{n}}. \]
    • When \( x \) is not normally distributed and we have sample sizes \( n \leq 30 \) we cannot use the above property – this is not a good approximation.
    • On the other hand, when \( n \) is very large, this means that \[ \sigma_\overline{x} = \frac{\sigma}{\sqrt{n}} \] is very small.
    • The empirical rule then says that \( \overline{x} \) will usually be very close to \( \mu \) when the sample size \( n \) is large.
    • We can even measure “how close” in terms of standard deviations and the probability that it will lie this close.
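
As a concrete closing illustration (a worked example of the last two points, not taken from the slides): by the empirical rule, roughly \( 95\% \) of realizations of \( \overline{x} \) fall within two standard deviations of \( \mu \),

\[ P\left( \mu - \frac{2\sigma}{\sqrt{n}} \leq \overline{x} \leq \mu + \frac{2\sigma}{\sqrt{n}} \right) \approx 0.95, \]

so with \( \sigma = 39 \) and \( n = 100 \), about \( 95\% \) of sample means lie within \( 2 \times 39 / \sqrt{100} = 7.8 \) units of \( \mu \).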