While the normal distribution is frequently used to describe the underlying distribution of a statistical experiment, asymptotic test statistics are often based on a transformation of a (possibly non-normal) rv.
To get a better understanding of these tests, it is helpful to study the \( \chi^2 \), t- and F-distributions, and their relationships to the normal distribution.
We will begin with the \( \chi^2 \) distribution, describing the sum of the squares of independent standard normal rvs.
If \( Z_i \sim N(0, 1) \), for \( i = 1, \cdots, n \), are independent, then the rv \( X \) given by
\[ \begin{align} X = \sum_{i=1}^n Z_i^2 \sim \chi^2_n \end{align} \] follows the \( \chi^2_n \) distribution with \( n \) degrees of freedom.
This distribution is of particular interest because it describes how the sample variance, as an unbiased estimator, varies about the true parameter: for a normal sample of size \( n \), the scaled sample variance \( (n-1)S^2/\sigma^2 \) follows a \( \chi^2_{n-1} \) distribution.
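As a rough numerical check of this claim (a minimal sketch; the sample size, standard deviation and number of replicates are arbitrary choices), we can simulate many normal samples and compare the scaled sample variances with the theoretical mean and variance of \( \chi^2_{n-1} \):
set.seed(1)                       # arbitrary seed for reproducibility
n = 10                            # sample size (arbitrary choice)
sigma = 2                         # true standard deviation (arbitrary choice)
reps = 10000                      # number of simulated samples
scaled_var = replicate(reps, (n - 1) * var(rnorm(n, mean = 0, sd = sigma)) / sigma^2)
mean(scaled_var)                  # should be close to n - 1 = 9
var(scaled_var)                   # should be close to 2 * (n - 1) = 18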
The standard functions implemented in R for the \( \chi^2 \) distribution are as follows:
dchisq(x, df) is the pdf;
pchisq(q, df) is the cdf;
qchisq(p, df) is the quantile function;
rchisq(n, df) generates a random sample.
As with the other distributions, if log = TRUE is passed to the dchisq function, the log density is computed, which is useful for maximum likelihood estimation. Similar to the functions for the t- and F-distributions, all of these functions also have the parameter ncp, the non-negative non-centrality parameter.
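A few quick calls illustrate how these functions fit together (the degrees of freedom and probability level are arbitrary choices for illustration):
set.seed(1)                          # arbitrary seed for reproducibility
dchisq(3, df = 5)                    # density at x = 3
dchisq(3, df = 5, log = TRUE)        # log density, as used in likelihood computations
pchisq(11.07, df = 5)                # cdf: P(X <= 11.07), approximately 0.95
qchisq(0.95, df = 5)                 # 95% quantile, approximately 11.07
mean(rchisq(10000, df = 5))          # mean of a simulated sample, close to df = 5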
Below we plot how the pdf of the \( \chi^2 \) distribution changes as the number of degrees of freedom increases.
par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
z = seq(0, 50, length = 300)                 # evaluation grid
df = c(5, 10, 15, 25)                        # degrees of freedom to compare
colors = c("black", "red", "green", "blue")
plot(z, dchisq(z, df[1]), type = "l", xlab = "z", ylab = "pdf")
for (i in 2:4) { lines(z, dchisq(z, df[i]), col = colors[i]) }
The shape is qualitatively different for very small numbers of degrees of freedom; below we compare n = 1 and n = 2 over a restricted range.
par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
z = seq(0, 50, length = 300)
m = c(1, 2)                                  # small degrees of freedom
plot(z, dchisq(z, m[1]), type = "l", xlab = "z", ylab = "pdf", xlim = c(0, 10), xaxs = "i", yaxs = "i")
lines(z, dchisq(z, m[2]), col = "blue")
In the first case (n = 1), the vertical axis is an asymptote: the density diverges as z approaches 0 and is not defined at z = 0.
In the second case (n = 2), the curve decreases steadily from the value 0.5 at z = 0.
Using the pchisq function, we can plot the cdf for each number of degrees of freedom n = 5, n = 10, n = 15 and n = 25 as
par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
z = seq(0, 50, length = 300)
df = c(5, 10, 15, 25)
colors = c("red", "green", "blue")
plot(z, pchisq(z, df[1]), type = "l", xlab = "z", ylab = "cdf")
for (i in 2:4) { lines(z, pchisq(z, df[i]), col = colors[i-1]) }
A distinctive feature of the \( \chi^2 \) distribution is that it takes only non-negative values, since it represents a sum of squared values.
Its expectation and variance are given by
\[ \begin{align} \mathbb{E}\left[X\right] = n & & \mathrm{var}\left(X\right) = 2n \end{align} \]
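These values follow directly from the definition: for a standard normal rv, \( \mathbb{E}[Z_i^2] = 1 \) and \( \mathrm{var}(Z_i^2) = 2 \), and summing over the \( n \) independent terms gives \( n \) and \( 2n \). A quick simulation (the degrees of freedom and sample size are arbitrary choices) is consistent with this:
set.seed(2)                       # arbitrary seed for reproducibility
x = rchisq(100000, df = 10)       # simulated chi-square sample with df = 10
mean(x)                           # approximately 10
var(x)                            # approximately 20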
(Figure courtesy of Mario Triola, Essentials of Statistics, 6th edition.)
Slightly more formally, the t-distribution arises as a particular combination of the normal and \( \chi^2 \) distributions.
Let \( X \sim N(0, 1) \) and \( Y \sim \chi^2_n \) be independent rvs; then
\[ \begin{align} Z = \frac{X}{\sqrt{Y/n}} \sim t_{n} \end{align} \]
follows Student's t-distribution with \( n \) degrees of freedom.
The pdf of the t-distribution is
\[ \begin{align} f(z; n) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\, \Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{z^2}{n}\right)^{-\frac{n+1}{2}} \end{align} \]
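As a sanity check (a minimal sketch; the evaluation grid and degrees of freedom are arbitrary choices), this formula can be implemented directly and compared against R's built-in dt function:
t_pdf = function(z, n) {
  # direct implementation of the density formula above
  gamma((n + 1) / 2) / (sqrt(pi * n) * gamma(n / 2)) * (1 + z^2 / n)^(-(n + 1) / 2)
}
z = seq(-4, 4, length = 9)
max(abs(t_pdf(z, n = 5) - dt(z, df = 5)))   # essentially zero, up to round-off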
We plot the density below for n = 1, n = 2 and n = 5 degrees of freedom, with the normal density plotted for reference.
par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
t = seq(-5, 5, length = 300)
colors = c("black", "red", "green")
df = c(1, 2, 5)                              # degrees of freedom (df) for the t-distribution
plot(t, dnorm(t, 0, 1), xlab = "t", ylab = "pdf", type = "l", lwd = 2, col = "blue")
for (i in 1:3) { lines(t, dt(t, df[i]), col = colors[i]) }
For \( n > 2 \) degrees of freedom, the expectation and variance of Student’s t-distribution are
\[ \begin{align} \mathbb{E}\left[Z\right] = 0 & & \mathrm{var}\left(Z\right) = \frac{n}{n-2} \end{align} \]
By symmetry, the skewness of Student's t-distribution is zero whenever it is defined, i.e., for \( n > 3 \).
The quantiles of a t-distributed rv \( Z \) are denoted by \( t_p \), and, due to symmetry, \( t_p = -t_{1-p} \).
Thus, when stating the hypothesis for a two-sided test, the critical values are used with given significance levels \( \alpha \) as follows:
\[ P\left(\vert Z\vert > \vert t_{\alpha/2}\vert \right) = \alpha \]
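For example (with an arbitrary significance level \( \alpha = 0.05 \) and n = 10 degrees of freedom), the two-sided critical value and the symmetry relation can be checked with qt and pt:
alpha = 0.05
n = 10
t_crit = qt(1 - alpha / 2, df = n)          # two-sided critical value, approximately 2.228
qt(alpha / 2, df = n)                       # equals -t_crit by the symmetry t_p = -t_{1-p}
pt(t_crit, df = n) - pt(-t_crit, df = n)    # P(|Z| <= t_crit) = 1 - alpha = 0.95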
While Student's t-distribution is often used for analyzing the means of samples, we may also want to compare the variances of multiple samples.
Knowing that the (scaled) sample variances of normal samples follow \( \chi^2 \) distributions, this motivates the definition in the following.
The rv \( Z \) has the Fisher–Snedecor distribution (F-distribution) with \( n \) and \( m \) degrees of freedom if
\[ \begin{align} Z = \frac{\chi^2(n)/ n}{\chi^2(m)/m} \end{align} \]
where \( \chi^2(n) \sim \chi^2_n \) and \( \chi^2(m) \sim \chi^2_m \) are independent rvs.
The pdf and the cdf of the F-distribution are rather complicated, and we omit their explicit forms.
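As a rough empirical check of this definition (a minimal sketch with arbitrarily chosen degrees of freedom), we can simulate the ratio of two independent scaled \( \chi^2 \) rvs and compare the result with the built-in F cdf pf:
set.seed(3)                                  # arbitrary seed for reproducibility
n = 4; m = 8                                 # arbitrary degrees of freedom
ratio = (rchisq(10000, n) / n) / (rchisq(10000, m) / m)
mean(ratio <= 2)                             # empirical estimate of P(Z <= 2)
pf(2, df1 = n, df2 = m)                      # theoretical value, should be close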
The procedures in R dedicated to this distribution require the parameters \( n \) and \( m \) as well:
df(x, df1, df2) is the pdf;
pf(q, df1, df2) is the cdf;
qf(p, df1, df2) is the quantile function;
rf(n, df1, df2) generates a random sample.
Here the parameters df1 and df2 are the two degrees-of-freedom parameters. As with the other distributions, if log = TRUE is passed to the df function, the log density is computed, which is useful for maximum likelihood estimation. Similar to the functions for the \( \chi^2 \) and t-distributions, all of the above functions also have the non-centrality parameter ncp.
Below we plot the pdf of the F-distribution for several pairs of degrees of freedom (n, m).
par(cex = 2.0, mar = c(5, 4, 4, 2) + 0.3)
z = seq(0, 5, length = 300)
n = c(1, 2, 3, 50)                           # numerator degrees of freedom
m = c(1, 6, 10, 50)                          # denominator degrees of freedom
colors = c("black", "red", "green", "blue")
plot(z, df(z, n[1], m[1]), type = "l", xlab = "z", ylab = "pdf", ylim = c(0, 1.5))
for (i in 2:4) { lines(z, df(z, n[i], m[i]), col = colors[i]) }
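As a usage example (the significance level and degrees of freedom are arbitrary choices for illustration), the upper critical value of a one-sided variance-ratio test can be obtained with qf:
alpha = 0.05
f_crit = qf(1 - alpha, df1 = 9, df2 = 12)    # upper critical value for n = 9 and m = 12
f_crit
1 - pf(f_crit, df1 = 9, df2 = 12)            # equals alpha by construction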