10/19/2020
FAIR USE ACT DISCLAIMER: This site is for educational purposes only. This website may contain copyrighted material, the use of which has not been specifically authorized by the copyright holders. The material is made available on this website as a way to advance teaching, and copyright-protected materials are used to the extent necessary to make this class function in a distance learning environment. Under section 107 of the Copyright Act of 1976, allowance is made for “fair use” for purposes such as criticism, comment, news reporting, teaching, scholarship, education and research.
all rely on several assumptions, e.g., the conditions for the Gauss Markov theorem and usually Gaussianity of the errors.
Methods for checking and validating these assumptions are known as diagnostics.
Typically, we will start with one model as a best first guess.
Performing diagnostics will reveal issues in the model, and suggest ways for improvement.
Building a model is thus usually an interactive, iterative process, where we will create and perform diagnostics over a succession of models.
Courtesy of: Kutner, M. et al. Applied Linear Statistical Models 5th Edition
Courtesy of Inductiveload via Wikimedia Commons
Given two probability measures \( P_1,P_2 \) with respective CDFs \( C_1,C_2 \), we can define their theoretical Q-Q plot as the graph of the \( \mathbb{R}^2 \)-valued function,
\[ \begin{align} G: [0,1] &\rightarrow \mathbb{R}^2 \\ p &\mapsto \left(C_1^{-1}(p), C_2^{-1}(p)\right) \end{align} \]
Q: in the above, what does the point \( (x_1,x_2) \) correspond to?
Q: What kind of shape do we expect for the plot when the two CDFs are equal?
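Suppose that \( P_1,P_2 \) belong to the same family of probability measures, such that their CDFs differ only by location and scale, i.e.,
\[ \begin{align} C_2(x) = C_1\left(\frac{x - \mu}{\sigma}\right), \quad \sigma > 0, \end{align} \]
so that \( C_2^{-1}(p) = \mu + \sigma C_1^{-1}(p) \).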
Therefore, if two distributions differ only in location and scale,
the Q-Q plot is just a straight line with slope \( \sigma \) and intercept \( \mu \).
For this reason, when making a Q-Q plot, the CDFs (and/or the data) are typically standardized to have mean zero and variance one.
By doing so, when comparing two distributions in the same family (as above), the Q-Q plot will just be the central diagonal of the plane.
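The straight-line behaviour described above can be checked numerically. The following is a minimal sketch, assuming two normal distributions and using only Python's standard library; the values of `mu` and `sigma` and the probability grid are illustrative assumptions, not taken from the lecture.

```python
from statistics import NormalDist

# Assumed example values, chosen only for illustration.
mu, sigma = 2.0, 3.0
ref = NormalDist(0.0, 1.0)        # plays the role of C_1
shifted = NormalDist(mu, sigma)   # plays the role of C_2

# Evaluate G(p) = (C_1^{-1}(p), C_2^{-1}(p)) on a grid of interior probabilities.
probs = [i / 20 for i in range(1, 20)]
points = [(ref.inv_cdf(p), shifted.inv_cdf(p)) for p in probs]

# Slope and intercept recovered from the first and last points of the graph.
(x_a, y_a), (x_b, y_b) = points[0], points[-1]
slope = (y_b - y_a) / (x_b - x_a)
intercept = y_a - slope * x_a

# After standardizing the second coordinate, the graph lies on the diagonal.
standardized = [(x, (y - mu) / sigma) for x, y in points]
max_gap = max(abs(x - y) for x, y in standardized)

print(slope, intercept, max_gap)
```

Up to floating-point error, the recovered slope and intercept should match `sigma` and `mu`, and `max_gap` should be essentially zero.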
With a sample of size \( n \), sorted as \( x_{(1)} \leq \cdots \leq x_{(n)} \), we can make \( n \) plotting positions that pair the inverse of the theoretical CDF with the empirical quantiles, i.e., we can plot the points,
\[ \begin{align} \left\{\left(C_1^{-1}\left(\frac{i}{n+1}\right), x_{(i)}\right) \in \mathbb{R}^2: i = 1,\cdots,n \right\} \end{align} \]
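The plotting positions above can be sketched in a few lines. This is an illustrative example only: the helper name `qq_points`, the `standardize` flag, the simulated sample, and the choice of a standard normal as the theoretical CDF are all assumptions made for the sketch. The sample is sorted before pairing, so the \( i \)-th smallest observation is matched with the theoretical quantile at \( i/(n+1) \).

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(sample, dist=NormalDist(0.0, 1.0), standardize=True):
    """Pair theoretical quantiles at i/(n+1) with the sorted sample values."""
    n = len(sample)
    values = list(sample)
    if standardize:
        # Center and scale the data to mean zero and variance one.
        m, s = mean(values), stdev(values)
        values = [(x - m) / s for x in values]
    ordered = sorted(values)
    return [(dist.inv_cdf(i / (n + 1)), x) for i, x in enumerate(ordered, start=1)]

random.seed(0)  # fixed seed so the sketch is reproducible
sample = [random.gauss(5.0, 2.0) for _ in range(200)]  # simulated data, assumed
points = qq_points(sample)
```

Passing `points` to any scatter-plot routine gives the empirical Q-Q plot; for data that really are Gaussian, the points should fall close to the diagonal.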
\[ \begin{align} H_0: C_1 = C_2 \\ H_1: C_1 \neq C_2 \end{align} \]
Using the empirical versus theoretical CDF in the Q-Q plot is very close in spirit to the Kolmogorov-Smirnov test of these hypotheses.
The Kolmogorov-Smirnov test follows a similar principle and can be used generally to evaluate the divergence of the empirical CDF of a sample of observations from a hypothesized CDF.
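To make the notion of divergence concrete, here is a minimal sketch of the one-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the empirical CDF and a hypothesized CDF. The function name `ks_statistic` and the simulated samples are assumptions for illustration, and the sketch computes only the statistic, not a p-value.

```python
import random
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered, start=1):
        c = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        gap = max(gap, i / n - c, c - (i - 1) / n)
    return gap

random.seed(1)  # fixed seed so the sketch is reproducible
normal_sample = [random.gauss(0.0, 1.0) for _ in range(500)]  # simulated, assumed
shifted_sample = [x + 1.0 for x in normal_sample]

std_normal = NormalDist(0.0, 1.0)
d_null = ks_statistic(normal_sample, std_normal.cdf)    # sample matches the hypothesized CDF
d_shift = ks_statistic(shifted_sample, std_normal.cdf)  # sample is shifted away from it
```

A sample drawn from the hypothesized distribution should give a small statistic, while the shifted sample should give a noticeably larger one.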