Midterm 1 Study Guide

STAT 757 – Section 1001
Instructor: Colin Grudzien
Midterm Time: 09/23/2020 – 1:00 - 2:15 PM

Instructions

Midterm 1 will be a proctored exam through Zoom. You are expected to attend the regular class session in Zoom with your camera on and your work space visible to the camera. Canvas will log your browser behavior, and you are expected to remain in the Canvas Midterm 1 window at all times during the exam. A record of leaving this page may be considered cheating. You may use any hard copies of resources that you want, including books, notes and print-outs. To prevent communication between test takers and others, no electronic resources will be allowed on the exam except with special approval by the instructor. It is recommended that you take notes from the lectures based on each of the following conceptual questions. In addition, you should study all of the quiz and activity questions.

Problems:

Problem 1:

State the form of the standard linear regression model, either in matrix form or in a scalar equation. State the assumptions of the Gauss-Markov theorem. Identify which parts of the equation are known or unknown, and which are deterministic or random in the standard model. Given a case, and possibly some knowledge of the parameters, you should be able to make a simple numeric calculation using the regression equation. You should understand the meaning of the parameters in the simple regression formulation and the implications for the regression function for special values.
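
As an illustration of the kind of simple numeric calculation mentioned above, here is a minimal sketch that evaluates the simple regression mean function at a given predictor value. The function name `mean_function` and the parameter values are hypothetical, chosen only for illustration, and the sketch assumes the simple regression form with one intercept and one slope.

```python
def mean_function(x, beta0, beta1):
    """Mean function E[Y | X = x] = beta0 + beta1 * x for simple regression."""
    return beta0 + beta1 * x


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1 = 2.0, 0.5

mean_at_4 = mean_function(4.0, beta0, beta1)  # 2.0 + 0.5 * 4.0
mean_at_0 = mean_function(0.0, beta0, beta1)  # at x = 0 the mean equals the intercept
flat_mean = mean_function(4.0, beta0, 0.0)    # with slope 0 the mean does not depend on x

print(mean_at_4, mean_at_0, flat_mean)
```

The last two calls show two special cases: at \(x = 0\) the mean function returns the intercept, and a slope of zero makes the mean function constant in \(x\).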

Problem 2:

Let \(\hat{\boldsymbol{\beta}}\) be the vector of parameters estimated by least squares. What are the expected value and the covariance of \(\hat{\boldsymbol{\beta}}\)? Explain in what sense the least squares estimated parameters are an optimal estimate.
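
For reference while studying, the standard textbook results can be written as below. The notation is an assumption and is not fixed by this guide: \(\mathbf{X}\) is the design matrix (assumed to have full column rank), \(\mathbf{Y}\) is the vector of observed responses, and \(\sigma^2\) is the common variance of the uncorrelated, mean-zero errors.

\[
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1}\mathbf{X}^\mathrm{T}\mathbf{Y},
\qquad
\mathbb{E}\left[\hat{\boldsymbol{\beta}}\right] = \boldsymbol{\beta},
\qquad
\mathrm{cov}\left(\hat{\boldsymbol{\beta}}\right) = \sigma^2 \left(\mathbf{X}^\mathrm{T}\mathbf{X}\right)^{-1}
\]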

Problem 3:

State the form of the estimated mean function for the standard model. What mathematical quantity describes the difference between the fitted values and the observed cases? What is the expected value of the difference between our fitted values and the observed cases?

Problem 4:

We suppose that the errors in the standard model in Problem 1 are also Gaussian. Explain which of the quantities in Problems 1 - 3 are also Gaussian random variables. You do not need to include equations, but you should explain with simple reasoning why these are Gaussian random variables.

Problem 5:

We suppose that the errors in the standard model in Problem 1 are also Gaussian. What can we state in addition to the Gauss-Markov theorem about the least squares estimated \(\hat{\boldsymbol{\beta}}\)? Explain why a Gaussian error approximation might be reasonable, if not always strictly accurate. State the name of a key theorem, and under what conditions it should apply.

Problem 6:

Suppose that instead of the standard model in Problem 1, we suppose \(X,Y\) are jointly Gaussian distributed random variables. Explain in what sense we can create a linear regression model in this context. Specifically, what does the estimated mean function represent in this context?

Problem 7:

Explain the meaning of each of the \(RSS\), \(TSS\) and \(ESS\). What is (broadly speaking) the implication for our regression analysis if

the \(RSS\) is large or the \(RSS\) is small;
the \(TSS\) is large or the \(TSS\) is small;
the \(ESS\) is large or the \(ESS\) is small?

Problem 8:

What is \(R^2\)? What is its possible range of values and what does each end correspond to in terms of goodness of fit and / or the relationship to \(RSS/TSS/ESS\)?
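
To make these quantities concrete, the following is a minimal sketch that computes them for a small simple regression with an intercept. It assumes that \(ESS\) denotes the explained sum of squares, and the data values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix with an intercept column, then the least squares fit.
X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

rss = np.sum((y - y_hat) ** 2)         # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)      # total sum of squares about the mean
ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares (assumed meaning of ESS)
r_squared = 1.0 - rss / tss

print(rss, tss, ess, r_squared)
```

With an intercept in the model, the decomposition \(TSS = ESS + RSS\) holds, so \(R^2\) can equivalently be computed as \(ESS / TSS\).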

Problem 9:

Explain how many degrees of freedom are available in a simple regression model with \(n\) observations. Where do the constraints come from?

Problem 10:

Suppose we have \(n=3\) cases of data \(\left\{ \left(Y_i, X_i\right)\right\}_{i=1}^3\). Construct the design matrix and explain what each column corresponds to.
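
A minimal sketch of this construction for simple regression with an intercept is given below. The predictor values are hypothetical placeholders for \(X_1, X_2, X_3\), and the variable name `design` is chosen only for illustration.

```python
import numpy as np

# Hypothetical predictor values standing in for X_1, X_2, X_3.
x = np.array([1.5, 2.0, 3.5])

# First column: all ones, multiplying the intercept parameter.
# Second column: the observed predictor values, multiplying the slope parameter.
design = np.column_stack([np.ones(3), x])

print(design)
print(design.shape)  # one row per case, one column per parameter
```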

Problem 11:

What issues do we encounter in regression analysis if the number of cases \(n\) is equal to the number of parameters \(p\)? What is (potentially) the behavior of the fitted values?
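
The following minimal sketch illustrates one such case, with \(n = p = 3\) and an invertible design matrix. The data are hypothetical, and the sketch assumes an intercept plus two predictors.

```python
import numpy as np

# Hypothetical square design matrix: intercept column plus two predictors.
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 5.0],
])

# Arbitrary hypothetical responses.
y = np.array([0.3, -1.2, 4.0])

# With n = p and X invertible, the normal equations reduce to X beta = y.
beta_hat = np.linalg.solve(X, y)
y_hat = X @ beta_hat
rss = np.sum((y - y_hat) ** 2)

print(y_hat, rss)
```

Here the fitted values reproduce the observations up to rounding error whatever the responses are, so the residuals carry no information about the error variance.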

Problem 12:

You should be familiar with some basic matrix algebra. You should be familiar with the difference between an inner product and an outer product of vectors, the covariance matrix, the norm of a vector and how these quantities relate to our regression formulation in vectors and matrices.
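
As a quick refresher, here is a minimal sketch of these operations with NumPy. The vectors `u` and `v` are hypothetical and chosen only so that the arithmetic is easy to check by hand.

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, -1.0])

inner = u @ v               # inner product: a scalar, sum of u_i * v_i
outer = np.outer(u, v)      # outer product: a 3 x 3 matrix with entries u_i * v_j
norm_u = np.linalg.norm(u)  # Euclidean norm: square root of the inner product of u with itself

print(inner)
print(outer)
print(norm_u)
```

In the regression setting, the residual sum of squares is the squared norm of the residual vector, that is, the inner product of the residual vector with itself.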

Problem 13:

Explain the concept of orthogonality and how it arises in our regression formulation in matrices. You should be able to identify different orthogonal quantities by our assumptions and you should have a working understanding of projection operators and their properties.
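
The properties of the projection onto the column space of the design matrix can be checked numerically, as in the minimal sketch below. The data are hypothetical, the name `H` for the projection (hat) matrix is an assumed notation, and the explicit matrix inverse is used only for readability on a tiny example.

```python
import numpy as np

# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])

# Orthogonal projection onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ y          # fitted values: projection of y onto the column space
residuals = y - y_hat  # component of y orthogonal to the column space

print(np.allclose(H, H.T))              # symmetric
print(np.allclose(H @ H, H))            # idempotent: projecting twice changes nothing
print(np.allclose(X.T @ residuals, 0))  # residuals orthogonal to every column of X
print(np.isclose(residuals @ y_hat, 0)) # residuals orthogonal to the fitted values
```

Because the first column of the design matrix is all ones, orthogonality of the residuals to that column also means the residuals sum to zero.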