STAT 757 – Section 1001
Instructor: Colin Grudzien
Due: 12/14/2019 – 12:00 PM
You may work with others on this project but you must turn in your own work. You may type your solutions in any way you like (LaTeX, Markdown, Office, etc…) as long as you present your work clearly and in an organized way. Unless otherwise specified, you must hand in a printed copy of your work at the beginning of class. Whenever plotting:
- Your plot must be clearly labeled in all axes, legends, and the plot must include a clear title.
- The plot must be sensible and easy to read.
For the final project, you are to revise the work of your midterm project using the more advanced perspective and tools that we have developed by the end of the course. You will once again write a research report in RMarkdown. The report must be uploaded in a PDF format to Webcampus. The report should be around 10-12 pages, including figures, but not including any references or appendices in this page count. Include figures and tables for the most important components of your analysis, and for explanation purposes.
Your corresponding code and work should be included in the final appendix, section 8; I reserve the right to request a copy of the original analysis. If there isn’t sufficient documentation in the appendix and this cannot be provided by the student on request, the midterm will not receive any credit. Cases of plagiarism will be handled in accordance with the policy on academic dishonesty. Your report should be written clearly and structured as follows:
The rubric below describes the work required in each category, and the associated points, to receive full credit on this assignment. Reports that do not address all of these points, or give inadequate attention to these points, will receive partial credit. Adequate attention is contextual and subjective, based on the problem itself and the overall work performed in the report. Students are encouraged to discuss a rough draft of their report with the instructor to get feedback on how to better address these points. Additionally, reports that do not follow the document outline, do not use clear language, contain formatting or writing errors, or include unprofessional figures may be penalized for some of the points below.
| Category | Expected results | Total points |
|---|---|---|
| Research question | The student effectively discusses their research question. The student clearly describes why this question is relevant and interesting. | 10 points |
| Diagnostics | The student effectively discusses whether the standard regression hypotheses seem to be satisfied for the midterm model based on statistical and visual tests. Hypotheses that do not seem to be fully satisfied are discussed, and the implications of these hypotheses failing are qualified to the reader in the analysis. | 12 points |
| Remediation | The student discusses what remedial measures have been tried to obtain a better model. Statistical tests such as, e.g., Box-Cox should be used to discuss the rationale for transformation of variables or not. The student uses diagnostics to evaluate if remedial measures have improved the suitability of the model. | 12 points |
| Goodness of fit and sources of uncertainty | The student discusses measures of goodness of fit such as adjusted \(R^2\) and how this value is interpreted in this problem. Differences between the model from the midterm and the model selected in the final (if any) are discussed and are connected to the research problem. If the model selected is the same, the student discusses if there appears to be an identifiable linear signal in the data, or if the methods are inconclusive. | 12 points |
| Predictive power | The student discusses differences (if any) between predictions of the model from the midterm and the model in the final exam. The student discusses if the confidence intervals are practical / actionable for the research question. The student discusses the cross validation RMSE of the midterm and the final model, and if there are substantial differences. The student discusses if the confidence intervals for predicting new cases provide similar results to the cross validation RMSE. | 12 points |
| Explanatory power | The student effectively interprets the effects of the predictors on the response variable, in terms of the sign of the parameter, the size of the parameter relative to the scales of the variables, and the relative importance of different effects in the model. The student discusses possible confounding variables, effects of correlations between predictors and if the interpretation of parameters is stable across multiple models. | 12 points |
| Discussion and Conclusion | The student effectively summarizes their work in the project and draws final connections between the statistical relationships observed and the research question posed at the beginning. The student discusses how the original research question might be revised and what the research question will be for the final. | 10 points |
| Grand total | All of the above | 80/80 points |
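For reference, the two numerical summaries named in the rubric are usually defined as below. The notation is an assumption made here for illustration and may differ from the course notes: \(n\) is the number of observations, \(p\) is the number of predictors not counting the intercept, \(K\) is the number of cross validation folds, and \(\hat{y}_i^{(-k)}\) is the prediction for case \(i\) from the model fitted with fold \(k\) (the fold containing case \(i\)) held out.

\[
R^2_{\mathrm{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
\mathrm{RMSE}_{\mathrm{CV}} = \sqrt{\frac{1}{n}\sum_{k=1}^{K}\;\sum_{i \in \text{fold } k}\left(y_i - \hat{y}_i^{(-k)}\right)^2}
\]

Because the cross validation RMSE is on the scale of the response, it can be compared directly with the width of the intervals for predicting new cases.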
In addition, reports that fail to follow the instructions of this assignment, the structure for the proposal, or to meet standards of scientific writing may be subject to a loss of points.