Instructor: Colin Grudzien

## Instructions: We will work through the following series of activities as a group and hold small group work and discussions in Zoom Breakout Rooms. Follow the instructions in each sub-section when the instructor assigns you to a breakout room. ## Activities: ```{r} require("numDeriv") require("optimx") require("ggplot2") ``` ## Activity 1: The function `optim` provides algorithms for general-purpose optimizations. We will examine how to use this on a direct problem of minimizing the residual sum of squares. Suppose we are given the following data in x-y pairs as measurements of some relationship. ```{r} dat=data.frame(x=c(1,2,3,4,5,6), y=c(1,3,5,6,8,12)) ``` ### Question 1: Generate a plot of the data to determine if a linear relationship between the data points is visually feasible. ### Question 2: Next, we will define a function that calculates the residual sum of squares given input data and a selection of two different parameter values: `y = par[1] + par[2] * x`. In this expression, `x` and `y` are to be called from the dataframe as defined earlier. ```{r} min_RSS <- function(data, par) { with(data, sum((par[1] + par[2] * x - y)^2)) } ``` By visual inspection above, we can deduce that a line that would look reasonable between the data points might have a slope between $[1,3]$, and an intercept between $[-4, 2]$ (as very rough guesses). Rather than performing an optimization as with, e.g., Newton's method, let's consider the "brute force" approach. This means, let's just sample a dense grid for the possible values of the slope and intercept and plot what the residual sum of squares output is for each combination of parameters. This is a very small problem, so this is feasible, but this will also be expensive enough to demonstrate why using the geometry as with Newton's method is useful in practice. Define these parameter ranges, and use a step size of `by=0.05`. Then use the `expand.grid` function to create a Cartesian grid of all parameter combinations. Finally, construct a dataframe that includes the columns `RSS`, `intercept` and `slope`, where we store the output of the min_RSS function in the RSS column for each associated pair of parameters for the line equation. We will use the visualization code provided below to examine the shape of the RSS cost function. After visualizing this, try changing the step size to `by=0.01`. What do you notice about the program speed? ```{r, eval=FALSE} ggplot(data = RSS_df)+ geom_tile( # visualize the data in tiles aes( x = intercept, y = slope, fill = RSS))+ # color of the tile is the RSS column in the data scale_fill_gradient( # adjust the fill color of the tiles low = "blue", high = "orange", trans= "log10" # set the scale to log base 10 RSS )+ labs( # labels x = "Intercept", y = "Slope", title = "RSS versus parameters", fill = "log_10-RSS" # legend title ) ``` ### Question 3: Now, we will use the above function and data with `optim` in order to solve this in a more efficient way. `optim` minimizes a function by varying its parameters and / or using the geometry of the cost function above. In particular, the default method is known as Nelder-Mead, which is a "systematic brute force method", using a smarter search than a simple grid, known a as symplex search. An example of this is illustrated from the [Wikipedia article on the method](https://upload.wikimedia.org/wikipedia/commons/d/de/Nelder-Mead_Himmelblau.gif). However, we can also supply an optimization routine such as `BFGS` that takes into account the geometry. `optim` like `integrate` allows us to pass parameters for the function object as additional keyword arguments. Try using `optim` with the `BFGS` technique and with the default `Nelder-Mead` technique to find the optimum parameters. Compare the results with the previous heat plot and compare the run times of the two methods. Use ```{r} initial_guess <- c(2, 3) ``` as the initial guess. Try changing the initial guess, how is the run time related to the geometry? How are the function and gradient calls related to the method? ### Question 4: In the problem we are considering, there is actually a direct solution that follows from linear least-squares theory and classical linear regression. Notice the `lm` function can be used to regress on the `y` values with the `x` values in the `dat` dataframe. Try using this solution and compare the outputs.