# library() stops with an error if a package is missing, whereas require() only warns
library(gapminder)
library(ggplot2)
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
The first thing we do is call the ggplot function.
This function lets R know that we're creating a new plot, and any of the arguments we give the ggplot function are the global options for the plot:
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
We've passed in two arguments to ggplot.
First, we tell ggplot what data we want to show on our figure, in this example the gapminder data we loaded earlier.
For the second argument, we passed in the aes function, which tells ggplot how variables in the data map to aesthetic properties of the figure.
Here we told ggplot we want to plot the “gdpPercap” column on the x-axis and the “lifeExp” column on the y-axis.
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
geom_point()
Notice that we didn't need to pass these columns to aes explicitly (e.g. x = gapminder[, "gdpPercap"]); ggplot will look in the dataframe for a column with that name.
On its own, a call to ggplot isn't enough to draw a figure:
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
We need to tell ggplot how we want to visually represent the data, which we do by adding a new geom layer.
In our example, we used geom_point, which tells ggplot we want to visually represent the relationship between x and y as a scatterplot of points.
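One way to see that geoms are layers added to a plot object is to count the layers before and after adding geom_point. This is a minimal sketch; the variable names (base_plot, scatter_plot, n_layers_before, n_layers_after) are illustrative, and it assumes the layers of a ggplot object can be read with $layers, as in ggplot2 3.x.

```r
library(gapminder)
library(ggplot2)

# Global options only: data and aesthetic mappings, but no geom yet
base_plot <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
n_layers_before <- length(base_plot$layers)

# Adding geom_point() appends one layer to the plot object
scatter_plot <- base_plot + geom_point()
n_layers_after <- length(scatter_plot$layers)

print(c(before = n_layers_before, after = n_layers_after))
```

With no layers, ggplot has axes and scales to work with but nothing to draw inside the panel.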
A scatterplot probably isn't the best choice for visualizing change over time.
Instead, let's tell ggplot to visualize the data as a line plot:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line()
Instead of adding a geom_point layer, we've added a geom_line layer.
We've added the group aesthetic, which tells ggplot to draw a separate line for each country.
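To check how many separate lines the grouping produces, we can inspect the data ggplot2 computes for the layer. This is a sketch that assumes the group aesthetic is mapped to country; line_plot, computed, n_lines and n_countries are illustrative names.

```r
library(gapminder)
library(ggplot2)

line_plot <- ggplot(data = gapminder,
                    mapping = aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line()

# layer_data() returns the data frame ggplot2 built for a layer (here, layer 1),
# including a group column with one id per line
computed <- layer_data(line_plot, 1)
n_lines <- length(unique(computed$group))

# Compare with the number of distinct countries in the data
n_countries <- length(unique(gapminder$country))
print(c(lines = n_lines, countries = n_countries))
```

The two counts should agree: one line per country.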
Q: what do you think we can do to visualize both lines and points on the plot?
A: We can add another layer to the plot:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() + geom_point()
It's important to note that each layer is drawn on top of the previous layer.
In this example, the points have been drawn on top of the lines.
Here's a demonstration:
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = country)) +
  geom_line(mapping = aes(color = continent)) + geom_point()
In this example, the aesthetic mapping of color has been moved from the global plot options in ggplot to the geom_line layer so it no longer applies to the points.
So far, we've seen how to use an aesthetic (such as color) as a mapping to a variable in the data.
For example, when we use geom_line(mapping = aes(color=continent)), ggplot will give a different color to each continent.
But what if we want to change the color of all lines to blue?
You might expect geom_line(mapping = aes(color="blue")) to work, but it doesn't: inside aes(), "blue" is treated as a constant data value and mapped through the default color scale, so the lines are not drawn in blue.
Since we don't want to create a mapping to a specific variable, we can move the color specification outside of the aes() function, like this: geom_line(color="blue").
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, group = country)) +
  geom_line(mapping = aes(color = continent)) + geom_line(color = "blue")
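The difference between mapping and setting a color can be checked by looking at the colors ggplot2 actually assigns to the layer. This is a minimal sketch; base_plot, mapped, fixed, mapped_colours and fixed_colours are illustrative names.

```r
library(gapminder)
library(ggplot2)

base_plot <- ggplot(data = gapminder,
                    mapping = aes(x = year, y = lifeExp, group = country))

# Inside aes(): "blue" is a constant data value, passed through the color scale
mapped <- base_plot + geom_line(mapping = aes(color = "blue"))

# Outside aes(): "blue" is used directly as the line color
fixed <- base_plot + geom_line(color = "blue")

# layer_data() exposes the final colour column for each layer
mapped_colours <- unique(layer_data(mapped, 1)$colour)
fixed_colours <- unique(layer_data(fixed, 1)$colour)

print(mapped_colours)  # a color chosen by the default scale, not "blue"
print(fixed_colours)
```

Only the second version stores "blue" as the drawn color; the first also produces a legend with a single entry labelled "blue".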
ggplot2 also makes it easy to overlay statistical models over the data. To demonstrate, we'll go back to our first example:
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
We can change the scale of units on the x axis using the scale functions.
These control the mapping between the data values and visual values of an aesthetic.
We can also modify the transparency of the points using the alpha argument, which is especially helpful when you have a large amount of data that is tightly clustered.
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.5) + scale_x_log10()
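The scale_x_log10 function applied a log10 transformation to the values of the gdpPercap column before rendering them on the plot, so that each multiple of 10 now corresponds to an increase of only 1 on the transformed scale. For example, a GDP per capita of 1,000 sits at position 3 and a value of 10,000 sits at position 4, although the axis labels still show the original values.
This makes it easier to visualize the spread of data on the x-axis.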
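The effect of the transformation can be worked through with a few values. This is a small illustrative calculation in base R; the GDP values are made up for the example, not taken from the gapminder data.

```r
# Hypothetical GDP per capita values, each 10 times the previous one
gdp_values <- c(100, 1000, 10000, 100000)

# Positions on the log10-transformed axis
log_positions <- log10(gdp_values)
print(log_positions)        # 2 3 4 5

# Each tenfold increase moves the point by exactly one unit
step_sizes <- diff(log_positions)
print(step_sizes)           # 1 1 1
```

Equal spacing on the transformed axis therefore corresponds to equal ratios, not equal differences, in GDP per capita.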
We can fit a simple relationship to the data by adding another layer, geom_smooth:
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + scale_x_log10() + geom_smooth(method = "lm")
The lm method refers to the fact that we are using the standard “linear model” regression function built into R.
The regression line is meant to describe the trend between an increase in income (on a log-10 scale) and the associated increase in life expectancy.
The lm function can be used directly as follows:
my_linear_model_object <- lm(lifeExp ~ log10(gdpPercap), data = gapminder)
Notice in the above expression that we use the special ~ operator to give the function a model formula: the response variable goes on the left of the ~ and the predictors go on the right.
This is meant to represent the formula
\[ Y_{\text{Life expectancy}} = \beta_0 + \beta_1 \log_{10}\left( X_{\text{GDP per capita}}\right) + \epsilon \]
The linear model is an object that has certain built-in methods that are standard in regression analysis.
summary(my_linear_model_object)
Call:
lm(formula = lifeExp ~ log10(gdpPercap), data = gapminder)
Residuals:
    Min      1Q  Median      3Q     Max
-32.778  -4.204   1.212   4.658  19.285
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -9.1009     1.2277  -7.413 1.93e-13 ***
log10(gdpPercap)  19.3534     0.3425  56.500  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.62 on 1702 degrees of freedom
Multiple R-squared: 0.6522, Adjusted R-squared: 0.652
F-statistic: 3192 on 1 and 1702 DF, p-value: < 2.2e-16
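The quantities printed in the summary above can also be extracted from the model object and used for prediction. This is a sketch using standard base R accessors (coef, summary, predict); the GDP per capita of 10,000 is a hypothetical input chosen for illustration, and the names coefficients_estimated, r_squared, new_country and predicted_life_exp are illustrative.

```r
library(gapminder)

my_linear_model_object <- lm(lifeExp ~ log10(gdpPercap), data = gapminder)

# Named vector holding the intercept and the slope for log10(gdpPercap)
coefficients_estimated <- coef(my_linear_model_object)
print(round(coefficients_estimated, 4))

# R-squared as reported in the summary
r_squared <- summary(my_linear_model_object)$r.squared
print(round(r_squared, 4))

# Predicted life expectancy for a hypothetical GDP per capita of 10,000:
# roughly -9.1009 + 19.3534 * log10(10000) = -9.1009 + 19.3534 * 4
new_country <- data.frame(gdpPercap = 10000)
predicted_life_exp <- predict(my_linear_model_object, newdata = new_country)
print(predicted_life_exp)
```

Using the rounded coefficients from the summary, the hand calculation gives about 68.3 years, which is what predict should return up to rounding.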
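Notice that our formula appears as the formula keyword argument in the Call section of the summary above.
Note also the estimated coefficient for log10(gdpPercap) (19.3534) and the \( R^2 \) value (0.6522) above.
We can make the line thicker by setting the size aesthetic in the geom_smooth layer:
ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point() + scale_x_log10() + geom_smooth(method = "lm", size = 1.5)
There are two ways an aesthetic can be specified. Here we set the size aesthetic by passing it as an argument to geom_smooth.
Previously in the lesson we've used the aes function to define a mapping between data variables and their visual representation.
Note that in ggplot2 3.4.0 and later, line thickness is documented under the name linewidth; size still works for lines but produces a deprecation warning.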
Earlier we visualized the change in life expectancy over time across all countries in one plot but we can split this out over multiple panels by adding a layer of facet panels.
We start by making a subset of data including only countries located in the Americas.
This includes 25 countries, which will begin to clutter the figure.
americas <- gapminder[gapminder$continent == "Americas",]
ggplot(data = americas, mapping = aes(x = year, y = lifeExp)) +
  geom_line() + facet_wrap( ~ country)
The facet_wrap layer also takes a “formula” as its argument, denoted by the tilde (~), telling R to draw a panel for each unique value in the country column of the americas subset.
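To see how many panels to expect, we can count the unique countries in the subset. This is a minimal sketch; n_panels and n_levels are illustrative names. Because country is a factor, the subset still carries the factor levels of the full dataset, but only the values actually present are counted here.

```r
library(gapminder)

americas <- gapminder[gapminder$continent == "Americas", ]

# Unique country values actually present in the subset
n_panels <- length(unique(americas$country))

# The factor still remembers every country from the full dataset
n_levels <- nlevels(americas$country)

print(c(panels = n_panels, factor_levels = n_levels))
```

The first count matches the 25 countries mentioned above, so the faceted plot is expected to show 25 panels.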