If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot simple statistics.
However, the gapminder data is updated periodically, and we may want to pull in that new information later and re-run our analysis.
We may also obtain similar data from a different source in the future.
In this lesson, we'll learn how to write a function so that we can repeat several operations with a single command.
Let's define a function fahr_to_kelvin() that converts temperatures from Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
We define fahr_to_kelvin() by assigning it to the output of function.
The list of argument names is contained within parentheses.
The body of the function (the statements that are executed when it runs) is contained within curly braces ({}).
The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
It is useful to think of creating functions like writing a cookbook.
First you define the “ingredients” that your function needs.
In this case, we only need one ingredient to use our function: “temp”.
After we list our ingredients, we say what we will do with them. In this case, we take our ingredient and apply a set of mathematical operators to it.
When we call the function, the values we pass to it as arguments are assigned to those variables so that we can use them inside the function.
Inside the function, we use a return statement to send a result back to whoever asked for it.
With an if() statement, we can break out of the function early and return a value based on a condition.
# freezing point of water
fahr_to_kelvin(32)
[1] 273.15
# boiling point of water
fahr_to_kelvin(212)
[1] 373.15
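The early exit mentioned above can be sketched as follows. This is a minimal illustration, not part of the original lesson: describe_temp() is a hypothetical helper, and it assumes a single numeric value as input (if() expects a condition of length one).

```r
# Hypothetical helper (not from the lesson): describe a Fahrenheit
# temperature relative to the freezing point of water.
# Assumes temp is a single number, because if() needs a length-one condition.
describe_temp <- function(temp) {
  # return() leaves the function immediately when the condition holds.
  if (temp <= 32) {
    return("at or below freezing")
  }
  # This line is only reached when the condition above is FALSE.
  return("above freezing")
}

describe_temp(20)
describe_temp(75)
```

The first call exits inside the if() block; the second skips it and falls through to the final return().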
The real power of functions comes from mixing, matching and combining them into ever-larger chunks to get the effect we want.
Let's define two functions that will convert temperature from Fahrenheit to Kelvin, and Kelvin to Celsius:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
Q: how can we define a function to convert directly from Fahrenheit to Celsius by reusing these two functions above?
A: consider the following code
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}
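The same composition can also be written as a nested call, without the intermediate variables. The sketch below repeats the two helper functions so it runs on its own; the name fahr_to_celsius_nested() is hypothetical and chosen only to avoid clashing with the version above.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

# Hypothetical nested-call variant: the output of the inner function
# is passed directly as the input of the outer one.
fahr_to_celsius_nested <- function(temp) {
  return(kelvin_to_celsius(fahr_to_kelvin(temp)))
}

# The arithmetic is vectorised, so several temperatures can be converted at once.
fahr_to_celsius_nested(c(32, 212))
```

Whether to nest calls or name the intermediate result is a readability choice; both versions compute the same values.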
Defensive programming means asserting that some condition is TRUE before proceeding.
Let's start by re-examining fahr_to_kelvin(), which we defined like so:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
The argument temp must be a numeric value; if it is not, we can raise an error with stop().
For example, since the argument temp must be a numeric vector, we could check for this condition with an if statement and throw an error if the condition was violated. We could augment our function above like so:
fahr_to_kelvin <- function(temp) {
  if (!is.numeric(temp)) {
    stop("temp must be a numeric vector.")
  }
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
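To see what the caller experiences when stop() fires, the sketch below passes a character string and captures the error with tryCatch() so the script keeps running. The variable name msg is illustrative only; the function is repeated so the example runs on its own.

```r
fahr_to_kelvin <- function(temp) {
  if (!is.numeric(temp)) {
    stop("temp must be a numeric vector.")
  }
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# "32" is a character string, not a number, so the check fails.
# tryCatch() intercepts the error and returns its message instead of halting.
msg <- tryCatch(
  fahr_to_kelvin("32"),
  error = function(e) conditionMessage(e)
)
msg
```

Without tryCatch(), the same call would halt with the error message given to stop().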
R provides the convenience function stopifnot().
We can list as many requirements as we like that should evaluate to TRUE; stopifnot() throws an error if it finds one that is FALSE.
Let's try out defensive programming with stopifnot() by adding assertions to check the input to our function fahr_to_kelvin().
We want to assert the following: temp is a numeric vector.
fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
With stopifnot() in place, the function still works on proper input but fails on improper input:
# freezing point of water
fahr_to_kelvin(temp = 32)
[1] 273.15
# temp is a factor instead of numeric
fahr_to_kelvin(temp = as.factor(32))
Error in fahr_to_kelvin(temp = as.factor(32)): is.numeric(temp) is not TRUE
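stopifnot() accepts several conditions at once and stops at the first one that is not TRUE. The sketch below is an illustration under assumptions, not part of the lesson: kelvin_to_celsius_checked() is a hypothetical name, and rejecting missing or negative Kelvin values is a design choice made here for the example.

```r
# Hypothetical checked variant of kelvin_to_celsius().
# Assumption: NA values and temperatures below absolute zero (0 K)
# should be treated as invalid input.
kelvin_to_celsius_checked <- function(temp) {
  stopifnot(
    is.numeric(temp),  # must be numbers
    !anyNA(temp),      # no missing values
    all(temp >= 0)     # nothing below absolute zero
  )
  celsius <- temp - 273.15
  return(celsius)
}

kelvin_to_celsius_checked(273.15)

# try() captures the error so the script continues; the result has
# class "try-error" when an assertion failed.
failed <- inherits(try(kelvin_to_celsius_checked(-5), silent = TRUE), "try-error")
failed
```

Listing the conditions one per line also documents what the function expects from its input.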
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
calcGDP <- function(dat) {
  gdp <- dat$pop * dat$gdpPercap
  return(gdp)
}
We define calcGDP() by assigning it to the output of function.
The body of the function is contained within curly braces ({}).
require(gapminder)
calcGDP(head(gapminder))
[1] 6567086330 7585448670 8758855797 9648014150 9678553274 11697659231
calcGDP <- function(dat, year = NULL, country = NULL) {
  if (!is.null(year)) {
    dat <- dat[dat$year %in% year, ]
  }
  if (!is.null(country)) {
    dat <- dat[dat$country %in% country, ]
  }
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp = gdp)
  return(new)
}
head(calcGDP(gapminder, year=2007))
country continent year lifeExp pop gdpPercap gdp
1 Afghanistan Asia 2007 43.828 31889923 974.5803 31079291949
2 Albania Europe 2007 76.423 3600523 5937.0295 21376411360
3 Algeria Africa 2007 72.301 33333216 6223.3675 207444851958
4 Angola Africa 2007 42.731 12420476 4797.2313 59583895818
5 Argentina Americas 2007 75.320 40301927 12779.3796 515033625357
6 Australia Oceania 2007 81.235 20434176 34435.3674 703658358894
calcGDP(gapminder, country="Australia")
country continent year lifeExp pop gdpPercap gdp
1 Australia Oceania 1952 69.120 8691212 10039.60 87256254102
2 Australia Oceania 1957 70.330 9712569 10949.65 106349227169
3 Australia Oceania 1962 70.930 10794968 12217.23 131884573002
4 Australia Oceania 1967 71.100 11872264 14526.12 172457986742
5 Australia Oceania 1972 71.930 13177000 16788.63 221223770658
6 Australia Oceania 1977 73.490 14074100 18334.20 258037329175
7 Australia Oceania 1982 74.740 15184200 19477.01 295742804309
8 Australia Oceania 1987 76.320 16257249 21888.89 355853119294
9 Australia Oceania 1992 77.560 17481977 23424.77 409511234952
10 Australia Oceania 1997 78.830 18565243 26997.94 501223252921
11 Australia Oceania 2002 80.370 19546792 30687.75 599847158654
12 Australia Oceania 2007 81.235 20434176 34435.37 703658358894
calcGDP(gapminder, year=2007, country="Australia")
country continent year lifeExp pop gdpPercap gdp
1 Australia Oceania 2007 81.235 20434176 34435.37 703658358894
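Because the filters use %in%, the year and country arguments can also be vectors. The sketch below demonstrates this on a small made-up data frame so it runs without the gapminder package; the data frame toy and all of its values are invented for illustration only.

```r
calcGDP <- function(dat, year = NULL, country = NULL) {
  if (!is.null(year)) {
    dat <- dat[dat$year %in% year, ]
  }
  if (!is.null(country)) {
    dat <- dat[dat$country %in% country, ]
  }
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp = gdp)
  return(new)
}

# Toy stand-in for gapminder: made-up numbers, same column names.
toy <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C"),
  year      = c(2002, 2007, 2002, 2007, 2002, 2007),
  pop       = c(10, 12, 20, 22, 30, 33),
  gdpPercap = c(100, 110, 200, 210, 300, 310)
)

# Keep one year but two countries: %in% matches any element of the vector.
res <- calcGDP(toy, year = 2007, country = c("A", "C"))
res
```

Leaving an argument at its default of NULL skips that filter entirely, which is why calcGDP(toy) returns every row.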
When we modify dat inside the function, we are modifying the copy of the gapminder dataset stored in dat, not the original variable we gave as the first argument.
This is called “pass-by-value” and it makes writing code much safer: changes made inside the body of the function stay inside the body of the function.
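When we call calcGDP(), the variables dat, gdp and new only exist inside the body of the function. The last lines of the body compute the GDP and return a new data frame with that column added:
  gdp <- dat$pop * dat$gdpPercap
  new <- cbind(dat, gdp = gdp)
  return(new)
}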
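Both ideas, pass-by-value and function-local variables, can be checked directly. The sketch below is a minimal illustration with hypothetical names (add_gdp(), toy, local_total) and made-up numbers; it does not use the gapminder data.

```r
# Hypothetical function: adds a gdp column to its own copy of the data.
add_gdp <- function(dat) {
  local_total <- dat$pop * dat$gdpPercap  # exists only while the function runs
  dat$gdp <- local_total                  # modifies the local copy, not the caller's data
  return(dat)
}

# Made-up data for illustration.
toy <- data.frame(pop = c(10, 20), gdpPercap = c(100, 200))
with_gdp <- add_gdp(toy)

names(toy)       # the original data frame has no gdp column
names(with_gdp)  # the returned copy does

# The variable created inside the function is not visible out here.
exists("local_total")
```

To keep a change made inside a function, assign the returned value to a name, as done with with_gdp here.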