You will learn to:
 
- use a functional programming approach to simplify your code
- pass functions as arguments to higher order functions
- use purrr::mapto replaceforloops
- deal with nested lists
4 May 2017
 
purrr::map to replace for loopsEach atomic vector contains only a single type of data
Examples:
# Logical
c(TRUE, FALSE, TRUE)
# double
c(1, 5, 7)
# character
c("character", "sequence")Is the conversion between types
as.*() functionsv_example <- c(1, 3, 7) str(as.character(v_example))
chr [1:3] "1" "3" "7"
v_example <- c(1, 3) str(v_example)
num [1:2] 1 3
str(c(v_example, "seven"))
chr [1:3] "1" "3" "seven"
Adapted from the tutorial of Jenny Bryan
example
my_list <- list(1, 3, "seven") str(my_list)
List of 3 $ : num 1 $ : num 3 $ : chr "seven"
is.vector(my_list)
[1] TRUE
is.atomic(my_list)
[1] FALSE
Adapted from the tutorial of Jenny Bryan
 
purrr enhances R's functional programming toolkit by providing a complete and consistent set of tools for working with functions and vectors (purrr overview on github page)
functional programming is a programming paradigm – a style of building the structure and elements of computer programs – that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data Wikipedia
FOR EACH x DO f
Calculate the mean of each column of the mtcars dataset
for loopsmeans <- vector("double", ncol(mtcars))
for (i in seq_along(mtcars)) {
  means[i] <- mean(mtcars[[i]])
}
means
[1] 20.090625 6.187500 230.721875 [4] 146.687500 3.596563 3.217250 [7] 17.848750 0.437500 0.406250 [10] 3.687500 2.812500
apply(mtcars, 2, mean) %>% str()
Named num [1:11] 20.09 6.19 230.72 146.69 3.6 ... - attr(*, "names")= chr [1:11] "mpg" "cyl" "disp" "hp" ...
lapply(mtcars, mean) %>% str()
List of 11 $ mpg : num 20.1 $ cyl : num 6.19 $ disp: num 231 $ hp : num 147 $ drat: num 3.6 $ wt : num 3.22 $ qsec: num 17.8 $ vs : num 0.438 $ am : num 0.406 $ gear: num 3.69 $ carb: num 2.81
From R for Data Science
| function | input | output | 
|---|---|---|
| apply() | array | vector or array or list | 
| lapply() | list | list | 
| sapply() | list | vector or array or list | 
| vapply() | list | specified but inconsistent | 
apply() family are inconsistent.lapply()apply() family still useful to avoid dependencies (package development)purrr::map() family of functionsmap() is the general function and close to base::lapply()map() introduces shortcuts (absent in lapply())map_lgl(), map_int(), map_dbl(), map_chr(), map_df(), walk())Purrr::map()purrr::map() is type stable
map(YOUR_LIST, YOUR_FUNCTION)antennate <- function(x) put_on(x, antenna) map(legos, antennate)

legos

antenna

map(legos, antennate)
Split the mtcars dataset by each value of cylinder
spl_mtcars <- mtcars %>% split(.$cyl) str(spl_mtcars, max.level = 1)
List of 3 $ 4:'data.frame': 11 obs. of 11 variables: $ 6:'data.frame': 7 obs. of 11 variables: $ 8:'data.frame': 14 obs. of 11 variables:
From R for Data Science
purrr::map() reminder:
map(YOUR_LIST, YOUR_FUNCTION) # is equivalent to: YOUR_LIST %>% map(YOUR_FUNCTION)
YOUR_LIST = spl_mtcarsYOUR_FUNCTION can be an anonymous function (declared on the fly)
str(spl_mtcars, max.level = 1)
List of 3 $ 4:'data.frame': 11 obs. of 11 variables: $ 6:'data.frame': 7 obs. of 11 variables: $ 8:'data.frame': 14 obs. of 11 variables:
spl_mtcars %>% map(function(df) lm(mpg ~ wt, data = df))
$`4`
Call:
lm(formula = mpg ~ wt, data = df)
Coefficients:
(Intercept)           wt  
     39.571       -5.647  
$`6`
Call:
lm(formula = mpg ~ wt, data = df)
Coefficients:
(Intercept)           wt  
      28.41        -2.78  
$`8`
Call:
lm(formula = mpg ~ wt, data = df)
Coefficients:
(Intercept)           wt  
     23.868       -2.192  purrr::map() spl_mtcars %>% map(function(df) lm(mpg ~ wt, data = df)) %>% map(summary) %>% map(function(x) x$r.squared)
$`4` [1] 0.5086326 $`6` [1] 0.4645102 $`8` [1] 0.4229655
 
The code above can be simplified using shortcuts provided by purrr
purrr::map() One sided formula to create anonymous functions (using ~ and the placeholder . which refers to the current list element)
map(YOUR_LIST, function(df) lm(mpg ~ wt, data = df)) # is equivalent to: map(YOUR_LIST, ~ lm(mpg ~ wt, data = .))
Our previous code can be simplified:
spl_mtcars %>% map(function(df) lm(mpg ~ wt, data = df)) %>% map(summary) %>% map(function(x) x$r.squared)
spl_mtcars %>% map(~lm(mpg ~ wt, data = .)) %>% map(summary) %>% map(~.$r.squared)
spl_mtcars %>% 
  map(~lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map("r.squared")purrr::map() (like base base::lapply())
map() makes a list.map_lgl() makes a logical vector.map_int() makes an integer vector.map_dbl() makes a double vector.map_chr() makes a character vector.map_df() makes a data frame (row-binding elements)map()spl_mtcars %>% 
  map(~lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map("r.squared") %>% 
  str()
List of 3 $ 4: num 0.509 $ 6: num 0.465 $ 8: num 0.423
map_dbl()spl_mtcars %>% 
  map(~lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared") %>% 
  str()
Named num [1:3] 0.509 0.465 0.423 - attr(*, "names")= chr [1:3] "4" "6" "8"
deaths MS Excel example file
path using the following command: path <- readxl_example("deaths.xls")readr lecture!library(readxl)
path <- readxl_example("deaths.xls")
path %>%
  excel_sheets() %>%
  map_df(read_excel, path = path, range = "A5:F15")
# A tibble: 20 x 6
                 Name Profession   Age `Has kids` `Date of birth`
                <chr>      <chr> <dbl>      <lgl>          <dttm>
 1        David Bowie   musician    69       TRUE      1947-01-08
 2      Carrie Fisher      actor    60       TRUE      1956-10-21
 3        Chuck Berry   musician    90       TRUE      1926-10-18
 4        Bill Paxton      actor    61       TRUE      1955-05-17
 5             Prince   musician    57       TRUE      1958-06-07
 6       Alan Rickman      actor    69      FALSE      1946-02-21
 7 Florence Henderson      actor    82       TRUE      1934-02-14
 8         Harper Lee     author    89      FALSE      1926-04-28
 9      Zsa Zsa Gábor      actor    99       TRUE      1917-02-06
10     George Michael   musician    53      FALSE      1963-06-25
11         Vera Rubin  scientist    88       TRUE      1928-07-23
12        Mohamed Ali    athlete    74       TRUE      1942-01-17
13       Morley Safer journalist    84       TRUE      1931-11-08
14       Fidel Castro politician    90       TRUE      1926-08-13
15     Antonin Scalia     lawyer    79       TRUE      1936-03-11
16             Jo Cox politician    41       TRUE      1974-06-22
17         Janet Reno     lawyer    78      FALSE      1938-07-21
18         Gwen Ifill journalist    61      FALSE      1955-09-29
19         John Glenn  astronaut    95       TRUE      1921-07-28
20         Pat Summit      coach    64       TRUE      1952-06-14
# ... with 1 more variables: `Date of death` <dttm>map_df() arguments.id argumentset_names()to set names in a vector
set_names(1:5)
1 2 3 4 5 1 2 3 4 5
set_names(1:5, nm = letters[1:5])
a b c d e 1 2 3 4 5
is a wrapper for setNames which does some checkings and is stricter
setNames(1:5, nm = letters[1:2])
a b <NA> <NA> <NA> 1 2 3 4 5
set_names(1:5, nm = letters[1:2])
Error: `x` and `nm` must be the same length
path %>% excel_sheets() %>% set_names() %>% # creates a named vector map_df(read_excel, path = path, range = "A5:F15", .id = "profession_area") %>% janitor::clean_names()
# A tibble: 20 x 7
   profession_area               name profession   age has_kids date_of_birth
             <chr>              <chr>      <chr> <dbl>    <lgl>        <dttm>
 1            arts        David Bowie   musician    69     TRUE    1947-01-08
 2            arts      Carrie Fisher      actor    60     TRUE    1956-10-21
 3            arts        Chuck Berry   musician    90     TRUE    1926-10-18
 4            arts        Bill Paxton      actor    61     TRUE    1955-05-17
 5            arts             Prince   musician    57     TRUE    1958-06-07
 6            arts       Alan Rickman      actor    69    FALSE    1946-02-21
 7            arts Florence Henderson      actor    82     TRUE    1934-02-14
 8            arts         Harper Lee     author    89    FALSE    1926-04-28
 9            arts      Zsa Zsa Gábor      actor    99     TRUE    1917-02-06
10            arts     George Michael   musician    53    FALSE    1963-06-25
11           other         Vera Rubin  scientist    88     TRUE    1928-07-23
12           other        Mohamed Ali    athlete    74     TRUE    1942-01-17
13           other       Morley Safer journalist    84     TRUE    1931-11-08
14           other       Fidel Castro politician    90     TRUE    1926-08-13
15           other     Antonin Scalia     lawyer    79     TRUE    1936-03-11
16           other             Jo Cox politician    41     TRUE    1974-06-22
17           other         Janet Reno     lawyer    78    FALSE    1938-07-21
18           other         Gwen Ifill journalist    61    FALSE    1955-09-29
19           other         John Glenn  astronaut    95     TRUE    1921-07-28
20           other         Pat Summit      coach    64     TRUE    1952-06-14
# ... with 1 more variables: date_of_death <dttm>map(legos, antennate) 
  
 
map(legos, antennate)
map2() 
 
enhair <- function(x, y) x %>% put_on(y) map2(legos, hairs, enhair)
dplyr 
tibble introduces list-columnsdplyrtidyr::nest()purrr::map 
tibble(numbers = 1:8,
       my_list = list(a = c("a", "b"), b = 2.56, 
                      c = c("a", "b"), d = rep(TRUE, 4),
                      d = 2:3, e = 4:6, f = "Z", g = 1:4))
# A tibble: 8 x 2
  numbers   my_list
    <int>    <list>
1       1 <chr [2]>
2       2 <dbl [1]>
3       3 <chr [2]>
4       4 <lgl [4]>
5       5 <int [2]>
6       6 <int [3]>
7       7 <chr [1]>
8       8 <int [4]>mtcars %>% group_by(cyl) %>% nest()
# A tibble: 3 x 2
    cyl               data
  <dbl>             <list>
1     6  <tibble [7 x 10]>
2     4 <tibble [11 x 10]>
3     8 <tibble [14 x 10]>mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(model = map(data, ~lm(mpg ~ wt, data = .)),
         summary = map(model, summary),
         r_squared = map_dbl(summary, "r.squared"))
# A tibble: 3 x 5
    cyl               data    model          summary r_squared
  <dbl>             <list>   <list>           <list>     <dbl>
1     6  <tibble [7 x 10]> <S3: lm> <S3: summary.lm> 0.4645102
2     4 <tibble [11 x 10]> <S3: lm> <S3: summary.lm> 0.5086326
3     8 <tibble [14 x 10]> <S3: lm> <S3: summary.lm> 0.4229655dplyr, tidyr, tibble, purrr and broom nicely work togetherx1 <- list(c(1, 2), c(3, 4)) x2 <- list(list(1, 2), list(3, 4)) x3 <- list(1, list(2, list(3)))
 
from R for Data Science
flatten() functionflatten()flatten_int()flatten_dbl()flatten_lgl()flatten_chr()x <- list(list(a = 1, b = 2), list(c = 3, d = 4)) flatten(x) %>% str()
List of 4 $ a: num 1 $ b: num 2 $ c: num 3 $ d: num 4
flatten(x) %>% flatten_dbl() %>% str()
num [1:4] 1 2 3 4
 
from R for Data Science
base::unlist() vs flatten()unlist() but flatten() is more consistent
unlist() will always succeedunlist(1:3)
[1] 1 2 3
flatten(1:3)
Error: `.x` must be a list (integer)
my_list <- list("One", 2, 3)
unlist(my_list) %>% str()
chr [1:3] "One" "2" "3"
flatten(my_list) %>% str()
List of 3 $ : chr "One" $ : num 2 $ : num 3
flatten_dbl(my_list)
Error: Can't coerce element 1 from a character to a double
x <- list(
  x = list(a = 1,
           b = 3,
           c = 5),
  y = list(a = 2,
           b = 4,
           c = 6)
)
x %>% str()List of 2 $ x:List of 3 ..$ a: num 1 ..$ b: num 3 ..$ c: num 5 $ y:List of 3 ..$ a: num 2 ..$ b: num 4 ..$ c: num 6
x %>% transpose() %>% str()
List of 3 $ a:List of 2 ..$ x: num 1 ..$ y: num 2 $ b:List of 2 ..$ x: num 3 ..$ y: num 4 $ c:List of 2 ..$ x: num 5 ..$ y: num 6
l1 <- list(
  group1 = list(
    individual1 = list(param1 = 1:2, param2 = 3:4),
    individual2 = list(param1 = 5:6, param2 = 7:8)
  ),
  group2 = list(
    individual1 = list(param1 = 9:10, param2 = 11:12),
    individual2 = list(param1 = 13:14, param2 = 15:16)
  )
)
l1 %>% str()List of 2 $ group1:List of 2 ..$ individual1:List of 2 .. ..$ param1: int [1:2] 1 2 .. ..$ param2: int [1:2] 3 4 ..$ individual2:List of 2 .. ..$ param1: int [1:2] 5 6 .. ..$ param2: int [1:2] 7 8 $ group2:List of 2 ..$ individual1:List of 2 .. ..$ param1: int [1:2] 9 10 .. ..$ param2: int [1:2] 11 12 ..$ individual2:List of 2 .. ..$ param1: int [1:2] 13 14 .. ..$ param2: int [1:2] 15 16
 
listviewer::jsonedit()listviewer packagel1 %>% listviewer::jsonedit()
# Extract individual2 using a string shortcut
l1 %>%
  map("individual2") %>% str()
List of 2 $ group1:List of 2 ..$ param1: int [1:2] 5 6 ..$ param2: int [1:2] 7 8 $ group2:List of 2 ..$ param1: int [1:2] 13 14 ..$ param2: int [1:2] 15 16
# chain 2 map calls to get param1 out of both groups
l1 %>%
  map("individual2") %>%
  map("param1") %>% str()
List of 2 $ group1: int [1:2] 5 6 $ group2: int [1:2] 13 14
map()!l1 %>%
  map("individual2") %>%
  map("param1") %>% str()
List of 2 $ group1: int [1:2] 5 6 $ group2: int [1:2] 13 14
l1 %>%
  map(c("individual2", "param1")) %>% str()
List of 2 $ group1: int [1:2] 5 6 $ group2: int [1:2] 13 14
map()l1 %>% map(~map(., ~map(., mean))) %>% str()
List of 2 $ group1:List of 2 ..$ individual1:List of 2 .. ..$ param1: num 1.5 .. ..$ param2: num 3.5 ..$ individual2:List of 2 .. ..$ param1: num 5.5 .. ..$ param2: num 7.5 $ group2:List of 2 ..$ individual1:List of 2 .. ..$ param1: num 9.5 .. ..$ param2: num 11.5 ..$ individual2:List of 2 .. ..$ param1: num 13.5 .. ..$ param2: num 15.5
at_depth()map()map a function to a lower level of a list
x %>% at_depth(0, fun) # is equivalent to # fun(x) x %>% at_depth(1, fun) # is equivalent to # map(x, fun) x %>% at_depth(2, fun) # is equivalent to # map(x, . %>% map(fun))
l1 %>% map(~map(., ~map(., mean)))
l1 %>% at_depth(3, mean) %>% str()
List of 2 $ group1:List of 2 ..$ individual1:List of 2 .. ..$ param1: num 1.5 .. ..$ param2: num 3.5 ..$ individual2:List of 2 .. ..$ param1: num 5.5 .. ..$ param2: num 7.5 $ group2:List of 2 ..$ individual1:List of 2 .. ..$ param1: num 9.5 .. ..$ param2: num 11.5 ..$ individual2:List of 2 .. ..$ param1: num 13.5 .. ..$ param2: num 15.5
combines a list from the left
1:4 %>% reduce(`-`)
[1] -8
((1 - 2) - 3) - 4
[1] -8
combines a list from the right
1:4 %>% reduce_right(`-`)
[1] -2
1 - (2 - (3 - 4))
[1] -2
1:4 %>% reduce(`+`) c(1, 2, 3, 4) c( 3, 3, 4) c( 6, 4) c( 10)
my_add <- function(x, y) {
  print(paste("x =", x, "y =", y))
  x + y
}reduce(1:4, my_add)
[1] "x = 1 y = 2" [1] "x = 3 y = 3" [1] "x = 6 y = 4"
[1] 10
reduce_right(1:4, my_add)
[1] "x = 4 y = 3" [1] "x = 7 y = 2" [1] "x = 9 y = 1"
[1] 10
similar to reduce() but keeps intermediate results
1:4 %>% accumulate(`+`)
[1] 1 3 6 10
accumulates a list from the right
1:4 %>% accumulate_right(`+`)
[1] 10 9 7 4
Make your pure #rstats functions purr with purrr, a new package for functional programming: http://t.co/91Efuz0txk
— Hadley Wickham (@hadleywickham) 29 septembre 2015
A function is called "pure" if all its inputs are declared as inputs - none of them are hidden - and likewise all its outputs are declared as outputs Kris Jenkins
start <- 10
impure <- function(x) {
  print(start)
  x + start
}
result <- impure(2)
[1] 10
result
[1] 12
pure <- function(x, start) {
  x + start
}
result <- pure(2, start)
result
[1] 12
log() has side-effectspurrr::safely() to catch every outputlog()(res <- log(10))
[1] 2.302585
res <- log("a")
Error in log("a"): non-numeric argument to mathematical function
res
[1] 2.302585
log()safe_log <- purrr::safely(log) (res <- safe_log(10))
$result [1] 2.302585 $error NULL
res <- safe_log("a")
res
$result NULL $error <simpleError in .f(...): non-numeric argument to mathematical function>
Let's go further with the safe_log() example
# Example from the help page
x <- list("a", 1, 2)
y <- x %>% map(safely(log))
y %>% str()List of 3 $ :List of 2 ..$ result: NULL ..$ error :List of 2 .. ..$ message: chr "non-numeric argument to mathematical function" .. ..$ call : language .f(...) .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition" $ :List of 2 ..$ result: num 0 ..$ error : NULL $ :List of 2 ..$ result: num 0.693 ..$ error : NULL
y %>% transpose() %>% str()
List of 2 $ result:List of 3 ..$ : NULL ..$ : num 0 ..$ : num 0.693 $ error :List of 3 ..$ :List of 2 .. ..$ message: chr "non-numeric argument to mathematical function" .. ..$ call : language .f(...) .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition" ..$ : NULL ..$ : NULL
walkmap() to call functions for its side effects
%>%)x <- list(1, "a", 3) x %>% walk(print)
[1] 1 [1] "a" [1] 3
x %>% walk(print) %>% str()
[1] 1 [1] "a" [1] 3 List of 3 $ : num 1 $ : chr "a" $ : num 3
x %>% map(print) %>% str()
[1] 1 [1] "a" [1] 3 List of 3 $ : num 1 $ : chr "a" $ : num 3
x %>% walk(my_print) %>% str()
[1] 1 [1] "a" [1] 3 List of 3 $ : num 1 $ : chr "a" $ : num 3
x %>% map(my_print) %>% str()
[1] 1 [1] "a" [1] 3 List of 3 $ : NULL $ : NULL $ : NULL
