2 May 2017

## Learning objectives

• (re)view some base R
• know the different data types: numeric, logical, factor …
• understand what a list, a vector, a data.frame … are

## Getting started

Let's get ready to use R and RStudio. Do the following:

• Open up RStudio
• Maximize the RStudio window
• Click the Console pane, at the prompt (>) type in 3 + 2 and hit enter
> 3 + 2

## Arithmetic operations

You will not be surprised that R is very good at computing.

### arithmetic operators

• +: addition
• -: subtraction
• *: multiplication
• /: division
• ^ or **: exponentiation
• %%: modulo (remainder after division)
• %/%: integer division
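As a quick illustration of the operators listed above, here is a small sketch applying each one to the same pair of numbers (7 and 2 are arbitrary values chosen for the example). The last lines check the relationship between integer division and modulo.

```r
# Each arithmetic operator applied to the same pair of numbers
7 + 2    # addition: 9
7 - 2    # subtraction: 5
7 * 2    # multiplication: 14
7 / 2    # division: 3.5
7 ^ 2    # exponentiation: 49 (7 ** 2 gives the same result)
7 %% 2   # modulo: 1
7 %/% 2  # integer division: 3

# quotient * divisor + remainder gives back the original number
ops_check <- (7 %/% 2) * 2 + (7 %% 2)
ops_check
```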

### Remember

R will:

• first perform exponentiation
• then multiplications and/or divisions
• and finally additions and/or subtractions.

If you need to change the priority during the evaluation, use parentheses – i.e. ( and ) – to group calculations.
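The priority rules can be checked with a short sketch. The numbers and variable names below are arbitrary, chosen only so that each grouping gives a different result.

```r
# No parentheses: 4^2 = 16 first, then 3 * 16 = 48, then 2 + 48
no_parens <- 2 + 3 * 4^2

# Parentheses force the addition to happen first: 5 * 16
sum_first <- (2 + 3) * 4^2

# Nested parentheses: (5 * 4) = 20, then 20^2
all_grouped <- ((2 + 3) * 4)^2

# Exponentiation also binds tighter than the unary minus
neg_square <- -2^2

c(no_parens, sum_first, all_grouped, neg_square)
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.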

## Necessary R base

### R base

We could drop base R, but the tidyverse wraps around it, so some base functions still need to be known.

## 4 main types

| Type | Example |
|---|---|
| numeric | integer (2L), double (2.34) |
| string (character in R) | "tidyverse !" |
| boolean (logical in R) | TRUE / FALSE |
| complex | 2+0i |
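One way to see these types from the console is typeof(). The sketch below uses one value per type; note that a plain number such as 2 is stored as a double, and the L suffix is what makes it an integer.

```r
# One example value per main type
examples <- list(2L, 2.34, "tidyverse !", TRUE, 2+0i)

# typeof() reports how R stores each value
types <- vapply(examples, typeof, character(1))
types

# A number typed without the L suffix is a double, not an integer
plain_two <- typeof(2)
plain_two
```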

### Special case

NA   # not available, missing data
NA_real_
NA_integer_
NA_character_
NA_complex_
NULL # empty
-Inf/Inf # infinite values

## missing and infinite

c(NA_real_, 2.45, 45.67)
[1]    NA  2.45 45.67
c(Inf, 2.45, 45.67)
[1]   Inf  2.45 45.67
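A short sketch of how these special values behave in calculations, reusing the two vectors above (the names x and y are chosen here for illustration):

```r
x <- c(NA_real_, 2.45, 45.67)
y <- c(Inf, 2.45, 45.67)

# Detect missing and non-finite entries
missing_x <- is.na(x)
finite_y  <- is.finite(y)

# A missing value propagates through most computations ...
mean_with_na <- mean(x)
# ... unless it is explicitly removed
mean_no_na <- mean(x, na.rm = TRUE)

# Infinite values propagate as well
mean_inf <- mean(y)

# NULL is empty: it has length zero
null_length <- length(NULL)
```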

## Structures

### Vectors

c() is the function that combines (concatenates) values into a vector.

### Example

4
c(43, 5.6, 2.90)
[1] 4
[1] 43.0  5.6  2.9
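The example shows that a single number is already a vector of length one. A minimal sketch to confirm this, and to show what c() does when the values have different types (the variable names are illustrative):

```r
# A single value is a vector of length 1
single_length <- length(4)

v <- c(43, 5.6, 2.90)
vector_length <- length(v)

# Elements are accessed by position, starting at 1
second <- v[2]

# All elements of a vector share one type: mixed input is coerced,
# here to character
mixed <- c(1, "a", TRUE)
mixed
```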

### Factors

factor() converts strings to factors; the levels act as the dictionary of allowed values.

### Example

factor(c("AA", "BB", "AA", "CC"))
[1] AA BB AA CC
Levels: AA BB CC
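To make the "dictionary" idea concrete, the sketch below looks inside the factor from the example: each value is stored as an integer code that points to an entry in the levels.

```r
f <- factor(c("AA", "BB", "AA", "CC"))

# The dictionary: the distinct values, sorted
f_levels <- levels(f)

# The stored integer codes, one per element
f_codes <- as.integer(f)

# Looking the codes up in the dictionary recovers the original strings
recovered <- f_levels[f_codes]
recovered
```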

### Matrix (2D), Arrays ($\geq$ 3D)

won't dig into those

### Example

matrix(1:4, nrow = 2)
     [,1] [,2]
[1,]    1    3
[2,]    2    4

### Lists

very important as can contain anything

### Example

list(f = factor(c("AA", "AA")),
     v = c(43, 5.6, 2.90),
     s = 4)
$f
[1] AA AA
Levels: AA

$v
[1] 43.0  5.6  2.9

$s
[1] 4

Given a list l with an element named firstname:

l$firstname
[1] "Geoff"
l[["firstname"]]
[1] "Geoff"
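The difference between the subsetting operators is easier to see on a complete example. The list person below is hypothetical, made up for this sketch; only the firstname value comes from the output above.

```r
# Hypothetical list, invented for illustration
person <- list(firstname = "Geoff", scores = c(43, 5.6, 2.90))

# Single bracket: the result is still a list (of length 1)
sub_list <- person["firstname"]

# Double bracket: the element itself
element <- person[["firstname"]]

# $ is a shorthand for [[ ]] with a name
same <- person$firstname

# Subsetting can be chained: second value of the scores element
second_score <- person[["scores"]][2]

class(sub_list)
class(element)
```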

### Question

How to subset a single pepper seed?

## Data frames

This is the most important structure to remember: the whole tidyverse focuses on data frames.

More precisely, it focuses on a tweaked version of data.frame: tibbles.

### definition

A data.frame is a list in which all columns (i.e. vectors) have the same length.

women
   height weight
1      58    115
2      59    117
3      60    120
4      61    123
5      62    126
6      63    129
7      64    132
8      65    135
9      66    139
10     67    142
11     68    146
12     69    150
13     70    154
14     71    159
15     72    164
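The definition can be checked directly on the built-in women dataset. This is a minimal sketch using base functions only:

```r
# A data frame is a list ...
women_is_list <- is.list(women)
women_is_df   <- is.data.frame(women)

# ... whose elements (the columns) all have the same length
column_lengths <- lengths(women)
column_lengths

# Number of rows and columns
women_dim <- dim(women)
```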

## Data frames

### subset

We can extract a vector (column) from a data frame in a few different ways:

### Using double brackets [[ ]]

women[["height"]]
[1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

### Or its alias: the $ operator

women$height
[1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72

### Remember the pepper analogy introduced by Hadley?

What would be the output of women["height"]?
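One way to check your answer is to compare the class of each result. A small sketch on the women dataset (the variable names are illustrative):

```r
# Single bracket keeps the container: a data frame with one column
col_df <- women["height"]

# Double bracket extracts the content: a plain numeric vector
col_vec <- women[["height"]]

class(col_df)
class(col_vec)
dim(col_df)

# [[ ]] and $ return the same vector
same_vector <- identical(col_vec, women$height)
```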

## Data frame as a table

A data frame can be treated as a table: you can extract a specific cell by its row and column.

  height weight
1     58    115
2     59    117
3     60    120
4     61    123
5     62    126

### only one cell with [row, col]

• first coordinate = row
• second coordinate = col

women[4, 2]
[1] 123
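The same [row, col] notation accepts column names and ranges, and leaving a coordinate empty selects everything along that dimension. A short sketch on the women dataset:

```r
# Same cell, addressed by column position or by column name
by_position <- women[4, 2]
by_name     <- women[4, "weight"]

# Leaving the column empty returns the whole row (a one-row data frame)
row_four <- women[4, ]
row_four

# A range of rows for one column returns a vector
first_heights <- women[1:3, "height"]
first_heights
```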

## Logical operators

In addition to the arithmetic operators, R provides comparison and logical operators.

### Perform comparisons

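• == equal
• != not equal
• < less than
• <= less than or equal
• > greater than
• >= greater than or equal
• ! not (negation)
• &, && and (& works element-wise on vectors, && on single values)
• |, || or (| works element-wise on vectors, || on single values)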
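Comparisons are applied element by element and return logical vectors, which can then be combined or used to select rows. A minimal sketch on the women dataset (the thresholds are arbitrary):

```r
h <- women$height

# Element-wise comparison: one TRUE/FALSE per value
tall <- h > 70
n_tall <- sum(tall)   # TRUE counts as 1

# Combine two conditions with the vectorised &
mid_range <- h >= 60 & h <= 62
mid_rows <- which(mid_range)

# Use the logical vector to keep only the matching rows
women[mid_range, ]

# && and || work on single values
single_and <- TRUE && FALSE
single_or  <- TRUE || FALSE
negated    <- !TRUE
```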

## Using library()

### with only the default packages loaded (filter() comes from stats)

x <- 1:10
filter(x, rep(1, 3))
Time Series:
Start = 1
End = 10
Frequency = 1
[1] NA  6  9 12 15 18 21 24 27 NA

### Conflicts!

When two packages export a function with the same name, the most recently loaded one wins.

library(dplyr)
filter(x, rep(1, 3))

Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('integer', 'numeric')"

### Solution

Use the :: operator to call a function from a specific package.

stats::filter(x, rep(1, 3))
Time Series:
Start = 1
End = 10
Frequency = 1
[1] NA  6  9 12 15 18 21 24 27 NA
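The masking mechanism can be reproduced without installing anything. In the sketch below, a hypothetical user-defined filter() stands in for the function exported by a package loaded later; it is not dplyr's function, only an illustration of name masking.

```r
x <- 1:10

# Hypothetical stand-in for a later-loaded package's filter():
# defining it in the workspace masks stats::filter()
filter <- function(...) stop("this is not stats::filter()")

# The bare name now resolves to the masking function and fails
masked <- tryCatch(
  filter(x, rep(1, 3)),
  error = function(e) conditionMessage(e)
)
masked

# The :: operator bypasses the masking and reaches the stats version
explicit <- stats::filter(x, rep(1, 3))
as.numeric(explicit)

# Clean up so that filter() resolves to stats::filter() again
rm(filter)
```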

## Pipes with magrittr

### developed by Stefan Milton Bache

Compare the classic nested-parentheses approach with the magrittr pipeline.

### R base

set.seed(12)
round(mean(rnorm(5)), 2)
[1] -0.76

### magrittr

set.seed(12)
rnorm(5) %>%
  mean() %>%
  round(2)
[1] -0.76

Note that the %>% pipe needs to be made available by loading one of:

library(magrittr)
library(dplyr)
library(tidyverse)

## Coding style

R is rather flexible and permissive with its syntax. However, being more strict tends to ease the debugging process.

In summary:

### Good

• use spaces
• use more lines
• put } alone on its own line, except for } else {
• use the pipe %>% to write a single instruction per line
• break long list definitions and function arguments across lines
• avoid reusing the names of existing functions and variables
• prefer snake_case to CamelCase
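As an illustration of these recommendations, here is a small sketch. The function describe_height() and its threshold are hypothetical, written only to show the layout: spaces around operators, one instruction per line, } else { on a single line, and snake_case names.

```r
# Hypothetical helper, named in snake_case
describe_height <- function(height_in, threshold = 65) {
  if (height_in > threshold) {
    label <- "tall"
  } else {
    label <- "short"
  }
  label
}

# Long argument lists are broken across lines
measurements <- list(
  first  = describe_height(70),
  second = describe_height(60)
)
measurements
```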