 
3 May 2017
 
By the end you should be able to:
ggplot2 when you run into problems.
ggplot1
Released from 2005 until 2008 by Hadley Wickham.
"If the pipe (%>%, in 2014) had been invented before, ggplot2 would have never existed." (Hadley Wickham)
# devtools::install_github("hadley/ggplot1")
p <- ggplot(mtcars, list(x = mpg, y = wt))
# need temp p object to avoid too many ()'s
scbrewer(ggpoint(p, list(colour = gear)))
# devtools::install_github("hadley/ggplot1")
library(ggplot1)
mtcars %>% 
  ggplot(list(x = mpg, y = wt)) %>% 
  ggpoint(list(colour = gear)) %>% 
  scbrewer()
library(ggplot2)
mtcars %>% 
  ggplot(aes(x = mpg, y = wt)) + 
  geom_point(aes(colour = as.factor(gear))) +
  scale_color_brewer("gear", type = "qual")
Introduced a break in the workflow from %>% to +
ggplot2 stands for grammar of graphics plot version 2
 
 
source: thinkR
| x | y | shape | 
|---|---|---|
| 25 | 11 | circle | 
| 0 | 0 | circle | 
| 75 | 53 | square | 
| 200 | 300 | square | 

What if we want to split circles and squares?
 
Now, dot shapes and facets give the same information. Shapes could be freed for another meaningful variable

tribble(
    ~x,   ~y,    ~shape,
   25L,  11L,  "circle",
    0L,   0L,  "circle",
   75L,  53L,  "square",
  200L, 300L,  "square"
  ) %>%
  ggplot(aes(x = x, y = y, shape = shape)) +
  geom_point(size = 4) +
  facet_wrap(~ shape) +
  coord_cartesian() +
  theme_classic(base_size = 18)
"Data visualisation is not meant just to be seen but to be read, like written text." (Alberto Cairo)
Using the following dataset from the Euro Club Index
library(tidyverse)
allSeasons <- read_rds("data/allseasons.rds")
oneSeason <- allSeasons %>% filter(year == 2016)
allSeasons
# A tibble: 1,556 x 12
            club score  year country     n  rank allRank   atb   atw    eb
           <chr> <int> <int>   <chr> <int> <int>   <int> <int> <int> <int>
 1     ManUnited  1876  2001     ENG    20     1       6  2031  1831  1927
 2     Liverpool  1876  2001     ENG    20     2       5  2020  1826  1918
 3         Leeds  1843  2001     ENG    20     3      11  1979  1793  1880
 4       Arsenal  1820  2001     ENG    20     4      14  1946  1776  1868
 5       Chelsea  1794  2001     ENG    20     5      18  1860  1738  1865
 6       Ipswich  1744  2001     ENG    20     6      31  1830  1710  1840
 7    Sunderland  1732  2001     ENG    20     7      34  1802  1688  1813
 8    AstonVilla  1725  2001     ENG    20     8      42  1775  1667  1796
 9     Newcastle  1704  2001     ENG    20     9      46  1765  1663  1792
10 Middlesbrough  1699  2001     ENG    20    10      49  1737  1661  1791
# ... with 1,546 more rows, and 2 more variables: ew <int>, tenth <int>
source John Burn-Murdoch working at the Financial Times
points
points on a line
ribbon
shaded range
faceted plots
source John Burn-Murdoch working at the Financial Times
oneSeason %>%
  ggplot(aes(x = year, y = score,
             colour = country)) +
  geom_point(size = 3) +
  scale_x_discrete() +
  theme_bw(base_size = 18)

size = 3 increases the size of all dots; it is set outside aes()
scale_x_discrete() forces the single value on the x axis to be discrete
theme_bw() is a predefined black-and-white theme; base_size = 18 sets the base font size
We can't see much: improve the x mapping
oneSeason %>%
  ggplot(aes(x = rank, y = score,
             colour = country)) + 
  geom_point(size = 3) +
  theme_bw(18)
scale_x_discrete() is useless now, as we have a continuous variable
base_size = can be omitted in theme_bw() as it is the first argument
It is now obvious that Spain does well, even for low-ranking clubs
oneSeason %>%
  ggplot(aes(rank, score, 
             colour = country)) +
   geom_line() + 
   geom_point(size = 3) + 
   theme_bw(18)
aes() mappings defined in ggplot() are passed on to all subsequent geoms
The names x and y can be omitted, though it is better to specify them
Hard to see differences; ENG seems more coherent
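The inheritance of aes() mappings, and the grouping that geom_line() relies on, can be inspected with layer_data(). This is a minimal sketch on a made-up data frame (clubs and its values are hypothetical stand-ins for oneSeason): a discrete colour mapping declared in ggplot() reaches both layers and splits the line by country, whereas without any discrete mapping the group aesthetic has to be given explicitly.

```r
library(ggplot2)

# Hypothetical toy data standing in for oneSeason (values are made up)
clubs <- data.frame(
  rank    = rep(1:3, times = 2),
  score   = c(1900, 1800, 1700, 2000, 1750, 1720),
  country = rep(c("ENG", "ESP"), each = 3)
)

# colour is declared once in ggplot() and inherited by both layers
p_colour <- ggplot(clubs, aes(x = rank, y = score, colour = country)) +
  geom_line() +
  geom_point(size = 3)

line_data  <- layer_data(p_colour, 1)  # data as seen by geom_line()
point_data <- layer_data(p_colour, 2)  # data as seen by geom_point()

# Without a discrete mapping, all rows fall into one single group ...
p_single <- ggplot(clubs, aes(x = rank, y = score)) +
  geom_line()

# ... unless the grouping variable is supplied through the group aesthetic
p_group <- ggplot(clubs, aes(x = rank, y = score, group = country)) +
  geom_line()

n_groups_single   <- length(unique(layer_data(p_single)$group))
n_groups_explicit <- length(unique(layer_data(p_group)$group))
```

With a single group, geom_line() connects every club in one zigzag; the group aesthetic draws one line per country without adding a legend.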
oneSeason %>%
  group_by(country) %>%
  summarise(min = min(score),
            max = max(score),
            range = max - min) %>%
  mutate(country = forcats::fct_reorder(country, range)) %>%
  ggplot(aes(x = "2016", y = range, fill = country)) +
  geom_col(position = "dodge") +
  theme_classic(18)
force the discretization using 2016 as character
use dodging to get all bars on the same x index
reorder levels based on a numeric variable using fct_reorder
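What fct_reorder() does to the factor levels can be checked on a tiny example. The vectors below are made up for illustration (they are not taken from the Euro Club Index data): the levels are sorted by the numeric variable instead of alphabetically, which is what drives the bar order in the plot.

```r
library(forcats)

# Hypothetical countries and score ranges (made-up values)
country     <- c("ENG", "ESP", "GER")
score_range <- c(300, 500, 200)

# Default factor levels are alphabetical
default_levels <- levels(factor(country))

# fct_reorder() sorts the levels by the numeric variable (ascending)
reordered_levels <- levels(fct_reorder(country, score_range))
```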
oneSeason %>%
  select(score, rank, country) %>%
  filter(country %in% c("ENG", "ESP")) %>%
  spread(country, score) %>%
  rowwise() %>%
  mutate(gap = ESP - ENG,
         min = min(ESP, ENG),
         max = max(ESP, ENG)) %>%
  ggplot(aes(x = rank, fill = gap > 0)) + 
  geom_rect(aes(xmin = rank - 0.5, 
                xmax = rank + 0.5, 
                ymin = min, ymax = max), alpha = 0.8) + 
  theme_classic(18) +
  scale_fill_manual(name = "gap", labels = c("ENG", "ESP"), 
                    values = c("royalblue", "red3")) +
  labs(title = "quality gap",
       subtitle = "between England and Spain",
       caption = "by John Burn-Murdoch")
rowwise() is mandatory to get the right min and max for each row
… are performing better at every rank except #11
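Why rowwise() matters here can be seen on a small made-up table (scores and its values are hypothetical): without it, min() and max() pool all values of both columns and return one number for the whole table. As an alternative that is not used in the slides, base R's pmin() and pmax() give the same per-row result without rowwise().

```r
library(dplyr)

# Hypothetical scores, one row per rank (values are made up for illustration)
scores <- tibble(
  rank = 1:3,
  ENG  = c(1900, 1800, 1700),
  ESP  = c(2000, 1750, 1720)
)

# Without rowwise(): min() and max() pool every value of both columns
pooled <- scores %>%
  mutate(min = min(ESP, ENG), max = max(ESP, ENG))

# With rowwise(): min() and max() are evaluated one row at a time
by_row <- scores %>%
  rowwise() %>%
  mutate(min = min(ESP, ENG), max = max(ESP, ENG)) %>%
  ungroup()

# Vectorised alternative: pmin() / pmax() compare the columns element-wise
vectorised <- scores %>%
  mutate(min = pmin(ESP, ENG), max = pmax(ESP, ENG))
```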
oneSeason %>%
  filter(country == "ENG") %>%
  ggplot(aes(x = rank, y = score)) + 
  geom_ribbon(aes(ymin = atw, ymax = atb),
              fill = "royalblue", alpha = 0.5) + 
  geom_line(size = 1.5, colour = "royalblue") + 
  geom_point(size = 3, colour = "royalblue") + 
  theme_bw(18) + 
  scale_fill_manual(name = "gap", labels = c("ENG", "ESP"), 
                    values = c("royalblue", "red3")) +
  labs(title = "Comparison of the nth best \nteam to its predecessors",
       subtitle = "in England in 2016",
       caption = "by John Burn-Murdoch")
allSeasons %>%
  ggplot(aes(rank, score, 
             colour = country)) +
  geom_line() + 
  #geom_point(size = 1.5) + 
  theme_classic(18) +
  facet_wrap(~ year)
allSeasons %>%
  select(score, year, rank, country) %>%
  filter(country %in% c("ENG", "ESP")) %>%
  spread(country, score) %>%
  rowwise() %>%
  mutate(gap = ESP - ENG,
         min = min(ESP, ENG),
         max = max(ESP, ENG)) %>%
  ggplot(aes(x = rank, fill = gap > 0)) + 
  geom_rect(aes(xmin = rank - 0.5, 
                xmax = rank + 0.5, 
                ymin = min, ymax = max), alpha = 0.8) + 
  theme_classic(18) +
  scale_fill_manual(name = "gap", labels = c("ENG", "ESP"), 
                    values = c("royalblue", "red3")) +
  labs(title = "quality gap",
       subtitle = "between England and Spain",
       caption = "by John Burn-Murdoch") +
  facet_wrap(~ year)
With tidy data, add only the facet layer to get all panels

rstudio cheatsheet
iris <- as_tibble(iris)
iris
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
 1          5.1         3.5          1.4         0.2  setosa
 2          4.9         3.0          1.4         0.2  setosa
 3          4.7         3.2          1.3         0.2  setosa
 4          4.6         3.1          1.5         0.2  setosa
 5          5.0         3.6          1.4         0.2  setosa
 6          5.4         3.9          1.7         0.4  setosa
 7          4.6         3.4          1.4         0.3  setosa
 8          5.0         3.4          1.5         0.2  setosa
 9          4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
# ... with 140 more rows
iris is converted to a tibble to avoid printing all 150 rows
For this course, I set the following default theme to avoid the grey background and print bigger text
ggplot2::theme_set(ggplot2::theme_bw(18))
iris %>% ggplot() + geom_point(aes(x = Petal.Width, y = Petal.Length))

geom_point()
geom_line()
geom_bar()
geom_boxplot()
geom_histogram()
geom_density()
aesthetics map the columns of a data.frame/tibble to the variable each ggplot2 geom is expecting.
For example geom_point() requires at least the x and y coordinates for each point.
ggplot(iris) + geom_point(aes(x = Petal.Width, y = Petal.Length))
| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | 
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa | 
| 4.9 | 3.0 | 1.4 | 0.2 | setosa | 
| 4.7 | 3.2 | 1.3 | 0.2 | setosa | 
Geoms accept additional arguments such as the colour, the transparency (alpha) or the size.
ggplot(iris) +
  geom_point(aes(x = Petal.Width,
                 y = Petal.Length),
             colour = "blue", alpha = 0.6, 
             size = 3)

Note that parameters defined outside the aesthetics aes() are applied to all data points.
colour, alpha or size can also be mapped to a column in the data frame.
For example: We can attribute a different color to each species:
ggplot(iris) +
  geom_point(aes(x = Petal.Width,
                 y = Petal.Length,
                 colour = Species),
             alpha = 0.6, size = 3)

Note that the colour argument now is inside aes() and must refer to a column in the dataframe.
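The difference between setting a colour and mapping it can be checked with layer_data(), which returns the values a layer actually draws. This is a small sketch, not part of the original slides: a constant given outside aes() is used as is, while the same string placed inside aes() is treated as data with a single level and receives a colour from the default discrete scale.

```r
library(ggplot2)

# colour set outside aes(): every point is drawn in blue
p_set <- ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length),
             colour = "blue")

# "blue" inside aes(): treated as a one-level variable, not as a colour name
p_constant_in_aes <- ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length,
                 colour = "blue"))

set_colours    <- unique(layer_data(p_set)$colour)
mapped_colours <- unique(layer_data(p_constant_in_aes)$colour)
```

The second plot also gets a legend with a single key labelled "blue", which is usually a sign that a constant ended up inside aes() by mistake.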
ggplot(iris) +
  geom_point(aes(x = Petal.Width, y = Petal.Length, shape = Species, colour = Species),
             alpha = 0.6, size = 3)

It is easy to adjust axis labels and the title
ggplot(iris) +
  geom_point(aes(x = Petal.Width,
                 y = Petal.Length,
                 colour = Species), 
             alpha = 0.6, size = 3) +
  labs(x = "Width",
       y = "Length",
       colour = "flower",
       title = "Iris",
       subtitle = "petal measures",
       caption = "Fisher, R. A. (1936)")
ggplot(iris) +
  geom_histogram(aes(x = Petal.Length,
                     fill = Species),
                 alpha = 0.6)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

The density rescales the counts so that the total area equals 1 (for a histogram: the count divided by the total number of occurrences and by the bin width).
ggplot(iris) +
  geom_density(aes(x = Petal.Length,
                   fill = Species),
               alpha = 0.6)
ggplot(iris) +
  geom_histogram(aes(x = Petal.Length, y = ..density..),
                 fill = "darkgrey", binwidth = 0.1) +
  geom_density(aes(x = Petal.Length, fill = Species, colour = Species),
               alpha = 0.4) +
  theme_classic()
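The values that stat_bin() computes for a histogram can be looked at directly with layer_data(). The sketch below reuses the binwidth of 0.1 from the plot above and checks two properties: the counts add up to the 150 rows of iris, and the densities multiplied by the bin width add up to 1.

```r
library(ggplot2)

binwidth <- 0.1

p_hist <- ggplot(iris) +
  geom_histogram(aes(x = Petal.Length), binwidth = binwidth)

# One row per bin, with the columns computed by stat_bin()
bins <- layer_data(p_hist)

total_count <- sum(bins$count)               # number of observations
total_area  <- sum(bins$density * binwidth)  # area under the density histogram
max_ncount  <- max(bins$ncount)              # counts rescaled to a maximum of 1
```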

 
Variables surrounded by two dots (..variable..) are intermediate values calculated by ggplot2 using stat functions
Each geom uses a stat function to transform the data:
geom_histogram() uses stat_bin() to divide the data into bins and count the number of observations in each bin.
stat_bin() computes for example: ..count.., ..density.., ..ncount.. and ..ndensity.. (see ?stat_bin).
..density..
stat_identity is used for scatter plots or geom_col(), no transformation
stat_count
stat_bin
stat_density_2d
stat_bin_2d
stat_ellipse
geom_bar() counts the number of occurrences of each value of a categorical variable.
geom_bar() uses stat_count() to compute these values (creating a new count column)
ggplot(iris) + geom_bar(aes(x = Species))

# or: geom_bar(aes(x = Species, y = ..count..))
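The count column created by stat_count() can be inspected with layer_data(). A minimal check on iris, where each of the three species has 50 rows:

```r
library(ggplot2)

p_bar <- ggplot(iris) +
  geom_bar(aes(x = Species))

# One row per bar, with the count computed by stat_count()
bars <- layer_data(p_bar)

bar_counts   <- bars$count
table_counts <- as.vector(table(iris$Species))
```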
stat_count()
stat = "identity" will force geom_bar() to use stat_identity() instead (leaving the original data unchanged)
Petal.Length and not ..Petal.Length.. as it is not "new" and is already present in the original data frame
ggplot(iris) + geom_bar(aes(x = Species, y = Petal.Length), stat = "identity")

since version 2.2.0, thanks to Bob Rudis, geom_col() is available; it does require a y variable
geom_col
ggplot(iris) + geom_col(aes(x = Species, y = Petal.Length))
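One thing worth knowing when the data hold several rows per category, as iris does: geom_col() does not summarise, it stacks every row. The sketch below (an illustration added here, not from the original slides) checks that the height of each bar equals the sum of Petal.Length for that species.

```r
library(ggplot2)

p_col <- ggplot(iris) +
  geom_col(aes(x = Species, y = Petal.Length))

# One row per observation: the 50 values of each species are stacked
segments <- layer_data(p_col)

# Top of each stacked bar, in the order of the factor levels
bar_heights <- as.vector(tapply(segments$ymax, as.integer(segments$x), max))

# Sum of Petal.Length per species, computed directly from the data
species_sums <- as.vector(tapply(iris$Petal.Length, iris$Species, sum))
```

If a mean or another summary per species is wanted, the data need to be summarised before plotting.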
mtcars %>%
  ggplot() +
  geom_bar(aes(x = factor(cyl),
               fill = factor(gear)))

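mtcars %>%
  mutate(cyl = factor(cyl),
         gear = factor(gear)) %>%
  count(cyl, gear) %>%
  # add the missing cyl / gear combination with a count of zero
  complete(cyl, gear, fill = list(n = 0L)) %>%
  ggplot() +
  geom_col(aes(x = cyl, 
               y = n,
               fill = gear),
           position = "dodge")
The combination gear 4 / cyl 8 is missing. Count first, then use tidyr::complete() with a zero fill so that all bars keep the same width. Calling complete() on the raw rows would add an all-NA row that geom_bar() counts as one car.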
mtcars %>%
  mutate(cyl = factor(cyl),
         gear = factor(gear)) %>%
  # no complete() here: an added all-NA row would be counted as one car
  ggplot() +
  geom_bar(aes(x = cyl, 
               fill = gear),
           position = "fill")
We can easily switch to polar coordinates:
mtcars %>%
  mutate(cyl = factor(cyl),
         gear = factor(gear)) %>%
  # no complete() here: an added all-NA row would be counted as one car
  ggplot() +
  geom_bar(aes(x = cyl, 
               fill = gear),
           position = "fill") +
  coord_polar()
ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl),
                   y = mpg))
ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl),
                   y = mpg,
                   fill = factor(am)))
scale_fill_manual() and scale_color_manual()
ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl),
                   y = mpg,
                   fill = factor(am),
                   color = factor(am))) +
  scale_fill_manual(values = c("red", "lightblue")) +
  scale_color_manual(values = c("purple", "blue"))

library(RColorBrewer)
display.brewer.all()

ggplot(mtcars) +
  geom_boxplot(aes(x = factor(cyl),
                   y = mpg,
                   fill = factor(am),
                   colour = factor(am))) +
  scale_fill_brewer(palette = "Pastel2") +
  scale_colour_brewer(palette = "Set1")

mtcars %>%
  ggplot(aes(x = wt,
             y = mpg,
             colour = hp)) +
  geom_point(size = 3)
mtcars %>%
  ggplot(aes(x = wt,
             y = mpg,
             colour = hp)) +
  geom_point(size = 3) +
  viridis::scale_colour_viridis()
viridis is color blind friendly and nice in b&w

One can actually use a plain character string inside aes(); it will be used to build the legend. This is useful for a few layers, when you would rather not create the variable in the data frame.
set.seed(123)
dens <- tibble(x = c(rnorm(500), 
                     rnorm(200, 3, 3)))
ggplot(dens) +
  geom_line(aes(x), stat = "density") +
  geom_vline(aes(xintercept = mean(x),
                 colour = "mean"),
             size = 1.1) +
  geom_vline(aes(xintercept = median(x),
                 colour = "median"),
             size = 1.1) -> p
p
dens_mode <- tibble(mode = density(dens$x)$x[which.max(density(dens$x)$y)])
p + geom_vline(data = dens_mode,
               aes(xintercept = mode, colour = "mode"), size = 1.1) +
  theme(legend.position = "top") +
  scale_colour_hue(name = NULL) # could be: labs(colour = NULL)

The easiest way to create facets is to provide facet_wrap() with a column name, written as a one-sided formula
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl)

ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl, ncol = 2)

ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl, scales = "free_x")

ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_wrap(~ cyl, scales = "free")

facet_grid() takes the rows on the left and the columns on the right, separated by a tilde ~ (i.e. by)
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_grid(am ~ cyl)

A dot (.) specifies that no faceting should be performed in that dimension. This mimics facet_wrap()
ggplot(mtcars) + geom_point(aes(x = wt, y = mpg)) + facet_grid(. ~ cyl)

Add the column names with labeller
ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg)) +
  facet_grid(am ~ cyl,
             labeller = label_both)
 
fig.height, fig.width
fig.asp…
ggplot object, 2nd argument
ggsave("aes_trick.png", p,
       width = 60, height = 30, units = "mm")
ggsave("aes_trick.pdf", p,
       width = 50, height = 50, units = "mm") 
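A self-contained variant of the ggsave() calls above, as a sketch: the plot and the file name are made up for illustration, and the file goes to a temporary directory so nothing is left in the project folder. The output format is deduced from the file extension.

```r
library(ggplot2)

# Hypothetical plot to save (any ggplot object works as the 2nd argument)
scatter <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg))

# Write to a temporary location; the .pdf extension selects the device
out_file <- file.path(tempdir(), "ggsave_sketch.pdf")

ggsave(out_file, scatter,
       width = 50, height = 50, units = "mm")

saved <- file.exists(out_file)
```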
ggplot2 introduced the possibility for the community to contribute and create extensions.
They are referenced on a dedicated site

"Never trust summary statistics alone; always visualize your data." (Alberto Cairo)
 
source: Justin Matejka, George Fitzmaurice Same Stats, Different Graphs…
A compilation of some of my gifs created with #rstats #ggplot2 #gganimate #tweenr https://t.co/nCppSOZv4W
Marcus Volz (@mgvolz), 4 April 2017
geom_tile() heatmap
geom_bin2d() 2D binning
geom_abline() slope
stat_ellipse()
stat_summary() easy mean, 95% CI etc.
geom_smooth() linear / splines / non-linear
ggforce::facet_grid_paginate() facets
gridExtra::marrangeGrob() plots
position_jitter() random shift
coord_cartesian() for zooming in
coord_flip() exchanges x & y
scale_x_log10() and scale_y_log10()
scale_x_sqrt() and scale_y_sqrt()
aes_string() for plotting inside a function
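For plotting inside a function, recent ggplot2 releases mark aes_string() as deprecated. A sketch of an alternative, assuming ggplot2 3.0.0 or later, uses the .data pronoun with column names passed as strings. The function name scatter_by_name and its arguments are hypothetical, chosen for this example only.

```r
library(ggplot2)

# Hypothetical helper: column names are passed as character strings
scatter_by_name <- function(data, x_var, y_var) {
  ggplot(data) +
    geom_point(aes(x = .data[[x_var]], y = .data[[y_var]])) +
    labs(x = x_var, y = y_var)
}

p_fun <- scatter_by_name(mtcars, "wt", "mpg")

# The layer receives the columns that were named in the call
plotted <- layer_data(p_fun)
```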