Introduction to ggplot2

Nathalie Vialaneix and Sébastien Déjean
October 12, 2018

ggplot2

Basic concepts about pictures to understand ```ggplot2```

A picture is worth a thousand words

Edward Tufte on the 1986 Challenger space shuttle disaster

Ref: Edward Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative. Chap. 5 deals with the Challenger disaster.

Base graphics vs ggplot2

Base graphics

  • directly available from R
  • easy to use for beginners
  • is less verbose for simple / canned graphics
  • has methods (plot works for many different objects)

ggplot2

  • R graphing package by Hadley Wickham based on the Grammar of Graphics (Wilkinson, 2005)
  • is less verbose for complex / custom graphics: plots can be iteratively built up and edited later
  • carefully chosen defaults = publication-quality graphics in seconds
  • ggplot2 is part of the tidyverse (www.tidyverse.org), a collection of R packages designed for data science
  • data should always be tidy data in a data.frame

Base graphics vs ggplot2

ggplot2 learning curve is steep but worth the effort

How to make a plot?

people <- data.frame(weight = c(80, 49, 62, 57),
                     height = c(1.82, 1.58, 1.71, 1.63),
                     gender = c("M", "F", "F", "F"))
people
  weight height gender
1     80   1.82      M
2     49   1.58      F
3     62   1.71      F
4     57   1.63      F

Want to make a scatterplot of height vs. weight:

  • each observation is a point
  • linear scaling of \( x \) and \( y \) axes
  • cartesian coordinate system

\( \Rightarrow \) a plot can be thought of as a mapping of data to geometric objects (point, line, bar…) and their aesthetic attributes (shape, color, size…):

\[ \mbox{height} \rightarrow x \qquad \mbox{weight} \rightarrow y \qquad \mbox{gender} \rightarrow \mbox{color} \]

Scales and a coordinate system are also needed to convert data units to physical drawing units.

How to make a plot?

geometric objects + scales and coordinate system \( \rightarrow \) plot
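
As a minimal sketch, the mapping described above can be written out in full for the people data frame. The scale and coordinate calls are spelled out only for illustration (ggplot2 adds these defaults on its own), and the object name p_people is an assumption of this example.

```r
library(ggplot2)

people <- data.frame(weight = c(80, 49, 62, 57),
                     height = c(1.82, 1.58, 1.71, 1.63),
                     gender = c("M", "F", "F", "F"))

## Data + aesthetic mapping (height -> x, weight -> y, gender -> color)
p_people <- ggplot(people, aes(x = height, y = weight, color = gender)) +
  ## Geometric object: one point per observation
  geom_point() +
  ## Linear scales on both axes (the default for numeric variables)
  scale_x_continuous() +
  scale_y_continuous() +
  ## Cartesian coordinate system (also the default)
  coord_cartesian()
p_people
```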

ggplot2 building blocks

  • Data: a data.frame and nothing else!

  • Aesthetic mapping: describe how data are mapped to things we can see on the plot through the function aes().

  • Geometric object: perform the actual rendering of the plot and control the type of plot to create (points, line, histogram, boxplot…).

  • Statistical transformations: transform the data for instance by summarising it in some manner (sum, density, smooth…)

  • Position adjustments: apply minor changes to the position of elements (jitter, fill, stack, dodge, identity)
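
The building blocks listed above can all be seen in a single short call. This is only an illustrative sketch on the diamonds data; the stat and position values are spelled out explicitly, and the name p_blocks is an assumption of this example.

```r
library(ggplot2)

p_blocks <- ggplot(data = diamonds,                            # data: a data.frame
                   mapping = aes(x = cut, fill = clarity)) +   # aesthetic mapping
  geom_bar(stat = "count",       # statistical transformation: count rows per bar
           position = "dodge")   # position adjustment: bars placed side by side
p_blocks
```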

ggplot2 structure

Basic idea: specify different parts of the plot, and add them together using the + operator.

library(ggplot2)
ggplot(data = <DATA>, 
       aes(x = <X AXIS VARIABLE>,
           y = <Y AXIS VARIABLE>, ... ), ...) +

  geom_<TYPE>(aes(size = <SIZE VARIABLE>, ...),
                   data = <DATA>,
                   stat = <FUNCTION>,
                   position = <POSITION>,
                   color = <"COLOR">, ...) +

  scale_<AESTHETIC>_<TYPE>(name = <NAME>,
                   breaks = <WHERE>,
                   labels = <LABELS>, ... ) +

  theme(...) +
  facet_<TYPE>(<FORMULA>)
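
One possible way to fill in the template above, as a sketch: the choice of variables, the color, the breaks and the labels are arbitrary and only serve to show where each piece goes.

```r
library(ggplot2)

p_template <- ggplot(data = diamonds, aes(x = carat, y = price)) +
  ## geom_<TYPE>: points, with a constant color and transparency
  geom_point(color = "steelblue", alpha = 0.2) +
  ## scale_<AESTHETIC>_<TYPE>: a continuous scale for the y aesthetic
  scale_y_continuous(name = "Price (USD)",
                     breaks = c(0, 5000, 10000, 15000),
                     labels = c("0", "5k", "10k", "15k")) +
  ## theme: a non-data element (legend position)
  theme(legend.position = "bottom") +
  ## facet_<TYPE>: one panel per level of cut
  facet_wrap(~ cut)
p_template
```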

Diamonds data

~54,000 round diamonds from http://www.diamondse.info with carat, colour, clarity, cut, total depth, table, depth, width, height, price

data(diamonds)
dim(diamonds)
[1] 53940    10
head(diamonds)
# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1 0.230 Ideal     E     SI2      61.5   55.   326  3.95  3.98  2.43
2 0.210 Premium   E     SI1      59.8   61.   326  3.89  3.84  2.31
3 0.230 Good      E     VS1      56.9   65.   327  4.05  4.07  2.31
4 0.290 Premium   I     VS2      62.4   58.   334  4.20  4.23  2.63
5 0.310 Good      J     SI2      63.3   58.   335  4.34  4.35  2.75
6 0.240 Very Good J     VVS2     62.8   57.   336  3.94  3.96  2.48

Use a subsample of 1,000 diamonds in the following graphics
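
A subsample of this kind could be drawn as sketched below. The seed value and the name diamonds_small are assumptions of this example; the code on the following slides refers to diamonds directly.

```r
library(ggplot2)

set.seed(2018)  # arbitrary seed, only to make the subsample reproducible
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]
dim(diamonds_small)
```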

ggplot2 versus base graphics: simple histogram

Base graphics histogram example:

hist(diamonds$price, main = "", xlab = "Price", breaks = 50)

ggplot2 versus base graphics: simple histogram

ggplot2 graphics histogram example:

ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 50)

  • axis titles are taken from the variable names

ggplot2: add aesthetics, superimpose information, facet...

ggplot(diamonds, aes(x = price, fill = color)) + 
  geom_histogram(binwidth = 1000) + facet_wrap( ~ cut) + theme_bw()

  • fill colours are chosen automatically from the levels of the factor color
  • legend is automatically added
  • data are split according to the factor cut (one panel for each type of cut)

Breaking down ggplot2 commands

## Create ggplot object, populate it with data
ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +  

## Add layer(s)
  geom_point(alpha = 0.3) +
  geom_smooth() + 

## Scales for dimensions, color palettes
    scale_y_log10() +

## Condition on variables
    facet_grid(~ cut) +

## More options
    ggtitle("First example") + theme_bw()

Breaking down ggplot2 commands [2]

## Create ggplot object
MyPlot <- ggplot(diamonds)
class(MyPlot)
summary(MyPlot); MyPlot

## Add aesthetics
MyPlot <- MyPlot + aes(x = carat, y = price, colour = cut)
summary(MyPlot)
MyPlot

## Add layer(s)
MyPlot <- MyPlot + geom_point(alpha=0.3)
summary(MyPlot) ; MyPlot  

MyPlot <- MyPlot + geom_smooth()
summary(MyPlot) ; MyPlot

## Scales for dimensions
MyPlot + scale_y_log10()

## Condition on variables
MyPlot + facet_grid(~ cut)

## More options
MyPlot + ggtitle("First example") + theme_bw()

  • Create a plot object (classes gg and ggplot)

  • Plot objects can be stored as variables; it's easy to share a plot.

  • A plot object needs at least one layer to show any data (without a layer, recent versions of ggplot2 draw an empty panel)
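
Since a plot is an ordinary R object, one way to share it is to write it to disk and read it back, as sketched here. The file name is hypothetical and saveRDS/readRDS are base R functions, not part of ggplot2.

```r
library(ggplot2)

my_plot <- ggplot(diamonds, aes(x = carat, y = price, colour = cut)) +
  geom_point(alpha = 0.3)

## Store the plot object in a file (hypothetical file name)
plot_file <- file.path(tempdir(), "my_plot.rds")
saveRDS(my_plot, plot_file)

## Read it back (ggplot2 must be loaded) and keep building on it
restored <- readRDS(plot_file)
restored + theme_bw()
```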

Aesthetic mapping

ggplot2 aesthetic = “something you can see”, set with the aes() function

Examples:

  • position (i.e., on the x and y axes)
  • color (“outside” color)
  • fill (“inside” color)
  • shape (of points)
  • linetype
  • size

All geom_XXX require some aesthetics (at least one)

Aesthetic mapping

?geom_point
...
Aesthetics
The following aesthetics can be used with geom_point. Aesthetics are mapped 
to variables in the data with the aes function: geom_point(aes(x = var))
x: x position (required)
y: y position (required)
shape: shape of point
colour: border colour
size: size
fill: internal colour
alpha: transparency

All of the following are correct (and equivalent):

ggplot(diamonds, aes(x = carat, y = price, color = cut)) + geom_point()

ggplot(diamonds) +  geom_point(aes(x = carat, y = price, color = cut))

ggplot(diamonds, aes(x = carat, y = price)) + geom_point(aes(color = cut))

Aesthetic mapping vs. parameter setting

Aesthetic mapping:

  • Data value determines visual characteristic
  • Use aes()

ggplot(diamonds, aes(x = carat, y = price, color = clarity)) + geom_point()

Setting:

  • Constant value determines visual characteristic
  • Use parameter in geom_<TYPE>

ggplot(diamonds, aes(x = carat, y = price)) + geom_point(color = "red")
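
A common pitfall is to put a constant inside aes(). The sketch below contrasts the two cases; the object names p_set and p_mapped are assumptions of this example.

```r
library(ggplot2)

## Setting: every point is drawn in red and no legend is created
p_set <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "red")

## Mapping a constant: "red" is treated as a data value with a single level,
## so the points get the first colour of the default palette and a legend appears
p_mapped <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = "red"))
```

Printing p_set and p_mapped side by side makes the difference visible.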

Your turn...

with just geom_histogram and geom_point

Carat vs. Price with the shape of points depending on the cut

The same plot, improved by adding transparency to the points

Carat distribution with color depending on the cut

Tip: the bin width is 0.2
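
One possible set of solutions, as a sketch only (the original figures are not reproduced here, so the alpha value and the object names are assumptions):

```r
library(ggplot2)

## Carat vs. price, with the shape of points given by cut
ex_shape <- ggplot(diamonds, aes(x = carat, y = price, shape = cut)) +
  geom_point()

## The same plot with semi-transparent points
ex_alpha <- ggplot(diamonds, aes(x = carat, y = price, shape = cut)) +
  geom_point(alpha = 0.3)

## Carat distribution, bars filled according to cut, bins of width 0.2
ex_hist <- ggplot(diamonds, aes(x = carat, fill = cut)) +
  geom_histogram(binwidth = 0.2)
```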

Geometric objects

  • control the type of plots
  • each one can only display certain aesthetics

Geometric objects

ggplot2 geometric object = the actual marks put on the plot, a plot must have at least one geom

Examples:

  • points: geom_point() for scatterplots, dot plots, etc.
  • lines: geom_line() for time series, geom_smooth() for trend lines (by default a loess or GAM smoother, depending on the sample size), etc.
  • boxplots and histograms: geom_boxplot() and geom_histogram()
  • barplots: geom_bar()
  • … and many more!

help.search("geom_", package = "ggplot2")

Geometric objects: many available!

geom_abline      geom_jitter
geom_area        geom_line
geom_bar         geom_linerange
geom_bin2d       geom_path
geom_blank       geom_point
geom_boxplot     geom_pointrange
geom_contour     geom_polygon
geom_crossbar    geom_quantile
geom_density     geom_rect
geom_density2d   geom_ribbon
geom_errorbar    geom_rug
geom_errorbarh   geom_segment
geom_freqpoly    geom_smooth
geom_hex         geom_step
geom_histogram   geom_text
geom_hline       geom_tile
geom_vline       ...
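
The geoms available in the installed version of ggplot2 can also be listed from R itself. A minimal sketch using base R's ls() on the attached package (the exact list depends on the ggplot2 version):

```r
library(ggplot2)

## All exported objects of ggplot2 whose name starts with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
length(geoms)
head(geoms)
```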

Geometric objects: histograms

p <- ggplot(diamonds)
## Overall histogram
p + geom_histogram(aes(x = price))

Geometric objects: histograms

## Composition of each bin
p + geom_histogram(aes(x = price, fill = cut))

Geometric objects: histograms

## Relative proportions
p + geom_histogram(aes(x = price, fill = cut), position = "fill")

Geometric objects: density plots

p + geom_density(aes(x = price, fill = cut), alpha = 0.5)

Geometric objects: boxplots

p + geom_boxplot(aes(x = cut, y = price), notch = TRUE)

Geometric objects: boxplots

About notches
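
According to the geom_boxplot documentation, notches extend 1.58 * IQR / sqrt(n) on each side of the median, which gives a rough 95% interval for comparing medians: when the notches of two boxes do not overlap, the medians are likely to differ. The sketch below recomputes these limits by hand for price within each cut; the helper name notch_limits is an assumption of this example.

```r
library(ggplot2)

## Notch limits of price for each level of cut: median +/- 1.58 * IQR / sqrt(n)
notch_limits <- do.call(rbind, lapply(split(diamonds$price, diamonds$cut), function(p) {
  half_width <- 1.58 * IQR(p) / sqrt(length(p))
  c(lower = median(p) - half_width,
    median = median(p),
    upper = median(p) + half_width)
}))
notch_limits
```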

Geometric objects: boxplots

p + geom_boxplot(aes(x = cut, y = price, fill = color))

Geometric objects: rectangles

  • geom_rect requires xmin, xmax, ymin, ymax
  • geom_tile requires x, y and optionally handles width and height

p + geom_tile(aes(x = as.numeric(cut), y = as.numeric(color), fill = depth))

/!\ Many diamonds share each (cut, color) cell, so their tiles are drawn on top of each other and only the last value of depth is visible…
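
One way around this overplotting, sketched here under the assumption that the mean is a sensible summary, is to aggregate depth for each (cut, color) combination before plotting, so that each tile represents exactly one value. The name mean_depth is an assumption of this example.

```r
library(ggplot2)

## One row per (cut, color) combination, holding the mean depth
mean_depth <- aggregate(depth ~ cut + color, data = diamonds, FUN = mean)

## Each tile now shows a single, well-defined value
p_tile <- ggplot(mean_depth, aes(x = cut, y = color, fill = depth)) +
  geom_tile()
p_tile
```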

Geometric objects: rectangles

  • geom_raster is a special case of geom_tile with all tiles having the same size

p + geom_raster(aes(x = as.numeric(cut), y = as.numeric(color), fill = depth))

Geometric objects: multiple geoms

ggplot(diamonds, aes(x = color, y = price, fill = color)) + 
  geom_boxplot(outlier.size = 0) +
  geom_point(aes(fill = color), alpha = 0.1, shape = 21)

Geometric objects: multiple geoms (improved)

ggplot(diamonds, aes(x = color, y = price, fill = color)) + 
  geom_boxplot(outlier.size = 0) +
  geom_point(aes(fill = color), alpha = 0.1, shape = 21,
             position = position_jitter(width = .3))

How can it be improved further?

Geometric objects: multiple geoms (improved)

ggplot(diamonds, aes(x = reorder(color, price), y = price, fill = color)) + 
  geom_boxplot(outlier.size = 0) +
  geom_point(aes(fill = color), alpha = 0.1, shape = 21,
             position = position_jitter(width = .4))

  • reorder is not a ggplot2 function. It treats its first argument as a categorical variable (color) and reorders its levels based on the values of a second variable (price). The third argument (FUN, default is mean) is the function applied to price within each level of color.
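
The effect of reorder and of its FUN argument is easier to see on a tiny example. The toy data frame below is made up for illustration only.

```r
## Group "a" has mean 4 and median 1; group "b" has mean 3 and median 3
toy <- data.frame(group = c("a", "a", "a", "b", "b", "b"),
                  value = c(1, 1, 10, 3, 3, 3))

## Default FUN = mean: levels sorted by increasing group mean
by_mean <- levels(reorder(toy$group, toy$value))
by_mean    # "b" "a"

## FUN = median: levels sorted by increasing group median
by_median <- levels(reorder(toy$group, toy$value, FUN = median))
by_median  # "a" "b"
```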

Geometric objects: multiple geoms (improved)

ggplot(diamonds, aes(x = reorder(color, price), y = price, fill = color)) + 
  geom_violin() +
  geom_point(aes(fill = color), alpha = 0.5, shape = 21,
             position = position_jitter(width = .4))

ggplot2: multiple geoms

ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
  geom_point(shape = 21) + geom_smooth() + geom_rug()

  • Would you do the same (easily) with Base graphics?

Your turn...

Carat vs. Price colored by depth

Cut vs. Color

Hint: check the position option of geom_bar
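
A sketch of possible solutions follows. The original figures are not reproduced here, so the choice of the three position values is an assumption based on the hint about geom_bar, and the object names are hypothetical.

```r
library(ggplot2)

## Carat vs. price, coloured by depth (continuous colour scale)
ex_depth <- ggplot(diamonds, aes(x = carat, y = price, colour = depth)) +
  geom_point()

## Cut vs. color as bar charts, with three position adjustments
bars <- ggplot(diamonds, aes(x = cut, fill = color))
ex_stack <- bars + geom_bar()                    # default: counts stacked
ex_dodge <- bars + geom_bar(position = "dodge")  # bars side by side
ex_fill  <- bars + geom_bar(position = "fill")   # proportions within each cut
```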

Improving the way things look

  • themes, labels, legends, faceting…

ggplot2 themes

Themes handle non-data plot elements like axis labels, plot background, legend appearance, …: theme_gray() (default), theme_bw(), theme_classic(), theme_linedraw(), …

ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + 
  geom_smooth(aes(colour = cut)) + theme_bw()

ggplot2 labels

ggplot(diamonds, aes(x = carat, y = price)) + geom_point() + 
  geom_smooth(aes(colour = cut)) + theme_bw() + ylab("price (in USD)") +
  ggtitle("My beautiful plot")

Change default legend (factor)

ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot() + scale_fill_discrete(name = "Clarity of diamond",
                                       labels = paste("C", 1:8))

Change default legend (factor)

library(RColorBrewer)
ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot() + scale_fill_manual(name = "Clarity of\n diamond",
                                     labels = paste("C", 1:8),
                                     values = brewer.pal(8, "Set2"))

Change default legend (continuous)

ggplot(diamonds, aes(x = z, y = carat, colour = price)) +
  geom_point() + scale_y_log10() + 
  scale_colour_gradient(low = "grey", high = "pink")

Change default legend (continuous)

ggplot(diamonds, aes(x = z, y = carat, colour = price)) +
  geom_point() + scale_y_log10() + xlim(0, 10) +
  scale_colour_gradient(low = "grey", high = "pink")

Facet

ggplot(diamonds, aes(x = price)) + geom_histogram() + facet_wrap(~ cut)

Facet

ggplot(diamonds, aes(x = price)) + geom_histogram(fill = "red") + 
  facet_grid(clarity ~ cut) + theme_dark()

Sophisticated customization

ggplot(diamonds, aes(x = cut, y = price, fill = clarity)) +
  geom_boxplot() + 
  theme(legend.text = element_text(size = 5, colour = "red"),
        legend.position = "top", 
        axis.ticks = element_blank(), 
        axis.text.x = element_text(size = 10, angle = 45, face = "bold"))

Your turn...

Dimension of the diamonds (x vs depth) coloured by clarity with default brewer palette

The same with a qualitative palette

facet_wrap used with only one factor (here, color)

facet_grid used with two factors (here, color and clarity)
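
A sketch of how these four plots could be built. Putting x on the horizontal axis and depth on the vertical axis, and choosing Set2 as the qualitative palette, are assumptions of this example; the original figures are not reproduced here.

```r
library(ggplot2)

## Dimension x vs. depth, coloured by clarity
base_plot <- ggplot(diamonds, aes(x = x, y = depth, colour = clarity)) +
  geom_point()

## Default brewer palette (sequential)
ex_brewer <- base_plot + scale_colour_brewer()

## A qualitative brewer palette (Set2 has 8 colours, one per clarity level)
ex_qual <- base_plot + scale_colour_brewer(palette = "Set2")

## facet_wrap with one factor, facet_grid with two factors
ex_wrap <- ex_qual + facet_wrap(~ color)
ex_grid <- ex_qual + facet_grid(color ~ clarity)
```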

ggplot2: multiple graphics in one panel

To plot several ggplot graphics together, use gridExtra:

library(gridExtra)
p1 <- ggplot(diamonds) + geom_point(aes(x = carat, y = price, color = cut))
p2 <- ggplot(diamonds) + geom_density(aes(x = price, fill = cut), alpha = 0.5)
grid.arrange(p1, p2, ncol = 2)

ggplot2: saving graphics

Use ggsave to save a ggplot (uses file name extension to determine file type: .ps, .eps, .tex, .pdf, .jpg, .tiff, .png, .bmp, .svg, .wmf)

p <- ggplot(...) + ...
ggsave("...", plot = p, width = 4, height = 4)
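
A concrete instance of the pattern above, as a sketch: the plot, the file name and the temporary directory are arbitrary choices for this example. The width and height are in inches by default, and the .pdf extension selects the PDF device.

```r
library(ggplot2)

p_price <- ggplot(diamonds, aes(x = price)) + geom_histogram(bins = 50)

## Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "diamonds_price.pdf")
ggsave(out_file, plot = p_price, width = 4, height = 4)
file.exists(out_file)
```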

ggplot2 extensions: ggnetwork

library(ggnetwork); library(network)
data(emon)
ggplot(ggnetwork(emon[[1]], layout = "kamadakawai", arrow.gap = 0.025),
  aes(x, y, xend = xend, yend = yend)) +
  geom_edges(aes(color = Frequency), curvature = 0.1,
  arrow = arrow(length = unit(10, "pt"), type = "open")) +
  geom_nodes(aes(size = Formalization)) +
  scale_color_gradient(low = "grey50", high = "tomato") +
  scale_size_area(breaks = 1:3) + theme_blank()

ggplot2 extensions: ggbio

ggbio = Bioconductor package for ggplots of genomics data http://bioconductor.org/packages/release/bioc/html/ggbio.html

ggplot2 resources online

Credits

Slides built with material coming from:

Credit for figures